
>> No, I just mixed the things.

So when you specify that a string is to be deleted and you're assigning a paragraph style at the same time, it does delete the text string?

 

Affinity Publisher and Photo 1.8.3 (Windows). Lenovo laptop with decidedly sub-optimal monitor. At least it works.
“The wonderful thing about standards is that you can have as many of ’em as you want.”
– Anonymous cynic


I was talking with a guy who's written extensively about publishing tools. He's a bit miffed about AP's lack of InDesign or QuarkXPress-style tag reading during text import. It might be that AP doesn't plan to try going head-to-head with InDesign in all aspects of the publishing business, such as book pagination. Absent a tagging system in AP, QXP and ID will remain more desirable. For shorter documents not requiring automation — could be a far different story. AP's graphics-related features are very impressive.

If a book contains 20 chapters with 6 paragraph styles per chapter plus two kinds of "hard" formatting (italics and boldface), you're looking at 160 separate instances (20 chapters × 8 kinds of formatting) of typing out the regular expressions and manually entering the style assignments into the Replace dialog. That strikes me as a lot of work that screams for automation. In the absence of a batch operations feature, or at least the ability to save search/replace instructions for quick recall, doing this correctly requires that you make zero errors when you're typing the instructions out 160 times. Otherwise you end up having to undo a mistake and redo it.

That's why the idea of using Word as an intermediate format appealed to me...in theory at least. Tagged text -> Perl script -> HTML file -> import into Word (with the CSS instructions actually creating the necessary paragraph styles) -> save as .DOCX -> import into AP document. Admittedly it'd be a workaround and not a solution, until a read-tagging-during-import feature is added.
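To make the first two stages of that pipeline concrete, here is a minimal sketch in Python rather than Perl. It assumes one paragraph per line, a leading paragraph tag such as <t>, and inline <i>/<b> tags. The <h> tag, the class names, and the function name are invented for illustration, and whether Word turns the CSS classes into paragraph styles on import is the premise of the workaround, not something this sketch checks.

```python
import html
import re

# Hypothetical mapping from paragraph tags to CSS class names.
# Only <t> appears in this thread; <h> is made up for the example.
PARAGRAPH_TAGS = {"t": "Title", "h": "Heading"}
DEFAULT_CLASS = "Body"


def tagged_to_html(text: str) -> str:
    """Convert one-paragraph-per-line tagged text into HTML paragraphs."""
    paragraphs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        css_class = DEFAULT_CLASS
        leading = re.match(r"<(\w+)>", line)
        if leading and leading.group(1) in PARAGRAPH_TAGS:
            css_class = PARAGRAPH_TAGS[leading.group(1)]
            line = line[leading.end():]  # drop the paragraph tag itself
        # Escape everything, then restore the inline <i>/<b> tags.
        body = html.escape(line, quote=False)
        body = re.sub(r"&lt;(/?)(i|b)&gt;", r"<\1\2>", body)
        paragraphs.append(f'<p class="{css_class}">{body}</p>')
    return "\n".join(paragraphs)


print(tagged_to_html("<t>My Title\nSome <i>italic</i> text & more"))
```

The HTML file would still need a stylesheet defining the Title, Heading, and Body classes before the Word step.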

As for something like this:

Find: <t>(.*?$)
Replace: \1, format with Title

Could you give yourself much less to type, repetitively, by doing this sort of thing:

Find: <t>
Replace: (with nothing) and format current paragraph with style named Title

Concerning the other replacements — e.g., Find: <i>(.*)</i> and Replace: \1 (format with italic) — I would suggest non-greedy matches for those rare situations in which a greedy match would give you some grief. Thus:

Find: <i>(.*?)</i>
Replace: \1 (format with italic)
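A quick way to see the difference outside Publisher is a small Python sketch. Python's re module is not Publisher's regex engine, so treat this as an illustration of greedy versus non-greedy matching in general; the sample text and the square-bracket "formatting" are made up.

```python
import re

# Two italic runs in the same paragraph: the case where greedy matching hurts.
sample = "one <i>first</i> two <i>second</i> three"

# Greedy: (.*) runs from the first <i> to the LAST </i>.
greedy = re.sub(r"<i>(.*)</i>", r"[\1]", sample)

# Non-greedy: (.*?) stops at the nearest </i>, so each run is handled separately.
lazy = re.sub(r"<i>(.*?)</i>", r"[\1]", sample)

print(greedy)
print(lazy)
```

With only one italic run per paragraph the two patterns behave the same, which is why the greedy version only bites in "rare situations".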


 


1 hour ago, MikeA said:

>> No, I just mixed the things.

So when you specify that a string is to be deleted and you're assigning a paragraph style at the same time, it does delete the text string?

 

Yes. It was just a misuse of captured text, or a misunderstanding of regex processing in general.

-- Walt
Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases
PC:
    Desktop:  Windows 11 Pro, version 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090 

    Laptop:  Windows 11 Pro, version 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU.
iPad:  iPad Pro M1, 12.9": iPadOS 17.4.1, Apple Pencil 2, Magic Keyboard 
Mac:  2023 M2 MacBook Air 15", 16GB memory, macOS Sonoma 14.4.1


1 hour ago, MikeA said:

As for something like this:


Find: <t>(.*?$)
Replace: \1, format with Title

Could you give yourself much less to type, repetitively, by doing this sort of thing:


Find: <t>
Replace: (with nothing) and format current paragraph with style named Title

It's the capturing of the text after the <t> with the regex term (.*?$) and the use of \1 in the replacement string that allows it to work. But the capture expression might be shorter, depending on exact details of the input formatting. For example, it's possible that (.*) would suffice as long as everything following the <t> within a paragraph should be affected.
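As a rough illustration of the capture-and-\1 idea, here is a sketch using Python's re module. It is not Publisher's engine; a newline stands in for a paragraph break, and the sample text is invented.

```python
import re

# A newline stands in for a paragraph break here.
text = "<t>Chapter One\nBody paragraph.\n<t>Chapter Two"

# Capture everything after <t> up to the end of the line, then put only
# the captured text back, which drops the tag.
with_dollar = re.sub(r"<t>(.*?$)", r"\1", text, flags=re.MULTILINE)

# The shorter capture: "." does not cross a newline, so (.*) stops there too.
without_dollar = re.sub(r"<t>(.*)", r"\1", text)

# The captured strings are the paragraphs that would receive the Title style.
titles = re.findall(r"<t>(.*)", text)

print(with_dollar == without_dollar)
print(titles)
```

In this engine both forms give the same result; whether that holds in Publisher depends on how it treats paragraph and line breaks.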

1 hour ago, MikeA said:

Concerning the other replacements — e.g., Find: <i>(.*)</i> and Replace: \1 (format with italic) — I would suggest non-greedy matches for those rare situations in which a greedy match would give you some grief. Thus:


Find: <i>(.*?)</i>
Replace: \1 (format with italic)

Yes, definitely a good idea to use non-greedy searches :)

-- Walt

1 hour ago, Lagarto said:

But doesn't finding <t> and replacing it with nothing (with paragraph formatting applied) leave the <t> in place, whereas the regex expression removes the tag so you do not need to clean up afterwards?

I'm not sure I understand what you're saying, but a regex Find for <t> with a Replace term that only changes the formatting will simply change the formatting of the <t> itself. The version you showed, with a captured string and a replacement of \1 with formatting, would work.

-- Walt

8 minutes ago, Lagarto said:

But as the Replace formatting is a paragraph style, it is automatically extended to cover the whole paragraph. But <t> would remain so therefore the original expression works (even if it may be verbose). 

The Replace formatting is not necessarily a paragraph style. It can be a character style (and was, in my test; sorry).

But you're right: if it's a paragraph style then it should work as long as you get the <t> deleted.

By the way, I see this relevant change listed in the Release Notes for the newly released Publisher 1.7.2.420 beta:

>> Find and Replace with Paragraph style was only applying style to found text rather than whole paragraph

 

-- Walt

>> You could also import all chapters as separate documents into the same layout, but is there a specific reason to keep them as independent documents?

When I worked at a shop where customers supplied files in multiple formats, we didn't have the luxury of specifying precise file formatting or other preparation. We had to work with the material as-is. It's a perfectly understandable and sensible approach on the authors' parts to prepare books — and especially newsletter articles — as single files. I doubt many of them would have been happy with us if we'd said "We need you to go back and convert everything into a single large file." We wouldn't have been thrilled to have to concatenate all of the material into a single large file either. (I don't remember our ever having done that.)

In my opinion the software shouldn't force users to work in a given way and only that way. The developers might find it reasonable, but it doesn't do much for users with varying needs.

>> Also, a general note on (lack of) support for tags when importing text files: aren't tag based import filters basically a plain text based feature?

They were when I was working with XPress Tags files. (I don't know how it works with InDesign.) At that shop we received a lot of plain text material that needed formatting. Because plain text can be heavily — and very rapidly — manipulated outside a page composition or word-processing program via scripting, it can be (not always "is," but "can be") a highly efficient approach. I'm talking about scores of files updated via script in a few seconds. Perl and other such tools are very quick.
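For a sense of what such a script looks like, here is a minimal sketch in Python rather than Perl. The function name, the file names, and the <em>-to-<i> substitution are all invented for illustration; the demo runs in a throwaway temporary folder so it touches no real files.

```python
import re
import tempfile
from pathlib import Path


def retag_files(folder: Path, pattern: str, replacement: str) -> int:
    """Apply one regex substitution to every .txt file in folder.

    Returns the number of files that were actually changed.
    """
    changed = 0
    for path in sorted(folder.glob("*.txt")):
        original = path.read_text(encoding="utf-8")
        updated = re.sub(pattern, replacement, original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed


# Demo in a temporary folder with two made-up chapter files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "ch01.txt").write_text("<em>x</em>", encoding="utf-8")
    (folder / "ch02.txt").write_text("plain", encoding="utf-8")
    changed_count = retag_files(folder, r"<em>(.*?)</em>", r"<i>\1</i>")
    result_text = (folder / "ch01.txt").read_text(encoding="utf-8")

print(changed_count, result_text)
```

The loop is the same for two files or two hundred, which is where the speed advantage over manual search/replace comes from.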

>> If so, they are not practical as there is plenty of stuff in Word format that is worth importing directly

They might sound impractical. But being able to run a complex script routine on dozens of input files in seconds rather than minutes might well change one's opinion about what is and isn't practical.

We could never predict how customer data would be supplied. If it arrived in some esoteric format that no program supported, we would have to strip it down to plain text and "massage" the text — say, by adding XPress Tags — for import into the page composition program. Those jobs became rather expensive for the customer due to the additional processing time.

And a customer-supplied Word file, even if ideal in some respects, might contain any number of problems that all had to be corrected via search/replace or 100% manual formatting. Word alone isn't, of itself, necessarily a saving grace.

>> Batch-based regex support with formatting criteria would allow much the same without requiring any scripting skills, and that would be kind of a minor revolutionary thing and I really hope this will happen at some point.

It would certainly move the program along, yes. An entire automated formatting routine stored for immediate access and use would be excellent.
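To picture what a stored routine could amount to, here is a small Python sketch: an ordered list of find/replace rules saved as JSON and replayed in one go. This is purely hypothetical and says nothing about how Publisher might implement it. The rule format and function name are invented, and bracketed labels stand in for applying a style.

```python
import json
import re

# Hypothetical saved task: an ordered list of find/replace rules.
rules = [
    {"find": r"<i>(.*?)</i>", "replace": r"[italic:\1]"},
    {"find": r"<b>(.*?)</b>", "replace": r"[bold:\1]"},
]

# Serialize once; this string is what a "saved task" file might hold.
saved = json.dumps(rules)


def run_saved_task(saved_rules: str, text: str) -> str:
    """Load the saved rules and apply each one, in order, to the text."""
    for rule in json.loads(saved_rules):
        text = re.sub(rule["find"], rule["replace"], text)
    return text


print(run_saved_task(saved, "a <i>b</i> and <b>c</b>"))
```

Typing each rule once and replaying the set per chapter would remove most of the repeated, error-prone entry described earlier in the thread.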

>> It should be fairly easy to implement, compared to creating a scripting API with full access to document object model.

I hope for users' sakes that the underlying architecture makes that feasible. Would it be easy to implement if there aren't any "hooks" for it in the present design? Possibly not. (I'm not doubting that AP has a suitable architecture; I don't know anything about what's underneath the bonnet.) Someone from AP, replying to mail I sent, noted "It's early days yet." I do get that. They're just getting started...

 


>> [walt.farrell] The version you showed, with a captured string and a Replacement of \1 with formatting would work.

From what I'm reading earlier in the thread, I guess the program is designed to do this when the replacement string is empty:

•  Find a paragraph starting with "<t>" or some other tag.
•  Leave the "<t>" (or whatever it is) in place and assign, to the entire paragraph, the paragraph style named in the replacement instruction

I tried this with the Word "clone" program I have here and it does work that way. That's a bit of a surprise. I would expect that if your instruction is "delete the 'found' string" — then it's to be deleted! I'm trying to think of a use case in which I would specify deletion and yet not want the deletion to occur. So far nothing's coming to mind. Well, if it's designed that way, so it goes. And then using (.*) and "\1" makes sense after all.

In AP's regex parlance, does ^ mean start of paragraph and $ mean end of paragraph?


5 minutes ago, Lagarto said:

Sorry, I do not know. I noticed that InDesign speaks a somewhat different language (it is also named directly as GREP, so users familiar with that "dialect" are probably quite capable with InDesign). That means that I cannot directly use regex that works with InDesign. But yes, $ means end of paragraph; the expression just needed a "?" in front of it (as in .*?$) to actually limit each match to a single paragraph. In InDesign this is achieved by \r.

That's interesting. I wonder what would happen if "$" were simply omitted from the regular expression search (in AP, I mean). Something to try later on...

I guess it explains more about expressions such as (.*?$). It caught my eye because if I were doing searches and replacements with Perl regular expressions it would be typical for the "$" to appear outside the ")" — or a line boundary would be specified as "\n" and might or might not appear within the parens — depends on the situation.
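Here is a short Python sketch of that point about where the boundary sits relative to the parentheses. Python's re is used as a stand-in for Perl-style behavior, not for Publisher's engine, and the sample text is made up.

```python
import re

text = "<t>Title line\nnext line"

# "$" is a zero-width assertion, so it adds nothing to the captured text
# whether it sits inside or outside the parentheses.
inside = re.search(r"<t>(.*?$)", text, flags=re.MULTILINE).group(1)
outside = re.search(r"<t>(.*?)$", text, flags=re.MULTILINE).group(1)

# A literal "\n" inside the parentheses is a real character and IS captured.
with_newline = re.search(r"<t>(.*?\n)", text).group(1)

print(repr(inside), repr(outside), repr(with_newline))
```

So the placement of "$" is a matter of style here, while the placement of "\n" changes what \1 puts back.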

All these different approaches to regular expression use ... it's the kind of thing about which some cynic remarked: The wonderful thing about standards is, you can have as many of 'em as you want.


28 minutes ago, MikeA said:

That's a bit of a surprise. I would expect that if your instruction is "delete the 'found' string" — then it's to be deleted! I'm trying to think of a use case in which I would specify deletion and yet not want the deletion to occur. So far nothing's coming to mind. Well, if it's designed that way, so it goes. And then using (.*) and "\1" makes sense after all.

In AP's regex parlance, does ^ mean start of paragraph and $ mean end of paragraph?

The instruction is not "to delete the found string", and that might be for convenience and compatibility with standard (non-regex) searches.

Consider, for example, the possibility that I want to find all "i.e." and italicize them. Using a standard Find/Replace I could Find "i.e." and replace with Character Style: emphasis. If an empty "replace" box meant delete in all cases, then I would have to replace "i.e." with "i.e." + emphasis. With the current approach the user avoids having to type the string twice, or copy/paste.

So, the way it seems to work is that if the replacement string is blank, the replacement formatting is applied to whatever was found. And that applies to both normal and regex searches.

In the absence of replacement formatting, an empty replacement string does mean delete, for both normal and regex searches.
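Those rules can be written down as a tiny decision function. This Python sketch only models the behavior as described in this post; it is not Publisher's code, the function name is invented, and it does not expand backreferences such as \1.

```python
from typing import Optional, Tuple


def plan_replacement(
    found: str, replacement: str, style: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Return (resulting text, style applied) under the rules described above."""
    if replacement == "" and style is not None:
        # Blank replacement plus formatting: keep the found text and restyle it.
        return found, style
    # Otherwise the replacement text is used as given; with no formatting,
    # a blank replacement therefore deletes the found text.
    return replacement, style


print(plan_replacement("i.e.", "", "emphasis"))
print(plan_replacement("<t>", "", None))
```

Under this model the only way to delete a tag and apply a style in one pass is a non-blank replacement, which is what the captured string and \1 provide.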

Regarding ^ and $: It's more complicated.

  • ^ matches the beginning of a line, which seems to be defined as the beginning of a text frame, or (with show hidden characters on) just before a Paragraph break character or Line Break character.
  • $ seems to match just before a Paragraph break character, or a Line break character, and also just before the Frame Break character.

Somehow those feel wrong, but I'm too tired to play with it more at the moment.
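For comparison, this is how ^ and $ behave in one well-documented engine, Python's re module, where a newline is the only line boundary. It is a reference point only; Publisher's paragraph, line, and frame break characters have no direct equivalent here.

```python
import re

text = "ab\ncd\nef"

# Default mode: "^" matches only at the very start of the string.
default_starts = [m.start() for m in re.finditer(r"^", text)]

# MULTILINE mode: "^" also matches just AFTER each newline,
# and "$" matches just BEFORE each newline and at the end of the string.
multiline_starts = [m.start() for m in re.finditer(r"^", text, flags=re.MULTILINE)]
multiline_ends = [m.start() for m in re.finditer(r"$", text, flags=re.MULTILINE)]

print(default_starts, multiline_starts, multiline_ends)
```

In this model "^" sits just after a break and "$" just before one.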

-- Walt

21 minutes ago, Lagarto said:

In that scenario, you'd need to run the regex tagging procedure multiple times.

Permit me to offer: ugh.

23 minutes ago, Lagarto said:

But considering the fact that all page layout programs currently support more or less directly e.g. Word tables and end and footnotes, plain text based procedures are quite impractical in situations where the text uses such features.

Point taken.

23 minutes ago, Lagarto said:

I do not think that this is at all a question of underlying architecture. It is just a question of allowing multiple regex expressions to be saved as a task that can be run in one go. Nothing but a UI and serialization thing.

I mis-spoke a bit. I was thinking about an underlying architecture that permits the development of third-party extensions. I don't think we can reasonably expect a given program's authors to have either the time or inclination to code every possible new feature someone might need. As Quark showed only too well, sometimes their own implementation of a feature was not so good, and along came someone else with a far better approach — and users were willing to pay for the extensions if they provided functionality not available in the program "proper" (or if the principal authors hadn't done a good job of it). Some extensions to QXP turned out to be ridiculous, buggy, and decidedly overpriced. But others were of the "can't live without it" variety.

Word's an example. A lot of people loathed and despised its ribbon-bar — especially when Microsoft made it impossible to switch back to classic view. I worked at Microsoft at the time and, trust me, a WHOLE lot of people who worked there absolutely hated that ribbon bar. Outlook's and Excel's ribbon bars weren't so awful. Word's was bloody-awful. But, when the Office group said "jump" the company as a whole pretty much had to respond "Ok — how high?" and there was nothing to be done about it. Then along came certain "MVP" developers with inexpensive add-ons that enabled people to restore classic view when they wanted it. It was a huge "plus" and I bought one of those things the first I heard of it. "Extensibility" is a good thing.


10 minutes ago, MikeA said:

All these different approaches to regular expression use ... it's the kind of thing about which some cynic remarked: The wonderful thing about standards is, you can have as many of 'em as you want.

It is always important to specify the dialect of regular expression processing in use, and even then one may find surprises.

Publisher's Help documents that:

>> Affinity Publisher supports Perl and ECMAScript (with perl extensions) expressions. Regular expressions use the "C" or "POSIX" locale, while Locale Aware Regular Expressions use the locale inferred from the text being searched and locale aware collation is implied.

But some of the fun we're talking about comes in with considerations of "what is a line" and what might the developers have done to try to make regex use conform to the users' expectations.

-- Walt

3 minutes ago, walt.farrell said:

Consider, for example, the possibility that I want to find all "i.e." and italicize them. Using a standard Find/Replace I could Find "i.e." and replace with Character Style: emphasis. If an empty "replace" box meant delete in all cases, then I would have to replace "i.e." with "i.e." + emphasis. With the current approach the user avoids having to type the string twice, or copy/paste.

Matter of preference, I suppose. I would gladly trade off the minor inconvenience of having to re-type "i.e." if what came with it was the ability to leave the replacement string empty and thus successfully delete the "found" string in a single pass. If that isn't the design, so be it, and the (.*) and "\1" approach is effective instead.


6 minutes ago, Lagarto said:

But opposing the plugin dependency is also what made InDesign big at the time: it simply just blasted Quark off the business by offering everything in-built, and by delivering a perfectly crafted scripting API and open plugin interface that allowed designers and their programming-skilled partners to do amazing things without (at least recurring) charge.

I got out of that business before InDesign became a "thing" and never used it.

Adobe opposed a plugin dependency — yet supported plugin development by providing an open plugin interface? I'm not clear what that means.

I can imagine certain users being downright delighted to be able to script home-grown solutions. I can also imagine plenty of users wanting to do their design work without also having to become developers. In which case, the presence of third-party tools — someone else does the heavy lifting development-wise — is a huge plus. As for recurring charge: Did plugin developers for QXP (or other page-composition programs) also go to some kind of subscription model?
