GREP find/replace

garrettm30 · February 23, 2019

7 hours ago, Peter Kahrel said:

Great, thank you. It's incomplete (e.g. no lookahead and lookbehind yet) and still a bit buggy (bug reports submitted), but it's early days and I look forward to a full implementation.

P.

Are you sure about lookahead and lookbehind not working? It seems to work for me, for example

(?<=\s)[a-zA-Z]+(?=\s)

will find words that have a space character on both sides. Maybe you found a case where it doesn't work?

Edit: With further testing, I noticed that the above will find the results, but it will not highlight them in text and it will not replace. Clearly a bug. If I remove the positive lookahead and lookbehind, then the remaining regex behaves as expected.
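
For anyone who wants to check what the pattern should match outside Publisher, here is a minimal sketch using Python's re module. Python is only a stand-in here (Publisher's own engine may differ in details), and the sample text is made up. It shows that the lookbehind and lookahead only test the neighbouring characters and are not part of the match itself, which is why only the word should be highlighted and replaced.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "one two three"

pattern = re.compile(r"(?<=\s)[a-zA-Z]+(?=\s)")

# Collect each matched word together with its (start, end) position.
# "one" has no space before it and "three" has no space after it,
# so only "two" qualifies, and the span excludes the spaces around it.
matches = [(m.group(), m.span()) for m in pattern.finditer(text)]
print(matches)
```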

Edited February 23, 2019 by garrettm30

walt.farrell · February 23, 2019

10 minutes ago, garrettm30 said:
Are you sure about lookahead and lookbehind not working? It seems to work for me, for example
(?<=\s)[a-zA-Z]+(?=\s)
will find words that have a space character on both sides. Maybe you found a case where it doesn't work?

Edit: With further testing, I noticed that the above will find the results, but it will not highlight them in text and it will not replace. Clearly a bug. If I remove the positive lookahead and lookbehind, then the remaining regex behaves as expected.

You should probably post in one of the Publisher Bug forums about that. The "not highlighting" sounds similar to a recent report about \b and the "not replacing" sounds similar to what I found when \b matches at the end of a word (also posted as a bug).

Seneca · February 23, 2019

1 hour ago, garrettm30 said:

With further testing, I noticed that the above will find the results, but it will not highlight them in text and it will not replace

I was about to report the same thing.

Positive lookahead and positive lookbehind are not working as expected.

When I search with either of these, the match appears in the results list, but it is not highlighted in the text and no replacement is performed.

Negative look ahead and negative look behind work as expected.

I concur with @walt.farrell that you should report it as a bug.

Peter Kahrel · February 24, 2019

Yes, confirmed. The matches are displayed but not highlighted, and I have reported the bug. It looks like a general problem with location markers (lookarounds are a kind of location marker).

P.

Patrick Connor · February 24, 2019

On 2/23/2019 at 3:41 PM, walt.farrell said:

You should probably post in one of the Publisher Bug forums about that.

Agreed. The QA team do not monitor request threads for bugs, so can participants please report future findings in the appropriate bugs forum, as Peter Kahrel has.

Oval · February 26, 2019

On February 22, 2019 at 12:47 PM, Patrick Connor said:

Regular expressions in find

Will it be possible to combine them with the (old) expressions for field input? For (that bad) example if we want to find all “numbers”, that would follow Abs(x)=5 or is it already possible?

walt.farrell · February 26, 2019

3 hours ago, Oval said:

Will it be possible to combine them with the (old) expressions for field input? For (that bad) example if we want to find all “numbers”, that would follow Abs(x)=5 or is it already possible?

If I knew what you mean by "numbers that would follow Abs(x)=5" I could provide a definite answer, but I don't quite understand where you'd want the numbers to appear. But I suspect the answer is yes, as a regular expression can contain both static text values and variable values.

Can you provide some specific examples of what you would want to find?

fde101 · February 26, 2019

36 minutes ago, walt.farrell said:

If I knew what you mean by "numbers that would follow Abs(x)=5"

While I agree the wording of that is not the most clear, I believe he is asking that an expression be evaluated for each result as an additional condition of the match. In Perl terminology, substitute $1 for the x and this would find the number 5 anywhere that it stands alone within a number sequence (ex. just searching for "5" might find the ones in "25" or "452"; restricting to "whole word" type searches would potentially omit "a5" or "5b" - while searching on regex "(\d+)" with the restriction "abs($1)=5" would locate "a5" and "5b" but not "25", "452" or "a52").
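
To make that interpretation concrete, here is a rough sketch in Python rather than Perl. It is not a feature of Publisher; the helper name find_with_condition is made up for illustration. The regex finds every run of digits, and an extra condition on the captured value decides whether the match counts.

```python
import re

# The sample strings from the post above.
samples = ["a5", "5b", "25", "452", "a52"]

def find_with_condition(text):
    """Hypothetical helper: return digit runs in `text` whose value is exactly 5."""
    return [
        m.group(1)
        for m in re.finditer(r"(\d+)", text)
        if abs(int(m.group(1))) == 5  # the extra "abs($1)=5" style condition
    ]

results = {s: find_with_condition(s) for s in samples}
print(results)
```

Only "a5" and "5b" produce a hit; in "25", "452" and "a52" the captured number is not 5, so the condition rejects the match.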

Oval · February 27, 2019

19 hours ago, walt.farrell said:

Can you provide some specific examples of what you would want to find?

Thanks for the efforts. Sorry, yes, a bad example; here is the hopefully clearer question (it was actually a developer question addressed to Patrick Connor):

Will it be possible to combine Perl expressions with the expressions for field input (like width and capheight)?

Burny · February 27, 2019

Hi everyone!

"Find" works for me but "replace" does not...

https://i.imgur.com/ixb939u.gifv

walt.farrell · February 27, 2019

5 hours ago, Oval said:

Thanks for the efforts. Sorry, yes, a bad example; here is the hopefully clearer question (it was actually a developer question addressed to Patrick Connor):

Will it be possible to combine Perl expressions with the expressions for field input (like width and capheight)?

Thanks. I think the answer is no, but I must admit that I'm still confused.

I don't see how regular expressions would be useful for, e.g., the Transform Panel or other places with field input. Nor do I see how variables like width, etc. would be useful in Find/Replace.

But I also wonder if by "Perl expressions" you mean something other than regular expressions used for searching.

walt.farrell · February 27, 2019

23 minutes ago, Burny said:

Hi everyone!

"Find" works for me but "replace" does not...

https://i.imgur.com/ixb939u.gifv

You should post that in the appropriate Publisher Bugs forum (Mac, Windows) where it will be seen and tracked by the Serif QA (Quality Assurance) staff.

fde101 · February 27, 2019

1 hour ago, walt.farrell said:

Nor do I see how variables like width, etc. would be useful in Find/Replace.

Not as useful as the values captured by the expression itself... particularly if using the expression as the replacement text.

For a highly contrived and unlikely example, if ^ represents the start of a paragraph (have not tested this but guessing it might) and I have a bunch of lines that look like:

50 Apples 20 Pears 10 Peaches

24 Apples 10 Pears 19 Peaches

I could use something like:

Find regex: ^(\d+) Apples (\d+) Pears (\d+) Peaches

Replace with expression (Perl syntax): "$& " . ($1 + $2 + $3) . " Total"

To add a total to each set of data.
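
For comparison, here is a minimal sketch of the same idea in Python, whose re.sub accepts a function in place of the replacement text. This only illustrates the concept of an evaluated replacement; nothing in this thread says Publisher supports it, and the helper name add_total is made up.

```python
import re

text = "50 Apples 20 Pears 10 Peaches\n24 Apples 10 Pears 19 Peaches"

def add_total(m):
    """Hypothetical helper: append the sum of the three captured numbers."""
    total = sum(int(g) for g in m.groups())
    # m.group(0) is the whole match, the counterpart of Perl's $&.
    return f"{m.group(0)} {total} Total"

# re.MULTILINE makes ^ match at the start of every line, not just the first.
result = re.sub(
    r"^(\d+) Apples (\d+) Pears (\d+) Peaches",
    add_total,
    text,
    flags=re.MULTILINE,
)
print(result)
```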

Bhikkhu Pesala · February 27, 2019

I was unfamiliar with the term GREP, but guessed that it was somehow related to RegEx. I tried Ctrl+F in the latest build 249, but got nowhere. Probably time to read the help file again, but I fear this amount of coding will be beyond me. For PagePlus, I keep a handful of RegEx like that below in a text file for cutting and pasting to the search dialogue:

Quote

(.[ ,-ﬄ]+)+. ... and any following characters such as punctuation or space:

I hope that Serif can come up with a user-friendly way to select some common GREP expressions without users having to learn to code.

MikeW · February 27, 2019

6 minutes ago, Bhikkhu Pesala said:

...

I hope that Serif can come up with a user-friendly way to select some common GREP expressions without users having to learn to code.

There are millions of tutorials & examples on the web that can satisfy many/most cases. But there will always be a learning curve. People will always need to ask for help in situations they have trouble with.

Something akin to InDesign's GREP "builder" would be helpful though, as well as being able to save expressions for later use.

Each of those > symbols opens a flyout with the actual choices in each category. However, even the above will not prevent people from needing specific help in building an expression that works in their particular situation. A look at sites like the Adobe forums, the InDesign GREP Facebook group, StackOverflow, etc., will demonstrate that even professionals need help. I make mistakes frequently enough when adapting expressions I find on the web, because of poor copy/pasting of expressions, editing existing expressions, copying the wrong bits to the incorrect find or replace fields, etc.

BTW, while I do not have a central text file for saving expressions, I do have a folder on my hard-drive with specific text files with the f/r expressions and sample text that I used them for. The files are named for what they do.

garrettm30 · February 27, 2019

35 minutes ago, Bhikkhu Pesala said:

I tried Ctrl+F in the latest build 249, but got nowhere.

Just to make sure: have you set your search to look for regular expressions? You do this by clicking the gear above and to the right of the find field. In the dropdown menu that pops up when you click the little gear, make sure to check "Regular Expression." I tried the regex that you posted, and I did come up with results.

16 minutes ago, MikeW said:

There are millions of tutorials & examples on the web that can satisfy many/most cases.

I've posted this before, but my favorite resource for building regex is the website https://regex101.com. It looks pretty technical, but it is great for trying out a regex before applying it to a whole document. I currently use it both with InDesign and for programming; whenever I need a new regex that's different from what I have done before, I first go to that website to construct it. I select a large sample of text from my document and place it in the Test String field, which is useful for testing what I have. I also use the Substitution field to test out results for a find/replace operation. The Quick Reference section is very helpful when I can't remember the exact syntax. I don't pay attention to the Explanation or the Match Information section. For configuration advice, I recommend you set the regex flags (which is done from the far right of the Regular Expression text field) to g (global - shows all matches rather than just one) and m (multiline - allows you to use ^ to indicate the start of a paragraph and $ for its end). Once you get the regex to work like you want, you can copy it from the website and paste it into the find field in Affinity Publisher or InDesign. It works great.
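
If you would rather experiment locally, the effect of those two flags can be sketched in Python. This is an analogy, not how regex101 or Publisher is implemented: Python has no g flag, so re.search (first match only) versus re.findall (all matches) stands in for it, while re.MULTILINE corresponds to m. The sample text is made up, with each line playing the role of a paragraph.

```python
import re

# Hypothetical sample: three "paragraphs" separated by newlines.
text = "First paragraph\nSecond paragraph\nThird paragraph"

# Without "global": only the first match is returned.
first_only = re.search(r"^\w+", text).group()

# All matches, but without multiline ^ only matches at the very start.
without_m = re.findall(r"^\w+", text)

# With multiline, ^ matches at the start of every line...
with_m = re.findall(r"^\w+", text, flags=re.MULTILINE)

# ...and $ matches at the end of every line.
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(first_only, without_m, with_m, line_ends)
```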

garrettm30 · February 27, 2019

50 minutes ago, Bhikkhu Pesala said:

I was unfamiliar with the term GREP, but guessed that it was somehow related to RegEx. I tried Ctrl+F in the latest build 249, but got nowhere. Probably time to read the help file again, but I fear this amount of coding will be beyond me. For PagePlus, I keep a handful of RegEx like that below in a text file for cutting and pasting to the search dialogue:

I hope that Serif can come up with a user-friendly way to select some common GREP expressions without users having to learn to code.

If they allow us to save previous searches (suggested in another thread), then they could supply a few common settings.

For any who are not familiar with regex, let me give a few simple examples with some explanation to give you a taste for what it can do. (If you already are familiar with regex, you can just skip the rest of this post. I think I got carried away.)

By the way, "regex" is the common shorthand for "regular expressions", which in turn just refers to a kind of search by pattern matching.

--------------------------------
Example 1

First, let's say that you have received a story or novel from an author, and it is your job to do the layout. The author has indented nearly every paragraph with tabs or spaces, but you prefer to control indentation with a paragraph style. This means you now need to delete all the tabs or spaces. You could do that one by one, and hasten the onset of carpal tunnel. You might be able to do a simple find/replace to replace every tab with nothing at all, if tabs are only used at the beginning of the paragraph and nowhere else. Or you could try something like this regex:

^\s+

That tiny regex pattern will select any space at the start of each paragraph, whether it is a tab, several spaces, or any combination. The ^ indicates that what follows must be at the beginning of the line. The \s counts for any white space such as tab, space, nonbreaking space, etc. The + indicates one or more of the preceding, in other words, as many tabs or spaces as there are at the beginning of the line.

To remove those opening spaces, replace them with nothing. That is, leave the replace field empty and press replace all. Every tab or space at the start of a paragraph will be removed.

Similarly, $ indicates the end of a line, so you could remove all trailing space at the end of each paragraph by searching for this pattern: \s+$
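
Here is a minimal sketch of both clean-ups in Python, for trying the patterns on a sample outside Publisher. Treat it as an approximation: each line stands in for a paragraph, the sample text is made up, and the re.MULTILINE flag is needed so that ^ and $ apply to every line. Note also that in Python \s matches line breaks too, so the sample deliberately has no empty lines.

```python
import re

# Hypothetical sample: leading tabs/spaces and some trailing whitespace.
text = "\tFirst paragraph.\n   Second paragraph.  \n \t Third paragraph.\t"

# Remove any run of whitespace at the start of each line.
no_leading = re.sub(r"^\s+", "", text, flags=re.MULTILINE)

# Remove any run of whitespace at the end of each line.
cleaned = re.sub(r"\s+$", "", no_leading, flags=re.MULTILINE)

print(repr(cleaned))
```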

--------------------------------
Example 2

Next, let's say that the author used a lot of numbered lists, but he manually typed out each number, like this:

1. First item
2. Third
3. Fourth
4. Fifth

Maybe you want to change the manual numbering to a paragraph style with automatic numbering. Or maybe you just need to insert an extra item, such as if the author forgot one thing and wants to include it. Here, I purposely made the "mistake" to leave out "Second," and now my numbering is off. To fix it manually, I would have to type the new item and then renumber every item after it. The disadvantage to manual numbering is probably obvious enough that I do not need to do so much explaining. So now you set up a paragraph style with automatic numbering and you are ready to apply it. To select the manual numbering we need to delete, try this regex:

^[0-9]+\.\s+

This will select the "1. ", "2. ", "3. " portions of the first part of each line, and so on. It looks more complicated, but we can start with what we know from the previous example.

The ^ indicates the start of a line.

Next comes the brackets [ ]. You use this to define any single character you are looking for. For example, [abc] will match either a, b, or c. In our example, we have defined a range by entering 0-9, which is the same as typing [0123456789]. Any one of those characters will count as a match. You can also use ranges like [a-zA-Z] to match every unaccented letter in lowercase and uppercase.

[0-9] by itself will match one digit only, so "10. " and above would not match. So we add the plus sign: [0-9]+ will match one or more of the digits 0 through 9.

Next we want to match a period, but the period character has a special meaning in regex: it represents any single character. In this case, we want to match an actual period, so we do that by "escaping" the period, which is done by adding backslash before it: \. This tells the computer to match any literal period and not "any single character." You can escape any other characters that have special meaning in the same way. If you were looking for a literal plus sign, you put \+ . For a literal bracket, \[ , etc.
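
The difference between an escaped and an unescaped period is easy to see in a small test. This sketch uses Python's re module with made-up sample strings; re.escape is a Python convenience for escaping a literal string and has no counterpart in the Publisher find field as far as this thread shows.

```python
import re

# Hypothetical sample text.
text = "1. 1a 1)"

# Unescaped: the period stands for "any single character".
unescaped = re.findall(r"1.", text)

# Escaped: only a literal period matches.
escaped = re.findall(r"1\.", text)

# The same trick works for other special characters, such as the plus sign.
literal_plus = re.findall(r"\+", "2+2=4")

# Python can also add the backslashes for you.
auto_escaped = re.escape("1.")

print(unescaped, escaped, literal_plus, auto_escaped)
```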

Finally, we have a space to search for. The author could have used a tab, a single space, multiple spaces, and he may not have been consistent. No matter, we can handle that with \s+ , which, as we saw earlier, will match one or more of any space character.

Now that we are correctly selecting the part that we want to remove (the "1. ", "2. ", etc.), let's do some replacing. Leave the "Replace with" field empty to just delete those manually typed numbers. But don't stop there: you can simultaneously apply the numbered style you want by selecting a replace format or paragraph style.

You see, once you have it set up, you can accurately format every numbered list item in a whole document at the push of a button.
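
To see exactly which part of each line the pattern selects, here is a small Python sketch. It only covers the text side: applying a paragraph style is something Publisher does and is not represented here. The sample list is the one from above plus a made-up two-digit item, and re.MULTILINE is what lets ^ match at the start of every line.

```python
import re

# The list from the example, with a hypothetical item 10 added
# to show why the + after [0-9] matters.
text = "1. First item\n2. Third\n3. Fourth\n10. Tenth"

pattern = re.compile(r"^[0-9]+\.\s+", flags=re.MULTILINE)

# What the pattern selects on each line.
prefixes = pattern.findall(text)

# Replacing with nothing deletes the manual numbering.
cleaned = pattern.sub("", text)

print(prefixes)
print(cleaned)
```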

--------------------------------
Example 3

Let's say you have a long list of phone numbers in this format:

574-234-1998
930-823-1818

And you want them in this common US format:

1 (574) 234-1998
1 (930) 823-1818

There are several ways to go about it, but I might start with this:

([0-9]+)-([0-9]+)-([0-9]+)

We already know from the earlier examples that [0-9]+ will match one or more of any numeral.

In this case, you see three of those numeral matches, but they are surrounded by parentheses: ([0-9]+) . Parentheses do two things in regex. First, they group patterns together when things get complex (which is not necessary in this case), kind of like a math equation. Second, they also define "capture groups" that we can reuse later. You will see the benefit of a capture group in a moment.

The last thing I have done here is insert a simple hyphen, twice placed between these capture groups of numbers. It is a literal character, meaning it matches an actual hyphen in the text. Our regex pattern will match any string of numbers divided by two hyphens in this format (where x is any number):

xxx-xxx-xxxx
x-xxx-xx
xxxxxxx-x-xxxxx

(You see in this case it was not necessary to specify how many digits are in each group. If you had a string like 1-19321332-8, it would also match. If you had such numbers in the text that you did not want to match, you would have to be more specific.)
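
As a sketch of what "more specific" could look like, the Python snippet below compares the loose pattern with a stricter variant that fixes the number of digits in each group using counted quantifiers such as {3}. The stricter pattern is only one possible choice, the candidate strings are made up, and fullmatch is used here just to test each string as a whole.

```python
import re

loose = re.compile(r"([0-9]+)-([0-9]+)-([0-9]+)")
# One possible stricter variant: exactly 3, 3 and 4 digits.
strict = re.compile(r"([0-9]{3})-([0-9]{3})-([0-9]{4})")

# Hypothetical candidates, including the odd one mentioned above.
candidates = ["574-234-1998", "1-234-56", "1-19321332-8"]

loose_hits = [c for c in candidates if loose.fullmatch(c)]
strict_hits = [c for c in candidates if strict.fullmatch(c)]

print(loose_hits)
print(strict_hits)
```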

Now, let's replace using this pattern:

1 ($1) $2-$3

We are adding the numeral 1, parentheses, and a couple spaces, but we are also putting back the numbers that were found in the capture groups. $1 refers to whatever match was found between the first set of parentheses. $2 and $3 likewise refer to the matches inside the second and third sets of parentheses. This is what a capture group is and how it can be used.
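
The same find/replace can be tried in Python as a quick check. One difference to keep in mind: Python's re module writes capture group references in the replacement as \1, \2, \3, where the Publisher replace field uses $1, $2, $3.

```python
import re

text = "574-234-1998\n930-823-1818"

# Three capture groups separated by literal hyphens.
search = r"([0-9]+)-([0-9]+)-([0-9]+)"

# \1, \2 and \3 put the captured digits back; everything else is literal.
replace = r"1 (\1) \2-\3"

result = re.sub(search, replace, text)
print(result)
```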

As a variation on this example, maybe you live in a different country, and the starting and ending format is different. Maybe you are starting with raw numbers from a database, like this:

5742341998
9308231818

And you want this format:

574.234.1998
930.823.1818

Again, no problem. We just need to alter our search and replace patterns a little bit. Search:

([0-9]{3})([0-9]{3})([0-9]{4})

Replace with:

$1.$2.$3

I have not changed the search pattern very much: I have taken out the hyphens, and instead of the + to indicate one or more matched characters, I have used {3} and {4} to specify exactly 3 or 4 matched characters.
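
A short Python sketch shows why the exact counts matter once the hyphens are gone. With nothing to separate the groups, the + version still matches, but the first group greedily takes as many digits as it can. As before, Python writes the group references as \1 rather than $1, and this is only a stand-in for what Publisher does.

```python
import re

text = "5742341998\n9308231818"

# Exact counts: 3 digits, 3 digits, 4 digits.
result = re.sub(r"([0-9]{3})([0-9]{3})([0-9]{4})", r"\1.\2.\3", text)

# For comparison: with + the first group swallows all but the last two digits.
greedy = re.sub(r"([0-9]+)([0-9]+)([0-9]+)", r"\1.\2.\3", "5742341998")

print(result)
print(greedy)
```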

--------------------------------
Summary

I have tested all of these examples in Affinity Publisher, and they do work correctly in the current beta.

There are a few more basics I have not covered, but I hope these examples give you an idea of the power of regular expressions. I regularly make thousands of corrections in a book with a short series of searches that are of common use in our material. It takes me about 10-30 seconds to do those thousands of corrections (since I saved the find/replace expressions that I use most).

But the value in understanding regex is not simply in having a few saved searches that may or may not match the problem before you; rather, it is in knowing how to build patterns to accomplish what you need. Anytime you have a repetitive task for changing text, find/replace can probably do it for you, and regex is what takes it beyond simple text replacement.

When you first start using it, it may take you just as long to figure out a pattern as to just manually make any necessary changes, so the temptation is to stick to what you know. However, the same time spent doing a boring repetitive task could instead be used as an exercise to broaden your skills while accomplishing the same goal. Depending on your interests, you may even enjoy the challenge. Regardless, next time you come across a similar problem, you will be able to apply what you learned, and over time your efficiency will greatly increase.

(Did anyone actually make it to the end of my long post? I did indeed get carried away.)

MikeW · February 27, 2019

2 minutes ago, garrettm30 said:

...

(Did anyone actually make it to the end of my long post? I did indeed get carried away.)

I did...good examples.

As well, I often get lists of names I need to swap around parts of. GREP to the rescue. Or text that is repeated (like movie listing titles that repeat twice in the listing that come from a service), or, or, or. Impossible to do with a normal find/replace, tedious & error prone to do by hand but easy in GREP. Heck, I've used GREP to change spellings in books destined for the US from Europe or vice versa. And because I use tagged text so much, those get GREP f/r for every book.

Mike

v_kyr · February 27, 2019

See, for example, one of the many ebooks on the topic of regular expressions ...

Learning Regular Expressions eBook (PDF)

Bhikkhu Pesala · February 27, 2019

4 hours ago, MikeW said:

Something akin to InDesign's GREP "builder" would be helpful though, as well as being able to save expressions for later use.

InDesign’s GREP Builder looks remarkably similar to the Find and Replace dialogue in PagePlus.

Maybe another submenu could be added where users could store their most frequently used Regular Expressions. The drop list remembers about 20, but the one that I need often disappears from the bottom of the list. I spend some time finding the one that I want in my text file. It is all too easy to make an error and replace every occurrence of the letter e from your publication, or worse, replace something and not notice until it is too late to undo your mistake.

Ordinary users who pick up the software once a month to publish their club newsletter are not likely to get far up the learning curve and, like me, will find it quicker to do things manually than to figure out what alchemy is required to automate the process.

MikeW · February 27, 2019

2 hours ago, Bhikkhu Pesala said:

...

Ordinary users who pick up the software once a month to publish their club newsletter are not likely to get far up the learning curve and, like me, will find it quicker to do things manually than to figure out what alchemy is required to automate the process.

Yep, PP and ID use a similar "builder" and they are quite handy.

While I suspect there will be many users who choose a manual method, which is perfectly fine, there will be some who step out and explore more automated ways. I know many ID users who choose the same manual option, and the same applied to PP users as well. That's all fine and good. I'm just glad this made it into APub sooner rather than later.

I do hope some/many will opt to learn enough for their regular tasks if GREP will speed up what they do regularly. At the same time I hope people also learn that GREP isn't always the best way to accomplish a task. Using GREP can also be a rabbit hole of time wasting.

Bhikkhu Pesala · February 27, 2019

I finally discovered why Ctrl+F was not doing anything. I had to unhide the studio. Not very user-friendly. If the Find Panel is in the studio, then Ctrl+F should show the studio, or at least show the Find Panel.

On my 1200x1600 Portrait monitor, the studio takes up almost 2/3 of the Window. I only need the left panel to do the find and replace. The left/right pointing cursor will let me make the panels wider, but will not let me collapse them or reduce their width.

The list of search results showing the context is very helpful. It is not obvious what \n\r means. Presumably, it is something to do with New Line and/or new Paragraph. It would be helpful to update the search results automatically when selecting options like “match case” or changing the formatting.

v_kyr · February 27, 2019

Well, for a dev/coder, regular expressions are in nearly daily use and thus nothing fancy at all, since all programming languages and dev tools make use of them in one way or another. There are of course differences between regex implementations and how they are handled, depending on the programming language used.

When working on critical code/text, it is thus always helpful to do some test-bed runs first, before firing the prepared expressions at the main code. For such purposes there are a bunch of regex test environments available. Personally, I usually use more full-blown, language-specific tools for that, but there are also some more generic online services available, like for example ...

Old Bruce · February 27, 2019

2 hours ago, MikeW said:

Using GREP can also be a rabbit hole of time wasting.

Tell me about it, once spent a few hours getting the pattern just right so it could find the two instances and properly replace them. Sigh, good times.

Wosven · February 27, 2019

7 minutes ago, Old Bruce said:

Tell me about it, once spent a few hours getting the pattern just right so it could find the two instances and properly replace them. Sigh, good times.

Hehe, perhaps there's a Murphy's law about it.
On long documents, it's usually when I notice an error occurring more than 3 times that I use RE to find… a 4th one or only 2 more.
