rcheetah

GREP search doesn’t search per line


When I use GREP search and use the ^ and $ anchors to specify that I only want to find whole lines, this gets ignored.

In my case, I’m trying to format guitar chord sheets and give all the chords a predefined format. I try to detect the chord lines using the following GREP pattern:

^.* {4,}.*$

Although this may not be a perfect pattern, it mostly works, as the chord lines contain a lot of whitespace. The same pattern works perfectly in another GREP search I tried. As I specified, I only want to match one line, from the beginning (^) to the end ($). When I use search and replace in AP, I get the whole text as a single result, because the match includes all the line ends. See attached screenshot.
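A minimal Python sketch can reproduce the single-result behavior described above. It assumes that Python's re module with re.DOTALL | re.MULTILINE behaves like Publisher's GREP defaults (. matches paragraph breaks, ^ and $ match at paragraph boundaries), that "\n" can stand in for Publisher's paragraph break, and it uses a hypothetical sample sheet named SHEET. None of these names come from Publisher itself.

```python
import re

# Hypothetical chord sheet; "\n" stands in for Publisher's paragraph break.
SHEET = "Intro\nG    C\nVerse one lyrics here\nEm      D\nMore lyrics"

# Assumption: DOTALL + MULTILINE approximates Publisher's default GREP settings.
PUBLISHER_LIKE = re.DOTALL | re.MULTILINE

matches = re.findall(r"^.* {4,}.*$", SHEET, PUBLISHER_LIKE)

# The greedy .* runs across every paragraph break, so the whole sheet
# comes back as one match instead of one match per chord line.
print(len(matches))         # 1
print(matches[0] == SHEET)  # True
```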

 

Bildschirmfoto 2019-07-21 um 21.42.29.png


You have been affected by some combination of three different things:

  1. $ basically matches the end of a paragraph, not the end of a line, although if you have inserted line breaks (Shift+Enter) it will match at them, too.
  2. The . expression can match the end of a line (a line break or a paragraph break).
  3. The .* expression is "greedy" and will match as much as it can. So if you have two paragraphs and use ^.*$, you match all the text, including the paragraph break between them. You may want to use .*? instead: the ? makes the expression non-greedy, so it stops at the first point where the rest of the pattern can match rather than the last.
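The three points can be checked with a small Python sketch. It is only an approximation: Python's re module stands in for Publisher's engine, re.DOTALL | re.MULTILINE is assumed to mirror Publisher's defaults, and "\n" stands in for the paragraph break in a hypothetical two-paragraph TEXT.

```python
import re

TEXT = "first\nsecond"  # two "paragraphs"; "\n" stands in for the break
FLAGS = re.DOTALL | re.MULTILINE  # assumed Publisher-like defaults

# Point 1: with MULTILINE, $ matches just before each break and at the very end.
dollar_positions = [m.start() for m in re.finditer(r"$", TEXT, FLAGS)]

# Point 2: with DOTALL, . matches the break character itself.
dot_matches_break = re.fullmatch(r".", "\n", FLAGS) is not None

# Point 3: greedy .* spans both paragraphs; lazy .*? stops at the first $.
greedy = re.findall(r"^.*$", TEXT, FLAGS)
lazy = re.findall(r"^.*?$", TEXT, FLAGS)

print(dollar_positions)   # [5, 12]
print(dot_matches_break)  # True
print(greedy)             # ['first\nsecond']
print(lazy)               # ['first', 'second']
```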

-- Walt

Windows 10 Home, version 20H2 (19042.685),
   Desktop: 16GB memory, Intel Core i7-6700K @ 4.00GHz, GeForce GTX 970
   Laptop (2021-04-06):  32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU
Affinity Photo 1.9.2.1035 and 1.9.2.1005 Beta   / Affinity Designer 1.9.2.1035 and 1.9.2.1005 Beta  / Affinity Publisher 1.9.2.1035 and 1.9.2.1024 Beta


Thanks for your answer. Sadly, it still doesn’t make sense to me:

  1. I have a paragraph break after every line (in the result you can even see the pilcrow signs). So, as you stated, $ should match that.
  2. This is pretty normal for GREP, as . matches every character (when single-line mode is on).
  3. I understand that the default is greedy. But even if I make the operator lazy as you suggested, the problem remains that $ doesn’t match the paragraph break, as it should.

If I’m still misunderstanding the concept, could you maybe write me a pattern that does the job correctly, so I can understand how AP’s GREP interpreter is different?

 

Edit: I attached the file for testing purposes. 

Chords Test.afpub



Try:

^.*? {4,}.*?$

 


-- Walt



Thanks for your answer. I should’ve mentioned this under point 3: I actually tried making the operators lazy like in your pattern, but it doesn’t work either, as $ doesn’t match the paragraph ending. Now it strangely selects only one line of text and one line of chords, although every line has its own paragraph ending.
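The pairing of a text line with the chord line below it can be reproduced in a Python sketch. Assumptions: Python's re module approximates Publisher's engine with re.DOTALL | re.MULTILINE enabled, "\n" stands in for the paragraph break, and SHEET is a hypothetical sample.

```python
import re

SHEET = "Intro\nG    C\nVerse one lyrics here\nEm      D\nMore lyrics"
FLAGS = re.DOTALL | re.MULTILINE  # assumed Publisher-like defaults

lazy = re.findall(r"^.*? {4,}.*?$", SHEET, FLAGS)

# Each match starts at the text line above a chord line: the lazy .*?
# still crosses one paragraph break to reach the first run of spaces.
for m in lazy:
    print(repr(m))
# 'Intro\nG    C'
# 'Verse one lyrics here\nEm      D'
```

The lazy .*? stops as early as it can, but the match still begins at the leftmost ^ from which a match is possible, and that is the text line above each chord line.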

4 hours ago, rcheetah said:

Thanks for your answer. I should’ve mentioned this under point 3: I actually tried making the operators lazy like in your pattern, but it doesn’t work either, as $ doesn’t match the paragraph ending. Now it strangely selects only one line of text and one line of chords, although every line has its own paragraph ending.

OK, I misunderstood what you want. You're trying to get only the lines with chord letters, not the paired lines that my suggestion gives.

With a bit more experimenting I have found something else that I think is a bug, which I will report separately. But this selects just the chord lines:

(?-s)^.* {4,}.*$

The (?-s) at the beginning prevents "." from matching the end of line/paragraph. And setting that option also seems to remove the need to make the ".*" non-greedy.
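Here is a Python sketch of the same fix. Assumptions: Python's re module stands in for Publisher's engine with re.DOTALL | re.MULTILINE on, "\n" stands in for the paragraph break, and the sample sheet is hypothetical. Python does not accept an unscoped (?-s) at the start of a pattern, so the sketch uses the scoped form (?-s:...) (Python 3.6+), which switches dot-all off only inside the group. Simply not enabling re.DOTALL has the same effect.

```python
import re

SHEET = "Intro\nG    C\nVerse one lyrics here\nEm      D\nMore lyrics"
FLAGS = re.DOTALL | re.MULTILINE  # assumed Publisher-like defaults

# (?-s: ... ) turns DOTALL off for the enclosed pattern, so . stops at "\n".
chord_lines = re.findall(r"(?-s:^.* {4,}.*$)", SHEET, FLAGS)

# Equivalent: just don't enable DOTALL in the first place.
without_dotall = re.findall(r"^.* {4,}.*$", SHEET, re.MULTILINE)

print(chord_lines)     # ['G    C', 'Em      D']
print(without_dotall)  # ['G    C', 'Em      D']
```

As Walt observed, once . can no longer cross a paragraph break, the greedy .* is harmless: it can only backtrack within a single line.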


-- Walt


5 hours ago, walt.farrell said:

(?-s)^.* {4,}.*$

The (?-s) at the beginning prevents "." from matching the end of line/paragraph. And setting that option also seems to remove the need to make the ".*" non-greedy.

Thank you very much! This pattern does work now :) although I still don’t understand why. 
Would you be so kind as to explain (?-s) to me? Does it have a name? I couldn't find it in my regex cheat sheet. Is it something specific to AP? I’d like to understand it, so I can avoid similar mistakes in the future.

Also, does this mean $ not matching a paragraph ending is intentional behavior? Or is it a bug? It seems pretty odd to me. I would obviously understand if it didn’t match a wrapped line, and also if it didn’t match a line break. But not matching a paragraph break seems absurd to me; I simply don’t understand this behavior.


First, $ is matching the end of a paragraph. The entire problem with your (and my) earlier attempts was that . matches an end-of-paragraph, too.

So, in the expression (made non-greedy)

^.*? {4,}.*?$

you were expecting:

  1. ^ to match the beginning of a line or paragraph (and it does).
  2. .*? to match some unspecified characters (and it does).
  3.  {4,} (that is, a space followed by {4,}) to match 4 or more spaces.
  4. .*? to match more unspecified characters.
  5. and finally, $ to match the end of a line or paragraph (and it does).

The problem comes in at step 2. Consider the first 2 lines of your text (corrected to make sure there are 4 spaces in the second line):

Intro
G    C

There's a paragraph break at the end of each of those, so in regex terms it could be thought of as looking like this:

Intro$
G    C$

You were expecting that the regex would not match the first line at all, as it has no spaces in it.

In fact, though, at step 2 the .*? matched "Intro$G", because . can match anything, even the end-of-line or paragraph break. Then at step 3 the 4 or more spaces matched the "    " before the C. At step 4 the .*? matched the "C", and finally at step 5 the $ matched the $ at the end of the second line.

Thus, the issue is really the fact that . can match the end-of-line, which allowed the expression to match from the beginning of line 1, rather than from the beginning of line 2.
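The walkthrough can be confirmed by wrapping steps 2, 3 and 4 in capturing groups, which does not change what the pattern matches. This is a Python sketch, assuming that Python's re module with re.DOTALL | re.MULTILINE approximates Publisher's defaults and that "\n" stands in for the paragraph break after "Intro".

```python
import re

# The first two lines of the sheet, with "\n" standing in for the paragraph break.
TEXT = "Intro\nG    C"
FLAGS = re.DOTALL | re.MULTILINE  # assumed Publisher-like defaults

# Same pattern as the lazy attempt, with groups around steps 2, 3 and 4.
m = re.search(r"^(.*?)( {4,})(.*?)$", TEXT, FLAGS)

print(m.start())   # 0  -> the match begins on the "Intro" line
print(m.groups())  # ('Intro\nG', '    ', 'C')
```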

The fix was to insert (?-s) at the beginning. You can find an explanation for (?....) in several places. (?-s) is simply one of the variants of that general form:

Basically, the (? signals the start of a modifier. The - negates the modifier (turns off a switch), and the s is the switch that is turned off. The s switch tells the regex processor whether . should match the end-of-line (or end-of-paragraph, in Publisher); in most regex documentation it is called "single-line" or "dot-all" mode. When processing regular expressions, Publisher turns the flag on, allowing . to match end-of-line and end-of-paragraph, so if you don't want that, you have to turn the flag off.
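The same switch can be toggled in Python's re module, used here only as a stand-in for Publisher's engine; the two-character TEXT is a hypothetical example. Two caveats for Python: the positive (?s) form must appear at the start of the pattern, and a negated flag is only accepted in the scoped form (?-s:...).

```python
import re

TEXT = "a\nb"

# (?s) at the start turns the dot-all switch on for the whole pattern.
on = re.search(r"(?s)a.b", TEXT)

# (?-s:...) turns it off inside the group, even though re.DOTALL is passed.
off = re.search(r"(?-s:a.b)", TEXT, re.DOTALL)

print(on is not None)  # True  -> . matched the "\n"
print(off is None)     # True  -> . refused to match the "\n"
```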


-- Walt



Forgot to mention this part: once the flag is set so that . doesn't match end-of-line or end-of-paragraph, the processing works (even with greedy matching), as follows:

  1. ^ matches the beginning of the first line.
  2. .* matches some unspecified characters (at most "Intro", since it can no longer cross the paragraph break).
  3.  {4,} (that is, a space followed by {4,}), which should match 4 or more spaces, doesn't match anything.

Thus, the first line fails the match, and Find starts looking at the second line, where:

  1. ^ matches the beginning of the line.
  2. .* matches the G.
  3.  {4,} matches the spaces after the G.
  4. .* matches the C.
  5. and finally $ matches the end of the paragraph.
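The five steps can be traced in Python with the dot-all switch left off. As a sketch, it assumes Python's re module approximates Publisher's engine and "\n" stands in for the paragraph break; the groups around .*, " {4,}" and .* are added only to show what each step matched.

```python
import re

TEXT = "Intro\nG    C"
PATTERN = re.compile(r"^(.*)( {4,})(.*)$", re.MULTILINE)  # DOTALL left off

# Anchored at the start of the text, the first line cannot supply 4 spaces.
first_line = PATTERN.match(TEXT)

# search() keeps trying later positions; ^ next succeeds at the second line.
m = PATTERN.search(TEXT)

print(first_line)  # None
print(m.start())   # 6
print(m.groups())  # ('G', '    ', 'C')
```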

 


-- Walt



Thank you very, very much for your detailed explanation! I now see clearly why my pattern didn’t work, and I understand your correction. Thanks for taking the time!


You're welcome.


-- Walt


