rcheetah, posted July 21, 2019

When I use GREP search and specify that I only want to find whole lines by using the ^ and $ anchors, the anchors seem to be ignored.

In my case, I am trying to format guitar chord sheets and give all the chords a pre-defined format. I try to detect the chord lines using the following GREP pattern:

^.* {4,}.*$

Although this may not be a perfect pattern, it mostly works, as the chord lines contain long runs of whitespace. This pattern works perfectly in another GREP search I tried.

As specified, I only want to match one line from the beginning (^) to the end ($). When I use search and replace in Affinity Publisher (AP), I get the whole text as a single result, because the match includes all the line ends. See attached screenshot.
walt.farrell, posted July 21, 2019

You have been affected by some combination of three different things:

1. $ basically refers to the end of a paragraph, not the end of a line. However, if you have inserted line breaks (shift+enter), it will match at those, too.
2. The . expression can match the end of a line (line break or paragraph break).
3. The .* expression is "greedy" and will match as much as it can. So if you have two paragraphs and use ^.*$ you match all the text, including the paragraph break between them. You may want to use .*? instead, as the ? makes the expression non-greedy, so it stops the first time it can rather than the last time it can.

-- Walt
Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases
PC: Desktop: Windows 11 Pro 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090
Laptop: Windows 11 Pro 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU.
Laptop 2: Windows 11 Pro 24H2, 16GB memory, Snapdragon(R) X Elite - X1E80100 - Qualcomm(R) Oryon(TM) 12 Core CPU 4.01 GHz, Qualcomm(R) Adreno(TM) X1-85 GPU
iPad: iPad Pro M1, 12.9": iPadOS 18.2.1, Apple Pencil 2, Magic Keyboard
Mac: 2023 M2 MacBook Air 15", 16GB memory, macOS Sequoia 15.0.1
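The greedy versus non-greedy difference can be reproduced outside Publisher. The following is only a rough sketch in Python, not Publisher's own engine: it assumes that `re.MULTILINE | re.DOTALL` approximates the behaviour described in this thread, and it uses "\n" as a stand-in for a paragraph break. The sample text is made up for illustration.

```python
import re

# Assumption: these two flags approximate the behaviour described for
# Publisher ("." may cross a break, and ^/$ anchor at every break).
FLAGS = re.MULTILINE | re.DOTALL

# "\n" stands in for a paragraph break; the text itself is hypothetical.
text = "first paragraph\nsecond paragraph"

# Greedy: ".*" runs across the break and stops at the last possible "$".
greedy = re.search(r"^.*$", text, FLAGS).group()

# Non-greedy: ".*?" stops at the first position where "$" can match.
lazy = re.search(r"^.*?$", text, FLAGS).group()

print(repr(greedy))  # 'first paragraph\nsecond paragraph'
print(repr(lazy))    # 'first paragraph'
```

In this approximation the greedy form swallows both paragraphs, while the non-greedy form ends at the first break.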
rcheetah (author), posted July 21, 2019 (edited)

Thanks for your answer. Sadly, it still doesn't make sense to me:

1. I have a paragraph break after every line (in the result you can even see the pilcrow signs). So, as you stated, the $ should match that.
2. This is pretty normal for grep, as . matches every character (if it's not in one-line mode).
3. I understand that the default is greedy. But even if I make the operation lazy as you suggested, the problem remains that $ doesn't match the paragraph break as it should.

If I'm still misunderstanding the concept, could you maybe write me a pattern that does the job correctly, so I can understand how AP's GREP interpreter is different?

Edit: I attached the file for testing purposes. Chords Test.afpub
walt.farrell, posted July 21, 2019

Try:

^.*? {4,}.*?$
rcheetah (author), posted July 22, 2019

Thanks for your answer. I should have mentioned this at point 3: I actually tried making the operators lazy as in your pattern, but it doesn't work either, as $ doesn't match the paragraph ending. Strangely, it now selects one line of text together with one line of chords, although every line has its own paragraph ending.
walt.farrell, posted July 22, 2019

OK, I misunderstood what you want. You're trying to get only the lines with chord letters, not the paired lines that my suggestion gives.

With a bit more experimenting I have found something else that I think is a bug, which I will report separately. But this selects just the chord lines:

(?-s)^.* {4,}.*$

The (?-s) at the beginning prevents "." from matching the end of a line or paragraph. Setting that option also seems to remove the need to make the ".*" non-greedy.
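For readers who want to try the effect of turning the s option off, here is a hedged sketch in Python. It is an approximation, not Publisher itself: `re.MULTILINE | re.DOTALL` is assumed to mimic Publisher's defaults, "\n" stands in for a paragraph break, and the chord sheet is invented. Note that Python's `re` module only accepts the scoped form `(?-s:...)`, so the whole pattern is wrapped in that group instead of being prefixed with `(?-s)`.

```python
import re

# Assumption: these flags mimic the defaults described in this thread.
FLAGS = re.MULTILINE | re.DOTALL

# Hypothetical chord sheet; "\n" stands in for a paragraph break.
sheet = "Intro\nG    C\nSome lyric line\nAm    F    G\nAnother lyric line"

# Original pattern: "." may cross breaks, so one match spans the whole text.
original = re.findall(r"^.* {4,}.*$", sheet, FLAGS)

# With the s option switched off inside the group, "." stays on one line.
fixed = re.findall(r"(?-s:^.* {4,}.*$)", sheet, FLAGS)

print(len(original))  # 1
print(fixed)          # ['G    C', 'Am    F    G']
```

Under these assumptions the original pattern returns the entire sheet as a single result, while the modified pattern returns only the two chord lines.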
rcheetah (author), posted July 22, 2019

Thank you very much! This pattern does work now, although I still don't understand why.

Would you be so kind as to explain (?-s) to me? Does it have a name? I couldn't find it in my regex cheat sheet. Is it something specific to AP? I'd like to understand it so I can avoid similar mistakes in the future.

Also, does this mean that $ not matching a paragraph ending is intentional behavior? Or is it a bug? It seems pretty odd to me. I would understand if it didn't match at a wrapped line, and also if it didn't match at a line break. But not matching a paragraph break seems absurd to me; I simply don't understand this behavior.
walt.farrell, posted July 22, 2019

First, $ is matching the end of a paragraph. The entire problem with your (and my) earlier attempts was that . matches an end-of-paragraph, too.

So, in the expression (made non-greedy) ^.*? {4,}.*?$ you were expecting:

1. ^ to match the beginning of a line or paragraph (and it does).
2. .*? to match some unspecified characters (and it does).
3. " {4,}" (that is, a space followed by {4,}) to match 4 or more spaces.
4. .*? to match more unspecified characters.
5. and finally, $ to match the end of a line or paragraph (and it does).

The problem comes in at step 2. Consider the first 2 lines of your text (corrected to make sure there are 4 spaces in the second line):

    Intro
    G    C

There's a paragraph break at the end of each of those, so in regex terms it could be thought of as looking like this:

    Intro$
    G    C$

You were expecting that the regex would not match the first line at all, as it has no spaces in it. In fact, though, at step 2 the .*? matched "Intro$G", because the . can match anything, even the end-of-line or paragraph break. Then at step 3 the 4 or more spaces matched the spaces after the G. At step 4 the .*? matched the C, and finally at step 5 the $ matched the end of the paragraph.

Thus, the issue is really the fact that . can match the end-of-line, which allowed the expression to match from the beginning of line 1 rather than from the beginning of line 2.

The fix was to insert (?-s) at the beginning. You can find an explanation of (?....) in several places; (?-s) is simply one of the variants of that general form. I used my Python reference material, as Python's regular expressions are reasonably close to ECMAScript/Perl regular expressions. There is also reference material for ECMAScript (though it doesn't cover this case) and for Perl (especially the section on Modifiers).

Basically, the (? signals the start of a modifier. The - negates the modifier (turns off a switch), and the s is the switch that is turned off. The s switch tells the regex processor whether . should match the end-of-line (or end-of-paragraph, in Publisher). When processing regular expressions, Publisher sets the flag on, allowing . to match end-of-line and end-of-paragraph, so if you don't want that you have to turn the flag off.
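The step-by-step account can be made visible by capturing each step in its own group. This is a sketch in Python rather than in Publisher: `re.MULTILINE | re.DOTALL` is assumed to mimic Publisher's defaults, "\n" stands in for a paragraph break, and the capture groups are added purely for illustration.

```python
import re

# Assumption: these flags mimic the defaults described in this thread.
FLAGS = re.MULTILINE | re.DOTALL

# The two sample lines, with "\n" standing in for the paragraph break.
text = "Intro\nG    C"

# Same non-greedy pattern, with one capture group per step (2, 3 and 4).
match = re.search(r"^(.*?)( {4,})(.*?)$", text, FLAGS)

before_spaces, spaces, after_spaces = match.groups()
print(repr(before_spaces))  # 'Intro\nG'  (step 2 crossed the break)
print(repr(spaces))         # '    '
print(repr(after_spaces))   # 'C'
print(match.start())        # 0, so the match begins on the first line
```

In this approximation the first group contains the break itself, which is why the match starts at the beginning of the "Intro" line instead of at the chord line.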
walt.farrell, posted July 22, 2019

Forgot to mention this part: once the flag is set so that . doesn't match end-of-line or end-of-paragraph, the processing works (even with greedy matching), as follows:

1. ^ matches the beginning of the first line.
2. .* matches the characters on that line, but can no longer cross the paragraph break.
3. " {4,}" (that is, a space followed by {4,}), which needs 4 or more spaces, doesn't match anything.

Thus, the first line fails the match, and Find starts looking at the second line, where:

1. ^ matches the beginning of the line.
2. .* matches the G.
3. " {4,}" matches the spaces after the G.
4. .* matches the C.
5. and finally, $ matches the end of the paragraph.
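The same two walk-throughs can be checked line by line. Again this is only a Python approximation under stated assumptions: `re.MULTILINE | re.DOTALL` mimics Publisher's defaults, "\n" stands in for a paragraph break, the scoped `(?-s:...)` group replaces the `(?-s)` prefix (which Python's `re` does not accept), and the capture groups are illustrative additions.

```python
import re

# Assumption: these flags mimic the defaults described in this thread.
FLAGS = re.MULTILINE | re.DOTALL

text = "Intro\nG    C"

# Greedy pattern with the s option off and one capture group per step.
pattern = re.compile(r"(?-s:^(.*)( {4,})(.*)$)", FLAGS)

# Try to match starting exactly at the first line, then at the second.
first_line = pattern.match(text, 0)
second_line = pattern.match(text, text.index("\n") + 1)

print(first_line)            # None: "Intro" has no run of 4 spaces
print(second_line.groups())  # ('G', '    ', 'C')
```

With the dot confined to a single line, the attempt at the first line fails outright and the attempt at the second line splits into the G, the spaces, and the C.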
rcheetah (author), posted July 22, 2019

Thank you very much for your detailed explanation! I now see clearly why my pattern didn't work, and I understand your correction. Thanks for taking the time!
walt.farrell, posted July 22, 2019

You're welcome.