Seneca Posted February 22, 2019
Finding an em dash between words and replacing it with an en dash doesn't seem to work. I want to replace all em dashes with en dashes and add a space on each side of the en dash. I know that this regex pattern doesn't take into account em dashes that are not between words (please ignore that).
Attachment: Search.mov
MikeW Posted February 22, 2019
40 minutes ago, Seneca said: Finding an em dash between words and replacing it with an en dash doesn't seem to work. I want to replace all em dashes with en dashes and add a space on each side of the en dash. I know that this regex pattern doesn't take into account em dashes that are not between words (please ignore that).
This, using regular expressions, will only find em dashes that have no space between them and the surrounding words, and replace each with an en dash.
Old Bruce Posted February 22, 2019
43 minutes ago, Seneca said: I want to replace all em dashes with en dashes and add a space on each side of the en dash.
Try ([A-Z])—([A-Z]) for Find (important: turn off Match Case) and \1 – \2 for Replace. The dashes in the example are an em dash and an en dash respectively; I'm not too sure how they will translate in web browsers.
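For comparison outside Publisher, here is a minimal Python sketch of the capture-group approach suggested above. It assumes Python's re module rather than Publisher's own engine; the helper name space_out_dashes is hypothetical, and re.IGNORECASE stands in for turning Match Case off.

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"

# One letter on each side of the em dash is captured so it can be put back.
# re.IGNORECASE plays the role of switching Match Case off.
PATTERN = re.compile(f"([A-Z]){EM_DASH}([A-Z])", re.IGNORECASE)


def space_out_dashes(text: str) -> str:
    # \1 and \2 restore the captured letters around " en-dash ".
    return PATTERN.sub(rf"\1 {EN_DASH} \2", text)


if __name__ == "__main__":
    print(space_out_dashes(f"word{EM_DASH}word"))
```

One design caveat of capturing the neighbouring letters: each match consumes them, so in a string such as a, em dash, b, em dash, c only the first dash is replaced in a single pass, because the b has already been used by the first match.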
MikeW Posted February 22, 2019
2 minutes ago, Old Bruce said: ...\1 – \2 for replace.
Oops. Thanks, Bruce. I forgot the spaces on each side of the en dash in my screenshot...
Edit to add: one has to type a literal space in the replacement. While \s can be used in the Find field, it cannot be used directly in the Replace field...
Seneca Posted February 22, 2019
4 minutes ago, Old Bruce said: Obviously the dashes in the example are Em and En dashes, not too sure how they will translate into web browsers.
Thanks @Old Bruce. But I wanted to show that this way should also work, and it doesn't.
MikeW Posted February 22, 2019
10 minutes ago, Seneca said: Thanks @Old Bruce. But I wanted to show that this way should also work but it doesn't.
I would need to download the movie to see what you are trying. Why not post a screenshot or the pattern you tried, and say whether it is a regular find/replace or a regular-expression (grep) search?
walt.farrell Posted February 22, 2019
There is definitely something odd happening with \b processing, apparently when it matches the empty string at the end of a word. It works properly when matching the empty string at the beginning of a word.
Simple test case (| marks the position of each match):
A text frame with the string "Here we go a wandring"
A Find string of \b(.)
A Replace string of ~\1
First match: |Here we go a wandring
First replace: ~Here we go a wandring
Second match: ~Here| we go a wandring
Second replace: ~Here we go a wandring
3rd match: ~Here |we go a wandring
3rd replace: ~Here ~we go a wandring
4th match: ~Here ~we| go a wandring
4th replace: ~Here ~we go a wandring
5th match: ~Here ~we |go a wandring
5th replace: ~Here ~we ~go a wandring
The 2nd and 4th ones matched the end of a word plus another character, which is correct. However, the Replace action did not insert the ~ as expected. So the replace is not operating correctly when \b matches the empty string at the end of a word, but it works when \b matches the empty string at the beginning of a word.
For another interesting test, I changed the Find string to \b(..), which makes it match only before "Here" and at the end of every word (Here| w, we| g, etc.), and found that the ~ was only inserted the first time, before Here, which is the only "beginning of word" match.
(Note: I'm on Windows.)
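As a reference point for the test case above, this is a small Python sketch of what a conventional regex engine does with the same Find and Replace strings. It assumes Python's re module, not Publisher's engine, and the function name mark_boundaries is hypothetical. Here the ~ is inserted at every boundary, including the end-of-word ones that Publisher skipped.

```python
import re


def mark_boundaries(text: str) -> str:
    # Find: a word boundary followed by any one character (captured).
    # Replace: a tilde followed by the captured character.
    return re.sub(r"\b(.)", r"~\1", text)


if __name__ == "__main__":
    # prints: ~Here~ ~we~ ~go~ ~a~ ~wandring
    print(mark_boundaries("Here we go a wandring"))
```

The final boundary at the very end of the string produces no match because there is no character left for (.) to capture.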
walt.farrell Posted February 22, 2019
56 minutes ago, MikeW said: I would need to download the movie to see what you are trying. Why not post a screen shot or the pattern you tried, as well as whether it is a regular f/r or grep?
Yes, that would have been nice. To save you the download, he did a find for \b—\b and a replace for –
I believe his test failed for the reason I mentioned just above: his search matched the empty string at the end of a word, which for some reason makes the Replace operation fail.
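For what it is worth, the pattern from the video behaves as intended in a conventional engine. The sketch below assumes Python's re module rather than Publisher, uses a hypothetical helper name, and writes the spaces around the en dash literally in the replacement.

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"


def replace_between_words(text: str) -> str:
    # \b on both sides requires a word character directly before and
    # after the em dash, so dashes that already have spaces are left alone.
    return re.sub(rf"\b{EM_DASH}\b", f" {EN_DASH} ", text)


if __name__ == "__main__":
    print(replace_between_words(f"word{EM_DASH}word"))
```

Because digits count as word characters, a range such as 1914, em dash, 1918 is converted as well, while an em dash at the end of a line or next to a space is not matched.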
walt.farrell Posted February 22, 2019
1 hour ago, MikeW said: Edit to add, one has to type the physical space as while \s can be used in the find, it cannot be used directly in the replace...
That is correct behavior. \s is for searching, and it is not a space but "any whitespace character", e.g., tab, space, and probably some I'm forgetting at the moment. As it is not a fixed character, it cannot be used in the replace context. In the replace context, \s is an escape character (\) followed by an s, so Publisher is working properly.
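Other engines draw the same line between search classes and replacement text. As an illustration only, and assuming Python 3.7 or later rather than Publisher, the sketch below shows Python's re module rejecting \s in a replacement template while accepting literal spaces. The helper name try_replacement is hypothetical.

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"


def try_replacement(template: str) -> str:
    # Apply the replacement template to a fixed sample and report
    # either the result or the error raised by the regex engine.
    try:
        return re.sub(EM_DASH, template, f"word{EM_DASH}word")
    except re.error as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    print(try_replacement(f" {EN_DASH} "))       # literal spaces are fine
    print(try_replacement(f"\\s{EN_DASH}\\s"))   # \s is not a valid replacement escape
```

How an engine treats an unknown escape in the replacement varies: Python raises an error, whereas the behavior described above for Publisher is to treat it as an escaped s.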
fde101 Posted February 22, 2019
26 minutes ago, walt.farrell said: "any whitespace character", e.g., tab, space and probably some I'm forgetting at the moment
In Perl it is defined as equivalent to [ \t\n\r\f], which would be the space, tab, newline, carriage return, and form feed characters. Obviously the line terminators would only be relevant if the expression is permitted to span lines, which it can in Perl when that option is specified. I haven't checked to see what Publisher permits yet.
walt.farrell Posted February 22, 2019
1 minute ago, fde101 said: In Perl it is defined as equivalent to [ \t\n\r\f] which would be space, tab, newline, carriage return, and form feed characters. Obviously the line terminators would only be relevant if the expression is permitted to span lines, which it can in Perl when that option is specified - haven't checked to see what Publisher permits yet.
Thanks. I had certainly forgotten \n\r\f. They would also be relevant at the end of a search string, not merely in one that can cross lines. (I haven't checked yet whether Publisher allows line-crossing, either, nor whether it has a way to let . match \n.)
I think \s in Publisher should also match all kinds of spaces (zero-width, thin, non-breaking, ...).
fde101 Posted February 22, 2019
1 minute ago, walt.farrell said: I think \s should also match in Publisher all kinds of spaces (zero-width, thin, non-breaking, ...).
That makes sense. I think the Perl definitions are mainly considering normal ASCII and not taking Unicode into account.
Old Bruce Posted February 22, 2019
28 minutes ago, walt.farrell said: I think \s should also match in Publisher all kinds of spaces (zero-width, thin, non-breaking, ...).
It doesn't find zero-width spaces, but it does find \n and \r.
MikeW Posted February 22, 2019
21 minutes ago, Old Bruce said: Doesn't find zero-width does find \n\r.
Yep, the \s in the replacement was pretty dumb, wasn't it? May I claim sleep deprivation? I haven't been to sleep yet after an all-nighter.
You can search for the zero-width character, or any other Unicode character, by its Unicode value, and you can use Unicode in the replacement as well.
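To illustrate searching by code point, here is a short Python sketch that targets the zero-width space (U+200B) explicitly instead of relying on \s. It assumes Python's \uXXXX escape syntax; the syntax Publisher expects may differ, and the function name is hypothetical.

```python
import re

THIN_SPACE = "\u2009"


def replace_zero_width_spaces(text: str, replacement: str = "") -> str:
    # \u200b in the pattern names the zero-width space by its code point.
    return re.sub(r"\u200b", replacement, text)


if __name__ == "__main__":
    sample = "ab\u200bcd"
    print(len(sample), len(replace_zero_width_spaces(sample)))
    print(replace_zero_width_spaces(sample, THIN_SPACE))
```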
fde101 Posted February 23, 2019
It looks like the patterns span both lines and paragraphs. It might be nice to have an option at some point to restrict a match to fit within a single paragraph, but this is still a great start.
4 hours ago, walt.farrell said: \b(..) which makes it match only before "Here" and at the end of every word
I believe this is happening because the search only restarts after the pattern; if you put at least three spaces between two words then it will match the beginning of the next word again.
5 hours ago, walt.farrell said: For another interesting test, I changed the Find string to \b(..) which makes it match only before "Here" and at the end of every word (Here| w, we| g, etc.) and found that the ~ was only inserted the first time, before Here, which is the only "beginning of word" match.
Agreed, Perl puts it in each matched location:
$ perl
$txt = 'This is a test.';
$txt =~ s/\b(..)/~\1/g;
print "\n\n$txt\n\n";
^D

~This~ ~is~ a~ ~test.
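The point about the search resuming after each match can be checked with a short sketch. This one assumes Python's re module, not Publisher or Perl, and uses a hypothetical helper name. With one space between words, the two-character match swallows the first letter of the next word; with three spaces, a boundary is left over in front of the next word.

```python
import re


def mark_pairs(text: str) -> str:
    # Find: a word boundary followed by any two characters (captured).
    # Scanning resumes after the end of each match, so characters that a
    # match has consumed cannot start a new match.
    return re.sub(r"\b(..)", r"~\1", text)


if __name__ == "__main__":
    print(mark_pairs("Here we go a wandring"))  # ~Here~ we~ go~ a~ wandring
    print(mark_pairs("ab cd"))                  # ~ab~ cd
    print(mark_pairs("ab   cd"))                # ~ab~   ~cd
```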
AdamW (Staff) Posted February 26, 2019
Thanks for this, very helpful. We think we've tracked down the issue with matching trailing '\b's.
On the subject of whitespace, it seems a zero-width space is not generally classified as whitespace ('\s'): http://www.unicode.org/reports/tr18/#Compatibility_Properties
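The classification mentioned above can be observed in other Unicode-aware engines too. As a hedged illustration, assuming Python's re module (which need not agree with Publisher in every case), the sketch below checks a few space-like characters against \s, with and without the ASCII-only restriction. The helper name is_whitespace is hypothetical.

```python
import re

SAMPLES = {
    "space": " ",
    "tab": "\t",
    "no-break space": "\u00a0",
    "thin space": "\u2009",
    "zero-width space": "\u200b",
}


def is_whitespace(ch: str, ascii_only: bool = False) -> bool:
    # re.ASCII restricts \s to the ASCII set [ \t\n\r\f\v];
    # without it, \s follows the Unicode whitespace classification.
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(r"\s", ch, flags) is not None


if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        print(f"{name}: unicode={is_whitespace(ch)} ascii={is_whitespace(ch, True)}")
```

In this engine the no-break and thin spaces match \s in Unicode mode, while the zero-width space does not match in either mode, which is consistent with what was reported earlier in the thread.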