MikeW Posted January 30, 2020
Using the following text:
john.doe@company.com
george.dufus@company.com
ringo@company.com
paul.henry.whoever@company.com
The regex needs to capitalize only the first letter of each name part (left of the @ symbol) and leave the right side alone.
The regex is: (\b\w|\b\w\.\K\w)(?=.+@)
Replace is: \u$1
As can be seen in the below, it seems the \K is missed and the regex matches to the right of the @ as well: [screenshot not preserved]
Here's what should happen: [screenshot not preserved]
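To pin down the result the opening post is after, here is a minimal sketch in plain Python. It deliberately avoids regular expressions, so it is not the approach used in the thread; the function name `capitalize_name_parts` is made up for illustration, and it assumes name parts are separated only by dots.

```python
def capitalize_name_parts(address):
    """Capitalize the first letter of each dot-separated part left of the @.

    Hypothetical helper for illustration; the domain (right of the @) is
    returned unchanged.
    """
    local, at, domain = address.partition("@")
    parts = [part[:1].upper() + part[1:] for part in local.split(".")]
    return ".".join(parts) + at + domain


SAMPLE = [
    "john.doe@company.com",
    "george.dufus@company.com",
    "ringo@company.com",
    "paul.henry.whoever@company.com",
]

for address in SAMPLE:
    print(capitalize_name_parts(address))
```

Any regex solution discussed below should produce the same four lines for this sample.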
Pauls (Staff) Posted January 31, 2020
Hi @Mikew, which regex tester are you using?
MikeW (Author) Posted January 31, 2020
49 minutes ago, Pauls said: Hi @Mikew, which regex tester are you using?
Hello Pauls,
RegexBuddy. It's the same expression I wrote for ID where it works fine. It works as written in UltraEdit. It works on a couple of website regex checkers... And it should work in APub.
Thank you, Mike
walt.farrell Posted January 31, 2020
14 hours ago, MikeW said: The regex is: (\b\w|\b\w\.\K\w)(?=.+@)
I don't understand why your regular expression is so complex, Mike. (But the important part, for solving your problem, is at the end of this post.)
Given as input john.doe@company.com the matching will work as follows:
\b should match before the j, and \w should match the j, and the lookahead will succeed, so the replace will be done for the j. All is good so far, and the second alternative (\b\w\.\K\w) was not used.
\b will match after the n, but \w will fail on the . and all is still good.
\b will match before the d, and \w will match the d, and the lookahead will succeed, so the replace will be done for the d. All is good, but note that this was done using the \b\w part of the expression. The second alternative still was not used.
\b will match after the e, but \w will not match the @. All is good.
\b will match before the c, and if the lookahead fails as you intend the matching is done for that line.
etc.
The second alternative (\b\w\.\K\w) was never used. When would it be needed?
But, in particular for your question, the real problem is in your lookahead. (?=.+@) is, in a sense, too greedy, because Publisher regular expression processing operates in multi-line mode by default, and with . matching newline characters, and therefore that lookahead will find an @ anywhere later in the text.
So, for the lookahead, you need (?=[^\n]+@) in order to accomplish what you want.
Or, you could use this: (?-s:(\b\w)(?=.+@)) which nests your regular expression inside the options string (?-s: ... ) which has the effect of turning off the ability of . to match a newline within the nested expression.
-- Walt Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases PC: Desktop: Windows 11 Pro 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090 Laptop: Windows 11 Pro 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU. Laptop 2: Windows 11 Pro 24H2, 16GB memory, Snapdragon(R) X Elite - X1E80100 - Qualcomm(R) Oryon(TM) 12 Core CPU 4.01 GHz, Qualcomm(R) Adreno(TM) X1-85 GPU iPad: iPad Pro M1, 12.9": iPadOS 18.2.1, Apple Pencil 2, Magic Keyboard Mac: 2023 M2 MacBook Air 15", 16GB memory, macOS Sequoia 15.0.1
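The lookahead problem Walt describes can be reproduced outside Publisher. The sketch below uses Python's `re` module, which is an assumption made for illustration only: Publisher reportedly uses a different engine, and Python has no `\u` replacement escape, so a small callback (`upper_first`, a made-up name) does the uppercasing. `re.DOTALL` stands in for the "dot matches newline" behavior described above.

```python
import re

TEXT = "\n".join([
    "john.doe@company.com",
    "george.dufus@company.com",
    "ringo@company.com",
    "paul.henry.whoever@company.com",
])


def upper_first(match):
    # Stand-in for the \u$1 replacement: uppercase the matched character.
    return match.group(0).upper()


# Dot matches newline: the lookahead can reach an @ on any LATER line,
# so domain parts on every line but the last are capitalized as well.
dot_matches_newline = re.sub(r"\b\w(?=.+@)", upper_first, TEXT, flags=re.DOTALL)

# Dot stops at the end of the line (Python's default): only name parts match.
dot_stops_at_newline = re.sub(r"\b\w(?=.+@)", upper_first, TEXT)

# The [^\n]+ variant gives the per-line result even when dot matches newline.
explicit_class = re.sub(r"\b\w(?=[^\n]+@)", upper_first, TEXT, flags=re.DOTALL)

print(dot_matches_newline)
print()
print(explicit_class)
```

With dot matching newline, only the last line's domain is left alone, because no @ follows it anywhere in the text. That matches the symptom reported in the opening post.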
MikeW (Author) Posted January 31, 2020
25 minutes ago, walt.farrell said: ... Or, you could use this: (?-s:(\b\w)(?=.+@)) which nests your regular expression inside the options string (?-s: ... ) which has the effect of turning off the ability of . to match a newline within the nested expression.
Did you try your expression?
MikeW (Author) Posted January 31, 2020
52 minutes ago, walt.farrell said: I don't understand why your regular expression is so complex...
I forgot to address this. This was originally for use in an ID grep style, and the complexity simply grew to handle all the variants.
I began with: (\b\w)(?=.+@), which does work for the simplistic samples in this thread.
I then altered it to catch more as the above didn't. It became: (\b\w|\b\w\.\w)(?=.+@)
It eventually became the one in the opening post. There were 200 or so email addresses already in the ID file and the one I used was actually needed as part of the grep style (and is a valid regex) so when new entries were added either via typing new ones, correcting existing ones, or importing text, ID would auto-correct them.
walt.farrell Posted January 31, 2020
29 minutes ago, MikeW said: Did you try your expression?
Yes. And it works in both the beta and stable versions of Publisher. It could be simplified slightly, as there's an unneeded set of (), but it works either way: (?-s:\b\w(?=.+@))
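The scoped option group can be tried in Python as well, which accepts the same `(?-s: ... )` syntax (Python 3.6 or later). This is only a sketch under the assumption that Python's behavior is a fair analogue of Publisher's engine; `re.DOTALL` is set globally to imitate a dot that matches newlines, and callbacks stand in for the `\u` replacement. The thread does not say which replacement string was used with the simplified expression; since it has no group 1, it would presumably need to reference the whole match.

```python
import re

TEXT = "\n".join([
    "john.doe@company.com",
    "george.dufus@company.com",
    "ringo@company.com",
    "paul.henry.whoever@company.com",
])

# Both patterns switch the "s" (dot matches newline) option off inside the group.
with_group = re.compile(r"(?-s:(\b\w)(?=.+@))", re.DOTALL)
without_group = re.compile(r"(?-s:\b\w(?=.+@))", re.DOTALL)

# With the capturing group, uppercase group 1; without it, the whole match.
result_with_group = with_group.sub(lambda m: m.group(1).upper(), TEXT)
result_without_group = without_group.sub(lambda m: m.group(0).upper(), TEXT)

print(result_with_group)
print(result_with_group == result_without_group)
```

Both variants leave the domains untouched, which is consistent with the remark that the extra set of parentheses is not needed.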
walt.farrell Posted January 31, 2020
4 minutes ago, MikeW said: I forgot to address this. This was originally for use in an ID grep style, and the complexity simply grew to handle all the variants. I began with: (\b\w)(?=.+@), which does work for the simplistic samples in this thread. I then altered it to catch more as the above didn't. It became: (\b\w|\b\w\.\w)(?=.+@) It eventually became the one in the opening post. There were 200 or so email addresses already in the ID file and the one I used was actually needed as part of the grep style (and is a valid regex) so when new entries were added either via typing new ones, correcting existing ones, or importing text, ID would auto-correct them.
Thanks. I just don't understand why the added complexity of the regular expression would ever help. I guess I need to see one of the complex examples that the simpler version doesn't catch.
My real issue with the complex one is that as written, if the second alternate would ever match, the first one would match, too. So the second one should never be attempted. It would only be attempted if the first alternate failed. But in that case, it would fail, too, because it starts the same way as the first alternate.
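Walt's point about the alternation can be checked empirically. The sketch below compares the simple expression with the intermediate two-alternative one from the thread, using Python's `re` as a stand-in engine (an assumption; the `\K` version from the opening post is not supported by `re`, so it is left out). The helper name `match_spans` is made up for illustration.

```python
import re

TEXT = "\n".join([
    "john.doe@company.com",
    "george.dufus@company.com",
    "ringo@company.com",
    "paul.henry.whoever@company.com",
])

simple = re.compile(r"(\b\w)(?=.+@)")
with_alternative = re.compile(r"(\b\w|\b\w\.\w)(?=.+@)")


def match_spans(pattern, text):
    """Return the (start, end) span of every match of pattern in text."""
    return [m.span() for m in pattern.finditer(text)]


simple_spans = match_spans(simple, TEXT)
alternative_spans = match_spans(with_alternative, TEXT)

# If the second alternative ever contributed, some span would differ
# (it would be three characters wide instead of one).
print(simple_spans == alternative_spans)
print(all(end - start == 1 for start, end in alternative_spans))
```

On this sample the two expressions find exactly the same one-character matches. A backtracking engine may still try the second alternative when the lookahead fails after the first, but because both alternatives begin with \b\w it does not find anything new here.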
MikeW (Author) Posted January 31, 2020
7 minutes ago, walt.farrell said: Yes. And it works in both the beta and stable versions of Publisher. It could be simplified slightly, as there's an unneeded set of (), but it works either way: (?-s:\b\w(?=.+@))
Thanks, Walt. Doesn't work here even with restarting APub and trying afresh. I'm only trying the release version.
And Serif really needs to fix the . = newline thing (add a switch, default off). I had forgotten that, so thanks...
walt.farrell Posted January 31, 2020
4 minutes ago, MikeW said: And Serif really needs to fix the . = newline thing (add a switch, default off). I had forgotten that, so thanks...
You're welcome. And I agree that another switch for that in the Find options would be useful, and would greatly improve the usability.
Kind of weird that it's not working for you, though. You sure that you had Regular Expression ticked in the options? (That would be an odd thing to forget, but I've done that before myself.)
MikeW (Author) Posted January 31, 2020
17 minutes ago, walt.farrell said: You're welcome. And I agree that another switch for that in the Find options would be useful, and would greatly improve the usability. Kind of weird that it's not working for you, though. You sure that you had Regular Expression ticked in the options? (That would be an odd thing to forget, but I've done that before myself.)
OK. Your expression works in the beta, not in my release. I might try resetting the release. And yes, the regex option is ticked.
walt.farrell Posted January 31, 2020
Thanks, Mike. If it doesn't work in the release after resetting (saving stuff first, of course), you should be able to use the first version I showed, with [^\n]+ instead of .+, I think.
MikeW (Author) Posted January 31, 2020
15 minutes ago, walt.farrell said: Thanks, Mike. If it doesn't work in the release after resetting (saving stuff first, of course), you should be able to use the first version I showed, with [^\n]+ instead of .+, I think.
Thanks, Walt... I don't yet need to do more than "play" with APub. The only reason I tried was because of testing in various applications of what I did in ID. So it was more "play" than something needed for real use.
Have you looked at the scant regex info in help? That's the first thing I did when the original expression failed.
walt.farrell Posted January 31, 2020
18 minutes ago, MikeW said: Have you looked at the scant regex info in help? That's the first thing I did when the original expression failed.
I may have looked at it once, a long time ago. Mostly I've used my regex knowledge, some experimenting, and online resources including https://www.boost.org/doc/libs/1_72_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html which is the implementation Publisher uses, if I remember correctly and correctly interpreted some statements by Serif staff.
MikeW (Author) Posted January 31, 2020
Regular expressions
Regular expressions extend the capabilities and power of the Find and Replace function beyond searching for simple text strings. They are widely used across the word-processing and DTP community, with a multitude of expressions available. As a result, listing regular expressions and their syntax is beyond the scope of Affinity Publisher Help. Please use Internet resources to research and develop your own regular expressions.
Affinity Publisher supports Perl and ECMAScript (with perl extensions) expressions. Regular expressions use the "C" or "POSIX" locale, while Locale Aware Regular Expressions use the locale inferred from the text being searched and locale aware collation is implied.
*********************
Serif should include a hyperlink to an Internet site...and then comply with its use of expressions. But they could use some example expressions in Help, as well as explanations for what/why those examples work. It's not like they have to cut down trees to do so.
There are a billion websites. Some are great, some not so much. Some use JavaScript syntax unless you change what language is being used, etc.
walt.farrell Posted January 31, 2020
Good points, Mike. And you're right; that's pretty scant info.
MikeW (Author) Posted January 31, 2020
Walt, the below was copied out of RegexBuddy. Output like this would make it easy for Serif to add samples with explanations, and to show where to look when making one's own.
(\b\w)(?=.+@)
Options: Case insensitive; Exact spacing; Dot doesn’t match line breaks; ^$ match at line breaks; Numbered capture
* [Match the regex below and capture its match into backreference number 1][1] `(\b\w)`
   * [Assert position at a word boundary (position preceded or followed, but not both, by a Unicode letter, digit, or underscore)][2] `\b`
   * [Match a single character that is a “word character” (Unicode; any letter or ideograph, any mark, digit, letter number, connector punctuation)][3] `\w`
* [Assert that the regex below can be matched starting at this position (positive lookahead)][4] `(?=.+@)`
   * [Match any single character that is NOT a line break character (line feed)][5] `.+`
      * [Between one and unlimited times, as many times as possible, giving back as needed (greedy)][6] `+`
   * [Match the character “@” literally][7] `@`
\u$1
* [Convert the next character to uppercase][8] `\u`
* [Insert the text that was last matched by capturing group number 1][9] `$1`
Created with [RegexBuddy](https://www.regexbuddy.com/)
[1]: https://www.regular-expressions.info/modifiers.html
[2]: https://www.regular-expressions.info/wordboundaries.html
[3]: https://www.regular-expressions.info/shorthand.html
[4]: https://www.regular-expressions.info/lookaround.html
[5]: https://www.regular-expressions.info/dot.html
[6]: https://www.regular-expressions.info/repeat.html
[7]: https://www.regular-expressions.info/characters.html
[8]: https://www.regular-expressions.info/replacecase.html#perl
[9]: https://www.regular-expressions.info/replacebackref.html
walt.farrell Posted January 31, 2020
37 minutes ago, MikeW said: the below was copied out of RegexBuddy. Output like this would make it easy for Serif to add samples with explanations, and to show where to look when making one's own.
Unfortunately, copyright restrictions may make it impossible for Serif to copy explanations or examples from most online sources. All of the information on www.regular-expressions.info is copyrighted, for example.
However, if Serif is using the Boost libraries for their regular expression processing (as I think they are), they would be able to point to or use the Boost regular expression documentation at https://www.boost.org/doc/libs/1_72_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html because the license terms for that code and its documentation would permit that.
MikeW (Author) Posted January 31, 2020
I think an email to Jan (the owner of the site and maker of RegexBuddy) may well obtain permission to include explanations of a half dozen or so examples as they appear in RegexBuddy, with attribution. Especially as it "hypes" his site, his software, and his books (and the books by other authors he recommends, mainly revolving around the programming aspect in nearly every language).
AdamW (Staff) Posted February 6, 2020
Thanks for the discussion, we'll add an option for 'Dot matches Paragraph Break' in the next 1.8 beta.
MikeW (Author) Posted February 6, 2020
1 hour ago, AdamW said: Thanks for the discussion, we'll add an option for 'Dot matches Paragraph Break' in the next 1.8 beta.
Thanks, Adam.
I reread your post. Currently the dot in a regex does match paragraph breaks. What is needed is an option to not match paragraph breaks. Please consider either making the default not match a paragraph break, or making the choice persistent (sticky).
AdamW (Staff) Posted February 7, 2020
Hi Mike, Yes - defaulted to 'off' (and sticky). This will change existing default behaviour but I think it's preferable in the long run.