
Find upper case / lower case letters using Regex



Hi, I'm having trouble matching lower/upper case letters using regular expressions in Publisher. I have tried several syntaxes. "[a-z]" and "[A-Z]" each find letters of both cases, and "[[:lower:]]" and "[[:upper:]]" don't work either. Is this a bug, or am I doing something wrong? Thanks for any reply.

Affinity Suite 2.3.1 | iMac 5K (2017) 24GB, macOS Monterey 12.6.9

Link to comment
Share on other sites

A regular expression like "[A-Za-z]" matches any single uppercase or lowercase letter. The regular expression "[A-Z][a-z]*" matches any sequence of letters that starts with an uppercase letter and is followed by zero or more lowercase letters. The special character * after the closing square bracket means zero or more occurrences of that character set. The regular expression "B[IAU]G" matches the strings "BIG", "BAG", and "BUG", but does not match the string "BOG". AFAIK Affinity now follows the Perl style for regular expressions, so look up the definitions for that flavor.
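For anyone who wants to try these patterns outside Publisher, here is a minimal sketch using Python's built-in re module. This is not Publisher's engine, so treat it only as an illustration of the syntax; the sample strings and variable names are made up.

```python
import re

# Any single ASCII letter, upper or lower case.
single_letter = re.compile(r"[A-Za-z]")

# One uppercase letter followed by zero or more lowercase letters.
capitalized = re.compile(r"[A-Z][a-z]*")

# "B", then exactly one of I, A, or U, then "G".
b_g = re.compile(r"B[IAU]G")

letters = single_letter.findall("a1B2")
words = capitalized.findall("Find Upper case Letters")
bxg = [w for w in ["BIG", "BAG", "BUG", "BOG"] if b_g.fullmatch(w)]

print(letters)  # the digits are skipped
print(words)    # "case" has no leading capital, so it is skipped
print(bxg)      # "BOG" is rejected
```

Note that these patterns are matched case-sensitively here, which is Python's default.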

☛ Affinity Designer 1.10.8 ◆ Affinity Photo 1.10.8 ◆ Affinity Publisher 1.10.8 ◆ OSX El Capitan
☛ Affinity V2.3 apps ◆ MacOS Sonoma 14.2 ◆ iPad OS 17.2

Link to comment
Share on other sites

Hi @v_kyr, thanks for the info. I know how to use the * sign, but my problem is not matching a string; I want to catch only lower case or only upper case letters. Actually, Publisher catches both. See my screenshots. The expression [a-z] also doesn't catch letters with diacritics. I'm looking for something like InDesign's "\l" (any lower case letter) or "\u" (any upper case letter).

In my screenshot you can also see that Publisher has found some "empty" locations: lines without bold-highlighted finds.

Screenshot 2020-09-04 at 15.56.31.png

Screenshot 2020-09-04 at 15.56.54.png

Link to comment
Share on other sites

There is a Locale Aware version of the Regular Expression option. You might try that and see whether it selects the diacritic letters when your system is set to the language being used.

1256953267_ScreenShot2020-09-04at7_08_17AM.png.3fd0de97a73032e1e3090b99bb8e14fc.png

Mac Pro (Late 2013) Mac OS 12.7.4 
Affinity Designer 2.4.1 | Affinity Photo 2.4.1 | Affinity Publisher 2.4.1 | Beta versions as they appear.

I have never mastered color management, period, so I cannot help with that.

Link to comment
Share on other sites

The Match Case option is honored even for regular expressions, if I remember correctly. So make sure it's on if you're trying to match only one case.
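As a rough illustration of why this option matters, here is a small sketch in Python's re module. It is an assumption for demonstration only: Publisher uses a different engine, and the sample text is made up. Python's re.IGNORECASE flag plays the role of Match Case being off.

```python
import re

text = "Affinity Publisher"

# Default: case-sensitive, comparable to Match Case being switched on.
lower_only = re.findall(r"[a-z]", text)
upper_only = re.findall(r"[A-Z]", text)

# re.IGNORECASE: comparable to Match Case being switched off.
lower_ignorecase = re.findall(r"[a-z]", text, re.IGNORECASE)
upper_ignorecase = re.findall(r"[A-Z]", text, re.IGNORECASE)

print("".join(lower_only), "".join(upper_only))
print("".join(lower_ignorecase), "".join(upper_ignorecase))
```

With the flag set, "[a-z]" and "[A-Z]" return the same letters, which resembles what the original post describes.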

-- Walt
Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases
PC:
    Desktop:  Windows 11 Pro, version 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090 

    Laptop:  Windows 11 Pro, version 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU.
iPad:  iPad Pro M1, 12.9": iPadOS 17.4.1, Apple Pencil 2, Magic Keyboard 
Mac:  2023 M2 MacBook Air 15", 16GB memory, macOS Sonoma 14.4.1

Link to comment
Share on other sites

1 hour ago, Laganama said:

Hi, I'm having trouble catching l/u case letters using regex expressions in Publisher. I try to use several syntaxes. With "[a-z]", "[A-Z]" it finds all cases in both expressions, "[[:lower:]]" and "[[:upper:]]" doesn't work too. Is it some bug or do I make something wrong? Thanks for reply.

"[[:lower:]]" and "[[:upper:]]" should work without using the match case option--and as far as I can see, they do not. So I would call it a bug.

It does work with the match case option.

They do work in my text editor when set to use Perl type of regular expressions.

Link to comment
Share on other sites

17 minutes ago, Laganama said:

Hi @Old Bruce, no change even with Locale Aware. My system language is set to English. Do I need to switch to Slovak for this to work properly?

 

I am a monophonic anglophone, I know nothing of other languages but I think changing the language in the Text Style may work on its own.

31406023_ScreenShot2020-09-04at7_39_42AM.png.c7b55152dfbb102a217b4ddb76bcf3ff.png 

Link to comment
Share on other sites

34 minutes ago, walt.farrell said:

The Match Case option is honored even for regular expressions, if I remember correctly. So make sure it's on if you're trying to match only one case.

This helped a bit, but it still finds uppercase letters at some locations (line 1 or 7 of the F&R table in the screenshot).

13 minutes ago, Old Bruce said:

I am a monophonic anglophone, I know nothing of other languages but I think changing the language in the Text Style may work on its own.

31406023_ScreenShot2020-09-04at7_39_42AM.png.c7b55152dfbb102a217b4ddb76bcf3ff.png 

Yes, I have Slovak. Actually, it looks like only the "ž" isn't catched (see the screenshot).

Screenshot 2020-09-04 at 16.32.56.png

Link to comment
Share on other sites

31 minutes ago, Laganama said:

Thanks for your discovery!

Walt mentioned this first. However note that it is a setting that is sticky--i.e., persists until changed again. Sticky settings, like this one, shouldn't be necessary.

31 minutes ago, Laganama said:

Does anyone know the purpose of these "empty" findings?

No, I don't beyond APub showing context...which is also something I don't like.

Link to comment
Share on other sites

6 hours ago, MikeW said:

However note that it is a setting that is sticky--i.e., persists until changed again. Sticky settings, like this one, shouldn't be necessary.

Every program that I've ever used that supports regex searching has a sticky setting like that, either explicitly or implicitly.

The program will invoke the regular expression processor with the "match case" flag either set on, or set off. There's no real choice for the programmer; that option is either turned on explicitly, or turned off explicitly, or left to default to however the regular expression processor defaults it (either on or off). The only possible issue with a sticky option like that in Publisher, that I see, is that it's hidden and you might forget how you last left it set, unless you remember to look.

To avoid that possibility, you could start your regular expression with either:

  • (?i) to do a case insensitive match (that is, the pattern "a" will match either "a" or "A"). This is the same as having Publisher's Match Case option off.
  • (?-i) to do a case sensitive match (that is, the pattern "a" will not match "A").
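
A hedged sketch of the same idea in Python's re module, not in Publisher itself. One difference to be aware of: Python accepts (?i) at the start of a pattern, but it only accepts the switch-off form with a scope, written (?-i:...), so the example below uses that spelling rather than a bare (?-i). The variable names are made up.

```python
import re

# (?i) at the start of the pattern switches case-insensitive matching on.
insensitive = re.search(r"(?i)a", "A")

# (?-i:...) switches it off for the enclosed part, even though the caller
# passed re.IGNORECASE (comparable to Match Case being off in the UI).
sensitive_miss = re.search(r"(?-i:a)", "A", re.IGNORECASE)
sensitive_hit = re.search(r"(?-i:a)", "a", re.IGNORECASE)

print(insensitive is not None)     # "a" matched "A"
print(sensitive_miss is None)      # "a" did not match "A"
print(sensitive_hit is not None)   # "a" still matches "a"
```

The point is the same as above: stating the case behaviour inside the pattern makes the result independent of whatever the surrounding option happens to be.
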

10 hours ago, MikeW said:

"[[:lower:]]" and "[[:upper:]]" should work without using the match case option--and as far as I can see, they do not. So I would call it a bug.

I'm not sure why they should work as you expect if you've told the regular expression processor to do a case insensitive match. In that case, "a" will match "A", so perhaps \l (or [[:lower:]]) should match either "a" or "A", too.

But really, I'm unsure how that should work. Notepad++ also considers "\l" to match "A" if you tell it to do a case insensitive search, just as the regex processor that Publisher uses (Boost) apparently does. On the other hand, the Python RegEx processor operates as you expect, Mike. So we have two different behaviors across 2 or 3 implementations. (I'm not sure what RegEx processor Notepad++ uses.)

-- Walt

Link to comment
Share on other sites

[[:lower:]] and [[:upper:]] are explicitly for any and all lower/upper case respectively. There should not be any flag necessary.

Boost supports [[:lower:]]/[[:upper:]]

There are certainly instances where the case switch has its uses. I'm not arguing about its presence.

Link to comment
Share on other sites

7 hours ago, MikeW said:

[[:lower:]] and [[:upper:]] are explicitly for any and all lower/upper case respectively. There should not be any flag necessary.

And there isn't any flag necessary, but Publisher has one (as do other programs that support regex processing) and so you have to be aware of how it is set.

7 hours ago, MikeW said:

Boost supports [[:lower:]]/[[:upper:]]

Yes, it does. And, apparently, when the "case insensitive" flag is turned on (or, Match Case is off in the Publisher Formatting options), [[:lower:]] will match upper- or lower-case characters, as will [[:upper:]].

So it's up to the user to be aware of the options he's chosen, or to specifically set them in the regex when it's important. If you simply begin all regular expressions with either (?i) or (?-i) then you won't depend on the setting of the Match Case option in Publisher.

As I mentioned, Boost seems to operate one way (as does Notepad++), Python another. I could probably test how Perl operates but I'm too lazy to do that right now, as I would have to relearn enough Perl to make that test.

-- Walt

Link to comment
Share on other sites

22 hours ago, Laganama said:

The expression [a-z] also doesn't catch diacritic letters.

You can test your expressions on regex101, though the results probably also depend on which regex implementation APub uses and how it is set up to operate.
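
If you want to check the diacritics point outside APub, here is a minimal sketch in Python using the standard re and unicodedata modules. The Slovak sample text is my own, and this says nothing about how APub's engine is configured. It shows that an ASCII range such as [a-z] skips accented letters, and one possible way to pick out lower and upper case letters regardless of diacritics.

```python
import re
import unicodedata

# Normalise to NFC so each accented letter is a single code point.
text = unicodedata.normalize("NFC", "Žltý kôň")

# ASCII range: misses every letter that carries a diacritic.
ascii_lower = re.findall(r"[a-z]", text)

# [^\W\d_] matches any Unicode letter in a str pattern; the str methods
# islower() and isupper() then split the letters by case.
letters = re.findall(r"[^\W\d_]", text)
unicode_lower = [ch for ch in letters if ch.islower()]
unicode_upper = [ch for ch in letters if ch.isupper()]

print(ascii_lower)
print(unicode_lower)
print(unicode_upper)
```

Filtering with islower()/isupper() is a workaround chosen here because Python's built-in re module has no \l, \u, or [[:lower:]] classes; other engines may offer those directly.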

regextest.png.94482aa71c84512da15ec931329e81d5.png


Link to comment
Share on other sites

2 hours ago, walt.farrell said:

And there isn't any flag necessary, but Publisher has one (as do other programs that support regex processing) and so you have to be aware of how it is set.

Yes, it does. And, apparently, when the "case insensitive" flag is turned on (or, Match Case is off in the Publisher Formatting options), [[:lower:]] will match upper- or lower-case characters, as will [[:upper:]].

So it's up to the user to be aware of the options he's chosen, or to specifically set them in the regex when it's important. If you simply begin all regular expressions with either (?i) or (?-i) then you won't depend on the setting of the Match Case option in Publisher.

As I mentioned, Boost seems to operate one way (as does Notepad++), Python another. I could probably test how Perl operates but I'm too lazy to do that right now, as I would have to relearn enough Perl to make that test.

Seems like we're going around in circles.

Yes, my text editor also has a match case switch. But it is unneeded (i.e., it doesn't need to be set to on) if I use a Perl case flag such as APub requires. My text editor also doesn't need the case sensitive flag when using an explicit case construction such as [[:lower:]] or [[:upper:]].

I don't believe any other option--especially a half-hidden option that is sticky--should be needed in APub. You seem to shrug your shoulders over APub's inability to comply with the Boost library. So we differ on our opinion in this regard. So I'll leave it at that.

Link to comment
Share on other sites

23 minutes ago, MikeW said:

My text editor also doesn't need the case sensitive flag when using an explicit case construction such as [[:lower:]] or [[:upper:]].

That simply means that the regex processor used in your editor doesn't care about the flag when processing [[:lower:]] or [[:upper:]]. But the regex processor used in Publisher does care.

 

24 minutes ago, MikeW said:

I don't believe any other option--especially a half-hidden option that is sticky--should be needed in APub. You seem to shrug your shoulders over APub's inability to comply with the Boost library.

I don't think it should be as hidden as it is.

But unless you've used the Boost library directly (I haven't), how do you know that Publisher is not complying with it? From all the evidence so far, it may be Boost that is completely responsible here. I seriously doubt that Publisher is examining the regex search string in any detail. It's (I believe) merely passing the "ignorecase" flag depending on the option the user has chosen, and it would be Boost that is choosing to honor it for processing those regex terms.
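
To make that guess concrete, here is a purely hypothetical sketch of such a host-side search, written with Python's re module rather than Boost. The function name find_all and the match_case parameter are invented for illustration; nothing here is taken from Publisher's or Boost's actual code.

```python
import re

def find_all(pattern, text, match_case):
    """Hypothetical host-side search: turn a UI checkbox into an engine flag."""
    # The host never inspects the pattern; it only decides which flag to pass.
    flags = 0 if match_case else re.IGNORECASE
    return re.findall(pattern, text, flags)

with_match_case = find_all(r"[a-z]", "aBc", match_case=True)
without_match_case = find_all(r"[a-z]", "aBc", match_case=False)

print(with_match_case)
print(without_match_case)
```

In this sketch, everything that happens after the flag is passed is decided by the regex engine, not by the calling program.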

There are many differences between regex engines, and perhaps this is simply another one.

-- Walt

Link to comment
Share on other sites

According to the Boost documentation, [[:lower:]]/[[:upper:]] are case classes that are "class names [that] are always supported by Boost.Regex."

The case sensitive flags are used for "literal strings" to allow, or disallow, case sensitivity.

Pretty much as simple as that, Walt. Now we could go on arguing what those statements mean and how they apply to APub...but that's mostly meaningless without one of the developers stepping into this discussion and explaining the rationale of their implementation of this/these features.

We seem to simply disagree on how case classes are implemented and APub's requirement to also add the case flag sticky setting to do what a case class is intended to do without it.

It's not a big deal in the larger scheme of things--unless one is coming from other applications that interpret/implement the Boost library differently. In the case of APub, there are X number of people coming from InDesign (and others) where such a flag (Match Case) isn't even present as the person doing/constructing the regex is expected to follow Perl/Boost regular expression syntax.

So when such a person, expecting Perl/Boost compliance, comes to APub, threads such as this will be written, wherein you (or others) can feel free to write the explanations of how to do it in APub. This may become more of an issue if/when scripting ever comes to Affinity applications.

OK. Now I'm really done flogging this dead horse.

Link to comment
Share on other sites

25 minutes ago, MikeW said:

According to the Boost documentation, [[:lower:]]/[[:upper:]] are case classes that are "class names [that] are always supported by Boost.Regex."

Yes, I saw that, but it avoids the question of whether the "ignorecase" flag applies to those constructs or not.  The "always supported" just distinguishes one set of character classes from another set that is only valid in Unicode mode.

We can be done with this. But I expect that a Serif developer would simply tell us that they invoke Boost with either the "ignorecase" flag or the "do not ignorecase" flag set, depending on what the user specified. And everything beyond that is up to Boost.

-- Walt

Link to comment
Share on other sites

Also, RegExBuddy, when configured to use Boost, acts the same as Publisher. If you tell it to do a case insensitive match, then [[:lower:]] matches both lower-case and upper-case letters:

image.png.a7d3fa0ccc2e6ef31834dea9b82e1997.png

(RegExBuddy, when configured to operate as Perl, also agrees with that case-insensitive matching behavior.)

-- Walt

Link to comment
Share on other sites

1 hour ago, walt.farrell said:

Also, RegExBuddy, when configured to use Boost, acts the same as Publisher. If you tell it to do a case insensitive match, the [[:lower:]] matches both lower-case or upper-case letters:

Testing applications need to account for all compilations/implementations of the chosen regex flavors they support. I think that RegExBuddy is simply doing that.

Try pressing Reset.

Capture_000705.png.28d675f1f2f2c25acf3ed29201f487eb.png

Link to comment
Share on other sites

Your screenshot shows a Case Sensitive match, Mike. Try it with Case Insensitive. That's what I believe we're discussing.

-- Walt

Link to comment
Share on other sites

3 minutes ago, walt.farrell said:

Your screenshot shows a Case Sensitive match, Mike. Try it with Case Insensitive. That's what I believe we're discussing.

Yes, it is currently on. Did you press the Reset button like I asked?

Why? No matter the state of RegexBuddy's case sensitivity flag of the regex type we are both displaying, if/when I press Reset, RegexBuddy turns Case sensitive on.

Link to comment
Share on other sites

On 9/4/2020 at 7:54 AM, Laganama said:

Yes, I have Slovak. Actualy it looks like only the "ž" isn't catched (see the screen).

Is this still happening? I am only guessing, but it may be that the glyph used is not actually encoded as a lower case letter in that font. Does it happen with other fonts? I copied and pasted your text, "only the "ž" isn't catched" (the past tense of catch is "caught"), and I could catch it; I also typed in the same letter and caught that too. This is what makes me think it might be a weird font issue as opposed to a regular expression problem.

On 9/4/2020 at 8:13 AM, Laganama said:

Does anyone know the purpose of these "empty" findings?

I seem to recall seeing something like that months back, probably a bug but I couldn't reproduce it consistently. Does it still show up after restarting Publisher?

Link to comment
Share on other sites

On 9/4/2020 at 4:32 PM, MikeW said:

"[[:lower:]]" and "[[:upper:]]" should work without using the match case option--and as far as I can see, they do not. So I would call it a bug.

It does work with the match case option.

As far as I understand, regex is case sensitive by default and needs an extra option set to work case INsensitively.

From this perspective Affinity works correctly, but its UI is confusing because it presents things the other way round, as if case sensitivity were the extra option: if the option is ticked, then case sensitivity is active. So, instead of an option called "Match Case", it could offer a genuinely extra option named "Don't match case", "Ignore case" or "Case insensitive".

The current option "Match Case" appears especially odd for [[:upper:]] and [[:lower:]], which explicitly ask for case sensitivity. Here it becomes more obvious that the additional option in Affinity should deactivate case sensitivity, not activate it.

macOS 10.14.6 | MacBookPro Retina 15" | Eizo 27" | Affinity V1

Link to comment
Share on other sites
