Regex search for special characters, and finding footnotes


I'm trying to do a regex search to find all special characters, such as ligatures and accented characters. I haven't found a great way to do this but I've been playing with it and came up with:

[^a-zA-Z0-9‘’”“;.,-—–() !\n]

The idea is to exclude alphanumeric characters, punctuation, spaces, and the newline character. This works up to a point, but I still get almost 7000 matches. Some of these are useful, such as straight (non-curly) quotes. However, many of the matches are footnotes and, I think, index references. Is there a way to exclude Affinity-specific references like those?
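One thing worth checking in the pattern above: inside a bracket expression, ",-—" is normally read as a range running from the comma (U+002C) to the em dash (U+2014), and that range covers most accented letters and ligatures. The sketch below uses Python's re module purely as an illustration; Affinity's engine may treat the pattern differently, and the sample string is made up.

```python
import re

# The pattern as posted: ",-—" is parsed as a range from U+002C to U+2014,
# so the negated class excludes everything in that range, including æ and œ.
posted = re.compile(r"[^a-zA-Z0-9‘’”“;.,-—–() !\n]")

# Same set with the hyphen moved to the end, where it is a literal hyphen.
fixed = re.compile(r"[^a-zA-Z0-9‘’”“;.,—–() !\n-]")

# Hypothetical sample text with two ligatures and two straight quotes.
sample = 'æ and œ, "straight" quotes'

posted_hits = posted.findall(sample)
fixed_hits = fixed.findall(sample)

print(posted_hits)  # only the straight quotes
print(fixed_hits)   # the ligatures as well as the straight quotes
```

If the engine in use behaves the same way, putting the hyphen last (or escaping it) should let the ligatures and accented characters through.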

On 9/2/2024 at 6:06 PM, MikeTO said:

To find all non-ASCII characters, use:

[\x7f-\xff]

You can pair this with a specific text style to eliminate more matches.

Thank you, that works pretty well.

There is one odd thing happening, however. I have some ligature characters in the text, including æ and œ. I looked those up and they both fall within the 7F-FF range (æ is E6 and œ is 9C), but the search only finds æ, not œ. If I search for each character directly, I find 15 instances each of æ and œ. Only the æ matches show up in the [\x7f-\xff] search. I assume this is actually a bug. I've attached a document which illustrates the problem. Presumably it's not just œ that isn't showing up, but I haven't tested anything else.

ligature search bug.afpub

So I've been doing a bit of testing, and it seems the range 128-159 (hex 80-9F) is not found by Affinity. That is presumably because that range is not defined in ISO-8859-1, while Windows-1252 does define it (and puts œ there). Apparently œ was added to ISO-8859-15. That's the only thing that makes sense to me. I guess I can just add those characters to the search, but part of what I'm trying to do is find characters I'm not expecting.
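A possible reading of this, offered as an assumption rather than a confirmed description of Affinity's behaviour: a Unicode-aware engine compares code points, not Windows-1252 byte values. The 9C value for œ is its Windows-1252 byte; its Unicode code point is U+0153, which lies outside \x7f-\xff. A short Python sketch of the difference:

```python
import re

# Unicode code points, which a Unicode-aware range such as [\x7f-\xff] compares.
print(hex(ord("æ")))  # 0xe6, inside 7F-FF
print(hex(ord("œ")))  # 0x153, outside 7F-FF

# Byte values of the same characters in legacy 8-bit encodings.
oe_cp1252 = "œ".encode("cp1252")        # b'\x9c' in Windows-1252
oe_latin9 = "œ".encode("iso8859-15")    # b'\xbd' in ISO-8859-15
ae_latin1 = "æ".encode("latin-1")       # b'\xe6' in ISO-8859-1
print(oe_cp1252, oe_latin9, ae_latin1)

# The Latin-1 range therefore finds æ but not œ.
latin1_hits = re.findall(r"[\x7f-\xff]", "æ œ")
print(latin1_hits)  # ['æ']
```

In Unicode, code points 80-9F are control characters, so ordinary text would not be expected to contain anything in that range.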

[\x{0080}-\x{ffff}] will find both æ and œ.
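For checking exported text outside Affinity, the same range can be written in Python as [\u0080-\uffff] (Python's re does not accept the \x{...} form). The sketch below tallies each non-ASCII character with its code point and Unicode name, which may help with spotting characters you were not expecting. The function name report_non_ascii and the sample text are made up for illustration.

```python
import re
import unicodedata
from collections import Counter

# Python spelling of [\x{0080}-\x{ffff}]: every BMP character above ASCII.
NON_ASCII = re.compile(r"[\u0080-\uffff]")

def report_non_ascii(text):
    """Return (char, code point, Unicode name, count) for each non-ASCII character."""
    counts = Counter(NON_ASCII.findall(text))
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"), n)
        for ch, n in sorted(counts.items())
    ]

# Hypothetical sample text.
report = report_non_ascii("Cæsar and œuvre, æon")
for row in report:
    print(row)
```

Note that this range stops at U+FFFF, so characters outside the Basic Multilingual Plane are not covered by it.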

The regex engine in APub makes me fairly happy (it does not yet do Unicode scripts or Unicode blocks, but the other \p{…} stuff seems to be there); the regrettable part is that there is no Unicode class that would let you select letters minus ASCII letters. Also, the && syntax does not work as I had hoped:

[[:graph:]&&[^a-zA-Z0-9‘’”“;.,-—–()!]] should find your ligatures, not to mention the precomposed code point á or its "a" + combining accent equivalent. Regrettably, the engine seems to dislike "&&" in a bracket expression, which also dooms attempts like [\p{Letter}&&[\x{007f}-\x{ffff}]] and [[:graph:]&&[:^ascii:]]. Set subtraction syntax also fails: [[:graph:]-[:ascii:]]
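Where an engine lacks set intersection or subtraction, a negative lookahead in front of a broader class is a common way to approximate it. The sketch below shows the idea in Python's re, using \S (any non-whitespace character) as a rough stand-in for [:graph:]; whether Affinity's engine accepts the same construct has not been tested here.

```python
import re

# (?![\x00-\x7f]) rejects any ASCII character at the current position;
# \S then accepts one non-whitespace character. Together this approximates
# "visible characters minus ASCII".
NON_ASCII_VISIBLE = re.compile(r"(?![\x00-\x7f])\S")

# Precomposed á, then "a" + combining acute accent, then œ, then plain ASCII.
sample = "\u00e1 a\u0301 œ x"

hits = NON_ASCII_VISIBLE.findall(sample)
print([f"U+{ord(ch):04X}" for ch in hits])
```

For the decomposed form, only the combining accent itself is non-ASCII, so the match lands on the accent rather than on the base letter "a".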

O Serif: It would be nice to know which regex engine you use. Thanks!

5 minutes ago, Felix Kasza said:

It would be nice to know which regex engine you use.

Boost.

But I don't think they've ever documented exactly which level of the engine, nor exactly which initialization parameters they pass into it (since it can operate in several different modes).

-- Walt
Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases
PC:
    Desktop:  Windows 11 Pro 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090 

    Laptop:  Windows 11 Pro 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU.
    Laptop 2: Windows 11 Pro 24H2,  16GB memory, Snapdragon(R) X Elite - X1E80100 - Qualcomm(R) Oryon(TM) 12 Core CPU 4.01 GHz, Qualcomm(R) Adreno(TM) X1-85 GPU
iPad:  iPad Pro M1, 12.9": iPadOS 18.1, Apple Pencil 2, Magic Keyboard 
Mac:  2023 M2 MacBook Air 15", 16GB memory, macOS Sequoia 15.0.1
