
Posted

I'm trying to do a regex search to find all special characters, such as ligatures and accented characters. I haven't found a great way to do this, but I've been playing with it and came up with:

[^a-zA-Z0-9‘’”“;.,-—–() !\n]

The idea is to exclude alphanumeric characters, punctuation, spaces, and the newline character. This works up to a point, but I still get almost 7,000 matches. Some of these are useful, such as finding straight (non-curly) quotes. However, many of the matches are footnotes and, I think, index references. Is there a way to exclude Affinity-specific references like those?
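One thing worth checking in the class above: inside a bracket expression, an unescaped hyphen between two characters defines a range, so ",-—" means every code point from U+002C (comma) to U+2014 (em dash). That range covers ASCII letters and digits plus all of Latin-1, Latin Extended, Greek, Cyrillic and more, so accented letters and ligatures such as æ and œ are excluded rather than found. Escaping the hyphen as \- (or moving it to the end of the class) makes it literal. The sketch below uses Python's re module only as a stand-in for Affinity's engine, assuming Boost's Perl-style syntax parses the range the same way; the names specials, ORIGINAL, ESCAPED and the sample string are made up for illustration.

```python
import re

# The class from the post, verbatim. ",-—" is parsed as the range U+002C..U+2014,
# so it silently excludes æ, ï, œ and most other accented letters.
ORIGINAL = r"[^a-zA-Z0-9‘’”“;.,-—–() !\n]"

# Same class with the hyphen escaped: ",", "-" and "—" are now three literal characters.
ESCAPED = r"[^a-zA-Z0-9‘’”“;.,\-—–() !\n]"


def specials(pattern: str, text: str) -> list[str]:
    """Return every character in text that the negated class matches."""
    return re.findall(pattern, text)


sample = "Encyclopædia, naïve œuvre & co…"
print(specials(ORIGINAL, sample))  # ['&', '…']
print(specials(ESCAPED, sample))   # ['æ', 'ï', 'œ', '&', '…']
```

With the unescaped hyphen only characters outside U+002C..U+2014 survive, such as the ampersand and the ellipsis; with it escaped, the ligatures and accented letters show up as intended.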

Posted
On 9/2/2024 at 6:06 PM, MikeTO said:

To find all non-ASCII characters, use:

[\x7f-\xff]

You can pair this with a specific text style to eliminate more matches.

Thank you, that works pretty well.

There is one odd thing happening, however. I have some ligature characters in the text, including æ and œ. I looked those up and they both fall within the 7F-FF range (æ is E6 and œ is 9C), but the search doesn't seem to find œ, only æ. If I search for them directly, I find 15 instances each of æ and œ, but only the æ matches show up in the \x7f-\xff search. I assume this is actually a bug. I've attached a document that illustrates the problem. Presumably it's not just œ that isn't showing up, but I haven't tested anything else.

ligature search bug.afpub

Posted

So I've been doing a bit of testing, and it seems the range 128-159 (hex 80-9F) is not found by Affinity. That is presumably because that range is not defined in ISO-8859-1, while Windows-1252 does define it (including œ). Apparently œ was later added in ISO-8859-15. That's the only explanation that makes sense to me. I guess I can just add those characters to the search, but part of what I'm trying to do is find characters I'm not expecting.
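A possible explanation, assuming Affinity searches Unicode text (as most current layout software does): 0x9C is where œ sits in the Windows-1252 code page, but in Unicode œ is U+0153, and U+0080 to U+009F are C1 control codes rather than printable characters. So [\x7f-\xff] never reaches œ, regardless of which legacy code page defines 0x80-0x9F. The quick check below uses Python's re as a convenient Unicode-aware engine, not as Affinity's; LATIN1_RANGE and describe are illustrative names.

```python
import re

# The range suggested earlier in the thread, applied to single Unicode characters.
LATIN1_RANGE = re.compile(r"[\x7f-\xff]")


def describe(ch: str) -> tuple[str, str, bool]:
    """Return (Unicode code point, Windows-1252 byte, matched by the 7F-FF range)."""
    code_point = f"U+{ord(ch):04X}"                  # what a Unicode regex engine compares
    cp1252_byte = f"0x{ch.encode('cp1252')[0]:02X}"  # where the character sits in Windows-1252
    matched = LATIN1_RANGE.fullmatch(ch) is not None
    return code_point, cp1252_byte, matched


for ch in "æœ":
    print(ch, describe(ch))
# æ ('U+00E6', '0xE6', True)
# œ ('U+0153', '0x9C', False)
```

For æ the code-page byte and the Unicode code point happen to coincide, which is why it is found; for œ they differ.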

Posted

[\x{0080}-\x{ffff}] will find both æ and œ.
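That range stops at U+FFFF. Whether this matters depends on how the engine sees text: an engine that works on whole code points (Python's re, for example) will not match characters above U+FFFF, such as emoji, with it, while one that works on UTF-16 code units would see their surrogate halves, which do fall inside the range. Since it isn't clear how Affinity configures its engine, a negated ASCII class is a more direct way to say "anything that isn't ASCII". The sketch is in Python; whether Affinity accepts the equivalent [^\x{0000}-\x{007f}] is untested here, and BMP_ONLY and NON_ASCII are illustrative names.

```python
import re

BMP_ONLY = re.compile(r"[\x80-\uffff]")  # Python spelling of [\x{0080}-\x{ffff}]
NON_ASCII = re.compile(r"[^\x00-\x7f]")  # anything outside ASCII, including U+10000 and up

sample = "Œuvre, œ, café, \U0001F600"  # the last character is an emoji above U+FFFF
print(BMP_ONLY.findall(sample))   # ['Œ', 'œ', 'é']
print(NON_ASCII.findall(sample))  # ['Œ', 'œ', 'é', '😀']
```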

The regex engine in APub makes me fairly happy (it does not yet do Unicode scripts or Unicode blocks, but the other \p{…} features seem to be there); the regrettable part is that there is no Unicode class that would let you select letters minus ASCII letters. Also, the && syntax does not work as I had hoped:

[[:graph:]&&[^a-zA-Z0-9‘’”“;.,-—–()!]] should find your ligatures, not to mention the precomposed code point á or its "a" + combining accent version. Regrettably, the engine seems to dislike "&&" in a bracket expression, which also dooms attempts like [\p{Letter}&&[\x{007f}-\x{ffff}]] and [[:graph:]&&[:^ascii:]]. Set subtraction syntax also fails: [[:graph:]-[:ascii:]]
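Class intersection with && is Java and ICU regex syntax; Boost's Perl-style syntax does not appear to support it, which fits what you saw. A negative lookahead can often stand in for intersection: first reject the unwanted set, then match the wanted class. The sketch below uses Python's re, which also lacks && and \p{…}, so [^\W\d_] (word characters minus digits and underscore, roughly "letters") stands in for \p{Letter}. A decomposed accent is a separate combining mark rather than a letter, so it needs its own class; U+0300 to U+036F is the Combining Diacritical Marks block. An untested translation into the syntax reported to work in APub would be something like (?![\x{0000}-\x{007f}])\p{Letter}; NON_ASCII_LETTER and COMBINING_MARK are illustrative names.

```python
import re

# Emulates "letter && not ASCII": the lookahead rejects ASCII, then [^\W\d_]
# accepts any remaining word character that is not a digit or underscore.
NON_ASCII_LETTER = re.compile(r"(?![\x00-\x7f])[^\W\d_]")

# Combining marks such as U+0301 are not word characters, so match them separately.
COMBINING_MARK = re.compile(r"[\u0300-\u036f]")

# "caf\u00e9" is precomposed é; "cafe\u0301" is "e" followed by a combining acute accent.
sample = "Encyclopædia œuvre caf\u00e9 cafe\u0301 na\u00efve 123 _x"
print(NON_ASCII_LETTER.findall(sample))  # ['æ', 'œ', 'é', 'ï']
print(COMBINING_MARK.findall(sample))    # ['\u0301'] (shown by Python as '́')
```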

O Serif: It would be nice to know which regex engine you use. Thanks!

Posted
5 minutes ago, Felix Kasza said:

It would be nice to know which regex engine you use.

Boost.

But I don't think they've ever documented exactly which level of the engine, nor exactly which initialization parameters they pass into it (since it can operate in several different modes).

-- Walt