
Affinity Publisher - meaning of each special character indicator



Hello! I keep Text > Show special characters enabled. Most of it makes sense and is extremely useful, for example carriage returns, tabs, spaces etc.

There have been a couple, over time, which I have not recognised. I can't seem to find a reference as to what they all mean? For example, there is a character at the start of the following text frame which is showing as an up/down arrow (it's not a frame decoration!) and I have no clue what it is?

[Screenshot: text frame showing an up/down arrow indicator at the start of the text]


It seems to be a character used to indicate (to software) how to treat the text.

It also seems to be described as a zero-width non-breaking space. I'm not sure what value that has as it seems to be a contradiction by definition, haha.

What's really useful is that Affinity shows us it's there. In most cases that's enough. If we were to have some sort of lookup in the documentation they could use embedded images I guess.


Interestingly, though, a "zero-width non-breaking space" is not one of the spaces that Affinity allows you to insert. Edit: at least not by name via the menus. You can type a regular space, then use Alt+U to enable editing it to U+FEFF.

 

23 minutes ago, w_yne_t_ylor said:

It also seems to be described as a zero-width non-breaking space. I'm not sure what value that has as it seems to be a contradiction by definition, haha.

It separates two words (or two strings of any text, really) but does not allow them to split over lines.

I'm not sure it has much use, but there are other forms of non-breaking space, and other widths of breaking spaces, so someone must have a good use case for them :)

-- Walt
Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases
PC:
    Desktop:  Windows 11 Pro, version 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090 

    Laptop:  Windows 11 Pro, version 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU.
iPad:  iPad Pro M1, 12.9": iPadOS 17.4.1, Apple Pencil 2, Magic Keyboard 
Mac:  2023 M2 MacBook Air 15", 16GB memory, macOS Sonoma 14.4.1


3 hours ago, w_yne_t_ylor said:

There have been a couple, over time, which I have not recognised. I can't seem to find a reference as to what they all mean? For example, there is a character at the start of the following text frame which is showing as an up/down arrow (it's not a frame decoration!) and I have no clue what it is?

Based on its location at the beginning of the text, and assuming you did not enter it, it may be a BOM (byte order mark).
My guess is you placed this text from another source, not entered directly.

BOM has the same code as the old zero width non-breaking space (FEFF).
In some applications a BOM character is placed at the beginning of text to signal certain things.
So because it is at the beginning of your text my guess is you are bringing it in with the text.
Where is the text coming from?

Note: using U+FEFF as a zero-width non-breaking space is deprecated in Unicode; the word joiner (U+2060) is now preferred for that purpose.
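
If you want to check this outside Publisher, here is a minimal Python sketch that looks up both code points in the Unicode database and drops a leading U+FEFF from imported text. It is only an illustration and says nothing about how Affinity handles the character internally; the helper name `strip_leading_bom` is made up for this example.

```python
import unicodedata

ZWNBSP = "\ufeff"       # ZERO WIDTH NO-BREAK SPACE, also used as the BOM
WORD_JOINER = "\u2060"  # the preferred replacement inside running text

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def strip_leading_bom(text: str) -> str:
    """Hypothetical helper: drop one U+FEFF if it is the first character."""
    return text[1:] if text.startswith(ZWNBSP) else text

print(describe(ZWNBSP))
print(describe(WORD_JOINER))
print(repr(strip_leading_bom("\ufeffImported text")))
```

Only the first character is removed, so a U+FEFF that was deliberately placed in the middle of the text is left alone.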


13 minutes ago, LibreTraining said:

In some applications a BOM character is placed at the beginning of text to signal certain things.

Yes, this is what I thought. Like some form of inline meta data.

13 minutes ago, LibreTraining said:

Where is the text coming from?

I inherited this (large) file as an IDML exported from, *cough*, InDesign. So I can't really say. But you assume correctly that it is not original, typed, content.

37 minutes ago, walt.farrell said:

so someone must have a good use case for them

Absolutely!


42 minutes ago, LibreTraining said:

Based on its location at the beginning of the text, and assuming you did not enter it, it may be a BOM (byte order mark).
My guess is you placed this text from another source, not entered directly.

BOM has the same code as the old zero width non-breaking space (FEFF).
In some applications a BOM character is placed at the beginning of text to signal certain things.

Thanks. That makes sense.

One common use for the BOM, for those who may not know, is to mark a file as being UTF-8 rather than Latin-1 (ISO-8859-1).

-- Walt

On 4/15/2021 at 9:52 AM, walt.farrell said:

One common use for the BOM, for those who may not know, is to mark a file as being UTF-8 rather than Latin-1 (ISO-8859-1).

Well, it's really useful to distinguish UTF-16/UCS-2, which are streams of 16-bit/2-byte code units, from streams of 8-bit/1-byte code units such as UTF-8 and ASCII or any Windows code page.  Additionally, it tells the reading software what the byte order of the source material is.  If the first two bytes, read individually, are 0xFF 0xFE, the data is littleendian; if they are 0xFE 0xFF, it is bigendian.  When that order differs from the reader's own native order, the software knows it has to byte-swap every 16-bit/2-byte word to get the proper Unicode encoding.  When it matches, the software doesn't have to do that.

If you have only used Intel-compatible processors, this is probably not familiar to you.  Look up "bigendian" and "littleendian" and ignore references to Jonathan Swift and Gulliver's Travels.  Software that's producing UTF-16/UCS-2 output could run on either a bigendian or a littleendian platform, and will naturally want to write 16-bit/2-byte quantities in the native byte ordering for that platform.  To allow the files to be read correctly on either type of platform, the writing software will start the file with a 2-byte BOM.

If the software is producing UTF-8 output, littleendian or bigendian doesn't matter.  And there's not really any need for the BOM.  Character encoding is usually identified through some other means for single-byte streams.
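
To make the byte patterns concrete, here is a rough Python sketch of BOM sniffing using the constants in the standard `codecs` module. The function names `sniff_bom` and `decode_with_bom` are invented for this example, and the UTF-32 BOMs are deliberately left out to keep it short, so treat it as an illustration rather than a complete detector.

```python
import codecs

# UTF-32 BOMs are omitted here; a fuller detector would check them first.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]

def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

def decode_with_bom(data: bytes) -> str:
    """Decode using the byte order announced by the BOM, skipping the BOM."""
    encoding, skip = sniff_bom(data)
    if encoding is None:
        raise ValueError("no BOM; encoding must be identified some other way")
    return data[skip:].decode(encoding)

big = codecs.BOM_UTF16_BE + "Hi".encode("utf-16-be")
little = codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")
print(big.hex(" "))     # fe ff 00 48 00 69
print(little.hex(" "))  # ff fe 48 00 69 00
print(decode_with_bom(big), decode_with_bom(little))
```

The two byte dumps contain the same text; only the order of the bytes within each 16-bit unit differs, which is exactly what the BOM lets the reader work out.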


2 minutes ago, sfriedberg said:

If the software is producing UTF-8 output, littleendian or bigendian doesn't matter.  And there's not really any need for the BOM.  Character encoding is usually identified through some other means for single-byte streams.

Without the BOM, a program that can read/understand both Latin-1 and UTF-8 sometimes has to guess what encoding the input file uses. And sometimes the guess will be wrong, and you'll end up with bad characters in the file.

On the other hand, a program that is not expecting a BOM character will end up with a garbage character (or two) at the start of the file.
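
Both failure modes are easy to reproduce. The Python sketch below decodes the same BOM-prefixed UTF-8 bytes three ways; the sample word is arbitrary, and the codec names are Python's own (`utf-8-sig` is the one that expects and drops a BOM).

```python
import codecs

data = codecs.BOM_UTF8 + "café".encode("utf-8")

as_latin1 = data.decode("latin-1")      # reader wrongly assumes Latin-1
as_utf8 = data.decode("utf-8")          # UTF-8 reader that keeps the BOM as a character
as_utf8_sig = data.decode("utf-8-sig")  # UTF-8 reader that expects and drops a BOM

print(repr(as_latin1))    # 'ï»¿cafÃ©'
print(repr(as_utf8))      # '\ufeffcafé'
print(repr(as_utf8_sig))  # 'café'
```

In the Latin-1 case the three BOM bytes show up as three stray characters and the accented letter is mangled as well; in the plain UTF-8 case the text is fine apart from the invisible U+FEFF left at the start.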

-- Walt

@walt.farrell I guess my point is that the BOM is not adequate to identify the encoding for single-byte streams.  Latin-1 is not the only alternative to UTF-8.  The real value of the Byte Order Mark is to distinguish bigendian from littleendian byte order, which is a critical issue in UTF-16/UCS-2 encoding.  That is its primary purpose.  Use of BOM in UTF-8 streams is optional and allowed (it is a Unicode code point, after all), but it has only heuristic value in identifying a single-byte stream as UTF-8.
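
A small Python sketch of what "heuristic value" means in practice. The function `guess_encoding` and its Latin-1 fallback are assumptions made for this example, not how any particular application does it.

```python
import codecs

def guess_encoding(data: bytes, fallback: str = "latin-1") -> str:
    """Hypothetical helper: a best-effort guess, not a reliable detector."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"    # strong hint, but still only a hint
    try:
        data.decode("utf-8")  # strict decode: fails on invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        return fallback       # could equally be cp1252, cp1250, ...

print(guess_encoding(codecs.BOM_UTF8 + b"abc"))   # utf-8-sig
print(guess_encoding("café".encode("utf-8")))     # utf-8
print(guess_encoding("café".encode("latin-1")))   # latin-1
print(guess_encoding("café".encode("cp1252")))    # latin-1
```

The last two calls receive identical bytes, so nothing in the data can separate Latin-1 from a Windows code page; that choice has to come from outside the stream.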
