
Posted

Hi! I would like to discuss the breaking vs. non-breaking qualities of various space characters in Affinity Publisher 2.5.7. I am not going into script- or language-specific issues; rather I shall stick to the common whitespace characters that are shown on @MikeTO’s “Special characters” sheet. Today I had occasion to glance at the English version which lists nearly all space characters as “non-breaking” while the German version does not call that out.

All the space characters in the U+2000 through U+200A range, along with U+202F and a few oddments, have a “General_Category” of “Space_Separator” (“Zs”), as shown in the current Unicode Database. See, for instance, the General_Category summary, lines 3464–3474.
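For anyone who wants to check the category claim locally, here is a minimal sketch that assumes Python's standard `unicodedata` module. The exact list depends on the Unicode version bundled with the interpreter, and the variable name `space_separators` is just illustrative.

```python
import unicodedata

# Collect every code point in the Basic Multilingual Plane whose
# General_Category is "Zs" (Space_Separator).
space_separators = [
    cp for cp in range(0x10000)
    if unicodedata.category(chr(cp)) == "Zs"
]

# Print each one with its official character name.
for cp in space_separators:
    print(f"U+{cp:04X}  {unicodedata.name(chr(cp))}")
```

Note that General_Category says nothing about line breaking; that is a separate property defined in UAX #14.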

For U+2000–U+2006 and U+2008–U+200A specifically, Unicode Standard Annex #14 states:

Quote

All of these space characters have a specific width, but otherwise behave as breaking spaces. In setting a justified line, none of these spaces normally changes in width, except for THIN SPACE when used in mathematical notation.

U+2007, the FIGURE SPACE, is non-breaking as it is intended to replace aligned (table) figures. U+202F, NARROW NO-BREAK SPACE (NNBSP) is also non-breaking.
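As a rough illustration of the classification just described, here is a small Python lookup. The class abbreviations BA (Break After) and GL (Glue, i.e. non-breaking) come from UAX #14; the table itself is hand-transcribed from the statements above rather than read from the Unicode data files, and the names `UAX14_CLASS` and `is_breaking` are hypothetical.

```python
# Line-breaking classes for U+2000..U+200A and U+202F, as described above.
# "BA" = break opportunity after the character, "GL" = non-breaking glue.
UAX14_CLASS = {cp: "BA" for cp in range(0x2000, 0x200B)}
UAX14_CLASS[0x2007] = "GL"  # FIGURE SPACE
UAX14_CLASS[0x202F] = "GL"  # NARROW NO-BREAK SPACE


def is_breaking(ch: str) -> bool:
    """Return True if the space allows a line break after it.

    Raises KeyError for characters not covered by this small table.
    """
    return UAX14_CLASS[ord(ch)] == "BA"


print(is_breaking("\u2003"))  # EM SPACE
print(is_breaking("\u2007"))  # FIGURE SPACE
```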

In Publisher, only U+2000 and U+2001 are breaking spaces. In Word, these two are non-breaking, while U+2002 and U+2003 are breaking spaces. Both programs incorrectly treat U+2004–U+2006 and U+2008–U+200A as non-breaking. InDesign also treats most space characters incorrectly.

Serif: Please fix this. Otherwise, why have standards at all?

Thanks,
Felix

 

Posted

There is at least one other thread on this topic and we learned it's not a bug but by design. Unicode doesn't specify which spaces should be non-breaking so InDesign, Affinity, and Word can make their own choices and Affinity works similarly to InDesign. Unicode only designates a character as non-breaking if there is an identical character of the same width.

18 minutes ago, Felix Kasza said:

Today I had occasion to glance at the English version which lists nearly all space characters as “non-breaking” while the German version does not call that out.

I added "non-breaking" to all the relevant spaces in the English version sometime after I created the original quick reference but I didn't have space to add this to all of the translations. I will have to make changes to the design to fit that in but I haven't gotten to it yet.

Cheers

Posted
5 minutes ago, MikeTO said:

Unicode doesn't specify which spaces should be non-breaking

Um, it does and has been doing so for nearly 25 years.

Quote

A Unicode Standard Annex (UAX) forms an integral part of the Unicode Standard, carrying the same version number, but is published as a separate document. Note that conformance to a version of the Unicode Standard includes conformance to its Unicode Standard Annexes.

(https://www.unicode.org/reports/tr14/, “Status” section, emphasis in the original). It has been that way at least since Unicode 3.0; https://www.unicode.org/reports/tr14/tr14-7.html is dated Aug. 22, 2000.

By the way, I was _not_ whining about the German version of your reference card not calling the no-break attributes out but rather pointing out a bug that makes the special-width spaces unsuited for many purposes. The two spaces (among the ones under discussion) for which one expects no-break behaviour have it by default; the others require me to laboriously insert a ZWSP, and worse yet, to remember having to insert it.

Anyway, to you, once again thank you for your time and your knowledge and everything, and to Serif, a request to fix this.

Felix.

Posted
14 minutes ago, Felix Kasza said:

Um, it does and has been doing so for nearly 25 years.

My apologies, you are absolutely right. Most of my work with Unicode predates version 3, when this was added.

It's probably too late to change this. If Adobe and Serif were to change these spaces from non-breaking to breaking, anybody who had used these characters would find their text broke differently unless the apps automatically inserted a zero-width non-breaking space before each of these spaces in legacy documents. I think that's unlikely to happen, especially as both programs treat these the same way.

And it really doesn't matter: as long as we know which way the apps will handle it, we can all make the apps do whatever we want. I skimmed that annex and see that it specifies the en dash as breaking instead of non-breaking, which I disagree with, but I can work around this as long as I know how the app works.

Posted
17 hours ago, MikeTO said:

... if Adobe and Serif were to change these spaces from non-breaking to breaking, anybody who had used these characters would find their text broke differently unless the apps automatically inserted zero-width non-breaking space before one of these spaces in legacy documents. I think that's unlikely to happen, especially as both programs treat these the same way. ...

It's never too late to follow the standards.

Perhaps there could be an option where the app, when opening a legacy document (one from a preceding version), asks the user whether to apply the standard or not. Alternatively, it wouldn't take much effort to do a find-and-replace with a regular expression to find such spaces and make them non-breaking.
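To make the find-and-replace idea concrete, here is a sketch in Python rather than in any app's own Find and Replace dialog. It assumes the text is available as a plain string and that the layout engine follows UAX #14, under which U+2060 WORD JOINER prohibits a break on either side of itself. The function name `make_non_breaking` and the exact character list are illustrative choices, not anything Affinity or InDesign provides.

```python
import re

WORD_JOINER = "\u2060"

# Fixed-width spaces that UAX #14 treats as breaking, skipping any that are
# already followed by a word joiner so the function can be run repeatedly.
FIXED_WIDTH_SPACES = re.compile(
    r"([\u2000-\u2006\u2008-\u200A\u205F])(?!\u2060)"
)


def make_non_breaking(text: str) -> str:
    """Append a word joiner to each fixed-width space to suppress the break."""
    return FIXED_WIDTH_SPACES.sub(lambda m: m.group(1) + WORD_JOINER, text)


legacy = "10\u2009kg and 5\u2003m"
converted = make_non_breaking(legacy)
print([f"U+{ord(c):04X}" for c in converted if ord(c) > 0x7F])
```

FIGURE SPACE (U+2007) and NARROW NO-BREAK SPACE (U+202F) are deliberately left out of the pattern because they are non-breaking already.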

The only application I am aware of that has always had a fuller compliance in this breaking/non-breaking spaces topic is QXP. There may be others that I'm not aware of or have forgotten about, though.

Posted

Unicode version 3.0 was officially released in September 1999.

Serif, have you done something foolish and recruited the usability specialist you advertised for this summer... internally? If so, let your customers know. Be transparent. I see no evidence that a properly trained UX professional is involved in Affinity - on the contrary - and if you continue your work without usability expertise, it is an insult to all your customers. 

  • Staff
Posted

I think that it could be made optional to follow the Unicode spec for those who agree with it, but we do not think this is a bug exactly. We believe the Unicode spec is incorrect in respect of many of their suggestions for fixed width breaking space characters (U+2002, U+2003, U+2004, U+2005, U+2006, U+2008, U+2009, U+200A, U+205F).

Unicode | Name                      | 2.5.7 Behaviour | Unicode Expected
0020    | SPACE                     | Breaking        | Breaking
00A0    | NO-BREAK SPACE            | Non Breaking    | Non Breaking
1680    | OGHAM SPACE MARK          | Breaking        | Breaking
2000    | EN QUAD                   | Breaking        | Breaking
2001    | EM QUAD                   | Breaking        | Breaking
2002    | EN SPACE                  | Non Breaking    | Breaking
2003    | EM SPACE                  | Non Breaking    | Breaking
2004    | THREE-PER-EM SPACE        | Non Breaking    | Breaking
2005    | FOUR-PER-EM SPACE         | Non Breaking    | Breaking
2006    | SIX-PER-EM SPACE          | Non Breaking    | Breaking
2007    | FIGURE SPACE              | Non Breaking    | Non Breaking
2008    | PUNCTUATION SPACE         | Non Breaking    | Breaking
2009    | THIN SPACE                | Non Breaking    | Breaking
200A    | HAIR SPACE                | Non Breaking    | Breaking
202F    | NARROW NO-BREAK SPACE     | Non Breaking    | Non Breaking
205F    | MEDIUM MATHEMATICAL SPACE | Non Breaking    | Breaking
3000    | IDEOGRAPHIC SPACE         | Breaking        | Breaking
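For readers who prefer the comparison in machine-readable form, here is a small Python sketch that transcribes the table above and lists the rows where the 2.5.7 behaviour differs from what Unicode expects. The tuple layout and the names `TABLE` and `mismatches` are illustrative only; the values are copied from the table, not queried from either Publisher or the Unicode data files.

```python
# (code point, name, breaks in Publisher 2.5.7?, Unicode expects a break?)
TABLE = [
    (0x0020, "SPACE", True, True),
    (0x00A0, "NO-BREAK SPACE", False, False),
    (0x1680, "OGHAM SPACE MARK", True, True),
    (0x2000, "EN QUAD", True, True),
    (0x2001, "EM QUAD", True, True),
    (0x2002, "EN SPACE", False, True),
    (0x2003, "EM SPACE", False, True),
    (0x2004, "THREE-PER-EM SPACE", False, True),
    (0x2005, "FOUR-PER-EM SPACE", False, True),
    (0x2006, "SIX-PER-EM SPACE", False, True),
    (0x2007, "FIGURE SPACE", False, False),
    (0x2008, "PUNCTUATION SPACE", False, True),
    (0x2009, "THIN SPACE", False, True),
    (0x200A, "HAIR SPACE", False, True),
    (0x202F, "NARROW NO-BREAK SPACE", False, False),
    (0x205F, "MEDIUM MATHEMATICAL SPACE", False, True),
    (0x3000, "IDEOGRAPHIC SPACE", True, True),
]

# Rows where the application and the standard disagree.
mismatches = [
    f"U+{cp:04X}" for cp, _name, actual, expected in TABLE
    if actual != expected
]
print(mismatches)
```

The resulting list is the same set of nine characters named in the paragraph before the table.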

I have made a suggestion APL-1793 to that effect.

This came up before (the other way around) here when we tried implementing some of the unicode approach and then removed it.

That thread includes this wise comment on this topic

Patrick Connor
Serif Europe Ltd

"There is nothing noble in being superior to your fellow man. True nobility lies in being superior to your previous self."  W. L. Sheldon

 

Posted

Hmm. I don't know if my linked comment is/was taken out of context or not. (But, wise? I dunno about that...)

Combining what I believe I was getting at in the linked comment with my reference to QXP above, here's how Quark handles breaking and non-breaking spaces.

First, the breaking space choices (the menu selection is actually Insert | Special, so there are other items that do not relate to spaces):

[Screenshot: Capture_001195.png, breaking space choices in QXP's Insert | Special menu]

And here are the non-breaking choices:

[Screenshot: Capture_001196.png, non-breaking space choices in QXP's Insert | Special menu]

 

1 hour ago, Patrick Connor said:

...but we do not think this is a bug exactly. We believe the Unicode spec is incorrect in respect of many of their suggestions for fixed width breaking space characters (U+2002, U+2003, U+2004, U+2005, U+2006, U+2008, U+2009, U+200A, U+205F). ...

Not a bug...agreed. But, the implementation is incorrect according to a Standards committee. There is a process to get the Unicode Consortium to alter the spec--but at least in this case I imagine the proposal would be rejected out of hand.

Instead, I think that how Quark handles it would be the better option.

Posted

I'm glad this is being proposed. I really need this. I want all spaces that are not explicitly non-breaking by design to be breaking spaces. At present, I can add a zero-width space in front of, say, an Em Space, but then the Em Space will appear at the beginning of the next line, which is definitely not a desired behavior. The Unicode standard is exactly the behavior I expect.

I cannot imagine a case where I want an Em or En space to appear in the middle of a line, but also at the beginning of a line should the line break.

Posted
On 12/21/2024 at 2:14 PM, mwdiers said:

At present, I can add a zero-width in front of, say, an Em Space,

Wouldn't adding it after the Em Space accomplish what you want, instead of adding it before?

(I agree the application should honor/follow the standard, by the way; I'm not arguing against that.)

-- Walt

  • 3 weeks later...
Posted
Quote

Wouldn't adding it after the Em Space accomplish what you want, instead of adding it before?

Only partially. On justified text, that will move the space to the end of the current line, rather than the beginning of the next. Still not ideal, but at least better than the alternative.
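The two placements discussed here can be sketched in Python on a plain string. This only shows where the ZERO WIDTH SPACE (U+200B) ends up relative to the Em Space; how a given layout engine then positions the Em Space at the line break is exactly the behaviour described above and is not modelled. The function name `allow_break` and its parameters are hypothetical.

```python
ZWSP = "\u200b"      # ZERO WIDTH SPACE, a break opportunity with no width
EM_SPACE = "\u2003"


def allow_break(text: str, space: str = EM_SPACE, before: bool = False) -> str:
    """Insert a ZWSP next to every occurrence of `space`.

    before=True  -> ZWSP precedes the space (space moves to the next line)
    before=False -> ZWSP follows the space (space stays on the current line)
    """
    replacement = ZWSP + space if before else space + ZWSP
    return text.replace(space, replacement)


sample = "Chapter 1" + EM_SPACE + "Introduction"
print(repr(allow_break(sample, before=True)))
print(repr(allow_break(sample, before=False)))
```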

Posted

You need (mostly smaller) spaces for various typographical reasons, so I wonder why such spaces should break at all, since a space that can break is not usable for those purposes. What's more, why should there be two space characters of equal width (like U+2000 and U+2002) if both of them break, as in the red/green table shown here?

Posted
1 hour ago, mick0005 said:

Even more: why should there be two space characters with an equal width (like U+2000 and U+2002) when they both break like in the red/green table here shown?

There's no practical difference between using an en space and an en quad; this is just some history preserved in Unicode.
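One way to see that history is through Unicode normalization: the quads carry a canonical mapping to the corresponding spaces. A minimal sketch, assuming Python's standard `unicodedata` module (the constant names are just for readability):

```python
import unicodedata

EN_QUAD, EN_SPACE = "\u2000", "\u2002"
EM_QUAD, EM_SPACE = "\u2001", "\u2003"

# decomposition() returns the canonical mapping as a hex code point string.
print(unicodedata.decomposition(EN_QUAD))
print(unicodedata.decomposition(EM_QUAD))

# Under NFC normalization the quads are replaced by the plain spaces.
en_same = unicodedata.normalize("NFC", EN_QUAD) == EN_SPACE
em_same = unicodedata.normalize("NFC", EM_QUAD) == EM_SPACE
print(en_same, em_same)
```

So any text pipeline that normalizes its input will silently turn a quad into the matching space, which in an app that treats the two differently would also change the breaking behaviour.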
