
Designer export to SVG character encoding problem with "ff" character string



When I export a design to SVG, the character encoding causes an issue at some point in the process.

To reproduce:
1. Create a document
2. Add a frame text box "Staff" without quotes.
3. Set font to "Open Sans" (I'm on Windows)
4. Export to SVG

Examine the SVG file with a hex editor and you will find the "ff" has turned into 3 hex bytes..
&HEF, &HAC, &H80

Instead of the expected 2 hex bytes..
&H66, &H66

This causes issues when I put the SVG text in 'some' HTML files, where the character encoding must default to something else.
ie. in some it works fine, in some it doesn't (even with the same UTF-8 charset definition), but it shouldn't be getting encoded like that to begin with.

In putting this report together I've found it is solely the font that causes the issue. If I export at step 2, when the default Arial font is set, then there is no problem. Also if I set the font back to Arial there is no problem. And if I have a mix of Arial and Open Sans, it is only the Open Sans text boxes that have the issue.

So I guess Affinity is picking up character encoding from the font. Is this a default? Can it be overridden somewhere in the UI? Or is there just something about this font and I should choose another?

In the attached file, the text boxes..

1Staff (open sans) = issue as above
2Staff (set to open sans then back to Arial) = no problem
3Staff (Arial) = no problem

The Open Sans was saved out of my Windows Font viewer. I believe this is from the Google web fonts collection. It is what is active on my machine if you need it for debugging, but probably not the latest release.. so even that could be part of the problem.. but I guess what I am asking is, in general is there somewhere to find this setting, or expectation from a particular font, so I don't get caught out by this gotcha in future.

https://fonts.google.com/specimen/Open+Sans

Attachments: Bug-Character encoding ff export SVG problem-20211008.afdesign, Bug-Character encoding ff export SVG problem-20211008.svg, Open Sans.zip


53 minutes ago, G-ELP said:

In the attached file, the text boxes..

1Staff (open sans) = issue as above
2Staff (set to open sans then back to Arial) = no problem
3Staff (Arial) = no problem

In your *.afdesign file, ‘1Staff’ has two separate ‘f’ characters but ‘2Staff’ and ‘3Staff’ have an ‘ff’ ligature.

Alfred
Affinity Designer/Photo/Publisher 2 for Windows • Windows 10 Home/Pro
Affinity Designer/Photo/Publisher 2 for iPad • iPadOS 17.4.1 (iPad 7th gen)


33 minutes ago, Alfred said:

In your *.afdesign file, ‘1Staff’ has two separate ‘f’ characters but ‘2Staff’ and ‘3Staff’ have an ‘ff’ ligature.

Possibly, although I see no difference on my end: there are 2 separate characters I can individually edit. I didn't do anything crazy when typing that would trigger a ligature. And if all I do is change 2Staff to "Open Sans", the problem will appear there too. So Affinity is picking up a text encoding from the font and applying it. Inside Affinity (and when opening the exported file directly) I see the expected "ff", but on embedded webpages I will sometimes see a jumbled mess instead of the ff.

Also if I set 1Staff back to Arial, the problem will go away. So it's definitely on setting the font, or exporting with the font set.

btw: I have since tried installing the latest Open Sans font from the url above - and get the exact same behaviour and results.

The answer in this article describes what I am seeing..
>In case 2, the character is written as UTF-8 encoded, bytes 0xEF 0xAC 0x80, but then these bytes get interpreted according to windows-1252, yielding “ï¬€”.

https://tex.stackexchange.com/questions/119374/why-ff-displays-strange-using-unicode-encoding-vs-iso-8859-1-in-html-output-f
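
The misreading described in that answer can be reproduced in a few lines. This is a minimal Python sketch, not something from the thread; the variable names are illustrative. It encodes U+FB00 as UTF-8 and then decodes the same bytes as windows-1252 (`cp1252` in Python):

```python
# U+FB00 LATIN SMALL LIGATURE FF, the character reported in the exported SVG
ligature = "\ufb00"

# UTF-8 stores this single character as three bytes
utf8_bytes = ligature.encode("utf-8")
print(utf8_bytes.hex(" "))  # ef ac 80

# Decoding the same bytes as windows-1252 yields three unrelated characters
misread = utf8_bytes.decode("cp1252")
print([f"U+{ord(c):04X}" for c in misread])  # ['U+00EF', 'U+00AC', 'U+20AC']
```

Those three code points are "ï", "¬" and "€", which is the jumbled text that shows up when a page is not actually being read as UTF-8.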

Also I am using Windows 7 and Affinity Designer 1.8.5.703 (not the latest AD version, will update and test soon).


Also I tried exporting when I had removed the open sans font from the system. So the font was missing from the system but still selected in Affinity as "? Open Sans". And there was no problem with the exported file in that case either.

 


33 minutes ago, G-ELP said:

Possibly, although I see no different on my end, there are 2 separate characters I can individually edit.

Here’s what I see when I open your file in AD on iPad and zoom in:

[attached screenshot]

The only change I made was to alter the vertical positions of the text frames in order to save space on this forum page.

I believe the editability of the separate characters is ‘by design’.

Quote

Also I tried exporting when I had removed the open sans font from the system. So the font was missing from the system but still selected in Affinity as "? Open Sans". And there was no problem with the exported file in that case either.

If the font is missing — as indicated by the question mark in front of the name — it will be substituted. The substitute for missing sans serif fonts is usually (always?) Arial!

Alfred
Affinity Designer/Photo/Publisher 2 for Windows • Windows 10 Home/Pro
Affinity Designer/Photo/Publisher 2 for iPad • iPadOS 17.4.1 (iPad 7th gen)


I am not really sure what you are saying..

My point is, I can type in plain English, no special Alt key combinations, the word "Staff", 5 letters.
(I only speak English so my computer keyboard, input languages, etc.. are as vanilla English as they come)

Then change the font from default Arial to Open Sans.

Then upon export I get something like "Staï¬€".

If I change the font back to Arial and export, I'll get "Staff" as expected.

Changing the font, without editing the text, should not have this kind of effect. It's only by chance I've picked this up, eg. if it was a magazine column with lots of text, there could be all sorts of substitutions occurring. As it is, double f is reasonably common.


Yes the image will display correctly if you open it directly, because it is correctly setting the charset in the browser from the header of the svg.

But if you open the text file with a hex viewer, you can see that what is output is not 2 bytes "f", "f", it's 3 completely different bytes as above.

There is no reason for that not to be "ff".

I am not intending to create a ligature, only changing font. And both fonts are English fonts! Imagine if this happened if you changed from Arial to Times New Roman. That is essentially all I am doing.

So the issue arises when you paste the svg text into another document, that must be a different charset, so "ff" not being "ff" becomes evident. Because once you embed the svg into the html page, you are then at the mercy of the charset of the page. On that, I "need" to embed the file because I am adding hyperlinks which don't work if using as a regular img linked to the svg file.
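
One possible way to make pasted SVG text independent of the host page's charset is to replace every non-ASCII character with a numeric character reference before pasting. The following is a minimal Python sketch under that assumption; the function name and the sample fragment are hypothetical and do not come from the exported file:

```python
def to_ascii_with_char_refs(markup: str) -> str:
    """Replace every non-ASCII character with an XML numeric character reference."""
    return markup.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# Hypothetical fragment standing in for a text element of an exported SVG
svg_fragment = "<text>Sta\ufb00</text>"
safe_fragment = to_ascii_with_char_refs(svg_fragment)
print(safe_fragment)  # <text>Sta&#64256;</text>
```

The result is pure ASCII, so it should read the same in any ASCII-compatible page encoding. Note that this keeps the ligature character (64256 is U+FB00 in decimal); it only changes how that character is written in the markup.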

If you want a ligature, you should be going about creating one explicitly, not just happening to type the word staff, or office, or off, or any of these thousands of words..
https://www.thefreedictionary.com/words-containing-ff

..and later changing the font. You wouldn't expect the text to be mangled and interpreted in another way. ie. full of ligatures and who knows what else and how many other 2 letter combinations are lying in wait.
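
To see which of these substitutions actually ended up in an export, the text can be scanned for the Latin ligature characters in the range U+FB00 to U+FB06. This is a minimal Python sketch; the function name and sample string are illustrative, and it only covers that one Unicode range, not every substitution a font might make:

```python
import unicodedata

def find_latin_ligatures(text: str) -> list[tuple[int, str, str]]:
    """Return (offset, code point, Unicode name) for each U+FB00..U+FB06 character."""
    hits = []
    for offset, ch in enumerate(text):
        if "\ufb00" <= ch <= "\ufb06":
            hits.append((offset, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return hits

# Sample text containing the ff and ffi ligature characters
sample = "Sta\ufb00 o\ufb03ce"
for hit in find_latin_ligatures(sample):
    print(hit)
```

For a real export, the same function could be applied to the contents of the SVG file read with `encoding="utf-8"`.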

No doubt it's related to options set on this particular font, or the glyphs available, but a font missing an English "f" would be impossible. As a programmer, this just seems more like a bug than a feature. Some piece of the text processing pipeline is filtering and applying an unexpected conversion on the text.

Also most apps let you choose the charset, I can't find any such option in AD.

Anyway I've identified the issue and simply chosen another font - probably any other single font on my system would not have had this issue either, lol, what are the chances!!


9 hours ago, G-ELP said:

Examine the SVG file with a hex editor and you will find the "ff" has turned into 3 hex bytes..
&HEF, &HAC, &H80

The character in your posted SVG for the Open Sans ff is: U+FB00  LATIN SMALL LIGATURE FF
And the encoding in the SVG is set to: encoding="UTF-8"
Whatever "hex editor" you are using is displaying the wrong codes for UTF-8, or it is set to some other encoding.

9 hours ago, G-ELP said:

This causes issues when I put the SVG text in 'some' html files - where the character encoding must default to something else.
ie. some it works fine, some it doesn't (even with same UTF-8 charset definition) but it shouldn't be getting encoded like that to begin with.

This is the real problem - if you are setting the encoding to something else it is not going to display UTF-8 text correctly.

Turning-Off all ligatures may work.

OpenType Standard Ligatures are On by default (per the OpenType specs this is correct).
Arial does not have the ff ligature in its OpenType Standard Ligatures.
So in Arial the f+f is not replaced automatically.
Open Sans does have the ff ligature in its OpenType Standard Ligatures.
So in Open Sans the f+f is automatically replaced with the ligature (FB00).
To prevent this, turn-Off Standard Ligatures in the Typography panel (as suggested above).

In addition ADesigner has a "helpful" Ligatures feature - which you can access in the Text menu - which apparently replaces the f+f with the ff ligature character when there is no OpenType ligature feature available (or in this case Standard Ligatures is set to Off).
Soooo helpful <roll-eyes>.
Set that to "Use None"
Text > Ligatures > Use None

Then the SVG will actually have no ligatures (no FB00),
which may work in your mixed-encoding text situation.
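
If changing those settings is not an option, the ligature characters could also be expanded back to plain letters after export. This is a minimal Python sketch of such a post-processing step, not an Affinity feature; the table and function name are hypothetical, and the mapping of the two "st" ligatures is an approximation:

```python
# Expansion table for the Latin ligature characters U+FB00..U+FB06
LIGATURE_EXPANSIONS = str.maketrans({
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "st",  # long s + t ligature, approximated as "st"
    "\ufb06": "st",
})

def expand_ligatures(svg_text: str) -> str:
    """Replace ligature characters with their plain-letter spellings."""
    return svg_text.translate(LIGATURE_EXPANSIONS)

print(expand_ligatures("<text>Sta\ufb00</text>"))  # <text>Staff</text>
```

An explicit table is used here rather than Unicode compatibility normalization so that no other characters in the file are touched.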

 


5 hours ago, Lagarto said:

Maybe it is related to font versions but for me, Open Sans is ligatureless (I have version 2.0 from Google installed). On the other hand, Arial probably does not have specific ligature glyphs, but when Standard ligatures are set for it, the metrics ever so slightly change.

Not sure what you are looking at for Open Sans, but the font files posted above have OpenType ligatures. I was not paying attention to the version - tomorrow I will post a screenshot of the standard ligatures included and the feature code.

Arial ligatures look exactly like the original characters. So you are not going to see the change. And the FB00 ff ligature is only 1 font unit wider than 2 times the single f character. That may just be rounding from the original node coordinates being non-integers. Dunno. Have to look tomorrow. But that 1 funit difference would explain the slight change in metrics you are seeing.

Arial has the ligature characters (such as FB00), it just does not have the OpenType code to do the replacements.

Many apps have mechanisms to replace those individual characters with the ligature character. For example MS Word and LibreOffice both have auto-correct entries which do this (even for fonts with no OpenType features at all such as old TrueType fonts). And ADesigner also has some sort of auto-replacements being done. This is independent of the OpenType replacements, but it does appear to interact with those as I mentioned above.


First, regarding looking at Unicode characters in a hex editor ...
Unicode uses hexadecimal (Base16) to designate characters.
The hex editor is displaying UTF-8 characters.
The ff ligature character in UTF-8 is: ef ac 80
The ff ligature character in UTF-16 is: fb00
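
How fb00 becomes ef ac 80 can be worked out by hand. The following is a minimal Python sketch of the 3-byte UTF-8 form; the function name is illustrative, and it deliberately handles only code points from U+0800 to U+FFFF:

```python
def encode_utf8_3byte(code_point: int) -> bytes:
    """Encode a code point in the range U+0800..U+FFFF as 3-byte UTF-8 by hand."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("this sketch only handles the 3-byte range")
    return bytes([
        0xE0 | (code_point >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (code_point & 0x3F),         # 10xxxxxx: low 6 bits
    ])

ff_ligature = 0xFB00
by_hand = encode_utf8_3byte(ff_ligature)
print(by_hand.hex(" "))                            # ef ac 80
print(chr(ff_ligature).encode("utf-16-be").hex())  # fb00
```

So the 16 bits of fb00 are spread over three bytes in UTF-8, while big-endian UTF-16 stores them as the two bytes fb 00.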

12 hours ago, Lagarto said:

so therefore OP's problem: copying such text and pasting it results in gibberish.

No, I still think his issue is mixing encodings.

The bottom line regarding ligatures is the user needs to know what is actually in the font (characters and OpenType features), and a clear explanation of what Affinity is doing in each situation.

 


The tools one uses (editors, IDEs, web browsers, etc.) should generally be set up to create and handle files with the widely used common denominator here, ideally UTF-8!

HTML:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
...
  
  OR
  
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
...

NOTE: If there is a UTF-8 BOM (byte-order mark) at the beginning of the file, most browsers apart from Internet Explorer 10 and 11 recognize that the page is encoded in UTF-8. The BOM has higher priority than anything else, including the HTTP header. The meta declaration for the character encoding could be dispensed with if a BOM is present. I always recommend keeping the meta declaration anyway, as it helps those looking at the source code to see the page's character encoding.
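
Whether a given file starts with a UTF-8 BOM can be checked with the Python standard library. This is a minimal sketch with illustrative names, using an in-memory byte string in place of a real file:

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte-order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

# In-memory stand-in for the first bytes of an HTML file saved with a BOM
with_bom = codecs.BOM_UTF8 + "<!DOCTYPE html>".encode("utf-8")
print(has_utf8_bom(with_bom))        # True
print(with_bom.decode("utf-8-sig"))  # <!DOCTYPE html>
```

The `utf-8-sig` codec strips the BOM on decoding if one is present, so it is a convenient choice when reading files that may or may not have one.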

 

XHTML5:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html .... 

 

 

☛ Affinity Designer 1.10.8 ◆ Affinity Photo 1.10.8 ◆ Affinity Publisher 1.10.8 ◆ OSX El Capitan
☛ Affinity V2.3 apps ◆ MacOS Sonoma 14.2 ◆ iPad OS 17.2


On 10/9/2021 at 8:55 PM, Lagarto said:

Neither of these things are bugs, but just lacks in the current implementation of these features.

Yes, this "feature" is quite confusing, and does not seem to make sense in the real world.

Example: the auto-replacement is Off for Arial because it includes an OpenType Standard Ligatures feature - even though that feature does not include any Latin characters (no ff).
Then when the user disables Standard Ligatures, the auto-replacement then gets turned-On and it takes over and replaces the ff with the ligature character.
So ...
- Standard Ligatures On - no ff ligature appears
- Standard Ligatures Off - ff ligature is applied
Gee, how could that possibly be confusing?  (which is exactly what happened here)

While I agree with you that this whole situation should probably be different,
it has become abundantly clear none of this is going to change as far as I can see.
This stuff was done this way on purpose, so you will have to convince someone to change it.
All Typography and OpenType stuff appears to be under the supervision/control of one person,
and that person rarely even comments here (any more) so we are wasting our breath.
So far I have seen a nearly zero response to other Typography/OpenType issues and questions I have posted, so I have just stopped posting. There are other OpenType issues with ordinals, discretionary ligatures, style groups, ccmp, etc. - but why bother - if none of it is going to change.
If this is an ego driven stubbornness problem the only thing that works is public mocking.
So until there is a very high-profile Affinity Annoyances article or book, it is futile.

In the meantime I am happy to help users figure out ways around the crazy stuff.
 
