G-ELP Posted October 8, 2021

When I export a design to SVG, the character encoding causes an issue at some point in the process.

To reproduce:
1. Create a document.
2. Add a frame text box containing "Staff" (without the quotes).
3. Set the font to "Open Sans" (I'm on Windows).
4. Export to SVG.

Examine the SVG file with a hex editor and you will find the "ff" has turned into 3 bytes: &HEF, &HAC, &H80, instead of the expected 2 bytes: &H66, &H66.

This causes issues when I put the SVG text in 'some' HTML files, where the character encoding must default to something else. i.e. in some it works fine and in some it doesn't (even with the same UTF-8 charset definition), but it shouldn't be getting encoded like that to begin with.

In putting this report together I've found it is solely the font that causes the issue. If I export at step 2, when the default Arial font is set, there is no problem. If I set the font back to Arial, there is no problem either. And if I have a mix of Arial and Open Sans, only the Open Sans text boxes have the issue.

So I guess Affinity is picking up the character encoding from the font. Is this a default? Can it be overridden somewhere in the UI? Or is there just something about this font, and I should choose another?

In the attached file, the text boxes are:
1Staff (Open Sans) = issue as above
2Staff (set to Open Sans, then back to Arial) = no problem
3Staff (Arial) = no problem

The Open Sans was saved out of my Windows Font viewer. I believe this is from the Google web fonts collection. It is what is active on my machine if you need it for debugging, but it is probably not the latest release, so even that could be part of the problem. What I am really asking is: in general, is there somewhere to find this setting, or what to expect from a particular font, so I don't get caught out by this gotcha in future?
https://fonts.google.com/specimen/Open+Sans

Attachments:
Bug-Character encoding ff export SVG problem-20211008.afdesign
Bug-Character encoding ff export SVG problem-20211008.svg
Open Sans.zip
Alfred Posted October 8, 2021

53 minutes ago, G-ELP said:
> In the attached file, the text boxes.. 1Staff (open sans) = issue as above 2Staff (set to open sans then back to Arial) = no problem 3Staff (Arial) = no problem

In your *.afdesign file, ‘1Staff’ has two separate ‘f’ characters but ‘2Staff’ and ‘3Staff’ have an ‘ff’ ligature.

Alfred
Affinity Designer/Photo/Publisher 2 for Windows • Windows 10 Home/Pro
Affinity Designer/Photo/Publisher 2 for iPad • iPadOS 17.4.1 (iPad 7th gen)
G-ELP (Author) Posted October 8, 2021

33 minutes ago, Alfred said:
> In your *.afdesign file, ‘1Staff’ has two separate ‘f’ characters but ‘2Staff’ and ‘3Staff’ have an ‘ff’ ligature.

Possibly, although I see no difference on my end: there are 2 separate characters I can individually edit. I didn't do anything crazy when typing that would trigger a ligature. And if all I do is change 2Staff to "Open Sans", the problem appears there too. So Affinity is picking up a text encoding from the font and applying it. Inside Affinity (and when opening the exported file directly) I see the expected "ff", but on embedded webpages I will sometimes see a jumbled mess instead of the ff.

Also, if I set 1Staff back to Arial, the problem goes away. So it's definitely down to setting the font, or exporting with the font set.

btw: I have since tried installing the latest Open Sans font from the URL above, and I get exactly the same behaviour and results.

The answer in this article describes what I am seeing:

> In case 2, the character is written as UTF-8 encoded, bytes 0xEF 0xAC 0x80, but then these bytes get interpreted according to windows-1252, yielding “ï¬€”.

https://tex.stackexchange.com/questions/119374/why-ff-displays-strange-using-unicode-encoding-vs-iso-8859-1-in-html-output-f

Also, I am using Windows 7 and Affinity Designer 1.8.5.703 (not the latest AD version; I will update and test soon).
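The mix-up described in the quoted answer can be reproduced in a few lines. This is a minimal Python sketch for illustration only; it does not show anything Affinity itself runs, and the variable names are made up for the example. It encodes the single character U+FB00 as UTF-8, decodes the same bytes as windows-1252 to get the jumbled text, and then reverses the mistake.

```python
# U+FB00 LATIN SMALL LIGATURE FF, the one character found in the exported SVG.
ligature = "\ufb00"

# UTF-8 stores this character as three bytes, which is what a hex editor shows.
utf8_bytes = ligature.encode("utf-8")
print(utf8_bytes.hex(" "))  # ef ac 80

# A page that treats those bytes as windows-1252 sees three unrelated characters.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)

# The damage is reversible as long as the bytes themselves were not altered.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == ligature)  # True
```

The bytes in the file are the same in both cases; only the encoding used to read them differs, which is why the same SVG can look fine on one page and broken on another.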
G-ELP (Author) Posted October 8, 2021

Also, I tried exporting after I had removed the Open Sans font from the system. So the font was missing from the system but still selected in Affinity as "? Open Sans". There was no problem with the exported file in that case either.
Alfred Posted October 8, 2021

33 minutes ago, G-ELP said:
> Possibly, although I see no different on my end, there are 2 separate characters I can individually edit.

Here’s what I see when I open your file in AD on iPad and zoom in:

The only change I made was to alter the vertical positions of the text frames in order to save space on this forum page. I believe the editability of the separate characters is ‘by design’.

> Also I tried exporting when I had removed the open sans font from the system. So the font was missing from the system but still selected in Affinity as "? Open Sans". And there was no problem with the exported file in that case either.

If the font is missing, as indicated by the question mark in front of the name, it will be substituted. The substitute for missing sans serif fonts is usually (always?) Arial!
G-ELP (Author) Posted October 8, 2021

I am not really sure what you are saying.

My point is, I can type in plain English, with no special Alt key combinations, the word "Staff", 5 letters. (I only speak English, so my computer keyboard, input languages, etc. are as vanilla English as they come.) Then I change the font from the default Arial to Open Sans. Then upon export I get something like "Staff". If I change the font back to Arial and export, I'll get "Staff" as expected.

Changing the font, without editing the text, should not have this kind of effect. It's only by chance that I've picked this up; if it were a magazine column with lots of text, there could be all sorts of substitutions occurring. As it is, double f is reasonably common.
lacerto Posted October 8, 2021 (...)
G-ELP (Author) Posted October 8, 2021

Yes, the image will display correctly if you open it directly, because the browser correctly sets the charset from the header of the SVG. But if you open the text file with a hex viewer, you can see that what is output is not 2 bytes, "f", "f"; it's 3 completely different bytes, as above. There is no reason for that not to be "ff". I am not intending to create a ligature, only changing the font. And both fonts are English fonts! Imagine if this happened when you changed from Arial to Times New Roman. That is essentially all I am doing.

So the issue arises when you paste the SVG text into another document that must have a different charset, and "ff" not being "ff" becomes evident. Once you embed the SVG into the HTML page, you are at the mercy of the charset of the page. On that, I "need" to embed the file because I am adding hyperlinks, which don't work if the SVG file is used as a regular linked img.

If you want a ligature, you should have to create one explicitly, not get one by just happening to type the word staff, or office, or off, or any of these thousands of words: https://www.thefreedictionary.com/words-containing-ff and later changing the font. You wouldn't expect the text to be mangled and interpreted in another way, i.e. full of ligatures and who knows what else, with however many other 2-letter combinations lying in wait.

No doubt it's related to options set on this particular font, or the glyphs available, but missing an English "f" would be impossible. As a programmer, this just seems more like a bug than a feature. Some piece of the text processing pipeline is filtering the text and applying an unexpected conversion to it. Also, most apps let you choose the charset; I can't find any such option in AD.

Anyway, I've identified the issue and simply chosen another font. Probably any other single font on my system would not have had this issue either, lol, what are the chances!!
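For the case above where the SVG markup is pasted into a page whose charset is outside your control, one option is to make the pasted text pure ASCII by writing every non-ASCII character as a numeric character reference. The following is a hedged Python sketch of that idea, not an Affinity feature; `to_charrefs` and the sample markup are hypothetical names chosen for the example.

```python
def to_charrefs(svg_text: str) -> str:
    """Replace each non-ASCII character with a numeric character reference."""
    # The "xmlcharrefreplace" error handler emits &#NNNN; for anything
    # that cannot be represented in the target encoding (ASCII here).
    return svg_text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


snippet = '<text font-family="Open Sans">Sta\ufb00</text>'
print(to_charrefs(snippet))
# <text font-family="Open Sans">Sta&#64256;</text>
```

The browser still draws the ligature glyph, since &#64256; is U+FB00, but the bytes on the page no longer depend on how the page declares its encoding. This applies to text content and attribute values; the contents of embedded style or script elements would need separate handling.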
lacerto Posted October 8, 2021 (...)

lacerto Posted October 8, 2021 (...)
kenmcd Posted October 8, 2021

9 hours ago, G-ELP said:
> Examine the SVG file with a hex editor and you will find the "ff" has turned into 3 hex bytes.. &HEF, &HAC, &H80

The character in your posted SVG for the Open Sans ff is:
U+FB00 LATIN SMALL LIGATURE FF
And the encoding in the SVG is set to:
encoding="UTF-8"
Whatever "hex editor" you are using is displaying the wrong codes for UTF-8, or it is set to some other encoding.

9 hours ago, G-ELP said:
> This causes issues when I put the SVG text in 'some' html files - where the character encoding must default to something else. ie. some it works fine, some it doesn't (even with same UTF-8 charset definition) but it shouldn't be getting encoded like that to begin with.

This is the real problem: if you are setting the encoding to something else, it is not going to display UTF-8 text correctly.

Turning off all ligatures may work.

OpenType Standard Ligatures are On by default (per the OpenType specs this is correct).
Arial does not have the ff ligature in its OpenType Standard Ligatures. So in Arial the f+f is not replaced automatically.
Open Sans does have the ff ligature in its OpenType Standard Ligatures. So in Open Sans the f+f is automatically replaced with the ligature (FB00).
To prevent this, turn off Standard Ligatures in the Typography panel (as suggested above).

In addition, ADesigner has a "helpful" Ligatures feature, which you can access in the Text menu, which apparently replaces the f+f with the ff ligature character when there is no OpenType ligature feature available (or, in this case, when Standard Ligatures is set to Off). Soooo helpful <roll-eyes>.
Set that to "Use None":
Text > Ligatures > Use None

Then the SVG will actually have no ligatures (no FB00), which may work in your mixed-encoding text situation.
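If changing the settings above is not an option, the ligature characters can also be expanded back to plain letters after export. This is a rough Python sketch of such a post-processing step, offered as an assumption-laden workaround rather than anything Affinity provides; `LIGATURES` and `expand_ligatures` are hypothetical names. It relies on the Unicode compatibility mappings for U+FB00 to U+FB06 and leaves every other character untouched.

```python
import unicodedata

# U+FB00..U+FB06 are the Latin ligatures (ff, fi, fl, ffi, ffl, and two st forms).
# NFKC normalization maps each one to its plain-letter equivalent.
LIGATURES = {
    cp: unicodedata.normalize("NFKC", chr(cp))
    for cp in range(0xFB00, 0xFB07)
}


def expand_ligatures(text: str) -> str:
    """Replace Latin ligature characters with their separate letters."""
    return text.translate(LIGATURES)


print(expand_ligatures("Sta\ufb00 o\ufb03ce"))  # Staff office
```

Only the seven ligature code points are translated, rather than normalizing the whole file, so accented letters and other legitimate non-ASCII text in the SVG are not changed. Note that the exported glyph positions were computed for the ligature, so spacing may shift very slightly.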
lacerto Posted October 8, 2021 (...)
kenmcd Posted October 9, 2021

5 hours ago, Lagarto said:
> Maybe it is related to font versions but for me, Open Sans is ligatureless (I have version 2.0 from Google installed). On the other hand, Arial probably does not have specific ligature glyphs, but when Standard ligatures are set for it, the metrics ever so slightly change.

Not sure what you are looking at for Open Sans, but the font files posted above have OpenType ligatures. I was not paying attention to the version. Tomorrow I will post a screenshot of the standard ligatures included and the feature code.

Arial ligatures look exactly like the original characters, so you are not going to see the change. And the FB00 ff ligature is only 1 font unit wider than 2 times the single f character. That may just be rounding from the original node coordinates being non-integers. Dunno. Have to look tomorrow. But that 1 funit difference would explain the slight change in metrics you are seeing.

Arial has the ligature characters (such as FB00); it just does not have the OpenType code to do the replacements. Many apps have mechanisms to replace those individual characters with the ligature character. For example, MS Word and LibreOffice both have auto-correct entries which do this (even for fonts with no OpenType features at all, such as old TrueType fonts). And ADesigner also has some sort of auto-replacement being done. This is independent of the OpenType replacements, but it does appear to interact with those, as I mentioned above.
lacerto Posted October 9, 2021 (...)
kenmcd Posted October 9, 2021

First, regarding looking at Unicode characters in a hex editor ...
Unicode uses hexadecimal (Base16) to designate characters.
The hex editor is displaying UTF-8 characters.
The ff ligature character in UTF-8 is: ef ac 80
The ff ligature character in UTF-16 is: fb00

12 hours ago, Lagarto said:
> so therefore OP's problem: copying such text and pasting it results in gibberish.

No, I still think his issue is mixing encodings.

The bottom line regarding ligatures is that the user needs to know what is actually in the font (characters and OpenType features), and needs a clear explanation of what Affinity is doing in each situation.
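To check which characters actually ended up in an exported file, without relying on how a particular hex editor is configured, a short script can list each non-ASCII character with its code point, Unicode name, and byte forms. This is an illustrative Python sketch; `describe` and `report_non_ascii` are hypothetical helper names, and the sample string stands in for the text read from the SVG.

```python
import unicodedata


def describe(ch: str) -> str:
    """Summarize one character: code point, name, UTF-8 and UTF-16 bytes."""
    code_point = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch, "<unnamed>")
    utf8 = ch.encode("utf-8").hex(" ")
    utf16 = ch.encode("utf-16-be").hex(" ")
    return f"{code_point} {name}: UTF-8 = {utf8}, UTF-16BE = {utf16}"


def report_non_ascii(text: str) -> list[str]:
    """Describe every distinct non-ASCII character found in the text."""
    return [describe(ch) for ch in sorted(set(text)) if ord(ch) > 0x7F]


for line in report_non_ascii("Sta\ufb00"):
    print(line)
# U+FB00 LATIN SMALL LIGATURE FF: UTF-8 = ef ac 80, UTF-16BE = fb 00
```

Running this over the exported SVG text (read with encoding="utf-8") would show at a glance whether any ligature characters were written in place of the typed letters.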
v_kyr Posted October 9, 2021

Character encodings: Essential concepts
Browser Test Page for Unicode Character 'LATIN SMALL LIGATURE FF' (U+FB00) - (en)
Unicode Character 'LATIN SMALL LIGATURE FF' (U+FB00) - (en)
Unicode-Zeichen „ff“ (U+FB00) - (de)
The Unicode Standard Version 6.1 – Core Specification - (pdf)

The tools one uses (editors, IDEs, web browsers, etc.) should generally be set up to create and handle files with the widely used common denominator here, ideally UTF-8!

HTML:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
...

OR

<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
...

NOTE: If there is a UTF-8 BOM (byte-order mark) at the beginning of the file, most browsers apart from Internet Explorer 10 and 11 recognize that the page is encoded in UTF-8. The BOM has higher priority than anything else, including the HTTP header. The meta specification for character encoding could be dispensed with if a BOM is present. I always recommend using one, however, as it will help those looking at the source code to see the page's character encoding.

XHTML5:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html ....

☛ Affinity Designer 1.10.8 ◆ Affinity Photo 1.10.8 ◆ Affinity Publisher 1.10.8 ◆ OSX El Capitan
☛ Affinity V2.3 apps ◆ MacOS Sonoma 14.2 ◆ iPad OS 17.2
lacerto Posted October 10, 2021 (...)
kenmcd Posted October 13, 2021

On 10/9/2021 at 8:55 PM, Lagarto said:
> Neither of these things are bugs, but just lacks in the current implementation of these features.

Yes, this "feature" is quite confusing, and does not seem to make sense in the real world.

Example: the auto-replacement is Off for Arial because it includes an OpenType Standard Ligatures feature, even though that feature does not include any Latin characters (no ff). Then when the user disables Standard Ligatures, the auto-replacement gets turned On, takes over, and replaces the ff with the ligature character. So ...
- Standard Ligatures On - no ff ligature appears
- Standard Ligatures Off - ff ligature is applied
Gee, how could that possibly be confusing? (Which is exactly what happened here.)

While I agree with you that this whole situation should probably be different, it has become abundantly clear that none of this is going to change, as far as I can see. This stuff was done this way on purpose, so you will have to convince someone to change it. All Typography and OpenType stuff appears to be under the supervision/control of one person, and that person rarely even comments here (any more), so we are wasting our breath. So far I have seen nearly zero response to other Typography/OpenType issues and questions I have posted, so I have just stopped posting. There are other OpenType issues with ordinals, discretionary ligatures, style groups, ccmp, etc., but why bother if none of it is going to change?

If this is an ego-driven stubbornness problem, the only thing that works is public mocking. So until there is a very high-profile Affinity Annoyances article or book, it is futile.

In the meantime I am happy to help users figure out ways around the crazy stuff.
lacerto Posted October 13, 2021 (...)