A_B_C Posted November 20, 2019

Hi there,

here’s a very basic finding concerning the IDML importer. Unencoded glyphs that are placed into an InDesign document manually, rather than invoked by an OpenType feature, are not recognised by the Affinity IDML importer. However, they are recognised when the IDML file is re-imported into InDesign. So there must be a way to import such glyphs properly.

To demonstrate, I created a simple font consisting of three glyphs: two square-like ones (“A”, “B”) that have proper Unicode code points, and a round one called “glyph” that lacks a Unicode code point. I exported it to .otf, installed the font, created a simple one-text-frame InDesign document, and exported it to IDML. Re-importing the document into InDesign (CS) properly presented me with the unencoded glyph. Importing the document into Affinity Publisher failed to present the glyph, even though the glyph was accessible from the Glyph Browser.

Materials attached. Please have a look.

Alex

Attachments: Unencoded.idml, Unencoded-Regular.otf
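To make the situation concrete, here is a minimal Python sketch of the three-glyph test font described in the post. The glyph names and code points come from the post; the data structures and function names are invented for illustration and do not reflect how a real font file or either importer works.

```python
# Toy model of the three-glyph test font described in the post.
# Glyph names and code points follow the post; the structures are
# illustrative assumptions, not a real font-file layout.
GLYPH_ORDER = ["A", "B", "glyph"]      # every glyph in the font, by name
CMAP = {0x0041: "A", 0x0042: "B"}      # code point -> glyph name


def unencoded_glyphs(glyph_order, cmap):
    """Return the glyph names that no code point maps to."""
    encoded = set(cmap.values())
    return [name for name in glyph_order if name not in encoded]


def glyph_for_code_point(code_point, cmap):
    """Code-point lookup: the only route available to Unicode-only software."""
    return cmap.get(code_point)


def glyph_index_by_name(name, glyph_order):
    """Direct lookup by glyph name; returns the glyph index, or None."""
    return glyph_order.index(name) if name in glyph_order else None


print(unencoded_glyphs(GLYPH_ORDER, CMAP))          # ['glyph']
print(glyph_index_by_name("glyph", GLYPH_ORDER))    # 2
```

No code point leads to "glyph", so an importer that resolves text purely through code points cannot reach it, whereas a lookup by glyph name or index can.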
A_B_C (Author) Posted November 20, 2019

To provide some more information: when I add an OpenType “case” feature to the font that replaces “A” with “glyph”, InDesign again imports the IDML file correctly, while Affinity Publisher presents the glyph “A” and not the active substitute.

Attachments: UnencodedPlusFeature.otf, UnencodedPlusFeature.idml
A_B_C (Author) Posted November 26, 2019

Any news on that? (Personally, I would be more than happy if that could be investigated, for I have a few important InDesign documents that were created with fonts in which numerous special glyphs weren’t encoded properly. Bad luck.)
postmadesign Posted November 27, 2019

Something I have seen that might be related: links containing non-ASCII characters such as ü or ß are not interpreted correctly, so Publisher cannot find the linked images automatically and I have to relink them manually (even if all images are in the same folder and the others are located properly). They are also displayed incorrectly in the Resource Manager. It seems to be a text encoding issue.
A_B_C (Author) Posted November 27, 2019

Hmm. Not sure about that. Umlauts and ß seem to be parsed correctly when used in a text frame. It’s strange that they aren’t understood in file names. It’s surely a text encoding issue, but maybe a different one.
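One common way a file name with ü or ß can break is a charset mismatch when a link is decoded. The sketch below assumes the link is stored as a percent-encoded UTF-8 URI and uses a made-up file name; whether this is what actually happens in Publisher is not known from the thread.

```python
from urllib.parse import quote, unquote

# Hypothetical linked-image file name containing non-ASCII characters.
file_name = "Grüße.jpg"

# Assumption: the link is stored percent-encoded as UTF-8 bytes.
encoded = quote(file_name)

# Decoding with the matching charset restores the original name.
correct = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as Latin-1 yields mojibake, i.e. a name that
# no longer matches any file on disk.
garbled = unquote(encoded, encoding="latin-1")

print(encoded)                  # Gr%C3%BC%C3%9Fe.jpg
print(correct == file_name)     # True
print(garbled == file_name)     # False
```

This would also be consistent with umlauts working inside text frames, since story text and link paths may be decoded by different code paths.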
A_B_C (Author) Posted November 27, 2019

My problem is that documents based on an improperly encoded math font turn into this on import:

[screenshot]

Which is useless, of course, even if the text formatting could be corrected.
fde101 Posted November 27, 2019

4 hours ago, A_B_C said: "documents based on an improperly encoded math font"

Just so I understand this clearly... the concern is that Publisher is treating invalid fonts as being invalid?
walt.farrell Posted November 27, 2019

58 minutes ago, fde101 said: "Just so I understand this clearly... the concern is that Publisher is treating invalid fonts as being invalid?"

I think the concern is that InDesign was able to deal with it, and Publisher isn't. But I agree that if the font is broken, one shouldn't expect all programs to deal with it equally.

-- Walt
fde101 Posted November 27, 2019

39 minutes ago, walt.farrell said: "if the font is broken, one shouldn't expect all programs to deal with it equally"

True, but I'm curious, because I'm not 100% certain this means the font is invalid. Is it considered valid (in spite of being strange) for a program to use glyphs of a font that are not mapped to a valid code point? I could understand that in the case of an embedded subset of a font in a PDF, for example (I would think a specific glyph would need to be referenced anyway to indicate which should be rendered?), but with a "normal" font source I am not sure how a program would be expected to find the glyph with no code point mapped.

InDesign does appear to have features for doing exactly that, identifying characters by glyph directly without going through the code point, so this might actually be technically valid: https://helpx.adobe.com/indesign/using/glyphs-special-characters.html
Pauls (Staff) Posted November 27, 2019

We'll have a look at this - thanks
A_B_C (Author) Posted November 27, 2019

It is perfectly possible to have unencoded glyphs in a font. It doesn’t make a font invalid. As a matter of fact, glyphs that are invoked by a substitution feature, like small capitals or glyph variants, are very often not assigned a Unicode code point in a font, but referred to by the glyph name (glyph ID) or glyph index.

So while the math font in the document I shared a screenshot of may actually be considered “broken” (as it contains unencoded glyphs that should have been assigned a Unicode code point and were entered via the glyph panel), it is obvious that the IDML importer of Affinity Publisher will have to take better care of unencoded glyphs, unless you want to lose all glyph substitutions that target unencoded glyphs in a font referenced in an IDML file. This is shown by my second example above.

For a real-life example, take a look at a workhorse typeface like Adobe Caslon Pro.

[screenshot]
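The point about substitution features can be sketched with a toy glyph-level model. The ".sc" naming follows the glyph names mentioned in the thread; the tables and function names below are hypothetical and are not read from any real font.

```python
# Illustrative glyph-level model. The ".sc" names follow the naming seen
# in the thread; both tables are hypothetical, not taken from a font.
CMAP = {"A": "A", "B": "B"}                 # character -> default glyph name
SMALL_CAPS = {"A": "A.sc", "B": "B.sc"}     # single substitution: glyph -> glyph


def shape(text, cmap, substitution=None):
    """Map characters to glyph names, then apply an optional substitution."""
    glyphs = [cmap[ch] for ch in text]
    if substitution:
        glyphs = [substitution.get(g, g) for g in glyphs]
    return glyphs


def character_for_glyph(glyph, cmap):
    """Return the character that maps to this glyph, or None if unencoded."""
    for ch, name in cmap.items():
        if name == glyph:
            return ch
    return None


print(shape("AB", CMAP))                    # ['A', 'B']
print(shape("AB", CMAP, SMALL_CAPS))        # ['A.sc', 'B.sc']
print(character_for_glyph("A.sc", CMAP))    # None
```

The substituted glyphs are reachable only through the feature, so an importer that drops the feature, or that cannot address glyphs without a code point, ends up with the default glyphs.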
fde101 Posted November 27, 2019

@A_B_C, those ".sc" (small caps) glyphs may be characters that result from a combination of multiple code points, rather than not being mapped at all...? There are actually small caps code points in the Unicode standard, but they do not include the diacritical marks, so those characters would need to use the combining character code points to be represented. Not sure what tool you are using to view the font, but do you know if that property field would populate in that scenario?

Details: https://en.wikipedia.org/wiki/Small_caps#Unicode https://en.wikipedia.org/wiki/Combining_character

EDIT: I don't think that is what is happening here. The code points are actually in your screenshot, just below each letter in the big area on the left, and they are not the "small caps" code points even for the glyphs with ".sc" in the name. Code point 0041 is the code point for a capital letter A, for example. Note also that the combined "AE" form is 00C6 for both of them, the "normal" and the "small caps" variant... these are mapped to code points, but for some reason your tool is not showing those mappings in the field you pointed to.
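The Unicode facts mentioned in this post can be checked with Python's standard `unicodedata` module. This is only a quick sketch of the code points involved; it says nothing about how any particular font maps them.

```python
import unicodedata

small_cap_a = "\u1d00"
print(unicodedata.name(small_cap_a))    # LATIN LETTER SMALL CAPITAL A
print(unicodedata.name("\u0041"))       # LATIN CAPITAL LETTER A
print(unicodedata.name("\u00c6"))       # LATIN CAPITAL LETTER AE

# Small capital A followed by COMBINING DIAERESIS (U+0308): there is no
# precomposed character, so NFC normalisation leaves two code points.
small_cap_sequence = unicodedata.normalize("NFC", small_cap_a + "\u0308")
print(len(small_cap_sequence))          # 2

# A regular capital A with the same combining mark composes to U+00C4.
capital_sequence = unicodedata.normalize("NFC", "A\u0308")
print(len(capital_sequence))            # 1
```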
kenmcd Posted November 27, 2019

Until Affinity can connect the actual font to the text none of this matters. You cannot simply rely on Unicode code points, because (1) many characters are un-mapped, and (2) even if they are mapped there is no standard across fonts for the use of the Private Use Area (PUA) which would make the characters identifiable.

The advantage of having all glyphs mapped in the PUA is that applications which only understand Unicode, such as LibreOffice (LO), can use their Special Characters browser to insert characters from the PUA. There is no Glyph Browser in LO, so you cannot manually insert an Arial small cap character; they are all un-mapped.

InDesign/Adobe are connecting the actual font to the characters using Unicode code points, and/or glyph number or glyph name. And they are coding this into the PDF export. That is the only explanation that makes sense when you can round-trip a PDF and get back to the same characters. With some characters such as ligatures, the ToUnicode table has codes for the individual characters, not the ligature. They are reading the font and decomposing the ligature, and then encoding the ToUnicode so it will import properly.

If an un-mapped glyph is surviving a round-trip they must be using the glyph number. They could be using the glyph number for that specific font and putting that in the ToUnicode table (or somewhere). To bring it back in they would have to know it is a glyph number in the ToUnicode table. How is that marked? Or is that info being kept somewhere else in the PDF?

Another Affinity user mentioned that IL was importing un-mapped glyphs as outlines. Is ID different than IL in this respect? Appears to be. Or is IL only doing that when the PDF was not created by ID (with the special glyph number handling)? We do know that the applications creating the PDFs are all over the place in terms of quality. Some do not even have a ToUnicode table.

Adobe may either just be doing it all right (per the PDF specs), or they could be doing their usual special rules. I think the only way to find out how they are doing it is to systematically analyze what is going into the PDFs, and what is then imported back again. Find out how they are identifying the glyphs.

Back to my first statement ... until Affinity can connect the incoming text to the actual font this is not going to work. Some AI to analyze the PDF will be needed. And then the ability to do a lot of IF-THEN's to get to the right glyphs.
A_B_C (Author) Posted November 27, 2019

3 hours ago, fde101 said: "Not sure what tool you are using to view the font, but do you know if that property field would populate in that scenario?"

The tool is FontLab VI, so I am pretty sure that if these glyphs had a Unicode code point assigned, FontLab would display them in the Glyph Panel. That’s the most basic thing a font editor should do when you open a font. In fact, the greyed-out code points you can see below the glyphs just indicate the semantic dependence of the small caps glyphs on the respective standard characters.

To confirm what I said, here is a screenshot from DTL OpenType Master (OTM). As you can see, even the naked “A.sc” has no Unicode code point assigned:

[screenshot]

You may say, “Well, the input fields below the glyph contour only relate to the Basic Multilingual Plane (BMP), so what about code points beyond this plane?” But you must know that OTM adds input fields for code points that lie beyond the BMP as soon as you assign such a code point to a glyph and export the font anew. For instance, when I assign “A.sc” to a code point in the Supplementary Private Use Area (PUA), here F0000, and do the export, the additional input fields immediately show up:

[screenshot]

So you can really believe me that all of the small caps glyphs are unencoded in the version of Adobe Caslon Pro I took my screenshot from (this version came with an earlier iteration of InDesign, so it is by no means “unofficial”). And there are many other fonts where the same is the case.
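For readers unfamiliar with the plane terminology, the following sketch shows where the code points mentioned in this thread sit. The helper names are made up; the plane arithmetic and the "Co" (private use) category are standard Unicode properties.

```python
import unicodedata


def plane(code_point):
    """Unicode plane number: 0 is the Basic Multilingual Plane (BMP)."""
    return code_point >> 16


def is_private_use(code_point):
    """True if the code point has general category Co (private use)."""
    return unicodedata.category(chr(code_point)) == "Co"


for cp in (0x0041, 0xE000, 0xF0000):
    print(f"U+{cp:04X} plane={plane(cp)} private_use={is_private_use(cp)}")
# U+0041 plane=0 private_use=False
# U+E000 plane=0 private_use=True
# U+F0000 plane=15 private_use=True
```

U+F0000 lies in plane 15, outside the BMP, which matches the description of the extra input fields appearing only after such a code point is assigned.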
A_B_C (Author) Posted November 27, 2019

1 hour ago, LibreTraining said: "And they are coding this into the PDF export."

But aren’t we talking about IDML export here?
fde101 Posted November 28, 2019

1 hour ago, A_B_C said: "if these glyphs had a Unicode code point assigned, Fontlab would display them in the Glyph Panel"

What are the numbers below the characters in your screenshot?
A_B_C (Author) Posted November 28, 2019

Seems you don’t believe me. As I said in my explanation above, the greyed-out code points you can see below the glyphs just indicate the semantic dependence of the small caps glyphs on the respective standard characters. They don’t indicate any assigned code point. This is a peculiarity of the FontLab VI interface. I admit it is not very obvious, but you would have to ask the FontLab VI development team why they decided to do it this way. I don’t like it either.

Here are two screenshots from FontLab Studio 5, the earlier iteration of FontLab, which didn’t have this peculiarity. You can clearly see that the small cap “A” is not encoded:

[screenshot]

This is what an encoded glyph would look like in FontLab Studio 5:

[screenshot]

Hope this is convincing enough …
A_B_C (Author) Posted November 28, 2019

And you will note in the last screenshots how many glyphs in the said version of Adobe Caslon Pro are not encoded: all those that have “- - -” in the glyph cell header.
kenmcd Posted November 28, 2019

22 hours ago, A_B_C said: "But aren’t we talking about IDML export here?"

Brain fart! But I think the answer is sort of the same. They have to be saving those glyph numbers in the IDML file to be able to call them back. ... Which is exactly what you said in your first post ...
A_B_C (Author) Posted November 28, 2019

No problem. I am happy that you emphasised the ubiquity of unencoded glyphs.
A_B_C (Author) Posted November 28, 2019

When I reviewed my second post today, I noticed what might have gone wrong there. Or rather, I believe my example was not suitably chosen. As described above, I had created a font with a “case” feature and applied that feature to an uppercase letter in order to exchange it for an unencoded glyph. Applying this feature results in the following XML code in the IDML document:

[XML snippet]

Without knowing the inner workings of the IDML importer, I can imagine why this may confuse the algorithms. As “A” is uppercase, the importer may consider the property “Capitalization = AllCaps” already applied. Hence there is no substitution. So that was perhaps an unfortunate example. I should have checked the resulting XML code before uploading.

To test the IDML importer, we need an example that is closer to real life. So I replaced the “case” feature with a “c2sc” feature with the same substitution rule as above. Then the resulting XML code is:

[XML snippet]

But Publisher still refuses to apply the OpenType substitution. Instead, a faux small cap is created from the “A” glyph:

[screenshot]

Here are the documents, if you would like to check: UnencodedSmallCaps.otf, UnencodedSmallCaps.idml
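The hypothesis about the “case” example can be sketched as follows. Only the attribute `Capitalization="AllCaps"` is taken from the post; the surrounding XML fragment is a heavily simplified assumption, and both importer functions are hypothetical illustrations rather than Publisher's actual logic.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified story fragment. Only the attribute
# Capitalization="AllCaps" comes from the post; the rest is assumed.
FRAGMENT = """
<ParagraphStyleRange>
  <CharacterStyleRange Capitalization="AllCaps">
    <Content>A</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>
"""

# Toy "case" feature of the test font: glyph "A" -> unencoded "glyph".
CASE_FEATURE = {"A": "glyph"}


def import_naive(text, capitalization):
    """Suspected behaviour: treat AllCaps purely as a text transform."""
    return list(text.upper()) if capitalization == "AllCaps" else list(text)


def import_with_feature(text, capitalization, feature):
    """Alternative: also apply the font's substitution when AllCaps is set."""
    glyphs = import_naive(text, capitalization)
    if capitalization == "AllCaps":
        glyphs = [feature.get(g, g) for g in glyphs]
    return glyphs


root = ET.fromstring(FRAGMENT.strip())
for run in root.iter("CharacterStyleRange"):
    cap = run.get("Capitalization", "Normal")
    text = run.findtext("Content", default="")
    print(import_naive(text, cap))                         # ['A']
    print(import_with_feature(text, cap, CASE_FEATURE))    # ['glyph']
```

With an already uppercase “A”, the text-only transform is a no-op, which would explain why no substitution is visible in the first example.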
A_B_C (Author) Posted November 29, 2019

Regarding my first post, the problem seems to be that the IDML importer of Publisher doesn’t interpret the following XML context correctly:

[XML snippet]

Attachment: Story_ucc.xml

As you can see, the glyph name “glyph” is registered in the IDML code of the story contained in my text frame. Indeed, there is a special syntax for handling custom glyphs in the IDML specification, as I noticed this morning.

@Pauls, I would be more than happy to provide additional examples if needed. If you haven’t already discovered the root of the issue, please attach these findings to your report (afb-3049). It would be just awesome if that could be sorted out.
Pauls (Staff) Posted November 29, 2019

@A_B_C - That's what's been reported - thanks
A_B_C (Author) Posted November 29, 2019

Great! Thank you very much!
Patrick Connor (Staff) Posted December 13, 2019

We have made fixes/improvements to these areas of the program (Unencoded Glyphs Not Handled Properly; AllSmallCaps, SmallCaps and CapsToSmallCaps implementation) in the latest Affinity Publisher beta. If you would like to try these changes, the beta software is available in the forum posts listed below. Once Affinity Publisher has been through a full beta process, the change will be released to all customers in a future free 1.8.0 update.

The 1.8.0 builds are linked at the top of these beta forum posts:

Affinity Publisher 1.8.0.523 for Windows
Affinity Publisher 1.8.0.523 for macOS