A_B_C

IDML: Unencoded Glyphs Not Handled Properly


Hi there,

here’s a very basic finding concerning the IDML importer. Unencoded glyphs that are placed into an InDesign document manually, rather than invoked by an OpenType feature, are not recognised by the Affinity IDML importer. However, they are recognised when the IDML file is re-imported into InDesign. So there must be a way to import such glyphs properly.

To demonstrate, I created a simple font consisting of three glyphs: two square-like ones (“A”, “B”) that have proper Unicode code points, and a round one called “glyph” that lacks a Unicode code point. I exported it to .otf, installed the font, created a simple InDesign document with a single text frame, and exported that to IDML. Re-importing the document into InDesign (CS) properly presented me with the unencoded glyph. Importing the document into Affinity Publisher failed to present me with the glyph. At the same time, the glyph was accessible from the Glyph Browser. Materials attached.

Please have a look.

Alex :)

Unencoded.idml

Unencoded-Regular.otf

Fontlab.png.dd61b2ced372e7142fb28ab12c0cf2bf.png

Indesign.thumb.png.291421c7c359a33894f878ef04aeca5c.png

Unencoded.png.1ac6c1bdf73988ca5598c9968caefa45.png


Any news on that? :)

(Personally, I would be more than happy if that could be investigated, for I have a few important ID documents that were created on the basis of fonts in which numerous special glyphs weren’t encoded properly. Bad luck. :()


What I have seen, which might be related, is that links containing non-ASCII characters such as ü or ß are not interpreted correctly. As a result, Publisher cannot find the linked images automatically, and I have to relink them manually (even if all images are in the same folder and the others are located properly). They are also displayed incorrectly in the Resource Manager. It seems to be a text encoding issue.


Hmm. Not sure about that. Umlauts and ß seem to be parsed correctly when used in a text frame. It’s strange that they aren’t understood in file names. It’s surely a text encoding issue, but maybe a different one. :/

Umlauts.png.113e9c57e3d73832982ef095bdd52eb3.png


My problem is that documents based on an improperly encoded math font turn into this on import:

Math.png.8de4bd99dd186868e4f3b669c681b6f0.png

Which is useless, of course, even if the text formatting could be corrected. :(

4 hours ago, A_B_C said:

documents based on an improperly encoded math font

Just so I understand this clearly...  the concern is that Publisher is treating invalid fonts as being invalid?

58 minutes ago, fde101 said:

Just so I understand this clearly...  the concern is that Publisher is treating invalid fonts as being invalid?

I think the concern is that InDesign was able to deal with it, and Publisher isn't. But I agree that if the font is broken, one shouldn't expect all programs to deal with it equally :)

39 minutes ago, walt.farrell said:

if the font is broken, one shouldn't expect all programs to deal with it equally

True, but I'm curious because I'm not 100% certain this means the font is invalid.  Is it considered valid (in spite of being strange) for a program to use glyphs of a font which are not mapped to a valid code point?

I could understand that in the case of an embedded font subset in a PDF, for example (I would think a specific glyph would need to be referenced anyway to indicate which one should be rendered?), but with a "normal" font source I am not sure how a program would be expected to find the glyph with no code point mapped...

 

InDesign does appear to have features for doing exactly that - identifying characters by glyph directly without going through the code point - so this might actually be technically valid:

https://helpx.adobe.com/indesign/using/glyphs-special-characters.html


It is perfectly possible to have unencoded glyphs in a font. It doesn’t make a font invalid. As a matter of fact, glyphs that are invoked by a substitution feature, like small capitals or glyph variants, are very often not assigned a Unicode code point in a font, but referred to by the glyph name (glyph ID) or glyph index.
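To make "unencoded" concrete, here is a minimal sketch that compares a font's glyph list with its character map and reports the glyphs no code point leads to. The data is hypothetical and hand-written to resemble the three-glyph test font from the first post; with the fontTools library, the two inputs could presumably be read from a real font via `TTFont.getGlyphOrder()` and `TTFont.getBestCmap()`.

```python
def unencoded_glyphs(glyph_order, cmap):
    """Return the glyph names that no code point maps to, in font order."""
    encoded = set(cmap.values())
    return [name for name in glyph_order if name not in encoded]


# Hypothetical data modelled on the test font: "A" and "B" are encoded,
# "glyph" is not. ".notdef" is the fallback glyph and is never encoded.
glyph_order = [".notdef", "A", "B", "glyph"]
cmap = {0x41: "A", 0x42: "B"}  # code point -> glyph name

for name in unencoded_glyphs(glyph_order, cmap):
    print(name)
```

Any glyph this reports can only be reached by glyph name or index, or through an OpenType substitution, never by typing a character.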

So while the math font in the document I shared a screenshot of may actually be considered “broken” (as it contains unencoded glyphs that should have been assigned a Unicode code point and were entered using the glyph panel), it is obvious that the IDML importer of Affinity Publisher will have to take better care of unencoded glyphs, unless you want to lose all glyph substitutions that target unencoded glyphs in a font referenced in an IDML file. This is shown by my second example above.

For a real life example, take a look at a workhorse typeface like Adobe Caslon Pro.

Caslon.thumb.png.0a7038c08192e8b6836fbcfeeeb08389.png


@A_B_C, those ".sc" (small caps) glyphs may be characters that result from a combination of multiple code points, rather than not being mapped at all...?  There are actually small caps code points in the Unicode standard, but they do not include the diacritical marks, so those characters would need to use the combining character code points to be represented.  Not sure what tool you are using to view the font, but do you know if that property field would populate in that scenario?

 

Details:

https://en.wikipedia.org/wiki/Small_caps#Unicode

https://en.wikipedia.org/wiki/Combining_character

 

 

EDIT: I don't think that is what is happening here.  The code points are actually in your screenshot, just below each letter in the big area on the left, and they are not the "small caps" code points even for the glyphs with the ".sc" in the name.  Code point 0041 is the code point for a capital letter A, for example.  Note also that the combined "AE" form is 00C6 for both of them - the "normal" and the "small caps" variant...  these are mapped to code points, but for some reason your tool is not showing those mappings in the field you pointed to.
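For reference, the Unicode side of this can be checked with Python's standard `unicodedata` module. This is only a sketch of what the standard defines; it says nothing about which glyphs a particular font actually maps to these code points.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# The ordinary capitals mentioned above.
print(describe("A\u00C6"))

# A dedicated small-capital letter lives at a different code point.
print(describe("\u1D00"))

# Small capital A plus a combining acute accent: NFC normalization cannot
# merge the pair, because no precomposed character exists for it.
accented = unicodedata.normalize("NFC", "\u1D00\u0301")
print(len(accented))
```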


Until Affinity can connect the actual font to the text none of this matters.

You cannot simply rely on Unicode code points - because (1) many characters are un-mapped, and (2) even if they are mapped there is no standard across fonts for the use of the Private Use Area (PUA) which would make the characters identifiable.
The advantage of having all glyphs mapped in the PUA is that applications which only understand Unicode, such as LibreOffice (LO), can use their Special Characters browser to insert characters from the PUA.
There is no Glyph Browser in LO so you cannot manually insert an Arial small cap character; they are all un-mapped.

InDesign/Adobe are connecting the actual font to the characters using Unicode code points, and/or glyph number or glyph name.
And they are coding this into the PDF export.
That is the only explanation that makes sense when you can round-trip a PDF and get back to the same characters.

With some characters such as ligatures, the ToUnicode table has codes for the individual characters, not the ligature.
They are reading the font and decomposing the ligature, and then encoding the ToUnicode so it will import properly.
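As a rough analogy for "decomposing the ligature", Unicode's own compatibility decomposition turns the encoded ligature characters back into their component letters. This is a sketch using Python's standard library only; it is an assumption that a PDF exporter does something comparable, and it is not a description of what InDesign actually writes.

```python
import unicodedata


def decompose_ligature(ch):
    """Expand an encoded ligature character into its component letters."""
    return unicodedata.normalize("NFKD", ch)


# U+FB00 (ff), U+FB01 (fi) and U+FB03 (ffi) are encoded Latin ligatures.
for ligature in ("\uFB00", "\uFB01", "\uFB03"):
    print(f"U+{ord(ligature):04X}", decompose_ligature(ligature))
```

Note that this only works for ligatures that have their own code point; an unencoded ligature glyph would need its components recorded some other way.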

If an un-mapped glyph is surviving a round-trip they must be using the glyph number.
They could be using the glyph number for that specific font and putting that in the ToUnicode table (or somewhere).
To bring it back in they would have to know it is a glyph number in the ToUnicode table.
How is that marked? 
Or is that info being kept somewhere else in the PDF?

Another Affinity user mentioned that IL was importing un-mapped glyphs as outlines.
Is ID different than IL in this respect? Appears to be.
Or is IL only doing that when the PDF was not created by ID (with the special glyph number handling)?

We do know that the applications creating the PDFs are all over the place in terms of quality.
Some do not even have a ToUnicode table.
Adobe may either just be doing it all right (per the PDF specs), or they could be doing their usual special rules.

I think the only way to find out how they are doing it is to systematically analyze what is going into the PDFs.
And what is then imported back again.
Find out how they are identifying the glyphs.

Back to my first statement ... until Affinity can connect the incoming text to the actual font this is not going to work.
Some AI to analyze the PDF will be needed.
And then the ability to do a lot of IF-THEN's to get to the right glyphs.

 

3 hours ago, fde101 said:

Not sure what tool you are using to view the font, but do you know if that property field would populate in that scenario?

The tool is Fontlab VI, so I am pretty sure that if these glyphs had a Unicode code point assigned, Fontlab would display them in the Glyph Panel. That’s the most basic thing a font editor should do when you open a font. In fact, the greyed-out code points you can see below the glyphs just indicate the semantic dependence of the small caps glyphs on the respective standard characters.

To confirm what I said, here is a screen shot from DTL Open Type Master (OTM). As you can see, even the naked “A.sc” has no Unicode code point assigned:

No-Unicode.png.ee4da1efdcec8238e4939fada11cb872.png

You may say, “Well, the input fields below the glyph contour only relate to the Basic Multilingual Plane (BMP), so what about code points beyond this plane?” But OTM adds input fields for code points that lie beyond the BMP as soon as you assign such a code point to a glyph and export the font anew. For instance, when I assign “A.sc” to a code point in the Supplementary Private Use Area (PUA), here F0000, and do the export, the additional input fields immediately show up:
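For anyone who wants to verify where a code point such as F0000 sits, here is a small sketch with Python's standard library. The helper names are made up for illustration; the plane is simply the code point divided by 0x10000, and private-use code points carry the general category "Co".

```python
import unicodedata


def plane(code_point):
    """Return the Unicode plane number (0 is the BMP)."""
    return code_point >> 16


def is_private_use(code_point):
    """True if the code point belongs to a Private Use Area."""
    return unicodedata.category(chr(code_point)) == "Co"


# Capital A, the start of the BMP PUA, and the code point used above.
for cp in (0x0041, 0xE000, 0xF0000):
    print(f"U+{cp:04X}", "plane", plane(cp), "private use:", is_private_use(cp))
```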

Beyond-BMP.png.191f1869ff25b4d2a124277e3e254522.png

So you can really believe me that all of the small caps glyphs are unencoded in the version of Adobe Caslon Pro I took my screenshot from (this version came with an earlier iteration of Indesign, so it is by no means “unofficial”). And there are many other fonts where the same is the case. :)

1 hour ago, A_B_C said:

if these glyphs had a Unicode code point assigned, Fontlab would display them in the Glyph Panel

What are the numbers below the characters in your screenshot?

image.png.b2848521db462e8ec932c685cf9421d4.png


Seems you don’t believe me. ;-)

As I said in my explanation above, the greyed-out code points you can see below the glyphs just indicate the semantic dependence of the small caps glyphs on the respective standard characters. They don’t indicate any assigned code point. This is a peculiarity of the FontLab VI interface. I admit it is not very obvious, but you would have to ask the development team of FontLab VI why they decided to do it this way. I don’t like it either.

Here are two screen shots from Fontlab Studio 5, the earlier iteration of Fontlab, that didn’t have this peculiarity. You can clearly see that the small cap “A” is not encoded:

Not-encoded.thumb.png.2c9801469408de244acc664684f8752b.png

This is what an encoded glyph would look like in Fontlab Studio 5:

Encoded.thumb.png.fa8e6886a20054fee771523e284e289f.png

Hope this is convincing enough … :)

 

 


And you will note in the last screenshots how many glyphs in the said version of Adobe Caslon Pro are not encoded. It is all those that have the “- - -” in the glyph cell header. :)

22 hours ago, A_B_C said:

But aren’t we talking about IDML export here? :)

Brain fart!

But, I think the answer is sort of the same.
They have to be saving those glyph numbers in the IDML file to be able to call them back.

... Which is exactly what you said in your first post ...
 O.o

 


When I reviewed my second post today, I noticed what might have gone wrong there. Or rather, I believe my example was not suitably chosen. As described above, I had created a font with a “case” feature and applied that feature to an uppercase letter, in order to exchange it for an unencoded glyph. Applying this feature results in the following XML code in the IDML document:

AllCaps.png.b3c21998c2c91a4528341d1e7182772c.png

Without knowing the inner workings of the IDML importer, I can imagine why this may confuse the algorithms. As “A” is uppercase, the importer may consider the property “Capitalization = AllCaps” already applied. Hence there is no substitution. So that was perhaps an unfortunate example. I should have checked the resulting XML code before uploading.
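To check such attributes without opening the XML by hand, a few lines of Python are enough. This is a minimal sketch: the story fragment is hand-written for illustration and is not copied from my document; only the attribute `Capitalization="AllCaps"` is taken from the screenshot above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment of an IDML story (illustration only).
STORY_FRAGMENT = """
<ParagraphStyleRange>
  <CharacterStyleRange Capitalization="AllCaps">
    <Content>A</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>
"""


def capitalization_runs(xml_text):
    """Return (Capitalization value or None, text) for each character range."""
    root = ET.fromstring(xml_text.strip())
    runs = []
    for char_range in root.iter("CharacterStyleRange"):
        text = "".join(c.text or "" for c in char_range.iter("Content"))
        runs.append((char_range.get("Capitalization"), text))
    return runs


print(capitalization_runs(STORY_FRAGMENT))
```

An importer that sees an uppercase “A” together with AllCaps could plausibly decide there is nothing left to do, which is the confusion I suspect here.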

So in order to test the IDML importer, we would have to use an example that is closer to real life. I therefore replaced the “case” feature with a “c2sc” feature that has the same substitution rule as above. Then the resulting XML code is:

C2SC.png.dda82f6ee9efbeadb03414c567a74a01.png

But Publisher still refuses to apply the OpenType substitution. Instead, a faux small cap is created from the “A” glyph:

Publisher.png.fe3258a2d819e11ca3d42ea3804cd01d.png

Here are the documents, if you would like to check:

UnencodedSmallCaps.otf

UnencodedSmallCaps.idml


Regarding my first post, the problem seems to be that the IDML importer of Publisher doesn’t interpret the following XML context correctly:

Glyph.thumb.png.c6514e789c86c2ae3e16bf4057523bcd.png

Story_ucc.xml

As you can see, the glyph name “glyph” is registered in the IDML code of the story contained in my text frame. Indeed, there is a special syntax for handling custom glyphs in the IDML specification, as I noticed this morning.
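If you want to look at the story XML of your own documents: an IDML file is a ZIP package, and the stories sit in its Stories folder. Here is a minimal sketch for listing them. The demo package is a tiny stand-in built in memory, not my attached file, and the helper names are made up.

```python
import io
import zipfile


def story_names(idml_bytes):
    """List the story XML files inside an IDML package given as bytes."""
    with zipfile.ZipFile(io.BytesIO(idml_bytes)) as package:
        return sorted(
            name for name in package.namelist()
            if name.startswith("Stories/") and name.endswith(".xml")
        )


def make_demo_package():
    """Build a tiny stand-in package in memory (not a valid IDML document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("designmap.xml", "<Document/>")
        package.writestr("Stories/Story_ucc.xml", "<Story/>")
    return buffer.getvalue()


print(story_names(make_demo_package()))
```

For a real file, the bytes could be read with `open("Unencoded.idml", "rb").read()` and each story extracted with `ZipFile.read()` for inspection.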

@Pauls, I would be more than happy to provide additional examples if needed. If you haven’t already discovered the root of the issue, please attach these findings to your report (afb-3049). It would be just awesome, if that could be sorted out. :)


We have made fixes/improvements to these areas (Unencoded Glyphs Not Handled Properly) & (AllSmallCaps, SmallCaps and CapsToSmallCaps implementation) of the program in the latest Affinity Publisher beta. If you would like to try these changes the beta software is available in the forum posts listed below. Once Affinity Publisher has been through a full beta process the change will be released in a future free 1.8.0 update to all customers.

The 1.8.0 builds are in links at the top of these beta forum posts
