
Ligatures not recognised in pdf



When you open a PDF file that contains ligatures in Publisher, the ligatures render incorrectly.
They are not recognised and/or not displayed correctly.
They are also not replaced by, or editable as, the original letter combination.

For example:
original letter combination of the ligature -> letter combination shown in Affinity Publisher

'tf' -> 'Ō'

'ti' -> 'Ï '  or 'ti' -> 'Ó'

'tt' -> 'R '

The real characters could not be copied in this post, but this is how they look.

Schermafbeelding 2019-03-11 om 10.40.56.png

Schermafbeelding 2019-03-11 om 10.39.53.png


Thanks for the PDF.
There have been some other posts regarding ligatures so I wanted to take a closer look at the issue.

The bad news is I do not see any way this can reliably be imported (to any application).

I focused on the "ti" ligature in "innovatieve" to see what is happening.
When editing or viewing the PDF that text is copied as "innova4eve" in my tests.
The "ti" ligature glyph that is displayed in that word actually has the Unicode for the number 4.
So when it is cut-and-pasted that is what appears.
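
To make that failure concrete, here is a toy Python sketch of what seems to be happening. It is not how any real PDF library works: the glyph code 900 and both lookup tables are invented for illustration. As far as I understand it, the lookup table a PDF can carry for a font is allowed to map one glyph to more than one character, so the second table below is possible in principle, but only if the exporting application writes it that way.

```python
# Toy model of PDF text extraction: a page stores glyph codes, and a
# per-font lookup table (hypothetical here) turns them back into text.

def extract_text(glyph_codes, to_unicode):
    """Map each glyph code to its text; unknown codes become U+FFFD."""
    return "".join(to_unicode.get(code, "\ufffd") for code in glyph_codes)

# Invented glyph codes for "innovatieve"; 900 stands in for the "ti" ligature glyph.
word = [ord(c) for c in "innova"] + [900] + [ord(c) for c in "eve"]

# Identity mapping for the ordinary letters (an assumption made for brevity).
letters = {ord(c): c for c in "inovae"}

bad_table = {**letters, 900: "4"}    # ligature glyph labelled as the digit 4
good_table = {**letters, 900: "ti"}  # ligature glyph labelled as two letters

print(extract_text(word, bad_table))   # what the Sketch PDF appears to give
print(extract_text(word, good_table))  # what a well-labelled PDF would give
```

With the first table the word comes out as "innova4eve", matching what I see when copying from the PDF. The importing application only ever sees the table, so it cannot tell that the "4" is wrong.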

The font used is Calibri-Light so I took a look at the Win7 and Win10 versions of that font.
The "ti" ligature does not have a Unicode point (in this or any font).
To get the right glyph you would have to know where to find it in that font (the glyph #).
Apparently when fonts are sub-setted the only way to find the correct glyph is if a particular Unicode  mapping table is included in the PDF, but even then it only maps to the Unicode character names.
The "ti" ligature does not have a Unicode character name.

There is no standard code point or glyph number for the majority of ligatures.
There are a few ligatures which do have Unicode points purely for legacy support.
But for most ligatures it is up to the font designer where they put them (so they are all different).
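
For the few legacy ligatures that do have code points, the mapping back to plain letters is built into Unicode itself. A minimal Python sketch using only the standard library (the function name `expand_ligatures` is my own, and note that NFKC normalisation also rewrites other compatibility characters, not just ligatures):

```python
import unicodedata

# The Latin ligatures in the Alphabetic Presentation Forms block (U+FB00..U+FB06).
LEGACY_LIGATURES = [chr(cp) for cp in range(0xFB00, 0xFB07)]

def expand_ligatures(text):
    """Replace compatibility characters, including these ligatures, with plain letters."""
    return unicodedata.normalize("NFKC", text)

# List each legacy ligature with its code point, Unicode name, and expansion.
for lig in LEGACY_LIGATURES:
    print(f"U+{ord(lig):04X} {unicodedata.name(lig)} -> {expand_ligatures(lig)}")
```

There is no such entry for "ti", "tt" or "tf", which is why nothing comparable can be done for them from the code point alone.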

So based on what I have found and understand ... there is no way to reverse decode these ligatures.

 


I also had problems with ligatures being wrongly translated from PDF to AP. I exported the original text to PDF from the Mellel word processor, where everything looked OK, but importing into AP went wrong. I temporarily solved it by exporting from Mellel with all ligatures disabled; that PDF imported into AP without problems. Maybe this is an option for you.


The ligatures are not "wrongly translated" or "imported wrong" by APub.
The problem is how the ligatures are embedded in the PDF.
The ligature outline is embedded with no connection back to the actual character in the actual font.

The export of the PDF without ligatures is a good workaround (if possible).
That way all of the characters have an actual Unicode point which can be recovered (or imported).

 


11 hours ago, LibreTraining said:

The "ti" ligature does not have a Unicode point (in this or any font).

To get the right glyph you would have to know where to find it in that font (the glyph #).
Apparently when fonts are sub-setted the only way to find the correct glyph is if a particular Unicode  mapping table is included in the PDF, but even then it only maps to the Unicode character names.
The "ti" ligature does not have a Unicode character name. 

There is no standard code point or glyph number for the majority of ligatures.

That's quite bizarre.

Turning them off (when possible) is no big deal to me.
I didn't expect it would be that hard / impossible to fix.


Yes, it seems quite bizarre to me too.

I am going to test and see what happens when the full font is embedded, not just a subset.
I wonder if then the connection is made to the correct glyph in the correct font.
The font sub-setting is definitely a problem so I wonder if the full font embedding fixes this issue.
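
For anyone who wants to check whether a given PDF uses subset fonts: subset font names carry a tag of six uppercase letters followed by "+" in front of the real font name. A small Python sketch of that check (the helper names and the example font names are made up; you would get the real names from the font list of whatever PDF tool you use):

```python
import re

# A subset font name starts with a tag of exactly six uppercase letters and "+".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")

def is_subset_font(font_name):
    """Return True if the embedded font name carries a subset tag."""
    return bool(SUBSET_TAG.match(font_name))

def strip_subset_tag(font_name):
    """Return the font name without its subset tag, if it has one."""
    return SUBSET_TAG.sub("", font_name)

# Hypothetical names as they might appear in a PDF's font list.
for name in ["ABCDEF+Calibri-Light", "Calibri-Light"]:
    print(name, is_subset_font(name), strip_subset_tag(name))
```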

But this also assumes the application makes the proper connection to the embedded font,
and I am not sure if APub can do this - there seem to be a number of issues with recognizing the fonts.
If the PDF library used in APub does not allow/enable this it simply will never work.

One of my PDF printers can force embed all fonts in the PDF (optionally).
One of my PDF editors can actually install the fonts from the PDF (optionally).
So I am going to test what happens with ligatures when the full font is embedded,
and the reader/editor can use that full font.
I want to know if it is possible with applications that have the right capabilities.

Adobe applications will never embed the full font even if it is a free, open source, "installable" font.
I assume this is their way of making sure all embedded fonts are basically broken.
Since sub-setted fonts in those PDFs will never work properly for ligatures,
any old ID PDF we try to import could (will) have problems if there are any ligatures.
There are many other characters/glyphs in fonts which could also be a problem.

So I think it would be helpful to determine exactly what is and is not possible on PDF import.
And then document that to prevent users from wasting time figuring it out over and over again.


My head is spinning ...

Tested with fonts Calibri-Light, Bookmania, and Vollkorn.

As expected the standard ligatures which are Unicode points (fi, fl, ffi, ffl, etc.) always work.

LibreOffice > Export to PDF (subset fonts)  > Open PDF in APub.
The standard ligatures all seemed to import OK (even the non-unicode ligatures).
None of the discretionary ligatures imported correctly.

LibreOffice > Print to PDF printer (embed full fonts) > Open PDF in APub.
The standard ligatures all seemed to import OK (even the non-unicode ligatures).
And the discretionary ligatures all seemed to import OK too.

Even the "ti" ligature in Calibri-Light always came across in the import.

I have not checked how those imported characters are coded.
Are they just outlines with odd codes like in your original above? Or?
Have to check.

So the writing application has a definite effect on how well it works.
The print to PDF with full font embeds brought over all ligatures on import to APub.

Why do the Calibri-Light "ti" standard ligatures come over when yours did not?
That can only be attributed to how the application that writes the PDF creates it.

I tested more things than just the above and now my brain is tired of this.
Need a break.

 


Thanks for the extensive research.

Please note that I picked a random PDF, which just happens to have been made by Sketch.

It was the first pdf I tried, so I thought it would be the same for any pdf-file.
It's not very likely many people would try to import a pdf from Sketch.

Today I tried another PDF with the same font, this time exported from InDesign CS6, and it worked perfectly.
Also the result of the import was much cleaner.
The pdf from Sketch was a bit of a mess, with strangely oversized text-frames and hard to select items.
The converted InDesign pdf was very VERY clean and easy to edit.
It only had some issues with texts with applied letter-spacing.
 


26 minutes ago, A for Design said:

It's not very likely many people would try to import a pdf from Sketch.

If you did, chances are someone else will, so even if it turns out to be a help file comment like "Note that PDF files generated by some applications and containing ligatures may not import cleanly due to the manner in which some applications encode the text within the file" it is probably worth making people aware of.


  • 3 months later...

I've noticed problems with ligatures in imported PDFs, too, as well as other misalignment of text. I've continued to see problems after rinsing the PDF through PostScript and re-distilling with Acrobat, and also after re-saving from Preview, so blaming the PDF is not viable. The PDFs display fine in Illustrator and other graphics programs.

The only solution I have found is to use Acrobat or Ghostscript to outline the fonts in the PDF; Designer (or Publisher) will then display the PDF correctly.
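
For the Ghostscript route, a rough Python wrapper might look like the sketch below. It assumes a reasonably recent Ghostscript whose pdfwrite device supports the `-dNoOutputFonts` switch, and that the executable is called `gs` (on Windows it is usually `gswin64c`); the function names and file names are my own. Keep in mind that outlined text is no longer editable as text after import.

```python
import shutil
import subprocess

def outline_fonts_command(src, dst, gs="gs"):
    """Build a Ghostscript command that rewrites src with text converted to outlines."""
    return [gs, "-o", dst, "-sDEVICE=pdfwrite", "-dNoOutputFonts", src]

def outline_fonts(src, dst):
    """Run Ghostscript if it is installed; raise if it is missing or fails."""
    gs = shutil.which("gs")  # assumption: the executable is named "gs"
    if gs is None:
        raise FileNotFoundError("Ghostscript executable 'gs' not found on PATH")
    subprocess.run(outline_fonts_command(src, dst, gs), check=True)

# Show the command without running it (file names are placeholders).
print(" ".join(outline_fonts_command("input.pdf", "outlined.pdf")))
```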
