
Text Not Rendering Properly with Copy/Paste From PDF



(a black diamond with a white question mark): this character usually represents a character that is missing in the font used by a program (for example, when writing UTF-8 in an application or web page that uses a simpler encoding).

Perhaps there are ligatures or other complex characters in your PDF, or special spaces, etc.?
If you provided at least a sample PDF with those characters, someone would be able to explain it better.
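To illustrate one common way that replacement character shows up, here is a minimal sketch in Python. It is only an illustration of an encoding mismatch, not a diagnosis of this particular PDF; the choice of the "fi" ligature as the sample character is an assumption made for the example.

```python
# Illustration only: the "fi" ligature (U+FB01) is encoded as UTF-8,
# then decoded by something that only understands ASCII.
ligature = "\ufb01"
raw = ligature.encode("utf-8")  # multi-byte UTF-8 sequence

# errors="replace" substitutes U+FFFD (the replacement character)
# for every byte the ASCII decoder cannot interpret.
decoded = raw.decode("ascii", errors="replace")

print([f"U+{ord(ch):04X}" for ch in decoded])
```

Every character in the result is U+FFFD, which most fonts draw as the black diamond with a question mark.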

 


Judging from the answers on this page:

Quote

It depends on the name used in the font for the "ffi" etc. glyph. If the name is the standard one (f_f_i), then intelligent PDF readers like Acrobat are able to separate the glyph into its parts. If a non-standard name is used (as in, e.g., old versions of the cm-fonts), then the reader cannot identify the glyph components. So how copy & paste works depends 1. on the font and 2. on the PDF reader.

 


Those ligatures have no code points in Calibri and are composed of a single glyph. Affinity products are not decomposing the parts and are just stuffing the glyph as is into an unused Unicode code point.

They are named properly in the font file, so ID, for instance, properly encodes the underlying tt glyph (for example) as U+0074 0074, which are two /t characters.
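As a rough sketch of what "named properly" buys you: with the standard naming convention, an underscore-joined glyph name can be split back into its component characters. The function below is a simplified, hypothetical illustration in Python. It handles only single-letter components and `uniXXXX` names, and it is not how InDesign or Acrobat is actually implemented.

```python
def glyph_name_to_text(name: str) -> str:
    """Simplified sketch: turn a ligature glyph name into its characters.

    Handles only single ASCII letters and "uniXXXX" components; a real
    implementation would also consult the full glyph-name list.
    """
    base = name.split(".", 1)[0]  # drop suffixes such as ".liga"
    chars = []
    for part in base.split("_"):  # "t_t" -> ["t", "t"]
        if part.startswith("uni") and len(part) == 7:
            chars.append(chr(int(part[3:], 16)))  # "uniFB01" -> U+FB01
        elif len(part) == 1 and part.isascii() and part.isalpha():
            chars.append(part)
        else:
            raise ValueError(f"unknown glyph name component: {part!r}")
    return "".join(chars)


print(glyph_name_to_text("t_t"))    # two t characters, U+0074 U+0074
print(glyph_name_to_text("f_f_i"))
```

A glyph with a non-standard name falls through to the `ValueError` branch here, which is the situation where a reader cannot identify the components.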


Just now, BrianUni said:

Do you think this is an issue with Affinity Publisher, or with the PDF reader?

I just downloaded Adobe Acrobat Reader DC, then used the copy/paste function, and it gave me squares, not question marks. 

It's an issue with how Affinity applications are writing the PDF.

The issue of the boxed question marks versus the rectangle isn't important per se.


Just now, BrianUni said:

Sorry, Mike.  Hadn't refreshed my page. 

So it's a Calibri issue!

I used Arial and it solved the issue.

Thanks for your help!

We are cross posting...

It's not really a font issue, in that both fonts are properly made. In the Arial instance, there are no ligatures that would affect this issue. Calibri does have those ligatures.


30 minutes ago, LibreTraining said:

@BrianUni

Did you test this with the same document exported to PDF using InDesign?
If yes I would like to see it.

Been playing with this since you first posted.
I have been looking at the actual ToUnicode table in the PDFs for each test and would like to see one from ID too.

 

On 8/19/2019 at 12:55 PM, MikeW said:

...They are named properly in the font file, so ID, for instance, properly encodes the underlying tt glyph (for example) as U+0074 0074, which are two /t characters.

I did and the result is in the post of mine I quoted.


  • 4 weeks later...

Thanks for the ID PDF.

These PDFs kinda confirmed what I was thinking based on reviewing the actual ToUnicode tables in the PDFs.

When APub exports a font subset, it does not put the correct glyph number in the ToUnicode table. It appears to just write incrementing numbers in that field, which is why we see glyph numbers like #03, which is just wrong. The character each one gets mapped to is also wrong: it is often simply [20], which is the Unicode space code point.

When APub prints to a PDF printer and embeds the entire font, it does a bit better. I saw in the ToUnicode table that there were now what looked like actual glyph numbers. It does actually have the correct glyph number for the "ti" ligature (#415), but it still does not connect that to the correct Unicode code points.
In my test PDF with the full font embedded it maps to: LATIN CAPITAL LETTER O WITH MIDDLE TILDE [19F] - which is obviously wrong.
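One thing worth checking with simple arithmetic: glyph number 415 written in hexadecimal is 19F. So the wrong mapping above looks as if the glyph number itself was written where the Unicode code point should go. That is only an inference from the two numbers in this post, not something confirmed from the PDF. A quick Python check:

```python
import unicodedata

glyph_id = 415                    # glyph number of the "ti" ligature, from the post
as_code_point = chr(glyph_id)     # treat the glyph number as if it were a code point

print(f"{glyph_id} decimal = {glyph_id:X} hex")
print(unicodedata.name(as_code_point))
```

The character name printed is the same one the full-font test PDF reported for [19F].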

In your ID PDF, which only has a subset embedded, it correctly identifies it as glyph #415 in the font, and maps it to two Unicode code points: LATIN SMALL LETTER T [74] + LATIN SMALL LETTER I [74] - (Note this could be an error in my PDF tool as small letter i is actually [69] - or ID messed-up). The small letter t is correct as [74].
So it has the correct glyph number if you have the font installed.
And it maps to the correct multiple Unicode code points if not.
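For reference, here is a small sketch of what a correct ToUnicode entry for that ligature could look like. It assumes two-byte character codes that equal the glyph number (as with an Identity-H encoded font), which is an assumption on my part and not something read out of either PDF. The helper name `bfchar_entry` is made up for the example.

```python
def bfchar_entry(glyph_id: int, text: str) -> str:
    """Format one ToUnicode 'bfchar' line: <source code> <UTF-16BE text>.

    Assumes a two-byte source code equal to the glyph number.
    """
    src = f"{glyph_id:04X}"
    dst = text.encode("utf-16-be").hex().upper()
    return f"<{src}> <{dst}>"


# Glyph #415 ("ti" ligature) mapped to the two characters t and i.
print(bfchar_entry(415, "ti"))  # prints "<019F> <00740069>"
```

The destination holds two code points, 0074 for t and 0069 for i, which is what lets a reader paste "ti" instead of a placeholder.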

The "fi" ligature is different in that it actually has a Unicode code point.
In your ligatures_apub2 PDF APub incorrectly sets the glyph number as 11 (should be #302), but it does map it to the correct Unicode code point: LATIN SMALL LIGATURE FI [FB01].
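The difference is easy to confirm with Python's standard `unicodedata` module: the "fi" ligature is a real Unicode character, and compatibility normalization (NFKC) breaks it back into plain letters. This is just a check of the Unicode side, not of anything APub does.

```python
import unicodedata

fi = "\ufb01"  # the single-character "fi" ligature

print(unicodedata.name(fi))               # LATIN SMALL LIGATURE FI
plain = unicodedata.normalize("NFKC", fi)
print(plain, len(plain))                  # fi 2
```

There is no equivalent single code point for "ti" or "tt", which is why those ligatures have to be mapped to multiple code points instead.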

So APub is inserting both wrong glyph numbers, and wrong Unicode code points.
Sometimes it does get one or the other correct, but not at the same time.
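To make that kind of checking less manual, here is a minimal sketch that decodes a single ToUnicode `bfchar` line back into a glyph number and text. The two sample lines are reconstructed by hand from the values mentioned in this thread and are hypothetical; they were not copied out of the actual PDFs. The helper name `decode_bfchar` is also made up.

```python
import re
from typing import Tuple

# Matches "<hex> <hex>" as found inside a beginbfchar/endbfchar section.
BFCHAR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def decode_bfchar(line: str) -> Tuple[int, str]:
    """Return (source code as int, destination text) for one bfchar line."""
    match = BFCHAR.search(line)
    if match is None:
        raise ValueError(f"not a bfchar line: {line!r}")
    code = int(match.group(1), 16)
    text = bytes.fromhex(match.group(2)).decode("utf-16-be")
    return code, text


# Hypothetical lines, hand-built from the numbers discussed above.
for line in ["<019F> <00740069>", "<019F> <019F>"]:
    code, text = decode_bfchar(line)
    print(code, [f"U+{ord(ch):04X}" for ch in text])
```

The first line decodes to glyph 415 with the two code points for "ti"; the second decodes to glyph 415 with the single code point 019F.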

Since we have not heard from any Affinity folks about this I assume that they know this is not working properly and currently have the fire hose aimed elsewhere. :D


1 hour ago, LibreTraining said:

...In your ID PDF, which only has a subset embedded, it correctly identifies it as glyph #415 in the font, and maps it to two Unicode code points: LATIN SMALL LETTER T [74] + LATIN SMALL LETTER I [74] - (Note this could be an error in my PDF tool as small letter i is actually [69] - or ID messed-up). The small letter t is correct as [74]...

The tool is in error. ID, etc., does this correctly.

[Attached image: Capture_000219.png]


I did realize this morning an error in my thinking.
When I print to PDF to force embedding of the whole font, it is the PDF printer's rendering engine that does the glyph/Unicode mapping, not the APub PDF rendering engine.
So one of the many non-Adobe PDF printers may work correctly (as a work-around for now).
Have to test some others.
