BrianUni Posted August 19, 2019 When I copy and paste text from a PDF file I exported from Affinity Publisher, I get weird characters. Text Not Rendering Properly in PDF Export.mp4
Wosven Posted August 19, 2019 � (a black diamond with a white question mark): this character usually represents a character that is missing in the font used by a program (for example, when writing in UTF-8 in an application or web page that uses a simpler encoding). Perhaps there are ligatures or other complex characters in your PDF, or special spaces, etc.? If you provide at least a sample PDF with those characters, someone will be able to explain it better.
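As a small illustration of where � (U+FFFD, the Unicode replacement character) commonly comes from, here is a minimal Python sketch. It assumes a byte that is valid in Latin-1 but is then decoded as UTF-8; this is only one possible cause and is not a diagnosis of the PDF in this thread.

```python
# "é" stored as a single Latin-1 byte (0xE9).
raw = "\u00e9".encode("latin-1")

# Decoding that byte as UTF-8 fails, so Python substitutes U+FFFD
# when asked to replace undecodable input instead of raising an error.
decoded = raw.decode("utf-8", errors="replace")

print(repr(decoded))
```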
BrianUni Posted August 19, 2019 Thanks for your reply! Attached is the PDF in question. 8-19-19 Quote Universal Healthcare - Fuquay-Varina .pdf
Wosven Posted August 19, 2019 Thanks, I hope @LibreTraining or some other font gurus will help with this one. Oops, I clicked too fast: it's the ligatures (tt and ti) that aren't recognized when copied: Des�ni Sco�
Wosven Posted August 19, 2019 According to the answers on this page: "It depends on the name used in the font for the "ffi" etc. glyph. If the name is the standard (f_f_i) then intelligent pdf-readers like acrobat are able to separate the glyph in its parts. If a non-standard name is used (like e.g. in old versions of the cm-fonts) then the reader cannot identify the glyph components. So how copy & paste works depends 1. on the font and 2. on the pdf-reader."
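The naming convention mentioned in the quote can be sketched in Python. This is a simplified illustration in the spirit of the Adobe glyph naming convention, not what any particular PDF reader actually does: it handles only single-letter components and `uniXXXX` components, and the function name `glyph_name_to_text` is hypothetical.

```python
import re

def glyph_name_to_text(name):
    """Recover the text behind a ligature glyph name such as 'f_f_i'.

    Simplified assumption: each component is either a single character
    or a 'uniXXXX' name. Returns None when a component is not understood.
    """
    # Drop any suffix after the first period, e.g. 't_i.liga' -> 't_i'.
    base = name.split(".", 1)[0]
    chars = []
    for part in base.split("_"):
        m = re.fullmatch(r"uni([0-9A-F]{4})", part)
        if m:
            chars.append(chr(int(m.group(1), 16)))
        elif len(part) == 1:
            chars.append(part)
        else:
            return None  # non-standard name: components cannot be identified
    return "".join(chars)

print(glyph_name_to_text("f_f_i"))
print(glyph_name_to_text("t_t"))
```

A non-standard name such as `glyph123` returns `None` here, which mirrors the case in the quote where the reader cannot identify the glyph components.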
MikeW Posted August 19, 2019 Those ligatures have no code points in Calibri and are composed of a single glyph. Affinity products are not decomposing the parts and are just stuffing the glyph as is into an unused Unicode code point. They are named properly in the font file, so ID, for instance, properly encodes the underlying tt glyph (for example) as U+0074 0074, which are two /t characters.
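One way to check pasted text for this symptom is to scan it for characters that do not correspond to real text. The Python sketch below flags the replacement character (U+FFFD) plus private-use and unassigned code points; the function name `find_unmapped` is hypothetical, and which code point a given export actually uses is not known from this thread.

```python
import unicodedata

def find_unmapped(text):
    """Return (index, code point) pairs for characters that look unmapped.

    Flags U+FFFD, private-use characters (category 'Co'), and
    unassigned code points (category 'Cn').
    """
    hits = []
    for i, ch in enumerate(text):
        if ch == "\ufffd" or unicodedata.category(ch) in ("Co", "Cn"):
            hits.append((i, f"U+{ord(ch):04X}"))
    return hits

# Pasted text where the "ti" ligature came through as U+FFFD.
print(find_unmapped("Des\ufffdni"))
# Correctly decomposed text yields no hits.
print(find_unmapped("Scott"))
```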
BrianUni Posted August 19, 2019 Do you think this is an issue with Affinity Publisher, or with the PDF reader? I just downloaded Adobe Acrobat Reader DC, then used the copy/paste function, and it gave me squares, not question marks.
MikeW Posted August 19, 2019 Just now, BrianUni said: "Do you think this is an issue with Affinity Publisher, or with the PDF reader? I just downloaded Adobe Acrobat Reader DC, then used the copy/paste function, and it gave me squares, not question marks." It's an issue with how Affinity applications are writing the PDF. The issue of the boxed question marks versus the rectangle isn't important per se.
BrianUni Posted August 19, 2019 Sorry, Mike. Hadn't refreshed my page. So it's a Calibri issue! I used Arial and it solved the issue. Thanks for your help!
MikeW Posted August 19, 2019 Just now, BrianUni said: "Sorry, Mike. Hadn't refreshed my page. So it's a Calibri issue! I used Arial and it solved the issue. Thanks for your help!" We are cross-posting... It's not really a font issue, in that both fonts are properly made. Arial has no ligatures that would trigger this issue. Calibri does have those ligatures.
BrianUni Posted August 19, 2019 Ok, I see that connection with the tt in the Calibri font. Thanks for your help, Mike!
kenmcd Posted August 22, 2019 @BrianUni Did you test this with the same document exported to PDF using InDesign? If yes, I would like to see it. I have been playing with this since you first posted. I have been looking at the actual ToUnicode table in the PDFs for each test and would like to see one from ID too.
MikeW Posted August 22, 2019 30 minutes ago, LibreTraining said: "@BrianUni Did you test this with the same document exported to PDF using InDesign? If yes I would like to see it. Been playing with this since you first posted. I have been looking at the actual ToUnicode table in the PDFs for each test and would like to see one from ID too." On 8/19/2019 at 12:55 PM, MikeW said: "...They are named properly in the font file, so ID, for instance, properly encodes the underlying tt glyph (for example) as U+0074 0074, which are two /t characters." I did, and the result is in the post of mine I quoted.
BrianUni Posted August 23, 2019 Copy Paste Directly From Affinitny Publisher.mp4
Pauls (Staff) Posted September 16, 2019 Can we get the original afpub file, please? It can be uploaded here
BrianUni Posted September 16, 2019 Done! Thanks, Paul!
lacerto Posted September 16, 2019 (...)
lacerto Posted September 16, 2019 (...)
kenmcd Posted September 17, 2019 Thanks for the ID PDF. These PDFs more or less confirmed what I was thinking based on reviewing the actual ToUnicode tables in the PDFs.
When APub exports a font subset, it does not put the correct glyph number in the ToUnicode table. It appears to just write incrementing numbers in that field. That is why we see glyph numbers like #03, which is just wrong. The character it gets mapped to is also wrong: it is often simply [20], which is the Unicode space code point.
When APub prints to a PDF printer and embeds the entire font, it does a bit better. I saw in the ToUnicode table that there were now what looked like actual glyph numbers. It does actually have the correct glyph number for the "ti" ligature (#415), but it still does not connect that to the correct Unicode code points. In my test PDF with the full font embedded it maps to: LATIN CAPITAL LETTER O WITH MIDDLE TILDE [19F] - which is obviously wrong.
In your ID PDF, which only has a subset embedded, it correctly identifies it as glyph #415 in the font, and maps it to two Unicode code points: LATIN SMALL LETTER T [74] + LATIN SMALL LETTER I [74] - (Note this could be an error in my PDF tool as small letter i is actually [69] - or ID messed up). The small letter t is correct as [74]. So it has the correct glyph number if you have the font installed. And it maps to the correct multiple Unicode code points if not.
The "fi" ligature is different in that it actually has a Unicode code point. In your ligatures_apub2 PDF, APub incorrectly sets the glyph number as 11 (should be #302), but it does map it to the correct Unicode code point: LATIN SMALL LIGATURE FI [FB01].
So APub is inserting both wrong glyph numbers and wrong Unicode code points. Sometimes it does get one or the other correct, but not both at the same time.
Since we have not heard from any Affinity folks about this, I assume that they know this is not working properly and currently have the fire hose aimed elsewhere.
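For readers who want to inspect such mappings themselves, here is a minimal Python sketch that parses `bfchar` entries from the text of a ToUnicode CMap. The sample CMap is invented for illustration and is not taken from any PDF in this thread; it assumes the source codes equal the glyph numbers from the post (415 = 0x19F for "ti", 302 = 0x12E for "fi") and that destinations are UTF-16BE hex strings. The function name `parse_bfchar` is hypothetical.

```python
import re

# Invented sample: what a correct mapping for the two ligatures might look like.
SAMPLE_CMAP = """\
2 beginbfchar
<019F> <00740069>
<012E> <FB01>
endbfchar
"""

def parse_bfchar(cmap_text):
    """Map each source code to its Unicode string from bfchar sections."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            # The destination is a UTF-16BE string, so one source code
            # can map to several characters (e.g. "t" followed by "i").
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

for code, text in parse_bfchar(SAMPLE_CMAP).items():
    print(code, [f"U+{ord(c):04X}" for c in text])
```

In this sample, code 415 maps to U+0074 followed by U+0069, which is the kind of two-code-point mapping the ID export is described as producing. Note also that 415 in decimal is 19F in hexadecimal, so the wrong [19F] mapping reported above has the same numeric value as the glyph number; whether that is the actual cause is only a guess.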
MikeW Posted September 17, 2019 1 hour ago, LibreTraining said: "...In your ID PDF, which only has a subset embedded, it correctly identifies it as glyph #415 in the font, and maps it to two Unicode code points: LATIN SMALL LETTER T [74] + LATIN SMALL LETTER I [74] - (Note this could be an error in my PDF tool as small letter i is actually [69] - or ID messed-up). The small letter t is correct as [74]..." The tool is in error. ID, etc., does this correctly.
lacerto Posted September 17, 2019 (...)
kenmcd Posted September 17, 2019 This morning I realized an error in my thinking. When I print to PDF to force embedding of the whole font, it is the PDF printer's rendering engine that does the glyph/Unicode mapping, not the APub PDF rendering engine. So one of the many non-Adobe PDF printers may work correctly (as a workaround for now). I will have to test some others.