
Trouble with loading text from PDFs: "ti", "tt", "ft"



I am having an issue when loading a PDF document. The document displays correctly in Adobe Reader, but when it is opened in Affinity the following text errors occur:

"ti" becomes "I " (space intentional) or "Ł " or "È"
"ti" (bold intentional) becomes "Ĉ"
"tt" becomes "N"
"ft" becomes "J " (space intentional)

I may have missed some examples. What's also interesting is that this error is not consistent: some words are corrupted, others are not, and there doesn't seem to be a pattern as to which are corrupted or not. The error becomes permanent if the document is saved as a PDF from Affinity (that is, Adobe Reader shows the corrupted text rather than the intended text in the new PDF that Affinity creates). However, if I simply close the document in Affinity without saving and open the original in Reader, there is no error.

This issue has appeared in Affinity versions from a year ago as well as in the current one. I have searched and not found anyone else discussing this issue.

I am using Windows 10 64-bit, and when opening the PDF in Affinity I select "Load All Pages", Estimate/Estimate, and "Group lines of text into text frame."
In the options dialog box, Affinity reports that all fonts used by the document are available.

I am really not sure what to do, but as you can imagine, with over 100 pages of text, there are far too many typos to fix by hand every time I have to reload a PDF file into Affinity.

Edit: I am using only Affinity Publisher, and I have also tried checking "Favor Editable Text over Fidelity" (it didn't help).

Edited by Faolind
Edit for additional useful information

Welcome to the Affinity forums @Faolind!

Are the fonts used in the PDF installed on your system? Could you upload a sample page of that PDF for others to check? An alternative would be trying to save to text from a capable PDF reader.

------
Windows 10 | i5-8500 CPU | Intel UHD 630 Graphics | 32 GB RAM | Latest Retail and Beta versions of complete Affinity range installed


9 hours ago, Joachim_L said:

Welcome to the Affinity forums @Faolind!

Are the fonts used in the PDF installed on your system? Could you upload a sample page of that PDF for others to check? An alternative would be trying to save to text from a capable PDF reader.


All fonts are installed; I've just checked. One of the fonts in question is Calibri, another is Papyrus. As Calibri is ubiquitous, I didn't think to check.

The document itself is business-sensitive. However, let me see if I can find a page that contains nothing sensitive...

Attached: "Page 7" is the altered page, exported by Affinity.
"Page 7 (Original)" is the original text, exported from Reader using Print to PDF. I currently have no other way to export it outside of Affinity, so while this erases the original encoding, it is the best comparison I have. You may notice a couple of small errors similar to those in the Affinity export. This is because the problem has happened several times; the first time, we simply tried to correct it manually and clearly missed some on that pass.

Page 7.pdf Page 7 (original).pdf


14 hours ago, Faolind said:

I have searched and not found anyone else discussing this issue.

I’m surprised you haven’t found any other discussions of this or similar issues. The character sequences “ti”, “tt” and “ft” are commonly replaced by ligatures, but if those ligatures are used in the PDF file and your installed copy of the font in question doesn’t have the expected glyphs at the specified codepoints you’ll get the effect that you describe.
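To make the ligature explanation concrete: Unicode encodes only a handful of Latin ligatures as single characters, in the Alphabetic Presentation Forms block (U+FB00 to U+FB06), and "ti", "tt" and "ft" are not among them. A PDF writer therefore has to record separately which letters such a glyph stands for (in a PDF this is normally the font's ToUnicode map); if that record is missing or misread, an importer can only guess from the raw glyph code. That this is the cause in this particular thread is an assumption, not a confirmed diagnosis. A minimal sketch using only Python's standard library, listing which sequences have a dedicated code point:

```python
import unicodedata

def unicode_ligatures(start=0xFB00, end=0xFB06):
    """Map the NFKC expansion of each ligature code point to the ligature.

    U+FB00..U+FB06 are ff, fi, fl, ffi, ffl, long-s t and st. Both of the
    last two normalize to "st", so the later one (U+FB06) wins in the dict.
    """
    result = {}
    for cp in range(start, end + 1):
        ch = chr(cp)
        result[unicodedata.normalize("NFKC", ch)] = ch
    return result

ligatures = unicode_ligatures()
for seq in ("fi", "fl", "ti", "tt", "ft"):
    status = "has its own code point" if seq in ligatures else "no code point"
    print(seq, status)
```

Only "fi" and "fl" report a code point here; the three sequences from this thread have none, which is why their ligature glyphs depend entirely on the PDF's mapping data to be turned back into text.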

15 hours ago, Faolind said:

Edit: I am using only Affinity Publisher, and I have also tried checking "Favor Editable Text over Fidelity" (it didn't help).

That only affects layout (e.g. keeping chunks of text together for easier editing), so the issue that you’ve described is unrelated.

Alfred
Affinity Designer/Photo/Publisher 2 for Windows • Windows 10 Home/Pro
Affinity Designer/Photo/Publisher 2 for iPad • iPadOS 17.4.1 (iPad 7th gen)


21 minutes ago, Alfred said:

I’m surprised you haven’t found any other discussions of this or similar issues. The character sequences “ti”, “tt” and “ft” are commonly replaced by ligatures, but if those ligatures are used in the PDF file and your installed copy of the font in question doesn’t have the expected glyphs at the specified codepoints you’ll get the effect that you describe.

Is there a way to fix this? Calibri shouldn't have this sort of problem, right? Also is there a way to verify this is the issue, by checking my font files, for example?


14 minutes ago, Faolind said:

Is there a way to fix this? Calibri shouldn't have this sort of problem, right? Also is there a way to verify this is the issue, by checking my font files, for example?

Unless you have access to a font editor, I suspect that the simplest way to check would be to create some sample text and export it to PDF twice: once with the font subsetting option switched on and once with it switched off. Then open the resultant PDF files and look for differences between them.

Alfred

4 hours ago, Faolind said:

Is there a way to fix this? Calibri shouldn't have this sort of problem, right? Also is there a way to verify this is the issue, by checking my font files, for example?

Nothing to do with the fonts.

Your "original" PDF above has no text in it.
It is a bunch of oddly spaced drawings.
Try to "find" any word; you will get nothing in the search results.
Try to highlight text to copy it - nothing.

I have no idea how Affinity is getting any text out of that.

Also look at the spacing in the first paragraph.
It is not left-justified.
It is not fully-justified.
It has a bunch of weird spaces between the words.
So Affinity is guessing how to put it back together into actual text.

The issue appears to be with the original source.


26 minutes ago, LibreTraining said:

Your "original" PDF above has no text in it.
It is a bunch of oddly spaced drawings.

I'm afraid your conclusion is wrong. As I stated, that page is a snippet made with "Print to PDF", which "prints" the displayed PDF page as a series of images. As I noted:

5 hours ago, Faolind said:

this erases the original encoding

but I don't have another way to snip out a non-sensitive page from the rest of the sensitive material.

So, while I appreciate your help, your conclusion is mistaken.


1 hour ago, Faolind said:

but I don't have another way to snip out a non-sensitive page from the rest of the sensitive material.

Can't you just Export-to-PDF that one page from the other application?
Even printing to a PDF printer should preserve the text and embed the fonts properly.
So, you are right, I do not understand what you are doing, or why.

Calibri does have standard ligatures for ti, tt, and ft.
Your PDF writing application may or may not be encoding those in a way that can be imported.
The only way to tell is by looking at the actual original PDF text to see the codes behind it.
Then we can tell if it is a problem with APub or the writing app.

Note: one work-around is to disable the standard ligatures in the source app.
That way individual characters (not ligatures) get embedded in the PDF.
Then when re-opened in the target app those standard ligatures will automatically re-appear in the text.
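Looking at "the codes behind" the text usually means inspecting the ToUnicode CMap that a PDF can attach to each font. In a well-formed file, a ligature glyph maps to a string of two characters, so an importer can recover "ti" from a single glyph. The Python sketch below parses only the `bfchar` entries of a CMap that has already been extracted and decompressed (in real PDFs the CMap stream is usually compressed, so you would pull it out with a PDF inspection tool first). The glyph codes in the sample are invented for illustration, and `bfrange` entries are deliberately ignored.

```python
import re

# Made-up ToUnicode CMap fragment: the glyph codes <0101>..<0103> are
# hypothetical; real codes depend on the font and on subsetting.
SAMPLE_CMAP = """
3 beginbfchar
<0101> <00740069>
<0102> <00740074>
<0103> <00660074>
endbfchar
"""

BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]*)>")

def parse_bfchar(cmap_text):
    """Return {glyph code: Unicode text} for every bfchar entry.

    Destination strings are UTF-16BE, so one glyph code can map to several
    characters; that is how a single ligature glyph maps back to "ti".
    """
    mapping = {}
    for block in BFCHAR_BLOCK.findall(cmap_text):
        for src, dst in PAIR.findall(block):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

for code, text in sorted(parse_bfchar(SAMPLE_CMAP).items()):
    print(f"glyph 0x{code:04X} -> {text!r}")
```

If the entry for a ligature glyph is missing, or maps to a single unrelated character, the importing application has no reliable way to know which letters were meant, which would match the symptoms described above.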


On 11/1/2021 at 7:15 PM, Faolind said:

... What's also interesting is that this error is not consistent: some words are corrupted, others are not, and there doesn't seem to be a pattern as to which are corrupted or not. ...

I am going to guess that somehow the corrupted text (versus the not corrupted text) is not applying the ligatures in the original application. The original application is the one which is generating the PDF that you are opening with Publisher. 

I have to ask why you are using a PDF instead of the actual text and images.

Mac Pro (Late 2013) Mac OS 12.7.4 
Affinity Designer 2.4.1 | Affinity Photo 2.4.1 | Affinity Publisher 2.4.1 | Beta versions as they appear.

I have never mastered color management, period, so I cannot help with that.


17 hours ago, LibreTraining said:

Even printing to a PDF printer should preserve the text and embed the fonts properly.

Some (many?) PDF printer drivers ‘print’ a series of images, as the OP has stated. That’s why I suggested testing the font(s) by exporting to PDF from an Affinity app.

Alfred

7 minutes ago, LibreTraining said:

Which ones only print images?

I’ve encountered several over the years but I’m afraid I can’t name any off the top of my head.

Alfred

1 minute ago, Alfred said:

I’ve encountered several over the years but I’m afraid I can’t name any off the top of my head.

Yeah, I tend to focus on which ones give me the most control over font embedding (i.e. to force-embed everything, etc.) and do not pay much attention to the others.
So I was just curious. Some of the ones I have probably work that way. :-)


  • 1 year later...
  • 7 months later...

I, too, have this problem on Mac when exporting a PDF from Pages using the Calibri font.

After export, when I copy and paste from the exported PDF, words like "software" become "So4ware". So "ft" becomes "4". A weird and awful experience.
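If the text has already been copied out of an affected PDF (as with "So4ware" here, or when pasting into Word), a post-processing pass can repair the unambiguous cases. This is a minimal Python sketch, not a fix for the PDF itself: the substitution table is hypothetical, built only from symptoms reported in this thread, and would need adapting to whatever a given document actually produces. Substitutes that are also ordinary letters ("N", "I", "J") cannot be repaired this way without a dictionary check.

```python
import re

# Hypothetical table based on symptoms reported in this thread. "N", "I" and
# "J" are left out because they are real letters, and the "Ł " variant is
# left out because it was reported with a trailing space.
SUSPECT_GLYPHS = {
    "4": "ft",  # "So4ware" -> "Software" (Pages export)
    "Ĉ": "ti",  # bold "ti"
    "È": "ti",
}

def repair_ligatures(text, table=SUSPECT_GLYPHS):
    """Replace a suspect character only when ASCII letters sit on both sides."""
    chars = "".join(map(re.escape, table))
    pattern = re.compile(rf"(?<=[A-Za-z])[{chars}](?=[A-Za-z])")
    return pattern.sub(lambda m: table[m.group(0)], text)

print(repair_ligatures("So4ware version 4 is ready"))
# prints: Software version 4 is ready
```

Requiring a letter on both sides is what keeps the standalone digit in "version 4" untouched; it will still misfire on any word that legitimately contains one of these characters between letters.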


44 minutes ago, Michael_7 said:

I, too, have this problem on Mac when exporting a PDF from Pages using the Calibri font.

After export, when I copy and paste from the exported PDF, words like "software" become "So4ware". So "ft" becomes "4". A weird and awful experience.

Hi Michael and welcome to the forums. This does seem to be a bug in Publisher.

You can avoid it by editing the document in Pages, selecting all of its text, and choosing Format > Font > Ligature > Use None. Also when opening a PDF in Publisher, ensure you select Favour editable text over fidelity.

Here's a test file for Serif in case this isn't a known bug. Calibri must be variable, since it's not available in Publisher, so the font is being substituted with Helvetica. It doesn't seem to matter what it's substituted with; the combination of font substitution and ligatures doesn't work. Changing the font in Pages to something else that works in Publisher resolves the problem, as does turning off ligatures in Pages.

test4.pages test4.pdf

Download a free manual for Publisher 2.4 from this forum - expanded 300-page PDF

My system: Affinity 2.4.2 for macOS Sonoma 14.4.1, MacBook Pro 14" (M1 Pro)


Quote

You can avoid it by editing the document in Pages, selecting all of its text, and choosing Format > Font > Ligature > Use None. 

Wow, this seems to work in Pages.app!  I'm just learning about Font Ligatures for the first time.

Thank you Mike, thank you!


  • 1 month later...

Today, I encountered a recurring issue with the Calibri font in my PDFs. Initially created in Adobe InDesign, the PDFs were later edited using Affinity Publisher and exported again. While the PDFs appeared fine upon opening, transferring their text into programs like Word resulted in certain characters displaying as symbols.

This proved to be a significant setback for me, especially since one of these PDFs was my CV! Considering the possibility of text recognition software processing my CVs incorrectly, I'm quite displeased with the situation.

As a solution, I've opted to switch the font in all my PDFs from Calibri to another font to prevent further complications.


15 minutes ago, KinoYarov said:

As a solution, I've opted to switch the font in all my PDFs from Calibri to another font to prevent further complications

A simpler approach that you might try: ensure that the Subset Fonts option is turned off in your Export settings when you create your PDF file.

-- Walt
Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases
PC:
    Desktop:  Windows 11 Pro, version 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090 

    Laptop:  Windows 11 Pro, version 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU.
iPad:  iPad Pro M1, 12.9": iPadOS 17.4.1, Apple Pencil 2, Magic Keyboard 
Mac:  2023 M2 MacBook Air 15", 16GB memory, macOS Sonoma 14.4.1
