
Bug in converting PDF to Publisher



  • Moderators

Hi @NNN,

Thanks for the files.

I can replicate the issue and will get this logged with the Dev team.

If you just need to display the PDF, you can use File>Place and place the PDF on the page and the PDF will be displayed and exported correctly.


EDIT: The problem is more complicated than what I posted below.
The Unicode codes are off, but there is more going on with the encoding.
Investigating now, and will post below when done.

Original post:
The problem is in the original PDF.
Those odd Unicode codes are behind the visible characters.
So APub is just reading and displaying what is actually there.

Edited by LibreTraining
More to the issue than original post.

Whilst I acknowledge LibreTraining's expertise in type, fonts, Unicode, hidden control characters, etc., there must be more that can be done.

Opening PDFs is one of the biggest headaches I have with Publisher. If I convert NNN's original PDF to Word using an online tool (Ilovepdf), it does so without problems. I know that the original probably came from a Word doc exported as a PDF, but if I copy and paste the text into Publisher, it gives the right characters. There are many of us who do work free of charge for small groups and charities who welcome Publisher as a stable tool at an affordable price, but most of the text for our work comes in Word document format, and dealing with the bullets and numbering copied over is a nightmare. If you open a PDF, it recognises the bullets but gets characters wrong; if you copy and paste, it recognises the characters but not the bullets. You also need to be aware of Paste versus Paste Special.

Whilst I accept using Place to get things right, which I often do with commercially prepared leaflets being inserted into a publication, it means any alteration requires starting over in the original PDF/DOC.

Not being a user of other commercial software, I cannot compare how they perform these tasks.

 

Alan Pickup

Windows 10 Home, full Affinity suite of apps, PC and Gigabyte laptop, 16 GB RAM and Nvidia GTX 1660 Super on each.


Most of my PDF editors/converters could not convert that text.
Nitro Pro displayed a warning about non-standard encoding.
PDF-XChange displayed a warning: Errors detected in the XREF table
FlexiPDF, PDF-XChange, Nitro Pro - export to Word did not work.
Infix PDF, Master PDF - export to text did not work.
Surprisingly, Foxit PDF Editor did export to Word correctly.
So it appears to have correctly figured out the encoding (or guessed it).

On the majority of the text the encoding is marked as Custom or Built In.
The three text blocks which cause problems are marked as Identity-H encoding.
There is no reason for those particular pieces of text to be Identity-H.
They are just plain Latin text characters which are in the most common encodings.
The document was created with InDesign 16 on a Mac, so the PDF library is Quartz.
Not sure why it would single out those three blocks of text as Identity-H.

Above, after a quick look, I originally stated that APub was displaying the Unicode codes that are there, but it is not. It is displaying something else. The codes I see under the characters are all wacko, but they are not what is displayed. So I am not sure what APub is doing.
In theory, because Identity-H is a standard character map, you should be able to take the embedded font's character IDs (CIDs) and translate those into the correct Unicode code points, completely ignoring the PDF ToUnicode table. Dunno. Makes me crazy.
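The "in theory" route can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not what APub or any of the PDF tools above actually does. It assumes the font is a CIDFontType2 whose /CIDToGIDMap is /Identity, so each 2-byte Identity-H code is both the CID and the glyph ID in the embedded font, and it assumes you already have that font's cmap (Unicode to glyph ID) as a dictionary, for example read with a font library such as fontTools. Embedded subsets may omit the cmap table entirely, which is one reason this only works in theory. The function names and glyph IDs below are hypothetical.

```python
# Sketch only. Assumptions: CIDFontType2 with /CIDToGIDMap /Identity, and a
# Unicode -> glyph ID cmap already extracted from the embedded font.

def identity_h_cids(data: bytes) -> list[int]:
    """Split a string operand into CIDs: Identity-H uses 2-byte big-endian codes."""
    if len(data) % 2:
        raise ValueError("Identity-H strings must have an even number of bytes")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def cids_to_text(cids: list[int], unicode_to_gid: dict[int, int]) -> str:
    """Recover text by inverting the font's own cmap, ignoring /ToUnicode.

    Glyphs with no cmap entry come out as U+FFFD (replacement character).
    """
    gid_to_unicode: dict[int, int] = {}
    for code_point, gid in unicode_to_gid.items():
        # If several code points share one glyph, keep the first one seen.
        gid_to_unicode.setdefault(gid, code_point)
    return "".join(chr(gid_to_unicode.get(cid, 0xFFFD)) for cid in cids)
```

For example, `identity_h_cids(b"\x00\x2b\x00\x24\x00\x2f")` gives `[43, 36, 47]`, and with a hypothetical cmap `{ord("H"): 43, ord("A"): 36, ord("L"): 47}`, `cids_to_text` returns `"HAL"`. If the cmap is missing or the subset renumbered the glyphs, you get replacement characters instead, which is roughly the "wacko" situation described above.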

11 hours ago, AlanPickup said:

If I convert NNN's original PDF to word using an online tool (Ilovepdf) it does so without problems.

Their back-end application touts its ability to deal with non-standard encoding.
See: https://player.vimeo.com/video/574161060
So not sure if that is the best example, as most apps seem to have a problem with this.

11 hours ago, AlanPickup said:

most of the text for our work comes in word document format, dealing with bullets and numbering copied over is a nightmare. If you open a pdf it recognises the Bullets, but gets characters wrong, if you copy and paste it recognises the characters but not the bullets.

This is a different issue, one which caused a lot of problems for LibreOffice compatibility with Word.
Word uses a symbol font (non-Unicode) for the bullets.
The font is also called Symbol (file symbol.ttf).
In the font file the PostScriptName is SymbolMT (which is what you see in the PDF fonts).
So what happened with LibreOffice is that they want to be able to round-trip documents between Word users on Windows and LibreOffice users on Linux, etc.
LibreOffice does not always have access to this Windows font (e.g. Linux users).
So they needed to get the bullets working on multiple OSs and applications.
It stopped working on LO on Linux, then it did not work on Windows, etc.
They finally got it working.

When you paste the bullets and the text, the first thing the app sees is that bullet, and that bullet is in a symbol font, so it guesses that the text is all the same, and you get a mess because it thinks it is all symbols.
If you paste just the text, it only sees the text as a normal Unicode font.
If you import the DOCX it does bring both the bullets and text correctly.
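If you want to clean up pasted text along the lines of the "remove the bullets first" workflow, a small hypothetical Python helper might look like the one below. The key assumption, which this thread does not confirm, is that the Word Symbol-font bullet arrives in plain text either as U+F0B7 (a Private Use Area stand-in for Symbol's 0xB7 bullet) or as U+00B7 MIDDLE DOT. The helper either swaps a leading bullet for a real Unicode bullet (U+2022) in a normal font, or strips it so the bullets can be re-added with Publisher's bullet function.

```python
# Hypothetical helper. The bullet characters listed are an assumption about how
# a Word Symbol-font bullet shows up once copied as plain text.
SYMBOL_BULLETS = ("\uf0b7", "\u00b7", "\u2022")  # PUA Symbol bullet, middle dot, Unicode bullet
UNICODE_BULLET = "\u2022"


def normalise_bullets(text: str, strip: bool = False) -> str:
    """Replace (or, with strip=True, remove) a bullet at the start of each line."""
    out = []
    for line in text.splitlines():
        body = line.lstrip()
        if body.startswith(SYMBOL_BULLETS):
            # Drop the bullet plus any tab/spaces Word put between it and the text.
            item = body[1:].lstrip()
            line = item if strip else f"{UNICODE_BULLET} {item}"
        out.append(line)
    return "\n".join(out)
```

With `strip=True` you get plain paragraphs ready for the bullet function; without it, each item keeps a Unicode bullet that no longer depends on the Symbol font.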
 


On 2/15/2022 at 5:20 AM, NNN said:

Hi,

I recently opened a PDF (see: Original.pdf) in Publisher and a few errors occurred (see: Opened in Publisher and exported to PDF), which I marked in red. I hope this will help to improve the PDF conversion.

Thanks.

Attachments: Opened in Publisher and exported as PDF.pdf (649.61 kB), Original.pdf (126.15 kB)

 

On 2/16/2022 at 4:02 AM, stokerg said:

Hi @NNN,

Thanks for the files.

I can replicate the issue and will get this logged with the Dev team.

If you just need to display the PDF, you can use File>Place and place the PDF on the page and the PDF will be displayed and exported correctly.

Sorry you're having difficulty with this.

I don't believe that this can be officially classified as a bug in Affinity Publisher, due to the following observations. We are simply dealing with a corrupt document that was exported from another application or via transport over a network.

I was able to confirm that the original file showed evidence of corruption, either upon export from InDesign via "Quartz PDFContext" on an Apple computer or through file transmission. I have also heard of situations like this where Acrobat also finds the document corrupted. The fact here is that we are dealing with a corrupt file; this happens with MS Word documents, Photoshop PSD files, I could go on and on. Whatever app opens this file, it is going to end up being a garbage-in, garbage-out situation, which is what we have here, but that is not due to Affinity Publisher. The problem is how this particular source file was created and generated using those other apps and what occurred during transport, plain and simple.

It seems that Affinity Publisher does not have the problem here.

This could've happened due to numerous technical issues on the end of the user who exported the file, or even through e-mail transmission over a network. Was this file sent to you in a zip file? As I mentioned, incomplete data packets could be due to any number of issues. PDF apps like Acrobat will attempt to fix corrupted tables, but there is no guarantee, depending on what was broken. There are so many variables.

BTW, I was able to successfully see the wording in its true form after using special tools, but even then you need to have those fonts installed on your system (some of which may require a license). The first jumbled text, BTW, said "HAL ELROD's". Whoever sent this particular PDF needs to establish better guidelines for checking the integrity of every PDF file before sending it out; it's not impossible and it doesn't have to be expensive either. I don't recommend wasting your time working with any corrupt files sent to you. Your time is too valuable for that. At least we have now confirmed that the file was already corrupted by another source to begin with.


3 hours ago, PixelEngineer said:

I don't recommend wasting your time working with any corrupt files sent to you.

This file was for internal use only. It was not sent by a customer, and I posted this file here because I want Publisher to convert PDF files as well as possible.


Thank you, LibreTraining and PixelEngineer. As stated in my post, I acknowledge your skill and expertise in these subjects, and your explanations really give an insight into how these things happen (oh, for the halcyon days of hot and cold metal of my youth).

I take it, LibreTraining, that when you say "import" at the bottom of your post, you mean Place?

My normal way of working is to remove any bullets in the Word doc before pasting, use a shortcut to paste without formatting, then add bullets as needed using the bullet function.

As Poirot would say "it keeps my little grey cells working"

 

Alan Pickup


8 hours ago, PixelEngineer said:

We are simply dealing with a corrupt document that was exported from another application or via transport over a network.

Yesterday after posting I ran the original PDF through a PDF repair tool - it showed many errors and said the PDF is corrupt.

Think I will do that test first in the future.
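A quick structural check of the kind a repair tool starts with can be sketched in Python. This is a minimal sketch of my own, not the method of any repair tool or PDF editor mentioned in this thread. It only looks at the last classic cross-reference (XREF) table: it verifies that `startxref` points at that table and that each in-use entry's byte offset lands on the matching `N G obj` header. It ignores incremental updates (`/Prev`) and object streams, and it skips files that use a cross-reference stream instead of a table. A clean result does not prove a PDF is healthy; a dedicated repair or preflight tool checks far more.

```python
import re


def check_classic_xref(pdf: bytes) -> list[str]:
    """Report problems in the last classic xref table of a PDF (sketch only)."""
    problems = []
    if not pdf.startswith(b"%PDF-"):
        problems.append("missing %PDF- header")
    pos = pdf.rfind(b"startxref")
    if pos == -1:
        return problems + ["no startxref keyword"]
    m = re.match(rb"startxref\s+(\d+)", pdf[pos:])
    if not m:
        return problems + ["startxref has no offset"]
    offset = int(m.group(1))
    if not pdf.startswith(b"xref", offset):
        if re.match(rb"\d+\s+\d+\s+obj", pdf[offset:offset + 40]):
            return problems  # cross-reference stream (PDF 1.5+): not checked here
        return problems + [f"startxref offset {offset} does not point at an xref table"]
    end = pdf.find(b"trailer", offset)
    if end == -1:
        return problems + ["xref table has no trailer"]
    tokens = pdf[offset + 4:end].split()
    i = 0
    try:
        while i < len(tokens):
            # Each subsection starts with "first_object_number entry_count".
            first, count = int(tokens[i]), int(tokens[i + 1])
            i += 2
            for n in range(count):
                # Each entry is: byte offset, generation number, "n" (in use) or "f" (free).
                off, gen, kind = tokens[i:i + 3]
                i += 3
                if kind != b"n":
                    continue  # free entry, nothing to check
                num = first + n
                header = re.compile(rb"%d\s+%d\s+obj" % (num, int(gen)))
                if not header.match(pdf, int(off)):
                    problems.append(f"object {num} is not at offset {int(off)}")
    except (ValueError, IndexError):
        problems.append("malformed xref table")
    return problems
```

Usage: `check_classic_xref(open("Original.pdf", "rb").read())` returns an empty list when the table is consistent and a list of human-readable problems otherwise.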


7 hours ago, AlanPickup said:

(oh, for the halcyon days of hot and cold metal of my youth).

Dear god no. I don't miss those at all. Do you not remember customers wanting the font size to be 'a little bit bigger'? Reset it all and then 'Okay, the first was better.'

MacBook Pro (13-inch, Mid 2012) Mac OS 10.12.6 || Mac Pro (Late 2013) Mac OS 11.6.8

Affinity Designer 1.10.5 | Affinity Photo 1.10.5 | Affinity Publisher 1.10.5 | Beta versions as they appear.

I have never mastered color management, period, so I cannot help with that.


Being a jobbing printer, it was the Henry Ford method: any design, as long as it was in our book of designs. Letterheads, invoice books, flyers. Design work was a rarity; those who wanted it usually had their own design team and went down the offset-litho route, and full-colour process was still something done in the big cities.

When I served my time in our small town, there were at least 7 jobbing printers, most running pre-war machines (the poster printer's machines were pre-First World War). I think we were the only one with offset-litho and camera processing. None of them exist today.

Needless to say, I changed careers to support a young family, but always kept my interest in graphics. As my new career involved computers, I soon got into the graphics programmes, but never had to make a living at it again.

 

Alan Pickup


2 hours ago, AlanPickup said:

When I served my time in our small town there were at least 7 jobbing printers most running pre war machines (the poster printers machines were pre first world war), think we were the only one with offset-litho and camera processing, none of them exist today.

In my town it was two printers. We had a hand-fed letterpress and a Heidelberg letterpress, scary stuff. A couple of trays of wooden type in the two-inch size. Offset, darkroom and plate burning. I have managed to lose the ability to read text upside down and backwards, a real niche skill.


We had an Auto-Vicobold, which had a feeder that could be removed so the machine could be hand fed. As it was driven by an electric motor, hand feeding it was scary, and the feeder was a nightmare to work. It was a great machine for cutting and creasing, as it toggled rather than clamshelled; I was glad when we swapped it for a Heidelberg Windmill. We also had a Multilith 1250, and those two were the busy machines. The Rotaprint 95 was the larger format but only got used a couple of days a week; the main jobs on it were for draughtsmen, as we could print in reverse on the translucent drawing film, which gave better contact when printing plans.

We had two process cameras and a negative touch-up and platemaking machine; retouching negatives really improved my brush skills. We had a small studio, as the owner was a trained graphic artist and a member of SLADE. This was the reason I took the job, as he encouraged my passion for graphic design, but there was not enough call for it to be full time.

Typesetting ranged from 6pt up to 72pt at the largest; anything bigger was hand-drawn and printed litho. Setting 6pt type by hand, mainly for business cards, was tough.

The poster printer had an old Wharfedale and could print all the big-size posters. I was friendly with him and used to love going round his workshop. I still watch some of the videos on YouTube of people restoring and running those machines, although I have not run a printing machine in over 40 years.

 

Alan Pickup


6 hours ago, AlanPickup said:

I have not run a printing machine in over 40 years.

About the same here. Moved over to design and layout for a tiny town's daily paper. Remember Newspapers? [sardonic smiley face emoticon]

