Compare two similar documents in Publisher


I just received an email from a client who produces audiobooks.

Quote

It turns out that for one of our productions... [the] Press sent us the PDF of the first edition, instead of the second edition. The first edition was the one we recorded. To turn the recording into the second edition we need to compare the two PDFs, first and second editions.

The Press recommended we "export both PDFs into Word docs and run a comparison using the “Review – Compare” function."

Can I do this in AF Publisher? I don't have MS Word.

1 hour ago, mistergarth said:

Can I do this in AF Publisher? I don't have MS Word

No.

Adobe Acrobat Pro/DC itself can compare two PDF documents. If you're on Windows, you might want to consider purchasing a license for PDF/X-Change Editor, which can do this as well. It does not cost much, the license is perpetual, and the app itself is full of useful, well-implemented features.

Thanks for the replies. I'm on a Mac, so is my client. The online "Compare PDFs" tool is interesting. Maybe my client will find it useful, but it gets confused with line-ending hyphens, so the results are muddied by many erroneous differences.
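For anyone hitting the same line-ending hyphen problem: one workaround is to clean up the extracted text before comparing it. Below is a minimal Python sketch, assuming the PDF text has already been copied or extracted into a plain string. The function name `join_line_end_hyphens` is made up for this illustration. It will also wrongly join genuine compounds that happen to break at a line end (for example "well-" / "known"), so treat the result as an aid, not as ground truth.

```python
import re

def join_line_end_hyphens(text: str) -> str:
    """Rejoin words split by a line-end hyphen, then unwrap single line breaks."""
    # "record-\ning" -> "recording"; only joins when a lowercase letter
    # sits on both sides of the hyphen and the line break.
    joined = re.sub(r"(?<=[a-z])-\n(?=[a-z])", "", text)
    # A single newline inside a paragraph becomes a space;
    # blank lines (paragraph breaks) are left alone.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", joined)

sample = "The record-\ning was\nfine.\n\nNext paragraph."
print(join_line_end_hyphens(sample))
```

Running both editions through the same cleanup before comparing should remove most of the differences that come only from where the lines happen to break.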

I know I could do this in Adobe Acrobat, but I hate hate hate the subscription model, and I've used my free trial of Acrobat. I might use a different computer to get another trial for this one project.

58 minutes ago, mistergarth said:

Arg... as I plunge deeper into this rabbit hole, it looks like the perfect solution does not exist. Persons with a similar problem may find this illuminating:

https://eclecticlight.co/2021/08/14/how-to-compare-two-pdf-documents/

The last paragraph of that article, beginning with the phrase "The sorry truth..." sums it up nicely.

Affinity Photo 1.10.5, Affinity Designer 1.10.5, Affinity Publisher 1.10.5; 2020 iMac 27"; 3.8GHz i7, Radeon Pro 5700, 32GB RAM; macOS 10.15.7
Affinity Photo 1.10.5.280 & Affinity Designer 1.10.5 for iPad; 6th Generation iPad 32 GB; Apple Pencil; iPadOS 15.0.2

4 hours ago, mistergarth said:

I know I could do this in Adobe Acrobat, but I hate hate hate the subscription model

There is a non-subscription-based version of Adobe Acrobat Pro available, too, although I suppose that being able to subscribe just on demand (e.g. for a month) could be seen as a benefit for these kinds of tasks. How useful a PDF comparison tool is depends very much on the kinds of documents compared, on what is compared (text only or other changes as well), and on the kind and number of changes between the versions. Still, the tools I mentioned can be very efficient (and are light years ahead of free tools). Word itself does a good job, but if the starting point is a PDF, the conversion to Word is typically the weak point, and doing that well might also require a proper non-free tool (Word itself also being subscription-based).

If the previous versions still exist as Publisher documents, it might be worth trying to copy and paste the text from Publisher into LibreOffice Writer and then use its document comparison tool to track deletions and additions between the two versions. But that would work only for simple document structures; if you need to compare text in separate stories (e.g. captions, independent text boxes, etc.), direct PDF comparison is the way to go.

I assume that the idea is to find the additions and deletions from the first edition that has been recorded so as to re-record those paragraphs and then splice those new recordings into the recorded first edition.

I would be tempted to use my BBEdit text editor's compare function to highlight the differences in copied and pasted text from the two PDFs. It is almost just the work of a moment.

MacBook Pro (13-inch, Mid 2012) Mac OS 10.12.6 || Mac Pro (Late 2013) Mac OS 11.7

Affinity Designer 1.10.5 | Affinity Photo 1.10.5 | Affinity Publisher 1.10.5 | Beta versions as they appear.

I have never mastered color management, period, so I cannot help with that.

44 minutes ago, Old Bruce said:

I would be tempted to use my BBEdit text editor's compare function to highlight the differences in copied and pasted text from the two PDFs. It is almost just the work of a moment.

If there are a lot of text objects in the PDFs, I suspect it would take substantially more than a moment to do the comparison that way.

58 minutes ago, R C-R said:

If there are a lot of text objects in the PDFs, I suspect it would take substantially more than a moment to do the comparison that way.

I am assuming that an audiobook recording is of a book consisting mostly of text. @mistergarth most likely needs to know how the text of the first edition differs from the second edition. He needs to get the new text recorded, and the passages where text was deleted need to be redone.

BBEdit just gives me the text when I copy paste text from a PDF. Have the two PDFs open to annotate and look through the BBEdit differences window to find the changes.

It isn't automatic but I feel it would be accurate.
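The same idea can be scripted if the pasted text runs long. This is only a rough sketch in Python using the standard `difflib` module, assuming each edition's text has been saved as plain text with blank lines between paragraphs. The function name `changed_paragraphs` is invented for this example; it lists the paragraphs that were replaced, deleted, or inserted, which is roughly the list of passages to re-record or cut.

```python
import difflib

def changed_paragraphs(first_text, second_text):
    """Return (tag, old_paragraphs, new_paragraphs) for every changed region."""
    # Split on blank lines and drop empty chunks.
    old = [p.strip() for p in first_text.split("\n\n") if p.strip()]
    new = [p.strip() for p in second_text.split("\n\n") if p.strip()]
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    # Opcode tags are "replace", "delete", "insert", or "equal"; skip "equal".
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

first = "Chapter one.\n\nThe old middle.\n\nThe end."
second = "Chapter one.\n\nThe revised middle.\n\nThe end.\n\nA new afterword."
for tag, old_paras, new_paras in changed_paragraphs(first, second):
    print(tag, old_paras, new_paras)
```

A paragraph with a single changed comma shows up as a whole "replace" entry, so you would still read the old and new versions side by side to decide how much needs re-recording.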

6 minutes ago, Old Bruce said:

BBEdit just gives me the text when I copy paste text from a PDF. Have the two PDFs open to annotate and look through the BBEdit differences window to find the changes.

That's something nearly every decent text editor offers: a built-in diff tool, which historically stems from the Unix command-line tools for text processing and development. Every version control system also comes with diff capabilities. A prominent Unix implementation is GNU Diffutils (diff, diff3, sdiff, cmp), with patch as a separate companion tool. And, no surprise here, the algorithms behind diff and friends always work best on plain-text files.
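To illustrate what such a diff looks like on plain text, here is a small sketch using Python's standard `difflib` module, which produces output in the same unified format that `diff -u` prints. The file names `first_edition.txt` and `second_edition.txt` are placeholders I made up for the example, as is the function name.

```python
import difflib

def text_diff(old_text, new_text,
              old_name="first_edition.txt", new_name="second_edition.txt"):
    """Return the unified diff between two plain-text strings as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",  # input lines carry no newline, so headers should not either
    ))

old = "one\ntwo\nthree"
new = "one\n2\nthree"
print("\n".join(text_diff(old, new)))
```

Lines starting with `-` exist only in the first text, lines starting with `+` only in the second, and unmarked lines are shared context.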

☛ Affinity Designer 1.10.5 ◆ Affinity Photo 1.10.5 ◆ OSX El Capitan

4 hours ago, Old Bruce said:

BBEdit just gives me the text when I copy paste text from a PDF.

Yes, I know that. But my point is if the PDF files have a lot of different text objects in them then just opening them in some app that allows copying each of them as text objects & then pasting it into a BBEdit document (& making sure everything is pasted in the same order) could take a very long time.

For example, consider something like two versions of the GraphicConverter user manual PDF. I have PDFs saved for versions 9 (about 340 pages) & 11 (about 500 pages). Almost all their pages have multiple text objects on them, & I do not have any app that can open either one & allow me to select every text object in one step.

How would you handle something like that?

35 minutes ago, R C-R said:

How would you handle something like that?

There are many/several different tools for extracting all the text out of PDF files. Some tools use OCR to extract all text, others use specific file format parsers (in this case for PDF) for doing the job.

As an example, I'll name just one Java-based tool here that can deal with many file formats: Apache Tika (downloads are available from the project's site). You need to have a Java runtime environment installed.

Quote

The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
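For batch use, the Tika app jar can also be run from a script rather than through its GUI. The following is a hedged Python sketch, not an official recipe: it assumes the downloaded jar is saved locally under the placeholder name `tika-app.jar`, that `java` is on the PATH, and that the jar's `--text` option (plain-text output) is available in your Tika version. The helper names are invented for this example.

```python
import subprocess

def tika_text_command(tika_jar, pdf_path):
    """Build the command line that asks the Tika app for plain-text output."""
    # "--text" is assumed here to be the Tika app option for plain text.
    return ["java", "-jar", tika_jar, "--text", pdf_path]

def extract_text(tika_jar, pdf_path):
    """Run Tika on one PDF and return the extracted text (needs Java and the jar)."""
    result = subprocess.run(
        tika_text_command(tika_jar, pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise if Java or Tika reports an error
    )
    return result.stdout

# Placeholder paths; calling extract_text() with them only works
# if the jar and the PDF actually exist.
print(tika_text_command("tika-app.jar", "first_edition.pdf"))
```

The text returned for each edition can then be saved to a file and fed to whatever diff tool you prefer.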

 

[Two attached screenshots: tika1.jpg, tika2.jpg]

46 minutes ago, v_kyr said:

There are many/several different tools for extracting all the text out of PDF files. Some tools use OCR to extract all text, others use specific file format parsers (in this case for PDF) for doing the job.

Have you ever tested any of them on comparing PDF documents created by different PDF creation software (versions or engines)? Do they work well for that?

22 minutes ago, R C-R said:

Have you ever tested any of them on comparing PDF documents created by different PDF creation software (versions or engines)? Do they work well for that?

For comparing plain PDF docs visually by hand, you just need a tool that shows two PDF docs side by side with synchronized scrolling; that's often more accurate than any of the PDF diff tools on offer. Personally, I compare documentation and manuals I've written myself, and there I usually don't diff the resulting PDFs but the source files of the tools I used to write them (Word .docx, FM .fm, LaTeX .tex, Markdown .md, Text .txt, code, etc.).

If I had to compare PDFs for which I don't have the original source files and the tools that generated them, I would extract the whole text and compare that, since that's guaranteed to work. Comparing all other sorts of data (images, graphics, metadata, formatted tables and code, etc.) automatically via PDF diff tools, i.e. comparing the PDF files directly, is in most cases not reliably accurate (I don't know of any PDF tool that is always accurate and trustworthy here). So for that purpose I would rather use a side-by-side PDF viewer with synchronized scrolling and decide with my own eyes where the differences in the PDFs are.
