
Posted

There's an old, out-of-print book I'd like to edit and print. Some of the images need to be cleaned up, and I'd like to make some minor formatting changes and edits to the text.

Of the file formats available at the Internet Archive, which would you suggest for easy import into and editing in Affinity Publisher? How do I make sure the text is editable?

Posted

The only choices that will work at all are PDF and Full Text. And the only one that will include both images and text is PDF. 

However, the way those PDFs are constructed they have an image layer, and beneath it they have a text layer where the text has no fill and cannot be seen in the Affinity apps. This has been discussed before, for example in the topic below:

You can Open that PDF with Publisher (though I'm not sure you'll be able to open all 244 pages at the same time; I can't try that right now). Then, you could use Find and Replace with a regular-expression Search for and you could then use the Format cog for the Replace field to set the text fill to black. Then the currently-hidden text layer would be visible. But as it's covered by the image of the text you won't be able to see it, or select it, unless you hide the image layer (or select using the Layers panel).

But that might get you started. 

Note, too, that you have raw uncorrected OCR text in those books, and it may have a lot of errors.

For that specific book, I would suggest using Project Gutenberg (https://www.gutenberg.org/ebooks/28299) which will give you an HTML version with text corrected by the proofreaders at Distributed Proofreaders, and which you could print using your favorite web browser.

-- Walt
Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases
PC:
    Desktop:  Windows 11 Pro 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090 

    Laptop:  Windows 11 Pro 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU.
    Laptop 2: Windows 11 Pro 24H2,  16GB memory, Snapdragon(R) X Elite - X1E80100 - Qualcomm(R) Oryon(TM) 12 Core CPU 4.01 GHz, Qualcomm(R) Adreno(TM) X1-85 GPU
iPad:  iPad Pro M1, 12.9": iPadOS 18.3.1, Apple Pencil 2, Magic Keyboard 
Mac:  2023 M2 MacBook Air 15", 16GB memory, macOS Sequoia 15.0.1

Posted
Quote

For that specific book, I would suggest using Project Gutenberg (https://www.gutenberg.org/ebooks/28299) which will give you an HTML version with text corrected by the proofreaders at Distributed Proofreaders, and which you could print using your favorite web browser.

That's helpful advice, thanks!
But I need to modify the book for bookbinding purposes. So can I utilize this DP-proofed version in Publisher, where I can modify it?

Posted
3 hours ago, AntiqueFlaneur said:

That's helpful advice, thanks!
But I need to modify the book for bookbinding purposes. So can I utilize this DP-proofed version in Publisher, where I can modify it?

 

Yes, you will be able to modify it, but the text may be a mess and need a thorough review. The OCR quality is rather low for many archive.org documents, so don't be surprised to find a lot of flagged spelling mistakes. Also, opening a PDF means there will be no text styles, frames won't be linked from page to page to reflow edited text, etc. Be prepared to spend a lot of time cleaning up the converted document.

But Publisher is a great tool to do this with.

Posted
7 hours ago, AntiqueFlaneur said:

But I need to modify the book for bookbinding purposes.

If you only need to change the page dimensions or margins, you can use the PDF and adjust the document properties without having to edit the (hidden) OCR text.

To get rid of the brownish background, you could use an adjustment layer placed in a group on a master page, apply that master to all pages, and move it to the top of the layer hierarchy for the entire document.

macOS 10.14.6 | MacBookPro Retina 15" | Eizo 27" | Affinity V1

Posted
17 hours ago, AntiqueFlaneur said:

So can I utilize this DP-proofed version in Publisher, where I can modify it?

Project Gutenberg does not supply a format that you can easily use in Publisher for your purposes. They do supply the plain text, which will have the OCR errors corrected, but then you would have to handle the italic and bold formatting yourself.
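If you do go the plain-text route, one way to keep track of the italics before placing the text in Publisher is to collect them with a small script. This is only a minimal Python sketch, assuming the file follows the common Project Gutenberg convention of marking italics as _like this_ (check your particular file; bold conventions vary and are not handled here). The function name `strip_italic_markers` is hypothetical. It removes the underscores and returns the list of italic phrases, which you could then locate with Publisher's Find and Replace and style by hand.

```python
import re
from typing import List, Tuple

# Assumption: italics are wrapped in single underscores, e.g. _like this_.
# The phrase may be wrapped across a line break in the plain-text file.
ITALIC_PATTERN = re.compile(r"_([^_]+)_")


def strip_italic_markers(text: str) -> Tuple[str, List[str]]:
    """Remove _underscore_ italic markers.

    Returns the cleaned text and the italic phrases in document order.
    """
    phrases: List[str] = []

    def _unwrap(match: re.Match) -> str:
        inner = match.group(1)
        # Collapse line wraps/extra spaces so the phrase is easy to search for later.
        phrases.append(" ".join(inner.split()))
        # Keep the original text (including any line break) in the cleaned output.
        return inner

    clean = ITALIC_PATTERN.sub(_unwrap, text)
    return clean, phrases


if __name__ == "__main__":
    sample = "A _very_ old book with an _out of\nprint_ title."
    clean, italics = strip_italic_markers(sample)
    print(clean)
    print(italics)
```

The cleaned text can be placed into a text frame as usual, and the printed list serves as a checklist of phrases to italicize.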

The text layer in the PDF at archive.org is really bad when examined in Publisher. I would say it's unusable.

But if you can simply treat each page as an image in Publisher, and ignore the actual text, that might work. It will depend on exactly what kind of changes you want to make.

 

-- Walt
