Open and work on PDF in Publisher.


Hi. If I open a PDF it won't create spreads, just individual pages. The pages don't flow; each one just overflows within itself. Is it even possible to open a 200-page PDF and have it treated like any other data? I am really in a bind, trying to help a guy convert a (dreadful) website into a book - imagine the thousands of tags I've edited by hand. Having got the pages looking OK, all I can do is 'print' it as a PDF - no other choices. So I really must be able to further edit and properly typeset the pages as a single whole document. I really need this. Thanks


I'd guess at the moment you have a multipage document with each page containing text frames isolated from each other. So have you tried just linking the text frames where appropriate and then removing the empty ones? I'm sure that would have to be done manually but it wouldn't take that long I'd have thought.

Windows 10 Pro, I5 3.3G PC 16G RAM


Thanks for input.  The method using 'Add Pages' makes all the text boxes into individual lines, so page flow is impossible.  The linking of text boxes I have tried, and I still can find no way of linking page-to-page.  Getting the page breaks to work well over a 200 page document is really somewhat fundamental. 


8 minutes ago, vikingtone said:

The linking of text boxes I have tried, and I still can find no way of linking page-to-page. 

You will probably have to link each page manually. With the Frame Text Tool active,

  1. Click the lower-right linking triangle of the text frame on page "n".
  2. Click in the text frame on page "n"+1.
  3. Repeat.

-- Walt
Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases
PC:
    Desktop:  Windows 11 Pro, version 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090 

    Laptop:  Windows 11 Pro, version 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU.
iPad:  iPad Pro M1, 12.9": iPadOS 17.4.1, Apple Pencil 2, Magic Keyboard 
Mac:  2023 M2 MacBook Air 15", 16GB memory, macOS Sonoma 14.4.1


3 minutes ago, N.P.M. said:

That will not flow text from page to page or frame to frame.

No, but hopefully you should have far fewer text frames to deal with. The text flow from frame to frame will always have to be done manually. I'm not aware of any software that can open a PDF and automatically link text frames across pages. Perhaps others might know.


Opening the PDF as a new file brings in all the text as large blocks; bringing it in with 'Add Pages' splits the blocks.

Anyway - I have just checked something and I now see that I can link each page, one after the other, all 200; at least they do link manually. The text flows as I would like across all the linked pages - it's just (just) a matter of manually linking all 200 pages.

I realise it may sound like a stupid suggestion, but if the thing flows perfectly after manually linking all the pages, surely the 'flow' is there and it's just the automated linking which isn't. If it were, it'd sure be a powerful app for editing PDFs.

Anyway, I reckon we've put it to bed  -  thanks for all your help and suggestions


31 minutes ago, vikingtone said:

I realise it may sound like a stupid suggestion, but if the thing flows perfectly after manually linking all the pages, surely the 'flow' is there and it's just the automated linking which isn't. If it were, it'd sure be a powerful app for editing PDFs.

But there is no indication in the PDF that text should flow from page "n" to page "n"+1. And in many Publisher documents, at various places within the document, it doesn't flow between pages. There are intentional breaks.

There is not even (as far as I know) any indication in a PDF that lines are arranged in paragraphs.

PDF is a presentation format, not an editing format.

-- Walt

Yes, I understand that (once again) I am doing something I shouldn't. It's a PDF. If anyone could suggest a way that I can take a very wide website and reduce its architecture to portrait A4 format, then I'd be delighted to hear it. The only tool I have is TextMate, which I use to move the HTML pieces into a narrow format, but it doesn't export other than saving back as HTML. So I can only get anything from there by 'printing' a PDF from a browser. If it's a case of "oh, you want to get to that place huh, well if it were me I wouldn't start from here." Well, neither would I, but I am stuck with what I have to play with.

I see the point that there's no indication that it ought to flow from page to page as a PDF  -  even though it does as an HTML file.

47 minutes ago, vikingtone said:

Yes, I understand that (once again) I am doing something I shouldn't. It's a PDF. If anyone could suggest a way that I can take a very wide website and reduce its architecture to portrait A4 format, then I'd be delighted to hear it. The only tool I have is TextMate, which I use to move the HTML pieces into a narrow format, but it doesn't export other than saving back as HTML. So I can only get anything from there by 'printing' a PDF from a browser. If it's a case of "oh, you want to get to that place huh, well if it were me I wouldn't start from here." Well, neither would I, but I am stuck with what I have to play with.

I see the point that there's no indication that it ought to flow from page to page as a PDF  -  even though it does as an HTML file.

Could you copy and paste the entire text into a word processor and save as .docx or RTF and then import into APub?

d.

Affinity Designer 1 & 2   |   Affinity Photo 1 & 2   |   Affinity Publisher 1 & 2
Affinity Designer 2 for iPad   |   Affinity Photo 2 for iPad   |   Affinity Publisher 2 for iPad

Windows 11 64-bit - Core i7 - 16GB - Intel HD Graphics 4600 & NVIDIA GeForce GTX 960M
iPad pro 9.7" + Apple Pencil


How many actual HTML pages are on the website? (can you post a link?)

Why not go directly from the HTML without the complications of the PDF format?

Using developer tools you can turn off CSS and images and have just linear text.

The Web Developer extension has a feature to "linearize page" which shows the text and images as one long page (with no width settings).

There are "export as text" extensions.

There are "copy and paste as text" extensions.

Far easier to place a bunch of properly flowing text and then format it with styles.

Or import/open the HTML pages in LibreOffice (or Word) and delete all the formatting, and/or modify it, and then place the DOCX into APub.

You already have flowing text in the HTML pages - converting to PDF breaks that.

You could probably copy all the text, format it, and place any images - in a day.
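
A minimal sketch of the "go directly from the HTML" idea, assuming Python and only its standard-library `html.parser` module. It pulls the visible text out of a page as a list of paragraphs, treating table cells and other block-level tags as paragraph breaks. The names `BLOCK_TAGS`, `TextExtractor` and `html_to_paragraphs` are made up for illustration, and the tag list is a guess that would need adjusting to the real pages.

```python
from html.parser import HTMLParser

# Tags treated as paragraph boundaries (an assumption; adjust for the real pages).
BLOCK_TAGS = {"p", "br", "div", "table", "tr", "td", "th",
              "h1", "h2", "h3", "h4", "li"}


class TextExtractor(HTMLParser):
    """Collects visible text, starting a new paragraph at block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self.current = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def flush(self):
        # Collapse runs of whitespace and keep only non-empty paragraphs.
        text = " ".join("".join(self.current).split())
        if text:
            self.paragraphs.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if not self.skip_depth:
            self.current.append(data)


def html_to_paragraphs(html):
    """Return the page's visible text as a list of paragraph strings."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return parser.paragraphs
```

Joining the returned list with blank lines gives plain text that can be placed into a single flowing text frame and then formatted with styles.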


The copy/paste method works very well at getting all the text into Apple Pages, for example. But the tables, which I preserved from the HTML because they are required, and all of the 200+ graphic files (JPG for the most part) are not picked up with 'copy', so they would need replacing.
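
Since the images are dropped by copy/paste, one way to at least know which files belong where is to list the `<img>` references of each page in document order. This is only a sketch, assuming Python's standard-library `html.parser`; `ImageCollector` and `list_images` are hypothetical names.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Records the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names, so <IMG SRC=...> also matches.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def list_images(html):
    """Return the image paths referenced by one HTML page."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources
```

The resulting list can serve as a checklist when placing the graphics back into the layout by hand.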

All options seemingly involve a lot more work than I had hoped for. 

thanks for the input though :)   

Of all the options, it may end up being that the least worst choice will be to simply place all the text and completely rebuild my required architecture in Pages, or Publisher  -  but the thought horrifies me


5 hours ago, vikingtone said:

imagine the thousands of tags I've edited by hand

It's hard to imagine something like that because there are apps that will remove all tags with a single click… ;) 

5 hours ago, vikingtone said:

I really must be able to further edit and properly typeset the pages as a single whole document.

duckduckgo.com/?q=convert+pdf+to+rtf
duckduckgo.com/?q=convert+html+to+rtf

MacBookAir 15": MacOS Ventura > Affinity v1, v2, v2 beta // MacBookPro 15" mid-2012: MacOS El Capitan > Affinity v1 / MacOS Catalina > Affinity v1, v2, v2 beta // iPad 8th: iPadOS 16 > Affinity v2


kenmcd:   There is no CSS on the site.  The whole site (some 30,000 html pages) is in old world html tables.  All of it. 

I have discussed the simplicity of just grabbing the text and adding all the tables and graphics  -  but I think it's a big task.

You say: "why not go directly from html, not bothering with going via pdf?"    OK, sounds great.  How do I do that, please?  I wish I knew, as it solves everything (once I've got rid of thousands of tags manually in TextMate,) but where do I go with the html file after that?  I don't know of anything which would open html to get the pagination correct.

thanks for the input :)


loukash:  thanks for the links to converters  -  I have attempted some of these.  The one you suggest for html>>rtf fails in the conversion.  Don't know why as no explanation.

Whatever happens I need to edit the HTML to get most of the tables out of the text, and to move table cells from the width into the length, in the correct place. The entire site, every bit of it, is in tables.
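
Moving cells "from the width into the length" could in principle be scripted rather than done tag by tag. The sketch below, assuming Python's standard-library `html.parser`, drops the table and row tags and turns each cell into its own `<div>`, so cells that sat side by side end up stacked in source order. `TableUnwrapper` and `unwrap_tables` are illustrative names. It unwraps every table (so tables that must be kept would need excluding), discards comments and the doctype, and assumes cells have explicit closing tags, which old hand-written HTML does not always provide.

```python
from html.parser import HTMLParser

DROP = {"table", "tbody", "thead", "tfoot", "tr"}  # structural tags to discard
CELLS = {"td", "th"}                               # each cell becomes a <div>


class TableUnwrapper(HTMLParser):
    """Re-emits the HTML with layout tables replaced by stacked <div> blocks."""

    def __init__(self):
        # Keep entities such as &amp; untouched so they can be written back out.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in DROP:
            return
        if tag in CELLS:
            self.out.append("<div>")  # width and other cell attributes are dropped
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in DROP:
            return
        self.out.append("</div>\n" if tag in CELLS else f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def unwrap_tables(html):
    """Return the HTML with every table cell stacked vertically as a <div>."""
    parser = TableUnwrapper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Whether the cells come out "in the correct place" still depends on the source order of each page, so the output would need checking by eye.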

thanks again

t

There are too many variables.
Without knowing the structure of your source material, there is no generic formula as in "do this and then you'll get that".
You must post examples.


12 minutes ago, vikingtone said:

kenmcd:   There is no CSS on the site.  The whole site (some 30,000 html pages) is in old world html tables.  All of it. 

I have an app that can rip it to a Word doc. If you can provide a link I can take a look later today. That should preserve the tables. Wadda nightmare.

EDIT: OK. Got the link above.


A thing that may become a major issue is that Affinity Publisher does not properly support tables. It would be a pretty easy task to import the HTML into Word, strip all the HTML, and then import the cleaned Word file into InDesign. But to make it work in Affinity apps, you should probably have all tables converted to tab-separated text flow, and that might mean lots of manual work (setting tab positions, etc.)
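
The conversion to tab-separated text could be roughed out with a script before any manual tab-stop work. A minimal sketch, assuming Python's standard-library `html.parser`: each table row becomes one line and the cell texts are joined with tab characters. `TableToTabs` and `table_to_tsv` are hypothetical names, and nested tables are not handled (inner rows would be mixed into the outer ones).

```python
from html.parser import HTMLParser


class TableToTabs(HTMLParser):
    """Turns each <tr> into one line, with cell texts joined by tabs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []
        self.row = None   # list of cell texts while inside a <tr>
        self.cell = None  # list of text pieces while inside a <td>/<th>

    def end_cell(self):
        if self.cell is not None and self.row is not None:
            self.row.append(" ".join("".join(self.cell).split()))
        self.cell = None

    def end_row(self):
        self.end_cell()
        if self.row is not None:
            self.lines.append("\t".join(self.row))
        self.row = None

    def handle_starttag(self, tag, attrs):
        # Opening a new row or cell also closes an unclosed previous one.
        if tag == "tr":
            self.end_row()
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.end_cell()
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.end_cell()
        elif tag in ("tr", "table"):
            self.end_row()

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def table_to_tsv(html):
    """Return the table rows found in the HTML as tab-separated lines."""
    parser = TableToTabs()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

The tab positions would still have to be set in a paragraph style afterwards, which is the manual part mentioned above.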


Looking at the website from your link - and having got over the initial shock - I'd query whether Affinity Publisher was suitable for this. If you intend to still use tables a lot then you probably need the tables to be able to flow from one page to another. I'm pretty sure Affinity Publisher can't do that. 


@vikingtone

I only got 2,227 files for the Jockey Club History.
That includes 1,347 HTML files and 878 images (not "30,000 HTML pages").
Can upload them if you do not already have them.

Exported those to a CHM file to take a look if all levels are there.
286MB CHM here: https://workupload.com/file/D9JtmNwrhPj
Looks OK to me.

Imported the HTML and images to H+M, and then exported to a Word file.
With import settings I converted all simple box tables to just paragraphs of text.
So that may be helpful for some cut-n-paste.
Not sure why all the tables text got big grey borders on import.
Was easy to change manually, but quite a PITA for so many.
As mentioned by others above, you could/should modify the HTML first (may fix).
Normally in H+M you would rearrange the imported HTML page titles into your desired outline, but this is just page title order as imported (alphabetical).
310MB DOCX here: https://workupload.com/file/kvY9jMWR6c6

After looking at the bizarre layout of the pages - not sure how helpful any of this is.
Found myself looking at it and wondering how I would rearrange/reformat.
You do have a daunting task.

Mick Rose  -  thanks for looking at this for me  -  it is appreciated. Yes, the site is a challenge on so many levels.  It may be that Publisher can't cope, though with manual page links and adding many blank pages just to prune them later  -  may work, but a lot of effort and a bodge. I just don't have many other options.

I have tried PDF and HTML converters online, and most either butcher the result or crash.

Anyway, I really need to find a way. After this one, there are over ten more sections the owner would like turning into books - good grief

cheers and thanks again

t