vikingtone Posted February 10, 2023 Hi. If I open a PDF, it won't create spreads, just individual pages. The pages don't flow; the text on each one just overflows within that page. Is it even possible to open a 200-page PDF and have it treated like any other document? I am really in a bind, trying to help a guy convert a (dreadful) website into a book; imagine the thousands of tags I've edited by hand. Having got the pages looking OK, all I can do is 'print' them as a PDF: there are no other choices. So I really must be able to further edit and properly typeset the pages as a single whole document. I really need this. Thanks
MickRose Posted February 10, 2023 I'd guess that at the moment you have a multi-page document with each page containing text frames isolated from each other. So have you tried just linking the text frames where appropriate and then removing the empty ones? I'm sure that would have to be done manually, but I'd have thought it wouldn't take that long. Windows 10 Pro, I5 3.3G PC 16G RAM
vikingtone (Author) Posted February 10, 2023 Thanks for the input. Using 'Add Pages' turns all the text boxes into individual lines, so page flow is impossible. I have tried linking the text boxes, but I still can't find a way of linking page to page. Getting the page breaks to work well across a 200-page document is fairly fundamental.
walt.farrell Posted February 10, 2023 vikingtone said: The linking of text boxes I have tried, and I still can find no way of linking page-to-page. You will probably have to link each page manually. With the Frame Text Tool active, click the lower-right linking triangle of the text frame on page "n", then click in the text frame on page "n"+1. Repeat. -- Walt Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases PC: Desktop: Windows 11 Pro 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090 Laptop: Windows 11 Pro 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU. Laptop 2: Windows 11 Pro 24H2, 16GB memory, Snapdragon(R) X Elite - X1E80100 - Qualcomm(R) Oryon(TM) 12 Core CPU 4.01 GHz, Qualcomm(R) Adreno(TM) X1-85 GPU iPad: iPad Pro M1, 12.9": iPadOS 18.3.1, Apple Pencil 2, Magic Keyboard Mac: 2023 M2 MacBook Air 15", 16GB memory, macOS Sequoia 15.0.1
vikingtone (Author) Posted February 10, 2023 Thanks for the info, I appreciate it. Yes, that works perfectly. 200 pages to work on, and 'link all pages' doesn't work. Great. Sorry for the sour grapes, but really... there are some things a chap expects. Thanks again
MickRose Posted February 10, 2023 vikingtone said: The method using 'Add Pages' makes all the text boxes into individual lines, so page flow is impossible. Can you not open the PDF as a new document and click this: [screenshot]
MickRose Posted February 10, 2023 N.P.M. said: That will not flow text from page to page or frame to frame. No, but hopefully you'll have far fewer text frames to deal with. Linking text from frame to frame will always have to be done manually. I'm not aware of any software that can open a PDF and automatically link text frames across pages. Perhaps others might know.
vikingtone (Author) Posted February 10, 2023 Opening the PDF as a new file brings in all the text as large blocks; bringing it in with 'Add Pages' splits the blocks. Anyway, I have just checked something and I now see that I can link each page, one after the other, all 200; at least they do link manually. The text flows as I would like across all the linked pages; it's just (just!) a matter of manually linking all 200 pages. I realise it may sound like a stupid suggestion, but if the thing flows perfectly after manually linking all the pages, surely the 'flow' is there and it's just the automated linking which isn't. If it were, it'd sure be a powerful app for editing PDFs. Anyway, I reckon we've put it to bed. Thanks for all your help and suggestions
walt.farrell Posted February 10, 2023 vikingtone said: I realise it may sound like a stupid suggestion, but if the thing flows perfectly after manually linking all the pages, surely the 'flow' is there and it's just the automated linking which isn't. If it was, it's sure be a powerful app, for editing PDFs. But there is no indication in the PDF that text should flow from page "n" to page "n"+1. And in many Publisher documents, at various places, text doesn't flow between pages: there are intentional breaks. There is not even (as far as I know) any indication in a PDF that lines are arranged in paragraphs. PDF is a presentation format, not an editing format.
vikingtone (Author) Posted February 10, 2023 Yes, I understand that (once again) I am doing something I shouldn't. It's a PDF. If anyone could suggest a way that I can take a very wide website and reduce its architecture to portrait A4 format, then I'd be delighted to hear it. The tools I have are TextMate, to move the HTML pieces into a narrow format, and it doesn't export other than saving back as HTML. So the only way I can get anything out of there is by 'printing' a PDF from a browser. If it's a case of "Oh, you want to get to that place, huh? Well, if it were me I wouldn't start from here," well, neither would I, but I am stuck with what I have to play with. I see the point that there's no indication in a PDF that text ought to flow from page to page, even though it does in the HTML file.
dominik Posted February 10, 2023 vikingtone said: If anyone could suggest a way that I can take a very wide website and reduce it's architecture to portrait A4 format, then I'd be delighted to hear it. Could you copy and paste the entire text into a word processor, save it as .docx or RTF, and then import that into APub? d. Affinity Suite on Windows (V2) and iPad (V2). Beta testing when available. Windows 11 64-bit - Core i7 - 16GB - Intel HD Graphics 4600 & NVIDIA GeForce GTX 960M iPad pro 9.7" + Apple Pencil
kenmcd Posted February 10, 2023 How many actual HTML pages are on the website? (Can you post a link?) Why not go directly from the HTML without the complications of the PDF format? Using developer tools you can turn off CSS and images and have just linear text. The Web Developer extension has a "linearize page" feature which shows the text and images as one long page (with no width settings). There are "export as text" extensions. There are "copy and paste as text" extensions. It is far easier to place a bunch of properly flowing text and then format it with styles. Or import/open the HTML pages in LibreOffice (or Word), delete all the formatting and/or modify it, and then place the DOCX into APub. You already have flowing text in the HTML pages; converting to PDF breaks that. You could probably copy all the text, format it, and place any images in a day.
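The idea of going straight from the HTML to linear text can be illustrated with a small script. This is a minimal sketch in Python using only the standard library's `html.parser` module; it is not one of the extensions mentioned above, and the set of tags treated as paragraph breaks (`BLOCK_TAGS`) is an assumption to adjust for the site's own markup. It walks a page, emits each paragraph or table cell as a separate line of plain text, and leaves an `[image: ...]` placeholder wherever a picture was, so images can be re-placed later.

```python
from html.parser import HTMLParser

# Tags treated as paragraph breaks. This set is an assumption; adjust it
# to match how the site's old table-based markup is actually written.
BLOCK_TAGS = {"p", "br", "div", "table", "tr", "td", "th", "li",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class Linearizer(HTMLParser):
    """Collect a page's visible text as a flat list of paragraphs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self.paragraphs = []
        self._buf = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def _flush(self):
        # Join buffered text and collapse runs of whitespace and newlines.
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "img":
            # Leave a placeholder so the picture can be re-placed later.
            self._flush()
            src = dict(attrs).get("src") or ""
            self.paragraphs.append(f"[image: {src}]")
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip:
            self._buf.append(data)

    def close(self):
        super().close()  # process any text still buffered by the parser
        self._flush()


def linearize(html):
    """Return the page's text and image placeholders, one item per paragraph."""
    parser = Linearizer()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

For example, `linearize(open("page.html", encoding="latin-1").read())` (the file name and encoding are hypothetical; older sites are often not UTF-8) returns a list of paragraphs that could be joined with blank lines and saved as a .txt file for placing in Publisher. Because every cell becomes its own paragraph, tables that must stay tabular would still need rebuilding by hand.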
vikingtone (Author) Posted February 10, 2023 The copy/paste method works very well for getting all the text into Apple Pages, for example. But the tables, which I preserved from the HTML because they are required, and all of the 200+ graphic files (JPGs for the most part) are not picked up by 'copy', so they would need replacing. All the options seemingly involve a lot more work than I had hoped for. Thanks for the input, though. Of all the options, the least worst choice may end up being to simply place all the text and completely rebuild my required architecture in Pages or Publisher, but the thought horrifies me.
loukash Posted February 10, 2023 vikingtone said: imagine the thousands of tags I've edited by hand It's hard to imagine something like that, because there are apps that will remove all tags with a single click… vikingtone said: I really must be able to further edit and properly typeset the pages as a single whole document. duckduckgo.com/?q=convert+pdf+to+rtf duckduckgo.com/?q=convert+html+to+rtf MacBookAir 15": MacOS Ventura > Affinity v1, v2, v2 beta // MacBookPro 15" mid-2012: MacOS El Capitan > Affinity v1 / MacOS Catalina > Affinity v1, v2, v2 beta // iPad 8th: iPadOS 16 > Affinity v2
vikingtone (Author) Posted February 10, 2023 kenmcd: There is no CSS on the site. The whole site (some 30,000 HTML pages) is built in old-world HTML tables. All of it. I have discussed the simplicity of just grabbing the text and adding all the tables and graphics, but I think it's a big task. You say: "Why not go directly from the HTML, not bothering with going via PDF?" OK, sounds great. How do I do that, please? I wish I knew, as it would solve everything (once I've got rid of thousands of tags manually in TextMate), but where do I go with the HTML file after that? I don't know of anything that would open HTML and get the pagination correct. Thanks for the input
vikingtone (Author) Posted February 10, 2023 loukash: thanks for the links to converters; I have tried some of these. The one you suggest for HTML to RTF fails in the conversion. I don't know why, as it gives no explanation. Whatever happens, I need to edit the HTML to get most of the tables out of the text, and to move table cells from the width into the length, in the correct place. The entire site, every bit of it, is in tables. Thanks again, t
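Moving table cells "from the width into the length" amounts to restacking each wide row as a short vertical block. A minimal sketch in Python, assuming the header and rows have already been pulled out of the HTML as lists of strings; the function name `stack_wide_table` is hypothetical and this is only one possible layout choice:

```python
def stack_wide_table(header, rows):
    """Restack a wide table so each row reads top to bottom.

    header: column names, e.g. taken from the table's first row.
    rows:   remaining rows as lists of cell strings.
    Returns lines of 'column: value', with a blank line after each row.
    """
    lines = []
    for row in rows:
        # zip() stops at the shorter list, so cells beyond the header are ignored.
        for name, value in zip(header, row):
            value = value.strip()
            if value:  # skip empty cells rather than printing "Trainer: "
                lines.append(f"{name}: {value}")
        lines.append("")  # blank line separates one record from the next
    return lines
```

Each record then reads top to bottom in a narrow A4 column, one `column: value` line per cell. Whether that suits a given table, rather than splitting it into two narrower tables, is a layout judgment for each section.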
loukash Posted February 10, 2023 There are too many variables. Without knowing the structure of your source material, there is no generic formula as in "do this and then you'll get that". You must post examples.
vikingtone (Author) Posted February 10, 2023 Here is the main site, and the following link is the section I am trying to convert, which is section 10 from the 'menu'. Thanks http://www.greyhoundderby.com http://www.greyhoundderby.com/Jockey Club History.html
kenmcd Posted February 10, 2023 vikingtone said: kenmcd: There is no CSS on the site. The whole site (some 30,000 html pages) is in old world html tables. All of it. I have an app that can rip it to a Word doc. If you can provide a link, I can take a look later today. That should preserve the tables. Wadda nightmare. EDIT: OK. Got the link above.
vikingtone (Author) Posted February 10, 2023 A nightmare indeed. Thanks again, t
MickRose Posted February 10, 2023 Looking at the website from your link (and having got over the initial shock), I'd question whether Affinity Publisher is suitable for this. If you intend to still use tables a lot, then you probably need the tables to be able to flow from one page to another. I'm pretty sure Affinity Publisher can't do that.
kenmcd Posted February 10, 2023 @vikingtone I only got 2,227 files for the Jockey Club History. That includes 1,347 HTML files and 878 images (not "30,000 HTML pages"). Can upload them if you do not already have them. Exported those to a CHM file to check whether all levels are there. 286MB CHM here: https://workupload.com/file/D9JtmNwrhPj Looks OK to me. Imported the HTML and images into H+M, and then exported to a Word file. With the import settings I converted all simple box tables to just paragraphs of text, so that may be helpful for some cut-n-paste. Not sure why all the table text got big grey borders on import. Was easy to change manually, but quite a PITA for so many. As mentioned by others above, you could/should modify the HTML first (that may fix it). Normally in H+M you would rearrange the imported HTML page titles into your desired outline, but this is just the page title order as imported (alphabetical). 310MB DOCX here: https://workupload.com/file/kvY9jMWR6c6 After looking at the bizarre layout of the pages, not sure how helpful any of this is. Found myself looking at it and wondering how I would rearrange/reformat it. You do have a daunting task.
vikingtone (Author) Posted February 10, 2023 I enclose my edit, which is the foundation of how I'd like the book to appear, though with pagination for a decent layout. I left a lot of text blocks unedited in order to get the pagination right later. I appreciate you taking the trouble to look at this, thank you. Jockey Club history.html
vikingtone (Author) Posted February 10, 2023 Mick Rose, thanks for looking at this for me; it is appreciated. Yes, the site is a challenge on so many levels. It may be that Publisher can't cope, though manual page links and adding many blank pages just to prune them later may work, but it's a lot of effort and a bodge. I just don't have many other options. I have tried PDF and HTML converters online, and most either butcher the result or crash. Anyway, I really need to find a way. After this one, there are over ten more sections the owner would like turning into books. Good grief. Cheers and thanks again, t