Data Merge unusable on large quantities, insanely long export time and enormous amount of disk space is consumed


Blake_S

I tried to do the following:

  • Create a 2-page document 90x50mm
  • Create 2 master pages, on each page place one PDF page to serve as the static content, apply them to respective regular pages.
    PDF file size is 2.5MB
  • Create a single text field on one of the pages and link it to a data field from a spreadsheet; this is the only dynamic content present, 2000 records total
  • Data merge -> create merged document, goes fast enough, under 10 seconds
  • Export as PDF -> the export process didn't even move past 25% after 5 minutes.

This is completely insane: the only variable element is a single text field, so it should be done in seconds.

It also eats enormous amounts of disk space, dozens of gigabytes. At around 33% I ran out of disk space, and the job processing was cancelled with an error.
PersonaBackstore.dat was 34 GB at the time the disk ran out of space.

It seems like it's trying to re-print the background from the linked PDF for every single record, rather than once.
_____________________

I did the exact same steps in Adobe InDesign CS5, and the export to PDF for all 2000 records took 15 seconds.
No additional disk space was consumed, and barely any RAM was used.
Same export settings.
I didn't need to create the merged file either; it exports directly to PDF.


It sounds like you're creating a 4000 page document with your data merge, and each page contains a PDF file.

That sounds fairly complex to me. Perhaps you should simplify things. For example, place a TIFF or JPG rather than a PDF, perhaps?

-- Walt
Designer, Photo, and Publisher V1 and V2 at latest retail and beta releases
PC:
    Desktop:  Windows 11 Pro, version 23H2, 64GB memory, AMD Ryzen 9 5900 12-Core @ 3.00 GHz, NVIDIA GeForce RTX 3090 

    Laptop:  Windows 11 Pro, version 23H2, 32GB memory, Intel Core i7-10750H @ 2.60GHz, Intel UHD Graphics Comet Lake GT2 and NVIDIA GeForce RTX 3070 Laptop GPU.
iPad:  iPad Pro M1, 12.9": iPadOS 17.4.1, Apple Pencil 2, Magic Keyboard 
Mac:  2023 M2 MacBook Air 15", 16GB memory, macOS Sonoma 14.4.1


37 minutes ago, walt.farrell said:

It sounds like you're creating a 4000 page document with your data merge, and each page contains a PDF file.

That sounds fairly complex to me. Perhaps you should simplify things. For example, Place a TIFF or JPG rather than PDF, perhaps?

Not that it matters, but it's a 2000 page document when the merge is completed...

The image, no matter the type, will still be included on each and every page. While that might enable the pdf to generate fully with APub, the resultant pdf will still be far larger than it ought to be...at least compared to ID/QXP, and likely others.

An ID/QXP pdf of the same construction will only include a single copy of the master page pdf/image and merely reference it on subsequent pages that use it. That matters a lot both in speed of export but also of final pdf size.


17 minutes ago, MikeW said:

Not that it matters, but it's a 2000 page document when the merge is completed...

It started as a 2-page document, so the merge should give 4000 pages, shouldn't it?

-- Walt


19 minutes ago, walt.farrell said:

It started as a 2-page document, so the merge should give 4000 pages, shouldn't it?

Yep. My bad.

Still, the rest of what I wrote remains true. There would still be the single pdf/image for the first two master page instances of the generated pdf, the rest of the pages reference those.
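
To put rough numbers on the difference between referencing and duplicating the background, here is a back-of-the-envelope sketch. It is not how any of these programs actually write PDFs; the function name, the 1 KB of text per page, and the even split of the 2.5MB PDF across its two pages are all assumptions made for illustration.

```python
# Rough size estimate only. Figures marked "assumption" are not from the thread.
MB = 1024 * 1024


def estimate_pdf_size(pages, background_bytes, distinct_backgrounds,
                      per_page_text_bytes, shared):
    """Return an approximate output size in bytes.

    shared=True  -> each distinct background is stored once and referenced.
    shared=False -> every page carries its own copy of its background.
    """
    text_total = pages * per_page_text_bytes
    if shared:
        return distinct_backgrounds * background_bytes + text_total
    return pages * background_bytes + text_total


pages = 2 * 2000                  # 2-page template x 2000 records
background = int(1.25 * MB)       # assumption: 2.5MB PDF split evenly over 2 pages
text_per_page = 1024              # assumption: about 1 KB of text per page

shared_size = estimate_pdf_size(pages, background, 2, text_per_page, shared=True)
duplicated_size = estimate_pdf_size(pages, background, 2, text_per_page, shared=False)

print(f"shared:     {shared_size / MB:.1f} MB")
print(f"duplicated: {duplicated_size / MB:.1f} MB")
```

Under these assumptions the shared layout stays in the single-digit megabytes while the duplicated one reaches several gigabytes. This only illustrates the file-size argument; it does not by itself explain the size of the scratch file reported earlier.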


2 hours ago, walt.farrell said:

That sounds fairly complex to me.

It's not. I already mentioned that only a single text field is variable, while all 4000 pages are supposed to be linked, since they use the exact same content.
Plus, as I said, InDesign exported the file in 15 seconds.
The number of records shouldn't matter much as long as your only variables are in text and you don't try to merge images or PDFs, which I do not.

If you can't even data merge a single text field, what would happen if you tried to data merge a brochure with, say, 10 text fields and several pictures on each page?

2 hours ago, walt.farrell said:

For example, Place a TIFF or JPG rather than PDF, perhaps?

That heavily degrades text quality, so it's not an option.


1 hour ago, MikeW said:

An ID/QXP pdf of the same construction will only include a single copy of the master page pdf/image and merely reference it on subsequent pages that use it.

Yep, that's how it's supposed to be: the file placed on a master page should be linked on all the pages that use that master page, so that only one copy of it is present in the exported file, not duplicated.


1 hour ago, Blake_S said:

The number of records shouldn't matter much as long as your only variables are in text and you don't try to merge images or PDFs, which I do not.

Don't you have that PDF file on at least 2000 of the pages, once it's merged?

-- Walt


19 hours ago, walt.farrell said:

Don't you have that PDF file on at least 2000 of the pages, once it's merged?

Sure, but as I said, the only variable data there is a single text field. It's the simplest data merge job possible.
Everything else is the same on all pages, so it should exist as a single copy with no duplication; it could be 10000 pages and the file size would be about the same.

I've seen data merge examples posted on the official Affinity site with many merged text fields on a page, plus merged images.

Yet my job fails with only one text field.
And of course there is absolutely no way a program should write dozens of gigabytes to disk to make a merged file that is under 10MB.
It's the exact same problem as in V1.


3 hours ago, Blake_S said:

It's the simplest data merge job possible.

No, it's not. 

A simpler one would be with a TIFF or JPG exported from the PDF, as I mentioned earlier. Is there a reason that your merged output file requires 2000 copies of a PDF, rather than using an image file?

-- Walt


  • 3 weeks later...
  • Staff

Hi @Blake_S,

I've not been able to replicate this. I followed the same setup as you and even used an Excel file with over 2000 records, and the merge was done in under a minute and didn't eat away at the hard drive space.

If you can still replicate this, can you provide your afpub file from before the merge, making sure your PDFs are embedded, and also attach your CSV file?


  • 2 weeks later...

Still having constant problems with this on nearly every data merge job, export takes minutes and writes dozens of GB to disk.

Just now export failed because Affinity consumed all disk space and then crashed.

I'll set up a test file and link it here later.


On 1/17/2023 at 2:01 PM, walt.farrell said:

Is there a reason that your merged output file requires 2000 copies of a PDF, rather than using an image file?

Yes, rasterizing PDFs is bad practice, as it degrades text quality significantly. The rasterized version prints far worse.

Also, it's not 2000 copies; it's 1 copy, which should be linked on all pages, or 2 copies in the case of a 2-sided print with different sides.
The number of records should make almost no difference, as all of them should link to the same background PDFs.


Also just noticed: I don't even have to export the file. Affinity starts writing dozens of GB of data to disk just from having a merged file open.
Right now it has written 23GB to disk after a merge with about 500 records. Insane. And all this is to make a 2MB PDF in the end...
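
For anyone who wants to attach numbers like these to a report, a small script can record how a scratch file grows while the merged document is open. This is only a sketch: both helper names are made up for illustration, and the path to the file (for example PersonaBackstore.dat, mentioned earlier) has to be supplied by you, since its location is not given in this thread.

```python
import os
import shutil
import time


def sample_growth(path, samples=5, interval_s=1.0):
    """Return a list of (elapsed_seconds, size_bytes) readings for `path`.

    A missing file is recorded as size 0 rather than raising an error.
    """
    readings = []
    start = time.monotonic()
    for i in range(samples):
        try:
            size = os.path.getsize(path)
        except OSError:
            size = 0
        readings.append((time.monotonic() - start, size))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings


def free_bytes(folder):
    """Return the free space, in bytes, on the volume that holds `folder`."""
    return shutil.disk_usage(folder).free
```

Calling `sample_growth(your_path, samples=60, interval_s=5)` while the merged file is open gives a simple growth log, and `free_bytes(folder)` shows how much headroom is left before an export is started.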


@Blake_S

What helped for me was first removing the PDF document from the master pages and then reinserting it. After that, the data merge was done within 3 seconds. But the exported PDF file has a size of over 40 GB.

AMD Ryzen 7 5700X | INTEL Arc A770 LE 16 GB  | 32 GB DDR4 3200MHz | Windows 11 Pro 23H2 (22631.3296)
AMD A10-9600P | dGPU R7 M340 (2 GB)  | 8 GB DDR4 2133 MHz | Windows 10 Home 22H2 (1945.3803) 

Affinity Suite V 2.4 & Beta 2.(latest)
Better translations with: https://www.deepl.com/translator  
Interested in a robust (selfhosted) PDF Solution? Have a look at Stirling PDF

Life is too short to have meaningless discussions!


On 2/20/2023 at 3:49 PM, Komatös said:

After that the data merge was done within 3 seconds.

In V2 I have no problem with data merge speed, that is fast.

I have problems with:

  • disk space / RAM consumption during data merge and while merged document is opened (can be multiple dozens of GB)
  • disk space / RAM consumption during Export process - can exceed 100 GB depending on the project
  • Export speed - dozens of times slower than Adobe InDesign with about the same settings
     

  • Staff
On 2/20/2023 at 1:49 PM, Komatös said:

But the exported PDF file has a size of over 40 GB.

Which PDF preset did you use? I tried the For Print and For Export presets, and both gave me a 33MB PDF file.

On 2/22/2023 at 8:40 AM, Blake_S said:
  • disk space / RAM consumption during data merge and while merged document is opened (can be multiple dozens of GB)
  • disk space / RAM consumption during Export process - can exceed 100 GB depending on the project

I've replicated this with your files and will get this logged with the Dev team :) 


Hi @stokerg

File sizes vary between 40.03 and 41.17 MB

[Attached screenshots: forexport.png, forprint.png, pressready.png]


Maybe it's the font substitution? Since I don't have Minion, I used Arial as a substitute font.

Edited by Komatös
Additional info



  • Staff
3 minutes ago, Komatös said:

File sizes vary between 40.03 and 41.17 MB

 

Okay, that's fine :)

It's just that in the previous post you said 40 GB and not MB. I was starting to worry about why my exported PDF was so much smaller!


48 minutes ago, stokerg said:

Okay, that's fine :)

It's just that in the previous post you said 40 GB and not MB. I was starting to worry about why my exported PDF was so much smaller!

The wild horses had probably run away with me 🤣



  • 2 weeks later...

I'm trying to generate 13,500 pages with only 2 fields on each page: a numbered page, a white page, with no image, PDF, or anything else linked or embedded. Only a text field, a number to use later in Acrobat.

Summary: I imported an Excel file with 13,000 lines of data, the numbers 5001 to 18500. A simple file.

Result: a large amount of memory is consumed (25 GB of RAM used) and Affinity Publisher (AFP) crashes when I click on the Export menu; the program crashes right in front of me, haha.

InDesign has an option to export directly; there is no need to generate a new file with all the merged pages in order to export later.

I have tried setting the cache memory to a high number of GB and the performance option in AFP to use a large amount too, without a good result.
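
If someone wants to try reproducing this case, a data source like the one described can be generated with a few lines of Python. This is a sketch under assumptions: the column header "Number", the file name, and the helper name are all invented here, and it writes a CSV rather than an Excel file.

```python
import csv


def write_number_csv(path, first=5001, last=18500, header="Number"):
    """Write one header row plus one row per number; return the record count."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([header])          # "Number" is a hypothetical field name
        for n in range(first, last + 1):   # inclusive range, as in the post
            writer.writerow([n])
    return last - first + 1
```

Calling `write_number_csv("numbers.csv")` produces a single-column file covering 5001 to 18500 that can be attached to a report alongside the document.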


18 hours ago, Bruno Belo said:

I'm trying to generate 13,500 pages with only 2 fields on each page: a numbered page, a white page, with no image, PDF, or anything else linked or embedded. Only a text field, a number to use later in Acrobat.

So you have two fields: a "number" and a "white page". I understand what the number is, but I am struggling to figure out what the white page is. What is in the Excel file for that cell? In the screenshot (of Apple's Numbers), what would be in the cells with the "???"?

[Attached screenshot: Apple's Numbers spreadsheet]

Mac Pro (Late 2013) Mac OS 12.7.4 
Affinity Designer 2.4.1 | Affinity Photo 2.4.1 | Affinity Publisher 2.4.1 | Beta versions as they appear.

I have never mastered color management, period, so I cannot help with that.
