How to batch open multiple PDF files and batch export them again as PDF files
#4
(11-07-2018, 09:56 PM)Ofnuts Wrote: You can't specify the resolution when importing them via the API, unfortunately.

Thank you for the information! That is unfortunately already a preventing issue for me...

(11-08-2018, 10:01 AM)rich2005 Wrote: Saw your post on gimpchat, are you an ex-M$ photoshop user? If so then try and forget old PS habits. Using Gimp 2.8, nothing wrong with that, old plugins more likely to work. Probably not using a cutting edge linux, again nothing wrong with that. (oh, just seen you are from CERN - very impressive)

Before Gimp I was using...Microsoft Paint :-P So thankfully no PS habits to drop but not much experience either to help me.
You're right: Gimp 2.8 because of https://packages.debian.org/stretch/gimp

(11-08-2018, 10:01 AM)rich2005 Wrote: Apart from a bespoke plugin and Ofnuts gave advice on that. My thoughts on the subject.

100 PDFs to open. Gimp can open them as layers, but that involves 100 clicks on the import button. Take into consideration that Gimp will render the PDF(s) to bitmaps: what was a scalable vector (including text) becomes a fixed-ppi bitmap. If the PDFs are multi-page, Gimp will number the layers per document, i.e. 1,2,3, 1#1,2#1,3#1,4, which can result in a scrambled final combined PDF. There are scripts/plug-ins that will rename layers with consecutive numbers.

A better way is to combine the PDFs beforehand with a utility such as PDFsam (PDF split-and-merge); there is a free version. Combining will give a single PDF to open in Gimp.
or
If many of the pages are pure text, splitting allows picking just the graphic pages from the PDF(s) to be edited/recombined. Depending on the number of pure text pages, this can make a big difference in the final PDF size.

edit: Also have a look at pdftk. It is command-line only, but that makes it more suitable for a Linux bash script.
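
For reference, a minimal sketch of the merge step with pdftk. The helper name `merge_pdfs` and the file names in the usage comment are made up for illustration; `cat` and `output` are pdftk's own keywords.

```shell
# merge_pdfs OUT.pdf IN1.pdf IN2.pdf ...
# Concatenates the inputs, in the order given, into OUT.pdf.
# (merge_pdfs is a hypothetical helper name, not part of pdftk.)
merge_pdfs() {
    out="$1"
    shift
    # "cat" with no page ranges takes every page of every input, in order.
    pdftk "$@" cat output "$out"
}

# Example (hypothetical file names):
#   merge_pdfs combined.pdf chapter1.pdf chapter2.pdf chapter3.pdf
```

Passing the inputs explicitly keeps the page order under your control, which matters if the file names do not sort the way you expect.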

What is possible with Gimp (and linux)
Install ImageMagick (IM). It is probably already there; if not, it will be in your distro repository.
Export all the layers. There are scripts/plug-ins for this: the attached sg-save-all-layers.scm exports as png and will flatten layer groups and layer masks (not all will do that). Screenshot: https://i.imgur.com/k0HgM6q.jpg Now combine those into a PDF using the IM convert command:

Code:
convert /path/to/*.png new.pdf

(Note about IM and the pdf/eps/ps formats: I use Kubuntu 16.04, and that command suddenly stopped working with the last update, a month ago. I had to go into /etc/ImageMagick-6/policy.xml and comment out the entries that disable those formats. Why disabled? Who knows.)
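
A quick way to see which formats a policy file blocks is sketched below. It assumes the entries look like `<policy domain="coder" rights="none" pattern="PDF" />`, one per line, which may not match every distro's file; the helper name `blocked_coders` is made up.

```shell
# blocked_coders POLICY_FILE
# Prints the pattern of every active (not commented-out) coder entry
# that has rights="none". Assumes one <policy .../> entry per line.
# (blocked_coders is a hypothetical helper name.)
blocked_coders() {
    grep 'domain="coder"' "$1" \
        | grep 'rights="none"' \
        | grep -v '<!--' \
        | sed -n 's/.*pattern="\([^"]*\)".*/\1/p'
}

# Example:
#   blocked_coders /etc/ImageMagick-6/policy.xml
```

If PDF, PS or EPS shows up in the output, convert will refuse to read or write that format until the entry is commented out or relaxed.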

ImageMagick, same as Gimp, rasterizes any file, so you do end up with large PDFs.

The above, but straight from Gimp.
There is an (old, bit flaky) plug-in, export-layers-to-pdf.py (attached), that uses IM (see the note about the pdf format) to export the layers to a temp folder and combine them into a new PDF. Caveats: it will fail with layer groups and layer masks, so it is up to you to flatten first. Screenshot with the result open in a PDF viewer: https://i.imgur.com/CqG5eMg.jpg Choose a quality setting less than 100 and the bitmaps are jpeg, which makes a big difference to the PDF size.
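
The same JPEG trick can be applied with convert directly. This is only a sketch of the idea, not what the plug-in runs internally; the helper name `pngs_to_pdf` and the example quality value are made up.

```shell
# pngs_to_pdf QUALITY OUT.pdf IN1.png IN2.png ...
# Builds a PDF whose pages are stored as JPEG instead of lossless bitmaps.
# (pngs_to_pdf is a hypothetical helper name.)
pngs_to_pdf() {
    quality="$1"
    out="$2"
    shift 2
    # -compress jpeg: store each page as a JPEG stream inside the PDF
    # -quality: JPEG quality, 1-100 (lower means smaller and lossier)
    convert "$@" -compress jpeg -quality "$quality" "$out"
}

# Example (hypothetical paths):
#   pngs_to_pdf 85 new.pdf /path/to/*.png
```

JPEG is lossy, so for line art and text it is worth checking the result at print size before settling on a quality value.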

Long and complicated: Yes, but at least it is not dumbed-down click-n-wish.

Thank you so much for your detailed answer - I really appreciate it.

A few notes on my use case: we're processing scientific papers (PDFs, at times with hundreds of pages and hundreds of figures) that eventually get printed. Printing requires conversion to PDF/X, and vector figures are suspect #1 for problems during that conversion. One good-enough solution in that case is to rasterise the problematic figures (or even the entire page that contains the figure). When rasterising we want to use a high enough DPI (>=300, usually 500) to make sure there is no visible drop in quality (for A4 printing).
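
To give a feel for what those DPI values mean for a full A4 page (210 x 297 mm), here is a small sketch of the arithmetic. The helper name `page_pixels` is made up; the only facts used are the A4 dimensions and 25.4 mm per inch.

```shell
# page_pixels WIDTH_MM HEIGHT_MM DPI
# Prints the bitmap size (rounded to the nearest pixel) of a page
# rasterised at the given DPI. (page_pixels is a hypothetical helper name.)
page_pixels() {
    awk -v w="$1" -v h="$2" -v dpi="$3" \
        'BEGIN { printf "%dx%d\n", w / 25.4 * dpi + 0.5, h / 25.4 * dpi + 0.5 }'
}

page_pixels 210 297 300   # prints 2480x3508
page_pixels 210 297 500   # prints 4134x5846
```

Going from 300 to 500 DPI nearly triples the pixel count per page, which is one reason the file size of the rasterised output matters so much.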

This is where my initial question and Gimp come in. Normally we have the PDF/EPS figure files (could be hundreds) or we use pdftk to pick single pages. Then we use Gimp to open them (which converts them to bitmaps with the DPI we select, as you mention) and then export them to PDF again. For a few figures/pages that's easy to do manually, but when there are a lot of them we are looking for ways to automate it.
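
The page-picking step with pdftk looks roughly like the sketch below. The helper name `extract_pages`, the `page-<N>.pdf` naming and the example file names are made up for illustration.

```shell
# extract_pages IN.pdf OUTDIR PAGE...
# Writes each listed page of IN.pdf to OUTDIR/page-<N>.pdf.
# (extract_pages and the output naming scheme are hypothetical.)
extract_pages() {
    in="$1"
    outdir="$2"
    shift 2
    mkdir -p "$outdir" || return 1
    for page in "$@"; do
        # "cat N" selects a single page; a range such as 12-14 also works.
        pdftk "$in" cat "$page" output "$outdir/page-$page.pdf" || return 1
    done
}

# Example (hypothetical file name and page numbers):
#   extract_pages paper.pdf pages 12 47 103
```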

For now, given an input.pdf file, we automated like this (imagine this in a bash script with loops etc):
Code:
$ pdftoppm -r 300 -singlefile input.pdf output
Code:
$ convert -density 300 output.ppm output.pdf
However, we're not as happy with the outcome of pdftoppm + convert as we are with Gimp, especially when it comes to the final file size. Gimp seems to optimize a few things in the process (ImageMagick vs cairo graphics ?). So we were wishing to automate it in Gimp.
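
For completeness, the loop around those two commands is roughly the sketch below. It assumes single-page figure PDFs and a separate output directory; the function names and directory names are made up, and `-singlefile` is used so that pdftoppm writes exactly one `<name>.ppm` without a page-number suffix.

```shell
# rasterise_pdf IN.pdf DPI OUTDIR
# Rasterises the first page of IN.pdf and wraps the bitmap in a new PDF.
# (rasterise_pdf and rasterise_all are hypothetical helper names.)
rasterise_pdf() {
    src="$1"; dpi="$2"; outdir="$3"
    base=$(basename "$src" .pdf)
    # -singlefile: first page only, output is exactly $outdir/$base.ppm
    pdftoppm -r "$dpi" -singlefile "$src" "$outdir/$base" || return 1
    # -density before the input tells IM the resolution of the bitmap
    convert -density "$dpi" "$outdir/$base.ppm" "$outdir/$base.pdf" || return 1
    rm -f "$outdir/$base.ppm"
}

# rasterise_all INDIR DPI OUTDIR
# Runs rasterise_pdf on every *.pdf in INDIR. Use an OUTDIR different
# from INDIR, otherwise the originals would be overwritten.
rasterise_all() {
    indir="$1"; dpi="$2"; outdir="$3"
    mkdir -p "$outdir" || return 1
    for f in "$indir"/*.pdf; do
        [ -e "$f" ] || continue   # no matches: the glob stays literal
        rasterise_pdf "$f" "$dpi" "$outdir" || return 1
    done
}

# Example (hypothetical directories):
#   rasterise_all figures 300 figures_raster
```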

See for example the difference in these files: https://cernbox.cern.ch/index.php/s/LWdT2UDsECdVrfn
* levels2nn_INPUT.pdf is the original file (based on an EPS file, following epspdf conversion) --> 11K
* levels2nn_GIMP.pdf is the output of Gimp (import with 300 DPI and export as PDF) --> 151K
* levels2nn_PPM_IM.pdf is the output of pdftoppm converted to PDF with IM (using 300 DPI as well, with the commands quoted just above) --> 250K
* levels2nn_PPM_GIMP.pdf is the output of pdftoppm imported into Gimp and exported to PDF (first scale image to 300 DPI and then export as PDF) --> 97K
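
A comparison like the one above can be produced with a few lines of shell. This is just a sketch; the helper name `size_report` is made up, and the first file given is used as the reference.

```shell
# size_report REFERENCE FILE...
# Prints "<bytes> TAB <ratio>x TAB <file>" for the reference and every
# other file, with the ratio taken relative to the reference.
# (size_report is a hypothetical helper name.)
size_report() {
    ref_bytes=$(( $(wc -c < "$1") ))
    [ "$ref_bytes" -gt 0 ] || return 1   # avoid dividing by zero
    for f in "$@"; do
        bytes=$(( $(wc -c < "$f") ))
        awk -v b="$bytes" -v r="$ref_bytes" -v name="$f" \
            'BEGIN { printf "%d\t%.1fx\t%s\n", b, b / r, name }'
    done
}

# Example, using the file names from the list above:
#   size_report levels2nn_INPUT.pdf levels2nn_GIMP.pdf \
#               levels2nn_PPM_IM.pdf levels2nn_PPM_GIMP.pdf
```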

Maybe the ideal combination would be to batch convert to PPM (with pdftoppm) and then batch open in GIMP, scale to 300 DPI and export as PDF. Would that be possible with a Gimp script?

Thank you both very much for your time!


Messages In This Thread
RE: How to batch open multiple PDF files and batch export them again as PDF files - by kasioumis - 11-22-2018, 01:38 PM
