Chapter 13
Working with Images

cpdf -extract-images in.pdf [<range>] [-im <path>] [-p2p <path>]      [-dedup | -dedup-perpage] -o <path>

cpdf -image-resolution <minimum resolution> in.pdf [<range>]

13.1 Extracting images

Cpdf can extract the raster images to a given location. JPEG, JPEG2000 and JBIG2 images are extracted directly. Other images are written as PNGs, processed with either ImageMagick’s “magick” command, or NetPBM’s “pnmtopng” program, whichever is installed.

cpdf -extract-images in.pdf [<range>] [-im <path>] [-p2p <path]      [-dedup | -dedup-perpage] -o <path>

The -im or -p2p option is used to give the path to the external tool, one of which must be installed. The output specifer, e.g -o output/%%% gives the number format for numbering the images. Output files are named serially from 0, and include the page number too. For example, output files might be called output/000-p1.jpg, output/001-p1.png, output/002-p3.jpg etc. Here is an example invocation:

cpdf -extract-images in.pdf -im magick -o output/%%%

The output directory must already exist. The -dedup option deduplicates images entirely; the -dedup-perpage option only per page.

13.2 Detecting Low-resolution Images

To list all images in the given range of pages which fall below a given resolution (in dots-per-inch), use the -image-resolution function:

cpdf -image-resolution 300 in.pdf [<range>]

2, /Im5, 531, 684, 149.935297, 150.138267
2, /Im6, 184, 164, 149.999988, 150.458710
2, /Im7, 171, 156, 149.999996, 150.579145
2, /Im9, 65, 91, 149.999986, 151.071856
2, /Im10, 94, 60, 149.999990, 152.284285
2, /Im15, 184, 139, 149.960011, 150.672060
4, /Im29, 53, 48, 149.970749, 151.616446

The format is page number, image name, x pixels, y pixels, x resolution, y resolution. The resolutions refer to the image’s effective resolution at point of use (taking account of scaling, rotation etc).

13.3 Removing an Image

To remove a particular image, find its name using -image-resolution with a sufficiently high resolution (so as to list all images), and then apply the -draft and -draft-remove-only operations from Section 18.1.

Python Interface

 
# CHAPTER 13. Images 
 
def getImageResolution(pdf, min_required_resolution): 
    """Return a list of all uses of images in the PDF which do not meet the 
    minimum required resolution in dpi as tuples of: 
    (pagenumber, name, x pixels, y pixels, x resolution, y resolution)"""