Chapter 13
Working with Images

cpdf -extract-images in.pdf [<range>] [-im <path>] [-p2p <path>] [-dedup | -dedup-perpage] -o <path>

cpdf -image-resolution <minimum resolution> in.pdf [<range>]

13.1 Extracting Images

Cpdf can extract the raster images to a given location. JPEG, JPEG2000 and JBIG2 images are extracted directly. Other images are written as PNGs, processed with either ImageMagick’s “magick” command, or NetPBM’s “pnmtopng” program, whichever is installed.

cpdf -extract-images in.pdf [<range>] [-im <path>] [-p2p <path>] [-dedup | -dedup-perpage] -o <path>

The -im or -p2p option gives the path to the external tool; one of the two tools must be installed. The output specifier, e.g. -o output/%%%, gives the number format for numbering the images. Output files are numbered serially from 0, and their names include the page number too. For example, output files might be called output/000-p1.jpg, output/001-p1.png, output/002-p3.jpg etc. Here is an example invocation:

cpdf -extract-images in.pdf -im magick -o output/%%%

The output directory must already exist. The -dedup option deduplicates images across the whole document; the -dedup-perpage option deduplicates only within each page.
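Because the output directory must exist before cpdf runs, a small wrapper script can create it first. The following is a minimal Python sketch, assuming cpdf is on the PATH; the function names build_extract_command and extract_images are hypothetical helpers, not part of cpdf. The flags and their order follow the synopsis above.

```python
from pathlib import Path
import subprocess


def build_extract_command(pdf, out_dir, tool_flag="-im", tool_path="magick",
                          dedup=None, pattern="%%%"):
    """Build the argument list for cpdf -extract-images (hypothetical helper)."""
    if tool_flag not in ("-im", "-p2p"):
        raise ValueError("tool_flag must be -im or -p2p")
    if dedup not in (None, "-dedup", "-dedup-perpage"):
        raise ValueError("dedup must be None, -dedup or -dedup-perpage")
    cmd = ["cpdf", "-extract-images", pdf, tool_flag, tool_path]
    if dedup is not None:
        cmd.append(dedup)
    # The output specifier combines the directory and the number format.
    cmd += ["-o", f"{out_dir}/{pattern}"]
    return cmd


def extract_images(pdf, out_dir, **options):
    """Create the output directory if needed, then run cpdf."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_extract_command(pdf, out_dir, **options), check=True)
```

Calling extract_images("in.pdf", "output") would run the same command as the example invocation above, after making sure that the output directory exists.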

13.2 Detecting Low-resolution Images

To list all images in the given range of pages which fall below a given resolution (in dots per inch), use the -image-resolution operation:

cpdf -image-resolution 300 in.pdf [<range>]

2, /Im5, 531, 684, 149.935297, 150.138267
2, /Im6, 184, 164, 149.999988, 150.458710
2, /Im7, 171, 156, 149.999996, 150.579145
2, /Im9, 65, 91, 149.999986, 151.071856
2, /Im10, 94, 60, 149.999990, 152.284285
2, /Im15, 184, 139, 149.960011, 150.672060
4, /Im29, 53, 48, 149.970749, 151.616446

Each line gives the page number, image name, x pixels, y pixels, x resolution and y resolution. The resolutions refer to the image’s effective resolution at the point of use (taking account of scaling, rotation, etc.).
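The listing is easy to process in a script. The following Python sketch parses the six comma-separated fields into named records; it assumes that image names contain no commas, and the names ImageUse and parse_image_resolution are hypothetical, chosen for illustration.

```python
from typing import NamedTuple


class ImageUse(NamedTuple):
    page: int
    name: str
    x_pixels: int
    y_pixels: int
    x_res: float
    y_res: float


def parse_image_resolution(text):
    """Parse the output of cpdf -image-resolution into ImageUse records."""
    uses = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: exactly six fields, none of which contains a comma.
        page, name, xpx, ypx, xres, yres = [f.strip() for f in line.split(",")]
        uses.append(ImageUse(int(page), name, int(xpx), int(ypx),
                             float(xres), float(yres)))
    return uses


sample = """2, /Im5, 531, 684, 149.935297, 150.138267
4, /Im29, 53, 48, 149.970749, 151.616446"""

for use in parse_image_resolution(sample):
    # Report the lower of the two effective resolutions for each use.
    print(use.page, use.name, min(use.x_res, use.y_res))
```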

13.3 Removing an Image

To remove a particular image, find its name using -image-resolution with a sufficiently high resolution (so as to list all images), and then apply the -draft and -draft-remove-only operations from Section 18.1.
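As an illustration, the removal step might be scripted as follows. This is only a sketch: the argument layout (-draft, then -draft-remove-only followed by the image name, then the input file, an optional range, and -o) is an assumption, since Section 18.1 is not reproduced here, and build_remove_image_command is a hypothetical helper name. Check the layout against Section 18.1 before relying on it.

```python
def build_remove_image_command(pdf, image_name, out_pdf, page_range=None):
    """Build an argument list for removing one named image (assumed layout)."""
    # Assumption: -draft-remove-only takes the image name as its argument.
    cmd = ["cpdf", "-draft", "-draft-remove-only", image_name, pdf]
    if page_range is not None:
        cmd.append(page_range)  # e.g. the page on which the image was listed
    cmd += ["-o", out_pdf]
    return cmd


# Example: image /Im5 was listed on page 2 by -image-resolution.
print(" ".join(build_remove_image_command("in.pdf", "/Im5", "out.pdf", "2")))
```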

.NET Interface

CHAPTER 13. Images.

Cpdf.startGetImageResolution(Cpdf.Pdf, Double)
Cpdf.getImageResolutionPageNumber(Int32)
Cpdf.getImageResolutionImageName(Int32)
Cpdf.getImageResolutionXPixels(Int32)
Cpdf.getImageResolutionYPixels(Int32)
Cpdf.getImageResolutionXRes(Int32)
Cpdf.getImageResolutionYRes(Int32)
Cpdf.endGetImageResolution
 
Gets image data, including resolution at all points of use. Calling
startGetImageResolution(pdf, min_required_resolution) begins the
process of obtaining data on all image uses below min_required_resolution,
and returns the total number of such uses. To return all image uses,
specify a very high min_required_resolution. Then call the other functions
with a serial number in the range 0..n - 1 to retrieve the data. Finally,
call endGetImageResolution to clean up.