Chapter 5. Compression

cpdf -decompress in.pdf -o out.pdf

cpdf -compress in.pdf -o out.pdf

cpdf -squeeze in.pdf [-squeeze-log-to <filename>]
     [-squeeze-no-recompress] [-squeeze-no-page-data] -o out.pdf

cpdf provides basic facilities for decompressing and compressing PDF streams, and for reprocessing the whole file to ‘squeeze’ it.

5.1 Decompressing a Document

To decompress the streams in a PDF file, for instance to manually inspect the PDF, use:

cpdf -decompress in.pdf -o out.pdf

If cpdf finds a compression type it can’t cope with, the stream is left compressed. When using -decompress, object streams in the output are also left uncompressed. For manual inspection it may be easier to remove object streams altogether, by adding the -no-preserve-objstm option to the command.
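The rule that unsupported streams are left compressed can be illustrated with a small Python sketch. This is a toy model, not cpdf's implementation: a stream is represented by a hypothetical dict with "Filter" and "data" keys, only FlateDecode is handled (via the standard zlib module), and JBIG2Decode merely stands in for a filter the sketch does not understand. It makes no claim about which filters cpdf itself supports.

```python
import zlib


def decompress_stream(stream):
    """Return a copy of `stream` with Flate compression removed if possible.

    `stream` is a hypothetical dict: {"Filter": name or None, "data": bytes}.
    """
    if stream.get("Filter") != "FlateDecode":
        # Already uncompressed, or a filter this sketch cannot decode:
        # leave the stream exactly as it was.
        return dict(stream)
    return {"Filter": None, "data": zlib.decompress(stream["data"])}


content = b"BT /F1 12 Tf (Hello) Tj ET"
flate = {"Filter": "FlateDecode", "data": zlib.compress(content)}
# Stand-in for an unsupported filter (an assumption for illustration only).
other = {"Filter": "JBIG2Decode", "data": b"\x00\x01opaque"}

print(decompress_stream(flate))   # filter removed, readable content restored
print(decompress_stream(other))   # returned unchanged
```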

5.2 Compressing a Document

To compress the streams in a PDF file, use:

cpdf -compress in.pdf -o out.pdf

cpdf compresses any streams which have no compression using the FlateDecode method, with the exception of Metadata streams, which are left uncompressed.
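As a rough illustration of this selection rule, the following Python sketch compresses only those streams which have no filter and are not Metadata. It is a toy model under stated assumptions, not cpdf's code: streams are hypothetical dicts with "Type", "Filter" and "data" keys, and zlib supplies the Flate compression.

```python
import zlib


def compress_stream(stream):
    """Flate-compress `stream` unless it already has a filter or is Metadata."""
    if stream.get("Filter") is not None or stream.get("Type") == "Metadata":
        return dict(stream)  # left as it is
    return {**stream, "Filter": "FlateDecode",
            "data": zlib.compress(stream["data"])}


# Hypothetical example streams.
page = {"Type": None, "Filter": None, "data": b"0 0 m 100 100 l S\n" * 50}
meta = {"Type": "Metadata", "Filter": None, "data": b"<x:xmpmeta/>"}
image = {"Type": "XObject", "Filter": "DCTDecode", "data": b"\xff\xd8jpeg"}

for s in (page, meta, image):
    out = compress_stream(s)
    print(s["Type"], s["Filter"], "->", out["Filter"], len(out["data"]), "bytes")
```

Only the first stream changes: the Metadata stream stays uncompressed, and the stream which already has a filter is not compressed a second time.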

5.3 Squeezing a Document

To squeeze a PDF file, reducing its size by an average of about twenty percent (though sometimes not at all), use:

cpdf -squeeze in.pdf -o out.pdf

Adding -squeeze to the command line when using another operation will squeeze the file or files upon output.

The -squeeze operation writes information about the squeezing process to standard output. Squeezing runs several passes, each of which attempts to reduce the file size losslessly. It is slow, so consider whether the saving justifies the extra time before using it.

$ ./cpdf -squeeze in.pdf -o out.pdf
Initial file size is 238169 bytes
Beginning squeeze: 123847 objects
Squeezing... Down to 114860 objects
Squeezing... Down to 114842 objects
Squeezing page data
Recompressing document
Final file size is 187200 bytes,  78.60% of original.
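The percentage on the final line is simply the final size divided by the initial size. A short Python sketch can recover it from a captured log. It assumes the log lines have exactly the wording shown in the sample above, which may differ between cpdf versions; the function name squeeze_ratio is hypothetical.

```python
import re


def squeeze_ratio(log_text):
    """Return final size / initial size, read from a squeeze log."""
    initial = re.search(r"Initial file size is (\d+) bytes", log_text)
    final = re.search(r"Final file size is (\d+) bytes", log_text)
    if not (initial and final):
        raise ValueError("log does not contain both file sizes")
    return int(final.group(1)) / int(initial.group(1))


# The sample log from this section.
log = """Initial file size is 238169 bytes
Beginning squeeze: 123847 objects
Squeezing... Down to 114860 objects
Squeezing... Down to 114842 objects
Squeezing page data
Recompressing document
Final file size is 187200 bytes,  78.60% of original."""

print(f"{squeeze_ratio(log):.2%} of original")  # 78.60% of original
```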

The -squeeze-log-to <filename> option writes the log to the given file instead of to standard output. Log output is appended to the end of the log file, preserving any existing contents.

Two options turn off parts of the squeezer: -squeeze-no-recompress avoids the reprocessing of compressed sections, which helps when they are malformed, and -squeeze-no-page-data avoids the reprocessing of page data, which helps when the page data is malformed.
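When driving cpdf from a script, the squeeze options can be assembled into an argument list. The Python sketch below uses only the flags documented in this chapter and follows the order of the synopsis; the helper name squeeze_command and its parameters are hypothetical, and it assumes a cpdf executable is available on the PATH when the command is eventually run.

```python
def squeeze_command(infile, outfile, log_to=None,
                    recompress=True, page_data=True, cpdf="cpdf"):
    """Build the argument list for a cpdf -squeeze invocation."""
    args = [cpdf, "-squeeze", infile]
    if log_to is not None:
        args += ["-squeeze-log-to", log_to]      # append the log to this file
    if not recompress:
        args.append("-squeeze-no-recompress")    # turn off recompression
    if not page_data:
        args.append("-squeeze-no-page-data")     # turn off page data processing
    args += ["-o", outfile]
    return args


print(squeeze_command("in.pdf", "out.pdf", log_to="squeeze.log",
                      page_data=False))
# To run it: subprocess.run(squeeze_command("in.pdf", "out.pdf"), check=True)
```

Passing a list to subprocess.run, rather than a single shell string, avoids quoting problems with file names which contain spaces.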

Python Interface

# CHAPTER 5. Compression 
def compress(pdf): 
    """Compress any uncompressed streams in the given PDF using the Flate
    algorithm."""
def decompress(pdf): 
    """Decompress any streams in the given PDF, so long as the compression 
    method is supported.""" 
def squeezeInMemory(pdf): 
    """squeezeInMemory(pdf) squeezes a pdf in memory. Squeezing is a lossless 
    compression method which works by rearrangement of a PDF's internal