Chapter 2
Merging and Splitting

2.1 Merging

The -merge operation merges several files into one. Ranges can be used to select only a subset of pages from each input file. The output file consists of the concatenation of all the selected input pages, in the order specified on the command line. The -merge flag itself can be omitted, since merging is the default operation of cpdf.
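The ordering rule can be illustrated with a small Python model. This is only a sketch of the concatenation behaviour described above, not cpdf's implementation: the function name merged_page_order and the (filename, page count, selected pages) tuples are hypothetical, and page selections are given as plain Python sequences rather than in cpdf's range syntax.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical model of one input: (filename, page_count, selected_pages).
# selected_pages=None means "all pages"; otherwise a sequence of 1-based pages.
Input = Tuple[str, int, Optional[Sequence[int]]]

def merged_page_order(inputs: List[Input]) -> List[Tuple[str, int]]:
    """Return (filename, page) pairs in the order they appear in the output."""
    output = []
    for name, page_count, selected in inputs:
        pages = range(1, page_count + 1) if selected is None else selected
        for page in pages:
            if not 1 <= page <= page_count:
                raise ValueError(f"{name}: page {page} is out of range")
            output.append((name, page))
    return output

order = merged_page_order([
    ("a.pdf", 3, [1]),          # only page 1 of a.pdf
    ("b.pdf", 4, range(2, 5)),  # pages 2 to 4 of b.pdf
])
print(order)
```

The inputs are processed strictly in the order given, so swapping the two entries in the list would place the pages of b.pdf first.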

Merge maintains bookmarks, named destinations, and name dictionaries.

Forms and other objects which cannot be merged are retained only from the first document which exhibits that feature.

The -retain-numbering option keeps the PDF page numbering labels of each document intact, rather than renumbering the output pages from 1.

The -remove-duplicate-fonts option ensures that fonts used in more than one of the inputs appear only once in the output.

2.2 Splitting

The -split operation splits a PDF file into a number of parts which are written to file, their names being generated from a format. The optional -chunk argument sets the number of pages written to each output file.

If the output format does not provide enough numbers for the files generated, the result is unspecified. The following format operators may be used:


%, %%, %%% etc.  Sequence number padded to the number of percent signs
@F               Original filename without extension
@N               Sequence number without padding zeroes
@S               Start page of this chunk
@E               End page of this chunk
@B               Bookmark name at this page
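To make the operators concrete, the following Python sketch models how chunking and format expansion might combine. It is an illustration under stated assumptions, not cpdf's code: the helper names chunk_ranges and expand_format are hypothetical, a default chunk size of one page is assumed, and sequence numbers are assumed to start at 1.

```python
import re
from pathlib import Path

def chunk_ranges(page_count, chunk=1):
    """Split pages 1..page_count into consecutive (start, end) chunks.

    The default of one page per chunk is an assumption of this sketch.
    """
    return [(start, min(start + chunk - 1, page_count))
            for start in range(1, page_count + 1, chunk)]

def expand_format(fmt, original, seq, start, end, bookmark=""):
    """Expand the format operators from the table in a single pass."""
    values = {
        "@F": Path(original).stem,  # original filename without extension
        "@N": str(seq),             # sequence number, no padding
        "@S": str(start),           # start page of this chunk
        "@E": str(end),             # end page of this chunk
        "@B": bookmark,             # bookmark name, if any
    }

    def substitute(match):
        token = match.group(0)
        if token.startswith("%"):
            # A run of percent signs: pad the sequence number to its length.
            return str(seq).zfill(len(token))
        return values[token]

    return re.sub(r"%+|@[FNSEB]", substitute, fmt)

# Sequence numbers are assumed to start at 1 in this sketch.
names = [expand_format("@F_%%%_@S-@E.pdf", "in.pdf", seq, start, end)
         for seq, (start, end) in enumerate(chunk_ranges(5, chunk=2), start=1)]
print(names)
```

Note that zfill never truncates, so this model simply emits a longer number when the format has too few percent signs. The real result in that case is unspecified, so a format should always provide enough percent signs for the number of files expected.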

2.3 Splitting on Bookmarks

The -split-bookmarks <level> operation splits a PDF file into a number of parts, according to the page ranges implied by the document’s bookmarks. These parts are then written to file with names generated from the given format.

Level 0 denotes the top-level bookmarks, level 1 the next level (sub-bookmarks) and so on. So -split-bookmarks 1 creates breaks on level 0 and level 1 boundaries.
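The level rule can be sketched in Python as follows. This is a hedged model rather than cpdf's algorithm: the function bookmark_split_ranges and the (level, page) bookmark tuples are hypothetical. It also assumes that several bookmarks pointing at the same page produce a single break, and that any pages before the first bookmark form a part of their own.

```python
def bookmark_split_ranges(bookmarks, page_count, level):
    """Return (start, end) page ranges for a split at the given bookmark level.

    bookmarks is a list of (level, page) pairs with 1-based page numbers.
    Bookmarks at the requested level or any shallower level create breaks.
    """
    # A set removes duplicate pages, so each page starts at most one part.
    starts = sorted({page for lvl, page in bookmarks
                     if lvl <= level and 1 <= page <= page_count})
    if not starts or starts[0] != 1:
        # Assumption: pages before the first bookmark form their own part.
        starts.insert(0, 1)
    ends = [next_start - 1 for next_start in starts[1:]] + [page_count]
    return list(zip(starts, ends))

# Two level 0 bookmarks (pages 1 and 6) and three level 1 bookmarks,
# two of which fall on the same page.
bookmarks = [(0, 1), (1, 3), (1, 3), (0, 6), (1, 8)]

level0 = bookmark_split_ranges(bookmarks, page_count=10, level=0)
level1 = bookmark_split_ranges(bookmarks, page_count=10, level=1)
print(level0)
print(level1)
```

In this model the ranges never overlap and together cover every page, so each page lands in exactly one output part.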

There may be many bookmarks on a single page (for instance, if paragraphs are bookmarked or there are two subsections on one page). The splits calculated by -split-bookmarks ensure that each page appears in only one of the output files. The @ operators above may be used in the format, including @B, which expands to the text of the bookmark.

The bookmark text used for a name is converted from Unicode to 7-bit ASCII, and the following characters are removed, in addition to any character with ASCII code less than 32:

2.4 Encrypting with Split and Split Bookmarks

The encryption parameters described in Chapter 4 may be added to the command line to encrypt each split PDF. Similarly, the -recrypt switch described in Chapter 1 may be given to re-encrypt each file with the existing encryption of the source PDF.