Chapter 11
Document Information and Metadata

11.1 Listing Fonts

The -list-fonts operation prints the fonts in the document, one per line, to standard output. For example:

The first column gives the page number, the second the internal unique font name, the third the font type (Type1, TrueType, etc.), the fourth the PDF font name, and the fifth the PDF font encoding.
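To post-process the -list-fonts output in a script, each line can be split into the five columns just described. The sketch below is a hypothetical Python helper (parse_font_line and FontEntry are our own names, not part of cpdf). It assumes the columns are separated by whitespace and that font names contain no spaces, and it tolerates a missing encoding column. The sample line is made up for illustration.

```python
from typing import NamedTuple, Optional


class FontEntry(NamedTuple):
    page: int
    internal_name: str
    font_type: str
    pdf_name: str
    encoding: Optional[str]  # None if the line has no fifth column (assumption)


def parse_font_line(line: str) -> FontEntry:
    """Split one line of -list-fonts output into its columns (hypothetical helper)."""
    parts = line.split()
    if len(parts) < 4:
        raise ValueError(f"unexpected -list-fonts line: {line!r}")
    page, internal_name, font_type, pdf_name = parts[:4]
    # Anything after the fourth column is treated as the encoding.
    encoding = " ".join(parts[4:]) or None
    return FontEntry(int(page), internal_name, font_type, pdf_name, encoding)


entry = parse_font_line("1 /F1 /Type1 /Times-Roman /WinAnsiEncoding")
print(entry.page, entry.font_type, entry.encoding)  # 1 /Type1 /WinAnsiEncoding
```

Collecting the pdf_name field of every entry into a set then gives the distinct fonts used across the document.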

11.2 Reading Document Information

The -info option prints entries from the document information dictionary and from any XMP metadata to standard output.

The details of the format for creation and modification dates can be found in Appendix A.

By default, cpdf strips text to ASCII, discarding character codes above 127. To preserve the original Unicode, add the -utf8 option. To disable all postprocessing of the string, add -raw.
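The default ASCII stripping can be pictured as a simple filter over the string. This Python sketch reproduces only the documented rule (discard characters whose code exceeds 127); it is an illustration, not cpdf's actual implementation.

```python
def strip_to_ascii(text: str) -> str:
    """Keep only characters whose code is 127 or below, mirroring the default rule."""
    return "".join(ch for ch in text if ord(ch) <= 127)


print(strip_to_ascii("Café Müller"))  # Caf Mller
```

This is why accented letters disappear from -info output unless -utf8 is given.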

The -page-info option prints the page label, media box and other boxes page-by-page to standard output, for all pages in the current range.

Note that the format for boxes is minimum x, minimum y, maximum x, maximum y.
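Because each box is given as minimum x, minimum y, maximum x, maximum y, the page size is the difference of the corresponding coordinates rather than the last two numbers alone, since a box need not start at the origin. A small Python sketch, assuming the four values have already been read from the -page-info output as numbers in PDF points (1/72 inch); the function names are ours.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (min x, min y, max x, max y) in points
MM_PER_POINT = 25.4 / 72  # one point is 1/72 inch


def box_size(box: Box) -> Tuple[float, float]:
    """Width and height of a box given as minimum x, minimum y, maximum x, maximum y."""
    min_x, min_y, max_x, max_y = box
    return max_x - min_x, max_y - min_y


def box_size_mm(box: Box) -> Tuple[float, float]:
    """The same size converted from points to millimetres."""
    width, height = box_size(box)
    return width * MM_PER_POINT, height * MM_PER_POINT


print(box_size((10, 20, 622, 812)))  # (612, 792): US Letter, offset from the origin
```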

The -pages operation prints the number of pages in the file.

11.3 Setting Document Information

The document information dictionary in a PDF file specifies various pieces of information about a PDF. These entries can be viewed in a PDF viewer (for instance, Acrobat).

Here is a summary of the commands for setting entries in the document information dictionary:

(The details of the format for creation and modification dates can be found in Appendix A. Using the date "now" uses the time and date at which the command is executed. Note also that -producer and -creator may be used to set the producer and/or the creator when writing any file, separate from the operations described in this chapter.)
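When a script needs to supply an explicit creation or modification date rather than "now", it has to produce a date string. The Python sketch below assumes that the format in Appendix A is the standard PDF date string D:YYYYMMDDHHmmSSOHH'mm' (with Z for universal time); check Appendix A before relying on it. The function name pdf_date is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as D:YYYYMMDDHHmmSSOHH'mm' (assumed format)."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("a timezone-aware datetime is required")
    stamp = dt.strftime("D:%Y%m%d%H%M%S")
    if offset == timedelta(0):
        return stamp + "Z"  # Z marks universal time
    sign = "+" if offset > timedelta(0) else "-"
    hours, minutes = divmod(abs(offset) // timedelta(minutes=1), 60)
    return f"{stamp}{sign}{hours:02d}'{minutes:02d}'"


print(pdf_date(datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone(timedelta(hours=1)))))
# D:20240102030405+01'00'
```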

For example, to set the title, the full command line would be

The text string is assumed to be in UTF-8 format unless the -raw option is added, in which case it is left unprocessed, except that any octal escape sequence such as \017 is replaced by the character with that value (here, 15).
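The octal escape rule for -raw can be mirrored in a few lines. This Python sketch assumes an escape is a backslash followed by exactly three octal digits, as in the \017 example; the function name is ours, and cpdf's handling of any other escape forms is not modelled.

```python
import re

# A backslash followed by exactly three octal digits (assumption: always three).
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def replace_octal_escapes(text: str) -> str:
    """Replace each escape such as \\017 by the character with that octal value."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text)


result = replace_octal_escapes(r"Page\017One")
print(ord(result[4]))  # 15
```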

11.4 Upon Opening a Document

11.4.1 Page Layout

The -set-page-layout option specifies the page layout to be used when a document is opened in, for instance, Acrobat. The possible (case-sensitive) values are:

SinglePage       Display one page at a time
OneColumn        Display the pages in one column
TwoColumnLeft    Display the pages in two columns, odd-numbered pages on the left
TwoColumnRight   Display the pages in two columns, even-numbered pages on the left
TwoPageLeft      (PDF 1.5 and above) Display the pages two at a time, odd-numbered pages on the left
TwoPageRight     (PDF 1.5 and above) Display the pages two at a time, even-numbered pages on the left
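A script that sets the layout can check the name and the PDF version requirement from the table before calling cpdf. The following is a hypothetical Python sketch: set_page_layout_args is our own name, the argument order assumes cpdf's usual form `cpdf -set-page-layout <value> in.pdf -o out.pdf`, and refusing an old PDF version is a policy of this sketch, not documented cpdf behaviour.

```python
from typing import Dict, List, Optional, Tuple

# Case-sensitive layout names from the table; the value is the minimum PDF
# version the table mentions, or None where no minimum is given.
PAGE_LAYOUTS: Dict[str, Optional[Tuple[int, int]]] = {
    "SinglePage": None,
    "OneColumn": None,
    "TwoColumnLeft": None,
    "TwoColumnRight": None,
    "TwoPageLeft": (1, 5),
    "TwoPageRight": (1, 5),
}


def set_page_layout_args(layout: str, infile: str, outfile: str,
                         pdf_version: Optional[Tuple[int, int]] = None) -> List[str]:
    """Build a cpdf argument list, rejecting unknown names and too-old PDF versions."""
    if layout not in PAGE_LAYOUTS:  # exact match: the names are case-sensitive
        raise ValueError(f"unknown page layout {layout!r}")
    minimum = PAGE_LAYOUTS[layout]
    if minimum is not None and pdf_version is not None and pdf_version < minimum:
        raise ValueError(f"{layout} needs PDF {minimum[0]}.{minimum[1]} or above")
    return ["cpdf", "-set-page-layout", layout, infile, "-o", outfile]


print(set_page_layout_args("TwoColumnRight", "in.pdf", "out.pdf"))
```

The resulting list can be passed to subprocess.run when cpdf is installed.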

For instance:

11.4.2 Page Mode

The page mode in a PDF file defines how a viewer should display the document when first opened. The possible (case-sensitive) values are:

UseNone          Neither document outline nor thumbnail images visible
UseOutlines      Document outline (bookmarks) visible
UseThumbs        Thumbnail images visible
FullScreen       Full-screen mode (no menu bar, window controls, or anything but the document visible)
UseOC            (PDF 1.5 and above) Optional content group panel visible
UseAttachments   (PDF 1.5 and above) Attachments panel visible
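Since the mode names are case-sensitive, a script that accepts free-form user input may want to map it to the exact spelling before calling cpdf. A hypothetical Python sketch: canonical_page_mode and set_page_mode_args are our own names, and the operation name -set-page-mode and the argument order `cpdf -set-page-mode <value> in.pdf -o out.pdf` are assumptions to check against your cpdf version.

```python
from typing import List

PAGE_MODES = ("UseNone", "UseOutlines", "UseThumbs", "FullScreen", "UseOC", "UseAttachments")
_BY_LOWERCASE = {mode.lower(): mode for mode in PAGE_MODES}


def canonical_page_mode(user_value: str) -> str:
    """Map any capitalisation of a mode name to the exact, case-sensitive spelling."""
    try:
        return _BY_LOWERCASE[user_value.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown page mode {user_value!r}") from None


def set_page_mode_args(mode: str, infile: str, outfile: str) -> List[str]:
    """Build a cpdf argument list using the canonical mode name (assumed flag)."""
    return ["cpdf", "-set-page-mode", canonical_page_mode(mode), infile, "-o", outfile]


print(canonical_page_mode("fullscreen"))  # FullScreen
```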

For instance:

11.4.3 Display Options

-hide-toolbar        Hide the viewer’s toolbar
-hide-menubar        Hide the viewer’s menu bar
-hide-window-ui      Hide the viewer’s scroll bars
-fit-window          Resize the document’s window to fit the size of the first page
-center-window       Position the document window in the center of the screen
-display-doc-title   Display the document title instead of the file name in the title bar
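A wrapper script can validate the option name against the table above before invoking cpdf. This Python sketch is hypothetical: display_option_args is our own name, and it assumes each option takes a true or false argument in the form `cpdf -hide-toolbar true in.pdf -o out.pdf`, which should be confirmed against your cpdf version.

```python
from typing import List

DISPLAY_OPTIONS = frozenset({
    "-hide-toolbar", "-hide-menubar", "-hide-window-ui",
    "-fit-window", "-center-window", "-display-doc-title",
})


def display_option_args(option: str, enabled: bool, infile: str, outfile: str) -> List[str]:
    """Build a cpdf argument list for one display option (true/false argument assumed)."""
    if option not in DISPLAY_OPTIONS:
        raise ValueError(f"unknown display option {option!r}")
    value = "true" if enabled else "false"
    return ["cpdf", option, value, infile, "-o", outfile]


print(display_option_args("-hide-toolbar", True, "in.pdf", "out.pdf"))
```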

For instance:

The page a PDF file opens at can be set using -open-at-page:

To have that page scaled to fit the window in the viewer, use -open-at-page-fit instead:
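A script choosing between the two variants might wrap them as follows. This is a hypothetical Python sketch that only builds the argument list; it assumes the form `cpdf -open-at-page <page> in.pdf -o out.pdf` (and likewise for -open-at-page-fit), and the check that pages start at 1 is a choice of this sketch.

```python
from typing import List


def open_at_page_args(page: int, infile: str, outfile: str, fit: bool = False) -> List[str]:
    """Choose -open-at-page or -open-at-page-fit and build the cpdf argument list."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    operation = "-open-at-page-fit" if fit else "-open-at-page"
    return ["cpdf", operation, str(page), infile, "-o", outfile]


print(open_at_page_args(15, "in.pdf", "out.pdf", fit=True))
```

Pass the list to subprocess.run when cpdf is installed.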

11.5 Metadata

PDF files can contain a piece of arbitrary metadata, often in XMP format. This is typically stored in an uncompressed stream, so that other applications can read it without having to decode the whole PDF. To set the metadata:

To remove any metadata:

To print the current metadata to standard output:
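Storing the metadata uncompressed means that even a tool with no PDF parser can find it by scanning the raw bytes for an XMP packet. The Python sketch below illustrates that idea only; it is a heuristic of our own, not how cpdf reads metadata. It returns the first x:xmpmeta element it sees, misses compressed metadata, and may pick up XMP attached to an image or page rather than to the document, so cpdf's own metadata operations remain the reliable route.

```python
import re
from typing import Optional

# The outermost element of an XMP packet; DOTALL lets it span several lines.
_XMP_ELEMENT = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def find_xmp_packet(pdf_bytes: bytes) -> Optional[str]:
    """Return the first uncompressed XMP element found in raw PDF bytes, if any."""
    match = _XMP_ELEMENT.search(pdf_bytes)
    if match is None:
        return None
    return match.group(0).decode("utf-8", errors="replace")


sample = (b"%PDF-1.7\n5 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
          b'<x:xmpmeta xmlns:x="adobe:ns:meta/"><rdf:RDF/></x:xmpmeta>\nendstream\n')
print(find_xmp_packet(sample))
```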

11.6 Page Labels

It is possible to add page labels to a document. These are not printed on the page, but may be displayed alongside thumbnails or in print dialogue boxes by PDF readers. We use -add-page-labels to do this, by default with decimal Arabic numerals (1, 2, 3…). We can add -label-style to choose what type of labels to add from these kinds:

DecimalArabic       1, 2, 3, 4, 5...
LowercaseRoman      i, ii, iii, iv, v...
UppercaseRoman      I, II, III, IV, V...
LowercaseLetters    a, b, c, ..., z, aa, bb...
UppercaseLetters    A, B, C, ..., Z, AA, BB...
NoLabelPrefixOnly   No number, but a prefix will be used if defined
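To see how a number turns into a label in each style, here is a Python sketch that follows the table, including the letter rule in which z is followed by aa, bb and so on (one letter, repeated). The names to_roman, to_letters and format_label are hypothetical; a PDF reader computes labels itself, so this is only a model for checking expectations.

```python
def to_roman(n: int) -> str:
    """Uppercase Roman numeral for n >= 1."""
    if n < 1:
        raise ValueError("Roman numerals start at 1")
    values = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
              (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in values:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def to_letters(n: int) -> str:
    """1..26 -> A..Z, then 27 -> AA, 28 -> BB, ... (one letter, repeated)."""
    if n < 1:
        raise ValueError("letter labels start at 1")
    repeats, index = divmod(n - 1, 26)
    return chr(ord("A") + index) * (repeats + 1)


def format_label(n: int, style: str, prefix: str = "") -> str:
    """Label for number n in one of the styles listed in the table."""
    if style == "DecimalArabic":
        body = str(n)
    elif style == "UppercaseRoman":
        body = to_roman(n)
    elif style == "LowercaseRoman":
        body = to_roman(n).lower()
    elif style == "UppercaseLetters":
        body = to_letters(n)
    elif style == "LowercaseLetters":
        body = to_letters(n).lower()
    elif style == "NoLabelPrefixOnly":
        body = ""  # no number; only the prefix (if any) remains
    else:
        raise ValueError(f"unknown label style {style!r}")
    return prefix + body


print([format_label(n, "LowercaseRoman") for n in range(1, 6)])
# ['i', 'ii', 'iii', 'iv', 'v']
```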

We can use -label-prefix to add a textual prefix to each label. Consider a file with twenty pages and no current page labels (a PDF reader will assume 1, 2, 3… if there are none). We will add the following page labels:

i, ii, iii, iv, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, A-0, A-1, A-2, A-3, A-4, A-5

Here are the commands, in order:

By default, the labels in each range begin at 1. To override this, use -label-startval; in the final command we used 0, so that the numbers begin at zero rather than one.
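The effect of the three commands can be checked with a small Python model of how page ranges map to labels. Everything here is a hypothetical sketch: the tuple layout, labels_for and command_for are our own names; the argument order (operation, input file, page range, options, -o output) is an assumption about cpdf's command-line form; and only the two styles this example needs are modelled.

```python
from typing import List, Tuple

# (first page, last page, style, prefix, start value): one tuple per cpdf command.
Range = Tuple[int, int, str, str, int]

EXAMPLE_RANGES: List[Range] = [
    (1, 4, "LowercaseRoman", "", 1),
    (5, 14, "DecimalArabic", "", 1),
    (15, 20, "DecimalArabic", "A-", 0),
]


def roman_lower(n: int) -> str:
    """Lowercase Roman numeral for n >= 1 (small helper for this example)."""
    values = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"), (90, "xc"),
              (50, "l"), (40, "xl"), (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = ""
    for value, symbol in values:
        count, n = divmod(n, value)
        out += symbol * count
    return out


def labels_for(ranges: List[Range], page_count: int) -> List[str]:
    """Start from the reader's default 1, 2, 3... and apply each range in order."""
    labels = [str(page) for page in range(1, page_count + 1)]
    for first, last, style, prefix, start in ranges:
        for offset, page in enumerate(range(first, last + 1)):
            n = start + offset
            body = roman_lower(n) if style == "LowercaseRoman" else str(n)
            labels[page - 1] = prefix + body
    return labels


def command_for(rng: Range, infile: str, outfile: str) -> List[str]:
    """Assumed cpdf argument list for one range; defaults are left implicit."""
    first, last, style, prefix, start = rng
    args = ["cpdf", "-add-page-labels", infile, f"{first}-{last}"]
    if style != "DecimalArabic":
        args += ["-label-style", style]
    if prefix:
        args += ["-label-prefix", prefix]
    if start != 1:
        args += ["-label-startval", str(start)]
    return args + ["-o", outfile]


print(", ".join(labels_for(EXAMPLE_RANGES, 20)))
```

Running it prints the same sequence of twenty labels as listed earlier in this section.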

Page labels may be removed altogether by using the -remove-page-labels command. To print the page labels from an existing file, use -print-page-labels. For example: