Editing PDF Structure with QPDF

Encryption Options

Contrary to some passing references on the web, QPDF's main purpose is not to crack password-protected PDFs. It may help with cracking through --password-is-hex-key, which interprets the password as a hexadecimal-encoded key value. However, because no viewer supports this mode, the option is of limited use: It lets the output file be examined with forensic tools, although the manual is careful not to specify which tools.

However, if you have the password for a PDF, you can edit its encryption options. You can display the encryption key by adding --show-encryption-key to a command that shows encryption details, such as --show-encryption, or you can remove all encryption with --decrypt.
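As a rough illustration of these two options, the following Python sketch builds the corresponding command lines and runs them only if qpdf is installed and the input file exists. The file names, the password, and the helper function names are hypothetical; the password is passed with qpdf's --password option.

```python
import os
import shlex
import shutil
import subprocess


def decrypt_command(src, dst, password):
    """Return the argument list that writes an unencrypted copy of src to dst."""
    return ["qpdf", f"--password={password}", "--decrypt", src, dst]


def show_key_command(src, password):
    """Return the argument list that prints encryption details and the key."""
    return ["qpdf", f"--password={password}",
            "--show-encryption", "--show-encryption-key", src]


def run_if_possible(cmd, input_file):
    """Print the command, then run it only if qpdf and the input file exist."""
    print(shlex.join(cmd))  # the equivalent shell command
    if shutil.which(cmd[0]) and os.path.exists(input_file):
        # No check=True: qpdf also exits non-zero for mere warnings.
        subprocess.run(cmd)


# Hypothetical file names and password, for illustration only.
run_if_possible(show_key_command("locked.pdf", "secret"), "locked.pdf")
run_if_possible(decrypt_command("locked.pdf", "open.pdf", "secret"), "locked.pdf")
```

Building the command as a list rather than a single string means that passwords containing spaces or shell metacharacters do not need extra quoting.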

In addition, you can edit a PDF's built-in permissions. The relevant part of the command, which ends with -- to close the list of encryption options, is:

--encrypt USER-PASSWORD OWNER-PASSWORD KEY-LENGTH PERMISSIONS --

USER-PASSWORD and OWNER-PASSWORD refer to the passwords added when the PDF is created. Despite its name, KEY-LENGTH does not refer to a public key of the kind used by an application like GPG, but to groups of settings that are part of the PDF standard. These groups are designated by lengths of 40, 128, and 256. Each group has its own settings, as shown in Table 2.

Table 2: PDF Permission Settings

Key Length = 40

--print=[yn]                                       Allows printing
--extract=[yn]                                     Allows text or image extraction
--annotate=[yn]                                    Allows comments and form fill-in and signing

Key Length = 128

--accessibility=[yn]                               Allows accessibility to visually impaired
--extract=[yn]                                     Allows text or image extraction
--assemble=[yn]                                    Allows rotation and reordering of pages
--annotate=[yn]                                    Allows comments, form fill-in, and signing
--form=[yn]                                        Whether filling form fields is allowed
--modify-other=[yn]                                Allows all document editing except those controlled separately by --assemble, --annotate, and --form
--print=print-opt[full, low, none]                 Controls printing resolution or whether it is allowed
--modify=[all, annotate, form, assembly, none]     Controls modify access

Key Length = 256

--use-aes=[yn]                                     Uses AES encryption instead of RC4 encryption

The lengths of 40 and 128 give the same permissions that are available in common PDF file creators. Be aware that the built-in encryption is notoriously weak and can be bypassed by a number of applications that are available for download. If you are seriously concerned about security that goes beyond providing an obstacle for unsophisticated users, be sure to include a key length of 256, which provides more serious encryption. My recommendation is to use it alongside the 128 key length, which provides comprehensive options. If no key length is specified, the output file is fully editable.
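To make the command structure concrete, here is a minimal Python sketch that assembles an --encrypt command from a key length and a set of restrictions. The function name, file names, and passwords are made up for illustration, and the sketch assumes the positional form of --encrypt shown earlier, closed by the -- that qpdf expects after the last restriction.

```python
import shlex


def encrypt_command(src, dst, user_pw, owner_pw, key_length=256, **restrictions):
    """Build a qpdf --encrypt command line as an argument list.

    Keyword arguments become restriction options, so modify_other="n"
    turns into --modify-other=n.
    """
    if key_length not in (40, 128, 256):
        raise ValueError("key length must be 40, 128, or 256")
    cmd = ["qpdf", "--encrypt", user_pw, owner_pw, str(key_length)]
    for name, value in restrictions.items():
        cmd.append(f"--{name.replace('_', '-')}={value}")
    cmd.append("--")  # closes the list of encryption options
    cmd += [src, dst]
    return cmd


# Hypothetical example: forbid printing and extraction with a 256-bit key.
example = encrypt_command("in.pdf", "out.pdf", "reader", "owner",
                          print="none", extract="n")
print(shlex.join(example))
```

The sketch only prints the command; it does not check whether a given restriction is valid for the chosen key length, so qpdf itself remains the judge of that.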

QDF Mode

Generally, the easiest way to edit a PDF file is to open it in LibreOffice Writer. Writer is especially well suited to a hybrid PDF, that is, one created in Writer that also includes a copy of the file in OpenDocument Format, LibreOffice's default format. At the cost of a file twice as large as an ordinary PDF, a hybrid provides a fully editable file that also updates the accompanying PDF file when saved. But if you do not have a hybrid file, then a PDF can only be edited line by line in Writer and other editors, and new lines are only practical in blank space.

QDF mode produces a file that displays like any other PDF, but that can be edited in a regular text editor, as long as there is no password protection. If a file does have a password, it can be viewed, but not edited. The catch is that the format lists all objects in numerical order, which takes some practice to read. Content is easy to find, but objects like images need to be carefully edited. For instance, if you remove an image, you need to update every other image, or else the output file will not build or display properly (Figure 2).

Figure 2: QDF mode allows you to view and edit the structure of a PDF in a text file.

To create a file in QDF mode, simply add the --qdf option. If you run into trouble with a QDF mode file that you have edited, try fix-qdf, which ships with QPDF as a separate program. It tries to repair everything from object streams to cross-reference tables, although the repairs may not be entirely what you hoped. Also, be aware that QDF mode is incompatible with linearization.
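The round trip might look like the following Python sketch. It assumes that the repair tool is invoked as the standalone fix-qdf program, which reads the edited file and writes the repaired PDF to standard output, and it adds --object-streams=disable so that every object stays visible as plain text. The file names and function names are hypothetical.

```python
import shlex
import subprocess


def to_qdf_command(src, dst):
    """Return the argument list that rewrites src as an editable QDF file."""
    return ["qpdf", "--qdf", "--object-streams=disable", src, dst]


def fix_qdf_command(edited):
    """Return the argument list that repairs a hand-edited QDF file."""
    return ["fix-qdf", edited]


def repair(edited, fixed):
    """Run fix-qdf on edited and save its standard output as fixed."""
    with open(fixed, "wb") as out:
        subprocess.run(fix_qdf_command(edited), stdout=out, check=True)


# Print the two steps; call repair() yourself after editing the QDF file.
print(shlex.join(to_qdf_command("report.pdf", "report.qdf")))
print(shlex.join(fix_qdf_command("report.qdf")) + " > report-fixed.pdf")
```

Keeping the original PDF untouched and writing the repaired result to a third file makes it easy to start over if an edit goes wrong.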

Other Options

This article only covers the uses of QPDF that might be useful to end users. The QPDF manual [2] is current and contains almost as much material again that is aimed at developers. As well as options for testing and debugging, QPDF has options for how it handles Unicode passwords and file names and for use in C++, C, JavaScript, and Python.

However, you do not need to be a developer to find QPDF useful. Although you will probably want to work with the latest version of the manual open, QPDF is a comprehensive toolkit and can replace several common scripts under one command. If you regularly edit PDFs, QPDF is in many ways an essential application.

The Author

Bruce Byfield is a computer journalist and a freelance writer and editor specializing in free and open source software. In addition to his writing projects, he also teaches live and e-learning courses. In his spare time, Bruce writes about Northwest coast art (http://brucebyfield.wordpress.com). He is also co-founder of Prentice Pieces, a blog about writing and fantasy at https://prenticepieces.com/.
