Adapt the PDFtk PDF tool's call syntax with Go

Programming Snapshot – PDFtk Go Variant

© Lead Image © Sergey Nivens, 123RF.com


Article from Issue 237/2020
Author(s): Mike Schilli

Go is not only suitable for complex server programs, but it also cuts a fine figure with simple command-line tools for automating everyday life. Mike Schilli restructures the signature of a PDF manipulation tool.

One of my favorite tools at the command line is the PDFtk utility, a veritable Swiss Army knife for merging PDF documents. However, when called, the tool expects unusual syntax for passing parameters, which I find hard to memorize and type every time. That's why I decided to wire up a variant in Go, which tries to guess what the user wants. En route to doing this, the inclined reader will discover how Go reads and writes files, extracts and manipulates single characters from strings, and calls external programs with interactive input, as well as find out how to cross-compile Go programs for various platforms.

For close to a decade now, I've been digitizing paper books and magazines with my scanner and then tossing them into the recycling bin. Sometimes a book comes out as two or more PDFs because the scanner got stuck between two stacks and I had to continue the scanning process with a new document. Sometimes the cover of a hardback book simply will not fit through the scanner feeder, which means that the front and back covers are individual PDF files from a flatbed scanner. PDFtk makes putting the parts together a breeze:

$ pdftk book-*.pdf cat output book.pdf

What happens here is that the shell grabs any files matching the pattern (book-*.pdf) in the current directory and, if they are numbered book-1.pdf, book-2.pdf, and so on, passes them to PDFtk in the correct order. The cat subcommand tells the tool to stitch together all the input documents in sequence. Finally, PDFtk expects the name of the output file after the output keyword. So far, so good, but couldn't this all be a little more standards compliant and easier?

Yes, We Can!

The freshly squeezed Go program you are looking at today, aptly named Pdftki, simply grabs the PDF book parts, discovers that they all start with book-, and concatenates them. It decides on book.pdf as the name of the target file, because book is the part of the name that all of the subfiles share. And all of this is part of just one simple call:

$ pdftki book-*.pdf

It's nice, compact, and easy to remember. But what if you need to leave out a page because you have two copies of it, like at the end of book-1.pdf and the start of book-2.pdf? Thanks to PDFtk, you do this by assigning an uppercase letter to each of the documents and letting the cat statement for the second document start at page 2 instead of page 1 (Listing 1) [1].

Listing 1

Skipping a Page

$ pdftk A=book-1.pdf B=book-2.pdf cat A1-end B2-end output book.pdf

While PDFtk fully integrates the first file (1-end), it skips page one of the second document (2-end). This gives you some insight into how powerful PDFtk is, but the power comes at the cost of a syntax that has you regularly browsing the man page.

In contrast to this, the following call to the Go tool Pdftki automatically cobbles together PDFtk parameters for concatenating all parts and assigns them letters with page ranges as shown in Figure 1:

$ pdftki -e book-*.pdf

Figure 1: Calling pdftki -e lets the user modify the command in the vi editor before running it.

The -e option then tells it to launch an editor, giving the user the ability to modify the suggested call parameters for PDFtk. After quitting the editor, it then merges the subfiles according to the commands saved. Very handy!

Built-In Help

Listing 2 shows the main program, which uses the flag package to interpret its command-line options (such as -e, which I just described) and the Args() method to extract the list of PDF files specified by the user.

Listing 2

pdftki.go

01 package main
02
03 import (
04   "bytes"
05   "flag"
06   "fmt"
07   "log"
08   "os/exec"
09 )
10
11 func main() {
12   var edit = flag.Bool("e", false,
13     "Pop up an editor")
14   flag.Parse()
15   pdftkArgs := pdftkArgs(flag.Args())
16
17   if *edit {
18     editCmd(&pdftkArgs)
19   }
20
21   var out bytes.Buffer
22   cmd := exec.Command(pdftkArgs[0],
23     pdftkArgs[1:]...)
24   cmd.Stdout = &out
25   cmd.Stderr = &out
26   err := cmd.Run()
27
28   if err != nil {
29     log.Fatal(err)
30   }
31
32   fmt.Printf("OK: [%s]\n", out.String())
33 }

After calling Parse() in line 14, the edit variable contains a pointer to a bool type value. By default, it is set to false, but it changes to true if the user specifies -e. In this case, line 18 starts the editCmd() function, which I'll get to later in Listing 4. The user can now modify the arguments determined in line 15 for the PDFtk call in an editor (Figure 1) before line 22 is executed.

The handy os/exec package from the Go standard library uses Run() to call external programs with their arguments; if so desired, it also captures their standard output and standard error. Lines 24 and 25 assign the out buffer, of the Buffer type from the bytes package, to both of the respective attributes. exec then runs the command and collects its output in the buffer. If an error occurs, line 29 logs it and terminates the program. If everything works as intended, line 32 calls the out.String() method to print the captured output for the user's perusal.
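The capture technique can be tried in isolation with a small helper. This is a sketch, not part of the article's code: the name runCaptured() is made up, and the example assumes a Unix-like system where an echo binary is on the path.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"os/exec"
)

// runCaptured (hypothetical helper) runs a command and returns
// everything it wrote to stdout and stderr, collected in one buffer.
func runCaptured(name string, args ...string) (string, error) {
	var out bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdout = &out // both streams share the same buffer
	cmd.Stderr = &out
	err := cmd.Run()
	// Return the output even on failure, so the caller can show
	// the program's own error message next to the exit status.
	return out.String(), err
}

func main() {
	out, err := runCaptured("echo", "hello")
	if err != nil {
		log.Fatalf("command failed: %v\noutput: %s", err, out)
	}
	fmt.Printf("OK: [%s]\n", out)
}
```

One design choice differs from Listing 2: the helper hands back the buffer together with the error. When an external tool exits with a non-zero status, the error from Run() carries little more than that status, while the explanation usually sits in the captured standard error.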

As an additional goody, the flag package provides a simple help function that tells the user which options the program supports if called by typing pdftki -h (Listing 3).

Listing 3

Calling Help

01 $ ./pdftki -h
02 Usage of ./pdftki:
03   -e    Pop up an editor
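The behavior of the -e flag and the leftover arguments can be explored without touching the real command line by parsing a slice with a dedicated flag set. The following is a sketch under that assumption; parseOpts() is a hypothetical helper, whereas Listing 2 uses the package-level flag functions directly.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseOpts (hypothetical helper) parses argv, which must not contain
// the program name, and returns the state of -e plus the remaining
// non-flag arguments.
func parseOpts(argv []string) (bool, []string, error) {
	fs := flag.NewFlagSet("pdftki", flag.ContinueOnError)
	edit := fs.Bool("e", false, "Pop up an editor")
	if err := fs.Parse(argv); err != nil {
		return false, nil, err
	}
	// edit is a *bool; it only holds the final value after Parse().
	return *edit, fs.Args(), nil
}

func main() {
	edit, files, err := parseOpts(os.Args[1:])
	if err != nil {
		os.Exit(2) // the flag set has already printed the problem
	}
	fmt.Printf("edit=%v files=%v\n", edit, files)
}
```

Note that Go's flag package stops parsing at the first non-flag argument, so -e has to come before the file names; anything after the first file name ends up in Args().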

Keyboard Pass-Through

Listing 4 comes into play if the user entered the -e switch at the command line – that is, if they want to edit the command before executing it.

Listing 4

edit.go

01 package main
02
03 import (
04   "io/ioutil"
05   "log"
06   "os"
07   "os/exec"
08   "strings"
09 )
10
11 func editCmd(args *[]string) {
12   tmp, err := ioutil.TempFile("/tmp", "")
13   if err != nil {
14     log.Fatal(err)
15   }
16   defer os.Remove(tmp.Name())
17
18   b := []byte(strings.Join(*args, " "))
19   err = ioutil.WriteFile(
20     tmp.Name(), b, 0644)
21   if err != nil {
22     panic(err)
23   }
24
25   cmd := exec.Command("vi", tmp.Name())
26   cmd.Stdout = os.Stdout
27   cmd.Stdin = os.Stdin
28   cmd.Stderr = os.Stderr
29   err = cmd.Run()
30   if err != nil {
31     panic(err)
32   }
33
34   str, err := ioutil.ReadFile(tmp.Name())
35   if err != nil {
36     panic(err)
37   }
38   line :=
39     strings.TrimSuffix(string(str), "\n")
40   *args = strings.Split(line, " ")
41 }

To call an external program that the user can interact with, such as an instance of the vi editor, it is not enough to tell the exec package to pass Stdout and Stderr from the external program to the identically named channels of the current terminal. You also have to wire up standard input (Stdin), so that any keystrokes made by the user will actually reach the editor. Go offers the matching system file descriptors in the os package. Lines 26 to 28 of Listing 4 assign the three to the fields of the same name in the command structure returned by exec.Command().

For the user to be able to modify the call in the editor, the editCmd() function needs to store the pdftk command and its arguments in a file and call the editor with it. After the user has saved the changes and returned, editCmd() reads the file and saves its contents as a slice of strings in the args variable, which was passed in as a pointer.

To do this, editCmd() creates a temporary file in the /tmp directory. The useful TempFile() function from the standard io/ioutil package ensures that the new file's name does not collide with files that already exist in /tmp, so that different processes won't clobber each other's temp files. After the work is done, the program has to get rid of what is then an obsolete file. This is done by the defer call in line 16, which kicks in automatically at the end of the function.


