Adapt the PDFtk PDF tool's call syntax with Go
Programming Snapshot – PDFtk Go Variant
Go is not only suitable for complex server programs, but it also cuts a fine figure with simple command-line tools for automating everyday life. Mike Schilli restructures the signature of a PDF manipulation tool.
One of my favorite tools at the command line is the PDFtk utility, a veritable Swiss Army knife for merging PDF documents. However, the tool expects an unusual syntax for passing parameters, which I find hard to memorize and tedious to type every time. That's why I decided to wire up a variant in Go that tries to guess what the user wants. Along the way, interested readers will discover how Go reads and writes files, extracts and manipulates single characters from strings, and calls external programs with interactive input, and they will find out how to cross-compile Go programs for various platforms.
For close to a decade now, I've been digitizing paper books and magazines with my scanner and then tossing them into the recycling bin. Sometimes a book comes out as two or more PDFs because the scanner got stuck between two stacks and I had to continue the scanning process with a new document. Sometimes the cover of a hardback book simply will not fit through the scanner feeder, which means that the front and back covers are individual PDF files from a flatbed scanner. PDFtk makes putting the parts together a breeze:
$ pdftk book-*.pdf cat output book.pdf
What happens here is that the shell grabs any files matching the pattern (book-*.pdf) in the current directory and, if they are numbered book-1.pdf, book-2.pdf, and so on, passes them to PDFtk in the correct order. The cat subcommand tells the tool to stitch together all the input documents in sequence. Finally, PDFtk expects the name of the output file after the output keyword. So far, so good, but couldn't this all be a little more standards compliant and easier?
Yes, We Can!
The freshly squeezed Go program you are looking at today, aptly named Pdftki, simply grabs the PDF book parts, discovers that they all start with book-, and concatenates them. It decides on book.pdf as the name of the target file, because book is what the names of all the parts have in common. And all of this takes just one simple call:
$ pdftki book-*.pdf
It's nice, compact, and easy to remember. But what if you need to leave out a page because you have two copies of it, such as at the end of book-1.pdf and the start of book-2.pdf? With PDFtk, you do this by assigning an uppercase letter to each of the documents and letting the page range in the cat statement for the second document start at page 2 instead of page 1 (Listing 1) [1].
Listing 1
Skipping a Page
$ pdftk A=book-1.pdf B=book-2.pdf cat A1-end B2-end output book.pdf
While PDFtk fully integrates the first file (1-end), it skips page one of the second document (2-end). This gives you some insight into how powerful PDFtk is, but at the cost of the kind of syntax that has you regularly browsing the man page.
In contrast to this, the following call to the Go tool Pdftki automatically cobbles together PDFtk parameters for concatenating all parts and assigns them letters with page ranges as shown in Figure 1:
$ pdftki -e book-*.pdf
The -e option then tells it to launch an editor, giving the user the ability to modify the suggested call parameters for PDFtk. After the user quits the editor, Pdftki merges the subfiles according to the saved command. Very handy!
Built-In Help
Listing 2 shows the main program, which uses the flag package to interpret its command-line options (such as -e, which I just described) and the package's Args() function to extract the list of PDF files specified by the user.
Listing 2
pdftki.go
01 package main
02
03 import (
04   "bytes"
05   "flag"
06   "fmt"
07   "log"
08   "os/exec"
09 )
10
11 func main() {
12   var edit = flag.Bool("e", false,
13     "Pop up an editor")
14   flag.Parse()
15   pdftkArgs := pdftkArgs(flag.Args())
16
17   if *edit {
18     editCmd(&pdftkArgs)
19   }
20
21   var out bytes.Buffer
22   cmd := exec.Command(pdftkArgs[0],
23     pdftkArgs[1:]...)
24   cmd.Stdout = &out
25   cmd.Stderr = &out
26   err := cmd.Run()
27
28   if err != nil {
29     log.Fatal(err)
30   }
31
32   fmt.Printf("OK: [%s]\n", out.String())
33 }
The edit variable defined in line 12 holds a pointer to a bool value. After the call to Parse() in line 14, the value it points to is false by default, but true if the user specified -e. In this case, line 18 starts the editCmd() function, which I'll get to later in Listing 4. The user can now modify the arguments determined in line 15 for the PDFtk call in an editor (Figure 1) before line 22 is executed.
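The pointer semantics of the flag package are easy to try out in isolation. The following sketch parses a hard-coded argument list instead of the real command line; the parseArgs() helper and the use of flag.NewFlagSet() are my additions for the sake of a repeatable example and do not appear in Listing 2.

```go
package main

import (
	"flag"
	"fmt"
)

// parseArgs is a hypothetical helper that parses an explicit argument
// list, so the behavior can be observed without a real command line.
func parseArgs(argv []string) (bool, []string, error) {
	fs := flag.NewFlagSet("pdftki", flag.ContinueOnError)
	// Bool returns a *bool; the value behind it is set by Parse.
	edit := fs.Bool("e", false, "Pop up an editor")
	if err := fs.Parse(argv); err != nil {
		return false, nil, err
	}
	// Args returns whatever is left after the options.
	return *edit, fs.Args(), nil
}

func main() {
	edit, files, err := parseArgs(
		[]string{"-e", "book-1.pdf", "book-2.pdf"})
	fmt.Println(edit, files, err)
	// prints: true [book-1.pdf book-2.pdf] <nil>
}
```

Note that the flag package stops interpreting options at the first non-option argument, so -e has to come before the file names.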
The handy os/exec package from the Go standard library uses Run() to call external programs with their arguments; if so desired, it also captures their standard output and standard error. Lines 24 and 25 assign a pointer to out, a buffer of the Buffer type from the bytes package, to both the Stdout and Stderr fields of the command. Run() then executes the command and collects everything it prints in the buffer. If an error occurs, for example because pdftk exits with a non-zero status, log.Fatal() in line 29 prints it and terminates the program. If everything works as intended, line 32 calls the out.String() method to print the captured output for the user's perusal.
As an additional goody, the flag package provides a simple help function that tells the user which options the program supports if called by typing pdftki -h (Listing 3).
Listing 3
Calling Help
01 $ ./pdftki -h
02 Usage of ./pdftki:
03   -e    Pop up an editor
Keyboard Pass-Through
Listing 4 comes into play if the user entered the -e switch at the command line, that is, if they want to edit the command before executing it.
Listing 4
edit.go
01 package main
02
03 import (
04   "io/ioutil"
05   "log"
06   "os"
07   "os/exec"
08   "strings"
09 )
10
11 func editCmd(args *[]string) {
12   tmp, err := ioutil.TempFile("/tmp", "")
13   if err != nil {
14     log.Fatal(err)
15   }
16   defer os.Remove(tmp.Name())
17
18   b := []byte(strings.Join(*args, " "))
19   err = ioutil.WriteFile(
20     tmp.Name(), b, 0644)
21   if err != nil {
22     panic(err)
23   }
24
25   cmd := exec.Command("vi", tmp.Name())
26   cmd.Stdout = os.Stdout
27   cmd.Stdin = os.Stdin
28   cmd.Stderr = os.Stderr
29   err = cmd.Run()
30   if err != nil {
31     panic(err)
32   }
33
34   str, err := ioutil.ReadFile(tmp.Name())
35   if err != nil {
36     panic(err)
37   }
38   line :=
39     strings.TrimSuffix(string(str), "\n")
40   *args = strings.Split(line, " ")
41 }
To call an external program with which the user can also interact, such as the vi editor, you have to tell the exec package not only to pass the external program's Stdout and Stderr through to the identically named streams of the current terminal, but also to wire up standard input (Stdin), so that any keystrokes made by the user actually reach the editor. Go offers the matching file handles in the os package. Lines 26 to 28 of Listing 4 assign the three to the fields of the same name in the command structure of the exec package.
For the user to be able to modify the call in the editor, the editCmd() function needs to store the pdftk command and its arguments in a file and call the editor with it. After the user has saved the changes and returned, editCmd() reads the file, strips the trailing newline, splits the line at the spaces, and stores the result as a slice of strings in the args variable, which was passed in as a pointer.
To do this, editCmd() creates a temporary file in the /tmp directory. The useful TempFile() function from the standard io/ioutil package ensures that the new file's name does not collide with files that already exist in /tmp, so that different processes won't clobber each other's temp files. After the work is done, the program has to get rid of what is then an obsolete file. This is done by the defer call in line 16, which kicks in automatically at the end of the function.