Adapt the PDFtk PDF tool's call syntax with Go
Reading and Writing
The ioutil package also includes the convenience functions ReadFile() and WriteFile(). They read a snippet of text from a file or write it to one; in both cases, the text must exist as a byte slice (and not as a string).
To do this, line 18 in Listing 4 first uses Join() to concatenate the command and all parameters, separated by space characters, thus creating a long string. The []byte() conversion then turns the string into a byte slice.
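The join-and-convert step can be sketched in isolation as follows. This is a minimal illustration rather than the code from Listing 4: the helper name argsToBytes() and the scratch file name are hypothetical, and it calls os.WriteFile(), which has the same signature as ioutil.WriteFile() and is what the ioutil function forwards to as of Go 1.16.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// argsToBytes (hypothetical helper) joins the command and its parameters
// with single spaces and converts the resulting string to a byte slice.
func argsToBytes(args []string) []byte {
	return []byte(strings.Join(args, " "))
}

func main() {
	args := []string{"pdftk", "A=a.pdf", "cat", "A1-end", "output", "out.pdf"}

	// Hypothetical scratch file name, chosen only for this example.
	path := filepath.Join(os.TempDir(), "pdftki-example.txt")
	if err := os.WriteFile(path, argsToBytes(args), 0644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer os.Remove(path)

	fmt.Println("wrote", path)
}
```

The conversion to []byte is needed only because WriteFile() expects bytes; the content itself is the plain command line.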
Conversely, ReadFile() in line 34 reads the modified file into memory. In line 39, the program converts the resulting bytes in the line variable into a character string, using string() to do so. It also chops off the line break that vi appends to the end of the file without being asked.
The Split() function in line 40 separates the program and its arguments into a new slice and assigns it back to the input slice by dereferencing the passed-in pointer. The calling function will then access the modified data instead of the original.
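The way back from file content to argument list might look like the following sketch. The function name parseEdited() is an assumption made for this example; it mirrors the steps described above (string conversion, trimming the trailing line break, splitting, and writing through a pointer), but it is not the article's Listing 4.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEdited (hypothetical helper) converts the file content back to a
// string, drops one trailing line break, and stores the space-separated
// fields in the caller's slice by writing through the pointer.
func parseEdited(data []byte, args *[]string) {
	line := strings.TrimSuffix(string(data), "\n")
	*args = strings.Split(line, " ")
}

func main() {
	args := []string{"pdftk", "A=a.pdf"}
	parseEdited([]byte("pdftk A=a.pdf cat A1-end output out.pdf\n"), &args)
	fmt.Println(len(args), args[len(args)-1]) // prints "6 out.pdf"
}
```

Because the function assigns to *args, the caller's variable points to the new slice after the call, without a return value being needed.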
The pdftkArgs() function from Listing 5 builds the PDFtk command-line syntax introduced at the beginning of this article; it prepares the more elaborate parameter set needed for more complicated cases. It assigns an uppercase letter to each input file and then lists the pages to be joined with A1-end, B1-end, etc. To do this, it iterates over all input files in line 10, with the index idx starting at 0 and growing by one in each round. Line 11 adds this index to the ASCII value of A, which line 8 determines with int('A'), and thus obtains A, B, C, and so on.
Listing 5
args.go
01 package main
02
03 import "fmt"
04
05 func pdftkArgs(files []string) []string {
06   args := []string{"pdftk"}
07   catArgs := []string{}
08   letterChr := int('A')
09
10   for idx, file := range files {
11     letter := string(rune(letterChr + idx))
12     args = append(args,
13       fmt.Sprintf("%s=%s", letter, file))
14     catArgs = append(catArgs,
15       fmt.Sprintf("%s1-end", letter))
16   }
17
18   args = append(args, "cat")
19   args = append(args, catArgs...)
20   args = append(args,
21     "output", outfile(files))
22   return args
23 }
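The letter arithmetic can be tried out on its own. The sketch below uses a hypothetical helper named handle(); it converts through rune, because current versions of go vet warn about converting an int directly to a string. Like the listing, it only yields sensible handles for up to 26 input files.

```go
package main

import "fmt"

// handle (hypothetical helper) returns the uppercase letter for a
// zero-based file index: 0 -> "A", 1 -> "B", and so on.
func handle(idx int) string {
	return string(rune('A' + idx))
}

func main() {
	for idx, file := range []string{"a.pdf", "b.pdf", "c.pdf"} {
		// One handle assignment and one page range per input file.
		fmt.Printf("%s=%s %s1-end\n", handle(idx), file, handle(idx))
	}
}
```

For the three files shown, the loop prints A=a.pdf A1-end, B=b.pdf B1-end, and C=c.pdf C1-end, one pair per line.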
Greatest Common Denominator
That leaves us with the task of deriving the name of the output file from the names of all input files, using the part they have in common, namely their longest shared prefix. For this purpose, outfile() in Listing 6 determines the file extension with the Ext() function from the path/filepath package and strips it from the base name. The extension stripped should be .pdf.
Listing 6
outfile.go
01 package main
02
03 import (
04   "fmt"
05   "path/filepath"
06   "strings"
07 )
08
09 func outfile(infiles []string) string {
10   if len(infiles) == 0 {
11     panic("Cannot have zero infiles")
12   }
13
14   ext := filepath.Ext(infiles[0])
15   base := longestSubstr(infiles)
16   base = strings.TrimSuffix(base, ext)
17   base = strings.TrimSuffix(base, "-")
18
19   return fmt.Sprintf(
20     "%s-out%s", base, ext)
21 }
22
23 func longestSubstr(all []string) string {
24   testIdx := 0
25   keepGoing := true
26
27   for keepGoing {
28     var c byte
29
30     for _, instring := range all {
31       if testIdx >= len(instring) {
32         keepGoing = false
33         break
34       }
35
36       if c == 0 { // uninitialized?
37         c = instring[testIdx]
38         continue
39       }
40
41       if instring[testIdx] != c {
42         keepGoing = false
43         break
44       }
45
46     }
47     testIdx++
48   }
49
50   if testIdx <= 1 {
51     return ""
52   }
53   return all[0][0 : testIdx-1]
54 }
The longestSubstr() function from line 23 then looks, starting at the beginning of each string, for the longest prefix common to all file names; line 17 removes a trailing hyphen from the result. The call in line 19 adds -out to the base name determined in this way, as well as the .pdf suffix, which was removed at the beginning. This concludes generating the name of the target file from the source files.
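The naming step on its own can be sketched as follows. The helper outName() is hypothetical and takes the already computed common prefix as a parameter, so the example runs without the rest of the program; it applies the same two TrimSuffix() calls described above.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// outName (hypothetical helper) builds the output name from the shared
// prefix of the input names and the first input file, whose extension
// is reused for the result.
func outName(prefix, first string) string {
	ext := filepath.Ext(first)
	// Only has an effect if the prefix is a complete file name.
	base := strings.TrimSuffix(prefix, ext)
	// Drop a dangling hyphen such as the one in "report-".
	base = strings.TrimSuffix(base, "-")
	return base + "-out" + ext
}

func main() {
	fmt.Println(outName("report-", "report-1.pdf")) // prints "report-out.pdf"
	fmt.Println(outName("scan.pdf", "scan.pdf"))    // prints "scan-out.pdf"
}
```

The second call shows the single-file case, in which the common prefix is the whole file name and the extension really does need to be stripped.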
To determine the longest common prefix, the longestSubstr() function starting in line 23 implements a small finite state machine. Line 30 iterates over the names of all input files, and the variable c stores the character found at the currently investigated position in the first file name from the list. The use of zero values eases the control flow here; these are fixed values that Go assigns to variables that have not yet been initialized.
The byte type variable c is still not initialized after its declaration in line 28. Therefore, according to the Go manual, it has an integer value of 0. Line 36 uses this to check whether the variable c has already been set to the currently investigated letter of the first file name in the inner for loop starting in line 30. If not, line 37 goes ahead and primes the variable. After this, continue starts the next round of the loop.
During the following passes of the inner loop, line 41 checks whether the currently investigated letter in the next source file from the list still matches the letter from the first file name (and thus the value stored in c). When the first mismatch occurs, line 42 sets the variable keepGoing to false, and line 43 breaks out of the inner loop (break). Because keepGoing is now false, the outer loop terminates as well. The testIdx counter is incremented by one in each round to move to the next character, matching as many as possible.
At the end of this, the value of testIdx has overshot by one, because line 47 increments it once more after the file names have started to differ. Consequently, longestSubstr() in line 53 returns a slice of the first file name whose upper bound is reduced by one, to testIdx-1.
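For comparison, the same result can be reached without a state flag and without the zero-value trick. The following sketch is an alternative formulation, not the article's implementation: the hypothetical commonPrefix() shortens a candidate prefix against each further file name in turn.

```go
package main

import "fmt"

// commonPrefix (hypothetical alternative to longestSubstr) returns the
// longest prefix shared by all names, compared byte by byte.
func commonPrefix(names []string) string {
	if len(names) == 0 {
		return ""
	}
	prefix := names[0]
	for _, name := range names[1:] {
		i := 0
		// Advance while both strings have a byte at i and the bytes match.
		for i < len(prefix) && i < len(name) && prefix[i] == name[i] {
			i++
		}
		prefix = prefix[:i]
	}
	return prefix
}

func main() {
	fmt.Printf("%q\n", commonPrefix([]string{"report-1.pdf", "report-2.pdf"})) // prints "report-"
	fmt.Printf("%q\n", commonPrefix([]string{"a.pdf", "b.pdf"}))               // prints ""
}
```

This variant walks each name once instead of visiting all names per character position, and it needs no correction for an overshooting index at the end.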
Other Worlds
The build command from line 1 of Listing 7 creates the pdftki executable for the current platform from the four source files. The subsequent call to pdftki *.pdf runs the binary with the source files and initiates the desired action. The program does not need any additional modules, just what's available in Go's standard library.
Listing 7
Compiling Pdftki
01 $ go build pdftki.go edit.go args.go outfile.go
02 $ GOOS=linux GOARCH=386 go build pdftki.go edit.go args.go outfile.go
If you want to use the binary under Linux but develop the program on a Mac, you can compile it there with the command from line 2 of Listing 7 and then simply copy the resulting executable to a Linux computer, where it runs without any problems. The opposite route also works, if necessary.