Bulk renaming in a single pass with Go

Closed Case

The secret is known as a closure, a feature supported not only by Go but also by many other scripting and programming languages. Listing 4 illustrates the procedure with a simple example.

Listing 4

closure.go

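package main

import "fmt"

func main() {
  // mkmycounter() hands back a function with its own private counter.
  mycounter := mkmycounter()

  mycounter() // prints 1
  mycounter() // prints 2
  mycounter() // prints 3
}

// mkmycounter returns a closure that prints count and then increments it.
func mkmycounter() func() {
  count := 1 // captured by the closure; survives between calls

  return func() {
    fmt.Printf("%d\n", count)
    count++
  }
}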

Before a function-creating function like mkmycounter() returns the newly constructed function to the caller, it can define local variables, which the returned function then captures in its context. Across repeated calls, these variables behave as if they were global (or rather static) to the returned function: If one call modifies one of them, the next call finds the modified value. The enclosed variables therefore belong to the function, much like instance variables belong to an object in object-oriented programming.

As expected, running the binary compiled from Listing 4 shows successive calls to the generated function printing increasing counter values (Listing 5).

Listing 5

Calling the Binary

$ go build closure.go
$ ./closure
1
2
3
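The comparison with instance variables can be made concrete with a small variation on Listing 4. The following is only a sketch: mkCounter() is a hypothetical name, and unlike mkmycounter() it returns the counter value instead of printing it. Each call to the factory creates a fresh count variable, so two generated functions do not share state.

```go
package main

import "fmt"

// mkCounter is a hypothetical variant of mkmycounter(): the returned
// closure hands back 1, 2, 3, ... on successive calls.
func mkCounter() func() int {
	count := 0 // one separate variable per call to mkCounter()

	return func() int {
		count++
		return count
	}
}

func main() {
	a := mkCounter()
	b := mkCounter()

	fmt.Println(a(), a(), a()) // prints "1 2 3"
	fmt.Println(b())           // prints "1": b has its own count
}
```

Much like two objects of the same class, a and b run the same code but each keep their own data.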

Characters, Bytes, and Runes

The call to the regexp function ReplaceAllString() in line 31 of Listing 3 also needs some explanation. It replaces every match of the regular expression rex in the org string with the contents of the repl string. The ReplaceAll() function (without the String suffix), on the other hand, which you may find first in a cursory look at the package documentation, expects slices of the type []byte instead of strings. Attentive readers may wonder what difference that makes, given that you can easily convert a string into a byte slice with []byte(string).
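To see the two variants side by side, here is a minimal sketch. The pattern and the helper names renameExt() and renameExtBytes() are made up for illustration and are not taken from Listing 3; only ReplaceAllString() and ReplaceAll() are the actual regexp methods.

```go
package main

import (
	"fmt"
	"regexp"
)

// rex matches a trailing ".log" suffix (pattern chosen for illustration).
var rex = regexp.MustCompile(`\.log$`)

// renameExt uses the string flavor of the API.
func renameExt(org string) string {
	return rex.ReplaceAllString(org, ".txt")
}

// renameExtBytes does the same job on byte slices.
func renameExtBytes(org []byte) []byte {
	return rex.ReplaceAll(org, []byte(".txt"))
}

func main() {
	fmt.Println(renameExt("foo.log"))                      // prints "foo.txt"
	fmt.Println(string(renameExtBytes([]byte("foo.log")))) // prints "foo.txt"
}
```

Both calls produce the same result; the difference lies only in the data types that go in and come out.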

To explain this, it is worthwhile digressing into Go's implementation of strings [2]. Astonished Go students will discover that strings and byte slices ([]byte) are fundamentally different data types in Go. You are not allowed to modify existing strings: Strings are immutable, but you are allowed to mess around with byte slices. In addition, strings distinguish between characters and bytes. Because Go source code is UTF-8 encoded, the "Piñata" string in the program text of Listings 6 and 7 takes up seven bytes: The accented ñ character is represented in UTF-8 as the two bytes c3 b1 (hex).
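A short sketch makes both points tangible. The helper names byteAndRuneCount() and withFirstByte() are hypothetical; len() and utf8.RuneCountInString() are standard Go.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount returns the length of s in bytes and in characters.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

// withFirstByte returns a copy of s whose first byte is replaced by b.
// The string itself cannot be modified, so the change happens in a
// byte slice.
func withFirstByte(s string, b byte) string {
	buf := []byte(s)
	if len(buf) == 0 {
		return s
	}
	buf[0] = b // fine for a byte slice; s[0] = b would not compile
	return string(buf)
}

func main() {
	str := "Piñata"

	nbytes, nchars := byteAndRuneCount(str)
	fmt.Println(nbytes, nchars) // prints "7 6"

	fmt.Println(withFirstByte(str, 'p'), str) // prints "piñata Piñata"
}
```

The original string comes out of the exercise untouched, because the byte slice is a separate copy of its data.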

As the meaning of the word "character" has historically often been confused with "byte," the Unicode standard refers to them as code points. The ñ character occupies position U+00F1, which UTF-8 encodes as c3 b1. To make things worse, there is also an alternative rendering of it in the form of two Unicode code points. This has a squiggly tilde floating above an n, but we'll not be going into that today. The only important thing is that Go refers to code points in the Unicode standard as "runes."
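The relationship between a rune, its code point, and its UTF-8 bytes can be checked with the formatting verbs of the fmt package. In this sketch, describeRune() is a hypothetical helper name; %U and % x are standard fmt verbs.

```go
package main

import "fmt"

// describeRune returns the code point of r in U+XXXX notation and the
// bytes of its UTF-8 encoding in hex.
func describeRune(r rune) (string, string) {
	codePoint := fmt.Sprintf("%U", r)
	utf8Bytes := fmt.Sprintf("% x", string(r))
	return codePoint, utf8Bytes
}

func main() {
	codePoint, utf8Bytes := describeRune('ñ')
	fmt.Println(codePoint, utf8Bytes) // prints "U+00F1 c3 b1"
}
```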

While the range operator in Listing 6 iterates over the runes (Figure 1), the for loop in Listing 7 indexes the individual bytes and returns the accented character in the form of two illegible bytes. As you can see, it makes sense to check very carefully whether a function processes strings or byte slices. Converting between the two data types looks easy, but each conversion generally copies the data, which costs memory and compute cycles at runtime.

Listing 6

range.go

package main
import "fmt"
func main() {
  str := "Piñata"
  for i, c := range str {
    fmt.Printf("str[%d]='%c'\n", i, c)
  }
}

Listing 7

forloop.go

package main
import "fmt"
func main() {
  str := "Piñata"
  for i := 0; i < len(str); i++ {
    fmt.Printf("str[%d]='%c'\n", i, str[i])
  }
}

Figure 1: When parsing strings, the range operator and for loop return different results.
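What the range operator does with a string can be approximated by hand, which also explains why the index it reports skips a value at the accented character. The following sketch uses utf8.DecodeRuneInString() from the standard library; runeOffsets() is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets returns the byte offset at which each rune in s starts,
// that is, the index values a range loop over s would report.
func runeOffsets(s string) []int {
	var offsets []int

	for i := 0; i < len(s); {
		// Decode one rune and find out how many bytes it occupies.
		_, size := utf8.DecodeRuneInString(s[i:])
		offsets = append(offsets, i)
		i += size
	}
	return offsets
}

func main() {
	fmt.Println(runeOffsets("Piñata")) // prints "[0 1 2 4 5 6]"
}
```

The jump from 2 to 4 is the two-byte ñ; a plain indexed loop would stop at offset 3 as well and land in the middle of the character.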

Off We Go

Let's get back to Listing 3. Because of the closure implemented there, the function increments the value of the seq variable by one for each call and replaces the {seq} placeholder in the file template with the integer value, padded to four digits with leading zeros. A template of foo-{seq}.log first becomes foo-0001.log, then foo-0002.log, and so on.
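Reduced to its core, the numbering mechanism might look like the following sketch. It is not the code from the listing: mkSeqModifier() is a hypothetical name, and the sketch uses strings.ReplaceAll() for the placeholder instead of a regular expression.

```go
package main

import (
	"fmt"
	"strings"
)

// mkSeqModifier is a hypothetical, stripped-down modifier factory. The
// returned closure replaces {seq} in tmpl with a running number that
// is padded to four digits with leading zeros.
func mkSeqModifier(tmpl string) func() string {
	seq := 0 // enclosed variable; keeps its value between calls

	return func() string {
		seq++
		return strings.ReplaceAll(tmpl, "{seq}", fmt.Sprintf("%04d", seq))
	}
}

func main() {
	next := mkSeqModifier("foo-{seq}.log")

	fmt.Println(next()) // prints "foo-0001.log"
	fmt.Println(next()) // prints "foo-0002.log"
}
```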

The call to

go build renamer.go mkmodifier.go

compiles both listings and links the result together into a binary called renamer. Figure 2 shows some usage examples.

Figure 2: The Go program renames files and numbers them if so desired.

By the way, the os.Rename() function also accepts identical source and target files – in which case it just does nothing. But if the target file already exists, it overwrites it with the source file without any warning. If you don't want that, you can add a test and maybe a new --force option, which tells the program to bulldoze whatever it finds in the way.
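One way to add such a test is sketched below. The function renameNoClobber() and its force parameter are assumptions for illustration and do not exist in the renamer program; demo() only sets up two scratch files to try the function out.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// renameNoClobber is a hypothetical wrapper around os.Rename() that
// refuses to overwrite an existing target unless force is set.
func renameNoClobber(oldPath, newPath string, force bool) error {
	if oldPath == newPath {
		return nil // nothing to do
	}
	if !force {
		_, err := os.Lstat(newPath)
		if err == nil {
			return fmt.Errorf("%s already exists", newPath)
		}
		if !errors.Is(err, fs.ErrNotExist) {
			return err // some other problem, such as missing permissions
		}
	}
	return os.Rename(oldPath, newPath)
}

// demo creates two files in a temporary directory, tries to rename one
// onto the other, and reports whether the rename went through.
func demo(force bool) (bool, error) {
	dir, err := os.MkdirTemp("", "renamer")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "a.log")
	dst := filepath.Join(dir, "b.log")
	for _, path := range []string{src, dst} {
		if err := os.WriteFile(path, []byte("test"), 0o644); err != nil {
			return false, err
		}
	}
	return renameNoClobber(src, dst, force) == nil, nil
}

func main() {
	for _, force := range []bool{false, true} {
		renamed, err := demo(force)
		if err != nil {
			fmt.Println("setup failed:", err)
			return
		}
		fmt.Printf("force=%v renamed=%v\n", force, renamed)
	}
}
```

Note that the check and the rename are two separate steps, so another process could still create the target file in between. For an interactive command-line tool, that is probably an acceptable trade-off.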

To avoid unintentional renaming of critical files, it is always a good idea to do a dry run first with -d. Is everything okay? Then go again, and do it live this time.

Infos

  1. Renamer: https://github.com/adriangoransson/renamer
  2. "Strings, bytes, runes, and characters in Go": https://blog.golang.org/strings

The Author

Mike Schilli works as a software engineer in the San Francisco Bay Area, California. Each month in his column, which has been running since 1997, he researches practical applications of various programming languages. If you email him at mschilli@perlmeister.com he will gladly answer any questions.
