Tracking down hard disk space consumption with Go

Chart Topper Tree

Listing 2 shows the implementation of the top N list using a min heap from the Go container/heap standard library. Its nodes are variables of the FsEntry type structure starting in line 7, which hold the access path to a file along with its size in bytes. The topN structure starting in line 12 contains an entire array slice of FsEntry structures in the entries field and the maximum size of the heap used in the n field.

Listing 2

heap.go

01 package main
02
03 import (
04   "container/heap"
05 )
06
07 type FsEntry struct {
08   path string
09   used int64
10 }
11
12 type topN struct {
13   entries []FsEntry
14   n       int
15 }
16
17 func NewTopN(n int) *topN {
18   h := &topN{n: n, entries: []FsEntry{}}
19   heap.Init(h)
20   return h
21 }
22
23 func (h *topN) Add(fse FsEntry) {
24   heap.Push(h, fse)
25   if h.Len() > h.n {
26     heap.Pop(h)
27   }
28 }
29
30 func (h *topN) Iterate(cb func(fse FsEntry)) {
31   save := make([]FsEntry, len(h.entries))
32   copy(save, h.entries)
33   r := make([]FsEntry, h.Len())
34   for i := len(r) - 1; i >= 0; i-- {
35     r[i] = heap.Pop(h).(FsEntry)
36   }
37   for _, e := range r {
38     cb(e)
39   }
40   h.entries = save
41 }
42
43 func (h topN) Len() int {
44   return len(h.entries)
45 }
46
47 func (h topN) Less(i, j int) bool {
48   return h.entries[i].used < h.entries[j].used
49 }
50
51 func (h topN) Swap(i, j int) {
52   h.entries[i], h.entries[j] = h.entries[j], h.entries[i]
53 }
54
55 func (h *topN) Push(x interface{}) {
56   h.entries = append(h.entries, x.(FsEntry))
57 }
58
59 func (h *topN) Pop() interface{} {
60   old := h.entries
61   n := len(old)
62   x := old[n-1]
63   h.entries = old[:n-1]
64   return x
65 }

The NewTopN constructor starting in line 17 creates the instance of the structure, typical of object-oriented Go programming, and then calls the heap.Init() heap initializer from the standard library. NewTopN then returns a pointer to the structure to the caller, and the caller uses the pointer as a reference to the object instance for future calls.

The Add() function starting in line 23 takes an FsEntry type variable with the path to a file and its size in bytes and dumps it onto the min heap. To do this, it uses the heap.Push() function from the standard container/heap library, but then Add() uses Len() to check that this action has not grown the heap beyond its permissible length. If this has happened, heap.Pop() in line 26 remedies the situation by removing and discarding the entry at the root of the tree (i.e., the smallest file).

To traverse the entries in the heap, Iterate() starting in line 30 dismantles the heap with successive calls to heap.Pop() within the for loop starting in line 34. Iterate() starts with the smallest element at the root of the tree and then works its way down to the next largest. However, I want to reverse the output, starting with the largest file and ending with the smallest. This is why the for loop starting at line 34 stores the incoming values in reverse order in a newly created array, named r. The for loop starting in line 37 then iterates forward through the array and calls the cb() callback function supplied by the caller for each node found.

The caller can define what happens to the results. In the case of the test program, the function simply outputs a combination of the path and the storage usage. Because heap.Pop() successively dismantles the heap, destroying it in the process, Iterate() saves the internal data structure up front to be able to restore it at the end. To do this, Iterate() copies the entries array slice into another array using Go's internal copy() function, before finally restoring the heap object with it. This keeps the heap intact for subsequent calls.

Programmed Magic

Attentive readers may now be wondering how it can be that the user-defined structure topN type suddenly appears as a parameter in calls such as heap.Init() without Go, with its strict type system, complaining about the intruder. How does the standard container/heap library know that the array slice used in the heap is in the entries field of the struct, or how to sort the FsEntry type elements in descending order by file size?

You can find the solution to this conundrum starting in line 43 of Listing 2: The topN type defines the Len(), Push(), Pop(), Less(), and Swap() functions, satisfying the requirements of the container/heap interface by doing so. This means that the standard library can determine the length of the heap, append elements to the end of the internal array, or remove the array's last element – without having to know the actual implementation of the type.

Less() starting in line 47 determines the smaller of two given elements, while Swap() specifies how the heap maintenance algorithm should swap two elements. Using this toolbox, a function like heap.Pop() from the standard library (not to be confused with Pop() from Listing 2, line 59) can then later remove the node at the root of the tree, return the node, and get the tree back into shape. Talk about magic happening under the hood!

Deep Dive

Charged with this magic, Listing 3 now proceeds to recursively search the directory specified by the user at the command line. It adds up the used bytes count of the files residing within the directory hierarchy to provide a total sum and creates a min heap with the five biggest storage hogs for each direct subdirectory.

Listing 3

space.go

01 package main
02
03 import (
04   "os"
05   "path/filepath"
06   "strings"
07 )
08
09 type duEntry struct {
10   h    *topN
11   used int64
12 }
13
14 func duDir(dir string, duTotal *map[string]duEntry) (int64, error) {
15   total := int64(0)
16
17   err := filepath.Walk(dir,
18     func(fpath string, info os.FileInfo, err error) error {
19       if err != nil {
20         return 0, err
21       }
22
23       if info.IsDir() {
24         return nil
25       }
26
27       fpath, _ = filepath.Rel(dir, fpath)
28       parts := strings.Split(fpath, "/")
29       fileName := strings.TrimPrefix(fpath, parts[0]+"/")
30       topPath := parts[0]
31
32       // find or create new path mapping
33       var due duEntry
34       var ok bool
35       if due, ok = (*duTotal)[topPath]; !ok {
36         due = duEntry{}
37         due.h = NewTopN(5)
38       }
39
40       // add file to dir's heap
41       fse := FsEntry{path: fileName, used: info.Size()}
42       fse.used = info.Size()
43       due.h.Add(fse)
44       due.used += info.Size()
45       (*duTotal)[topPath] = due
46       total += info.Size()
47       return nil
48     })
49
50   return total, err
51 }

To do this, the standard Walk() function from the Go filepath library delves into the directory structure of the disk in a depth-first search (Figure 5) and calls the callback defined starting in line 18 for each entry found. The callback ignores any subdirectories and just focuses on the files. Starting in line 27, it determines the relative path to the start directory and its topmost subdirectory (topPath) for the currently examined file (or link). For every file found, it maintains an entry under topPath in the hash map that stores both the running byte total within the hierarchy and a new min hash to keep track of the biggest files.

Figure 5: Depth-first tree traversal method.

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Ready to Rumble

    A Go program writes a downloaded ISO file to a bootable USB stick. To prevent it from accidentally overwriting the hard disk, Mike Schilli provides it with a user interface and security checks.

  • Making History

    In the history log, the Bash shell records all commands typed by the user. Mike Schilli extracts this data with Go for a statistical analysis of his typing behavior.

  • Perl: Lyrical Logfiles

    Instead of just monitoring incoming requests in your web server's logfile, a sound server makes them audible and lets you listen to the tune of users surfing the site.

  • Motion Sensor

    Inotify lets applications subscribe to change notifications in the filesystem. Mike Schilli uses the cross-platform fsnotify library to instruct a Go program to detect what's happening.

  • Programming Snapshot – Go

    To find files quickly in the deeply nested subdirectories of his home directory, Mike whips up a Go program to index file metadata in an SQLite database.

comments powered by Disqus
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters

Support Our Work

Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.

Learn More

News