Tracking down hard disk space consumption with Go
Chart Topper Tree
Listing 2 shows the implementation of the top N list using a min heap from the Go container/heap
standard library. Its nodes are variables of the FsEntry
type structure starting in line 7, which hold the access path to a file along with its size in bytes. The topN
structure starting in line 12 contains an entire array slice of FsEntry
structures in the entries
field and the maximum size of the heap used in the n
field.
Listing 2
heap.go
01 package main 02 03 import ( 04 "container/heap" 05 ) 06 07 type FsEntry struct { 08 path string 09 used int64 10 } 11 12 type topN struct { 13 entries []FsEntry 14 n int 15 } 16 17 func NewTopN(n int) *topN { 18 h := &topN{n: n, entries: []FsEntry{}} 19 heap.Init(h) 20 return h 21 } 22 23 func (h *topN) Add(fse FsEntry) { 24 heap.Push(h, fse) 25 if h.Len() > h.n { 26 heap.Pop(h) 27 } 28 } 29 30 func (h *topN) Iterate(cb func(fse FsEntry)) { 31 save := make([]FsEntry, len(h.entries)) 32 copy(save, h.entries) 33 r := make([]FsEntry, h.Len()) 34 for i := len(r) - 1; i >= 0; i-- { 35 r[i] = heap.Pop(h).(FsEntry) 36 } 37 for _, e := range r { 38 cb(e) 39 } 40 h.entries = save 41 } 42 43 func (h topN) Len() int { 44 return len(h.entries) 45 } 46 47 func (h topN) Less(i, j int) bool { 48 return h.entries[i].used < h.entries[j].used 49 } 50 51 func (h topN) Swap(i, j int) { 52 h.entries[i], h.entries[j] = h.entries[j], h.entries[i] 53 } 54 55 func (h *topN) Push(x interface{}) { 56 h.entries = append(h.entries, x.(FsEntry)) 57 } 58 59 func (h *topN) Pop() interface{} { 60 old := h.entries 61 n := len(old) 62 x := old[n-1] 63 h.entries = old[:n-1] 64 return x 65 }
The NewTopN
constructor starting in line 17 creates the instance of the structure, typical of object-oriented Go programming, and then calls the heap.Init()
heap initializer from the standard library. NewTopN
then returns a pointer to the structure to the caller, and the caller uses the pointer as a reference to the object instance for future calls.
The Add()
function starting in line 23 takes an FsEntry
type variable with the path to a file and its size in bytes and dumps it onto the min heap. To do this, it uses the heap.Push()
function from the standard container/heap
library, but then Add()
uses Len()
to check that this action has not grown the heap beyond its permissible length. If this has happened, heap.Pop()
in line 26 remedies the situation by removing and discarding the entry at the root of the tree (i.e., the smallest file).
To traverse the entries in the heap, Iterate()
starting in line 30 dismantles the heap with successive calls to heap.Pop()
within the for
loop starting in line 34. Iterate()
starts with the smallest element at the root of the tree and then works its way down to the next largest. However, I want to reverse the output, starting with the largest file and ending with the smallest. This is why the for
loop starting at line 34 stores the incoming values in reverse order in a newly created array, named r
. The for
loop starting in line 37 then iterates forward through the array and calls the cb()
callback function supplied by the caller for each node found.
The caller can define what happens to the results. In the case of the test program, the function simply outputs a combination of the path and the storage usage. Because heap.Pop()
successively dismantles the heap, destroying it in the process, Iterate()
saves the internal data structure up front to be able to restore it at the end. To do this, Iterate()
copies the entries
array slice into another array using Go's internal copy()
function, before finally restoring the heap object with it. This keeps the heap intact for subsequent calls.
Programmed Magic
Attentive readers may now be wondering how it can be that the user-defined structure topN
type suddenly appears as a parameter in calls such as heap.Init()
without Go, with its strict type system, complaining about the intruder. How does the standard container/heap
library know that the array slice used in the heap is in the entries
field of the struct, or how to sort the FsEntry
type elements in descending order by file size?
You can find the solution to this conundrum starting in line 43 of Listing 2: The topN
type defines the Len()
, Push()
, Pop()
, Less()
, and Swap()
functions, satisfying the requirements of the container/heap
interface by doing so. This means that the standard library can determine the length of the heap, append elements to the end of the internal array, or remove the array's last element – without having to know the actual implementation of the type.
Less()
starting in line 47 determines the smaller of two given elements, while Swap()
specifies how the heap maintenance algorithm should swap two elements. Using this toolbox, a function like heap.Pop()
from the standard library (not to be confused with Pop()
from Listing 2, line 59) can then later remove the node at the root of the tree, return the node, and get the tree back into shape. Talk about magic happening under the hood!
Deep Dive
Charged with this magic, Listing 3 now proceeds to recursively search the directory specified by the user at the command line. It adds up the used bytes count of the files residing within the directory hierarchy to provide a total sum and creates a min heap with the five biggest storage hogs for each direct subdirectory.
Listing 3
space.go
01 package main 02 03 import ( 04 "os" 05 "path/filepath" 06 "strings" 07 ) 08 09 type duEntry struct { 10 h *topN 11 used int64 12 } 13 14 func duDir(dir string, duTotal *map[string]duEntry) (int64, error) { 15 total := int64(0) 16 17 err := filepath.Walk(dir, 18 func(fpath string, info os.FileInfo, err error) error { 19 if err != nil { 20 return 0, err 21 } 22 23 if info.IsDir() { 24 return nil 25 } 26 27 fpath, _ = filepath.Rel(dir, fpath) 28 parts := strings.Split(fpath, "/") 29 fileName := strings.TrimPrefix(fpath, parts[0]+"/") 30 topPath := parts[0] 31 32 // find or create new path mapping 33 var due duEntry 34 var ok bool 35 if due, ok = (*duTotal)[topPath]; !ok { 36 due = duEntry{} 37 due.h = NewTopN(5) 38 } 39 40 // add file to dir's heap 41 fse := FsEntry{path: fileName, used: info.Size()} 42 fse.used = info.Size() 43 due.h.Add(fse) 44 due.used += info.Size() 45 (*duTotal)[topPath] = due 46 total += info.Size() 47 return nil 48 }) 49 50 return total, err 51 }
To do this, the standard Walk()
function from the Go filepath
library delves into the directory structure of the disk in a depth-first search (Figure 5) and calls the callback defined starting in line 18 for each entry found. The callback ignores any subdirectories and just focuses on the files. Starting in line 27, it determines the relative path to the start directory and its topmost subdirectory (topPath
) for the currently examined file (or link). For every file found, it maintains an entry under topPath
in the hash map that stores both the running byte total within the hierarchy and a new min hash to keep track of the biggest files.
« Previous 1 2 3 4 Next »
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.
News
-
Latest Cinnamon Desktop Releases with a Bold New Look
Just in time for the holidays, the developer of the Cinnamon desktop has shipped a new release to help spice up your eggnog with new features and a new look.
-
Armbian 24.11 Released with Expanded Hardware Support
If you've been waiting for Armbian to support OrangePi 5 Max and Radxa ROCK 5B+, the wait is over.
-
SUSE Renames Several Products for Better Name Recognition
SUSE has been a very powerful player in the European market, but it knows it must branch out to gain serious traction. Will a name change do the trick?
-
ESET Discovers New Linux Malware
WolfsBane is an all-in-one malware that has hit the Linux operating system and includes a dropper, a launcher, and a backdoor.
-
New Linux Kernel Patch Allows Forcing a CPU Mitigation
Even when CPU mitigations can consume precious CPU cycles, it might not be a bad idea to allow users to enable them, even if your machine isn't vulnerable.
-
Red Hat Enterprise Linux 9.5 Released
Notify your friends, loved ones, and colleagues that the latest version of RHEL is available with plenty of enhancements.
-
Linux Sees Massive Performance Increase from a Single Line of Code
With one line of code, Intel was able to increase the performance of the Linux kernel by 4,000 percent.
-
Fedora KDE Approved as an Official Spin
If you prefer the Plasma desktop environment and the Fedora distribution, you're in luck because there's now an official spin that is listed on the same level as the Fedora Workstation edition.
-
New Steam Client Ups the Ante for Linux
The latest release from Steam has some pretty cool tricks up its sleeve.
-
Gnome OS Transitioning Toward a General-Purpose Distro
If you're looking for the perfectly vanilla take on the Gnome desktop, Gnome OS might be for you.