Tracking down hard disk space consumption with Go
Programming Snapshot – Disk Space Analyzer
This month, Mike Schilli writes a quick terminal UI in Go to help track down space hogs on his hard disk.
When installing a new hard disk, users think they are in seventh heaven – it seems almost impossible to ever fill up the drive. Given today's huge capacities, though, virtually everyone becomes a thoughtless space waster sooner or later. As the disk fills up, space needs to be freed up to keep things going. Most of the time, the culprit is huge files no longer in use that are still hanging around in forgotten directories and causing overcrowding. The most effective strategy is often a simple one of tracking down the biggest space wasters and then deleting the unneeded files.
The spacetree
Go program helps with this housecleaning by pointing to the most storage-intensive paths starting from a root directory and provides their disk space usage in bytes. As Figure 1 shows, the Go binary, which you call with the directory to be analyzed, switches the terminal to graphics mode and displays the storage paths as an interactive tree. You can expand the green elements in the tree with a mouse click or by pressing the Enter key to determine which folders contain the fattest files.
Biggest Fish
In Figure 1, the Go spacetree
binary takes a look at the ~/git/
directory and discovers some 14GB of data there. Using the arrow keys, their vi equivalents J and K, or even the mouse if you fancy, you can navigate to the directories highlighted in green and expand them by pressing the Enter key. Figure 2 shows that, for example, the articles/
directory, which contains the articles from 26 years of this column, occupies a total of almost 4GB.
Most of it, some 850MB, resides in the internal Git directory, .git
, which I can't touch without disrupting its inner workings. However, for the biggest disposable space waster, check out the subdirectory with the wifi
network checker from the October 2023 issue: This directory still has the screencast video in it! By deleting it, I've saved another 850MB. On top of that, the March 2019 progressbar
article contains the unused foo
and foo.bak
files that really need to be removed.
Smarter with Heap
But how does the Go program traverse the deeply nested file structure to count the drive bytes used by each file and print a short list of the largest files? It finds the 10 biggest files from a huge number of files, which can easily be millons on a heavily used system. The simple sounding solution of dropping all the entries into an array, sorting the array in descending order by file size, and then outputting the first 10 entries is not an option. Even a highly effective sorting algorithm would take quite some time to process a million entries, not to mention the RAM wasted by such a huge array. After all, only 10 entries are of interest at the end of the day.
The so-called "min heap" proves to be an efficient data structure for problems like this. The nodes of this binary tree always satisfy the rule that the entries below a given node always have larger values than the node itself. In Figure 3, the node with the value 3 is at the top of the tree root and is smaller than all other nodes in the tree. Note, though, that the rule applies only to directly or indirectly connected nodes. In Figure 3, for example, it is perfectly okay for the first element of the second row from the top, which belongs to the left subtree, to contain a value of 8 while the third row of the right subtree contains a value of 7. They're not in a direct or indirect parent/child relationship, so the rule doesn't apply.
Minimal Heap
Software programs represent the tree as a simple array which, as shown in Figure 3, creates subsets of one, two, four, eight, and so on that border on each other seamlessly, each subset representing one horizontal row within the tree. It is now relatively easy to find and remove the smallest entry in the tree, because it is always at the top, in the root element of the tree. Removing the node leaves an empty spot and disrupts the tree order, but the standardized heap algorithm can reorder the tree so that another node takes the place of the removed one. This causes subsequent nodes to bubble up, until the tree again satisfies the heap rules.
This makes the data structure ideal for an actively maintained list of the N largest files found so far. If the tree initially contains less than N nodes, all newly found files are moved into it without checking. On the other hand, if the tree grows to N+1 nodes when a new file is inserted, it needs to be shrunk back to N nodes. This works best if the algorithm removes the root node with the smallest file and lets the remaining nodes move up by repairing the tree.
Figure 4 shows the output of the test program in Listing 1. It first fills the heap with the paths to eight files including their sizes in bytes. However, it limits the maximum size of the heap to six entries. The result is the six largest files from the entire collection, sorted in descending order by file size. As expected, the abc
and ghi
entries with byte counts of 123 and 124 are dropped as the smallest files.
Listing 1
heap-main.go
01 package main 02 import ( 03 "fmt" 04 ) 05 func main() { 06 entries := []FsEntry{ 07 {path: "abc", used: 123}, 08 {path: "def", used: 456}, 09 {path: "ghi", used: 124}, 10 {path: "jkl", used: 457}, 11 {path: "mno", used: 125}, 12 {path: "pqr", used: 458}, 13 {path: "stu", used: 126}, 14 {path: "vwx", used: 999}, 15 } 16 17 h := NewTopN(6) 18 for _, e := range entries { 19 h.Add(e) 20 } 21 22 h.Iterate(func(e FsEntry) { 23 fmt.Printf("%s %d\n", e.path, e.used) 24 }) 25 }
Listing 1 uses NewTopN(6)
in line 17 to create a min heap with a maximum of six nodes. The for
loop starting in line 18 uses Add()
in each round to place the next entry from the entries
array slice onto the min heap. Finally, Iterate()
works its way through the winning entries in the data structure starting in line 22, from the file with the highest byte count to the smallest of the top six.
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.
News
-
New Slimbook EVO with Raw AMD Ryzen Power
If you're looking for serious power in a 14" ultrabook that is powered by Linux, Slimbook has just the thing for you.
-
The Gnome Foundation Struggling to Stay Afloat
The foundation behind the Gnome desktop environment is having to go through some serious belt-tightening due to continued financial problems.
-
Thousands of Linux Servers Infected with Stealth Malware Since 2021
Perfctl is capable of remaining undetected, which makes it dangerous and hard to mitigate.
-
Halcyon Creates Anti-Ransomware Protection for Linux
As more Linux systems are targeted by ransomware, Halcyon is stepping up its protection.
-
Valve and Arch Linux Announce Collaboration
Valve and Arch have come together for two projects that will have a serious impact on the Linux distribution.
-
Hacker Successfully Runs Linux on a CPU from the Early ‘70s
From the office of "Look what I can do," Dmitry Grinberg was able to get Linux running on a processor that was created in 1971.
-
OSI and LPI Form Strategic Alliance
With a goal of strengthening Linux and open source communities, this new alliance aims to nurture the growth of more highly skilled professionals.
-
Fedora 41 Beta Available with Some Interesting Additions
If you're a Fedora fan, you'll be excited to hear the beta version of the latest release is now available for testing and includes plenty of updates.
-
AlmaLinux Unveils New Hardware Certification Process
The AlmaLinux Hardware Certification Program run by the Certification Special Interest Group (SIG) aims to ensure seamless compatibility between AlmaLinux and a wide range of hardware configurations.
-
Wind River Introduces eLxr Pro Linux Solution
eLxr Pro offers an end-to-end Linux solution backed by expert commercial support.