Introducing sorting algorithms in Go
Programming Snapshot – Sorting in Go
Whether alphabetical or numerical, bubble sort or quicksort, there is no escape from sorting in computer science. In this month's column, Mike Schilli sorts out the pros and cons of various sorting algorithms in Go.
Long before the first computers existed, sorting data occupied mankind. The German-American Herman Hollerith invented an electromechanical machine as early as 1890 that sorted punched cards into different exit shafts to speed up the evaluation of the US census of that era. And even today, computers continue to sort data – whether this be for a list of YouTube video suggestions, the top 100 charts in the music industry by sales figures, or the slowest queries against a MySQL database for performance analysis purposes.
Machines sort at amazing speeds with programs changing the order of arbitrarily formed data structures. Most of the time, they're relying on a simple key as a sorting criterion, often an integer or a character string. This explains why classical reference works on sorting algorithms [1] [2] often only show you how to sort a series of numbers. This is because porting these algorithms to deal with more complex data structures is trivial and can be done by simply defining a mapping of the data structure to a key.
Bubbles Rise to the Surface
As the simplest sorting algorithm, programming beginners often learn how to bubble sort with two for
loops. In the first loop, the i
index in Listing 1 [3] works its way from left to right through the elements of an array. For each element examined, another for
loop starts with the j
index; it scans all the following elements and checks if the element under review from the first loop is smaller than all the following ones. If the procedure finds a smaller element further to the right, it simply swaps it with the one it is examining. In this way, all elements looked at by the first loop are guaranteed to not have any smaller elements to their right. Finally, when the first loop arrives at the far right end, the array is sorted in ascending order (Figure 1).
Listing 1
bubble.go
01 package main 02 03 import ( 04 "fmt" 05 ) 06 07 func main() { 08 items := []string{"green", "red", 09 "blue", "black", "white"} 10 11 for i, _ := range items { 12 for j := i + 1; j < len(items); j++ { 13 if items[i] > items[j] { 14 items[i], items[j] = 15 items[j], items[i] 16 } 17 } 18 } 19 20 fmt.Printf("%+v\n", items) 21 }
In the first for
loop starting in line 11, Listing 1 shows you the range
operator; for each element of the items
array slice, it returns the index, starting at 0, and the element itself. The loop header ignores the latter by assigning it to the pseudo-variable _
, since it is only interested in the index of the currently examined element.
The second for
loop in line 12 starts with the element directly to the right of the currently examined one; the index, j
, starts at i+1
. If an element is found at index position j
that is larger than the one referenced by i
, the program uses an a,b = b,a
construct to quickly swap the two, which is a boon in Go compared to other programming languages mandating the use of temporary variables. Listing 1 is easily compiled by typing
go build bubble.go
and the program will display an alphabetically sorted list of colors (Listing 2).
Listing 2
Color List
$ ./bubble [black blue green red white]
The bubble sort algorithm is easy to understand, but requires a relatively large number (O(n2)) of steps to complete the sorting task. Smarter methods like merge sort or quicksort offer better performance in general. For small amounts of data, such as arrays with up to five elements, bubble sort works well enough. It sorts the entries in the original array without external storage requirements, it is stable (the order of identical elements remains unchanged), and it handles presorted input data quickly.
Counting Sort
In job interviews, candidates are sometimes asked to name a fast sort algorithm. Before instinctively replying "quicksort," they might do well to consider whether the specific framework conditions of the problem at hand might allow for clever alternatives. For example, if the elements to be sorted consist of, say, just a few different keys, the counting sort algorithm [4] is most likely much quicker (Figure 2).
In Listing 3, the things
array to be sorted contains just a few repeating values, 3
, 4
, and 5
. Instead of firing up a traditional sorting algorithm, the implementation uses a newly allocated count
array, which uses counters at index positions 3
, 4
, and 5
to note how often the corresponding values occur in the input array. In this way, a value of 7 is written down at index position 3
, because there are seven 3
s in things
, and index position 4
in count
contains a 5, for the five 4
s in things
.
Listing 3
counting.go
01 package main 02 03 import ( 04 "fmt" 05 ) 06 07 func main() { 08 things := []int{5, 4, 3, 3, 3, 4, 4, 09 4, 4, 5, 5, 3, 3, 3, 3} 10 fmt.Printf("%+v\n", things) 11 12 count := []int{} 13 14 for _, thing := range things { 15 if len(count) <= thing { 16 count = append(count, make([]int, 17 thing - len(count) + 1)...) 18 } 19 count[thing]++ 20 } 21 fmt.Printf("count=%+v\n", count) 22 23 idx := 0 24 for val, run := range count { 25 for x := 0; x < run; x++ { 26 things[idx] = val 27 idx++ 28 } 29 } 30 31 fmt.Printf("%+v\n", things) 32 }
Extending Static Arrays
Since the sorting routine does not initially know how many different values will occur in the array to be sorted, it first creates an empty ("count") array. If the for
loop then wants to create an entry in count
at a position that exceeds the size of the array, line 16 uses the built-in append()
function to extend count
to the required size.
In contrast to typical scripting languages, Go does not work with arrays that grow dynamically; instead, they are allocated with fixed sizes. If a program creates a new array with capacity for three elements using make([]int, 3)
and later accesses an element outside the range 0 to 2, Go will throw a run-time error and terminate the program. This also applies to array slices, by the way. They only define a sliding window onto an existing static array. An array can only be extended by the built-in append()
function, which creates a new array with all additionally required elements appended at the end.
In other words, line 16 creates a new, longer array from the old count
array, which has been found to be too short, and then reassigns it to the original count
. The new array differs from the old one by the difference between the desired index (in the thing
variable) and the last index of the existing array. To do this, the code uses make()
to create a small array that fills up the difference and feeds its elements one by one (expanded by the ...
operator) to the append()
function.
After the first for
loop, the count
array contains the counters for the collected keys, as printed out as a checkpoint in line 21. The second for
loop starting in line 24 iterates over the counters in count
and fills the things
array with seven 3
s first, then with five 4
s, and so on. Although the procedure uses an external count
array for counting, the extra memory required is not relevant for a small number of different keys. Sorting is done in situ (i.e., the original array is modified). The number of required steps is linear (O(n)), so the method works orders of magnitude faster than even the most sophisticated generic sorting function for large amounts of data – but only because of the very special properties of the input data, of course.
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.
News
-
Canonical Releases Ubuntu 24.04
After a brief pause because of the XZ vulnerability, Ubuntu 24.04 is now available for install.
-
Linux Servers Targeted by Akira Ransomware
A group of bad actors who have already extorted $42 million have their sights set on the Linux platform.
-
TUXEDO Computers Unveils Linux Laptop Featuring AMD Ryzen CPU
This latest release is the first laptop to include the new CPU from Ryzen and Linux preinstalled.
-
XZ Gets the All-Clear
The back door xz vulnerability has been officially reverted for Fedora 40 and versions 38 and 39 were never affected.
-
Canonical Collaborates with Qualcomm on New Venture
This new joint effort is geared toward bringing Ubuntu and Ubuntu Core to Qualcomm-powered devices.
-
Kodi 21.0 Open-Source Entertainment Hub Released
After a year of development, the award-winning Kodi cross-platform, media center software is now available with many new additions and improvements.
-
Linux Usage Increases in Two Key Areas
If market share is your thing, you'll be happy to know that Linux is on the rise in two areas that, if they keep climbing, could have serious meaning for Linux's future.
-
Vulnerability Discovered in xz Libraries
An urgent alert for Fedora 40 has been posted and users should pay attention.
-
Canonical Bumps LTS Support to 12 years
If you're worried that your Ubuntu LTS release won't be supported long enough to last, Canonical has a surprise for you in the form of 12 years of security coverage.
-
Fedora 40 Beta Released Soon
With the official release of Fedora 40 coming in April, it's almost time to download the beta and see what's new.