Request Spotify dossiers and evaluate them with Go and R

Programming Snapshot – Go and R

© Lead Image © bowie15, 123RF.com

Article from Issue 268/2023
Author(s): Mike Schilli

Spotify, the Internet music service, collects data about its users and their taste in music. Mike Schilli requested a copy of his files to investigate them with Go and R.

Streaming services such as Spotify or Apple Music dominate the music industry. Their extensive catalogs now cover the entire spectrum of consumable music. Relying on artificial intelligence, these services introduce users to new songs they'll probably like, as predicted by the services' algorithms. Traditional physical music media no longer stand a chance against this and gather dust on the shelves. Of course, this development also means that anonymous music consumption is a thing of the past, because streaming services keep precise records of who played what track, when, and for how long.

On request, Spotify will even hand over the data it has collected (Figure 1). On Spotify's website, under Account | Privacy Settings, you'll find the buttons for requesting a copy of these files, but Spotify takes its sweet time to respond. From the time of the request, it takes about a week for their archivist to retrieve the data from the files in the Spotify basement, compress them, and post them as a ZIP archive on the website for you to pick up. After receiving Spotify's email notification, you have two weeks to download the data, which you can then poke around in locally to your heart's content.

Figure 1: Spotify lets its users view the data it collected about them.

Exercise

The ZIP file containing the downloaded data includes a JSON file named StreamingHistory0.json with the metadata of all the streams you played, in chronological order (Figure 2). In addition to the song and artist, each entry lists the date and time at which playback ended, as well as the playback duration. Playback duration is particularly interesting: If the user stops a stream after a few seconds and skips to the next song, the track probably made it onto the playlist by mistake and wasn't something the user actually liked. As input for putting together music suggestions, such a play would most likely be a false positive.
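To make the skip idea concrete, here is a minimal Go sketch that computes, per track, the share of plays that ended early. It assumes a cutoff of 15 seconds (the export itself has no skip flag, so the threshold is a guess), and the play type and skipRatios helper are hypothetical names, not part of the author's listings.

```go
package main

import "fmt"

// play holds the two history fields this sketch needs (hypothetical type).
type play struct {
	Track    string
	MsPlayed int64
}

// skipThresholdMs is an assumed cutoff: the export defines no skip marker,
// so plays shorter than 15 seconds are simply treated as skips here.
const skipThresholdMs = 15000

// skipRatios returns, per track, the fraction of plays below the threshold.
func skipRatios(plays []play) map[string]float64 {
	total := map[string]int{}
	skipped := map[string]int{}
	for _, p := range plays {
		total[p.Track]++
		if p.MsPlayed < skipThresholdMs {
			skipped[p.Track]++
		}
	}
	ratios := make(map[string]float64, len(total))
	for track, n := range total {
		ratios[track] = float64(skipped[track]) / float64(n)
	}
	return ratios
}

func main() {
	// Made-up sample plays, not real history data.
	plays := []play{
		{"Track A", 215000}, {"Track A", 4000},
		{"Track B", 3000}, {"Track B", 2500},
	}
	r := skipRatios(plays)
	fmt.Printf("Track A: %.2f, Track B: %.2f\n", r["Track A"], r["Track B"])
}
```

A track with a ratio close to 1.0 is a likely candidate for the false positives described above.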

Figure 2: JSON data from the author's streaming history.

As an exercise, Listing 1 shows a Go program that traverses the JSON data and compiles charts of the most frequently played tracks. The top-three list in Listing 2 shows my favorite songs, minus the ones I excluded because they were just too embarrassing to own up to.

Listing 1

stats.go

 

Listing 2

Top Three Songs

 

To do this, Listing 1 reads the JSON file in line 17 and stores its contents as a byte slice in the content variable. Line 22 passes this to the Unmarshal function from the json package in Go's standard library, along with a pointer to a slice of stream structures, a type defined in line 8. As you know, Go insists on strict type checking. For the JSON parser to turn the Spotify data into an internal Go data structure, the program has to declare that structure in advance, and the structure must match the layout of the actual JSON data.

The JSON blob provided by Spotify, as shown in Figure 2, consists of an array whose elements each correspond to a streamed track. The artist and track names are stored as strings in the artistName and trackName fields. msPlayed gives you the playback time in milliseconds, while endTime has the date and time at the end of playback.

The fields of the stream structure in Listing 1 each start with a capital letter, which exports them so that other packages, including the json package that fills them in, can access them. As a consequence, the field names differ from the keys in the JSON data, each of which starts with a lowercase letter. This is no big deal, though, because a json:"..." struct tag lets you map a Go field to a JSON key with a different name.

For example, ArtistName string `json:"artistName"` in line 10 specifies that the ArtistName field in the Go structure holds the artist as a string and that the incoming JSON uses the key artistName for it. This is all json.Unmarshal() needs to dig through all the entries in the JSON file in line 22, because it has been passed a pointer to what is still an empty slice of these stream entries in data. Using Go's reflection mechanism, the function figures out which JSON structures it needs to work its way through.

Listing 1 counts how many times each song occurs in the streaming history in the bySong map defined in line 16. It uses the title string as the key and increments the 64-bit integer map value by one for each playback event it finds in the streaming data. At the end, the program needs to sort the entries by count in descending order to output the top three.
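The counting step boils down to a map increment per entry. This is a minimal sketch in the spirit of the description; countBySong is a hypothetical helper name.

```go
package main

import "fmt"

type stream struct {
	ArtistName string
	TrackName  string
	MsPlayed   int64
}

// countBySong tallies how often each track title appears in the history.
func countBySong(data []stream) map[string]int64 {
	bySong := map[string]int64{}
	for _, s := range data {
		bySong[s.TrackName]++ // a missing key starts at the zero value 0
	}
	return bySong
}

func main() {
	data := []stream{{TrackName: "Track A"}, {TrackName: "Track B"}, {TrackName: "Track A"}}
	fmt.Println(countBySong(data)) // map[Track A:2 Track B:1]
}
```

Because Go returns the zero value for missing keys, no existence check is needed before incrementing.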

Sorting Is No Piece of Cake

In a scripting language, sorting the map data would be a snap, but Go maps have no defined order and cannot be sorted directly, and Go's type safety calls for a dedicated type to hold the entries. That's why Listing 1 converts the map entries into a slice of kv (for key/value) structures, whose type it defines starting in line 30. The for loop starting in line 35 then slogs through the map and appends each key/value pair it finds as a kv struct to the kvs slice. Go's standard sort.Slice() function can then sort this slice. The callback in line 39 tells it how to order the two entries at positions i and j: by numerically comparing their counters.
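The map-to-slice conversion and the sort.Slice() call could look like the following sketch. The descending comparison follows the text; the alphabetical tie-break is an addition of this sketch, because sort.Slice is not stable and would otherwise leave equal counts in arbitrary order.

```go
package main

import (
	"fmt"
	"sort"
)

// kv pairs a track title with its play count.
type kv struct {
	Key   string
	Value int64
}

// sortByCount turns the map into a slice sorted by descending count,
// breaking ties alphabetically so the output order is deterministic.
func sortByCount(counts map[string]int64) []kv {
	kvs := make([]kv, 0, len(counts))
	for k, v := range counts {
		kvs = append(kvs, kv{k, v})
	}
	sort.Slice(kvs, func(i, j int) bool {
		if kvs[i].Value != kvs[j].Value {
			return kvs[i].Value > kvs[j].Value // higher count first
		}
		return kvs[i].Key < kvs[j].Key
	})
	return kvs
}

func main() {
	counts := map[string]int64{"Track A": 3, "Track B": 7, "Track C": 3}
	for _, e := range sortByCount(counts) {
		fmt.Println(e.Value, e.Key)
	}
}
```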

Wow, that's pretty convoluted! Finally, the for loop from line 41 walks through the sorted slice, outputs the top positions, and terminates after the third entry.
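With Go 1.21 or newer, the generic slices and cmp packages shorten this somewhat. The following sketch is an alternative to the kv approach, not what Listing 1 does: it sorts the titles directly by looking up their counts and truncates the result to at most n entries, which also covers histories with fewer than three distinct songs. The topN name is hypothetical.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// topN returns up to n titles with the highest play counts (Go 1.21+).
func topN(counts map[string]int64, n int) []string {
	titles := make([]string, 0, len(counts))
	for t := range counts {
		titles = append(titles, t)
	}
	slices.SortFunc(titles, func(a, b string) int {
		// Descending by count, then ascending by title for stable output.
		if c := cmp.Compare(counts[b], counts[a]); c != 0 {
			return c
		}
		return cmp.Compare(a, b)
	})
	if len(titles) > n {
		titles = titles[:n]
	}
	return titles
}

func main() {
	counts := map[string]int64{"Track A": 3, "Track B": 7, "Track C": 5, "Track D": 1}
	for i, t := range topN(counts, 3) {
		fmt.Printf("%d. %s (%d)\n", i+1, t, counts[t])
	}
}
```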

Faster with R

Go programs for parsing JSON data and computing statistics are a real pain. Go's type safety requires a disproportionate amount of boilerplate code here, which scripting languages just elegantly do without. This calls for a classic data wrangling language like R, which takes a more carefree approach, saving programmers a great deal of work. If you don't have R on your machine yet, simply install it on Ubuntu, for example, with:

sudo apt install r-base

Listing 3 shows a simple application that reads a user's Spotify streaming history, computes a histogram of the actual playing times of the songs they listened to, and displays it as a bar graph (Figure 3). The diagram shows that many songs were canceled after less than 15 seconds (15,000 milliseconds). In these cases, Spotify's suggestion algorithm most likely picked a song the listener didn't want, and the annoyed listener skipped to the next one. Starting at about one minute of playback time (i.e., after 60,000 milliseconds), a roughly Gaussian bell curve appears, peaking at around 220 seconds. Most songs these days are about three and a half minutes long, with the majority lasting between two and five minutes.
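For comparison with the R version, the binning behind such a histogram takes only a few lines of Go. This sketch is not a port of Listing 3. The 15-second bin width and the sample durations are assumptions, and the text bar chart merely stands in for the graphical output.

```go
package main

import (
	"fmt"
	"strings"
)

// histogram counts non-negative playback durations (in ms) into bins of
// width binMs; bins[i] covers the range [i*binMs, (i+1)*binMs).
func histogram(msPlayed []int64, binMs int64) []int {
	var bins []int
	for _, ms := range msPlayed {
		i := int(ms / binMs)
		for len(bins) <= i {
			bins = append(bins, 0) // grow the slice up to the needed bin
		}
		bins[i]++
	}
	return bins
}

func main() {
	// Made-up sample durations in milliseconds.
	sample := []int64{2000, 5000, 9000, 70000, 200000, 215000, 225000}
	const binMs = 15000
	for i, n := range histogram(sample, binMs) {
		fmt.Printf("%4ds %s\n", int64(i)*binMs/1000, strings.Repeat("#", n))
	}
}
```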

Listing 3

hist.r

 

Figure 3: Histogram on playback duration, generated by Listing 3

To be able to call Listing 3 at the command line, the shebang statement in line 1 searches for the Rscript program in the shell's search paths and calls the underlying R interpreter with the program code from the listing. Also make sure to mark the file hist.r (Listing 3) as executable via the chmod +x command.

For an elegant approach to reading the JSON data, Listing 3 uses the jsonlite package, which you need to install up front. After opening an R session (just type R at the command line), the install.packages("jsonlite") command downloads the package sources from the Comprehensive R Archive Network (CRAN), compiles them locally, and integrates the library into the local R universe. After that, any R script can use library("jsonlite") to load the package and call its functions.

Line 3 reads the JSON data from the streaming history using the fromJSON function exported by jsonlite and stores it as a data frame in the jdata variable. This standard R type resembles a database table: Each column is a vector of values of one type, all columns have the same length, and each row holds one record spanning all columns. In addition to numeric values and character strings, the columns can also contain what are known as factors. In R, factors are categorical variables that can take only a fixed set of values, called levels, for example, small, medium, and large.
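To illustrate the factor concept in the same language as Listing 1, here is a rough Go analog: a list of levels plus one integer code per observation pointing into that list. It only mirrors the idea. R stores 1-based codes and sorts levels according to the locale, whereas this sketch uses 0-based codes and byte-order sorting, and the factor type and newFactor function are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// factor mirrors the idea behind an R factor: a fixed set of levels plus one
// integer code per observation that points into those levels.
type factor struct {
	Levels []string
	Codes  []int // 0-based here; R itself uses 1-based codes
}

// newFactor builds sorted unique levels from values, loosely following what
// R's factor() does by default for character vectors.
func newFactor(values []string) factor {
	seen := map[string]bool{}
	var levels []string
	for _, v := range values {
		if !seen[v] {
			seen[v] = true
			levels = append(levels, v)
		}
	}
	sort.Strings(levels) // byte order, not R's locale-aware order
	index := make(map[string]int, len(levels))
	for i, l := range levels {
		index[l] = i
	}
	codes := make([]int, len(values))
	for i, v := range values {
		codes[i] = index[v]
	}
	return factor{Levels: levels, Codes: codes}
}

func main() {
	f := newFactor([]string{"medium", "small", "large", "small"})
	fmt.Println(f.Levels) // [large medium small]
	fmt.Println(f.Codes)  // [1 2 0 2]
}
```

Storing small integer codes instead of repeated strings is what makes factors compact and fast to group by.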
