Go retrieves GPS data from the komoot app

Scraping the Web

The web scraper runs as a compiled Go binary called from the command line. As a browser replacement, it uses the Go Colly package, which has been featured previously in this programming Snapshot series [4]. The functions in Listing 4 log in to the komoot account (kLogin(), line 23), retrieve a list of the tours stored there (kTours(), line 48), and extract the GPS data from individual tours (kTour(), line 70).

Listing 4

kfetch.go

01 package main
02
03 import (
04   "fmt"
05   "github.com/gocolly/colly/v2"
06 )
07
08 var loginURL = "https://account.komoot.com/v1/signin"
09 var signinURL = "https://account.komoot.com/actions/transfer?type=signin"
10
11 type kColl struct {
12   c     *colly.Collector
13   creds map[string]string
14 }
15
16 func NewkColl() kColl {
17   return kColl{
18     c:     colly.NewCollector(),
19     creds: readCreds(),
20   }
21 }
22
23 func (kc kColl) kLogin() error {
24   c := kc.c.Clone()
25   c.OnRequest(func(req *colly.Request) {
26     fmt.Println("Visiting", req.URL)
27   })
28
29   payload := map[string]string{
30     "email":    kc.creds["email"],
31     "password": kc.creds["password"],
32     "reason":   "null",
33   }
34
35   err := c.Post(loginURL, payload)
36   if err != nil {
37     return err
38   }
39
40   err = c.Visit(signinURL)
41   if err != nil {
42     return err
43   }
44
45   return nil
46 }
47
48 func (kc kColl) kTours() ([]byte, error) {
49   c := kc.c.Clone()
50   toursURL := fmt.Sprintf(
51     "https://www.komoot.com/user/%s/tours",
52     kc.creds["client_id"])
53
54   jdata := []byte{}
55   var err error
56
57   c.OnRequest(func(req *colly.Request) {
58     fmt.Println("Visiting", req.URL)
59     req.Headers.Set("onlyprops", "true")
60   })
61
62   c.OnResponse(func(resp *colly.Response) {
63     jdata = resp.Body
64   })
65
66   err = c.Visit(toursURL)
67   return jdata, err
68 }
69
70 func (kc kColl) kTour(tourID string) ([]byte, error) {
71   c := kc.c.Clone()
72   tourURL := fmt.Sprintf(
73     "https://www.komoot.com/tour/%s", tourID)
74
75   jdata := []byte{}
76   var err error
77
78   c.OnRequest(func(req *colly.Request) {
79     fmt.Println("Visiting", req.URL)
80     req.Headers.Set("onlyprops", "true")
81   })
82
83   c.OnResponse(func(resp *colly.Response) {
84     jdata = resp.Body
85   })
86
87   err = c.Visit(tourURL)
88   return jdata, err
89 }

Go does not offer classic object orientation, but with a data structure like kColl in line 11, a constructor like NewkColl() in line 16, and receivers to the left of function names that turn them into methods, it has something very similar, to all intents and purposes. The methods share the data structure, which the caller initializes once at the outset with the constructor. In the code at hand, the constructor stores an instance of the Colly scraper along with the creds hash table holding the previously obtained user credentials.
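The article's actual main program follows in listings discussed later; purely to show the usage pattern of the kColl methods, here is a rough sketch of how a caller could wire things together with the tourIDs() and toGpx() helpers from Listings 5 and 6 further below. The file naming scheme, the skipping of existing files, and the blunt error handling are illustrative assumptions, not kbak's real code.

package main

import (
  "fmt"
  "os"
)

// Hypothetical wrapper: log in once, list the tours, and back up
// every tour that has no local GPX file yet. Names and error
// handling are assumptions for illustration only.
func main() {
  kc := NewkColl()
  if err := kc.kLogin(); err != nil {
    panic(err)
  }
  jdata, err := kc.kTours()
  if err != nil {
    panic(err)
  }
  for _, id := range tourIDs(jdata) {
    gpxFile := fmt.Sprintf("%s.gpx", id)
    if _, err := os.Stat(gpxFile); err == nil {
      continue // tour already backed up locally
    }
    tdata, err := kc.kTour(id)
    if err != nil {
      panic(err)
    }
    if err := os.WriteFile(gpxFile, toGpx(tdata), 0644); err != nil {
      panic(err)
    }
  }
}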

The open source Colly scraper library calls the OnRequest() callbacks right before it sends off the HTTP request triggered by the Visit() or Post() functions. In Listing 4, these callbacks use Println() to show the user which URL is currently being processed and, where needed, set special HTTP headers so that the komoot servers return easier-to-analyze JSON data instead of HTML code.
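Outside the komoot context, the callback order is easy to see in a minimal, self-contained Colly example. The URL below is just a stand-in; OnRequest() runs right before each request goes out, OnResponse() after the reply has arrived:

package main

import (
  "fmt"

  "github.com/gocolly/colly/v2"
)

func main() {
  c := colly.NewCollector()

  // Runs immediately before each outgoing request; a good place to
  // tweak headers or log the target URL.
  c.OnRequest(func(req *colly.Request) {
    req.Headers.Set("Accept", "application/json")
    fmt.Println("Visiting", req.URL)
  })

  // Runs once the server has answered; the raw body is in resp.Body.
  c.OnResponse(func(resp *colly.Response) {
    fmt.Println("Got", len(resp.Body), "bytes")
  })

  if err := c.Visit("https://example.com/"); err != nil {
    fmt.Println("Error:", err)
  }
}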

Tasty Cookies

All three functions share a scraper instance that preserves the cookies set at the start of the komoot session during login; after all, the server won't hand out the tour data to just any Tom, Dick, or Harry. One thing to look out for, though: When you register OnRequest() callbacks repeatedly, the Colly scraper does not replace the earlier ones but stacks them up. In the code at hand, the third function would therefore print the URL being accessed not once but three times. Clones created with Clone() remedy this: They keep the cookies but reset the callbacks.

Figure 5 shows how the program compiles together with the listings explained in the remaining sections of this article. It also illustrates the program's typical output: It finds tours on the server but only downloads the ones that are not already available locally.

Figure 5: A typical call to kbak retrieves new tours from komoot.
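The difference between stacked callbacks and a fresh clone can be demonstrated in a few lines. In the following sketch (the URL is again only an example), the original collector ends up running two OnRequest() callbacks per visit, while the clone only runs its own, yet still sends the cookies collected so far because both share the same HTTP backend:

package main

import (
  "fmt"

  "github.com/gocolly/colly/v2"
)

func main() {
  c := colly.NewCollector()
  c.OnRequest(func(r *colly.Request) { fmt.Println("callback #1") })
  c.OnRequest(func(r *colly.Request) { fmt.Println("callback #2") })

  // Both callbacks fire for this single request.
  c.Visit("https://example.com/")

  // The clone keeps the cookie jar but starts with no callbacks.
  clone := c.Clone()
  clone.OnRequest(func(r *colly.Request) { fmt.Println("fresh callback") })

  // Only "fresh callback" is printed here.
  clone.Visit("https://example.com/")
}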

Cats and Dogs

Komoot's web server delivers both the tour list and the details of individual tours in JSON format, thanks to the headers set in Listing 4. However, JSON and Go are about as compatible as cats and dogs: JSON offers dynamic types with hardly any type checks, while Go insists on precisely defined data structures. To convert deeply nested JSON text into internal Go data structures, programmers need to really coax the language. A scripting language such as Python would import the JSON and convert it to GPX effortlessly in a dozen lines of code; Go, on the other hand, demands considerably more legwork, as Listing 5 and Listing 6 show.

Listing 5

tours.go

01 package main
02
03 import (
04   "encoding/json"
05 )
06
07 func tourIDs(jdata []byte) []string {
08   var data map[string]interface{}
09
10   err := json.Unmarshal(jdata, &data)
11   if err != nil {
12     panic(err)
13   }
14
15   data = drill(data,
16     []string{"kmtx", "session",
17     "_embedded", "profile",
18     "_embedded", "tours",
19     "_embedded"})
20
21   items :=
22   data["items"].([]interface{})
23
24   ids := []string{}
25
26   for _, item := range items {
27     table :=
28     item.(map[string]interface{})
29     id := table["id"].(string)
30     ids = append(ids, id)
31   }
32
33   return ids
34 }
35
36 func drill(part map[string]interface{}, keys []string) map[string]interface{} {
37   for _, key := range keys {
38     part = part[key].(map[string]interface{})
39   }
40
41   return part
42 }

Listing 6

gpx.go

01 package main
02
03 import (
04   "encoding/json"
05   "fmt"
06   "time"
07 )
08
09 func toGpx(jdata []byte) []byte {
10   var data map[string]interface{}
11
12   json.Unmarshal([]byte(jdata), &data)
13   tour := drill(data, []string{
14     "page", "_embedded", "tour"})
15   start := tour["date"].(string)
16
17   coord := drill(tour, []string{
18     "_embedded", "coordinates"})
19   items :=
20     coord["items"].([]interface{})
21   ts, err := time.Parse(time.RFC3339, start)
22   if err != nil {
23     panic(err)
24   }
25
26   xml := "<gpx><trk>"
27   for _, item := range items {
28     pt := item.(map[string]interface{})
29     secs := pt["t"].(float64) / 1000.0
30     t := ts.Add(time.Duration(secs) * time.Second)
31     xml += fmt.Sprintf(`<trkseg>
32 <trkpt lat="%f" lon="%f">
33   <ele>%.1f</ele>
34   <time>%s</time>
35 </trkpt></trkseg>`, pt["lat"],
36     pt["lng"], pt["alt"],
37     t.Format(time.RFC3339))
38   }
39   xml += "</trk></gpx>\n"
40   return []byte(xml)
41 }

Since the komoot data is nested a whopping nine levels deep, the officially prescribed approach to the conversion would be a bit of a pain: It would mean defining the complete data structure, with all its levels, as struct declarations in Go. If you are wary of that much typing, you can instead define a one-dimensional map with an empty interface{} as a placeholder, like in line 8 of Listing 5, and make a type assertion to a hashmap each time you descend into one of the sub-hashmaps (line 38). Go then checks at runtime that the value really is a map and lets you dig deeper.
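For comparison, the prescribed struct-based route for the tour list might look something like the sketch below. The nesting is reconstructed purely from the key path that Listing 5 drills into; the real komoot response contains many more fields, which Go's JSON decoder would simply ignore.

package main

import "encoding/json"

// Struct skeleton inferred from the key path in Listing 5; only the
// fields needed to reach the tour IDs are declared.
type toursPage struct {
  Kmtx struct {
    Session struct {
      Embedded struct {
        Profile struct {
          Embedded struct {
            Tours struct {
              Embedded struct {
                Items []struct {
                  ID string `json:"id"`
                } `json:"items"`
              } `json:"_embedded"`
            } `json:"tours"`
          } `json:"_embedded"`
        } `json:"profile"`
      } `json:"_embedded"`
    } `json:"session"`
  } `json:"kmtx"`
}

// tourIDsTyped does the same job as tourIDs() in Listing 5, but with
// static types instead of runtime type assertions.
func tourIDsTyped(jdata []byte) ([]string, error) {
  var page toursPage
  if err := json.Unmarshal(jdata, &page); err != nil {
    return nil, err
  }
  ids := []string{}
  for _, item := range page.Kmtx.Session.Embedded.Profile.Embedded.Tours.Embedded.Items {
    ids = append(ids, item.ID)
  }
  return ids, nil
}

Whether those nine levels of nested braces are really less painful than the type assertions in Listing 5 is a matter of taste.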
