Go retrieves GPS data from the komoot app
Scraping the Web
The web scraper runs as a compiled Go binary called from the command line. As a browser replacement, it uses the Go Colly package, which has been featured in our Snapshot programming series previously [4]. The functions in Listing 4 log in to the komoot account (kLogin(), line 23), retrieve a list of tours stored there (kTours(), line 48), and extract the GPS data from individual tours (kTour(), line 70).
Listing 4
kfetch.go
01 package main
02
03 import (
04   "fmt"
05   "github.com/gocolly/colly/v2"
06 )
07
08 var loginURL = "https://account.komoot.com/v1/signin"
09 var signinURL = "https://account.komoot.com/actions/transfer?type=signin"
10
11 type kColl struct {
12   c     *colly.Collector
13   creds map[string]string
14 }
15
16 func NewkColl() kColl {
17   return kColl{
18     c:     colly.NewCollector(),
19     creds: readCreds(),
20   }
21 }
22
23 func (kc kColl) kLogin() error {
24   c := kc.c.Clone()
25   c.OnRequest(func(req *colly.Request) {
26     fmt.Println("Visiting", req.URL)
27   })
28
29   payload := map[string]string{
30     "email":    kc.creds["email"],
31     "password": kc.creds["password"],
32     "reason":   "null",
33   }
34
35   err := c.Post(loginURL, payload)
36   if err != nil {
37     return err
38   }
39
40   err = c.Visit(signinURL)
41   if err != nil {
42     return err
43   }
44
45   return nil
46 }
47
48 func (kc kColl) kTours() ([]byte, error) {
49   c := kc.c.Clone()
50   toursURL := fmt.Sprintf(
51     "https://www.komoot.com/user/%s/tours",
52     kc.creds["client_id"])
53
54   jdata := []byte{}
55   var err error
56
57   c.OnRequest(func(req *colly.Request) {
58     fmt.Println("Visiting", req.URL)
59     req.Headers.Set("onlyprops", "true")
60   })
61
62   c.OnResponse(func(resp *colly.Response) {
63     jdata = resp.Body
64   })
65
66   err = c.Visit(toursURL)
67   return jdata, err
68 }
69
70 func (kc kColl) kTour(tourID string) ([]byte, error) {
71   c := kc.c.Clone()
72   tourURL := fmt.Sprintf(
73     "https://www.komoot.com/tour/%s", tourID)
74
75   jdata := []byte{}
76   var err error
77
78   c.OnRequest(func(req *colly.Request) {
79     fmt.Println("Visiting", req.URL)
80     req.Headers.Set("onlyprops", "true")
81   })
82
83   c.OnResponse(
84     func(resp *colly.Response) {
85       jdata = resp.Body
86     })
87
88   err = c.Visit(tourURL)
89   return jdata, err
90 }
Go does not offer classic object orientation, but with a data structure like kColl in line 11, a constructor like NewkColl() in line 16, and receivers on the left side of the function names used as methods, it has something very similar, to all intents and purposes. The functions share the data structure, which the caller initializes once at the beginning with the constructor. The constructor in the code at hand stores an instance of the Colly scraper and the creds hash table with the previously obtained user credentials.
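The pattern can be reduced to a few lines. The following is a minimal sketch; the names session, newSession(), and greeting() are made up for illustration and do not appear in the article's code.

```go
package main

import "fmt"

// session is a hypothetical stand-in for kColl: shared state that
// every method reaches through its receiver.
type session struct {
	creds map[string]string
}

// newSession plays the role of a constructor. Go has no special syntax
// for one; it is an ordinary function returning an initialized value.
func newSession(email string) session {
	return session{creds: map[string]string{"email": email}}
}

// greeting is a method: the receiver (s session) sits left of the name.
func (s session) greeting() string {
	return "logged in as " + s.creds["email"]
}

func main() {
	s := newSession("hiker@example.com")
	fmt.Println(s.greeting())
}
```

Because the receiver is a value, each method call works on a copy of the struct. Map and pointer fields in that copy still refer to the same underlying data, which is why the methods of kColl can all use the one collector stored behind the pointer in field c.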
The Colly open source scraper library jumps to the OnRequest() callbacks before it executes the HTTP request issued by the Visit() or Post() functions. In Listing 4, these callbacks call fmt.Println() to show the user which URL is currently being processed and, in kTours() and kTour(), set the onlyprops HTTP header so that the komoot servers won't return HTML code but easier-to-analyze JSON data.
Tasty Cookies
All three functions share a scraper instance that preserves the cookies set at the beginning of the komoot session. The session starts at login, because the server will not hand out the tour data to simply every Tom, Dick, and Harry. One thing to look out for, though: The Colly scraper does not replace OnRequest() callbacks when you set them again later but stacks them up, so that, in the code at hand, the third function would output the URL being accessed not once but three times. This is remedied by clones created with Clone(), which keep the cookies but reset the callbacks. Figure 5 shows how the program compiles with the listings that will be explained in the remaining sections of this article. It also illustrates the program's typical output as it finds tours on the server but only downloads them if they are not already available locally.
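The reason a shared cookie store keeps the session alive across clones can be illustrated with the standard library alone. The sketch below does not use Colly; it swaps in net/http with a cookie jar, and the test server, its /login and /tours paths, and the sid cookie are all invented for the demonstration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
)

// newTestServer returns a hypothetical server: /login sets a session
// cookie, and /tours answers 200 only if that cookie is presented.
func newTestServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/login", func(w http.ResponseWriter, r *http.Request) {
		http.SetCookie(w, &http.Cookie{Name: "sid", Value: "secret", Path: "/"})
	})
	mux.HandleFunc("/tours", func(w http.ResponseWriter, r *http.Request) {
		if c, err := r.Cookie("sid"); err != nil || c.Value != "secret" {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		fmt.Fprint(w, "ok")
	})
	return httptest.NewServer(mux)
}

// status fetches url with client and returns the HTTP status code.
func status(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// run logs in with one client, then requests /tours with a second
// client sharing the same jar and with a third client that has no jar.
func run() (int, int, error) {
	srv := newTestServer()
	defer srv.Close()

	jar, err := cookiejar.New(nil)
	if err != nil {
		return 0, 0, err
	}
	login := &http.Client{Jar: jar}  // performs the login
	worker := &http.Client{Jar: jar} // separate client, same jar
	anon := &http.Client{}           // no jar, so no session

	if _, err := status(login, srv.URL+"/login"); err != nil {
		return 0, 0, err
	}
	withJar, err := status(worker, srv.URL+"/tours")
	if err != nil {
		return 0, 0, err
	}
	withoutJar, err := status(anon, srv.URL+"/tours")
	if err != nil {
		return 0, 0, err
	}
	return withJar, withoutJar, nil
}

func main() {
	withJar, withoutJar, err := run()
	if err != nil {
		panic(err)
	}
	fmt.Println(withJar, withoutJar)
}
```

The second client never logs in itself, yet it is admitted because it sends the cookie stored in the shared jar; the client without a jar is turned away.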

Cats and Dogs
Komoot's web server delivers both the tour list and the details of individual tours in JSON format because of the headers set in Listing 4. JSON and Go are as compatible as cats and dogs, however, because JSON offers dynamic types with few type checks, while Go insists on precisely defined data structures. To convert deeply nested JSON text into internal Go data structures, programmers need to really coax the language. If you wanted to import JSON into a scripting language such as Python and convert it to GPX later on, you could do so effortlessly with just a dozen lines of code. Go, on the other hand, as you can see from Listing 5 and Listing 6, demands considerably more effort.
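To give an idea of what "precisely defined data structures" means in practice, here is a minimal sketch of typed JSON decoding. The tourDoc type, its field names, and the sample document are invented for illustration and are not komoot's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tourDoc mirrors a made-up, two-level JSON document. Every level of
// nesting needs a matching struct declaration.
type tourDoc struct {
	Tour struct {
		ID     string `json:"id"`
		Points []struct {
			Lat float64 `json:"lat"`
			Lng float64 `json:"lng"`
		} `json:"points"`
	} `json:"tour"`
}

// parseTour decodes jdata into the typed structure above.
func parseTour(jdata []byte) (tourDoc, error) {
	var doc tourDoc
	err := json.Unmarshal(jdata, &doc)
	return doc, err
}

func main() {
	jdata := []byte(`{"tour": {"id": "42", "points": [{"lat": 48.1, "lng": 11.5}]}}`)
	doc, err := parseTour(jdata)
	if err != nil {
		panic(err)
	}
	fmt.Println(doc.Tour.ID, len(doc.Tour.Points), doc.Tour.Points[0].Lat)
}
```

Access to the decoded fields is type-safe and needs no assertions, but the declaration grows with every additional level of the document.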
Listing 5
tours.go
01 package main
02
03 import (
04   "encoding/json"
05 )
06
07 func tourIDs(jdata []byte) []string {
08   var data map[string]interface{}
09
10   err := json.Unmarshal(jdata, &data)
11   if err != nil {
12     panic(err)
13   }
14
15   data = drill(data,
16     []string{"kmtx", "session",
17       "_embedded", "profile",
18       "_embedded", "tours",
19       "_embedded"})
20
21   items :=
22     data["items"].([]interface{})
23
24   ids := []string{}
25
26   for _, item := range items {
27     table :=
28       item.(map[string]interface{})
29     id := table["id"].(string)
30     ids = append(ids, id)
31   }
32
33   return ids
34 }
35
36 func drill(part map[string]interface{}, keys []string) map[string]interface{} {
37   for _, key := range keys {
38     part = part[key].(map[string]interface{})
39   }
40
41   return part
42 }
Listing 6
gpx.go
01 package main
02
03 import (
04   "encoding/json"
05   "fmt"
06   "time"
07 )
08
09 func toGpx(jdata []byte) []byte {
10   var data map[string]interface{}
11
12   json.Unmarshal([]byte(jdata), &data)
13   tour := drill(data, []string{
14     "page", "_embedded", "tour"})
15   start := tour["date"].(string)
16
17   coord := drill(tour, []string{
18     "_embedded", "coordinates"})
19   items :=
20     coord["items"].([]interface{})
21   ts, err := time.Parse(time.RFC3339, start)
22   if err != nil {
23     panic(err)
24   }
25
26   xml := "<gpx><trk>"
27   for _, item := range items {
28     pt := item.(map[string]interface{})
29     secs := pt["t"].(float64) / 1000.0
30     t := ts.Add(time.Duration(secs) * time.Second)
31     xml += fmt.Sprintf(`<trkseg>
32       <trkpt lat="%f" lon="%f">
33         <ele>%.1f</ele>
34         <time>%s</time>
35       </trkpt></trkseg>`, pt["lat"],
36       pt["lng"], pt["alt"],
37       t.Format(time.RFC3339))
38   }
39   xml += "</trk></gpx>\n"
40   return []byte(xml)
41 }
Since the komoot data is nested a whopping nine levels deep, the officially prescribed approach for the conversion would be a bit of a pain. It would mean defining the complete data structure with all its levels using struct declarations in Go. If you are wary of this much typing, you can simply define a map with string keys and empty interface{} values as a placeholder instead, as in line 8 of Listing 5, and make a type assertion to a map each time you descend into the nested sub-maps (line 38). At run time, Go checks whether the value really is a map and, if so, lets you dig deeper; if not, the single-value assertion used in the listing panics.