Go retrieves GPS data from the komoot app
Scraping the Web
The web scraper runs as a compiled Go binary called from the command line. As a browser replacement, it uses the Go Colly package, which has been featured in our Snapshot programming series previously [4]. The functions in Listing 4 log in to the komoot account (kLogin(), line 23), retrieve a list of tours stored there (kTours(), line 48), and extract the GPS data from individual tours (kTour(), line 70).
Listing 4
kfetch.go
01 package main
02
03 import (
04   "fmt"
05   "github.com/gocolly/colly/v2"
06 )
07
08 var loginURL = "https://account.komoot.com/v1/signin"
09 var signinURL = "https://account.komoot.com/actions/transfer?type=signin"
10
11 type kColl struct {
12   c     *colly.Collector
13   creds map[string]string
14 }
15
16 func NewkColl() kColl {
17   return kColl{
18     c:     colly.NewCollector(),
19     creds: readCreds(),
20   }
21 }
22
23 func (kc kColl) kLogin() error {
24   c := kc.c.Clone()
25   c.OnRequest(func(req *colly.Request) {
26     fmt.Println("Visiting", req.URL)
27   })
28
29   payload := map[string]string{
30     "email":    kc.creds["email"],
31     "password": kc.creds["password"],
32     "reason":   "null",
33   }
34
35   err := c.Post(loginURL, payload)
36   if err != nil {
37     return err
38   }
39
40   err = c.Visit(signinURL)
41   if err != nil {
42     return err
43   }
44
45   return nil
46 }
47
48 func (kc kColl) kTours() ([]byte, error) {
49   c := kc.c.Clone()
50   toursURL := fmt.Sprintf(
51     "https://www.komoot.com/user/%s/tours",
52     kc.creds["client_id"])
53
54   jdata := []byte{}
55   var err error
56
57   c.OnRequest(func(req *colly.Request) {
58     fmt.Println("Visiting", req.URL)
59     req.Headers.Set("onlyprops", "true")
60   })
61
62   c.OnResponse(func(resp *colly.Response) {
63     jdata = resp.Body
64   })
65
66   err = c.Visit(toursURL)
67   return jdata, err
68 }
69
70 func (kc kColl) kTour(tourID string) ([]byte, error) {
71   c := kc.c.Clone()
72   tourURL := fmt.Sprintf(
73     "https://www.komoot.com/tour/%s", tourID)
74
75   jdata := []byte{}
76   var err error
77
78   c.OnRequest(func(req *colly.Request) {
79     fmt.Println("Visiting", req.URL)
80     req.Headers.Set("onlyprops", "true")
81   })
82
83   c.OnResponse(func(resp *colly.Response) {
84     jdata = resp.Body
85   })
86
87   err = c.Visit(tourURL)
88   return jdata, err
89 }
Go does not offer classic object orientation, but with a data structure like kColl in line 11, a constructor like NewkColl() in line 16, and receivers to the left of the function names that turn those functions into methods, it has something very similar, to all intents and purposes. The methods share the data structure, which the caller initializes once at the beginning with the constructor. The constructor in the code at hand stores an instance of the Colly scraper and the creds hash table with the previously obtained user credentials.
The Colly open source scraper library jumps to the OnRequest() callbacks before it executes the HTTP request issued with the Visit() or Post() functions. In Listing 4, these callbacks call fmt.Println() to show the user which URL is currently being processed and, in kTours() and kTour(), set the onlyprops HTTP header so that the komoot servers won't return HTML code but easier-to-analyze JSON data.
Tasty Cookies
All three functions share a scraper instance that preserves the cookies set at the beginning of the komoot session. The session starts at login, because the server would not hand out the tour data to just any Tom, Dick, and Harry. One thing to look out for, though: The Colly scraper does not replace an OnRequest() callback when you set one again later; it stacks them up. Without countermeasures, the third function in the code at hand would output the URL being accessed not once but three times. This is remedied by clones created with Clone(), which keep the cookies but reset the callbacks.
Figure 5 shows how the program compiles with the listings that will be explained in the remaining sections of this article. It also illustrates the program's typical output as it finds tours on the server but only downloads them if they are not already available locally.
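The stacking of callbacks and the effect of Clone() can be modeled without a network connection. The following sketch is not Colly's implementation: miniCollector and all of its methods are hypothetical stand-ins that only reproduce the two behaviors described above, namely that repeated OnRequest() calls accumulate callbacks and that a clone keeps the shared cookies but starts with an empty callback list.

```go
package main

import "fmt"

// miniCollector is a hypothetical, stripped-down stand-in for a scraper.
// It is NOT Colly's code; it only models callback stacking and cloning.
type miniCollector struct {
	cookies   map[string]string  // shared between clones (maps are references)
	onRequest []func(url string) // callbacks run before every "request"
}

func newMiniCollector() *miniCollector {
	return &miniCollector{cookies: map[string]string{}}
}

// OnRequest appends the callback; it never replaces an earlier one.
func (m *miniCollector) OnRequest(f func(url string)) {
	m.onRequest = append(m.onRequest, f)
}

// Clone returns a collector that shares the cookies but has no callbacks.
func (m *miniCollector) Clone() *miniCollector {
	return &miniCollector{cookies: m.cookies}
}

// Visit runs all registered callbacks and reports how many were called.
func (m *miniCollector) Visit(url string) int {
	for _, f := range m.onRequest {
		f(url)
	}
	return len(m.onRequest)
}

func main() {
	logURL := func(url string) { fmt.Println("Visiting", url) }

	// Three functions registering on the same instance stack up callbacks.
	shared := newMiniCollector()
	shared.cookies["session"] = "abc"
	for i := 0; i < 3; i++ {
		shared.OnRequest(logURL)
	}
	fmt.Println("callbacks on shared instance:", shared.Visit("https://example.com/a"))

	// A clone starts with a clean callback list but still sees the cookie.
	clone := shared.Clone()
	clone.OnRequest(logURL)
	fmt.Println("callbacks on clone:", clone.Visit("https://example.com/b"))
	fmt.Println("cookie visible in clone:", clone.cookies["session"])
}
```

This is why each function in Listing 4 begins with kc.c.Clone() before it registers its own callbacks.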
Cats and Dogs
Komoot's web server delivers both the tour list and the details of individual tours in JSON format because of the headers set in Listing 4. JSON and Go are as compatible as cats and dogs, however, because JSON offers dynamic types with few type checks, while Go insists on precisely defined data structures. To convert deeply nested JSON text into internal Go data structures, programmers need to really coax the language. If you wanted to import JSON into a scripting language such as Python and convert it to GPX later on, you could do so effortlessly with just a dozen lines of code. Go, on the other hand, as you can see from Listing 5 and Listing 6, demands considerably more effort.
Listing 5
tours.go
01 package main
02
03 import (
04   "encoding/json"
05 )
06
07 func tourIDs(jdata []byte) []string {
08   var data map[string]interface{}
09
10   err := json.Unmarshal(jdata, &data)
11   if err != nil {
12     panic(err)
13   }
14
15   data = drill(data,
16     []string{"kmtx", "session",
17       "_embedded", "profile",
18       "_embedded", "tours",
19       "_embedded"})
20
21   items :=
22     data["items"].([]interface{})
23
24   ids := []string{}
25
26   for _, item := range items {
27     table :=
28       item.(map[string]interface{})
29     id := table["id"].(string)
30     ids = append(ids, id)
31   }
32
33   return ids
34 }
35
36 func drill(part map[string]interface{}, keys []string) map[string]interface{} {
37   for _, key := range keys {
38     part = part[key].(map[string]interface{})
39   }
40
41   return part
42 }
Listing 6
gpx.go
01 package main
02
03 import (
04   "encoding/json"
05   "fmt"
06   "time"
07 )
08
09 func toGpx(jdata []byte) []byte {
10   var data map[string]interface{}
11
12   json.Unmarshal([]byte(jdata), &data)
13   tour := drill(data, []string{
14     "page", "_embedded", "tour"})
15   start := tour["date"].(string)
16
17   coord := drill(tour, []string{
18     "_embedded", "coordinates"})
19   items :=
20     coord["items"].([]interface{})
21   ts, err := time.Parse(time.RFC3339, start)
22   if err != nil {
23     panic(err)
24   }
25
26   xml := "<gpx><trk>"
27   for _, item := range items {
28     pt := item.(map[string]interface{})
29     secs := pt["t"].(float64) / 1000.0
30     t := ts.Add(time.Duration(secs) * time.Second)
31     xml += fmt.Sprintf(`<trkseg>
32 <trkpt lat="%f" lon="%f">
33 <ele>%.1f</ele>
34 <time>%s</time>
35 </trkpt></trkseg>`, pt["lat"],
36       pt["lng"], pt["alt"],
37       t.Format(time.RFC3339))
38   }
39   xml += "</trk></gpx>\n"
40   return []byte(xml)
41 }
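The timestamp arithmetic in lines 29 and 30 of Listing 6 can be looked at in isolation. The sketch below assumes that the t field holds an offset in milliseconds from the tour's start date, which is what the division by 1000 suggests. The function names offsetTruncated(), offsetPrecise(), and demo() as well as the sample date are made up for illustration. Converting a float64 to time.Duration drops the fractional part, so the variant from the listing lands on whole seconds, while the second variant keeps the milliseconds.

```go
package main

import (
	"fmt"
	"time"
)

// offsetTruncated mirrors the arithmetic in Listing 6: the conversion
// time.Duration(secs) discards the fractional part of the seconds.
func offsetTruncated(start time.Time, ms float64) time.Time {
	secs := ms / 1000.0
	return start.Add(time.Duration(secs) * time.Second)
}

// offsetPrecise scales to nanoseconds first and therefore keeps the
// milliseconds. This is an alternative, not what the listing does.
func offsetPrecise(start time.Time, ms float64) time.Time {
	return start.Add(time.Duration(ms * float64(time.Millisecond)))
}

// demo formats both results for a made-up start time and a 1500ms offset.
func demo() (string, string) {
	start := time.Date(2021, time.May, 1, 10, 0, 0, 0, time.UTC)
	return offsetTruncated(start, 1500).Format(time.RFC3339Nano),
		offsetPrecise(start, 1500).Format(time.RFC3339Nano)
}

func main() {
	truncated, precise := demo()
	fmt.Println(truncated) // 2021-05-01T10:00:01Z
	fmt.Println(precise)   // 2021-05-01T10:00:01.5Z
}
```

Because Listing 6 formats the result with time.RFC3339, which has no fractional seconds, the whole-second resolution is all that ends up in the GPX output either way.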
Since the komoot data is nested a whopping nine levels deep, the officially prescribed approach for the conversion would be a bit of a pain. It would mean defining the complete data structure with all its levels using struct declarations in Go. If you are wary of this much typing, you can simply define a map with string keys and the empty interface{} as a placeholder for the values instead, like in line 8 of Listing 5, and make a type assertion to a map each time you descend into the depths of the nested maps (line 38). At runtime, Go checks whether the value really is a map and, if so, lets you dig deeper; if not, the single-value type assertion used here panics.
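To make the trade-off concrete, here is a minimal sketch of both routes for the tour list. The tourList struct spells out only the key path that tourIDs() walks in Listing 5, and drillSafe() is a variant of drill() that uses the two-value form of the type assertion to return an error instead of panicking. The names tourList, typedIDs(), and drillSafe() are hypothetical, and the sample JSON is invented; the real komoot response contains many more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tourList declares only the path down to the tour IDs.
// encoding/json silently ignores JSON fields that are not declared here.
type tourList struct {
	Kmtx struct {
		Session struct {
			Embedded struct {
				Profile struct {
					Embedded struct {
						Tours struct {
							Embedded struct {
								Items []struct {
									ID string `json:"id"`
								} `json:"items"`
							} `json:"_embedded"`
						} `json:"tours"`
					} `json:"_embedded"`
				} `json:"profile"`
			} `json:"_embedded"`
		} `json:"session"`
	} `json:"kmtx"`
}

// typedIDs extracts the tour IDs via the struct declaration above.
func typedIDs(jdata []byte) ([]string, error) {
	var tl tourList
	if err := json.Unmarshal(jdata, &tl); err != nil {
		return nil, err
	}
	ids := []string{}
	for _, item := range tl.Kmtx.Session.Embedded.Profile.Embedded.Tours.Embedded.Items {
		ids = append(ids, item.ID)
	}
	return ids, nil
}

// drillSafe descends like drill() in Listing 5, but the "comma ok" form
// of the type assertion reports a missing or mistyped key as an error.
func drillSafe(part map[string]interface{}, keys []string) (map[string]interface{}, error) {
	for _, key := range keys {
		next, ok := part[key].(map[string]interface{})
		if !ok {
			return nil, fmt.Errorf("key %q is missing or not an object", key)
		}
		part = next
	}
	return part, nil
}

// sample is invented test data that follows the key path from Listing 5.
const sample = `{"kmtx":{"session":{"_embedded":{"profile":{"_embedded":
  {"tours":{"_embedded":{"items":[{"id":"111"},{"id":"222"}]}}}}}}}}`

func main() {
	ids, err := typedIDs([]byte(sample))
	fmt.Println(ids, err)

	var data map[string]interface{}
	if err := json.Unmarshal([]byte(sample), &data); err != nil {
		panic(err)
	}
	_, err = drillSafe(data, []string{"kmtx", "nope"})
	fmt.Println(err)
}
```

The struct version gives compile-time field names at the cost of the deeply nested declaration, while the map version stays short but defers every type check to runtime.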