8 May 2014

Mapping where you ride- big data for cyclists

How do you work out the routes cyclists take? If you ride locally a lot you can take a pretty good guess, but your perception will always be influenced by your daily routes and your riding preferences. Understanding cycle traffic is key to improving road conditions for people on bikes and suggesting improvements which actually work instead of the ones which traffic engineers think may benefit most people.

Logging the routes cyclists use is a time consuming task- it boils down to how to record data from enough people over a long enough period to build a meaningful picture of what's going on. Logging can be done using manual traffic counts, questionnaires or automated traffic counters, but these methods give only a pinpoint view in terms of geography and time- manual counts can't be done at more than a few points at once on a particular day, questionnaires generally don't hit a big enough sample and automated traffic counters are few and far between. Gathering all this data up from its various sources, manipulating, aligning and calculating takes a great deal of time and effort for those who produce it.

Accessing the results as a member of the public can be really difficult since they're not all in the same place or, in some cases, published at all.

Big Data

As with so many things these days, the solution is to use a 'big data' approach- or, more accurately, get data from another source and do interesting things with it. The rise of smartphone apps has given us a great source of location data for cyclists from apps like Strava, which use GPS to record your ride, and the data is publicly available for people to do things with via an API. I lack the coding skills to build things using the API but some useful people have built some apps which do some interesting things.

But isn't Strava just used by MAMILs?

Mainly, yes. But some people (like me) record all their rides and some MAMILs commute. Filtering the data by time, location and rider speed will give a better view of utility/commuting cycling, while a view of leisure/sporting cycling lets us understand the most used leisure routes. Both give a good feed for safety/development work.

The principle is pretty much the same across all the apps- take some data from Strava and map it using Google Maps, which allows you to zoom in and out and change to a satellite view.

What's there now

 Individual heatmaps- analyse your own riding

Strava lets premium (i.e. paying) members build heatmaps of their activities. Unfortunately I don't have a premium account so I can't tell you how this works!

Multiple ride mapper

Strava multiple ride mapper produces heatmaps based on one or more riders' rides, filtered by date. It's quite good but it would be difficult to build up a large enough sample of riders to make good local analysis easy.

Global Heatmaps- Mapping more than one user

 Strava have produced a global heatmap showing points reported by all users. 

Strava Global Heatmap

This lights up St Albans usage very nicely, showing in particular how much the Alban Way is used but also the most important routes connecting St Albans to local towns and leisure runs. We can get a lot of information out of this view, but the breakdown is a bit too coarse to make real strategic use of it.

Race Shape had a similar product, though the visual representation wasn’t quite as good as Strava's. This was available last week but today the page just points at Strava’s heatmap, so I’m guessing Strava are capitalising on their data by removing access for other apps.

VeloViewer  lets you look at your own data in different ways and allows you to take a good look at segments so it could be useful if dedicated segments are set up to capture people at various points of interest.


Strava's Saturday project, which recorded a 'typical Saturday' by the hour, was a pretty good experiment showing lots of detail (here's how it was done), but that's only a single day and it's still fairly coarse.

Give us what we want, what we really really want

Campaigners and planners need the ability to select journeys and users more accurately by time and place and process the information differently to make more sense of the data.

We need to be able to look at journeys at certain times of day and days of the week ('show all journeys within a 5 mile diameter of St Albans on a weekday between 6am and 9.30am' would pick out a lot of commute traffic), at all journeys finishing at a particular location ('show me all journeys finishing within a 100 metre radius of St Albans station' would pick out journeys to the station), or at all journeys passing through a location.
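The two example queries above could be sketched in Python roughly as follows. This is only an illustration of the idea: the Journey record is a made-up shape rather than anything Strava actually provides, and the coordinates for central St Albans and the station are approximate values I've assumed for the example.

```python
from dataclasses import dataclass
from datetime import datetime, time
from math import asin, cos, radians, sin, sqrt

# Approximate, illustrative coordinates (assumptions, not surveyed points).
ST_ALBANS_CENTRE = (51.752, -0.336)
ST_ALBANS_STATION = (51.750, -0.327)

MILE_M = 1609.344  # metres in a mile


@dataclass
class Journey:
    """Hypothetical journey record; not Strava's actual schema."""
    start: tuple    # (lat, lon)
    finish: tuple   # (lat, lon)
    started_at: datetime


def distance_m(a, b):
    """Great-circle (haversine) distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(h))


def is_weekday_morning(journey):
    """True if the journey started Monday to Friday between 6am and 9.30am."""
    started = journey.started_at
    return started.weekday() < 5 and time(6, 0) <= started.time() <= time(9, 30)


def morning_commutes(journeys, centre=ST_ALBANS_CENTRE, diameter_miles=5):
    """Weekday morning journeys starting inside a circle of the given diameter."""
    radius_m = diameter_miles * MILE_M / 2
    return [j for j in journeys
            if is_weekday_morning(j) and distance_m(j.start, centre) <= radius_m]


def finishing_near(journeys, place=ST_ALBANS_STATION, radius_m=100):
    """Journeys whose finish point lies within radius_m of a place."""
    return [j for j in journeys if distance_m(j.finish, place) <= radius_m]
```

Both filters only look at start or finish points. Picking out journeys passing through a location would need the full track for each ride, which this sketch doesn't model.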

The ability to pick out groups of users would be useful too ('where do rail commuters go when they aren't commuting to the train?', 'where do slower commuters go compared to faster commuters?'). Age/gender breakdowns would be really useful where the data is publicly available. There are some data privacy issues here since we don't particularly want to publish maps which would hint at where people live, but there are ways to protect privacy.

As a starter for 10, here's what I'd like to play with

Journey data for all Strava bike rides starting or finishing within a 5km radius of central St Albans, with rider id, journey id, date, start and finish times. These can be either vectors or points- if points, include speed at each point. (Ideally start and finish points to have a 50m resolution so that start/finish hotspots can be identified, as there’s a link to cycle parking projects here too.)

Anonymised rider id with age and gender, including a rider id which links to journey data. If possible, aggregated stats per rider on number of rides split by time, average speeds, total number of rides within the sample area, total number of rides outside the sample area.
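To make the wish list a bit more concrete, here is a rough Python sketch of what the two records might look like, plus one simple way to get start and finish points at roughly 50m resolution by snapping them to a grid. All the field names and the snap_to_grid helper are my own invention for illustration, and the metres-per-degree figure is an approximation.

```python
from dataclasses import dataclass
from datetime import datetime
from math import cos, radians
from typing import Optional

METRES_PER_DEG_LAT = 111_320  # approximate length of one degree of latitude


@dataclass(frozen=True)
class Rider:
    """Hypothetical anonymised rider record."""
    rider_id: str            # anonymised id, links to JourneyRecord.rider_id
    age: Optional[int]       # None where not available
    gender: Optional[str]    # None where not available


@dataclass(frozen=True)
class JourneyRecord:
    """Hypothetical journey record matching the wish list above."""
    journey_id: str
    rider_id: str
    start_time: datetime
    finish_time: datetime
    start: tuple             # (lat, lon), snapped to the grid
    finish: tuple            # (lat, lon), snapped to the grid


def snap_to_grid(lat, lon, cell_m=50.0):
    """Snap a point to the nearest node of a grid with roughly cell_m spacing."""
    lat_step = cell_m / METRES_PER_DEG_LAT
    snapped_lat = round(lat / lat_step) * lat_step
    # Longitude degrees shrink with latitude, so size the step per grid row.
    lon_step = cell_m / (METRES_PER_DEG_LAT * cos(radians(snapped_lat)))
    snapped_lon = round(lon / lon_step) * lon_step
    return snapped_lat, snapped_lon
```

Counting how many journeys share each snapped start or finish point would then show up the hotspots. A coarser cell size would blur individual addresses more, at the cost of less precise hotspots.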

Here’s what I’d do:

Commute time heatmap- weekday morning and evening, showing an overall view and split out by any rider characteristics available, done for Spring, Summer, Autumn and Winter.

Utility/leisure heatmap- the remainder of the daytime weekday data split into morning, lunchtime and evening to identify any differences between routes used for commuting and routes used for utility/leisure, done for Spring, Summer, Autumn and Winter.

Weekend heatmaps using the same idea as commute and utility, done for Spring, Summer, Autumn and Winter

Site specific- look at trips to and from the railway stations, the market, Westminster Lodge and any other hotspots.

Time specific- slice out bank holidays, events in the park and city centre, etc. to see if there are any changes.

Intercept- where do people passing through a specific area go? Look at points within the city like access points to the Alban Way, St Peter’s Street, Verulamium Park bike routes and work out how they fit into overall bike movements.
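The commute, utility/leisure and weekend splits above mostly come down to putting each journey into a bucket by season, day of week and time of day. A minimal Python sketch of that bucketing is below. Only the 6am to 9.30am morning window comes from earlier in this post; the evening commute and lunchtime windows, the use of meteorological seasons and all the names are assumptions for the sake of the example.

```python
from collections import Counter
from datetime import datetime, time

MORNING_COMMUTE = (time(6, 0), time(9, 30))    # window used earlier in the post
EVENING_COMMUTE = (time(16, 0), time(19, 0))   # assumption
LUNCHTIME = (time(12, 0), time(14, 0))         # assumption

# Meteorological seasons for the northern hemisphere (an assumption).
SEASONS = {12: "Winter", 1: "Winter", 2: "Winter",
           3: "Spring", 4: "Spring", 5: "Spring",
           6: "Summer", 7: "Summer", 8: "Summer",
           9: "Autumn", 10: "Autumn", 11: "Autumn"}


def in_window(window, t):
    """True if time t falls inside the (start, end) window, inclusive."""
    return window[0] <= t <= window[1]


def bucket(started_at):
    """Return (season, heatmap layer) for a journey's start time."""
    t = started_at.time()
    if started_at.weekday() >= 5:          # Saturday or Sunday
        layer = "weekend"
    elif in_window(MORNING_COMMUTE, t):
        layer = "commute-morning"
    elif in_window(EVENING_COMMUTE, t):
        layer = "commute-evening"
    elif t < time(12, 0):
        layer = "utility-morning"
    elif in_window(LUNCHTIME, t):
        layer = "utility-lunchtime"
    else:
        # Everything after lunchtime that isn't the evening commute.
        layer = "utility-evening"
    return SEASONS[started_at.month], layer


def layer_counts(start_times):
    """Count journeys per (season, layer) bucket."""
    return Counter(bucket(s) for s in start_times)
```

Each bucket would then feed its own heatmap. The weekend layer could be split by time of day in the same way, and bank holidays or event days would need a separate list of dates to slice out.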


I started writing this post last week and time and Strava's business plans have caught up with me!

Strava today (8th May) said that they have made their data available on a consultancy basis to London and Glasgow as well as some other cities. This is great but it's done at commercial rates- Oregon is paying $20,000 for a year's access to data on 17,000 Strava users in Portland. There's a good report by Bike Portland on the project.

Rates vary by the number of Strava users captured so the cost for St Albans/Hertfordshire should be fairly low- I have asked Strava for a sample and I'll ask HCC if they are interested too.

1 comment:

Mike1727 said...

Herts County Council are looking at Strava data, hopefully they will be able to secure a budget.