Riding through Mountains of Data: Visualizations of Cycling · 2010-05-12 · Riding through...

Riding through Mountains of Data:

Visualizations of Cycling

Robert CarlsenInteractive Telecommunications Program

Tisch School of the ArtsNew York University

[email protected]

May 11, 2010

Abstract

This project attempts to describe the cycling expe-riences of several riders in New York City through aseries of visualizations. Specifically, I am interestedto discover if riders similar to myself share a com-mon experience through which a sense of connectioncould be derived.

Cyclists were encouraged to record their travelsusing their personal mobile devices running MobileLogger, a custom iPhone application. Log data wasuploaded by the application to an online database innear real-time during each ride. This data was ana-lyzed and filtered to provide source material for theresulting visualizations and system “dashboard” athttp://mobilelogger.robertcarlsen.net.

Keywords

Cycling, New York City, sensors, iPhone, visualiza-tion, mapping, tracking, logging, mobile, applica-tion, bicycle

Introduction

This is an exploration in creating a sense of connec-tion among cyclists through self-quantification and

visualization. The project has two principle compo-nents: data collection using personal mobile devicesand visualization.

I ride a bicycle daily through NYC, and encountermany other cyclists, walkers and drivers. We passeach other in a moment, or perhaps share a lanefor a bit and then continue on our separate ways.Can we create a sense of connection by sharing ourexperiences? How does my 5 mile, 25 minute ridefrom Greenpoint to the East Village compare tosomeone riding from Queens? What does a ridearound Prospect Park share with one in CentralPark? What’s the loudest part of the city for acyclist? Which routes are ridden most frequentlyand at what time of day?

What do these experiences look like? How couldthey be recorded? What could we learn about our-selves and our world if there was a ubiquitous net-work of sensors collecting data about the environ-ment as we experience it? Would analysis and visu-alization reveal trends and patterns in the aggregatebehavior of participants in the network?

Personal Statement

I’m an avid cyclist. I ride as a commuter, enthusiastand occasional racer.

I can get just about anywhere in the city, when-

1

ever I want and under my own power. It also pro-vides fitness and for many people, employment. It’sfaster than walking and more maneuverable thandriving. In dense city congestion it can be fasterthan mass transit. It’s a cheap way to get around.But above all. . . it’s just FUN!

Cycling in the city—and what I really mean bythat is riding among several-ton moving vehicles,in all sorts of weather, often on roads not designedto accommodate bikes—is by some estimation, in-sane. I could describe close calls, spin tales of get-ting from Midtown to the East Village in less than10 minutes at 6pm on a weekday or talk about anepic ride past Nyack where I bonked on the returntrip—but there’s something really interesting aboutquantifying our experiences; somehow they becomemore tangible.

I first felt the excitement of self-quantificationwhen using a heart rate monitor for training, andlater when using a PowerTap cyclecomputer whichcould download ride logs to my computer. Hav-ing numbers for heart rate, speed and power pro-vided nearly endless bragging rights (and sometimesshame) among my cycling teammates. It was a wayfor us to connect our individual experiences in amanner that we could understand and compare.

I’m certainly not the only cyclist on the roads. Isee scores of other commuters as well as couriers,delivery riders and pedicabs—from folks leisurelyriding around the Park to kitted racers in a paceline. However, apart from a friendly head nod, oroccasional exchange of choice words I generally feelisolated as I head from here to there; in the mar-gins of the roads (or sometimes splitting lanes), overthe bridges with their scenic views of the city sky-line, in the frenetic bustle of 5th Ave. Midtown atrush hour and the desolation and darkness of post-midnight industrial Greenpoint and Queens.

I wonder, what do these other people experiencewhile riding through the city? I have a clear under-standing of what it feels like to me to be a cyclist inNew York City. Is that experience a common oneamong other riders? How could I foster a sense ofconnection by relating through this shared experi-ence?

There are several ways of describing an experi-

ence. We can say that we’ve ridden some numberof miles in a day, or show the raw data numbers. Vi-sualizations are a manifestation of real events withwhich I’m hoping to make an emotional connectionbetween the riders and the viewer.

Background

This project really began in my first courses at ITP,resulting in a project which I called CycleSense andthe subsequent visualizations created from it’s data.CycleSense attempted to provide me with a di-rect perception of the unseen space behind my bikewhile riding through traffic, with a goal of increas-ing spatial awareness through haptic feedback. Aset of sonar range finders measured bilateral, rear-ward distance between my bike and traffic, drivinghelmet-mounted vibration motors with increasingintensity as the open space diminished.

Figure 1: CycleSense: Proxemic visualization

However, the range needed to positively informaction at the speed of cycling in traffic is bettersuited for visual and aural sensing, rather than theultrasonic sensors which had a maximum range lim-ited to about 25 feet. While the real-time feedbackgoal was not entirely achieved, visualizations of thedata provided me with insight into how space is

2

maintained by traffic and myself as a vehicle in thestreets (it’s alarmingly tight at times).

The visualization mapped the range values ontoEdward T. Hall’s Proxemic[2] distance categoriesof Intimate, Personal, Social and Public to providecontext (fig 1). Ten inches of space between myselfand a multi-ton vehicle at 20 MPH can be describedas intimate proximity in this system. Imagine beingwithin an intimate space with someone, then imag-ine them as a large truck barreling past you andyou can get the idea.

I continued to explore this topic withseismi{c}ycling, which mapped bumpiness Iexperienced while cycling through New York Cityover several months. Where CycleSense usedcustom hardware mounted to the bicycle, I wasinterested in using a personal mobile device, aniPhone which I already carried, to act as a sensorin this project. By using my mobile phone withcustom software, I was able to record more than 90rides over the duration of the project. It becamevery easy to record consistently when I already hadthe sensor and data logger in my pocket. Usingthe network connection provided by the phone,I was able to push the readings to a database inreal-time, streamlining data collection even further.

The visualization created for seism{c}ycling wasan animated map, tracing each ride in turn, depict-ing the biggest bumps as glowing red circles. Theentire visualization would scatter apart with eachnew bump, providing a sense of being physicallyrattled by rough roads. Eventually these spots ac-cumulated to paint a portrait of my experience ofthe roads in NYC as a cyclist. While the end re-sult was very satisfying for me, it was still centeredon my singular experience. I wondered, what aboutother riders? Do we share a common experience? Ifso, in what ways? How are our experiences unique?I knew that the project needed to be expanded toinclude other people.

Work Description

Overall, the work involved with this project fol-lowed the seven steps for data visualization as out-

lined by Ben Fry in his book “Visualizing Data”[1]:

� Acquire Obtain the data, whether from a fileon a disk or a source over a network.

� Parse Provide some structure for the data’smeaning, and order it into categories.

� Filter Remove all but the data of interest.

� Mine Apply methods from statistics or datamining as a way to discern patterns or placethe data in mathematical context.

� Represent Choose a basic visual model, such asa bar graph, list, or tree.

� Refine Improve the basic representation tomake it clearer and more visually engaging.

� Interact Add methods for manipulating thedata or controlling what features are visible.

A significant challenge in this project was imple-menting data acquisition. Existing studies of cy-cling has largely relied on cyclist survey[5], case-study interview[3], or logging cycling at a series offixed locations. New York City Department of CityPlanning publishes an annual cycling map1, depict-ing existing and planned bicycle route and green-ways (fig 2). Their methodology in generating themap is described on the department’s website:

The existing and planned routes are theresult of extensive fieldwork involvinganalysis of traffic conditions and assess-ment of the connectivity, accessibility, andsafety of the network, as well as meetingsthat sought community input.

However, I am interested in studying and com-paring my personal experience with that of othercyclists, something which is only indirectly repre-sented by survey and only inferred by fixed, exter-nal observation.

The data collection portion of this project uti-lizes a mobile logging application to record eachrider’s experience. The application uploads data toa server for storage and later analysis. Each rider

3

Figure 2: Lower Manhattan bicycle routes. Takenfrom the 2010 NYC Cycling Map.

may view their own data within the application andcan see an aggregate view of all the data on the Mo-bile Logger website2.

I was inspired by the Personal Environmental Im-pact Report (PEIR) created by the Urban Sensinggroup at UCLA, as described by Nathan Yau, a de-veloper of PEIR, in the book “Beautiful Data”[4].That project also uses a mobile application, but logsposition at a far longer interval. It determines themode of transportation and estimates a user’s “im-pact on and exposure to the environment.”

Post collection, I analyzed the data looking for re-lationships and trends among riders and locations.This analysis is critical for the eventual visualiza-tions. I have an initial set of questions which I’mlooking to answer: Do several riders experience sim-ilar conditions at the same place and time? Wheredo riders go? Where do they originate? Where do

they congregate? More personally, how do otherriders’ experiences relate to my own?

Initial analysis was implemented using GoogleMaps to plot each ride. It was a useful tool toidentify the location of new events and to verifythe integrity of recorded data. Through this toolseveral issues with location accuracy and networkconnectivity were identified.

Further exploration of the ride data was done us-ing R, the language and environment for statisticalcomputing and graphics3. Several aspects of ridedata were investigated, starting with simple plotsof event routes then later graphs of the frequencyof ride times and locations. R was also used to filterand clean the raw data.

Several graphs and plots generated in R (usingthe excellent ggplot2 library4) were further manipu-lated in graphics software to create initial, static vi-sualizations, including a world map of Mobile Log-ger users. Graphs of ride distribution by time ofday, day of week and location also helped to cate-gorize data.

Ultimately, an animated visualization of ride datain New York was created using Cinder, an open-source C++ framework5. This visualization de-picts events in a 3D environment primarily by loca-tion and time, showing similar inferred experiencesswirling about each other.

Before any of this, however, data needed to becollected and tools to achieve this needed to be au-dited and created.

Ubiquitous Sensing

Sensors are increasingly being embedded in every-day devices and infrastructure. Many modern lap-top computers include a “sudden motion sensor”designed to save the hard drive in the case of afall. Buildings can include climate and movementsensors—which can be used to coordinate dampersto help stabilize the building in high winds or anearthquake.6 Small wearables like the FitBit7 orDirectLife8 track activities like walking and run-ning as well as restlessness during sleep. Portabledevices, like the iPhone include a small array of sen-

4

sors, which can record movement and orientation,sound levels, location, photos and video. Fleets ofcars and trucks include GPS devices not only fornavigation but for tracking the vehicle’s locationand speed.

However, the real impact of these sensors be-comes realized when imagining them networked.Adam Greenfield describes powerful scenarios re-sulting from “ubiquitous urban sensor networks.”He envisions advancement in sensor technology pro-viding for a pervasiveness in which urban infrastruc-ture, such as road surfaces, sidewalks and buildingsare addressable, queryable and therefore scriptable,providing a programmatic interface to the moderncity.9

What if each computer with a sudden motionsensor could report anomalies in real-time? Couldthis serve as an earthquake early warning system,as Quake Catchers is trying to implement?10 UsingGPS-enabled mobile devices, services like Waze pro-vide navigation directions in exchange for users pas-sively providing traffic conditions, which then in-forms the service’s direction algorithm.11 SenseNet-works12 has released the CabSense13 application us-ing the NYC Taxi GPS database and the GPS fea-ture of iPhone and Android devices to point usersto the best places to catch a cab.

Going mobile

Two broad categories of sensors are fixed and mo-bile. Fixed sensors have the benefit of being ableto directly connect to existing power and commu-nications infrastructure. They can be installed atan appropriate density and dispersion for the en-vironment and what they are designed to monitor.For example, air quality sensors in a sparsely pop-ulated area may have different requirements thansound sensors in a city. They can remain loggingwhen no one is present. They can be installed ininaccessible or uninhabitable locations which never-theless would be useful to monitor—subway tracks,pipelines, transit vehicles, etc.

However, when trying to describe how peoplemove through an environment, the best they can

Figure 3: MIT’s Copenhagen Wheel

do is to infer these experiences from afar. Mo-bile sensors, specifically those embedded in mobiledevices, prioritize sensing a personal experience.Of course, even these sensors are limited to theirspecific capacity—such as movement, location, ororientation—but have the great advantage of beingrepresentative of an individual in that environment.

Mobile data collection presents significant diffi-culties. Aside from persistent challenges of securingreliable power and communication, these sensorshave to be deployed to people willing to carry them.In the case of the FitBit a user needs to acquire anduse a new piece of hardware. The same challengefaces Nike+ devices and initiatives like CopenhagenWheel, a project by the MIT Senseable City Lab14

(fig 3), a veritable environmental lab built into thehub of a rear bicycle wheel.

Using personal mobile devices as sensors in thisubiquitous network, the aforementioned challengesnotwithstanding, provides an advantage that theyare already carried by owners and are already useddaily. The threshold for encouraging greater par-ticipation drops sharply when you already have thenecessary tool in your pocket. This assumes that ca-pable devices are widespread—GPS-enabled mobiledevices are becoming prevalent enough to use themfor large-scale personal data collection. Nielsenreports that (sensor-laden) smartphones make up27% of the wireless market in the US, and are ex-

5

pected to reach 50% by the end of 2011.15

Mobile application

This becomes a trade-off between depth andbreadth. Adhoc, mobile sensor widgets can be de-signed to record rich data beyond what is avail-able in existing mobile devices—like air quality,heart rate, stress and breathing—but each devicestarts with zero units in the world. Apple intro-duced over 20 million GPS-enabled iPhone devicesin 200916—quite a head start for my project. Ifuseful data about individual experiences and rela-tionships could be told using the relatively limitedsensors available on existing mobile devices, whatwould that look like?

Figure 4: Mobile Logger on my bicycle

To that end, I released an iPhone application,named Mobile Logger17, to facilitate this user-centric data collection. It’s available as a free down-load on the Apple App Store and as open-sourcecode on GitHub18 under the GNU Public License.The application was written in JavaScript using theopen-source Titanium Appcelerator framework19,which is designed for cross-platform distribution to

iPhone and Android platforms, as well as upcom-ing support for BlackBerry. This choice was madeto allow for flexibility in the future porting of theapplication to a wider selection of devices.

Mobile Logger acts as a dashboard for some-one interested in recording their movement. Thehome screen presents a compass-like heading dis-play, along with current speed, sound levels, acceler-ation force, trip distance and duration. Each log isstored on the device and can be inspected to revealits route on a map. The log file can be exported viae-mail in several common formats. While logging,the data is optionally pushed to an online databaseat frequent intervals.

To date, 164 unique devices have uploaded logdata globally representing 239 events longer than1 minute, with approximately 10 active riders inNYC. Most of these new devices appeared afterpublic release of the application in the Apple AppStore. In addition to the ambient logging, I pro-moted a call for participation to coincide with EarthDay on Thursday, April 22nd with the intention ofbenefiting from a tendency for people to ride a bi-cycle on that day.

The application has been used in every conti-nent, save Antarctica. To illustrate the globalreach of deploying the application on an alreadyestablished platform, the Mobile Logger Dashboardat http://mobilelogger.robertcarlsen.net displays amap of every location where the application hasbeen used for longer than one minute, as well astallies of unique devices, events, duration and dis-tance.

Collection

To facilitate collection of log data from partici-pants I setup an online database and rudimentaryapplication programming interface (API) for themobile devices to utilize. The database used isApache CouchDB, a document-oriented, schema-less database which can easily be interfaced via a“RESTful JSON API that can be accessed fromany environment that allows HTTP requests”20.CouchDB was chosen for several reasons, among

6

Figure 5: Global users of Mobile Logger

them:

� Most mobile devices that can access the inter-net support HTTP requests, eliminating theneed for specific software libraries to commu-nicate with the database.

� Without a strict schema, devices with varioussensor capabilities can be easily included in thedatabase, simply adding new fields as neces-sary.

� CouchDB includes facilities for easy replica-tion, or cloning, of the database. This can beused to archive the live database, to make acopy which can then be queried for explorationwithout affecting real-time upload from mobiledevices, for scalability as demand may increaserequiring more server capacity and eventuallyfor sharing the data with other interested re-searchers.

� Queries can be constructed in JavaScript usingthe MapReduce paradigm to provide powerfulways to aggregate the collected data.

� CouchDB can scale to hundreds of millions ofdocuments. Each sample in a log file is treatedas a separate document and a typical 24 minutelog file may contain 1600 samples.

To provide a simplified interface for mobile de-vices and to provide a layer of abstraction betweenthe database itself and open internet a middle-ware layer was created using Sinatra, a lightweight,Ruby-based microframework21. Sinatra made iteasy to expose only the needed methods for addingand retrieving log data from the database. Thisprovides security by limiting the database accessonly to the API methods via the middleware, en-ables flexibility in changing the location or name ofthe actual database and allows for future scalabilitythrough sharding, or splitting the database amongseveral servers to distribute the load.

The middleware layer also provides access to thedata for the Mobile Logger Dashboard site, and maybe used in a future version of the Mobile Loggerapplication for real-time feedback about the overallstate of the system. Eventually, this API may en-able third-parties to access log data for their ownanalysis and visualization.

Dirty Data

The raw data collected from participants can bevery messy. GPS often misreports location amongstinterfering objects such as tall buildings in cities.Also, this project is interested in the travels of cy-clists in New York City, but there is no physicallimitation preventing users from logging any typeof movement (or no movement at all) anywhere inthe world. The data needed to be filtered, cleanedand sorted before meaningful analysis could begin.Filtering specific locations was achieved by geohash-ing the location data. Geohashing creates a uniquestring of letters and numbers for latitude and lon-gitude pairs, which represents a decreasing rectan-gular area by increasing hash length.22 A hash of 6characters such as “dr5rsn” represents a 600 me-ter wide area of Lower Manhattan (fig 6). Thedatabase can be queried for specific hashes to getdata associated with particular locations.23

Filtering events by mode of travel is another chal-lenge. A simple solution is to factor the eventsby average speed, assuming that bicycles typicallytravel faster than walking, but slower than driv-

7

Figure 6: NYU geohash. Displayed with DavidTroy’s geohash demonstrator.

ing. However, I can often get around the city fasterthan a car, depending on congestion and time ofday. Maximum speed can provide more insight, al-though a car may go slower in an intra-city trip thanI ride descending one of the bridges. Investigatingthe route may also help, assuming that a bicycle isnot likely to travel on highways, and vehicles not onbikeways. Unfortunately, in both regards, this is afalse assumption in the city. Ultimately, some naivefiltering with manual analysis was necessary to iso-late cycling events from the rest of the data. Thegreater challenge is to identify trends and categoriesin the remaining ride data.

Bias

In considering the collected data it becomes appar-ent that there are specific biases which should beaddressed in order to make an honest assessmentabout anything derived from it. The project con-siders cycling as the primary activity, and as suchother modes of transportation have been eliminatedfrom consideration in the data. While using a mo-bile device did tap into a large pool of existing users,the iPhone is still a limiter of possible participants.

Rather than targeting all cyclists in New York, thisrequires participants to be a cyclist (in New YorkCity) and be an iPhone owner. While a targetedstudy may identify what portions of these popu-lations actually overlap, anecdotally it seems thatthere is certainly a demographic bias.

From a technological standpoint, Mobile Log-ger places significant drain on the device’s battery.Users have reported that the maximum logging du-ration on a full charge has been about 2 hours. Inmy personal use, a typical 25 minute commute con-sumes 15-20% of the battery charge (on an iPhone3GS). I could get through a day with two loggingsessions and light use (e-mail, web, infrequent voicecalling) on a single charge. However, I often wouldrecharge the phone over the course of the day.

These considerations seem to have made the ap-plication most suitable for logging trips less than30 minutes in length, where the device could beused solely for data collection and have limited useotherwise or be recharged during the day. The ap-plication and use case have been designed for riderslike myself. Given this project’s focus in wanting tosee what riders like myself are experiencing I don’tfeel that this is a deleterious bias, but somethingcertainly to be aware of when considering the data.

Finally, I need to mention that data from my de-vice is by far the most prevalent in the Manhattanarea, representing 55% of the recorded samples.

Analysis

Described earlier, geohashing provides a mechanismto isolate specific locations in the database. Theadjacent hashes of “dr5r” and “dr72” encompassmost of Manhattan and Brooklyn, were ride datais concentrated. Querying the database for theselocations returns approximately 160,000 samples,roughly 3,000 minutes of log data (fig 7). A typ-ical event contains 1,600 samples at 1Hz resolution.

In Manhattan, the raw data indicates 150 eventswere recorded by 17 devices. Simple filtering ofevents where the maximum speed was less than60km/h (generally faster than a bicycle travels onflat ground) reveals that my device was the only one

8

Figure 7: New York City ride data. Colored bydevice.

used in the Manhattan data set to record a trip bycar. Further filtering of devices which recorded lessthan two minutes of data leaves 10 devices. I be-

lieve that three of these device IDs represent testingor simulator devices.

Rides most often occurred at typical morning andevening commute times, with greatest density ofsamples in the 8–9AM and the 6–7PM hours (fig8). There is also a significant dispersion of my ridefiles throughout the evening, reflecting riding homefrom ITP at late hours.

Figure 8: Log samples by hour. Colored by device.

Ride activity trends upwards on a daily basis witha peak on Fridays (fig 9). Low activity on weekendsseems to confirm that the data reflects commutingrides. The pattern is interesting, but it’s difficult toextrapolate whether this is indicative of anythingother than the weekly schedules of this small groupof riders.

Three events have unusually long durations (morethan 10 hours), appearing to have been accidentallyresumed for a few moments long after the intendedlogging period ended. Filtering these events as wellas those with durations of less than 1 minute leaves123 events with an average duration of 24 minutes.One event has several breaks of 20-30 minutes, withan overall duration of 160 minutes. I spoke with therider about this and he mentioned that he consid-ered a trip, including stopping at various locationsfor errands, as one event. I had previously consid-

9

Figure 9: Log samples by day of week. Colored bydevice.

ered each segment of a trip as separate events, sohis was an illuminating perspective which needs tobe considered.

Ride data was concentrated in Lower Manhat-tan and Brooklyn, again likely due to the statedsampling bias (fig 10). The densest location is theBrooklyn entrance to the Willamsburg Bridge. Thebridge path and routes to ITP’s East Village loca-tion are the most frequently represented, as well asa route through Queens to the 59th Street Bridge,likely from my own commute to Midtown.

Regarding GPS performance, the mean accuracyfor this data is 60 meters, with a best reported ac-curacy of 9 meters. Mean altitude is 18 meters(with 13 and 23 meters at lower and upper 25thpercentiles), however the mean altitude accuracy is62 meters, so I suspect that the altitude data is notentirely reliable (although these values do seem con-sistent with external altitude data available via theNYC Data Mine site24).

An assumption I’ve made is that Mobile Loggerwould be used for ground-based travel. However,New York has many tall buildings and bridges. In-deed, there are several altitude readings in the 40meter range, the approximate height of the Manhat-

Figure 10: Ride density

tan, Willamsburg and Queensboro bridges which Ifrequent. There are also small groupings of sam-ples with recorded altitudes at 200 and 500 meterswhich have an accuracy of 228 meters and have beenfiltered.

The average moving speed of rides, filtering ze-roes, for the entire data set was 18.5 km/h with astandard deviation of 7.6 km/h (fig 12). My av-erage speed was fastest (of this small set) at 20.6km/h, but I have company with other riders at20.1 km/h and 18.5 km/h. There are two deviceswith markedly slow average speeds. It’s difficult totell if these devices were used for walking, very ca-

10

Figure 11: Manhattan altitude readings, filtered.

sual riding, or suffered from poor GPS reception.The speed accuracy of the device seems to improveat greater velocities, such as driving and cyclingquickly. Walking, even briskly, often reports zerospeed.

Figure 12: Average speed by device

Average distance for the rides was 8km with astandard deviation of almost 5km, including severalrides over 20km and one at 32km. My typical ridelength was 8.7km and it appears like there is an-other rider who rides a similar distance frequently

(fig 13). A by-product of GPS inaccuracy is incor-rect distance estimates when plotting between rawcoordinates. If several location readings are off by alarge amount, distance can quickly, and erroneouslyaccumulate. One promising technique for filteringraw GPS data is outlined by Julien Cayzac, titled“The Cumulative Displacement Filter.”25 I’m ea-ger to apply this technique to the ride data in futuredevelopment.

Figure 13: Event distances. Colored by device.

Audio levels were recorded using a digital scale(-60 to 0 dbFS) and it is difficult to convert thesevalues into a generally comparable unit (dbSPL)without proper microphone calibration. Neverthe-less, relative assessment can be made using thesevalues. The mean sound level recorded was -23dbFS, with a standard deviation of 8.2 dbFS. How-ever, little is known about circumstances of the de-vice while recording, specifically it’s location on theparticipant. It could have been in a variety of placeswhich may affect detected sound levels, such as ina pocket, backpack or attached to the bicycle itself.

The accelerometer values can indicate two as-

11

pects of the device’s experience: orientation andmovement. Since the application is recording at aone second resolution, subtle device movement islost in subsequent downsampling. However, over-all device orientation can be inferred by analyzinga distribution of the available values (fig 14). Peakdensities for X,Y and Z-axis accelerometer forcesindicate that the (iPhone) device was primarily up-right, tilted right and somewhat face down. I keepthe device in the same location and position for eachlog, in a pocket on the shoulder strap of my back-pack, which keeps it upright and angled to it’s right.I always put the device in the pocket facing forward,but when I’m in a riding position my chest is point-ing downward as I reach for the handlebars. Sincemy logs account for 55% of the data, it’s unsurpris-ing that my consistent position is apparent.

Figure 14: Overall accelerometer forces

The magnitude is a vector created from all threeaxes, and should have a value of 1.0 when the deviceis stationary, regardless of orientation. The distri-bution of magnitude values should indicate whatdegree of movement has been experienced. There isa noticeable spike at 3g, which is the upper specifi-cation of the sensor on the device and may indicatesignificant jostling.

Weather

During 34-day window in which this data was col-lected (March 29–May 2), the daily average tem-perature ranged from 46–77°F with a mean of 59°F,

minimum of 40°F and maximum of 92°F.26 It seemsas though most of the riding occurred in mild tem-peratures. Rain was recorded for 10 days of thisperiod, accumulating 7 inches of water.

Figure 15: Sample count by daily average temper-ature.

Wind speeds over this time averaged 10 km/h,including at least three days with appreciable gusts.This wind may have had an effect on ride speed,although it’s difficult to determine solely throughthe data. Personally, I recall several days where mytrips were buffeted by wind on the bridges.

Figure 16: Rider speed colored by wind speed

12

Visualization

While statistical analysis and numerical representa-tion are indispensable in identifying categories andtrends while mining the data set, eliciting an emo-tional response from a table is a difficult proposi-tion for me. In cycling training software numericsummary tables, graphs and plots are often used todisplay the logged ride data. The graphs honestlydepict the data, but how do these visuals really con-vey the recorded experience? Various plots whichhave become common in cycling training analysiscertainly do provide some insight into the numbers,at the expense of having to learn to decipher them.

Figure 17: Golden Cheetah

Golden Cheetah27, an open-source cycling anal-ysis application, implements several of these typ-ical displays. This application does indeed repre-sent ride data in tabular format, identifying aver-age, maximal values and aggregate values (fig 17).It also displays the data in a time-based graph, over-laying several metrics along a vertical axis. Oneparticular plot, popularized by Dr. Andy Coggan,is the Pedal Force / Pedal Velocity scatterplot28

(fig 18). This plot is highly revealing of the typesof physiological exertion demanded of a rider duringan effort. However, to a wider audience it is highlyopaque. How could these visualizations be done in

a way that is immediately apparent and meaningfulto an uninitiated viewer?

My goal is not to single Golden Cheetah out, butto use it as representative of cycling analysis soft-ware (full disclosure: I have been a developer withthe project for several years).

Figure 18: Pedal Force / Pedal Velocity plot for anindividual time trial effort

There are great examples of visualizations whichcommunicate their data concisely and are visuallystriking enough to resonate emotionally with view-ers. Aaron Koblin’s Flight Patterns (fig 19) showthe daily patterns of air traffic across the UnitedStates.29 Cabspotting (fig 20) by Stamen Designalso tracks taxis, displaying their routes over time inSan Francisco.30 My own project, seismi{c}ycling(fig 21) painted a portrait of road roughness I ex-perienced over several months in the Fall of 2009.31

What does it look like?

The visualizations in this project consist of threebroad components: the in-application map, thewebsite “dashboard” and the animated off-line vi-sualization.

13

Figure 19: Flight Patterns by Aaron Koblin

Figure 20: Cabspotting by Stamen Design

Within Mobile Logger, each log stored on the de-vice can be displayed on a map, with a subset ofthe data points plotted to depict that log’s route(fig 22). Each data point can be tapped to displaythe speed and sound levels at that moment. Thiswas implemented to provide value to the end user,beyond and apart from the centralized data collec-tion component I am most interested in. I wantedthe application to be able to stand alone, without

Figure 21: Seismi{c}ycling

the online database if necessary, or if the user pre-ferred to opt out of sharing their data.

On the Mobile Logger Dashboard website there isa map displaying the location and duration of globaluse of the application and an animated marker in-dicating the location of the most recently loggedevent. This has created an immediate and profoundeffect in generating a sense of connection for me.I find myself frequently checking the map to lookfor new locations representing other people loggingaround the world. While this is a departure fromthe intended study of New York City cyclists, theexcitement of acknowledging global use warrantsmention.

An animated visualization was created to displaythe ride data in a fluid, aesthetic way. This visu-alization plots rides in various ways in an attemptto show now only the characteristics of each ride,but also of how they relate to one another. It wasdesigned to be standalone and installed in a semi-public location.

14

Figure 22: Mobile Logger map view

Personal data

I’ve considered the issue of personal data privacyheavily. Each of these logs contains one-second res-olution of the location of individual participants.Although the application doesn’t record personallyidentifiable contact information, each log (unlessdisabled) contains a unique identifier for the de-vice. Over time a pattern of movements could be-come unique enough to isolate an individual. In anera when personal information is increasingly vol-unteered and/or unknowingly collected, is locationprivacy a specific concern? Is the solution to bal-ance the power between the participant and usersof that data through transparency? Rather, are thestated principles32 of Sense Networks sufficient?

� People should own their own data

� People have a right to privacy

� People should receive a meaningful benefit inexchange for sharing data

� Aggregate anonymous location data should beused for common good

In consideration of these points, I’ve made delib-erate decisions regarding the data collected by thisproject. I’ve opted not to request personal contactinformation from any participant, and have addi-tionally provided a means for participation with ananonymous device identifier. The data uploadingfeature of Mobile Logger may be disabled at anytime, without any detriment to the usability of theapplication for the end user. By providing theirdata to the Mobile Logger system, they can seetheir place among other users on the Mobile LoggerDashboard and associated visualizations. I’m alsoconsidering other ways for participants to more di-rectly benefit from sharing, perhaps with personal-ized data display on the Dashboard site, or real-timestatus of the system within the device application.

If the system were to fully embrace the SenseNetworks principles, a method would need to beimplemented for any user to remove their histori-cal data from the database as desired. This is myprimary hesitation in opening the data set to thirdparties, although I’m interested in eventually mak-ing the data available to facilitate the final principleof working for the common good.

Where do we go now?

The stated goal was to foster a sense of connectionamong strangers through sharing common experi-ences, using these visualizations as a way to makeabstract numbers meaningful to uninitiated view-ers.

While there have only been a few riders in NewYork logging themselves, I have certainly had a pal-pable shift in my perception of riding through thecity. Perhaps it’s because I’ve been looking overmy own ride data lately, but I often imagine whatother riders’ log files would say and how ours would

15

compare. One Mobile Logger user who also lives inmy neighborhood saw screenshots of my log routesand contacted me, mentioning that he lives nearbyand offered to meet up.

I have progressively altered my routes in themonths I have been tracking my rides. Generally,I strive for the most direct route, and the mappingfeature of the mobile app enables me to immedi-ately see if I can make improvements. However, Ioften prefer less-trafficked or quieter roads to busierones, and will look for a bike lane if it’s convenient.

When I cross paths with another rider, I imag-ine a ribbon tracing our paths and extending intopossible future routes. I also like to consider wherethey’ve been and think about how my ride could beinfluenced by their route. I have also begun to strikeup conversation with other riders, often describingMobile Logger and encouraging their participationwhich is generally well-received. In the end, how-ever, there are still many riders and vehicles aroundand I still get cut off, have close calls and epic rides(and make some ill-advised maneuvers of my own).

The preliminary global data has had a profoundeffect on me. I noticed one day that while I wascommuting home after work in Manhattan therewas a training ride in Philadelphia, an evening drivein Bristol, UK and an early morning walk to thebeach in southern Australia. This provides me asense of connection to these far away strangers.Even in this abstract representation, I feel less aloneknowing that other people are also moving about.Of course, technology isn’t needed to prove this, butsometimes a construct helps to shift perspective.

I’m pursuing a greater density of data throughencouragement of wider rider participation. Pub-lic, aggregate, real-time visualizations will help pro-mote persistent participation if individual contri-butions can be recognized by the user. I’d like tosolicit more feedback from participants about theirexperiences. Perhaps providing a mechanism to tagor annotate ride data could assist with categoriza-tion.

I’m also very interested in extending the mobileapplication to record external sensors. I would verymuch like to log physiological data such as heartrate, breathing rate / depth and stress levels as well

as environmental data. Perhaps this can be accom-plished through various levels of commitment on thepart of the rider to incorporated several devices forself-quantification. Ideally, the mobile device willact as a hub for storage and transmission.

I’ve published the project’s process in the spirit ofopen source. This includes releasing the logger ap-plication source code, documenting collection meth-ods and describing the visualization process. Even-tually, I aim to publish the collected data for furtheranalysis. Hopefully, this will enable other people toextend and augment the work in ways I haven’t en-visioned.

16

Notes1http://www.nyc.gov/html/dcp/html/bike/cwbm.shtml2http://mobilelogger.robertcarlsen.net3http://www.r-project.org/4http://had.co.nz/ggplot2/5http://libcinder.org/6http://news.wustl.edu/news/Pages/8961.aspx7http://www.fitbit.com/8http://www.directlife.philips.com/9http://speedbird.wordpress.com/2009/10/11/toward-

urban-systems-design/10http://qcn.stanford.edu/11http://www.waze.com/homepage/12http://www.sensenetworks.com/macrosense.php13http://cabsense.com/14 senseable.mit.edu/copenhagenwheel/

Photos by Max Tomasinelli www.maxtomasinelli.com15http://blog.nielsen.com/nielsenwire/consumer/smartphones-

to-overtake-feature-phones-in-u-s-by-2011/16http://www.apple.com/investor/17http://robertcarlsen.net/dev/mobile-logger18http://github.com/rcarlsen/Mobile-Logger19http://www.appcelerator.com/20http://couchdb.apache.org/21http://www.sinatrarb.com/22http://en.wikipedia.org/wiki/Geohash23http://openlocation.org/geohash/geohash-js/24http://www.nyc.gov/html/datamine25http://julien.cayzac.name/code/gps/26http://www.wunderground.com/history/27http://goldencheetah.org/28http://home.trainingpeaks.com/articles/cycling/quadrant-

analysis.aspx29http://www.aaronkoblin.com/work/flightpatterns/30http://cabspotting.org/31http://robertcarlsen.net/projects/seismicycling32http://www.sensenetworks.com/principles.php

References

[1] Ben Fry. Visualizing Data: Exploring and Ex-plaining Data with the Processing Environment.O’Reilly Media, 2008.

[2] E.T. Hall. A system for the notation ofproxemic behavior. American anthropologist,65(5):10031026, 1963.

[3] Matthew Ides. Hub and Spokes: Imageability ofthe Daily Bicycle Commute in New York City,2008.

[4] Toby Segaran and Jeff Hammerbacher. Beauti-ful Data: The Stories Behind Elegant Data So-lutions. O’Reilly Media, 2009.

[5] Ph.D. William E. Moritz. A SURVEYOF NORTH AMERICAN BICYCLE COM-MUTERS, 1997.

17

Riding through Mountains of Data: Visualizations of Cycling · 2010-05-12 · Riding through...

Documents

Transcript of Riding through Mountains of Data: Visualizations of Cycling · 2010-05-12 · Riding through...