An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and...

18
An Empirical Study of Combining Participatory and Physical Sensing to Better Understand and Improve Urban Mobility Networks Xiao-Feng Xie The Robotics Institute Carnegie Mellon University Pittsburgh, PA 15213 WIOMAX LLC PO Box 19184 Pittsburgh, PA, 15213 [email protected] Zun-Jing Wang Department of Physics Carnegie Mellon University Pittsburgh, PA 15213 WIOMAX LLC PO Box 19184 Pittsburgh, PA, 15213 [email protected] 4997 words + 10 figures + 0 tables November 14, 2014

Transcript of An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and...

Page 1: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

An Empirical Study of Combining Participatory andPhysical Sensing to Better Understand and ImproveUrban Mobility Networks

Xiao-Feng Xie

The Robotics InstituteCarnegie Mellon UniversityPittsburgh, PA 15213

WIOMAX LLCPO Box 19184Pittsburgh, PA, [email protected]

Zun-Jing Wang

Department of PhysicsCarnegie Mellon UniversityPittsburgh, PA 15213

WIOMAX LLCPO Box 19184Pittsburgh, PA, [email protected]

4997 words + 10 figures + 0 tables

November 14, 2014

Page 2: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

ABSTRACTThe rapid rise of location-based services provides us an opportunity to achieve the information ofhuman mobility, in the form of participatory sensing, whereusers can share their digital footprints(i.e., checkins) at different geo-locations (i.e., venues) with timestamps. These checkins provide abroad citywide coverage, but the instant number of checkinsin urban areas is still limited. Smarttraffic control systems can provide abundant traffic flow databy physical sensing, but each con-trolled region only covers a small area, and there is no user information in the data. Here wepresent a study combining participatory and physical sensing data, based on 3.4 million checkinscollected in the Pittsburgh metropolitan area, and 125 million vehicle records collected in a sub-area controlled by an adaptive urban traffic control system.Our aim is to disclose how we couldutilize the combined data for a better understanding on urban mobility networks and activity pat-terns in urban environments, and how we may take advantage ofsuch combined data to improveurban mobility applications such as anomaly traffic detection and reasoning, topic-based nontrivialtraffic information extraction, and traffic demand analysis.

Page 3: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

Xie and Wang 2

INTRODUCTIONUnderstanding human mobility (1, 2, 3, 4, 5) and activity patterns in urban environments is asignificant and fundamental issue from various perspectives, e.g., understanding regional socio-economics, improving traffic planning, providing local-based services, and promoting sustainableurban mobility. Traditionally, relevant information is however rarely obtained due to difficultiesand costs in tracking the time-resolved locations of individuals over time.

In recent years, an increasing attention has been placed on introducing smart traffic controlsystems (6, 7) into urban road networks. The primary objectives of such systems are to reducetravel time, resolve traffic congestion, and reduce vehicleemissions. Recent work in real-time, de-centralized, schedule-driven control of traffic signals (7, 8) has demonstrated the strong potentialof real-time adaptive signal control in urban environments. The smart and scalable urban traf-fic control system (9) achieved improvements of over 26% reductions in travel times, over 40%reductions in idle time, and a projected reduction in emissions of over 21%, in an initial urbandeployment. To facilitate effective real-time control, vehicle flows can be monitored by differentphysical sensors, e.g., induction loops and video detectors, and pedestrian flows can be detectedand inferred using push-buttons or other devices. Althoughtraffic flow data in fine granularity canbe logged in real time, usage of these physical sensing data is often limited to the region that isbeing controlled. For the broad uncontrolled regions instead, no information is available.

The rise of location-based services provides us another wayto achieve the information ofhuman mobility, in the form of participatory sensing (10, 11, 12). With mobile devices,userscanshare their digital footprints at various geo-locations (i.e.,venues) with timestamps throughcheck-ins, e.g., geo-enabled tweets and geo-tagged photos and videos. Using of these services grows fastworldwide, although the instant sampling rate of trajectories is still very limited, and some web-based services, e.g., Waze and Facebook, do not open their location-based data to public. Researchwork has been conducted to understand temporal, spatial, social patterns, and some combinedpatterns of human mobility (13, 14, 15, 16, 17, 18, 19, 20, 21,22).

In this study, we focus on exploring the potentials of combining participatory and physicalsensing data in urban mobility networks. The participatorysensing data are collected in the Pitts-burgh metropolitan area with the APIs provided by the location-based services including Twitter,Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an areacontrolled by the scalable urban traffic control system (9).Basic spatial and temporal characteris-tics are displayed for both the physical and participatory sensing data.

We first study human mobility patterns for a better understand of the urban mobility net-work. User checkins are examined to disclose the distribution of user behaviors, a fundamentalstatistical property of mobility pattern. Geo-location based cluster analysis is performed to iden-tify personal favoriteplacesof users in the studied regions. User entropy is measured to reveal thedegree of predictability of user activities. Time-dependent mobility patterns are analyzed to showthe regularity of user behaviors, based on primary and secondary most visited places of users.

We then further inform how we may benefit from combining physical and participatorysensing data to improve urban mobility applications with three examples. First, we evaluate theattraction and limit of using sensing data for anomaly detection and reasoning of traffic flow. Sec-ond, we examine if nontrivial information could be extracted from participatory sensing data toeffectively recognize traffic congestion in temporal and spatial dimensions. Third, by choosingtwo zones in the controlled region, we perform a close check on the correlation between physicaland participatory sensing patterns. For the two zones, we also illustrate how we may investigate

Page 4: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

Xie and Wang 3

the origin and destination (O-D) patterns that are valuablefor urban mobility from thetransitionsbetween user checkins.

sensing data

(a) Participatory Sensing Region (b) Physical Sensing Region

(c) Checkin Locations (d) Heat Map of Checkins

FIGURE 1 : Participatory and Physical Sensing Regions in Pittsburgh, PA.

DATA DESCRIPTIONData CollectionWe implement our study in the Pittsburgh metropolitan area.The participatory sensing data con-tains a list of checkins. Each checkin can be represented as atuple <userID, venueID, time,[comment]>, whereuserIDis associated with a unique user,venueIDis associated with a venue atthe geo-location of (latitude, longitude) with the precision of six decimal places. Our checkin datawere collected (between March and July of 2014) from the geo-APIs of some location-sharingservices, including geo-enabled tweets from Twitter and geo-tagged photos from Flickr, Picasa

Page 5: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

Xie and Wang 4

and Panoramio. We also included existing checkin data directly crawled from Foursquare (23).For studying urban mobility patterns, we only consider checkins at venues in the spatial lati-tude/longitude bounding box of (40.309640, -80.135014, 40.608740, -79.676678), as shown inFigure 1a. For studying up-to-date patterns, we only consider the recent data within the rangeof dates [1/1/2012, 7/1/2014]. The collected data contains3,399,376 checkins of 74,658 users at2,198,572 venues.

The physical traffic flow data are collected in a road network which is currently controlledby the smart and scalable urban traffic control system (7, 8, 9) (see Figure 1a, and for more detailssee Figure 1b). The system was first installed on nine intersections (A to I ) in the East Libertyneighborhood since June 2012 (see the pink region in Figures1a and 1b), and then expanded tonine more intersections (J to R) in the Bakery Square neighborhood since October 2013 (see thegreen region in Figures 1a and 1b). The total area includes five major streets, Penn Ave, CentreAve, Highland Ave, East Liberty Blvd, and Fifth Ave, with dynamic traffic flows throughout theday. For collecting real-time flow data, detectors were deployed on each entry/exit lane at thenear end of each intersection and on entry lanes at the far endof boundary intersections. For eachdetector, avehicle recordis generated at the time when a vehicle is detected to pass thedetector.Compared to that in a checkin, there is no userID and comment information available in a vehiclerecord. In total, the traffic flow data contains 125,369,318 vehicle records generated at 126 stop-bardetectors by the end of 7/1/2014.

Basic Spatial and Temporal CharacteristicsSpatial Distribution of CheckinsWe first display the spatial distribution of all checkins in the whole region. Figure 1c shows thegeo-distribution of checkins. It reflects the highly non-uniform dynamics of human mobility in theurban area. Figure 1d shows the heat map of checkins, where the high-density regions are clearlycolored in red. Notice, one of the red regions in Figure 1d overlaps with the controlled region inFigure 1b.

Temporal PatternsWe are interested in the recurrent nature of human mobility over time. To investigate temporalmobility patterns, each week is segmented into 24×7= 168 hourly bins (starting from Monday).For obtaining seasonal results, each year is divided into four seasons (A to D), and the binnedresults are averaged over 13 weeks in each season. We considered all four seasons in 2013 and thefirst two seasons in 2014.

Figure 2a gives the seasonal checkin patterns in the participatory sensing region. It showsthat the number of checkins has increased significantly overthe seasons. Figure 2b shows thecheckin frequency normalized by the total checkin size in each season. It shows that the checkinfrequency has similar patterns for different seasons. Thesocial day(11, 20) of Pittsburgh starts ataround 4AM, and the checkin frequency peaks at around 8-9PM.A high checkin activity duringSunday is disclosed by Figure 2b.

For vehicle flow in the controlled region in Figure 1b, we consider two pivotal intersectionswhich service most vehicles in this road network: the intersectionD of Centre Ave and Penn Ave inEast Liberty, and the intersectionP of Fifth Ave and Penn Ave in Bakery Square. For intersectionP, the vehicle flow data is only available for the two seasons in2014. The seasonal average vehicleflow patterns with physical sensing data of the intersectionsD andP are respectively shown in Fig-

Page 6: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

Xie and Wang 5

2014-B2014-A

2013-D2013-C

2013-B2013-A

Time (day)

AverageCheckin

Count

SunSatFriThuWedTueMon

1000

800

600

400

200

0

(a) Checkin Counts in the Participatory Sensing Region

2014-B2014-A

2013-D2013-C

2013-B2013-A

Time (day)

Checkin

Frequency

SunSatFriThuWedTueMon

0.016

0.012

0.008

0.004

0

(b) Checkin Frequency in the Participatory Sensing Region

2014-B2014-A

2013-D2013-C

2013-B2013-A

Time (day)

AverageVehicle

Count

SunSatFriThuWedTueMon

2500

2000

1500

1000

500

0

(c) Vehicle Flow Patterns at IntersectionD

2014-B2014-A

Time (day)

AverageVehicle

Count

SunSatFriThuWedTueMon

3000

2500

2000

1500

1000

500

0

(d) Vehicle Flow Patterns at IntersectionP

FIGURE 2 : Seasonal Temporal Patterns for the Participatory and Physical Sensing Data.

Page 7: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

Xie and Wang 6

User Checkin Count

Probability

1E+031E+021E+011E+00

1E-02

1E-03

1E-04

1E-05

1E-06

1E-07

(a) Distribution of User Checkin Counts

Radius of Gyration (kilometer)

Probability

1E+011E+001E-01

1E-02

1E-03

1E-04

1E-05

1E-06

1E-07

(b) Distribution of Radius of Gyration

Inter-Checkin Time (day)

Probability

28211470

1E+00

1E-01

1E-02

1E-03

1E-04

1E-05

1E-06

1E-07

(c) Distribution of Inter-Checkin Times

Inter-Checkin Distance (kilometer)

Probability

1E+011E+001E-01

1E-01

1E-02

1E-03

1E-04

1E-05

1E-06

(d) Distribution of Inter-Checkin Distances

FIGURE 3 : Basic User Checkin Statistics in the Participatory Sensing Region.

ures 2c and 2d. During weekdays, the traffic flow shows three peak values, which are respectivelyin the morning (8AM), middle day (12PM), and afternoon (5PM)(i.e. Monday through Friday,corresponding to the first five pattern curves in Figures 2c and 2d). While during weekends, thetraffic pattern shows a decreased peak-traffic-flow value andonly one peak value at around 1PM(i.e. Saturday and Sunday, corresponding to the last two pattern curves in Figures 2c and 2d). Itturns out that the traffic flow in weekdays is heavier than thatin weekends.

UNDERSTANDING HUMAN MOBILITY PATTERNSUser Checkin StatisticsFigure 3 presents some basic user checkin statistics in the participatory sensing region. LetNi

be the number of checkins for useri, it is shown that the probability distribution ofNi follows ascaling law (see Figure 3a). We now further consider the distribution of the radius of gyration (rg)for each user. For useri, let V j

i be the venue of thejth checkin, the radius of gyration is then

r ig =

1Ni

∑Nij=1(dist(V j

i ,Vic))

2, where thedist function gives the distance between a checkinV ji

and the center of massV ic for the checkins of useri. As shown in Figure 3b, most user activities

are confined to a limited neighborhood within 10 km, which is similar to the finding in ref. (4).Figures 3c and 3d show respectively the distributions of thetime and distance intervals betweenconsecutive checkins made by users. The probability of inter-checkin time decreases as the timeincreases, and interestingly, it shows apparent daily and weekly patterns (see Figure 3c). Theprobability of inter-checkin distances (Figure 3d) is quite similar to that ofrg in Figure 3b.

Page 8: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

Xie and Wang 7

User Checkin Count

UserEntropy

1E+041E+031E+021E+01

1E+01

1E+00

1E-01

1E-02

(a) Checkin Counts of Users

Cluster Number

UserEntropy

1E+021E+011E+00

1E+01

1E+00

1E-01

1E-02

(b) Checkin Cluster Numbers of Users

Secondary(eps=1000m)Primary(eps=1000m)Secondary(eps=250m)Primary(eps=250m)

Time (day)

Probability

SunSatFriThuWedTueMon

1

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0

(c) Mobility Regularity of Users at Primary and Secondary Most Visited Places

FIGURE 4 : User Entropy and Regularity for Checkins in the Participatory Sensing Region.

Finding Checkin PlacesEach user tends to stay in a limited number ofplaces, where eachplaceis defined to accommodatesimilar checkins/activities of the user in its vicinity (17, 24, 25). This definition ofplacesis ableto tolerate some geo-location tracking errors. Notice thatsome existing tracking techniques (e.g.,mobile tower) might have the location accuracy as low as several hundred meters (25, 26).

For our collected data, we identify the checkin places by clustering, where each clusterCdefines the place of a set of checkins. We use an unsupervised clustering method, DBSCAN (27),which is a density-based clustering algorithm that requires only two parameters, i.e.,epsto definea neighborhood threshold andminPtsto define a density threshold. By default,eps=250 metersandminPts=2 points were used.

User EntropyUser entropy (25) is a fundamental quantity to capture the degree of predictability for a user.For useri, let Ck

i be thekth cluster, andK be the number of clusters. After the clustering, thetemporal-uncorrelated user entropySi is defined asSi = −∑K

k=1 pi(k) log2 pi(k), wherepi(k) =|Ck

i |/∑Kk=1 |C

ki | is the probability that clusterk was visited by useri. A lower user entropy means a

higher degree of predictability for visitation patterns. Figures 4a and 4b respectively show the userentropy versus the checkin numbers (N) and the cluster numbers (K). The users in a single clusterare not shown in the log-scale figures since their user entropy are 0. As shown in Figure 4a, a userwith a large number of checkins might have very low user entropy. The trend is more reasonableand clearer in Figure 4b, where a larger cluster number oftenleads to a higher user entropy.

Page 9: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

Xie and Wang 8

(Oct 23, 2013)

Bridge Reopen

(Mar 4, 2013)

Bridge Closure

Time (day)

Vehicle

Count

365292219146730

7000

6000

5000

4000

3000

2000

1000

FIGURE 5 : Vehicle Flow Pattern in Year 2013 for the Turning Movement CDE (in Fig. 1b) atIntersectionD. (The Data in 09/01, 09/02, and 10/02 are Excluded due to the Disk Full Issue.)

RegularityWe study the mobility regularity of users by computing the probability of finding users in theirprimary and secondary most-visited places (which is rankedusing pi(k)) at hourly interval in aweek, and we use DBSCAN clustering witheps={250, 1000} meters. Figure 4c shows the time-dependent mobility patterns for the primary and secondary most visited places of users.

We first check the case for the primary most-visited places. The regularity is high duringthe night, and it peaks at around 4AM, the start of asocial day. In weekdays, there are minimaduring morning (around 8AM), middle day (1PM), and afternoon (6PM), which are correspondingto the transitions to other places for commuting or having lunch. While in weekends, there is noapparent local minimum during morning, i.e. before 10AM. This pattern indicates that the primarymost-visited places contain a large portion of home places.

For the secondary most-visited places, the regularity peaks at around 8AM. In weekdays,the regularity curve reaches minima during night. In weekends, however, the regularity curveshows little difference beteen daytime and night. Therefore, the secondary most-visited places arevery likely associated with work places.

Notice that when we changeeps from 250m to 1000m (i.e., clustering becomes morecoarsed), the increase in probability for the primary most-visited places is much more than thedecrease in probability for the secondary most-visited places. This means that many users performtheir other activities (i.e. except for the primary and secondary activities) near their home places.During night, the probability is nearly 90% for the primary place and 10% for the secondary place.

PROMISES FOR URBAN MOBILITY APPLICATIONSIn this section, by presenting three examples, we inform howwe may take advantage of the com-bined participatory and physical sensing data to improve urban mobility applications.

Anomaly Detection and ReasoningIn 2013, the bridge on South Highland Ave (location shown in 1b) was closed for replacement.This cut off the connection between Shadyside and East Liberty. Therefore a large portion oftraffic (including all bus lines) on Highland Avenue was forced to pass through intersectionD,which is a pivotal node that services most vehicles in this road network.

Figure 5 shows the vehicle flow pattern in year 2013 for the right-turn movement at inter-sectionD (see Figure 1b, fromC to D to E). The flow significantly increased between early March

Page 10: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

Xie and Wang 9

Time (day)

Checkin

Count

SunSatFriThuWedTueMon

120

100

80

60

40

20

0

(a) The Main “Traffic” Topic

Time (day)

Checkin

Count

SunSatFriThuWedTueMon

35

30

25

20

15

10

5

0

(b) The “Accident” Sub-Topic of the “Traffic” Topic

FIGURE 6 : Temporal Checkin Patterns for Traffic-Related Topics.

and late October. This is a typicalchange point problemfor anomaly detection in time series oftraffic. Using analysis techniques of statistics (28) on traffic data, we can approximately detectchange points, but we are not able to find the exact time and thereason leading to the events.

Analyzing participatory sensing data is helpful at this point. A search of our geo-taggeddata gives five checkins with the time of the bridge reopeningceremony. For example, at 5:47PM,October 23, a user mentioned joyfully,Wow the bridge on South Highland is open. Life wasrough for a while.We did not find a checkin associated with the bridge closure for our collecteddata, which might be due to the closure event happened too long time ago and the earlier datawas not kept by our location-sharing services. A search of the non-geo-tagged Tweets pinpointsthe time both for closure and reopening events (for which nine and fourteen tweets have beenfound respectively, posted by some of the users or their first-degree friends). In further studies,fusing non-geo-tagged and geo-tagged information will be very useful for anomaly detection andreasoning, especially in case the available geo-tagged checkin information is not sufficient. In fact,many users generate both non-geo-tagged and geo-tagged information in practice. In presence ofa high user regularity as shown in Figure 4c (which is a very common case), the locations of non-geo-tagged information of the users can be inferred from their geo-tagged checkins. In addition,locations can also be estimated using content-derived information (29).

Topic-Based Traffic Information ExtractionMany users generate checkins with the information of their current traffic conditions in travel.Tweet semantics has been used to build indicators for long-term traffic prediction (30). By taking

Page 11: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

Xie and Wang 10

the search topic “traffic” as an example, we examine more potential usages of the participatorysensing checkins for obtaining important traffic information.

Figure 6a shows the temporal patterns of the checkins for themain “traffic” topic. Wecan clearly see that checkins mainly happened during the morning (around 8AM) and afternoon(around 5PM) of weekdays, especially the latter, reflectingthe rush hours by commuters. Figure6b further shows the temporal patterns of the checkins for both “traffic” (as the main topic) and“accident” (as the sub-topic). It discloses that people should pay more attention to avoid accidentsduring the morning rush hours, especially for the peaks in Figure 6b.

(a) Global View (b) Local View

(c) Sub-Topics on Major Roads (d) Sub-Topics on Accidents

FIGURE 7 : Spatial Checkin Patterns for Sub-Topics of the “Traffic” Topic.

Figures 7a and 7b show the spatial patterns of the checkins onthe main “traffic” topic,where each red circle point represents a checkin. A global view (see Figure 7a) indicates mostcheckins are located on major highways and congested neighborhoods, while a local view (seeFigure 7b) shows most checkins are located on intersectionsin the urban road network. These par-

Page 12: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

Xie and Wang 11

ticipatory sensing information is nontrivial for marking popular route choices and for identifyingcongested intersections and road segments where traffic conditions should be improved signifi-cantly.

Figure 7c gives the checkin results using sub-topics on major roads, including “I-376”,“I-279”, “Rt-28”, “Rt-51”, and “Penn Ave”, where the checkin points are colored differently foreach road. It turns out that the segments with heavy traffic onthese roads can be quite preciselypinpointed with the checkin results based on these sub-topics. Figure 7d shows the checkin resultsusing sub-topics on “accident”. The spatial checkin distribution is surprisingly broad for the “ac-cident” sub-topic. This implies the importance of further improving the urban road safety (whichmight be addressed in the emerging autonomous and connectedvehicle technology).

As shown in Figure 7b, some traffic-related checkins are located in the controlled region(the pink and green colored regions), therefore we can combine physical sensing traffic data withthese participatory points for a deeper analysis. For each checkin, we know its timet and location.Hence we can further investigate the traffic flow in the time window [t-4h, t+4h] and at its closestintersection in the controlled region.

Figure 8 shows the vehicle flow patterns at IntersectionsA andQ that are associated withtwo checkin examples using the “traffic” topic (T1 and T2 in Figure 7b which are respectivelypertinent to a road closure and an accident event), where we set t = 0. In Figures 8a and 8c,both the traffic flow in the invested time window [t-4h, t+4h] (denoted as “Real-Time Data”) anda 10-week average traffic flow (denoted as “10-Week Average”)are shown, where the 10-weekaverage traffic flow is calculated for capturing the weekly periodic patterns of human mobility. InFigures 8b and 8d, thedetrendedflows are obtained by subtracting the “10-Week Average” flowfrom the “Real-Time” flow in Figures 8a and 8c, respectively.In each detrended flow, a short-term flow disruption can be clearly identified by the deep deviation for sufficient long time. Thisprovides us a plain information on how the non-recurrent incident impacts local traffic. The twoevents impacted the traffic for about 4 hours and 1 hour respectively (see Figures 8b and 8d).Interestingly, checkins pinpointed the events during the early stage for both the events. It indicatesthe potential of using checkins for incident detection, by narrowing down the searches to specifictimes and locations.

A Tale of Two ZonesOur controlled region spans some adjacentlivehoods(19) with different life activity patterns, in-cluding East Liberty and Bakery Square.

We choose two zones,Z1 in East Liberty andZ2 in Bakery Square as shown in Figure 1b,for closer observation. For the two zones, accurate departure vehicle flows can be detected at theside-street exits of the intersectionsF andH, and of the intersectionsO andM , respectively.

Figures 9a and 9b show the seasonal average vehicle flow patterns for zonesZ1 andZ2respectively. In the zoneZ1, the traffic flow in weekends is significantly higher than thatin week-days; While in the zoneZ2, the traffic flow in weekends is significantly lower than that in week-days. The zonesZ1 and Z2 respectively contain a major store (Target) and a major company(Google), the two figures therefore are consistent with the the common human behaviors of shop-ping in weekends but working in weekdays.

In the participatory sensing data, there were 2390 checkinsfrom 1054 users in ZoneZ1,and 5349 checkins from 1260 users in ZoneZ2. There were only 253 users appeared in both zones.

We further find thetransitionsof users for the zones. For each zone, each user transition

Page 13: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

Xie and Wang 12

10-Week AverageReal-Time Data

Time (hour)

Vehicle

Count

43210-1-2-3-4

50

40

30

20

10

0

(a) Southbound Flow at IntersectionA

Difference

Checkin Time: 6/18/2014 12:25:17

Time (hour)

Vehicle

Count

43210-1-2-3-4

20

0

-20

-40

Difference

Checkin Time: 6/18/2014 12:25:17

Time (hour)

Vehicle

Count

43210-1-2-3-4

20

0

-20

-40

(b) Detrended Southbound Flow at IntersectionA

10-Week AverageReal-Time Data

Time (hour)

Vehicle

Count

43210-1-2-3-4

120

100

80

60

40

20

0

(c) Eastbound Flow at IntersectionQ

Difference

Checkin Time: 3/26/2014 7:07:57

Time (hour)

Vehicle

Count

43210-1-2-3-4

40

0

-40

-80

-120Difference

Checkin Time: 3/26/2014 7:07:57

Time (hour)

Vehicle

Count

43210-1-2-3-4

40

0

-40

-80

-120

(d) Detrended Eastbound Flow at IntersectionQ

FIGURE 8 : Vehicle Flow Patterns near Checkins of the “Traffic” Topic.

is defined as two consecutive checkins, one is within the zoneand another is outside of the zone(or vice versa), made by a user in a time threshold (four hoursin this paper). If the later checkin isin the zone, the transition is anin-zonetransition, otherwise it is anout-zonetransition. For eachzone, the in-zone and out-zone transitions are more relatedto arrival and departure traffic. Thetransitions also provide origin and destination (O-D) information for each zone.

Figures 9c and 9d give respectively the checkin and transition patterns for the two zones.In Z1, most checkins can be classifed into transitions, which might due to the fact that users donot stay too long after shopping. InZ2, the number of transitions is significantly lower than thatof checkins, since users might stay long and send checkins intheir workplaces. For weekdays andweekends, the temporal checkin patterns are roughly similar to that of traffic flow patterns (seethe comparisons of Figures between 9a and 9c, and between 9b and 9d). Both the numbersof checkins and transitions shown in Figures 9c and 9d are nownot large enough to support astatistical analysis of correlation with the traffic flow data shown in Figures 9a and 9b. But thesituation should be improved in the near future based on the rapid increasing trend in participatorysensing data as shown in Figure 2a.

Figure 10 gives the user transition information for zonesZ1 andZ2. The origin/destination(O-D) checkins of transitions have been distributed broadly in the participatory sensing region. Asshown in 10c and 10d, the O-D locations are highly clustered,and the majority of them are coveredby a few clusters of sources. For the two zones, they share most of the O-D sources (clusters inblue), even though they have an apparent difference in both the checkin and flow patterns. Itimplies that urban traffic might be mostly generated among a few clustered sources.

Page 14: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

Xie and Wang 13

2014-B2014-A

2013-D2013-C

2013-B2013-A

Time (day)

AverageVehicle

Count

SunSatFriThuWedTueMon

300

200

100

0

(a) Seasonal Average Vehicle Flow Patterns in ZoneZ1

2014-B2014-A2013-D

Time (day)

AverageVehicle

Count

SunSatFriThuWedTueMon

500

400

300

200

100

0

(b) Seasonal Average Vehicle Flow Patterns in ZoneZ2Transition CountCheckin Count

Time (day)

Count

SunSatFriThuWedTueMon

60

50

40

30

20

10

0

(c) Checkin and Transiton Patterns in ZoneZ1Transition CountCheckin Count

Time (day)

Count

SunSatFriThuWedTueMon

100

80

60

40

20

0

(d) Checkin and Transiton Patterns in ZoneZ2

FIGURE 9 : Vehicle Flow and Checkin Patterns for Two ZonesZ1 andZ2.

Page 15: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

Xie and Wang 14

(a) Transitions for ZoneZ1 (b) Transitions for ZoneZ2

(c) Origin/Destination Checkins for ZoneZ1 (d) Origin/Destination Checkins for ZoneZ2

FIGURE 10 : Users Transitions for ZonesZ1 andZ2.

This O-D information might be used to further understand thetraffic demands, based onsome modeling frameworks, e.g., the gravity model (31) or the radiant model (2). These clus-tered O-D locations can be used to construct popular routes (32), and the information beyond thetraffic control region might then be useful for recommendingtime-sensitive alternative routes (33)to help reducing traffic congestion within urban traffic control systems (34), especially if trafficanomaly (35) has been quickly identified or predicted. It is also possible to provide carpoolingrecommendation for users based on the similarity in their O-D transition patterns.

CONCLUSIONSIn this empirical study, we explored the potentials of combining participatory and physical sensingdata for better understanding and improving urban mobilitynetworks. The participatory sensingdata were collected in the Pittsburgh metropolitan area with the APIs provided by the currentlocation-based services, and the physical traffic flow data were collected in an area controlled by

Page 16: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

Xie and Wang 15

the smart and scalable urban traffic control system. For boththe sensing data, spatial and temporalcharacteristics were displayed.

We first investigated human mobility patterns for a better understand on urban mobility net-work. By examining user checkins, we disclosed the distribution of user behaviors, a fundamentalstatistical properties of mobility pattern. With geo-location based cluster analysis, we identifiedpersonal favoriteplacesof users in the studied regions. By measuring user entropy, we revealedthe degree of predictability of user activities. By analyzing time-dependent mobility patterns, weshowed the regularity of user behaviors.

Next we informed how we might take advantage of the combined participatory and physicalsensing data to improve urban mobility applications. We presented three examples to illustratethe value of the combined data for urban mobility networks: (1) anomaly traffic detection andreasoning, (2) topic-based nontrivial traffic informationextraction, and (3) traffic demand analysis.The study might shed some lights on further research for enhancing urban mobility.

Acknowledgements.This research was supported in part by the T-SET University TransportationCenter at CMU, the CMU Robotics Institute, and the CMU Physics Department.

REFERENCES[1] Brockmann, D., L. Hufnagel, and T. Geisel. The scaling laws of human travel.Nature, Vol. 439, No. 7075, 2006,

pp. 462–465.

[2] Simini, F., M. C. González, A. Maritan, and A.-L. Barabási. A universal model for mobility and migrationpatterns.Nature, Vol. 484, No. 7392, 2012, pp. 96–100.

[3] Song, C., T. Koren, P. Wang, and A.-L. Barabási. Modelling the scaling properties of human mobility.NaturePhysics, Vol. 6, No. 10, 2010, pp. 818–823.

[4] Gonzalez, M. C., C. A. Hidalgo, and A.-L. Barabasi. Understanding individual human mobility patterns.Nature,Vol. 453, No. 7196, 2008, pp. 779–782.

[5] Han, X.-P., Q. Hao, B.-H. Wang, and T. Zhou. Origin of the scaling law in human mobility: Hierarchy of trafficsystems.Physical Review E, Vol. 83, No. 3, 2011, p. 036117.

[6] Papageorgiou, M., C. Diakaki, V. Dinopoulou, A. Kotsialos, and Y. Wang. Review of road traffic control strate-gies.Proceedings of the IEEE, Vol. 91, 2003, pp. 2043–2067.

[7] Xie, X.-F., S. F. Smith, L. Lu, and G. J. Barlow. Schedule-driven intersection control.Transportation ResearchPart C, Vol. 24, 2012, pp. 168–189.

[8] Xie, X.-F., S. F. Smith, and G. J. Barlow. Schedule-driven coordination for real-time traffic network control. InInternational Conference on Automated Planning and Scheduling, Sao Paulo, 2012, pp. 323–331.

[9] Xie, X.-F., S. F. Smith, and G. J. Barlow.Smart and Scalable Urban Signal Networks: Methods and Systems forAdaptive Traffic Signal Control. U.S. Patent Pending, Application No. 14/308,238, 2014.

[10] Burke, J. A., D. Estrin, M. Hansen, A. Parker, N. Ramanathan, S. Reddy, and M. B. Srivastava. Participatorysensing. InInternational Workshop on World-Sensor-Web, 2006.

[11] Silva, T. H., P. Vaz de Melo, J. M. Almeida, and A. A. Loureiro. Challenges and opportunities on the large scalestudy of city dynamics using participatory sensing. InIEEE Symposium on Computers and Communications,2013, pp. 528–534.

Page 17: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

Xie and Wang 16

[12] Doran, D., S. Gokhale, and K. C. Konduri. ParticipatoryParadigms: Promises and Challenges for Urban Trans-portation. InTransportation Research Board Annual Meeting, 2014.

[13] Ferrari, L., A. Rosi, M. Mamei, and F. Zambonelli. Extracting urban patterns from location-based social net-works. InACM SIGSPATIAL International Workshop on Location-Based Social Networks, 2011, pp. 9–16.

[14] Chiang, M.-F., Y.-H. Lin, W.-C. Peng, and P. S. Yu. Inferring distant-time location in low-sampling-rate tra-jectories. InACM SIGKDD international Conference on Knowledge Discovery and Data Mining, 2013, pp.1454–1457.

[15] Wang, D., D. Pedreschi, C. Song, F. Giannotti, and A.-L.Barabasi. Human mobility, social ties, and link predic-tion. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011, pp. 1100–1108.

[16] Cheng, Z., J. Caverlee, K. Lee, and D. Z. Sui. Exploring millions of footprints in lcation sharing services. InInternational AAAI Conference on Weblogs and Social Media, 2011, pp. 81–88.

[17] Gao, H., J. Tang, X. Hu, and H. Liu. Modeling temporal effects of human mobile behavior on location-basedsocial networks. InACM International Conference on Information and KnowledgeManagement, 2013, pp. 1673–1678.

[18] Arase, Y., X. Xie, T. Hara, and S. Nishio. Mining people’s trips from large scale geo-tagged photos. InInterna-tional Conference on Multimedia, 2010, pp. 133–142.

[19] Cranshaw, J., R. Schwartz, J. I. Hong, and N. M. Sadeh. The livehoods project: Utilizing social media to un-derstand the dynamics of a city. InInternational AAAI Conference on Weblogs and Social Media, 2012, pp.58–65.

[20] Noulas, A., S. Scellato, C. Mascolo, and M. Pontil. An empirical study of geographic user activity patterns inFoursquare. InInternational AAAI Conference on Weblogs and Social Media, 2011, pp. 570–573.

[21] Schneider, C. M., V. Belik, T. Couronné, Z. Smoreda, andM. C. González. Unravelling daily human mobilitymotifs.Journal of the Royal Society: Interface, Vol. 10, No. 84, 2013, p. 20130246.

[22] Hasan, S. and S. V. Ukkusuri. Urban activity pattern classification using topic models from online geo-locationdata.Transportation Research Part C, Vol. 44, 2014, pp. 363–381.

[23] Long, X., L. Jin, and J. Joshi. Exploring trajectory-driven local geographic topics in foursquare. InWorkshop onLocation-Based Social Networks, 2012, pp. 927–934.

[24] Cheng, C., H. Yang, I. King, and M. R. Lyu. Fused matrix factorization with geographical and social influencein location-based social networks. InAAAI Conference on Artificial Intelligence, 2012, pp. 17–23.

[25] Song, C., Z. Qu, N. Blumm, and A.-L. Barabási. Limits of predictability in human mobility.Science, Vol. 327,No. 5968, 2010, pp. 1018–1021.

[26] Jiang, S., J. Ferreira Jr, and M. C. Gonzalez. Discovering urban spatial-temporal structure from human activitypatterns. InACM SIGKDD International Workshop on Urban Computing, 2012, pp. 95–102.

[27] Ester, M., H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatialdatabases with noise. InInternational Conference on Knowledge Discovery and Data Mining, 1996, pp. 226–231.

[28] Hajji, H. Statistical analysis of network traffic for adaptive faults detection.IEEE Transactions on Neural Net-works, Vol. 16, No. 5, 2005, pp. 1053–1063.

Page 18: An Empirical Study of Combining Participatory and Physical ... · Foursquare, Flickr, Picasa, and Panoramio. The physical traffic flow data are collected in an area controlled by

Xie and Wang 17

[29] Cheng, Z., J. Caverlee, and K. Lee. You are where you tweet: a content-based approach to geo-locating twitterusers. InACM International Conference on Information and KnowledgeManagement, 2010, pp. 759–768.

[30] He, J., W. Shen, P. Divakaruni, L. Wynter, and R. Lawrence. Improving traffic prediction with tweet semantics.In International Joint Conference on Artificial Intelligence, 2013, pp. 1387–1393.

[31] Yang, F., P. J. Jin, X. Wan, R. Li, and B. Ran. Dynamic Origin-Destination Travel Demand Estimation usingLocation Based Social Networking Data. InTransportation Research Board Annual Meeting, Washington, DC,2014.

[32] Wei, L.-Y., Y. Zheng, and W.-C. Peng. Constructing popular routes from uncertain trajectories. InACM SIGKDDInternational Conference on Knowledge Discovery and Data Mining, 2012, pp. 195–203.

[33] Hsieh, H.-P., C.-T. Li, and S.-D. Lin. Exploiting large-scale check-in data to recommend time-sensitive routes.In ACM SIGKDD International Workshop on Urban Computing, 2012, pp. 55–62.

[34] Xie, X.-F., Y. Feng, S. F. Smith, and K. L. Head. Unified route choice framework and empirical study in urbantraffic control environment. InTransportation Research Board Annual Meeting, Washington, DC, 2014.

[35] Pan, B., U. Demiryurek, C. Shahabi, and C. Gupta. Forecasting spatiotemporal impact of traffic incidents onroad networks. InIEEE International Conference on Data Mining, 2013, pp. 587–596.