Predicting Travel Time Reliability using Mobile Phone GPS Data€¦ · travel time reliability to a...

Predicting Travel Time Reliability using Mobile PhoneGPS Data

Dawn Woodard, Cornell University, Ithaca, NY∗

Galina Nogin, Cornell University, Ithaca, NY†

Paul Koch, Microsoft Research, Redmond, WADavid Racz, Microsoft Corp., Palo Alto, CA‡

Moises Goldszmidt, Microsoft Research, Mountain View, CA§

Eric Horvitz, Microsoft Research, Redmond, WA

November 11, 2016

Abstract

Estimates of road speeds have become commonplace and central to route planning, but few systemsin production provide information about the reliability of the prediction. Probabilistic forecasts of traveltime capture reliability and can be used for risk-averse routing, for reporting travel time reliability to auser, or as a component of fleet vehicle decision-support systems. Many of these uses (such as thosefor mapping services like Bing or Google Maps) require predictions for routes in the road network, atarbitrary times; the highest-volume source of data for this purpose is GPS data from mobile phones. Weintroduce a method (TRIP) to predict the probability distribution of travel time on an arbitrary route ina road network at an arbitrary time, using GPS data from mobile phones or other probe vehicles. TRIPcaptures weekly cycles in congestion levels, gives informed predictions for parts of the road networkwith little data, and is computationally efficient, even for very large road networks and datasets. Weapply TRIP to predict travel time on the road network of the Seattle metropolitan region, based on largevolumes of GPS data from Windows phones. TRIP provides improved interval predictions (forecastranges for travel time) relative to Microsoft’s engine for travel time prediction as used in Bing Maps.It also provides deterministic predictions that are as accurate as Bing Maps predictions, despite usingfewer explanatory variables, and differing from the observed travel times by only 10.1% on average over35,190 test trips. To our knowledge TRIP is the first method to provide accurate predictions of traveltime reliability for complete, large-scale road networks.Keywords. Location data, traffic, travel time, forecasting, statistics.

1 Introduction

Several mapping services provide predictions of the expected travel time on an arbitrary route in a roadnetwork, in real time and using traffic, time of day, day of the week, and other information. They use∗Corresponding author; http://people.orie.cornell.edu/woodard. Research performed while Visiting Re-

searcher at Microsoft Research.†Research performed while intern at Microsoft Research.‡David Racz is currently at Google, Mountain View, CA.§Current contact for Moises Goldszmidt is [email protected].

1

http://people.orie.cornell.edu/woodard

these predictions to recommend a route or routes with minimum expected travel time. Microsoft’s mappingservice (Bing Maps) predicts travel time for large-scale road networks around the world using a methodcalled Clearflow (Microsoft Research, 2012), which employs probabilistic graphical models learned fromdata to predict flows on arbitrary road segments. The method, which has its roots in the earlier Smartphloweffort on forecasting highway flows and reliability (Horvitz et al., 2005), considers evidence about real-timetraffic conditions, road classifications, topology of the road network, speed limits, time of day and day ofweek, and numerous other variables. With Clearflow, travel time predictions made on all segments across ageographic region are used in route-planning searches (Delling et al., 2015).

Beyond expected flows, it is important to consider uncertainty in travel time caused for instance by un-predictable traffic light schedules, accidents, unexpected road conditions, and differences in driver behavior.Such travel time variability (conversely, its reliability) also strongly affects the desirability of routes in theroad network (Jenelius, 2012; Texas Transportation Institute, 2015). For fleets of delivery vehicles, suchas those transporting perishables, decisions including routing need to provide on-time deliveries with highprobability. In the case of ambulance fleets, taking into account uncertainty in the travel time of an ambu-lance to potential emergency scenes leads to improved ambulance positioning decisions, and consequentlyincreases the survival rate of cardiac arrest patients (Erkut et al., 2007). A prediction of the probabilitydistribution of travel time can be more valuable than a deterministic prediction of travel time, by accountingnot just for measured traffic congestion and other known conditions, but also for the presence of unmea-sured conditions. Distribution predictions of travel time can be used for risk-averse routing, for reportingtravel time reliability to a user (e.g. the travel time is predicted to be in the range 10-15 minutes), and as acomponent of fleet vehicle decision-support systems (Samaranayake et al., 2012; Westgate et al., 2016).

Figure 1: Anonymized GPS locations from Windows phones in the Seattle metropolitan region, aggregatedinto 4.8m×3.2m grid boxes and colored by average speed in the grid box. Image resolution has beenreduced for privacy reasons.

We introduce a statistical solution to predicting the distribution of travel time on an arbitrary route inthe road network, at an arbitrary future time. We call the method TRIP (for travel time reliability inferenceand prediction). For typical road networks of interest, the number of possible routes is extremely large,and any particular route may have very few or no observed trips in the historical data. For these reasons

2

it is infeasible to apply methods designed for prediction on a particular set of heavily traveled routes, suchas Jenelius & Koutsopoulos (2013); Ramezani & Geroliminis (2012); Rahmani et al. (2015). TRIP usesinformation from all the trips in the historical data to train a model for travel time on routes, learning thecharacteristics of individual roads and the effect of time of the week, road classification, and speed limit.We model travel time variability both at the trip level and at the level of the individual road network linksincluded in the route. This decomposition is appropriate because some sources of variability affect theentire trip (such as driver habits, vehicle characteristics, or unexpected network-wide traffic conditions),while other sources of variability are localized (such as a delay due to a train crossing or construction). Wedefine a network link to be a directed section of road that is not divided by an intersection, and on which themeasured features of the road (road classification, speed limit, number of lanes, etc.) are constant.

TRIP captures important features of the data, including weekly cycles in congestion levels, heavy rightskew of travel time distributions, and probabilistic dependence of travel times for different links within thesame trip (for example, if the travel speed is high on the first link of the route, the speed is also likely tobe high on the other links of the route). We capture the multimodality of travel time distributions using amixture model where the mixture components correspond to unobserved congestion states, and model theprobabilistic dependence of these congestion states across the links of the route using a Markov model.Because we model travel time for individual links, the travel time prediction can be updated en route.

We introduce a computational method for training and prediction based on maximum a posteriori esti-mation via Expectation Conditional Maximization (Meng & Rubin, 1993). This yields an iterative trainingprocess with closed-form update equations that can be computed using parallelization across links and trips.As a result it is computationally efficient even on large road networks and for large datasets.

TRIP uses GPS data from vehicle trips on the road network; we obtain large volumes of such tripsusing anonymized mobile phone GPS data from Windows phones in the Seattle metropolitan region. Wecompare the accuracy of our predictions to a variety of alternative approaches, including Clearflow. TheGPS location and speed measurements are illustrated in Figure 1, which shows that they contain valuableinformation regarding the speed of traffic on individual roads. Unlike other sources of vehicle speed infor-mation (Hofleitner et al., 2012b), vehicular GPS data does not require instrumentation on the roadway, andcan achieve near-comprehensive coverage of the road network. Additionally, there is increasing evidencethat traffic conditions can be estimated accurately using only vehicular GPS data (Work et al., 2010). Onechallenge of mobile phone GPS data is that it is often sampled at low frequency (typically 1-90 secondsbetween measurements). A related data source, GPS data from fleet vehicles, is also often sampled at lowfrequency (Rahmani et al., 2015), and TRIP can also be applied to such data.

Some existing approaches to predicting the probability distribution of travel time on a road networkmodel exclusively link-level variability, and assume independence of travel time across the links in theroute (Westgate et al., 2013; Hunter et al., 2013a). This leads to considerable underprediction of the amountof variability (Westgate et al. (2013) and Section 4). Dependence across links is incorporated by Hofleitneret al. (2012a,b), who use a mixture model for travel time on links where the mixture component represents acongestion state (as we do). They allow these congestion states to depend on the link and the time, and modeldependence of their congestion states across the road network and across time using a dynamic Bayesiannetwork. This approach is intuitive but computationally demanding (leveraging a high-dimensional parti-cle filter in each iteration of the algorithm), so is unlikely to be efficient enough to use on complete roadnetworks (they apply it to 800 links in the San Francisco arterial network). Additionally, the method still un-derpredicts the amount of variability in travel time. Motivated by evidence in the data (Section 4), we allowthe congestion state to additionally depend on the whole trip. We model dependence of this congestion stateacross the links of the route, instead of across all links of the network. This improves the flexibility of themodel, leading to accurate variability prediction. It also facilitates computation: because our specification

3

corresponds to a one-dimensional Markov model, computation can be done exactly and efficiently in eachiteration, by using the forward-backward algorithm (cf., Russell & Norvig (2009)).

Another existing method for prediction of travel time variability (Hunter et al., 2013b) is designed forhigh-frequency (e.g. every second) GPS measurements. These authors estimate the number of stops ofthe vehicle on each link of the route, and introduce a model for the number of stops and a model for thedistribution of travel time conditional on the number of stops. There are also some methods designed foremergency vehicles, which directly model the distribution of travel time on the entire trip, as a functionof quantities like the route distance (Budge et al., 2010; Westgate et al., 2016). This is appropriate foremergency vehicles because the relevant data source (lights-and-sirens emergency vehicle trips) is too low-volume to effectively model the distribution of travel times for individual links; for non-emergency data itdoes not work as well as our approach, as we show in Section 4.

The scale of the road network for which we do prediction of travel time distributions (221,980 links) isan order of magnitude larger than networks studied in existing work for non-emergency vehicles. Hunteret al. (2013b) consider a network of over half a million links, but then remove the links with too fewobservations; presumably this limits the predictions to routes that do not include any of the deleted links.

A major use case for TRIP is in risk-averse routing. Conceptually speaking, one can recommend aroute by optimizing a route selection criterion based on variability (for example, minimizing a particularpercentile of travel time). However, unlike expected travel time, most variability-based criteria are notadditive across the links in the route. For this reason, variability predictions from TRIP cannot be directlyused to do route planning via standard algorithms like Dijkstra, A∗, or hub-labeling (Delling et al., 2015).However, one could use a hybrid procedure that first obtains a set of candidate routes by minimizing anadditive criterion such as expected travel time, then ranks those routes according to a variability-basedcriterion. Ranking according to the 90th percentile of travel time, for example, would provide a risk-averseroute selection. Alternatively, the candidate routes could be directly shown to users, along with associatedpredictive ranges for travel time, allowing the user to select the route according to their risk preferences.

A second use case for TRIP is in fleet vehicle decision systems, such as ride-sharing platforms. Theseare similar to the ambulance positioning use case, in that they require travel time predictions conditional onorigin and destination but unconditional on route. This use case could be addressed by obtaining an optimalroute as above and then predicting the travel time distribution for that route. Such an approach is takenfor ambulances, as described in Westgate et al. (2013, 2016). Although some bias is introduced due to thedriver not always following the optimal route, one can adjust for this bias.

In Section 2 we describe our statistical model and in Section 3 present methods for training and predic-tion with the model. Section 4 gives the Seattle case study, including providing support for our modelingchoices and reporting our prediction results. We draw conclusions and discuss extensions in Section 5.

2 Modeling

TRIP uses GPS measurements recorded periodically during vehicle trips. Each GPS observation consistsof location, speed, and heading measurements, along with a time stamp. For mobile phones, a GPS mea-surement may be recorded whenever phone applications access GPS information, many of which are notmapping or routing applications. For this reason, the frequency of GPS measurements vary, and the phoneis not necessarily in a moving vehicle at the time when the measurements are taken. However, motorizedvehicle trips can be isolated using a set of heuristics.

For the Seattle case study, we identify sequences of at least 3 GPS measurements from the same devicethat satisfy requirements such as: (a) starting and ending with speed measurements of at least 3 m/s, (b)

4

having median speed of at least 5 m/s and maximum speed of at least 9 m/s, and (c) covering distance at least1 km (as measured by the sum of the great circle distances between pairs of sequential measurements). Theresulting GPS sequences appear to consist almost exclusively of motorized vehicular trips that follow pathsin the road network. The requirements regarding the median and maximum speeds, for example, eliminatemost trips from non-motorized travel such as biking or walking. We define each trip to start with the firstGPS reading and end with the last GPS reading in the sequence. Consequently, the total trip duration isobserved precisely (as the difference of the two time stamps). The median time between GPS readings inthe resulting trips is 16s.

The next step is to estimate the route taken in each trip i ∈ I, by which we mean the sequence Ri =(Ri,1, . . . , Ri,ni) of links traversed (so that Ri,k for k ∈ {1, . . . , ni} is an element of the set J of networklinks), the distance di,k traversed for each link Ri,k (so that di,k is equal to the length of link Ri,k for allexcept possibly the first and last link of the trip), and the travel time Ti,k on each link Ri,k. Obtaining thisestimate is called map-matching, a problem for which there are numerous high-quality approaches (Newson& Krumm, 2009; Hunter et al., 2014).

Some of those approaches provide the uncertainty associated with the path and with the link travel times.Although this uncertainty can be taken into account during statistical modeling, we have found little benefitto this approach in prior work on predicting travel time distributions on routes (see Section 5.2 of Westgateet al. (2016)). This is due to the fact that the start and end locations and times for the trips are known with ahigh degree of accuracy, and the uncertainty is primarily with regards to the allocation of that total time tothe links of the route. Ignoring this uncertainty can affect the estimates of the model parameters (Jenelius& Koutsopoulos, 2015), but in our analysis did not significantly affect the predictions for the travel time onthe entire trip.

It is common practice to obtain estimated link travel times for the historical data, and then to estimateparameters of a travel time model using those link travel times (Hofleitner et al., 2012b; Westgate et al.,2016). In order to take into account the uncertainty, one either has to (a) handle a large number of latentvariables (unobserved quantities) that make estimation of travel time parameters far more computationallyintensive (Hunter et al., 2009; Westgate et al., 2013); or (b) assume that travel times on links are multivari-ate Gaussian or independent gamma distributed (Jenelius & Koutsopoulos, 2013; Hofleitner et al., 2012a;Hunter et al., 2013a). The latter assumptions don’t hold even approximately in our dataset; see Section 4.

For these reasons, we use deterministic rather than probabilistic estimates of Ri,k, di,k, and Ti,k asobtained from a standard map-matching procedure. First, we obtain the route estimate using the methodof Newson & Krumm (2009). Then, we allocate the total travel time to the links of the route as follows.Map the GPS observations to the closest location on the route; for sequential pairs of GPS points, allocatethe time spent between those points proportionally to the length of the full or partial links along the routebetween the two points. Then the total time spent on each link is calculated as the sum of the full or partialtimes associated with that link. Trips that don’t follow the road network closely are discarded; these canbe from travel by train, for example. Although our focus is on personal vehicular travel, it is possible thatsome of the trips are from bus or other transit modes. We note that this is an unavoidable challenge whenusing mobile phone location data.

Having obtained the values Ti,k, we model Ti,k as the ratio of several factors:

Ti,k =di,k

EiSi,ki ∈ I, k ∈ {1, . . . , ni} (1)

where Ei and Si,k are positive-valued latent variables associated with the trip and the trip-link pair, respec-tively. The latent variable Ei captures the fact that the trip i may have, say, 10% faster speeds than averageon every link in the trip. This could occur for example due to traffic conditions that affect the entire trip, or

5

to driver habits or vehicle characteristics. The latent variable Si,k represents the vehicle speed on the linkbefore accounting for the trip effect Ei, and captures variability in speed due to local conditions such aslocal traffic situations, construction on the link, and short-term variations in driver behavior. The model (1)decomposes the variability of travel time on route Ri into two types: link-level variability captured by Si,k,and trip-level variability captured by Ei.

We model Ei with a log-normal distribution:

log(Ei) ∼ N(0, τ2) (2)

for unknown variance parameter τ2. The evidence from the Seattle mobile phone data that supports ourmodeling assumptions is discussed in Section 4. Other data sets may have different characteristics, and theassumption (2) can be replaced if needed with a t-distribution on log(Ei) without substantively affectingthe computational method described in Section 3 (Liu, 1997). Additionally, the variance τ2 can be allowedto depend on the time at which trip i begins, the type of route (highway, arterial, etc.) and other factors.Again, this has little impact on the computational method or complexity. For the Seattle data, we foundthat the distribution of estimated trip effects Ei varied too little across times of week and parts of the roadnetwork to motivate this extension.

We model Si,k in terms of an unobserved discrete congestion state Qi,k ∈ {1, . . . ,Q} affecting thetraversal of link Ri,k in trip i. Notice that this congestion state is allowed to depend on the trip, so that Qi,k

could be different for two trips traversing the same link Ri,k at the same time. This is motivated by featuresof the data, as we show in Section 4. Assume that the week has been divided into time bins b ∈ B thatreflect distinct traffic patterns, but do not have to be contiguous (such as “morning rush hour” or “weekenddaytime”), and let bi,k be the time bin during which trip i begins traversing link Ri,k. Conditional on Qi,k,we model Si,k with a log-normal distribution:

log(Si,k) |Qi,k ∼ N(µRi,k,bi,k,Qi,k, σ2

Ri,k,bi,k,Qi,k) (3)

where µj,b,q and σ2j,b,q are unknown parameters associated with travel speed on link j ∈ J under conditions

q ∈ {1, . . . ,Q}, in time bin b ∈ B. The normal distribution for log(Si,k) can be replaced with a t distribu-tion, or a skew normal or skew t distribution, as needed without substantively changing the computationalmethod described in Section 3 (Lee & McLachlan, 2013).

We use a Markov model for the congestion states Qi,k (motivated by features in the data; Section 4):

Pr(Qi,1 = q) = γRi,1,bi,1(q)

Pr(Qi,k = q|Qi,k−1 = q) = ΓRi,k,bi,k(q, q) k ∈ {2, . . . , ni}; q, q ∈ {1, . . . ,Q} (4)

where γj,b is an unknown probability vector for the initial congestion state for trips starting on link j, andΓj,b is the transition matrix for the congestion state on link j conditional on the congestion state in theprevious link of the trip, during time bin b. This model captures weekly cycles in the tendency of thelink to be congested; for example, there may be a high chance of congestion during rush hour. It alsoprovides a second way to capture dependence of travel time across links (in addition to the trip effect Ei).Our specifications (1)-(4) imply a normal mixture model for log(Ti,k); for instance, when k > 1 andconditioning on Qi,k−1 we obtain

log(Ti,k) | Qi,k−1 = q ∼Q∑

q=1

ΓRi,k,bi,k(q, q)N(log di,k − µRi,k,bi,k,q, σ

2Ri,k,bi,k,q + τ2). (5)

6

This mixture model captures important features of the data, including heavy right skew of the distributionsof the travel times Ti,k, and multimodality of the distributions of log(Ti,k); in particular, a mixture of log-normal distributions provides a good approximation to the distribution of vehicle speeds on individual links(see Section 4). In order to enforce the interpretation of the mixture components q as increasing levels ofcongestion, we place the restriction µj,b,q−1 ≤ µj,b,q for each j ∈ J , b ∈ B, and q ∈ {2, . . . ,Q}.

Typically, there are some network links j ∈ J with insufficient data (in terms of the number of linktraversals Lj ≡ |{i ∈ I, k ∈ {1, . . . , ni} : Ri,k = j}|) to accurately estimate the link-specific parametersµj,b,q, σ2

j,b,q, γj,b, and Γj,b. For such links, we use a single set of parameters within each road category, bywhich we mean the the combination of road classification (e.g., “highway”, “arterial”, or “major road”) andspeed limit, and which is denoted by c(j) for each link j. Defining a minimum number L of traversals, forlinks with Lj < L we set

µj,b,q = µc(j),b,q, σ2j,b,q = σ2

c(j),b,q, γj,b = γc(j),b, Γj,b = Γc(j),b

for q ∈ {1, . . . ,Q}, b ∈ B, j ∈ J : Lj < L (6)

where µc,b,q, σ2c,b,q, γc,b, and Γc,b are parameters associated with the road category c ∈ C.

Our travel time model (1)-(6) incorporates both trip-level variability like driver effects, and link-levelvariability due for example to construction. It also captures the effect of weekly cycles, speed limit, androad classification. Combined with an assumption regarding changes in vehicle speed across the link, itprovides a realistic model for the location of the vehicle at all times during the trip. Since links are typicallyshort, we assume constant speed of each vehicle across the link. This assumption can be relaxed using theapproach described in Hofleitner et al. (2012a), although those authors find only a modest improvement inpredictive accuracy relative to a constant-speed assumption. Our model does not currently take into accountobservations about real-time traffic conditions, although this is a direction of ongoing work; see Section 5.

3 Training and Prediction

Computation is done in two stages: model training (obtaining parameter estimates) and prediction (usingthose estimates in the model to obtain a forecast distribution). Training can be done offline and repeatedperiodically, incorporating new data and discarding outdated data. This can be used to accumulate informa-tion about rarely observed links and to account for changes in the weekly cycles of travel times, or trendslike gradual increases in total traffic volume. Prediction is done at the time when a user makes a routing ortravel time query, so must be very computationally efficient.

An alternative Bayesian approach is to take into account uncertainty in the parameter values when doingprediction, by integrating over the posterior distribution of the parameters (their probability distributionconditional on the data). This is typically done using Markov chain Monte Carlo computation (Gelmanet al., 2013). However, this approach is more computationally intensive, with complexity that can scalepoorly (Woodard & Rosenthal, 2013). Additionally, in other work on travel time distribution prediction, wehave found that the parameter uncertainty is dwarfed by the travel time variability (Westgate et al., 2016),so that there is little change in the predictions using the more computationally intensive approach. Fornotational simplicity, in this description we drop the use of common parameters as in (6).

3.1 Training

We train the model using maximum a posteriori (MAP; cf. Cousineau & Helie (2013)) estimation, an ap-proach in which one estimates the vector θ of unknown quantities in the model to be the value that maxi-

7

mizes the posterior density of θ. Latent variables can either be integrated out during this process (if compu-tationally feasible), or included in the vector θ. For example, in image segmentation using Markov randomfield models, there is a long history of MAP estimation where θ is taken to include all of the latent variables.In this case the latent variables correspond to the segment associated with each image pixel, of which therecan be hundreds of thousands, and approximate MAP estimation of the latent variables provides a computa-tionally tractable solution (Besag, 1986; Grava et al., 2007). We take an intermediate approach, integratingover the congestion variables Qi,k (for which the uncertainty is high), and doing MAP estimation of the tripeffects Ei (for which the uncertainty is lower due to having multiple observations Ti,k for each trip). Sowe take θ ≡ (τ, {µj,b,q, σj,b,q, γj,b,Γj,b}j∈J ,b∈B,q∈{1,...,Q}, {logEi}i∈I) to include the parameters and thetrip effects. We are able to do MAP estimation of θ using an efficient iterative procedure with closed-formupdates. MAP estimation does not depend on the transformations chosen in θ (e.g., the exponentiated MAPestimate of logEi is equal to the MAP estimate of Ei, if both are unique). Our computational procedure isguaranteed to obtain only a local maximum of the posterior density, but by repeating the procedure multipletimes using random initializations one can increase the chance of finding the global maximum.

MAP estimation requires specification of a prior density for the parameters; we use the priorp(τ, {µj,b,q, σj,b,q, γj,b,Γj,b}) ∝ 1 that is uniform on the support of the parameter space. This prior is non-integrable, but leads to valid MAP estimation. Such uniform priors on unbounded parameter spaces arecommonly used in situations where there is little or no prior information regarding the parameter values(Section 2.8 of Gelman et al. (2013)); see for instance Gelman (2006) for the use of uniform priors forstandard deviation parameters like τ and σj,b,q.

Now consider the observed data to consist of the transformed values {log Si,k}i∈I,k∈{1,...,ni} wherelog Si,k ≡ log di,k− log Ti,k is the log speed during link traversal i, k. Because the prior density is uniform,MAP estimation of θ corresponds to maximizing over θ the product of the likelihood function times theprobability density of the trip effects:

p(θ|{log Si,k}) = p({log Si,k}|θ) p(θ)/p({log Si,k})∝ p({log Si,k}|θ) p({logEi}|τ). (7)

Maximum likelihood estimation of θ would maximize p({log Si,k}|θ); by including the second term p({logEi}|τ)in our objective function (7), we reduce noise in the estimated logEi values (a technique called regulariza-tion; James et al. (2013)). Notice that to compute the likelihood function p({log Si,k}|θ) =∑{Qi,k} p({Qi,k, log Si,k}|θ) directly, one would have to sum over all the possible values of the latent

variables {Qi,k}, which is computationally infeasible.Expectation Maximization (EM) is an iterative method for maximum likelihood or MAP estimation in

the presence of many latent variables; accessible introductions are given in Hofleitner et al. (2012a) and,in more detail, Bilmes (1997). EM is most efficient when the parameter updates in each iteration can bedone in closed form, which is not the case for our model. However, we can obtain closed-form updatesusing a variant called Expectation Conditional Maximization (ECM; Meng & Rubin (1993)). ECM allowsfor closed-form updates in situations where the parameter vector can be partitioned into subvectors, each ofwhich would have a closed-form EM update if the remaining parameters were known.

We apply ECM by partitioning the parameter vector θ into the two subvectorsθ1 = (τ, {µj,b,q, σj,b,q, γj,b,Γj,b}) and θ2 = {logEi}. In ECM, attention focuses on the complete-data log

8

posterior density log p(θ|{Qi,k, log Si,k}), which is equal to a term that does not depend on θ, plus

log p({Qi,k, log Si,k}

∣∣∣θ)+ log p({logEi}|τ) =∑i∈I

log(γRi,1,bi,1(Qi,1)) +∑

i∈I,k∈{2,...,ni}

log(ΓRi,k,bi,k(Qi,k−1, Qi,k))

+∑

i∈I,k∈{1,...,ni}

[−

log σ2Ri,k,bi,k,Qi,k

2−

(log Si,k − logEi − µRi,k,bi,k,Qi,k)2

2σ2Ri,k,bi,k,Qi,k

]

+∑i∈I

[− log τ2

2− (logEi)2

2τ2

]. (8)

To update a parameter subvector θj in each iteration of ECM, one treats the remaining subvectors θ[−j] asfixed, and maximizes over θj the expectation of (8) with respect to p({Qi,k} | θ(t), {log Si,k}). The latterquantity is the probability distribution of the congestion variables conditional on the data and the currentvalue θ(t) of θ. Such an update leads to an increase in the value of the objective function (7) in each iterationt, and ultimately to a (local) maximizer of (7), as shown in Meng & Rubin (1993).

This yields the procedure given in Algorithm 1. The first step is to run the forward-backward algorithmfor each trip. Briefly, the model (4) is a one-dimensional Markov model for {Qi,k : k ∈ 1, . . . , ni} for eachtrip i. Also, the model (1)-(3) implies a conditional distribution for the observed data log(Si,k) given Qi,k,namely log(Si,k) |Qi,k, logE(t)

i ∼ N(logE(t)i +µ(t)

Ri,k,bi,k,Qi,k, σ

2,(t)Ri,k,bi,k,Qi,k

). So p({Qi,k} | θ(t), {log Si,k})is a hidden Markov model for each i, for which computation can be done exactly and efficiently using theforward-backward algorithm (cf., Russell & Norvig (2009)).

We iterate until there is no change in the parameter estimates up to the third significant figure. The timecomplexity of each iteration of the ECM algorithm, implemented without parallelization, isO(Q2|J |

(|B| +

∑i∈I ni/|J |

)), i.e. the procedure is linear in the average number of traversals per link

(∑

i∈I ni)/|J |, which is a measure of the spatial concentration of data. If this quantity is held fixed (forexample if the same data-gathering mechanism is applied to a larger geographic region), the complexitygrows linearly in the size of the road network, |J |. The number of ECM iterations required for convergenceis also likely to grow with the size of the network and the spatial concentration of data, but this change inefficiency is difficult to characterize in general. In practice, ECM often converges quickly; for the Seattlecase study of Section 4, fewer than 50 iterations are required.

For the case study, training takes about 15 minutes on a single processor. The road map of NorthAmerica has about 400 times as many directional links as the Seattle metropolitan region, so it would takeroughly 4.2 days to run our implementation for the continent of North America, using the same numberof iterations of ECM that we used in Seattle. This time can be reduced dramatically, since each of theparameter updates (10)-(11) can be computed using parallelization across trips and/or links. Since trainingcan be done offline, and the model only needs to be retrained occasionally (e.g., monthly or weekly),the training procedure is expected to be sufficiently fast for use in commercial mapping services (whichoperate at a continental scale). Even if the spatial concentration of data is increased by several ordersof magnitude (e.g., where smartphone-based GPS systems provide coverage for a significant portion ofdrivers) the training procedure is likely to be efficient enough for commercial use by exploiting parallelism.As explained in the introduction, the trained TRIP model would be applied as a second stage of routeplanning, after obtaining a set of candidate routes that minimize an additive criterion like expected traveltime. The running time of this predictive step is discussed in Section 3.2.

The updates of µj,b,q, σj,b,q, and γj,b in (10) are the same as those in EM for normal mixture models(Bilmes, 1997), except that we restrict the calculation to relevant subsets of the data and adjust for the esti-

9

Algorithm 1 MAP Estimation for TRIP

1: Initialize θ(0) and t = 02: while t = 0 or ‖θ(t−1) − θ(t)‖ > ε do3: Use the forward-backward algorithm to calculate

φi,k(q) ≡ Pr(Qi,k = q | θ(t), {log Si,k}) (9)

ψi,k(q, q) ≡ Pr(Qi,k−1 = q, Qi,k = q | θ(t), {log Si,k})for i ∈ I, k ∈ {1, . . . , ni}, and q, q ∈ {1, . . . ,Q}.

4: Update the link parameter and τ estimates as:

µ(t+1)j,b,q =

∑{i,k:Ri,k=j,bi,k=b} φi,k(q)(log Si,k − logE(t)

i )∑{i,k:Ri,k=j,bi,k=b} φi,k(q)

σ2,(t+1)j,b,q =

∑{i,k:Ri,k=j,bi,k=b} φi,k(q)(log Si,k − logE(t)

i − µ(t+1)j,b,q )2∑

{i,k:Ri,k=j,bi,k=b} φi,k(q)

γ(t+1)j,b (q) =

∑i:Ri,1=j,bi,1=b

φi,1(q)

/

∑i:Ri,1=j,bi,1=b

1

Γ(t+1)

j,b (q, q) =

∑i,k:Ri,k=j, bi,k=b, k>1

ψi,k(q, q)

/

∑i,k:Ri,k=j, bi,k=b, k>1

φi,k−1(q)

τ2,(t+1) =

1|I|∑i∈I

(logE(t)i )2. (10)

5: Calculate

ai,k =Q∑

q=1

φi,k(q)

σ2,(t+1)Ri,k,bi,k,q

hi,k =Q∑

q=1

φi,k(q)µ(t+1)Ri,k,bi,k,q

σ2,(t+1)Ri,k,bi,k,q

6: Update the trip effect estimates:

logE(t+1)i =

∑k∈{1,...,ni}(ai,k log Si,k − hi,k)

1/τ2,(t+1) +∑

k∈{1,...,ni} ai,k. (11)

7: t = t+ 18: end while9: Take the parameter estimates to be θ = θ(t)

10

mated trip effect logE(t)i . The update of Γj,b is analogous to that of γj,b, but for a transition matrix instead

of a probability vector. The update of τ is the maximum likelihood estimate conditional on {logE(t)i }i∈I .

The update of logEi in (11) is not a standard form. However, in the special case where the σ2j,b,q are equal

for all j, b, and q, the updated value logE(t+1)i is approximately the average across k of the difference

between log Si,k and its expectation under the model, which is a reasonable estimator for the trip effect.

3.2 Prediction

Prediction in model (1)-(4) is done by simulation, as shown in Algorithm 2 for a new trip i. For the specifiedroute Ri, starting at a specified time, we simulateM vectors of travel times (T (m)

i,1 , . . . , T(m)i,ni

) directly fromthe model, using the trained parameter values θ from Algorithm 1. During prediction the time bins bi,kfor k > 1 are not fixed, but rather depend on the simulated value of {T (m)

i,1 , . . . , T(m)i,k−1}. Having obtained

simulated values from the distribution of total travel time∑ni

k=1 Ti,k in this way, Monte Carlo is used toapproximate the expectation, quantiles, percentiles, and other summaries of that distribution.

Algorithm 2 Prediction for TRIP1: for m ∈ {1, . . . ,M} do2: Sample logE(m)

i ∼ N(0, τ2)3: Sample Q(m)

i,1 from model (4), given the initial time bin bi,1 and the estimated vector γRi,1,bi,1

4: for k ∈ {1, . . . , ni} do5: if k > 1 then6: Determine the time bin bi,k based on the trip start time and {T (m)

i,1 , . . . , T(m)i,k−1}

7: Sample Q(m)i,k from model (4), given Q(m)

i,k−1, bi,k, and ΓRi,k,bi,k

8: end if9: Sample S(m)

i,k from model (3), given Q(m)i,k , bi,k, µ

Ri,k,bi,k,Q(m)i,k

, and σRi,k,bi,k,Q

(m)i,k

10: Set T (m)i,k = di,k/(E

(m)i S

(m)i,k )

11: end for12: end for

The time complexity of prediction, without parallelization, is O(M× ni). For the analysis of Sec-tion 4 we use M = 1000 Monte Carlo simulations per route; we also tried increasing to M = 10, 000,which yielded no measurable changes in the values of several summary statistics of the predictions, andconsequently no measurable improvements in the predictive accuracy. One can also calculate the estimatedMonte Carlo standard error for a particular route and set Mi to obtain a specified accuracy level for theprediction on that route i.

Prediction took an average of only 17 milliseconds per route, on a single processor. Commercial map-ping services require computation times of less than 50 milliseconds for a route recommendation. By takingthe two-stage approach to route recommendation discussed in the introduction (ranking candidate routes ac-cording to a probabilistic criterion), and by utilizing parallelization, the running time of our approach is fastenough for routing in commercial mapping services. It should even be efficient enough for more complexqueries involving multiple routes, like ranking points of interest by driving time.

11

4 Seattle Case Study

We obtained anonymized mobile phone GPS data gathered from Windows phones, for the Seattle metropoli-tan region during a time window in 2014 (Figure 1; the precise duration of the time period is kept confi-dential). No personal identifiers were available. We isolated vehicle trips and estimated the correspondingroutes as described in Section 2, yielding 145,657 trips that have distance along the road network of at least3 km. These trips have mean trip duration of 791 s, maximum trip duration of 6967 s, mean trip distance of11.4 km, and maximum trip distance of 97.2 km. We divide this dataset into a training dataset consisting ofthe 110,467 trips from the first three-quarters of the time period, and a test dataset consisting of the 35,190trips from the last one-quarter of the time period.

In Figure 2, we show the volume of trips in the training data, by the hour of the week in which the tripbegan. The volume dips overnight, and peaks associated with morning and evening rush hour are presenton weekdays. The volume of recorded trips is highest on Saturday and Sunday daytimes, most likely dueto higher usage of GPS-utilizing phone applications on weekends. We define the time bins b ∈ B basedin part on the changes in volume over the week as seen in Figure 2, yielding five bins: “AM Rush Hour”:weekdays 7-9 AM; “PM Rush Hour”: weekdays 3-6 PM; “Nighttime”: Sunday-Thursday nights 7 PM-6AM, Friday night 8 PM-9 AM, and Saturday night 9 PM-9 AM; “Weekday Daytime”: remaining timesduring weekdays; and “Weekend Daytime”: remaining times during weekends.

Hour of the Week

Ave

rage

Num

ber

of T

rips

0 24 48 72 96 120 144 168

Figure 2: Volume of trips in the Seattle data, by hour of the week. The scale of the y-axis is omitted forconfidentiality.

There are 221,980 directional network links in the study region. Of the link traversals in our dataset,32.8% are on highways, 34.8% are on arterials, 14.0% are on major roads, 12.1% are on surface streets, and6.3% are on other road classifications. We take the minimum number of observations per parameterizedlink (as used in (6)) to be L = 30. There are 24,990 directional links in the Seattle area road network thatsatisfy this criterion. Although this is only 11.3% of the links in the network, they account for 85.5% oflink traversals.

Using two congestion states (Q = 2), in Figure 3 we validate the estimated distribution of travel timeduring evening rush hour for some road links, based on our trained model. We focus on the links thathave the highest number of observed travel times Lj , showing the two most commonly observed highwaylinks (omitting links on the same section of highway), the most commonly observed “arterial” link, andthe most commonly observed “major road” link. For each of these links, Figure 3 gives the histogramof the travel times during evening rush hour from the training data, adjusted for the estimated trip ef-fect (i.e., the histogram of log Ti,k + log Ei). This adjustment is done so that we can overlay the esti-

12

mated density from the model in a way that’s comparable across trips (the curve in the plot, calculated as∑Qq=1 γRi,k,bi,k

(q)N(log di,k − µRi,k,bi,k,q, σ2Ri,k,bi,k,q)).

Figure 3 illustrates the multimodality of the distribution of travel time. Histograms restricting to par-ticular 15-minute time periods have a similar shape to those in Figure 3, indicating that the multimodalityis not caused by aggregating over time periods. The mixture of log-normals used by TRIP appears to fitthe observations well. By contrast, assuming a single gamma, normal, or log-normal distribution for traveltimes in a particular time period, as done for example in Hofleitner et al. (2012a,b); Westgate et al. (2013)and Hunter et al. (2013a), leads to poor model fit for this mobile phone data.

log( travel time )

Den

sity

1 2 3 4 5

0.0

1.0

travel time (s)

Den

sity

5 10 15 20

0.0

0.3

log( travel time )

Den

sity

3.5 4.0 4.5 5.0 5.5

0.0

1.0

2.0

travel time (s)

Den

sity

50 100 150

0.00

0.03

0.06

log( travel time )

Den

sity

1 2 3 4 5

0.0

0.4

0.8

travel time (s)

Den

sity

0 20 40 60 80 100

0.00

0.04

0.08

log( travel time )

Den

sity

1.0 1.5 2.0 2.5 3.0 3.5 4.0

0.0

1.0

2.0

travel time (s)

Den

sity

5 10 15 20

0.0

0.3

Figure 3: Validating the distribution of travel times estimated by TRIP on four road links. For each plot,the histogram shows the observed travel times during evening rush hour from the training data, adjusted byremoving the estimated trip effect (i.e., log Ti,k + log Ei). The curve is the corresponding estimated densityfrom TRIP.

Next we motivate use of the Markov model (4) for the congestion states Qi,k, and our choice to allowQi,k to depend on the trip rather than just the link and time. First, we will give evidence that the auto-correlation of log travel times within a trip is high and decreases with distance. Second, we show that thecorrelation of log travel times for vehicles traversing the same link at roughly the same time is not con-sistently high. These observations suggest that it is more appropriate to model the congestion level as aproperty of the individual trip, rather than as a property of the link and time. They also suggest the use of aMarkov model for Qi,k, which can capture association that decays with distance.

Our example corresponds to the first 10 links of the second route shown in Figure 4 (highway 520 West).In Figure 5, we illustrate the correlation of log travel times within and across trips on this sequence of links.The plots in the left column show that the autocorrelation of log travel times within the same trip is highand decreasing with distance. The plots in the middle column show that the correlation of log travel times isnot consistently high for pairs of distinct trips traversing the sequence of links in the same 15-minute timeperiod. Although the correlation appears high in one of the two plots in the middle column, the scatterplotsin the right column show that the travel times do not have a strong association across trips. In summary, thecongestion level experienced appears to depend on the trip, and not just the links traversed.

This effect occurs in part due to a high-occupancy vehicle (HOV) lane on this section of highway, so

13

Seattle Results

1

Prediction for 3.4 km section of 420 North, evening rush hour:

Histogram: test observations Density: prediction

Dawn Woodard, Cornell University

B

A N

driving time

Den

sity

100 200 300 400 500 600

0.00

00.

006

travel time (s)

Seattle Results

1

Prediction for 3.3 km section of 520 West, evening rush hour:



N

A

B

driving time

Den

sity

0 200 400 600 800

0.00

00.

015

travel time (s)

Seattle Results

1

Prediction for 0.9 km section of Lake City Way, evening rush hour:


N

B

A

N


driving timeD

ensi

ty

100 200 300 400

0.00

00.

015

travel time (s)

Figure 4: Three routes in the road network and their predicted travel time distributions during eveningrush hour, compared to observed travel times from the test data. Left: the routes; right: histograms of theobserved travel times and curves showing the predictive densities. Top: a 3.4 km section of highway 405North; middle: a 3.3 km route consisting of a section of 520 West followed by the exit ramp to 405 South;bottom: a 0.9 km section of northbound Lake City Way NE.

that vehicles traveling in the HOV lane experience less congestion than other vehicles. Additionally, thissection of 520 West is just before the interchange with highway 405. The traffic can be congested on 405,which can cause lane-specific congestion on 520 West extending significant distances from congested exits.Such effects from HOV lanes and congested interchanges are common throughout the road network, sothat vehicles traveling on the same link in the same time period (per GPS resolution) can experience verydifferent congestion levels depending on the choice of lane. Moreover, the choice of lane can be a functionof the intended route.

Next we summarize the model fit and parameter estimates. Figure 6 shows the distributions of theestimated travel speed parameters, across the links j in the road network. The distribution over links ofthe estimated mean log-speed parameters µj,b,q for the uncongested state (q = 2) shows two distinct peaks.These peaks correspond to highways (the right peak, at 60-65 mph) and non-highways (the left peak, at30-40 mph). The values of µj,b,q for the congested state q = 1 are typically much lower than for theuncongested state, and vary considerably across links. This supports the idea that congestion patterns varyconsiderably across links. These conclusions are consistent across time bins b, although there are somenoticeable differences in the µj,b,q across those bins. In particular, during evening rush hour the high-speedpeak is less pronouced, and the low-speed peak is shifted left, relative to nighttime. This corresponds to thefact that speeds tend to be lower during peak times.

The estimated variability σj,b,q tends to be lower in the uncongested state than in the congested state.This parameter varies more across links for the congested state than for the uncongested state, again reflect-ing differences between links in the speed characteristics associated with congestion.

The initial probability γj,b(2) of congestion varies widely among links, but is typically below 0.5. Not

14

0 1 2 3 4 5 6 7 8 9

0.0

0.2

0.4

0.6

0.8

1.0

Lag

Cor

rela

tion

0 1 2 3 4 5 6 7 8 9

0.0

0.2

0.4

0.6

0.8

1.0

Lag

Cor

rela

tion

●

●

●● ●●

●●●●

●

●

●●

●●●●●

●

●●

●●

●

●●●●●

●

●●

●●

●●

●

●

●

●

●●

●●●

●

●

●●●●

●

●●

●●

●●

●●●

●●

●●

●

●●●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●●

●

●●

●●

●●●●

●

●

●

●

●

●

●●

●

●

●●●

●

●●

●●

●●

●●

●●

●

●

●

●●

●

●

●

●

●●

●●●●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●●●

●

●

●●●● ●●

● ●

●●● ●

●

●

●

●

●

●

●

●

●

● ●●

●●

●

●

●

●●

●

●●

●●

●

●

●

●

●●

●●

●●●

●

●

●●

●●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●●●

●

●●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●●

●●●

●

●

●

●

●

●

●●●

●

●

●●

●

●

●●

●●

●

●

●

● ●●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●●●

●●

●

●

●● ●

●

●

●

●

●●

●

●

●●

●●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●●

●

●●

●

●●

●●

●

●

●●●

●

●

●

●

●

● ●

●

● ●●

●

●

●●●●●

●●

●

●

●

●

●

●

●●

●

●●

●

●●

●●●

●●

●

●●●

●●●●

●●

●

●

●●●

●

●●●

●

●

●

●

●

●

●●

●

●

●

●●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●●● ●

●●

●●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●●

●

●

●

●

●

●●●●●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●●

●

●

●●

Log(driving time) on eighth link

Log(

driv

ing

time)

on

tent

h lin

k

2 3 4 5

01

23

4

0 1 2 3 4 5 6 7 8 9

0.0

0.2

0.4

0.6

0.8

1.0

Lag

Cor

rela

tion

0 1 2 3 4 5 6 7 8 9

0.0

0.2

0.4

0.6

0.8

1.0

Lag

Cor

rela

tion

●

●●●

●

●

●●

●●●●

● ●●

●

●

●

●

●

●

●

●●

●

●

●

●● ●

●

●

●

●

●

●

●

●

●●●

●

●●

●

●

●

●

●

●

●●

●●

●●

●●

●

● ●●

●

●

●

●

●

●

●

●

●

●

●●

●●

●● ●

●

●●

● ●

● ●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●●

● ●●● ●

●

●

●

●

●

●

●● ●

●

●

●

●●

●●

●

●

●●

●

●

●

●

●

●

●

●● ●

●

●

●

●

●

●

●

●

●

●

● ●●●

●

●

●

●●

●

● ●

●

●

●●

●●

●

●

●

●●

●

●●● ●

●

●●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

● ●

●

●●

●

●

●

Log(driving time) for first trip in pairLog(

driv

ing

time)

for

seco

nd tr

ip in

pai

r

0 1 2 3 4

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

Figure 5: Association of travel times within and between trips, on a 10-link route. Top left: Correlation oflog travel time on the first link with that on each link, within the same trip. Bottom left: Correlation of thelast link with each link, within the same trip. Top middle: Correlation of the first link with each link, forpairs of distinct trips in the same 15-minute time period. Bottom middle: Correlation of the last link witheach link, for pairs of trips in the same time period. Top right: Scatterplot for the eighth vs. tenth links,within the same trip. Bottom right: Scatterplot for pairs of trips on the tenth link in the same time period.

surprisingly, γj,b(2) tends to be higher during evening rush hour than during the night. Regardless of thetime period, the probability Γj,b of transitioning between congestion states is typically low. This corre-sponds to the fact that congestion patterns tend to affect more than one link. Additionally, the probability oftransitioning from the congested state to the uncongested state tends to be higher than the chance of transi-tioning from uncongested to congested. This is consistent with the above observation that the uncongestedstate is more common than the congested state.

The distribution of estimated trip effects log Ei is also shown in Figure 6. This distribution is roughlysymmetric about zero and typical values are in the range -0.2 to 0.2. A value of 0.2, for example, meansthat the travel speeds in trip i were exp(0.2) = 1.22 times higher than expected.

Having interpreted the model estimates, we now evaluate predictive accuracy. Figure 4 shows three ofthe most common routes in the road network: two highway routes and one arterial route. Histograms ofthe observed travel times for these routes in the test data during evening rush hour are shown, along withthe predicted probability density of evening rush hour travel time obtained from TRIP. The observed traveltimes are obtained from all trips that include the route of interest; these typically are trips that have longerroutes, but for the purpose of this figure we focus on the travel time only for the portion corresponding tothe route of interest. The predictive densities match the histograms generally well, capturing the heavy rightskew and multimodal nature of the travel times, and accurately predicting the amount of variability of thedistributions.

Next we report predictive accuracy on the entire test dataset (35,190 trips on routes throughout thenetwork). We report the accuracy of deterministic predictions in Tables 1-2, and of interval predictions inFigure 7. To obtain a deterministic prediction, we use the geometric mean of the travel time distribution.This is more appropriate than the arithmetic mean, due to the heavy right skew of travel time distributions.To obtain an interval prediction with theoretical coverage 100(1 − α) where α ∈ (0, 1), we take the lowerand upper bounds of the interval to be the 100(α/2) and 100(1 − α/2) percentiles of the predicted travel

15

0 1 2 3 4

0.0

1.0

density.default(x = params[, "mu_00"])

µj

Den

sity

evening rush hour, congestedevening rush hour, uncongestednighttime, congestednighttime, uncongested

0 1 2 3 4

0.0

0.5

1.0

1.5

density.default(x = params[, "mu_00"])

µj

Den

sity


µj,b,q

0.0 0.5 1.0 1.5

02

46

8

density.default(x = sqrt(params[, "sigma2_00"]))

σj

Den

sity


0.0 0.5 1.0 1.5

02

46

density.default(x = sqrt(params[, "sigma2_00"]))

σj

Den

sity


σj,b,q

0.0 0.2 0.4 0.6 0.8 1.0

0.0

1.0

2.0

density.default(x = params[, "init_probs_00"])

Initial probability pj(0)(2) of congestion

Den

sity

evening rush hournighttime

0.0 0.2 0.4 0.6 0.8 1.0

0.0

1.0

2.0

density.default(x = params[, "init_probs_00"])

Initial probability pj(0)(2) of congestion

Den

sity

evening rush hournighttime

Initial probability γj,b(2) of congestion

0.00 0.05 0.10 0.15 0.20 0.25

05

10

density.default(x = params[, "markov_param_010"], bw = 0.01)

Transition probability pj

Den

sity

evening rush, uncongested to con.evening rush, congested to un.nighttime, uncongested to con.nighttime, congested to un.

0.00 0.05 0.10 0.15 0.20 0.25

05

1015

density.default(x = params[, "markov_param_010"], bw = 0.01)

Transition probability pj

Den

sity

evening rush, uncongested to con.evening rush, congested to un.nighttime, uncongested to con.nighttime, congested to un.

Transition probability Γj,b

−0.6 −0.4 −0.2 0.0 0.2 0.4 0.6

02

4

density.default(x = tripEffects$Log_Trip_Effect)

Estimated Trip Effect Ei

Den

sity

Trip effect log Ei

Figure 6: Parameter and latent variable estimates from TRIP. Top left: the distribution over links j ofthe estimated mean parameters µj.b.q, for two time bins b and two congestion states q. Middle left: thedistribution over links j of the standard deviations σj.b.q, for various b and q. Top right: the distributionover j of the initial probability γj,b(2) of congestion, for various b. Middle right: the distribution over j ofthe transition probabilities Γj,b between congestion states, for various b. Bottom: the distribution of the tripeffect log Ei, over trips i in the training data.

time distribution. An interval with theoretical coverage of 100(1 − α) means that the interval is intendedto include 100(1 − α) percent of future trips. For example, the 95% interval is given by the 2.5 and 97.5percentiles of the predicted travel time distribution.

In Tables 1-2 we report the geometric mean across trips of the percentage error of deterministic predic-tions (the absolute difference between the predicted and observed travel time, divided by the observed traveltime). We also report the mean absolute error. Third, we report the bias of the deterministic prediction onthe log scale: precisely, the mean across trips of the log of the prediction minus the log of the observedtravel time. The bias does not measure how far the predictions are from the observations, so a small amountof nonzero bias is not a problem. However, adjusting the deterministic predictions to remove the bias cansometimes improve the accuracy as measured using the percentage error or the mean absolute error. Specif-ically, in Table 2 we also report those accuracy measures after multiplying the deterministic prediction bythe exponent of -1 times the log-scale bias. Such a bias adjustment is also done by Westgate et al. (2016).

To measure the accuracy of interval predictions in Figure 7, we first report their empirical coverage,meaning the percentage of test trips for which the observed travel time is inside of the predictive inter-val. If the variability is predicted correctly, the empirical coverage is close to the theoretical coverage.

16

TRIP TRIP, TRIP, no TRIP, no Clearflow Linearno trip effect Markov model dependence regression

Geometric mean of 10.1% 9.6% 10.0% 9.8% 10.4% 12.8%|predicted - actual|/actualMean absolute error (s) 121.9 119.7 121.3 120.6 124.5 145.6Bias on log scale .030 .014 .028 .024 .033 -.005

Table 1: Accuracy of deterministic predictions for the Seattle test data. Since the deterministic predictioncaptures the center of the predicted distribution, these results are similar across the TRIP variants (whichdiffer according to how they model variability).

Additionally, we report the average width of the interval. As an example, if two methods both have 95%interval predictions with 95% empirical coverage, the method with the narrower intervals is preferred. Thisis because both methods are accurately characterizing variability, but one is taking better advantage of theinformation in the data to give a narrow predicted range (Gneiting et al., 2007).

●

●

●

●

●

●

●

●

●

●

●

●

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Theoretical Coverage of Predictive Interval

Em

piric

al C

over

age

on T

est D

ata

● IdealTRIPTRIP, no trip effectTRIP, no Markov modelTRIP, no dependenceLinear regressionClearflow

0.0 0.2 0.4 0.6 0.8 1.0

010

020

030

040

050

060

070

0


Ave

rage

Inte

rval

Wid

th (

s)

TRIPLinear regression

0.0 0.2 0.4 0.6 0.8 1.0

010

020

030

040

050

060

070

0


Ave

rage

Inte

rval

Wid

th (

s)

TRIP, heavily traveledLinear regression, heavily traveledTRIP, sparsely traveledLinear regression, sparsely traveled

Figure 7: Accuracy of interval predictions for the Seattle test data. Left: Coverage; Middle: Averageinterval width, for the methods that have approximately correct coverage; Right: Average interval width,broken down according to whether the links in the route are heavily traveled.

Tables 1-2 and Figure 7 compare TRIP to several alternatives. First, we compare to three simplifiedversions of TRIP that drop one or both of the types of dependence across links. One type of dependence isinduced by the trip effect Ei, so we consider dropping this term from the model. The other type is causedby the Markov model on Qi,k, so we consider replacing this with an independence model, Pr(Qi,k = q) =γRi,k,bi,k

(q) for all k ∈ {1, . . . ,Q}.Second, we compare the results to inferences from an adaptation of Clearflow (Microsoft Research,

2012). Clearflow models the distribution of travel time on each link in the network using Bayesian structuresearch and inference to integrate evidence from a large number of variables, including traffic flow sensormeasurements, speed limit, road classification, road topological distinctions, time of day, day of week, andproximity to schools, shopping malls, city centers, and parks. It was designed for accurate prediction ofmean real-time flows on all links of large metropolitan traffic networks. In practice, the Clearflow inferencesabout flows on individual links are used to guide route planning procedures that generate routes via summingthe times of the set of links of a trip between starting point and destination. For this reason Clearflow wasnot targeted at modeling the distribution of travel time on entire routes. However, we can combine Clearflowwith an assumption of independence across links, in order to produce distribution predictions on routes forthe purposes of comparison.

17

Heavily Traveled Sparsely TraveledTRIP Clearflow Linear TRIP Clearflow Linear

regression regressionGeometric mean of 9.8% 10.4% 12.7 % 13.7% 11.3% 13.7%|predicted - actual|/actualGeometric mean of 9.4% 9.6% 12.9% 11.0% 10.2% 12.4%|predicted - actual|/actualafter bias correctionMean absolute error (s) 122.5 126.9 148.4 114.8 100.6 117.7Mean absolute error 120.6 124.3 149.0 101.3 96.4 110.3after bias correction (s)Bias on log scale .019 .029 -.013 .108 .066 .077

Table 2: Accuracy of deterministic predictions, broken down according to whether the links in the route areheavily traveled.

Finally, we compare to a regression approach that models the travel time for the entire trip, as done inmethods for predicting ambulance travel times (Budge et al., 2010; Westgate et al., 2016). In particular, weuse a standard linear regression model where the outcome is the log of the trip travel time, and the predictorvariables are: (a) the log of route distance; (b) the time bin bi,1 in which the trip begins; and (c) the log of thetravel time according to the speed limit. We include an interaction term between (b) and (c), meaning thatthe linear slope for (c) is allowed to depend on the value of (b). The assumptions of the linear regressionmodel hold approximately; for example, scatterplots of the log travel time and the variables (a) and (c) showan approximately linear relationship.

As seen in Table 1, the predictions differ from the actual values by 9.6-10.4% for all the methods exceptlinear regression, which has error of 12.8%. The same conclusions hold when considering mean absoluteerror instead of percentage error. The accuracy of TRIP is slightly better than Clearflow overall, despitethe fact that Clearflow harnesses a larger number of variables than those considered by TRIP in its currentform. The bias is small for all of the methods (the largest being .033, which corresponds to a factor ofexp{.033} = 1.034 in travel time, i.e. a bias of 3.4%).

In Table 2 we break down the accuracy of deterministic predictions according to whether the links inthe route are heavily traveled; this is defined to mean that over half of the links in the route have at least 30observations in the training data. Out of the 35,190 trips in the test data, all but 3,197 are on such heavilytraveled routes. The accuracy of TRIP is considerably better than linear regression on heavily traveledroutes, and the same or better than linear regression on the sparsely traveled routes. TRIP performs slightlybetter than Clearflow on heavily traveled routes, whereas on sparsely traveled routes Clearflow is moreaccurate. We believe that this is due to the fact that Clearflow uses a large number of variables describingthe link, which provide relevant information for predicting the speed. The bias is larger for the sparselytraveled routes than for the heavily traveled ones, so we provide accuracy results both before and after biascorrection; the qualitative conclusions are unaffected by this adjustment.

Figure 7 shows that, out of the methods considered, only TRIP and linear regression have predictiveinterval coverage that is close to correct. The other methods have dramatically lower coverage, Clearflow’scoverage of 95% intervals being 69.8%. Clearflow and the simplest version of TRIP have similar coveragebecause they both assume independence of travel time across the links of a trip. For the simplified versionsof TRIP that incorporate one out of the two kinds of dependence, the coverage is much better than thatof the methods that assume independence, but still well below the desired value. Figure 7 also showsthat the interval predictions from TRIP are 19-21% narrower on average than those from linear regression,

18

providing evidence that TRIP is substantially better for interval prediction. Finally, Figure 7 shows thatTRIP’s advantage relative to linear regression in interval prediction is greatest on heavily traveled routes;on sparsely traveled routes the performance of the two methods in interval prediction is similar.

The fact that TRIP provides little benefit relative to linear regression on the sparsest parts of the roadnetwork (which account for a small proportion of trips in the mobile phone data) is not surprising sincelinear regression is a simpler model with fewer parameters. This conclusion is consistent with previousresults for emergency vehicles (Westgate et al., 2016). Indeed, regression-style approaches are typicallyused for emergency vehicles, due to the sparsity of data.

5 Conclusions

We have introduced a method (TRIP) for predicting the probability distribution of travel time on arbitraryroutes in the road network, at arbitrary times. We evaluated TRIP on a case study using mobile phonedata from the Seattle metropolitan region. Based on a complexity analysis and on the running time of thealgorithms for this case study, we argue that TRIP is computationally feasible for the continental-scale roadnetworks and high-volume data of commercial mapping services.

TRIP’s deterministic predictions are more accurate on heavily traveled routes, although slightly lessaccurate on sparsely traveled routes, than Microsoft’s commercially fielded system (Clearflow). The intervalpredictions from TRIP are much better. Clearflow’s consideration of flows on a segment by segment basisis valuable for use with current route planning procedures, which consider the road speeds on separatesegments in building routes. However, such independent handling of inferences about segments can lead tounderprediction of route-specific variability. TRIP solves this issue by accurately capturing dependencies intravel time across the links of the trip. Although a linear regression approach yields reasonable accuracy ofinterval predictions, it gives worse deterministic predictions than TRIP. To our knowledge TRIP is the firstmethod to provide accurate predictions of travel time reliability for complete, large-scale road networks.

Future work includes extending TRIP to incorporate additional variables, including those used in Clearflowlearning and inference. For example, this would allow TRIP to take into account real-time informationabout traffic conditions, as measured using data from sensors installed in highways, or average measuredGPS speeds from mobile phones during the current time period. This extension has the potential to providenarrower distribution forecasts and predictive intervals, and even more accurate deterministic estimates.

There is also opportunity to employ active information gathering methods to guide both selective real-time sensing of different portions of a road network and the bulk collection of data to reduce uncertaintyabout the flows over segments and routes. There has been related prior work on the use of active sensingfor reducing uncertainty about the travel time on segments in a demand-weighted manner (Krause et al.,2008). The work considers the probabilistic dependencies across a city-wide traffic network and the valueof sensing from different regions for reducing uncertainty across the entire road network. We foresee theuse of similar methods in combination with TRIP to guide the optimal collection of data.

Acknowledgements

This work was supported by Microsoft Research.

19

References

Besag, J. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, Series B, 48:259–302, 1986.

Bilmes, J. A. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaus-sian mixture and hidden Markov models. Technical Report TR-97-021, International Computer ScienceInstitute, 1997.

Budge, S., Ingolfsson, A., and Zerom, D. Empirical analysis of ambulance travel times: The case of Calgaryemergency medical services. Management Science, 56:716–723, 2010.

Cousineau, D. and Helie, S. Improving maximum likelihood estimation with prior probabilities: A tuto-rial on maximum a posteriori estimation and an examination of the Weibull distribution. Tutorials onQuantitative Methods for Psychology, 9(2):61–71, 2013.

Delling, D., Goldberg, A. V., Pajor, T., and Werneck, R. F. Customizable route planning in road networks.Transportation Science, 2015. In press.

Erkut, E., Ingolfsson, A., and Erdogan, G. Ambulance location for maximum survival. Naval ResearchLogistics (NRL), 55:42–58, 2007.

Gelman, A. Prior distributions for variance parameters in hierarchical models (comment on article by Brownand Draper). Bayesian Analysis, 1:515–534, 2006.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. Bayesian Data Analysis.CRC Press, Boca Raton, FL, 3rd edition, 2013.

Gneiting, T., Balabdaoui, F., and Raftery, A. E. Probabilistic forecasts, calibration, and sharpness. Journalof the Royal Statistical Society, Series B, 69:243–268, 2007.

Grava, C., Gacsadi, A., Gordan, C., Grava, A.-M., and Gavrilut, I. Applications of the Iterated ConditionalModes algorithm for motion estimation in medical image sequences. In International Symposium onSignals, Circuits, and Systems. IEEE, 2007.

Hofleitner, A., Herring, R., Abbeel, P., and Bayen, A. Learning the dynamics of arterial traffic from probedata using a dynamic Bayesian network. IEEE Transactions on Intelligent Transportation Systems, 13:1679–1693, 2012a.

Hofleitner, A., Herring, R., and Bayen, A. Arterial travel time forecast with streaming data: A hybridapproach of flow modeling and machine learning. Transportation Research Part B, 46:1097–1122, 2012b.

Horvitz, E., Apacible, J., Sarin, R., and Liao, L. Prediction, expectation, and surprise: Methods, designs,and study of a deployed traffic forecasting service. In Proceedings of the Conference on Uncertainty inArtificial Intelligence, pp. 275–283, Arlington, Virginia, 2005. AUAI Press.

Hunter, T., Herring, R., Abbeel, P., and Bayen, A. Path and travel time inference from GPS probe vehicledata. In Neural Information Processing Systems, Vancouver, Canada, December 2009.

Hunter, T., Das, T., Zaharia, M., Abbeel, P., and Bayen, A. Large-scale estimation in cyberphysical systemsusing streaming data: A case study with arterial traffic estimation. IEEE Transactions on AutomationScience and Engineering, 10:884–898, 2013a.

20

Hunter, T., Hofleitner, A., Reilly, J., Krichene, W., Thai, J., Kouvelas, A., Abbeel, P., and Bayen, A.Arriving on time: Estimating travel time distributions on large-scale road networks. Technical ReportarXiv:1302.6617, arXiv, 2013b.

Hunter, T., Abbeel, P., and Bayen, A. M. The path inference filter: Model-based low-latency map matchingof probe vehicle data. IEEE Transactions on Intelligent Transportation Systems, 15(2):507–529, 2014.

James, G., Witten, D., Hastie, T., and Tibshirani, R. An Introduction to Statistical Learning with Applica-tions in R. Springer, New York, 2013.

Jenelius, E. The value of travel time variability with trip chains, flexible scheduling, and correlated traveltimes. Transportation Research Part B, 46:762–780, 2012.

Jenelius, E. and Koutsopoulos, H. N. Travel time estimation for urban road networks using low frequencyprobe vehicle data. Transportation Research Part B, 53:64–81, 2013.

Jenelius, E. and Koutsopoulos, H. N. Probe vehicle data sampled by time or space: Implications for traveltime allocation and estimation. Transportation Research Part B, 71:120–137, 2015.

Krause, A., Horvitz, E., Kansal, A., and Zhao, F. Toward community sensing. In Proceedings of ISPN 2008,International Conference on Information Processing in Sensor Networks, St. Louis, Missouri, 2008.

Lee, S. X. and McLachlan, G. J. EMMIXuskew: An R package for fitting mixtures of multivariate skew tdistributions via the EM algorithm. Journal of Statistical Software, 55(12):1–22, 2013.

Liu, C. ML estimation of the multivariate t distribution and the EM algorithm. Journal of MultivariateAnalysis, 63:296–312, 1997.

Meng, X.-L. and Rubin, D. B. Maximum likelihood via the ECM algorithm: A general framework.Biometrika, 80:267–278, 1993.

Microsoft Research. Predictive analytics for traffic: Machine learning and intelligence for sensing, infer-ring, and forecasting traffic flows. http://research.microsoft.com/en-us/projects/clearflow, 2012.

Newson, P. and Krumm, J. Hidden Markov map matching through noise and sparseness. In 17th ACMSIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 336–343,Seattle, WA, November 2009. Association for Computing Machinery.

Rahmani, M., Jenelius, E., and Koutsopoulos, H. N. Non-parametric estimation of route travel time distri-butions from low-frequency floating car data. Transportation Research Part C, 58:343–362, 2015.

Ramezani, M. and Geroliminis, N. On the estimation of arterial route travel time distribution with Markovchains. Transportation Research Part B, 46:1576–1590, 2012.

Russell, S. and Norvig, P. Artificial Intelligence: A Modern Approach. Pearson Education, U.K., 3rdedition, 2009.

Samaranayake, S., Blandin, S., and Bayen, A. A tractable class of algorithms for reliable routing in stochas-tic networks. Transportation Research Part C, 20:199–217, 2012.

21

http://research.microsoft.com/en-us/projects/clearflow

http://research.microsoft.com/en-us/projects/clearflow

Texas Transportation Institute. Travel time reliability: Making it there on time, every time. Technical report,U.S. Department of Transportation, 2015. http://www.ops.fhwa.dot.gov/publications/tt_reliability/index.htm.

Westgate, B. S., Woodard, D. B., Matteson, D. S., and Henderson, S. G. Travel time estimation for ambu-lances using Bayesian data augmentation. Annals of Applied Statistics, 7:1139–1161, 2013.

Westgate, B. S., Woodard, D. B., Matteson, D. S., and Henderson, S. G. Large-network travel time dis-tribution estimation, with application to ambulance fleet management. In Press, European Journal ofOperational Research, 2016.

Woodard, D. B. and Rosenthal, J. S. Convergence rate of Markov chain methods for genomic motif discov-ery. Annals of Statistics, 41:91–124, 2013.

Work, D. B., Blandin, S., Tossavainen, O.-P., Piccoli, B., and Bayen, A. M. A traffic model for velocitydata assimilation. Applied Mathematics Research eXpress, 2010(1):1–35, 2010.

22

http://www.ops.fhwa.dot.gov/publications/tt_reliability/index.htm

http://www.ops.fhwa.dot.gov/publications/tt_reliability/index.htm

Predicting Travel Time Reliability using Mobile Phone GPS Data€¦ · travel time reliability to a...

Documents

Transcript of Predicting Travel Time Reliability using Mobile Phone GPS Data€¦ · travel time reliability to a...