Estimating Through Trip Travel without External Surveys (ceprofs.tamu.edu/mburris/Papers/Through Trip...)

Estimating Through Trip Travel without External Surveys

Eric S. Talbot
Resource Systems Group, Inc.
55 Railroad Row, White River Junction, VT 05001
Ph: 802-295-4999 Fax: 802-295-1006
[email protected]

Mark W. Burris, Ph.D, P.E.
E.B. Snead Associate Professor
Zachry Department of Civil Engineering, Texas A&M University
3136 TAMU, College Station, TX 77843-3136
Ph: 979-845-9875 Fax: 979-845-6481
Corresponding author: Mark W. Burris, [email protected]

and

Steve Farnsworth
Associate Research Scientist
Texas Transportation Institute
3135 TAMU, College Station, TX 77843-3135
Ph: (979) 862-4927 Fax: (979) 845-7548
[email protected]

Paper Submitted for Publication and Presentation at the Transportation Research Board Annual Meeting, 2011

July 2010

Words: 5746 Text + [(4 Tables + 3 Figures) × 250 words] = 7496 words.


  • Talbot, Burris and Farnsworth 1

    ABSTRACT

    Since through trips can be a significant portion of travel in a study area, estimating them is an important part of travel demand modeling. In the past, through trips have often been estimated using external surveys. Recently, external surveys were suspended in Texas, so Texas transportation planners need a way to estimate through trips without using external surveys. Previous research has focused on study areas with a population of less than 200,000, but many Texas study areas have a population of more than 200,000. This research developed a set of two logit models to estimate through trips for a wide range of study area sizes. The first model estimates the portion of all trips at an external station that are through trips. The second model estimates the external stations where the trips originated. The models produce separate results for commercial and non-commercial vehicles, and these results can be used to develop through trip tables. For predictor variables, the models use results from a simple gravity model; the average daily traffic (ADT) at each external station as a proportion of the total ADT at all external stations; the number of turns on the routes between external station pairs; and whether the route passes through the study area and does not pass through any other external stations. Evaluations of the performance of the models showed that the predictions fit the observations reasonably well, indicating that the models can be useful for practical applications.


    INTRODUCTION

    Through trips are an important part of travel demand modeling. To develop and calibrate the through-trip component of travel demand models, Metropolitan Planning Organizations (MPOs) use data from external surveys, which gather information from travelers entering or leaving the study area. In the past, transportation planners generally conducted external surveys using the roadside interview technique at locations where traffic enters and exits the study area (external stations).

    Although the roadside interview method is an effective way to collect information on through trip travel patterns, it also has potential drawbacks. First, the roadside interview method may create an unsafe situation for drivers because they may not expect to encounter stopped or slowed traffic at the location where the survey is conducted. Second, roadside interviews cause delays for drivers who are surveyed, and may also cause delays for drivers who are not surveyed. Third, some drivers consider being stopped for a survey an invasion of their privacy. Fourth, the roadside interview method is expensive, with the cost for a complete set of external surveys for a study area usually exceeding $100,000 (Texas Transportation Institute, unpublished internal reports, 2001-2006).

    For some or all of these reasons, in early 2008 the Texas Department of Transportation suspended external surveys throughout the state. Texas MPOs now need a way to estimate through trips without recent external survey data. Previous research efforts developed models for estimating through trip patterns without external survey data. However, most of these models focused on study areas with fewer than 50,000 people, while MPO study areas have more than 50,000 people. In addition, most of the previous research used linear regression, which may not be the best approach because the variable of interest is a proportion bounded between 0 and 1, rather than an unbounded continuous quantity.

    To improve on these previous models, this research developed a system of two models to estimate through trips without the benefit of external survey data. Each model was developed using logistic regression, which is appropriate for data where the responses are proportions. The first model is a binary logit model which estimates the proportion of trips exiting the study area at an external station that are through trips. The second model is a multinomial logit model which distributes these through trips between all the external stations where a through trip could have entered the study area. These models were developed for study areas with 50,000 to 6 million people. Model evaluation shows that these models perform well and will be useful for estimating through travel.

    LITERATURE REVIEW

    Modlin developed a set of multiple linear regression equations to estimate through trips for small study areas (study areas with less than 50,000 people). He used the roadway functional classification, average daily traffic, percent trucks, and percent pickups and panel vans at each external station, and the population of the study area as explanatory variables. Modlin's model has two stages. The first stage estimates the percent of all trips at each external station that are through trips. The second stage distributes the through trips between external stations (1). A few years later, Pigman published a set of similar equations for small study areas (2), and Modlin followed up with a new set of regression equations. This second set of equations was similar to his first, but also included route continuity, which is a binary variable signifying whether or not two external stations are on the same highway route (3).


    The Modlin and Pigman models considered the study area in isolation from the rest of the world. More recent research has worked to develop better models by incorporating the geographic and economic context of the study area into the model.

    Anderson reported on the results of an evaluation of the effectiveness of simple gravity models for estimating through trips. Anderson tested three such models using a city in Iowa with four external stations and a population of 8,500 (4). Anderson later applied one of the models to three small cities in Alabama (5).

    Horowitz developed a model which assigns a “catchment” area from the region outside of the study area to each external station, and calculates a weight factor for trips between two external stations by calculating the probability that a line connecting two points within their catchment areas passes through the study area, or crosses a barrier to travel between catchment areas. These weight factors are then used to estimate through trips using the procedure outlined in the first Quick Response Freight Manual (6,7).

    Han combined the work of Modlin, Anderson and Horowitz. Like the Modlin model, Han's model is a set of regression equations, and uses information about the traffic and roadway at the external station, and about the study area, as explanatory variables. However, the Han model also includes a simple gravity model and Horowitz weights as explanatory variables. Han's model was developed for small and medium sized study areas with up to 200,000 people (8,9).

    Anderson also developed Modlin-like regression equations, and included a variable to signify the presence of a nearby major city (10).

    Martchouk and Fricker recently proposed modeling through trips using logistic regression rather than linear regression, as all previous regression-based models had done. They developed two alternative multinomial logit models which each predict the percent through trips and the distribution of through trips between external stations. The authors also tested a nested logit model, but found the nested structure unnecessary. These models were developed using results from regional model subarea analyses for 15 small urban areas in Indiana. At least one of the models was tested using results from license plate origin-destination surveys in two urban areas in Indiana. The model performed better than Modlin’s model, Anderson’s model, and subarea analysis for one urban area, but performed worse than Modlin’s and Anderson’s models for the second urban area. The models use the average daily traffic at each external station and whether the route between two external stations is continuous as predictor variables. These models provided the basis for the modeling approach for this research (11).

    These efforts yielded insight into the variables, characteristics and methods that may prove useful in estimating through trips in larger urban areas. Therefore, many of the variables discussed above were investigated for their potential use in the models developed here.

    DATA COLLECTION

    Model development required a significant amount of data, including external survey data, traffic data, roadway data, demographic data, interaction score data, and measures of external station separation. The external survey data was used as the source for observed through trip data, and it came from external surveys performed in 13 study areas in Texas between 2001 and 2006, listed in Table 1. Average values of key variables at these external stations are included in Table 2. Other data were used as sources for the through trip predictor variables that were considered for inclusion in the models. These predictor variable data needed to be available without the use of an external survey.


    Traffic data consisted of information on the average daily traffic and the proportion of large vehicles at each external station. This data was obtained from automated vehicle classification counts at each of the external stations. This data was used as a potential predictor, but is also necessary for expanding the model results to represent measured volumes. Roadway data included the number of lanes at each external station, whether the road at the external station was divided, and whether the road at the external station was a limited-access facility. The roadway data came from examining satellite imagery. The demographic data came from the U.S. Census Bureau and the U.S. Bureau of Labor Statistics, and included the population, employment, average income and surface area of the study area.

    The interaction score quantifies the volume of traffic between urban areas, then assigns that estimated quantity to the appropriate study area external stations. The data for calculating the interaction score came from the U.S. Census Bureau, which publishes population estimates for each Census urban area and urban cluster, and also provides a GIS file with polygons for all urban areas and clusters throughout the United States (12). Least time routes between pairs of urban area centroids and between centroid-external station pairs were extracted from a web routing service and analyzed programmatically to determine which of the study area external stations, if any, the route passes through.

    The interaction score is defined by

    $$INT_{ij} = \sum_{v \in U} \; \sum_{w \in U,\, w \neq v} \frac{P_v P_w}{10^{6} D_{vw}^{2}} \, f_{vwij} \qquad (1)$$

    and

    $$INTLCL_{j} = \sum_{v \in U} \; \sum_{w \in U,\, w \neq v} \frac{P_v P_w}{10^{6} D_{vw}^{2}} \, g_{vwj} \qquad (2)$$

    where

    $INT_{ij}$ = the through interaction score for entry external station i and exit external station j

    $INTLCL_{j}$ = the local interaction score for exit external station j

    $U$ = the set of each U.S. Census Bureau urban area and urban cluster which has its centroid within the study area; or has its centroid within 50 miles of the study area boundary; or has a population of at least 50,000 people and has its centroid within 250 miles of the urban area boundary

    $v, w$ = indices for the urban areas in the set of urban areas U

    $P_v, P_w$ = the populations of v and w

    $D_{vw}$ = the non-congested least time route distance in miles from the centroid of v to the centroid of w

    $f_{vwij}$ = a binary variable which is 1 if the non-congested least time route from the centroid of v to the centroid of w passes through external stations i and j, and if the route segment between i and j is valid. The route segment is considered valid if (1) it passes through i before j; and (2) it passes through the study area; and (3) it crosses the study area boundary only at i and j. Otherwise, the variable is 0.

    $g_{vwj}$ = a binary variable which is 1 if the non-congested least time route from the centroid of v to the centroid of w passes through external station j, and if the centroid of v is inside the study area; and if the route segment between v and j is valid. The route segment is considered valid if it crosses the study area boundary only at j. Otherwise, the variable is 0.
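    The accumulation behind the two interaction scores can be sketched in a few lines of Python. This is an illustration only: it assumes the gravity term in Equations 1 and 2 reads P_v P_w / (10^6 D_vw^2), and it assumes the routing analysis has already decided, for each ordered urban-area pair, which external stations (if any) form a valid through or local segment. The function name `interaction_scores` and the dictionary layout of `pairs` are hypothetical, not part of the original study.

```python
from collections import defaultdict


def interaction_scores(pairs):
    """Accumulate through (INT_ij) and local (INTLCL_j) interaction scores.

    Each item in `pairs` is a dict (hypothetical layout) for one ordered
    urban-area pair (v, w):
      pop_v, pop_w : populations P_v and P_w
      dist         : least-time route distance D_vw in miles
      entry, exit  : stations i and j of a valid through segment (f = 1);
                     entry=None with exit=j marks a valid local segment (g = 1);
                     exit=None means the pair contributes to neither score.
    """
    through = defaultdict(float)  # keyed by (i, j)
    local = defaultdict(float)    # keyed by j
    for p in pairs:
        if p["exit"] is None:
            continue  # f and g are both 0 for this pair
        # Assumed reading of the gravity term: P_v * P_w / (10^6 * D_vw^2)
        gravity = p["pop_v"] * p["pop_w"] / (1e6 * p["dist"] ** 2)
        if p["entry"] is None:
            local[p["exit"]] += gravity
        else:
            through[(p["entry"], p["exit"])] += gravity
    return dict(through), dict(local)


# Made-up example data, not values from the study
pairs = [
    {"pop_v": 100_000, "pop_w": 200_000, "dist": 100.0, "entry": "A", "exit": "B"},
    {"pop_v": 50_000, "pop_w": 200_000, "dist": 50.0, "entry": None, "exit": "B"},
    {"pop_v": 50_000, "pop_w": 80_000, "dist": 40.0, "entry": None, "exit": None},
]
INT, INTLCL = interaction_scores(pairs)
```

    Keeping the route-validity decisions outside the function mirrors the text, where the binary variables f and g are determined from the routing results before any score is summed.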

    The model development process tested several variables which were based on the interaction score. The variables that were included in the final model are $PINTTH_j$, $INTTL1_j$, and $PINT1_{ij}$. The variable $PINTTH_j$ is the proportion of through trips at the external station as predicted by the interaction score, and it is defined by

    $$PINTTH_j = \frac{\sum_{q \in E,\, q \neq j} INT_{qj}}{INTLCL_j + \sum_{q \in E,\, q \neq j} INT_{qj}} \qquad (3)$$

    where

    E = the set of all external stations in the study area

    j = an index for the external station for which the estimation is being made

    q = an index for each of the other external stations where through trips can enter the study area

    If the denominator of this fraction is zero, then the value of $PINTTH_j$ is defined to be zero.
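    A minimal Python sketch of Equation 3, including the zero-denominator rule, is given here for illustration. The function name `pintth` and the dictionary inputs (through scores keyed by entry and exit station, local scores keyed by exit station) are assumptions made for this example.

```python
def pintth(j, through_scores, local_scores):
    """Share of the interaction score at exit station j that is through travel.

    through_scores: dict mapping (entry station q, exit station j) -> INT_qj
    local_scores:   dict mapping exit station j -> INTLCL_j
    """
    through = sum(
        score
        for (q, exit_station), score in through_scores.items()
        if exit_station == j and q != j
    )
    total = local_scores.get(j, 0.0) + through
    # The text defines the variable as zero when the denominator is zero
    return through / total if total > 0 else 0.0


# Made-up scores: station B has through score 2.0 and local score 4.0
example_through = {("A", "B"): 2.0}
example_local = {"B": 4.0}
share_b = pintth("B", example_through, example_local)
share_c = pintth("C", example_through, example_local)  # no scores at all
```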

    The variable $INTTL1_j$ indicates whether the external station j has any interaction scores, and it is defined by

    $$INTTL1_j = \begin{cases} 1, & \text{if } INTLCL_j + \sum_{q \in E,\, q \neq j} INT_{qj} > 0 \\ 0, & \text{if } INTLCL_j + \sum_{q \in E,\, q \neq j} INT_{qj} = 0 \end{cases} \qquad (4)$$

    The variable $PINT1_{ij}$ indicates whether the external station j has any interaction scores with a route that enters the study area at external station i, and it is defined by

    $$PINT1_{ij} = \begin{cases} 1, & \text{if } INT_{ij} > 0 \text{ and } \sum_{q \in E,\, q \neq j} INT_{qj} > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$
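    The two indicator variables can be sketched the same way. The function names and dictionary inputs below are hypothetical; the second condition of Equation 5 is implied by the first whenever scores are non-negative, so the sketch for PINT1 tests only the score for the pair (i, j).

```python
def inttl1(j, through_scores, local_scores):
    """1 if exit station j has any interaction score (through or local), else 0."""
    through = sum(
        score
        for (q, exit_station), score in through_scores.items()
        if exit_station == j and q != j
    )
    return 1 if local_scores.get(j, 0.0) + through > 0 else 0


def pint1(i, j, through_scores):
    """1 if station j has a through interaction score with entry station i, else 0."""
    return 1 if through_scores.get((i, j), 0.0) > 0 else 0


# Made-up scores for illustration
example_through = {("A", "B"): 2.0}
example_local = {"B": 4.0, "D": 1.5}
flags = {
    "B": inttl1("B", example_through, example_local),  # through and local scores
    "C": inttl1("C", example_through, example_local),  # no scores
    "D": inttl1("D", example_through, example_local),  # local score only
}
```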

    The measures of external station separation were characteristics of least-time routes between external station pairs and between urban area centroid-external station pairs. These characteristics included the number of turns and freeway ramps on the route, the route distance and travel time, and whether or not the route passes through any external stations at any point besides the beginning and ending of the route (route validity). The means and standard deviations for these measures of external station pair separation in the data set, along with those of the small and large vehicle ADTs at the external stations, are presented in Table 2.

    METHOD

    This research developed a system of two models for estimating through trips in study areas using data that can be obtained without external surveys of drivers. The model development process for each model followed the same general outline:

    1. Choose a subset of the candidate predictor variables using forward selection to form the preliminary model

    2. Evaluate the preliminary model using model diagnostics

    3. Choose a subset of the variables from the preliminary model to form a final model

    4. Refine the final model using variable interactions and transformations

    The preliminary variable selection was made using forward selection (as described by Hosmer and Lemeshow [13]). Usually, the forward selection process continues until no variable can be added with a p-value smaller than some pre-specified value, such as 0.05 or 0.01. However, preliminary analysis showed that following such a rule would result in selecting most of the candidate variables, and such a large model is not desirable, since a large model would have higher data collection costs than a smaller model.

    To help limit the number of variables selected, the model development process used the Akaike information criterion (AIC) (13); the Bayesian information criterion (BIC) (13); adjusted rho-square with respect to the constants only model ($\bar{\rho}_C^2$) (14) (for the through trip split model) or adjusted rho-square with respect to the completely random model ($\bar{\rho}_0^2$) (14) (for the through trip distribution model); and root mean square error (RMSE) of the observed proportions compared to the estimated proportions. Normally, the model with the best value of one of these criteria would be chosen. However, preliminary analysis showed that following this rule would result in using most of the predictor variables. Rather than the absolute value of each criterion, the rate of change of each criterion was used as a guide for forward selection. With this rule, a significant decrease in the rate of improvement of the criteria would suggest ending forward selection.
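    The rate-of-change rule can be illustrated with a short Python sketch. The AIC and BIC formulas are the standard ones; the function `variables_before_slowdown` and its `drop_ratio` threshold are hypothetical, since the text describes the stopping point as a judgment about a significant slowdown rather than a fixed numeric cutoff. The sketch assumes a lower-is-better criterion such as AIC, BIC, or RMSE; for rho-square the values would be negated first.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def variables_before_slowdown(criterion_by_size, drop_ratio=0.25):
    """Return the model size after which the criterion stops improving quickly.

    criterion_by_size[k] is the criterion value (lower is better) for the
    forward-selection model with k variables. Selection is flagged to end at
    size k when the gain from adding variable k + 1 is less than `drop_ratio`
    times the gain from adding variable k (hypothetical rule of thumb).
    """
    gains = [
        criterion_by_size[k] - criterion_by_size[k + 1]
        for k in range(len(criterion_by_size) - 1)
    ]
    for k in range(1, len(gains)):
        if gains[k] < drop_ratio * gains[k - 1]:
            return k
    return len(criterion_by_size) - 1


# Made-up AIC values for models with 0, 1, 2, ... variables
example_aic = [1000.0, 800.0, 650.0, 640.0, 635.0, 632.0]
suggested_size = variables_before_slowdown(example_aic)
```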

    Before continuing to the final variable selection, the model development process checked each observation using model diagnostics. For the through trip split model the diagnostics Δχ2m, ΔDm, and Δβm as defined by Hosmer and Lemeshow (13) were used. For the through trip distribution model the diagnostic process consisted of checking and validating observations with extremely low probabilities as estimated by the fitted model. This process was recommended by Ben-Akiva and Lerman (15), since more formal diagnostic procedures for validating these observations are not as clearly documented for multinomial models as for binary models (13).

    The final variable selection was based on the following experiment: Take a random group of study areas from the set of all 13 study areas, then fit a model using the results from the external surveys for that group of study areas. Repeat this a number of times and then compare the parameter estimates from each repetition. Variables whose parameter estimates change relatively little between repetitions of the experiment are better predictors than variables whose parameter estimates change a great deal.

    The results from this approach roughly give some of the same information as the standard error, since both measure the variability of the parameter estimates. However, this approach has the advantage that it measures parameter estimate variability by sampling whole study areas at a time, which gives confidence that the results can be extended to an entirely new study area.

    To carry out the experiment, the set of 13 study areas was randomly divided into four groups. For each new model, one group of study areas was removed from the dataset, and a model was fit to the remaining data, to produce four sets of parameter estimates for the smaller number of cities. Then the relative change in the parameter estimates from the parameter estimates for the complete dataset was calculated.
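    The stability experiment can be sketched as follows, assuming the model fitting itself is done elsewhere. The helper names, the random seed, the parameter values, and the 25 percent screening threshold are all hypothetical and serve only to show the bookkeeping: divide the study areas into four groups, refit with each group left out, and express each refit estimate as a relative change from the full-data estimate.

```python
import random


def split_into_groups(study_areas, n_groups=4, seed=0):
    """Randomly divide the study areas into n_groups of nearly equal size."""
    shuffled = list(study_areas)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[k::n_groups] for k in range(n_groups)]


def relative_changes(full_estimates, refit_estimates):
    """Relative change of each refit estimate from the full-data estimate."""
    return {
        name: [(refit[name] - full) / abs(full) for refit in refit_estimates]
        for name, full in full_estimates.items()
    }


def stable_variables(changes, max_abs_change):
    """Variables whose estimates never move more than max_abs_change."""
    return [
        name
        for name, values in changes.items()
        if max(abs(v) for v in values) <= max_abs_change
    ]


groups = split_into_groups(range(1, 14))  # 13 study areas, 4 groups

# Made-up estimates: one full-data fit and four leave-one-group-out refits
full = {"x1": 2.0, "x2": -1.0}
refits = [
    {"x1": 2.2, "x2": -1.5},
    {"x1": 1.9, "x2": -0.4},
    {"x1": 2.1, "x2": -1.1},
    {"x1": 1.8, "x2": -1.6},
]
changes = relative_changes(full, refits)
kept = stable_variables(changes, max_abs_change=0.25)
```

    Sampling whole study areas, rather than individual observations, is what lets the relative changes speak to how the model might transfer to a study area that was not in the data set.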

    The final step of the model development process was to check the assumptions that the continuous variables are linear in the logit, and that the effect of each variable does not vary across the levels of any of the other variables. To test the first assumption the model development process used the logit step test, as described by Hosmer and Lemeshow (13). To test the second assumption, the model development process compared the final model to a model with added interactions.

    MODEL DEVELOPMENT

    Through Trip Split Model Development

    In the forward selection process for the through trip split (percentage of through trips at an external station) model each criterion improved quickly up to the second variable, where the rate of improvement of the criteria slowed significantly, suggesting that the model with the first two variables is the best model (see Table 3). However, to allow for the possibility that additional variables would be important to the model, variable selection continued to six variables.

    The model diagnostics process for the preliminary model revealed that three observations had especially poor values of Δχ2m and ΔDm, and that one of these three had an especially poor value of Δβm compared to the other observations. An investigation revealed no data errors, and showed that the observations are plausible. Attempts to find a variable which would improve the fit of the model for these points while not over-fitting the model were unsuccessful.


    In the final variable selection procedure, the first two variables from forward selection have significantly smaller changes than do the last four, confirming the hypothesis that the smaller two-variable model was the better model. One of these two variables is PINTTHj, which is the interaction score for through trips as a portion of the interaction score for all trips. However, some of the external stations have no interaction scores at all. For these external stations the portion is not defined, and the variable is set to zero. Thus, the meaning of a zero value for this variable is ambiguous, because it could mean that the external station has no interactions at all, or it could mean that the external station has interactions, but none of them are interactions for through trips. To allow the model to distinguish between these two situations, the variable INTTL1j was added to the model. This variable is 0 when the external station has no interaction scores, and is 1 when the external station has any interaction score. Thus it serves as an adjustment to the model to distinguish between the two cases when PINTTHj is zero.

    The second of the two variables ISCVi is a binary variable indicating whether the estimation is for commercial or non-commercial vehicles. It is equal to one when the estimate is for commercial vehicles, and is equal to zero when the estimate is for non-commercial vehicles. This variable allows the model to make separate predictions for commercial and non-commercial vehicles.

    Investigation of the model assumptions showed that the model assumptions were valid, so no interactions or transformations were applied. For details on the through trip split model development process, see Table 3.

    The final through trip split model is

    $$P_j(Y=1) = \frac{\exp[g(\mathbf{x}_j)]}{1 + \exp[g(\mathbf{x}_j)]} \qquad (6)$$

    where

    j = indexes the external station for which the estimation is being made

    Y = a variable for the coded response, where Y=1 for a through trip response and Y=0 for a non-through trip response

    $P_j(Y=1)$ = the probability of a through trip at external station j

    $g(\mathbf{x}_j)$ = $\pm 1.78 \pm 2.38\,PINTTH_j \pm 0.235\,INTTL1_j$ for commercial vehicles

    $g(\mathbf{x}_j)$ = $\pm 2.94 \pm 2.38\,PINTTH_j \pm 0.235\,INTTL1_j$ for non-commercial vehicles

    (The sign of each term in $g(\mathbf{x}_j)$ is not legible in this copy and is shown as ±.)

    $PINTTH_j$ = the proportion of through trips at the external station as predicted by the interaction score (as defined in Equation 3)

    $INTTL1_j$ = a binary variable which is 1 if the external station has any interaction scores, and is 0 if it has no interaction scores (as defined in Equation 4)
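    To show how the binary logit of Equation 6 turns the two predictors into a through trip share, a minimal Python sketch follows. The coefficient magnitudes (1.78, 2.94, 2.38, 0.235) are taken from the text, but their signs are an assumption made for this illustration only: negative constants, a positive PINTTH coefficient, and a negative INTTL1 coefficient. The function and dictionary names are hypothetical.

```python
import math

# Magnitudes from the text; SIGNS ARE ASSUMED for illustration only.
ASSUMED_SPLIT_COEF = {
    "commercial": {"const": -1.78, "PINTTH": 2.38, "INTTL1": -0.235},
    "non_commercial": {"const": -2.94, "PINTTH": 2.38, "INTTL1": -0.235},
}


def through_trip_share(pintth, inttl1, vehicle_type, coef=ASSUMED_SPLIT_COEF):
    """Binary logit: P(Y=1) = exp(g) / (1 + exp(g)), written as 1 / (1 + exp(-g))."""
    c = coef[vehicle_type]
    g = c["const"] + c["PINTTH"] * pintth + c["INTTL1"] * inttl1
    return 1.0 / (1.0 + math.exp(-g))


# Example: a station whose interaction scores are half through, half local
share_cv = through_trip_share(0.5, 1, "commercial")
share_ncv = through_trip_share(0.5, 1, "non_commercial")
```

    Multiplying each share by the corresponding commercial or non-commercial ADT at the station would give a through trip volume, which is how the text describes expanding the model results to measured volumes.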


    Through Trip Distribution Model Development

    In the forward selection process for the through trip distribution (estimating the origin station for each trip at each exit station) model, each of the criteria improved significantly from the completely random model to the model with 1 variable, as expected. For the models with one or more variables, the rate of improvement of the criteria slows after the model with 3 variables. Forward selection stopped at seven variables (see Table 4).

    In the final variable selection process, the changes in parameter estimates for four of the variables (the first, second, third, and fifth variables from forward selection) were equal to or less than 22 percent, whereas the other three parameters change by at least 29 percent at least once. The four variables with the smaller parameter estimate changes were retained to form the final model.

    The model development process then tested the assumptions that the continuous variables are linear in the logit, resulting in variable transformations for two of the four variables in the final model. The model refinement for the through trip distribution model did not include investigating variable interactions. Interactions related to whether the vehicle is a commercial vehicle or not are probably the only potentially useful interactions, but the non-commercial and commercial data was combined to allow the through trip distributions to be based on more observations.

    For details on the through trip distribution model development process, see Table 4.

    The final through trip distribution model is

    $$P_j(Y=s) = \frac{\exp[g(\mathbf{x}_{sj})]}{\sum_{q \in \{1, 2, \ldots, j-1, j+1, \ldots, S\}} \exp[g(\mathbf{x}_{qj})]} \qquad (7)$$

    where

    s = an index for an external station where a through trip can enter the study area

    j = an index for the external station for which the estimation is made (the survey external station or the external station where the through trip exited the study area)

    $P_j(Y=s)$ = the probability that a through trip that exits the study area at external station j entered the study area at external station s

    $g(\mathbf{x}_{ij})$ = $\pm 2.24\,PINT1_{ij} \pm 1.09\,TURN02_{ij} \pm 0.572\,PADTLG_{ij} \pm 0.814\,ROUTE_{ij}$

    (The sign of each term in $g(\mathbf{x}_{ij})$ is not legible in this copy and is shown as ±.)

    i, q = indexes for the external station where through trips entered the study area

    $\mathbf{x}_{ij}$ = the vector of predictor variables for survey external station j paired with entry external station i

    $PINT1_{ij}$ = a binary variable which is 1 if the external station j has any interaction scores with a route that enters at external station i (as defined in Equation 5)

    $TURN02_{ij}$ = the square root of the number of turns on the least time route between external station i and external station j, where a turn is any movement at an intersection besides the main through movement

    $PADTLG_{ij}$ = the natural logarithm of the ADT at external station i as a portion of the total ADT across all possible entry external stations

    $ROUTE_{ij}$ = a binary variable which is 1 when the least time route between external station i and external station j is valid, and is 0 otherwise. A route is considered valid when (1) it does not pass through an external station at any point other than the route start and end, and (2) the route passes through the study area. That a least time route is invalid does not indicate that a through trip between a pair of external stations is not possible, since travelers may follow a non-least time route
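    The multinomial logit of Equation 7 can be sketched in Python as shown here. The coefficient magnitudes (2.24, 1.09, 0.572, 0.814) come from the text; the signs used below are an assumption for illustration (positive for PINT1, PADTLG, and ROUTE, negative for TURN02). Function names and the example stations are hypothetical.

```python
import math

# Magnitudes from the text; SIGNS ARE ASSUMED for illustration only.
ASSUMED_DIST_COEF = {"PINT1": 2.24, "TURN02": -1.09, "PADTLG": 0.572, "ROUTE": 0.814}


def utility(pint1, n_turns, adt_share, route_valid, coef=ASSUMED_DIST_COEF):
    """g(x_ij) for one entry station i paired with exit station j."""
    turn02 = math.sqrt(n_turns)   # square root of the number of turns
    padtlg = math.log(adt_share)  # natural log of the entry station's ADT share
    return (
        coef["PINT1"] * pint1
        + coef["TURN02"] * turn02
        + coef["PADTLG"] * padtlg
        + coef["ROUTE"] * route_valid
    )


def entry_probabilities(utilities):
    """Multinomial logit over candidate entry stations for one exit station."""
    top = max(utilities.values())  # subtract the maximum for numerical stability
    expu = {s: math.exp(u - top) for s, u in utilities.items()}
    total = sum(expu.values())
    return {s: e / total for s, e in expu.items()}


# Made-up candidate entry stations for one exit station
candidates = {
    "A": utility(pint1=1, n_turns=0, adt_share=0.6, route_valid=1),
    "B": utility(pint1=0, n_turns=4, adt_share=0.3, route_valid=1),
    "C": utility(pint1=0, n_turns=9, adt_share=0.1, route_valid=0),
}
probs = entry_probabilities(candidates)
```

    Because the exit station j is excluded from the candidate set, the probabilities sum to one over the other external stations only.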

    MODEL EVALUATION

    The goal of this research was to develop a model that could be applied with reasonably accurate results to Texas study areas, including study areas that are not in the dataset for this research. The model evaluation simulated applying the model to new study areas using cross validation. A random group of study areas from the 13 study areas in the data set was chosen, and the model parameters were estimated using the data from this study area group. The models with these parameter estimates were then tested by applying them to the remaining study areas to estimate through trips, and comparing the results from this estimation to the observed through trips. This process was repeated four times, so that each study area was part of the group used to estimate the model parameters once.

    The prediction error was calculated to allow comparison of the estimated and observed through trips. The prediction error is defined to be the predicted percentage of through trips minus the observed percentage of through trips. Positive values of the error indicate that the prediction is greater than the observation, and negative values of the error indicate that the prediction is less than the observation. The predicted values were based on outputs from Equations 6 and 7, while actual results were from all 13 survey data sets.

    The prediction errors were ordered by the predicted percentage of through trips, then separated into four intervals with very nearly the same number of observations in each. Then, a box plot of the prediction error was made for each of the four intervals (see Figures 1, 2, and 3). The bottom and the top of each box respectively represent the 25th percentile and 75th percentile of the error distribution. The heavy line inside each box represents the median of the error. Each “whisker” either extends to the most extreme prediction error, or is as long as 1.5 times the difference between the 25th and 75th percentiles. Any prediction errors falling outside this range are plotted as small circles.
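    The box plot construction described above can be sketched with the Python standard library. The quartile convention (`method="inclusive"` in `statistics.quantiles`) is an assumption, since the text does not say which percentile rule was used, and the function names and data are made up for illustration.

```python
import statistics


def error_groups(predicted, observed, n_groups=4):
    """Order prediction errors by predicted share and cut into near-equal groups."""
    pairs = sorted(zip(predicted, observed))  # order by predicted percentage
    errors = [p - o for p, o in pairs]        # error = predicted minus observed
    base, extra = divmod(len(errors), n_groups)
    groups, start = [], 0
    for k in range(n_groups):
        size = base + (1 if k < extra else 0)
        groups.append(errors[start:start + size])
        start += size
    return groups


def box_stats(errors):
    """Quartiles, whisker ends, and outliers as described in the text."""
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    reach = 1.5 * (q3 - q1)  # whiskers are at most 1.5 times the box height
    inside = [e for e in errors if q1 - reach <= e <= q3 + reach]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [e for e in errors if e < q1 - reach or e > q3 + reach],
    }


# Made-up predicted and observed through trip shares
predicted = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
observed = [0.1, 0.1, 0.4, 0.4, 0.3, 0.6, 0.9, 0.5]
groups = error_groups(predicted, observed)

# One made-up group with a single extreme error
stats = box_stats([-0.02, -0.01, 0.0, 0.01, 0.02, 0.5])
```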

    For the through trip distribution model, two sets of box plots were created: one for cases where the observation was equal to zero, and one where the observation was greater than zero.


    These box plots are presented in Figure 1 and Figure 2. Approximately 86 percent of all external station pairs had no observed through trips. That is, no vehicles were observed entering the study area at location i and exiting at j. This is not surprising considering that in a large study area, such as the Longview area, there can be 30 or more survey external stations and thus more than 900 potential through trip combinations. Even with more than 8000 interviews of through travelers, it is likely that many of the 900 combinations will not have had a trip.

    The box plots indicate that the predictions fit the observations fairly well. The center of the distribution suggested by each box plot is close to zero, and in general small errors are more likely than more extreme errors. The box plots also show that prediction errors increase as the value of the prediction becomes larger, indicating that the models perform better when predicting low through trip splits and distributions than when predicting high through trip splits and distributions.

    DISCUSSION

    Unlike many previous models, the through trip split model does not include the average daily traffic at the external station, percent trucks, or study area population as predictor variables. These variables were important in previous models but not important in the through trip split model developed in this research because of differences in the way that study area boundaries were defined. In previous research, most or all of the study areas could be visualized as a wheel, with the urban area forming the “hub” of the wheel, highways forming the “spokes” of the wheel, and the study area boundary forming the wheel “rim”. For these study areas, all of the external stations were on highways that provided direct access to a single, central urban area, with the result that most trips passing through an external station were coming from or going to that urban area.

    The Texas study areas used in this research differed from most of the study areas used in previous research in that many of the external stations were on highways that had little to do with any main urban area. The study area boundaries almost always followed county lines. Thus, even though the study area encompassed the entire urban area, there were many instances of highways “cutting through the corner” of the study area, forming pairs of external stations that were close together on the same route. For these pairs of external stations, the percentage of through trips was very high, regardless of the ADT at the external stations, the percent trucks, or the population of the study area. Therefore, these variables were not robust predictors for the Texas study areas in this research.

    Instead of the variables used in previous research, the through trip split model relies on variables based on the interaction score. Unlike the traffic, roadway, and study area variables, the interaction score does not place the study area on an “island”, separated from its regional context. Rather, it uses detailed information about the location of the study area boundary and external stations in relation to the location of urban areas within and surrounding the study area, and the locations of routes connecting study areas. Because the interaction score uses this detailed information about the regional context of the study area, its robustness does not depend on how the study area boundary is defined; in fact, the interaction score may even function well for a study area of random size, shape, and location, as long as the study area boundary does not pass through any urban area. Thus the model can reasonably be applied to other study areas outside of Texas.

    A unique aspect of the through trip split model is that it makes two separate predictions: one for each vehicle type. All previous models made a single prediction for all vehicle types. The separate predictions are very useful to the through trip modeling process, since commercial vehicles and non-commercial vehicles differ in the traffic flow, pavement, and air quality demands they place on the transportation system. Knowing the proportion of through trips for each vehicle type individually gives a more accurate understanding of transportation system needs.

    The number of turns between entry and exit external stations has not been tested in previous research, but it proves to be important for the through trip distribution model. This is interesting since the number of turns provides little information about values normally used to distribute trips, such as trip duration and distance. The estimated coefficient is negative, indicating that through trips with few turns are more likely than through trips with many turns. Like the number of turns, the validity of the least-time route between entry and exit external stations is new in this research. The estimated coefficient is positive, which indicates that through trips that follow a least-time route are more likely than through trips that do not. This is expected because drivers normally try to minimize travel time.
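The effect of the coefficient signs can be illustrated with a small numerical sketch. It assumes a logit-style share formula for distributing through trips from one entry station among candidate exit stations; that functional form is an assumption made here for illustration and is not stated in this section. The coefficient values, station labels, and function name are hypothetical, and only the signs of the coefficients follow the text.

```python
import math

# Hypothetical coefficients: only their signs follow the discussion above.
BETA_TURNS = -0.3  # negative: more turns, lower share
BETA_ROUTE = 0.8   # positive: valid least-time route, higher share


def distribution_shares(exits):
    """Return logit-style shares for each candidate exit station.

    exits maps an exit station label to (number of turns, route valid 0/1).
    """
    utilities = {
        station: BETA_TURNS * turns + BETA_ROUTE * valid
        for station, (turns, valid) in exits.items()
    }
    denominator = sum(math.exp(u) for u in utilities.values())
    return {station: math.exp(u) / denominator for station, u in utilities.items()}


# Three made-up exit stations for one entry station.
exits = {
    "A": (1, 1),  # few turns, valid least-time route
    "B": (4, 1),  # many turns, valid least-time route
    "C": (1, 0),  # few turns, least-time route not valid
}
shares = distribution_shares(exits)
for station, share in shares.items():
    print(station, round(share, 3))
```

With these assumed values, station A receives the largest share: it has fewer turns than B and, unlike C, lies on a valid least-time route.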

    CONCLUSION

    This research developed a system of two models for estimating through trips when data from an external survey are not available. The data collection requirements for the models are minimal; all of the variables can be obtained from network maps, Census demographic data, and traffic counts. The models make separate predictions for commercial and non-commercial vehicles with reasonable accuracy, and will be useful for estimating through trips in large urban areas.

    ACKNOWLEDGEMENTS & DISCLAIMER

    The authors would like to thank the Texas Department of Transportation (TxDOT) and Texas Transportation Institute (TTI) for providing support for this research. This paper reflects work in progress, not yet formally accepted by TxDOT. The findings of this study are solely the opinions of the authors and do not represent the opinions of TxDOT, TTI, or any other agency or organization.

    REFERENCES

    1. Modlin Jr., D. G. Synthetic Through Trip Patterns. Journal of the Transportation Engineering Division of the American Society of Civil Engineers, Vol. 100, No. 2, 1974, pp. 363-378.

    2. Pigman, J. G., and R. C. Deen. Simulation of Travel Patterns for Small Urban Areas. In Transportation Research Record: Journal of the Transportation Research Board, No. 730, Transportation Research Board of the National Academies, Washington, D.C., 1979, pp. 23-29.

    3. Modlin Jr., D. G. Synthesized Through-Trip Table for Small Urban Areas. In Transportation Research Record: Journal of the Transportation Research Board, No. 842, Transportation Research Board of the National Academies, Washington, D.C., 1982, pp. 16-21.

    4. Anderson, M. D. Evaluation of Models to Forecast External-External Trip Percentages. Journal of Urban Planning and Development, Vol. 125, No. 2, 1999, pp. 110-120.

    5. Anderson, M. D. Spatial Economic Model for Forecasting the Percentage Splits of External Trips on Highways Approaching Small Communities. In Transportation Research Record: Journal of the Transportation Research Board, No. 1931, Transportation Research Board of the National Academies, Washington, D.C., 2005, pp. 68-73.

    6. Cambridge Systematics, Inc. Quick Response Freight Manual. FHWA, U.S. Department of Transportation, 1996.

    7. Horowitz, A. J. and M. H. Patel. Through-Trip Tables for Small Urban Areas: a Method for Quick-Response Travel Forecasting. In Transportation Research Record: Journal of the Transportation Research Board, No. 1685, Transportation Research Board of the National Academies, Washington, D.C., 1999, pp. 57-64.

    8. Han, Y. Synthesized Through Trip Model for Small and Medium Urban Areas. Ph.D. dissertation, North Carolina State University, Raleigh, N.C., 2007.

    9. Han, Y., and J. R. Stone. Synthesized Through-Trip Models for Small and Medium Urban Areas. In Transportation Research Record: Journal of the Transportation Research Board, No. 2077, Transportation Research Board of the National Academies, Washington, D.C., 2008, pp. 148-155.

    10. Anderson, M. D., Y. M. Abdullah, S. E. Gholston, and S. L. Jones. Development of a Methodology to Predict Through-Trip Rates for Small Communities. Journal of Urban Planning and Development, Vol. 132, No. 2, 2006, pp. 112-114.

    11. Martchouk, M., and J. D. Fricker. Through-Trip Matrices Using Discrete Choice Models: Planning Tool for Smaller Cities. Presented at 88th Annual Meeting of the Transportation Research Board, Washington, D.C., 2009.

    12. Locating Urbanized Area and Urban Cluster Boundaries. http://www.census.gov/geo/www/ua/uaucbndy.html. Accessed June 30, 2010.

    13. Hosmer, D. W., and S. Lemeshow. Applied Logistic Regression, Wiley, New York, 2000.

    14. Koppelman, F. S., and C. Bhat. A Self Instructing Course In Mode Choice Modeling: Multinomial and Nested Logit Models,

    List of Figures

    Figure 1. Through Trip Split (Percentage) Model Prediction Errors for Each Quartile of the Prediction Values.
    Figure 2. Distribution Model Prediction Errors for Each Quartile of the Prediction Values (for Observations Equal to Zero).
    Figure 3. Distribution Model Prediction Errors for Each Quartile of the Prediction Values (for Observations Not Equal to Zero).

    Table 1. Survey Data Study Areas

    Study Area            Year   Survey External Stations   Total Responses
    Abilene               2005   11                          3,329
    Amarillo              2005   12                          4,234
    Austin                2005   22                          8,298
    Dallas – Fort Worth   2005   32                         12,642
    Longview              2004   30                          8,426
    Lubbock               2005   17                          3,988
    Midland – Odessa      2002   13                          4,023
    San Angelo            2004   11                          4,031
    San Antonio           2005   22                          9,892
    Sherman – Denison     2005   10                          3,975
    Tyler                 2004   18                          5,124
    Waco                  2006   15                          4,557
    Wichita Falls         2005   11                          3,093

    Table 2. Mean and Standard Deviation for Some Key Variables

    Variable                                                                                Mean      Standard Deviation
    External Station Small Vehicle ADT                                                      3911.56   6307.16
    External Station Large Vehicle ADT                                                      1025.10   2252.65
    Distance in miles between two external stations                                           60.61     38.55
    Whether the least-time route between two external stations is valid (1 = yes, 0 = no)      0.53      0.50
    Travel time in minutes between pairs of external stations                                 63.66     38.37
    Number of turns on route between two external stations                                     3.87      2.23
    Number of freeway ramps on route between two external stations                             1.83      1.76

    Table 3. Models from the Through Trip Split Model Development Process

    Forward Selection

    Variable added   AIC     BIC     2C      RMSE   p
    Constant         110.2   110.2           16.4
    PINTTHj          102.7   102.7   0.068   13.8

    Table 4. Models from the Through Trip Distribution Model Development Process

    Forward Selection

    Variable added   AIC     BIC     2C      RMSE   p
    None             247.1   247.1           9.64
    PINT1ij          172.1   172.1   0.304   8.60   < 10^-100
    TURNSij          159.8   159.9   0.353   8.32   < 10^-100
    PADTALij         150.8   150.9   0.390   7.91   < 10^-100
    RAMPSij          148.3   148.4   0.400   7.89   < 10^-100
    ROUTEij          146.3   146.4   0.408   7.76   10^-94
    PINTij           144.6   144.7   0.415   7.61   10^-81
    DISTROij         144.1   144.2   0.417   7.57   10^-24

    Preliminary Model ( 20 0.417)

    Variable   Coeff.   Std. Err.    z           p
    PINT1ij    1.87     4.91×10^-2   3.81×10^1

    Figure 1. Through Trip Split (Percentage) Model Prediction Errors for Each Quartile of the Prediction Values.

    Figure 2. Distribution Model Prediction Errors for Each Quartile of the Prediction Values (for Observations Equal to Zero).

    Figure 3. Distribution Model Prediction Errors for Each Quartile of the Prediction Values (for Observations Not Equal to Zero).