PPD214_Final_2012_12_05_12_00


  • 7/30/2019 PPD214_Final_2012_12_05_12_00


    Factors That Affect Transit Ridership in

    Southern California

    2005-2011

    2012

    A STATISTICAL STUDY OF EXTERNAL FACTORS, 2005-2011

    SAMANTHA BEIER, ANDREW REKER, ELIZA YU, XINYU XU

    UNIVERSITY OF CALIFORNIA, IRVINE PPD 204 QUANTITATIVE ANALYSIS FOR PLANNERS | Project

    Submitted for Final Grade


    INTRODUCTION

    For this project, we chose to explore public transportation in order to describe and better understand the statistical relationship between public transit ridership and the factors that may affect it in Southern California. The geographical scope of our study is the Los Angeles Combined Metropolitan Statistical Area (CMSA), which includes Los Angeles, Orange, Riverside, San Bernardino, and Ventura counties. We chose to explore transit ridership because greater use of public transit is often considered a form of social welfare and may be associated with positive change and well-being at the community level.

    Using monthly data from January 2005 to December 2011, we analyzed multiple variables, including non-farm employment levels, average unleaded gasoline price, and precipitation. We hypothesize that there is a relationship between public transit ridership and each of these variables. In addition, we hypothesize that public transit ridership differs between school and non-school months. Because of limitations such as our regional scope, the large number of transit agencies in this region, and the availability of applicable data, we do not analyze internal factors of transit ridership such as fare rates or the quantity or quality of public transit service.

    Income is another factor that we believe may help explain increases or decreases in transit ridership in the LACMSA region. We analyze income qualitatively rather than quantitatively because reliable monthly data are not available. Annual data collected by the U.S. Census Bureau allow us to better understand how income changed over this period across the combined metropolitan area.


    QUALITATIVE ANALYSIS

    We identified four time-series datasets to gain insight into our research topic: unlinked transit ridership in Southern California; monthly total precipitation in inches at the downtown Los Angeles Civic Center weather station; the average monthly price of unleaded regular gasoline in the Los Angeles metropolitan area; and non-farm employment in the Los Angeles-Orange County-Inland Empire-Ventura County Combined Metropolitan Statistical Area.

    This paper is structured to examine the relationship between transit ridership and each factor independently. The literature on factors that affect transit ridership generally divides them into two categories: internal organizational factors and external economic or societal factors (Taylor & Fink, 2003). We examine how factors external to transit operators, namely economic, climatological, and seasonal factors, affect transit ridership.

    The study period spans January 2005 to December 2011, with data recorded at monthly intervals. We first examine the descriptive statistics for each of the four variables in the sub-sections that follow. We then examine the correlation between transit ridership and each of precipitation, unleaded gasoline price, and non-farm employment. Finally, we use a t-test to compare ridership in school months with ridership in non-school (summer) months. Descriptive statistics for each variable are shown in Table 1.


    TABLE 1: GENERAL STATISTICS FOR DATA SETS

    (1) Unlinked transit ridership in Southern California
    (2) Group ID, summer versus non-summer (1 = school month, 2 = non-school month)
    (3) Monthly total precipitation in inches at LA Civic Center
    (4) Unleaded gasoline price, LA-OC (USD)
    (5) LA-OC-IE-VC Combined MSA non-farm employment

    Statistic        (1)                      (2)     (3)       (4)         (5)
    N                84                       84      84        84          84
    Mean             59,876,176.93            1.25    1.2268    3.04165     6,882,615.48
    Median           59,915,069.50            1.00    0.3100    3.03550     6,994,200.00
    Mode             51,088,939 (a)           1       0.00      2.519 (a)   7,202,300
    Std. Deviation   3,364,748.747            0.436   2.21565   0.577614    288,170.031
    Variance         11,321,534,130,019.254   0.190   4.909     0.334       83,041,966,866.036
    Range            16,686,804               1       11.02     2.664       730,900
    Minimum          51,088,939               1       0.00      1.798       6,493,100
    Maximum          67,775,743               2       11.02     4.462       7,224,000
    Percentile 25    57,962,155.25            1.00    0.0000    2.57475     6,551,825.00
    Percentile 50    59,915,069.50            1.00    0.3100    3.03550     6,994,200.00
    Percentile 75    62,009,243.50            1.75    1.6250    3.33875     7,162,975.00

    (a) Multiple modes exist. The smallest value is shown.
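    The Table 1 figures are standard SPSS descriptive output. As a minimal sketch, assuming SPSS's default percentile method is the weighted average at position (n + 1)p (which we take to correspond to Python's `statistics.quantiles(..., method="exclusive")`) and that variance and standard deviation use the sample (n - 1) formulas, the same summary can be computed in Python. The `describe` helper is hypothetical. Because the monthly series are not listed in this paper, the example uses the group ID column, which can be rebuilt exactly from the 63 school months and 21 non-school months.

```python
import statistics


def describe(values):
    """Summarize a numeric series with the statistics listed in Table 1.

    Assumptions: sample (n - 1) variance and standard deviation, and
    quartiles by the (n + 1)p "exclusive" method, which we believe
    matches SPSS's default weighted-average percentiles.
    """
    data = sorted(values)
    p25, p50, p75 = statistics.quantiles(data, n=4, method="exclusive")
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # Table 1 shows the smallest value when several modes exist.
        "mode": min(statistics.multimode(data)),
        "std_dev": statistics.stdev(data),
        "variance": statistics.variance(data),
        "range": data[-1] - data[0],
        "minimum": data[0],
        "maximum": data[-1],
        "p25": p25,
        "p50": p50,
        "p75": p75,
    }


# Group ID column: 63 school months (ID 1) and 21 non-school months (ID 2).
summary = describe([1] * 63 + [2] * 21)
for name, value in summary.items():
    print(f"{name:>8}: {round(value, 3)}")
```

    For this column the sketch gives a mean of 1.25, a standard deviation of about 0.436, a variance of about 0.190, and a 75th percentile of 1.75, the same values shown in Table 1.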

    VARIABLE 1: Transit Ridership

    To begin the qualitative portion of our research project, we describe transit ridership in terms of the number of trips made on public transportation in the Southern California area. The ridership dataset comes from the US DOT National Transit Database and is the monthly sum of all unlinked trips served by transit operators in the Los Angeles-Orange-Riverside-San Bernardino-Ventura County Combined Metropolitan Statistical Area. We collected monthly time-series data from January 2005 to December 2011, a total of 84 months (United States Department of Transportation National Transit Database, 2012). The list of operators for this CMSA can be found in Appendix A.


    Looking at this dataset, we found that monthly public transit ridership spans a fairly wide range of values. Over these 84 months the range is 16,686,804 trips, from a minimum of 51,088,939 to a maximum of 67,775,743 (Figure 1). The standard deviation is 3,364,749, only about 5.6 percent of the mean. This suggests that, even though the range appears broad, most of the data points lie fairly close to the mean.

    FIGURE 1: MONTHLY TRANSIT RIDERSHIP 2005-2011 (public transit ridership, millions)

    The mean of the ridership data is 59,876,177 and the median is 59,915,069. The two values are particularly close, differing by only about 38,893 trips. Comparing the range with the mean and standard deviation also suggests there are few outliers: neither the minimum nor the maximum lies more than about 2.6 standard deviations from the mean. Taken together, the mean, median, standard deviation, and range lead us to expect that the data are distributed close to normally, without a significant positive or negative skew.
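    The outlier reasoning in the paragraph above can be checked directly from the Table 1 figures. This sketch types those values in as constants and computes the coefficient of variation, the median-mean gap, and the z-scores of the lowest and highest months. Treating |z| > 3 as a sign of an outlier is a common rule of thumb we assume here; the original analysis does not specify a cutoff.

```python
# Unlinked transit ridership, 84 months; values copied from Table 1.
MEAN = 59_876_176.93
MEDIAN = 59_915_069.50
STD_DEV = 3_364_748.747
MINIMUM = 51_088_939
MAXIMUM = 67_775_743


def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / std_dev


coefficient_of_variation = STD_DEV / MEAN   # spread relative to the mean
median_minus_mean = MEDIAN - MEAN           # gap between the two centers
z_min = z_score(MINIMUM, MEAN, STD_DEV)     # position of the lowest month
z_max = z_score(MAXIMUM, MEAN, STD_DEV)     # position of the highest month

print(f"coefficient of variation: {coefficient_of_variation:.3f}")
print(f"median - mean: {median_minus_mean:,.2f}")
print(f"z(minimum) = {z_min:.2f}, z(maximum) = {z_max:.2f}")
```

    Both extremes fall well inside the assumed |z| > 3 cutoff, which is consistent with the expectation of a roughly normal, unskewed distribution.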

    To check this expectation, we plotted a histogram, which shows that monthly ridership is distributed close to normally, centered near the mean of 59,876,177 (Figure 2). The histogram also shows a fitted theoretical normal distribution curve.

    FIGURE 2: HISTOGRAM OF MONTHLY RIDERSHIP WITH NORMAL DISTRIBUTION FITTED


    VARIABLE 2: Unleaded Gas Prices

    The next factor is gas price. In this section we describe the average unleaded gas price within the Los Angeles-Orange-Riverside-San Bernardino-Ventura County Combined Metropolitan Statistical Area from January 2005 to December 2011. We use the monthly average unleaded gas price as an independent variable in our analysis of the factors that affect transit ridership. Using the data summarized below, we can trace how average gas prices changed over time and describe possible patterns in the data.

    FIGURE 3: MONTHLY UNLEADED GASOLINE PRICE 2005-2011

    The mean gas price is $3.042 and the median is $3.036, almost the same. The range is $2.664, while the interquartile range is just $0.764, about 29 percent of the range. From these descriptive statistics, we conclude that gas prices in this period are not spread evenly across the range: the middle half of the months is concentrated in a fairly narrow band. The box plot (Figure 4) shows that the median lies almost midway between the highest and lowest prices, but slightly closer to Q3 than to Q1. Two of the data points, both among the highest prices, are extreme outliers and are not included in this figure. To better understand this distribution, we also display it as a histogram with a fitted normal curve; it shows that the distribution of gas prices approximately follows a normal distribution (Figure 5).

    FIGURE 4: BOX PLOT OF MONTHLY UNLEADED GAS PRICE

    FIGURE 5: HISTOGRAM OF MONTHLY UNLEADED GAS PRICES, FITTED NORMAL
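    The quartile-based statements about gas prices can be reproduced from the Table 1 values alone. The sketch below hard-codes those values and computes the interquartile range, the share of the full range it covers, and where the median sits inside the box.

```python
# Monthly unleaded gas price (USD); values copied from Table 1.
MINIMUM, Q1, MEDIAN, Q3, MAXIMUM = 1.798, 2.57475, 3.03550, 3.33875, 4.462

full_range = MAXIMUM - MINIMUM        # lowest to highest month
iqr = Q3 - Q1                         # spread of the middle half of the months
iqr_share = iqr / full_range          # fraction of the range covered by that middle half
midrange = (MINIMUM + MAXIMUM) / 2    # halfway point between the extremes
median_to_q1 = MEDIAN - Q1            # lower part of the box
q3_to_median = Q3 - MEDIAN            # upper part of the box

print(f"range = {full_range:.3f}, IQR = {iqr:.3f} ({iqr_share:.0%} of the range)")
print(f"midrange = {midrange:.3f}, median = {MEDIAN:.4f}")
print(f"median - Q1 = {median_to_q1:.3f}, Q3 - median = {q3_to_median:.3f}")
```

    With these inputs the median sits about $0.30 below Q3 but about $0.46 above Q1, and only about $0.09 below the midrange of $3.13.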


    VARIABLE 3: Total Non-Farm Employment

    The second factor we analyze is the number of non-farm jobs. We use total non-farm employment as representative of jobs in the Combined Metropolitan Statistical Area. Mean non-farm employment is 6,882,615.48, with a standard deviation of 288,170.03. Looking at month-to-month trends, employment rose from 2005 to 2007 (Figure 6). From 2008 to 2009, however, the number of non-farm jobs decreased, likely as a result of the economic crisis of that period. Employment then began to recover, though with regular fluctuations.

    FIGURE 6: MONTHLY NON-FARM EMPLOYMENT LEVELS 2005-2011

    The distribution of monthly non-farm employment is not normal (Figure 7). Both a histogram and a stem-and-leaf plot show that the data are bimodal, with one peak at approximately 6,500,000 and a second at approximately 7,100,000.

    FIGURE 7: HISTOGRAM AND STEM-AND-LEAF PLOT OF MONTHLY NON-FARM EMPLOYMENT LEVELS
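    To illustrate how a histogram reveals two peaks, the sketch below bins values into equal-width intervals and reports bins whose count is higher than both neighbors. The employment figures in it are hypothetical placeholders chosen to resemble the pattern described above, not the actual EDD series, and the 100,000-job bin width is an arbitrary assumption; a real histogram can show different peaks depending on the bin width chosen.

```python
def histogram_counts(values, bin_width, start):
    """Count values in equal-width bins [start + k*w, start + (k+1)*w).

    Assumes every value is >= start.
    """
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)
        counts[k] = counts.get(k, 0) + 1
    return [counts.get(k, 0) for k in range(max(counts) + 1)]


def local_peaks(counts):
    """Indices of bins strictly higher than both neighbors (edges compared with 0).

    Flat-topped peaks (two equal adjacent maxima) are not reported.
    """
    padded = [0] + counts + [0]
    return [i - 1 for i in range(1, len(padded) - 1)
            if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]]


# Hypothetical monthly employment levels (NOT the EDD data).
months = [6_495_000, 6_510_000, 6_520_000, 6_530_000, 6_560_000, 6_650_000,
          6_800_000, 6_950_000, 7_060_000, 7_080_000, 7_100_000, 7_110_000,
          7_150_000, 7_200_000]
START, WIDTH = 6_400_000, 100_000

counts = histogram_counts(months, WIDTH, START)
for k in local_peaks(counts):
    low = START + k * WIDTH
    print(f"peak bin: {low:,} to {low + WIDTH:,} ({counts[k]} months)")
```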


    VARIABLE 4: Precipitation

    The third factor we discuss is weather. Anecdotally, when it rains in this region many people may postpone, alter, or cancel their travel plans. We therefore selected precipitation to represent inclement weather, which may explain why some transit riders defer or cancel trips. Monthly precipitation fluctuated dramatically from 2005 to 2011 (Figure 8). Because Southern California has a dry, semi-arid climate, precipitation in many months is at or very close to zero; since the 25th percentile in Table 1 is 0.00 inches, at least a quarter of the months recorded 0.00 inches.

    FIGURE 8: MONTHLY PRECIPITATION RECORDED AT LA CIVIC CENTER 2005-2011


    TABLE 2: DESCRIPTIVE STATISTICS FOR MONTHLY PRECIPITATION

    The mean monthly precipitation is 1.227 inches, much higher than the median of 0.310 inches. Because the mean is well above the median, the distribution is positively skewed; the calculated skewness of 2.915 indicates a strong positive skew (Figure 9).

    FIGURE 9: HISTOGRAM OF MONTHLY PRECIPITATION
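    The skewness value reported above can be computed with the adjusted Fisher-Pearson coefficient, which we assume is the estimator SPSS labels "Skewness". The sketch below implements it with the standard library. Since the precipitation series is not reproduced in this paper, the example uses small illustrative samples only.

```python
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3).

    s is the sample (n - 1) standard deviation. We assume this is the
    estimator SPSS reports as "Skewness".
    """
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


# Illustrative samples only, NOT the Civic Center record.
print(f"{sample_skewness([1, 2, 3, 10]):.2f}")
dry_then_wet = [0.0, 0.0, 0.0, 0.1, 0.3, 2.5, 6.0]   # hypothetical inches per month
print(f"{sample_skewness(dry_then_wet):.2f}")
```

    A symmetric sample such as [1, 2, 3, 4, 5] gives a skewness of zero, while a few very wet months in a mostly dry record pull the value strongly positive, which is the pattern the Civic Center data show.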


    VARIABLE 5: Household Income

    Annual household income in the LACMSA region is listed in Table 3 and graphed in Figure 10. Income increased steadily from 2005 to a peak in 2008, then declined from 2008 to 2011, flattening out between 2010 and 2011. These annual figures cannot by themselves show how income affects individual travel choices, but we expect that the higher the household income, the less likely residents of LACMSA service areas are to take public transportation, and the lower the income, the more likely they are to do so.

    TABLE 3: ANNUAL HOUSEHOLD INCOME IN THE LACMSA

    Year    Household Income
    2005    $52,069
    2006    $55,678
    2007    $58,648
    2008    $60,141
    2009    $58,005
    2010    $56,542
    2011    $56,231

    FIGURE 10: ANNUAL HOUSEHOLD INCOME IN THE LACMSA (chart titled "CSA Median Household Income 2005-2011"; x-axis: year, 2005-2011; y-axis: income, $48,000 to $62,000)
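    The pattern in Table 3 (a rise to 2008, a decline, then a nearly flat final year) can be quantified with simple percentage changes. This sketch just restates the Table 3 values in Python. The paper does not say whether the figures are adjusted for inflation, so these are changes in the reported dollar amounts.

```python
# Annual household income in the LACMSA, copied from Table 3.
INCOME = {2005: 52_069, 2006: 55_678, 2007: 58_648, 2008: 60_141,
          2009: 58_005, 2010: 56_542, 2011: 56_231}


def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


for year in range(2006, 2012):
    change = percent_change(INCOME[year - 1], INCOME[year])
    print(f"{year - 1}-{year}: {change:+.1f}%")
print(f"2005-2008: {percent_change(INCOME[2005], INCOME[2008]):+.1f}%")
print(f"2008-2011: {percent_change(INCOME[2008], INCOME[2011]):+.1f}%")
```

    This gives an increase of about 15.5 percent from 2005 to 2008, a decrease of about 6.5 percent from 2008 to 2011, and a change of less than 1 percent between 2010 and 2011.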


    QUANTITATIVE ANALYSIS

    Having described our data, we now test our hypotheses (Table 4). First, we test the strength of the correlations between transit ridership and employment level, weather, and average unleaded gasoline price. Then we test for seasonality in our data; that is, we look for a significant difference in transit ridership between school months and non-school (summer) months.

    TABLE 4: RESEARCH HYPOTHESES

    Hypothesis 1: Employment level, weather, and average gasoline price correlate with transit ridership.
    Research Hypothesis: There is a correlation between the three identified factors and transit ridership.
    Null Hypothesis: There is no correlation.

    Hypothesis 2: There is a seasonal trait in transit ridership based on school year.
    Research Hypothesis: There is a significant difference in transit ridership means between school months and non-school (summer) months.
    Null Hypothesis: There is no significant difference.

    CORRELATION: Transit Ridership and Total Non-Farm Employment

    First, we examine the relationship between public transit ridership and non-farm employment. We calculated a correlation coefficient of 0.431, which is significant at the 0.01 level and can be interpreted as a modest positive relationship. As the number of people employed in non-farm jobs increased, so did the number of passenger trips (Figure 11); a scatterplot of the two variables shows a clear positive trend. While this positive correlation shows a relationship between public transit ridership and non-farm employment, this form of analysis cannot establish a causal relationship.


    FIGURE 11: CORRELATION OF TRANSIT RIDERSHIP AND NON-FARM EMPLOYMENT, 2005-2011
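    The coefficients in this section are Pearson coefficients (the gas price section names the statistic explicitly; we assume the same for the other two). A minimal sketch of the coefficient, and of the t statistic commonly used to test whether it differs from zero, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, is shown below. The helper names are hypothetical, and the critical value quoted in the comment is approximate.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic with n - 2 degrees of freedom for H0: population correlation = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Reported coefficient for ridership vs. non-farm employment (84 months).
r, n = 0.431, 84
print(f"r^2 = {r ** 2:.3f}")   # share of variance the two series have in common
print(f"t = {t_for_r(r, n):.2f} on {n - 2} df")
# For 82 df the two-tailed critical t at the 0.01 level is roughly 2.64.
```

    For r = 0.431 and n = 84 this gives t of about 4.33, beyond the approximate 0.01 critical value, while r squared of about 0.19 shows that the two series share less than a fifth of their variance.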

    CORRELATION: Transit Ridership and Precipitation

    To better understand the relationship between transit ridership and weather, we use monthly precipitation (in inches) to characterize weather conditions in each month. The correlation between the two variables is -0.589, which is significant at the 0.01 level and indicates a moderate negative relationship: as monthly precipitation increases, the number of passenger trips tends to decrease (Figure 12). When graphed, the negative relationship between the two variables is relatively clear. While this negative correlation shows a relationship between public transit ridership and precipitation, this form of analysis cannot establish a causal relationship.

    FIGURE 12: CORRELATION OF TRANSIT RIDERSHIP AND PRECIPITATION, 2005-2011
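    Because monthly precipitation is strongly right-skewed and many months sit at 0.00 inches, a rank-based (Spearman) correlation is a common robustness check alongside Pearson's r. The original analysis does not use it; the sketch below is an optional illustration that gives tied values their average rank and then applies Pearson's formula to the ranks. The function names and the sample data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical months (precipitation in inches, ridership in millions), NOT the study data.
precip = [0.0, 0.0, 0.4, 1.2, 5.0]
riders = [62.0, 61.0, 60.0, 59.0, 58.0]
print(f"Spearman rho = {spearman_rho(precip, riders):.2f}")
```

    Because it uses only the order of the values, this coefficient is not driven by a few very wet months in the way Pearson's r can be.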

    CORRELATION: Transit Ridership & Gas Prices

    We examined the relationship between gas price and transit ridership by calculating Pearson's correlation coefficient between ridership and the average price of unleaded gasoline within the LACMSA. The correlation for these two variables is 0.334, with a p-value of 0.001 (Table 5). Because this p-value is below the 0.05 significance level, the correlation is statistically significant. The coefficient indicates a weak positive correlation between unleaded gasoline price and transit ridership.


    TABLE 5: TRANSIT RIDERSHIP & GAS PRICE CORRELATION TEST RESULT

    Correlation    p-significance    n
    0.334          0.001             84

    The relationship can also be represented graphically (Figure 13). The graph places gasoline price on the x-axis and transit ridership on the y-axis, and each point represents one month in our data set: the x-value is the average gas price for that month and the y-value is the transit ridership during the same month. All 84 points (n = 84) are plotted, along with a line of best fit for reference. The graph shows a weak positive correlation between the two variables, indicating that, on average, months with higher gas prices tended to have higher public transit ridership.

    FIGURE 13: CORRELATION OF TRANSIT RIDERSHIP AND GAS PRICE, 2005-2011
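    Figure 13 includes a line of best fit. Assuming it is the ordinary least-squares line (the paper does not say how the line was fitted), its slope and intercept follow from the same sums used for Pearson's r, and the slope always has the same sign as r. A minimal sketch, with a hypothetical helper name and made-up sample pairs:

```python
def best_fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical (gas price in USD, ridership in millions) pairs, NOT the study data.
prices = [2.50, 3.00, 3.50, 4.00]
riders = [58.0, 59.5, 60.0, 61.5]
slope, intercept = best_fit_line(prices, riders)
print(f"ridership = {intercept:.2f} + {slope:.2f} * price")
```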


    T-TEST: Transit Ridership & School Status

    For our next test, we examined whether transit ridership differs by school status, categorized as school and non-school months. We hypothesize that transit ridership in the Los Angeles CMSA is higher during the academic school year than during non-school (summer) months. Based on the public school academic calendar in the LA CMSA region, we classified September through May as school months and June through August as non-school months.

    To test this hypothesis, we ran an independent-samples t-test in SPSS comparing transit ridership in school and non-school months (Figure 14). For this test we assigned a group ID to each category: 1 for school months and 2 for non-school months.
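    The grouping rule described above can be written as a small function. This sketch (the names are hypothetical) assigns group IDs by calendar month and confirms that the 2005-2011 study period yields 63 school months and 21 non-school months, the group sizes reported in Table 6.

```python
SCHOOL_MONTHS = {9, 10, 11, 12, 1, 2, 3, 4, 5}   # September-May -> group 1
NON_SCHOOL_MONTHS = {6, 7, 8}                    # June-August -> group 2


def group_id(month):
    """Return 1 for a school month and 2 for a non-school (summer) month."""
    if month in SCHOOL_MONTHS:
        return 1
    if month in NON_SCHOOL_MONTHS:
        return 2
    raise ValueError(f"month must be between 1 and 12, got {month}")


# One ID per month of the study period, January 2005 through December 2011.
ids = [group_id(m) for _ in range(2005, 2012) for m in range(1, 13)]
print(len(ids), ids.count(1), ids.count(2))
```

    The mean of these IDs, (63 x 1 + 21 x 2) / 84 = 1.25, also matches the group ID column of Table 1.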

    FIGURE 14: SCHOOL SEASONALITY - INDEPENDENT SAMPLES T-TEST RESULTS

    From the results of this t-test, we conclude that there is no significant difference in transit ridership between school and non-school months. This is shown by the p-value labeled Sig. (2-tailed) in the output, which is greater than 0.05, meaning that the difference in mean transit ridership between school and non-school months is not statistically significant. We therefore fail to reject the null hypothesis. The group-level descriptive statistics support this conclusion: the two group means are close to each other, the non-school mean is in fact slightly higher, contrary to our hypothesis, and the two groups' ranges overlap (Table 6).

    TABLE 6: SCHOOL AND NON-SCHOOL STATUS IN CORRELATION WITH TRANSIT RIDERSHIP IN THE LA-CMSA REGION

    Statistic         Group 1 (School)        Group 2 (Non-school)
    N (valid)         63                      21
    Mean              59,495,727.59           61,017,524.95
    Median            59,557,368.00           60,326,975.00
    Mode              None                    None
    Std. Deviation    3,546,591.19            2,483,518.28
    Variance          12,578,309,076,811.90   6,167,863,065,431.20
    Range             16,686,804.00           9,077,955.00
    Minimum           51,088,939.00           57,898,818.00
    Maximum           67,775,743.00           66,976,773.00
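    As a cross-check on the SPSS output, the independent-samples t statistic can be recomputed from the Table 6 summary statistics alone. The sketch below uses the pooled-variance (Student) form, which corresponds to the "equal variances assumed" row of SPSS's output. SPSS also reports an unequal-variances (Welch) row, and which row applies depends on the Levene's test result in Figure 14, so this sketch covers the equal-variances case only. The function name is hypothetical.

```python
import math


def pooled_t_from_stats(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic assuming equal population variances.

    Returns (t, degrees_of_freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Group summaries from Table 6: school months (1) and non-school months (2).
t, df = pooled_t_from_stats(59_495_727.59, 3_546_591.19, 63,
                            61_017_524.95, 2_483_518.28, 21)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

    This gives t of about -1.82 with 82 degrees of freedom (negative because the school-month mean is lower). Its absolute value is below the two-tailed 0.05 critical value of roughly 1.99 for 82 degrees of freedom, which is consistent with the Sig. (2-tailed) value above 0.05.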


    CONCLUSIONS

    Our statistical analysis yielded some interesting findings for the selected factors. We found significant correlations between transit ridership and each of the three factors we tested for correlation; however, these correlations ranged only from weak to moderate in strength, so no single factor is strongly correlated with transit ridership (Table 7). Further research could apply time-series analysis with regression. In addition, a model could be developed to produce generalized forecasts of transit ridership in Southern California.

    TABLE 7: SUMMARY TABLE OF CORRELATIONS AND T-TEST RESULTS

    Transit Ridership and    Correlation    Relationship
    Gasoline Prices          0.334          Weak Positive
    Employment Levels        0.431          Weak Positive
    Precipitation            -0.589         Moderate Negative
    Seasonality              n/a            Seasonality Not Found

    One surprising finding was the relative strength of the negative correlation between precipitation and transit ridership compared with the other two variables we tested for correlation. One possible explanation is that December often has increased precipitation in Southern California, and it is also the month that typically has fewer workdays because of a greater number of prominent holidays and increased use of vacation days at year's end. If so, part of this correlation may reflect the holiday season rather than the weather itself. The many internal and external factors that affect transit ridership make it an interesting topic for research, as no single factor can predict change on its own.


    REFERENCES

    State of California Employment Development Department. (2012). MSA seasonal adjusted total non-farm employment [Data file]. Retrieved from http://www.calmis.ca.gov/file/indhist/msa$shws.xls

    Taylor, B. D., & Fink, C. N. Y. (2003). The factors influencing transit ridership: A review and analysis of the ridership literature. Fall 2003, p. 681.

    United States Census Bureau. (2005-2011). American FactFinder: Los Angeles-Long Beach-Riverside, CA CSA.

    United States Department of Labor, Bureau of Labor Statistics. (2012). Consumer Price Index - Average price data, Los Angeles-Riverside-Orange County, CA [Data file]. Retrieved from http://data.bls.gov/timeseries/APUA42174714

    United States Department of Transportation National Transit Database. (2012). Monthly raw data [Data file]. Retrieved from http://www.ntdprogram.gov/ntdprogram/pubs/MonthlyData/MONTHLY_RAW_DATA_10_03_2012.xls

    Western Regional Climate Center. (2012). Monthly precipitation, Los Angeles Civic Center, California [Data file]. Retrieved from http://www.wrcc.dri.edu/cgi-bin/cliMONtpre.pl?ca5115


    APPENDIX A: Transit Operators for LA-CMSA

    Santa Monica's Big Blue Bus
    Access Services
    Anaheim Transportation Network
    Antelope Valley Transit Authority
    City of Arcadia Transit
    City of Commerce Municipal Buslines
    City of Corona
    City of Gardena Transportation Department
    City of La Mirada Transit
    City of Los Angeles Department of Transportation
    City of Redondo Beach - Beach Cities Transit
    Culver City Municipal Bus Lines
    DAVE Transportation Services, Inc.
    Foothill Transit
    Gold Coast Transit
    Laguna Beach Municipal Transit
    Laidlaw Transit Services
    Long Beach Transit
    Los Angeles County Metropolitan Transportation Authority dba: Metro
    Montebello Bus Lines
    Norwalk Transit System
    Omnitrans
    Orange County Transportation Authority
    Riverside Transit Agency
    Ryder/ATE
    Santa Clarita Transit
    Simi Valley Transit
    Southern California Regional Rail Authority dba: Metrolink
    SunLine Transit Agency
    Thousand Oaks Transit
    Torrance Transit System
    Ventura Intercity Service Transit Authority
    Victor Valley Transit Authority