Task 2 of the project is to “select one or more possible model designs for RSPM mode shift, estimate model parameters and evaluate the designs and estimated parameters with sensitivity tests and validation”. More specifically, the plan is to select and estimate one or more possible designs of the mode choice model based on literature review and data exploration in Task 1 and to understand what mode shifts occur as vehicle travel is reduced, incorporating and testing interactions in RSPM. These approaches build on the existing RSPM module and utilize household and land use inputs and budget constraints already embedded in the RSPM tool. The PSU team will suggest functional form and independent variables for model estimation with associated data sources for estimation and validation. PSU researchers will also identify sensitivity tests to assess the upgraded model with literature elasticities, repeating some of the tests previously calculated by the RSPM to ensure these remain intact, as well as adding tests to evaluate the new functionality. The PSU team will discuss and coordinate with Brian Gregor in the model design and estimation process, as he implements the RSPM common framework, to make sure the design and data format match the latest RSPM modeling framework. ODOT staff shall review and adjust the proposed designs, estimation data and validation data/approach.
The deliverable of Task 2 is a working paper (this document) that describes the model designs, data sources, estimation, and the results of sensitivity tests and validation, together with the documented R scripts used to process and analyze the data.
Members of the TAC/OSA contract will review the model design, estimation data and results, and validation data, approach and results; suggest adjustments to the PSU researchers; and guide the selection of the best model design.
The primary data sources we identified and used for Task 2 are the 2009 National Household Travel Survey (NHTS) and the 2010 EPA Smart Location Database (SLD). Additional data sources include TTI’s Urban Mobility Report dataset and the National Transit Database. We obtained the 2009 NHTS data with confidential block group level residence location (and Census Tract and ZIP code of workplace location), which is the data set best suited to modeling mode choice. Using the confidential residential block group location, we joined the 2009 NHTS with the 2010 SLD to get a combined dataset of travel information and built environment/urban form variables for each household’s residential block group.
We also looked into alternative data sources, including the 2012 California Travel Survey (CATS), the 2011 Oregon Household Travel and Activity Survey (OHAS), and the 2014-2015 Puget Sound Regional Travel Study (PSRTS), and explored the potential to combine these three surveys into a unified data set with diverse coverage. However, given the 2009 NHTS data with confidential residential block group location, combining CATS, OHAS and PSRTS offers limited additional benefit for two reasons. First, the three data sets were collected in different years, which may introduce spurious effects into the model estimation. Second, since each data set was collected by a different state or agency, the information collected varies, and the variables are likely measured or coded differently. This adds extra work to create a unified data set at the least and may weaken the final models at the worst.
Extra data would still be beneficial in two areas. The first is the day-to-day variation in mode choice and total demand (for example, the amount of driving measured in vehicle miles traveled), which we need to understand in order to predict long-term behavior from a daily model. However, the NHTS, like the three travel surveys above, captures travel information for only a single day. In GreenSTEP, Brian Gregor assumed that the stochasticity in the household daily VMT model (a linear regression model with transformed VMT as the dependent variable) represents the day-to-day variation in VMT. Such an approximation of weekly VMT from daily information may be imperfect. Verifying the relationship between daily and longer-term VMT, and possibly estimating an explicit model of weekly (or annual) VMT, may be necessary. A few data sets could help in examining this relationship, in particular the 2004-2006 Traffic Choices Study by the Puget Sound Regional Council. As a pilot project on congestion-based tolling sponsored by the Federal Highway Administration, the study placed GPS data loggers in the vehicles of about 275 households in the Seattle metropolitan area. The project recorded roughly 18 months of trip data (from November 2004 to April 2006) and included more than 400 vehicles. Such long-term data would help clarify the relationship between daily and long-term VMT.
The other area where we are looking for improvement is the modeling of price elasticities of travel demand. Brian tested three different methods of capturing price elasticities: income effect, price coefficient and household budget model. There are a number of challenges to obtaining realistic price elasticities, including (1) the lack of disaggregate panel data that can be used to study how household travel decisions change over time in response to changes in fuel prices; (2) the relatively low historical price of fuel; (3) the prospect of future fuel prices that may be several times greater than present prices; (4) a lack of research consensus on the magnitude of the effects; and (5) the difficulty of sorting out short-range and long-range effects.
Because of these challenges, the first two methods do not have sufficient sensitivity and Brian adopted the household budget model. All the challenges Brian identified above remain for the current project. Using the household budget model as the baseline model, we hope to draw from literature around the world (for example, Graham and Glaister, 2002) on the magnitude of the price elasticities and explore alternative methods of incorporating the elasticities into the new model of travel demand. Tolling studies such as the Puget Sound Traffic Choices Study provide some useful information on the price elasticities of travel demand (even though not from fuel price change).
In addition to surveyed households’ socio-demographic characteristics, the 2009 NHTS [@NHTS2009] collected daily trips taken in a 24-hour period, and includes:
The 2009 NHTS included 150,145 households, 308,901 household members and 1,079,763 trips.
According to codebook for G34 TRPTRANS, we re-classify the modes into 5 categories:
Category | Code | Name |
---|---|---|
Auto | 1 | Car |
Auto | 2 | Van |
Auto | 3 | SUV |
Auto | 4 | pickup truck |
Auto | 5 | other truck |
Auto | 6 | recreational vehicle |
Auto | 7 | motorcycle |
Transit | 9 | transit bus |
Transit | 10 | commuter bus |
Transit | 11 | school bus |
Transit | 12 | charter bus |
Transit | 13 | city to city bus |
Transit | 14 | Shuttle bus |
Transit | 15 | Amtrak |
Transit | 16 | Commuter train |
Transit | 17 | Subway |
Transit | 18 | Street car/trolley |
Bike | 22 | bicycle |
Walk | 23 | walk |
Other | 8 | Light electric veh (golf cart) |
Other | 19 | taxi cab |
Other | 20 | Ferry |
Other | 21 | airplanes |
Other | 24 | Special transit-people w/disabilitie |
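The reclassification in the tables above can be expressed as a lookup from `TRPTRANS` code to mode group. The following is a minimal sketch, not the project's actual script: the helper name `classify_mode` is hypothetical, and the label "Other" for the fifth group is an assumption.

```r
# Map NHTS 2009 TRPTRANS codes to the five mode groups listed above.
# "Other" is an assumed label for the fifth group (codes 8, 19, 20, 21, 24).
mode_lookup <- c(
  setNames(rep("Auto", 7), 1:7),
  setNames(rep("Transit", 10), 9:18),
  "22" = "Bike",
  "23" = "Walk",
  setNames(rep("Other", 5), c(8, 19, 20, 21, 24))
)

# Hypothetical helper: codes outside the lookup (e.g. missing or refused)
# come back as NA rather than being silently assigned to a group.
classify_mode <- function(trptrans) {
  unname(mode_lookup[as.character(trptrans)])
}

classify_mode(c(1, 9, 22, 23, 20, -9))
```

Returning `NA` for unknown codes keeps invalid records visible so they can be counted before being dropped.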
Variable Names | Description |
---|---|
ANNMILES | [NHTS] Self-reported annualized mile estimate |
BESTMILE | [NHTS] Best estimate of annual miles (ORNL) |
TDAYDATE | [NHTS] Date of Travel Day (YYYYMM) |
TRAVDAY | [NHTS] Travel day - day of week |
DRVRCNT | [NHTS] Number of drivers in HH |
HHSIZE | [NHTS] Count of HH members |
HHVEHCNT | [NHTS] Count of HH vehicles |
NUMADLT | [NHTS] Count of adult HHMs at least 18 years old |
TRPMILES | [NHTS] Calculated Trip distance converted into miles |
TRPTRANS | [NHTS] Transportation mode used on trip |
TRIPPURP | [NHTS] General Trip Purpose (Home-Based Purpose types) |
TRVL_MIN | [NHTS] Derived trip time - minutes |
TRVLCMIN | [NHTS] Calculated travel time |
DVMT | [NHTS] Calculated Trip distance (miles) for Driver Trips |
WRKCOUNT | [NHTS] Number of workers in HH |
LIF_CYC | [NHTS] Household Life Cycle (Single, Young Couple, Couple with children, Empty Nester) |
Htppopdn | |
TOTPOP10_1 | [PT] 2010 Total population within 1 mile buffer of BG centroid |
EMPTOT_2 | [PT] Total employment within 2 mile buffer of BG centroid |
E5_RET10 | [SLD] 2010 Retail employment |
E5_SVC10 | [SLD] 2010 Service employment |
D1D | [SLD] Gross activity density (employment + HUs) on unprotected land |
D2A_JPHH | [SLD] Jobs per household |
D3amm | [SLD] Network density in terms of facility miles of multi-modal links per square mile |
D3apo | [SLD] Network density in terms of facility miles of pedestrian-oriented links per square mile |
D4a | [SLD] Distance from population weighted centroid to nearest transit stop (meters) |
D4c | [SLD] Aggregate frequency of transit service within 0.25 miles of block group boundary per hour during evening peak period |
D4d | [SLD] Aggregate frequency of transit service (D4c) per square mile |
Fwylnmicap | [TTI] 2010 Urbanized Area freeway lane miles per capita |
Tranmilescap | [NTD] 2009 Urbanized Area annual vehicle revenue miles per capita |
ACCESS | [Place Type] Accessibility measure ACCESS = (2 * EMPTOT_2 * TOTPOP10_5) / (10000 * (EMPTOT_2 + TOTPOP10_5)), where EMPTOT_2 is employment within a 2-mile radius and TOTPOP10_5 is total 2010 population within a 5-mile radius |
mode | n | % |
---|---|---|
Auto | 955345 | 88.477 |
Walk | 93182 | 8.630 |
Transit | 22483 | 2.082 |
Bike | 8753 | 0.811 |
TRIPPURP | mode | n | % |
---|---|---|---|
HBO | Auto | 195189 | 84.670 |
HBO | Bike | 1023 | 0.444 |
HBO | Transit | 13157 | 5.707 |
HBO | Walk | 21161 | 9.179 |
HBSHOP | Auto | 243832 | 95.249 |
HBSHOP | Bike | 1097 | 0.429 |
HBSHOP | Transit | 1251 | 0.489 |
HBSHOP | Walk | 9814 | 3.834 |
HBSOCREC | Auto | 110582 | 71.482 |
HBSOCREC | Bike | 4832 | 3.123 |
HBSOCREC | Transit | 812 | 0.525 |
HBSOCREC | Walk | 38473 | 24.870 |
HBW | Auto | 102319 | 95.909 |
HBW | Bike | 684 | 0.641 |
HBW | Transit | 1671 | 1.566 |
HBW | Walk | 2009 | 1.883 |
NHB | Auto | 303423 | 91.432 |
NHB | Bike | 1117 | 0.337 |
NHB | Transit | 5592 | 1.685 |
NHB | Walk | 21725 | 6.546 |
Figure 1: Shares of trips by trip purpose and mode
The distribution of raw trip distance (miles) is highly skewed.
Summary of trip distance by mode
mode | n | 5% | 25% | 50% | 75% | 95% | 99% | max | mean | sd |
---|---|---|---|---|---|---|---|---|---|---|
Auto | 955345 | 0.556 | 2.000 | 4.0 | 10.000 | 29 | 57 | 91 | 8.027 | 10.831 |
Bike | 8753 | 0.111 | 0.556 | 1.0 | 2.889 | 8 | 17 | 22 | 2.211 | 3.113 |
Transit | 22483 | 0.556 | 2.000 | 4.0 | 9.000 | 26 | 55 | 95 | 7.727 | 10.301 |
Walk | 93182 | 0.111 | 0.222 | 0.5 | 0.778 | 2 | 3 | 4 | 0.646 | 0.612 |
Given the skewness of trip distance, a cutoff at the 99th percentile of trip distance for each mode is used. Results below are after applying the cutoff.
mode | n | 5% | 25% | 50% | 75% | 95% | 99% | max | mean | sd |
---|---|---|---|---|---|---|---|---|---|---|
Auto | 127999 | 4.000 | 17.000 | 40.00 | 80.00 | 183.00 | 308.12 | 1205.0 | 59.91 | 64.91 |
Bike | 3412 | 0.222 | 1.111 | 3.00 | 7.00 | 20.00 | 37.10 | 76.0 | 5.67 | 7.61 |
Transit | 9107 | 1.000 | 4.000 | 10.00 | 22.00 | 66.00 | 130.00 | 434.0 | 19.08 | 27.31 |
Walk | 32780 | 0.222 | 0.556 | 1.11 | 2.22 | 5.44 | 8.67 | 40.2 | 1.84 | 1.88 |
mode | n | 5% | 25% | 50% | 75% | 95% | 99% | max | mean | sd |
---|---|---|---|---|---|---|---|---|---|---|
Auto | 127999 | 19 | 50 | 97 | 167 | 325 | 500 | 2084 | 124.9 | 105.6 |
Bike | 3412 | 5 | 19 | 30 | 60 | 140 | 240 | 515 | 48.6 | 49.6 |
Transit | 9107 | 13 | 31 | 60 | 106 | 220 | 380 | 1155 | 82.7 | 79.5 |
Walk | 32780 | 4 | 15 | 30 | 55 | 118 | 196 | 1110 | 40.6 | 42.4 |
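The mode-specific 99th-percentile cutoff described above can be sketched as follows. This is an illustration on simulated data, assuming trips above the cutoff are dropped rather than capped; the `trips` data frame and its values are hypothetical, with `TRPMILES` and `mode` named after the variables used in this document.

```r
# Hypothetical trip records (simulated, not NHTS data)
set.seed(1)
trips <- data.frame(
  mode = rep(c("Auto", "Walk"), each = 500),
  TRPMILES = c(rlnorm(500, meanlog = 1.5, sdlog = 1),
               rlnorm(500, meanlog = -0.7, sdlog = 0.8))
)

# 99th percentile of trip distance within each mode, repeated for every trip
cutoff <- ave(trips$TRPMILES, trips$mode,
              FUN = function(x) quantile(x, probs = 0.99, na.rm = TRUE))

# Assumption: trips above their mode-specific cutoff are removed
trips_trimmed <- trips[trips$TRPMILES <= cutoff, ]

# Largest remaining trip distance per mode
aggregate(TRPMILES ~ mode, data = trips_trimmed, FUN = max)
```

Computing the cutoff per mode rather than globally avoids removing most long auto trips while leaving implausibly long walk trips in place.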
Boxplot of household travel distance (mile) and time (minutes) by mode
DVMT - Calculated Trip distance (miles) for Auto Trips
ANNMILES - Self-reported annualized mile estimate;
BESTMILE - Best estimate of annual miles (by ORNL)
The Smart Location Database [@Ramsey2014] is a nationwide geographic data resource for measuring location efficiency. It includes more than 90 attributes summarizing characteristics such as housing density, diversity of land use, neighborhood design, destination accessibility, transit service, employment, and demographics. Most attributes are available for every census block group in the United States. The variables in the SLD are largely organized according to the 5D built environment measures (Density, Diversity, Design, Transit, Destination), in addition to demographics and employment. A complete list of the variables can be found in the SLD documentation [@Ramsey2014].
The confidential NHTS data contain the Census block group (2010 geography) of each household’s residence, which is joined with the SLD to retrieve land use features for these locations. Land use information in the SLD provides a rich set of factors that are documented in the existing research literature as influencing households’ travel behavior, including mode choice and travel distance.
All households in the 2009 NHTS data have a matched block group in the SLD.
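The join and the match check can be sketched as below. This is a toy example, not the project's processing script: the key name `GEOID10`, the identifier values, and the attribute values are assumptions made for illustration.

```r
# Toy stand-ins for the two sources. Block group IDs are kept as character
# so that leading zeros in state FIPS codes are not lost.
nhts_hh <- data.frame(
  HOUSEID = c(1, 2, 3),
  GEOID10 = c("410510001001", "410510001002", "410670301001"),
  HHSIZE  = c(2, 4, 1),
  stringsAsFactors = FALSE
)
sld <- data.frame(
  GEOID10 = c("410510001001", "410510001002", "410670301001", "410670301002"),
  D1D = c(12.4, 8.1, 3.3, 0.9),
  D4c = c(35, 20, 4, 0),
  stringsAsFactors = FALSE
)

# Left join: keep every household and attach its block group attributes
hh_sld <- merge(nhts_hh, sld, by = "GEOID10", all.x = TRUE)

# Share of households with a matched block group
match_rate <- mean(!is.na(hh_sld$D1D))
match_rate
```

A left join (`all.x = TRUE`) followed by an explicit match-rate check makes unmatched households visible instead of silently dropping them.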
Place types are land use categories that are useful for describing development patterns and their relationship to human behavior (e.g. travel behavior) and well-being (e.g. health) (Gregor, 2016). In the RSPM mode shift project, we use place types as a means to simplify the work for RSPM users when they create scenarios.
This project adopts the work by Brian Gregor and others and establishes categories over the following 3 dimensions:
By default, the accessibility measure is ACCESS = (2 * EMPTOT_2 * TOTPOP10_5) / (10000 * (EMPTOT_2 + TOTPOP10_5)), where EMPTOT_2 is employment within a 2-mile radius and TOTPOP10_5 is total 2010 population within a 5-mile radius. The break points separating the very low, low, medium, and high levels are 0.1, 0.5, and 2, respectively.
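A minimal sketch of the accessibility measure and its levels follows. It assumes that the whole term `10000 * (EMPTOT_2 + TOTPOP10_5)` is the denominator (a scaled harmonic mean) and that each break point belongs to the higher level; the function names and input values are hypothetical.

```r
# Assumed form: harmonic mean of employment and population, scaled by 10,000
access_measure <- function(emp_2mi, pop_5mi) {
  (2 * emp_2mi * pop_5mi) / (10000 * (emp_2mi + pop_5mi))
}

# Break points 0.1, 0.5 and 2 from the text; left-closed intervals are an assumption
access_level <- function(access) {
  cut(access,
      breaks = c(-Inf, 0.1, 0.5, 2, Inf),
      labels = c("very low", "low", "medium", "high"),
      right = FALSE)
}

# Hypothetical block groups: rural, suburban, urban core
acc <- access_measure(emp_2mi = c(200, 3000, 40000),
                      pop_5mi = c(1500, 20000, 150000))
data.frame(ACCESS = round(acc, 3), level = access_level(acc))
```

Note that a block group with zero employment and zero population yields `NaN` under this form and would need separate handling.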
The Density level uses D1D variable in SLD - gross activity density (employment + HUs) on unprotected land (per acre) - with break points of 0.1, 1, and 5.
The Design measure is based on two variables from the SLD: D3amm variable (network density in terms of multimodal links per square mile) and D3apo variable (network density in terms of facility miles of pedestrian-oriented links per square mile). The default break points for D3amm are 1.3, 2.5, and 3.3, while those for D3apo are 12.5, 15.6, and 20. The final value of the Design measure is the maximum value of the two. For example, if the D3amm value is low and D3apo value is medium, the final value of the design measure would be medium.
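The "maximum of the two" rule can be sketched as below, using the default break points quoted above. The four level names are borrowed from the accessibility measure and are an assumption here, as are the function names.

```r
level_names <- c("very low", "low", "medium", "high")

# findInterval() returns 0..3 for three break points; add 1 to index four levels
to_level <- function(x, breaks) {
  findInterval(x, breaks) + 1L
}

# Design level is the higher of the D3amm-based and D3apo-based levels
design_level <- function(d3amm, d3apo) {
  lvl_mm <- to_level(d3amm, c(1.3, 2.5, 3.3))
  lvl_po <- to_level(d3apo, c(12.5, 15.6, 20))
  factor(level_names[pmax(lvl_mm, lvl_po)], levels = level_names, ordered = TRUE)
}

# D3amm = 2.0 falls in "low", D3apo = 16.0 falls in "medium"
design_level(d3amm = 2.0, d3apo = 16.0)
```

This reproduces the example in the text: a low D3amm level combined with a medium D3apo level gives a medium Design level.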
Diversity Level is a measure of the mixing of jobs and households in the block group. It is based on two measures from the SLD: D2A_JPHH (the ratio of jobs to households in the block group) and the ratio of retail and service jobs to the number of households, (E5_RET10 + E5_SVC10)/HH.
Transit Level is a measure of the level of transit service derived from the SLD D4c (aggregate frequency of transit service within 0.25 miles of block group boundary per hour during evening peak period). The threshold values for the 4 levels are 1, 20, and 150.
Based on discussions with the TAC, in particular Brian and Tara, we primarily use the place types as an intermediate step to facilitate scenario creation, not as independent variables included directly in the model specification.
GreenSTEP focuses on daily vehicle miles traveled (VMT) by drivers in its household travel model and does not explicitly model non-driving travel (for example, by transit or non-motorized modes), except for the diversion of short-distance trips to bike. The current household travel model in GreenSTEP consists of two sequential (conditional) models: a binary model of whether a household will have non-zero VMT and a regression model of the actual VMT for households with non-zero VMT. Such a model structure provides a good balance between behavioral realism, simplicity and performance:
\(P(Daily VMT==0) = logit(DrvAgePop + LogIncome + Htppopdn + Age65Plus + Hhvehcnt + ZeroVeh + Tranmilescap + Urban:Tranmilescap)\), and
Variable | metro (1) | nonmetro (2) |
---|---|---|
DrvAgePop | 0.065*** (0.019) | -0.070*** (0.021) |
LogIncome | -0.453*** (0.022) | -0.435*** (0.020) |
HTPPOPDN | -0.003 (0.008) | 0.002 (0.008) |
Age65Plus | -0.101*** (0.026) | -0.013 (0.022) |
HHVEHCNT | -0.522*** (0.029) | -0.304*** (0.022) |
ZeroVeh | 3.730*** (0.094) | 3.680*** (0.091) |
Tranmilescap | 0.023*** (0.001) | |
Constant | 2.650*** (0.224) | 2.520*** (0.202) |
Observations | 53,461 | 70,324 |
Log Likelihood | -11,856.000 | -14,397.000 |
Akaike Inf. Crit. | 23,728.000 | 28,808.000 |
Note: | Standard errors in parentheses. * p<0.1; ** p<0.05; *** p<0.01 | |
metro | auc | pseudo.r2 |
---|---|---|
metro | 0.806 | 0.306 |
non_metro | 0.743 | 0.198 |
\((Daily VMT)^{0.18} = lm(Census\_r + LogIncome + Htppopdn + Hhvehcnt + ZeroVeh + Tranmilescap + Fwylnmicap + DrvAgePop + Age65Plus + Urban + Htppopdn:Tranmilescap)\)
Variable | metro (1) | nonmetro (2) |
---|---|---|
CENSUS_RNE | -0.004 (0.008) | 0.051*** (0.005) |
CENSUS_RS | 0.011* (0.006) | 0.054*** (0.004) |
CENSUS_RW | -0.008 (0.006) | 0.015*** (0.005) |
LogIncome | 0.081*** (0.002) | 0.076*** (0.002) |
HTPPOPDN | -0.004*** (0.001) | 0.004*** (0.001) |
HHVEHCNT | 0.058*** (0.002) | 0.054*** (0.001) |
ZeroVeh | 0.036 (0.024) | 0.066*** (0.025) |
Tranmilescap | -0.001*** (0.0003) | |
Fwylnmicap | 0.029*** (0.006) | |
DrvAgePop | 0.045*** (0.002) | 0.054*** (0.002) |
Age65Plus | -0.049*** (0.002) | -0.052*** (0.002) |
HTPPOPDN:Tranmilescap | 0.0001* (0.0001) | |
Constant | 0.765*** (0.023) | 0.781*** (0.019) |
Observations | 48,249 | 65,356 |
R2 | 0.170 | 0.163 |
Adjusted R2 | 0.169 | 0.163 |
Residual Std. Error | 0.307 (df = 48236) | 0.327 (df = 65346) |
F Statistic | 821.000*** (df = 12; 48236) | 1,416.000*** (df = 9; 65346) |
Note: | Standard errors in parentheses. * p<0.1; ** p<0.05; *** p<0.01 | |
metro | rmse | nrmse | r.squared |
---|---|---|---|
metro | 37.2 | 37.2 | 0.170 |
non_metro | 44.6 | 44.6 | 0.163 |
metro | rmse | nrmse |
---|---|---|
metro | 55.0 | 1.47 |
non_metro | 65.3 | 1.46 |
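The two-step structure described above (a binary logit for zero VMT, then a linear regression on power-transformed VMT for households that drive) can be sketched on simulated data. This is an illustration only: the data, the reduced set of predictors, and the coefficients used to simulate them are assumptions, and the naive back-transform ignores the stochastic component that GreenSTEP adds.

```r
set.seed(42)
n <- 2000
hh <- data.frame(
  LogIncome = rnorm(n, mean = 10.8, sd = 0.7),
  Hhvehcnt  = rpois(n, lambda = 1.8)
)
hh$ZeroVeh <- as.integer(hh$Hhvehcnt == 0)

# Simulated outcome (hypothetical coefficients): some households have zero DVMT
p_zero  <- plogis(2.5 - 0.45 * hh$LogIncome - 0.5 * hh$Hhvehcnt + 3.7 * hh$ZeroVeh)
is_zero <- rbinom(n, 1, p_zero)
latent  <- pmax(0.77 + 0.08 * hh$LogIncome + 0.06 * hh$Hhvehcnt + rnorm(n, sd = 0.3), 0.1)
hh$DVMT <- ifelse(is_zero == 1, 0, latent^(1 / 0.18))

# Step 1: binary logit for the probability of zero daily VMT
zero_fit <- glm(I(DVMT == 0) ~ LogIncome + Hhvehcnt + ZeroVeh,
                family = binomial, data = hh)

# Step 2: linear model of DVMT^0.18, estimated on households with non-zero DVMT
vmt_fit <- lm(I(DVMT^0.18) ~ LogIncome + Hhvehcnt + ZeroVeh,
              data = hh, subset = DVMT > 0)

# Combine: P(non-zero) times the back-transformed conditional prediction
p0 <- predict(zero_fit, newdata = hh, type = "response")
dvmt_if_driving <- predict(vmt_fit, newdata = hh)^(1 / 0.18)
hh$DVMT_hat <- (1 - p0) * dvmt_if_driving

summary(hh$DVMT_hat)
```

Because the power transform is nonlinear, raising the fitted value to `1 / 0.18` gives something closer to a median than a mean of DVMT, which is one reason prediction error looks different on the transformed and original scales.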
Another related model in GreenSTEP is the household budget model that captures the price elasticity of travel. The budget approach to modeling is based on the perspective that households make their travel decisions within money and time budget constraints. According to Brian’s research on historical consumer expenditure survey data, household spending on gasoline and other variable costs is done within a household transportation budget that is relatively stable, as households shift expenses between transportation budget categories when gasoline prices fluctuate. Households will necessarily reduce their travel in direct proportion to the cost increase only when fuel prices or other variable costs increase to the point where it is no longer possible to shift money from other parts of the transportation budget [@gregor]. Brian assumes the transition between inelastic and elastic behavior will not be abrupt unless there is little time for the household to recognize the impact of the cost increases on the budget or respond to the cost increases. If the changes are more gradual, the transition will be less abrupt. Given the focus of GreenSTEP/RSPM on long term forecasting, we would only need to model long run elasticities.
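The budget logic can be illustrated with a deliberately simplified hard-cap sketch: travel is unchanged while variable costs fit within the budget and falls in proportion to cost once they do not. This is not the GreenSTEP implementation; in particular it omits the gradual transition between inelastic and elastic behavior described above, and the function name, budget and cost values are hypothetical.

```r
# Hard-cap sketch of a household travel budget (no smoothing of the transition)
budget_adjusted_dvmt <- function(dvmt, cost_per_mile, daily_budget) {
  # Miles per day the budget can pay for at the given variable cost
  affordable_dvmt <- daily_budget / cost_per_mile
  pmin(dvmt, affordable_dvmt)
}

cost <- c(0.10, 0.20, 0.40, 0.80)   # dollars per mile (hypothetical)
adjusted <- budget_adjusted_dvmt(dvmt = 40, cost_per_mile = cost, daily_budget = 12)
adjusted
```

In this example the household keeps driving 40 miles per day until the cost per mile exceeds 0.30 dollars; beyond that point, each doubling of cost halves the affordable travel.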
Instead of modeling DVMT and then approximating annual VMT from it, an alternative is to directly model annual average daily VMT (AADVMT). Both 2001 and 2009 NHTS contain annual mile estimates provided by ORNL, from which we can derive AADVMT.
ln(AADVMT) = f(HH variables, 5D variables)
metro | nonmetro | |
(1) | (2) | |
DrvAgePop | 0.274*** | |
(0.009) | ||
HHSIZE | 0.261*** | 0.113*** |
(0.005) | (0.007) | |
WRKCOUNT | 0.407*** | 0.349*** |
(0.007) | (0.006) | |
CENSUS_RNE | -0.124*** | |
(0.015) | ||
CENSUS_RS | 0.074*** | |
(0.012) | ||
CENSUS_RW | -0.132*** | |
(0.016) | ||
LogIncome | 0.357*** | 0.392*** |
(0.007) | (0.005) | |
Age65Plus | -0.024*** | -0.037*** |
(0.008) | (0.006) | |
ns(log1p(VehPerDriver), 3)1 | 2.120*** | 2.320*** |
(0.052) | (0.041) | |
ns(log1p(VehPerDriver), 3)2 | 4.330*** | 4.140*** |
(0.215) | (0.167) | |
ns(log1p(VehPerDriver), 3)3 | 2.120*** | 2.570*** |
(0.236) | (0.159) | |
log1p(TRPOPDEN) | -0.047*** | -0.059*** |
(0.011) | (0.011) | |
log1p(EMPTOT_5) | -0.074*** | -0.049*** |
(0.006) | (0.003) | |
Tranmilescap | -0.003*** | |
(0.0005) | ||
D1B | -0.002*** | 0.008*** |
(0.0004) | (0.003) | |
D3bpo4 | -0.0004* | |
(0.0002) | ||
D2A_EPHHM | 0.070*** | |
(0.024) | ||
ACCESS | 0.002 | -0.027*** |
(0.002) | (0.008) | |
Tranmilescap:D4c | -0.00001*** | |
(0.00000) | ||
D1B:D2A_EPHHM | -0.026*** | |
(0.005) | ||
Constant | -1.730*** | -2.280*** |
(0.122) | (0.089) | |
Observations | 41,497 | 73,899 |
R2 | 0.390 | 0.417 |
Adjusted R2 | 0.390 | 0.417 |
Residual Std. Error | 1.040 (df = 41482) | 1.040 (df = 73881) |
F Statistic | 1,893.000*** (df = 14; 41482) | 3,107.000*** (df = 17; 73881) |
Note: | * p<0.1; ** p<0.05; *** p<0.01 |
metro | rmse | nrmse | r.squared |
---|---|---|---|
metro | 30.7 | 0.573 | 0.390 |
non_metro | 34.2 | 0.566 | 0.417 |
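A log-linear AADVMT model of the kind summarized above can be sketched as follows. The data are simulated, household annual miles are represented by a single hypothetical `BESTMILE` total (in the NHTS, annual mile estimates are reported per vehicle and would first need to be summed by household), and only a few of the predictors are included.

```r
library(splines)

set.seed(7)
n <- 1500
hh <- data.frame(
  BESTMILE     = rlnorm(n, meanlog = 9.6, sdlog = 0.8),  # annual miles (hypothetical)
  HHSIZE       = sample(1:5, n, replace = TRUE),
  LogIncome    = rnorm(n, 10.8, 0.7),
  VehPerDriver = runif(n, 0, 2)
)

# Convert annual miles to annual average daily VMT
hh$AADVMT <- hh$BESTMILE / 365

# Log-linear model with a natural cubic spline on log1p(VehPerDriver)
fit <- lm(log(AADVMT) ~ HHSIZE + LogIncome + ns(log1p(VehPerDriver), 3), data = hh)

# Back-transform predictions to miles per day (ignores retransformation bias)
hh$AADVMT_hat <- exp(predict(fit, newdata = hh))

# Root mean squared error on the original scale
rmse <- sqrt(mean((hh$AADVMT - hh$AADVMT_hat)^2))
rmse
```

The spline term produces three coefficients, matching the three `ns(log1p(VehPerDriver), 3)` rows in the estimation results.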
## $metro
##
## Call:
## pscl::hurdle(formula = AADVMT.int ~ HHSIZE + WRKCOUNT + CENSUS_R +
## LogIncome + Age65Plus + ns(log1p(VehPerDriver), 3) + log1p(EMPTOT_5) +
## D1D + D1B:D2A_EPHHM + Tranmilescap + Tranmilescap:D4c + D3bpo4 +
## ACCESS | HHSIZE + WRKCOUNT + LogIncome + Age65Plus + ns(log1p(VehPerDriver),
## 3) + log1p(TRPOPDEN) + D1D + D1B:D2A_EPHHM, data = ., na.action = na.omit,
## dist = "negbin")
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -1.663 -0.686 -0.162 0.487 10.175
##
## Count model coefficients (truncated negbin with log link):
## Estimate Std. Error z value
## (Intercept) 1.01205323 0.07871768 12.86
## HHSIZE 0.16429554 0.00284087 57.83
## WRKCOUNT 0.20046525 0.00430530 46.56
## CENSUS_RNE -0.06553070 0.01627132 -4.03
## CENSUS_RS 0.02571395 0.01375002 1.87
## CENSUS_RW -0.03530183 0.01382868 -2.55
## LogIncome 0.17903454 0.00419696 42.66
## Age65Plus -0.02988608 0.00452710 -6.60
## ns(log1p(VehPerDriver), 3)1 1.17448224 0.03150512 37.28
## ns(log1p(VehPerDriver), 3)2 1.89891522 0.14345867 13.24
## ns(log1p(VehPerDriver), 3)3 1.01860008 0.14084883 7.23
## log1p(EMPTOT_5) -0.04421578 0.00329058 -13.44
## D1D -0.00047001 0.00020798 -2.26
## Tranmilescap -0.00078248 0.00034049 -2.30
## D3bpo4 -0.00033312 0.00012605 -2.64
## ACCESS 0.00215679 0.00121729 1.77
## D1B:D2A_EPHHM -0.00156197 0.00056375 -2.77
## Tranmilescap:D4c -0.00000713 0.00000236 -3.02
## Log(theta) 1.05398938 0.00753693 139.84
## Pr(>|z|)
## (Intercept) < 0.0000000000000002 ***
## HHSIZE < 0.0000000000000002 ***
## WRKCOUNT < 0.0000000000000002 ***
## CENSUS_RNE 0.00005640306333 ***
## CENSUS_RS 0.0615 .
## CENSUS_RW 0.0107 *
## LogIncome < 0.0000000000000002 ***
## Age65Plus 0.00000000004068 ***
## ns(log1p(VehPerDriver), 3)1 < 0.0000000000000002 ***
## ns(log1p(VehPerDriver), 3)2 < 0.0000000000000002 ***
## ns(log1p(VehPerDriver), 3)3 0.00000000000048 ***
## log1p(EMPTOT_5) < 0.0000000000000002 ***
## D1D 0.0238 *
## Tranmilescap 0.0216 *
## D3bpo4 0.0082 **
## ACCESS 0.0764 .
## D1B:D2A_EPHHM 0.0056 **
## Tranmilescap:D4c 0.0025 **
## Log(theta) < 0.0000000000000002 ***
## Zero hurdle model coefficients (binomial with logit link):
## Estimate Std. Error z value
## (Intercept) -8.05823 0.65520 -12.30
## HHSIZE -0.12906 0.03851 -3.35
## WRKCOUNT 1.19329 0.09631 12.39
## LogIncome 0.71847 0.05045 14.24
## Age65Plus 0.46273 0.07781 5.95
## ns(log1p(VehPerDriver), 3)1 1.35483 0.51821 2.61
## ns(log1p(VehPerDriver), 3)2 13.97910 1.93649 7.22
## ns(log1p(VehPerDriver), 3)3 9.41377 3.39600 2.77
## log1p(TRPOPDEN) -0.48856 0.07674 -6.37
## D1D 0.00880 0.00482 1.82
## D1B:D2A_EPHHM -0.02015 0.00826 -2.44
## Pr(>|z|)
## (Intercept) < 0.0000000000000002 ***
## HHSIZE 0.0008 ***
## WRKCOUNT < 0.0000000000000002 ***
## LogIncome < 0.0000000000000002 ***
## Age65Plus 0.00000000273370 ***
## ns(log1p(VehPerDriver), 3)1 0.0089 **
## ns(log1p(VehPerDriver), 3)2 0.00000000000052 ***
## ns(log1p(VehPerDriver), 3)3 0.0056 **
## log1p(TRPOPDEN) 0.00000000019327 ***
## D1D 0.0681 .
## D1B:D2A_EPHHM 0.0147 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Theta: count = 2.869
## Number of iterations in BFGS optimization: 32
## Log-likelihood: -1.95e+05 on 30 Df
##
## $non_metro
##
## Call:
## pscl::hurdle(formula = AADVMT.int ~ HHSIZE + WRKCOUNT + CENSUS_R +
## LogIncome + Age65Plus + ns(log1p(VehPerDriver), 3) + log1p(TRPOPDEN) +
## log1p(EMPTOT_5) + D1D + D1B:D2A_EPHHM + ACCESS | WRKCOUNT +
## CENSUS_R + LogIncome + Age65Plus + ns(log1p(VehPerDriver), 3) +
## log1p(EMPTOT_5) + D1B:D2A_EPHHM + ACCESS, data = ., na.action = na.omit,
## dist = "negbin")
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -1.705 -0.685 -0.159 0.479 11.759
##
## Count model coefficients (truncated negbin with log link):
## Estimate Std. Error z value
## (Intercept) 0.939339 0.055473 16.93
## HHSIZE 0.173281 0.002258 76.75
## WRKCOUNT 0.183136 0.003206 57.12
## CENSUS_RNE -0.057008 0.008371 -6.81
## CENSUS_RS 0.053989 0.006653 8.12
## CENSUS_RW -0.045698 0.008971 -5.09
## LogIncome 0.200394 0.002981 67.23
## Age65Plus -0.015876 0.003241 -4.90
## ns(log1p(VehPerDriver), 3)1 1.144715 0.025061 45.68
## ns(log1p(VehPerDriver), 3)2 1.404949 0.107054 13.12
## ns(log1p(VehPerDriver), 3)3 1.024127 0.089391 11.46
## log1p(TRPOPDEN) -0.034741 0.005444 -6.38
## log1p(EMPTOT_5) -0.024967 0.001603 -15.57
## D1D -0.000789 0.000958 -0.82
## ACCESS -0.013824 0.005239 -2.64
## D1B:D2A_EPHHM -0.004449 0.002213 -2.01
## Log(theta) 1.091443 0.005492 198.75
## Pr(>|z|)
## (Intercept) < 0.0000000000000002 ***
## HHSIZE < 0.0000000000000002 ***
## WRKCOUNT < 0.0000000000000002 ***
## CENSUS_RNE 0.00000000000973672 ***
## CENSUS_RS 0.00000000000000048 ***
## CENSUS_RW 0.00000035091456787 ***
## LogIncome < 0.0000000000000002 ***
## Age65Plus 0.00000096504772293 ***
## ns(log1p(VehPerDriver), 3)1 < 0.0000000000000002 ***
## ns(log1p(VehPerDriver), 3)2 < 0.0000000000000002 ***
## ns(log1p(VehPerDriver), 3)3 < 0.0000000000000002 ***
## log1p(TRPOPDEN) 0.00000000017563824 ***
## log1p(EMPTOT_5) < 0.0000000000000002 ***
## D1D 0.4103
## ACCESS 0.0083 **
## D1B:D2A_EPHHM 0.0444 *
## Log(theta) < 0.0000000000000002 ***
## Zero hurdle model coefficients (binomial with logit link):
## Estimate Std. Error z value
## (Intercept) -7.8887 0.5919 -13.33
## WRKCOUNT 1.1217 0.0903 12.43
## CENSUS_RNE -0.4442 0.1889 -2.35
## CENSUS_RS -0.3819 0.1579 -2.42
## CENSUS_RW -0.5114 0.1919 -2.66
## LogIncome 0.7016 0.0461 15.22
## Age65Plus 0.3877 0.0640 6.06
## ns(log1p(VehPerDriver), 3)1 2.1057 0.4062 5.18
## ns(log1p(VehPerDriver), 3)2 13.3883 1.3937 9.61
## ns(log1p(VehPerDriver), 3)3 7.3197 2.3514 3.11
## log1p(EMPTOT_5) -0.0827 0.0260 -3.18
## ACCESS -0.0664 0.0611 -1.09
## D1B:D2A_EPHHM -0.0459 0.0249 -1.84
## Pr(>|z|)
## (Intercept) < 0.0000000000000002 ***
## WRKCOUNT < 0.0000000000000002 ***
## CENSUS_RNE 0.0187 *
## CENSUS_RS 0.0156 *
## CENSUS_RW 0.0077 **
## LogIncome < 0.0000000000000002 ***
## Age65Plus 0.0000000014 ***
## ns(log1p(VehPerDriver), 3)1 0.0000002179 ***
## ns(log1p(VehPerDriver), 3)2 < 0.0000000000000002 ***
## ns(log1p(VehPerDriver), 3)3 0.0019 **
## log1p(EMPTOT_5) 0.0015 **
## ACCESS 0.2770
## D1B:D2A_EPHHM 0.0653 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Theta: count = 2.979
## Number of iterations in BFGS optimization: 25
## Log-likelihood: -3.55e+05 on 30 Df
metro | rmse | nrmse | pseudo.r2 |
---|---|---|---|
metro | 30.9 | 0.577 | 0.0445 |
non_metro | 35.0 | 0.579 | 0.0458 |
The AADVMT hurdle model adds complexity yet brings little benefit in prediction accuracy (RMSE) or sensitivity, so the AADVMT power-transformed model is preferred.
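For reference, the expected value implied by a hurdle specification with a logit zero part and a truncated negative binomial count part can be written as a short function. This is a sketch of the standard formula rather than code from the project: the function name is hypothetical, `theta = 2.869` is the metro count-model estimate reported above, and the linear predictor values are made up.

```r
# Expected count under a hurdle model with a truncated negative binomial count part.
# p_pos : probability of a positive outcome from the zero hurdle (logit) part
# mu    : mean of the untruncated negative binomial, exp(linear predictor)
# theta : negative binomial dispersion parameter
hurdle_negbin_mean <- function(p_pos, mu, theta) {
  p_count_zero <- (theta / (mu + theta))^theta   # P(Y = 0) under the untruncated negbin
  p_pos * mu / (1 - p_count_zero)
}

# Hypothetical household: hurdle linear predictor 2.0, count linear predictor 3.5
expected_aadvmt <- hurdle_negbin_mean(p_pos = plogis(2.0), mu = exp(3.5), theta = 2.869)
expected_aadvmt
```

Dividing by `1 - p_count_zero` rescales the count distribution to its positive support, so the two parts combine as P(Y > 0) times E(Y | Y > 0).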
## $metro
##
## Call:
## pscl::hurdle(formula = int_round(td.miles.Transit) ~ log1p(VehPerDriver) +
## HHSIZE + WRKCOUNT + LIF_CYC + Age0to14 + LogIncome + D1D + D2A_EPHHM +
## Fwylnmicap + Tranmilescap + D4c | AADVMT + VehPerDriver + HHSIZE +
## WRKCOUNT + LIF_CYC + Age0to14 + D1D + D3bmm4 + Fwylnmicap +
## Tranmilescap:D4c + LogIncome, data = .)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -3.221 -0.268 -0.153 -0.117 27.418
##
## Count model coefficients (truncated poisson with log link):
## Estimate Std. Error z value
## (Intercept) 1.9212620 0.0995875 19.29
## log1p(VehPerDriver) -0.1291904 0.0355742 -3.63
## HHSIZE 0.0653538 0.0063841 10.24
## WRKCOUNT 0.0268560 0.0084115 3.19
## LIF_CYCCouple w/o children 0.2762891 0.0214246 12.90
## LIF_CYCEmpty Nester 0.2650850 0.0240630 11.02
## LIF_CYCSingle 0.3283556 0.0369382 8.89
## Age0to14 -0.0193269 0.0086041 -2.25
## LogIncome 0.0239053 0.0084591 2.83
## D1D -0.0014416 0.0001659 -8.69
## D2A_EPHHM -0.0562551 0.0272781 -2.06
## Fwylnmicap -0.1631917 0.0418730 -3.90
## Tranmilescap 0.0027340 0.0004983 5.49
## D4c 0.0002471 0.0000742 3.33
## Pr(>|z|)
## (Intercept) < 0.0000000000000002 ***
## log1p(VehPerDriver) 0.00028 ***
## HHSIZE < 0.0000000000000002 ***
## WRKCOUNT 0.00141 **
## LIF_CYCCouple w/o children < 0.0000000000000002 ***
## LIF_CYCEmpty Nester < 0.0000000000000002 ***
## LIF_CYCSingle < 0.0000000000000002 ***
## Age0to14 0.02469 *
## LogIncome 0.00471 **
## D1D < 0.0000000000000002 ***
## D2A_EPHHM 0.03918 *
## Fwylnmicap 0.000097270 ***
## Tranmilescap 0.000000041 ***
## D4c 0.00087 ***
## Zero hurdle model coefficients (binomial with logit link):
## Estimate Std. Error z value
## (Intercept) -0.9625233 0.3684409 -2.61
## AADVMT -0.0037454 0.0006157 -6.08
## VehPerDriver -0.7904598 0.0731318 -10.81
## HHSIZE 0.0620903 0.0249369 2.49
## WRKCOUNT 0.3095209 0.0307155 10.08
## LIF_CYCCouple w/o children -1.1173552 0.0773660 -14.44
## LIF_CYCEmpty Nester -1.6122166 0.0893579 -18.04
## LIF_CYCSingle -1.3448620 0.1312501 -10.25
## Age0to14 0.4511670 0.0310501 14.53
## D1D 0.0060544 0.0008202 7.38
## D3bmm4 0.0030048 0.0018077 1.66
## Fwylnmicap -0.2489311 0.1327192 -1.88
## LogIncome -0.0744235 0.0313442 -2.37
## Tranmilescap:D4c 0.0001073 0.0000167 6.43
## Pr(>|z|)
## (Intercept) 0.009 **
## AADVMT 0.00000000117985 ***
## VehPerDriver < 0.0000000000000002 ***
## HHSIZE 0.013 *
## WRKCOUNT < 0.0000000000000002 ***
## LIF_CYCCouple w/o children < 0.0000000000000002 ***
## LIF_CYCEmpty Nester < 0.0000000000000002 ***
## LIF_CYCSingle < 0.0000000000000002 ***
## Age0to14 < 0.0000000000000002 ***
## D1D 0.00000000000016 ***
## D3bmm4 0.096 .
## Fwylnmicap 0.061 .
## LogIncome 0.018 *
## Tranmilescap:D4c 0.00000000013145 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 30
## Log-likelihood: -2.06e+04 on 28 Df
##
## $non_metro
##
## Call:
## pscl::hurdle(formula = int_round(td.miles.Transit) ~ log1p(VehPerDriver) +
## HHSIZE + WRKCOUNT + LIF_CYC + LogIncome + UZAPOPDEN + D2A_EPHHM |
## AADVMT + VehPerDriver + HHSIZE + WRKCOUNT + LIF_CYC + Age0to14 +
## Age65Plus + LogIncome + D3apo + WRKCOUNT + CENSUS_R, data = .)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -1.1727 -0.1312 -0.0775 -0.0651 61.3687
##
## Count model coefficients (truncated poisson with log link):
## Estimate Std. Error z value
## (Intercept) 2.35475 0.14358 16.40
## log1p(VehPerDriver) 0.07368 0.04643 1.59
## HHSIZE 0.08088 0.00833 9.71
## WRKCOUNT 0.00169 0.01235 0.14
## LIF_CYCCouple w/o children -0.10792 0.05199 -2.08
## LIF_CYCEmpty Nester 0.30676 0.04500 6.82
## LIF_CYCSingle -0.13659 0.10128 -1.35
## LogIncome -0.02427 0.01286 -1.89
## UZAPOPDEN -0.04786 0.00784 -6.11
## D2A_EPHHM -0.24876 0.03980 -6.25
## Pr(>|z|)
## (Intercept) < 0.0000000000000002 ***
## log1p(VehPerDriver) 0.113
## HHSIZE < 0.0000000000000002 ***
## WRKCOUNT 0.891
## LIF_CYCCouple w/o children 0.038 *
## LIF_CYCEmpty Nester 0.0000000000093 ***
## LIF_CYCSingle 0.177
## LogIncome 0.059 .
## UZAPOPDEN 0.0000000010246 ***
## D2A_EPHHM 0.0000000004118 ***
## Zero hurdle model coefficients (binomial with logit link):
## Estimate Std. Error z value
## (Intercept) -0.499103 0.492886 -1.01
## AADVMT -0.000277 0.000651 -0.43
## VehPerDriver -0.132132 0.076127 -1.74
## HHSIZE -0.034550 0.035261 -0.98
## WRKCOUNT 0.123666 0.042565 2.91
## LIF_CYCCouple w/o children -2.399990 0.145637 -16.48
## LIF_CYCEmpty Nester -3.071090 0.178348 -17.22
## LIF_CYCSingle -2.887385 0.262570 -11.00
## Age0to14 0.535248 0.040731 13.14
## Age65Plus 0.130595 0.083443 1.57
## LogIncome -0.151872 0.043401 -3.50
## D3apo -0.019118 0.005504 -3.47
## CENSUS_RNE 0.666180 0.109336 6.09
## CENSUS_RS 0.075224 0.089570 0.84
## CENSUS_RW -0.190835 0.114349 -1.67
## Pr(>|z|)
## (Intercept) 0.31124
## AADVMT 0.67033
## VehPerDriver 0.08262 .
## HHSIZE 0.32717
## WRKCOUNT 0.00367 **
## LIF_CYCCouple w/o children < 0.0000000000000002 ***
## LIF_CYCEmpty Nester < 0.0000000000000002 ***
## LIF_CYCSingle < 0.0000000000000002 ***
## Age0to14 < 0.0000000000000002 ***
## Age65Plus 0.11757
## LogIncome 0.00047 ***
## D3apo 0.00051 ***
## CENSUS_RNE 0.0000000011 ***
## CENSUS_RS 0.40100
## CENSUS_RW 0.09514 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 35
## Log-likelihood: -9.83e+03 on 25 Df
metro | rmse | nrmse | pseudo.r2 |
---|---|---|---|
metro | 3.37 | 4.36 | 0.440 |
non_metro | 2.33 | 4.75 | 0.678 |
## $metro
##
## Call:
## pscl::hurdle(formula = int_round(td.miles.Walk) ~ VehPerDriver +
## HHSIZE + LIF_CYC + Age0to14 + D3bpo4 + Fwylnmicap + LogIncome |
## AADVMT + VehPerDriver + HHSIZE + LIF_CYC + Age0to14 + D1D +
## D2A_EPHHM + D3bmm4 + D3bpo4 + ACCESS + WRKCOUNT + Fwylnmicap +
## Tranmilescap + LogIncome + D3apo + D4c, data = .)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -1.759 -0.463 -0.391 -0.310 10.660
##
## Count model coefficients (truncated poisson with log link):
## Estimate Std. Error z value
## (Intercept) -0.907874 0.178813 -5.08
## VehPerDriver -0.049935 0.023479 -2.13
## HHSIZE 0.060988 0.011543 5.28
## LIF_CYCCouple w/o children 0.119167 0.032594 3.66
## LIF_CYCEmpty Nester 0.101605 0.033228 3.06
## LIF_CYCSingle -0.013599 0.056549 -0.24
## Age0to14 0.029688 0.016598 1.79
## D3bpo4 0.000793 0.000324 2.45
## Fwylnmicap -0.192943 0.060215 -3.20
## LogIncome 0.120746 0.014843 8.14
## Pr(>|z|)
## (Intercept) 0.00000038298006387 ***
## VehPerDriver 0.03344 *
## HHSIZE 0.00000012661306718 ***
## LIF_CYCCouple w/o children 0.00026 ***
## LIF_CYCEmpty Nester 0.00223 **
## LIF_CYCSingle 0.80996
## Age0to14 0.07368 .
## D3bpo4 0.01444 *
## Fwylnmicap 0.00135 **
## LogIncome 0.00000000000000041 ***
## Zero hurdle model coefficients (binomial with logit link):
## Estimate Std. Error z value
## (Intercept) -4.013593 0.238696 -16.81
## AADVMT -0.001190 0.000339 -3.51
## VehPerDriver -0.146407 0.032266 -4.54
## HHSIZE 0.116488 0.017506 6.65
## LIF_CYCCouple w/o children -0.143977 0.044730 -3.22
## LIF_CYCEmpty Nester -0.137506 0.046731 -2.94
## LIF_CYCSingle -0.298566 0.068666 -4.35
## Age0to14 0.068833 0.024435 2.82
## D1D 0.001792 0.000910 1.97
## D2A_EPHHM 0.120234 0.059099 2.03
## D3bmm4 0.008061 0.001149 7.02
## D3bpo4 0.001364 0.000649 2.10
## ACCESS 0.027498 0.004874 5.64
## WRKCOUNT 0.114347 0.019439 5.88
## Fwylnmicap -0.332528 0.083261 -3.99
## Tranmilescap 0.007673 0.001168 6.57
## LogIncome 0.205425 0.019768 10.39
## D3apo 0.011566 0.002910 3.97
## D4c 0.000489 0.000196 2.49
## Pr(>|z|)
## (Intercept) < 0.0000000000000002 ***
## AADVMT 0.00045 ***
## VehPerDriver 0.0000056918330 ***
## HHSIZE 0.0000000000285 ***
## LIF_CYCCouple w/o children 0.00129 **
## LIF_CYCEmpty Nester 0.00326 **
## LIF_CYCSingle 0.0000137312743 ***
## Age0to14 0.00485 **
## D1D 0.04903 *
## D2A_EPHHM 0.04191 *
## D3bmm4 0.0000000000023 ***
## D3bpo4 0.03552 *
## ACCESS 0.0000000168321 ***
## WRKCOUNT 0.0000000040428 ***
## Fwylnmicap 0.0000650219698 ***
## Tranmilescap 0.0000000000499 ***
## LogIncome < 0.0000000000000002 ***
## D3apo 0.0000706825964 ***
## D4c 0.01275 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 22
## Log-likelihood: -3.11e+04 on 29 Df
##
## $non_metro
##
## Call:
## pscl::hurdle(formula = int_round(td.miles.Walk) ~ HHSIZE + LIF_CYC +
## D3apo + LogIncome | AADVMT + VehPerDriver + HHSIZE + LIF_CYC +
## Age0to14 + D1D + D2A_EPHHM + WRKCOUNT + LogIncome + D3apo, data = .)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.944 -0.386 -0.338 -0.283 13.422
##
## Count model coefficients (truncated poisson with log link):
## Estimate Std. Error z value
## (Intercept) -1.35795 0.16263 -8.35
## HHSIZE 0.04621 0.01102 4.19
## LIF_CYCCouple w/o children 0.06395 0.03266 1.96
## LIF_CYCEmpty Nester 0.07535 0.03276 2.30
## LIF_CYCSingle 0.11151 0.05509 2.02
## D3apo 0.00407 0.00149 2.73
## LogIncome 0.14341 0.01398 10.26
## Pr(>|z|)
## (Intercept) < 0.0000000000000002 ***
## HHSIZE 0.000027 ***
## LIF_CYCCouple w/o children 0.0502 .
## LIF_CYCEmpty Nester 0.0215 *
## LIF_CYCSingle 0.0430 *
## D3apo 0.0063 **
## LogIncome < 0.0000000000000002 ***
## Zero hurdle model coefficients (binomial with logit link):
## Estimate Std. Error z value
## (Intercept) -5.004156 0.189416 -26.42
## AADVMT -0.000568 0.000254 -2.24
## VehPerDriver -0.201467 0.023947 -8.41
## HHSIZE 0.087666 0.016330 5.37
## LIF_CYCCouple w/o children -0.087148 0.039682 -2.20
## LIF_CYCEmpty Nester -0.062195 0.041998 -1.48
## LIF_CYCSingle -0.138422 0.062963 -2.20
## Age0to14 0.112863 0.021986 5.13
## D1D 0.020040 0.003220 6.22
## D2A_EPHHM 0.089203 0.050131 1.78
## WRKCOUNT 0.073121 0.016876 4.33
## LogIncome 0.286021 0.016677 17.15
## D3apo 0.016426 0.002109 7.79
## Pr(>|z|)
## (Intercept) < 0.0000000000000002 ***
## AADVMT 0.025 *
## VehPerDriver < 0.0000000000000002 ***
## HHSIZE 0.0000000793832374 ***
## LIF_CYCCouple w/o children 0.028 *
## LIF_CYCEmpty Nester 0.139
## LIF_CYCSingle 0.028 *
## Age0to14 0.0000002845411910 ***
## D1D 0.0000000004846051 ***
## D2A_EPHHM 0.075 .
## WRKCOUNT 0.0000147146567698 ***
## LogIncome < 0.0000000000000002 ***
## D3apo 0.0000000000000068 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 24
## Log-likelihood: -4.14e+04 on 20 Df
metro | rmse | pseudo.r2 |
---|---|---|
metro | 1.013 | 0.340 |
non_metro | 0.806 | 0.122 |
## $metro
##
## Call:
## pscl::hurdle(formula = int_round(td.miles.Bike) ~ LIF_CYC + Age0to14 +
## D3apo | log1p(VehPerDriver) + HHSIZE + WRKCOUNT + LIF_CYC +
## Age0to14 + D3bpo4 + Fwylnmicap + Tranmilescap:D4c + Tranmilescap,
## data = .)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.6224 -0.1198 -0.0901 -0.0763 22.6290
##
## Count model coefficients (truncated poisson with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.65068 0.10133 6.42 0.00000000013 ***
## LIF_CYCCouple w/o children 0.29517 0.09756 3.03 0.0025 **
## LIF_CYCEmpty Nester 0.14039 0.09390 1.50 0.1349
## LIF_CYCSingle 0.14000 0.14464 0.97 0.3331
## Age0to14 -0.11177 0.04795 -2.33 0.0197 *
## D3apo -0.00443 0.00509 -0.87 0.3840
## Zero hurdle model coefficients (binomial with logit link):
## Estimate Std. Error z value
## (Intercept) -2.8757916 0.2657413 -10.82
## log1p(VehPerDriver) -0.7692363 0.1750131 -4.40
## HHSIZE 0.0723571 0.0412962 1.75
## WRKCOUNT 0.1737007 0.0507142 3.43
## LIF_CYCCouple w/o children -0.5943595 0.1340230 -4.43
## LIF_CYCEmpty Nester -0.6467664 0.1358374 -4.76
## LIF_CYCSingle -0.6461547 0.2050500 -3.15
## Age0to14 0.4389457 0.0533470 8.23
## D3bpo4 0.0046401 0.0012066 3.85
## Fwylnmicap -1.1209505 0.2404129 -4.66
## Tranmilescap -0.0209388 0.0040677 -5.15
## Tranmilescap:D4c -0.0000300 0.0000142 -2.10
## Pr(>|z|)
## (Intercept) < 0.0000000000000002 ***
## log1p(VehPerDriver) 0.00001106 ***
## HHSIZE 0.07975 .
## WRKCOUNT 0.00061 ***
## LIF_CYCCouple w/o children 0.00000922 ***
## LIF_CYCEmpty Nester 0.00000192 ***
## LIF_CYCSingle 0.00163 **
## Age0to14 < 0.0000000000000002 ***
## D3bpo4 0.00012 ***
## Fwylnmicap 0.00000312 ***
## Tranmilescap 0.00000026 ***
## Tranmilescap:D4c 0.03535 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 24
## Log-likelihood: -4.5e+03 on 18 Df
##
## $non_metro
##
## Call:
## pscl::hurdle(formula = int_round(td.miles.Bike) ~ Age0to14 + D3bpo4 |
## AADVMT + log1p(VehPerDriver) + LIF_CYC + Age0to14 + LogIncome,
## data = .)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.5517 -0.0963 -0.0617 -0.0566 31.4488
##
## Count model coefficients (truncated poisson with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.22866 0.06086 3.76 0.00017 ***
## Age0to14 -0.10417 0.04887 -2.13 0.03303 *
## D3bpo4 0.00261 0.00190 1.38 0.16839
## Zero hurdle model coefficients (binomial with logit link):
## Estimate Std. Error z value
## (Intercept) -5.784563 0.664318 -8.71
## AADVMT -0.001752 0.000908 -1.93
## log1p(VehPerDriver) -0.649427 0.229176 -2.83
## LIF_CYCCouple w/o children -1.133926 0.142812 -7.94
## LIF_CYCEmpty Nester -1.134590 0.125830 -9.02
## LIF_CYCSingle -0.934868 0.211834 -4.41
## Age0to14 0.474818 0.045337 10.47
## LogIncome 0.194778 0.060384 3.23
## Pr(>|z|)
## (Intercept) < 0.0000000000000002 ***
## AADVMT 0.0537 .
## log1p(VehPerDriver) 0.0046 **
## LIF_CYCCouple w/o children 0.000000000000002 ***
## LIF_CYCEmpty Nester < 0.0000000000000002 ***
## LIF_CYCSingle 0.000010184556702 ***
## Age0to14 < 0.0000000000000002 ***
## LogIncome 0.0013 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 16
## Log-likelihood: -4.01e+03 on 11 Df
## # A tibble: 2 × 3
## metro rmse pseudo.r2
## <chr> <dbl> <dbl>
## 1 metro 0.313 0.237
## 2 non_metro 0.179 0.161
An alternative model structure we propose is the Trip Frequency-Length (TFL) model, a combination of household-level models of trip frequency and average trip length by mode (Figure 2).
Figure 2: Flow Chart of Trip Frequency-Length Model
The trip frequency models for Transit, Bike, and Walk are hurdle models with the number of trips as the dependent variable: \((\# Trips) = hurdle(X\beta)\). The hurdle models account for the excess (“inflated”) zeros in Transit, Bike, and Walk trip counts. A hurdle model differs from a zero-inflated model in that a zero-inflated model allows zeros to arise from both the zero-inflation process and the count process, while a hurdle model only allows zeros to arise from the zero hurdle process, not from the count process. Like the other models, the trip frequency models are segmented by metro and non-metro areas.
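As a rough illustration of how the two parts of a hurdle model combine in prediction, the base R sketch below computes the expected trip count as the probability of making at least one trip times the mean of a zero-truncated Poisson. The function name `hurdle_expected_trips`, the covariates, and all coefficient values are hypothetical and are not taken from the estimated models.

```r
# Sketch: expected trip count implied by a hurdle model with a logit zero
# hurdle and a zero-truncated Poisson count part.
# All coefficient and covariate values below are hypothetical.

hurdle_expected_trips <- function(x_zero, beta_zero, x_count, beta_count) {
  # Zero hurdle part: probability that the household makes at least one trip
  p_positive <- plogis(sum(x_zero * beta_zero))
  # Count part: mean of the untruncated Poisson
  lambda <- exp(sum(x_count * beta_count))
  # Mean of the zero-truncated Poisson: lambda / P(count > 0)
  mean_if_positive <- lambda / (1 - exp(-lambda))
  c(p_positive = p_positive,
    mean_if_positive = mean_if_positive,
    expected = p_positive * mean_if_positive)
}

# Hypothetical household: intercept, household size, children aged 0-14
x <- c(1, 3, 1)
res <- hurdle_expected_trips(
  x_zero = x, beta_zero = c(-2.0, 0.10, 0.40),
  x_count = x, beta_count = c(0.05, 0.11, 0.17)
)
print(round(res, 3))
```

Because the count part is truncated at zero, every zero in the data is explained by the hurdle part alone, which is the property that separates this structure from a zero-inflated model.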
## $metro
##
## Call:
## pscl::hurdle(formula = ntrips.Transit ~ AADVMT + HHSIZE + LIF_CYC +
## Age0to14 + D1D + Tranmilescap + D4c | AADVMT + VehPerDriver +
## HHSIZE + WRKCOUNT + LIF_CYC + Age0to14 + D1D + Fwylnmicap +
## Tranmilescap:D4c, data = ., na.action = na.omit)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -2.457 -0.254 -0.155 -0.115 32.570
##
## Count model coefficients (truncated poisson with log link):
## Estimate Std. Error z value
## (Intercept) 0.048617 0.057046 0.85
## AADVMT -0.001171 0.000338 -3.47
## HHSIZE 0.112374 0.012259 9.17
## LIF_CYCCouple w/o children 0.148899 0.050201 2.97
## LIF_CYCEmpty Nester 0.150393 0.055856 2.69
## LIF_CYCSingle 0.006653 0.097939 0.07
## Age0to14 0.167051 0.016449 10.16
## D1D -0.000692 0.000313 -2.21
## Tranmilescap 0.005202 0.000920 5.65
## D4c 0.000694 0.000158 4.39
## Pr(>|z|)
## (Intercept) 0.39408
## AADVMT 0.00053 ***
## HHSIZE < 0.0000000000000002 ***
## LIF_CYCCouple w/o children 0.00302 **
## LIF_CYCEmpty Nester 0.00709 **
## LIF_CYCSingle 0.94584
## Age0to14 < 0.0000000000000002 ***
## D1D 0.02689 *
## Tranmilescap 0.000000016 ***
## D4c 0.000011527 ***
## Zero hurdle model coefficients (binomial with logit link):
## Estimate Std. Error z value
## (Intercept) -1.5660306 0.1334279 -11.74
## AADVMT -0.0039284 0.0005527 -7.11
## VehPerDriver -0.8753771 0.0668357 -13.10
## HHSIZE 0.1124917 0.0225501 4.99
## WRKCOUNT 0.2854531 0.0273473 10.44
## LIF_CYCCouple w/o children -0.9103851 0.0678056 -13.43
## LIF_CYCEmpty Nester -1.5156797 0.0790749 -19.17
## LIF_CYCSingle -1.1199809 0.1148641 -9.75
## Age0to14 0.4018514 0.0285833 14.06
## D1D 0.0063118 0.0007727 8.17
## Fwylnmicap -0.4099763 0.1208247 -3.39
## Tranmilescap:D4c 0.0001036 0.0000148 6.99
## Pr(>|z|)
## (Intercept) < 0.0000000000000002 ***
## AADVMT 0.00000000000117882 ***
## VehPerDriver < 0.0000000000000002 ***
## HHSIZE 0.00000060841501968 ***
## WRKCOUNT < 0.0000000000000002 ***
## LIF_CYCCouple w/o children < 0.0000000000000002 ***
## LIF_CYCEmpty Nester < 0.0000000000000002 ***
## LIF_CYCSingle < 0.0000000000000002 ***
## Age0to14 < 0.0000000000000002 ***
## D1D 0.00000000000000031 ***
## Fwylnmicap 0.00069 ***
## Tranmilescap:D4c 0.00000000000267718 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 32
## Log-likelihood: -1.39e+04 on 22 Df
##
## $non_metro
##
## Call:
## pscl::hurdle(formula = ntrips.Transit ~ log1p(AADVMT) + log1p(VehPerDriver) +
## HHSIZE + LIF_CYC + Age0to14 + LogIncome + D1D | log1p(AADVMT) +
## log1p(VehPerDriver) + WRKCOUNT + LIF_CYC + Age0to14 + D1B +
## D3bmm4 + LogIncome, data = ., na.action = na.omit)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -2.0226 -0.1014 -0.0818 -0.0718 56.3148
##
## Count model coefficients (truncated poisson with log link):
## Estimate Std. Error z value
## (Intercept) 0.4628 0.1781 2.60
## log1p(AADVMT) -0.0397 0.0188 -2.12
## log1p(VehPerDriver) 0.1421 0.0636 2.23
## HHSIZE 0.1421 0.0110 12.87
## LIF_CYCCouple w/o children 0.3869 0.0709 5.46
## LIF_CYCEmpty Nester 0.4382 0.0663 6.61
## LIF_CYCSingle 0.5946 0.1258 4.73
## Age0to14 0.2310 0.0150 15.43
## LogIncome -0.0467 0.0165 -2.83
## D1D -0.0158 0.0047 -3.37
## Pr(>|z|)
## (Intercept) 0.00934 **
## log1p(AADVMT) 0.03442 *
## log1p(VehPerDriver) 0.02555 *
## HHSIZE < 0.0000000000000002 ***
## LIF_CYCCouple w/o children 0.000000048842 ***
## LIF_CYCEmpty Nester 0.000000000038 ***
## LIF_CYCSingle 0.000002285863 ***
## Age0to14 < 0.0000000000000002 ***
## LogIncome 0.00468 **
## D1D 0.00076 ***
## Zero hurdle model coefficients (binomial with logit link):
## Estimate Std. Error z value
## (Intercept) -0.71019 0.27560 -2.58
## log1p(AADVMT) -0.08173 0.02803 -2.92
## log1p(VehPerDriver) -0.23302 0.09966 -2.34
## WRKCOUNT 0.11309 0.02412 4.69
## LIF_CYCCouple w/o children -2.50520 0.08279 -30.26
## LIF_CYCEmpty Nester -2.92327 0.08555 -34.17
## LIF_CYCSingle -2.75239 0.15258 -18.04
## Age0to14 0.53435 0.01964 27.21
## D1B -0.03508 0.00693 -5.07
## D3bmm4 -0.01023 0.00441 -2.32
## LogIncome -0.07992 0.02634 -3.03
## Pr(>|z|)
## (Intercept) 0.0100 **
## log1p(AADVMT) 0.0035 **
## log1p(VehPerDriver) 0.0194 *
## WRKCOUNT 0.00000276 ***
## LIF_CYCCouple w/o children < 0.0000000000000002 ***
## LIF_CYCEmpty Nester < 0.0000000000000002 ***
## LIF_CYCSingle < 0.0000000000000002 ***
## Age0to14 < 0.0000000000000002 ***
## D1B 0.00000041 ***
## D3bmm4 0.0204 *
## LogIncome 0.0024 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 23
## Log-likelihood: -1.69e+04 on 21 Df
2 validation
metro | rmse | pseudo.r2 |
---|---|---|
metro | 0.704 | 0.125 |
non_metro | 0.651 | 0.203 |
## $metro
##
## Call:
## pscl::hurdle(formula = ntrips.Walk ~ AADVMT + VehPerDriver + HHSIZE +
## LIF_CYC + Age0to14 + D1D + D2A_EPHHM + D3bmm4 + D3bpo4 + ACCESS +
## Fwylnmicap + Tranmilescap + LogIncome + D3apo + D4c | AADVMT +
## VehPerDriver + HHSIZE + LIF_CYC + Age0to14 + D1D + D2A_EPHHM +
## D3bmm4 + D3bpo4 + ACCESS + WRKCOUNT + Fwylnmicap + Tranmilescap +
## LogIncome + D3apo + D4c, data = ., na.action = na.omit)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -2.428 -0.546 -0.450 0.198 11.639
##
## Count model coefficients (truncated poisson with log link):
## Estimate Std. Error z value
## (Intercept) 0.3396326 0.1108162 3.06
## AADVMT -0.0006296 0.0001593 -3.95
## VehPerDriver -0.0627771 0.0159711 -3.93
## HHSIZE 0.0661640 0.0071273 9.28
## LIF_CYCCouple w/o children 0.0818843 0.0204502 4.00
## LIF_CYCEmpty Nester 0.0149507 0.0209264 0.71
## LIF_CYCSingle -0.1564482 0.0359595 -4.35
## Age0to14 0.1787000 0.0095290 18.75
## D1D 0.0002525 0.0001596 1.58
## D2A_EPHHM 0.0763013 0.0287273 2.66
## D3bmm4 0.0021534 0.0004074 5.29
## D3bpo4 0.0007432 0.0002638 2.82
## ACCESS 0.0043409 0.0010836 4.01
## Fwylnmicap -0.2160661 0.0418904 -5.16
## Tranmilescap 0.0020197 0.0005384 3.75
## LogIncome 0.0381371 0.0089785 4.25
## D3apo 0.0032860 0.0013631 2.41
## D4c 0.0002665 0.0000787 3.39
## Pr(>|z|)
## (Intercept) 0.00218 **
## AADVMT 0.00007773 ***
## VehPerDriver 0.00008471 ***
## HHSIZE < 0.0000000000000002 ***
## LIF_CYCCouple w/o children 0.00006226 ***
## LIF_CYCEmpty Nester 0.47495
## LIF_CYCSingle 0.00001357 ***
## Age0to14 < 0.0000000000000002 ***
## D1D 0.11373
## D2A_EPHHM 0.00791 **
## D3bmm4 0.00000013 ***
## D3bpo4 0.00483 **
## ACCESS 0.00006176 ***
## Fwylnmicap 0.00000025 ***
## Tranmilescap 0.00018 ***
## LogIncome 0.00002161 ***
## D3apo 0.01592 *
## D4c 0.00071 ***
## Zero hurdle model coefficients (binomial with logit link):
## Estimate Std. Error z value
## (Intercept) -3.139966 0.212511 -14.78
## AADVMT -0.001089 0.000306 -3.55
## VehPerDriver -0.151875 0.028867 -5.26
## HHSIZE 0.115032 0.016143 7.13
## LIF_CYCCouple w/o children -0.158409 0.040800 -3.88
## LIF_CYCEmpty Nester -0.201468 0.042687 -4.72
## LIF_CYCSingle -0.273041 0.061517 -4.44
## Age0to14 0.100385 0.022590 4.44
## D1D 0.002613 0.000949 2.75
## D2A_EPHHM 0.111833 0.053647 2.08
## D3bmm4 0.007243 0.001089 6.65
## D3bpo4 0.002233 0.000599 3.73
## ACCESS 0.028885 0.005245 5.51
## WRKCOUNT 0.127150 0.017774 7.15
## Fwylnmicap -0.296850 0.074853 -3.97
## Tranmilescap 0.008907 0.001074 8.29
## LogIncome 0.159085 0.017562 9.06
## D3apo 0.010425 0.002659 3.92
## D4c 0.000461 0.000188 2.46
## Pr(>|z|)
## (Intercept) < 0.0000000000000002 ***
## AADVMT 0.00038 ***
## VehPerDriver 0.00000014305766 ***
## HHSIZE 0.00000000000104 ***
## LIF_CYCCouple w/o children 0.00010 ***
## LIF_CYCEmpty Nester 0.00000236229526 ***
## LIF_CYCSingle 0.00000906148248 ***
## Age0to14 0.00000883779981 ***
## D1D 0.00589 **
## D2A_EPHHM 0.03711 *
## D3bmm4 0.00000000002937 ***
## D3bpo4 0.00019 ***
## ACCESS 0.00000003642928 ***
## WRKCOUNT 0.00000000000085 ***
## Fwylnmicap 0.00007315368140 ***
## Tranmilescap < 0.0000000000000002 ***
## LogIncome < 0.0000000000000002 ***
## D3apo 0.00008836073023 ***
## D4c 0.01399 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 33
## Log-likelihood: -4.32e+04 on 37 Df
##
## $non_metro
##
## Call:
## pscl::hurdle(formula = ntrips.Walk ~ AADVMT + VehPerDriver + HHSIZE +
## LIF_CYC + Age0to14 + D1D + D2A_EPHHM + D3bpo4 + ACCESS + WRKCOUNT +
## LogIncome | AADVMT + VehPerDriver + HHSIZE + LIF_CYC + Age0to14 +
## D1D + D2A_EPHHM + D3bpo4 + WRKCOUNT + LogIncome + D3apo, data = .,
## na.action = na.omit)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -1.325 -0.464 -0.401 -0.320 20.853
##
## Count model coefficients (truncated poisson with log link):
## Estimate Std. Error z value
## (Intercept) 0.299015 0.095775 3.12
## AADVMT -0.000333 0.000131 -2.54
## VehPerDriver -0.060192 0.012722 -4.73
## HHSIZE 0.043963 0.007597 5.79
## LIF_CYCCouple w/o children 0.049262 0.019986 2.46
## LIF_CYCEmpty Nester -0.007278 0.021042 -0.35
## LIF_CYCSingle -0.051017 0.034032 -1.50
## Age0to14 0.132969 0.009983 13.32
## D1D 0.011622 0.001456 7.98
## D2A_EPHHM 0.067366 0.026231 2.57
## D3bpo4 0.001211 0.000291 4.16
## ACCESS -0.026325 0.009260 -2.84
## WRKCOUNT -0.015735 0.008499 -1.85
## LogIncome 0.045475 0.008372 5.43
## Pr(>|z|)
## (Intercept) 0.0018 **
## AADVMT 0.0111 *
## VehPerDriver 0.0000022320118256 ***
## HHSIZE 0.0000000071777780 ***
## LIF_CYCCouple w/o children 0.0137 *
## LIF_CYCEmpty Nester 0.7294
## LIF_CYCSingle 0.1338
## Age0to14 < 0.0000000000000002 ***
## D1D 0.0000000000000014 ***
## D2A_EPHHM 0.0102 *
## D3bpo4 0.0000311868571506 ***
## ACCESS 0.0045 **
## WRKCOUNT 0.0641 .
## LogIncome 0.0000000557378292 ***
## Zero hurdle model coefficients (binomial with logit link):
## Estimate Std. Error z value
## (Intercept) -4.035095 0.162199 -24.88
## AADVMT -0.000651 0.000222 -2.93
## VehPerDriver -0.193868 0.020660 -9.38
## HHSIZE 0.092010 0.014375 6.40
## LIF_CYCCouple w/o children -0.143740 0.034839 -4.13
## LIF_CYCEmpty Nester -0.128328 0.036772 -3.49
## LIF_CYCSingle -0.199495 0.054693 -3.65
## Age0to14 0.104998 0.019551 5.37
## D1D 0.022151 0.003020 7.34
## D2A_EPHHM 0.115400 0.044049 2.62
## D3bpo4 -0.002354 0.000756 -3.11
## WRKCOUNT 0.078126 0.014819 5.27
## LogIncome 0.235719 0.014362 16.41
## D3apo 0.020123 0.002598 7.75
## Pr(>|z|)
## (Intercept) < 0.0000000000000002 ***
## AADVMT 0.00340 **
## VehPerDriver < 0.0000000000000002 ***
## HHSIZE 0.0000000001544648 ***
## LIF_CYCCouple w/o children 0.0000369449170576 ***
## LIF_CYCEmpty Nester 0.00048 ***
## LIF_CYCSingle 0.00026 ***
## Age0to14 0.0000000785595015 ***
## D1D 0.0000000000002207 ***
## D2A_EPHHM 0.00880 **
## D3bpo4 0.00184 **
## WRKCOUNT 0.0000001349459596 ***
## LogIncome < 0.0000000000000002 ***
## D3apo 0.0000000000000095 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 24
## Log-likelihood: -6.02e+04 on 28 Df
2 validation
metro | rmse | pseudo.r2 |
---|---|---|
metro | 1.69 | 0.0406 |
non_metro | 1.42 | 0.0217 |
## $metro
##
## Call:
## pscl::hurdle(formula = ntrips.Bike ~ AADVMT + Age0to14 + Age65Plus +
## D1C + D3bpo4 + WRKCOUNT + LogIncome | log1p(AADVMT) + HHSIZE +
## LIF_CYC + Age0to14 + Age65Plus + D2A_EPHHM + D3bpo4 + WRKCOUNT +
## Fwylnmicap + Tranmilescap + LogIncome, data = ., na.action = na.omit)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.7027 -0.1645 -0.1205 -0.0981 39.8895
##
## Count model coefficients (truncated poisson with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.119569 0.320612 -0.37 0.7092
## AADVMT -0.001340 0.000482 -2.78 0.0054 **
## Age0to14 0.151201 0.019000 7.96 0.0000000000000018 ***
## Age65Plus 0.093488 0.035817 2.61 0.0090 **
## D1C 0.003128 0.001747 1.79 0.0734 .
## D3bpo4 0.002008 0.000723 2.78 0.0054 **
## WRKCOUNT 0.046955 0.024350 1.93 0.0538 .
## LogIncome 0.069766 0.029169 2.39 0.0168 *
## Zero hurdle model coefficients (binomial with logit link):
## Estimate Std. Error z value
## (Intercept) -4.967972 0.488708 -10.17
## log1p(AADVMT) -0.074752 0.038759 -1.93
## HHSIZE 0.085512 0.031548 2.71
## LIF_CYCCouple w/o children -0.539160 0.092424 -5.83
## LIF_CYCEmpty Nester -0.446249 0.116576 -3.83
## LIF_CYCSingle -0.674839 0.154108 -4.38
## Age0to14 0.389527 0.038843 10.03
## Age65Plus -0.238551 0.063021 -3.79
## D2A_EPHHM 0.312249 0.117192 2.66
## D3bpo4 0.003756 0.000975 3.85
## WRKCOUNT 0.183997 0.038413 4.79
## Fwylnmicap -0.921581 0.128641 -7.16
## Tranmilescap -0.017346 0.002804 -6.19
## LogIncome 0.192054 0.043330 4.43
## Pr(>|z|)
## (Intercept) < 0.0000000000000002 ***
## log1p(AADVMT) 0.05377 .
## HHSIZE 0.00672 **
## LIF_CYCCouple w/o children 0.00000000542706 ***
## LIF_CYCEmpty Nester 0.00013 ***
## LIF_CYCSingle 0.00001192270338 ***
## Age0to14 < 0.0000000000000002 ***
## Age65Plus 0.00015 ***
## D2A_EPHHM 0.00771 **
## D3bpo4 0.00012 ***
## WRKCOUNT 0.00000166819099 ***
## Fwylnmicap 0.00000000000078 ***
## Tranmilescap 0.00000000061729 ***
## LogIncome 0.00000931996713 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 30
## Log-likelihood: -8.65e+03 on 22 Df
##
## $non_metro
##
## Call:
## pscl::hurdle(formula = ntrips.Bike ~ AADVMT + VehPerDriver + HHSIZE +
## LIF_CYC + Age0to14 + Age65Plus + D1D + ACCESS + WRKCOUNT + LogIncome +
## D3apo | AADVMT + VehPerDriver + LIF_CYC + Age0to14 + Age65Plus +
## D1A + D2A_EPHHM + ACCESS + WRKCOUNT + LogIncome + D3apo, data = .,
## na.action = na.omit)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.6605 -0.1437 -0.1056 -0.0873 39.0882
##
## Count model coefficients (truncated poisson with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.536486 0.288182 5.33 0.000000097 ***
## AADVMT -0.001198 0.000465 -2.57 0.01003 *
## VehPerDriver -0.061889 0.041843 -1.48 0.13912
## HHSIZE 0.048332 0.022306 2.17 0.03025 *
## LIF_CYCCouple w/o children 0.109894 0.066463 1.65 0.09823 .
## LIF_CYCEmpty Nester 0.205788 0.078116 2.63 0.00843 **
## LIF_CYCSingle -0.141329 0.123715 -1.14 0.25330
## Age0to14 0.112147 0.029309 3.83 0.00013 ***
## Age65Plus -0.145641 0.043436 -3.35 0.00080 ***
## D1D 0.002849 0.006946 0.41 0.68167
## ACCESS 0.084040 0.028946 2.90 0.00369 **
## WRKCOUNT 0.039836 0.026798 1.49 0.13714
## LogIncome -0.089071 0.025349 -3.51 0.00044 ***
## D3apo 0.006663 0.003702 1.80 0.07189 .
## Zero hurdle model coefficients (binomial with logit link):
## Estimate Std. Error z value
## (Intercept) -6.360728 0.451172 -14.10
## AADVMT -0.002043 0.000619 -3.30
## VehPerDriver -0.203577 0.061281 -3.32
## LIF_CYCCouple w/o children -0.770327 0.082411 -9.35
## LIF_CYCEmpty Nester -0.645316 0.103512 -6.23
## LIF_CYCSingle -0.830495 0.137582 -6.04
## Age0to14 0.400223 0.032650 12.26
## Age65Plus -0.233184 0.058701 -3.97
## D1A 0.039241 0.014639 2.68
## D2A_EPHHM 0.233317 0.120469 1.94
## ACCESS 0.030270 0.040782 0.74
## WRKCOUNT 0.130113 0.037873 3.44
## LogIncome 0.257638 0.041040 6.28
## D3apo 0.027031 0.005232 5.17
## Pr(>|z|)
## (Intercept) < 0.0000000000000002 ***
## AADVMT 0.00096 ***
## VehPerDriver 0.00089 ***
## LIF_CYCCouple w/o children < 0.0000000000000002 ***
## LIF_CYCEmpty Nester 0.00000000045 ***
## LIF_CYCSingle 0.00000000158 ***
## Age0to14 < 0.0000000000000002 ***
## Age65Plus 0.00007114663 ***
## D1A 0.00735 **
## D2A_EPHHM 0.05278 .
## ACCESS 0.45795
## WRKCOUNT 0.00059 ***
## LogIncome 0.00000000034 ***
## D3apo 0.00000023806 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 27
## Log-likelihood: -9.42e+03 on 28 Df
2 validation
metro | rmse | pseudo.r2 |
---|---|---|
metro | 0.512 | 0.0562 |
non_metro | 0.456 | 0.0578 |
The average trip length models are linear regression models with the dependent variable (TRPMILES) power-transformed: \(TRPMILES^{0.10} = X\beta\). These models are similar in structure to the non-zero DVMT model in GreenSTEP, but are estimated for the average trip length of Transit, Bike and Walk trips.
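The power transformation can be sketched in base R as follows. The data are simulated and the covariates (`HHSIZE`, `LogIncome`) are placeholders chosen for illustration; only the exponent 0.10 comes from the text. The back-transformation shown is the naive one (raising the fitted value to the power 1/0.10) and makes no correction for retransformation bias, which is an assumption of this sketch.

```r
# Sketch: fit a linear model on power-transformed trip length and map
# predictions back to miles. Data and covariates are simulated placeholders.
set.seed(1)
n <- 500
hh <- data.frame(
  HHSIZE    = sample(1:5, n, replace = TRUE),
  LogIncome = rnorm(n, mean = 10.5, sd = 0.7)
)
# Simulated positive average trip length in miles (hypothetical process)
hh$TRPMILES <- exp(0.5 + 0.05 * hh$HHSIZE + 0.1 * (hh$LogIncome - 10.5) +
                   rnorm(n, sd = 0.8))

power <- 0.10
fit <- lm(I(TRPMILES^power) ~ HHSIZE + LogIncome, data = hh)

# Predictions are on the transformed scale; raise to 1 / power to get miles.
pred_transformed <- predict(fit, newdata = hh)
pred_miles <- pred_transformed^(1 / power)

summary(pred_miles)
```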
The TFL model is simplified from the original Trip Frequency-Length-Mode (TFLM) model, which models individual trips for each household in the sample. One reason for this simplification is performance. The TFLM model has the advantage that trip-level information, for example trip purpose and trip length, which are important factors in mode choice decisions, can be used in the models. However, estimating the TFLM model with NHTS data requires the trip dataset, which has more than 1 million observations, and simulation requires creating a dataset with one observation for every trip. Even though this can work, the memory requirement and the speed penalty are high. We eventually settled on the simplified TFL model, which captures the essentials of travel demand for non-driving modes.
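To make the simplification concrete, the sketch below collapses a trip-level table into one row per household and mode, holding the trip count and the average trip length that the household-level models use. The toy data and the column names `mode`, `ntrips`, and `avg.miles` are hypothetical; `HOUSEID` and `TRPMILES` are used only as illustrative identifiers.

```r
# Sketch: collapse a trip-level table to household-by-mode trip counts and
# average trip lengths. Column names and values are hypothetical.
trips <- data.frame(
  HOUSEID  = c(1, 1, 1, 2, 2, 3),
  mode     = c("Walk", "Walk", "Transit", "Bike", "Bike", "Walk"),
  TRPMILES = c(0.5, 1.0, 6.0, 2.0, 4.0, 0.8)
)

# Trip frequency: number of trips per household and mode
freq <- aggregate(TRPMILES ~ HOUSEID + mode, data = trips, FUN = length)
names(freq)[names(freq) == "TRPMILES"] <- "ntrips"

# Average trip length per household and mode
len <- aggregate(TRPMILES ~ HOUSEID + mode, data = trips, FUN = mean)
names(len)[names(len) == "TRPMILES"] <- "avg.miles"

hh_mode <- merge(freq, len, by = c("HOUSEID", "mode"))
print(hh_mode)
```

Households with no trips by a given mode do not appear in this output; a frequency model would need them added back with a count of zero.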
Variable | metro (1) | nonmetro (2) |
---|---|---|
log1p(AADVMT) | -0.026*** (0.003) | -0.005** (0.002) |
log1p(VehPerDriver) | -0.188*** (0.014) | -0.013 (0.008) |
HHSIZE | 0.027*** (0.003) | 0.010*** (0.003) |
LIF_CYCCouple w/o children | -0.076*** (0.009) | -0.163*** (0.006) |
LIF_CYCEmpty Nester | -0.098*** (0.009) | -0.158*** (0.006) |
LIF_CYCSingle | -0.073*** (0.013) | -0.152*** (0.009) |
Age0to14 | 0.093*** (0.005) | 0.158*** (0.004) |
D1D | 0.001*** (0.0002) | 0.001* (0.001) |
D2A_EPHHM | -0.028** (0.011) | -0.018** (0.008) |
D3bmm4 | 0.0003 (0.0002) | |
D3bpo4 | 0.0005*** (0.0001) | |
ACCESS | 0.002** (0.001) | -0.0001 (0.004) |
WRKCOUNT | 0.039*** (0.004) | 0.012*** (0.003) |
Fwylnmicap | 0.039*** (0.015) | |
Tranmilescap | 0.005*** (0.0002) | |
LogIncome | -0.013*** (0.004) | -0.008*** (0.002) |
D3apo | -0.003*** (0.001) | -0.003*** (0.0003) |
D4c | -0.0004*** (0.0001) | |
Tranmilescap:D4c | 0.00002*** (0.00000) | |
Constant | 0.361*** (0.043) | 0.285*** (0.026) |
Observations | 38,676 | 67,755 |
R2 | 0.101 | 0.132 |
Adjusted R2 | 0.101 | 0.132 |
Residual Std. Error | 0.465 (df = 38656) | 0.416 (df = 67741) |
F Statistic | 229.000*** (df = 19; 38656) | 792.000*** (df = 13; 67741) |
Note: | Standard errors in parentheses. | *p<0.1; **p<0.05; ***p<0.01 |
metro | rmse | r2 |
---|---|---|
metro | 3.78 | 0.101 |
non_metro | 3.43 | 0.132 |
2 validation
metro | rmse | r2 |
---|---|---|
metro | 1.047 | 0.0514 |
non_metro | 0.434 | 0.0274 |
metro | nonmetro | |
(1) | (2) | |
log1p(AADVMT) | -0.002 | |
(0.001) | ||
AADVMT | -0.0001*** | |
(0.00001) | ||
VehPerDriver | -0.003** | |
(0.001) | ||
HHSIZE | 0.002* | |
(0.001) | ||
LIF_CYCCouple w/o children | -0.017*** | -0.015*** |
(0.003) | (0.002) | |
LIF_CYCEmpty Nester | -0.014*** | -0.013*** |
(0.003) | (0.003) | |
LIF_CYCSingle | -0.018*** | -0.012*** |
(0.004) | (0.004) | |
Age0to14 | 0.028*** | 0.020*** |
(0.002) | (0.001) | |
UZAEMPDEN | 0.002* | |
(0.001) | ||
D2A_EPHHM | 0.011*** | |
(0.004) | ||
D3bpo4 | 0.0001*** | |
(0.00003) | ||
D1D | 0.0005** | |
(0.0002) | ||
WRKCOUNT | 0.009*** | 0.004*** |
(0.001) | (0.001) | |
Fwylnmicap | -0.021*** | |
(0.004) | ||
Tranmilescap | -0.001*** | |
(0.0001) | ||
LogIncome | 0.005*** | 0.006*** |
(0.001) | (0.001) | |
D3apo | 0.001*** | |
(0.0001) | ||
Constant | -0.010 | -0.042*** |
(0.014) | (0.010) | |
Observations | 50,547 | 67,755 |
R2 | 0.020 | 0.017 |
Adjusted R2 | 0.020 | 0.016 |
Residual Std. Error | 0.189 (df = 50534) | 0.166 (df = 67743) |
F Statistic | 87.000*** (df = 12; 50534) | 104.000*** (df = 11; 67743) |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
2 validation
metro | rmse | r2 |
---|---|---|
metro | 0.765 | 0.0203 |
non_metro | 0.591 | 0.0166 |
The PMT model is made up of two sequential models: a total person miles traveled (PMT) model and a mode allocation model (Figure 1).
Figure 1. Flow Chart of Person Miles Traveled by Mode (PMT) Model
The total PMT model is a household-level model of the total person miles traveled by all household members. It is a linear regression model with pmt (log-transformed) as the dependent variable: \(\ln(pmt) = X\beta\)
The model can be segmented by the life stage of a household (e.g., single, young couple, full nesters, empty nesters), by place type, etc., for better model fit and predictive power.
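A segmented log-linear model of this kind can be sketched in base R by splitting the household table by segment and fitting one regression per segment. The data below are simulated, and the segment labels, covariates, and coefficient values are hypothetical stand-ins, not the variables or estimates of the PMT model.

```r
# Sketch: fit ln(pmt) = X beta separately for each segment.
# Simulated data; segment labels and covariates are hypothetical.
set.seed(42)
n <- 400
hh <- data.frame(
  segment   = sample(c("metro", "non_metro"), n, replace = TRUE),
  HHSIZE    = sample(1:5, n, replace = TRUE),
  LogIncome = rnorm(n, mean = 10.5, sd = 0.7)
)
hh$pmt <- exp(1 + 0.3 * hh$HHSIZE + 0.2 * hh$LogIncome + rnorm(n, sd = 0.5))

# One lm object per segment, returned as a named list
fits <- lapply(split(hh, hh$segment), function(d) {
  lm(log(pmt) ~ HHSIZE + LogIncome, data = d)
})

# One coefficient vector per segment
sapply(fits, coef)
```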
The mode allocation model captures the share of PMT by mode for each household and allocates total PMT to each mode in prediction. In estimation, we first choose a base mode, compute the ratio of the PMT share of each other mode to that of the base mode, and then use the log of this ratio (i.e., the log-odds relative to the base mode) as the dependent variable of the mode allocation model. We estimate \(n - 1\) models if there are \(n\) modes in total. In prediction, we first predict the log ratios from each of the \(n-1\) models, exponentiate them to get the ratios relative to the base mode, and then apply the additional condition that the shares of all modes sum to 1 to get the predicted PMT share for each mode. The model structure is consistent with the multinomial logit model that is commonly used in mode choice modeling.
\(\ln\left(\frac{P_{Transit}}{P_{Auto}}\right) = X\beta_{Transit}\), and \(\ln\left(\frac{P_{Bike/Walk}}{P_{Auto}}\right) = X\beta_{Bike/Walk}\).
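The prediction step can be sketched in R as follows. With auto as the base mode, the sum-to-one condition implies \(P_{Auto} = 1 / (1 + \sum_m e^{\eta_m})\) and \(P_m = e^{\eta_m} / (1 + \sum_m e^{\eta_m})\), where \(\eta_m\) is the predicted log-odds ratio for non-base mode \(m\). The function name `log_odds_to_shares` and all numeric inputs below are hypothetical and serve only to illustrate the arithmetic.

```r
# Convert predicted log-odds ratios (relative to the base mode) into shares.
# log_odds: numeric matrix, one row per household, one named column per
# non-base mode. The function name and inputs are illustrative assumptions.
log_odds_to_shares <- function(log_odds, base = "Auto") {
  odds   <- exp(as.matrix(log_odds))   # P_mode / P_base for each non-base mode
  denom  <- 1 + rowSums(odds)          # the base mode has odds of exactly 1
  shares <- cbind(1 / denom, odds / denom)
  colnames(shares) <- c(base, colnames(odds))
  shares
}

# Hypothetical predicted log-odds ratios for two households
log_odds <- cbind(Transit = c(-3.0, -1.5), BikeWalk = c(-2.5, -4.0))
shares   <- log_odds_to_shares(log_odds)

# Allocate total household PMT (illustrative values) to each mode;
# row i of shares is multiplied by total_pmt[i].
total_pmt   <- c(40, 25)
pmt_by_mode <- shares * total_pmt
```

Each row of `shares` sums to 1, so the allocated PMT by mode adds back up to the household total. On the estimation side, a household with zero PMT on a mode has an undefined log ratio; how such observations are handled is not covered by this sketch.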
The advantage of the PMT model is that its structure is similar to the existing household travel model in GreenSTEP and consistent with mode choice models in travel demand modeling. The disadvantages include:
After reporting to the TAC in October 2016, we agreed to suspend work on the PMT models and focus on the TFL models (below).
Statistical significance, theoretical foundation, and predictive power: because of the large sample size (n>15,000) of the 2009 NHTS, it is easy to obtain a large number of statistically significant coefficients, but they do not necessarily make for a good predictive model. On the other hand, models that focus solely on predictive power (for example, those based on machine learning algorithms) may lack a theoretical basis and may therefore break down when predicting outcomes for conditions far from the base-year range. It is particularly hard for predictive models to capture behavior that has not been observed in the data, for example, potential non-linearity of price elasticities when prices rise.
Circella, Giovanni, Susan Handy, and Marlon G. Boarnet, 2014. Impacts of Gas Price on Passenger Vehicle Use and Greenhouse Gas Emissions, California Air Resources Board, Technical Background Document.
Dong, Jing, Diane Davidson, Frank Southworth, and Tim Reuscher, 2012. Analysis of Automobile Travel Demand Elasticities With Respect To Travel Cost, Federal Highway Administration.
Graham, D.J., Glaister, S., 2002. The Demand for Automobile Fuel: A Survey of Elasticities. Journal of Transport Economics and Policy (JTEP) 36, 1–25.
Gregor, Brian, Modeling the Effects of Vehicle Travel Costs on Household Vehicle Travel. GreenSTEP Technical Document.
Ramsey, Kevin and Alexander Bell, 2014. Smart Location Database Version 2.0 User Guide. U.S. EPA. URL: https://www.epa.gov/smartgrowth/smart-location-mapping#SLD, accessed on 03/01/2015.
U.S. Department of Transportation, Federal Highway Administration, 2009 National Household Travel Survey. URL: http://nhts.ornl.gov, accessed on 02/09/2016.
sessionInfo()
## R version 3.3.3 (2017-03-06)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] splines stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] stringr_1.2.0 modelr_0.1.0 pander_0.6.0 stargazer_5.2
## [5] magrittr_1.5 scales_0.4.1 dplyr_0.5.0 purrr_0.2.2
## [9] readr_1.1.0 tidyr_0.6.1 tibble_1.3.0 ggplot2_2.2.1
## [13] tidyverse_1.1.1 pastecs_1.3-18 boot_1.3-18 moments_0.14
## [17] knitr_1.15.1 gridExtra_2.2.1 pacman_0.4.1
##
## loaded via a namespace (and not attached):
## [1] gtools_3.5.0 reshape2_1.4.2 haven_1.0.0
## [4] lattice_0.20-35 colorspace_1.3-2 htmltools_0.3.5
## [7] yaml_2.1.14 foreign_0.8-67 DBI_0.6-1
## [10] readxl_0.1.1 plyr_1.8.4 munsell_0.4.3
## [13] gtable_0.2.0 rvest_0.3.2 caTools_1.17.1
## [16] codetools_0.2-15 psych_1.7.3.21 evaluate_0.10
## [19] labeling_0.3 forcats_0.2.0 pscl_1.4.9
## [22] parallel_3.3.3 highr_0.6 broom_0.4.2
## [25] Rcpp_0.12.10 KernSmooth_2.23-15 ROCR_1.0-7
## [28] backports_1.0.5 gdata_2.17.0 jsonlite_1.4
## [31] gplots_3.0.1 mnormt_1.5-5 hms_0.3
## [34] digest_0.6.12 stringi_1.1.5 bookdown_0.3
## [37] grid_3.3.3 rprojroot_1.2 bitops_1.0-6
## [40] tools_3.3.3 lazyeval_0.2.0 MASS_7.3-45
## [43] xml2_1.1.1 lubridate_1.6.0 assertthat_0.2.0
## [46] rmarkdown_1.4 httr_1.2.1 R6_2.2.0
## [49] nlme_3.1-131