STAT 425 – Modern Methods of Data Analysis
Assignment 1 – OLS Regression (105 points)
PROBLEM 1 – ASKING PRICE FOR USED CARS
These data come from a study of the asking price for different makes and models of cars on the used car market. The response of interest is the asking price, and the remaining variables are potential predictors. The data frames to use in R are Usedcars.working and Usedcars; the latter also includes the make and model information for these cars. For developing OLS regression models it will be easier to use the Usedcars.working data frame. These data are also contained in the Arc file Used-car.lsp.
Variable | Info | Description
asking | Response | Asking price for a used car
year | Predictor | Model year
numopt | Predictor | Number of options
miles | Predictor | Miles on odometer
pricenew | Predictor | Price of car new
loanval | Predictor | Remainder of original loan amount left to pay
avgretail | Predictor | Current blue book value
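One way the base-model step might look in R is sketched below. Because the course data are not reproduced here, the sketch fits the model to cars.sim, a simulated stand-in that only borrows the variable names from the table above. Its values, the seed, and the object names cars.sim and base.lm are assumptions made for illustration. With the real data, the same lm() call would presumably use data = Usedcars.working.

```r
# Simulated stand-in for Usedcars.working (NOT the course data).
set.seed(425)
n <- 60
cars.sim <- data.frame(
  year     = sample(1985:1995, n, replace = TRUE),
  numopt   = sample(0:8, n, replace = TRUE),
  miles    = round(runif(n, 5000, 120000)),
  pricenew = round(runif(n, 9000, 30000))
)
cars.sim$avgretail <- with(cars.sim,
  0.5 * pricenew - 0.03 * miles + 300 * (year - 1985) + rnorm(n, sd = 500))
cars.sim$loanval <- 0.85 * cars.sim$avgretail + rnorm(n, sd = 300)
cars.sim$asking  <- with(cars.sim,
  1.05 * avgretail + 50 * numopt + rnorm(n, sd = 400))

# Base model: asking price regressed on every other column.
base.lm <- lm(asking ~ ., data = cars.sim)
print(summary(base.lm))

# Standard residual diagnostics (drawn only in an interactive session).
if (interactive()) {
  par(mfrow = c(2, 2))
  plot(base.lm)
}
```

The four panels produced by plot() on an lm object (residuals vs. fitted, normal Q-Q, scale-location, residuals vs. leverage) are a convenient starting point for critiquing the base fit.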
Grading rubric (35 points)
- Fitting base model, critiquing it, and discussing any deficiencies. (5 pts.)
- Model development, documentation, and discussion. (20 pts.)
    - Consideration of assumptions
    - Possible predictor transformations
    - Stepwise procedures
- Fitting final model, critiquing it, interpreting it, and discussing any deficiencies. (10 pts.)
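The model-development items in the rubric (predictor transformations and stepwise procedures) could be explored along the following lines. This is a minimal sketch on simulated data, not the course data: the data frame d, the extra column noise, and the choice of log(miles) as the candidate transformation are all assumptions made for illustration.

```r
# Simulated data in which asking price depends on log(miles); NOT the course data.
set.seed(1)
n <- 80
d <- data.frame(
  miles  = runif(n, 5000, 120000),
  numopt = sample(0:8, n, replace = TRUE),
  noise  = rnorm(n)                      # a predictor with no real effect
)
d$asking <- 20000 - 1500 * log(d$miles) + 100 * d$numopt + rnorm(n, sd = 300)

# Compare the untransformed predictor with a log-transformed one.
fit.raw <- lm(asking ~ miles + numopt + noise, data = d)
fit.log <- lm(asking ~ log(miles) + numopt + noise, data = d)
print(AIC(fit.raw, fit.log))             # lower AIC suggests the better fit

# Stepwise selection by AIC, starting from the transformed model.
fit.step <- step(fit.log, direction = "both", trace = 0)
print(formula(fit.step))
```

AIC is only one possible criterion. Residual plots before and after the transformation remain the more direct check on the model assumptions.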
PROBLEM 2 – THE BOSTON HOUSING DATA
The Boston Housing data set was the basis for a 1978 paper by Harrison and Rubinfeld, which discussed approaches for using housing market data to estimate the willingness to pay for clean air. The authors employed a hedonic price model, based on the premise that the price of the property is determined by structural attributes (such as size, age, condition) as well as neighborhood attributes (such as crime rate, accessibility, environmental factors). This type of approach is often used to quantify the effects of environmental factors that affect the price of a property.
Data were gathered for 506 census tracts in the Boston Standard Metropolitan Statistical Area (SMSA) in 1970, collected from a number of sources including the 1970 US Census and the Boston Metropolitan Area Planning Committee. The variables used to develop the Harrison-Rubinfeld housing value equation are listed in the table below and are contained in the R data frame Boston.working.
Variables Used in the Harrison-Rubinfeld Housing Value Equation
VARIABLE | TYPE | DEFINITION | SOURCE
CMEDV | Dependent Variable | Median value of homes in thousands of dollars | 1970 U.S. Census
RM | Structural | Average number of rooms | 1970 U.S. Census
AGE | Structural | % of units built prior to 1940 | 1970 U.S. Census
B | Neighborhood | Black % of population | 1970 U.S. Census
LSTAT | Neighborhood | % of population that is lower socioeconomic status | 1970 U.S. Census
CRIM | Neighborhood | Crime rate | FBI (1970)
ZN | Neighborhood | % of residential land zoned for lots > 25,000 sq. ft. | Metro Area Planning Commission (1972)
INDUS | Neighborhood | % of non-retail business acres (proxy for industry) | Mass. Dept. of Commerce & Development (1965)
TAX | Neighborhood | Property tax rate | Mass. Taxpayers Foundation (1970)
PTRATIO | Neighborhood | Pupil-teacher ratio | Mass. Dept. of Ed ('71-'72)
CHAS | Neighborhood | Dummy variable indicating proximity to Charles River (1 = on river) | 1970 U.S. Census Tract maps
DIS | Accessibility | Weighted distances to major employment centers in area | Schnare dissertation (Unpublished, 1973)
RAD | Accessibility | Index of accessibility to radial highways | MIT Boston Project
NOX | Air Pollution | Nitrogen oxide concentrations (pphm) | TASSIM
REFERENCE
Harrison, D., and Rubinfeld, D. L., “Hedonic Housing Prices and the Demand for Clean Air,” Journal of Environmental Economics and Management, 5 (1978), 81-102.
Develop a regression model for CMEDV using the available predictors in the table above. In R, use the data frame Boston.working, as that will allow you to fit the first model using the command:
> bos.lm = lm(CMEDV~.,data=Boston.working)
As the authors of the original paper were primarily interested in the role of air pollution in housing prices, that variable (NOX) should be retained throughout. Your analysis should be thorough! Document the model development process by copying and pasting relevant R commands, output, and graphics into your write-up. You may also use the Bostonworking.lsp file for Arc, but I would like you to refit your final model from Arc in R. Include diagnostic plots for your final model from R.
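If stepwise selection is used, one way to guarantee that the air pollution variable is never dropped is to give step() a lower scope that contains it. The sketch below uses a simulated stand-in, bos.sim, with a few of the variable names from the table. The values, the seed, the column junk, and the object names are assumptions for illustration, and NOX is deliberately simulated with no effect so that an unconstrained search would have reason to remove it.

```r
# Simulated stand-in for Boston.working (NOT the real data).
set.seed(1978)
n <- 200
bos.sim <- data.frame(
  NOX   = runif(n, 4, 9),      # arbitrary scale, no true effect in this simulation
  RM    = rnorm(n, 6.3, 0.7),
  LSTAT = runif(n, 2, 35),
  junk  = rnorm(n)             # pure noise predictor
)
bos.sim$CMEDV <- 10 + 5 * bos.sim$RM - 0.5 * bos.sim$LSTAT + rnorm(n, sd = 3)

full.lm <- lm(CMEDV ~ ., data = bos.sim)

# lower = ~ NOX forces NOX to stay in every model that step() visits.
kept.lm <- step(full.lm,
                scope = list(lower = ~ NOX, upper = ~ .),
                direction = "both", trace = 0)
print(formula(kept.lm))
```

With the real data, the same scope argument could be passed when stepping from bos.lm.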
Grading rubric (35 points)
- Fitting base model, critiquing it, and discussing any deficiencies. (5 pts.)
- Model development, documentation, and discussion. (20 pts.)
    - Consideration of assumptions
    - Possible predictor transformations
    - Stepwise procedures
- Fitting final model, critiquing it, and discussing any deficiencies. (5 pts.)
- Discussion of the role of NOx in your final model, which was the predictor of primary interest to researchers. (5 pts.)
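For the discussion of NOx, the estimated coefficient, its confidence interval, and a comparison of the models with and without the term are natural quantities to report. The following is a minimal sketch on simulated data. The data frame sim, the two-predictor model, and the effect sizes are assumptions for illustration and say nothing about the real Boston results.

```r
# Simulated data with a built-in negative NOX effect; NOT the real data.
set.seed(5)
n <- 150
sim <- data.frame(NOX = runif(n, 4, 9), RM = rnorm(n, 6.3, 0.7))
sim$CMEDV <- 5 + 5 * sim$RM - 1.2 * sim$NOX + rnorm(n, sd = 3)

with.nox    <- lm(CMEDV ~ RM + NOX, data = sim)
without.nox <- lm(CMEDV ~ RM, data = sim)

# Estimate, standard error, t statistic, and p-value for NOX.
print(coef(summary(with.nox))["NOX", ])

# 95% confidence interval for the NOX coefficient.
ci <- confint(with.nox, "NOX", level = 0.95)
print(ci)

# Partial F-test: does adding NOX improve on the model without it?
print(anova(without.nox, with.nox))
```

If the response or NOX has been transformed in the final model, the coefficient should be interpreted on the transformed scale.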
PROBLEM 3 – MANPOWER AND WORKLOAD FOR U.S. NAVY BACHELOR OFFICERS’ QUARTERS (BOQ)
The U.S. Navy attempts to develop equations for estimation of manpower needs for manning installations such as Bachelor Officers’ Quarters. Develop such an equation for the U.S. Navy using the dataframe BOQ in R and BOQ.lsp in Arc.
Variable | Info | Description
BERTH / berth | Predictor | Operational berthing capacity
CHECKINS / chkin | Predictor | Monthly average number of check-ins
COMMONS / commons | Predictor | Square feet of common use area
OCCUPANCY / occ | Predictor | Average daily occupancy
ROOMS / rooms | Predictor | Number of rooms
SERVICE / serv | Predictor | Weekly hours of service desk operation
WING.NUM / wing | Predictor | Number of building wings
MANHRS / manhrs | Response | Monthly man-hours, i.e. manpower
Grading rubric (35 points)
- Fitting base model, critiquing it, and discussing any deficiencies. (5 pts.)
- Model development, documentation, and discussion. (20 pts.)
    - Consideration of assumptions
    - Possible predictor transformations
    - Stepwise procedures
- Fitting final model, critiquing it, interpreting it, and discussing any deficiencies. (10 pts.)
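When critiquing a fitted manpower equation, it can help to check whether a few installations dominate the fit. The sketch below computes leverages and Cook's distances on a simulated stand-in, boq.sim, that borrows three predictor names from the table. The values, the deliberately unusual last row, and the 4/n cutoff (a common rule of thumb, not a requirement of the assignment) are assumptions for illustration.

```r
# Simulated stand-in for the BOQ data frame (NOT the real data).
set.seed(7)
n <- 25
boq.sim <- data.frame(
  occ   = runif(n, 2, 800),
  chkin = runif(n, 4, 1500),
  rooms = round(runif(n, 5, 400))
)
boq.sim$manhrs <- with(boq.sim,
  150 + 1.5 * occ + 1.0 * chkin + 2 * rooms + rnorm(n, sd = 200))

# Make the last installation unusual: very high occupancy, very low man-hours.
boq.sim$occ[n]    <- 3000
boq.sim$manhrs[n] <- 500

boq.lm <- lm(manhrs ~ ., data = boq.sim)

lev <- hatvalues(boq.lm)        # leverage of each case
cd  <- cooks.distance(boq.lm)   # influence of each case on the fitted coefficients

# Cases exceeding the rule-of-thumb cutoff 4/n.
flagged <- which(cd > 4 / n)
print(flagged)
print(round(cbind(leverage = lev, cooks = cd)[flagged, , drop = FALSE], 3))
```

Flagged cases are candidates for closer inspection, not automatic deletion. Refitting with and without them shows how much the estimated equation depends on them.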