Predicting Rental Prices in NYC

Post on 09-Feb-2017

predicting rental unit prices in NYC

Joel Carlson

http://joelcarlson.me

two hypotheses

1. Changes in the number of issued liquor licenses precede changes in rental prices

2. Changes in the number of taxi pickups and drop-offs precede changes in rental prices

motivation

1. Identify regions ripe for investment

2. Identify areas which may undergo gentrification

• Give policy makers an early chance to implement rent controls

the data

Rental Unit Prices

• Published by Zillow

Liquor Licenses

• NY Gov’t Liquor Authority Database*

Taxi pickups/drop-offs

• Published by NY City Gov’t

• ~30 Gb / year

* Databases were, unfortunately, harmed in the creation of this project

data pipeline

Raw Data : Roughly oscillatory trend

Raw Data : Month-over-month changes are too noisy

STL Decomposition

• STL Data (Trend): Acquire oscillatory trend

• STL Data (Seasonal): Extract seasonal component

• STL Data (Remainder): Discard remainder
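
The split into trend, seasonal, and remainder can be illustrated with a small sketch. Note the swap: the slides use STL, which fits the trend with local regression, while this sketch uses the simpler classical additive decomposition with a centered moving average. The function name `decompose` and the 12-month period are assumptions made for illustration, not taken from the project code.

```python
from statistics import mean


def decompose(series, period=12):
    """Classical additive decomposition: series = trend + seasonal + remainder.

    A simpler stand-in for STL, not STL itself. `period` (12 months) is an
    assumption; it must be even, and the series needs at least 2 * period points.
    """
    n, half = len(series), period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        # Centered moving average: the two end points get half weight
        trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period

    # Average the detrended values for each position in the cycle
    detrended = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            detrended[t % period].append(series[t] - trend[t])
    seasonal_means = [mean(vals) for vals in detrended]
    offset = mean(seasonal_means)  # center the seasonal component on zero
    seasonal = [seasonal_means[t % period] - offset for t in range(n)]

    # Whatever is left after removing trend and seasonal is the remainder
    remainder = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, remainder
```

The first and last `period // 2` months have no trend estimate (`None`) because the centered window does not fit there; STL handles the edges differently.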

Processed Data : Prediction target!

Model Features (lagged 3 to 12 months):

• Monthly changes in number of liquor licenses issued

• Monthly changes in taxi pickups and drop-offs

• Historical changes in price
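
A minimal sketch of how lagged features like these might be assembled from a monthly series. The helper names `monthly_changes` and `lagged_features` are hypothetical, and the choice to drop months without a full lag history is an assumption; the slides do not say how the start of each series was handled.

```python
def monthly_changes(values):
    """Month-over-month differences; the first month has no predecessor."""
    return [None] + [values[i] - values[i - 1] for i in range(1, len(values))]


def lagged_features(changes, min_lag=3, max_lag=12):
    """Build one feature row per month from changes lagged min_lag..max_lag months.

    Months that would need data from before the start of the series are skipped.
    Returns a list of (month_index, [change at t-min_lag, ..., change at t-max_lag]).
    """
    rows = []
    for t in range(len(changes)):
        lags = [
            changes[t - k] if t - k >= 0 else None
            for k in range(min_lag, max_lag + 1)
        ]
        if None not in lags:
            rows.append((t, lags))
    return rows
```

The same two helpers would be applied to each of the three series (liquor licenses, taxi trips, price), and the resulting rows joined on the month index. Because the shortest lag is 3 months, every feature is known at least 3 months before the month being predicted.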

pipeline

Liquor Data, Taxi Data, Rental Price Data → Aggregate → Synchronize → Trend → Train/optimize Models

Models:

• Vector Autoregression (VAR)

• Random Forest

• Random Forest without liquor and taxi features (w/o L+T)

Goal 1 : Test taxi and liquor license hypotheses

Goal 2 : Accurately forecast monthly changes

are the models accurate?

VAR found no statistically significant relationship between taxis + liquor licenses and rent increases

[Figure: Model forecasts for a single zip code. Series: Random Forests, VAR, Target. Periods: training, forecast.]

how far can the models predict?

Forecast Accuracy for All NY Zip-codes

• Hoped to observe the full RF (with liquor and taxi features) outperform the RF without them on long-term predictions

• Failed to observe this

• Adding taxi and liquor data does not improve predictions

• Confirms VAR finding
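
One way to make the comparison between the two random forests concrete is to score each zip code separately. This is a sketch only: the slides do not state which error metric was used, so mean absolute error is an assumption, and `improvement_by_zip` is a hypothetical helper name.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and forecast values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def improvement_by_zip(actual, full_model, reduced_model):
    """For each zip code, MAE of the reduced model minus MAE of the full model.

    Each argument maps zip code -> list of monthly values.
    Positive numbers mean the full model (with liquor and taxi features)
    forecast that zip code more accurately than the model without them.
    """
    return {
        zip_code: mean_absolute_error(actual[zip_code], reduced_model[zip_code])
        - mean_absolute_error(actual[zip_code], full_model[zip_code])
        for zip_code in actual
    }
```

If the extra features carried no signal anywhere, these per-zip differences would cluster around zero.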

what can the models tell us about NY?

NYU has identified a number of regions which have been undergoing gentrification

Three categories:

1. Gentrifying

• Low-income in 1990, experienced rent growth above the median between 1990 and 2014

2. Non-Gentrifying

• Started off as low-income in 1990 but experienced more modest growth than gentrifying areas

3. Higher Income

• Those that were already at high income levels in 1990
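
The three categories amount to a two-step decision rule, sketched below. How "low-income" is determined is not given in the slides, so it is passed in as a boolean; the function and parameter names are hypothetical.

```python
def classify_neighborhood(low_income_1990, rent_growth, median_rent_growth):
    """Apply the three-way rule described above.

    low_income_1990: whether the area counted as low-income in 1990
        (the income threshold itself is not given in the slides).
    rent_growth / median_rent_growth: growth between 1990 and 2014,
        in the same units.
    """
    if not low_income_1990:
        return "Higher Income"
    if rent_growth > median_rent_growth:
        return "Gentrifying"
    return "Non-Gentrifying"
```

An area exactly at the median falls into "Non-Gentrifying" here, since the definition requires growth above the median.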

what can the models tell us about NY?

Bimodal accuracy distribution by zip code:

For some zip codes, models trained with liquor and taxi data substantially outperform models trained without them

These regions are almost all gentrifying

• Bed-Stuy and Crown Heights

• Bronx near Yankee Stadium

• Jackson Heights near Citi Field (Mets)

• Not included in NYU map

• Google results from 2016 indicate gentrification has just begun

Regions where Liquor and Taxi Models Perform Better

Perhaps there is some signal after all…

Thank you!

http://joelcarlson.me

jnkcarlson@gmail.com

github.com/joelcarlson

github.com/joelcarlson/CityPredictions