Big Data: New Tricks for Econometrics
Varian, Hal R. "Big data: New tricks for econometrics." The Journal of Economic Perspectives (2014): 3-27.

Konstantina Christakopoulou, Liang Zeng
Group G21

Related to Chapter 28: Data Mining




Motivation. Machine Learning for Economic Transactions: Linear Regression is not Enough!

- Big data size
- A lot of features: choose variables
- Relationships are not only linear


Connection to the Course: Decision Trees, e.g. ID3

Challenges of ID3:
- Cannot handle continuous attributes
- Sensitive to outliers

1. C4.5 and Classification And Regression Trees (CART) can:
+ handle continuous and discrete attributes
+ handle missing attribute values
+ reduce over-fitting by post-pruning

2. Random Forests: an ensemble of decision trees. Randomization (choosing the sample + choosing the attributes) typically leads to better accuracy.


ID3 Decision Tree
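
ID3 picks, at each node, the discrete attribute with the highest information gain (the reduction in entropy of the class labels). The block below is a minimal sketch of that selection step only, not a full tree builder. The function names and the toy data are hypothetical and do not come from the slides.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one discrete attribute."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data (assumption, for illustration only).
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "stay", "play", "stay"]

# ID3 would split on the attribute with the largest gain.
best = max(rows[0], key=lambda a: information_gain(rows, labels, a))
```

In this toy data "windy" separates the two classes perfectly, so it has the larger gain and is the attribute chosen for the root split.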


Classification and Regression Trees (CART)

- Classification tree: the predicted outcome is the class to which the data belongs.
- Regression tree: the predicted outcome can be considered a real number (e.g. the age of a house, or a patient’s length of stay in a hospital).
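
The difference between the two tree types can be sketched with a single split on one continuous attribute. The sketch assumes the criteria commonly associated with CART (Gini impurity for classification, squared error for regression); the slides do not name a criterion. The function names and numbers are hypothetical.

```python
from collections import Counter

def gini_cost(labels):
    """Node size times Gini impurity (classification criterion)."""
    n = len(labels)
    return n * (1.0 - sum((c / n) ** 2 for c in Counter(labels).values()))

def squared_error_cost(values):
    """Sum of squared deviations from the node mean (regression criterion)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_threshold(xs, ys, cost):
    """Scan midpoints between distinct sorted x values; return the cheapest split."""
    points = sorted(set(xs))
    best_t, best_cost = None, float("inf")
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        total = cost(left) + cost(right)
        if total < best_cost:
            best_t, best_cost = t, total
    return best_t

# Hypothetical data (assumption): one continuous predictor, two kinds of target.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
classes = ["a", "a", "a", "b", "b", "b"]   # classification target
targets = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]   # regression target

t_class = best_threshold(xs, classes, gini_cost)
t_reg = best_threshold(xs, targets, squared_error_cost)

# Leaf predictions for the left branch (x <= threshold).
left_classes = [c for x, c in zip(xs, classes) if x <= t_class]
left_targets = [v for x, v in zip(xs, targets) if x <= t_reg]
class_prediction = Counter(left_classes).most_common(1)[0][0]  # majority class
value_prediction = sum(left_targets) / len(left_targets)       # mean of targets
```

Both searches place the threshold at 6.5 here. The classification leaf answers with a class label, while the regression leaf answers with the mean of the targets that fall into it.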


Classification and Regression Trees (CART)

Predict Titanic survivors using age and class


Classification and Regression Trees (CART)

A CART for survivors of the Titanic, using the R language


Random Forests


Random Forests

1. Choose a bootstrap sample and start to grow a tree.
2. At each node, choose a random sample of predictors to make the next decision.
3. Repeat many times to grow a forest of trees.
4. For prediction, have each tree make its prediction and then take a majority vote.
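
The procedure above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation behind the slides: splits are scored with Gini impurity, class labels are strings, and all function names, parameters (`n_trees`, `n_try`, `max_depth`) and data are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(X, y, n_try, rng, depth=0, max_depth=5):
    """Grow one tree; each node considers only n_try randomly chosen predictors."""
    if len(set(y)) == 1 or depth == max_depth:
        return majority(y)  # leaf: a plain label (assumed to be a string)
    best = None
    for j in rng.sample(range(len(X[0])), n_try):
        # Candidate thresholds: every distinct value except the largest,
        # so both sides of the split are non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            cost = (len(left) * gini([y[i] for i in left])
                    + len(right) * gini([y[i] for i in right]))
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:  # the sampled predictors offer no usable split
        return majority(y)
    _, j, t, left, right = best
    grow = lambda idx: grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                 n_try, rng, depth + 1, max_depth)
    return (j, t, grow(left), grow(right))

def predict_tree(tree, row):
    """Walk down internal nodes (tuples) until a leaf label is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def grow_forest(X, y, n_trees=25, n_try=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) rows with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], n_try, rng))
    return forest

def predict_forest(forest, row):
    """Each tree votes; the forest returns the majority class."""
    return majority([predict_tree(tree, row) for tree in forest])

# Hypothetical data (assumption): two predictors, two classes.
X = [[1, 10], [2, 11], [3, 12], [8, 20], [9, 21], [10, 22]]
y = ["low", "low", "low", "high", "high", "high"]
forest = grow_forest(X, y)
```

Calling `predict_forest(forest, [0, 9])` asks every tree for a vote and returns the most frequent answer. A single tree grown on an unlucky bootstrap sample can be wrong, which is why the vote is taken over many trees.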

Decision Tree Learning
+ One tree
+ Trained on all learning samples
+ Prone to distortions, e.g. outliers

Random Forest
+ Many decision trees
+ Each decision tree is trained on a random subset of the samples
+ Reduces the effect of outliers (less prone to overfitting)


Thank you!