Models: pets and herds
Carlos J. Gil Bellosta
September 2014
Carlos J. Gil Bellosta – datanalytics Models: pets and herds September 2014 1 / 21
This is a pet...
Source: http://jessfalcone.wordpress.com
... and this is a herd
Source: http://bonfirehealth.com/negative-influences-comparisons-social-cues-herd/
Some people treat computers as pets...
Source: aliexpress.com
... and others like herds
Source: Failure Trends in a Large Disk Drive Population, Pinheiro et al.
This is a statistical model treated as a pet
Source: http://www.ats.ucla.edu/stat/stata/seminars/interaction_sem/interaction_sem.htm
Pets are very demanding and require...
1 variable selection,
2 checks for outliers,
3 assessment of the goodness of fit,
4 finding confidence intervals,
5 calculating p-values,
6 interpreting the results,
7 discussing the generalization,
8 ...
Models... as herds?
Source: http://www.gotmedieval.com
Model construction: population
Source: http://timyeo.wordpress.com/
Model construction: data enrichment (aka left join)
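A minimal sketch of what enrichment by left join means here, assuming a hypothetical population table with one row per subject and a hypothetical lookup of extra attributes. All names and values are made up for illustration; the point is only that every subject is kept, and missing attributes become empty.

```python
# Hypothetical population table: one row per subject.
population = [
    {"subject_id": 1, "segment": "A"},
    {"subject_id": 2, "segment": "B"},
    {"subject_id": 3, "segment": "A"},
]

# Hypothetical enrichment data, keyed by subject; subject 2 has no match.
extra = {
    1: {"age": 34},
    3: {"age": 51},
}


def left_join(rows, lookup, key, columns):
    """Keep every row of `rows`; fill `columns` with None when there is no match."""
    joined = []
    for row in rows:
        match = lookup.get(row[key], {})
        enriched_row = dict(row)
        for col in columns:
            enriched_row[col] = match.get(col)
        joined.append(enriched_row)
    return joined


enriched = left_join(population, extra, "subject_id", ["age"])
for row in enriched:
    print(row)
```

The result still has exactly one row per subject, which is the "box" the following slides argue is too small for messy, temporal subject data.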
But subject data is often messy...
Source: http://arquitectolegista.com.ar/
... and contains temporal data...
Source: http://thirdorderscientist.org/
... that is difficult to fit in a box (table)
Source: http://cutestcatpics.com/cat-trying-to-fit-into-tiny-box/
We have a whole dataset per subject!
... and a model per subject?
Source: http://www.unc.edu/
(Most) models are sophisticated summaries of data
Source: http://xkcd.r-forge.r-project.org/
Do you seek α? Build a model per stock!
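One possible reading of "a model per stock", sketched under assumptions: each stock gets its own ordinary least squares fit of its return on the market return, and the intercept of that fit is taken as the stock's α. The tickers and returns below are invented, and the slide does not say which regression was actually meant.

```python
from collections import defaultdict


def fit_line(x, y):
    """Ordinary least squares for y = alpha + beta * x; returns (alpha, beta)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Made-up observations: (ticker, market_return, stock_return).
observations = [
    ("AAA", 0.01, 0.025), ("AAA", -0.02, -0.035), ("AAA", 0.03, 0.065),
    ("BBB", 0.01, 0.004), ("BBB", -0.02, -0.011), ("BBB", 0.03, 0.014),
]

# Split the data into one small dataset per stock...
by_stock = defaultdict(lambda: ([], []))
for ticker, market, stock in observations:
    by_stock[ticker][0].append(market)
    by_stock[ticker][1].append(stock)

# ...and fit one model per stock: a herd, not a pet.
models = {ticker: fit_line(x, y) for ticker, (x, y) in by_stock.items()}
for ticker, (alpha, beta) in models.items():
    print(ticker, round(alpha, 4), round(beta, 4))
```

Nobody inspects any single fit by hand here; the loop is the whole modelling process.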
Fitting a million models in the nineties was quite an achievement (for some)
Beyond recency and frequency: self-exciting processes
Source: Bursting transition in a linear self-exciting point process, Onaga, T. et al.
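To illustrate why a self-exciting process carries more information than recency and frequency, here is a sketch of the conditional intensity of a linear self-exciting process, λ(t) = μ + Σ α·exp(−β(t − tᵢ)) over past events tᵢ < t. The exponential kernel and the values of `mu`, `alpha` and `beta` are assumptions chosen for illustration, not taken from the cited paper.

```python
import math


def intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Baseline rate mu plus an exponentially decaying kick from each past event."""
    return mu + sum(
        alpha * math.exp(-beta * (t - s)) for s in event_times if s < t
    )


# Two hypothetical subjects with the same frequency (3 events)
# and the same recency (last event at t = 9.0).
sparse = [1.0, 4.0, 9.0]
bursty = [8.0, 8.5, 9.0]

now = 10.0
print("sparse:", intensity(now, sparse))
print("bursty:", intensity(now, bursty))
```

Both subjects look identical to a recency/frequency summary, yet the bursty one has the higher intensity at `now`, because its earlier events are recent enough to still contribute.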
One logistic regression per Gmail user...
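A toy sketch of the "one logistic regression per user" idea, assuming a single hypothetical numeric feature per message and a binary label per user. The data, the feature and the plain gradient descent fit are all illustrative assumptions; nothing here describes how any real mail system works.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Batch gradient descent on the log-loss; one feature plus an intercept."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Made-up data: for each user, (feature values, labels).
# The two users react to the same feature in opposite ways.
data_per_user = {
    "u1": ([0.0, 1.0, 2.0, 3.0], [0, 0, 1, 1]),
    "u2": ([0.0, 1.0, 2.0, 3.0], [1, 1, 0, 0]),
}

# One model per user: each user's own data decides the sign of the weight.
models = {user: fit_logistic(xs, ys) for user, (xs, ys) in data_per_user.items()}
for user, (w, b) in models.items():
    print(user, "P(label = 1 | x = 3) =", round(sigmoid(w * 3.0 + b), 3))
```

A single pooled model would average the two opposite behaviours away; per-user models keep them apart.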
Challenges: statistical, computational,... and more!
This approach faces many challenges:
1 Computational: how do you fit so many models? (But Spark rocks!)
2 Statistical: how do you...
   1 perform variable selection?
   2 evaluate the fit?
   3 deal with outliers?
   4 ...
3 And finally, how do you sell these approaches to business people (ex-Google)?