Big Data and Big Cities

Big Data and Big Cities: The Promises and Limitations of Improved Measures of City Life
Ed Glaeser, Scott Kominers, Mike Luca and Nikhil Naik

Transcript of Big Data and Big Cities

Page 1: Big Data and Big Cities

Big Data and Big Cities: The Promises and Limitations of Improved Measures of City Life

Ed Glaeser, Scott Kominers, Mike Luca and Nikhil Naik

Page 2: Big Data and Big Cities

Outline

•  Part 1: Big Data and Urban Questions
   –  Too much big think on big data
•  Part 2: Measuring City Life when data is missing
   –  Look forward to my lecture later
   –  Measuring the impact of water in Zambia
•  Part 3: Using Big Data to Improve City Services
   –  Modest model on tournaments vs. consultants
   –  Report on a hygiene tournament in Boston

Page 3: Big Data and Big Cities

Big Data and Big Questions about Cities

•  How does urban development impact the economy?
   –  Shocks to people vs. shocks to place
•  How does the physical city interact with social outcomes?
   –  Measuring the physical city with big data
•  How much do people value urban amenities?
   –  Measuring amenities and better contingent valuation
•  How can public policy improve the quality of urban space?
   –  Merging government actions with physical measures

Page 4: Big Data and Big Cities

Examples of Big Data

•  Much finer geographic records (the IRS data)
•  Similar data from private providers (CoreLogic)
•  Novel data sets on traditional outcomes (Zoona)
•  Novel data sets on relatively new things (Yelp)
•  Completely different data on things we had barely thought about before (Google Street View)

Page 5: Big Data and Big Cities

What’s It Good For

•  Big data does not intrinsically solve any of the causal inference issues that we have long worried about.
•  It does make it possible to measure more things (hygiene, streetscapes) in more places in more ways.
•  IRS records provide the mother-of-all panel sets, which is particularly useful for spatial interventions
   –  The right way to judge empowerment zones, for example, would be to use the panel structure

Page 6: Big Data and Big Cities

Led Astray By “Bigger” Data (.3)

Page 7: Big Data and Big Cities

Measuring the Previously Unmeasurable

•  We are used to having public sources for data on the most basic economic outcome: income
•  This is typically not true in the developing world, especially sub-Saharan Africa.
•  Especially false for a usable panel
•  Example #1: Zoona data, water and health
•  Example #2: Google Street View, essentially night lights on steroids

Page 8: Big Data and Big Cities

Zoona in Zambia

Page 9: Big Data and Big Cities
Page 10: Big Data and Big Cities
Page 11: Big Data and Big Cities
Page 12: Big Data and Big Cities
Page 13: Big Data and Big Cities

Measuring Streetscapes (with Nikhil Naik)

Page 14: Big Data and Big Cities

Crowdsourcing City Government: Using Tournaments to Improve Inspection Accuracy

Edward L. Glaeser, Andrew Hillis

Scott Duke Kominers, Michael Luca

Page 15: Big Data and Big Cities

Big Data: Consumer review websites

•  Part of crowdsourcing movement.

Page 16: Big Data and Big Cities

Yelp

•  Luca 2011: high ratings increase revenue for independent restaurants
•  Chevalier and Mayzlin 2006: Barnes & Noble, Amazon and online book orders
•  Ghose et al. 2011: TripAdvisor and hotel reservations

Page 17: Big Data and Big Cities
Page 18: Big Data and Big Cities

Yelp Search

Page 19: Big Data and Big Cities

Restaurant’s Yelp Page

Page 20: Big Data and Big Cities

Some background

•  Los Angeles in 1997…
   –  Posting →
      •  higher scores
      •  lower rates of foodborne illness
      •  Jin and Leslie (2003)
   –  Major success story of disclosure
•  NYC in 2010
•  Yet, a lot has changed…

Page 21: Big Data and Big Cities

The Rise of Tournaments

•  Now, organizations can outsource large-scale prediction problems via open tournaments!
   –  e.g., 210 prediction tournaments on Kaggle, with prizes ranging from $0 to $500,000.

Page 22: Big Data and Big Cities

The Rise of Tournaments

•  Now, organizations can outsource large-scale prediction problems via open tournaments!
   –  Returns are not just cash: also recognition, job interviews, certification, satisfaction, and learning.

Page 23: Big Data and Big Cities

An Economic Design Question

(When) can open tournaments help solve public problems?

Page 24: Big Data and Big Cities

Theory

•  Tournaments make sense when:
   –  the probability of a breakthrough, φ, is high;
   –  the baseline low-skilled outcome, q, is not that bad;
   –  and the best outcome, q_max, is particularly good.
•  Wage inequality makes tournaments more appealing.
•  Tournaments are unattractive for ensuring q̄.

→ Tournaments may be becoming more attractive!
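
One way to read the three conditions above, as a rough sketch and not the authors' model: assume a tournament delivers q_max with probability φ and the low-skilled baseline q otherwise, while a hired consultant delivers a guaranteed q̄. The function names and the example numbers are illustrative assumptions, and prize and wage costs are ignored.

```python
def tournament_expected_quality(phi, q_low, q_max):
    """Expected outcome of a tournament under the two-outcome assumption.

    phi   -- probability of a breakthrough
    q_low -- baseline low-skilled outcome (the slide's q)
    q_max -- best outcome, reached only on a breakthrough
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must be a probability between 0 and 1")
    return phi * q_max + (1.0 - phi) * q_low


def prefer_tournament(phi, q_low, q_max, q_bar):
    """True when the tournament's expected outcome beats the guaranteed q_bar.

    Costs are ignored here, which is an assumption of this sketch.
    """
    return tournament_expected_quality(phi, q_low, q_max) > q_bar


# Illustrative numbers only: a likely breakthrough favors the tournament,
# an unlikely one favors the guaranteed outcome.
likely = prefer_tournament(phi=0.5, q_low=2.0, q_max=10.0, q_bar=5.0)
unlikely = prefer_tournament(phi=0.1, q_low=2.0, q_max=10.0, q_bar=5.0)
```

Raising φ, q or q_max each raises the expected tournament outcome in this sketch, which matches the direction of the three conditions on the slide.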

Page 25: Big Data and Big Cities

Theory → Practice

→ Tournaments may be becoming more attractive!

→ We ran one!

Page 26: Big Data and Big Cities

Conjecture

•  Inspection and disclosure policies can be enhanced by working with social media:
   –  Social media is a potential platform for disclosure
   –  Optimal disclosure is a function of what people are saying on social media
•  Design disclosure of hygiene violations through the Yelp platform
•  Use Yelp review text to guide inspections.
   –  Inspections are fairly random, but they don’t have to be!

Page 27: Big Data and Big Cities

Why restaurant hygiene inspections?

•  Data and technology have changed
   –  Policy has remained the same
•  Disclosure side
   –  Market with very little information
   –  Early success story of disclosure (Jin and Leslie 2003), so known potential impact
•  Ideal setting for information design questions
   –  What conditions cause posting to work?
   –  What are the behavioral factors underlying customer response?
•  Scope for improving policy
   –  Dai and Luca 2016

Page 28: Big Data and Big Cities

Hygiene Inspections

•  Process and scoring vary (sometimes a lot) by city
•  In SF:
   –  Restaurants inspected roughly twice per year
   –  Violations classified as major (lots of rats) and minor (a rat)
   –  Final score between 0 and 100
•  In Boston:
   –  Restaurants inspected at least once per year
   –  Violations classified as minor, major, and severe
   –  Until now, no grades
•  Goal:
   –  Identify risks
   –  Shut down worst offenders, enforce cleanup

Page 29: Big Data and Big Cities

Essentially a prediction problem

•  Which restaurant is most likely to have a violation?
•  By targeting inspections, can be more efficient:
   –  Identify more risks, or
   –  Reduce number of inspections
•  E.g.: 1 random annual inspection for each restaurant, plus targeted inspections
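
The "one random inspection each, plus targeted" idea can be sketched as a simple allocation rule. This is a minimal illustration, not Boston's procedure: `plan_inspections`, the `extra_budget` parameter, and the risk scores are all hypothetical.

```python
import random


def plan_inspections(predicted_risk, extra_budget, seed=0):
    """Return {restaurant: inspections this year}.

    Every restaurant gets one baseline inspection, visited in random order.
    The `extra_budget` additional inspections go to the restaurants with the
    highest predicted risk. Both inputs are assumptions of this sketch.
    """
    rng = random.Random(seed)
    order = list(predicted_risk)
    rng.shuffle(order)  # random visiting order for the baseline round
    plan = {name: 1 for name in order}  # dict keeps the shuffled order

    # Targeted round: rank by predicted risk, highest first.
    ranked = sorted(predicted_risk, key=predicted_risk.get, reverse=True)
    for name in ranked[:extra_budget]:
        plan[name] += 1
    return plan


# Hypothetical risk scores for three restaurants.
example_plan = plan_inspections({"a": 0.9, "b": 0.1, "c": 0.5}, extra_budget=1)
```

Keeping the random baseline round means every restaurant can still be inspected, while the targeted round concentrates the remaining effort where the prediction says violations are most likely.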

Page 30: Big Data and Big Cities

Treatment: Inspection Results on Yelp

Page 31: Big Data and Big Cities

Are hygiene scores predictable?

•  Yelp reviewers provide lots of new information, but…
•  Potential pitfalls:
   –  Fake reviews
   –  Selection
   –  Hygiene may not factor into reviews

Page 32: Big Data and Big Cities

Distribution of Hygiene Scores

Page 33: Big Data and Big Cities

Hygiene Scores by Restaurant Price

Page 34: Big Data and Big Cities

Yelp Ratings Predict Hygiene Scores

Page 35: Big Data and Big Cities

Updating the Inspection Process

•  Layering on use of text, can predict roughly 85% of restaurants into top/bottom half of scores (Kang, Kuznetsova, Luca, and Choi 2013)
•  Related pilots

Page 36: Big Data and Big Cities

Tournament:

•  Cosponsored with Yelp
•  Supported by City of Boston
•  Combined Yelp data with Boston inspection results:
   –  Objective to predict violations.
   –  Weights chosen by city (minor = 1, major = 2, severe = 5).
   –  Evaluated using RMSLE
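
The scoring setup above can be sketched in a few lines. The weights are the ones on the slide; how they entered the tournament's RMSLE is not stated there, so this sketch assumes the error is computed on a single weighted score per inspection. Function names are illustrative.

```python
import math

# Weights chosen by the city, as listed on the slide.
WEIGHTS = {"minor": 1, "major": 2, "severe": 5}


def weighted_violations(counts):
    """Collapse per-severity violation counts into one weighted score."""
    return sum(WEIGHTS[level] * counts.get(level, 0) for level in WEIGHTS)


def rmsle(predicted, actual):
    """Root mean squared logarithmic error over paired non-negative values.

    Assumption of this sketch: inputs are weighted scores, one per inspection.
    """
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical inspection: 3 minor, 1 major, 2 severe violations.
score = weighted_violations({"minor": 3, "major": 1, "severe": 2})
```

Because RMSLE works on log(1 + x), a miss of a few points matters more for a restaurant with a low score than for one with a very high score.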

Page 37: Big Data and Big Cities

Tournament: Rewards

Place   Prize Amount
1st     $3,000
2nd     $1,000
3rd     $1,000

Prize money provided by Yelp

Page 38: Big Data and Big Cities
Page 39: Big Data and Big Cities

Competition Process

Page 40: Big Data and Big Cities

Target: Inspection Violations

Page 41: Big Data and Big Cities

Target: Inspection Violations

Page 42: Big Data and Big Cities

Target: Inspection Violations

Page 43: Big Data and Big Cities

Target: Inspection Violations

Page 44: Big Data and Big Cities

Target: Inspection Violations

Page 45: Big Data and Big Cities

Results

•  >500 signups
•  Development phase:
   –  ~55 completed at least one entry
   –  ~450 sets of predictions
•  Evaluation phase:
   –  23 submitted final algorithms
   –  During this time, Boston inspected 364 restaurants

Page 46: Big Data and Big Cities

The Winner

Page 47: Big Data and Big Cities

The Runner Up

Page 48: Big Data and Big Cities

Gains for Boston: ~40%

To catch 3,604 weighted violations, inspect this many restaurants:

Page 49: Big Data and Big Cities

Gains for Boston

If choosing the 364 restaurants with the highest predicted violations, expect to obtain total violations:
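
The two "Gains for Boston" comparisons can be computed as follows. This is a sketch under stated assumptions: `predicted` and `actual` are parallel lists of weighted violation scores, the function names are illustrative, and the toy numbers are not from the tournament.

```python
def _rank_by_prediction(predicted):
    """Indices of restaurants ordered from highest to lowest predicted score."""
    return sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)


def violations_caught(predicted, actual, k):
    """Actual weighted violations found by inspecting the top-k predictions."""
    ranked = _rank_by_prediction(predicted)
    return sum(actual[i] for i in ranked[:k])


def inspections_needed(predicted, actual, target):
    """Inspections, taken in predicted order, needed to reach `target`.

    Returns None when the target cannot be reached at all.
    """
    total = 0
    for count, i in enumerate(_rank_by_prediction(predicted), start=1):
        total += actual[i]
        if total >= target:
            return count
    return None


# Toy data, purely illustrative.
predicted = [5, 1, 3, 0]
actual = [4, 0, 6, 2]
caught_top2 = violations_caught(predicted, actual, k=2)
needed_for_12 = inspections_needed(predicted, actual, target=12)
```

Running the same two functions on a random or inspector-chosen ordering gives the baseline against which a gain like the one on the slide would be measured.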

Page 50: Big Data and Big Cities

Ongoing work

•  Launching a trial
   –  Starts this month
•  Incorporating into day-to-day inspections
•  Ongoing challenges:
   –  Other city goals?
   –  Gamability?
   –  Transferability?

Page 51: Big Data and Big Cities

Epilogue

•  Results of the algorithm were given to inspectors to improve accuracy.
•  Then we looked at how they did using their own best practices vs. the fancy algorithm vs. a really simple algorithm.
•  The fancy algorithm does help, but the simple algorithm gets most of the way there.
•  In some things, getting the basics right is far more important than too much fancy math.