Big Data and Big Cities

Post on 25-Jan-2017


Big Data and Big Cities: The Promises and Limitations of Improved Measures of City Life

Ed Glaeser, Scott Kominers, Mike Luca and Nikhil Naik

Outline

•  Part 1: Big Data and Urban Questions
   –  Too much big think on big data
•  Part 2: Measuring City Life when data is missing
   –  Look forward to my lecture later
   –  Measuring the impact of water in Zambia
•  Part 3: Using Big Data to Improve City Services
   –  Modest model on tournaments vs. consultants
   –  Report on a hygiene tournament in Boston

Big Data and Big Questions about Cities

•  How does urban development impact the economy?
   –  Shocks to people vs. shocks to place
•  How does the physical city interact with social outcomes?
   –  Measuring the physical city with big data
•  How much do people value urban amenities?
   –  Measuring amenities and better contingent valuation
•  How can public policy improve the quality of urban space?
   –  Merging government actions with physical measures

Examples of Big Data

•  Much finer geographic records (the IRS data)
•  Similar data from private providers (CoreLogic)
•  Novel data sets on traditional outcomes (Zoona)
•  Novel data sets on relatively new things (Yelp)
•  Completely different data on things we had barely thought about before (Google Street View)

What’s It Good For

•  Big data does not intrinsically solve any of the causal inference issues that we have long worried about.
•  It does make it possible to measure more things (hygiene, streetscapes) in more places in more ways.
•  IRS records provide the mother-of-all panel sets, which is particularly useful for spatial interventions
   –  The right way to judge empowerment zones, for example, would be to use the panel structure

Led Astray By “Bigger” Data (.3)

Measuring the Previously Unmeasurable

•  We are used to having public sources for data on the most basic economic outcome: income
•  This is typically not true in the developing world, especially sub-Saharan Africa.
•  Especially false for a usable panel
•  Example #1: Zoona data, water and health
•  Example #2: Google Street View: essentially night lights on steroids

Zoona in Zambia

Measuring Streetscapes (with Nikhil Naik)

Crowdsourcing City Government: Using Tournaments to Improve Inspection Accuracy

Edward L. Glaeser, Andrew Hillis, Scott Duke Kominers, Michael Luca

Big Data: Consumer review websites

•  Part of the crowdsourcing movement.

Yelp

•  Luca 2011: high ratings increase revenue for independent restaurants
•  Chevalier and Mayzlin 2006: Barnes & Noble, Amazon and online book orders
•  Ghose et al. 2011: TripAdvisor and hotel reservations

Yelp Search

Restaurant’s Yelp Page

Some background

•  Los Angeles in 1997…
   –  Posting →
      •  higher scores
      •  lower rates of foodborne illness
      •  Jin and Leslie (2003)
   –  Major success story of disclosure
•  NYC in 2010
•  Yet, a lot has changed…

The Rise of Tournaments

•  Now, organizations can outsource large-scale prediction problems via open tournaments!
   –  E.g., 210 prediction tournaments on Kaggle, with prizes ranging from $0 to $500,000.
   –  Returns are not just cash: also recognition, job interviews, certification, satisfaction, and learning.

An Economic Design Question

(When) can open tournaments help solve public problems?

Theory

•  Tournaments make sense when:
   –  the probability of a breakthrough, $\varphi$, is high;
   –  the baseline low-skilled outcome, $q$, is not that bad;
   –  and the best outcome, $q_{\max}$, is particularly good.
•  Wage inequality makes tournaments more appealing.
•  Tournaments are unattractive for ensuring $\bar{q}$.

→  Tournaments may be becoming more attractive!
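
One stylized way to read the three conditions above, offered as an illustration and not as the authors' model: assume a tournament delivers the best outcome $q_{\max}$ with probability $\varphi$ and falls back to the baseline outcome $q$ otherwise. Under that assumption the expected outcome is

```latex
\mathbb{E}[\text{outcome}] \;=\; \varphi \, q_{\max} \;+\; (1 - \varphi)\, q ,
\qquad 0 \le \varphi \le 1, \quad q \le q_{\max}.
```

This expression rises with $\varphi$, with $q$, and with $q_{\max}$, which matches the direction of each bullet. It says nothing about guaranteeing a minimum level such as $\bar{q}$, since the realized outcome can still be the baseline.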

Theory → Practice

→  Tournaments may be becoming more attractive!

→  We ran one!

Conjecture

•  Inspection and disclosure policies can be enhanced by working with social media:
   –  Social media is a potential platform for disclosure
   –  Optimal disclosure is a function of what people are saying on social media
•  Design disclosure of hygiene violations through the Yelp platform
•  Use Yelp review text to guide inspections.
   –  Inspections are fairly random, but they don’t have to be!

Why restaurant hygiene inspections?

•  Data and technology have changed
   –  Policy has remained the same
•  Disclosure side
   –  Market with very little information
   –  Early success story of disclosure (Jin and Leslie 2003), so known potential impact
•  Ideal setting for information design questions
   –  What conditions cause posting to work?
   –  What are the behavioral factors underlying customer response?
•  Scope for improving policy
   –  Dai and Luca 2016

Hygiene Inspections

•  Process and scoring vary (sometimes a lot) by city
•  In SF:
   –  Restaurants inspected roughly twice per year
   –  Violations classified as major (lots of rats) and minor (a rat)
   –  Final score between 0 and 100
•  In Boston:
   –  Restaurants inspected at least once per year
   –  Violations classified as minor, major, and severe
   –  Until now, no grades
•  Goal:
   –  Identify risks
   –  Shut down worst offenders, enforce cleanup

Essentially a prediction problem

•  Which restaurant is most likely to have a violation?
•  By targeting inspections, can be more efficient:
   –  Identify more risks, or
   –  Reduce number of inspections
•  E.g.: 1 random annual inspection for each restaurant, plus targeted
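
The "one random inspection each, plus targeted" idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the city's actual scheduling procedure: `predicted_risk`, `extra_budget`, and the restaurant names are all hypothetical.

```python
import random

def plan_inspections(predicted_risk, extra_budget, seed=0):
    """Sketch of 'one random inspection each, plus targeted extras'.

    predicted_risk: dict of restaurant id -> predicted violation score
                    (hypothetical output of some prediction model).
    extra_budget:   number of additional targeted inspections available.
    Returns (baseline_order, targeted).
    """
    rng = random.Random(seed)
    # Baseline: every restaurant is inspected once, in a random order.
    baseline_order = sorted(predicted_risk)
    rng.shuffle(baseline_order)
    # Targeted: extra inspections go to the highest predicted risk,
    # with ties broken by restaurant id so the result is deterministic.
    ranked = sorted(predicted_risk, key=lambda r: (-predicted_risk[r], r))
    targeted = ranked[:extra_budget]
    return baseline_order, targeted

# Made-up scores for four hypothetical restaurants.
risk = {"A": 0.5, "B": 4.0, "C": 1.5, "D": 9.0}
baseline_order, targeted = plan_inspections(risk, extra_budget=2)
print(targeted)
```

Keeping the random baseline means every restaurant still faces some chance of inspection, while the targeted extras are where a better prediction pays off.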

Treatment: Inspection Results on Yelp

Are hygiene scores predictable?

•  Yelp reviewers provide lots of new information, but…
•  Potential pitfalls:
   –  Fake reviews
   –  Selection
   –  Hygiene may not factor into reviews

Distribution of Hygiene Scores

Hygiene Scores by Restaurant Price

Yelp Ratings Predict Hygiene Scores

Updating the Inspection Process

•  Layering on the use of review text, can sort roughly 85% of restaurants into the correct top/bottom half of scores (Kang, Kuznetsova, Luca, and Choi 2013)
•  Related pilots

Tournament:

•  Cosponsored with Yelp
•  Supported by City of Boston
•  Combined Yelp data with Boston inspection results:
   –  Objective to predict violations.
   –  Weights chosen by city (minor = 1, major = 2, severe = 5).
   –  Evaluated using RMSLE
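
To make the scoring concrete, here is a minimal sketch of a weighted violation score and root mean squared logarithmic error (RMSLE). The weights are the ones listed above. Applying RMSLE to the single weighted score per restaurant is an assumption made for illustration, since the slides do not say exactly how the metric was applied. The function names are hypothetical.

```python
import math

# Weights chosen by the city, as stated on the slide.
WEIGHTS = {"minor": 1, "major": 2, "severe": 5}

def weighted_violations(counts):
    """Collapse per-severity violation counts into one weighted score."""
    return sum(WEIGHTS[level] * counts.get(level, 0) for level in WEIGHTS)

def rmsle(predicted, actual):
    """Root mean squared logarithmic error over paired non-negative values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [
        (math.log1p(p) - math.log1p(a)) ** 2  # log1p(x) = log(1 + x)
        for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Made-up inspection outcomes for two hypothetical restaurants.
actual = [
    weighted_violations({"minor": 3, "major": 1}),  # 3*1 + 1*2
    weighted_violations({"severe": 1}),             # 1*5
]
predicted = [4.0, 7.0]
print(actual, rmsle(predicted, actual))
```

Because the error is taken on a log scale, missing by a few points on a restaurant with many violations costs less than the same miss on a restaurant with almost none.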

Tournament: Rewards

Place   Prize Amount
1st     $3,000
2nd     $1,000
3rd     $1,000

Prize money provided by Yelp

Competition Process

Target: Inspection Violations

Results

•  >500 signups
•  Development phase:
   –  ~55 completed at least one entry
   –  ~450 sets of predictions
•  Evaluation phase:
   –  23 submitted final algorithms
   –  During this time, Boston inspected 364 restaurants

The Winner

The Runner Up

Gains for Boston: ~40%

To catch 3,604 weighted violations, inspect this many restaurants:

Gains for Boston

If choosing the 364 restaurants with the highest predicted violations, expect to obtain total violations:
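
Both gains comparisons above reduce to simple calculations on a ranked list. The sketch below shows the arithmetic on made-up toy numbers; it does not reproduce the Boston data or results, and the function names are hypothetical.

```python
def inspections_needed(violations_in_order, target):
    """How many inspections, taken in the given order, reach the target
    total of weighted violations. Returns None if it is never reached."""
    total = 0
    for n, v in enumerate(violations_in_order, start=1):
        total += v
        if total >= target:
            return n
    return None

def violations_caught(predicted, actual, k):
    """Total actual weighted violations found when inspecting the k
    restaurants with the highest predicted violations."""
    ranked = sorted(range(len(predicted)), key=lambda i: -predicted[i])
    return sum(actual[i] for i in ranked[:k])

# Toy data: four hypothetical restaurants, in the order they would
# otherwise have been inspected.
predicted = [0.2, 3.0, 1.0, 5.0]
actual = [0, 4, 1, 6]

# Fixed budget: violations found by the two highest-ranked inspections.
caught = violations_caught(predicted, actual, k=2)

# Fixed target: inspections needed to catch 10 weighted violations,
# in the original order versus in order of predicted violations.
by_prediction = [a for _, a in sorted(zip(predicted, actual), key=lambda t: -t[0])]
needed_original = inspections_needed(actual, target=10)
needed_targeted = inspections_needed(by_prediction, target=10)
print(caught, needed_original, needed_targeted)
```

The first function corresponds to the fixed-target framing (reach a given number of weighted violations with fewer inspections), the second to the fixed-budget framing (find more violations with the same number of inspections).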

Ongoing work

•  Launching a trial
   –  Starts this month
•  Incorporating into day-to-day inspections
•  Ongoing challenges:
   –  Other city goals?
   –  Gamability?
   –  Transferability?

Epilogue

•  Results of the algorithm were given to inspectors to improve accuracy.

•  Then we looked at how they did using their own best practices vs. the fancy algorithm vs. a really simple algorithm.

•  The fancy algorithm does help, but the simple algorithm gets most of the way there.

•  In some things, getting the basics right is far more important than too much fancy math.