Lecture 1 (01/23, 01/28): Introduction to Big Data

Decisions, Operations & Information Technologies
Robert H. Smith School of Business
Spring, 2019

K. Zhang
BUDT 758

Business Value of Big Data and AI

http://www.salesforcehacker.com/2014/11/hadoop-and-pig-come-to-salesforce.html

“Airpal is a Web-based data-exploration and SQL query interface that runs on Presto, the in-memory SQL-on-Hadoop query technology that Facebook donated to Apache open source in late 2013. Airbnb invented Airpal because it needed a tool that would be more accessible to data analysts and even business users, not just the 23-person Airbnb data science team that handles Hive and Presto queries.”

---- Airbnb, 3/5/2015 11:35 AM

•  “VC investment in the space remains vibrant and the first few weeks of 2016 saw a flurry of announcements of big funding rounds for late stage Big Data startups: DataDog ($94M), BloomReach ($56M), Qubole ($30M), PlaceIQ ($25M), etc. Big Data startups received $6.64B in venture capital investment in 2015, 11% of total tech VC.”

http://www.goldmansachs.com/our-thinking/pages/big-data.html

Big Data ecosystem

The open source community

•  Yahoo!
   ❑ Hadoop, Pig
   ❑ Pig hides Java programming
•  Facebook
   ❑ Hive: provides SQL-like functions for Hadoop files
•  Netflix
   ❑ HBase: massages big data to behave like a database
•  UC Berkeley
   ❑ Spark: in-memory processing to avoid slow disk I/O
•  Twitter
   ❑ Storm: near real-time streaming data

Technology is still evolving rapidly

And the almighty AI!

•  2012 Matlab
•  2013 Caffe
•  2014 Theano
•  2015 Torch
•  2016/7 TensorFlow
•  2018 ??? (PyTorch)
•  CNN, RNN, GANs…
•  Sergey Brin @ 2017 Davos World Economic Forum
   –  https://www.youtube.com/watch?v=jYuCVcGxtNM

So, what’s going on?

•  You need critical thinking to not get lost

How does data generate value?

Big data processes

•  Load data
•  Clean up data
•  Transform data
•  Query data
•  Machine learning / deep learning
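The first four steps above can be sketched on a toy scale in plain Python. This is only an illustration under stated assumptions: the inline CSV, the column names `region` and `amount`, and the function names are all hypothetical, and a real big data pipeline would run the same kind of steps on a cluster rather than on one machine. The machine learning step is left out to keep the sketch short.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw data; in practice this would come from files or a cluster.
RAW_CSV = """region,amount
east,100
west,250
east,
west,50
north,abc
east,300
"""

def load(text):
    """Load: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Clean up: drop rows whose amount is missing or not a number."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"region": row["region"], "amount": float(row["amount"])})
        except (TypeError, ValueError):
            continue
    return cleaned

def transform(rows):
    """Transform: aggregate amounts per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

def query(totals, minimum):
    """Query: regions whose total is at least `minimum`, sorted by name."""
    return sorted(region for region, total in totals.items() if total >= minimum)

rows = load(RAW_CSV)
totals = transform(clean(rows))
print(totals)
print(query(totals, 300))
```

Each step is a separate function so that any one of them can be swapped out, for example replacing `load` with a reader for a different file format, without touching the others.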

Realizing the benefits of Big Data

•  Setting up Hadoop is just the beginning!
   ❑ It only means that you are able to handle big data
   ❑ But it does not guarantee any benefit!
      –  It might waste your money and divert your attention.

The easy ones

•  Faster and cheaper
   ❑ In late 2007, the New York Times wanted to make available over the web its entire archive of articles, 11 million in all, dating back to 1851.
   ❑ The archive was a four-terabyte pile of images in TIFF format, and the Times needed to translate that pile of TIFFs into more web-friendly PDF files.
   ❑ Not a particularly complicated computing chore, but a large one, requiring a whole lot of computer processing time.

•  A software programmer at the Times, Derek Gottfrid, had been playing around with Amazon Web Services, in particular Elastic Compute Cloud (EC2)
   ❑ He uploaded the four terabytes of TIFF data into Amazon's Simple Storage Service (S3)
•  In less than 24 hours, there were 11 million PDFs, all stored neatly in S3 and ready to be served up to visitors to the Times site.
•  The total cost for the computing job? $240
   ❑ 10 cents per computer-hour times 100 computers times 24 hours
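The $240 figure follows directly from the three numbers on the slide. A minimal check of the arithmetic, assuming a flat hourly rate per machine and ignoring storage and data transfer charges (the constant and function names here are illustrative, not from the lecture):

```python
# Figures from the slide; working in whole cents avoids floating-point rounding.
RATE_CENTS_PER_MACHINE_HOUR = 10
MACHINES = 100
HOURS = 24

def job_cost_cents(rate_cents, machines, hours):
    """Total cost in cents: rate x machines x hours."""
    return rate_cents * machines * hours

total_cents = job_cost_cents(RATE_CENTS_PER_MACHINE_HOUR, MACHINES, HOURS)
machine_hours = MACHINES * HOURS
print(f"{machine_hours} machine-hours, ${total_cents / 100:.2f}")
# prints "2400 machine-hours, $240.00"
```

The point of the example is that the 2,400 machine-hours cost the same whether they are spent as 100 machines for one day or one machine for 100 days; renting many machines briefly buys speed at no extra cost.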

How to make data “actionable”

•  D-D-P-P
   ❑ Descriptive: what happened?
   ❑ Diagnostic: why did it happen?
   ❑ Predictive: what is likely to happen?
   ❑ Prescriptive: what is the best course of action?

Courtesy of Cupid Chan
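To make the four D-D-P-P questions concrete, here is a small sketch that asks each of them of the same toy data set. Everything in it is an assumption made for illustration: the weekly ad spend and sales figures, the 0.5 profit margin, the candidate budgets, and the choice of a straight-line fit as the predictive model. It is not a method taken from the lecture.

```python
from math import sqrt

# Hypothetical weekly figures (in $1000s): ad spend and resulting sales.
ad_spend = [1, 2, 3, 4, 5]
sales = [12, 14, 15, 17, 22]

def mean(values):
    return sum(values) / len(values)

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; also returns Pearson r."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / sqrt(sxx * syy)

# Descriptive: what happened?
average_sales = mean(sales)

# Diagnostic: why did it happen? (correlation is a hint, not proof of cause)
intercept, slope, r = fit_line(ad_spend, sales)

# Predictive: what is likely to happen at a new spend level?
def predict_sales(spend):
    return intercept + slope * spend

# Prescriptive: which candidate budget gives the best predicted profit?
MARGIN = 0.5  # assumed fraction of sales kept as profit

def predicted_profit(spend):
    return MARGIN * predict_sales(spend) - spend

best_budget = max([2, 4, 6], key=predicted_profit)

print(average_sales, round(r, 3), round(predict_sales(6), 1), best_budget)
```

Each stage builds on the one before it: the prescriptive answer is only as good as the prediction, which is only as good as the diagnosis of what drives sales.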

Traditional vs. Big Data Approach

A dynamic process

•  What are the business goals and critical issues?
•  What data do you have?
•  What data can you potentially capture?
•  What analytical tools could be applied?

[Figure: diagram with the labels "Goals" and "Data"]

Goal: find business questions that can harness the power of big data