Agenda
ArtificialIntelligence AI
BigDataMachineLearning
DeepLearning
NeuralNetworks
NLPNaturalLanguageProcessing
Demystifythefollowingbuzzwords.
ImageRecognition2
UltimateGoal:PredictiveAnalytics
Predictwhatuserswillwanttobuy.
AconsumersearchesforaTVandbasedonpreviouscustomersdata,showaproductthathasahighprobabilityofbeingboughtaswell.
3
EvolutionofDataAnalytics1990s 2000s
Excel BusinessIntelligence(BI)Dashboards
2015andbeyond
ActionableInsights
WhatHappened? What’sHappening? WhatWillHappen?
4
TheProcess
Structuredandunstructured(ex.
video)data
Dataisstoredindatabasesand
servers
DataGenerated
DataStored
ActionableInsights
DataProcessing
ProcessthedatausingCPU/GPUsandAIalgorithmstodetectpatterns
Predictivesignalsaregenerated
CentralProcessingUnit(CPU)/GraphicsProcessingUnit(GPU)
BigData ArtificialIntelligence
5
HowDidWeGetHere?Databases(the80s)
DataWarehousing(the90s)
• Relationaldatabases• Gigabytesinsize• Lowlatency
• Terabytesinsize• Customhardware
6
WhenToUseMachineLearning
Apatternexists1
Wecannotpindownthepatternmathematically
2
Wehavedataandhopefullylotsofdata
10
SupervisedLearning
X
X
XX
X
Price
SquareFeet
Weknowwhatwearetryingtopredict.Weusesomeexamplesthatweandthemodelknowtheanswersto“train”ourmodel.Itcanthengeneratepredictionstoexampleswedon’tknowtheanswerto.
Example:Predictthepriceofahousebasedonthesizeofthehouse.
XX
12
UnsupervisedLearning
OO OO
O
OOOOO
X
Y
OOO OO
Wedon’tknowwhatwearetryingtopredict.Wearetryingtoidentifysomenaturallyoccurringpatternsinthedatawhichmaybeinformative.
Example:Trytoidentify“clusters”ofcustomersbasedonthedatawehaveonthem.
13
WhatisDeepLearning?• DeepLearningandNeuralNetworksaresynonymous
• It’sabranchofmachinelearningbasedonasetofalgorithmsthatattempttomodelhighlevelabstractionsindatabyusingadeepgraphwithmultipleprocessinglayers,composedofmultiplelinearandnon-lineartransformations
Whatwesee Whatthecomputer“sees” 14
AIResearchersGeoffreyHinton
UniversityofTorontoGoogle
Yoshua Bengio
UniversityofMontreal
YannLeCun
NewYorkUniversityFacebook
AndrewNg
StanfordUniversityBaidu
18
TheName…Hadoop
NamedaftertheyellowtoyelephantofDougCutting’sson.
In2006whileworkingatYahoo,DougcameupwiththeHadoopframework.In2008,itwastakenoverbytheopensourcegroup
Apache,hencetheofficialnameisApacheHadoop.21
HadooptotheRescue“anopensourceframeworkwritteninJavaforstoringand
processingmassiveamountsofdatainadistributedmanner”
1HadoopDistributedFileSystem(HDFS).Scalablefilesystemthatdistributesandstoresdataacrossmanymachinesinacluster.
MapReduce – frameworkfordistributedprocessing.
2KeyComponentsoftheFramework:
Storage 2 Analysis
22
Hadoop Architecture
Hadoopcanrunoncheapcommoditizedhardwareonpremiseorinthecloud.
Storesfilesinlargeblocks(64MB)acrossmultiplemachinesforfaulttolerance.Bydefault,dataisstoredon3separatemachines
HDFS
MapReduceBreakslargedataprocessingproblems intomultiple steps,namelyMappers(DataNode)andReducers(TaskTrackers)thatcanbeworkedoninparallelonmultiplemachines
23
MapReduce StoreSalesData(100MB)
Mappers NameNode1 DataNode1(64MB)
DataNode2(36MB)
LA NYC LA NYC
Reducers JobTracker TaskTracker1
LA LA
TaskTracker2
NYC NYC
ShuffleandSort
24
Top Related