Podling Hivemall in the Apache Incubator
Research Engineer Makoto YUI @myui
2016/11/08 Apache Hadoop Meetup at CWT2016
Hivemall entered the Apache Incubator on Sept 13, 2016 🎉
hivemall.incubator.apache.org
@ApacheHivemall
Initial committers
• Makoto Yui <Treasure Data>
• Takeshi Yamamuro <NTT>
  - Hivemall on Apache Spark
• Daniel Dai <Hortonworks>
  - Hivemall on Apache Pig
  - Apache Pig PMC member
• Tsuyoshi Ozawa <NTT>
  - Apache Hadoop PMC member
• Kai Sasaki <Treasure Data>
Project mentors
Champion
• Roman Shaposhnik <Pivotal, ASF member>
  - Apache Bigtop/Incubator PMC member
Nominated Mentors
• Reynold Xin <Databricks, ASF member>
  - Apache Spark PMC member
• Markus Weimer <Microsoft, ASF member>
  - Apache REEF PMC member
• Xiangrui Meng <Databricks, ASF member>
  - Apache Spark PMC member
What is Apache Hivemall?
Scalable machine learning library built as a collection of Hive UDFs
• Multi/Cross-platform
• Versatile
• Scalable
• Ease-of-use
Hivemall is easy and scalable…
[Figure: Classification with Mahout]
CREATE TABLE lr_model AS
SELECT
  feature,
  avg(weight) AS weight -- reducers perform model averaging in parallel
FROM (
  SELECT logress(features, label, ..) AS (feature, weight)
  FROM train
) t -- map-only task
GROUP BY feature; -- shuffled to reducers
ML made easy for SQL developers
Born to be parallel and scalable
This SQL query automatically runs in parallel on a Hadoop cluster
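The query above splits training into two phases: each map task runs `logress` over its share of `train` and emits `(feature, weight)` rows, and the reducers average the weights per feature. Below is a minimal Python sketch of that flow, assuming binary (presence-only) features, plain SGD logistic regression, and in-memory lists standing in for the input splits. `train_partition` and `average_models` are hypothetical names; this is not Hivemall's implementation of `logress`.

```python
import math
from collections import defaultdict


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_partition(rows, eta=0.1, epochs=10):
    """Map side: hypothetical stand-in for one logress task.

    rows: list of (features, label); features is a list of feature names
    treated as binary (value 1.0), label is 0 or 1.
    Returns the local model as (feature, weight) rows.
    """
    w = defaultdict(float)
    for _ in range(epochs):
        for features, label in rows:
            p = sigmoid(sum(w[f] for f in features))
            grad = label - p  # gradient of the log-likelihood w.r.t. the score
            for f in features:
                w[f] += eta * grad
    return list(w.items())


def average_models(local_models):
    """Reduce side: the equivalent of GROUP BY feature with avg(weight).

    A feature is averaged only over the partitions that emitted it,
    just as avg() only sees the rows present for that feature.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for model in local_models:
        for feature, weight in model:
            sums[feature] += weight
            counts[feature] += 1
    return {f: sums[f] / counts[f] for f in sums}


# Two "map tasks", each holding a different slice of the training data.
partitions = [
    [(["a", "b"], 1), (["c"], 0)],
    [(["a"], 1), (["c", "d"], 0)],
]
lr_model = average_models(train_partition(p) for p in partitions)
```

Averaging keeps the reduce step parallel: each feature's weights are combined independently of every other feature's, which is why the shuffle is keyed on `feature`.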
Ease-of-use
Scalable
Hivemall is a multi/cross-platform ML library
HiveQL, SparkSQL/DataFrame API, Pig Latin
Hivemall is multi/cross-platform…
Multi/Cross-platform
Prediction models built by Hive can be used from Spark and, conversely, prediction models built by Spark can be used from Hive
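Part of what makes this portability plausible is that a trained model such as `lr_model` in the earlier query is just a table of `(feature, weight)` rows, so any engine that can read the table can score with it. Below is a minimal Python sketch of scoring against such a table, assuming a logistic-regression model over binary features. `predict` and the sample weights are hypothetical; inside Hive or Spark the same computation would typically be a join between feature rows and the model table, followed by a sum and a sigmoid.

```python
import math


def predict(model, features):
    """Probability of the positive class for one example.

    model: dict mapping feature -> weight (e.g. rows read from lr_model).
    Features absent from the model contribute 0, like a LEFT OUTER JOIN
    whose unmatched weights are coalesced to 0.
    """
    score = sum(model.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical rows exported from a model table built by one engine.
model_rows = [("a", 1.5), ("c", -2.0)]
model = dict(model_rows)

print(predict(model, ["unseen"]))  # → 0.5, since no weight is known
```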
Hivemall on Apache Hive
Hivemall on Apache Spark DataFrame
Hivemall on Spark SQL
Hivemall on Apache Pig
Versatile
Hivemall is a versatile library…
✓ Hivemall is not only for machine learning
✓ Hivemall provides a bunch of generic utility functions (e.g., top-k, NLP)
Each organization has its own set of UDFs for data preprocessing!
Don’t Repeat Yourself!
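As an illustration of the kind of generic utility mentioned above, here is top-k per group (for example, the k highest-scoring items for each user), a step that teams often reimplement as their own UDF. This is a plain Python sketch with a hypothetical `top_k_per_group` helper; it does not reproduce the name or signature of Hivemall's own top-k function.

```python
import heapq
from collections import defaultdict


def top_k_per_group(rows, k):
    """Return {group: [item, ...]} with the k highest-scoring items per group.

    rows: iterable of (group, item, score). Ties on score are broken by
    comparing items, because heapq.nlargest compares (score, item) tuples.
    """
    groups = defaultdict(list)
    for group, item, score in rows:
        groups[group].append((score, item))
    return {
        group: [item for _, item in heapq.nlargest(k, pairs)]
        for group, pairs in groups.items()
    }


rows = [
    ("u1", "x", 0.9), ("u1", "y", 0.5), ("u1", "z", 0.7),
    ("u2", "x", 0.1), ("u2", "w", 0.3),
]
print(top_k_per_group(rows, 2))  # → {'u1': ['x', 'z'], 'u2': ['w', 'x']}
```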
Conclusion and Takeaway
Hivemall is a machine learning library that is…
• Multi/Cross-platform
• Versatile
• Scalable
• Ease-of-use
We welcome your contributions to Apache Hivemall :)
hivemall.incubator.apache.org