
Machine Learning @ Amazon
Ralf Herbrich

9/20/16 1

Overview

• What is Machine Learning?
• A Computer Science and Statistics Perspective
• History of Machine Learning

• Machine Learning @ Amazon
• Forecasting
• Machine Translation
• Visual Systems


Amazon’s Virtuous Cycles

[Figure: cycle diagram with the labels Growth, Customer Experience, Traffic, Sellers, Selection & Convenience, Lower Prices, and Lower Cost Structure]

1. Saving costs by better planning (e.g., forecasting)
2. Saving costs by automating human decision making (e.g., pricing)
3. Increasing revenue by low-friction experience (e.g., recommendation)


Machine Learning: The Science

Science
• Computer Science
• Statistics
• Neuroscience
• Operations Research

Artificial Intelligence
• Rule extraction from data
• Inspired by human learning
• Adaptive algorithms

Engineering
• Training: Data → Models
• Prediction: Models → Forecast
• Decision: Forecast → Actions


Machine Learning: A Programmer’s Perspective

Traditional Programming: Data + Program → Computer → Output

Machine Learning: Data + Output → Computer → Program
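The contrast on this slide can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function names, the threshold rule, and the example data are hypothetical and do not come from the talk. The first function is a program written by hand; the second takes data together with the desired outputs and returns a program.

```python
from typing import Callable, List, Tuple


def traditional_program(x: float) -> bool:
    """Traditional programming: the programmer picks the rule (threshold 5.0)."""
    return x > 5.0


def learn_program(examples: List[Tuple[float, bool]]) -> Callable[[float], bool]:
    """Machine learning: data plus desired outputs go in, a program comes out.

    Assumes at least two distinct inputs. The learned program is a simple
    threshold rule chosen to make the fewest mistakes on the examples.
    """
    xs = sorted(x for x, _ in examples)
    # Candidate thresholds are the midpoints between neighbouring inputs.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(threshold: float) -> int:
        return sum((x > threshold) != label for x, label in examples)

    best = min(candidates, key=errors)
    return lambda x: x > best


# Hypothetical labelled examples: (input, desired output).
examples = [(1.0, False), (2.0, False), (3.0, False),
            (7.0, True), (8.0, True), (9.0, True)]
learned_program = learn_program(examples)
```

Both `traditional_program` and `learned_program` can be called on new inputs; the difference is only in where the rule came from.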


ML Examples: Named Entity Extraction

Data (from the Author): High atop the steps of the Pyramid of Giza a young woman laughed and called down to him. "Robert, hurry up! I knew I should have married a younger man!" Her smile was magic. …

Output (Annotation, from the Annotator): the same passage, annotated.

Program:

if (word is capitalized) and (word before is ‘in’) then PLACE
else if (word = ‘her’) or (word = ‘his’) or (word = ‘he’) or (word = ‘she’) then PERSON
...
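The two hand-written rules on this slide can be turned into a runnable Python sketch. This is an illustration, not the annotator used in the talk: the function name `annotate`, the regular-expression word splitting, and the case-insensitive pronoun match are assumptions added here, and the rules elided by "..." on the slide are left out.

```python
import re
from typing import List, Tuple

PRONOUNS = {"her", "his", "he", "she"}


def annotate(text: str) -> List[Tuple[str, str]]:
    """Apply the slide's two rules word by word and return (word, label) pairs."""
    # Assumption: a "word" is a run of letters or apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    annotations = []
    for i, word in enumerate(words):
        previous = words[i - 1].lower() if i > 0 else ""
        if word[0].isupper() and previous == "in":
            # Rule 1: capitalized word preceded by 'in' is a PLACE.
            annotations.append((word, "PLACE"))
        elif word.lower() in PRONOUNS:
            # Rule 2: listed pronouns are a PERSON (matched case-insensitively here).
            annotations.append((word, "PERSON"))
    return annotations


passage = ('High atop the steps of the Pyramid of Giza a young woman laughed '
           'and called down to him. "Robert, hurry up! I knew I should have '
           'married a younger man!" Her smile was magic.')
result = annotate(passage)
```

On the slide's passage these two rules tag only "Her": "Giza" is not preceded by "in" and "Robert" matches neither rule, which shows how quickly hand-written rules fall short.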

ML Examples: Named Entity Extraction

Author, Annotator: … "Robert, hurry up! I knew I should have married a younger man!" …

Machine Learning Service

High atop the steps of the Pyramid of Giza a young woman laughed and called down to him. … Her smile was magic. …


History of Machine Learning

1980 (“Neuro”)
• Neural Networks
• Artificial Intelligence
• Learning = Adaptation of Neurons based on External Stimuli

1990 (“Symbolic”)
• Expert Systems
• Decision-Tree Learning (C4.5)
• Learning = Methods to automatically build Expert Systems

2000 (“Kernel Machines”)
• Statistical Learning Theory
• Scoring Systems
• Learning = Optimization of Convex Functions

2005 (“Graphical Models”)
• Wide application in products
• Statistical Modeling of Data
• Learning = Parameter Estimation or Inference

2010 (“Service”)
• Distributed computing and storage
• Adaptive systems
• Learning = Scalable, Adaptive Computation for Various Big Data

2015 (“AI”)
• Deep Neural Networks
• Fast hardware (GPUs)
• Distributed computing and storage
• Learning = Adaptation of Weights in a brain-like layered architecture


Machine Learning Opportunities @ Amazon

Retail
• Demand Forecasting
• Vendor Lead Time Prediction
• Pricing
• Packaging
• Substitute Prediction

Customers
• Product Recommendation
• Product Search
• Visual Search
• Product Ads
• Shopping Advice
• Customer Problem Detection

Seller
• Fraud Detection
• Predictive Help
• Seller Search & Crawling

Catalog
• Browse-Node Classification
• Meta-data Validation
• Review Analysis
• Hazmat Prediction

Digital
• Named-Entity Extraction
• XRay
• Plagiarism Detection
• Echo Speech Recognition
• Knowledge Acquisition

Locations

[Figure: map with the labels ML Seattle, ML Bangalore, S9, A9, A2Z, Ivona, ML Berlin, and Evi]

Overview

• What is Machine Learning?
• A Computer Science and Statistics Perspective
• History of Machine Learning
• Machine Learning and Artificial Intelligence

• Machine Learning @ Amazon
• Forecasting
• Machine Translation
• Visual Systems

Forecasting

Setting
• Given past sales of a product in every region, predict regional demand up to one year into the future

Challenges
• New Products: No past demand!
• Regionalized: 100+ fulfillment centers worldwide
• Sparsity: Huge skew – many products sell very few items
• Seasonal: Huge variation due to external, seasonal events
• Distributions: Future is uncertain ⇒ predictions must be distributions
• Scale: 20M+ products fulfilled by Amazon alone!
• Orders: Customers demand bundles of products
• Censored: Past sales ≠ past demand (inventory constraint)
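Two of these challenges, censoring and the need for distributional predictions, can be made concrete with a small simulation. The sketch below is only an illustration: the demand is made up (uniform between 0 and 20 units per day), the inventory level of 10 is arbitrary, and the helper names are hypothetical. It does not describe the forecasting system from the talk.

```python
import math
import random
from statistics import mean
from typing import Dict, List


def observed_sales(demand: List[int], inventory: List[int]) -> List[int]:
    """Sales are censored: a day cannot sell more than was in stock."""
    return [min(d, s) for d, s in zip(demand, inventory)]


def empirical_quantile(samples: List[float], q: float) -> float:
    """Nearest-rank quantile of a sample, for 0 < q <= 1."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


rng = random.Random(0)  # fixed seed so the illustration is repeatable
true_demand = [rng.randint(0, 20) for _ in range(1000)]  # made-up demand
inventory = [10] * 1000                                  # arbitrary stock level
sales = observed_sales(true_demand, inventory)

# Averaging censored sales underestimates the average demand.
naive_estimate = mean(sales)
actual_average = mean(true_demand)

# A distributional forecast reports several quantiles instead of one number.
forecast: Dict[float, float] = {
    q: empirical_quantile(true_demand, q) for q in (0.1, 0.5, 0.9)
}
```

In a real setting the true demand is not observed, which is exactly why censoring is listed as a challenge; the simulation has access to it only to show the size of the gap.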


Demand Forecasting

Example Softlines product to illustrate the challenges of forecasting.

Training Range: Non-fashion items have longer training ranges that we can leverage. We need to share information across new and old products.

Seasonality: This item has Christmas seasonality with higher growth over time. This is where we need growth features in addition to date features.

Missing Features or Input: Unexplained spikes in demand are likely caused by missing features or incomplete input data.

New Products

Learning across groups of products with varying ages to improve accuracy for new products

[Figure legend: Red = Actual Demand, Black = Forecast]

New Product Without Sharing: Product is less than 1 year old and hasn’t seen all dates before. Features learned per product are not very strong.

New Product With Sharing: Once we share data across groups of products, we start to see the appropriate lift for new holidays.
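One simple way to picture sharing across products is to estimate a holiday lift from older products in the same group and apply it to a new product that has never seen that holiday. The sketch below assumes a multiplicative lift and uses made-up weekly demand; the function names and the pooling rule are hypothetical and are not the method described in the talk.

```python
from statistics import mean
from typing import Dict, List


def holiday_lift(history: Dict[str, List[float]], holiday_week: int) -> float:
    """Pooled lift: average ratio of holiday-week demand to each product's
    mean demand in the other weeks. Assumes at least one product with a
    positive baseline."""
    ratios = []
    for weekly in history.values():
        baseline = mean([d for i, d in enumerate(weekly) if i != holiday_week])
        if baseline > 0:
            ratios.append(weekly[holiday_week] / baseline)
    return mean(ratios)


def forecast_new_product(recent_demand: List[float], lift: float) -> float:
    """Holiday forecast for a new product: its own baseline times the shared lift."""
    return mean(recent_demand) * lift


# Made-up weekly demand for two older products; week index 2 is the holiday.
older_products = {
    "old_a": [10.0, 10.0, 30.0, 10.0],
    "old_b": [20.0, 20.0, 40.0, 20.0],
}
shared_lift = holiday_lift(older_products, holiday_week=2)

# The new product has only two ordinary weeks of history.
new_product_forecast = forecast_new_product([4.0, 6.0], shared_lift)
```

Without sharing, the new product's own history contains no holiday week, so there is nothing to learn a lift from; with sharing, it borrows the lift observed on older products.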


ASIN Machine Translation

[Figure: Contribution Profit (y-axis) plotted over ASINs (x-axis), with regions labeled Human Translation, Machine Translation, and Selection Gap]

Machine Translation Pipeline

[Figure: pipeline from Input Request to Translated Request. Stages shown: Detection & Escaping of Non-translatables, Input Normalization, Sentence Segmentation, Tokenization, Lowercasing, Translation/Decoding, Recasing, De-Tokenization, Post-processing, Re-insertion of (converted) Non-translatables]
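Several of the stages named on this slide can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: the order of the stages is guessed, the pattern for non-translatables (URLs and codes such as "XK-200") is invented, the translation step is a three-word lookup table standing in for a real decoder, and sentence segmentation and post-processing are omitted. All function names are hypothetical.

```python
import re
import unicodedata
from typing import Dict, List, Tuple

# Assumed pattern for non-translatables: URLs and codes like "XK-200".
NON_TRANSLATABLE = re.compile(r"https?://\S+|\b[A-Z]+-\d+\b")
# Toy German-to-English lexicon standing in for the translation/decoding stage.
TOY_LEXICON: Dict[str, str] = {"kabel": "cable", "mit": "with", "stecker": "plug"}


def escape(text: str) -> Tuple[str, List[str]]:
    """Replace each non-translatable span with a placeholder and remember it."""
    saved: List[str] = []

    def stash(match) -> str:
        saved.append(match.group(0))
        return f"__nt{len(saved) - 1}__"

    return NON_TRANSLATABLE.sub(stash, text), saved


def normalize(text: str) -> str:
    """Input normalization: Unicode NFKC plus whitespace collapsing."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def tokenize(text: str) -> List[str]:
    """Split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def translate(tokens: List[str]) -> List[str]:
    """Toy decoder: word-by-word lookup, unknown tokens pass through."""
    return [TOY_LEXICON.get(token, token) for token in tokens]


def detokenize(tokens: List[str]) -> str:
    """Join tokens and remove the space before punctuation."""
    return re.sub(r"\s+([.,!?;:])", r"\1", " ".join(tokens))


def recase(text: str) -> str:
    """Recasing, reduced here to capitalizing the first character."""
    return text[:1].upper() + text[1:]


def reinsert(text: str, saved: List[str]) -> str:
    """Put the original non-translatables back in place of the placeholders."""
    for i, original in enumerate(saved):
        text = text.replace(f"__nt{i}__", original)
    return text


def translate_request(request: str) -> str:
    escaped, saved = escape(request)
    tokens = tokenize(normalize(escaped).lower())
    return reinsert(recase(detokenize(translate(tokens))), saved)


translated = translate_request("Kabel mit Stecker XK-200.")
```

Escaping before lowercasing is what lets the code "XK-200" survive unchanged: the placeholder is lowercased and translated around, and the original string is restored at the end.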

Machine Translation: Deep Dive

$$p(\text{English} \mid \text{Chinese}) = \frac{p(\text{English}) \times p(\text{Chinese} \mid \text{English})}{p(\text{Chinese})} \propto p(\text{English}) \times p(\text{Chinese} \mid \text{English})$$

• Language Model, $p(\text{English})$: What are fluent English sentences?
• Translation Model, $p(\text{Chinese} \mid \text{English})$: What English sentences account well for a given Chinese sentence?
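The factorization above suggests a simple decoding rule: among candidate English sentences, pick the one that maximizes the product of language-model and translation-model probabilities. The Python sketch below illustrates that rule with a handful of candidates. The probabilities are made up for illustration, and the function and variable names are hypothetical; a real decoder searches a vastly larger space.

```python
import math
from typing import Dict, Tuple


def decode(chinese: str,
           language_model: Dict[str, float],
           translation_model: Dict[Tuple[str, str], float]) -> str:
    """Return the candidate English sentence maximizing
    log p(English) + log p(Chinese | English)."""

    def score(english: str) -> float:
        p_english = language_model[english]
        p_chinese_given_english = translation_model.get((chinese, english), 0.0)
        if p_english <= 0.0 or p_chinese_given_english <= 0.0:
            return -math.inf
        return math.log(p_english) + math.log(p_chinese_given_english)

    return max(language_model, key=score)


# Made-up probabilities for three candidate translations of one sentence.
language_model = {"hello": 0.5, "you good": 0.1, "good you": 0.01}
translation_model = {
    ("你好", "hello"): 0.4,
    ("你好", "you good"): 0.5,
    ("你好", "good you"): 0.5,
}
best = decode("你好", language_model, translation_model)
```

Here the literal candidates score slightly higher under the translation model, but the language model strongly prefers the fluent sentence, so the product favors "hello". The denominator p(Chinese) is the same for every candidate and can be ignored.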


Automated Produce Inspection: The Goal

[Figure: Current Inspection compared with New Automated Inspection using Computer Vision]

Conclusions

• Machine Learning is an emerging and scientifically young discipline!
• Machine Learning “translates” data from the past into accurate predictions about the future!
• Amazon has a broad range of applications for Machine Learning – it’s central to Amazon’s business!

Thanks!
