How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python
![Page 1: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/1.jpg)
How to Train
Your Classifier:
Create a Serverless Machine Learning System
with AWS and Python
PyData ✤ November 27th, 2017 ✤ [email protected]
![Page 3: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/3.jpg)
Tags
Why do you want tags
on your text content?
● Search, navigation,
recommendations
● Aggregation, routing
● Discoverability
○ properties
○ relationships
![Page 5: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/5.jpg)
Taxonomy
Jordan Larson
<http://cv.ap.org/id/9A7FD8FA87AD4A43BDD522B65147A808> ,
ap:associatedState <http://cv.ap.org/id/8083[Nebraska]43E>;
ap:displayLabel "Jordan Larson (Women's volleyball)"@en;
ap:hometown "Hooper, NE"@en;
ap:olympicTeam2016 <http://cv.ap.org/id/46[United States Olympic Team]B73H>;
ap:sport <http://cv.ap.org/id/DA[Volleyball]C8EA>;
dbprop:birthdate "1986-10-16"^^xsd:date;
dcterms:created "2012-07-11T14:30:26-04:00"^^xsd:dateTime;
dcterms:modified "2017-07-25T10:37:49-04:00"^^xsd:dateTime;
a <http://cv.ap.org/c/ProfessionalAthlete>, skos:Concept;
skos:broader <http://cv.ap.org/id/384[Professional Athlete]88>;
skos:definition "American volleyball player."@en;
skos:inScheme <http://cv.ap.org/a#person>;
skos:prefLabel "Jordan Larson"@en;
foaf:gender "Female"@en.
![Page 6: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/6.jpg)
Applying taxonomy to text
Manually
Airlines Industry
Pan American Airlines Co.
Travel
![Page 7: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/7.jpg)
Applying taxonomy to text
Rules-based classifier
<Hurricane Harvey>
(AND,
(MINOC_2,
(SENT,
(NOTIN,
(OR,"Harvey_C","HARVEY_C"),
(OR,"[Fullname female]","[Fullname male]","[Person]")),
(OR,"texas","landfall","storm","hurricane","nws","National weather service","evacuate@","surge@","flood@","rain@N","coastal","sandbag@N"...
)
)
)...
https://www.flickr.com/photos/notionscapital/15556898221/
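The idea behind a rules-based classifier can be sketched in a few lines of Python. This is a deliberately simplified illustration, not an implementation of the rule language shown above: operators such as MINOC_2, SENT and NOTIN are not reproduced, and the function name, term list and threshold are hypothetical.

```python
import re

# Hypothetical term list, loosely based on the storm vocabulary in the rule above.
STORM_TERMS = {"texas", "landfall", "storm", "hurricane", "flood", "surge", "evacuate"}


def matches_hurricane_harvey(text, min_terms=2):
    """Return True if any sentence mentions 'Harvey' alongside enough storm terms."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if "harvey" in words and len(words & STORM_TERMS) >= min_terms:
            return True
    return False
```

Requiring the name and the storm terms in the same sentence is what keeps a sentence about a person named Harvey from matching.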
![Page 8: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/8.jpg)
Applying taxonomy to text
Statistical classifier
Training data → Training engine → Trained model
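To make the training data, training engine and trained model concrete, here is a minimal sketch of a statistical text classifier in pure Python. It uses multinomial naive Bayes with add-one smoothing as a stand-in; the slides do not say which algorithm is used, and the tags and example sentences are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Training engine: turn (text, tag) pairs into a model of word counts."""
    tag_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, tag in examples:
        words = text.lower().split()
        tag_counts[tag] += 1
        word_counts[tag].update(words)
        vocabulary.update(words)
    return {"tags": tag_counts, "words": word_counts, "vocab": vocabulary}


def classify(model, text):
    """Return the tag with the highest naive Bayes log-probability."""
    total_docs = sum(model["tags"].values())
    vocab_size = len(model["vocab"])
    best_tag, best_score = None, float("-inf")
    for tag, doc_count in model["tags"].items():
        score = math.log(doc_count / total_docs)
        total_words = sum(model["words"][tag].values())
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not rule out a tag.
            count = model["words"][tag][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


# Invented training data: each document carries the tag it should receive.
training_data = [
    ("the dragon breathed fire on the castle", "dragons"),
    ("a dragon guards its hoard of gold", "dragons"),
    ("the airline cancelled flights after the storm", "airlines"),
    ("passengers boarded the plane at the airport", "airlines"),
]
model = train(training_data)
```

Calling `classify(model, "a dragon breathed fire")` applies the trained model to a new document.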
![Page 9: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/9.jpg)
AP Metadata Services
Tag with AP taxonomy
APMS Custom Tagging
Simple four-step REST API
Add your own tags and taxonomy
![Page 10: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/10.jpg)
Let’s create a classifier! For dragons
What if I like the AP Taxonomy but I want to classify with some additional tags?
In this case, documents about dragons
![Page 11: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/11.jpg)
A taxonomy of dragons
(borrowed from screencrush.com)
New documents about dragons
To be classified
![Page 12: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/12.jpg)
A map (with some * )
A fully automated workflow for training and deploying a Lambda-based classifier
Sadly, the expression hic sunt dracones (here be dragons) is an anachronism, but it does appear at least once, on the Hunt-Lenox globe (ca. 1510).
The Hunt-Lenox Globe (NYPL)
* Dragon emojis indicate problems found and (mostly) solved
![Page 13: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/13.jpg)
Creating a classifier
[Architecture diagram: four lanes labelled Workflow, Scaling, Worker and Classifier]
Components: Client, Step Functions, Auto Scaling, EC2, API Gateway, Lambda
Worker steps (EC2): Download training data, Download dependencies, Train model, Deploy model
Classifier artifacts: classifier.py, classifier.pkl, tags.json
![Page 14: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/14.jpg)
A Lambda-based classifier
• AWS Lambda: run event-driven code without provisioning or managing servers
  • Cost-efficient way to ensure capacity meets demand
• What do we need?
  • Code to invoke the classifier and return results to the user
  • Code dependencies (e.g. scikit-learn)
  • Other supporting artifacts (the trained model, the taxonomy)
  • Permissions for the Lambda function to interact with other AWS services
  • API endpoint for accessing the Lambda function
![Page 15: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/15.jpg)
Processing user requests
[Architecture diagram repeated from the "Creating a classifier" slide]
![Page 16: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/16.jpg)
Processing user requests
Validate and train
Adding complexity: a workflow for algorithm selection
AWS Step Functions: use visual workflows to coordinate microservices into a single application
Triggers auto-scaling, sends training request to worker in the cloud.
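A validate-then-train workflow of this kind could be described in the Amazon States Language along the following lines. The sketch builds the definition as a Python dictionary and serializes it to JSON. The state names, the `$.valid` field and the ARNs are all hypothetical placeholders; the actual workflow from the talk is not shown in the slides.

```python
import json

# Placeholder ARNs: region, account id and function names are hypothetical.
VALIDATE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:validate-request"
TRAIN_ARN = "arn:aws:lambda:us-east-1:123456789012:function:start-training"

definition = {
    "Comment": "Validate a training request, then hand it to a worker",
    "StartAt": "Validate",
    "States": {
        # Task state: runs the (hypothetical) validation function.
        "Validate": {"Type": "Task", "Resource": VALIDATE_ARN, "Next": "IsValid"},
        # Choice state: branch on a boolean the validation step is assumed to return.
        "IsValid": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.valid", "BooleanEquals": True, "Next": "Train"}
            ],
            "Default": "Rejected",
        },
        # Task state: triggers scaling and sends the training request.
        "Train": {"Type": "Task", "Resource": TRAIN_ARN, "End": True},
        "Rejected": {
            "Type": "Fail",
            "Error": "InvalidRequest",
            "Cause": "Validation failed",
        },
    },
}

state_machine_json = json.dumps(definition, indent=2)
```

The resulting JSON string is the kind of document Step Functions accepts as a state machine definition.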
![Page 17: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/17.jpg)
Training and deploying
[Architecture diagram repeated from the "Creating a classifier" slide]
![Page 18: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/18.jpg)
Training in the cloud
• AWS EC2: scalable computing capacity in the cloud
• Register an Amazon Machine Image (AMI) specifically for training
  • Speeds up provisioning your server
  • Ensures versions match between dependencies and your model
• Prepare dependencies ahead of time to beat AWS Lambda’s size limits
  • If you are using scikit-learn, sklearn-build-lambda can generate an appropriately sized zip
• Save model and taxonomy to disk, add to dependency zip
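The last step, saving the model and taxonomy and adding them to the dependency zip, can be sketched with the standard library alone. The function name and arguments are hypothetical; the file names `classifier.pkl` and `tags.json` follow the architecture diagram, and pickling is assumed because of the `.pkl` extension.

```python
import json
import os
import pickle
import zipfile


def package_model(model, taxonomy, work_dir, dependency_zip):
    """Save the model and taxonomy to disk, then append both to the zip."""
    model_path = os.path.join(work_dir, "classifier.pkl")
    tags_path = os.path.join(work_dir, "tags.json")

    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(tags_path, "w") as f:
        json.dump(taxonomy, f)

    # Mode "a" appends to the existing archive instead of overwriting it.
    with zipfile.ZipFile(dependency_zip, "a", zipfile.ZIP_DEFLATED) as archive:
        archive.write(model_path, arcname="classifier.pkl")
        archive.write(tags_path, arcname="tags.json")

    # Return the archive size in bytes so the caller can check Lambda's limits.
    return os.path.getsize(dependency_zip)
```

Returning the size makes it easy to fail the deployment early if the package has grown past what Lambda accepts.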
![Page 19: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/19.jpg)
Automating deployments
• Serverless Framework: Node.js application for rapid deployment of serverless architectures
• Simplifies the task of creating (and deleting) our classifier Lambdas
• Provider agnostic, though you may not be
• Zip artifact support for Lambda creation
![Page 20: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/20.jpg)
Classifying with AWS Lambda
[Architecture diagram repeated from the "Creating a classifier" slide]
![Page 21: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/21.jpg)
Classifying with AWS Lambda
• Be mindful of cold starts
  • Allocating more memory may help
• Store large models in S3 and take advantage of container reuse
  • Download assets to /tmp
  • Check /tmp for cached data before invocation

| Item | Limit |
|---|---|
| Deployment package (compressed) | 50MB |
| Deployment package (uncompressed) | 250MB |
| Non-persistent disk space in /tmp | 512MB |
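The /tmp caching pattern can be sketched as follows. The helper names are hypothetical, and the bucket and key are whatever your deployment uses. The download goes through boto3's `download_file`, imported lazily so the caching logic itself can be exercised without AWS access.

```python
import os

CACHE_DIR = "/tmp"  # Lambda's writable scratch space


def fetch_from_s3(bucket, key, destination):
    """Download one object from S3 to a local path using boto3."""
    import boto3  # imported lazily so the cache logic runs without AWS access

    boto3.client("s3").download_file(bucket, key, destination)


def cached_asset(bucket, key, cache_dir=CACHE_DIR, fetch=fetch_from_s3):
    """Return a local path for the asset, downloading only on a cache miss."""
    local_path = os.path.join(cache_dir, os.path.basename(key))
    if not os.path.exists(local_path):
        # Cold container: nothing cached yet, so pay for the download once.
        fetch(bucket, key, local_path)
    return local_path
```

Because a warm container keeps its /tmp contents between invocations, later requests served by the same container skip the download entirely.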
![Page 22: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/22.jpg)
How do I measure results?
Confusion matrix

| | Predicted Eagles | Predicted Doves | Predicted Pigeons | Sum of items = 300 |
|---|---|---|---|---|
| Actual Eagles | 95 | 3 | 2 | 100 Eagles |
| Actual Doves | 3 | 72 | 25 | 100 Doves |
| Actual Pigeons | 2 | 23 | 75 | 100 Pigeons |
![Page 23: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/23.jpg)
How do I measure results?
Measure your model’s performance per class
• Precision (number of correct predictions for a class divided by the total number of items predicted as that class)
• Recall (number of correct predictions for a class divided by the total number of items that actually belong to that class)
[Confusion matrix repeated from the previous slide]
Model accuracy:
242 / 300 ≈ 81%
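These measures can be computed directly from the confusion matrix. The sketch below uses the eagles, doves and pigeons counts from the slide; the function names are illustrative, and it assumes every class has at least one actual and one predicted item, so no division by zero occurs.

```python
labels = ["Eagles", "Doves", "Pigeons"]

# Rows are actual classes, columns are predicted classes.
matrix = [
    [95, 3, 2],
    [3, 72, 25],
    [2, 23, 75],
]


def per_class_metrics(matrix, labels):
    """Return precision and recall for each class."""
    metrics = {}
    for i, label in enumerate(labels):
        correct = matrix[i][i]
        predicted_total = sum(row[i] for row in matrix)  # column sum
        actual_total = sum(matrix[i])                    # row sum
        metrics[label] = {
            "precision": correct / predicted_total,
            "recall": correct / actual_total,
        }
    return metrics


def accuracy(matrix):
    """Return the share of all items that sit on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


for label, scores in per_class_metrics(matrix, labels).items():
    print(f"{label}: precision={scores['precision']:.2f} recall={scores['recall']:.2f}")
print(f"accuracy={accuracy(matrix):.2f}")
```

For Doves, for example, 72 of the 98 items predicted as doves are correct, while 72 of the 100 actual doves are found, so precision and recall differ even though the row totals are balanced.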
![Page 24: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/24.jpg)
How do I improve results?
Training data
• Correctly tagged - quality matters
• Quantity matters too - as long as it’s ‘good’ data!
• Balanced training sets across classes
![Page 25: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/25.jpg)
How do I improve results?
Taxonomy
• Clean taxonomy nodes and structure
• Distinct semantics, use relationships
• Avoid overlapping concepts between nodes
![Page 26: How to Train Your Classifier: Create a Serverless Machine Learning System with AWS and Python](https://reader034.fdocuments.us/reader034/viewer/2022051404/5a64e7e87f8b9a127f8b45b9/html5/thumbnails/26.jpg)
Thank You!
Learn more about AP Metadata Services
https://developer.ap.org/ap-metadata-services