Building an AI-based service with Rekognition, Polly and Lex
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Rebeker Choi, Solutions Architect
15-Sep, 2017
Building an AI-based service
with Rekognition, Polly, and Lex
The Challenge for Artificial Intelligence: SCALE
Data: PBs of existing data
Training: Tons of GPUs
Prediction: Tons of GPUs and CPUs
AWS is the Center of Gravity for
Artificial Intelligence
Amazon AI: Intelligent Services Powered by Deep Learning
DIY Deep Learning for Custom Models
AI-Enabled Managed API Services
Amazon AI: New Deep Learning Services
Polly, Lex, Rekognition
Deep Learning Frameworks: MXNet, TensorFlow, Theano, Caffe, Torch
Control vs. Usability & Simplicity
Running AI in Production on AWS Today
Recommendation & Ranking at Netflix
Personalized ranking, page generation, search, similarity, ratings
In 140 new countries simultaneously
Autonomous Driving System
Pinterest Visual Search Pinterest Lens
Amazon AI: New Deep Learning Services
Polly: Life-like Speech
Lex: Conversational Engine
Rekognition: Image Analysis
Amazon Lex
Conversational interfaces for your applications, powered
by the same Natural Language Understanding (NLU) &
Automatic Speech Recognition (ASR) models as Alexa
Lex: Build Natural, Conversational Interactions
Trigger AWS Lambda functions
Continually improving ASR & NLU models
Enterprise connectors: Salesforce, Microsoft Dynamics, Marketo, Zendesk
Fully Managed
Voice & Text “Chatbots”
Text interaction with Slack & Messenger
Improving human interactions…
• Contact, service, and support center interfaces (text + voice)
• Employee productivity and collaboration (minutes into seconds)
Intents: A particular goal that the user wants to achieve
Utterances: Spoken or typed phrases that invoke your intent
Slots: Data the user must provide to fulfill the intent
Prompts: Questions that ask the user to input data
Fulfillment: The business logic required to fulfill the user’s intent
Example intent: BookHotel
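The five building blocks above meet in the Lambda function that a Lex bot calls. The following is a minimal sketch, assuming the event and response shape of the original (V1) Amazon Lex Lambda integration. The slot names `Origin`, `Destination`, and `DepartureDate` and the prompt texts are hypothetical and would have to match your own bot definition.

```python
from typing import Any, Dict, Optional

# Hypothetical slot names and prompts; they must match the bot definition.
PROMPTS = {
    "Origin": "Where are you flying from?",
    "Destination": "Where would you like to fly to?",
    "DepartureDate": "When would you like to fly?",
}


def first_missing_slot(slots: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the first slot that has no value yet, or None if all are filled."""
    for name in PROMPTS:
        if not slots.get(name):
            return name
    return None


def lambda_handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Prompt for a missing slot, or fulfill the intent once every slot is filled."""
    intent = event["currentIntent"]
    slots = intent["slots"]
    session = event.get("sessionAttributes") or {}

    missing = first_missing_slot(slots)
    if missing is not None:
        # Prompt: ask the user for the data the intent still needs.
        return {
            "sessionAttributes": session,
            "dialogAction": {
                "type": "ElicitSlot",
                "intentName": intent["name"],
                "slots": slots,
                "slotToElicit": missing,
                "message": {"contentType": "PlainText", "content": PROMPTS[missing]},
            },
        }

    # Fulfillment: the business logic (here only a confirmation message).
    text = "Your flight from {Origin} to {Destination} is booked for {DepartureDate}".format(**slots)
    return {
        "sessionAttributes": session,
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": text},
        },
    }
```

A real handler would call a booking system in the fulfillment branch. The sketch only shows how a prompt (`ElicitSlot`) and a completed intent (`Close`) are returned to Lex.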
Flight Booking
Slots: Origin, Destination, Departure Date
User: “Book a flight to London from Seattle”
Automatic Speech Recognition → Natural Language Understanding → Intent / Slot model
Utterance: Book Flight, London, Seattle
Intent: Flight booking
Slots filled: Destination = London Heathrow, Origin = Seattle
Prompt: “When would you like to fly?” (spoken by Polly)
User: “Next Friday”
Automatic Speech Recognition → Natural Language Understanding → Intent / Slot model
Slot filled: Departure Date = 02/24/2017
Confirmation: “Your flight is booked for next Friday” (spoken by Polly)
Hotel Booking
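From the client side, the flight-booking dialog can be driven with the Lex runtime `post_text` call. This is a sketch under assumptions: the client is passed in (for example `boto3.client("lex-runtime")`), and the bot name, alias, and user ID are hypothetical placeholders.

```python
from typing import Any, Dict, Iterable, List

# Dialog states after which the conversation for this intent is over.
FINAL_STATES = ("Fulfilled", "ReadyForFulfillment", "Failed")


def converse(lex_client: Any, bot_name: str, bot_alias: str, user_id: str,
             utterances: Iterable[str]) -> List[Dict[str, Any]]:
    """Send each utterance to the bot and record what it answered."""
    turns = []
    for text in utterances:
        reply = lex_client.post_text(
            botName=bot_name,
            botAlias=bot_alias,
            userId=user_id,
            inputText=text,
        )
        turns.append({
            "user": text,
            "bot": reply.get("message"),
            "state": reply.get("dialogState"),
            "slots": reply.get("slots"),
        })
        if reply.get("dialogState") in FINAL_STATES:
            break
    return turns


# Usage, assuming boto3 is installed and a bot named "FlightBooking" exists:
#   import boto3
#   lex = boto3.client("lex-runtime")
#   converse(lex, "FlightBooking", "prod", "user-1",
#            ["Book a flight to London from Seattle", "Next Friday"])
```

Each turn records the `dialogState`, so a caller can tell a prompt (`ElicitSlot`) from a completed booking (`Fulfilled`).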
Amazon Polly
Turn Text into lifelike speech using deep learning
technologies to synthesize speech that sounds like a
human voice
Text in: “The temperature in WA is 75°F”
Amazon Polly
Speech out: “The temperature in Washington is 75 degrees Fahrenheit”
Amazon Polly: Text In, Life-like Speech Out
Converts text to life-like speech
47 voices
24 languages
Low latency, real time
Fully managed
Polly: Life-like Speech Service
What is supported?
• Supports all programming languages included in the AWS SDKs (Java, Python, Node.js, etc.) as well as an HTTP API
• Audio stream formats: MP3, Vorbis, raw PCM
• Choose your sampling rate to optimize bandwidth & quality
• Customized Pronunciation
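The options listed above map onto the parameters of the `synthesize_speech` call. This is a minimal sketch, assuming the Polly client is passed in (for example `boto3.client("polly")`). The voice, format, and sample rate defaults are only illustrative choices.

```python
from typing import Any


def synthesize(polly_client: Any, text: str, voice_id: str = "Joanna",
               output_format: str = "mp3", sample_rate: str = "22050",
               text_type: str = "text") -> bytes:
    """Return the synthesized audio for `text` as raw bytes."""
    response = polly_client.synthesize_speech(
        Text=text,
        TextType=text_type,          # "text" or "ssml"
        VoiceId=voice_id,
        OutputFormat=output_format,  # "mp3", "ogg_vorbis", or "pcm"
        SampleRate=sample_rate,      # passed as a string, e.g. "8000", "16000", "22050"
    )
    # AudioStream is a file-like object; read it fully into memory.
    return response["AudioStream"].read()


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   audio = synthesize(boto3.client("polly"), "The temperature in WA is 75°F")
#   with open("weather.mp3", "wb") as f:
#       f.write(audio)
```

A lower sample rate reduces bandwidth at the cost of quality, which is the trade-off the slide refers to.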
Articles and Blogs
Training Material
Chatbots (Lex)
Public Announcements
Polly: SSML and Lexicons
• Use SSML version 1.1 tags to adjust the speech rate, pitch, or volume, e.g.
• <break time="1s"/> pauses for 1 second between the first two sentences
• <sub alias="World Wide Web Consortium">W3C</sub> substitutes "World Wide Web Consortium" for the acronym "W3C"
• <amazon:effect name="whispered">Score</amazon:effect> says the second "Score" in a whispered voice
<speak>He was caught up in the game.<break time="1s"/> In the middle of the 10/3/2014 <sub alias="World Wide Web Consortium">W3C</sub> meeting he shouted, "Score!" quite loudly. When his boss stared at him, he repeated <amazon:effect name="whispered">"Score"</amazon:effect> in a whisper.</speak>
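SSML like the example above is easier to keep well-formed when it is assembled from small helpers that escape user text. The helper names below are hypothetical. They only build strings and do not call Polly.

```python
from xml.sax.saxutils import escape, quoteattr


def pause(seconds: int) -> str:
    """A pause of the given length, e.g. <break time="1s"/>."""
    return '<break time="{}s"/>'.format(seconds)


def sub(alias: str, text: str) -> str:
    """Speak `alias` in place of the written `text`."""
    return "<sub alias={}>{}</sub>".format(quoteattr(alias), escape(text))


def whisper(text: str) -> str:
    """Speak `text` in a whispered voice (Amazon-specific SSML extension)."""
    return '<amazon:effect name="whispered">{}</amazon:effect>'.format(escape(text))


def speak(*parts: str) -> str:
    """Wrap already-built SSML fragments in the root <speak> element."""
    return "<speak>{}</speak>".format("".join(parts))


ssml = speak(
    escape("He was caught up in the game."),
    pause(1),
    escape(" In the middle of the 10/3/2014 "),
    sub("World Wide Web Consortium", "W3C"),
    escape(' meeting he shouted, "Score!" quite loudly.'),
)
```

The resulting string would be sent to Polly with the text type set to `ssml` instead of plain text.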
• Pronunciation lexicons enable you to customize the pronunciation of words
<lexeme>
<grapheme>Bob</grapheme>
<alias>Robert</alias>
</lexeme>
aws polly synthesize-speech \
--lexicon-names LexA LexB \
--output-format mp3 \
--text 'Hello, my name is Bob' \
--voice-id Justin \
bobAB.mp3
“Hello, my name is Robert”
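The `<lexeme>` fragment above has to sit inside a complete Pronunciation Lexicon Specification (PLS) document before it can be stored under a name such as `LexA`. Here is a sketch, assuming the Polly client is passed in (for example `boto3.client("polly")`). The helper names are hypothetical, and the root attributes shown are assumed to be the minimum Polly accepts.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict
from xml.sax.saxutils import escape

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(aliases: Dict[str, str], lang: str = "en-US") -> str:
    """Build a PLS document that replaces each written word with its alias."""
    lexemes = "".join(
        "  <lexeme>\n"
        "    <grapheme>{}</grapheme>\n"
        "    <alias>{}</alias>\n"
        "  </lexeme>\n".format(escape(word), escape(alias))
        for word, alias in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<lexicon version="1.0" xmlns="{ns}" alphabet="ipa" xml:lang="{lang}">\n'
        "{lexemes}</lexicon>\n"
    ).format(ns=PLS_NS, lang=lang, lexemes=lexemes)


def upload_lexicon(polly_client: Any, name: str, content: str) -> None:
    """Check that the lexicon is well-formed XML, then store it in Polly."""
    ET.fromstring(content.encode("utf-8"))  # raises ParseError if malformed
    polly_client.put_lexicon(Name=name, Content=content)


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   upload_lexicon(boto3.client("polly"), "LexA", build_lexicon({"Bob": "Robert"}))
```

Once stored, the lexicon is applied by name at synthesis time, as the `--lexicon-names` option in the command above does.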
"Our Mapbox Navigation SDK offers a complete
turn-by-turn navigation solution that you can easily
add to your iOS or Android application, and having
clear, well-understood voice guidance is critical to
the user experience. Therefore, we’re excited to
offer natural-sounding pronunciation with highly
intelligible and pleasant voices in our users’ most
widely used languages with Amazon Polly’s Text-to-
Speech service."
– Paul Veugen, VP of Mobile, Mapbox.
Amazon Rekognition
Image recognition and analysis powered by deep learning, which lets you search, verify, and organize millions of images
Amazon Rekognition: Deep learning-based image recognition service
Search, verify, and organize millions of images
Object and Scene Detection
Facial Analysis
Face Comparison
Facial Recognition
Integrated with S3, Lambda, Polly, Lex
Object and Scene Detection
• Search, filter, and curate image libraries
• Smart searches for user generated content
• Photo, travel, real estate, vacation rental applications
Example labels: Maple, Plant, Villa, Garden, Water, Swimming Pool, Tree, Potted Plant, Backyard
Object and Scene Detection – DetectLabels API
Request
{
  "Image": {
    "Bytes": blob,
    "S3Object": {
      "Bucket": "string",
      "Name": "string",
      "Version": "string"
    }
  },
  "MaxLabels": number,
  "MinConfidence": number
}
Response
{
  "Labels": [
    {"Confidence": 95.78783416748047, "Name": "Villa"},
    {"Confidence": 68.914794921875, "Name": "Swimming Pool"},
    {"Confidence": 59.24593734741211, "Name": "Backyard"},
    {"Confidence": 59.24593734741211, "Name": "Yard"}
  ],
  "OrientationCorrection": "ROTATE_0"
}
Generate labels for thousands of objects, scenes, and concepts, each with a
confidence score
S3 bucket
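The DetectLabels request and response can be exercised from Python as follows. This is a sketch, assuming the Rekognition client is passed in (for example `boto3.client("rekognition")`). The bucket and object names in the usage comment are hypothetical.

```python
from typing import Any, List, Tuple


def detect_labels(rekognition_client: Any, bucket: str, key: str,
                  max_labels: int = 10,
                  min_confidence: float = 50.0) -> List[Tuple[str, float]]:
    """Return (label, confidence) pairs for an image in S3, best match first."""
    response = rekognition_client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    pairs = [(label["Name"], label["Confidence"]) for label in response["Labels"]]
    # Sort by confidence, highest first; ties keep the service's order.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Usage, assuming boto3 is installed and the image exists in S3:
#   import boto3
#   for name, confidence in detect_labels(boto3.client("rekognition"),
#                                         "my-bucket", "villa.jpg"):
#       print(name, round(confidence, 1))
```

Raising `min_confidence` trades recall for precision: fewer labels come back, but each is more likely to be correct.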
Facial Analysis
Demographic Data
Facial Landmarks
Sentiment Expressed
• Smart searches for user generated content
• Photo, travel, real estate, vacation rental applications
• Targeted marketing
• Dynamic, personalized ads
• Improve online dating match recommendations
Facial Analysis
"AgeRange": {"High": 38, "Low": 23},
"BoundingBox": {
  "Height": 0.42500001192092896,
  "Left": 0.1433333307504654,
  "Top": 0.11666666716337204,
  "Width": 0.2822222113609314
},
"Confidence": 99.8899917602539,
"Emotions": [
  {"Confidence": 93.29251861572266, "Type": "HAPPY"},
  {"Confidence": 28.57428741455078, "Type": "CALM"},
  {"Confidence": 1.4989674091339111, "Type": "ANGRY"}
],
"Eyeglasses": {"Confidence": 99.99998474121094, "Value": true},
"Gender": {"Confidence": 100, "Value": "Female"},
"Smile": {"Confidence": 99.47274780273438, "Value": true},
"Sunglasses": {"Confidence": 97.63555145263672, "Value": true}
DetectFaces
smart cropping & ad overlays
sentiment capture
demographic analysis
face editing & pixelation
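The face attributes shown above are returned by `detect_faces` when all attributes are requested. Below is a sketch that reduces each detected face to a few fields, assuming the Rekognition client is passed in (for example `boto3.client("rekognition")`). The summary keys are hypothetical names chosen for this example.

```python
from typing import Any, Dict, List


def summarize_faces(rekognition_client: Any, bucket: str, key: str) -> List[Dict[str, Any]]:
    """Return age range, dominant emotion, and smile flag for each detected face."""
    response = rekognition_client.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        Attributes=["ALL"],  # the default set omits emotions, age range, and smile
    )
    summaries = []
    for face in response["FaceDetails"]:
        emotions = face.get("Emotions") or []
        # The dominant emotion is the one with the highest confidence.
        top = max(emotions, key=lambda e: e["Confidence"], default=None)
        summaries.append({
            "age_range": (face["AgeRange"]["Low"], face["AgeRange"]["High"]),
            "emotion": top["Type"] if top else None,
            "smiling": face["Smile"]["Value"],
        })
    return summaries
```

A sentiment-capture application would aggregate the `emotion` field over many images rather than act on a single face.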
Face Comparison
Measure the likelihood that faces in two images are of the same
person
• Add face verification to applications and devices
• Extend physical security controls
• Provide guest access to VIP-only facilities
• Verify users for online exams and polls
CompareFaces
{
  "FaceMatches": [
    {
      "Face": {
        "BoundingBox": {
          "Height": 0.4601006507873535,
          "Left": 0.32827046513557434,
          "Top": 0.18212316930294037,
          "Width": 0.3135717809200287
        },
        "Confidence": 99.99964141845703
      },
      "Similarity": 93
    },
    {
      "Face": {
        "BoundingBox": {
          "Height": 0.2383333295583725,
          "Left": 0.6233333349227905,
          "Top": 0.3016666769981384,
          "Width": 0.15888889133930206
        },
        "Confidence": 99.71249389648438
      },
      "Similarity": 0
    }
  ],
  "SourceImageFace": {
    "BoundingBox": {
      "Height": 0.23983436822891235,
      "Left": 0.28333333134651184,
      "Top": 0.351423978805542,
      "Width": 0.1599999964237213
    },
    "Confidence": 99.99344635009766
  }
}
Similarity 93%
Similarity 0%
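A face-verification check built on `compare_faces` might look like the sketch below. It assumes the Rekognition client is passed in (for example `boto3.client("rekognition")`) and that both images are supplied as bytes. The threshold of 80 is only an illustrative default.

```python
from typing import Any, Dict, List


def matching_faces(rekognition_client: Any, source_bytes: bytes, target_bytes: bytes,
                   threshold: float = 80.0) -> List[Dict[str, Any]]:
    """Return the faces in the target image that match the source face."""
    response = rekognition_client.compare_faces(
        SourceImage={"Bytes": source_bytes},
        TargetImage={"Bytes": target_bytes},
        SimilarityThreshold=threshold,
    )
    # Filter again on the client so the result never contains a weaker match,
    # whatever threshold the service applied.
    return [m for m in response["FaceMatches"] if m["Similarity"] >= threshold]


def is_same_person(rekognition_client: Any, source_bytes: bytes, target_bytes: bytes,
                   threshold: float = 80.0) -> bool:
    """True if at least one face in the target matches the source face."""
    return bool(matching_faces(rekognition_client, source_bytes, target_bytes, threshold))
```

With the response shown above, only the face with similarity 93 would count as a match at this threshold.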
More Rekognition Capabilities
Celebrity Recognition
Image Moderation
Facial Recognition
Identify people in images by finding the closest match for an input face
image against a collection of stored face vectors
• Add friend tagging to social and messaging apps
• Help public safety officers find missing persons
• Identify employees as they access sensitive locations
• Identify celebrities in historical media archives
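Facial recognition works in two steps: index known faces into a collection, then search the collection with a new image. Here is a sketch, assuming the Rekognition client is passed in (for example `boto3.client("rekognition")`) and that the collection already exists. The function names and returned keys are hypothetical.

```python
from typing import Any, Dict, List, Optional


def enroll(rekognition_client: Any, collection_id: str, bucket: str, key: str,
           person_id: str) -> List[str]:
    """Index the faces found in an S3 image and return their face IDs."""
    response = rekognition_client.index_faces(
        CollectionId=collection_id,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        ExternalImageId=person_id,  # your own identifier for the person
    )
    return [record["Face"]["FaceId"] for record in response["FaceRecords"]]


def identify(rekognition_client: Any, collection_id: str, image_bytes: bytes,
             threshold: float = 80.0) -> Optional[Dict[str, Any]]:
    """Return the closest stored face for the input image, or None."""
    # Note: the service raises an error if the input image contains no face.
    response = rekognition_client.search_faces_by_image(
        CollectionId=collection_id,
        Image={"Bytes": image_bytes},
        MaxFaces=1,
        FaceMatchThreshold=threshold,
    )
    matches = response["FaceMatches"]
    if not matches:
        return None
    face = matches[0]["Face"]
    return {
        "face_id": face["FaceId"],
        "person_id": face.get("ExternalImageId"),
        "similarity": matches[0]["Similarity"],
    }
```

The collection stores face vectors rather than the images themselves, so the mapping from `person_id` back to a person's record lives in your own data store.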
Media Case Study
Identify who is on camera at what time for each of 8 networks so that recorded video streams can be indexed and searched
Video frame-sampling facial recognition solution using Amazon Rekognition:
• Indexed 97,000 people into a face collection in 1 day
• Sample frames every 6 secs and test for image variance
• Upload images to S3 and call Rekognition to find best facial match
• Store time stamp and faceID metadata
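The frame-sampling steps above can be sketched as a small pipeline. This is an illustration under assumptions, not the case study's actual implementation. "Image variance" is interpreted here as the mean absolute pixel difference from the last processed frame, frames are flat lists of grayscale values, and `find_face` is a hypothetical callback that would upload the frame and query Rekognition.

```python
import math
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple


def sample_times(duration_s: float, interval_s: float = 6.0) -> List[float]:
    """Timestamps (in seconds) at which to grab a frame."""
    return [i * interval_s for i in range(int(math.ceil(duration_s / interval_s)))]


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Average per-pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def index_video(frames: Iterable[Tuple[float, Sequence[int]]],
                find_face: Callable[[Sequence[int]], Optional[str]],
                threshold: float = 10.0) -> List[Dict[str, object]]:
    """Look up faces only in frames that differ enough from the last one processed."""
    records = []
    previous = None
    for timestamp, pixels in frames:
        if previous is not None and mean_abs_diff(previous, pixels) <= threshold:
            continue  # scene unchanged: skip the lookup
        previous = pixels
        face_id = find_face(pixels)
        if face_id is not None:
            # Store time stamp and face ID metadata for later search.
            records.append({"timestamp": timestamp, "face_id": face_id})
    return records
```

Skipping near-identical frames keeps the number of Rekognition calls proportional to scene changes rather than to video length.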
Demo
Amazon AI Services
• Leverage Amazon's internal experience with AI / ML
• Managed API services with embedded AI for maximum accessibility and simplicity
• Full stack of platforms and engines for specialized deep learning applications
Thank you!