Building an AI-based service with Rekognition, Polly and Lex

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Rebeker Choi, Solutions Architect

15-Sep, 2017


The Challenge for Artificial Intelligence: SCALE

Data: PBs of existing data

Training: tons of GPUs

Prediction: tons of GPUs and CPUs

AWS is the Center of Gravity for

Artificial Intelligence

Amazon AI: Intelligent Services Powered by Deep Learning

DIY Deep Learning for Custom Models

AI-Enabled Managed API Services

Amazon AI: New Deep Learning Services

Polly, Lex, Rekognition

Deep Learning Frameworks: MXNet, TensorFlow, Theano, Caffe, Torch

Control vs. Usability & Simplicity

Running AI in Production on AWS Today

Recommendation & Ranking at Netflix

Personalized ranking, page generation, search, similarity, ratings

In 140 new countries simultaneously

Autonomous Driving System

Pinterest Visual Search Pinterest Lens

Amazon AI: New Deep Learning Services

Polly: Life-like Speech

Lex: Conversational Engine

Rekognition: Image Analysis

Amazon Lex

Conversational interfaces for your applications, powered by the same Natural Language Understanding (NLU) and Automatic Speech Recognition (ASR) models as Alexa

Lex: Build Natural, Conversational Interactions

• Trigger AWS Lambda functions

• Continually improving ASR & NLU models

• Enterprise connectors: Salesforce, Microsoft Dynamics, Marketo, Zendesk

• Fully managed

• Voice & text “chatbots”

• Text interaction with Slack & Messenger

Improving human interactions…

• Contact, service, and support center interfaces (text + voice)

• Employee productivity and collaboration (minutes into seconds)

Intents: a particular goal that the user wants to achieve

Utterances: spoken or typed phrases that invoke your intent

Slots: data the user must provide to fulfill the intent

Prompts: questions that ask the user to input data

Fulfillment: the business logic required to fulfill the user’s intent
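The five concepts above can be pictured as a small data model. The sketch below is plain Python, not the Lex API: the `Slot` and `Intent` classes, the `next_prompt` and `fulfill` helpers, and the slot names `City` and `CheckInDate` are all hypothetical, chosen only to show how unfilled slots drive the prompts.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    name: str                    # data the user must provide
    prompt: str                  # question asked while the slot is empty
    value: Optional[str] = None  # filled in as the conversation proceeds


@dataclass
class Intent:
    name: str                    # the goal the user wants to achieve
    utterances: List[str]        # sample phrases that invoke the intent
    slots: List[Slot] = field(default_factory=list)

    def next_prompt(self) -> Optional[str]:
        """Return the prompt of the first unfilled slot, or None if all are filled."""
        for slot in self.slots:
            if slot.value is None:
                return slot.prompt
        return None

    def fulfill(self) -> str:
        """Stand-in for fulfillment; real business logic would run here."""
        filled = {slot.name: slot.value for slot in self.slots}
        return f"{self.name} fulfilled with {filled}"


# Hypothetical intent definition for illustration only.
book_hotel = Intent(
    name="BookHotel",
    utterances=["Book a hotel", "I need a room"],
    slots=[
        Slot("City", "Which city are you staying in?"),
        Slot("CheckInDate", "What day do you want to check in?"),
    ],
)
```

In Lex itself this definition is entered in the console or through the model-building API; the service, not your code, decides which prompt to ask next.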

BookHotel

Flight Booking

Slots: Origin, Destination, Departure Date

• The user says “Book a flight to London from Seattle”.

• Automatic Speech Recognition and Natural Language Understanding match the utterance against the intent / slot model: intent Book Flight, Origin = Seattle, Destination = London Heathrow. Departure Date is still empty.

• Prompt for the missing slot, spoken through Polly: “When would you like to fly?”

• The user answers “Next Friday”. Automatic Speech Recognition and Natural Language Understanding resolve it to Departure Date = 02/24/2017.

• Confirmation, spoken through Polly, once all slots are filled (Seattle, London Heathrow, 02/24/2017): “Your flight is booked for next Friday”

Hotel Booking
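Once the flight-booking slots are filled, Lex can hand the intent to an AWS Lambda function. The handler below is a minimal sketch that assumes the original (V1) Lex event and response format; the slot names `Origin`, `Destination`, and `DepartureDate` are assumptions taken from the slide labels, and no real booking system is called.

```python
def lambda_handler(event, context):
    """Code hook for a flight-booking intent (Lex V1 event shape assumed)."""
    slots = event["currentIntent"]["slots"]

    if event["invocationSource"] == "DialogCodeHook":
        # During slot elicitation, let Lex decide the next prompt itself.
        return {"dialogAction": {"type": "Delegate", "slots": slots}}

    # FulfillmentCodeHook: all required slots are filled.
    # A real handler would call a booking system at this point.
    message = (
        f"Your flight from {slots['Origin']} to {slots['Destination']} "
        f"is booked for {slots['DepartureDate']}"
    )
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": message},
        }
    }
```

When the bot is used by voice, Lex returns the closing message as synthesized speech, which corresponds to the Polly step in the walkthrough.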

Amazon Polly

Turns text into lifelike speech, using deep learning technologies to synthesize speech that sounds like a human voice

Amazon Polly

Text in: “The temperature in WA is 75°F”

Speech out: “The temperature in Washington is 75 degrees Fahrenheit”

Amazon Polly: Text In, Life-like Speech Out

• Converts text to life-like speech

• 47 voices

• 24 languages

• Low latency, real time

• Fully managed

Polly: Life-like Speech Service

What is supported?

• Supports every programming language covered by the AWS SDKs (Java, Python, Node.js, etc.), as well as an HTTP API

• Audio stream formats: MP3, Vorbis, raw PCM

• Choose your sampling rate to optimize bandwidth & quality

• Customized Pronunciation

Articles and Blogs

Training Material

Chatbots (Lex)

Public Announcements
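As an illustration of the format and sampling-rate options listed above, here is a minimal Python sketch using the AWS SDK for Python (boto3). It assumes boto3 is installed and AWS credentials are configured; `build_request` and `synthesize_to_file` are hypothetical helper names, and the voice `Joanna` is just one possible choice.

```python
ALLOWED_FORMATS = {"mp3", "ogg_vorbis", "pcm"}  # MP3, Vorbis, raw PCM


def build_request(text, voice_id="Joanna", output_format="mp3", sample_rate=None):
    """Assemble keyword arguments for Polly's SynthesizeSpeech operation."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    params = {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}
    if sample_rate is not None:
        # The API takes the sampling rate in Hz as a string, e.g. "8000".
        params["SampleRate"] = str(sample_rate)
    return params


def synthesize_to_file(path, text, **options):
    """Call Polly and write the returned audio stream to a local file."""
    import boto3  # imported lazily so build_request works without the SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text, **options))
    with open(path, "wb") as out:
        out.write(response["AudioStream"].read())
```

For example, `synthesize_to_file("weather.mp3", "The temperature in WA is 75°F")` would store the spoken sentence as an MP3 file; a lower `sample_rate` trades audio quality for bandwidth.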

Polly: SSML and Lexicons

• Use SSML version 1.1 tags to adjust the speech rate, pitch, or volume, for example:

• <break time="1s"/> pauses for 1 second between the first two sentences

• <sub alias="World Wide Web Consortium">W3C</sub> substitutes "World Wide Web Consortium" for the acronym "W3C"

• <amazon:effect name="whispered">"Score"</amazon:effect> says the second "Score" in a whispered voice

<speak>He was caught up in the game.<break time="1s"/> In the middle of the 10/3/2014 <sub alias="World Wide Web Consortium">W3C</sub> meeting he shouted, "Score!" quite loudly. When his boss stared at him, he repeated <amazon:effect name="whispered">"Score"</amazon:effect> in a whisper.</speak>
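SSML like the example above is often assembled programmatically so that user-supplied text is escaped correctly. The following is a small sketch using only the Python standard library; the helper names `pause`, `substitute`, `whisper`, and `speak` are hypothetical and not part of any AWS SDK.

```python
from xml.sax.saxutils import escape, quoteattr


def pause(seconds):
    """<break> tag that pauses for the given number of seconds."""
    return f'<break time="{seconds}s"/>'


def substitute(text, alias):
    """<sub> tag that speaks `alias` in place of `text`."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


def whisper(text):
    """Amazon-specific effect that speaks `text` in a whispered voice."""
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def speak(*parts):
    """Wrap already-escaped fragments in the root <speak> element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    escape("He was caught up in the game."),
    pause(1),
    escape(" In the middle of the 10/3/2014 "),
    substitute("W3C", "World Wide Web Consortium"),
    escape(' meeting he shouted, "Score!" quite loudly.'),
    escape(" When his boss stared at him, he repeated "),
    whisper('"Score"'),
    escape(" in a whisper."),
)
```

To have Polly interpret the markup rather than read it aloud, the text is submitted with the text type set to SSML (`TextType="ssml"` in boto3, `--text-type ssml` in the CLI).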

• Pronunciation lexicons enable you to customize the pronunciation of words

<lexeme>
  <grapheme>Bob</grapheme>
  <alias>Robert</alias>
</lexeme>

aws polly synthesize-speech \
    --lexicon-names LexA LexB \
    --output-format mp3 \
    --text 'Hello, my name is Bob' \
    --voice-id Justin \
    bobAB.mp3

“Hello, my name is Robert”
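The `<lexeme>` fragment above lives inside a complete lexicon document in the W3C Pronunciation Lexicon Specification (PLS) format, which has to be uploaded before `--lexicon-names` can refer to it. The sketch below builds such a document and uploads it with boto3; `build_lexicon` and `upload_lexicon` are hypothetical helper names, the minimal set of root attributes shown is an assumption (the official examples also carry schema-location attributes), and boto3 plus AWS credentials are assumed for the upload.

```python
from xml.sax.saxutils import escape

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(aliases, lang="en-US"):
    """Return a PLS document mapping each grapheme to its spoken alias."""
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(grapheme)}</grapheme>\n"
        f"    <alias>{escape(alias)}</alias>\n"
        "  </lexeme>\n"
        for grapheme, alias in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}" alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}"
        "</lexicon>\n"
    )


def upload_lexicon(name, aliases):
    """Store the lexicon in Polly under `name` (for example "LexA")."""
    import boto3  # imported lazily so build_lexicon works without the SDK

    boto3.client("polly").put_lexicon(Name=name, Content=build_lexicon(aliases))
```

After `upload_lexicon("LexA", {"Bob": "Robert"})`, a synthesis request that names `LexA` would speak "Bob" as "Robert", as in the CLI example.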

"Our Mapbox Navigation SDK offers a complete

turn-by-turn navigation solution that you can easily

add to your iOS or Android application, and having

clear, well-understood voice guidance is critical to

the user experience. Therefore, we’re excited to

offer natural-sounding pronunciation with highly

intelligible and pleasant voices in our users’ most

widely used languages with Amazon Polly’s Text-to-

Speech service."

– Paul Veugen, VP of Mobile, Mapbox.

Amazon Rekognition

Image recognition and analysis powered by deep learning, which lets you search, verify, and organize millions of images

Amazon Rekognition: Deep learning-based image recognition service

Search, verify, and organize millions of images

• Object and Scene Detection

• Facial Analysis

• Face Comparison

• Facial Recognition

Integrated with S3, Lambda, Polly, Lex

Object and Scene Detection

• Search, filter, and curate image libraries

• Smart searches for user generated content

• Photo, travel, real estate, vacation rental applications

Labels: Maple, Plant, Villa, Garden, Water, Swimming Pool, Tree, Potted Plant, Backyard

Object and Scene Detection – DetectLabels API

Request

{
  "Image": {
    "Bytes": blob,
    "S3Object": {
      "Bucket": "string",
      "Name": "string",
      "Version": "string"
    }
  },
  "MaxLabels": number,
  "MinConfidence": number
}

Response

{
  "Labels": [
    {"Confidence": 95.78783416748047, "Name": "Villa"},
    {"Confidence": 68.914794921875, "Name": "Swimming Pool"},
    {"Confidence": 59.24593734741211, "Name": "Backyard"},
    {"Confidence": 59.24593734741211, "Name": "Yard"}
  ],
  "OrientationCorrection": "ROTATE_0"
}

Generate labels for thousands of objects, scenes, and concepts, each with a confidence score
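A minimal Python sketch of the DetectLabels request and response handling, assuming boto3 is installed and AWS credentials are configured. `detect_labels_s3` and `label_names` are hypothetical helper names; the sample response reuses the values from the slide.

```python
def detect_labels_s3(bucket, key, max_labels=10, min_confidence=50.0):
    """Ask Rekognition to label an image stored in S3."""
    import boto3  # imported lazily so label_names works without the SDK

    client = boto3.client("rekognition")
    return client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )


def label_names(response, min_confidence=60.0):
    """Keep only the label names at or above a confidence threshold."""
    return [
        label["Name"]
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]


# Response values copied from the slide.
sample_response = {
    "Labels": [
        {"Confidence": 95.78783416748047, "Name": "Villa"},
        {"Confidence": 68.914794921875, "Name": "Swimming Pool"},
        {"Confidence": 59.24593734741211, "Name": "Backyard"},
        {"Confidence": 59.24593734741211, "Name": "Yard"},
    ],
    "OrientationCorrection": "ROTATE_0",
}

print(label_names(sample_response))  # ['Villa', 'Swimming Pool']
```

The threshold can be applied on the service side through `MinConfidence`, on the client side as shown, or both.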

S3 bucket

Facial Analysis

Demographic Data

Facial Landmarks

Sentiment Expressed

• Smart searches for user generated content

• Photo, travel, real estate, vacation rental applications

• Targeted marketing

• Dynamic, personalized ads

• Improve online dating match recommendations

Facial Analysis – DetectFaces API

"AgeRange": {"High": 38, "Low": 23},
"BoundingBox": {
  "Height": 0.42500001192092896,
  "Left": 0.1433333307504654,
  "Top": 0.11666666716337204,
  "Width": 0.2822222113609314
},
"Confidence": 99.8899917602539,
"Emotions": [
  {"Confidence": 93.29251861572266, "Type": "HAPPY"},
  {"Confidence": 28.57428741455078, "Type": "CALM"},
  {"Confidence": 1.4989674091339111, "Type": "ANGRY"}
],
"Eyeglasses": {"Confidence": 99.99998474121094, "Value": true},
"Gender": {"Confidence": 100, "Value": "Female"},
"Smile": {"Confidence": 99.47274780273438, "Value": true},
"Sunglasses": {"Confidence": 97.63555145263672, "Value": true}

• Smart cropping & ad overlays

• Sentiment capture

• Demographic analysis

• Face editing & pixelation
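The DetectFaces attributes shown earlier map directly onto these uses. The sketch below summarizes one face record and converts its relative bounding box into pixel coordinates, as a cropping or pixelation step would need. `summarize_face` and `box_to_pixels` are hypothetical helper names; the sample values are a subset of the slide's response, and the boto3 call in the closing note assumes the SDK and credentials are available.

```python
def summarize_face(face):
    """Reduce one FaceDetails entry to a few fields of interest."""
    top_emotion = max(face["Emotions"], key=lambda e: e["Confidence"])
    return {
        "age_range": (face["AgeRange"]["Low"], face["AgeRange"]["High"]),
        "top_emotion": top_emotion["Type"],
        "smiling": face["Smile"]["Value"],
    }


def box_to_pixels(box, image_width, image_height):
    """Convert a ratio-based BoundingBox to (left, top, width, height) in pixels."""
    return (
        round(box["Left"] * image_width),
        round(box["Top"] * image_height),
        round(box["Width"] * image_width),
        round(box["Height"] * image_height),
    )


# Subset of the face record shown on the slide.
sample_face = {
    "AgeRange": {"High": 38, "Low": 23},
    "Emotions": [
        {"Confidence": 93.29251861572266, "Type": "HAPPY"},
        {"Confidence": 28.57428741455078, "Type": "CALM"},
        {"Confidence": 1.4989674091339111, "Type": "ANGRY"},
    ],
    "Smile": {"Confidence": 99.47274780273438, "Value": True},
}

summary = summarize_face(sample_face)
```

With boto3, records of this shape come back under `FaceDetails` when `detect_faces` is called with `Attributes=["ALL"]`.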

Face Comparison

Measure the likelihood that faces in two images are of the same person

• Add face verification to applications and devices

• Extend physical security controls

• Provide guest access to VIP-only facilities

• Verify users for online exams and polls

CompareFaces

{
  "FaceMatches": [
    {
      "Face": {
        "BoundingBox": {
          "Height": 0.4601006507873535,
          "Left": 0.32827046513557434,
          "Top": 0.18212316930294037,
          "Width": 0.3135717809200287
        },
        "Confidence": 99.99964141845703
      },
      "Similarity": 93
    },
    {
      "Face": {
        "BoundingBox": {
          "Height": 0.2383333295583725,
          "Left": 0.6233333349227905,
          "Top": 0.3016666769981384,
          "Width": 0.15888889133930206
        },
        "Confidence": 99.71249389648438
      },
      "Similarity": 0
    }
  ],
  "SourceImageFace": {
    "BoundingBox": {
      "Height": 0.23983436822891235,
      "Left": 0.28333333134651184,
      "Top": 0.351423978805542,
      "Width": 0.1599999964237213
    },
    "Confidence": 99.99344635009766
  }
}

Similarity 93%

Similarity 0%
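A face-verification check built on CompareFaces usually reduces to a similarity threshold. The following is a sketch under that assumption; `compare_s3` and `matching_faces` are hypothetical helper names, the threshold of 80 is an illustrative value, and the boto3 call assumes the SDK and credentials are available.

```python
def compare_s3(bucket, source_key, target_key, threshold=80.0):
    """Compare the face in a source image with the faces in a target image."""
    import boto3  # imported lazily so matching_faces works without the SDK

    client = boto3.client("rekognition")
    return client.compare_faces(
        SourceImage={"S3Object": {"Bucket": bucket, "Name": source_key}},
        TargetImage={"S3Object": {"Bucket": bucket, "Name": target_key}},
        SimilarityThreshold=threshold,
    )


def matching_faces(response, threshold=80.0):
    """Return the target faces whose similarity reaches the threshold."""
    return [
        match
        for match in response.get("FaceMatches", [])
        if match["Similarity"] >= threshold
    ]


# Similarity values from the slide, with the bounding boxes omitted.
sample_response = {
    "FaceMatches": [
        {"Face": {"Confidence": 99.99964141845703}, "Similarity": 93},
        {"Face": {"Confidence": 99.71249389648438}, "Similarity": 0},
    ]
}

verified = len(matching_faces(sample_response)) > 0
```

The right threshold depends on the application: access control would typically demand a higher value than photo organization.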

More Rekognition Capabilities

• Celebrity Recognition

• Image Moderation

Facial Recognition

Identify people in images by finding the closest match for an input face image against a collection of stored face vectors

• Add friend tagging to social and messaging apps

• Help public safety officers find missing persons

• Identify employees as they access sensitive locations

• Identify celebrities in historical media archives

Media Case Study

Identify who is on camera at what time for each of 8 networks so that recorded video streams can be indexed and searched

Video frame-sampling facial recognition solution using Amazon Rekognition:

• Indexed 97,000 people into a face collection in 1 day

• Sample frames every 6 seconds and test for image variance

• Upload images to S3 and call Rekognition to find the best facial match

• Store time stamp and faceID metadata
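The frame-sampling steps above could be sketched as follows. This is not the case study's actual implementation: `sample_times`, `search_frame`, `best_match`, and `index_record` are hypothetical helper names, the image-variance test and the S3 upload are left out, and the boto3 call assumes a face collection that was populated beforehand (for example with IndexFaces).

```python
def sample_times(duration_s, interval_s=6):
    """Timestamps (in seconds) at which frames are sampled."""
    return list(range(0, int(duration_s) + 1, interval_s))


def search_frame(collection_id, bucket, key, threshold=80.0):
    """Search the face collection for the largest face in an uploaded frame."""
    import boto3  # imported lazily so the pure helpers work without the SDK

    client = boto3.client("rekognition")
    return client.search_faces_by_image(
        CollectionId=collection_id,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxFaces=1,
        FaceMatchThreshold=threshold,
    )


def best_match(response):
    """Return (face_id, similarity) of the closest match, or None."""
    matches = response.get("FaceMatches", [])
    if not matches:
        return None
    top = max(matches, key=lambda m: m["Similarity"])
    return top["Face"]["FaceId"], top["Similarity"]


def index_record(timestamp_s, response):
    """Metadata row to store for one sampled frame, or None if nobody matched."""
    match = best_match(response)
    if match is None:
        return None
    face_id, similarity = match
    return {"timestamp_s": timestamp_s, "face_id": face_id, "similarity": similarity}
```

Storing one such record per sampled frame is what makes the recorded streams searchable by person and time.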

Amazon AI Services

• Leveraging Amazon internal experiences with AI / ML

• Managed API services with embedded AI for maximum accessibility and simplicity

• Full stack of platforms and engines for specialized deep learning applications

Thank you!
