Language Empowering Intelligent Assistants (CHT)
Language Empowering Intelligent Assistants (Intelligent Conversational Assistants)
Yun-Nung (Vivian) Chen 陳縕儂
http://vivianchen.idv.tw
Jan. 5th, 2017 @ Chunghwa Telecom
2
Outline
• Introduction
  – Intelligent Assistants
  – Mobile Service
  – Dialogue System/Bot
• Framework
  – Language Understanding
  – Dialogue Management
  – Output Generation
• Recent Trend
  – Industrial Trend and Challenge
• Deep Learning Basics
• Deep Learning for Dialogues
• Conclusion
3
Outline: Introduction
4
Intelligent Assistants
• Apple Siri (2011)
• Google Now (2012)
• Microsoft Cortana (2014)
• Amazon Alexa/Echo (2014)
• Facebook M & Bot (2015)
• Google Home (2016)
5
Why do we need them?
– Get things done
  • E.g. set up an alarm/reminder, take notes
– Easy access to structured data, services, and apps
  • E.g. find docs/photos/notes
– Assist with your routine schedule
  • E.g. check the account balance
– Be more productive in managing your work and personal life
6
Mobile Service (Mobile Customer Service)
• Allows customers to conduct a range of financial transactions remotely using a mobile device, usually through an app (e.g. Richart)
• Reduces the need to visit a branch, which cuts costs
7
Mobile Service (Mobile Customer Service)
• Users can finish specific tasks that are predefined by the app
• Limitations
  – The app's usage design may not be user-friendly
  – What counts as a good design differs from person to person
  – Learning how to use an app takes time
8
Why Natural Language?
• Global Digital Statistics (January 2015)
  – Global population: 7.21B
  – Active Internet users: 3.01B
  – Active social media accounts: 2.08B
  – Active unique mobile users: 3.65B
• Device input is evolving toward speech, the more natural and convenient modality.
9
Intelligent Assistant Architecture
• Reactive Assistance and Proactive Assistance
• Data: Databases and Client Signals
• Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
• User Experience: “restaurant suggestions”, “call taxi”
10
Dialogue System
• Dialogue systems are intelligent agents that help users finish tasks more efficiently via spoken interactions.
• Dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).
Good dialogue systems help users access information conveniently and finish tasks efficiently.
JARVIS – Iron Man’s personal assistant; Baymax – personal healthcare companion
11
App vs. Bot
• A bot is responsible for a “single” domain, similar to an app
• Seamless, automatic information transfer across domains reduces duplicate information and interaction
12
Outline: Framework
13
System Framework
• Speech Signal → Speech Recognition → Recognized Text: “I want to apply for an international data plan for the week after next” (the same sentence can also arrive as Text Input)
• Language Understanding (LU)
  – Domain Identification
  – User Intent Detection
  – Slot Filling
  → Semantic Frame: apply_international_dataplan (period=“the week after next”)
• Dialogue Management (DM)
  – Dialogue State Tracking
  – System Action/Policy Decision
  → System Action/Policy: request_country
• Output Generation
  → Text response: “Which country are you going to?”
  → Screen Display: country?
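The pipeline above can be sketched as three functions that pass a semantic frame along. This is a minimal illustration, not the system described in the talk: the keyword matching in `understand`, the `REQUIRED_SLOTS` schema, and the response templates are hypothetical stand-ins for trained LU, DM, and NLG components, and the input is the English translation of the example sentence.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticFrame:
    """Output of language understanding: domain, intent, and filled slots."""
    domain: str
    intent: str
    slots: dict = field(default_factory=dict)


def understand(text: str) -> SemanticFrame:
    # Hypothetical stand-in for LU; a real system would use trained
    # domain/intent classifiers and a slot tagger.
    if "international data plan" in text:
        slots = {"period": "the week after next"} if "week after next" in text else {}
        return SemanticFrame("mobile", "apply_international_dataplan", slots)
    return SemanticFrame("unknown", "unknown")


# Slots each intent needs before it can be executed (assumed schema).
REQUIRED_SLOTS = {"apply_international_dataplan": ["period", "country"]}


def decide(frame: SemanticFrame) -> str:
    # Dialogue management: request the first required slot that is still missing.
    for slot in REQUIRED_SLOTS.get(frame.intent, []):
        if slot not in frame.slots:
            return f"request_{slot}"
    return "execute"


def generate(action: str) -> str:
    # Output generation: map a system action to a text response.
    templates = {"request_country": "Which country are you going to?",
                 "request_period": "For which period?",
                 "execute": "Done."}
    return templates.get(action, "Sorry, could you rephrase that?")


text = "I want to apply for an international data plan for the week after next"
frame = understand(text)
action = decide(frame)
print(frame.intent, frame.slots, action, generate(action))
```

Keeping the frame as the only interface between stages mirrors the slide: each component can be swapped for a learned model without changing the others.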
14
Interaction Example
User: “I want to pay last month’s mobile phone bill.”
Intelligent Agent: “Your phone bill for last month is NT$800. Would you like to pay with your default account?”
Q: How does a dialogue system process this request?
15
System Framework: Language Understanding (LU)
16
1. Domain Identification: Requires a Predefined Domain Ontology
User: “I want to pay last month’s mobile phone bill.”
Intelligent Agent with organized domain knowledge (databases): Landline DB, Personal Info DB, Mobile DB
Machine Learning for Classification → Mobile DB
17
2. Intent Detection: Requires a Predefined Schema
User: “I want to pay last month’s mobile phone bill.”
Intelligent Agent (Mobile DB) intents: FEE_PAYMENT, CHECK_REMAINING_DATA, …
Machine Learning for Classification → FEE_PAYMENT
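Domain identification and intent detection are both framed as classification over the user's words, so the same classifier can be trained twice with different label sets. The sketch below swaps in a multinomial naive Bayes classifier with add-one smoothing as one simple choice of "machine learning for classification"; the training sentences and labels are hypothetical, and whitespace tokenization assumes English input.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        words = [w for w in text.lower().split() if w in self.vocab]  # skip unseen words
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            n_words = sum(self.word_counts[label].values())
            # log P(label) + sum over words of log P(word | label), smoothed
            score = math.log(count / total)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / (n_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical training data (English stand-ins for user requests).
domain_clf = NaiveBayes().fit(
    ["pay my mobile phone bill", "how much mobile data is left",
     "pay the landline bill", "landline call records",
     "update my home address", "change my personal email"],
    ["mobile", "mobile", "landline", "landline", "personal", "personal"])

intent_clf = NaiveBayes().fit(
    ["pay my mobile phone bill", "pay last month bill",
     "how much mobile data is left", "check remaining data"],
    ["FEE_PAYMENT", "FEE_PAYMENT", "CHECK_REMAINING_DATA", "CHECK_REMAINING_DATA"])

query = "pay last month mobile phone bill"
print(domain_clf.predict(query), intent_clf.predict(query))  # mobile FEE_PAYMENT
```

Running domain identification first lets the intent classifier be trained only on that domain's schema, which is why the slides show a predefined ontology before a predefined intent list.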
18
3. Slot Filling: Requires a Predefined Schema
User: “I want to pay last month’s mobile phone bill.”
Intelligent Agent, Mobile DB:
Number | Period | Amount
0933xxx | December | 800
0928xxx | November | 560
… | … | …
Semantic Frame: FEE_PAYMENT (period=“last month”) → FEE_PAYMENT (period=“December”, amount=“800”)
Machine Learning for Information Extraction
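Slot filling here has two parts: extracting the relative value (“last month”) and resolving it against the database to an absolute month and amount. The sketch assumes the extraction step already produced `{"period": "last month"}`; the `BILLS` table mirrors the slide's two rows, and the resolution rules and the `fill_slots` helper are hypothetical.

```python
import datetime

# Hypothetical billing table mirroring the slide: number -> {month: amount}.
BILLS = {
    "0933xxx": {12: 800},
    "0928xxx": {11: 560},
}


def resolve_period(period: str, today: datetime.date) -> int:
    """Map a relative period expression to a calendar month (1-12)."""
    if period == "this month":
        return today.month
    if period == "last month":
        return 12 if today.month == 1 else today.month - 1
    raise ValueError(f"unsupported period: {period}")


def fill_slots(frame: dict, today: datetime.date) -> dict:
    """Resolve the period slot, then fill number/amount if exactly one bill matches."""
    month = resolve_period(frame["period"], today)
    filled = dict(frame, period=month)
    matches = [(number, by_month[month])
               for number, by_month in BILLS.items() if month in by_month]
    if len(matches) == 1:
        filled["number"], filled["amount"] = matches[0]
    return filled


# The talk was given on Jan. 5th, 2017, so "last month" resolves to December.
print(fill_slots({"intent": "FEE_PAYMENT", "period": "last month"},
                 datetime.date(2017, 1, 5)))
```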
19
System Framework: Dialogue Management (DM)
20
State Tracking: Requires Hand-Crafted States
Hand-crafted states (which slots are filled): NULL; amount; period; number; amount, period; period, number; amount, card; all
User: “I want to pay last month’s mobile phone bill.”
User: “The 0933 number.”
21
State Tracking: Requires Hand-Crafted States
Starting from NULL:
User: “I want to pay last month’s mobile phone bill.” → state: period
User: “The 0933 number.” → state: period, number
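With hand-crafted states, tracking reduces to updating which slots are filled after each turn. A minimal sketch, assuming each turn's LU output arrives as a dict of slots; `update_state` and `describe` are hypothetical names, and `describe` renders the state with the slide's labels.

```python
SLOTS = ("amount", "period", "number")  # hypothetical slot inventory for FEE_PAYMENT


def update_state(state: frozenset, turn_slots: dict) -> frozenset:
    """Add every known slot mentioned in this turn to the set of filled slots."""
    return state | {slot for slot in turn_slots if slot in SLOTS}


def describe(state: frozenset) -> str:
    """Render a state with the slide's labels (NULL, single slots, pairs, all)."""
    if not state:
        return "NULL"
    if set(SLOTS) <= state:
        return "all"
    return ", ".join(slot for slot in SLOTS if slot in state)


state = frozenset()
for turn in [{"period": "last month"}, {"number": "0933"}]:
    state = update_state(state, turn)
    print(describe(state))  # "period", then "period, number"
```

The weakness the slides point at is visible here: every reachable state and slot name must be enumerated by hand in advance.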
22
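State Tracking: Requires Hand-Crafted States
Hand-crafted states: NULL; amount; period; number; amount, period; period, number; amount, number; all
User: “I want to pay the mobile phone bill for x month.” (x is ambiguous: “this” or “last”)
Candidate semantic frames: FEE_PAYMENT (period=“this month”) or FEE_PAYMENT (period=“last month”)
State: ? (it is unclear which state to move to)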
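When recognition is uncertain, as in the “x month” example, a tracker can keep scores for competing slot values instead of committing to one hand-crafted state. The sketch below sums hypothetical recognizer scores per value and commits only when one value's share passes an assumed threshold of 0.8; otherwise it reports the state as unknown, which a policy could answer with a confirmation.

```python
def track_with_uncertainty(hypotheses, threshold=0.8):
    """Combine (value, score) hypotheses for one slot.

    Returns (value, share) when the best value's share of the total score
    reaches the threshold, otherwise (None, share) to signal an uncertain state.
    """
    totals = {}
    for value, score in hypotheses:
        totals[value] = totals.get(value, 0.0) + score
    best = max(totals, key=totals.get)
    share = totals[best] / sum(totals.values())
    return (best if share >= threshold else None), share


# Hypothetical recognizer scores for the ambiguous "x month" utterance.
print(track_with_uncertainty([("this month", 0.45), ("last month", 0.55)]))
print(track_with_uncertainty([("last month", 0.9), ("this month", 0.1)]))
```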
23
Policy for Agent Action
• Inform
  – “Your bill amount is NT$800.”
• Request
  – “Which number would you like to pay for?”
• Confirm
  – “Do you want to pay the December bill?”
• Database Search
• Task Completion / Information Display
  – Payment / Data Checking (e.g. listing 0933xxx, 0928xxx, …)
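A hand-written policy maps the tracked state to one of the action types above. The rule order below (request missing slots, search the database, inform, confirm, complete) is an assumption for illustration, not a policy given in the talk.

```python
def choose_action(state: dict) -> tuple:
    """Hypothetical hand-written policy for FEE_PAYMENT.

    `state` maps slot names to values, plus bookkeeping flags
    "informed" and "confirmed" that are set after those turns happen.
    """
    if "period" not in state:
        return ("request", "period")
    if "number" not in state:
        return ("request", "number")
    if "amount" not in state:
        return ("database_search", None)
    if not state.get("informed"):
        return ("inform", "amount")
    if not state.get("confirmed"):
        return ("confirm", "period")
    return ("complete", "payment")


state = {"period": 12}
print(choose_action(state))  # ('request', 'number')
```

Rules like these are easy to read but grow quickly with the number of slots, which motivates the learned policies later in the talk.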
24
System Framework: Output Generation
25
Output / NL Generation
• Inform
  – “Your bill is NT$800.” vs. screen display: $800
• Request
  – “Which number’s bill would you like to pay?” vs. a screen display
• Confirm
  – “Do you want to pay the December bill?”
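Template-based generation is one straightforward way to realize these acts as both spoken text and screen content. The templates, the `candidates` field, and the use of `calendar.month_name` to spell out the month are assumptions for this sketch.

```python
import calendar


def generate(act: str, slots: dict) -> tuple:
    """Return (spoken text, screen display) for a system dialogue act."""
    if act == "inform":
        return f"Your bill is NT${slots['amount']}.", f"${slots['amount']}"
    if act == "request":
        # The screen lists the candidate numbers so the user can pick one.
        return "Which number's bill would you like to pay?", "\n".join(slots["candidates"])
    if act == "confirm":
        month = calendar.month_name[slots["period"]]
        return f"Do you want to pay the {month} bill?", f"{month}?"
    raise ValueError(f"no template for act: {act}")


print(generate("inform", {"amount": 800}))
print(generate("confirm", {"period": 12}))
```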
26
Outline: Recent Trend
27
AI Startups
28
ChatBot Startups
29
FinTech ChatBot
“Fintech in 3 Minutes: The New World Created by Chat Bots” (3分でわかるFintech – Chat botが作り出す新しい世界)
Challenge
• Predefined semantic schema: Chen et al., “Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding,” in ACL-IJCNLP, 2015.
• Data without annotations: Chen et al., “Zero-Shot Learning of Intent Embeddings for Expansion by Convolutional Deep Structured Semantic Models,” in ICASSP, 2016.
• Semantic concept interpretation: Chen et al., “Deriving Local Relational Surface Forms from Dependency-Based Entity Embeddings for Unsupervised Spoken Language Understanding,” in SLT, 2014.
  – E.g. FIND_RESTAURANT with rating=“good”: does that mean rating=5? 4?
• Predefined dialogue states: Chen et al., “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding,” in Interspeech, 2016.
• Error propagation: Hakkani-Tur et al., “Multi-Domain Joint Semantic Frame Parsing using Bi-directional RNN-LSTM,” in Interspeech, 2016.
• Cross-domain intention/bot hierarchy: Sun et al., “An Intelligent Assistant for High-Level Task Understanding,” in IUI, 2016; Sun et al., “AppDialogue: Multi-App Dialogues for Intelligent Assistants,” in LREC, 2016; Chen et al., “Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding,” in ICMI, 2016.
  – E.g. Hotel, Restaurant, and Flight under Travel, under Trip Planning
• Cross-domain information transferring: Kim et al., “New Transfer Learning Techniques For Disparate Label Sets,” in ACL-IJCNLP, 2015.
30
31
Deep Learning Basics
32
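Learning ≈ Looking for a Function
• Speech Recognition: f(speech audio) = “你好” (“Hello”)
• Handwritten Recognition: f(image of a handwritten digit) = “2”
• Weather Forecast: f(weather data on Thursday) = forecast for “Saturday”
• Play Video Games: f(game screen) = “move left”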
33
Machine Learning Framework
Training picks the best function given the observed data; testing predicts the label using the learned function.
• Model: hypothesis function set $\{f_1, f_2, \ldots\}$
• Training data: $\{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \ldots\}$, where x is the function input and $\hat{y}$ is the function output (label)
• Training: pick the “best” function $f^*$
• Testing data: $\{(x, ?), \ldots\}$
• Testing: $f^*(x) = y$
• Example: x = “It claims too much.”, y = - (negative)
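The training/testing split can be made concrete with a tiny hypothesis set. In this sketch the input x is a hypothetical one-dimensional score (for instance, a sentiment score for a sentence), each candidate f is a threshold rule, and “best” is taken to mean fewest training errors.

```python
def make_threshold(t):
    """Hypothesis f_t: label "+" if the input score is at least t, else "-"."""
    return lambda x: "+" if x >= t else "-"


# Model: a hypothesis function set {f_1, f_2, ...}, here one per threshold.
hypotheses = {t: make_threshold(t) for t in (0.1, 0.3, 0.5, 0.7)}

# Training data {(x^1, y_hat^1), (x^2, y_hat^2), ...} with hypothetical scores.
train = [(0.2, "-"), (0.4, "-"), (0.6, "+"), (0.9, "+")]


def training_errors(f, data):
    return sum(f(x) != y_hat for x, y_hat in data)


# Training: pick the "best" function f* (fewest training errors).
best_t = min(hypotheses, key=lambda t: training_errors(hypotheses[t], train))
f_star = hypotheses[best_t]

# Testing: predict the labels of unseen inputs with f*.
print(best_t, f_star(0.55), f_star(0.3))  # 0.5 + -
```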
34
Target Function
• Classification Task
  – x: input object to be classified, an N-dim vector
  – y: class/label, an M-dim vector
$f: \mathbb{R}^N \to \mathbb{R}^M$, $f(x) = y$
Assume both x and y can be represented as fixed-size vectors.
35
Vector Representation Example
• Handwritten Digit Classification, $f: \mathbb{R}^N \to \mathbb{R}^M$
  – x (image): 16 x 16 = 256 dimensions; each pixel corresponds to an element of the vector (1 for ink, 0 otherwise)
  – y (class/label): 10 dimensions for digit recognition, one per class (“1” or not, “2” or not, “3” or not, …), e.g. “1” = [1, 0, 0, …], “2” = [0, 1, 0, …]
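The digit encoding can be written directly: flatten the 16 x 16 binary image row by row and one-hot encode the label. The class order "0" to "9" below is an assumption; the slide only fixes that each output dimension answers “is it this digit or not”.

```python
DIGIT_CLASSES = [str(d) for d in range(10)]  # assumed class order "0".."9"


def image_to_vector(image):
    """Flatten a 16 x 16 binary image (1 = ink, 0 = otherwise) into 256 numbers."""
    if len(image) != 16 or any(len(row) != 16 for row in image):
        raise ValueError("expected a 16 x 16 image")
    return [pixel for row in image for pixel in row]


def one_hot(label, classes):
    """M-dim label vector: 1 in the label's position, 0 elsewhere."""
    return [1 if c == label else 0 for c in classes]


# A crude "1": a single vertical stroke in column 8.
image = [[1 if col == 8 else 0 for col in range(16)] for _ in range(16)]
x = image_to_vector(image)
y = one_hot("1", DIGIT_CLASSES)
print(len(x), sum(x), y)
```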
36
Vector Representation Example
• Sentiment Analysis, $f: \mathbb{R}^N \to \mathbb{R}^M$
  – x (word, e.g. “love”): dimensions = size of vocabulary; each element corresponds to a word in the vocabulary (1 indicates the word, 0 otherwise)
  – y (class/label): 3 dimensions (positive, negative, neutral), one per class (“+” or not, “-” or not, “?” or not), e.g. “+” = [1, 0, 0], “-” = [0, 1, 0]
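The same one-hot scheme applies to words and sentiment labels. The five-word vocabulary is hypothetical; a real vocabulary would have one dimension per word type.

```python
VOCAB = ["i", "love", "hate", "this", "movie"]  # hypothetical vocabulary
SENTIMENTS = ["+", "-", "?"]  # positive, negative, neutral


def one_hot(item, items):
    """1 at the item's position, 0 elsewhere (all zeros if the item is unknown)."""
    return [1 if other == item else 0 for other in items]


x = one_hot("love", VOCAB)    # input: dimensions = size of vocabulary
y = one_hot("+", SENTIMENTS)  # output: 3 sentiment classes
print(x, y)  # [0, 1, 0, 0, 0] [1, 0, 0]
```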
37
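A Single Neuron
Inputs $x_1, x_2, \ldots, x_N$ with weights $w_1, w_2, \ldots, w_N$ and bias b:
$z = w_1 x_1 + w_2 x_2 + \cdots + w_N x_N + b$
Activation function (sigmoid): $y = \sigma(z) = \frac{1}{1 + e^{-z}}$
Each neuron is a very simple function.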
38
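A Single Neuron
The bias can be drawn as the weight b on an extra input fixed at 1, so $z = \sum_{i=1}^{N} w_i x_i + b \cdot 1$ and $y = \sigma(z) = \frac{1}{1 + e^{-z}}$.
w and b are the parameters of this neuron.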
39
A Single Neuron
Decision rule for one class (e.g. handwritten digit “2”), $f: \mathbb{R}^N \to \mathbb{R}^M$:
• $y \geq 0.5$: is “2”
• $y < 0.5$: not “2”
A single neuron can only handle binary classification.
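A single neuron and its binary decision rule take only a few lines. The weights and bias in the usage lines are arbitrary values chosen for illustration, not trained parameters.

```python
import math


def sigmoid(z):
    """Activation function: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b):
    """z = w_1*x_1 + ... + w_N*x_N + b, output y = sigmoid(z)."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return sigmoid(z)


def is_two(x, w, b):
    """Binary decision for one class: "2" if y >= 0.5, otherwise not "2"."""
    return neuron(x, w, b) >= 0.5


# Arbitrary illustrative parameters (not trained).
w, b = [2.0, -1.0], -1.0
print(is_two([1.0, 0.0], w, b), is_two([0.0, 1.0], w, b))  # True False
```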
40
A Layer of Neurons
• Handwritten digit classification, $f: \mathbb{R}^N \to \mathbb{R}^M$
• 10 neurons for 10 classes: inputs $x_1, \ldots, x_N$ (plus a bias input of 1) feed outputs $y_1$ (“1” or not), $y_2$ (“2” or not), $y_3$ (“3” or not), …
• Which one is the max?
A layer of neurons can handle multiple possible outputs, and the result is the class with the maximum output.
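A layer of sigmoid neurons followed by an argmax gives a multi-class decision. For brevity this sketch uses three classes and two inputs with hand-picked weights instead of the slide's ten digit classes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(x, weights, biases):
    """One output per neuron: y_j = sigmoid(w_j . x + b_j)."""
    return [sigmoid(sum(w * x_i for w, x_i in zip(w_j, x)) + b_j)
            for w_j, b_j in zip(weights, biases)]


def classify(x, weights, biases, classes):
    """Pick the class whose neuron gives the largest output."""
    ys = layer(x, weights, biases)
    return classes[max(range(len(ys)), key=lambda j: ys[j])]


# Hand-picked weights for three classes over two inputs (illustrative only).
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
print(classify([0.2, 0.9], weights, biases, ["1", "2", "3"]))  # 2
```

Because the sigmoid is monotonic, the argmax over outputs picks the same class as the argmax over the pre-activations z.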
41
Neural Networks – Multi-Layer Perceptron (MLP)
Inputs $x_1, x_2$ → hidden units $a_1 = \sigma(z_1)$ and $a_2 = \sigma(z_2)$ (each with a bias input of 1) → output y
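A classic illustration of why hidden units matter is XOR, which no single sigmoid neuron can represent because its classes are not linearly separable. The hand-picked weights below (an OR-like and a NAND-like hidden unit feeding an AND-like output) are one known solution, not learned values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mlp(x1, x2):
    # Hidden layer: a1 behaves like OR, a2 like NAND (hand-picked weights).
    a1 = sigmoid(20 * x1 + 20 * x2 - 10)
    a2 = sigmoid(-20 * x1 - 20 * x2 + 30)
    # Output neuron behaves like AND of the hidden units, giving XOR overall.
    return sigmoid(20 * a1 + 20 * a2 - 30)


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(mlp(x1, x2)))
```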
42
Expression of MLP
• Continuous function w/ 2 layers
  – Combine two opposite-facing threshold functions to make a ridge
• Continuous function w/ 3 layers
  – Combine two perpendicular ridges to make a bump
• Add bumps of various sizes and locations to fit any surface
Multiple layers enrich the model's expressiveness, so the model can approximate more complex functions.
http://aima.eecs.berkeley.edu/slides-pdf/chapter20b.pdf
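The ridge-and-bump construction can be checked numerically using sigmoids as soft threshold functions. In this sketch the steepness `k = 10` and the thresholds at ±1 and 1.5 are arbitrary choices: two opposite-facing sigmoids per axis make a ridge, and a final sigmoid over the sum of two perpendicular ridges keeps only their intersection.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ridge(x, k=10.0):
    """Two opposite-facing soft thresholds at -1 and +1; ~1 inside, ~0 outside."""
    return sigmoid(k * (x + 1)) + sigmoid(-k * (x - 1)) - 1


def bump(x1, x2, k=10.0):
    """Two perpendicular ridges sum to ~2 only near the origin; threshold at 1.5."""
    return sigmoid(k * (ridge(x1, k) + ridge(x2, k) - 1.5))


for point in [(0, 0), (0, 5), (5, 5)]:
    print(point, round(bump(*point), 4))
```

Shifting and scaling copies of `bump` and adding them is the “bumps of various sizes and locations” step from the slide.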
43
Deep Neural Networks (DNN)
• Fully connected feedforward network
Input vector x ($x_1, x_2, \ldots, x_N$) → Layer 1 → Layer 2 → … → Layer L → output vector y ($y_1, y_2, \ldots, y_M$), $f: \mathbb{R}^N \to \mathbb{R}^M$
Deep NN: multiple hidden layers
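A forward pass through a fully connected network is a loop over layers, each applying weights, bias, and sigmoid to the previous layer's output. The layer sizes (256 inputs as in the digit example, two hidden layers, 10 outputs) and the random, untrained weights are assumptions for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Random weights (n_out x n_in) and zero biases for one dense layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    # Fully connected feedforward: each layer's output feeds the next layer.
    a = x
    for weights, biases in layers:
        a = [sigmoid(sum(w * a_i for w, a_i in zip(row, a)) + b)
             for row, b in zip(weights, biases)]
    return a


rng = random.Random(0)
sizes = [256, 64, 32, 10]  # N=256 inputs, two hidden layers, M=10 outputs
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
y = forward([0] * 256, layers)
print(len(y))
```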
44
Deep Learning for Dialogues
45
RNN for SLU
• IOB Sequence Labeling for Slot Filling
• Intent Classification
[Figure: input words $w_0, w_1, w_2, \ldots, w_n$ with hidden states $h_0, \ldots, h_n$. (a) LSTM, (b) LSTM-LA, and (c) bLSTM-LA (forward states $h^f$, backward states $h^b$) output one tag per word, $y_0, \ldots, y_n$; (d) Intent LSTM outputs a single intent.]
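IOB labels mark the beginning (B-) and inside (I-) of each slot span, with O for words outside any slot. Turning a predicted tag sequence into slot values is a left-to-right scan; this sketch uses the single-turn email example from the contextual SLU slide, and treating a stray I- tag as the start of a new span is one of several possible conventions.

```python
def iob_to_slots(words, tags):
    """Collect B-/I- tagged spans into {slot_name: text}."""
    slots, name, span = {}, None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != name):
            if name:  # close the previous span before opening a new one
                slots[name] = " ".join(span)
            name, span = tag[2:], [word]
        elif tag.startswith("I-"):
            span.append(word)
        else:  # "O" closes any open span
            if name:
                slots[name] = " ".join(span)
            name, span = None, []
    if name:
        slots[name] = " ".join(span)
    return slots


words = "just sent email to bob about fishing this weekend".split()
tags = ["O", "O", "O", "O", "B-contact_name", "O", "B-subject", "I-subject", "I-subject"]
print(iob_to_slots(words, tags))
```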
46
RNN for SLU
• Joint Multi-Domain Intent Prediction and Slot Filling
  – Information from the two tasks can mutually enhance each other
[Figure: an RNN (input weights U, recurrent weights W, output weights V) reads “taiwanese food please” followed by EOS; slot tagging emits B-type, O, O for the words, and intent prediction emits FIND_REST at the EOS step ($h_{T+1}$), yielding one semantic frame sequence.]
Hakkani-Tur et al., “Multi-Domain Joint Semantic Frame Parsing using Bi-directional RNN-LSTM,” in Interspeech, 2016.
47
Contextual SLU (Chen et al., 2016)
Single Turn
U: just sent email to bob about fishing this weekend
S: O O O O B-contact_name O B-subject I-subject I-subject
D: communication; I: send_email
→ send_email(contact_name=“bob”, subject=“fishing this weekend”)
Multi-Turn
U1: send email to bob
S1: O O O B-contact_name → send_email(contact_name=“bob”)
U2: are we going to fish this weekend
S2: B-message I-message I-message I-message I-message I-message I-message → send_email(message=“are we going to fish this weekend”)
(U: utterance; D: Domain Identification; I: Intent Prediction; S: Slot Filling)
48
Contextual SLU (Chen et al., 2016)
Idea: additionally incorporate contextual knowledge during slot tagging, tracking dialogue states in a latent way.
1. Sentence Encoding: a contextual sentence encoder ($\text{RNN}_{\text{mem}}$) maps the history utterances $\{x_i\}$ to memory representations $m_i$; a sentence encoder ($\text{RNN}_{\text{in}}$) maps the current utterance c to a vector u.
2. Knowledge Attention: the inner product of u with each $m_i$ gives the knowledge attention distribution $p_i$.
3. Knowledge Encoding: the weighted sum $h = \sum_i p_i m_i$ is combined with u through $W_{kg}$ into the knowledge encoding representation o, which an RNN tagger (weights U, W, V, M) uses to output the slot tagging sequence y.
Chen et al., “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding,” in Interspeech, 2016.
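The knowledge attention step can be sketched with plain vectors. Following the description above, attention scores are inner products between u and each memory $m_i$; normalizing them with a softmax and combining as o = W_kg (h + u) are assumptions of this sketch, so verify them against the cited paper before relying on them. The toy encodings stand in for the RNN outputs.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knowledge_attention(u, memories):
    """p_i = softmax_i(u . m_i) and h = sum_i p_i * m_i."""
    p = softmax([sum(a * b for a, b in zip(u, m)) for m in memories])
    h = [sum(p_i * m[d] for p_i, m in zip(p, memories)) for d in range(len(u))]
    return p, h


def knowledge_encoding(u, h, W_kg):
    """Assumed combination o = W_kg (h + u), as a matrix-vector product."""
    s = [h_d + u_d for h_d, u_d in zip(h, u)]
    return [sum(w * s_d for w, s_d in zip(row, s)) for row in W_kg]


# Toy 2-d encodings: the current utterance u is closer to the first memory.
u = [1.0, 0.0]
memories = [[1.0, 0.0], [0.0, 1.0]]
p, h = knowledge_attention(u, memories)
o = knowledge_encoding(u, h, [[1.0, 0.0], [0.0, 1.0]])
print(p, o)
```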
49
End-to-End Supervised Dialogue System
Wen et al., “A Network-based End-to-End Trainable Task-Oriented Dialogue System,” arXiv:1604.04562v2.
50
InfoBot: End-to-End Dialogue System with Supervised & Reinforcement Learning
User: Find me the Bill Murray’s movie.
KB-InfoBot: When was it released?
User: I think it came out in 1993.
KB-InfoBot: Groundhog Day is a Bill Murray movie which came out in 1993.
Tracked slots: Movie=?; Actor=Bill Murray; Release Year=1993
Knowledge Base (head, relation, tail): (Groundhog Day, actor, Bill Murray), (Groundhog Day, release year, 1993), (Australia, actor, Nicole Kidman), (Mad Max: Fury Road, release year, 2015)
Dhingra et al., “End-to-End Reinforcement Learning of Dialogue Agents for Information Access,” arXiv:1609.00777v2.
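Given the slots tracked so far, the knowledge base can be filtered to candidate movies. This is a plain symbolic lookup over the slide's four triples, not the learned model from the cited paper; `candidates` is a hypothetical helper.

```python
KB = [  # (head, relation, tail) triples from the slide
    ("Groundhog Day", "actor", "Bill Murray"),
    ("Groundhog Day", "release year", "1993"),
    ("Australia", "actor", "Nicole Kidman"),
    ("Mad Max: Fury Road", "release year", "2015"),
]


def candidates(constraints):
    """Heads whose triples satisfy every relation=value constraint."""
    heads = {head for head, _, _ in KB}
    for relation, value in constraints.items():
        heads &= {head for head, rel, tail in KB if rel == relation and tail == value}
    return heads


print(candidates({"actor": "Bill Murray"}))
print(candidates({"actor": "Bill Murray", "release year": "1993"}))
```

A hard filter like this returns an empty set as soon as one tracked value is wrong, which is one reason an agent may prefer to keep uncertainty over slot values, as in the example dialogue where the release year is only “I think”.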
51
E2E RL Task-Completion Bot
End-to-End Neural Dialogue System with a User Simulator (Li et al., 2016)
• Text Input: “Are there any action movies to see this weekend?”
• Language Understanding (LU): tags the words (e.g. $w_i$ = B-type, $w_{i+1}$ = O, $w_{i+2}$ = O) and predicts <intent> at EOS → Semantic Frame: request_movie (genre=action, date=this weekend)
• Dialogue Management (DM) → System Action/Policy: request_location
• Natural Language Generation (NLG): generates the response $w_0 w_1 w_2 \ldots$ EOS for the user
• User Simulator: User Goal + User Agenda Modeling → User Dialogue Action: Inform(location=Bellevue)
(The figure unrolls this loop over time steps t-2, t-1, and t.)
52
E2E RL Task-Completion Bot
• Simulated User
  – Generate interactions based on a predefined fake goal
  – Automatically learn strategy by training on the simulated data
User goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle.

RULE-BASED SYSTEM
User: Can I get 2 tickets for the witch?
Agent: Which theater do you want?
User: Regal meridian 16
Agent: What time would you like to see it?
User: 9:30 pm
Agent: What date would you like to see it?
User: Tomorrow!
Agent: How many people are you?
User: Just two adults.
Agent: What is your location?
User: Seattle!
Agent: What kind of video format do you like?
User: Oh, any kind is okay.
Agent: Do you have any price constraint?
User: No, I don’t care.
Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.
User: Thank you.
Agent: Thank you!

REINFORCEMENT LEARNING SYSTEM
User: Can I get 2 tickets for the witch?
Agent: What date would you like to see it?
User: Tomorrow
Agent: Which theater do you prefer?
User: Regal meridian 16
Agent: What is your location?
User: Seattle
Agent: What time would you like to see it?
User: 9:30 pm.
Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.
User: Thanks.
Agent: Thanks!
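A simulated user answers agent questions from a fixed goal, which is what lets a policy be trained without real users. This sketch only replays the two question sequences transcribed from the dialogues above against such a user; it does not learn a policy, and the slot names and the "any" answer for don't-care slots are assumptions.

```python
# Hypothetical encoding of the slide's user goal; None marks "don't care".
GOAL = {"movie": "the witch", "ticket": "2", "date": "tomorrow",
        "theater": "regal meridian 16", "city": "seattle", "time": "9:30 pm",
        "video_format": None, "price": None}
REQUIRED = {"movie", "ticket", "date", "theater", "city", "time"}


def simulated_user(requested_slot):
    """Answer an agent request from the fixed goal ("any" for don't-care slots)."""
    value = GOAL.get(requested_slot)
    return "any" if value is None else value


def run_dialogue(questions):
    """Replay a fixed list of agent questions; return (turns, collected slots)."""
    # The opening user turn already gives the movie and the ticket count.
    collected = {"movie": GOAL["movie"], "ticket": GOAL["ticket"]}
    for slot in questions:
        collected[slot] = simulated_user(slot)
    return len(questions), collected


# Question orders transcribed from the two example dialogues.
RULE_BASED = ["theater", "time", "date", "ticket", "city", "video_format", "price"]
RL_POLICY = ["date", "theater", "city", "time"]

for name, questions in [("rule-based", RULE_BASED), ("RL", RL_POLICY)]:
    turns, collected = run_dialogue(questions)
    print(name, turns, REQUIRED.issubset(collected))
```

Both sequences collect every required slot; the rule-based one spends extra turns re-asking the ticket count and asking about don't-care slots, which is the kind of redundancy a reward that penalizes dialogue length can train away.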
53
Outline: Conclusion
54
Conclusion
• Conversational bots can help users manage information access and finish tasks via spoken interactions
  – More natural
  – More convenient
  – More efficient
  – User-centered
• Future Vision
  – Not only single-turn requests but also multi-turn conversations
  – Not only simple transactions but also complicated ones
    • Dialogues can span multiple domains (e.g. check remaining data and then apply for more data)
• NN-Based Dialogue System
  – Pipeline outputs are represented as vectors (distributional)
  – Execution is constrained by backend services (symbolic)
55
Q & A
Thanks for your attention!