Email Processing and Question Answering System (EPQAS)
Date: 29 March 2018
Rafael E. Banchs Human Language Technology Unit
HOW IT STARTED
• The agency receives a high volume of emails that need to be dealt with on a daily basis, demanding a significant amount of resources and resulting in long response times
• Main objectives: to use state-of-the-art natural language processing technologies to
1. Reduce the volume of incoming emails by supporting advanced FAQ online support at the agency's website
2. Automatically redirect the incoming emails to the appropriate officer or group
3. Provide officers with pre-selected responses based on similar past email responses
Problem Statement and Objectives
[Diagram] USER sends email → Agency website: Automated FAQ system → Query resolved? YES: done; NO: Email classifier → Group 1 … Group N → Officer 1 … Officer N, each with proposed responses
Components: Frequently Asked Questions (FAQ) Service Engine; Email Classification and Response Recommendation Engine
Proposed Solution
WHAT WE FACED
Phonetics: system of sounds
Morphology: forms and words
Syntax: clauses and sentences
Semantics: conveying of meaning
Pragmatics: meaning in context
[Diagram labels: the lower levels relate to communication means and physiological capabilities; the higher levels to intentions, pursued goals, and abstract cognitive faculties]
Levels of Linguistic Phenomena
• The process of transforming a natural language statement into a semantic representation (frame):
• Subtask 1: Intent Detection
• Subtask 2: Entity Extraction
Example: "Ok, I will meet you in Starbucks at 6pm"
Intent Detection: Confirm_meeting
Entity Extraction: Action: Meet | Place: Starbucks | Time: 6:00pm
Natural Language Understanding
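The two NLU subtasks above can be sketched as follows. This is a minimal, illustrative rule-based frame filler (the `SemanticFrame` class, `understand` function, and regex patterns are invented for this sketch, not the system's actual implementation):

```python
# Minimal sketch of NLU as frame filling, assuming toy regex rules
# for one intent (Confirm_meeting) and a few entity types.
import re
from dataclasses import dataclass, field

@dataclass
class SemanticFrame:
    intent: str
    entities: dict = field(default_factory=dict)

def understand(utterance: str) -> SemanticFrame:
    """Map an utterance to a semantic frame with toy patterns."""
    frame = SemanticFrame(intent="Unknown")
    # Subtask 1: Intent Detection
    if re.search(r"\b(ok|sure).*\bmeet\b", utterance, re.I):
        frame.intent = "Confirm_meeting"
        frame.entities["Action"] = "Meet"
    # Subtask 2: Entity Extraction
    place = re.search(r"\bin (\w+)", utterance)
    if place:
        frame.entities["Place"] = place.group(1)
    time = re.search(r"\bat (\d+(?::\d+)?\s*[ap]m)", utterance, re.I)
    if time:
        frame.entities["Time"] = time.group(1)
    return frame

frame = understand("Ok, I will meet you in Starbucks at 6pm")
print(frame.intent, frame.entities)
# Confirm_meeting {'Action': 'Meet', 'Place': 'Starbucks', 'Time': '6pm'}
```

Real systems replace the regexes with statistical classifiers and sequence taggers; the frame structure stays the same.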
• Transactional natural language applications
Detected Intent: Confirm_meeting
Extracted Entities: Action: Meet | Place: Starbucks | Time: 6:00pm → Semantic Frame
Execute Transaction, for instance… Update_calendar
• Informational natural language applications
Detected Intent: Ask_for_location
Extracted Entities: Action: Go_to | Place: Starbucks | Location: ??? → Semantic Frame
Search for Information, for instance… Find_address_in_map
Transactional vs Informational
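The transactional/informational split amounts to dispatching on the detected intent. A hedged sketch (the `dispatch` function, intent names, and handler labels are illustrative, not part of the deployed system):

```python
# Sketch: route a semantic frame either to a transaction executor or
# to an information search, depending on its intent. The intent-to-
# handler tables are toy examples.
def dispatch(frame: dict) -> str:
    transactional = {"Confirm_meeting": "Update_calendar"}
    informational = {"Ask_for_location": "Find_address_in_map"}
    intent = frame["intent"]
    if intent in transactional:
        return f"execute:{transactional[intent]}"   # run a transaction
    if intent in informational:
        return f"search:{informational[intent]}"    # look up information
    return "fallback"

print(dispatch({"intent": "Confirm_meeting"}))   # execute:Update_calendar
print(dispatch({"intent": "Ask_for_location"}))  # search:Find_address_in_map
```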
• Q&A is typically an informational application
• There are two different approaches, depending on the type of information available:
• Question search: matching intents and entities over a database of available question-answer pairs (FAQs).
• Response selection: matching intents and entities over a collection of statements that might contain the answer.
Question Answering
NLU in Question Answering
• Intents have to be identified among different language constructions:
Do you take credit cards?
Can I make the payment with visa?
Detected Intent: Ask_about_CC_accepted
• Entities have to be identified among different references:
Extracted Entities: Action: Payment | Type: Credit_Card | Accept: ???
Problems of Discrete Representation
Consider the following sequences of words:
1. Do you take credit cards?
2. Can I make the payment with visa?
3. When can I make the payment for tourist visa application?
Sentences 1 and 2 are SEMANTICALLY RELATED, yet their WORD OVERLAP = 0%
Sentences 2 and 3 are NOT SEMANTICALLY RELATED, yet their WORD OVERLAP = 70%
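The mismatch between surface overlap and meaning is easy to verify. A small sketch using Jaccard overlap on word sets (one of several ways to score overlap; the exact percentages depend on tokenization, so the figures on the slide need not match this toy measure exactly):

```python
# Sketch of the discrete-representation problem: bag-of-words overlap
# disagrees with semantic relatedness on the slide's three sentences.
def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two sentences."""
    wa = set(a.lower().strip("?").split())
    wb = set(b.lower().strip("?").split())
    return len(wa & wb) / len(wa | wb)

s1 = "Do you take credit cards?"
s2 = "Can I make the payment with visa?"
s3 = "When can I make the payment for tourist visa application?"

print(word_overlap(s1, s2))  # related pair, yet zero overlap
print(word_overlap(s2, s3))  # unrelated pair, yet high overlap
```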
Properties of Continuous Spaces
The Distributional Hypothesis
"a word is characterized by the company it keeps" (Firth 1957)
Meaning is mainly determined by context, rather than by individual language units
• Continuous spaces represent semantic similarities by means of the geometric concept of proximity
• They offer much "better" smoothing capabilities
• They are not constrained to the Markovian assumption
Similarity in Continuous Space
Can I make the payment with visa?
Do you take credit cards?
When can I make the payment for tourist visa application?
Similarity in Continuous Space
Can I make the payment with visa?
SEMANTICALLY RELATED
(SHORT DISTANCE) Do you take credit cards?
When can I make the payment for tourist visa application?
Similarity in Continuous Space
Can I make the payment with visa?
SEMANTICALLY RELATED
(SHORT DISTANCE)
NOT SEMANTICALLY RELATED
(LONG DISTANCE)
Do you take credit cards?
When can I make the payment for tourist visa application?
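Proximity in a continuous space is typically measured with cosine similarity. A toy sketch with invented 2-D sentence embeddings, chosen only to illustrate short versus long distance (real embeddings have hundreds of dimensions and are learned, not hand-picked):

```python
# Sketch: cosine similarity as the proximity measure in a continuous
# space. The 2-D vectors below are invented for illustration.
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

credit_q  = (0.9, 0.1)   # "Do you take credit cards?"
payment_q = (0.8, 0.2)   # "Can I make the payment with visa?"
visa_q    = (0.1, 0.9)   # "...tourist visa application?"

print(cosine(credit_q, payment_q))  # high similarity: short distance
print(cosine(payment_q, visa_q))    # low similarity: long distance
```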
Building Continuous Space Models
1. Train a deep learning network on a supervised task
2. Use some of its internal layer representations as continuous space models for language
[Diagram: an input sentence (W1, W2, W3, …, Wn) feeds through increasing levels of abstraction to the supervised-task outputs (C1, C2, …, Ck); an internal layer is kept as the MODEL]
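The two-step recipe above can be sketched in a few lines of numpy. The weights here are random stand-ins for what training on the supervised task would produce; the point is only that the hidden layer, not the task output, is what gets reused as the continuous space model:

```python
# Sketch of the two steps: a tiny network stands in for a trained
# deep model; its hidden layer is reused as a sentence embedding.
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.normal(size=(5, 3))    # input bag-of-words -> hidden layer
W_out = rng.normal(size=(3, 2))   # hidden layer -> supervised classes

def hidden_representation(bow: np.ndarray) -> np.ndarray:
    """Step 2: keep only the internal layer as the embedding."""
    return np.tanh(bow @ W_in)

def classify(bow: np.ndarray) -> np.ndarray:
    """Step 1: the full supervised pipeline used during training."""
    return hidden_representation(bow) @ W_out

sentence = np.array([1.0, 0.0, 1.0, 0.0, 1.0])  # toy 5-word vocabulary
embedding = hidden_representation(sentence)
print(embedding.shape)  # a 3-dimensional continuous representation
```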
Semantic Maps of Words (Banchs 2012)
[Map of word embeddings: BIRD, GOAT, SKY, LIGHTNING, THUNDER, RAIN, FIELD, FLOCK, SHEEP, MOUNTAIN, SEA, CLOUD, WIND, FISH, RIVER, STORM. The map separates living things from non-living things, and further clusters words by region: water, land, and sky]
Regularities as Vector Offsets (Mikolov 2013)
[Diagram: King, Kings, Queen, Queens, connected by a gender offset and a singular/plural offset]
Kings – King ≈ Queens – Queen
Queens ≈ Kings – King + Queen
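The offset relation above can be checked numerically. A toy sketch with invented 2-D vectors constructed so that the gender and number offsets are consistent (real word embeddings only satisfy such analogies approximately):

```python
# Toy verification of the vector-offset regularity, with invented
# 2-D vectors chosen so the offsets hold exactly.
import numpy as np

king   = np.array([1.0, 1.0])
kings  = np.array([1.0, 2.0])   # king + plural offset (0, 1)
queen  = np.array([3.0, 1.0])   # king + gender offset (2, 0)

# Queens ≈ Kings – King + Queen
queens = kings - king + queen
print(queens)  # lands at queen + plural offset
```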
Regularities across Languages (Hermann 2014)
[Diagram: days of the week and months of the year form parallel clusters across German, French, and English embeddings]
THE SOLUTION
[Same architecture diagram as in the Proposed Solution: USER sends email → Automated FAQ system on the agency website → Query resolved? YES: done; NO: Email classifier → Groups and Officers with proposed responses]
The Frequently Asked Questions (FAQ) Service Engine addresses the QUESTION SEARCH PROBLEM; the Email Classification and Response Recommendation Engine addresses the RESPONSE SELECTION PROBLEM
Proposed Solution Revisited
Overall FAQ System Architecture
[Diagram] User query → Query type?
• Keyword-based query → keyword-based search over the FAQ database
• Natural language query → query processing → Continuous Space Model 1, Continuous Space Model 2, and in-domain bag-of-words model → re-ranker
An in/out-domain classifier and the FAQ database support the pipeline.
Control Engine:
• High confidence: single response
• Medium confidence: multiple responses
• Low confidence: suggest to send email
• Out-of-domain: institutional response
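The control engine's confidence policy can be sketched as a threshold cascade. The function name, thresholds, and return labels below are illustrative placeholders, not the deployed values:

```python
# Sketch of the control engine's four-way policy. Thresholds (0.8,
# 0.5) and labels are invented for illustration.
def control_engine(matches, in_domain: bool,
                   hi: float = 0.8, mid: float = 0.5):
    """matches: list of (answer_id, confidence), sorted best-first."""
    if not in_domain:
        return "institutional_response"          # out-of-domain query
    top = matches[0][1] if matches else 0.0
    if top >= hi:
        return ("single_response", matches[0][0])
    if top >= mid:
        return ("multiple_responses", [a for a, _ in matches[:3]])
    return "suggest_email"                       # low confidence

print(control_engine([("FAQ-12", 0.91)], in_domain=True))
# ('single_response', 'FAQ-12')
```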
Overall Email System Architecture
User Email Email
Classification
Segmentation
Re-ranker
Interactive Interface:
Original user email +
List of pre-selected
responses
Sentence-level
Classification
Historical
Email DB
In-domain
Resources
Query
Processing
Entity
Extraction
Pre-selection of Responses
System Update
Outcome and Output Indicators
1. Reduction of incoming email volume (10%-20% less)
• User finds more information in the website, and faster
• Lower average number of emails per day sent to agency
2. Reduction in human effort (20%-30% less)
• Less human effort to re-route and reply to emails
• Larger volume of emails processed per time unit
3. Reduction of email response time (20%-30% less)
• Faster internal processing of emails
• Lower average response time to the user
[Customer journey / value chain: USER → Website FAQs → Enquiry → Officer → Response → USER; indicators 1-3 mark successive stages, with response time measured end to end]
Thank you
www.a-star.edu.sg/I2R www.facebook.com/i2r.research
Rafael E. Banchs Human Language Technology Unit