Michel Galley and Lucy Vanderwende
Grounded Neural Conversational Models
Collaborators
Jiwei Li (Stanford)
Nasrin Mostafazadeh (U. Rochester)
Marjan Ghazvininejad (USC/ISI)
Alan Ritter (Ohio State U.)
Yi Luan (U. Washington)
Alessandro Sordoni (Microsoft)
Bill Dolan (Microsoft)
Jianfeng Gao (Microsoft)
Chris Quirk (Microsoft)
Chris Brockett (Microsoft)
Scott Yih (Microsoft)
Ming-Wei Chang (Microsoft)
Goal: Learning to converse
Teach machines to engage in conversations
• Seamless and natural
• Open domain
• Open ended and free form (chitchat, informational, …)
The weather is gorgeous today!
Yes it is so sunny! It should stay that way for the rest of the weekend.
I gotta get out of the house, any recommendations?
Try Mount Rainier. People say it’s beautiful in summer.
Fully Data-Driven Conversation
[Ritter et al., 2011; Sordoni et al., 2015; Vinyals and Le, 2015; Shang et al., 2015; etc.]
Source: conversation history
Target: response
Our best model trained with ~140 million conversations
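The source/target framing above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing: the function name `make_training_pairs`, the `max_history` window, and the `<eos>` separator are all assumptions introduced here.

```python
from typing import List, Tuple


def make_training_pairs(turns: List[str], max_history: int = 2) -> List[Tuple[str, str]]:
    """Turn one conversation into (source, target) pairs.

    source: the preceding turns (conversation history), joined by a separator
    target: the next turn (the response the model should generate)
    """
    pairs = []
    for i in range(1, len(turns)):
        # Keep at most `max_history` previous turns as context (assumed window size).
        history = turns[max(0, i - max_history):i]
        pairs.append((" <eos> ".join(history), turns[i]))
    return pairs


# Example turns taken from the "Learning to converse" slide.
conversation = [
    "The weather is gorgeous today!",
    "Yes it is so sunny! It should stay that way for the rest of the weekend.",
    "I gotta get out of the house, any recommendations?",
    "Try Mount Rainier. People say it's beautiful in summer.",
]

pairs = make_training_pairs(conversation)
for source, target in pairs:
    print(repr(source), "->", repr(target))
```

Each pair would then be fed to a sequence-to-sequence model, with the source as encoder input and the target as decoder output.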
Dialog Systems: Two paradigms
Standard (Grounded): input x → Understanding (NLU) → State tracker → Dialog policy → Generation (NLG) → output y, connected to an Environment (e.g., calendar)
Fully data-driven (NOT grounded): input x → output y
A Knowledge-Grounded Neural Conversation Model [Ghazvininejad et al., 2017]
CONVERSATION HISTORY: “Going to Kusakabe tonight” → DIALOG ENCODER
WORLD “FACTS” → CONTEXTUALLY-RELEVANT “FACTS” → FACTS ENCODER
Contextually-relevant facts: “Consistently the best omakase”; “Amazing sushi tasting […]”; “They were out of kaisui […]”
DIALOG ENCODER + FACTS ENCODER → DECODER
RESPONSE: “Try omakase, the best in town”
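The step from world "facts" to contextually-relevant "facts" can be sketched as a simple lookup. This is only an illustration under stated assumptions: selecting facts by matching an entity name in the conversation history is assumed here, and the `WORLD_FACTS` store, the `relevant_facts` helper, and the "mount rainier" entry are hypothetical. The three Kusakabe snippets are the ones shown on the slide.

```python
import re
from typing import Dict, List

# Hypothetical store of world facts keyed by lowercase entity name.
WORLD_FACTS: Dict[str, List[str]] = {
    "kusakabe": [
        "Consistently the best omakase",
        "Amazing sushi tasting […]",
        "They were out of kaisui […]",
    ],
    # Made-up entry, included only so the lookup has something to skip.
    "mount rainier": ["Beautiful in summer"],
}


def relevant_facts(history: str, facts: Dict[str, List[str]]) -> List[str]:
    """Return the facts whose entity name appears as a whole word in the history."""
    text = history.lower()
    selected: List[str] = []
    for entity, snippets in facts.items():
        if re.search(r"\b" + re.escape(entity) + r"\b", text):
            selected.extend(snippets)
    return selected


facts = relevant_facts("Going to Kusakabe tonight", WORLD_FACTS)
print(facts)
```

In the diagram, the selected snippets would go to the facts encoder while the conversation history goes to the dialog encoder; the decoder then conditions on both.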
“Infusing” non-conversational knowledge into conversations
You know any good Japanese restaurant in Seattle?
Try Kisaku, one of the best sushi restaurants in the city.
You know any good A restaurant in B?
Try C, one of the best D in the city.
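The abstract pattern with slots A to D can be made concrete with a small template sketch. This is purely illustrative: a neural model would learn such a pattern implicitly from data rather than through explicit templates, and the `slots` dictionary is an assumption introduced here.

```python
from string import Template

# The abstract pattern from the slide, with slots A-D written as placeholders.
question = Template("You know any good $A restaurant in $B?")
response = Template("Try $C, one of the best $D in the city.")

# Slot values that reproduce the concrete example on the slide.
slots = {"A": "Japanese", "B": "Seattle", "C": "Kisaku", "D": "sushi restaurants"}

q = question.substitute(slots)
r = response.substitute(slots)
print(q)
print(r)
```

The conversational pattern stays fixed while the slot fillers come from non-conversational knowledge, which is the sense in which knowledge is "infused" into the response.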
Sample knowledge-grounded responses
Results w/ 23M conversations: outperforms competitive neural baseline (including on human eval)
Data-driven conversation: toward more informational and “useful” dialogs
Spectrum: chitchat → informational, task-completion
Standard dialog systems (grounded)
Fully data-driven (previously ungrounded) [Ritter et al., 2011; Sordoni et al., 2015; Vinyals and Le, 2015; Shang et al., 2015; Li et al., 2016; …]
Fully data-driven, GROUNDED! [Ghazvininejad et al., 2017]
Grounded and Fully Data-Driven Models
Personalization data (ID, social graph, ...)
Device sensors (GPS, vision, ...)
[Li et al., 2016]
[Ghazvininejad et al., 2017]
[Luan et al., 2017]
[Mostafazadeh et al., 2017]
Conversation
Question Generation
• Generating Natural Questions About an Image, ACL 2016
• Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende
Image Grounded Conversation
Did he end up winning the race?
Yes he won, he can’t believe it.
Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation
Nasrin Mostafazadeh, Chris Brockett, Bill Dolan, Michel Galley, Jianfeng Gao, Georgios P. Spithourakis, Lucy Vanderwende – arXiv
Image Grounded Conversation Datasets
• Twitter Dataset
  • 250K conversations (3 turns)
  • Photo + tweet; question; response
• IGC crowd dataset
  • 4,222 conversations (avg 4 turns)
  • 5 additional questions and first responses per conversation for evaluation
  • Sourced using CrowdChip
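One way to represent a single three-turn example (photo + tweet; question; response) is a small record type. This is a sketch under assumptions: the class name `IGCExample`, its field names, the file path, and the context sentence are all hypothetical. Only the question and response strings come from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IGCExample:
    image_path: str  # hypothetical field: location of the shared photo
    context: str     # textual context, e.g. the tweet posted with the photo
    question: str    # question asked about the image and context
    response: str    # reply to the question

    def turns(self) -> List[str]:
        """Return the three textual turns in conversational order."""
        return [self.context, self.question, self.response]


example = IGCExample(
    image_path="photos/race.jpg",              # made-up path
    context="My son just finished the race.",  # made-up context for illustration
    question="Did he end up winning the race?",
    response="Yes he won, he can't believe it.",
)

print(example.turns())
```

Keeping the image reference next to the textual turns lets a model condition question and response generation on both modalities.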
Key characteristics of IGC dialogue
• Natural-sounding conversations about a shared image
• Conversation topics are the events and actions that are evoked by the objects in the image
• Both image and textual context are informative when generating the question
FrameNet analysis of dialogue
• 32% of questions are linked to the image frame
• 47% of questions are linked to the textual context frame
• In 14% of cases, the image frame and the textual context frame are the same
Key characteristics of dialogue
• Natural-sounding conversations about a shared image
• Conversation topics are the events and actions that are evoked by the objects in the image
• Both image and textual context are informative when generating the question
• Complex temporal and causal relations are observed across multiple turns in the conversation, as one would expect from natural conversation
Temporal and causal relations across turns
• Of 20 conversations analyzed, multiple types of relations: 15 cause, 11 enable, 9 overlaps, 8 before and 3 prevent
• 2/3 of conversations mention an abstract event entity, e.g. race or remodel
Sample output
Thank you
Joint work with: Chris Brockett, Ming-Wei Chang, Bill Dolan, Jianfeng Gao, Marjan Ghazvininejad, Jiwei Li, Yi Luan, Nasrin Mostafazadeh, Chris Quirk, Alan Ritter, Alessandro Sordoni, Scott Yih