Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments
![Page 1: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/1.jpg)
Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments
Matthew Purver, Patrick Ehlen, John Niekrasz
Computational Semantics Laboratory
Center for the Study of Language and Information
Stanford University
![Page 2: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/2.jpg)
The CALO Project
Multi-institution, multi-disciplinary project
Working towards an intelligent personal assistant that learns
Three major areas
– managing personal data: clustering email, documents, managing contacts
– assisting with task execution: learning to carry out computer-based tasks
– observing interaction in meetings
![Page 3: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/3.jpg)
The CALO Meeting Assistant
Observe human-human meetings
– Audio recording & speech recognition (ICSI/CMU)
– Video recording & processing (MIT/CMU)
– Written notes, via digital ink (NIS) or typed (CMU)
– Whiteboard sketch recognition (NIS)
Produce a useful record of the interaction
– answer questions about what happened
– can be used by attendees or non-attendees
Learn to do this better over time (LITW)
![Page 4: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/4.jpg)
The CALO Meeting Assistant
Primary focus on end-user
Develop something that genuinely helps people cope with the many meetings they have to attend
![Page 5: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/5.jpg)
What do people want to know from meetings?
Banerjee et al. (2005) survey of 12 academics:
– Missed meeting: what do you want to know?
– Topics: which were discussed, what was said?
– Decisions: what decisions were made?
– Action items/tasks: was I assigned something?
Lisowska et al. (2004) survey of 28 people:
– What would you ask a meeting reporter system?
– Similar responses about topics, decisions
– Who attended, who asked/decided what?
– Did they talk about me?
![Page 8: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/8.jpg)
Purpose
Helpful system: not only records and transcribes a meeting, but extracts (from streams of potentially messy human-human speech):
– topics discussed
– decisions made
– tasks assigned (“action items”)
The system should highlight this information over meeting “noise”
![Page 9: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/9.jpg)
Example
Impromptu meeting you might have after your team has boarded a rebel spacecraft in search of stolen plans, and you’re trying to figure out what to do next
![Page 10: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/10.jpg)
Commander, tear this ship apart until you’ve found those plans!
Action item: a section of discourse in a meeting where someone is made responsible for taking care of something
![Page 14: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/14.jpg)
Action Items
Concrete decisions; public commitments to be responsible for a particular task
Want to know:
– Can we find them?
– Can we produce useful descriptions of them?
Not aware of previous discourse-based work
![Page 15: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/15.jpg)
Action Item Detection in Email
Corston-Oliver et al., 2004
Marked a corpus of email with “dialogue acts”
Task act:
– “items appropriate to add to an ongoing to-do list”
Good inter-annotator agreement (kappa > 0.8)
Per-sentence classification using SVMs
– lexical features e.g. n-grams; punctuation; message features
– f-scores around 0.6
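A minimal sketch of the kind of lexical n-gram features mentioned above, assuming a simple bag-of-n-grams over whitespace tokens. The function name `ngram_features`, the `max_n` parameter, and the example sentence are illustrative assumptions; the punctuation and message features are not shown, and this is not the feature extractor used by Corston-Oliver et al.

```python
from collections import Counter


def ngram_features(tokens, max_n=3):
    """Count every 1..max_n-gram in a token list (hypothetical helper)."""
    feats = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n across the token sequence.
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


# Invented example sentence, for illustration only.
sentence = "please send me the report by friday".split()
features = ngram_features(sentence, max_n=2)
```

Each sentence would then be represented by its count vector and passed to a classifier such as an SVM.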
![Page 16: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/16.jpg)
A First Try: Flat Annotation
Gruenstein et al. (2005) analyzed 65 meetings annotated from:
– ICSI Meeting Corpus (Janin et al., 2003)
– ISL Meeting Corpus (Burger et al., 2002)
Two human annotators
“Mark utterances relating to action items”
– create groups of utterances for each action item (AI)
– made no distinction between utterance type/role
![Page 17: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/17.jpg)
A First Try: Flat Annotation (cont’d)
Annotators identified 921 / 1267 (respectively) action item-related utterances
Human agreement poor (kappa < 0.4)
Tried binary classification using SVMs (like Corston-Oliver)
Precision, recall, f-score: all below .25
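For readers unfamiliar with the agreement figure, here is a sketch of how a kappa score can be computed for two annotators. It assumes Cohen's kappa over per-utterance binary labels; the slides do not say which kappa variant was used, and the label sequences below are invented.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences (illustrative)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items labelled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Invented labels: 1 = utterance marked as action item-related.
annotator_a = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
annotator_b = [1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohen_kappa(annotator_a, annotator_b)
```

In this toy case the annotators agree on 8 of 10 utterances, yet kappa is only about 0.52, because much of that agreement is expected by chance when most utterances are unmarked.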
![Page 18: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/18.jpg)
Try a more restricted dataset?
Sequence of 5 (related) CALO meetings
– similar amount of ICSI/ISL data for training
Same annotation schema
SVMs with words & n-grams as features
– Also tried other discriminative classifiers, and 2- & 3-grams, w/ no improvements
Similar performance
– Improved f-scores (0.30 - 0.38), but still poor
– Recall up to 0.67, precision still low (< 0.36)
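The high-recall, low-precision pattern reported above can be illustrated with a small scoring sketch. It assumes per-utterance binary labels and the balanced F-measure (F1); the slides do not state which F-measure was used, and the labels below are invented.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall and F1 for the positive class (illustrative)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented labels: the classifier over-generates positive predictions.
gold = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 1, 1, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

Here two of three true action item utterances are found (recall about 0.67), but only two of six predictions are correct (precision about 0.33), giving an F1 of about 0.44.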
![Page 19: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/19.jpg)
Should we be surprised?
Our human annotator agreement poor
DAMSL schema has dialogue acts Commit, Action-directive
– annotator agreement poor (kappa ~ 0.15)
– (Core & Allen, 1997)
ICSI MRDA dialogue act commit
– Most DA tagging work concentrates on 5 broad DA classes
Perhaps “action items” comprise a more heterogeneous set of utterances
![Page 20: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/20.jpg)
Rethinking Action Item Acts
Maybe action items are not aptly described as singular “dialogue acts”
Rather: multiple people making multiple contributions of several types
Action item-related utterances represent a form of group action, or social action
That social action has several components, giving rise to a heterogeneous set of utterances
What are those components?
![Page 21: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/21.jpg)
Commander, tear this ship apart until you’ve found those plans!
• A person commits or is committed to “own” the action item
• A description of the task itself is given
• A timeframe is specified
![Page 24: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/24.jpg)
Yes, Lord Vader!
• A person commits or is committed to “own” the action item
• A description of the task itself is given
• A timeframe is specified
• Some form of agreement
![Page 25: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/25.jpg)
Exploiting discourse structure
Action items have distinctive properties
– Task description, owner, timeframe, agreement
Action item utterances can simultaneously play different roles
– assigning properties
– agreeing/committing
These classes may be more homogeneous & distinct than looking for just “action item” utts.
– Could improve classification performance
![Page 26: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/26.jpg)
New annotation schema
Annotated and classified again using the new schema
Classify utterances by their role in the action item discourse
– can play more than one role
Define action items by grouping subclass utterances together in an action-item discussion
– a subclass can be missing
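One way to picture the schema is as a small data structure: each utterance carries a set of roles, and an action item is a group of utterances in which some roles may be absent. The class and field names below (`Role`, `Utterance`, `ActionItem`, `missing_roles`) and the speaker labels are illustrative assumptions, not the annotation tool's actual format; the role assignments follow the spacecraft example from the earlier slides.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    DESCRIPTION = "description"
    OWNER = "owner"
    TIMEFRAME = "timeframe"
    AGREEMENT = "agreement"


@dataclass
class Utterance:
    speaker: str
    text: str
    roles: set = field(default_factory=set)  # an utterance may play several roles


@dataclass
class ActionItem:
    utterances: list = field(default_factory=list)

    def roles_present(self):
        """Union of the roles played by the grouped utterances."""
        found = set()
        for utt in self.utterances:
            found |= utt.roles
        return found

    def missing_roles(self):
        """Subclasses with no utterance in this action item."""
        return set(Role) - self.roles_present()


order = Utterance(
    "Vader",
    "Commander, tear this ship apart until you've found those plans!",
    {Role.OWNER, Role.DESCRIPTION, Role.TIMEFRAME},
)
reply = Utterance("Commander", "Yes, Lord Vader!", {Role.AGREEMENT})
item = ActionItem([order, reply])
partial = ActionItem([order])
```

`item.missing_roles()` is empty, while `partial.missing_roles()` contains only `Role.AGREEMENT`, which mirrors the point that a subclass can be missing from an action item.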
![Page 27: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/27.jpg)
Action Item discourse: an example
![Page 28: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/28.jpg)
New Experiment
Annotated same set of CALO/ICSI/ISL data using the new schema
Trained classifiers to identify utterances that belong to each of the 4 subclasses
![Page 29: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/29.jpg)
Encouraging signs
Between-class distinction (cosine distances)
– Agreement vs. any other is good: 0.05 to 0.12
– Timeframe vs. description is OK: 0.25
– Owner/timeframe/description: 0.36 to 0.47
Improved inter-annotator agreement?
– Timeframe: kappa = 0.86
– Owner 0.77, agreement & description 0.73
– Warning: this is only on one meeting, although it’s the most difficult one we could find
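A rough sketch of a between-class cosine comparison, assuming each subclass is represented by a bag-of-words count vector pooled over its utterances. The slides do not say how the vectors were built or which convention their figures follow; the sketch uses the common definition of distance as 1 minus cosine similarity, and the word lists are invented.

```python
import math
from collections import Counter


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two word-count vectors (illustrative)."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare an empty vector")
    return dot / (norm_a * norm_b)


def cosine_distance(counts_a, counts_b):
    """One common convention: distance = 1 - similarity."""
    return 1.0 - cosine_similarity(counts_a, counts_b)


# Invented pooled vocabularies for three subclasses.
agreement = Counter("okay yes sure okay".split())
timeframe = Counter("by the end of next week".split())
description = Counter("send the details of the printer".split())

agree_vs_time = cosine_distance(agreement, timeframe)
time_vs_desc = cosine_distance(timeframe, description)
```

In this toy data the agreement vocabulary shares no words with the timeframe vocabulary, so the two classes are maximally separated, while timeframe and description overlap on function words such as "the" and "of".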
![Page 30: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/30.jpg)
Combined classification
Still don’t have enough data for proper combined classification
– Recall 0.3 to 0.5, precision 0.1 to 0.5
– Agreement subclass is best, with f-score = 0.40
Overall decision based on sub-classifier outputs
Ad-hoc heuristic:
– prior context window of 5 utterances
– agreement plus one other class
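The heuristic above might be sketched as follows. This is one possible reading, not the authors' implementation: an action item is hypothesized at an utterance tagged as agreement if at least one other subclass was detected in that utterance or in the 5 utterances before it. The function name, the string labels, and whether the current utterance counts as context are all assumptions.

```python
def detect_action_items(utterance_roles, window=5):
    """Return indices of agreement utterances supported by another subclass.

    utterance_roles: one set of predicted role labels per utterance.
    """
    hits = []
    for i, roles in enumerate(utterance_roles):
        if "agreement" not in roles:
            continue
        # Current utterance plus up to `window` preceding utterances.
        context = utterance_roles[max(0, i - window):i + 1]
        other_roles = set().union(*context) - {"agreement"}
        if other_roles:
            hits.append(i)
    return hits


# Invented sub-classifier outputs for a 12-utterance stretch of a meeting.
predicted_roles = [
    set(), {"owner", "description"}, set(), {"timeframe"}, {"agreement"},
    set(), set(), set(), set(), set(), set(), {"agreement"},
]
detected = detect_action_items(predicted_roles)
```

Only the agreement at index 4 is accepted; the one at index 11 has no owner, description, or timeframe in its window and is treated as an unrelated "okay".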
![Page 31: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/31.jpg)
Questions we can ask
Does overall classification look useful?
– Whole-AI-based f-score 0.40 to 1.0 (one meeting perfectly correlated with human annotation)
Does overall output improve sub-classifiers?
– Agreement: f-score 0.40 → 0.43
– Timescale: f-score 0.26 → 0.07
– Owner: f-score 0.12 → 0.24
– Description: f-score 0.33 → 0.24
![Page 32: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/32.jpg)
Example output
From a CALO meeting:
t = [the, start, of, week, three, just, to]
o = [reconfirm, everything, and, at, that, time, jack, i'd, like, you, to, come, back, to, me, with, the]
d = [the, details, on, the, printer, and, server]
a = [okay]
Another (less nice?) example:
o = [/h#/, so, jack, /uh/, for, i'd, like, you, to]
d = [have, one, more, meeting, on, /um/, /h#/, /uh/]
t = [in, in, a, couple, days, about, /uh/]
a = [/ls/, okay]
![Page 33: Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments](https://reader030.fdocuments.us/reader030/viewer/2022013004/56814eef550346895dbc7e8e/html5/thumbnails/33.jpg)
Where next for action items?
More data annotation
– Using NOMOS, our annotation tool
Meeting browser to get user feedback
Improved individual classifiers
Improved combined classifier
– maximum entropy model
– not enough data yet
Moving from words to symbolic output
– Gemini (Dowding et al., 1990) bottom-up parser