www.amiproject.org

Collaborative Annotation of the AMI Meeting Corpus

Jean Carletta

University of Edinburgh

Carletta, 20 June 2007

AMI Partners

NXT Major Development Sites

AMI's aim

• aim: to develop technologies for browsing meetings and to assist people during meetings

• interdisciplinary: signal processing, language engineering, theoretical linguistics, human-computer interfaces, organizational psychology, ...

Why annotation?

• For basic scientific understanding, e.g.,
  • How do people choose a next speaker?
  • What is the relationship between speech and gesture during deixis?

• For machine learning
  • Hand-code e.g. statement vs. question
  • Identify features for each, like word sequences and prosody
  • Use the data to fit a statistical classifier that codes new data automatically
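
The machine-learning bullets above describe a three-step workflow: hand-code, extract features, fit a classifier. The sketch below illustrates it with a small naive Bayes classifier. Naive Bayes is only a stand-in, since the slides do not say which classifier AMI used. The training utterances, the `features` function and the `NaiveBayes` class are all hypothetical, and only word features are used (no prosody).

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-coded utterances (not taken from the AMI corpus).
TRAINING = [
    ("what do you think", "question"),
    ("is that the remote", "question"),
    ("do we have a budget", "question"),
    ("i think it is fine", "statement"),
    ("we have a budget", "statement"),
    ("that is the remote", "statement"),
]


def features(utterance):
    """Word features plus a marker for the utterance-initial word."""
    words = utterance.lower().split()
    return ["first=" + words[0]] + words


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for f in features(text):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each feature.
            score = math.log(n / total)
            for f in features(text):
                score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


classifier = NaiveBayes()
classifier.fit(TRAINING)
```

Calling `classifier.predict("do you have the remote")` then codes an unseen utterance automatically. A real system would use far more data and richer features.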

AMI Meeting Rooms

4 close- and 2 wide-view cameras, 4 head-set and 8 array microphones, presentation screen capture, whiteboard capture, pen devices, plus extra site-dependent devices

TNO Edinburgh IDIAP

IS1004d, 3:07 - 4:11

Corpus Overview

• 100 hrs of well-recorded meetings

• orthographically transcribed with word timings by forced alignment

• ASR output

• heavily annotated by hand for communicative behaviours

• Creative Commons Share-Alike licensing, with demo DVD

Hand Annotations

• transcription with word-level timings from forced alignment (100%)

• timestamping against signal (10-30%)
  • head gestures; hand gestures for addressing and interactions with objects; location in room; gaze; emotion?

• discourse structure (70%)
  • dialogue acts (some w/ addressing), named entities, topic segments, linked extractive and abstractive summaries

Costs in person-hrs/hr

transcription                                30
topic segments + abstractive summaries       6-10
dialogue acts w/ some relations              20
addressing                                   12
extractive summaries linked to abstract      1
named entities                               2-5
hand gestures (rough timings)                6
head gestures (rough timings)                6
head gestures (precision timings)            20
movement around room                         4
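
To give a sense of scale, the rates above can be multiplied out, reading "person-hrs/hr" as person-hours of annotation per hour of recorded meeting. The sketch below is illustrative arithmetic only. The rates come from the table, and the 100 hours of fully transcribed meetings come from the earlier slides. The 70 hours used for named entities is an assumption: it applies the 70% discourse-structure coverage figure to that single layer, which the slides do not state. The `effort` helper is hypothetical.

```python
# Rates copied from the slide, as (low, high) person-hours per meeting hour.
RATES = {
    "transcription": (30, 30),
    "topic segments + abstractive summaries": (6, 10),
    "dialogue acts w/ some relations": (20, 20),
    "addressing": (12, 12),
    "extractive summaries linked to abstract": (1, 1),
    "named entities": (2, 5),
    "hand gestures (rough timings)": (6, 6),
    "head gestures (rough timings)": (6, 6),
    "head gestures (precision timings)": (20, 20),
    "movement around room": (4, 4),
}


def effort(task, meeting_hours):
    """Return the (low, high) person-hours to annotate `meeting_hours` of data."""
    low, high = RATES[task]
    return low * meeting_hours, high * meeting_hours


# 100 hours of meetings, all transcribed (figures from the slides).
transcription_effort = effort("transcription", 100)

# Assumption: named entities cover 70 of the 100 hours.
named_entity_effort = effort("named entities", 70)
```

On these assumptions transcription alone comes to 3000 person-hours, and named entities to between 140 and 350.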

Core Problems

• How do we represent all of these kinds of annotation on the same base data, including both structural relationships and timing?

• How do we allow for multiple (human and machine) annotations of the same property, so that we can compare them?

NITE XML Toolkit

• Mature toolkit for handling annotations with temporal ordering and full structural relations

• Data storage format designed to support distributed corpus development

• Libraries for data handling, query, and writing graphical user interfaces

• End user annotation tools for common tasks

• Command line utilities for analysis, feature extraction

• Open source

NXT corpus design

• data model is a multi-rooted tree with arbitrary graph structure over the top
  • each node has one set of children, multiple parents

• annotations often naturally map to a tree
  • corpus design to decide where trees intersect

• NXT can represent arbitrary graphs but the more the data has this character, the less useful the query language is
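
The multi-rooted tree idea can be pictured with a small sketch in which two annotation trees intersect at the same word nodes. This is not NXT's own API (NXT is a Java toolkit). The `Node` class, the `roots` helper and the element names are hypothetical, chosen only to show one ordered set of children per node and multiple parents.

```python
class Node:
    """An annotation element with one ordered child list and many parents."""

    def __init__(self, label):
        self.label = label
        self.children = []
        self.parents = []

    def add_child(self, child):
        self.children.append(child)
        child.parents.append(self)


def roots(node):
    """Collect the labels of every root reachable upwards from `node`."""
    if not node.parents:
        return {node.label}
    found = set()
    for parent in node.parents:
        found |= roots(parent)
    return found


# Two words shared by two otherwise independent annotation trees.
w1, w2 = Node("word:time"), Node("word:line")

emphasis = Node("speechquality")   # one tree over the words
dialogue_act = Node("dialogue-act")  # a second tree over the same words
for parent in (emphasis, dialogue_act):
    parent.add_child(w1)
    parent.add_child(w2)

word_roots = roots(w1)
```

Here `word_roots` contains both `"speechquality"` and `"dialogue-act"`: the word layer is where the two trees intersect.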

Stand-off XML

extract from Bdb001.A.words.xml

<w nite:id="Bdb001.w.1,342" starttime="356.39" endtime="" c="W">time</w>
<w nite:id="Bdb001.w.1,343" starttime="" endtime="" c="HYPH">-</w>
<w nite:id="Bdb001.w.1,344" starttime="" endtime="356.59" c="W">line</w>

extract from Bdb001.A.speech-quality.xml

<speechquality nite:id="Bdb001.emphasis.16" type="emphasis">
  <nite:child href="Bdb001.A.words.xml#id(Bdb001.w.1,342)..id(Bdb001.w.1,344)" />
</speechquality>
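
The sketch below shows how the `nite:child` pointer in the extract could be resolved by hand, using only Python's standard library rather than NXT itself. Several things are assumptions made so the fragment parses on its own: the `<root>` wrapper element, the namespace URI bound to the `nite` prefix, and the helper names `parse_href` and `resolve`.

```python
import re
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "nite" prefix; the slide does not show it.
NITE_NS = "http://nite.sourceforge.net/"

# The three words from the slide, wrapped in a hypothetical <root> element.
WORDS_XML = f"""<root xmlns:nite="{NITE_NS}">
<w nite:id="Bdb001.w.1,342" starttime="356.39" endtime="" c="W">time</w>
<w nite:id="Bdb001.w.1,343" starttime="" endtime="" c="HYPH">-</w>
<w nite:id="Bdb001.w.1,344" starttime="" endtime="356.59" c="W">line</w>
</root>"""

HREF = "Bdb001.A.words.xml#id(Bdb001.w.1,342)..id(Bdb001.w.1,344)"


def parse_href(href):
    """Split a pointer into (filename, first id, last id)."""
    filename, _, pointer = href.partition("#")
    ids = re.findall(r"id\(([^)]*)\)", pointer)
    return filename, ids[0], ids[-1]


def resolve(href, words_root):
    """Return the <w> elements from the first id to the last id, inclusive."""
    _, start_id, end_id = parse_href(href)
    id_attr = f"{{{NITE_NS}}}id"
    selected, inside = [], False
    for w in words_root.iter("w"):
        if w.get(id_attr) == start_id:
            inside = True
        if inside:
            selected.append(w)
        if w.get(id_attr) == end_id:
            break
    return selected


words = resolve(HREF, ET.fromstring(WORDS_XML))
texts = [w.text for w in words]
# The parent's time span runs from the first child's start to the last child's end.
span = (float(words[0].get("starttime")), float(words[-1].get("endtime")))
```

The emphasis annotation stores no text or timing of its own. Both are recovered from the words it points to, which is what lets separate annotation files be developed independently over the same base data.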

Metadata file

Like a set of DTDs for the XML files, plus:

• connections between the files

• list of "observations" (coded dialogues/group discussions/texts)

• catalog for finding signals and data on disk

Simple example query

($w word)($r reference): ($w@POS = "NN") && ($r ^ $w)

Return list of 2-tuples of words and referring expressions where the word’s part of speech is NN and the word is in the referring expression.
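
The meaning of this query can be spelled out as a plain Python sketch. It is not how NXT evaluates queries. The `Element` class, the `dominates` and `run_query` helpers and the three-word example are hypothetical, and `^` is read simply as "the word lies somewhere below the referring expression".

```python
from itertools import product


class Element:
    """A toy annotation node: a type name, attributes, children and text."""

    def __init__(self, name, attrs=None, children=(), text=""):
        self.name = name
        self.attrs = attrs or {}
        self.children = list(children)
        self.text = text


def dominates(ancestor, node):
    """True if `node` lies anywhere below `ancestor` (the `^` test)."""
    return any(
        child is node or dominates(child, node) for child in ancestor.children
    )


def run_query(elements):
    """Mimic ($w word)($r reference): ($w@POS = "NN") && ($r ^ $w)."""
    words = [e for e in elements if e.name == "word"]
    refs = [e for e in elements if e.name == "reference"]
    return [
        (w, r)
        for w, r in product(words, refs)
        if w.attrs.get("POS") == "NN" and dominates(r, w)
    ]


# Hypothetical data: "the remote works", with "the remote" as a reference.
the = Element("word", {"POS": "DT"}, text="the")
remote = Element("word", {"POS": "NN"}, text="remote")
works = Element("word", {"POS": "VBZ"}, text="works")
ref = Element("reference", children=[the, remote])

matches = run_query([the, remote, works, ref])
```

Each variable declaration ranges over all elements of its type, so the result is drawn from the cross product of words and referring expressions, filtered by the two conditions. Here only the pair (`remote`, `ref`) survives.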

General features of the language

• Match variable by no type, single type, or disjunctive type

• Attribute and content tests for existence, ordering, equality, match to regexp

• The usual boolean combinators

• Quantifiers forall and exists

• Filtering by passing results to another query to create a result tree (not list)

Uses for queries

• Exploring the data in a browser

• Basic frequency counts

• Verifying data quality

• Indexing complexes for further use

• Finding things for screen rendering in GUI

Only configuration needed to:

• search/index data in NXT format

• display data in a standardized (ugly) way

• set up annotation tools for some common tasks
  • dialogue act
  • named entity
  • time-stamped labelling

• [named entity demo]

Programming tailored interfaces

• development time is 1.5 days - 2 weeks depending on
  • how clear the spec is
  • complexity of the interface and whether our "transcription view" middleware fits
  • familiarity with Swing

Named entity coder

Summary

• NXT provides infrastructure for collaborative annotation that
  • Is distributed
  • Provides structural relationships
  • Provides timing w.r.t. signals
  • Works for large-scale projects

• NXT’s best current demonstration is in the AMI Meeting Corpus