The Corpus of Interactional Data: a Large Multimodal Annotated Resource
Philippe Blache
Laboratoire Parole et Langage
Brain and Language Research Institute
CNRS & Aix-Marseille Université
LDC 2013 The CID corpus 1 / 60
Outline
Multimodal annotation: general overview
The formal background
Annotation of the different domains in the CID
Part I
Multimodal Annotation: General Overview
Multimodality
Goals
Description of modalities and their interaction
Analysis of natural communication
Different sources of information
Different modalities: verbal, non-verbal, context, etc.
Different domains: phonetics, prosody, syntax, pragmatics, etc.
Issues
Representation, encoding
Diversity of annotation tools and formats
Alignment vs. synchronization
Data manipulation, querying
Method
Rich annotation for each domain
Homogeneous framework
Multimodal Corpora: a survey
Switchboard in NXT (NITE XML Toolkit)
642 conversations; 830,000 words.
Syntax, turns, disfluency, information status, coreference, phonemes, syllables, prosodic phrases, breaks, accents
LUNA (Spoken Language Understanding in Multilingual Communication Systems)
8100 human-machine dialogues and 1000 human-human dialogues in Polish, Italian and French.
Turns, POS, chunks, dialogue acts, reference
SAMMIE (Saarbrücken Multimodal MP3 Player Interaction Experiment)
Multimodal dialogue system, human-machine multimodal interaction (Wizard of Oz)
Transcription, turns, clauses, discourse entities, dialogue acts
Multimodal Corpora: a survey
AMI (Augmented Multi-party Interaction)
100h meeting, full manual transcription
Dialogue acts, focus of attention, movement (hand, head, leg), named entities, topic segmentation
The ITC Corpus
11 groups of 4 people (25 minutes each). Task: decision making scenario
No transcription, functional role, socio emotional, speech activity, body activity
The ATR Corpus
10 meetings, 1 hour each
No transcription, speech activity, body movements, activity type
Multimodal Corpora
Part II
The Corpus of Interactional Data: a Large-Scale Experiment
CID: main features
8 dialogs, 1 hour each (4 male/male; 4 female/female)
Task:
- “Tell something unusual which happened to you”
- “Tell about professional conflicts you may have met”
Setting
Anechoic room
1 camcorder / 2 microphones
Annotations (aligned on the signal)
Phonetic and orthographic transcription
Prosody (units, intonation, contours)
Morphosyntax, syntax
Discourse (markers, turns, etc.)
Gestures
Example
The Annotation Workflow
The Annotation Architecture
The Annotation Architecture
Main steps and contributions
1 Primary Data Preparation
Transcription Convention <<< CID Convention
Generation of orthographic and phonetic transcriptions
Aligning transcriptions with the signal <<< CID
2 Automatic Annotation
Syllabification <<< CID
Intonation
Sentence segmentation <<< CID
POS-tagger
Chunker
Shallow parser
Main steps and contributions
3 Manual Annotation <<< CID
Gestures: hands, head, arms
Prosody: phrasing, contours
Disfluencies
Discourse: turns, backchannels, reported speech, information structure
4 Formal representation
Abstract schema: Typed Feature Structures <<< CID
Generation of the XML schema <<< CID
Formatting data
Querying
Some descriptions
1 Backchannels <<< CID
Vocal and gestural
Description in terms of prosody, discourse, morpho-syntax
2 Detachments <<< CID
Dislocation, cleft, topicalization
Annotation of the detachment type, the category, the function, the anaphor
Part III
The Formal Background
Annotation Graphs
Annotation Graphs
Annotation Graphs
NXT gestures
NXT-format Switchboard
Graph Annotation Format (GrAF)
GrAF: nodes and edges, decorated with feature structures
Annotations associated to the nodes (rather than the edges as in AG)
Nodes may be linked to:
Primary data
Other nodes in the graph
Graph Annotation Format (GrAF)
Base segmentation:
<seg:sink seg:id="42" seg:start="24" seg:end="35"/>
Annotation over the base segmentation:
<msd:node msd:id="16">
  <msd:f name="cat" value="NN"/>
</msd:node>
<msd:edge from="msd:16" to="seg:42"/>
Annotation over another annotation:
<ptb:node ptb:id="23">
  <ptb:f name="type" value="NP"/>
  <ptb:f name="role" value="SBJ"/>
</ptb:node>
<ptb:edge from="ptb:23" to="msd:16"/>
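The layering in this example can be sketched in Python as a small graph in which annotations sit on nodes and each edge points either to another node or to a base segment. This is only an illustration: the class names and the `resolve_span` helper are hypothetical and do not belong to any GrAF library; only the identifiers and feature values are taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A region of the primary data, given by its two offsets."""
    start: int
    end: int


@dataclass
class Node:
    """A graph node decorated with a flat feature structure."""
    features: dict = field(default_factory=dict)


# Identifiers follow the slide: namespace prefix plus local id.
segments = {"seg:42": Segment(start=24, end=35)}
nodes = {
    "msd:16": Node({"cat": "NN"}),
    "ptb:23": Node({"type": "NP", "role": "SBJ"}),
}
# Each edge links a node to a segment or to another node.
edges = {"msd:16": ["seg:42"], "ptb:23": ["msd:16"]}


def resolve_span(node_id):
    """Follow edges down to the base segmentation and return the covered span."""
    if node_id in segments:
        seg = segments[node_id]
        return seg.start, seg.end
    spans = [resolve_span(target) for target in edges[node_id]]
    return min(s for s, _ in spans), max(e for _, e in spans)


print(resolve_span("ptb:23"))  # (24, 35)
```

The NP node never mentions offsets itself; its span is recovered through the morphosyntactic node it points to, which is the point of annotating over another annotation.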
A generic scheme
Needs
A means to describe the information to be encoded and its organization
A precise description of:
the categories or objects in each domain
the organization of each domain
the relations between the domains
A homogeneous framework for representing all sources of information
Independent of a specific tool or formalism
Solution: Typed Feature Structure
Description of the objects and their properties
Description of the hierarchical structure
An Annotation Scheme in terms of TFS
Type hierarchy:
object
  pros_phr
    ip
    ap
  phono
    syllable
    phoneme
  disfluence
    lex
    non-lex
  gest
    hand
    head
    ...
Constituency hierarchy:
ip ::= ap*
ap ::= syl+
syl ::= const_syl+
const_syl ::= phon+
disf ::= reparandum break reparans
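A minimal sketch of how the repetition rules of the constituency hierarchy could be checked, assuming that `*` allows zero children and `+` requires at least one. The `RULES` table and `is_well_formed` are hypothetical names introduced for illustration; the fixed-sequence `disf` rule is left out to keep the sketch short.

```python
# Each rule maps a parent type to (child type, minimum number of children).
RULES = {
    "ip": ("ap", 0),           # ip ::= ap*
    "ap": ("syl", 1),          # ap ::= syl+
    "syl": ("const_syl", 1),   # syl ::= const_syl+
    "const_syl": ("phon", 1),  # const_syl ::= phon+
}


def is_well_formed(node):
    """node is a (type, children) pair; terminal types have no children."""
    node_type, children = node
    if node_type not in RULES:
        # Types without a rule (e.g. phon) are treated as terminals.
        return not children
    child_type, minimum = RULES[node_type]
    if len(children) < minimum:
        return False
    return all(c[0] == child_type and is_well_formed(c) for c in children)


syl = ("syl", [("const_syl", [("phon", [])])])
ip = ("ip", [("ap", [syl])])
print(is_well_formed(ip))          # True
print(is_well_formed(("ap", [])))  # False: an ap needs at least one syl
```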
The TFS Schema
Object type:
object [index integer, location loc_type]
Location type:
loc_type
  temporal
    interval [start time_unit, end time_unit]
    point [point time_unit]
  spatial
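The object and location types could be mirrored with Python dataclasses along the following lines. This is a sketch under stated assumptions: the time unit is taken to be a float number of seconds, the `spatial` subtype is omitted because the slide gives it no attributes, and all class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Interval:
    """Temporal location with a start and an end (assumed to be seconds)."""
    start: float
    end: float

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("interval end precedes start")


@dataclass(frozen=True)
class Point:
    """Temporal location reduced to a single instant."""
    point: float


Temporal = Union[Interval, Point]


@dataclass
class AnnotationObject:
    """Every annotated object carries an index and a location."""
    index: int
    location: Temporal


obj = AnnotationObject(index=18, location=Interval(start=83.11, end=204.21))
print(obj)
```

Keeping the location in its own type means every domain (prosody, phonetics, gesture) can share the same anchoring to the signal.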
Phonetics
phon
  sampa_label   sampa_unit
  cat           {vowel, consonant}
  type          {occlusive, fricative, nasal, ...}
  articulation
    lip         [protrusion string, aperture aperture]
    tongue
      tip       [location string, degree string]
      body      [location string, degree string]
    velum       aperture
    glottis     aperture
  role          [epenthetic boolean, liaison boolean]
Prosody
pros_phr
  ip [label         IP
      constituents  list(ap)
      contour       [direction string, position string, function string]]
  ap [label         AP
      constituents  list(syl)]
Prosody: Example
ip [label         IP
    index         18
    location      [start 83.11, end 204.21]
    constituents  ap [label     AP
                      index     25
                      location  [start 192.28, end 204.21]]
    contour       [direction falling, position final, function conclusive]]
Syllabic structure
syl [struct        syl_struct
     position      [rank {integer}, syl_number {integer}]
     accentuable   boolean
     prominence    boolean
     constituents  list(const_syl)]
const_syl [phon        list(phon)
           const_type  {onset, nucleus, coda}]
syl [label         syl
     index         42
     location      [start 195.12, end 204.21]
     constituents  {[const_type onset, phon /f/],
                    [const_type nucleus, phon /u/],
                    [const_type coda, phon /l/]}
     struct        CVC
     position      3/3
     accentuable   false
     prominence    false]
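In this example the `struct` value (CVC) lines up with the constituent types: onset and coda contribute a C, the nucleus a V. A minimal sketch of that correspondence follows, assuming one symbol per phoneme; the slides do not say how the corpus actually computes the value, and the names `SYMBOL` and `syllable_struct` are hypothetical.

```python
# Assumed mapping from constituent type to structure symbol.
SYMBOL = {"onset": "C", "nucleus": "V", "coda": "C"}


def syllable_struct(constituents):
    """constituents: list of (const_type, [phonemes]) in temporal order."""
    return "".join(SYMBOL[const_type] * len(phons) for const_type, phons in constituents)


example = [("onset", ["f"]), ("nucleus", ["u"]), ("coda", ["l"])]
print(syllable_struct(example))  # CVC
```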
Disfluency
disfluency
  lex [reparandum frag, break int_break]
    repaired [type rep, reparans change]
    incomplete [dis_type inc]
  non_lex
    filled [type fill]
    silent [type sil]
Part IV
The annotations
Transcription
EOT
on y va avec des copains on a(v)ait pris l(e) ferry en Normandie,T/
p(ui)sque j’avais un frère qui était en $Normandie, T/$ on traverse on
a(v)ait passé [une, uneu] nuit épouvantab(le) sur le ferry et euh on arrive
à $Londres,T /$ on voit ma soeur e(lle) nous amène dans le [B&B, biainbi]
où ...
Tokens
on y va avec des copains on avait pris le ferry en Normandie puisque j’
avais un frère qui était en Normandie on traverse on avait passé une nuit
épouvantable sur le ferry et on arrive à Londres on voit ma soeur elle nous
amène dans le B&B où ...
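Comparing the two lines suggests a few mechanical steps from the enriched transcription to the tokens. The sketch below is a reading inferred from this single example, not the CID tool itself: parentheses mark elided material to restore, a bracketed pair keeps its first member, `$...$` spans keep the word and drop the tag, and the filled pause "euh" is dropped. The function name `eot_to_tokens` is hypothetical, and the undelimited "Normandie,T/" case is not handled.

```python
import re


def eot_to_tokens(eot):
    """Reduce an enriched orthographic transcription (EOT) to plain tokens."""
    text = eot
    # "$Londres,T /$" -> "Londres": keep the word, drop the tag and delimiters.
    text = re.sub(r"\$\s*([^,$]+?)\s*,\s*[A-Z]\s*/\s*\$", r"\1", text)
    # "[B&B, biainbi]" -> "B&B": keep the first member of the pair.
    text = re.sub(r"\[\s*([^,\]]+?)\s*,[^\]]*\]", r"\1", text)
    # "a(v)ait" -> "avait": restore material marked as elided.
    text = re.sub(r"\(([^)]*)\)", r"\1", text)
    # Drop the filled pause "euh", which is absent from the token line.
    text = re.sub(r"\beuh\b", " ", text)
    # Normalize whitespace left behind by the substitutions.
    return " ".join(text.split())


for fragment in ["on a(v)ait pris l(e) ferry", "dans le [B&B, biainbi]",
                 "et euh on arrive", "$Londres,T /$"]:
    print(eot_to_tokens(fragment))
```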
Segmentation
on y va avec des copains /Wm/ on avait pris le ferry en Normandie puisque j’
avais un frère qui était en Normandie /Wd/ on traverse /Wm/ on avait passé une
nuit épouvantable sur le ferry /Wm/ et on arrive à Londres /Wm/ on voit ma soeur
/Wm/ elle nous amène dans le B&B /Wm/ où on devait loger /Wd/ on se promène /Wm/
moi /Wm/ j’ étais déjà crevée au bout de trois jours /Wm/ parce qu’ on voyageait
vachement à pied /Wm/ donc j’en pouvais plus
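A segmented string of this kind can be split into units paired with the boundary marker that closes each of them. The sketch below assumes only that markers have the form `/Wm/` or `/Wd/` as in the example; it makes no claim about what the two labels mean, and `split_units` is a hypothetical helper.

```python
import re

# One capturing group so that re.split keeps the marker labels.
BOUNDARY = re.compile(r"\s*/(W[md])/\s*")


def split_units(text):
    """Return (unit, closing marker) pairs; the last unit may have no marker."""
    parts = BOUNDARY.split(text.strip())
    units = parts[0::2]           # text between markers
    marks = parts[1::2] + [None]  # marker following each unit, if any
    return [(unit, mark) for unit, mark in zip(units, marks) if unit]


sample = "on y va avec des copains /Wm/ on avait pris le ferry en Normandie"
for unit, mark in split_units(sample):
    print(mark, "|", unit)
```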
Phonetic transcription
Grapheme-phoneme conversion
Input: enriched transcription
Output: list of phonemes, with liaisons
Example
et c’est comme en anglais te rappelles pas en anglais quand euh tu épelais
ton nom euh tu sais quand tu apprends les lettres
e s e k o m a~ n a~ g l e t @ R A p e l p A a~ n a~ g l e k a~ @ t y e p @
l e t o~ n o~ t @ t s e k a~ t A p R a~ l e l e t R #
Phonetics: example in Praat
Phonetics: some figures
Phenomenon                          Number
Elision                             11,058
Word truncation                      1,732
Standard liaison missing               160
Unusual liaison                         49
Non-standard phonetic realization    2,812
Laugh seq.                           2,111
Laughing speech seq.                   367
Single laugh IPU                       844
Overlaps > 150 ms                    4,150
Alignment
Syllables
Prosody
Prosodic contours
POS-tagging
Lexicon
Chunking
Chunking (2)
Some results
Category        Count    Group        Count
Adverb         15 123    AP           3 634
Adjective       4 585    NP          13 107
Auxiliary       3 057    PP           7 041
Determiner      9 427    AdvP        15 040
Conjunction     9 390    VPn         22 925
Interjection    5 068    VP           1 323
Preposition     8 693    Total       63 070
Pronoun        25 199
Noun           13 419    Soft Pct     9 689
Verb           20 436    Strong Pct  14 459
Total         114 397    Total       24 148
Trees
Detachment: annotations
Dislocation: “Chocolate, I hate”
Cleft: “It is John who married Ann”
Pseudo-cleft: “What he wanted to do was to travel”
Binary constructions: “Being happy, it is not always”
Features
Detachment type: D, CV, PSCV, B
Detached category: NP, NPrel, NPproP, NPproD, NPproQ, PP, AP, AdvP, VP, S
Function: Subj, Odir, Oind, Loc, Adj
Resumptive element: Rxx (xx : type of the res. element)
Detachment
Detachment (2)
Disfluencies
Disfluencies
Disfluencies (2)
Gestures: hands
Hands
hands_type [Symmetry         symmetry
            Phase            phase
            Gesture          gesture
            HandShape        [Shape, Laxness boolean]
            HandOrientation  orientation
            Space            [Region region, Coordinates]
            Contact
            MovementQuality  [Trajectory trajectory, Velocity velocity, Amplitude amplitude]]
Gestures: hands
symmetry: {Both hands symmetrical, Both hands asymmetrical, ...}
phase: {Preparation, Stroke, Hold, Retraction, ...}
gesture: {Adaptor, Iconic, Metaphoric, Deictic, Emblem, ...}
orientation: {Palm up, Palm down, Palm towards self, Palm away from self, ...}
region: {Center center, Center, Periphery, Extreme periphery}
coordinates: {Right, Left, Upper, Lower, Upper right, ...}
contact: {Forehead, Hair, Cheek, Chin, Eyes, Eyebrow, ...}
trajectory: {Upper, Lower, Right, Left, Upper right, Lower right, ...}
velocity: {Normal, Fast, Slow}
amplitude: {Small, Medium, Large}
Gestures: hands
Stand-off hierarchical encoding
When editing, no distinction features vs. hierarchy
Prosody: intonation, contour are features vs. AP ∈ IP
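One way to read AP ∈ IP in a stand-off setting is as time containment: the hierarchy is not stored explicitly but can be recovered from the intervals. The sketch below tests this on the interval values of the prosody example (IP index 18, AP index 25); the `contains` helper is hypothetical, and treating containment as the hierarchy criterion is an assumption rather than something the slides state.

```python
def contains(parent, child):
    """True if the child interval lies within the parent interval (closed bounds)."""
    return parent[0] <= child[0] and child[1] <= parent[1]


ip = (83.11, 204.21)   # IP index 18 in the prosody example
ap = (192.28, 204.21)  # AP index 25 in the prosody example
print(contains(ip, ap))  # True
```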
TFS representation: XML scheme
[label         IP
 constituents  list(ap)
 contour       [direction string, position string, function string]]
<xs:complexType name="IntonationalPhrase">
  <xs:complexContent>
    <xs:extension base="ProsodicPhrase">
      <xs:sequence>
        <xs:element name="constituents">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="accentual_phrase" type="AccentualPhrase"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="contour" type="Contour"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
XML representation
intervals [2]:
    xmin = 0.78
    xmax = 1.7559754641684542
    text = "ip"
...
intervals [5]:
    xmin = 2.6703535937364578
    xmax = 3.329971301020408
    text = "ip"
...
class = "TextTier"
name = "at_ctr"
xmin = 0
xmax = 3573.6
points: size = 2118
points [1]:
    time = 1.7559754641684542
    mark = "RT"
...
points [3]:
    time = 3.329971301020408
    mark = "F"
<IntonationalPhrase index="0">
  <localisation start="0.78" end="1.7559"/>
  <contour type="RT" time="1.7559"/>
</IntonationalPhrase>
...
<IntonationalPhrase index="5">
  <localisation start="2.6703" end="3.3299"/>
  <contour type="F" time="3.3299"/>
</IntonationalPhrase>
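The mapping from the two Praat tiers to the XML elements can be sketched as follows: each "ip" interval becomes an `IntonationalPhrase`, and a contour point is attached when its time coincides with the end of the interval. This is an illustration under assumptions, not the CID conversion tool: the tier contents are hard-coded instead of parsed from a TextGrid file, the `prosody` wrapper element and the `to_xml` function are hypothetical, indices are simply counted from 0, and times are rounded to four decimals where the slide appears to truncate them.

```python
import xml.etree.ElementTree as ET

# (xmin, xmax, text) from the interval tier; (time, mark) from the point tier.
intervals = [
    (0.78, 1.7559754641684542, "ip"),
    (2.6703535937364578, 3.329971301020408, "ip"),
]
points = [(1.7559754641684542, "RT"), (3.329971301020408, "F")]


def to_xml(intervals, points, tol=1e-6):
    """Build one IntonationalPhrase element per "ip" interval."""
    root = ET.Element("prosody")  # hypothetical wrapper element
    for index, (xmin, xmax, text) in enumerate(intervals):
        if text != "ip":
            continue
        phrase = ET.SubElement(root, "IntonationalPhrase", index=str(index))
        ET.SubElement(phrase, "localisation", start=f"{xmin:.4f}", end=f"{xmax:.4f}")
        # Attach the contour whose time matches the right edge of the phrase.
        for time, mark in points:
            if abs(time - xmax) <= tol:
                ET.SubElement(phrase, "contour", type=mark, time=f"{time:.4f}")
    return root


print(ET.tostring(to_xml(intervals, points), encoding="unicode"))
```

Serializing through `ElementTree` also guarantees quoted attribute values, so the output is well-formed XML.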
Distribution: the CID at SLDR