ATILA – 2013yevgen/news/ATiLA13_matusevych.pdf · 2017. 1. 5. · ATILA – 2013 . An algorithm...


ATILA – 2013

An algorithm for generating child–adult interaction data

Yevgen Matusevych Afra Alishahi


Contents

1. Input to CLA models.

2. Natural vs. generated input.

3. Hybrid approach.

4. Improving the algorithm.


Overview

• Computational models of child language acquisition (CLA) often take utterance–scene pairs as input, for example when modeling cross-situational word learning:

Utterance (linguistic input): Take the ball!

Scene (visual input): {ball, car, rattle, book}

• Existing collections of child-directed speech (e.g., CHILDES) provide the linguistic input, but not the visual input.


Input to CLA models

Two possibilities:

1. Use a small manually annotated dataset.

- Relatively small amounts.

2. Generate visual input automatically.

- But what about its statistical properties?

Input to a cognitively plausible model must have the same statistical properties as the naturalistic data. So we need to compare the two sources.


Manually annotated sample

• 3 short fragments (~10 min. each) of video recordings of 13-month-old children playing with toys with adults.

• Annotated: adult's and child's gaze directions, utterances and actions.

• Scene at step 3: [adult, child, book, car, open, point, play]

#   Who?   Looks where?  Does what?  To what?  Says what?
1.  Adult  child         point       book      FROG. CROAK-CROAK
2.  Child  car           play        car       [babbling]
3.  Adult  book          open        book      CROAK-CROAK


Automatically prepared data

Fazly, Alishahi et al., 2010: use semantic symbols that correspond to the words in the utterance. Referential uncertainty is simulated by merging the representations of two consecutive scenes and pairing them with only one of the utterances.

Utt1: But it is very boring.
Scene1: [but, it, is, very, boring, are, we, going, to, play, now]

Utt2: Are we going to play now?

Utt3: Did you get fed up … ?
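
The merging step can be illustrated with a minimal Python sketch. It rests on simplifying assumptions: each word stands in for its own semantic symbol (as in the Scene1 example), and a sliding window pairs every utterance with the merged symbols of itself and the next utterance. The function names `scene_symbols` and `merge_consecutive` are hypothetical and not taken from the cited work.

```python
import re

def scene_symbols(utterance):
    """Return lower-cased word symbols for an utterance (assumption: one symbol per word)."""
    return re.findall(r"[a-z']+", utterance.lower())

def merge_consecutive(utterances):
    """Pair each utterance with the merged symbols of itself and the following utterance."""
    pairs = []
    for current, following in zip(utterances, utterances[1:]):
        merged = []
        for symbol in scene_symbols(current) + scene_symbols(following):
            if symbol not in merged:  # keep the first occurrence, preserve order
                merged.append(symbol)
        pairs.append((current, merged))
    return pairs

utterances = ["But it is very boring.", "Are we going to play now?"]
pairs = merge_consecutive(utterances)
print(pairs[0])
```

For the two example utterances, the first pair reproduces Scene1: the scene holds symbols for both utterances, while only Utt1 is presented as linguistic input.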


Statistical measures

• Measuring statistical properties:

1. Scene stability, or the overlap between every pair of consecutive scenes:

   $\mathrm{overlap}(S_i, S_{i+1}) = \frac{|S_i \cap S_{i+1}|}{|S_i \cup S_{i+1}|}$

2. Noise, or the normalized number of words that refer to something not present in the scene:

   $\mathrm{noise}(U_i) = \frac{|U_i - U_i \cap S_i|}{|U_i|}$

3. Referential certainty, or the normalized number of the scene elements that are referred to in the utterance:

   $\mathrm{certainty}(S_i) = \frac{|U_i \cap S_i|}{|S_i|}$

Here $S_i$ is the current scene, $U_i$ is the current utterance, and $S_{i+1}$ is the following scene.
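
As a rough illustration, the three measures can be computed with Python sets. This is a sketch under stated assumptions: utterances and scenes are given as collections of symbols, empty sets yield 0.0 (a convention chosen here, not stated in the slides), and the example `next_scene` is made up for the demonstration.

```python
def overlap(scene, next_scene):
    """Scene stability: shared elements of two consecutive scenes over their union."""
    s, t = set(scene), set(next_scene)
    union = s | t
    return len(s & t) / len(union) if union else 0.0

def noise(utterance, scene):
    """Share of utterance symbols that refer to nothing present in the scene."""
    u, s = set(utterance), set(scene)
    return len(u - s) / len(u) if u else 0.0

def certainty(utterance, scene):
    """Share of scene elements that are referred to in the utterance."""
    u, s = set(utterance), set(scene)
    return len(u & s) / len(s) if s else 0.0

utterance = ["take", "the", "ball"]           # "Take the ball!"
scene = ["ball", "car", "rattle", "book"]
next_scene = ["ball", "book", "duck"]         # hypothetical following scene

print(overlap(scene, next_scene))   # 2 shared elements out of 5 in the union
print(noise(utterance, scene))      # "take" and "the" are absent from the scene
print(certainty(utterance, scene))  # only "ball" of 4 scene elements is mentioned
```

Corpus-level values, such as those compared across data sources, would be obtained by averaging these per-item scores; the averaging scheme is not specified in the slides.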


Statistical measures

[Figure: bar chart comparing scene stability, noise, and referential certainty for the manual and automatic data; vertical axis from 0 to 0.7.]


The hybrid approach

A framework that uses a small data sample as an input and generates a meaningful stream of adult-child interaction.

Context: puzzle, duck, bin, ball, frog

Turn  Agent  Action        Utterance
1.    Adult  play puzzle   (none)
2.    Child  play duck     babbling
3.    Adult  point puzzle  Duck fits here.
4.    Child  touch bin     babbling
5.    Adult  play puzzle   Yes?


The hybrid approach

[Figure: bar chart comparing scene stability, noise, and referential certainty for the manual, automatic, and generated data; vertical axis from 0 to 0.7.]


The hybrid approach

• The hybrid approach, which generates data from a small manually annotated sample, yields data whose statistical properties are closer to those of naturalistic data. So how does it work?

• It is based on co-occurrence frequencies: if two items co-occur often, they are assumed to be related, e.g.:

- Adults react to children's babbling and actions.
- Utterances often accompany actions.
- Objects are associated with certain actions.

A  manipulate  book  FROG. CROAK-CROAK
C  close       book  [babbling]
A  open        book  CROAK-CROAK


Improved algorithm

• Manual system of dependencies using information from n previous feature values.

1. Processing.

2. Generation.

#   Who?   Looks where?  Does what?  To what?  Says what?
1.  Adult  child         point       book      FROG. CROAK-CROAK
2.  Child  car           play        car       [babbling]
3.  Adult  book          open        book      CROAK-CROAK


Improved algorithm: processing

ADULT
gazeA: child
actionA: point
argument1A: book
argument2A: ⌀
utteranceA: FROG. CROAK-CROAK

CHILD
gazeC: car
actionC: play
argument1C: car
argument2C: ⌀
utteranceC: babbling

#   Who?   Looks where?  Does what?  To what?  Says what?
1.  Adult  child         point       book      FROG. CROAK-CROAK
2.  Child  car           play        car       [babbling]
3.  Adult  book          open        book      CROAK-CROAK


Improved algorithm: processing

ADULT
gazeA: child
actionA: point
argument1A: book
argument2A: ⌀
utteranceA: FROG. CROAK-CROAK

CHILD
gazeC: car
actionC: play
argument1C: car
argument2C: ⌀
utteranceC: babbling

ADULT
gazeA: book

Count(gazeA(n+1) = book | gazeA(n) = child)

Count(gazeA(n+1) = book | actionA(n) = point)

…


Improved algorithm: processing

ADULT
gazeA: child
actionA: point
argument1A: book
argument2A: ⌀
utteranceA: FROG. CROAK-CROAK

CHILD
gazeC: car
actionC: play
argument1C: car
argument2C: ⌀
utteranceC: babbling

ADULT
gazeA: book
actionA: open


Improved algorithm: processing

Adult  child  point  book  FROG. CROAK-CROAK
Child  car    play   car   [babbling]
Adult  book   open   book  CROAK-CROAK


Improved algorithm: processing

             ACTIONC =
GAZEA =      play   point   open
book            1       4      0
child           7       2      0
car             0       5      3
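
A minimal Python sketch of how such co-occurrence counts could be collected from annotated turns. It is an illustration under assumptions rather than the authors' implementation: the function `count_transitions`, the dictionary layout of a turn, and the rule of counting every new feature value against every value currently in the state are all hypothetical choices made here.

```python
from collections import Counter

turns = [
    {"agent": "Adult", "gaze": "child", "action": "point",
     "argument1": "book", "utterance": "FROG. CROAK-CROAK"},
    {"agent": "Child", "gaze": "car", "action": "play",
     "argument1": "car", "utterance": "[babbling]"},
    {"agent": "Adult", "gaze": "book", "action": "open",
     "argument1": "book", "utterance": "CROAK-CROAK"},
]

def count_transitions(turns):
    """Count how often each new (feature, value) follows each (feature, value) in the state."""
    counts = Counter()
    state = {}  # latest value of every agent-specific feature, e.g. "gazeA"
    for turn in turns:
        suffix = turn["agent"][0]  # "A" for Adult, "C" for Child
        new_values = {f"{name}{suffix}": value
                      for name, value in turn.items() if name != "agent"}
        for target in new_values.items():
            for context in state.items():
                counts[(target, context)] += 1
        state.update(new_values)  # the agent's features now hold the new values
    return counts

counts = count_transitions(turns)
print(counts[(("gazeA", "book"), ("gazeA", "child"))])
print(counts[(("gazeA", "book"), ("actionC", "play"))])
```

Each key pairs a target such as gazeA(n+1) = book with one conditioning value such as actionC(n) = play, so a slice of `counts` over two features gives a table like the one above.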


Improved algorithm: generating

Features: {gazeA, actionA, object1A, object2A, utteranceA, gazeC, actionC, object1C, object2C, utteranceC}

A. Assume the features are independent?

B. Markov chain with memory m = 10?

C. Assume that each feature depends on some features but not on others?

$\prod_{F_i \in \mathit{features}} P(F_n = \mathit{value} \mid F_i = \mathit{value}_i)$

$P(F_n = \mathit{value} \mid F_{n-1} = v_{n-1}, F_{n-2} = v_{n-2}, \ldots, F_{n-10} = v_{n-10})$


Improved algorithm: generating

A distribution of values:

book: 0.025
car: 0.005
child: 0.01
…

So we can sample a value using the probabilities as weights.
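
Weighted sampling of this kind can be sketched with Python's standard `random.choices`, which accepts relative weights and therefore does not require them to sum to 1. The helper `sample_value` is a hypothetical name, and only the three values listed above are used; the elided remainder of the distribution is left out.

```python
import random

def sample_value(weights, rng=random):
    """Draw one value with probability proportional to its weight."""
    values = list(weights)
    return rng.choices(values, weights=[weights[v] for v in values], k=1)[0]

# Only the values shown on the slide; the rest of the distribution is not known.
weights = {"book": 0.025, "car": 0.005, "child": 0.01}

rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [sample_value(weights, rng) for _ in range(1000)]
print({value: draws.count(value) for value in weights})
```

Over many draws, "book" should be chosen about five times as often as "car", matching the ratio of their weights.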


Conclusions & future work

• Data generated using the hybrid approach have statistical properties closer to those of naturalistic data.

• The algorithm can be improved by automatically collecting implicit statistical information and transforming it into transitional probabilities.

• We need to find an optimal way to represent the relations between the features:
- which distribution to use?
- assign weights?
- replace sparse features like UTTERANCE with their categories? That would mean more manual work.


Questions?