EuroLAN 2019 - Thematic threads in chats (eurolan.info.uaic.ro/html/profs/materials/Cristea.pdf)

Post on 19-Jan-2021


Thematic threads in chats

Dan Cristea

“Al.I.Cuza” University of Iasi
Institute of Computer Science of the Romanian Academy

Chapter 1: The discourse and its structure

What is discourse?

Longman:
1. a serious speech or piece of writing on a particular subject: Professor Grant delivered a long discourse on aspects of moral theology.
2. serious conversation between people: You can’t expect meaningful discourse when you two disagree so violently.
3. the language used in particular kinds of speech or writing: scientific discourse.

Text versus discourse

Syntactically, a discourse is more than a single sentence.

From Garcia Marquez

Text versus discourse

A text is not a discourse!

But it becomes a discourse the very moment it is read or heard by a human... or a machine.

Time and discourse

Discourse has a dynamic nature

Time axes:
• real time
• discourse time
• story time

[Figure: events numbered 1 and 2 placed on the three time axes; the tick labels are not recoverable]

Attentional state theory (AST)
(Barbara Grosz & Candace Sidner, 1987)

• Models the linguistic structure of the discourse
• Gives an account of intentions and how they are combined
• Explains the shift of attention during discourse interpretation
• Explains interruptions and flash-backs
• Brings out a dynamic domain of referentiality

3 components:
• a linguistic structure
• an intentional structure
• an attentional state

Rhetorical structure theory
(William Mann and Sandra Thompson, 1987)

Basics
• text span: an uninterrupted linear interval of text
• relation: holds between two or more non-overlapping spans
• arguments of relations are of a nuclear type and a satellite type
  – a nucleus is more important than a satellite (deletion and substitution tests)
  – relations: hypotactic (one nucleus + satellites) and paratactic (all nuclear)
• scheme: integrates by a relation two or more text spans (like grammar rules)
• RST analyses are trees
• they reflect a judge’s interpretation (and can therefore be subjective)

[Figure: discourse tree over units 1 to 13 whose nodes are annotated with head (H), vein (V) and DRA expressions, e.g. H = 3, V = 1 3 5 9, DRA = 1 3]

Trees in RST

[Figure: an RST tree over units 1 to 4, with labels marking the relations, the units and the nuclear arcs]

RST schemes

• relation between a text span (nucleus) and a text span (satellite)
• relation between a text span (nucleus) and a text span (nucleus)

RST relations

Subject matter (informational):
Elaboration, Circumstance, Solutionhood, Volitional Cause, Volitional Result, Non-Volitional Cause, Non-Volitional Result, Purpose, Condition, Otherwise, Interpretation, Evaluation, Restatement, Summary, Sequence, Contrast

Presentational (intentional):
Motivation, Antithesis, Background, Enablement, Evidence, Justify, Concession

RST analysis

1. Farmington Police had to help control traffic recently
2. when hundreds of people lined up to be among the first applying for jobs at the yet-to-open Marriott Hotel.
3. The hotel’s help-wanted announcement – for 300 openings – was a rare opportunity for many unemployed.
4. The people waiting in line carried a message of claims that the jobless could be employed if only they showed enough moxie.
5. Every rule has exceptions,
6. but the tragic and too-common tableaux of hundreds of people snake-lining up for any task with a paycheck illustrates a lack of jobs,
7. not laziness.

Relations in the analysis tree:
• circumstance: units 2 and 3 (span 2-3)
• volitional result: unit 1 and span 2-3 (span 1-3)
• antithesis: units 6 and 7 (span 6-7)
• concession: unit 5 and span 6-7 (span 5-7)
• evidence: unit 4 and span 5-7 (span 4-7)
• background: spans 1-3 and 4-7 (span 1-7)

How do themes relate to trees?

Danes…

Downward: refinement/elaboration of an original idea

Upward: emergence of new aspects out of a previous idea

What should a DS account for?

• Unity beyond the sentence level
  – coherence: it makes sense, with respect to an accepted setting, real or virtual
  – why is it that certain sequences are more difficult to read than others?
• Referentiality in relation to DS
  – cohesion: pronouns and other referential means glue units together
  – why is it that only certain referential means may be used in certain circumstances?

The Sequentiality Principle (SP)

A left-to-right reading of the terminal frontier of a discourse tree associated to a text corresponds to the span of text analysed, in the same linear order.

(Marcu, 2000), etc.
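A minimal sketch of how the SP could be checked mechanically. The tuple encoding of the tree, the relation names and the function names `frontier` and `observes_sp` are illustrative assumptions, not something given on the slides.

```python
from typing import List, Union

# Hypothetical encoding: a leaf is an int (the unit number, in text order);
# an inner node is a tuple (relation_name, child, child, ...).
Tree = Union[int, tuple]


def frontier(tree: Tree) -> List[int]:
    """Return the terminal frontier of the tree, read left to right."""
    if isinstance(tree, int):
        return [tree]
    units: List[int] = []
    for child in tree[1:]:  # tree[0] is the relation name
        units.extend(frontier(child))
    return units


def observes_sp(tree: Tree) -> bool:
    """True if the frontier lists consecutive units in their linear order."""
    units = frontier(tree)
    return units == list(range(units[0], units[0] + len(units)))


# Two toy trees over units 1..4; only the first keeps the text order.
ok_tree = ("background", ("result", 1, ("circumstance", 2, 3)), 4)
bad_tree = ("background", ("result", 1, 3), ("circumstance", 2, 4))

print(observes_sp(ok_tree), observes_sp(bad_tree))
```

The second tree is rejected because its frontier reads 1 3 2 4, so the tree no longer covers the text in its linear order.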

Operations inspired by TAG

• Adjunction: an auxiliary tree (with a foot node) is adjoined at a node of the developing tree, giving the transformed tree
• Substitution: an elementary tree replaces a substitution node of the developing tree, giving the transformed tree

The Right Frontier Constraint

• As a referential constraint defining the regions of the discourse model that could anchor a referential expression of weak power
• As an attachment constraint in an incremental development of the tree structure:
  – at any step of an incremental discourse parsing, if the DT does not contain substitution nodes, the SP is observed if and only if all operations on the developing DT are adjoinings of left-footed ATs onto nodes of the generalised right frontier or substitutions on the innermost substitution node.
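The attachment reading of the constraint can be illustrated with a small sketch. It assumes a simplified setting that is not taken from the slides: binary auxiliary trees of the form relation(foot, new unit), no substitution nodes, and the hypothetical names `Node`, `right_frontier` and `adjoin_left_footed`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality: every node is a distinct object
class Node:
    label: str  # relation name for inner nodes, unit id for leaves
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Terminal frontier, read left to right."""
    if not node.children:
        return [node.label]
    return [leaf for child in node.children for leaf in leaves(child)]


def right_frontier(root: Node) -> List[Node]:
    """Nodes on the path from the root to the rightmost leaf."""
    path = [root]
    while path[-1].children:
        path.append(path[-1].children[-1])
    return path


def find_parent(root: Node, target: Node) -> Optional[Node]:
    for child in root.children:
        if child is target:
            return root
        found = find_parent(child, target)
        if found is not None:
            return found
    return None


def adjoin_left_footed(root: Node, target: Node, relation: str, unit: str) -> Node:
    """Adjoin the auxiliary tree relation(foot, unit) at `target`.

    The subtree of `target` takes the place of the foot node (the left
    daughter); the new unit becomes the right daughter. Returns the new root.
    """
    aux = Node(relation, [target, Node(unit)])
    parent = find_parent(root, target)
    if parent is None:  # adjoining at the root itself
        return aux
    parent.children[parent.children.index(target)] = aux
    return root


def build():
    u1, u2, u3 = Node("1"), Node("2"), Node("3")
    r2 = Node("r2", [u2, u3])
    return Node("r1", [u1, r2]), u1, r2


# Adjoining on the right frontier keeps the units in text order ...
root, u1, r2 = build()
on_rf = r2 in right_frontier(root)
good = leaves(adjoin_left_footed(root, r2, "r3", "4"))

# ... while adjoining at a node off the right frontier does not.
root, u1, r2 = build()
bad = leaves(adjoin_left_footed(root, u1, "r3", "4"))

print(on_rf, good, bad)
```

In the second case the new unit 4 lands between units 1 and 2, so the frontier no longer matches the linear order of the text.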

The Right Frontier Property

[Figure: adjunction of a left-footed AT on the RF of the DT (developing tree, auxiliary tree, transformed tree)]

[Figure: adjunction of an AT which is not left-footed (developing tree, auxiliary tree, transformed tree)]

[Figure: adjunction of an AT on a node which does not belong to the RF of the DT (developing tree, auxiliary tree, transformed tree)]

Chapter 2: The structure of chats

Work in collaboration with Corina Forăscu

23

Chats

• Successors of instant messages
  – In instant messaging each character appeared when it was typed. The UNIX "talk" command was popular in the 1980s and early 1990s
• Allow easy collaboration, more akin to genuine conversation than email's "letter" format
• Invite an “immediate” reaction from the partner

Why study chats?

• Forums that encourage chats on different subject matters
• Blogs versus chats:
  – blogs: short articles displaying your opinion on something
  – chats: direct access to a person (an on-line opinion is always more appealing than posting and waiting for a reaction)
• Spotting opinions in chats

What makes chats different from direct conversations?

• Direct conversation (face-to-face or telephone)
  – The emitter gets an immediate reaction
  – Although a matter of manners, it is mainly unusual (offensive) to react again before the partner answers
  – There is no record of previous themes, other than the memories of the partners
• Chat
  – There is a small delay between messages
  – The “silence” after sending a short message can be used to approach another theme
  – The screen preserves previous themes, which can thus be re-opened easily

RFC can be violated!…

From (Sassen&Kühnlein, 2005)

B: would you say that I’m indeed only jealous?
B: do you have another consultation hour scheduled soon?
A: Try to be with yourself after all you’re heading for your final degree!
B: mhm, well, but concentrating on my studies often doesn’t work
A: yes, always on Monday.
A: only jealous that would already suffice as problem, but I think there is more to it.

RFC as conditional

Sassen&Kühnlein 05:

if (not under stress) then RFC

28

Can RFC be violated?…

a. B: I’ve recently felt confused when seeing Maria talking with Michael.
b. B: The reason I’m asking your advise is that I’ll have my degree in June and right now I’m not in my best shape.
a. A: oh yea, then, what we have here is a nice case of jealousy.
a. B: would you say that I’m indeed only jealous?
c. B: do you have another consultation hour scheduled soon?
b. A: Try to be with yourself after all you’re heading for your final degree!
b. B: mhm, well, but concentrating on my studies often doesn’t work
c. A: yes, always on Monday.
a. A: only jealous that would already suffice as problem, but I think there is more to it.

Three different threads develop in parallel

a. B: I’ve recently felt confused when seeing Maria talking with Michael.
b. B: The reason I’m asking your advise is that I’ll have my degree in June and right now I’m not in my best shape.
a. A: oh yea, then, what we have here is a nice case of jealousy.
a. B: would you say that I’m indeed only jealous?
c. B: do you have another consultation hour scheduled soon?
b. A: Try to be with yourself after all you’re heading for your final degree!
b. B: mhm, well, but concentrating on my studies often doesn’t work
c. A: yes, always on Monday.
a. A: only jealous that would already suffice as problem, but I think there is more to it.

a: the jealousy theme
b: the studies theme
c: the consultations theme
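Given the thread labels a, b and c from the example, separating the interleaved turns is a simple grouping step. The sketch below assumes the labels are already known (on the slides they come from manual analysis); the function names `split_threads` and `thread_switches` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (thread label, speaker, utterance), in the order the turns appear on screen.
chat: List[Tuple[str, str, str]] = [
    ("a", "B", "I've recently felt confused when seeing Maria talking with Michael."),
    ("b", "B", "The reason I'm asking your advise is that I'll have my degree in June "
               "and right now I'm not in my best shape."),
    ("a", "A", "oh yea, then, what we have here is a nice case of jealousy."),
    ("a", "B", "would you say that I'm indeed only jealous?"),
    ("c", "B", "do you have another consultation hour scheduled soon?"),
    ("b", "A", "Try to be with yourself after all you're heading for your final degree!"),
    ("b", "B", "mhm, well, but concentrating on my studies often doesn't work"),
    ("c", "A", "yes, always on Monday."),
    ("a", "A", "only jealous that would already suffice as problem, "
               "but I think there is more to it."),
]


def split_threads(turns: List[Tuple[str, str, str]]) -> Dict[str, List[int]]:
    """Map each thread label to the 1-based positions of its turns."""
    threads: Dict[str, List[int]] = defaultdict(list)
    for position, (label, _speaker, _text) in enumerate(turns, start=1):
        threads[label].append(position)
    return dict(threads)


def thread_switches(turns: List[Tuple[str, str, str]]) -> int:
    """Count adjacent turns that belong to different threads."""
    labels = [label for label, _, _ in turns]
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


threads = split_threads(chat)
print(threads)
print(thread_switches(chat))
```

Each thread, read on its own, is a linear sub-dialogue; the switch count gives a rough measure of how strongly the threads are interleaved on screen.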

Three different threads develop in parallel

[Figure: three discourse trees growing in parallel over units a1 to a4, b1 to b3 and c1 to c2, with relation nodes r1 to r6]

Could RFC be violated?…

a. B: I’ve recently felt confused when seeing Maria talking with Michael.
b. B: The reason I’m asking your advise is that I’ll have my degree in June and right now I’m not in my best shape.
a. A: oh yea, then, what we have here is a nice case of jealousy.
a. B: would you say that I’m indeed only jealous?
c. B: do you have another consultation hour scheduled soon?
b. A: Try to be with yourself after all you’re heading for your final degree!
b. B: mhm, well, but concentrating on my studies often doesn’t work
c. A: yes, always on Monday.
a. A: only jealous that would already suffice as problem, but I think there is more to it.
A: to this you have to find a solution immediately.


r1

B: I’ve recently felt confused when seeing Maria talking with Michael.

A: oh yea, then, what we have here is a nice case of jealousy.

r2

B: would you say that I’m indeed only jealous?

r3

A: only jealous that would already suffice as problem, but I think there is more to it.

B: The reason I’m asking your advise is that I’ll have my degree in June and right now I’m not in my best shape.

r2

A: Try to be with yourself after all you’re heading for your final degree!

r3

B: mhm, well, but concentrating on my studies often doesn’t work

B: do you have another consultation hour scheduled soon?

r3

A: yes, always on Monday.

A: to this you have to find a solution immediately.


Merge

a1. A: how was your trip to Suceava?
b1. A: oh, did I tell you? I have seen yesterday evening Michael in the disco.
b2. A: did you meet him anymore since you two split?
a2. B: Fantastic! lot’s of nice guys and lot of fun.
B: eee, but, you know, he was too in that trip!

a: the trip theme
b: the separation from Michael theme

Merge of two DTs

r1

A: how was your trip to Suceava?

B: Fantastic! lot’s of nice guys and lot of fun.

r2

A: did you meet him anymore since you two separated?

A: oh, did I tell you? I have seen yesterday evening Michael in the disco.

B: eee, but, you know, he was too in that trip!

r1

r2

A: how was your trip to Suceava?

B: Fantastic! lot’s of nice guys and lot of fun.

A: did you meet him anymore since you two separated?

A: oh, did I tell you? I have seen yesterday evening Michael in the disco.

B: eee, but, you know, he was too in that trip!

r

r

Splitting

ab1. A: so, you didn’t know that I finished with Michael?!

ab2. A: it happened last month after I came back from Mexico.

a3. B: oh, I’m sorry, are you still mourning after him or you have already someone else?

a4. A: negative! I’m ok but need a period of loneliness.

a5. B: you cannot resist long like this. I know you…

b3. B: so, have you seen the pyramids there?

48

Splitting a DT

A: so, you didn’t know that I finished with Michael?!

A: it happened last month after I came back from Mexico.

r2

r3

B: oh, I’m sorry, are you still mourning after him or you have already someone else?

r4

A: negative! I’m ok and need a period of loneliness.

B: you cannot resist long like this. I know you…

B: so, have you seen the pyramids there?

r1

49

r2

Splitting a DT

A: so, you didn’t know that I finished with Michael?!

A: it happened last month after I came back from Mexico.

r2

r3

B: oh, I’m sorry, are you still mourning after him or you have already someone else?

r4

A: negative! I’m ok and need a period of loneliness.

B: you cannot resist long like this. I know you…

r1

B: so, have you seen the pyramids there?

50

A problem…

r1

A: so, you didn’t know that I finished with Michael?!

A: it happened last month after I came back from Mexico.

r2

r3

B: oh, I’m sorry, are you still mourning after him or you have already someone else?

r4

A: negative! I’m ok and need a period of loneliness.

B: you cannot resist long like this. I know you…

How can Mexico be evoked by “there”, since it belongs to a span which is closed?

B: so, have you seen the pyramids there?

51

Referential means

• Weak
  – pronouns (including zero)
  – demonstratives
• Strong
  – proper nouns
  – noun phrases (bridges or coreferential)

Nothing is really forgotten…

Example of a chat

[Figure: discourse graph of an example chat of 64 units, built up over five animation steps; the nodes are labelled with unit spans such as 6-7, 8-9, 10-11, 16-19, 21-25, 26-41, 45-49 and 51-55]

Conclusions

• RFC holds!
  – not only because a whole class of dedicated scholars have found examples to support it
  – but because there is a simple theoretical proof for it
• Discourse should be represented by graphs, not trees
• Not only chats, but long discourses too
• Graphs can represent multi-threads:
  – splitting a thread into two
  – merging two distinct threads into one
  – interruptions and flashbacks
  – there can be several developing Right Frontiers on a discourse graph
  – hopping from one RF to another needs strong referential means

Chapter 3: Veins over discourse structure

Work in collaboration with: Nancy Ide, Laurent Romary, Daniel Marcu, Valentin Tablan, Corina Forascu et al.

Are you comfortable with finding a referent for it in unit 3?

1. With one year before finishing his mandate as president of the company,

2. Mr. W. Ross has begun to bring about its bankruptcy.

*3. There were rumors that he has obtained it by fraud.

How about now?

1. Mr. W. Ross has begun to bring about the bankruptcy of his company.

2. with one year before finishing his mandate as president.

3. There were rumors that he has obtained it by fraud.


How many of you think that she in unit 4 refers to John’s mother?

1. John told Mary that he loves her.
2. He has never been married
3. and lived until his 40s with his mother.
4. She, on the contrary, was married twice.

Discourse structure modifies the distance between anaphor and antecedent

1. John told Mary that he loves her.
2. He has never been married
3. and lived until his 40s with his mother.
4. She, on the contrary, was married twice.

[Figure: units 2 and 4 are linked by antithesis (span 2÷4), and unit 1 is linked to that span by elaboration (span 1÷4)]

A satellite can access a far nucleus…

1. John told Mary that he loves her.
2. He has never been married
3. and lived until his 40s with his mother.
4. She, on the contrary, was married twice.

[Figure: tree over span 1÷4 built with one antithesis relation and two elaboration relations over units 1, 2, 3 and 4]

… but not another satellite

A nucleus blocks the access between two satellites…

1. With one year before finishing his mandate as president of the company,
2. Mr. W. Ross has begun to bring about its bankruptcy.
3. There were rumors that he has obtained it by fraud.

[Figure: tree over span 1÷3, with a circumstance relation over units 1 and 2 and a background relation attaching unit 3]

… and a satellite can access its nucleus

1. Mr. W. Ross has begun to bring about the bankruptcy of his company.
2. with one year before finishing his mandate as president.
3. There were rumors that he has obtained it by fraud.

[Figure: tree over span 1÷3, with a circumstance relation and a background relation over units 1, 2 and 3]

Veins Model basics

72

Fundamental assumption in VT (the cohesion claim)

• An inter-unit reference is possible only if the two units are in a structural relation one with the other

• The nucleus-satellite distinction, as a component of the discourse structure, gives indications on the range of referents to which an anaphor can be resolved

73

The definitions of VT: heads

• Head expression of a node: the sequence of the most important units within the corresponding span of text:
  – the head of a terminal node: its label
  – the head of a non-terminal node: the concatenation of the head expressions of the nuclear daughters
• the important units are projected up to the level where the corresponding span is seen as a satellite
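The two clauses of the head definition translate directly into a recursive function. The sketch below is illustrative only: the `DNode` class, the `leaf` helper and the toy tree (including its nuclearity assignments) are assumptions, not a tree taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DNode:
    nuclear: bool  # True if this node is a nucleus of its parent
    unit: int = 0  # unit label for terminal nodes, unused otherwise
    children: List["DNode"] = field(default_factory=list)


def leaf(unit: int, nuclear: bool) -> DNode:
    return DNode(nuclear=nuclear, unit=unit)


def head(node: DNode) -> List[int]:
    """Head expression: own label for a terminal node, otherwise the
    concatenation of the heads of the nuclear daughters."""
    if not node.children:
        return [node.unit]
    result: List[int] = []
    for child in node.children:
        if child.nuclear:
            result.extend(head(child))
    return result


# Toy tree: units 2 and 4 are satellites, everything else is nuclear.
left = DNode(nuclear=True, children=[leaf(1, True), leaf(2, False)])
middle = DNode(nuclear=True, children=[leaf(3, True), leaf(4, False)])
root = DNode(nuclear=True, children=[left, middle, leaf(5, True)])

print(head(left), head(middle), head(root))
```

The satellites 2 and 4 never reach the root: they stop being projected at the level where their span is seen as a satellite.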

74

Veins

to understand a piece of text in the context of the whole discourse one needs the significant units within the span together with other surrounding units

Vein expression of a node: the sequence of units that are required to understand the span of text covered by the node, in the context of the whole discourse

Heads and veins

[Figure: a discourse tree over units 1 to 5 annotated with the head (H) and vein (V) expression of every node; marked units appear in parentheses, e.g. V=(1 2) 3]

On cohesion

Types of references

Evocative references
• evocative resolution processes:
  – an anaphor may be resolved to a referent that is not linearly the closest, but only hierarchically the closest
  – based on associations (pattern matching on morpho-semantic features)
  – fast
  – give fluency to the text

Post-evocative references
• post-evocative resolution processes:
  – are inferential processes developed in memory
  – computationally and cognitively slow (they impose a higher inference load)
  – require more powerful referencing means (like proper nouns)
  – are less frequent

Domain of evocative accessibility (DEA)

dea(u) = pref(u, vein(u))   (simplified)

Here pref(u, v) is the prefix of the sequence v that ends with u.

Reminder: the vein expression of a terminal node (discourse unit) is the sequence of units that are required to understand just that unit, in the context of the whole discourse.
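The simplified definition amounts to cutting the vein expression just after the unit itself. A short sketch, using the vein expressions of the five-unit bicycle example that appears later in these slides; the function name `dea` mirrors the formula, and marked (parenthesised) units are ignored here as a simplifying assumption.

```python
from typing import Dict, List


def dea(u: int, vein: List[int]) -> List[int]:
    """Prefix of the vein expression of u, up to and including u."""
    return vein[: vein.index(u) + 1]


# Vein expressions V(u) of the bicycle example (units 1 to 5).
veins: Dict[int, List[int]] = {
    1: [1, 3, 5],
    2: [1, 2, 3, 5],
    3: [1, 3, 5],
    4: [1, 3, 4, 5],
    5: [1, 3, 5],
}

deas = {u: dea(u, v) for u, v in veins.items()}
print(deas)
```

Unit 4, for example, can evocatively reach units 1 and 3 but not unit 2, which is not on its vein.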

From vein expressions...

[Figure: the tree over units 1 to 5 with the vein expressions of its terminal nodes: V=1 2 3, V=1 2 3, V=(1 2) 3, V=3 4 and V=3 5]

... to Domains of Evocative Accessibility

[Figure: the DEAs of units 1 to 5, obtained as prefixes of the vein expressions]

The reason why she can refer to Mary but not to John’s mother

1. John told Mary that he loves her.
2. He has never been married
3. and lived until his 40s with his mother.
4. She, on the contrary, was married twice.

[Figure: the tree over span 1÷4 (antithesis, elaboration, elaboration); the vein expression of unit 4 is V=1 2 4, which skips unit 3]

The reason why we recover the antecedent of it with difficulty

1. With one year before finishing his mandate as president of the company,
2. Mr. W. Ross has begun to bring about its bankruptcy.
3. There were rumors that he has obtained it by fraud.

[Figure: the tree over span 1÷3 (circumstance, background); the vein expression of unit 3 is V=2 3, which skips unit 1]

… while here the reference is immediate

1. Mr. W. Ross has begun to bring about the bankruptcy of his company.
2. with one year before finishing his mandate as president.
3. There were rumors that he has obtained it by fraud.

[Figure: the tree over span 1÷3 (circumstance, background); the vein expression of unit 3 is V=1 2 3]

Experiment 1: evocative vs post-evocative references

Source     No. of units   Total no. of refs   On the veins   Outside the veins
English    62             97                  91.70%         8.30%
French     48             110                 99.10%         0.90%
Romanian   66             111                 95.50%         4.50%
Total      176            318                 95.60%         4.40%

The 4.4% exceptions

Type of RE      VT
pragmatic       56.30%
proper nouns    22.70%
common nouns    16.00%
pronouns        5.00%

[Arrow beside the table: decreasing evoking power]

Experiment 2: potential to establish correct co-reference links

• Compare Linear-k and Discourse-VT-k models:
  – For each k, each re, and each model M (Linear or VT):

\[
p(M\text{-}k, re, DEA_k) =
\begin{cases}
1, & \text{if } re \text{ can be resolved to antecedents in } DEA_k \\
0, & \text{otherwise}
\end{cases}
\]

\[
p(M\text{-}k, Corpus) = \sum_{re \in Corpus} p(M\text{-}k, re, DEA_k)
\]

Potentials

[Figure: potential (70% to 95%) plotted against E-DEA size (0 to 9) for the VT-k and Linear-k models]

Experiment 3: the effort required to find antecedents

• Compare Linear-k and Discourse-VT-k models:
  – For each k, each re, and each model M (Linear or VT):

\[
e(M\text{-}k, re, DEA_k) =
\begin{cases}
d, & \text{where } d < k \text{ is the distance between } re \text{ and the closest antecedent in } DEA_k \\
k, & \text{if no such antecedent exists}
\end{cases}
\]

\[
e(M\text{-}k, Corpus) = \sum_{re \in Corpus} e(M\text{-}k, re, DEA_k)
\]
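The potential of Experiment 2 and the effort of Experiment 3 can be sketched together for a single referring expression. The slides do not spell out how DEAk is built, so two readings are assumed here and should be treated as such: Linear-k takes the k units immediately preceding u, VT-k takes the k nearest preceding units on the vein of u, and the distance d is the 0-based position of the closest antecedent unit in that list. All function names are hypothetical.

```python
from typing import Dict, List, Set


def linear_dea(u: int, k: int) -> List[int]:
    """Assumed Linear-k domain: the k units preceding u, nearest first."""
    return [v for v in range(u - 1, u - 1 - k, -1) if v >= 1]


def vt_dea(u: int, k: int, vein: Dict[int, List[int]]) -> List[int]:
    """Assumed VT-k domain: the k nearest units preceding u on its vein."""
    before = [v for v in vein[u] if v < u]
    return before[::-1][:k]


def potential(dea_k: List[int], antecedent_units: Set[int]) -> int:
    """1 if some unit of the domain holds an antecedent, 0 otherwise."""
    return 1 if any(v in antecedent_units for v in dea_k) else 0


def effort(dea_k: List[int], antecedent_units: Set[int], k: int) -> int:
    """Position of the closest antecedent unit in the domain, or k if none."""
    for d, v in enumerate(dea_k):
        if v in antecedent_units:
            return d
    return k


# Hypothetical referring expression in unit 4 whose only antecedent is in unit 1.
vein = {4: [1, 3, 4, 5]}
antecedents = {1}
k = 2

lin = linear_dea(4, k)
vt = vt_dea(4, k, vein)
print(lin, potential(lin, antecedents), effort(lin, antecedents, k))
print(vt, potential(vt, antecedents), effort(vt, antecedents, k))
```

Summing these per-expression values over all referring expressions of a corpus gives the corpus-level figures p(M-k, Corpus) and e(M-k, Corpus).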

91

Effort: an example

1. Michael D. Casey, a top Johnson&Johnson manager, moved to Genetic Therapy Inc., a small biotechnology concern here,
2. to become its president and chief operating officer.
3. Mr. Casey, 46 years old, was president of J&J's McNeil Pharmaceutical subsidiary,
4. which was merged with another J&J unit, Ortho Pharmaceutical Corp., this year in a cost-cutting move.
5. Mr. Casey succeeds M. James Barrett, 50, as president of Genetic Therapy.
6. Mr. Barrett remains chief executive officer
7. and becomes chairman.
8. Mr. Casey said
9. he made the move to the smaller company.

[Figure: the referring expressions of units 1 to 9 (Michael D. Casey, Mr. Casey, Genetic Therapy Inc., the smaller company, Johnson & Johnson, J&J, M. James Barrett, Mr. Barrett, its, its president, chairman, CEO) laid out along the unit axis]

Efforts

[Figure: effort (0 to 8000) plotted against E-DEA size (1 to 95) for the VT process and the linear process]

The account of VT on coherence

• Veins give a natural way to generalize Centering from local to global

94

Centering Rule 2: transitions

                  Cb(u) = Cb(u-1)   Cb(u) ≠ Cb(u-1)
Cb(u) = Cp(u)     CONTINUING        SMOOTH SHIFT
Cb(u) ≠ Cp(u)     RETAINING         ABRUPT SHIFT

CON > RET > SSH > ASH
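The transition table reduces to two equality tests. A minimal sketch, assuming centers are represented as plain strings and that a missing Cb is encoded as None; the function name `transition` is hypothetical.

```python
from typing import Optional


def transition(cb_prev: Optional[str], cb: Optional[str], cp: str) -> str:
    """Classify the transition into unit u.

    cb_prev is Cb(u-1), cb is Cb(u) and cp is Cp(u), the preferred center
    (the first element of the Cf list of u).
    """
    if cb is None:
        return "NO Cb"
    if cb == cb_prev:
        return "CONTINUING" if cb == cp else "RETAINING"
    return "SMOOTH SHIFT" if cb == cp else "ABRUPT SHIFT"


# Centers of the bicycle example read linearly: J = John, b = bicycle,
# B = Bill, p = price.
print(transition("J", "b", "B"))   # unit 1 -> unit 2
print(transition("b", "b", "J"))   # unit 2 -> unit 3
print(transition("b", "p", "p"))   # unit 3 -> unit 4
print(transition("p", None, "J"))  # unit 4 -> unit 5
```

With these inputs the four calls give ABRUPT SHIFT, RETAINING, SMOOTH SHIFT and NO Cb, in that order.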

Vein expressions give “lines of argumentation”

1. John sold his bicycle
2. although Bill would have wanted it
3. He obtained a good price for it,
4. which Bill could not have afforded
5. Therefore he decided to use the money to go on a trip.

[Figure: the tree over units 1 to 5 with vein expressions V=1 3 5 (units 1, 3 and 5), V=1 2 3 5 (unit 2) and V=1 3 4 5 (unit 4)]

Lines of argumentation

Along the vein 1 3 5:
1. John sold his bicycle
3. He obtained a good price for it,
5. Therefore he decided to use the money to go on a trip.

Along the vein 1 3 4 5:
1. John sold his bicycle
3. He obtained a good price for it,
4. which Bill could not have afforded
5. Therefore he decided to use the money to go on a trip.

Maximal vein expressions: argumentation lines (al)

u   V(u)      dea(u)   al
1   1 3 5     1
2   1 2 3 5   1 2      1 2
3   1 3 5     1 3
4   1 3 4 5   1 3 4    1 3 4
5   1 3 5     1 3 5    1 3 5

Evaluating the coherence of a discourse

• A smoothness score:
  – CONTINUING = 4
  – RETAINING = 3
  – SMOOTH SHIFT = 2
  – ABRUPT SHIFT = 1
  – NO Cb = 0
• A global smoothness score: summing up the scores of all units
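The scoring scheme is a lookup followed by a sum. The sketch below applies it to the two transition sequences of the bicycle example given in the following slides (linear reading and reading along the veins); the names `SCORES`, `global_score` and `average_score` are hypothetical.

```python
from typing import List

SCORES = {
    "CONTINUING": 4,
    "RETAINING": 3,
    "SMOOTH SHIFT": 2,
    "ABRUPT SHIFT": 1,
    "NO Cb": 0,
}


def global_score(transitions: List[str]) -> int:
    """Sum of the smoothness scores of all transitions."""
    return sum(SCORES[t] for t in transitions)


def average_score(transitions: List[str]) -> float:
    """Global score divided by the number of transitions."""
    return global_score(transitions) / len(transitions)


# Linear adjacency: 1->2, 2->3, 3->4, 4->5.
linear = ["ABRUPT SHIFT", "RETAINING", "SMOOTH SHIFT", "NO Cb"]
# Hierarchical adjacency along the veins: 1->2, 1->3, 3->4, 3->5.
hierarchical = ["ABRUPT SHIFT", "CONTINUING", "SMOOTH SHIFT", "CONTINUING"]

print(global_score(linear), average_score(linear))
print(global_score(hierarchical), average_score(hierarchical))
```

The averages, 1.5 for the linear reading and 2.75 along the veins, are the global values reported on the two slides that tabulate these transitions.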

The second conjecture (on coherence)

• The global smoothness score of a discourse when computed following VT is at least as high as the score computed following CT.
• But segments, as considered by Centering, typically are developed along veins.
• When passing segment frontiers, in a linear reading, transitions are usually abrupt.
• Therefore, what we claim here is that long-distance transitions, as computed along veins, are systematically smoother than accidental transitions at segment boundaries.

Transitions and scores on a linear adjacency metric

(J = [John], b = [John's bicycle], B = [Bill], p = [price], m = [the money], t = [a trip])

Unit    1      2      3         4      5
Cf      J, b   B, b   J, p, b   p, B   J, m, t
Cb      J      b      b         p      -
Trans          ASH    RET       SSH    No Cb
Score          1      3         2      0

Global: 6/4 = 1.5

Transitions and scores on a hierarchical adjacency metric

Unit    1      2
Cf      J, b   B, b
Cb      J      B
Trans          ASH
Score          1

Unit    1      3         4
Cf      J, b   J, p, b   p, B
Cb      J      J         p
Trans          CON       SSH
Score          4         2

Unit    1      3         5
Cf      J, b   J, p, b   p, m, t
Cb      J      J         J
Trans                    CON
Score                    4

Global: 11/4 = 2.75

Verifying the second conjecture

Source     No. of transitions   CT score   Average CT score per transition   VT score   Average VT score per transition
English    59                   76         1.25                              84         1.38
French     47                   109        2.35                              116        2.47
Romanian   65                   142        2.18                              152        2.34
Total      173                  327        1.89                              352        2.03

Chapter 4: Incremental Discourse Parsing

Work in collaboration with: Bonnie Webber, Ionut Pistol et al.

104

Incremental discourse parsing

The principle of sequentiality

• A left-to-right reading of the terminal frontier of the tree associated with a discourse must correspond to the span of text it analyses, in the same left-to-right order.

105

Incremental discourse parsing: a TAG-inspired approach

Adjoining to the right frontier

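[Figure: an auxiliary tree a, with foot node marked *, is adjoined at a node on the right frontier of the tree τ, yielding the tree τ’; subtrees are labelled σ0, σ1 and σa.]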

(Polanyi, 1988)
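A minimal sketch of adjoining on the right frontier, assuming binary trees whose leaves are discourse units. The `Node` class, the `depth` parameter and the function names are illustrative and do not come from the slides. Adjoining at any right-frontier node keeps the leaves in text order, which is what the principle of sequentiality requires.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaves(node: Node) -> List[str]:
    """Left-to-right reading of the terminal frontier."""
    if node.left is None and node.right is None:
        return [node.label]
    return leaves(node.left) + leaves(node.right)


def right_frontier(root: Node) -> List[Node]:
    """Nodes on the path from the root to the rightmost leaf."""
    path, node = [], root
    while node is not None:
        path.append(node)
        node = node.right
    return path


def adjoin(root: Node, depth: int, relation: str, unit: str) -> Node:
    """Adjoin a new unit at the right-frontier node found `depth` steps below the root."""
    if depth == 0:
        # The old subtree becomes the left child, the new unit the right child.
        return Node(relation, root, Node(unit))
    if root.right is None:
        raise ValueError("depth exceeds the length of the right frontier")
    return Node(root.label, root.left, adjoin(root.right, depth - 1, relation, unit))


tree = adjoin(Node("a"), 0, "EVIDENCE", "b")
high = adjoin(tree, 0, "R", "c")   # attach c above EVIDENCE
low = adjoin(tree, 1, "R", "c")    # attach c next to unit b
print(leaves(high), leaves(low))
```

Both attachment sites give the same terminal frontier a, b, c; they differ only in the structure built above it.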

106

Substitution in case of free expectations

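k. Although Bill would have wanted it,
k+1. John sold his bicycle to somebody else.

[Figure: in tree τ, unit k is followed by an expected node for unit k+1; in tree τ’, unit k+1 has been substituted at that node.]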

107

Expectations-driven incremental parsing

a. Clinton is bound to win the elections.
b. He is a natural born campaigner.
c. If you hold some position on an issue,
d. then if Clinton wants to get your vote,
e. he will assure you with great sincerity that he holds that position too.

[Figure: discourse tree over units a-e, with two EVIDENCE nodes and two ANT-CONS nodes.]

(Cristea and Webber, 1997)
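The substitution step can be sketched as follows. An open expectation is represented by a leaf labelled "?", and an incoming tree is substituted at the rightmost open expectation. The bracketing used in the example, EVIDENCE(EVIDENCE(a, b), ANT-CONS(c, ANT-CONS(d, e))), is one reading of the figure and should be taken as an assumption; the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

OPEN = "?"  # label of an open expectation


@dataclass
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def substitute(node: Node, subtree: Node) -> Tuple[Node, bool]:
    """Fill the rightmost open expectation with `subtree`; report whether one was found."""
    if node.label == OPEN:
        return subtree, True
    if node.right is not None:
        new_right, done = substitute(node.right, subtree)
        if done:
            return Node(node.label, node.left, new_right), True
    if node.left is not None:
        new_left, done = substitute(node.left, subtree)
        if done:
            return Node(node.label, new_left, node.right), True
    return node, False


def show(node: Node) -> str:
    """Bracketed rendering of a binary tree."""
    if node.left is None and node.right is None:
        return node.label
    return f"{node.label}({show(node.left)}, {show(node.right)})"


# State after unit c: "If ..." opens an expectation for a consequent.
tree = Node("EVIDENCE",
            Node("EVIDENCE", Node("a"), Node("b")),
            Node("ANT-CONS", Node("c"), Node(OPEN)))

# Unit d ("then if ...") fills the expectation and opens a new one.
tree, _ = substitute(tree, Node("ANT-CONS", Node("d"), Node(OPEN)))
# Unit e fills the remaining expectation.
tree, _ = substitute(tree, Node("e"))
print(show(tree))
```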

108

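Expectations-driven incremental parsing

a. Clinton is bound to win the elections.
b. He is a natural born campaigner.

[Figure: unit b arrives as an auxiliary tree EVIDENCE with foot node * and is adjoined to the tree consisting of unit a.]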

109

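Expectations-driven incremental parsing

a. Clinton is bound to win the elections.
b. He is a natural born campaigner.
c. If you hold some position on an issue,

[Figure: tree EVIDENCE over units a and b; unit c arrives as an auxiliary tree EVIDENCE with foot node *, containing an ANT-CONS node over c and an open expectation ?.]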

110

Expectations-driven incremental parsing

a. Clinton is bound to win the elections.
b. He is a natural born campaigner.
c. If you hold some position on an issue,
d. he will assure you with great sincerity that he holds that position too.

[Figure: tree over units a, b and c with two EVIDENCE nodes and an ANT-CONS node that has an open expectation ?; unit d is the next unit to attach.]

111

Expectations-driven incremental parsing

a. Clinton is bound to win the elections.
b. He is a natural born campaigner.
c. If you hold some position on an issue,
d. he will assure you with great sincerity that he holds that position too.

[Figure: tree over units a-d with two EVIDENCE nodes and an ANT-CONS node; unit d fills the expectation opened by c.]

112

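Expectations-driven incremental parsing

a. Clinton is bound to win the elections.
b. He is a natural born campaigner.
c. If you hold some position on an issue,
d. then if Clinton wants to get your vote,

[Figure: tree over units a, b and c with two EVIDENCE nodes and an ANT-CONS node that has an open expectation ?; unit d arrives as an ANT-CONS node with its own open expectation ?.]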

113

Expectations-driven incremental parsing

a. Clinton is bound to win the elections.
b. He is a natural born campaigner.
c. If you hold some position on an issue,
d. then if Clinton wants to get your vote,
e. he will assure you with great sincerity that he holds that position too.

[Figure: tree over units a-d with two EVIDENCE nodes and two ANT-CONS nodes; one expectation ? is still open when unit e arrives.]

114

Expectations-driven incremental parsing

a. Clinton is bound to win the elections.
b. He is a natural born campaigner.
c. If you hold some position on an issue,
d. then if Clinton wants to get your vote,
e. he will assure you with great sincerity that he holds that position too.

[Figure: final tree over units a-e with two EVIDENCE nodes and two ANT-CONS nodes; unit e fills the last open expectation.]

115

What can discourse markers tell about structure?

[Because John is such a generous man 1] [– whenever he is asked for money, 2] [he will give whatever he has, for example 3] [– he deserves the “Citizen of the year” award. 4]

Reproduced from (Cristea and Webber, 1998)

Marker pattern: because <something>, <something>

[Figure: candidate argument spans over units 1-4 for the marker "because -, -".]

(Marcu, 1997, 2000; Cristea et al., 2003, 2005)

116

What can discourse markers tell about structure?

[Because John is such a generous man 1] [– whenever he is asked for money, 2] [he will give whatever he has, for example 3] [– he deserves the “Citizen of the year” award. 4]

Marker pattern: whenever <something>, <something>

[Figure: candidate argument spans over units 2-4 for the marker "whenever -, -".]

117

What can discourse markers tell about structure?

[Because John is such a generous man 1] [– whenever he is asked for money, 2] [he will give whatever he has, for example 3] [– he deserves the “Citizen of the year” award. 4]

Marker pattern: <something>, <something> for example

[Figure: candidate argument spans over units 1-3 for the marker "-, - for example".]

118

What can discourse markers tell about structure?

[Because John is such a generous man 1] [– whenever he is asked for money, 2] [he will give whatever he has, for example 3] [– he deserves the “Citizen of the year” award. 4]

There are only two trees that can be obtained after considering all constraints:

because < >, < >          1 2 3 4
whenever < >, < >         2 3
< >, < > for example      1 2 3

because < >, < >          1 2 3 4
whenever < >, < >         2 3
< >, < > for example      2 3

[Figure: the two corresponding trees over units 1-4, with nodes labelled "because -, -", "whenever -, -" and "-, - for example".]
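One necessary condition for a set of marker spans to fit into a single tree is that no two spans cross: any two must be either disjoint or nested. The sketch below checks only this condition, which is a general property of trees rather than the author's full set of marker constraints. Spans are written as (first unit, last unit), and the function names are illustrative.

```python
def compatible(a, b):
    """Two spans (first, last) can coexist in one tree if they are disjoint or nested."""
    (a1, a2), (b1, b2) = a, b
    disjoint = a2 < b1 or b2 < a1
    nested = (a1 <= b1 and b2 <= a2) or (b1 <= a1 and a2 <= b2)
    return disjoint or nested


def tree_compatible(spans):
    """Check every pair of spans for crossing."""
    return all(
        compatible(a, b)
        for i, a in enumerate(spans)
        for b in spans[i + 1:]
    )


# Spans of because / whenever / for example in the two surviving analyses.
first = [(1, 4), (2, 3), (1, 3)]
second = [(1, 4), (2, 3), (2, 3)]
crossing = [(1, 3), (2, 4)]  # hypothetical counter-example

print(tree_compatible(first), tree_compatible(second), tree_compatible(crossing))
```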

119

The incremental generation of the first interpretation

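[Because John is such a generous man 1]
[– whenever he is asked for money, 2]

[Figure: unit 1 under the node "because -, -" with an open expectation ?; unit 2 arrives under the node "whenever -, -", with a foot node marked *.]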

120

The incremental generation of the first interpretation

[Because John is such a generous man 1]
[– whenever he is asked for money, 2]
[he will give whatever he has, for example 3]

[Figure: partial tree with units 1, 2 and 3 under the nodes "because -, -" and "whenever -, -"; one expectation ? is open.]

121

The incremental generation of the first interpretation

[Because John is such a generous man 1]
[– whenever he is asked for money, 2]
[he will give whatever he has, for example 3]
[– he deserves the “Citizen of the year” award. 4]

[Figure: partial tree with units 1, 2 and 3 under the nodes "because -, -" and "whenever -, -"; one expectation ? is open when unit 4 arrives.]

122

The incremental generation of the first interpretation

[Because John is such a generous man 1]
[– whenever he is asked for money, 2]
[he will give whatever he has, for example 3]
[– he deserves the “Citizen of the year” award. 4]

[Figure: tree over units 1-4 under the nodes "because -, -" and "whenever -, -", with unit 4 attached.]

123

The incremental generation of the second interpretation

[Because John is such a generous man 1]
[– whenever he is asked for money, 2]

[Figure: partial tree with units 1 and 2 under the nodes "because -, -" and "whenever -, -".]

124

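The incremental generation of the second interpretation

[Because John is such a generous man 1]
[– whenever he is asked for money, 2]
[he will give whatever he has, for example 3]

[Figure: partial tree with units 1 and 2 under the nodes "because -, -" and "whenever -, -"; unit 3 arrives under the node "-, - for example", with a foot node marked *.]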

125

The incremental generation of the second interpretation

[Because John is such a generous man 1]
[– whenever he is asked for money, 2]
[he will give whatever he has, for example 3]
[– he deserves the “Citizen of the year” award. 4]

[Figure: partial tree with units 1, 2 and 3 under the nodes "because -, -", "whenever -, -" and "-, - for example"; unit 4 arrives.]

126

The incremental generation of the second interpretation

[Because John is such a generous man 1]
[– whenever he is asked for money, 2]
[he will give whatever he has, for example 3]
[– he deserves the “Citizen of the year” award. 4]

[Figure: tree over units 1-4 under the nodes "because -, -", "whenever -, -" and "-, - for example", with unit 4 attached.]

127

How can references help in discovering the structure?

[Because John is such a generous man 1] [– whenever he is asked for money, 2] [he will give whatever he has, for example 3] [– he deserves the “Citizen of the year” award. 4]

[Figure: the two candidate trees over units 1-4, with nodes labelled "because -, -", "whenever -, -" and "-, - for example". One tree is annotated DEA = 1 2, DEA = 1 2 3, DEA = 1 4; the other DEA = 1 2, DEA = 2 3, DEA = 2 4. One tree is marked right, the other wrong.]

128

How can references help in discovering the structure?

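a. Because Mary was upset,
b. even if John agreed,
c. they didn’t speak to one another for several days.

[Figure: two candidate trees over units a, b and c. In one tree unit c has V = (a) c and DEA = a c; in the other unit c has V = (a)(b) c and DEA = a b c. One tree is marked right, the other wrong.]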
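The use of references as a filter can be sketched as a count of the anaphoric links that a candidate tree allows. The sketch assumes that "they" in unit c refers to both Mary (unit a) and John (unit b), and that an antecedent is reachable only if its unit belongs to the DEA of the anaphor's unit. The link list, dictionary keys and function name are illustrative.

```python
def reference_score(dea_by_unit, links):
    """Count the (anaphor unit, antecedent unit) links whose antecedent lies in the DEA."""
    return sum(1 for anaphor, antecedent in links if antecedent in dea_by_unit[anaphor])


# Assumed links: "they" in c points back to Mary (a) and to John (b).
LINKS = [("c", "a"), ("c", "b")]

# DEA of unit c in each candidate tree, as given in the figure.
CANDIDATES = {
    "DEA(c) = a c": {"c": {"a", "c"}},
    "DEA(c) = a b c": {"c": {"a", "b", "c"}},
}

scores = {name: reference_score(dea, LINKS) for name, dea in CANDIDATES.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Under these assumptions only the tree with DEA = a b c makes both antecedents of "they" accessible.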

129

Incremental parallel processing

[Figure: processing modules running in parallel: NP-chunker, AR-engine, segmentator, edts-builder, discourse parser, summarizer.]

(Cristea et al., 2005)

130

VT guides incremental discourse parsing

The tree resulting from parsing is the one that manifests:
– the most natural overall references over the discourse structure
– the smoothest overall CT transitions on veins

(Cristea, 2000; Cristea et al., 2003, 2005)

131

The discourse parser implements a "beam search"

[Figure: at each step N trees observing the markers' well-formedness constraints are kept; trees are evaluated by a reference score (cohesion) and a transitions score (coherence).]
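A generic beam search over partial trees might look like the sketch below. The callables `expand` (all well-formed ways of attaching the next unit) and `score` (for instance a combination of reference and transitions scores) are hypothetical placeholders, since the slides do not show how the parser implements them. The toy run at the end uses tuples of 0/1 choices in place of real trees.

```python
def beam_search(units, initial, expand, score, beam_width):
    """Keep the `beam_width` best partial analyses after each incoming unit."""
    beam = [initial]
    for unit in units:
        # Every tree in the beam proposes its well-formed extensions.
        candidates = [new for tree in beam for new in expand(tree, unit)]
        # Rank by score, best first, and prune to the beam width.
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]
    return beam


# Toy stand-ins: a "tree" is a tuple of attachment choices, scored by their sum.
def toy_expand(tree, unit):
    return [tree + (choice,) for choice in (0, 1)]


def toy_score(tree):
    return sum(tree)


result = beam_search(["u1", "u2", "u3"], (), toy_expand, toy_score, beam_width=2)
print(result[0])
```

A small beam width keeps the search cheap, at the risk of pruning an analysis that would only score well once later units arrive.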

132

VT references

Cristea, D., Ide, N., Romary, L. (1998): Veins Theory. An Approach to Global Cohesion and Coherence. In Proceedings of Coling/ACL ’98, Montreal

Cristea, D., Ide, N., Marcu, D., Tablan, M.-V. (2000): Discourse Structure and Co-Reference: An Empirical Study. In Proceedings of The 18th International Conference on Computational Linguistics, COLING'2000, Luxembourg

Ide, N., Cristea, D. (2000): A Hierarchical Account of Referential Accessibility. In Proceedings of The 38th Annual Meeting of the Association for Computational Linguistics, ACL'2000, Hong Kong

Seretan, V., Cristea, D. (2002): The use of referential constraints in structuring discourse. In Proceedings of The Third International Conference on Language Resources and Evaluation, LREC-2002, Las Palmas

Cristea, D. (2005): Motivations and Implications of Veins Theory. In B. Sharp (Ed.), Natural Language Understanding and Cognitive Science, Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science, NLUCS 2005, in conjunction with ICEIS 2005, Miami, U.S.A., May 2005, INSTICC Press