Computational Models of Discourse Analysis

Carolyn Penstein Rosé

Language Technologies Institute / Human-Computer Interaction Institute

Warm-Up Discussion

What is the distinction between personality, identity, and perspective? Does the distinction matter computationally?

How do they relate to one another as lenses for understanding social media data?

What do we take from today’s readings for assignment 4?

Personality

Identity

Perspective

Student Comment

At first the paper did not seem related to our task of identifying gender, but perhaps it shows that the way we see ourselves is extremely consistent. No matter how you ask the question, a subject will always give you an honest answer as to how they see themselves. This could mean that no matter how hard we try, we will sooner or later embed signals into our blog posts that indicate our perceived gender.

Student Comment

It seems that the importance of the "spiritual self" in presentation is the most important takeaway from this paper. 96% of users attempt to describe themselves with aspects of their "spiritual self" (i.e., perceived abilities). So focusing on these instead of the material or the social self might be better (although it's possible that one gender uses one of these sub-types significantly more than the other, which could also be handy, but we don't have that information).

Is this personality or identity? How would you expect it to relate to other online behavior?

Semester Review

Semester in Review

Unit 1: Theoretical Foundation

Unit 2: Linguistic Structure

Unit 3: Sentiment

Unit 4: Identity and Personality

Unit 5: Social Positioning

In each unit:

Readings from Discourse Analysis and Sociolinguistics

Readings from Language Technologies

Hands-on assignment: implementation and corpus-based experiment

Competitive error analysis

Student presentations

Building Tasks

According to Gee’s theory, whenever we speak or write, we are constructing 7 areas of reality

What we build: Significance, Practices, Identities, Relationships, Politics, Connections, Sign systems and knowledge

How we build them: Social languages, Socially situated identities, Discourses, Conversations, Figured worlds, Intertextuality

What We Build

Significance: things and people made more or less significant through the text

Practices: ritualized activities and how they are being enacted through the text (for example, lecturing or mentoring)

Identities: the manner in which things and people are being cast in a role through the text

Relationships: the style of social relationship, such as the level of formality

Politics: how "social goods" are being distributed, who is responsible for the flow, and where it is going

Connections: connections and disconnections between things and people, e.g., what ideas are related, how things are causally connected, and what is affecting what

Sign Systems and Knowledge: languages, social languages, and ways of knowing; which ways of communicating and knowing are treated as standard and acceptable in the context, e.g., that you're expected to speak English in class

Imagine an environmentalist commercial:

Discourse: Environmentalism

Conversation: Global Warming

Discourse: Status Quo

Socially Situated Identity: Environmentalist

Social Language: Liberal rhetoric

Figured World: Expected structure of a conservationist commercial

Form-Function Correspondence: Range of meanings for the word "sustainability"

Situated Meaning: Meaning of "sustainability" in the commercial

Computationalizing Gee?

Challenge: the theory is not variationist

Form-function correspondences can be modeled naturally through rules

Cells of the table as feature extractors? Social languages as topic models? Figured worlds related to "social causality"?
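As one illustration of modeling form-function correspondences through rules, here is a minimal Python sketch. The patterns and function labels are hypothetical choices made for this example, not a rule set from the course. The first rule shows the key idea: an utterance that is interrogative in form ("Could you...") is mapped to the function of a request rather than a question.

```python
import re

# Hypothetical form-function rules: each pairs a surface-form pattern with the
# discourse function it is assumed to signal. The first matching rule wins,
# so more specific forms (indirect requests) precede general ones (questions).
RULES = [
    (re.compile(r"^(could|would|can|will) you\b", re.IGNORECASE), "request"),
    (re.compile(r"^(what|why|how|when|where|who|which)\b", re.IGNORECASE), "wh-question"),
    (re.compile(r"\?$"), "yes/no question"),
    (re.compile(r"^(let's|please)\b", re.IGNORECASE), "directive"),
]


def classify(utterance: str) -> str:
    """Return the function of the first rule whose form matches, else 'assertion'."""
    text = utterance.strip()
    for pattern, function in RULES:
        if pattern.search(text):
            return function
    return "assertion"


for u in ["Could you check the stress value?", "Is the wrench safe?", "Stress doesn't determine ease of use."]:
    print(f"{u} -> {classify(u)}")
```

Rules like these are transparent and easy to debug, but each one captures only a single form, so coverage grows only as fast as someone writes new rules.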

Metafunctions

What is a system?

Computationalizing SFL? See Elijah's ACL paper! We had to REALLY simplify to get there.

Not clear how to do that for Heteroglossia yet.
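In SFL, a system is a set of mutually exclusive options (features) that becomes available once its entry condition has been chosen, and choosing an option can open further, more delicate systems. The sketch below encodes a simplified MOOD network (indicative vs. imperative, declarative vs. interrogative, polar vs. wh-) as a Python dictionary. It is an assumption-laden simplification: it models only disjunctive choices and leaves out simultaneous systems, which gives a small taste of the simplification needed to computationalize SFL.

```python
# Simplified MOOD system network: each key is an entry condition (a feature),
# and its value lists the mutually exclusive options that it makes available.
MOOD_NETWORK: dict[str, list[str]] = {
    "clause": ["indicative", "imperative"],
    "indicative": ["declarative", "interrogative"],
    "interrogative": ["polar", "wh"],
}


def paths(feature: str = "clause") -> list[list[str]]:
    """Enumerate every complete selection path starting from `feature`."""
    options = MOOD_NETWORK.get(feature)
    if not options:  # terminal feature: no further system to enter
        return [[feature]]
    return [[feature] + tail for option in options for tail in paths(option)]


def is_valid(selection: list[str], root: str = "clause") -> bool:
    """A selection is valid if it traces one complete path through the network."""
    return selection in paths(root)


for p in paths():
    print(" -> ".join(p))
```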

Computational Techniques

Text entailment / similarity measures / paraphrase / constraint relaxation

Topic models

Machine learning techniques: bootstrapping, HMMs, other statistical modeling techniques

Basic features: unigrams, bigrams, POS bigrams, acoustic and prosodic features (speech)

Created features: dictionaries, templates, syntactic dependency relations
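Several of the basic and created features listed above, plus one simple similarity measure, can be sketched in a few lines of Python: unigram and bigram counts, a single dictionary-based feature, and cosine similarity between two texts. The hedge word list is a hypothetical toy dictionary, and POS bigrams are omitted because they would require a part-of-speech tagger.

```python
import math
import re
from collections import Counter

# Hypothetical toy dictionary for a "created" feature; a real study would use
# a curated lexicon instead of this short list.
HEDGE_WORDS = {"maybe", "perhaps", "might", "possibly", "probably"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def extract_features(text: str) -> Counter:
    """Count unigram, bigram, and dictionary features for one text."""
    tokens = tokenize(text)
    feats = Counter(f"uni={t}" for t in tokens)
    feats.update(f"bi={a}_{b}" for a, b in zip(tokens, tokens[1:]))
    feats["dict=hedge"] = sum(t in HEDGE_WORDS for t in tokens)
    return feats


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature vectors (0.0 if either is all zeros)."""
    dot = sum(v * b[k] for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


a = extract_features("Maybe the wrench might break")
b = extract_features("The wrench might bend")
print(round(cosine(a, b), 3))
```

Prefixing each feature name with its type keeps unigrams, bigrams, and dictionary counts in one sparse vector without collisions.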

Basic Aspects of Discourse Structure Are Easiest to Model

Turn taking

Topic segments

Speech acts (at least direct ones)
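Topic segments illustrate why these aspects are comparatively easy to model: lexical cohesion alone goes a long way. The following is a simplified sketch in the spirit of TextTiling (Hearst's lexical-cohesion method), not a faithful reimplementation. It compares word counts in windows on either side of each sentence gap and places a boundary at local minima of cosine similarity that fall below an assumed threshold.

```python
import math
import re
from collections import Counter


def bag(sentences: list[str]) -> Counter:
    """Word counts for a window of sentences."""
    return Counter(w for s in sentences for w in re.findall(r"[a-z']+", s.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def segment(sentences: list[str], window: int = 1, threshold: float = 0.1) -> list[int]:
    """Return indices i such that a topic boundary falls just before sentences[i].

    A gap is a boundary if its similarity is below `threshold` (an assumed
    value) and is no higher than the similarity at the neighboring gaps.
    """
    gaps = list(range(window, len(sentences) - window + 1))
    scores = [cosine(bag(sentences[i - window:i]), bag(sentences[i:i + window])) for i in gaps]
    boundaries = []
    for j, i in enumerate(gaps):
        is_min_left = j == 0 or scores[j] <= scores[j - 1]
        is_min_right = j == len(scores) - 1 or scores[j] <= scores[j + 1]
        if scores[j] < threshold and is_min_left and is_min_right:
            boundaries.append(i)
    return boundaries


turns = [
    "the wrench turns the bolt",
    "a longer wrench turns the bolt easily",
    "our team chat was fun today",
    "the team chat had many jokes",
]
print(segment(turns))  # [2]: the topic shifts before the third turn
```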

More recent computational work focuses on more challenging “discoursey” problems like sentiment and stance

Some recent work on metaphors (related to frames), but not applied to discourse level problems

Problems

Labels in public datasets don't necessarily match the theory

Computational approaches embody variationist assumptions, but much of the theory is grounded in a more contextualized view of meaning making

Lack of a fully satisfying operationalization of style (style is hard to separate from content)

Grammatical metaphor and other indirect strategies

The same effect can be achieved in so many ways that each technique captures only one slice, so you're always just grasping at a glimpse of what's there

Overfitting to spurious correlations

"Subpopulations" leading to problems with generalization

Similar variation arising from numerous different factors (gender, age, SES)

Features at too low a level: words serving multiple purposes simultaneously

Engagement and Personality?

How would you expect an Engagement style analysis to relate to personality?

• What effect would you expect to see on conversations?

• Are these necessarily connected?

Freshman Engineering Study

131 freshman engineering students worked in groups of 3 or 4 to design a better wrench, applying principles related to stress and leverage

Procedure

Tutorial on computer-aided engineering

Pretest

Collaborative design activity

Posttest

Questionnaire

Heteroglossia Manipulation

Social Manipulation

Tutor Agent Design

[Figure: component architecture of the tutor agent. Components shown: ConcertChat Server, ConcertChatListener, ConcertChatActor, RequestDetector, T.TakingCoordinator, TutoringManager, OutputCoordinator, TutoringActor, PromptingActor, MessageFilter, DiscourseMemory, PresenceFilter, PromptingManager, IntroductionsActor, IntroductionsManager, ActivityDetector, AnnotationFilter, PlanExecutor, SocialController, ProgressDetector.]

Tutor: One last thing on this topic, Does more (or less) stress in a wrench make it easier to use?

S95: no change?

Tutor: You are correct about that.. Stress doesn't determine ease of use.

Tutor: It's the moment achieved by the wrench that determines the ease of use.

S89: yay!

Tutor: It's good to have your contributions in the discussion Jackie :-)

Tutor: Go team :-)

S89: Go team yay

Tutor: I am happy to be working with our team

S89: Me too

S95: whoa the bot knows my name

Tutor: Unfortunately maximum stress (12800) in our design1 is way above the maximum allowed stress (i.e. 8750)

Tutor: This wrench cannot be safely used!

Kumar, R., & Rosé, C. P. (2011). Architecture for building conversational agents that support collaborative learning. IEEE Transactions on Learning Technologies, special issue on Intelligent and Innovative Support Systems for Computer Supported Collaborative Learning.
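The architecture cited above builds the agent from many small components that react to one another's events. The sketch below is a hypothetical, heavily simplified illustration of that style using a tiny publish/subscribe bus in Python. The component names echo the figure, but their behavior, the event types ("message", "request", "output"), and the help-request trigger are all assumptions for illustration, not the actual system's API.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class EventBus:
    """Routes typed events from publishing components to subscribed ones."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict[str, Any]) -> None:
        for handler in self._handlers[event_type]:
            handler(payload)


def build_agent(bus: EventBus, history: list[str], outbox: list[str]) -> None:
    """Wire a toy pipeline loosely named after components in the figure (hypothetical behavior)."""

    def discourse_memory(msg: dict[str, Any]) -> None:  # remembers every chat message
        history.append(msg["text"])

    def request_detector(msg: dict[str, Any]) -> None:  # assumed trigger: a student asks for help
        if "help" in msg["text"].lower():
            bus.publish("request", msg)

    def tutoring_manager(msg: dict[str, Any]) -> None:  # decides what the tutor should say
        bus.publish("output", {"text": f"Let's work on it together, {msg['sender']}."})

    def output_coordinator(msg: dict[str, Any]) -> None:  # single exit point for tutor turns
        outbox.append(msg["text"])

    bus.subscribe("message", discourse_memory)
    bus.subscribe("message", request_detector)
    bus.subscribe("request", tutoring_manager)
    bus.subscribe("output", output_coordinator)


bus = EventBus()
history: list[str] = []
outbox: list[str] = []
build_agent(bus, history, outbox)
bus.publish("message", {"sender": "S95", "text": "Can someone help with the stress value?"})
print(outbox)
```

Routing every tutor turn through one output component is a design choice that makes it possible to space out or suppress agent contributions in a single place.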

Results on Breadth of Coverage of Design Space

Significant main effect of Heteroglossia on number of ideas mentioned: Heteroglossia was better than Monoglossia and Neutral

Significant interaction: in the Social condition, Monoglossia was worse than the other two

Results on Perception

Students were significantly happier with the interaction in the Heteroglossia condition than in the Neutral condition, with Monoglossia in the middle

Students liked the Heteroglossic and Monoglossic agents better than the Neutral agent

Students in the Heteroglossia condition felt marginally more successful than students in the Monoglossia condition

No effect on Personality indicators such as Pushy, Wishy Washy, etc.

Does that mean that impression of personality and how you feel about an interaction with someone are not linked?

Student Comment

I would also note that English is a very gender-neutral language, so gender performativity is harder to classify.

Engagement

Already established: positioning a proposition

But can it also be primarily positioning between people?

Patterns of positioning propositions as having the same or different alignment between speaker and hearer could do this

Connection between Heteroglossia and Attitude

But is this really different from a disclaim?

And is this really different from a proclaim?
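One way to make these Engagement questions concrete is a lexicon-based tagger that marks which heteroglossic moves (disclaim, proclaim, entertain, attribute) a sentence contains and treats a sentence with no cues as monoglossic. The cue lists below are illustrative assumptions loosely following the Engagement categories, not a validated resource. A cue lexicon also cannot decide by itself whether a move positions a proposition or positions the people in the conversation.

```python
import re

# Illustrative cue lexicon (an assumption, not a validated resource) loosely
# following the Engagement categories discussed in this unit.
ENGAGEMENT_CUES = {
    "disclaim": [r"\bnot\b", r"n't\b", r"\bcannot\b", r"\bnever\b", r"\bbut\b", r"\bhowever\b", r"\balthough\b"],
    "proclaim": [r"\bof course\b", r"\bobviously\b", r"\bclearly\b", r"\bindeed\b", r"\bcertainly\b"],
    "entertain": [r"\bmaybe\b", r"\bperhaps\b", r"\bmight\b", r"\bprobably\b", r"\bi think\b"],
    "attribute": [r"\baccording to\b", r"\bsaid\b", r"\bclaims?\b"],
}
COMPILED = {
    category: [re.compile(p, re.IGNORECASE) for p in patterns]
    for category, patterns in ENGAGEMENT_CUES.items()
}


def engagement(sentence: str) -> set[str]:
    """Return the Engagement categories whose cues appear in the sentence."""
    return {cat for cat, pats in COMPILED.items() if any(p.search(sentence) for p in pats)}


def is_monoglossic(sentence: str) -> bool:
    """Treat a bare assertion (no Engagement cues at all) as monoglossic."""
    return not engagement(sentence)


for s in ["Stress doesn't determine ease of use.", "Of course the moment determines ease of use.",
          "Maybe a longer handle would help.", "The moment determines ease of use."]:
    print(s, sorted(engagement(s)) or "monoglossic")
```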

Hedging and Occupation?

And as such, I believe hedging is a much more effective tool for showing generational or occupational differences than gender differences. For example, teenagers often use verbs such as 'like' and 'all' to report speech: he was all 'that's stupid' and then he was like 'but I'm stupid too'. I would attribute the occupational differences to the differences between people who need exact values and people who can accept generalizations or approximations.
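To check a claim like this against a corpus, one could start by counting the quotative uses of "like" and "all" described in the comment. This is a rough sketch built around a hypothetical regular expression: it looks for a form of "be" followed by "like" or "all", so it will over-match phrases such as "it was all gone" or "this is like a lever", and any counts would need manual checking.

```python
import re

# Hypothetical pattern: a form of "be" followed by quotative "like" or "all".
# It over-matches non-quotative uses, so matches should be spot-checked.
QUOTATIVE = re.compile(r"\b(?:am|is|are|was|were|be|been|being)\s+(?:like|all)\b", re.IGNORECASE)


def quotative_rate(text: str) -> float:
    """Quotative matches per 100 word tokens (0.0 for empty text)."""
    tokens = re.findall(r"\w+", text)
    return 100 * len(QUOTATIVE.findall(text)) / len(tokens) if tokens else 0.0


example = "he was all 'that's stupid' and then he was like 'but I'm stupid too'"
print(QUOTATIVE.findall(example))
print(round(quotative_rate(example), 1))
```

Normalizing by token count lets rates be compared across authors who write posts of very different lengths.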

Questions?
