Computational Models of Discourse Analysis
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
Warm-Up Discussion
What is the distinction between personality, identity, and perspective? Does the distinction matter computationally?
How do they relate to one another as lenses for understanding social media data?
What do we take from today’s readings for assignment 4?
Personality
Identity
Perspective
Student Comment
At first the paper did not seem related to our task of identifying gender, but perhaps this paper shows that the way we see ourselves is extremely consistent. No matter how you ask the question, a subject will always give you an honest answer as to how they see themselves. This could mean that no matter how hard we try, we will sooner or later embed signals into our blog posts that indicate our perceived gender.
Student Comment
It seems that the importance of "spiritual self" in presentation is the most important takeaway from this paper. 96% of users attempt to describe themselves with aspects of their "spiritual self" (i.e., perceived abilities). So focusing on these instead of the material or the social might be better (although it's possible that a particular gender uses one of these sub-types significantly more than another, which could also be handy, but we don't have that information).
Is this personality or identity? How would you expect it to relate to other online behavior?
Semester Review
Semester in Review
Unit 1: Theoretical Foundation
Unit 2: Linguistic Structure
Unit 3: Sentiment
Unit 4: Identity and Personality
Unit 5: Social Positioning
In each Unit:
Readings from Discourse Analysis and Sociolinguistics
Readings from Language Technologies
Hands-on assignment
Implementation and corpus-based experiment
Competitive error analysis
Student Presentations
Building Tasks
According to Gee’s theory, whenever we speak or write, we are constructing 7 areas of reality
What we build: Significance, Practices, Identities, Relationships, Politics, Connections, Sign systems and knowledge
How we build them: Social languages, Socially situated identities, Discourses, Conversations, Figured worlds, intertextuality
What we Build
Significance: things and people made more or less significant through the text
Practices: ritualized activities and how they are being enacted through the text (for example, lecturing or mentoring)
Identities: manner in which things and people are being cast in a role through the text
Relationships: style of social relationship, like level of formality
Politics: how “social goods” are being distributed, who is responsible for the flow, where is it going
Connections: connections and disconnections between things and people, e.g., what ideas are related, how are things causally connected, what is affecting what?
Sign Systems and Knowledge: languages, social languages, and ways of knowing, what ways of communicating and knowing are treated as standard and acceptable in the context, e.g., that you’re expected to speak in English in class
Discourse: Environmentalism
Conversation: Global Warming
Discourse: Status Quo
Socially Situated Identity: Environmentalist
Social Language: Liberal rhetoric
Figured World: Expected structure of Conservationist Commercial
Form-Function Correspondence: Range of meanings for the word “sustainability”
Situated Meaning: Meaning of “sustainability” in the commercial
Imagine an environmentalist commercial
Computationalizing Gee?
Challenge: not variationist
Form-function correspondences can be modeled naturally through rules
Cells of table like feature extractors?
Social Languages like topic models?
Figured worlds related to “social causality”
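One way to picture "form-function correspondences modeled through rules" is a small lookup table plus a context check that picks a situated meaning. The sketch below is only an illustration under stated assumptions: the meanings listed for “sustainability”, the cue words, and the function name situated_meaning are all hypothetical and do not come from Gee or from the course materials.

```python
# Hypothetical rule table: each surface form maps to the range of
# functions (meanings) it can serve. All entries are invented examples.
FORM_FUNCTION = {
    "sustainability": [
        "environmental protection",
        "economic viability",
        "long-term maintenance",
    ],
}

# Hypothetical context cues that make one meaning more plausible than another.
CONTEXT_CUES = {
    "environmental protection": {"planet", "climate", "forest"},
    "economic viability": {"profit", "budget", "growth"},
    "long-term maintenance": {"upkeep", "repair", "schedule"},
}


def situated_meaning(form, context_words):
    """Return the candidate meaning with the most cue words in the context.

    Returns None when the form is unknown or no cue word is present.
    Ties are broken by the order of the rule table.
    """
    context = {w.lower() for w in context_words}
    best_meaning, best_score = None, 0
    for meaning in FORM_FUNCTION.get(form, []):
        score = len(CONTEXT_CUES.get(meaning, set()) & context)
        if score > best_score:
            best_meaning, best_score = meaning, score
    return best_meaning


if __name__ == "__main__":
    words = "Protect the planet and the climate".split()
    print(situated_meaning("sustainability", words))
```

The table plays the role of the form-function correspondence (the full range of meanings), while the function approximates the situated meaning in one text. A rule set like this is brittle by design, which is part of the challenge the slide raises.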
Metafunctions
What is a system?
Computationalizing SFL?
See Elijah’s ACL paper!
We had to REALLY simplify to get there
Not clear how to do that for Heteroglossia yet
Computational Techniques
Text entailment / similarity measures / paraphrase / constraint relaxation
Topic models
Machine Learning Techniques: bootstrapping, HMMs, other statistical modeling techniques
Basic features: unigrams, bigrams, POS bigrams, acoustic and prosodic features (speech)
Created features: dictionaries, templates, syntactic dependency relations
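As a rough illustration of the "basic features" named above, the following sketch counts unigrams, bigrams, and POS bigrams for one text. It is a minimal sketch, not the course's tooling: TOY_POS is a hypothetical stand-in for a real part-of-speech tagger, and the whitespace tokenizer and feature-name prefixes are assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real POS tagger.
TOY_POS = {
    "the": "DET",
    "wrench": "NOUN",
    "is": "VERB",
    "not": "ADV",
    "safe": "ADJ",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def ngrams(items, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def extract_features(text):
    """Count unigram, bigram, and POS-bigram features for one text."""
    tokens = tokenize(text)
    tags = [TOY_POS.get(tok, "UNK") for tok in tokens]
    feats = Counter()
    for tok in tokens:
        feats["uni=" + tok] += 1
    for a, b in ngrams(tokens, 2):
        feats["bi=" + a + "_" + b] += 1
    for a, b in ngrams(tags, 2):
        feats["posbi=" + a + "_" + b] += 1
    return feats


if __name__ == "__main__":
    for name, count in sorted(extract_features("The wrench is not safe").items()):
        print(name, count)
```

The resulting counts could be fed to any standard classifier. Acoustic and prosodic features are left out because they require speech data rather than text.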
Basic Aspects of Discourse Structure are Easiest to Model
Turn taking
Topic segments
Speech acts (at least direct ones)
More recent computational work focuses on more challenging “discoursey” problems like sentiment and stance
Some recent work on metaphors (related to frames), but not applied to discourse level problems
Problems
Labels in public datasets don’t necessarily match the theory
Computational approaches embody variationist assumptions, but much of the theory is grounded in a more contextualized view of meaning making
Lack of a fully satisfying operationalization of style (style is hard to separate from content)
Grammatical metaphor and other indirect strategies
Same effect can be achieved in so many ways; each technique only captures one slice, so you’re always just grasping a glimpse of what’s there
Overfitting: spurious correlations, “subpopulations” leading to problems with generalization
Similar variation arising due to numerous different factors (gender, age, SES)
Features at too low a level: words serving multiple purposes simultaneously
Engagement and Personality?
How would you expect an Engagement style analysis to relate to personality?
• What effect would you expect to see on conversations?
• Are these necessarily connected?
Freshman Engineering Study
131 Freshman engineering students worked in groups of 3 or 4 to design a better wrench
Applying principles related to stress and leverage
Procedure
Tutorial on computer aided engineering
Pretest
Collaborative design activity
Posttest
Questionnaire
Heteroglossia Manipulation
Social Manipulation
Tutor Agent Design
[Figure: tutor agent architecture diagram. Component labels: RequestDetector, T.TakingCoordinator, TutoringManager, OutputCoordinator, ConcertChatActor, ConcertChatListener, ConcertChat Server, TutoringActor, PromptingActor, MessageFilter, DiscourseMemory, PresenceFilter, PromptingManager, IntroductionsActor, IntroductionsManager, ActivityDetector, AnnotationFilter, PlanExecutor, SocialController, ProgressDetector]
Tutor: One last thing on this topic, Does more (or less) stress in a wrench make it easier to use?
S95: no change?
Tutor: You are correct about that.. Stress doesn't determine ease of use.
Tutor: It's the moment achieved by the wrench that determines the ease of use.
S89: yay!
Tutor: It's good to have your contributions in the discussion Jackie :-)
Tutor: Go team :-)
S89: Go team yay
Tutor: I am happy to be working with our team
S89: Me too
S95: whoa the bot knows my name
Tutor: Unfortunately maximum stress (12800) in our design1 is way above the maximum allowed stress (i.e. 8750)
Tutor: This wrench cannot be safely used!
Kumar, R. & Rosé, C. P. (2011). Architecture for building Conversational Agents that support Collaborative Learning, IEEE Transactions on Learning Technologies special issue on Intelligent and Innovative Support Systems for Computer Supported Collaborative Learning
Results on Breadth of Coverage of Design Space
Significant main effect of Heteroglossia on number of ideas mentioned
Heteroglossia was better than Monoglossia and Neutral
Significant interaction
In the Social condition, Monoglossia was worse than the other two
Results on Perception
Students were significantly happier with the interaction in the Heteroglossia condition than Neutral, with Monoglossia in the middle
Students liked the Heteroglossic and Monoglossic agents better than the Neutral agent
Students in the Heteroglossia condition felt marginally more successful than students in the Monoglossia condition
No effect on Personality indicators such as Pushy, Wishy Washy, etc.
Does that mean that impression of personality and how you feel about an interaction with someone are not linked?
Student Comment
I would also note that English is a very gender neutral language, so gender performativity is harder to classify.
Engagement
Already established: Positioning a proposition
But can it also be primarily positioning between people?
Patterns of positioning propositions as having the same or different alignment between speaker and hearer could do this
Is positioning in communication always positioning by means of propositional content?
Connection between Heteroglossia and Attitude
But is this really different from a disclaim?
And is this really different from a proclaim?
Hedging and Occupation?
And as such, I believe hedging is a much more effective tool in showing generational or occupational differences rather than gender differences. For example, teenagers often use verbs such as 'like' and 'all' to report speech: he was all 'that's stupid' and then he was like 'but I'm stupid too'. The occupational differences I would attribute to the differences between people who need exact values as opposed to people who can accept generalizations or approximations.
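The quotative uses of 'like' and 'all' in the comment above could be captured as a simple template feature. The sketch below is a hypothetical example, not something from the course: the pattern QUOTATIVE and the function quotative_counts are assumptions, and the pattern will also match non-quotative uses such as "he was like his father", so it would need refinement before use in a real experiment.

```python
import re
from collections import Counter

# Hypothetical template: a form of "be" followed by quotative "like" or "all".
# This over-matches (e.g. "was like his father"); it is only a sketch.
QUOTATIVE = re.compile(r"\b(?:was|were|is|am|are)\s+(like|all)\b", re.IGNORECASE)


def quotative_counts(text):
    """Count candidate quotative 'like' and 'all' constructions in a text."""
    return Counter(match.group(1).lower() for match in QUOTATIVE.finditer(text))


if __name__ == "__main__":
    sample = "he was all 'that's stupid' and then he was like 'but I'm stupid too'"
    print(quotative_counts(sample))
```

Counts like these could serve as one "created feature" when testing whether such constructions separate age groups or occupations better than they separate genders.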
Questions?