TagHelper and InfoMagnets Technologies for Exploring the Effect of Language Interactions in Learning

Carolyn Penstein Rosé, Jaime Arguello, Yue Cui, Rohit Kumar, Emil Albright, Hao-Chuan Wang, Pinar Donmez, Cammie Williams, William Cohen

Language Technologies Institute / Human-Computer Interaction Institute / Machine Learning Department
Carnegie Mellon University


Tools for DataMining for Corpus Data

• TagHelper
  – Supervised text classification technology
  – Analysts define categories and tag example texts
  – Algorithms learn to generalize from training examples
  – Trained models assign categories to untagged data
  – Supports categorical corpus analysis

• InfoMagnets
  – Automatic initial segmentation and topic analysis
  – Interactive reclustering
  – Supports exploratory corpus analysis / sense making
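The supervised workflow listed under TagHelper (tag examples, train, then label untagged text) can be illustrated with a minimal sketch. The slides do not say which learning algorithms TagHelper uses, so the multinomial Naive Bayes model below is an assumption chosen for brevity. The class name `NaiveBayesTagger`, the two categories, and the training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class NaiveBayesTagger:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only,
    not TagHelper's actual implementation)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-tagged examples supplied by an analyst.
train_texts = [
    "this happens because the force is balanced",
    "it falls because gravity pulls it down",
    "the text says the ball falls",
    "the passage says forces are described",
]
train_labels = ["explanation", "explanation", "summary", "summary"]

tagger = NaiveBayesTagger().fit(train_texts, train_labels)

# Assign a category to text the analyst has not tagged.
predicted = tagger.predict("it moves because the force pulls it")
print(predicted)
```

With only four training sentences the model is a toy. In practice the analyst would tag many more examples and check the predictions against held-out hand-coded data.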

Outline for this Demo Session

• Conceptual overview of data mining from corpus data (15 minutes)
  – What is it and what can you do with it?

• Demo of TagHelper (10 minutes)

• TagHelper activity (10 minutes)
  – Goal: Learn to train simple TagHelper classifiers to use in TuTalk dialogues

• Demo of InfoMagnets (10 minutes)

• InfoMagnets activity (15 minutes)
  – Goal: Gain experience with basic topic analysis technology

What is DataMining from Corpus Data?

• Goal: data reduction - identify meaningful patterns in freeform corpus data
  – Test recall versus recognition
  – Verify the effect of a manipulation on interaction or thought processes
  – Correlational analyses used for hypothesis formation

• Relevant Contexts: tutorial dialogue, collaborative learning, self-explanation, think aloud protocols

What is DataMining from Corpus Data?

• Basic Approach: transform freeform corpus data into a formal structure that can be analyzed using quantitative methods
  – Rating scales: e.g., depth of an explanation
  – Categorical Coding: e.g., self-explanation versus summarization

• Caveat: corpus analysis can be subjective
  – Reliability standards mitigate the risk of subjectivity in judgments
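One common way to quantify reliability for categorical coding is chance-corrected agreement between two coders. The slides do not name a specific statistic, so the use of Cohen's kappa here is an assumption, and the codes assigned by the two coders below are invented for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Proportion of items on which the two coders chose the same category.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's category frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes: SE = self-explanation, SUM = summarization.
coder_a = ["SE", "SE", "SUM", "SE", "SUM", "SUM", "SE", "SUM"]
coder_b = ["SE", "SE", "SUM", "SUM", "SUM", "SUM", "SE", "SE"]

kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

Here the coders agree on 6 of 8 items (0.75), chance agreement is 0.5, and kappa is therefore 0.5. Raw percent agreement alone would overstate how well the coding scheme is being applied.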

Motivation for TagHelper Project

• Social scientists, psychologists, and educational scientists code large quantities of corpus data by hand

• Tools currently used by behavioral researchers do not support the decision making of human coders
  – e.g., MacSHAPA, NVivo, HyperResearch, etc.

• Text classification technology can predict codes automatically to support data analysis tasks

• Automatic classification technology can also trigger on-line interventions in real time or process freeform student input

Example Research Context: Learning in On-Line Discussions

Knowledge Media Research Center, Tuebingen, Germany

Example Coding Scheme for Analyzing Collaborative Learning Interactions

• Original German: "Es ist seine Faulheit, aber mangelndes Talent würde auch passen"

• Translation: "It is his laziness, but lack of talent would also fit"

• Social Modes Code: Integration-Oriented Consensus Building

• Original German: "Es ist seine Faulheit, mangelndes Talent würde weniger passen"

• Translation: "It is his laziness, lack of talent would fit less well"

• Social Modes Code: Conflict-Oriented Consensus Building

Dimension                 Number of Classes
Epistemic (EPI)           35
Micro (ATOL)              4
Macro (ALEI)              6
Social Modes (SOC)        21
Reaction (REA)            3
Appropriateness (PRO)     4
Quoted (QUO)              2

Knowledge

Language

Relational Style

* Training + Coding by hand requires 25% of project resources!

Using Automatic Coding in an On-Line Intervention

Other TagHelper/InfoMagnets Applications

• Data Analysis
  – InfoMagnets-style topic analyses reveal more and less effective student strategies (Kumar et al., 2006)
  – Topicality metrics predict on-line community behavior (Arguello et al., 2006)

• On-Line Interventions
  – Trigger feedback on qualitative physics explanations offered as justifications for multiple choice answers (Ogilvie group, English data)
  – Trigger feedback for group and individual brainstorming (Wang et al., submitted, Chinese data)

Corpus Analysis Offerings (Tuesday)

• Basic DataMining from Corpus Data
  – Introduction to coding scheme design and protocol analysis
  – In-depth walkthrough of basic TagHelper functionality

• Advanced Conversational DataMining with TagHelper
  – Feature space design
  – Tuning machine learning algorithms

• Exploratory Corpus Analysis with InfoMagnets
  – Conceptual discussion of topic segmentation and topic clustering technology
  – Using a topic analysis as part of Learning Science research
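To make the idea of topic segmentation concrete, here is a minimal sketch based on lexical cohesion: a new segment starts wherever two adjacent turns share little vocabulary. This is a generic technique, not a description of the algorithm InfoMagnets uses, which the slides do not specify. The function names, the `threshold` value, and the sample turns are hypothetical.

```python
import math
from collections import Counter


def cosine(bag_a, bag_b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(bag_a[w] * bag_b[w] for w in bag_a if w in bag_b)
    norm = math.sqrt(sum(v * v for v in bag_a.values())) * math.sqrt(
        sum(v * v for v in bag_b.values())
    )
    return dot / norm if norm else 0.0


def segment(turns, threshold=0.1):
    """Group consecutive turns into segments, starting a new segment when
    the similarity between adjacent turns falls below `threshold`
    (an arbitrary cut-off chosen for this example)."""
    if not turns:
        return []
    bags = [Counter(turn.lower().split()) for turn in turns]
    segments = [[turns[0]]]
    for prev_bag, cur_bag, turn in zip(bags, bags[1:], turns[1:]):
        if cosine(prev_bag, cur_bag) < threshold:
            segments.append([turn])  # vocabulary shift: open a new segment
        else:
            segments[-1].append(turn)
    return segments


# Hypothetical dialogue turns covering two topics.
turns = [
    "the force on the ball is gravity",
    "gravity is the force pulling the ball down",
    "kinetic energy depends on speed",
    "speed and mass determine kinetic energy",
]

segments = segment(turns)
for i, seg in enumerate(segments, start=1):
    print(f"Segment {i}: {seg}")
```

Comparing only adjacent turns keeps the sketch short. A more robust variant would compare windows of several turns and remove common function words before computing similarity.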

Contact Info:

Carolyn Penstein Rosé
[email protected]
http://www.cs.cmu.edu/~cprose