TagHelper and InfoMagnets Technologies for Exploring the Effect of Language Interactions in Learning
Carolyn Penstein Rosé, Jaime Arguello, Yue Cui, Rohit Kumar, Emil Albright, Hao-Chuan Wang, Pinar Donmez, Cammie Williams, William Cohen
Language Technologies Institute / Human-Computer Interaction Institute / Machine Learning Department, Carnegie Mellon University
Tools for Data Mining from Corpus Data
• TagHelper
– Supervised text classification technology
– Analysts define categories and tag example texts
– Algorithms learn to generalize from the training examples
– Trained models assign categories to untagged data
– Supports categorical corpus analysis
• InfoMagnets
– Automatic initial segmentation and topic analysis
– Interactive reclustering
– Supports exploratory corpus analysis and sense making
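The TagHelper workflow above can be illustrated with a minimal sketch: analysts tag examples, an algorithm generalizes from them, and the trained model codes new text. This is not TagHelper's own code or API. It assumes a multinomial Naive Bayes classifier with add-one smoothing over lowercase word tokens. The class name, category labels, and example sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens: a deliberately simple feature space."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesCoder:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Hypothetical stand-in for a trained coding model, not TagHelper's API.
    """

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesCoder":
        # Analyst-tagged examples: one category label per text.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Total tokens per category (denominator of the word likelihoods).
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior P(category)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # words never seen in training carry no evidence
                smoothed = (self.word_counts[label][tok] + 1) / (self.total_words[label] + v)
                score += math.log(smoothed)  # log P(word | category)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical tagged examples (categorical coding: explanation vs. summary).
train_texts = [
    "the ball speeds up because gravity pulls it down",
    "it falls faster since the net force points down",
    "the ball falls",
    "it moves down the ramp",
]
train_labels = ["explanation", "explanation", "summary", "summary"]

coder = NaiveBayesCoder().fit(train_texts, train_labels)
print(coder.predict("it speeds up because of gravity"))  # explanation
print(coder.predict("the ball falls"))  # summary
```

Naive Bayes is used here only because it fits in a few lines. A real feature space and learning algorithm would be chosen and tuned per coding dimension.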
Outline for this Demo Session
• Conceptual overview of data mining from corpus data (15 minutes)
– What is it and what can you do with it?
• Demo of TagHelper (10 minutes)
• TagHelper activity (10 minutes)
– Goal: Learn to train simple TagHelper classifiers to use in TuTalk dialogues
• Demo of InfoMagnets (10 minutes)
• InfoMagnets activity (15 minutes)
– Goal: Gain experience with basic topic analysis technology
What is Data Mining from Corpus Data?
• Goal: data reduction, i.e., identifying meaningful patterns in freeform corpus data
– Test recall versus recognition
– Verify the effect of a manipulation on interaction or thought processes
– Run correlational analyses for hypothesis formation
• Relevant contexts: tutorial dialogue, collaborative learning, self-explanation, think-aloud protocols
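Once segments are coded, verifying the effect of a manipulation often means asking whether the distribution of codes differs between conditions. The sketch below assumes a Pearson chi-square test of independence on a condition-by-code count table, without continuity correction. The counts are invented for illustration, and the closed-form p-value helper is valid only for one degree of freedom (a 2×2 table). In practice a library routine such as SciPy's `chi2_contingency` would normally be used; note that it applies Yates' continuity correction to 2×2 tables by default.

```python
import math


def chi_square_independence(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic (no continuity correction) and degrees of
    freedom for observed counts (rows = conditions, columns = codes)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the code distribution were independent of condition.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_1dof(stat: float) -> float:
    """Upper-tail p-value of a chi-square variable with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts of coded segments per condition:
# columns = [self-explanation, summarization]
counts = [
    [10, 30],  # control condition
    [25, 15],  # manipulation condition
]
stat, dof = chi_square_independence(counts)
print(round(stat, 2), dof, p_value_1dof(stat) < 0.05)  # 11.43 1 True
```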
What is Data Mining from Corpus Data?
• Basic approach: transform freeform corpus data into a formal structure that can be analyzed using quantitative methods
– Rating scales: e.g., depth of an explanation
– Categorical coding: e.g., self-explanation versus summarization
• Caveat: corpus analysis can be subjective
– Reliability standards mitigate the risk of subjectivity in judgments
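Reliability of a categorical coding scheme is commonly checked by having two coders code the same segments and computing Cohen's kappa. Kappa is observed agreement corrected for the agreement expected by chance from each coder's code frequencies. The sketch below assumes two coders and nominal codes. The segment codes are hypothetical, and the kappa level a project treats as acceptable is a convention the analyst sets.

```python
from collections import Counter


def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same, non-empty set of segments")
    n = len(coder_a)
    # Observed agreement: share of segments given the same code by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over codes of the product of the two marginal proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both coders use one identical code")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes for 10 segments: SE = self-explanation, SUM = summarization.
coder_1 = ["SE", "SE", "SE", "SUM", "SUM", "SE", "SUM", "SE", "SUM", "SUM"]
coder_2 = ["SE", "SE", "SUM", "SUM", "SUM", "SE", "SUM", "SE", "SE", "SUM"]
print(round(cohens_kappa(coder_1, coder_2), 2))  # 0.6
```

Here the coders agree on 8 of 10 segments (0.8), but chance agreement is 0.5, so kappa is (0.8 - 0.5) / (1 - 0.5) = 0.6.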
Motivation for TagHelper Project
• Social scientists, psychologists, and educational scientists hand-code large quantities of corpus data
• Tools currently used by behavioral researchers do not support the decision making of human coders
– e.g., MacSHAPA, NVivo, HyperResearch
• Text classification technology can support data analysis tasks by automatically predicting codes
• Automatic classification technology can also trigger on-line interventions in real time or process freeform student input
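Using a classifier to trigger on-line interventions can be sketched as a thin layer between the classifier and the tutoring system. The layer acts only on confident predictions and stays silent otherwise. Everything in this sketch is hypothetical: the code names, feedback strings, the 0.7 confidence threshold, and `toy_classify`. `toy_classify` is a keyword stand-in for a trained model, such as one built with TagHelper, and is not part of any real library.

```python
from typing import Callable, Optional

# Hypothetical mapping from predicted codes to tutor feedback.
FEEDBACK = {
    "no_justification": "Can you explain why you chose that answer?",
    "vague_justification": "Which physical principle supports your answer?",
}


def choose_feedback(
    student_text: str,
    classify: Callable[[str], tuple[str, float]],
    threshold: float = 0.7,
) -> Optional[str]:
    """Return feedback to show immediately, or None to stay silent."""
    label, confidence = classify(student_text)
    if confidence < threshold:
        return None  # abstain rather than act on an uncertain prediction
    return FEEDBACK.get(label)  # codes without an entry need no intervention


def toy_classify(text: str) -> tuple[str, float]:
    """Hypothetical stand-in for a trained classifier: keyword rules, fixed confidences."""
    words = text.lower().split()
    if "because" in words or "since" in words:
        return ("justified", 0.9)
    if len(words) < 4:
        return ("no_justification", 0.85)
    return ("vague_justification", 0.6)


print(choose_feedback("I chose B", toy_classify))
```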
Example Research Context: Learning in On-Line Discussions
Knowledge Media Research Center, Tuebingen, Germany
Example Coding Scheme for Analyzing Collaborative Learning Interactions
• Original German: "Es ist seine Faulheit, aber mangelndes Talent würde auch passen"
• Translation: "It is his laziness, but lack of talent would also fit"
• Social Modes Code: Integration-Oriented Consensus Building
• Original German: "Es ist seine Faulheit, mangelndes Talent würde weniger passen"
• Translation: "It is his laziness, lack of talent would fit less well"
• Social Modes Code: Conflict-Oriented Consensus Building
Coding dimensions (number of classes):
• Epistemic (EPI): 35
• Micro (ATOL): 4
• Macro (ALEI): 6
• Social Modes (SOC): 21
• Reaction (REA): 3
• Appropriateness (PRO): 4
• Quoted (QUO): 2
Knowledge / Language / Relational Style
* Training + coding by hand requires 25% of project resources!
Other TagHelper/InfoMagnets Applications
• Data analysis
– InfoMagnets-style topic analyses reveal more and less effective student strategies (Kumar et al., 2006)
– Topicality metrics predict on-line community behavior (Arguello et al., 2006)
• On-line interventions
– Trigger feedback on qualitative physics explanations offered as justifications for multiple-choice answers (Ogilvie group, English data)
– Trigger feedback for group and individual brainstorming (Wang et al., submitted, Chinese data)
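One simple way to operationalize a topicality metric is the cosine similarity between a message's word-count vector and the pooled vocabulary of the community's earlier posts. This sketch assumes that definition for illustration only. It is not necessarily the metric used by Arguello et al. (2006), and the function names and posts are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Word-count vector for one message."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def topicality(message: str, community_posts: list[str]) -> float:
    """Hypothetical topicality score in [0, 1]: similarity of a message to the
    pooled vocabulary of a community's earlier posts."""
    community = Counter()
    for post in community_posts:
        community.update(term_vector(post))
    return cosine(term_vector(message), community)


# Invented posts from a hypothetical gardening forum.
posts = [
    "tomatoes need full sun and regular watering",
    "my tomato plants wilt without enough water",
    "prune tomato suckers for bigger fruit",
]
on_topic = topicality("how often should I water tomato plants", posts)
off_topic = topicality("selling an old bike cheap", posts)
print(on_topic > off_topic, off_topic)  # True 0.0
```

Because cosine similarity ignores vector length, summing the community's counts gives the same score as averaging them.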
Corpus Analysis Offerings (Tuesday)
• Basic Data Mining from Corpus Data
– Introduction to coding scheme design and protocol analysis
– In-depth walkthrough of basic TagHelper functionality
• Advanced Conversational Data Mining with TagHelper
– Feature space design
– Tuning machine learning algorithms
• Exploratory Corpus Analysis with InfoMagnets
– Conceptual discussion of topic segmentation and topic clustering technology
– Using a topic analysis as part of Learning Science research
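The topic segmentation half of the InfoMagnets session can be sketched with a simplified, TextTiling-inspired method. The method compares the vocabulary of a window of utterances before and after each gap and places a boundary where lexical similarity drops. This is an assumption for illustration, not InfoMagnets' algorithm. Hearst's TextTiling selects boundaries from depth scores relative to neighboring similarity peaks, whereas this sketch uses a fixed threshold. The stopword list, window size, threshold, and dialogue are all arbitrary or invented. Clustering the resulting segments into topics is not shown.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "and", "of", "on", "to", "is", "it",
             "now", "let", "us", "about", "through"}


def bag_of_words(utterances: list[str]) -> Counter:
    """Content-word counts pooled over a window of utterances."""
    counts = Counter()
    for utt in utterances:
        counts.update(w for w in re.findall(r"[a-z']+", utt.lower()) if w not in STOPWORDS)
    return counts


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def segment(utterances: list[str], window: int = 2, threshold: float = 0.1) -> list[list[str]]:
    """Split a dialogue at gaps where adjacent windows share little vocabulary."""
    boundaries = []
    for gap in range(1, len(utterances)):
        left = bag_of_words(utterances[max(0, gap - window):gap])
        right = bag_of_words(utterances[gap:gap + window])
        if cosine(left, right) < threshold:
            boundaries.append(gap)
    segments, start = [], 0
    for end in boundaries + [len(utterances)]:
        segments.append(utterances[start:end])
        start = end
    return segments


# Invented tutoring dialogue that shifts from pendulums to circuits.
dialogue = [
    "the pendulum swings back and forth",
    "pendulum period depends on length",
    "longer pendulum means longer period",
    "now let us talk about circuits and current",
    "current flows through the resistor",
    "resistor voltage equals current times resistance",
]
for seg in segment(dialogue):
    print(seg)
```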