Automated Cognitive Presence Detection in Online Discussion Transcripts



Learning Analytics and Machine Learning Workshop at LAK 14, Learning Analytics and Knowledge Conference in Indianapolis, IN USA

Transcript of Automated Cognitive Presence Detection in Online Discussion Transcripts

Page 1: Automated Cognitive Presence Detection in Online Discussion Transcripts

Automated Cognitive Presence Detection in Online Discussion Transcripts

Vitomir Kovanovic¹, Srecko Joksimovic¹, Dragan Gasevic², Marek Hatala¹

¹ School of Interactive Arts and Technology, Simon Fraser University, Burnaby, Canada

² School of Computing Science, Athabasca University, Edmonton, Canada

March 25, 2014

Page 2: Automated Cognitive Presence Detection in Online Discussion Transcripts

Introduction · Background · Methods · Results · Conclusions and Future Work · References

Overview

Overall idea

To examine how text mining can be used to automate the content analysis of discussion transcripts.

More specifically,

We looked at automating the content analysis of cognitive presence, one of the three main components of the Community of Inquiry framework.

V. Kovanovic et al. Automated Cog. Presence Detection 1 / 18

Page 3: Automated Cognitive Presence Detection in Online Discussion Transcripts


Overview

Big goal

Provide instructors with information on students’ learning progress within a learning community so that appropriate instructional interventions can be planned and implemented.

Provide learners with feedback on their discussion contributions so that they can reflect more objectively on their own learning.


Page 4: Automated Cognitive Presence Detection in Online Discussion Transcripts


Asynchronous online discussions: a “gold mine of information” (Henri, 1992)

• They are frequently used for all types of education delivery,

• Their use produces a large amount of data about learning processes,

• Their use is well supported by social-constructivist pedagogies.


Page 5: Automated Cognitive Presence Detection in Online Discussion Transcripts


Asynchronous online discussions - Issues and challenges

• The data produced is used mainly for research after the courses are over,

• Content analysis techniques are complex and time-consuming,

• Content analysis has had almost no impact on educational practice (Donnelly and Gardner, 2011),

• There is a need for more proactive use of the data through automation:
  • Few attempts at automated content analysis,
  • Focus mostly on surface-level characteristics, and
  • Not based on well-established theories of education.


Page 6: Automated Cognitive Presence Detection in Online Discussion Transcripts


Community of Inquiry (CoI) Framework · Text Classification and Automatic Content Analysis Approaches

Community of Inquiry (CoI) framework

A conceptual framework outlining the important constructs that define a worthwhile educational experience in a distance education setting.

• Social presence: relationships and social climate in a community.

• Cognitive presence: phases of cognitive engagement and knowledge construction.

• Teaching presence: instructional role during social learning.

The CoI model:

• Is extensively researched and validated,

• Adopts content analysis for the assessment of presences.


Page 7: Automated Cognitive Presence Detection in Online Discussion Transcripts


Cognitive Presence

“an extent to which the participants in any particular configuration of a community of inquiry are able to construct meaning through sustained communication.” (Garrison, Anderson, and Archer, 1999, p. 89)

Four phases of cognitive presence:

1. Triggering event: Some issue, dilemma, or problem is identified.

2. Exploration: Students move between the private world of reflection and the shared world of social knowledge construction.

3. Integration: Students filter irrelevant information and synthesize new knowledge.

4. Resolution: Students analyze practical applicability, test different hypotheses, and start a new learning cycle.


Page 9: Automated Cognitive Presence Detection in Online Discussion Transcripts


Cognitive presence coding scheme

• Use of the whole message as the unit of analysis,

• Look for particular indicators of different sociocognitive processes,

• Requires expertise with the coding instrument and domain knowledge.


Page 10: Automated Cognitive Presence Detection in Online Discussion Transcripts


Issues and challenges

• Used for analysis long after the courses are over,

• Requires substantial, time-consuming manual work,

• Does not provide explanations for the observed levels of the presences, and

• Does not provide suggestions and guidelines for instructors to direct their pedagogical decisions.


Page 11: Automated Cognitive Presence Detection in Online Discussion Transcripts


Text Mining and Text Classification

• Text mining for coding cognitive presence:
  • Supervised learning classification problem,
  • Cognitive presence is a latent construct,

• We based our work on similar problems of mining latent constructs:
  • Opinion mining (Arora, Joshi, and Rose, 2009; Joshi and Penstein-Rose, 2009; Tsur and Davidov, 2010),
  • Gender style difference detection (Gianfortoni, Adamson, and Rose, 2011),
  • Sentiment analysis (Arora et al., 2010)

• Most commonly used features:
  • Lexical and part-of-speech N-grams,
  • Dependency triplets,
  • Word pattern features, and
  • Some combination of them.

• Most commonly used algorithms:
  • Support Vector Machines (SVMs)
  • K Nearest Neighbors (KNNs)


Page 12: Automated Cognitive Presence Detection in Online Discussion Transcripts


Data set · Feature extraction · Classification

Data set

• Six offerings of a graduate-level course in software engineering at a distance learning university,

• Total of 1747 messages, 81 students,

• Manually coded by two coders (agreement = 98.1%, Cohen’s κ = 0.974).

ID   Phase              Messages   (%)
0    Other              140        8.01%
1    Triggering Event   308        17.63%
2    Exploration        684        39.17%
3    Integration        508        29.08%
4    Resolution         107        6.12%
     All phases         1747       100%

Number of Messages in Different Phases of Cognitive Presence


Page 13: Automated Cognitive Presence Detection in Online Discussion Transcripts


Feature Extraction

• Unigrams, Bigrams and Trigrams,

• Part-of-Speech Bigrams and Trigrams,

• Backoff Bigrams and Trigrams:

Example: “John is working.”

Bigrams:

• john is,

• is working.

Backoff Bigrams:

• john 〈verb〉,

• 〈noun〉 is,

• is 〈verb〉,

• 〈verb〉 working.
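The backoff idea above can be sketched in a few lines. This is only an illustration in Python (the slides state the actual system was written in Java); the function names and the hand-supplied coarse tags are assumptions, and a real pipeline would obtain the tags from a part-of-speech tagger.

```python
from typing import List, Tuple

# Each token is (lowercased word, coarse part-of-speech tag).
# Tags are written by hand here; a tagger would normally supply them.
TaggedToken = Tuple[str, str]


def bigrams(tokens: List[TaggedToken]) -> List[str]:
    """Plain lexical bigrams over adjacent words."""
    return [f"{a[0]} {b[0]}" for a, b in zip(tokens, tokens[1:])]


def backoff_bigrams(tokens: List[TaggedToken]) -> List[str]:
    """For each adjacent pair, replace exactly one word with its POS tag."""
    features = []
    for (w1, t1), (w2, t2) in zip(tokens, tokens[1:]):
        features.append(f"{w1} <{t2}>")  # second word backed off to its tag
        features.append(f"<{t1}> {w2}")  # first word backed off to its tag
    return features


sentence = [("john", "noun"), ("is", "verb"), ("working", "verb")]
print(bigrams(sentence))          # ['john is', 'is working']
print(backoff_bigrams(sentence))  # ['john <verb>', '<noun> is', 'is <verb>', '<verb> working']
```

Each plain bigram yields two backoff bigrams, which is one reason the backoff feature sets are much larger than their lexical counterparts.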


Page 14: Automated Cognitive Presence Detection in Online Discussion Transcripts


Feature Extraction

• Dependency triplets: 〈rel, head, modifier〉

Example: “Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas.”

〈nsubjpass, submitted, Bills〉
〈auxpass, submitted, were〉
〈agent, submitted, Brownback〉
〈nn, Brownback, Senator〉
〈appos, Brownback, Republican〉
〈prep_of, Republican, Kansas〉
〈prep_on, Bills, ports〉
〈conj_and, ports, immigration〉
〈prep_on, Bills, immigration〉


Page 15: Automated Cognitive Presence Detection in Online Discussion Transcripts


Feature Extraction

• Backoff dependency triplets:

Example: “Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas.”

Dependency triplet:

• 〈conj_and, ports, immigration〉

Backoff dependency triplets:

• 〈conj_and, 〈noun〉, immigration〉

• 〈conj_and, ports, 〈noun〉〉

• 〈conj_and, 〈noun〉, 〈noun〉〉
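A small sketch of how the three backoff variants could be derived from one parsed dependency. It is written in Python for brevity, the `Dep` type and function name are hypothetical, and the dictionary keys reuse the h-bo / m-bo / hm-bo names from the results table on the assumption that they stand for head, modifier, and head-plus-modifier backoff.

```python
from typing import Dict, NamedTuple


class Dep(NamedTuple):
    """One dependency edge plus coarse POS tags for both ends (hypothetical type)."""
    rel: str
    head: str
    head_pos: str
    modifier: str
    modifier_pos: str


def backoff_triplets(dep: Dep) -> Dict[str, str]:
    """Replace the head, the modifier, or both with their POS tags."""
    head_bo = f"<{dep.head_pos}>"
    mod_bo = f"<{dep.modifier_pos}>"
    return {
        "h-bo": f"<{dep.rel}, {head_bo}, {dep.modifier}>",  # head backed off
        "m-bo": f"<{dep.rel}, {dep.head}, {mod_bo}>",       # modifier backed off
        "hm-bo": f"<{dep.rel}, {head_bo}, {mod_bo}>",       # both backed off
    }


edge = Dep("conj_and", "ports", "noun", "immigration", "noun")
for name, feature in backoff_triplets(edge).items():
    print(name, feature)
```

In a full pipeline the relation, words, and tags would come from a dependency parser rather than being typed in by hand.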


Page 16: Automated Cognitive Presence Detection in Online Discussion Transcripts


Additional Features

• Number of named entities in the message
  Brainstorming should involve more concepts than posing a question,

• Is the message first in the discussion?
  Posing questions is more likely to initiate discussions,

• Is the message a reply to the first message in the discussion?
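These three features are simple to compute once the thread structure is known. The sketch below is illustrative Python; the `Message` fields are hypothetical, and the entity list is assumed to come from a separate named-entity recognition step that is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Message:
    msg_id: str
    parent_id: Optional[str]  # None when the message starts the discussion
    entities: List[str] = field(default_factory=list)  # assumed NER output


def structural_features(msg: Message, first_id: str) -> Dict[str, int]:
    """Entity count plus two binary indicators of position in the thread."""
    return {
        "entity-count": len(msg.entities),
        "is-first": int(msg.msg_id == first_id),
        "is-reply-first": int(msg.parent_id == first_id),
    }


thread = [
    Message("m1", None, ["UML"]),
    Message("m2", "m1", ["UML", "Scrum"]),
    Message("m3", "m2"),
]
for m in thread:
    print(m.msg_id, structural_features(m, first_id=thread[0].msg_id))
```

The feature names mirror the labels used in the results table (entity-count, is-first, is-reply-first).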


Page 17: Automated Cognitive Presence Detection in Online Discussion Transcripts


Classification

Classifier:

• Linear SVM classifier with default value of C = 1,

• Only features with support of 10 or more,

• Accuracy evaluated using 10-fold cross-validation,

• Comparison of models using McNemar’s test.
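For readers unfamiliar with McNemar’s test, the following is a minimal sketch of its exact (binomial) form for comparing two classifiers on the same messages. The slides do not say which variant the JSC library computes, so treat this as one common formulation rather than the authors’ procedure. It is in Python and relies on `math.comb`, which needs Python 3.8 or later.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(gold: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int]) -> float:
    """Two-sided exact McNemar p-value for two classifiers scored on the same items."""
    # b: A correct and B wrong; c: A wrong and B correct.
    # Items where both are right or both are wrong carry no information and are ignored.
    b = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a == g and p != g)
    c = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a != g and p == g)
    n = b + c
    if n == 0:
        return 1.0  # the classifiers never disagree in correctness
    k = min(b, c)
    # Under the null hypothesis the discordant items split like Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


gold = [1] * 10
model_a = [1] * 10            # correct on all ten items
model_b = [1] * 4 + [0] * 6   # wrong on the last six items
print(mcnemar_exact(gold, model_a, model_b))  # 0.03125
```

Only the messages on which exactly one model is correct affect the result, which is why the test suits paired comparisons of a baseline against the same baseline plus one feature set.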

Implementation:

• Implemented in Java,

• Feature extraction using the Stanford CoreNLP [1] toolkit:
  • Tokenization, Part-of-Speech, and Dependency parsing modules,

• Classification using Weka (Witten, Frank, and Hall, 2011) and LibSVM (Chang and Lin, 2011), and

• Statistical comparison using Java Statistical Classes (JSC) [2].

[1] http://nlp.stanford.edu/software/corenlp.shtml
[2] http://www.jsc.nildram.co.uk/index.htm


Page 18: Automated Cognitive Presence Detection in Online Discussion Transcripts


Results

Feature Set              Additional Features   Classification Accuracy   Cohen’s Kappa   P-val
majority vote baseline   0                     0.392                     0.000
unigrams baseline        2241                  0.547                     0.364
+ bigrams                3155                  0.556                     0.376           0.427
+ trigrams               911                   0.554                     0.374           0.571
+ pos-bigrams            737                   0.561                     0.385           0.249
+ pos-trigrams           2810                  0.560                     0.382           0.304
+ bo-bigrams             6953                  0.560                     0.381           0.323
+ bo-trigrams            17986                 0.584                     0.410           0.006
+ dep-triplets           1435                  0.564                     0.386           0.062
+ h-bo-triplets          1931                  0.571                     0.396           0.031
+ m-bo-triplets          2771                  0.579                     0.406           0.003
+ hm-bo-triplets         1375                  0.558                     0.379           0.359
+ entity-count           1                     0.559                     0.381           0.030
+ is-first               1                     0.555                     0.375           0.037
+ is-reply-first         1                     0.550                     0.367           0.665

Classification Results. Bold typeface indicates statistical significance.
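To make the two metrics concrete, the sketch below computes accuracy and Cohen’s kappa from a confusion matrix and applies them to a majority-vote classifier built from the class counts on the data set slide. The function names are illustrative, and the confusion matrices behind the other rows are not given in the slides, so only the baseline row can be checked this way.

```python
from typing import List


def accuracy(confusion: List[List[int]]) -> float:
    """Share of items on the diagonal (rows = true class, columns = predicted class)."""
    total = sum(map(sum, confusion))
    return sum(confusion[i][i] for i in range(len(confusion))) / total


def cohens_kappa(confusion: List[List[int]]) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e), in integer form."""
    k = len(confusion)
    n = sum(map(sum, confusion))
    diag = sum(confusion[i][i] for i in range(k))
    row = [sum(confusion[i]) for i in range(k)]
    col = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    chance = sum(row[i] * col[i] for i in range(k))  # equals p_e * n^2
    if n * n == chance:
        return 0.0  # degenerate case: only one class on both sides
    return (n * diag - chance) / (n * n - chance)


# Message counts per phase, in ID order 0..4, taken from the data set slide.
counts = [140, 308, 684, 508, 107]
majority = counts.index(max(counts))  # index 2, Exploration

# A majority-vote classifier assigns every message to the same column.
confusion = [[c if j == majority else 0 for j in range(len(counts))] for c in counts]

print(round(accuracy(confusion), 3))  # 0.392
print(cohens_kappa(confusion))        # 0.0
```

Kappa is zero for the baseline because always predicting one class agrees with the gold labels exactly as often as chance would, even though its raw accuracy is 39.2%.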


Page 19: Automated Cognitive Presence Detection in Online Discussion Transcripts


Current progress

• Use of nested cross-validation for better selection of model parameters (kernel and hyperparameters),

• RBF kernel shown to be better than linear kernel,

• LIWC features, LSA coherence,

• accuracy = 61.7%, Cohen’s κ = 0.43


Page 20: Automated Cognitive Presence Detection in Online Discussion Transcripts


Conclusions and future work

Summary:

• Promising path to explore,

• Backoff trigrams, plain and backoff dependency triplets, entity count, and the first-message indicator seem useful.

Future work:

• Additional types of features which look at the context of previous messages (e.g., convergence vs. divergence),

• Moving away from SVM to explore other classification methods which are better at explanation (e.g., boosting and random forests),

• Give associated probabilities for each classification,

• Give relative importance of different features.

Challenges:

• Challenges with message unit of analysis and surface-level features,

• Low frequency of resolution messages.


Page 21: Automated Cognitive Presence Detection in Online Discussion Transcripts

Thank you

Page 22: Automated Cognitive Presence Detection in Online Discussion Transcripts

References I

Arora, Shilpa, Mahesh Joshi, and Carolyn P. Rose (2009). “Identifying types of claims in online customer reviews”. In: Proceedings of the HLT-NAACL 2009, pp. 37–40.

Arora, Shilpa et al. (2010). “Sentiment classification using automatically extracted subgraph features”. In: Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, pp. 131–139.

Chang, Chih-Chung and Chih-Jen Lin (2011). “LIBSVM: A library for support vector machines”. In: ACM Transactions on Intelligent Systems and Technology 2.3, 27:1–27:27.

Donnelly, Roisin and John Gardner (2011). “Content analysis of computer conferencing transcripts”. In: Interactive Learning Environments 19.4, pp. 303–315.

Garrison, D. Randy, Terry Anderson, and Walter Archer (1999). “Critical Inquiry in a Text-Based Environment: Computer Conferencing in Higher Education”. In: The Internet and Higher Education 2.2–3, pp. 87–105.

Gianfortoni, Philip, David Adamson, and Carolyn P. Rose (2011). “Modeling of stylistic variation in social media with stretchy patterns”. In: Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties, pp. 49–59.

Henri, France (1992). “Computer Conferencing and Content Analysis”. In: Collaborative Learning Through Computer Conferencing, pp. 117–136.

Joshi, Mahesh and Carolyn Penstein-Rose (2009). “Generalizing dependency features for opinion mining”. In: Proceedings of the ACL-IJCNLP 2009 Conference, pp. 313–316.

Page 23: Automated Cognitive Presence Detection in Online Discussion Transcripts

References II

Tsur, Oren and Dmitry Davidov (2010). “Icwsm – a great catchy name: Semi-supervised recognition of sarcastic sentences in product reviews”. In: Proceedings of the International AAAI Conference on Weblogs and Social Media.

Witten, Ian H., Eibe Frank, and Mark A. Hall (2011). Data Mining: Practical Machine Learning Tools and Techniques. 3rd ed. Morgan Kaufmann.