Generating Semantic Concept Map for MOOCs · 2016-06-28 · Generating Semantic Concept Map for...

1
Generating Semantic Concept Map for MOOCs Zhuoxuan Jiang, Peng Li, Yan Zhang, Xiaoming Li School of Electronics Engineering and Computer Science, Peking University, China Methodology MOOC Materials Need Reorganization Improve experience of learning process (adaptive learning) Knowledge management across domains Concept map, useful but manually costly (experts required) Semantic Concept Map (SCM) Definition: SCM = {C, R}. C is set of concepts {c 1 ,c 2 ,...,c n }. R is set of relationships (edges) between concepts {r 11 ,...,r ij ,...,r nn } Consider only the semantic relationship Concise for program to run large-scalely Universal to be scaled to cross-domain courses Applications of SCM Discrimination of approximate concepts (e.g. for quiz) Crowdsourcing of course Wiki for social learning Concept hints in forum discussions Two-phase SCM generation approach Phase 1: Concept extraction Phase 2: Relationship establishment Introduction Textual Preprocessing Tokenization, filtering, word segment, etc. Conflation, data are randomly shuffled Concept Extraction Sequence labeling problem, each word ϵ {B, I, O} Course- and instructor-agnostic features: stylistic, structural, contextual, semantic and dictionary features CRF + semi-supervised machine learning framework Self-training, to reduce the cost of labeling Definition of Node and Edge Node weight for Fundamentality: term frequency(tf) Node weight for Importance: term frequency and inverted document frequency (tf-idf) Edge weight: semantic similarity (cosine distance of concept word vectors) Learning Path Generation For each concept, iteratively choose top k most semantically similar concepts Regard the most fundamental/important ones as successors Consider locational order of appearance in Lecture Notes Experiments Data Sets Teaching materials of an interdisciplinary course conducted on Coursera, including lecture notes, PPTs and questions Baselines to Extract Concepts Term Frequency (TF), a statistic baseline Bootstrapping (BT), a rule-based iterative algorithm Supervised Concept-CRF (SC-CRF), a supervised CRF Semi-Supervised Concept-CRF (SSC-CRF), semi-supervised version of SC-CRF Results Table 2: Performance of Baselines and Our Approach Figure 1: Procedure of Semantic Concept Map Generation Figure 2: Comparison between SC-CRF and SSC-CRF Table 1: Statistics of the Course: People and Network Figure 3: Two Kinds of Semantic Concept Map and Learning Path Generated (b) For importance (a) For fundamentality Conclusion Technical and pedagogical feasibility of the novel SCM A universal two-phase approach to re-organize MOOC materials with a good efficacy The SCMs and learning paths generated can be manually revised up to requirements To be scaled to more across-domain courses in the future Teaching Materials Textual Preprocessing Concept Extraction Semantic Concept Map Learning Path Generation Definition of Node and Edge Phase 1 Phase 2 Two demo learning paths: (a) Node → Edge → Element → Set → Alternative → Vote (b) PageRank → PageRankAlgorithm → SmallWorld → Balance → NashBalance

Transcript of Generating Semantic Concept Map for MOOCs · 2016-06-28 · Generating Semantic Concept Map for...

Generating Semantic Concept Map for MOOCs

Zhuoxuan Jiang, Peng Li, Yan Zhang, Xiaoming Li

School of Electronics Engineering and Computer Science, Peking University, China

Methodology

MOOC Materials Need Reorganization• Improve experience of learning process (adaptive learning)

• Knowledge management across domains

• Concept map, useful but manually costly (experts required)

Semantic Concept Map (SCM)Definition: SCM = {C, R}. C is set of concepts {c1,c2,...,cn}. R is

set of relationships (edges) between concepts {r11,...,rij,...,rnn}

• Consider only the semantic relationship

• Concise for program to run large-scalely

• Universal to be scaled to cross-domain courses

Applications of SCM• Discrimination of approximate concepts (e.g. for quiz)

• Crowdsourcing of course Wiki for social learning

• Concept hints in forum discussions

Two-phase SCM generation approach• Phase 1: Concept extraction

• Phase 2: Relationship establishment

Introduction

Textual Preprocessing• Tokenization, filtering, word segment, etc.

• Conflation, data are randomly shuffled

Concept Extraction• Sequence labeling problem, each word ϵ {B, I, O}

• Course- and instructor-agnostic features: stylistic, structural,

contextual, semantic and dictionary features

• CRF + semi-supervised machine learning framework

• Self-training, to reduce the cost of labeling

Definition of Node and Edge• Node weight for Fundamentality: term frequency(tf)

• Node weight for Importance: term frequency and inverted

document frequency (tf-idf)

• Edge weight: semantic similarity (cosine distance of concept

word vectors)

Learning Path Generation• For each concept, iteratively choose top k most semantically

similar concepts

• Regard the most fundamental/important ones as successors

• Consider locational order of appearance in Lecture Notes

Experiments

Data SetsTeaching materials of an interdisciplinary course conducted on

Coursera, including lecture notes, PPTs and questions

Baselines to Extract Concepts• Term Frequency (TF), a statistic baseline

• Bootstrapping (BT), a rule-based iterative algorithm

• Supervised Concept-CRF (SC-CRF), a supervised CRF

• Semi-Supervised Concept-CRF (SSC-CRF), semi-supervised

version of SC-CRF

ResultsTable 2: Performance of

Baselines and Our Approach

Figure 1: Procedure of Semantic Concept Map Generation

Figure 2: Comparison between SC-CRF

and SSC-CRF

Table 1: Statistics of the Course: People and Network

Figure 3: Two Kinds of Semantic Concept Map and Learning Path Generated

(b) For importance(a) For fundamentality

Conclusion• Technical and pedagogical feasibility of the novel SCM

• A universal two-phase approach to re-organize MOOC

materials with a good efficacy

• The SCMs and learning paths generated can be manually

revised up to requirements

• To be scaled to more across-domain courses in the future

Teaching

Materials

Textual

Preprocessing

Concept

Extraction

Semantic

Concept

Map

Learning Path

Generation

Definition of

Node and Edge

Phase 1 Phase 2

Two demo learning paths:(a) Node → Edge → Element → Set → Alternative → Vote

(b) PageRank → PageRankAlgorithm → SmallWorld → Balance → NashBalance