eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Analyzing the students' behavior and relevant topics in virtual learning communities. Llanos Tobarra, Antonio Robles-Gómez, Salvador Ros, Roberto Hernández, Agustín C. Caminero. Computers in Human Behavior 31 (2014) 659-669, online December 2013, JCR Q1. Departamento de Sistemas de Comunicación y Control, Universidad Nacional de Educación a Distancia (UNED). {llanos,arobles,sros,roberto,accaminero}@scc.uned.es

Transcript of eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Page 1: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Analyzing the students' behavior and relevant topics in virtual learning communities

Llanos Tobarra, Antonio Robles-Gómez, Salvador Ros, Roberto Hernández, Agustín C. Caminero

Computers in Human Behavior 31 (2014) 659-669, online December 2013, JCR Q1

Departamento de Sistemas de Comunicación y Control
Universidad Nacional de Educación a Distancia (UNED)
{llanos,arobles,sros,roberto,accaminero}@scc.uned.es

Page 2: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Outline

• Introduction
• Outcomes

Page 3: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Introduction

• UNED is a distance-learning university.
• Specific techniques are needed for monitoring and analysing the information gathered by the LMS.
• Learning Analytics is defined as: "The measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimising learning and the environments in which it occurs." (G. Siemens, LAK'11)

Page 4: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Different Approaches

Type of Analytics | Level or Object of Analysis | Who Benefits?
Learning Analytics, Educational data mining | Course-level: social networks, conceptual development, discourse analysis, “intelligent curriculum” | Learners, faculty
Learning Analytics, Educational data mining | Departmental: predictive modeling, patterns of success/failure | Learners, faculty
Academic Analytics | Institutional: learner profiles, performance of academics, knowledge flow | Administrators, funders, marketing
Academic Analytics | Regional (state/provincial): comparisons between systems | Funders, administrators
Academic Analytics | National and International | National governments, education authorities

Page 5: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Learning Analytics Process

Page 6: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Outline

• Introduction
• Outcomes

Page 7: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Where do we focus?

• Forums
– Essential for negotiation and exchange of ideas.
– Collaborative learning.
– High correlation between students' participation levels and positive learning outcomes and knowledge construction.

Page 8: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Outcomes

• Provide an extensive analysis of students' behaviour in an on-line learning community.
• Propose a set of algorithms to automatically characterize the most relevant topics of the community.
• How?
– The interactions between students and faculty, by means of the messages in the forums, have been analyzed.
• Results
– Patterns of behaviour have been found.
– Possibility of adapting the learning/teaching process.

Page 9: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Questions

• What are the students’ behavior patterns during their interaction and participation in the asynchronous virtual discussion forums of the virtual learning community?

• What are the most relevant topics and subtopics in the asynchronous on-line discussion forums of the on-line learning community?

• Could they be characterized in an automatic way?

Page 10: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Input data

• Data from two academic years: 2010-2011 and 2011-2012
• Forum Student
• Forum Activities 1-6
• Forum Activities 7-11
• Forum Faculty
• About 2000 messages

Page 11: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Procedure

• Data collection and statistical analysis
• Semantic analysis
• Calculation of stem networks

Page 12: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Procedure

For each participant (statistical indicators):
• Number of published messages
• Number of replies
• Number of initiated conversations
• Number of initiated conversations without replies
• Number of conversations where the participant has posted a message
• Number of forums where the participant has posted a message
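
The indicators listed on this slide could be computed along the following lines. This is only a sketch: the `Message` record and its field names are hypothetical, and "number of replies" is assumed to mean replies written by the participant (the slides do not say whether written or received replies are counted).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical record for one forum post (field names are assumptions)."""
    msg_id: int
    author: str
    forum: str
    thread_id: int            # conversation the message belongs to
    parent_id: Optional[int]  # None when the message starts a conversation


def participant_indicators(messages):
    """Return {author: {indicator: value}} for the six per-participant indicators."""
    # First pass: count how many replies each conversation received.
    replies_in_thread = defaultdict(int)
    for m in messages:
        if m.parent_id is not None:
            replies_in_thread[m.thread_id] += 1

    stats = defaultdict(lambda: {
        "messages": 0, "replies": 0, "initiated": 0,
        "initiated_without_replies": 0, "threads": set(), "forums": set(),
    })
    # Second pass: accumulate the indicators per author.
    for m in messages:
        s = stats[m.author]
        s["messages"] += 1
        s["threads"].add(m.thread_id)
        s["forums"].add(m.forum)
        if m.parent_id is None:
            s["initiated"] += 1
            if replies_in_thread[m.thread_id] == 0:
                s["initiated_without_replies"] += 1
        else:
            s["replies"] += 1  # assumption: replies written by this author

    return {
        author: {
            "messages": s["messages"],
            "replies": s["replies"],
            "initiated": s["initiated"],
            "initiated_without_replies": s["initiated_without_replies"],
            "conversations": len(s["threads"]),
            "forums": len(s["forums"]),
        }
        for author, s in stats.items()
    }
```

Two passes keep the logic simple: whether a conversation went unanswered is only known once every message has been seen.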

Page 13: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Procedure

• Semantics
– Split each message into basic tokens
– Remove stop-words
– Obtain the token stem (Porter algorithm)
– Calculate daily and global frequencies
• Tools: Apache Lucene library, Snowball
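
The four semantic steps can be illustrated with a small standard-library sketch. It does not use Apache Lucene or Snowball: the stop-word list is a tiny made-up sample, and `crude_stem` is a deliberately naive suffix stripper standing in for the Porter algorithm, so the stems it produces will differ from real Porter output.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "on",
              "to", "and", "for", "when"}


def crude_stem(token):
    """Very rough suffix stripper standing in for the Porter stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stems(text):
    """Tokenize one message, drop stop-words, and stem what remains."""
    tokens = re.findall(r"[a-záéíóúüñ]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def frequencies(messages):
    """messages: iterable of (day, text) pairs. Returns (daily, global) counts."""
    daily = defaultdict(Counter)   # day -> Counter of stems
    total = Counter()              # stem -> count over the whole period
    for day, text in messages:
        message_stems = stems(text)
        daily[day].update(message_stems)
        total.update(message_stems)
    return daily, total
```

Keeping the daily counters alongside the global one matters later, because the topic dynamics are described in terms of how a term's frequency evolves from day to day.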

Page 14: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

First Question

• What are the students’ behavior patterns during their interaction and participation in the asynchronous virtual discussion forums of the virtual learning community?

Page 15: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Student behaviour modelling

• Students can be classified, depending on their pattern of behaviour, as:
– Producers
   • Proactive
   • Reactive
– Consumers

SIIE'12
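
The slide names the categories but not the criteria used to assign them. The sketch below shows one plausible reading based on two of the statistical indicators; the rules themselves are assumptions made for illustration and are not taken from the slides.

```python
def classify_participant(messages_posted, conversations_initiated):
    """Label a participant from two statistical indicators.

    Assumed rules (not taken from the slides):
      - no posts at all              -> "consumer"
      - posts and starts threads     -> "proactive producer"
      - posts but only as replies    -> "reactive producer"
    """
    if messages_posted == 0:
        return "consumer"
    if conversations_initiated > 0:
        return "proactive producer"
    return "reactive producer"
```

A real classification would probably use thresholds or clustering over all the indicators rather than these hard rules.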

Page 16: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Second Question

• What are the most relevant topics and subtopics in the asynchronous on-line discussion forums of the on-line learning community?

Page 17: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Topic Modelling Process

• The topic modelling process detects the most relevant topics discussed in the asynchronous discussion forums of on-line educational environments.

Page 18: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Topic Dynamics

• First decomposition:
– Chatter topics, which are internally driven, can be seen as sustained discussion topics. New thoughts on chatter topics are published every day in an educational community, and some members react to previously posted ideas.
– Spike topics, which are externally induced, produce sharp rises in postings.

Page 20: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Topic Dynamics (II)

• Second decomposition:
– Just spike. These topics have a very low correlation with any chatter topic, but they are highly correlated with an external event, such as New Year greetings or the participants' initial introductions. They are initially inactive, become very active within a particular time sub-window, and then return to inactivity.
– Spiky chatter. These topics have a high correlation with a chatter topic and, additionally, are very sensitive to external events. For example, the scores topic can be classified as a spiky chatter subtopic because of its strong correlation with the exam topic and its sensitivity to an external event (the publication of the participants' scores).
– Mostly chatter. These topics are discussed continuously at moderate levels throughout the entire discussion window, with little variation on average. These are the topics previously called chatter topics, such as exams.
• Forum topics = mostly chatter topics + spiky chatter topics.
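
The second decomposition can be sketched as a rule over two measurements of a topic's daily frequency series: whether it contains a sharp peak, and how strongly it correlates with a chatter topic. The spike test (a day more than `k` standard deviations above the mean) and both threshold values are assumptions chosen for illustration; the slides give no numeric criteria.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def has_spike(series, k=2.0):
    """True if some day exceeds the mean by more than k standard deviations."""
    m, s = mean(series), pstdev(series)
    return s > 0 and max(series) > m + k * s


def classify_topic(series, chatter_series, corr_threshold=0.5, k=2.0):
    """Assign one of the three labels of the second decomposition.

    corr_threshold and k are illustrative values, not taken from the slides.
    """
    correlated = pearson(series, chatter_series) >= corr_threshold
    spiky = has_spike(series, k)
    if spiky and not correlated:
        return "just spike"
    if spiky and correlated:
        return "spiky chatter"
    return "mostly chatter"
```

Under these assumptions, a series that is flat except for one burst and unrelated to any chatter topic comes out as "just spike", while a series that follows a chatter topic and also has a burst comes out as "spiky chatter".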

Page 21: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Selecting forum topics

• Three weight functions
• Best fit: weighted frequency

Page 22: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Third Question

• Could they be characterized in an automatic way?
• Two algorithms:
– One for mostly chatter topics
– A second one for spiky chatter topics
• Results
– Topics and subtopics

Page 23: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Example: topic modelling result

Page 24: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

What else?

• Create Topic networks per Forum

Page 27: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Thanks for your attention! Any questions?

Page 28: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Topic Modelling: Chatter

• The DumpTerms set contains all terms already identified as irrelevant, such as names or surnames.
• Plural detection.
• The accumulated frequency f(ti) is computed for each term.
• Then, the terms are ranked.
• As a result, we obtain a set called Chatter.
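
The chatter steps on this slide might look roughly like the following. It is a sketch under assumptions: plural detection is reduced to stripping a trailing "s", and the `top_n` cut-off is hypothetical, since the slide does not say how many ranked terms end up in the Chatter set.

```python
from collections import Counter


def singular(term):
    """Naive plural detection: strip a trailing 's' (assumption)."""
    return term[:-1] if term.endswith("s") and len(term) > 3 else term


def chatter_terms(daily_counts, dump_terms, top_n=10):
    """daily_counts: {day: {term: count}}.

    Returns the top_n terms ranked by accumulated frequency f(ti).
    top_n is a hypothetical cut-off, not a value from the slides.
    """
    accumulated = Counter()
    for counts in daily_counts.values():
        for term, n in counts.items():
            base = singular(term)
            if term in dump_terms or base in dump_terms:
                continue  # already known to be irrelevant (e.g. names)
            accumulated[base] += n  # plural and singular share one entry
    return [term for term, _ in accumulated.most_common(top_n)]
```

Merging plurals before accumulating matters because otherwise a topic's frequency would be split across two entries and could rank lower than it should.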

Page 29: eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Topic Modelling: Spikes

• For each pair of terms, ti from the set T and tj from the Chatter set, the number of messages mk in which both terms appear together (si) is counted.
• Also, the probability of occurrence of tj given ti (cri) is calculated.
• If these values fall within predefined intervals, the algorithm adds ti to the Spike set.
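
A minimal sketch of the spike steps follows. Each message is represented as a set of terms, cri is estimated as si divided by the number of messages containing ti, and the two intervals `s_range` and `cr_range` are placeholders, because the slide only says the intervals are predefined.

```python
def spike_terms(messages, terms, chatter,
                s_range=(2, float("inf")), cr_range=(0.5, 1.0)):
    """messages: list of sets of terms; terms: the set T; chatter: the Chatter set.

    Returns the set of ti that pass both interval checks for some tj.
    s_range and cr_range are placeholder intervals, not values from the slides.
    """
    spike = set()
    for ti in terms:
        with_ti = [m for m in messages if ti in m]
        if not with_ti:
            continue
        for tj in chatter:
            if ti == tj:
                continue  # assumption: a term is not paired with itself
            s = sum(1 for m in with_ti if tj in m)  # co-occurrences of ti and tj
            cr = s / len(with_ti)                   # estimate of P(tj | ti)
            if s_range[0] <= s <= s_range[1] and cr_range[0] <= cr <= cr_range[1]:
                spike.add(ti)
                break
    return spike
```

For instance, if "score" almost always appears in messages that also mention the chatter term "exam", it passes both checks and is added to the Spike set, which matches the scores and exam example given for spiky chatter.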