eMadrid 2014-01-17 uned Salvador Ros (UNED) "Big Data in Education"

Post on 06-May-2015


1

Analyzing the students' behavior and relevant topics in virtual learning communities

Llanos Tobarra, Antonio Robles-Gómez, Salvador Ros, Roberto Hernández, Agustín C. Caminero

Computers in Human Behavior 31 (2014) 659-669, online December 2013, JCR Q1

Departamento de Sistemas de Comunicación y Control, Universidad Nacional de Educación a Distancia (UNED)
{llanos,arobles,sros,roberto,accaminero}@scc.uned.es

2

Outline

• Introduction
• Outcomes

3

Introduction

• UNED is a distance-learning university.
• Specific techniques are needed for monitoring and analysing the information gathered by the LMS.
• Learning Analytics is defined as: "the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimising learning and the environments in which it occurs." (G. Siemens, LAK'11)

4

Different Approaches

Type of Analytics | Level or Object of Analysis | Who Benefits?
Learning Analytics / Educational data mining | Course-level: social networks, conceptual development, discourse analysis, “intelligent curriculum” | Learners, faculty
Learning Analytics / Educational data mining | Departmental: predictive modeling, patterns of success/failure | Learners, faculty
Academic Analytics | Institutional: learner profiles, performance of academics, knowledge flow | Administrators, funders, marketing
Academic Analytics | Regional (state/provincial): comparisons between systems | Funders, administrators
Academic Analytics | National and International | National governments, education authorities

5

Learning Analytics Process


7

Where do we focus?

• Forums
– Essential for negotiation and exchange of ideas.
– Collaborative learning.
– High correlation of students' participation levels with positive learning outcomes and knowledge construction.

8

Outcomes

• Provide an extensive analysis of the students' behaviour in an on-line learning community.
• Propose a set of algorithms to automatically characterize the most relevant topics of the community.
• How?
– Student and faculty interaction, by means of the messages in the forums, has been analyzed.
• Results
– Patterns of behaviour have been found.
– Possibility of adapting the learning/teaching process.

9

Questions

• What are the students’ behavior patterns during their interaction and participation in the asynchronous virtual discussion forums of the virtual learning community?

• What are the most relevant topics and subtopics in the asynchronous on-line discussion forums of the on-line learning community?

• Could they be characterized in an automatic way?

10

Input data

• Data from two academic years: 2010-2011 and 2011-2012
• Forum Student
• Forum Activities 1-6
• Forum Activities 7-11
• Forum Faculty
• About 2000 messages

11

Procedure

• Data collection and statistical analysis
• Semantic analysis
• Calculation of stem networks

12

Procedure

For each participant (statistical indicators):
• Number of published messages
• Number of replies
• Number of initiated conversations
• Number of initiated conversations without replies
• Number of conversations where the participant has posted a message
• Number of forums where the participant has posted a message
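The indicators above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Message` record and its field names are hypothetical, and "number of replies" is assumed to mean replies written by the participant.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    # Hypothetical record layout; the real LMS export is not described in the slides.
    msg_id: int
    author: str
    forum: str
    thread_id: int             # conversation the message belongs to
    parent_id: Optional[int]   # None when the message starts a conversation


def participant_indicators(messages):
    """Return {author: dict with the six per-participant indicators}."""
    # Size of each conversation, used to detect conversations without replies.
    thread_sizes = defaultdict(int)
    for m in messages:
        thread_sizes[m.thread_id] += 1

    stats = defaultdict(lambda: {
        "published": 0, "replies": 0, "initiated": 0,
        "initiated_without_replies": 0, "threads": set(), "forums": set(),
    })
    for m in messages:
        s = stats[m.author]
        s["published"] += 1
        s["threads"].add(m.thread_id)
        s["forums"].add(m.forum)
        if m.parent_id is None:
            s["initiated"] += 1
            if thread_sizes[m.thread_id] == 1:
                s["initiated_without_replies"] += 1
        else:
            s["replies"] += 1  # assumption: replies written BY the participant

    return {
        author: {
            "published": s["published"],
            "replies": s["replies"],
            "initiated": s["initiated"],
            "initiated_without_replies": s["initiated_without_replies"],
            "conversations": len(s["threads"]),
            "forums": len(s["forums"]),
        }
        for author, s in stats.items()
    }
```

Counting conversations and forums through sets keeps each of them counted once per participant, however many messages were posted there.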

13

Procedure

• Semantics
– Split each message into basic tokens
– Remove stop-words
– Obtain the stem of each token (Porter algorithm)
– Calculate daily and global frequencies
• Tools: Apache Lucene library, Snowball tool
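The semantic steps can be illustrated with a small, self-contained Python sketch. It does not use Apache Lucene or Snowball: `toy_stem` is a crude suffix stripper standing in for the Porter algorithm, and the stop-word list and the `(day, text)` input format are assumptions made only for this example.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption); a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "on", "to", "and", "for"}


def toy_stem(token):
    """Crude suffix stripping: a stand-in for the Porter stemmer, NOT Porter itself."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stems_of(text):
    """Tokenize, drop stop-words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


def stem_frequencies(messages):
    """messages: iterable of (day, text). Returns (daily, overall) stem counters."""
    daily = defaultdict(Counter)
    overall = Counter()
    for day, text in messages:
        stems = stems_of(text)
        daily[day].update(stems)
        overall.update(stems)
    return daily, overall
```

In practice `toy_stem` would be replaced by a real stemmer for the language of the forum messages; the counting logic stays the same.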

14

First Question

• What are the students’ behavior patterns during their interaction and participation in the asynchronous virtual discussion forums of the virtual learning community?

15

Student behaviour modelling

• Students can be classified, depending on their pattern of behaviour, as:
– Producers
  • Proactive
  • Reactive
– Consumers
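The slide does not state the criteria that separate these classes. Purely as an illustration, the sketch below assumes that a consumer reads without posting, that a proactive producer mostly starts conversations, and that a reactive producer mostly replies; the function name and the rule itself are hypothetical.

```python
def classify_participant(published, initiated, replies):
    """Hypothetical rule for illustration only; not the criteria used in the paper."""
    if published == 0:
        return "consumer"            # assumption: reads but never posts
    if initiated >= replies:
        return "proactive producer"  # assumption: mostly starts conversations
    return "reactive producer"       # assumption: mostly answers others
```

For example, `classify_participant(5, 1, 4)` returns `"reactive producer"` under this rule.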

SIIE'12

16

Second Question

• What are the most relevant topics and subtopics in the asynchronous on-line discussion forums of the on-line learning community?

17

Topic Modelling Process

• The topic modelling process detects the most relevant topics discussed in the asynchronous discussion forums of on-line educational environments.

18

Topic Dynamics

• First decomposition:
– Chatter topics, which are internally driven, can be seen as sustained discussion topics. New thoughts on chatter topics are published every day in an educational community, and some members react to previously posted ideas.
– Spike topics, which are externally induced, produce sharp rises in postings.


20

Topic Dynamics (II)

• Second decomposition:
– Just spike. These topics have a very low correlation with any chatter topic, but they are highly correlated with an external event, such as New Year greetings or the initial introductions of participants. They are initially inactive, become very active within a particular time sub-window, and then become inactive again.
– Spiky chatter. These topics have a high correlation with a chatter topic and, additionally, are very sensitive to external events. For example, scores can be classified as a spiky chatter subtopic because of its strong correlation with the exam topic and its sensitivity to an external event (the publication of the participants' scores).
– Mostly chatter. These topics are discussed continuously at moderate levels throughout the entire discussion window, with little variation on average. These are the previously named chatter topics, such as exams.
• Forum topics = mostly chatter topics + spiky chatter topics.
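The three categories can be made concrete with a rough heuristic over a topic's daily posting counts. This is only an illustrative sketch: the slides define the categories through correlation with chatter topics and external events, whereas the rule below swaps in two simpler signals (share of active days and peak-to-mean ratio), and both thresholds are assumed values.

```python
from statistics import mean


def classify_topic(daily_counts, active_fraction=0.3, peak_ratio=3.0):
    """Heuristic stand-in (assumption), not the correlation-based method of the paper.

    daily_counts: number of postings mentioning the topic on each day of the window.
    """
    active_days = sum(1 for c in daily_counts if c > 0)
    if active_days == 0:
        return "inactive"
    # Active only in a short sub-window: behaves like a pure spike.
    if active_days / len(daily_counts) < active_fraction:
        return "just spike"
    # Sustained discussion: check whether it also shows sharp rises.
    if max(daily_counts) > peak_ratio * mean(daily_counts):
        return "spiky chatter"
    return "mostly chatter"
```

Under this rule, forum topics would be those classified as `"mostly chatter"` or `"spiky chatter"`.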

21

Selecting forum topics

• Three weight functions
• Best fit: weighted frequency

22

Third Question

• Could they be characterized in an automatic way?

• Two algorithms:
– One for mostly chatter topics
– A second one for spiky chatter topics
• Results
– Topics and subtopics

23

Example: topic modelling result

24

What else?

• Create Topic networks per Forum

25

26

27

Thanks for your attention! Any questions?

28

Topic Modelling: Chatter

• The DumpTerms set contains all terms already detected as irrelevant topics, such as names or surnames.

• Plural detection.
• The accumulated frequency f(ti) is computed for each term.
• Then the terms are ranked.
• As a result, we obtain a set called Chatter.
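The steps of this slide can be sketched as follows. The sketch makes several assumptions that the slides do not specify: plural detection is reduced to dropping a trailing "s", the input is one term-count mapping per day, and the Chatter set is cut off at a hypothetical `top_n` highest-ranked terms.

```python
from collections import Counter


def singular(term):
    """Naive plural folding (assumption): drop one trailing 's' from longer terms."""
    return term[:-1] if term.endswith("s") and len(term) > 3 else term


def chatter_set(daily_frequencies, dump_terms, top_n=10):
    """daily_frequencies: iterable of {term: count}, one mapping per day.

    Returns the top_n terms ranked by accumulated frequency f(ti),
    skipping every term listed in dump_terms.
    """
    accumulated = Counter()
    for day_counts in daily_frequencies:
        for term, count in day_counts.items():
            term = singular(term)
            if term not in dump_terms:
                accumulated[term] += count
    return [term for term, _ in accumulated.most_common(top_n)]
```

For instance, with `dump_terms={"maria"}` the counts for "exam" and "exams" are merged and the name "maria" never reaches the ranking.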


29

Topic Modelling: Spikes

• For each pair of terms, ti from the set T and tj from the Chatter set, the number of co-occurrences (si) of both terms in any message mk is counted.
• The probability of occurrence of tj given ti (cri) is also calculated.
• If these values fall within predefined intervals, the algorithm adds ti to the Spike set.
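A minimal sketch of this procedure is given below. The interval bounds `s_range` and `cr_range` are placeholder values (the slides only say the intervals are predefined), messages are assumed to be already reduced to sets of terms, and skipping terms that already belong to Chatter is an additional assumption.

```python
def spike_set(messages, terms, chatter, s_range=(2, 50), cr_range=(0.5, 1.0)):
    """messages: list of sets of terms, one set per message mk.

    Adds ti to the result when, for some chatter term tj, both the
    co-occurrence count and the estimated P(tj | ti) fall inside the intervals.
    """
    spikes = set()
    for ti in terms:
        if ti in chatter:
            continue  # assumption: chatter terms are not spike candidates
        with_ti = [m for m in messages if ti in m]
        if not with_ti:
            continue
        for tj in chatter:
            s = sum(1 for m in with_ti if tj in m)  # messages containing both ti and tj
            cr = s / len(with_ti)                   # estimate of P(tj | ti)
            if s_range[0] <= s <= s_range[1] and cr_range[0] <= cr <= cr_range[1]:
                spikes.add(ti)
                break
    return spikes
```

With messages where "score" always appears together with the chatter term "exam", "score" ends up in the Spike set, while terms that never or rarely co-occur with a chatter term are left out.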
