
Indian Institute Of Information Technology, Allahabad

Student Grade Prediction Intelligent System

Under guidance of: Dr. Sudip Sanyal, IIIT-Allahabad

Student: Dinh Ngoc Lan (MS200507), M.Tech in Software Engineering, IIIT-Allahabad


Content

• Objective of the Project
• C4.5 Algorithm on Student Grade Prediction Intelligent System
• Software Architecture and Design
• Screenshots and Result Comparison with Other Method (CART Pro 6.0)
• Advantages of the Software
• Limitations/Concerns
• Scope of Improvement
• Software Demo
• References


Objective of the Project

• Predict the grade of any student, in any given paper, for currently running courses, i.e. courses whose grades have not yet been declared.

– Predictions will help teachers/instructors classify students based on predicted grades and suggest which subjects they should focus on.

– Predictions will also help students set their study goals and improve their grades.

• Develop a classification tool for other work in the future.

(The basic technique used for predictions is the C4.5 algorithm.)


Student Grade Training Table

Marks in the subjects OSS, SEN and SRE; ASS is the target subject.

OSS     SEN     SRE     ASS (target)
Aplus   Aplus   A       Aplus
Aplus   Aplus   Aplus   Aplus
A       A       Aplus   Aplus
Aplus   A       Aplus   Bplus
Aplus   Aplus   A       A
A       A       Aplus   Bplus
Bplus   Bplus   Bplus   A
A       Bplus   B       Bplus
Aplus   Aplus   Aplus   Aplus
A       Bplus   C       A
Bplus   A       B       Bplus
Bplus   Bplus   Bplus   Bplus
A       A       A       A

This training table will be used to build a decision tree using the C4.5 algorithm.



Decision Tree created from student grade table

[Decision-tree figure: the root node tests SRE, deeper nodes test SEN, and each leaf represents the predicted value of ASS; see the rule set on the next slide for the full tree.]

A decision tree is built from the previous training table.



Construct Rule Set if … then …

Example of a rule set (if … then …) from the previous decision tree:
• if SRE = A plus and SEN = A plus then ASS = A plus
• if SRE = A plus and SEN = A then ASS = B plus
• if SRE = A and SEN = A plus then ASS = A
• if SRE = A and SEN = A then ASS = A
• if SRE = B plus then ASS = A
• if SRE = B then ASS = B plus
• if SRE = C then ASS = A

A decision tree can be represented as an IF-THEN rule set.
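As an illustration, such a rule set can be applied directly in code to predict a grade. The following is a minimal Java sketch, not taken from the project's sources: the class and method names are hypothetical, and grades are written without spaces ("Aplus", "Bplus"), as in the training table.

// Minimal sketch: applying the IF-THEN rule set derived from the decision tree.
// Class and method names are hypothetical; the rules are the ones listed above.
public class GradeRules {

    /** Predicts the ASS grade from the SRE and SEN grades using the rule set. */
    public static String predictAss(String sre, String sen) {
        if (sre.equals("Aplus") && sen.equals("Aplus")) return "Aplus";
        if (sre.equals("Aplus") && sen.equals("A"))     return "Bplus";
        if (sre.equals("A")     && sen.equals("Aplus")) return "A";
        if (sre.equals("A")     && sen.equals("A"))     return "A";
        if (sre.equals("Bplus"))                        return "A";
        if (sre.equals("B"))                            return "Bplus";
        if (sre.equals("C"))                            return "A";
        return "Unknown"; // no matching rule
    }

    public static void main(String[] args) {
        System.out.println(predictAss("Aplus", "A")); // prints "Bplus"
    }
}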


C4.5 Algorithm

Function C4.5(R: a set of non-target attributes, C: the target attribute, S: a training set) returns a decision tree;
Begin
  If S is empty, return a single node with value Failure.
  If S consists of records that all have the same value for the target attribute, return a single leaf node with that value.
  If R is empty, return a single node with the most frequent of the values of the target attribute found in the records of S
  [in this situation there may be errors, i.e. examples that will be improperly classified].
  Otherwise:
    Let A be the attribute with the largest GainRatio(A, S) among the attributes in R;
    Let {aj | j = 1, 2, ..., m} be the values of attribute A;
    Let {Sj | j = 1, 2, ..., m} be the subsets of S consisting, respectively, of the records with value aj for A;
    Return a tree with root labeled A and arcs labeled a1, a2, ..., am going, respectively, to the trees C4.5(R – {A}, C, S1), C4.5(R – {A}, C, S2), ..., C4.5(R – {A}, C, Sm);
    (C4.5 is applied recursively to the subsets Sj until they are empty.)
End
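To make the recursion concrete, here is a minimal, self-contained Java sketch of a C4.5-style tree builder for categorical attributes. It is only an illustration under simplifying assumptions (no pruning, no continuous attributes, no missing values), not the project's Construction Tree Module; all class and method names are hypothetical. Running its main() on the training table from the earlier slide selects SRE as the root, matching the worked example on the next slide.

import java.util.*;

// Compact sketch of the recursion described above (categorical attributes only, no pruning).
public class C45Sketch {

    // A node is either a leaf (label != null) or an internal test on an attribute.
    static class Node {
        String label;                        // predicted target value, for leaves
        String attribute;                    // tested attribute, for internal nodes
        Map<String, Node> children = new LinkedHashMap<>();
    }

    /** Builds a decision tree from rows of attribute values; column `target` is the class. */
    static Node build(List<String[]> rows, List<Integer> attrs, String[] names, int target) {
        Node node = new Node();
        if (rows.isEmpty()) { node.label = "Failure"; return node; }

        Map<String, Integer> counts = countValues(rows, target);
        if (counts.size() == 1 || attrs.isEmpty()) {       // pure subset, or no attributes left
            node.label = majority(counts);
            return node;
        }

        int best = attrs.get(0);
        double bestRatio = -1;
        for (int a : attrs) {                               // attribute with the largest gain ratio
            double r = gainRatio(rows, a, target);
            if (r > bestRatio) { bestRatio = r; best = a; }
        }

        node.attribute = names[best];
        List<Integer> remaining = new ArrayList<>(attrs);
        remaining.remove(Integer.valueOf(best));
        for (String v : countValues(rows, best).keySet()) { // one subtree per value of the attribute
            List<String[]> subset = new ArrayList<>();
            for (String[] r : rows) if (r[best].equals(v)) subset.add(r);
            node.children.put(v, build(subset, remaining, names, target));
        }
        return node;
    }

    static Map<String, Integer> countValues(List<String[]> rows, int col) {
        Map<String, Integer> c = new LinkedHashMap<>();
        for (String[] r : rows) c.merge(r[col], 1, Integer::sum);
        return c;
    }

    static String majority(Map<String, Integer> counts) {
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    static double info(List<String[]> rows, int target) {   // entropy of the target column
        double e = 0, n = rows.size();
        for (int c : countValues(rows, target).values()) e -= (c / n) * (Math.log(c / n) / Math.log(2));
        return e;
    }

    static double gainRatio(List<String[]> rows, int attr, int target) {
        double n = rows.size(), infoX = 0, split = 0;
        for (Map.Entry<String, Integer> e : countValues(rows, attr).entrySet()) {
            List<String[]> subset = new ArrayList<>();
            for (String[] r : rows) if (r[attr].equals(e.getKey())) subset.add(r);
            double w = e.getValue() / n;
            infoX += w * info(subset, target);
            split -= w * (Math.log(w) / Math.log(2));
        }
        double gain = info(rows, target) - infoX;
        return split == 0 ? 0 : gain / split;
    }

    public static void main(String[] args) {
        String[] names = {"OSS", "SEN", "SRE", "ASS"};
        List<String[]> rows = Arrays.asList(
            new String[]{"Aplus","Aplus","A","Aplus"},     new String[]{"Aplus","Aplus","Aplus","Aplus"},
            new String[]{"A","A","Aplus","Aplus"},         new String[]{"Aplus","A","Aplus","Bplus"},
            new String[]{"Aplus","Aplus","A","A"},         new String[]{"A","A","Aplus","Bplus"},
            new String[]{"Bplus","Bplus","Bplus","A"},     new String[]{"A","Bplus","B","Bplus"},
            new String[]{"Aplus","Aplus","Aplus","Aplus"}, new String[]{"A","Bplus","C","A"},
            new String[]{"Bplus","A","B","Bplus"},         new String[]{"Bplus","Bplus","Bplus","Bplus"},
            new String[]{"A","A","A","A"});
        Node root = build(rows, Arrays.asList(0, 1, 2), names, 3);
        System.out.println("Root attribute: " + root.attribute); // SRE, as in the worked example
    }
}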


Example: steps to choose the best subject as a candidate node of the tree


• In the target subject ASS, 4 cases belong to Aplus, 4 cases to A and 5 cases to Bplus, so we have:

info(T) = – 4/13 x log2(4/13) – 4/13 x log2(4/13) – 5/13 x log2(5/13) = 1.576

(This represents the average information needed to identify the class of a case in T.)

• Calculate infoX(T) and gain(X) for each non-target subject X in the training table. Counting infoOSS(T) first: using OSS divides T into three subsets (5 cases with OSS = Aplus, 5 with A and 3 with Bplus), so

infoOSS(T) = 5/13 x (–3/5 x log2(3/5) – 1/5 x log2(1/5) – 1/5 x log2(1/5))
           + 5/13 x (–1/5 x log2(1/5) – 2/5 x log2(2/5) – 2/5 x log2(2/5))
           + 3/13 x (–2/3 x log2(2/3) – 1/3 x log2(1/3)) = 1.324

gain(OSS) = info(T) – infoOSS(T) = 1.576 – 1.324 = 0.252

Similarly, computing for the other possible choices (SEN and SRE) we find: infoSEN(T) = 1.084 and gain(SEN) = 1.576 – 1.084 = 0.492; infoSRE(T) = 0.739 and gain(SRE) = 1.576 – 0.739 = 0.837.

• Calculate split info(X) and gain ratio(X) for each non-target subject X in the training table:

split info(OSS) = – 5/13 x log2(5/13) – 5/13 x log2(5/13) – 3/13 x log2(3/13) = 1.549

gain ratio(OSS) = gain(OSS) / split info(OSS) = 0.252 / 1.549 = 0.162

Similarly, computing for the other subjects we get: gain ratio(SEN) = 0.312 and gain ratio(SRE) = 0.392.

• Finally, SRE is chosen as the best subject because it has the highest gain ratio.
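The arithmetic above is easy to check mechanically. The following throwaway Java snippet, written only for this worked example (the class and method names are hypothetical), recomputes info(T), infoOSS(T), gain(OSS), split info(OSS) and gain ratio(OSS) from the counts used on this slide.

// Recomputes the worked example from the counts on this slide (13 training cases).
public class GainRatioCheck {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    // Entropy of a distribution given as counts, e.g. {4, 4, 5} out of 13 cases.
    static double info(double total, double... counts) {
        double e = 0;
        for (double c : counts) e -= (c / total) * log2(c / total);
        return e;
    }

    public static void main(String[] args) {
        double infoT = info(13, 4, 4, 5);                 // ~1.576
        // OSS splits T into subsets of sizes 5 (ASS counts 3/1/1), 5 (1/2/2) and 3 (2/1).
        double infoOss = 5.0 / 13 * info(5, 3, 1, 1)
                       + 5.0 / 13 * info(5, 1, 2, 2)
                       + 3.0 / 13 * info(3, 2, 1);        // ~1.324
        double gainOss = infoT - infoOss;                 // ~0.252
        double splitOss = info(13, 5, 5, 3);              // ~1.549
        System.out.printf("info(T)=%.3f gain(OSS)=%.3f gain ratio(OSS)=%.3f%n",
                infoT, gainOss, gainOss / splitOss);      // ratio ~0.16, i.e. the 0.162 above up to rounding
    }
}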


Error-based pruning: example

• Consider the sub-tree represented by the rule set below:

if SRE = A and SEN = A plus then ASS = A (1|2)
if SRE = A and SEN = A then ASS = A (1|1)

• The key idea: if the predicted error of the leaf is smaller than the sub-tree's error, prune the tree by replacing the sub-tree with that leaf.

Calculate the number of predicted errors for the sub-tree: 2 x U(1,2) + 1 x U(1,1) = 2 x 0.9065 + 1 x 0.75 = 2.563

Calculate the number of predicted errors for the leaf A over the 3 cases of the sub-tree: 3 x U(1,3) = 3 x 0.663 = 1.989

Since the number of predicted errors of the leaf is smaller than that of the sub-tree, we can replace this sub-tree with the leaf A: if SRE = A then ASS = A (2|3)
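The pruning decision itself is just a comparison of two expected error counts. The snippet below only reproduces that comparison, plugging in the U(E, N) values quoted on this slide as constants; in C4.5 these come from a confidence-level bound on the error rate (the normal-approximation formula appears on the later error-based pruning slide), and the class name here is hypothetical.

// Sketch of the pruning decision only; the U(E,N) values are the ones quoted above.
public class PruneDecision {
    public static void main(String[] args) {
        double subTreeErrors = 2 * 0.9065 + 1 * 0.75; // predicted errors of the two leaves = 2.563
        double leafErrors    = 3 * 0.663;             // predicted errors of the single leaf A = 1.989
        if (leafErrors < subTreeErrors) {
            System.out.println("Prune: replace the sub-tree with the leaf 'if SRE = A then ASS = A'");
        }
    }
}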


Software Architecture

[Architecture diagram: Clients; a Web Server hosting the Student Query Module and the Rule Set Processing Module, which works on the generated rule set; and a Server hosting the Student Mark Estimation Software (Construction Tree Module and Pruned Tree Module) together with the Exam Cell Database.]

The software follows a 3-tier architecture.


Package Diagram

The core software has nearly 2,000 lines of code, organized into 5 packages and 16 classes, developed in the Java programming language.


Class Diagram for Construction Tree Module


Class Diagram for Pruned Tree Module


Sequence Diagram for Construction Tree Module


Sequence Diagram for Pruned Tree Module


Database Design

Student Estimation Mark database design:

Semester: SemesterName VARCHAR2(20) PK
Grade: GradeName VARCHAR2(10) PK
Course: CourseName VARCHAR2(20) PK
Subject: SubjectName VARCHAR2(20) PK, FullName VARCHAR2(50)
Years: Batch NUMBER PK
StudentList: IDCard VARCHAR2(20) PK, StudentName VARCHAR2(20), Photo BFILE, Information VARCHAR2(1000), Address VARCHAR2(250), Telephone NUMBER, E-mail VARCHAR2(40)
CourseDetail: SubjectName VARCHAR2(20) FK, CourseName VARCHAR2(20) FK, SemesterName VARCHAR2(20) FK, Batch NUMBER FK, Credit NUMBER
TeacherComment: SubjectName VARCHAR2(20) FK, GradeName VARCHAR2(10) FK, CommentContent VARCHAR2(200)
StudentData: IDCard VARCHAR2(20) FK, SubjectName VARCHAR2(20) FK, GradeName VARCHAR2(10) FK
Attend: IDCard VARCHAR2(20) FK, CourseName VARCHAR2(20) FK, Batch NUMBER FK
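As an illustration of how the StudentData table might be read from the Java side, here is a small JDBC sketch. The connection URL, the credentials, the example IDCard value and the use of a plain DriverManager connection (rather than the connection pool mentioned on the advantages slide) are assumptions made only for this example.

import java.sql.*;

// Hypothetical JDBC sketch: read all declared grades of one student from StudentData.
public class StudentDataQuery {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:oracle:thin:@localhost:1521:XE";        // assumed connection details
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                 "SELECT SubjectName, GradeName FROM StudentData WHERE IDCard = ?")) {
            ps.setString(1, "MS200507");                           // example IDCard value
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("SubjectName") + ": " + rs.getString("GradeName"));
                }
            }
        }
    }
}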


A Screenshot of Student Grade Prediction Intelligent System


Result Comparison with CART Pro 6.0

Input for CART Pro 6.0 is an Excel file with 1,000 student records.


Result Comparison with CART Pro 6.0

Rules


Result Comparison with CART Pro 6.0


Advantages of the software

• Performance: using Java with Web service technology reduces bandwidth consumption and makes the environment more reliable, available and safe.

• Resource utilization efficiency: database connection pooling allows the application to reuse connections to the Oracle database.

• Security: the Sun Java Application Server is used as a middle tier between clients and the RDBMS, with the advanced security features provided by Sun.

• Interfaces: the interface is developed with the new JavaServer Faces and asynchronous Web 2.0 technology.

• Ease of use, portability, maintainability, expandability and ease of system administration: the software is developed using advanced features of J2EE.


Limitations/concerns

• Lack of student records to construct the decision tree.

• Teachers' suggestions need to be updated regularly according to new situations in the teaching process.

• The prediction is based on statistical classification, not on the full capability of each student.


Scope of improvement

• Integrate with other student applications such as Student Study Progress Management and Student Research Management.

• Automatically send results to the concerned students by e-mail to help them focus on certain subjects.

• Integrate with other prediction techniques such as neural networks and genetic algorithms to give closer prediction results.


References

[1] Quinlan, J. R., "C4.5: Programs for Machine Learning". San Mateo, CA: Morgan Kaufmann, 1993.
[2] Carter, C., and Catlett, J., "Assessing credit card applications using machine learning". IEEE Expert, Fall issue, pp. 71-79, 1987.
[3] Aha, D. W., Kibler, D., and Albert, M. K., "Instance-based learning algorithms". Machine Learning, pp. 37-66, 1991.
[4] Stanfill, C., and Waltz, D., "Toward memory-based reasoning". Communications of the ACM, pp. 1213-1228, 1986.
[5] Nilsson, N. J., "Learning Machines". New York: McGraw-Hill, 1965.
[6] Hinton, G. E., "Learning distributed representations of concepts". Proceedings of the Eighth Annual Conference of the Cognitive Science Society, Amherst, MA. Reprinted in R. G. M. Morris (ed.), Parallel Distributed Processing: Implications for Psychology and Neurobiology. Oxford, UK: Oxford University Press, 1986.
[7] McClelland, J. L., and Rumelhart, D. E., "Explorations in Parallel Distributed Processing". Cambridge, MA: MIT Press, 1988.
[8] Dietterich, T. G., Hild, H., and Bakiri, G., "A comparative study of ID3 and backpropagation for English text-to-speech mapping". Proceedings of the Seventh International Conference on Machine Learning (pp. 24-31). San Mateo, CA: Morgan Kaufmann, 1989.
[9] Holland, J. H., "Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems". In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach (Vol. 2). San Mateo, CA: Morgan Kaufmann, 1986.
[10] Java EE 5 Tutorial, http://java.sun.com/javaee/5/docs/tutorial/doc/

(See the thesis for the complete list of references.)


Thank You

Questions, please!


Distributed Multi Tier Architecture [10]


Result Comparison with CART Pro 6.0



Which subject is the best classifier?

Method 1 (in case of a small outcome – few values of the target subject):

gain(X) = info(T) – infoX(T)

Where:
– X is a subject and T is the set partitioned by X;
– info(T) is the average amount of information needed to identify the target-subject class of a row in T (this quantity is also known as the entropy of the set T);
– infoX(T) is the sum of the information of the n subsets obtained after T has been partitioned according to the n values of X:

infoX(T) = sum over i = 1..n of ( |Ti| / |T| ) x info(Ti)



Which subject is the best classifier?

Method 2 (in case of a large outcome – many values of the target subject):

split info(X) = – sum over i = 1..n of ( |Ti| / |T| ) x log2( |Ti| / |T| )

gain ratio(X) = gain(X) / split info(X)

Where split info(X) represents the potential information generated by dividing T into n subsets, whereas the information gain measures the information relevant to classification that arises from the same division.



Error-based pruning

• The normal approximation to the binomial distribution, taken at its upper bound, is used to calculate the predicted error for sub-trees and leaves.

• If the predicted error of a leaf is smaller than the sub-tree's error, the tree is pruned by replacing the sub-tree with that leaf.

U(E, N) = p + z(1–α/2) x sqrt( p (1 – p) / N )

Where:
– U(E, N): the upper bound;
– N: the number of classification cases;
– E: the number of errors;
– p = E/N: the error rate on the training data;
– sqrt( p (1 – p) / N ): the standard deviation of the estimate p = E/N;
– α: the confidence level;
– z(1–α/2): the z-value, obtained from the z-table.
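As a sketch of how this bound could be evaluated, the snippet below implements the normal-approximation formula above with the z-value passed in as a parameter. The class name and the z-value used in main() are illustrative assumptions; the U(E, N) figures quoted in the earlier pruning example come from C4.5's confidence bound and may differ slightly from this approximation.

// Upper bound on the error rate via the normal approximation, as in the formula above.
public class ErrorBound {

    /** U(E, N) = p + z * sqrt(p * (1 - p) / N), with p = E / N and z from the z-table. */
    static double upperBound(int errors, int cases, double z) {
        double p = (double) errors / cases;
        return p + z * Math.sqrt(p * (1 - p) / cases);
    }

    public static void main(String[] args) {
        double z = 1.15;                                   // hypothetical z-value, for illustration only
        // Predicted error counts for the sub-tree and the candidate leaf from the pruning example:
        double subTree = 2 * upperBound(1, 2, z) + 1 * upperBound(1, 1, z);
        double leaf    = 3 * upperBound(1, 3, z);
        System.out.println("prune = " + (leaf < subTree)); // prune when the leaf's error is smaller
    }
}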
