KDD CUP 2007
2007_12_31 KDD CUP 2007 Neural Network HW2, Group 14: Yu Szu-Hsien (M9609208), Ciou Yun-Rong (M9608305)

Transcript of KDD CUP 2007

Page 1: KDD CUP 2007

2007_12_31 KDD CUP 2007 Neural Network HW2

KDD CUP 2007 Neural Network HW2

Group 14

Yu Szu-Hsien (M9609208)

Ciou Yun-Rong (M9608305)

Page 2: KDD CUP 2007

Group 14 HW 2

How?

(method & system)

Page 3: KDD CUP 2007

1.  Make into a matrix

By analyzing the types of films that customers have rated, we can predict their ratings for other films of the same type.
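The idea on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: the sample ratings, the `movie_type` labels, and the helper names `build_matrix` and `predict_by_type` are hypothetical, and a per-type mean is one simple stand-in for the prediction step, not necessarily what the group implemented.

```python
from collections import defaultdict

# Hypothetical sample data: (customer_id, movie_id, rating on a 1-5 scale).
ratings = [
    ("c1", "m1", 5), ("c1", "m2", 4), ("c1", "m3", 1),
    ("c2", "m1", 4), ("c2", "m4", 2),
]

# Hypothetical film types; a real dataset would need its own type labels.
movie_type = {"m1": "action", "m2": "action", "m3": "drama",
              "m4": "drama", "m5": "action"}


def build_matrix(ratings):
    """Return a sparse customer x movie matrix stored as nested dicts."""
    matrix = defaultdict(dict)
    for customer, movie, rating in ratings:
        matrix[customer][movie] = rating
    return matrix


def predict_by_type(matrix, movie_type, customer, movie):
    """Predict a rating as the customer's mean rating for films of the same type."""
    target = movie_type[movie]
    same_type = [r for m, r in matrix[customer].items()
                 if movie_type[m] == target and m != movie]
    if not same_type:
        return None  # the customer has rated no film of this type
    return sum(same_type) / len(same_type)


matrix = build_matrix(ratings)
print(predict_by_type(matrix, movie_type, "c1", "m5"))  # 4.5
```

Storing the matrix as nested dictionaries keeps only the ratings that exist, which matters when most customer and movie pairs are empty.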

Page 4: KDD CUP 2007

2. The characteristics of the problem

The problem is based on data stored in an enormous database.

Each customer’s rating series reflects that customer’s personality, preferences, and rating time intervals.

For every movie, we can compile statistics on how many customers rated it at different times and treat them as a time series.

For every customer, we can compile statistics on which movies that customer rated over time and treat them as a time series.
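A minimal sketch of the two time series described on this slide, assuming each record carries a rating date. The record layout, the sample values, the monthly bucket size, and the function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical records: (customer_id, movie_id, rating, rating_date).
records = [
    ("c1", "m1", 5, date(2005, 1, 3)),
    ("c2", "m1", 4, date(2005, 1, 20)),
    ("c1", "m2", 3, date(2005, 2, 1)),
    ("c3", "m1", 2, date(2005, 3, 9)),
]


def movie_time_series(records):
    """Map each movie to a sorted list of (year, month, number_of_ratings)."""
    counts = defaultdict(Counter)
    for _customer, movie, _rating, day in records:
        counts[movie][(day.year, day.month)] += 1
    return {movie: [(y, m, n) for (y, m), n in sorted(c.items())]
            for movie, c in counts.items()}


def customer_time_series(records):
    """Map each customer to a date-ordered list of (date, movie, rating)."""
    series = defaultdict(list)
    for customer, movie, rating, day in records:
        series[customer].append((day, movie, rating))
    return {customer: sorted(items) for customer, items in series.items()}


print(movie_time_series(records)["m1"])
print(customer_time_series(records)["c1"])
```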

Page 5: KDD CUP 2007

Methods → How to find similar films and similar users?

Similarity measures

Poisson regression

Clustering analysis

Association rules

Random forests

Collaborative filtering (group filtering or social filtering)

Singular value decomposition (SVD)
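As one possible similarity measure for finding similar users, the sketch below computes cosine similarity over the films two customers have both rated. The slides do not say which measure the group used, so cosine similarity, the sample ratings, and the function names are assumptions.

```python
import math

# Hypothetical ratings: customer -> {movie: rating}.
ratings = {
    "c1": {"m1": 5, "m2": 4, "m3": 1},
    "c2": {"m1": 4, "m2": 5, "m3": 2},
    "c3": {"m1": 1, "m2": 2, "m3": 5},
}


def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts over the films both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0  # no co-rated films, so no evidence of similarity
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def most_similar(ratings, customer):
    """Return the other customer whose ratings are closest to this one's."""
    others = [c for c in ratings if c != customer]
    return max(others,
               key=lambda c: cosine_similarity(ratings[customer], ratings[c]))


print(most_similar(ratings, "c1"))
```

The same function finds similar films if the dictionary is transposed to movie -> {customer: rating}.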

Page 6: KDD CUP 2007

System

<Weka>: multilayer perceptron (MLP). Data mining software in Java.

<MATLAB>: backpropagation. The language of technical computing.

<MS SQL 2005>: clustering. A comprehensive, integrated data management and analysis software.

Page 7: KDD CUP 2007

Result (training & test set)

Page 8: KDD CUP 2007

Difficulty confronted

“Out of memory!!”: the dataset is too large.

The dataset does not have enough features.

Which features are actually valuable to us?

Which algorithm should be used?

Page 9: KDD CUP 2007

Training & Test set

Downsize the dataset: group the records by their features (using SQL), then sample from the groups for training

Make the sampled dataset into a matrix

Train in the tools: Weka, MATLAB

Evaluate the accuracy by RMSE
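The downsizing and evaluation steps above can be sketched as follows. The slides do the grouping in SQL. This Python version only illustrates the same idea, and the row layout, the `per_group` size, and the function names are assumptions.

```python
import math
import random
from collections import defaultdict


def sample_by_group(rows, key, per_group, seed=0):
    """Group rows by key(row) and draw up to per_group rows from each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for name in sorted(groups):
        members = groups[name]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical rows: (movie_id, group label).
rows = [("m%d" % i, "action" if i % 2 else "drama") for i in range(10)]
training = sample_by_group(rows, key=lambda r: r[1], per_group=2)
print(len(training))
print(rmse([3, 5], [1, 5]))
```

Sampling the same number of rows from every group keeps small groups represented in the reduced training set.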

Page 10: KDD CUP 2007

The Sketch

Page 11: KDD CUP 2007

SQL Server

Page 12: KDD CUP 2007

MATLAB(1/2)

Page 13: KDD CUP 2007

MATLAB(2/2)

(# Training Data = 10040, Test Data = 42)

Page 14: KDD CUP 2007

Weka

(# Training Data = 118, Test Data = 13)

Page 15: KDD CUP 2007

Analysis (why)

Page 16: KDD CUP 2007

Analysis

<Weka> We treat the data as a matrix of movies and users.

• Defect: the matrix is enormous.

• Solution: classify the movies or users first.

• Minimizing the error rate: tune the number of neurons in the multilayer perceptron and the number of training iterations.

<MATLAB> Not enough features (only one feature, the movie classification). We will find more features describing the dependence between movies and customers (using SVD).
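To illustrate how SVD can expose a dependence between customers and movies, the sketch below estimates the largest singular value and its two singular vectors by power iteration in plain Python. It assumes a small dense matrix with no missing ratings and at least one non-zero entry in each row. The function name and the sample matrix are hypothetical. In MATLAB, the built-in `svd` function would normally be used instead.

```python
import math


def top_singular_triplet(A, iterations=200):
    """Estimate the largest singular value of A and its vectors by power iteration."""
    rows, cols = len(A), len(A[0])
    v = [1.0 / math.sqrt(cols)] * cols  # start from a uniform unit vector
    sigma, u = 0.0, [0.0] * rows
    for _ in range(iterations):
        # u = A v, normalised to unit length
        u = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        # v = A^T u; its length converges to the largest singular value
        v = [sum(A[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return sigma, u, v


# Hypothetical customer x movie ratings (rows are customers, columns are movies).
A = [[5.0, 4.0, 1.0],
     [4.0, 5.0, 2.0],
     [1.0, 2.0, 5.0]]
sigma, u, v = top_singular_triplet(A)

# Rank-1 approximation: each rating is modelled as sigma * u[i] * v[j].
rank1 = [[sigma * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]
print(round(sigma, 3))
```

Here `u` gives one latent feature per customer and `v` one per movie. Further features would come from the remaining singular vectors.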
