KDD CUP 2007

2007_12_31 KDD CUP 2007 Neural Network HW2

Group 14

Yu Szu-Hsien (M9609208)

Ciou Yun-Rong (M9608305)

Group 14 HW 2

How? (method & system)

1. Make into a matrix

By analyzing the types of films that a customer has rated, we can predict that customer’s ratings for other films of the same type.
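A minimal sketch of the "make into a matrix" step, assuming the ratings arrive as (customer, movie, rating) triples. The function name `build_rating_matrix`, the toy data, and the use of 0 for "not rated" are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def build_rating_matrix(triples):
    """Turn (customer_id, movie_id, rating) triples into a dense matrix.

    Rows are customers, columns are movies, and 0 marks "not rated"
    (an assumption made for this sketch).
    """
    customers = sorted({c for c, _, _ in triples})
    movies = sorted({m for _, m, _ in triples})
    row = {c: i for i, c in enumerate(customers)}
    col = {m: j for j, m in enumerate(movies)}
    matrix = np.zeros((len(customers), len(movies)))
    for c, m, r in triples:
        matrix[row[c], col[m]] = r
    return matrix, customers, movies

# Toy data, invented for illustration only.
toy = [(1, "A", 5), (1, "B", 3), (2, "A", 4), (3, "C", 2)]
R, customers, movies = build_rating_matrix(toy)
```

A dense matrix like this only fits in memory for a sampled subset; for the full dataset a sparse representation would likely be needed.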

2. The characteristics of the problem

This problem is based on the data in an enormous database.

Each customer’s rating series reflects that customer’s personality, preferences, and rating time intervals.

For every movie, we can compile statistics on how many customers rated it at different times and treat them as a time series.

For every customer, we can compile statistics on what that customer rated and treat them as a time series.

Methods → How to find similar films and similar users?

Similarity measures

Poisson regression

Clustering analysis

Association rules

Random forests

Collaborative filtering (group filtering or social filtering)

Singular value decomposition (SVD)
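As one illustration of a similarity measure from the methods listed here, the sketch below computes cosine similarity between movies from a customer-by-movie rating matrix. The function name and toy matrix are hypothetical, and treating unrated entries as 0 is a simplifying assumption; the slides do not say which measure was used.

```python
import numpy as np

def cosine_similarity_columns(R):
    """Cosine similarity between the columns (movies) of a rating matrix R."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for unrated movies
    unit = R / norms
    return unit.T @ unit

# Toy matrix (rows: customers, columns: movies), invented for illustration.
R = np.array([[5, 4, 0],
              [4, 5, 0],
              [0, 0, 3]], dtype=float)
S = cosine_similarity_columns(R)
```

Here movies 0 and 1 are rated alike by the same customers, so `S[0, 1]` is close to 1, while movie 2 shares no raters with them and gets similarity 0. Passing `R.T` instead gives similarities between users.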

System

<Weka>: multilayer perceptron (MLP). Data mining software in Java.

<MATLAB>: backpropagation. The language of technical computing.

<MS SQL 2005>: clustering. A comprehensive, integrated data management and analysis software.

Result (training & test set)

Difficulty confronted

“Out of memory!!”: the dataset is too large.

Not enough features in the dataset.

Which features are valuable, and which do we really need?

Which algorithm should be used?

Training & Test set

Downsize the dataset: group the records by their features (using SQL), then sample from the groups for training

Make the sampled dataset into a matrix

Train in the tools: Weka, MATLAB

Evaluate the accuracy by RMSE
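The sampling and evaluation steps above can be sketched as follows. This is an illustration under assumptions: the slides do the grouping in SQL, whereas `sample_per_group` is a hypothetical in-memory stand-in, and `rmse` is the standard root-mean-square error, the square root of the mean squared difference between predicted and actual ratings.

```python
import math
import random
from collections import defaultdict

def sample_per_group(records, group_of, per_group, seed=0):
    """Draw up to `per_group` records at random from each group."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    groups = defaultdict(list)
    for rec in records:
        groups[group_of(rec)].append(rec)
    sample = []
    for key in sorted(groups):
        members = groups[key]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

# Toy records (id, group label), invented for illustration.
records = [(i, i % 3) for i in range(30)]
training_sample = sample_per_group(records, lambda r: r[1], per_group=2)
error = rmse([1, 2], [3, 4])
```

A lower RMSE means the predicted ratings are closer to the actual ones; a perfect prediction gives 0.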

The Sketch

SQL Server

MATLAB(1/2)

MATLAB(2/2)

(# Training Data = 10040, Test Data = 42)

Weka

(# Training Data = 118, Test Data = 13)

Analysis (why)

Analysis

<Weka> We regard the data as a matrix of movies and users.

• Defect: the matrix is enormous

Solution: classify the movies or users first

To minimize the error rate: tune the number of neurons in the multilayer perceptron & the number of training iterations

<MATLAB> Not enough features (only one feature, about movie classification). We will find more features that capture the dependence between movies and customers (using SVD).
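One possible way to derive extra features with SVD is sketched below, assuming a small dense customer-by-movie matrix. The slides do not describe how SVD would be applied, so the function name `latent_features`, the choice of `k`, and the toy matrix are all hypothetical.

```python
import numpy as np

def latent_features(R, k):
    """Return k latent features per customer and per movie from a truncated SVD."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    customer_features = U[:, :k] * s[:k]  # scale by the singular values
    movie_features = Vt[:k, :].T
    return customer_features, movie_features

# Toy matrix (rows: customers, columns: movies), invented for illustration.
R = np.array([[5, 4, 1],
              [4, 5, 1],
              [1, 1, 5]], dtype=float)
cust, mov = latent_features(R, k=2)
approx = cust @ mov.T  # rank-2 approximation of R
```

Each row of `cust` and `mov` could then serve as additional inputs to the network. Unrated entries would need separate handling in real data, since plain SVD treats them as ratings of 0.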
