COMP 4332 / RMBI 4330 Big Data Mining (Spring 2015)

Lei Chen
Hong Kong University of Science and Technology

leichen@cse.ust.hk

http://www.cse.ust.hk/~leichen


Topics

• Review of Basics

• Practical Data Mining
– Imbalanced Data
– Text and Web Mining
– Big Data
– Social Recommendation
– Social Media and Social Networks

• Hands-on: 2 Major Projects

• Student Presentations

112/04/19 Course Introduction 2


Outcome and Objective

• Students will know the current state of the art in data mining

• Students will be able to implement a practical data mining project

• Students will be able to present their ideas well

• Students will be prepared for postgraduate (PG) study, internships, etc.


Projects: based on KDDCUPs

• Project 1:
– KDDCUPs on credit rating and customer retention (KDDCUP 2009)

• Project 2:
– Micro-blog (Weibo) User Recommendation (KDDCUP 2012)

• Project 3 (Optional): KDDCUP 2013


KDDCUP Examples

— KDDCUP from past years

— 2007:
  — Predict if a user is going to rate a movie
  — Predict how many users are going to rate a movie

— 2006:
  — Predict if a patient has cancer from medical images

— 2005:
  — Given a web query (“Apple”), predict the categories (IT, Food)

— 1998:
  — Given a person, predict if this person is going to donate money

— In general, we wish to
  — Input: Data
  — Output:
    — Build model
    — Apply model to future data
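The general pattern above (take data as input, build a model, apply the model to future data) can be sketched in a few lines of Python. This is only an illustrative sketch: the nearest-centroid classifier, the function names `build_model` and `apply_model`, and the toy donor data are all assumptions made for this example, not the course's method or a real KDDCUP dataset.

```python
# Minimal sketch of "build model, then apply model to future data".
# The nearest-centroid classifier and the toy donor data are illustrative
# assumptions, not the course's method or a real KDDCUP dataset.

def build_model(rows, labels):
    """Build a model: one centroid (mean feature vector) per class label."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def apply_model(model, row):
    """Apply the model: return the label whose centroid is closest to row."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical historical data: (age, number of past donations) -> donated?
train_rows = [(25, 0), (30, 1), (60, 5), (55, 4)]
train_labels = ["no", "no", "yes", "yes"]

model = build_model(train_rows, train_labels)

# "Future" data that the model has not seen during training.
future_rows = [(58, 6), (28, 0)]
predictions = [apply_model(model, row) for row in future_rows]
print(predictions)  # ['yes', 'no']
```

The point of the split into two functions is that the model is built once from historical data and can then be applied repeatedly to new records, which is the setting shared by the KDDCUP tasks listed above.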


Important Sites

Course Web Site: http://course.cse.ust.hk/comp4332

TA: Yue Wang [email protected]

Assignment Hand-in: CASS


Prerequisites

Statistics and probability would help, but will be reviewed in class

Machine learning / pattern recognition would help; we will review some of the most important algorithms

One programming language; we will teach new languages in the tutorial


Grading

Assignments: 20%

Course Projects: 60%

Presentations: 10%

Term Paper: 10%


More info

• Textbooks:
– Listed on Course Website
– Buy them online if you wish