Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Posted on 23-Aug-2014


Slides about mining cross-domain ratings, presented by Simon Dooms at the WWW 2014 conference on April 8, 2014, in Seoul (Korea).

Transcript of Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Simon Dooms (@sidooms)

Rating Datasets

What are ratings? Explicit user preference information

Why ratings? Recommender systems

Intro - Social Sharing - Results - Cross-Domain - Conclusion

Apr. 08, 2014 Simon Dooms - Ghent University - MSM 2014 2


Ratings Scarcity in Research

Ratings = private data
Public datasets to the rescue?
– MovieLens 100K (1998)
– MovieLens 1M (2000)
– MovieLens 10M (2008)
– More on recsyswiki.com

Old, Synthetic Datasets


Social Sharing = Ratings Goldmine

Previous research: MovieTweetings
– Movie rating dataset built from IMDb ratings shared on Twitter
– https://github.com/sidooms/MovieTweetings

What about other domains? Websites?

Well, let’s try it out!


Target Websites - Goodreads

Twitter user - Rating - Book title - Book author - Goodreads URL - Time
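
The fields listed for Goodreads could be pulled out of a tweet with a simple pattern match. The sketch below is illustrative only: the tweet layout (`<n> of 5 stars to <title> by <author> <url>`), the `BookRating` type, and the `parse_goodreads_tweet` function are assumptions, not the code used in the talk.

```python
import re
from typing import NamedTuple, Optional

# Assumed tweet layout (hypothetical, not taken from the slides):
#   "<n> of 5 stars to <title> by <author> <url>"
# The greedy title group splits on the last " by ", so titles that
# themselves contain " by " are still handled.
GOODREADS_PATTERN = re.compile(
    r"(?P<rating>[1-5]) of 5 stars to (?P<title>.+) by (?P<author>.+?)\s+"
    r"(?P<url>https?://\S+)"
)


class BookRating(NamedTuple):
    """One extracted rating, mirroring the fields listed on the slide."""
    user: str
    rating: int
    title: str
    author: str
    url: str
    time: str


def parse_goodreads_tweet(user: str, text: str, time: str) -> Optional[BookRating]:
    """Return a BookRating if the tweet text matches the assumed layout."""
    match = GOODREADS_PATTERN.search(text)
    if match is None:
        return None  # not a structured rating tweet
    return BookRating(
        user=user,
        rating=int(match["rating"]),
        title=match["title"].strip(),
        author=match["author"].strip(),
        url=match["url"],
        time=time,
    )
```

Tweets that do not match the pattern return `None`, so free-form mentions of a book are simply skipped rather than guessed at.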

Target Websites - Pandora

Twitter user - Song - Pandora URL - Time

Target Websites - YouTube

Twitter user - (Video uploader) - YouTube URL - Time

Mining Experiment

But words are wind…
– 2-week experiment
– 4 online platforms


Python code + Task Scheduler = Dataset files
https://github.com/sidooms/Twitter-ratings
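
A scheduled job only yields a clean dataset file if each run can be repeated safely. As a rough sketch of that idea, the hypothetical `append_new_records` function below appends records to a CSV file and skips any (user, URL) pair it has already stored. The file layout and field names are assumptions and may differ from the published datasets.

```python
import csv
from pathlib import Path

# Hypothetical column layout; the published dataset files may differ.
FIELDS = ["user", "rating", "item", "url", "time"]


def append_new_records(path, records):
    """Append records not yet in the file; return how many were written."""
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0

    # Collect the (user, url) pairs already stored by earlier runs.
    seen = set()
    if not is_new:
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                seen.add((row["user"], row["url"]))

    written = 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        for record in records:
            key = (record["user"], record["url"])
            if key in seen:
                continue  # the same tweet was collected by a previous run
            writer.writerow(record)
            seen.add(key)
            written += 1
    return written
```

Because duplicates are filtered on write, the scheduler can rerun the script at any interval without inflating the dataset.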

The Numbers

One more thing …


Cross-Domain Rating Dataset
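
A cross-domain dataset is most useful when the same Twitter user shows up in more than one domain. Assuming each domain's dataset is a list of (user, item, rating) records, the hypothetical helper below lists such users. The function name, record layout, and sample data are made up for illustration.

```python
from collections import defaultdict


def cross_domain_users(ratings_by_domain, min_domains=2):
    """Map each user seen in at least `min_domains` domains to those domains.

    `ratings_by_domain` maps a domain name to (user, item, rating) tuples.
    """
    domains_per_user = defaultdict(set)
    for domain, records in ratings_by_domain.items():
        for user, _item, _rating in records:
            domains_per_user[user].add(domain)
    return {
        user: sorted(domains)
        for user, domains in domains_per_user.items()
        if len(domains) >= min_domains
    }


# Made-up sample data; rating is None where a site shares no explicit score.
sample = {
    "books": [("alice", "book-1", 5), ("bob", "book-2", 3)],
    "music": [("alice", "song-1", None)],
    "video": [("carol", "video-1", None)],
}
overlap = cross_domain_users(sample)  # only "alice" appears in two domains
```

Raising `min_domains` narrows the result to users with activity across more websites.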

Applications

Collect ratings for recsys research / input
Cross-domain recsys research
Trend detection, analytics, ...
Applicable to all social sharing websites


Conclusions

Ratings scarcity in research
Public datasets are old and synthetic
Social sharing = ratings goldmine
2-week experiment, 4 major websites
Python code & datasets on GitHub
True cross-domain ratings dataset
