Procrastinators CS340

PROCRASTINATORS: Hacer Tilbeç, Beyzanur Koçak, Süha Kağan Köse

Transcript of Procrastinators CS340

Page 1: Procrastinators CS340

PROCRASTINATORS

Hacer Tilbeç, Beyzanur Koçak, Süha Kağan Köse

Page 2: Procrastinators CS340

Economy Researcher

Page 3: Procrastinators CS340

What Are Economists Talking About?

Page 4: Procrastinators CS340


Page 5: Procrastinators CS340

Fetching Tweets

Filtering Tweets with TR NLP

Topic Modeling with LDA

Visualize with D3
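The four stages above can be sketched end to end as a small Python script. This is only an illustration under stated assumptions: the sample tweets, the stop-word list, and the function names `fetch_tweets`, `filter_tweet`, `model_topics`, and `to_d3_json` are all hypothetical and do not come from the slides. The filtering step is a simple tokenizer standing in for the Turkish NLP service, and the modeling step is a plain word count standing in for LDA.

```python
import json
import re
from collections import Counter

# Hypothetical sample data standing in for tweets fetched from Twitter.
SAMPLE_TWEETS = [
    "Merkez Bankası faiz kararını açıkladı",
    "Dolar ve euro bugün yükseldi",
    "Faiz ve enflasyon beklentileri değişti",
]

# Hypothetical stop-word list; a real Turkish NLP service would do much more.
STOP_WORDS = {"ve", "bugün"}


def fetch_tweets():
    """Stage 1: return raw tweet texts (stubbed with sample data here)."""
    return list(SAMPLE_TWEETS)


def filter_tweet(text):
    """Stage 2: lowercase, tokenize, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def model_topics(token_lists):
    """Stage 3 placeholder: plain word counts, NOT an LDA model."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    return [{"name": "all tweets", "words": counts.most_common(5)}]


def to_d3_json(topics):
    """Stage 4: serialize topics as a name/children tree for D3 to load."""
    tree = {
        "name": "topics",
        "children": [
            {
                "name": topic["name"],
                "children": [{"name": w, "value": c} for w, c in topic["words"]],
            }
            for topic in topics
        ],
    }
    return json.dumps(tree, ensure_ascii=False)


tokens = [filter_tweet(t) for t in fetch_tweets()]
d3_payload = to_d3_json(model_topics(tokens))
print(d3_payload)
```

The nested `name`/`children`/`value` shape is one common input for D3 hierarchy layouts; other D3 charts may want a flat array instead.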

Page 6: Procrastinators CS340

Page 7: Procrastinators CS340

Page 8: Procrastinators CS340

Page 9: Procrastinators CS340

Page 10: Procrastinators CS340

Page 11: Procrastinators CS340

NATURAL LANGUAGE PROCESSING

Page 12: Procrastinators CS340

NLP API EXAMPLE

Page 13: Procrastinators CS340

NLP API EXAMPLE

Page 14: Procrastinators CS340

TOPIC MODELING

A technique for analyzing the topics present in a collection of documents.

Page 15: Procrastinators CS340

What is LDA?

Latent Dirichlet Allocation (LDA) is a topic model that generates topics based on word frequency from a set of documents.

LDA is particularly useful for finding reasonably accurate mixtures of topics within a given document set.
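Because LDA works from word frequencies, its input is usually a table of word counts per document (a "bag of words"). The snippet below is a minimal sketch of building such a table with the Python standard library; the three sample sentences and the names `bag_of_words`, `vocabulary`, and `doc_term` are made up for illustration.

```python
from collections import Counter

# Hypothetical toy documents, used only for illustration.
docs = [
    "the match ended with a late goal",
    "the rocket reached orbit after launch",
    "the team scored a goal in the match",
]


def bag_of_words(text):
    """Count how often each whitespace-separated word occurs in one document."""
    return Counter(text.lower().split())


# Shared vocabulary across all documents, in a fixed (sorted) order.
vocabulary = sorted({word for doc in docs for word in doc.lower().split()})

# One row per document, one column per vocabulary word.
doc_term = [[bag_of_words(doc)[word] for word in vocabulary] for doc in docs]

for doc, row in zip(docs, doc_term):
    print(row, "<-", doc)
```

Word order is discarded in this representation; only the counts are passed on to the topic model.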

Page 16: Procrastinators CS340

What are topics?

LDA is not given topics!

LDA infers each topic from raw text as a distribution over words.
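To make "a topic is a distribution over words" concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling in plain Python. It is a teaching example, not the implementation used in the project: the function name `lda_gibbs`, the toy documents, and the hyperparameter defaults (`alpha`, `beta`, `iterations`) are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one word distribution per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # words in doc d assigned to topic k
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                              # total words assigned to topic k
    assignments = []

    # Start from a random topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample the topic in proportion to how well it fits the doc and the word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Each topic is a smoothed probability distribution over the whole vocabulary.
    return [
        {w: (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta) for w in vocab}
        for t in range(num_topics)
    ]


# Hypothetical toy corpus: already tokenized documents.
toy_docs = [
    ["goal", "match", "team", "goal"],
    ["orbit", "rocket", "launch", "orbit"],
    ["team", "match", "score"],
    ["launch", "rocket", "satellite"],
]
topics = lda_gibbs(toy_docs, num_topics=2)
for k, dist in enumerate(topics):
    top_words = sorted(dist, key=dist.get, reverse=True)[:3]
    print("topic", k, top_words)
```

No topic labels are passed in: the only inputs are the tokenized documents and the number of topics, and each returned topic assigns a probability to every word in the vocabulary.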

Page 17: Procrastinators CS340

Example

This data set consists of 20,000 messages taken from 20 newsgroups. The articles are typical postings and thus have headers including subject lines, signature files, and quoted portions of other articles.

Topics: sports, space exploration, computers

Page 18: Procrastinators CS340

Another Example

4.5 million Wikipedia articles

Page 19: Procrastinators CS340

LDA with Spark

MLlib clustering (mllib.clustering)

Page 20: Procrastinators CS340
