Finding Missing Tweets using Topic Structure and Browsing Time


Yu Suzuki†, Hiromitsu Ohara‡, Akiyo Nadamoto‡

† Nara Institute of Science and Technology, Japan
‡ Konan University, Japan

December 5, 2017

Introduction

Social Network Services (SNSs) produce massive volumes of messages.
Users are not always online.

Users miss important information on SNSs.
cf. Twitter's “While you were away” function: the structure of its summarization is flat.

Users need to understand, in a short time, the topics that were discussed while they were offline.

A mechanism for summarizing the tweets is useful.

We believe that when tweets are summarized as a tree structure, users can easily understand the topics.

Summarize Tweets Using Topic Structure and Browsing Time

Introduction

Why do we consider topic structure?

Missing tweet                           Topic      Sub-topic
Today’s baseball game is exciting!      baseball   game
Yesterday I went to baseball stadium    baseball   place
I’m at Salzburg!                        travel     austria
I’m at baseball stadium!                baseball   place
· · ·                                   · · ·      · · ·

Tweets on minority topics are ignored when we summarize missing tweets.
The missing tweets are mainly related to “baseball.”
Only one tweet is related to “travel.”
If these tweets are summarized without using topics, the tweet about travel may not appear in the summary.

We visualize the topics of tweets as a tree structure.
First, users see the top-level topics, such as “baseball” and “travel.”
If users are interested in “baseball,” they browse “game” and “place.”
Users do not miss the tweet about travel.
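The two-level browsing described here can be sketched as a nested mapping from topic to sub-topic to tweets. This is only an illustration: the rows are the example tweets from this slide, and the helper name `build_topic_tree` is hypothetical, not part of the proposed system.

```python
from collections import defaultdict

# Example rows taken from the slide: (tweet, topic, sub-topic).
MISSING_TWEETS = [
    ("Today's baseball game is exciting!", "baseball", "game"),
    ("Yesterday I went to baseball stadium", "baseball", "place"),
    ("I'm at Salzburg!", "travel", "austria"),
    ("I'm at baseball stadium!", "baseball", "place"),
]


def build_topic_tree(rows):
    """Group tweets as topic -> sub-topic -> list of tweets (hypothetical helper)."""
    tree = defaultdict(lambda: defaultdict(list))
    for tweet, topic, sub_topic in rows:
        tree[topic][sub_topic].append(tweet)
    # Convert to plain dicts so the result is easy to inspect.
    return {topic: dict(subs) for topic, subs in tree.items()}


tree = build_topic_tree(MISSING_TWEETS)
top_level = sorted(tree)  # what the user sees first
```

Because the top level lists every topic, the single travel tweet stays reachable even though baseball tweets dominate.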

How to construct the topic structure?

Introduction

Our contribution

1 Generate topic structures of tweets using the Wikipedia category tree and browsing time

We use Wikipedia categories as knowledge for constructing the tree structure.
We use browsing time to identify the tweets that users miss.

2 Visualize the topic structure of tweets using a network graph
We implement our method as a Web application.

3 Confirm, using a real dataset, that our proposed method is effective for commonly known topics

Our method is effective when there is much information about the theme.
Wikipedia only has articles about commonly known topics.

Our Proposed Method

Overview

[Figure: overview of the proposed method.
1. Clustering of Tweets: tweets are grouped into clusters, e.g., C0 = Ichiro and C1 = Masahiro; C3 = Human is deleted because it is too wide to cover topics.
2. Generate a Topic Graph: using the Wikipedia category tree, the clusters are attached to categories such as MLB player, Sports, and Japanese. Topic node: a parent node of tweet clusters. Abstract node: a parent node of topic nodes.
3. Visualization: the topic graph (e.g., Sports, Baseball, Basketball, Mariners, Players, Japanese MLB Player, Ichiro, Masahiro) is shown with the tweet list that corresponds to each category, such as tweets about Ichiro and tweets about Masahiro.]

Our Proposed Method

Overview

1 Extract missing tweets: Determine which tweets were submitted during the user’s browsing time and which were submitted before or after it.

2 Cluster tweets into categories and extract topics: Using Repeated Bisection as the clustering tool, we divide a set of tweets into clusters and extract the topics of each cluster.

3 Generate a topic graph: Using the topics of tweets and the Wikipedia category tree, we generate a topic graph of the tweets.

4 Classify topics: Classify the topics, which are the nodes of the topic graph, as known topics or unknown topics.

5 Visualize topic graphs: We visualize the topic graph and the corresponding tweets using our Web user interface.

Our Proposed Method

0. Extraction of missing tweets

We extract the tweets that users have not browsed.
We assume that the browsing time is given.
Browsing time may be available if we build Twitter client applications.
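A minimal sketch of this extraction step, assuming that a tweet counts as missed when its timestamp falls outside every interval in which the user was browsing. The function name, the interval representation, and the inclusive bounds are all assumptions made for illustration; the slides only state that the browsing time is given.

```python
from datetime import datetime


def extract_missing_tweets(tweets, browsing_intervals):
    """Return tweets whose timestamp lies in no browsing interval.

    tweets: list of (timestamp, text) pairs.
    browsing_intervals: list of (start, end) pairs, bounds inclusive (assumption).
    """
    def was_browsing(ts):
        return any(start <= ts <= end for start, end in browsing_intervals)

    return [(ts, text) for ts, text in tweets if not was_browsing(ts)]


# Made-up sample data: the user browsed from 09:00 to 09:30 only.
intervals = [(datetime(2017, 12, 5, 9, 0), datetime(2017, 12, 5, 9, 30))]
sample = [
    (datetime(2017, 12, 5, 8, 45), "posted before browsing"),
    (datetime(2017, 12, 5, 9, 10), "posted while browsing"),
    (datetime(2017, 12, 5, 11, 0), "posted after browsing"),
]
missing = extract_missing_tweets(sample, intervals)
```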

Our Proposed Method

1. Clustering tweets

[Figure: 1. Clustering of Tweets. Tweets are grouped into clusters, e.g., C0 = Ichiro and C1 = Masahiro; C3 = Human is deleted because it is too wide to cover topics.]

We use repeated bisection for clustering tweets.
In our experiment, repeated bisection was the most effective method for clustering short texts.
It is similar to k-means.
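To make the idea concrete, here is a rough sketch of a bisecting scheme: start with one cluster and keep splitting the largest cluster in two until k clusters exist. Each split below is a plain 2-means loop with cosine similarity, which is a simplification and not necessarily the criterion used by the clustering tool in the experiments. All function names and parameters are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def bisect(vectors, indices, rng, iterations=10):
    """Split one cluster (a list of indices) in two with a 2-means loop."""
    seeds = rng.sample(indices, 2)
    centers = [vectors[seeds[0]], vectors[seeds[1]]]
    halves = ([], [])
    for _ in range(iterations):
        halves = ([], [])
        for i in indices:
            sims = [cosine(vectors[i], c) for c in centers]
            halves[0 if sims[0] >= sims[1] else 1].append(i)
        if not halves[0] or not halves[1]:
            break  # degenerate split; the caller keeps the cluster whole
        centers = [centroid([vectors[i] for i in h]) for h in halves]
    return halves


def repeated_bisection(vectors, k, seed=0):
    """Return up to k clusters, each a list of indices into vectors."""
    rng = random.Random(seed)
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()
        if len(largest) < 2:
            clusters.append(largest)
            break
        left, right = bisect(vectors, largest, rng)
        if not left or not right:
            clusters.append(largest)
            break
        clusters.extend([left, right])
    return clusters


# Toy feature vectors: two tweets point one way, two the other.
toy = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
clusters = repeated_bisection(toy, k=2)
```

Splitting the largest cluster first is one common choice; splitting the least coherent cluster is another.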

We remove noise clusters.
We calculate the cosine similarity between every pair of texts in a cluster.
We remove the nodes if the similarity is beyond the threshold.
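One possible reading of this filtering step is sketched below. The slide does not say how the pairwise similarities are aggregated or on which side of the threshold a cluster is dropped. This sketch assumes that a cluster is removed when its mean pairwise cosine similarity is below the threshold, in line with the example of a cluster that is too wide to cover topics. The function names and the threshold value are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_pairwise_similarity(vectors):
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0  # assumption: a single-tweet cluster counts as coherent
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def remove_noise_clusters(clusters, threshold):
    """Keep clusters whose mean pairwise similarity is at least threshold (assumed rule)."""
    return {
        name: vectors
        for name, vectors in clusters.items()
        if mean_pairwise_similarity(vectors) >= threshold
    }


# Toy clusters: C0 is tight, C3 is broad.
toy_clusters = {
    "C0": [[1.0, 0.0], [1.0, 0.1]],
    "C3": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
}
kept = remove_noise_clusters(toy_clusters, threshold=0.8)
```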

Our Proposed Method

1. Clustering tweets

Repeated bisection
Given a set of tweets $T$, we extract a feature vector for each tweet. First, we divide a tweet into terms using morphological analysis or a POS tagger. Then, we select nouns and unknown terms as feature terms. The reason for using unknown terms is that they consist of slang and newly invented words that the morphological analysis does not recognize. To clean the feature terms, we select the terms that are included in more than two tweets. The feature vector $f(t_i)$ of tweet $t_i$ ($t_i \in T$) is defined as follows.

\[
f(t_i) = [\,\mathrm{tf}(t_i, w_1) \cdot \mathrm{idf}(w_1),\ \mathrm{tf}(t_i, w_2) \cdot \mathrm{idf}(w_2),\ \cdots,\ \mathrm{tf}(t_i, w_m) \cdot \mathrm{idf}(w_m)\,] \tag{1}
\]

\[
\mathrm{tf}(t_i, w_j) =
\begin{cases}
1 & \text{if } w_j \text{ appears in } t_i \text{ at least once} \\
0 & \text{otherwise}
\end{cases} \tag{2}
\]

\[
\mathrm{idf}(w_j) = -\log \frac{\mathrm{df}(w_j)}{|T|} \tag{3}
\]

where $w_j$ is a term in $T$, $|T|$ is the number of tweets in $T$, $\mathrm{tf}(t_i, w_j)$ indicates whether $w_j$ appears in $t_i$ or not, $\mathrm{df}(w_j)$ is the number of tweets that contain $w_j$, and $\mathrm{idf}(w_j)$ is the IDF (inverse document frequency) value of $w_j$, where a document is a tweet.
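The feature vector definition can be sketched in a few lines. The sketch assumes that the tweets are already split into feature terms, so the morphological analysis and the noun and unknown-term selection are not shown. The function name and the `min_df` parameter are hypothetical; `min_df=3` encodes "included in more than two tweets".

```python
import math


def feature_vectors(tokenized_tweets, min_df=3):
    """Return (vocabulary, vectors) with binary tf and idf = -log(df / |T|)."""
    n = len(tokenized_tweets)
    term_sets = [set(terms) for terms in tokenized_tweets]

    # df(w): number of tweets that contain term w.
    df = {}
    for terms in term_sets:
        for w in terms:
            df[w] = df.get(w, 0) + 1

    # Keep only terms that occur in at least min_df tweets.
    vocab = sorted(w for w, count in df.items() if count >= min_df)
    idf = {w: -math.log(df[w] / n) for w in vocab}

    # Binary tf: a component is idf(w) if w occurs in the tweet, else 0.
    vectors = [[idf[w] if w in terms else 0.0 for w in vocab] for terms in term_sets]
    return vocab, vectors


# Made-up, already tokenized tweets.
tweets = [
    ["ichiro", "hit"],
    ["ichiro", "marlins"],
    ["ichiro", "hit"],
    ["tanaka", "hit"],
]
vocab, vectors = feature_vectors(tweets)
```

A term that occurs in every tweet gets an IDF of 0 under this definition, so it contributes nothing to the vector.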

Our Proposed Method

2. Topic graph

2. Generate a topic graph

[Figure: 2. Generate a Topic Graph. The tweet clusters Ichiro and Masahiro are attached to the categories MLB player, Sports, and Japanese. Topic node: a parent node of tweet clusters. Abstract node: a parent node of topic nodes.]
