Earthquake Shakes Twitter Users

Real-time Event Detection by Social Sensors

@MikeMayer

What is Twitter?

• Twitter is categorized as a microblogging service.

• Twitter users post small blurbs of text that are 140 characters or less called tweets.

• With url shorteners and services tailored for Twitter a lot of information can be conveyed in that small space.

• Twitter is very free-form, yet ways to categorize tweets have still emerged (hashtags).

How is Twitter useful as a sensor?

Twitter users will often report their status, whether or not it is of any interest to others

This means that the public timeline is full of noise

The timeline is updated in real-time, faster than a blog, faster than a “static” document

Tweets are faster than traditional news and users select from a buffet of other users to customize their news

However, if the tweets are carefully selected there can be a great deal of useful information found

Tweets contain a great deal of metadata

Source: http://www.readwriteweb.com/archives/this_is_what_a_tweet_looks_like.php

JSON representation of a single Tweet
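
Since the image of the Tweet is not reproduced here, the following is a rough sketch of how the metadata that matters for sensing (time, text, location) might be pulled out of a Tweet's JSON. The payload is heavily abbreviated and entirely made up for illustration; the field names (`created_at`, `text`, `user.location`, `coordinates`) and the timestamp format follow the classic Twitter REST API as I understand it, and `parse_tweet` is a hypothetical helper.

```python
import json
from datetime import datetime

# Illustrative payload only: values are invented, most fields are omitted.
SAMPLE_TWEET = """
{
  "created_at": "Mon Jan 04 09:15:00 +0000 2010",
  "text": "Earthquake!",
  "user": {"screen_name": "example_user", "location": "Tokyo, Japan"},
  "coordinates": {"type": "Point", "coordinates": [139.69, 35.69]}
}
"""

def parse_tweet(raw):
    """Extract the time, text, and location fields from one Tweet."""
    tweet = json.loads(raw)
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    point = tweet.get("coordinates")
    # GeoJSON points list longitude first, then latitude.
    lon, lat = point["coordinates"] if point else (None, None)
    return {
        "created_at": created,
        "text": tweet["text"],
        "lat": lat,
        "lon": lon,
        "profile_location": tweet["user"].get("location"),
    }

parsed = parse_tweet(SAMPLE_TWEET)
print(parsed["created_at"].isoformat(), parsed["lat"], parsed["lon"])
```

When a Tweet carries no coordinates, the free-text profile location is the only fallback, which is why it is kept in the result.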

“Each Twitter user is a sensor and each Tweet is sensory information”

Of course context must be considered… more on that soon

A bag of words approach isn’t good enough for detecting earthquakes:

“My dryer is shaking like crazy”

“Didn’t they used to have a ride at carnivals called Earthquake?”

The paper suggests a machine learning approach to determining the context
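
To see why plain keyword matching falls short, here is a small sketch. The query words and the `keyword_match` helper are my own illustrative assumptions, not code from the paper. Both of the false alarms quoted above match just as well as a genuine report.

```python
# Assumed query words for an earthquake detector.
QUERY_WORDS = {"earthquake", "shaking"}

def keyword_match(text):
    """Return True if any query word appears in the tweet, ignoring context."""
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    return bool(words & QUERY_WORDS)

TWEETS = [
    "My dryer is shaking like crazy",
    "Didn't they used to have a ride at carnivals called Earthquake?",
    "Earthquake!",
]

for tweet in TWEETS:
    print(keyword_match(tweet), "-", tweet)
```

All three tweets are flagged, even though only the last one reports an actual event, so something beyond word presence has to decide.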

Event Detection

The primary focus of the paper is to determine the means to detect an event using so-called social sensors

Events are “arbitrary classifications of space/time regions”

Targeted events are natural occurrences (weather, earthquakes, etc.) and human made (traffic jams, crime, etc.)

Semantic Analysis for Tweets

As said before, a bag of words is simply not good enough

To detect and target events they use an SVM (support vector machine), a widely used machine-learning algorithm

They describe each Tweet with three groups of features:

A. Statistical features (number of words…)

B. Keyword features

C. Word context features (words around a “query word”)
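
The three feature groups can be sketched roughly as follows. The `extract_features` function, its feature names, and the one-word context window are illustrative assumptions on my part; the paper's exact tokenization and encoding are not reproduced here.

```python
def extract_features(text, query_word, window=1):
    """Build feature groups A, B, and C for one tweet as a sparse dict."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    position = words.index(query_word) if query_word in words else -1

    # A. Statistical features: tweet length and where the query word sits.
    features = {"A:num_words": len(words), "A:query_position": position}

    # B. Keyword features: every word that occurs in the tweet.
    for word in words:
        features["B:" + word] = 1

    # C. Word context features: words just before and after the query word.
    if position >= 0:
        for word in words[max(0, position - window):position]:
            features["C:before=" + word] = 1
        for word in words[position + 1:position + 1 + window]:
            features["C:after=" + word] = 1
    return features

example = extract_features("I am in Japan, earthquake right now!", "earthquake")
print(example)
```

A dictionary like this would still need to be turned into a fixed-length numeric vector before a classifier can use it.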

Support Vector Machine

Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. In simple words, given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other. [1]

Very mathy; basically, a way to sort data into two categories

1. http://en.wikipedia.org/w/index.php?title=Special:Cite&page=Support_vector_machine&id=361629294
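
For a feel of what "builds a model" means, here is a minimal sketch of a linear SVM trained by sub-gradient descent on the regularized hinge loss. Everything specific in it is an assumption for illustration: the two features (word count, query-word position), the toy data, the hyperparameters, and the choice to drop the bias term by mean-centering the features. It is not the training setup used in the paper.

```python
def column_means(rows):
    """Mean of each feature column."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def center(row, means):
    """Subtract the training means so the boundary can pass through the origin."""
    return [value - mean for value, mean in zip(row, means)]

def score(weights, x):
    return sum(w * v for w, v in zip(weights, x))

def train_linear_svm(samples, labels, epochs=200, lr=0.01, reg=0.01):
    """Sub-gradient descent on the L2-regularized hinge loss (no bias term)."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * score(weights, x)
            # L2 regularization: shrink the weights a little every step.
            weights = [w * (1.0 - lr * reg) for w in weights]
            # Hinge loss: only examples inside the margin push the weights.
            if margin < 1.0:
                weights = [w + lr * y * v for w, v in zip(weights, x)]
    return weights

def predict(weights, x):
    """+1 means 'reports an event', -1 means 'does not'."""
    return 1 if score(weights, x) >= 0 else -1

# Made-up training data: [number of words, position of the query word].
RAW = [[1, 0], [2, 0], [3, 1], [2, 1], [9, 5], [12, 7], [8, 3], [10, 6]]
LABELS = [1, 1, 1, 1, -1, -1, -1, -1]

MEANS = column_means(RAW)
WEIGHTS = train_linear_svm([center(r, MEANS) for r in RAW], LABELS)
print(predict(WEIGHTS, center([1, 0], MEANS)))
print(predict(WEIGHTS, center([11, 6], MEANS)))
```

In practice one would reach for an existing SVM library rather than hand-rolling the training loop; the point here is only to show the two-category idea.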

Tweets as sensory values

Assumption 1 – “Each twitter user is regarded as a sensor…”

Twitter has over 100 million users [1]

That’s enough sensors to make up for the ones not operating correctly (asleep, tweeting gibberish, busy doing something else…)

Assumption 2 – “Each tweet is associated with a time and location…”

The location is the most fundamental requirement for tweets as a sensor

1. http://economictimes.indiatimes.com/infotech/internet/Twitter-snags-over-100-million-users-eyes-money-making/articleshow/5808927.cms

Modeling

Temporal Model

Every Tweet has a created_at timestamp field

Using probability, the paper describes a way to estimate the likelihood that an event is occurring
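
One simple way to reason about this, sketched under loud assumptions: if each positive tweet is a false alarm with probability `p_false`, and the tweets are treated as independent, then the chance that at least one of `n` positive tweets is genuine is `1 - p_false ** n`. The value 0.35 below is an illustrative placeholder, and the function names are hypothetical.

```python
import math

def event_probability(n_positive, p_false=0.35):
    """Probability that n positive tweets are not all false alarms."""
    return 1.0 - p_false ** n_positive

def tweets_needed(threshold, p_false=0.35):
    """Smallest number of positive tweets that pushes the probability past threshold."""
    return math.ceil(math.log(1.0 - threshold) / math.log(p_false))

for n in (1, 3, 5, 10):
    print(n, round(event_probability(n), 4))
print(tweets_needed(0.99))
```

The independence assumption is doing a lot of work here, since users who re-post each other's tweets are not independent sensors.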

Spatial Model

Tweets considered in this system require geolocation information

The spatial model is far more complicated

Need to consider time and a delay as event spreads (earthquake)

Spatial Model Continued

Kalman Filters

The paper describes an application of Kalman filters to model two cases:

1. Location estimate of earthquake center

2. Trajectory estimation of a typhoon
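
For the first case, a Kalman filter can be sketched in a few lines if we assume the earthquake center does not move, latitude and longitude are independent, and every tweet location has the same measurement variance. These are my simplifying assumptions, and `kalman_update` and `estimate_epicenter` are hypothetical names. The typhoon case would additionally need a motion model, which is not shown.

```python
def kalman_update(estimate, variance, measurement, meas_variance):
    """One scalar Kalman measurement update for a static state."""
    gain = variance / (variance + meas_variance)
    new_estimate = estimate + gain * (measurement - estimate)
    new_variance = (1.0 - gain) * variance
    return new_estimate, new_variance

def estimate_epicenter(observations, meas_variance=1.0):
    """Fuse (lat, lon) tweet locations into one estimate plus its variance."""
    lat, lon = observations[0]
    variance = meas_variance  # start from the first observation
    for obs_lat, obs_lon in observations[1:]:
        lat, _ = kalman_update(lat, variance, obs_lat, meas_variance)
        lon, variance = kalman_update(lon, variance, obs_lon, meas_variance)
    return lat, lon, variance

# Made-up tweet locations scattered around one point.
OBSERVED = [(35.0, 139.0), (35.2, 139.4), (34.8, 138.6), (35.0, 139.0)]
print(estimate_epicenter(OBSERVED))
```

With a static state and equal noise, the filter reduces to a running average whose variance shrinks as more tweets arrive.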

Particle Filters

Using Twitter user geographic distribution

Generate a set of coordinates and sort them by weight

Resample and generate a new set, predict new sets, weigh the sets, measure, then iterate until convergence
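
The generate, predict, weigh, resample loop can be sketched as below. This is a simplified illustration under my own assumptions: particles start uniformly inside a bounding box instead of following the geographic distribution of Twitter users, the weighting uses a Gaussian likelihood with a guessed `sigma`, and the loop runs once per observation instead of testing for convergence. The observations are invented.

```python
import math
import random

def particle_filter(observations, n_particles=500, sigma=1.0, jitter=0.05,
                    bounds=((30.0, 40.0), (134.0, 144.0)), seed=0):
    """Estimate an event location (lat, lon) from noisy tweet locations."""
    rng = random.Random(seed)
    (lat_lo, lat_hi), (lon_lo, lon_hi) = bounds

    # Generation: scatter candidate locations over the region of interest.
    particles = [(rng.uniform(lat_lo, lat_hi), rng.uniform(lon_lo, lon_hi))
                 for _ in range(n_particles)]

    for obs_lat, obs_lon in observations:
        # Prediction: nudge each particle slightly to keep diversity.
        particles = [(lat + rng.gauss(0.0, jitter), lon + rng.gauss(0.0, jitter))
                     for lat, lon in particles]
        # Weighting: particles close to the observation get higher weight.
        weights = [math.exp(-((lat - obs_lat) ** 2 + (lon - obs_lon) ** 2)
                            / (2.0 * sigma ** 2))
                   for lat, lon in particles]
        # Resampling: draw a new set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)

    # Measurement: report the mean of the surviving particles.
    lat = sum(p[0] for p in particles) / n_particles
    lon = sum(p[1] for p in particles) / n_particles
    return lat, lon

OBSERVATIONS = [(35.3, 139.2), (34.8, 138.7), (35.1, 139.4), (34.9, 138.9),
                (35.2, 139.0), (34.7, 138.8), (35.0, 139.3), (35.0, 138.7)]
print(particle_filter(OBSERVATIONS))
```

Unlike the Kalman filter, nothing here requires the noise to be Gaussian; the weighting function could be swapped for any likelihood.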

Twitter problems that affect statistical analysis

Sensors are not independent of each other

A user who sees another user’s tweet can re-post or re-tweet it

Some of the algorithms described before would be more accurate if the sensors were independent

Experimentation and Evaluations

Finally they describe their experimentation methodology and evaluate their findings

First, their algorithm:

1. Given a set of query terms G for a target event

2. Issue a query every s seconds and obtain tweets T

3. For each tweet obtain the features A,B, and C that were described earlier

4. Calculate the probability of occurrence using the SVM

5. For each tweet estimate its location based on the coordinates given or by querying Google Maps with the registered location of the user

6. Calculate the estimated distance from the Tweet to the event
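
Step 6 needs a distance between two coordinates. A common choice, and my assumption here since the slides do not say which formula is used, is the haversine great-circle distance; `haversine_km` and the mean Earth radius constant are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Approximate city-center coordinates for Tokyo and Osaka.
tokyo = (35.68, 139.69)
osaka = (34.69, 135.50)
print(round(haversine_km(*tokyo, *osaka), 1))
```

The tweet's latitude and longitude would come from step 5, either directly from its coordinates or from geocoding the user's registered location.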

Semantic Analysis Evaluation

It turns out that the most important part of a Tweet is neither the context of the words (C) nor the content (B); it is in fact the statistical properties (A)

During an event users are surprised and send very short messages

“Earthquake!”

Spatial Estimation Evaluation

The Kalman filter did a poor job at filtering out the noise in determining the probable location of the event

It was difficult to locate events in sparsely populated areas, as well as events surrounded by water

In a naïve and straightforward way, they mention that a larger number of sensors provides more accurate positioning of an event

Conclusions 1

For months I’ve thought that using Twitter as a sensor was an interesting idea.

The first thing my mom does when there is an earthquake is run to her laptop and Tweet “EARTHQUAKE #socal”

This paper is too mathematical for me to fully grasp in the short time given

Conclusions 2

I found this fascinating:

The fastest that an event was detected accurately was 19 seconds.

The accuracy they managed was very impressive.

Discussion Time

Questions?

Otherwise… onto the required points…

Discussion 1

1. What is the paper about?

Using Twitter (Tweets) as a sensor

2. What is the major contribution?

Showing that accurate event detection from Tweets is possible

3. What did you like best?

The way the paper actually ended with positive results

4. What are the weaknesses (according to you)?

Generally they accomplished what they set out to do, but it was very limited in scope (Japan). It could also have been applied to many more types of events.

Discussion 2

1. What is the difference between a document, blog, and a micro-blog in the context of search systems?

2. Tweets are considered to represent real time information. Is that right? What are its implications for News?

3. What is a target event? How are tweets related to that?

4. What is the goal of the system discussed in this paper? Do you think they are successful in their goal?

5. Describe a particle filter. What does it do generally? How is it used in this paper?

Discussion 3

6. What is a support vector machine? Why is it needed in this system?

7. Human Sensors is an increasingly popular concept. Why do you think this is important? Give three examples where this could be effective.

8. Discuss the system. How does it help? What are the critical steps in this algorithm?

9. This paper talks about Kalman Filter and Particle Filter. What is the difference between these two? Do we need both or just one? If you are developing an application to detect location of an accident based on tweets – which one will you use?

10. How has this paper changed your ideas of Twitter?

Thank You.

Follow me on Twitter if you want…

Personal: @MikeMayer

Public: @MikeMayerDev
