Earthquake shakes twitter users real-time event detection by social sensors
Earthquake Shakes Twitter Users
Real-time Event Detection by Social Sensors
@MikeMayer
What is Twitter?
• Twitter is categorized as a microblogging service.
• Twitter users post short blurbs of text, called tweets, of 140 characters or fewer.
• With URL shorteners and services tailored for Twitter, a lot of information can be conveyed in that small space.
• Twitter is very free-form, yet ways to categorize tweets, such as hashtags, have still emerged.
How is Twitter useful as a sensor?
Twitter users often report their status, whether or not it is of interest to others
This means that the public timeline is full of noise
The timeline is updated in real time, faster than a blog, faster than a “static” document
Tweets are faster than traditional news, and users select from a buffet of other users to customize their news
However, carefully selected tweets can yield a great deal of useful information
Tweets contain a great deal of metadata
Figure: JSON representation of a single Tweet
Source: http://www.readwriteweb.com/archives/this_is_what_a_tweet_looks_like.php
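To make the metadata point concrete, here is a minimal Python sketch that pulls the time and location fields out of one tweet's JSON. The sample tweet is made up for illustration, and the field names (created_at, text, coordinates, user.location) follow the classic Twitter REST API layout as an assumption; adjust them if your data source differs.

```python
import json
from datetime import datetime

# Hypothetical sample tweet, trimmed to the fields used here (not real data).
SAMPLE_TWEET = """
{
  "created_at": "Wed Aug 27 13:08:45 +0000 2008",
  "text": "Earthquake!",
  "coordinates": {"type": "Point", "coordinates": [139.0, 35.0]},
  "user": {"screen_name": "example_user", "location": "Tokyo, Japan"}
}
"""


def parse_tweet(raw_json):
    """Return the text, timestamp, and location fields of one tweet."""
    tweet = json.loads(raw_json)
    created_at = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    point = tweet.get("coordinates")
    # GeoJSON order is [longitude, latitude]; the field is null when absent.
    lon, lat = point["coordinates"] if point else (None, None)
    return {
        "text": tweet["text"],
        "created_at": created_at,
        "lat": lat,
        "lon": lon,
        # Free-text location from the user's profile, used as a fallback.
        "profile_location": tweet.get("user", {}).get("location"),
    }


parsed = parse_tweet(SAMPLE_TWEET)
print(parsed["created_at"].isoformat(), parsed["lat"], parsed["lon"])
```

The timestamp and the coordinates are the two pieces of metadata the rest of the talk relies on.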
“Each Twitter user is a sensor and each Tweet is sensory information”
Of course context must be considered… more on that soon
A bag-of-words approach isn’t good enough for detecting earthquakes:
“My dryer is shaking like crazy”
“Didn’t they used to have a ride at carnivals called Earthquake?”
The paper suggests a machine learning approach to determining the context
Event Detection
The primary focus of the paper is how to detect an event using so-called social sensors
Events are “arbitrary classifications of space/time regions”
Targeted events are natural occurrences (weather, earthquakes, etc.) and human-made ones (traffic jams, crime, etc.)
Semantic Analysis for Tweets
As said before, a bag of words is simply not good enough
To detect target events they use an SVM (support vector machine), a widely used machine-learning algorithm
They describe each tweet with three groups of features:
A. Statistical features (number of words…)
B. Keyword features
C. Word context features (words around a “query word”)
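The three feature groups above can be sketched in a few lines of Python. This is a rough illustration, not the paper's implementation: the tokenization is a simple whitespace split, and recording the position of the query word under group A is an assumption on my part.

```python
import string


def extract_features(text, query_word):
    """Build the three feature groups (A, B, C) for one tweet."""
    query = query_word.lower()
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    words = [w for w in words if w]
    position = words.index(query) if query in words else -1
    before = words[position - 1] if position > 0 else None
    after = words[position + 1] if 0 <= position < len(words) - 1 else None
    return {
        # A: statistical features of the tweet as a whole.
        "A": {"num_words": len(words), "query_position": position},
        # B: the keywords that appear in the tweet.
        "B": sorted(set(words)),
        # C: the words immediately around the query word.
        "C": {"before": before, "after": after},
    }


print(extract_features("Whoa, earthquake right now", "earthquake"))
```

A one-word tweet such as “Earthquake!” ends up with a word count of 1 and no context words, which is exactly the kind of signal group A captures.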
Support Vector Machine
Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. In simple words, given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other. [1]
Very mathy: basically a better way to classify data
1. http://en.wikipedia.org/w/index.php?title=Special:Cite&page=Support_vector_machine&id=361629294
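For a feel of what "builds a model that predicts one category or the other" means, here is a toy linear SVM in plain Python, trained by sub-gradient descent on the regularized hinge loss. It is a sketch under simplifying assumptions (linear kernel, made-up two-dimensional points, hand-picked learning rate), not the classifier or the features used in the paper.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit weights w and bias b for labels in {+1, -1} using the hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: move toward the example.
                w = [wi + lr * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Correct with enough margin: only apply the regularizer.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable training data (purely illustrative).
train_x = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0)]
train_y = [1, 1, -1, -1]
w, b = train_linear_svm(train_x, train_y)
print(predict(w, b, (5.0, 4.0)), predict(w, b, (-4.0, -5.0)))
```

In the tweet setting, each training example would be a feature vector for a tweet, labeled as reporting the event or not.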
Tweets as sensory values
Assumption 1 – “Each twitter user is regarded as a sensor…”
Twitter has over 100 million users [1]
That’s enough sensors to make up for the ones not operating correctly (asleep, tweeting gibberish, busy doing something else…)
Assumption 2 – “Each tweet is associated with a time and location…”
Location is the most fundamental requirement for using tweets as a sensor
1. http://economictimes.indiatimes.com/infotech/internet/Twitter-snags-over-100-million-users-eyes-money-making/articleshow/5808927.cms
Modeling
Temporal Model
Every tweet carries a created_at timestamp
The paper describes a probabilistic model for estimating the probability that an event is occurring
Spatial Model
Tweets considered in this system require geolocation information
The spatial model is far more complicated
It must account for time and for the delay as the event spreads (e.g., an earthquake)
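The intuition behind the temporal model can be shown with a deliberately simplified calculation. Assume every positively classified tweet is a false alarm with the same probability p_false, independently of the others; then the chance that at least one of n such tweets reflects a real event is 1 - p_false^n. The independence assumption and the numbers below are mine, for illustration only, and this is not the paper's full model.

```python
def occurrence_probability(n_positive, p_false):
    """Probability that at least one of n positive tweets is a true report."""
    if not 0.0 <= p_false <= 1.0:
        raise ValueError("p_false must be between 0 and 1")
    if n_positive < 0:
        raise ValueError("n_positive must be non-negative")
    return 1.0 - p_false ** n_positive


# Hypothetical false-alarm rate of 0.35 per sensor.
for n in (1, 5, 10):
    print(n, round(occurrence_probability(n, 0.35), 4))
```

Even with fairly unreliable individual sensors, the probability climbs quickly as more positive tweets arrive in the same time window.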
Spatial Model Continued
Kalman Filters
The paper describes an application of Kalman filters to model two cases:
1. Location estimation of an earthquake center
2. Trajectory estimation of a typhoon
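As a rough idea of the first case, here is a one-dimensional Kalman filter for a location that does not move, applied to one coordinate (say, latitude) at a time. It is a minimal sketch: the measurement variance and the very wide prior are assumed values, and the typhoon case would additionally need a motion model that is not shown here.

```python
def kalman_update(mean, var, measurement, meas_var):
    """Fold one noisy measurement into the current estimate."""
    gain = var / (var + meas_var)  # how much to trust the new measurement
    new_mean = mean + gain * (measurement - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var


def estimate_position(measurements, meas_var=1.0, prior_mean=0.0, prior_var=1e6):
    """Run the update over all measurements of a static position."""
    mean, var = prior_mean, prior_var
    for z in measurements:
        mean, var = kalman_update(mean, var, z, meas_var)
    return mean, var


# Hypothetical latitudes reported by tweets near the event.
lat_mean, lat_var = estimate_position([35.2, 34.9, 35.1, 35.0])
print(round(lat_mean, 3), round(lat_var, 3))
```

With a static state and a wide prior, the estimate ends up close to the average of the measurements, and the variance shrinks as more tweets arrive.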
Particle Filters
They use the geographic distribution of Twitter users
Generate a set of coordinates and sort them by weight
Resample and generate a new set, predict new sets, weigh the sets, measure, then iterate until convergence
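The generate, predict, weigh, and resample loop can be sketched as a small two-dimensional particle filter. This is an illustration under assumptions of my own: particles start uniformly in a square region rather than following the Twitter user distribution, the observation noise is Gaussian, and the filter runs once per observation instead of testing for convergence.

```python
import math
import random


def estimate_location(observations, n_particles=500, bounds=(0.0, 10.0),
                      obs_sigma=1.0, jitter=0.1, seed=0):
    """Estimate a fixed (x, y) location from noisy observed coordinates."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Generation: spread the initial particles over the region.
    particles = [(rng.uniform(lo, hi), rng.uniform(lo, hi))
                 for _ in range(n_particles)]
    for ox, oy in observations:
        # Prediction: nudge each particle with a little random motion.
        particles = [(x + rng.gauss(0.0, jitter), y + rng.gauss(0.0, jitter))
                     for x, y in particles]
        # Weighting: particles close to the observation get more weight.
        weights = [math.exp(-((x - ox) ** 2 + (y - oy) ** 2)
                            / (2.0 * obs_sigma ** 2))
                   for x, y in particles]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # every weight underflowed
        # Resampling: draw a new set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    x_mean = sum(x for x, _ in particles) / n_particles
    y_mean = sum(y for _, y in particles) / n_particles
    return x_mean, y_mean


# Hypothetical noisy reports scattered around a true location of (4, 6).
noise = random.Random(1)
reports = [(4.0 + noise.gauss(0.0, 0.5), 6.0 + noise.gauss(0.0, 0.5))
           for _ in range(40)]
print(estimate_location(reports))
```

Unlike the Kalman filter, nothing here requires the distribution of particles to stay Gaussian, which is why particle filters are attractive when sensors are unevenly spread.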
Twitter problems that affect statistical analysis
Sensors are not independent of each other
A user who sees another user’s tweet can re-post or retweet it
Some of the algorithms described before would be more accurate if the sensors were independent
Experimentation and Evaluations
Finally, they describe their experimentation methodology and evaluate their findings
First, their algorithm:
1. Given a set of query terms G for a target event
2. Issue a query every s seconds and obtain tweets T
3. For each tweet, obtain the features A, B, and C described earlier
4. Calculate the probability of occurrence using the SVM
5. For each tweet, estimate its location from the coordinates given or by querying Google Maps with the user’s registered location
6. Calculate the estimated distance from the tweet to the event
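Steps 3 through 6 can be sketched roughly as follows. Everything named here is hypothetical: classify stands in for the trained SVM, tweets are plain dicts with text, lat, and lon keys, and tweets without coordinates are simply skipped instead of being geocoded from the user's registered location. The distance uses the standard haversine formula with a mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def positive_tweet_distances(tweets, classify, event_location, threshold=0.5):
    """Keep tweets the classifier accepts and measure them against the event."""
    event_lat, event_lon = event_location
    results = []
    for tweet in tweets:
        # Steps 3 and 4: score the tweet with the (stand-in) classifier.
        if classify(tweet["text"]) < threshold:
            continue
        # Step 5: this sketch only handles tweets that carry coordinates.
        if tweet.get("lat") is None or tweet.get("lon") is None:
            continue
        # Step 6: distance from the tweet to the estimated event location.
        distance = haversine_km(tweet["lat"], tweet["lon"], event_lat, event_lon)
        results.append((tweet["text"], distance))
    return results


def keyword_score(text):
    """Toy stand-in for the SVM: 1.0 if the query word appears, else 0.0."""
    return 1.0 if "earthquake" in text.lower() else 0.0


sample = [
    {"text": "Earthquake!", "lat": 0.0, "lon": 1.0},
    {"text": "My dryer is shaking like crazy", "lat": 0.0, "lon": 2.0},
    {"text": "earthquake??", "lat": None, "lon": None},
]
print(positive_tweet_distances(sample, keyword_score, (0.0, 0.0)))
```

Steps 1 and 2 (polling a search API every s seconds) are left out because they depend on the API being queried.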
Semantic Analysis Evaluation
It turns out that the most informative part of a tweet is neither the word context (C) nor the keywords (B); it is in fact the statistical features (A)
During an event users are surprised and send very short messages
“Earthquake!”
Spatial Estimation Evaluation
The Kalman filter did a poor job of filtering out the noise when determining the probable location of the event
It was difficult to locate events in sparsely populated areas as well as events surrounded by water
They note, in a simple and straightforward way, that a larger number of sensors gives more accurate positioning of an event
Conclusions 1
For months I’ve thought that using Twitter as a sensor was an interesting idea.
The first thing my mom does when there is an earthquake is run to her laptop and Tweet “EARTHQUAKE #socal”
This paper is too mathematical for me to fully grasp in the short time given
Conclusions 2
I found this fascinating:
The fastest that an event was detected accurately was 19 seconds.
The accuracy they managed was very impressive.
Discussion Time
Questions?
Otherwise… onto the required points…
Discussion 1
1. What is the paper about?
Using Twitter (tweets) as a sensor
2. What is the major contribution?
Showing that accurate event detection from tweets is possible
3. What did you like best?
The way the paper actually ended with positive results
4. What are the weaknesses (according to you)?
Generally they accomplished what they set out to do, but it was very limited in scope (Japan). It could also have been applied to many more types of events.
Discussion 2
1. What is the difference between a document, a blog, and a micro-blog in the context of search systems?
2. Tweets are considered to represent real time information. Is that right? What are its implications for News?
3. What is a target event? How are tweets related to that?
4. What is the goal of the system discussed in this paper? Do you think they are successful in their goal?
5. Describe a particle filter. What does it do generally? How is it used in this paper?
Discussion 3
6. What is a support vector machine? Why is it needed in this system?
7. Human Sensors is an increasingly popular concept. Why do you think this is important? Give three examples where this could be effective.
8. Discuss the system. How does it help? What are the critical steps in this algorithm?
9. This paper talks about Kalman Filter and Particle Filter. What is the difference between these two? Do we need both or just one? If you are developing an application to detect location of an accident based on tweets – which one will you use?
10. How has this paper changed your ideas of Twitter?
Thank You.
Follow me on Twitter if you want…
Personal: @MikeMayer
Public: @MikeMayerDev