Post on 25-Mar-2020
1
TwitInfo: Aggregating and Visualizing Microblogsfor Event Exploration
Adam Marcus, Michael Bernstein, Osama Badar,David Karger, Sam Madden, Rob Miller
MIT CSAIL
2
Tell stories using Twitter
3
4
5
too much
too little
6
Identify events of interest
7
Identify events of interestLabel events
8
Identify events of interestLabel eventsAdd context: sentiment? location?
9
Identify events of interestLabel eventsAdd context: sentiment? location?Streaming
10
Identify events of interestLabel eventsAdd context: sentiment? location?Streaming
TwitInfo automates the process of telling stories using Twitter
11
12
13
14
TwitInfo
event detectionevent labeling
contextstreaming algorithms
15
TwitInfo
event detectionevent labeling
contextstreaming algorithms
sentiment correction algorithmevaluation: news consumers
evaluation: pulitzer prize journalist
16
[Andre et al. UIST2007] [Diakopoulos et al. VAST 2010][Diakopoulos+Shamma CHI2010] [Dork et al. InfoVis2010][Go et al. 2010] [Leskovec et al. KDD2009][Shamma et al. CSCW2010+2011]
17
[Andre et al. UIST2007] [Diakopoulos et al. VAST 2010][Diakopoulos+Shamma CHI2010] [Dork et al. InfoVis2010][Go et al. 2010] [Leskovec et al. KDD2009][Shamma et al. CSCW2010+2011]
18
[Andre et al. UIST2007] [Diakopoulos et al. VAST 2010][Diakopoulos+Shamma CHI2010] [Dork et al. InfoVis2010][Go et al. 2010] [Leskovec et al. KDD2009][Shamma et al. CSCW2010+2011]
19
[Andre et al. UIST2007] [Diakopoulos et al. VAST 2010][Diakopoulos+Shamma CHI2010] [Dork et al. InfoVis2010][Go et al. 2010] [Leskovec et al. KDD2009][Shamma et al. CSCW2010+2011]
TwitInfo is a streaming storytelling layer that complements these visualizations
20
ui walkthroughalgorithm designstudy/evaluation
discussion
21
22
23
24
25
26
27
ui walkthroughalgorithm designstudy/evaluation
discussion
28
29
Large
30
LargeRelatively
31
LargeRelatively
Streaming
32
LargeRelatively
Streamingmean deviation
33
LargeRelatively
Adaptive Streamingmean deviation
34
LargeRelatively
Adaptive StreamingExponentially weighted moving mean deviation
35
LargeRelatively
Adaptive Streaming
RFC 2988: ComputingTCP’s retransmission timer
Exponentially weighted moving mean deviation
36
LargeRelatively
Adaptive Streaming
RFC 2988: ComputingTCP’s retransmission timer
Exponentially weighted moving mean deviation
TF-IDF
37
LargeRelatively
Adaptive Streaming
RFC 2988: ComputingTCP’s retransmission timer
Exponentially weighted moving mean deviation
TF-IDF
2 datasets: earthquakes, soccer games
Precision: 80-100%Recall: 80-100%
38
39
40
41
42MAYBE CUT!
43
ui walkthroughalgorithm designstudy/evaluation
discussion
44
12 participants: four female/8 malefirst half: directed tasks
second half: 5 min write article (soccer, Obama)
how do they use the interface?what can they learn from TwitInfo?
45
“After the peace talks, Obama traveled to the ASEAN conference and to NATO to work on issues in those parts of the world. He then spent the week dedicated to domestic economic issues. First he proposed a research tax break, then a $50 billion investment in infrastructure, then the issue came up about whether he should keep some tax breaks that Bush had implemented, and he’s asking for some tax breaks from business, and these are generating some controversy because [. . . ]”
Participant 6:
46
47
Everyone
48
EveryoneSplit
49
EveryoneSplit
NewsJunkies
50
earthquakes
51
“my thoughts and prayers go out to everyone in that country. I hope...”
earthquakes
52
Journalist Interview
53
Journalist Interview
backgrounding
54
Journalist Interview
backgrounding eyewitnesses
55
ui walkthroughalgorithm designstudy/evaluation
discussion
56
appropriate algorithms for the medium
57
appropriate algorithms for the medium
other uses for streaming algorithms/signal processing
58
appropriate algorithms for the medium
other uses for streaming algorithms/signal processing
sentiment: caution!
59
appropriate algorithms for the medium
other uses for streaming algorithms/signal processing
sentiment: caution!
mixed-initiative storytelling
60
TwitInfo
http://twitinfo.csail.mit.edu
marcua@csail.mit.edu / @marcua
Adam Marcus, Michael Bernstein, Osama Badar,David Karger, Sam Madden, Rob Miller
61
62
Twitter data for fun and stories
63
Twitter data for fun and stories
How can we tell stories with the tweet stream?
64
Twitter data for fun and stories
How can we tell stories with the tweet stream?
How can we extract data from the tweet stream?
65
Twitter data for fun and stories
How can we extract data from the tweet stream?
TwitInfo
How can we tell stories with the tweet stream?
66
Twitter data for fun and stories
How can we extract data from the tweet stream?
TwitInfo
TweeQL
How can we tell stories with the tweet stream?
67
TwitInfo
How can we tell stories with the tweet stream?
68
TwitInfo
How can we tell stories with the tweet stream?
69
TwitInfo Uses
Automatically identify eventse.g., goals, earthquakes
70
TwitInfo Uses
Automatically identify eventse.g., goals, earthquakes
Backgrounding e.g., Obama's last two weeks
71
TwitInfo Uses
Automatically identify eventse.g., goals, earthquakes
Backgrounding e.g., Obama's last two weeks
Identifying sources on the grounde.g., interviewing earthquake survivors
72
Cautionary Tweet TalesSentiment is tricky e.g., “My warmest prayers go out to the people of Christchurch.”
73
Cautionary Tweet Tales
Location is evasive
Sentiment is tricky e.g., “My warmest prayers go out to the people of Christchurch.”
74
Cautionary Tweet Tales
Location is evasive
Sentiment is tricky e.g., “My warmest prayers go out to the people of Christchurch.”
Spam is everywhere
75
TwitInfo
Automated event detection makes tweet-basedstory-telling possible
76
TwitInfo
Automated event detection makes tweet-basedstory-telling possible
Interfaces tell a story, people add context
77
TwitInfo
Automated event detection makes tweet-basedstory-telling possible
Interfaces tell a story, people add context
78
TweeQL
How can we extract data from the tweet stream?
79
TweeQL
“It's a balmy 89°C in Phoenix”
location=Phoenix, temperatureC=89
How can we extract data from the tweet stream?
80
TweeQL
“It's a balmy 89°C in Phoenix”
location=Phoenix, temperatureC=89
“I'm starting to dig Obamacare!”
topic=Obama, sentiment=positive
How can we extract data from the tweet stream?
81
Twitter data hacking is hard
● Learn the API● Transform data● Stuff it into a database
82
Twitter data hacking is hard
● Learn the API● Transform data● Stuff it into a database
Ad-hoc data processing
83
Twitter data hacking is hard
● Learn the API● Transform data● Stuff it into a database
Ad-hoc data processing
84
TweeQL extracts data from tweets as they pass through the stream
85
TweeQL extracts data from tweets as they pass through the stream
Data extractione.g., location, sentiment, temperature, opencalais
86
TweeQL extracts data from tweets as they pass through the stream
Data extractione.g., location, sentiment, temperature, opencalais
SQL-like queriese.g., SELECT location, text FROM twitter
87
TweeQL extracts data from tweets as they pass through the stream
Data extractione.g., location, sentiment, temperature, opencalais
SQL-like queriese.g., SELECT location, text FROM twitter
stream, not table
88
TweeQL demo*nerd alert*
89
TweeQL's other features
● Aggregation● Outlier detection● Joins/mashups with other sources
90
91
Unstructureddata
92
+
Structuredqueries
Unstructureddata
93
+
Structuredqueries
=> Structureddata
Unstructureddata
94
+
Structuredqueries
=> Structureddata
Meaningfulvisualizations
=>
Unstructureddata
95
Thanks!
● TweeQL: http://github.com/marcua/tweeql● TwitInfo: http://twitinfo.csail.mit.edu/● Ask me about Mechanical Turk + Journalism!
Adam Marcus@marcuamarcua@csail.mit.eduhttp://people.csail.mit.edu/marcua
96
97
98
99
TweeQL Lessons Learned
● Geographic limitations● Requires grungy regular expressions● Data is not necessarily relational
100
101
102
103
104
105
Breadth
Depth
106
107
108
Challenges
109
Challenges
Event annotation
110
Challenges
Event annotation
Streaming
111
Challenges
Event annotation
StreamingSentiment
112
Challenges
Event annotation
StreamingLocationSentiment