CS190 Part 2: The Social Web
Online Social Network Analysis
Facebook (last) project: hints
• Simplify, simplify, simplify!
• (Show simplified Footprints example)
• Reminder: send, by e-mail ([email protected]):
– Application URL
• e.g.: http://apps.facebook.com/emory-hello-world/
– 1 paragraph description of what your app does
– Any additional information that would help me evaluate it
• Project presentations: Tuesday, April 28th
– W303 @ 1pm. Pizza will be served.
Today: Last Thoughts
• Wrap-up: Online social media
• Information Diffusion and Expertise
Sun’s Java Forum
Constructing an Expertise Network
roles: automatically inferring expertise in Question/Answer forums
a fragment of Sun’s Java Forum
Zhang
Expertise Pairing
political blogs are among the most read
Top 10 Technorati 2005/05/24
The most authoritative blogs, ranked by the number of sources that link to each blog.
1. Boing Boing: A Directory of Wonderful Things 22,532 links from 14,623 sources
2. Instapundit.com 15,190 links from 10,425 sources
3. Daily Kos 15,833 links from 9,509 sources
4. Gizmodo 12,278 links from 9,259 sources
5. Drew Curtis' FARK.com 10,216 links from 9,121 sources
6. Engadget - www.engadget.com. 15,051 links from 7,869 sources
7. Davenetics* Pop + Media + Web 7,571 links from 7,408 sources
8. Eschaton 8,713 links from 6,279 sources
9. dooce 6,797 links from 5,990 sources
10. www.AndrewSullivan.com - Daily Dish 7,680 links from 5,916 sources
The larger political blogosphere
Results
– 91% of links point to a blog of the same persuasion
– Conservative blogs show greater tendency to link
• 82% of conservative blogs are linked to at least once; 84% link to at least one other blog
• 67% of liberal blogs are linked to at least once; 74% link to at least one other blog
• Both sides reciprocate ~25% of links
• Clustering coefficient (3 x # triangles / number of connected triples):
0.20 for conservatives, 0.31 for liberals -> “left more cliquish?”
– But when non-linking blogs are excluded, average # of outgoing links/blog is about the same for both
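The two link statistics on this slide, reciprocity and the clustering coefficient (3 x # triangles / number of connected triples), can be computed from a plain edge list. The following is a minimal sketch, not the method used in the study: the function names and the toy edge list are made up for illustration, and the clustering coefficient is computed on the undirected version of the link graph, which is an assumption.

```python
from itertools import combinations


def reciprocity(edges):
    """Fraction of directed links whose reverse link also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def clustering_coefficient(edges):
    """3 x # triangles / # connected triples, ignoring link direction."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-links
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    triples = 0  # paths a - node - b centered on each node
    closed = 0   # triples whose end points are also linked
    for node, nbrs in neighbors.items():
        for a, b in combinations(sorted(nbrs), 2):
            triples += 1
            if b in neighbors[a]:
                closed += 1
    # Each triangle closes three triples (one per corner), so
    # `closed` already equals 3 x # triangles.
    return closed / triples if triples else 0.0


# Hypothetical link graph: (source blog, linked blog)
toy_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"), ("c", "d")]

print(reciprocity(toy_edges))             # 2 of 5 links are reciprocated
print(clustering_coefficient(toy_edges))  # 3 closed triples out of 5
```

On the toy graph, two of the five links are reciprocated (0.4) and the single triangle a-b-c closes three of the five connected triples (0.6).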
Different rankings produce similar A-lists
Citations between blogs in their posts (Aug 29th – Nov 15th, 2004)
[Figure: citation network among 40 A-list blogs, shown in three panels]
(A) all citations between A-list blogs in 2 months preceding the 2004 election
(B) citations between A-list blogs with at least 5 citations in both directions
(C) edges further limited to those exceeding 25 combined citations
• only 15% of the citations bridge communities
Node labels:
1 Digby’s Blog; 2 James Walcott; 3 Pandagon; 4 blog.johnkerry.com; 5 Oliver Willis
6 America Blog; 7 Crooked Timber; 8 Daily Kos; 9 American Prospect; 10 Eschaton
11 Wonkette; 12 Talk Left; 13 Political Wire; 14 Talking Points Memo; 15 Matthew Yglesias
16 Washington Monthly; 17 MyDD; 18 Juan Cole; 19 Left Coaster; 20 Bradford DeLong
21 JawaReport; 22 Vodka Pundit; 23 Roger L Simon; 24 Tim Blair; 25 Andrew Sullivan
26 Instapundit; 27 Blogs for Bush; 28 LittleGreenFootballs; 29 Belmont Club; 30 Captain’s Quarters
31 Powerline; 32 Hugh Hewitt; 33 INDC Journal; 34 Real Clear Politics; 35 Winds of Change
36 Allahpundit; 37 Michelle Malkin; 38 Wizbang; 39 Dean’s World; 40 Volokh
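The three panels differ only in how citation edges are thresholded, and the "bridging" figure is the share of citations whose two endpoints sit in different communities. Below is a minimal sketch of both steps under stated assumptions: the blog names, counts, and community labels are invented toy data, and `filter_edges` and `bridging_fraction` are hypothetical helper names, not code from the study.

```python
def filter_edges(citations, min_each_way=0, min_combined=0):
    """Keep blog pairs that pass the panel thresholds.

    citations: {(citing_blog, cited_blog): count}
    min_each_way: minimum citations required in BOTH directions (panel B uses 5)
    min_combined: combined count must EXCEED this value (panel C uses 25)
    Returns {(blog_a, blog_b): combined_count} with each pair listed once.
    """
    kept = {}
    for a, b in citations:
        if a == b:
            continue
        pair = tuple(sorted((a, b)))
        if pair in kept:
            continue
        forward = citations.get((pair[0], pair[1]), 0)
        backward = citations.get((pair[1], pair[0]), 0)
        combined = forward + backward
        if min(forward, backward) >= min_each_way and combined > min_combined:
            kept[pair] = combined
    return kept


def bridging_fraction(citations, community):
    """Share of all citations that cross from one community to the other."""
    total = sum(citations.values())
    cross = sum(count for (a, b), count in citations.items()
                if community[a] != community[b])
    return cross / total if total else 0.0


# Hypothetical toy data: two "left" blogs and two "right" blogs
toy_citations = {
    ("L1", "L2"): 20, ("L2", "L1"): 10,
    ("L1", "R1"): 6, ("R1", "L1"): 5,
    ("R1", "R2"): 3, ("R2", "R1"): 1,
    ("L2", "R2"): 2,
}
toy_community = {"L1": "left", "L2": "left", "R1": "right", "R2": "right"}

panel_a = filter_edges(toy_citations)                     # all cited pairs
panel_b = filter_edges(toy_citations, min_each_way=5)     # >= 5 each way
panel_c = filter_edges(toy_citations, min_each_way=5,
                       min_combined=25)                   # and > 25 combined
print(panel_a, panel_b, panel_c, sep="\n")
print(bridging_fraction(toy_citations, toy_community))
```

On the toy data, panel B keeps the pairs (L1, L2) and (L1, R1), panel C keeps only (L1, L2), and 13 of the 47 citations cross communities.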
Notable examples of blogs breaking a story
1. Swiftvets.com anti-Kerry video
– Bloggers linked to this in late July, keeping accusations alive
– Kerry responded in late August, bringing mainstream media coverage
2. CBS memos alleging preferential treatment of Pres. Bush during the Vietnam War
– Powerline broke the story on Sep. 9th, launching flurry of discussion
– Dan Rather apologized later in the month
3. “Was Bush Wired?”
– Salon.com asked the question first on Oct. 8th, echoed by Wonkette & PoliticalWire.com
– MSM follows up the next day
Liberals and conservatives differ in the topics they discuss
Discussion of “forged documents”
[Figure: # weblog posts (y-axis, 0 to 35) by Date (x-axis, ticks from 8/29/2004 to 11/7/2004); series: Right, Left]
Political figures being discussed
59% of the mentions of Kerry are by right leaning blogs
53% of the mentions of Bush are by left leaning blogs
Mainstream media bias (links from 1,400 blog set)
Insights from the political blogosphere
Liberal and conservative blogs are balanced in numbers and tend to link primarily to their own communities
Conservative blogs are more likely to include links to other blogs on their pages, and their A-list blogs reference one another more frequently
Liberal and conservative blogs tend to discuss different things, but one is not more ‘coherent’ than the other
Different news sources are favored by differently leaning blogs
Easier to criticize opponents than support one’s own position
Mainstream media cited about once every other post from the A-list bloggers
(6,762 times from the left, 6,364 from the right)
Why We Search
Eytan Adar
University of Washington
May 12, 2007
Dan Weld, Brian Bershad, and Steve Gribble
Power in prediction
• Based on blogs can we figure out which ad words to buy?
• Based on event on TV can we gauge online response?
• What kind of news events do groups respond to? How do they respond?
• Integrate other behavioral data
– Purchase habits
– Brand awareness
– Etc.
Power in prediction
• Can we understand what events impact/predict/correlate online behavior?
– Who responds to an event?
– When do they respond?
– How much?
– Why do they respond?
• Attention as a resource
– Indicator for other investments
Daily lives
– “Information side effects”
Attention
– searches, mentions, news, votes, etc.
[Diagram, “Predictive, Correlated”: time axis with “Searches about news” and “Blog posts about news”; labels: Event, Response 1; Event, Response 2]
[Diagram, “Predictive, Causal”: time axis with “Sunshine” and “Suntan lotion sales”; labels: Event, Response 1, Response 2]
Agenda
• Transform text & behavioral data to more useful form
• Infrastructure to compare different behavioral data
• Analysis & visualization technique to compare behaviors over time
• Some observations
[Diagram labels: Unstructured Source Data; Conversion / Data Cleaning; Time Series; Model Building; Models; Time Series Analysis Algorithms; Predictions]
[Figures: four data streams, each shown as a time series for the example phrase “iraq war”]
• Searches: x 15M (MSN Logs), x 12.2M (AOL Logs), May ‘06
– As % of all queries (in that period)
– Time axis as labeled: May 1, 2007 00:00 AM, 00:10 AM, 00:20 AM, … June 1, 2006 00:00 AM
– Query Event Stream (QES)
• Blogs: x 14M Posts
– % of blog posts that mention phrase
– Time axis as labeled: May 1, 2007, May 2, 2007, May 3, 2007, … May 31, 2006
• News: x 13K Articles from CNN/BBC
– Inlinks to stories
– % of news articles that mention phrase * number of inlinks
– Time axis as labeled: May 1, 2007 00:00 AM, 00:10 AM, 00:20 AM, … June 1, 2006 00:00 AM
• TV: x 2.5K Shows (TV.com)
– % of episodes that mention phrase * number of votes
– Time axis as labeled: May 1, 2007, May 2, 2007, May 3, 2007, … May 31, 2006
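Each stream above is a phrase's share of all activity per time bin. The sketch below builds such a series for a query log with 10-minute bins. It rests on assumptions: the log format (timestamp, text), the substring match, and the names `bin_start` and `phrase_share` are all hypothetical, and the bin width must divide 60 evenly.

```python
from collections import Counter
from datetime import datetime, timedelta


def bin_start(ts, minutes=10):
    """Floor a timestamp to the start of its bin (minutes must divide 60)."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second,
                          microseconds=ts.microsecond)


def phrase_share(events, phrase, minutes=10):
    """Return {bin start: fraction of events in that bin mentioning phrase}.

    events: iterable of (timestamp, text) pairs, e.g. one pair per query.
    """
    totals, hits = Counter(), Counter()
    for ts, text in events:
        b = bin_start(ts, minutes)
        totals[b] += 1
        if phrase in text.lower():
            hits[b] += 1
    return {b: hits[b] / totals[b] for b in sorted(totals)}


# Hypothetical query log entries
toy_log = [
    (datetime(2006, 5, 1, 0, 1), "iraq war news"),
    (datetime(2006, 5, 1, 0, 4), "britney spears"),
    (datetime(2006, 5, 1, 0, 12), "iraq war"),
    (datetime(2006, 5, 1, 0, 19, 30), "Iraq War casualties"),
]

toy_series = phrase_share(toy_log, "iraq war")
for start, share in toy_series.items():
    print(start, share)
```

The news and TV streams additionally weight each mention (by inlinks or votes); that would replace the `+= 1` increments with the relevant weight.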
Phrases/Queries → Topics
• We want to know that “britney spears” is the same as
– “spears britney” or just
– “britney”
• Solution: look at clicks and results
– ~1M queries from MSN logs that appear 2+ times
– Overlapping clicks/result sets indicate relatedness of queries (similarity measure)
• Naïve clustering
– Query Event Stream (QES) → Topic Event Stream (TES)
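The slide does not say which similarity measure or clustering rule was used, so the following is only one plausible reading: Jaccard overlap of clicked URLs as the similarity, and a greedy single pass as the "naïve" clustering. The click sets, URLs, threshold, and function names are all made up for illustration.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap of two click/result sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(clicks, threshold=0.5):
    """Greedy pass: a query joins the first cluster whose seed it resembles.

    clicks: {query: set of clicked URLs}. Returns a list of query lists.
    """
    clusters = []
    for query, urls in clicks.items():
        for members in clusters:
            seed = members[0]
            if jaccard(urls, clicks[seed]) >= threshold:
                members.append(query)
                break
        else:
            clusters.append([query])
    return clusters


def topic_event_stream(members, query_streams):
    """Sum the per-bin counts of all member queries (QES -> TES)."""
    topic = Counter()
    for query in members:
        topic.update(query_streams.get(query, {}))
    return dict(topic)


# Hypothetical click data
toy_clicks = {
    "britney spears": {"britneyspears.com", "wikipedia.org/Britney_Spears",
                       "people.com/britney"},
    "spears britney": {"britneyspears.com", "wikipedia.org/Britney_Spears"},
    "britney": {"britneyspears.com", "people.com/britney"},
    "iraq war": {"cnn.com/iraq", "bbc.co.uk/iraq"},
}
toy_clusters = cluster_queries(toy_clicks)
print(toy_clusters)

# Hypothetical per-bin query counts (bin index -> count)
toy_streams = {"britney spears": {0: 3, 1: 1}, "britney": {1: 2}}
print(topic_event_stream(toy_clusters[0], toy_streams))
```

Comparing only against each cluster's seed keeps the pass linear in the number of clusters, at the cost of depending on the order in which queries arrive.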
Experimental Set
• We take the 3638 most frequent queries from MSN
– AOL: 3627 (> 99%)
– BLOG: 1975 (54%)
– NEWS: 1704 (47%)
– TV: 1602 (44%)
• Compare topic A in one set to topic A in another
– Limits spurious correlations
Correlations
• Do we even have a chance?
$$ r(d) = \frac{\sum_i \left(x(i) - \bar{x}\right)\left(y(i-d) - \bar{y}\right)}{\sqrt{\sum_i \left(x(i) - \bar{x}\right)^2}\,\sqrt{\sum_i \left(y(i-d) - \bar{y}\right)^2}} $$
[Plot: r as a function of delay d, with d = 0 marked]
• Equivalent to convolution
• Try for some delay range, d, find max value
– Negative/Positive correlations
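The delayed correlation r(d) and the search over a delay range can be sketched as below. The boundary handling is an assumption, since the slide does not specify it: index pairs where i - d falls outside the series are skipped, and the means are taken over the full series. The function names and the toy series are hypothetical.

```python
from math import sqrt


def cross_correlation(x, y, d):
    """r(d): correlation of x(i) with y(i - d) over the overlapping indices."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    num = sum_sq_x = sum_sq_y = 0.0
    for i in range(n):
        j = i - d
        if 0 <= j < n:  # skip pairs that fall off either end
            dx = x[i] - mean_x
            dy = y[j] - mean_y
            num += dx * dy
            sum_sq_x += dx * dx
            sum_sq_y += dy * dy
    denom = sqrt(sum_sq_x) * sqrt(sum_sq_y)
    return num / denom if denom else 0.0


def best_delay(x, y, max_delay):
    """Try every d in [-max_delay, max_delay]; return (d, r) with largest r."""
    candidates = [(cross_correlation(x, y, d), d)
                  for d in range(-max_delay, max_delay + 1)]
    r, d = max(candidates)
    return d, r


# Hypothetical series: y repeats the spike in x two time steps later
toy_x = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
toy_y = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]

delay, r = best_delay(toy_x, toy_y, max_delay=3)
print(delay, r)
```

Under this sign convention a negative d means y trails x, so the toy series peak at d = -2. Keeping the delay range small relative to the series length matters: with very little overlap, a handful of points can produce a spuriously high r.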
Delays (high correlation)
[Figure: Distribution of Delays from MSN, Correlations >= .7. y-axis: Percent of Topics (0 to 0.14); x-axis: Delay (-31 to 29); series: AOL (906), BLOGS (965), BLOG-NEWS (478), NEWS (427), TV (305); annotation: 38% are at 0]
Max-correlation delay = 3 hours
[Plot: curves over time]
Same correlations + delays, but very different shapes
How do we compare these?
Visual summary of differences?
Some Findings
• Randomly selected some topics and labeled them
– People, places, events, news, etc.
• So why do we search? Or blog? Or react to news?
1) News of the Weird
• Bloggers pick up on “weird” stories first
• igor vovkovinskiy
[Plot: Blog* and Search (MSN)* curves; x-axis: May, days 1 to 32]
• uss oriskany
[Plot: Blog and Search (MSN) curves; x-axis: May, days 1 to 32]
*Curves normalized to max value for readability
Blogs lead versus lag in the news
[Two-column list; column order assumed to follow the title, lead first]
Blogs lead | Blogs lag
al gore movie | american eagle
bush border | an american haunting
creative | aol games
elliott yamin | australian miners
georgia marriage law | cedar point
halo 3 | countrywide
hanso foundation | duke case
igor vovkovinskiy | enron trial
keith richards | high school musical
lillian gertrud asplund | new orleans jazz fest
mary cheney | over the hedge
2) Anticipated Events
• Pressure to be new
– Bloggers don’t talk about anticipated events
• TV Shows
[Plot: Search (MSN) and Blog curves; x-axis: May, days 1 to 32]
3) Familiarity Breeds Contempt
• We get tired of certain kinds of news
• Takes a really big spike for us to get excited
• enron trial
[Plot: Search (MSN) and News curves; x-axis: May, days 1 to 32]
4) Correlation vs. Causation
• poseidon
[Plot: TV and Search (MSN) curves; x-axis: May, days 1 to 32]
• Both respond to movie release, but one to marketing and one to satire
• Need other, more specific, data streams to infer causation
Google: Predicting the Present
• http://www.google.org/flutrends/
• http://www.google.com/insights/search/
Summary
• The Web is increasingly Social
– Models of Information Diffusion and Expertise
• Mirror of Society
– People’s Interests Reflect Reality (and the future)
Reminder: SHORT Final Paper
– Due: Wednesday, May 6th
– Maximum length: 4 pages
• Use standard single space format, font no smaller than 10pt.
– Sample topics:
• Does web search advertising work? (Challenges/advantages over “traditional” advertising)
• (online) Social network formation
• Social vs. Traditional Media for News Reporting
• Contagion and spread of technology in online networks
• “Wisdom of crowds” on the web
• Privacy challenges in web search and social networks
• …