Transcript of CS190 Part 2: The Social Web

Page 1: CS190 Part 2: The Social WebCS190 Part 2: The Social Webeugene/teaching/CS190_s09/lectures/...political blogs are among the most read Top 10 Technorati 2005/05/24 The most authoritative

CS190 Part 2: The Social Web

Online Social Network Analysis


Facebook (last) project: hints

• Simplify, simplify, simplify!

• (Show simplified Footprints example)

• Reminder: send, by e-mail ([email protected]):
– Application URL
• e.g.: http://apps.facebook.com/emory-hello-world/
– 1 paragraph description of what your app does
– Any additional information that would help me evaluate it

• Project presentations: Tuesday, April 28th

– W303 @ 1pm. Pizza will be served.

Today: Last Thoughts

• Wrap-up: Online social media

• Information Diffusion and Expertise

Sun’s Java Forum


Constructing an Expertise Network

roles: automatically inferring expertise in Question/Answer forums

a fragment of Sun’s Java Forum

Zhang

Expertise Pairing


political blogs are among the most read

Top 10 Technorati 2005/05/24

The most authoritative blogs, ranked by the number of sources that link to each blog.

1. Boing Boing: A Directory of Wonderful Things 22,532 links from 14,623 sources
2. Instapundit.com 15,190 links from 10,425 sources
3. Daily Kos 15,833 links from 9,509 sources
4. Gizmodo 12,278 links from 9,259 sources
5. Drew Curtis' FARK.com 10,216 links from 9,121 sources
6. Engadget - www.engadget.com. 15,051 links from 7,869 sources
7. Davenetics* Pop + Media + Web 7,571 links from 7,408 sources
8. Eschaton 8,713 links from 6,279 sources
9. dooce 6,797 links from 5,990 sources
10. www.AndrewSullivan.com - Daily Dish 7,680 links from 5,916 sources
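The ranking criterion stated above (number of distinct sources linking to each blog) can be contrasted with ranking by raw link count. The following is a minimal sketch using the figures from the list; the shortened blog names and the helper `rank_by` are illustrative assumptions, not part of the lecture.

```python
# (blog, inbound links, distinct linking sources), copied from the Top 10 list.
# Blog names are shortened here for readability.
top10 = [
    ("Boing Boing", 22532, 14623),
    ("Instapundit.com", 15190, 10425),
    ("Daily Kos", 15833, 9509),
    ("Gizmodo", 12278, 9259),
    ("FARK.com", 10216, 9121),
    ("Engadget", 15051, 7869),
    ("Davenetics", 7571, 7408),
    ("Eschaton", 8713, 6279),
    ("dooce", 6797, 5990),
    ("AndrewSullivan.com", 7680, 5916),
]


def rank_by(blogs, key_index):
    """Return blog names sorted in descending order of the chosen column."""
    ordered = sorted(blogs, key=lambda b: b[key_index], reverse=True)
    return [b[0] for b in ordered]


by_sources = rank_by(top10, 2)  # the criterion used in the list: distinct sources
by_links = rank_by(top10, 1)    # alternative: raw number of links

print(by_sources[:3])
print(by_links[:3])
```

Ranking by raw links would move Daily Kos and Engadget up, since both receive many links from comparatively few sources.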


The larger political blogosphere

Results
– 91% of links point to blog of same persuasion
– Conservative blogs show greater tendency to link
• 82% of conservative blogs are linked to at least once; 84% link to at least one other blog
• 67% of liberal blogs are linked to at least once; 74% link to at least one other blog
• Both sides reciprocate ~25% of links
• Clustering coefficient (3 x # triangles / number of connected triples): 0.20 for conservatives, 0.31 for liberals -> “left more cliquish?”
– But when non-linking blogs are excluded, average # of outgoing links/blog is about the same for both
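The clustering coefficient quoted above, 3 x # triangles / number of connected triples, can be computed directly from an edge list. This is a minimal sketch that treats blog links as undirected (an assumption; the slide does not say how link direction was handled), and the function name is illustrative.

```python
from itertools import combinations


def clustering_coefficient(edges):
    """Global clustering coefficient: 3 * (# triangles) / (# connected triples)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-links
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # A connected triple is a path of length 2, counted at its center node.
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())

    # Each triangle is found once per center node, so this sum already
    # equals 3 * (# triangles).
    closed = sum(
        1
        for nbrs in adj.values()
        for a, b in combinations(nbrs, 2)
        if b in adj[a]
    )
    return closed / triples if triples else 0.0


# A triangle a-b-c with one extra blog d hanging off c.
print(clustering_coefficient([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]))
```

In the example, one triangle and five connected triples give 3 x 1 / 5.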

Different rankings produce similar A-lists

Citations between blogs in their posts (Aug 29th – Nov 15th, 2004)

[Figure: citation network among 40 numbered A-list blogs, panels (A) to (C)]

(A) all citations between A-list blogs in 2 months preceding the 2004 election

(B) citations between A-list blogs with at least 5 citations in both directions

(C) edges further limited to those exceeding 25 combined citations

Only 15% of the citations bridge communities

Node labels: 1 Digby’s Blog, 2 James Walcott, 3 Pandagon, 4 blog.johnkerry.com, 5 Oliver Willis, 6 America Blog, 7 Crooked Timber, 8 Daily Kos, 9 American Prospect, 10 Eschaton, 11 Wonkette, 12 Talk Left, 13 Political Wire, 14 Talking Points Memo, 15 Matthew Yglesias, 16 Washington Monthly, 17 MyDD, 18 Juan Cole, 19 Left Coaster, 20 Bradford DeLong, 21 JawaReport, 22 Vodka Pundit, 23 Roger L Simon, 24 Tim Blair, 25 Andrew Sullivan, 26 Instapundit, 27 Blogs for Bush, 28 LittleGreenFootballs, 29 Belmont Club, 30 Captain’s Quarters, 31 Powerline, 32 Hugh Hewitt, 33 INDC Journal, 34 Real Clear Politics, 35 Winds of Change, 36 Allahpundit, 37 Michelle Malkin, 38 Wizbang, 39 Dean’s World, 40 Volokh
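The two edge filters described for panels (B) and (C) can be sketched as follows. The citation counts and blog names in the example are made up for illustration, and the function names are assumptions rather than anything from the original study.

```python
def reciprocal_edges(citations, min_each_way=5):
    """Panel (B): keep pairs cited at least `min_each_way` times in both directions.

    `citations` maps (source, target) to a citation count. The result maps each
    unordered pair, stored as (smaller, larger), to its combined count.
    """
    kept = {}
    for (src, dst), n in citations.items():
        back = citations.get((dst, src), 0)
        # Visit each unordered pair once, from its smaller endpoint.
        if src < dst and n >= min_each_way and back >= min_each_way:
            kept[(src, dst)] = n + back
    return kept


def strong_edges(pairs, min_combined=25):
    """Panel (C): keep pairs whose combined citation count exceeds `min_combined`."""
    return {pair: total for pair, total in pairs.items() if total > min_combined}


# Hypothetical counts, not taken from the lecture data.
citations = {
    ("blog_a", "blog_b"): 20, ("blog_b", "blog_a"): 10,
    ("blog_a", "blog_c"): 6, ("blog_c", "blog_a"): 5,
    ("blog_b", "blog_c"): 7, ("blog_c", "blog_b"): 2,
}
panel_b = reciprocal_edges(citations)
panel_c = strong_edges(panel_b)
print(panel_b)
print(panel_c)
```

Each filter only removes edges, so panel (C) is always a subset of panel (B).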

[Figure: the three citation networks (A), (B), and (C) shown side by side, using the same node numbering 1 to 40 for the A-list blogs]


Notable examples of blogs breaking a story

1. Swiftvets.com anti-Kerry video
– Bloggers linked to this in late July, keeping accusations alive
– Kerry responded in late August, bringing mainstream media coverage

2. CBS memos alleging preferential treatment of Pres. Bush during the Vietnam War
– Powerline broke the story on Sep. 9th, launching flurry of discussion
– Dan Rather apologized later in the month

3. “Was Bush Wired?”
– Salon.com asked the question first on Oct. 8th, echoed by Wonkette & PoliticalWire.com
– MSM follows up the next day


Liberals and conservatives differ in the topics they discuss

Discussion of “forged documents”

[Figure: # weblog posts discussing “forged documents” over time, Right vs. Left. y-axis: # weblog posts (0 to 35); x-axis: Date, weekly ticks from 8/29/2004 to 11/7/2004]


Political figures being discussed

59% of the mentions of Kerry are by right-leaning blogs

53% of the mentions of Bush are by left-leaning blogs

Mainstream media bias (links from 1,400 blog set)


Insights from the political blogosphere

Liberal and conservative blogs are balanced in numbers and tend to link primarily to their own communities

Conservative blogs are more likely to include links to other blogs on their pages, and their A-list blogs reference one another more frequently

Liberal and conservative blogs tend to discuss different things, but one is not more ‘coherent’ than the other

Different news sources are favored by differently leaning blogs

Easier to criticize opponents than support one’s own position


Mainstream media cited about once every other post from the A-list bloggers

(6,762 times from the left, 6,364 from the right)

Why We Search

Eytan Adar

University of Washington

May 12, 2007

Dan Weld, Brian Bershad, and Steve Gribble

Power in prediction

• Based on blogs can we figure out which ad words to buy?
• Based on event on TV can we gauge online response?
• What kind of news events do groups respond to? How do they respond?
• Integrate other behavioral data
– Purchase habits
– Brand awareness
– Etc.

Power in prediction

• Can we understand what events impact/predict/correlate online behavior?
– Who responds to an event?
– When do they respond?
– How much?
– Why do they respond?
• Attention as a resource
– Indicator for other investments

Daily lives
– “Information side effects”

Attention
– searches, mentions, news, votes, etc.

Searches about news

Blog posts about news

[Figure: the two curves over time, labeled “Predictive, Correlated”; diagram labels: Event, Response 1; Event, Response 2]

Suntan lotion sales

Sunshine

[Figure: the two curves over time, labeled “Predictive, Causal”; diagram labels: Event, Response 1, Response 2]

Agenda

• Transform text & behavioral data to more useful form
• Infrastructure to compare different behavioral data
• Analysis & visualization technique to compare behaviors over time
• Some observations

[Diagram boxes, in order of appearance: Unstructured Source Data; Conversion / Data Cleaning; Time Series; Model Building; Models; Time Series Analysis Algorithms; Predictions]

[Figure: occurrences of an example query, “iraq war”, in the search logs: X 15M (MSN Logs), X 12.2M (AOL Logs), May ‘06. Counts are taken in 10-minute bins labeled “May 1, 2007 00:00 AM”, “May 1, 2007 00:10 AM”, “May 1, 2007 00:20 AM”, … “June 1, 2006 00:00 AM” and expressed as % of all queries (in that period)]

Query Event Stream (QES)
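The construction of a Query Event Stream, the share of all queries in each time bin that match a phrase, might look like the sketch below. The log format (timestamp, query text), the exact-match rule, and the function name are assumptions for illustration; the slides give only the 10-minute bins and the "% of all queries" normalization.

```python
from collections import Counter
from datetime import datetime


def query_event_stream(log, phrase, bin_minutes=10):
    """Percentage of queries equal to `phrase` in each time bin.

    `log` is an iterable of (datetime, query) pairs. `bin_minutes` is assumed
    to divide 60 evenly so that bins align with the hour.
    """
    total, hits = Counter(), Counter()
    for timestamp, query in log:
        # Floor the timestamp to the start of its bin.
        minute = timestamp.minute - timestamp.minute % bin_minutes
        start = timestamp.replace(minute=minute, second=0, microsecond=0)
        total[start] += 1
        if query == phrase:
            hits[start] += 1
    return {start: 100.0 * hits[start] / total[start] for start in sorted(total)}


# Hypothetical log entries.
log = [
    (datetime(2006, 5, 1, 0, 1), "iraq war"),
    (datetime(2006, 5, 1, 0, 5), "weather"),
    (datetime(2006, 5, 1, 0, 12), "iraq war"),
    (datetime(2006, 5, 1, 0, 19), "iraq war"),
]
qes = query_event_stream(log, "iraq war")
for start, pct in qes.items():
    print(start, pct)
```

Dividing by the total number of queries in the same bin keeps the stream comparable across busy and quiet periods.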

[Figure: X 14M Posts; % of blog posts that mention phrase, in daily bins labeled “May 1, 2007”, “May 2, 2007”, “May 3, 2007”, … “May 31, 2006”]

Inlinks to stories

[Figure: X 13K Articles from CNN/BBC; % of news articles that mention phrase * number of inlinks, in 10-minute bins labeled “May 1, 2007 00:00 AM”, “May 1, 2007 00:10 AM”, “May 1, 2007 00:20 AM”, … “June 1, 2006 00:00 AM”]

[Figure: X 2.5K Shows (TV.com); % of episodes that mention phrase * number of votes, in daily bins labeled “May 1, 2007”, “May 2, 2007”, “May 3, 2007”, … “May 31, 2006”]

Phrases/Queries → Topics

• We want to know that “britney spears” is the same as
– “spears britney” or just
– “britney”
• Solution: look at clicks and results
– ~1M queries from MSN logs that appear 2+ times
– Overlapping clicks/result sets indicate relatedness of queries (similarity measure)
• Naïve clustering
– Query Event Stream (QES) → Topic Event Stream (TES)
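One way to turn overlapping click/result sets into a similarity measure and a naive clustering is sketched below. The slides do not name the measure or the clustering procedure, so the Jaccard overlap, the 0.5 threshold, the greedy single-pass grouping, and the placeholder URLs are all assumptions.

```python
def jaccard(a, b):
    """Overlap of two click/result sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(clicks, threshold=0.5):
    """Greedy single-pass clustering of queries into topics.

    `clicks` maps each query to the set of URLs clicked for it. A query joins
    the first topic whose seed query is similar enough; otherwise it starts a
    new topic.
    """
    topics = []  # list of (seed_query, member_queries)
    for query, urls in clicks.items():
        for seed, members in topics:
            if jaccard(urls, clicks[seed]) >= threshold:
                members.append(query)
                break
        else:
            topics.append((query, [query]))
    return [members for _, members in topics]


# Hypothetical click sets; "u1" etc. stand in for clicked URLs.
clicks = {
    "britney spears": {"u1", "u2", "u3"},
    "spears britney": {"u1", "u2"},
    "britney": {"u1", "u2", "u3", "u4"},
    "iraq war": {"u9"},
}
topics = cluster_queries(clicks)
print(topics)
```

Summing the Query Event Streams of all queries in one group would then give that topic's Topic Event Stream.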

Experimental Set

• We take the 3638 most frequent queries from MSN
– AOL: 3627 (> 99%)
– BLOG: 1975 (54%)
– NEWS: 1704 (47%)
– TV: 1602 (44%)
• Compare topic A in one set to topic A in another
– Limits spurious correlations

Correlations

• Do we even have a chance?

$$ r(d) = \frac{\sum_i \left(x(i) - \bar{x}\right)\left(y(i-d) - \bar{y}\right)}{\sqrt{\sum_i \left(x(i) - \bar{x}\right)^2}\,\sqrt{\sum_i \left(y(i-d) - \bar{y}\right)^2}} $$

[Figure: r plotted against delay d, with d = 0 marked]

• Equivalent to convolution
• Try for some delay range, d, find max value
– Negative/Positive correlations
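The delay scan described above (compute r for each delay d in some range and keep the maximum) can be sketched in plain Python. How the slides treat indices where y(i - d) falls outside the series is not stated; this sketch simply drops those terms, which is an assumption, as are the function names.

```python
from math import sqrt


def delayed_correlation(x, y, d):
    """r(d): correlate x(i) with y(i - d) over the indices where both exist."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    pairs = [
        (x[i] - mx, y[i - d] - my)
        for i in range(len(x))
        if 0 <= i - d < len(y)
    ]
    num = sum(a * b for a, b in pairs)
    den = sqrt(sum(a * a for a, _ in pairs)) * sqrt(sum(b * b for _, b in pairs))
    return num / den if den else 0.0


def best_delay(x, y, max_delay):
    """Scan d in [-max_delay, max_delay] and return (d, r) with the largest r."""
    candidates = (
        (d, delayed_correlation(x, y, d))
        for d in range(-max_delay, max_delay + 1)
    )
    return max(candidates, key=lambda item: item[1])


# y is the same spike as x, occurring two steps later.
x = [0, 0, 1, 3, 1, 0, 0, 0]
y = [0, 0, 0, 0, 1, 3, 1, 0]
d, r = best_delay(x, y, max_delay=3)
print(d, round(r, 6))
```

With the y(i - d) convention used in the formula, a spike that appears two steps later in y shows up as a peak at d = -2. To look for negative correlations as well, the scan could maximize the absolute value of r instead.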

Delays (high correlation)

[Figure: Distribution of Delays from MSN, Correlations >= .7. x-axis: Delay (-31 to 29); y-axis: Percent of Topics (0 to 0.14); series: AOL (906), BLOGS (965), BLOG-NEWS (478), NEWS (427), TV (305); annotation: 38% are at 0]

Max-correlation delay = 3 hours

[Figure: example curves over time]

Same correlations + delays, but very different shapes


How do we compare these?

Visual summary of differences?

Some Findings

• Randomly selected some topics and labeled them
– People, places, events, news, etc.

• So why do we search? Or blog? Or react to news?

1) News of the Weird

• Bloggers pick up on “weird” stories first

• igor vovkovinskiy

[Figure: Blog* and Search (MSN)* curves by day, x-axis 1 to 32]

• uss oriskany

[Figure: Blog and Search (MSN) curves by day of May, x-axis 1 to 32]

*Curves normalized to max value for readability
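The normalization mentioned in the footnote, scaling each curve to its own maximum so that streams with very different volumes can share one plot, is a one-line operation. The sketch below uses made-up daily counts; the function name is an assumption.

```python
def normalize_to_max(series):
    """Scale a non-empty, non-negative series so its largest value becomes 1.0."""
    peak = max(series)
    # An all-zero series has no peak to scale by; return it unchanged.
    return [v / peak for v in series] if peak else list(series)


# Hypothetical daily counts on very different scales.
blog_posts = [2, 4, 1]
msn_searches = [500, 250, 1000]
print(normalize_to_max(blog_posts))
print(normalize_to_max(msn_searches))
```

After scaling, only the shape and timing of each curve remain comparable; the relative volumes of the two streams are no longer visible.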

Blogs lead versus lag in the news

Blogs lead | Blogs lag
al gore movie | american eagle
bush border | an american haunting
creative | aol games
elliott yamin | australian miners
georgia marriage law | cedar point
halo 3 | countrywide
hanso foundation | duke case
igor vovkovinskiy | enron trial
keith richards | high school musical
lillian gertrud asplund | new orleans jazz fest
mary cheney | over the hedge

2) Anticipated Events

• Pressure to be new
– Bloggers don’t talk about anticipated events

• TV Shows

[Figure: Search (MSN) and Blog curves by day of May, x-axis 1 to 32]

3) Familiarity Breeds Contempt

• We get tired of certain kinds of news

• Takes a really big spike for us to get excited

• enron trial

[Figure: Search (MSN) and News curves by day of May, x-axis 1 to 32]

4) Correlation vs. Causation

• poseidon

[Figure: TV and Search (MSN) curves by day of May, x-axis 1 to 32]

• Both respond to movie release, but one to marketing and one to satire

• Need other, more specific, data streams to infer causation

Google: Predicting the Present

• http://www.google.org/flutrends/

• http://www.google.com/insights/search/

Summary

• The Web is increasingly Social
– Models of Information Diffusion and Expertise

• Mirror of Society
– People’s Interests Reflect Reality (and the future)

Reminder: SHORT Final Paper
– Due: Wednesday, May 6th
– Maximum length: 4 pages
• Use standard single space format, font no smaller than 10pt.
– Sample topics:
• Does web search advertising work? (Challenges/advantages over “traditional” advertising)
• (online) Social network formation
• Social vs. Traditional Media for News Reporting
• Contagion and spread of technology in online networks
• “Wisdom of crowds” on the web
• Privacy challenges in web search and social networks
• …