There is No Deadline - Time Evolution of Wikipedia Discussions
Introduction Peaks Growth measure Conclusions
There is No Deadline - Time Evolution of Wikipedia Discussions
Andreas Kaltenbrunner, David Laniado
Social Media Research Group, Barcelona Media, Barcelona, Spain
August 28th, 2012
WikiSym ’12, Linz, Austria
Andreas Kaltenbrunner & David Laniado Time Evolution of Wikipedia Discussions
Outline
1 Introduction: Motivation, Dataset
2 Peaks: Peak Detection Algorithm, Peak Statistics
3 Growth measure: Discussion Complexity, Growth in Complexity
4 Conclusions
Motivation
Wiki means quick in Hawaiian.
How to study the speed with which an article changes?
The first choice would be the number of edits per time unit.
But the larger an article becomes, the more of its generative process happens in talk pages.
⇒ Looking at the associated discussion is often the most effective way to understand the editing process.
Research questions
What is the relationship between discussion and edits?
How frequent are spikes of activity?
How fast do discussions grow, and for how long?
Which are the fastest discussions?
Dataset
Dump of March 12th, 2010
Co-evolution of comments and edits
[Figure: number of comments and edits per day, 2003 to 2010, with a zoom on the year 2007; separate axes for the number of edits and the number of comments]
6 comments per 100 edits
Example for a single article
Activity is less synchronised
[Figure: Peaks in the discussion and edit activity of the article "Barack Obama"; one panel per year, 2007 to 2010, showing #comments per day and #edits per day]
How to detect peaks?
Compare with median activity
[Figure: Peaks in the discussion and edit activity of the article "Barack Obama"; one panel per year, 2007 to 2010, showing #comments per day and #edits per day together with the median #comments and median #edits during ± 2 weeks]
Peak if $\text{activity} > c \cdot \max(m(t), n_{\min})$, adapted from [Lehmann 2012]
$m(t)$: 4-week median, $n_{\min}$: activity minimum, $c$: peak factor
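The rule above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `detect_peak_days`, the handling of the window at the edges of the series, and the choice to include day t itself in the median are assumptions. The default values c = 5 and n_min = 10 are the ones quoted on the Peak Statistics slide.

```python
from statistics import median


def detect_peak_days(activity, c=5.0, n_min=10.0, half_window=14):
    """Return the indices of days whose activity exceeds c * max(m(t), n_min).

    activity    -- list of daily counts (comments or edits), one entry per day
    half_window -- assumed: 14 days on each side, i.e. the "± 2 weeks" median
    """
    peak_days = []
    for t, value in enumerate(activity):
        # Assumption: the window includes day t itself and is simply
        # truncated at the start and end of the series.
        lo = max(0, t - half_window)
        hi = min(len(activity), t + half_window + 1)
        m_t = median(activity[lo:hi])
        if value > c * max(m_t, n_min):
            peak_days.append(t)
    return peak_days


# Made-up series: a quiet article with two bursts of different size.
daily_comments = [2] * 60
daily_comments[10] = 40   # not a peak: 40 <= 5 * max(2, 10) = 50
daily_comments[30] = 80   # a peak: 80 > 50
print(detect_peak_days(daily_comments))  # [30]
```

The floor n_min keeps rarely discussed articles from producing peaks out of a handful of comments, since the median of a quiet period is close to zero.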
[Figure: Peaks in the discussion and edit activity of the article "Barack Obama"; one panel per year, 2007 to 2010, showing #comments per day and #edits per day, their medians during ± 2 weeks, and the detected comment peaks and edit peaks]
Edit and comment peaks do not always coincide ...
and can be caused by endogenous or exogenous events
[Figure: Peaks in the discussion and edit activity of the article "Barack Obama"; one panel per year, 2007 to 2010, showing #comments per day and #edits per day, their medians during ± 2 weeks, and the detected comment peaks and edit peaks, with zooms on the annotated peaks]
Annotated peaks:
15-Feb-2007: endogenous peak
11 and 12-Mar-2007: endogenous peak
17-Mar-2008: endogenous peak
04-Jun-2008: nomination win
29-Aug-2008: official nomination
10-Oct-2008: endogenous peak
05-Nov-2008: presidential election
20-Jan-2009: presidential inauguration
09 and 10-Mar-2009: endogenous peak
09-Oct-2009: Nobel Prize win
Peak Statistics with $c = 5$, $n_{\min} = 10$: 2 580 discussion peaks and 32 853 edit peaks
[Figure, three panels:
number of peaks per article vs. number of articles: 1198 articles with comment peaks ($y \sim x^{-2.57}$), 20681 articles with edit peaks ($y \sim x^{-3.50}$)
peak length (days) vs. number of peaks: 2580 comment peaks ($y \sim x^{-3.52}$), 32853 edit peaks ($y \sim x^{-3.97}$)
time between consecutive peaks (in days) vs. number of time intervals: comment peaks ($y \sim x^{-1.41}$), edit peaks ($y \sim x^{-1.36}$)]
In the entire dataset
only 12% of all comment peaks coincide with an edit peak,
27% when allowing one day of difference,
33.8% when allowing two.
Peaks in the discussion activity do not have to lead to peaks in the editing activity as well.
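One way such coincidence percentages could be computed is sketched below. The function `coincidence_share` and the representation of peaks as integer day indices of a single article are assumptions made for illustration; the slides do not say how the matching was implemented.

```python
def coincidence_share(comment_peaks, edit_peaks, tolerance=0):
    """Share of comment peaks with an edit peak at most `tolerance` days away.

    comment_peaks, edit_peaks -- iterables of integer day indices (assumed
    representation) for the same article.
    """
    comment_peaks = list(comment_peaks)
    if not comment_peaks:
        return 0.0
    edit_days = set(edit_peaks)
    hits = 0
    for day in comment_peaks:
        # Check every day within the tolerance window around the comment peak.
        if any(day + offset in edit_days
               for offset in range(-tolerance, tolerance + 1)):
            hits += 1
    return hits / len(comment_peaks)


# Made-up peak days for one article.
comment_days = [10, 20, 30, 40]
edit_days = [10, 21, 33]
for tol in (0, 1, 2):
    print(tol, coincidence_share(comment_days, edit_days, tolerance=tol))
```

For a dataset-wide figure, the hits and the comment-peak counts would be summed over all articles before dividing.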
Top 10 articles with most comment and edit peaks

Title | #comment-peaks | #edit-peaks
Intelligent design | 15 | 2
September 11 attacks | 15 | 3
Race and intelligence | 14 | 5
British Isles | 11 | 0
Main page | 11 | 0
Anarchism | 10 | 12
Catholic church | 10 | 0
Canada | 10 | 0
Transnistria | 9 | 3
New Anti-Semitism | 9 | 0

Title | #edit-peaks | #comment-peaks
Uxbridge, Massachusetts | 19 | 0
Voodoo (D’Angelo album) | 17 | 0
List of World Wrestling Entertainment employees | 16 | 3
Super Smash Bros. Brawl | 16 | 2
Michael Jackson | 16 | 1
The Biggest Loser: Couples 2 | 16 | 0
Roger Federer | 15 | 0
Rafael Nadal | 15 | 0
List of Barney & Friends episodes and videos | 15 | 0
Total Drama Action | 15 | 0
How to measure the complexity of a Discussion?
Discussion tree for article “Presidency of Barack Obama”
[Figure: discussion tree. Legend: red → root (the article), blue → structural nodes, green → anonymous comments, grey → registered comments]
Using the h-index of a discussion introduced in [Gómez 2008]
The h-index ...
is a balanced depth measure.
is the maximal number h such that there are at least h comments at level (depth) h, but not h + 1 comments at level h + 1.
In other words, there are h sub-threads of depth at least h.
[Figure: example discussion with h-index = 3]
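The definition above translates directly into code once the depth of every comment is known. The sketch below is illustrative only: the function name `discussion_h_index` is made up, and it assumes that top-level comments have depth 1 and that structural nodes have already been excluded from the depth list.

```python
from collections import Counter


def discussion_h_index(depths):
    """Largest h such that at least h comments sit at depth h (0 if none).

    depths -- one integer per comment; assumed convention: a direct reply
    to the article has depth 1, a reply to that comment depth 2, and so on.
    """
    comments_per_level = Counter(depths)
    h = 0
    for level, count in comments_per_level.items():
        # Because h is the maximal such level, level h + 1 necessarily
        # holds fewer than h + 1 comments.
        if level >= 1 and count >= level:
            h = max(h, level)
    return h


# Made-up discussion: three comments on each of levels 1 to 3, one on level 4.
example_depths = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4]
print(discussion_h_index(example_depths))  # 3
```

A long but narrow chain of replies therefore scores low, as does a wide but shallow one; only discussions that are both deep and broad reach a high h-index.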
Example for growth of discussions
[Figure, two panels: number of comments over time, Jan-2002 to Jan-2010, for the discussions of George W. Bush, Barack Obama and Bill Clinton; one panel also shows the smoothed trend for each article]
Can we use the h-index to measure this growth?
Example for growth rate ∆h
[Figure: h-index over time, Jan-2003 to Jan-2009, for the discussions of George W. Bush, Barack Obama and Bill Clinton]
George W. Bush: ∆h = 70.7 days
Barack Obama: ∆h = 90.2 days
Bill Clinton: ∆h = 331.9 days
We define the growth rate ∆h as the average time a discussion needs to increase its h-index by one:
$$\Delta h = \frac{t_h - t_1}{h - 1}$$
related to the inverse of the m-index proposed in [Hirsch 2005]
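As a rough sketch, the growth rate can be computed from the dates on which a discussion reaches each h-index value. The slide does not define its symbols, so the reading used here is an interpretation: t_k is taken to be the date on which the discussion first reaches h-index k, and h the highest value reached. The function `growth_rate` and the example dates are hypothetical.

```python
from datetime import date


def growth_rate(reach_dates):
    """Average number of days needed to raise the h-index by one.

    reach_dates -- list of dates; entry k - 1 is the (assumed) date t_k on
    which the discussion first reached h-index k, so len(reach_dates) == h.
    """
    h = len(reach_dates)
    if h < 2:
        raise ValueError("growth rate needs an h-index of at least 2")
    elapsed_days = (reach_dates[-1] - reach_dates[0]).days
    return elapsed_days / (h - 1)


# Made-up discussion that reaches h-index 1, 2 and 3 on these dates.
reached = [date(2009, 1, 1), date(2009, 1, 11), date(2009, 1, 31)]
print(growth_rate(reached))  # 15.0
```

A small ∆h indicates a discussion that becomes complex quickly; a large one indicates a discussion that deepens slowly over a long period.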
Distribution of growth rates ∆h of all discussions with more than 1000 comments
[Figure: histogram of ∆h; x-axis: ∆h in days, from 1 to 1000; y-axis: # discussions]
Different growth rates
The rates at which discussions increase in complexity span several orders of magnitude.
The 15 fastest and slowest discussions (∆h and duration in days)

Title | ∆h | start date | end date | duration | final h-index
Virginia Tech massacre | 0.5 | 15-Apr-2007 | 20-Apr-2007 | 5 | 9
2009 flu pandemic | 0.9 | 25-Apr-2009 | 30-Apr-2009 | 5 | 7
Bronze Soldier of Tallinn | 0.9 | 26-Apr-2007 | 02-May-2007 | 6 | 7
2009 Honduran constitutional crisis | 1.0 | 27-Jun-2009 | 05-Jul-2009 | 8 | 8
Seung-Hui Cho | 1.0 | 16-Apr-2007 | 24-Apr-2007 | 8 | 8
2008 Mumbai attacks | 1.0 | 26-Nov-2008 | 01-Dec-2008 | 5 | 6
Israeli-occupied territories | 1.2 | 22-Sep-2005 | 03-Oct-2005 | 11 | 10
International status of Abkhazia and South | 1.3 | 25-Aug-2008 | 04-Sep-2008 | 10 | 8
Air France Flight 447 | 1.4 | 01-Jun-2009 | 08-Jun-2009 | 7 | 6
7 July 2005 London bombings | 1.7 | 10-Jul-2005 | 15-Jul-2005 | 5 | 5
State terrorism and the United States | 1.7 | 15-Feb-2008 | 06-Mar-2008 | 20 | 13
July 2009 Ürümqi riots | 1.9 | 06-Jul-2009 | 21-Jul-2009 | 15 | 9
Henry Louis Gates arrest controversy | 2.0 | 24-Jul-2009 | 09-Aug-2009 | 16 | 9
Teach the Controversy | 2.6 | 11-Apr-2005 | 29-Apr-2005 | 18 | 8

Title | ∆h | start date | end date | duration | final h-index
Shakespeare authorship question | 485.6 | 02-Jun-2003 | 24-Jan-2010 | 2428 | 7
Karl Marx | 487.6 | 19-Sep-2004 | 21-Jan-2010 | 1950 | 6
Led Zeppelin | 511.9 | 31-Jan-2003 | 03-Feb-2010 | 2560 | 6
Vampire | 517.1 | 19-Nov-2002 | 18-Jul-2008 | 2068 | 6
World War II casualties | 523.0 | 13-Sep-2004 | 29-Dec-2008 | 1568 | 4
War on Terrorism | 533.1 | 07-Oct-2005 | 22-Feb-2010 | 1599 | 6
Fathers’ rights movement | 546.0 | 07-Mar-2004 | 01-Sep-2008 | 1639 | 4
Instant-runoff voting | 546.3 | 09-Jul-2003 | 03-Jan-2008 | 1639 | 5
Scientific method | 553.5 | 15-Jun-2003 | 08-Jul-2009 | 2215 | 6
France | 566.5 | 13-Nov-2003 | 26-Jan-2010 | 2266 | 6
Harry Potter | 589.9 | 27-Nov-2002 | 02-Oct-2007 | 1770 | 6
Anna Anderson | 604.5 | 17-Mar-2004 | 09-Jul-2007 | 1209 | 3
New York City | 617.3 | 09-Dec-2003 | 03-Jan-2009 | 1852 | 5
Pi | 627.3 | 07-Dec-2002 | 20-Oct-2009 | 2509 | 6
Christopher Columbus | 1159.0 | 24-Oct-2003 | 27-Feb-2010 | 2318 | 5
Conclusions and future work
Conclusions
Discussion and edit peaks occur mostly independently of each other.
Both endogenous (Wikipedia internal) and exogenous (offline world) events can be the cause of such peaks.
We have introduced a simple growth measure.
Some discussions need only a few days to evolve, while the slowest go on over years.
Future work
Use metrics for early detection of controversies.
Apply metrics on sub-threads to detect hot spots.
Assess discussion maturity.
Questions?
Bibliography
[Gómez 2008] Vicenç Gómez, Andreas Kaltenbrunner & Vicente López. Statistical analysis of the social network and discussion threads in Slashdot. In WWW ’08: Proceeding of the 17th international conference on World Wide Web, pages 645–654, New York, NY, USA, 2008. ACM.
[Hirsch 2005] J. E. Hirsch. An index to quantify an individual’s scientific research output. PNAS, vol. 102, no. 46, pages 16569–16572, 2005.
[Lehmann 2012] J. Lehmann, B. Gonçalves, J.J. Ramasco & C. Cattuto. Dynamical Classes of Collective Attention in Twitter. In Proc. of WWW, 2012.