Balancing Diversity to Counter-measure Geographical Centralization in Microblogging Platforms

Balancing Diversity to Counter-measure Geographical Centralization in Microblogging Platforms Eduardo Graells-Garrido Web Research Group Universitat Pompeu Fabra Barcelona, Spain Mounia Lalmas Yahoo Labs London, UK Hypertext Sept. 4, 2014 Santiago, Chile


Presented at Hypertext 2014.


Page 1

Balancing Diversity to Counter-measure Geographical Centralization in Microblogging Platforms

Eduardo Graells-Garrido, Web Research Group, Universitat Pompeu Fabra, Barcelona, Spain

Mounia Lalmas, Yahoo Labs, London, UK

Hypertext, Sept. 4, 2014

Santiago, Chile

Page 2

Motivation: Geographical Centralization

Every person behaves in a biased way (homophily, selective exposure, etc.) in both physical and virtual worlds.

Does the same happen with systematic biases?

Chile is a centralized country: public policy, population migration and media are biased towards its capital. This bias increases the population imbalance, and the imbalance in turn reinforces the bias!

Page 3

Some Effects of Geographical Centralization

This affects Web users as content is not geographically diverse (mostly related to/from Santiago). Content from other locations is hidden and hard to find.

(I was at WWW when I searched for this. “Everywhere” displays relevant tweets from Santiago only.)

Page 4

Problem Statement

Detect and Measure Geographical Centralization: Is centralization reflected on micro-blogging platforms?

Tweet Classification into Locations: How to find tweets from other locations in imbalanced contexts?

[Rout et al, HT 2013] studied geolocation in imbalanced populations from a network perspective. We follow a similar approach from a content perspective.

Information Filtering, Geo. Diverse Timeline: How to build a geographically diverse timeline?

We build upon prior work on information diversity filtering: [De Choudhury et al, HT 2011] and [Munson et al, ICWSM 2009].

Page 5

Case Study: Chile, Municipal Elections 2012

Is Geographical Centralization Reflected on Twitter?

Page 6

Frequent Terms

Page 7

Dataset: #municipales2012

Locally Important: denser network discussions; local vocabulary (classification)

National Level: interactions between locations

Query Keywords: hashtags, tenses of to-vote, candidate names, political institutions, locations

Using self-reported location, 27.95% of users are geolocated at the regional level. They published 42.15% of the tweets in the dataset.

Ideal characteristics, but there is a need to classify tweets.

Page 8

Physical and Virtual Population Distributions

We consider the sample geographically representative.

r = 0.95, p < 0.01 (Source: Census 2012*)

r = 0.68, p < 0.01 (Source: CASEN Survey)

Imbalanced Population (Different Orders of Magnitude)

Balanced Representation (Equal Orders of Magnitude)

Page 9

Is the Chilean Virtual Population on Twitter centralized towards the capital (Metropolitan Region)?

Page 10

Interactions Between Locations

Adjacency Matrix of 1-way interactions. [Quercia et al, 2012]

M(i,j) = mentions(Li, Lj) + retweets(Li, Lj)

Each arc in the visualization represents an entry M(i,j). Li is on the left, Lj on the right.

Green edges indicate i = j. Brown edges indicate j = Santiago (RM). The rest is gray.
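The definition of M and the edge colouring can be sketched as follows. This is only an illustration under assumptions: the input format (a list of `(kind, source, target)` tuples), the function names and the region labels `RM`, `V` and `VIII` are hypothetical, and treating an RM-to-RM arc as green (i = j is checked first) is a guess, since the slide does not say which rule wins.

```python
from collections import defaultdict


def interaction_matrix(interactions):
    """Build M(i,j) = mentions(Li, Lj) + retweets(Li, Lj).

    `interactions` is an iterable of (kind, source_location, target_location)
    tuples. This input format is hypothetical; the slides only define M.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for kind, source, target in interactions:
        if kind in ("mention", "retweet"):
            matrix[source][target] += 1
    return matrix


def edge_color(source, target, capital="RM"):
    """Colour rule from the slide; i = j is checked first (assumption)."""
    if source == target:
        return "green"
    if target == capital:
        return "brown"
    return "gray"


# Toy data, not taken from the dataset.
sample = [
    ("mention", "V", "RM"),
    ("retweet", "V", "RM"),
    ("mention", "RM", "RM"),
    ("retweet", "VIII", "V"),
]
M = interaction_matrix(sample)
print(M["V"]["RM"])  # 2
```

Interactions are counted in one direction only, so M is generally not symmetric.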

Page 11

Geographical Centralization

We explain the extreme differences between observations and expectations as geographical centralization towards Santiago (Metropolitan Region).

Observed Centrality: estimated from a graph based on M.

Expected Centrality: estimated from a graph with edge weights based on location populations.
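One way to picture the observed-versus-expected comparison is the sketch below. The slides do not name the centrality measure or the exact population-based weighting, so both are assumptions here: centrality is taken as each location's share of incoming interaction weight, and the expected weight of an arc is taken as the product of the two populations. All numbers are toy values, not census or dataset figures.

```python
def in_strength_share(weights, locations):
    """Share of total incoming weight per location, ignoring self-loops.

    This stands in for the (unspecified) centrality measure of the slides.
    """
    incoming = {loc: 0.0 for loc in locations}
    for (source, target), weight in weights.items():
        if source != target:
            incoming[target] += weight
    total = sum(incoming.values())
    return {loc: (value / total if total else 0.0)
            for loc, value in incoming.items()}


def expected_weights(population):
    """Null model (assumption): weight(i -> j) proportional to pop_i * pop_j."""
    return {
        (i, j): population[i] * population[j]
        for i in population
        for j in population
        if i != j
    }


# Toy populations and interaction counts, for illustration only.
population = {"RM": 70, "V": 18, "VIII": 20}
observed_m = {
    ("V", "RM"): 90, ("VIII", "RM"): 80,
    ("RM", "V"): 10, ("RM", "VIII"): 10,
    ("V", "VIII"): 5, ("VIII", "V"): 5,
}

observed = in_strength_share(observed_m, population)
expected = in_strength_share(expected_weights(population), population)
for loc in population:
    print(loc, round(observed[loc], 2), round(expected[loc], 2))
```

In this toy example the capital receives a larger share of interactions than its population alone would suggest, which is the kind of gap the slide reads as centralization.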

Page 12

How to make timelines more Geographically Diverse?

Shannon Entropy with respect to geography
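A minimal sketch of Shannon entropy over the locations of the tweets in a timeline: H = -sum over locations of p(l) * log p(l), where p(l) is the fraction of tweets assigned to location l. The base-2 logarithm and the function name are assumptions; the slide only names the measure.

```python
import math
from collections import Counter


def geographic_entropy(locations):
    """Shannon entropy (in bits) of the location labels of a timeline."""
    counts = Counter(locations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A timeline spread evenly over four locations reaches the maximum, log2(4).
print(geographic_entropy(["RM", "V", "VIII", "XV"]))  # 2.0
```

A timeline whose tweets all come from one location has entropy zero, so higher values mean a more geographically diverse timeline.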

Page 13

First: Classifying Tweets into Locations with Diversity

We built a corpus of location documents. For classification we consider a tweet as a vector of cosine similarities with each location document, weighted using TF-IDF. We evaluate with 10-fold cross-validation.

Similarity features provide more geographical diversity (lost because of population imbalance) and are overall more accurate than bag-of-words approaches.

Similarity Features

BOW Features
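The similarity features can be sketched as below: each tweet becomes a vector with one cosine similarity per location document, computed over TF-IDF weights. The whitespace tokenizer, the smoothed IDF formula, the function names and the two toy location documents are assumptions for illustration; the classifier that consumes these features and the cross-validation step are left out.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive whitespace tokenizer (assumption; the slides give no details)."""
    return text.lower().split()


def build_idf(tokenized_docs):
    """Smoothed inverse document frequency over the location documents."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs.values():
        doc_freq.update(set(tokens))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus are dropped."""
    return {t: f * idf[t] for t, f in Counter(tokens).items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_features(tweet, location_docs):
    """One cosine similarity per location document: the tweet's feature vector."""
    tokenized = {loc: tokenize(text) for loc, text in location_docs.items()}
    idf = build_idf(tokenized)
    tweet_vec = tfidf(tokenize(tweet), idf)
    return {loc: cosine(tweet_vec, tfidf(tokens, idf))
            for loc, tokens in tokenized.items()}


# Toy location documents, not taken from the corpus.
docs = {
    "RM": "santiago metro providencia alcalde",
    "V": "valparaiso puerto cerro alcalde",
}
features = similarity_features("voto en el puerto de valparaiso", docs)
print(max(features, key=features.get))  # V
```

The feature vector has one dimension per location regardless of vocabulary size, unlike a bag-of-words representation, which has one dimension per term.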

Page 14

Second: Filtering Tweets to build a Geo. Diverse Timeline

We iteratively add tweets to a timeline T. Each added tweet maximizes T’s information entropy [De Choudhury et al, HT 2011], but we enforce geographical diversity of those additions [Munson et al, ICWSM 2009].
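The iterative filtering step can be sketched as a greedy loop. This is an illustration under assumptions, not the method of the cited papers: information entropy is approximated by the entropy of the timeline's term counts, and geographical diversity is enforced by only considering tweets from the locations least represented in the timeline so far. The `(text, location)` input format and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a Counter of term frequencies."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def build_timeline(tweets, size):
    """Greedy selection of `size` tweets from (text, location) pairs."""
    remaining = list(tweets)
    timeline = []
    term_counts = Counter()
    location_counts = Counter()
    while remaining and len(timeline) < size:
        # Geographic constraint (assumption): restrict candidates to the
        # locations with the fewest tweets in the timeline so far.
        fewest = min(location_counts[loc] for _, loc in remaining)
        candidates = [t for t in remaining if location_counts[t[1]] == fewest]
        # Greedy step: pick the candidate that maximizes the term entropy
        # of the timeline after adding it.
        best = max(
            candidates,
            key=lambda t: entropy(term_counts + Counter(t[0].lower().split())),
        )
        timeline.append(best)
        remaining.remove(best)
        term_counts.update(best[0].lower().split())
        location_counts[best[1]] += 1
    return timeline


# Toy tweets: three from RM and one from V.
sample = [("a b", "RM"), ("c d", "RM"), ("e f", "RM"), ("a g", "V")]
timeline = build_timeline(sample, 2)
print([loc for _, loc in timeline])  # ['RM', 'V']
```

Without the geographic constraint the second pick in this toy example would be another RM tweet, because "c d" adds more new terms than "a g"; the constraint trades a little term entropy for a second location.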

Page 15

Empirical Observations

Election results start to appear!

Unexpected results in some location! Discussion becomes a bit more global. In all cases, geographical diversity exists.

The Proposed Method (PM) is more geographically diverse than the baselines:

DIV: [De Choudhury et al, HT 2011]

POP: top-k popular tweets

In terms of social voting, PM has more representation of popular tweets than DIV.

Page 16

Overview of Results

Is centralization reflected on micro-blogging platforms? Yes! As with other behavioral biases (homophily, selective exposure), the systematic bias of geographical centralization is also present and is measurable.

How to find tweets from other locations? Consider imbalance-aware features, such as content similarity metrics. This improves diversity of classifications without losing accuracy.

How to build a geographically diverse timeline? A suitable mixture of known techniques can have the desired effects without trade-offs: we gained representation of popular tweets and did not lose information diversity. In contrast to sensitive contexts where selective exposure is crucial, geographical diversity is less likely to generate cognitive dissonance.

Page 17

Future Work

User Evaluation: is geographical diversity interesting?

Visualization and User Interfaces: is geographical diversity engaging?

Page 18

Questions?

Thanks for attending!

Contact: @carnby

http://carnby.github.io

Special Thanks: Dany Passarinho, Bárbara Poblete, Diego Sáez-Trumper and Anonymous Reviewers

This work was partially funded by Grant TIN2012-38741 (Understanding Social Media: An Integrated Data Mining Approach) of the Ministry of Economy and Competitiveness of Spain.

https://www.flickr.com/photos/malikaladak/8868491759

https://www.flickr.com/photos/28047774@N04/6312764345

https://www.flickr.com/photos/iron_horses/6274365371

https://www.flickr.com/photos/efimeravulgata/1429969601

Page 19

Additional Data :)
