Conducting Twitter Research

Conducting Twitter Research ASIST Webinar 12/2013 Kim Holmberg, PhD Statistical Cybermetrics Research Group University of Wolverhampton, UK (e) [email protected] (w3) http://kimholmberg.fi

description

An #ASIST webinar about conducting Twitter research; data collection, filtering, analysis and visualization

Transcript of Conducting Twitter Research

Page 1: Conducting Twitter Research

Conducting Twitter Research

ASIST Webinar 12/2013

Kim Holmberg, PhD Statistical Cybermetrics Research Group

University of Wolverhampton, UK

(e) [email protected]

(w3) http://kimholmberg.fi

Page 2: Conducting Twitter Research

Cascades, Islands, or Streams? Time, Topic, and Scholarly Activities in Humanities and Social Science Research

Indiana University, Bloomington, USA University of Wolverhampton, UK Université de Montréal, Canada

Page 3: Conducting Twitter Research

Cascades, Islands, or Streams?

• Integrate several datasets representing a broad range of scholarly activities
• Use methodological and data triangulation to explore the lifecycle of topics within and across a range of scholarly activities
• Develop transparent tools and techniques to enable future predictive analyses

Page 4: Conducting Twitter Research

I’m preparing slides for an #ASIST #webinar

Page 5: Conducting Twitter Research

DATA COLLECTION

Webometric Analyst, for data collection via Twitter’s API, data cleaning and analysis: http://lexiurl.wlv.ac.uk/

For detailed instructions visit http://lexiurl.wlv.ac.uk/searcher/twitter.htm

Page 6: Conducting Twitter Research

DATA COLLECTION

Other data collection tools

Twitter Archiving Google Spreadsheet (TAGS) http://mashe.hawksey.info/2013/02/twitter-archive-tagsv5/

HootSuite http://hootsuite.com/

Or you can write your own script:
https://dev.twitter.com/
http://140dev.com/free-twitter-api-source-code-library/twitter-database-server/

Page 7: Conducting Twitter Research

DATA COLLECTION

Tweet, Retweet or RT @username, #Hashtag, Tweeters

Content, trends

Networks, communities

Influence, popularity

Information dissemination

Time series, sentiment

Page 8: Conducting Twitter Research

DATA EXTRACTION

Use Webometric Analyst to sort the data and, depending on your research goals, to extract URLs, hashtags or usernames, or to remove stopwords from the tweets.
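For those who write their own script instead, the same extraction can be sketched in a few lines of Python. This is only an illustration, not how Webometric Analyst works internally: the regular expressions are simplified, and the stopword list is a tiny made-up example.

```python
import re

# Simplified, illustrative patterns; real tweets contain edge cases
# (e.g. e-mail addresses matching the mention pattern) that these ignore.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")

# A tiny, hypothetical stopword list; replace it with one suited to your data.
STOPWORDS = {"a", "an", "and", "for", "i", "i'm", "in", "of", "the", "to"}


def extract_entities(tweet):
    """Return the URLs, hashtags and mentioned usernames found in one tweet."""
    return {
        "urls": URL_RE.findall(tweet),
        "hashtags": [h.lower() for h in HASHTAG_RE.findall(tweet)],
        "mentions": [m.lower() for m in MENTION_RE.findall(tweet)],
    }


def content_words(tweet):
    """Lowercase the tweet and drop URLs, mentions and stopwords."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOPWORDS]


example = "I'm preparing slides for an #ASIST #webinar http://kimholmberg.fi @kholmber"
entities = extract_entities(example)
words = content_words(example)
```

Hashtags are kept as ordinary words in `content_words` (without the `#`); whether that is appropriate depends on the research question.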

Page 9: Conducting Twitter Research

ETHICS

Data collected from social media sites is openly available on the web; it is therefore already public and, in itself, does not raise ethical concerns (Wilkinson & Thelwall, 2011). However, in some cases the content of the tweets, blog entries or comments collected may contain identifiable, sensitive information. Although this information is already public, publicizing it further by discussing it in an academic article could have unwanted side effects. Hence, one should consider anonymising all data and treating it confidentially.

Wilkinson, D. & Thelwall, M. (2011). Researching personal information on the public Web: Methods and ethics, Social Science Computer Review, vol. 29, no. 4, pp. 387-401.
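One possible way to handle usernames is to replace them with stable pseudonyms before analysis. The sketch below is an assumption about how this could be done, not a method taken from the cited paper; `pseudonym` and `pseudonymise_tweet` are hypothetical helper names, and the secret key must be kept private.

```python
import hashlib
import hmac
import re

MENTION_RE = re.compile(r"@(\w+)")


def pseudonym(username, secret_key):
    """Map a username to a stable label of the form 'user_' + 8 hex digits."""
    digest = hmac.new(secret_key, username.lower().encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:8]


def pseudonymise_tweet(text, secret_key):
    """Replace every @username in a tweet with its pseudonym."""
    return MENTION_RE.sub(lambda m: "@" + pseudonym(m.group(1), secret_key), text)


key = b"replace-with-a-private-key"  # hypothetical key, for illustration only
masked = pseudonymise_tweet("RT @Username1: hello", key)
```

Note that this is pseudonymisation rather than full anonymisation: a tweet quoted verbatim can often still be traced to its author with a web search, so paraphrasing or aggregating may also be needed.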

Page 10: Conducting Twitter Research

What can we research?

1. Networks (users, words, topics, …)
2. Content (tweets, RTs, hashtags, …)

Page 11: Conducting Twitter Research

FIRST STEPS

Step 1. What do you want to research?
Step 2. Collect tweets that are relevant for your research questions
Step 3. Sort and clean the tweets (e.g. tweets vs. retweets, remove tweets in other languages, remove spam, remove false positives, ...)
Step 4. Extract the data that you need (e.g. tweeters, usernames mentioned, hashtags, URLs, ...)
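Part of Step 3 can be sketched in Python, assuming the tweets are available as plain strings. The sketch separates retweets from original tweets and drops exact duplicates and tweets containing user-supplied spam terms. Language filtering and false positives are left out because they usually need manual checking or an external tool. The function names and sample tweets are made up for illustration.

```python
import re

# Treat a tweet starting with "RT @username" as a retweet (an assumption
# that matches the manual retweet convention, not every retweet).
RETWEET_RE = re.compile(r"^\s*RT @\w+", re.IGNORECASE)


def is_retweet(text):
    return bool(RETWEET_RE.match(text))


def clean_tweets(tweets, spam_terms=()):
    """Split tweets into (originals, retweets), skipping duplicates and spam."""
    seen = set()
    originals, retweets = [], []
    for text in tweets:
        # Normalise case and whitespace so trivial variants count as duplicates.
        key = " ".join(text.lower().split())
        if key in seen:
            continue
        seen.add(key)
        if any(term in key for term in spam_terms):
            continue
        (retweets if is_retweet(text) else originals).append(text)
    return originals, retweets


sample = [
    "Slides for #ASIST",
    "RT @username1: Slides for #ASIST",
    "slides for  #asist",
    "Win a free phone #ASIST",
]
originals, retweets = clean_tweets(sample, spam_terms=("free phone",))
```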


Page 12: Conducting Twitter Research

1 NETWORK ANALYSIS

Possible research questions:
• How are the different communities related to A connected to each other?
• Who is most central/influential (has most connections) in a certain network of tweeters?
• How is information disseminated in the network?
• Who are the actors involved in a certain network?
• What kind of local communities are there in a certain network and what do those communities represent?
and many more...

Page 13: Conducting Twitter Research

TWITTER NETWORK DATA

1,248 TWEETS, 111 FOLLOWING, 290 FOLLOWERS

Page 14: Conducting Twitter Research

CREATE THE NETWORK

ALTERNATIVE 1

This creates a network file (.net) based on the connections between tweeters and those they mention (@username) in their tweets.

Detailed instructions on how to create and analyze conversational networks on Twitter are available at: http://lexiurl.wlv.ac.uk/searcher/twitterConversationNetworks.html

Page 15: Conducting Twitter Research

CREATE THE NETWORK

ALTERNATIVE 2

Sort the data, then convert the data into a network file:

Source      Target
Username1   Username2
Username1   Username3
Username2   Username3
Username3   Username1
Username3   Username2
Username3   Username4
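As a rough sketch of the conversion step, the Source/Target pairs above can be written out in Pajek's .net format with a short Python function. The function name `edges_to_pajek` is made up for this example, and counting repeated pairs as arc weights is an assumption about how duplicates should be handled.

```python
def edges_to_pajek(edges):
    """Return Pajek .net text for a list of (source, target) pairs."""
    # Give every username a numeric vertex id in order of first appearance.
    ids = {}
    for source, target in edges:
        for name in (source, target):
            ids.setdefault(name, len(ids) + 1)

    # Count repeated pairs so that the arc weight reflects how often they occur.
    weights = {}
    for source, target in edges:
        pair = (ids[source], ids[target])
        weights[pair] = weights.get(pair, 0) + 1

    lines = ["*Vertices %d" % len(ids)]
    lines += ['%d "%s"' % (i, name) for name, i in ids.items()]
    lines.append("*Arcs")
    lines += ["%d %d %d" % (s, t, w) for (s, t), w in weights.items()]
    return "\n".join(lines) + "\n"


edges = [
    ("Username1", "Username2"),
    ("Username1", "Username3"),
    ("Username2", "Username3"),
    ("Username3", "Username1"),
    ("Username3", "Username2"),
    ("Username3", "Username4"),
]
net_text = edges_to_pajek(edges)
```

Saving `net_text` to a file with a `.net` extension gives a directed network that can be opened in Pajek or Gephi.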

Page 16: Conducting Twitter Research

OBJECTS OF ANALYSIS

1. An actor's (person, group, organisation, word, etc.) position in the network

2. Structure of the network (in relation to other networks) or subnetworks (clusters)

Page 17: Conducting Twitter Research

AN ACTOR'S POSITION

Degree centrality
Used to locate actors with influence in the network or those that are in a position where they can spread information in the network. Can be divided into in- and outdegree.
How many other actors can this actor reach directly?

Other often used centrality measures: closeness, betweenness, eigenvector
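In- and outdegree can be counted directly from a list of (source, target) pairs. The following minimal Python sketch counts distinct neighbours and ignores self-mentions; the usernames are placeholders. Dividing a count by the number of other actors (n - 1) would give the normalised degree centrality reported by most network tools.

```python
from collections import defaultdict


def in_out_degree(edges):
    """Return {actor: (indegree, outdegree)}, counting distinct neighbours."""
    targets = defaultdict(set)  # actor -> actors it points to
    sources = defaultdict(set)  # actor -> actors pointing to it
    for source, target in edges:
        if source == target:
            continue  # skip self-mentions
        targets[source].add(target)
        sources[target].add(source)
    actors = sorted(set(targets) | set(sources))
    return {a: (len(sources[a]), len(targets[a])) for a in actors}


edges = [
    ("Username1", "Username2"),
    ("Username1", "Username3"),
    ("Username2", "Username3"),
    ("Username3", "Username1"),
    ("Username3", "Username2"),
    ("Username3", "Username4"),
]
degrees = in_out_degree(edges)
```

In this toy network Username3 has the highest outdegree (it mentions three others), while Username4 is only ever mentioned.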

Page 18: Conducting Twitter Research

NETWORK STRUCTURE

Communities in the network
Tells something about the structure of the network and how the different actors are spread and connected to each other in the network

Page 19: Conducting Twitter Research

NETWORK ANALYSIS

- tools of the trade

Gephi (for network visualizations) http://gephi.org/

Ucinet (for network analysis and visualization) https://sites.google.com/site/ucinetsoftware/

Pajek (for network analysis and visualization) http://pajek.imfm.si/doku.php

Page 20: Conducting Twitter Research

Analyzing astrophysicists’ conversational connections on Twitter Holmberg, Haustein, Bowman & Peters (work in progress)

Communities detected based on the conversational connections in astrophysicists’ tweets

Page 21: Conducting Twitter Research

[Figure: stacked bar chart, 0 % to 100 %, one bar per community]

Communities: Mod0 (n=68), Mod1 (n=88), Mod2 (n=40), Mod3 (n=180), Mod4 (n=30), Mod5 (n=109), Mod6 (n=3)

Roles: Unknown, Other, Amateur astronomer, Teacher or educator, Corporative, Organization or association, Science communicator, Students, Other researchers, Other astrophysicists, Researcher

Percentage of people with different roles in the 7 communities

Analyzing astrophysicists’ conversational connections on Twitter Holmberg, Haustein, Bowman & Peters (work in progress)

Page 22: Conducting Twitter Research

Three groups coded based on their stance on climate change:
• Convinced
• Skeptic
• Neutral

Climate change on Twitter: topics, communities and conversations about the IPCC Pearce, Holmberg, Hellsten & Nerlich (under review).

Page 23: Conducting Twitter Research

1 NETWORK ANALYSIS

Summary

Step 4. Extract the data that you need (e.g. tweeters and the usernames they mentioned, following or followers lists, ...)
Step 5. Convert your data into a network file
Step 6. Visualize the network and analyse it

In addition you may want to run some social network analysis on the network (e.g. centrality) or code the actors according to suitable categories (e.g. work roles, opinion about something, etc.)
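Step 4 for a conversational network amounts to turning each tweet into (tweeter, mentioned username) pairs. Below is a minimal Python sketch, assuming the tweets are available as (tweeter, text) tuples. The function name and the sample data are hypothetical.

```python
import re
from collections import Counter

MENTION_RE = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """tweets: iterable of (tweeter, text). Returns Counter {(source, target): n}."""
    edges = Counter()
    for tweeter, text in tweets:
        source = tweeter.lower()
        # Count each mentioned user once per tweet and skip self-mentions.
        for target in {m.lower() for m in MENTION_RE.findall(text)}:
            if target != source:
                edges[(source, target)] += 1
    return edges


tweets = [
    ("Username1", "@Username2 nice slides"),
    ("Username1", "RT @Username3: hello @Username2"),
    ("Username3", "thanks @username1 @USERNAME1"),
]
edges = mention_edges(tweets)
```

Note that "RT @username" is counted as a mention here; if retweets should not count as conversational connections, filter them out before building the edges.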

Page 24: Conducting Twitter Research

2 CONTENT ANALYSIS

Possible research questions:
• How is topic A discussed on Twitter?
• How do certain activities on Twitter correlate with offline activities?
• How popular is A compared with B, based on visibility on Twitter?
• What is the public opinion (of tweeters) about A?
• What are tweeters saying about A?
and many more...

Page 25: Conducting Twitter Research

15,672

Page 26: Conducting Twitter Research
Page 27: Conducting Twitter Research

Quantitative Qualitative

Page 28: Conducting Twitter Research

CONTENT ANALYSIS

- manual coding

Positive-Neutral-Negative
Scientific-Not scientific-Not clear
Skeptic-Convinced-Neutral
Personal-Work related
Astrophysics-Biochemistry-Cheminformatics ...
Pro something-Against something
and many more depending on your research goals...
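Once tweets have been coded manually, the codes are usually summarised as percentages per group (for example per discipline or per community). A small Python sketch of that tally follows; the coded sample is invented purely for illustration and does not come from any of the studies cited in these slides.

```python
from collections import Counter, defaultdict


def code_shares(coded):
    """coded: iterable of (group, code). Returns {group: {code: percent}}."""
    counts = defaultdict(Counter)
    for group, code in coded:
        counts[group][code] += 1
    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        shares[group] = {code: round(100.0 * n / total, 1) for code, n in counter.items()}
    return shares


# Hypothetical manual codes, one (group, code) pair per tweet.
coded = [
    ("Astrophysics", "Scientific"),
    ("Astrophysics", "Not scientific"),
    ("Astrophysics", "Not scientific"),
    ("Astrophysics", "Not clear"),
    ("Economics", "Scientific"),
    ("Economics", "Not scientific"),
]
shares = code_shares(coded)
```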

Page 29: Conducting Twitter Research

Scientific content of the tweets by communication type

[Figure: bar chart, 0% to 40%]

Disciplines: Astrophysics, Biochemistry, Digital humanities, Economics, History of science

Communication types: Other, Links, Conversations, Retweets

Holmberg, K. & Thelwall, M. (2013). Disciplinary differences in Twitter scholarly communication. In the Proceedings of 14th International Society for Scientometrics and Informetrics conference, 2013, Vienna, Austria. Available at: http://issi2013.org/proceedings.html.

Page 30: Conducting Twitter Research

CONTENT ANALYSIS

- tools of the trade

VOSviewer (to extract noun-phrases from tweets)

http://www.vosviewer.com/

BibExcel (for co-word analysis)

http://www8.umu.se/inforsk/Bibexcel/

Notepad++ (to search and replace in your data)

http://notepad-plus-plus.org/

Screaming Frog SEO Spider (to decode short urls)

http://www.screamingfrog.co.uk/seo-spider/

Page 31: Conducting Twitter Research

Analyzing astrophysicists’ conversational connections on Twitter Holmberg, Haustein, Bowman & Peters (work in progress)

Noun-phrases from one of the communities

Page 32: Conducting Twitter Research

TIME SERIES

- tools of the trade

Mozdeh (Persian for Good news) Visit http://mozdeh.wlv.ac.uk/index.html for free download and instructions

Page 33: Conducting Twitter Research

Pearce, Holmberg, Hellsten & Nerlich (under review). Climate change on Twitter: topics, communities and conversations about the IPCC.

TIME SERIES

Page 34: Conducting Twitter Research

The Next Pope?

699,337 tweets collected between February 12, 2013 and March 11, 2013.

Page 35: Conducting Twitter Research

Pope Francis - Jorge Mario Bergoglio

Was mentioned in 9 tweets...

Page 36: Conducting Twitter Research

ONLINE/OFFLINE CORRELATIONS

Comparison of Twitter and publication activity and impact
• publications and tweets per day: ρ=−0.339*
• citation rate and tweets per day: ρ=−0.457**

Haustein, Bowman, Holmberg, Larivière & Peters (under review). Astrophysicists on Twitter: An in-depth analysis of tweeting and scientific publication behavior.
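The slide does not name the coefficient, but ρ conventionally denotes Spearman's rank correlation. Under that assumption, a self-contained Python sketch of the calculation looks like this: each variable is ranked (tied values share the mean rank) and the Pearson correlation of the ranks is computed. The numbers are invented and unrelated to the study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-person values: tweets per day and number of publications.
tweets_per_day = [1, 2, 3, 4]
publications = [1, 3, 2, 4]
rho = spearman_rho(tweets_per_day, publications)
```

A negative ρ, as on the slide, means that people who rank higher on one measure tend to rank lower on the other. Significance testing is not covered by this sketch.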

Page 37: Conducting Twitter Research

ONLINE/OFFLINE CORRELATIONS

Overall similarity between abstracts and tweets is low
• cosine=0.081
• 4.1% of 50,854 tweet NPs in abstracts
• 16.0% of 12,970 abstract NPs in tweets

Haustein, Bowman, Holmberg, Larivière & Peters (under review). Astrophysicists on Twitter: An in-depth analysis of tweeting and scientific publication behavior.
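To illustrate what these two kinds of figures measure, here is a hedged Python sketch of cosine similarity between noun-phrase (NP) frequency vectors and of the share of one set of NPs found in another. How the authors weighted or counted the NPs is not stated on the slide, so raw counts and distinct phrases are assumptions, and the sample phrases are invented.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def share_found_in(terms, other_terms):
    """Percentage of the distinct terms in `terms` that also occur in `other_terms`."""
    terms, other_terms = set(terms), set(other_terms)
    if not terms:
        return 0.0
    return 100.0 * len(terms & other_terms) / len(terms)


# Hypothetical noun-phrase counts from tweets and from abstracts.
tweet_nps = Counter({"dark matter": 2, "coffee": 1})
abstract_nps = Counter({"dark matter": 1, "galaxy cluster": 2})

similarity = cosine(tweet_nps, abstract_nps)
tweet_share = share_found_in(tweet_nps, abstract_nps)
```

A cosine of 0 means the two vocabularies share nothing and 1 means identical relative frequencies, so a value such as 0.081 indicates very little overlap.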

Page 38: Conducting Twitter Research

2 CONTENT ANALYSIS

Summary

Step 4. Extract the data that you need (e.g. hashtags, usernames, original tweets, ...)

And then, depending on your research goals:
Step 5A. Analyze frequencies (e.g. most used hashtags, etc.)
Step 5B. Classify the tweets manually
Step 5C. Extract the noun phrases and create a co-mention network of them with VOSviewer
Step 5D. Analyze time series of certain word/hashtag occurrences
Step 5E. Run sentiment analysis on the tweets
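Steps 5A and 5D are simple enough to sketch in Python. The example assumes tweets stored as (timestamp, text) pairs with ISO-style timestamps; that format, the function names and the sample tweets are assumptions made for this illustration.

```python
import re
from collections import Counter
from datetime import datetime

HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_frequencies(texts):
    """Step 5A: count how often each hashtag occurs (case-insensitive)."""
    counts = Counter()
    for text in texts:
        counts.update(h.lower() for h in HASHTAG_RE.findall(text))
    return counts


def daily_counts(tweets, term):
    """Step 5D: number of tweets per day whose text contains `term`."""
    term = term.lower()
    per_day = Counter()
    for timestamp, text in tweets:
        if term in text.lower():  # plain substring match
            day = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S").date()
            per_day[day.isoformat()] += 1
    return dict(sorted(per_day.items()))


tweets = [
    ("2013-02-12T09:00:00", "Who will be the next #Pope?"),
    ("2013-02-12T18:30:00", "#pope news"),
    ("2013-03-11T07:15:00", "Conclave starts tomorrow #Pope #conclave")
]
frequencies = hashtag_frequencies(text for _, text in tweets)
series = daily_counts(tweets, "pope")
```

Because `daily_counts` uses substring matching, a search for "pope" would also match unrelated words that contain it; a stricter word-boundary match may be preferable for short terms.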

Page 39: Conducting Twitter Research
Page 40: Conducting Twitter Research

During this hour over 20,820,000 tweets were sent

Page 41: Conducting Twitter Research

Kim Holmberg Statistical Cybermetrics Research Group University of Wolverhampton, UK [email protected] http://kimholmberg.fi @kholmber

Acknowledgements

This presentation is based upon work supported by the international funding initiative Digging into Data. Specifically, funding comes from the National Science Foundation in the United States (Grant No. 1208804), JISC in the United Kingdom, and the Social Sciences and Humanities Research Council of Canada.

Thank you for your attention