Guest lecture at Coding Culture, Utrecht
Why should I learn Python? Why should I learn Python? Explaining a basic Python script Your turn Questions?
Python in the Social Sciences
A brief introduction by means of real-life examples
Damian Trilling
[email protected]
@damian0604
www.damiantrilling.net
Department of Communication Science, Universiteit van Amsterdam
Coding Culture, Utrecht, 5 March 2014
Python in the Social Sciences Damian Trilling
What I won’t do today
I won’t give you a structured introduction about
• variables
• commands
• data types
• . . .
and all the other technical stuff.
You’ll do that yourself over the next few weeks.
I’ll give you some examples of what you can do with the knowledge you’re going to acquire.
Why should I learn Python?
Some examples
A recent bachelor thesis
Tone in tweets
Imagine you want to know something about someone’s behavior on Twitter, or about how a specific topic is discussed on Twitter.
Do you really want to go through thousands of tweets by hand?
So you’d better think about automating your coding
Finding out how negative or positive politicians are towards their opponents
The student took lists of positive and negative words and made additional lists with each politician’s opponents.
She used a Python script to check which type of words was used to refer to opponents.
For further analysis, the results were imported into SPSS.
Schut, L. (2013). Verenigde Staten vs. Verenigd Koninkrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
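The word-list approach described above can be sketched in a few lines. This is only an illustration written for Python 3 (the scripts on the slides are Python 2), not the script used in the thesis: the word lists, the opponent name "smith" and the function name `tone_towards_opponents` are all invented for the example.

```python
import re

# Hypothetical word lists; a real study would use validated dictionaries.
POSITIVE = {"good", "great", "strong"}
NEGATIVE = {"bad", "weak", "wrong"}
OPPONENTS = {"smith"}  # assumed opponent names, lower-cased


def tone_towards_opponents(tweet):
    """Return (positive, negative) word counts if the tweet mentions an opponent."""
    words = re.findall(r"\w+", tweet.lower())
    if not OPPONENTS.intersection(words):
        return None  # no opponent mentioned, so there is nothing to code
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return positive, negative


print(tone_towards_opponents("Smith is wrong and weak on jobs"))
print(tone_towards_opponents("A great day in Ohio"))
```

The resulting counts per tweet could then be written to a CSV file and opened in a statistics package for further analysis.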
Frame adoption on Twitter
Which phrases used by Merkel and Steinbrück on TV make it to the #tvduell discussion on Twitter?
As part of the project, I wrote a Python script to identify word co-occurrences on Twitter. The script produced not only lists of word counts, but also a GDF file that could be used for visualization.
#!/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7
# -*- coding: utf-8 -*-
from __future__ import division
from itertools import combinations
from collections import defaultdict
from collections import Counter
from unicsv import CsvUnicodeReader
import codecs, cStringIO, sys, re, unicodedata, os

# input and output files
gdfbestand="resultaten/netwerk.gdf"
wordsplitbestand="resultaten/wordsplit.csv"
tempbestand="allewoorden.tmp"

# only keep word pairs that co-occur at least this often
minedgeweight=20
cooc=defaultdict(int)
tweets=[]

print "\nReading tweet nr. "
reader=CsvUnicodeReader(open(wordsplitbestand,"r"))
i=0
for row in reader:
    i=i+1
    # skip first row, as it contains column headers
    if i>1:
        print "\r",str(i)," ",
        sys.stdout.flush()
        tweets.append(row[9])

# write every word to a temporary file, one word per line
f = codecs.open(tempbestand, 'wb', encoding="utf-8")
i=0
print "Making tempfile to count word frequencies"
allestems=[]
for tweet in tweets:
    for stems in tweet.split():
        allestems.append(stems)
for k in range(0,len(allestems)):
    f.write(allestems[k]+"\n")
# close the tempfile so that everything is on disk before we read it
f.close()

print "Counting..."
c=Counter()
with codecs.open(tempbestand,"rb", encoding="utf-8") as r:
    for l in r:
        c[l.rstrip('\n')] += 1
os.remove(tempbestand)

# count how often each pair of words occurs in the same tweet
f = codecs.open(gdfbestand, 'wb', encoding="utf-8")
for tweet in tweets:
    words=tweet.split()
    for a,b in combinations(words,2):
        if a!=b:
            cooc[(a,b)]+=1

# GDF node section: one line per word, with its frequency as width
f.write("nodedef>name VARCHAR, width DOUBLE\n")
algenoemd=[]
verwijderen=[]
for k in cooc:
    if cooc[k]<minedgeweight:
        verwijderen.append(k)
    else:
        if k[0] not in algenoemd:
            f.write(k[0]+","+str(c[k[0]])+"\n")
            algenoemd.append(k[0])
        if k[1] not in algenoemd:
            f.write(k[1]+","+str(c[k[1]])+"\n")
            algenoemd.append(k[1])
for k in verwijderen:
    del cooc[k]

# GDF edge section: one line per remaining word pair
f.write("edgedef>node1 VARCHAR,node2 VARCHAR, weight DOUBLE\n")
for k, v in cooc.iteritems():
    regel= ",".join(k)+","+str(v)
    f.write(regel+"\n")
f.close()
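The core of the script above, counting which words appear together in the same tweet, can be tried out on toy data. The following is a minimal Python 3 sketch under a few assumptions of its own: the three tweets are invented, the function name `cooccurrences` is hypothetical, and unlike the script above it treats a word pair as unordered and counts it at most once per tweet. It does not write a GDF file.

```python
from collections import Counter
from itertools import combinations


def cooccurrences(tweets):
    """Count how often each unordered word pair appears in the same tweet."""
    pairs = Counter()
    for tweet in tweets:
        # sorted(set(...)) gives every pair a fixed order and drops repeated words
        words = sorted(set(tweet.lower().split()))
        pairs.update(combinations(words, 2))
    return pairs


# Invented toy data, standing in for the column of pre-processed tweets
toy_tweets = [
    "merkel steinbrueck tvduell",
    "merkel tvduell",
    "steinbrueck euro",
]
counts = cooccurrences(toy_tweets)
print(counts[("merkel", "tvduell")])
```

Filtering the pairs by a minimum count, as `minedgeweight` does above, keeps the resulting network readable.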
Why should I learn Python?
Summing up what you can use it for
One tool to rule them all?
Of course there are ready-made tools for some of the questions we want to answer. But for many, there aren’t. Python offers us the possibility to build exactly the tool we need.
And it’s fun!
1st group of tasks
Highly repetitive tasks
Simple tasks (counting things, comparing texts, . . . ) that can be described in a formalized way. Automating them saves time even with few cases, and there is virtually no size limit.
Example: Retweets start with RT, optionally followed by a space, and some letters. So it is very easy to identify them automatically.
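The retweet rule can be written down as a regular expression. The sketch below is Python 3 and makes one assumption beyond the rule as stated: it also requires an "@" before the letters, following the common "RT @user" convention, so that a tweet starting with a word like "RTL" is not counted. The function name `is_retweet` and the third example tweet are invented.

```python
import re

# "RT", an optional space, then an @-handle (the "@" is an assumption of this sketch)
RETWEET = re.compile(r"RT ?@\w+")


def is_retweet(tweet):
    """True if the tweet text starts like a retweet."""
    # re.match only looks at the beginning of the string
    return RETWEET.match(tweet) is not None


tweets = [
    "RT @greenami1: De winnaars en verliezers van de lachwekkende #klimaattop",
    "De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau",
    "RTL Nieuws over de #klimaattop",
]
print([is_retweet(t) for t in tweets])
```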
2nd group of tasks
Tasks for which specific Python modules exist
There are thousands of modules suitable for text analysis. You basically only have to write code for data input and output.
Example: Sentiment analysis
3rd group of tasks
APIs, RSS, web scraping . . .
You can use Python if you want to collect and store information.
Example: Collecting bios of Twitter users, scraping the web (data journalism!), downloading Facebook data
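As a small illustration of this third group, here is a Python 3 sketch that pulls headlines and links out of an RSS feed using only the standard library. To keep it self-contained, it parses an invented feed stored in a string; a real script would first download the feed from a URL. The feed content, the `example.org` links and the function name `headlines` are made up for the example.

```python
import xml.etree.ElementTree as ET

# An invented RSS 2.0 snippet; a real script would download the feed first.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example news</title>
    <item><title>First headline</title><link>http://example.org/1</link></item>
    <item><title>Second headline</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""


def headlines(rss_text):
    """Return (title, link) pairs for every <item> in an RSS document."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


for title, link in headlines(FEED):
    print(title, link)
```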
Why we should use Python in the social sciences
It is a programming language
• It is flexible. You can use it for (in principle) any kind of data
• There are virtually no limits regarding the amount of data to process
• You can run it on every platform
• And yet it is easy to learn!
It is widely used for content analysis
• Many online resources and toolkits
• Books about NLP and web scraping with Python
Think of the following task
RQ: What are the differences in terms of actors mentioned between Israeli and Palestinian news coverage?
1 The data structure: You have a folder with articles
2 The desired output: You want a table with the file names and a column per actor, counting how often they are mentioned
3 A typical task for a short Python script!
You need something like this:
for every file in folder:
    read the file
    count actors
    add new row to table with filename and actor counts
save table
(such a notation is called pseudo-code)
and in Python, it’s not that different!
# Python 2 script; the imports are added here so that it runs on its own
import csv
import re
from os import listdir
from os.path import isfile, join

mypath = r"C:\Users\Ricarda\Documents\Artikelen"
# "Israel", followed later on the same line by one of the alternatives
regex54 = re.compile(r'Israel.*(?:minister|politician.*|[Aa]uthorit)')
filename_list=[]
matchcount54=0
matchcount54_list=[]
# all files (not subfolders) in the folder
onlyfiles = [ f for f in listdir(mypath) if isfile(join(mypath,f)) ]
for f in onlyfiles:
    matchcount54=0
    artikel=open(join(mypath,f),"r")
    for line in artikel:
        matches54 = regex54.findall(line)
        for word in matches54:
            matchcount54=matchcount54+1
    # one entry per file: its name and its number of matches
    filename_list.append(f)
    matchcount54_list.append(matchcount54)
    artikel.close()
output=zip(filename_list,matchcount54_list)
writer = csv.writer(open("overzichtstabel.csv", 'wb'))
writer.writerows(output)
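The script above counts a single actor. The desired table with "a column per actor" can be sketched by keeping the patterns in a dictionary and looping over it. The following is a Python 3 sketch, not the script from the slides: the two actor names, their regular expressions and the function name `count_actors` are assumptions made for the example, and the articles are assumed to be UTF-8 text files.

```python
import csv
import re
from pathlib import Path

# Hypothetical actor patterns; each one becomes a column in the table.
ACTORS = {
    "israeli_government": re.compile(r"Israeli (?:minister|government|authorit)"),
    "palestinian_authority": re.compile(r"Palestinian (?:minister|[Aa]uthorit)"),
}


def count_actors(folder, outfile):
    """Write one row per file in `folder`: file name plus one count per actor."""
    with open(outfile, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename"] + list(ACTORS))
        for path in sorted(Path(folder).iterdir()):
            if not path.is_file():
                continue  # skip subfolders
            text = path.read_text(encoding="utf-8")
            counts = [len(regex.findall(text)) for regex in ACTORS.values()]
            writer.writerow([path.name] + counts)
```

A call such as `count_actors("Artikelen", "overzichtstabel.csv")` would then produce the table. The output file should live outside the article folder, so that it is not counted as an article on the next run.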
Explaining a basic Python script: Pseudo-code
We collected tweets on the UNFCCC conference with yourTwapperKeeper.
Our task: Identify all tweets that include a reference to Poland
Let’s start with some pseudo-code!
open csv-table
for each line:
    append column 1 to a list of tweets
    append column 3 to a list of corresponding users
    look for searchstring in column 1
    append search result to a list of results
save lists to a new csv-file
Explaining a basic Python script: Python code
#!/usr/bin/python
from unicsv import CsvUnicodeReader
from unicsv import CsvUnicodeWriter
import re
inputfilename="mytweets.csv"
outputfilename="myoutput.csv"
user_list=[]
tweet_list=[]
search_list=[]
searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')
print "Opening "+inputfilename
reader=CsvUnicodeReader(open(inputfilename,"r"))
for row in reader:
    tweet_list.append(row[0])
    user_list.append(row[2])
    matches1 = searchstring1.findall(row[0])
    matchcount1=0
    for word in matches1:
        matchcount1=matchcount1+1
    search_list.append(matchcount1)
print "Constructing data matrix"
outputdata=zip(tweet_list,user_list,search_list)
headers=zip(["tweet"],["user"],["how often is Poland mentioned?"])
print "Write data matrix to ",outputfilename
writer=CsvUnicodeWriter(open(outputfilename,"wb"))
writer.writerows(headers)
writer.writerows(outputdata)

#!/usr/bin/python
# We start with importing some modules:
from unicsv import CsvUnicodeReader
from unicsv import CsvUnicodeWriter
import re

# Let us define two variables that contain
# the names of the files we want to use
inputfilename="mytweets.csv"
outputfilename="myoutput.csv"

# We create some empty lists that we will use later on.
# A list can contain several values
# and is denoted by square brackets.
user_list=[]
tweet_list=[]
search_list=[]

# What do we want to look for?
searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')

# Enough preparation, let the program begin!
# We tell the user what is going on...
print "Opening "+inputfilename

# ... and call the module that reads the input file.
reader=CsvUnicodeReader(open(inputfilename,"r"))

# Now we read the file line by line.
# The indented block is repeated for each row
# (thus, each tweet)
for row in reader:
    # append data from the current row to our lists.
    # Note that we start counting with 0.
    tweet_list.append(row[0])
    user_list.append(row[2])

    # Let us count how often our searchstring is used
    # in this tweet
    matches1 = searchstring1.findall(row[0])
    matchcount1=0
    for word in matches1:
        matchcount1=matchcount1+1
    search_list.append(matchcount1)

# Time to put all the data in one container
# and save it:

print "Constructing data matrix"
outputdata=zip(tweet_list,user_list,search_list)
headers=zip(["tweet"],["user"],["how often is Poland mentioned?"])
print "Write data matrix to ",outputfilename
writer=CsvUnicodeWriter(open(outputfilename,"wb"))
writer.writerows(headers)
writer.writerows(outputdata)
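The script above is written for Python 2 and relies on the `unicsv` module. For readers on Python 3, roughly the same steps can be sketched with the standard `csv` module alone. This is an adaptation under stated assumptions, not the script from the slides: the function name `count_mentions` is invented, the files are assumed to be UTF-8, and the rows are written one by one instead of being collected in lists first.

```python
import csv
import re

SEARCH = re.compile(r"[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa")


def count_mentions(inputfilename, outputfilename):
    """Copy tweet and user to a new CSV file and add the number of matches."""
    with open(inputfilename, newline="", encoding="utf-8") as infile, \
            open(outputfilename, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["tweet", "user", "how often is Poland mentioned?"])
        for row in csv.reader(infile):
            # column 1 holds the tweet, column 3 the user (counting starts at 0)
            tweet, user = row[0], row[2]
            writer.writerow([tweet, user, len(SEARCH.findall(tweet))])
```

Calling `count_mentions("mytweets.csv", "myoutput.csv")` would mirror the file names used above. Here `len()` replaces the explicit counting loop, which gives the same number.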
Explaining a basic Python script: The output (myoutput.csv)
tweet,user,how often is Poland mentioned?
:-) #Lectrr #wereldleiders #uitspraken #Wikileaks #klimaattop http://t.co/Udjpk48EIB,henklbr,0
Wat zijn de resulaten vd #klimaattop in #Warschau waard? @EP_Environment ontmoet voorzitter klimaattop @MarcinKorolec http://t.co/4Lmiaopf60,Europarl_NL,1
RT @greenami1: De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau (interview): http://t.co/DEYqnqXHdy #Misserfolg #Kli...,LarsMoratis,1
De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau (interview): http://t.co/DEYqnqXHdy #Misserfolg #Klimaschutz #FAZ,greenami1,1
Try it yourself!
Will you join in?
Questions or comments?
Damian Trilling
[email protected]
@damian0604
www.damiantrilling.net