Hacking Data Visualisations
MELINDA SECKINGTON @MSECKINGTON
@mseckington
Why?
https://www.flickr.com/photos/laurenmanning/6632168961/
https://www.flickr.com/photos/jamjar/5491205608
“I feel that every day, all of us now are being blasted by information design. It's being poured into our eyes through the Web, and we're all visualizers now; we're all demanding a visual aspect to our information. There's something almost quite magical about visual information. It's effortless, it literally pours in. And if you're navigating a dense information jungle, coming across a beautiful graphic or a lovely data visualization, it's a relief, it's like coming across a clearing in the jungle.”
DAVID MCCANDLESS - THE BEAUTY OF DATA VISUALIZATION
TOR NORRETRANDERS - THE BANDWIDTH OF OUR SENSES
A brief history of data visualisations
Theatrum Orbis Terrarum May 20, 1570
The first modern atlas, collected by Abraham Ortelius.
This was a first attempt to gather all maps that were known to man at the time and bind them together.
A BRIEF HISTORY OF DATA VISUALISATION
https://www.flickr.com/photos/smailtronic/2361594300
Bills of Mortality
From 1603, London parish clerks collected health-related population data in order to monitor plague deaths, publishing the London Bills of Mortality on a weekly basis.
John Graunt amalgamated 50 years of information from the bills, producing the first known tables of public health data.
BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - THE GUARDIAN
1644: First known graph of statistical data
MICHAEL VAN LANGREN - ESTIMATES OF DISTANCE IN LONGITUDE BETWEEN TOLEDO AND ROME
1786: First bar chart, William Playfair
Exports and imports of Scotland to and from different parts for one Year from Christmas 1780 to Christmas 1781
Street map of cholera deaths in Soho, 1854, John Snow
Snow's 'ghost map' shows deaths from cholera around Broad Street between 19 August and 30 September 1854. Snow simplified the street layout, highlighting the 13 water pumps serving the area and representing each death as a black bar. His map demonstrates how cholera was spreading, not by a 'miasma' rising from the Thames, but in water contaminated by human waste.
BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - THE GUARDIAN
Diagram of the Causes of Mortality in the Army in the East, 1858, Florence Nightingale
In her seminal ‘rose diagram’, Nightingale demonstrated that far more soldiers died from preventable epidemic diseases (blue) than from wounds inflicted on the battlefield (red) or other causes (black) during the Crimean War (1853-56).
BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - THE GUARDIAN
How?
HOW?
https://www.flickr.com/photos/jdhancock/8031897271
https://www.flickr.com/photos/laurenmanning/5658951917/
A quick intro to R
A QUICK INTRO TO R
What is R?
R is a free programming language and environment for statistical computing and graphics.
Created by statisticians for statisticians.
Comes with a lot of facilities for data manipulation, calculation, data analysis and graphical display.
Highly and easily extensible.
> data()
# list all datasets available
> library(ggplot2movies)
# the movies dataset lives in the ggplot2movies package
# (it shipped with ggplot2 itself before version 2.0.0)
> data(movies)
# load the movies data into a variable named movies
# (movies <- data(movies) would store only the string "movies")
> dim(movies)
[1] 58788    24
> names(movies)
 [1] "title"     "year"   "length" "budget"      "rating"  "votes"
 [7] "r1"        "r2"     "r3"     "r4"          "r5"      "r6"
[13] "r7"        "r8"     "r9"     "r10"         "mpaa"    "Action"
[19] "Animation" "Comedy" "Drama"  "Documentary" "Romance" "Short"
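The same first steps work on any data frame. As a minimal sketch, assuming the `mtcars` dataset bundled with base R stands in for the movies data (`mtcars` is not part of the original slides):

```r
# Assumption: mtcars (bundled with base R) stands in for the movies data.
data(mtcars)              # loads the data frame into a variable named mtcars

size <- dim(mtcars)       # number of rows, then number of columns
cols <- names(mtcars)     # column labels

print(size)
print(head(cols, 3))      # first three column labels
```

Because `mtcars` ships with R itself, this runs without installing any extra package.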
> movies[7079,]
                     title year length   budget rating votes
7079 Bourne Identity, The 2002    119 75000000    7.3 29871
 r1  r2  r3  r4  r5   r6   r7   r8   r9 r10  mpaa
4.5 4.5 4.5 4.5 4.5 14.5 24.5 34.5 14.5 4.5 PG-13
Action Animation Comedy Drama Documentary Romance Short
     1         0      0     1           0       0     0
# returns 1 row => all the data for 1 movie
> movies[1:10,]
. . .
# returns rows 1 to 10
> movies[,1]
. . .
# returns 1 column => titles of all movies
> movies$title
. . .
# same as movies[,1]
# returns column with the label 'title'
> movies[,1:10]
. . .
# returns columns 1 to 10
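The row and column selections above can be tried on a small built-in data frame. This is a sketch only, again assuming `mtcars` from base R as a stand-in for the movies data; the variable names are made up for illustration:

```r
# Assumption: mtcars (bundled with base R) stands in for the movies data.
data(mtcars)

one_row    <- mtcars[1, ]      # a single row: every column for one car
first_rows <- mtcars[1:10, ]   # rows 1 to 10
by_pos     <- mtcars[, 1]      # first column, selected by position
by_name    <- mtcars$mpg       # the same column, selected by label
some_cols  <- mtcars[, 1:3]    # columns 1 to 3

# For a plain data frame both column selections give the same vector.
print(identical(by_pos, by_name))
```

The comma matters: an index before it picks rows, an index after it picks columns. Some data frame variants, such as tibbles, keep the table shape for `[, 1]` instead of returning a vector, so `$` is the safer way to pull out a single column.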
> hist(movies$year)
[Figure: Histogram of movies$year. x-axis: movies$year (1900 to 2000); y-axis: Frequency (0 to 8000)]
> hist(movies$rating)
[Figure: Histogram of movies$rating. x-axis: movies$rating (2 to 10); y-axis: Frequency (0 to 8000)]
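To see what `hist()` computes before it draws anything, the binning can be inspected directly. A minimal sketch, assuming a small made-up vector of ratings rather than the real movies data:

```r
# Assumption: a small made-up vector of ratings, for illustration only.
ratings <- c(2.5, 3.1, 4.8, 5.0, 5.5, 6.2, 6.4, 7.3, 7.9, 8.6)

# plot = FALSE returns the binning without drawing anything.
h <- hist(ratings, breaks = seq(2, 10, by = 2), plot = FALSE)

print(h$breaks)   # bin edges
print(h$counts)   # number of values that fall in each bin

# Draw the same histogram with an explicit title and axis label.
hist(ratings, breaks = seq(2, 10, by = 2),
     main = "Histogram of ratings", xlab = "rating")
```

Passing `breaks` explicitly is one way to control the bin edges; by default `hist()` chooses them for you.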
> library(ggplot2)
> qplot(rating,
        data=movies,
        geom="histogram")
> qplot(rating,
        data=movies,
        geom="histogram",
        binwidth=1)
> m = ggplot(movies, aes(rating))
> m + geom_histogram()
> m + geom_histogram(
      aes(fill = ..count..))
> m + geom_histogram(
      colour = "darkgreen",
      fill = "white",
      binwidth = 0.5)
> x = m + geom_histogram(
      binwidth = 0.5)
> x + facet_grid(Action ~ Comedy)
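The layering idea behind these calls (a base plot plus one or more `geom_` layers) can be tried without the movies data. The following is a sketch under stated assumptions: the `films` data frame is made up, and `after_stat(count)` is used as the newer spelling of `..count..`, which needs ggplot2 3.3.0 or later:

```r
library(ggplot2)

# Assumption: a small made-up data frame stands in for the movies data.
films <- data.frame(
  rating = c(2.5, 3.1, 4.8, 5.0, 5.5, 6.2, 6.4, 7.3, 7.9, 8.6),
  Comedy = c(0, 1, 0, 1, 1, 0, 0, 1, 0, 0)
)

# Shared base plot: the rating column mapped to the x axis.
m <- ggplot(films, aes(rating))

# Fill each bar by its height; after_stat(count) is the newer spelling
# of ..count.. (ggplot2 3.3.0 and later).
p_fill <- m + geom_histogram(aes(fill = after_stat(count)), binwidth = 1)

# One panel per value of Comedy.
p_facet <- m + geom_histogram(binwidth = 1) + facet_grid(. ~ Comedy)

print(p_fill)
print(p_facet)
```

Storing the base plot in `m` and adding layers to it means the data and the axis mapping are stated once and reused for every variation.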
FUTURELEARN STATS
> fl = read.csv(
      "futurelearn_dataset.csv",
      header=TRUE)
> source_table = table(fl$age)
> pie(source_table)
> pie(source_table,
      radius=0.6,
      col=rainbow(8))
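The key step here is `table()`, which turns a column of raw values into counts that `pie()` can draw. A minimal sketch, assuming made-up age bands because the contents of `futurelearn_dataset.csv` are not shown in the slides:

```r
# Assumption: made-up age bands; the real futurelearn_dataset.csv is not shown.
fl <- data.frame(
  age = c("18-25", "26-35", "26-35", "36-45", "26-35", "18-25", "46-55")
)

# table() counts how often each distinct value occurs.
age_table <- table(fl$age)
print(age_table)

# One slice per age band, one colour per slice.
pie(age_table,
    radius = 0.6,
    col = rainbow(length(age_table)))
```

Using `length(age_table)` instead of a fixed number keeps the colour count in step with the number of slices, whatever the data contains.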
> library(twitteR)
> setup_twitter_oauth(
      "API key", "API secret", "Access token", "Access secret")
> tweets <- searchTwitter('futurelearn', n=100)
> library("tm")
> tweet_text <- sapply(tweets, function(x) x$getText())
> tweet_corpus <- Corpus(VectorSource(tweet_text))
> tweet_corpus <- tm_map(tweet_corpus,
                         content_transformer(tolower))
> tweet_corpus <- tm_map(tweet_corpus, removePunctuation)
> tweet_corpus <- tm_map(tweet_corpus,
                         function(x) removeWords(x, stopwords()))
> library(wordcloud)
> wordcloud(tweet_corpus)
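The clean-up steps and the word cloud boil down to: lower-case the text, strip punctuation, drop stop words, then count what is left. The sketch below does this in base R only, so it runs without Twitter credentials or extra packages. It is an illustration of the idea and not what `tm` does internally; the two strings and the short stop-word list are made up:

```r
# Assumption: two made-up strings stand in for the downloaded tweets,
# and a tiny hand-written stop-word list stands in for tm's stopwords().
tweet_text <- c("Learning R with FutureLearn!", "R is awesome, and R is free.")
stop_words <- c("with", "is", "and")

cleaned <- tolower(tweet_text)                 # lower-case everything
cleaned <- gsub("[[:punct:]]", "", cleaned)    # strip punctuation

words <- unlist(strsplit(cleaned, "\\s+"))     # split into single words
words <- words[!(words %in% stop_words)]       # drop stop words

# Word frequencies, most common first: the numbers a word cloud draws.
freq <- sort(table(words), decreasing = TRUE)
print(freq)
```

A word cloud sizes each word by these counts. If the wordcloud package is installed, `wordcloud(words = names(freq), freq = as.vector(freq))` should draw the same table, although with so few words the result will be sparse.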
What next?
WHAT NEXT?
https://www.flickr.com/photos/jamjar/5491205608
Recap
Data visualisations are awesome
R is awesome
Any questions?