Lasi datawrangling

Posted on 27-Jan-2015


Data wrangling with open source tools

Tony Hirst, Dept of Communication & Systems

The Open University, UK

Premises

“I take data from wherever I can get it” [1]

“Appropriate everything” [2]

Conversations with data [3]

Visual Conversations with data [3]

(Accession Plot)

@mediaczar

If a picture’s worth a thousand words, maybe it should take as long to read?

Most learning analytics won’t be performed by learning analytics researchers

How can we help people fashion their own tools to support data conversations?

Recipes

site:open.ac.uk

Have a conversation with the data…

Ask the right questions…

xkcd.com/1138

Sometimes a question makes most sense in the context of questions previously asked and answers previously received

DATA USERS

Educators, Learners

Planners, Marketers

Policymakers, Researchers

Press, NGOs

“DEVELOPERS”

Have dashboard, so what?

A tools and issues based view

DATA

TOOLS

USERS

PROBLEMS

Example – Google Fusion Tables

Fusion Table: https://www.google.com/fusiontables/DataSource?docid=1VKG7iCbFlsEYJzTuQppf4xoIqq1ABxWTdW6O_7o#rows:id=1

http://is.gd/qhuaoA

Walkthrough: http://blog.ouseful.info/2012/11/16/a-quick-look-at-gcsealevel-certificate-awards-market-share-by-examination-board/

http://is.gd/f9YAbG

DATA

TOOLS

USERS

PROBLEMS

Access/obtain data

Make sense of data

Ask specific questions of data

Communicate in a data-centric way

Load data

Clean data

Merge/enrich data

DATA

Issues

TOOLS

DATA

Other TOOLS

Issues

TOOLS

“Tool-based programming”

A barrier to access (for the tool user) is data format

JSON, XML, CSV, XLS, TSV, .db, HTML, PDF, DOC, TXT

GLUE LOGIC (glue code)
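As a rough illustration of glue code, the sketch below loads three of the text formats listed above (CSV, TSV, JSON) into one common shape, a list of row dictionaries, so that later steps do not care where the data came from. It uses only the Python standard library; the function name `load_records` is hypothetical, and binary or document formats such as XLS, PDF and DOC would need additional libraries that are not covered here.

```python
import csv
import io
import json

def load_records(text, fmt):
    """Load CSV, TSV or JSON text into a list of dicts, one per row."""
    fmt = fmt.lower()
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
        return [dict(row) for row in reader]
    if fmt == "json":
        data = json.loads(text)
        # Accept either a list of objects or a single object.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported format: {fmt}")

# Made-up sample data: the same small table in three formats.
csv_text = "name,count\nOU,3\nOxford,5\n"
tsv_text = "name\tcount\nOU\t3\nOxford\t5\n"
json_text = '[{"name": "OU", "count": "3"}, {"name": "Oxford", "count": "5"}]'
```

All three calls return the same list of dictionaries, which is the point of the glue layer: one shape for every downstream tool.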

=importHTML(URL, "table", N)

HTML

QUERYABLE DATA
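The spreadsheet formula above pulls the Nth table from a web page into a grid that can then be queried. Outside a spreadsheet, a rough equivalent can be sketched with Python's standard-library `html.parser`. This is not how the spreadsheet function works internally; it assumes explicit closing `</td>`, `</th>` and `</tr>` tags and no nested tables, and the names `TableExtractor` and `import_html_table` are hypothetical. As with the formula's N, the table index is 1-based.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join text split by inline tags such as <b> or <a>.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

def import_html_table(html, n):
    """Return the n-th table (1-based) in `html` as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables[n - 1]

# Made-up page with two tables.
page = ("<p>Intro</p><table><tr><th>Name</th><th>Endowment</th></tr>"
        "<tr><td>College <b>A</b></td><td>100</td></tr></table>"
        "<table><tr><td>x</td></tr></table>")
```

Fetching a live page (for example with `urllib.request.urlopen`) and passing its text to `import_html_table` would complete the pipeline; the sketch keeps parsing separate so it can be tried offline.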

Try it… Example page:

http://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_the_United_States_by_endowment

http://is.gd/7Vbg6n

Google Spreadsheets as a database

Explorer: https://views.scraperwiki.com/run/google_spreadsheet_query/

http://is.gd/jiMJoh

Walkthrough: http://schoolofdata.org/2013/05/24/asking-questions-of-data-garment-factories-data-expedition/

http://is.gd/qJHihu
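The idea of treating a spreadsheet as a database, something you ask questions of rather than scroll through, can also be tried locally. The sketch below swaps in a different technique: it loads a small table into an in-memory SQLite database using Python's built-in `sqlite3` module and answers a question with SQL. The helper name `query_table`, the table name `data` and the sample rows are all made up for illustration.

```python
import sqlite3

def query_table(header, rows, sql):
    """Load rows into an in-memory SQLite table called 'data' and run `sql`."""
    conn = sqlite3.connect(":memory:")
    try:
        # Column names are quoted so headers with spaces still work.
        cols = ", ".join(f'"{h}"' for h in header)
        conn.execute(f"CREATE TABLE data ({cols})")
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f"INSERT INTO data VALUES ({placeholders})", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

# Made-up rows: which country has the most workers across its factories?
header = ["country", "factory", "workers"]
rows = [("A", "f1", 100), ("A", "f2", 50), ("B", "f3", 70)]
result = query_table(
    header, rows,
    "SELECT country, SUM(workers) FROM data GROUP BY country ORDER BY country",
)
```

Each new question is just a new `SELECT`, which suits the conversational, question-after-question style of analysis described earlier.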

=importCSV(URL, N)

HTML

INTERACTIVE DASHBOARD

Google Charts

Google Chart Visualization API

https://code.google.com/apis/ajax/playground/

http://is.gd/TTHIUh

Google Visualisation API

googleVis (R)

https://developers.facebook.com/docs/reference/api/examples/

http://is.gd/7cRnvS
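Chart libraries like these mostly need the data handed over in a particular structure. As a sketch, assuming the cols/rows JSON object described in the Google Charts DataTable documentation, the function below turns a plain header-plus-rows table into that structure so a page could pass it to a chart. The function name `to_datatable_json` is hypothetical, and date or datetime columns (which need special value encoding) are not handled.

```python
import json

def to_datatable_json(header, types, rows):
    """Build a cols/rows JSON string in the shape a Google Charts
    DataTable can be constructed from. `types` gives one column type
    per header entry, e.g. "string" or "number"."""
    if len(header) != len(types):
        raise ValueError("need one type per column")
    cols = [{"id": h, "label": h, "type": t} for h, t in zip(header, types)]
    # Each row is an object with a "c" list of cells; each cell holds its value in "v".
    body = [{"c": [{"v": value} for value in row]} for row in rows]
    return json.dumps({"cols": cols, "rows": body})

# Made-up data: awards per examination board.
out = to_datatable_json(["board", "awards"], ["string", "number"], [["X", 3], ["Y", 5]])
```

Wrappers such as googleVis do this kind of conversion from an R data frame for you, which is much of their appeal.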

Another barrier to access (for the tool user) is data shape
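A common shape problem is data arriving in "long" form (one observation per row) when a tool wants it "wide" (one column per category), or vice versa. The sketch below pivots long records into a wide table using plain Python; the function name `long_to_wide` and the sample records are made up, and missing combinations are filled with `None`.

```python
def long_to_wide(records, row_key, col_key, value_key):
    """Pivot long records (one observation per row) into a wide table:
    one row per distinct row_key value, one column per col_key value."""
    columns = []  # col_key values in first-seen order
    table = {}    # row_key value -> {col_key value: value}
    for rec in records:
        col = rec[col_key]
        if col not in columns:
            columns.append(col)
        table.setdefault(rec[row_key], {})[col] = rec[value_key]
    header = [row_key] + columns
    rows = [[key] + [cells.get(c) for c in columns] for key, cells in table.items()]
    return header, rows

# Made-up long-form data: awards per exam board per year.
long_data = [
    {"year": 2011, "board": "X", "awards": 10},
    {"year": 2011, "board": "Y", "awards": 7},
    {"year": 2012, "board": "X", "awards": 12},
]
header, rows = long_to_wide(long_data, "year", "board", "awards")
```

Here `header` is `["year", "X", "Y"]` and the 2012 row has `None` for board Y, because there was no observation for it.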

Yet another barrier to access (for the tool user) is data cleanliness

Clear to read?
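Much of cleaning is mundane: stray whitespace, placeholder text standing in for missing values, and numbers stored as text with currency symbols or thousands separators. The sketch below shows two small helpers for these cases. The helper names and the set of placeholder strings are assumptions, and the number parser assumes "." as the decimal point and "," as a thousands separator, so it would misread values written in other conventions.

```python
import re

# Assumed placeholders for "no value"; adjust for the dataset at hand.
MISSING = {"", "n/a", "na", "-", "none"}

def clean_cell(value):
    """Collapse runs of whitespace, trim, and map placeholders to None."""
    text = re.sub(r"\s+", " ", str(value)).strip()
    return None if text.lower() in MISSING else text

def clean_number(value):
    """Parse text like '$1,234.50' as a float; return None if it can't be parsed."""
    text = clean_cell(value)
    if text is None:
        return None
    # Keep only digits, the decimal point and a minus sign.
    digits = re.sub(r"[^0-9.\-]", "", text)
    try:
        return float(digits)
    except ValueError:
        return None
```

Returning `None` rather than raising keeps a batch clean-up running; the rows that failed can then be inspected separately.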

Questions of identity

The Open University

Open University

OU

Open Uni

Open University, UK

NORMALISATION/RECONCILIATION

Reconciliation to a canonical name and/or to a unique identifier
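A minimal sketch of reconciliation, assuming a hand-built lookup table: each name is normalised (case and whitespace), mapped through a table of known aliases, and resolved to a canonical name plus an identifier. The identifier `org:0001` and the helper names are hypothetical; in practice the canonical list and IDs would come from an authority file or a reconciliation service.

```python
import re

# Hypothetical canonical records: normalised key -> (canonical name, identifier).
CANONICAL = {
    "the open university": ("The Open University", "org:0001"),
}

# Known variants, already normalised, pointing at a canonical key.
ALIASES = {
    "open university": "the open university",
    "ou": "the open university",
    "open uni": "the open university",
    "open university, uk": "the open university",
}

def normalise(name):
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return re.sub(r"\s+", " ", name).strip().lower()

def reconcile(name):
    """Return (canonical name, identifier), or None if the name is unknown."""
    key = normalise(name)
    key = ALIASES.get(key, key)
    return CANONICAL.get(key)
```

All five variants listed above resolve to the same record, while an unrecognised name returns `None` so that it can be reviewed by hand rather than guessed.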

A stumbling block (for the data user) is data enrichment

Another stumbling block (for the data user) is joining datasets
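Enrichment usually means joining: adding columns from a second dataset wherever a shared key matches. The sketch below performs a left join on lists of dictionaries so that rows without a match are kept rather than silently dropped. The function name and sample data are made up, and if the second dataset repeats a key, the last occurrence wins.

```python
def left_join(left, right, key):
    """Enrich each record in `left` with the fields of the record in
    `right` that has the same value for `key`; unmatched rows are kept."""
    index = {row[key]: row for row in right}  # last duplicate key wins
    joined = []
    for row in left:
        extra = index.get(row[key], {})
        merged = dict(row)
        merged.update({k: v for k, v in extra.items() if k != key})
        joined.append(merged)
    return joined

# Made-up datasets sharing an "inst" key.
students = [{"inst": "OU", "n": 3}, {"inst": "XU", "n": 1}]
locations = [{"inst": "OU", "city": "Milton Keynes"}]
enriched = left_join(students, locations, "inst")
```

The unmatched "XU" row comes through without a city, which makes gaps in the second dataset visible.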

A huge stumbling block (for the data user) is joining partially matched data
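When keys only partly match (typos, abbreviations, extra words), an exact join fails. One simple option, swapped in here for illustration, is approximate string matching with Python's standard-library `difflib.get_close_matches`, which scores candidates with `SequenceMatcher` ratios. The function name and the cutoff of 0.8 are assumptions; any automatic match should still be checked, since similar-looking names can refer to different things.

```python
import difflib

def fuzzy_join_keys(left_keys, right_keys, cutoff=0.8):
    """Map each left key to its most similar right key (case-insensitive),
    or to None if no candidate's similarity ratio reaches `cutoff`."""
    lookup = {k.lower(): k for k in right_keys}  # lower-cased -> original
    mapping = {}
    for key in left_keys:
        best = difflib.get_close_matches(key.lower(), list(lookup), n=1, cutoff=cutoff)
        mapping[key] = lookup[best[0]] if best else None
    return mapping

mapping = fuzzy_join_keys(
    ["Open Univrsity", "Oxford Brookes"],
    ["Open University", "University of Oxford"],
)
```

The misspelt "Open Univrsity" maps to "Open University", while "Oxford Brookes" is left unmatched rather than forced onto "University of Oxford"; the resulting mapping can then feed an ordinary exact join.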

Rolling your own interactive data exploration tools

R Shiny Apps

ui.R server.R

rCharts

Many chart tools do the work for you if the data is in the right shape

DATA

TOOLS

USERS PROBLEMS

Just ask

ask.SchoolOfData.org

blog.ouseful.info

@psychemedia