MPhil Lecture on Data Vis for Analysis
An Introduction to Data Visualisation for Analysis
Exploring the Dataset - Textual, Numerical and Otherwise
http://www.slideshare.net/shawnday/m-phil-datavisforanalysis
Agenda
Thoughts from last week - wordpress.com?
Introduction
What do we mean by Data Analysis?
Some foundation terms and concepts
The Data Visualisation Process
Tools and Methods
Extending your toolset
An Exercise
Objective
To appreciate the rich variety of techniques and tools available to digital humanities scholars for
data visualisation and analysis. The aim is to add tools to your arsenal and to have a sense of
where to look for more.
Breakpoint
One of the keys to good visualisation is understanding what your immediate goals are.
Are you visualising data to understand what’s in it, or are you trying to communicate meaning to others?
You - Visualisation for Data Analysis
Others - Visualisation for Presentation
Speaking of Data Analysis
SPSS
SAS
Open-Source Equivalents
So Why Would You Want to Visualise Your Data?
Bypass language centres to tap directly into the visual cortex
Leverage our ability to recognise patterns - so-called visual sense-making
Powerful graphics engines now allow for live data processing, sophisticated animations and interactive research environments
Source: Geoff McGhee, Getting Started with Data Viz
So Why Would You Want to Visualise Your Data?
Work with new data to create new knowledge
Explore data to discover things that used to be unknown, unknowable or impractical to know
Take a new perspective on the familiar to reveal previously hidden insights
Visualising New Information
Tourists vs Locals, Eric Fischer, 2010 - Flickr
Visualising New Information
Flickr Flow, Martin Wattenberg and Fernanda Viegas, 2009
The Familiar through New Eyes
The Times Atlas
How Could You Use Data Analysis?
“In the Lab” - for your own analysis
Online as part of collaborative groups
Through dissemination, to extend your own work - crowdsourcing
Others?
The Time Ribbon and the Tree Map
Exploring the ordinary life of rural pioneers in nineteenth-century Ontario
Visualisation Objective
William Sunter Farm Diary, 1858
Farm Journal
• 100s of pages
• Varying hands
• Varying quality
Diaries: the raw materials
• Generate word frequency (Voyeur, TAPoR)
• Isolate known farm activities (NLP - LanguageWare)
• Collocate to link activity references to time, duration, and resources (Voyeur)
The Process
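As a rough illustration of what the frequency and collocation steps above compute, here is a minimal Python sketch. It is not how Voyeur or TAPoR are implemented: the tokeniser (lower-cased runs of letters), the stopword set, the window size and the sample sentence are all assumptions made for the example, and the NLP step of isolating known farm activities is not shown.

```python
import re
from collections import Counter


def tokenise(text):
    """Lower-case the text and split it into runs of letters (an assumed, very simple tokeniser)."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(text, stopwords=frozenset()):
    """Count each token, skipping any that appear in `stopwords`."""
    return Counter(t for t in tokenise(text) if t not in stopwords)


def collocates(text, target, window=3):
    """Count tokens appearing within `window` positions either side of each `target` occurrence."""
    tokens = tokenise(text)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), i + window + 1
            # Skip the target occurrence itself (position i), keep its neighbours.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


if __name__ == "__main__":
    # Invented sample text for illustration; NOT a transcription of the Sunter diary.
    entry = ("Monday fine day hauling hay from the east field. "
             "Tuesday rain, mended harness. Wednesday hauling hay all day with the team.")
    stop = {"the", "from", "with", "all"}
    print(word_frequencies(entry, stop).most_common(3))
    print(collocates(entry, "hay", window=2).most_common(3))
```

Collocation is what links an activity word to nearby words for time, duration or resources; a real study would tune the window and stopword list to the diary's language.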
Medical Diary by BlueChillies
Example: Medical Diary
History flow by Martin Wattenberg and Fernanda Viegas
Example: History Flow
The Result / New Patterns
• Less time haying
• The impact of technology
• More tasks faster
The Result / New Patterns
How Else Could This Be Done?
• Easier to compare over intervals
• Multiple vectors with greater granularity in a compressed space
• The challenge is to find rich enough source materials to yield substantive datasets
What Is the Value of This Visualisation?
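Comparing activities over intervals comes down to binning dated mentions into periods and normalising within each period, so that years with more diary entries do not dominate the picture. A minimal sketch, assuming the diary has already been reduced to hypothetical (date, activity) pairs; the records below are invented for illustration and are not drawn from the Sunter diary.

```python
from collections import Counter, defaultdict
from datetime import date


def activity_share_by_year(records):
    """records: iterable of (datetime.date, activity label).
    Returns {year: {activity: share of that year's activity mentions}}, years in order."""
    per_year = defaultdict(Counter)
    for day, activity in records:
        per_year[day.year][activity] += 1
    shares = {}
    for year, counts in sorted(per_year.items()):
        total = sum(counts.values())
        shares[year] = {act: n / total for act, n in counts.items()}
    return shares


if __name__ == "__main__":
    # Hypothetical records invented for this example.
    toy = [
        (date(1858, 7, 5), "haying"),
        (date(1858, 7, 6), "haying"),
        (date(1858, 7, 7), "fencing"),
        (date(1859, 7, 5), "haying"),
        (date(1859, 7, 6), "threshing"),
    ]
    for year, share in activity_share_by_year(toy).items():
        print(year, {a: round(s, 2) for a, s in share.items()})
```

Swapping `day.year` for `(day.year, day.month)` would give finer-grained intervals for the same kind of comparison.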
The Tree Map
Example: Newsmap
Example: Panopticon
• What are we studying?
–Self-declared occupations of politicians
• Why?
–What bias might they bring to their job?
• How?
–Visualising past occupations and mapping them to the political platform of each politician's party
Case Study: Occupations of Politicians
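A tree map divides a rectangle into areas proportional to counts, and nesting lets occupations sit inside parties. The sketch below uses the simple slice-and-dice layout (not the squarified variant many treemap tools prefer); the function names, party names and counts are hypothetical placeholders, not figures from the Dáil or Parliament data.

```python
def slice_and_dice(sizes, x, y, width, height, along_x=True):
    """Split rectangle (x, y, width, height) into strips with areas proportional to `sizes`
    (label -> positive number). along_x=True gives side-by-side columns, False stacked rows.
    Returns {label: (x, y, w, h)}."""
    total = sum(sizes.values())
    rects = {}
    offset = 0.0
    for label, size in sizes.items():
        frac = size / total
        if along_x:
            rects[label] = (x + offset, y, width * frac, height)
            offset += width * frac
        else:
            rects[label] = (x, y + offset, width, height * frac)
            offset += height * frac
    return rects


def nested_treemap(tree, width, height):
    """Two-level treemap: tree is {group: {item: count}}. Groups become columns and
    items become rows inside their group's column (direction alternates per level)."""
    group_sizes = {g: sum(items.values()) for g, items in tree.items()}
    layout = {}
    for g, (gx, gy, gw, gh) in slice_and_dice(group_sizes, 0, 0, width, height, along_x=True).items():
        for item, rect in slice_and_dice(tree[g], gx, gy, gw, gh, along_x=False).items():
            layout[(g, item)] = rect
    return layout


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not real Dáil or Parliament data.
    occupations = {
        "Party A": {"farmer": 2, "lawyer": 2},
        "Party B": {"teacher": 1, "lawyer": 3},
    }
    for (party, occ), (x, y, w, h) in nested_treemap(occupations, 80, 40).items():
        print(f"{party:8} {occ:8} x={x:5.1f} y={y:5.1f} w={w:5.1f} h={h:5.1f}")
```

Slice-and-dice is the easiest layout to reason about, but it produces long thin rectangles when counts are uneven, which is why squarified layouts are often preferred for reading.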
Occupations of TDs in the 30th Dáil
Occupations of MPs in the 2nd Parliament
Occupations of MPs in the 37th Parliament
• The emergence of the professional politician with no private-sector experience
• Occupational continuity across changes in governing party
The Result/ New Patterns
How Else Could This Be Done?
• New ways of presenting allow new ways of seeing
• Hidden patterns become evident
• Suggest other hypotheses to test
The Value of Data Vis for Analysis
Basic Terms
Data Mining
Statistics
Structured/Unstructured Data
Visualisation
Modelling
Types of Data to Visualise
Audio Data
Categorical Data
Cartographic Data
Collections
Image Data (Still, Moving)
Metadata
Multimedia Data
Network Data (Social, Other)
Numerical Data
Temporal Data
Textual Data (Narrative, Qualitative)
????
General Steps in Data Vis for DH
Discovery / Acquisition
Cleaning / ‘Munging’
Analysis / Exploratory Vis
Presentation
Discovery / Acquisition
Original Research
Spreadsheets
Databases
Digitized Media
Other Downloads
Public Data
Archives/Libraries
Academic Partners
Purchase
Scraping
Junar
OutWit Hub
ScraperWiki
Demo/Hands-On: Junar (http://www.junar.com)
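Junar, OutWit Hub and ScraperWiki are hosted tools, but the idea underneath is the same: pull structured values out of a web page. A minimal sketch using only Python's standard `html.parser`, assuming the data sits in a simple, well-formed, non-nested HTML table; the class and function names are hypothetical, and fetching the page and checking the site's terms of use are left out.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped into one list per <tr>.
    Assumes simple, well-formed, non-nested tables."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the cell text of every table row found in an HTML string."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    sample = "<table><tr><th>Year</th><th>Count</th></tr><tr><td>1858</td><td> 12 </td></tr></table>"
    print(scrape_table(sample))
```

In practice the HTML string would first be downloaded, for example with `urllib.request.urlopen(url).read().decode()`; the rows can then be written out with the `csv` module for cleaning.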
Cleaning / Munging (Normalisation, Format Conversion)
Tools: Data Wrangler
Google Refine
Mr. Data Converter
Data Wrangler
Performs simple transforms on data: split, clear, fold/unfold
See example: Data and Script
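The split and fold/unfold transforms can be expressed in a few lines of plain Python over a list of dictionaries. This is a sketch of what the operations do, not Data Wrangler's own script format; the function names, column names and sample rows are hypothetical, and the "clear" transform is not shown.

```python
def split_column(rows, column, sep, new_names):
    """Split one field into several new fields on `sep`, padding with "" if pieces are missing."""
    out = []
    for row in rows:
        parts = row[column].split(sep, len(new_names) - 1)
        parts += [""] * (len(new_names) - len(parts))
        new_row = {k: v for k, v in row.items() if k != column}
        new_row.update(zip(new_names, (p.strip() for p in parts)))
        out.append(new_row)
    return out


def fold(rows, id_cols, key_name="key", value_name="value"):
    """Wide -> long: every non-id column becomes its own (key, value) row."""
    long_rows = []
    for row in rows:
        for col, val in row.items():
            if col not in id_cols:
                rec = {c: row[c] for c in id_cols}
                rec[key_name] = col
                rec[value_name] = val
                long_rows.append(rec)
    return long_rows


def unfold(rows, id_cols, key_name="key", value_name="value"):
    """Long -> wide: the inverse of fold, one row per combination of id values."""
    wide = {}
    for row in rows:
        ident = tuple(row[c] for c in id_cols)
        rec = wide.setdefault(ident, {c: row[c] for c in id_cols})
        rec[row[key_name]] = row[value_name]
    return list(wide.values())


if __name__ == "__main__":
    # Hypothetical rows for illustration only.
    print(split_column([{"name": "Sunter, William"}], "name", ",", ["surname", "forename"]))
    wide_rows = [{"place": "Place A", "1851": "10", "1861": "8"}]
    long_rows = fold(wide_rows, ["place"])
    print(long_rows)
    print(unfold(long_rows, ["place"]))
```

Folding to the long form is usually what charting tools want (one observation per row), while unfolding suits side-by-side reading in a spreadsheet.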
Google Refine
Works with larger datasets
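One of Refine's most useful cleaning features is clustering near-duplicate spellings of the same value. The sketch below follows the key-collision "fingerprint" idea described in the OpenRefine documentation (Google Refine was later renamed OpenRefine): trim, lower-case, fold accents to ASCII, strip punctuation, then sort and de-duplicate the tokens. Treat it as an approximation; the function names are hypothetical and Refine's exact normalisation rules may differ in detail.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Key-collision key in the spirit of Refine's 'fingerprint' method."""
    s = value.strip().lower()
    # Fold accented characters to plain ASCII, e.g. 'dáil' -> 'dail'.
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    s = re.sub(r"[^\w\s]", "", s)      # drop punctuation
    tokens = sorted(set(s.split()))     # de-duplicate and sort whitespace-separated tokens
    return " ".join(tokens)


def cluster(values):
    """Group values whose fingerprints collide; keep only groups with more than one spelling."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    names = ["Sunter, William", "William Sunter", "  william sunter ", "Ontario"]
    print(cluster(names))
```

Each cluster is only a suggestion: a person still decides which spelling to keep, which is how Refine presents clusters too.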
Hands-On: Data Wrangler (http://vis.stanford.edu/wrangler/app/)
Hands-On: Google Refine (http://code.google.com/p/google-refine/)
Hands-On: Mr. Data Converter (http://shancarter.com/data_converter/)
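Mr. Data Converter takes pasted spreadsheet rows and re-emits them in other formats. A minimal standard-library equivalent for one target format, JSON, assuming the first row is a header; tab is the default delimiter because spreadsheets usually copy cells to the clipboard as tab-separated text. The function name is hypothetical, and all values stay as strings, so converting numbers is a separate cleaning decision.

```python
import csv
import io
import json


def delimited_to_json(text, delimiter="\t"):
    """Convert delimited text whose first line is a header into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return json.dumps(list(reader), indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical pasted spreadsheet rows.
    pasted = "name\tvalue\nalpha\t1\nbeta\t2\n"
    print(delimited_to_json(pasted))
```

For comma-separated input, call `delimited_to_json(text, delimiter=",")`.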
Analysis / Exploratory Visualisation
Web Services
Google Fusion Tables
Google Spreadsheets
IBM Many Eyes
TimeFlow
Applications
Tableau/Tableau Public
MS Office
OpenOffice
Gephi
NodeXL (plug-in for Excel)
Spotfire
R
Processing
Google Ngram Viewer
Examines word frequency in digitised books
Currently about 4% of books ever published
In English, Chinese, French, German, Hebrew, Russian, and Spanish
Changes in word usage
Trends
Check out the Cultural Observatory @ Harvard
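The Ngram Viewer plots how often a word appears in each year relative to all the words in that year's books, rather than raw counts, so that years with more printed material do not automatically look more important. A minimal sketch of that normalisation, assuming a hypothetical list of (year, text) pairs, a very simple tokeniser and single-word (unigram) queries only.

```python
import re
from collections import Counter


def relative_frequency_by_year(docs, word):
    """docs: iterable of (year, text). Returns {year: occurrences of `word` / all tokens that year}."""
    hits = Counter()
    totals = Counter()
    for year, text in docs:
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(word)
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


if __name__ == "__main__":
    # Hypothetical mini-corpus for illustration only.
    corpus = [(1850, "the hay the oats"), (1850, "oats"), (1860, "hay hay")]
    print(relative_frequency_by_year(corpus, "hay"))
```

Plotting these ratios per year gives the kind of trend line the Ngram Viewer shows.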
Wordle
Visually presents word frequency using size, weight and colour
Consider “Word Clouds Considered Harmful”
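Size is a tricky encoding. One concrete issue: if font size grows linearly with frequency, the area a word occupies grows roughly with the square of its frequency, which exaggerates differences. The sketch below maps counts to font sizes with an optional square-root transform so that area tracks frequency more closely; the point range, the transform and the function name are assumptions for illustration, not Wordle's documented behaviour.

```python
import math


def font_sizes(freqs, min_pt=10, max_pt=48, scale="sqrt"):
    """Map word counts to font sizes between min_pt and max_pt.
    scale="sqrt" makes rendered area roughly proportional to count; anything else is linear."""
    transform = math.sqrt if scale == "sqrt" else (lambda n: n)
    lo = transform(min(freqs.values()))
    hi = transform(max(freqs.values()))
    span = (hi - lo) or 1  # avoid dividing by zero when every count is equal
    return {w: min_pt + (transform(n) - lo) / span * (max_pt - min_pt) for w, n in freqs.items()}


if __name__ == "__main__":
    counts = {"hay": 100, "oats": 25, "rain": 1}  # hypothetical counts
    print(font_sizes(counts, 10, 46))
    print(font_sizes(counts, 10, 46, scale="linear"))
```

Even with this correction, longer words take up more space than shorter ones with the same count, which is one reason to read a word cloud as a prompt for questions rather than as a measurement.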
Exercise
Choose a dataset from a source such as:
The CSO (Central Statistics Office)
Project Gutenberg
or your own material
Choose an appropriate data visualisation from one of the web services we explored in the workshop.
Explain the process and how you made your choice, then embed the visualisation in your own blog on wordpress.com, as we explored last week.
Suggest a research question that can be answered by using this data visualisation as a research environment.
Send the link to me at: [email protected]
Maybe: http://politicalreform.ie/2011/12/04/state-of-enda-sunday-business-post-red-c-poll-4th-september-2011/