Posted on 27-Jan-2015
Seminar 11: Maps, Timelines, Big Data, and Visualization
Introduction to the Digital Liberal Arts
MDST 3703 / 7703
Fall 2010
Business
• Quiz 2 available in Collab after this class
• Same protocol as Quiz 1 and Midterm
• Due by start of class Thursday
Review
• http://www.lkozma.net/wpv/index.html
• The Blogosphere, Wikiverse, and other “regions” of the web have produced massive, aggregated sources of information – Big Data
• An unintended consequence is that these sources are now being mined for patterns
  – Freebase, DBpedia, Facebook, etc.
• As a result, a new level of information is emerging on the web – the datasphere
Overview
• The datasphere raises two big questions
  – What can we do with it?
  – What will it do with us?
• Today, we look at both questions
What can we do with Big Data?
Different Approaches
• Traditional approaches
  – Geographical data (Elliott and Gillies)
  – Historical data (Robertson)
• Radical approaches
  – Distant Reading (Moretti)
  – Cultural Analytics (Manovich)
Geographical Data (Places)
• Geographical data are low-hanging fruit
  – Names can be extracted from a variety of sources and then “meshed” with gazetteers
  – e.g. GeoNames http://www.geonames.org/
• Maps can help visualize that data
• Maps can also serve as an interface to the data
• Elliott and Gillies exemplify this approach in Classics
http://books.google.com/books?id=ao8oG7xRRBUC&dq=the+writings+of+thomas+jefferson
The writings of Thomas Jefferson (Google Books)
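The extract-and-mesh step described on the Geographical Data slide can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a tiny hand-made gazetteer (`GAZETTEER`, a hypothetical stand-in for a real source such as GeoNames, with approximate coordinates) and uses naive capitalized-word matching in place of real named-entity recognition.

```python
import re

# Hypothetical mini-gazetteer: place name -> (latitude, longitude).
# A real project would look names up in a full gazetteer such as GeoNames.
GAZETTEER = {
    "Monticello": (38.01, -78.45),
    "Richmond": (37.54, -77.44),
    "Paris": (48.86, 2.35),
}


def extract_candidates(text):
    """Naively treat every capitalized word as a possible place name."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)


def mesh_with_gazetteer(text, gazetteer=GAZETTEER):
    """Keep candidates the gazetteer knows, paired with their coordinates."""
    seen = set()
    places = []
    for name in extract_candidates(text):
        if name in gazetteer and name not in seen:
            seen.add(name)
            places.append((name, gazetteer[name]))
    return places


sample = "Jefferson left Monticello for Richmond, and later sailed to Paris."
for name, (lat, lon) in mesh_with_gazetteer(sample):
    print(f"{name}: {lat}, {lon}")
```

The resulting (name, coordinates) pairs are what a map layer would plot. Naive matching like this misses multi-word names such as "New York" and cannot tell apart places that share a name (there are many Richmonds), which is why real gazetteers carry extra context such as country and feature type.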
Historical Data (Events)
• HEML (Historical Event Markup Language) provides a model for defining events
  – Written in RDF
• Can be used to extract events from texts or convert them from other formats
  – CIDOC-CRM
  – Semantic MediaWiki
• These can be aggregated and visualized
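To make the aggregate-and-visualize idea concrete, here is a minimal Python sketch. It does not use HEML or RDF syntax; it assumes each event has already been extracted into a simple record with a label, a year (negative for BCE), and a source. The `Event` type, its field names, the source labels, and the `aggregate` helper are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    label: str
    year: int     # negative numbers stand for BCE years
    source: str   # hypothetical label for where the record came from


def aggregate(*sources):
    """Merge event lists, keeping one record per (label, year), sorted by date."""
    merged = {}
    for events in sources:
        for event in events:
            # The first source to mention an event wins; later duplicates are dropped.
            merged.setdefault((event.label, event.year), event)
    return sorted(merged.values(), key=lambda e: e.year)


def format_year(year):
    return f"{-year} BCE" if year < 0 else f"{year} CE"


wiki_events = [
    Event("Battle of Marathon", -490, "wiki"),
    Event("Assassination of Julius Caesar", -44, "wiki"),
]
course_events = [
    Event("Death of Alexander the Great", -323, "course notes"),
    Event("Battle of Actium", -31, "course notes"),
    Event("Battle of Marathon", -490, "course notes"),  # same event, second source
]

for event in aggregate(wiki_events, course_events):
    print(f"{format_year(event.year):>8}  {event.label}")
```

A timeline tool would then plot records like these. The sketch sidesteps real difficulties such as uncertain or ranged dates and the absence of a year 0 between 1 BCE and 1 CE.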
HEML Sample
Top-level events in an RDF dataset accumulated with Semantic MediaWiki for a first-year Ancient History course
Timeline Software
• Dipity
  – http://www.dipity.com/
• SIMILE
  – http://www.simile-widgets.org/timeline/
Time Maps
• Google Timemap
  – http://code.google.com/p/timemap/
  – http://ontoligent.com/jah/timemaps/hd-tm-1858.html
• TimeMap
  – http://www.timemap.net/
• VisualEyes
  – http://viseyes.org/
• HyperCities
  – http://hypercities.com/
Cultural Analytics
• Lev Manovich
• Applies interactive visualization to Big Data
• http://lab.softwarestudies.com/2008/09/cultural-analytics.html
Distant Reading
• Franco Moretti
• Part of a long tradition of “statistical criticism”
• Influenced by the French historian Fernand Braudel
One of Moretti’s graphs shows the emergence of the market for novels in Britain, Japan, Italy, Spain, and Nigeria between about 1700 and 2000. In each case, the number of new novels produced per year grows -- not at the smooth, gradual pace one might expect, but with the wild upward surge one might expect of a lab rat’s increasing interest in a liquid cocaine drip.
“Five countries, three continents, over two centuries apart,” writes Moretti, “and it’s the same pattern ... in twenty years or so, the graph leaps from five [to] ten new titles per year, which means one new novel every month or so, to one new novel per week. And at that point, the horizon of novel-reading changes. As long as only a handful of new titles are published each year, I mean, novels remain unreliable products, that disappear for long stretches of time, and cannot really command the loyalty of the reading public; they are commodities, yes, but commodities still waiting for a fully developed market.”
But as that market emerges and consolidates itself -- with at least one new title per week becoming available -- the novel becomes “the great capitalist oxymoron of the regular novelty: the unexpected that is produced with such efficiency and punctuality that readers become unable to do without it.”
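The arithmetic behind Moretti's point is simple: at n new titles per year, a reader waits on average 365 / n days for the next one. The Python sketch below computes that interval and adds a naive "take-off" check over a yearly series of title counts. The `find_takeoff` function, its thresholds (10 and 52 titles per year, a 20-year window), and the toy series are assumptions for illustration, not Moretti's data or method.

```python
def days_between_titles(titles_per_year):
    """Average number of days between new titles at a steady yearly rate."""
    return 365 / titles_per_year


def find_takeoff(counts, low=10, high=52, window=20):
    """Return (start, end) for the first year `start` with at most `low` new
    titles that is followed, within `window` years, by a year `end` with at
    least `high` titles (about one per week). Return None if there is none.
    `counts` maps year -> number of new titles published that year."""
    years = sorted(counts)
    for start in years:
        if counts[start] > low:
            continue
        for end in years:
            if start < end <= start + window and counts[end] >= high:
                return start, end
    return None


for rate in (5, 10, 52):
    print(f"{rate} titles/year: one new title every {days_between_titles(rate):.1f} days")

# Made-up counts, only to exercise the function; these are NOT Moretti's data.
toy_counts = {1780: 6, 1790: 9, 1800: 30, 1805: 60, 1810: 70}
print(find_takeoff(toy_counts))
```

At 52 titles a year the wait drops to about a week, the threshold Moretti describes. The fixed 20-year window simply mirrors his "in twenty years or so" and would need tuning against real publication series.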
What are some similarities and differences between the traditional and radical approaches?
Digital Traditionalists and Radicals
• Similarities
  – Visualization
  – Pattern recognition
  – Desire to express data in terms of RDF, etc.
    • Allows programs to aggregate, mash up, and analyze data
• Differences
  – Traditionalists favor metadata and ontologies, whereas the radicals believe the data will “speak for themselves”
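Why does expressing data in RDF let programs aggregate and mash up sources? Statements from different sources that use the same identifier for a thing can simply be pooled. A minimal Python sketch, assuming two hypothetical sources already expressed as (subject, predicate, object) triples; the `ex:` identifiers, predicate names, and the `mash_up` helper are invented for illustration and are not real DBpedia or GeoNames vocabulary.

```python
from collections import defaultdict

# Two hypothetical sources describing overlapping things as triples.
# The shared subject identifier ("ex:Monticello") is what makes merging possible.
encyclopedia = {
    ("ex:Monticello", "ex:builtBy", "ex:Thomas_Jefferson"),
    ("ex:Thomas_Jefferson", "ex:born", "1743"),
}
place_index = {
    ("ex:Monticello", "ex:latitude", "38.01"),
    ("ex:Monticello", "ex:longitude", "-78.45"),
}


def mash_up(*graphs):
    """Union the triple sets, then group values by subject and predicate."""
    merged = set().union(*graphs)
    by_subject = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in merged:
        by_subject[subject][predicate].add(obj)
    return by_subject


combined = mash_up(encyclopedia, place_index)
for predicate, values in sorted(combined["ex:Monticello"].items()):
    print(predicate, sorted(values))
```

The merge works only because both sources used the same identifier for Monticello; reconciling identifiers across real sources is a large part of the metadata and ontology work that the traditionalists favor.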
What will Big Data do to us?
Anderson
• End of theory and models
• Compare to Shirky
• A different relationship to data
• Is the difference quantitative?