Posted on 05-Dec-2014
Elizabeth F. Churchill
Data by Design
Design/Science of participation
(1) Science through platforms for mediated communication: technology-mediated social participation (TMSP)
(2) Science on (social science contributions about fundamentals of psychology/communication/collaboration/cooperation)
“Hubble telescope” of social science
WE NEED TO ADDRESS THE DESIGN OF DATA (FOR) SCIENCE ISSUE DIRECTLY
On (1) – TMSP via SMPs
Awareness: conversation and content exchange good; content storage, indexing and search poor
Content sharing: malleable as well as stable content
Coordination: long and short term
Collaborative production: lightweight to complex
Longevity: currently questionable…
Cooperative activities, centralised
Collective action, decentralised
Collective action, centralised
On (2) – Sciences of the social
Data quality: descriptive/predictive; observed/understood; local/universal; reactive/proactive; stand-alone/replicated
Science quality: data stability/longevity, terms of service (TOS), content and social responsibility
WE NEED TO ADDRESS THE DESIGN OF DATA (FOR) SCIENCE ISSUE DIRECTLY
Designers : Statisticians : Computer scientists : Data Scientists : Social scientists
Focus on (2)
Mike Loukides, http://radar.oreilly.com/2010/06/what-is-data-science.html
On Data Science
“What differentiates data science from statistics is that data science is a holistic approach. We’re increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.”
The first step of any data analysis project is “data conditioning,” or getting data into a state where it’s usable.
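To make "data conditioning" concrete, here is a minimal Python sketch under stated assumptions: the raw records, their field names (user, posted, likes) and the accepted date layouts are all hypothetical, chosen only for illustration. It normalises user names, parses dates and counts, and drops rows it cannot use.

```python
# A minimal sketch of "data conditioning": turning loosely formatted records
# gathered "in the wild" into a uniform, typed structure. The record format,
# field names and date layouts are hypothetical, for illustration only.
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, List, Optional


@dataclass
class Post:
    user: str
    posted: date
    likes: int


def parse_date(text: str) -> Optional[date]:
    """Try a few assumed date layouts; return None if none match."""
    for fmt in ("%Y-%m-%d", "%d-%b-%Y", "%d/%m/%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def condition(raw_rows: Iterable[dict]) -> List[Post]:
    """Normalise names, parse dates and counts, and drop unusable rows."""
    clean = []
    for row in raw_rows:
        user = (row.get("user") or "").strip().lower()
        posted = parse_date(row.get("posted") or "")
        try:
            # Accept counts written with thousands separators, e.g. "1,204".
            likes: Optional[int] = int(str(row.get("likes", "0")).replace(",", ""))
        except ValueError:
            likes = None
        if user and posted is not None and likes is not None:
            clean.append(Post(user, posted, likes))
    return clean
```

Dropping unparseable rows is only one choice; for a scientific dataset it may be better to keep and flag them, so the conditioning step itself can be inspected and repeated.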
On Data Science
The most meaningful definition I’ve heard: “big data” is when the size of the data itself becomes part of the problem.
The need to define a schema in advance conflicts with reality of multiple, unstructured data sources, in which you may not know what’s important until after you’ve analyzed the data.
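One common response to this tension is "schema on read": store raw records untouched and decide on a schema only after exploring them. The sketch below is illustrative only; the sources, field names and the mapping passed to project() are all hypothetical.

```python
# A minimal sketch of "schema on read": raw records from heterogeneous sources
# are stored untouched, and a schema (which fields matter, and under which
# names) is applied only when a question is asked. Sources and field names
# are hypothetical.
import json
from collections import Counter
from typing import Dict, Iterable, List

RAW = [
    '{"source": "photos", "owner": "ann", "tags": ["museum", "1920s"]}',
    '{"source": "comments", "author": "ben", "text": "Great shot", "tags": ["museum"]}',
    '{"source": "bookmarks", "user": "cy", "url": "http://example.org"}',
]


def field_inventory(lines: Iterable[str]) -> Counter:
    """Explore first: count which fields actually occur across all records."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(json.loads(line).keys())
    return counts


def project(lines: Iterable[str], mapping: Dict[str, str]) -> List[dict]:
    """Apply a schema after the fact: rename source-specific fields to common
    names, keeping only records that contain at least one mapped field."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {common: record[raw] for raw, common in mapping.items() if raw in record}
        if row:
            rows.append(row)
    return rows
```

For example, project(RAW, {"owner": "person", "author": "person", "user": "person"}) yields one {"person": ...} row per record, even though each source names that field differently. If a record carried more than one mapped field, the later entry in the mapping would win.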
On Data Science
Data scientists … come up with new ways to view the problem, or to work with very broadly defined problems: “here’s a lot of data, what can you make from it?”
The future belongs to the companies who figure out how to collect and use data successfully.
…and the scientists?
Business logic is not science logic
http://www.forbes.com/sites/onmarketing/2012/06/28/social-media-and-the-big-data-explosion/
Data – the ‘this is the dataset’ problem
Verbeeldingskr8 on Flickr
Interface elements
…lead to data, inviting action and inviting information
Like! Like? Agree! Disagree! (bookmarked) Hello Sherry
Dating
profile creation
explicit versus passive “personalisation”
Anxiety, self-reflection, identity…
Eva Illouz
Flickr
Recording and Sharing
Documenting: Personal and Collective Memory
Competition: Status
Affiliation: Group Membership
Learning: Emulating
Awareness: Near and Far
Curiosity/Voyeurism
Flickr – Photo sharing by user location
Flickr Commons members include the Library of Congress, the Powerhouse Museum, the Smithsonian, New York Public Library, and Cornell University Library
http://www.flickr.com/photos/powerhouse_museum/2980051095/
http://www.museumsandtheweb.com/mw2011/papers/rethinking_evaluation_metrics_in_light_of_flic
Data longevity
“Like all Commons members, the other qualitative measure we value highly is the sheer inventiveness of Flickr members who engage with the photographs.
Currently, Cornell saves links to examples of reuse on delicious (http://www.delicious.com) and displays them as a feed on its website.”
Business logic is not science logic
Design/Science of participation
(1) Science through platforms for mediated communication: technology-mediated social participation (TMSP)
(2) Science on (social science contributions about fundamentals of collaboration/cooperation)
“Hubble telescope” of social science
Reflections on requirements
Stability – the existence of content in an accessible (and hopefully the same) format over time
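Whether content is still "the same" over time can at least be checked mechanically. A minimal sketch, assuming datasets are plain files and that a SHA-256 digest was recorded when each was archived; the manifest format and function names are hypothetical. A matching digest shows the bytes are unchanged, not that the format is still readable by current tools.

```python
# A minimal sketch of a "stability" check: record a SHA-256 digest of a
# dataset when it is archived, then recompute it later to see whether the
# content is still byte-for-byte the same. The manifest layout is hypothetical.
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_stability(manifest: dict) -> dict:
    """Compare recorded digests with current ones.

    manifest maps file paths (str) to the hex digest recorded at archive time.
    Returns a status per path: "same", "changed", or "missing".
    """
    status = {}
    for name, recorded in manifest.items():
        path = Path(name)
        if not path.exists():
            status[name] = "missing"
        elif sha256_of(path) == recorded:
            status[name] = "same"
        else:
            status[name] = "changed"
    return status
```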
Science requires:
Consistency: consistently re-coding the same data in the same way over a period of time
Reproducibility: the tendency for a group of coders to classify category membership in the same way
Accuracy: the extent to which the classification of a text corresponds statistically to a standard or norm
Validity: correspondence of the categories to the conclusions, avoiding ambiguity and addressing multiple possible classifications
Proof: trust in the inferential procedures and clarity about what level of implication is allowed, i.e. do the conclusions follow from the data, or can they be explained by some other phenomenon?
Generalizability: generalizing results to a theory; cross-setting comparative interventions
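Consistency and reproducibility can both be quantified as agreement between two codings of the same items: two coders (reproducibility), or one coder at two points in time (consistency). The slides do not name a measure; the sketch below uses Cohen's kappa as one common choice, and Krippendorff's alpha is a frequently used alternative in content analysis. The labels and data are hypothetical.

```python
# A minimal sketch of measuring reproducibility (two coders) or consistency
# (one coder re-coding the same items later) with Cohen's kappa, one common
# chance-corrected agreement statistic.
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each coding's label
    frequencies."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty codings")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both codings used one identical label throughout; kappa is
        # undefined here, and this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

With coder_1 = ["pos", "pos", "neg", "neg", "neu", "pos"] and coder_2 = ["pos", "neg", "neg", "neg", "neu", "pos"], observed agreement is 5/6, chance agreement is 13/36, and kappa is 17/23, roughly 0.74.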
On (2) – Sciences of the social
Data quality: descriptive/predictive; observed/understood; local/universal; reactive/proactive; stand-alone/replicated
Science quality: data stability/longevity, terms of service (TOS), content and social responsibility
WE NEED TO ADDRESS THE DESIGN OF DATA (FOR) SCIENCE ISSUE DIRECTLY
Designers : Statisticians : Computer scientists : Data Scientists : Social scientists
Questions?
churchill@acm.org
xeeliz on Twitter
Acknowledgements
On dating: Elizabeth Goodman; on Flickr: Shyong (Tony) Lam, on instrumentation and analysis: David Ayman Shamma & M. Cameron Jones; on Flickr Commons: George Oates
Flickr photographers: Marina Noordegraaf (Verbeeldingskr8), Tim Jagenberg, Nicolas Nova