Elizabeth Churchill, "Data by Design"

Post on 05-Dec-2014

Elizabeth F. Churchill

Data by Design

Design/Science of participation

(1) Science through (platforms for mediated communication) TMSP

(2) Science on (social science contributions about fundamentals of psychology/communication/collaboration/cooperation)

“Hubble telescope” of social science

WE NEED TO ADDRESS THE DESIGN OF DATA (FOR) SCIENCE ISSUE DIRECTLY

On (1) – TMSP via SMPs

Awareness: conversation and content exchange good; content storage, indexing and search poor

Content sharing: malleable as well as stable content

Coordination: long and short term

Collaborative production: lightweight to complex

Longevity: currently questionable…

Cooperative activities, centralised

Collective action, decentralised

Collective action, centralised

On (2)- Sciences of the social

Data quality: descriptive/predictive; observed/understood; local/universal; reactive/proactive; stand-alone/replicated

Science quality: data stability/longevity, TOS, content and social responsibility

WE NEED TO ADDRESS THE DESIGN OF DATA (FOR) SCIENCE ISSUE DIRECTLY

Designers : Statisticians : Computer scientists : Data Scientists : Social scientists

Focus on (2)

Mike Loukides, http://radar.oreilly.com/2010/06/what-is-data-science.html

On Data Science

“What differentiates data science from statistics is that data science is a holistic approach. We’re increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.”

The first step of any data analysis project is “data conditioning,” or getting data into a state where it’s usable.

On Data Science

The most meaningful definition I’ve heard: “big data” is when the size of the data itself becomes part of the problem.

The need to define a schema in advance conflicts with reality of multiple, unstructured data sources, in which you may not know what’s important until after you’ve analyzed the data.
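
The point about not knowing the schema in advance can be illustrated with a minimal sketch that inspects records after collection rather than fixing their fields beforehand. The function name `infer_fields` and the sample records are hypothetical and are not taken from the talk or from the quoted article.

```python
from collections import defaultdict


def infer_fields(records):
    """Tally, per field name, how many records carry it and which value types occur."""
    summary = defaultdict(lambda: {"count": 0, "types": set()})
    for record in records:
        for field, value in record.items():
            summary[field]["count"] += 1
            summary[field]["types"].add(type(value).__name__)
    return dict(summary)


# Hypothetical records from three differently shaped sources.
records = [
    {"user": "a", "tags": ["museum"]},
    {"user": "b", "location": "Sydney"},
    {"user": "c", "tags": "archive", "views": 12},
]

fields = infer_fields(records)
for name, info in sorted(fields.items()):
    print(name, info["count"], sorted(info["types"]))
```

Here the inconsistent `tags` field (a list in one source, a string in another) only becomes visible once the data has been examined, which is the situation the quotation describes.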

On Data Science

Data scientists … come up with new ways to view the problem, or to work with very broadly defined problems: “here’s a lot of data, what can you make from it?”

The future belongs to the companies who figure out how to collect and use data successfully.

…and the scientists?

Business logic is not science logic

http://www.forbes.com/sites/onmarketing/2012/06/28/social-media-and-the-big-data-explosion/

Data – the ‘this is the dataset’ problem

Verbeeldingskr8 on Flickr

Interface elements

….lead to data, inviting action and inviting information

Facebook

Like! Like? Agree! Disagree! (bookmarked) Hello Sherry

Dating

profile creation

explicit versus passive “personalisation”

Anxiety, self reflection, identity….

Eva Illouz

Flickr

Recording and Sharing

Documenting: Personal and Collective Memory

Competition: Status

Affiliation: Group Membership

Learning: Emulating

Awareness: Near and Far

Curiosity/Voyeurism

Flickr – Photo sharing by user location

The Library of Congress, the Powerhouse Museum, the Smithsonian, New York Public Library, and Cornell University Library

http://www.flickr.com/photos/powerhouse_museum/2980051095/

http://www.museumsandtheweb.com/mw2011/papers/rethinking_evaluation_metrics_in_light_of_flic

Data longevity

“Like all Commons members, the other qualitative measure we value highly is the sheer inventiveness of Flickr members who engage with the photographs.

Currently, Cornell saves links to examples of reuse on delicious (http://www.delicious.com) and displays them as a feed on its website.”

Business logic is not science logic

Design/Science of participation

(1) Science through (platforms for mediated communication) TMSP

(2) Science on (social science contributions about fundamentals of collaboration/cooperation)

“Hubble telescope” of social science

Reflections on requirements

Stability – the existence of content in an accessible (and hopefully the same) format over time

Science requires:

Consistency: consistently re-coding the same data in the same way over a period of time

Reproducibility: the tendency for a group of coders to classify category membership in the same way

Accuracy: the extent to which the classification of a text corresponds statistically to a standard or norm

Validity: correspondence of the categories to the conclusions, avoiding ambiguity and addressing multiple possible classifications

Proof: trust in the inferential procedures and clarity about what level of implication is allowed, i.e. do the conclusions follow from the data, or are they explainable by some other phenomenon?

Generalizability of results to a theory

Cross-setting comparative interventions
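
The consistency and reproducibility requirements above concern whether coders classify the same items in the same way. One common way to quantify this for two coders is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The slides do not name a particular statistic, so the choice of kappa, the function name `cohen_kappa`, and the example labels below are assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Proportion of items on which the two coders gave the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's own label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical category labels assigned to four photographs.
coder_a = ["status", "memory", "status", "memory"]
coder_b = ["status", "memory", "memory", "memory"]
print(cohen_kappa(coder_a, coder_b))  # 0.5
```

Running the same function on one coder's labels from two points in time gives a rough check on consistency, while running it across different coders speaks to reproducibility.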

On (2)- Sciences of the social

Data quality: descriptive/predictive; observed/understood; local/universal; reactive/proactive; stand-alone/replicated

Science quality: data stability/longevity, TOS, content and social responsibility

WE NEED TO ADDRESS THE DESIGN OF DATA (FOR) SCIENCE ISSUE DIRECTLY

Designers : Statisticians : Computer scientists : Data Scientists : Social scientists

Questions?

churchill@acm.org

xeeliz on Twitter

Acknowledgements

On dating: Elizabeth Goodman; on Flickr: Shyong (Tony) Lam; on instrumentation and analysis: David Ayman Shamma & M. Cameron Jones; on Flickr Commons: George Oates

Flickr photographers: Marina Noordegraaf (Verbeeldingskr8), Tim Jagenberg, Nicolas Nova