Studying archives of online behavior


Page 1: Studying archives of online behavior

Studying Archives of Online Behavior

Computational Qualitative Research Seminar

James Howison
University of Texas at Austin

Link to slides on twitter @jameshowison

Readings at https://www.dropbox.com/sh/1gx9s2zlnxvumbz/AAAV9uSAJHsiPeJhSsNnnM9Pa?dl=0

Page 2: Studying archives of online behavior

Readings

• The presentation and discussion will draw on:

– Howison, J., & Crowston, K. (2014). Collaboration through open superposition: A theory of the open source way. MIS Quarterly, 38(1), 29–50.

– Howison, J., Wiggins, A., & Crowston, K. (2011). Validity Issues in the Use of Social Network Analysis with Digital Trace Data. Journal of the Association for Information Systems, 12(12), Article 2.

– Geiger, R. S., & Ribes, D. (2011). Trace Ethnography: Following Coordination through Documentary Practices. In Proceedings of the 44th Hawaii International Conference on System Sciences (HICSS 2011) (pp. 1–10). Waikoloa, HI. http://doi.org/10.1109/HICSS.2011.455

– Annabi, H., Crowston, K., & Heckman, R. (2008). Depicting What Really Matters: Using Episodes to Study Latent Phenomenon. In Proceedings of the International Conference on Information Systems (ICIS).

– The methodological appendix for the Howison and Crowston Superposition article.

Page 3: Studying archives of online behavior

To the Archives!

The evidence is here, somewhere.

CC Credit: http://www.flickr.com/photos/hamadryades/

Page 4: Studying archives of online behavior

Opportunities of online archive studies

• Quantity
• Granularity
• Accessibility
– Much is openly available
– Or the organization can provide bulk access
– (compare to ethnography and getting individual cooperation)
• Emic'ness

Page 5: Studying archives of online behavior

Emic'ness?

Emic: in their words (from the inside)

Etic: in your words (from the outside)

Naturalistic: the archives are primary for the users and for the activity itself:

“documentary traces are the primary mechanism in which users themselves know their distributed communities and act within them.”

(Geiger and Ribes, 2011)

Page 6: Studying archives of online behavior

Yet, many challenges

We are using the system (and the system that archives and presents the traces) as a data collection method.

But the systems were not built for research.

So we need to ask, for any research question: How well do the archives represent the activity, as it happened?

Page 7: Studying archives of online behavior

Individual Exercise (6 mins)

1. Pick a system that renders online archives of something you are interested in.

– Can be your project for this course or something you choose right now.

– Slight preference for an archive showing traces from more than 1 person

2. Go and find a specific archive page and read it.

3. Write a sentence or two about what is happening there.

Page 8: Studying archives of online behavior

Quick Group discussion (4 mins)

• Let’s hear from a few participants about their choices.

Page 9: Studying archives of online behavior

Individual exercise II (6 mins)

• How might archives diverge from experience?
1. How did the system record activity at the time?
2. How did the conversion to archives occur?
3. How is your experience of reading the archives different from the experience of the participants in the activity that was archived?

Page 10: Studying archives of online behavior

Discussion in groups

• Discuss the questions as a group (go question by question, not person by person)
– How recorded? (each person speaks)
– How converted? …
– How is the reading experience different?

Page 11: Studying archives of online behavior

Most surprising?

• One person from each group reports back the aspect that was most surprising.

Page 12: Studying archives of online behavior

Archival transformation

• Deletions
– Some data is periodically purged from databases; after all, they are running a website, not a research database.
• Overlaps
– When database dumps are pulled periodically
• Re-calculations
– Historical depictions on a site (e.g., counts of messages, members, or other data such as downloads) might be later creations or re-calculations
– Can you rely on participants having seen those figures at the time?
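The overlap and deletion problems can be sketched in code. This is a minimal illustration, not a description of any particular system: it assumes each record in a dump carries a stable key (the `id` field here is hypothetical) and that the later copy of a record should win when dumps overlap.

```python
def merge_dumps(dumps):
    """Merge periodic dumps (oldest first) into one record set keyed by id.

    `dumps` is a list of lists of dicts. The `id` key is an assumed,
    hypothetical stable identifier, not a field of any real system.
    """
    merged = {}
    for dump in dumps:
        for record in dump:
            merged[record["id"]] = record  # later dumps overwrite earlier copies
    return merged


def vanished_ids(older, newer):
    """Ids present in the older dump but absent from the newer one."""
    return {r["id"] for r in older} - {r["id"] for r in newer}


january = [{"id": 1, "body": "first post"}, {"id": 2, "body": "reply"}]
february = [{"id": 2, "body": "reply (edited)"}, {"id": 3, "body": "new thread"}]

merged = merge_dumps([january, february])
print(sorted(merged))
print(vanished_ids(january, february))
```

A vanished id is only a candidate deletion. If the dumps cover sliding time windows, the record may simply have fallen outside the newer window, so check how the dumps were produced before reading absence as a purge.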

Page 13: Studying archives of online behavior

Database schemas are not research ontologies

• Databases (or websites) often use words that are very exciting for research
– “Friends”, “Followers”, “Assignment”, “Member”
• But their meaning may have very, very little to do with the sociological/theoretical concept
– At best they are a hint that something interesting is happening, but they are often interpreted literally!
• Examples from Sourceforge
– Use of the “assigned to” field on close.
– The “member list” does not show who is active (no one was ever removed!)

Page 14: Studying archives of online behavior

Non-archived activity

Page 15: Studying archives of online behavior

Reasoning with missing/complete data

• Trouble both ways
• Assuming that the data are complete (rather than a system-selected sample)
– Can miss important activities or whole archives that need to be integrated.
• Oddly enough, when data are complete, issues can also emerge
– See the discussion in the JAIS paper on validity in SNA (Howison, Wiggins, & Crowston, 2011).

Page 16: Studying archives of online behavior

Hidden readership

• Archives almost never tell you who read what, and when they read it.
– Might be key to interpretation (or might be irrelevant)
– Definitely crucial to any argument about information flow (and almost all interpretations of SNA measures are about information flow).
• You may be able to impute readership from responses, but it’s a weak signal.

Page 17: Studying archives of online behavior

Activity traces scattered through archives

• Participants experience a flow of activities across different systems
– Linked by the time and order in which they occur
• But they are archived by different systems
– If you just read the mailing list you miss so much
– And yet so many studies *want* their archive to be the only one (so much easier to analyze).
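One way to recover the flow that participants experienced is to interleave traces from several archives by time. A minimal sketch, assuming each archive has already been reduced to time-ordered rows and that the clocks of the systems are comparable (both are assumptions; the sources, timestamps, and summaries below are invented for illustration):

```python
import heapq
from datetime import datetime


def parse(rows, source):
    """Turn (ISO timestamp, summary) rows into (datetime, source, summary)."""
    return [(datetime.fromisoformat(ts), source, text) for ts, text in rows]


# Hypothetical traces from two systems, each already in time order.
mailing_list = parse([
    ("2010-03-01T09:00", "Proposal posted to the list"),
    ("2010-03-03T17:30", "Release announced"),
], "list")
tracker = parse([
    ("2010-03-01T09:40", "Bug report opened"),
    ("2010-03-02T11:15", "Patch attached"),
], "tracker")

# heapq.merge assumes each input is already sorted by the key.
timeline = list(heapq.merge(mailing_list, tracker, key=lambda e: e[0]))

for when, source, text in timeline:
    print(f"{when:%Y-%m-%d %H:%M}  [{source:7}] {text}")
```

Reading the merged timeline shows the tracker activity that sits between the two list messages, which a mailing-list-only study would not see.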

Page 18: Studying archives of online behavior
Page 19: Studying archives of online behavior

Pacing of activities

• Participant observation in an open source project highlighted the role of pacing.
– Rapid replies indicated interest and importance but also availability
– Very long gaps (sometimes years) indicated deferral and return.
• In other work I was reading archives and found pacing hard to appreciate; it was very salient in participant observation but hidden in studies relying on trace data alone.

Page 20: Studying archives of online behavior

An episode

Page 21: Studying archives of online behavior

How to represent pacing?

Time stamps

Page 22: Studying archives of online behavior

Representing pacing

• Calculate gaps?
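Calculating gaps from time stamps is straightforward. A minimal sketch, assuming the time stamps of an episode have been extracted as ISO strings (the values below are invented for illustration):

```python
from datetime import datetime, timedelta


def gaps(timestamps):
    """Return the time elapsed between consecutive events, in time order."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical time stamps for three messages in one episode.
stamps = [
    datetime.fromisoformat("2010-03-01T09:00"),
    datetime.fromisoformat("2010-03-01T09:12"),
    datetime.fromisoformat("2011-04-05T09:12"),
]

for gap in gaps(stamps):
    print(gap)
```

Here the first reply arrives after 12 minutes and the next after 400 days, the kind of contrast between rapid reply and deferral that is easy to miss when reading the messages one after another.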

Page 23: Studying archives of online behavior

Reading the gaps doesn’t help: they are easy to ignore. Can we make them harder to ignore?
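One possible way to make gaps harder to ignore is to write them into the reading view itself, as lines that interrupt the messages. This is only a sketch of that idea; the one-day threshold and the message data are arbitrary assumptions.

```python
from datetime import datetime, timedelta


def with_gap_markers(messages, threshold=timedelta(days=1)):
    """Return display lines, adding a marker line wherever a long gap occurs.

    `messages` is a list of (datetime, text) pairs; `threshold` is an
    arbitrary, assumed cut-off for what counts as a long gap.
    """
    lines = []
    previous = None
    for when, text in sorted(messages):
        if previous is not None and when - previous >= threshold:
            lines.append(f"    ... {(when - previous).days} days pass ...")
        lines.append(f"{when:%Y-%m-%d %H:%M}  {text}")
        previous = when
    return lines


# Hypothetical messages from one thread.
thread = [
    (datetime(2010, 3, 1, 9, 0), "Could we support plugins?"),
    (datetime(2010, 3, 1, 9, 12), "Interesting, but not before the release."),
    (datetime(2011, 4, 5, 9, 12), "Returning to this: here is a patch."),
]

lines = with_gap_markers(thread)
print("\n".join(lines))
```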

Page 24: Studying archives of online behavior

Visualize events

Page 25: Studying archives of online behavior

What is to be done?

• Sufficient engagement with the system and community to adequately interpret the traces.
• Use a system and see how your data is archived.
• When you think a phenomenon/construct can be operationalized computationally, at least show some narrative examples from the dataset.
• Complement archives with interviews and/or surveys
– Archives make great prompts for interviews
– Lakhani and Wolf (2003) survey immediately after a post.
• Gaskin et al. (2014) “Zooming in and out of sociomaterial routines” MISQ.

Page 26: Studying archives of online behavior

An ontology for trace data studies

• Document
– Archived content. E.g., an e-mail message, tracker comment, release note, pull-request, log entry.
– Provides evidence for events and actions.
– One document may provide evidence for multiple events and actions.
• Event
– An event causes documents to be archived. E.g., sending an email, releasing a version.
• Action
– The contextualized meaning of an event. E.g., contributing code, showing leadership (can be at quite different conceptual levels in different studies).
• Participant
– An actor (typically a person, but could be a machine or bot)
• Identifier
– A string associated with a participant.
– Many identifiers could refer to one participant (e.g., email and username)
– but many participants may act through one identifier (e.g., “admin account”)
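The ontology can be sketched as a handful of data classes. This is an illustrative reading of the slide, not the author's implementation: the class fields, the example participants, and the example.org addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Participant:
    label: str  # an actor: a person, machine, or bot


@dataclass(frozen=True)
class Identifier:
    value: str  # a string such as an email address or username


@dataclass
class Document:
    url: str            # where the archived content can be viewed
    author: Identifier  # documents carry identifiers, not participants


@dataclass
class Event:
    kind: str  # e.g. "email sent", "version released"
    evidence: list = field(default_factory=list)  # supporting Documents


@dataclass
class Action:
    meaning: str  # contextualized interpretation of the event
    event: Event
    participant: Participant


# Identifier-to-participant links are many-to-many, so store them as pairs.
alice, bob = Participant("alice"), Participant("bob")
links = {
    (Identifier("alice@example.org"), alice),
    (Identifier("alice_dev"), alice),
    (Identifier("admin"), alice),
    (Identifier("admin"), bob),
}


def participants_for(identifier):
    return {p for i, p in links if i == identifier}


def identifiers_for(participant):
    return {i for i, p in links if p == participant}


message = Document("http://example.org/archive/msg1", Identifier("alice_dev"))
sent = Event("email sent", [message])
act = Action("contributing code", sent, alice)
print(len(participants_for(Identifier("admin"))), len(identifiers_for(alice)))
```

Keeping the identifier on the document and the participant on the action separates what the archive records from what the researcher infers about who acted.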

Page 27: Studying archives of online behavior

Episodes

• A unit of analysis, facilitating comparison and summary (e.g., counting)
– Compare to content analysis or NLP that counts mentions of concepts, database queries that count documents, surveys that measure attitudes.
– The detail provided by trace data makes episodes more accessible and lets research be more granular, closer to the work.
• Ideally emic (meaningful to and recognizable by participants)

Page 28: Studying archives of online behavior

Ok, but how to store this?

• Moving from documents and events to actions and outcomes is interpretative work
– I do the qualitative first, then hope to make it computable (e.g., through machine learning)
• It is akin to content analysis but with a much more complicated ontology
– Content analysis (classic or grounded theory) assigns Codes to Documents
– Software like ATLAS.ti has trouble handling coding of structured data (dates, linked documents like threads, multiple identifiers for a single participant).

Page 29: Studying archives of online behavior

I use RDF

• Resource Description Framework
– Triples: James hasEmail [email protected]
– URLs work natively (making it easy to view the original archives)
• Retains original data structure
– e.g., Document in thread by Identifier
– Allows ad-hoc addition of structure (schemaless)
– Allows inheritance (e.g., MailingListEvent a CommunicationEvent)
• Allows you to overlay higher level structure
– e.g., Action(s) in (ordered) Episode by Participant
– And then apply codes to Actions (storing when, who, why)
• Querying via SPARQL, Validation via RDF rules (aka SPIN)
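The triple model can be illustrated without an RDF library. The sketch below mimics it with plain Python tuples to show three of the points above: schemaless addition, inheritance, and overlaying actions and codes on events. It is not RDF or SPARQL, and every name in it is hypothetical; a real study would use an RDF store.

```python
# Triples are (subject, predicate, object); all names are hypothetical.
triples = {
    ("MailingListEvent", "subClassOf", "CommunicationEvent"),
    ("event1", "type", "MailingListEvent"),
    ("event1", "evidencedBy", "http://example.org/archive/msg1"),
    ("action1", "interprets", "event1"),
    ("action1", "inEpisode", "episode1"),
}


def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def instances_of(cls):
    """Instances of a class or any of its subclasses (simple inheritance)."""
    classes, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, _, _ in match(p="subClassOf", o=current):
            if sub not in classes:
                classes.add(sub)
                frontier.append(sub)
    return {s for c in classes for s, _, _ in match(p="type", o=c)}


# Schemaless: a code can be attached to an action without changing a schema.
triples.add(("action1", "hasCode", "contributing code"))

print(instances_of("CommunicationEvent"))
print(match(s="action1"))
```

Because the event keeps a URL as its evidence, a query result can lead straight back to the original archive page.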

Page 30: Studying archives of online behavior

An episode

Page 31: Studying archives of online behavior

Showing an example