Data Archives and Data Producers: A Cooperative Partnership (Peter Granda, Archival Assistant Director, http://www.icpsr.umich.edu/)


Page 1

Peter Granda

Archival Assistant Director

http://www.icpsr.umich.edu/

Data Archives and Data Producers: A Cooperative Partnership

Page 2

What I plan to discuss

• Reasons to share social science data

• Obstacles to sharing social science data

• Role of data archives

• Best practices for preparing data for archiving

• New developments to facilitate the archiving process and improve cooperation between data archives and data producers

Page 3

Why should data producers share research data?

• Data sharing achieves many important goals for the scientific community, such as:
  – reinforcing open scientific inquiry
  – encouraging diversity of analysis and opinion
  – promoting new research and the testing of new or alternative hypotheses and methods of analysis
  – supporting studies on data collection methods and measurement
  – facilitating the education of new researchers
  – enabling the exploration of topics not envisioned by the initial investigators
  – permitting the creation of new datasets by combining data from multiple sources

Page 4

Obstacles/Challenges to data sharing and archiving: Reasons not to share

• Costs to the producer of creating public-use files

• Maintaining respondent confidentiality

• Use by potential competitors

• Less credit given for archiving data than for continually collecting new data, especially within the academic community

• Possible unexpected duplication of effort when public-use files are used for research

Page 5

Result of the potential conflict between sharing data and the difficulties of doing so

• Even in places where there is a tradition of archiving social science survey and aggregate data, archiving is not always done, or not always done correctly

• Funds are not always available or, more commonly, all of the funds are spent on the data collection process

• Insufficient thought is given, throughout the data “life cycle,” to preparing materials that other researchers could easily use

Page 6

ROLE OF THE DATA ARCHIVE

• Assist data producers by advising them on procedures to use when archiving their data and documentation

• Consult with data producers regarding respondent confidentiality

• Discuss the best strategy and location for preserving the data in perpetuity

Page 7

Methods of Data Sharing - Versioning

• Importance of this issue for replication: e.g., users need to know which version of a data file was used in publications

• Increasing trend: data files stored on data producer Web sites:
  – A greater number of interim or ‘early release’ versions are now appearing
  – A “versioning” system needs to be in place if data files are updated

• Archives/data depositories usually preserve “final” versions of data files and also have systems in place to record the history of each data collection they receive
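The versioning point can be made concrete with a small record-keeping structure. The Python sketch below assumes that each release of a data file is identified by a version number and a SHA-256 checksum of its contents, so that a user holding a file can tell which release it is. The class and method names (`DataCollection`, `add_version`, `version_for`, `final_version`) are hypothetical and do not describe any archive's actual system.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Version:
    number: str          # e.g. "1.0" for an early release, "2.0" for the final file
    released: date
    checksum: str        # SHA-256 of the data file contents (an assumption)
    note: str            # what changed in this release
    final: bool = False  # True for the version the archive preserves as "final"


@dataclass
class DataCollection:
    title: str
    history: list[Version] = field(default_factory=list)

    def add_version(self, number: str, released: date, data: bytes,
                    note: str, final: bool = False) -> Version:
        """Record a new release; a version number may be used only once."""
        if any(v.number == number for v in self.history):
            raise ValueError(f"version {number} already recorded")
        v = Version(number, released, hashlib.sha256(data).hexdigest(), note, final)
        self.history.append(v)
        return v

    def version_for(self, data: bytes) -> Version | None:
        """Identify which recorded release a given data file corresponds to."""
        digest = hashlib.sha256(data).hexdigest()
        return next((v for v in self.history if v.checksum == digest), None)

    def final_version(self) -> Version | None:
        """Return the most recently recorded release marked as final, if any."""
        finals = [v for v in self.history if v.final]
        return finals[-1] if finals else None
```

Because the checksum is computed from the file itself, a researcher replicating a published analysis could call `version_for` on their copy to learn whether it was an early release or the final version.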

Page 8

Public Files to Restricted Files: levels of access as CONFIDENTIALITY CONCERNS increase

• Available to all users

• Available to the general research community

• Available to members of a specific research team

• Accessible only through a formal application process

• Accessible only at a specific location under very restricted conditions
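One way to express these access levels in software is as an ordered scale. The sketch below is hypothetical: the level names are invented, and it assumes a single linear hierarchy in which clearance for a more restricted level also covers every less restricted one, which real access agreements need not follow.

```python
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical access levels, ordered from least to most restricted."""
    PUBLIC = 1              # available to all users
    RESEARCH_COMMUNITY = 2  # available to the general research community
    PROJECT_TEAM = 3        # available to members of a specific research team
    APPLICATION = 4         # accessible only through a formal application process
    ENCLAVE = 5             # accessible only at a specific location (data enclave)


def can_access(file_level: Access, user_clearance: Access) -> bool:
    """Allow access when the file's level does not exceed the user's clearance."""
    return file_level <= user_clearance
```

For example, `can_access(Access.PUBLIC, Access.RESEARCH_COMMUNITY)` is `True`, while `can_access(Access.ENCLAVE, Access.APPLICATION)` is `False`.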

Page 9

Defining Best Practices – General Goals

• Maintain respondent confidentiality while releasing the maximum amount of data publicly

• Archive materials in a format that will ensure long-term preservation

• Provide sufficient information so that users who are not experts in the subject matter of the data collection can still use it effectively

Page 10

Best Practices – Confidentiality

• Dangers of direct identifiers and potential dangers of indirect identifiers

• Solutions: removal, bracketing, top-coding, collapsing and/or combining variables, sampling, swapping, disturbing

• Restricted-use files or licenses

• Data enclaves
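Several of the solutions listed above are simple transformations. The sketch below illustrates three of them on a single record: removal of direct identifiers, top-coding, and bracketing. The field names, the 150,000 income ceiling, and the 10-year age brackets are illustrative assumptions, not recommended thresholds, and the function names are hypothetical; real disclosure review also has to consider indirect identifiers in combination.

```python
# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def top_code(value: int, ceiling: int) -> int:
    """Top-coding: replace any value above the ceiling with the ceiling."""
    return min(value, ceiling)


def bracket(value: int, width: int) -> str:
    """Bracketing: report only the range of size `width` containing the value."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


def make_public_record(record: dict, income_ceiling: int = 150_000,
                       age_width: int = 10) -> dict:
    """Build a public-use copy: drop direct identifiers, top-code income,
    and bracket age. Assumes the record has numeric 'age' and 'income'."""
    public = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    public["income"] = top_code(record["income"], income_ceiling)
    public["age"] = bracket(record["age"], age_width)
    return public
```

With these settings, `make_public_record({"name": "A. Respondent", "phone": "555-0100", "age": 47, "income": 320_000, "region": "North"})` returns `{"age": "40-49", "income": 150000, "region": "North"}`, and the original record is left unchanged.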

Page 11

Best Practices – Data Formats

Options:

• ASCII data files and record layouts

• ASCII data plus setup files

• Software-specific system files

• Portable software-specific files

• Online analysis-ready files

*** IMPORTANCE OF ASCII AS A PRESERVATION FORMAT ***
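Part of ASCII's value for preservation is that a fixed-width data file plus its record layout can be read by any program, without the software that produced it. A minimal Python reader is sketched below; the variables and column positions in `LAYOUT` are hypothetical, and it assumes every field holds a numeric code, with a blank field treated as missing.

```python
# Hypothetical record layout: (variable, first column, last column),
# 1-based and inclusive, as a setup file or codebook would state them.
LAYOUT = [
    ("caseid", 1, 4),
    ("age", 5, 6),
    ("sex", 7, 7),
    ("income", 8, 13),
]


def read_fixed_width(lines, layout=LAYOUT):
    """Parse fixed-width ASCII records into dictionaries using a record layout."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        record = {}
        for name, first, last in layout:
            # Convert 1-based inclusive columns to a Python slice.
            text = line[first - 1:last].strip()
            record[name] = int(text) if text else None  # blank means missing
        records.append(record)
    return records
```

For example, the line `0001471045000` is read as case 1, age 47, sex code 1, and income 45000.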

Page 12

Best Practices – Documentation

• Project description

• Sample and sampling procedures

• Weighting

• Date, geographic location of data collection, and time period covered

• Data source(s)

• Unit(s) of analysis/observation

• Variables

• Technical information on files

• Data collection instruments

• Interviewer guide, recode logic, coding instrument
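The documentation elements listed above can also serve as a checklist when a deposit arrives. The sketch below assumes the depositor reports which elements were supplied using the same labels; the function name `missing_documentation` is hypothetical.

```python
# The documentation elements from this slide, used as a deposit checklist.
REQUIRED_DOCUMENTATION = [
    "project description",
    "sample and sampling procedures",
    "weighting",
    "date, geographic location, and time period covered",
    "data source(s)",
    "unit(s) of analysis/observation",
    "variables",
    "technical information on files",
    "data collection instruments",
    "interviewer guide, recode logic, coding instrument",
]


def missing_documentation(supplied):
    """Return required elements absent from `supplied`, in checklist order.
    Matching ignores case but otherwise expects the same labels."""
    supplied = {s.lower() for s in supplied}
    return [item for item in REQUIRED_DOCUMENTATION if item not in supplied]
```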

Page 13

Best Practices: Yes, in theory, but What is the Real Situation?

• Even on well-funded projects archiving is often given little attention

• It is not unusual for the vast majority of project funds to be spent on data collection

• Documentation is often prepared hastily with insufficient thought given to how other researchers might use it

Page 14

Best Practices: Yes, in theory, but What is the Real Situation?

• Experience from the archival perspective:
  – Full compliance with submission requirements is often the exception rather than the rule

Page 15

New Project between ICPSR and the School of Information at the University of Michigan

• Purpose: to identify barriers and develop incentives for data producers to deposit “archive-ready” datasets

• Archive-ready: data and documentation files that are supplied to the archive in a format based on a specific agreement with the data producer

Page 16

What are some of these barriers?

• Archiving process requires time, resources, and attention to detail

• In academic settings, researchers are rewarded for publishing, not for archiving

• Few “formal” professional rewards for depositing “archive-ready” datasets

Page 17

What rewards and incentives are now offered to data producers?

• Appeal to self-interest

• Appeal to altruism

• Reputation effects

• Archive services

• Professional norms

Page 18

What rewards and incentives might be offered to data producers in the future?

• Make it easier to collect and report information about uses of the data collection by other researchers (“reputation through citation”)

• Scoring rule: how “archive-ready” was the collection submitted?

• Enhanced service from the archive

• Publications: “reviews” of datasets
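A scoring rule of this kind could be as simple as the weighted share of agreed criteria that a deposit meets. In the sketch below, the criteria names and weights are invented for illustration, as is the function name `archive_ready_score`; in practice they would come from the specific agreement between the archive and the data producer that defines “archive-ready.”

```python
# Hypothetical criteria and weights; a real rule would be set in the
# deposit agreement between the archive and the data producer.
CRITERIA = {
    "ascii_data_with_setup_files": 3,
    "complete_documentation": 3,
    "confidentiality_review_done": 2,
    "variable_labels_present": 1,
    "version_history_supplied": 1,
}


def archive_ready_score(met):
    """Return the weighted share (0.0 to 1.0) of criteria the deposit meets."""
    unknown = set(met) - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    total = sum(CRITERIA.values())
    earned = sum(weight for name, weight in CRITERIA.items() if name in met)
    return earned / total
```

Under these weights, a deposit that meets only the ASCII-data and documentation criteria scores 6/10 = 0.6. Rejecting unknown criterion names keeps a typo from silently lowering a producer's score.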

Page 19

Implementation in different social science research environments

• Archive resources may vary, affecting how much guidance and assistance archives can provide to data producers

• Technical standards could also vary: in some places, the importance of certain data formats (e.g., SPSS files) may be paramount

• The key: what is most important for local researchers?

Page 20

Thank you!