Peter Granda
Archival Assistant Director
http://www.icpsr.umich.edu/
Data Archives and Data Producers: A Cooperative Partnership
What I plan to discuss
• Reasons to share social science data
• Obstacles to sharing social science data
• Role of data archives
• Best practices for preparing data for archiving
• New developments to facilitate the archiving process and improve cooperation between data archives and data producers
Why should data producers share research data?
• Data sharing achieves many important goals for the scientific community, such as:
  – reinforcing open scientific inquiry
  – encouraging diversity of analysis and opinion
  – promoting new research, testing of new or alternative hypotheses and methods of analysis
  – supporting studies on data collection methods and measurement
  – facilitating education of new researchers
  – enabling the exploration of topics not envisioned by the initial investigators
  – permitting the creation of new datasets by combining data from multiple sources
Obstacles/Challenges to data sharing and archiving: Reasons not to share
• Costs to producer in creating public-use files
• Maintaining respondent confidentiality
• Use by potential competitors
• Less credit given for archiving data than for continually collecting new data, especially within the academic community
• Unexpected duplication of effort possible in using public-use files for research
Result of potential conflict between sharing data and the difficulties of doing so
• Even in places where there is a tradition of archiving social science survey and aggregate data, archiving is not always done, or is not done correctly
• Funds are not always available or, more commonly, all of the funds are spent on the data collection process
• Insufficient thought is given, throughout the data “life-cycle” process, to preparing materials that other researchers could easily use
ROLE OF DATA ARCHIVE
• Assist data producers by providing advice regarding procedures to use when archiving their data and documentation
• Consult with data producers regarding respondent confidentiality
• Discuss best strategy and location to preserve the data in perpetuity
Methods of Data Sharing - Versioning
• Importance of this issue for replication: e.g., users need to know which version of a data file was used in publications
• Increasing trend: data files stored on data producer Web sites:
  – Greater number of interim or ‘early release’ versions now appearing
  – Need to have a “versioning” system in place if data files are updated
• Archives/data depositories usually preserve “final” versions of data files and also have systems in place to record the history of each data collection they receive
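To make the versioning point concrete, here is a minimal sketch of how a release history might be kept so that a user can tell which version of a file they hold. It is only an illustration: the `VersionLog` class, the release labels, and the sample file contents are all hypothetical, and a real archive would keep such a history in a persistent catalog rather than in memory.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class VersionLog:
    """Hypothetical in-memory history of releases for one data collection."""
    entries: list = field(default_factory=list)

    def record(self, label: str, content: bytes, note: str = "") -> str:
        """Store a release label together with the SHA-256 checksum of the file."""
        checksum = hashlib.sha256(content).hexdigest()
        self.entries.append({"label": label, "sha256": checksum, "note": note})
        return checksum

    def identify(self, content: bytes):
        """Return the label of the release matching this content, or None."""
        checksum = hashlib.sha256(content).hexdigest()
        for entry in self.entries:
            if entry["sha256"] == checksum:
                return entry["label"]
        return None


# Illustrative (made-up) file contents for an interim and a final release.
early = b"id,income\n1,52000\n2,61000\n"
final = b"id,income\n1,52000\n2,61500\n"

log = VersionLog()
log.record("early-release-1", early, "interim file")
log.record("final-1", final, "corrected income for case 2")
```

A researcher replicating a publication could then call `log.identify(...)` on the file they downloaded and compare the returned label with the version cited in the paper.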
Public Files / Restricted Files
• Available to all users
• Available to general research community
• Available to members of a specific research team
• Accessible only through a formal application process
• Accessible only at a specific location under very restricted conditions
CONFIDENTIALITY CONCERNS
Defining Best Practices – General Goals
• Maintain respondent confidentiality while releasing the maximum amount of data publicly
• Archive materials in a format that will ensure long-term preservation
• Provide sufficient information so that users who are not expert in the subject matter of the data collection could still use it effectively
Best Practices – Confidentiality
• Dangers of direct identifiers and potential dangers of indirect identifiers
• Solutions: removal, bracketing, top-coding, collapsing and/or combining variables, sampling, swapping, disturbing
• Restricted-use files or licenses
• Data enclaves
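Two of the solutions listed above, top-coding and bracketing, can be sketched in a few lines. This is a rough illustration only: the function names, the income cap of 150000, the age bracket boundaries, and the sample records are assumptions made for the example, not recommended disclosure thresholds.

```python
def top_code(value, cap):
    """Top-coding: replace any value above the cap with the cap itself."""
    return min(value, cap)


def bracket(value, lower_bounds):
    """Bracketing: replace an integer value with the label of its range.

    lower_bounds is an ascending list of range starts; the last range is open-ended.
    """
    if value < lower_bounds[0]:
        raise ValueError("value is below the first bracket")
    for i, low in enumerate(lower_bounds):
        if i == len(lower_bounds) - 1:
            return f"{low}+"
        high = lower_bounds[i + 1]
        if low <= value < high:
            return f"{low}-{high - 1}"


# Made-up respondent records, used only to show the transformations.
records = [
    {"age": 34, "income": 48000},
    {"age": 71, "income": 950000},
]

AGE_BRACKETS = [0, 25, 45, 65]   # assumed bracket boundaries
INCOME_CAP = 150000              # assumed top-code threshold

public_use = [
    {
        "age_group": bracket(r["age"], AGE_BRACKETS),
        "income": top_code(r["income"], INCOME_CAP),
    }
    for r in records
]
```

In this sketch the second respondent's unusually high income is released as 150000 and the exact age as "65+", so the outlying values no longer single out that person.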
Best Practices – Data Formats
Options:
• ASCII data files and record layouts
• ASCII data plus setup files
• Software-specific system files
• Portable software-specific files
• Online analysis-ready files
*** IMPORTANCE OF ASCII AS A PRESERVATION FORMAT ***
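One reason ASCII serves well for preservation is that a plain data file plus its record layout can be read by any tool, with no dependence on a particular statistical package. The sketch below illustrates the idea; the variable names, column positions, and the two sample records are hypothetical.

```python
# Hypothetical record layout: (variable name, start column, end column),
# with columns counted from 1 and both ends inclusive, as in a typical codebook.
RECORD_LAYOUT = [
    ("CASEID", 1, 4),
    ("AGE", 5, 6),
    ("SEX", 7, 7),
]


def parse_record(line, layout):
    """Slice one fixed-width ASCII record into named fields."""
    return {name: line[start - 1:end].strip() for name, start, end in layout}


# Two made-up fixed-width records, one per line.
ascii_data = "0001341\n0002 72\n"

rows = [parse_record(line, RECORD_LAYOUT) for line in ascii_data.splitlines()]
# rows[0] is {"CASEID": "0001", "AGE": "34", "SEX": "1"}
```

Values are kept as strings here; converting them to numbers or applying value labels would be the job of the setup files mentioned in the options above.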
Best Practices – Documentation
• Project description
• Sample and sampling procedures
• Weighting
• Date, geographic location of data collection, and time period covered
• Data source(s)
• Unit(s) of analysis/observation
• Variables
• Technical information on files
• Data collection instruments
• Interviewer guide, recode logic, coding instrument
Best Practices: Yes, in theory, but What is the Real Situation?
• Even on well-funded projects archiving is often given little attention
• It is not unusual that the vast majority of project funds are spent on data collection
• Documentation is often prepared hastily with insufficient thought given to how other researchers might use it
Best Practices: Yes, in theory, but What is the Real Situation?
• Experience from the Archival Perspective:
• Full compliance with submission requirements is often the exception rather than the rule
New Project between ICPSR and School of Information at the University of Michigan
• Purpose: to identify barriers and develop incentives for data producers to deposit “archive-ready” datasets
• Archive-ready: data and documentation files that are supplied to the archive in a format based on a specific agreement with the data producer
What are some of these barriers?
• Archiving process requires time, resources, and attention to detail
• In academic settings, researchers are rewarded for publishing, not for archiving
• Few “formal” professional rewards for depositing “archive-ready” datasets
What rewards and incentives are now offered to data producers?
• Appeal to self-interest
• Appeal to altruism
• Reputation effects
• Archive services
• Professional norms
What rewards and incentives might be offered to data producers in the future?
• Make it easier to collect and report information about uses of the data collection by other researchers (“reputation through citation”)
• Scoring rule: how “archive-ready” was the collection submitted?
• Enhanced service from the archive
• Publications: “reviews” of datasets
Implementation in different social science research environments
• Archive resources may vary, affecting how much guidance and assistance they can provide to data producers
• Technical standards could also vary: in some places, the importance of certain data formats (e.g., SPSS files) may be paramount
• The key: what is most important for local researchers?
Thank you!