
Key data management issues for social science research

Data management for ESRC Research

Centres and Programmes

3 November 2009, London

Louise Corti

UK Data Archive

University of Essex

Overview

• Why share and how to share

• Challenges in re-using and sharing data

• Key areas in data management and sharing

• Data Management Policies

• Role of the UKDA in supporting researchers

• New initiatives

Why share and how to share

• OECD message: widespread data sharing will enable researchers, empower citizens and convey tremendous scientific, economic, and social benefits

• sharing data:
– facilitates research often beyond the scope of the original research
– demonstrates continued usage of data
– encourages scientific inquiry
– avoids duplicate data collection
– provides resources for education and training

• many research councils are now committed to a long-term strategy for data resource provision and for supporting UK researchers

• most research data can be shared with other researchers!

Principles of Data Sharing Policies

• publicly funded research data are a valuable, long term resource

• researchers must collect data in such a way as to ensure longer-term sharing

• researchers must document data in such a way as to ensure longer-term usability

• to ensure maximum research exploitation data must be managed effectively from day 1

• researchers require support for data management through the life of the project

• data must be made available by researchers for long-term archiving and dissemination to Research Council supported data centres

Data Management Policies

• Mandatory:
– ESRC Data Policy
– NERC Data Policy
– RELU Data Management Policy
– MRC Data Sharing and Preservation Policy

• Advised and encouraged:
– BBSRC Data Policy
– Wellcome Trust Policy on Data Management and Sharing
– British Academy

Successful data sharing policies

Funder commitment:

• contractual obligations

• funded support and archiving infrastructure

• instilling positive attitudes towards sharing data

• encouraging researchers to liaise with data experts throughout the lifecycle of data creation

• enabling capacity to support everyone who requires support

• encouraging deposit of ethically and legally shareable, high-quality data and documentation

• peer reviewers to advise on the long-term value of research data

• policing and penalties for “defaulters”

Successful data sharing policies

• Support centre/archive:
– good partnership and communication with funding bodies
– regular updates about new data creation activities
– Data Management Plans and consent forms to review
– building and providing tools for research groups to share informally

• For award holders:
– data recognised as a valid academic output - ISDN?
– recognition is given for re-using data
– realistic data management plans

VALUE OF SHARING

How can research data be used?

• description and context

• comparative research, restudy or follow-up study

• re-analysis/secondary analysis

• research design and methodological advancement

• replication/validation of published work

• teaching and learning

Where can you find research data?

• Researchers, research groups and organisations
– universities, public and private sectors
– older material often held by default/legacy/a record/cupboard/old digital media

• University Archives
– strong social science research tradition
– already significant collections of older research materials

• Libraries and Museums
– strong historical, ethnology, oral or local history and collections

• Data Archives and Digital libraries
– proactive acquisition of digital research resources

Accessing local collections

• if someone (internal or external) wanted to use your Centre’s data?

– where would they start and who would they ask?
– is there an historical record of research, data and outputs?
– would you have support staff to deal with “users”?
– would you be able to find older data (PI has left)?

• some centres are digitising materials and records, but it is very expensive and money is scarce

• very few would offer an access service, let alone online

• would probably rely on the PI having to make time to collaborate

• manage it well, archive it and the burden is diminished

CHALLENGES OF RE-USING DATA

Why are qualitative data not shared that much?

• cultural reasons – not practised by the typical qualitative researcher

• has never been a well-documented research method

• no standardised methods of data description

• few dedicated archives

• no real dissemination infrastructure

• no co-ordinated resource discovery

• issues about the nature of the ‘special relationships’ of ethnographer and participant

‘Difficulties’ cited in re-using data

• constraints of informed consent and ethics

• ownership of qualitative data...whose are they?

• problem of the implicit nature of qualitative data collection and analysis – context and reflexivity

• lack of time to get fully acquainted with research materials created by someone else

• insecurity about the exposure of one’s own research practice or threat of misinterpretation

• ‘no data to suit my needs’ - lack of publicly available research data

Ethical and consent considerations

• archived data should always conform to ethical and legal guidelines with respect to the preservation of anonymity when this has been requested by informants or guaranteed to them

• consent and agreements for sharing CAN be made during the research process/fieldwork – and afterwards

• various additional strategies for sharing:
– editing the original data
– restricting access/vetting
– user undertakings with legal back-up
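As an illustration of "editing the original data", here is a minimal Python sketch that swaps real names in a transcript for pseudonyms. The function name, the sample sentence and the name mapping are all hypothetical, and real anonymisation needs human review of indirect identifiers (places, employers, dates) that simple substitution will not catch.

```python
import re

def pseudonymise(text, replacements):
    """Replace each real name with its pseudonym, matching whole words only."""
    for real, alias in replacements.items():
        pattern = r"\b" + re.escape(real) + r"\b"
        # A function replacement avoids any special handling of backslashes.
        text = re.sub(pattern, lambda match, alias=alias: alias, text)
    return text

# Hypothetical interview extract and name mapping, for illustration only.
extract = "Margaret said that Tom moved away in 1984."
mapping = {"Margaret": "[Participant A]", "Tom": "[Participant B]"}
print(pseudonymise(extract, mapping))
```

Keeping the mapping in a separate, securely stored file means the edits can be applied consistently across all transcripts in a study.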

Are these data really yours?

• copyright needs examining…speaker in an interview owns their words, the researcher does not. Researcher only owns the recording or transcription…

• actually, the funder owns copyright too as does your employer

• if your data has any monetary ‘value’ your employer will be the first to claim it!

• publicly funded data should be shared

I wasn’t there but…..

• yes – of course being present during fieldwork adds a richness over and above the raw data

• but context can be provided at many levels:
– audio-visual record and full transcription
– description of setting, observations
– details of methods, sampling, analysis
– relevant macro-level details (period, events etc.)

• good to interview the principal investigator, research team

• if historians can make use of partial sources that are centuries old, why can’t sociologists?

…historians have done this for centuries!

CHALLENGES OF SHARING

Challenges

• cultural practice

• knowledge and expertise

• reasons and incentives – REF

• support

Today’s need for speed…

• if data are to be shared they need to be:

– easily accessible – ‘take away’
– rapidly accessible - quickly
– free or at least affordable
– well documented
– supported – by humans beyond the life of the current research

• the length of the research lifecycle has decreased significantly

• and it is far more competitive

• archiving is just an extra burden!

How can we help you to share data

• offer best practice strategies and methods for creating a shareable dataset:
– managing your own data across the research lifecycle
– selecting data worth keeping
– describing data
– “processing” data
– storing data
– providing ethical and legal access to data

• based on what we do already and have done for some 40 years

• early this year we released a new suite of guidance as web pages and a brochure

• offer bespoke advisory service and training

Key areas for managing data

• how to share data

• consent, confidentiality and ethics

• copyright and other rights

• data description and metadata

• data formats and software

• data storage, back-up and security

• most problematic areas:
– definitions and legal aspects
– gaining consent – consent forms
– disclosure – data sharing and confidentiality

ESDS - examples of practical advice

• ensure good housekeeping

• use model consent forms/wording

• anonymise data appropriately

• use model transcript template

• use model Excel sheet for metadata capture

• ensure sufficient context described

• use suitable data formats e.g. .rtf for text; .wav or .mp3 for audio

• establish version control for group access
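The format advice above can be checked mechanically. The sketch below is a minimal Python example that flags files whose extensions are not among those named on the slide. The function name, the dictionary and the sample file names are assumptions for illustration, and actual deposit guidance covers more formats than the three listed here.

```python
from pathlib import Path

# Formats named on the slide; this is not an exhaustive or official list.
SUGGESTED = {
    "text": {".rtf"},
    "audio": {".wav", ".mp3"},
}

def check_formats(filenames):
    """Split file names into (suitable, to_review) by file extension."""
    allowed = set().union(*SUGGESTED.values())
    suitable, to_review = [], []
    for name in filenames:
        suffix = Path(name).suffix.lower()  # compare case-insensitively
        (suitable if suffix in allowed else to_review).append(name)
    return suitable, to_review

# Hypothetical file names, for illustration only.
files = ["int01.rtf", "int01.WAV", "int02.docx"]
suitable, to_review = check_formats(files)
print("suitable:", suitable)
print("to review:", to_review)
```

A flagged file is not necessarily unacceptable; it simply marks something to convert or to discuss with the archive before deposit.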

Website

ESDS website – process for sharing

Our work at the UK Data Archive

• Research Data Management Support Services (RDMSS) work
– support ESDS and RELU award holders
– UK Data Archive – generic advice, collaboration and worldwide training

• Researcher Development Initiative (RDI) bid - training and capacity building

• JISC/ESRC support for ESRC high-investments e.g. Centres and Programmes
– Data Management Planning
– training
– looking for guinea pigs as case studies

What are data “worth” keeping?

• rich data that are well-documented - a string of yeses and nos would be …dull and unacceptable

• format, usability and condition of material

• data that have further analytic potential than the original investigation (depth; large-scale; longitudinal)

• relative importance or impact of the study e.g. had a major influence in its field and/or represents the working life of a significant researcher

• copyright and confidentiality issues

• complementary to existing data holdings (series)

What is not accepted?

• data that do not have adequate consent for sharing

• data that are semi-structured without richness

• data that have no methodology documented

• all ‘rejections’ are now offered to new forthcoming UKDA-store – UKDA’s self archiving system (FEDORA)

• those that have NOT sought consent to share where it was felt possible are sent a warning letter and referred to Research Council

How do we archive data?

• specialist metadata DDI versus ISAD(G)

• data are ‘processed’ at the study and file/interview/object levels

– error checking/validation of collection contents
– check consent and confidentiality agreements met
– basic reformatting of text undertaken
– possibly anonymisation undertaken
– creation of data context - digital user guides, variable and data lists
– access conditions agreed and applied
– data mounted for download system

• published ESDS guide to data processing techniques
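Two of the processing steps above, checking collection contents and creating a data list, can be sketched in Python. This is an illustrative sketch, not the Archive's actual procedure: the function names, the column names and the demo files are all assumptions, and the only "error check" shown is a test for empty files.

```python
import csv
import io
import tempfile
from pathlib import Path

def build_data_list(study_dir):
    """Return one row per file: relative path, extension and size in bytes."""
    root = Path(study_dir)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append({
                "file": path.relative_to(root).as_posix(),
                "format": path.suffix.lower(),
                "bytes": path.stat().st_size,
            })
    return rows

def find_empty_files(rows):
    """A very basic content check: list files that contain no data."""
    return [row["file"] for row in rows if row["bytes"] == 0]

def to_csv(rows):
    """Render the data list as CSV text so it can sit alongside a user guide."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["file", "format", "bytes"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Demo on a temporary folder with two hypothetical files.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "int01.rtf").write_text("Interview text")
    (Path(demo_dir) / "int02.wav").write_bytes(b"")
    data_list = build_data_list(demo_dir)
    print(to_csv(data_list))
    print("empty files:", find_empty_files(data_list))
```

The same list could be extended with interview-level metadata (date, setting, interviewer) where the depositor has recorded it.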

Simples ;-)

The paper mountain

• most new text-based collections are born digital

• but much older data in paper format

• is it worth digitising paper?

– scan and OCR samples of key data

– scan as image files to enable faster throughput

– can selectively digitise ‘highlights’

• what about audio?

– can digitise sound bites from analogue sources

Centralised services

• centralised archives have existing infrastructures

• offer on-line accessibility

• and require:
– a technical infrastructure
– an access control system
– researcher liaison staff
– acquisition, data preparation, user support and promotions staff
– resource discovery tools
– promotional and communications devices

Who are the staff at the UK Data Archive?

• most of the staff now are multi-skilled – can handle qualitative and quantitative data

• this is important, as much data coming in is from mixed methods studies

• quali-centric staff suffer from being scared of numeric data, which means they are less flexible

• I believe that all staff should be able to handle all data types (bilingual) and metadata (trilingual), otherwise synergies are less effective

• technical skills are important but can be more easily bought in

Recent RC initiatives

• Archiving and Sharing Demonstrator scheme (QUADS) (ESRC)
– develop and promote innovative methodological approaches to the archiving, sharing, re-use and secondary analysis of qualitative research and data
– develop a range of new models for increasing access to qualitative data resources, and for extending the reach and impact of qualitative studies
– disseminate good practice in qualitative data sharing and research archiving

• RELU-DSS (ESRC, NERC, BBSRC)
– set up to help oversee and implement the Programme's Data Management Policy and Data Management Plan (builds on existing ESRC and NERC mandatory data policies)
– provides a support service for RELU researchers and staff to gain information and guidance on issues surrounding longer-term data sharing and preservation

More RC initiatives

• Timescapes (ESRC)
– national qualitative longitudinal study
– archiving funded and built in from start
– repository built for “in-house” data sharing
– aim to share more widely later

• Research Methods Programme (RMP)
– ReStore – a sustainable repository of online research methods resources for RMP outputs

• Technology Enhanced Learning (TEL) (ESRC & EPSRC)
– using new technologies to help develop learners’ skills of enquiry, analysis, synthesis, knowledge construction and collaboration
– ENSEMBLE - a case-based archiving and semantic web project

Recent JISC and other initiatives

• Digital Curation Centre (JISC)
– Data Audit Framework
– Data Management Plans

• Research Information Network (RIN) (HEFCE, RC & Libraries)
– case studies on data sharing and on the “profession”

• JISC Research Data Management Programme
– addressing strategic requirements for UK HE to improve its data management capability and better to understand how this may be achieved
– help establish the foundations for the UK research data infrastructure

• UK Research Data Service UKRDS (HEFCE & JISC)
– Pathfinder case studies on data sharing in four universities

European picture: national

[Map of national services across Europe. Legend: mature service; pilot service; feasibility study; keen, but..; very quiet]

Contacts

• RDMSS

[email protected]@essex.ac.uk

UK Data Archive
University of Essex
Colchester
Essex

www.esds.ac.uk