GBIF BIFA mentoring, Day 5b Data paper, July 2016


Transcript of GBIF BIFA mentoring, Day 5b Data paper, July 2016

A data paper is a searchable metadata document, describing a particular dataset or a group of datasets, published in the form of a peer-reviewed article in a scholarly journal.

Slides modified from Dimitri Brosens, data paper workshop in Trondheim October 2015

The primary purpose of a data paper is to describe data and the circumstances of their collection, rather than to report hypotheses and conclusions.


A data paper is a means of bringing credit and recognition to all those involved in data publication, of alerting the scientific community to the existence of biodiversity datasets and the value they can bring to particular research projects, and of providing a mechanism for quality assessment and control of data accessible through GBIF and other networks.


What is a dataset?

A dataset is understood here as a digital collection of logically connected facts (observations, descriptions or measurements), typically structured in tabular form as a set of records, with each record comprising a set of fields, and recorded in one or more computer data files that together comprise a data package.
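The definition above (records made of fields, stored in data files) can be made concrete with a small Python sketch. The field names loosely follow Darwin Core terms, but the records themselves are invented for illustration and the file is held in a string rather than on disk.

```python
import csv
import io

# Hypothetical tabular dataset: a header row of field names, then one record per line.
RAW = """occurrenceID,scientificName,eventDate,country
occ-1,Parus major,2015-05-02,Norway
occ-2,Cyanistes caeruleus,2015-05-03,Norway
"""


def read_records(text):
    """Return the dataset as a list of records, each a dict mapping field name to value."""
    return list(csv.DictReader(io.StringIO(text)))


records = read_records(RAW)
fields = list(records[0])  # the field names shared by every record
print(len(records), "records with fields", fields)
```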

Penev L, Mietchen D, Chavan V, Hagedorn G, Remsen D, Smith V, Shotton D (2011). Pensoft Data Publishing Policies and Guidelines for Biodiversity Data. Pensoft Publishers, http://www.pensoft.net/J_FILES/Pensoft_Data_Publishing_Policies_and_Guidelines.pdf


There is a distinction between 'static data', sometimes called 'dead datasets', and 'curated data' or 'living datasets'.


A DATASET CAN BE:

•  a discrete collection of data underlying a paper

•  a collection of data related to monitoring activities

•  a discrete collection of data related to the distribution of species

•  a discrete collection of data underlying specific research

•  a discrete collection of data related to a collection of specimens


A data package is the 'file' containing the actual data, the metadata and a descriptor file.

GBIF prefers Darwin Core Archives (DwC-A) as a format for publishing data. A Darwin Core Archive is a biodiversity informatics data standard that makes use of the Darwin Core terms to produce a single, self-contained dataset for species occurrence or checklist data. Essentially, it is a set of text (CSV) files with a simple descriptor (meta.xml) that tells others how your files are organized. The format is defined in the Darwin Core Text Guidelines.
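To show how the meta.xml descriptor tells software how the data files are organized, here is a minimal Python sketch that reads the core file of an archive. It assumes a simplified, hypothetical meta.xml with a single occurrence core and no extensions; a real archive is a zip file, whereas here the file contents are passed in a dictionary, and default values and extension files are ignored.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Simplified, hypothetical meta.xml describing one occurrence core file.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core encoding="UTF-8" fieldsTerminatedBy="," ignoreHeaderLines="1"
        rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.csv</location></files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/eventDate"/>
  </core>
</archive>"""

# Hypothetical contents of the core data file named in <location>.
FILES = {
    "occurrence.csv": "occurrenceID,scientificName,eventDate\n"
                      "occ-1,Parus major,2015-05-02\n"
                      "occ-2,Cyanistes caeruleus,2015-05-03\n"
}

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def parse_core(meta_xml):
    """Read the core file name, delimiter, header line count and index->term map."""
    core = ET.fromstring(meta_xml).find("dwc:core", NS)
    location = core.find("dwc:files/dwc:location", NS).text
    # meta.xml writes delimiters as escapes such as "\t"; decode them to real characters.
    delimiter = core.get("fieldsTerminatedBy", ",").encode().decode("unicode_escape")
    header_lines = int(core.get("ignoreHeaderLines", "0"))
    terms = {int(f.get("index")): f.get("term") for f in core.findall("dwc:field", NS)}
    return location, delimiter, header_lines, terms


def read_core(meta_xml, files):
    """Yield each core row as a dict keyed by the short Darwin Core term name."""
    location, delimiter, header_lines, terms = parse_core(meta_xml)
    rows = list(csv.reader(io.StringIO(files[location]), delimiter=delimiter))
    for row in rows[header_lines:]:
        yield {terms[i].rsplit("/", 1)[-1]: value
               for i, value in enumerate(row) if i in terms}


records = list(read_core(META_XML, FILES))
print(records[0])
```

Because the column meaning comes from the term URIs in meta.xml rather than from the header row, the same reader works whatever the column headers happen to be called.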


Why publish data?

Data produced using public funds should be regarded as a common good, and should be openly published and made available for inspection, interpretation and re-use by third parties.

Open data increases transparency and the overall quality of science;

Published datasets can be re-analyzed and verified by others;

Published data can be cited and re-used in the future, either alone or in association with other data;

Data can be integrated with other datasets across both space and time;

Data integration increases recognition and opportunities for collaboration;


Duplication of data-collecting efforts and associated costs will be reduced;

Published data can be indexed and made discoverable, browsable and searchable through internet services (e.g., Web search engines) or more specific infrastructures (e.g., GBIF for biodiversity data);

Collection managers can trace usage and citations of digitized data from their collections;

Data creators, and their institutions and funding agencies, can be credited for their work of data creation and publication through the conventional channels of scholarly citation; priority and authorship are achieved in the same way as with the publication of a research paper;

Open data increases the potential for interdisciplinary research, and for re-use in new contexts not envisaged by the data creator;


Datasets and their metadata, and any related Data Papers, may be inter-linked into Research Objects, to expedite and mutually extend their dissemination, to the benefit of the authors, other scientists in their fields, and society at large;

Published data may be structured as 'Linked Data', and so help create new knowledge.


•  All rights reserved → data unusable

•  Open Data Commons Public Domain Dedication and License (PDDL) or CC0

•  Creative Commons Attribution-NoDerivs (CC BY-ND)

•  Creative Commons Attribution-NonCommercial (CC BY-NC)

•  Creative Commons Attribution-ShareAlike (CC BY-SA) or Open Data Commons Open Database License (ODbL)

•  Creative Commons Attribution (CC BY) or Open Data Commons Attribution License (ODC-By)

http://www.canadensys.net/2012/why-we-should-publish-our-data-under-cc0

WHAT DATA-LICENSE SHOULD I USE?


Why publish data papers?

Improve the usability of your published data! Receive credit through indexing and citation of the published paper.

Increase the visibility and credibility of the data resources you publish. Track the usage and citations of the data you publish more effectively. Receive feedback on your dataset. Increase your network. Get more out of your data. Improve the quality of your dataset.


Metadata, literally "data about data", are an essential component of a data management system, describing such aspects as the "what, where, when, who and how" pertaining to a resource. In the GBIF context, resources are datasets, loosely defined as collections of related data, the granularity of which is determined by the data custodian. Metadata can occur at several levels of completeness. In general, metadata should allow a prospective end user of data to:

1. identify/discover its existence,
2. learn how to access or acquire the data,
3. understand its fitness-for-use,
4. learn how to transfer (obtain a copy of) the data, and
5. learn how the data should be used.

GBIF (2011). GBIF Metadata Profile, Reference Guide, Feb 2011 (contributed by Ó Tuama, E., Braak, K.). Copenhagen: Global Biodiversity Information Facility, 19 pp. Accessible at http://links.gbif.org/gbif_metadata_profile_how-to_en_v1

METADATA


GBIF metadata profile

The GBIF Metadata Profile is primarily based on the Ecological Metadata Language (EML). The GBIF profile utilizes a subset of EML and extends it to include additional requirements that are not accommodated in the EML specification. The following tables provide short descriptions of the profile elements and, where relevant, links to more complete EML descriptions.

-  http://knb.ecoinformatics.org/software/eml/
-  https://knb.ecoinformatics.org/#external//emlparser/docs/index.html

The GBIF Metadata Profile (GMP) was developed to standardize how resources are described at the dataset level in the GBIF Data Portal. This profile can be transformed into other common metadata formats, such as the ISO 19139 metadata profile.

The elements used in the IPT (Integrated Publishing Toolkit) are a subset of the complete GBIF Metadata Profile.


DESCRIBING A DATASET

IPT metadata based on GBIF metadata profile


•  Dataset (Resource)
•  Project
•  People and Organisations
•  Keyword Set (General Keywords)
•  Coverage
   o  Taxonomic Coverage
   o  Geographic Coverage
   o  Temporal Coverage
•  Methods
•  Intellectual Property Rights
•  Additional Metadata + NCD (Natural Collections Descriptions Data)

Element | Description
title | A description of the resource that is being documented that is long enough to differentiate it from other similar resources.
description | A brief overview of the resource that is being documented.
metadata language | The language in which the metadata document is written.
type | The type of resource.
subtype | Specimen or observations.

BASIC METADATA
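As a rough illustration of where these basic elements end up, the Python sketch below builds a bare EML document with Python's standard ElementTree module. The element names follow EML 2.1.1 as we understand it; placing the metadata language in the root `xml:lang` attribute mirrors what IPT-generated EML files commonly do, but treat that as an assumption. The `packageId` and `system` values are placeholders, type and subtype are left out, and the result is not schema-valid on its own (EML also requires at least a creator and a contact).

```python
import xml.etree.ElementTree as ET

EML_NS = "eml://ecoinformatics.org/eml-2.1.1"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("eml", EML_NS)


def basic_metadata(title, description, metadata_language):
    """Build a bare EML document holding the title, description and metadata language."""
    root = ET.Element(f"{{{EML_NS}}}eml", {
        "packageId": "hypothetical-package-id",  # placeholder, assigned by the publisher
        "system": "http://gbif.org",             # placeholder system identifier
        XML_LANG: metadata_language,             # language the metadata is written in
    })
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    # The dataset description is carried as a paragraph inside <abstract>.
    ET.SubElement(ET.SubElement(dataset, "abstract"), "para").text = description
    return root


doc = basic_metadata("Hypothetical bird survey dataset",
                     "Invented example used only to illustrate the element layout.",
                     "eng")
print(ET.tostring(doc, encoding="unicode"))
```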


Element | Description
resource contact | The person or organisation that should be contacted to get more information about the resource, that curates the resource, or to whom putative problems with the resource or its data should be addressed.
resource creator | The person or organisation responsible for the original creation of the resource content.
metadata provider | The person or organisation responsible for producing the resource metadata.
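The three roles in this table share one structure in EML (a "responsible party"), which the Python sketch below builds with ElementTree. The helper name `party` and all names and addresses are hypothetical; only a subset of the responsible-party children (individual name, organisation name, e-mail address) is covered, written in the order we understand the EML 2.1.1 schema to expect.

```python
import xml.etree.ElementTree as ET


def party(role_tag, given_name=None, sur_name=None, organization=None, email=None):
    """Build an EML responsible-party element such as creator, contact or metadataProvider."""
    if not (sur_name or organization):
        raise ValueError("a party needs a person's surname or an organisation name")
    elem = ET.Element(role_tag)
    if sur_name:
        name = ET.SubElement(elem, "individualName")
        if given_name:
            ET.SubElement(name, "givenName").text = given_name
        ET.SubElement(name, "surName").text = sur_name
    if organization:
        ET.SubElement(elem, "organizationName").text = organization
    if email:
        ET.SubElement(elem, "electronicMailAddress").text = email
    return elem


# Hypothetical people and organisation, for illustration only.
creator = party("creator", "Jane", "Doe", "Example Natural History Museum")
contact = party("contact", "John", "Roe", email="john.roe@example.org")
provider = party("metadataProvider", organization="Example Natural History Museum")
for elem in (creator, contact, provider):
    print(ET.tostring(elem, encoding="unicode"))
```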


Element | Description
geographicDescription | Short text description of a dataset's geographic areal domain.
westBoundingCoordinate | Field covering the W margin of a bounding box.
eastBoundingCoordinate | Field covering the E margin of a bounding box.
northBoundingCoordinate | Field covering the N margin of a bounding box.
southBoundingCoordinate | Field covering the S margin of a bounding box.
description | Short description of the geographical coverage.

GEOGRAPHIC COVERAGE
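A minimal Python sketch of the geographic coverage elements, assuming coordinates in decimal degrees. It applies only basic range checks; the coordinates and description are illustrative values, not taken from any real dataset. West is deliberately allowed to exceed east, since that is how a box crossing the 180th meridian is expressed.

```python
import xml.etree.ElementTree as ET


def geographic_coverage(description, west, east, north, south):
    """Build an EML geographicCoverage element after basic range checks (decimal degrees)."""
    for name, value, limit in (("west", west, 180), ("east", east, 180),
                               ("north", north, 90), ("south", south, 90)):
        if not -limit <= value <= limit:
            raise ValueError(f"{name}BoundingCoordinate {value} is outside +/-{limit}")
    if south > north:
        raise ValueError("southBoundingCoordinate must not exceed northBoundingCoordinate")
    # west > east is allowed: it describes a box that crosses the 180th meridian.
    cov = ET.Element("geographicCoverage")
    ET.SubElement(cov, "geographicDescription").text = description
    box = ET.SubElement(cov, "boundingCoordinates")
    for tag, value in (("westBoundingCoordinate", west), ("eastBoundingCoordinate", east),
                       ("northBoundingCoordinate", north), ("southBoundingCoordinate", south)):
        ET.SubElement(box, tag).text = str(value)
    return cov


cov = geographic_coverage("Hypothetical study area", west=9.5, east=11.5, north=64.0, south=63.0)
print(ET.tostring(cov, encoding="unicode"))
```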


TAXONOMIC COVERAGE


Element | Description
Taxonomic coverage description | A description of the range of taxa addressed in the dataset or collection.
Add several taxa | Add Scientific Name; Common Name; Rank.

Add taxa in metadata up to the lowest shared rank.
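One way to read "add taxa up to the lowest shared rank" is: find the deepest rank at which every taxon in the dataset has the same name, and describe the coverage at that rank. The Python sketch below does that under this interpretation, using a fixed, hypothetical list of six ranks, and then builds EML taxonomicClassification entries from (scientific name, common name, rank) triples. The two tit species are a made-up example dataset.

```python
import xml.etree.ElementTree as ET

RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def lowest_shared_rank(classifications):
    """Return (rank, name) for the deepest rank at which all taxa agree, or None."""
    shared = None
    for rank in RANKS:
        names = {c.get(rank) for c in classifications}
        if len(names) != 1 or None in names:
            break
        shared = (rank, names.pop())
    return shared


def taxonomic_coverage(description, taxa):
    """Build an EML taxonomicCoverage from (scientific name, common name, rank) tuples."""
    cov = ET.Element("taxonomicCoverage")
    ET.SubElement(cov, "generalTaxonomicCoverage").text = description
    for scientific_name, common_name, rank in taxa:
        tc = ET.SubElement(cov, "taxonomicClassification")
        ET.SubElement(tc, "taxonRankName").text = rank
        ET.SubElement(tc, "taxonRankValue").text = scientific_name
        if common_name:
            ET.SubElement(tc, "commonName").text = common_name
    return cov


# Classifications of two tit species recorded in a hypothetical dataset.
taxa = [
    {"kingdom": "Animalia", "phylum": "Chordata", "class": "Aves",
     "order": "Passeriformes", "family": "Paridae", "genus": "Parus"},
    {"kingdom": "Animalia", "phylum": "Chordata", "class": "Aves",
     "order": "Passeriformes", "family": "Paridae", "genus": "Cyanistes"},
]
rank, name = lowest_shared_rank(taxa)
print(rank, name)  # family Paridae
cov = taxonomic_coverage("Tits (Paridae)", [(name, "tits", rank)])
```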

Element | Description
Start date | = beginDate
End date | = endDate

TEMPORAL COVERAGE

https://knb.ecoinformatics.org/#external//emlparser/docs/eml-2.1.1/./eml-resource.html
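The start and end dates can be derived from the data themselves. The Python sketch below takes the earliest and latest event dates and writes them as EML beginDate and endDate. It assumes every date is a complete ISO 8601 calendar date (YYYY-MM-DD); partial dates or date ranges, which occur in real data, would need extra parsing. The event dates are hypothetical.

```python
import datetime as dt
import xml.etree.ElementTree as ET


def temporal_coverage(event_dates):
    """Build an EML rangeOfDates from the earliest and latest ISO 8601 dates (YYYY-MM-DD)."""
    dates = sorted(dt.date.fromisoformat(d) for d in event_dates)
    if not dates:
        raise ValueError("no dates to summarise")
    cov = ET.Element("temporalCoverage")
    date_range = ET.SubElement(cov, "rangeOfDates")
    for tag, day in (("beginDate", dates[0]), ("endDate", dates[-1])):
        ET.SubElement(ET.SubElement(date_range, tag), "calendarDate").text = day.isoformat()
    return cov


cov = temporal_coverage(["2015-05-03", "2014-06-11", "2015-09-30"])  # hypothetical event dates
print(ET.tostring(cov, encoding="unicode"))
```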

Element | Description
Thesaurus/vocabulary | n/a
Keyword list | Keywords separated by ","

KEYWORDS

Element | Description
Resource contact |

ASSOCIATED PARTIES
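For the keyword list (keywords separated by ","), a small Python sketch shows how such a list can be split into individual EML keyword elements. Trimming spaces and dropping empty or duplicate entries is our own choice rather than a documented IPT rule, and "n/a" is used as the thesaurus value, matching the table.

```python
import xml.etree.ElementTree as ET


def keyword_set(keyword_list, thesaurus="n/a"):
    """Split a comma-separated keyword list and build an EML keywordSet."""
    # Strip whitespace, drop empty entries and duplicates while keeping the original order.
    keywords = list(dict.fromkeys(k.strip() for k in keyword_list.split(",") if k.strip()))
    kset = ET.Element("keywordSet")
    for keyword in keywords:
        ET.SubElement(kset, "keyword").text = keyword
    ET.SubElement(kset, "keywordThesaurus").text = thesaurus
    return kset


kset = keyword_set("Occurrence, Observation, , birds, Occurrence")
print([k.text for k in kset.findall("keyword")])  # ['Occurrence', 'Observation', 'birds']
```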


Element | Description
title | Title of the project
Personnel | First name, role
funding | Reference to funding partners
Study area description | → geographic coverage elaborated
Design description | Project abstract & design

PROJECT DATA
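A minimal Python sketch of the project block, limited to title, personnel and funding. The study area and design descriptions are omitted because their EML structure is more involved. The project, person, role value and funding text are all hypothetical.

```python
import xml.etree.ElementTree as ET


def project(title, personnel, funding):
    """Build a minimal EML project; personnel is a list of (given name, surname, role)."""
    proj = ET.Element("project")
    ET.SubElement(proj, "title").text = title
    for given, sur, role in personnel:
        person = ET.SubElement(proj, "personnel")
        name = ET.SubElement(person, "individualName")
        ET.SubElement(name, "givenName").text = given
        ET.SubElement(name, "surName").text = sur
        ET.SubElement(person, "role").text = role
    # Funding is free text, written as a paragraph.
    ET.SubElement(ET.SubElement(proj, "funding"), "para").text = funding
    return proj


proj = project("Hypothetical bird monitoring project",
               [("Jane", "Doe", "principalInvestigator")],
               "Hypothetical grant from an example funding agency.")
print(ET.tostring(proj, encoding="unicode"))
```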


Element | Description
Study extent | Sampling area (specific) and sampling frequency
Sampling description |
Quality control | Validation and quality control actions performed on the dataset
Step description | Procedures followed to produce a data object

SAMPLING METHODS
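The sampling method fields map onto the EML methods element. The Python sketch below builds it in the order we understand EML 2.1.1 to require: method steps first, then sampling (study extent and sampling description), then quality control. The helper `add_path` and all example texts are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_path(parent, *tags):
    """Create nested children parent/tags[0]/tags[1]/... and return the innermost one."""
    for tag in tags:
        parent = ET.SubElement(parent, tag)
    return parent


def methods(steps, study_extent, sampling_description, quality_control):
    """Build an EML methods element: method steps, then sampling, then quality control."""
    m = ET.Element("methods")
    for step in steps:
        add_path(m, "methodStep", "description", "para").text = step
    sampling = ET.SubElement(m, "sampling")
    add_path(sampling, "studyExtent", "description", "para").text = study_extent
    add_path(sampling, "samplingDescription", "para").text = sampling_description
    add_path(m, "qualityControl", "description", "para").text = quality_control
    return m


m = methods(
    ["Birds counted along fixed transects.", "Counts entered into a spreadsheet."],
    "Hypothetical study area, surveyed monthly.",
    "Point counts of 10 minutes per station.",
    "Scientific names checked against a reference checklist.",
)
print(ET.tostring(m, encoding="unicode"))
```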


Element | Description
Citation identifier | URI or DOI
Resource citation | Citation

CITATIONS

COLLECTION DATA
Element | Description
Collection name |
Collection identifier |
Parent collection identifier |
Preservation method |


Element | Description
Resource homepage | URL
Add new external link |

EXTERNAL LINKS

ADDITIONAL METADATA
Element | Description
Hierarchy level | dataset
Date published |
Purpose (Rationale) |
License |
IP rights |
Additional information |
