Global content summit: Overview, content partnering, richness

Cynthia Parr Species Pages Group Global Content Summit 17-19 Jan 2011


These are Cyndy Parr's presentations at the EOL Global Partner Summit, starting with an overview of the meeting, and including an overview of how we set up content partnerships, and how we calculate and use page richness scores.

Transcript of Global content summit: Overview, content partnering, richness

Page 1: Global content summit: Overview, content partnering, richness

Cynthia Parr
Species Pages Group

Global Content Summit
17-19 Jan 2011

Page 2: Global content summit: Overview, content partnering, richness

http://www.eol.org
• All species known to science
• Freely accessible: open access, open source
• Available from a single portal in a common format
• Quality
• Constantly growing
• Aimed at multiple audiences

Page 3: Global content summit: Overview, content partnering, richness

EOL Global Partners

China

Australia

Dutch

South Africa

Costa Rica

Mexico

Pan-Arab

India

Colombia

Peru

GBIF

ViBRANT

BHL-Global BHL

Page 4: Global content summit: Overview, content partnering, richness

Aims of global partners

Global access to knowledge about life on Earth

To increase awareness and understanding of living nature through an Encyclopedia of Life that gathers, generates and shares knowledge in an open, freely accessible and trusted digital resource

Work together towards this vision and mission, sharing expertise and knowledge as appropriate

Expand the global pool of knowledge about biodiversity and improve access to it

Page 5: Global content summit: Overview, content partnering, richness

Aims of this workshop
• Gather content experts from Global Partners
• Become familiar with each other’s work
• Learn how core EOL works and provide feedback on it
• Form the Species Pages Working Group
    • Team at Smithsonian (SPG)
    • Representatives from global partners
• Draft individual plans that complement each other towards a common goal
• Remind ourselves WHY we want to do this

Page 6: Global content summit: Overview, content partnering, richness

What is content?

Biological information
• Names and hierarchies
• Descriptive text
• Literature
• Multimedia
• Maps
• Links to more information

… what about comments, collection annotations?

Page 7: Global content summit: Overview, content partnering, richness

Overview of agenda

Day 1: Introductions
Day 2: Sharing
Day 3: Planning

Page 8: Global content summit: Overview, content partnering, richness

Acknowledgements
• Funding from:
    • David M. Rubenstein gift
    • John D. and Catherine T. MacArthur Foundation
    • Alfred P. Sloan Foundation
    • Smithsonian Institution
    • Marine Biological Laboratory
    • Harvard University and other funders and donors
• All our content partners and global partners
• Volunteer curators and individual contributors via Flickr, Wikimedia, and members of EOL
• All of you for coming
• Claire Badgley

Page 9: Global content summit: Overview, content partnering, richness

Cynthia Parr
Species Pages Group

Global Content Summit
17-19 Jan 2011

Overview of Content Partnering

Page 10: Global content summit: Overview, content partnering, richness

EOL is a content curation community

Databases
Journals
LifeDesks & Scratchpads
Public contributions

Aggregate
eol.org

Curate
Comment
Rate, Collect

API

Third party apps

Quality control, prioritization

Page 13: Global content summit: Overview, content partnering, richness

Low hanging fruit

Photo credit: Stanislas PERRIN

Page 14: Global content summit: Overview, content partnering, richness

Partner trajectory

[Figure: number of partners (y-axis, 0 to 150) by quarter (x-axis, Y1Q3 to Y4Q3)]

Page 15: Global content summit: Overview, content partnering, richness

Long Tail in databases contributing to EOL

[Figure: number of taxa for which content is contributed to EOL (y-axis, 0 to 600,000), with partners in order of # taxa contributed to EOL on the x-axis]

… viewed on log scale

[Figure: the same plot with a logarithmic y-axis, 1 to 1,000,000]

Page 16: Global content summit: Overview, content partnering, richness

Content strategy

Highlights
Priorities
Richness score
Processes
Goals

Page 17: Global content summit: Overview, content partnering, richness

http://eol.org/info/partners

Page 18: Global content summit: Overview, content partnering, richness

Content Partner process overview
• Partner creates an EOL member account
• Adds a content partner
• We communicate with them
• They (or we) upload a resource file or set a URL where one can be found
• They set a harvest frequency
• EOL harvests at that frequency
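
The last two steps of this process amount to a simple scheduling rule. A minimal sketch in Python, assuming each resource records when it was last harvested and a frequency in days. The `Resource` class, its field names, and `is_due` are hypothetical names for illustration, not EOL's actual code:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Resource:
    """Hypothetical record for one partner resource (illustrative only)."""
    url: str                                   # where the resource file can be found
    harvest_every_days: int                    # frequency chosen by the partner
    last_harvested: Optional[datetime] = None  # None until the first harvest


def is_due(resource: Resource, now: datetime) -> bool:
    """Return True if the resource should be harvested at time `now`."""
    if resource.last_harvested is None:
        return True  # never harvested, so it is due immediately
    elapsed = now - resource.last_harvested
    return elapsed >= timedelta(days=resource.harvest_every_days)
```

A harvester built this way would loop over all registered resources on a timer and harvest only those for which `is_due` returns True.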

Page 19: Global content summit: Overview, content partnering, richness

Current methods of data transfer
• EOL resource document (XML) (usually they do the work)
• Spreadsheet upload (either can do the work)
• Connector (we do the work)
    • Scrape web site or PDF
    • Use web services
    • Work from a copy of DB
• Darwin Core Archive (classifications, soon)

See http://eol.org/info/cp_resource_checklist

Page 20: Global content summit: Overview, content partnering, richness

How EOL gets content (n=141 partners)

[Figure: bar chart of number of partners (y-axis, 0 to 70) by transfer method: XML resource doc, Connector, LD/eLD/Scratchpad, Spreadsheet. Legend: CSV, web service, PDF, HTML, DB, LD/eLD/Scratchpad]

Page 21: Global content summit: Overview, content partnering, richness

Example partner
• Pensoft has a process to generate EOL-compliant XML for new species
• Also sends images to Morphbank, specimens to GBIF
• They registered the URL at EOL
• Our script checks for changes once a day
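
The slide does not say how the daily script detects changes. One common approach, shown here purely as an assumption, is to store a hash of the last harvested file and compare it with the hash of the newly fetched one. The function names below are hypothetical:

```python
import hashlib
from typing import Optional


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest of a resource file's bytes."""
    return hashlib.sha256(content).hexdigest()


def has_changed(content: bytes, previous_fingerprint: Optional[str]) -> bool:
    """Return True if the file differs from the last harvested version.

    `previous_fingerprint` is None when the resource has never been harvested.
    """
    if previous_fingerprint is None:
        return True
    return fingerprint(content) != previous_fingerprint
```

With this design the script only needs to keep one short string per resource between runs, and an unchanged file can be skipped without re-parsing the XML.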

Page 22: Global content summit: Overview, content partnering, richness

EOL Schema Sources

Content type               Standards used
Taxa                       Darwin Core Archive
Attribution & licensing    Dublin & Darwin Core
Text objects & links       Species Profile Model (and now +)
Multimedia                 Dublin (+ Audubon Core)

Page 23: Global content summit: Overview, content partnering, richness

EOL Table of Contents                                  TDWG Species Profile Model
Physical Description › Morphology                      #Morphology
Physical Description › Size                            #Size
Ecology › Habitat                                      #Habitat
Ecology › Associations                                 #Associations
Life History & Behavior › Life Expectancy              #LifeExpectancy
Evolution and Systematics › Functional Adaptations     #Evolution
Conservation › Conservation Status                     #ConservationStatus
Molecular Biology and Genetics › Genetics              #Genetics
Molecular Biology and Genetics › Genome                #MolecularBiology
Molecular Biology and Genetics › Molecular Biology     #MolecularBiology
Nucleotide Sequences                                   #MolecularBiology

Example biological content
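
The mapping on this slide is many-to-one: several EOL subchapters share a single Species Profile Model term. A small Python sketch of that lookup, using only the pairs listed on the slide. The dictionary and function names are hypothetical, and the subchapter name alone is used as the key because the slide gives no chapter for Nucleotide Sequences:

```python
from typing import List, Optional

# Pairs copied from the slide: EOL subchapter -> SPM term fragment.
TOC_TO_SPM = {
    "Morphology": "#Morphology",
    "Size": "#Size",
    "Habitat": "#Habitat",
    "Associations": "#Associations",
    "Life Expectancy": "#LifeExpectancy",
    "Functional Adaptations": "#Evolution",
    "Conservation Status": "#ConservationStatus",
    "Genetics": "#Genetics",
    "Genome": "#MolecularBiology",
    "Molecular Biology": "#MolecularBiology",
    "Nucleotide Sequences": "#MolecularBiology",
}


def spm_term(subchapter: str) -> Optional[str]:
    """Return the SPM term for an EOL subchapter, or None if it is not listed."""
    return TOC_TO_SPM.get(subchapter)


def subchapters_for(term: str) -> List[str]:
    """Return every EOL subchapter that maps to the given SPM term."""
    return [name for name, mapped in TOC_TO_SPM.items() if mapped == term]
```

Calling `subchapters_for("#MolecularBiology")` shows the collapse: three EOL subchapters land on one SPM term, so the reverse lookup cannot be a plain dictionary.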

Page 24: Global content summit: Overview, content partnering, richness

EOL v2

Plinian Core

DwC description

SPM infoitem

using Darwin Core Archive flat files as transport mechanism

Page 25: Global content summit: Overview, content partnering, richness

EOL v3?

Relations

Numeric values

Controlled vocabulary

Page 26: Global content summit: Overview, content partnering, richness

Partners
• Can delete or replace any of their objects
• Control how often we harvest, and can force a harvest
• Get an automatically updating collection
• Can request that we use their classification for browsing
• Can change the logo and description of their project
• Receive comments and curator actions immediately
• Receive monthly reminders that they can get traffic statistics
• Get many links back to their original web resources

Page 28: Global content summit: Overview, content partnering, richness

Partners cannot

• Publish the very first time
• Decide if they are pre-vetted
• Roll back a harvest
• Change the objects of any other partners
• Change classifications from any other partners

Page 29: Global content summit: Overview, content partnering, richness

Cynthia Parr
Species Pages Group

Global Content Summit
17-19 Jan 2011

Richness scores

http://eol.org/pages/704102

Page 30: Global content summit: Overview, content partnering, richness

Taxon page richness algorithm

Richness = a (Breadth) + b (Depth) + c (Diversity)

Weights: a = 60%, b = 30%, c = 10%

Breadth: Images, topics of text objects, references, maps, videos, sounds, conservation status

Depth: # words per text object, # words total

Diversity: Sources (partners)

Score range: 0 – 100, Threshold 40

Page 31: Global content summit: Overview, content partnering, richness

Summary of EOL page richness

Overall
• 950,000 have content
• 2 % are rich
• ~22 % have only links to literature

Hot List
• 30 % of 75K are rich
• Average richness = ~30

Red Hot List
• 56 % of 3K are rich
• Average richness = 43
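
For scale, the Hot List and Red Hot List percentages can be turned into approximate page counts. The sketch below takes the slide's rounded sizes (75K and 3K) at face value, so the results are rough. The Overall figure is left out because the slide does not say whether its 2 % is measured against the 950,000 pages with content or against all pages. The function name is hypothetical:

```python
def approx_rich_pages(list_size: int, percent_rich: int) -> int:
    """Approximate number of rich pages, given a list size and a whole-number percentage."""
    return list_size * percent_rich // 100


hot = approx_rich_pages(75_000, 30)      # Hot List: 30 % of 75K
red_hot = approx_rich_pages(3_000, 56)   # Red Hot List: 56 % of 3K
print(hot, red_hot)                      # prints: 22500 1680
```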

Page 32: Global content summit: Overview, content partnering, richness

How richness is used
• Choose images for home page “March of Life”
• Allows sorting in collections
    • Weird life example
• Helps provide best search and API results

Any other ideas? Could we be matchmakers between pages needing enrichment and users?

Page 34: Global content summit: Overview, content partnering, richness

Strategies for improving richness

Crowd-sourcing
• Collections
• Communities
• Mobile apps

Leveraging
• Enabling platforms
• Enabling journals
• Data mining BHL etc.

Page 35: Global content summit: Overview, content partnering, richness

The page richness index

• Helps fill gaps with existing knowledge
• Helps prioritize funding and training so that it has maximum impact on closing true gaps
• Will be available via API

Computing and storing richness index on EOL is a step towards storing and serving computable data