Lichens, Bryophytes and Climate Change

Edward Gilbert, Corinna Gries, Thomas H. Nash III, Robert Anglin


Transcript of Lichens, Bryophytes and Climate Change

Page 1: Lichens, Bryophytes and Climate Change

Lichens, Bryophytes and Climate Change

Edward Gilbert, Corinna Gries, Thomas H. Nash III, Robert Anglin

Page 2: Lichens, Bryophytes and Climate Change

Goals and Scope
16 digitization centers
> 60 non-governmental US herbaria (95%)
Mexico, US, Canada
~ 2.3 million specimens
90% of all specimens
900,000 lichens
1.4 million bryophytes

Page 3: Lichens, Bryophytes and Climate Change

Project Information

http://lbcc.limnology.wisc.edu/

Page 4: Lichens, Bryophytes and Climate Change

Digitization Workflow

Page 5: Lichens, Bryophytes and Climate Change

National Portals

Lichen Consortium
http://lichenportal.org
Started in 2009
24 Collections
~ 797,916 Records

Bryophyte Consortium
http://bryophyteportal/
Started in 2010
16 Collections
1,059,063 Records

Page 6: Lichens, Bryophytes and Climate Change

Imaging Stage
Capture Image: barcode in file name
Create Skeleton File: barcode, species name, exsiccati, etc.
Upload to FTP server
Image processing: extract barcode, create web versions, map to portal DBs
Duplicate Harvesting
Existing Herbarium Database
Automated Processing: OCR / NLP / Georeferencing (augmented with raw OCR, parsed fields, coordinates, etc.)
Existing Record: simply link image
Upload to FTP server
Image URLs
Manage Specimen Data in Portal
Manage / Review Records in Portal
Symbiota Editor: review, edit, keystroke, and finalize
Create New Record: barcode, image, skeletal data

Page 7: Lichens, Bryophytes and Climate Change

LBCC: Workflow Overview
Image all specimens / specimen labels
Collect and load skeletal data
    Barcode, scientific name, country, state
Upload to portal
    Record exists => link image to existing record
    Record absent => create empty “unprocessed” record
Automated OCR label
    Block of raw text => database
Automated NLP (field parsing)
Review data
    Keystroke full record
    Collector name & number => look for dups
    Reparse full record => learnable parsers
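The upload step above (link the image when a record exists, otherwise create an empty "unprocessed" record) could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the barcode pattern, the in-memory `records` dictionary, and the field names are all assumptions made for the example.

```python
import re
from pathlib import Path

# Hypothetical barcode pattern: letters followed by digits, e.g. "ASU0012345".
BARCODE_RE = re.compile(r"[A-Z]+\d+")

def barcode_from_filename(filename):
    """Return the first barcode-like token in the file name, or None."""
    match = BARCODE_RE.search(Path(filename).stem.upper())
    return match.group(0) if match else None

def attach_image(records, filename, image_url):
    """Link the image to an existing record, or create an empty 'unprocessed' one."""
    barcode = barcode_from_filename(filename)
    if barcode is None:
        raise ValueError(f"no barcode found in {filename!r}")
    record = records.get(barcode)
    if record is None:
        # Record absent: create a skeletal placeholder for later OCR / NLP.
        record = {"barcode": barcode, "status": "unprocessed", "images": []}
        records[barcode] = record
    record["images"].append(image_url)
    return record
```

Keying everything on the barcode taken from the file name means the imaging station needs no database access; matching happens only at upload time.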

Page 8: Lichens, Bryophytes and Climate Change

Optical Character Recognition
Tesseract V3
Dual cycle
    Automatic
    Manual review
Expected hurdles
    Handwritten labels
    Old fonts
    Faded labels
    Form labels
Adjustable image variables

¢_].L.|»‘¢ .'».f.'._..‘~,(.Jfin-x‘*\'a:"511z:1 wf .~\:'i/.onli State UniversityP.’~.r"~2= ,_. gg J:.2 " J*J*" †(=:\‘-“ax "»..'\-12�‘ “ "‘ ;T~;‘~7i?»-1_1_\f;>sf`;,' ESXZ»ie+‘-». “~'.»te;~:i_.t<» ff`t;~f3":.f.“» »4 xx, ,"""‘“â€T"’ <1;-.rs f3'a,1.z>.t;;a¢f~rus ’�V4 J 'if . r°'° M '1?nies ivain.) Sav.neutal Station - " '1 ~»r';;4-\P ` 1.T11 ./P.. ,J ..-.ELEV. ' `.fJL_\ LATL Q _‘ 1 _ Y’ DATE_ ,. W5. (> f- , -:‘; i f>i_T ~~ . A 1:». v\ .-v »~. 4. a xvala 8/27/73

PLANTS OF NEW r~1ExIco
Herbarium of Arizona State University
Parmelia ulophyllodes (Vain.) Sav.
COUNTY “°”““
Joranada Experimental Station - New Mexico State University
"“““' on Juniperus
ELEV. ‘ 4400
EEILLEETUR DATE
DU T. H. Nash #7914 8/27/73
T. H. N.

Page 9: Lichens, Bryophytes and Climate Change

Auto-Processing: OCR

1. Iterate through new “unprocessed” images
    a) 81439 bryophyte images
    b) 147122 lichen images
2. OCR via Tesseract (version 3)
    a) Untreated image
    b) Treated image (contrast, brightness, etc.)
3. Store raw text linked to skeletal record
4. Progress to next step
    a) Low OCR return => hand processing
    b) “Unprocessed-OCR” => NLP
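One way to picture steps 2 to 4 is the sketch below. It makes several assumptions that the slides do not state: `ocr` and `treat` are caller-supplied functions (for instance a wrapper around Tesseract and a contrast/brightness adjustment), the better of the two OCR results is kept, and "low OCR return" is approximated by a crude word count with an arbitrary threshold.

```python
def text_score(text):
    """Count alphabetic words of three or more letters as a crude legibility score."""
    return sum(1 for word in text.split() if len(word) >= 3 and word.isalpha())

def ocr_dual_pass(image, ocr, treat, min_score=5):
    """OCR the untreated and the treated image, keep the better text, and route it.

    `ocr` and `treat` are hypothetical callables supplied by the caller;
    `min_score` is an assumed threshold for a "low OCR return".
    """
    candidates = [ocr(image), ocr(treat(image))]
    best = max(candidates, key=text_score)
    status = "unprocessed-OCR" if text_score(best) >= min_score else "hand processing"
    return best, status
```

Passing the OCR engine in as a function keeps the routing logic testable without an OCR installation.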

Page 10: Lichens, Bryophytes and Climate Change

Auto-Processing: NLP

1. Iterate through raw OCR text blocks
    a) 147122 lichen OCR blocks
    b) 81439 bryophyte OCR blocks
2. Collector, number, and date
    a) Attempt duplicate harvesting
3. Field-by-field parsing
4. Full-parsing
5. Parsing based on NLP profiles
    a) E.g. targeted label formats
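As an illustration of step 2, the sketch below pulls a collector's last name, number, and date out of a raw OCR block with a single regular expression. The layout it assumes (initials, last name, optional "#", number, numeric date) is only one of many label formats, and the pattern is an assumption made for this example rather than the parser used by the project.

```python
import re

# Assumed layout: initials, last name, optional "#", collector number, numeric date.
COLLECTOR_RE = re.compile(
    r"(?:[A-Z]\.\s*)+(?P<last_name>[A-Z][a-z]+)\s*#?\s*(?P<number>\d+)\s+"
    r"(?P<date>\d{1,2}/\d{1,2}/\d{2,4})"
)

def extract_collector(ocr_text):
    """Return last name, number, and date as a dict, or None if nothing matches."""
    match = COLLECTOR_RE.search(ocr_text)
    return match.groupdict() if match else None
```

Applied to the text "T. H. Nash #7914 8/27/73" from the OCR slide, this yields the last name "Nash", the number "7914", and the date "8/27/73", which are the three values needed to look for duplicates.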

Page 11: Lichens, Bryophytes and Climate Change

NLP: Duplicate Harvesting
1. Extract collector data
    a) Last name, number, date
2. Harvest duplicates from consortium DB
    a) Exact duplicates
    b) Duplicate events
3. Compare return field-by-field
4. Compare fields with raw OCR
5. Populate fields that have high similarity indexes
6. Processing status: “pending review”
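Steps 4 to 6 might look roughly like the sketch below, which uses Python's standard `difflib.SequenceMatcher` as the similarity index. The slides do not say which measure is used, so this choice, the 0.8 threshold, the word-window comparison, and the field names are all assumptions for illustration.

```python
from difflib import SequenceMatcher

def best_similarity(value, ocr_text):
    """Best ratio between a field value and any same-length word window of the OCR text."""
    words = ocr_text.lower().split()
    n = max(1, len(value.split()))
    windows = [" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))]
    return max(SequenceMatcher(None, value.lower(), w).ratio() for w in windows)

def populate_from_duplicate(duplicate, ocr_text, threshold=0.8):
    """Copy fields from a harvested duplicate when they closely match the raw OCR."""
    record = {field: value for field, value in duplicate.items()
              if value and best_similarity(value, ocr_text) >= threshold}
    # Hypothetical field name; harvested data still needs a human check.
    record["processingStatus"] = "pending review"
    return record
```

Checking each harvested value against the raw OCR guards against copying data from a record that shares collector, number, and date but belongs to a different label.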

Page 12: Lichens, Bryophytes and Climate Change

NLP: Targeted Parsing Profiles
1. Premise: Target similar label formats
2. Use raw OCR to locate “Nash” labels
3. Need to exclude:
    a) Determined by Nash
    b) Author of scientific name
    c) Associated collector
4. Test for similarity to target label format
5. Targeted parsing algorithms
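Steps 2 and 3 can be approximated with a context check around each "Nash" hit, as sketched below. The exclusion patterns are hypothetical heuristics written for this example. They will misfire on some labels (for instance "Collected by Nash" looks like a genus and epithet to the second pattern), and the format-similarity test of step 4 is not covered.

```python
import re

NASH = re.compile(r"(?:T\.\s*H\.\s*)?Nash\b")

# Hypothetical exclusion contexts, matched against the text just before a "Nash" hit.
EXCLUDE_BEFORE = [
    re.compile(r"\bdet(?:ermined)?\.?\s*(?:by)?\s*:?\s*$", re.I),  # determined by Nash
    re.compile(r"\b[A-Z][a-z]+\s+[a-z]+\s+(?:\([^)]*\)\s*)?$"),     # author of scientific name
    re.compile(r"(?:&|\band\b|\bwith\b)\s*$", re.I),                 # associated collector
]

def is_nash_collector_label(ocr_text):
    """True if at least one 'Nash' mention is not in an excluded context."""
    for hit in NASH.finditer(ocr_text):
        before = ocr_text[:hit.start()]
        if not any(pattern.search(before) for pattern in EXCLUDE_BEFORE):
            return True
    return False
```

Labels that pass this filter would then go on to the format-similarity test and the targeted parsing algorithms.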

Page 13: Lichens, Bryophytes and Climate Change

Label Review

Page 14: Lichens, Bryophytes and Climate Change