An Open-source Place-finder for Genealogy

An Open-source Place-finder for Genealogy Dallan Quass [email protected] Ryan Knight [email protected]

description

An Open-source Place-finder for Genealogy, presented by Dallan Quass and Ryan Knight at RootsTech 2012. Translates place texts to fully qualified, standardized place names, including historical ones.

Transcript of An Open-source Place-finder for Genealogy

Page 1: An Open-source Place-finder for Genealogy

An Open-source Place-finder for Genealogy

Dallan Quass [email protected]

Ryan Knight [email protected]

Page 2: An Open-source Place-finder for Genealogy

What's the problem?

Page 3: An Open-source Place-finder for Genealogy

Philadelphia PA

After leaving Marion he moved to Cambridge, MA.

Church Row Goudhurst Kent

Los angles, California

Kanesville, (Council Bluffs), Pottawattamie, IA

Not stated, Ohio, Kentucky

Lathom, Yorkshire, England

Tranbylier, Bskrd. Norway

La Junta, CO

Of Cranbury, Middlesex, NJ, Germany

Greatford (near Marton), NZ

Kingston Surrey

Prestingol, St Agnes, Cornwall, England

Deirdorf, Rhineland, Prss, (Germany)

Farmer. & Butcher. Labourer

Returned to Boston Mass. with parents after a visit to Nova Scotia.

Genealogists write places in many different ways

Page 4: An Open-source Place-finder for Genealogy

Some are misspelled

Los angles, California

Page 5: An Open-source Place-finder for Genealogy

Others are abbreviated

Tranbylier, Bskrd. Norway

Deirdorf, Rhineland, Prss, (Germany)

Page 6: An Open-source Place-finder for Genealogy

Some leave out commas

Philadelphia PA

Tranbylier, Bskrd. Norway

Church Row Goudhurst Kent

Kingston Surrey

Page 7: An Open-source Place-finder for Genealogy

Others have extra words

Not stated, Ohio, Kentucky

Of Cranbury, Middlesex, NJ, Germany

After leaving Marion he moved to Cambridge, MA.

Returned to Boston Mass. with parents after a visit to Nova Scotia.

Page 8: An Open-source Place-finder for Genealogy

Some no longer exist or exist under different names or jurisdictions

Kanesville, (Council Bluffs), Pottawattamie, IA

Deirdorf, Rhineland, Prss, (Germany)

Page 9: An Open-source Place-finder for Genealogy

Others have an incorrect intermediate level

Lathom, Yorkshire, England

Page 10: An Open-source Place-finder for Genealogy

Some can't be found anywhere

Prestingol, St Agnes, Cornwall, England

Page 11: An Open-source Place-finder for Genealogy

Others can be found in multiple places

La Junta, CO

Philadelphia PA

Kingston Surrey

Page 12: An Open-source Place-finder for Genealogy

And finally, some aren't places at all

Farmer. & Butcher. Labourer

Page 13: An Open-source Place-finder for Genealogy

Why does it matter?

Page 14: An Open-source Place-finder for Genealogy

Search

Page 15: An Open-source Place-finder for Genealogy

Match

Page 16: An Open-source Place-finder for Genealogy

Maps

Page 17: An Open-source Place-finder for Genealogy

How does it work?

Page 18: An Open-source Place-finder for Genealogy

Steps

1. Work right-to-left, finding matching places

– split on commas

– back off if no matches

Ramsey, Hennepin, MN United States
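Step 1 can be sketched roughly as follows. This is a minimal illustration, not the project's code: it assumes that "back off" means falling back from a whole comma-separated segment to the longest run of words at its right end that is a known name. The function `split_levels` and the name set `KNOWN_NAMES` are hypothetical.

```python
# Hypothetical set of known names and abbreviations (lowercase).
# Invented for illustration; this is not the WeRelate database.
KNOWN_NAMES = {"united states", "mn", "minnesota", "hennepin", "ramsey"}


def split_levels(place_text, known_names=KNOWN_NAMES):
    """Return the recognized name tokens of a place text, ordered right-to-left."""
    levels = []
    # Split on commas, then visit segments starting from the rightmost one.
    segments = [s.strip() for s in place_text.split(",") if s.strip()]
    for segment in reversed(segments):
        words = segment.lower().split()
        while words:
            # Try the whole segment first, then back off to shorter
            # runs of words at the right end of the segment.
            for start in range(len(words)):
                candidate = " ".join(words[start:])
                if candidate in known_names:
                    levels.append(candidate)
                    words = words[:start]
                    break
            else:
                # No run ending at the last word is known: drop that word.
                words.pop()
    return levels


print(split_levels("Ramsey, Hennepin, MN United States"))
```

For the example text, the segment "MN United States" matches nothing as a whole, so the sketch backs off to "united states" and then "mn", giving `['united states', 'mn', 'hennepin', 'ramsey']`.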

Page 19: An Open-source Place-finder for Genealogy

Steps

1. Work right-to-left, finding matching places

– split on commas

– back off if no matches

2. Keep only subordinate jurisdictions

– if none are subordinate, try skipping a level

– if still no matches, ignore this level

Ramsey, Hennepin, MN United States

Page 20: An Open-source Place-finder for Genealogy

Steps

1. Work right-to-left, finding matching places

– split on commas

– back off if no matches

2. Keep only subordinate jurisdictions

– if none are subordinate, try skipping a level

– if still no matches, ignore this level

3. If there are multiple matches (ambiguous)

– filter on type

– filter out subordinate places

– rank remaining matches

Ramsey, Hennepin, MN United States

Ramsey, Minnesota, United States

Ramsey, Anoka, Minnesota, United States

Ramsey, Mower, Minnesota, United States
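Steps 2 and 3 can be sketched as below, assuming the text has already been split into name tokens ordered right-to-left. Everything here is illustrative rather than the project's implementation: the `Place` records, their ids and types, the `ALIASES` table, and the ranking key (shallower places first) are assumptions. "Skipping a level" is read as testing a name against the level before the previous one when nothing fits under the previous one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Place:
    id: int
    name: str
    type: str
    parent: Optional[int]  # id of the containing place; None at the top level


# Hypothetical sample records, invented for illustration (not the WeRelate data).
PLACES = [
    Place(1, "United States", "country", None),
    Place(2, "Minnesota", "state", 1),
    Place(3, "Hennepin", "county", 2),
    Place(4, "Ramsey", "county", 2),
    Place(5, "Anoka", "county", 2),
    Place(6, "Mower", "county", 2),
    Place(7, "Ramsey", "city", 5),
    Place(8, "Ramsey", "community", 6),
]
BY_ID = {p.id: p for p in PLACES}
ALIASES = {"mn": "minnesota"}  # assumed abbreviation table


def ancestors(place):
    """Return the chain of containing places, nearest first."""
    chain = []
    while place.parent is not None:
        place = BY_ID[place.parent]
        chain.append(place)
    return chain


def full_name(place):
    return ", ".join([place.name] + [a.name for a in ancestors(place)])


def standardize(levels, preferred_types=()):
    """Match tokens ordered right-to-left; return ranked candidate places."""
    matched = []  # one list of candidate places per accepted level
    for token in levels:
        name = ALIASES.get(token.lower(), token.lower())
        found = [p for p in PLACES if p.name.lower() == name]
        if not matched:
            kept = found
        else:
            # Step 2: keep places under the previous level; if there are
            # none, skip that level and test against the one before it.
            kept = []
            for containers in reversed(matched[-2:]):
                ids = {c.id for c in containers}
                kept = [p for p in found if ids & {a.id for a in ancestors(p)}]
                if kept:
                    break
        if kept:  # otherwise this level of the text is ignored
            matched.append(kept)
    if not matched:
        return []
    best = matched[-1]
    # Step 3: filter on type, drop places inside another candidate, rank.
    best = [p for p in best if p.type in preferred_types] or best
    ids = {p.id for p in best}
    best = [p for p in best if not ids & {a.id for a in ancestors(p)}]
    return sorted(best, key=lambda p: (len(ancestors(p)), full_name(p)))


for place in standardize(["United States", "MN", "Hennepin", "Ramsey"]):
    print(full_name(place))
```

With these sample records no Ramsey lies under Hennepin, so the sketch skips that level and returns the three Ramsey candidates under Minnesota. Passing `preferred_types=("city",)` narrows the result to the Ramsey in Anoka.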

Page 21: An Open-source Place-finder for Genealogy

Database

WeRelate has a database of 435,000 places

• Includes inhabited places and record-keeping jurisdictions

• Excludes geographic entities like rivers, mountains, etc.

• Not complete, but we've researched and added additional places that appear frequently in GEDCOMs

Page 22: An Open-source Place-finder for Genealogy

Wiki as a Database

Page 23: An Open-source Place-finder for Genealogy

Wiki as a Database

Page 24: An Open-source Place-finder for Genealogy

How it began

Wikipedia

Getty Thesaurus of Geographic Names

Family History Catalog

Page 25: An Open-source Place-finder for Genealogy

All of us are smarter than any of us

Page 26: An Open-source Place-finder for Genealogy

Community input

Page 27: An Open-source Place-finder for Genealogy

Community oversight

Page 28: An Open-source Place-finder for Genealogy

Community oversight

Page 29: An Open-source Place-finder for Genealogy

Result

Proof is in the pudding

Page 30: An Open-source Place-finder for Genealogy

Compare to FamilySearch

Standardized 3736 place texts chosen at random from GEDCOMs using both algorithms

• 1911 (about 51%) standardized the same

• 1825 (about 49%) were different

Page 31: An Open-source Place-finder for Genealogy

Let's look at the K's

GEDCOM place text: kaiapoi, nz
This project: Kaiapoi, Canterbury, New Zealand
FamilySearch: Kaiapoi, Canterbury, Canterbury, New Zealand
Best guess: Kaiapoi, Waimakariri (district), Canterbury (region), New Zealand

GEDCOM place text: kanesville, (council bluffs), pottawattamie, ia
This project: Council Bluffs, Pottawattamie, Iowa, United States
FamilySearch: Kanesville, Pottawattamie, Iowa, United States
Best guess: Council Bluffs (formerly Kanesville), Pottawattamie, Iowa, United States

GEDCOM place text: kansas city, missouri
This project: Kansas City, Cass, Missouri, United States
FamilySearch: Kansas City, Jackson, Missouri, United States
Best guess: located in Jackson, Clay, Cass, and Platte counties

GEDCOM place text: kelvin grove cemetary, palmerston north, (section s block 3 plot 38)
This project: Palmerston North, Manawatu-Wanganui, New Zealand
FamilySearch: Kelvin Grove, Barkly East, Cape of Good Hope, South Africa
Best guess: Kelvin Grove Cemetery, Palmerston North, Manawatu-Wanganui (region), New Zealand

Page 32: An Open-source Place-finder for Genealogy

Let's look at the K's

GEDCOM place text: kenny ?? cots altandhu lochbroom
This project: Lochbroom, Ross and Cromarty, Scotland
FamilySearch: Loch Broom, Pictou, Nova Scotia, Canada
Best guess: Altandhu, Lochbroom, Ross and Cromarty, Scotland

GEDCOM place text: kincardine ross & cromarty
This project: Cromarty, Ross and Cromarty, Scotland
FamilySearch: Ross and Cromarty, Scotland
Best guess: Kincardine, Ross and Cromarty (county), Scotland

GEDCOM place text: , king queen, virginia, usa
This project: King, Wetzel, West Virginia, United States
FamilySearch: ,King, Clay, Virginia, United States
Best guess: King and Queen (county), Virginia, United States

GEDCOM place text: kingston surrey
This project: Kingston, Surrey, Jamaica
FamilySearch: Kingston, Surrey, England
Best guess: both places exist, but England is more likely

Page 33: An Open-source Place-finder for Genealogy

Bottom line

Of the 38 place texts compared

• 3 texts were either not a place or were truly ambiguous

• 8 texts weren't matched correctly by either system

• 10 texts were matched to the same place (just named differently) by both systems

• 11 texts were matched better by this project

• 9 texts were matched better by FamilySearch's project

Interestingly, these results are similar to the Nature study comparing Wikipedia with Encyclopedia Britannica – both had about the same number of mistakes.

Page 34: An Open-source Place-finder for Genealogy

Roadmap

• 2005-2011 Place wiki pages under development at WeRelate

• Jan 2011 Open-source project created

• Feb 2011 Announce at RootsTech

• Mar 2011 Incorporate new algorithm at WeRelate

Continued improvements

Page 35: An Open-source Place-finder for Genealogy

Future work

Analyze differences with FamilySearch

Review frequent missing places

Use machine learning for better scoring of ambiguous places

Page 36: An Open-source Place-finder for Genealogy

Demonstration of Places Server

• Demonstrates Matching Places

• Built with Play 1.2.4, a Java web framework

– Allows for rapid development of web applications with a fully integrated stack

• Deployed to Heroku, a cloud application platform

– Heroku allows one-step deployment with git

Page 37: An Open-source Place-finder for Genealogy

Demonstration of Places Server

Page 38: An Open-source Place-finder for Genealogy

Demonstration of Places Server

Page 39: An Open-source Place-finder for Genealogy

Demonstration of Places Server

Page 40: An Open-source Place-finder for Genealogy

Demonstration of Labeler

• Community feedback on places we couldn’t match

• Provides the best guess from the Places Standardizers

Page 41: An Open-source Place-finder for Genealogy

Demonstration of Labeler

Page 42: An Open-source Place-finder for Genealogy

Conclusion

Matching places is hard

• people record places in lots of different ways

But it’s important

• useful in search, match, and mapping

Open source algorithm and database are now freely available

• http://github.com/DallanQ/Places

Not perfect, but ongoing improvement

Hopefully others will benefit from this effort

Images appearing on these slides are copyrighted by the contributors to http://commons.wikimedia.org and are used under license

Page 43: An Open-source Place-finder for Genealogy