An Open-source Place-finder for Genealogy
Dallan Quass [email protected] Knight [email protected]
What's the problem?
Philadelphia PA
After leaving Marion he moved to Cambridge, MA.
Church Row Goudhurst Kent
Los Angles, California
Kanesville, (Council Bluffs), Pottawattamie, IA
Not stated, Ohio, Kentucky
Lathom, Yorkshire, England
Tranbylier, Bskrd. Norway
La Junta, CO
Of Cranbury, Middlesex, NJ, Germany
Greatford (near Marton), NZ
Kingston Surrey
Prestingol, St Agnes, Cornwall, England
Deirdorf, Rhineland, Prss, (Germany)
Farmer. & Butcher. Labourer
Returned to Boston Mass. with parents after a visit to Nova Scotia.
Genealogists write places in many different ways
Some are misspelled
Los angles, California
Others are abbreviated
Tranbylier, Bskrd. Norway
Deirdorf, Rhineland, Prss, (Germany)
Some leave out commas
Philadelphia PA
Tranbylier, Bskrd. Norway
Church Row Goudhurst Kent
Kingston Surrey
Others have extra words
Not stated, Ohio, Kentucky
Of Cranbury, Middlesex, NJ, Germany
After leaving Marion he moved to Cambridge, MA.
Returned to Boston Mass. with parents after a visit to Nova Scotia.
Some no longer exist, or exist under different names or jurisdictions
Kanesville, (Council Bluffs), Pottawattamie, IA
Deirdorf, Rhineland, Prss, (Germany)
Others have an incorrect intermediate level
Lathom, Yorkshire, England
Some can't be found anywhere
Prestingol, St Agnes, Cornwall, England
Others can be found in multiple places
La Junta, CO
Philadelphia PA
Kingston Surrey
And finally, some aren't places at all
Farmer. & Butcher. Labourer
Why does it matter?
Search
Match
Maps
How does it work?
Steps
Ramsey, Hennepin, MN United States
1. Work right-to-left, finding matching places
   - split on commas
   - back off if no matches
2. Keep only subordinate jurisdictions
   - if none are subordinate, try skipping a level
   - if still no matches, ignore this level
3. If there are multiple matches (ambiguous)
   - filter on type
   - filter out subordinate places
   - rank remaining matches
Ramsey, Minnesota, United States
Ramsey, Anoka, Minnesota, United States
Ramsey, Mower, Minnesota, United States
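The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the tiny gazetteer, the abbreviation table, and every function name here are hypothetical. The ranking rule (shallowest place first) is a placeholder, since the slides do not say how matches are ranked.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Place:
    id: int
    name: str
    type: str
    parent: Optional[int]  # id of the containing jurisdiction

# Toy gazetteer, invented for this example only.
GAZETTEER = [
    Place(1, "United States", "country", None),
    Place(2, "Minnesota", "state", 1),
    Place(3, "Hennepin", "county", 2),
    Place(4, "Anoka", "county", 2),
    Place(5, "Mower", "county", 2),
    Place(6, "Ramsey", "county", 2),
    Place(7, "Ramsey", "city", 4),
    Place(8, "Ramsey", "township", 5),
]
BY_ID = {p.id: p for p in GAZETTEER}
ABBREVIATIONS = {"mn": "minnesota"}  # assumed abbreviation table

def lookup(name):
    key = ABBREVIATIONS.get(name, name)
    return [p for p in GAZETTEER if p.name.lower() == key]

def chain(place):
    """The place followed by each containing jurisdiction, up to the root."""
    out, node = [], place
    while node is not None:
        out.append(node)
        node = BY_ID.get(node.parent)
    return out

def full_name(place):
    return ", ".join(p.name for p in chain(place))

def descends_from(place, ancestor_ids):
    return any(a.id in ancestor_ids for a in chain(place)[1:])

def levels_right_to_left(text):
    """Step 1: split on commas; back off to trailing words if a chunk has no match."""
    levels = []
    chunks = [c.strip().lower() for c in text.split(",") if c.strip()]
    for chunk in reversed(chunks):
        words = chunk.split()
        while words:
            for start in range(len(words)):  # longest trailing run first
                candidate = " ".join(words[start:])
                if lookup(candidate):
                    levels.append(candidate)
                    words = words[:start]
                    break
            else:
                words.pop()  # nothing matched: drop the last word
    return levels

def rank(candidates, wanted_type=None):
    """Step 3: filter on type, drop places under another candidate, then rank."""
    if wanted_type:
        candidates = [p for p in candidates if p.type == wanted_type] or candidates
    ids = {p.id for p in candidates}
    top = [p for p in candidates if not descends_from(p, ids - {p.id})]
    return sorted(top, key=lambda p: len(chain(p)))  # placeholder ranking

def standardize(text, wanted_type=None):
    accepted = []  # one list of matches per accepted level
    for level in levels_right_to_left(text):
        matches = lookup(level)
        if not accepted:
            accepted.append(matches)
            continue
        # Step 2: keep only matches subordinate to the previous level.
        under = [p for p in matches if descends_from(p, {a.id for a in accepted[-1]})]
        if under:
            accepted.append(under)
        elif len(accepted) >= 2:
            # None are subordinate: try skipping the previous level.
            prev = {a.id for a in accepted[-2]}
            under = [p for p in matches if descends_from(p, prev)]
            if under:
                accepted[-1] = under
        # Otherwise this level is ignored.
    return rank(accepted[-1], wanted_type) if accepted else []

for place in standardize("Ramsey, Hennepin, MN United States"):
    print(full_name(place))
```

With this toy data, no Ramsey lies under Hennepin, so the Hennepin level is skipped and the three Ramseys under Minnesota remain as ambiguous candidates.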
Database
WeRelate has a database of 435,000 places
• Includes inhabited places and record-keeping jurisdictions
• Excludes geographic entities like rivers, mountains, etc.
• Not complete, but we've researched and added additional places that appear frequently in GEDCOMs
Wiki as a Database
How it began
Wikipedia
Getty Thesaurus of Geographic Names
Family History Catalog
All of us are smarter than any of us
Community input
Community oversight
Result
Proof is in the pudding
Compare to FamilySearch
Standardized 3736 place texts chosen at random from GEDCOMs using both algorithms
• 1911 standardized the same
• 1825 were different
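As a quick check of these counts, the short calculation below turns them into agreement rates. It uses only the figures on the slide; the variable names are illustrative.

```python
total = 3736      # place texts standardized by both algorithms
same = 1911       # standardized the same
different = 1825  # standardized differently

# The two groups should account for every place text.
assert same + different == total

print(f"same: {same / total:.1%}, different: {different / total:.1%}")
# prints "same: 51.2%, different: 48.8%"
```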
Let's look at the K's

GEDCOM place text: kaiapoi, nz
This project: Kaiapoi, Canterbury, New Zealand
FamilySearch: Kaiapoi, Canterbury, Canterbury, New Zealand
Best guess: Kaiapoi, Waimakariri (district), Canterbury (region), New Zealand

GEDCOM place text: kanesville, (council bluffs), pottawattamie, ia
This project: Council Bluffs, Pottawattamie, Iowa, United States
FamilySearch: Kanesville, Pottawattamie, Iowa, United States
Best guess: Council Bluffs (formerly Kanesville), Pottawattamie, Iowa, United States

GEDCOM place text: kansas city, missouri
This project: Kansas City, Cass, Missouri, United States
FamilySearch: Kansas City, Jackson, Missouri, United States
Best guess: located in Jackson, Clay, Cass, and Platte counties

GEDCOM place text: kelvin grove cemetary, palmerston north, (section s block 3 plot 38)
This project: Palmerston North, Manawatu-Wanganui, New Zealand
FamilySearch: Kelvin Grove, Barkly East, Cape of Good Hope, South Africa
Best guess: Kelvin Grove Cemetery, Palmerston North, Manawatu-Wanganui (region), New Zealand
Let's look at the K's (continued)

GEDCOM place text: kenny ?? cots altandhu lochbroom
This project: Lochbroom, Ross and Cromarty, Scotland
FamilySearch: Loch Broom, Pictou, Nova Scotia, Canada
Best guess: Altandhu, Lochbroom, Ross and Cromarty, Scotland

GEDCOM place text: kincardine ross & cromarty
This project: Cromarty, Ross and Cromarty, Scotland
FamilySearch: Ross and Cromarty, Scotland
Best guess: Kincardine, Ross and Cromarty (county), Scotland

GEDCOM place text: , king queen, virginia, usa
This project: King, Wetzel, West Virginia, United States
FamilySearch: ,King, Clay, Virginia, United States
Best guess: King and Queen (county), Virginia, United States

GEDCOM place text: kingston surrey
This project: Kingston, Surrey, Jamaica
FamilySearch: Kingston, Surrey, England
Best guess: both places exist, but England is more likely
Bottom line
Of the 38 place texts compared
• 3 texts were either not a place or were truly ambiguous
• 8 texts weren't matched correctly by either system
• 10 texts were matched to the same place (just named differently) by both systems
• 11 texts were matched better by this project
• 9 texts were matched better by FamilySearch's project
Interestingly, these results are similar to the Nature study comparing Wikipedia with Encyclopedia Britannica – both had about the same number of mistakes.
Roadmap
• 2005-2011 Place wiki pages under development at WeRelate
• Jan 2011 Open-source project created
• Feb 2011 Announce at RootsTech
• Mar 2011 Incorporate new algorithm at WeRelate
Continued improvements
Future work
Analyze differences with FamilySearch
Review frequent missing places
Use machine learning for better scoring of ambiguous places
Demonstration of Places Server
• Demonstrates Matching Places
• Built with Play 1.2.4 - A Java Web framework
Allows for rapid development of web applications with a fully integrated stack
• Deployed to Heroku – Cloud Application Platform
– Heroku allows one-step deployment with git
Demonstration of Labeler
• Community feedback on places we couldn’t match
• Provides the best guess from the Places Standardizers
Conclusion
Matching places is hard
• people record places in lots of different ways
But it’s important
• useful in search, match, and mapping
Open source algorithm and database are now freely available
• http://github.com/DallanQ/Places
Not perfect, but ongoing improvement
Hopefully others will benefit from this effort
Images appearing on these slides are copyrighted by the contributors to http://commons.wikimedia.org and are used under license