Determining and Mapping Locations of Study in Scholarly Documents:
A Spatial Representation and Visualization Tool for Information Discovery
James Creel
Katherine H. Weimer
TCDL
May 7, 2013
Geospatial Information Retrieval Challenges
• How to utilize locations represented in text?
  – 20% of web queries have a geographic relation (Ahlers)
• Traditional catalog subjects and keywords do not suffice
• Location information is increasingly in demand (Reid)
• Ex. 1049 total ETDs in 2005
  – 300 included locations (< 30%)
  – 130 contained international locations (> 10%)
What about a Visual Search?
• Searching collections with a map interface?
  – Visual representation of research
  – Enable serendipitous cross-disciplinary collaborations and networking
  – Enhance access to the collection
Map Prototype
2011 – Geoparsing Work Begins
1. Overarching goal is to automate geocoding
2. Find toponyms in scholarly documents
3. Look up toponyms in a gazetteer
4. Disambiguate homonymous toponyms
5. Obtain geographic coordinates from the gazetteer
6. Encode coordinates in item surrogates for map-based view
7. Create map with link to original text
Desired Map Functionality
1. Base map: use Google Maps and other available interfaces
2. Cluster placemarks according to zoom level
3. List the displayed placemarks
4. Dropdown menu for countries and states in the US
5. Dropdown menu for departments grouped by college
   1. Selection of multiple departments in more than one college
   2. If selecting the college, then select all departments within the college
6. Search by author
7. Time range slider (by year)
8. Use the Web-friendly University Brand color palette
Geocoding with KML files
• The KML file with locations includes:
  – Author
  – Title
  – Academic department
  – Advisor
  – Degree level
  – Year
  – Place
  – Keywords
  – URL to document
• Info box displays:
  – Author
  – Title
  – Academic department
  – Degree level
  – Year
  – Place
  – URL to document
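As a rough illustration of how such a record could be encoded, the sketch below builds one KML Placemark per document with Python's standard library. The field names, the use of `ExtendedData`, and the sample record are assumptions made for this example; the slides do not show the actual KML layout produced by the project.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

# Hypothetical field names mirroring the list on the slide.
FIELDS = ("author", "department", "advisor", "degree", "year",
          "place", "keywords", "url")

def placemark(record):
    """Build a Placemark element from one metadata dict."""
    pm = ET.Element("Placemark")
    ET.SubElement(pm, "name").text = record["title"]
    ext = ET.SubElement(pm, "ExtendedData")
    for key in FIELDS:
        data = ET.SubElement(ext, "Data", name=key)
        ET.SubElement(data, "value").text = str(record[key])
    point = ET.SubElement(pm, "Point")
    # KML writes coordinates as longitude,latitude.
    ET.SubElement(point, "coordinates").text = f"{record['lon']},{record['lat']}"
    return pm

def build_kml(records):
    """Wrap all placemarks in a single KML Document and return it as text."""
    kml = ET.Element("kml", xmlns=KML_NS)
    doc = ET.SubElement(kml, "Document")
    for record in records:
        doc.append(placemark(record))
    return ET.tostring(kml, encoding="unicode")

# Made-up record, for illustration only.
record = {
    "title": "Example thesis title", "author": "A. Author",
    "department": "Example Department", "advisor": "B. Advisor",
    "degree": "Masters", "year": 2005, "place": "College Station",
    "keywords": "example; placeholder", "url": "http://example.org/item/1",
    "lat": 30.628, "lon": -96.3344,
}
kml_text = build_kml([record])
```

An info box could then read a subset of the `Data` entries, which matches the slide's shorter display list (no advisor or keywords).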
Beta Version of Map: Showing Google Street Maps
Clustering Mechanism
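The slide shows the clustering only as a screenshot. One simple way to cluster placemarks according to zoom level is to bucket them into a grid whose cells shrink as the user zooms in. The sketch below assumes that approach; the cell-size formula and function name are illustrative and are not taken from the map library the prototype used.

```python
from collections import defaultdict

def cluster_placemarks(points, zoom):
    """Group (lat, lon) points into grid cells that halve in size per zoom level."""
    cell = 360.0 / (2 ** zoom)  # cell width in degrees (illustrative choice)
    cells = defaultdict(list)
    for lat, lon in points:
        key = (int(lat // cell), int(lon // cell))
        cells[key].append((lat, lon))
    clusters = []
    for members in cells.values():
        n = len(members)
        center = (sum(p[0] for p in members) / n,
                  sum(p[1] for p in members) / n)
        clusters.append({"center": center, "count": n})
    return clusters

# Approximate coordinates: two points in Texas, one in France.
points = [(30.6, -96.3), (30.7, -96.4), (48.85, 2.35)]
coarse = cluster_placemarks(points, zoom=2)   # Texas points share a cell
fine = cluster_placemarks(points, zoom=12)    # every point gets its own cell
```

At low zoom the two nearby points merge into one marker with a count of 2; at high zoom they separate again.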
User Clicks on a Point of Interest: Title and Metadata Appear with a Link to the Text
Automated Process / Geoparser
• Geoparsing addresses two key problems:
  1) Name extraction
  2) Name disambiguation
• Pipeline: Document text → Extracted names → Disambiguated names → Geospatial metadata
Geoparser: Comparable Models
• Edinburgh Geoparser
  – Grover et al.; used with OCR of historic records; provided the GeoCrossWalk gazetteer
• DIGMAP Geoparser
  – Martins et al.; originally used for the DIGMAP digital library of historic maps
Geoparser: Setting
• DSpace 1.7 supports curation tasks
  – Custom Java programs
• Our instantiation:
  – Suggest New Metadata
  – Generate KML
Geoparser Workflow
Geoparser: Pre-Processing
– DSpace filter-media script extracts plain text from PDFs.
– Suggest New Metadata curation task
  • Partitions the document into sections using regular expressions
  • Excludes sections containing non-topical toponyms (author-affiliation locations, conference locations, etc.)
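The partition-and-exclude step can be sketched in a few lines. The heading pattern and the list of excluded sections below are guesses for illustration; the slides do not give the regular expressions the curation task actually uses, and the real task is a Java program rather than Python.

```python
import re

# Hypothetical pattern: a line that consists only of a known section name.
HEADING = re.compile(
    r"^(abstract|acknowledge?ments|introduction|methods|results|references|vita)\s*$",
    re.IGNORECASE | re.MULTILINE,
)
# Sections assumed to carry non-topical place names.
EXCLUDED = {"acknowledgements", "acknowledgments", "references", "vita"}

def partition(text):
    """Split text into (section_name, body) pairs at each heading line."""
    matches = list(HEADING.finditer(text))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((m.group(1).lower(), text[m.end():end].strip()))
    return sections

def topical_text(text):
    """Keep only the bodies of sections that are not on the exclusion list."""
    return "\n".join(body for name, body in partition(text)
                     if name not in EXCLUDED)

sample = (
    "Abstract\nField work was done in Brazos County.\n"
    "Acknowledgements\nThanks to colleagues in Paris.\n"
    "Methods\nSamples were collected near Galveston.\n"
    "References\nPublished in London.\n"
)
kept = topical_text(sample)
```

Here the place names in the acknowledgements and reference list never reach the name extractor, which is the point of the exclusion step.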
Geoparser: Name Extraction
• ‘Named Entity Recognition’ or NER
  – Various open-source tools/training data
• Current version uses Apache OpenNLP or Stanford NER
• Classifies substrings of the text as names
• Toponym occurrences are recorded in context and counted
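Recording occurrences "in context" and counting them might look like the sketch below. It assumes the NER tool has already returned character spans for location names; the regular expression here merely stands in for that output and is not how OpenNLP or Stanford NER work. The window size and function name are also hypothetical.

```python
import re
from collections import Counter, defaultdict

def record_occurrences(text, spans, window=30):
    """Count each toponym and keep a snippet of surrounding text per mention."""
    counts = Counter()
    contexts = defaultdict(list)
    for start, end in spans:
        name = text[start:end]
        counts[name] += 1
        contexts[name].append(text[max(0, start - window):end + window])
    return counts, contexts

text = ("Samples from Galveston Bay were compared with samples from Austin. "
        "Galveston Bay showed higher salinity.")
# Stand-in for NER output: character offsets of location mentions.
spans = [m.span() for m in re.finditer(r"Galveston Bay|Austin", text)]
counts, contexts = record_occurrences(text, spans)
```

The stored snippets give later disambiguation steps access to nearby words, and the counts give a simple signal of how central a place is to the document.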
Name Disambiguation
• Requires reliable data- or knowledge-base
• We employ the Geonames dataset
  – Conglomeration of international gazetteers
    • Includes GNIS (USGS)
• Several complementary methods
  – Rule-based
  – Heuristic
  – Statistical
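A gazetteer lookup over GeoNames data could be sketched as follows. The column order follows the tab-separated GeoNames dump format; the two sample rows are made-up placeholders (not copied from the dataset), and indexing by lowercase name is an assumption of this example rather than a description of the project's code.

```python
import csv
import io
from collections import defaultdict

# Column order of the tab-separated GeoNames dump files.
COLUMNS = ["geonameid", "name", "asciiname", "alternatenames", "latitude",
           "longitude", "feature_class", "feature_code", "country_code",
           "cc2", "admin1_code", "admin2_code", "admin3_code", "admin4_code",
           "population", "elevation", "dem", "timezone", "modification_date"]

def load_gazetteer(handle):
    """Index gazetteer rows by lowercase name; homonyms share one list."""
    index = defaultdict(list)
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        rec = dict(zip(COLUMNS, row))
        rec["latitude"] = float(rec["latitude"])
        rec["longitude"] = float(rec["longitude"])
        rec["population"] = int(rec["population"] or 0)
        index[rec["name"].lower()].append(rec)
    return index

# Placeholder rows with illustrative values only.
SAMPLE = "\n".join("\t".join(r) for r in [
    ["1", "Paris", "Paris", "", "48.85", "2.35", "P", "PPLC", "FR",
     "", "", "", "", "", "2000000", "", "", "Europe/Paris", "2013-01-01"],
    ["2", "Paris", "Paris", "", "33.66", "-95.56", "P", "PPL", "US",
     "", "TX", "", "", "", "25000", "", "", "America/Chicago", "2013-01-01"],
])
gazetteer = load_gazetteer(io.StringIO(SAMPLE))
candidates = gazetteer["paris"]
```

A lookup that returns more than one candidate, as "paris" does here, is exactly the case the disambiguation methods have to settle.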
Heuristics: Overview
• Various heuristics can help indicate the probable referent of a given toponym
• Other heuristics can help pick out false positives from the classifiers
• Heuristics are based on context-clues in the text or on general observations about human discourse
Heuristics: Context-based
• One document, one sense
• Unambiguous extended names, e.g. “Paris, France”
• Favor locations close to other mentioned locations
• Favor locations contained in other mentioned locations
• Favor locations of mentioned feature types
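To make the proximity heuristic concrete, the sketch below picks, among homonymous candidates, the one with the smallest total great-circle distance to places that have already been resolved. Summing haversine distances is one possible scoring rule assumed for this example; the slides do not state how closeness is measured. Coordinates are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) pairs in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def pick_nearest(candidates, resolved_coords):
    """Choose the candidate closest, in total, to already-resolved places."""
    return min(
        candidates,
        key=lambda c: sum(haversine_km(c["coords"], r) for r in resolved_coords),
    )

candidates = [
    {"name": "Paris", "country": "FR", "coords": (48.85, 2.35)},
    {"name": "Paris", "country": "US", "coords": (33.66, -95.56)},
]
# Other places mentioned in the same document (roughly Dallas and Austin).
resolved_coords = [(32.78, -96.80), (30.27, -97.74)]
choice = pick_nearest(candidates, resolved_coords)
```

In a document that otherwise discusses Texas cities, this rule selects the Texas candidate for "Paris".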
Heuristics: Generalized
• Favor higher-level administrative units (countries, states, cities)
• Favor locations of larger population
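These two generalized preferences can be combined into a single sort key, as in the sketch below. It assumes GeoNames-style feature classes ("A" for administrative divisions, "P" for populated places) and an ordering that ranks administrative units first and breaks ties by population; both the ranking table and the sample candidates are illustrative.

```python
# Assumed preference order: administrative units, then populated places,
# then everything else.
FEATURE_RANK = {"A": 0, "P": 1}

def default_choice(candidates):
    """Pick the highest-ranked feature class, breaking ties by population."""
    return min(
        candidates,
        key=lambda c: (FEATURE_RANK.get(c["feature_class"], 2), -c["population"]),
    )

# Made-up candidates with illustrative population figures.
towns = [
    {"id": "small-town", "feature_class": "P", "population": 25000},
    {"id": "large-city", "feature_class": "P", "population": 2000000},
]
with_admin = towns + [
    {"id": "state", "feature_class": "A", "population": 500000},
]
```

`default_choice(towns)` returns the larger city, while `default_choice(with_admin)` returns the administrative unit even though its population is smaller, because feature class is compared first.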
Heuristics: Application
• Heuristics are grouped into refinement iterations and then applied sequentially
• Resolve obvious cases first in order to provide better data for subsequent heuristics
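The idea of sequential refinement can be sketched as a list of passes, each of which moves names from a pending set to a resolved set so that later passes can use the earlier results. The two passes shown (single-candidate names, then names with exactly one candidate in an already-seen country) are hypothetical examples, not the project's actual iteration groups.

```python
def resolve_unambiguous(pending, resolved):
    """Pass 1: a name with exactly one candidate is resolved immediately."""
    for name in list(pending):
        if len(pending[name]) == 1:
            resolved[name] = pending.pop(name)[0]

def resolve_by_country(pending, resolved):
    """Pass 2: prefer the single candidate in a country already resolved."""
    seen = {c["country"] for c in resolved.values()}
    for name in list(pending):
        matches = [c for c in pending[name] if c["country"] in seen]
        if len(matches) == 1:
            resolved[name] = matches[0]
            del pending[name]

def run_passes(candidates, passes):
    """Apply each pass in order; return (resolved, still_pending)."""
    pending = {name: list(cands) for name, cands in candidates.items()}
    resolved = {}
    for apply_pass in passes:
        apply_pass(pending, resolved)
    return resolved, pending

candidates = {
    "College Station": [{"country": "US"}],
    "Paris": [{"country": "FR"}, {"country": "US"}],
}
resolved, pending = run_passes(candidates,
                               [resolve_unambiguous, resolve_by_country])
```

Reversing the order of the passes would leave "Paris" unresolved, since no country would be known yet when the country rule runs. That is the reason for handling the obvious cases first.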
Geoparser Evaluation
• Comparison of human annotations to geoparser output
• Precision/Recall of name extraction
• Accuracy of name disambiguation
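For reference, the three measures can be computed as below. This is a minimal sketch that treats extraction as set comparison of names and scores disambiguation only on names found by both the annotator and the geoparser; the slides do not specify these details, and the sample values are made up.

```python
def precision_recall(gold, predicted):
    """Precision and recall of extracted names against human annotations."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

def disambiguation_accuracy(gold_ids, predicted_ids):
    """Share of jointly found names mapped to the same gazetteer entry."""
    shared = gold_ids.keys() & predicted_ids.keys()
    if not shared:
        return 0.0
    return sum(gold_ids[n] == predicted_ids[n] for n in shared) / len(shared)

# Illustrative values only.
gold_names = {"Paris", "Austin", "Galveston"}
found_names = {"Paris", "Austin", "Washington"}
precision, recall = precision_recall(gold_names, found_names)

gold_ids = {"Paris": 2, "Austin": 5}
found_ids = {"Paris": 1, "Austin": 5}
accuracy = disambiguation_accuracy(gold_ids, found_ids)
```

Keeping extraction and disambiguation scores separate shows whether an error came from a missed or spurious name, or from choosing the wrong referent for a correctly found name.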
Evaluator Workflow
Future Work
• Explore statistical disambiguation
• Explore relevance of toponyms to the subject matter
• Expand to TDL collections
• Expand to other digital collections or collection types, even the library catalog?
Much more work to be done!
References
• Ahlers & Boll, “Location Based Web Search” in The Geospatial Web (London: Springer 2007)
• Apache OpenNLP. https://opennlp.apache.org/index.html
• DigMap. http://portal.digmap.edu/
• Leidner, Jochen L. “Toponym Resolution in Text” (Univ. Edinburgh 2007)
• Finkel, Jenny Rose, Trond Grenager, and Christopher Manning. “Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling” (ACL 2005) http://nlp.stanford.edu/~manning/papers/gibbscrf3.pdf
• Reid, James. “GeoXwalk – A Gazetteer Server and Service for UK Academia” (ECDL 2003)
Contact:
– James Creel
– Kathy Weimer