Alexandria Digital Library Project University of California, Santa Barbara.
![Page 1: Alexandria Digital Library Project University of California, Santa Barbara.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649ceb5503460f949b6cbc/html5/thumbnails/1.jpg)
Alexandria Digital Library Project
http://www.alexandria.ucsb.edu/
University of California, Santa Barbara
![Page 2: Alexandria Digital Library Project University of California, Santa Barbara.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649ceb5503460f949b6cbc/html5/thumbnails/2.jpg)
Textual-Geospatial Integration Project
NSF National Science Digital Library Project, 2001-2003
Aerial photos
Maps
Data
![Page 3: Alexandria Digital Library Project University of California, Santa Barbara.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649ceb5503460f949b6cbc/html5/thumbnails/3.jpg)
Project Goals
Extend NSDL infrastructure by enabling
• geographic queries for text and non-text items across heterogeneous digital libraries
• geographic referencing of arbitrary texts without explicit geographic cataloging
![Page 4: Alexandria Digital Library Project University of California, Santa Barbara.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649ceb5503460f949b6cbc/html5/thumbnails/4.jpg)
Participants
University of California, Santa Barbara
• James Frew, PI
• Terence Smith
• Michael Bueno
• Linda Hill
Information Retrieval Lab, Illinois Institute of Technology
• Ophir Frieder
• David Grossman
• Eric Jensen
The American Geological Institute (AGI) has permitted us to use a set of their GeoRef records for system training.
![Page 5: Alexandria Digital Library Project University of California, Santa Barbara.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649ceb5503460f949b6cbc/html5/thumbnails/5.jpg)
Geospatially-Augmented Search
What’s here?
Find library objects associated with a given location:
– Place name(s)
– “Footprint” (geographic extent)
Where’s this?
Find the location(s) associated with a given library object
![Page 6: Alexandria Digital Library Project University of California, Santa Barbara.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649ceb5503460f949b6cbc/html5/thumbnails/6.jpg)
Augmented Search
Example Queries from TREC-9
Find documents that contain residential real estate listings within New Jersey.
Find reports on automobile traffic in the Washington, DC metropolitan area.
What forms of entertainment are available in Newport Beach, California?
![Page 7: Alexandria Digital Library Project University of California, Santa Barbara.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649ceb5503460f949b6cbc/html5/thumbnails/7.jpg)
The stages
Oral histories
1. geo-parsing
georeferenced facts
• placenames
• IN
• ENVIRONS
• PIECE OF
• feature types
2. lookup in gazetteer
gazetteer entries
• names
• footprints
3. spatial analysis
4. identify best footprint
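The stages on this slide can be sketched end to end in a few lines. This is only an illustration under stated assumptions: `TOY_GAZETTEER`, `geoparse`, `lookup`, and `bounding_box` are hypothetical names, the coordinates are rough illustrative values, and simple substring matching stands in for the real geoparser, gazetteer protocol, and spatial analysis, none of which are specified in the slides.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

# Hypothetical toy gazetteer: lowercase name -> point footprints.
# Coordinates are rough, illustrative values only.
TOY_GAZETTEER: Dict[str, List[Point]] = {
    "auburn": [(-76.57, 42.93), (-85.48, 32.61)],
    "cayuga county": [(-76.55, 42.92)],
}

def geoparse(text: str) -> List[str]:
    """Geo-parsing stand-in: return gazetteer names that occur in the text."""
    lowered = text.lower()
    return [name for name in TOY_GAZETTEER if name in lowered]

def lookup(names: List[str]) -> List[Point]:
    """Gazetteer lookup: collect footprints for every exact name match."""
    return [pt for name in names for pt in TOY_GAZETTEER.get(name, [])]

def bounding_box(points: List[Point]) -> Tuple[float, float, float, float]:
    """Footprint stand-in: one box (west, south, east, north) around all points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

text = "A well located in Auburn, Cayuga County, New York."
names = geoparse(text)
box = bounding_box(lookup(names))
print(names, box)
```

Because the ambiguous name "auburn" contributes two distant points, the box stretches far beyond the intended place, which is the kind of problem the spatial analysis stage is meant to address.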
![Page 8: Alexandria Digital Library Project University of California, Santa Barbara.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649ceb5503460f949b6cbc/html5/thumbnails/8.jpg)
The evaluation
Test Settings
• geoparser: IIT, version 2
• geoparser: give partial matches a value of 0.25
• geofact rule: include geological terms
• gazetteer: ADL Gazetteer, protocol interface, 04-2003
• gaz lookup settings: operator = “equals”
• clustering settings: basic clustering
Manual Analysis
• word count in document
• identify unique geofacts in document
• identify geoparsing output as valid, partial, or invalid
• identify valid matches in ADL gazetteer
Metrics
• geoparser: recall and precision
• gazlookup: recall and precision
• clustering bounding box: recall and precision
• clustering bounding box: spatial similarity to reference bounding box
![Page 9: Alexandria Digital Library Project University of California, Santa Barbara.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649ceb5503460f949b6cbc/html5/thumbnails/9.jpg)
Example Text
title: Stress-induced borehole elongation; a comparison between the four-arm dipmeter and the borehole televiewer in the Auburn geothermal well
keys: applications | Auburn | borehole breakouts | boreholes | caliper logging | Cayuga County New York | deformation | dipmeter logging | elongation | field studies | fractures | geophysical surveys | instruments | New York | patterns | preferred orientation | rock mechanics | spallations | stress | structural analysis | surveys | televiewers | United States | well-logging
abstract: The nature and origin of borehole elongation recorded by the four-arm dipmeter calipers is studied utilizing information obtained from hydraulic fracturing stress measurements and borehole televiewer data taken in a well located in Auburn, New York. A preferred orientation N10 degrees W-S10 degrees E, + or -10 degrees and a less prominent E-W orientation of borehole elongation, was observed on two runs of the dipmeter. Comparisons of borehole geometry determined using the televiewer and the dipmeter show that both tools give the same orientation of borehole elongation provided that the zone of elongation is longer than 30 cm. Comparisons of dipmeter caliper data with orientation of in situ stress and natural fractures, obtained from hydrofracturing tests and televiewer data show that the N10 degrees W-S10 degrees E borehole elongations (1) are axisymmetric, (2) are aligned with the minimum horizontal stress S (sub h) and (3) are not associated with natural fractures intersecting the well. These elongations are interpreted as stress-induced well bore breakouts. The E-W elongation direction is characterized by an asymmetric borehole cross section in thinly bedded rocks and is not caused by breakouts. This asymmetric geometry can be discriminated from breakouts using the oriented electric measurements provided by the dipmeter. This study demonstrates that the dipmeter can be used to determine the orientation of S (sub h) confirming the results of earlier less detailed studies, and provides a firm basis for mapping regional stress patterns using existing dipmeter data.--Modified journal abstract
GeoRef bibliographic record from the TGI test set of 7523 records
![Page 10: Alexandria Digital Library Project University of California, Santa Barbara.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649ceb5503460f949b6cbc/html5/thumbnails/10.jpg)
Manual Analysis
Geofacts
• Auburn IN New York
• Cayuga County IN New York
• IN Auburn
• Auburn
• New York
• United States
Gaz entries
• adlgaz-1-6862604-02 (Auburn, New York)
• adlgaz-1-2168-0d (Cayuga County, New York)
• adlgaz-1-195-06 (New York)
• adlgaz-1-156-69 (United States)
![Page 11: Alexandria Digital Library Project University of California, Santa Barbara.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649ceb5503460f949b6cbc/html5/thumbnails/11.jpg)
Geoparsing
Geoparsing performance
• parser recall = 4.25/6 = 0.71
• parser precision = 4.25/8 = 0.53
Geoparsing output
(Auburn, , , , 1, K)
(Auburn, , , (in, ), 1, T)
(New York, , , , 1, K)
(United States, , , , 1, K)
(Cayuga County, , , , 1, K)
(Auburn New, , , (in,), 1, B)
(County New, , , (in, York), 1, K)
(York, , , , 1, B)
fact ::= (name?, type?, footprint?, related-fact?, certainty, importance)
Geoparsing scoring
• valid fact = 1
• partially valid fact = 0.25
• invalid fact = 0
blue = valid fact; green = partially valid fact; red = invalid fact
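The scoring rule above can be reproduced with a short calculation. This is a sketch, not the project's evaluation code: `geoparser_metrics` is a hypothetical helper, and the split of the 8 parser outputs into 4 valid, 1 partial, and 3 invalid is an assumption chosen to match the reported credit of 4.25 (the transcript does not preserve which facts were colored which way).

```python
# Credit per judgment, as stated on the slide.
SCORES = {"valid": 1.0, "partial": 0.25, "invalid": 0.0}

def geoparser_metrics(judgments, n_manual_geofacts):
    """Return (recall, precision) from per-fact judgments of parser output."""
    credit = sum(SCORES[j] for j in judgments)
    recall = credit / n_manual_geofacts   # credit vs. geofacts found manually
    precision = credit / len(judgments)   # credit vs. facts the parser emitted
    return recall, precision

# Assumed breakdown of the 8 outputs; only the total (4.25) comes from the slide.
judgments = ["valid"] * 4 + ["partial"] + ["invalid"] * 3
recall, precision = geoparser_metrics(judgments, n_manual_geofacts=6)
print(f"recall={recall:.2f} precision={precision:.2f}")
# prints "recall=0.71 precision=0.53"
```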
![Page 12: Alexandria Digital Library Project University of California, Santa Barbara.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649ceb5503460f949b6cbc/html5/thumbnails/12.jpg)
Gazlookup
operator = “equals” (exact match)
“auburn” .. 37 entries
“new york” .. 18 entries
“united states” .. 1 entry
“cayuga county” .. 1 entry
“auburn new” .. 0
“county new” .. 0
“york” .. 50 entries
TOTAL = 105
Gazlookup performance
• lookup recall = 3/4 = 0.75
• lookup precision = 3/105 = 0.03
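A minimal sketch of an “equals” lookup and of the two lookup metrics follows. The toy entries, the `equals_lookup` and `lookup_metrics` names, and the case-insensitive comparison are all assumptions made for illustration; the ADL Gazetteer protocol itself is not shown in the slides. The counts passed to `lookup_metrics` (3 correct, 4 expected, 105 returned) are taken as reported on the slide rather than recomputed.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (entry id, entry name); ids here are made up

def equals_lookup(query: str, gazetteer: List[Entry]) -> List[Entry]:
    """Return entries whose whole name equals the query (case-insensitive)."""
    q = query.strip().lower()
    return [entry for entry in gazetteer if entry[1].lower() == q]

def lookup_metrics(n_correct: int, n_expected: int, n_returned: int):
    """Recall = correct / expected entries; precision = correct / returned entries."""
    return n_correct / n_expected, n_correct / n_returned

toy_gazetteer = [
    ("toy-1", "Auburn"),
    ("toy-2", "Auburn"),        # a second, unrelated place with the same name
    ("toy-3", "Auburn Hills"),  # not an exact match for "auburn"
    ("toy-4", "New York"),
]
hits = equals_lookup("auburn", toy_gazetteer)
recall, precision = lookup_metrics(n_correct=3, n_expected=4, n_returned=105)
print(len(hits), round(recall, 2), round(precision, 2))
```

Exact matching keeps out partial names such as "Auburn Hills" but still returns every place that shares the name, which is why precision stays low even when recall is high.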
![Page 13: Alexandria Digital Library Project University of California, Santa Barbara.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649ceb5503460f949b6cbc/html5/thumbnails/13.jpg)
Scatter of points
Scatter of 105 points from “equals” Gazlookup
Clustered points (67) in the US and Canada
Baseline clustering
![Page 14: Alexandria Digital Library Project University of California, Santa Barbara.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649ceb5503460f949b6cbc/html5/thumbnails/14.jpg)
Derived footprint
Footprint for “equals” lookup data and simple clustering, compared to GeoRef footprint
GeoRef footprint
Derived footprint from points
Very low spatial similarity between TGI box and reference box from GeoRef
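The slides do not say how spatial similarity between the two boxes is measured. One common choice, used here purely as an assumption, is the ratio of intersection area to union area (Jaccard index) of the two bounding boxes. The sketch treats coordinates as planar degrees and ignores boxes that cross the antimeridian; `box_area` and `box_similarity` are hypothetical names.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (west, south, east, north)

def box_area(b: Box) -> float:
    """Planar area of a box; zero if the box is empty or inverted."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def box_similarity(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes: 1 = identical, 0 = disjoint."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    inter = box_area(overlap)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0

derived = (0.0, 0.0, 2.0, 2.0)    # illustrative boxes, not project data
reference = (1.0, 1.0, 3.0, 3.0)
print(box_similarity(derived, reference))               # 1/7, partial overlap
print(box_similarity(derived, (5.0, 5.0, 6.0, 6.0)))    # 0.0, no overlap
```

Under this measure, a derived box that is much larger than the reference box scores close to 0 even when it fully contains the reference.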
![Page 15: Alexandria Digital Library Project University of California, Santa Barbara.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649ceb5503460f949b6cbc/html5/thumbnails/15.jpg)
Statistics redux
Based on comparison of automated processes to manual analysis and GeoRef box for one sample record:
Geoparsing
• Recall: 0.71
• Precision: 0.53
Gazlookup
• Recall: 0.75
• Precision: 0.03
TGI bounding box
• Recall: 0.75
• Precision: 0.05
• Similarity to reference: 0
![Page 16: Alexandria Digital Library Project University of California, Santa Barbara.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649ceb5503460f949b6cbc/html5/thumbnails/16.jpg)
Next steps
• Set new conditions
• Find settings that give good results for 10 test records
• Run 7,524 GeoRef test records through TGI
• Calculate similarity of TGI boxes to GeoRef boxes
• Choose 10 new test records for manual analysis from best & worst results
• Reset conditions
• Repeat
Geoparser Gazlookup Clustering