June 2006. Image LSID resolvers: Image LSID Resolution Prototypes. Hui Dong, Bob Morris, UMass Boston.

Image LSID Resolution Prototypes

Hui Dong, Bob Morris, UMass Boston

Application

• Toy application at http://tamarin.cs.umb.edu:8081/jsp-examples/MyJsp.jsp

Image LSID Resolution Servers

– Projects
  • UMB: from a local image store of 13,000 field photographs. (Morris, Haber, Dong)
  • Morphbank (U. Florida): from a project to document morphological characters. (Rohnquist, Riccardi)
  • U.T. Austin: from an X-Ray CT facility for scans of paleo and very small vertebrates. (Humphrey, Mirenkar)

– Huge variation in the social and technical sides of image and metadata acquisition

Image LSID Resolution Servers

• Known
– UMB: 13,000 unedited images from a skilled naturalist
  • Metadata: EXIF, taxonomy, habitat, location, voucher number for type specimen of identified taxon, part imaged. The 13,000 images sit in folders. We mine file and folder names, correlate them to checklist(s), then load the metadata into MySQL with a generated LSID.
  • cf. ENBI report on Imaging Type Specimens
  • Data == ???
– Morphbank (U. Florida): from a project to document morphological characters
  • Metadata: Darwin Core plus local attributes. Automated by the contribution process.
– U.T. Austin X-Ray CT facility: to document XRCT
  • Metadata: automated by scan configuration
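The UMB step of mining file and folder names, correlating them to a checklist, and storing the result with a generated LSID might be sketched as follows. This is only an illustration under stated assumptions: the `<taxon folder>/<file>` layout, the `CHECKLIST` contents, the `example.org` authority, the `images` namespace, and the table layout are all hypothetical, and `sqlite3` stands in for MySQL so the sketch runs on its own.

```python
import sqlite3
import uuid
from pathlib import PurePosixPath

# Hypothetical checklist: normalized folder name -> accepted taxon name.
CHECKLIST = {"danaus plexippus": "Danaus plexippus"}

AUTHORITY = "example.org"  # assumed authority, not the real UMB one
NAMESPACE = "images"       # assumed namespace


def make_lsid(object_id: str) -> str:
    """Build an LSID of the form urn:lsid:<authority>:<namespace>:<object>."""
    return f"urn:lsid:{AUTHORITY}:{NAMESPACE}:{object_id}"


def mine_path(path: str) -> dict:
    """Guess metadata from an assumed '<taxon folder>/<file name>' layout."""
    p = PurePosixPath(path)
    folder = p.parent.name.replace("_", " ").lower()
    # taxon is None when the folder name matches nothing in the checklist.
    return {"file": p.name, "taxon": CHECKLIST.get(folder)}


def load(paths, conn) -> None:
    """Store one row per image, keyed by a freshly generated LSID."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image "
        "(lsid TEXT PRIMARY KEY, file TEXT, taxon TEXT)"
    )
    for path in paths:
        meta = mine_path(path)
        lsid = make_lsid(uuid.uuid4().hex)  # random object id, an assumption
        conn.execute(
            "INSERT INTO image VALUES (?, ?, ?)",
            (lsid, meta["file"], meta["taxon"]),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(["photos/Danaus_plexippus/IMG_0001.jpg", "photos/misc/IMG_0002.jpg"], conn)
for row in conn.execute("SELECT lsid, file, taxon FROM image ORDER BY file"):
    print(row)
```

Images whose folder matches no checklist entry are kept with an empty taxon rather than dropped, which leaves them available for later manual correlation.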

Summary

• Resolution is easy.

• Acquiring metadata is hard.

UMB details

• Services implemented on the sourceforge.lsid.net Java suite:
– Authority, Data, and Metadata interfaces exposed as separate web services
– Omitted the security service and the assignment service (we use ad hoc assignment, not exposed; we would consider making assignment part of the image deposit service).
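To make the split into three services concrete, here is a minimal sketch of LSID parsing plus toy authority, data, and metadata components kept as separate objects. It is not the API of the sourceforge.lsid.net Java suite: the class and method names are illustrative, and the endpoint URLs and stored values are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lsid:
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Parse urn:lsid:<authority>:<namespace>:<object>[:<revision>]."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


class AuthorityService:
    """Tells a client where the data and metadata services live."""

    def __init__(self, endpoints: dict):
        self.endpoints = endpoints

    def get_available_services(self, lsid: str) -> dict:
        parse_lsid(lsid)  # reject malformed identifiers early
        return dict(self.endpoints)


class DataService:
    """Returns the bytes named by an LSID (here: from a dict)."""

    def __init__(self, blobs: dict):
        self.blobs = blobs

    def get_data(self, lsid: str) -> bytes:
        return self.blobs[lsid]


class MetadataService:
    """Returns attribute-value metadata for an LSID (here: from a dict)."""

    def __init__(self, records: dict):
        self.records = records

    def get_metadata(self, lsid: str) -> dict:
        return dict(self.records.get(lsid, {}))


IMAGE = "urn:lsid:example.org:images:42"  # hypothetical identifier
authority = AuthorityService({
    "data": "http://example.org/lsid/data",          # hypothetical endpoint
    "metadata": "http://example.org/lsid/metadata",  # hypothetical endpoint
})
data = DataService({IMAGE: b"\xff\xd8 fake jpeg bytes"})
metadata = MetadataService({IMAGE: {"taxon": "Danaus plexippus"}})

print(parse_lsid(IMAGE))
print(authority.get_available_services(IMAGE))
print(metadata.get_metadata(IMAGE))
```

No assignment or security component appears here, mirroring the omission noted on the slide.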

Implementation issues

• Generating triples on the fly was too slow, so we cache them in MySQL. We could use a native triple store but haven’t yet encountered any use case that needs it, given a shadow SQL metadata store and a warehouse model.

• Most integrated apps might be easier to build against something that looks like a triple store from the outside, though.
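A minimal sketch of caching generated triples in a relational table, under stated assumptions: `sqlite3` stands in for MySQL, `compute_triples` is a hypothetical placeholder for the slow on-the-fly generation, and the table and predicate names are invented.

```python
import sqlite3


def compute_triples(lsid):
    """Hypothetical stand-in for the slow on-the-fly triple generation."""
    return [
        (lsid, "dc:format", "image/jpeg"),
        (lsid, "dc:title", "untitled"),
    ]


class TripleCache:
    """Serve triples from a SQL table, generating them only on a miss."""

    def __init__(self, conn):
        self.conn = conn
        self.misses = 0
        conn.execute(
            "CREATE TABLE IF NOT EXISTS triple "
            "(subject TEXT, predicate TEXT, object TEXT)"
        )
        conn.execute(
            "CREATE INDEX IF NOT EXISTS triple_subject ON triple (subject)"
        )

    def triples_for(self, lsid):
        rows = self.conn.execute(
            "SELECT subject, predicate, object FROM triple WHERE subject = ?",
            (lsid,),
        ).fetchall()
        if rows:
            return rows  # cache hit: no regeneration
        self.misses += 1
        fresh = compute_triples(lsid)
        self.conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", fresh)
        self.conn.commit()
        return fresh


cache = TripleCache(sqlite3.connect(":memory:"))
first = cache.triples_for("urn:lsid:example.org:images:1")
second = cache.triples_for("urn:lsid:example.org:images:1")
print(cache.misses, sorted(first) == sorted(second))
```

From the outside, `triples_for` behaves like a lookup in a triple store, which is the property the second bullet asks for; invalidation when the underlying metadata changes is left out of this sketch.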

Implementation issues

• Jena RDF serialization generated huge numbers of triples irrelevant to us (e.g. graph support). The result was intolerable performance, so we serialized with the hibernate.org relational persistence framework instead. (Message, from Mirenkar forcefully and from us weakly: there are no standards for serializing naturally occurring RDBs to RDF.)
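The absence of a standard mapping means each project writes its own. The sketch below shows one ad hoc way to emit only the triples of interest from a relational row, as N-Triples lines. It is not what Hibernate or Jena does; the column-to-predicate mapping and the `http://example.org/terms/` vocabulary are hypothetical.

```python
# Hypothetical, hand-written mapping from column names to predicate IRIs.
COLUMN_TO_PREDICATE = {
    "taxon": "http://example.org/terms/taxon",
    "habitat": "http://example.org/terms/habitat",
}


def escape_literal(value: str) -> str:
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(lsid: str, row: dict) -> list:
    """Emit one N-Triples line per mapped, non-null column of the row."""
    lines = []
    for column, predicate in COLUMN_TO_PREDICATE.items():
        value = row.get(column)
        if value is None:
            continue  # unmapped or NULL columns produce no triple
        literal = escape_literal(str(value))
        lines.append(f'<{lsid}> <{predicate}> "{literal}" .')
    return lines


row = {"taxon": "Danaus plexippus", "habitat": None, "internal_id": 17}
for line in row_to_ntriples("urn:lsid:example.org:images:1", row):
    print(line)
```

Columns without a mapping (here `internal_id`) are silently skipped, so the output contains only triples the application cares about.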

Warehousing vs distributed metadata stores

• The current resolution discovery scheme does not support multiple resolution services for a given LSID. Hence metadata cannot presently be distributed. Example: distributed annotation. Bill may not have authority to add annotations to Susan’s metadata store, but he might still have valuable annotations that should be keyed by the LSID.
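If discovery could return several metadata services for one LSID, a client might merge their answers as in the sketch below. This illustrates the desired behavior only; it is not something the current scheme supports, and the `MetadataStore` class, the owners, and the predicates are hypothetical.

```python
from collections import defaultdict


class MetadataStore:
    """One party's metadata store, keyed by LSID."""

    def __init__(self, owner: str):
        self.owner = owner
        self._statements = defaultdict(list)

    def add(self, lsid: str, predicate: str, value: str) -> None:
        self._statements[lsid].append((predicate, value))

    def get_metadata(self, lsid: str) -> list:
        return list(self._statements.get(lsid, []))


def resolve_all(lsid: str, stores) -> list:
    """Merge statements about one LSID from every store, tagged by owner."""
    merged = []
    for store in stores:
        for predicate, value in store.get_metadata(lsid):
            merged.append((store.owner, predicate, value))
    return merged


IMAGE = "urn:lsid:example.org:images:1"  # hypothetical identifier

susan = MetadataStore("susan")  # authoritative store for the image
susan.add(IMAGE, "taxon", "Danaus plexippus")

bill = MetadataStore("bill")    # Bill cannot write to Susan's store
bill.add(IMAGE, "annotation", "wing pattern suggests a worn individual")

for statement in resolve_all(IMAGE, [susan, bill]):
    print(statement)
```

Keeping the owner with each statement lets a consumer decide how much to trust an annotation that did not come from the authoritative store.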

Warehousing vs distributed metadata

• Given some metadata value, how do we find all the LSIDs that have that value? We need the entire metadata RDF store someplace (for each resolution service!) in order to make the query

SELECT lsid WHERE metadataAttributeA(lsid) = value_b

• Reasonable image RDF is 50-100 attributes. Reasonable personal image store is 10^5 images. At roughly one triple per attribute, that is on the order of 5-10 million triples per store.

• This is not specific to RDF, but there is no history of supporting this kind of query at large scale.
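Against a warehoused copy of the metadata, the pseudo-query above reduces to a lookup by predicate and object. The sketch below assumes a single three-column `triple` table with hypothetical contents, and uses `sqlite3` in place of whatever store holds the warehouse.

```python
import sqlite3


def lsids_with(conn, attribute: str, value: str) -> list:
    """Return every LSID whose metadata has attribute == value."""
    rows = conn.execute(
        "SELECT DISTINCT subject FROM triple "
        "WHERE predicate = ? AND object = ? ORDER BY subject",
        (attribute, value),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
# An index on (predicate, object) is what keeps this lookup cheap as the
# table grows toward millions of rows.
conn.execute("CREATE INDEX triple_po ON triple (predicate, object)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("urn:lsid:example.org:images:1", "habitat", "meadow"),
        ("urn:lsid:example.org:images:2", "habitat", "forest"),
        ("urn:lsid:example.org:images:3", "habitat", "meadow"),
        ("urn:lsid:example.org:images:3", "taxon", "Danaus plexippus"),
    ],
)

print(lsids_with(conn, "habitat", "meadow"))
```

The query only works because all triples are in one place; with metadata spread over several resolution services, each service would need to answer it separately.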

Interesting research problem

• Typical utility in applications will(?) arise from metadata containing other LSIDs. But there are no standards for querying this or for recursive resolution. That is, the embedded LSID is a proxy for more metadata plus implied ontological relations. How can resolvers be made to accept ontological data, reason over it, and decide what recursive resolution should take place?
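The naive baseline for recursive resolution is to follow every embedded LSID up to a fixed depth while guarding against cycles, as sketched below. It does no ontological reasoning at all, which is exactly the open part of the problem; the `METADATA` contents and predicate names are hypothetical.

```python
# Hypothetical in-memory metadata; values may themselves be LSIDs.
METADATA = {
    "urn:lsid:example.org:images:1": {
        "depicts": "urn:lsid:example.org:specimens:7",
        "format": "image/jpeg",
    },
    "urn:lsid:example.org:specimens:7": {
        "identifiedAs": "urn:lsid:example.org:taxa:3",
    },
    "urn:lsid:example.org:taxa:3": {
        "name": "Danaus plexippus",
        "depictedBy": "urn:lsid:example.org:images:1",  # cycle back to start
    },
}


def is_lsid(value) -> bool:
    return isinstance(value, str) and value.lower().startswith("urn:lsid:")


def resolve_recursive(lsid: str, max_depth: int = 2, _seen=None) -> dict:
    """Expand embedded LSIDs depth-first, stopping at max_depth or a repeat."""
    seen = set() if _seen is None else _seen
    seen.add(lsid)
    result = {}
    for predicate, value in METADATA.get(lsid, {}).items():
        if is_lsid(value) and max_depth > 0 and value not in seen:
            result[predicate] = {
                "lsid": value,
                "metadata": resolve_recursive(value, max_depth - 1, seen),
            }
        else:
            # Plain values, already-visited LSIDs, and LSIDs beyond the
            # depth limit are left unexpanded.
            result[predicate] = value
    return result


print(resolve_recursive("urn:lsid:example.org:images:1", max_depth=2))
```

A resolver that reasons over ontological data would replace the blanket "follow everything" rule with a decision per predicate about whether expansion is worthwhile.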

Grumble

• LSID Launchpad doesn’t allow showing namespaces in the attribute-value pairs

• sourceforge.lsid.net framework does not support DDNS or some other magical multi-resolver discovery

• Jena RDF serialization doesn’t seem to be scalable.