Indexing Mathematical Abstracts by Metadata and Ontology
IMA Workshop, April 26-27, 2004
Su-Shing Chen, University of Florida
Abstract
OAI extensions to federated search and other services for MathML-based metadata indexing and subject classification of mathematical abstracts.
Construction of ontology or conceptual maps of mathematics. Mathematical formulas are considered as elements of the ontology.
Ontology indexing by clustering mathematical abstracts or full papers into an information visualization interface so that users may select using ontology as well as metadata.
[Diagram: DL Server. Labels: Data Provider (OAI_DC), Data Provider (OAI_XXX), Service Provider (Data Mining), Service Provider (Federated Search), Harvester (Harvest API)]
A DL Server with OAI Extensions:
Managing the Metadata Complexity
[Diagram: DL Server architecture. Labels: Digested Metadata, Harvested Metadata, Service Providers' Data, Harvester, Data Provider, Service Provider 1 … Service Provider N, Java DataBase Connectivity (JDBC), Server, User, Data Provider, Service Provider, Internet]
A DL Server with OAI Extensions:
Managing the Metadata Complexity
Built-in capabilities:
Harvester – harvests various OAI-compliant data providers
Data provider – exposes harvested and existing metadata sets
Service provider – federated search and data mining capabilities on metadata sets
Harvester
Harvester Interface:
• URL to harvest
• Selective harvesting parameters
[Diagram: Harvester inside the DL Server. Labels: Harvester Interface, parameters, Harvest API, harvest, Data Providers …, Harvested metadata]
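As an illustration of the two inputs listed for the Harvester Interface, the sketch below builds an OAI-PMH `ListRecords` request from a base URL plus optional selective harvesting arguments (`from`, `until`, `set`), which are the standard OAI-PMH selective harvesting parameters. The function name `build_list_records_url` and the base URL `http://example.org/oai` are assumptions made for this example; the slides do not show the actual Harvest API.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords URL with optional selective-harvesting arguments.

    Hypothetical helper for illustration; not the DL Server's real Harvest API.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date      # lower datestamp bound
    if until_date:
        params["until"] = until_date    # upper datestamp bound
    if set_spec:
        params["set"] = set_spec        # restrict the harvest to one set
    return base_url + "?" + urlencode(params)


# Placeholder repository URL and set name, chosen only for the example.
url = build_list_records_url("http://example.org/oai",
                             from_date="2004-01-01", set_spec="math")
print(url)
```

A real harvester would also follow the `resumptionToken` returned by the data provider until the list is complete; that loop is omitted here.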
Data Provider
Expose single or combined harvested metadata sets to other harvesters
Reformat metadata from different data providers to be harvested by other service providers (e.g., originally Dublin Core, reformat to MARC before exposing)
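The reformatting step (Dublin Core in, MARC out) can be pictured as a field crosswalk. The sketch below is a minimal illustration under stated assumptions: the mapping table `DC_TO_MARC` is a simplified, hand-picked subset written for this example, not an official crosswalk and not the DL Server's own conversion, and the output is a plain list of tuples rather than a real MARC record.

```python
# Simplified, illustrative Dublin Core element -> (MARC tag, subfield) mapping.
# This table is an assumption for the example, not an authoritative crosswalk.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "date": ("260", "c"),
}


def dc_to_marc(dc_record):
    """Convert {dc_element: [values]} into sorted (tag, subfield, value) tuples."""
    fields = []
    for element, values in dc_record.items():
        if element not in DC_TO_MARC:
            continue  # unmapped elements are simply dropped in this sketch
        tag, code = DC_TO_MARC[element]
        for value in values:
            fields.append((tag, code, value))
    return sorted(fields)


# Example record built from this talk's own title, author, and year.
record = {
    "title": ["Indexing Mathematical Abstracts by Metadata and Ontology"],
    "creator": ["Chen, Su-Shing"],
    "date": ["2004"],
}
marc_fields = dc_to_marc(record)
for field in marc_fields:
    print(field)
```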
Service Provider: Federated Search
Emulating a federated search service on existing and combined harvested metadata sets
Federated search across potentially other search protocols
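A minimal sketch of the idea of searching several harvested metadata sets as if they were one collection. Everything here is assumed for illustration: the set names, the toy records, and the simple substring match on titles stand in for whatever query processing the actual service provider performs.

```python
def search_set(records, query):
    """Return records whose title contains the query (case-insensitive)."""
    needle = query.lower()
    return [r for r in records if needle in r["title"].lower()]


def federated_search(metadata_sets, query):
    """Run the same query over every metadata set and merge the hits.

    Each hit is tagged with the set it came from so the user can see its origin.
    """
    hits = []
    for source, records in metadata_sets.items():
        for record in search_set(records, query):
            hits.append({"source": source, **record})
    return hits


# Toy, made-up metadata sets; names and titles are placeholders.
metadata_sets = {
    "set_a": [{"id": "a1", "title": "Ontology indexing of abstracts"},
              {"id": "a2", "title": "Harvesting metadata"}],
    "set_b": [{"id": "b1", "title": "An ontology of mathematics"}],
}
results = federated_search(metadata_sets, "ontology")
for hit in results:
    print(hit["source"], hit["id"], hit["title"])
```

Because the search runs over metadata that has already been harvested locally, it emulates federation without contacting the remote repositories at query time.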
Service Provider: Data Mining
Knowledge discovery on harvested metadata sets
Metadata classification using the Self-Organizing Map (SOM) algorithm
Improving retrieval effectiveness by providing concept browsing and search services
Self-Organizing Map Algorithm
Competitive and unsupervised learning algorithm
Artificial neural network algorithm for visualizing and interpreting complex data sets
Providing a mapping from a high-dimensional input space to a two-dimensional output space
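The three properties above (competitive, unsupervised, high-dimensional input mapped to a two-dimensional grid) can be seen in a small pure-Python sketch. This is a textbook-style SOM written for illustration, not the SOM Categorizer described in these slides; the grid size, learning rate schedule, neighborhood width, and toy vectors are all assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Competitive step: the grid node whose weight vector is closest to x wins."""
    return min(weights,
               key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM on unlabeled vectors; returns {(row, col): weights}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # learning rate decays over time
        sigma = sigma0 * (1.0 - frac) + 0.5       # neighborhood shrinks, stays > 0
        for x in data:
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])         # pull node toward x
    return weights


# Toy 4-dimensional inputs (made up); each is mapped to a 2-D grid coordinate.
data = [[1, 1, 0, 0], [1, 0.9, 0, 0.1], [0, 0, 1, 1], [0, 0.1, 0.9, 1]]
som = train_som(data, rows=3, cols=3)
positions = [best_matching_unit(som, x) for x in data]
print(positions)
```

Each input ends up assigned to one cell of the grid; inputs that are similar in the original space tend to land in the same or neighboring cells, which is what makes the map usable for browsing.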
Data Mining Service Provider System Architecture
[Diagram: system architecture. Components: Metadata Database, SOM Categorizer, Concept Harvester, Input Vector Generator, Noun Phraser, Browser. Flows: concept browsing request and response, concept search request and response, fetch metadata, save SOM]
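The architecture names a Noun Phraser and an Input Vector Generator but does not describe them. One plausible reading, offered here purely as an assumption, is that phrases extracted from each metadata record are turned into a fixed-length vector that the SOM Categorizer can consume. In the sketch below the phrase lists are written by hand (a real noun phraser would need linguistic analysis), and the binary presence encoding is an illustrative choice.

```python
def build_vocabulary(phrase_lists):
    """Collect every distinct phrase across all records, in sorted order."""
    return sorted({phrase for phrases in phrase_lists for phrase in phrases})


def to_input_vector(phrases, vocabulary):
    """Binary vector: 1 if the record contains the vocabulary phrase, else 0."""
    present = set(phrases)
    return [1 if term in present else 0 for term in vocabulary]


# Hand-written phrase lists standing in for noun phraser output (assumption).
record_phrases = [
    ["federated search", "metadata"],
    ["self-organizing map", "metadata"],
]
vocabulary = build_vocabulary(record_phrases)
vectors = [to_input_vector(p, vocabulary) for p in record_phrases]
print(vocabulary)
print(vectors)
```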
Concept Harvester
Screenshot of the SOM Categorizer
Construction of Two-level Concept Hierarchy
Constructing the SOM for each harvested metadata set
SOMs of the lower layer are added to the upper-layer SOM.
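One way to picture the two-level hierarchy is as a top-level map whose entries point to the per-set maps beneath it. The sketch below is only a data-structure illustration under that assumption; the class `ConceptMap`, the function `build_hierarchy`, the second set name `OTHER_SET`, and all concept labels are hypothetical, and the slides do not say how the lower-layer SOMs are actually merged into the upper layer.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptMap:
    """A map of concept labels, optionally with lower-level maps beneath it."""
    name: str
    concepts: list = field(default_factory=list)
    children: dict = field(default_factory=dict)   # child name -> ConceptMap


def build_hierarchy(top_name, lower_maps):
    """Attach each lower-level map to a new top-level map.

    The top level here simply collects the distinct concept labels of its
    children; this merging rule is an assumption made for the sketch.
    """
    top = ConceptMap(name=top_name)
    for lower in lower_maps:
        top.children[lower.name] = lower
        for concept in lower.concepts:
            if concept not in top.concepts:
                top.concepts.append(concept)
    return top


# One map per harvested metadata set; labels are invented placeholders.
lower = [ConceptMap("VTETD", ["neural networks", "digital libraries"]),
         ConceptMap("OTHER_SET", ["digital libraries", "ontology"])]
top = build_hierarchy("top-level", lower)
print(top.concepts)
print(sorted(top.children))
```

Browsing then proceeds from a top-level concept down into the map of the metadata set that contributed it.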
VTETD
Top-level Concept Browsing
Bottom-level Concept Browsing
MEDLINE Database
Developed by the National Library of Medicine (NLM)
Bibliographic citations and abstracts from more than 4,600 biomedical journals published in the United States and 70 other countries.
Covering the fields of medicine, nursing, dentistry, veterinary medicine, the health care system, and the preclinical sciences.
Over 12 million citations
Searchable via PubMed or the NLM Gateway
MeSH (Medical Subject Headings)
MEDLINE uses MeSH as its controlled vocabulary for indexing database articles
Indexers scan an entire article and assign MeSH headings (or MeSH descriptors) to each article
MeSH descriptors are arranged in both an alphabetic list and a hierarchical structure.
Updated annually to reflect the changes in medicine and medical terminology
Our Experimentation Problems
Searching by descriptors is well known to greatly improve search precision.
However, it is very difficult for naïve users to know and use the exact MeSH descriptors when searching.
In addition, as the MEDLINE database grows, information overload prevents users from finding the information relevant to their interests.
Proposed Approach
Categorizations according to MeSH terms, MeSH major topics, and the co-occurrence of MeSH descriptors
Clustering using the results of MeSH term categorization through the Knowledge Grid
Visualization of categories and hierarchical clusters
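The first step of the approach, categorizing citations by MeSH descriptor and counting descriptor co-occurrence, can be sketched as follows. The citation identifiers and their descriptor assignments are toy values invented for the example, and the two function names are hypothetical; the clustering through the Knowledge Grid and the visualization steps are not covered.

```python
from collections import Counter, defaultdict
from itertools import combinations


def categorize_by_descriptor(citations):
    """Group citation ids under each MeSH descriptor assigned to them."""
    categories = defaultdict(list)
    for citation_id, descriptors in citations.items():
        for descriptor in descriptors:
            categories[descriptor].append(citation_id)
    return dict(categories)


def descriptor_cooccurrence(citations):
    """Count how often each unordered pair of descriptors shares a citation."""
    counts = Counter()
    for descriptors in citations.values():
        for pair in combinations(sorted(set(descriptors)), 2):
            counts[pair] += 1
    return counts


# Toy citations (made up): id -> assigned MeSH descriptors.
citations = {
    "c1": ["Humans", "Neoplasms"],
    "c2": ["Humans", "Neoplasms", "Mice"],
    "c3": ["Mice"],
}
categories = categorize_by_descriptor(citations)
cooccurrence = descriptor_cooccurrence(citations)
print(categories["Mice"])
print(cooccurrence[("Humans", "Neoplasms")])
```

Pairs with high counts indicate descriptors that tend to be assigned together, which is one possible signal for grouping categories into clusters.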
Data Access Services
MeSH Major Topic Tree View
SOM Tree View
Knowledge Grid
[Diagram: Knowledge Grid layers and components]
High level K-Grid layer: DA (Data Access Service), TAAS (Tools and Algorithms Access Service), EPM (Execution Plan Management), RPS (Result Presentation Service)
Core K-Grid layer: KDS (Knowledge Directory Service), RAEM (Resource Allocation and Execution Management), KMR, KBR, KEPR
Generic and Data Grid Services
Courtesy of Cannataro and Talia (Knowledge Grid: An Architecture for Distributed Knowledge Discovery)
Knowledge Grid Architecture
Future Directions
Develop a federated search service for OAI-compliant mathematical abstracts.
Develop an ontology or conceptual maps for mathematics.
Develop an ontology search service for mathematical abstracts and full papers.
Develop an interoperable architecture with other services, such as OCR of mathematical formulas.
Acknowledgement
Many thanks to the NSF NSDL Program.
Collaborators – Joe Futrelle (NCSA), Ed Fox (Virginia Tech)
Student Team – Hyunki Kim, Chee Yoong Choo, Xiaoou Fu, Yu Chen