Realizing Semantic Web - Light Weight semantics and beyond


Page 1

Realizing Semantic Web: Lightweight Semantics and Beyond

Krishnaprasad Thirunarayan (T. K. Prasad)

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing

Wright State University, Dayton, OH-45435

Page 2

Outline

• Domain Goals and Challenges

• Cyberinfrastructure Investments in Science

• Utility and Continuum of Machine-Processable Semantics: An Architecture

• What?: Nature of Data and Granularity of Semantics

• Why?: Lightweight semantics and its benefits

• How?: Community-ratified Ontologies

+ Semantic Annotations of Data and Documents

+ Linked Open Materials Data

• Research: Processing Tabular Data

Page 3

Domain Goals and Challenges

• Sharing, discovering, and applying materials science and engineering data and information is possible only if domain scientists are able and willing to do so.

• Technological challenges

– Computational tools and repositories conducive to easy exchange, curation, attribution, and analysis of data

• Cultural challenges

– Proper protection, control, and credit for sharing data

Page 4

Categories of Geoscience Data, with Characteristics, Strategy for Reuse, and CI (Cyberinfrastructure) Strategy

• Short-tail science data, created by large organizations and projects

– Characteristics: few, large (TB+), structured, spatially rich (e.g., remote sensing), largely homogeneous, highly visible, curated

– Strategy for reuse: planned integration strategies; could use formal ontologies / domain models and vocabularies, visualization tools, and APIs

– CI strategy: data centers / grids, generally using relational databases and files, maintained by people with significant IT skills

• Long-tail science data, created by individual scientists and small groups

– Characteristics: many, small (GB+), heterogeneous, invisible (except via publications), poorly curated

– Strategy for reuse: multi-domain and broad vocabularies (including community-established ones); create semantic metadata (annotations) and optionally publish, search, and download legacy data, or use an open data initiative

– CI strategy: web-based, easy-to-learn and easy-to-use semantic tools for annotation, publication, search, and download that can be used by individual scientists without significant IT skills

Page 5

Our Thesis

Associating machine-processable semantics with materials science and engineering data and documents can help overcome challenges associated with data discovery, integration, and interoperability caused by data heterogeneity.

Page 6

What?: Nature of Data and Documents

• Structured Data (e.g., relational)

• Semi-structured, Heterogeneous Documents (e.g., publications and technical specs, which usually include text, numerics, units of measure, images, and equations)

• Tabular data (e.g., ad hoc spreadsheets and complex tables incorporating “irregular” entries)

Page 7

Fragment of Materials and Process spec for Ti Alloy Bars, Wire, Forgings, and Rings.

Page 8

What?: Granularity of Semantics and Applications: Examples

• Synonyms

– Chemistry, Chemical Composition, Chemical Analysis, ...

– Bend Test, Bending, ...

– Delivery Condition, Process/Surface Finish, Temper, "as received by purchaser", ...

• Coreference vs. broadening/narrowing

– Tubing vs. welded tubing vs. flash-welded part

• Capturing characteristic-value pairs

– Recognize and normalize: “0.1 inch and under in nominal thickness” is translated to “Thickness <= 0.1 in”.

– Glean elided characteristic: the controlled term “solution heat treated” implies the characteristic “heat treat type”.
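The normalization steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the extraction tool's actual implementation: the synonym table, the implied-characteristic table, and the single regular expression are hypothetical and cover only this slide's own examples. A real extractor would draw its terms from a community-ratified vocabulary and need many more patterns.

```python
import re

# Hypothetical synonym table seeded with this slide's examples only;
# a real system would use a community-ratified controlled vocabulary.
SYNONYMS = {
    "chemistry": "Chemical Composition",
    "chemical composition": "Chemical Composition",
    "chemical analysis": "Chemical Composition",
    "bend test": "Bend Test",
    "bending": "Bend Test",
    "delivery condition": "Delivery Condition",
    "process/surface finish": "Delivery Condition",
    "temper": "Delivery Condition",
    "as received by purchaser": "Delivery Condition",
}

# Hypothetical table: controlled value term -> the characteristic it implies.
IMPLIED_CHARACTERISTIC = {
    "solution heat treated": "heat treat type",
}

# One illustrative pattern: "<number> inch(es) and under in nominal <characteristic>".
UPPER_BOUND = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s+inch(?:es)?\s+and\s+under\s+in\s+nominal\s+(?P<char>\w+)",
    re.IGNORECASE,
)


def canonical_heading(term):
    """Map a heading variant to its canonical term; unknown terms pass through."""
    return SYNONYMS.get(term.strip().lower(), term)


def normalize_bound(phrase):
    """Rewrite an upper-bound phrase as 'Characteristic <= value in', or return None."""
    m = UPPER_BOUND.search(phrase)
    if m is None:
        return None
    return f"{m.group('char').capitalize()} <= {m.group('value')} in"


def characteristic_value(term):
    """Recover the elided characteristic for a controlled value term, or return None."""
    char = IMPLIED_CHARACTERISTIC.get(term.strip().lower())
    return (char, term) if char else None


print(canonical_heading("Chemical Analysis"))  # Chemical Composition
print(normalize_bound("0.1 inch and under in nominal thickness"))  # Thickness <= 0.1 in
print(characteristic_value("solution heat treated"))  # ('heat treat type', 'solution heat treated')
```

Keeping the vocabulary in data tables rather than in code is the design point: domain experts can extend the synonym lists without touching the matching logic.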

Page 9

What?: Granularity of Semantics and Associated Applications

• Lightweight semantics: file- and document-level annotation to enable discovery and sharing

• Richer semantics: data-level annotation and extraction for semantic search and summarization

• Fine-grained semantics: data integration, interoperability, and reasoning in Linked Open Materials Science Data

Page 10

Computer Assisted Document Extraction Tool


Screenshots: typical view of the tagged spec; tree/structure view of the spec

Page 11

Computer Assisted Document Extraction Tool


A Few More Examples: Procedure Melt Methods

Screenshots: view of the original spec; tagged spec; TagEditor

Page 12

Computer Assisted Document Extraction Tool


A Few More Examples: Procedure Melt Methods

Screenshots: TagEditor; the SDL

Page 13

Why?: Benefits of Lightweight Semantics

• Ease of use by domain experts

– Faster and wider adoption, promoting evolution

• Low upfront cost to support

• Shallow semantics has wider applicability to a range of documents/data and appeals to a broader community of geoscientists

• Bottom-line: “Learn to Walk before we Run”

Page 14

How?: Using Semantic Web Technologies

Machine-processable semantics is achieved by addressing:

• Syntactic Heterogeneity: using XML syntax and the RDF data model (labeled graph structure)

• Semantic Heterogeneity

– Using “common” controlled vocabularies, taxonomies, and ontologies

– Using federated data sources, exchanges, querying, and services
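A minimal Python sketch of both ideas, under stated assumptions: two sources describe the same kind of record in different syntaxes (CSV and JSON) and with different local field names. Mapping each into RDF-style (subject, predicate, object) triples over one shared vocabulary lets the results be merged by plain set union. The namespace `http://example.org/matvocab#`, the field mappings, and the sample values are all hypothetical; production code would use an RDF library and a ratified ontology rather than bare tuples.

```python
import csv
import io
import json

# Hypothetical shared vocabulary namespace.
VOCAB = "http://example.org/matvocab#"

# Hypothetical per-source mappings: local field name -> shared property name.
CSV_MAPPING = {"alloy": "material", "uts_mpa": "tensileStrengthMPa"}
JSON_MAPPING = {"material": "material", "tensileStrength": "tensileStrengthMPa"}


def to_triples(record_id, record, mapping):
    """Turn one record into (subject, predicate, object) triples; unmapped fields are dropped."""
    subject = f"http://example.org/sample/{record_id}"
    return {
        (subject, VOCAB + mapping[key], str(value))
        for key, value in record.items()
        if key in mapping
    }


def from_csv(text):
    """Source 1: CSV rows with an 'id' column."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        record_id = row.pop("id")
        triples |= to_triples(record_id, row, CSV_MAPPING)
    return triples


def from_json(text):
    """Source 2: a JSON array of objects with an 'id' key."""
    triples = set()
    for record in json.loads(text):
        record_id = record.pop("id")
        triples |= to_triples(record_id, record, JSON_MAPPING)
    return triples


# Made-up sample records in two syntaxes with different field names.
CSV_TEXT = "id,alloy,uts_mpa\nS1,alloy-A,900\n"
JSON_TEXT = '[{"id": "S2", "material": "alloy-A", "tensileStrength": 900}]'

# Same data model, same vocabulary: merging is just set union.
merged = from_csv(CSV_TEXT) | from_json(JSON_TEXT)

# Query the merged graph: which samples are made of alloy-A?
same_material = sorted(s for s, p, o in merged if p == VOCAB + "material" and o == "alloy-A")
print(same_material)
```

The query finds both samples even though neither source uses the other's field names; the per-source mapping tables are where the semantic-heterogeneity work happens.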

Page 15

How?: Ingredients for Semantics-based Cyberinfrastructure

• Use of community-ratified controlled vocabularies and lightweight ontologies (upper-level, hierarchies)

• Ease registration, publishing, and discovery

• Provide support for provenance and access control

• Track data citations to give credit for data sharing

• Semi-automatic annotation of data and documents: manual + automatic

Page 16

How?: Search Continuum

• Keyword-based full-text search

• + Manually provided content and source metadata

– Uses upper-level ontology

• + Automatically extracted metadata

– Map text to concepts/properties/values

– Semantic + faceted search using background knowledge

• + Deeper semi-automatic content annotation and extraction

– Aggregating related pieces of information; conditioning

– Integration and interoperation

• + Linked Open Materials Science Data

• + Federated and faceted querying and services
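The first rungs of this continuum can be contrasted in a small Python sketch. Everything here is hypothetical: the documents, the facet names, and the one-entry background-knowledge table stand in for manually provided or extracted metadata and a real ontology. Keyword search matches only the literal string, concept expansion also finds synonyms, and a facet filter then narrows the result using metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    # Facets, e.g., manually provided or automatically extracted metadata.
    metadata: dict = field(default_factory=dict)


# Hypothetical background knowledge: concept -> known surface forms.
BACKGROUND = {
    "Chemical Composition": {"chemistry", "chemical composition", "chemical analysis"},
}


def keyword_search(docs, keyword):
    """Plain full-text search: literal, case-insensitive substring match."""
    kw = keyword.lower()
    return [d.doc_id for d in docs if kw in d.text.lower()]


def semantic_search(docs, keyword):
    """Expand the keyword to all surface forms of its concept, then match any of them."""
    kw = keyword.lower()
    forms = {kw}
    for surface_forms in BACKGROUND.values():
        if kw in surface_forms:
            forms |= surface_forms
    return [d.doc_id for d in docs if any(f in d.text.lower() for f in forms)]


def faceted_filter(docs, doc_ids, **facets):
    """Keep only results whose metadata match every requested facet value."""
    wanted = set(doc_ids)
    return [
        d.doc_id
        for d in docs
        if d.doc_id in wanted and all(d.metadata.get(k) == v for k, v in facets.items())
    ]


# Made-up documents for illustration.
DOCS = [
    Doc("d1", "Chemistry requirements for titanium bars", {"form": "bar"}),
    Doc("d2", "Chemical analysis of titanium wire", {"form": "wire"}),
    Doc("d3", "Bend test procedure for sheet", {"form": "sheet"}),
]

print(keyword_search(DOCS, "chemistry"))  # ['d1']
print(semantic_search(DOCS, "chemistry"))  # ['d1', 'd2']
print(faceted_filter(DOCS, semantic_search(DOCS, "chemistry"), form="wire"))  # ['d2']
```

The plain keyword misses `d2` ("chemical analysis"); background knowledge recovers it, and the facet then narrows the result to wire products.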

Page 17

Linked Open Data – Why do we need data?

Page 18

Linked Open Data – Just data is not enough

• More and more data are available, but …

Isolated islands of data are not enough; they are akin to a web of documents without hyperlinks.

Diagram: datasets A to F, shown first as isolated islands and then, in a panel labeled “Linked Data”, interlinked.

Need to interlink data over the web to enable content-rich applications.

Page 19

Linked Open Data – A Realization

Diagram (Linked Open Data graph). Nodes: http://ex./John_Kennedy, http://dbpedia../John_F._Kennedy, http://ex./A_Nation_of_Immigrants, http://dbpedia../politician, http://dbpedia../Massachusetts, http://dbpedia../United_States, http://dbpedia../Boston, http://ex./non-fiction, and the literals 1917-05-29 and 1964. Edge labels: owl:sameAs, http://ex./AuthoredBook, http://dbpedia../Profession, http://dbpedia../BirthPlace, http://dbpedia../BirthDate, http://dbpedia../Country, http://dbpedia../Capital, http://ex./publishedIn, http://ex./genre.
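The benefit of the `owl:sameAs` link can be illustrated with a toy Python sketch. Because the slide's URIs are truncated, the prefixes `ex:` and `dbp:` below are placeholders, and which node each edge connects is reconstructed from the diagram's labels (an assumption). Merging `owl:sameAs`-linked nodes, here with a small union-find, lets a query start from the book in one dataset and reach the author's birthplace in the other; a real system would use an RDF store and SPARQL instead.

```python
# Placeholder prefixes "ex:" and "dbp:" stand in for the slide's truncated namespaces;
# which node each edge connects is reconstructed from the diagram (an assumption).
TRIPLES = [
    ("ex:John_Kennedy", "owl:sameAs", "dbp:John_F._Kennedy"),
    ("ex:John_Kennedy", "ex:AuthoredBook", "ex:A_Nation_of_Immigrants"),
    ("ex:A_Nation_of_Immigrants", "ex:publishedIn", "1964"),
    ("ex:A_Nation_of_Immigrants", "ex:genre", "ex:non-fiction"),
    ("dbp:John_F._Kennedy", "dbp:Profession", "dbp:politician"),
    ("dbp:John_F._Kennedy", "dbp:BirthPlace", "dbp:Massachusetts"),
    ("dbp:John_F._Kennedy", "dbp:BirthDate", "1917-05-29"),
    ("dbp:Massachusetts", "dbp:Country", "dbp:United_States"),
    ("dbp:Massachusetts", "dbp:Capital", "dbp:Boston"),
]


def sameas_resolver(triples):
    """Union-find over owl:sameAs edges; returns a function mapping a node to its representative."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            rs, ro = find(s), find(o)
            if rs != ro:
                # Pick the lexicographically smaller name so the result is deterministic.
                parent[max(rs, ro)] = min(rs, ro)
    return find


def merge(triples):
    """Rewrite every node to its representative and drop the sameAs edges themselves."""
    find = sameas_resolver(triples)
    return {(find(s), p, find(o)) for s, p, o in triples if p != "owl:sameAs"}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


graph = merge(TRIPLES)
# Who wrote the book? (a fact from the "ex:" dataset)
authors = {s for s, p, o in graph if p == "ex:AuthoredBook" and o == "ex:A_Nation_of_Immigrants"}
# Where was that author born? (a fact from the "dbp:" dataset, reachable only after merging)
for author in sorted(authors):
    print(author, objects(graph, author, "dbp:BirthPlace"))
```

Without the merge, `ex:John_Kennedy` has no birthplace at all; the single link is what turns two isolated islands into one queryable graph.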

Page 20

Linked Open Data


“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

Page 21

Example: Lightweight Semantic Registration of Data


• Title of data: provided

• Keywords: selected from a five-tier vocabulary

• Type of data: maps, Excel files, images, text

• Data format: structured or unstructured

• Description of data: brief unstructured description of content

• Contact information of provider(s): name of provider(s), email for verification, lineage

• Spatial extent of data and reference system: location

• Temporal extent of data: date range in time, or age range if not recent

• Date and type of related publication(s): journal, thesis, agency report, not published

• Host site for publication: journal, library, personal computer

• Access restrictions: copyright regulations
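One way to enforce such a lightweight registration form is sketched below in Python. The field names mirror the fields above, but the allowed-value sets, the checks in the hypothetical `validate()` function, and the treatment of the five-tier vocabulary as a flat set are assumptions made for illustration, not the system's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical allowed values, taken from the examples listed on the slide.
DATA_TYPES = {"map", "excel file", "image", "text"}
DATA_FORMATS = {"structured", "unstructured"}
PUBLICATION_TYPES = {"journal", "thesis", "agency report", "not published"}


@dataclass
class Registration:
    title: str
    keywords: List[str]  # terms chosen from the controlled vocabulary
    data_type: str
    data_format: str
    description: str
    contact_name: str
    contact_email: str  # used to verify the provider
    lineage: str = ""
    spatial_extent: Optional[str] = None  # location and reference system
    temporal_extent: Optional[str] = None  # date range, or age range if not recent
    publication_type: str = "not published"
    host_site: Optional[str] = None  # journal, library, personal computer, ...
    access_restrictions: Optional[str] = None  # e.g., copyright regulations


def validate(reg, vocabulary):
    """Return a list of problems; an empty list means the record can be registered."""
    problems = []
    if not reg.title.strip():
        problems.append("missing title")
    unknown = [k for k in reg.keywords if k not in vocabulary]
    if unknown:
        problems.append(f"keywords not in vocabulary: {unknown}")
    if reg.data_type not in DATA_TYPES:
        problems.append(f"unknown data type: {reg.data_type}")
    if reg.data_format not in DATA_FORMATS:
        problems.append(f"unknown data format: {reg.data_format}")
    if "@" not in reg.contact_email:
        problems.append("an email address is needed for verification")
    if reg.publication_type not in PUBLICATION_TYPES:
        problems.append(f"unknown publication type: {reg.publication_type}")
    return problems


# Hypothetical vocabulary terms and record, for illustration only.
VOCABULARY = {"titanium alloy", "bend test"}
record = Registration(
    title="Bend test results for titanium bars",
    keywords=["titanium alloy", "bend test"],
    data_type="excel file",
    data_format="structured",
    description="Spreadsheet of bend test outcomes",
    contact_name="A. Researcher",
    contact_email="a.researcher@example.org",
)
print(validate(record, VOCABULARY))  # []
```

Only a handful of fields are mandatory, which keeps the upfront cost low for individual scientists while still yielding searchable, vocabulary-backed metadata.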

Page 22

System Architecture and Components

Page 23

Problems and a Practical Approach

(“When rubber meets the road”)

Deeper Issues: Semantic Formalization of Tabular Data

Page 24

Nature of tables

• Compact structures for sharing information

– Minimize duplication

• Types of Tables

– Regular: dense grid with explicit schema information in terms of column and row headings => tractable

– Irregular: sparse grid with implicit schema and ad hoc placement of headings => hard
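The regular/irregular distinction suggests a simple screening heuristic, sketched here in Python as an assumption rather than as the authors' method: treat a grid as regular if it is rectangular, every row and column heading is present, and most body cells are filled. The 0.9 fill threshold and the sample grids are hypothetical.

```python
def fill_ratio(grid):
    """Fraction of non-empty body cells (below the header row, right of the header column)."""
    body = [cell for row in grid[1:] for cell in row[1:]]
    if not body:
        return 0.0
    return sum(cell not in (None, "") for cell in body) / len(body)


def looks_regular(grid, min_fill=0.9):
    """Heuristic: rectangular grid, every row/column heading present, and a dense body."""
    if not grid:
        return False
    width = len(grid[0])
    if any(len(row) != width for row in grid):
        return False  # not a simple rectangular grid
    column_headings = grid[0][1:]
    row_headings = [row[0] for row in grid[1:]]
    if any(h in (None, "") for h in column_headings + row_headings):
        return False  # some heading is missing or placed ad hoc
    return fill_ratio(grid) >= min_fill


# Made-up example grids.
regular = [
    ["", "Thickness (in)", "Condition"],
    ["Bar", "<= 0.1", "annealed"],
    ["Wire", "<= 0.05", "annealed"],
]
irregular = [
    ["", "Thickness (in)", None],
    ["Bar", "<= 0.1", None],
    [None, None, "see footnote 1"],
]
print(looks_regular(regular), looks_regular(irregular))
```

A check like this can only flag tables for routing: regular ones go straight to automatic translation, while irregular ones need the manual restructuring described later.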

Page 25

Page 26

Challenges Associated with Typical Spreadsheet/Table

• Meant for human consumption

• Irregular:

– Not a simple rectangular grid

• Heterogeneous

– Not all rows are interpreted the same way

• Complex

– The meaning of each row and each column is context dependent

• Footnotes modify the meaning of entries (esp. in materials and process specifications)

Page 27

Practical Semi-Automatic Content Extraction

• DESIGN: Develop regular data structures that can be used to formalize tabular information.

– Provide a natural expression of data

– Provide semantics to data, thereby removing potential ambiguities

– Enable automatic translation

• USE: Manual population of regular tables and automatic translation into LOD
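The USE step (a manually populated regular table, then automatic translation) can be sketched in Python. This assumes a hypothetical table layout in which the first column names the item and each remaining column heading names a characteristic; each non-empty cell becomes one N-Triples statement under the placeholder namespace `http://example.org/spec/`. Literals are kept as plain strings; units, datatypes, and footnotes would need a richer design.

```python
import csv
import io
import re

BASE = "http://example.org/spec/"  # hypothetical namespace


def slug(text):
    """URI-safe local name, e.g. 'Nominal Thickness (in)' -> 'Nominal_Thickness_in'."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text.strip()).strip("_")


def literal(value):
    """Quote a cell as an N-Triples string literal, escaping backslash, quote, and newlines."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def table_to_ntriples(csv_text):
    """First column = subject, other column headings = predicates, non-empty cells = objects."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = []
    for row in body:
        subject = f"<{BASE}item/{slug(row[0])}>"
        for heading, cell in zip(header[1:], row[1:]):
            if cell.strip():  # empty cells produce no statement
                predicate = f"<{BASE}prop/{slug(heading)}>"
                lines.append(f"{subject} {predicate} {literal(cell.strip())} .")
    return lines


# A made-up, manually populated regular table.
TABLE = (
    "Product,Nominal Thickness (in),Heat Treat Type\n"
    "Bar,<= 0.1,solution heat treated\n"
    "Wire,,annealed\n"
)
for line in table_to_ntriples(TABLE):
    print(line)
```

Because the table is regular, the translation needs no per-table logic: the headings alone supply the schema, which is exactly why the approach asks humans to produce regular tables first.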

Page 28

Thank you, and please visit us at

http://knoesis.org/

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing

Wright State University, Dayton, Ohio, USA
