Prateek Jain dissertation defense, Kno.e.sis, Wright State University

About 22 years ago..

11 years later…

Image from Scientific American Website

Tim Berners-Lee 2006

1.Use URIs as names for things

2.Use HTTP URIs so that people can look up those names.

3.When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)

4. Include links to other URIs. so that they can discover more things.

In 2006 Web of Data

Linked Open Data

• Massive collection of instance data

• Primarily connected via owl:sameAs relationship

• Excellent source of information for background knowledge

• Labeled as mainstream Semantic Web7/29/12

Is it really mainstream Semantic Web?

• What is the relationship between the models whose instances are being linked?

• How to do querying on LOD without knowing individual datasets?

• How to perform schema level reasoning over LOD cloud?

What can be done?

• Relationships are at the heart of Semantics

• LOD primarily consists of owl:sameAs links

• LOD captures instance level relationships, but lacks class level relationships.o Superclasso Subclasso Equivalence

• How to find these relationships?o Perform a matching of the LOD Ontology’s using state of the art ontology matching tools.

Linked Open Data Alignment and Querying

Dissertation Defense July 27th, 2012

Prateek JainKno.e.sis Center

Wright State University, Dayton, OH

Agenda

• Motivation and Significance of this research

• Research questions and proposed solutions

• State of the current research and planned work

• Questions and comments14th February 2012 12

Linked Open Data

14th February 2012 13

• A set of best practices for publishing and connecting structured data on the Web

• Practices have been adopted by an increasing number of data providers in the past 5 years

• Latest count is at 295 datasets with over 50 Billion triples (approx)

Linked Open Data 2007 (May)

Linking Open Data cloud diagram, this and subsequent pages, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Linked Open Data 2007 (Oct)

Linked Open Data 2009

Linked Open Data 2011

Linked Open DataNumber of Datasets

2011-09-19 295

2010-09-22 203

2009-07-14 95

2008-09-18 45

2007-10-08 25

2007-05-01 12

Number of triples (Sept 2011)

31,634,213,770

with 503,998,829 out-links

From http://www4.wiwiss.fu-berlin.de/lodcloud/state/

6 years of existence how many applications come to

your mind?

7/29/12 19

I tried to investigate..

Compiled List

• BBC Music

• Faviki

• Application Lifecycle Management at IBM Rational

• British Museum

Reality…

• “We DID NOT use the entire Dbpedia or LOD. The only component of LOD which helped us with Watson was YAGO class hierarchy present in DBpedia. We had strict information gain requirements and other components honestly did not help much“

– Researcher with the Watson Team

7/29/12

A simple query..

“Identify congress members, who have voted “No” on pro environmental legislation in the past four years, with high-pollution industry in their congressional districts.”

But even with LOD we cannot answer this query.

Example: GovTrack

Bills:h3962

H.R. 3962: Affordable Health Care for America

Votes:2009-887/+

people/P000197

Nancy PelosiOn Passage: H R 3962 Affordable Health Care for

America Act

Vote: 2009-887

vote:hasAction

vote:vote

dc:title

vote:hasOption

rdfs:labelAye

dc:title

vote:votedBy

Example: GeoNames

rdfs:subClassOf?

Our ApproachUse knowledge contributed by users

To enhance existing approaches to solve these issues:

• Ontology integration

• Detection relationships within and across datasets

• Querying multiple datasets

LOD Cloud

Circling Back

• LOD captures instance level relationships, but lacks class level relationships.o Superclasso Subclasso Equivalence

7/29/12

BLOOMS – Bootstrapping …

• BLOOMS - Bootstrapping-based Linked Open Data Ontology Matching System

• Developed specifically for LOD Ontologies

• Identifies schema level links between different LOD datasets

• Aligns ontologies belonging to diverse domains using diverse data sources

• Technique relies on using hierarchy in other datasets (therefore bootstrapping)

Existing Approaches

A survey of approaches to automatic Ontology matching by Erhard Rahm, Philip A. Bernstein in the VLDB Journal 10: 334–350 (2001)

LOD Ontology Alignment

• Actual Results from these techniques Nation = Menstruation, Confidence=0.9

• They perform extremely well on established benchmarks, but typically not in the wilds.

• LOD Ontology’s are of very different nature• Created by community for community.

• Emphasis on number of instances, not number of meaningful relationships.

• Require solutions beyond syntactic and structural matching.

Rabbit out of a hat?

• Traditional auxiliary data sources (WordNet, Upper Level Ontologies) have limited coverage.

• Community generated is noisy, but is rich in • Content

• Structure

• Has a “self healing property”

• Problems like Ontology Matching have a dimension of context associated with them.

Wikipedia

• The English version alone has more than 2.9 million articles

• Continually expanded by approx. 100,000 active volunteer editors

• Multiple points of view are mentioned with proper contexts

• Article creation/correction is an ongoing activity

Ontology Matching using Wikipedia

• On Wikipedia, categories are used to organize the entire project.

• Wikipedia's category system consists of overlapping trees.

• Simple rules for categorization

BLOOMS Approach – Step 1

• Pre-process the input ontology Remove property restrictions Remove individuals, properties

• Tokenize the class names Remove underscores, hyphens and other delimiters Breakdown complex class names

• example: SemanticWeb => Semantic Web

BLOOMS Approach – Step 2

• Identify article in Wikipedia corresponding to the concept.o Each article related to the concept indicates a sense of the usage of the

• For each article found in the previous stepo Identify the Wikipedia category to which it belongs.o For each category found, find its parent categories till level 4.

• Once the “BLOOMS tree” for each of the sense of the source concept is created (Ts), utilize it for comparison with the “BLOOMS tree” of the target concepts (Tt).

BLOOMS Approach – Step 3• In the tree Ts, remove all nodes for which the parent node

which occurs in Tt to create Ts’.o All leaves of Ts are of level 4 or occur in Tt. o The pruned nodes do not contribute any additional new knowledge.

• Compute overlap Os between the source and target tree.o Os= n/(k-1), n = |z|, z ε Ts’ Π Tt, k= |s|, s ε Ts’

• The decision of alignment is made as follows.o For Ts ε Tc and Tt ε Td, we have Ts=Tt, then C=D.o If min{o(Ts,Tt),o(Tt,Ts)} ≥ x, then set C rdfs:subClassOf D if o(Ts,Tt) ≤ o(Tt,

Ts), and set D rdfs:subClassOf C if o(Ts, Tt) ≥ o(Tt, Ts).

Example

Evaluation Objectives

• To examine BLOOMS as a tool for the purpose of LOD ontology matching.

• To examine the ability of BLOOMS to serve as a general purpose ontology matching system.

BLOOMS

Circling Back

• LOD primarily consists of owl:sameAs links

7/29/12

Part of Relationship Identification

Partonomy Identification

• Currently entities across datasets are linked using primarily the owl:sameAs relationship

• Relationships such as partonomy (part-of), and causality can allow creating even more intelligent applications such as Watson

• Approach PLATO (Part-Of relation finder on Linked Open DAta Tool)

PLATO Approach

• PLATO generates all possible partonomically linked pairs between the entities in the dataset. o Utilize “strongly” associated entities

• Identify the type of each entity in the pair using WordNet.o Use Class Nameso Gives the lexicographer files for the synsets

corresponding to these entities

• Use this information to determine the applicable OWL partonomy properties.o Using Winston’s taxonomy

Winston’s Taxonomy

PLATO Approach – Step 2

• PLATO generates linguistic patterns for each applicable property based on linguistic cues suggested by Winston.o Cell Wall is made of Cellulose

• Tests the lexical patterns for each entity pair in a corpus-driven manner.o Using Web as a corpus

• PLATO counts the total number of web pages that contain the patterno Parse the page and identify the occurance of pattern.

PLATO Approach – Step 3

• Asserts the partonomy property with strongest supporting evidenceo Cell Wall is made of Cellulose, 48o Cellulose is made of Cell Wall, 10

• PLATO also enriches the schema by generalizing from the instance level assertions.

Evaluation Objectives

• To examine PLATO as a tool for finding different kinds of part-of relation.

• To examine PLATO as a tool for finding part-of relation within a dataset

• To examine PLATO as a tool for finding part-of relation across dataset

PLATO Evaluation

Some other work

• Requirement document analysiso Internship at Accenture

• Querying of partonomical relationship

• Operators for querying spatio-temporal-thematic data

• Plug-n-Play system for BLOOMS7/29/12

BLOOMS BLOOMS+ PLATO Others

2010 1. 1 paper at ISWC 2. 1 paper at OM

workshop

1. Paper at AAAI SS2. Paper at GEOS

2011 1. 1 paper at ESWC2. Workshop at ICBO3. 1 patent

2012 1. 1 paper at ACM Hypertext

Total of 7 publications covering this research

Potential Applications

• Automatic domain identification of datasetso Work currently being pursued by Sarasi

• Property alignment on LOD cloudo Work currently being pursued by Kalpa and Sanjaya

• Personalization of property and concepts match.o Machine learning and data mining based techniques

Publications

• Prateek Jain, Pascal Hitzler, Kunal Verma, Peter Z. Yeh and Amit P. Sheth, “Moving beyond sameAs with PLATO: Partonomy detection for Linked Data”. In Proceedings of the 23rd ACM Hypertext and Social Media conference (HT 2012), Milwaukee, WI, USA, June 25th-28th, 2012 (Acceptance Rate 27.5%)

• Amit Krishna Joshi, Prateek Jain, Pascal Hitzler, Peter Yeh, Kunal Verma, Amit Sheth, Mariana Damova, "Alignment-based Querying of Linked Open Data", In Proceedings of the 11th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE 2012) (To Appear)

• Prateek Jain,Peter Z. Yeh, Kunal Verma, Reymonrod Vasquez, Mariana Damova, Pascal Hitzler and Amit P. Sheth, “Contextual Ontology Alignment of LOD with an Upper Ontology: A Case Study with Proton”.In Grigoris Antoniou, Marko Grobelnik, Elena Simperl, Bijan Parsia, Dimitris Plexousakis, Jeff Pan and Pieter De Leenheer, editors, Proceedings of the 8th Extended Semantic Web Conference 2011, volume 6643 of Lecture Notes in Computer Science, Heidelberg, 2011. Springer Berlin. (Acceptance Rate 23.5%)

• Prateek Jain, Pascal Hitzler, Amit P. Sheth, Kunal Verma and Peter Z. Yeh, “Ontology Alignment for Linked Open Data”. In P. Patel-Schneider, Y. Pan, P. Hitzler, P. Mika, L. Zhang, J. Pan, I. Horrocks, And B. Glimm, editors, Proceedings of the 9th International Semantic Web Conference 2010, Shanghai, China, November 7th-11th, 2010,volume 6496 of Lecture Notes in Computer Science, pages 402-417, Heidelberg, 2010. Springer Berlin. (Acceptance Rate 20%)

Publications

• Prateek Jain, Pascal Hitzler and Amit P. Sheth. "Flexible Bootstrapping-Based Ontology Alignment". In Proceedings of the Fifth international Workshop on Ontology Matching (Shanghai, China, November 7th - 11th, 2010).

• Prateek Jain, Pascal Hitzler, Peter Z. Yeh, Kunal Verma, and Amit P. Sheth, “Linked Data Is Merely More Data”. In: Dan Brickley, Vinay K. Chaudhri, Harry Halpin, and Deborah McGuinness: Linked Data Meets Artificial Intelligence. Technical Report SS-10-07, AAAI Press, Menlo Park, California, 2010, pp. 82-86. ISBN 978-1-57735-461-1

• Prateek Jain, Peter Yeh, Kunal Verma, Cory Henson, and Amit Sheth. “SPARQL Query Re-writing Using Partonomy Based Transformation Rules”. In K. Janowicz, M. Raubal, and S. Levashkin, editors, Proceedings of the Third International Conference on GeoSpatial Semantics, December 3-4, 2009, Mexico City, Mexico, volume 5892/2009 of Lecture Notes in Computer Science, pages 140–158, Heidelberg, 2009. Springer Berlin.

• Prateek Jain, Kunal Verma, Alex Kass, Reymonrod G. Vasquez, “Automated Review of Natural Language Requirements Documents: Generating Useful Warnings with User-extensible Glossaries Driving a Simple State Machine”, In Kiran Deshpande, Pankaj Jalote and Sriram K. Rajamani editors, Proceedings of the Second India Software Engineering Conference, February 23-26, 2009, Pune, India, ACM, New York, NY, 37-46. DOI= http://doi.acm.org/10.1145/1506216.1506224 (Acceptance Rate 10%).

Publications

• Prateek Jain, Peter Z. Yeh, Kunal Verma, Alex Kass, and Amit P. Sheth, 2008. "Enhancing process-adaptation capabilities with web-based corporate radar technologies". In Proceedings of the First international Workshop on ontology-Supported Business intelligence (Karlsruhe, Germany, October 27 - 27, 2008). OBI '08, vol. 308. ACM, New York, NY, 1-6. DOI= http://doi.acm.org/10.1145/1452567.1452569

• Matthew Perry, Amit P. Sheth, Farshad Hakimpour, Prateek Jain. "Supporting Complex Thematic, Spatial and Temporal Queries over Semantic Web Data", In F. T. Fonseca, M. Andrea Rodriguez and S. Levashkin editors, Proceedings of the Second International Conference on GeoSpatial Semantics, December 3-4, 2009, Mexico City, Mexico, volume 4853/2007 of Lecture Notes in Computer Science, pages 228–246, Heidelberg, 2007. Springer Berlin.

• Colin Puri, Karthik Gomadam, Prateek Jain, Peter Z. Yeh, Kunal Verma, “Multiple Ontologies in Healthcare Information Technology: Motivations and Recommendation for Ontology Mapping and Alignment”.In Proceedings of the Workshop on Working with Multiple Biomedical Ontologies (at ICBO), 26 July 2011, Buffalo, NY, USA.

• Cory Henson, Amit P. Sheth, Prateek Jain, Josh Pschorr and Terry Rapoch. "Video on the Semantic Sensor Web", W3C Video on the Web Workshop 12-13 December 2007, San Jose, California and Brussels, Belgium

Patent

• Peter Z. Yeh, Prateek Jain, Kunal Verma, Reymonrod G. Vasquez, Titled: Information Source Alignment, Filed 4th March 2011, Status: Pending.

Acknowledgement

• Cory Hensono coffee breaks, research, football, baseball, politics, life..o First person I met while finding my way to LSDIS lab

• Kno.e.sis Lab Members & support staff

• Folks at Accenture Technology Labso Amazing group of people to work with/for

Acknowledgement

• NSF Award:IIS-0842129, titled III-SGER: Spatio-Temporal-Thematic Queries of Semantic Web Data: a Study of Expressivity and Efficiency

• NSF Award 1143717 III: EAGER -- Expressive Scalable Querying over Linked Open Data.

Questions?

Prateek Jain dissertation defense, Kno.e.sis, Wright State University

Technology

Transcript of Prateek Jain dissertation defense, Kno.e.sis, Wright State University

Delhi Technological University: Design and ... - auv.dtu.ac.inauv.dtu.ac.in/wp-content/uploads/2016/09/Journal2014.pdf · Aayush Gupta,Vatsal Rustagi,Raj Kumar Saini,Akshay Jain,Prateek

Scea prateek

Diverse Yet Efﬁcient Retrieval using Hash Functionscvit.iiit.ac.in/images/ArxivTechnicalReport/2015/Diverse_hashfun.pdf · Prateek Jain Microsoft Research India prajain@microsoft.com

Kno.e.sis Review: late 2012 to mid 2013

Prateek Jain Inderjit S. Dhillon - arXivProvable Inductive Matrix Completion Prateek Jain Microsoft Research India, Bangalore prajain@microsoft.com Inderjit S. Dhillon The University

33369999 Summer Training Project Report on NTPC by Prateek Jain VIT University

Knowledge Enabled Information and Services Science Extending SPARQL to Support Spatially and Temporally Related Information Prateek Jain, Amit Sheth Peter.

Summer Training Project report on NTPC BY Prateek Jain- VIT University

Prateek stylomes, Prateek project noida sector 45-

Kno.e.sis collaborations/projects with AFRL-HE

Prateek Jain 046 {1}

What's up at Kno.e.sis?

Prateek Jain Chi Jiny Sham M. Kakadez x March 10, 2017 · PDF filePrateek Jain Chi Jiny Sham M. Kakadez Praneeth Netrapallix March 10, 2017 Abstract ... Jin et al. [2017] proved global

Fast Similarity Search for Learned Metrics Prateek Jain, Brian Kulis, and Kristen Grauman Department of Computer Sciences University of Texas at Austin.

Introduction to Kno.e.sis Center - March 2011

Why Waste The Waste? Prateek Shetty Sri Bhagawan Mahaveer Jain College V.V.Puram.

Harish Karnick*, Prateek Jain and Piyush Rai* · Harish Karnick*, Prateek Jain# and Piyush Rai* ^School of Computing, University of Utah # Microsoft Research Lab, India *Indian Institute

Purushottam Kar Harikrishna Narasimhan Indian Institute of ... · Purushottam Kar Harikrishna Narasimhany Prateek Jain Microsoft Research, INDIA yIndian Institute of Science, Bangalore,

Knowledge Enabled Information and Services Science SPARQL Query Re-writing for Spatial Datasets Using Partonomy Based Transformation Rules Prateek Jain,

Non-convex Optimization for Machine Learning - Prateek Jain · 2020-03-12 · Prateek Jain Microsoft Research India . Outline •Optimization for Machine Learning •Non-convex Optimization

Harish Karnick, Prateek Jain and Piyush Rai · Harish Karnick, Prateek Jain# and Piyush Rai ^School of Computing, University of Utah # Microsoft Research Lab, India *Indian Institute