Toward Semantic Search: RDFa based facet browser Jin Guang Zheng Tetherless World Constellation.


Toward Semantic Search: RDFa based facet browser

Jin Guang Zheng, Tetherless World Constellation


Introduction

The current state of the art in search:
– Keyword-based search mechanism
  • Easy to use, low learning curve
  • Uses statistical analysis, machine learning, and natural language processing technologies to improve search results

Problem:
– Limited conceptual-level understanding of both queries and documents
  • “Jaguar”: the car vs. the animal
  • “Understands” the document based on its most frequent keywords
– Lack of inference:
  • ISWC and its sub-events
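To make the "lack of inference" point concrete, here is a minimal sketch comparing a plain keyword match with a lookup that follows sub-event links. The event names, document names, and the `SUB_EVENT_OF` table are all hypothetical; the slides only mention "ISWC and sub-events" and do not describe a data model.

```python
# Hypothetical sub-event relation: child event -> parent event.
SUB_EVENT_OF = {
    "Doctoral Consortium": "ISWC",
    "Workshop A": "ISWC",
    "Workshop A Keynote": "Workshop A",
}

# Hypothetical documents, each annotated with the one event it is about.
DOC_EVENT = {
    "doc1.html": "ISWC",
    "doc2.html": "Doctoral Consortium",
    "doc3.html": "Workshop A Keynote",
    "doc4.html": "Some Other Conference",
}


def is_part_of(event, target):
    """Return True if `event` is `target` or a (transitive) sub-event of it."""
    seen = set()
    while event is not None and event not in seen:
        if event == target:
            return True
        seen.add(event)  # guard against accidental cycles in the table
        event = SUB_EVENT_OF.get(event)
    return False


def keyword_search(query):
    """Plain keyword match: the event label must contain the query string."""
    return sorted(d for d, e in DOC_EVENT.items() if query.lower() in e.lower())


def inferred_search(target):
    """Match documents about the target event or any of its sub-events."""
    return sorted(d for d, e in DOC_EVENT.items() if is_part_of(e, target))
```

With this toy data, `keyword_search("ISWC")` finds only the document whose label contains the string, while `inferred_search("ISWC")` also returns the documents about the sub-events.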


Research Question

Problem 1: Conceptual-level understanding of queries and documents.

How can we use semantic web technologies to improve search results by helping the search engine “understand” both the user's search intention and the content of the document?


Challenges

Understand the user's search intention:

Trade-off: usability vs. more semantics (structured query)

Need to find the right point where both usability and semantics can be satisfied


Challenges

1. Unstructured documents: Most documents are unstructured text encoded in HTML. It is hard to perform a structured query against unstructured data, so structured data is needed in, or for, the documents.

2. Performing structured queries against documents with structured data.
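As a rough illustration of the second challenge, the sketch below runs a structured query as a triple-pattern match over (subject, predicate, object) statements. The triples, the `ex:` names, and the helper functions are hypothetical; the slides do not specify a query mechanism.

```python
# Hypothetical triples (subject, predicate, object) extracted from documents.
# The "ex:" names and document URLs are illustrative only.
TRIPLES = [
    ("cars.html#Jaguar", "rdf:type", "ex:Car"),
    ("cars.html#Jaguar", "ex:maker", "ex:JaguarCars"),
    ("zoo.html#Jaguar", "rdf:type", "ex:Animal"),
    ("zoo.html#Jaguar", "ex:habitat", "ex:Rainforest"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def documents_of_type(triples, rdf_type):
    """Structured query: which documents describe something of this type?"""
    hits = match(triples, p="rdf:type", o=rdf_type)
    # The document URL is the part of the subject before the fragment.
    return sorted({s.split("#")[0] for s, _p, _o in hits})
```

For example, `documents_of_type(TRIPLES, "ex:Car")` selects only the car page, something a keyword match on "Jaguar" cannot do.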


Approach: User Side

Facet Browse:
– Construct the structured query
– Help the user filter and navigate the search results

Example:
  • Car
  • Animal
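A minimal sketch of how facet browsing could work on the user side, assuming each search result already carries structured fields. The `RESULTS` records and the `type` and `language` facets are hypothetical, chosen to mirror the Car / Animal example.

```python
from collections import Counter

# Hypothetical results for the keyword "jaguar"; the "type" and "language"
# fields stand in for structured metadata attached to each document.
RESULTS = [
    {"url": "a.html", "type": "Car", "language": "en"},
    {"url": "b.html", "type": "Animal", "language": "en"},
    {"url": "c.html", "type": "Car", "language": "de"},
    {"url": "d.html", "type": "Animal", "language": "en"},
]


def facet_counts(results, facet):
    """Count how many results fall under each value of one facet."""
    return Counter(r[facet] for r in results if facet in r)


def apply_facets(results, selected):
    """Keep results matching every selected facet value.

    `selected` maps facet name -> chosen value, so each click on a facet
    adds one more condition to a conjunctive structured query.
    """
    return [r for r in results if all(r.get(f) == v for f, v in selected.items())]
```

Here `facet_counts(RESULTS, "type")` gives the counts shown next to each facet value, and `apply_facets(RESULTS, {"type": "Car"})` is the filtered result list after the user picks "Car". The user never writes a structured query directly, which is one way to address the usability trade-off.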


Approach: Document Side

RDFa or other metadata format:
– Embed structured metadata into the document
– Index RDFa data: “understand” the document based on the structured data

Example: <div about="#Jaguar" typeof="_:Car">.....</div>
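The sketch below shows one possible way to index documents by the types declared in their RDFa, using only Python's standard `html.parser`. It reads just the `about` and `typeof` attributes and is not a conforming RDFa parser; the `RDFaTypeExtractor` class, the `DOCS` contents, and the `_:Animal` type are assumptions added for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser


class RDFaTypeExtractor(HTMLParser):
    """Collect (about, typeof) pairs from start tags; not a full RDFa parser."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Skip tags where either attribute is missing or has no value.
        if attr.get("about") and attr.get("typeof"):
            # typeof may hold several whitespace-separated types.
            for rdf_type in attr["typeof"].split():
                self.pairs.append((attr["about"], rdf_type))


def build_type_index(documents):
    """Map each type to the set of document URLs declaring a subject of that type."""
    index = defaultdict(set)
    for url, html in documents.items():
        extractor = RDFaTypeExtractor()
        extractor.feed(html)
        for _subject, rdf_type in extractor.pairs:
            index[rdf_type].add(url)
    return index


# Hypothetical documents; the first reuses the markup from the slide.
DOCS = {
    "cars.html": '<div about="#Jaguar" typeof="_:Car">A luxury car.</div>',
    "zoo.html": '<div about="#Jaguar" typeof="_:Animal">A big cat.</div>',
}
INDEX = build_type_index(DOCS)
```

Looking up `INDEX["_:Car"]` then returns the car page only, even though both documents mention "Jaguar".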


Research Plan

Timeline & Tasks

Research on:

1. RDFa Parsing – How do current parsers work? Do they parse RDFa correctly? How much time do they take? – 2 weeks: Collect parsers and test data, run the tests on the parsers, and collect the results

2. Facet Generation – What vocabularies? How many facets? – 2 weeks: Perform analysis on vocabularies and documents

3. Facet Ranking – Which facets really help the user? – 3 weeks: Develop a ranking algorithm and test it

4. Analyze Existing RDFa Data – How much data? What vocabularies? – 3 weeks: Crawl RDFa data and analyze the vocabularies

5. RDFa Indexing – How to index RDFa data so we can retrieve documents through it? – 4 weeks: Develop an indexing algorithm and test it
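The plan leaves the facet ranking algorithm open. As one possible starting point (an assumption, not the method proposed in the slides), a facet can be scored by how many results it covers multiplied by how evenly it splits them, measured with Shannon entropy. All names and data below are hypothetical.

```python
import math
from collections import Counter

# Hypothetical search results with partially filled facets.
RANK_RESULTS = [
    {"type": "Car", "language": "en", "year": "2009"},
    {"type": "Car", "language": "en", "year": "2010"},
    {"type": "Animal", "language": "en"},
    {"type": "Animal", "language": "en"},
]


def facet_score(results, facet):
    """Score = coverage * Shannon entropy (in bits) of the facet's values."""
    values = [r[facet] for r in results if facet in r]
    if not values:
        return 0.0
    coverage = len(values) / len(results)
    total = len(values)
    entropy = -sum(
        (n / total) * math.log2(n / total) for n in Counter(values).values()
    )
    return coverage * entropy


def rank_facets(results, facets):
    """Order facets from most to least useful under the heuristic above."""
    return sorted(facets, key=lambda f: facet_score(results, f), reverse=True)
```

Under this heuristic a facet with a single value for every result (here `language`) scores zero, since selecting it cannot narrow the result list, while a facet that splits the results evenly and covers all of them (here `type`) ranks first.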


Questions

THANK YOU !