Page 1: Semantic web Bootstrapping & Annotation Hassan Sayyadi sayyadi@ce.sharif.edu Semantic web research laboratory Computer department Sharif university of.

Semantic web Bootstrapping & Annotation

Hassan Sayyadi

sayyadi@ce.sharif.edu

Semantic web research laboratory

Computer department

Sharif university of technology

Outline

• What is annotation?

• Why use annotation?

• Crawler

• Annotation model

• Annotation methods

• Our Implementation

What is annotation?

• People make notes to themselves in order to preserve ideas that arise during a variety of activities

• The purpose of these notes is often to summarize, criticize, or emphasize specific phrases or events

• Semantic annotation tags instance data in a document and maps it to the corresponding ontology classes.

Why use annotation?

• Having the world's knowledge at one's fingertips seems possible.

• The Internet is the platform for information.

• Unfortunately most of the information is provided in an unstructured and non-standardized form.

Why use annotation? (continued)

Crawler

• A crawler is a program that traverses the Internet by following hyperlinks from one page to the next.
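
The traversal can be sketched as a breadth-first walk over a link graph. This is only an illustration: the class name LinkGraphCrawler and the in-memory map of links are assumptions that stand in for downloading real pages and extracting their anchors.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Breadth-first traversal of a link graph (hypothetical stand-in for fetching pages). */
class LinkGraphCrawler {

    /**
     * Visits every page reachable from the seed by following links.
     * 'links' maps a page URL to the URLs it links to; a real crawler
     * would download each page and extract its anchors instead.
     */
    static List<String> crawl(String seed, Map<String, List<String>> links) {
        Set<String> visited = new LinkedHashSet<>();  // keeps visit order, prevents revisits
        Deque<String> frontier = new ArrayDeque<>();  // pages discovered but not yet visited
        frontier.add(seed);
        while (!frontier.isEmpty()) {
            String page = frontier.poll();
            if (!visited.add(page)) {
                continue;                              // already crawled
            }
            for (String target : links.getOrDefault(page, List.of())) {
                if (!visited.contains(target)) {
                    frontier.add(target);
                }
            }
        }
        return new ArrayList<>(visited);
    }
}
```

The visited set is what keeps the crawler from looping when pages link back to each other.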

Focused crawler

• Not all the Internet knowledge is required for every query.

• This assumption seems reasonable because most people work on a restricted domain and do not need the knowledge of the whole Internet

• Searching the whole Internet in this case is very inefficient and expensive.

• Free texts in the Internet contain various information in diverse domains.

Focused crawler (continued)

• The focus can be achieved by examining keywords

• Problems:
  – "Understanding" the semantics of the document
  – Focusing too narrowly on one topic

• Another way to focus is to use the Internet's connectivity (link) structure
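
A keyword-based focus test might look like the sketch below. The scoring rule (fraction of domain keywords found in the page) and the threshold parameter are assumptions chosen for illustration; the slides do not specify either.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Keyword-based relevance test for a focused crawler (illustrative only). */
class KeywordFocus {

    /**
     * Fraction of the domain keywords that occur as words in the page text.
     * Keywords are expected in lower case; the page text is lower-cased here.
     */
    static double relevance(String pageText, Set<String> keywords) {
        if (keywords.isEmpty()) {
            return 0.0;
        }
        Set<String> words = new HashSet<>(
                Arrays.asList(pageText.toLowerCase().split("\\W+")));
        long hits = keywords.stream().filter(words::contains).count();
        return (double) hits / keywords.size();
    }

    /** The threshold is an assumed tuning parameter, not a value from the slides. */
    static boolean shouldFollow(String pageText, Set<String> keywords, double threshold) {
        return relevance(pageText, keywords) >= threshold;
    }
}
```

A test like this only matches surface words, which is the first problem listed above: it does not understand what the document means.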

Annotation models

• Mark in web page

• Example:
  – SUT is one of the largest engineering schools in the Islamic Republic of Iran
  – <university>SUT</university> is one of the largest engineering schools in the <country>Islamic Republic of Iran</country>
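
Marking entities inside the page text can be sketched as follows. The lookup table from mention to class is a hypothetical, hand-made input; the slides do not say how mentions are recognised.

```java
import java.util.Map;

/** Marks known entity mentions in a sentence with their ontology class (illustrative). */
class InlineAnnotator {

    /**
     * Wraps each known mention in a tag named after its class.
     * 'entityClasses' is an assumed lookup table (mention -> class name).
     */
    static String annotate(String sentence, Map<String, String> entityClasses) {
        String result = sentence;
        for (Map.Entry<String, String> entry : entityClasses.entrySet()) {
            String mention = entry.getKey();
            String tag = entry.getValue();
            // Plain substring replacement: no tokenisation or disambiguation.
            result = result.replace(mention, "<" + tag + ">" + mention + "</" + tag + ">");
        }
        return result;
    }
}
```

With the entries SUT -> university and Islamic Republic of Iran -> country, the example sentence above comes out with both mentions tagged.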

Annotation models (continued)

• Generate RDF

• Example:
  – SUT is one of the largest engineering schools in the Islamic Republic of Iran
  – <rdf:Description rdf:about="http://sharif.edu/#SUT">
      <rdf:type>university</rdf:type>
      <SHARIF:be_in rdf:resource="http://sharif.edu/#Islamic+Republic+of+Iran"/>
    </rdf:Description>
    <rdf:Description rdf:about="http://sharif.edu/#Islamic+Republic+of+Iran">
      <rdf:type>Country</rdf:type>
    </rdf:Description>
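
Descriptions shaped like the ones in this example could be produced with plain string building, as in the sketch below. The class and method names are hypothetical, and the code does no XML escaping. It emits fragments only: a complete RDF/XML document also needs an rdf:RDF root element with namespace declarations.

```java
/** Builds rdf:Description fragments shaped like the slide's example (illustrative). */
class RdfDescriptionWriter {

    // Base URI taken from the slide's example.
    static final String BASE = "http://sharif.edu/#";

    /** Mirrors the slide's URIs, where spaces in a name appear as '+'. */
    static String uri(String name) {
        return BASE + name.replace(' ', '+');
    }

    /** Describes a resource by its type only. */
    static String describe(String name, String type) {
        return "<rdf:Description rdf:about=\"" + uri(name) + "\">\n"
             + "  <rdf:type>" + type + "</rdf:type>\n"
             + "</rdf:Description>";
    }

    /** Describes a resource by its type and one property pointing at another resource. */
    static String describe(String name, String type, String property, String target) {
        return "<rdf:Description rdf:about=\"" + uri(name) + "\">\n"
             + "  <rdf:type>" + type + "</rdf:type>\n"
             + "  <" + property + " rdf:resource=\"" + uri(target) + "\"/>\n"
             + "</rdf:Description>";
    }
}
```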

Annotation methods

• Manually

• Semi-automatically

• Automatically

Automatic Annotation

• The fully automatic creation of semantic annotations is an unsolved problem.

• Automatic semantic annotation of the natural-language sentences in web pages is a daunting task, so we are often forced to annotate manually or semi-automatically using handwritten rules

Manual Annotation

• Manual annotation is more easily accomplished today, using authoring tools, which provide an integrated environment for simultaneously authoring and annotating text.

• However, the use of human annotators is often fraught with errors due to factors such as annotator familiarity with the domain, amount of training, personal motivation and complex schemas

• Manual annotation is also an expensive process

Semi-automatic Annotation

• To overcome the annotation acquisition bottleneck, semi-automatic annotation of documents has been proposed.

Semi-automatic annotation

• Assumptions:
  – vocabulary set is limited
  – word usage has patterns
  – semantic ambiguities are rare
  – terms and jargon of the domain appear frequently

Semantic Annotation Platform (SAP)

Multistrategy SAPs

• Multistrategy SAPs are able to combine methods from both pattern-based and machine learning-based systems.

• No SAP currently implements the multistrategy approach for semantic annotation, although it has been implemented in systems for ontology extraction (such as On-To-Knowledge)

Semi-automatic annotation (continued)

• Example
  – I go to Shanghai

• Link structure is more like an RDF graph

The accuracy of concepts and relations for different algorithms

Automatic annotation

Source preprocessing

• Document Object Model (DOM)

• Text Model

• Layout Model

• NLP Model

Information Identification

• Operators
  – perform extraction actions on document access models
  – Retrieval, Check, Execute

• Strategies
  – build operator sequences according to user time and quality requirements

• Source Description
  – build operator sequences according to user time and quality requirements

Ontology population

• The final stage of the overall process is to decide which hypothesis represents the extracted information to insert into the ontology

• The module simulates insertions and calculates the cost according to the number of new instance creations, instance modifications or inconsistencies found
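
The cost calculation can be sketched as a weighted sum over the three counts named above. The weights, the class names, and the rule of picking the cheapest hypothesis are all assumptions made for illustration; the slides only say that a cost is computed from these counts.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Scores simulated insertions of candidate hypotheses (illustrative only). */
class PopulationCost {

    /** Outcome of simulating the insertion of one hypothesis into the ontology. */
    static final class Simulation {
        final String hypothesis;
        final int created;          // new instances the insertion would create
        final int modified;         // existing instances it would modify
        final int inconsistencies;  // inconsistencies it would introduce

        Simulation(String hypothesis, int created, int modified, int inconsistencies) {
            this.hypothesis = hypothesis;
            this.created = created;
            this.modified = modified;
            this.inconsistencies = inconsistencies;
        }
    }

    // Assumed weights: inconsistencies are penalised far more than ordinary edits.
    static final int CREATE_WEIGHT = 1;
    static final int MODIFY_WEIGHT = 2;
    static final int INCONSISTENCY_WEIGHT = 10;

    static int cost(Simulation s) {
        return CREATE_WEIGHT * s.created
             + MODIFY_WEIGHT * s.modified
             + INCONSISTENCY_WEIGHT * s.inconsistencies;
    }

    /** Assumed selection rule: keep the hypothesis whose simulated insertion is cheapest. */
    static Optional<Simulation> cheapest(List<Simulation> simulations) {
        return simulations.stream().min(Comparator.comparingInt(PopulationCost::cost));
    }
}
```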

Our implementation

• Crawler:
  – Crawl all links that contain:
    • sharif.ir
    • sharif.edu
    • sharif.ac.ir
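
The link filter described here amounts to a substring check, sketched below. The class name is hypothetical, and the case-insensitive comparison is an added assumption.

```java
import java.util.List;

/** Decides whether a link belongs to the crawled domains (illustrative only). */
class SharifLinkFilter {

    // Substrings listed on the slide.
    static final List<String> ALLOWED = List.of("sharif.ir", "sharif.edu", "sharif.ac.ir");

    /** True when the link contains any of the allowed substrings. */
    static boolean shouldCrawl(String link) {
        String lower = link.toLowerCase();
        return ALLOWED.stream().anyMatch(lower::contains);
    }
}
```

A substring check also accepts links that merely mention one of these names in a path or query string; comparing against the host part of the URL would be stricter.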

Our implementation

• Source pre-processing
  – HTML to text
    • text = text.replaceAll("\n", "*_newline_*");
    • text = text.replaceAll("\\<script.*?\\</script\\>", "");
    • text = text.replaceAll("\\<style.*?\\</style\\s*\\>", "");
    • text = text.replaceAll("<\\!--.*?--\\>", "");
    • text = text.replaceAll("\\<.*?\\>", "");
    • text = text.replaceAll("&nbsp;", " ");
    • text = text.replaceAll("&lt;", "<");
    • …
    • text = text.replaceAll("\\*_newline_\\*", "\n");
  – Additional
    • text = text.replaceAll("\n[\n ]*\n", ".");
    • text = text.replaceAll(",", " and ");

Our implementation

• Information extraction:
  – JMontyLingua
    • SUT is one of the largest engineering schools in the Islamic Republic of Iran
    • ("be" "SUT" "one" "of largest engineering school" "in Islamic Republic" "of Iran")

Our implementation

• JMontyLingua problem:
  – SUT has computer, mechanic and electric engineering departments
  – Output for the raw sentence (the comma is lost): ("have" "SUT" "computer mechanic and electric engineering departments")
  – Output after replacing "," with " and ": ("have" "SUT" "computer and mechanic and electric engineering departments")

Our implementation

• ("be" "SUT" "university" "in Islamic Republic" "of Iran")

• => ("be" "SUT" "university" "in Islamic Republic of Iran")

• => SUT,be,university & SUT,be_in,Islamic Republic of Iran

• <rdf:Description rdf:about="http://sharif.edu/#SUT">
    <rdf:type>university</rdf:type>
    <SHARIF:be_in rdf:resource="http://sharif.edu/#Islamic+Republic+of+Iran"/>
  </rdf:Description>
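
The first two steps above (merging the "of" phrase, then splitting the tuple into triples) can be sketched as follows. The rules are inferred from this single example and are assumptions: an argument starting with "of " is joined to the one before it, and an argument starting with "in " yields a predicate with an _in suffix. The input is expected to hold at least the verb and the subject.

```java
import java.util.ArrayList;
import java.util.List;

/** Turns a (verb, subject, arguments...) tuple into simple triples (illustrative only). */
class TupleToTriples {

    /** Joins an argument that starts with "of " onto the argument before it. */
    static List<String> mergeOfPhrases(List<String> tuple) {
        List<String> merged = new ArrayList<>();
        for (int i = 0; i < tuple.size(); i++) {
            String field = tuple.get(i);
            // Fields 0 and 1 are the verb and the subject; field 2 has no earlier argument.
            if (i > 2 && field.startsWith("of ")) {
                int last = merged.size() - 1;
                merged.set(last, merged.get(last) + " " + field);
            } else {
                merged.add(field);
            }
        }
        return merged;
    }

    /** Emits "subject,predicate,object" strings; an "in X" argument becomes verb_in. */
    static List<String> toTriples(List<String> tuple) {
        List<String> fields = mergeOfPhrases(tuple);
        String verb = fields.get(0);
        String subject = fields.get(1);
        List<String> triples = new ArrayList<>();
        for (String argument : fields.subList(2, fields.size())) {
            if (argument.startsWith("in ")) {
                triples.add(subject + "," + verb + "_in," + argument.substring(3));
            } else {
                triples.add(subject + "," + verb + "," + argument);
            }
        }
        return triples;
    }
}
```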

Any questions?