HOMME: Ontological Explorer

HOMME: Hierarchical‐Ontological Mind Map Explorer

Yi‐Shin Chen, Pei‐Ling Hsu, Hsiao‐Shan Hsieh, Li‐Chin Lee, Carlos Argueta

Institute of Information Systems and Applications

National Tsing Hua University

IDEA Lab

Outline

• Introduction to IDEA Lab

• Introduction to HOMME

• Framework

• Experimental Evaluation

• Conclusions and Future Work

Intelligent Data Engineering and Applications (IDEA) Laboratory

Research Focus

Mining

Optimization

Storage

Corresponding Research Issues

HCI, Network, Database, Web, Pattern Recognition

CURRENT PROJECTS

Current Projects

• GoogolPlex
– Web information integration and retrieval
– Topic expansion and integration
– Group “answers” based on topic and sentiment

GoogolPlex Project (Cont.)

• Apply cloud computing to speed up the analysis of large‐scale, heterogeneous data (googolplex size)

• Related research issues
– Automatic ontology construction from heterogeneous data

• Related research issues
– Sentiment analysis for short articles (e.g., micro‐blogs, social network messages) in multi‐language environments

I hate it when it’s rainy and cold!

Loved today’s trip.

I can’t believe this happened!

• Related research issues
– Keyword extraction from short articles (e.g., micro‐blogs, social network messages) in multi‐language environments

…task of algorithm analysis consists…

…in a Markov Chain is…

…when sorting is…

• Related research issues
– Semantic analysis for different purposes, such as geo‐tagging
– TweoLocator: A Non‐Intrusive Geographical Locator System for Twitter

• Identify the location of a particular Twitter user at a given time
– Using exclusively the content of his/her tweets

HOMME Conceptual Finder Demo

HOMME Ontology Builder Demo (cont’d)

TweoLocator: Framework

TweoLocator: Experimental Results

[Charts: accuracy by country, with average accuracy (Avg Acc); axes from 0% to 100%]

Tweets

                          US    GB    CA    AU    IN  OTHERS  Overall
Correct tweets           463   288   353   169   125      23    65.6%
Unrelated Tweets         110    55   114    53    41      18    18.1%
Disagreed & Reallocated  142   175    22     0    14       0    16.3%
Accuracy                 65%   56%   72%   76%   69%     56%      66%

Profiles

               US    GB    PH    CA  Others    AU    SE
Correct       240    88    39    28      26    22     9
Wrong          24     3     0     2       0     0     0
N/A            44    17     3     6       1     3     1
Disagreement   16     0     0     1       0     0     0
Accuracy      74%   81%   93%   76%     96%   88%   90%
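The accuracy rows can be checked against the counts. The sketch below assumes that accuracy means the number of correct items divided by all items in a column; the slides do not state the formula, but this definition reproduces the reported US figures. The dictionary keys are illustrative names, not terms from the original system.

```python
# Counts copied from the US column of the Tweets and Profiles results.
tweets_us = {"correct": 463, "unrelated": 110, "disagreed_reallocated": 142}
profiles_us = {"correct": 240, "wrong": 24, "n_a": 44, "disagreement": 16}


def accuracy(counts):
    """Share of correct items among all items in one column (assumed definition)."""
    return counts["correct"] / sum(counts.values())


print(f"{accuracy(tweets_us):.0%}")    # 65%
print(f"{accuracy(profiles_us):.0%}")  # 74%
```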

Current Projects

• GoogolPlex

• iConduct
– Interactive conducting system

iConduct Project

• Analyze the intentions from data streams

• Instantly aggregate user intentions and multimedia data

Current Projects

• GoogolPlex

• iConduct
– Interactive conducting systems

• MyMining
– Market analysis

MyMining Project

• Mining market information from
– Stock data (numerical data)
– News, blogs, and micro blogs (text data)

• Find the relationship between the stock market and social networking sites

• In this research, our goal is to build a system that can help us to:
– Automatically integrate stock news and identify events.
– Evaluate each event's influence at the industry level and use that information to verify price movements.

MyMining Project

Methodology

Off‐line

On‐line

Current Performance

• Accuracy of four methods:

Methods            Average Accuracy
Pheromone          0.5784574
Adjust regression  0.5323214
Regression         0.5134457
Blind test         0.3045479

PEOPLE IN IDEA LAB

People

• Current students:
– Domestic students: 7
– International students: 8

Nationality

Saint Lucia
Myanmar: 7%
Taiwan: 46%
Honduras: 20%
El Salvador
Malaysia: 6%
Indonesia: 7%

INTRODUCTION TO HOMME

Humans Generate Knowledge

• Collecting all human knowledge has always been a recurring goal

Internet Era

• The WWW has made collecting all human knowledge possible.

Data Flood

• Redundant

• Scattered

• Mutually complementary

Integration

• It is crucial to integrate heterogeneous data sources.
– Easier access
– Summarization
– Less redundancy

Previous Work (1)

• Web data integration and organization based on expert knowledge or collaboratively‐created (crowd wisdom) data
– Manually

– Semi‐automatic

– Automatic

• Wikipedia: the most successful collaboratively‐created collection of human knowledge on the web
  • Unstructured articles
  • Structured information (infoboxes)

• Other works used Wikipedia structured data to integrate web data.
– YAGO:
  • Wikipedia Categories + WordNet
  • http://www.mpi-inf.mpg.de/yago-naga/yago/
– DBpedia:
  • Wikipedia infoboxes
  • http://dbpedia.org/About

• Other sources of crowd wisdom studied to integrate and organize web data
– Social annotations
– Search logs

• Two approaches to integrate web data:
– External resources to extract relationships
  • Relatively small coverage
– Bottom‐up approach to web data integration
  • Difficulty in labeling the semantic relationships

• Relies on multiple heterogeneous “crowd wisdom” data sources.

• Bottom‐up extraction of semantic relationships present in the web data.

• Presents a mind‐map‐like representation of knowledge for easy navigation

FRAMEWORK

Framework

Data Sources

• Multiple heterogeneous data sources
– Search logs
– Social annotations: Delicious tags
– Web directory: Open Directory Project (ODP)

Framework

Resource Integrator

• Normalize and decompose heterogeneous data into smaller elements with common characteristics.

• We use the notion of word sequences and concept sequences

Word Sequences

• Every query in the search log is considered a word sequence

• Every URL in the search log can be decomposed into a word sequence
– www.mtv.com/music/artist/bowlingforsoupartist.jhtml
  <mtv, music, artist, bowling, for, soup, artist>

• All the Delicious tags assigned to a URL are a word sequence

• The ODP title assigned to a URL is a word sequence.

• The ODP category assigned to a URL is turned into a word sequence.
– E.g. air/travel/agent becomes <air, travel, agent>
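A minimal sketch of how URLs and ODP categories might be decomposed into word sequences. The vocabulary `VOCAB`, the stop list `SKIP`, and all function names are hypothetical; the slides do not say how run-together tokens such as "bowlingforsoupartist" are split, so a simple dictionary-based longest-match segmentation is assumed here.

```python
import re

# Hypothetical vocabulary used to split run-together URL tokens; a real
# system would need a much larger word list.
VOCAB = {"bowling", "for", "soup", "artist", "music", "mtv", "air", "travel", "agent"}
# URL parts assumed to carry no topical meaning.
SKIP = {"www", "com", "jhtml", "html", "http", "https"}


def segment(token, vocab=VOCAB):
    """Split a token into vocabulary words, longest match first; None if impossible."""
    if not token:
        return []
    for end in range(len(token), 0, -1):
        if token[:end] in vocab:
            rest = segment(token[end:], vocab)
            if rest is not None:
                return [token[:end]] + rest
    return None


def url_to_word_sequence(url):
    """Turn a URL into a word sequence, keeping unsplittable tokens whole."""
    words = []
    for token in re.split(r"[^a-z0-9]+", url.lower()):
        if not token or token in SKIP:
            continue
        words.extend(segment(token) or [token])
    return words


def category_to_word_sequence(category):
    """Turn an ODP category path into a word sequence."""
    return [part for part in category.lower().split("/") if part]


print(url_to_word_sequence("www.mtv.com/music/artist/bowlingforsoupartist.jhtml"))
# ['mtv', 'music', 'artist', 'bowling', 'for', 'soup', 'artist']
print(category_to_word_sequence("air/travel/agent"))
# ['air', 'travel', 'agent']
```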

Concept Sequences

• A sequence of words can represent a concept

Framework

Term Extractor

• For each frequent word sequence, the Term Extractor tries to split it into concepts.
– E.g. Query: “star wars light saber”

Word sequence: <star, wars, light, saber>

Concept sequences: <<star, wars>, <light, saber>>
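One way to picture the split is a greedy longest match against a set of already known concepts. This is only a sketch under that assumption: the slides do not describe the actual splitting rule, and `KNOWN_CONCEPTS` is a hypothetical stand-in for whatever concept list the system maintains.

```python
# Hypothetical set of known multi-word concepts, e.g. mined from frequent queries.
KNOWN_CONCEPTS = {("star", "wars"), ("light", "saber")}


def split_into_concepts(words, known=KNOWN_CONCEPTS):
    """Greedy left-to-right split; unknown words become single-word concepts."""
    concepts, i = [], 0
    max_len = max((len(c) for c in known), default=1)
    while i < len(words):
        # Try the longest candidate first and fall back to a single word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if n == 1 or candidate in known:
                concepts.append(candidate)
                i += n
                break
    return concepts


print(split_into_concepts("star wars light saber".split()))
# [('star', 'wars'), ('light', 'saber')]
```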

Framework

Term Mapper

• Term Mapper uses the output of Term Extractor to build a feature matrix.

1. Classify concepts by ODP category.

2. Use the frequency of tags assigned to queries as features.
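A feature matrix of tag frequencies could look like the sketch below. The input data, the way tags reach a query, and the function name are all assumptions made for illustration; the ODP classification step is not modelled.

```python
from collections import Counter

# Hypothetical input: tags attached to each query (for example via the
# URLs clicked for that query).
query_tags = {
    "star wars": ["movie", "scifi", "movie"],
    "light saber": ["scifi", "toy"],
}


def build_feature_matrix(query_tags):
    """Return (queries, tags, matrix) where matrix[i][j] counts tag j for query i."""
    queries = sorted(query_tags)
    tags = sorted({t for ts in query_tags.values() for t in ts})
    matrix = []
    for q in queries:
        counts = Counter(query_tags[q])
        matrix.append([counts[t] for t in tags])
    return queries, tags, matrix


queries, tags, matrix = build_feature_matrix(query_tags)
print(tags)    # ['movie', 'scifi', 'toy']
print(matrix)  # [[0, 1, 1], [2, 1, 0]]
```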

Framework

Relationship Finder

• Input data from Term Extractor: word sequences

• Goal of Relationship Finder:
– Find important semantic relationships between word sequences

• Challenges:
– To detect concept candidates in word sequences
– To gather correlated concept candidates
– To name semantic relationships between concept candidates

• Solutions:
– Rules for detecting concept candidates from word sequences
  • Mapped with existing concepts
  • Mapped with dictionaries
  • Crowd wisdom
    – Frequent queries
    – ODP titles
  • Word sequences containing “of”
– Considering the contexts among word sequences
– Considering the meanings of relationships

• Hierarchical Relationships (common relationships in ontologies)
– Has‐Subclass (top down)
– Is‐A (bottom up, from instance to class)

• Synonymous Relationships (common relationships in ontologies)
– Is‐Equal‐To
– Has‐Meaning

• Other relationships
– Has‐Data‐About
– Has‐Website

Has‐Subclass Relationship Finder

• Hierarchical relationship, top down

• Utilizing ODP categories

• Mapping with crowd wisdom: frequent queries

• For instance
– Query: “travel agent phone”
– ODP category: air/travel/agent
– Output: travel has‐Subclass travel agent
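The example above can be reproduced with a simple rule: emit a relationship when two adjacent levels of the ODP category also appear side by side in the query. This matching rule and the function name are assumptions chosen to fit the single example on the slide, not the documented method.

```python
def has_subclass(query, odp_category):
    """Emit (parent, 'has-Subclass', child) for adjacent category levels that
    also occur side by side in the query. The matching rule is an assumption."""
    words = query.lower().split()
    levels = odp_category.lower().split("/")
    bigrams = set(zip(words, words[1:]))
    return [
        (parent, "has-Subclass", f"{parent} {child}")
        for parent, child in zip(levels, levels[1:])
        if (parent, child) in bigrams
    ]


print(has_subclass("travel agent phone", "air/travel/agent"))
# [('travel', 'has-Subclass', 'travel agent')]
```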

Is‐A Relationship Finder

• Hierarchical relationship, bottom up

• Word sequences with crowd wisdom
– Queries, ODP titles

• Hierarchies among word sequences
– Word sequences with “of”
– Additional words for ambiguous words

• For instance
– Query: “apple company”
– Ambiguous word: apple
– Additional words: company
– Output: apple company Is‐A company
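A minimal sketch of the "additional words for ambiguous words" case: when a query starts with a word known to be ambiguous, the remaining words are taken as the class. The list `AMBIGUOUS` and the function name are hypothetical, and the rule covers only the pattern shown in the example.

```python
AMBIGUOUS = {"apple"}  # hypothetical list of ambiguous words


def is_a(query, ambiguous=AMBIGUOUS):
    """Return (instance, 'Is-A', class) if the query disambiguates an ambiguous word."""
    words = query.lower().split()
    if len(words) >= 2 and words[0] in ambiguous:
        return (" ".join(words), "Is-A", " ".join(words[1:]))
    return None


print(is_a("apple company"))  # ('apple company', 'Is-A', 'company')
```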

• Synonymous Relationships: referring to the same concepts
– Is‐Equal‐To
– Has‐Meaning

Synonymous Relationship Finder (1)

• Many word sequences refer to the same concepts.

• Is‐Equal‐To
– <cartoonnetwork> and <cartoon, network>

• Has‐Meaning
– <ae>, <american, eagle>, and <american, eagle, outfitter>

• Finds distinct queries and ODP data referring to the same concepts.

• Steps:
1. Groups queries based on navigational intention
– Intention inferred from clicked URLs
– Groups the navigational queries based on the clicked URL
2. ODP data is added to the groups based on their referring URLs.

Synonymous Relationship Finder (2)

• For instance:
– Query: “american eagle”
– Clicked URL: www.ae.com
– ODP title: “american eagle outfitter”
– Output:
  • “ae” has‐Meaning “american eagle”
  • “american eagle” has‐Meaning “american eagle outfitter”
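The two steps (group navigational queries by clicked URL, then add ODP titles for the same URL) can be sketched as follows. Taking the site name from the host and chaining names from shortest to longest are assumptions that happen to reproduce the example output; the function names are hypothetical.

```python
from collections import defaultdict


def group_by_url(clicks, odp_titles):
    """clicks: (query, clicked_url) pairs; odp_titles: url -> ODP title."""
    groups = defaultdict(set)
    for query, url in clicks:
        groups[url].add(query.lower())
    for url, title in odp_titles.items():
        if url in groups:
            groups[url].add(title.lower())
    return groups


def has_meaning(groups):
    """Chain the names in each group from shortest to longest (assumed ordering)."""
    relations = []
    for url, names in sorted(groups.items()):
        site = url.split(".")[-2]  # "www.ae.com" -> "ae"; assumes host.domain.tld
        chain = [site] + sorted(names - {site}, key=lambda n: (len(n), n))
        relations += [(a, "has-Meaning", b) for a, b in zip(chain, chain[1:])]
    return relations


clicks = [("american eagle", "www.ae.com")]
odp_titles = {"www.ae.com": "american eagle outfitter"}
relations = has_meaning(group_by_url(clicks, odp_titles))
print(relations)
# [('ae', 'has-Meaning', 'american eagle'),
#  ('american eagle', 'has-Meaning', 'american eagle outfitter')]
```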


Has‐Data‐About Relationship Finder

• Some terms in word sequences denote concepts present in a web site.

• Finds frequent matches between query terms and parts of clicked URLs.

• For instance:
– Query: “bowling for soup”
– Clicked URL: www.mtv.com/music/artist/bowlingforsoupartist.jhtml
– Output:
  • “mtv” has‐Data‐About “music”
  • “mtv” has‐Data‐About “artist”
  • “mtv” has‐Data‐About “bowling for soup”
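A sketch of the matching for a single query and URL pair. It assumes the site name is the second-to-last part of the host and that a path segment containing the query (with spaces removed) stands for the query itself. The frequency threshold implied by "frequent" is not modelled, and the function name is hypothetical.

```python
def has_data_about(query, url):
    """Relate the site name to each URL path part, mapping a part that
    contains the space-free query back to the query itself."""
    host, *path = url.lower().split("/")
    site = host.split(".")[-2]  # "www.mtv.com" -> "mtv"; assumes host.domain.tld
    squeezed = query.lower().replace(" ", "")
    relations = []
    for part in path:
        target = query.lower() if squeezed in part else part
        relations.append((site, "has-Data-About", target))
    return relations


for relation in has_data_about(
    "bowling for soup", "www.mtv.com/music/artist/bowlingforsoupartist.jhtml"
):
    print(relation)
# ('mtv', 'has-Data-About', 'music')
# ('mtv', 'has-Data-About', 'artist')
# ('mtv', 'has-Data-About', 'bowling for soup')
```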

Has‐Website Relationship Finder

• Uses word sequences from queries, URLs, and ODP titles

• For instance:
– Query: “online dictionary”
– Clicked URL: www.m-w.com
– ODP title: “merriam‐webster online”
– Output:
  • “online dictionary” has‐Website www.m-w.com
  • “merriam‐webster online” has‐Website www.m-w.com

Iterative Process

• The extracted relationships are used to improve the term extraction process.

• Constant interaction between the Term Extractor and the Relationship Finder.

Framework

Concept Cluster Finder

• Uses the feature matrix generated by Term Mapper.

• Uses the k‐means algorithm to cluster queries.

• Each cluster is automatically labeled based on its cluster representative.
– Features with highest scores
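A minimal k-means sketch with the labeling step, written in plain Python so it runs without extra libraries. The feature names, the data points, the seeding with the first k points, and labeling by the single highest centroid value are all assumptions for illustration; the slides do not give these details.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on lists of numbers; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignment = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def label_cluster(centroid, features):
    """Label a cluster with its highest-scoring feature."""
    return features[max(range(len(features)), key=lambda j: centroid[j])]


features = ["movie", "scifi", "travel"]                 # hypothetical feature names
points = [[2, 1, 0], [0, 0, 3], [3, 1, 0], [0, 1, 2]]   # hypothetical query rows
assignment, centroids = kmeans(points, k=2)
labels = [label_cluster(c, features) for c in centroids]
print(assignment)  # [0, 1, 0, 1]
print(labels)      # ['movie', 'travel']
```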

EXPERIMENTAL EVALUATION

Setup

• Three data sources:
– Search log by MS Live Labs from US users in May 2006
  • 1,512,556 navigational queries extracted
– Open Directory Project (ODP)
– Delicious tags crawled from February to May 2010

• Implementation:
– Prototype front end: PHP + JavaScript InfoVis Toolkit

Demonstration

Ontology Builder

Demonstration

Concept Linker (1)

Concept Linker (2)

Experimental Results – Concept Linker

• Our work was compared to other works:
– Single‐link Agglomerative Hierarchical Clustering (AHC)
– DBSCAN

• We want to evaluate the ability to discover query clusters.

• Ground truth: 50 manually labeled queries from each category.

HOMME and AHC

HOMME and DBSCAN

Experimental Results ‐ Relationship Finder

• 11 volunteers checked a sample of output relationships

• Each checked 100 tuples for each relationship type.
– Total 400 output relationships
– All checked the same set

Relationship Finder Evaluated by Human Expert

CONCLUSIONS AND FUTURE WORK

Conclusions

• The proposed approach uses heterogeneous sources to
– Effectively cluster queries related to a concept.
– Extract relationships between concepts automatically.

• The relationships recognized by HOMME are also recognized by humans most of the time.

Future Work

• Improve coverage for Relationship Finder

• Add more relationship types

• Improve execution times for the offline part
