Social Fabric of Semantics - SemTech 2010

Post on 18-Dec-2014

Vocabulary construction is critical to the success of semantic technologies. Can we learn from communities where practical vocabularies have emerged?

The Social Fabric of Semantics

Jamie Taylor, Ph.D.

Explicit Semantics in Surprising Places

microformats

HTML5 MicroData

Open Graph Protocol

RDFa

We have overlooked the human “stack”

The Crisis of Vocabulary

Much formal analysis of knowledge representation

Little guidance on what actually works

[Diagram: graph edges labeled education, nationality, contained-by, member-of, event, albums, label, contains]

The arrangement of entities in a graph is not predetermined by a higher being

[Diagram: edge labels contains, contained-by, event, member-of, nationality, education, albums]

Vocabulary is a social process

Semantics: To communicate meaning, resulting in an action

Or at least so Blue Guy can write code that responds to the graph in a way consistent with Red Guy's expectations

Vocabulary

"All the types of things you can say about something"

[Diagram: graph with nodes Alison Hewson, EDUN, Mount Temple Comprehensive School, May 10, 1960, U2, Million Dollar Hotel, End of Violence, Elevation Partners, Show 8, Dublin; links labeled spouse, date of birth, founder, performer, education, founder, producer, performer, born in, member of]

Semantics are in the Links
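As a rough illustration of the point that meaning lives in the links, the graph on this slide can be written as (subject, link, object) triples. This is only a sketch: the slide does not name the central node, so `SUBJECT` is a hypothetical placeholder, and the pairing of each link with a particular node is an assumption, since the transcript lists the labels separately.

```python
from collections import defaultdict

# Hypothetical placeholder: the slide does not name the central node.
SUBJECT = "person"

# (subject, link, object) triples. Node and link labels come from the slide;
# which link points at which node is an assumption.
TRIPLES = [
    (SUBJECT, "spouse", "Alison Hewson"),
    (SUBJECT, "date of birth", "May 10, 1960"),
    (SUBJECT, "born in", "Dublin"),
    (SUBJECT, "education", "Mount Temple Comprehensive School"),
    (SUBJECT, "member of", "U2"),
    (SUBJECT, "founder", "EDUN"),
    (SUBJECT, "founder", "Elevation Partners"),
]


def objects_by_link(triples):
    """Group objects by link label; the link label is what carries the meaning."""
    index = defaultdict(list)
    for _subject, link, obj in triples:
        index[link].append(obj)
    return dict(index)


index = objects_by_link(TRIPLES)
print(index["founder"])
```

Without the link labels the same nodes would just be a bag of strings; "Dublin" only means something once it hangs off "born in".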

Do you understand the words that are coming out of my mouth?

The Twitter Vocabulary

@

#

Short URLs

Pivot on @

Pivot on Short URL

Pivot on #
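A minimal sketch of what "pivoting" on these three markers could look like in code. The regular expressions are simplifications and the sample tweets are invented for illustration; nothing here reflects how Twitter itself parses or indexes tweets.

```python
import re
from collections import defaultdict

# Simplified patterns (assumptions, not Twitter's own rules).
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")


def extract_vocabulary(text):
    """Return the @mentions, #tags and URLs found in one tweet."""
    urls = URL.findall(text)
    # Strip URLs first so a '#fragment' inside a URL is not read as a tag.
    stripped = URL.sub(" ", text)
    return {
        "@": MENTION.findall(stripped),
        "#": HASHTAG.findall(stripped),
        "url": urls,
    }


def pivot(tweets, marker):
    """Index tweet positions by every token of the given marker kind."""
    index = defaultdict(list)
    for position, text in enumerate(tweets):
        for token in extract_vocabulary(text)[marker]:
            # Names and tags are treated as case-insensitive; URLs are kept as-is.
            key = token if marker == "url" else token.lower()
            index[key].append(position)
    return dict(index)


# Invented sample tweets.
tweets = [
    "@alice great talk at #semtech http://example.com/a1",
    "Slides for #SemTech are up http://example.com/a1",
    "@bob see you tomorrow",
]

print(pivot(tweets, "#"))
print(pivot(tweets, "@"))
print(pivot(tweets, "url"))
```

Each marker gives a different way to regroup the same stream: by person, by topic, or by shared resource.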

#

Broadcast: U(n) = n

Telephone: Metcalfe's Law, U(n) = n^2

Group Network Formation: Reed's Law, U(n) = 2^n
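To get a feel for how differently the three utility functions grow, here is a small sketch that tabulates them, taking broadcast as n, Metcalfe's Law as n^2 and Reed's Law as 2^n. The function names and the sample sizes are chosen here for illustration.

```python
def broadcast_value(n):
    """Broadcast network: value grows linearly with the audience."""
    return n


def metcalfe_value(n):
    """Metcalfe's Law: value grows with the number of possible pairs, ~ n^2."""
    return n ** 2


def reed_value(n):
    """Reed's Law: value grows with the number of possible subgroups, ~ 2^n."""
    return 2 ** n


for n in (1, 2, 5, 10, 20):
    print(n, broadcast_value(n), metcalfe_value(n), reed_value(n))
```

Already at n = 20 the group-forming value (1,048,576) dwarfs the pairwise value (400), which is the argument for caring about mechanisms that let groups form.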

Reed's Law

[Chart: Value versus N for the curves N, N^2, and 2^N, labeled Broadcast, Email, and Chatrooms]

[Chart: Value versus N for the curves N, N^2, and 2^N, labeled Tweets, #tags, and Feeds]

#tags are a USER invention!

[Chart: Value versus N for the curves N, N^2, and 2^N, labeled Folksonomy, ???, and Ontology]

Twannotations

Tweets have "type"

Name/Value Structure

What's the vocabulary?

•Anything you want

•Lead by example
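One way to picture a typed name/value annotation, and how a vocabulary might converge through use, is sketched below. The dictionary shape (`type` plus `values`) is a hypothetical stand-in chosen for this example, not the actual Twitter annotation format, and the sample data is invented.

```python
from collections import Counter

# Hypothetical annotation shape: a "type" plus free-form name/value pairs.
annotated_tweets = [
    {"text": "Watching it again",
     "annotations": [{"type": "movie", "values": {"title": "End of Violence"}}]},
    {"text": "Classic",
     "annotations": [{"type": "movie", "values": {"title": "Million Dollar Hotel",
                                                   "rating": "4"}}]},
    {"text": "No annotations here"},
]


def vocabulary_usage(tweets):
    """Count how often each (type, name) pair is used across tweets."""
    usage = Counter()
    for tweet in tweets:
        for annotation in tweet.get("annotations", []):
            for name in annotation["values"]:
                usage[(annotation["type"], name)] += 1
    return usage


usage = vocabulary_usage(annotated_tweets)
print(usage.most_common())
```

If anyone may use any name, counts like these are what reveal which names the community is actually settling on.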

Vocabulary and Visibility

Pros: Feedback, Incentive, Training, Convergence

Cons: Usage for side effects

Lessons from everyday vocabulary

Wikipedia Word Frequency

[Chart: Frequency (0 to 20,000,000) versus Rank (0 to 120). Data from Victor S. Grishchenko]

Zipf’s Law

[Plot by Victor Grishchenko]
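Zipf's Law is usually stated as: the frequency of the word at rank r is roughly proportional to 1/r^s, with s close to 1. The sketch below builds a rank/frequency table from any text and compares it with that idealized curve. The function names are chosen here for illustration, the tokenizer is deliberately crude, and the sample sentence is far too small to show the law properly.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, frequency) rows, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


def zipf_prediction(top_frequency, rank, s=1.0):
    """Idealized Zipf frequency at a given rank, anchored on the top word."""
    return top_frequency / rank ** s


table = rank_frequency("the cat and the dog and the bird")
top = table[0][2]
for rank, word, freq in table:
    print(rank, word, freq, round(zipf_prediction(top, rank), 2))
```

Run over a large corpus, the observed column should track the predicted one closely for the head of the list.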

Zipf’s Explanation

Law of Least Effort:

Use a few common words to communicate main concept

Use a few rare words to disambiguate concepts

Satisficing

535,393 Categories

2k French Films

17 films

Schema Principle #1

Use Types Liberally:

Use a few large, encompassing Types to provide general information

Use several smaller, fine grained Types to provide detailed information
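A small sketch of what using types liberally might mean for one entity: a broad type supplies the general properties, and narrower types add the detail. The type names and the grouping of properties under them are hypothetical, chosen for this example; the property names reuse link labels from the earlier graph slide.

```python
# Hypothetical schema: each type contributes a handful of properties.
SCHEMA = {
    "person": ["date of birth", "born in", "spouse", "education"],  # large, encompassing
    "group member": ["member of"],                                    # small, fine-grained
    "organization founder": ["founder"],                              # small, fine-grained
}


def available_properties(entity_types, schema):
    """Collect the properties an entity gains from each type it carries."""
    return {t: schema[t] for t in sorted(entity_types) if t in schema}


# One entity, several types at once.
entity_types = {"person", "group member", "organization founder"}
properties = available_properties(entity_types, SCHEMA)
for type_name, names in properties.items():
    print(type_name, "->", names)
```

A consumer that only understands "person" still gets the general picture, while one that knows the finer types gets the detail.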

The Freebase Commons

· American football · Anime/Manga · Architecture · Astronomy · Automotive · Aviation · Awards · Baseball · Basketball · Bicycles · Biology · Boats · Broadcast · Business · Celebrities · Chemistry · Comics · Common · Computers · Conferences · Cricket · Data World · Digicams · Education · Engineering · Event · Clothing and Textiles · Fictional Universes · Film · Food & Drink · Freebase · Games · Geology · Government · Hobbies and Interests · Ice Hockey · Influence

· Internet · Language · Law · Library · Location · Martial Arts · Measurement Unit · Media Common · Medicine · Metaweb Types · Meteorology · Military · Music · Olympics · Opera · Organization · People · Geography · Projects · Protected Places · Publishing · Radio · Rail · Religion · Royalty · Soccer · Spaceflight · Sports · Symbols · Tennis · Theater · Time · Transportation · Travel · TV · Video Games · Visual Art

Top-level domains

schema = vocabulary

Ontologies you design will be too complicated because almost all people will use a small subset of it

Ontologies you design will be too simple because there will be a long tail of users who will want to express something you didn’t cover

--Colin Evans (Metaweb)

Solution:

• Provide a core

• Let the community tune the specifics to their needs

What is a Politician?

Schema Principle #2

Avoid Types which "carve out" categories of things

"Original TV Program"

• Is a TV Program

• Isn't an adaptation of a film

• Isn't an adaptation of a book

• Isn't an adaptation of a play

• Wasn't spun off from another TV Program

• Hasn't spun off any other TV Programs

"Original TV Program"

[{
  "name": null,
  "type": "/tv/tv_program",
  "b:type": {
    "id": "/media_common/adaptation",
    "optional": "forbidden"
  },
  "spun_off_from": [{
    "id": null,
    "optional": "forbidden"
  }],
  "spin_offs": [{
    "id": null,
    "optional": "forbidden"
  }]
}]

Show as Two Views

not a MQL query

Principle #2 Corollary

Strive for bright lines between Types

• Let queries and simple types do the work

• Better, easier to maintain data quality
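The idea of letting queries and simple types do the work can be sketched outside MQL as well. Below, "Original TV Program" is never stored as a type; it is derived by a filter over simple facts. The records and show names are invented for illustration; only the type ids `/tv/tv_program` and `/media_common/adaptation` and the property names `spun_off_from` and `spin_offs` come from the query on the earlier slide.

```python
# Invented sample records carrying only simple, bright-line facts.
programs = [
    {"name": "Show A", "types": {"/tv/tv_program"},
     "spun_off_from": [], "spin_offs": []},
    {"name": "Show B", "types": {"/tv/tv_program", "/media_common/adaptation"},
     "spun_off_from": [], "spin_offs": []},
    {"name": "Show C", "types": {"/tv/tv_program"},
     "spun_off_from": ["Show D"], "spin_offs": []},
    {"name": "Show D", "types": {"/tv/tv_program"},
     "spun_off_from": [], "spin_offs": ["Show C"]},
]


def is_original_tv_program(record):
    """Derived category: a TV program that is no adaptation and has no spin-off ties."""
    return (
        "/tv/tv_program" in record["types"]
        and "/media_common/adaptation" not in record["types"]
        and not record["spun_off_from"]
        and not record["spin_offs"]
    )


originals = [p["name"] for p in programs if is_original_tv_program(p)]
print(originals)
```

Because the category is computed, adding a spin-off to Show A later changes the answer automatically, with no "Original TV Program" assertion left behind to go stale.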

What are you sitting on?

Chair

Furniture

Folding Chair

Natural Category

Added Features?

What does one look like?

Eleanor Rosch

HTML5 MicroData

Open Graph Protocol

#

Addendum

Social Network Analysis Resources

Wikipedia

Jon Kleinberg

http://www.cs.cornell.edu/home/kleinber

Twitter

Kwak et al. WWW2010

http://an.kaist.ac.kr/traces/WWW2010.html

Modeling Resources

McGuinness & Noy's Ontologies 101

Attend when possible!

http://ksl.stanford.edu/people/dlm/papers/ontology101

Toward Principles for the Design of Ontologies Used for Knowledge Sharing

http://tomgruber.org/writing/onto-design.htm

Allemang & Hendler

Semantic Web for the Working Ontologist