Netflix presentation final

Semantic Technology Overview Trends, Applications November 29, 2010

description

This presentation was given to Netflix in Los Gatos, CA.

Transcript of Netflix presentation final

Page 1: Netflix presentation   final

04/10/23 1

Semantic Technology Overview

Trends, Applications

November 29, 2010

Page 2: Netflix presentation   final

About Recognos

• Established 1999 (www.recognos.com)
• California S-Corporation – Offices in San Rafael, San Mateo
• In 2000 created Recognos Romania
• Office in Romania situated in Cluj (www.cluj4all.com)
• 70 employees
• Semantic technologies R&D
• Started a meetup: http://www.meetup.com/Cluj-Semantic-WEB/
• Applications in Finance, CRM, Life Sciences, etc.

Page 3: Netflix presentation   final

What is Semantic Technology?

• WEB 3.0?
• Gives meaning through relationships
• Building block – statements
• The statements describe: concepts, logic, restrictions and individuals (instances)
• WWW is for human consumption
• Semantic WEB – for machines
• Relationships: definitions, associations, aggregations and restrictions

Page 4: Netflix presentation   final

World Wide Web vs. Semantic WEB

Page 5: Netflix presentation   final

Major Difficulty

Open World vs Closed World

Anybody can say ANYTHING about ANYTHING!

You don’t know what you don’t know!
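The difference between the two assumptions can be sketched in a few lines of Python. This is only an illustration: the `ex:` identifiers, the two fact sets and the function names are hypothetical and do not come from the presentation.

```python
# Hypothetical mini knowledge base: a set of (subject, predicate, object) statements.
KNOWN = {
    ("ex:Troy", "ex:genre", "ex:War"),
}
# In this sketch a statement counts as false only if its negation was explicitly recorded.
KNOWN_FALSE = {
    ("ex:Troy", "ex:genre", "ex:Documentary"),
}

def closed_world(statement):
    """Closed world: anything that is not stored is assumed to be false."""
    return statement in KNOWN

def open_world(statement):
    """Open world: anything that is not stored is unknown, not false."""
    if statement in KNOWN:
        return True
    if statement in KNOWN_FALSE:
        return False
    return None  # unknown: somebody else may still assert it

query = ("ex:Troy", "ex:genre", "ex:Romance")
print(closed_world(query))  # False
print(open_world(query))    # None
```

A relational database typically answers like `closed_world`; a Semantic WEB application has to be prepared for the third answer.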

Page 6: Netflix presentation   final

Semantic Technology vs. Semantic WEB

• Semantic Technology – “machines” try to understand:
  – Natural Language Text
  – Images
  – Sounds
  – Machine learning
• Semantic WEB Technology – part of the Semantic Technology (semantic search, semantic tagging, microformats, FOAF, web site federation, Linked Open Data)

Page 7: Netflix presentation   final

Semantic WEB Model

Page 8: Netflix presentation   final

How to represent the knowledge

• Gives meaning through relationships
• Everybody understands the same thing
• Machines can understand it
• Eliminates ambiguities through URI (Uniform Resource Identifier) and PURL (Persistent Uniform Resource Locator)
• Needs software that is able to read these and “understand”
• Describes things on the internet using such a universal language

Page 9: Netflix presentation   final

Building Block: RDF

“There is a Person identified by http://www.w3.org/People/EM/contact#me, whose name is Eric Miller, whose email address is [email protected], and whose title is Dr.”

Triples:
(i) http://www.w3.org/People/EM/contact#me, http://www.w3.org/2000/10/swap/pim/contact#fullName, "Eric Miller"
(ii) http://www.w3.org/People/EM/contact#me, http://www.w3.org/2000/10/swap/pim/contact#personalTitle, "Dr."
(iii) http://www.w3.org/People/EM/contact#me, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2000/10/swap/pim/contact#Person
(iv) http://www.w3.org/People/EM/contact#me, http://www.w3.org/2000/10/swap/pim/contact#mailbox, [email protected]
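As a minimal sketch, the statements on this slide can be held as plain (subject, predicate, object) tuples in Python. The helper name `objects` is an assumption made for illustration, and the mailbox statement is left out because the address is not legible in this transcript.

```python
CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ME = "http://www.w3.org/People/EM/contact#me"

# Each statement is one (subject, predicate, object) triple.
triples = [
    (ME, CONTACT + "fullName", "Eric Miller"),
    (ME, CONTACT + "personalTitle", "Dr."),
    (ME, RDF_TYPE, CONTACT + "Person"),
    # The mailbox triple is omitted: its value is elided in the source.
]

def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects(ME, CONTACT + "fullName"))  # ['Eric Miller']
```

An RDF store does essentially this at scale, with indexes over the three positions.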

Page 10: Netflix presentation   final

Ontologies - OWL

http://www.fao.org/countryprofiles/geoinfo.asp?lang=en

Page 11: Netflix presentation   final

Ontologies - OWL

http://www.fao.org/countryprofiles/geoinfo.asp?lang=en

An Ontology is a kind of dictionary that describes information in a certain domain using concepts and relationships. It is often implemented using OWL.

• A Concept is defined as abstract knowledge (examples: Movie, Country, Organization). Concepts are explicitly implemented in the ontology with individuals and classes:
  • An individual is defined as an object perceived from the real world. (The Sound of Music is a Movie and belongs to the musical genre.)
  • A class is defined as a set of individuals sharing common properties. In the geopolitical domain, Ethiopia, Republic of Korea or Italy are individuals of the class country.
• Relationships between concepts are explicitly implemented by:
  • Object properties between individuals of two classes. For example, the has member and is in group properties.
  • Datatype properties between individuals and literals or XML datatypes. For example, the individual “United States” has the datatype property CodeISO3 with the value “USA”.
  • Restrictions in classes and/or properties. For example, the property spoken Language of the class Movie has been restricted to have only one value; this means that a movie can have only one spoken language.
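The vocabulary above (class, individual, datatype property, restriction) can be sketched in Python as follows. The names `OntClass`, `Individual` and `max_cardinality` are hypothetical, and the sketch makes one simplifying assumption: it enforces the restriction as a validation check, whereas an OWL reasoner would use it to draw inferences or report an inconsistency.

```python
from dataclasses import dataclass, field

@dataclass
class OntClass:
    name: str
    # property name -> maximum number of values allowed (a cardinality restriction)
    max_cardinality: dict = field(default_factory=dict)

@dataclass
class Individual:
    name: str
    cls: OntClass
    properties: dict = field(default_factory=dict)  # property name -> list of values

    def add(self, prop, value):
        """Attach a value, refusing it if the class restriction would be violated."""
        values = self.properties.setdefault(prop, [])
        limit = self.cls.max_cardinality.get(prop)
        if limit is not None and len(values) >= limit:
            raise ValueError(f"{self.cls.name}.{prop} allows at most {limit} value(s)")
        values.append(value)

movie = OntClass("Movie", max_cardinality={"spokenLanguage": 1})
country = OntClass("Country")

usa = Individual("United States", country)
usa.add("CodeISO3", "USA")              # datatype property with a literal value

film = Individual("The Sound of Music", movie)
film.add("spokenLanguage", "English")
try:
    film.add("spokenLanguage", "German")  # a second value breaks the restriction
except ValueError as err:
    print(err)
```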

Page 12: Netflix presentation   final

The Movie Ontology –

Page 13: Netflix presentation   final

The Geo Spatial Ontology –

Page 14: Netflix presentation   final

The Movie Ontology – www.movieontology.org

Page 15: Netflix presentation   final

• The main entities can be represented as Classes using an ontology language (Movie, Person, Role)

• Other attributes (movie rating, movie genres, …) can be represented as Properties of the appropriate Classes

[Diagram: the classes Movie, Person and Role linked by the properties acted and film; example instances Brad Pitt (Person), Achilles (Role) and Troy (Movie)]

Page 16: Netflix presentation   final

[Diagram: Movie Ontology schema with the classes Movie, Person (foaf) and Role]

• Movie, literal properties: title, year, runtime, country, languages, genres, rating, votes, plot, colorInfo, certificate
• Movie, company properties: production_companies, distributor, soundMix, miscCompanies
• Person, literal properties: birth_date, death_date, birth_name, longname, spouse, trivia
• Role: linked through the properties acted and film
• Properties between Person and Movie: teammember; crewmember; stuntPerformer, soundCrew; director; castingDirector, artDirector, assistantDirector; Cast, composer, producer, productionDesigner, artDepartment, productionManager, specialEffects, setDecorator, editor, writer, Cinematographer, costumeDesigner

Page 17: Netflix presentation   final

[Diagram: instance graph for one person, three movies and one role]

i.e. Alejandro Avendano as
• Actor
• Stunt Performer
• Set Decorator

Nodes and literal values:
• p1 – longname: Alejandro Avendano
• m1 – title: Troy; year: 2004; runtime: 163; language: English; rating: 6.9; votes: 85463
• m2 – title: Romero
• m3 – title: Jack El Despertador
• r1 – role

Edge labels: stuntPerformer, setDecorator, acted, film

p1: http://www.imdb.com/Person/Avendano
m1: http://www.imdb.com/Movie/Troy
m2: http://www.imdb.com/Movie/Romero
m3: http://www.imdb.com/Movie/JackElDespertador
r1: http://www.imdb.com/Role/DeathSquadMember

Page 18: Netflix presentation   final

• Find resources according to specific criteria
  – e.g. find movies with Roger Bratt as a cinematographer, or movies with producer Halle Berry’s spouse

• And simpler queries
  – e.g. find movies with genre = War, Romance, etc.
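Both kinds of query come down to matching triple patterns that contain variables, which is what SPARQL does. The following Python sketch shows the idea on a toy dataset; the functions `match` and `query`, the `ex:` identifiers and all the data are hypothetical placeholders, not content from the presentation.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; return None on failure."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):            # a variable such as ?m
            if result.setdefault(term, value) != value:
                return None                 # already bound to something else
        elif term != value:                 # a constant that does not match
            return None
    return result

def query(patterns, triples):
    """Return every variable binding that satisfies all patterns together."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings

# Placeholder data, invented for this example.
triples = [
    ("ex:movie1", "ex:genre", "War"),
    ("ex:movie2", "ex:genre", "Romance"),
    ("ex:movie2", "ex:producer", "ex:personA"),
    ("ex:personB", "ex:spouse", "ex:personC"),
    ("ex:movie3", "ex:producer", "ex:personC"),
]

# Simple query: movies with genre = War.
print(query([("?m", "ex:genre", "War")], triples))

# Two-step query: movies produced by the spouse of ex:personB.
print(query([("ex:personB", "ex:spouse", "?s"),
             ("?m", "ex:producer", "?s")], triples))
```

The second query needs no special code: sharing the variable `?s` between two patterns is what joins the spouse relationship to the producer relationship.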

Page 19: Netflix presentation   final

How to represent the knowledge

Feature | Relational Database | Knowledgebase
Structure | Schema | Ontology Statements
Data | Rows | Instance Elements
Admin language | DDL | Ontology Statements (OWL)
Query language | SQL | SPARQL
Relationship | Foreign Keys | Multidimensional
Logic | External to the DB / triggers | Formal logic statements
Uniqueness | Key for table | Uniqueness Restriction

Page 20: Netflix presentation   final

How to store the knowledge

RDF Stores

•These are “referential databases”

•Oracle 11g – stores RDF in relational database

•AllegroGraph – http://www.franz.com/agraph/allegrograph/

•AllegroGraph RDFStore is a high-performance, persistent RDF graph database. AllegroGraph uses disk-based storage, enabling it to scale to billions of triples. AllegroGraph supports SPARQL, RDFS++, and Prolog reasoning.

•Sesame

•Virtuoso

Page 21: Netflix presentation   final

Applications

• Are used to solve complicated problems
• All problems could be solved manually or with conventional applications but with much more effort
• The Semantic WEB core idea is to “teach” the machine to “mimic” human reasoning – a simplistic approach
• This is in fact “recycled AI techniques”
• Alternative to data warehouses
• Using inference to find new facts
• Integrates formatted with non-formatted docs
• Cross technology queries
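"Using inference to find new facts" can be illustrated with a small forward-chaining loop in Python. This is a sketch under stated assumptions: the two rules imitate RDFS-style subclass reasoning, and the function name, the short predicate names `type` and `subClassOf`, and the `ex:` facts are all invented for the example.

```python
def infer_types(facts):
    """Apply two subclass rules repeatedly until no new fact appears."""
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # Rule 1: A subClassOf B and B subClassOf C  =>  A subClassOf C
                if p == p2 == "subClassOf" and o == s2:
                    new.add((s, "subClassOf", o2))
                # Rule 2: x type A and A subClassOf B  =>  x type B
                if p == "type" and p2 == "subClassOf" and o == s2:
                    new.add((s, "type", o2))
        if new <= facts:          # nothing new was derived: fixed point reached
            return facts
        facts |= new

stated = {
    ("ex:Documentary", "subClassOf", "ex:Movie"),
    ("ex:Movie", "subClassOf", "ex:Work"),
    ("ex:film1", "type", "ex:Documentary"),
}
derived = infer_types(stated) - stated
print(sorted(derived))
```

Nobody stated that `ex:film1` is a movie; the fact follows from the class hierarchy, so a query for all movies can return it.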

Page 22: Netflix presentation   final

Potential for Netflix

Applications Categories:
1) Data Integration of Heterogeneous data silos
2) Semantic Search
   a) Semantic Tagging
   b) Faceted Search
   c) NL Queries
3) Use of Open Linked Data
4) (Others: Market Sentiment Analysis – blogs, forums; Advertising)

Page 23: Netflix presentation   final

Data Integration using Ontologies

[Diagram: an integration ontology (namespace n) federating three movie databases and text sources]

Integration ontology (n):
• n:Movie, n:hasIdentifier, n:MovieId
• n:Documentary isA n:Movie
• hasDirector: n:Director, which isA n:Person
• hasActor: n:Actor, which isA n:Person

Data sources:
• IMDB Movie Database (namespace a): a.Character, a.Cast Member, a.Picture, a.IMDB Id, ... (RDF Store 1)
• Paramount Movie Database (namespace b): b.Role, b.Person, b.Motion Picture, b.Other fields, ... (RDF Store 2)
• Warner Bros Movie Database (namespace c): c.RoleName, c.PersonName, c.MovieName, c.Other fields, ... (RDF Store 3)
• Unformatted text (blogs, forums, RSS feeds, ...), loaded into an RDF store through knowledge extraction from text

Data Mapping:
n:Movie owl:sameAs a:Picture
n:Actor owl:sameAs a:character
n:Actor owl:sameAs c.PersonName
….

Data Federation using SPARQL

The fields of the integrated dataset consist of the union of the fields in the federated data sources. It is very easy to add new data sources.

The data sources can be in different technologies: Oracle, MySQL, XLS, CSV, etc.
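The mapping idea on this slide can be sketched in Python as a field-renaming table applied to each source before the records are combined. This is a simplification, not the federation mechanism itself: the `MAPPING` table reuses the three mappings listed on the slide, while the functions and the sample records are hypothetical.

```python
# Source field -> integration ontology term, following the "Data Mapping" list on the slide.
MAPPING = {
    "a:Picture": "n:Movie",
    "a:Character": "n:Actor",
    "c:PersonName": "n:Actor",
}

def to_common(record):
    """Rename mapped fields to the n: vocabulary; unmapped fields keep their own names."""
    return {MAPPING.get(name, name): value for name, value in record.items()}

def federate(*sources):
    """Translate the records of every source and collect the union of their fields."""
    records = [to_common(r) for source in sources for r in source]
    fields = sorted({name for r in records for name in r})
    return records, fields

# Placeholder records, invented for this example.
imdb = [{"a:Picture": "Movie X", "a:Character": "Person P", "a:IMDB Id": "id-1"}]
warner = [{"c:MovieName": "Movie Y", "c:PersonName": "Person Q"}]

records, fields = federate(imdb, warner)
print(fields)  # ['a:IMDB Id', 'c:MovieName', 'n:Actor', 'n:Movie']
```

Adding a new source only requires new entries in `MAPPING`; fields without a mapping, such as `c:MovieName` here, still show up in the union under their original names.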

Page 24: Netflix presentation   final

https://pub.needlebase.com/actions/visualizer/V2Visualizer.do?domain=Oscar-History&query=2009+Awards

Page 25: Netflix presentation   final

Semantic Search

• Wolfram Alpha, Semantifi
• Faceted Search (www.needlebase.com)
• Micro Formats
• Good Relations
• Open Linked Data
• Using natural language as a query language

Page 26: Netflix presentation   final

Deep WEB vs. Shallow WEB

• www.wolframalpha.com, www.google.com
• www.semantifi.com

Page 27: Netflix presentation   final

Deep WEB vs. Shallow WEB

Page 28: Netflix presentation   final

Faceted Search

http://www.needlebase.com/cases/events

Page 29: Netflix presentation   final

Faceted Search

http://dbpedia.neofonie.de/browse/rdf-type:Film/

Wikipedia

DBpedia – semantified Wikipedia

Page 30: Netflix presentation   final

Microformats

A microformat (sometimes abbreviated μF) is a web-based approach to semantic markup which seeks to re-use existing HTML/XHTML tags to convey metadata and other attributes in web pages and other contexts that support (X)HTML, such as RSS. This approach allows software to process information intended for end-users (such as contact information, geographic coordinates, calendar events, and the like) automatically.

Examples:
• hAtom – for marking up Atom feeds from within standard HTML
• hCalendar – for events
• hCard – for contact information; includes:
  – adr – for postal addresses
  – geo – for geographical coordinates (latitude, longitude)
• hNews – for news content
• hProduct – for products
• hRecipe – for recipes and foodstuffs
• hResume – for resumes or CVs
• hReview – for reviews
• rel-directory – for distributed directory creation and inclusion
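To show how software can pick up such markup, here is a minimal Python sketch that reads a few hCard properties with the standard library's `html.parser`. The class names `vcard`, `fn`, `org`, `adr` and `locality` belong to hCard; the HTML snippet, the `HCardParser` class and the choice of properties are assumptions made for the example, and a real parser would handle many more cases.

```python
from html.parser import HTMLParser

class HCardParser(HTMLParser):
    """Collect the text of elements whose class attribute names an hCard property."""
    PROPERTIES = {"fn", "org", "locality"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []  # one entry per open element: a property name or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in self.PROPERTIES), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        prop = self._stack[-1] if self._stack else None
        if prop and data.strip():
            self.card[prop] = data.strip()

# Hypothetical page fragment marked up as an hCard.
html = """
<div class="vcard">
  <span class="fn">George Roth</span>,
  <span class="org">Recognos Inc</span>,
  <span class="adr"><span class="locality">San Rafael</span></span>
</div>
"""

parser = HCardParser()
parser.feed(html)
print(parser.card)
```

The page still reads normally for a human visitor; the class attributes are what make the same text usable by a program.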

Page 31: Netflix presentation   final

Good Relations

http://www.heppnetz.de/projects/goodrelations/

Page 32: Netflix presentation   final

Open Linked Data - Folksonomies

http://linkeddata.org/

Open Linked Data "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF."

http://esw.w3.org/DataSetRDFDumps

www.wikipedia.com

www.freebase.com – Metaweb, bought by Google in July 2010

Folksonomy - Folksonomy is the result of personal free tagging of information and objects (anything with a URL) for one's own retrieval. The tagging is done in a social environment (usually shared and open to others). Folksonomy is created from the act of tagging by the person consuming the information. (Thomas Vander Wal – 2004)

Page 33: Netflix presentation   final

Open Linked Data

http://linkeddata.org/

Page 34: Netflix presentation   final

Freebase

Page 35: Netflix presentation   final

http://www.freebase.com/view/film/film

Page 36: Netflix presentation   final

http://www.freebase.com/view/film/film

Page 37: Netflix presentation   final

LinkedMDB

Page 38: Netflix presentation   final

Data.gov & Data.gov.uk

• …..

Page 39: Netflix presentation   final

The Future: Using NL as a query language

Comedies with John Travolta filmed in the US

All movies with Clint Eastwood as director

Coppola family movies

Documentaries about the genocide in Africa

Movies filmed in San Francisco Marina

Where can I buy the music from Love Story?

Is there any tour based on The Da Vinci Code?

Movies based on novels written by 19th Century British writers

Page 40: Netflix presentation   final

Using NL as a query language

Page 41: Netflix presentation   final

What is behind a semantic search

Page 42: Netflix presentation   final

Extracting knowledge from text

Expert System – Cogito
Application in Financial Complex Documents
Document Advisor
Blogs
Forums

Page 43: Netflix presentation   final

Extracting knowledge from text - A look behind the scene

Page 44: Netflix presentation   final

Tools

Open Source, Licensed

• RDF Stores
• Ontology Management: Protégé – Stanford – Open Source
• Data Integration Tools – Cambridge Semantics, Metatomix
• NLP Tools – COGITO (Expert System), GATE
• Etc.

Page 45: Netflix presentation   final

Semantic Technology Companies

Page 46: Netflix presentation   final

How can Recognos Help

•Recognos is a Semantic Applications Developer

•Works with vendors to develop applications

•Help Netflix create a Semantic Group

•Help select technologies

•Build search applications for Linked Data, Faceted Search

•Detect similarities between film descriptions

•Data Integrations

•Leverage 3 years of experience in developing semantic applications (data integration, NLP, semantic search)

• etc.

Page 47: Netflix presentation   final

Contact Info

George Roth – CEO Recognos Inc

Skype Id: grecognos

eMail: [email protected]

WEB Site: www.recognos.com

Adonis Damian – Senior Semantic Application Architect

eMail: [email protected]