
Confidential. The material in this presentation is the property of Fair Isaac Corporation, is provided for the recipient only, and shall not be used, reproduced, or disclosed without Fair Isaac Corporation's express consent. © 2008 Fair Isaac Corporation.

HNC Data Alignment Research Direction

Richard Rohwer, Senior Principal Scientist, Advanced Technologies

HNC Software / Fair Isaac


Cognition needs Semantics needs Massive Data

[Diagram labels: Massive Data; Tacit Knowledge; Explicit Knowledge; KNOWLEDGE; Statistics includes Semantics / Meaning = Association Statistics; Information Organization; Statistics Reasoning]

Theorem: Probability distributions are the UNIQUE logically consistent knowledge representation.


[Diagram labels: Association-Grounded Semantics (AGS); Information Geometry; Awareness; Meaning from Usage. Discovery of Semantics as meant; Cognitive Resource]

From massive data to machine cognition: The technical principles

Mathematical ingredients:

- Association-Grounded Semantics (AGS): To capture meaning mathematically.
- Semantically-Driven Segmentation (SDS): To extract the most meaningful patterns.
- Distributional Alignment (DA): To compare meanings abstractly.
- Semantically Enriched Reasoning Engine: To think in terms of meanings instead of symbols.


Association-Grounded Semantics (AGS): Meaning = Usage

[Figure: matrix of Terms against Usage Contexts. Labels on both axes: Cat, Dog, Computer, House, Truck, Oil, Equipt, Electronic, JoeSmith, Mouse, Tail, Pet, Food. Annotations: Similar, Different.]

Association-Grounded Semantics (AGS): Meaning from usage statistics alone.

Any Language. Any Domain. Any Medium (in principle). No knowledge required. Just add data (no annotation).
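As a rough illustration of "meaning from usage statistics alone", the sketch below builds a context distribution for each term from an unannotated toy corpus and compares terms by how much their distributions overlap. The corpus, the window size, and the function names `context_distributions` and `overlap` are assumptions made for this example; the slides do not specify how AGS statistics are actually computed.

```python
from collections import Counter, defaultdict

def context_distributions(sentences, window=2):
    """Return {term: {context_word: probability}} from raw co-occurrence counts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[term][tokens[j]] += 1
    dists = {}
    for term, ctx in counts.items():
        total = sum(ctx.values())
        dists[term] = {c: n / total for c, n in ctx.items()}
    return dists

def overlap(p, q):
    """Bhattacharyya coefficient: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum((p[c] * q[c]) ** 0.5 for c in p.keys() & q.keys())

# Hypothetical toy corpus; no annotation or dictionary is supplied.
corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the dog ate pet food",
    "the cat ate pet food",
    "the computer runs the program",
    "the computer stores the data",
]
dists = context_distributions(corpus)
print("cat vs dog:     ", overlap(dists["cat"], dists["dog"]))
print("cat vs computer:", overlap(dists["cat"], dists["computer"]))
```

On this toy data "cat" and "dog" share most of their contexts, while "cat" and "computer" share only the function word "the", which matches the Similar / Different contrast in the figure.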


Distribution Space has Information Geometry

[Figure: cat, dog, and computer shown as points in distribution space.]
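One standard way to give a space of categorical distributions a geometry is the Fisher-Rao geodesic distance, d(p, q) = 2 arccos(Σ √(p_i q_i)). The sketch below computes it for hand-made distributions; the numbers and context names are invented for illustration, and the slides do not say which information-geometric distance AGS uses.

```python
import math

def fisher_rao_distance(p, q):
    """Geodesic distance between two categorical distributions under the Fisher metric."""
    keys = set(p) | set(q)
    bc = sum(math.sqrt(p.get(k, 0.0) * q.get(k, 0.0)) for k in keys)
    # Clamp to [0, 1] to guard against floating-point drift before acos.
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))

# Hypothetical context distributions for three terms.
cat = {"purrs": 0.5, "eats": 0.3, "runs": 0.2}
dog = {"barks": 0.5, "eats": 0.3, "runs": 0.2}
computer = {"runs": 0.2, "computes": 0.8}

print("cat-dog:     ", fisher_rao_distance(cat, dog))
print("cat-computer:", fisher_rao_distance(cat, computer))
```

The distance is zero for identical distributions and reaches π when two distributions share no support.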

Term clusters (Cables):

- fro onto reaching acrs btwn beyond frm inside alg across via thru ovr around near between within through into over by from at
- jun sept apr jul nov oct dec aug feb sep jan
- captain mr gen msgt ltc tsgt cpt sgt ssgt capt maj lt
- bsb msj tng opv adm atm cpo bdo notal u b


Distributional Alignment (DA): Abstraction ~ Structural Commonality

Align semantic spaces by distribution of content. No need to understand content.

Transport meaning between languages, dialects, and cultures.

Transport metaphorically between topics.

[Diagram: Joint Probabilities over English word clusters and English context clusters, aligned ("Align") with Joint Probabilities over German word clusters and German context clusters.]

transLign algorithm:

- No language knowledge.
- No tie words.
- No aligned corpora.
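The idea of aligning two cluster systems purely by the shape of their joint probability tables can be shown with a toy brute-force search. This is not the transLign algorithm, whose details the slides do not give. It is a minimal sketch that tries every row and column permutation of one table and keeps the one closest to the other table in squared Hellinger distance. The tables and the names `hellinger_sq` and `align` are hypothetical, and the factorial cost makes this usable only for very small tables.

```python
from itertools import permutations
import math

def hellinger_sq(P, Q):
    """Squared Hellinger distance between two joint probability tables of equal shape."""
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                     for row_p, row_q in zip(P, Q)
                     for a, b in zip(row_p, row_q))

def align(P, Q):
    """Return (cost, rows, cols) so that Q[rows[i]][cols[j]] best matches P[i][j]."""
    best = None
    for rows in permutations(range(len(P))):
        for cols in permutations(range(len(P[0]))):
            permuted = [[Q[r][c] for c in cols] for r in rows]
            cost = hellinger_sq(P, permuted)
            if best is None or cost < best[0]:
                best = (cost, rows, cols)
    return best

# Hypothetical joint table: word clusters (rows) by context clusters (columns).
P = [[0.30, 0.05, 0.02],
     [0.04, 0.25, 0.03],
     [0.01, 0.06, 0.24]]

# Build a second table with the same structure but unknown cluster labels.
true_rows, true_cols = (2, 0, 1), (1, 2, 0)
Q = [[0.0] * 3 for _ in range(3)]
for i in range(3):
    for j in range(3):
        Q[true_rows[i]][true_cols[j]] = P[i][j]

cost, rows, cols = align(P, Q)
print(cost, rows, cols)
```

No tie words or parallel data enter the search; only the distributional structure of the two tables is compared.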


Alignment: Terminology

- RP English
- Cable English
- Blog Dialects
- Less Commonly Taught Language
- Institutional Dialects
- Terror Cell Obfuscated Slang
- Professional Dialects
- Newswire English
- Foreign Newswire

Polysemy (Sense resolution)

Good solutions from NIMD:

- Entity Disambiguation (5.5% err vs. 13.5% err in KDD)
- General terms

Information Loss (Unequal expressive power)

Automation

AGS techniques do not require manually constructed resources… but can use them when available.

[Diagram labels: “bank”, “river bank”, “bank note”; AGS Semantic Space; fluffy; snow; What ‘cha call it?; Naïve Bayes]
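The slide pairs sense resolution of "bank" with a Naïve Bayes label but does not say how the two are combined. As one plausible reading, the sketch below trains a small Naïve Bayes classifier with add-one smoothing that picks a sense of "bank" from surrounding context words. The training examples and the function names `train` and `classify` are invented for illustration.

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (sense, context_words). Returns (sense counts, word counts, vocabulary)."""
    sense_counts = Counter()
    word_counts = {}
    vocab = set()
    for sense, words in examples:
        sense_counts[sense] += 1
        word_counts.setdefault(sense, Counter()).update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def classify(context, model):
    """Return the sense with the highest log posterior given the context words."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())

    def log_posterior(sense):
        n = sum(word_counts[sense].values())
        score = math.log(sense_counts[sense] / total)
        for w in context:
            # Add-one smoothing so unseen words do not zero out a sense.
            score += math.log((word_counts[sense][w] + 1) / (n + len(vocab)))
        return score

    return max(sense_counts, key=log_posterior)

# Hypothetical labelled contexts for two senses of "bank".
examples = [
    ("river bank", ["river", "water", "fishing"]),
    ("river bank", ["muddy", "river", "shore"]),
    ("bank note", ["money", "note", "cash"]),
    ("bank note", ["account", "money", "loan"]),
]
model = train(examples)
print(classify(["river", "shore"], model))
print(classify(["cash", "loan"], model))
```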


Alignment: Schemata

[Diagram: two tables, each with a Table name, three Column names, and rows of Instances. Labels for each side: Natural Language Corpora; Instance Statistics (joined across schema); Schema Graph. Alignment labels: Semantic Alignment (twice), Structural Alignment.]
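To make the role of instance statistics concrete, the sketch below matches columns across two tables using only the values stored in them and ignoring the column names. It uses a deliberately crude statistic, the distribution of character classes in each column, as a stand-in for whatever instance statistics the actual approach uses. The table contents and the function names are hypothetical.

```python
import math
from collections import Counter

def char_class_distribution(values):
    """Distribution over digit / alpha / other characters in a column's instances."""
    counts = Counter()
    for value in values:
        for ch in str(value):
            if ch.isdigit():
                counts["digit"] += 1
            elif ch.isalpha():
                counts["alpha"] += 1
            else:
                counts["other"] += 1
    total = sum(counts.values()) or 1
    return {k: n / total for k, n in counts.items()}

def hellinger(p, q):
    """Hellinger distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return math.sqrt(0.5 * sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2
                               for k in keys))

def match_columns(table_a, table_b):
    """Map each column of table_a to the statistically closest column of table_b."""
    stats_b = {name: char_class_distribution(vals) for name, vals in table_b.items()}
    matches = {}
    for name_a, vals_a in table_a.items():
        stats_a = char_class_distribution(vals_a)
        matches[name_a] = min(stats_b, key=lambda name_b: hellinger(stats_a, stats_b[name_b]))
    return matches

# Hypothetical tables with differently named columns.
table_a = {"customer_name": ["Ann Lee", "Bob Ray"],
           "phone": ["555-0101", "555-0199"],
           "balance": ["1200", "87"]}
table_b = {"tel": ["555-0142", "555-0187"],
           "amt": ["430", "9921"],
           "full_name": ["Cy Dorn", "Di Eck"]}

matches = match_columns(table_a, table_b)
print(matches)
```

A real system would need richer statistics than character classes, and this sketch does not use the schema graph at all, so it illustrates only the instance-level side of the diagram.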


Alignment: Ontologies

More complex graph structure

Reflecting multiple (transitive) relations
- is-a, part-of, reports-to, prerequisite-for, …

Implies more options for defining AGS statistics
- More relations, more ways to define co-occurrence.
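The point that more relations give more ways to define co-occurrence can be sketched with relation triples. Below, the same hypothetical triples yield either one co-occurrence table per relation type or a single pooled table that ignores the relation. The triples and function names are assumptions for illustration and do not come from any particular ontology.

```python
from collections import Counter, defaultdict

# Hypothetical (subject, relation, object) triples.
triples = [
    ("cat", "is-a", "pet"),
    ("dog", "is-a", "pet"),
    ("tail", "part-of", "cat"),
    ("tail", "part-of", "dog"),
    ("analyst", "reports-to", "manager"),
]

def cooccurrence_by_relation(triples):
    """One co-occurrence table per relation type."""
    tables = defaultdict(Counter)
    for subject, relation, obj in triples:
        tables[relation][(subject, obj)] += 1
    return tables

def cooccurrence_pooled(triples):
    """A single table that counts a pair whenever any relation links it."""
    pooled = Counter()
    for subject, _relation, obj in triples:
        pooled[(subject, obj)] += 1
    return pooled

by_relation = cooccurrence_by_relation(triples)
pooled = cooccurrence_pooled(triples)
print(dict(by_relation["is-a"]))
print(dict(pooled))
```

Each choice gives different usage statistics for the same terms, and therefore potentially a different semantic space.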

Big Picture issue: Ontological structure makes general statements about instances of relationships within data. So does AGS. How are these related?