Confidential. The material in this presentation is the property of Fair Isaac Corporation, is provided for the recipient only, and shall not be used, reproduced, or disclosed without Fair Isaac Corporation's express consent. © 2008 Fair Isaac Corporation.
HNC Data Alignment Research Direction
Richard Rohwer, Senior Principal Scientist, Advanced Technologies
HNC Software / Fair Isaac
Cognition needs Semantics needs Massive Data
[Diagram labels: Massive Data; Tacit Knowledge; Explicit Knowledge; KNOWLEDGE; Statistics (includes Semantics / Meaning = Association Statistics); Information Organization; Statistics Reasoning]
Theorem: Probability distributions are the UNIQUE logically consistent knowledge representation.
[Diagram labels: Association-Grounded Semantics (AGS); Information Geometry; Awareness; Meaning from Usage; Discovery of Semantics as meant; Cognitive Resource]
From massive data to machine cognition: The technical principles
Mathematical ingredients:
• Association-Grounded Semantics (AGS): to capture meaning mathematically.
• Semantically-Driven Segmentation (SDS): to extract the most meaningful patterns.
• Distributional Alignment (DA): to compare meanings abstractly.
• Semantically Enriched Reasoning Engine: to think in terms of meanings instead of symbols.
Association-Grounded Semantics (AGS): Meaning = Usage
[Figure labels: Terms: Cat, Dog, Computer. Usage Contexts: House, Truck, Oil, Equipt, Electronic, JoeSmith, Mouse, Tail, Pet, Food. Usage profiles marked Similar or Different.]
Association-Grounded Semantics (AGS): Meaning from usage statistics alone.
Any Language. Any Domain. Any Medium (in principle). No knowledge required. Just add data (no annotation).
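The "meaning from usage statistics" idea can be illustrated with a minimal sketch. It assumes each term is represented by its empirical distribution over usage contexts, and that terms are compared with the Jensen-Shannon divergence; the slides do not name a specific divergence, so that choice, the function names, and the toy observations are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math


def context_distributions(pairs):
    """Turn (term, context) observations into per-term distributions p(context | term)."""
    counts = defaultdict(Counter)
    for term, context in pairs:
        counts[term][context] += 1
    distributions = {}
    for term, counter in counts.items():
        total = sum(counter.values())
        distributions[term] = {c: n / total for c, n in counter.items()}
    return distributions


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two sparse distributions (dicts)."""
    divergence = 0.0
    for context in set(p) | set(q):
        pc, qc = p.get(context, 0.0), q.get(context, 0.0)
        mixture = 0.5 * (pc + qc)
        if pc > 0:
            divergence += 0.5 * pc * math.log2(pc / mixture)
        if qc > 0:
            divergence += 0.5 * qc * math.log2(qc / mixture)
    return divergence


# Hypothetical toy observations; the context names echo the figure labels only loosely.
observations = [
    ("cat", "pet"), ("cat", "pet"), ("cat", "food"), ("cat", "tail"), ("cat", "mouse"),
    ("dog", "pet"), ("dog", "pet"), ("dog", "food"), ("dog", "tail"), ("dog", "house"),
    ("computer", "mouse"), ("computer", "electronic"), ("computer", "electronic"),
    ("computer", "equipment"), ("computer", "house"),
]

dists = context_distributions(observations)
cat_dog = js_divergence(dists["cat"], dists["dog"])
cat_computer = js_divergence(dists["cat"], dists["computer"])
print(cat_dog < cat_computer)  # True for this toy data: cat is used more like dog
```

No annotation or lexicon enters the computation; the only input is the list of co-occurrence observations.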
Distribution Space has Information Geometry
[Figure: cat, computer, and dog shown as points in distribution space]
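One standard way to give a space of discrete distributions a geometry is the Fisher-Rao metric, under which the geodesic distance between two distributions p and q over the same outcomes is 2·arccos(Σ √(pᵢ qᵢ)). The slides do not say which construction is meant, so the sketch below is an assumption, and the vectors are hypothetical toy values over seven shared contexts.

```python
import math


def fisher_rao_distance(p, q):
    """Geodesic distance between two discrete distributions under the Fisher-Rao metric."""
    bhattacharyya = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp to guard against rounding slightly above 1.0 before acos.
    return 2.0 * math.acos(min(1.0, bhattacharyya))


# Hypothetical distributions over the contexts
# (pet, food, tail, mouse, house, electronic, equipment).
cat = [0.4, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0]
dog = [0.4, 0.2, 0.2, 0.0, 0.2, 0.0, 0.0]
computer = [0.0, 0.0, 0.0, 0.2, 0.2, 0.4, 0.2]

print(fisher_rao_distance(cat, dog) < fisher_rao_distance(cat, computer))
```

Distances of this kind range from 0 (identical distributions) to π (distributions with disjoint support).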
Term clusters (Cables):
• fro onto reaching acrs btwn beyond frm inside alg across via thru ovr around near between within through into over by from at
• jun sept apr jul nov oct dec aug feb sep jan
• captain mr gen msgt ltc tsgt cpt sgt ssgt capt maj lt
• bsb msj tng opv adm atm cpo bdo notal u b
Distributional Alignment (DA): Abstraction ~ Structural Commonality
• Align semantic spaces by distribution of content. No need to understand content.
• Transport meaning between Languages, Dialects, Cultures.
• Transport metaphorically between topics.
[Diagram: English word clusters and English context clusters with their Joint Probabilities; German word clusters and German context clusters with their Joint Probabilities; Align]
transLign algorithm:
• No language knowledge.
• No tie words.
• No aligned corpora.
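The idea of aligning by distribution of content alone can be shown on a toy scale. The sketch below is not the transLign algorithm, whose details the slides do not give. It assumes two small joint probability tables (word clusters × context clusters) and searches by brute force for the row and column matching that makes them most similar. All names and numbers are hypothetical.

```python
from itertools import permutations


def alignment_cost(p, q, rows, cols):
    """L1 distance between p and q after matching p[i][j] with q[rows[i]][cols[j]]."""
    return sum(
        abs(p[i][j] - q[rows[i]][cols[j]])
        for i in range(len(p))
        for j in range(len(p[0]))
    )


def align_joint_distributions(p, q):
    """Exhaustively search cluster matchings; feasible only for a handful of clusters."""
    best_cost, best_rows, best_cols = float("inf"), None, None
    for rows in permutations(range(len(p))):
        for cols in permutations(range(len(p[0]))):
            cost = alignment_cost(p, q, rows, cols)
            if cost < best_cost:
                best_cost, best_rows, best_cols = cost, rows, cols
    return best_cost, best_rows, best_cols


# Hypothetical "English" joint probabilities over 3 word clusters x 3 context clusters.
english = [
    [0.30, 0.05, 0.02],
    [0.04, 0.25, 0.03],
    [0.01, 0.06, 0.24],
]
# Hypothetical "German" table: the same structure with cluster labels shuffled.
row_shuffle, col_shuffle = (2, 0, 1), (1, 0, 2)
german = [[english[r][c] for c in col_shuffle] for r in row_shuffle]

cost, word_match, context_match = align_joint_distributions(english, german)
print(cost, word_match, context_match)  # 0.0 (1, 2, 0) (1, 0, 2)
```

Here `word_match[i]` is the German word cluster matched to English word cluster `i`. The search uses no dictionary, tie words, or aligned corpora; it relies only on the shape of the two joint distributions. A practical method would need a scalable optimizer in place of the factorial-time search.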
Alignment: Terminology
[Diagram labels: RP English; Cable English; Blog Dialects; Less Commonly Taught Language; Institutional Dialects; Terror Cell Obfuscated Slang; Professional Dialects; Newswire English; Foreign Newswire]
• Polysemy (Sense resolution)
- Good solutions from NIMD: Entity Disambiguation (5.5% err vs. 13.5% err in KDD)
- General terms
• Information Loss (Unequal expressive power)
• Automation
- AGS techniques do not require manually constructed resources, but can use them when available.
[Figure labels: “bank”, “river bank”, “bank note”; AGS Semantic Space; fluffy; snow; What ‘cha call it?; Naïve Bayes]
Alignment: Schemata
[Diagram labels, repeated for each of two schemata: Table name; Column names; Instances; Natural Language Corpora; Semantic Alignment; Instance Statistics (joined across schema); Schema Graph. A single Structural Alignment label also appears.]
Alignment: Ontologies
• More complex graph structure
- Reflecting multiple (transitive) relations: is-a, part-of, reports-to, prerequisite-for, …
• Implies more options for defining AGS statistics
- More relations, more ways to define co-occurrence.
• Big Picture issue:
- Ontological structure makes general statements about instances of relationships within data.
- So does AGS. How are these related?
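As a rough illustration of "more relations, more ways to define co-occurrence", the sketch below treats each (relation, target) pair reachable from an entity as one usage context for that entity, optionally following a relation transitively. This is one assumed definition among many possible ones, not a method stated in the slides. The triples and function names are hypothetical.

```python
from collections import Counter, defaultdict


def reachable(node, edges):
    """All nodes reachable from `node` by following one relation's edges transitively."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for target in edges.get(current, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def relation_contexts(triples, transitive=True):
    """Count (relation, target) contexts per entity from (subject, relation, object) triples."""
    by_relation = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        by_relation[relation][subject].add(obj)
    contexts = defaultdict(Counter)
    for relation, edges in by_relation.items():
        for subject in list(edges):
            targets = reachable(subject, edges) if transitive else edges[subject]
            for target in targets:
                contexts[subject][(relation, target)] += 1
    return contexts


# Hypothetical ontology fragment.
triples = [
    ("cat", "is-a", "mammal"),
    ("mammal", "is-a", "animal"),
    ("tail", "part-of", "cat"),
]

direct = relation_contexts(triples, transitive=False)
closed = relation_contexts(triples, transitive=True)
print(sorted(direct["cat"]))  # [('is-a', 'mammal')]
print(sorted(closed["cat"]))  # [('is-a', 'animal'), ('is-a', 'mammal')]
```

Switching `transitive` on or off, or restricting the set of relations, yields different co-occurrence statistics from the same ontology, which is the design freedom the slide points to.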