Bayesian topic models
Tom Griffiths
UC Berkeley
Latent structure → (probabilistic process) → Observed data
Latent structure (meaning) → (probabilistic process) → Observed data (words)
Observed data (words) → (statistical inference) → Latent structure (meaning)
Outline
Topic models
Latent Dirichlet allocation
More complex models
Conclusions
Topic models
• Hierarchical Bayesian models
  – share parameters (topics) across densities
• Define a generative model for documents
  – each document is a mixture of topics
  – each word is chosen from a single topic
• An idea that is widely used in NLP
  – e.g., Bigi et al., 1997; Blei et al., 2003; Hofmann, 1999; Iyer & Ostendorf, 1996; Ueda & Saito, 2003
$$P(w_i) = \sum_{j=1}^{T} P(w_i \mid z_i = j)\, P(z_i = j)$$

(the topics $P(w_i \mid z_i = j)$ are shared across documents; the weights $P(z_i = j)$ vary by document)
A generative model for documents
w            P(w|z = 1)   P(w|z = 2)
HEART           0.2          0.0
LOVE            0.2          0.0
SOUL            0.2          0.0
TEARS           0.2          0.0
JOY             0.2          0.0
SCIENTIFIC      0.0          0.2
KNOWLEDGE       0.0          0.2
WORK            0.0          0.2
RESEARCH        0.0          0.2
MATHEMATICS     0.0          0.2
               (topic 1)    (topic 2)

Choose mixture weights for each document, then generate a "bag of words":

{P(z = 1), P(z = 2)} = {0, 1}:
  MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK
{0.25, 0.75}:
  SCIENTIFIC KNOWLEDGE MATHEMATICS SCIENTIFIC HEART LOVE TEARS KNOWLEDGE HEART
{0.5, 0.5}:
  MATHEMATICS HEART RESEARCH LOVE MATHEMATICS WORK TEARS SOUL KNOWLEDGE HEART
{0.75, 0.25}:
  WORK JOY SOUL TEARS MATHEMATICS TEARS LOVE LOVE LOVE SOUL
{1, 0}:
  TEARS LOVE JOY SOUL LOVE TEARS SOUL SOUL TEARS JOY
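This process is easy to state in code. A minimal sketch in Python (numpy assumed): the two topic distributions and mixture weights are the ones above, and each word is generated by first choosing a topic, then choosing a word from that topic.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["HEART", "LOVE", "SOUL", "TEARS", "JOY",
         "SCIENTIFIC", "KNOWLEDGE", "WORK", "RESEARCH", "MATHEMATICS"]

# P(w|z): each row is one topic's distribution over the vocabulary.
topics = np.array([
    [0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0],  # topic 1
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2, 0.2, 0.2],  # topic 2
])

def generate_document(weights, n_words=10):
    """Generate a bag of words: choose a topic per word, then a word from it."""
    words = []
    for _ in range(n_words):
        z = rng.choice(len(topics), p=weights)   # z_i ~ P(z)
        w = rng.choice(len(vocab), p=topics[z])  # w_i ~ P(w|z_i)
        words.append(vocab[w])
    return words

for weights in [[0.0, 1.0], [0.25, 0.75], [0.5, 0.5], [0.75, 0.25], [1.0, 0.0]]:
    print(weights, " ".join(generate_document(np.array(weights))))
```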
Geometric interpretation

[Figure: the probability simplex over three words, with axes P(LOVE), P(TEARS), and P(RESEARCH), each running from 0 to 1. Each topic P(w|z) is a point on the simplex; because a document is a mixture of topics, its word distribution lies on the segment between TOPIC 1 and TOPIC 2.]

[Figure: with three topics P(w|z = 1), P(w|z = 2), P(w|z = 3), documents live in the sub-simplex spanned by the topics. Estimation tries to minimize the KL divergence between each document's empirical distribution P(w) and its minimum-KL projection onto that sub-simplex (Hofmann, 1999).]
Matrix factorization interpretation

[Figure: the words × documents matrix P(w) factorizes into a words × topics matrix P(w|z) times a topics × documents matrix P(z).]

Maximum-likelihood estimation is finding the factorization that minimizes KL divergence (Hofmann, 1999).
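As a sanity check on this picture, here is a short sketch (with hypothetical sizes) verifying that the product of a column-stochastic words × topics matrix and a column-stochastic topics × documents matrix yields a valid word distribution for each document:

```python
import numpy as np

W, T, D = 10, 2, 5                                  # words, topics, documents
rng = np.random.default_rng(1)

p_w_given_z = rng.dirichlet(np.ones(W), size=T).T   # W x T, each column sums to 1
p_z = rng.dirichlet(np.ones(T), size=D).T           # T x D, each column sums to 1

p_w = p_w_given_z @ p_z                             # W x D matrix of P(w) per document
assert np.allclose(p_w.sum(axis=0), 1.0)            # each column is a distribution
```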
Three nice properties of topic models
• Interpretability of topics
  – useful in many applications

Interpretable topics (each list below is one topic, words ordered by P(w|z)):
• STORY, STORIES, TELL, CHARACTER, CHARACTERS, AUTHOR, READ, TOLD, SETTING, TALES, PLOT, TELLING, SHORT, FICTION, ACTION, TRUE, EVENTS, TELLS, TALE, NOVEL
• MIND, WORLD, DREAM, DREAMS, THOUGHT, IMAGINATION, MOMENT, THOUGHTS, OWN, REAL, LIFE, IMAGINE, SENSE, CONSCIOUSNESS, STRANGE, FEELING, WHOLE, BEING, MIGHT, HOPE
• WATER, FISH, SEA, SWIM, SWIMMING, POOL, LIKE, SHELL, SHARK, TANK, SHELLS, SHARKS, DIVING, DOLPHINS, SWAM, LONG, SEAL, DIVE, DOLPHIN, UNDERWATER
• DISEASE, BACTERIA, DISEASES, GERMS, FEVER, CAUSE, CAUSED, SPREAD, VIRUSES, INFECTION, VIRUS, MICROORGANISMS, PERSON, INFECTIOUS, COMMON, CAUSING, SMALLPOX, BODY, INFECTIONS, CERTAIN
• FIELD, MAGNETIC, MAGNET, WIRE, NEEDLE, CURRENT, COIL, POLES, IRON, COMPASS, LINES, CORE, ELECTRIC, DIRECTION, FORCE, MAGNETS, BE, MAGNETISM, POLE, INDUCED
• SCIENCE, STUDY, SCIENTISTS, SCIENTIFIC, KNOWLEDGE, WORK, RESEARCH, CHEMISTRY, TECHNOLOGY, MANY, MATHEMATICS, BIOLOGY, FIELD, PHYSICS, LABORATORY, STUDIES, WORLD, SCIENTIST, STUDYING, SCIENCES
• BALL, GAME, TEAM, FOOTBALL, BASEBALL, PLAYERS, PLAY, FIELD, PLAYER, BASKETBALL, COACH, PLAYED, PLAYING, HIT, TENNIS, TEAMS, GAMES, SPORTS, BAT, TERRY
• JOB, WORK, JOBS, CAREER, EXPERIENCE, EMPLOYMENT, OPPORTUNITIES, WORKING, TRAINING, SKILLS, CAREERS, POSITIONS, FIND, POSITION, FIELD, OCCUPATIONS, REQUIRE, OPPORTUNITY, EARN, ABLE
Three nice properties of topic models
• Interpretability of topics
  – useful in many applications
• Handling of multiple meanings or senses
  – better than spatial representations (e.g., LSI)

Handling multiple senses (same topics as above, each ordered by P(w|z)): a polysemous word such as FIELD appears with high probability in several topics at once — here the magnetism topic, the science topic, and the jobs topic.
Three nice properties of topic models
• Interpretability of topics
  – useful in many applications
• Handling of multiple meanings or senses
  – better than spatial representations (e.g., LSI)
• Well-formulated generative models
  – many options for inferring topics from documents
  – supports extensions of the basic model
Outline
Topic models
Latent Dirichlet allocation
More complex models
Conclusions
Latent Dirichlet allocation (Blei, Ng, & Jordan, 2001; 2003)

θ(d) ~ Dirichlet(α)      distribution over topics for each document
zi ~ Discrete(θ(d))      topic assignment for each word
φ(j) ~ Dirichlet(β)      distribution over words for each topic, j = 1, …, T
wi ~ Discrete(φ(zi))     word generated from its assigned topic

[Plate diagram: zi and wi sit inside a plate of size Nd, nested in a plate over D documents; the φ(j) sit in a plate over T topics.]
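A sketch of this generative process in Python with numpy (the sizes and hyperparameter values are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(2)
T, W, D, N_d = 2, 10, 5, 20          # topics, vocabulary size, documents, words/doc
alpha, beta = 1.0, 1.0               # symmetric hyperparameters

phi = rng.dirichlet(beta * np.ones(W), size=T)        # phi^(j) ~ Dirichlet(beta)
corpus = []
for d in range(D):
    theta = rng.dirichlet(alpha * np.ones(T))         # theta^(d) ~ Dirichlet(alpha)
    z = rng.choice(T, size=N_d, p=theta)              # z_i ~ Discrete(theta^(d))
    w = np.array([rng.choice(W, p=phi[zi]) for zi in z])  # w_i ~ Discrete(phi^(z_i))
    corpus.append(w)
```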
Dirichlet priors
• Multivariate equivalent of the Beta distribution
• Hyperparameters determine the form of the prior; for a symmetric Dirichlet with parameter α:

$$p(\theta) = \frac{\Gamma(T\alpha)}{\Gamma(\alpha)^T} \prod_{j=1}^{T} \theta_j^{\alpha - 1}$$
Relationship to other models
• A fully generative aspect model (Hofmann, 1999)
• A multinomial version of PCA (Buntine, 2002)
• Poisson models:
  – non-negative matrix factorization (Lee & Seung, 1999)
  – GaP model (Canny, 2004)
  – discussed in detail in Buntine & Jakulin (2005)
• Statistics: grade of membership (GoM) models (Erosheva, 2002)
• Genetics: structure of populations (Pritchard, Stephens, & Donnelly, 2000)
Inverting the generative model
• Maximum-likelihood estimation (EM)
  – e.g., Hofmann (1999)
• Deterministic approximate algorithms
  – variational EM: Blei, Ng, & Jordan (2001; 2003)
  – expectation propagation: Minka & Lafferty (2002)
• Markov chain Monte Carlo
  – full Gibbs sampler: Pritchard et al. (2000)
  – collapsed Gibbs sampler: Griffiths & Steyvers (2004)
The collapsed Gibbs sampler
• Using the conjugacy of the Dirichlet and multinomial distributions, integrate out the continuous parameters:

$$P(\mathbf{w} \mid \mathbf{z}) = \int_{\Delta^{TW}} P(\mathbf{w} \mid \mathbf{z}, \Phi)\, p(\Phi)\, d\Phi = \left( \frac{\Gamma(W\beta)}{\Gamma(\beta)^W} \right)^{T} \prod_{j=1}^{T} \frac{\prod_w \Gamma(n_w^{(j)} + \beta)}{\Gamma(n_\cdot^{(j)} + W\beta)}$$

$$P(\mathbf{z}) = \int_{\Delta^{DT}} P(\mathbf{z} \mid \Theta)\, p(\Theta)\, d\Theta = \left( \frac{\Gamma(T\alpha)}{\Gamma(\alpha)^T} \right)^{D} \prod_{d=1}^{D} \frac{\prod_j \Gamma(n_j^{(d)} + \alpha)}{\Gamma(n_\cdot^{(d)} + T\alpha)}$$

• This defines a distribution over discrete ensembles z:

$$P(\mathbf{z} \mid \mathbf{w}) = \frac{P(\mathbf{w} \mid \mathbf{z})\, P(\mathbf{z})}{\sum_{\mathbf{z}} P(\mathbf{w} \mid \mathbf{z})\, P(\mathbf{z})}$$

Here $n_w^{(j)}$ counts how often word w is assigned to topic j, $n_j^{(d)}$ counts how often topic j occurs in document d, and a dot denotes summation over that index.
The collapsed Gibbs sampler
• Sample each zi conditioned on z−i:

$$P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n_{w_i}^{(j)} + \beta}{n_\cdot^{(j)} + W\beta} \cdot \frac{n_j^{(d_i)} + \alpha}{n_\cdot^{(d_i)} + T\alpha}$$

(all counts taken with token i removed)

• This is nicer than your average Gibbs sampler:
  – memory: counts can be cached in two sparse matrices
  – optimization: no special functions, simple arithmetic
  – the distributions on Φ and Θ are analytic given z and w, and can later be found for each sample
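A compact sketch of the sampler implementing this update (Python with numpy; dense count matrices for brevity where the slide suggests sparse ones, and hyperparameter defaults chosen for illustration):

```python
import numpy as np

def lda_gibbs(docs, W, T, alpha=1.0, beta=0.01, iters=1000, seed=0):
    """docs: list of integer arrays of word ids. Returns topic assignments z."""
    rng = np.random.default_rng(seed)
    z = [rng.integers(T, size=len(doc)) for doc in docs]   # random initialization
    n_wt = np.zeros((W, T))                 # n_w^(j): word-topic counts
    n_dt = np.zeros((len(docs), T))         # n_j^(d): document-topic counts
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            n_wt[w, z[d][i]] += 1
            n_dt[d, z[d][i]] += 1
    n_t = n_wt.sum(axis=0)                  # n_.^(j): tokens per topic
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                j = z[d][i]                 # remove token i from the counts
                n_wt[w, j] -= 1; n_dt[d, j] -= 1; n_t[j] -= 1
                # conditional from the slide; the document denominator
                # n_.^(d) + T*alpha is constant in j, so it cancels
                p = (n_wt[w] + beta) / (n_t + W * beta) * (n_dt[d] + alpha)
                j = rng.choice(T, p=p / p.sum())
                z[d][i] = j                 # record and restore the counts
                n_wt[w, j] += 1; n_dt[d, j] += 1; n_t[j] += 1
    return z
```

Note how the only arithmetic is additions, multiplications, and one normalization per token: no special functions are needed, exactly as the slide claims.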
Gibbs sampling in LDA

[Animation: the sampler sweeps through the tokens one at a time; at each step the current zi is removed from the counts (shown as "?"), resampled from its conditional distribution, and added back. Initial and later states of the chain:]

 i   wi           di   zi (iteration 1)   zi (iteration 2)   …   zi (iteration 1000)
 1   MATHEMATICS   1   2                  2                  …   2
 2   KNOWLEDGE     1   2                  1                  …   2
 3   RESEARCH      1   1                  1                  …   2
 4   WORK          1   2                  2                  …   1
 5   MATHEMATICS   1   1                  2                  …   2
 6   RESEARCH      1   2                  2                  …   2
 7   WORK          1   2                  2                  …   2
 8   SCIENTIFIC    1   1                  1                  …   1
 9   MATHEMATICS   1   2                  2                  …   2
10   WORK          1   1                  2                  …   2
11   SCIENTIFIC    2   1                  1                  …   2
12   KNOWLEDGE     2   1                  2                  …   2
 …   …             …   …                  …                  …   …
50   JOY           5   2                  1                  …   1
A visual example: bars
• pixel = word, image = document
• sample each pixel from a mixture of topics
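A sketch of how such data could be generated (the 5×5 grid size and the bar topics are assumptions for illustration; pixels play the role of words and images of documents):

```python
import numpy as np

size = 5
rng = np.random.default_rng(3)

# Topics: uniform distributions over the pixels of one horizontal or vertical bar.
bars = []
for r in range(size):                 # horizontal bars
    t = np.zeros((size, size)); t[r, :] = 1
    bars.append(t.ravel() / t.sum())
for c in range(size):                 # vertical bars
    t = np.zeros((size, size)); t[:, c] = 1
    bars.append(t.ravel() / t.sum())
topics = np.array(bars)

def sample_image(n_pixels=50, alpha=0.5):
    """Sample each pixel from a document-specific mixture of bar topics."""
    theta = rng.dirichlet(alpha * np.ones(len(topics)))
    z = rng.choice(len(topics), size=n_pixels, p=theta)
    pixels = [rng.choice(size * size, p=topics[zi]) for zi in z]
    return np.bincount(pixels, minlength=size * size).reshape(size, size)
```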
Effects of hyperparameters
• α and β control the relative sparsity of Θ and Φ:
  – smaller α: fewer topics per document
  – smaller β: fewer words per topic
• Good assignments z compromise between the two kinds of sparsity, trading off the P(w|z) and P(z) terms:

$$P(\mathbf{w} \mid \mathbf{z}) = \left( \frac{\Gamma(W\beta)}{\Gamma(\beta)^W} \right)^{T} \prod_{j=1}^{T} \frac{\prod_w \Gamma(n_w^{(j)} + \beta)}{\Gamma(n_\cdot^{(j)} + W\beta)} \qquad
P(\mathbf{z}) = \left( \frac{\Gamma(T\alpha)}{\Gamma(\alpha)^T} \right)^{D} \prod_{d=1}^{D} \frac{\prod_j \Gamma(n_j^{(d)} + \alpha)}{\Gamma(n_\cdot^{(d)} + T\alpha)}$$

[Plot: log Γ(x) against x. The convexity of log Γ is what makes concentrated (sparse) count configurations increase these probabilities.]
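A quick way to see the sparsity effect is to sample from symmetric Dirichlets at several hyperparameter values and count the components carrying appreciable mass; a sketch (the 5% threshold is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
for alpha in [10.0, 1.0, 0.1, 0.01]:
    theta = rng.dirichlet(alpha * np.ones(10), size=1000)
    # average number of the 10 components holding at least 5% of the mass
    print(alpha, (theta > 0.05).sum(axis=1).mean())
```

Smaller alpha yields fewer active components per draw, which is the "fewer topics per document" / "fewer words per topic" behavior described above.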
Analysis of PNAS abstracts
• Test topic models on a real database of scientific papers from PNAS
• All 28,154 abstracts from 1991-2001
• All words occurring in at least five abstracts and not on a "stop" list (20,551 word types)
• Total of 3,026,970 tokens in the corpus
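The vocabulary filtering step could look like the following sketch (function and variable names are hypothetical, not from the talk's pipeline):

```python
from collections import Counter

def build_vocab(abstracts, stopwords, min_docs=5):
    """Keep word types occurring in at least min_docs abstracts, minus stopwords."""
    doc_freq = Counter()
    for text in abstracts:
        doc_freq.update(set(text.lower().split()))   # count each word once per doc
    return sorted(w for w, df in doc_freq.items()
                  if df >= min_docs and w not in stopwords)
```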
A selection of topics (each list is one topic, words ordered by P(w|z)):
• FORCE, SURFACE, MOLECULES, SOLUTION, SURFACES, MICROSCOPY, WATER, FORCES, PARTICLES, STRENGTH, POLYMER, IONIC, ATOMIC, AQUEOUS, MOLECULAR, PROPERTIES, LIQUID, SOLUTIONS, BEADS, MECHANICAL
• HIV, VIRUS, INFECTED, IMMUNODEFICIENCY, CD4, INFECTION, HUMAN, VIRAL, TAT, GP120, REPLICATION, TYPE, ENVELOPE, AIDS, REV, BLOOD, CCR5, INDIVIDUALS, ENV, PERIPHERAL
• MUSCLE, CARDIAC, HEART, SKELETAL, MYOCYTES, VENTRICULAR, MUSCLES, SMOOTH, HYPERTROPHY, DYSTROPHIN, HEARTS, CONTRACTION, FIBERS, FUNCTION, TISSUE, RAT, MYOCARDIAL, ISOLATED, MYOD, FAILURE
• STRUCTURE, ANGSTROM, CRYSTAL, RESIDUES, STRUCTURES, STRUCTURAL, RESOLUTION, HELIX, THREE, HELICES, DETERMINED, RAY, CONFORMATION, HELICAL, HYDROPHOBIC, SIDE, DIMENSIONAL, INTERACTIONS, MOLECULE, SURFACE
• NEURONS, BRAIN, CORTEX, CORTICAL, OLFACTORY, NUCLEUS, NEURONAL, LAYER, RAT, NUCLEI, CEREBELLUM, CEREBELLAR, LATERAL, CEREBRAL, LAYERS, GRANULE, LABELED, HIPPOCAMPUS, AREAS, THALAMIC
• TUMOR, CANCER, TUMORS, HUMAN, CELLS, BREAST, MELANOMA, GROWTH, CARCINOMA, PROSTATE, NORMAL, CELL, METASTATIC, MALIGNANT, LUNG, CANCERS, MICE, NUDE, PRIMARY, OVARIAN
Cold topics, hot topics (topics whose prevalence fell or rose over 1991-2001)

Hot topics:
• Topic 2: SPECIES, GLOBAL, CLIMATE, CO2, WATER, ENVIRONMENTAL, YEARS, MARINE, CARBON, DIVERSITY, OCEAN, EXTINCTION, TERRESTRIAL, COMMUNITY, ABUNDANCE
• Topic 134: MICE, DEFICIENT, NORMAL, GENE, NULL, MOUSE, TYPE, HOMOZYGOUS, ROLE, KNOCKOUT, DEVELOPMENT, GENERATED, LACKING, ANIMALS, REDUCED
• Topic 179: APOPTOSIS, DEATH, CELL, INDUCED, BCL, CELLS, APOPTOTIC, CASPASE, FAS, SURVIVAL, PROGRAMMED, MEDIATED, INDUCTION, CERAMIDE, EXPRESSION

Cold topics:
• Topic 37: CDNA, AMINO, SEQUENCE, ACID, PROTEIN, ISOLATED, ENCODING, CLONED, ACIDS, IDENTITY, CLONE, EXPRESSED, ENCODES, RAT, HOMOLOGY
• Topic 289: KDA, PROTEIN, PURIFIED, MOLECULAR, MASS, CHROMATOGRAPHY, POLYPEPTIDE, GEL, SDS, BAND, APPARENT, LABELED, IDENTIFIED, FRACTION, DETECTED
• Topic 75: ANTIBODY, ANTIBODIES, MONOCLONAL, ANTIGEN, IGG, MAB, SPECIFIC, EPITOPE, HUMAN, MABS, RECOGNIZED, SERA, EPITOPES, DIRECTED, NEUTRALIZING
Outline
Topic models
Latent Dirichlet allocation
More complex models
Conclusions
Extensions to the basic model
• No need for a pre-segmented corpus
  – useful for modeling discussions, meetings

A model for meetings

θ(u) | su = 0 ~ Delta(θ(u−1))     keep the previous utterance's topic distribution
θ(u) | su = 1 ~ Dirichlet(α)      start a new segment with a fresh distribution
zi ~ Discrete(θ(u))
φ(j) ~ Dirichlet(β)
wi ~ Discrete(φ(zi))

[Plate diagram: zi and wi in a plate of size Nu, nested in a plate over U utterances, with a switching variable su linking θ(u−1) to θ(u); the φ(j) sit in a plate over T topics.]
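A sketch of the switching prior on the utterance-level topic distributions (the Bernoulli prior p_switch on su is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
T, U = 10, 30                       # topics, utterances
alpha, p_switch = 1.0, 0.2          # p_switch: assumed prior on s_u

theta = rng.dirichlet(alpha * np.ones(T))      # theta^(1)
thetas = [theta]
for u in range(1, U):
    if rng.random() < p_switch:                # s_u = 1: new segment
        theta = rng.dirichlet(alpha * np.ones(T))
    thetas.append(theta)                       # s_u = 0: Delta(theta^(u-1))
```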
Sample of the ICSI meeting corpus (25 meetings):
• no it's o_k.
• it's it'll work.
• well i can do that.
• but then i have to end the presentation in the middle so i can go back to open up javabayes.
• o_k fine.
• here let's see if i can.
• alright.
• very nice.
• is that better.
• yeah.
• o_k.
• uh i'll also get rid of this click to add notes.
• o_k. perfect
• NEW TOPIC (not supplied to algorithm)
• so then the features we decided or we decided we were talked about.
• right.
• uh the the prosody the discourse verb choice.
• you know we had a list of things like to go and to visit and what not.
• the landmark-iness of uh.
• i knew you'd like that.
• nice coinage.
• thank you.
Topic segmentation applied to meetings
[Figure: the inferred segmentation and inferred topics for a meeting]
Comparison with human judgments
Topics recovered are much more coherent than those found using random segmentation, no segmentation, or an HMM
Extensions to the basic model
• No need for a pre-segmented corpus
• Learning the number of topics
Learning the number of topics
• Can use standard Bayes factor methods to evaluate models of different dimensionality
  – e.g., importance sampling via MCMC
• Alternative: nonparametric Bayes
  – fixed number of topics per document, unbounded number of topics per corpus (Blei, Griffiths, Jordan, & Tenenbaum, 2004)
  – unbounded number of topics for both: the hierarchical Dirichlet process (Teh, Jordan, Beal, & Blei, 2004)
Extensions to the basic model
• No need for a pre-segmented corpus
• Learning the number of topics
• Learning topic hierarchies
Learning topic hierarchies
• Fixed hierarchies: Hofmann & Puzicha (1998)
• Learning hierarchies: Blei et al. (2004)

[Tree diagram: Topic 0 at the root; Topics 1.1 and 1.2 at the middle level; Topics 2.1, 2.2, 2.3 at the leaves.]

The topics in each document form a path from root to leaf; a sketch of this constraint follows.
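A sketch of the path constraint over a fixed, hypothetical tree (in the learned-hierarchy model of Blei et al. the path is drawn from a nested Chinese restaurant process; the uniform child choice here is a simplification):

```python
import numpy as np

rng = np.random.default_rng(6)
tree = {"Topic 0": ["Topic 1.1", "Topic 1.2"],
        "Topic 1.1": ["Topic 2.1", "Topic 2.2"],
        "Topic 1.2": ["Topic 2.3"]}

def sample_path(node="Topic 0"):
    """Walk from the root to a leaf; a document mixes over exactly these topics."""
    path = [node]
    while node in tree:
        node = tree[node][rng.integers(len(tree[node]))]
        path.append(node)
    return path

print(sample_path())   # e.g. ['Topic 0', 'Topic 1.1', 'Topic 2.2']
```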
Twelve years of NIPS
(Blei, Griffiths, Jordan, & Tenenbaum, 2004)
Extensions to the basic model
• No need for a pre-segmented corpus
• Learning the number of topics
• Learning topic hierarchies
• Modeling authors as well as documents
The author-topic model (Rosen-Zvi, Griffiths, Smyth, & Steyvers, 2004)

xi ~ Uniform(A(d))       the author of each word is chosen uniformly at random from the document's authors
θ(a) ~ Dirichlet(α)      each author has a distribution over topics
zi ~ Discrete(θ(xi))     topic chosen from the selected author's distribution
φ(j) ~ Dirichlet(β)
wi ~ Discrete(φ(zi))

[Plate diagram: xi, zi, and wi in a plate of size Nd nested in a plate over D documents; θ(a) in a plate over A authors; φ(j) in a plate over T topics.]
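A sketch of this generative process (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
A, T, W, N_d = 3, 4, 20, 15
alpha, beta = 1.0, 0.1

theta = rng.dirichlet(alpha * np.ones(T), size=A)   # theta^(a): one row per author
phi = rng.dirichlet(beta * np.ones(W), size=T)      # phi^(j): one row per topic

def generate(authors_d):
    """authors_d: author ids on the document. Returns (x, z, w) per word."""
    x = rng.choice(authors_d, size=N_d)                    # x_i ~ Uniform(A^(d))
    z = np.array([rng.choice(T, p=theta[a]) for a in x])   # z_i ~ Discrete(theta^(x_i))
    w = np.array([rng.choice(W, p=phi[j]) for j in z])     # w_i ~ Discrete(phi^(z_i))
    return x, z, w
```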
Four example topics from NIPS

TOPIC 19                  TOPIC 24                   TOPIC 29                       TOPIC 87
WORD        PROB.         WORD         PROB.         WORD            PROB.          WORD        PROB.
LIKELIHOOD  0.0539        RECOGNITION  0.0400        REINFORCEMENT   0.0411         KERNEL      0.0683
MIXTURE     0.0509        CHARACTER    0.0336        POLICY          0.0371         SUPPORT     0.0377
EM          0.0470        CHARACTERS   0.0250        ACTION          0.0332         VECTOR      0.0257
DENSITY     0.0398        TANGENT      0.0241        OPTIMAL         0.0208         KERNELS     0.0217
GAUSSIAN    0.0349        HANDWRITTEN  0.0169        ACTIONS         0.0208         SET         0.0205
ESTIMATION  0.0314        DIGITS       0.0159        FUNCTION        0.0178         SVM         0.0204
LOG         0.0263        IMAGE        0.0157        REWARD          0.0165         SPACE       0.0188
MAXIMUM     0.0254        DISTANCE     0.0153        SUTTON          0.0164         MACHINES    0.0168
PARAMETERS  0.0209        DIGIT        0.0149        AGENT           0.0136         REGRESSION  0.0155
ESTIMATE    0.0204        HAND         0.0126        DECISION        0.0118         MARGIN      0.0151

AUTHOR        PROB.       AUTHOR        PROB.        AUTHOR          PROB.          AUTHOR        PROB.
Tresp_V       0.0333      Simard_P      0.0694       Singh_S         0.1412         Smola_A       0.1033
Singer_Y      0.0281      Martin_G      0.0394       Barto_A         0.0471         Scholkopf_B   0.0730
Jebara_T      0.0207      LeCun_Y       0.0359       Sutton_R        0.0430         Burges_C      0.0489
Ghahramani_Z  0.0196      Denker_J      0.0278       Dayan_P         0.0324         Vapnik_V      0.0431
Ueda_N        0.0170      Henderson_D   0.0256       Parr_R          0.0314         Chapelle_O    0.0210
Jordan_M      0.0150      Revow_M       0.0229       Dietterich_T    0.0231         Cristianini_N 0.0185
Roweis_S      0.0123      Platt_J       0.0226       Tsitsiklis_J    0.0194         Ratsch_G      0.0172
Schuster_M    0.0104      Keeler_J      0.0192       Randlov_J       0.0167         Laskov_P      0.0169
Xu_L          0.0098      Rashid_M      0.0182       Bradtke_S       0.0161         Tipping_M     0.0153
Saul_L        0.0094      Sackinger_E   0.0132       Schwartz_A      0.0142         Sollich_P     0.0141
Who wrote what? (Superscripts show the author inferred for each word: 1 = Scholkopf_B, 2 = Darwiche_A.)

A method1 is described which like the kernel1 trick1 in support1 vector1 machines1 SVMs1 lets us generalize distance1 based2 algorithms to operate in feature1 spaces usually nonlinearly related to the input1 space This is done by identifying a class of kernels1 which can be represented as norm1 based2 distances1 in Hilbert spaces It turns1 out that common kernel1 algorithms such as SVMs1 and kernel1 PCA1 are actually really distance1 based2 algorithms and can be run2 with that class of kernels1 too As well as providing1 a useful new insight1 into how these algorithms work the present2 work can form the basis1 for conceiving new algorithms
This paper presents2 a comprehensive approach for model2 based2 diagnosis2 which includes proposals for characterizing and computing2 preferred2 diagnoses2 assuming that the system2 description2 is augmented with a system2 structure2 a directed2 graph2 explicating the interconnections between system2 components2 Specifically we first introduce the notion of a consequence2 which is a syntactically2 unconstrained propositional2 sentence2 that characterizes all consistency2 based2 diagnoses2 and show2 that standard2 characterizations of diagnoses2 such as minimal conflicts1 correspond to syntactic2 variations1 on a consequence2 Second we propose a new syntactic2 variation on the consequence2 known as negation2 normal form NNF and discuss its merits compared to standard variations Third we introduce a basic algorithm2 for computing consequences in NNF given a structured system2 description We show that if the system2 structure2 does not contain cycles2 then there is always a linear size2 consequence2 in NNF which can be computed in linear time2 For arbitrary1 system2 structures2 we show a precise connection between the complexity2 of computing2 consequences and the topology of the underlying system2 structure2 Finally we present2 an algorithm2 that enumerates2 the preferred2 diagnoses2 characterized by a consequence2 The algorithm2 is shown1 to take linear time2 in the size2 of the consequence2 if the preference criterion1 satisfies some general conditions
Written by (1): Scholkopf_B
Written by (2): Darwiche_A
Extensions to the basic model
• No need for a pre-segmented corpus
• Learning the number of topics
• Learning topic hierarchies
• Modeling authors as well as documents
• Combining topics and syntax
Combining topics and syntax

[Graphical model: an HMM chain of syntactic classes x1 → x2 → x3, each emitting a word w; one designated class emits its word from a topic z drawn from the document's topic distribution.]

semantics: probabilistic topics · syntax: probabilistic regular grammar

Factorization of language based on statistical dependency patterns:
• long-range, document-specific dependencies → topics
• short-range dependencies constant across all documents → syntactic classes
(Griffiths, Steyvers, Blei, & Tenenbaum, 2005)
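A sketch of the composite generative process (sizes hypothetical; class 0 stands in for the designated semantic class):

```python
import numpy as np

rng = np.random.default_rng(8)
C, T, W, SEM = 4, 5, 50, 0            # classes, topics, vocab; class 0 is semantic

trans = rng.dirichlet(np.ones(C), size=C)         # HMM transitions between classes
class_dist = rng.dirichlet(np.ones(W), size=C)    # word dists for syntactic classes
phi = rng.dirichlet(np.ones(W), size=T)           # word dists for topics

def generate(theta, n=20):
    """theta: the document's topic distribution. Returns a word-id sequence."""
    words, x = [], 0
    for _ in range(n):
        x = rng.choice(C, p=trans[x])             # short-range: syntactic chain
        if x == SEM:                              # long-range: topic emission
            z = rng.choice(T, p=theta)
            words.append(rng.choice(W, p=phi[z]))
        else:
            words.append(rng.choice(W, p=class_dist[x]))
    return words
```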
Semantic topics (each list is one topic):
• FOOD, FOODS, BODY, NUTRIENTS, DIET, FAT, SUGAR, ENERGY, MILK, EATING, FRUITS, VEGETABLES, WEIGHT, FATS, NEEDS, CARBOHYDRATES, VITAMINS, CALORIES, PROTEIN, MINERALS
• MAP, NORTH, EARTH, SOUTH, POLE, MAPS, EQUATOR, WEST, LINES, EAST, AUSTRALIA, GLOBE, POLES, HEMISPHERE, LATITUDE, PLACES, LAND, WORLD, COMPASS, CONTINENTS
• DOCTOR, PATIENT, HEALTH, HOSPITAL, MEDICAL, CARE, PATIENTS, NURSE, DOCTORS, MEDICINE, NURSING, TREATMENT, NURSES, PHYSICIAN, HOSPITALS, DR, SICK, ASSISTANT, EMERGENCY, PRACTICE
• BOOK, BOOKS, READING, INFORMATION, LIBRARY, REPORT, PAGE, TITLE, SUBJECT, PAGES, GUIDE, WORDS, MATERIAL, ARTICLE, ARTICLES, WORD, FACTS, AUTHOR, REFERENCE, NOTE
• GOLD, IRON, SILVER, COPPER, METAL, METALS, STEEL, CLAY, LEAD, ADAM, ORE, ALUMINUM, MINERAL, MINE, STONE, MINERALS, POT, MINING, MINERS, TIN
• BEHAVIOR, SELF, INDIVIDUAL, PERSONALITY, RESPONSE, SOCIAL, EMOTIONAL, LEARNING, FEELINGS, PSYCHOLOGISTS, INDIVIDUALS, PSYCHOLOGICAL, EXPERIENCES, ENVIRONMENT, HUMAN, RESPONSES, BEHAVIORS, ATTITUDES, PSYCHOLOGY, PERSON
• CELLS, CELL, ORGANISMS, ALGAE, BACTERIA, MICROSCOPE, MEMBRANE, ORGANISM, FOOD, LIVING, FUNGI, MOLD, MATERIALS, NUCLEUS, CELLED, STRUCTURES, MATERIAL, STRUCTURE, GREEN, MOLDS
• PLANTS, PLANT, LEAVES, SEEDS, SOIL, ROOTS, FLOWERS, WATER, FOOD, GREEN, SEED, STEMS, FLOWER, STEM, LEAF, ANIMALS, ROOT, POLLEN, GROWING, GROW

Syntactic classes (each list is one class):
• GOOD, SMALL, NEW, IMPORTANT, GREAT, LITTLE, LARGE, *, BIG, LONG, HIGH, DIFFERENT, SPECIAL, OLD, STRONG, YOUNG, COMMON, WHITE, SINGLE, CERTAIN
• THE, HIS, THEIR, YOUR, HER, ITS, MY, OUR, THIS, THESE, A, AN, THAT, NEW, THOSE, EACH, MR, ANY, MRS, ALL
• MORE, SUCH, LESS, MUCH, KNOWN, JUST, BETTER, RATHER, GREATER, HIGHER, LARGER, LONGER, FASTER, EXACTLY, SMALLER, SOMETHING, BIGGER, FEWER, LOWER, ALMOST
• ON, AT, INTO, FROM, WITH, THROUGH, OVER, AROUND, AGAINST, ACROSS, UPON, TOWARD, UNDER, ALONG, NEAR, BEHIND, OFF, ABOVE, DOWN, BEFORE
• SAID, ASKED, THOUGHT, TOLD, SAYS, MEANS, CALLED, CRIED, SHOWS, ANSWERED, TELLS, REPLIED, SHOUTED, EXPLAINED, LAUGHED, MEANT, WROTE, SHOWED, BELIEVED, WHISPERED
• ONE, SOME, MANY, TWO, EACH, ALL, MOST, ANY, THREE, THIS, EVERY, SEVERAL, FOUR, FIVE, BOTH, TEN, SIX, MUCH, TWENTY, EIGHT
• HE, YOU, THEY, I, SHE, WE, IT, PEOPLE, EVERYONE, OTHERS, SCIENTISTS, SOMEONE, WHO, NOBODY, ONE, SOMETHING, ANYONE, EVERYBODY, SOME, THEN
• BE, MAKE, GET, HAVE, GO, TAKE, DO, FIND, USE, SEE, HELP, KEEP, GIVE, LOOK, COME, WORK, MOVE, LIVE, EAT, BECOME
Outline
Topic models
Latent Dirichlet allocation
More complex models
Conclusions
Conclusions
• Topic models are a flexible class of models that can capture the content of documents
• Dirichlet priors can be used to assert a preference for sparsity in multinomials
• The collapsed Gibbs sampler makes it possible to exploit these priors, is simple to use, is easy to extend to other aspects of language, and is applicable in a broad range of models
LSI vs. LDA as matrix factorizations

LSI (Dumais, Landauer): the words × documents matrix is factorized as U D V — a words × dims matrix of vectors, a dims × dims matrix, and a dims × documents matrix.

LDA: the words × documents matrix P(w) is factorized as P(w|z) (words × topics) times P(z) (topics × documents).
NIPS semantics and NIPS syntax (Griffiths, Steyvers, Blei, & Tenenbaum, 2005)

Semantic topics include:
• MODEL, ALGORITHM, SYSTEM, CASE, PROBLEM, NETWORK, METHOD, APPROACH, PAPER, PROCESS
• EXPERTS, EXPERT, GATING, HME, ARCHITECTURE, MIXTURE, LEARNING, MIXTURES, FUNCTION, GATE
• DATA, GAUSSIAN, MIXTURE, LIKELIHOOD, POSTERIOR, PRIOR, DISTRIBUTION, EM, BAYESIAN, PARAMETERS
• STATE, POLICY, VALUE, FUNCTION, ACTION, REINFORCEMENT, LEARNING, CLASSES, OPTIMAL, *
• MEMBRANE, SYNAPTIC, CELL, *, CURRENT, DENDRITIC, POTENTIAL, NEURON, CONDUCTANCE, CHANNELS
• IMAGE, IMAGES, OBJECT, OBJECTS, FEATURE, RECOGNITION, VIEWS, #, PIXEL, VISUAL
• the support vector topic: KERNEL, SUPPORT, VECTOR, SVM, KERNELS, #, SPACE, FUNCTION, MACHINES, SET
• the neural network topic: NETWORK, NEURAL, NETWORKS, OUTPUT, INPUT, TRAINING, INPUTS, WEIGHTS, #, OUTPUTS

Syntactic classes include:
• IS, WAS, HAS, BECOMES, DENOTES, BEING, REMAINS, REPRESENTS, EXISTS, SEEMS
• SEE, SHOW, NOTE, CONSIDER, ASSUME, PRESENT, NEED, PROPOSE, DESCRIBE, SUGGEST
• USED, TRAINED, OBTAINED, DESCRIBED, GIVEN, FOUND, PRESENTED, DEFINED, GENERATED, SHOWN
• IN, WITH, FOR, ON, FROM, AT, USING, INTO, OVER, WITHIN
• HOWEVER, ALSO, THEN, THUS, THEREFORE, FIRST, HERE, NOW, HENCE, FINALLY
• #, *, I, X, T, N, -, C, F, P
Varying α: decreasing α increases the sparsity of Θ (fewer topics per document).

Varying β: decreasing β increases the sparsity of Φ. [Plot: number of words per topic, by topic, for different values of β.]