Bayesian topic models
Tom Griffiths
UC Berkeley
Latent structure → (probabilistic process) → Observed data
Latent structure (meaning) → (probabilistic process) → Observed data (words)
Observed data (words) → (statistical inference) → Latent structure (meaning)
Outline
Topic models
Latent Dirichlet allocation
More complex models
Conclusions
Topic models
• Hierarchical Bayesian models
  – share parameters (topics) across densities
• Define a generative model for documents
  – each document is a mixture of topics
  – each word is chosen from a single topic
• An idea that is widely used in NLP
  – e.g., Bigi et al., 1997; Blei et al., 2003; Hofmann, 1999; Iyer & Ostendorf, 1996; Ueda & Saito, 2003
$$P(w_i) = \sum_{j=1}^{T} P(w_i \mid z_i = j)\, P(z_i = j)$$

(the topics $P(w_i \mid z_i = j)$ are shared across documents; the weights $P(z_i = j)$ vary by document)
A generative model for documents
w            P(w|z = 1)   P(w|z = 2)
HEART           0.2          0.0
LOVE            0.2          0.0
SOUL            0.2          0.0
TEARS           0.2          0.0
JOY             0.2          0.0
SCIENTIFIC      0.0          0.2
KNOWLEDGE       0.0          0.2
WORK            0.0          0.2
RESEARCH        0.0          0.2
MATHEMATICS     0.0          0.2
               (topic 1)    (topic 2)

Choose mixture weights for each document, then generate a "bag of words":

{P(z = 1), P(z = 2)} = {0, 1}:
  MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK
{0.25, 0.75}:
  SCIENTIFIC KNOWLEDGE MATHEMATICS SCIENTIFIC HEART LOVE TEARS KNOWLEDGE HEART
{0.5, 0.5}:
  MATHEMATICS HEART RESEARCH LOVE MATHEMATICS WORK TEARS SOUL KNOWLEDGE HEART
{0.75, 0.25}:
  WORK JOY SOUL TEARS MATHEMATICS TEARS LOVE LOVE LOVE SOUL
{1, 0}:
  TEARS LOVE JOY SOUL LOVE TEARS SOUL SOUL TEARS JOY
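This process is easy to state in code. A minimal sketch in Python (numpy assumed): the two topic distributions and mixture weights are the ones above, and each word is generated by first choosing a topic, then choosing a word from that topic.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["HEART", "LOVE", "SOUL", "TEARS", "JOY",
         "SCIENTIFIC", "KNOWLEDGE", "WORK", "RESEARCH", "MATHEMATICS"]

# P(w|z): each row is one topic's distribution over the vocabulary.
topics = np.array([
    [0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0],  # topic 1
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2, 0.2, 0.2],  # topic 2
])

def generate_document(weights, n_words=10):
    """Generate a bag of words: choose a topic per word, then a word from it."""
    words = []
    for _ in range(n_words):
        z = rng.choice(len(topics), p=weights)   # z_i ~ P(z)
        w = rng.choice(len(vocab), p=topics[z])  # w_i ~ P(w|z_i)
        words.append(vocab[w])
    return words

for weights in [[0.0, 1.0], [0.25, 0.75], [0.5, 0.5], [0.75, 0.25], [1.0, 0.0]]:
    print(weights, " ".join(generate_document(np.array(weights))))
```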
Geometric interpretation

[Figure: the probability simplex over three words, with axes P(LOVE), P(TEARS), and P(RESEARCH), each running from 0 to 1. Each topic P(w|z) is a point on the simplex; because a document is a mixture of topics, its word distribution lies on the segment between TOPIC 1 and TOPIC 2.]

[Figure: with three topics P(w|z = 1), P(w|z = 2), P(w|z = 3), documents live in the sub-simplex spanned by the topics. Estimation tries to minimize the KL divergence between each document's empirical distribution P(w) and its minimum-KL projection onto that sub-simplex (Hofmann, 1999).]
Matrix factorization interpretation

[Figure: the words × documents matrix P(w) factorizes into a words × topics matrix P(w|z) times a topics × documents matrix P(z).]

Maximum-likelihood estimation is finding the factorization that minimizes KL divergence (Hofmann, 1999).
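As a sanity check on this picture, here is a short sketch (with hypothetical sizes) verifying that the product of a column-stochastic words × topics matrix and a column-stochastic topics × documents matrix yields a valid word distribution for each document:

```python
import numpy as np

W, T, D = 10, 2, 5                                  # words, topics, documents
rng = np.random.default_rng(1)

p_w_given_z = rng.dirichlet(np.ones(W), size=T).T   # W x T, each column sums to 1
p_z = rng.dirichlet(np.ones(T), size=D).T           # T x D, each column sums to 1

p_w = p_w_given_z @ p_z                             # W x D matrix of P(w) per document
assert np.allclose(p_w.sum(axis=0), 1.0)            # each column is a distribution
```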
Three nice properties of topic models
• Interpretability of topics
  – useful in many applications

Interpretable topics (each list below is one topic, words ordered by P(w|z)):
• STORY, STORIES, TELL, CHARACTER, CHARACTERS, AUTHOR, READ, TOLD, SETTING, TALES, PLOT, TELLING, SHORT, FICTION, ACTION, TRUE, EVENTS, TELLS, TALE, NOVEL
• MIND, WORLD, DREAM, DREAMS, THOUGHT, IMAGINATION, MOMENT, THOUGHTS, OWN, REAL, LIFE, IMAGINE, SENSE, CONSCIOUSNESS, STRANGE, FEELING, WHOLE, BEING, MIGHT, HOPE
• WATER, FISH, SEA, SWIM, SWIMMING, POOL, LIKE, SHELL, SHARK, TANK, SHELLS, SHARKS, DIVING, DOLPHINS, SWAM, LONG, SEAL, DIVE, DOLPHIN, UNDERWATER
• DISEASE, BACTERIA, DISEASES, GERMS, FEVER, CAUSE, CAUSED, SPREAD, VIRUSES, INFECTION, VIRUS, MICROORGANISMS, PERSON, INFECTIOUS, COMMON, CAUSING, SMALLPOX, BODY, INFECTIONS, CERTAIN
• FIELD, MAGNETIC, MAGNET, WIRE, NEEDLE, CURRENT, COIL, POLES, IRON, COMPASS, LINES, CORE, ELECTRIC, DIRECTION, FORCE, MAGNETS, BE, MAGNETISM, POLE, INDUCED
• SCIENCE, STUDY, SCIENTISTS, SCIENTIFIC, KNOWLEDGE, WORK, RESEARCH, CHEMISTRY, TECHNOLOGY, MANY, MATHEMATICS, BIOLOGY, FIELD, PHYSICS, LABORATORY, STUDIES, WORLD, SCIENTIST, STUDYING, SCIENCES
• BALL, GAME, TEAM, FOOTBALL, BASEBALL, PLAYERS, PLAY, FIELD, PLAYER, BASKETBALL, COACH, PLAYED, PLAYING, HIT, TENNIS, TEAMS, GAMES, SPORTS, BAT, TERRY
• JOB, WORK, JOBS, CAREER, EXPERIENCE, EMPLOYMENT, OPPORTUNITIES, WORKING, TRAINING, SKILLS, CAREERS, POSITIONS, FIND, POSITION, FIELD, OCCUPATIONS, REQUIRE, OPPORTUNITY, EARN, ABLE
Three nice properties of topic models
• Interpretability of topics
  – useful in many applications
• Handling of multiple meanings or senses
  – better than spatial representations (e.g., LSI)

Handling multiple senses (same topics as above, each ordered by P(w|z)): a polysemous word such as FIELD appears with high probability in several topics at once — here the magnetism topic, the science topic, and the jobs topic.
Three nice properties of topic models
• Interpretability of topics
  – useful in many applications
• Handling of multiple meanings or senses
  – better than spatial representations (e.g., LSI)
• Well-formulated generative models
  – many options for inferring topics from documents
  – supports extensions of the basic model
Outline
Topic models
Latent Dirichlet allocation
More complex models
Conclusions
Latent Dirichlet allocation (Blei, Ng, & Jordan, 2001; 2003)

θ(d) ~ Dirichlet(α)      distribution over topics for each document
zi ~ Discrete(θ(d))      topic assignment for each word
φ(j) ~ Dirichlet(β)      distribution over words for each topic, j = 1, …, T
wi ~ Discrete(φ(zi))     word generated from its assigned topic

[Plate diagram: zi and wi sit inside a plate of size Nd, nested in a plate over D documents; the φ(j) sit in a plate over T topics.]
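A sketch of this generative process in Python with numpy (the sizes and hyperparameter values are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(2)
T, W, D, N_d = 2, 10, 5, 20          # topics, vocabulary size, documents, words/doc
alpha, beta = 1.0, 1.0               # symmetric hyperparameters

phi = rng.dirichlet(beta * np.ones(W), size=T)        # phi^(j) ~ Dirichlet(beta)
corpus = []
for d in range(D):
    theta = rng.dirichlet(alpha * np.ones(T))         # theta^(d) ~ Dirichlet(alpha)
    z = rng.choice(T, size=N_d, p=theta)              # z_i ~ Discrete(theta^(d))
    w = np.array([rng.choice(W, p=phi[zi]) for zi in z])  # w_i ~ Discrete(phi^(z_i))
    corpus.append(w)
```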
Dirichlet priors
• Multivariate equivalent of the Beta distribution
• Hyperparameters determine the form of the prior; for a symmetric Dirichlet with parameter α:

$$p(\theta) = \frac{\Gamma(T\alpha)}{\Gamma(\alpha)^T} \prod_{j=1}^{T} \theta_j^{\alpha - 1}$$
Relationship to other models
• A fully generative aspect model (Hofmann, 1999)
• A multinomial version of PCA (Buntine, 2002)
• Poisson models:
  – non-negative matrix factorization (Lee & Seung, 1999)
  – GaP model (Canny, 2004)
  – discussed in detail in Buntine & Jakulin (2005)
• Statistics: grade of membership (GoM) models (Erosheva, 2002)
• Genetics: structure of populations (Pritchard, Stephens, & Donnelly, 2000)
Inverting the generative model
• Maximum-likelihood estimation (EM)
  – e.g., Hofmann (1999)
• Deterministic approximate algorithms
  – variational EM: Blei, Ng, & Jordan (2001; 2003)
  – expectation propagation: Minka & Lafferty (2002)
• Markov chain Monte Carlo
  – full Gibbs sampler: Pritchard et al. (2000)
  – collapsed Gibbs sampler: Griffiths & Steyvers (2004)
The collapsed Gibbs sampler
• Using the conjugacy of the Dirichlet and multinomial distributions, integrate out the continuous parameters:

$$P(\mathbf{w} \mid \mathbf{z}) = \int_{\Delta^{TW}} P(\mathbf{w} \mid \mathbf{z}, \Phi)\, p(\Phi)\, d\Phi = \left( \frac{\Gamma(W\beta)}{\Gamma(\beta)^W} \right)^{T} \prod_{j=1}^{T} \frac{\prod_w \Gamma(n_w^{(j)} + \beta)}{\Gamma(n_\cdot^{(j)} + W\beta)}$$

$$P(\mathbf{z}) = \int_{\Delta^{DT}} P(\mathbf{z} \mid \Theta)\, p(\Theta)\, d\Theta = \left( \frac{\Gamma(T\alpha)}{\Gamma(\alpha)^T} \right)^{D} \prod_{d=1}^{D} \frac{\prod_j \Gamma(n_j^{(d)} + \alpha)}{\Gamma(n_\cdot^{(d)} + T\alpha)}$$

• This defines a distribution over discrete ensembles z:

$$P(\mathbf{z} \mid \mathbf{w}) = \frac{P(\mathbf{w} \mid \mathbf{z})\, P(\mathbf{z})}{\sum_{\mathbf{z}} P(\mathbf{w} \mid \mathbf{z})\, P(\mathbf{z})}$$

Here $n_w^{(j)}$ counts how often word w is assigned to topic j, $n_j^{(d)}$ counts how often topic j occurs in document d, and a dot denotes summation over that index.
The collapsed Gibbs sampler
• Sample each zi conditioned on z−i:

$$P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n_{w_i}^{(j)} + \beta}{n_\cdot^{(j)} + W\beta} \cdot \frac{n_j^{(d_i)} + \alpha}{n_\cdot^{(d_i)} + T\alpha}$$

(all counts taken with token i removed)

• This is nicer than your average Gibbs sampler:
  – memory: counts can be cached in two sparse matrices
  – optimization: no special functions, simple arithmetic
  – the distributions on Φ and Θ are analytic given z and w, and can later be found for each sample
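A compact sketch of the sampler implementing this update (Python with numpy; dense count matrices for brevity where the slide suggests sparse ones, and hyperparameter defaults chosen for illustration):

```python
import numpy as np

def lda_gibbs(docs, W, T, alpha=1.0, beta=0.01, iters=1000, seed=0):
    """docs: list of integer arrays of word ids. Returns topic assignments z."""
    rng = np.random.default_rng(seed)
    z = [rng.integers(T, size=len(doc)) for doc in docs]   # random initialization
    n_wt = np.zeros((W, T))                 # n_w^(j): word-topic counts
    n_dt = np.zeros((len(docs), T))         # n_j^(d): document-topic counts
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            n_wt[w, z[d][i]] += 1
            n_dt[d, z[d][i]] += 1
    n_t = n_wt.sum(axis=0)                  # n_.^(j): tokens per topic
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                j = z[d][i]                 # remove token i from the counts
                n_wt[w, j] -= 1; n_dt[d, j] -= 1; n_t[j] -= 1
                # conditional from the slide; the document denominator
                # n_.^(d) + T*alpha is constant in j, so it cancels
                p = (n_wt[w] + beta) / (n_t + W * beta) * (n_dt[d] + alpha)
                j = rng.choice(T, p=p / p.sum())
                z[d][i] = j                 # record and restore the counts
                n_wt[w, j] += 1; n_dt[d, j] += 1; n_t[j] += 1
    return z
```

Note how the only arithmetic is additions, multiplications, and one normalization per token: no special functions are needed, exactly as the slide claims.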
Gibbs sampling in LDA

[Animation: the sampler sweeps through the tokens one at a time; at each step the current zi is removed from the counts (shown as "?"), resampled from its conditional distribution, and added back. Initial and later states of the chain:]

 i   wi           di   zi (iteration 1)   zi (iteration 2)   …   zi (iteration 1000)
 1   MATHEMATICS   1   2                  2                  …   2
 2   KNOWLEDGE     1   2                  1                  …   2
 3   RESEARCH      1   1                  1                  …   2
 4   WORK          1   2                  2                  …   1
 5   MATHEMATICS   1   1                  2                  …   2
 6   RESEARCH      1   2                  2                  …   2
 7   WORK          1   2                  2                  …   2
 8   SCIENTIFIC    1   1                  1                  …   1
 9   MATHEMATICS   1   2                  2                  …   2
10   WORK          1   1                  2                  …   2
11   SCIENTIFIC    2   1                  1                  …   2
12   KNOWLEDGE     2   1                  2                  …   2
 …   …             …   …                  …                  …   …
50   JOY           5   2                  1                  …   1
A visual example: bars
• pixel = word, image = document
• sample each pixel from a mixture of topics
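A sketch of how such data could be generated (the 5×5 grid size and the bar topics are assumptions for illustration; pixels play the role of words and images of documents):

```python
import numpy as np

size = 5
rng = np.random.default_rng(3)

# Topics: uniform distributions over the pixels of one horizontal or vertical bar.
bars = []
for r in range(size):                 # horizontal bars
    t = np.zeros((size, size)); t[r, :] = 1
    bars.append(t.ravel() / t.sum())
for c in range(size):                 # vertical bars
    t = np.zeros((size, size)); t[:, c] = 1
    bars.append(t.ravel() / t.sum())
topics = np.array(bars)

def sample_image(n_pixels=50, alpha=0.5):
    """Sample each pixel from a document-specific mixture of bar topics."""
    theta = rng.dirichlet(alpha * np.ones(len(topics)))
    z = rng.choice(len(topics), size=n_pixels, p=theta)
    pixels = [rng.choice(size * size, p=topics[zi]) for zi in z]
    return np.bincount(pixels, minlength=size * size).reshape(size, size)
```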
Effects of hyperparameters
• α and β control the relative sparsity of Θ and Φ:
  – smaller α: fewer topics per document
  – smaller β: fewer words per topic
• Good assignments z compromise between the two kinds of sparsity, trading off the P(w|z) and P(z) terms:

$$P(\mathbf{w} \mid \mathbf{z}) = \left( \frac{\Gamma(W\beta)}{\Gamma(\beta)^W} \right)^{T} \prod_{j=1}^{T} \frac{\prod_w \Gamma(n_w^{(j)} + \beta)}{\Gamma(n_\cdot^{(j)} + W\beta)} \qquad
P(\mathbf{z}) = \left( \frac{\Gamma(T\alpha)}{\Gamma(\alpha)^T} \right)^{D} \prod_{d=1}^{D} \frac{\prod_j \Gamma(n_j^{(d)} + \alpha)}{\Gamma(n_\cdot^{(d)} + T\alpha)}$$

[Plot: log Γ(x) against x. The convexity of log Γ is what makes concentrated (sparse) count configurations increase these probabilities.]
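A quick way to see the sparsity effect is to sample from symmetric Dirichlets at several hyperparameter values and count the components carrying appreciable mass; a sketch (the 5% threshold is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
for alpha in [10.0, 1.0, 0.1, 0.01]:
    theta = rng.dirichlet(alpha * np.ones(10), size=1000)
    # average number of the 10 components holding at least 5% of the mass
    print(alpha, (theta > 0.05).sum(axis=1).mean())
```

Smaller alpha yields fewer active components per draw, which is the "fewer topics per document" / "fewer words per topic" behavior described above.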
Analysis of PNAS abstracts
• Test topic models on a real database of scientific papers from PNAS
• All 28,154 abstracts from 1991-2001
• All words occurring in at least five abstracts and not on a "stop" list (20,551 word types)
• Total of 3,026,970 tokens in the corpus
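The vocabulary filtering step could look like the following sketch (function and variable names are hypothetical, not from the talk's pipeline):

```python
from collections import Counter

def build_vocab(abstracts, stopwords, min_docs=5):
    """Keep word types occurring in at least min_docs abstracts, minus stopwords."""
    doc_freq = Counter()
    for text in abstracts:
        doc_freq.update(set(text.lower().split()))   # count each word once per doc
    return sorted(w for w, df in doc_freq.items()
                  if df >= min_docs and w not in stopwords)
```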
A selection of topics (each list is one topic, words ordered by P(w|z)):
• FORCE, SURFACE, MOLECULES, SOLUTION, SURFACES, MICROSCOPY, WATER, FORCES, PARTICLES, STRENGTH, POLYMER, IONIC, ATOMIC, AQUEOUS, MOLECULAR, PROPERTIES, LIQUID, SOLUTIONS, BEADS, MECHANICAL
• HIV, VIRUS, INFECTED, IMMUNODEFICIENCY, CD4, INFECTION, HUMAN, VIRAL, TAT, GP120, REPLICATION, TYPE, ENVELOPE, AIDS, REV, BLOOD, CCR5, INDIVIDUALS, ENV, PERIPHERAL
• MUSCLE, CARDIAC, HEART, SKELETAL, MYOCYTES, VENTRICULAR, MUSCLES, SMOOTH, HYPERTROPHY, DYSTROPHIN, HEARTS, CONTRACTION, FIBERS, FUNCTION, TISSUE, RAT, MYOCARDIAL, ISOLATED, MYOD, FAILURE
• STRUCTURE, ANGSTROM, CRYSTAL, RESIDUES, STRUCTURES, STRUCTURAL, RESOLUTION, HELIX, THREE, HELICES, DETERMINED, RAY, CONFORMATION, HELICAL, HYDROPHOBIC, SIDE, DIMENSIONAL, INTERACTIONS, MOLECULE, SURFACE
• NEURONS, BRAIN, CORTEX, CORTICAL, OLFACTORY, NUCLEUS, NEURONAL, LAYER, RAT, NUCLEI, CEREBELLUM, CEREBELLAR, LATERAL, CEREBRAL, LAYERS, GRANULE, LABELED, HIPPOCAMPUS, AREAS, THALAMIC
• TUMOR, CANCER, TUMORS, HUMAN, CELLS, BREAST, MELANOMA, GROWTH, CARCINOMA, PROSTATE, NORMAL, CELL, METASTATIC, MALIGNANT, LUNG, CANCERS, MICE, NUDE, PRIMARY, OVARIAN
Cold topics, hot topics (topics whose prevalence fell or rose over 1991-2001)

Hot topics:
• Topic 2: SPECIES, GLOBAL, CLIMATE, CO2, WATER, ENVIRONMENTAL, YEARS, MARINE, CARBON, DIVERSITY, OCEAN, EXTINCTION, TERRESTRIAL, COMMUNITY, ABUNDANCE
• Topic 134: MICE, DEFICIENT, NORMAL, GENE, NULL, MOUSE, TYPE, HOMOZYGOUS, ROLE, KNOCKOUT, DEVELOPMENT, GENERATED, LACKING, ANIMALS, REDUCED
• Topic 179: APOPTOSIS, DEATH, CELL, INDUCED, BCL, CELLS, APOPTOTIC, CASPASE, FAS, SURVIVAL, PROGRAMMED, MEDIATED, INDUCTION, CERAMIDE, EXPRESSION

Cold topics:
• Topic 37: CDNA, AMINO, SEQUENCE, ACID, PROTEIN, ISOLATED, ENCODING, CLONED, ACIDS, IDENTITY, CLONE, EXPRESSED, ENCODES, RAT, HOMOLOGY
• Topic 289: KDA, PROTEIN, PURIFIED, MOLECULAR, MASS, CHROMATOGRAPHY, POLYPEPTIDE, GEL, SDS, BAND, APPARENT, LABELED, IDENTIFIED, FRACTION, DETECTED
• Topic 75: ANTIBODY, ANTIBODIES, MONOCLONAL, ANTIGEN, IGG, MAB, SPECIFIC, EPITOPE, HUMAN, MABS, RECOGNIZED, SERA, EPITOPES, DIRECTED, NEUTRALIZING
Outline
Topic models
Latent Dirichlet allocation
More complex models
Conclusions
Extensions to the basic model
• No need for a pre-segmented corpus
  – useful for modeling discussions, meetings

A model for meetings

θ(u) | su = 0 ~ Delta(θ(u−1))     keep the previous utterance's topic distribution
θ(u) | su = 1 ~ Dirichlet(α)      start a new segment with a fresh distribution
zi ~ Discrete(θ(u))
φ(j) ~ Dirichlet(β)
wi ~ Discrete(φ(zi))

[Plate diagram: zi and wi in a plate of size Nu, nested in a plate over U utterances, with a switching variable su linking θ(u−1) to θ(u); the φ(j) sit in a plate over T topics.]
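A sketch of the switching prior on the utterance-level topic distributions (the Bernoulli prior p_switch on su is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
T, U = 10, 30                       # topics, utterances
alpha, p_switch = 1.0, 0.2          # p_switch: assumed prior on s_u

theta = rng.dirichlet(alpha * np.ones(T))      # theta^(1)
thetas = [theta]
for u in range(1, U):
    if rng.random() < p_switch:                # s_u = 1: new segment
        theta = rng.dirichlet(alpha * np.ones(T))
    thetas.append(theta)                       # s_u = 0: Delta(theta^(u-1))
```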
Sample of the ICSI meeting corpus (25 meetings):
• no it's o_k.
• it's it'll work.
• well i can do that.
• but then i have to end the presentation in the middle so i can go back to open up javabayes.
• o_k fine.
• here let's see if i can.
• alright.
• very nice.
• is that better.
• yeah.
• o_k.
• uh i'll also get rid of this click to add notes.
• o_k. perfect
• NEW TOPIC (not supplied to algorithm)
• so then the features we decided or we decided we were talked about.
• right.
• uh the the prosody the discourse verb choice.
• you know we had a list of things like to go and to visit and what not.
• the landmark-iness of uh.
• i knew you'd like that.
• nice coinage.
• thank you.
Topic segmentation applied to meetings
[Figure: the inferred segmentation and inferred topics for a meeting]
Comparison with human judgments
Topics recovered are much more coherent than those found using random segmentation, no segmentation, or an HMM
Extensions to the basic model
• No need for a pre-segmented corpus
• Learning the number of topics
Learning the number of topics
• Can use standard Bayes factor methods to evaluate models of different dimensionality
  – e.g., importance sampling via MCMC
• Alternative: nonparametric Bayes
  – fixed number of topics per document, unbounded number of topics per corpus (Blei, Griffiths, Jordan, & Tenenbaum, 2004)
  – unbounded number of topics for both: the hierarchical Dirichlet process (Teh, Jordan, Beal, & Blei, 2004)
Extensions to the basic model
• No need for a pre-segmented corpus
• Learning the number of topics
• Learning topic hierarchies
Learning topic hierarchies
• Fixed hierarchies: Hofmann & Puzicha (1998)
• Learning hierarchies: Blei et al. (2004)

[Tree diagram: Topic 0 at the root; Topics 1.1 and 1.2 at the middle level; Topics 2.1, 2.2, 2.3 at the leaves.]

The topics in each document form a path from root to leaf; a sketch of this constraint follows.
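A sketch of the path constraint over a fixed, hypothetical tree (in the learned-hierarchy model of Blei et al. the path is drawn from a nested Chinese restaurant process; the uniform child choice here is a simplification):

```python
import numpy as np

rng = np.random.default_rng(6)
tree = {"Topic 0": ["Topic 1.1", "Topic 1.2"],
        "Topic 1.1": ["Topic 2.1", "Topic 2.2"],
        "Topic 1.2": ["Topic 2.3"]}

def sample_path(node="Topic 0"):
    """Walk from the root to a leaf; a document mixes over exactly these topics."""
    path = [node]
    while node in tree:
        node = tree[node][rng.integers(len(tree[node]))]
        path.append(node)
    return path

print(sample_path())   # e.g. ['Topic 0', 'Topic 1.1', 'Topic 2.2']
```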
Twelve years of NIPS
(Blei, Griffiths, Jordan, & Tenenbaum, 2004)
Extensions to the basic model
• No need for a pre-segmented corpus
• Learning the number of topics
• Learning topic hierarchies
• Modeling authors as well as documents
The author-topic model (Rosen-Zvi, Griffiths, Smyth, & Steyvers, 2004)

xi ~ Uniform(A(d))       the author of each word is chosen uniformly at random from the document's authors
θ(a) ~ Dirichlet(α)      each author has a distribution over topics
zi ~ Discrete(θ(xi))     topic chosen from the selected author's distribution
φ(j) ~ Dirichlet(β)
wi ~ Discrete(φ(zi))

[Plate diagram: xi, zi, and wi in a plate of size Nd nested in a plate over D documents; θ(a) in a plate over A authors; φ(j) in a plate over T topics.]
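A sketch of this generative process (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
A, T, W, N_d = 3, 4, 20, 15
alpha, beta = 1.0, 0.1

theta = rng.dirichlet(alpha * np.ones(T), size=A)   # theta^(a): one row per author
phi = rng.dirichlet(beta * np.ones(W), size=T)      # phi^(j): one row per topic

def generate(authors_d):
    """authors_d: author ids on the document. Returns (x, z, w) per word."""
    x = rng.choice(authors_d, size=N_d)                    # x_i ~ Uniform(A^(d))
    z = np.array([rng.choice(T, p=theta[a]) for a in x])   # z_i ~ Discrete(theta^(x_i))
    w = np.array([rng.choice(W, p=phi[j]) for j in z])     # w_i ~ Discrete(phi^(z_i))
    return x, z, w
```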
Four example topics from NIPS

TOPIC 19                  TOPIC 24                   TOPIC 29                       TOPIC 87
WORD        PROB.         WORD         PROB.         WORD            PROB.          WORD        PROB.
LIKELIHOOD  0.0539        RECOGNITION  0.0400        REINFORCEMENT   0.0411         KERNEL      0.0683
MIXTURE     0.0509        CHARACTER    0.0336        POLICY          0.0371         SUPPORT     0.0377
EM          0.0470        CHARACTERS   0.0250        ACTION          0.0332         VECTOR      0.0257
DENSITY     0.0398        TANGENT      0.0241        OPTIMAL         0.0208         KERNELS     0.0217
GAUSSIAN    0.0349        HANDWRITTEN  0.0169        ACTIONS         0.0208         SET         0.0205
ESTIMATION  0.0314        DIGITS       0.0159        FUNCTION        0.0178         SVM         0.0204
LOG         0.0263        IMAGE        0.0157        REWARD          0.0165         SPACE       0.0188
MAXIMUM     0.0254        DISTANCE     0.0153        SUTTON          0.0164         MACHINES    0.0168
PARAMETERS  0.0209        DIGIT        0.0149        AGENT           0.0136         REGRESSION  0.0155
ESTIMATE    0.0204        HAND         0.0126        DECISION        0.0118         MARGIN      0.0151

AUTHOR        PROB.       AUTHOR        PROB.        AUTHOR          PROB.          AUTHOR        PROB.
Tresp_V       0.0333      Simard_P      0.0694       Singh_S         0.1412         Smola_A       0.1033
Singer_Y      0.0281      Martin_G      0.0394       Barto_A         0.0471         Scholkopf_B   0.0730
Jebara_T      0.0207      LeCun_Y       0.0359       Sutton_R        0.0430         Burges_C      0.0489
Ghahramani_Z  0.0196      Denker_J      0.0278       Dayan_P         0.0324         Vapnik_V      0.0431
Ueda_N        0.0170      Henderson_D   0.0256       Parr_R          0.0314         Chapelle_O    0.0210
Jordan_M      0.0150      Revow_M       0.0229       Dietterich_T    0.0231         Cristianini_N 0.0185
Roweis_S      0.0123      Platt_J       0.0226       Tsitsiklis_J    0.0194         Ratsch_G      0.0172
Schuster_M    0.0104      Keeler_J      0.0192       Randlov_J       0.0167         Laskov_P      0.0169
Xu_L          0.0098      Rashid_M      0.0182       Bradtke_S       0.0161         Tipping_M     0.0153
Saul_L        0.0094      Sackinger_E   0.0132       Schwartz_A      0.0142         Sollich_P     0.0141
Who wrote what? (Superscripts show the author inferred for each word: 1 = Scholkopf_B, 2 = Darwiche_A.)

A method1 is described which like the kernel1 trick1 in support1 vector1 machines1 SVMs1 lets us generalize distance1 based2 algorithms to operate in feature1 spaces usually nonlinearly related to the input1 space This is done by identifying a class of kernels1 which can be represented as norm1 based2 distances1 in Hilbert spaces It turns1 out that common kernel1 algorithms such as SVMs1 and kernel1 PCA1 are actually really distance1 based2 algorithms and can be run2 with that class of kernels1 too As well as providing1 a useful new insight1 into how these algorithms work the present2 work can form the basis1 for conceiving new algorithms
This paper presents2 a comprehensive approach for model2 based2 diagnosis2 which includes proposals for characterizing and computing2 preferred2 diagnoses2 assuming that the system2 description2 is augmented with a system2 structure2 a directed2 graph2 explicating the interconnections between system2 components2 Specifically we first introduce the notion of a consequence2 which is a syntactically2 unconstrained propositional2 sentence2 that characterizes all consistency2 based2 diagnoses2 and show2 that standard2 characterizations of diagnoses2 such as minimal conflicts1 correspond to syntactic2 variations1 on a consequence2 Second we propose a new syntactic2 variation on the consequence2 known as negation2 normal form NNF and discuss its merits compared to standard variations Third we introduce a basic algorithm2 for computing consequences in NNF given a structured system2 description We show that if the system2 structure2 does not contain cycles2 then there is always a linear size2 consequence2 in NNF which can be computed in linear time2 For arbitrary1 system2 structures2 we show a precise connection between the complexity2 of computing2 consequences and the topology of the underlying system2 structure2 Finally we present2 an algorithm2 that enumerates2 the preferred2 diagnoses2 characterized by a consequence2 The algorithm2 is shown1 to take linear time2 in the size2 of the consequence2 if the preference criterion1 satisfies some general conditions
Written by (1): Scholkopf_B
Written by (2): Darwiche_A
Extensions to the basic model
• No need for a pre-segmented corpus
• Learning the number of topics
• Learning topic hierarchies
• Modeling authors as well as documents
• Combining topics and syntax
Combining topics and syntax

[Graphical model: an HMM chain of syntactic classes x1 → x2 → x3, each emitting a word w; one designated class emits its word from a topic z drawn from the document's topic distribution.]

semantics: probabilistic topics · syntax: probabilistic regular grammar

Factorization of language based on statistical dependency patterns:
• long-range, document-specific dependencies → topics
• short-range dependencies constant across all documents → syntactic classes
(Griffiths, Steyvers, Blei, & Tenenbaum, 2005)
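A sketch of the composite generative process (sizes hypothetical; class 0 stands in for the designated semantic class):

```python
import numpy as np

rng = np.random.default_rng(8)
C, T, W, SEM = 4, 5, 50, 0            # classes, topics, vocab; class 0 is semantic

trans = rng.dirichlet(np.ones(C), size=C)         # HMM transitions between classes
class_dist = rng.dirichlet(np.ones(W), size=C)    # word dists for syntactic classes
phi = rng.dirichlet(np.ones(W), size=T)           # word dists for topics

def generate(theta, n=20):
    """theta: the document's topic distribution. Returns a word-id sequence."""
    words, x = [], 0
    for _ in range(n):
        x = rng.choice(C, p=trans[x])             # short-range: syntactic chain
        if x == SEM:                              # long-range: topic emission
            z = rng.choice(T, p=theta)
            words.append(rng.choice(W, p=phi[z]))
        else:
            words.append(rng.choice(W, p=class_dist[x]))
    return words
```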
Semantic topics (each list is one topic):
• FOOD, FOODS, BODY, NUTRIENTS, DIET, FAT, SUGAR, ENERGY, MILK, EATING, FRUITS, VEGETABLES, WEIGHT, FATS, NEEDS, CARBOHYDRATES, VITAMINS, CALORIES, PROTEIN, MINERALS
• MAP, NORTH, EARTH, SOUTH, POLE, MAPS, EQUATOR, WEST, LINES, EAST, AUSTRALIA, GLOBE, POLES, HEMISPHERE, LATITUDE, PLACES, LAND, WORLD, COMPASS, CONTINENTS
• DOCTOR, PATIENT, HEALTH, HOSPITAL, MEDICAL, CARE, PATIENTS, NURSE, DOCTORS, MEDICINE, NURSING, TREATMENT, NURSES, PHYSICIAN, HOSPITALS, DR, SICK, ASSISTANT, EMERGENCY, PRACTICE
• BOOK, BOOKS, READING, INFORMATION, LIBRARY, REPORT, PAGE, TITLE, SUBJECT, PAGES, GUIDE, WORDS, MATERIAL, ARTICLE, ARTICLES, WORD, FACTS, AUTHOR, REFERENCE, NOTE
• GOLD, IRON, SILVER, COPPER, METAL, METALS, STEEL, CLAY, LEAD, ADAM, ORE, ALUMINUM, MINERAL, MINE, STONE, MINERALS, POT, MINING, MINERS, TIN
• BEHAVIOR, SELF, INDIVIDUAL, PERSONALITY, RESPONSE, SOCIAL, EMOTIONAL, LEARNING, FEELINGS, PSYCHOLOGISTS, INDIVIDUALS, PSYCHOLOGICAL, EXPERIENCES, ENVIRONMENT, HUMAN, RESPONSES, BEHAVIORS, ATTITUDES, PSYCHOLOGY, PERSON
• CELLS, CELL, ORGANISMS, ALGAE, BACTERIA, MICROSCOPE, MEMBRANE, ORGANISM, FOOD, LIVING, FUNGI, MOLD, MATERIALS, NUCLEUS, CELLED, STRUCTURES, MATERIAL, STRUCTURE, GREEN, MOLDS
• PLANTS, PLANT, LEAVES, SEEDS, SOIL, ROOTS, FLOWERS, WATER, FOOD, GREEN, SEED, STEMS, FLOWER, STEM, LEAF, ANIMALS, ROOT, POLLEN, GROWING, GROW

Syntactic classes (each list is one class):
• GOOD, SMALL, NEW, IMPORTANT, GREAT, LITTLE, LARGE, *, BIG, LONG, HIGH, DIFFERENT, SPECIAL, OLD, STRONG, YOUNG, COMMON, WHITE, SINGLE, CERTAIN
• THE, HIS, THEIR, YOUR, HER, ITS, MY, OUR, THIS, THESE, A, AN, THAT, NEW, THOSE, EACH, MR, ANY, MRS, ALL
• MORE, SUCH, LESS, MUCH, KNOWN, JUST, BETTER, RATHER, GREATER, HIGHER, LARGER, LONGER, FASTER, EXACTLY, SMALLER, SOMETHING, BIGGER, FEWER, LOWER, ALMOST
• ON, AT, INTO, FROM, WITH, THROUGH, OVER, AROUND, AGAINST, ACROSS, UPON, TOWARD, UNDER, ALONG, NEAR, BEHIND, OFF, ABOVE, DOWN, BEFORE
• SAID, ASKED, THOUGHT, TOLD, SAYS, MEANS, CALLED, CRIED, SHOWS, ANSWERED, TELLS, REPLIED, SHOUTED, EXPLAINED, LAUGHED, MEANT, WROTE, SHOWED, BELIEVED, WHISPERED
• ONE, SOME, MANY, TWO, EACH, ALL, MOST, ANY, THREE, THIS, EVERY, SEVERAL, FOUR, FIVE, BOTH, TEN, SIX, MUCH, TWENTY, EIGHT
• HE, YOU, THEY, I, SHE, WE, IT, PEOPLE, EVERYONE, OTHERS, SCIENTISTS, SOMEONE, WHO, NOBODY, ONE, SOMETHING, ANYONE, EVERYBODY, SOME, THEN
• BE, MAKE, GET, HAVE, GO, TAKE, DO, FIND, USE, SEE, HELP, KEEP, GIVE, LOOK, COME, WORK, MOVE, LIVE, EAT, BECOME
Outline
Topic models
Latent Dirichlet allocation
More complex models
Conclusions
Conclusions
• Topic models are a flexible class of models that can capture the content of documents
• Dirichlet priors can be used to assert a preference for sparsity in multinomials
• The collapsed Gibbs sampler makes it possible to exploit these priors, is simple to use, is easy to extend to other aspects of language, and is applicable in a broad range of models
LSI vs. LDA as matrix factorizations

LSI (Dumais, Landauer): the words × documents matrix is factorized as U D V — a words × dims matrix of vectors, a dims × dims matrix, and a dims × documents matrix.

LDA: the words × documents matrix P(w) is factorized as P(w|z) (words × topics) times P(z) (topics × documents).
NIPS semantics and NIPS syntax (Griffiths, Steyvers, Blei, & Tenenbaum, 2005)

Semantic topics include:
• MODEL, ALGORITHM, SYSTEM, CASE, PROBLEM, NETWORK, METHOD, APPROACH, PAPER, PROCESS
• EXPERTS, EXPERT, GATING, HME, ARCHITECTURE, MIXTURE, LEARNING, MIXTURES, FUNCTION, GATE
• DATA, GAUSSIAN, MIXTURE, LIKELIHOOD, POSTERIOR, PRIOR, DISTRIBUTION, EM, BAYESIAN, PARAMETERS
• STATE, POLICY, VALUE, FUNCTION, ACTION, REINFORCEMENT, LEARNING, CLASSES, OPTIMAL, *
• MEMBRANE, SYNAPTIC, CELL, *, CURRENT, DENDRITIC, POTENTIAL, NEURON, CONDUCTANCE, CHANNELS
• IMAGE, IMAGES, OBJECT, OBJECTS, FEATURE, RECOGNITION, VIEWS, #, PIXEL, VISUAL
• the support vector topic: KERNEL, SUPPORT, VECTOR, SVM, KERNELS, #, SPACE, FUNCTION, MACHINES, SET
• the neural network topic: NETWORK, NEURAL, NETWORKS, OUTPUT, INPUT, TRAINING, INPUTS, WEIGHTS, #, OUTPUTS

Syntactic classes include:
• IS, WAS, HAS, BECOMES, DENOTES, BEING, REMAINS, REPRESENTS, EXISTS, SEEMS
• SEE, SHOW, NOTE, CONSIDER, ASSUME, PRESENT, NEED, PROPOSE, DESCRIBE, SUGGEST
• USED, TRAINED, OBTAINED, DESCRIBED, GIVEN, FOUND, PRESENTED, DEFINED, GENERATED, SHOWN
• IN, WITH, FOR, ON, FROM, AT, USING, INTO, OVER, WITHIN
• HOWEVER, ALSO, THEN, THUS, THEREFORE, FIRST, HERE, NOW, HENCE, FINALLY
• #, *, I, X, T, N, -, C, F, P
Varying α: decreasing α increases the sparsity of Θ (fewer topics per document).

Varying β: decreasing β increases the sparsity of Φ. [Plot: number of words per topic, by topic, for different values of β.]