Mallet Tutorial
Transcript of Mallet Tutorial
Machine Learning with MALLET
http://mallet.cs.umass.edu
David Mimno
Information Extraction and Synthesis Laboratory, Department of CS
UMass, Amherst
Outline
• About MALLET
• Representing Data
• Classification
• Sequence Tagging
• Topic Modeling
Who?
• Andrew McCallum (most of the work)
• Charles Sutton, Aron Culotta, Greg Druck, Kedar Bellare, Gaurav Chandalia…
• Fernando Pereira, others at Penn…
Who am I?
• Chief maintainer of MALLET
• Primary author of MALLET topic modeling package
Why?
• Motivation: text classification and information extraction
• Commercial machine learning (Just Research, WhizBang)
• Analysis and indexing of academic publications: Cora, Rexa
What?
• Text focus: data is discrete rather than continuous, even when values could be continuous:
double value = 3.0
How?
• Command line scripts:
– bin/mallet [command] --[option] [value] …
– Text User Interface (“tui”) classes
• Direct Java API (most of this talk)
– http://mallet.cs.umass.edu/api
History
• Version 0.4: c. 2004
– Classes in edu.umass.cs.mallet.base.*
• Version 2.0: c. 2008
– Classes in cc.mallet.*
– Major changes to finite state transducer package
– bin/mallet vs. specialized scripts
– Java 1.5 generics
Learning More
• http://mallet.cs.umass.edu
– “Quick Start” guides, focused on command line processing
– Developers’ guides, with Java examples
• mallet-[email protected]
– Low volume, but can be bursty
Outline
• About MALLET
• Representing Data
• Classification
• Sequence Tagging
• Topic Modeling
Models for Text Data
• Generative models (Multinomials)
– Naïve Bayes
– Hidden Markov Models (HMMs)
– Latent Dirichlet Topic Models
• Discriminative Regression Models
– MaxEnt/Logistic regression
– Conditional Random Fields (CRFs)
Representations
• Transform text documents to vectors x1, x2, …
• Retain meaning of vector indices
• Ideally sparsely
Document: Call me Ishmael. …
xi: 1.0 0.0 … 0.0 6.0 0.0 … 3.0 …
Representations
• Elements of vector are called feature values
• Example: Feature at row 345 is number of times “dog” appears in document
Documents to Vectors
Document: Call me Ishmael.
Tokens: Call me Ishmael
Tokens (lowercased): call me ishmael
Alphabet: 17 ishmael … 473 call … 3591 me
Features (sequence): 473, 3591, 17
Features (bag): 17 1.0, 473 1.0, 3591 1.0
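The document-to-vector steps above can be sketched in plain Java without MALLET. This is a minimal illustration, not MALLET's implementation: the class name, the tokenizing regular expression, and the use of a Map as the vocabulary are all assumptions of the sketch, and the indices it assigns (0, 1, 2, …) will not match the 473, 3591, 17 shown on the slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of MALLET.
class DocumentToVectorSketch {

    // Split on runs of non-letters and lowercase each token.
    static List<String> tokenize(String document) {
        List<String> tokens = new ArrayList<>();
        for (String piece : document.split("[^\\p{L}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase());
            }
        }
        return tokens;
    }

    // Feature sequence: one integer per token, order preserved.
    // Unseen tokens get the next free index in the vocabulary.
    static int[] toFeatureSequence(List<String> tokens, Map<String, Integer> vocabulary) {
        int[] features = new int[tokens.size()];
        for (int i = 0; i < features.length; i++) {
            String token = tokens.get(i);
            Integer index = vocabulary.get(token);
            if (index == null) {
                index = vocabulary.size();
                vocabulary.put(token, index);
            }
            features[i] = index;
        }
        return features;
    }

    // Feature bag: order is lost and duplicates are counted.
    static Map<Integer, Double> toFeatureBag(int[] featureSequence) {
        Map<Integer, Double> bag = new TreeMap<>();
        for (int feature : featureSequence) {
            bag.merge(feature, 1.0, Double::sum);
        }
        return bag;
    }

    public static void main(String[] args) {
        Map<String, Integer> vocabulary = new LinkedHashMap<>();
        int[] sequence = toFeatureSequence(tokenize("Call me Ishmael."), vocabulary);
        System.out.println(java.util.Arrays.toString(sequence));
        System.out.println(toFeatureBag(sequence));
    }
}
```

The sequence form keeps token order, which sequence taggers and topic models need; the bag form keeps only counts, which is usually enough for classification.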
Instances
Email message, web page, sentence, journal abstract…
• Name: What is it called?
• Data: What is the input?
• Target/Label: What is the output?
• Source: What did it originally look like?
Instances
• Name
• Data
• Target
• Source
Data field types (cc.mallet.types):
String
TokenSequence: ArrayList<Token>
FeatureSequence: int[]
FeatureVector: int -> double map
Alphabets
cc.mallet.types, gnu.trove
TObjectIntHashMap map
ArrayList entries
int lookupIndex(Object o, boolean shouldAdd)
Object lookupObject(int index)
17 ishmael … 473 call … 3591 me
void stopGrowth()
void startGrowth()
stopGrowth(): do not add entries for new Objects. The default is to allow growth.
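A two-way mapping with a growth switch can be sketched as follows. This is a simplified stand-in written for illustration, using java.util collections rather than gnu.trove; the class name SimpleAlphabet is hypothetical, and returning -1 for an entry that is not added is an assumption of the sketch rather than a documented guarantee.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified alphabet: object -> index and index -> object.
class SimpleAlphabet {
    private final Map<Object, Integer> map = new HashMap<>();
    private final List<Object> entries = new ArrayList<>();
    private boolean growthStopped = false;

    // Returns the index of the entry, adding it if allowed.
    // Returns -1 (an assumption of this sketch) when the entry is unknown
    // and either shouldAdd is false or growth has been stopped.
    int lookupIndex(Object entry, boolean shouldAdd) {
        Integer index = map.get(entry);
        if (index != null) {
            return index;
        }
        if (!shouldAdd || growthStopped) {
            return -1;
        }
        int newIndex = entries.size();
        map.put(entry, newIndex);
        entries.add(entry);
        return newIndex;
    }

    Object lookupObject(int index) {
        return entries.get(index);
    }

    void stopGrowth() {
        growthStopped = true;
    }

    void startGrowth() {
        growthStopped = false;
    }

    int size() {
        return entries.size();
    }
}
```

Stopping growth is typically useful once training data has been loaded, so that words seen only in test data do not silently enlarge the feature space.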
Creating Instances
cc.mallet.pipe.iterator
• Instance constructor method
new Instance(data, target, name, source)
• Iterators
Iterator<Instance>
FileIterator(File[], …)
CsvIterator(FileReader, Pattern…)
ArrayIterator(Object[])
…
Creating Instances
• FileIterator
cc.mallet.pipe.iterator
/data/bad/
/data/good/
Label from dir name
Each instance in its own file
Creating Instances
• CsvIterator
cc.mallet.pipe.iterator
Each instance on its own line
1001 Melville Call me Ishmael. Some years ago…
1002 Dickens It was the best of times, it was…
^([^\t]+)\t([^\t]+)\t(.*)
Name, label, data from regular expression groups. “CSV” is a lousy name. LineRegexIterator?
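To see how the three regular expression groups map to name, label, and data, here is a small sketch using only java.util.regex. It assumes the fields in each line are separated by tab characters (the tabs are not visible in the slide text), and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper showing what the slide's pattern captures.
class LineRegexSketch {
    // Same pattern as on the slide: three tab-separated fields.
    static final Pattern LINE = Pattern.compile("^([^\\t]+)\\t([^\\t]+)\\t(.*)");

    // Returns {name, label, data}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher matcher = LINE.matcher(line);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        String[] fields = parse("1001\tMelville\tCall me Ishmael. Some years ago");
        System.out.println("name=" + fields[0] + " label=" + fields[1] + " data=" + fields[2]);
    }
}
```

Because the third group is `(.*)`, any further tabs in the text stay inside the data field.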
Instance Pipelines
• Sequential transformations of instance fields (usually Data)
• Pass an ArrayList<Pipe> to SerialPipes
cc.mallet.pipe
// “data” is a String
CharSequence2TokenSequence // tokenize with regexp
TokenSequenceLowercase // modify each token’s text
TokenSequenceRemoveStopwords // drop some tokens
TokenSequence2FeatureSequence // convert token Strings to ints
FeatureSequence2FeatureVector // lose order, count duplicates
Instance Pipelines
• A small number of pipes modify the “target” field
• There are now two alphabets: data and label
cc.mallet.pipe, cc.mallet.types
// “target” is a String
Target2Label // convert String to int
// “target” is now a Label
Alphabet > LabelAlphabet
Label objects
• Weights on a fixed set of classes
• For training data, weight for correct label is 1.0, all others 0.0
cc.mallet.types
implements Labeling
int getBestIndex()
Label getBestLabel()
You cannot create a Label; they are only produced by LabelAlphabet
InstanceLists
• A List of Instance objects, along with a Pipe, data Alphabet, and LabelAlphabet
cc.mallet.types
InstanceList instances = new InstanceList(pipe);
instances.addThruPipe(iterator);
Putting it all together
ArrayList<Pipe> pipeList = new ArrayList<Pipe>();
pipeList.add(new Target2Label());
pipeList.add(new CharSequence2TokenSequence());
pipeList.add(new TokenSequence2FeatureSequence());
pipeList.add(new FeatureSequence2FeatureVector());
InstanceList instances = new InstanceList(new SerialPipes(pipeList));
instances.addThruPipe(new FileIterator(. . .));
Persistent Storage
• Most MALLET classes use Java serialization to store models and data
java.io
ObjectOutputStream oos = new ObjectOutputStream(…);
oos.writeObject(instances);
oos.close();
Pipes, data objects, labelings, etc. all need to implement Serializable.
Be sure to include custom classes in classpath, or you get a StreamCorruptedException
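The save-and-restore cycle can be tried without MALLET. The sketch below round-trips a small Serializable object through a byte array; the Document class and the save and load helpers are hypothetical stand-ins for an InstanceList and a file stream, and checked exceptions are wrapped only to keep the calling code short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical example of Java serialization, independent of MALLET.
class SerializationSketch {

    // Any object written with writeObject must implement Serializable.
    static class Document implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int[] features;

        Document(String name, int[] features) {
            this.name = name;
            this.features = features;
        }
    }

    // Serialize an object to bytes; try-with-resources closes the stream.
    static byte[] save(Serializable object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialize; the object's class must be on the classpath at this point.
    static Object load(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class missing from classpath", e);
        }
    }
}
```

Replacing the byte-array streams with FileOutputStream and FileInputStream gives the on-disk version.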
Review
• What are the four main fields in an Instance?
• What are two ways to generate Instances?
• How do we modify the value of Instance fields?
• Name some classes that appear in the “data” field.
Outline
• About MALLET
• Representing Data
• Classification
• Sequence Tagging
• Topic Modeling
Classifier objects
• Classifiers map from instances to distributions over a fixed set of classes
• MaxEnt, Naïve Bayes, Decision Trees…
cc.mallet.classify
Given data, which class is best?
[Figure: the word “watery” with candidate classes NN, JJ, PRP, VB, CC; one class is marked “(this one!)”]
// classify() returns a Classification, which wraps the Labeling
Labeling labeling = classifier.classify(instance).getLabeling();
Label l = labeling.getBestLabel();
System.out.print(instance + "\t");
System.out.println(l);
Training Classifier objects
cc.mallet.classify
• Each type of classifier has one or more ClassifierTrainer classes
ClassifierTrainer trainer = new MaxEntTrainer();
Classifier classifier = trainer.train(instances);
Training Classifier objects
cc.mallet.optimize
• Some classifiers require numerical optimization of an objective function.
log P(Labels | Data) =
  log f(label1, data1, w) +
  log f(label2, data2, w) +
  log f(label3, data3, w) + …
Maximize w.r.t. w!
Parameters w
• Association between feature, class label
• How many parameters for K classes and N features?
Feature    Class  Weight
action     NN      0.13
action     VB     -0.1
action     JJ     -0.21
SUFF-tion  NN      1.3
SUFF-tion  VB     -2.1
SUFF-tion  JJ     -1.7
SUFF-on    NN      0.01
SUFF-on    VB     -0.02
…
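The objective above is a sum of per-instance log probabilities. The following sketch computes it for one possible choice of f, a softmax (MaxEnt-style) model with one weight per (class, feature) pair. The slide does not define f, so that choice, the dense array layout, and all names here are assumptions for illustration.

```java
// Hypothetical sketch of a label log-likelihood; not MALLET code.
class LogLikelihoodSketch {

    // One weight per (class, feature) pair, as in the slide's table.
    // A per-class bias term, if a model uses one, would add K more.
    static int numParameters(int numClasses, int numFeatures) {
        return numClasses * numFeatures;
    }

    // log f(label, data, w) under a softmax model: w[k][n] is the weight
    // linking feature n to class k, data[n] is the feature value.
    static double logProb(int label, double[] data, double[][] w) {
        double[] scores = new double[w.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < w.length; k++) {
            for (int n = 0; n < data.length; n++) {
                scores[k] += w[k][n] * data[n];
            }
            max = Math.max(max, scores[k]);
        }
        // Subtract the max before exponentiating for numerical stability.
        double sumExp = 0.0;
        for (double score : scores) {
            sumExp += Math.exp(score - max);
        }
        return scores[label] - max - Math.log(sumExp);
    }

    // log P(Labels | Data) = sum over instances of log f(label_i, data_i, w).
    static double logLikelihood(int[] labels, double[][] data, double[][] w) {
        double total = 0.0;
        for (int i = 0; i < labels.length; i++) {
            total += logProb(labels[i], data[i], w);
        }
        return total;
    }
}
```

With all weights at zero every class is equally likely, so each instance contributes log(1/K); training moves w away from zero to raise this sum.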
Training Classifier objects
cc.mallet.optimize
interface Optimizer (Limited-memory BFGS, Conjugate gradient…)
boolean optimize()
interface Optimizable (specific objective functions)
interface ByValue
interface ByGradientValue
Training Classifier objects
cc.mallet.classify
MaxEntOptimizableByLabelLikelihood
// For Optimizable interface
void getParameters(double[] buffer)
void setParameters(double[] parameters)
…
// Log likelihood and its first derivative
double getValue()
void getValueGradient(double[] buffer)
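The split between an objective (which knows its value and gradient) and an optimizer (which only moves parameters) can be sketched with a toy example. The interface below is a local, hypothetical one loosely modeled on the methods listed above, and plain gradient ascent stands in for L-BFGS or conjugate gradient; none of this is MALLET's code.

```java
// Hypothetical sketch: objective/optimizer separation with gradient ascent.
class GradientAscentSketch {

    // Local interface for illustration only.
    interface DifferentiableObjective {
        int getNumParameters();
        void getParameters(double[] buffer);
        void setParameters(double[] parameters);
        double getValue();
        void getValueGradient(double[] buffer);
    }

    // Concave toy objective: -(sum_i (x_i - target_i)^2), maximized at x = target.
    static class Quadratic implements DifferentiableObjective {
        private final double[] target;
        private final double[] x;

        Quadratic(double[] target) {
            this.target = target.clone();
            this.x = new double[target.length];
        }

        public int getNumParameters() { return x.length; }

        public void getParameters(double[] buffer) {
            System.arraycopy(x, 0, buffer, 0, x.length);
        }

        public void setParameters(double[] parameters) {
            System.arraycopy(parameters, 0, x, 0, x.length);
        }

        public double getValue() {
            double value = 0.0;
            for (int i = 0; i < x.length; i++) {
                value -= (x[i] - target[i]) * (x[i] - target[i]);
            }
            return value;
        }

        public void getValueGradient(double[] buffer) {
            for (int i = 0; i < x.length; i++) {
                buffer[i] = -2.0 * (x[i] - target[i]);
            }
        }
    }

    // Fixed-step gradient ascent; returns true if the value stopped changing.
    static boolean optimize(DifferentiableObjective f, double stepSize,
                            int maxIterations, double tolerance) {
        double[] x = new double[f.getNumParameters()];
        double[] gradient = new double[f.getNumParameters()];
        double previous = f.getValue();
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            f.getParameters(x);
            f.getValueGradient(gradient);
            for (int i = 0; i < x.length; i++) {
                x[i] += stepSize * gradient[i];
            }
            f.setParameters(x);
            double current = f.getValue();
            if (Math.abs(current - previous) < tolerance) {
                return true;
            }
            previous = current;
        }
        return false;
    }
}
```

The optimizer never sees what the parameters mean, which is why the same optimizer can train any model that supplies a value and a gradient.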
Evaluation of Classifiers
• Create random test/train splits
cc.mallet.types
InstanceList[] instanceLists = instances.split(new Randoms(),
    new double[] {0.9, 0.1, 0.0}); // 90% training, 10% testing, 0% validation
Evaluation of Classifiers
• The Trial class stores the results of classifications on an InstanceList (testing or training)
cc.mallet.classify
Trial(Classifier c, InstanceList list)
double getAccuracy()
double getAverageRank()
double getF1(int/Label/Object)
double getPrecision(…)
double getRecall(…)
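For readers who want the definitions behind these numbers, here is a minimal sketch of accuracy, per-label precision, recall, and F1 over arrays of integer labels. It is not the Trial class; the names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```java
// Hypothetical sketch of standard evaluation measures; not MALLET's Trial.
class EvaluationSketch {

    // Fraction of positions where the prediction equals the gold label.
    static double accuracy(int[] gold, int[] predicted) {
        int correct = 0;
        for (int i = 0; i < gold.length; i++) {
            if (gold[i] == predicted[i]) {
                correct++;
            }
        }
        return gold.length == 0 ? 0.0 : (double) correct / gold.length;
    }

    // Of the instances predicted as `label`, the fraction that truly are.
    static double precision(int[] gold, int[] predicted, int label) {
        int truePositives = 0;
        int predictedPositives = 0;
        for (int i = 0; i < gold.length; i++) {
            if (predicted[i] == label) {
                predictedPositives++;
                if (gold[i] == label) {
                    truePositives++;
                }
            }
        }
        return predictedPositives == 0 ? 0.0 : (double) truePositives / predictedPositives;
    }

    // Of the instances that truly are `label`, the fraction that were found.
    static double recall(int[] gold, int[] predicted, int label) {
        int truePositives = 0;
        int actualPositives = 0;
        for (int i = 0; i < gold.length; i++) {
            if (gold[i] == label) {
                actualPositives++;
                if (predicted[i] == label) {
                    truePositives++;
                }
            }
        }
        return actualPositives == 0 ? 0.0 : (double) truePositives / actualPositives;
    }

    // Harmonic mean of precision and recall.
    static double f1(int[] gold, int[] predicted, int label) {
        double p = precision(gold, predicted, label);
        double r = recall(gold, predicted, label);
        return (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }
}
```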
Review
• I have invented a new classifier: David regression.
– What class should I implement to classify instances?
– What class should I implement to train a David regression classifier?
– I want to train using ByGradientValue. What mathematical functions do I need to code up, and what class should I put them in?
– How would I check whether my new classifier works better than Naïve Bayes?
Outline
• About MALLET
• Representing Data
• Classification
• Sequence Tagging
• Topic Modeling
Sequence Tagging
• Data occurs in sequences
• Categorical labels for each position
• Labels are correlated
Labeled:   DET NN  VBS   VBG
           the dog likes running
Unlabeled: ??  ??  ??    ??
           the dog likes running
Sequence Tagging
• Classification: n-way
• Sequence Tagging: n^T-way
[Figure: the candidate labels NN, JJ, PRP, VB, CC repeated at each position of the sentence “or red dogs on blue trees”]
Avoiding Exponential Blowup
• Markov property
• Dynamic programming
Andrei Markov
[Figure: label chain DET JJ NN VB, annotated “This one, given this one, is independent of these”]
[Figure: lattice of candidate labels NN, JJ, PRP, VB, CC over “or red dogs on blue trees”; each successive build drops the leftmost position: “red dogs on blue trees”, then “dogs on blue trees”]
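One standard dynamic program that exploits the Markov property is the Viterbi algorithm, sketched below. The slides name only "dynamic programming", so treating Viterbi as the illustration is an assumption, as are the log-score arrays and all names in the code. Instead of scoring all n^T label sequences, it keeps the best score for each label at each position, which costs on the order of T × n² operations.

```java
// Hypothetical sketch of Viterbi decoding over log-scores; not MALLET code.
class ViterbiSketch {

    // initial[k]: score of starting in label k
    // transition[j][k]: score of moving from label j to label k
    // emission[t][k]: score of label k at position t
    // Returns the highest-scoring label sequence.
    static int[] bestPath(double[] initial, double[][] transition, double[][] emission) {
        int length = emission.length;
        int numLabels = initial.length;
        if (length == 0) {
            return new int[0];
        }
        double[][] best = new double[length][numLabels];
        int[][] back = new int[length][numLabels];

        for (int k = 0; k < numLabels; k++) {
            best[0][k] = initial[k] + emission[0][k];
        }
        // Markov property: the best path into (t, k) depends only on
        // the best scores at position t - 1.
        for (int t = 1; t < length; t++) {
            for (int k = 0; k < numLabels; k++) {
                best[t][k] = Double.NEGATIVE_INFINITY;
                for (int j = 0; j < numLabels; j++) {
                    double score = best[t - 1][j] + transition[j][k];
                    if (score > best[t][k]) {
                        best[t][k] = score;
                        back[t][k] = j;
                    }
                }
                best[t][k] += emission[t][k];
            }
        }
        // Pick the best final label, then follow back-pointers.
        int[] path = new int[length];
        for (int k = 1; k < numLabels; k++) {
            if (best[length - 1][k] > best[length - 1][path[length - 1]]) {
                path[length - 1] = k;
            }
        }
        for (int t = length - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }
}
```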
Hidden Markov Models and Conditional Random Fields
• Hidden Markov Model: fully generative
P(Labels | Data) = P(Data, Labels) / P(Data)
• Conditional Random Field: conditional
P(Labels | Data)
Hidden Markov Models and Conditional Random Fields
• Hidden Markov Model: simple (independent) output space
“NSF-funded”
FeatureSequence: int[]
• Conditional Random Field: arbitrarily complicated outputs
“NSF-funded” CAPITALIZED HYPHENATED ENDS-WITH-ed ENDS-WITH-d …
FeatureVectorSequence: FeatureVector[]
Importing Data
• SimpleTagger format: one word per line, with instances delimited by a blank line
Call VB
me PPN
Ishmael NNP
. .

Some JJ
years NNS
…
Importing Data
• SimpleTagger format with additional features: word, then features, then label
Call SUFF-ll VB
me TWO_LETTERS PPN
Ishmael BIBLICAL_NAME NNP
. PUNCTUATION .

Some CAPITALIZED JJ
years TIME SUFF-s NNS
…
Importing Data
cc.mallet.pipe, cc.mallet.pipe.iterator
LineGroupIterator
SimpleTaggerSentence2TokenSequence() // String to Tokens, handles labels
[Pipes that modify tokens]
TokenSequence2FeatureVectorSequence() // Token objects to FeatureVectors
Importing Data
cc.mallet.pipe.tsf
// Ishmael
TokenTextCharSuffix("C2=", 2)
// Ishmael C2=el
RegexMatches("CAP", Pattern.compile("\\p{Lu}.*")) // must match entire string
// Ishmael C2=el CAP
LexiconMembership("NAME", new File("names"), false) // file: one name per line; boolean: ignore case?
// Ishmael C2=el CAP NAME
Sliding window features
a red dog on a blue tree
Features added to “dog” from its neighbors:
red@-1
a@-2
on@1
a@-2_&_red@-1
Importing Data
cc.mallet.pipe.tsf
int[][] conjunctions = new int[3][];
conjunctions[0] = new int[] { -1 }; // previous position
conjunctions[1] = new int[] { 1 }; // next position
conjunctions[2] = new int[] { -2, -1 }; // previous two
OffsetConjunctions(conjunctions)
// a@-2_&_red@-1 on@1
Importing Data
cc.mallet.pipe.tsf
TokenTextCharSuffix("C1=", 1)
OffsetConjunctions(conjunctions)
// a@-2_&_red@-1 a@-2_&_C1=d@-1
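The naming scheme for sliding window features can be reproduced with a few lines of plain Java. This sketch only looks at token text, whereas the pipe on the slide also conjoins other features already attached to tokens (such as C1=d); the class and method names are hypothetical, and skipping conjunctions that reach outside the sentence is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of offset-conjunction feature names; not MALLET code.
class OffsetConjunctionSketch {

    // For the token at `position`, build one feature per conjunction.
    // Each conjunction is a list of relative offsets; the parts are
    // written as text@offset and joined with "_&_".
    static List<String> features(String[] tokens, int position, int[][] conjunctions) {
        List<String> result = new ArrayList<>();
        for (int[] offsets : conjunctions) {
            StringBuilder name = new StringBuilder();
            boolean inRange = true;
            for (int offset : offsets) {
                int target = position + offset;
                if (target < 0 || target >= tokens.length) {
                    inRange = false; // skip conjunctions that leave the sentence
                    break;
                }
                if (name.length() > 0) {
                    name.append("_&_");
                }
                name.append(tokens[target]).append('@').append(offset);
            }
            if (inRange) {
                result.add(name.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = "a red dog on a blue tree".split(" ");
        int[][] conjunctions = { { -1 }, { 1 }, { -2, -1 } };
        System.out.println(features(tokens, 2, conjunctions));
    }
}
```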
Finite State Transducers
• Finite state machine over two alphabets (observed, hidden)
Hidden:   DET NN  VBS
Observed: the dog
Built up one factor at a time:
P(DET)
P(the | DET)
P(NN | DET)
P(dog | NN)
P(VBS | NN)
How many parameters?
• Determines efficiency of training
• Too many leads to overfitting
Trick: Don’t allow certain transitions
P(VBS | DET) = 0
DET NN  VBS
the dog runs
Finite State Transducers
cc.mallet.fst
abstract class Transducer
CRF
HMM
abstract class TransducerTrainer
CRFTrainerByLabelLikelihood
HMMTrainerByLikelihood
Finite State Transducers
cc.mallet.fst
DET NN  VBS
the dog runs
First order: one weight for every pair of labels and observations.
CRF crf = new CRF(pipe, null);
crf.addFullyConnectedStates();
// or
crf.addStatesForLabelsConnectedAsIn(instances);
“Three-quarter” order: one weight for every pair of labels and observations.
crf.addStatesForThreeQuarterLabelsConnectedAsIn(instances);
Second order: one weight for every triplet of labels and observations.
crf.addStatesForBiLabelsConnectedAsIn(instances);
“Half” order: equivalent to independent classifiers, except some transitions may be illegal.
crf.addStatesForHalfLabelsConnectedAsIn(instances);
Training a transducer
cc.mallet.fst
CRF crf = new CRF(pipe, null);
crf.addStatesForLabelsConnectedAsIn(trainingInstances);
CRFTrainerByLabelLikelihood trainer = new CRFTrainerByLabelLikelihood(crf);
trainer.train(trainingInstances);
Evaluating a transducer
cc.mallet.fst
CRFTrainerByLabelLikelihood trainer = new CRFTrainerByLabelLikelihood(transducer);
TransducerEvaluator evaluator = new TokenAccuracyEvaluator(testing, "testing");
trainer.addEvaluator(evaluator);
trainer.train(trainingInstances);
Applying a transducer
cc.mallet.fst
Sequence output = transducer.transduce(input);
for (int index = 0; index < input.size(); index++) {
    System.out.print(input.get(index) + "/");
    System.out.print(output.get(index) + " ");
}
Review
• How do you add new features to TokenSequences?
• What are three factors that affect the number of parameters in a model?
Outline
• About MALLET
• Representing Data
• Classification
• Sequence Tagging
• Topic Modeling
Topics: “Semantic Groups”
News Article
“Sports”: team, player, game
“Negotiation”: strike, deadline, union
Topic word lists:
Series, Yankees, Sox, Red, World, League, game, Boston, team, games, baseball, Mets, Game, series, won, Clemens, Braves, Yankee, teams
players, League, owners, league, baseball, union, commissioner, Baseball, Association, labor, Commissioner, Football, major, teams, Selig, agreement, strike, team, bargaining
Training a Topic Model
cc.mallet.topics
ParallelTopicModel lda = new ParallelTopicModel(numTopics);
lda.addInstances(trainingInstances);
lda.estimate();
Evaluating a Topic Model
cc.mallet.topics
ParallelTopicModel lda = new ParallelTopicModel(numTopics);
lda.addInstances(trainingInstances);
lda.estimate();
MarginalProbEstimator evaluator = lda.getProbEstimator();
double logLikelihood = evaluator.evaluateLeftToRight(testing, 10, false, null);
Inferring topics for new documents
cc.mallet.topics
ParallelTopicModel lda = new ParallelTopicModel(numTopics);
lda.addInstances(trainingInstances);
lda.estimate();
TopicInferencer inferencer = lda.getInferencer();
double[] topicProbs = inferencer.getSampledDistribution(instance, 100, 10, 10);
More than words…
• Text collections mix free text and structured data
David Mimno
Andrew McCallum
UAI
2008
“Topic models conditioned on arbitrary features using Dirichlet-multinomial regression. …”
Dirichlet-multinomial Regression (DMR)
The corpus specifies a vector of real-valued features (x) for each document, of length F.
Each topic has an F-length vector of parameters.
Topic parameters for feature “published in JMLR”
user, users, user interface, interactive, interface  -1.44
web, web pages, web page, world wide web, web sites  -1.36
retrieval, information retrieval, query, query expansion  -1.23
strategies, strategy, adaptation, adaptive, driven  -1.21
agent, agents, multiagent, autonomous agents  -1.12
nearest neighbor, boosting, nearest neighbors, adaboost  1.37
blind source separation, source separation, separation, channel  1.40
reinforcement learning, learning, reinforcement  1.41
bounds, vc dimension, bound, upper bound, lower bounds  1.74
kernel, kernels, rational kernels, string kernels, fisher kernel  2.27
Feature parameters for RL topic
<default>  -3.76
COLING  -1.64
IEEE Trans. PAMI  -1.54
CVPR  -1.47
ACL  -1.38
Machine Learning Journal  2.19
ECML  2.45
Kenji Doya  2.56
ICML  2.88
Sridhar Mahadevan  2.99
Topic parameters for feature “published in UAI”
nearest neighbor, boosting, nearest neighbors, adaboost  -1.50
descriptions, description, top, bottom, top bottom  -1.50
workshop report, invited talk, international conference, report  -1.37
digital libraries, digital library, digital, library  -1.36
shape, deformable, shapes, contour, active contour  -1.29
reasoning, logic, default reasoning, nonmonotonic reasoning  2.11
uncertainty, symbolic, sketch, primal sketch, uncertain, [email protected]
probability, probabilities, probability distributions,  2.25
qualitative, reasoning, qualitative reasoning, qualita@[email protected]
bayesian networks, bayesian network, belief networks  2.88
Feature parameters for Bayes nets topic
<default>  -3.36
ICRA  -2.24
Neural Networks  -1.50
COLING  -1.38
Probabilistic Semantics for Nonmonotonic Reasoning (Pearl, KR, 1989)  -1.16
Loopy Belief Propagation for Approximate Inference (Murphy, Weiss, and Jordan, UAI, 1999)  2.04
Philippe Smets  2.15
Ashraf M. Abdelbar  2.23
Mary-Anne Williams  2.41
UAI  2.88
Dirichlet-multinomial Regression
• Arbitrary observed features of documents
• Target contains FeatureVector
DMRTopicModel dmr = new DMRTopicModel(numTopics);
dmr.addInstances(training);
dmr.estimate();
dmr.writeParameters(new File("dmr.parameters"));
Polylingual Topic Modeling
• Topics exist in more languages than you could possibly learn
• Topically comparable documents are much easier to get than translation sets
• Translation dictionaries
– cover pairs, not sets of languages
– miss technical vocabulary
– aren’t available for low-resource languages
[Figure: Topics from European Parliament Proceedings]
[Figure: Topics from Wikipedia]
Aligned instance lists
dog… chien… hund…
cat… chat…
pig… schwein…
Polylingual Topics
InstanceList[] training = new InstanceList[] { english, german, arabic, mahican };
PolylingualTopicModel pltm = new PolylingualTopicModel(numTopics);
pltm.addInstances(training);
MALLET hands-on tutorial
http://mallet.cs.umass.edu/mallet-handson.tar.gz