Graphical Models for Segmenting and Labeling Sequence Data
Manoj Kumar Chinnakotla
NLP-AI Seminar
-
Outline
- Introduction
- Directed Graphical Models
  - Hidden Markov Models (HMMs)
  - Maximum Entropy Markov Models (MEMMs)
  - Label Bias Problem
- Undirected Graphical Models
  - Conditional Random Fields (CRFs)
- Summary
-
The Task
- Labeling: given sequence data, mark the appropriate tag for each data item
- Segmentation: given sequence data, segment it into non-overlapping groups such that related entities are in the same group
-
Applications
- Computational Linguistics
  - POS Tagging
  - Information Extraction
  - Syntactic Disambiguation
- Computational Biology
  - DNA and Protein Sequence Alignment
  - Sequence homologue searching
  - Protein Secondary Structure Prediction
-
Example: POS Tagging
-
Directed Graphical Models
- Hidden Markov Models (HMMs)
  - Assign a joint probability to paired observation and label sequences
  - The parameters are trained to maximize the joint likelihood of the training examples
-
Hidden Markov Models (HMMs)
- Generative model: models the joint distribution of observation and label sequences
- Generation process: a probabilistic finite state machine (sketched below)
  - Set of states: correspond to tags
  - Alphabet: set of words
  - Transition probability: probability of moving from one tag state to the next
  - State (emission) probability: probability of a state emitting a given word
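As a rough illustration of this generation process (the tag set, words, and probabilities below are invented for the example, not from the slides), a bigram HMM can be sampled by walking the state machine:

    import random

    # Hypothetical toy HMM: tags are states, words are the alphabet.
    TAGS = ["DET", "NOUN", "VERB"]
    transition = {  # P(next_tag | current_tag), each row sums to 1
        "<s>":  {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1},
        "DET":  {"DET": 0.0, "NOUN": 0.9, "VERB": 0.1},
        "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
        "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
    }
    emission = {  # P(word | tag)
        "DET":  {"the": 0.7, "a": 0.3},
        "NOUN": {"dog": 0.5, "cat": 0.5},
        "VERB": {"runs": 0.6, "sleeps": 0.4},
    }

    def sample(dist):
        """Draw a key from a {key: probability} dict."""
        r, total = random.random(), 0.0
        for key, p in dist.items():
            total += p
            if r < total:
                return key
        return key  # guard against floating-point rounding

    def generate(length=4):
        """Walk the probabilistic FSM: sample a tag, then emit a word."""
        tag, pairs = "<s>", []
        for _ in range(length):
            tag = sample(transition[tag])
            pairs.append((sample(emission[tag]), tag))
        return pairs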
-
HMMs (Contd..)
- For a given word/tag sequence pair, the model assigns a joint probability (standard bigram form shown below)
- Why hidden? The sequence of tags which generated the word sequence is not visible
- Why Markov? Based on the Markovian assumption: the current tag depends only on the previous n tags
  - Solves the sparsity problem
- Training: learning the transition and emission probabilities from data
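For a bigram HMM the standard factorization, consistent with the transition and emission probabilities defined above, is:

    P(w_1 \ldots w_n, t_1 \ldots t_n) = \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \, P(w_i \mid t_i)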
-
HMMs Tagging Process
- Given a string of words w, choose the tag sequence t* that maximizes the joint probability P(w, t)
- Computationally expensive: naively, all possible tag sequences must be evaluated (n^m sequences for n possible tags and m positions)
- Viterbi Algorithm
  - Used to find the optimal tag sequence t*
  - Efficient dynamic programming based algorithm (see the sketch below)
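A minimal Viterbi sketch, reusing the hypothetical transition and emission tables from the generation example above (log probabilities and smoothing are omitted for brevity):

    def viterbi(words, tags, transition, emission, start="<s>"):
        """Dynamic programming search for the most probable tag sequence."""
        # best[i][t] = probability of the best tag sequence for words[:i+1] ending in t
        best = [{t: transition[start].get(t, 0.0) * emission[t].get(words[0], 0.0)
                 for t in tags}]
        back = [{}]  # back[i][t] = previous tag on that best path
        for i in range(1, len(words)):
            best.append({})
            back.append({})
            for t in tags:
                # Choose the predecessor tag that maximizes the path probability.
                prev = max(tags, key=lambda p: best[i-1][p] * transition[p].get(t, 0.0))
                best[i][t] = best[i-1][prev] * transition[prev].get(t, 0.0) \
                             * emission[t].get(words[i], 0.0)
                back[i][t] = prev
        # Follow the back-pointers from the best final tag.
        t = max(tags, key=lambda x: best[-1][x])
        path = [t]
        for i in range(len(words) - 1, 0, -1):
            t = back[i][t]
            path.append(t)
        return list(reversed(path))

    print(viterbi(["the", "dog", "runs"], TAGS, transition, emission))
    # ['DET', 'NOUN', 'VERB']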
-
Disadvantages of HMMs
- Need to enumerate all possible observation sequences
- Not possible to represent multiple interacting features
- Difficult to model long-range dependencies of the observations
- Very strict independence assumptions on the observations
-
Maximum Entropy Markov Models (MEMMs)
- Conditional exponential models
- Assumes the observation sequence is given (need not be modeled)
- Trains the model to maximize the conditional likelihood P(Y|X)
-
MEMMs (Contd..)
- For a new data sequence x, the label sequence y which maximizes P(y|x, Θ) is assigned (Θ = parameter set)
- Arbitrary, non-independent features on the observation sequence are possible
- Conditional models are known to perform better than generative ones
- Performs per-state normalization (standard form shown below)
  - The total mass which arrives at a state must be distributed among all possible successor states
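In the standard MEMM formulation (McCallum et al. 2000), each state's distribution over its successors is a conditional exponential model:

    P(y_i \mid y_{i-1}, x) = \frac{1}{Z(x, y_{i-1})} \exp\Big( \sum_k \lambda_k f_k(x, y_i) \Big)

where Z(x, y_{i-1}) normalizes over the successor states of y_{i-1} only; this is the per-state normalization referred to above.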
-
Label Bias Problem
- Bias towards states with fewer outgoing transitions
- Due to per-state normalization: a state with a single successor must pass all its mass to that successor, regardless of the observation
- An example MEMM (see the numeric illustration below)
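A made-up numeric illustration of the problem (not the MEMM figure from the original slide): under per-state normalization, a state with a single outgoing transition passes on all of its probability mass no matter what the observation says.

    import math

    def per_state_normalize(scores):
        """MEMM-style local normalization over successor states."""
        z = sum(math.exp(s) for s in scores.values())
        return {state: math.exp(s) / z for state, s in scores.items()}

    # State A has two successors; the observation can sway the decision.
    print(per_state_normalize({"B": 2.0, "C": 0.5}))   # {'B': 0.82, 'C': 0.18}

    # State D has one successor: whatever score the observation produces,
    # normalization forces P(E | D, observation) = 1. The observation is ignored.
    print(per_state_normalize({"E": -5.0}))            # {'E': 1.0}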
-
Undirected Graphical Models
- Random Fields
-
Conditional Random Fields (CRFs)
- Conditional exponential model, like the MEMM
- Has all the advantages of MEMMs without the label bias problem
  - An MEMM uses a per-state exponential model for the conditional probabilities of next states given the current state
  - A CRF has a single exponential model for the joint probability of the entire sequence of labels given the observation sequence
- Allows some transitions to vote more strongly than others, depending on the corresponding observations
-
Definition of CRFs
-
CRF Distribution Function
In its standard form (Lafferty et al. 2001):

    p_\theta(y \mid x) \propto \exp\Big( \sum_{e \in E,\, k} \lambda_k f_k(e, y|_e, x) \;+\; \sum_{v \in V,\, k} \mu_k g_k(v, y|_v, x) \Big)

Where:
- V = set of label random variables (E = set of edges between them)
- f_k and g_k = features
  - g_k = state feature (defined on vertices)
  - f_k = edge feature (defined on edges)
- λ_k and μ_k are the parameters to be estimated
- y|e = set of components of y defined by edge e
- y|v = set of components of y defined by vertex v
-
CRF Training
-
CRF Training (Contd..)
- Condition for maximum likelihood: the expected feature count computed using the model equals the empirical feature count from the training data (see the gradient below)
- A closed form solution for the parameters is not possible
- Iterative algorithms are employed, improving the log likelihood in successive iterations
- Examples
  - Generalized Iterative Scaling (GIS)
  - Improved Iterative Scaling (IIS)
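That condition is the zero-gradient condition of the log likelihood; for each parameter λ_k (notation as in the CRF distribution function above):

    \frac{\partial L}{\partial \lambda_k} = \tilde{E}[f_k] - E_{p_\theta}[f_k] = 0

where \tilde{E}[f_k] is the empirical count of feature f_k in the training data and E_{p_\theta}[f_k] is its expected count under the current model.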
-
Graphical Comparison HMMs, MEMMs, CRFs
-
POS Tagging Results
-
Summary
- HMMs
  - Directed, generative graphical models
  - Cannot be used to model overlapping features on observations
- MEMMs
  - Directed, conditional models
  - Can model overlapping features on observations
  - Suffer from the label bias problem due to per-state normalization
- CRFs
  - Undirected, conditional models
  - Avoid the label bias problem
  - Efficient training possible
-
Thanks!
Acknowledgements
Some slides in this presentation are from Rongkun Shen's (Oregon State Univ) presentation on CRFs.