N-gram Models CMSC 25000 Artificial Intelligence March 1, 2005.
N-gram Models
CMSC 25000
Artificial Intelligence
March 1, 2005
Markov Assumptions
• Exact computation requires too much data
• Approximate probability given all prior words
  – Assume finite history
  – Bigram: Probability of word given 1 previous
    • First-order Markov
  – Trigram: Probability of word given 2 previous
• N-gram approximation
  $$P(w_n \mid w_1^{n-1}) \approx P(w_n \mid w_{n-N+1}^{n-1})$$
• Bigram sequence
  $$P(w_1^n) \approx \prod_{k=1}^{n} P(w_k \mid w_{k-1})$$
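The bigram approximation can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the toy corpus, the `<s>` / `</s>` boundary markers, and the function names are all assumptions, and the estimates are unsmoothed maximum-likelihood counts.

```python
from collections import Counter

def train_bigram(tokens):
    """Maximum-likelihood estimates: P(w | prev) = count(prev, w) / count(prev)."""
    context_counts = Counter(tokens[:-1])
    pair_counts = Counter(zip(tokens, tokens[1:]))
    return {pair: c / context_counts[pair[0]] for pair, c in pair_counts.items()}

def sequence_probability(model, tokens):
    """Product of P(w_k | w_{k-1}); an unseen bigram gives probability 0."""
    prob = 1.0
    for pair in zip(tokens, tokens[1:]):
        prob *= model.get(pair, 0.0)
    return prob

# Toy corpus (hypothetical); <s> and </s> mark sentence boundaries.
corpus = "<s> the cat sat </s> <s> the dog sat </s>".split()
model = train_bigram(corpus)
p = sequence_probability(model, "<s> the cat sat </s>".split())
print(p)
```

Because nothing is smoothed here, any sentence containing a bigram absent from training gets probability zero, which is why practical models add smoothing.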
Evaluating n-gram models
• Entropy & Perplexity
  – Information theoretic measures
  – Measures information in grammar or fit to data
  – Conceptually, lower bound on # bits to encode
• Entropy: H(X): X is a random variable, p: probability function
  $$H(X) = -\sum_{x \in X} p(x) \log_2 p(x)$$
• Perplexity: $2^H$
  – Weighted average of number of choices
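As a small worked example of the two measures, the sketch below computes entropy in bits and the corresponding perplexity for two made-up distributions over four outcomes. The distributions and function names are illustrative assumptions.

```python
import math

def entropy(dist):
    """H(X) = -sum p(x) log2 p(x), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def perplexity(dist):
    """Perplexity is 2 raised to the entropy."""
    return 2 ** entropy(dist)

uniform = {w: 0.25 for w in "abcd"}
skewed = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

print(entropy(uniform), perplexity(uniform))
print(entropy(skewed), perplexity(skewed))
```

The uniform case has perplexity equal to the number of outcomes; the skewed case is lower, matching the reading of perplexity as a weighted average number of choices.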
Perplexity Model Comparison
• Compare models with different history
• Train models
  – 38 million words – Wall Street Journal
• Compute perplexity on held-out test set
  – 1.5 million words (~20K unique, smoothed)
• N-gram Order | Perplexity
  – Unigram | 962
  – Bigram | 170
  – Trigram | 109
Does the model improve?
• Compute probability of data under model
  – Compute perplexity
• Relative measure
  – Decrease toward optimum?
  – Lower than competing model?
Iter | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 9 | 10
P(data) | 9^-19 | 1^-16 | 2^-16 | 3^-16 | 4^-16 | 4^-16 | 4^-16 | 5^-16 | 5^-16
Perplexity | 3.393 | 2.95 | 2.88 | 2.85 | 2.84 | 2.83 | 2.83 | 2.8272 | 2.8271
Entropy of English
• Shannon’s experiment
  – Subjects guess strings of letters, count guesses
  – Entropy of guess sequence = Entropy of letter sequence
  – 1.3 bits; Restricted text
• Build stochastic model on text & compute
  – Brown computed trigram model on varied corpus
  – Compute (per-char) entropy of model
  – 1.75 bits
Using N-grams
• Language Identification
  – Take text samples
    • English, French, Spanish, German
  – Build character tri-gram models
  – Test Sample: Compute maximum likelihood
    • Best match is chosen language
• Authorship attribution
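The language identification recipe can be sketched as follows. This is a simplified illustration under stated assumptions: it scores a sample by the add-one-smoothed relative frequency of its character trigrams (a simplification of a true conditional trigram model), the training sentences are tiny made-up samples, and `vocab_size` is an arbitrary smoothing constant.

```python
import math
from collections import Counter

def char_trigrams(text):
    """All overlapping 3-character windows, with space padding at the ends."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def train(text):
    return Counter(char_trigrams(text))

def log_likelihood(model, text, vocab_size=10000):
    """Add-one smoothed log-probability of the sample's trigrams (assumed smoothing)."""
    total = sum(model.values())
    return sum(math.log((model[t] + 1) / (total + vocab_size))
               for t in char_trigrams(text))

def identify(models, text):
    """Pick the language whose model gives the sample the highest likelihood."""
    return max(models, key=lambda lang: log_likelihood(models[lang], text))

# Hypothetical training samples; real models need far more text.
models = {
    "english": train("the quick brown fox jumps over the lazy dog and the cat"),
    "spanish": train("el perro y el gato corren sobre la casa grande"),
}
print(identify(models, "the dog"))
print(identify(models, "el gato"))
```

The same scoring scheme applies to authorship attribution by training one model per candidate author instead of per language.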
Sequence Models in Modern AI
• Probabilistic sequence models:
  – HMMs, N-grams
  – Train from available data
• Classification with contextual influence
  – Robust to noise/variability
    • E.g. Sentences vary in degrees of acceptability
  – Provides ranking of sequence quality
  – Exploits large scale data, storage, memory, CPU
Computer Vision
CMSC 25000
Artificial Intelligence
March 1, 2005
Roadmap
• Motivation
  – Computer vision applications
• Is a Picture worth a thousand words?
  – Low level features
    • Feature extraction: intensity, color
  – High level features
    • Top-down constraint: shape from stereo, motion, ...
• Case Study: Vision as Modern AI
  – Fast, robust face detection (Viola & Jones 2002)
Perception
• From observation to facts about world
  – Analogous to speech recognition
  – Stimulus (Percept) S, World W
    • S = g(W)
  – Recognition: Derive world from percept
    • W = g’(S)
• Is this possible?
Key Perception Problem
• Massive ambiguity
  – Optical illusions
• Occlusion
• Depth perception
• “Objects are closer than they appear”
• Is it full-sized or a miniature model?
Image Ambiguity
Handling Uncertainty
• Identify single perfect correct solution
  – Impossible!
    • Noise, ambiguity, complexity
• Solution:
  – Probabilistic model
  – P(W|S) = αP(S|W) P(W)
    • Maximize image probability and model probability
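A toy calculation may make the rule P(W|S) = αP(S|W) P(W) concrete. The world states, stimulus names, and all probabilities below are invented for illustration; α is simply the constant that makes the posterior sum to 1.

```python
def posterior(prior, likelihood, stimulus):
    """P(W|S) = alpha * P(S|W) * P(W), with alpha normalizing the result."""
    unnormalized = {w: likelihood[w][stimulus] * prior[w] for w in prior}
    alpha = 1.0 / sum(unnormalized.values())
    return {w: alpha * p for w, p in unnormalized.items()}

# Hypothetical numbers: is the object a full-sized car or a miniature model?
prior = {"full_size_car": 0.9, "toy_car": 0.1}            # P(W)
likelihood = {                                             # P(S|W)
    "full_size_car": {"small_image": 0.2, "large_image": 0.8},
    "toy_car": {"small_image": 0.7, "large_image": 0.3},
}

post = posterior(prior, likelihood, "small_image")
best = max(post, key=post.get)
print(post, best)
```

Even though a small image is more likely under the toy hypothesis, the strong prior keeps the full-sized interpretation on top, which is the sense in which both the image probability and the model probability are maximized together.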
Handling Complexity
• Don’t solve the whole problem
  – Don’t recover every object/position/color…
• Solve restricted problem
  – Find all the faces
  – Recognize a person
  – Align two images
Modern Computer Vision Applications
• Face / Object detection
• Medical image registration
• Face recognition
• Object tracking
Vision Subsystems
Image Formation
Images and Representations
• Initially pixel images
  – Image as NxM matrix of pixel values
  – Alternate image codings
    • Grey-scale intensity values
    • Color encoding: intensities of RGB values
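To illustrate the two codings, the sketch below stores a tiny color image as a matrix of (R, G, B) tuples and converts it to grey-scale intensities. The 2x2 image is made up, and the weights are the common ITU-R BT.601 luma coefficients, chosen here as one reasonable convention rather than anything the slides specify.

```python
def to_grey(rgb_image):
    """Convert a matrix of (R, G, B) tuples to a matrix of grey-scale intensities."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

# A hypothetical 2x2 color image: red, green / blue, white.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
grey = to_grey(image)
print(grey)
```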
Images
Grey-scale Images
Color Images
Image Features
• Grey-scale and color intensities
  – Directly access image signal values
  – Large number of measures
    • Possibly noisy
    • Only care about intensities as cues to world
• Image Features:
  – Mid-level representation
  – Extract from raw intensities
  – Capture elements of interest for image understanding
Edge Detection
Edge Detection
• Find sharp demarcations in intensity
• 1) Apply spatially oriented filters
  – E.g. vertical, horizontal, diagonal
• 2) Label above-threshold pixels with edge orientation
• 3) Combine edge segments with same orientation: line
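Steps 1 and 2 above can be sketched with simple difference filters. This is a minimal illustration, assuming a grey-scale image stored as a list of rows and using central differences in place of whatever oriented filters the lecture had in mind; diagonal filters and step 3 (linking segments into lines) are left out.

```python
def detect_edges(image, threshold):
    """Label each interior pixel 'V' (vertical edge), 'H' (horizontal edge), or '.'."""
    rows, cols = len(image), len(image[0])
    labels = [["." for _ in range(cols)] for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # change across columns: vertical edge
            gy = image[y + 1][x] - image[y - 1][x]  # change across rows: horizontal edge
            if max(abs(gx), abs(gy)) > threshold:
                labels[y][x] = "V" if abs(gx) >= abs(gy) else "H"
    return labels

# Hypothetical 4x4 image: dark left half, bright right half.
image = [[0, 0, 9, 9] for _ in range(4)]
labels = detect_edges(image, threshold=4)
for row in labels:
    print("".join(row))
```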
Top-down Constraints
• Goal: Extract objects from images
  – Approach: apply knowledge about how the world works to identify coherent objects; reconstruct 3D
Motion: Optical Flow
• Find correspondences in sequential images
  – Units which move together represent objects
Stereo
Texture and Shading
Edge-Based 2-3D Reconstruction
• Assume world of solid polyhedra with 3-edge vertices
• Apply Waltz line labeling
  – Via Constraint Satisfaction
Summary
• Vision is hard:
  – Noise, ambiguity, complexity
• Prior knowledge is essential to constrain problem
  – Cohesion of objects, optics, object features
• Combine multiple cues
  – Motion, stereo, shading, texture
• Image/object matching:
  – Library: features, lines, edges, etc.
• Apply domain knowledge: Optics
• Apply machine learning: NN, NN, CSP, etc.
Computer Vision Case Study
• “Rapid Object Detection using a Boosted Cascade of Simple Features”, Viola/Jones ’01
• Challenge:
  – Object detection:
    • Find all faces in arbitrary images
  – Real-time execution
    • 15 frames per second
  – Need simple features, classifiers
Rapid Object Detection Overview
• Fast detection with simple local features
  – Simple fast feature extraction
    • Small number of computations per pixel
    • Rectangular features
  – Feature selection with Adaboost
    • Sequential feature refinement
  – Cascade of classifiers
    • Increasingly complex classifiers
    • Repeatedly rule out non-object areas
Picking Features
• What cues do we use for object detection?
  – Not direct pixel intensities
  – Features
• Can encode task specific domain knowledge (bias)
  – Difficult to learn directly from data
  – Reduce training set size
• Feature system can speed processing
Rectangle Features
• Treat rectangles as units
  – Derive statistics
• Two-rectangle features
  – Two similar rectangular regions
    • Vertically or horizontally adjacent
  – Sum pixels in each region
    • Compute difference between regions
Rectangle Features II
• Three-rectangle features
  – 3 similar rectangles: horizontally/vertically
    • Sum outside rectangles
    • Subtract from center region
• Four-rectangle features
  – Compute difference between diagonal pairs
• HUGE feature set: ~180,000
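The two- and three-rectangle features can be written directly as differences of pixel sums. The sketch below is an illustration only: it sums pixels naively rather than efficiently, handles just the horizontal layouts, and the sign convention for the two-rectangle case (left minus right) is an assumption. The test image is made up.

```python
def region_sum(image, top, left, height, width):
    """Sum of pixels in a height x width rectangle with its corner at (top, left)."""
    return sum(image[y][x]
               for y in range(top, top + height)
               for x in range(left, left + width))

def two_rect_horizontal(image, top, left, height, width):
    """Difference between two horizontally adjacent rectangles (left minus right)."""
    return (region_sum(image, top, left, height, width)
            - region_sum(image, top, left + width, height, width))

def three_rect_horizontal(image, top, left, height, width):
    """Sum of the two outside rectangles subtracted from the center rectangle."""
    outside = (region_sum(image, top, left, height, width)
               + region_sum(image, top, left + 2 * width, height, width))
    center = region_sum(image, top, left + width, height, width)
    return center - outside

# Hypothetical 2x6 image with a bright vertical bar in the middle.
image = [[1, 1, 5, 5, 1, 1] for _ in range(2)]
print(two_rect_horizontal(image, 0, 0, 2, 2))
print(three_rect_horizontal(image, 0, 0, 2, 2))
```

The three-rectangle feature responds strongly to the bar, which is the kind of simple structure these features are sensitive to.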
Rectangle Features
Computing Features Efficiently
• Fast detection requires fast feature calculation
• Rapidly compute intermediate representation
  – “Integral image”
  – Value for point (x,y) is sum of pixels above and to the left, inclusive
  – $ii(x,y) = \sum_{x' \le x,\, y' \le y} i(x',y')$
  – Computed by recurrence
    • $s(x,y) = s(x,y-1) + i(x,y)$, where $s(x,y)$ is the cumulative row sum
    • $ii(x,y) = ii(x-1,y) + s(x,y)$
• Compute rectangle sum with 4 array references
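The integral image and the four-reference rectangle sum can be sketched as below. This is an illustrative version, assuming the image is a list of rows indexed as `image[y][x]`; it accumulates a running sum along each row and adds the integral value from the row above, which is the same one-pass idea as the recurrence with the roles of the two axes swapped. Function names are assumptions.

```python
def integral_image(image):
    """ii[y][x] = sum of image[y'][x'] over all y' <= y and x' <= x."""
    rows, cols = len(image), len(image[0])
    ii = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        row_sum = 0  # cumulative sum along the current row
        for x in range(cols):
            row_sum += image[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom, columns left..right (inclusive), using 4 lookups."""
    def at(y, x):
        return ii[y][x] if y >= 0 and x >= 0 else 0
    return (at(bottom, right) - at(top - 1, right)
            - at(bottom, left - 1) + at(top - 1, left - 1))

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
ii = integral_image(image)
print(ii[2][2])                  # total of all pixels
print(rect_sum(ii, 1, 1, 2, 2))  # bottom-right 2x2 block: 5 + 6 + 8 + 9
```

Once the integral image is built, the cost of a rectangle sum no longer depends on the rectangle's size, which is what makes evaluating many rectangle features per sub-window affordable.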
Rectangle Feature Summary
• Rectangle features
  – Relatively simple
  – Sensitive to bars, edges, simple structure
    • Coarse
  – Rich enough for effective learning
  – Efficiently computable
Learning an Image Classifier
• Supervised training: +/- examples
• Many learning approaches possible
• Adaboost:
  – Selects features AND trains classifier
  – Improves performance of simple classifiers
    • Guaranteed to converge exponentially rapidly
  – Basic idea: Simple classifier
    • Boosts performance by focusing on previous errors
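The idea of "focusing on previous errors" corresponds to the reweighting step of boosting. The sketch below shows one round of the classic discrete AdaBoost update, offered as an illustration and not necessarily the exact variant used in the case study; it assumes the weak classifier's weighted error is strictly between 0 and 1.

```python
import math

def reweight(weights, correct):
    """One boosting round: raise weights of misclassified examples, then renormalize.

    weights: current example weights (summing to 1)
    correct: booleans, True where the weak classifier was right
    Returns (alpha, new_weights), where alpha is the weak classifier's vote.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - error) / error)
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Four equally weighted examples; the weak classifier gets the last one wrong.
alpha, new_weights = reweight([0.25, 0.25, 0.25, 0.25], [True, True, True, False])
print(alpha, new_weights)
```

After the update the single misclassified example carries half of the total weight, so the next weak classifier is pushed to get it right.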
Feature Selection and Training
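• Goal: Pick only useful features from 180,000
  – Idea: Small number of features effective
• Learner selects single feature that best separates positive and negative examples
  – Learner selects optimal threshold for each feature
  – Classifier: $h(x) = 1$ if $p\,f(x) < p\,\theta$, 0 otherwise, where $f(x)$ is the feature value, $\theta$ the threshold, and $p = \pm 1$ a parity that sets the direction of the inequality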
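Choosing a threshold and parity for a single feature can be sketched as an exhaustive search for the lowest weighted error. The feature values, labels, and weights below are invented, and the brute-force search is an assumption made for clarity rather than the efficient procedure a real trainer would use.

```python
def stump_predict(value, threshold, parity):
    """h(x) = 1 if parity * f(x) < parity * threshold, else 0."""
    return 1 if parity * value < parity * threshold else 0

def best_stump(values, labels, weights):
    """Return (weighted_error, threshold, parity) with the lowest weighted error."""
    best = None
    for threshold in sorted(set(values)):
        for parity in (1, -1):
            error = sum(w for v, y, w in zip(values, labels, weights)
                        if stump_predict(v, threshold, parity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, parity)
    return best

# Hypothetical feature values: positives (label 1) have low values.
values = [1, 2, 3, 8, 9, 10]
labels = [1, 1, 1, 0, 0, 0]
weights = [1 / 6] * 6
error, threshold, parity = best_stump(values, labels, weights)
print(error, threshold, parity)
```

Running this search once per candidate feature and keeping the feature with the smallest error is what lets the learner select features and train the classifier in the same step.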
Basic Learning Results
• Initial classification: Frontal faces
  – 200 features
  – Finds 95% of faces, with 1 in 14,000 false positives
  – Very fast
• Adding features adds to computation time
• Features interpretable
  – Darker region around eyes than nose/cheeks
  – Eyes are darker than bridge of nose
Primary Features
“Attentional Cascade”
• Goal: Improved classification, reduced time
  – Insight: Small, fast classifiers can reject
    • But have very few false negatives
  – Reject majority of uninteresting regions quickly
  – Focus computation on interesting regions
• Approach: “Degenerate” decision tree
  – Aka “cascade”
  – Positive results passed to high detection classifiers
  – Negative results rejected immediately
Cascade Schematic
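[Figure: cascade schematic. All sub-window features enter CL 1; a T (true) result passes the sub-window on to CL 2, then CL 3, then more classifiers; an F (false) result at any stage rejects the sub-window.]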
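The control flow of the cascade is short enough to sketch directly. The stage functions and the feature names (`mean`, `contrast`, `symmetry`) below are hypothetical stand-ins for trained classifiers; only the early-exit structure is the point.

```python
def cascade_classify(stages, window):
    """Accept a sub-window only if every stage accepts; stop at the first rejection."""
    for stage in stages:
        if not stage(window):
            return False  # rejected immediately, later stages never run
    return True

# Hypothetical stages, ordered from cheapest to most expensive.
stages = [
    lambda w: w["mean"] > 50,
    lambda w: w["contrast"] > 10,
    lambda w: w["symmetry"] > 0.8,
]

background = {"mean": 20}  # later features are never even looked up
candidate = {"mean": 80, "contrast": 30, "symmetry": 0.9}
print(cascade_classify(stages, background))
print(cascade_classify(stages, candidate))
```

Since most sub-windows look like `background`, the average cost per window stays close to the cost of the first stage.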
Cascade Construction
• Each stage is a trained classifier
  – Tune threshold to minimize false negatives
  – Good first stage classifier
    • Two-feature strong classifier: eye/cheek + eye/nose
    • Tuned: Detect 100%; 40% false positives
  – Very computationally efficient
    • 60 microprocessor instructions
Cascading
• Goal: Reject bad features quickly
  – Most features are bad
    • Reject early in processing, little effort
  – Good regions will trigger full cascade
    • Relatively rare
• Classification is progressively more difficult
  – Rejected the most obvious cases already
    • Deeper classifiers more complex, more error-prone
Cascade Training
• Tradeoffs: Accuracy vs Cost
  – More accurate classifiers: more features, more complex
  – More features, more complex: Slower
  – Difficult optimization
• Practical approach
  – Each stage reduces false positive rate
  – Bound the reduction in false positives and the increase in misses per stage
  – Add features to each stage until its targets are met
  – Add stages until overall effectiveness targets are met
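The arithmetic behind the per-stage bounds is simple if one assumes every stage has the same rates and that stages act independently on the windows they receive: the overall false positive rate and detection rate are then products of the per-stage rates. The numbers below are illustrative assumptions, not figures from the slides.

```python
import math

def stages_needed(per_stage_fp, target_fp):
    """Smallest number of stages n such that per_stage_fp ** n <= target_fp."""
    return math.ceil(math.log(target_fp) / math.log(per_stage_fp))

def overall_rate(per_stage_rate, n_stages):
    """Rate after n identical, independent stages."""
    return per_stage_rate ** n_stages

# Hypothetical targets: each stage passes 50% of non-faces and keeps 99% of faces.
n = stages_needed(0.5, 1e-6)
print(n)
print(overall_rate(0.5, n))    # overall false positive rate
print(overall_rate(0.99, n))   # overall detection rate
```

The example shows why the per-stage miss rate must be kept very small: even a 1% loss per stage compounds to a noticeable drop in overall detection over many stages.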
Results
• Task: Detect frontal upright faces
  – Face/non-face training images
    • Face: ~5000 hand-labeled instances
    • Non-face: ~9500 random web-crawl, hand-checked
  – Classifier characteristics:
    • 38 layer cascade
    • Increasing number of features per layer: 1, 10, 25, …; 6061 in total
  – Classification: Average 10 features per window
    • Most rejected in first 2 layers
    • Process 384x288 image in 0.067 secs
Detection Tuning
• Multiple detections:
  – Many subwindows around face will alert
  – Create disjoint subsets
    • For overlapping boundaries, only report one
  – Return average of corners
• Voting:
  – 3 similarly trained detectors
    • Majority rules
  – Improves overall performance
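Both tuning steps can be sketched briefly. The code below assumes detections are axis-aligned boxes given as (left, top, right, bottom), treats any shared area as "overlapping", and uses made-up coordinates; these are assumptions for illustration, not the detector's actual merging rule.

```python
def overlaps(a, b):
    """True if boxes (left, top, right, bottom) share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def merge_detections(boxes):
    """Group overlapping boxes into disjoint subsets; report each group's average corners."""
    groups = []
    for box in boxes:
        touching = [g for g in groups if any(overlaps(box, other) for other in g)]
        merged = [box] + [b for g in touching for b in g]
        groups = [g for g in groups if g not in touching] + [merged]
    return [tuple(sum(b[i] for b in g) / len(g) for i in range(4)) for g in groups]

def majority_vote(votes):
    """True if more than half of the detectors report a face."""
    return sum(votes) * 2 > len(votes)

# Two overlapping alerts around one face, plus a separate detection.
boxes = [(10, 10, 30, 30), (12, 12, 32, 32), (100, 100, 120, 120)]
merged = merge_detections(boxes)
print(merged)
print(majority_vote([True, True, False]))
```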
Conclusions
• Fast, robust facial detection
  – Simple, easily computable features
  – Simple trained classifiers
  – Classification cascade allows early rejection
    • Early classifiers also simple, fast
  – Good overall classification in real-time
Some Results
Vision in Modern AI
• Goals:
  – Robustness
  – Multidomain applicability
  – Automatic acquisition
  – Speed: Real time
• Approach:
  – Simple mechanisms, feature selection
  – Machine learning: Tune features, classification