Notes and Announcements
• Midterm exam: Oct 20, Wednesday, In Class
• Late Homeworks
– Turn in hardcopies to Michelle.
– DO NOT ask Michelle for extensions.
– Note down the date and time of submission.
– If submitting softcopy, email to the 10-701 instructors list.
– Software needs to be submitted via Blackboard.
• HW2 out today – watch email
Projects
Hands-on experience with machine learning algorithms – understand when they work and when they fail, develop new ones!
Project ideas are online; discuss with the TAs – every project must have a TA mentor.
• Proposal (10%): Oct 11
• Mid-term report (25%): Nov 8
• Poster presentation (20%): Dec 2, 3-6 pm, NSH Atrium
• Final Project report (45%): Dec 6
Project Proposal
• Proposal (10%): Oct 11
– 1 pg maximum
– Describe data set
– Project idea (approx two paragraphs)
– Software you will need to write.
– 1-3 relevant papers. Read at least one before submitting your proposal.
– Teammate. Maximum team size is 2. Include the division of work.
– Project milestone for the mid-term report? Include experimental results.
Recitation Tomorrow!
• Linear & Non-linear Regression, Nonparametric methods
• Strongly recommended!!
• Place: NSH 1507 (Note)
• Time: 5-6 pm
Non-parametric methods
Kernel density estimate, kNN classifier, kernel regression
Aarti Singh
Machine Learning 10-701/15-781, Sept 29, 2010
Parametric methods
• Assume some functional form (Gaussian, Bernoulli, Multinomial, logistic, Linear) for
– P(Xi|Y) and P(Y) as in Naïve Bayes
– P(Y|X) as in Logistic regression
• Estimate parameters (μ, σ², θ, w, b) using MLE/MAP and plug in (a sketch follows below)
• Pro – need few data points to learn parameters
• Con – Strong distributional assumptions, not satisfied in practice
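As an illustration, here is a minimal sketch (not the course's code) of the plug-in approach with Gaussian P(Xi|Y), as in Gaussian Naive Bayes; all names are illustrative:

```python
# Sketch: MLE for class priors P(Y) and per-feature Gaussian P(Xi|Y),
# then classify via argmax_y P(Y=y) * prod_i N(x_i; mu_yi, var_yi).
import numpy as np

def fit_gaussian_nb(X, y):
    """MLE of per-class means/variances and class priors."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0),          # mu (MLE)
                     Xc.var(axis=0) + 1e-9,    # sigma^2 (MLE, jittered for stability)
                     len(Xc) / len(X))         # prior P(Y=c)
    return params

def predict(params, x):
    """Plug the MLE parameters into the Bayes rule (log domain)."""
    def log_joint(c):
        mu, var, prior = params[c]
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        return np.log(prior) + log_lik
    return max(params, key=log_joint)

X = np.array([[1.0], [1.2], [3.9], [4.1]])
y = np.array([0, 0, 1, 1])
print(predict(fit_gaussian_nb(X, y), np.array([1.1])))  # -> 0
```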
Example
Hand-written digit images projected as points on a two-dimensional (nonlinear) feature space. [Figure: scatter plot of the projected digit images]
Non-Parametric methods
• Typically don’t make any distributional assumptions
• As we have more data, we should be able to learn more complex models
• Let the number of parameters scale with the number of training data points
• Today, we will see some nonparametric methods for
– Density estimation
– Classification
– Regression
Histogram density estimate
Partition the feature space into distinct bins with widths Δi and count the number of observations, ni, in each bin:

p̂(x) = ni / (n Δi)  for x in bin i

• Often, the same width is used for all bins, Δi = Δ.
• Δ acts as a smoothing parameter.
Image src: Bishop book
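For concreteness, a minimal numpy sketch of this estimate with equal-width bins (names and the bin count are illustrative):

```python
# Sketch: histogram density estimate p_hat(x) = n_i / (n * Delta)
# for x in bin i, with equal-width bins over the data range.
import numpy as np

def histogram_density(data, x, num_bins=10):
    lo, hi = data.min(), data.max()
    delta = (hi - lo) / num_bins                  # bin width Delta
    counts, _ = np.histogram(data, bins=num_bins, range=(lo, hi))
    i = min(int((x - lo) / delta), num_bins - 1)  # index of bin containing x
    return counts[i] / (len(data) * delta)        # n_i / (n * Delta)

rng = np.random.default_rng(0)
data = rng.normal(size=500)
print(histogram_density(data, 0.0, num_bins=20))  # ~0.4 for N(0,1)
```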
Effect of histogram bin width
# bins = 1/Δ (equal-width bins on a unit interval)
Assuming the density is roughly constant in each bin (holds true if Δ is small):

Bias of the histogram density estimate at x:  E[p̂(x)] − p(x) = (1/Δ) ∫_bin p(t) dt − p(x) → 0 as Δ → 0
Bias – Variance tradeoff
• Choice of #bins (# bins = 1/Δ)
• Bias – how close is the mean of the estimate to the truth
• Variance – how much does the estimate vary around its mean

Small Δ, large #bins: "Small bias, Large variance" (p(x) approx constant per bin)
Large Δ, small #bins: "Large bias, Small variance" (more data per bin, stable estimate)
Choice of #bins
Image src: Bishop book
# bins = 1/Δ
Image src: Larry's book
For fixed n: as Δ decreases, ni decreases.
MSE = Bias² + Variance
Histogram as MLE
• Class of density estimates – constant on each bin; parameters pj = density in bin j
• Note that Σj pj Δj = 1, since the density estimate must integrate to 1
• Maximize the likelihood of the data under the probability model with parameters pj
• Show that the histogram density estimate is the MLE under this model – HW/Recitation (the setup is sketched below)
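A hedged sketch of the setup (the full argument is the HW/recitation exercise):

```latex
% Model: \hat p(x) = p_j for x in bin j (width \Delta_j), with n_j points
% falling in bin j and the constraint \sum_j p_j \Delta_j = 1.
\log L(p_1, \dots, p_m)
  \;=\; \sum_{i=1}^{n} \log \hat p(X_i)
  \;=\; \sum_{j} n_j \log p_j ,
\qquad \text{subject to } \sum_j p_j \Delta_j = 1 .
% Maximizing subject to the constraint (e.g. via a Lagrange multiplier)
% recovers the histogram estimate \hat p_j = n_j / (n \Delta_j).
```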
Kernel density estimate
• Histogram – blocky estimate
• Kernel density estimate, aka the "Parzen/moving window method"
[Figure: density estimates of the same data over x ∈ [−5, 5] – blocky histogram vs. smooth kernel estimate]
Kernel density estimate
• More generally,

p̂n(x) = (1/n) Σi (1/h) K((x − Xi)/h)

where K is a kernel function and h > 0 is the bandwidth.
[Figure: boxcar kernel supported on [−1, 1]]
Kernel density estimation
Gaussian bumps (red) around six data points and their sum (blue)
• Place small "bumps" at each data point, determined by the kernel function.
• The estimator consists of a (normalized) "sum of bumps".
• Note that where the points are denser, the density estimate will have higher values.
Img src: Wikipedia
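A minimal numpy sketch of a Gaussian kernel density estimate (names and the bandwidth value are illustrative):

```python
# Sketch: p_hat(x) = (1/n) * sum_i (1/h) * K((x - X_i) / h)
# with the Gaussian kernel K.
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde(data, x, h=0.3):
    u = (x - data) / h                            # scaled distances to x
    return gaussian_kernel(u).sum() / (len(data) * h)

rng = np.random.default_rng(0)
data = rng.normal(size=500)
print(kde(data, 0.0))  # ~0.4 for N(0,1)
```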
Kernels
Any kernel function K that satisfies K(u) ≥ 0 and ∫ K(u) du = 1 (and is typically symmetric, so that ∫ u K(u) du = 0) can be used.
Kernels
• Finite support – only need local points to compute the estimate
• Infinite support – need all points to compute the estimate, but quite popular since smoother (10-702)
Choice of kernel bandwidth
Image source: Larry's book – All of Nonparametric Statistics
[Figure: kernel density estimates of the "Bart-Simpson" density with bandwidths that are too small, just right, and too large]
Histograms vs. Kernel density estimation
The bin width Δ and the bandwidth h play the same role: each acts as a smoother.
Bias-variance tradeoff
• Simulations
k-NN (Nearest Neighbor) density estimation
• Histogram, kernel density estimate: fix Δ (or h), estimate the number of points within Δ of x (ni or nx) from the data
• k-NN density estimate: fix nx = k, estimate Δ from the data (the volume of the ball around x that contains the k nearest training pts):

p̂n(x) = k / (n · vol(ball of radius Δk,x around x))
k-NN density estimation
k acts as a smoother.
Not very popular for density estimation – expensive to compute, bad estimates.
But a related version for classification is quite popular…
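Before moving on, a minimal 1-D sketch of the k-NN density estimate above (names are illustrative; in 1-D the ball of radius Δ has volume 2Δ):

```python
# Sketch: fix k, let Delta_{k,x} be the distance to the k-th nearest
# training point, and set p_hat(x) = k / (n * 2 * Delta_{k,x}).
import numpy as np

def knn_density(data, x, k=10):
    dists = np.sort(np.abs(data - x))   # distances to all training points
    delta = dists[k - 1]                # radius of ball containing k points
    return k / (len(data) * 2 * delta)

rng = np.random.default_rng(0)
data = rng.normal(size=500)
print(knn_density(data, 0.0, k=25))  # ~0.4 for N(0,1)
```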
From Density Estimation to Classification
k-NN classifier
[Figure: training documents in feature space, labeled Sports, Science, Arts]
k-NN classifier
[Figure: the same documents plus an unlabeled test document]
k-NN classifier (k=4)
[Figure: the k = 4 nearest training documents to the test document, all within distance Dk,x]
What should we predict? … Average? Majority? Why?
k-NN classifier
• Optimal (Bayes) classifier:

f*(x) = arg maxy P(Y = y) p(X = x | Y = y)

• k-NN classifier: plug in the k-NN estimates, with ny = # total training pts of class y and ky = # training pts of class y that lie within the Dk ball:

P̂(Y = y) = ny / n,  p̂(x | Y = y) = ky / (ny · vol(Dk ball))

so f̂(x) = arg maxy ky (majority vote among the k nearest neighbors)
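A minimal sketch of the majority-vote rule with Euclidean distance (names are illustrative):

```python
# Sketch: k-NN classifier -- majority vote among the k nearest
# training points under Euclidean distance.
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=4):
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training pt
    nearest = np.argsort(dists)[:k]               # indices of the k nearest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]             # majority class label
```

Ties are possible for even k, which is one reason even values of k are avoided in practice (see below).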
1-Nearest Neighbor (kNN) classifier
[Figure: 1-NN decision regions for Sports, Science, Arts]
2-Nearest Neighbor (kNN) classifier
[Figure: 2-NN decision regions for Sports, Science, Arts]
Even K is not used in practice (ties are possible).
3-Nearest Neighbor (kNN) classifier
[Figure: 3-NN decision regions for Sports, Science, Arts]
5-Nearest Neighbor (kNN) classifier
[Figure: 5-NN decision regions for Sports, Science, Arts]
What is the best K?
Bias-variance tradeoff:
• Larger K => predicted label is more stable (lower variance)
• Smaller K => predicted label is more accurate locally (lower bias)
Similar to density estimation. Choice of K – in next class …
1-NN classifier – decision boundary
K = 1: the decision boundary is given by the Voronoi diagram of the training points. [Figure: Voronoi diagram]
k-NN classifier – decision boundary
• K acts as a smoother (bias-variance tradeoff)
• Guarantee: as n → ∞, the error rate of the 1-nearest-neighbour classifier is never more than twice the optimal (Bayes) error [Cover & Hart, 1967].
Case Study: kNN for Web Classification
• Dataset – 20 News Groups (20 classes)
– Download: http://people.csail.mit.edu/jrennie/20Newsgroups/
– 61,118 words, 18,774 documents
– Class label descriptions
Experimental Setup
• Training/Test sets:
– 50%-50% random split
– 10 runs
– report average results
• Evaluation criterion: classification accuracy on the test set
Results: Binary Classes
[Figure: accuracy vs. k for three binary tasks: alt.atheism vs. comp.graphics, rec.autos vs. rec.sport.baseball, comp.windows.x vs. rec.motorcycles]
From Classification to Regression
Temperature sensing
• What is the temperature in the room? Take the average of all the sensor readings.
• What is the temperature at location x? Take a "local" average of the readings near x.
[Figure: sensor readings along the room, with a local window around x]
Kernel Regression
• Aka local regression
• Nadaraya-Watson kernel estimator:

f̂n(x) = Σi wi(x) Yi,  where wi(x) = K((x − Xi)/h) / Σj K((x − Xj)/h)

• Weight each training point based on its distance to the test point
• Boxcar kernel yields a local average over a window of width h
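A minimal numpy sketch of the Nadaraya-Watson estimator with a Gaussian kernel (the unnormalized kernel suffices since the normalizing constant cancels in the weights; names and data are illustrative):

```python
# Sketch: f_hat(x) = sum_i w_i(x) * Y_i,
# w_i(x) = K((x - X_i)/h) / sum_j K((x - X_j)/h).
import numpy as np

def nadaraya_watson(X, Y, x, h=1.0):
    weights = np.exp(-0.5 * ((x - X) / h) ** 2)   # Gaussian kernel (unnormalized)
    return np.sum(weights * Y) / np.sum(weights)  # kernel-weighted local average

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, 200)
Y = np.sin(X) + 0.1 * rng.normal(size=200)
print(nadaraya_watson(X, Y, 1.0, h=0.2))  # ~ sin(1.0) ≈ 0.84
```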
Kernels
[Figure: examples of kernel functions]
Choice of kernel bandwidth h
Image source: Larry's book – All of Nonparametric Statistics
[Figure: kernel regression fits with h = 1, 10, 50, 200 – from too small (undersmoothed) through just right to too large (oversmoothed)]
Choice of kernel is not that important; the choice of bandwidth h is.
Kernel Regression as Weighted Least Squares
Weighted least squares:

minβ Σi K((x − Xi)/h) (Yi − f(Xi))²

Kernel regression corresponds to the locally constant estimator obtained from (locally) weighted least squares, i.e. set f(Xi) = β (a constant).
Kernel Regression as Weighted Least Squares
Set f(Xi) = β (a constant). Notice that minimizing the locally weighted least squares objective over β recovers exactly the Nadaraya-Watson estimator (derivation sketched below).
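A short derivation under the locally constant model:

```latex
% Locally constant fit: minimize the kernel-weighted squared error over \beta.
\hat\beta \;=\; \arg\min_{\beta} \sum_{i=1}^{n}
  K\!\Big(\frac{x - X_i}{h}\Big)\,(Y_i - \beta)^2 .
% Setting the derivative with respect to \beta to zero:
\sum_{i=1}^{n} K\!\Big(\frac{x - X_i}{h}\Big)\,(Y_i - \hat\beta) = 0
\;\;\Longrightarrow\;\;
\hat\beta \;=\;
  \frac{\sum_i K\big(\tfrac{x - X_i}{h}\big)\, Y_i}
       {\sum_i K\big(\tfrac{x - X_i}{h}\big)} ,
% which is exactly the Nadaraya-Watson estimator \hat f_n(x).
```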
Local Linear/Polynomial Regression
Weighted least squares:

minβ0,…,βp Σi K((x − Xi)/h) (Yi − f(Xi))²

Local polynomial regression corresponds to the locally polynomial estimator obtained from (locally) weighted least squares, i.e. set

f(Xi) = β0 + β1 (Xi − x) + … + βp (Xi − x)^p

(a local polynomial of degree p around x); the prediction is f̂n(x) = β̂0.
More in HW, 10-702 (statistical machine learning)
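A minimal sketch of the p = 1 (local linear) case using numpy's weighted polynomial fit; names, bandwidth, and data are illustrative:

```python
# Sketch: local linear regression -- weighted least squares in the
# coordinate (X_i - x); the prediction at x is the fitted value at 0.
import numpy as np

def local_linear(X, Y, x, h=0.5):
    w = np.exp(-0.5 * ((x - X) / h) ** 2)        # Gaussian kernel weights
    # np.polyfit minimizes sum_i (w_i * residual_i)^2,
    # so pass the square roots of the kernel weights.
    coeffs = np.polyfit(X - x, Y, deg=1, w=np.sqrt(w))
    return np.polyval(coeffs, 0.0)               # local fit evaluated at x

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, 200)
Y = np.sin(X) + 0.1 * rng.normal(size=200)
print(local_linear(X, Y, 1.0, h=0.3))  # ~ sin(1.0) ≈ 0.84
```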
Summary
• Instance based/non-parametric approaches
Four things make a memory-based learner:
1. A distance metric: dist(x, Xi) – Euclidean (and many more)
2. How many nearby neighbors / how large a radius to look at? k, Δ/h
3. A weighting function (optional): W based on the kernel K
4. How to fit with the local points? Average, majority vote, weighted average, polynomial fit
Summary
• Parametric vs Nonparametric approaches
Nonparametric models place very mild assumptions on the data distribution and provide good models for complex data
Parametric models rely on very strong (simplistic) distributional assumptions
Nonparametric models (other than histograms) require storing and computing with the entire data set.
Parametric models, once fitted, are much more efficient in terms of storage and computation.
What you should know…
• Histograms, kernel density estimation
– Effect of bin width / kernel bandwidth
– Bias-variance tradeoff
• K-NN classifier
– Nonlinear decision boundaries
• Kernel (local) regression
– Interpretation as weighted least squares
– Local constant/linear/polynomial regression