Bioinformatics (Lec 17), 3/15/05 · giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen]...


A Dendrogram

3/15/05 · Bioinformatics (Lec 17)

Hierarchical Clustering [Johnson, S.C., 1967]
• Given n points in R^d, compute the distance between every pair of points.
• While (not done):
  – Pick the closest pair of points s_i and s_j and make them part of the same cluster.
  – Replace the pair by their average, s_ij.

Try the applet at: http://www.cs.mcgill.ca/~papou/#applet
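The merge loop above can be sketched in Python. This is a naive O(n³) sketch; the function name and the centroid-averaging merge rule are illustrative assumptions (real implementations record the full dendrogram and support several linkage rules):

```python
import math

def hierarchical_cluster(points, target_clusters=1):
    """Naive agglomerative clustering: repeatedly merge the closest
    pair of cluster representatives, replacing the pair with their
    average, until target_clusters remain."""
    clusters = [list(p) for p in points]
    merges = []
    while len(clusters) > target_clusters:
        # find the closest pair of cluster representatives
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # replace the pair s_i, s_j by their average s_ij
        merged = [(a + b) / 2 for a, b in zip(clusters[i], clusters[j])]
        merges.append((clusters[i], clusters[j], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges
```

Each entry of `merges` records one level of the dendrogram: the two clusters joined and the distance at which they were joined.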


Distance Metrics
• For clustering, define a distance function:
  – Euclidean distance metrics
  – Pearson correlation coefficient

Minkowski distance of order k (k = 2 gives the Euclidean distance):

  D_k(X, Y) = [ Σ_{i=1..d} |X_i − Y_i|^k ]^{1/k}

Pearson correlation coefficient:

  ρ_xy = (1/d) Σ_{i=1..d} ((x_i − X̄)/σ_X)((y_i − Ȳ)/σ_Y),   with −1 ≤ ρ_xy ≤ 1
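Both metrics are short to state in code; a minimal sketch (the function names are mine):

```python
import math

def minkowski(x, y, k=2):
    """D_k(X, Y) = (sum_i |x_i - y_i|^k)^(1/k); k = 2 is Euclidean."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1 / k)

def pearson(x, y):
    """rho_xy = (1/d) * sum_i ((x_i - mean_x)/sigma_x)((y_i - mean_y)/sigma_y)."""
    d = len(x)
    mx, my = sum(x) / d, sum(y) / d
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / d)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / d)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (d * sx * sy)
```

Note that ρ_xy = 1 for perfectly correlated profiles and −1 for perfectly anti-correlated ones, which is why 1 − ρ_xy is often used as a distance for expression data.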


[Figure: animation of the clustering process, from the Start configuration to the End]

K-Means Clustering [MacQueen ’67]
Repeat:
– Start with randomly chosen cluster centers.
– Assign points to clusters so as to give the greatest increase in score.
– Recompute cluster centers.
– Reassign points.
until (no changes)

Try the applet at: http://www.cs.mcgill.ca/~bonnef/project.html
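The loop above can be sketched as a Lloyd-style k-means in Python (random initial centers, then alternate assignment and recomputation until nothing changes; the function name and the fixed random seed are illustrative assumptions):

```python
import math
import random

def k_means(points, k, iters=100, seed=0):
    """K-means sketch: choose k random centers, assign each point to
    its nearest center, recompute centers as cluster means, repeat
    until the centers stop moving."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[i].append(p)
        # update step: recompute each center as the mean of its group
        new_centers = [
            [sum(c) / len(g) for c in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```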


Self-Organizing Maps [Kohonen]
• A kind of neural network.
• Clusters data and finds complex relationships between clusters.
• Helps reduce the dimensionality of the data.
• Produces a map of 1 or 2 dimensions.
• Unsupervised clustering.
• Like K-Means, but adds visualization.


SOM Algorithm
• Select the SOM architecture, and initialize weight vectors and other parameters.
• While (stopping condition not satisfied) do, for each input point x:
  – The winning node q has the weight vector closest to x.
  – Update the weight vector of q and its neighbors.
  – Reduce the neighborhood size and learning rate.


SOM Algorithm Details

• Distance between x and weight vector w_i:   ‖x − w_i‖

• Winning node:   q(x) = argmin_i ‖x − w_i‖

• Weight update function (for the winner and its neighbors):

  w_i(k+1) = w_i(k) + µ(k, x, i) [x(k) − w_i(k)]

• Learning rate, where r_i is the grid position of node i:

  µ(k, x, i) = η₀(k) exp( −‖r_i − r_q(x)‖² / (2σ²) )
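A toy 1-D SOM following the winner/neighbor update rule above. The linear decay schedules for η and σ are my own assumption (real SOMs use various schedules), and the function name is illustrative:

```python
import math
import random

def train_som(data, n_nodes=4, epochs=50, eta0=0.5, sigma0=2.0, seed=0):
    """1-D SOM sketch. For each input x, the winner q minimizes
    ||x - w_i||; every node i then moves toward x by
    mu = eta * exp(-(r_i - r_q)^2 / (2 sigma^2)), and both eta and
    sigma shrink over time."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        eta = eta0 * (1 - frac)             # decaying learning rate
        sigma = sigma0 * (1 - frac) + 1e-3  # shrinking neighborhood
        for x in data:
            q = min(range(n_nodes), key=lambda i: math.dist(x, weights[i]))
            for i in range(n_nodes):
                mu = eta * math.exp(-((i - q) ** 2) / (2 * sigma ** 2))
                weights[i] = [w + mu * (xj - w) for w, xj in zip(weights[i], x)]
    return weights
```

After training on data drawn from two clusters, nodes at the ends of the chain settle near the two cluster centers, which is the dimensionality-reducing "map" the slides describe.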


World Poverty SOM


World Poverty Map

Neural Networks

[Diagram: a single neuron — input X with synaptic weights W, summation Σ, bias θ, activation ƒ(•), output y]
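The neuron in the diagram computes y = ƒ(W•X + θ); a minimal sketch, assuming a sigmoid activation (the choice of ƒ is mine, not fixed by the diagram):

```python
import math

def neuron(x, w, theta):
    """Single neuron: weighted sum of the inputs plus bias, passed
    through an activation function (here a sigmoid)."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + theta
    return 1.0 / (1.0 + math.exp(-s))
```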


Learning NN

[Diagram: adaptive learning — input X and weights W feed multipliers and a summation Σ; the output is compared (−/+) with the desired response, and the resulting error drives an adaptive algorithm that updates W]


Types of NNs
• Recurrent NN
• Feed-forward NN
• Layered

Other issues
• Hidden layers possible
• Different activation functions possible


Application: Secondary Structure Prediction


Support Vector Machines
• Supervised statistical learning method for:
  – Classification
  – Regression
• Simplest version:
  – Training: present a series of labeled examples (e.g., gene expressions of tumor vs. normal cells).
  – Prediction: predict labels of new examples.


Learning Problems

[Figure: example point sets labeled A and B to be separated]

SVM – Binary Classification
• Partition the feature space with a surface.
• The surface is implied by a subset of the training points (vectors) near it. These vectors are referred to as Support Vectors.
• Efficient with high-dimensional data.
• Solid statistical theory.
• Subsumes several other methods.


Learning Problems
• Binary classification
• Multi-class classification
• Regression


SVM – General Principles
• SVMs perform binary classification by partitioning the feature space with a surface implied by a subset of the training points (vectors) near the separating surface. These vectors are referred to as Support Vectors.
• Efficient with high-dimensional data.
• Solid statistical theory.
• Subsumes several other methods.


SVM Example (Radial Basis Function)


SVM Ingredients
• Support Vectors
• Mapping from Input Space to Feature Space
• Dot Product – Kernel function
• Weights


Classification of 2-D (Separable) data


Classification of (Separable) 2-D data


Classification of (Separable) 2-D data

[Figure: separating line between classes +1 and −1]
• Margin of a point
• Margin of a point set


Classification using the Separator

Separator:  w•x + b = 0
Points on one side:  w•x_i + b > 0
Points on the other side:  w•x_j + b < 0
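The decision rule is just the sign of w•x + b; a minimal sketch (the function name is mine):

```python
def classify(w, x, b):
    """Return +1 or -1 depending on which side of the separating
    hyperplane w.x + b = 0 the point x falls."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > 0 else -1
```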


Perceptron Algorithm (Primal)

Given a separable training set S and learning rate η > 0
w₀ = 0  // weight vector
b₀ = 0  // bias
k = 0;  R = max_i ‖x_i‖
repeat
  for i = 1 to N
    if y_i (w_k • x_i + b_k) ≤ 0 then
      w_{k+1} = w_k + η y_i x_i
      b_{k+1} = b_k + η y_i R²
      k = k + 1
until no mistakes are made within the loop
return k and (w_k, b_k), where k = number of mistakes

Rosenblatt, 1958

The final weight vector is a linear combination of the training points:

  w = Σ_i a_i y_i x_i

Performance for Separable Data

Theorem: If the margin m of S is positive, then

  k ≤ (2R/m)²

i.e., the algorithm will always converge, and will converge quickly.
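The primal pseudocode above translates almost line for line into Python (a sketch; the function name is mine, and as in the slides it assumes the data are separable, otherwise the loop never ends):

```python
def perceptron_primal(X, Y, eta=0.1):
    """Primal perceptron: on each mistake, move w toward the
    misclassified point and nudge the bias. Returns (w, b, k) with
    k = number of mistakes made during training."""
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    k = 0
    R = max(sum(xi * xi for xi in x) ** 0.5 for x in X)  # R = max_i ||x_i||
    mistakes = True
    while mistakes:
        mistakes = False
        for x, y in zip(X, Y):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y * R * R
                k += 1
                mistakes = True
    return w, b, k
```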


Perceptron Algorithm (Dual)
Given a separable training set S
a = 0;  b = 0;  R = max_i ‖x_i‖
repeat
  for i = 1 to N
    if y_i (Σ_j a_j y_j (x_i • x_j) + b) ≤ 0 then
      a_i = a_i + 1
      b = b + y_i R²
    endif
until no mistakes are made within the loop
return (a, b)
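In the dual form, w is never stored explicitly: it is represented as Σ_j a_j y_j x_j, so the algorithm only ever needs dot products between training points. A sketch (function name mine; the default kernel is the plain dot product, matching the pseudocode above):

```python
def perceptron_dual(X, Y, kernel=None):
    """Dual perceptron: a[i] counts mistakes on example i, and the
    decision function is sum_j a_j y_j K(x, x_j) + b."""
    if kernel is None:
        kernel = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    n = len(X)
    a = [0] * n
    b = 0.0
    R2 = max(kernel(x, x) for x in X)  # R^2 = max_i ||x_i||^2
    mistakes = True
    while mistakes:
        mistakes = False
        for i in range(n):
            s = sum(a[j] * Y[j] * kernel(X[i], X[j]) for j in range(n)) + b
            if Y[i] * s <= 0:
                a[i] += 1
                b += Y[i] * R2
                mistakes = True
    return a, b
```

Because only `kernel` touches the data, swapping the dot product for any kernel function yields the kernelized version on the later slide with no other changes.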


Non-linear Separators


Main idea: Map into feature space


Non-linear Separators

[Figure: mapping Φ from input space X to feature space F]


Useful URLs
• http://www.support-vector.net


Perceptron Algorithm (Dual)
Given a separable training set S
a = 0;  b = 0;  R = max_i ‖x_i‖
repeat
  for i = 1 to N
    if y_i (Σ_j a_j y_j k(x_i, x_j) + b) ≤ 0 then
      a_i = a_i + 1
      b = b + y_i R²
until no mistakes are made within the loop
return (a, b)

where k(x_i, x_j) = Φ(x_i) • Φ(x_j)


Different Kernel Functions

• Polynomial kernel:   κ(X, Y) = (X • Y)^d

• Radial basis kernel:   κ(X, Y) = exp( −‖X − Y‖² / (2σ²) )

• Sigmoid kernel:   κ(X, Y) = tanh( ω (X • Y) + θ )
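The three kernels above in code (function names and default parameter values are mine):

```python
import math

def poly_kernel(x, y, d=2):
    """kappa(X, Y) = (X . Y)^d"""
    return sum(a * b for a, b in zip(x, y)) ** d

def rbf_kernel(x, y, sigma=1.0):
    """kappa(X, Y) = exp(-||X - Y||^2 / (2 sigma^2))"""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))

def sigmoid_kernel(x, y, omega=1.0, theta=0.0):
    """kappa(X, Y) = tanh(omega (X . Y) + theta)"""
    return math.tanh(omega * sum(a * b for a, b in zip(x, y)) + theta)
```

Any of these can be passed as the `kernel` argument of the dual perceptron above, since each computes a dot product Φ(x_i)•Φ(x_j) in some feature space.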


SVM Ingredients
• Support Vectors
• Mapping from Input Space to Feature Space
• Dot Product – Kernel function


Generalizations
• How to deal with more than 2 classes? Idea: associate a weight and bias with each class.
• How to deal with a non-linear separator? Idea: Support Vector Machines.
• How to deal with linear regression?
• How to deal with non-separable data?


Applications
• Text categorization & information filtering
  – 12,902 Reuters stories, 118 categories (91%!)
• Image recognition
  – Face detection, tumor anomalies, defective parts on an assembly line, etc.
• Gene expression analysis
• Protein homology detection


SVM Example (Radial Basis Function)
