6.034 Introduction to Artificial Intelligence


Transcript of 6.034 Introduction to Artificial Intelligence

Page 1: 6.034 Introduction to Artificial Intelligence

6.034 Introduction to Artificial Intelligence

Machine learning and applications

Page 2: 6.034 Introduction to Artificial Intelligence

Problems we will cover

• Computational biology
  - cancer classification
  - functional classification of genes

• Information retrieval
  - document classification/ranking

• Recommender systems
  - predicting user preferences (e.g., movies)

Page 3: 6.034 Introduction to Artificial Intelligence

What are we trying to do?

• The goal is to find the right method for the right problem (matching task)

[Figure: matching Problems to Methods. Methods include SVMs, Boosting, K-means, . . . Example problems shown: a text document to classify ("This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents"), a gene-expression data set, and a users × movies ratings matrix with entries 1-5 and many missing values.]

Page 4: 6.034 Introduction to Artificial Intelligence

Cancer classification

• We’d like to automatically classify tissue samples according to whether there’s evidence of cancer or the type of tumor cells they contain

• What features to extract?
  - visual features due to different types of staining
  - how active different genes are in the cells (gene expression)

[Image: How can only ~20,000 genes specify a complex mammal? (... cell types)]

Page 5: 6.034 Introduction to Artificial Intelligence

Gene expression

Figure 2. A Contemporary View of Gene Expression
Recent findings suggest that each step regulating gene expression (from transcription to translation) is a subdivision of a continuous process. In this contemporary view of gene expression, each stage is physically and functionally connected to the next, ensuring that there is efficient transfer between manipulations and that no individual step is omitted (see text for details).

[Excerpt from the review page shown on the slide:]

...transcript is spooling off the transcribing RNAP II. The picture that is emerging is one in which most steps are physically and functionally connected—conveyor belt-style—ensuring efficient transfer from one manipulation to the next (Figure 2). This organization of events may also introduce a series of quality control mechanisms, as it ensures that no individual step is omitted. The results of a large body of work have revealed at least three general principles. (1) The protein factors responsible for each individual step in the pathway from gene to protein are functionally, and sometimes physically, connected. (2) Regulation of the pathway is controlled at multiple stages. (3) No general rules exist describing how the pathway is regulated. Different classes of gene are regulated at different stages. In this review, we focus on novel paradigms that describe the functioning and regulation of the gene expression pathway and the connections that exist between the constituent steps of this process.

The Role of Chromatin Structure in Gene Expression: Not Just Packaging
The DNA in our cells is not naked, but packaged into a highly organized and compact nucleoprotein structure known as chromatin. The basic organizational unit of chromatin is the nucleosome, which consists of 146 bp of DNA wrapped almost twice around a protein core containing two copies each of four histone proteins: H2A, H2B, H3, and H4 (Luger et al., 1997). These small, positively charged proteins show remarkable conservation among eukaryotes and are the protein building blocks of our chromosomes. Further compaction of our genes is achieved via poorly defined levels of higher-order nucleosome folding. Once thought of as being a static organizational framework for DNA, it is now apparent that chromatin plays a pivotal role in regulating gene transcription by marshalling access of the transcriptional apparatus to genes (reviewed by Narlikar et al., 2002 [this issue of Cell]). However, not all chromatin is equal. Untranscribed regions of the genome are packaged into highly condensed "heterochromatin," while transcribed genes are present in more accessible "euchromatin" (reviewed by Richards and Elgin, 2002 [this issue of Cell]). Each cell type packages its genes into a unique pattern of heterochromatin and euchromatin, and this pattern is maintained after cell division. The pattern of packaging into these alternative chromatin states determines which genes will be active in a newly divided cell, thus ensuring that the unique characteristics of each cell lineage are transferred from generation to generation. To activate gene expression, transcriptional activator proteins must, therefore, contend with inaccessible and repressive chromatin structures.

(Orphanides et al. 2002)

Page 6: 6.034 Introduction to Artificial Intelligence

Measuring gene expression

• Basic cDNA micro-array technology

[Figure: cDNA microarray protocol. Isolate total RNA from cells; enzymatic amplification to generate biotin-labeled cDNA (~50-100 fold amplification); hybridize to array (overnight); wash & stain; image capture; data extraction. Performed for both the sample (e.g., tumor) and a control, yielding a tissue profile: one expression value per gene, e.g. 1.2, 0.5, 1.0, 2.3, · · ·]

Pollack, J., Perou, C., Alizadeh, A., Eisen, M., Pergamenschikov, A., Williams, C., Jeffrey, S., Botstein, D. and Brown, P. (1999) Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nature Genetics, 23, 41-46.

Page 7: 6.034 Introduction to Artificial Intelligence

Measuring gene expression

• Basic cDNA micro-array technology

[Figure: the same cDNA microarray protocol applied to a sample (e.g., tumor) and a control (Pollack et al., 1999). The resulting tissue profile is a vector of expression values, one per gene: 0.18, −0.69, 0.00, 0.83, · · ·]

Page 8: 6.034 Introduction to Artificial Intelligence

Cancer classification

[Figure: a genes × tissues expression matrix; rows are genes, columns are tissues (with known tumor type), entries such as 0.18, −0.69, 0.00, 0.83, · · ·]

(Golub et al. 1999)

Page 9: 6.034 Introduction to Artificial Intelligence

Machine learning problem

6.034 Artificial Intelligence. Copyright © 2006 by Massachusetts Institute of Technology. All rights reserved

Slide 4.3.3
This one seems safer, no? Another way to motivate the choice of the maximal margin separator is to see that it reduces the "variance" of the hypothesis class. Recall that a hypothesis has large variance if small changes in the data result in a very different hypothesis. With a maximal margin separator, we can wiggle the data quite a bit without affecting the separator. Placing the separator very close to positive or negative points is a kind of overfitting; it makes your hypothesis very dependent on details of the input data. Let's see if we can figure out how to find the separator with maximal margin as suggested by this picture.

Slide 4.3.4
First we have to define what we are trying to optimize. Clearly we want to use our old definition of margin, but we'll have to deal with a couple of issues first. Note that we're using the w, b notation instead of w bar, because we will end up giving b special treatment in the future.

Slide 4.3.5
Remember that any scaling of w and b defines the same line, but it will result in different values of gamma. To get the actual geometric distance from the point to the separator (called the geometric margin), we need to divide gamma through by the magnitude of w.

Slide 4.3.6
The next issue is that we have defined the margin for a point relative to a separator, but we don't want to just maximize the margin of some particular single point. We want to focus on one point on each side of the separator, each of which is closest to the separator. And we want to place the separator so that it is as far from these two points as possible. Then we will have the maximal margin between the two classes. Since we have a degree of freedom in the magnitude of w, we're going to just define the margin for each of these points to be 1. (You can think of this 1 as having arbitrary units given by the magnitude of w.) You might be worried that we can't possibly know which will be the two closest points until we know what the separator is. It's a reasonable worry, and we'll sort it out in a couple of slides.


[Figure: +1 and −1 labeled tissue profiles and an unlabeled profile (?)]

Page 10: 6.034 Introduction to Artificial Intelligence

Machine learning problem

• Complicating issues
  - micro-array measurements are very noisy
  - each training example is of very high dimension (e.g., ~10,000 genes)
  - there are relatively few labeled tissue samples (only tens per class)
  - some labels may be wrong

[Figure: +1 and −1 labeled tissue profiles and an unlabeled profile (?)]

Page 11: 6.034 Introduction to Artificial Intelligence

SVM classifiers


The classifier predicts

y = sign( Σ_{i=1}^{n} y_i α_i K(x_i, x) + w_0 )

where y is the predicted label, y_i the training label, α_i the example weight, K(x_i, x) the kernel (similarity) between training example x_i and the new example x, and w_0 the offset.
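As a concrete sketch, this decision rule can be written directly in code (numpy-based and illustrative; the kernel, the weights α, and the offset w_0 are assumed to be given, e.g. from training):

```python
import numpy as np

def svm_predict(x_new, X_train, y_train, alpha, w0, kernel):
    # y = sign( sum_i y_i * alpha_i * K(x_i, x_new) + w0 )
    s = sum(y_train[i] * alpha[i] * kernel(X_train[i], x_new)
            for i in range(len(y_train)))
    return np.sign(s + w0)
```

Only training examples with α_i > 0 (the support vectors) contribute to the sum.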

Page 12: 6.034 Introduction to Artificial Intelligence

SVM training


maximize   Σ_{i=1}^{n} α_i − (1/2) Σ_{i,j} y_i y_j α_i α_j K(x_i, x_j)

subject to α_i ≥ 0,   Σ_{i=1}^{n} y_i α_i = 0

(where is w_0?)

• SVMs are trained by solving a quadratic programming problem
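To make the optimization concrete, here is a minimal sketch of solving the dual by projected gradient ascent (illustrative, not the method used in the course; for simplicity it drops the equality constraint Σ_i y_i α_i = 0, i.e. it trains a classifier with no offset w_0; real solvers treat the problem as a quadratic program, e.g. via SMO):

```python
import numpy as np

def train_dual_svm(X, y, kernel, lr=0.01, iters=2000):
    # Projected gradient ascent on the SVM dual objective
    #   sum_i a_i - 1/2 sum_{i,j} y_i y_j a_i a_j K(x_i, x_j)
    # subject to a_i >= 0 (the equality constraint is omitted: no offset).
    n = len(y)
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    Q = (y[:, None] * y[None, :]) * K
    a = np.zeros(n)
    for _ in range(iters):
        grad = 1.0 - Q @ a                   # gradient of the dual objective
        a = np.maximum(a + lr * grad, 0.0)   # ascent step, project onto a_i >= 0
    return a

def dual_predict(X, y, a, kernel, x_new):
    # Decision rule: sign( sum_i y_i a_i K(x_i, x_new) )
    return np.sign(sum(a[i] * y[i] * kernel(X[i], x_new) for i in range(len(y))))
```

At the optimum, every example with α_i > 0 satisfies y_i f(x_i) = 1, recovering the unit-margin convention discussed above.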

Page 13: 6.034 Introduction to Artificial Intelligence

Back to the problem


[Figure: +1 and −1 labeled tissue profiles and an unlabeled profile (?)]

• High dimensionality => linear kernel

• Noise in the measurements => feature selection (use only a relevant subset of the genes)

• Outliers => adjust the kernel to increase resistance to outliers

K(x_i, x_j) = (x_i^T x_j + 1)

Page 14: 6.034 Introduction to Artificial Intelligence

Feature selection / ranking

• We can rank genes according to how much they seem to be related to the classification task

R(gene_i) = |μ_i^+ − μ_i^−| / (σ_i^+ + σ_i^−)

where μ_i^+ is the mean value of gene i across +1 tissues, μ_i^− the mean value across −1 tissues, σ_i^+ the stdv across +1 tissues, and σ_i^− the stdv across −1 tissues.
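This ranking is straightforward to compute; a small numpy sketch (names are illustrative; X is a tissues × genes matrix and y holds the ±1 tissue labels):

```python
import numpy as np

def rank_genes(X, y):
    # Score each gene by |mu+ - mu-| / (sigma+ + sigma-),
    # then return gene indices from most to least discriminative.
    pos, neg = X[y == 1], X[y == -1]
    score = np.abs(pos.mean(axis=0) - neg.mean(axis=0)) \
            / (pos.std(axis=0) + neg.std(axis=0))
    return np.argsort(-score)
```

In practice one also guards against genes with near-zero variance, since the denominator can vanish.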

Page 15: 6.034 Introduction to Artificial Intelligence

# of examples, dimensionality

• Suppose the expression levels of all the 10,000 genes in each tissue sample are drawn at random from some distribution (e.g., normal)

• Based on 5 such expression vectors for each class, can we find a gene that is perfectly correlated with the labels?

• The chance of this happening is essentially 100%

• What if we had instead 10 such vectors per class? The probability drops to 1%
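A back-of-the-envelope sketch of the first claim (assuming each gene's values are i.i.d. draws from a continuous distribution, so ties have probability zero): a fixed gene perfectly separates the 5 + 5 samples only if the two label groups occupy the top and bottom halves of its sorted values,

```latex
P(\text{one gene separates}) = \frac{2}{\binom{10}{5}} = \frac{2}{252} \approx 0.8\%,
\qquad
P(\text{some gene separates}) = 1 - \Bigl(1 - \tfrac{2}{252}\Bigr)^{10000} \approx 1.
```

With 10 vectors per class, the per-gene probability falls to 2 / C(20, 10) ≈ 1.1 × 10⁻⁵, so chance "perfect" genes become rare.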

Page 16: 6.034 Introduction to Artificial Intelligence

Dealing with outliers

• We should make the linear decision boundary resistant to outliers (e.g., due to mislabeled samples)

[Figure: two scatter plots (axes −5 to 5) of linear decision boundaries in the presence of outliers]

Page 17: 6.034 Introduction to Artificial Intelligence

Dealing with outliers

• One way to increase resistance to outliers is to add a diagonal term to the kernel function so that each example appears more similar to itself than before.

K ←  [ K(x_1, x_1) + λ    · · ·   K(x_1, x_n)
       · · ·              · · ·   · · ·
       K(x_n, x_1)        · · ·   K(x_n, x_n) + λ ]
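In code this amounts to adding λ to the diagonal of the kernel (Gram) matrix; a minimal sketch:

```python
import numpy as np

def soften_kernel(K, lam):
    # Add lam to the diagonal: each example becomes more similar to itself,
    # which increases resistance to outliers (larger lam -> smoother boundary).
    return K + lam * np.eye(len(K))
```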

Page 18: 6.034 Introduction to Artificial Intelligence

The effect of lambda

[Plot (axes −5 to 5): decision boundary with λ = 0]

Page 19: 6.034 Introduction to Artificial Intelligence

The effect of lambda

[Plot (axes −5 to 5): decision boundary with λ = 2]

Page 20: 6.034 Introduction to Artificial Intelligence

The effect of lambda

[Plot (axes −5 to 5): decision boundary with λ = 4]

Page 21: 6.034 Introduction to Artificial Intelligence

The effect of lambda

[Plot (axes −5 to 5): decision boundary with λ = 8]

Page 22: 6.034 Introduction to Artificial Intelligence

The effect of lambda

[Plot (axes −5 to 5): decision boundary with λ = 16]

Page 23: 6.034 Introduction to Artificial Intelligence

Results

• ALL vs AML distinction
  - training set: 27 ALL and 11 AML
  - test set: 20 ALL and 14 AML

• The SVM classifier achieves perfect classification of the test samples

Page 24: 6.034 Introduction to Artificial Intelligence

Problems we will cover

• Computational biology
  - cancer classification
  - functional classification of genes

• Information retrieval
  - document classification/ranking

• Recommender systems
  - predicting user preferences (e.g., movies)

Page 25: 6.034 Introduction to Artificial Intelligence

Functional classification of genes

• We don’t know what most genes do

• Given known roles for some genes, we would like to predict the function of all the remaining genes

ribosomal genes: F2N1.3, T18A10.9, F5J6.12, · · ·

unannotated "genes": YLA003W, YPL037C, · · ·

Page 26: 6.034 Introduction to Artificial Intelligence

Tissue/gene profiles

[Figure: heat map of gene expression across tissues/conditions (thymus, lymph node, tonsil, bone marrow, blood cell types, lung, placenta, prostate, thyroid, pancreas, heart, liver, kidney, skeletal muscle, many brain regions, cerebellum, ...) × genes. A column is a tissue profile.]

How can only ~20,000 genes specify a complex mammal? Cell-type specific gene expression.

Page 27: 6.034 Introduction to Artificial Intelligence

Tissue/gene profiles

[Figure: the same tissues/conditions × genes heat map. A column is a tissue profile; a row is a gene profile.]

How can only ~20,000 genes specify a complex mammal? Cell-type specific gene expression.

Page 28: 6.034 Introduction to Artificial Intelligence

Machine learning problem

• Dimensionality no longer very high (# of tissue samples/conditions)

• Can use other kernels, e.g., radial basis kernel

• New problem: there are many more negatively labeled genes than positive ones
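For reference, the radial basis kernel mentioned above can be sketched as follows (γ is a width parameter; illustrative):

```python
import numpy as np

def rbf_kernel(u, v, gamma=1.0):
    # Radial basis (Gaussian) kernel: similarity decays with squared distance,
    # so each gene profile is compared locally rather than via a plain dot product.
    return np.exp(-gamma * np.sum((u - v) ** 2))
```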

[Figure: the same tissues/conditions × genes heat map, with one row highlighted as a known −1 gene and another as a known +1 gene.]

Page 29: 6.034 Introduction to Artificial Intelligence

Imbalanced classes

[Plot (axes −4 to 4): decision boundary with λ = 2]

Page 30: 6.034 Introduction to Artificial Intelligence

Imbalanced classes

[Plot (axes −4 to 4): decision boundary with λ = 4]

Page 31: 6.034 Introduction to Artificial Intelligence

Imbalanced classes

[Plot (axes −4 to 4): decision boundary with λ = 8]

Page 32: 6.034 Introduction to Artificial Intelligence

Imbalanced classes

[Plot (axes −4 to 4): decision boundary with λ = 16]

Page 33: 6.034 Introduction to Artificial Intelligence

Imbalanced classes

• In order to ensure that the classifier pays attention to the positive class, we increase (proportionally) resistance to negative examples

K ←  [ K(x_1, x_1) + λ(n_+/n)   · · ·   K(x_1, x_n)
       · · ·                    · · ·   · · ·
       K(x_n, x_1)              · · ·   K(x_n, x_n) + λ(n_−/n) ]

where n_+/n is the frequency of positive examples and n_−/n the frequency of negative examples.
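A minimal sketch of this class-weighted diagonal (assuming ±1 labels; names illustrative):

```python
import numpy as np

def soften_kernel_imbalanced(K, y, lam):
    # Diagonal term proportional to the example's class frequency:
    # positives get lam*(n+/n), negatives lam*(n-/n).  With many more
    # negatives, resistance to negative examples is proportionally higher.
    n = len(y)
    d = np.where(y == 1, lam * (y == 1).sum() / n, lam * (y == -1).sum() / n)
    return K + np.diag(d)
```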

Page 34: 6.034 Introduction to Artificial Intelligence

Differential resistance

[Plot (axes −4 to 4): decision boundary with differential resistance]

Page 35: 6.034 Introduction to Artificial Intelligence

Differential resistance

[Plot (axes −4 to 4): decision boundary with differential resistance, λ = 1]

Page 36: 6.034 Introduction to Artificial Intelligence

Functional annotation of genes

• SVMs perform very well (though there are other comparable methods)

• Learning methods can identify incorrectly annotated genes, predict functional roles for uncharacterized genes, as well as guide further experimental effort

• Used in many contexts; based on profiles, text, and/or sequence
  - e.g., understanding developmental roles of genes (lineage specific genes)
  - etc.

Page 37: 6.034 Introduction to Artificial Intelligence

Problems we will cover

• Computational biology
  - cancer classification
  - functional classification of genes

• Information retrieval
  - document classification/ranking

• Recommender systems
  - predicting user preferences (e.g., movies)