
Copyright © Andrew W. Moore Slide 1

Probabilistic and Bayesian Analytics

Andrew W. Moore, Professor

School of Computer Science, Carnegie Mellon University

www.cs.cmu.edu/~awm · [email protected]


Copyright © Andrew W. Moore Slide 2

Probability

• The world is a very uncertain place
• 30 years of Artificial Intelligence and Database research danced around this fact
• And then a few AI researchers decided to use some ideas from the eighteenth century
• We will review the fundamentals of probability.


Copyright © Andrew W. Moore Slide 3

Discrete Random Variables: the Binary Case

• A is a Boolean-valued random variable if A denotes an event, and there is some degree of uncertainty as to whether A occurs.
• Examples:
  • A = The US president in 2023 will be male
  • A = You wake up tomorrow with a headache
  • A = You have Ebola


Copyright © Andrew W. Moore Slide 4

Probabilities

• We write P(A), the probability of A, for "the fraction of all possible worlds in which A is true".
• There are several ways of assigning probabilities, but we will not discuss them now.


Copyright © Andrew W. Moore Slide 5

Venn Diagram for Visualizing A

[Venn diagram: the sample space of all possible worlds, with area 1. Worlds in which A is false lie outside the oval; worlds in which A is true lie inside it.]

P(A) = area of the oval region


Copyright © Andrew W. Moore Slide 6

The Axioms of Probability

• 0 <= P(A) <= 1
• P(A is true in every possible world) = 1
• P(A is true in no possible world) = 0
• P(A or B) = P(A) + P(B) if A and B are disjoint

These axioms were introduced by the Russian school at the beginning of the 1900s: Kolmogorov, Lyapunov, Khinchin, Chebyshev.


Copyright © Andrew W. Moore Slide 7

Interpreting the axioms

• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B)

The area of A can’t get any smaller than 0

And a zero area would mean no world could ever have A true


Copyright © Andrew W. Moore Slide 8

Interpreting the axioms

• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B)

The area of A can’t get any bigger than 1

And an area of 1 would mean all worlds will have A true


Copyright © Andrew W. Moore Slide 9

Interpreting the axioms

• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)

[Venn diagram: two overlapping ovals, A and B]


Copyright © Andrew W. Moore Slide 10

Interpreting the axioms

• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)

[Venn diagram: P(A or B) is the total area covered by ovals A and B; P(A and B) is the area of their overlap]

Simple addition and subtraction


Copyright © Andrew W. Moore Slide 11

Theorems from the Axioms

• 0 <= P(A) <= 1, P(True) = 1, P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)

From these we can prove: P(not A) = P(~A) = 1 - P(A)

• How?


Copyright © Andrew W. Moore Slide 12

Another important theorem

• 0 <= P(A) <= 1, P(True) = 1, P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)

From these we can prove: P(A) = P(A ^ B) + P(A ^ ~B)

• How?


Copyright © Andrew W. Moore Slide 13

Multivalued Random Variables

• Suppose A can take on more than 2 values
• A is a random variable with arity k if it can take on exactly one value out of {v1, v2, ..., vk}
• Thus…

$P(A = v_i \wedge A = v_j) = 0 \text{ if } i \neq j$

$P(A = v_1 \vee A = v_2 \vee \dots \vee A = v_k) = 1$


Copyright © Andrew W. Moore Slide 14

An easy fact about Multivalued Random Variables:

• Using the axioms of probability…
  0 <= P(A) <= 1, P(True) = 1, P(False) = 0
  P(A or B) = P(A) + P(B) - P(A and B)
• And assuming that A obeys…
  $P(A = v_i \wedge A = v_j) = 0 \text{ if } i \neq j$
  $P(A = v_1 \vee A = v_2 \vee \dots \vee A = v_k) = 1$
• It's easy to prove that

$P(A = v_1 \vee A = v_2 \vee \dots \vee A = v_i) = \sum_{j=1}^{i} P(A = v_j)$


Copyright © Andrew W. Moore Slide 15

An easy fact about Multivalued Random Variables:

• Using the axioms of probability…
  0 <= P(A) <= 1, P(True) = 1, P(False) = 0
  P(A or B) = P(A) + P(B) - P(A and B)
• And assuming that A obeys…
  $P(A = v_i \wedge A = v_j) = 0 \text{ if } i \neq j$
  $P(A = v_1 \vee A = v_2 \vee \dots \vee A = v_k) = 1$
• It's easy to prove that

$P(A = v_1 \vee A = v_2 \vee \dots \vee A = v_i) = \sum_{j=1}^{i} P(A = v_j)$

• And thus we can prove

$\sum_{j=1}^{k} P(A = v_j) = 1$


Copyright © Andrew W. Moore Slide 16

Another fact about Multivalued Random Variables:

• Using the axioms of probability…
  0 <= P(A) <= 1, P(True) = 1, P(False) = 0
  P(A or B) = P(A) + P(B) - P(A and B)
• And assuming that A obeys…
  $P(A = v_i \wedge A = v_j) = 0 \text{ if } i \neq j$
  $P(A = v_1 \vee A = v_2 \vee \dots \vee A = v_k) = 1$
• It's easy to prove that

$P(B \wedge [A = v_1 \vee A = v_2 \vee \dots \vee A = v_i]) = \sum_{j=1}^{i} P(B \wedge A = v_j)$


Copyright © Andrew W. Moore Slide 17

Another fact about Multivalued Random Variables:

• Using the axioms of probability…
  0 <= P(A) <= 1, P(True) = 1, P(False) = 0
  P(A or B) = P(A) + P(B) - P(A and B)
• And assuming that A obeys…
  $P(A = v_i \wedge A = v_j) = 0 \text{ if } i \neq j$
  $P(A = v_1 \vee A = v_2 \vee \dots \vee A = v_k) = 1$
• It's easy to prove that

$P(B \wedge [A = v_1 \vee A = v_2 \vee \dots \vee A = v_i]) = \sum_{j=1}^{i} P(B \wedge A = v_j)$

• And thus we can prove

$P(B) = \sum_{j=1}^{k} P(B \wedge A = v_j)$


Copyright © Andrew W. Moore Slide 18

Elementary Probability in Pictures

• P(~A) + P(A) = 1


Copyright © Andrew W. Moore Slide 19

Elementary Probability in Pictures

• P(B) = P(B ^ A) + P(B ^ ~A)


Copyright © Andrew W. Moore Slide 20

Elementary Probability in Pictures

$\sum_{j=1}^{k} P(A = v_j) = 1$


Copyright © Andrew W. Moore Slide 21

Elementary Probability in Pictures

$P(B) = \sum_{j=1}^{k} P(B \wedge A = v_j)$


Copyright © Andrew W. Moore Slide 22

Conditional Probability

• P(A|B) = the fraction of worlds in which B is true that also have A true.

[Venn diagram: oval F overlapping oval H]

H = "Have a headache"
F = "Coming down with Flu"

P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2

"Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache."


Copyright © Andrew W. Moore Slide 23

Conditional Probability

[Venn diagram: ovals F and H]

H = "Have a headache"
F = "Coming down with Flu"

P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2

P(H|F) = fraction of flu worlds in which you have a headache
       = (# worlds with flu and headache) / (# worlds with flu)
       = (Area of "H and F" region) / (Area of "F" region)
       = P(H ^ F) / P(F)


Copyright © Andrew W. Moore Slide 24

Definition of Conditional Probability

P(A|B) = P(A ^ B) / P(B)

Corollary: The Chain Rule

P(A ^ B) = P(A|B) P(B)


Copyright © Andrew W. Moore Slide 25

Probabilistic Inference

[Venn diagram: ovals F and H]

H = "Have a headache"
F = "Coming down with Flu"

P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2

One day you wake up with a headache. You think: “Drat! 50% of flus are associated with headaches so I must have a 50-50 chance of coming down with flu”

Is this reasoning good?


Copyright © Andrew W. Moore Slide 26

Probabilistic Inference

[Venn diagram: ovals F and H]

H = "Have a headache"
F = "Coming down with Flu"

P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2

P(F ^ H) = P(F) P(H|F) = (1/40)(1/2) = 1/80

P(F|H) = P(F ^ H) / P(H) = (1/80)/(1/10) = 1/8 = 0.125
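The two lines above are easy to check mechanically. A minimal R sketch of the same computation (the variable names are ours, not part of the lecture):

p_h  <- 1/10   # P(H): probability of waking up with a headache
p_f  <- 1/40   # P(F): probability of coming down with flu
p_hf <- 1/2    # P(H|F): probability of a headache given flu

p_fh <- p_f * p_hf          # chain rule: P(F ^ H) = P(F) P(H|F) = 1/80
p_f_given_h <- p_fh / p_h   # Bayes: P(F|H) = P(F ^ H) / P(H) = 0.125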


Copyright © Andrew W. Moore Slide 27

Another way to understand the intuition

Thanks to Jahanzeb Sherwani for contributing this explanation:


Copyright © Andrew W. Moore Slide 28

What we just did…

P(B|A) = P(A ^ B) / P(A) = P(A|B) P(B) / P(A)

This is Bayes Rule. P(B) is called the prior probability, P(A|B) is called the likelihood, and P(B|A) is the posterior probability.

Bayes, Thomas (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418.


Copyright © Andrew W. Moore Slide 29

More General Forms of Bayes Rule

$P(A|B) = \frac{P(B|A)\,P(A)}{P(B|A)\,P(A) + P(B|\sim\!A)\,P(\sim\!A)}$

$P(A|B \wedge X) = \frac{P(B|A \wedge X)\,P(A \wedge X)}{P(B \wedge X)}$


Copyright © Andrew W. Moore Slide 30

More General Forms of Bayes Rule

$P(A = v_i \,|\, B) = \frac{P(B \,|\, A = v_i)\,P(A = v_i)}{\sum_{k=1}^{n_A} P(B \,|\, A = v_k)\,P(A = v_k)}$

(where $n_A$ is the arity of A)


Copyright © Andrew W. Moore Slide 31

Useful Easy-to-prove facts

$P(A|B) + P(\sim\!A|B) = 1$

$\sum_{k=1}^{n_A} P(A = v_k \,|\, B) = 1$


Copyright © Andrew W. Moore Slide 32

The Joint Distribution

Recipe for making a joint distribution of M variables:

Example: Boolean variables A, B, C


Copyright © Andrew W. Moore Slide 33

Using Bayes Rule to Gamble

The "Winner" envelope has a dollar and 4 beads (2 red and 2 black).

$1.00

The "Loser" envelope has 3 beads (one red and two black) and no money.

Question: Someone draws an envelope at random and offers to sell it to you. How much should you pay for the envelope?


Copyright © Andrew W. Moore Slide 34

Using Bayes Rule to Gamble

The "Winner" envelope has a dollar and 4 beads (2 red and 2 black).

$1.00

The "Loser" envelope has 3 beads (one red and two black) and no money.

Interesting question: Before deciding, you are allowed to see one bead drawn from the envelope.

Suppose it is black: how much should you pay for the envelope? Suppose it is red: how much should you pay?


Copyright © Andrew W. Moore Slide 35

The Calculation… $1.00

P(Winner | black bead) = P(Winner ^ BB) / P(BB)
                       = P(Winner) P(BB | Winner) / P(BB)

P(BB) = P(BB ^ Winner) + P(BB ^ Loser)
      = P(Winner) P(BB | Winner) + P(Loser) P(BB | Loser)
      = (1/2)(2/4) + (1/2)(2/3) = 7/12

P(Winner | black bead) = (1/4)/(7/12) = 3/7 ≈ 0.43

So you should pay at most about $0.43 for the envelope.

Now P(Winner | red bead) = P(Winner ^ RB) / P(RB) = (1/4)/(5/12) = 3/5 = 0.60
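A quick R check of both posteriors (a sketch; the names are ours):

p_w <- 1/2; p_l <- 1/2          # prior: each envelope equally likely to be drawn
p_b_w <- 2/4; p_b_l <- 2/3      # P(black bead | Winner), P(black bead | Loser)
p_r_w <- 2/4; p_r_l <- 1/3      # P(red bead | Winner),   P(red bead | Loser)

p_b <- p_w * p_b_w + p_l * p_b_l   # P(black bead) = 7/12
p_w * p_b_w / p_b                  # P(Winner | black) = 3/7 ~ 0.43

p_r <- p_w * p_r_w + p_l * p_r_l   # P(red bead) = 5/12
p_w * p_r_w / p_r                  # P(Winner | red) = 3/5 = 0.60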


Copyright © Andrew W. Moore Slide 36

Joint Distributions

Recipe for making a joint distribution of M variables:

1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows).

Example: Boolean variables A, B, C

A B C
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1

A = 1 if the person is male, 0 if female; B = 1 if the person is married, 0 otherwise; C = 1 if the person is sick, 0 otherwise.


Copyright © Andrew W. Moore Slide 37

The Joint Distribution

Recipe for making a joint distribution of M variables:

1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows).
2. For each combination of values, say how probable it is.

Example: Boolean variables A, B, C

A B C Prob
0 0 0 0.30
0 0 1 0.05
0 1 0 0.10
0 1 1 0.05
1 0 0 0.05
1 0 1 0.10
1 1 0 0.25
1 1 1 0.10


Copyright © Andrew W. Moore Slide 38

The Joint Distribution

Recipe for making a joint distribution of M variables:

1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows).
2. For each combination of values, say how probable it is.
3. If you subscribe to the axioms of probability, those numbers must sum to 1.

Example: Boolean variables A, B, C

A B C Prob
0 0 0 0.30
0 0 1 0.05
0 1 0 0.10
0 1 1 0.05
1 0 0 0.05
1 0 1 0.10
1 1 0 0.25
1 1 1 0.10

[Venn diagram: circles A, B, C partition the sample space into eight regions whose areas are the eight probabilities above]
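The same table is easy to hold in software. A minimal R sketch (our own construction, not code from the lecture):

jd <- expand.grid(C = 0:1, B = 0:1, A = 0:1)[, c("A", "B", "C")]  # rows in the slide's order
jd$Prob <- c(0.30, 0.05, 0.10, 0.05, 0.05, 0.10, 0.25, 0.10)
sum(jd$Prob)                          # the axioms demand this be exactly 1
sum(jd$Prob[jd$A == 1 & jd$C == 0])   # e.g. P(A ^ ~C) = 0.05 + 0.25 = 0.30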


Copyright © Andrew W. Moore Slide 39

Data for the JD example

• The Census dataset: 48842 instances, a mix of continuous and discrete features (train = 32561, test = 16281); 45222 if instances with unknown values are removed (train = 30162, test = 15060).

Available at: http://ftp.ics.uci.edu/pub/machine-learning-databases

Donors: Ronny Kohavi and Barry Becker (1996).


Copyright © Andrew W. Moore Slide 40

Variables in Census

• 1-age: continuous.
• 2-workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
• 3-fnlwgt (final weight): continuous.
• 4-education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
• 5-education-num: continuous.
• 6-marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
• 7-occupation:
• 8-relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.


Copyright © Andrew W. Moore Slide 41

Variables in Census

• 9-race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
• 10-sex: Female[0], Male[1].
• 11-capital-gain: continuous.
• 12-capital-loss: continuous.
• 13-hours-per-week: continuous.
• 14-native-country: nominal.
• 15-salary: >50K (rich)[2], <=50K (poor)[1].


Copyright © Andrew W. Moore Slide 42

First census records (train)

> traincensus[1:5,]
  age workcl   fwgt educ educn marital occup relat race sex gain loss hpwk
1  39      6  77516   13    13       3     9     4    1   1 2174    0   40
2  50      2  83311   13    13       1     5     3    1   1    0    0   13
3  38      1 215646    9     9       2     7     4    1   1    0    0   40
4  53      1 234721    7     7       1     7     3    5   1    0    0   40
5  28      1 338409   13    13       1     6     1    5   0    0    0   40
  country salary
1       1      1
2       1      1
3       1      1
4       1      1
5      13      1
>


Copyright © Andrew W. Moore Slide 43

Computing frequencies for sex

> countsex = table(traincensus$sex)
> probsex = countsex/32561
> countsex
    0     1
10771 21790
> probsex
        0         1
0.3307945 0.6692055
>


Copyright © Andrew W. Moore Slide 44

Computing the joint probabilities

> jdcensus = traincensus[, c(10, 13, 15)]
> dim(jdcensus)
[1] 32561 3
> jdcensus[1:3,]
  sex hpwk salary
1   1   40      1
2   1   13      1
3   1   40      1
> dim(jdcensus[jdcensus$sex==0 & jdcensus$hpwk<=40.5 & jdcensus$salary==1,])[1]
[1] 8220
> dim(jdcensus[jdcensus$sex==0 & jdcensus$hpwk<=40.5 & jdcensus$salary==1,])[1]/dim(jdcensus)[1]
[1] 0.2524492
> dim(jdcensus[jdcensus$sex==0 & jdcensus$hpwk<=40.5 & jdcensus$salary==2,])[1]/dim(jdcensus)[1]
[1] 0.02484567


Copyright © Andrew W. Moore Slide 45

> dim(jdcensus[jdcensus$sex==0 & jdcensus$hpwk>40.5 & jdcensus$salary==1,])[1]/dim(jdcensus)[1]
[1] 0.0421363
> dim(jdcensus[jdcensus$sex==0 & jdcensus$hpwk>40.5 & jdcensus$salary==2,])[1]/dim(jdcensus)[1]
[1] 0.01136329
> dim(jdcensus[jdcensus$sex==1 & jdcensus$hpwk<=40.5 & jdcensus$salary==1,])[1]/dim(jdcensus)[1]
[1] 0.3309174
> dim(jdcensus[jdcensus$sex==1 & jdcensus$hpwk<=40.5 & jdcensus$salary==2,])[1]/dim(jdcensus)[1]
[1] 0.09754
> dim(jdcensus[jdcensus$sex==1 & jdcensus$hpwk>40.5 & jdcensus$salary==1,])[1]/dim(jdcensus)[1]
[1] 0.1336875
> dim(jdcensus[jdcensus$sex==1 & jdcensus$hpwk>40.5 & jdcensus$salary==2,])[1]/dim(jdcensus)[1]
[1] 0.1070606
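All eight cells can also be produced in one shot by discretizing hpwk and tabulating. A sketch, assuming jdcensus is loaded as above (the hpwkcut column is our own addition):

jdcensus$hpwkcut <- ifelse(jdcensus$hpwk <= 40.5, "<=40.5", ">40.5")
# prop.table divides each count by the grand total, giving the joint distribution
prop.table(table(jdcensus$sex, jdcensus$hpwkcut, jdcensus$salary))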


Copyright © Andrew W. Moore Slide 46

Using the Joint

Once you have the JD you can ask for the probability of any logical expression involving your attributes:

$P(E) = \sum_{\text{rows matching } E} P(\text{row})$


Copyright © Andrew W. Moore Slide 47

Using the Joint

P(Poor Male) = 0.4654

$P(E) = \sum_{\text{rows matching } E} P(\text{row})$


Copyright © Andrew W. Moore Slide 48

Using the Joint

P(Poor) = 0.7604

$P(E) = \sum_{\text{rows matching } E} P(\text{row})$


Copyright © Andrew W. Moore Slide 49

Inference with the Joint

$P(E_1 | E_2) = \frac{P(E_1 \wedge E_2)}{P(E_2)} = \frac{\sum_{\text{rows matching } E_1 \text{ and } E_2} P(\text{row})}{\sum_{\text{rows matching } E_2} P(\text{row})}$


Copyright © Andrew W. Moore Slide 50

Inference with the Joint

$P(E_1 | E_2) = \frac{P(E_1 \wedge E_2)}{P(E_2)} = \frac{\sum_{\text{rows matching } E_1 \text{ and } E_2} P(\text{row})}{\sum_{\text{rows matching } E_2} P(\text{row})}$

P(Male | Poor) = 0.4654 / 0.7604 = 0.612
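This conditional-from-joint recipe is two sums and a division, so it is natural to wrap it in a helper. A sketch against the toy jd table built on the earlier slide (the function names are ours):

joint_prob <- function(jd, rows) sum(jd$Prob[rows])   # P(E): add up the matching rows
cond_prob  <- function(jd, e1, e2) joint_prob(jd, e1 & e2) / joint_prob(jd, e2)
cond_prob(jd, jd$A == 1, jd$C == 0)   # P(A | ~C) = 0.30 / 0.70 ~ 0.43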


Copyright © Andrew W. Moore Slide 51

Inference is a big deal

• I have this evidence: what is the probability that this conclusion is true?
  • I have a stiff neck: what is the probability that I have meningitis?
  • I see lights on in my house and it's 9pm: what is the probability that my spouse is already asleep?
• There are plenty of applications of Bayesian inference in medicine, pharmacy, machine fault diagnosis, etc.


Copyright © Andrew W. Moore Slide 52

Where do Joint Distributions come from?

• Idea One: Expert Humans
• Idea Two: Simpler probabilistic facts and some algebra

Example: Suppose you knew

P(A) = 0.7

P(B|A) = 0.2
P(B|~A) = 0.1

P(C|A^B) = 0.1
P(C|A^~B) = 0.8
P(C|~A^B) = 0.3
P(C|~A^~B) = 0.1

Then you can automatically compute the JD using the chain rule:

P(A=x ^ B=y ^ C=z) = P(C=z | A=x ^ B=y) P(B=y | A=x) P(A=x)

In another lecture: Bayes Nets, a systematic way to do this.
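As a concrete illustration of the chain rule at work, here is a hedged R sketch that expands the seven numbers above into the full eight-row JD (all function names are ours):

p_a <- function(a) ifelse(a == 1, 0.7, 0.3)
p_b_given_a <- function(b, a) {
  p1 <- ifelse(a == 1, 0.2, 0.1)            # P(B=1 | A=a)
  ifelse(b == 1, p1, 1 - p1)
}
p_c_given_ab <- function(cc, a, b) {
  p1 <- ifelse(a == 1,
               ifelse(b == 1, 0.1, 0.8),
               ifelse(b == 1, 0.3, 0.1))    # P(C=1 | A=a, B=b)
  ifelse(cc == 1, p1, 1 - p1)
}
jd2 <- expand.grid(C = 0:1, B = 0:1, A = 0:1)[, c("A", "B", "C")]
jd2$Prob <- with(jd2, p_c_given_ab(C, A, B) * p_b_given_a(B, A) * p_a(A))
sum(jd2$Prob)   # 1, as the axioms require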


Copyright © Andrew W. Moore Slide 53

Where do Joint Distributions come from?

• Idea Three: Learn them from data!

Prepare to see one of the most impressive learning algorithms you’ll come across in the entire course….


Copyright © Andrew W. Moore Slide 54

Learning a joint distribution

Build a JD table for your attributes in which the probabilities are unspecified:

A B C Prob
0 0 0 ?
0 0 1 ?
0 1 0 ?
0 1 1 ?
1 0 0 ?
1 0 1 ?
1 1 0 ?
1 1 1 ?

Then fill in each row with

$\hat{P}(\text{row}) = \frac{\text{number of records matching row}}{\text{total number of records}}$

A B C Prob
0 0 0 0.30
0 0 1 0.05
0 1 0 0.10
0 1 1 0.05
1 0 0 0.05
1 0 1 0.10
1 1 0 0.25
1 1 1 0.10

(For example, the 0.25 in row 110 is the fraction of all records in which A and B are True but C is False.)
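Counting and normalizing is nearly a one-liner in R. A sketch on a tiny made-up data frame (recs is hypothetical, not one of the lecture's datasets):

recs <- data.frame(A = c(0, 0, 1, 1, 1, 0),
                   B = c(0, 1, 1, 1, 0, 0),
                   C = c(1, 0, 0, 1, 1, 0))
jd_hat <- as.data.frame(prop.table(table(recs)))  # one row per combination; Freq = matching fraction
jd_hat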


Copyright © Andrew W. Moore Slide 55

Example of Learning a Joint

• This Joint was obtained by learning from three attributes in the UCI "Adult" Census Database [Kohavi 1995]


Copyright © Andrew W. Moore Slide 56

Where are we?

• We have recalled the fundamentals of probability
• We have become content with what JDs are and how to use them
• And we even know how to learn JDs from data.


Copyright © Andrew W. Moore Slide 57

Density Estimation

• Our Joint Distribution learner is our first example of something called Density Estimation
• A Density Estimator learns a mapping from a set of attributes to a Probability

[Diagram: Input Attributes → Density Estimator → Probability]


Copyright © Andrew W. Moore Slide 58

Density Estimation

• Compared with the two other major kinds of model:

[Diagram:
Input Attributes → Regressor → Prediction of real-valued output
Input Attributes → Density Estimator → Probability
Input Attributes → Classifier → Prediction of categorical output]


Copyright © Andrew W. Moore Slide 59

Evaluating Density Estimation

[Diagram:
Input Attributes → Regressor → real-valued output: Test set Accuracy
Input Attributes → Density Estimator → Probability: ?
Input Attributes → Classifier → categorical output: Test set Accuracy]

Test-set criterion for estimating performance on future data.*

*See the Decision Tree or Cross Validation lecture for more detail.


Copyright © Andrew W. Moore Slide 60

Evaluating a density estimator

• Given a record x, a density estimator M can tell you how likely the record is: $\hat{P}(\mathbf{x}|M)$
• Given a dataset with R records, a density estimator can tell you how likely the dataset is:

$\hat{P}(\text{dataset}|M) = \hat{P}(\mathbf{x}_1 \wedge \mathbf{x}_2 \wedge \dots \wedge \mathbf{x}_R \,|\, M) = \prod_{k=1}^{R} \hat{P}(\mathbf{x}_k|M)$

(Under the assumption that all records were independently generated from the Density Estimator's JD)
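In code this is a lookup of each record's cell followed by a product. A sketch reusing the jd_hat table from the learning sketch a few slides back (the test frame and key helper are ours):

test <- data.frame(A = c(0, 1), B = c(0, 1), C = c(1, 1))   # a made-up two-record "dataset"
key  <- function(df) paste(df$A, df$B, df$C)
phat <- jd_hat$Freq[match(key(test), key(jd_hat))]   # per-record probabilities P-hat(x_k | M)
prod(phat)                                           # P-hat(dataset | M)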


Copyright © Andrew W. Moore Slide 61

The Auto-mpg dataset

Donor: Quinlan, R. (1993). Number of instances: 398 minus 6 with missing values = 392 (training: 196, test: 196). Number of attributes: 9 including the class attribute.

Attribute information:
1. mpg: continuous (discretized: bad <= 25, good > 25)
2. cylinders: multi-valued discrete
3. displacement: continuous (discretized: low <= 200, high > 200)
4. horsepower: continuous (discretized: low <= 90, high > 90)
5. weight: continuous (discretized: low <= 3000, high > 3000)
6. acceleration: continuous (discretized: low <= 15, high > 15)
7. model year: multi-valued discrete (discretized: 70-74, 75-77, 78-82)
8. origin: multi-valued discrete
9. car name: string (unique for each instance)

Note: horsepower has 6 missing values.


Copyright © Andrew W. Moore Slide 62

The auto-mpg dataset

18.0 8 307.0 130.0 3504. 12.0 70 1 "chevrolet chevelle malibu"
15.0 8 350.0 165.0 3693. 11.5 70 1 "buick skylark 320"
18.0 8 318.0 150.0 3436. 11.0 70 1 "plymouth satellite"
16.0 8 304.0 150.0 3433. 12.0 70 1 "amc rebel sst"
17.0 8 302.0 140.0 3449. 10.5 70 1 "ford torino"
…
27.0 4 140.0 86.00 2790. 15.6 82 1 "ford mustang gl"
44.0 4 97.00 52.00 2130. 24.6 82 2 "vw pickup"
32.0 4 135.0 84.00 2295. 11.6 82 1 "dodge rampage"
28.0 4 120.0 79.00 2625. 18.6 82 1 "ford ranger"
31.0 4 119.0 82.00 2720. 19.4 82 1 "chevy s-10"


Copyright © Andrew W. Moore Slide 63

A subset of auto-mpg

      levelmpg modelyear maker
 [1,] "bad"    "70-74"   "america"
 [2,] "bad"    "70-74"   "america"
 [3,] "bad"    "70-74"   "america"
 [4,] "bad"    "70-74"   "america"
 [5,] "bad"    "70-74"   "america"
 [6,] "bad"    "70-74"   "america"
…
 [7,] "good"   "78-82"   "america"
 [8,] "good"   "78-82"   "europe"
 [9,] "good"   "78-82"   "america"
[10,] "good"   "78-82"   "america"
[11,] "good"   "78-82"   "america"


Copyright © Andrew W. Moore Slide 64

Mpg: training set and test set

indices_samples = sample(1:392, 196)
mpgtrain = smallmpg[indices_samples,]
dim(mpgtrain)
[1] 196 3
mpgtest = smallmpg[-indices_samples,]
# computing the first joint probability (divide by the 196 training records)
a1 = dim(mpgtrain[mpgtrain[,1]=="bad" & mpgtrain[,2]=="70-74" & mpgtrain[,3]=="america",])[1]/196
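Rather than filling in the 18 cells one comparison at a time, the whole table on the next slide can be produced at once. A sketch, assuming mpgtrain is the 196-row matrix above:

jointmpg <- as.data.frame(prop.table(table(levelmpg  = mpgtrain[, 1],
                                           modelyear = mpgtrain[, 2],
                                           maker     = mpgtrain[, 3])))
jointmpg   # 18 rows; the Freq column is the estimated joint probability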


Copyright © Andrew W. Moore Slide 65

The joint density estimate

levelmpg  modelyear  maker    probability
bad       70-74      america  0.2397959
bad       70-74      asia     0.01530612
bad       70-74      europe   0.03061224
bad       75-77      america  0.2091837
bad       75-77      asia     0.01020408
bad       75-77      europe   0.04591837
bad       78-82      america  0.04591837
bad       78-82      asia     0.005102041
bad       78-82      europe   0
good      70-74      america  0.02040816
good      70-74      asia     0.01530612
good      70-74      europe   0.03061224
good      75-77      america  0.04081633
good      75-77      asia     0.04081633
good      75-77      europe   0.02551020
good      78-82      america  0.1071429
good      78-82      asia     0.07142857
good      78-82      europe   0.04591837


Copyright © Andrew W. Moore Slide 66

$\hat{P}(\text{mpgtest}|M) = \hat{P}(\mathbf{x}_1 \wedge \mathbf{x}_2 \wedge \dots \wedge \mathbf{x}_{196} \,|\, M) = \prod_{k=1}^{196} \hat{P}(\mathbf{x}_k|M) = 1.45 \times 10^{-222}$ (in this case)

This probability has been computed by taking every joint probability to be at least $1/10^{20} = 10^{-20}$.


Copyright © Andrew W. Moore Slide 67

A small dataset: Miles Per Gallon

From the UCI repository (thanks to Ross Quinlan)

192 Training Set Records

mpg  modelyear maker

good 75to78 asia
bad  70to74 america
bad  75to78 europe
bad  70to74 america
bad  70to74 america
bad  70to74 asia
bad  70to74 asia
bad  75to78 america
:    :      :
bad  70to74 america
good 79to83 america
bad  75to78 america
good 79to83 america
bad  75to78 america
good 79to83 america
good 79to83 america
bad  70to74 america
good 75to78 europe
bad  75to78 europe


Copyright © Andrew W. Moore Slide 68

A small dataset: Miles Per Gallon

192 Training Set Records (same data as the previous slide)


Copyright © Andrew W. Moore Slide 69

A small dataset: Miles Per Gallon

192 Training Set Records (same data as the previous slides)

$\hat{P}(\text{dataset}|M) = \hat{P}(\mathbf{x}_1 \wedge \mathbf{x}_2 \wedge \dots \wedge \mathbf{x}_{196} \,|\, M) = \prod_{k=1}^{196} \hat{P}(\mathbf{x}_k|M) = 3.4 \times 10^{-203}$ (in this case)

The dataset can be mpgtrain or mpgtest.


Copyright © Andrew W. Moore Slide 70

Log Probabilities

Since probabilities of datasets get so small we usually use log probabilities

$\log \hat{P}(\text{dataset}|M) = \log \prod_{k=1}^{R} \hat{P}(\mathbf{x}_k|M) = \sum_{k=1}^{R} \log \hat{P}(\mathbf{x}_k|M)$
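The same evaluation as before, done in log space. A sketch (the 1e-20 floor mirrors the hard-wired minimum probability described a few slides ahead):

loglik <- function(p) sum(log(pmax(p, 1e-20)))   # pmax guards against log(0) = -Inf
loglik(phat)   # reusing the per-record probabilities from the evaluation sketch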


Copyright © Andrew W. Moore Slide 71

A small dataset: Miles Per Gallon

192 Training Set Records (same data as the previous slides)

$\log \hat{P}(\text{dataset}|M) = \log \prod_{k=1}^{R} \hat{P}(\mathbf{x}_k|M) = \sum_{k=1}^{R} \log \hat{P}(\mathbf{x}_k|M) = -466.19$ (in this case)

In our case = -510.79


Copyright © Andrew W. Moore Slide 72

Summary: The Good News

• We have seen a way to learn a density estimator from data.
• Density estimators can be used to
  • Sort the records by probability, and thereby detect anomalous cases ("outliers").
  • Do inference: compute P(E1|E2)
  • Build Bayesian classifiers (described later).


Copyright © Andrew W. Moore Slide 73

Summary: The Bad News

• Density estimation by directly learning the joint is trivial, mindless and dangerous


Copyright © Andrew W. Moore Slide 74

Using a test set

An independent test set with 196 cars has a worse log likelihood

(actually it’s a billion quintillion quintillion quintillion quintillion times less likely)

Density estimators can overfit. And the full joint density estimator is the overfittiest of them all!


Copyright © Andrew W. Moore Slide 75

Overfitting Density Estimators

$\log \hat{P}(\text{testset}|M) = \log \prod_{k=1}^{R} \hat{P}(\mathbf{x}_k|M) = \sum_{k=1}^{R} \log \hat{P}(\mathbf{x}_k|M) = -\infty$ if $\hat{P}(\mathbf{x}_k|M) = 0$ for any $\mathbf{x}_k$

If this ever happens, it means there are certain combinations that we learn are impossible.


Copyright © Andrew W. Moore Slide 76

Using a test set

The only reason that our test set didn't score -infinity is that my code is hard-wired to always predict a probability of at least one in 10^20.

We need Density Estimators that are less prone to overfitting


Copyright © Andrew W. Moore Slide 77

Naïve Density Estimation

The problem with the joint density estimator is that it simply mirrors the training sample.

We need something that generalizes more usefully.

The naïve model generalizes much more efficiently:

It assumes that each attribute is distributed independently of all the others.


Copyright © Andrew W. Moore Slide 78

A note about independence

• Assume A and B are Boolean Random Variables. Then "A and B are independent" if and only if P(A|B) = P(A)
• "A and B are independent" is often notated as $A \perp B$


Copyright © Andrew W. Moore Slide 79

Independence Theorems

• Assume P(A|B) = P(A)
• Then P(A^B) = ?
  P(A|B) = P(A^B)/P(B); by independence, P(A) = P(A^B)/P(B), so
  P(A^B) = P(A) P(B)

• Assume P(A|B) = P(A)
• Then P(B|A) = ?
  P(B|A) = P(A^B)/P(A); by the previous result this is P(A)P(B)/P(A), so
  P(B|A) = P(B)


Copyright © Andrew W. Moore Slide 80

Independence Theorems

• Assume P(A|B) = P(A)
• Then P(~A|B) = ?
  P(~A|B) = P(~A^B)/P(B)
          = [P(B) - P(A^B)]/P(B)
          = 1 - P(A^B)/P(B)
          = 1 - P(A), by independence
          = P(~A)

• Assume P(A|B) = P(A)
• Then P(A|~B) = ?
  P(A|~B) = P(A^~B)/P(~B)
          = [P(A) - P(A^B)]/[1 - P(B)]
          = [P(A) - P(A)P(B)]/[1 - P(B)]
          = P(A)[1 - P(B)]/[1 - P(B)]
          = P(A)


Copyright © Andrew W. Moore Slide 81

Multivalued Independence

For multivalued Random Variables A and B,

$A \perp B$

if and only if

$\forall u, v : P(A = u \,|\, B = v) = P(A = u)$

from which you can then prove things like…

$\forall u, v : P(A = u \wedge B = v) = P(A = u)\,P(B = v)$

$\forall u, v : P(B = v \,|\, A = u) = P(B = v)$


Copyright © Andrew W. Moore Slide 82

Independently Distributed Data

• Let x[i] denote the i'th field of record x.
• The independently distributed assumption says that for any i, v, u_1, u_2, …, u_{i-1}, u_{i+1}, …, u_M:

$P(x[i] = v \mid x[1] = u_1, \dots, x[i-1] = u_{i-1}, x[i+1] = u_{i+1}, \dots, x[M] = u_M) = P(x[i] = v)$

• Or in other words, x[i] is independent of {x[1], x[2], .. x[i-1], x[i+1], … x[M]}
• This is often written as

$x[i] \perp \{x[1], x[2], \dots, x[i-1], x[i+1], \dots, x[M]\}$


Copyright © Andrew W. Moore Slide 83

Back to Naïve Density Estimation

• Let x[i] denote the i'th field of record x.
• Naïve DE assumes x[i] is independent of {x[1], x[2], .. x[i-1], x[i+1], … x[M]}
• Example:
  • Suppose that each record is generated by randomly shaking a green dice and a red dice
  • Dataset 1: A = red value, B = green value
  • Dataset 2: A = red value, B = sum of values
  • Dataset 3: A = sum of values, B = difference of values
  • Which of these datasets violates the naïve assumption?


Copyright © Andrew W. Moore Slide 84

Using the Naïve Distribution

• Once you have a Naïve Distribution you can easily compute any row of the joint distribution.
• Suppose A, B, C and D are independently distributed. What is P(A ^ ~B ^ C ^ ~D)?


Copyright © Andrew W. Moore Slide 85

Using the Naïve Distribution

• Once you have a Naïve Distribution you can easily compute any row of the joint distribution.
• Suppose A, B, C and D are independently distributed. What is P(A ^ ~B ^ C ^ ~D)?

= P(A | ~B ^ C ^ ~D) P(~B ^ C ^ ~D)
= P(A) P(~B ^ C ^ ~D)
= P(A) P(~B | C ^ ~D) P(C ^ ~D)
= P(A) P(~B) P(C ^ ~D)
= P(A) P(~B) P(C | ~D) P(~D)
= P(A) P(~B) P(C) P(~D)
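Numerically the derivation collapses to a product of four marginals. A small R sketch with made-up numbers:

p_A <- 0.3; p_B <- 0.6; p_C <- 0.8; p_D <- 0.1   # made-up marginal probabilities
p_A * (1 - p_B) * p_C * (1 - p_D)                # P(A ^ ~B ^ C ^ ~D) = 0.0864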


Copyright © Andrew W. Moore Slide 86

Naïve Distribution General Case

• Suppose x[1], x[2], … x[M] are independently distributed.

$P(x[1] = u_1 \wedge x[2] = u_2 \wedge \dots \wedge x[M] = u_M) = \prod_{k=1}^{M} P(x[k] = u_k)$

• So if we have a Naïve Distribution we can construct any row of the implied Joint Distribution on demand.
• So we can do any inference.
• But how do we learn a Naïve Density Estimator?


Copyright © Andrew W. Moore Slide 87

Learning a Naïve Density Estimator

$\hat{P}(x[i] = u) = \frac{\text{number of records in which } x[i] = u}{\text{total number of records}}$

Another trivial learning algorithm!
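A hedged R sketch of that estimator, reusing the made-up recs frame from the joint-learning sketch: one marginal table per attribute, and a row probability is the product of its attributes' marginals:

naive_hat <- lapply(recs, function(col) prop.table(table(col)))   # one marginal per attribute
naive_row_prob <- function(row, model)
  prod(mapply(function(v, tab) tab[[as.character(v)]], row, model))
naive_row_prob(list(A = 1, B = 0, C = 1), naive_hat)   # P-hat(A=1 ^ B=0 ^ C=1)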


Copyright © Andrew W. Moore Slide 88

Comparison

Joint DE                                        Naïve DE
Can model anything                              Can model only very boring distributions
Can model a noisy copy of the original sample   Cannot do that
Given 100 records and more than 6 Boolean       Given 100 records and 10,000 multivalued
attributes, will screw up badly                 attributes, will be fine


Copyright © Andrew W. Moore Slide 89

Empirical Results: "MPG"

The "MPG" dataset consists of 392 records and 8 attributes.

[Figure: a tiny part of the DE learned by "Joint", alongside the DE learned by "Naive"]


Copyright © Andrew W. Moore Slide 90

Empirical Results: "MPG"

The "MPG" dataset consists of 392 records and 8 attributes.

[Figure: a tiny part of the DE learned by "Joint", alongside the DE learned by "Naive"]


Copyright © Andrew W. Moore Slide 91

Empirical Results: "Weight vs. MPG"

Suppose we train only from the "Weight" and "MPG" attributes.

[Figure: the DE learned by "Joint", alongside the DE learned by "Naive"]


Copyright © Andrew W. Moore Slide 92

Empirical Results: "Weight vs. MPG"

Suppose we train only from the "Weight" and "MPG" attributes.

[Figure: the DE learned by "Joint", alongside the DE learned by "Naive"]


Copyright © Andrew W. Moore Slide 93

"Weight vs. MPG": The best that Naïve can do

[Figure: the DE learned by "Joint", alongside the DE learned by "Naive"]


Copyright © Andrew W. Moore Slide 94

Reminder: The Good News

• We now have two ways to learn a density estimator from data.
• *In other lectures we will see more sophisticated density estimators (Mixture Models, Bayesian Networks, Density Trees, Kernel Densities and many more).
• Density estimators can be used to:
  • Detect anomalies ("outliers").
  • Do inference: compute P(E1|E2)
  • Build Bayesian classifiers.