Machine Learning Tutorial.1 Boolean algebra, probabilities.


Transcript of Machine Learning Tutorial.1 Boolean algebra, probabilities.

Page 1: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Machine Learning

Tutorial.1 Boolean algebra, probabilities

Page 2: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Boolean Algebra Tutorial (Duncan Fyfe Gillies' slides)

• Computer hardware works with binary numbers, but binary arithmetic is much older than computers:
– Ancient Chinese Civilisation (3000 BC)
– Ancient Greek Civilisation (1000 BC)
– Boolean Algebra (1850)

Page 3: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Propositional Logic

• The Ancient Greek philosophers created a system to formalize arguments called propositional logic.
– A proposition is a statement that can be TRUE or FALSE.
– Propositions can be compounded by means of the operators AND, OR and NOT.

Page 4: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Propositional Calculus Example

• Propositions may be TRUE or FALSE:
– it is raining
– the weather forecast is bad

• A combined proposition:
– it is raining OR the weather forecast is bad

Page 5: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Propositional Calculus Example

• We can assign values to propositions, for example:

I will take an umbrella if and only if it is raining OR the weather forecast is bad.

• In other words, the proposition “I will take an umbrella” is the result of the Boolean combination (OR) of two conditions: it is raining, and the weather forecast is bad.

• In fact we could write:

I will take an umbrella = it is raining OR the weather forecast is bad

Page 6: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Diagrammatic representation

• We can think of the umbrella proposition as a result that we calculate from the weather forecast and the fact that it is raining, by means of a logical OR.

[Diagram: “Rain” and “Bad forecast” feed into an OR gate whose output is “Take an umbrella”. From the Computer Organization class diagrams.]

Page 7: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Truth Tables

• Since propositions can only take two values, we can express all possible outcomes of the umbrella proposition by a table:
– True = 1
– False = 0

Rain   Bad Forecast   Umbrella
False  False          False
False  True           True
True   False          True
True   True           True

Rain   Bad Forecast   Umbrella
0      0              0
0      1              1
1      0              1
1      1              1
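As a quick illustration (not from the original slides), here is a minimal Python sketch that generates the umbrella truth table; the variable names rain, bad_forecast and umbrella are my own.

```python
# Minimal sketch: enumerate the truth table for
# umbrella = rain OR bad_forecast.
from itertools import product

print(f"{'Rain':<6}{'Bad Forecast':<14}{'Umbrella'}")
for rain, bad_forecast in product([False, True], repeat=2):
    umbrella = rain or bad_forecast
    print(f"{int(rain):<6}{int(bad_forecast):<14}{int(umbrella)}")
```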

Page 8: Machine Learning Tutorial.1 Boolean algebra, probabilities.

More complex propositions

• We can make our propositions more complex, for example:

(Take Umbrella) = (NOT (Take Car)) AND ((Bad Forecast) OR (Raining))

[Diagram: “Take a car” passes through a NOT gate; its output and the output of an OR gate over “Rain” and “Bad forecast” feed into an AND gate whose output is “Take an umbrella”.]
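A small Python sketch (my own, not part of the slides) that evaluates this compound proposition for every combination of inputs; the names take_car, bad_forecast and raining are illustrative.

```python
from itertools import product

# Evaluate (NOT take_car) AND (bad_forecast OR raining)
# for all eight input combinations.
for take_car, bad_forecast, raining in product([False, True], repeat=3):
    take_umbrella = (not take_car) and (bad_forecast or raining)
    print(take_car, bad_forecast, raining, "->", take_umbrella)
```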

Page 9: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Boolean Algebra

• We could write propositional statements as above, but to perform calculations quickly and efficiently we can use an equivalent but more succinct notation.

• We also need well-defined semantics for all the “operators”, or connectives, that we intend to use.

• The system we will employ is called Boolean Algebra (introduced by the English mathematician George Boole in 1850) and satisfies the criteria above.

Page 10: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Fundamentals of Boolean Algebra

• The truth values are replaced by 1 and 0:
– TRUE = 1
– FALSE = 0

• Propositions are replaced by variables, for example:
– it is raining = R
– the weather forecast is bad = W

• Operators are replaced by symbols:
– NOT = ¬ (negation)
– OR = ∨ (disjunction)
– AND = ∧ (conjunction)

Page 11: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Truth Tables

All possible outcomes of the operators can be written as truth tables:

A OR B:        A AND B:       NOT A:
A  B  R        A  B  R        A  R
0  0  0        0  0  0        0  1
0  1  1        0  1  0        1  0
1  0  1        1  0  0
1  1  1        1  1  1

For n variables there are 2ⁿ possibilities.
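One way to make the 2ⁿ count concrete is a short Python sketch (illustrative only) that prints the truth table for any operator:

```python
from itertools import product

def truth_table(n, fn):
    """Print all 2**n input rows and the operator's result for each."""
    for row in product([0, 1], repeat=n):
        print(*row, "->", fn(*row))

truth_table(2, lambda a, b: a & b)  # A AND B
truth_table(2, lambda a, b: a | b)  # A OR B
truth_table(1, lambda a: 1 - a)     # NOT A
```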

Page 12: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Theory of Probability

Olga Kubassova, School of Computing

http://www.comp.leeds.ac.uk/olga

7th February 2007

Page 13: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Probability Basics

0 ≤ P ≤ 1

• Experiment – a process that yields an outcome.

• Sample space – the set of all possible outcomes.

• Event – a subset of the sample space.
– E.g. getting heads when tossing a coin.

• Mutually exclusive – if events cannot occur simultaneously.

Page 14: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Example:

• Rolling a fair six-sided die:
• Sample Space: S = {1, 2, 3, 4, 5, 6}, |S| = 6
• Event: obtaining 4
– Independent events

Page 15: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Probability notation

• P(A) – the prob. of event A occurring

• P(A∪B) – prob. of A or B
– Probability of rolling 1 or 6

• P(A∩B) – prob. of A and B
– Probability of rolling 1 and 2
– Probability of rolling an even number greater than 3

• P(¬A) – prob. of A not occurring (the complement of A)

Pay attention to the AND/OR notation (set theory)
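The set-theoretic reading of AND/OR can be made concrete with Python sets; a minimal sketch (the events and the helper P are my own choices, assuming a fair die):

```python
S = {1, 2, 3, 4, 5, 6}   # sample space of a fair die
A = {1, 6}               # rolling 1 or 6
B = {4, 6}               # even number greater than 3

def P(event):
    return len(event) / len(S)   # uniform distribution

print(P(A | B))   # P(A ∪ B): union is OR
print(P(A & B))   # P(A ∩ B): intersection is AND
print(P(S - A))   # P(¬A): complement
```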

Page 16: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Even numbers on both dice?

Sum is greater than 5?

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

Are the dice identical? How many different events?

(Even numbers on both dice) AND (Sum is greater than 5)

(Even numbers on both dice) OR (Sum is greater than 5)

Page 17: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Example:

• Rolling a fair six-sided die:
• Sample Space: S = {1, 2, 3, 4, 5, 6}, |S| = 6
• Event: obtaining 4
– Independent events

• P(S) = ∑_{i=1}^{|S|} P(eventᵢ) = 1
– Sum of the probabilities of all (mutually exclusive!) events
– P(eventᵢ) = 1/6 (uniform distribution)

Page 18: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Finite probability

• Experiment that has a finite number of outcomes.
• Therefore, each event has a finite probability.

• Experiment: throw a die. A = number > 4. B = even number.
Find P(A), P(B), P(A∪B) and P(A∩B)
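A possible worked solution to this exercise as a Python sketch (the helper P is my own, assuming a fair die):

```python
S = {1, 2, 3, 4, 5, 6}
A = {n for n in S if n > 4}       # {5, 6}
B = {n for n in S if n % 2 == 0}  # {2, 4, 6}

def P(event):
    return len(event) / len(S)

print(P(A))      # 2/6
print(P(B))      # 3/6
print(P(A | B))  # {2, 4, 5, 6} -> 4/6
print(P(A & B))  # {6} -> 1/6
```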

Page 19: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Mutually Exclusive Events

• Events are mutually exclusive if they are disjoint, i.e. they cannot occur simultaneously:

A ∩ B = ∅

Examples?

(Even numbers on both dice) AND (Sum is less than 1)

Page 20: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Uniform distributions

• Every outcome in the sample space is equally probable.

• E.g.:
– tossing coins
– rolling dice
– drawing a card from a deck

Page 21: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Conditional probability

P(A | B) = P(A ∩ B) / P(B)

• P(A | B) – the prob. of A happening given that B has occurred.
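A small sketch of the definition in Python, reusing the die events from the earlier exercise (my own illustration, not from the slides):

```python
S = {1, 2, 3, 4, 5, 6}
A = {5, 6}       # number > 4
B = {2, 4, 6}    # even number

def P(event):
    return len(event) / len(S)

# P(A | B) = P(A ∩ B) / P(B) = (1/6) / (1/2) = 1/3
print(P(A & B) / P(B))
```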

Page 22: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Example of the Conditional Probability

Imagine that 5% of people in a given population own at least one dog, and 2% own at least one dog and at least one cat. What is the probability that someone owns a cat, given that they also own a dog?

Let A = “dog owner”, B = “cat owner”; then: P(A) = 0.05, P(A ∩ B) = 0.02

P(B | A) = P(A ∩ B) / P(A) = 0.02 / 0.05 = 0.4

Page 23: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Independence

• If events A and B do not influence each other, then:

P(A | B) = P(A)

and

P(A∩B) = P(A)P(B)

Page 24: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Example (Independence)

• Toss a coin twice. Whether the first toss comes up heads does not influence whether the second comes up tails, so the two events are independent.
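A Monte Carlo sketch (illustrative, not part of the slides) that checks the product rule for two coin tosses:

```python
import random

# Estimate P(first toss heads AND second toss tails) and compare
# with P(heads) * P(tails) = 0.5 * 0.5 = 0.25.
trials = 100_000
hits = 0
for _ in range(trials):
    first = random.choice("HT")
    second = random.choice("HT")
    if first == "H" and second == "T":
        hits += 1
print(hits / trials)   # roughly 0.25
```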

Page 25: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Chain rule

P(A₁, A₂, …, Aₙ) = P(A₁) P(A₂ | A₁) P(A₃ | A₁, A₂) … P(Aₙ | A₁, …, Aₙ₋₁)

The chain rule allows computing the probabilities of sequences of events.

Page 26: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Chain rule (II)

• Let us compute probabilities of sequences of events.
• E.g. the prob. of the sequence of letters ‘t’, ‘h’ and ‘e’:

P(t, h, e) = P(t) P(h | t) P(e | th)
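A tiny sketch of the chain rule in Python; the letter probabilities below are made-up placeholders, not real corpus statistics:

```python
# P(t, h, e) = P(t) * P(h | t) * P(e | th)
p_t = 0.07           # P('t')        -- assumed value
p_h_given_t = 0.30   # P('h' | 't')  -- assumed value
p_e_given_th = 0.40  # P('e' | 'th') -- assumed value

p_the = p_t * p_h_given_t * p_e_given_th
print(p_the)
```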

Page 27: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Bayes’ theorem

P(A | B) = P(B | A) P(A) / P(B)

Page 28: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Bayes’ theorem (II)

• Meningitis causes a stiff neck in 50% of cases
• Prob. of meningitis is 1/50000
• Prob. of stiff neck is 1/20

• What is the prob. of having meningitis if a patient has a stiff neck?
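Plugging the slide's numbers into Bayes' theorem, as a short Python sketch:

```python
# P(M | S) = P(S | M) * P(M) / P(S)
p_s_given_m = 0.5    # stiff neck in 50% of meningitis cases
p_m = 1 / 50000      # prior probability of meningitis
p_s = 1 / 20         # probability of a stiff neck

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)   # 0.0002
```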

Page 29: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Additional resources

• Andy Roberts' homepage: http://andy-roberts.net/teaching/index.html
• Google directory: http://directory.google.com/Top/Science/Math/Probability
• Wikipedia: http://en.wikipedia.org/wiki/Probability

Page 30: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Exponentials – What are they?

• Simply: a number (base) raised to a power (exponent).

• E.g. the area of a square whose sides are of length 3…

• = 3²
• = 3 squared, or 3 to the power 2
• = 3 × 3

Page 31: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Exponentials – A quick explanation

• Easy enough to calculate:

• aᵇ = a × a × a × … × a
• So there are ‘b’ lots of ‘a’
• What is 4³?

• 4 × 4 × 4 = 64

Page 32: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Exponentials in Computer Science

• Have you ever noticed the following numbers popping up in your computer science studies?

• 1, 2, 4, 8, 16, 32, 64, 128 … 1024 …
• What do they all have in common?
• They are all powers of 2!!

Page 33: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Exponentials in Computer Science

• The powers of 2 should be (very) familiar to you by now.

• All the computers you have been using work in binary bits (at the lowest level)…

• 2 values: 0/1, true/false etc.
• All ‘data sizes’ must be expressed in powers of 2.

Page 34: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Exponentials in Computer Science

• 1 kilobyte ≠ 1000 bytes (‘standard’ use of kilo)
• 1 kilobyte = 1024 bytes
• Because…
• 2⁹ = 512
• 2¹⁰ = 1024
• 2¹¹ = 2048
• So 2¹⁰ is the closest we can get to 1000
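A one-liner check in Python, just to make the powers concrete:

```python
for exp in (9, 10, 11):
    print(f"2**{exp} = {2 ** exp}")
# 2**10 = 1024 is the power of two closest to 1000,
# hence the traditional 1 kilobyte = 1024 bytes.
```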

Page 35: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Working with Exponentials

• First, some easy exponentials to remember:

• a⁰ = 1

• a¹ = a

Page 36: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Logarithms

• Logarithms are just ‘reversed’ exponentials.

• E.g. if 2⁴ = 16, then log₂ 16 = 4
– the base is 2

• Logarithms ‘map’ large numbers onto smaller numbers

Page 37: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Logarithmic Scale

[Figure: plot of 2ˣ and log₂(x) for x = 1 to 10; the exponential curve climbs steeply while the logarithmic curve flattens out.]

Page 38: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Logarithms - Bases

• There are several common bases:
– 10: very common base, Richter scale etc.
– e: used by a lot of scientists
– 2: very common in computer science. WHY?
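Python's math module exposes all three common bases; a quick sketch:

```python
import math

print(math.log10(1000))       # base 10 -> 3.0 (Richter-style scales)
print(math.log(math.e ** 2))  # natural log, base e -> approx. 2.0
print(math.log2(16))          # base 2 -> 4.0
```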

Page 39: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Logarithms – An example

• Imagine a (small) football tournament of 8 teams.

• It is a knockout tournament, so every time that 2 teams play, the losing team is eliminated from the tournament and the winning team goes on to the next round.

• How many rounds must be played to determine an overall winner?


Page 41: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Logarithms – An example

Round 0: 8 teams

Round 1: 4 teams

Round 2: 2 teams

Round 3: 1 team

3 rounds needed!

Page 42: Machine Learning Tutorial.1 Boolean algebra, probabilities.

Logarithms – Example explained

• 3 rounds are needed to determine the winner of 8 teams, competing 2 at a time (i.e. one-on-one).

• This can be easily calculated using logs.
• 2 teams play at a time, so the base is 2
(i.e. 2ˣ = 8, so we need to use log₂)

• So, log₂ 8 = 3
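The same calculation in Python (a sketch assuming the number of teams is a power of 2):

```python
import math

def rounds_needed(teams):
    """Knockout rounds needed when the field halves each round
    (assumes teams is a power of 2)."""
    return int(math.log2(teams))

print(rounds_needed(8))   # 3
print(rounds_needed(64))  # 6
```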