
Page 1: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

CS-171 Final Review
• First-Order Logic, Knowledge Representation (8.1-8.5)
• Constraint Satisfaction Problems (6.1-6.4)
• Machine Learning (18.1-18.12, 20.1-20.3.2)

• Questions on any topic

• Review pre-mid-term material if time and class interest

• Please review your quizzes, mid-term, and old CS-171 tests
• At least one question from a prior quiz or old CS-171 test will appear on the Final Exam (and all other tests)

Page 2: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

2

Knowledge Representation using First-Order Logic

• Propositional Logic is Useful --- but has Limited Expressive Power

• First Order Predicate Calculus (FOPC), or First Order Logic (FOL).– FOPC has greatly expanded expressive power, though still limited.

• New Ontology– The world consists of OBJECTS (for propositional logic, the world was facts).– OBJECTS have PROPERTIES and engage in RELATIONS and FUNCTIONS.

• New Syntax– Constants, Predicates, Functions, Properties, Quantifiers.

• New Semantics– Meaning of new syntax.

• Knowledge engineering in FOL

Page 3: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

3

Review: Syntax of FOL: Basic elements

• Constants KingJohn, 2, UCI,...

• Predicates Brother, >,...

• Functions Sqrt, LeftLegOf,...

• Variables x, y, a, b,...

• Connectives ¬, ∧, ∨, ⇒, ⇔

• Equality =

• Quantifiers ∀, ∃

Page 4: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

4

Syntax of FOL: Basic syntax elements are symbols

• Constant Symbols:– Stand for objects in the world.

• E.g., KingJohn, 2, UCI, ...

• Predicate Symbols– Stand for relations (maps a tuple of objects to a truth-value)

• E.g., Brother(Richard, John), greater_than(3,2), ...– P(x, y) is usually read as “x is P of y.”

• E.g., Mother(Ann, Sue) is usually “Ann is Mother of Sue.”

• Function Symbols– Stand for functions (maps a tuple of objects to an object)

• E.g., Sqrt(3), LeftLegOf(John), ...

• Model (world) = set of domain objects, relations, functions• Interpretation maps symbols onto the model (world)

– Very many interpretations are possible for each KB and world!– Job of the KB is to rule out models inconsistent with our knowledge.

Page 5: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

5

Syntax of FOL: Terms

• Term = logical expression that refers to an object

• There are two kinds of terms:

– Constant Symbols stand for (or name) objects:• E.g., KingJohn, 2, UCI, Wumpus, ...

– Function Symbols map tuples of objects to an object:• E.g., LeftLeg(KingJohn), Mother(Mary), Sqrt(x)• This is nothing but a complicated kind of name

– No “subroutine” call, no “return value”

Page 6: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

6

Syntax of FOL: Atomic Sentences

• Atomic Sentences state facts (logical truth values).
– An atomic sentence is a Predicate symbol, optionally followed by a parenthesized list of any argument terms
– E.g., Married( Father(Richard), Mother(John) )
– An atomic sentence asserts that some relationship (some predicate) holds among the objects that are its arguments.

• An Atomic Sentence is true in a given model if the relation referred to by the predicate symbol holds among the objects (terms) referred to by the arguments.

Page 7: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

7

Syntax of FOL: Connectives & Complex Sentences

• Complex Sentences are formed in the same way, and are formed using the same logical connectives, as we already know from propositional logic

• The Logical Connectives: ⇔ (biconditional), ⇒ (implication), ∧ (and), ∨ (or), ¬ (negation)

• Semantics for these logical connectives are the same as we already know from propositional logic.

Page 8: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

8

Syntax of FOL: Variables

• Variables range over objects in the world.

• A variable is like a term because it represents an object.

• A variable may be used wherever a term may be used.– Variables may be arguments to functions and predicates.

• (A term with NO variables is called a ground term.)• (A variable not bound by a quantifier is called free.)

Page 9: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

9

Syntax of FOL: Logical Quantifiers

• There are two Logical Quantifiers:
– Universal: ∀x P(x) means "For all x, P(x)."
• The "upside-down A" reminds you of "ALL."
– Existential: ∃x P(x) means "There exists x such that P(x)."
• The "upside-down E" reminds you of "EXISTS."

• Syntactic "sugar" --- we really only need one quantifier.
– ∀x P(x) ⇔ ¬∃x ¬P(x)
– ∃x P(x) ⇔ ¬∀x ¬P(x)
– You can ALWAYS convert one quantifier to the other.

• RULES: ∀ ⇔ ¬∃¬ and ∃ ⇔ ¬∀¬

• RULE: To move negation "in" across a quantifier, change the quantifier to "the other quantifier" and negate the predicate on "the other side."
– ¬∀x P(x) ⇔ ∃x ¬P(x)
– ¬∃x P(x) ⇔ ∀x ¬P(x)

Page 10: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Universal Quantification

• ∀ means "for all"

• Allows us to make statements about all objects that have certain properties

• Can now state general rules:

∀x King(x) ⇒ Person(x) "All kings are persons."

∀x Person(x) ⇒ HasHead(x) "Every person has a head."
∀i Integer(i) ⇒ Integer(plus(i,1)) "If i is an integer then i+1 is an integer."

Note that ∀x King(x) ∧ Person(x) is not correct! This would imply that all objects x are Kings and are People

∀x King(x) ⇒ Person(x) is the correct way to say this

Note that ⇒ is the natural connective to use with ∀.

Page 11: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Existential Quantification

• ∃x means "there exists an x such that…." (at least one object x)

• Allows us to make statements about some object without naming it

• Examples:

∃x King(x) "Some object is a king."

∃x Lives_in(John, Castle(x)) "John lives in somebody's castle."

∃i Integer(i) ∧ GreaterThan(i,0) "Some integer is greater than zero."

Note that ∧ is the natural connective to use with ∃

(And remember that ⇒ is the natural connective to use with ∀)

Page 12: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

12

Combining Quantifiers --- Order (Scope)

The order of “unlike” quantifiers is important.

∀x ∃y Loves(x,y) – For everyone ("all x") there is someone ("exists y") whom they love

∃y ∀x Loves(x,y) – there is someone ("exists y") whom everyone loves ("all x")

Clearer with parentheses: ∃y ( ∀x Loves(x,y) )

The order of "like" quantifiers does not matter.
∀x ∀y P(x, y) ⇔ ∀y ∀x P(x, y)
∃x ∃y P(x, y) ⇔ ∃y ∃x P(x, y)

Page 13: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

13

De Morgan’s Law for Quantifiers

De Morgan's Rule:
¬(P ∧ Q) ⇔ (¬P ∨ ¬Q)
¬(P ∨ Q) ⇔ (¬P ∧ ¬Q)
(P ∧ Q) ⇔ ¬(¬P ∨ ¬Q)
(P ∨ Q) ⇔ ¬(¬P ∧ ¬Q)

Generalized De Morgan's Rule (quantifiers):
¬∀x P ⇔ ∃x ¬P
¬∃x P ⇔ ∀x ¬P
∀x P ⇔ ¬∃x ¬P
∃x P ⇔ ¬∀x ¬P

Rule is simple: if you bring a negation inside a disjunction or a conjunction, always switch between them (∨ becomes ∧, and ∧ becomes ∨).

Page 14: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

14

Page 15: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

15

More fun with sentences

• “All persons are mortal.”• [Use: Person(x), Mortal (x) ]

• ∀x Person(x) ⇒ Mortal(x)
• ∀x ¬Person(x) ˅ Mortal(x)

• Common Mistakes:
• ∀x Person(x) ∧ Mortal(x)

• Note that ⇒ is the natural connective to use with ∀.

Page 16: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

16

More fun with sentences

• "Fifi has a sister who is a cat."
• [Use: Sister(Fifi, x), Cat(x) ]

• ∃x Sister(Fifi, x) ∧ Cat(x)

• Common Mistakes:
• ∃x Sister(Fifi, x) ⇒ Cat(x)

• Note that ∧ is the natural connective to use with ∃

Page 17: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

17

More fun with sentences

• “For every food, there is a person who eats that food.”• [Use: Food(x), Person(y), Eats(y, x) ]

• All are correct:
• ∀x ∃y Food(x) ⇒ [ Person(y) ∧ Eats(y, x) ]
• ∀x Food(x) ⇒ ∃y [ Person(y) ∧ Eats(y, x) ]
• ∀x ∃y ¬Food(x) ˅ [ Person(y) ∧ Eats(y, x) ]
• ∀x ∃y [ ¬Food(x) ˅ Person(y) ] ∧ [ ¬Food(x) ˅ Eats(y, x) ]
• ∀x ∃y [ Food(x) ⇒ Person(y) ] ∧ [ Food(x) ⇒ Eats(y, x) ]

• Common Mistakes:
• ∀x ∃y [ Food(x) ∧ Person(y) ] ⇒ Eats(y, x)
• ∀x ∃y Food(x) ∧ Person(y) ∧ Eats(y, x)

Page 18: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

18

Semantics: Interpretation

• An interpretation of a sentence (wff) is an assignment that maps – Object constant symbols to objects in the world, – n-ary function symbols to n-ary functions in the world,– n-ary relation symbols to n-ary relations in the world

• Given an interpretation, an atomic sentence has the value “true” if it denotes a relation that holds for those individuals denoted in the terms. Otherwise it has the value “false.”– Example: Kinship world:

• Symbols = Ann, Bill, Sue, Married, Parent, Child, Sibling, …– World consists of individuals in relations:

• Married(Ann,Bill) is false, Parent(Bill,Sue) is true, …

• Your job, as a Knowledge Engineer, is to construct KB so it is true *exactly* for your world and intended interpretation.

Page 19: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

19

Semantics: Models and Definitions

• An interpretation and possible world satisfies a wff (sentence) if the wff has the value “true” under that interpretation in that possible world.

• A domain and an interpretation that satisfies a wff is a model of that wff

• Any wff that has the value “true” in all possible worlds and under all interpretations is valid.

• Any wff that does not have a model under any interpretation is inconsistent or unsatisfiable.

• Any wff that is true in at least one possible world under at least one interpretation is satisfiable.

• If a wff w has a value true under all the models of a set of sentences KB then KB logically entails w.

Page 20: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

20

Unification

• Recall: Subst(θ, p) = result of substituting θ into sentence p

• Unify algorithm: takes 2 sentences p and q and returns a unifier if one exists

Unify(p,q) = θ where Subst(θ, p) = Subst(θ, q)

• Example: p = Knows(John,x) q = Knows(John, Jane)

Unify(p,q) = {x/Jane}

Page 21: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

21

Unification examples

• simple example: query = Knows(John,x), i.e., who does John know?

p               q                    θ
Knows(John,x)   Knows(John,Jane)     {x/Jane}
Knows(John,x)   Knows(y,OJ)          {x/OJ, y/John}
Knows(John,x)   Knows(y,Mother(y))   {y/John, x/Mother(John)}
Knows(John,x)   Knows(x,OJ)          {fail}

• Last unification fails: only because x can’t take values John and OJ at the same time– But we know that if John knows x, and everyone (x) knows OJ, we should be

able to infer that John knows OJ

• Problem is due to use of same variable x in both sentences

• Simple solution: Standardizing apart eliminates overlap of variables, e.g., Knows(z,OJ)

Page 22: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

22

Unification

• To unify Knows(John,x) and Knows(y,z),

θ = {y/John, x/z } or θ = {y/John, x/John, z/John}

• The first unifier is more general than the second.

• There is a single most general unifier (MGU) that is unique up to renaming of variables.

MGU = { y/John, x/z }

• General algorithm in Figure 9.1 in the text
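For concreteness, here is a minimal Python sketch of unification that returns a most general unifier. It is not the textbook's Figure 9.1 verbatim; the term encoding (tuples for compound terms, lowercase strings for variables) and the omitted occur-check are simplifying assumptions.

def is_variable(t):
    # Assumption: variables are lowercase strings such as "x" or "y".
    return isinstance(t, str) and t[:1].islower()

def unify(x, y, theta=None):
    """Return a substitution dict unifying x and y, or None on failure."""
    if theta is None:
        theta = {}
    if x == y:
        return theta
    if is_variable(x):
        return unify_var(x, y, theta)
    if is_variable(y):
        return unify_var(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):           # unify argument lists element by element
            theta = unify(xi, yi, theta)
            if theta is None:
                return None
        return theta
    return None

def unify_var(var, t, theta):
    if var in theta:
        return unify(theta[var], t, theta)
    theta = dict(theta)                    # occur-check omitted for brevity
    theta[var] = t
    return theta

# The examples from the slides:
print(unify(('Knows', 'John', 'x'), ('Knows', 'John', 'Jane')))   # {'x': 'Jane'}
print(unify(('Knows', 'John', 'x'), ('Knows', 'x', 'OJ')))        # None  (fail)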

Page 23: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

23

Unification Algorithm

Page 24: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

24

Knowledge engineering in FOL

1. Identify the task

2. Assemble the relevant knowledge

3. Decide on a vocabulary of predicates, functions, and constants

4. Encode general knowledge about the domain

5. Encode a description of the specific problem instance

6. Pose queries to the inference procedure and get answers

7. Debug the knowledge base

Page 25: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

25

The electronic circuits domain

1. Identify the task– Does the circuit actually add properly?

2. Assemble the relevant knowledge
– Composed of wires and gates; types of gates (AND, OR, XOR, NOT)
– Irrelevant: size, shape, color, cost of gates

3. Decide on a vocabulary
– Alternatives:
  Type(X1) = XOR (function)
  Type(X1, XOR) (binary predicate)
  XOR(X1) (unary predicate)

Page 26: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

26

The electronic circuits domain

4. Encode general knowledge of the domain
– ∀t1,t2 Connected(t1, t2) ⇒ Signal(t1) = Signal(t2)

– ∀t Signal(t) = 1 ∨ Signal(t) = 0

– 1 ≠ 0

– ∀t1,t2 Connected(t1, t2) ⇔ Connected(t2, t1)

– ∀g Type(g) = OR ⇒ ( Signal(Out(1,g)) = 1 ⇔ ∃n Signal(In(n,g)) = 1 )

– ∀g Type(g) = AND ⇒ ( Signal(Out(1,g)) = 0 ⇔ ∃n Signal(In(n,g)) = 0 )

– ∀g Type(g) = XOR ⇒ ( Signal(Out(1,g)) = 1 ⇔ Signal(In(1,g)) ≠ Signal(In(2,g)) )

– ∀g Type(g) = NOT ⇒ ( Signal(Out(1,g)) ≠ Signal(In(1,g)) )

Page 27: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

27

The electronic circuits domain

5. Encode the specific problem instance
Type(X1) = XOR    Type(X2) = XOR
Type(A1) = AND    Type(A2) = AND
Type(O1) = OR

Connected(Out(1,X1), In(1,X2))    Connected(In(1,C1), In(1,X1))
Connected(Out(1,X1), In(2,A2))    Connected(In(1,C1), In(1,A1))
Connected(Out(1,A2), In(1,O1))    Connected(In(2,C1), In(2,X1))
Connected(Out(1,A1), In(2,O1))    Connected(In(2,C1), In(2,A1))
Connected(Out(1,X2), Out(1,C1))   Connected(In(3,C1), In(2,X2))
Connected(Out(1,O1), Out(2,C1))   Connected(In(3,C1), In(1,A2))

Page 28: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

28

The electronic circuits domain

6. Pose queries to the inference procedure
– What are the possible sets of values of all the terminals for the adder circuit?

∃i1,i2,i3,o1,o2 Signal(In(1,C1)) = i1 ∧ Signal(In(2,C1)) = i2 ∧ Signal(In(3,C1)) = i3 ∧ Signal(Out(1,C1)) = o1 ∧ Signal(Out(2,C1)) = o2

7. Debug the knowledge base
– May have omitted assertions like 1 ≠ 0

Page 29: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

CS-171 Final Review
• First-Order Logic, Knowledge Representation (8.1-8.5)
• Constraint Satisfaction Problems (6.1-6.4)
• Machine Learning (18.1-18.12, 20.1-20.3.2)

• Questions on any topic

• Review pre-mid-term material if time and class interest

• Please review your quizzes, mid-term, and old CS-171 tests
• At least one question from a prior quiz or old CS-171 test will appear on the Final Exam (and all other tests)

Page 30: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Constraint Satisfaction Problems• What is a CSP?

– Finite set of variables X1, X2, …, Xn

– Nonempty domain of possible values for each variable D1, D2, …, Dn

– Finite set of constraints C1, C2, …, Cm

• Each constraint Ci limits the values that variables can take, • e.g., X1 ≠ X2

– Each constraint Ci is a pair <scope, relation>• Scope = Tuple of variables that participate in the constraint.• Relation = List of allowed combinations of variable values.

May be an explicit list of allowed combinations.May be an abstract relation allowing membership testing and listing.

• CSP benefits– Standard representation pattern– Generic goal and successor functions– Generic heuristics (no domain specific expertise).

Page 31: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

CSPs --- what is a solution?

• A state is an assignment of values to some or all variables.

• An assignment is complete when every variable has a value. • An assignment is partial when some variables have no values.• An assignment is consistent when no constraints are violated.

• A solution to a CSP is a complete and consistent assignment.

Page 32: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

CSP example: map coloring

• Variables: WA, NT, Q, NSW, V, SA, T
• Domains: Di = {red, green, blue}
• Constraints: adjacent regions must have different colors.
• E.g., WA ≠ NT
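As an illustration only, this map-coloring CSP could be written down in Python roughly as follows; the dictionary names and the consistent helper are hypothetical, not from the slides.

# Plain-Python encoding of the Australia map-coloring CSP (illustrative).
variables = ["WA", "NT", "Q", "NSW", "V", "SA", "T"]
domains = {v: {"red", "green", "blue"} for v in variables}
neighbors = {                      # adjacency = binary "different color" constraints
    "WA":  ["NT", "SA"],
    "NT":  ["WA", "SA", "Q"],
    "Q":   ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V":   ["SA", "NSW"],
    "SA":  ["WA", "NT", "Q", "NSW", "V"],
    "T":   [],                     # Tasmania: no neighbors, an independent subproblem
}

def consistent(var, value, assignment):
    """A value is consistent if no already-assigned neighbor has the same value."""
    return all(assignment.get(n) != value for n in neighbors[var])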

Page 33: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

CSP example: Map coloring solution

• A solution is:– A complete and consistent assignment.– All variables assigned, all constraints satisfied.

• E.g., {WA=red, NT=green, Q=red, NSW=green,V=red, SA=blue, T=green}

Page 34: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Constraint graphs

• Constraint graph:

• nodes are variables

• arcs are binary constraints

• Graph can be used to simplify search e.g. Tasmania is an independent subproblem

(will return to graph structure later)

Page 35: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Backtracking search

• Similar to Depth-first search– At each level, picks a single variable to explore– Iterates over the domain values of that variable

• Generates kids one at a time, one per value

• Backtracks when a variable has no legal values left

• Uninformed algorithm– No good general performance

Page 36: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

function BACKTRACKING-SEARCH(csp) returns a solution or failure
  return RECURSIVE-BACKTRACKING({ }, csp)

function RECURSIVE-BACKTRACKING(assignment, csp) returns a solution or failure
  if assignment is complete then return assignment
  var ← SELECT-UNASSIGNED-VARIABLE(VARIABLES[csp], assignment, csp)
  for each value in ORDER-DOMAIN-VALUES(var, assignment, csp) do
    if value is consistent with assignment according to CONSTRAINTS[csp] then
      add {var = value} to assignment
      result ← RECURSIVE-BACKTRACKING(assignment, csp)
      if result ≠ failure then return result
      remove {var = value} from assignment
  return failure

Backtracking search (Figure 6.5)
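A minimal runnable Python sketch of the recursive backtracking above, assuming the variables/domains/neighbors/consistent definitions from the map-coloring sketch earlier; it uses naive variable and value ordering (no MRV, degree, or LCV heuristics).

def backtracking_search():
    return recursive_backtracking({})

def recursive_backtracking(assignment):
    if len(assignment) == len(variables):        # assignment is complete
        return assignment
    var = next(v for v in variables if v not in assignment)   # naive variable selection
    for value in domains[var]:                   # naive value ordering
        if consistent(var, value, assignment):
            assignment[var] = value
            result = recursive_backtracking(assignment)
            if result is not None:
                return result
            del assignment[var]                  # undo the assignment and backtrack
    return None                                  # failure

print(backtracking_search())    # prints a complete, consistent coloring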

Page 37: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Minimum remaining values (MRV)

var ← SELECT-UNASSIGNED-VARIABLE(VARIABLES[csp], assignment, csp)

• A.k.a. most constrained variable heuristic

• Heuristic Rule: choose the variable with the fewest legal values
– e.g., will immediately detect failure if X has no legal values

Page 38: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Degree heuristic for the initial variable

• Heuristic Rule: select variable that is involved in the largest number of constraints on other unassigned variables.

• Degree heuristic can be useful as a tie breaker.

• In what order should a variable’s values be tried?

Page 39: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

function BACKTRACKING-SEARCH(csp) returns a solution or failure
  return RECURSIVE-BACKTRACKING({ }, csp)

function RECURSIVE-BACKTRACKING(assignment, csp) returns a solution or failure
  if assignment is complete then return assignment
  var ← SELECT-UNASSIGNED-VARIABLE(VARIABLES[csp], assignment, csp)
  for each value in ORDER-DOMAIN-VALUES(var, assignment, csp) do
    if value is consistent with assignment according to CONSTRAINTS[csp] then
      add {var = value} to assignment
      result ← RECURSIVE-BACKTRACKING(assignment, csp)
      if result ≠ failure then return result
      remove {var = value} from assignment
  return failure

Backtracking search (Figure 6.5)

Page 40: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Least constraining value for value-ordering

• Least constraining value heuristic

• Heuristic Rule: given a variable choose the least constraining value– leaves the maximum flexibility for subsequent variable assignments

Page 41: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Forward checking

• Can we detect inevitable failure early?– And avoid it later?

• Forward checking idea: keep track of remaining legal values for unassigned variables.

• When a variable is assigned a value, update all neighbors in the constraint graph.• Forward checking stops after one step and does not go beyond immediate neighbors.

• Terminate search when any variable has no legal values.

Page 42: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Forward checking: check only neighbors; delete any inconsistent values

• Assign {WA=red}

• Effects on other variables connected by constraints to WA– NT can no longer be red– SA can no longer be red

[Diagram: WA = Red; NT = not red; SA = not red]

Page 43: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Forward checking: check only neighbors; delete any inconsistent values

• Assign {Q=green}

• Effects on other variables connected by constraints with Q
– NT can no longer be green
– NSW can no longer be green
– SA can no longer be green

• MRV heuristic would automatically select NT or SA next

[Diagram: WA = Red; NT = not red, not green; Q = Green; SA = not red, not green; NSW = not green]

We already have failure,but Forward Checking istoo simple to detect it now.

Page 44: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Forward checking: check only neighbors; delete any inconsistent values

• If V is assigned blue

• Effects on other variables connected by constraints with V
– NSW can no longer be blue
– SA is empty

• FC has detected that partial assignment is inconsistent with the constraints and backtracking can occur.

[Diagram: WA = Red; NT = not red, not green; Q = Green; SA = not red, not green, not blue (empty); NSW = not green, not blue; V = Blue]

Page 45: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Arc consistency algorithm (AC-3)

function AC-3(csp) returns false if inconsistency found, else true; may reduce csp domains

inputs: csp, a binary CSP with variables {X1, X2, …, Xn}

local variables: queue, a queue of arcs, initially all the arcs in csp/* initial queue must contain both (Xi, Xj) and (Xj, Xi) */

while queue is not empty do
  (Xi, Xj) ← REMOVE-FIRST(queue)

if REMOVE-INCONSISTENT-VALUES(Xi, Xj) then

if size of Di = 0 then return false

for each Xk in NEIGHBORS[Xi] − {Xj} do

add (Xk, Xi) to queue if not already there

return true

function REMOVE-INCONSISTENT-VALUES(Xi, Xj) returns true iff we delete a

value from the domain of Xi

removed ← false
for each x in DOMAIN[Xi] do

if no value y in DOMAIN[Xj] allows (x,y) to satisfy the constraints

between Xi and Xj

then delete x from DOMAIN[Xi]; removed ← true

return removed

(from Mackworth, 1977)
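A rough Python sketch of AC-3, under the simplifying assumption that every arc carries the same binary constraint (for map coloring, "the two values differ"); function names are illustrative.

from collections import deque

def ac3(domains, neighbors, constraint):
    """Prune domains in place; return False if some domain becomes empty, else True."""
    queue = deque((xi, xj) for xi in neighbors for xj in neighbors[xi])   # both directions
    while queue:
        xi, xj = queue.popleft()
        if remove_inconsistent_values(domains, xi, xj, constraint):
            if not domains[xi]:
                return False
            for xk in neighbors[xi]:
                if xk != xj:
                    queue.append((xk, xi))
    return True

def remove_inconsistent_values(domains, xi, xj, constraint):
    removed = False
    for x in set(domains[xi]):
        if not any(constraint(x, y) for y in domains[xj]):   # no supporting value in Xj
            domains[xi].discard(x)
            removed = True
    return removed

# Usage with the earlier map-coloring sketch (mutates its domains):
# ac3(domains, neighbors, lambda a, b: a != b)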

Page 46: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Arc consistency (AC-3): like Forward Checking, but exhaustive until quiescence

• An arc X → Y is consistent if for every value x of X there is some value y of Y consistent with x

(note that this is a directed property)

• Consider state of search after WA and Q are assigned & FC is done:

SA → NSW is consistent if
SA=blue and NSW=red

Page 47: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

• X → Y is consistent if for every value x of X there is some value y of Y consistent with x

• NSW → SA is consistent if
NSW=red and SA=blue
NSW=blue and SA=???

• NSW=blue can be pruned: No current domain value of SA is consistent

Arc consistency (AC-3): like Forward Checking, but exhaustive until quiescence

Page 48: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

• Enforce arc-consistency: the arc can be made consistent by removing blue from NSW

• Continue to propagate constraints….
– Check V → NSW
– Not consistent for V = red
– Remove red from V

Arc consistency (AC-3): like Forward Checking, but exhaustive until quiescence

Page 49: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Continue to propagate constraints….

• SA → NT is not consistent
– and cannot be made consistent (Failure!)

• Arc consistency detects failure earlier than FC– Requires more computation: Is it worth the effort??

Arc consistency (AC-3): like Forward Checking, but exhaustive until quiescence

Page 50: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Local search for CSPs• Use complete-state representation

– Initial state = all variables assigned values– Successor states = change 1 (or more) values

• For CSPs– allow states with unsatisfied constraints (unlike backtracking)– operators reassign variable values– hill-climbing with n-queens is an example

• Variable selection: randomly select any conflicted variable

• Value selection: min-conflicts heuristic– Select new value that results in a minimum number of conflicts with the other variables

Page 51: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Local search for CSP

function MIN-CONFLICTS(csp, max_steps) returns a solution or failure
  inputs: csp, a constraint satisfaction problem
          max_steps, the number of steps allowed before giving up
  current ← an initial complete assignment for csp
  for i = 1 to max_steps do
    if current is a solution for csp then return current
    var ← a randomly chosen, conflicted variable from VARIABLES[csp]
    value ← the value v for var that minimizes CONFLICTS(var, v, current, csp)
    set var = value in current
  return failure
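A small Python sketch of min-conflicts local search, again assuming simple "not equal" constraints as in map coloring; names are illustrative.

import random

def min_conflicts(variables, domains, neighbors, max_steps=10000):
    def conflicts(var, value, assignment):
        return sum(1 for n in neighbors[var] if assignment.get(n) == value)
    # Start from a random complete assignment (constraints may be violated).
    current = {v: random.choice(sorted(domains[v])) for v in variables}
    for _ in range(max_steps):
        conflicted = [v for v in variables if conflicts(v, current[v], current) > 0]
        if not conflicted:
            return current                      # solution: no constraint is violated
        var = random.choice(conflicted)
        current[var] = min(domains[var], key=lambda val: conflicts(var, val, current))
    return None                                 # failure after max_steps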

Page 52: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

Min-conflicts example 1

Use of min-conflicts heuristic in hill-climbing.

h=5 h=3 h=1

Page 53: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

CS-171 Final Review
• First-Order Logic, Knowledge Representation (8.1-8.5)
• Constraint Satisfaction Problems (6.1-6.4)
• Machine Learning (18.1-18.12, 20.1-20.3.2)

• Questions on any topic

• Review pre-mid-term material if time and class interest

• Please review your quizzes, mid-term, and old CS-171 tests
• At least one question from a prior quiz or old CS-171 test will appear on the Final Exam (and all other tests)

Page 54: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

54

The importance of a good representation

• Properties of a good representation:

• Reveals important features • Hides irrelevant detail• Exposes useful constraints• Makes frequent operations easy-to-do• Supports local inferences from local features

• Called the “soda straw” principle or “locality” principle• Inference from features “through a soda straw”

• Rapidly or efficiently computable• It’s nice to be fast

Page 55: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

55

Reveals important features / Hides irrelevant detail

• “You can’t learn what you can’t represent.” --- G. Sussman

• In search: A man is traveling to market with a fox, a goose, and a bag of oats. He comes to a river. The only way across the river is a boat that can hold the man and exactly one of the fox, goose or bag of oats. The fox will eat the goose if left alone with it, and the goose will eat the oats if left alone with it.

• A good representation makes this problem easy:

[Figure: state-space graph for the river-crossing puzzle; each state is a 4-bit string recording which bank the man, fox, goose, and oats are on (e.g., 0000 = all on the starting bank, 1111 = all across)]

Page 56: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

56

Terminology

• Attributes– Also known as features, variables, independent variables,

covariates

• Target Variable– Also known as goal predicate, dependent variable, …

• Classification– Also known as discrimination, supervised classification, …

• Error function– Objective function, loss function, …

Page 57: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

57

Inductive learning

• Let x represent the input vector of attributes

• Let f(x) represent the value of the target variable for x– The implicit mapping from x to f(x) is unknown to us– We just have training data pairs, D = {x, f(x)} available

• We want to learn a mapping from x to f, i.e., h(x; θ) is "close" to f(x) for all training data points x

θ are the parameters of our predictor h(..)

• Examples:
– h(x; θ) = sign(w1x1 + w2x2 + w3)

– hk(x) = (x1 OR x2) AND (x3 OR NOT(x4))

Page 58: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

58

Empirical Error Functions

• Empirical error function:

E(h) = Σx distance[ h(x; θ), f(x) ]

e.g., distance = squared error if h and f are real-valued (regression) distance = delta-function if h and f are categorical (classification)

Sum is over all training pairs in the training data D

In learning, we get to choose

1. what class of functions h(..) that we want to learn – potentially a huge space! (“hypothesis space”)

2. what error function/distance to use - should be chosen to reflect real “loss” in problem - but often chosen for mathematical/algorithmic convenience

Page 59: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

59

Decision Tree Representations

• Decision trees are fully expressive– can represent any Boolean function– Every path in the tree could represent 1 row in the truth table– Yields an exponentially large tree

• Truth table is of size 2^d, where d is the number of attributes

Page 60: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

60

Pseudocode for Decision tree learning

Page 61: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

61

Entropy with only 2 outcomes

Consider 2 class problem: p = probability of class 1, 1 – p = probability of class 2

In binary case, H(p) = - p log p - (1-p) log (1-p)

[Figure: plot of H(p) versus p for the binary case; H(p) = 1 at p = 0.5 and H(p) = 0 at p = 0 or p = 1]

Page 62: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

62

Information Gain

• H(p) = entropy of class distribution at a particular node

• H(p | A) = conditional entropy = average entropy of conditional class distribution, after we have partitioned the data according to the values in A

• Gain(A) = H(p) – H(p | A)

• Simple rule in decision tree learning
– At each internal node, split on the attribute with the largest information gain (or equivalently, with smallest H(p|A))

• Note that by definition, conditional entropy can’t be greater than the entropy
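A short Python sketch of entropy and information gain for discrete attributes; the toy data at the end is made up purely for illustration.

import math
from collections import Counter

def entropy(labels):
    """H(p) of the class distribution in a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute_index):
    """Gain(A) = H(p) - H(p | A) for one discrete attribute."""
    n = len(labels)
    by_value = {}
    for x, y in zip(examples, labels):
        by_value.setdefault(x[attribute_index], []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - conditional

# Tiny made-up example: attribute 0 separates the classes perfectly, attribute 1 not at all.
X = [("sunny", "hot"), ("sunny", "cool"), ("rain", "hot"), ("rain", "cool")]
y = ["no", "no", "yes", "yes"]
print(information_gain(X, y, 0))   # 1.0 bit
print(information_gain(X, y, 1))   # 0.0 bits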

Page 63: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

63

Overfitting and Underfitting

X

Y

Page 64: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

64

A Complex Model

X

Y

Y = high-order polynomial in X

Page 65: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

65

A Much Simpler Model

X

Y

Y = a X + b + noise

Page 66: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

66

How Overfitting affects Prediction

[Figure: Predictive Error versus Model Complexity. Error on training data keeps decreasing as complexity grows, while error on test data is U-shaped: underfitting on the left, overfitting on the right, with the ideal range for model complexity in between.]

Page 67: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

67

Training and Validation Data

Full Data Set

Training Data

Validation Data

Idea: train each model on the "training data" and then test each model's accuracy on the validation data

Page 68: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

68

The k-fold Cross-Validation Method

• Why just choose one particular 90/10 “split” of the data?– In principle we could do this multiple times

• “k-fold Cross-Validation” (e.g., k=10)– randomly partition our full data set into k disjoint subsets (each

roughly of size n/k, n = total number of training data points)• for i = 1:10 (here k = 10)

– train on 90% of data,– Acc(i) = accuracy on other 10%

• end

• Cross-Validation-Accuracy = (1/k) Σi Acc(i)

– choose the method with the highest cross-validation accuracy– common values for k are 5 and 10– Can also do “leave-one-out” where k = n
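A minimal Python sketch of k-fold cross-validation; train_and_score is a hypothetical user-supplied function that trains a model on the training fold and returns its accuracy on the held-out fold.

import random

def k_fold_cross_validation(data, labels, train_and_score, k=10, seed=0):
    """Return the mean accuracy over k disjoint folds."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]        # k roughly equal, disjoint subsets
    accs = []
    for i in range(k):
        test_idx = set(folds[i])
        train_idx = [j for j in indices if j not in test_idx]
        accs.append(train_and_score(
            [data[j] for j in train_idx], [labels[j] for j in train_idx],
            [data[j] for j in folds[i]], [labels[j] for j in folds[i]]))
    return sum(accs) / k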

Page 69: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

69

Disjoint Validation Data Sets

Full Data Set

Training Data

Validation Data (aka Test Data)

Validation Data

1st partition 2nd partition

3rd partition 4th partition 5th partition

Page 70: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

70

Classification in Euclidean Space

• A classifier is a partition of the space x into disjoint decision regions– Each region has a label attached – Regions with the same label need not be contiguous– For a new test point, find what decision region it is in, and predict

the corresponding label

• Decision boundaries = boundaries between decision regions– The “dual representation” of decision regions

• We can characterize a classifier by the equations for its decision boundaries

• Learning a classifier ⇔ searching for the decision boundaries that optimize our objective function

Page 71: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

71

Decision Tree Example

[Figure: decision tree over Income and Debt with tests Income > t1, Debt > t2, Income > t3, and the corresponding axis-parallel decision regions in the (Income, Debt) plane]

Note: tree boundaries are linear and axis-parallel

Page 72: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

72

Another Example: Nearest Neighbor Classifier

• The nearest-neighbor classifier– Given a test point x’, compute the distance between x’ and each

input data point – Find the closest neighbor in the training data– Assign x’ the class label of this neighbor– (sort of generalizes minimum distance classifier to exemplars)

• If Euclidean distance is used as the distance measure (the most common choice), the nearest neighbor classifier results in piecewise linear decision boundaries

• Many extensions– e.g., kNN, vote based on k-nearest neighbors– k can be chosen by cross-validation
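A small Python sketch of a k-nearest-neighbor classifier with Euclidean distance; the 2-D data at the bottom is invented for illustration.

import math
from collections import Counter

def knn_predict(train_X, train_y, x_query, k=1):
    """Predict the majority class label among the k nearest training points."""
    dists = sorted((math.dist(x, x_query), y) for x, y in zip(train_X, train_y))
    votes = [label for _, label in dists[:k]]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical 2-D example:
X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9)]
y = ["class1", "class1", "class2", "class2"]
print(knn_predict(X, y, (1.1, 1.0), k=3))    # -> class1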

Page 73: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

73

Overall Boundary = Piecewise Linear

[Figure: 2-D feature space (Feature 1 vs. Feature 2) with training points from class 1 and class 2, a query point "?", and the piecewise linear boundary between the Decision Region for Class 1 and the Decision Region for Class 2]

Page 74: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

74

Page 75: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

75

Page 76: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

76

Page 77: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

77

Linear Classifiers

• Linear classifier ⇔ a single linear decision boundary (for 2-class case)

• We can always represent a linear decision boundary by a linear equation:

w1x1 + w2x2 + … + wdxd = Σj wjxj = wᵀx = 0

• In d dimensions, this defines a (d-1) dimensional hyperplane– d=3, we get a plane; d=2, we get a line

• For prediction we simply see if Σj wjxj > 0

• The wi are the weights (parameters)– Learning consists of searching in the d-dimensional weight space for the set of weights

(the linear boundary) that minimizes an error measure– A threshold can be introduced by a "dummy" feature that is always one; its weight

corresponds to (the negative of) the threshold

• Note that a minimum distance classifier is a special (restricted) case of a linear classifier

Page 78: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

78

[Figure: scatter plot of two classes over FEATURE 1 and FEATURE 2 with the minimum-error linear decision boundary]

Page 79: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

79

The Perceptron Classifier (pages 729-731 in text)

• The perceptron classifier is just another name for a linear classifier for 2-class data, i.e.,

output(x) = sign( Σj wjxj )

• Loosely motivated by a simple model of how neurons fire

• For mathematical convenience, class labels are +1 for one class and -1 for the other

• Two major types of algorithms for training perceptrons– Objective function = classification accuracy (“error correcting”)– Objective function = squared error (use gradient descent)

Page 80: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

80

The Perceptron Classifier (pages 729-731 in text)

[Figure: perceptron block diagram: input attributes (features), weights for input attributes, bias or threshold, transfer function, output]

Page 81: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

81

Two different types of perceptron output

[Figure: two plots of perceptron output o(f) against f]

x-axis below is f(x) = f = weighted sum of inputs; y-axis is the perceptron output

• Thresholded output: takes values +1 or -1
• Sigmoid output σ(f): takes real values between -1 and +1

The sigmoid is in effect an approximation to the threshold function above, but has a gradient that we can use for learning

Page 82: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

82

Pseudo-code for Perceptron Training

• Inputs: N features, N targets (class labels), learning rate η
• Outputs: a set of learned weights

Initialize each wj (e.g., randomly)

While (termination condition not satisfied)
  for i = 1 : N      % loop over data points (an iteration)
    for j = 1 : d    % loop over weights
      deltawj = η ( y(i) – σ[f(i)] ) σ′[f(i)] xj(i)
      wj = wj + deltawj
    end
  calculate termination condition
end
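A minimal Python sketch of the gradient-descent training loop above, using tanh as the sigmoid so outputs lie in (-1, +1); the function names and the appended dummy bias feature are implementation choices, not prescribed by the slides.

import math
import random

def sigmoid(f):
    return math.tanh(f)            # outputs in (-1, +1), matching the +/-1 class labels

def train_perceptron(X, y, eta=0.1, epochs=100, seed=0):
    """X: list of feature vectors, y: labels in {-1, +1}; returns learned weights."""
    rng = random.Random(seed)
    d = len(X[0]) + 1                               # +1 for the dummy "bias" feature
    w = [rng.uniform(-0.1, 0.1) for _ in range(d)]  # initialize each wj randomly
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            x = list(xi) + [1.0]                    # dummy feature, always 1
            f = sum(wj * xj for wj, xj in zip(w, x))
            out = sigmoid(f)
            grad = 1.0 - out * out                  # derivative of tanh at f
            for j in range(d):
                w[j] += eta * (yi - out) * grad * x[j]
    return w

def predict(w, xi):
    x = list(xi) + [1.0]
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1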

Page 83: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

83

Support Vector Machines (SVM): “Modern perceptrons”(section 18.9, R&N)• A modern linear separator classifier

– Essentially, a perceptron with a few extra wrinkles

• Constructs a “maximum margin separator”– A linear decision boundary with the largest possible distance from the

decision boundary to the example points it separates– “Margin” = Distance from decision boundary to closest example– The “maximum margin” helps SVMs to generalize well

• Can embed the data in a non-linear higher dimension space– Constructs a linear separating hyperplane in that space

• This can be a non-linear boundary in the original space– Algorithmic advantages and simplicity of linear classifiers– Representational advantages of non-linear decision boundaries

• Currently the most popular "off-the-shelf" supervised classifier.

Page 84: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

84

Constructs a “maximum margin separator”

Page 85: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

85

Can embed the data in a non-linear higher dimension space

Page 86: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

86

Multi-Layer Perceptrons (p744-747 in text)

• What if we took K perceptrons and trained them in parallel and then took a weighted sum of their sigmoidal outputs?– This is a multi-layer neural network with a single “hidden” layer

(the outputs of the first set of perceptrons)– If we train them jointly in parallel, then intuitively different

perceptrons could learn different parts of the solution• Mathematically, they define different local decision boundaries

in the input space, giving us a more powerful model

• How would we train such a model?– Backpropagation algorithm = clever way to do gradient descent– Bad news: many local minima and many parameters

• training is hard and slow– Neural networks generated much excitement in AI research in the

late 1980’s and 1990’s• But now techniques like boosting and support vector machines

are often preferred

Page 87: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

87

Multi-Layer Perceptrons (Artificial Neural Networks) (sections 18.7.3-18.7.4 in textbook)

Page 88: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

88

Naïve Bayes Model (section 20.2.2 R&N 3rd ed.)

[Figure: Naïve Bayes graphical model: class variable C with children X1, X2, X3, …, Xn]

Basic Idea: We want to estimate P(C | X1,…Xn), but it’s hard to think about computing the probability of a class from input attributes of an example.

Solution: Use Bayes’ Rule to turn P(C | X1,…Xn) into an equivalent expression that involves only P(C) and P(Xi | C).

We can estimate P(C) easily from the frequency with which each class appears within our training data, and P(Xi | C) from the frequency with which each Xi appears in each class C within our training data.

Page 89: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

89

Naïve Bayes Model (section 20.2.2 R&N 3rd ed.)

[Figure: Naïve Bayes graphical model: class variable C with children X1, X2, X3, …, Xn]

Bayes Rule: P(C | X1,…,Xn) is proportional to P(C) Πi P(Xi | C)
[note: denominator P(X1,…,Xn) is constant for all classes, may be ignored.]

Features Xi are conditionally independent given the class variable C• choose the class value ci with the highest P(ci | x1,…, xn)• simple to implement, often works very well• e.g., spam email classification: X’s = counts of words in emails

Conditional probabilities P(Xi | C) can easily be estimated from labeled data
• Problem: Need to avoid zeroes, e.g., from limited training data
• Solutions: Pseudo-counts, beta[a,b] distribution, etc.

Page 90: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

90

Naïve Bayes Model (2)

P(C | X1,…,Xn) = α Πi P(Xi | C) P(C)

Probabilities P(C) and P(Xi | C) can easily be estimated from labeled data

P(C = cj) ≈ #(Examples with class label cj) / #(Examples)

P(Xi = xik | C = cj) ≈ #(Examples with Xi value xik and class label cj)

/ #(Examples with class label cj)

Usually easiest to work with logs:
log [ P(C | X1,…,Xn) ] = log α + Σi log P(Xi | C) + log P(C)

DANGER: Suppose ZERO examples with Xi value xik and class label cj ?An unseen example with Xi value xik will NEVER predict class label cj !

Practical solutions: Pseudocounts, e.g., add 1 to every #() , etc.Theoretical solutions: Bayesian inference, beta distribution, etc.
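A compact Python sketch of Naïve Bayes training and prediction with add-alpha pseudocounts (one simple way to avoid the zero-count problem); the data structures and function names are illustrative assumptions.

import math
from collections import Counter, defaultdict

def train_naive_bayes(X, y, alpha=1.0):
    """Estimate class counts and per-class feature-value counts from labeled data."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)    # (class, feature index) -> Counter of values
    values_seen = defaultdict(set)         # feature index -> set of observed values
    for xi, c in zip(X, y):
        for i, v in enumerate(xi):
            value_counts[(c, i)][v] += 1
            values_seen[i].add(v)
    return class_counts, value_counts, values_seen, alpha, len(y)

def predict_naive_bayes(model, x):
    class_counts, value_counts, values_seen, alpha, n = model
    best_class, best_logp = None, -math.inf
    for c, nc in class_counts.items():
        logp = math.log(nc / n)                              # log P(C)
        for i, v in enumerate(x):
            num = value_counts[(c, i)][v] + alpha            # pseudocount avoids zeroes
            den = nc + alpha * len(values_seen[i])
            logp += math.log(num / den)                      # + log P(Xi | C)
        if logp > best_logp:
            best_class, best_logp = c, logp
    return best_class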

Page 91: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

91

Example of Regression

• Vehicle price estimation problem– Features x: Fuel type, The number of doors, Engine size– Targets y: Price of vehicle– Training Data: (fit the model)

– Testing Data: (evaluate the model)

Training Data:
Fuel type   Number of doors   Engine size (61–326)   Price
1           4                 70                     11000
1           2                 120                    17000
2           2                 310                    41000

Testing Data:
Fuel type   Number of doors   Engine size (61–326)   Price
2           4                 150                    21000

Page 92: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

92

Example of Regression

• Vehicle price estimation problem

– ϴ = [-300, -200 , 700 , 130]– Ex #1 = -300 + -200 *1 + 4*700 + 70*130 = 11400 – Ex #2 = -300 + -200 *1 + 2*700 + 120*130 = 16500 – Ex #3 = -300 + -200 *2 + 2*700 + 310*130 = 41000

• Mean Absolute Training Error = 1/3 *(400 + 500 + 0) = 300

– Test = -300 + -200 *2 + 4*700 + 150*130 = 21600• Mean Absolute Testing Error = 1/1 *(600) = 600

Training Data:
Fuel type   # of doors   Engine size   Price
1           4            70            11000
1           2            120           17000
2           2            310           41000

Testing Data:
Fuel type   # of doors   Engine size (61–326)   Price
2           4            150                    21000
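A few lines of Python checking the arithmetic above (the feature order is assumed to be fuel type, number of doors, engine size):

theta = [-300, -200, 700, 130]    # [intercept, fuel type, number of doors, engine size]
train = [((1, 4, 70), 11000), ((1, 2, 120), 17000), ((2, 2, 310), 41000)]
test  = [((2, 4, 150), 21000)]

def predict(x):
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))

mae_train = sum(abs(predict(x) - y) for x, y in train) / len(train)
mae_test  = sum(abs(predict(x) - y) for x, y in test) / len(test)
print([predict(x) for x, _ in train], mae_train)   # [11400, 16500, 41000]  300.0
print([predict(x) for x, _ in test],  mae_test)    # [21600]  600.0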

Page 93: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

93

Linear regression

• Define form of function f(x) explicitly• Find a good f(x) within that family

[Figure: Target y plotted against Feature X1, with the fitted line shown in black]

"Predictor": Evaluate line ӯ = ϴ0 + ϴ1 * X1; return ӯ

ӯ = Predicted target value (black line)

New instance with X1 = 8 → Predicted value = 17

Y = 5 + 1.5*X1

(c) Alexander Ihler

Page 94: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

94

More dimensions?

[Figure: two 3-D plots of target y against features x1 and x2]

(c) Alexander Ihler

Page 95: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

95

Notation

Define "feature" x0 = 1 (constant). Then
ӯ = ϴ0*x0 + ϴ1*x1 + … + ϴn*xn = Σj ϴj*xj

(c) Alexander Ihler

n = the number of features in the dataset

ӯ is a plane in (n+1)-dimensional space

Page 96: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

96

Measuring error

[Figure: fitted regression line with one observation, its prediction, and the vertical gap between them labeled "error or residual"]

(c) Alexander Ihler

Red points = real target values
Black line = ӯ (predicted value), ӯ = ϴ0 + ϴ1 * X
Blue lines = error (difference between real value y and predicted value ӯ)

Page 97: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

97

Mean Squared Error

• How can we quantify the error?

• Y= Real target value in dataset,• ӯ = Predicted target value by ϴ*X

• Training Error: MSE_train = (1/m) Σi ( y(i) − ӯ(i) )², where m = the number of training instances
• Testing Error: the same quantity computed on a held-out partition of the data, where m = the number of testing instances

(c) Alexander Ihler

m = number of instances of data

Page 98: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

98

Finding good parameters: SSE Minimum (m > n+1)

• Reordering, we have

(c) Alexander Ihler

• Most of the time, m > n– There may be no linear function that hits all the data exactly– Minimum of a function has gradient equal to zero (gradient is a

horizontal line.)

Just need to know how to compute parameters.

(In the figure: n + 1 = 1 + 1 = 2 parameters, m = 3 data points.)
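One way to compute the SSE-minimizing parameters is a least-squares solve; the following numpy sketch is an illustration, not taken from the slides.

import numpy as np

def fit_least_squares(X, y):
    """Return theta minimizing the sum of squared errors, with constant feature x0 = 1 appended."""
    X_aug = np.hstack([np.ones((len(X), 1)), np.asarray(X, dtype=float)])
    theta, *_ = np.linalg.lstsq(X_aug, np.asarray(y, dtype=float), rcond=None)
    return theta

# Tiny example: points lying exactly on the line y = 5 + 1.5 * x1 from the earlier slide
X = [[2.0], [4.0], [8.0]]
y = [8.0, 11.0, 17.0]
print(fit_least_squares(X, y))    # approximately [5.0, 1.5]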

Page 99: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

99

What is Clustering?

• You can think of this as "unsupervised learning"
• There is no label for any instance of the data.
• Clustering is alternatively called "grouping".
• Clustering algorithms rely on a distance metric between data points.
– Data points close to each other are in the same cluster.

Page 100: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

100

Hierarchical Agglomerative Clustering

• Define a distance between clusters

• Initialize: every example is a cluster.• Iterate:

– Compute distances between all clusters – Merge two closest clusters

• Save both clustering and sequence of cluster operations as a Dendrogram.

Initially, every datum is a cluster

Page 101: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

101

Iteration 1

Merge two closest clusters

Page 102: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

102

Iteration 2

Cluster 2 Cluster 1

Page 103: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

103

Iteration 3

• Builds up a sequence of clusters (“hierarchical”)

• Algorithm complexity O(N²)

(Why?)

In matlab: “linkage” function (stats toolbox)
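For reference, a roughly equivalent call in Python with SciPy (shown only as an illustration, not part of the course material):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)                         # 20 random points in 2-D (made-up data)
Z = linkage(X, method="single")                   # merge history (the dendrogram data)
labels = fcluster(Z, t=3, criterion="maxclust")   # stop when there are 3 clusters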

Page 104: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

104

Dendrogram

[Dendrogram cut levels shown for two clusters and for three clusters]

Stop the process whenever there are enough clusters.

Page 105: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

105

K-Means Clustering

• There are always K clusters.
– In contrast, in agglomerative clustering, the number of clusters shrinks as clusters are merged.

• Iterate (until the clusters remain almost fixed):
– For each cluster k, compute the cluster mean (xi is data point i, Nk is the number of points currently in cluster k):
  mk = ( Σ {i : C(i) = k} xi ) / Nk ,  k = 1, …, K
– For each data point, find the closest cluster:
  C(i) = argmin {1 ≤ k ≤ K} ‖ xi − mk ‖² ,  i = 1, …, N
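A plain Python sketch of the two k-means steps above, with random initialization from the data points; illustrative only.

import random

def k_means(points, K, iters=100, seed=0):
    """points: list of equal-length tuples of floats; returns (means, clusters)."""
    rng = random.Random(seed)
    means = [list(p) for p in rng.sample(points, K)]     # pick K data points as initial means
    clusters = [[] for _ in range(K)]
    for _ in range(iters):
        # Assignment step: each point goes to the closest mean (squared Euclidean distance).
        clusters = [[] for _ in range(K)]
        for p in points:
            k = min(range(K),
                    key=lambda k: sum((pi - mi) ** 2 for pi, mi in zip(p, means[k])))
            clusters[k].append(p)
        # Update step: each mean becomes the average of the points assigned to it.
        new_means = [[sum(dim) / len(c) for dim in zip(*c)] if c else means[k]
                     for k, c in enumerate(clusters)]
        if new_means == means:                           # nothing changed: converged
            break
        means = new_means
    return means, clusters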

Page 106: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

106

2-means Clustering

• Select two points randomly.

Page 107: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

107

2-means Clustering

• Find distance of all points to these two points.

Page 108: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

108

2-means Clustering

• Find two clusters based on distances.

Page 109: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

109

2-means Clustering

• Find new means.• Green points are mean values

for two clusters.

Page 110: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

110

2-means Clustering

• Find new clusters.

Page 111: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

111

2-means Clustering

• Find new means.• Find distance of all points to

these two mean values.• Find new clusters.• Nothing is changed.

Page 112: CS-171 Final Review First-Order Logic, Knowledge Representation (8.1-8.5) Constraint Satisfaction Problems (6.1-6.4) Machine Learning (18.1-18.12, 20.1-20.3.2)

112

CS-171 Final Review

• First-Order Logic, Knowledge Representation (8.1-8.5)
• Constraint Satisfaction Problems (6.1-6.4)
• Machine Learning (18.1-18.12, 20.1-20.3.2)

• Questions on any topic

• Review pre-mid-term material if time and class interest

• Please review your quizzes, mid-term, and old CS-171 tests
• At least one question from a prior quiz or old CS-171 test will appear on the Final Exam (and all other tests)