L16 Machine Learning


Transcript of L16 Machine Learning

8/3/2019 L16 Machine Learning

http://slidepdf.com/reader/full/l16-machine-learning 1/38

241-320 Design Architecture and Engineering for Intelligent System 

Suntorn Witosurapot

Contact Address: Phone: 074 287369 or

Email: [email protected]

January 2010


Lecture 16: 

Machine Learning - Part 1

(Learning from Observations) 


241-320 Design Architecture & Engineering for Intelligent System: Machine Learning - Part 1 (Learning from Observations)

Motivation

An AI agent operating in a complex world requires an awful lot of knowledge:

 – state representations, constraints, action descriptions, heuristics, probabilities, ...

More and more, AI agents are designed to acquire knowledge through learning


Outline

What is Learning?

Learning Agents

Introduction to inductive learning

Logic-based inductive learning:

 – Decision-tree induction

Function-based inductive learning

 – Neural nets


What’s Learning?

Learning is essential for unknown environments

 – i.e., when the designer lacks omniscience

Learning is useful as a system-construction method

 – i.e., expose the agent to reality rather than trying to write it down

Learning modifies the agent's decision mechanisms to improve performance


Outline

What is Learning?

Learning Agents

Inductive learning

Logic-based inductive learning:

 – Decision-tree induction

Function-based inductive learning

 – Neural nets


Learning agents


Learning agents (cont.)

Main idea:

 – agents should use their percepts not only for acting, but also for improving their future ability to act

Wide range of methods

Major design issue is the type of feedback that will be available to the agent


Types of learning from feedback

Supervised learning

 – Given a set of example inputs and outputs

 – Goal is to learn a function relating the two (e.g., the Decision Tree technique)

Unsupervised learning

 – Given inputs, but no outputs

 – Goal is to group the inputs into different classes (separating the data into clusters, e.g., the Nearest Neighbor technique)


Examples

Supervised learning

 – Taxi learning to brake with instructor

 – Spam filter

Unsupervised learning

 – Market research

 – Data mining


Other factors affecting learning

Representation of learned information

Availability of prior knowledge


Outline

Why learning?

Types of learning

Inductive learning

Logic-based inductive learning:

 – Decision-tree induction

Function-based inductive learning

 – Neural nets


Inductive Learning

Inductive learning is learning from events or feedback: only part of the data or of the true values is known, yet the learner tries to find or estimate the true (or close-to-true) values of the rest.

Ex: how a salesperson learns:

 – by studying the customer's behavior, personality, and interests during a product demonstration,

 – the salesperson may discover the customer's real needs and thereby close the sale.


Inductive learning

Simplest form: learn a function from examples

Let's call an example a pair (x, f(x)), where x is the input and f(x) is the output of the function applied to x

Pure inductive inference:

 – Given a collection of examples (aka training set) of f, return a function h that approximates f

 – h is called a hypothesis

How can we tell if a hypothesis is good?


Example

Construct/adjust h to agree with f on the training set

(h is consistent if it agrees with f on all examples)

E.g., curve fitting:
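The curve-fitting figures from the slides are not reproduced in this transcript. As a minimal sketch of the idea (my own illustrative code, not from the lecture), here is a closed-form least-squares fit of a straight-line hypothesis h to a training set of (x, f(x)) pairs:

```python
def fit_line(examples):
    """Least-squares fit of h(x) = a + b*x to (x, f(x)) training pairs."""
    n = len(examples)
    mx = sum(x for x, _ in examples) / n
    my = sum(y for _, y in examples) / n
    b = (sum((x - mx) * (y - my) for x, y in examples)
         / sum((x - mx) ** 2 for x, _ in examples))
    a = my - b * mx
    return a, b

# Training set drawn from f(x) = 2x + 1
train = [(x, 2 * x + 1) for x in range(5)]
a, b = fit_line(train)
h = lambda x: a + b * x

# h is consistent: it agrees with f on every training example
print(all(abs(h(x) - y) < 1e-9 for x, y in train))  # True
```

Because the training data are exactly linear here, the fitted h recovers f; with noisy data, h would only approximate f.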




Q: How do we decide among these hypotheses that all agree with our data?


What we desire from a hypothesis

Since we will use the hypothesis h most often to predict the output of f on examples we haven't seen yet, we want it to do well on these

We call this generalization

Ideally we would like to find an h such that h = f


Tradeoff: complexity vs data-fit

Generally, the larger and more complex the hypothesis is, the better we can fit our data

However, we need to take into account the computational complexity of learning

 – Fitting straight lines = easy

 – Fitting high-degree polynomials = harder

Also want to take into account how hard it is to use h

 – Prefer fast computation time

Learning typically focuses on "simple" representations


Outline

Why learning?

Basic Ideas

Inductive learning

Logic-based inductive learning:

 – Decision-tree induction

Function-based inductive learning

 – Neural nets


Logic-Based Inductive Learning: Decision Tree Method

It is a supervised learning technique

 – used to predict or forecast upcoming events under various situations, using the results of decisions made along a tree-structured diagram of the data

Widely used algorithm (even in our daily life)

Structure of a decision tree:

 – Root & leaves connected by branches

 – Which branch is searched depends on the situation


Ex: A more complex decision tree

Problem: decide whether to wait for a table at a restaurant, based on the following attributes:

1. Alternate: is there an alternative restaurant nearby?

2. Bar: is there a comfortable bar area to wait in?

3. Fri/Sat: is today Friday or Saturday?

4. Hungry: are we hungry?

5. Patrons: number of people in the restaurant (None, Some, Full)

6. Price: price range ($, $$, $$$)

7. Raining: is it raining outside?

8. Reservation: have we made a reservation?

9. Type: kind of restaurant (French, Italian, Thai, Burger)

10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)


Attribute-based representations

Examples described by attribute values (Boolean, discrete, continuous), e.g., situations where I will/won't wait for a table:

Classification of examples is positive (T) or negative (F)


Decision trees

One possible representation for hypotheses

E.g., here is the "true" tree for deciding whether to wait:


Decision trees

Another possible representation for hypotheses

This decision tree looks less complex and more realistic than the one on the previous slide

 – It branches on hungriness rather than on the estimated waiting time


Decision trees

Occam's razor: prefer the simplest hypothesis consistent with the data

By the "Occam's razor" criterion above, the smallest decision tree should be the best one

But the process of constructing a decision tree becomes far more complex as the number of nodes involved grows (the hypothesis space)

How many distinct decision trees with n Boolean attributes?

= number of Boolean functions = number of distinct truth tables with 2^n rows = 2^(2^n)

E.g., with 6 Boolean attributes, there are 18,446,744,073,709,551,616 trees
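The count above can be checked directly (a trivial sketch of my own):

```python
def num_boolean_functions(n):
    """Distinct truth tables with 2**n rows, hence 2**(2**n) Boolean functions."""
    return 2 ** (2 ** n)

print(num_boolean_functions(6))  # 18446744073709551616
```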


Expressiveness

Decision trees can express any function of the input attributes

E.g., for Boolean functions, truth table row → path to leaf

Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example, but it probably won't generalize to new examples

Prefer to find more compact decision trees


Decision tree learning

Aim: find a small tree consistent with the training examples

Idea: (recursively) choose the "most significant" attribute as the root of the (sub)tree
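The recursive idea can be sketched as follows. This is my own minimal skeleton, not the lecture's code; the names (dtl, significance, plurality) are hypothetical, and the purity-based score is a placeholder standing in for the lecture's actual criterion, information gain, which comes later:

```python
from collections import Counter

def plurality(examples):
    """Most common classification among the examples."""
    return Counter(e["class"] for e in examples).most_common(1)[0][0]

def significance(attr, examples):
    """Placeholder score: fraction of the attribute's value-subsets that are
    already pure (the real criterion would be information gain)."""
    values = {e[attr] for e in examples}
    pure = sum(len({e["class"] for e in examples if e[attr] == v}) == 1
               for v in values)
    return pure / len(values)

def dtl(examples, attributes, default=None):
    """Recursively choose the 'most significant' attribute as the (sub)tree root."""
    if not examples:
        return default
    classes = {e["class"] for e in examples}
    if len(classes) == 1:              # all examples agree: return a leaf
        return classes.pop()
    if not attributes:                 # attributes exhausted: majority leaf
        return plurality(examples)
    best = max(attributes, key=lambda a: significance(a, examples))
    tree = {"attr": best, "branches": {}}
    rest = [a for a in attributes if a != best]
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        tree["branches"][value] = dtl(subset, rest, plurality(examples))
    return tree

# Tiny made-up data set: Patrons perfectly separates the classes
toy = [
    {"Patrons": "Some", "Hungry": "Yes", "class": "T"},
    {"Patrons": "None", "Hungry": "Yes", "class": "F"},
    {"Patrons": "Some", "Hungry": "No",  "class": "T"},
    {"Patrons": "None", "Hungry": "No",  "class": "F"},
]
tree = dtl(toy, ["Patrons", "Hungry"])
print(tree["attr"])  # Patrons
```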


Choosing an attribute

Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative"

Patrons? is a better choice


Choosing an attribute via Information Theory

To implement Choose-Attribute in the DTL algorithm, we need the information content (entropy):

I(P(v1), …, P(vn)) = Σ_{i=1..n} −P(vi) log2 P(vi)

The average entropy of a data set = the sum, over each item, of −P(item) × log2(P(item))

This entropy value is used to assess whether different sets of "information content" are similar to or different from one another

 – It then helps decide whether identical branches can be pruned away when converting the data table into a tree

 – See the coin-toss example on the next slide


Ex: coin-toss data

Data set (M) = {heads, tails}

The probabilities of heads and of tails are P(heads) and P(tails), respectively

The average entropy of this data set = I(M)

Note that when the tosses come up all heads or all tails, the entropy is zero; it then rises steadily, peaking when heads and tails are equally likely. Thus low entropy indicates that the items in the data set differ little (they may all belong to one class), while high entropy indicates that they differ greatly.
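The entropy formula and the coin intuition can be checked with a few lines (an illustrative sketch of my own, not from the slides):

```python
from math import log2

def entropy(probs):
    """I(P(v1), ..., P(vn)) = sum of -P(vi) * log2 P(vi); 0-probability terms contribute 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

# All heads (or all tails): no uncertainty, entropy is 0
print(entropy([1.0, 0.0]))            # 0.0
# Fair coin: maximum uncertainty, entropy is 1 bit
print(entropy([0.5, 0.5]))            # 1.0
# Biased coin: somewhere in between
print(round(entropy([0.9, 0.1]), 3))  # 0.469
```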


Information gain

A chosen attribute A divides the training set E into subsets E1, …, Ev according to their values for A, where A has v distinct values

Information Gain (IG), or reduction in entropy, from the attribute test:

 – computed as the total entropy minus the entropy that remains after one attribute is chosen as the root

Choose the attribute with the largest IG

 – it is then used as the "root" of the next decision step

remainder(A) = Σ_{i=1..v} (p_i + n_i)/(p + n) · I( p_i/(p_i + n_i), n_i/(p_i + n_i) )

IG(A) = I( p/(p + n), n/(p + n) ) − remainder(A)


Information gain

For the training set, p = n = 6, I(6/12, 6/12) = 1 bit

Consider the attributes Patrons and Type (and others too):

IG(Patrons) = 1 − [2/12 · I(0, 1) + 4/12 · I(1, 0) + 6/12 · I(2/6, 4/6)] ≈ 0.541 bits

IG(Type) = 1 − [2/12 · I(1/2, 1/2) + 2/12 · I(1/2, 1/2) + 4/12 · I(2/4, 2/4) + 4/12 · I(2/4, 2/4)] = 0 bits

Since Patrons has the highest IG of all the attributes, it is chosen by the DTL algorithm as the root
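The two gains above can be reproduced in code. The per-value positive/negative counts are read off the slide (Patrons: None 0/2, Some 4/0, Full 2/4; Type: 1/1, 1/1, 2/2, 2/2); the function names are my own:

```python
from math import log2

def I(*probs):
    """Entropy (information content) of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

def information_gain(splits, p, n):
    """IG(A) = I(p/(p+n), n/(p+n)) - remainder(A).
    splits: list of (p_i, n_i) counts for each value of attribute A."""
    total = p + n
    before = I(p / total, n / total)
    remainder = sum((pi + ni) / total * I(pi / (pi + ni), ni / (pi + ni))
                    for pi, ni in splits)
    return before - remainder

# Patrons: None -> (0,2), Some -> (4,0), Full -> (2,4)
print(round(information_gain([(0, 2), (4, 0), (2, 4)], 6, 6), 3))            # 0.541
# Type: French (1,1), Italian (1,1), Thai (2,2), Burger (2,2)
print(round(information_gain([(1, 1), (1, 1), (2, 2), (2, 2)], 6, 6), 3))    # 0.0
```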


Example (cont.)

Decision tree learned from the 12 examples:

Substantially simpler than the "true" tree: a more complex hypothesis isn't justified by a small amount of data


Performance measurement

Q: How do we know that h ≈ f?

1. Use theorems of computational/statistical learning theory

2. Try h on a new test set of examples

(use the same distribution over the example space as the training set)

Learning curve = % correct on the test set as a function of training-set size
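As an illustrative sketch (my own code, with a 1-nearest-neighbour learner standing in for any hypothesis learner), a learning curve can be estimated by repeatedly splitting the data and measuring test-set accuracy at each training-set size:

```python
import random

def nn_predict(train, x):
    """1-nearest-neighbour hypothesis h built from the training set."""
    return min(train, key=lambda ex: abs(ex[0] - x))[1]

def learning_curve(data, sizes, trials=20):
    """% correct on a held-out test set as a function of training-set size."""
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    curve = {}
    for m in sizes:
        correct = total = 0
        for _ in range(trials):
            shuffled = data[:]
            rng.shuffle(shuffled)
            train, test = shuffled[:m], shuffled[m:]
            correct += sum(nn_predict(train, x) == y for x, y in test)
            total += len(test)
        curve[m] = correct / total
    return curve

# Target function f(x): is x in the upper half of 0..99?
data = [(x, x >= 50) for x in range(100)]
curve = learning_curve(data, [1, 5, 25, 60])
# Accuracy climbs toward 100% as the training set grows
```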


Summary

Learning is needed for unknown environments, and for lazy designers

Learning agent = feedback + learning element + performance element

For supervised learning, the aim is to find a simple hypothesis approximately consistent with the training examples

Decision tree learning using information gain

Learning performance = prediction accuracymeasured on test set


Reading 

Chapter 10: Machine Learning

Chapter 6: Machine Learning (pages 153 - 163)