Topic_6
Transcript of Topic_6
Dr. Jeffrey Huang, Department of Computer Science, IUPUI CSCI-548: Intro. to Bioinformatics
CSCI 548/B480: Introduction to Bioinformatics, Fall 2002
Topic 5: Machine Intelligence - Learning and Evolution
Dr. Jeffrey Huang, Assistant Professor
Department of Computer and Information Science, IUPUI
E-mail: [email protected]
Machine Intelligence
• Machine Learning
  – The subfield of AI concerned with intelligent systems that learn
  – The computational study of algorithms that improve performance based on experience
• The attempt to build intelligent entities:
  – We must understand intelligent entities first
  – Computational brain
  – Mathematics:
    • Philosophy staked out most of the ideas of AI, but to make it a formal science, mathematical formalization is needed in:
      – Computation
      – Logic
      – Probability
Behavior-Based AI vs. Knowledge-Based AI
• Definitions of Machine Learning
  – Reasoning
    • The effort to make computers think and solve problems
    • The study of mental faculties through the use of computational models
  – Behavior
    • Making machines perform human actions that require intelligence
    • Seeking to explain intelligent behavior in terms of computational processes
• Agents
  [Diagram: the agent receives percepts from the environment through sensors and acts on the environment through effectors]
Operational Agents
• Operational views of intelligence:
  – The ability to perform intellectual tasks
    • Prove theorems, play chess, solve puzzles
    • Focus on what goes on “between the ears”
    • Emphasize the ability to build and effectively use mental models
  – The ability to perform intellectually challenging “real world” tasks
    • Medical diagnosis, tax advising, financial investing
    • Introduces new issues such as: critical interactions with the world, model grounding, uncertainty
  – The ability to survive, adapt, and function in a constantly changing world
    • Autonomous agents
    • Vision, locomotion, and manipulation: many I/O issues
    • Self-assessment, learning, curiosity, etc.
Building Intelligent Artifacts
• Symbolic approaches:
  – Construct goal-oriented symbol-manipulation systems
  – Focus on high-end abstract thinking
• Non-symbolic approaches:
  – Build performance-oriented systems
  – Focus on behavior
• Both are needed, in tightly coupled form
  – Building such systems is difficult
  – There is a growing need to automate this process
  – A good approach: evolutionary algorithms
• Behavior-Based AI
  – "Situated" in the environment
  – Multiple competencies ('routines')
  – Autonomy
  – Adaptation and competition
• Artificial Life (A-Life)
  – Agents: reactive behavior
  – Abstracts the logical principles of living organisms
  – Collective behavior: competition and cooperation
Classification vs. Prediction
• Classification:
  – Predicts categorical class labels
  – Classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it to classify new data
• Prediction:
  – Models continuous-valued functions, i.e., predicts unknown or missing values
Classification—A Two-Step Process
• Model construction: describing a set of predetermined classes
  – Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  – The set of tuples used for model construction is the training set
  – The model is represented as classification rules, decision trees, or mathematical formulae
• Model usage: classifying future or unknown objects
  – Estimate the accuracy of the model:
    • The known label of each test sample is compared with the model's classification
    • The accuracy rate is the percentage of test-set samples correctly classified by the model
    • The test set must be independent of the training set; otherwise over-fitting will occur
Classification Process
• Model Construction
  Training data:
    NAME     RANK            YEARS  TENURED
    Mike     Assistant Prof  3      no
    Mary     Assistant Prof  7      yes
    Bill     Professor       2      yes
    Jim      Associate Prof  7      yes
    Dave     Assistant Prof  6      no
    Anne     Associate Prof  3      no
  A classification algorithm learns a classifier (model) from the training data, e.g.:
    IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
• Use the Model in Prediction
  Testing data:
    NAME     RANK            YEARS  TENURED
    Tom      Assistant Prof  2      no
    Merlisa  Associate Prof  7      no
    George   Professor       5      yes
    Joseph   Assistant Prof  7      yes
  Unseen data: (Jeff, Professor, 2) -> Tenured?
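As a sketch of this two-step process in plain Python (the rule and the data are from the slide; the function name is mine, not any actual course software), the learned rule can be applied to the independent testing data to estimate accuracy:

```python
def classify(rank, years):
    """The rule learned from the training data:
    IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"

# Testing data from the slide: (name, rank, years, actual tenured label)
test_set = [
    ("Tom",     "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George",  "Professor",      5, "yes"),
    ("Joseph",  "Assistant Prof", 7, "yes"),
]

correct = sum(classify(rank, years) == label
              for _, rank, years, label in test_set)
accuracy = correct / len(test_set)
print(accuracy)  # 0.75: Merlisa is misclassified (years > 6 but not tenured)

# The unseen sample (Jeff, Professor, 2) is then classified with the model:
print(classify("Professor", 2))  # yes
```

Note that the accuracy is measured on a test set the rule was not built from, exactly as the slide prescribes.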
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
  – Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  – New data are classified based on the training set
• Unsupervised learning (clustering)
  – The class labels of the training data are unknown
  – Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Classification and Prediction
• Data Preparation
  – Data cleaning
    • Preprocess data to reduce noise and handle missing values
  – Relevance analysis (feature selection)
    • Remove irrelevant or redundant attributes
  – Data transformation
    • Generalize and/or normalize data
• Evaluating Classification Methods
  – Predictive accuracy
  – Speed and scalability
    • Time to construct the model
    • Time to use the model
  – Robustness: handling noise and missing values
  – Scalability: efficiency for disk-resident databases
  – Interpretability: understanding and insight provided by the model
  – Goodness of rules
    • Decision tree size
    • Compactness of classification rules
From Learning to Evolutionary Algorithms
• Optimization
  – Accomplishing an abstract task = solving a problem = searching through a space of potential solutions; finding the “best” solution is an optimization process
  – Classical exhaustive methods? For large spaces, special machine learning techniques are needed
• Evolutionary Algorithms
  – Stochastic algorithms
  – Search methods that model natural phenomena:
    • Genetic inheritance
    • Darwinian struggle for survival
• “… the metaphor underlying genetic algorithms is that of natural evolution. In evolution, the problem each species faces is one of searching for beneficial adaptations to a complicated and changing environment. The ‘knowledge’ that each species has gained is embodied in the makeup of chromosomes of its members”
  - L. Davis and M. Steenstrup, “Genetic Algorithms and Simulated Annealing”, pp. 1-11, Morgan Kaufmann, 1987
The Essential Components
– A genetic representation for potential solutions to the problem
– A way to create an initial population of potential solutions
– An evaluation function that plays the role of the environment, rating solutions in terms of their “fitness”, i.e., the use of fitness to determine survival and reproductive rates
– Genetic operators that alter the composition of children
Evolutionary Algorithm Search Procedure
1. Randomly generate an initial population M(0)
2. Compute and save the fitness u(m) for each individual m in the current population M(t)
3. Define selection probabilities p(m) for each individual m in M(t) so that p(m) is proportional to u(m)
4. Generate M(t+1) by probabilistically selecting individuals to produce offspring via genetic operators (crossover and mutation), and repeat from step 2
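The procedure can be sketched as a short Python loop. The "one-max" fitness function, the generation budget, and the operator details below are illustrative assumptions for this sketch, not settings prescribed by the slides:

```python
import random

random.seed(0)  # for reproducibility of this sketch

def evolve(fitness, n_bits=14, pop_size=24, pc=0.6, pm=0.01, generations=50):
    """Generic evolutionary-algorithm loop following the steps above."""
    # Step 1: randomly generate an initial population M(0)
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for t in range(generations):
        # Step 2: compute the fitness u(m) of each individual m in M(t)
        u = [fitness(m) for m in pop]
        # Step 3: selection probabilities p(m) proportional to u(m)
        total = sum(u)
        probs = [ui / total for ui in u]
        # Step 4: generate M(t+1) by probabilistically selecting parents,
        # then applying crossover (prob. pc) and per-bit mutation (prob. pm)
        nxt = []
        while len(nxt) < pop_size:
            a, b = random.choices(pop, weights=probs, k=2)
            if random.random() < pc:
                cut = random.randint(1, n_bits - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            nxt += [[bit ^ (random.random() < pm) for bit in c]
                    for c in (a, b)]
        pop = nxt[:pop_size]
    return max(pop, key=fitness)

# Toy "one-max" fitness (count of 1-bits), purely for illustration:
best = evolve(fitness=sum)
print(sum(best))  # close to the optimum of 14 after 50 generations
```

With fitness-proportional selection the population steadily shifts toward high-fitness bitstrings; the slides' own example later replaces this toy fitness with f(x) = x^2 over a decoded 14-bit genotype.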
Historical Background
• Three paradigms emerged in the 1960s:
  – Genetic Algorithms
    • Introduced by Holland (U. Michigan), further developed by De Jong (GMU)
    • Envisioned for a broad range of “adaptive systems”
  – Evolution Strategies
    • Introduced by Rechenberg
    • Focused on real-valued parameter optimization
  – Evolutionary Programming
    • Introduced by Fogel
    • Applied to AI and machine learning problems
• Today:
  – Wide variety of evolutionary algorithms
  – Applied to many areas of science and engineering
Examples of Evolutionary AI
1. Parameter Tuning
  • Pervasiveness of parameterized models
  • Complex behavioral changes due to non-linear interactions
  • Examples:
    – Weights of an artificial neural network
    – Parameters of a heuristic evaluation function
    – Parameters of a rule induction system
    – Parameters of membership functions
  • Goal: evolve a useful set of discrete/continuous parameters over time
2. Evolving Structure
  • Effect behavioral change via more complex structures
  • Examples:
    – Selecting/constructing the topology of ANNs
    – Selecting/constructing feature sets
    – Selecting/constructing plans/scenarios
    – Selecting/constructing membership functions
  • Goal: evolve useful structure over time
3. Evolving Programs
  • Goal: acquire new behaviors and adapt existing ones
  • Examples:
    – Acquire/adapt behavioral rule sets
    – Acquire/adapt arm/joint control programs
    – Acquire/adapt task-oriented programming code
How Does a Genetic Algorithm Work?
A simple example of function optimization: find max f(x) = x^2, for x in [0, 4]
1. Representation
  • Genotype (chromosome): internally, points in the search space are represented as (binary) strings over some alphabet
  • Phenotype: the expressed traits of an individual
  • With a precision for x in [0, 4] of about 10^-4, 14 bits are needed: 2^13 = 8,192 < 10,000 < 2^14 = 16,384
  • Simple fixed-length binary encoding:
    – Assign 0.0 to the string 00 0000 0000 0000
    – Assign 0.0 + bin2dec(binary string) * 4/(2^14 - 1) to the string 00 0000 0000 0001, and so on
    – Phenotype 4.0 = genotype 11 1111 1111 1111
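The genotype-to-phenotype mapping above can be written directly; this is a minimal sketch, and `decode`/`fitness` are my names, not the slides':

```python
def decode(genotype: str) -> float:
    """Map a 14-bit binary string (genotype) to x in [0.0, 4.0] (phenotype)."""
    return int(genotype, 2) * 4 / (2**14 - 1)

def fitness(genotype: str) -> float:
    """eval(v) = f(x) = x^2 for the decoded phenotype."""
    return decode(genotype) ** 2

print(decode("00000000000000"))   # 0.0
print(decode("00000000000001"))   # one quantization step: 4/(2**14 - 1)
print(decode("11111111111111"))   # 4.0
print(fitness("11111111111111"))  # 16.0, the maximum of f(x) = x^2 on [0, 4]
```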
2. Initial population:
  – Create a population (pop_size) of chromosomes, where each chromosome is a binary vector of 14 bits
  – All 14 bits of each chromosome are initialized randomly
3. Evaluation function
  • The evaluation function eval for a binary vector v is equal to the function f:
      eval(v) = f(x)
    e.g., eval(v1) = f(x1) = fitness1
– The population consists of pop_size = 24 chromosomes, v1, v2, …, v24
– Genotype -> Phenotype:
    00 0000 0000 0000 -> 0.0
    00 0000 0000 0001 -> 4/(2^14 - 1)
    …
    11 1111 1111 1111 -> 4.0
– Parameters:
  • pop_size = 24
  • Probability of crossover, pc = 0.6
  • Probability of mutation, pm = 0.01
– Recombination, using genetic operators:
  • Crossover (pc), e.g., swapping the tails after the fourth bit:
      v1 = 01111100010011  =>  v1’ = 01110101011100
      v2 = 00010101011100  =>  v2’ = 00011100010011
  • Mutation (pm), e.g., flipping a single bit:
      v2’ = 00011100010011  =>  v2” = 00011110010011
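These two operators are easy to express in code. The crossover point after the fourth bit is inferred from the strings shown, since the slide does not state it explicitly:

```python
import random

def crossover(v1: str, v2: str, point: int):
    """One-point crossover: swap the tails of the two parents after `point`."""
    return v1[:point] + v2[point:], v2[:point] + v1[point:]

def mutate(v: str, pm: float = 0.01) -> str:
    """Flip each bit independently with probability pm."""
    return "".join(b if random.random() >= pm else "10"[int(b)] for b in v)

# Reproduce the slide's crossover example (cut after the 4th bit):
c1, c2 = crossover("01111100010011", "00010101011100", 4)
print(c1)  # 01110101011100
print(c2)  # 00011100010011
```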
– Selection of M(t+1) from M(t), using a roulette wheel:
  • Total fitness of the population: F = Σ_{i=1..pop_size} fitness_i
  • Probability of selection for each chromosome v_i: prob_i = fitness_i / F
  • Cumulative probability: q_i = Σ_{j=1..i} prob_j, for i = 1, …, pop_size
  • Generate random numbers r_j from [0, 1], for j = 1, …, pop_size
  • Select chromosome v_i such that q_{i-1} < r_j <= q_i
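Roulette-wheel selection as defined above, sketched in Python; the fitness values at the end are hypothetical, since the slides do not list the population's actual numbers:

```python
import random

def roulette_select(fitnesses):
    """Select len(fitnesses) indices with probability proportional to fitness."""
    F = sum(fitnesses)                    # total fitness of the population
    probs = [f / F for f in fitnesses]    # prob_i = fitness_i / F
    q, running = [], 0.0
    for p in probs:                       # cumulative probabilities q_i
        running += p
        q.append(running)
    chosen = []
    for _ in range(len(fitnesses)):
        r = random.random()               # r_j in [0, 1]
        # select v_i such that q_{i-1} < r <= q_i
        i = next((k for k, qk in enumerate(q) if r <= qk), len(q) - 1)
        chosen.append(i)
    return chosen

picks = roulette_select([1.0, 2.0, 5.0, 8.0])
print(picks)  # indices sampled in proportion to fitness (2 and 3 on average)
```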
Homing in on the Optimal Solution
Best-so-far Curve
Optimal Feature Subset
• Search for subsets of discriminatory features
  – A combinatorial optimization problem
  – Two general approaches to identifying optimal subsets of features:
    • Abstract measurement of important properties of good feature sets
      – Orthogonality (e.g., PCA), information content, low variance
      – Less expensive process
      – Falls to suboptimal performance if the abstract measures do not correlate well with actual performance
    • Building a classifier from the feature subset and evaluating its performance on actual classification tasks
      – Better classification performance
      – The cost of building and testing classifiers prohibits any kind of systematic evaluation of feature subsets
      – Suboptimal in practice: large numbers of candidate features cannot be handled by any form of systematic search, since there are 2^N possible candidate subsets of N features
Inductive Learning
• Learning from Examples
  – Decision Trees (DT)
  – Information Theory (IT)
  – Question: what are the BEST attributes (features) for building the decision tree?
  – Answer: the ‘BEST’ attribute is the one that is ‘MOST’ informative and for which ‘ambiguity/uncertainty’ is least
  – Solution: measure (information) content using the expected amount of information provided by the attribute
Classification by Decision Tree Induction
• Decision tree
  – A flow-chart-like tree structure
  – Internal nodes denote tests on an attribute
  – Branches represent outcomes of the test
  – Leaf nodes represent class labels or class distributions
• Decision tree generation consists of two phases
  – Tree construction
    • At the start, all the training examples are at the root
    • Partition examples recursively based on selected attributes
  – Tree pruning
    • Identify and remove branches that reflect noise or outliers
• Use of the decision tree: classifying an unknown sample
  – Test the attribute values of the sample against the decision tree
Example:
  Exs.  Class  Size    Color   Surface
  1     A      Small   Yellow  Smooth
  2     A      Medium  Red     Smooth
  3     A      Medium  Red     Smooth
  4     A      Big     Red     Rough
  5     B      Medium  Yellow  Smooth
  6     B      Medium  Yellow  Smooth

Resulting decision tree:
  color?
    red    -> A
    yellow -> size?
      small  -> A
      medium -> B
• Entropy
  – Define an entropy function H such that
      H = -Σ_i p_i log2(p_i)
    where p_i is the probability associated with the ith class
  – For a feature, the entropy is calculated for each value
  – The sum of the entropies weighted by the probability of each value is the entropy for that feature
  – Example: toss a fair coin
      H = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1 bit
    If the coin is not fair, e.g., P(heads) = 99%, then
      H = -(99/100) log2(99/100) - (1/100) log2(1/100) ≈ 0.08 bits
    So, by tossing the biased coin you get very little (extra) information (that you didn’t expect)
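The entropy function and the two coin examples above, checked in a few lines of Python:

```python
from math import log2

def H(probs):
    """H = -sum_i p_i * log2(p_i); terms with p_i = 0 contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(H([0.5, 0.5]))    # 1.0 bit: a fair coin toss is maximally uncertain
print(H([0.99, 0.01]))  # ~0.08 bits: a heavily biased coin tells you little
```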
– In general, if you have p positive examples and n negative examples:
    H = -( p/(p+n) ) log2( p/(p+n) ) - ( n/(p+n) ) log2( n/(p+n) )
  • For p = n, H = 1
  • i.e., originally there is the most uncertainty about the eventual outcome (picking an example) and the most to gain by picking the example
Decision Tree Induction
• Basic algorithm (a greedy algorithm)
  – The tree is constructed in a top-down, recursive, divide-and-conquer manner
  – At the start, all the training examples are at the root
  – Attributes are categorical (continuous-valued attributes are discretized in advance)
  – Examples are partitioned recursively based on selected attributes
  – Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
• Conditions for stopping partitioning
  – All samples for a given node belong to the same class
  – There are no remaining attributes for further partitioning
  – Majority voting is employed for classifying the leaf
  – There are no samples left
Algorithm
1. Select a random subset W (called the window) from the training set T
2. Build a DT for the current W:
  • Select the best feature, which minimizes the entropy H (or maximizes the gain)
  • Categorize the training instances (examples) into subsets by this feature
  • Repeat this process recursively until each subset contains instances of one kind (class) or some statistical criterion is satisfied
3. Scan the entire training set for exceptions to the DT
4. If exceptions are found, insert some of them into W and repeat from step 2
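Step 2's recursive tree building can be sketched as follows. This is a simplified version without the window/exception mechanism, the helper names are mine, and the natural log is used so the entropies match the worked example on the later slide:

```python
from collections import Counter
from math import log

def entropy(examples):
    """Class entropy (natural log) of a list of (cls, features) examples."""
    counts = Counter(cls for cls, _ in examples)
    n = len(examples)
    return -sum(c / n * log(c / n) for c in counts.values())

def remainder(examples, attr):
    """Entropy after splitting on attr, weighted by subset size."""
    n = len(examples)
    subsets = {}
    for cls, feats in examples:
        subsets.setdefault(feats[attr], []).append((cls, feats))
    return sum(len(s) / n * entropy(s) for s in subsets.values())

def build_tree(examples, attrs):
    """Recursively pick the attribute minimizing the remainder (max gain)."""
    classes = {cls for cls, _ in examples}
    if len(classes) == 1 or not attrs:
        # pure subset, or no attributes left: report the majority class
        return Counter(cls for cls, _ in examples).most_common(1)[0][0]
    best = min(attrs, key=lambda a: remainder(examples, a))
    branches = {}
    for cls, feats in examples:
        branches.setdefault(feats[best], []).append((cls, feats))
    rest = [a for a in attrs if a != best]
    return (best, {v: build_tree(s, rest) for v, s in branches.items()})

# The six-example Size/Color/Surface table from the earlier slide:
data = [
    ("A", {"size": "Small",  "color": "Yellow", "surface": "Smooth"}),
    ("A", {"size": "Medium", "color": "Red",    "surface": "Smooth"}),
    ("A", {"size": "Medium", "color": "Red",    "surface": "Smooth"}),
    ("A", {"size": "Big",    "color": "Red",    "surface": "Rough"}),
    ("B", {"size": "Medium", "color": "Yellow", "surface": "Smooth"}),
    ("B", {"size": "Medium", "color": "Yellow", "surface": "Smooth"}),
]
tree = build_tree(data, ["size", "color", "surface"])
print(tree[0])  # color: chosen first, as in the slides' worked example
```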
• Information Gain
  – The information gain from an attribute test is defined as the difference between the original information requirement and the new requirement:
      Gain(A) = H( p/(p+n), n/(p+n) ) - Remainder(A)
      Remainder(A) = Σ_{i=1..v} (p_i + n_i)/(p + n) * H( p_i/(p_i + n_i), n_i/(p_i + n_i) )
    where attribute A can have v distinct values
  • Note that Remainder() is a weighted (by attribute values) entropy function
  – Maximizing Gain(A) is equivalent to minimizing Remainder(A); the attribute that does so is the most informative one (the best ‘question’)
The ID3 Algorithm and Quinlan’s C4.5
• C4.5
  – Tutorial: http://yoda.cis.temple.edu:8080/UGAIWWW/lectures/C45/
  – Matlab program: http://www.cs.wisc.edu/~olvi/uwmp/msmt.html
• See5 / C5.0
  – Tutorial: http://borba.ncc.up.pt/niaad/Software/c50/c50manual.html
  – Software for Win2000: http://www.rulequest.com/download.html
Example (the Size/Color/Surface table from the earlier slide):

  Exs.  Class  Size    Color   Surface
  1     A      Small   Yellow  Smooth
  2     A      Medium  Red     Smooth
  3     A      Medium  Red     Smooth
  4     A      Big     Red     Rough
  5     B      Medium  Yellow  Smooth
  6     B      Medium  Yellow  Smooth

Stage 1 (all six examples; entropies computed with the natural log, so values are in nats):
  Remainder(size)    = (1/6)(0) + (4/6)[-(2/4)ln(2/4) - (2/4)ln(2/4)] + (1/6)(0) = 0.462
  Remainder(color)   = (3/6)[-(1/3)ln(1/3) - (2/3)ln(2/3)] + (3/6)(0) = 0.318  <- minimum
  Remainder(surface) = (5/6)[-(3/5)ln(3/5) - (2/5)ln(2/5)] + (1/6)(0) = 0.56

Color minimizes the remainder, so it is tested first:
  color?
    red    -> A
    yellow -> ?

Stage 2 (applies to examples 1, 5, and 6 only):
  Remainder(size)    = (1/3)[-(1/1)ln(1/1)] + (2/3)[-(2/2)ln(2/2)] = 0  <- minimum
  Remainder(surface) = (3/3)[-(1/3)ln(1/3) - (2/3)ln(2/3)] = 0.636

Size yields pure subsets, giving the final tree:
  color?
    red    -> A
    yellow -> size?
      small  -> A
      medium -> B
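The stage-1 remainders can be re-derived numerically (natural log, so the values are in nats, matching the figures in this worked example; the helper names are mine):

```python
from math import log

def H(counts):
    """Class entropy (natural log) of a tuple of class counts."""
    n = sum(counts)
    return -sum(c / n * log(c / n) for c in counts if c)

# Attribute value -> (count of class A, count of class B) over the 6 examples
size    = {"Small": (1, 0), "Medium": (2, 2), "Big": (1, 0)}
color   = {"Yellow": (1, 2), "Red": (3, 0)}
surface = {"Smooth": (3, 2), "Rough": (1, 0)}

def remainder(attr):
    """Subset-size-weighted entropy after splitting on the attribute."""
    total = sum(a + b for a, b in attr.values())
    return sum((a + b) / total * H((a, b)) for a, b in attr.values())

for name, attr in [("size", size), ("color", color), ("surface", surface)]:
    print(name, round(remainder(attr), 3))
# size 0.462, color 0.318, surface 0.561 -> color minimizes the remainder
```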
• Noise and Overfitting
  – Question: what about two or more examples with the same description but different classifications?
    Answer: each leaf node reports either the MAJORITY classification or relative frequencies
  – Question: what about irrelevant attributes (noise and overfitting)?
    Answer: tree pruning
    Solution: an information gain close to zero is a good clue to irrelevance. Compare the actual numbers of (+) and (-) examples in each subset i, p_i and n_i, with the expected numbers p̂_i and n̂_i assuming true irrelevance:
      p̂_i = p * (p_i + n_i)/(p + n),   n̂_i = n * (p_i + n_i)/(p + n)
    where p and n are the total numbers of positive and negative examples to start with. The total deviation (measuring statistical significance) is
      D = Σ_{i=1..v} [ (p_i - p̂_i)^2 / p̂_i + (n_i - n̂_i)^2 / n̂_i ]
    Under the null hypothesis of irrelevance, D follows a chi-squared distribution (with v - 1 degrees of freedom for an attribute with v values)
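A sketch of this irrelevance test in code. The split counts are hypothetical, and the cutoff 3.841 is the standard 5% chi-squared critical value for 1 degree of freedom (i.e., a two-valued attribute):

```python
def deviation(splits, p, n):
    """splits: list of (p_i, n_i) per attribute value; returns D."""
    D = 0.0
    for p_i, n_i in splits:
        frac = (p_i + n_i) / (p + n)
        p_hat, n_hat = p * frac, n * frac   # expected counts if irrelevant
        D += (p_i - p_hat) ** 2 / p_hat + (n_i - n_hat) ** 2 / n_hat
    return D

# Hypothetical splits of 8 positive / 8 negative examples over two values:
relevant   = deviation([(7, 1), (1, 7)], p=8, n=8)  # skewed split: large D
irrelevant = deviation([(4, 4), (4, 4)], p=8, n=8)  # even split: D = 0
print(relevant, irrelevant)  # 9.0 0.0
print(relevant > 3.841)      # True: reject irrelevance at the 5% level
```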
Extracting Classification Rules from Trees
• Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction
• The leaf node holds the class prediction
• Rules are easier for humans to understand
• Example:
    IF age = “<=30” AND student = “no” THEN buys_computer = “no”
    IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
    IF age = “31…40” THEN buys_computer = “yes”
    IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “yes”
    IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “no”
Decision Tree
• Avoiding Overfitting in Classification
  – The generated tree may overfit the training data
    • Too many branches, some of which may reflect anomalies due to noise or outliers
    • The result is poor accuracy on unseen samples
  – Two approaches to avoid overfitting
    • Prepruning: halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
      – It is difficult to choose an appropriate threshold
    • Postpruning: remove branches from a “fully grown” tree to get a sequence of progressively pruned trees
      – Use a set of data different from the training data to decide which is the “best pruned tree”
• Approaches to Determining the Final Tree Size
  – Separate training (2/3) and testing (1/3) sets
  – Use cross-validation, e.g., 10-fold cross-validation
  – Use all the data for training, but apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node may improve the entire distribution
  – Use the minimum description length (MDL) principle: halt growth of the tree when the encoding is minimized
Decision Tree
• Enhancements to basic decision tree induction
  – Allow for continuous-valued attributes
    • Dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals
  – Handle missing attribute values
    • Assign the most common value of the attribute
    • Assign a probability to each of the possible values
  – Attribute construction
    • Create new attributes based on existing ones that are sparsely represented
    • This reduces fragmentation, repetition, and replication