CBR for Modeling Complex Systems
Rosina Weber, Jason M. Proctor, Ilya Waldstein
College of Information Science & Technology
Andres Kriete
School of Biomedical Engineering, Science and Health Systems,
Coriell Institute for Medical Research
Rosina Weber, ICCBR05 Chicago, Il Aug 26 2005
In a Nutshell
• Some systems are too complex to be directly used in reasoning tasks
– E.g., biological systems, large organizations, ecosystems
– Large, hard to access, difficult to understand, hidden interactions
• The alternative is to use models to represent these systems
– Models can be built when there is knowledge or data about the system
• In the absence of both, we propose to use CBR to recommend a model for reuse
Open research questions
• How can CBR manipulate complex systems and models?
• Can CBR recommend accurate models for reuse?
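Manipulating Complex Systems with CBR
[Figure: a case base pairing case problems (Complex system 1, Complex system 2, …, Complex system n) with case solutions (Model 1, Model 2, …, Model n). An unknown complex system, labeled "Complex system n+", is to be matched with a model, labeled "Model n+".]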
CBR for Modeling Complex Systems
[Figure: each case pairs a case problem (Complex system 1, 2, …, n) with a case solution (Model 1, 2, …, n) and a case outcome (Estimated Measure of Certainty 1, 2, …, n).]
• Does this work in CBR?
Challenges
1. What makes one system similar to another?
2. How can models be compared?
How can we find similar solutions for similar problems?
Approach: Assumptions (i)
• 1st assumption:
Two solutions are similar if they have similar features in a chosen representation.
[Figure: Problem i and Problem j, their solutions Solution i and Solution j, and the outcomes Outcome 1 and Outcome 2.]
Approach: Assumptions (ii)
• 2nd assumption:
Two problems are similar if they are solved by solutions that are considered similar.
[Figure: Problem i and Problem j linked through their solutions, Solution i and Solution j.]
Approach
• 1st step: Identify similar solutions
– Cluster existing problem-solution pairs (cases) based on features of the solutions
• 2nd step: Identify problem features that support the clustering
– Determine participation of problem features in each cluster to eliminate less relevant features
• 3rd step: Define a similarity measure for all cases
– Use the results of step 2 to assess similarity between problems
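The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy cases, the threshold-based grouping, and the variance-ratio weighting are hypothetical stand-ins chosen for brevity, not the cluster analysis or discriminant analysis used in the study.

```python
from statistics import mean, pvariance

# Hypothetical toy cases: (problem features, solution features).
cases = [
    ((2.0, 10.0, 5.0), (0.10, 0.20)),
    ((2.1, 50.0, 5.2), (0.12, 0.18)),
    ((8.0, 30.0, 5.1), (0.90, 0.80)),
    ((7.9, 70.0, 4.9), (0.88, 0.85)),
]

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cluster_by_solution(cases, threshold):
    """Step 1 (stand-in): put a case into the first cluster whose first
    member has a solution vector within `threshold`, else start a new one."""
    clusters = []
    for idx, (_, solution) in enumerate(cases):
        for members in clusters:
            if dist(solution, cases[members[0]][1]) <= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters

def feature_weights(cases, clusters):
    """Step 2 (stand-in): weight each problem feature by the ratio of
    between-cluster variance to within-cluster variance, then normalize."""
    weights = []
    for f in range(len(cases[0][0])):
        groups = [[cases[i][0][f] for i in members] for members in clusters]
        between = pvariance([mean(g) for g in groups])
        within = mean(pvariance(g) for g in groups)
        weights.append(between / (within + 1e-9))
    total = sum(weights) or 1.0  # avoid dividing by zero when nothing separates
    return [w / total for w in weights]

def similarity(p, q, weights):
    """Step 3: weighted distance between problems, mapped into (0, 1]."""
    d = sum(w * (x - y) ** 2 for w, x, y in zip(weights, p, q)) ** 0.5
    return 1.0 / (1.0 + d)

clusters = cluster_by_solution(cases, threshold=0.2)
weights = feature_weights(cases, clusters)
```

In this toy data only the first problem feature separates the two solution clusters, so it receives almost all of the weight and the other two features are effectively eliminated from the similarity measure.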
Open research questions
• How can CBR manipulate complex systems and models?
• Can CBR recommend accurate models for reuse?
Validation: Dataset
• Complex systems
• Models to represent them
• Verification of the models’ quality
Software systems
CI-Tool (baseline approach)
No indication of what makes a software program similar to another for the purposes of input-output analysis
Data Set
• Twenty-one (21) software programs described through 23 features
• Problem features
– Parameters, e.g., number of inputs
• Solution features
– ANN configuration parameter values
– Dataset used for the training
• Outcome feature
– Success rate of ANN
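One way to picture a case with these three parts is the small record below. It is a sketch only: the field names and the example values (`num_inputs`, `hidden_units`, `learning_rate`, `training_set`, and the file name) are hypothetical and do not come from the data set.

```python
from dataclasses import dataclass

@dataclass
class Case:
    problem: dict    # program parameters, e.g. number of inputs
    solution: dict   # ANN configuration values and the training dataset
    outcome: float   # success rate of the ANN, between 0 and 1

# Hypothetical example; every value here is made up for illustration.
example = Case(
    problem={"num_inputs": 4},
    solution={"hidden_units": 8, "learning_rate": 0.1,
              "training_set": "hypothetical_training_set.csv"},
    outcome=0.85,
)
```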
Validation: Hypothesis, Metrics
• Hypothesis
– Our approach can support the recommendation of models as accurate as the baseline approach
• Metric: accuracy
– Average accuracy of the models recommended by our CBR approach compared to the baseline approach
Methodology: LOOCV
• 1st step: Cluster analysis
[Figure: solutions Si grouped into clusters, with the corresponding problems Pi shown alongside; the slide also shows the number 5.]
Methodology: LOOCV
• 2nd step: Stepwise discriminant analysis
• Discriminant functions that map problem features into the cluster space
[Figure: problems Pi mapped onto the clusters of solutions Si.]
Methodology: LOOCV
• 3rd step: Apply discriminant functions to assess similarity between cases
[Figure: a point labeled TPi placed among the problems Pi and the clusters of solutions Si.]
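The leave-one-out loop itself can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it skips the clustering and discriminant analysis, retrieves the single nearest case by plain Euclidean distance, and uses a hypothetical `evaluate` function as a stand-in for measuring how accurate a recommended model is on the held-out program.

```python
from statistics import mean

# Hypothetical toy case base: a problem vector and the model that solved it.
cases = [
    {"problem": (1.0, 1.0), "model": "A"},
    {"problem": (1.2, 0.9), "model": "A"},
    {"problem": (5.0, 5.0), "model": "B"},
    {"problem": (5.1, 4.8), "model": "B"},
]

def similarity(p, q):
    """Negative Euclidean distance: a larger value means more similar."""
    return -(sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5)

def evaluate(model, target):
    """Hypothetical accuracy of `model` on the held-out case (made-up rule)."""
    return 1.0 if model == target["model"] else 0.5

def loocv(cases, similarity, evaluate):
    """Hold out each case in turn, recommend the model of the most similar
    remaining case, and score that model on the held-out problem."""
    scores = []
    for i, target in enumerate(cases):
        rest = cases[:i] + cases[i + 1:]
        best = max(rest, key=lambda c: similarity(target["problem"], c["problem"]))
        scores.append(evaluate(best["model"], target))
    return scores

scores = loocv(cases, similarity, evaluate)
average_accuracy = mean(scores)
```

The average over the held-out cases plays the role of the accuracy metric, which would then be compared with the accuracy obtained by the baseline approach.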
Results
• 71.4% of the results support our hypothesis
– 61.9% show no statistical difference
– 9.5% are significantly higher
• CBR can recommend accurate models for reuse in the absence of an alternative
• CBR may also be considered to find highly suitable models
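As a quick consistency check, the two sub-percentages add up to the headline figure. The fractions below assume that the percentages are counted over the 21 programs in the data set, which the slides do not state explicitly.

```latex
\[
61.9\% + 9.5\% = 71.4\%, \qquad
\tfrac{13}{21} \approx 61.9\%, \quad
\tfrac{2}{21} \approx 9.5\%, \quad
\tfrac{15}{21} \approx 71.4\%
\]
```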
2
Performing tasks with gene expression data
[Figure: case problems are Biological system 1, 2, …, n; case solutions pair a model with a task solution such as a diagnosis or a prescription (Model 1 with Diagnosis 1, Model 1 with Prescription 1, and so on up to n); case outcomes are Estimated Measure of Certainty 1, 2, …, n.]
• Biological systems are described through gene expression data
• Gene expression can be measured with microarrays
• Microarrays reveal how genes “behave”
• Reasoning tasks: the case solution includes the model and the task solution
Example
[Figure: the case structure applied to individuals. Case problem: individuals with their demographics and gene expression data, GE[x1 y1], GE[x2 y2], …, GE[xn yn]. Case solution: a model paired with a diagnosis, a prescription, or another task solution. Case outcome: EMC 1, EMC 2, …, EMC n.]
• A study is represented in one case
• The model is built with data and diagnosis
• EMC is determined with statistics of the study
[Figure: a target individual with demographics and partial gene expression data, GE[yn], matched against cases offering Model i with Diagnosis i (EMC i) and Model j with Diagnosis j (EMC j).]
• Diagnose a new target individual using this case base
• No GE data is available for brain cells
• Retrieval uses the information and data available
• Recommends models
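Retrieval with missing data can be illustrated by comparing only the features that both the target and a stored case actually have. The sketch below is an assumption about how such partial matching might look, not the method of the study; the feature names (`age_decades`, `ge_x`, `ge_y`), the values, and the EMC numbers are all hypothetical.

```python
def partial_similarity(target, problem):
    """Compare only features that are present (not None) on both sides."""
    shared = [k for k, v in target.items()
              if v is not None and problem.get(k) is not None]
    if not shared:
        return 0.0
    d = sum((target[k] - problem[k]) ** 2 for k in shared) ** 0.5
    return 1.0 / (1.0 + d)

# Hypothetical case base: one case per study, with a model and its EMC.
case_base = [
    {"problem": {"age_decades": 6.0, "ge_x": 0.8, "ge_y": 0.30},
     "model": "Model i", "emc": 0.9},
    {"problem": {"age_decades": 3.0, "ge_x": 0.2, "ge_y": 0.70},
     "model": "Model j", "emc": 0.7},
]

# Target individual: one gene expression measurement is unavailable.
target = {"age_decades": 5.8, "ge_x": None, "ge_y": 0.35}

# Rank candidate models by similarity; the EMC is reported alongside.
ranked = sorted(
    ((c["model"], partial_similarity(target, c["problem"]), c["emc"])
     for c in case_base),
    key=lambda row: row[1],
    reverse=True,
)
```

Each entry of `ranked` holds a recommended model, its similarity to the target, and the EMC of the study it came from, so the certainty estimate travels with the recommendation.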
Conclusion
• As more studies are conducted more cases are created
• The certainty of the diagnosis has the potential to increase
• Increased understanding of the domain by the incorporation of analogy through CBR
Future Work
• Develop and test reuse methods
• Test other models (e.g. SVM, IFN)
• Methods for determining EMC
• Apply the approach to biological and environmental problems
Thank you!
Any questions?