Using Bayesian Networks to Predict Plankton Production from Satellite Data By: Rob Curtis, Richard...


Page 1: Using Bayesian Networks to Predict Plankton Production from Satellite Data By: Rob Curtis, Richard Fenn, Damon Oberholster Supervisors: Anet Potgieter,

Using Bayesian Networks to Predict Plankton Production from Satellite Data

By: Rob Curtis, Richard Fenn, Damon Oberholster

Supervisors: Anet Potgieter, John Field, Laurent Drapeau

Department of Computer Science

Page 2: Overview

• Introduction
• Work Detail
• Knowledge Acquisition
• Knowledge Representation
• Bayesian Learning and Inference
• Topic Maps

Page 3: Introduction

• Aim to predict plankton primary production using satellite data
• Daily satellite data on surface temperature, chlorophyll, winds, currents
• Archive of ships’ sub-surface details
• Predict likely sub-surface plankton profile from surface features

Page 4: Current System

• Currently the best solution uses Self-Organising Maps (SOMs, a type of neural network) to classify data
– Resulting solution lacks accuracy
– Difficult to interpret

Page 5: Proposed System

• Propose a system that uses Bayesian Networks to predict plankton production
– Use ships’ sub-surface profiles + satellite data to draw cause-effect relationships
– Will use Bayesian Inference and Learning
• Use Topic Maps to visualize network

Page 6: Work Detail

[Diagram: division of work. Components: Requirements Elicitation, Knowledge Acquisition, Knowledge Representation, Learning Engine, Inference Engine, Topic Map. Team members: Rob Curtis, Richard Fenn, Damon Oberholster.]

Page 7: Knowledge Acquisition

• “The process of analyzing, transforming, classifying, organizing and integrating knowledge and representing that knowledge in a form that can be used in a computer system. Typically the knowledge is based on what a human expert does when solving problems”
www.centc251.org/Ginfo/Glossary/tcglosk.htm

• Relating to this project:
– Huge amounts of data
– Data is poorly recorded in Excel spreadsheets
– Gaps in current data

Page 8: Knowledge Acquisition: Amount of Data

• 2500 ship sub-surface readings
– Recorded over a 10-year period
• Bayesian Network requires satellite data for the same time period
• Need to represent data in a form that can be used by the Bayesian Network
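
One common way to get continuous measurements into a form a discrete Bayesian Network can use is to bin them into named states. The sketch below illustrates that idea only. The bin edges, the state labels and the `discretise` helper are assumptions made for this example and do not come from the slides.

```python
from bisect import bisect_right

# Hypothetical bin edges (degrees Celsius) and state labels, for illustration only.
SST_EDGES = [12.0, 16.0, 20.0]
SST_LABELS = ["cold", "cool", "warm", "hot"]


def discretise(value, edges, labels):
    """Return the label of the bin that `value` falls into.

    `edges` must be sorted ascending; a value equal to an edge goes to the upper bin.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return labels[bisect_right(edges, value)]


readings = [11.3, 14.8, 16.0, 22.5]
states = [discretise(v, SST_EDGES, SST_LABELS) for v in readings]
print(states)
```

In practice the edges would presumably be chosen with the domain experts rather than fixed up front.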

Page 9: Knowledge Acquisition: Current Data

Page 10: Knowledge Acquisition: Gaps in Data

[Figure labels: Ships’ sub-surface readings (discrete); Satellite data (continuous)]

Page 11: Knowledge Acquisition: Gaps in Data

Page 12: Knowledge Acquisition: Challenges

• Making sense of all the available data (consultations with Dr John Field and Laurent Drapeau)
• Correlating the 2D continuous satellite data to the 3D discrete ships’ sub-surface profile
• Representing all the data in a form easily used by the Bayesian Network
• Integration of disparate data
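
The correlation challenge above can be sketched as a join: each ship reading is matched to the satellite value for the same day and the same grid cell. This is a minimal sketch under stated assumptions. The regular 0.25-degree grid, the field names (`day`, `lat`, `lon`, `depth_m`, `chl`, `surface_chl`) and both helper functions are hypothetical and are not taken from the project.

```python
import math
from datetime import date


def grid_cell(lat, lon, cell_deg=0.25):
    """Map a position to the (row, col) index of a regular lat/lon grid (assumed layout)."""
    return (math.floor((lat + 90.0) / cell_deg), math.floor((lon + 180.0) / cell_deg))


def join_ship_to_satellite(ship_rows, satellite):
    """Attach the same-day, same-cell satellite value to each ship reading.

    ship_rows: list of dicts with 'day', 'lat', 'lon', 'depth_m', 'chl'
    satellite: dict keyed by (day, cell) -> surface value
    Readings with no satellite match are returned separately so gaps stay visible.
    """
    matched, unmatched = [], []
    for row in ship_rows:
        key = (row["day"], grid_cell(row["lat"], row["lon"]))
        if key in satellite:
            matched.append({**row, "surface_chl": satellite[key]})
        else:
            unmatched.append(row)
    return matched, unmatched


# Made-up example values.
ship = [
    {"day": date(2004, 1, 5), "lat": -34.0, "lon": 18.5, "depth_m": 10, "chl": 1.2},
    {"day": date(2004, 1, 6), "lat": -34.0, "lon": 18.5, "depth_m": 10, "chl": 1.1},
]
sat = {(date(2004, 1, 5), grid_cell(-34.0, 18.5)): 0.8}
matched, unmatched = join_ship_to_satellite(ship, sat)
print(len(matched), len(unmatched))
```

Keeping the unmatched readings in their own list makes the gaps in the data explicit instead of silently dropping them.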

Page 13: Knowledge Representation

• “A search for formal ways to describe knowledge presented in informal terms (a prerequisite for its handling as computation)”
encyclopedia.laborlawtalk.com/Representation

• Relating to this project:
– Need to find causal relationships between environment variables
– Represent those relationships in a Bayesian Network
– Store the data in a database so that it will be easy for the Inference and Learning Engines of the Bayesian Network to manipulate
– Need to consider the temporal aspects of the data

Page 14: Knowledge Representation: Causal Relationships

Many variables influence primary plankton production:
• Chlorophyll
• Surface Temp
• Wind
• Current

[Diagram: Chlorophyll, Surface Temp and Wind nodes shown with Primary Plankton Production]

Page 15: Knowledge Representation: Bayesian Network

• Directed graphical model
• Each node represents an influencing variable
• An edge from one node to another represents a causal relationship between those nodes
• Create the Bayesian network structure based on the most relevant relationships found between the variables
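
A directed graphical model of this kind can be held as a mapping from each node to its parents. The sketch below is illustrative only: the node names and the edges (every environment variable pointing at production) are assumptions, since the slides do not give the final structure. It uses the standard-library `graphlib` module (Python 3.9+) to confirm the graph is acyclic, which a Bayesian network requires.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical structure: node -> list of parent nodes.
PARENTS = {
    "Chlorophyll": [],
    "SurfaceTemp": [],
    "Wind": [],
    "Current": [],
    "PrimaryProduction": ["Chlorophyll", "SurfaceTemp", "Wind", "Current"],
}


def evaluation_order(parents):
    """Return the nodes ordered parents-first; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(parents).static_order())


order = evaluation_order(PARENTS)
print(order)

try:
    evaluation_order({"A": ["B"], "B": ["A"]})
except CycleError:
    print("cycle detected: not a valid Bayesian network structure")
```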

Page 16: Knowledge Representation: Temporal aspects

• Need to divide data up into time steps
• Each time step is dependent on the previous step

[Diagram: time steps t, t + 1, t + 2]
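
One way to picture the time-step dependency is to copy the network once per step and link selected variables from step t to step t + 1. The sketch below shows that unrolling only. The `unroll` function, the variable names and the choice of which variables carry over between steps are assumptions for illustration, not the project's design.

```python
def unroll(intra_edges, persistent, n_steps):
    """Build the edge list of a network repeated over `n_steps` time steps.

    intra_edges: (parent, child) pairs that hold within every time step
    persistent:  variables whose value at step t influences the same variable at t + 1
    Nodes in the result are (variable, step) pairs.
    """
    edges = []
    for t in range(n_steps):
        for parent, child in intra_edges:
            edges.append(((parent, t), (child, t)))
        if t + 1 < n_steps:
            for var in persistent:
                edges.append(((var, t), (var, t + 1)))
    return edges


edges = unroll([("Chlorophyll", "Production")], ["Production"], n_steps=3)
for edge in edges:
    print(edge)
```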

Page 17: Learning Engine

• Each node of the Bayesian network will have a Conditional Probability Table (CPT)
• Learning engine will implement an algorithm to update the probabilities in each of these tables
– Nine years of satellite and ship data will be used in training the system
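
The slides do not name the learning algorithm. As a minimal sketch of what filling a CPT can look like, the example below estimates the table by counting how often each child state occurs for each combination of parent states, with add-one (Laplace) smoothing. This is one simple technique and is not claimed to be the one the project uses. The function name, field names and records are hypothetical.

```python
from collections import Counter, defaultdict


def learn_cpt(records, child, parents, child_states, alpha=1.0):
    """Estimate P(child | parents) from complete, discrete records.

    Counts child states per parent configuration and adds `alpha` pseudo-counts
    to every state so that unseen states do not get probability zero.
    Only parent configurations that appear in `records` get a row.
    """
    counts = defaultdict(Counter)
    for rec in records:
        key = tuple(rec[p] for p in parents)
        counts[key][rec[child]] += 1

    cpt = {}
    for key, seen in counts.items():
        total = sum(seen.values()) + alpha * len(child_states)
        cpt[key] = {s: (seen[s] + alpha) / total for s in child_states}
    return cpt


# Made-up training records.
records = [
    {"Chlorophyll": "high", "Production": "high"},
    {"Chlorophyll": "high", "Production": "high"},
    {"Chlorophyll": "high", "Production": "high"},
    {"Chlorophyll": "high", "Production": "low"},
]
cpt = learn_cpt(records, "Production", ["Chlorophyll"], ["high", "low"])
print(cpt[("high",)])
```

With three "high" and one "low" outcome and one pseudo-count per state, the row works out to 4/6 and 2/6.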

Page 18: Inference Engine

• The inference engine will be responsible for calculating the probability of a certain sequence of observations given certain input parameters
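
For a very small network, such a probability can be computed by enumeration: fix the observed inputs and sum over every state of the unobserved ones. The sketch below assumes a single child node whose parents are independent root nodes and that evidence is given on parents only. All numbers, names and the `posterior` function are made up for illustration, and a real engine would need a more general algorithm.

```python
from itertools import product


def posterior(priors, cpt, parents, child_states, evidence):
    """P(child | evidence) for a child whose parents are independent root nodes.

    evidence fixes the state of some parents; the remaining parents are summed
    out, weighted by their prior probabilities.
    """
    scores = {s: 0.0 for s in child_states}
    free = [p for p in parents if p not in evidence]
    for combo in product(*(priors[p].keys() for p in free)):
        assignment = {**evidence, **dict(zip(free, combo))}
        weight = 1.0
        for p, state in zip(free, combo):
            weight *= priors[p][state]
        key = tuple(assignment[p] for p in parents)
        for s in child_states:
            scores[s] += weight * cpt[key][s]
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()}


# Hypothetical priors and CPT.
priors = {
    "Chlorophyll": {"high": 0.3, "low": 0.7},
    "SurfaceTemp": {"cold": 0.4, "warm": 0.6},
}
cpt = {
    ("high", "cold"): {"high": 0.9, "low": 0.1},
    ("high", "warm"): {"high": 0.6, "low": 0.4},
    ("low", "cold"): {"high": 0.5, "low": 0.5},
    ("low", "warm"): {"high": 0.2, "low": 0.8},
}
result = posterior(priors, cpt, ["Chlorophyll", "SurfaceTemp"], ["high", "low"],
                   evidence={"Chlorophyll": "high"})
print(result)
```

With chlorophyll observed as "high", the probability of high production is 0.4 × 0.9 + 0.6 × 0.6 = 0.72.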

Page 19: Testing

• Nine years of sub-surface data will be used to train the system.

• Compare the predicted results for the tenth year against the recorded results for that year.

• The project will be a success if predictions are very similar to those that were recorded.
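
The hold-out test described above can be sketched as a split by year followed by a comparison of predicted and recorded states. The slides do not define how similarity will be measured, so the fraction of exact matches used here is an assumption, as are the function names and the `year` field.

```python
def split_by_year(records, test_year):
    """Separate the records of `test_year` from all other years."""
    train = [r for r in records if r["year"] != test_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test


def accuracy(predicted, recorded):
    """Fraction of positions where the predicted state equals the recorded state."""
    if not recorded or len(predicted) != len(recorded):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == r for p, r in zip(predicted, recorded))
    return hits / len(recorded)


# Made-up records covering ten years, one per year.
records = [{"year": 1995 + i, "production": "high" if i % 2 else "low"} for i in range(10)]
train, test = split_by_year(records, test_year=2004)
print(len(train), len(test))
print(accuracy(["high", "low", "low", "high"], ["high", "low", "high", "high"]))
```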

Page 20: Representing Bayesian Networks using Topic Maps

Page 21: Topic Maps: Overview

• Brief introduction to topic maps and hypergraphs
• Applying topic maps to the system
• Testing
• Challenges

Page 22: Topic Maps

• Topic maps provide a means for indexing data
• ISO standard for describing knowledge structures and associating them with information resources

Page 23: Topic Map Structure

• Topic
– Anything, subject, entity, concept
• Occurrence
– Link to information about topic
• Association
– Relationships between topics

Page 24: Topic Map Structure

[Diagram labels: Topic, Occurrence, Association]

Page 25: Representing Topic Maps

• Hypergraphs
– A hypergraph is a graph that can have smaller graphs (subgraphs) embedded within itself

Page 26: Applying Topic Maps

• Bayesian Network
– Topics will represent nodes in the network
– Associations represent relationships between nodes in the network
– Occurrences will link to info about node
• Future System
– Web application linking topic maps for different regions of the ocean
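
The mapping listed above (nodes to topics, relationships to associations, node information to occurrences) can be sketched with plain data classes. This is an illustrative in-memory model only. It is not an implementation of the ISO Topic Maps standard or any of its interchange syntaxes, and the class names, the "influences" association type and the example link are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A network node, with occurrences that link to information about it."""
    name: str
    occurrences: list = field(default_factory=list)


@dataclass(frozen=True)
class Association:
    """A typed relationship between two topics."""
    kind: str
    source: str
    target: str


def network_to_topic_map(parents, info_links):
    """Turn a node -> parents mapping into topics and associations."""
    topics = {n: Topic(n, list(info_links.get(n, []))) for n in parents}
    associations = [
        Association("influences", parent, child)
        for child, node_parents in parents.items()
        for parent in node_parents
    ]
    return topics, associations


# Hypothetical network and occurrence link.
parents = {"Chlorophyll": [], "Wind": [], "PrimaryProduction": ["Chlorophyll", "Wind"]}
links = {"Chlorophyll": ["data/chlorophyll_readings"]}
topics, associations = network_to_topic_map(parents, links)
print(sorted(topics))
print(associations)
```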

Page 27: Testing

• Qualitative approach
• Low-Fi prototypes to test intuitiveness of proposed interface to Bayesian Network
• Test with the intended users of the system

Page 28: Challenges

• Representing temporal information using topic maps
• Representing Bayesian Network relationships using topic maps

Page 29: Summary

• Represent data in a formal way using knowledge acquisition and representation
• Research the viability of using Bayesian Networks as a prediction mechanism
• Research the viability of using topic maps for intuitively representing Bayesian Networks

Page 30: References

• Pepper, S. (2002), “The TAO of Topic Maps, Finding the Way in the Age of Infoglut”, retrieved 01/06/2005, URL: http://www.ontopia.net/topicmaps/materials/tao.html