
MML, inverse learning and medical data-sets

Pritika Sanghi

Supervisors: A./Prof. D. L. Dowe, Dr P. E. Tischer

Overview

What is this project about?
Bayesian Networks and their limitations
Some techniques
    Factor Analysis
    Minimum Message Length (MML)
    Decision Trees & Graphs
    Logistic Regression
Improving Bayesian Networks
What is being done in this project?

What is this project about?

The aim of the project is to enhance Bayesian Networks in general and then apply them to certain medical data-sets.

These data-sets have a large number of attributes and a small number of cases.

This makes it difficult to model these data-sets using Bayesian Networks.

Bayesian Networks

A popular tool for Data Mining.

Model data to infer the probability of a certain outcome.

They represent the frequency distributions for the values that an attribute can take as Conditional Probability Distributions.

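[Figure: example Bayesian Network in which nodes WS and GO are parents of S, with CPD P(S | WS, GO), and S is the parent of A, with CPD P(A | S).]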

Bayesian Networks - Limitations

When a child node depends on a large number of parent attributes, the conditional probability distribution (CPD) becomes very complex: there are 2^n rows in the CPD for n binary parent attributes.

This makes both creating the CPD and performing inference with it very time-consuming.

A more compact representation for CPDs is required.
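The growth in table size can be checked with a minimal sketch that enumerates one CPD row per joint assignment of the parents. The function name cpd_rows is hypothetical and used only for illustration.

```python
from itertools import product

def cpd_rows(n_parents, arity=2):
    """Enumerate every joint assignment of the parents: one CPD row each."""
    return list(product(range(arity), repeat=n_parents))

# A full table for n binary parents needs 2 ** n rows.
for n in (2, 5, 10, 20):
    print(n, "parents ->", 2 ** n, "rows")

rows = cpd_rows(3)
print(len(rows))  # one row per assignment of 3 binary parents
```

Each additional binary parent doubles the table, which is why a more compact representation is sought.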

Factor Analysis

Multiple attributes may be defined by a common factor.

The Wallace and Freeman model for Single Factor Analysis will be implemented.

This serves as dimensionality reduction.

The validity of the program built will be checked using the data-sets specified in the Wallace and Freeman paper.

Attributes A and B have a common factor F1.

Attributes C, D and E have a common factor F2.

Factor Analysis: Height-Weight of Footy Players

[Figure: plots of height and weight for footy players. Axis and panel labels: Height, Weight, Actual Weight, Actual Height.]

Factor Analysis

The equation for Single Factor Analysis as defined by Wallace and Freeman is:

x_nk = μ_k + a_k ν_n + σ_k r_nk

where
    x_nk is the data,
    μ_k is the mean,
    a_k is the attribute related term,
    ν_n is the record related term,
    σ_k is the standard deviation,
    r_nk are random variates N(0,1).

Size      Height     Weight
Large     Tall       Average
Large     Short      Heavy
Medium    Average    Average
Small     Short      Light
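As a rough illustration, the single factor model x_nk = μ_k + a_k ν_n + σ_k r_nk can be used to generate synthetic records. This is a minimal sketch, not the Wallace and Freeman estimator: the function name, the parameter values, and the choice to draw the record related term ν_n from N(0,1) are all assumptions made for the example.

```python
import random

def simulate_single_factor(mu, a, sigma, n_records, seed=0):
    """Draw records from x_nk = mu_k + a_k * v_n + sigma_k * r_nk."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_records):
        v_n = rng.gauss(0.0, 1.0)  # record related term, assumed N(0,1) here
        row = [m + load * v_n + s * rng.gauss(0.0, 1.0)  # r_nk ~ N(0,1)
               for m, load, s in zip(mu, a, sigma)]
        data.append(row)
    return data

# Hypothetical parameters for two attributes (height in cm, weight in kg).
mu = [185.0, 85.0]
a = [6.0, 8.0]
sigma = [3.0, 5.0]
for row in simulate_single_factor(mu, a, sigma, n_records=5):
    print([round(x, 1) for x in row])
```

Because both attributes share the same ν_n within a record, a record with a large ν_n tends to be above the mean on both attributes, which is the sense in which one factor summarises several attributes.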

The Minimum Message Length (MML) Principle

Models the data as a two-part message: the first part states the hypothesis H, and the second part states the data D encoded using H.

The best model is the one with the minimum message length.

Minimising the message length corresponds to maximising the posterior probability of the hypothesis given the data, Pr(H|D). The message length is the negative log of the joint probability, -log Pr(H) - log Pr(D|H) = -log Pr(H|D) - log Pr(D), and Pr(D) does not depend on H.

Message is represented as:

[ Hypothesis | Data ]
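A small worked example may help show how a two-part message length is compared across hypotheses. The sketch below assumes a toy setting that is not from the slides: three candidate coin biases with made-up prior probabilities, and made-up data of 8 heads and 2 tails. Lengths are in bits (log base 2).

```python
import math

def message_length_bits(prior, p_heads, heads, tails):
    """Two-part length: -log2 Pr(H) for the hypothesis, -log2 Pr(D|H) for the data."""
    first_part = -math.log2(prior)
    second_part = -(heads * math.log2(p_heads) + tails * math.log2(1.0 - p_heads))
    return first_part + second_part

# Hypothetical candidate hypotheses: coin bias -> prior probability.
hypotheses = {0.25: 0.25, 0.5: 0.5, 0.75: 0.25}
heads, tails = 8, 2  # made-up data

lengths = {p: message_length_bits(prior, p, heads, tails)
           for p, prior in hypotheses.items()}
best = min(lengths, key=lengths.get)
for p, bits in sorted(lengths.items()):
    print(f"p={p}: {bits:.2f} bits")
print("shortest message:", best)
```

The hypothesis with bias 0.75 costs one more bit to state than the fair coin, but it encodes the data so much more cheaply that its total message is the shortest.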

Decision Trees and Graphs

Graphical way of representing the output attribute in terms of the input attributes.

Used to model the Conditional Probability Distribution of the Bayesian Network.

Decision graphs are generalisations of decision trees: they can merge similar sub-trees.
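To illustrate how a tree can stand in for a full CPD table, here is a minimal sketch with three binary parents A, B and C. The tree structure, attribute names and probabilities are invented for the example and do not come from any data-set in this project.

```python
# Each internal node tests one parent attribute; each leaf holds P(child = 1).
# Structure and probabilities are made up for illustration.
tree = ("A", {
    0: 0.1,                       # A = 0: the other parents are irrelevant
    1: ("B", {0: 0.4, 1: 0.9}),   # A = 1: the outcome also depends on B
})

def lookup(node, assignment):
    """Walk from the root to a leaf, following the parents' values."""
    while isinstance(node, tuple):
        attribute, branches = node
        node = branches[assignment[attribute]]
    return node

def count_leaves(node):
    """Number of distinct probabilities the tree has to store."""
    if not isinstance(node, tuple):
        return 1
    return sum(count_leaves(child) for child in node[1].values())

print(lookup(tree, {"A": 1, "B": 0, "C": 1}))
print(count_leaves(tree), "leaves versus", 2 ** 3, "table rows for parents A, B, C")
```

The tree stores only as many probabilities as there are distinct contexts, so parents that do not matter in a given context (here C, and B when A = 0) add no rows.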

Logistic Regression

Mathematical modelling approach used for describing the dependence of a variable on other attributes.

Will be used to define the probability of a discrete target attribute as a function of continuous attributes.

f(z) = 1 / (1 + e^(-z)) + c
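The following minimal sketch shows how a logistic function can turn continuous attributes into a probability for a binary target. It uses the standard logistic function 1 / (1 + e^(-z)) with z taken as a linear combination of the attributes; the function names and coefficient values are hypothetical, and this is not the fitting procedure planned for the project.

```python
import math

def logistic(z):
    """Standard logistic function, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def prob_positive(x, weights, intercept):
    """P(target = 1 | x), with z a linear combination of continuous attributes."""
    z = intercept + sum(w * xi for w, xi in zip(weights, x))
    return logistic(z)

# Hypothetical coefficients for two continuous attributes.
weights = [0.8, -0.5]
intercept = 0.1
print(prob_positive([1.0, 2.0], weights, intercept))
```

A model of this form needs one coefficient per parent attribute, rather than one table row per combination of parent values.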

Improving Bayesian Networks

Comley and Dowe (2003, 2004), building on ideas from Dowe and Wallace (1998), commenced the work of enhancing Bayesian Networks and introduced Generalised Bayesian Networks.

This project will extend their work by applying some of the techniques described earlier to Bayesian Networks.

What is being done in this project?

Refinement to Generalised Bayesian Networks.

Specifically:
First, MML Single Factor Analysis will be added to Bayesian Networks.
Then, Logistic Regression will be looked into.

The Generalised Bayesian Networks will then be used to infer models from some medical data-sets such as breast cancer data-sets.

If time permits, which it almost definitely won’t, other methods of dimensionality reduction and/or decision graphs will be pursued.

References

J. W. Comley and D. L. Dowe: General Bayesian Networks and Asymmetric Languages, Proceedings of the 2003 Hawaii International Conference on Statistics and Related Fields (HICS 2003), Honolulu, Hawaii, USA, 5-8 June 2003, ISSN: 1539-7211, pp. 1-18.

J. W. Comley and D. L. Dowe: Minimum Message Length and Generalised Bayesian Nets with Asymmetric Languages, in P. D. Grünwald, I. J. Myung and M. A. Pitt (eds), Advances in Minimum Description Length: Theory and Applications, MIT Press. To be published 2004.

D. L. Dowe and C. S. Wallace: Kolmogorov complexity, minimum message length and inverse learning, in W. Robb (ed), Proceedings of the Fourteenth Biennial Australian Statistical Conference (ASC-14), Queensland, Australia, 6-10 July 1998, p. 144.

C. S. Wallace and P. R. Freeman: Single factor analysis by MML estimation, J. Royal Stat. Soc. B, 54(1), 195-209, 1992.

More Information

http://www.monash.edu.au/~sanghi

sanghi@mail.csse.monash.edu.au

Thank You

Any questions?
