
Post on 27-Dec-2015


Knowledge Discovery in Biomedicine

Limsoon Wong

Institute for Infocomm Research

Copyright © 2004 by Limsoon Wong

Plan
• Knowledge discovery in brief
• Eg 1: Optimizing treatment of childhood ALL
• Eg 2: Predicting survival of patients with DLBC lymphoma
• Concluding remarks


Knowledge Discovery in Brief

Jonathan’s rules: Blue or Circle
Jessica’s rules: All the rest

Whose block is this?

Jonathan’s blocks

Jessica’s blocks
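The two rule sets above amount to a small rule-based classifier. A minimal Python sketch, assuming each block is described only by a colour and a shape (the `Block` type and its field values are hypothetical, chosen for illustration):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    colour: str  # hypothetical attribute, e.g. "blue", "red"
    shape: str   # hypothetical attribute, e.g. "circle", "square"


def owner(block: Block) -> str:
    """Apply the rules: Jonathan's if blue or circle, otherwise Jessica's."""
    if block.colour == "blue" or block.shape == "circle":
        return "Jonathan"
    return "Jessica"  # Jessica's rules cover all the rest


if __name__ == "__main__":
    for b in [Block("blue", "square"), Block("red", "circle"), Block("red", "square")]:
        print(b, "->", owner(b))
```

Knowledge discovery is the step that produces such rules from labelled examples; once found, applying them is trivial.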

What is Knowledge Discovery?


Question: Can you explain how?


Some classifiers/learning methods

Steps of Knowledge Discovery
• Training data gathering
• Feature generation
  – k-grams, colour, texture, domain know-how, ...
• Feature selection
  – Entropy, χ², CFS, t-test, domain know-how, ...
• Feature integration
  – SVM, ANN, PCL, CART, C4.5, kNN, ...
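The selection and integration steps above can be illustrated with a minimal Python sketch: features are ranked by a two-sample t-statistic (the t-test entry in the list, here in its Welch form) and samples are classified with kNN on the chosen features. The toy matrix `TOY_X`/`TOY_Y` and all function names are made up; this is not the pipeline used in the studies that follow.

```python
import math
from collections import Counter

# Toy expression matrix: rows are samples, columns are features (made-up values).
TOY_X = [
    [1.0, 5.0, 2.0],
    [1.2, 3.0, 2.5],
    [0.9, 4.0, 1.8],
    [3.0, 4.5, 2.2],
    [3.2, 3.5, 2.9],
    [2.9, 4.2, 2.1],
]
TOY_Y = [0, 0, 0, 1, 1, 1]  # class label of each sample


def t_statistic(a, b):
    """Welch two-sample t-statistic comparing one feature across two classes."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))


def select_features(X, y, k):
    """Feature selection: keep the k features with the largest |t| between classes 0 and 1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((abs(t_statistic(a, b)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def knn_predict(X, y, query, features, k=3):
    """Feature integration: majority vote of the k nearest samples, using only the selected features."""
    def dist(row):
        return math.sqrt(sum((row[j] - query[j]) ** 2 for j in features))
    nearest = sorted(range(len(X)), key=lambda i: dist(X[i]))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    chosen = select_features(TOY_X, TOY_Y, k=1)
    print("selected features:", chosen)
    print("prediction:", knn_predict(TOY_X, TOY_Y, [1.1, 9.9, 9.9], chosen))
```

Running the file selects feature 0, the only column whose values separate the two classes, and assigns the query to class 0. Swapping in entropy or χ² scoring would only change `select_features`.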


Knowledge Discovery for Optimizing Treatment of Childhood ALL

Image credit: Yeoh et al, 2002

Childhood ALL
• Major subtypes: T-ALL, E2A-PBX, TEL-AML, BCR-ABL, MLL genome rearrangements, Hyperdiploid>50

• Different subtypes respond differently to the same Tx

• Over-intensive Tx
  – Development of secondary cancers
  – Reduction of IQ

• Under-intensive Tx
  – Relapse

• The subtypes look similar

• Conventional diagnosis
  – Immunophenotyping
  – Cytogenetics
  – Molecular diagnostics

• Unavailable in most ASEAN countries


Copyright © 2004 by Jinyan Li and Limsoon Wong

Single-Test Platform of Microarray & Knowledge Discovery

[Figure: training data collection, feature generation, feature selection, feature integration]

Image credit: Affymetrix

Impact

Conventional Tx:
• intermediate intensity to all
• 10% suffer relapse
• 50% suffer side effects
• costs US$150m/yr

Our optimized Tx:
• high intensity to 10%
• intermediate intensity to 40%
• low intensity to 50%
• costs US$100m/yr

• High cure rate of 80%
• Less relapse
• Fewer side effects
• Save US$51.6m/yr
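The cost figures can be checked with simple arithmetic, assuming both costs refer to the same annual patient population: the three intensity tiers cover every patient, and the rounded costs imply a saving of about US$50m/yr. The slide's US$51.6m/yr presumably comes from unrounded cost estimates that are not given here.

```latex
\begin{aligned}
\text{patients covered} &= 10\% + 40\% + 50\% = 100\% \\
\text{annual saving} &\approx \text{US}\$150\text{m} - \text{US}\$100\text{m} = \text{US}\$50\text{m}
\end{aligned}
```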


Knowledge Discovery for Predicting Survival of Patients with DLBC Lymphoma

Image credit: Rosenwald et al, 2002


Diffuse Large B-Cell Lymphoma
• DLBC lymphoma is the most common type of lymphoma in adults

• Can be cured by anthracycline-based chemotherapy in 35 to 40 percent of patients

DLBC lymphoma comprises several diseases that differ in responsiveness to chemotherapy

• Intl Prognostic Index (IPI)
  – age, “Eastern Cooperative Oncology Group” performance status, tumor stage, lactate dehydrogenase level, sites of extranodal disease, ...

• Not good for stratifying DLBC lymphoma patients for therapeutic trials

Use gene-expression profiles to predict outcome of chemotherapy?
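The IPI itself is a simple count of adverse factors. A minimal sketch using the cut-offs commonly cited for the original index (age > 60, ECOG performance status ≥ 2, Ann Arbor stage III or IV, LDH above normal, more than one extranodal site) and its four risk groups; the `Patient` fields are hypothetical names, and the slide lists the factors without thresholds:

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: int
    ecog: int              # ECOG performance status, 0-4
    stage: int             # Ann Arbor stage, 1-4
    ldh_elevated: bool     # serum LDH above the upper limit of normal
    extranodal_sites: int  # number of extranodal disease sites


def ipi_score(p: Patient) -> int:
    """One point per adverse factor, so the score ranges from 0 to 5."""
    return sum([
        p.age > 60,
        p.ecog >= 2,
        p.stage >= 3,
        p.ldh_elevated,
        p.extranodal_sites > 1,
    ])


def ipi_group(score: int) -> str:
    """Map an IPI score to its risk group."""
    if score <= 1:
        return "low"
    if score == 2:
        return "low-intermediate"
    if score == 3:
        return "high-intermediate"
    return "high"


if __name__ == "__main__":
    p = Patient(age=67, ecog=1, stage=3, ldh_elevated=True, extranodal_sites=0)
    s = ipi_score(p)
    print(s, ipi_group(s))
```

Because every patient in a group gets the same coarse score, patients with very different biology can share a group, which is the stratification problem the slide raises.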

Knowledge Discovery from Gene Expression of “Extreme” Samples

[Figure: “extreme” sample selection, then knowledge discovery from gene expression]

• 240 samples; 80 samples used as test cases
• Selected “extreme” training samples: 26 long-term survivors, 47 short-term survivors
• 7399 genes reduced to 84 genes
• T is long-term if S(T) < 0.3
• T is short-term if S(T) > 0.7
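One way to read the selection rule above, assuming S is the Kaplan-Meier survival curve estimated from the training samples and T is a sample's follow-up time: a sample is short-term if S(T) > 0.7 and long-term if S(T) < 0.3, and anything in between is left out of training. The sketch below also assumes that a short-term sample must have an observed death (a patient censored early says little about survival); the data and function names are made up.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t).

    times[i] is the follow-up time of sample i; events[i] is True if death was
    observed at that time and False if the sample was censored (still alive)."""
    steps = []  # (death time, estimated survival just after that time)
    s = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        s *= 1.0 - deaths / at_risk
        steps.append((t, s))

    def S(t):
        value = 1.0
        for death_time, surv in steps:
            if death_time > t:
                break
            value = surv
        return value

    return S


def select_extreme(times, events, short_cut=0.7, long_cut=0.3):
    """Return (short-term indices, long-term indices); other samples are dropped."""
    S = kaplan_meier(times, events)
    short_term, long_term = [], []
    for i, (t, e) in enumerate(zip(times, events)):
        if e and S(t) > short_cut:  # assumption: short-term needs an observed death
            short_term.append(i)
        elif S(t) < long_cut:       # follow-up long enough that S(T) < 0.3
            long_term.append(i)
    return short_term, long_term


# Made-up follow-up times (years) and death indicators for ten training samples.
TIMES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
EVENTS = [True, True, False, True, True, True, False, True, True, False]

if __name__ == "__main__":
    print(select_extreme(TIMES, EVENTS))  # → ([0, 1], [8, 9])
```

With the toy data, samples 0 and 1 (early deaths) are short-term, samples 8 and 9 are long-term, and the six samples in the middle are discarded.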

Kaplan-Meier Plot for 80 Test Cases
• p-value of log-rank test: < 0.0001
• Risk score thresholds: 0.7, 0.5, 0.3
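The log-rank test compares the survival curves of two or more groups, such as the risk groups defined by the score thresholds. A minimal two-group sketch, assuming the usual chi-square approximation with one degree of freedom; the data and names are made up for illustration:

```python
import math


def log_rank(times, events, groups):
    """Two-group log-rank test.

    groups[i] is 0 or 1; events[i] is True for an observed death (False = censored).
    Returns the chi-square statistic and its p-value (1 degree of freedom)."""
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for u, g in zip(times, groups) if u >= t]
        n = len(at_risk)
        n1 = sum(1 for g in at_risk if g == 1)
        d = sum(1 for u, e in zip(times, events) if e and u == t)
        d1 = sum(1 for u, e, g in zip(times, events, groups) if e and u == t and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Made-up data: group 1 (e.g. predicted high risk) dies early, group 0 later.
TIMES = [1, 2, 3, 4, 5, 6, 7, 8]
EVENTS = [True, True, True, True, True, False, True, False]
GROUPS = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    chi2, p = log_rank(TIMES, EVENTS, GROUPS)
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

For more than two groups (e.g. high, intermediate, and low risk), the statistic generalises to a chi-square with one fewer degree of freedom than the number of groups.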

Improvement Over IPI

(A) IPI low, p-value = 0.0063
(B) IPI intermediate, p-value = 0.0003

Merit of “Extreme” Samples

(A) Without sample selection (p = 0.38)
(B) With sample selection (p = 0.009)

No clear difference in the overall survival of the 80 samples in the validation group of the DLBCL study if no training sample selection is conducted


Knowledge Discovery for A Few Other Biomedical Applications

Predict Epitopes, Find Vaccine Targets

• Vaccines are often the only solution for viral diseases
• Finding & developing effective vaccine targets (epitopes) is a slow and expensive process
• Develop systems to recognize protein peptides that bind MHC molecules
• Develop systems to recognize hot spots in viral antigens

Recognize Functional Sites, Help Scientists

• Effective recognition of initiation, control, & termination of biological processes is crucial to speeding up & focusing scientific expts
• Data mining of bio seqs to find rules to recognize & understand functional sites

Dragon’s 10x reduction of false positives in TSS (transcription start site) recognition
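The k-gram features listed under feature generation are a natural input for recognising functional sites in sequences. A minimal sketch that turns a DNA window into a fixed-length vector of overlapping k-gram counts; the window and k are arbitrary, and this is not claimed to be how Dragon computes its features:

```python
from collections import Counter
from itertools import product


def kgram_counts(seq: str, k: int) -> dict:
    """Count every overlapping k-gram in seq over the DNA alphabet A, C, G, T."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed order over all 4**k possible k-grams, so every window maps to
    # a vector of the same length; k-grams containing other symbols are dropped.
    grams = ("".join(p) for p in product("ACGT", repeat=k))
    return {g: counts[g] for g in grams}


def kgram_vector(seq: str, k: int) -> list:
    """Feature vector for a classifier: k-gram counts in a fixed order."""
    return list(kgram_counts(seq, k).values())


if __name__ == "__main__":
    counts = kgram_counts("TATAAA", 2)
    print({g: c for g, c in counts.items() if c})
    print(len(kgram_vector("TATAAA", 2)))
```

For k = 2 the vector has 4² = 16 entries; a classifier trained on such vectors from true sites and background windows is one simple way to learn recognition rules.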

Understand Proteins, Fight Diseases

• Understanding the function & role of a protein needs organised info on interaction pathways
• Such info is often reported in scientific papers but is seldom found in structured databases
• Knowledge extraction system to process free text
  – extract protein names
  – extract interactions
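A toy illustration of the two extraction steps: spot protein names with a dictionary, then report sentences of the form "A <verb> B". The protein list, the verb list, and the pattern are all hypothetical and far simpler than a real knowledge extraction system, which must cope with name variants, negation, and long-range syntax.

```python
import re

# Hypothetical dictionary of protein names (a real system needs a far larger lexicon).
PROTEINS = {"p53", "MDM2", "BRCA1", "RAD51"}
# Hypothetical interaction verbs, as a capturing regex group.
VERBS = r"(binds|interacts with|phosphorylates|inhibits|activates)"
PATTERN = re.compile(r"([A-Za-z0-9-]+)\s+" + VERBS + r"\s+([A-Za-z0-9-]+)")


def extract_proteins(sentence):
    """Step 1: find known protein names as whole tokens."""
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
    return [t for t in tokens if t in PROTEINS]


def extract_interactions(text):
    """Step 2: report (protein, verb, protein) triples of the form 'A <verb> B'."""
    triples = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for a, verb, b in PATTERN.findall(sentence):
            if a in PROTEINS and b in PROTEINS:
                triples.append((a, verb, b))
    return triples


if __name__ == "__main__":
    text = "MDM2 binds p53 in the nucleus. BRCA1 interacts with RAD51. The cell grows."
    print(extract_proteins("MDM2 binds p53 in the nucleus."))
    print(extract_interactions(text))
```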


Benefits of Bioinformatics
• To the patient:
  – Better drug, better treatment
• To the pharma:
  – Save time, save cost, make more $
• To the scientist:
  – Better science


References
• A. Yeoh et al, “Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling”, Cancer Cell, 1:133--143, 2002

• A. Rosenwald et al, “The use of molecular profiling to predict survival after chemotherapy for diffuse large B-cell lymphoma”, NEJM, 346:1937--1947, 2002

• H. Liu et al, “Selection of patient samples and genes for outcome prediction”, Proc. CSB2004, 382--392, 2004


Any Questions?


• To be presented
• 10/10/04, 8.30--10.00am
• Raffles Convention Centre
• NHG-IBM Symposium