Knowledge Discovery in Biomedicine Limsoon Wong Institute for Infocomm Research.

Knowledge Discovery in Biomedicine

Limsoon Wong

Institute for Infocomm Research

Copyright © 2004 by Limsoon Wong

Plan
• Knowledge discovery in brief
• Eg 1: Optimizing treatment of childhood ALL
• Eg 2: Predicting survival of patients with DLBC lymphoma
• Concluding remarks

Knowledge Discovery in Brief

What is Knowledge Discovery?

[Figure: Jonathan’s blocks and Jessica’s blocks]

Whose block is this?

Jonathan’s rules: Blue or Circle
Jessica’s rules: All the rest

Question: Can you explain how?

Some classifiers/learning methods

Steps of Knowledge Discovery
• Training data gathering
• Feature generation
– k-grams, colour, texture, domain know-how, ...
• Feature selection
– Entropy, χ2, CFS, t-test, domain know-how, ...
• Feature integration
– SVM, ANN, PCL, CART, C4.5, kNN, ...
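
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data, the function names (`t_statistic`, `select_features`, `knn_predict`) and the choice of a t-statistic for feature selection followed by kNN for feature integration are all hypothetical picks from the lists on the slide, not the pipeline actually used in the studies.

```python
import math
from collections import Counter


def t_statistic(xs, ys):
    """Welch's t-statistic between two groups of values."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    vx = sum((x - mx) ** 2 for x in xs) / (len(xs) - 1)
    vy = sum((y - my) ** 2 for y in ys) / (len(ys) - 1)
    denom = math.sqrt(vx / len(xs) + vy / len(ys))
    return (mx - my) / denom if denom > 0 else 0.0


def select_features(samples, labels, k):
    """Feature selection: rank features by |t| between two classes, keep top k."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    scores = []
    for j in range(len(samples[0])):
        a = [s[j] for s, lab in zip(samples, labels) if lab == classes[0]]
        b = [s[j] for s, lab in zip(samples, labels) if lab == classes[1]]
        scores.append((abs(t_statistic(a, b)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def knn_predict(train, labels, query, features, k=3):
    """Feature integration: majority vote of the k nearest training samples,
    using Euclidean distance over the selected features only."""
    def dist(a, b):
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in features))

    nearest = sorted(zip(train, labels), key=lambda p: dist(p[0], query))[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]


# Training data gathering (toy, made-up values): 6 samples x 3 features.
# Only feature 0 separates class "A" from class "B".
samples = [
    [1.0, 5.0, 0.2],
    [1.2, 3.0, 0.1],
    [0.9, 4.0, 0.3],
    [3.0, 4.5, 0.2],
    [3.1, 3.5, 0.1],
    [2.9, 4.0, 0.3],
]
labels = ["A", "A", "A", "B", "B", "B"]

selected = select_features(samples, labels, k=1)
prediction = knn_predict(samples, labels, [2.8, 5.0, 0.2], selected, k=3)
print(selected, prediction)
```

On real microarray data the same shape applies, with thousands of features per sample and the selection step doing most of the work.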

Knowledge Discovery for Optimizing Treatment of Childhood ALL

Image credit: Yeoh et al, 2002

Childhood ALL
• Major subtypes: T-ALL, E2A-PBX, TEL-AML, BCR-ABL, MLL genome rearrangements, Hyperdiploid>50
• Different subtypes respond differently to the same treatment (Tx)
• Over-intensive Tx
– Development of secondary cancers
– Reduction of IQ
• Under-intensive Tx
– Relapse
• The subtypes look similar
• Conventional diagnosis
– Immunophenotyping
– Cytogenetics
– Molecular diagnostics
• Unavailable in most ASEAN countries

Copyright © 2004 by Jinyan Li and Limsoon Wong

Single-Test Platform of Microarray & Knowledge Discovery
• training data collection
• feature generation
• feature selection
• feature integration

Image credit: Affymetrix

Conventional Tx:
• intermediate intensity to all
• 10% suffer relapse
• 50% suffer side effects
• costs US$150m/yr

Our optimized Tx:
• high intensity to 10%
• intermediate intensity to 40%
• low intensity to 50%
• costs US$100m/yr

Impact
• High cure rate of 80%
• Less relapse
• Less side effects
• Save US$51.6m/yr

Knowledge Discovery for Predicting Survival of Patients with DLBC Lymphoma

Image credit: Rosenwald et al, 2002

Diffuse Large B-Cell Lymphoma
• DLBC lymphoma is the most common type of lymphoma in adults
• Can be cured by anthracycline-based chemotherapy in 35 to 40 percent of patients

DLBC lymphoma comprises several diseases that differ in responsiveness to chemotherapy

• Intl Prognostic Index (IPI)
– age, “Eastern Cooperative Oncology Group” performance status, tumor stage, lactate dehydrogenase level, sites of extranodal disease, ...
• Not good for stratifying DLBC lymphoma patients for therapeutic trials

Use gene-expression profiles to predict outcome of chemotherapy?

Knowledge Discovery from Gene Expression of “Extreme” Samples

• 240 samples, of which 80 samples are test cases
• “Extreme” sample selection: 26 long-term survivors, 47 short-term survivors
• Knowledge discovery from gene expression: 7399 genes reduced to 84 genes
• T is long-term if S(T) < 0.3
• T is short-term if S(T) > 0.7
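
The two threshold rules above can be written as a small function. This is a minimal sketch, assuming S(T) is a score between 0 and 1 for sample T; the function name `survival_class` and the "undetermined" label for scores between the two thresholds are assumptions added for illustration, since the slide does not say how such samples are handled.

```python
def survival_class(score, low=0.3, high=0.7):
    """Apply the slide's rules: long-term if S(T) < 0.3, short-term if S(T) > 0.7.

    Scores between the two thresholds are labelled "undetermined" here;
    that label is an assumption of this sketch, not part of the slide.
    """
    if score < low:
        return "long-term"
    if score > high:
        return "short-term"
    return "undetermined"


# Made-up scores, for illustration only.
for s in (0.1, 0.5, 0.9):
    print(s, survival_class(s))
```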

Kaplan-Meier Plot for 80 Test Cases
• p-value of log-rank test: < 0.0001
• Risk score thresholds: 0.7, 0.5, 0.3
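
For readers unfamiliar with Kaplan-Meier plots, the curve is the product-limit estimate of the probability of surviving beyond each observed death time. The sketch below is a plain-Python illustration with made-up follow-up data; the function name `kaplan_meier` and the data are hypothetical, and the survival probability it computes is not the same quantity as the risk score S(T) on the earlier slide.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival curve.

    times  : follow-up duration of each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival_probability = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival_probability *= 1 - deaths / n_at_risk
            curve.append((t, survival_probability))
        # Both deaths and censored patients leave the risk set after time t.
        n_at_risk -= leaving
    return curve


# Made-up example: 5 patients, two of them censored (event = 0).
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, p in curve:
    print(t, round(p, 3))
```

A log-rank test then compares such curves between groups, for example between the predicted risk groups.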

Improvement Over IPI
(A) IPI low, p-value = 0.0063
(B) IPI intermediate, p-value = 0.0003

Merit of “Extreme” Samples
(A) W/o sample selection (p = 0.38)
(B) With sample selection (p = 0.009)

Without training sample selection, there is no clear difference in overall survival among the 80 samples in the validation group of the DLBCL study.

Knowledge Discovery for a Few Other Biomedical Applications

Predict Epitopes, Find Vaccine Targets
• Vaccines are often the only solution for viral diseases
• Finding & developing effective vaccine targets (epitopes) is a slow and expensive process
• Develop systems to recognize protein peptides that bind MHC molecules
• Develop systems to recognize hot spots in viral antigens

Recognize Functional Sites, Help Scientists
• Effective recognition of initiation, control, & termination of biological processes is crucial to speeding up & focusing scientific expts
• Data mining of bio seqs to find rules to recognize & understand functional sites
• Dragon’s 10x reduction of TSS recognition false positives

Understand Proteins, Fight Diseases
• Understanding function & role of protein needs organised info on interaction pathways
• Such info is often reported in scientific papers but is seldom found in structured db
• Knowledge extraction system to process free text
– extract protein names
– extract interactions

Benefits of Bioinformatics
• To the patient:
– Better drug, better treatment
• To the pharma:
– Save time, save cost, make more $
• To the scientist:
– Better science

References
• A. Yeoh et al, “Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling”, Cancer Cell, 1:133--143, 2002
• A. Rosenwald et al, “The use of molecular profiling to predict survival after chemotherapy for diffuse large B-cell lymphoma”, NEJM, 346:1937--1947, 2002
• H. Liu et al, “Selection of patient samples and genes for outcome prediction”, Proc. CSB2004, pages 382--392

Any Questions?

• To be presented
• 10/10/04, 8.30--10.00am
• Raffles Convention Centre
• NHG-IBM Symposium