Department of Biomedical Engineering, Institute for … · 2009-11-04 · From Data Patterns to...

Mathematical Modeling of DNA Microarray Data: Discovery of Biological Mechanisms with Tensor Decompositions, and Definitions of Novel Tensor Decompositions from Biological Applications Orly Alter Department of Biomedical Engineering, Institute for Cellular and Molecular Biology and Institute for Computational Engineering and Sciences University of Texas at Austin


Page 1: Department of Biomedical Engineering, Institute for … · 2009-11-04 · From Data Patterns to Principles of Nature Alter, PNAS 103, 16063 (2006); Alter, in Microarray Data Analysis:

Mathematical Modeling of DNA Microarray Data:

Discovery of Biological Mechanisms with Tensor Decompositions, and

Definitions of Novel Tensor Decompositions from Biological Applications

Orly Alter

Department of Biomedical Engineering, Institute for Cellular and Molecular Biology and

Institute for Computational Engineering and Sciences

University of Texas at Austin

Page 2
Page 3

DNA Microarrays Record Genomic Signals

DNA microarrays rely on hybridization to record the complete genomic signals that guide the progression of cellular processes, such as abundance levels of DNA, RNA and DNA-bound proteins on a genomic scale.

Page 4

From Data Patterns to Principles of Nature

Alter, PNAS 103, 16063 (2006);

Alter, in Microarray Data Analysis: Methods and Applications (Humana Press, 2007), pp. 17–59.

Kepler’s discovery of his first law of planetary motion from mathematical modeling of Brahe’s astronomical data:

Kepler, Astronomia Nova (Voegelinus, Heidelberg, 1609), reproduced by permission of the Harry Ransom Humanities Research Center of the University of Texas, Austin, TX.

Page 5

Physics-Inspired Matrix (and Tensor) Models

Mathematical frameworks for the description of the data, in which the mathematical variables and operations might represent biological reality.

SVD: Uncover Cellular Processes and States. Eigenvalue Decomposition. Alter, Brown & Botstein, PNAS 97, 10101 (2000).

Comparative GSVD: Uncover Processes Common or Exclusive Among Two Datasets. Generalized Eigenvalue Decomposition. Alter, Brown & Botstein, PNAS 100, 3351 (2003).

Integrative Pseudoinverse: Uncover Coordination Among Multiple Sets. Inverse Projection. Alter & Golub, PNAS 101, 16577 (2004).

Page 6

Networks are Tensors of “Subnetworks”

Alter & Golub, PNAS 102, 17559 (2005);

http://www.bme.utexas.edu/research/orly/network_decomposition/.

[Figure: network → tensor = subnetwork + subnetwork + ...]

The relations among the activities of genes, not only the activities of the genes alone, are known to be pathway-dependent, i.e., conditioned by the biological and experimental settings in which they are observed.

Page 7

A Higher-Order SVD Predicts an Equivalent Biological Mechanism

Linear transformation of the data tensor from genes × x-settings × y-settings space to reduced “eigenarrays” × “x-eigengenes” × “y-eigengenes” space.

This HOSVD is computed from each SVD of the data tensor unfolded around one given axis,

De Lathauwer, De Moor & Vandewalle, SIMAX 21, 1253 (2000); Kolda, SIMAX 23, 243 (2001); Zhang & Golub, SIMAX 23, 543 (2001).

mRNA Expression from Cell Cycle Time Courses under Different Conditions of Oxidative Stress

Shapira, Segal & Botstein, MBC 15, 5659 (2004); Spellman et al., MBC 9, 3273 (1998).
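The statement that the HOSVD is computed from the SVD of each unfolding of the data tensor can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names `unfold`, `hosvd` and `reconstruct` are hypothetical, the tensor is random stand-in data, and the layout genes × x-settings × y-settings is assumed from the slide.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize the tensor: rows index `mode`, columns index all other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_multiply(tensor, matrix, mode):
    """Multiply `matrix` into axis `mode` of `tensor`."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def hosvd(tensor):
    """Return (core, factors): one orthonormal basis per axis, from one SVD per unfolding."""
    factors = []
    for mode in range(tensor.ndim):
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U)
    core = tensor
    for mode, U in enumerate(factors):
        core = mode_multiply(core, U.T, mode)   # project each axis onto its basis
    return core, factors

def reconstruct(core, factors):
    """Map the core tensor back to the original genes x x-settings x y-settings space."""
    out = core
    for mode, U in enumerate(factors):
        out = mode_multiply(out, U, mode)
    return out

# Stand-in data: 20 "genes" x 4 "x-settings" x 3 "y-settings" (sizes are arbitrary).
rng = np.random.default_rng(0)
data = rng.normal(size=(20, 4, 3))
core, (eigenarrays, x_eigengenes, y_eigengenes) = hosvd(data)
print(core.shape, np.allclose(reconstruct(core, [eigenarrays, x_eigengenes, y_eigengenes]), data))
```

In this sketch the columns of the three factor matrices play the roles of the eigenarrays, x-eigengenes and y-eigengenes, and the entries of the core tensor play the role of the higher-order singular values.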

Page 8

HOSVD Integrative Modeling

Omberg, Golub & Alter, PNAS 104, 18371 (2007); http://www.bme.utexas.edu/research/orly/HOSVD/.

The data tensor is a superposition of all rank-1 “subtensors,” i.e., outer products of an eigenarray, an x- and a y-eigengene,

The significance of a subtensor is defined by the corresponding “fraction,” computed from the higher-order singular values,

The complexity of the data tensor is defined by the “normalized entropy,”
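The formulas for the fraction and the normalized entropy did not survive in this transcript. As a hedged sketch only, the code below assumes the usual analogue of the matrix SVD case: each fraction is a squared higher-order singular value (core tensor entry) divided by the sum of all squares, and the entropy of the fractions is normalized by the logarithm of the number of subtensors so that it lies between 0 and 1. The function names are hypothetical and the exact definitions should be checked against Omberg, Golub & Alter (2007).

```python
import numpy as np

def subtensor_fractions(core):
    """Assumed definition: squared core entries normalized to sum to 1."""
    energy = np.asarray(core, dtype=float) ** 2
    return energy / energy.sum()

def normalized_entropy(fractions):
    """Assumed definition: Shannon entropy of the fractions divided by log(number of subtensors)."""
    p = np.asarray(fractions, dtype=float).ravel()
    nonzero = p[p > 0]                      # 0 * log(0) is taken as 0
    return float(-(nonzero * np.log(nonzero)).sum() / np.log(p.size))

# Toy core tensor (values are made up): one dominant subtensor and a few small ones.
toy_core = np.zeros((4, 2, 2))
toy_core[0, 0, 0] = 9.0
toy_core[1, 1, 0] = 3.0
toy_core[2, 0, 1] = 1.0
fractions = subtensor_fractions(toy_core)
print(fractions[0, 0, 0], normalized_entropy(fractions))
```

Under these assumptions an entropy of 0 corresponds to a data tensor captured by a single subtensor, and an entropy of 1 to a tensor in which all subtensors are equally significant.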

Page 9

Rotation in an Approximately Degenerate Subtensor Space

An “approximately degenerate subtensor space” is defined as that which is spanned by, e.g., the subtensors

which satisfy

This HOSVD is reformulated with a unique single rank-1 subtensor that is composed of these two subtensors,

Page 10

Math Variables & Operations → Biology

HOSVD uncovers independent data patterns across each variable and the interactions among them → global picture of the causal coordination among biological processes and experimental phenomena:

Equivalent DNA ↔ RNA Correlation

Page 11

Overexpression of binding targets of replication initiation proteins correlates with reduced, or even inhibited, binding of the origins.

→ Replication initiation requires binding of these proteins at origins of replication. Diffley, Cocker, Dowell, & Rowley, Cell 78, 303 (1994).

→ They are involved with transcriptional silencing at the yeast mating loci. Micklem et al., Nature 366, 87 (1993).

Either one of two previously unknown mechanisms of regulation may be underlying this correlation:

→ Replication may regulate transcription: The binding of MCM proteins represses the expression of genes that are near the origins.

→ Transcription may regulate replication: The transcription of genes reduces the efficiency of origins that are near the genes. Donato, Chung & Tye, PLoS Genet. 2, E141 (2006); Snyder, Sapolsky & Davis, MCB 8, 2184 (1988).

→ This correlation is equivalent to a recently discovered correlation, which might be due to a previously unknown mechanism of regulation. Alter & Golub, PNAS 101, 16577 (2004).

The first time that a data-driven mathematical model of DNA microarray data has been used to predict a cellular mechanism of regulation that is truly on a genome scale.

Page 12

Analysis of Synchronized Cdc6−/45− Cultures where DNA Replication Initiation is Prevented without Delaying Cell Cycle Progression

Omberg, Meyerson, Kobayashi, Drury, Diffley & Alter, Nature MSB 5, 312 (2009);

http://www.nature.com/doifinder/10.1038/msb.2009.70

Page 13

HOSVD Detection and Removal of Artifacts

Reconstructing the data tensor of 4,270 genes × 12 time points, or x-settings × 8 time courses, or y-settings, filtering out “x-eigengenes” and “y-eigengenes” that represent experimental artifacts.

Batch-of-hybridization

Culture batch, microarray platform and protocols
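The filtering step described on this slide can be sketched as follows: compute the HOSVD, zero the core tensor entries that belong to the x-eigengenes or y-eigengenes judged to be artifacts, and reconstruct. This is an illustration under assumptions, not the published procedure. The function name `remove_patterns`, the synthetic tensor and the indices passed to it are hypothetical, and deciding which patterns are artifacts is left to the analyst.

```python
import numpy as np

def mode_multiply(tensor, matrix, mode):
    """Multiply `matrix` into axis `mode` of `tensor`."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def remove_patterns(tensor, drop_x=(), drop_y=()):
    """Reconstruct a genes x x-settings x y-settings tensor without the listed
    x-eigengenes (axis 1 patterns) and y-eigengenes (axis 2 patterns)."""
    factors = []
    for mode in range(3):
        unfolded = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(U)
    core = tensor
    for mode, U in enumerate(factors):
        core = mode_multiply(core, U.T, mode)
    # Zero every subtensor that involves a dropped pattern.
    core[:, np.asarray(drop_x, dtype=int), :] = 0.0
    core[:, :, np.asarray(drop_y, dtype=int)] = 0.0
    filtered = core
    for mode, U in enumerate(factors):
        filtered = mode_multiply(filtered, U, mode)
    return filtered

# Synthetic stand-in: 200 "genes" x 12 "time points" x 8 "time courses".
rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 12, 8))
# Hypothetically treat the second y-eigengene as a batch artifact and remove it.
cleaned = remove_patterns(signal, drop_y=[1])
print(cleaned.shape, np.linalg.norm(signal), np.linalg.norm(cleaned))
```

Because the factor matrices are orthonormal, zeroing core entries can only reduce the norm of the reconstructed tensor, which gives a quick sanity check on how much of the data a removed pattern accounted for.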

Page 14

Uncovering Effects of Replication and Origin Activity on mRNA Expression with HOSVD

1,1,1   72%   >0   Steady State

Unperturbed Cell Cycle:
2,2,1   9%   >0   ↑ M/G1 <2·10^-33   ↓ S/G2 <7·10^-16
3,3,1   7%   >0   ↑ G1/S <2·10^-77   ↓ G2/M <3·10^-36

First, ~88% of mRNA expression is independent of DNA replication.

Page 15

Replication-Dependent Perturbations

4,1,2   2.7%   >0   ↑ ARSs 3’ ~10^-2   ↓ histones <10^-12
7,3,2   0.8%   >0   ↑ histones <5·10^-4

DNA replication increases time-averaged and G1/S expression of histones.

Histones are overexpressed in the control relative to the Cdc6− condition, and to a lesser extent also relative to the Cdc45− condition (a P-value ~2·10^-15).

Second, the requirement of DNA replication for efficient histone gene expression is independent of conditions that elicit DNA damage checkpoint responses.

Page 16

Origin Binding-Dependent Perturbations

5+6,1,3   1.9%   >0   ↑ histones <2·10^-8   ↓ ARSs 3’ <2·10^-3
8,3,3   0.7%   >0   ↓ ARSs 3’ <7·10^-4

Origin binding decreases time-averaged and G2/M expression of genes with ARSs near their 3’ ends. These genes are overexpressed in the Cdc6− relative to the Cdc45− condition, and to a lesser extent also relative to the control (a P-value <4·10^-7).

→ Third, origin licensing decreases expression of genes with origins near their 3’ ends, revealing that downstream origins can regulate the expression of upstream genes.

Page 17

Experimental Verification of the Computationally Predicted Mechanism

Omberg, Meyerson, Kobayashi, Drury, Diffley & Alter, Nature MSB 5, 312 (2009);

http://www.nature.com/doifinder/10.1038/msb.2009.70

→ These experimental results reveal that downstream origins can regulate the expression of upstream genes.

→ These experimental results verify the computationally predicted mechanism of regulation that correlates binding of the licensing proteins Mcm2–7 with reduced expression of adjacent genes during the cell cycle stage G1. Alter & Golub, PNAS 101, 16577 (2004); Alter, Golub, Brown & Botstein, Proc. MNBWS 15 (2004).

→ These experimental results are also in agreement with the equivalent correlation between overexpression of binding targets of Mcm2–7 and expression in response to oxidative stress. Omberg, Golub & Alter, PNAS 104, 18371 (2007); Cocker, Piatti, Santocanale, Nasmyth & Diffley, Nature 379, 180 (1996); Blanchard et al., MBC 13, 1536 (2002).

→ This demonstrates for the first time that mathematical modeling of DNA microarray data can be used to correctly predict biological mechanisms.

Page 18

HO GSVD for Comparative Analysis of DNA Microarray Data from Multiple Organisms

Ponnapalli, Saunders, Golub & Alter, under revision.

Yeast: Spellman et al. MBC 9, 3273 (1998).

H: Whitfield et al. MBC 13, 1977 (2002).

An HO GSVD that extends to higher orders most of the mathematical properties of the GSVD,

\[
\begin{aligned}
D_1 &= U_1 \Sigma_1 V^T, \\
D_2 &= U_2 \Sigma_2 V^T, \\
&\;\;\vdots \\
D_N &= U_N \Sigma_N V^T.
\end{aligned}
\]

→ The only framework to date that is not limited to comparison of similar genes among the organisms.

→ Reveals universality and specialization that are truly on genomic scales. Alter, Brown & Botstein, PNAS 100, 3351 (2003).

Page 19

Math Variables → Biology

Genelets of almost equal significance in both datasets → processes common to both genomes:

Common Cell Cycle Subspace

Genelets of almost no significance in one dataset relative to the other → genome exclusive processes:

Exclusive Synchronization Responses Subspaces

← Saccharomyces cerevisiae   Human →

Page 20

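A Higher-Order GSVD

Definition:

\[ D_i = U_i \Sigma_i V^T, \qquad \Sigma_i = \mathrm{diag}(\sigma_{i,k}) \]

\[ S V = V \Lambda \]

\[ S \equiv \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j>i}^{N} \left( A_i A_j^{-1} + A_j A_i^{-1} \right) \]

\[ A_i = D_i^T D_i \]

Assumption: \( \exists\, V \in \mathbb{R}^{n \times n} \)

Interpretation:

\( \sigma_{i,k} / \sigma_{j,k} \approx 1 \Rightarrow v_k \) of similar significance in \( D_i \) and \( D_j \)

\( \sigma_{i,k} / \sigma_{j,k} \ll 1 \Rightarrow v_k \) of negligible significance in \( D_i \) relative to \( D_j \)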
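The definition above can be turned into a small numerical sketch. The slide gives only the definition, so the computational route here is an assumption: form each A_i = D_i^T D_i, average the pairwise quotients into S, take V from the eigensystem of S, then recover U_i and the σ_{i,k} by solving V B_i^T = D_i^T and normalizing the columns of B_i. The function name `ho_gsvd` and the synthetic datasets are hypothetical, and the sketch assumes every D_i has full column rank and that S has a real eigensystem.

```python
import numpy as np

def ho_gsvd(datasets):
    """Return (U_list, sigma_list, V) such that D_i = U_i diag(sigma_i) V^T."""
    N = len(datasets)
    A = [D.T @ D for D in datasets]                 # A_i = D_i^T D_i
    A_inv = [np.linalg.inv(a) for a in A]           # needs full column rank
    n = A[0].shape[0]
    S = np.zeros((n, n))
    for i in range(N):
        for j in range(i + 1, N):
            S += A[i] @ A_inv[j] + A[j] @ A_inv[i]
    S /= N * (N - 1)
    _, V = np.linalg.eig(S)                         # S V = V Lambda
    if np.iscomplexobj(V):
        if np.abs(V.imag).max() > 1e-8:
            raise ValueError("eigensystem of S is not real for these data")
        V = V.real
    U_list, sigma_list = [], []
    for D in datasets:
        B = np.linalg.solve(V, D.T).T               # B_i = D_i V^{-T} = U_i Sigma_i
        sigma = np.linalg.norm(B, axis=0)           # sigma_{i,k} = |b_{i,k}|
        U_list.append(B / sigma)
        sigma_list.append(sigma)
    return U_list, sigma_list, V

# Synthetic check: two datasets sharing one V, with known sigma ratios 1, 2 and 4.
rng = np.random.default_rng(0)
V_true = np.eye(3) + 0.3 * rng.normal(size=(3, 3))
U1, _ = np.linalg.qr(rng.normal(size=(10, 3)))
U2, _ = np.linalg.qr(rng.normal(size=(15, 3)))
D1 = U1 @ np.diag([1.0, 2.0, 4.0]) @ V_true.T
D2 = U2 @ np.diag([1.0, 1.0, 1.0]) @ V_true.T
U_list, sigma_list, V = ho_gsvd([D1, D2])
print(np.sort(sigma_list[0] / sigma_list[1]))
```

The printed ratios σ_{1,k}/σ_{2,k} are the quantities used in the interpretation: a ratio near 1 marks a pattern v_k of similar significance in both datasets, while a ratio far from 1 marks a pattern that is significant in only one of them.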

Page 21

Math Variables → Biology

Genelets of almost equal significance in all datasets → processes common to all genomes:

Common Cell Cycle Subspace

Page 22

Math Operations → Biology

Simultaneous Classification in the Common Cell Cycle Subspace

Schizosaccharomyces pombe: Rustici et al. Nat. Genet. 36, 809 (2004).

Saccharomyces cerevisiae: Spellman et al. MBC 9, 3273 (1998).

H: Whitfield et al. MBC 13, 1977 (2002).

Genome-wide correspondence among genes of the yeasts and human.

Page 23

The interplay between mathematical modeling and experimental measurement is at the basis of the “effectiveness of mathematics” in physics. Wigner, Commun. Pure Appl. Math. 13, 1 (1960).

Page 24

Mathematical modeling of DNA microarray data could lead beyond classification of genes and cellular samples to the discovery and ultimately also control of molecular biological mechanisms.

Andrews & Swedlow, Nikon Small World (2002).

Our mathematical models may form the basis of a future where molecular biological systems are modeled and controlled as physical systems are today.

Page 25

Thanks to

Collaborators:

John F. X. Diffley, Cancer Research UK, London
Gene H. Golub, Computer Science, Stanford
Robin R. Gutell, Integrative Biology, UT
Michael A. Saunders, Operations Research, Stanford
David Botstein, Genomics Institute, Princeton
Patrick O. Brown, Biochemistry, Stanford

Students:

Kayta Kobayashi, Pharmacy, UT
Joel R. Meyerson, BME, UT
Larsson Omberg, Physics, UT
Cheng H. Lee, BME, UT
Chaitanya Muralidhara, CMB, UT
Sri Priya Ponnapalli, ECE, UT
Daifeng Wang, ECE, UT
Justin A. Drake, BME, UT
Andrew M. Gross, BME, UT

Support:

NHGRI K01 Development Award in Genomic Research
NHGRI R01 HG004302

And, thank you!!!