Department of Biomedical Engineering, Institute for … · 2009-11-04 · From Data Patterns to...
Mathematical Modeling of DNA Microarray Data:
Discovery of Biological Mechanisms with Tensor Decompositions, and
Definitions of Novel Tensor Decompositions from Biological Applications
Orly Alter
Department of Biomedical Engineering, Institute for Cellular and Molecular Biology and
Institute for Computational Engineering and Sciences
University of Texas at Austin
DNA Microarrays Record Genomic Signals
DNA microarrays rely on hybridization to record the complete genomic signals that guide the progression of cellular processes, such as abundance levels of DNA, RNA and DNA-bound proteins on a genomic scale.
From Data Patterns to Principles of Nature
Alter, PNAS 103, 16063 (2006);
Alter, in Microarray Data Analysis: Methods and Applications (Humana Press, 2007), pp. 17–59.
Kepler’s discovery of his first law of planetary motion from mathematical modeling of Brahe’s astronomical data:
Kepler, Astronomia Nova (Voegelinus, Heidelberg, 1609), reproduced by permission of the Harry Ransom Humanities Research Center of the University of Texas, Austin, TX.
Physics-Inspired Matrix (and Tensor) Models
Mathematical frameworks for the description of the data, in which the mathematical variables and operations might represent biological reality.
SVD: Alter, Brown & Botstein, PNAS 97, 10101 (2000).
  Uncover Cellular Processes and States
  Eigenvalue Decomposition
Comparative GSVD: Alter, Brown & Botstein, PNAS 100, 3351 (2003).
  Uncover Processes Common or Exclusive Among Two Datasets
  Generalized Eigenvalue Decomposition
Integrative Pseudoinverse: Alter & Golub, PNAS 101, 16577 (2004).
  Uncover Coordination Among Multiple Sets
  Inverse Projection
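As a rough illustration of the SVD framework listed above, the following minimal sketch decomposes a genes × arrays matrix into eigenarrays and eigengenes and checks the link to the eigenvalue decomposition. The toy data, the variable names and the use of NumPy are assumptions made for illustration; none of them come from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data: 6 genes x 4 arrays (not data from the talk).
data = rng.standard_normal((6, 4))

# SVD: data = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(data, full_matrices=False)
eigenarrays = U   # columns are patterns across the genes
eigengenes = Vt   # rows are patterns across the arrays

# Fraction of the overall signal captured by each eigengene/eigenarray pair.
fractions = s**2 / np.sum(s**2)

# Link to the eigenvalue decomposition: the eigenvalues of data.T @ data
# are the squared singular values (eigvalsh sorts ascending, so reverse).
eigvals = np.linalg.eigvalsh(data.T @ data)[::-1]
```

The same pattern extends to the comparative and integrative frameworks, which swap the SVD for a generalized SVD or a pseudoinverse projection; those are not sketched here.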
Networks are Tensors of “Subnetworks”
Alter & Golub, PNAS 102, 17559 (2005);
http://www.bme.utexas.edu/research/orly/network_decomposition/.
[Figure: a network shown as a sum of subnetworks, network = subnetwork + subnetwork + ...]
The relations among the activities of genes, not only the activities of the genes alone, are known to be pathway-dependent, i.e., conditioned by the biological and experimental settings in which they are observed.
A Higher-Order SVD Predicts an Equivalent Biological Mechanism
Linear transformation of the data tensor from genes × x-settings × y-settings space to reduced “eigenarrays” × “x-eigengenes” × “y-eigengenes” space.
This HOSVD is computed from each SVD of the data tensor unfolded around one given axis.
De Lathauwer, De Moor & Vandewalle, SIMAX 21, 1253 (2000); Kolda, SIMAX 23, 243 (2001); Zhang & Golub, SIMAX 23, 543 (2001).
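The computation described above, one SVD per unfolding of the data tensor, can be sketched as follows. This is a minimal illustration under assumptions: the function names `unfold`, `hosvd` and `reconstruct` and the toy tensor are hypothetical, and the code is not the implementation used in the cited work.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten `tensor` into a matrix whose rows index axis `mode`."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply axis `mode` of `tensor` by `matrix` (shape: new x old)."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def hosvd(tensor):
    """Return (core, factors): one orthonormal basis per axis plus the core tensor."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of the unfolding around this axis.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project each axis onto its basis
    return core, factors

def reconstruct(core, factors):
    """Rebuild the tensor from the core and the per-axis bases."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

# Hypothetical toy tensor: 7 genes x 4 x-settings x 3 y-settings.
toy = np.random.default_rng(0).standard_normal((7, 4, 3))
core, (eigenarrays, x_eigengenes, y_eigengenes) = hosvd(toy)
```

In this sketch the columns of the three factor matrices play the roles of the eigenarrays, x-eigengenes and y-eigengenes, and the entries of `core` play the role of the higher-order singular values.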
mRNA Expression from Cell Cycle Time Courses under Different Conditions of Oxidative Stress
Shapira, Segal & Botstein, MBC 15, 5659 (2004); Spellman et al., MBC 9, 3273 (1998).
HOSVD Integrative Modeling
Omberg, Golub & Alter, PNAS 104, 18371 (2007);
http://www.bme.utexas.edu/research/orly/HOSVD/.
The data tensor is a superposition of all rank-1 “subtensors,” i.e., outer products of an eigenarray, an x- and a y-eigengene.
The significance of a subtensor is defined by the corresponding “fraction,” computed from the higher-order singular values.
The complexity of the data tensor is defined by the “normalized entropy.”
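The two quantities above can be illustrated with a short sketch that takes a core tensor of higher-order singular values as input. The exact normalization used in the cited work is not given in this transcript, so the choices below are assumptions: each fraction is a squared core entry divided by the sum of all squared entries, and the Shannon entropy is normalized by the logarithm of the number of subtensors. The function names are hypothetical.

```python
import numpy as np

def subtensor_fractions(core):
    """Fraction of the overall signal carried by each rank-1 subtensor.

    Assumption: fraction = squared core entry / sum of all squared entries.
    """
    squared = np.asarray(core, dtype=float) ** 2
    return squared / squared.sum()

def normalized_entropy(fractions):
    """Shannon entropy of the fractions, scaled to lie between 0 and 1.

    Assumption: normalized by log(number of subtensors); the published
    definition may use a different normalization constant.
    """
    p = np.asarray(fractions, dtype=float).ravel()
    nonzero = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(nonzero * np.log(nonzero)).sum() / np.log(p.size))

# Hypothetical core tensors: one dominant subtensor vs. evenly spread signal.
ordered = np.zeros((2, 2, 2))
ordered[0, 0, 0] = 5.0
disordered = np.ones((2, 2, 2))
```

With this normalization, a tensor dominated by a single subtensor gives an entropy near 0 and a tensor whose signal is spread evenly over all subtensors gives an entropy near 1.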
Rotation in an Approximately Degenerate Subtensor Space
An “approximately degenerate subtensor space” is defined as that which is spanned by, e.g., the subtensors
which satisfy
This HOSVD is reformulated with a unique single rank-1 subtensor that is composed of these two subtensors.
Math Variables & Operations → Biology
HOSVD uncovers independent data patterns across each variable and the interactions among them → global picture of the causal coordination among biological processes and experimental phenomena:
Equivalent DNA ↔ RNA Correlation
Overexpression of binding targets of replication initiation proteins correlates with reduced, or even inhibited, binding of the origins.
→ Replication initiation requires binding of these proteins at origins of replication.
Diffley, Cocker, Dowell & Rowley, Cell 78, 303 (1994).
→ They are involved with transcriptional silencing at the yeast mating loci.
Micklem et al., Nature 366, 87 (1993).
Either one of two previously unknown mechanisms of regulation may be underlying this correlation:
→ Replication may regulate transcription: The binding of MCM proteins represses the expression of genes that are near the origins.
→ Transcription may regulate replication: The transcription of genes reduces the efficiency of origins that are near the genes.
Donato, Chung & Tye, PLoS Genet. 2, E141 (2006); Snyder, Sapolsky & Davis, MCB 8, 2184 (1988).
→ This correlation is equivalent to a recently discovered correlation, which might be due to a previously unknown mechanism of regulation.
Alter & Golub, PNAS 101, 16577 (2004).
The first time that a data-driven mathematical model of DNA microarray data has been used to predict a cellular mechanism of regulation that is truly on a genome scale.
Analysis of Synchronized Cdc6⁻/45⁻ Cultures where DNA Replication Initiation is Prevented without Delaying Cell Cycle Progression
Omberg, Meyerson, Kobayashi, Drury, Diffley & Alter, Nature MSB 5, 312 (2009);
http://www.nature.com/doifinder/10.1038/msb.2009.70
HOSVD Detection and Removal of Artifacts
Reconstructing the data tensor of 4,270 genes × 12 time points, or x-settings × 8 time courses, or y-settings, filtering out “x-eigengenes” and “y-eigengenes” that represent experimental artifacts.
Batch-of-hybridization
Culture batch, microarray platform and protocols
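The filtering step described above can be sketched as a projection that removes chosen x- and y-eigengenes before the tensor is rebuilt. This is a minimal illustration under assumptions: the function names, the toy tensor and the choice of which eigengenes to drop are hypothetical, and deciding which eigengenes represent artifacts requires inspection that the code does not perform.

```python
import numpy as np

def mode_product(tensor, matrix, mode):
    """Multiply axis `mode` of `tensor` by `matrix` (shape: new x old)."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def axis_basis(tensor, mode):
    """Orthonormal basis of axis `mode` from the SVD of the unfolded tensor."""
    unfolded = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
    return np.linalg.svd(unfolded, full_matrices=False)[0]

def filter_eigengenes(tensor, drop_x=(), drop_y=()):
    """Rebuild a genes x x-settings x y-settings tensor without the listed eigengenes."""
    vx, vy = axis_basis(tensor, 1), axis_basis(tensor, 2)
    keep_x = [k for k in range(vx.shape[1]) if k not in drop_x]
    keep_y = [k for k in range(vy.shape[1]) if k not in drop_y]
    # Projectors onto the span of the eigengenes that are kept.
    proj_x = vx[:, keep_x] @ vx[:, keep_x].T
    proj_y = vy[:, keep_y] @ vy[:, keep_y].T
    return mode_product(mode_product(tensor, proj_x, 1), proj_y, 2)

# Hypothetical toy tensor: 20 genes x 5 time points x 4 time courses.
toy = np.random.default_rng(1).standard_normal((20, 5, 4))
# Drop the third x-eigengene and the fourth y-eigengene (indices are illustrative).
cleaned = filter_eigengenes(toy, drop_x=(2,), drop_y=(3,))
```

Projecting out an eigengene is equivalent to zeroing every subtensor built from it and summing the rest, so the rebuilt tensor keeps the original genes × x-settings × y-settings shape.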
Uncovering Effects of Replication and Origin Activity on mRNA Expression with HOSVD
Steady State
1,1,1  72%  >0
Unperturbed Cell Cycle
2,2,1  9%  >0  ↑ M/G1 <2·10⁻³³  ↓ S/G2 <7·10⁻¹⁶
3,3,1  7%  >0  ↑ G1/S <2·10⁻⁷⁷  ↓ G2/M <3·10⁻³⁶
First, ~88% of mRNA expression is independent of DNA replication.
Replication-Dependent Perturbations
4,1,2  2.7%  >0  ↑ ARSs 3’ ~10⁻²  ↓ histones <10⁻¹²
7,3,2  0.8%  >0  ↑ histones <5·10⁻⁴
DNA replication increases time-averaged and G1/S expression of histones.
Histones are overexpressed in the control relative to the Cdc6⁻ condition, and to a lesser extent also relative to the Cdc45⁻ condition (a P-value ~2·10⁻¹⁵).
Second, the requirement of DNA replication for efficient histone gene expression is independent of conditions that elicit DNA damage checkpoint responses.
Origin Binding-Dependent Perturbations
5+6,1,3  1.9%  >0  ↑ histones <2·10⁻⁸  ↓ ARSs 3’ <2·10⁻³
8,3,3  0.7%  >0  ↓ ARSs 3’ <7·10⁻⁴
Origin binding decreases time-averaged and G2/M expression of genes with ARSs near their 3’ ends. These genes are overexpressed in the Cdc6⁻ relative to the Cdc45⁻ condition, and to a lesser extent also relative to the control (a P-value <4·10⁻⁷).
→ Third, origin licensing decreases expression of genes with origins near their 3’ ends, revealing that downstream origins can regulate the expression of upstream genes.
Experimental Verification of the Computationally Predicted Mechanism
Omberg, Meyerson, Kobayashi, Drury, Diffley & Alter, Nature MSB 5, 312 (2009);
http://www.nature.com/doifinder/10.1038/msb.2009.70
→ These experimental results reveal that downstream origins can regulate the expression of upstream genes.
→ These experimental results verify the computationally predicted mechanism of regulation that correlates binding of the licensing proteins Mcm2–7 with reduced expression of adjacent genes during the cell cycle stage G1.
Alter & Golub, PNAS 101, 16577 (2004); Alter, Golub, Brown & Botstein, Proc. MNBWS 15 (2004).
→ These experimental results are also in agreement with the equivalent correlation between overexpression of binding targets of Mcm2–7 and expression in response to oxidative stress.
Omberg, Golub & Alter, PNAS 104, 18371 (2007); Cocker, Piatti, Santocanale, Nasmyth & Diffley, Nature 379, 180 (1996); Blanchard et al., MBC 13, 1536 (2002).
→ This demonstrates for the first time that mathematical modeling of DNA microarray data can be used to correctly predict biological mechanisms.
HO GSVD for Comparative Analysis of DNA Microarray Data from Multiple Organisms
Ponnapalli, Saunders, Golub & Alter, under revision.
Yeast: Spellman et al., MBC 9, 3273 (1998).
Human: Whitfield et al., MBC 13, 1977 (2002).
An HO GSVD that extends to higher orders most of the mathematical properties of the GSVD,
$$
\begin{aligned}
D_1 &= U_1 \Sigma_1 V^T, \\
D_2 &= U_2 \Sigma_2 V^T, \\
&\;\;\vdots \\
D_N &= U_N \Sigma_N V^T.
\end{aligned}
$$
→ The only framework to date that is not limited to comparison of similar genes among the organisms.
→ Reveals universality and specialization that are truly on genomic scales.
Alter, Brown & Botstein, PNAS 100, 3351 (2003).
Math Variables → Biology
Genelets of almost equal significance in both datasets → processes common to both genomes:
Common Cell Cycle Subspace
Genelets of almost no significance in one dataset relative to the other → genome exclusive processes:
Exclusive Synchronization Responses Subspaces
← Saccharomyces cerevisiae    Human →
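A Higher-Order GSVD
Definition:
$$D_i = U_i \Sigma_i V^T, \qquad \Sigma_i = \mathrm{diag}(\sigma_{i,k})$$
$$S V = V \Lambda$$
$$S \equiv \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j>i}^{N} \left( A_i A_j^{-1} + A_j A_i^{-1} \right)$$
$$A_i = D_i^T D_i$$
Assumption: $\exists\, V \in \mathbb{R}^{n \times n}$
Interpretation:
$\sigma_{i,k} / \sigma_{j,k} \approx 1 \Rightarrow v_k$ of similar significance in $D_i$ and $D_j$
$\sigma_{i,k} / \sigma_{j,k} \ll 1 \Rightarrow v_k$ of negligible significance in $D_i$ relative to $D_j$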
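The definition above can be followed step by step in a short numerical sketch: form each A_i, average the pairwise quotients into S, take the eigenvectors of S as V, and recover each U_i and Σ_i from D_i and V. The function name `ho_gsvd` and the toy matrices are hypothetical, and the sketch assumes every D_i has full column rank so that each A_i is invertible, and that the eigensystem of S is real.

```python
import numpy as np

def ho_gsvd(datasets):
    """Sketch of the HO GSVD for N >= 2 matrices that share a column dimension n."""
    n_sets = len(datasets)
    grams = [d.T @ d for d in datasets]           # A_i = D_i^T D_i
    inverses = [np.linalg.inv(a) for a in grams]  # assumes full column rank
    n = grams[0].shape[0]
    s = np.zeros((n, n))
    for i in range(n_sets):
        for j in range(i + 1, n_sets):
            s += grams[i] @ inverses[j] + grams[j] @ inverses[i]
    s /= n_sets * (n_sets - 1)
    eigvals, v = np.linalg.eig(s)                 # S V = V Lambda
    # Assumption: eigenvalues and eigenvectors are real; drop numerical noise.
    eigvals, v = np.array(eigvals.real), np.array(v.real)
    v /= np.linalg.norm(v, axis=0)                # unit-norm columns v_k
    us, sigmas = [], []
    for d in datasets:
        b = np.linalg.solve(v, d.T).T             # B_i = D_i V^{-T} = U_i Sigma_i
        sigma = np.linalg.norm(b, axis=0)         # sigma_{i,k} = column norms of B_i
        us.append(b / sigma)
        sigmas.append(sigma)
    return us, sigmas, v, eigvals

# Hypothetical toy datasets: three "organisms" with different numbers of genes
# (rows) measured across the same 4 arrays (columns).
rng = np.random.default_rng(0)
datasets = [rng.standard_normal((m, 4)) for m in (8, 10, 6)]
us, sigmas, v, eigvals = ho_gsvd(datasets)
ratios = sigmas[0] / sigmas[1]  # near 1: v_k of similar significance in D_1 and D_2
```

The columns of `v` are shared by all datasets, which is what allows the significance of each v_k to be compared across organisms through the ratios of the σ values.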
Math Variables → Biology
Genelets of almost equal significance in all datasets → processes common to all genomes:
Common Cell Cycle Subspace
Math Operations → Biology
Simultaneous Classification in the Common Cell Cycle Subspace
Schizosaccharomyces pombe: Rustici et al., Nat. Genet. 36, 809 (2004).
Saccharomyces cerevisiae: Spellman et al., MBC 9, 3273 (1998).
Human: Whitfield et al., MBC 13, 1977 (2002).
Genome-wide correspondence among genes of the yeasts and human.
The interplay between mathematical modeling and experimental measurement is at the basis of the “effectiveness of mathematics” in physics. Wigner, Commun. Pure Appl. Math. 13, 1 (1960).
Mathematical modeling of DNA microarray data could lead beyond classification of genes and cellular samples to the discovery and ultimately also control of molecular biological mechanisms.
Andrews & Swedlow, Nikon Small World (2002).
Our mathematical models may form the basis of a future where molecular biological systems are modeled and controlled as physical systems are today.
Thanks to
Collaborators:
John F. X. Diffley, Cancer Research UK, London
Gene H. Golub, Computer Science, Stanford
Robin R. Gutell, Integrative Biology, UT
Michael A. Saunders, Operations Research, Stanford
David Botstein, Genomics Institute, Princeton
Patrick O. Brown, Biochemistry, Stanford
Students:
Kayta Kobayashi, Pharmacy, UT
Joel R. Meyerson, BME, UT
Larsson Omberg, Physics, UT
Cheng H. Lee, BME, UT
Chaitanya Muralidhara, CMB, UT
Sri Priya Ponnapalli, ECE, UT
Daifeng Wang, ECE, UT
Justin A. Drake, BME, UT
Andrew M. Gross, BME, UT
Support:
NHGRI K01 Development Award in Genomic Research
NHGRI R01 HG004302
And, thank you!!!