I. Systems Biology of Circadian Rhythms & II. Neural Network … BIO_CYCLE Architecture ...

Post on 24-Jul-2020


I. Systems Biology of Circadian Rhythms

II. Neural Network Capacity

P. Baldi

University of California, Irvine

Department of Computer Science

Institute for Genomics and Bioinformatics

Center for Machine Learning and Intelligent Systems

Circadian (and Other) Rhythms are Pervasive in Biological Systems

• Circadian (≈24 hours): sleep/wake

• Ultradian (<24 hours): EEG activity during sleep

• Seasonal (>24 hours): hibernation

Strumwasser, F. (1960) Some physiological principles governing hibernation. Bulletin of the Museum of Comparative Zoology, 124, Harvard University

Circadian Rhythms are Self-Perpetuating

Observation: plant leaves continued to fold rhythmically, even in constant darkness.

Observation: plant leaves continued to fold rhythmically, but with a period that deviated slightly from 24 hours. This indicated an endogenous, free-running clock.

Jean-Jacques d'Ortous de Mairan

Augustin Pyramus de Candolle

Mimosa pudica


Astronauts in Caves: Michel Siffre

Circadian Clocks are Tethered to the Environment via the SCN

Cell, 155, 7, 1464-1478, 2013.

Autonomous Core Clock


Consequences of the Clock Breaking Down

• Sleep disorders (FASPS, etc.)

• Depression

• Obesity/metabolic disorders

• Aging

Towards Personalized Medicine

Zhang et al. PNAS, 111, 45, 2014.

BIO_CYCLE

• Given a circadian time series for a transcript, metabolite, protein, etc., determine whether it is periodic, with an associated statistical significance.

• If periodic, estimate the period, the phase, and the amplitude.
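To make the estimation task concrete, here is a minimal sketch of a classical harmonic fit. This is not the BIO_CYCLE network; it is a much simpler stand-in that recovers amplitude and peak phase when the period is assumed known. It further assumes evenly spaced samples covering a whole number of periods. The function name `harmonic_fit` and the synthetic data are illustrative only.

```python
import math

def harmonic_fit(times, values, period=24.0):
    """Fit y ~ mesor + a*cos(w t) + b*sin(w t) by projection.

    Assumes evenly spaced samples spanning a whole number of periods,
    so that the cosine and sine regressors are orthogonal.
    Returns (mesor, amplitude, peak_time).
    """
    n = len(values)
    w = 2.0 * math.pi / period
    mesor = sum(values) / n
    a = 2.0 / n * sum(y * math.cos(w * t) for t, y in zip(times, values))
    b = 2.0 / n * sum(y * math.sin(w * t) for t, y in zip(times, values))
    amplitude = math.hypot(a, b)
    peak_time = (math.atan2(b, a) / w) % period
    return mesor, amplitude, peak_time

# Synthetic example: sampled every 4 h for 48 h, with a peak at hour 6.
times = [4.0 * k for k in range(12)]
values = [5.0 + 2.0 * math.cos(2.0 * math.pi * (t - 6.0) / 24.0) for t in times]
mesor, amplitude, peak_time = harmonic_fit(times, values)
```

A fit like this gives no significance estimate and cannot discover the period, which is where a trained classifier and regressor add value.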


BIO_CYCLE Architecture


3 Hidden Layers, 100 Units Each

Sigmoidal: Periodic/Aperiodic

Linear: Period

Additional Computation of Amplitudes, Phases, p-values, q-values
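The layer sizes above can be sketched as a forward pass. The slide gives only the depth, the width, and the two output types, so several details here are assumptions: the input length (`n_timepoints`), the tanh hidden nonlinearity, the random untrained weights, and the choice to attach both outputs to one shared network rather than to two separate networks.

```python
import numpy as np

def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    weights = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
    return weights, np.zeros(n_out)

def forward(x, hidden, head_periodic, head_period):
    """Forward pass: three hidden layers, then two output units."""
    h = x
    for W, b in hidden:
        h = np.tanh(h @ W + b)              # hidden nonlinearity: an assumption
    Wc, bc = head_periodic
    Wr, br = head_period
    p_periodic = 1.0 / (1.0 + np.exp(-(h @ Wc + bc)))  # sigmoidal output
    period = h @ Wr + br                                # linear output
    return p_periodic, period

rng = np.random.default_rng(0)
n_timepoints = 24                       # hypothetical input length
sizes = [n_timepoints, 100, 100, 100]   # 3 hidden layers, 100 units each
hidden = [init_layer(rng, a, b) for a, b in zip(sizes[:-1], sizes[1:])]
head_periodic = init_layer(rng, 100, 1)
head_period = init_layer(rng, 100, 1)

batch = rng.normal(size=(5, n_timepoints))  # 5 dummy signals, untrained network
p_periodic, period = forward(batch, hidden, head_periodic, head_period)
```

With untrained weights the outputs are meaningless; the sketch only shows the shapes and the two output types.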

Sample of Training Signals

Training set: 1-10M examples

Evaluation


BIO_TIME

• Given measurements for transcripts, metabolites, proteins, etc., taken at a single time point, determine the time (or phase).

• Initially use only core clock transcripts in mouse.


Circadian Autoencoder Neural Network

BIO_TIME Architecture


10-16 Core Clock Genes

$\cos = U_1 / \sqrt{U_1^2 + U_2^2}$

$\sin = U_2 / \sqrt{U_1^2 + U_2^2}$
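The normalization above places the two raw outputs on the unit circle, so they can be read as the cosine and sine of a phase angle. A minimal sketch of that step follows; the function name `outputs_to_time` and the convention that angle 0 corresponds to time 0 are assumptions, not taken from the slides.

```python
import math

def outputs_to_time(u1, u2, period=24.0):
    """Normalize two raw outputs (u1, u2) and map them to a time in [0, period)."""
    norm = math.sqrt(u1 * u1 + u2 * u2)
    if norm == 0.0:
        raise ValueError("u1 and u2 cannot both be zero")
    cos_t, sin_t = u1 / norm, u2 / norm
    angle = math.atan2(sin_t, cos_t)        # phase angle in (-pi, pi]
    time = (angle / (2.0 * math.pi) * period) % period
    return cos_t, sin_t, time

# Under the assumed convention, outputs pointing along +U2 correspond to hour 6.
cos_t, sin_t, time = outputs_to_time(0.0, 3.0)
```

Predicting a point on the circle instead of a raw hour avoids the discontinuity between 23:59 and 00:00.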

BIO_TIME EVALUATION

• For wild-type (WT) samples, predicts time to within 90 minutes, fairly robustly across tissues.

• Used to impute a time for all mouse Gene Expression Omnibus (GEO) experiments.

• Challenges for extending to other species.


Examples of High-Throughput Experiments


S. Masri, T. Papagiannakopoulos, K. Kinouchi, Y. Liu, M. Cervantes, P. Baldi, T. Jacks, and P. Sassone-Corsi. Lung Adenocarcinoma Distally Rewires Hepatic Circadian Homeostasis. Cell, 165, 4, 896-909, (2016).

S. Masri, P. Rigor, M. Cervantes, N. Ceglia, C. Sebastian, C. Xiao, M. Roqueta-Rivera, C. Deng, T. F. Osborne, R. Mostoslavsky, P. Baldi, and P. Sassone-Corsi. Partitioning Circadian Transcription by SIRT6 Leads to Segregated Control of Cellular Metabolism. Cell, 158, 3, 659-672, (2014).

K. L. Eckel-Mahan, V. R. Patel, S. de Mateo, N. J. Ceglia, S. Sahar, S. Dilag, K. A. Dyar, R. Orozco-Solis, P. Baldi, and P. Sassone-Corsi. Reprogramming of the Circadian Clock by Nutritional Challenge. Cell, 155, 7, 1464-1478, (2013).

M. M. Bellet, E. Deriu, J. Liu, B. Grimaldi, C. Blaschitz, M. Zeller, R. A. Edwards, S. Sahar, S. Dandekar, P. Baldi, M. D. George, M. Raffatellu, and P. Sassone-Corsi. The Circadian Clock Regulates the Host Response to Salmonella. PNAS, 110, 24, 9897-9902, (2013).

K. L. Eckel-Mahan, V. R. Patel, K. S. Vignola, R. P. Mohney, P. Baldi, and P. Sassone-Corsi. Coordination of Metabolome and Transcriptome by the Circadian Clock. PNAS, 109, 14, 5541-5546, (2012).

http://circadiomics.igb.uci.edu

Nature Methods, 9, 8, 772-773, 2012.

Nucleic Acids Research, Web Server Issue, in press, (2018).

Cell, 155, 7, 1464-1478, 2013.

Trends in Cell Biology, 24, 329-331, 2014.

Bioinformatics, 31, 19, 2015.

At p = 0.05, 68% oscillate.

95% with more recent data sets.

At p = 0.05, 67% oscillate.


Main Findings

1. The core clock contains only a dozen genes.

2. In any tissue/condition, 10% (± 5%) of transcripts or metabolites oscillate.

3. The overlap across tissues/conditions is small (2%).

4. Genetic or environmental perturbations result in massive changes:
– Amplitude changes (including suppression)
– Phase changes
– New oscillations

Explanation

1. In general, molecular species in isolation do not oscillate.

2. Loops of interacting (regulatory, metabolic, PPI) species can oscillate.

3. Many oscillator loops in the cell.

4. Why do they tend to have an intrinsic period of 24h?
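Points 1 and 2 can be illustrated with a toy model. The sketch below is an assumption-laden example, not a model from the talk: it integrates a ring of mutual repressors with Hill-type repression and unit decay, using a simple Euler scheme. The parameters `alpha` and `hill` are arbitrary illustrative values. With one species (even when it is allowed to repress itself) the level settles to a constant; with a ring of three, sustained oscillations appear.

```python
def simulate_ring(n_species, alpha=10.0, hill=4, dt=0.01, steps=20000):
    """Euler integration of dx_i/dt = alpha / (1 + x_{i-1}^hill) - x_i.

    Species i is repressed by species i-1 (indices wrap around the ring).
    Returns the trajectory of species 0.
    """
    x = [1.0 + 0.5 * i for i in range(n_species)]   # asymmetric starting point
    trace = []
    for _ in range(steps):
        dx = [alpha / (1.0 + x[i - 1] ** hill) - x[i] for i in range(n_species)]
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        trace.append(x[0])
    return trace

def late_swing(trace):
    """Peak-to-trough range over the second half of a trajectory."""
    tail = trace[len(trace) // 2:]
    return max(tail) - min(tail)

single_swing = late_swing(simulate_ring(1))   # one species: settles down
ring_swing = late_swing(simulate_ring(3))     # three-species loop: keeps cycling
```

The period of this toy loop is set by its arbitrary rate constants; it says nothing about why biological loops sit near 24 h.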

Physical objects have intrinsic vibration frequencies…

Number of Revolutions since the Origin of Life

$3.5 \times 10^{9} \times 365 \approx 1.3 \times 10^{12}$

More like: $2 \times 10^{12}$ (period has increased due to tidal effects)
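The arithmetic behind these two figures is easy to check. The sketch below reproduces the first estimate and then works out what average day length the larger estimate would imply, assuming the length of the year in hours has stayed constant. That implied value is simply a consequence of the two numbers on the slide, not an independent estimate.

```python
years_since_origin = 3.5e9     # figure used on the slide
days_per_year_today = 365

# First estimate: today's day count applied to the whole history.
naive_rotations = years_since_origin * days_per_year_today   # about 1.3e12

# Larger estimate quoted on the slide.
slide_estimate = 2e12

# Average days per year, and average day length, implied by the larger figure
# (assuming the year has always lasted 365 * 24 hours).
implied_days_per_year = slide_estimate / years_since_origin
implied_mean_day_hours = days_per_year_today * 24 / implied_days_per_year
```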

Cyanobacteria


Network of Coupled Circadian Oscillators: Spectrum of Models

• At one extreme, completely centralized. The core clock controls all the oscillators.

• At the other extreme, completely decentralized. The oscillators compete and self-organize.

• Biology is somewhere in between. Where?


Circadian Regulatory Control (CRC)

• TF → protein-coding transcript if and only if:
1) TF and transcript are circadian at some p-value (BIO_CYCLE);
2) TF has binding sites in the promoter of the transcript (MotifMap, MotifMap-RNA);
3) TF and transcript have the “right” phase lag.

Similarly for RBPs (taking the introns or UTRs of the target transcript).
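The three conditions combine into a simple conjunction, sketched below. The names `Rhythm`, `phase_lag`, and `crc_edge`, the significance threshold `alpha`, and the lag window are all hypothetical placeholders; the slides do not state which p-value cutoff or which lags count as "right".

```python
from dataclasses import dataclass

@dataclass
class Rhythm:
    p_value: float   # periodicity p-value (for example from BIO_CYCLE)
    phase: float     # peak time in hours, in [0, 24)

def phase_lag(regulator, target, period=24.0):
    """Hours by which the target peaks after the regulator, in [0, period)."""
    return (target.phase - regulator.phase) % period

def crc_edge(regulator, target, has_binding_site,
             alpha=0.05, lag_window=(1.0, 6.0)):
    """Return True only if all three conditions hold.

    alpha and lag_window are hypothetical placeholders.
    """
    circadian = regulator.p_value <= alpha and target.p_value <= alpha
    lo, hi = lag_window
    right_lag = lo <= phase_lag(regulator, target) <= hi
    return circadian and has_binding_site and right_lag

tf = Rhythm(p_value=0.01, phase=22.0)
transcript = Rhythm(p_value=0.02, phase=2.0)   # peaks 4 h after the TF
edge = crc_edge(tf, transcript, has_binding_site=True)
```

Taking the lag modulo the period matters here: a regulator peaking at hour 22 and a target peaking at hour 2 are 4 hours apart, not 20.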


Empirical Distribution of Lags


Tables showing the ranking of circadian TFs and RBPs by CRC E-score in different tissue types. The leftmost table shows ranking in mouse transcriptome across all datasets.

RBPs are labeled in red while TFs are labeled in black. Core clock TFs have been removed from the listing.


Highly enriched in olfactory GPCRs.


Hierarchical Organization

• Core Clock at the apex (level 0)

• Level 1 (35%, distance 1)

• Level 2 (70%, distance 2)

• Level 3 (80%, distance 3)

• Fan-out decreases with distance.

• There is feedback between levels.

• Most of cellular reprogramming must occur at level 1.

• Small set of transcripts that do not oscillate in any experiment: highly enriched in olfactory GPCRs.

Summary

• Roughly 10% of all molecular species oscillate in any cell/tissue/condition, with small overlaps beyond the core clock.

• Genetic, epigenetic, and environmental conditions (e.g. diet) have a profound effect on which species oscillate and lead to cellular reprogramming.

• Tools: BIO_CYCLE, BIO_TIME, CircadiOmics.

• Coupled-circadian-oscillator networks provide a general framework.

• Hierarchical organization emanating from the core clock.

• Precision medicine: monitor and optimize health by monitoring and optimizing oscillations.

• Precision medicine: new diagnostic tools. Optimize timing of therapeutic interventions (drugs).

• Even a 5% increase in efficacy could have significant impact.

II. Neural Network Capacity


Neural Network Capacity

• h = target function (typically known from examples)

• A = class of hypothesis or approximating functions (typically associated with a NN architecture)

$C(A) = \log_2 |A|$

• Can we compute C(A) for specific, interesting neural networks?

Neural Network Capacity

• Assume neural networks of linear or polynomial threshold gates (Boolean functions)

f = sign

Threshold Gates

• Linear Threshold Gates: $y = \mathrm{sign}\left[\sum_i w_i x_i\right]$

• Polynomial Threshold Gates: $y = \mathrm{sign}\left[P_d(x)\right]$

• Variations:
– Homogeneous
– Binary weights
– Positive weights

Network Capacity

• Given a network of linear or polynomial threshold gates, |A| is finite.

• We define the capacity as:

$C(\text{network}) = \log_2 |A| = \log_2(\text{number of Boolean functions that can be implemented by the network})$

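Capacity of a Single Linear Threshold Gate

[Figure: the $2^{2^N}$ Boolean functions of N variables, with examples AND, OR, NOT, GEQ, LEQ, SINGLE, PARITY, CONNECTED, PAIR; number of linear threshold functions: ?]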

Capacity Of Linear Threshold Gates

$C[LTG(N)] \le N^2$ (T. Cover 1965)

$cN^2 \le C[LTG(N)]$, with $c < 1$ (S. Muroga 1965)

$C[LTG(N)] = N^2 (1 + o(1))$ (Yu. A. Zuev 1989)
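For very small N the capacity can be computed by brute force. The sketch below counts the distinct Boolean functions realized by a single gate and takes the base-2 logarithm. It rests on assumptions that may differ from the setting of the theorems: inputs in {0, 1}, a bias (threshold) term, and integer weights of magnitude at most 3, which is enough to reach every threshold function for N ≤ 3. The function name `count_threshold_functions` is illustrative.

```python
import math
from itertools import product

def count_threshold_functions(n, max_weight=3):
    """Count distinct functions x -> [w.x > threshold] on {0,1}^n."""
    inputs = list(product((0, 1), repeat=n))
    bound = n * max_weight
    functions = set()
    for w in product(range(-max_weight, max_weight + 1), repeat=n):
        sums = [sum(wi * xi for wi, xi in zip(w, x)) for x in inputs]
        for k in range(-bound - 1, bound + 1):
            threshold = k + 0.5           # half-integer, so ties never occur
            functions.add(tuple(s > threshold for s in sums))
    return len(functions)

count_2 = count_threshold_functions(2)    # all 16 functions except XOR and XNOR
count_3 = count_threshold_functions(3)
capacity_2 = math.log2(count_2)
capacity_3 = math.log2(count_3)
```

Exhaustive counting stops being feasible almost immediately, which is why asymptotic results such as the ones above are needed.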

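Capacity of a Single Linear Threshold Gate

[Figure: the $2^{2^N}$ Boolean functions of N variables, of which about $2^{N^2}$ are linear threshold functions; examples AND, OR, NOT, GEQ, LEQ, SINGLE, PARITY, CONNECTED, PAIR]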

Capacity Of Polynomial Threshold Gates

$C[PTG(N,d)] \le \frac{N^{d+1}}{d!}$ (P.B. 1988)

$\binom{N}{d+1} \le C[PTG(N,d)]$ (M. Saks 1993)

$C[PTG(N,d)] = \frac{N^{d+1}}{d!} (1 + o(1))$ (P.B. and R.V. 2018)
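A quick numerical comparison shows how far apart the earlier bounds were. The sketch assumes that the lower bound attributed to M. Saks is the binomial coefficient "N choose d+1"; the function name `ptg_bounds` is illustrative. Since that binomial grows like N^(d+1)/(d+1)!, the two bounds differ by a factor approaching d+1, and the 2018 result places the capacity at the upper one.

```python
import math

def ptg_bounds(n, d):
    """Lower and upper bounds on the capacity of a degree-d gate on n inputs."""
    lower = math.comb(n, d + 1)                 # assumed reading of the lower bound
    upper = n ** (d + 1) / math.factorial(d)    # N^(d+1) / d!
    return lower, upper

lower, upper = ptg_bounds(100, 2)
ratio_large_n = ptg_bounds(10000, 2)[1] / ptg_bounds(10000, 2)[0]
```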

Additional Results

• Binary weights ($w_i = -1$ or $+1$): $C(\text{Binary-Weight LTG}) = N$

• Positive weights ($w_i \ge 0$): $C(\text{Positive-Weight LTG}) = N^2 - N$

• ReLU: $C(\text{ReLU}) = N^2 + N$

[Figure: classes of functions and their capacities]

• Linear threshold functions with binary weights (d = 1): $N$

• Linear threshold functions with positive weights (d = 1): $N^2 - N$

• Linear threshold functions (d = 1): $N^2$

• ReLU: $N^2 + N$

• Polynomial threshold functions of degree d: $N^{d+1}/d!$ (for d = 2: $N^3/2$)

• All Boolean functions of N variables: $2^N$

What about Networks?

• Focus on LTG, but everything can be extended to PTG.

• Fully connected RNN with N linear threshold gates: $C(RNN) = N^3$

• More generally, for any neural network NN with N linear threshold gates:

$C(NN) \le \sum_i \text{capacities} = \sum_i (\text{fan-in}_i)^2 \le N (\max_i \text{fan-in}_i)^2$
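The sum-of-squared-fan-ins bound is straightforward to evaluate for a concrete architecture. The sketch below assumes a fully connected layered feedforward network and uses the leading-order per-gate capacity (fan-in squared), dropping lower-order terms. The function names are illustrative.

```python
def capacity_upper_bound(layer_sizes):
    """Sum of squared fan-ins over all gates of a fully connected layered
    feedforward network, e.g. layer_sizes = [N, M, 1]."""
    total = 0
    for fan_in, n_units in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_units * fan_in ** 2     # each unit in the layer sees fan_in inputs
    return total

def recurrent_upper_bound(n_units):
    """Fully connected recurrent network: N gates, each with fan-in N."""
    return n_units * n_units ** 2

feedforward_bound = capacity_upper_bound([10, 5, 1])   # 5 * 10^2 + 1 * 5^2
recurrent_bound = recurrent_upper_bound(10)            # 10^3
```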

Neural Network Capacity

• The capacity satisfies: $C(NN) \le \sum \text{capacities}$

• The capacity of a polynomial-size network can be expressed as a polynomial. The capacity of a network with N LTG units is at most $N^3$.

• Can we get estimates on the capacity?

• Can we get lower bounds on the capacity?

• Can we compare different architectures?

Neural Network Capacity with Single Hidden-Layer

[Figure: network with N input units, M hidden units, and 1 output unit]

Theorem: $C[N,M,1] = MN^2 (1 + o(1))$

For instance, if $M = \alpha N$: $C \approx \alpha N^3$
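The substitution can be written out explicitly, together with a comparison against the sum of squared fan-ins (M hidden gates of fan-in N plus one output gate of fan-in M). This is a worked check under the assumption that α is a fixed constant as N grows.

```latex
\begin{align*}
C[N, M, 1] &= M N^2 \,(1 + o(1)) \\
M = \alpha N \;\Rightarrow\; C[N, \alpha N, 1] &= \alpha N \cdot N^2 \,(1 + o(1)) = \alpha N^3 \,(1 + o(1)) \\
\sum_i (\text{fan-in}_i)^2 &= M N^2 + M^2 = \alpha N^3 + \alpha^2 N^2
\end{align*}
```

For fixed α the extra term α²N² is of lower order than αN³, so in this architecture the sum-of-capacities upper bound matches the theorem to leading order.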

Conclusions

• Precise definition of capacity: $C = \log_2[\#\text{functions}]$.

• $C[PTG(N,d)] = \frac{N^{d+1}}{d!}(1+o(1))$.

• $C[LTG(N)] = N^2(1+o(1))$.

• Extensions to special cases (binary weights, positive weights).

• The capacity of a fully connected network with N units is $N^3$.

• The capacity of feedforward networks can be estimated. It is a low-degree polynomial in the relevant variables (fan-ins, layer sizes).

• $C[N,M,1] = MN^2(1+o(1))$

Conclusions

• The capacity can be used to compare different architectures.

• Ongoing work: deep networks versus shallow networks, finite size versus asymptotic.

THANK YOU


Diet Change

[Venn diagram: Normal Chow (2221), overlap (1517), High-Fat (1110)]


Example: Clock and Bmal1 form a complex and bind to tandem E-boxes in the promoter of Upp2, driving the rhythmic expression of Upp2 and, in turn, rhythmic levels of uracil and uridine.

PNAS, 109 (14) 5541-5546, 2012.

Coupled-Circadian-Oscillators Framework

• Many oscillatory loops

• Intrinsic periodicity close to 24 h (evolution)

• Coupled oscillators

• Many coupling mechanisms:
– ≈10% of genes are in a directed loop containing Clock or Bmal1
– ≈60% of genes are within two hops from Clock or Bmal1
– odd/even loops

• Amplitude limited by energy balance (homeostasis)
