Gene Expression Analysis, DNA Chips and Genetic Networksrshamir/algmb/presentations/cg-intro...

Computational Genomics Irit Gat-Viks & Ron Shamir & Haim Wolfson Fall 2015-16 1 CG © 2015


Computational Genomics

Irit Gat-Viks & Ron Shamir & Haim Wolfson Fall 2015-16

1 CG © 2015


What’s in class this week

• Motivation
• Administration
• Some very basic biology
• Some very basic biotechnology
• Examples of our type of computational problems

Bioinformatics

• The information science of biology: organize, store, analyze, visualize biological data
• Responds to the explosion of biological data, and builds on the IT revolution

Paradigm shift in biological research

Classical biology: focus on a single gene or sub-system. Hypothesis driven

Systems biology: measure (or model) the behavior of numerous parts of an entire biological system. Hypothesis generating

Large-scale data; Bioinformatics


Personalized medicine


Administration

• ~5 home assignments as part of a home exam, to be done independently (50%)
• Final exam (50%)
• Must pass the final exam to pass the course (TAU rules)
• Classes: Tue 12:15-13:30; Thu 14:45-16:00
• TA: Ron Zeira (Thu 16-17).

Administration (cont.)

• Web page of the course: http://www.cs.tau.ac.il/~rshamir/cg/15/
• Includes slides and full lecture scribes from previous years for each of the classes.

Bibliography

• No single textbook covers the course :-(
• See the full bibliography list on the website (also for basic biology)
• Key sources:
– Gusfield: Algorithms on Strings, Trees and Sequences
– Durbin et al.: Biological Sequence Analysis
– Pevzner: Computational Molecular Biology
– Pevzner and Shamir (eds.): Bioinformatics for Biologists

learn.genetics.utah.edu

Lecture 1: Introduction

1. Basic biology
2. Basic biotechnology
+ some computational challenges arising along the way

Slides prepared mainly by Ron Shamir and Adi Akavia

1. Basic Biology

•Touches on Chapters 1-8 in “The Cell” by Alberts et al.


The Cell

• Basic unit of life.
• Carries complete characteristics of the species.
• All cells store hereditary information in DNA.
• All cells transform DNA to proteins, which are “the robots of the cell” and determine the cell’s structure and function.
• Two classes: eukaryotes (with nucleus) and prokaryotes (without).

http://regentsprep.org/Regents/biology/units/organization/cell.gif

Nucleotide Chain; Double helix

sugar, phosphate

Nucleotides / Bases: Adenine (A), Guanine (G), Cytosine (C), Thymine (T).

Weak hydrogen bonds between base pairs

Strong covalent bonds (phosphodiester linkage) between sugars

Gregor Mendel: laws of inheritance, “gene” (1866)

Watson and Crick: DNA structure (1953)

DNA (Deoxyribonucleic acid)

• Bases:
– Adenine (A), Guanine (G): purines
– Cytosine (C), Thymine (T): pyrimidines
• Bonds:
– G - C
– A - T
• Oriented from 5’ to 3’.
• Located in the cell nucleus (in eukaryotes)

DNA and Chromosomes

• DNA is packaged (10^5-fold)
• Chromatin: complex of DNA and the proteins that pack it (histones)
• Chromosome: contiguous stretch of DNA
• Diploid: two homologous copies of each chromosome, one from each parent
• Genome: totality of DNA material

Replication

Replication fork


Genes

• Gene: a segment of DNA that specifies a protein.
• The transformation of a gene into a protein is called expression.
• Genes are < 3% of human DNA
• The rest is non-coding (used to be called “junk DNA”):
– RNA elements
– Regulatory regions
– Retrotransposons
– Pseudogenes
– and more…

Gene Structure

The Gene Finding Problem

Given a DNA sequence, predict the location of genes (open reading frames), exons and introns.

• A simple solution: seeking stop codons (a long stretch with no stop codon is a candidate open reading frame).
• 6 ways of interpreting a DNA sequence (3 reading frames on each of the 2 strands)
• In most cases of eukaryotic DNA, a segment encodes only one gene.
• Difficulty in eukaryotic DNA: introns & exons
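
The stop-codon scan over the six reading frames can be sketched as below. This is only an illustration, not a real gene finder: it assumes ATG as the only start codon and the three standard stop codons, and it ignores introns entirely. The names `find_orfs` and `min_codons` are hypothetical.

```python
# Minimal six-frame ORF scan (illustrative; ignores introns and splicing).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna, min_codons=2):
    """Return (strand, frame, start, end) for each ATG...stop stretch.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    being scanned (for "-" that is the reverse complement).
    """
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open a candidate reading frame
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # close it at the stop codon
    return orfs
```

For example, `find_orfs("ATGAAATTTTAG")` reports a single open reading frame on the forward strand in frame 0.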


Proteins: The Cellular Machines


Proteins

• Build the cell and drive most of its functions.
• Polymers of amino acids (20 total), linked by peptide bonds.
• Oriented (from amino to carboxyl group).
• Fold into 3D structure of lowest energy.

DNA → RNA → protein (DNA to RNA: transcription; RNA to protein: translation)

Analogy: DNA = the hard disk; RNA = one program; protein = its output

http://www.ornl.gov/hgmis/publicat/tko/index.htm

RNA (Ribonucleic acid)

• Bases:
– Adenine (A)
– Guanine (G)
– Cytosine (C)
– Uracil (U); replaces T
• Oriented from 5’ (phosphate) to 3’ (sugar).
• Single-stranded => flexible backbone => secondary structure => catalytic role.

The RNA Folding Problem

Given an RNA sequence, predict its (secondary structure) folding = the one that creates a maximum number of matched pairs

http://www.phys.ens.fr/~wiese/highlights/RNA-folding.html

GCCUUAAUGCACAUGGGCAAGCCCACGUAGCUAGUCGCGCGACACCAGUCCCAAAUAUGUUCACCCAACUCGCCUGACCGUCCCGCAGUAGCUAUACUACCGACUCCUACGCGGUUGAAACUAGACUUUUCUAGCGAGCUGUCAUAGGUAUGGUGCACUGUCUUUAAUUUUGUAUUGGGCCAGGCACGAAAGGCUUGGAAGUAAGGCCCCGCUUGACCCGAGAGGUGACAAUAGCGGCCAGGUGUAACGAUACGCGGGUGGCACGUACCCCAAACAAUUAAUCACACUGCCCGGGCUCACAUUAAUCAUGCCAUUCGUUGCCGAUCCGACCCAUAAGGAUGUGUAUGCCUCAUUCCCGGUCGGGGCGGCGACUGUUAACGCAUGAGAACUGAUUAGAUCUCGUGGUAGUGCUUGUCAAAUAGAAUGAGGCCAUUCCACAGACAUAGCGUUUCCCAUGAGCUAGGGGUCCCAUGUCCAGGUCCCCUAAAUAAAAGAGUCUCAC
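
Maximizing the number of matched pairs can be done by dynamic programming if one additionally assumes that pairs are nested (no pseudoknots), in the style of Nussinov's algorithm. The sketch below is an illustration under stated assumptions: only Watson-Crick pairs (A-U, G-C) count, and no minimum loop length is enforced unless `min_loop` is set. The names `max_base_pairs` and `min_loop` are hypothetical.

```python
# Nussinov-style dynamic program: maximum number of nested base pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def max_base_pairs(rna, min_loop=0):
    """Return the maximum number of nested Watson-Crick pairs in rna.

    min_loop is the minimum number of unpaired bases required between
    the two partners of a pair (0 means no constraint).
    """
    n = len(rna)
    if n == 0:
        return 0
    # dp[i][j] = best number of pairs within rna[i..j] (inclusive)
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (rna[k], rna[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    inner = dp[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    best = max(best, left + 1 + inner)
            dp[i][j] = best
    return dp[0][n - 1]
```

The table has O(n^2) cells and each takes O(n) work, so the running time is cubic in the sequence length.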


Transcription

http://www.iacr.bbsrc.ac.uk/notebook/courses/guide/words/transcriptiongif.htm

Template


The Genetic Code

• Codon - a triplet of bases, codes a specific amino acid (except the stop codons)

• Stop codons - signal termination of the protein synthesis process

• Different codons may code the same amino acid
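
The three points above can be illustrated with a small translation routine. This is a sketch assuming the standard genetic code, written over the RNA alphabet; the names `CODON_TABLE` and `translate` are hypothetical, and the routine simply reads from the first base rather than searching for a start codon.

```python
# Standard genetic code, built from the usual U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with U
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna):
    """Translate codon by codon from position 0 until a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: terminate synthesis
            break
        protein.append(amino_acid)
    return "".join(protein)
```

Here `CODON_TABLE["GCU"]` and `CODON_TABLE["GCC"]` both give alanine, which shows that different codons may code for the same amino acid.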

http://ntri.tamuk.edu/cell/ribosomes.html

Translation

http://biology.kenyon.edu/courses/biol114/Chap05/Chapter05.html#Protein

Expression and Regulation

Gene: DNA → RNA → Protein (transcription, then translation)

Transcription factors (TFs): proteins that control transcription by binding to specific DNA sequence motifs.

Proteins: The Cellular Machines

The Protein Folding Problem

• Given a sequence of amino acids, predict the 3D structure of the protein.
• Motivation: functionality of a protein is determined by its 3D structure.
• Solution approaches:
– Homology
– Threading
– de novo (= from scratch)

The Human Genome: numbers

• 23 pairs of chromosomes
• ~3,200,000,000 bases
• ~21,000 genes
• Gene length: 1000-3000 bases, spanning 30-40K bases

Model Organisms

• Eukaryotes; increasing complexity
• Easy to grow, manipulate.
• Lots of common ground with humans: many / most genes are common, but with mutations

Budding yeast: 1 cell, 6K genes
Nematode worm: 959 cells, 19K genes
Fruit fly: vertebrate-like, 14K genes
Mouse: mammal, 30K genes

Sequence Alignment problems

• Given two sequences, find their best alignment: match with insertions/deletions of minimum cost.
• Same for the best match of contiguous subsequences.
• Same for several sequences.
• “Workhorse” of Bioinformatics!
• Key challenge: huge volume of data (more on this later)
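
The minimum-cost alignment of two sequences can be computed by dynamic programming. The sketch below returns only the cost, not the alignment itself, and assumes unit costs for a mismatch and for each inserted or deleted character; the name `alignment_cost` and its parameters are hypothetical.

```python
def alignment_cost(a, b, mismatch=1, gap=1):
    """Minimum cost of globally aligning a and b (matches cost 0)."""
    # prev[j] = cost of aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i, char_a in enumerate(a, start=1):
        cur = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if char_a == char_b else mismatch)
            delete = prev[j] + gap      # char_a aligned to a gap
            insert = cur[j - 1] + gap   # char_b aligned to a gap
            cur.append(min(substitute, delete, insert))
        prev = cur
    return prev[-1]
```

Time is proportional to the product of the two lengths, which is why the volume of data is the key challenge; keeping only two rows limits memory to the length of the second sequence.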


Introduction II: Basic Biotechnology and computational challenges

Ron Shamir and Roded Sharan CG, Fall 2014-15

Restriction Enzymes

• Natural role: break foreign DNA entering the cell.
• Ability:
– Breaks the phosphodiester bonds of a DNA molecule wherever a certain cleavage (cut) sequence appears.
– Different sequence for each enzyme
– Hundreds of different enzymes known.
• Digestion = application of restriction enzymes to a sequence.
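
Digestion can be simulated on a string. The sketch below assumes a linear molecule and models one strand only; `digest`, `site` and `cut_offset` are hypothetical names. The usage note relies on the well-known EcoRI site GAATTC, which is cut after the first G.

```python
def digest(seq, site, cut_offset):
    """Cut seq at every occurrence of site and return the fragments.

    cut_offset is the position of the cut inside the recognition site
    (for example 1 for a site cut right after its first base).
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)  # also catches overlapping sites
    bounds = [0] + cuts + [len(seq)]
    # Skip empty fragments that arise when a cut falls at either end.
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]
```

For instance, `digest("AAGAATTCTTGAATTCGG", "GAATTC", 1)` yields three fragments whose lengths (3, 8 and 7) are what a gel would measure.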


Cloning

Cloning vector (plasmids) + foreign DNA → recombinant DNA → introduction into host cell → use of antibiotics to grow recombinant cells

PCR

Each cycle: Denaturation, Annealing, Extension (figure shows Cycle 1, Cycle 2, Cycle 3)

http://www.atdbio.com/content/20/Sequencing-forensic-analysis-and-genetic-analysis

Gel Electrophoresis

• Use: “race” digested DNA fragments through electrically charged gel
• Goals:
– Separate a mixture of DNA fragments
– Measure length of DNA fragments
• How does it work:
– smaller molecules travel faster than larger ones
– same size and shape ⇒ the same movement speed

http://dlab.reed.edu/projects/vgm/vgm/VGMProjectFolder/VGM/RED/RED.ISG/mapping.html

The Double Digest Problem

Given 3 sets of distances {X_i}, {Y_i}, {Z_i}, reconstruct cut sites A_1 < … < A_n and B_1 < … < B_m s.t.
– {A_i - A_{i-1}} = {X_i}, {B_i - B_{i-1}} = {Y_i}
– for C = A ∪ B (ordered), {C_i - C_{i-1}} = {Z_i}

Complexity: NP-hard, many heuristics.
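
A brute-force search makes the problem statement concrete. The sketch below tries every ordering of the two single-digest fragment sets, so it is exponential and only usable on tiny inputs; it is not one of the heuristics mentioned above. It assumes a linear molecule starting at position 0, and all function names are hypothetical.

```python
from itertools import accumulate, permutations


def cut_sites(fragment_order):
    """Cut positions implied by laying fragments end to end from 0."""
    return list(accumulate(fragment_order))[:-1]


def fragment_lengths(sites, total):
    """Sorted fragment lengths obtained by cutting [0, total] at sites."""
    bounds = [0] + sorted(set(sites)) + [total]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))


def double_digest(X, Y, Z):
    """Return cut sites (A, B) consistent with X, Y and Z, or None."""
    total = sum(X)
    if sum(Y) != total or sum(Z) != total:
        return None
    target = sorted(Z)
    for order_x in set(permutations(X)):
        a = cut_sites(order_x)
        for order_y in set(permutations(Y)):
            b = cut_sites(order_y)
            # The double digest cuts at the union of both site sets.
            if fragment_lengths(a + b, total) == target:
                return a, b
    return None
```

Solutions are generally not unique (the mirror image of any solution is also a solution), so the function returns whichever consistent map it finds first.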


The Partial Digest Problem

• Problem: Given a (multi-)set of distances {|X_i - X_j|}, 1 ≤ i < j ≤ n, reconstruct the original series X_1, …, X_n
• Complexity: unknown (yet)
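
A common practical approach is backtracking: repeatedly take the largest remaining distance and try to place a point at that distance from either end. The sketch below follows that idea; it fixes X_1 = 0, its worst-case running time is exponential, and the names `partial_digest` and `_place` are hypothetical.

```python
from collections import Counter


def partial_digest(distances):
    """Return sorted points (starting at 0) realizing the distance multiset,
    or None if no such point set exists."""
    if not distances:
        return [0]
    remaining = Counter(distances)
    width = max(remaining)  # the largest distance spans the whole series
    remaining[width] -= 1
    return _place(+remaining, [0, width], width)


def _place(remaining, points, width):
    if not remaining:
        return sorted(points)
    largest = max(remaining)
    # The largest remaining distance must touch one of the two ends.
    for candidate in (largest, width - largest):
        needed = Counter(abs(candidate - p) for p in points)
        if all(remaining[d] >= count for d, count in needed.items()):
            result = _place(remaining - needed, points + [candidate], width)
            if result is not None:
                return result
    return None  # neither placement works: backtrack
```

As with the double digest, a solution and its mirror image produce the same distances, so only one of them is returned.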


Sequencing

• Sequencing: determining the sequence of bases in a given DNA molecule.
• Classical approach: gel electrophoresis
• Basic idea: knowing the lengths of all prefixes ending with letter X gives a partial sequence
• Creating DNA strands of different lengths: catalyzing replication in an environment with “terminator” A*.
• Repeat separately with C*, G*, T*
• Abilities: reconstructs sequences of 500-1000 nucleotides.

• ---A-----A-

• -CC---CC—--

• T---T------

• -----G----G
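
The basic idea, that the four sets of prefix lengths together determine the sequence, can be sketched as follows. The code idealizes the experiment (every prefix length is observed exactly once, with no measurement error); `read_ladders` and `reconstruct` are hypothetical names.

```python
def read_ladders(template):
    """Simulate the four reactions: for each base, the lengths of all
    prefixes of template that end with that base."""
    ladders = {base: [] for base in "ACGT"}
    for length, base in enumerate(template, start=1):
        ladders[base].append(length)
    return ladders


def reconstruct(ladders):
    """Merge the four partial sequences; '?' marks an unobserved position."""
    base_at = {
        length: base
        for base, lengths in ladders.items()
        for length in lengths
    }
    if not base_at:
        return ""
    return "".join(base_at.get(k, "?") for k in range(1, max(base_at) + 1))
```

Reading the merged gel lanes from the shortest fragment to the longest is exactly the loop over `k` in `reconstruct`.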

http://www.atdbio.com/content/20/Sequencing-forensic-analysis-and-genetic-analysis

The Sequence Assembly Problem

• Given a set of substrings, find the shortest (super)string containing all the members of the set.
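
A simple greedy heuristic repeatedly merges the two strings with the largest overlap. The sketch below illustrates it; it is not guaranteed to find the shortest superstring, it assumes error-free reads from a single strand, and `overlap` and `greedy_superstring` are hypothetical names.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads):
    """Merge reads greedily by maximum overlap (a heuristic, not optimal)."""
    reads = list(dict.fromkeys(reads))  # drop duplicates, keep order
    # Reads contained in another read add nothing to the superstring.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads[0] if reads else ""
```

Every input read is a substring of the result, because merging only ever appends characters to existing strings.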

http://www.ornl.gov/hgmis/graphics/slides/images1.html

Rearrangement

Rearrangement is a change in the order of complete segments along a chromosome.

http://www.copernicusproject.ucr.edu/ssi/HSBiologyResources.htm

Genome Rearrangements

Challenges:
• Reconstruct the evolutionary path of rearrangements
• Shortest sequence of rearrangements between two permutations
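
For the special case where the only rearrangement is a reversal of a segment and gene orientation is ignored, two simple bounds can be sketched. Counting breakpoints gives a lower bound, since one reversal removes at most two breakpoints, and a naive sort gives an upper bound. The names below are hypothetical, and the naive sort is not the shortest sequence in general.

```python
def breakpoints(perm):
    """Number of adjacent pairs that are not consecutive integers,
    after framing the permutation of 1..n with 0 and n+1."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) != 1)


def simple_reversal_sort(perm):
    """Sort by reversals, putting one element in place per step.

    Returns the reversals as (i, j) index pairs, inclusive. Uses at
    most n - 1 reversals; this is an upper bound, not the optimum.
    """
    p = list(perm)
    reversals = []
    for i in range(len(p)):
        j = p.index(i + 1)
        if j != i:
            p[i:j + 1] = p[i:j + 1][::-1]
            reversals.append((i, j))
    return reversals
```

For the permutation 3 1 2 4 the two bounds meet: three breakpoints imply at least two reversals, and the naive sort uses exactly two.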


More problems in sequencing data

• Solve all the problems above (alignment, gene finding, rearrangements, …) on really huge datasets
• Need to handle practical problems of efficiency: time and space
• Need to overcome large noise (errors) due to data size

DNA Microarrays


Hybridization

• DNA double strands form by “gluing” of complementary single strands

• Complementarity rule: A-T, G-C

ACTCCG
||||||
TGAGGC
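
The complementarity rule is easy to check in code. The sketch below assumes perfect matching over the whole length (no mismatches or partial hybridization); because the two strands are antiparallel, the partner strand read 5’ to 3’ is the reverse complement. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Base-by-base complement, in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand):
    """The partner strand, read in its own 5' to 3' direction."""
    return complement(strand)[::-1]


def hybridizes(strand_a, strand_b):
    """True if two 5'-to-3' strands are perfectly complementary."""
    return strand_b == reverse_complement(strand_a)
```

With the example above, `complement("ACTCCG")` gives the lower strand as drawn, TGAGGC, while the same strand written 5’ to 3’ is CGGAGT.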



Gene Expression Arrays

• Assumption: transcription level indicates gene’s importance in a specific condition.

Given the expression profiles of normal vs. disease:
- Build an algorithm to predict if a new sample is normal or disease (classifier)
- Cluster disease profiles into sub-classes
- Cluster genes into functional groups
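
One of the simplest possible classifiers for the first task is nearest-centroid: average the profiles of each class and assign a new sample to the closer average. The sketch below is only an illustration of the idea, not the method used in any particular study; it uses Euclidean distance, and all names are hypothetical.

```python
import math


def centroid(profiles):
    """Gene-by-gene mean of a list of equal-length expression profiles."""
    count = len(profiles)
    return [sum(values) / count for values in zip(*profiles)]


def train(labeled_profiles):
    """Map each class label to the centroid of its training profiles."""
    return {label: centroid(profiles)
            for label, profiles in labeled_profiles.items()}


def classify(sample, centroids):
    """Return the label whose centroid is closest to the sample."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))
```

A model would be built with something like `train({"normal": [...], "disease": [...]})`, where each profile is a list with one expression value per gene; real data would also need normalization and a held-out test set.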


Breast cancer treatment

van ’t Veer et al., Nature ’02

FIN


Classifier construction

Figure labels: Patients; 70-gene signature; No met.; Met.

• Classify to minimize incorrect assignments in the no met. class.
• 17/19 correct predictions on test set.

Human Variation

• DNA of two human beings is ~99.9% identical
• Phenotype and disease variation is due to these 1/1000 mutations

Challenges:
• Associate mutations to specific disease
• Deal with huge datasets (noise and statistics)

Challenges in network analysis: identifying modules

Complexity summary

• ~21,000 genes in the genome
• Hard to identify
• Harder to figure their function
• Even harder to figure how they work together

Promise and problems in comparative genomics (2009)

                genes   arabidopsis  c. elegans  h. sapiens  s. cerevisiae
arabidopsis     26207   -            0.19        0.24        0.42
c. elegans      19992   0.26         -           0.38        0.38
h. sapiens      21673   0.30         0.28        -           0.43
s. cerevisiae    5884   0.21         0.13        0.19        -

a(x,y) = fraction of genes in genome y that have strict orthologs in genome x
Source: 10/2009