Research in Computational Genomics
Mar Albà
Evolutionary Genomics Group
Research Unit on Biomedical Informatics
Universitat Pompeu Fabra
UPC, April 1 2005
1. The genetic information
2. The human genome project
3. Genomics: techniques and research
1. The genetic information
1865 – Mendel
The genetic information: inheritance
1928 – Griffith: transforming principle
Pneumonia infection of mice:
deadly bacteria: mice die
non-deadly bacteria: mice live
boiled deadly bacteria: mice live
boiled deadly bacteria + non-deadly bacteria: mice die
1944 – Avery, MacLeod, McCarty: DNA is the transforming principle
boiled deadly bacteria + non-deadly bacteria + DNase: mice live
boiled deadly bacteria + non-deadly bacteria + protease: mice die
DNA is the hereditary material
DNA structure
1953 – Watson and Crick discover the structure of DNA
1953 – Rosalind Franklin: X-ray diffraction image of DNA
DNA structure: antiparallel double helix
nucleotides:
A: adenine
G: guanine
C: cytosine
T: thymine
C-G, A-T
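Because the two strands are antiparallel and pair A with T and C with G, one strand determines the other. A minimal Python sketch of that rule follows; the function name `reverse_complement` is chosen here for illustration and is not from the slides.

```python
# Watson-Crick pairing rules from the slide: A-T and C-G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand: str) -> str:
    """Return the opposite strand, written 5'->3'.

    The strands are antiparallel, so the complement is read in reverse.
    """
    return "".join(PAIRS[base] for base in reversed(strand.upper()))

print(reverse_complement("ATGGCACAACCA"))
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.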
RNA:
- single strand
- uracil instead of thymine
- contains ribose instead of deoxyribose
A-U, C-G
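At the sequence level, the difference between a DNA coding strand and its RNA copy is the replacement of thymine by uracil. A small Python sketch, assuming the input is the coding (sense) strand; `transcribe` is an illustrative name.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA with the same sequence as the DNA coding strand.

    Assumes the coding (sense) strand is given, so the only change
    is thymine (T) -> uracil (U).
    """
    return coding_strand.upper().replace("T", "U")

print(transcribe("ATGGCACAACCA"))
```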
Proteins
QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAEKMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTSVLMALGMTDLFIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHPFLFLIKHNPTNTIVYFGRYWSP
Proteins are made of amino acids
amino acid
20 amino acids
Peptide bond
Proteins: amino acid chain
DNA replication
Transcription
The transcription of a gene may be on or off, depending on the cell type and conditions.
Translation
Genetic code
coding DNA (nucleotides): 1 2 3 | 4 5 6
protein (amino acids): AA 1 | AA 2
ATG GCA CAA CCA …
Met Ala Gln Pro …
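The reading of coding DNA in triplets can be sketched in Python. The example below builds the standard genetic code table and translates the four codons shown on the slide; the names `CODON_TABLE` and `translate` are illustrative, and one-letter amino acid codes are used (M = Met, A = Ala, Q = Gln, P = Pro).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(coding_dna: str) -> str:
    """Translate coding DNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCACAACCA"))  # MAQP, i.e. Met-Ala-Gln-Pro
```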
DNA cloning
DNA fragments + vectors (replicating DNA) + DNA ligase
vector with insert
transformation of bacteria
amplification
extraction
DNA sequencing
DNA polymerase
DNA synthesis
resulting partial labelled fragments
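How the partial labelled fragments reveal the sequence can be illustrated with a toy model. It assumes an idealised reaction in which synthesis stops exactly once at every position and each fragment carries a label identifying its last base; real data are noisier. The function names are hypothetical.

```python
import random

def terminated_fragments(new_strand: str) -> list:
    """Idealised model: one fragment per length, labelled by its final base."""
    return [(length, new_strand[length - 1])
            for length in range(1, len(new_strand) + 1)]

def read_sequence(fragments) -> str:
    """Order the fragments by length and read off their end labels."""
    return "".join(label for _, label in sorted(fragments))

fragments = terminated_fragments("ATGGCACAACCA")
random.shuffle(fragments)          # fragments come out of the reaction unordered
print(read_sequence(fragments))    # separation by size restores the order
```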
2. The human genome project
The human genome project
1953 - Discovery of the DNA double helix by Watson and Crick
1995 - Haemophilus influenzae genome
2001 - The first draft of the human genome is published, covering approximately 94% of the genome (Public Consortium + Celera)
2003 – Human genome sequence complete
2001 – Draft of the human genome
15 February 2001
Josep Abril and Roderic Guigó
IMIM (Institut Municipal d’Investigacions Mèdiques, Barcelona) participates in the annotation of the human genome
Human genome: 3,000,000,000 nucleotides
Human chromosomes
What’s in the human genome?
gene non-coding part
gene coding part (2%)
“parasitic” repetitive elements
microsatellites
DNA long repeats
EXONS
INTRONS
‘UPSTREAM’ REGULATORY ELEMENT
‘DOWNSTREAM’ REGULATORY ELEMENT
PROMOTER
PROTEIN
Gene structure
Organism                             Genome size (bases)   Estimated genes
Human (Homo sapiens)                 3 billion             30,000
Laboratory mouse (M. musculus)       2.6 billion           30,000
Mustard weed (A. thaliana)           100 million           25,000
Roundworm (C. elegans)               97 million            19,000
Fruit fly (D. melanogaster)          137 million           13,000
Yeast (S. cerevisiae)                12.1 million          6,000
Bacterium (E. coli)                  4.6 million           3,200
Human immunodeficiency virus (HIV)   9,700                 9
Comparison with other genomes
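One way to read this comparison is as gene density: genome size divided by the estimated number of genes. The Python sketch below does this arithmetic with the figures from the table; the dictionary and function names are illustrative.

```python
# (genome size in bases, estimated genes), as given in the table.
GENOMES = {
    "Human": (3_000_000_000, 30_000),
    "Mouse": (2_600_000_000, 30_000),
    "A. thaliana": (100_000_000, 25_000),
    "C. elegans": (97_000_000, 19_000),
    "Fruit fly": (137_000_000, 13_000),
    "Yeast": (12_100_000, 6_000),
    "E. coli": (4_600_000, 3_200),
    "HIV": (9_700, 9),
}

def bases_per_gene(genome_size: int, genes: int) -> float:
    """Average number of bases of genome per estimated gene."""
    return genome_size / genes

for name, (size, genes) in GENOMES.items():
    print(f"{name:12s} {bases_per_gene(size, genes):>10,.0f} bases per gene")
```

With these figures the human genome has about 100,000 bases per gene, against roughly 1,400 for E. coli, so genome size grows much faster than gene number.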
~30,000 genes
~10,000 already known (cDNA)
- Gene prediction programmes
- Homology to other species
- ESTs (expressed sequence tags)
Gene catalogue
- the functions of approximately half of the genes are not known!
“Parasitic” repetitive elements
Nature, Feb. 15, 2001
“Parasitic” repetitive elements: retrotransposition
genome: LINE
RNA
transcription (pol II)
translation
translocation of the complex
LINE copy
cytoplasm
3. Genomics: techniques and research
- bioinformatics
- genome sequencing and annotation
- functional genomics
- systems biology
Genomics
Genome sequencing and annotation
Exponential growth of DNA sequences
How many genomes?
[Figure: Genome Sequencing Projects on GOLD ©, December 1997 to March 2004; incomplete and complete projects; vertical axis from 0 to 1200]
Recently sequenced eukaryotic genomes
T. rubripes
C. intestinalis
A. gossypii
A. mellifera
R. norvegicus
A. gambiae
How long does it take to sequence a genome?
bacteria: 1 day
fungus: 1 week
insect: 1-2 months
mammal: 1-2 years
Gene prediction
- DNA coding for protein sequences (exons) accounts for only 2% of the human genome
- Information we can use:
  - splice site signals
  - statistics of coding sequences
EXONS
PROTEIN
gene
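As a much simplified illustration of using sequence signals to locate coding regions, the Python sketch below scans the forward strand for open reading frames (an ATG followed in frame by a stop codon). This is a stand-in technique, not the splice-site or coding-statistics models named above, and it ignores introns entirely; `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna: str, min_codons: int = 2) -> list:
    """Return (start, end, frame) for ATG...stop stretches on the forward strand.

    `end` is exclusive and includes the stop codon; `min_codons` is the
    minimum number of codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs

sequence = "CCATGGCACAACCATAAGG"
for start, end, frame in find_orfs(sequence):
    print(frame, sequence[start:end])
```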
Sequence similarity
- To predict genes we can also use sequence similarity searches against known proteins
alignment of protein sequences
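The idea of scoring an alignment between two protein sequences can be sketched with a small dynamic programming routine (global alignment in the style of Needleman-Wunsch). The scoring values here, +1 for a match and -1 for a mismatch or gap, are assumptions made for illustration; practical tools score amino acid pairs with substitution matrices and use faster heuristic searches.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b under a simple scoring scheme."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal,           # align a[i-1] with b[j-1]
                            prev[j] + gap,      # gap in b
                            curr[j - 1] + gap)) # gap in a
        prev = curr
    return prev[-1]

print(global_alignment_score("QIKDLLVSS", "QIKELLVSS"))
```

A higher score means the two sequences are more similar under the chosen scheme; a predicted gene whose translation scores well against a known protein is more likely to be real.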
Microbial Genomes at NCBI
http://www.ncbi.nlm.nih.gov/genomes/MICROBES/Complete.html
National Center for Biotechnology Information, National Institutes of Health
Functional annotation of all genes in a genome
Ensembl Genome Browser
http://www.ensembl.org
European Bioinformatics Institute
ENCODE (NIH): Encyclopedia Of DNA Elements
- exhaustive analysis of 1% of the human genome
- identification of functional elements
- development and comparison of different computational methods
http://www.genome.gov/Pages/Research/ENCODE/
Since 2003
HapMap (Haplotype Map)
http://www.hapmap.org/
Since 2002
Variability map (single nucleotide polymorphisms, SNPs) in populations from Africa, Asia and the USA.
It will help identify genes involved in complex disease, by association with particular haplotypes.
haplotype variants
SNPs
Environmental Genome Shotgun Sequencing of the Sargasso Sea
J. Craig Venter et al., Science, Vol. 304, Issue 5667, pp. 66-74, 2 April 2004
1.045 billion base pairs
1800 genomic species
148 previously unknown bacterial phylotypes
Functional genomics
DNA microarrays: high-throughput analysis of gene transcription
ChIP-chip: analysis of protein-bound DNA fragments
cross-link protein and DNA
immunoprecipitation
eliminate protein
hybridize with DNA
Protein-protein interactions: yeast two hybrid
Protein interaction networks
Systems biology
- Development of mathematical methods to model the behaviour of biological systems, including all elements in the system and their interactions.
Founded in 2000 by Leroy Hood, Seattle
Masaru Tomita, Keio University, Japan
National Center for Biotechnology Information (USA):
http://www.ncbi.nlm.nih.gov
European Bioinformatics Institute (UK):
http://www.ebi.ac.uk
Acknowledgements :
Grup de Recerca en Informàtica Biomèdica – Ferran Sanz
Grup de Genòmica Computacional – Roderic Guigó
Universitat Pompeu Fabra
www.imim.es/grib