BITS - Introduction to comparative genomics

Posted on 26-Jan-2015

description

This is the first presentation of the BITS training on 'Comparative genomics'. It reviews the basic concepts of sequence homology at different levels. Thanks to Klaas Vandepoele of the PSB department.

Transcript of BITS - Introduction to comparative genomics

Comparative genomics in eukaryotes

Klaas Vandepoele, PhD

Professor, Ghent University

Comparative & Integrative Genomics

VIB – Ghent University, Belgium

Klaas.Vandepoele@psb.vib-ugent.be

http://www.bits.vib.be

Outline

Introduction

Gene family analysis

Genome analysis

ConTra: promoter alignment analysis

What is comparative genomics?

Because all modern genomes have arisen from common ancestral genomes, the relationships between genomes can be studied with this fact in mind. This commonality means that information gained in one organism can have application in other, even distantly related, organisms. Comparative genomics enables the application of information gained from facile model systems to agricultural and medical problems. The nature and significance of differences between genomes also provide a powerful tool for determining the relationship between genotype and phenotype through comparative genomics and morphological and physiological studies.

http://genomics.ucdavis.edu/what.html

Principles

DNA sequences encoding and regulating the expression of essential proteins and RNAs will be conserved

Consequently, the regulatory profiles of genes involved in similar processes among related species will be conserved

Conversely, sequences that encode or control the expression of proteins or RNAs responsible for differences between species will be divergent

Definition

“The combination of genomic data and comparative / evolutionary biology to address questions of genome structure, evolution and function”

Hardison, PLoS Biology 2003

What can we learn from cross-species comparisons?

Genome conservation: transfer knowledge gained from model organisms to non-model organisms

Genome variation: understand how genomes change over time in order to identify evolutionary processes and constraints

Detection of functional elements: coding elements (e.g. exons) and conserved non-coding sequences / elements

Conservation of gene structure

Homology & sequence similarity

Homology = shared common ancestral origin

Inferred based on: sequence similarity; similar (multi-)protein domain composition and organization

So sequence similarity means homology?

No, it depends!

"Orthologs, paralogs, and evolutionary genomics", Koonin 2005

Homology & sequence similarity

Sequence analysis aims at finding important sequence similarities that would allow one to infer homology. The latter term is extensively used in scientific literature, often without a clear understanding of its meaning, which is simply common origin.

Homologous organs are not necessarily similar (at least the similarity may not be obvious); similar organs are not necessarily homologous.

For some reason, this simple concept tends to get extremely muddled when applied to protein and DNA sequences. Phrases like “sequence (structural) homology”, “high homology”, “significant homology”, or even “35% homology” are as common, even in top scientific journals, as they are absurd, considering the definition.
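
To make the distinction concrete, here is a minimal Python sketch (not part of the original slides) of what a figure such as "35%" actually measures: percent identity between two sequences that are already aligned. The function name `percent_identity`, the choice to skip gapped columns, and the toy sequences are all assumptions made for illustration. The number it returns describes similarity; homology is a conclusion that may or may not be inferred from it.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0  # nothing comparable, report zero identity
    return 100.0 * identical / compared


# Toy, made-up aligned protein fragments (hypothetical data).
seq_a = "MKT-AYIAKQR"
seq_b = "MKTLAYLAKHR"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")  # prints "80.0% identity"
```

Other conventions exist (for example, dividing by the full alignment length), so an identity percentage should always be reported together with how it was computed.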

Multiple Sequence Alignments

[Figure: alignment with sequences (~taxa) as rows and columns (~positions) in the alignment]
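
The row/column layout of an alignment can be sketched in Python (assuming version 3.9 or later) as a list of equal-length strings, where each string is one sequence (~taxon) and each index is one column (~position). The example below is an illustration only, not taken from the slides: the function name `column_conservation`, the scoring rule (fraction of all sequences that share the most common residue in a column) and the toy alignment are assumptions.

```python
from collections import Counter


def column_conservation(alignment: list[str], gap: str = "-") -> list[float]:
    """For each column, the fraction of sequences carrying its most common residue."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(width):
        # Collect the residues in this column, ignoring gap characters.
        residues = [row[col] for row in alignment if row[col] != gap]
        if not residues:
            scores.append(0.0)  # column made of gaps only
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        # Dividing by the number of sequences means gaps lower the score.
        scores.append(top_count / len(alignment))
    return scores


# Toy, made-up DNA alignment: 4 sequences (rows) x 6 positions (columns).
msa = [
    "ATGCCA",
    "ATGCTA",
    "ATG-TA",
    "ACGCTA",
]
for position, score in enumerate(column_conservation(msa), start=1):
    print(position, f"{score:.2f}")
```

Columns scoring 1.00 are identical across all sequences. Real analyses typically use substitution-aware or entropy-based scores rather than this simple majority fraction.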

Genome-wide sequence retrieval

Finding information from whole-genome sequencing projects:

DNA sequence reads

Assembled genomic DNA sequences

Annotated genes (RNA genes + protein-encoding genes)

Repeats, transposable elements

Integrated platform providing both sequence data and functional genomics data

[Figure axis: information value, low to high]

Genome databases

Species-specific databases: SGD, TAIR, many others (e.g. WormBase, FlyBase, ...)

General & integrative repositories: EBI Genomes & Integr8 / Ensembl, NCBI Entrez Genome, UCSC
