Comparative genomics


Transcript of Comparative genomics

Page 1: Comparitive genomics

COMPARATIVE GENOMICS

Page 2: Comparitive genomics

GENOME

• The genome is the whole genetic complement of an organism.

• Genomics is the study of all the genes present in an organism.

Page 3: Comparitive genomics

WHAT IS THE GENOME?

• Cells contain chromosomes.

• Each chromosome carries many genes, from hundreds to thousands.

• Genes are stretches of DNA.

• The four nucleotides (A, G, C, T) are arranged in different orders to form different genes.

• DNA encodes thousands of proteins.

• DNA is the hereditary material.

Page 4: Comparitive genomics

DNA SEQUENCING

• The DNA of an organism is sequenced in parts.

• DNA sequencing uses both molecular biology and computational techniques.

• Molecular biology techniques such as recombinant DNA technology, PCR amplification and library construction are used.

• Ultimately, DNA libraries are constructed.

Page 5: Comparitive genomics
Page 6: Comparitive genomics

LIBRARIES

• Genomic libraries:

• libraries covering the whole genome

• cDNA libraries:

• libraries of only the expressed coding regions (copied from the mRNA of genes that encode proteins)

Page 7: Comparitive genomics

SEQUENCED GENOMES

• Alongside the draft human genome sequence published in 2001, the genomes of many other organisms have been sequenced.

• The table below shows a few examples of organisms whose genomes have been sequenced.

Organism                  Year   Reference

mouse                     2002   Waterston et al.

rat                       2004   Gibbs et al.

fruit fly                 2000   Adams et al.

baker's yeast             1996   Goffeau et al.

chicken                   2004   Hillier et al.

cow                       2009   Elsik et al.

monkey (rhesus macaque)   2007   Gibbs et al.

Page 8: Comparitive genomics

COMPARATIVE GENOMICS

• Comparative genomics is the field of biology in which the genomes of different organisms are compared using computational techniques.

• Sequence alignment is the main principle on which comparative genomics is based.

• Different techniques of sequence alignment are in use:

• Dot plot

• Dynamic programming: the Smith-Waterman algorithm (local alignment) and the Needleman-Wunsch algorithm (global alignment)

• Heuristics: FASTA, BLAST, etc.
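To make the dynamic programming entries above more concrete, here is a minimal Python sketch of the scoring step of both algorithms. The function names and the scoring scheme (match +1, mismatch -1, gap -1) are assumptions chosen for illustration and are not taken from the slides; real tools use substitution matrices and more refined gap penalties.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of sequences a and b."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score of sequences a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at zero so an alignment can start anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
    print(smith_waterman_score("TTTACGTTT", "GGACGGG"))
```

Both functions keep only two rows of the score matrix, which is enough for the score; recovering the alignment itself would need the full matrix and a traceback.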


Page 9: Comparitive genomics

COMPARATIVE GENOMICS

By comparing the genomes of different organisms we can find:

What is conserved between species

What makes closely related species different

We can also study evolutionary changes, gene function and inherited diseases.

Page 10: Comparitive genomics

HOW TO COMPARE?

Three possible ways to compare genomes:

Comparison of general features

Finer-resolution comparison

Comparison of discrete segments

Page 11: Comparitive genomics

GENERAL FEATURES

• General features of genomes are compared to find similarities and differences:

• Genome Length

• Number of Exons

• Number of genes

• Chromosome Numbers

Page 12: Comparitive genomics
Page 13: Comparitive genomics

FINER-RESOLUTION METHOD

In the finer-resolution method, genomes are compared by directly comparing the DNA sequences of different species.

Figure: Conserved segments in the human and mouse genomes. Human chromosomes are shown with segments containing at least two genes whose order is conserved in the mouse genome drawn as colour blocks. Each colour corresponds to a particular mouse chromosome. Centromeres, the subcentromeric heterochromatin of chromosomes 1, 9 and 16, and the repetitive short arms of chromosomes 13, 14, 15, 21 and 22 are in black.

Page 14: Comparitive genomics

COMPARISON OF DISCRETE SEGMENTS

Discrete segments are compared by aligning homologous stretches of sequence from different species.

Figure: A human gene (pyruvate kinase, PKLR) aligned with the corresponding PKLR homologs from macaque, dog, mouse, chicken and zebrafish. The macaque sequence shows similarity in all regions: exons (blue), introns (red) and untranslated regions (light blue).

Page 15: Comparitive genomics

APPLICATIONS

Comparative genomics is used to find:

Evolutionary relationships

Counterparts of genes

New model organisms

Page 16: Comparitive genomics

EVOLUTIONARY RELATIONSHIP

o With the passage of time, changes take place in the hereditary material; some of these changes are useful and some have drastic effects.

o To study these changes, evolutionary trees are generated on the basis of evolutionary distances.

o Two types of trees are constructed:

o Rooted trees

o Unrooted trees
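As an illustration of what an evolutionary distance can look like, the sketch below computes the p-distance (the proportion of aligned sites that differ) for every pair in a set of pre-aligned sequences. The p-distance is only one simple choice of distance, the function names are hypothetical, and the example sequences are made-up toy data rather than real genes.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ("-") are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    differences = sum(1 for x, y in compared if x != y)
    return differences / len(compared)


def distance_matrix(aligned):
    """Pairwise p-distances for a dict mapping name -> aligned sequence."""
    return {
        (m, n): p_distance(aligned[m], aligned[n])
        for m, n in combinations(sorted(aligned), 2)
    }


if __name__ == "__main__":
    # Toy aligned sequences, invented for this example.
    toy = {"sp1": "ACGTACGT", "sp2": "ACGTACGA", "sp3": "ACTTAC-A"}
    for pair, dist in distance_matrix(toy).items():
        print(pair, round(dist, 3))
```

A tree-building method would take such a matrix as input and group the pairs with the smallest distances first.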

Page 17: Comparitive genomics

ROOTED TREE

• A rooted tree shows the common ancestor of all the target organisms.

EXAMPLE

• Humans, chimpanzees and gorillas are all mammals, so a rooted tree of these species has their common mammalian ancestor at its root.

Page 18: Comparitive genomics

UNROOTED TREE

• Unrooted trees show how the target organisms are related to each other but do not show the common ancestor of all of them.

• EXAMPLE

• Suppose that the eagle, sparrow, crow and dove are all related to each other in some way; an unrooted tree connects them without identifying a common ancestor.

Eagle

Sparrow

Crow

Dove

Page 19: Comparitive genomics

TERMINOLOGY

• Some of the terms used in studying phylogenetic trees are:

• Root: the point that represents the common ancestor of all target organisms

• Nodes: the branch points from which branches and leaves originate

• Leaves: the terminal nodes, or end children, of the tree

• Clade: a subtree of a larger tree, that is, an ancestor together with all of its descendants
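The terms above can be tried out on a small data structure. In this sketch a rooted tree is written as nested tuples, with strings as leaves; the representation, the function names and the placeholder taxa "A" to "D" are all assumptions made for illustration.

```python
def leaves(node):
    """Return the leaf names under a node, left to right."""
    if isinstance(node, str):  # a leaf is just a name
        return [node]
    found = []
    for child in node:
        found.extend(leaves(child))
    return found


def clades(node):
    """Return the leaf set of every internal node (each one is a clade)."""
    if isinstance(node, str):
        return []
    found = [frozenset(leaves(node))]
    for child in node:
        found.extend(clades(child))
    return found


if __name__ == "__main__":
    # The outermost tuple is the root; inner tuples are internal nodes.
    tree = (("A", "B"), ("C", "D"))
    print(leaves(tree))
    for clade in clades(tree):
        print(sorted(clade))
```

In this representation the root is simply the outermost tuple, so the clade of the root contains every leaf.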

Page 20: Comparitive genomics

STRUCTURE OF TREE

Page 21: Comparitive genomics

SOFTWARE

• A commonly used program for constructing phylogenetic trees is ClustalW.

• Paste all the sequences into ClustalW in FASTA format.

• Then submit the job.

• Select the guide tree option to generate the tree.
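Since the sequences have to be supplied in FASTA format, a short sketch of that format may help: each record is a header line starting with ">" followed by one or more lines of sequence. The helper names below are hypothetical and the records are toy data; this is not part of ClustalW itself.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""  # first word is the record name
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def to_fasta(records, width=60):
    """Format a dict of name -> sequence as FASTA text, wrapping long lines."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy records, invented for this example.
    toy = {"seq1": "ACGTACGTAC", "seq2": "ACGTTCGTAC"}
    text = to_fasta(toy, width=5)
    print(text)
    print(parse_fasta(text) == toy)
```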

Page 22: Comparitive genomics

ClustalW2

Retrieve the sequences from NCBI, paste them into ClustalW2 and submit.

Page 23: Comparitive genomics

RESULTS.

Page 24: Comparitive genomics

COUNTERPARTS OF GENES

• Dramatic results have emerged from the rapidly developing field of comparative genomics.

• Comparison of the fruit fly genome with the human genome reveals that about sixty percent of genes are conserved (Adams et al. 2000). That is, the two organisms appear to share a core set of genes.

• Researchers have also found that two-thirds of human genes known to be involved in cancer have counterparts in the fruit fly.

Page 25: Comparitive genomics

CONT…

Michigan Tech researchers Thomas Werner and Komal Kumar Bollepogu Raja have traced the black spots on the fruit fly abdomen to three specific genes in the fruit fly genome. These genes all have counterparts in human DNA, and all three of these counterparts can cause cancer.

"We are looking here at proto-oncogenes, which are cancer genes that cause disease when they are active in an uncontrolled manner. Both humans and flies have them, and in flies they learned to paint black spots on the abdomen."

Page 26: Comparitive genomics

MODEL ORGANISMS

Comparative genomics is an exciting new field of biological research in which the genome sequences of different species - human, mouse and a wide variety of other organisms from yeast to chimpanzees - are compared.

By comparing the finished reference sequence of the human genome with the genomes of other organisms, researchers can identify regions of similarity and difference. This information can help scientists better understand the structure and function of human genes and thereby develop new strategies to combat human diseases.

For example, when scientists inserted a human gene associated with early-onset Parkinson's disease into fruit flies, the flies displayed symptoms similar to those seen in humans with the disorder, raising the possibility that the tiny insects could serve as a new model for testing therapies aimed at Parkinson's.

Page 27: Comparitive genomics

INTRODUCTION TO GENOMES WITH

ENSEMBL

Page 28: Comparitive genomics

DATABASE OF COMPARATIVE GENOMICS

Ensembl is a database in which the genomes of vertebrates and other eukaryotes are stored.

Ensembl provides BLAST and BLAT to search for similar sequences in other species.

Users can download sequences in FASTA format and view karyotypes.

Users can find homologues, gene trees and whole-genome alignments across multiple species.

Users can get information about DNA methylation, transcription factor binding sites, histone modifications, regulatory features such as enhancers and repressors, and microarray annotations.

Page 29: Comparitive genomics
Page 30: Comparitive genomics

Ensembl is Used Worldwide

Top users: UK US Canada China

France Germany Italy Japan Spain

Page 31: Comparitive genomics

THE ENSEMBL GENOME BROWSER:

MAKING IT INTERESTING

Splice variants, proteins, non-coding RNA

Small and large scale sequence variation, phenotype

associations

Whole genome alignments, protein trees

Potential promoters and enhancers, DNA methylation

User upload, custom data

Page 32: Comparitive genomics

Species whose genomes are present in Ensembl

Almost 69 species in total

Page 33: Comparitive genomics

ENSEMBL FEATURES

• The gene set

• Comparative analysis

• Variation and regulation

• BioMart (data export)

• Display of external data (DAS)

• Programmatic access via the Perl API

• Open source

Page 34: Comparitive genomics

SEQUENCE DISPLAYS.

Gene: Sequence

Transcript: cDNA

Transcript: Exons

Page 35: Comparitive genomics

KARYOTYPE OF

HUMAN GENOME.

Page 36: Comparitive genomics

Chromosome Summary

Page 37: Comparitive genomics

Comparative Genomics

Page 38: Comparitive genomics

COMPARATIVE GENOMICS

• The comparative genomics section provides four options:

• Alignment image

• Alignment Text

• Region comparison

• Synteny

Page 39: Comparitive genomics

ALIGNMENT IMAGE

• It shows an image of the alignment, indicating which gene is similar to which at each position.

Page 40: Comparitive genomics

ALIGNMENT TEXT

• It gives the whole sequence in FASTA format, with exons highlighted.

• It also provides an option to run an alignment.

• The user can select the species against which to align the target sequence.

• The results show all possible hits of the target sequence or region in that species.

• The chromosome number and the position of the similar region on it are provided for all hits.

• The details of a hit can be viewed by selecting it.

Page 41: Comparitive genomics

A human sequence is aligned against the species Mus musculus.

Page 42: Comparitive genomics

REGION COMPARISON

• It compares different kinds of regions, such as protein-coding regions, pseudogenes, RNA genes and processed transcripts.

Page 43: Comparitive genomics

SYNTENY

In classical genetics, synteny describes the physical co-localization of genetic loci on the same chromosome within an individual or species.

The synteny view shows the similar regions as bands of different colours.

Page 44: Comparitive genomics

HUMAN VS MOUSE

• Human and mouse coding sequences show approximately 85% identity.

• A comparison of the mRNA sequences of 1196 orthologous human and mouse gene pairs (Makalowski et al. 1996) showed that coding regions tend to have approximately 85% identity at the nucleotide and protein levels.

• In the analysis described next, a total of 117 orthologous gene pairs were identified and studied.

Page 45: Comparitive genomics

Exon Identity..

• For the purpose of comparing the genomic structure of the gene pairs, we used dynamic programming algorithms (employing both nucleotide similarity and codon similarity using the PAM20 matrix (Dayhoff et al. 1978)) to align the sequences. We carefully inspected the alignments to ensure that they correctly aligned the exons.

• The number of exons was identical for 95% of the genes studied. There were six instances in which the number of exons differed.

Page 46: Comparitive genomics

CONT….

• In two cases, a single internal coding exon in mouse is reported to correspond to two internal coding exons in human. In the spermidine synthase gene, mouse exon 5 corresponds to human exons 5 and 6, with the total exonic lengths agreeing perfectly.

• In the lymphotoxin beta gene, mouse exon 2 corresponds to human exons 2 and 3. Interestingly, the mouse exon 2 is 316 bp while the sum of the lengths of human exon 2, intron 2 and exon 3 is only 301 bp.

Page 47: Comparitive genomics

EXON LENGTH

The length of corresponding exons was strongly conserved. The lengths were identical in 73% of cases. The differences that did occur were quite small: the mean ratio of the larger to the smaller length was 1.05.

Moreover, the differences were nearly always a multiple of three: this held for 95% of all exons and 99% of all internal coding exons. This is readily understood in terms of evolutionary selection, because a length change that is not a multiple of three shifts the reading frame.

Only three instances were found in which corresponding internal exons had lengths differing by other than a multiple of three.

Page 48: Comparitive genomics

CONT..

o In the skeletal-muscle-specific myogenic gene, the respective lengths of exons 2 and 3 are 81 bp and 123 bp in the human and 82 bp and 122 bp in the mouse.

o In the gene encoding the Flt3 ligand, the respective lengths of exons 2 and 3 are 111 bp and 54 bp in the human and 122 bp and 46 bp in the mouse, while the respective lengths of exons 5 and 6 are 139 bp and 179 bp in the human and 144 bp and 189 bp in the mouse.
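The exon lengths quoted above can be checked with a few lines of arithmetic. The sketch below tests whether each human/mouse length difference is a multiple of three, first exon by exon and then for each adjacent pair of exons taken together. The lengths come from the two cases above; the dictionary layout and function names are assumptions made for illustration.

```python
# (gene, exon number) -> (human length in bp, mouse length in bp), as quoted above
EXON_LENGTHS = {
    ("myogenic", 2): (81, 82),
    ("myogenic", 3): (123, 122),
    ("Flt3 ligand", 2): (111, 122),
    ("Flt3 ligand", 3): (54, 46),
    ("Flt3 ligand", 5): (139, 144),
    ("Flt3 ligand", 6): (179, 189),
}


def frame_preserving(human_bp, mouse_bp):
    """True if the length difference is a multiple of three."""
    return (human_bp - mouse_bp) % 3 == 0


def combined_difference(gene, exons):
    """Human minus mouse total length over several exons of one gene."""
    human = sum(EXON_LENGTHS[(gene, e)][0] for e in exons)
    mouse = sum(EXON_LENGTHS[(gene, e)][1] for e in exons)
    return human - mouse


if __name__ == "__main__":
    for (gene, exon), (human, mouse) in EXON_LENGTHS.items():
        print(gene, exon, human - mouse, frame_preserving(human, mouse))
    for gene, pair in [("myogenic", (2, 3)), ("Flt3 ligand", (2, 3)), ("Flt3 ligand", (5, 6))]:
        print(gene, pair, combined_difference(gene, pair))
```

With these numbers, no single exon difference is a multiple of three, but each adjacent pair sums to a difference of 0, -3 or -15 bp. One possible reading is that the two changes in each pair offset each other so that the overall reading frame is kept, although the slides themselves do not state this.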

Page 49: Comparitive genomics

Intron length

While exon lengths tended to be well preserved, intron lengths varied considerably.

Human introns tended to be larger than mouse introns (68% of cases), but this could represent a selection bias: the less extensive sequencing of the mouse genome may lead to an underrepresentation of instances in which the mouse genomic locus is larger.

Page 50: Comparitive genomics

SEQUENCE IDENTITY

Coding regions showed strong sequence similarity, with approximately 85% sequence identity.

In contrast, introns showed only weak sequence similarity, with approximately 35% sequence identity, which is not much higher than the background rate of sequence identity in gapped alignments of random sequences.
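To give a feel for the background rate mentioned above, the sketch below computes the identity expected by chance between two random sequences compared position by position without gaps. This is a simpler quantity than the gapped background rate the text refers to, which is higher because gaps let the aligner pick up extra chance matches. The function name and the base frequencies are assumptions chosen for illustration.

```python
def expected_random_identity(freqs_a, freqs_b=None):
    """Chance that two independently drawn bases are identical.

    freqs_a and freqs_b map each base to its frequency; if freqs_b is
    omitted, both sequences are assumed to share the same composition.
    """
    if freqs_b is None:
        freqs_b = freqs_a
    return sum(p * freqs_b.get(base, 0.0) for base, p in freqs_a.items())


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    at_rich = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}  # hypothetical composition
    print(expected_random_identity(uniform))
    print(expected_random_identity(at_rich))
```

For equal base frequencies the ungapped expectation is 25%, so the roughly 35% identity reported for introns is not far above what chance alone produces once gaps are allowed.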


Page 51: Comparitive genomics

SEQUENCE IDENTITY

• The degree of conservation varied considerably among genes. For example, the gene encoding the ribosomal protein S24 showed 88% identity at the DNA level and 100% identity at the amino acid level in coding exons, but only 27% identity at the DNA level in introns.

• In the tumor necrosis factor-beta gene, the first intron has 75% nucleotide identity and nearly perfect agreement in length (86 bp in human, 83 bp in mouse). Interestingly, the flanking exons are less well conserved, showing only 70% nucleotide identity and 60% amino acid identity.
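The S24 figures above (88% DNA identity but 100% amino acid identity) reflect the fact that several codons can encode the same amino acid. The sketch below shows the effect on a pair of short toy sequences: it translates both with the standard genetic code and compares identity at the two levels. The sequences are invented for this example and are not the S24 gene; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding sequence, ignoring any incomplete trailing codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(a, b):
    """Percentage of identical positions in two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Toy coding sequences that differ only at third codon positions.
    seq_1 = "ATGGCTAAAGGT"
    seq_2 = "ATGGCCAAGGGA"
    print(percent_identity(seq_1, seq_2))
    print(percent_identity(translate(seq_1), translate(seq_2)))
```

Here the two toy sequences differ at three of twelve nucleotides yet encode the same four amino acids, the same pattern as the S24 example on a much smaller scale.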