genomic comparison
Uploaded by: comsats-university-of-science-information-technology
Transcript of genomic comparison
![Page 1: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/1.jpg)
Comparative genomics
Presented by Arooba Baig Fomaz Tariq
![Page 2: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/2.jpg)
Genomics
Genomics is an area within genetics that concerns the sequencing and analysis of an organism’s genome: the development and application of genetic mapping, sequencing, and computation (bioinformatics) to analyze the genomes of organisms.
Sub-fields of genomics:
• Structural genomics: genetic and physical mapping of genomes.
• Functional genomics: analysis of gene function (and of non-gene sequences).
• Comparative genomics: comparison of genomes across species; includes structural and functional genomics.
• Evolutionary genomics.
![Page 3: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/3.jpg)
Comparative genomics
Comparative genomics is a field of biological research in which researchers use a variety of tools, including computer-based analysis, to compare the complete genome sequences of different species.
It compares gene numbers, gene locations, and the biological functions of genes in the genomes of different organisms. One objective is to identify groups of genes that play a unique biological role in a particular organism.
![Page 4: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/4.jpg)
History
• Comparative genomics has its roots in the comparison of virus genomes in the early 1980s.
• For example, small RNA viruses infecting animals (picornaviruses) and those infecting plants (cowpea mosaic virus) were compared and turned out to share significant sequence similarity and, in part, the order of their genes.
• In 1986, the first comparative genomic study at a larger scale was published, comparing the genomes of varicella-zoster virus and Epstein-Barr virus that contained more than 100 genes each.
![Page 5: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/5.jpg)
Contd..
• The first complete genome sequence of a cellular organism, that of Haemophilus influenzae Rd, was published in 1995.
• The second genome sequencing paper, on the small parasitic bacterium Mycoplasma genitalium, was published in the same year.
• Saccharomyces cerevisiae, the baker's yeast, was the first eukaryote to have its complete genome sequence published, in 1996.
• After the publication of the roundworm Caenorhabditis elegans genome in 1998 and the fruit fly Drosophila melanogaster genome in 2000, Gerald M. Rubin and his team published a paper titled "Comparative Genomics of the Eukaryotes", in which they compared the genomes of the eukaryotes D. melanogaster, C. elegans, and S. cerevisiae, as well as the prokaryote H. influenzae.
![Page 6: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/6.jpg)
Related Terminologies
• Homology is the relationship of any two characters (such as two proteins that have similar sequences) that have descended, usually through divergence, from a common ancestral character.
• Homologues can be either orthologues or paralogues.
• Orthologues are homologues that have evolved from a common ancestral gene by speciation. They usually have similar functions.
• Paralogues are homologues that are related or produced by duplication within a genome. They have often evolved to perform different functions.
![Page 7: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/7.jpg)
Comparative Genomics Tools
Similarity search programs (comparisons and analyses at both nucleic acid and protein level)
• BLAST2 (Basic Local Alignment Search Tool)
• FASTA
• MUMmer (Maximal Unique Match)
Other alignment programs
• DBA [DNA Block Aligner]
• Blastz
• BLAT/AVID
• WABA [Wobble Aware Bulk Aligner]
• DIALIGN [Diagonal ALIGNment]
• SSAHA [Sequence Search and Alignment by Hashing Algorithm]
![Page 8: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/8.jpg)
Contd..
Comparative gene prediction programs
• Twinscan
• Doublescan
• SGP-1
Regulatory region prediction
• ConSite
Visualization / sequence analysis programs
• Dot plot (e.g. Dotter)
• PipMaker (Percent Identity Plot)
• Alfresco
• VISTA (VISualization Tools for Alignments)
• ACT (Artemis Comparison Tool)
![Page 9: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/9.jpg)
Comparative Genomics Tool
The UCSC Genome Browser is an on-line genome browser hosted by the University of California, Santa Cruz.
![Page 10: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/10.jpg)
Synteny Regions
• Synteny: regions of two genomes that show considerable similarity in terms of sequence and conservation of the order of genes.
• Syntenic genes are in the same relative position on two different chromosomes.
• Closely related species generally have a similar order of genes on their chromosomes.
• Synteny can be used to identify genes in one species based on map position in another.
![Page 11: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/11.jpg)
Comparative Genomics Tool
Interactive DAGchainer algorithm: a tool for mining genome duplication and synteny
• Finding putative genes or regions of homology between two genomes
• Identifying collinear sets of genes or regions of sequence
• Generating a dot plot of the results and coloring syntenic pairs
![Page 12: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/12.jpg)
![Page 13: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/13.jpg)
Syntenic dot plot
• Syntenic dot plots give biologists valuable information about how organisms diverged from a common ancestor.
• Biologists can look at one of these dot plots and see where large sections of DNA have been deleted, inserted, copied, or moved.
• The dot plots also depict how closely two organisms are related, through the quantity and linearity of the green dots across an entire genome.
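The idea behind a dot plot can be illustrated with a small sketch. It is not the method used by any particular tool named in these slides; it simply marks every position pair where two sequences share an identical k-mer, with the function name and the default `k=3` chosen here for illustration only.

```python
def kmer_dot_plot(seq_a, seq_b, k=3):
    """Return (i, j) pairs where the k-mer starting at seq_a[i]
    is identical to the k-mer starting at seq_b[j]."""
    # Index every k-mer of seq_b by its start positions.
    positions = {}
    for j in range(len(seq_b) - k + 1):
        positions.setdefault(seq_b[j:j + k], []).append(j)

    # Look up each k-mer of seq_a; each hit is one dot in the plot.
    dots = []
    for i in range(len(seq_a) - k + 1):
        for j in positions.get(seq_a[i:i + k], []):
            dots.append((i, j))
    return dots


if __name__ == "__main__":
    # Identical sequences give an unbroken diagonal.
    print(kmer_dot_plot("ACGTAC", "ACGTAC"))
    # Two blocks that swapped places show up as off-diagonal dots.
    print(kmer_dot_plot("AAACCC", "CCCAAA"))
```

A long run of dots along one diagonal corresponds to a conserved, collinear region; dots that jump to another diagonal suggest that a block has moved.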
![Page 14: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/14.jpg)
Sequence Similarity Search
The most frequently performed type of sequence comparison is the sequence similarity search.
Sequence comparisons that implicate function are widely used:
• To determine whether a newly sequenced cDNA or genomic region encodes a gene of known function.
• To search for similar sequences in other species (or in the same species).
![Page 15: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/15.jpg)
Contd..
• Search databases of DNA sequences
• Use computer algorithms to align sequences
• Don’t require perfect matches between sequences
• Most commonly used algorithms: BLAST, FASTA
• Homology searches
![Page 16: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/16.jpg)
BLAST
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
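To make "local similarity" concrete, here is a minimal sketch of a Smith-Waterman style local alignment score. This is not BLAST itself, which uses a faster heuristic and adds significance statistics; the scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (dynamic programming).

    Scores are never allowed to drop below zero, so the best-scoring
    region can start and end anywhere inside the two sequences.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Only the shared "ACG" core contributes to the score.
    print(local_alignment_score("TTTACGTTT", "GGGACGGGG"))
```

The two example sequences differ everywhere except for a shared three-base core, and the score reflects only that core; this is the sense in which a local search does not require the full sequences to match.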
![Page 17: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/17.jpg)
![Page 18: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/18.jpg)
General Databases Useful for Comparative Genomics
• LocusLink/RefSeq: http://www.ncbi.nih.gov/LocusLink/
• PEDANT (Protein Extraction Description Analysis Tool): http://pedant.gsf.de
• COGs (Clusters of Orthologous Groups of proteins): http://www.ncbi.nih.gov/COG/
• KEGG (Kyoto Encyclopedia of Genes and Genomes): http://www.genome.ad.jp/kegg/
• MBGD (Microbial Genome Database): http://mbgd.genome.ad.jp/
• GOLD (Genomes Online Database): http://wit.integratedgenomics.com/GOLD/
• TIGR (The Institute for Genomic Research): comparative genomics of parasites
![Page 19: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/19.jpg)
Comparative genomic process
Alignment of DNA sequences is the core process in comparative genomics.
An alignment is a mapping of the nucleotides in one sequence onto the nucleotides in the other sequence, with gaps introduced into one or the other sequence to increase the number of positions with matching nucleotides.
Several powerful alignment algorithms have been developed to align two or more sequences.
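The definition of an alignment given above (a mapping between nucleotides, with gaps inserted to increase the number of matching positions) can be sketched with the classic Needleman-Wunsch global alignment. The slides do not name a specific algorithm, so this choice and the scoring values (match +1, mismatch -1, gap -1) are assumptions made for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment.

    Returns (aligned_a, aligned_b, score), where '-' marks a gap.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of match/mismatch or a gap.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, total = global_align("ACGT", "AGT")
    print(top)
    print(bottom)
    print(total)
```

When several alignments share the best score, the traceback returns just one of them.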
![Page 20: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/20.jpg)
Methods for comparative genomics
• Comparative analysis of genome structure
• Comparative analysis of coding regions (exons)
• Comparative analysis of non-coding regions (e.g. introns)
![Page 21: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/21.jpg)
Comparative analysis of genome structure
Analysis of the global structure of genomes, such as nucleotide composition, syntenic relationships, and gene ordering, offers insight into the similarities and differences between genomes.
This provides information on the organization and evolution of the genomes, and highlights the unique features of individual genomes.
The structure of different genomes can be compared at three levels:
• Overall nucleotide statistics
• Genome structure at DNA level
• Genome structure at gene level
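As a small illustration of the first level, overall nucleotide statistics, the sketch below computes base counts and GC content for a sequence. The function name and the returned dictionary layout are hypothetical choices for this example; ambiguous bases such as N are simply ignored.

```python
from collections import Counter


def nucleotide_stats(seq):
    """Count A/C/G/T and compute GC content, ignoring other symbols."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    gc = (counts["G"] + counts["C"]) / total if total else 0.0
    return {"length": total, "counts": dict(counts), "gc_content": gc}


if __name__ == "__main__":
    print(nucleotide_stats("AACCGGTT"))
    print(nucleotide_stats("ggccNN"))
```

Comparing such summary values between genomes is only an entry point; it says nothing yet about gene order or content.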
![Page 22: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/22.jpg)
Comparison of genome structure at DNA level
Chromosomal breakage and exchange of chromosomal fragments are common modes of gene evolution. They can be studied by comparing genome structures at the DNA level.
• Identification of conserved synteny and genome rearrangement events
• Analysis of breakpoints
• Analysis of content and distribution of DNA repeats
![Page 23: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/23.jpg)
Comparison of genome structure at gene level
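Chromosomal breakage and exchange of chromosomal fragments disrupt gene order.
Therefore, the degree to which gene order is conserved correlates with the evolutionary distance between genomes: the more distant two genomes are, the less of their gene order is shared.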
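One simple way to quantify disrupted gene order is to count breakpoints: pairs of genes that are neighbours in one genome but not in the other. The sketch below is a simplified, unsigned version that ignores gene orientation and assumes each gene appears once per genome; the gene labels are made up for the example.

```python
def adjacencies(order):
    """Set of unordered neighbouring gene pairs in a gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def breakpoint_count(order_a, order_b):
    """Number of neighbouring pairs in order_a that are not neighbours in order_b.

    Only genes present in both orders are considered.
    """
    shared = set(order_a) & set(order_b)
    a = [gene for gene in order_a if gene in shared]
    b = [gene for gene in order_b if gene in shared]
    return len(adjacencies(a) - adjacencies(b))


if __name__ == "__main__":
    reference = ["A", "B", "C", "D", "E", "F"]
    inverted = ["A", "B", "E", "D", "C", "F"]  # block C-D-E reversed
    print(breakpoint_count(reference, reference))
    print(breakpoint_count(reference, inverted))
```

A single inversion breaks the order at its two ends, so the count grows as rearrangements accumulate between two genomes.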
![Page 24: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/24.jpg)
Comparative analysis of coding regions
The analysis and comparison of coding regions starts with a gene identification algorithm, which is used to infer what portions of the genomic sequence code for genes.
There are four basic approaches for gene identification.
![Page 25: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/25.jpg)
Comparative analysis of coding regions
A number of algorithms have been used in comparative genomics to aid function prediction of genes.
• Identification of gene-coding regions
• Comparison of gene content
• Comparison of protein content
• Comparative genome based function prediction
![Page 26: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/26.jpg)
Comparison of gene content
After the predicted gene set is generated, it is informative to compare the gene content across genomes.
The first statistic to compare is the estimated total number of genes in a genome. Other statistics that elucidate the similarities and differences between genomes include the percentage of the genome that codes for genes, the distribution of coding regions across the genome, average gene length, and codon usage.
The comparison is often done using a pairwise sequence comparison tool such as BLASTN or TBLASTX.
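Two of the statistics listed above, codon usage and the percentage of the genome that codes for genes, can be sketched in a few lines. The function names, the half-open `(start, end)` interval convention, and the assumption that gene intervals do not overlap are all illustrative choices, not part of the original slides.

```python
from collections import Counter


def codon_usage(cds):
    """Relative frequency of each codon in a coding sequence read in frame.

    Trailing bases that do not fill a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return {}
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}


def coding_fraction(genome_length, gene_intervals):
    """Fraction of the genome covered by (start, end) gene intervals.

    Intervals are half-open and assumed not to overlap.
    """
    coding = sum(end - start for start, end in gene_intervals)
    return coding / genome_length


if __name__ == "__main__":
    print(codon_usage("ATGGCTGCTTAA"))
    print(coding_fraction(1000, [(0, 100), (200, 350)]))
```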
![Page 27: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/27.jpg)
Comparison of protein content
A second level of analysis that can be performed is to compare the set of gene products (proteins) between the genomes, which has been termed "comparative proteomics".
It is important to compare the protein contents in critical pathways and important functional categories across genomes.
Two widely used resources for pathways and functional categories are the KEGG pathway database and the Gene Ontology (GO) hierarchy.
![Page 28: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/28.jpg)
Interesting statistics to compare include:
• Level of sequence identity between orthologous pairs across genomes
• Paralogous pairs within a genome
• Number of replicated copies in corresponding paralog families
• Functions of the paralogs
![Page 29: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/29.jpg)
Comparative analysis of noncoding regions
Noncoding regions of the genome have gained a lot of attention in recent years because of their predicted role in the regulation of transcription, DNA replication, and other biological functions.
![Page 30: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/30.jpg)
Insights into Genome Fluxes and the Processes of Evolution
• From an evolutionary biology perspective, whole genome comparisons provide molecular insights into the processes of evolution, including the molecular events responsible for the variations and fluxes that occur throughout a genome. These events include inversions, translocations, deletions, duplications, and insertions.
![Page 31: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/31.jpg)
The Impact of Comparative Genomics in Phylogenetic Analysis
Schematic depiction of the phylogenetic position of Microsporidia. Based on small subunit ribosomal RNA (SSU rRNA), they were placed as early-branching eukaryotes that evolved prior to the acquisition of mitochondria; a composite gene phylogeny subsequently placed them closer to fungi. The latter placement has been confirmed by the complete sequence of the microsporidian Encephalitozoon cuniculi, in which, despite the absence of mitochondria, several mitochondrial genes could be observed.
![Page 32: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/32.jpg)
![Page 33: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/33.jpg)
Contd…
We have learned from homologous sequence alignment that the information that can be gained by comparing two genomes is largely dependent upon the phylogenetic distance between them.
Phylogenetic distance is a measure of the degree of separation between two organisms or their genomes on an evolutionary scale, usually expressed as the number of accumulated sequence changes, number of years, or number of generations.
The more distantly related two organisms are, the less sequence similarity or shared genomic features will be detected between them.
Thus, only general insights about classes of shared genes can be gathered by genomic comparisons at very long phylogenetic distances (e.g., over one billion years since their separation). Over such very large distances, the order of genes and the signatures of sequences that regulate their transcription are rarely conserved.
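Distance expressed as "number of accumulated sequence changes" can be sketched from an alignment. The example below computes the observed proportion of differing sites (the p-distance) and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), which estimates substitutions per site under the assumption that all substitutions are equally likely. The slides do not specify a distance model, so the use of Jukes-Cantor here is an assumption.

```python
import math


def p_distance(aligned_a, aligned_b):
    """Proportion of aligned, ungapped positions at which two sequences differ."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance (expected substitutions per site).

    The correction is undefined once p reaches 0.75, the level expected
    for unrelated random sequences, so infinity is returned there.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jukes_cantor(p))
```

The corrected value is always at least as large as the observed proportion, because some sites will have changed more than once; this is one reason very distant genomes retain little detectable similarity.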
![Page 34: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/34.jpg)
How Are Genomes Compared?
• A simple comparison of the general features of genomes such as genome size, number of genes, and chromosome number presents an entry point into comparative genomic analysis.
• Data for several fully-sequenced model organisms is shown in Table 1.
![Page 35: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/35.jpg)
Contd…
• For example, the tiny flowering plant Arabidopsis thaliana has a smaller genome than the fruit fly Drosophila melanogaster (157 million base pairs v. 165 million base pairs, respectively), yet it possesses nearly twice as many genes (25,000 v. 13,000).
• In fact, A. thaliana has approximately the same number of genes as humans (~25,000).
• Thus, a very early lesson learned in the "genomic era" is that genome size does not correlate with evolutionary status, nor is the number of genes proportionate to genome size.
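The arithmetic behind this point can be worked through using only the figures quoted above (157 Mb and 25,000 genes for A. thaliana; 165 Mb and 13,000 genes for D. melanogaster). The helper name is a hypothetical choice for the example.

```python
def genes_per_megabase(genome_size_bp, gene_count):
    """Average number of genes per million base pairs."""
    return gene_count / (genome_size_bp / 1_000_000)


# Figures as quoted in the slide text.
genomes = {
    "Arabidopsis thaliana": (157_000_000, 25_000),
    "Drosophila melanogaster": (165_000_000, 13_000),
}

if __name__ == "__main__":
    for name, (size_bp, genes) in genomes.items():
        print(name, round(genes_per_megabase(size_bp, genes), 1))
```

With these numbers the plant carries roughly 159 genes per megabase against roughly 79 for the fly, so the smaller genome is about twice as gene-dense.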
![Page 36: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/36.jpg)
Contd..
• Figure 1 depicts a chromosome-level comparison of the human and mouse genomes that shows the level of synteny between these two mammals.
• Synteny is a situation in which genes are arranged in similar blocks in different species.
• The nature and extent of conservation of synteny differs substantially among chromosomes.
• For example, the X chromosomes are represented as single, reciprocal syntenic blocks.
• Human chromosome 20 corresponds entirely to a portion of mouse chromosome 2, with nearly perfect conservation of order along almost the entire length, disrupted only by a small central segment.
• Human chromosome 17 corresponds entirely to a portion of mouse chromosome 11.
• Other chromosomes, however, show evidence of more extensive interchromosomal rearrangement.
• Results such as these provide an extraordinary glimpse into the chromosomal changes that have shaped the mouse and human genomes since their divergence from a common ancestor 75–80 million years ago.
![Page 37: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/37.jpg)
Comparing Human, Chimp, and Mouse Genomes
The graphs below indicate the similarity between the human genome and those of the chimpanzee and the mouse as they are mapped to identical locations in the human genome.
Since the chimpanzee genome is closer in evolutionary time to the human genome, the chimp chromosomes map very closely to human chromosomes.
The mouse genome is more distant in evolutionary time from human, and thus its chromosomes do not map as closely as do the chimp chromosomes.
The white areas indicate areas of the human genome that either do not map well to the other genome, or are areas of centromeres and telomeres where the genome sequence is unknown.
Chromosome numbering is purely arbitrary, based upon early microscopic estimates of chromosome length.
The chimpanzee genome has 23 numbered chromosomes, the human genome has 22 numbered chromosomes (chimp chromosomes 2a and 2b map to human chromosome 2), and the mouse genome has 19 numbered chromosomes.
The X and Y sex chromosomes have unique names, as well as other unique characteristics.
![Page 38: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/38.jpg)
Mouse genome mapped on the human genome
• This image shows the 34% of the mouse genome that maps to identical sequence in the human genome.
• The matching locations are jumbled, indicating rearrangements of the two genomes since their last common ancestor, approximately 75 million years before present.
• Data for this figure comes from assemblies of the human and mouse genomes available from the UCSC Genome Browser in June 2006.
![Page 39: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/39.jpg)
Chimpanzee genome mapped on the human genome
• This image shows the 95% of the chimpanzee genome that maps to identical sequence in the human genome.
• The consistency of the color indication demonstrates the close identity between the two genomes since their last common ancestor, approximately 5 million years before present.
• Human chromosome 2 actually aligns to two separate chimp chromosomes, now called chr2a and chr2b and represented here by the same color.
• Data for this figure comes from assemblies of the human and chimpanzee genomes available from the UCSC Genome Browser in June 2006.
![Page 40: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/40.jpg)
Benefits of comparative genomics
• Identifies DNA sequences that have been "conserved".
• Pinpoints genes that are essential to life and highlights genomic signals that control gene function across many species.
• Provides a powerful tool for studying evolution.
Applications
• Agriculture
• Biotechnology
• Zoology
• Evolutionary trees
• Drug discovery
![Page 41: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/41.jpg)
Comparative Genomics in Drug Discovery
Comparative genomic studies shed light on the pathogenesis of organisms, opening up opportunities for therapeutic intervention and helping to understand and identify disease genes.
One of the most important outcomes of comparative analyses at a genome-wide scale is the ability to identify and develop novel drug targets.
![Page 42: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/42.jpg)
Comparative genomics in drug discovery programs. A flow chart diagram explaining how comparative genomics can facilitate drug discovery programs for the discovery of new antimicrobials.
![Page 43: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/43.jpg)
![Page 44: genomic comparison](https://reader036.fdocuments.us/reader036/viewer/2022062316/58a6267a1a28ab416c8b53b3/html5/thumbnails/44.jpg)