Spring 2007 Bioinformatics
Ch. 6 - Genomics
Completed genomes
Spring 2009 Bioinformatics
• http://www.genomesonline.org
• Avg. genome = 5 Mb
• Typical sequence coverage = 20X, therefore approx. 100 Mb of DNA sequenced per genome
• Avg. English word size = 5 letters
• Avg. words per page = 250, therefore 1,250 letters per page
• Avg. book size = 200 pages, therefore 250,000 letters per book
• Approximately 400 books per genome (100 Mb / 250,000 letters per book)
• 958 completed genomes as of January 1, 2009
• Approximately 383,200 books' worth of genomic information
• MSU library holdings: 182,000
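The book comparison above is plain arithmetic, so it can be checked with a short Python sketch. All constants are the figures quoted on the slide; the function and variable names are illustrative only.

```python
# Figures quoted on the slide; names are illustrative, not from any library.
GENOME_SIZE_BP = 5_000_000   # average genome, 5 Mb
COVERAGE = 20                # typical sequence coverage (20X)
LETTERS_PER_WORD = 5
WORDS_PER_PAGE = 250
PAGES_PER_BOOK = 200
COMPLETED_GENOMES = 958      # as of January 1, 2009


def letters_per_book():
    """Letters in an average book: 5 * 250 * 200."""
    return LETTERS_PER_WORD * WORDS_PER_PAGE * PAGES_PER_BOOK


def books_per_genome(genome_size_bp=GENOME_SIZE_BP, coverage=COVERAGE):
    """Books needed to hold every base sequenced for one genome."""
    sequenced_bases = genome_size_bp * coverage
    return sequenced_bases // letters_per_book()


per_genome = books_per_genome()
total_books = per_genome * COMPLETED_GENOMES
print(per_genome, total_books)
```

Changing `coverage` or `genome_size_bp` shows how quickly the estimate grows for larger genomes or deeper sequencing.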
Approaches to Genome Sequencing
• Whole Genome Sequencing
• Shotgun Sequencing
• Expressed Sequence Tags
• Comparative Genomics
• Metagenomics
Overview of Genome Sequencing
Genomic DNA
Create Genomic Library
BAC Clones
Construction of Genome Map
DNA Sequencing and Assembly
Isolate Genomic DNA
Isolating Genomic DNA
à la Qiagen’s DNeasy kit
Lysis:
• Proteinase K digestion
• Lysis by chaotropic salt
Purification:
• DNA is negatively charged
• Binds to a positively charged column
• Wash away impurities (EtOH)
Elution:
• Removal of DNA from the column
• Disrupt ionic interaction with high salt buffer
Preservation:
• Store at -20°C to -160°C
• Tris-EDTA buffer [pH 8.0]
Sephadex Structure
Creating a Genomic Library
BAC Clones
Cut Genomic DNA:
• Partial Restriction Digest
– EcoRI & EcoRI methylase
• Mechanical Shearing
• Determine Avg. fragment size
Clone Fragments into BAC vectors:
• Properties of BACs
Transform E. coli:
• Electroporation
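The "cut genomic DNA" and "determine average fragment size" steps can be illustrated in silico. The sketch below is a toy model, not a lab protocol: it finds EcoRI sites (GAATTC, cut after the G) in a sequence and cuts each site with a fixed probability to mimic a partial digest. The `cut_probability` parameter is an assumption standing in for whatever limits cutting at the bench (for example, sites protected by EcoRI methylase), and the toy sequence is made up.

```python
import random

ECORI_SITE = "GAATTC"  # EcoRI recognition site; the enzyme cuts G^AATTC


def cut_positions(seq, site=ECORI_SITE, offset=1):
    """Return the index of every cut point (site start + offset) in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, cut_probability=1.0, rng=None):
    """Cut seq at each EcoRI site with the given probability.

    cut_probability=1.0 is a complete digest; lower values model a
    partial digest that leaves some sites uncut (hypothetical knob).
    """
    rng = rng or random.Random(0)
    cuts = [p for p in cut_positions(seq) if rng.random() < cut_probability]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def average_fragment_size(fragments):
    """Mean fragment length in bases."""
    return sum(len(f) for f in fragments) / len(fragments)


toy = "AAAAGAATTCCCCCCCGAATTCTTTT"  # made-up sequence with two EcoRI sites
complete = digest(toy)
print([len(f) for f in complete], average_fragment_size(complete))
```

Lowering `cut_probability` yields fewer, longer fragments, which is the point of a partial digest when large BAC inserts are wanted.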
Pulsed-Field Gel Electrophoresis
Average Insert Size by Pulsed-Field Gel Electrophoresis
Average Insert Size in Human BACs
Bacterial Artificial Chromosome
• Derived from F plasmids
• Multiple cloning site
• Selectable Marker
– Antibiotic Resistance Gene - e.g., Cm (chloramphenicol)
• Ori S - unidirectional
• Par genes
– partitioning genes
– maintain single copy of BAC
Construction of Genome Map
BAC Clones
DNA Sequencing and Assembly
• BAC end sequencing
• Identify overlapping BACs
• Subclone BACs into plasmids
Transformed E. coli:
Plasmid Miniprep
Genome Assembly and Annotation
Overview of Shotgun Sequencing
Genomic DNA
Create Genomic Library
Plasmid Clones
Construction of Genome Map
DNA Sequencing and Assembly
Isolate Genomic DNA
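The "Assembly" step in the shotgun workflow amounts to finding reads that overlap and merging them. The sketch below is a deliberately naive greedy assembler, assuming error-free reads from one strand and no repeats; real assemblers are far more involved. Function names and the example reads are made up for illustration.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap_length(a, b, min_overlap)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # nothing left overlaps: remaining contigs stay separate
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)
    return contigs


reads = ["ATGCGT", "CGTACC", "ACCTGA"]  # toy reads, overlapping by 3 bases
print(greedy_assemble(reads))
```

With higher coverage more reads overlap each position, which is why the 20X figure quoted earlier matters: gaps where no reads overlap leave separate contigs.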
Overview of EST Sequencing
Create Genomic Library
DNA Sequencing
Isolate mRNA
Create cDNA
Comparative Genomics
Create Genomic Library
BAC Clones
Construction of Genome Map
DNA Sequencing and Assembly
Isolate mRNA and create cDNA
Synteny - same gene order preserved between species
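As a rough illustration of synteny, the sketch below takes the gene order along a chromosome in two species and reports runs of genes that are adjacent and in the same order in both. It assumes each gene name appears once per species and ignores strand and inversions; the gene names are placeholders.

```python
def syntenic_blocks(order_a, order_b, min_genes=2):
    """Return runs of genes that are consecutive and co-linear in both orders."""
    position_in_b = {gene: idx for idx, gene in enumerate(order_b)}
    blocks = []
    current = []

    def close_block():
        # Keep the open run only if it is long enough to count as a block.
        if len(current) >= min_genes:
            blocks.append(list(current))
        current.clear()

    for gene in order_a:
        if gene not in position_in_b:
            close_block()  # gene missing from species B breaks the run
            continue
        extends = current and position_in_b[gene] == position_in_b[current[-1]] + 1
        if not extends:
            close_block()
        current.append(gene)
    close_block()
    return blocks


species_a = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF"]
species_b = ["geneD", "geneE", "geneF", "geneX", "geneA", "geneB", "geneC"]
print(syntenic_blocks(species_a, species_b))
```

Here the two species share two conserved blocks even though a rearrangement has swapped the blocks' relative positions.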
Comparative Genomics BAC array
Comparative Genome Hybridization
Bordetella phylogeny
Metagenomic analysis
• What is metagenomics?
– Metagenomics is the genomic analysis of the collective genomes of an assemblage of organisms from a defined environment.
» Handelsman, et al, 2002
– a.k.a., community genomics, environmental genomics
– Derived from tools, techniques and models used in genomics.
• Why do metagenomic analysis?
– Genomic content of all eucaryotes, bacteria, archaea and viruses in an environment.
– Provides a picture of genetic/functional potential of the community.
Metagenomics
Venter’s Trip
Yooseph, et al, PLoS Biology, 2007
Creation of Fosmid Libraries
Preliminary Categorization of 263 ORFs from a Fosmid Library of Subgingival Plaque

Category         Percentage of library
Eucaryotic       34%
Bacterial        21%
Archaeal         1.1%
Viral¹           0.8%
Bacteriophage    2%
Unidentified     41%
¹ not bacteriophage
Genome Annotation
Genome Assembly and Annotation
RefSeq db
Caveats
• Finding genes involves computational methods as well as experimental validation
• Computational methods are often inadequate and often generate erroneous (false positive) ‘gene’ predictions that:
– Are missing exons
– Have incorrect exons
– Over-predict genes
– Are missing the 5’ and 3’ UTRs
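To see why purely computational gene finding produces these errors, consider the simplest possible predictor: scan each forward reading frame for an ATG followed by an in-frame stop codon. The sketch below is that naive approach only, with illustrative names and a made-up sequence. It knows nothing about introns, promoters or UTRs, so on eukaryotic DNA it would miss exons and call spurious ORFs, exactly the caveats listed above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Find ATG...stop ORFs in the three forward frames.

    Returns (start, end, frame) tuples; `end` is exclusive and includes
    the stop codon. The reverse strand is not scanned in this sketch.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # open an ORF at the first ATG seen
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # close the ORF and look for the next ATG
    return orfs


toy = "CCATGAAACCCGGGTAACC"  # made-up sequence with one short ORF
for start, end, frame in find_orfs(toy):
    print(frame, toy[start:end])
```

The `min_codons` threshold is the only defence against false positives here, which hints at why experimental validation remains necessary.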
What are we looking to annotate?
• CDS
• mRNA
• Alternative RNA
• Promoter and Poly-A Signal
• Pseudogenes
• ncRNA
• Repeat elements
• G+C content
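Of the items in this list, G+C content is the most direct to compute. A minimal sketch, with illustrative function names and an arbitrary window size, reports the overall G+C fraction and a sliding-window profile:

```python
def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0  # nothing to count, e.g. a run of N
    return sum(base in "GC" for base in counted) / len(counted)


def sliding_gc(seq, window=10, step=5):
    """G+C fraction in successive windows, as (start, fraction) pairs."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


print(gc_content("GGCCAATT"))
print(sliding_gc("GGGGGAAAAA", window=5, step=5))
```

A windowed profile is usually more informative than a single number, since local shifts in G+C can flag features such as horizontally acquired regions.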
Pseudogenes
• As many as 20-30% of all genomic sequence predictions could be pseudogenes
• Non-functional copy of a gene
– Processed pseudogene
» Retrotransposon derived
» No 5’ promoters
» No introns
» Often includes poly-A tail
– Non-processed pseudogene
» Gene duplication derived
– Both include events that make the gene non-functional
» Frameshift
» Stop codons
• We assume pseudogenes have no function, but we really don’t know!
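The two disabling events named above, frameshifts and stop codons, can be screened for with a simple check on a candidate coding sequence. This is only a sketch under strong assumptions: the input is taken to be a single spliced CDS starting at its first codon, and a length that is not a multiple of three is treated as a hint of a frameshift. Real pseudogene detection compares the candidate against its functional parent gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def disabling_features(cds):
    """List simple signs that a coding sequence may be non-functional."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3 (possible frameshift)")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # A stop codon anywhere before the final codon truncates the protein.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon {codon} at codon {index + 1}")
    return problems


print(disabling_features("ATGAAACCCTAA"))  # intact toy CDS
print(disabling_features("ATGTAACCCTAA"))  # in-frame stop after ATG
print(disabling_features("ATGAAACCTAA"))   # one base deleted
```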
Noncoding RNA (ncRNA)
• tRNA – transfer RNA: involved in translation
• rRNA – ribosomal RNA: structural component of ribosome, where translation takes place
• snRNA – small nuclear RNA: functional/catalytic in RNA maturation
• Antisense RNA - gene regulation
• siRNA - gene silencing
Noncoding RNA (ncRNA)
• ncRNAs represent 80-98% of all transcripts in a cell
• ncRNAs have not been taken into account in gene counts, which rely on:
– cDNA
– ORF computational prediction
– Comparative genomics looking at ORFs
• ncRNA can be:
– Structural
– Catalytic
– Regulatory
GenBank Features
-10_signal, -35_signal, 3'clip, 3'UTR, 5'clip, 5'UTR, attenuator, CAAT_signal, CDS, conflict, C_region, D-loop, D_segment, enhancer, exon,
GC_signal, gene, iDNA, intron, J_segment, LTR, mat_peptide, misc_binding, misc_difference, misc_feature, misc_recomb, misc_RNA, misc_signal, misc_structure, modified_base,
mRNA, N_region, old_sequence, polyA_signal, polyA_site, precursor_RNA, primer_bind, prim_transcript, promoter, protein_bind, RBS, repeat_region, repeat_unit, rep_origin, rRNA,
satellite, scRNA, sig_peptide, snoRNA, snRNA, S_region, stem_loop, STS, TATA_signal, terminator, transit_peptide, tRNA, unsure, variation, V_region, V_segment
LOCUS       NG_005487               1850 bp    DNA     linear   ROD 14-FEB-2006
DEFINITION  Mus musculus ubiquitin-conjugating enzyme E2 variant 2 pseudogene
            (LOC625221) on chromosome 6.
ACCESSION   NG_005487
VERSION     NG_005487.1  GI:87239965
KEYWORDS    .
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 1850)
  AUTHORS   Wilson,R.
  TITLE     Mus musculus BAC clone RP24-201D17 from 6
  JOURNAL   Unpublished (2003)
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from AC121925.2.
FEATURES             Location/Qualifiers
     source          1..1850
                     /organism="Mus musculus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:10090"
                     /chromosome="6"
                     /note="AC121925.2 32277..34126"
     gene            101..1750
                     /gene="LOC625221"
                     /pseudo
                     /db_xref="GeneID:625221"
     repeat_region   1792..1827
                     /rpt_family="ID"
ORIGIN
        1 tcttctgcct caattcctca agtgctagta tcatatgccc atgccattat ttttaactcc
       61 cctttttcat gctaagaatt gaacacacgg ccctgcgtgc ggtggtgcgt ctggtagcag
      121 gagaagatgg cggtctccac aggagttaaa gttcctcgta attttcgctt gttggaagaa
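The FEATURES table of a GenBank record like NG_005487 is regular enough to read with a few lines of code. The sketch below is a simplified hand-rolled parser, written only to show how feature keys, locations and qualifiers such as /pseudo are laid out; it assumes single-line qualifiers and simple start..end locations, and the function name is made up. For real work a maintained parser (for example Biopython's) is the safer choice. The embedded text is an excerpt of the record's feature table.

```python
import re

# Excerpt of the NG_005487 feature table, in flat-file layout.
RECORD = """\
FEATURES             Location/Qualifiers
     source          1..1850
                     /organism="Mus musculus"
                     /chromosome="6"
     gene            101..1750
                     /gene="LOC625221"
                     /pseudo
                     /db_xref="GeneID:625221"
     repeat_region   1792..1827
                     /rpt_family="ID"
ORIGIN
"""

FEATURE_LINE = re.compile(r"^ {5}(\S+) +(\S+)$")            # key + location
QUALIFIER_LINE = re.compile(r"^ {6,}/([^=]+)(?:=(.*))?$")   # /name or /name=value


def parse_features(text):
    """Return a list of {'key', 'location', 'qualifiers'} dicts."""
    features = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table:
            continue
        if line and not line.startswith(" "):
            break  # next top-level section, e.g. ORIGIN
        feature = FEATURE_LINE.match(line)
        qualifier = QUALIFIER_LINE.match(line)
        if feature:
            key, location = feature.groups()
            span = re.fullmatch(r"(\d+)\.\.(\d+)", location)
            if span:
                location = (int(span.group(1)), int(span.group(2)))
            features.append({"key": key, "location": location, "qualifiers": {}})
        elif qualifier and features:
            name, value = qualifier.groups()
            # Flag qualifiers such as /pseudo carry no value; store True.
            features[-1]["qualifiers"][name] = True if value is None else value.strip('"')
    return features


features = parse_features(RECORD)
pseudogenes = [f for f in features if f["qualifiers"].get("pseudo")]
print([f["key"] for f in features], pseudogenes[0]["location"])
```

Filtering on the /pseudo qualifier is how an annotation pipeline would separate this record's gene feature from functional genes.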
The ideal annotation of “MyGene”
MyGene
All mRNAs
All proteins
All structures
All SNPs
All clones
• All protein modifications
• Ontologies
• Interactions (complexes, pathways, networks)
• Expression (where and when, and how much)
• Evolutionary relationships
Promoter(s)