Spring 2007 Bioinformatics Ch. 6 - Genomics. Completed genomes Spring 2009 Bioinformatics.


Page 1:

Spring 2007 Bioinformatics

Ch. 6 - Genomics

Page 2:

Completed genomes

Spring 2009 Bioinformatics

• http://www.genomesonline.org

Page 3:

Spring 2009 Bioinformatics

• Avg. genome = 5 Mb
• Typical sequence coverage = 20X, therefore approx. 100 Mb of DNA

• Avg. English word size = 5 letters
• Avg. words per page = 250, therefore 1,250 letters per page
• Avg. book size = 200 pages, therefore 250,000 letters per book
• Approximately 400 books per genome

• 958 completed genomes as of January 1, 2009
• Approximately 383,200 books' worth of genomic information

• MSU library holdings: 182,000
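The arithmetic behind the book analogy can be checked with a short sketch. All figures are the slide's own estimates; the constant and function names are made up for illustration.

```python
# Back-of-the-envelope numbers taken from the slide (estimates, not measurements).
AVG_GENOME_BP = 5_000_000      # 5 Mb average genome
COVERAGE = 20                  # 20X sequence coverage
LETTERS_PER_WORD = 5
WORDS_PER_PAGE = 250
PAGES_PER_BOOK = 200
COMPLETED_GENOMES = 958        # as of January 1, 2009


def books_per_genome():
    """Books needed to hold the raw sequence generated for one genome."""
    sequenced_bases = AVG_GENOME_BP * COVERAGE  # 100 Mb of DNA
    letters_per_book = LETTERS_PER_WORD * WORDS_PER_PAGE * PAGES_PER_BOOK  # 250,000
    return sequenced_bases // letters_per_book


def total_books():
    """Books needed for all completed genomes."""
    return books_per_genome() * COMPLETED_GENOMES


print(books_per_genome())  # 400
print(total_books())       # 383200
```

By this estimate the sequence data fills roughly twice as many books as the 182,000 library holdings quoted on the slide.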

Page 4:

Spring 2007 Bioinformatics

Approaches to Genome Sequencing

• Whole Genome Sequencing
• Shotgun Sequencing
• Expressed Sequence Tags
• Comparative Genomics
• Metagenomics

Page 5:

Overview of Genome Sequencing

Genomic DNA

Create Genomic Library

BAC Clones

Construction of Genome Map

DNA Sequencing and Assembly

Isolate Genomic DNA

Page 6:

Isolating Genomic DNA
à la Qiagen’s DNeasy kit

Lysis:
• Proteinase K digestion
• Lysis by chaotropic salt

Purification:
• DNA negatively charged
• Bind positively charged column
• Wash (EtOH) away impurities

Elution:
• Removal of DNA
• Disrupt ionic interaction with high salt buffer

Preservation:
• Store at -20°C to -160°C
• Tris-EDTA buffer [pH 8.0]

Page 7:

Sephadex Structure

Page 8:

Creating a Genomic Library

BAC Clones

Cut Genomic DNA:
• Partial Restriction Digest
  – EcoRI & EcoRI methylase
• Mechanical Shearing
• Determine Avg. fragment size

Clone Fragments into BAC vectors:
• Properties of BACs

Transform E. coli:
• Electroporation
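A minimal sketch of the "cut genomic DNA" step, assuming that a partial digest can be modeled as each EcoRI site (GAATTC, cut after the G) being cleaved independently with some probability. That probability stands in for the competition between EcoRI and EcoRI methylase and is an illustrative assumption, as are the function names and the toy sequence.

```python
import random

ECORI_SITE = "GAATTC"  # EcoRI recognition site; the enzyme cuts after the first G


def cut_positions(seq, site=ECORI_SITE, offset=1):
    """Return the 0-based cut index for every occurrence of the site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, cuts):
    """Split seq at the given cut indices and return the fragments in order."""
    fragments = []
    previous = 0
    for cut in sorted(cuts):
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


def partial_digest(seq, cut_probability, rng):
    """Cut each site independently with the given probability (toy model)."""
    chosen = [c for c in cut_positions(seq) if rng.random() < cut_probability]
    return digest(seq, chosen)


def average_fragment_size(fragments):
    """Mean fragment length in bases."""
    return sum(len(f) for f in fragments) / len(fragments)


toy = "AAGAATTCTTTTGAATTCCC"  # made-up sequence with two EcoRI sites
print(digest(toy, cut_positions(toy)))  # ['AAG', 'AATTCTTTTG', 'AATTCCC']
print(average_fragment_size(partial_digest(toy, 0.5, random.Random(0))))
```

Lowering `cut_probability` leaves more sites uncut and so raises the average fragment size, which is the point of a partial digest when large BAC inserts are wanted.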

Page 9:

Pulsed-Field Gel Electrophoresis

Page 10:

Average Insert Size by Pulsed-Field Gel Electrophoresis

Page 11:

Average Insert Size in Human BACs

Page 12:

Creating a Genomic Library

BAC Clones

Cut Genomic DNA:
• Partial Restriction Digest
  – EcoRI & EcoRI methylase
• Mechanical Shearing
• Determine Avg. fragment size

Clone Fragments into BAC vectors:
• Properties of BACs

Transform E. coli:
• Electroporation

Page 13:

Bacterial Artificial Chromosome

• Derived from F plasmids
• Multiple cloning site
• Selectable Marker
  – Antibiotic Resistance Gene, e.g., cm (chloramphenicol resistance)
• Ori S - unidirectional
• Par genes
  – partitioning genes
  – maintain single copy of BAC

Page 14:

Creating a Genomic Library

BAC Clones

Cut Genomic DNA:
• Partial Restriction Digest
  – EcoRI & EcoRI methylase
• Mechanical Shearing
• Determine Avg. fragment size

Clone Fragments into BAC vectors:
• Properties of BACs

Transform E. coli:
• Electroporation

Page 15:

Construction of Genome Map

BAC Clones

Construction of Genome Map

DNA Sequencing and Assembly

• BAC end sequencing
• Identify overlapping BACs
• Subclone BACs into plasmids

Transformed E. coli:

Plasmid Miniprep
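The "identify overlapping BACs" step can be illustrated with a toy suffix-prefix comparison. This is only a sketch: it assumes error-free sequences and exact matches, whereas real BACs are far longer and are compared with error-tolerant methods. The clone names, sequences, and function names are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def find_overlaps(clones, min_overlap=3):
    """Return {(name_a, name_b): overlap} for every ordered pair that overlaps."""
    overlaps = {}
    for name_a, seq_a in clones.items():
        for name_b, seq_b in clones.items():
            if name_a == name_b:
                continue
            size = overlap_length(seq_a, seq_b, min_overlap)
            if size:
                overlaps[(name_a, name_b)] = size
    return overlaps


# Made-up clone sequences, far shorter than real BAC inserts.
clones = {"bac1": "ACGTTGCA", "bac2": "TGCAGGTC", "bac3": "GGTCAAAT"}
print(find_overlaps(clones))
# {('bac1', 'bac2'): 4, ('bac2', 'bac3'): 4}
```

Chaining the reported pairs (bac1, then bac2, then bac3) gives the clone order along the map.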

Page 16:

Genome Assembly and Annotation

Page 17:

Overview of Shotgun Sequencing

Genomic DNA

Create Genomic Library

Plasmid Clones

Construction of Genome Map

DNA Sequencing and Assembly

Isolate Genomic DNA

Page 18:

Overview of EST Sequencing

Create Genomic Library

DNA Sequencing

Isolate mRNA

Create cDNA

Page 19:

Comparative Genomics

Create Genomic Library

BAC Clones

Construction of Genome Map

DNA Sequencing and Assembly

Isolate mRNA and create cDNA

Synteny - same gene order preserved between species
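As a rough illustration of synteny, the sketch below finds runs of genes that appear consecutively and in the same order in two species. It assumes orthologs share a name and each gene occurs once per genome, and it ignores inverted blocks; the gene names and the function name are invented for the example.

```python
def syntenic_blocks(order_a, order_b, min_genes=2):
    """Return runs of genes that are consecutive and in the same order in both lists."""
    position_in_b = {gene: index for index, gene in enumerate(order_b)}
    blocks = []
    current = []
    for gene in order_a:
        extends_block = (
            current
            and gene in position_in_b
            and position_in_b[gene] == position_in_b[current[-1]] + 1
        )
        if extends_block:
            current.append(gene)
            continue
        # The run is broken: keep it if long enough, then start a new one.
        if len(current) >= min_genes:
            blocks.append(current)
        current = [gene] if gene in position_in_b else []
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


species_a = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF"]
species_b = ["geneD", "geneE", "geneF", "geneX", "geneA", "geneB", "geneC"]
print(syntenic_blocks(species_a, species_b))
# [['geneA', 'geneB', 'geneC'], ['geneD', 'geneE', 'geneF']]
```

Here the two genomes share two three-gene blocks whose relative positions have been rearranged.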

Page 20:

Comparative Genomics BAC array

Page 21:

Comparative Genome Hybridization

Page 22:

Bordetella phylogeny

Page 23:

Comparative Genome Hybridization

Page 24:

Comparative Genome Hybridization

Page 25:

Metagenomic analysis

• What is metagenomics?
  – Metagenomics is the genomic analysis of the collective genomes of an assemblage of organisms from a defined environment.
    » Handelsman et al., 2002
  – a.k.a. community genomics, environmental genomics
  – Derived from tools, techniques and models used in genomics.

• Why do metagenomic analysis?
  – Genomic content of all eucaryotes, bacteria, archaea and viruses in an environment.
  – Provides a picture of genetic/functional potential of the community.

Page 26:

Metagenomics

Page 30:

Venter’s Trip

Page 31:

Yooseph et al., PLoS Biology, 2007

Page 32:

Yooseph et al., PLoS Biology, 2007

Page 33:

Creation of Fosmid Libraries

Page 34:

Preliminary Categorization of 263 ORFs from a Fosmid Library of Subgingival Plaque

Category         Percentage of library
Eucaryotic       34%
Bacterial        21%
Archaeal         1.1%
Viral¹           0.8%
Bacteriophage    2%
Unidentified     41%

¹ not bacteriophage

Page 35:

Spring 2007 Bioinformatics

Genome Annotation

Page 36:

Genome Assembly and Annotation

RefSeq db

Page 37:

Caveats

• Finding genes involves computational methods as well as experimental validation

• Computational methods are often inadequate: they over-predict genes and often generate erroneous (false-positive) ‘gene’ sequences that:
  – Are missing exons
  – Have incorrect exons
  – Are missing the 5’ and 3’ UTRs

Page 38:

Things we are looking to annotate

• CDS
• mRNA
• Alternative RNA
• Promoter and Poly-A Signal
• Pseudogenes
• ncRNA
• Repeat elements
• G+C content
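Of the items above, G+C content is the simplest to compute directly from sequence. The following sketch (function names and the window parameters are illustrative choices, not from the slides) reports the overall G+C fraction and a fixed-size window profile.

```python
def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


def windowed_gc(seq, window, step):
    """Return (start, gc_fraction) for each full window along the sequence."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


print(gc_content("GGCCAATT"))         # 0.5
print(windowed_gc("GGCCAATT", 4, 4))  # [(0, 1.0), (4, 0.0)]
```

A window profile is useful because local shifts in G+C content can flag regions of different origin or function even when the genome-wide average looks unremarkable.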

Page 39:

Pseudogenes

• As many as 20-30% of all genomic sequence predictions could be pseudogenes
• Non-functional copy of a gene
  – Processed pseudogene
    • Retrotransposon derived
    • No 5’ promoters
    • No introns
    • Often includes poly-A tail
  – Non-processed pseudogene
    • Gene duplication derived
  – Both include events that make the gene non-functional
    • Frameshift
    • Stop codons
• We assume pseudogenes have no function, but we really don’t know!
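The two disabling events named above can be screened for with a crude check, sketched here under the assumption that the candidate coding sequence is already extracted and starts in frame. A length that is not a multiple of three is used only as a rough proxy for a frameshift. The function names and test sequences are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def premature_stops(cds):
    """Return 0-based codon indices of in-frame stops before the final codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def length_suggests_frameshift(cds):
    """True when the length is not a multiple of three (rough frameshift proxy)."""
    return len(cds) % 3 != 0


print(premature_stops("ATGGCCAAATGA"))            # []  (intact toy ORF)
print(premature_stops("ATGTAAAAATGA"))            # [1] (stop at second codon)
print(length_suggests_frameshift("ATGGCAAATGA"))  # True (one base deleted)
```

A hit from either check marks a candidate for closer inspection; it does not by itself prove the locus is a pseudogene.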

Page 40:

Noncoding RNA (ncRNA)

• tRNA – transfer RNA: involved in translation
• rRNA – ribosomal RNA: structural component of ribosome, where translation takes place
• snRNA – small nuclear RNA: functional/catalytic in RNA maturation
• Antisense RNA - gene regulation
• siRNA - gene silencing

Page 41:

Noncoding RNA (ncRNA)

• ncRNA represent 80-98% of all transcripts in cell
• ncRNA have not been taken into account in gene counts, which rely on:
  – cDNA
  – ORF computational prediction
  – Comparative genomics looking at ORF
• ncRNA can be:
  – Structural
  – Catalytic
  – Regulatory
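To make "ORF computational prediction" concrete, here is a deliberately simple forward-strand ORF scanner. It is a sketch only: it assumes ATG starts and standard stop codons, and it does not handle the reverse strand, introns, or alternative starts. Names and sequences are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq, min_codons=3):
    """Return (start, end) half-open spans of ATG...stop ORFs on the forward strand.

    min_codons counts every codon in the span, including the start and stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
print(find_orfs("GGGCCCGGGCCC"))      # []
```

A sequence with no start-to-stop reading frame returns nothing, which illustrates why ORF-based gene counts overlook ncRNA genes.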

Page 42:

GenBank Features

-10_signal, -35_signal, 3'clip, 3'UTR, 5'clip, 5'UTR, attenuator, CAAT_signal, CDS, conflict, C_region, D-loop, D_segment, enhancer, exon

GC_signal, gene, iDNA, intron, J_segment, LTR, mat_peptide, misc_binding, misc_difference, misc_feature, misc_recomb, misc_RNA, misc_signal, misc_structure, modified_base

mRNA, N_region, old_sequence, polyA_signal, polyA_site, precursor_RNA, primer_bind, prim_transcript, promoter, protein_bind, RBS, repeat_region, repeat_unit, rep_origin, rRNA

satellite, scRNA, sig_peptide, snoRNA, snRNA, S_region, stem_loop, STS, TATA_signal, terminator, transit_peptide, tRNA, unsure, variation, V_region, V_segment

Page 43:

LOCUS       NG_005487               1850 bp    DNA     linear   ROD 14-FEB-2006
DEFINITION  Mus musculus ubiquitin-conjugating enzyme E2 variant 2 pseudogene
            (LOC625221) on chromosome 6.
ACCESSION   NG_005487
VERSION     NG_005487.1  GI:87239965
KEYWORDS    .
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 1850)
  AUTHORS   Wilson,R.
  TITLE     Mus musculus BAC clone RP24-201D17 from 6
  JOURNAL   Unpublished (2003)
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from AC121925.2.
FEATURES             Location/Qualifiers
     source          1..1850
                     /organism="Mus musculus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:10090"
                     /chromosome="6"
                     /note="AC121925.2 32277..34126"
     gene            101..1750
                     /gene="LOC625221"
                     /pseudo
                     /db_xref="GeneID:625221"
     repeat_region   1792..1827
                     /rpt_family="ID"
ORIGIN
        1 tcttctgcct caattcctca agtgctagta tcatatgccc atgccattat ttttaactcc
       61 cctttttcat gctaagaatt gaacacacgg ccctgcgtgc ggtggtgcgt ctggtagcag
      121 gagaagatgg cggtctccac aggagttaaa gttcctcgta attttcgctt gttggaagaa
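The feature table in a record like this one can be read with a few lines of standard-library code. The sketch below assumes the usual flat-file layout (feature keys indented five spaces, simple `start..end` locations) and embeds an abridged copy of the record's FEATURES block. It is not a general GenBank parser, and the function name is invented for illustration.

```python
import re

# Abridged FEATURES block from record NG_005487 (qualifiers trimmed).
RECORD = """\
FEATURES             Location/Qualifiers
     source          1..1850
                     /organism="Mus musculus"
     gene            101..1750
                     /gene="LOC625221"
                     /pseudo
     repeat_region   1792..1827
                     /rpt_family="ID"
ORIGIN
"""

# Feature key at column 6 followed by a simple start..end location.
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+(\d+)\.\.(\d+)\s*$")


def parse_features(text):
    """Return (key, start, end) for each simple feature in the FEATURES table."""
    features = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if in_table and line and not line.startswith(" "):
            break  # next top-level keyword (e.g. ORIGIN) ends the table
        if in_table:
            match = FEATURE_LINE.match(line)
            if match:
                key, start, end = match.groups()
                features.append((key, int(start), int(end)))
    return features


for key, start, end in parse_features(RECORD):
    print(key, start, end, end - start + 1)
```

For the pseudogene annotated here, the `gene` feature spans bases 101 to 1750, that is, 1,650 bp of the 1,850 bp record.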

Page 44:

The ideal annotation of “MyGene”

MyGene

All mRNAs

All proteins

All structures

All SNPs

All clones

• All protein modifications
• Ontologies
• Interactions (complexes, pathways, networks)
• Expression (where and when, and how much)
• Evolutionary relationships

Promoter(s)