Bioinformatics
Overview
School of B&I TCD May 2010
Who, me?
• Andrew Lloyd
• 087-225-9850, 053-9255717, 01-896-2450
• Director INCBI 1993-2000
• Population genetics, evolution
• Whole genome analysis
• Immunology, chickens, FIRM
Definition/scope
• Storage, retrieval and analysis of biological (sequence) information.
• Insert better definition here
• Case can be made for microarray analysis
• NOT
– ecoinformatics (ecology)
– Image analysis
– Bar-coding hospital sheets
Philosophy
“Nothing worth learning can be taught” Oscar Wilde
Getting bioinformation
• Type it in: A,T,C,C,G,T,C,A (1991)
• Access databases
– Literature (PubMed)
– Medical (OMIM)
– DNA sequence (EMBL/GenBank)
– Protein sequence (UniProt, SwissProt, PIR)
– 3-D structure (PDB)
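As one illustration of programmatic database access, the sketch below builds a request URL for NCBI's E-utilities `efetch` service, which can return a GenBank record as FASTA text. The function name `efetch_fasta_url` and the accession string are placeholders invented for this example; the block only constructs the URL and makes no network call.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_fasta_url(accession, db="nuccore"):
    """Build an efetch URL asking for one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EUTILS_EFETCH + "?" + urlencode(params)

# "EXAMPLE_ACCESSION" is a placeholder, not a real record identifier.
url = efetch_fasta_url("EXAMPLE_ACCESSION")
print(url)
```

Swapping `db` for another Entrez database name (for example `protein`) would be the natural way to reuse the helper.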
Annotation
• In any DB, half is data and half context.
– Gene ontology (language)
– Parsing sequence (ORF, RBS, Intron, α-helix)
– Recognising similar sequences (evolution!)
– Complementary info: DB cross-referencing
• (DNA -> Protein -> 3D structure -> motifs)
Secondary databases
• Protein motifs, domains, families
• RNA structures (16S ribosomal RNA…)
• Taxonomy/classification
• Metabolic pathways (KEGG)
• Enzymes (Brenda, TCD, Ireland)
• SNPs: mutations and variants
• Disease DBs (OMIM)
• Immuno, epitope DBs
Complete genomes
• Ensembl (complex, basically vertebrate)
– Uniform look-and-feel; cross-refs
• UCSC GoldenPath browser
• Plants
• Bacterial genomes
– Including mitochondrial, chloroplast
– Eubacteria vs Archaea vs Eukaryotes
Annotated/known genes
• What does my gene do?
• Blast (fasta) against the DB
• SRS/Entrez to access databases
– Neighboring (similar things in same DB)
• DB cross-references
– Full picture of attributes
– What biochemical pathway?
The territory
[Diagram of cross-linked databases: UniProt (protein sequence), GenBank/EMBL (DNA sequence), PDB (3-D structure), OMIM, PubMed, Taxonomy, Maps & Genomes, full-text journals, Prosite, Pfam, PSSM]
Databases
• BIG
• EMBL/GenBank 200Gbp, 100m entries, 2500 complete genomes, 200K species
• Encycl. Britannica 180m letters, 40m words
• EMBL 1km of Britannica Volumes
• Doubling every 14-18 mo
• Human genome is X bp?
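The doubling-time figure lends itself to a quick back-of-envelope projection. The sketch below assumes simple exponential growth from the 200 Gbp quoted on the slide; the function name `projected_size` and the five-year horizon are choices made for this illustration only.

```python
def projected_size(current, months, doubling_months):
    """Size after `months` if the quantity doubles every `doubling_months`."""
    return current * 2 ** (months / doubling_months)

current_bp = 200e9  # 200 Gbp, the figure quoted on the slide

# Compare the two ends of the quoted 14-18 month doubling range.
for doubling in (14, 18):
    size = projected_size(current_bp, 60, doubling)
    print(f"doubling every {doubling} months: about {size / 1e12:.1f} Tbp after 5 years")
```

The spread between the two results shows how sensitive the projection is to the assumed doubling time.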
Intrinsic vs Context
Internal
• DNA, protein sequence
– DNA: Purine/Pyrimidine
– AAs: small, hydrophobic, aromatic, polar
– Variants: SNPs, Indels, Alt Splicing
• 2ndry structure
– DNA: stem/loops
– Protein: helix, sheet, turn, loop
Intrinsic vs Context
External, context for your molecule
• In other species (homologs, phylog trees)
• In which cell
• In which cellular location (GO)
• Molecular complex (dimers)
• Which pathway (KEGG)
• Where in genome (neighbors, synteny)
New Unknown Gene
• Blast homology searching
• Genomic location/neighboring genes
• Where is it expressed?
• How regulated (control sequences)
• Intron/exon structure
• Domain structure
• Restriction sites etc.
• Primer design
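Two items on this checklist, restriction sites and primer design, reduce to simple string operations in their most basic form. The sketch below searches for the EcoRI recognition site (GAATTC) and estimates a primer melting temperature with the rule of thumb Tm = 2(A+T) + 4(G+C). The function names and the test sequence are invented for illustration, and real primer design tools use more refined thermodynamic models.

```python
def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # step by 1 so overlapping sites are found
    return positions

def wallace_tm(primer):
    """Rule-of-thumb melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

seq = "ttgaattcagctagaattcc"          # made-up sequence with two EcoRI sites
print(find_sites(seq, "GAATTC"))
print(wallace_tm("ATGCCCAGGAGATTTGGA"))
```

The rule of thumb is usually quoted for short oligonucleotides only, so treat the number as a rough first estimate.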
DNA/gene structure
• Four bases: A, T, C, G (U replaces T in RNA)
– 2 pyrimidine, 2 purine
– LOTS of them: how many?
• Open reading frame
• 5’ signals, 3’ signals
• Introns/exons
• Neighbours (operons)
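The "how many?" question can be answered for any sequence by counting. A minimal sketch, assuming a plain string of bases; the function name `composition` and the returned field names are invented for this example, and the input reuses the eight-base string from the "Type it in" slide.

```python
from collections import Counter

PURINES = set("AG")
PYRIMIDINES = set("CTU")  # U stands in for T in RNA

def composition(seq):
    """Count bases and summarise purine/pyrimidine and GC content."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGTU")  # ignore any other characters
    return {
        "length": total,
        "purines": sum(counts[b] for b in PURINES),
        "pyrimidines": sum(counts[b] for b in PYRIMIDINES),
        "gc_fraction": (counts["G"] + counts["C"]) / total if total else 0.0,
    }

print(composition("ATCCGTCA"))
```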
Two sequences
• Alignment
– Local
– Global
• Dotplot
• Threading
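The difference between global and local alignment is easiest to see in the dynamic-programming recurrence. The sketch below computes only the best score, using Needleman-Wunsch for the global case and Smith-Waterman for the local case. The scoring values (match +1, mismatch -1, linear gap -2) are arbitrary choices for illustration, not values taken from the slides.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best pairwise alignment score by dynamic programming.

    local=False: Needleman-Wunsch (global, end to end).
    local=True:  Smith-Waterman (best-scoring pair of substrings).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]

print(align_score("ACGT", "AGT"))                          # global
print(align_score("TTTACGTTT", "GGACGGG", local=True))     # local
```

Recovering the alignment itself would need a traceback through the matrix, which is left out here to keep the sketch short.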
One seq vs many
• Homology search vs database
• Special case of 2-seq alignment
• Blast vs fasta
• Limit by species/taxon
• Substitution matrices
• Low complexity masking
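Searching one sequence against many is too slow if every database entry gets a full alignment, so search tools first look for short shared words. The sketch below is a toy version of that seeding idea only. It is not the BLAST or FASTA algorithm, and the function names, the word length of 3, and the three-entry database are all invented for illustration.

```python
from collections import defaultdict

def build_index(db, k=3):
    """Map every k-letter word to the set of database entries containing it."""
    index = defaultdict(set)
    for name, seq in db.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index

def rank_hits(query, index, k=3):
    """Rank entries by how many distinct query words they share."""
    words = {query[i:i + k] for i in range(len(query) - k + 1)}
    hits = defaultdict(int)
    for word in words:
        for name in index.get(word, ()):
            hits[name] += 1
    # Highest word count first; ties broken alphabetically by name.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))

db = {  # made-up database entries
    "seqA": "ATGCCCAGGAGA",
    "seqB": "TTTTTTTTTTTT",
    "seqC": "GGGCCCAGGTTT",
}
index = build_index(db)
print(rank_hits("ATGCCC", index))
```

In a real search the top-ranked candidates would then be aligned properly and scored with a substitution matrix.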
Multiple sequence alignment
• MSA
• Progressive alignment
• ClustalW or (better) T-Coffee
Phylogenetic trees
• Computationally intensive
• Distance matrix methods
– Neighbor-joining (NJ)
– UPGMA
• Minimum evolution
• Maximum parsimony
• Maximum likelihood
– Bayesian methods
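Of the methods listed, UPGMA is the simplest to write down: repeatedly join the two closest clusters and average their distances to everything else, weighting by cluster size. The sketch below returns only the tree topology as nested tuples and omits branch lengths. The function name, the input format, and the four-taxon distance matrix are assumptions made for this example.

```python
def upgma(names, matrix):
    """UPGMA on a symmetric distance matrix; returns a nested-tuple tree."""
    # Each active cluster: id -> (tree, number_of_leaves)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {(i, j): matrix[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Join the closest pair of active clusters.
        i, j = min(dist, key=dist.get)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the new cluster to each other one: size-weighted average.
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop every distance that involves the two merged clusters.
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree

names = ["A", "B", "C", "D"]
matrix = [  # made-up pairwise distances
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(names, matrix))
```

UPGMA assumes a constant rate of change along all lineages, which is one reason neighbor-joining is often preferred for real data.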
Genefinding
• Special case of DNA analysis
• How to annotate a genome
• Bacterial
– Find open reading frames (ORFs)
– With start/stop codons
– With promoter, RBS, CAAT, TATA
• Eukaryotic
– As above PLUS
– Introns/exons
– Alternative splicing
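The first bacterial step, finding open reading frames bounded by start and stop codons, can be sketched in a few lines. This version scans only the three forward-strand frames and ignores promoters, ribosome binding sites, the reverse strand, and introns, so it is a teaching sketch rather than a gene finder. The function name, the `min_codons` parameter, and the test sequence are invented for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    `start` is the index of the A in ATG; `end` is one past the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the frame
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs

seq = "CCATGCCCAGGTAGCC"
for start, end, frame in find_orfs(seq):
    print(frame, seq[start:end])
```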
Typical mammalian gene structure
[Figure: rows labelled DNA, RNA, RNA. Control region, Start (ATG), Exon 1, Exon 2, Exon 3, Exon 4 and Stop, with introns (5’ gt.. …ag 3’) between the exons. Introns “spliced out” and discarded]
ATGCCCAGGAGATTTGGA . . .
PROTEIN MetProArgArgPheGly . . .
Stop: TAG, TGA, TAA
miRNAs?
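The step from the sequence ATGCCCAGGAGATTTGGA to MetProArgArgPheGly is a codon-by-codon table lookup. The sketch below builds the standard genetic code from its usual compact 64-letter form (bases ordered T, C, A, G) and stops at the first stop codon. The function and variable names are invented for this example.

```python
BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

def translate(dna):
    """Translate a coding DNA sequence, reading from position 0 to the first stop."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i + 3].upper()]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)

peptide = translate("ATGCCCAGGAGATTTGGA")
print(peptide)
print("".join(THREE_LETTER[aa] for aa in peptide))
```

The lookup assumes an unambiguous A/C/G/T sequence already in the correct reading frame; any other character raises a `KeyError`.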
Protein substructure
• DNA makes protein and protein (enzymes) make everything else.
• 20 Amino acids
• Amino acid properties
• Motifs
• Domains
• Biological units
Amino acid properties
again … and again and again
Protein 3-D structure
• Relationship between sequence & structure
• Secondary structure
– Alpha helix
– Beta sheet
– Coil
– Turn
• Threading sequence to homologous structure
Gene Expression
• EST
• SAGE
• MicroArray
• Clustering of co-expressed genes
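One simple way to group genes by expression is to compare their profiles across conditions with Pearson correlation. The sketch below uses a greedy grouping rule chosen purely for brevity: each gene joins the first group whose founding gene it correlates with. The function names, the 0.9 threshold, and the expression values are all made up, and the code assumes every profile varies across conditions (a flat profile would cause a division by zero).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def co_expressed(profiles, threshold=0.9):
    """Greedy grouping: a gene joins the first group whose seed it correlates with."""
    groups = []
    for gene, values in profiles.items():
        for group in groups:
            if pearson(values, profiles[group[0]]) >= threshold:
                group.append(gene)
                break
        else:
            groups.append([gene])   # no match: the gene founds a new group
    return groups

profiles = {  # made-up expression values across four conditions
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
print(co_expressed(profiles))
```

Hierarchical or k-means clustering would be the more usual choices for real microarray data.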
Genomics
• Complete DNA seq for a species
• Gene order
• Gene clusters/operons
– Missing operons
• Gene duplication
• Whole genome duplication (WGD)
SNPs
• Key issue in genetics is that two organisms are both the same and different:
– Humans vs chimps vs mouse
– Parent vs offspring vs co-national vs human
• Single nucleotide polymorphisms
• Variation between individuals
• Pharmacogenetics
– Personal tailored medicine
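Given two sequences that are already aligned, single-base differences can be listed position by position. The sketch below also labels each change as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion. The function name and the two short sequences are invented for illustration, and the code assumes equal-length A/C/G/T strings with no gaps.

```python
PURINES = {"A", "G"}

def snps(ref, alt):
    """List (position, ref_base, alt_base, kind) for two aligned sequences."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    for pos, (r, a) in enumerate(zip(ref.upper(), alt.upper())):
        if r != a:
            # Same chemical class on both sides means a transition.
            same_class = (r in PURINES) == (a in PURINES)
            kind = "transition" if same_class else "transversion"
            differences.append((pos, r, a, kind))
    return differences

for difference in snps("ATGCCCAGG", "ATGCTCAGC"):
    print(difference)
```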
Summary/take home
• Course designed to give you access to databases, software tools
• …and ways of thinking about data