Lecture #4: Comparing genes
9/14/09
This week
Homework #2 due on Wed
Email with questions
Email me answers or hand in in class
Wed - I will be at the Dept of Biology retreat. Lecture will be given by Kelly O’Quin, expert in phylogenetics. He will go over the homework, so it must be done before class.
Questions for today
0. More BLAST
1. Where do we get high-quality gene sequences?
2. How do genes evolve?
3. How do we compare genes?
How to find genes
Start with genes which are known from model organisms
Use these to pull out genes from genomes
Compare genes to learn about sensory evolution
BLAST - GenBank
What database do you want to search?
What do you want to compare?
What program do you want to do the searching?
Query                   Database                Program
Nucleotide              Nucleotide              Blastn, Megablast, Discontiguous megablast
Protein                 Protein                 Blastp, PSI-BLAST, PHI-BLAST
Translated nucleotide   Protein                 Blastx
Protein                 Translated nucleotide   Tblastn
Translated nucleotide   Translated nucleotide   Tblastx
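The query/database table works as a simple lookup. A minimal sketch, assuming our own lowercase type labels; `BLAST_PROGRAMS` and `choose_blast_program` are hypothetical names, not part of any NCBI tool.

```python
# Hypothetical lookup built from the query/database table on the slide.
BLAST_PROGRAMS: dict[tuple[str, str], list[str]] = {
    ("nucleotide", "nucleotide"): ["blastn", "megablast", "discontiguous megablast"],
    ("protein", "protein"): ["blastp", "PSI-BLAST", "PHI-BLAST"],
    ("translated nucleotide", "protein"): ["blastx"],
    ("protein", "translated nucleotide"): ["tblastn"],
    ("translated nucleotide", "translated nucleotide"): ["tblastx"],
}


def choose_blast_program(query_type: str, database_type: str) -> list[str]:
    """Return the candidate BLAST programs for a query/database pairing."""
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"no BLAST program for {query_type!r} vs {database_type!r}")
    return BLAST_PROGRAMS[key]


print(choose_blast_program("Protein", "Translated nucleotide"))  # ['tblastn']
```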
Types of BLAST queries
(Figure: BLAST search form, labeled Defaults, Database, Program, Confirm)
Nucleotide BLAST = DNA nucleotide query vs nucleotide database
Choices for programs
Megablast: highly similar sequences (>95%), word length 28
Discontiguous megablast: pretty similar sequences, word length 11
Blastn: dissimilar sequences, word length 11
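Word length matters because BLAST starts from short exact "word" matches (seeds) and then extends them. The sketch below shows only that seeding step, in simplified form and not NCBI's implementation; `word_hits` is a hypothetical helper. It uses the two sequences from the mutation example later in this lecture, which differ at a single base.

```python
def word_hits(query: str, subject: str, word_length: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where an exact word of this length matches."""
    # Index every word (k-mer) in the subject by its 0-based start positions.
    index: dict[str, list[int]] = {}
    for j in range(len(subject) - word_length + 1):
        index.setdefault(subject[j:j + word_length], []).append(j)
    # Look up each query word in the index.
    hits = []
    for i in range(len(query) - word_length + 1):
        for j in index.get(query[i:i + word_length], []):
            hits.append((i, j))
    return hits


query, subject = "ATGCCGTGACGT", "ATGCCTTGACGT"
print(word_hits(query, subject, 4))  # [(0, 0), (1, 1), (6, 6), (7, 7), (8, 8)]
print(word_hits(query, subject, 6))  # [(6, 6)]
print(word_hits(query, subject, 7))  # []
```

Longer words need longer runs of identical bases to produce any seed. This is why megablast (word length 28) suits highly similar sequences, while word length 11 can still find seeds between more divergent ones.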
Translated BLAST = protein query vs translated nucleotide database
BLAST a genome
BLASTing is fun
This is meant to be enjoyable. Be a genome explorer.
Find out what kind of data is out there.
Find out what kind of data isn’t there.
QUESTIONS?
Q1.
There is so much data in GenBank. How do you find GOOD data?
Example: bovine rhodopsin, the first G protein-coupled receptor to be sequenced
Search GenBank with text: 49 entries
Bovine opsin
Bovine rhodopsin
Searching for genes
Searching by text is fraught with peril:
GenBank has too many links.
Text searches pull up many things that are not what you want.
BLAST is a better approach.
NCBI has also made records that combine all similar sequences into one.
NCBI has done some of the work
They have hand-curated data for some species to make a set of reference sequences (RefSeq):
Nucleotide sequences - NM_xxxxxx
Protein sequences - NP_xxxxxx
For human rhodopsin: NM_000539 (nucleotide), NP_000530 (protein)
These are the gold standard for sequences
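The accession prefix tells you which kind of reference record you are looking at. A minimal sketch, assuming the format "prefix, underscore, digits, optional .version". `REFSEQ_PATTERN` and `refseq_kind` are hypothetical names, and only the NM_/NP_ prefixes from the slide are recognized.

```python
import re

# Hypothetical pattern: NM_ or NP_, then digits, then an optional ".version".
REFSEQ_PATTERN = re.compile(r"^(NM|NP)_(\d+)(?:\.(\d+))?$")


def refseq_kind(accession: str) -> str:
    """Classify an accession as a curated RefSeq nucleotide or protein record."""
    match = REFSEQ_PATTERN.match(accession.strip())
    if match is None:
        return "not a curated NM_/NP_ accession"
    return "nucleotide" if match.group(1) == "NM" else "protein"


for acc in ["NM_000539", "NP_000530", "J00001"]:
    print(acc, "->", refseq_kind(acc))
```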
HomoloGene
Homologs
Two genes that arose from one gene in the common ancestor of two organisms and were passed down to both
This implies the genes usually perform the same function in the two organisms
Therefore they can be compared to learn about evolution
Human
Chimp
Macaque
Bushbaby
These 4 primates have many genes that are homologs and have been passed down from a primate ancestor
HomoloGene search for rhodopsin
Three primary sequence portals:
1. NCBI
2. Ensembl - European Bioinformatics Institute (EBI)
3. DNA Data Bank of Japan (DDBJ)
Select just genes
Scroll down to find the gene you want
Location
Orthologues are predicted and linked
Links to transcript and protein
OMIM - Online Mendelian Inheritance in Man
Good places to find genes
Model organisms: NCBI HomoloGene
Genes from models and other organisms: Sanger Ensembl gene families
NOTE: These are often predicted from genome sequences. If there is a sequence in NCBI HomoloGene, it may be different from (and more accurate than) the Sanger predictions.
OMIM is a good reference.
Q2. How do genes change through time?
Change in the actual sequence:
  Mutation
  Recombination
Change in the frequency of a sequence:
  Selection - “survive” better
  Drift - get passed on by chance
  Migration - move between populations
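Drift ("get passed on by chance") can be made concrete with a small simulation. A minimal sketch using the standard Wright-Fisher sampling model, which this lecture does not itself name. It assumes a constant population size, no mutation, no selection, and no migration. `simulate_drift` is a hypothetical helper.

```python
import random


def simulate_drift(pop_size: int, start_freq: float, generations: int, seed: int = 1) -> list[float]:
    """Track one sequence's frequency under pure drift (Wright-Fisher sampling)."""
    rng = random.Random(seed)  # seeded so a run can be repeated exactly
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        # Each gene copy in the next generation is drawn at random from the
        # current generation; nothing makes one sequence "survive" better.
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
        freq = count / pop_size
        history.append(freq)
    return history


trajectory = simulate_drift(pop_size=20, start_freq=0.5, generations=30)
print([round(f, 2) for f in trajectory])
```

Once the frequency hits 0 or 1, it stays there: the sequence has been lost or fixed purely by chance.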
Mutation vs selection
Mutation = sequence change: ATGCCGTGACGT → ATGCCTTGACGT
Selection/drift/migration = sequence frequency changes across a number of individuals:
ATGTG ATGTG ATGTG ATGTG ATGTG ATGTG
ATGTG ATGTG ATGTG ATGTG ATGTG ATGTT
ATGTG ATGTG ATGTG ATGTT ATGTT ATGTT
ATGTG ATGTG ATGTG ATGTT ATGTT ATGTT
ATGTT ATGTG ATGTG ATGTT ATGTT ATGTT
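The two ideas on this slide measure different things. A mutation is a position where one sequence differs from another. A frequency is the share of individuals carrying a sequence. A minimal sketch with hypothetical helper names, using the slide's two mutated sequences and its first twelve individuals (eleven ATGTG, one ATGTT):

```python
def point_differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """List (1-based position, base in a, base in b) wherever two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def allele_frequency(sample: list[str], allele: str) -> float:
    """Fraction of individuals in the sample that carry the given sequence."""
    return sample.count(allele) / len(sample)


print(point_differences("ATGCCGTGACGT", "ATGCCTTGACGT"))  # [(6, 'G', 'T')]
sample = ["ATGTG"] * 11 + ["ATGTT"]
print(round(allele_frequency(sample, "ATGTT"), 3))  # 0.083
```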
Evolution as tinkerer
Changes are typically small.
Mutation is the source of new sequence.
Not all mutations are created equal: some occur more often than others.
Other forces shift the frequency of a particular sequence.
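The slide does not say which mutations are more common. One widely observed example is that transitions (purine to purine, or pyrimidine to pyrimidine) usually outnumber transversions. A minimal classifier, with `mutation_class` as a hypothetical name:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def mutation_class(old: str, new: str) -> str:
    """Classify a single-base change as a transition or a transversion."""
    old, new = old.upper(), new.upper()
    if old not in PURINES | PYRIMIDINES or new not in PURINES | PYRIMIDINES:
        raise ValueError("bases must be A, C, G or T")
    if old == new:
        raise ValueError("not a mutation: bases are identical")
    pair = {old, new}
    # Same chemical class (both purines or both pyrimidines) -> transition.
    return "transition" if pair <= PURINES or pair <= PYRIMIDINES else "transversion"


print(mutation_class("G", "T"))  # transversion (the G -> T change in ATGCCGTGACGT)
print(mutation_class("C", "T"))  # transition
```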
Triplet amino acid code
F, phe  TTT    S, ser  TCT    Y, tyr  TAT    C, cys  TGT
F, phe  TTC    S, ser  TCC    Y, tyr  TAC    C, cys  TGC
L, leu  TTA    S, ser  TCA    *, stop TAA    *, stop TGA
L, leu  TTG    S, ser  TCG    *, stop TAG    W, trp  TGG
L, leu  CTT    P, pro  CCT    H, his  CAT    R, arg  CGT
L, leu  CTC    P, pro  CCC    H, his  CAC    R, arg  CGC
L, leu  CTA    P, pro  CCA    Q, gln  CAA    R, arg  CGA
L, leu  CTG    P, pro  CCG    Q, gln  CAG    R, arg  CGG
I, ile  ATT    T, thr  ACT    N, asn  AAT    S, ser  AGT
I, ile  ATC    T, thr  ACC    N, asn  AAC    S, ser  AGC
I, ile  ATA    T, thr  ACA    K, lys  AAA    R, arg  AGA
M, met  ATG    T, thr  ACG    K, lys  AAG    R, arg  AGG
V, val  GTT    A, ala  GCT    D, asp  GAT    G, gly  GGT
V, val  GTC    A, ala  GCC    D, asp  GAC    G, gly  GGC
V, val  GTA    A, ala  GCA    E, glu  GAA    G, gly  GGA
V, val  GTG    A, ala  GCG    E, glu  GAG    G, gly  GGG
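Because the code table is laid out in T, C, A, G order for each codon position, it can be stored as one 64-letter string. A minimal translation sketch; `CODON_TABLE` and `translate` are hypothetical names, and `*` marks a stop codon.

```python
BASES = "TCAG"
# Amino acids for codons in TCAG order (first, second, third base).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence codon by codon (trailing bases ignored)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, usable, 3))


print(translate("ATGCCGTGACGT"))  # MP*R
print(translate("ATGCCTTGACGT"))  # MP*R  (the G -> T mutation leaves the protein unchanged)
```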
Mutation causes nucleotide change
What about the amino acid (AA) sequence?
Synonymous change: syn = same; the AA stays the same
Nonsynonymous change: not the same; the AA changes
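To decide whether a point mutation is synonymous, find the codon it falls in and translate the codon before and after the change. A minimal sketch, assuming an in-frame coding sequence and a 1-based position. `classify_point_mutation` is a hypothetical name.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_point_mutation(cds: str, position: int, new_base: str) -> tuple[str, str, str]:
    """Return (kind, old AA, new AA) for a base change at a 1-based position."""
    index = position - 1
    start = index - index % 3  # first base of the affected codon
    old_codon = cds[start:start + 3].upper()
    offset = index - start
    new_codon = old_codon[:offset] + new_base.upper() + old_codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    kind = "synonymous" if old_aa == new_aa else "nonsynonymous"
    return kind, old_aa, new_aa


print(classify_point_mutation("ATGCCGTGACGT", 6, "T"))  # ('synonymous', 'P', 'P')
print(classify_point_mutation("ATGCCGTGACGT", 4, "A"))  # ('nonsynonymous', 'P', 'T')
```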
Amino acid (AA) types
Non-polar: A, F, G, I, L, M, P, V, W
Polar: N, Q, S, T, Y
Charged (+): H, K, R
Charged (-): D, E
Other: C
Often, changing an AA within a group does not affect protein function.
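The groups above can be used to label an amino acid substitution as within-group or between-group. A minimal sketch that copies the slide's groups exactly. `AA_GROUPS`, `aa_group` and `substitution_type` are hypothetical names.

```python
AA_GROUPS = {
    "non-polar": set("AFGILMPVW"),
    "polar": set("NQSTY"),
    "charged (+)": set("HKR"),
    "charged (-)": set("DE"),
    "other": set("C"),
}


def aa_group(aa: str) -> str:
    """Return the slide's group name for a one-letter amino acid code."""
    for name, members in AA_GROUPS.items():
        if aa.upper() in members:
            return name
    raise ValueError(f"unknown amino acid code: {aa!r}")


def substitution_type(old: str, new: str) -> str:
    """Within-group changes are the ones expected to be tolerated more often."""
    return "within group" if aa_group(old) == aa_group(new) else "between groups"


print(substitution_type("I", "V"))  # within group
print(substitution_type("D", "K"))  # between groups
```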
Selection
Stabilizing selection acts to keep protein function the same:
Synonymous changes are more frequent than nonsynonymous changes.
Amino acid changes within a group are much more common than changes between groups (non-polar to non-polar, polar to polar).
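To check the first prediction on real data, you could tally synonymous and nonsynonymous differences between two aligned coding sequences. A deliberately simplified sketch: it counts differing codons and translates each codon as a whole. It does not normalize by the number of possible sites, so it is not a dN/dS estimate. `count_codon_differences` is a hypothetical name.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codon_differences(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    synonymous = nonsynonymous = 0
    for p in range(0, len(seq_a) - len(seq_a) % 3, 3):
        codon_a, codon_b = seq_a[p:p + 3].upper(), seq_b[p:p + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


print(count_codon_differences("ATGCCGTGACGT", "ATGCCTTGACGT"))  # (1, 0)
print(count_codon_differences("GAAAAA", "GATAAG"))  # (1, 1)
```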
Similarity matrix
A = alanine, C = cysteine, D = aspartic acid, E = glutamic acid, F = phenylalanine, G = glycine, H = histidine
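A similarity matrix assigns a score to every pair of amino acids, and an alignment's score is the sum over its aligned pairs. The values below are hypothetical toy scores: +2 for identical, +1 for the same slide group, -1 otherwise. They are NOT a published matrix such as BLOSUM or PAM. The example compares the ungapped core of the zebrafish and goldfish RH2 sequences from this lecture.

```python
AA_GROUPS = {"non-polar": "AFGILMPVW", "polar": "NQSTY", "charged (+)": "HKR",
             "charged (-)": "DE", "other": "C"}
GROUP_OF = {aa: name for name, members in AA_GROUPS.items() for aa in members}


def pair_score(a: str, b: str) -> int:
    """Toy similarity score for one aligned pair (hypothetical values)."""
    if a == b:
        return 2
    if GROUP_OF[a] == GROUP_OF[b]:
        return 1
    return -1


def alignment_score(seq_a: str, seq_b: str) -> int:
    """Sum pair scores over an ungapped alignment of equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return sum(pair_score(a, b) for a, b in zip(seq_a, seq_b))


# 12 identical positions (+24) and 3 within-group changes, S/N, I/V, M/L (+3).
print(alignment_score("NGTEGSNFYIPMSNR", "NGTEGNNFYVPLSNR"))  # 27
```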
Comparing sequences
This can be done at either the nucleotide or the AA level.
Gather sequences from a bunch of different organisms
Need to align them so that sites which perform the same function can be compared
Aligning sequences
Sequences may differ in length.
They often have differences at the amino- or carboxy-terminus of the protein.
We need a way to align the parts of the protein that perform the same function.
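One classic way to align two sequences of different length is global alignment by dynamic programming (Needleman-Wunsch). This sketch is pairwise only and uses toy scores (match +1, mismatch -1, gap -2) chosen for illustration. Multiple-alignment programs use substitution matrices and more elaborate gap penalties. `needleman_wunsch` is a hypothetical helper.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str, int]:
    """Global alignment of a and b with a linear gap penalty."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


zebrafish, trout = "MNGTEGSNFYIPMSNR", "MQNGTEGSNFYIPMSNR"
top, bottom, total = needleman_wunsch(zebrafish, trout)
print(top)     # M-NGTEGSNFYIPMSNR
print(bottom)  # MQNGTEGSNFYIPMSNR
print(total)   # 14
```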
Example - RH2 opsin in fishes
Goldfish  MNGTEGNNFYVPLSNR
Medaka    MENGTEGKNFYIPMNNR
Zebrafish MNGTEGSNFYIPMSNR
Killifish MGYGPNGTEGNNFYIPMSNK
Trout     MQNGTEGSNFYIPMSNR
Halibut   MVWDGGIEPNGTEGKNFYIPMSNR
Cod       MRMEANGTEGKNFYIPMSNR
Tetraodon MVWDGGIEPNGTEGKNFYIPMSNR
Align sequences
Zebrafish M--------NGTEGSNFYIPMSNR
Trout     M------Q-NGTEGSNFYIPMSNR
Medaka    M------E-NGTEGKNFYIPMNNR
Cod       M----RMEANGTEGKNFYIPMSNR
Halibut   MVWDGGIEPNGTEGKNFYIPMSNR
Tetraodon MVWDGGIEPNGTEGKNFYIPMSNR
Goldfish  M--------NGTEGNNFYVPLSNR
Killifish M---GYG-PNGTEGNNFYIPMSNK
          *        *****.***:*:.*:
* = identical; : = conserved; . = semi-conserved
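The symbols under the alignment look like Clustal-style consensus marks. This sketch recomputes them column by column. It assumes the "strong" and "weak" residue groups as documented for ClustalW/Clustal Omega, and it assumes that any column containing a gap gets a blank. `consensus_line` is a hypothetical helper.

```python
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def consensus_symbol(column: str) -> str:
    """'*' identical, ':' strong group, '.' weak group, ' ' otherwise or gapped."""
    residues = set(column)
    if "-" in residues:
        return " "
    if len(residues) == 1:
        return "*"
    if any(residues <= set(g) for g in STRONG_GROUPS):
        return ":"
    if any(residues <= set(g) for g in WEAK_GROUPS):
        return "."
    return " "


def consensus_line(aligned: list[str]) -> str:
    return "".join(consensus_symbol("".join(col)) for col in zip(*aligned))


ALIGNMENT = {
    "Zebrafish": "M--------NGTEGSNFYIPMSNR",
    "Trout": "M------Q-NGTEGSNFYIPMSNR",
    "Medaka": "M------E-NGTEGKNFYIPMNNR",
    "Cod": "M----RMEANGTEGKNFYIPMSNR",
    "Halibut": "MVWDGGIEPNGTEGKNFYIPMSNR",
    "Tetraodon": "MVWDGGIEPNGTEGKNFYIPMSNR",
    "Goldfish": "M--------NGTEGNNFYVPLSNR",
    "Killifish": "M---GYG-PNGTEGNNFYIPMSNK",
}
print(repr(consensus_line(list(ALIGNMENT.values()))))
```

The output reproduces the consensus row on the slide: `*`, eight blanks for the gapped N-terminal columns, then `*****.***:*:.*:`.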