Motif discovery Tutorial 5. Motif discovery –MEME –MAST –TOMTOM –GOMO –PROSITE Multiple...
Date posted: 20-Dec-2015
Motif discovery
Tutorial 5
• Motif discovery
– MEME
– MAST
– TOMTOM
– GOMO
– PROSITE
Multiple sequence alignments and motif discovery
Can we find motifs using multiple sequence alignment?
Residue frequency per alignment column (positions 1-10):

     1    2    3    4    5    6    7    8    9    10
A    0    0    0    0    0    3/6  1/6  2/6  0    0
D    0    3/6  2/6  0    0    1/6  5/6  1/6  0    1/6
E    0    0    4/6  1    0    0    0    0    1    5/6
G    0    1/6  0    0    1    2/6  0    0    0    0
H    0    1/6  0    0    0    0    0    0    0    0
N    0    1/6  0    0    0    0    0    0    0    0
Y    1    0    0    0    0    0    0    3/6  0    0

Aligned sequences:
..YDEEGGDAEE..
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..
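A frequency table of this kind can be derived by counting residues column by column. The following is a minimal sketch, assuming the six aligned occurrences from the slide as input; the names `sites` and `column_frequencies` are illustrative and do not come from any of the tools in this tutorial.

```python
from collections import Counter
from fractions import Fraction

# The six aligned motif occurrences shown on the slide.
sites = [
    "YDEEGGDAEE",
    "YDEEGGDAEE",
    "YGEEGADYED",
    "YDEEGADYEE",
    "YNDEGDDYEE",
    "YHDEGAADEE",
]


def column_frequencies(aligned):
    """Return one {residue: frequency} dict per alignment column."""
    n = len(aligned)
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")
    # zip(*aligned) yields the residues of column 1, then column 2, ...
    return [
        {res: Fraction(count, n) for res, count in Counter(col).items()}
        for col in zip(*aligned)
    ]


freqs = column_frequencies(sites)
for pos, column in enumerate(freqs, start=1):
    cells = ", ".join(f"{res}={column[res]}" for res in sorted(column))
    print(f"{pos:2d}: {cells}")
```

Fractions are printed in reduced form (for example 1/2 rather than 3/6). Residues that never occur in a column are simply absent from that column's dictionary.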
Motif
A widespread pattern with biological significance
Can we find motifs using multiple sequence alignment (MSA)?
YES! NO
Using MSA for motif discovery
Can only work if the sequences align nicely on their own
For most motifs this is not the case!
ClustalW - Input
http://www.ebi.ac.uk/Tools/clustalw2/index.html
Input sequences
Gap scoring
Scoring matrix
Email address
Output format
http://www.ebi.ac.uk/Tools/muscle/index.html
Muscle
Input sequences
Email address
Output format
Motif search: from de-novo motifs to motif annotation
gapped motifs
Large DNA data
MEME – Multiple EM* for Motif Elicitation
http://meme.sdsc.edu/
• Motif discovery from unaligned sequences
– Genomic or protein sequences
• Flexible model of motif presence (a motif can be absent from some sequences or appear several times in one sequence)
*EM = Expectation-maximization
MEME - Input
Email address
Input file (fasta file)
How many times in each sequence?
How many motifs?
How many sites?
Range of motif lengths
MEME - Output
Motif score
MEME - Output
Motif length
Number of times
Motif score
MEME - Output
Low uncertainty = High information content
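The link between uncertainty and information content can be made concrete with Shannon entropy. The sketch below assumes a uniform background over a 20-letter protein alphabet and applies no small-sample correction, so its numbers will not necessarily match the heights in a MEME logo; the function name `information_content` is illustrative.

```python
import math


def information_content(column, alphabet_size=20):
    """Information content in bits: log2(alphabet size) minus column entropy.

    `column` maps each residue to its frequency in one motif position.
    Assumes a uniform background distribution (a simplification).
    """
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(alphabet_size) - entropy


# A fully conserved position: no uncertainty, maximal information.
conserved = {"E": 1.0}
# A variable position: more uncertainty, less information.
variable = {"D": 3 / 6, "G": 1 / 6, "H": 1 / 6, "N": 1 / 6}

print(f"conserved: {information_content(conserved):.2f} bits")
print(f"variable:  {information_content(variable):.2f} bits")
```

For a DNA motif, `alphabet_size=4` would be the corresponding choice, giving at most 2 bits per position.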
MEME - Output
Multilevel Consensus
Patterns can be presented as regular expressions
[AG]-x-V-x(2)-{YW}
[] - Either residue
x - Any residue
x(2) - Any residue in the next 2 positions
{} - Any residue except these
Examples: AYVACM, GGVGAA
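To check sequences against such a pattern, it can be translated into an ordinary regular expression. This is a minimal sketch that handles only the notation shown on this slide (`[]`, `{}`, `x`, single residues, and a fixed repeat such as `x(2)`); ranges and anchors from the full PROSITE syntax are not covered, and `prosite_to_regex` is an illustrative name.

```python
import re

# One pattern element: [..], {..} or a single letter, optionally followed by (n).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = m.groups()
        if core == "x":
            regex = "."                        # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"    # any residue except these
        else:
            regex = core                       # [AG] or a single residue
        if repeat:
            regex += "{" + repeat + "}"        # fixed number of repeats
        parts.append(regex)
    return "".join(parts)


regex = prosite_to_regex("[AG]-x-V-x(2)-{YW}")
print(regex)
for seq in ["AYVACM", "GGVGAA", "AYVACW"]:
    print(seq, bool(re.fullmatch(regex, seq)))
```

The first two sequences are the slide's examples and match; the third ends in W, which the last element `{YW}` excludes.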
Sequence names
Position in sequence
Strength of match
Motif within sequence
MEME - Output
Overall strength of motif matches
Motif location in the input sequence
MEME - Output
Sequence names
What can we do with motifs?
• MAST - Search for them in non-annotated sequence databases (protein and DNA)
• TOMTOM - Find the protein that binds the DNA motif.
• GOMO - Find putative target genes (DNA) of motifs and analyze their associated annotation terms.
• PROSITE - Search for them in annotated protein sequence databases.
MAST
• Searches for motifs (one or more) in sequence databases:
– Like BLAST, but with motifs as input
– Similar to iterations of PSI-BLAST
• Profile defines strength of match
– Multiple motif matches per sequence
– Combined E-value for all motifs
• MEME uses MAST to summarize results:
– Each MEME result is accompanied by the MAST result for searching the discovered motifs on the given sequences.
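The idea that a profile defines the strength of a match can be illustrated by sliding a log-odds scoring matrix along a sequence. This is a generic sketch of profile scanning, not MAST's actual implementation: the uniform background, the pseudocount value, the toy DNA motif and the names `log_odds_matrix` and `scan` are all assumptions made for illustration, and no p-values or E-values are computed.

```python
import math


def log_odds_matrix(freqs, alphabet, pseudocount=0.01):
    """Convert per-position frequencies into log2 odds against a uniform background."""
    background = 1.0 / len(alphabet)
    total = 1.0 + pseudocount * len(alphabet)
    return [
        {
            res: math.log2(((column.get(res, 0.0) + pseudocount) / total) / background)
            for res in alphabet
        }
        for column in freqs
    ]


def scan(sequence, matrix):
    """Score every window of the sequence; return (start, score) pairs."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[res] for col, res in zip(matrix, window))
        hits.append((start, score))
    return hits


# Hypothetical 4-column DNA motif (frequencies per position).
motif = [{"T": 1.0}, {"A": 1.0}, {"T": 0.5, "A": 0.5}, {"A": 1.0}]
matrix = log_odds_matrix(motif, "ACGT")

hits = scan("GGTATAAGG", matrix)
best_start, best_score = max(hits, key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

The sketch raises a `KeyError` for letters outside the given alphabet, so ambiguous characters would need separate handling.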
http://meme.sdsc.edu/meme4_4_0/cgi-bin/mast.cgi
MAST - Input
Email address
Input file (motifs)
Database
MAST - Output
Input motifs
Presence of the motifs in a given database
TOMTOM
• Searches one or more query DNA motifs against one or more databases of target motifs, and reports for each query a list of target motifs, ranked by p-value.
• The output contains results for each query, in the order that the queries appear in the input file.
http://meme.sdsc.edu/meme/doc/tomtom.html
TOMTOM - Input
Input motif
Background frequencies
Database
DNA IUPAC* code
A --> adenosine      M --> A C (amino)
C --> cytidine       S --> G C (strong)
G --> guanine        W --> A T (weak)
T --> thymidine      K --> G T (keto)
B --> G T C          R --> G A (purine)
D --> G A T          Y --> T C (pyrimidine)
H --> A C T          N --> A G C T (any)
V --> G C A
Example: YCAY = [TC]CA[TC]
*IUPAC = International Union of Pure and Applied Chemistry
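The expansion of an IUPAC motif into a bracketed pattern is mechanical and can be sketched in a few lines. The lookup table below copies the codes from the slide (keeping the slide's base order inside each group); the function name `iupac_to_regex` is illustrative.

```python
import re

# IUPAC DNA codes, with bases listed in the same order as on the slide.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "S": "GC", "W": "AT", "K": "GT",
    "R": "GA", "Y": "TC",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA",
    "N": "AGCT",
}


def iupac_to_regex(motif):
    """Expand each IUPAC code into a regex character class."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]  # KeyError for characters outside the table
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


pattern = iupac_to_regex("YCAY")
print(pattern)  # [TC]CA[TC]
print(bool(re.fullmatch(pattern, "TCAC")))  # True
print(bool(re.fullmatch(pattern, "ACAT")))  # False
```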
TOMTOM - Output
Input motif
Matching motifs
TOMTOM – Output
Wrong input, ok results
JASPAR
• Profiles
– Transcription factor binding sites
– Multicellular eukaryotes
– Derived from published collections of experiments
• Open data access
score
organism
logo
Name of gene/protein
GOMO
• GOMO takes DNA-binding motifs, finds their putative target genes, and analyzes the GO terms associated with those genes.
• GOMO returns a list of GO terms that are significantly associated with the target genes of the motif.
• Gene Ontology provides a controlled vocabulary to describe gene and gene product attributes in any organism.
GOMO - Input
Email address
Input file (motifs)
Database
GOMO - Output
Input motifs
GO annotation
MF - Molecular function
BP - Biological process
CC - Cellular component
Prosite
http://www.expasy.org/tools/scanprosite
ProSite is a database of protein domains and motifs that can be searched by either regular expression patterns or sequence profiles.
Prosite - Input
Input motif
a regular expression
Database
Filters
Prosite - Output
Input motif
Location in the protein sequence
protein