Date posted: 21-Dec-2015
Introduction to Bioinformatics - Tutorial no. 5
MEME – Discovering motifs in sequences
MAST – Searching for motifs in databanks
TRANSFAC – The Transcription Factor DB
WebLogo - Input
http://weblogo.berkeley.edu
Aligned sequences (e.g. output of ClustalW)
RUN!
WebLogo - Output
Genes:
Proteins:
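
A sequence logo scales each column of the alignment by its information content. The sketch below shows that calculation for DNA under simplifying assumptions: it is not WebLogo's code, it ignores any small-sample correction a logo tool may apply, and the aligned sites are made up for illustration.

```python
import math
from collections import Counter

def column_information(aligned, alphabet="ACGT"):
    """Information content (bits) of each column: log2(|alphabet|) minus Shannon entropy."""
    max_bits = math.log2(len(alphabet))
    info = []
    for column in zip(*aligned):
        # Count only alphabet symbols, so gap characters are ignored.
        counts = Counter(c for c in column if c in alphabet)
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(max_bits - entropy)
    return info

# Made-up aligned sites, for illustration only.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
bits = column_information(sites)
for position, value in enumerate(bits, start=1):
    print(position, round(value, 2))
```

A fully conserved DNA column reaches the maximum of 2 bits; a column where all four bases are equally frequent scores 0.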
MEME
http://meme.sdsc.edu/
Motif discovery from unaligned sequences
  Genomic or protein sequences
Identifies profile motifs
  Multiple motifs for any input
Flexible model of motif presence
  Motif can be absent in some sequences
  Motif can appear several times in one sequence
MEME Input
Email address
Multiple input sequences
How many times in each sequence?
How many motifs?
How many sites?
Range of motif lengths
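
The slides use the web form, but the same choices exist as options of the `meme` command-line program. The helper below is a hypothetical sketch (`meme_args` is not part of MEME) that maps each form question to a flag as I understand the MEME Suite command line: `-mod` for occurrences per sequence, `-nmotifs` for the number of motifs, `-nsites` for the number of sites, and `-w` or `-minw`/`-maxw` for the motif length. Check `meme -h` for your installed version before relying on it; the width range 6 to 12 is an arbitrary illustration.

```python
def meme_args(fasta, model="zoops", nmotifs=1, width=None,
              minw=None, maxw=None, nsites=None, revcomp=True, outdir="meme_out"):
    """Build an argument list for the MEME command-line program (hypothetical helper)."""
    # oops: one occurrence per sequence; zoops: zero or one; anr: any number of repeats.
    if model not in ("oops", "zoops", "anr"):
        raise ValueError("model must be 'oops', 'zoops' or 'anr'")
    args = ["meme", fasta, "-dna", "-oc", outdir, "-mod", model, "-nmotifs", str(nmotifs)]
    if width is not None:
        args += ["-w", str(width)]          # fixed motif length
    else:
        if minw is not None:
            args += ["-minw", str(minw)]    # lower bound on motif length
        if maxw is not None:
            args += ["-maxw", str(maxw)]    # upper bound on motif length
    if nsites is not None:
        args += ["-nsites", str(nsites)]    # expected number of sites
    if revcomp:
        args.append("-revcomp")             # also search the reverse strand (DNA only)
    return args

fixed = meme_args("DNASample50.txt", width=8)
free = meme_args("DNASample50.txt", minw=6, maxw=12)
print(" ".join(fixed))
print(" ".join(free))
```

If MEME is installed locally, the list can be passed to `subprocess.run(args, check=True)`.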
MEME Output (1)
Motif length
Number of occurrences (sites)
E-value (like BLAST)
“Position-Specific Probability Matrix” = motif profile
Divergence of each motif position from background
Most frequent symbols at each position
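
To make these output fields concrete, here is a minimal sketch of how a position-specific probability matrix, a most-frequent-symbol consensus, and a per-position divergence from background can be derived from a set of motif sites. It is an illustration, not MEME's implementation; the sites and the background frequencies are invented, and the function names are my own.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def probability_matrix(sites, pseudocount=0.0):
    """One dict of base probabilities per motif position."""
    total = len(sites) + pseudocount * len(ALPHABET)
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)
        matrix.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return matrix

def consensus(matrix):
    """Most frequent base at each position (ties resolved in ACGT order)."""
    return "".join(max(ALPHABET, key=column.get) for column in matrix)

def relative_entropy(column, background):
    """Divergence (bits) of one motif position from the background distribution."""
    return sum(p * math.log2(p / background[b]) for b, p in column.items() if p > 0)

# Made-up sites and background, for illustration only.
sites = ["ACGT", "ACGA", "ACGT", "TCGT"]
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
pspm = probability_matrix(sites)
print(consensus(pspm))
print([round(relative_entropy(col, background), 2) for col in pspm])
```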
MEME Output (2)
Sequence names
Reverse complement (genomic input only)
Position in sequence
Strength of match
Motif within sequence
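
For genomic input a site may lie on either strand. The following sketch scores every window of a sequence and of its reverse complement against a log-odds matrix and reports the best hit in forward-strand, 0-based coordinates. The matrix, background, and sequence are invented, the sequence is assumed to contain only A, C, G and T, and the scoring is a plain log-odds sum rather than MEME's exact statistics.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def log_odds(matrix, background):
    """Convert probabilities to log2(p / background); probabilities must be non-zero."""
    return [{b: math.log2(col[b] / background[b]) for b in col} for col in matrix]

def best_site(seq, scores):
    """Return (score, strand, forward-strand start, matched string) of the best window."""
    width = len(scores)
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for start in range(len(s) - width + 1):
            score = sum(scores[i][s[start + i]] for i in range(width))
            if best is None or score > best[0]:
                # Map a reverse-strand start back to forward-strand coordinates.
                pos = start if strand == "+" else len(seq) - start - width
                best = (score, strand, pos, s[start:start + width])
    return best

# Made-up 3-column motif favouring "ACG".
matrix = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
background = {b: 0.25 for b in "ACGT"}
scores = log_odds(matrix, background)
hit = best_site("TTCGTTT", scores)
print(hit)
```

Here the best match is on the reverse strand: the forward window `CGT` at position 2 reads `ACG` when reverse-complemented.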
MEME Output (3)
Overall strength of motif matches
Original sequence lengths
Motif instance
MAST
Searches for motifs (one or more) in sequence databases:
  Like BLAST, but with motifs as input
  Similar to iterations of PSI-BLAST
Profile defines strength of match
Multiple motif matches per sequence
Combined E value for all motifs
MEME uses MAST to summarize results:
  Each MEME result is accompanied by the MAST result for searching the discovered motifs on the given sequences.
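
One standard way to obtain a single significance value from several motif matches is to multiply their p-values and ask how likely a product that small is by chance. The sketch below implements that product rule for independent p-values and scales the result by the database size to get an E value. As far as I know this is the idea behind MAST's combined score, but treat it as an assumption: MAST's own calculation includes details (such as sequence-length handling) that are omitted here, and the input numbers are invented.

```python
import math

def product_pvalue(pvalues):
    """P-value of observing a product of n independent uniform p-values this small."""
    n = len(pvalues)
    product = math.prod(pvalues)
    if product == 0.0:
        return 0.0
    x = -math.log(product)
    # P(product <= p) = p * sum_{i=0}^{n-1} (-ln p)^i / i!
    return product * sum(x ** i / math.factorial(i) for i in range(n))

def e_value(combined_p, n_sequences):
    """Expected number of database sequences scoring at least this well by chance."""
    return combined_p * n_sequences

# Made-up per-motif p-values for one sequence, and a made-up database size.
combined = product_pvalue([1e-4, 3e-3])
print(combined, e_value(combined, 10_000))
```

With a single motif the combined p-value is just that motif's p-value, which is a quick sanity check on the formula.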
MAST Input
Email address
Database (like BLAST)
Motif file (e.g. MEME output)
Consider matched sequence length
E value threshold
MAST Output (1)
Matched accession
Match E value
Length of sequence
Link to GenBank
MAST Output (2)
Motif diagram
MAST Output (3)
Position of each instance
P value of instance
Matched parts of sequence
Motif ‘consensus’
Motif and orientation
TRANSFAC
Database of eukaryotic DNA transcription regulation:
Individual regulatory sites (SITES table)
  Genes to which they belong
  Proteins which bind them
Proteins which bind sites (FACTORS table)
  Cellular source of protein
  Nucleotide motif profile for binding
  Some grouping and classification
Classification of factors (CLASS table)
Position-specific matrices for select factors (MATRIX table)
Cell localization (CELL table)
Searching TRANSFAC
www.gene-regulation.com
Search a single table
  By identifier, factor name, gene name
  By species, author
Browse your way from table to table
Search within a sequence
  MatInspector, TFScan (EMBOSS package)
TRANSFAC Factor
DT Date; author
FA Factor name
GE Encoding gene
SF Structural features
CP Cell specificity (positive)
CN Cell specificity (negative)
EX Expression pattern
FF Functional features
IN Interacting factors
MX Matrix
BS Binding SITE
DR External databases
References:
RN Reference no.
RX MEDLINE ID
RA Reference authors
RT Reference title
RL Reference data
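
Entries built from two-letter field codes like these are easy to read into a dictionary. The parser below is a minimal sketch that assumes an EMBL-style flat file: each line starts with a two-letter code followed by the value, `XX` lines are spacers, and `//` ends a record. The sample entry is invented (accessions and names are placeholders, not real TRANSFAC data).

```python
from collections import defaultdict

def parse_records(text):
    """Parse flat-file text into a list of {field code: [values]} dictionaries."""
    records, current = [], defaultdict(list)
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line.startswith("XX"):
            continue                      # blank line or spacer
        if line.startswith("//"):
            if current:
                records.append(dict(current))
            current = defaultdict(list)   # start the next record
            continue
        code, _, value = line.partition(" ")
        current[code].append(value.strip())
    if current:                           # file without a final terminator
        records.append(dict(current))
    return records

# Invented example entry; accessions and names are placeholders.
example = """\
AC  T99999
XX
FA  ExampleFactor
XX
GE  G99999; exampleGene
XX
BS  R99991; site one
BS  R99992; site two
//
"""
factor = parse_records(example)[0]
print(factor["FA"], factor["BS"])
```

Keeping a list per code preserves repeated fields such as multiple `BS` lines.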
TRANSFAC Matrix
Accession
Position-specific matrix
Statistical basis
Consensus (subset of IUPAC symbols)
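
A consensus in IUPAC symbols can be derived column by column from the matrix counts. The sketch below uses one commonly cited rule set: a single base if it exceeds 50% and is at least twice as frequent as the runner-up, a two-base code if the top two bases together exceed 75%, a three-base code if exactly one base is absent, and N otherwise. These thresholds are an assumption for illustration, not a statement of how any given TRANSFAC release computes its consensus, and the count matrix is made up.

```python
# IUPAC codes for two- and three-base ambiguity.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
}

def iupac_symbol(counts):
    """Consensus symbol for one column given counts for A, C, G and T."""
    total = sum(counts.values())
    ranked = sorted(counts, key=counts.get, reverse=True)
    first, second = ranked[0], ranked[1]
    if counts[first] > 0.5 * total and counts[first] >= 2 * counts[second]:
        return first
    if counts[first] + counts[second] > 0.75 * total:
        return IUPAC[frozenset((first, second))]
    present = frozenset(b for b in counts if counts[b] > 0)
    if len(present) == 3:
        return IUPAC[present]
    return "N"

# Made-up count matrix with four positions.
columns = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 5, "C": 0, "G": 5, "T": 0},
    {"A": 4, "C": 3, "G": 3, "T": 0},
    {"A": 3, "C": 3, "G": 2, "T": 2},
]
consensus = "".join(iupac_symbol(col) for col in columns)
print(consensus)
```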
TRANSFAC Site (1)
Accession number
DNA or RNA
Gene
Gene region
Sequence of regulatory element
Position range of factor binding site
TRANSFAC Site (2)
Binding factor accession
Factor name
Binding ‘quality’:
1 functionally confirmed
2 binding of pure protein
3 immunologically characterized extract
4 via known binding sequence
5 extract protein binding to bona fide element
6 unassigned
Organism
Cellular source
Methods of identifying site
External links
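
When collecting sites for a factor, the binding quality code is a natural filter. The snippet below is a small sketch that stores the six codes from the slide and keeps only sites at or below a chosen code. It assumes that lower numbers mean stronger evidence, which the ordering on the slide suggests but does not state; the site records and the `filter_by_quality` helper are invented for illustration.

```python
# Quality codes as listed on the slide.
QUALITY = {
    1: "functionally confirmed",
    2: "binding of pure protein",
    3: "immunologically characterized extract",
    4: "via known binding sequence",
    5: "extract protein binding to bona fide element",
    6: "unassigned",
}

def filter_by_quality(sites, worst_allowed=2):
    """Keep sites whose quality code is at most worst_allowed (assumes lower is better)."""
    return [site for site in sites if site["quality"] <= worst_allowed]

# Invented site records; accessions are placeholders.
sites = [
    {"accession": "R_EXAMPLE_1", "quality": 1},
    {"accession": "R_EXAMPLE_2", "quality": 5},
    {"accession": "R_EXAMPLE_3", "quality": 2},
]
for site in filter_by_quality(sites):
    print(site["accession"], QUALITY[site["quality"]])
```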
TRANSFAC Factor (1)
AC: Accession number
FA: Factor name
SY: Other names
OS: Organism
OC: Taxonomy
HO: Homologs
CL: Classification
SZ: Size
SQ: Amino acid sequence
TRANSFAC Factor (2)
Protein sequence reference
Features and positions
Structural features
Cell specificity
Question
A biologist at your university has found 15 target genes that she thinks are co-regulated. She gives you 15 upstream regions, each 50 base pairs long, in FASTA format (file DNASample50.txt) and asks you to identify the motif and, if possible, the potential regulating protein. She tells you the sequences are from Homo sapiens, and her intuition is that the motif has length 8. She wants you to suggest only the single best candidate motif.
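
Before submitting the file it is worth checking that it really holds 15 DNA sequences of 50 bp each. The sketch below reads FASTA text and lists any deviations; the function names are my own, and the toy input uses two short sequences so the behaviour is easy to follow. For the real task, read `DNASample50.txt` and keep the default expectations of 15 and 50.

```python
def read_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split(maxsplit=1)
            name, chunks = (header[0] if header else "unnamed"), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def check_input(records, expected_count=15, expected_length=50):
    """List problems with the number, length, or alphabet of the sequences."""
    problems = []
    if len(records) != expected_count:
        problems.append(f"found {len(records)} sequences, expected {expected_count}")
    for name, seq in records:
        if len(seq) != expected_length:
            problems.append(f"{name}: length {len(seq)}, expected {expected_length}")
        if set(seq) - set("ACGTN"):
            problems.append(f"{name}: contains non-DNA characters")
    return problems

# Toy input for illustration; not the contents of DNASample50.txt.
toy = ">seq1 first example\nACGTACGT\n>seq2\nACGTAC\n"
records = read_fasta(toy)
problems = check_input(records, expected_count=2, expected_length=8)
print(problems)
```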
Question
After you have run all the programs, your biologist friend confesses that she is not sure whether her intuition about the motif length was correct. Re-run the tool without specifying the motif length. Do you get the same results?
Determine a potential DNA-binding protein using TRANSFAC.