Motif discovery

Page 1: Motif discovery

Motif discovery

Tutorial 5

Page 2: Motif discovery

Agenda

Motif discovery

• MEME: creates a motif PSSM de novo (unknown motif)

• MAST: searches for a PSSM in a DB

• TOMTOM: searches for a PSSM in motif DBs

Cool story of the day: How NOT to be a bioinformatician

Page 3: Motif discovery

Motif – definition

Motif: a widespread pattern with biological significance.

Sequence motif examples

• PTB (RNA-binding protein): UCUU

• CAP (DNA-binding protein): TGTGAXXXXXXTCACAXT

Page 4: Motif discovery

Sequence motif – definition

Motif: a nucleotide or amino-acid sequence pattern that is widespread and has biological significance.

Aligned occurrences of the motif:

..YDEEGGDAEE..
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..

Frequency of each residue at each motif position:

     1    2    3    4    5    6    7    8    9    10
A    0    0    0    0    0    3/6  1/6  2/6  0    0
D    0    3/6  2/6  0    0    1/6  5/6  1/6  0    1/6
E    0    0    4/6  1    0    0    0    0    1    5/6
G    0    1/6  0    0    1    2/6  0    0    0    0
H    0    1/6  0    0    0    0    0    0    0    0
N    0    1/6  0    0    0    0    0    0    0    0
Y    1    0    0    0    0    0    0    3/6  0    0

PSSM - position-specific scoring matrix
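As a rough illustration of how such a matrix is derived, the per-position frequencies can be counted directly from the six aligned sites on this slide. This is a minimal sketch, not MEME's procedure; the function name `frequency_matrix` is made up for the example.

```python
from collections import Counter
from fractions import Fraction

# The six aligned sites from the slide (flanking dots removed).
SITES = [
    "YDEEGGDAEE",
    "YDEEGGDAEE",
    "YGEEGADYED",
    "YDEEGADYEE",
    "YNDEGDDYEE",
    "YHDEGAADEE",
]


def frequency_matrix(sites):
    """Return one {residue: Fraction} dict per alignment column."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)
        matrix.append(
            {res: Fraction(n, len(sites)) for res, n in counts.items()}
        )
    return matrix


matrix = frequency_matrix(SITES)
for position, column in enumerate(matrix, start=1):
    cells = ", ".join(f"{res}={freq}" for res, freq in sorted(column.items()))
    print(position, cells)
```

Fractions are printed in lowest terms, so 3/6 appears as 1/2. Residues that never occur at a position are simply absent from that column's dictionary, which corresponds to the zeros in the table.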

Page 5: Motif discovery

Can we find motifs using multiple sequence alignment (MSA)?

YES in principle, but in practice NO: local multiple sequence alignment is a hard problem to solve.

Page 6: Motif discovery

Motif search: from de-novo motifs to motif annotation

gapped motifs

Large DNA data

http://meme.sdsc.edu/

Page 7: Motif discovery

MEME

Page 8: Motif discovery

MEME – Multiple EM* for Motif Elicitation

• Motif discovery from unaligned sequences - genomic or protein sequences

• Flexible model of motif presence (Motif can be absent in some sequences or appear several times in one sequence)

*Expectation-maximization

http://meme.sdsc.edu/

Page 9: Motif discovery

MEME - Input

Input file (fasta file)

How many times in each sequence?

How many motifs?

How many sites?

Range of motif lengths
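The same choices can be made when running MEME from the command line. The sketch below only builds the argument list and does not run anything. It assumes a locally installed `meme` executable and its `-mod`, `-nmotifs`, `-minw`, `-maxw`, `-minsites`, `-maxsites` and `-oc` options. The file names and the helper `build_meme_command` are hypothetical.

```python
import shlex


def build_meme_command(fasta_path, out_dir, *, alphabet="dna",
                       distribution="zoops", n_motifs=3,
                       min_width=6, max_width=50,
                       min_sites=None, max_sites=None):
    """Map the input-form fields to a MEME argument list (nothing is run)."""
    if alphabet not in ("dna", "protein"):
        raise ValueError("alphabet must be 'dna' or 'protein'")
    # oops: exactly one site per sequence; zoops: zero or one; anr: any number.
    if distribution not in ("oops", "zoops", "anr"):
        raise ValueError("distribution must be 'oops', 'zoops' or 'anr'")
    if min_width > max_width:
        raise ValueError("min_width must not exceed max_width")
    cmd = [
        "meme", fasta_path, f"-{alphabet}",
        "-oc", out_dir,                # output directory, overwritten if present
        "-mod", distribution,          # how many times in each sequence?
        "-nmotifs", str(n_motifs),     # how many motifs?
        "-minw", str(min_width),       # range of motif lengths
        "-maxw", str(max_width),
    ]
    if min_sites is not None:          # how many sites?
        cmd += ["-minsites", str(min_sites)]
    if max_sites is not None:
        cmd += ["-maxsites", str(max_sites)]
    return cmd


# Hypothetical file names, for illustration only.
command = build_meme_command("promoters.fasta", "meme_out",
                             n_motifs=2, min_width=6, max_width=12)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` on a machine where MEME is installed.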

Page 10: Motif discovery

MEME - Output

Motif e-value

Page 11: Motif discovery

MEME – Sequence logo

Motif length

Number of appearances

Motif e-value

A graphical representation of the sequence motif

Page 12: Motif discovery

MEME – Sequence logo

High information content = high confidence

The relative sizes of the letters indicate their frequency in the sequences. The total height of the letters depicts the information content of the position, in bits.
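The height of a logo column can be approximated as the maximum possible entropy minus the observed entropy of the column. The sketch below assumes a uniform background over 20 amino acids and applies no small-sample correction, so it will not exactly match the values a logo tool draws. The two columns are positions 1 and 2 of the example motif from the definition slide.

```python
import math


def information_content(column, alphabet_size):
    """Bits of information in one column of residue frequencies."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(alphabet_size) - entropy


# Position 1 of the example motif: always Y.
invariant = {"Y": 1.0}
# Position 2 of the example motif: D in half of the sites, G/H/N once each.
variable = {"D": 3 / 6, "G": 1 / 6, "H": 1 / 6, "N": 1 / 6}

print(round(information_content(invariant, 20), 2))  # 4.32
print(round(information_content(variable, 20), 2))   # 2.53
```

The fully conserved position reaches the maximum of log2(20) bits, while the mixed position is drawn shorter, which is the sense in which a tall column signals high confidence.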

Page 13: Motif discovery

Multilevel Consensus

MEME – Sequence logo

Page 14: Motif discovery

Patterns can be presented as regular expressions

[AG]-x-V-x(2)-{YW}

[] - Either residue

x - Any residue

x(2) - Any residue in the next 2 positions

{} - Any residue except these

Examples: AYVACM, GGVGAA
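To check such a pattern against sequences, it can be translated into an ordinary regular expression. The converter below is a small sketch that handles only the four elements listed on this slide (`[]`, `x`, `x(n)`, `{}`). The function name `prosite_to_regex` is made up for the example.

```python
import re

# One pattern element: [..], {..}, a single residue or x, optionally (n).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)\))?")


def prosite_to_regex(pattern):
    """Translate a dash-separated pattern into Python regex syntax."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


regex = prosite_to_regex("[AG]-x-V-x(2)-{YW}")
print(regex)  # [AG].V.{2}[^YW]

for candidate in ["AYVACM", "GGVGAA", "AYVACW", "TYVACM"]:
    print(candidate, bool(re.fullmatch(regex, candidate)))
```

The two examples from the slide match. `AYVACW` fails because it ends in an excluded residue, and `TYVACM` fails because it does not start with A or G.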

Page 15: Motif discovery

Sequence names

Position in sequence

Strength of match

Motif within sequence

MEME – motif alignment

Page 16: Motif discovery

Overall strength of motif matches

Motif location in the input sequence

MEME – motif locations

Sequence names

Page 17: Motif discovery

What can we do with motifs?

• MAST - Search for them in non-annotated sequence databases (protein and DNA).

• TOMTOM - Find the protein that binds a DNA motif.

Page 18: Motif discovery

MAST

Page 19: Motif discovery

MAST

• Searches for motifs (one or more) in sequence databases:
  – Like BLAST, but with motifs as input
  – Similar to iterations of PSI-BLAST

• Profile defines strength of match
  – Multiple motif matches per sequence

• MEME uses MAST to summarize results:
  – Each MEME result is accompanied by the MAST result for searching the discovered motifs on the given sequences.
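The idea that a profile defines the strength of a match can be sketched as a sliding-window scan: every window of the target sequence is scored against a log-odds matrix, and the windows are ranked. This is only an illustration, not MAST's scoring or p-value calculation. The uniform background, the pseudocount of 1 and the target sequence are assumptions made for the example.

```python
import math
from collections import Counter

# The six aligned sites from the motif-definition slide.
SITES = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
         "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1 / len(AMINO_ACIDS)   # assumption: uniform background
PSEUDOCOUNT = 1.0                   # assumption: avoids log(0) for unseen residues


def log_odds_matrix(sites):
    """Build one {residue: log2(p / background)} dict per motif position."""
    n = len(sites)
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)
        matrix.append({
            aa: math.log2(
                (counts[aa] + PSEUDOCOUNT * BACKGROUND)
                / (n + PSEUDOCOUNT) / BACKGROUND
            )
            for aa in AMINO_ACIDS
        })
    return matrix


def scan(sequence, matrix):
    """Score every window; return (score, start, window), best first."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[aa] for col, aa in zip(matrix, window))
        hits.append((score, start, window))
    return sorted(hits, reverse=True)


pssm = log_odds_matrix(SITES)
target = "MKTAYDEEGADYEELLK"        # hypothetical target sequence
for score, start, window in scan(target, pssm)[:3]:
    print(f"start={start} window={window} score={score:.2f}")
```

Start positions are 0-based. The highest-scoring window is the embedded copy of the motif, and shifted windows score far lower because most of their residues were never observed at those positions.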

http://meme.sdsc.edu/meme4_4_0/cgi-bin/mast.cgi

Page 20: Motif discovery

MAST - Input

Input file (motifs)

Database

Page 21: Motif discovery

If you wish to use motifs discovered by MEME

Page 22: Motif discovery

MAST - Output

Input motifs

Presence of the motifs in a given database

Page 23: Motif discovery

MAST – Output (another example, global view)

Page 24: Motif discovery

MAST – Output (another example, global view)

Page 25: Motif discovery

TOMTOM

Page 26: Motif discovery

TOMTOM

• Searches one or more query DNA motifs against one or more databases of target motifs, and reports for each query a list of target motifs, ranked by p-value.

• The output contains results for each query, in the order that the queries appear in the input file.
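Comparing a query motif with target motifs can be pictured as sliding one matrix along the other and measuring how similar the overlapping columns are. The sketch below uses the mean Euclidean distance between columns and ranks a tiny database by it. This is an assumption for illustration: TOMTOM's own column scores and p-value calculation are more involved and are not reproduced here. The motifs and names are invented.

```python
import math


def column_distance(a, b):
    """Euclidean distance between two columns of [A, C, G, T] frequencies."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_offset(query, target):
    """Return (mean distance, offset) for the best ungapped placement
    of the whole query inside the target."""
    if len(query) > len(target):
        raise ValueError("query must not be longer than target")
    best = None
    for offset in range(len(target) - len(query) + 1):
        distance = sum(
            column_distance(q, target[offset + i])
            for i, q in enumerate(query)
        ) / len(query)
        if best is None or distance < best[0]:
            best = (distance, offset)
    return best


# Invented toy motifs; each column is [A, C, G, T].
QUERY = [[0.90, 0.03, 0.04, 0.03],   # A
         [0.05, 0.05, 0.85, 0.05],   # G
         [0.10, 0.10, 0.10, 0.70]]   # T
DATABASE = {
    "motif_1": [[0.25, 0.25, 0.25, 0.25],   # N
                [0.80, 0.10, 0.05, 0.05],   # A
                [0.10, 0.05, 0.80, 0.05],   # G
                [0.05, 0.05, 0.10, 0.80]],  # T
    "motif_2": [[0.10, 0.70, 0.10, 0.10],   # C
                [0.10, 0.70, 0.10, 0.10],   # C
                [0.70, 0.10, 0.10, 0.10],   # A
                [0.10, 0.10, 0.10, 0.70]],  # T
}

# Rank targets by distance, smallest (most similar) first.
ranking = sorted(
    best_offset(QUERY, target) + (name,) for name, target in DATABASE.items()
)
for distance, offset, name in ranking:
    print(f"{name}: offset={offset} mean distance={distance:.3f}")
```

A real comparison would also allow partial overlaps and, for DNA, the reverse complement, and would convert the raw score into a p-value before ranking.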

http://meme.sdsc.edu/meme/doc/tomtom.html

Page 27: Motif discovery

TOMTOM - Input

Input motif

Background frequencies

Database

Page 28: Motif discovery

TOMTOM - Output

Input motif

Matching motifs

Page 29: Motif discovery

TOMTOM – Output

Wrong input (RNA sequence of RNA binding protein NOVA1)

“OK” results

Page 30: Motif discovery

MAST vs. TOMTOM

             MAST                 TOMTOM
Comparison   Profile against DB   Profile against profile
DB           General DBs          Known motif DBs

Page 31: Motif discovery

Cool Story of the day

How NOT to be a bioinformatician
