Bioinformatics
![Page 1: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/1.jpg)
Bioinformatics
Dr. Aladdin Hamwieh, Khalid Al-shamaa, Abdulqader Jighly
2010-2011
Lecture 3: Finding Motifs
Aleppo University, Faculty of Technical Engineering, Department of Biotechnology
![Page 2: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/2.jpg)
Main Lines
• Definition
• Motif types
• Motifs problem
• Motifs: Profiles and Consensus
• Motif Logo
• Motif Search in Local Database
![Page 3: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/3.jpg)
Definition
• A motif is a short conserved sequence pattern associated with distinct functions of a protein or DNA.
![Page 4: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/4.jpg)
Motif Types
1. Regulatory sequences
![Page 5: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/5.jpg)
Combinatorial Gene Regulation
• A microarray experiment showed that when gene X is knocked out, 20 other genes are not expressed
–How can one gene have such drastic effects?
![Page 6: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/6.jpg)
Combinatorial Gene Regulation
• Gene X encodes a regulatory protein, a.k.a. a transcription factor (TF)
• The 20 unexpressed genes rely on gene X’s TF to induce transcription
• A single TF may regulate multiple genes
Regulatory Protein
![Page 7: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/7.jpg)
• Every gene contains a regulatory region (RR) typically stretching 100-1000 bp upstream of the transcriptional start site
• Located within the RR are the Transcription Factor Binding Sites (TFBS), also known as motifs, specific for a given transcription factor
• TFs influence gene expression by binding to a specific location in the respective gene’s regulatory region: the TFBS
Regulatory Regions
![Page 8: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/8.jpg)
• A TFBS can be located anywhere within the Regulatory Region.
• TFBS may vary slightly across different regulatory regions since non-essential bases could mutate
Transcription Factor Binding Sites
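Because a binding site can sit anywhere in the regulatory region and may differ from the reference site at a few bases, a search has to slide over every position and tolerate mismatches. The following is a minimal Python sketch of that idea. The function names, the example region, and the budget of one mismatch are illustrative assumptions, not part of the lecture.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_sites(region: str, site: str, max_mismatches: int = 1):
    """Return (offset, window, mismatches) for every window of `region`
    that is within `max_mismatches` substitutions of `site`."""
    region, site = region.upper(), site.upper()
    k = len(site)
    hits = []
    for i in range(len(region) - k + 1):
        window = region[i:i + k]
        d = hamming(window, site)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits


# Hypothetical regulatory region with one exact and one variant site.
region = "ggtaATCCCGttacgaTTCCCGcat"
hits = find_sites(region, "ATCCCG", max_mismatches=1)
```

Raising `max_mismatches` finds more diverged sites but also more chance matches, which is the trade-off behind the motif finding problem later in the lecture.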
![Page 9: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/9.jpg)
[Figure: five genes, each with an instance of the motif in its regulatory region]
ATCCCG
TTCCGG
ATCCCG
ATGCCG
ATGCCC
Motifs and Transcriptional Start Sites
![Page 10: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/10.jpg)
-35 hexamer: TTGACA
spacer: 15 - 19 bases
-10 hexamer: TATAAT
interval: 5 - 9 bases
Transcription start site
A weight matrix contains more information (based on ~450 known promoters)
-35 hexamer
      1    2    3    4    5    6
A   0.1  0.1  0.1  0.5  0.2  0.5
T   0.7  0.7  0.2  0.2  0.2  0.2
G   0.1  0.1  0.5  0.1  0.1  0.2
C   0.1  0.1  0.2  0.2  0.5  0.1
-10 hexamer
      1    2    3    4    5    6
A   0.1  0.7  0.2  0.6  0.5  0.1
T   0.7  0.1  0.5  0.2  0.2  0.8
G   0.1  0.1  0.1  0.1  0.1  0.0
C   0.1  0.1  0.2  0.1  0.1  0.1
Consensus considerations
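To illustrate why a weight matrix carries more information than a consensus string, the sketch below scores a hexamer against the two matrices on this slide. The row-to-base assignment (A, T, G, C) is reconstructed from the slide text and is an assumption, although it reproduces the TTGACA and TATAAT consensus sequences. The function names and the independence assumption between positions are also illustrative.

```python
# Frequencies transcribed from the slide; rows assumed to be A, T, G, C
# and columns positions 1-6 of each hexamer.
PWM_35 = {
    "A": [0.1, 0.1, 0.1, 0.5, 0.2, 0.5],
    "T": [0.7, 0.7, 0.2, 0.2, 0.2, 0.2],
    "G": [0.1, 0.1, 0.5, 0.1, 0.1, 0.2],
    "C": [0.1, 0.1, 0.2, 0.2, 0.5, 0.1],
}
PWM_10 = {
    "A": [0.1, 0.7, 0.2, 0.6, 0.5, 0.1],
    "T": [0.7, 0.1, 0.5, 0.2, 0.2, 0.8],
    "G": [0.1, 0.1, 0.1, 0.1, 0.1, 0.0],
    "C": [0.1, 0.1, 0.2, 0.1, 0.1, 0.1],
}


def consensus(pwm) -> str:
    """Most frequent base at each position of the matrix."""
    length = len(pwm["A"])
    return "".join(max("ATGC", key=lambda b: pwm[b][i]) for i in range(length))


def probability(pwm, hexamer: str) -> float:
    """Product of per-position frequencies, treating positions as independent."""
    p = 1.0
    for i, base in enumerate(hexamer.upper()):
        p *= pwm[base][i]
    return p
```

Under this sketch, `probability(PWM_10, "TATAAT")` is larger than the value for any single-base variant, yet variants still get a nonzero score unless they hit a zero entry. A plain consensus match would simply reject them.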
![Page 11: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/11.jpg)
• GAL4 in Yeast
– Activator of galactose-induced genes (convert galactose to glucose)
– Protein structure determines motif
• DNA-protein interactions require certain bases at specified locations
• Motif reflects homodimer structure
Example
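A homodimer binds with two identical subunits, so its site tends to consist of two conserved half-sites that are reverse complements of each other, separated by a spacer. The sketch below assumes the commonly cited GAL4 consensus CGG-N11-CCG, which is not stated on the slide. The function names and the example sequence are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (result is upper-case)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Assumed GAL4-like site: CGG, any 11 bases, CCG. The lookahead lets
# overlapping candidate sites be reported as well.
GAL4_LIKE = re.compile(r"(?=(CGG[ACGT]{11}CCG))")


def find_dimer_sites(seq: str):
    """Return (offset, site) for every GAL4-like site in `seq`."""
    return [(m.start(), m.group(1)) for m in GAL4_LIKE.finditer(seq.upper())]


# Hypothetical sequence carrying one site.
example = "ttCGGaggactgtcctCCGaa"
sites = find_dimer_sites(example)
```

Only the two triplets are constrained here; the 11 spacer bases are free, which matches the idea that the protein requires certain bases only at specified locations.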
![Page 12: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/12.jpg)
Motif Types
2. Motifs in protein structure
![Page 13: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/13.jpg)
Importance
• Functional relationships between proteins cannot always be detected through a simple BLAST or FASTA database search.
• Proteins often perform multiple functions that cannot be fully described using a single annotation.
• To resolve these issues, identification of the motifs and domains becomes very useful.
![Page 14: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/14.jpg)
atgaccgggatactgataccgtatttggcctaggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg
acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatactgggcataaggtaca
tgagtatccctgggatgacttttgggaacactatagtgctctcccgatttttgaatatgtaggatcattcgccagggtccga
gctgagaattggatgaccttgtaagtgttttccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga
tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatggcccacttagtccacttatag
gtcaatcatgttcttgtgaatggatttttaactgagggcatagaccgcttggcgcacccaaattcagtgtgggcgagcgcaa
cggttttggcccttgttagaggcccccgtactgatggaaactttcaattatgagagagctaatctatcgcgtgcgtgttcat
aacttgagttggtttcgaaaatgctctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta
ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatttcaacgtatgccgaaccgaaagggaag
ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttctgggtactgatagca
Random Sample
![Page 15: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/15.jpg)
Implanting Motif AAAAAAAAGGGGGGG
atgaccgggatactgatAAAAAAAAGGGGGGGggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg
acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaataAAAAAAAAGGGGGGGa
tgagtatccctgggatgacttAAAAAAAAGGGGGGGtgctctcccgatttttgaatatgtaggatcattcgccagggtccga
gctgagaattggatgAAAAAAAAGGGGGGGtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga
tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatAAAAAAAAGGGGGGGcttatag
gtcaatcatgttcttgtgaatggatttAAAAAAAAGGGGGGGgaccgcttggcgcacccaaattcagtgtgggcgagcgcaa
cggttttggcccttgttagaggcccccgtAAAAAAAAGGGGGGGcaattatgagagagctaatctatcgcgtgcgtgttcat
aacttgagttAAAAAAAAGGGGGGGctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta
ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatAAAAAAAAGGGGGGGaccgaaagggaag
ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttAAAAAAAAGGGGGGGa
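The implanted sample above can be reproduced with a short Python sketch. It assumes uniformly random lower-case background sequences and a uniformly random implant position. The function names, the seed, and the sequence length are illustrative choices, not taken from the lecture.

```python
import random


def random_dna(length: int, rng: random.Random) -> str:
    """Uniformly random lower-case DNA of the given length."""
    return "".join(rng.choice("acgt") for _ in range(length))


def implant(background: str, motif: str, position: int) -> str:
    """Overwrite len(motif) bases of `background`, starting at `position`."""
    if position < 0 or position + len(motif) > len(background):
        raise ValueError("motif does not fit at this position")
    return background[:position] + motif + background[position + len(motif):]


def make_sample(n_seqs: int, length: int, motif: str, seed: int = 0):
    """Random sequences, each with one exact upper-case copy of `motif`."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n_seqs):
        background = random_dna(length, rng)
        position = rng.randrange(length - len(motif) + 1)
        sample.append(implant(background, motif.upper(), position))
    return sample


sample = make_sample(n_seqs=10, length=82, motif="AAAAAAAAGGGGGGG")
```

With an exact copy in every sequence the motif is easy to spot; the difficulty starts once each copy is mutated.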
![Page 16: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/16.jpg)
• Hard to identify
– Relatively short sequences (as small as 6 bases)
– Many positions not well conserved
• Factors improving identification
– Usually localized in certain proximity of a gene (search within 3 kb upstream)
– Some positions highly conserved
– Use other data (Microarray?)
The Challenge
![Page 17: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/17.jpg)
• Find a motif in a sample of:
– 20 “random” sequences (e.g. 600 nt long)
– each sequence containing an implanted pattern of length 15
– each pattern appearing with 4 mismatches, i.e. as a (15,4) motif
Challenge Problem
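A challenge instance of this kind can be generated with the Python sketch below. It assumes that each implanted copy carries exactly 4 substitutions at randomly chosen positions; the slide does not say whether "4 mismatches" means exactly 4 or up to 4. All function names and the seed are hypothetical.

```python
import random


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def mutate(motif: str, d: int, rng: random.Random) -> str:
    """Copy of `motif` with exactly `d` positions substituted."""
    if not 0 <= d <= len(motif):
        raise ValueError("d must be between 0 and len(motif)")
    bases = list(motif)
    for i in rng.sample(range(len(bases)), d):
        # Pick a base different from the current one, so that every
        # chosen position really becomes a mismatch.
        bases[i] = rng.choice([b for b in "ACGT" if b != bases[i]])
    return "".join(bases)


def make_challenge(motif: str, d: int = 4, n_seqs: int = 20,
                   length: int = 600, seed: int = 0):
    """Return (sequences, positions) for an implanted (len(motif), d) motif."""
    rng = random.Random(seed)
    sequences, positions = [], []
    for _ in range(n_seqs):
        background = "".join(rng.choice("acgt") for _ in range(length))
        pos = rng.randrange(length - len(motif) + 1)
        instance = mutate(motif, d, rng)
        sequences.append(background[:pos] + instance
                         + background[pos + len(motif):])
        positions.append(pos)
    return sequences, positions


MOTIF = "AAAAAAAAGGGGGGG"
sequences, positions = make_challenge(MOTIF)
```

The returned `positions` are the ground truth that a motif finder would have to recover without being told the motif.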
![Page 18: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/18.jpg)
atgaccgggatactgatagaagaaaggttgggggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg
acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatacaataaaacggcggga
tgagtatccctgggatgacttaaaataatggagtggtgctctcccgatttttgaatatgtaggatcattcgccagggtccga
gctgagaattggatgcaaaaaaagggattgtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga
tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatataataaaggaagggcttatag
gtcaatcatgttcttgtgaatggatttaacaataagggctgggaccgcttggcgcacccaaattcagtgtgggcgagcgcaa
cggttttggcccttgttagaggcccccgtataaacaaggagggccaattatgagagagctaatctatcgcgtgcgtgttcat
aacttgagttaaaaaatagggagccctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta
ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatactaaaaaggagcggaccgaaagggaag
ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttactaaaaaggagcgga
Where is the Motif???
![Page 19: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/19.jpg)
AgAAgAAAGGttGGG
..|..|||.|..|||
cAAtAAAAcGGcGGG
Why Is Finding a (15,4) Motif Difficult?
atgaccgggatactgatAgAAgAAAGGttGGGggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg
acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatacAAtAAAAcGGcGGGa
tgagtatccctgggatgacttAAAAtAAtGGaGtGGtgctctcccgatttttgaatatgtaggatcattcgccagggtccga
gctgagaattggatgcAAAAAAAGGGattGtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga
tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatAtAAtAAAGGaaGGGcttatag
gtcaatcatgttcttgtgaatggatttAAcAAtAAGGGctGGgaccgcttggcgcacccaaattcagtgtgggcgagcgcaa
cggttttggcccttgttagaggcccccgtAtAAAcAAGGaGGGccaattatgagagagctaatctatcgcgtgcgtgttcat
aacttgagttAAAAAAtAGGGaGccctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta
ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatActAAAAAGGaGcGGaccgaaagggaag
ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttActAAAAAGGaGcGGa
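The difficulty can be made concrete by counting mismatches. Each implanted instance differs from the motif at 4 positions, but two instances can differ from each other at up to 8 positions, so the instances look much less alike than each looks like the motif. A small Python check on the two instances aligned on this slide (the helper names are illustrative):

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions, ignoring letter case."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def match_line(a: str, b: str) -> str:
    """'|' where the two strings agree (ignoring case), '.' elsewhere."""
    return "".join("|" if x == y else "."
                   for x, y in zip(a.upper(), b.upper()))


motif = "AAAAAAAAGGGGGGG"
first = "AgAAgAAAGGttGGG"    # instance from the first sequence
second = "cAAtAAAAcGGcGGG"   # instance from the second sequence

to_motif = (hamming(motif, first), hamming(motif, second))
between = hamming(first, second)
```

Here `to_motif` is `(4, 4)` while `between` is 7, so a pairwise comparison of the sequences sees only 8 of 15 matching positions.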
![Page 20: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/20.jpg)
             a G g t a c T t
             C c A t a c g t
Alignment    a c g t T A g t
             a c g t C c A t
             C c g t a c g G
             _________________
           A 3 0 1 0 3 1 1 0
Profile    C 2 4 0 0 1 4 0 0
           G 0 1 4 0 0 0 3 1
           T 0 0 0 5 1 0 1 4
             _________________
Consensus    A C G T A C G T
• Line up the patterns by their start indexes
s = (s1, s2, …, st)
• Construct matrix profile with frequencies of each nucleotide in columns
• Consensus nucleotide in each position has the highest score in column
Motifs: Profiles and Consensus
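The three steps above can be sketched in Python using the five aligned patterns from this slide. The function names are illustrative. The `score` function (sum of the column maxima) is a common way to rate an alignment and is an assumption here, since the slide only says that the consensus base has the highest score in its column.

```python
ALIGNMENT = [
    "aGgtacTt",
    "CcAtacgt",
    "acgtTAgt",
    "acgtCcAt",
    "CcgtacgG",
]


def profile(patterns):
    """Count each nucleotide per column; letter case is ignored."""
    width = len(patterns[0])
    if any(len(p) != width for p in patterns):
        raise ValueError("all patterns must have the same length")
    counts = {base: [0] * width for base in "ACGT"}
    for pattern in patterns:
        for column, base in enumerate(pattern.upper()):
            counts[base][column] += 1
    return counts


def consensus(counts) -> str:
    """Most frequent nucleotide per column (ties go to A, C, G, T order)."""
    width = len(counts["A"])
    return "".join(max("ACGT", key=lambda b: counts[b][i])
                   for i in range(width))


def score(counts) -> int:
    """Sum over columns of the count of the most frequent nucleotide."""
    width = len(counts["A"])
    return sum(max(counts[b][i] for b in "ACGT") for i in range(width))


counts = profile(ALIGNMENT)
```

For this alignment `consensus(counts)` gives `ACGTACGT`, matching the slide, and `score(counts)` is 30 out of a possible 40 (5 patterns times 8 columns).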
![Page 21: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/21.jpg)
Motif Search in Local Database
![Page 22: Bioinformatics](https://reader033.fdocuments.us/reader033/viewer/2022052603/568168f3550346895ddffafc/html5/thumbnails/22.jpg)