Transcription Factor DNA Binding Prediction

1. Transcription Factor-DNA Binding Prediction
Tahmina Ahmed, Prosunjit Biswas, Iffat Sharmin Chowdhury, Badri Sampath

2. Motivation
Build a model by examining labeled DNA sequences, use it to label unlabeled DNA sequences, and gain experience with a real-world machine learning problem.

3. Approaches
- K-mer based
  - Fixed-length k-mer
  - K-mer with mismatches
  - Using regular expressions
- PWM based
  - MEME and MAST
- Combined model
  - Unite both models

4. K-mer Approach Based on Regular Expressions
Motivation: 2-mers occur most frequently in the sequences, so this approach focuses on 2-mers.
Strategy:
- For any two 2-mers X and Y, generate the regular expressions X(.*)Y and Y(.*)X.
- Use these regular expressions as candidate attributes.
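
A minimal sketch of the strategy above, assuming each regular expression becomes a binary attribute (1 if it matches anywhere in the sequence, 0 otherwise) and that X and Y are distinct. The function names `build_patterns` and `regex_features` are hypothetical, not taken from the project's scripts.

```python
import re
from itertools import permutations, product

NUCLEOTIDES = "ACGT"

def build_patterns():
    """Return X(.*)Y for every ordered pair of distinct 2-mers X, Y.

    Enumerating both orders means each pair {X, Y} yields both
    X(.*)Y and Y(.*)X.
    """
    two_mers = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]
    return [f"{x}(.*){y}" for x, y in permutations(two_mers, 2)]

def regex_features(sequence, patterns):
    """Binary attribute vector: 1 if the pattern occurs anywhere in the sequence."""
    return [1 if re.search(p, sequence) else 0 for p in patterns]

patterns = build_patterns()
print(len(patterns))  # 16 * 15 ordered pairs -> 240
```

With 16 possible 2-mers this gives 240 candidate attributes per sequence, which can then be written out as ARFF columns for Weka.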

5. Classifier Selection
Fig: Nine classifiers applied to the TF data sets
Algorithms are numbered as follows: (1) Logistic, (2) SMO, (3) NaiveBayes, (4) BayesianLogisticRegression, (5) KStar, (6) Bagging, (7) LogitBoost, (8) RandomForest, (9) J48
Summary:
* Nine classifiers were applied to 10 data sets; three of them are shown here.
* Choosing a single best classifier is not a trivial task.
* The same classifier performs differently on different data sets.
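
The slides do not state how accuracy was estimated. Weka's Explorer defaults to 10-fold cross-validation, so here is a minimal stdlib sketch assuming k-fold cross-validation. `k_fold_accuracy`, `majority_class`, and `nearest_neighbor` are hypothetical stand-ins used only to show the comparison loop; they are not the Weka classifiers listed above, and the toy data is illustrative.

```python
from collections import Counter

def k_fold_accuracy(X, y, train, k=10):
    """Cross-validated accuracy: every example is tested exactly once.

    Fold i holds the examples whose index is congruent to i modulo k.
    `train(X_train, y_train)` must return a predict(x) function.
    """
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        predict = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(X)

def majority_class(X_train, y_train):
    """Baseline: always predict the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbor(X_train, y_train):
    """1-NN with Hamming distance over attribute vectors."""
    def predict(x):
        dists = [sum(a != b for a, b in zip(x, xt)) for xt in X_train]
        return y_train[dists.index(min(dists))]
    return predict

# Toy data: the label equals the first attribute.
X = [[0, 0], [0, 1], [1, 0], [1, 1]] * 3
y = [row[0] for row in X]
print(k_fold_accuracy(X, y, majority_class, k=3))    # 0.5
print(k_fold_accuracy(X, y, nearest_neighbor, k=3))  # 1.0
```

Running the same loop over each data set and each classifier produces the kind of accuracy grid the summary describes.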

6. Change in Accuracy Due to Different Classifiers
Fig: Performance of Logistic, J48, RandomForest, and NaiveBayes on the TF_3 data set
Fig: Performance of Logistic, J48, RandomForest, and NaiveBayes on the TF_5 data set
Summary:
* The choice of classifier has a large effect on accuracy.
* Classifiers must be chosen carefully.

7. Change in Accuracy Due to Different K-mer Lengths
Fig: Performance of 4-mer, 5-mer, and 6-mer attributes on the TF_3 data set
Summary:
* K-mer length also affects accuracy.
* Finding the best length is not trivial.
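
A minimal sketch of fixed-length k-mer attributes, assuming each attribute is the count of overlapping occurrences of one k-mer and that windows containing symbols other than A, C, G, T are skipped. `kmer_vocabulary` and `kmer_counts` are hypothetical names.

```python
from itertools import product

def kmer_vocabulary(k):
    """All 4**k DNA k-mers in lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]

def kmer_counts(sequence, k):
    """Count overlapping occurrences of every k-mer (one attribute per k-mer)."""
    vocab = kmer_vocabulary(k)
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if kmer in index:  # skip windows containing N or other symbols
            counts[index[kmer]] += 1
    return counts

for k in (4, 5, 6):
    print(k, len(kmer_counts("ACGTACGTTGCA", k)))  # 256, 1024, 4096 attributes
```

Note that the attribute space grows as 4^k, so moving from 4-mers to 6-mers multiplies the number of attributes by 16.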

8. Attribute Space Selection
Fig: Performance with different numbers of selected k-mers on the TF_4 data set
Summary:
* The number of attributes considered also affects accuracy.
* Accuracy increases as more attributes are considered, up to a saturation point, after which it decreases.
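
The slides do not say how the k-mers were ranked before selection. As a hypothetical illustration only, the sketch below keeps the n attributes whose mean value differs most between positive (label 1) and negative (label 0) sequences; `select_top_attributes` and this scoring rule are assumptions, not the project's method.

```python
def select_top_attributes(X, y, names, n):
    """Keep the n attributes with the largest |mean(pos) - mean(neg)|.

    The scoring rule is a hypothetical choice for illustration.
    Returns the kept attribute names and the reduced data rows.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]

    def mean(rows, j):
        return sum(r[j] for r in rows) / len(rows)

    scores = [abs(mean(pos, j) - mean(neg, j)) for j in range(len(names))]
    ranked = sorted(range(len(names)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:n])  # preserve the original attribute order
    return [names[j] for j in keep], [[row[j] for j in keep] for row in X]

X = [[3, 0, 1], [2, 0, 1], [0, 1, 1], [0, 0, 1]]
y = [1, 1, 0, 0]
print(select_top_attributes(X, y, ["AA", "AC", "AG"], 2)[0])  # ['AA', 'AC']
```

Sweeping n and measuring accuracy at each value is one way to locate the saturation point described above.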

9. PWM-Based Analysis of Accuracy (TF_1 data set)
Fig: J48, minW 6, maxW 15, 10 sites
Fig: J48, minW 6, maxW 15, 5 motifs
Summary:
* With the number of sites fixed, accuracy increases with more motifs.
* With the number of motifs fixed, accuracy increases with more sites.
* What happens when both are increased?

10. PWM-Based Analysis
Fig: Accuracy versus number of motifs and number of sites
* 1st bar: number of sites
* 2nd bar: number of motifs
* 3rd bar: accuracy
* The key observation is that accuracy decreases when both the number of motifs and the number of sites are increased.
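
To clarify what a PWM-based attribute can look like, here is a minimal sketch assuming a log-odds matrix built from equal-length aligned sites (with a pseudocount and a uniform 0.25 background) and one attribute per motif equal to its best window score in a sequence. `build_pwm` and `best_match_score` are hypothetical; MEME and MAST do considerably more (for example, MAST reports the statistical significance of matches), and this sketch scans the forward strand only.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix (log2) from equal-length aligned sites."""
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm

def best_match_score(pwm, sequence):
    """Highest summed log-odds score over all windows (forward strand only)."""
    width = len(pwm)
    best = float("-inf")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if all(b in BASES for b in window):
            best = max(best, sum(col[b] for col, b in zip(pwm, window)))
    return best

pwm = build_pwm(["ACGT", "ACGT", "ACGA"])
print(best_match_score(pwm, "TTACGTTT") > best_match_score(pwm, "TTTTTTTT"))  # True
```

One such score per discovered motif gives a small attribute vector per sequence that a classifier such as J48 can use.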

11. Extra Work for TF_20
Fig: Flow diagram of building the new model for TF_20 (components: K-mer model, PWM model, sequences identified by both models, sequences labeled differently, biased 2-mer model, newly identified sequences, the new model for TF_20)
Summary:
* Additional work was done for TF_20.
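
The flow diagram separates sequences that the K-mer and PWM models label the same way from sequences they label differently. How the biased 2-mer model resolves the disagreements is not described on the slide, so this sketch, with the hypothetical function name `partition_by_agreement`, only performs that split.

```python
def partition_by_agreement(seq_ids, kmer_labels, pwm_labels):
    """Split sequences by whether the two models assign the same label.

    Returns (agreed, disagreed): agreed maps sequence id -> shared label,
    disagreed lists the ids that the models labeled differently.
    """
    agreed, disagreed = {}, []
    for sid, a, b in zip(seq_ids, kmer_labels, pwm_labels):
        if a == b:
            agreed[sid] = a
        else:
            disagreed.append(sid)
    return agreed, disagreed

agreed, disagreed = partition_by_agreement(["s1", "s2", "s3"], [1, 0, 1], [1, 1, 1])
print(agreed, disagreed)  # {'s1': 1, 's3': 1} ['s2']
```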

12. AUC Based on the Feedback (Bonus Model)
Fig: AUC of the 10 data sets based on the last submission
* Accuracy improved compared with the first submission.
* The PWM approach did not give good results.
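
AUC can be computed directly from a model's scores: it equals the probability that a randomly chosen positive sequence is scored higher than a randomly chosen negative one, with ties counting one half. A minimal stdlib sketch (the function name `auc` is hypothetical, and the pairwise loop is quadratic, which is fine for illustration):

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (ties count 1/2).

    labels: 1 for positive (bound) sequences, 0 for negative ones.
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

Unlike accuracy, this measure does not depend on a classification threshold, only on how the scores rank the sequences.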

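13. Participation
- Badri Sampath: background study (DNA, RNA, protein, motif); working with tools (AlignAce, MEME, MAST); working with models (PWM); parameter tuning (K-mer); automation (ARFF writer, MAST output writer)
- Iffat Sharmin Chowdhury: background study (protein, motif, transcription); working with tools (Weka, AlignAce, ScanAce); working with models (K-mer); parameter tuning (PWM); automation (scripts for FASTA, Weka)
- Prosunjit Biswas: background study (DNA, transcription); working with tools (MEME, MAST); working with models (K-mer); parameter tuning (PWM); automation (scripts for RE and for the new K-mer model)
- Tahmina Ahmed: background study (MEME, MAST, PWM); working with tools (MEME, MAST, Weka); working with models (PWM); parameter tuning (K-mer); automation (scripts for MEME, MAST)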

14. Acknowledgment

15. Questions?
