Computational Approaches to Promoter Analysis

Part II

Shifra Ben-Dor
June 2010

First define your question

•  Are you looking for the transcription start site?

•  Are you interested in transcription factor binding sites?

•  Are you interested in proximal or distal signals?

•  Locate TSS

•  Identify transcription factor binding sites

•  Characterize regulatory properties

TSS identification

•  There are programs that attempt to identify TSS

•  These are based on different algorithms

•  The best is to use whatever transcriptional information is available

•  Expert’s Pick: UCSC Genome Browser

Fickett JW, Hatzigeorgiou AG. Genome Res 1997 Sep;7(9):861-78

•  Evaluation of several programs’ ability to predict the TSS

•  If a program reports no TSS, the 3’ end of the predicted promoter region is used instead

•  A prediction counts as correct if it lies within 200 bp 5’ or 100 bp 3’ of the actual TSS
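The correctness criterion above can be written as a small check. This is a minimal sketch, not the evaluation code used in the paper; the function name, the strand handling and the default tolerances as keyword arguments are assumptions made for illustration.

```python
def is_correct_prediction(predicted, actual, strand="+", upstream=200, downstream=100):
    """Return True if a predicted TSS is within `upstream` bp 5' or
    `downstream` bp 3' of the actual TSS (coordinates on the plus strand)."""
    # Offset is negative when the prediction lies 5' (upstream) of the real TSS.
    if strand == "+":
        offset = predicted - actual
    elif strand == "-":
        offset = actual - predicted  # on the minus strand, 5' means higher coordinates
    else:
        raise ValueError("strand must be '+' or '-'")
    return -upstream <= offset <= downstream


print(is_correct_prediction(850, 1000))        # 150 bp upstream
print(is_correct_prediction(1150, 1000))       # 150 bp downstream
print(is_correct_prediction(1150, 1000, "-"))  # 150 bp upstream on the minus strand
```

Note that the tolerance is asymmetric: a miss of 150 bp is accepted on the 5’ side but rejected on the 3’ side.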

Both studies found that the absolute measures of correctness of all programs were quite low, although Hu et al. found that the algorithms which they tested were capable of predicting at least one binding site accurately more than 90% of the time.

MICROBIOLOGY AND MOLECULAR BIOLOGY REVIEWS, Sept. 2009, p. 481–509 Vol. 73, No. 3

Transcription factor binding sites:

•  Sequence based methods

•  Word based methods

•  Phylogenetic footprinting

Sequence based methods

•  Look for consensus sequences that are known to be in promoters

– Promoter elements such as TATA, CCAAT

– Transcription factor binding sites

•  Compare to existing promoters

•  Both require databases, either of promoters, promoter elements or binding sites

Core promoter conserved sequences

•  TATA box: TATAA(AA)

•  Inr box
   Mammals: Py Py (C) A(+1) N T/A Py Py
   Drosophila: T C A(+1) G/T T C/T

•  DPE: (A+24) A/G(+28) G, A/T, C/T, G/A/T

•  MTE: CSARCSSAACGS

•  BRE: G/C, G/C, G/A, CGCC

•  CpG islands

Where do we get the sequences from?

•  EMSA

•  Fingerprinting

•  In-vitro selection or SELEX (Systematic Evolution of Ligands by EXponential enrichment)

•  High-throughput binding to oligos

•  Computational analysis of suspected co-regulated genes

In-vitro Selection: SELEX

•  Start from a large library of dsDNA oligos (20-30 bp)

•  Allow protein to bind

•  Select sequences that do bind (e.g. column)

•  Release them from binding

•  Amplify the sequences of the binders

•  Repeat 5-15 times

•  Align final sequences

Types of sequence

•  Specific sequence

•  Consensus sequence

– Majority

– IUPAC (degenerate)

•  Position Weight Matrices (PWM)

Aligned sites:       ACGTCG  TGGTAG  ATGTAG  ATGTAG

Majority consensus:  ATGTAG

IUPAC consensus:     WBGTMG

Count matrix:

     1  2  3  4  5  6
A    3  0  0  0  3  0
C    0  1  0  0  1  0
G    0  1  4  0  0  4
T    1  2  0  4  0  0
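The three representations on this slide can be derived from the same four aligned sites. The following is a minimal sketch; the function names are illustrative, and breaking ties in the majority consensus by A, C, G, T order is an assumption (the slide's example has no ties).

```python
# The four aligned binding sites from the slide.
SITES = ["ACGTCG", "TGGTAG", "ATGTAG", "ATGTAG"]

# IUPAC degenerate codes, keyed by the set of bases observed at a position.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def count_matrix(sites):
    """Count each base at each alignment column."""
    columns = list(zip(*sites))
    return {base: [col.count(base) for col in columns] for base in "ACGT"}


def majority_consensus(sites):
    """Most frequent base per column (ties go to the first of A, C, G, T)."""
    counts = count_matrix(sites)
    return "".join(max("ACGT", key=lambda b: counts[b][i]) for i in range(len(sites[0])))


def iupac_consensus(sites):
    """One IUPAC code per column covering every base seen there."""
    return "".join(IUPAC[frozenset(col)] for col in zip(*sites))


print(count_matrix(SITES))
print(majority_consensus(SITES))
print(iupac_consensus(SITES))
```

The majority consensus discards the minority bases, the IUPAC consensus keeps them but loses their frequencies, and only the matrix keeps both.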

Position Weight Matrix (PWM)

•  Assigns a weight to each possible nucleotide at each position of a binding site

•  The sum of the weights is the site score

•  Basis: different positions in the site contribute independently to binding

•  Can give a score to matches
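A minimal sketch of how such a score can be computed. The slides do not say how the weights are derived; the log-odds weighting, the uniform background of 0.25 per base and the pseudocount of 0.5 used here are assumptions chosen for illustration, and the example sites are the four aligned sites from the "Types of sequence" slide.

```python
import math

SITES = ["ACGTCG", "TGGTAG", "ATGTAG", "ATGTAG"]


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Return one {base: weight} dict per position, using log2-odds weights."""
    n = len(sites)
    pwm = []
    for column in zip(*sites):
        weights = {}
        for base in "ACGT":
            # The pseudocount keeps unseen bases from getting a weight of -infinity.
            freq = (column.count(base) + pseudocount) / (n + 4 * pseudocount)
            weights[base] = math.log2(freq / background)
        pwm.append(weights)
    return pwm


def score_site(pwm, site):
    """Site score = sum of the per-position weights (positions are independent)."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal matrix length")
    return sum(position[base] for position, base in zip(pwm, site))


pwm = build_pwm(SITES)
for candidate in ("ATGTAG", "ACGTCG", "CCCCCC"):
    print(candidate, round(score_site(pwm, candidate), 2))
```

Unlike a consensus string, the matrix ranks candidates: the majority consensus scores highest, a site that was observed once scores lower, and an unrelated sequence scores lowest.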

Databases

•  EPD (Eukaryotic Promoter Database)

•  Transfac

•  Matbase

•  JASPAR

•  TRED

•  TFD (Transcription Factor Database)

•  TRRD (Transcription Regulatory Region Database)

•  dbTSS (Database of Transcription Start Sites)

Plant Specific Databases

•  Place

•  PlantCare

•  PlantPromDB

Problems with sequence-based searches

•  Not all binding sites defined

•  Same site might bind different factors in different cell types, stages of development… and vice versa

•  Binding sites are very degenerate

•  Binding sites are very short, can be found at random in many places in the genome

Other Problems…

•  We have to remember that the fact that a binding site exists does NOT mean that it is bound.

•  The sites are short and degenerate, and so appear many times at random in the genome (Inr every 512 bp and TATA every 120 bp)
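To get a feel for how often a short, degenerate site turns up by chance, the expected spacing of an IUPAC motif can be estimated directly. This is a rough sketch that assumes all four bases are equally likely and positions are independent; it is not meant to reproduce the figures quoted on the slide, which depend on the actual element model and genome composition. The function name is illustrative.

```python
# Number of bases each IUPAC symbol stands for.
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def expected_spacing(motif):
    """Average number of bp between chance matches on one strand,
    assuming equal base frequencies (an idealisation)."""
    allowed = 1
    for symbol in motif:
        allowed *= IUPAC_SIZE[symbol]
    # 4**len(motif) possible words, `allowed` of them match the motif.
    return 4 ** len(motif) / allowed


print(expected_spacing("TATAAA"))  # fully specified 6-mer
print(expected_spacing("WBGTMG"))  # degenerate 6-mer
```

Every degenerate position shortens the expected spacing, which is why loose consensus sequences match so many random positions in a genome.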

Other Problems…

•  It’s also important to remember that many transcription factor binding sites are active regardless of the orientation (plus strand or minus strand)

•  Most algorithms are built to deal with single stranded DNA (one direction), while many promoter elements are bidirectional
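A minimal sketch of a consensus search that reports hits in both orientations. The function names and the toy sequence are made up for illustration; the approach (scan the sequence, then scan its reverse complement and map the coordinates back) is one simple way to do it, not the method of any particular program in this lecture.

```python
import re

# Regex character class for each IUPAC symbol.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan_both_strands(seq, motif):
    """Return sorted (start, strand, matched_sequence) tuples.
    `start` is the 0-based leftmost plus-strand coordinate of the hit."""
    # The lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=(" + "".join(IUPAC_CLASS[s] for s in motif) + "))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert the position on the reverse complement back to plus-strand coordinates.
        start = len(seq) - m.start() - len(motif)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


print(scan_both_strands("GGTATAAAGGCCTTTATAGC", "TATAWA"))
```

A single-strand search of this toy sequence would miss the second hit entirely.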

Problems

•  Most programs predict about one promoter per 1000 bp

•  Genes occur far less often, only about one per many thousands of bp

•  Programs based on transcription factor density may also find enhancers

Location, Location, Location

The most critical factor in promoter searches is to define the correct upstream region

Stop thinking protein-centric!

•  mRNA does NOT start with ATG!!!! Translation starts with ATG.

•  In more and more genes, we find that the first exon is short and non-coding. The first intron has a tendency to be long (longer than most prediction programs can handle).

•  Make sure you have the complete 5’ end of your gene.
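A small sketch of what "the correct upstream region" means in coordinates: the window is anchored on the TSS and depends on the strand. The function name, the 1-based inclusive convention and the default window sizes (1000 bp upstream, 100 bp downstream) are assumptions for illustration only.

```python
def promoter_window(tss, strand, upstream=1000, downstream=100):
    """Return (start, end), 1-based inclusive plus-strand coordinates of the
    region from `upstream` bp 5' of the TSS to `downstream` bp 3' of it."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # For a minus-strand gene, upstream lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(1, start), end


# Hypothetical plus-strand gene: the TSS is at 5,000 but the ATG is at 25,150,
# because a short non-coding first exon is followed by a long first intron.
print(promoter_window(5000, "+"))    # anchored on the TSS
print(promoter_window(25150, "+"))   # anchored on the ATG: a different region
print(promoter_window(5000, "-"))    # same TSS on the minus strand
```

In the hypothetical example the ATG-anchored window does not overlap the TSS-anchored one at all, so a search there would analyse intron sequence rather than the promoter.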

This was the state of the database as of 22.02.04, when we ran the search

But then we zoomed out…

This is what the database looked like one year later:

Three of the sequences were added 03.03.04

Word based methods

•  Try to define differences between promoter and non-promoter regions based on the difference in nucleic acid composition.

•  Some compare between promoter and all other, some define promoter, coding, and other non-coding (intron and UTR).

•  Promoter elements, of course, can be found in introns as well…
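The idea can be sketched with word (k-mer) frequencies: count words in known promoter and non-promoter sequences, then score a new window by how promoter-like its words are. The training sequences below are toy data invented for illustration, and the log-ratio score with a pseudocount is an assumed scoring scheme, not the formula of any specific program in this lecture.

```python
import math
from collections import Counter


def word_counts(seqs, k):
    """Count all overlapping words of length k in a list of sequences."""
    counts = Counter(seq[i:i + k] for seq in seqs for i in range(len(seq) - k + 1))
    return counts, sum(counts.values())


def log_ratio_table(promoters, background, k=3, pseudocount=1.0):
    """log2(frequency in promoters / frequency in background) for each word seen."""
    p_counts, p_total = word_counts(promoters, k)
    b_counts, b_total = word_counts(background, k)
    n_words = 4 ** k
    table = {}
    for word in set(p_counts) | set(b_counts):
        p_freq = (p_counts[word] + pseudocount) / (p_total + pseudocount * n_words)
        b_freq = (b_counts[word] + pseudocount) / (b_total + pseudocount * n_words)
        table[word] = math.log2(p_freq / b_freq)
    return table


def score_window(window, table, k=3):
    """Sum of word log-ratios; words never seen in training contribute 0."""
    return sum(table.get(window[i:i + k], 0.0) for i in range(len(window) - k + 1))


# Toy training sets (made up): GC-rich "promoters", AT-rich "non-promoters".
promoters = ["GCGCGCGGCGCG", "CGCGGGCGCGCC"]
background = ["ATATTTAAATAT", "TTATAAATTTAA"]
table = log_ratio_table(promoters, background)
print(score_window("GCGCGC", table), score_window("ATATAT", table))
```

Real programs use longer words (pentamers, hexamers) and much larger training sets, and some add separate classes for coding, intron and UTR sequence.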

Phylogenetic Footprinting

•  Terminology borrowed from DNA footprinting, a common method to determine binding sites.

•  Comparison of two or more species to find conserved transcription factor binding sites.

•  Ortholog vs paralog

Phylogenetic Footprinting

•  Based on the theory that promoter elements, like genes, should be conserved between species

•  Alignment of two or more genome sequences (both local and global) to define areas of conserved sequence

•  Further analysis has to be performed to find the promoter related aspects of the comparison

Choosing Organisms

•  Not too close - otherwise too much is conserved

•  Not too far - otherwise not enough is conserved

•  Recent work has shown that between mouse and human there are major differences in functional regulatory sites

What steps are needed?

•  Choosing the sequences to align

•  Alignment of the sequences

•  Finding the binding sites

•  The second and third steps can be done in either order, depending on the algorithm

Sequence Alignment

•  Sequence alignment can be local or global.

•  Some programs require pre-alignment done by the user. Others include the alignment as part of the package.

•  Some programs require masking.

Post-chip

•  Based on the assumption that genes that have similar expression profiles have similar regulation profiles (co-regulation)

•  This assumption is not always true (for example, positional effects)

Post-Chip

•  Two major types of search:

– Search for known binding sites

– Search for new binding sites

•  Some programs do one, some the other, and some both

•  Some programs use the expression profiles, others don’t

There are different types of programs…

•  Packages that do everything

•  Programs that look for known transcription factor binding sites in given percentages of sequences

•  Programs that look for overrepresented binding sites

•  Programs for motif discovery

The biological thinking has to be done by YOU

Databases

EPD (Eukaryotic Promoter Database)

•  Experimentally mapped TSS

•  Surrounding region

•  Curated

TRED

•  Known promoters + predicted promoters

•  Human, mouse, rat

•  Curated

TFD

•  First Public Database

•  Original database last updated in 1993

•  New version available: ooTFD

•  Lacks interface to use the files (original version worked with GCG, new one can be used with SQL query language)

TRRD

•  Hierarchical database with many levels of regulation, from pathways to individual binding sites

•  For most applications, commercial

TransFac

•  Up-to-date database of transcription factor binding sites, consensus sequences, and weight matrices

•  Has programs for search and analysis on site

•  More and more of the site is commercial

MatBase

•  Up-to-date database of transcription factor binding weight matrices

•  Has programs for search and analysis on site

•  Commercial

JASPAR

•  Transcription factor binding sites, modeled as matrices

•  Open source!

•  Experimental evidence (for the core matrices), includes SELEX

•  Also has phylogenetically extracted elements

•  Curated, but small

dbTSS

•  Transcription start sites taken from sequence (experimental!) data (TSS-seq in various cell lines and conditions)

•  Sources of information: TSS-Seq, mRNA from 5’ enriched libraries, ESTs, or CAGE tags

•  Best for human and mouse, also has info for some other species

Promoter Prediction Programs

NNPP

•  Neural Network Promoter Prediction

•  Trained on TATA and Inr, allowing variable lengths between them

•  Output: Predicted TSS

SignalScan

•  Specific, consensus and matrix searches

•  Based on TFD and TransFac

•  Web, PC and Unix versions available

Audic/Claverie

•  Markov Models based on comparison of EPD sequences to sequences flanking them

•  For a given window, a Bayesian choice is made whether it is promoter or non-promoter

Dragon Promoter Finder

•  Uses ANN (artificial neural networks)

•  First step: Takes a sliding window and determines whether it is CG rich or poor

•  Second step: Puts through three filters (promoter, exon, intron) using pentamer PSSM, gives a score.

•  Third step: ANN

•  Output: prediction of TSS, directional

Promoter2.0

•  Neural Network-Genetic Algorithm

•  Based on conserved sequences and conserved distances between them

•  Discriminates between promoter and non-promoter sequences

•  Output: Predicted TSS

PromFind

•  Intended for finding promoters when gene location is approximately known, strand known

•  Based on differences in hexamer frequencies between promoter, protein coding regions, and noncoding regions downstream of the first exon

•  Finds all possible sites, takes site with highest discrimination between promoter and non-coding as the promoter

•  Output: Only one prediction

PromoterScan

•  Looks at both TATA box weight matrix and density of transcription factor binding sites

•  Compares to promoter recognition profile derived from comparison of promoter to non-promoter Primate sequences (takes the sites from TFD, checks frequency in EPD)

•  Output: Either TSS or 250 bp window representing the core promoter

PromoterScan

•  Latest version includes option to do further analyses:

– Compare to EPD to find similar promoter

– Provide a list of binding sites common to the predicted promoter and promoters in EPD

TSSW/TSSG/TSSP

•  Linear discriminant function based on:

–  TATA box score

–  Triplet preferences around TSS

–  Hexamer frequencies in consecutive upstream 100-bp regions

–  Transcription Factor binding sites

•  W - TFD, G - TransFac, P - Plant

•  Output: promoter predictions, list of transcriptional elements

CorePromoter

•  Quadratic discriminant analysis

– Pentamers in 30 bp windows and 45 bp windows in a 240 bp region

•  Output: Predicted TSS

Tess

•  String Search, Filtered String Search and Weight matrices

•  Based on Transfac

•  Allows mismatches

•  Various cutoffs available

•  Filtering is important (by organism, cell type…)

•  For matrix searches, 3 classes (vertebrate, non-vertebrate, and fungi)

TFsearch

•  Weight matrices

Match/Patch

•  What’s left of what was once available with Transfac

•  Uses weight matrices

•  Has some tissue specific profiles

Genomatix

•  MatInspector

– Core and matrix cutoffs

– Organism classes

– Uses MatBase matrices

•  FastM - builds a module

– Number of elements

– Transcription factor binding sites

– Distance between them

PromoterInspector

•  Focuses on the genetic context of promoters, not their exact location

•  Utilizes an unsupervised learning approach

•  Compares word (IUPAC groups) frequencies between promoters, exons, introns and 3’UTR

•  Uses sliding windows which are classified as above; a region is assigned when a number of consecutive windows have the same classification

•  Always predicts on both strands - context!

PromFD

•  Residue composition

•  Find exact promoter (doesn’t specify strands)

•  Algorithm:

–  Words of different lengths (5-10 bp) over-represented in promoter vs non-promoter

–  Search against weight matrices, again comparing frequency in promoter vs non-promoter

–  Results input into PromFD database

–  Input sequences are searched against this database

ConPro (Consensus Promoter)

•  Align mRNA or EST with genomic sequence

•  Use Genscan to predict missing 5’ region (70kb)

•  Region upstream of 5’ end chosen (1.5 kb)

•  Run TSSG, TSSW, Proscan, PromFD, NNPP

•  Results compared to create consensus

GeneSpring

•  Can ask the program to take x number of bases upstream and look for a particular binding site

Grail

•  Can’t separate the promoter prediction module, part of gene prediction

•  Needs coding region (without, can’t find anything)

•  Scores for TATA (must have), GC, CAAT, cap site, translation start

•  Uses matrices and Neural network

Phylogenetic footprinting programs

Local Alignment Programs

•  Blast-z (Pipmaker, zpicture)

•  Blat

•  Dialign

Global Alignment Programs

•  AVID (mVista)

– Seed: exact match

•  Lagan

– Seed: doesn’t require exact match

Consite

•  Uses DPB global alignment

•  Local scanning for conserved segments

•  Scan conserved segments for conserved binding sites (Ann-spec, Jaspar)

•  Can also input existing alignment, or profile of choice

Conreal

•  Starts by searching for transcription factor binding sites

•  Pairwise comparison of hits and flanking regions

•  Sorts list of hits by homology

•  Anchors hits on sequence, throwing out overlaps to existing hits (assumes order)

PromH

•  Based on TSSW

•  Added to the linear discriminant function: measure of conservation in several points around predicted TSS

•  Better for TATA-containing promoters than for TATA-less ones

Trafac

•  Blast-z (pipmaker)

•  Match or MatInspector

•  Can search database of existing alignments

•  If you register, you can enter your own sequence (both phylogenetic and coregulated)

Footprinter

•  Uses dynamic programming

•  Requires both sequences and phylogenetic tree relating them

•  Identifies motifs that mutate at a slower rate than the sequence surrounding them (finds the motifs)

•  Works best with large groups of sequence

rVISTA

•  Based on phylogenetic footprinting

•  Input: a global multiple alignment of two or more sequences (mVISTA or mAVID)

•  Output: a viewer where the user can visualize the predicted transcription factor binding sites on the background of sequence conservation

rVISTA

1)  Potential transcription factor binding sites are predicted based on TRANSFAC for both human and mouse sequences independently. Only the hits where core positions of the human and mouse potential binding sites correspond are called aligned hits. A qualifying aligned hit is allowed a maximum core shift of 6 basepairs (bp), and only one gap of any length inside it.

rVISTA

2) Human-mouse sequence conservation of a DNA region spanning a transcription factor binding site is assessed using a strategy that identifies the maximal percent identity for the DNA fragment surrounding the core of a binding site by allowing a dynamic shift. Only predicted binding sites located in the sequence fragments conserved at the level of over 80% over 24 bp window were selected. These are conserved hits.
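The conservation threshold in step 2 can be illustrated with a fixed sliding window over a pairwise alignment. This is a simplified sketch and not the rVISTA implementation: it does not model the dynamic shift around the site core, it counts gapped columns as mismatches (an assumption), and the function name is made up.

```python
def conserved_windows(aligned_a, aligned_b, window=24, min_identity=0.80):
    """Return 0-based start columns of alignment windows whose identity
    exceeds `min_identity`. Inputs are two rows of a pairwise alignment,
    with '-' marking gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # 1 for an identical, non-gap column, 0 otherwise.
    matches = [int(a == b and a != "-") for a, b in zip(aligned_a, aligned_b)]
    hits = []
    running = sum(matches[:window])
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide the window by one column: add the new column, drop the old one.
            running += matches[start + window - 1] - matches[start - 1]
        if running / window > min_identity:
            hits.append(start)
    return hits


human = "ACGT" * 8
mouse = "ACGT" * 8
print(conserved_windows(human, mouse))       # every window is 100% identical
print(conserved_windows(human, "TGCA" * 8))  # no window passes
```

A predicted binding site would then be kept as a conserved hit only if it falls inside one of the passing windows.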

Promoters in groups of genes

WebMOTIFs

•  Runs multiple de novo motif discovery programs

– Puts all the results together

•  Runs Bayesian motif discovery program

– Has models for transcription factor families

PROMO (multi-search site)

•  Based on Transfac Matrices

•  Can be limited by species

•  Derives the Matrices from scratch

•  Can find hits in all or part of a search set

OTFBS

•  Looks for over represented binding sites in a group of sequences

•  Uses Transfac matrices

•  Uses MatInspector to find candidate binding sites

Toucan

•  Suite of programs:

•  Automated upstream retrieval

– Ensembl based

– Can also use user sequence

•  Find known transcription factor binding sites

– Runs Motif Scanner (Markov Models)

Toucan

•  Identify putative regulatory regions

– Runs AVID/VISTA

•  Find new sites

– Runs MotifSampler (Gibbs based)

•  Find overrepresented sites

– binomial distribution model

•  Finds Modules (using ModuleSearcher)

– Two methods: A*, Genetic Algorithm
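A binomial model for over-representation can be sketched as follows: if a site is found in a fraction p of background sequences, how surprising is it to find it in k out of n sequences of the gene set? This is a generic illustration of the statistic, not Toucan's code; the function name and the example numbers are made up.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing the site in at
    least k of n sequences if each carries it independently with probability p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: the site occurs in 10% of background promoters
# and in 8 of the 20 promoters in a co-regulated set.
print(binomial_tail(8, 20, 0.10))
```

A small tail probability suggests the site is over-represented in the set. When many matrices are tested at once, the values also need a multiple-testing correction.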

Amadeus

•  Does de novo binding site finding

•  Can work on precomputed groups, or can work on raw microarray data

•  Can compare to known binding sites

Prima

•  Part of Expander

•  Based on Transfac Matrices

•  Looks for over represented Matrices in a group of coregulated sequences

•  Compares to a background model

Reduce

•  For Yeast and Drosophila (maybe good for others as well)

•  The expression level of a gene is modeled as a sum of contributions of all binding sites in the promoter region

•  Identifies motifs, and does regression analysis to infer the activity.
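The regression step can be illustrated for a single motif: expression is modeled as a baseline plus a per-site contribution times the number of sites in the promoter, and the contribution is fitted by least squares. This is a toy sketch with invented data, not the Reduce implementation; the function name and the one-motif simplification are assumptions.

```python
def fit_motif_activity(site_counts, expression):
    """Ordinary least squares for expression = baseline + activity * site_count.
    Returns (baseline, activity)."""
    n = len(site_counts)
    if n != len(expression) or n < 2:
        raise ValueError("need two equally long lists with at least 2 genes")
    mean_c = sum(site_counts) / n
    mean_e = sum(expression) / n
    sxx = sum((c - mean_c) ** 2 for c in site_counts)
    if sxx == 0:
        raise ValueError("all genes have the same number of sites")
    sxy = sum((c - mean_c) * (e - mean_e) for c, e in zip(site_counts, expression))
    activity = sxy / sxx
    baseline = mean_e - activity * mean_c
    return baseline, activity


# Invented data: number of motif occurrences per promoter and log expression ratio.
counts = [0, 1, 2, 3, 0, 1]
log_ratios = [0.1 + 0.5 * c for c in counts]
print(fit_motif_activity(counts, log_ratios))
```

A positive fitted activity means genes with more copies of the motif tend to be more highly expressed in that experiment. With several motifs the same idea becomes a multiple regression.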

R-Motif

•  Utilizes Expression Coherence

•  Motif Characterization

•  Motif Refinement (extension and mutation)

•  De Novo Motif finding

•  Currently only with Yeast Data

Frameworker

•  Part of Genomatix

•  Uses MatBase matrices

•  Looks for binding sites in similar order in a given percentage of your dataset