Hal.05



RECOMB 2005, Poster Session A, Bay 43

Integrated Design Flow for Universal DNA Tag Arrays

N. Hundewale (1), I. Mandoiu (2), L. Perelygina (3), C. Prajescu (2), and A. Zelikovsky (1)

(1) CS Department, GSU; (2) CSE Department, UCONN; (3) Department of Biology, GSU

Abstract

• DNA microarrays provide a tool for answering a wide variety of questions about the dynamics of cells
– In which cell tissues and under what environmental conditions is each gene active?
– How does the activity level of a gene change with cell cycle stage, environmental conditions, disease, etc.?
– Which genes seem to be regulated together?

• Universal tag array (UTA) technology
– Provides unprecedented assay customization flexibility while maintaining a high degree of multiplexing and low unit cost

• In this poster we describe an integrated design flow for genomic assays based on UTAs
– We use the proposed flow to design UTA-based assays for measuring Herpes B viral gene expression in cells derived from macaque and human hosts
– After defining a “B virus molecular signature”, the assay can provide a sensitive tool for early B virus infection diagnosis and differentiation between B herpes and the closely related herpes simplex viruses

Universal DNA Tag Arrays

• “Programmable” array format [Brenner 97, Morris et al. 98]
– Array consists of application-independent oligonucleotides called tags
– Two-part reporter probes: application-specific primers ligated to antitags
– Detection is carried out by a sequence of reactions separately involving the primer and the antitag parts of the reporter probes

• Tag/antitag hybridization constraints
(H1) Antitags hybridize strongly to complementary tags
(H2) No antitag hybridizes to a non-complementary tag
(H3) Antitags do not cross-hybridize to each other

Generic UTA-Based Assay

[Figure: assay steps, shown for tags t1 and t2. Mix reporter probes with genomic DNA; solution phase hybridization; solid phase hybridization; single-base extension.]

Design Flow

[Figure: design flow diagram. Box labels: Genomic IDs; Bioperl; Sequences in FASTA format; GeneMark/ORF Finder; ORFs in FASTA format; Promide; Probe pools; PerTags; Tag/antitag sequences; Assay parameters; PrimerDel+; Reporter probes; Hybridization Experiment and Analysis]

Tag Set Design

Conservative formalization of (H1)-(H3) based on nucleation complex theory and the 2-4 rule:

Find: a maximum cardinality set of tags such that no tag/tag or tag/antitag pair shares a substring of weight c

Where: weight(A) = weight(T) = 1, weight(C) = weight(G) = 2, and c is a given hybridization stringency constant

Cycle Packing Algorithm [Mandoiu&Trinca 05]

• T ← {}
1. For each cycle C in the c-token factor graph G, in increasing order of cycle length, do
– If C has no c-tokens in common with T, then add the tag defined by C to T and remove C from G
2. Return T
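The weight-based constraint above can be checked mechanically. The following is a minimal sketch, not the authors' software: it reads "a substring of weight c" as "a substring of weight at least c", takes the antitag to be the reverse complement of the tag, and only compares distinct tags (a tag is not tested against its own antitag). All three choices, and the function names, are assumptions made for illustration.

```python
# Sketch of the weight-based tag set constraint (assumptions noted above).
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def weight(seq):
    """Total weight of a DNA string: A and T count 1, C and G count 2."""
    return sum(WEIGHT[base] for base in seq)


def antitag(tag):
    """Reverse complement of a tag (assumed definition of the antitag)."""
    return tag.translate(COMPLEMENT)[::-1]


def heavy_substrings(seq, c):
    """For each start position, the shortest substring of weight >= c.

    Two strings share some substring of weight >= c exactly when they
    share one of these shortest substrings, so comparing these sets
    is enough.
    """
    found = set()
    for i in range(len(seq)):
        total = 0
        for j in range(i, len(seq)):
            total += WEIGHT[seq[j]]
            if total >= c:
                found.add(seq[i:j + 1])
                break
    return found


def is_valid_tag_set(tags, c):
    """True if no tag/tag or tag/antitag pair of distinct tags shares
    a substring of weight >= c."""
    tag_subs = [heavy_substrings(t, c) for t in tags]
    anti_subs = [heavy_substrings(antitag(t), c) for t in tags]
    for i in range(len(tags)):
        for j in range(i + 1, len(tags)):
            if tag_subs[i] & tag_subs[j]:
                return False
            if tag_subs[i] & anti_subs[j] or tag_subs[j] & anti_subs[i]:
                return False
    return True
```

With c = 4, the pair "ACGTAC" and "TTACGT" is rejected because both contain "TAC" (weight 4), while "AAAAAA" and "CCCC" pass. The weights are in the same 1:2 ratio as the 2-4 rule's per-base contributions for A/T and C/G.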

Tag Assignment

Primer-to-tag hybridization constraints: if primer p hybridizes with tag t, then either p or t must be left unassigned, unless p is assigned to t

[Figure: primers p, p’ and tags t, t’]

Maximum Assignable Primer Set Problem: given primer set P and tag set T, find a maximum size assignable subset of P

• Greedy primer deletion heuristic [Ben-Dor 04]
– Repeatedly delete a primer of maximum weight until P becomes assignable, where
– Weight of p is the sum of the potentials of the tags to which it hybridizes
– Potential of a tag hybridizing with k primers is 2^(-k)

• PrimerDel+ [Mandoiu et al. 05]
– Modified primer deletion heuristic (exploiting the availability of several primer candidates with equivalent functionality)
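The greedy primer deletion heuristic described above can be sketched as follows. This is an illustrative reading, not the PrimerDel+ implementation: the `hybridizes` mapping (primer name to the set of tags it hybridizes with), the function names, and the assignability test are assumptions. The test follows from the stated constraint: a tag can be used only if no remaining primer other than its own hybridizes with it, so each primer needs either a tag that it alone hybridizes with or a tag that no remaining primer hybridizes with.

```python
# Sketch of greedy primer deletion (hypothetical names and data layout).

def tag_degrees(primers, tags, hybridizes):
    """Number of remaining primers hybridizing with each tag."""
    degree = {t: 0 for t in tags}
    for p in primers:
        for t in hybridizes.get(p, ()):
            if t in degree:
                degree[t] += 1
    return degree


def is_assignable(primers, tags, hybridizes):
    """True if every primer can get its own tag without violating the
    primer-to-tag hybridization constraint."""
    degree = tag_degrees(primers, tags, hybridizes)
    free_tags = sum(1 for t in tags if degree[t] == 0)
    needs_free = 0
    for p in primers:
        # A tag of degree 1 that p hybridizes with is private to p.
        has_private = any(degree.get(t) == 1 for t in hybridizes.get(p, ()))
        if not has_private:
            needs_free += 1
    return needs_free <= free_tags


def greedy_primer_deletion(primers, tags, hybridizes):
    """Repeatedly delete a maximum-weight primer until the rest is assignable.

    Weight of a primer = sum of potentials of the tags it hybridizes with;
    potential of a tag hybridizing with k primers = 2 ** -k.
    """
    remaining = list(primers)
    while not is_assignable(remaining, tags, hybridizes):
        degree = tag_degrees(remaining, tags, hybridizes)

        def primer_weight(p):
            return sum(2.0 ** -degree[t]
                       for t in hybridizes.get(p, ()) if t in degree)

        remaining.remove(max(remaining, key=primer_weight))
    return remaining
```

For example, with tags t1 and t2, and primers p1 (hybridizes with t1 and t2), p2 (hybridizes with t1) and p3 (hybridizes with neither), the weights are 0.75, 0.25 and 0, so p1 is deleted and the remaining set {p2, p3} is assignable (p2 to t1, p3 to t2).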

Experimental Results

Tm   # pools   Pool size   500 tags             1000 tags            2000 tags
                           % Util.  # arrays    % Util.  # arrays    % Util.  # arrays
70   1522      1           96.73    4           98.90    2           76.10    1
70   1522      5           97.80    4           99.80    2           76.10    1
67   1560      1           96.53    4           98.70    2           78.00    1
67   1560      5           98.00    4           99.90    2           78.00    1
60   1446      1           94.06    4           97.20    2           72.30    1
60   1446      5           96.13    4           100.00   2           72.30    1

Tm   # pools   Pool size   500 tags             1000 tags            2000 tags
                           % Util.  # arrays    % Util.  # arrays    % Util.  # arrays
70   1522      1           88.46    4           73.65    3           65.40    2
70   1522      5           92.26    4           91.10    2           70.30    2
67   1560      1           86.33    4           69.70    3           61.15    2
67   1560      5           91.86    4           76.00    3           67.20    2
60   1446      1           82.26    4           70.95    3           57.05    2
60   1446      5           88.26    4           65.35    3           63.55    2

GenFlex Tags

Periodic Tags
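A quick arithmetic check on the results: for the rows that report a single array of 2000 tags, the % Util. figure equals the number of pools divided by the number of tags. This reading of "% Util." is inferred from the numbers and is not stated on the poster. Rows that need several arrays evidently use a different averaging, which is not modeled here. The function name is hypothetical.

```python
# Check of the single-array rows: % Util. = 100 * (# pools) / (# tags).
# This interpretation is an assumption inferred from the reported numbers.

def utilization_percent(n_pools, n_tags):
    """Percentage of tags used when all pools fit on one array."""
    return round(100.0 * n_pools / n_tags, 2)


# (number of pools, reported % Util. for one array of 2000 tags)
single_array_rows = [(1522, 76.10), (1560, 78.00), (1446, 72.30)]

for n_pools, reported in single_array_rows:
    computed = utilization_percent(n_pools, 2000)
    print(n_pools, computed, abs(computed - reported) < 0.005)
```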

Conclusions

• We have described a suite of software tools for designing genomic assays based on UTAs
– Integrating design flow optimization steps yields higher multiplexing rates and leads to reduced assay costs

• In future work we will make the entire software suite available as an online web server

References

• Affymetrix, Inc. GenFlex tag array probe set, available at the NetAffx™ Analysis Center, http://www.affymetrix.com/analysis/
• M. Atlas, N. Hundewale, L. Perelygina, and A. Zelikovsky. Proc. International Conf. of the IEEE Engineering in Medicine and Biology (EMBC), pp. 172-175, 2004.
• A. Ben-Dor, T. Hartman, B. Schwikowski, R. Sharan, and Z. Yakhini. Towards optimally multiplexed applications of universal DNA tag systems. Proc. 7th Annual International Conference on Research in Computational Molecular Biology (RECOMB), pp. 48-56, 2003.
• S. Brenner. Methods for sorting polynucleotides using oligonucleotide tags. US Patent 5,604,097, 1997.
• I.I. Mandoiu and D. Trinca. Exact and approximation algorithms for DNA tag set design. Proc. 16th Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 383-393, 2005.
• I.I. Mandoiu, C. Prajescu, and D. Trinca. Improved tag set design and multiplexing algorithms for universal arrays. Proc. 5th Int. Conf. on Computational Science (ICCS 2005), Part II, pp. 994-1002, 2005.
• M. Borodovsky. GeneMark, http://opal.biology.gatech.edu/GeneMark
• ORF Finder, http://www.ncbi.nih.gov/gorf/gorf.html
• S. Rahmann. Rapid large-scale oligonucleotide selection for microarrays. Proc. IEEE Computer Society Bioinformatics Conference (CSB), 2002.

ORF and Primer Selection

• Open reading frames (ORFs)
– ORFs are regions of genetic material beginning with a start codon and ending with a stop codon that might code for a protein
– ORFs can be extracted from the genome's sequence or ID using ORF Finder. A second approach is to use the GeneMark family of statistical gene prediction programs [Borodovsky]

• Primer selection
– Constraints:
  - Homogeneity: Each primer must hybridize to its target site at the temperature selected for the experiment
  - Sensitivity: Primers must avoid self-hybridization and must not form secondary structures
  - Specificity: Each primer must hybridize to one particular ORF
– Selection tools:
  - Primer and microarray probe selection are well studied; we use the Promide tool [Rahmann 03] for selecting pools of primer candidates meeting the above constraints for each ORF
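The ORF definition above (a start codon followed in frame by a stop codon) can be illustrated with a small scanner. This is a minimal sketch and not a stand-in for ORF Finder or GeneMark: it scans only the forward strand, assumes ATG as the only start codon and TAA/TAG/TGA as stop codons, and the `min_codons` threshold is a hypothetical parameter.

```python
# Minimal forward-strand ORF scanner (illustrative assumptions noted above).
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, orf_sequence) tuples for each reading frame.

    start is the 0-based index of the start codon, end is the index just
    past the stop codon, and lengths are counted in codons including both
    the start and the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # look for the next start codon in this frame
    return orfs
```

For instance, `find_orfs("CCATGAAATAGGG")` finds a single ORF, `ATGAAATAG`, starting at index 2. A production tool would also scan the reverse complement strand and apply a much larger minimum length.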