Real-Time Primer Design for DNA Chips
Annie Hui
CMSC 838 Presentation
CMSC 838T – Presentation
Use of primers in PCR and Microarrays
PCR (polymerase chain reaction): to amplify a particular DNA fragment
  Use: to test for the presence of nucleotide sequences
Test of PCR products:
  Ladder: a mixture of fragments of known length
  Lane 1: the PCR fragment is ~1850 bases long.
  Lanes 2 and 4: the fragments are ~800 bases long.
  Lane 3: no product is formed, so the PCR failed.
  Lane 5: multiple bands are formed because one of the primers fits at different places.
Use of primers in PCR and Microarrays
DNA chips (microarrays): to analyse a large number of genes in parallel.
Primers:
  20 to 100 bases long
  Synthetically manufactured
Automated design of primers:
  A computational approach
  Objective: to find primers that bind well without self-hybridizing
  Critique: how accurate?
[Figure labels: fixed on chip; fluorescence; bound to primer]
Motivation:
This group uses the automated NucliSens extraction system (bioMerieux) to develop their primers here.
Technique: The computational model
1. Select primers from the target sequence: two primers P (forward) and Q (reverse) for PCR, or one primer for a DNA chip (microarray).
Using window size W, the number of possible primers with length between m and n within one window is:
$S = \sum_{l=m}^{n} (W - l + 1)$
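The count can be checked by brute force. The following is a minimal Python sketch, assuming m <= n <= W; the function names are illustrative and do not come from the paper.

```python
def count_candidates(W, m, n):
    """Closed-form count: sum over lengths l = m..n of (W - l + 1) start positions."""
    return sum(W - l + 1 for l in range(m, n + 1))


def enumerate_candidates(window, m, n):
    """List every substring of the window whose length lies between m and n."""
    return [window[i:i + l]
            for l in range(m, n + 1)
            for i in range(len(window) - l + 1)]


window = "AGCGATATA"  # 9-base window, reused from the sliding-window example in this deck
print(count_candidates(len(window), 6, 8))        # 4 + 3 + 2 = 9
print(len(enumerate_candidates(window, 6, 8)))    # same count by enumeration
```

Both calls agree, which is all the formula claims: each length l contributes W - l + 1 start positions.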
Technique: The computational model
2. For each primer pair, or single primer, quantify 4 hybridization conditions:
  a. Primer length
  b. Melting temperature
  c. GC content
  d. Secondary structure
    i. Self annealing
    ii. Self end annealing
    iii. Pair annealing
    iv. Pair end annealing
We are starting here
Technique: quantifying hybridization conditions
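a. Primer length len(P)
  Affects melting temperature and hybridization
b. Melting temperature Tm(P)
  Temperature at which the bonds between primer and gene sequence break
  $T_m(p) = \frac{\Delta H(p)}{\Delta S(p) + R \ln(\gamma / 4)} + T_0 + t$
  where
    $\gamma = 50 \times 10^{-9}$: molar concentration of the primer
    $R = 1.987\ \mathrm{cal}/({}^{\circ}\mathrm{C} \cdot \mathrm{mol})$
    $T_0 = -273.15\ {}^{\circ}\mathrm{C}$
    $t = -21.6\ {}^{\circ}\mathrm{C}$
    $\Delta H(p) = \sum_{i=1}^{n-1} \Delta H(p_i, p_{i+1})$: enthalpy
    $\Delta S(p) = \sum_{i=1}^{n-1} \Delta S(p_i, p_{i+1})$: entropy
  What is this measure good for?
c. GC content GC(P)
  G-C pairs are more stable than A-T pairs (because of more H-bonds)
  $GC(p) = \frac{\#G \text{ in } p + \#C \text{ in } p}{|p|} \times 100$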
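The GC content and melting temperature measures can be sketched in Python as follows. The constants are the ones listed on the slide. The nearest-neighbour enthalpy and entropy tables are not given in the deck, so the sketch takes them as caller-supplied dictionaries (`dH`, `dS`, keyed by dinucleotide); those parameter names and the toy values in the usage lines are placeholders for illustration, not real thermodynamic data.

```python
import math

R = 1.987        # cal/(°C·mol), gas constant as given on the slide
GAMMA = 50e-9    # molar primer concentration as given on the slide
T0 = -273.15     # °C, converts the Kelvin result to Celsius
T_CORR = -21.6   # °C, empirical correction t from the slide


def gc_content(p):
    """Percentage of bases in p that are G or C."""
    return 100.0 * sum(base in "GC" for base in p) / len(p)


def nn_sum(p, table):
    """Sum a nearest-neighbour table over adjacent pairs (p_i, p_{i+1})."""
    return sum(table[p[i:i + 2]] for i in range(len(p) - 1))


def melting_temp(p, dH, dS):
    """Tm(p) in °C; dH in cal/mol and dS in cal/(°C·mol), both supplied by the caller."""
    H = nn_sum(p, dH)
    S = nn_sum(p, dS)
    return H / (S + R * math.log(GAMMA / 4)) + T0 + T_CORR


print(gc_content("GCGATA"))  # 50.0

# Placeholder tables with made-up numbers, only to show the call shape.
toy_dH = {"AA": -8000.0}
toy_dS = {"AA": -20.0}
print(melting_temp("AAA", toy_dH, toy_dS))
```

A real run would need a published nearest-neighbour parameter set in place of the toy tables.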
Technique: quantifying hybridization conditions
d. Secondary structure
  Study how likely a primer is to entangle with itself or with another primer
P = {p1, p2, …, pn}, Q = {q1, q2, …, qm}
Scoring function:
  s(pi, qj) = 2 if {pi, qj} = {A, T}
            = 4 if {pi, qj} = {C, G}
            = 0 otherwise
Example (q1 aligned with position i of primer P):
P: ...AGCTTTAGCCATAG
Q:        TCTTAGGATCGC...
Score at position i = 2+4+2+2+4 = 14
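The example score can be reproduced with a few lines of Python. This is a sketch under the assumption that q1 sits under the fifth shown base of P (the only alignment of the two shown fragments that yields 2+4+2+2+4); `overlap_score` is an illustrative name, and only the bases visible on the slide are scored.

```python
def s(a, b):
    """Pair score from the slide: 2 for an A/T pair, 4 for a C/G pair, else 0."""
    pair = {a, b}
    if pair == {"A", "T"}:
        return 2
    if pair == {"C", "G"}:
        return 4
    return 0


def overlap_score(p, q, i):
    """Score with q[0] aligned under p[i] (0-based); overhanging bases score 0."""
    return sum(s(a, b) for a, b in zip(p[i:], q))


P = "AGCTTTAGCCATAG"
Q = "TCTTAGGATCGC"
print(overlap_score(P, Q, 4))  # 14
```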
Technique: quantifying hybridization conditions
Four measures of secondary structure:
i. Self annealing, SA(P, P’)
  • P’ = reverse of P
  • [Figure: P’ slid along P, one row per shift k]
  $SA(p, p') = \max_{k = -(m-1), \ldots, m-1} \sum_{i=1}^{m} s(p_i, p'_{i+k})$ (terms whose index i+k falls outside 1..m are taken as 0)
ii. Self end annealing, SEA(P, P’)
  • Like self annealing
  • k >= 0
  • Only count longest continuous overlaps
  • [Figure: P’ slid along P for k >= 0 only]
iii. Pair annealing, PA(P, Q)
  • P and Q are the forward and reverse primers
iv. Pair end annealing, PEA(P, Q)
  • Similar to self end annealing
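A direct, sequential Python sketch of the annealing measures follows. `annealing` implements the max-over-shifts sum used for SA and PA. `end_annealing` is only one possible reading of "k >= 0, only count longest continuous overlaps" (best contiguous run of matching pairs at each non-negative shift); the slides do not spell out the exact definition, so treat that function as an assumption.

```python
def s(a, b):
    """Pair score from the slide: 2 for an A/T pair, 4 for a C/G pair, else 0."""
    pair = {a, b}
    if pair == {"A", "T"}:
        return 2
    if pair == {"C", "G"}:
        return 4
    return 0


def annealing(p, q):
    """Max over all shifts k of sum_i s(p[i], q[i+k]); out-of-range terms count 0."""
    best = 0
    for k in range(-(len(p) - 1), len(q)):
        total = sum(s(p[i], q[i + k])
                    for i in range(len(p)) if 0 <= i + k < len(q))
        best = max(best, total)
    return best


def end_annealing(p, q):
    """Assumed reading: shifts k >= 0 only, scoring the best contiguous matching run."""
    best = 0
    for k in range(len(q)):
        run = 0
        for i in range(len(p)):
            if i + k >= len(q):
                break
            score = s(p[i], q[i + k])
            run = run + score if score else 0   # a mismatch breaks the run
            best = max(best, run)
    return best


def self_annealing(p):
    return annealing(p, p[::-1])          # P' is the reverse of P


def self_end_annealing(p):
    return end_annealing(p, p[::-1])


print(self_annealing("ATGAT"), self_end_annealing("ATGAT"))  # 8 4
```

Pair annealing PA(P, Q) and pair end annealing PEA(P, Q) would call `annealing(p, q)` and `end_annealing(p, q)` directly.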
Technique: How to apply the model
For PCR:
  P is the forward primer, Q is the reverse primer
  Ideally: no annealing, and the length, GC content and melting temperature of P equal those of Q
  The optimization is:
  $\min_{(p, q)} l(p, q)$, where $l(p, q) = \lVert SC_{PCR}(p, q) - SC_{PCR,ideal}(p) \rVert_w$
  $SC_{PCR}(p, q) = [len(p), GC(p), T_m(p), SA(p), SEA(p), len(q), GC(q), T_m(q), SA(q), SEA(q), PA(p, q), PEA(p, q)]$
  $SC_{PCR,ideal}(p) = [len_p, GC_p, T_{m,p}, 0, 0, len_p, GC_p, T_{m,p}, 0, 0, 0, 0]$
  $w = (0.5, 1, 1, 0.1, 0.2, 0.5, 1, 1, 0.1, 0.2, 0.1, 0.2)$
For DNA chips (microarrays):
  Q doesn’t exist, so there is no pair annealing to study.
  Only 5 terms are left.
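The selection step can be sketched in Python as below. The weight vector is the one listed on the slide, in the order len, GC, Tm, SA, SEA for P, the same five for Q, then PA and PEA. The sketch assumes the weighted distance is a weighted sum of absolute differences, which is one reading of the slide's weighted norm; the names `distance` and `best_pair` and the target values in the usage lines are hypothetical.

```python
W_PCR = (0.5, 1, 1, 0.1, 0.2,   # len, GC, Tm, SA, SEA of P
         0.5, 1, 1, 0.1, 0.2,   # len, GC, Tm, SA, SEA of Q
         0.1, 0.2)              # PA, PEA


def distance(sc, ideal, w=W_PCR):
    """Weighted sum of absolute deviations from the ideal score vector (assumed form)."""
    return sum(wi * abs(x - x0) for wi, x, x0 in zip(w, sc, ideal))


def best_pair(scored_pairs, ideal):
    """scored_pairs: iterable of ((p, q), score_vector); returns the closest (p, q)."""
    return min(scored_pairs, key=lambda item: distance(item[1], ideal))[0]


# Hypothetical targets: length 20, 50% GC, Tm 60, and no annealing at all.
ideal = (20, 50, 60, 0, 0, 20, 50, 60, 0, 0, 0, 0)
candidates = [
    (("P1", "Q1"), (22, 50, 60, 0, 0, 20, 50, 60, 0, 0, 0, 0)),   # P two bases too long
    (("P2", "Q2"), (20, 50, 60, 0, 0, 20, 50, 60, 0, 0, 40, 0)),  # strong pair annealing
]
print(best_pair(candidates, ideal))
```

For a DNA chip the same function applies with only the first five weights and a five-entry score vector.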
Technique: parallelize SCPCR(p,q) calculation
Calculate Len, GC, Temp, SA and SEA in parallel
Compute PA and PEA in parallel
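Technique: details
Melting temperature and GC content:
  Simple adder + divider
  Use pipelining
  First primer: O(m)
  Each subsequent primer: O(1)
  Example:
    Whole window: AGCGATATA
    i-th P primer: GCGATA
    (i+1)-th P primer: CGATAT
    • CG(Pi+1) = CG(Pi) - 1 (the leading G leaves, a T enters)
    • H(Pi+1) = H(Pi) - H(GC) + H(AT); similar for S
Annealing matrix
  [Figure: annealing matrix pairing bases a, b, c of one primer with bases d, e, f of the other; entries ad, bd, cd, ae, be, ce, af, bf, cf]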
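The O(m)-then-O(1) update can be illustrated in software with a sliding window over the G+C count. This is a sequential Python sketch of the idea only, not the pipelined hardware design; `sliding_gc_counts` is an illustrative name.

```python
def sliding_gc_counts(window, m):
    """G+C count of every length-m primer in the window, O(1) work per shift."""
    count = sum(base in "GC" for base in window[:m])   # first primer costs O(m)
    counts = [count]
    for i in range(1, len(window) - m + 1):
        count -= window[i - 1] in "GC"       # base leaving on the left
        count += window[i + m - 1] in "GC"   # base entering on the right
        counts.append(count)
    return counts


print(sliding_gc_counts("AGCGATATA", 6))  # [3, 3, 2, 1]
```

The second and third entries correspond to the primers GCGATA and CGATAT from the slide, where the count drops by one. The enthalpy and entropy sums can be updated the same way by subtracting the leading pair and adding the trailing pair.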
Complexity
Complexity for sequential algorithm, for PCR:
  Number of choices of P (window size = Wp): $S = \sum_{l_p = m_p}^{n_p} (W_p - l_p + 1)$
  Number of choices of Q (window size = Wq): $T = \sum_{l_q = m_q}^{n_q} (W_q - l_q + 1)$
  Each distance SCPCR(P, Q): $O(l_p^2 + l_q^2 + l_p l_q)$
  Total: $O(S \cdot T \cdot (W_p^2 + W_q^2 + W_p W_q))$
  O(S*S*T*T) is a typo in the paper
Complexity for parallel algorithm, for PCR:
  Distance measure SCPCR(P, Q) = O(1)
  Total: O(S*T)
Similar but simpler for microarray
Evaluation
Experimental environment: 512 primer pairs, |Wp| = |Wq| = 16
  1. 500 MHz Celeron system with integrated hardware accelerator
  2. Software implementation
Evaluation results:
  1920 secs for the software implementation
  3.41 secs using the hardware accelerator
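For a sense of scale, the two timings imply the following speedup. This is simple arithmetic on the reported numbers, not a figure quoted from the slides.

```latex
\[
\text{speedup} = \frac{1920\ \text{s}}{3.41\ \text{s}} \approx 563
\]
```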
Related Work
Previous approach: DOPRIMER
  Same computational model
  Differs in the way of doing dynamic programming
  Sequential in nature
Other primer selection software
  E.g.: Primer Premier 5, Primer3, PrimerGen, PrimerDesign
  Similarities:
    Criteria: length, temperature range, GC range, GC clamp, 3’ end stability, uniqueness of 3’ end base, dimers/hairpins, degeneracy, salt concentration, annealing oligo concentration, etc.
  Differences:
    Not a weighted linear sum of all criteria
    Need much expert supervision; the numerical criteria are used as a guide only
More Related Works
Case study: Burpo did a critical review of PCR primer design algorithms
  Subject: Saccharomyces cerevisiae deletion strains
  Conclusion: no suitable program for the task of post-design PCR analysis, especially in the aspect of accurately predicting non-specific hybridization events that impair PCR amplification.
Observations
My observations:
Minus side:
  Is the computational model too simplistic?
  Specifically, is a weighted linear sum justified?
Plus side:
  The design of the parallel architecture is neat.
  Since primers are about 18-22 bases long, current technology certainly can handle it.
When would you need fast primer selection?
  Primer walking, to connect contigs together quickly
  To scan through a large number of sequences for possible primers