3/23/09 [email protected] Sequencing shRNA libraries with DNA Sudoku Yaniv Erlich Hannon Lab shRNA libraries sequencing using DNA Sudoku


Yaniv Erlich Hannon Lab

shRNA libraries sequencing using DNA Sudoku


Preparing DNA libraries

Programmable microarray → Cloning into plasmids → Transformation → Array single colonies

Introduction Naïve Solutions Chinese Pooling Analysis Results


The problem

Input: 40,000 bacterial colonies

Output: The sequence of the shRNA inserts

Insert type


Motivation

• Filtering the correct fragments

• Balanced representation

• Subset selection.


Clone-by-clone sequencing

Clone-by-clone sequencing: sequence each clone on a capillary platform

Caveat: cost of ~$40,000

Conclusion: use next-generation sequencing


Naïve next-gen

Pooling → Solexa → ??

Conclusion: we need to add a source clone identifier (barcode)


Naive barcoding

Barcoding → Pooling → Solexa

Barcode  Sequence
214      AGTGC..
8106     CTCAA..
30010    TTTCG..
88       TTGAA..

Caveats:

• Order 40,000 barcodes, each ~95 nt long.

• 40,000 PCR reactions.

Conclusion: we need fewer barcodes


Naive Pooling(1)

[Figure: source array with rows A to F and columns 1 to 8; each row and each column is pooled under one barcode]

Case #1:

Genotype  Barcode
ACACA     5
ACACA     B

Which specimen appears in both barcode #5 and barcode #B?

Specimen #13!


Naive Pooling(2)

[Figure: source array with rows A to F and columns 1 to 8; each row and each column is pooled under one barcode]

Case #2:

Genotype  Barcode
ACGTT     1
ACGTT     D
ACGTT     E
ACGTT     2

ACGTT is associated with specimens #25 (D,1) and #34 (E,2)!

Or maybe ACGTT is associated with specimens #26 (D,2) and #33 (E,1)?

Ambiguity

Conclusion: we should deal with shRNA ‘duplicates’
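The two cases can be reproduced with a minimal Python sketch. It assumes row-major specimen numbering on the 6 x 8 array, which matches the slide's examples, for instance (B,5) → #13. The names `specimen_id` and `candidates` are hypothetical helpers, not part of the talk.

```python
from itertools import product

ROWS = "ABCDEF"        # row pools, as on the slide
COLS = range(1, 9)     # column pools 1..8

def specimen_id(row, col):
    """Row-major numbering (assumed): (B,5) -> 13, (D,1) -> 25."""
    return ROWS.index(row) * len(COLS) + col

def candidates(row_pools, col_pools):
    """All specimens consistent with a genotype seen in these row and column pools."""
    return sorted(specimen_id(r, c) for r, c in product(row_pools, col_pools))

case1 = candidates(["B"], [5])            # genotype seen in barcodes 5 and B
case2 = candidates(["D", "E"], [1, 2])    # genotype seen in barcodes 1, 2, D and E
print(case1)
print(case2)
```

Case #1 yields a single candidate, while Case #2 yields four candidates for only two true specimens, which is the ambiguity described above.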


Lessons learned for the desired scheme

Features of the required encoding scheme

• Compactness: using a small set of barcodes.

• Dealing with duplicates: every specimen should be resolved without ambiguity.

• Experimental overhead: while reducing the number of barcodes, we should also pay attention to the resources allocated to the pooling itself.

• Simple: this is not a computer program. Encoding is done by a robot and chemistry, so keep it simple.


Overview of our solution

‘Chinese’ Pooling → Barcoding → PE sequencing → Decoding


The pooling design

Combinatorial pooling using the Chinese Remainder Theorem (CRT).

"I have never done anything 'useful'. No discovery of mine has made, or is likely to make, directly or indirectly, for good or ill, the least difference to the amenity of the world." (G. H. Hardy, A Mathematician's Apology, 1940)


Chinese remainder riddle

“An old woman goes to market and a horse steps on her basket and crushes the eggs. The rider offers to pay for the damages and asks her how many eggs she had brought. She does not remember the exact number, but when she had taken them out 2 at a time, there was one egg left. The same happened when she picked them out 3, and 5 at a time, but when she took them 7 at a time they came out even. What is the smallest number of eggs she could have had?”

Answer: 91 eggs

$n \equiv 1 \pmod{2}$

$n \equiv 1 \pmod{3}$

$n \equiv 1 \pmod{5}$

$n \equiv 0 \pmod{7}$

Chinese Remainder Theorem says:

- There is a one-to-one correspondence between n ($0 \le n < 2 \cdot 3 \cdot 5 \cdot 7$) and the residues.

- There is an easy algorithm to solve the equation system.
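As an illustration of such an algorithm, here is a minimal Python sketch that solves the egg riddle by successive substitution. The function name `crt` is hypothetical, and the three-argument `pow` with exponent -1 (modular inverse) assumes Python 3.8 or later.

```python
def crt(residues, moduli):
    """Smallest non-negative n with n % m == r for every (r, m) pair.

    The moduli must be pairwise coprime.
    """
    n, step = 0, 1
    for r, m in zip(residues, moduli):
        # Pick t so that n + step * t is congruent to r modulo m.
        t = ((r - n) * pow(step, -1, m)) % m
        n += step * t
        step *= m          # n is now fixed modulo the product of moduli seen so far
    return n

eggs = crt([1, 1, 1, 0], [2, 3, 5, 7])
print(eggs)  # 91
```

Each pass fixes n modulo one more modulus without disturbing the earlier congruences, so the loop runs once per equation.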


Pooling construction with modular equations

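$n \equiv 1 \pmod{2}$

$n \equiv 1 \pmod{3}$

$n \equiv 1 \pmod{5}$

$n \equiv 0 \pmod{7}$

Here n is the specimen, each modulus is a pooling window, and each residue is the destination well (different plates).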

One-to-One correspondence…


Example of Chinese pooling

$\text{Specimen} \equiv \text{Pool} \pmod{5}$

$\text{Specimen} \equiv \text{Pool} \pmod{8}$

Source array: [figure]
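A small Python sketch of this example follows. The pooling windows 5 and 8 come from the slide; the number of specimens (`N = 20`) and the zero-based specimen numbering are assumptions made only for illustration, since the source array figure is not preserved.

```python
from itertools import combinations

N = 20               # assumed number of specimens (not given in the transcript)
WINDOWS = (5, 8)     # pooling windows from the slide

# pools[(window, well)] -> specimens sent to that well
pools = {}
for n in range(N):
    for x in WINDOWS:
        pools.setdefault((x, n % x), []).append(n)

def shared_pools(a, b):
    """Number of pooling windows in which specimens a and b land in the same well."""
    return sum(a % x == b % x for x in WINDOWS)

max_shared = max(shared_pools(a, b) for a, b in combinations(range(N), 2))
print(pools[(5, 0)], pools[(8, 0)])
print(len(pools), max_shared)
```

Under these assumptions the 20 specimens are spread over 5 + 8 = 13 pools, and no pair of specimens shares more than one of them.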


Chinese Remainder Pooling Design

Inputs: N (number of specimens in the experiment)

W (weight, i.e. pooling effort)

Algorithm:

1. Find W numbers {x1, x2, …, xW} such that:

(a) Each is bigger than $\sqrt{N}$

(b) They are pairwise coprime

For instance: {5,8,9} but not {5,6,9}

2. Generate W modular equations:

$\text{Specimen} \equiv \text{Pool} \pmod{x_1}$

$\text{Specimen} \equiv \text{Pool} \pmod{x_2}$

…

$\text{Specimen} \equiv \text{Pool} \pmod{x_W}$

3. Construct the pooling design upon the modular equations

Output: Pooling design

Chinese Remainder Theorem asserts:

(1) Two specimens will meet in no more than one pool.

(2) The number of pools: $\sim W\sqrt{N}$

Number of barcodes: $\sim \sqrt{N}$
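The three steps can be sketched in Python as below. The greedy search in `choose_windows` is an assumption chosen for brevity: it returns one valid set of pooling windows, not necessarily the set used in the experiment. Both function names are hypothetical.

```python
from math import gcd, isqrt

def choose_windows(n_specimens, weight):
    """Greedily pick `weight` pairwise-coprime integers, each greater than sqrt(N)."""
    windows = []
    x = isqrt(n_specimens) + 1      # smallest integer strictly above sqrt(N)
    while len(windows) < weight:
        if all(gcd(x, y) == 1 for y in windows):
            windows.append(x)
        x += 1
    return windows

def pooling_design(n_specimens, windows):
    """Map each specimen to its destination well in every window: well = specimen mod x."""
    return {n: [n % x for x in windows] for n in range(n_specimens)}

windows = choose_windows(40_000, 5)
design = pooling_design(40_000, windows)
print(windows)
print("pools:", sum(windows), "barcodes per window at most:", max(windows))
```

The total number of pools is the sum of the windows, which stays close to W times the square root of N, while the largest window sets how many distinct barcodes are needed if barcodes are reused across windows.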


How good is our method?

Features of the required encoding scheme

• Compactness: using a small set of barcodes.

• Dealing with duplicates: every specimen should be resolved without ambiguity.

• Experimental overhead: while reducing the number of barcodes, we should also pay attention to the resources allocated to the pooling itself.

• Simple: this is not a computer program. Encoding is done by a robot and chemistry, so keep it simple.


Barcode reduction

IEEE Transactions on Information Theory (1964)

Proved from purely combinatorial constraints:

the theoretical lower bound on the number of barcodes is $\sqrt{N}$

Our method is very close to the theoretical lower bound


Dealing with duplicates - simulation

40,000 specimens with only 384 barcodes

[Plot: probability of correct decoding (y-axis) versus duplicates size (x-axis); marked level: 0.99]
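To see why extra pooling windows help with duplicates, here is a noise-free toy decoder in Python. It is only a sketch and not the simulation or decoder behind the plot: the 48-specimen array, the windows {7, 8, 9} and the helper names `pool_reads` and `decode` are all assumptions.

```python
def pool_reads(specimens, windows):
    """Wells in which a genotype carried by `specimens` shows up, per window (no noise)."""
    return [{n % x for n in specimens} for x in windows]

def decode(observed, windows, n_specimens):
    """Specimens whose well was observed in every pooling window."""
    return [n for n in range(n_specimens)
            if all(n % x in wells for x, wells in zip(windows, observed))]

N = 48
duplicates = [25, 34]     # one genotype present in two specimens

two = decode(pool_reads(duplicates, [7, 8]), [7, 8], N)
three = decode(pool_reads(duplicates, [7, 8, 9]), [7, 8, 9], N)
print(two)
print(three)
```

With two windows the decoder also reports spurious specimens, the same ambiguity as in naive row and column pooling; the third window removes them in this toy case.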


How good is our method?

Experimental overhead with W=5:

• 5 lanes of Solexa

• One and a half weeks of robotics


Real results…

• Arabidopsis shRNA library with 17,000 shRNA fragments

• Picked 40,320 bacterial colonies

• Sequenced 3,000 colonies with capillary sequencing for comparison.

• Decoded ~20,500 bacterial colonies with correct inserts

• 96% of the assignments were correct.

• ~8,000 unique fragments of the library.


Future directions

• Developing a more advanced decoder using a machine learning approach

• 2-stage algorithm


Acknowledgements

Greg Hannon

Ken Chang

Michelle Rooks

Assaf Gordon

Oron Navon and Roy Ronen