Next-Generation Sequencing applied to ancient DNA

Aurélien Ginolhac & Hákon Jónsson

18th June 2013

Next Generation Sequence Analysis course, CBS, DTU

What is ancient DNA?

DNA extracted from fossils and remains of every nature

Museum samples, bones, coprolites, hairs

Extinct species

Extant species, mainly human

Plague (Y. pestis), Aboriginal Australian, Ötzi

Why is ancient DNA interesting?

Provides access to extinct species

Not two species but sexual dimorphism (Bunce et al. 2003)

Provides access to evolutionary time points in the past

Rasmussen et al. 2011

Why use NGS?

Sanger: 800 mg, Neanderthal, 379 bp mtDNA (HVR-I)

Illumina: 600 mg, Aboriginal Australian, 6.4X nuclear genome (Rasmussen et al. 2011)

Which protocol?

Stoneking and Krause, 2011

What does ancient DNA look like?

[Figure: read length distribution. Ancient equid, unpublished.]

[Figure: nucleotide misincorporations at read ends, with C>T at the 5'-ends and G>A at the 3'-ends of reads (READ) compared with the reference (REF). Ancient equid, unpublished; H. Jónsson et al., mapDamage 2013.]

DNA damage mapDamage Usage

mapDamage2.0

NGS course at DTU

Hákon Jónsson and Aurélien Ginolhac

DNA

[Figure: chemical structure of a double-stranded DNA fragment]

ATGCAAGTATCCGCACCCCTGCATGCAAGTATCCGCACCC
TACGTTCATAGGCGTGGGGACGTACGTTCATAGGCGTGGG

DNA depurination

[Figure: chemical structure of the fragment after loss of a purine base on each strand]

ATGCAAGTATCCGCACCCCTNCATGCAAGTATCCGCACCC
TACGTTCATAGGCGTGGGGACNTACGTTCATAGGCGTGGG

Hydrolysis of the backbone

[Figure: chemical structure of the fragment after the backbone breaks at the abasic sites]

ATGCAAGTATCCGCACCCCTN
TACGTTCATAGGCGTGGGGAC

Cytosine deamination

[Figure: chemical structure of the fragment after a terminal cytosine is deaminated to uracil]

ATGCAAGTATCCGCACCCCTN
TACGTTCATAGGCGTGGGGAU

Sequencing

Ref    AATGTAGCTTACTAATATAAAGCAAGGCACTGAAAATGCC
Read1  ..TGTAGCTTACTAATATAAAGCAAGGCACTGAA......
Read2  ....TAGCTTACTAATATAATGCAAGGCACTGAAAA....
Read3  .....AGCTTACTAATATAAAGCAAGGCACTGAAAATGC.
Read4  .......UTTACTAATATAAAGCAAGGCACTGAAAATGCT
Read5  ........TTACTAATATAAAGCAAGGCACTGAAAATACC
Read6  ...........UTAATATAAAGCAAGGCACTGAAAATGCC
Read7  ............TAATATAAAGCAAGGCACTGAAAATGCC
Read8  ..................AAAGCCAGGCACTGAAAATGCC
Read9  ......................UAAGGCACTGAAAATGCC
Read10 .......................AAGGCACTGAAAATGCC
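The pile-up on this slide can be tallied by distance from the read start, which is the idea behind a misincorporation plot. The sketch below is only an illustration and not mapDamage's implementation: the function name `c_to_t_by_position` is hypothetical, the dots are treated as alignment padding, and U is converted to T on the assumption that a deaminated cytosine is reported as thymine by the sequencer.

```python
# Reference and reads copied from the slide; dots pad each read to its aligned position.
REF = "AATGTAGCTTACTAATATAAAGCAAGGCACTGAAAATGCC"
READS = [
    "..TGTAGCTTACTAATATAAAGCAAGGCACTGAA......",
    "....TAGCTTACTAATATAATGCAAGGCACTGAAAA....",
    ".....AGCTTACTAATATAAAGCAAGGCACTGAAAATGC.",
    ".......UTTACTAATATAAAGCAAGGCACTGAAAATGCT",
    "........TTACTAATATAAAGCAAGGCACTGAAAATACC",
    "...........UTAATATAAAGCAAGGCACTGAAAATGCC",
    "............TAATATAAAGCAAGGCACTGAAAATGCC",
    "..................AAAGCCAGGCACTGAAAATGCC",
    "......................UAAGGCACTGAAAATGCC",
    ".......................AAGGCACTGAAAATGCC",
]


def c_to_t_by_position(ref, reads, max_pos=5):
    """Count reference cytosines, and those read as T, at each of the
    first `max_pos` positions from the 5' end of the reads."""
    c_total = [0] * max_pos
    c_to_t = [0] * max_pos
    for padded in reads:
        start = len(padded) - len(padded.lstrip("."))  # 0-based alignment start
        read = padded.strip(".").replace("U", "T")      # uracil is sequenced as T
        for offset, base in enumerate(read[:max_pos]):
            if ref[start + offset] == "C":
                c_total[offset] += 1
                if base == "T":
                    c_to_t[offset] += 1
    return c_to_t, c_total


c_to_t, c_total = c_to_t_by_position(REF, READS)
for position, (hits, total) in enumerate(zip(c_to_t, c_total), start=1):
    print(position, hits, total)
```

On these ten reads, every reference cytosine that falls on the first read position is read as T, while none of those at positions 2 to 5 are, which is the 5'-end excess of C>T that the plots in this deck display.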

mapDamage plot

[Figure: mapDamage plot. Top panels: frequency (0.0 to 0.5) of A, C, G and T at positions -10 to 10 around the read ends. Bottom panels: frequencies (0.00 to 0.30) along the first 25 positions (1 to 25) and the last 25 positions (-25 to -1) of the reads.]

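mapDamage2.0

[Figure: damage model for a cytosine. Nodes: C start, C to T, single-stranded, double-stranded, T end, C end. Branch probabilities: ν_i and 1 - ν_i, λ_i and 1 - λ_i, δ_s and 1 - δ_s, δ_d and 1 - δ_d. The example fragment carries an overhang and a nick.]

5' UCTAATCTACGGGACC  3'
3'   ATTAGATGCCCTGGT 5'

$$S_{C,k} \sim \mathrm{Mul}\bigl(D_A,\ (1,0,0,0)\cdot\Theta(\mu)\cdot P_{\mathrm{dam}}(\delta_d,\delta_s,\lambda,\nu,k)\bigr)$$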
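To get a feel for how δ_s, δ_d and λ shape the damage profile, here is a minimal sketch under a simplifying assumption that is not taken from the slides: position k (1-based, counted from the 5' end) lies in a single-stranded overhang with probability (1 - λ)^(k - 1), and a cytosine is deaminated with probability δ_s there and δ_d elsewhere. The function name is hypothetical, ν is left out, and the parameter values are purely illustrative; mapDamage2.0's actual model may differ in its details.

```python
def c_to_t_probability(k, delta_s, delta_d, lam):
    """Probability that a cytosine at 1-based position k is read as T,
    assuming (for illustration only) a geometric overhang length."""
    p_overhang = (1.0 - lam) ** (k - 1)  # chance that position k is single-stranded
    return delta_s * p_overhang + delta_d * (1.0 - p_overhang)


# Illustrative values only, not estimates from any data set.
DELTA_S, DELTA_D, LAMBDA = 0.65, 0.02, 0.37

profile = [c_to_t_probability(k, DELTA_S, DELTA_D, LAMBDA) for k in range(1, 11)]
for k, p in enumerate(profile, start=1):
    print(f"{k:2d}  {p:.3f}")
```

Under this assumption the C>T probability starts at δ_s at the first position and decays towards δ_d further into the read.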

Posterior predictive intervals

[Figure: posterior prediction intervals of the substitution rate (0.00 to 0.25) by relative position (1 to 11 and -11 to -1), for the substitution types C->T, G->A and others.]

Posterior distributions for parameters

[Figure: posterior density plots, shown next to the damage model diagram. Panels and x-axis ranges: Theta, 0.0005 to 0.0015; Rho, 0 to 6; DeltaD, 0.015 to 0.030; DeltaS, 0.5 to 0.8; Lambda, 0.30 to 0.45; LogLik, -295 to -285.]

mapDamage -i seq.bam -r ref.fa

You should see something like this

Started with the command: mapDamage -i seq.bam -r ref.fa
Performing Bayesian estimates
Starting grid search, starting from random values
Adjusting the proposal variance iteration 1
..
Adjusting the proposal variance iteration 10
Done burning, starting the iterations
Done with the iterations, finishing up
Writing and plotting to files

Go into the results folder

cd results

Take a look at these files

Fragmisincorporation_plot.pdf
Length_plot.pdf
Stats_out_MCMC_hist.pdf
Stats_out_MCMC_post_pred.pdf
Stats_out_MCMC_trace.pdf
Stats_out_MCMC_iter_summ_stat.csv
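A quick way to confirm that the run finished is to check that the files listed on this slide are present. This is a small sketch, not part of mapDamage: the helper name `missing_outputs` is hypothetical and the folder name `results` is simply the one used in the `cd` command on this slide.

```python
from pathlib import Path

# File names as listed on the slide.
EXPECTED_OUTPUTS = [
    "Fragmisincorporation_plot.pdf",
    "Length_plot.pdf",
    "Stats_out_MCMC_hist.pdf",
    "Stats_out_MCMC_post_pred.pdf",
    "Stats_out_MCMC_trace.pdf",
    "Stats_out_MCMC_iter_summ_stat.csv",
]


def missing_outputs(results_dir):
    """Return the expected output files that are absent from results_dir."""
    folder = Path(results_dir)
    return [name for name in EXPECTED_OUTPUTS if not (folder / name).is_file()]


if __name__ == "__main__":
    for name in missing_outputs("results"):
        print("missing:", name)
```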

Hands On session!

Bos et al. 2011

How different was the Yersinia pestis strain in the 14th century?

• Small genomes
• Capture, enrichment method

Fetch the tutorial, open a terminal and type:

cp /home/people/ludovic/HandsOnDTUaDNA/HandsOnDTU* .

in 2h

Acknowledgments

Ludovic Orlando & the paleomix group

http://geogenetics.ku.dk/

Stinus Lindgreen, Mikkel Schubert, Anders Krogh
Biocentre, København University

Bent Petersen, Josef Vogt, Thomas Sicheritz-Ponten
CBS, Technical University of Denmark

Danish High-Throughput Sequencing Centre

Illumina – workflow

FASTQ files

@HWUSI-EAS1510_0024_FC:7:1:1563:932#0
NAGGACAGGGAAGCCGAAGATACCATTTGTGTTCTTCCCAAACTTTATTACTTTTGTAGCAAAAAGAAAA
+HWUSI-EAS1510_0024_FC:7:1:1563:932#0
BKJKMRRQRQ[[[[[[[[[[______b_________________bbZ_QQ______QQ__BBBBBBBBBB
@HWUSI-EAS1510_0024_FC:7:1:1574:952#0
NATCATCGCGGGGGTCGGCAGCTTCGACACCGCGCACACGATCCACTCGGCGAAGGGTGCTGCCGCGTTG
+HWUSI-EAS1510_0024_FC:7:1:1574:952#0
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWUSI-EAS1510_0024_FC:7:1:1656:932#0
ATGATTAGTGCATCAGCCCTTTGAAAAGTGGCCTACAGACATTGTCCTTAGCTAACAACCACAGATCGGA
+HWUSI-EAS1510_0024_FC:7:1:1656:932#0
JJKJJQQQQN__bbb_____bb________b___b__bb__b_____b___b_b___b____b__RSQRP

fastQC

Quality scores, k-mers, GC content
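As a rough illustration of what sits behind two of these fastQC summaries, the sketch below reads four-line FASTQ records and derives per-base quality scores and GC content. It is not how fastQC itself is implemented. The function names are hypothetical, and the Phred offset of 64 is an assumption inferred from the quality characters in the records shown here (they go up to 'b', beyond the usual Phred+33 range); most recent Illumina data uses an offset of 33 instead. The demo record is the first read of the slide, truncated to ten bases.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:                      # end of file (or a blank line) stops the parser
            return
        sequence = handle.readline().strip()
        handle.readline()                   # '+' separator line, ignored
        quality = handle.readline().strip()
        yield header[1:], sequence, quality  # drop the leading '@'


def phred_scores(quality, offset=64):
    """Decode a quality string; offset=64 is an assumption for these older Illumina reads."""
    return [ord(char) - offset for char in quality]


def gc_content(sequence):
    """Fraction of G and C among the called bases (N is ignored)."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return sum(base in "GC" for base in called) / len(called)


demo = io.StringIO(
    "@HWUSI-EAS1510_0024_FC:7:1:1563:932#0\n"
    "NAGGACAGGG\n"
    "+HWUSI-EAS1510_0024_FC:7:1:1563:932#0\n"
    "BKJKMRRQRQ\n"
)
records = list(read_fastq(demo))
for name, sequence, quality in records:
    print(name, phred_scores(quality), round(gc_content(sequence), 3))
```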