Sequencing the large and complex genome of Aegilops tauschii, one of the progenitors of common wheat

Ae. tauschii ssp. strangulata accession AL8/78


Choice of Ae. tauschii accession

• 172 Ae. tauschii accessions, 178 wheat accessions, 55 RFLP loci

[Figure: tree of Ae. tauschii and wheat accessions (scale bar 0.1). Labeled groups: Strangulata gene pool (Transcaucasia; southwestern Caspian Iran; southeastern Caspian Iran), Tauschii gene pool, T. aestivum. Individual accession labels omitted.]

[Figure: Ae. tauschii (genomes DD), with ssp. strangulata (including AL8/78) and ssp. tauschii, shown with hexaploid wheat (genomes AABBDD).]

[Figure: collection site of AL8/78 and hexaploid wheat origin; labels: Year 1999, Year 2012.]

Aegilops tauschii genome

• Haploid genome ~ 4 to 5 Gbp
• ~ 90% repeated sequences

[Figure: computed (red and blue curves) and observed (circles) declines of synteny in intergenic regions in Triticeae genomes; y-axis: synteny in intergenic spaces. Science 316, 1862 (2007)]

Physical map development

• Developed SNaPshot HICF BAC fingerprinting method
• Fingerprinted 461,706 BAC clones
• Assembled BACs into 3,578 contigs
• MTP across the contigs: 4,792 Mbp

Genetic map

• AL8/78 (ssp. strangulata) x AS75 (ssp. tauschii)
• Developed Ae. tauschii 10K Infinium SNP assay
• Genotyped 1,102 F2 plants
• Mapped 7,185 SNP markers

Anchoring of BAC contigs on the genetic map with the 10K Ae. tauschii SNP Infinium assay

[Figure labels: F2 population; 2-D BAC pools]

Luo et al., PNAS 110, 2013

Anchoring algorithm

• BAC is positive for an SNP marker
• At least one neighboring BAC in the contig is positive for the same SNP marker
• Accept as true
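The acceptance rule in the bullets above can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual implementation: the function name `anchor_contig`, the input layout, and the reading of "neighboring" as "adjacent in contig order" are all assumptions.

```python
from collections import defaultdict

def anchor_contig(contig_bacs, positives):
    """Return accepted SNP markers for one contig (hypothetical helper).

    contig_bacs: BAC ids in their order along the contig.
    positives:   dict mapping BAC id -> set of SNP markers it tested positive for.
    A marker hit on a BAC is accepted only if an adjacent BAC in the
    contig is positive for the same marker (assumed meaning of "neighboring").
    """
    accepted = defaultdict(set)  # marker -> BACs that support it
    for i, bac in enumerate(contig_bacs):
        # Adjacent BACs on either side, where they exist.
        neighbors = contig_bacs[max(i - 1, 0):i] + contig_bacs[i + 1:i + 2]
        for marker in positives.get(bac, set()):
            if any(marker in positives.get(n, set()) for n in neighbors):
                accepted[marker].add(bac)
    return dict(accepted)

# Toy data: snpA is confirmed by two adjacent BACs; snpB hits are isolated.
contig = ["b1", "b2", "b3", "b4"]
hits = {"b1": {"snpA"}, "b2": {"snpA", "snpB"}, "b4": {"snpB"}}
print(anchor_contig(contig, hits))
```

In the toy data only `snpA` is accepted, because the two `snpB` hits fall on BACs that are not adjacent to each other.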

A portion of a 15 Mbp BAC contig anchored on the 3D genetic map

Minimum tiling path (MTP)

• MTP = 42,882 BAC clones
• MTP = 4,792 Mbp
• Anchored MTP = 4,030 Mbp
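As a quick check on the MTP figures, the arithmetic below derives the anchored fraction and the MTP length per clone. The variable names are illustrative, and treating MTP length divided by clone count as a per-clone average is an assumption about how the totals were computed.

```python
# Figures taken from the slide.
mtp_clones = 42_882
mtp_mbp = 4_792
anchored_mbp = 4_030

# Share of the MTP that is anchored on the genetic map.
anchored_fraction = anchored_mbp / mtp_mbp

# MTP length per clone in kb (assumed interpretation, see lead-in).
mean_kb_per_clone = mtp_mbp * 1000 / mtp_clones

print(f"Anchored fraction of MTP: {anchored_fraction:.1%}")
print(f"MTP length per clone: {mean_kb_per_clone:.1f} kb")
```

With these inputs the anchored fraction is about 84.1% and the MTP length per clone is about 111.7 kb.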

Aegilops tauschii physical map

[Figure tracks: recombination rate; physical map; gene density; density of collinear genes. Luo et al., PNAS 110, 2013]

Division of labor among participating labs

• By chromosome
• By task

Genome sequencing

• Validate, pool eight overlapping BAC clones, isolate DNA, index each pool, and sequence with MiSeq

[Figure labels: MTP; chromosome groups; pool; DNA isolation; BAC-end sequence; re-fingerprinting; NGS]

• Sequence assembly
o Assemble paired-end reads together with the BGI long paired-end reads into scaffolds
o Merge scaffolds within a pool
o Merge scaffolds among pools
o Validate assembly and scaffold merging
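The pooling step in the genome sequencing workflow (eight overlapping BAC clones per indexed pool) can be sketched as follows. The function name, the clone ids, and the handling of a short final pool are assumptions made for illustration; the slides do not say how leftover clones were grouped.

```python
def pool_mtp_clones(mtp_clones, pool_size=8):
    """Split an ordered list of MTP clones into consecutive pools.

    Consecutive clones on the MTP overlap, so slicing the ordered list
    keeps overlapping clones together. The last pool may be shorter
    than pool_size (assumed behavior).
    """
    return [mtp_clones[i:i + pool_size]
            for i in range(0, len(mtp_clones), pool_size)]

# Hypothetical contig with 20 MTP clones.
clones = [f"clone{n:02d}" for n in range(1, 21)]
pools = pool_mtp_clones(clones)

# Give each pool an index, standing in for the sequencing index (barcode).
for index, pool in enumerate(pools, start=1):
    print(index, pool[0], "to", pool[-1], f"({len(pool)} clones)")
```

Here 20 clones yield two full pools of eight and one pool of four.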

Advanced optical mapping: BioNano technology

o High throughput
o Uniform DNA stretching facilitating precise DNA length measurements
o Low error rates in assembly

[Figure: restriction-nicked (Nt.BspQI enzyme) and labeled HMW DNA; IRYS instrument]

Whole-genome nanomap of Ae. tauschii

[Figure: whole-genome nanomap scaffold length. x-axis: genome equivalents assembled (20X to 100X); y-axis: contig length in Kb (0 to 1,800); series: N50 and average]

[Figure: distribution of restriction sites in a nanomap contig and distribution of restriction sites in a DNA scaffold]

• Error rate: 11/3000 contigs examined (0.4%)

Ordering and orienting sequence scaffolds on the WG nanomap

[Figure: nanomap contig aligned with sequence scaffolds. Labels shown:]
• ctg1715: 6D (76.097 cM)
• ctg12344: 6D (76.097 cM)
• ctg195: 6D (75.686 cM)
• ctg6115: 6D (76.097 cM)
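Placing a sequence scaffold on a nanomap contig relies on comparing the pattern of nicking sites predicted from the sequence with the pattern observed on the optical map. The sketch below shows only the in silico half: locating Nt.BspQI recognition sites (GCTCTTC) on both strands of a scaffold and listing the distances between them. The function names and the toy sequence are hypothetical, and the slides do not describe the alignment software actually used.

```python
import re

SITE = "GCTCTTC"     # Nt.BspQI recognition sequence
SITE_RC = "GAAGAGC"  # reverse complement, i.e. a site on the other strand

# Lookahead so that overlapping occurrences are also reported.
_PATTERN = re.compile(f"(?=({SITE}|{SITE_RC}))")

def label_sites(seq):
    """Return 0-based start positions of recognition sites on either strand."""
    return [m.start() for m in _PATTERN.finditer(seq.upper())]

def site_intervals(positions):
    """Return distances between consecutive sites (the in silico map)."""
    return [b - a for a, b in zip(positions, positions[1:])]

scaffold = "AAGCTCTTCAAAAGAAGAGCTT"  # toy sequence
sites = label_sites(scaffold)
print(sites, site_intervals(sites))
```

The interval list is the quantity that would be compared against label spacing in a nanomap contig to order and orient scaffolds.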

Sequence scaffold assembly v. 1.0

• Total scaffold length: 5.7 Gb
• Average scaffold N50 length: 203 Kb

Sequence scaffold assembly v. 1.1

• Total scaffold length: 4.4 Gb
• Average scaffold N50 length: 405 Kb
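For readers unfamiliar with the N50 statistic quoted for the scaffold assemblies, the following is a minimal sketch of the standard definition: the length L such that scaffolds of length L or longer contain at least half of the total assembled bases. The function name and toy lengths are illustrative, and how the slide's "average" N50 was aggregated is not stated in the source.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or read) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from longest to shortest until half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")

# Toy example: total is 20; 6 + 5 = 11 covers at least half, so N50 is 5.
print(n50([2, 3, 4, 5, 6]))
```

Because N50 weights long scaffolds heavily, it can rise between assembly versions even when the total assembled length falls.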

Repeat landscape of the current assembly

New TE families

TE category            Known    New
LTR retrotransposon    1241     1361
SINE                   1        61
MITE                   18       233

Dynamics of gene content in grass lineages

Species          Genes
S. bicolor       27,640
O. sativa        28,236
B. distachyon    25,532 (-3,026)
Ae. tauschii     36,371 (+7,813)

[Figure: additional gene counts shown in the figure: 28,289; 28,200; 28,965]

Massa et al. Mol Biol Evol 28:2537, 2011

Aegilops tauschii physical map

[Figure tracks: recombination rate; physical map; gene density; only collinear genes. Luo et al., PNAS 110, 2013]

Prolamin gene region

[Figure: gene order (positions 1 to 22) in the ancestral genome, sorghum, rice, B. distachyon, and Ae. tauschii. Legend: ancestral genes, prolamin genes, inserted genes, deleted genes]

• Prolamin gene region in distal region of 1D
• 3.1 Mb in Ae. tauschii

Whole-genome shotgun sequence to close gaps

• PacBio P6-C4 chemistry
• 56 SMRT cells
• Total 40 Gbp
• Mean read length 9.1 kb
• N50 read length 13.0 kb

[Figure: read length histogram. x-axis: read length (ticks at 10 kb and 20 kb); y-axis: number of reads]
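The PacBio figures above imply a few derived quantities that are useful for judging how much gap closing the data can support. The calculation below is a rough sketch: the variable names are illustrative, and the coverage range simply divides total yield by the ~4 to 5 Gbp haploid genome size quoted earlier in the talk.

```python
# Figures taken from the slides.
total_bp = 40e9               # total PacBio yield
smrt_cells = 56
mean_read_bp = 9_100          # mean read length, 9.1 kb
genome_bp_range = (4e9, 5e9)  # haploid genome estimate, ~4 to 5 Gbp

yield_per_cell_gbp = total_bp / smrt_cells / 1e9
approx_reads = total_bp / mean_read_bp
coverage = tuple(total_bp / g for g in genome_bp_range)

print(f"Yield per SMRT cell: {yield_per_cell_gbp:.2f} Gbp")
print(f"Approximate read count: {approx_reads / 1e6:.1f} million")
print(f"Approximate coverage: {coverage[1]:.0f}x to {coverage[0]:.0f}x")
```

This works out to roughly 0.71 Gbp per SMRT cell, about 4.4 million reads, and 8x to 10x genome coverage.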

Where can I access data?

BLAST: http://aegilops.wheat.ucdavis.edu/ATGSP/data.php

Batch download of scaffolds: ftp://ftp.ccb.jhu.edu/pub/data/Aegilops_tauschii/

Acknowledgements

Karin Deal

Pat McGuire

Ming-Cheng Luo

Naxin Huo, Yi Wang

Chad Jorgensen

Tingting Zhu, Sonny Van

Lichan Xiao, Luxia Yuan

Luis Curiel, Scott Liu

JC Rodriguez, Thanh Ngo

Armond Murray

Olin Anderson

Yong Gu

Katrien Devos

Hao Wang

Jeffrey Bennetzen

Richard McCombie

National Science Foundation Plant Genome

Klaus Mayer

Matthias Pfeifer, Karl Kugler

Steven Salzberg

Aleksey Zimin

Daniela Puiu

Geo Pertea

Thomas Wicker

Jaroslav Doležel

Shuhong Ouyang

Yong Liang

Zhenzhong Wang

Zhiyong Liu

Qixin Sun

Zhengqiang Ma

Alex Hastie

Andrew Anfora

Palak Sheth

Long Mao

Eric Lyons

Frank You

Philippe Leroy

Cari Soderlund