Cultivated Alfalfa at the Diploid Level (CADL)

Maria J. Monteros, Joann Mudge, Andrew D. Farmer, Nicholas P. Devitt, Diego A. Fajardo, Thiru Ramaraj, Xinbin Dai, Zhaohong Zhuang, Peng Zhou, Joseph Guhlin, Christopher D. Town, Patrick X. Zhao, Jason R. Miller, Kevin A. Silverstein, E. Charles Brummer, Nevin D. Young. Genome Sequence of Medicago sativa: Cultivated Alfalfa at the Diploid Level (CADL). NAAIC, Madison, WI, July 13, 2016.


Page 1:

Maria J. Monteros, Joann Mudge, Andrew D. Farmer, Nicholas P. Devitt, Diego A. Fajardo, Thiru Ramaraj, Xinbin Dai, Zhaohong Zhuang, Peng Zhou, Joseph Guhlin, Christopher D. Town, Patrick X. Zhao, Jason R. Miller, Kevin A. Silverstein, E. Charles Brummer, Nevin D. Young

Genome Sequence of Medicago sativa: Cultivated Alfalfa at the Diploid Level (CADL)

NAAIC, Madison, WI, July 13, 2016

Page 2:

Alfalfa - Medicago sativa Complex

Tetraploid 4X

Medicago sativa subsp. sativa

Medicago sativa subsp. falcata

Diploid 2X

Medicago sativa subsp. caerulea

Medicago sativa subsp. falcata

Medicago sativa subsp. varia

Medicago sativa subsp. hemicycla

Warm weather Cold tolerant

Page 3:

Alfalfa (Medicago sativa) Genetics

Basic chromosome number (x) = 8

Diploid 2n = 2x = 16

Tetraploid 2n = 4x = 32

2X 4X

Page 4:

Diploid Genome vs. Tetraploid Alfalfa

Images provided by Haibao Tang

Page 5:

Cultivated Alfalfa at the Diploid Level (CADL)

• Population developed by Ted Bingham

• Developed from cultivated tetraploids over 10 years to produce a diploid form

• Used the 4x-2x cross method and backcrossing to increase the cultivated germplasm background

• Genome is simpler to analyze and assemble than that of the tetraploid alfalfa grown commercially

• Genotype HM342 sequenced as part of the M. truncatula HapMap project

Bingham ET and McCoy TJ. 1979. Cultivated Alfalfa at the Diploid Level: origin, reproductive stability, and yield of seed and forage. Crop Science 19: 97-100.

Page 6:

Role of Plant Genomes

Growth Development Adaptation Stress tol. Quality Persistence

Page 7:

CADL Genome Assembly Pipeline (V.0.95)

>100X coverage

Daligner [1] (Overlap computation)

Falcon [2,3] (Genome assembly)

SSPACE-LongRead [4] (Scaffolding)

Quiver [5] (Polishing assembly)

Source: Joann Mudge, NCGR

1. Myers G. 2014. The Daligner Overlap Library. https://github.com/thegenemyers/DALIGNER.

2. Chin C, et al. 2013. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature Methods. 10:563–569.

3. Chin J. 2015. FALCON: experimental PacBio diploid assembler. https://github.com/PacificBiosciences/FALCON.

4. Boetzer M, Pirovano W. 2014. SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information. BMC Bioinformatics. 15:211.

5. PacBio® variant consensus caller (Quiver algorithm). https://github.com/PacificBiosciences/GenomicConsensus.

Page 8:

CADL - PacBio de novo Assembly (V.0.95)

Contigs: 6,593

Contig Length: 1,254,734,629 bp

Contig N50: 547,092 bp

Max Contig Size: 4,047,589 bp

Falcon 0.2 + SSPACE + Quiver

Source: Joann Mudge, NCGR
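
For readers unfamiliar with the Contig N50 statistic reported on this slide, the following is a minimal sketch of how N50 is conventionally computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not part of the CADL pipeline or its data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical toy contig lengths, NOT taken from the CADL assembly.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70: 80 + 70 = 150, half of the 300 total
```

Applied to the real contig lengths of an assembly, the same calculation would yield a figure comparable to the Contig N50 listed above.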

Page 9:

CADL Gene Annotation Pipeline (V.0.95)

Input: 6,593 scaffolds

Evidence: alfalfa RNA-Seq; alfalfa ESTs; Mt and G. max protein

Training datasets for gene models: Augustus

MAKER pipeline gene prediction: 120,094 prelim. gene models (output)

SPADA: 2,433 small peptide gene models

124,527 gene models

BUSCO annotation quality: 956 univ. orthologs in plants (99%)

57,054 high conf. gene models

Source: Xinbin Dai, Patrick Zhao

Campbell MS et al. 2014. Maker pipeline for genome annotation and curation. Curr. Prot. Bioinformatics. 48:1-39.

Page 10:

CADL Gene Functional Annotation (V.0.95)

6,593 scaffolds

MAKER pipeline gene prediction: prelim. gene models

SPADA: 2,433 small peptide gene models

57,054 gene models

BLAST against UniProt database

InterPro Scan: protein domain analysis

Orthologs in M. truncatula

Alignment of CADL and M. truncatula

Source: Xinbin Dai, Patrick Zhao

                     Genome Size    Sequenced size    H_Conf. Genes
M. truncatula 4.0    454-526 MB     370 MB            31,661
Alfalfa CADL         415-430 MB     1,200 MB          57,054
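
The comparison above is easier to read as a ratio of sequenced size to estimated genome size. The sketch below simply divides the figures transcribed from the slide; the dictionary layout and the function name `sequenced_to_genome_ratio` are illustrative assumptions. The slide itself gives no interpretation of these ratios, so any reading of them (for example, that diverged haplotypes were assembled as separate sequences) should be treated as a hypothesis rather than a stated result.

```python
# Values transcribed from the slide table, in megabases (MB).
assemblies = {
    "M. truncatula 4.0": {"genome_size": (454, 526), "sequenced": 370, "hc_genes": 31661},
    "Alfalfa CADL": {"genome_size": (415, 430), "sequenced": 1200, "hc_genes": 57054},
}


def sequenced_to_genome_ratio(entry):
    """Return (min, max) ratio of sequenced size to the estimated genome size range."""
    low, high = entry["genome_size"]
    # Dividing by the larger estimate gives the smaller ratio, and vice versa.
    return entry["sequenced"] / high, entry["sequenced"] / low


for name, entry in assemblies.items():
    ratio_min, ratio_max = sequenced_to_genome_ratio(entry)
    print(f"{name}: sequenced/genome size = {ratio_min:.2f} to {ratio_max:.2f}")
```

With the slide's numbers, the M. truncatula ratio falls below 1, while the CADL ratio comes out at roughly 2.8 to 2.9.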

Page 11:

Whole Genome BLASTN CADL vs Mt4.0

Source: Andrew Farmer, NCGR

[Figure: BLASTN alignment of CADL against Mt4.0, chromosomes Chr 1 to Chr 8]

Good coverage of the gene space

Page 12:

Divergent Haplotypes Differ in Gene Content (Chr. 7)

Source: Andrew Farmer, NCGR

[Figure: track labels M. truncatula (Mt), CADL, CADL]

Page 13:

Alfalfa Breeder’s Toolbox (see Poster #25)

Page 14:

Genome BLAST Search - Alfalfa Breeder’s Toolbox

http://alfalfatoolbox.org/doblast/

CADL (0.95)

M. truncatula (4.0)

Page 15:

Available at the M. truncatula HapMap Project http://www.medicagohapmap.org/downloads/cadl

Page 16:

CADL Genome Assembly Pipeline (~V.1.0)

Daligner [1] (Overlap computation)

Falcon [2,3] (Genome assembly)

Quiver [4] (Polishing assembly)

Dovetail HiRise (Scaffolding)

Quiver [4] (Polishing assembly)

v 1.3.0-57-g4d1fc9b

Source: Joann Mudge, NCGR

1. Myers G. 2014. The Daligner Overlap Library. https://github.com/thegenemyers/DALIGNER.

2. Chin C, et al. 2013. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature Methods. 10:563–569.

3. Chin J. 2015. FALCON: experimental PacBio diploid assembler. https://github.com/PacificBiosciences/FALCON.

4. PacBio® variant consensus caller (Quiver algorithm). https://github.com/PacificBiosciences/GenomicConsensus.

Page 17:

CADL Dovetail assembly (Anticipated V.1.0)

Falcon 0.4 + Quiver + Dovetail + Quiver

Scaffolds: 5,751

Scaffold Length: 1,251,060,667 bp

Scaffold N50: 1,271,357 bp

Max Scaffold Size: 6,073,685 bp

Source: Joann Mudge, NCGR

Page 18:

Acknowledgements

Univ. of Minnesota: Nevin Young, Roxanne Denny

NCGR: Joann Mudge, Andrew Farmer, Diego Fajardo, Nicholas Devitt, Thiru Ramaraj

Noble Foundation: Patrick Zhao, Xinbin Dai, Jaeyoung Choi, Chunlin He, Perdeep Mehta, Michael Udvardi

UC Davis: Charlie Brummer

Haibao Tang, Christopher Town, Ted Bingham

PAG 2016

Page 19:

Page 20:

CADL Scaffolds “Painted” by Mt Chromosomes

chr1=green chr2=orange chr3=blue chr4=black chr5=yellow chr6=red chr7=magenta chr8=pink

Source: Andrew Farmer, NCGR
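
The slide does not describe how scaffolds were assigned a chromosome color. As a sketch of one plausible approach, assuming BLASTN hits in the standard 12-column tabular format (`-outfmt 6`, where column 4 is the alignment length), each scaffold can be painted with the Mt chromosome that accounts for the most aligned bases. The function name `paint_scaffolds`, the scaffold IDs, and the toy hits are hypothetical; only the color key comes from the slide.

```python
from collections import defaultdict

# Color key as listed on the slide.
CHROM_COLORS = {
    "chr1": "green", "chr2": "orange", "chr3": "blue", "chr4": "black",
    "chr5": "yellow", "chr6": "red", "chr7": "magenta", "chr8": "pink",
}


def paint_scaffolds(blast_tabular):
    """Map each query scaffold to (best chromosome, color).

    Assumes BLAST tabular columns: qseqid, sseqid, pident, length, ...
    The "best" chromosome is the one with the largest summed alignment
    length; this majority rule is an assumption, not the slide's method.
    """
    aligned = defaultdict(lambda: defaultdict(int))
    for line in blast_tabular.strip().splitlines():
        fields = line.split("\t")
        scaffold, chrom, aln_len = fields[0], fields[1], int(fields[3])
        aligned[scaffold][chrom] += aln_len

    painted = {}
    for scaffold, per_chrom in aligned.items():
        best = max(per_chrom, key=per_chrom.get)
        painted[scaffold] = (best, CHROM_COLORS.get(best, "unassigned"))
    return painted


# Hypothetical hits for illustration only; not real CADL alignments.
TOY_HITS = "\n".join("\t".join(row) for row in [
    ("scaf_A", "chr7", "97.5", "9000", "200", "25", "1", "9000", "500", "9500", "0.0", "15000"),
    ("scaf_A", "chr3", "91.0", "1200", "100", "8", "9100", "10300", "40", "1240", "1e-150", "1500"),
    ("scaf_A", "chr7", "96.0", "4000", "150", "10", "11000", "15000", "20000", "24000", "0.0", "6500"),
    ("scaf_B", "chr1", "95.2", "3000", "140", "4", "1", "3000", "700", "3700", "0.0", "4800"),
])

for scaffold, (chrom, color) in paint_scaffolds(TOY_HITS).items():
    print(scaffold, chrom, color)
```

In this toy input, scaf_A has 13,000 aligned bases on chr7 against 1,200 on chr3, so it is painted magenta; scaf_B has hits only on chr1 and is painted green. A real analysis would likely also filter hits by identity or e-value before summing.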