Post on 04-Oct-2020

Genotype-by-Sequence

Yung-Fen Huang

Collaborative Oat Research Meeting, March 7, Ottawa

Outline

• Principle of Genotype-by-Sequence (GBS)

• Oat GBS markers

• SNP assay vs. GBS

• Possible applications

• Ongoing oat GBS analysis and expected outcomes

Genotype-by-Sequence

1. Complexity reduction: digest DNA with restriction enzyme(s)

   Use a methylation-sensitive enzyme to filter out repetitive genomic regions

2. Ligate adapters

   [Figure: genomic DNA of Sample 1, Sample 2 and Sample 3, each with a sample-specific barcode]

3. Pool and amplify samples

4. Sequencing

   Case of oat: 1.5 M reads/sample ~ 0.7% of genome (96-plex)
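A back-of-the-envelope check of the coverage figure in step 4 can be sketched as follows. Only the read count comes from the slide; the 64-base read length and the roughly 12.5 Gb oat genome size are assumptions added here for illustration, so the result indicates an order of magnitude only.

```python
# Rough estimate of the genome fraction sampled per line.
reads_per_sample = 1.5e6   # from the slide (96-plex case)
read_length_bp = 64        # ASSUMPTION: tag length, not stated on the slide
genome_size_bp = 12.5e9    # ASSUMPTION: approximate oat genome size

sequenced_bp = reads_per_sample * read_length_bp
fraction = sequenced_bp / genome_size_bp

print(f"Sequenced bases per sample: {sequenced_bp:,.0f}")
print(f"Fraction of genome (upper bound): {fraction:.2%}")
```

This is an upper bound, because reads that fall on the same restriction site do not add new genomic positions.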

Genotype-by-Sequence

5. SNP calling: use bioinformatic pipeline(s) to process the raw data for SNP identification

TGCAGAAAAAAAAAAGATATCGTGTCAGTGGARCTGATCATTACTTGGGGGAGGAAGGAGATAA

TGCAGAAAAAAAAATTATATTATCAAAAGCTTTATCATTTTCTAYAGAGCCATAGCGATCATAT

TGCAGAAAAARAACACGGCAAAAAAATAAACACGACAATGGACAAACGAAGGCACACGGCAAAA

TGCAGAAAAAAAAGAATAAAAGAAAACAGCYAGCCCACGTGGGCGGCAAATAGTTGCCGCCGCA

TGCAGAARAAAAAGGAACACAACAGAAAAGAGACATCTAACTAGATACGGGTCAACCGAGATCG

TGCAGAAAAAAAWTGCCATGAGAAGAATTCACTGCCAAGAACATCATCTCAGATCAGCACTATG
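The example tags above contain letters other than A, C, G and T (R, Y, W). Assuming these are standard IUPAC ambiguity codes marking the two alleles of a SNP inside a tag (the slide does not say so explicitly), a minimal sketch for locating and expanding them could look like this. The function names are hypothetical.

```python
from itertools import product

# Two-base IUPAC ambiguity codes (standard nomenclature).
IUPAC_TWO_BASE = {"R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC"}

# Tags copied from the slide.
tags = [
    "TGCAGAAAAAAAAAAGATATCGTGTCAGTGGARCTGATCATTACTTGGGGGAGGAAGGAGATAA",
    "TGCAGAAAAAAAAATTATATTATCAAAAGCTTTATCATTTTCTAYAGAGCCATAGCGATCATAT",
    "TGCAGAAAAARAACACGGCAAAAAAATAAACACGACAATGGACAAACGAAGGCACACGGCAAAA",
    "TGCAGAAAAAAAAGAATAAAAGAAAACAGCYAGCCCACGTGGGCGGCAAATAGTTGCCGCCGCA",
    "TGCAGAARAAAAAGGAACACAACAGAAAAGAGACATCTAACTAGATACGGGTCAACCGAGATCG",
    "TGCAGAAAAAAAWTGCCATGAGAAGAATTCACTGCCAAGAACATCATCTCAGATCAGCACTATG",
]

def ambiguous_sites(tag):
    """Return (0-based position, code, possible bases) for each ambiguity code."""
    return [(i, base, IUPAC_TWO_BASE[base])
            for i, base in enumerate(tag) if base in IUPAC_TWO_BASE]

def expand_alleles(tag):
    """Return every concrete sequence the tag can stand for."""
    choices = [IUPAC_TWO_BASE.get(base, base) for base in tag]
    return ["".join(combo) for combo in product(*choices)]

for tag in tags:
    for position, code, bases in ambiguous_sites(tag):
        print(f"{tag[:12]}...: position {position}, {code} = {bases[0]}/{bases[1]}")
```

Each tag here carries exactly one ambiguity code, so `expand_alleles` returns two sequences per tag, one per allele.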

Ex. 1,373 oat samples = 237 Gb (compressed file) = 1.5 billion reads

Marker  S1  S2  S3  S4  S5  S6  S7  S8  S9
M1      A   C   C   A   A   A   A   A   A
M2      A   H   G   A   A   A   N   A   A
M3      A   T   T   N   T   T   N   N   T
M4      C   A   N   A   N   N   N   N   A
M5      A   A   C   C   C   A   A   A   A
M6      G   C   G   C   G   G   G   C   G
M7      N   G   N   G   G   G   G   G   N
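The genotype matrix above can be used to illustrate how data completeness per marker is computed. This is a minimal sketch, assuming that N denotes a missing call and H a heterozygous call (the slide does not define the codes); the 80% threshold is an arbitrary illustration.

```python
# Genotype calls copied from the example matrix (markers M1-M7, samples S1-S9).
genotypes = {
    "M1": "A C C A A A A A A".split(),
    "M2": "A H G A A A N A A".split(),
    "M3": "A T T N T T N N T".split(),
    "M4": "C A N A N N N N A".split(),
    "M5": "A A C C C A A A A".split(),
    "M6": "G C G C G G G C G".split(),
    "M7": "N G N G G G G G N".split(),
}

def completeness(calls, missing="N"):
    """Fraction of samples with a non-missing call for one marker."""
    return sum(call != missing for call in calls) / len(calls)

per_marker = {marker: completeness(calls) for marker, calls in genotypes.items()}

# Keep markers called in at least 80% of samples (hypothetical threshold).
kept = [marker for marker, frac in per_marker.items() if frac >= 0.8]

for marker, frac in per_marker.items():
    print(f"{marker}: {frac:.0%} complete")
print("Kept at >= 80%:", kept)
```

Raising or lowering the threshold trades marker number against missing data, which is the trade-off shown in the marker-count figures later in the deck.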

Advantages: fast, large-scale and cheap

- Marker discovery and genotyping at the same time

- Multiple samples at the same time

Challenges: bioinformatics? Others?

Oat molecular data from CORE

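[Diagram: oat molecular data sets by genotyping platform]

Genotyping platforms: Genotype-by-Sequence (120K); GoldenGate (2K); Infinium (6K)

Bi-parental mapping populations: KxO, OxT, TxM, DxE, SxH, OxP, HxZ, PxG

Other germplasm sets: IOI (350 lines); subset (108 lines); Breeders’ selection (580 lines)

Status label: In progress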

How many markers does GBS produce?

[Figure: Cumulative marker numbers. x-axis: Missing data (%), 10 to 80; y-axis: No. of markers, 0 to 120,000. Series: All (1373 lines); Bi-parental (7 populations); North America (12 breeding programs); Europe (32 lines); Diversity panel (152 IOI lines); OxT]

Annotations on the figure: ~ 10,000 SNPs; ~ 4,000 SNPs

More sequences = more SNPs
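A cumulative marker curve of this kind counts, for each missing-data threshold, the markers whose missing rate does not exceed it. The sketch below uses made-up missing rates, not the oat data, and the function name is hypothetical.

```python
from bisect import bisect_right

def cumulative_marker_counts(missing_pct, thresholds):
    """For each threshold, count markers with missing data <= threshold (%)."""
    ordered = sorted(missing_pct)
    return {t: bisect_right(ordered, t) for t in thresholds}

# HYPOTHETICAL per-marker missing-data percentages, for illustration only.
missing_pct = [0, 5, 12, 30, 45, 45, 60, 75, 80, 95]

counts = cumulative_marker_counts(missing_pct, range(10, 90, 10))
for threshold, n in counts.items():
    print(f"<= {threshold}% missing: {n} markers")
```

Sorting once and using a binary search per threshold keeps the calculation cheap even for marker sets in the 100K range.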

Oat GBS markers

[Figure: No. of markers at different levels of completeness. x-axis: Completeness (%), 10 to 90; y-axis: No. of markers, 0 to 30,000. Population shown: OxT]

Most SNPs are 25-50% complete

More sequences = more markers at high completeness

More sequences are expected with technology update

SNP assay vs. GBS

For 96 samples (in the case of oat; based on former data with future projection):

                             SNP assay          GBS
SNP discovery                Required           Not required
No. of markers               6K-100K            2K-100K [1]
Time of experiment [2]       3 days             2-4 weeks
Time for SNP calling [3]     Weeks to months    < 1 day
IT demand for SNP calling    Simple             High informatic effort
Data completeness (%)        > 90%              Between 0 and 100 [4]
Reproducibility              High               High
Cost/sample [5]              ~ $60              $10-20

[1] Variable according to sample diversity and data completeness
[2] From library preparation to raw data collection
[3] Including data curation
[4] Completeness varies according to end-use
[5] Library and beadchip/sequencing consumables
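The per-sample costs in the comparison can be scaled to a full 96-sample plate. This is simple arithmetic on the quoted figures; treating "~ $60" as exactly 60 is an approximation.

```python
n_samples = 96

snp_assay_cost_per_sample = 60        # "~ $60" on the slide, taken as 60
gbs_cost_per_sample_range = (10, 20)  # "$10-20" on the slide

snp_assay_total = n_samples * snp_assay_cost_per_sample
gbs_total_range = tuple(n_samples * c for c in gbs_cost_per_sample_range)

print(f"SNP assay, 96 samples: ~${snp_assay_total:,}")
print(f"GBS, 96 samples: ${gbs_total_range[0]:,} to ${gbs_total_range[1]:,}")
```

On these numbers the consumables for GBS come to roughly one sixth to one third of the SNP assay cost.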

What can the GBS data tell us?

[Diagram relating the following elements]

High-throughput genomic data

Genome organisation: genetic map

Genetic structure of the sample: structure and relatedness analysis

Trait genetic architecture: QTL mapping, association mapping

Good phenotypes

Genomic prediction

Major gene introgression

Breeding cycle

New cultivars

Genomic contribution: selection precision (+ cycle acceleration)

Missing data don’t matter (ex. wheat, barley, cassava)

Ongoing analysis – genetic map update

oc_plos_16A

[0] gmi_es15_c4222_543 [3] gmi_es17_c10073_640

[11] gmi_es17_c20215_324 [15] gmi_es_cc12708_442

[16] gmi_es17_c17558_304 [18] gmi_es15_c5368_259

[19] gmi_es01_c13907_104 [20] gmi_es01_c18017_440

[21] gmi_es17_c968_903 [22] gmi_es01_c7970_395 [23] gmi_es01_c1725_728

[24] gmi_es15_lrc19562_699 [25] gmi_es15_c19227_114

[26] gmi_es02_c3206_293 [30] gmi_es01_c13820_382

[32] gmi_snp2043_1 [36] gmi_es02_c1538_477

[37] gmi_es_cc2716_392 [39] gmi_snp_lrc40347_1 [42] gmi_es17_c8741_79

[44] gmi_es02_c15898_126

[52] gmi_es17_c3846_396 [54] gmi_es_cc13348_93

[56] gmi_es02_c21402_61 [57] gmi_ds_cc4575_55

[58] gmi_es15_c2802_625 [59] gmi_es02_c8034_282

[60] gmi_es_cc6497_157 [61] gmi_es17_c3200_273 [62] gmi_es17_c1612_641

[66] m38721-1 [67] af237553-1-2

[69] gmi_es15_c10509_256 [71] bm_912a

[74] gmi_es17_c5169_555 [75] gmi_es01_c17040_394

[76] gmi_es01_c1287_580 [77] gmi_es02_c12598_260 [81] gmi_es17_lrc7334_312

[82] gmi_es01_c284_1036 [86] bm_183a

gbs2_pg95_with_dist_16A

gmi_es01_c4259_207 [0] avjp1302 [0] gmi_es_cc9290_178 [3] tp252329 [12] avjp42734 [13] gmi_es17_c20215_324 [48] gmi_es17_c12516_818 [54] tp17466 [54] tp342240 [55] gmi_es15_c965_491 [56] gmi_es15_c735_156 [56] gmi_es15_c5905_473 [58] avjp70170 [59] avjp70171 [59] gmi_es02_c12745_731 [61] gmi_es17_c4427_657 [61] avjp20306 [63] avjp77463 [64] gmi_es14_c2025_443 [65] gmi_es17_c9257_328 [67] gmi_es03_c2344_498 [69] gmi_es17_c2699_441 [69] gmi_es01_c1725_728 [70] avjp76937 [72] avjp12767 [73] gmi_es01_c7970_395 [74]

avjp97487 [94] avjp53477 [96] gmi_es17_c9625_419 [98] gmi_es_cc14000_280 [99] gmi_es17_c5367_259 [100] avjp125669 [101] gmi_es02_c21402_61 [102] gmi_es15_c2802_625 [102] gmi_es17_c1612_641 [106] avjp68334 [106] avjp77316 [109] avjp105825 [110] avjp116909 [113] gmi_es15_c10509_256 [114] avjp115236 [116] gmi_es01_c17040_394 [119] avjp119774 [119] gmi_es17_c5169_555 [120] avjp85794 [121] avjp42711 [123] avjp12306 [124] avjp14139 [125] avjp52039 [126] tp279042 [127] gmi_es05_c8916_635 [129] avjp65787 [130] gmi_es17_c2063_243 [131] gmi_es15_c900_850 [149] avjp49411 [152] avjp90884 [161] gmi_es15_c17743_247 [170] tp336131 [177] gmi_es17_c7320_909 [184]

- 2nd generation framework map with high-quality SNP and GBS markers (1.5K + 40.6K) from 7 bi-parental populations (quite challenging!)

  Larger regions are covered (ex. 16A: consensus vs. updated PxG map)

- Place historical markers and 19.6K medium-quality GBS markers on the updated framework map

  High-density consensus map of more than 50K ordered markers

Genetic map
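The two 16A marker lists shown on this slide can be compared programmatically. The sketch below parses excerpts of both lists (only the markers that appear in both), then counts how many marker pairs keep the same relative order. The bracketed numbers are taken to be map positions, presumably in cM, which the slide does not state.

```python
import re
from itertools import combinations

# Excerpts copied from the slide. In the consensus list (oc_plos_16A) the
# position precedes the marker name; in the updated list
# (gbs2_pg95_with_dist_16A) the name precedes the position.
consensus_text = (
    "[11] gmi_es17_c20215_324 [22] gmi_es01_c7970_395 [23] gmi_es01_c1725_728 "
    "[56] gmi_es02_c21402_61 [58] gmi_es15_c2802_625 [62] gmi_es17_c1612_641 "
    "[69] gmi_es15_c10509_256 [74] gmi_es17_c5169_555 [75] gmi_es01_c17040_394"
)
updated_text = (
    "gmi_es17_c20215_324 [48] gmi_es01_c1725_728 [70] gmi_es01_c7970_395 [74] "
    "gmi_es02_c21402_61 [102] gmi_es15_c2802_625 [102] gmi_es17_c1612_641 [106] "
    "gmi_es15_c10509_256 [114] gmi_es01_c17040_394 [119] gmi_es17_c5169_555 [120]"
)

consensus = {name: int(pos)
             for pos, name in re.findall(r"\[(\d+)\]\s+(\S+)", consensus_text)}
updated = {name: int(pos)
           for name, pos in re.findall(r"(\S+)\s+\[(\d+)\]", updated_text)}

# Markers present on both maps, ordered by consensus position.
shared = sorted(set(consensus) & set(updated), key=consensus.get)

# Count marker pairs whose relative order agrees or disagrees between maps;
# pairs tied on either map are counted as neither.
concordant = discordant = 0
for a, b in combinations(shared, 2):
    sign = (consensus[a] - consensus[b]) * (updated[a] - updated[b])
    if sign > 0:
        concordant += 1
    elif sign < 0:
        discordant += 1

print(f"{len(shared)} shared markers, {concordant} concordant pairs, "
      f"{discordant} discordant pairs")
```

A high share of concordant pairs suggests the two maps agree on marker order for this linkage group, with only local inversions between closely spaced markers.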

Expected outcome – upcoming analyses

[Diagram; legend: Expected outcome, Upcoming analysis]

High-throughput genomic data

Genome organisation

Structure and relatedness analysis: genetic structure of the sample

QTL mapping, association mapping: trait genetic architecture

Genomic prediction

Major gene introgression

Breeding cycle

New cultivars

cf. Jesse’s GS presentation

cf. “Diversity and structure” presentation

cf. Allele mining presentations

Genotype-by-Sequence

Main steps of SNP calling

[Figure: reads carrying an A or a G at the same position, from Sample 1, Sample 2 and Sample 3]

i. Group sequences (“reads”) of the same sample according to barcode

ii. Group identical reads (groups of reads = “tags”)

[Figure: tags carrying A, T or G, each labelled with its sample (S1, S2 or S3)]

iii. Identify SNPs (group tags that differ by only a few bases, e.g. 1 base)

[Figure: tags grouped into SNP 1 and SNP 2, labelled by sample (S1, S2, S3)]
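The main steps of SNP calling listed on this slide can be sketched on toy data as follows. This is an illustration only, not the pipeline used for the oat data: the barcodes, reads and function names are hypothetical, and the calling rule (H when a sample carries both tags, N when it carries neither) is an assumption.

```python
from collections import Counter, defaultdict
from itertools import combinations

# HYPOTHETICAL barcodes and reads (barcode followed by a 10-base tag).
BARCODES = {"ACGT": "S1", "TGCA": "S2", "GATC": "S3"}
raw_reads = [
    "ACGTTGCAGAACGT", "ACGTTGCAGAACGT", "ACGTTGCAGTTTCA", "ACGTTGCAGTTTCA",
    "TGCATGCAGGACGT", "TGCATGCAGGACGT", "TGCATGCAGGACGT",
    "GATCTGCAGAACGT", "GATCTGCAGGACGT", "GATCTGCAGTTCCA", "GATCTGCAGTTCCA",
]

def demultiplex(reads, barcodes):
    """Group reads by sample using the leading barcode, then strip it."""
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
    return by_sample

def collapse_to_tags(by_sample):
    """Group identical reads into tags, counting occurrences per sample."""
    tag_counts = defaultdict(Counter)
    for sample, reads in by_sample.items():
        for read in reads:
            tag_counts[read][sample] += 1
    return tag_counts

def candidate_snps(tags, max_mismatch=1):
    """Pair equal-length tags differing by 1..max_mismatch bases."""
    pairs = []
    for a, b in combinations(sorted(tags), 2):
        diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        if len(a) == len(b) and 0 < len(diff) <= max_mismatch:
            pairs.append((a, b, diff))
    return pairs

def call_genotypes(tag_counts, pairs, samples):
    """Call each sample at the first mismatch position of each tag pair."""
    table = {}
    for a, b, diff in pairs:
        i = diff[0]
        row = {}
        for s in samples:
            has_a, has_b = tag_counts[a][s] > 0, tag_counts[b][s] > 0
            row[s] = "H" if has_a and has_b else a[i] if has_a else b[i] if has_b else "N"
        table[(a, b)] = row
    return table

by_sample = demultiplex(raw_reads, BARCODES)
tag_counts = collapse_to_tags(by_sample)
pairs = candidate_snps(tag_counts)
calls = call_genotypes(tag_counts, pairs, ["S1", "S2", "S3"])
for (a, b), row in calls.items():
    print(a, b, row)
```

The resulting table has the same shape as the marker-by-sample matrix shown earlier in the deck: one row per SNP, one call per sample, with missing calls where a sample has no read at that tag.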