2015. Patrick Schnable. Trait-associated SNPs provide insights into heterosis in maize
Trait-associated SNPs provide insights into heterosis in maize
Patrick S. Schnable
Iowa State University
China Agricultural University
Data2Bio, LLC
ICRISAT
19 February 2015
How to Translate Genomic Data into Biological Understanding and Crop Improvement?
B73 Reference Genome NGS data in NCBI SRA (Feb. 2014)
zmHapMap1
zmHapMap2
CAU resequencing
ISU Zeanome (RNA-seq)
Ames Diversity Panel
IBM RILs RNA-seq
CAAS resequencing
And many others
Schnable, Ware et al., Science, 2009
[Figure y-axis: Tera (10^12) Bases]
$32M
Associate Genes (or genetic markers) with Traits
• Which of the ~50,000 maize genes control important traits?
• GWAS (Genome-wide association studies)
– Typically conducted on diversity panels
– By exploiting historical recombination events, they yield higher-resolution associations than QTL studies
– Identifies associations between genetic markers (e.g., SNPs) and traits
• Forward and reverse genetics
Approaches for GWAS
$$Y = \mu + \sum_{i=1}^{4} b_i P_i + \sum_{l=1}^{25} a_l S_l + d\,\mathrm{SNP} + e$$
Y: phenotypic trait;
P_i: fixed effect of cross-type population (N=4);
S_l: fixed effect of sub-population (N=25);
d: effect of the tested SNP; e: residual error.
• Single-marker GWAS approach
– SNP effects tested one at a time
– Using the PLINK command-line tool
• Stepwise regression approach
– SNPs fitted in a step-wise manner
– Using GenSel4 Stepwise (alpha=0.05, MaxMarkers=300)
• Bayesian-based approach
– SNPs fitted simultaneously into a model
– Using GenSel4 BayesC (chainLength=41,000, burnin=1,000)
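The single-marker approach can be illustrated with a small ordinary-least-squares scan that fits the model above once per SNP and tests d. This is a hypothetical sketch, not the PLINK or GenSel4 implementation: the function names are invented, the population terms are assumed to be supplied as 0/1 indicator columns, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large N.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        if abs(pivot) < 1e-12:
            raise ValueError("singular design matrix (collinear columns?)")
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def snp_scan(y, covariates, snps):
    """Single-marker scan: fit y = mu + covariates + d*SNP + e separately for each SNP.

    covariates: one list of fixed-effect columns per line (e.g. 0/1 population indicators).
    snps: dict mapping SNP name -> genotype codes per line (0, 1, 2).
    Returns dict mapping SNP name -> (estimated d, -log10 p-value).
    """
    results = {}
    for name, geno in snps.items():
        X = [[1.0] + list(c) + [float(g)] for c, g in zip(covariates, geno)]
        p = len(X[0])
        k = p - 1  # column index of the SNP effect d
        XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
        Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
        beta = solve(XtX, Xty)
        resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
        sigma2 = sum(e * e for e in resid) / (len(y) - p)
        # Var(d_hat) = sigma^2 * [(X'X)^-1]_kk; get that element by solving X'X v = e_k
        unit = [1.0 if i == k else 0.0 for i in range(p)]
        var_d = sigma2 * solve(XtX, unit)[k]
        z = beta[k] / math.sqrt(var_d)
        # Two-sided p-value from the normal approximation (large N only)
        pval = math.erfc(abs(z) / math.sqrt(2.0))
        results[name] = (beta[k], -math.log10(max(pval, 1e-300)))
    return results
```

Covariates are passed as one row per line, e.g. `[[0], [1], ...]` for a single population indicator; a real analysis of ~7,000 lines would use a dedicated tool rather than this pure-Python solver.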
GWAS for Yield-Related Traits
Kernel Count
Total Kernel Weight
Avg. Kernel Weight
Cob Length
Cob Diameter
Cob Weight
Kernel Row Number
Jinliang Yang (杨金良)
Jeff Ross-Ibarra Lab, UC Davis
Yu, J. et al. Genetics 2008;178:539-551
Nested Association Mapping (NAM) Population
Four related populations (N=7,000 lines):
• NAM RILs (N=5,000 lines) + IBM RILs (N=300 lines)
• Subset of MxRILs (N=300 lines; IBM + NAM)
• Subset of BxRILs (N=800 lines; IBM + NAM)
• NAM Partial Diallel (N=250 lines)
High Density Genotypic Data
• SNPs from three sources:
– Maize HapMap1* (1.6M)
– Maize HapMap2* (18.4M)
– Our RNA-seq SNPs from 5 tissues (4.9M)
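[Venn diagram of the three SNP sets: HapMap1 only 0.7 M; HapMap2 only 16.6 M; RNA-seq only 3.2 M; HapMap1 and HapMap2 0.4 M (98.7% concordance); HapMap2 and RNA-seq 1.2 M (96.6%); HapMap1 and RNA-seq 0.3 M (96.9%); all three 0.2 M. Percentages give concordance among overlapping variant sites.]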
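Concordance at overlapping variant sites can be computed as the fraction of shared sites at which two call sets report the same genotype. The sketch below assumes each call set is a dict keyed by (chromosome, position) holding a single-base call for the same inbred line; the representation and the toy calls are hypothetical, not the pipeline behind the figure.

```python
def concordance(calls_a, calls_b):
    """Return (number of overlapping sites, fraction of them with identical calls)."""
    shared = calls_a.keys() & calls_b.keys()  # sites called in both sets
    if not shared:
        return 0, float("nan")
    agree = sum(1 for site in shared if calls_a[site] == calls_b[site])
    return len(shared), agree / len(shared)


hapmap = {("1", 100): "A", ("1", 250): "G", ("2", 75): "T", ("3", 10): "C"}
rnaseq = {("1", 100): "A", ("1", 250): "A", ("2", 75): "T", ("5", 9): "G"}
n_shared, frac = concordance(hapmap, rnaseq)  # 3 shared sites, 2 agree
```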
Imputation or projection onto: NAM RILs, BxNAM RILs, MxNAM RILs, NAM Diallels
*Gore, M.A., et al., Science, 2009; Chia, J-M, et al., Nature Genetics, 2012.
Merging and Filtering
Minor Allele Freq. (MAF) >= 0.1; SNP Missing Rate < 0.6
Merged SNP set: 13.0M
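The two filters can be sketched per SNP as follows, assuming genotypes coded as counts of one allele (0, 1, 2) with `None` for a missing call. MAF is computed over called genotypes only; the function names and thresholds-as-defaults are illustrative choices, not the authors' code.

```python
def snp_stats(genotypes):
    """Return (minor allele frequency, missing rate) for one SNP across lines."""
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called:
        return 0.0, missing_rate
    p = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(p, 1 - p), missing_rate


def passes_filters(genotypes, min_maf=0.1, max_missing=0.6):
    """Keep a SNP if MAF >= 0.1 and missing rate < 0.6 (the thresholds on the slide)."""
    maf, missing = snp_stats(genotypes)
    return maf >= min_maf and missing < max_missing
```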
Phenotypic distributions
• CD=Cob Diameter, AKW=Avg. Kernel Weight, CL=Cob Length, CW=Cob Weight, KC=Kernel Count, TKW=Total Kernel Weight
Based on ~100k observations/trait from 9 locations; ~20% our data and 80% from Brown, P. J., et al., PLoS Genetics, 2011
Different GWAS Approaches are Complementary
40/77 (52%) KAVs, representing 39 chromosomal bins (bin size = 100 kb), have been cross-validated.
[Figure: Bayesian-based GWAS (model frequency) plotted against single-variant GWAS (-log10(P-value)), marking genotyped TASs and cross-validated TASs. Cross-validated fraction by detecting approach: Bayesian-based and single-variant, N=16/21 (76%); Bayesian-based, N=9/26 (35%); single-variant, N=10/15 (67%); stepwise regression, N=6/14 (43%).]
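One way to see where the approaches agree is to collapse each hit into its 100-kb chromosomal bin and ask which bins are supported by more than one approach. This sketch assumes hits are (approach, chromosome, position) tuples with hypothetical approach labels; it illustrates the binning only and is not necessarily how the cross-validation reported above was performed.

```python
from collections import defaultdict

BIN_SIZE = 100_000  # 100 kb, the bin size stated on the slide


def bin_id(chrom, pos, bin_size=BIN_SIZE):
    """Map a base-pair position to a (chromosome, bin index) key."""
    return (chrom, pos // bin_size)


def bins_by_approach(hits):
    """hits: iterable of (approach, chrom, pos). Returns bin -> set of approaches."""
    bins = defaultdict(set)
    for approach, chrom, pos in hits:
        bins[bin_id(chrom, pos)].add(approach)
    return dict(bins)


def shared_bins(hits, min_approaches=2):
    """Bins in which at least `min_approaches` different approaches report a hit."""
    return sorted(b for b, a in bins_by_approach(hits).items() if len(a) >= min_approaches)
```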
GWAS identified >1,000 associations for seven yield-related traits
Heterosis
• Genotypes: B73, F1, Mo17
• Understand fundamental biology
• Predict hybrid performance
Schnable and Springer, Annu. Rev. Plant Biol. 2013
Variation in percent heterosis across traits
KRN=Kernel Row Number, CD=Cob Diameter, AKW=Avg. Kernel Weight, CL=Cob Length, CW=Cob Weight, KC=Kernel Count, TKW=Total Kernel Weight
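Percent heterosis is commonly expressed relative to the mid-parent value or to the better (high) parent. The sketch below uses those standard definitions and takes the numerically higher parent as the better one, which assumes larger values are desirable; the B73, Mo17, and F1 values in the usage line are hypothetical, not data from the talk.

```python
def mid_parent_heterosis(f1, p1, p2):
    """Percent deviation of the F1 from the mean of its two parents."""
    mp = (p1 + p2) / 2
    return 100 * (f1 - mp) / mp


def high_parent_heterosis(f1, p1, p2):
    """Percent deviation of the F1 from the higher-valued parent."""
    hp = max(p1, p2)
    return 100 * (f1 - hp) / hp


# Hypothetical trait values: B73 = 100, Mo17 = 80, F1 = 135
mph = mid_parent_heterosis(135, 100, 80)   # 50.0
hph = high_parent_heterosis(135, 100, 80)  # 35.0
```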
Inclusion of dominant gene action improves predictions
[Figure: percentage of HPH heritability explained, with bars labeled Additive, Dominance, and General, for the four GWAS populations and for the diallel population only; the unexplained remainder is labeled "Missing heritability"]
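Why a dominance term can recover variance that an additive-only model misses can be seen at a single locus. This sketch assumes genotypes coded as the count of one allele (0, 1, 2), an additive covariate equal to that count, and a dominance covariate that flags heterozygotes. Because the columns 1, x, and the heterozygote flag span the three genotype classes, the model μ + a·x + d·z fits the class means exactly, so its R² equals the between-genotype share of variance. The overdominant toy phenotypes are hypothetical.

```python
def encode(g):
    """Return (additive, dominance) covariates for a genotype coded 0, 1 or 2."""
    return g, (1 if g == 1 else 0)


def r2_additive(genos, ys):
    """R^2 of the additive-only regression y = mu + a*x."""
    xs = [encode(g)[0] for g in genos]
    n = len(ys)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def r2_additive_dominance(genos, ys):
    """R^2 of y = mu + a*x + d*z, which equals SS_between / SS_total over genotype classes."""
    my = sum(ys) / len(ys)
    groups = {}
    for g, y in zip(genos, ys):
        groups.setdefault(g, []).append(y)
    ss_between = sum(len(v) * (sum(v) / len(v) - my) ** 2 for v in groups.values())
    ss_total = sum((y - my) ** 2 for y in ys)
    return ss_between / ss_total


# Overdominant toy locus: heterozygotes exceed both homozygotes
genos = [0] * 4 + [1] * 4 + [2] * 4
ys = [10.0] * 4 + [14.0] * 4 + [10.0] * 4
```

Here the additive-only R² is essentially zero while the additive-plus-dominance R² is 1, a single-locus caricature of "missing" heritability.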
Classical Models for Heterosis
[Figure (Zamir): a cross of AA bb × aa BB inbreds yields an Aa Bb hybrid. Complementation model: additive or dominant gene action. Over-dominance model: over-dominant gene action.]
Degree of Dominance for TASs
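Degree of dominance (h), where d denotes the dominant effect and a denotes the additive effect:
$$h = \frac{d}{a}$$
[Figure: genotype classes AA, AB, and BB on the phenotypic scale, with a and d marked]
• Positive dominance: h > 0.5
• Negative dominance: h < -0.5
• Additive: -0.5 <= h <= 0.5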
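Given mean phenotypes for the two homozygotes and the heterozygote at a TAS, a and d can be estimated with the usual midpoint parameterization (a = half the difference between the homozygotes, d = deviation of the heterozygote from their midpoint), and h = d/a can then be classified with the thresholds above. Taking |a| so that a positive h means the heterozygote lies toward the higher homozygote is an assumption of this sketch, as are the example means; the final helper counts positive versus negative dominance loci, the ratio the summary relates to percent heterosis.

```python
def dominance_degree(mean_aa, mean_ab, mean_bb):
    """h = d / a from genotype-class means (midpoint parameterization)."""
    a = abs(mean_bb - mean_aa) / 2          # additive effect
    d = mean_ab - (mean_aa + mean_bb) / 2   # dominance deviation of the heterozygote
    if a == 0:
        raise ValueError("homozygote means are equal; h = d/a is undefined")
    return d / a


def classify(h):
    """Apply the slide's thresholds to a degree-of-dominance value."""
    if h > 0.5:
        return "positive dominance"
    if h < -0.5:
        return "negative dominance"
    return "additive"


def positive_negative_ratio(hs):
    """Number of positive-dominance loci divided by number of negative-dominance loci."""
    labels = [classify(h) for h in hs]
    pos = labels.count("positive dominance")
    neg = labels.count("negative dominance")
    return pos / neg if neg else float("inf")
```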
Trait-Associated SNP Effects
*Dominance includes true dominance, over-dominance, and pseudo-overdominance
Phenotype (P) = Genotype (G) + Environment (E) + GxE
• Genotype: NGS revolution and GBS
• Environment: weather, soil type, water, nutrients, disease pressure, agronomic practices, etc.
• GxE interactions complicate phenotypic predictions, but offer fascinating avenues of investigation
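For a balanced table of genotype-by-environment cell means, the G, E, and GxE terms can be made concrete with the classical two-way decomposition: each cell mean equals the grand mean plus a genotype main effect, an environment main effect, and an interaction residual. This is a textbook sketch on hypothetical numbers that ignores replication and error; it is not the statistical model used in the talk.

```python
def decompose(table):
    """table[i][j]: mean phenotype of genotype i in environment j (balanced, no missing cells).

    Returns (grand mean, genotype effects, environment effects, GxE residuals).
    """
    n_g, n_e = len(table), len(table[0])
    mu = sum(sum(row) for row in table) / (n_g * n_e)
    g = [sum(row) / n_e - mu for row in table]
    e = [sum(table[i][j] for i in range(n_g)) / n_g - mu for j in range(n_e)]
    gxe = [[table[i][j] - mu - g[i] - e[j] for j in range(n_e)] for i in range(n_g)]
    return mu, g, e, gxe


# No interaction: both genotypes respond equally to the environment, so GxE is zero
parallel = decompose([[10, 6], [12, 8]])
# Crossover: genotype ranking flips between environments, so GxE is non-zero
crossover = decompose([[10, 6], [7, 9]])
```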
[Figure: U.S. Drought Monitor map, October 1, 2013 (valid 7 a.m. EDT; released Thursday, Oct. 3, 2013; author David Miskus; NOAA/NWS/NCEP/CPC; http://droughtmonitor.unl.edu/). Intensity: D0 Abnormally Dry, D1 Moderate Drought, D2 Severe Drought, D3 Extreme Drought, D4 Exceptional Drought. Drought impact types: S = short-term, typically less than 6 months (e.g., agriculture, grasslands); L = long-term, typically greater than 6 months (e.g., hydrology, ecology).]
E and GxE complicate phenotypic predictions
• Strategies for dealing with “E” and “GxE”
– Study traits that are stable across E
– Conduct studies in controlled environments, taking E and GxE out of the equation
– Control for and study the effects of E and GxE statistically, and embrace the opportunity to gain a deeper understanding of the underlying biology
Field-Based Phenotyping
• Sensors mounted on field-deployed
robots/UAV (expensive)
• Inexpensive, field-based sensors
• Unmanned Aerial Vehicles (UAVs)
[Figure: “Next Generation Phenotyping” (Lie Tang, Maria Salas Fernandez). A John Deere sub-compact utility tractor equipped with a Topcon universal auto-steer system and RTK-GPS carries 3D TOF cameras and an NIR stereo camera on a lead screw drive; top and back views show two camera view angles over rows 1-4 at 60-inch row spacing, with a 4 m dimension marked.]
Phenobot
Stop-Action Photography for Phenomics
James Schnable Univ of NE
Yong Suk Chung Iowa State Univ
Dynamic Responses to Drought
Unmanned Aerial Vehicles (UAVs)
Phenotype (P) = Genotype (G) + Environment (E) + GxE
Predictive Models Will:
• Improve the accuracy of selection in plant breeding programs, thereby increasing the rate of genetic gain per year
• Enhance our ability to efficiently breed crops to withstand the increased weather variability associated with global climate change
• Improve our ability to provide farmers with evidence-based recommendations for the appropriate varieties to plant in a given field, under a particular management practice, in a given year, leading to greater farmer profits and enhanced yield stability
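The link between selection accuracy and genetic gain per year can be made concrete with the standard breeder's equation, ΔG/year = i · r · σ_A / L, where i is selection intensity, r the accuracy of selection, σ_A the additive genetic standard deviation, and L the number of years per breeding cycle. The formula is textbook quantitative genetics rather than something stated in the talk, and the parameter values below are hypothetical.

```python
def genetic_gain_per_year(intensity, accuracy, sigma_a, years_per_cycle):
    """Breeder's equation: expected response to selection per year."""
    return intensity * accuracy * sigma_a / years_per_cycle


# Hypothetical program: i = 1.4, sigma_A = 0.5 t/ha, 4-year cycle
base = genetic_gain_per_year(1.4, 0.5, 0.5, 4)    # accuracy 0.5
better = genetic_gain_per_year(1.4, 0.7, 0.5, 4)  # accuracy 0.7
```

Holding everything else fixed, raising accuracy from 0.5 to 0.7 raises the expected gain per year by the same factor of 1.4.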
Summary
• DNA sequence variation (SNPs) can explain 40-70% of genetic variation (considering only additive gene action) or 80-90% (including dominant gene action)
• Dominant effects explain much of the missing heritability
• The ratio of loci exhibiting positive dominant gene action to those exhibiting negative dominant gene action is correlated with the degree of heterosis for that trait
• Determining which loci confer positive and negative heterosis for specific traits may increase our ability to predict hybrid performance
• Phenomics is a bottleneck in GWAS, genomic selection (GS), and breeding
• Field-based sensors will allow us to study the genetics of dynamic traits rather than being limited to end-point traits
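The contrast between end-point and dynamic traits can be illustrated with a time series of plant heights from a field sensor: the end-point trait is the final height, while a dynamic trait such as the maximum growth rate and when it occurred is only visible in the time series. The measurement schedule (days after planting), units (cm), and values are hypothetical.

```python
def endpoint_trait(heights):
    """End-point trait: the last measurement in the series."""
    return heights[-1]


def dynamic_traits(days, heights):
    """Maximum growth rate between consecutive measurements and the midpoint day of that interval."""
    rates = [((h1 - h0) / (d1 - d0), (d0 + d1) / 2)
             for d0, d1, h0, h1 in zip(days, days[1:], heights, heights[1:])]
    return max(rates, key=lambda r: r[0])  # first interval with the highest rate


days = [30, 40, 50, 60, 70]
plant_a = [40, 90, 170, 230, 250]  # grows fast early
plant_b = [40, 60, 100, 170, 250]  # grows fast late
```

Both plants share the same end-point height, yet their peak growth occurs 20 days apart, a difference an end-point measurement cannot detect.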
PSS has IP and equity interests in Data2Bio LLC
Data2Bio, LLC
• Founded in 2010, Data2Bio designs, executes, analyzes and interprets research projects involving next-generation sequencing
• Core strengths are experimental design, genomics, bioinformatics, and breeding support
• Academic and private-sector customers on all continents except Antarctica
• Proprietary genomic technologies associated with DNA barcoding and genotyping-by-sequencing (tGBS™), as well as proprietary bioinformatic pipelines