Post on 13-Jan-2020
R/qtl2: high-dimensional data & multi-parent populations
Karl Broman
Biostatistics & Medical Informatics, UW–Madison
kbroman.org
github.com/kbroman
@kwbroman
Slides: bit.ly/uncc2017
New: University of Wisconsin–Madison PhD in Biomedical Data Science
bit.ly/MadBDS
17 years of R/qtl
[Figure: lines of code in R/qtl by year, 2000 to 2017, shown separately for R, C, and doc; y-axis 0 to 40000; annotations: idea, svn, git]
Intercross
P1 P2
F1 F1
F2
Data
QTL mapping
[Figure: LOD score (0 to 8) by chromosome (1 to 19, X); inset panel with genotypes BB, BR, RR on an axis from 0.8 to 1.1]
Good things
▶ some of the code
▶ basics of the user interface
▶ diagnostics and data visualization
▶ quite comprehensive
▶ quite flexible
Bad things
Input file
     A      B       C    D    E        F        G        H         I
 1   liver  spleen  sex  pgm  D1Mit18  D1Mit80  D1Mit17  D2Mit379  D2Mit75
 2                            1        1        1        2         2
 3                            27.3     51.4     110.4    38.3      48.1
 4   61.92  153.16  m    1    BB       SB       SB       SB        SB
 5   88.33  178.58  m    1    -        -        -        BB        BB
 6   58     131.91  m    1    BB       SB       SB       SB        SB
 7   78.06  126.13  m    1    SB       SB       BB       SS        SS
 8   65.31  181.05  m    1    -        -        -        SB        SB
 9   59.26  191.54  m    1    -        -        -        SS        SS
10   59.47  154.88  m    1    BB       BB       BB       SB        SB
11   65.63  184.12  m    1    -        -        -        SB        SB
12   38.64  133.05  m    1    SB       BB       SB       SB        SB
13   60.94  275.63  m    1    -        -        -        SB        BB
14   51.48  395.25  m    1    -        -        -        SB        BB
15   47.12  260.45  m    1    BB       SB       SB       BB        BB
Stupidest code ever
n <- ncol(data)
temp <- rep(FALSE, n)
for(i in 1:n) {
    temp[i] <- all(data[2,1:i]=="")
    if(!temp[i]) break
}
if(!any(temp)) stop("...")
n.phe <- max((1:n)[temp])
kbroman.org/blog/2011/08/17/the-stupidest-r-code-ever
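For comparison, here is a minimal sketch of a more direct way to get the same count. It assumes, as the snippet above does, that `data` holds the input file as character cells and that row 2 is blank under the phenotype columns. The helper name `count_pheno_cols`, the error message, and the small `example` matrix are hypothetical and not part of R/qtl; this is one possible rewrite, not necessarily the one the blog post arrives at.

```r
# Hypothetical helper (not part of R/qtl): count the phenotype columns,
# i.e. the run of empty cells at the start of row 2 of the input file.
count_pheno_cols <- function(data) {
  # TRUE where the cell in row 2 is empty; unlist() handles data frames
  blank <- unlist(data[2, ]) == ""
  # cumprod() stays at 1 only while every cell so far has been blank,
  # so its sum is the length of the leading run of blanks
  n_phe <- sum(cumprod(blank))
  if (n_phe == 0) stop("row 2 does not start with a blank cell")
  n_phe
}

# Illustrative data only: four phenotype columns, then two markers
example <- rbind(
  c("liver", "spleen", "sex", "pgm", "D1Mit18", "D1Mit80"),
  c("",      "",       "",    "",    "1",       "1"),
  c("",      "",       "",    "",    "27.3",    "51.4")
)
n_phe <- count_pheno_cols(example)
```

Like the loop, this stops with an error when the first cell of row 2 is not blank, and it returns the full column count when every cell in row 2 is blank.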
Open source means everyone can see my stupid mistakes
Version control means everyone can see every stupid mistake I’ve ever made
Documentation
Support
Congenic line
[Figure: chromosomes 1 to 19, X, Y]
Improving precision
▶ more recombinations
▶ more individuals
▶ more precise phenotype
▶ lower-level phenotypes
    – transcripts, proteins, metabolites
Advanced intercross lines
P
A B
F2
F3
F4
F7
F10
Recombinant inbred lines
P1 P2
F1 F1
F2
F3
F4
●
●
●
F∞
Collaborative Cross
G0
A B C D E F G H
A B C D E F G H
G1
G2
ABCD EFGH
G3
G4
●
●
●
G∞
Heterogeneous stock
G0
A B C D E F G H
G1
G2
G10
G15
G20
Challenges: diagnostics
kbroman.org/blog/2012/04/25/microarrays-suck
▶ What might have gone wrong?
▶ How might it be revealed?
▶ Make lots of graphs
▶ Follow up artifacts
Challenges: scale of results
genotypes phenotypes
results
Challenges: organizing, automating
genotypes phenotypes
Challenges: metadata
What the heck is "FAD_NAD SI 8.3_3.3G"?
What was the question again?
R/qtl2
▶ High-density genotypes
▶ High-dimensional phenotypes
▶ Multi-parent populations
▶ Linear mixed models
kbroman.org/qtl2
R/qtl2: Let’s not make the same mistakes
▶ C++ and Rcpp
▶ Roxygen2 for documentation
▶ Unit tests
▶ A single “switch” for cross type

▶ Split into multiple packages
▶ Yet another data input format
▶ Flatter data structures, but still complex
Sustainable academic software
Acknowledgments
Danny Arends, Gary Churchill, Nick Furlotte, Dan Gatti, Ritsert Jansen, Pjotr Prins, Śaunak Sen, Petr Simecek, Artem Tarasov, Hao Wu, Brian Yandell
Robert Corty, Timothée Flutre, Lars Ronnegard, Rohan Shah, Laura Shannon, Quoc Tran, Aaron Wolen
NIH/NIGMS
Slides: bit.ly/uncc2017
kbroman.org
kbroman.org/qtl2
github.com/kbroman
@kwbroman