R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17...
Transcript of R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17...
![Page 1: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/1.jpg)
R/qtl2high-dimensional data & multi-parent populations
Karl Broman
Biostatistics & Medical Informatics, UW–Madison
kbroman.orggithub.com/kbroman
@kwbromanSlides: bit.ly/uncc2017
![Page 2: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/2.jpg)
New
University of Wisconsin–MadisonPhD in Biomedical Data Science
bit.ly/MadBDS
2
![Page 3: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/3.jpg)
17 years of R/qtl
0
5000
10000
15000
20000
25000
30000
35000
40000
●●
●●
●● ●●
●●
●●●●
●
●●
●●●●
●●
●●●●● ●●●●● ● ●●●●●
●
●●● ●●● ● ●
Year
Line
s of
cod
e
●
●● ● ●
● ●●●●
●●●●
●● ●●●●●
●●
●●●●● ●●●●●
● ●●●●●●●●● ●●● ● ●
●●
● ● ● ● ●● ●●●● ●●
●●●●●●●
●●●●●●● ●●●●● ● ●●●●●●●●● ●●● ● ●
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
idea svn git
R
C
doc
3
![Page 4: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/4.jpg)
4
![Page 6: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/6.jpg)
Intercross
P1 P2
F1 F1
F2
6
![Page 7: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/7.jpg)
Data
7
![Page 8: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/8.jpg)
QTL mapping
0
2
4
6
8
Chromosome
LOD
sco
re
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X
BB BR RR
0.8
0.9
1.0
1.1
8
![Page 10: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/10.jpg)
17 years of R/qtl
0
5000
10000
15000
20000
25000
30000
35000
40000
●●
●●
●● ●●
●●
●●●●
●
●●
●●●●
●●
●●●●● ●●●●● ● ●●●●●
●
●●● ●●● ● ●
Year
Line
s of
cod
e
●
●● ● ●
● ●●●●
●●●●
●● ●●●●●
●●
●●●●● ●●●●●
● ●●●●●●●●● ●●● ● ●
●●
● ● ● ● ●● ●●●● ●●
●●●●●●●
●●●●●●● ●●●●● ● ●●●●●●●●● ●●● ● ●
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
idea svn git
R
C
doc
10
![Page 11: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/11.jpg)
Good things
▶ some of the code▶ basics of the user interface▶ diagnostics and data visualization▶ quite comprehensive▶ quite flexible
11
![Page 12: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/12.jpg)
Good things▶ some of the code▶ basics of the user interface▶ diagnostics and data visualization▶ quite comprehensive▶ quite flexible
11
![Page 13: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/13.jpg)
Bad things
12
![Page 14: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/14.jpg)
Input file
BBBBSBSBBB1m260.4547.1215
BBSB−−−1m395.2551.4814
BBSB−−−1m275.6360.9413
SBSBSBBBSB1m133.0538.6412
SBSB−−−1m184.1265.6311
SBSBBBBBBB1m154.8859.4710
SSSS−−−1m191.5459.269
SBSB−−−1m181.0565.318
SSSSBBSBSB1m126.1378.067
SBSBSBSBBB1m131.91586
BBBB−−−1m178.5888.335
SBSBSBSBBB1m153.1661.924
48.138.3110.451.427.33
221112
D2Mit75D2Mit379D1Mit17D1Mit80D1Mit18pgmsexspleenliver1
IHGFEDCBA
13
![Page 15: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/15.jpg)
Stupidest code ever
n <- ncol(data)temp <- rep(FALSE ,n)for(i in 1:n) {
temp[i] <- all(data[2,1:i]=="")if(!temp[i]) break
}if(!any(temp)) stop("...")n.phe <- max((1:n)[temp])
kbroman.org/blog/2011/08/17/the-stupidest-r-code-ever
14
![Page 16: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/16.jpg)
Open source meanseveryone can see my stupid mistakes
Version control meanseveryone can see every stupid mistake I’ve evermade
15
![Page 17: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/17.jpg)
Open source meanseveryone can see my stupid mistakes
Version control meanseveryone can see every stupid mistake I’ve evermade
15
![Page 18: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/18.jpg)
Documentation
16
![Page 19: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/19.jpg)
Support
17
![Page 20: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/20.jpg)
QTL mapping
0
2
4
6
8
Chromosome
LOD
sco
re
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X
BB BR RR
0.8
0.9
1.0
1.1
18
![Page 21: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/21.jpg)
Congenic line1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Y
19
![Page 22: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/22.jpg)
Improving precision
▶ more recombinations▶ more individuals▶ more precise phenotype▶ lower-level phenotypes
– transcripts, proteins, metabolites
20
![Page 23: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/23.jpg)
Advanced intercross linesP
A B
F2
F3
F4
F7
F10
21
![Page 24: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/24.jpg)
Recombinant inbred linesP1 P2
F1 F1
F2
F3
F4
●
●
●
F∞
22
![Page 25: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/25.jpg)
Collaborative CrossG0
A B C D E F G H
A B C D E F G H
G1
G2
ABCD EFGH
G3
G4
●
●
●
G∞
23
![Page 26: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/26.jpg)
Heterogeneous stockG0
A B C D E F G H
G1
G2
G10
G15
G20
24
![Page 28: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/28.jpg)
Challenges: diagnostics
kbroman.org/blog/2012/04/25/microarrays-suck26
![Page 30: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/30.jpg)
Challenges: diagnostics
▶ What might have gone wrong?▶ How might it be revealed?▶ Make lots of graphs▶ Follow up artifacts
28
![Page 31: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/31.jpg)
Challenges: scale of results
genotypes phenotypes
29
![Page 32: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/32.jpg)
Challenges: scale of results
genotypes phenotypes
results
29
![Page 33: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/33.jpg)
Challenges: organizing, automating
genotypes phenotypes
30
![Page 34: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/34.jpg)
Challenges: organizing, automating
genotypes phenotypes
30
![Page 35: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/35.jpg)
Challenges: organizing, automating
genotypes phenotypes
30
![Page 36: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/36.jpg)
Challenges: organizing, automating
genotypes phenotypes
30
![Page 37: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/37.jpg)
Challenges: organizing, automating
genotypes phenotypes
30
![Page 38: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/38.jpg)
Challenges: organizing, automating
genotypes phenotypes
30
![Page 39: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/39.jpg)
Challenges: organizing, automating
genotypes phenotypes
30
![Page 40: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/40.jpg)
Challenges: metadata
What the heck is "FAD_NAD SI 8.3_3.3G"?
31
![Page 41: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/41.jpg)
What was the question again?
32
![Page 43: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/43.jpg)
R/qtl2
▶ High-density genotypes▶ High-dimensional phenotypes▶ Multi-parent populations▶ Linear mixed models
kbroman.org/qtl2
34
![Page 44: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/44.jpg)
R/qtl2: Let’s not make the same mistakes
▶ C++ and Rcpp▶ Roxygen2 for documentation▶ Unit tests▶ A single “switch” for cross type
▶ Split into multiple packages▶ Yet another data input format▶ Flatter data structures, but still complex
35
![Page 45: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/45.jpg)
R/qtl2: Let’s not make the same mistakes
▶ C++ and Rcpp▶ Roxygen2 for documentation▶ Unit tests▶ A single “switch” for cross type▶ Split into multiple packages▶ Yet another data input format▶ Flatter data structures, but still complex
35
![Page 46: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/46.jpg)
Sustainable academic software
36
![Page 47: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/47.jpg)
Acknowledgments
Danny ArendsGary ChurchillNick FurlotteDan GattiRitsert JansenPjotr PrinsŚaunak SenPetr SimecekArtem TarasovHao WuBrian Yandell
Robert CortyTimothée FlutreLars RonnegardRohan ShahLaura ShannonQuoc TranAaron Wolen
NIH/NIGMS
37
![Page 48: R/qtl2 - high-dimensional data & multi-parent populationskbroman/presentations/rqtl2_uncc2017.pdf17 years of R/qtl 0 5000 10000 15000 20000 25000 30000 35000 40000 2 2 2 2 2 222 2](https://reader031.fdocuments.us/reader031/viewer/2022041606/5e348587c82df2487d4bfd9e/html5/thumbnails/48.jpg)
Slides: bit.ly/uncc2017
kbroman.org
kbroman.org/qtl2
github.com/kbroman
@kwbroman
38