Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help...
-
Upload
randolph-york -
Category
Documents
-
view
224 -
download
5
Transcript of Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help...
![Page 1: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/1.jpg)
Visualising NGS data in GBrowse 2
August 2009 GMOD Meeting6-7 August 2009
Dave ClementsGMOD Help DeskNational Evolutionary Synthesis Center (NESCent)[email protected]
![Page 2: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/2.jpg)
Overview
• Visualizing Next Generation Sequence in GBrowse– Using GBrowse with SAMtools as a short read
viewer• Whole genome resequencing of E. coli strains
– GBrowse for population genetics• SNPs in threespine stickleback
– Other visualizations
• Next Generation Sequencing & Bioinformatics
![Page 3: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/3.jpg)
SAMtools
Platform neutral set of programs and file formats specifically for short reads.
Heng Li, et al., http://samtools.sf.net
![Page 4: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/4.jpg)
QuickTime™ and a decompressor
are needed to see this picture.
E. coli whole genome resequencing
Tale of 4 strains• Ancestral
– Reference
• Derived 1– Manipulated in two places
(neutral, metabolic)– Exposed to phage yielding
2 resistant strains
• Derived 1A– 1bp change
• Derived 1B– 2-3kbp deletion
Brendan Bohannon, Liz Perry, Nick Stiffler, University of Oregon
Ancestral
Derived 1
Derived 1A
Derived 1B
![Page 5: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/5.jpg)
E. coli: Process
• Extract DNA from 3 derived strains
• Sonicate, aiming for 500bp fragments
• Unpaired end run on an Illumina GA2
• Filter results for quality
• Align with MAQ*
• Convert to BAM (SAMtools binary format)
• Visualize with GBrowse
* Li H., Ruan J. and Durbin R. (2008) Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res., 18 (11), 1851-1858. http://maq.sf.net
![Page 6: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/6.jpg)
GBrowse asan Alignment ViewerHigh magnification
view: 100bp
Uses GBrowse 2 (Beta) and the Bio::DB::Sam GBrowse database adaptor (Alpha), and SAMtools
![Page 7: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/7.jpg)
GBrowse ConfigurationTo produce this:
DB stanza
Track stanza
Add callback
![Page 8: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/8.jpg)
GBrowse Conf
Anything in SAM format is accessible.
![Page 9: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/9.jpg)
GBrowse asan Alignment ViewerAs you zoom out to
200bp you lose letters.
As you zoom out to 2000bp the view becomes much less useful.
SAMtools, GBrowse 2, & Bio::DB::Sam adaptor make this volume of data computationally tractable
![Page 10: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/10.jpg)
GBrowse as an Alignment ViewerSummarize!
SAMtools and GBrowse help here
SAMtools can summarize values from the data.
![Page 11: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/11.jpg)
With a little bit of script writing you can see more.
Read coverage by strand.
GBrowse as an Alignment ViewerEven basic summarization can
be enough to gain insight.
GC content & read coverage for first 200Kbp of E. coli
![Page 12: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/12.jpg)
Overview
• Visualizing Next Generation Sequence in GBrowse– SAMtools, and GBrowse as a short read viewer
• Whole genome resequencing of E. coli strains
– GBrowse for population genetics• SNPs in threespine stickleback
– Other visualizations
• Next Generation Sequencing & Bioinformatics
![Page 13: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/13.jpg)
GBrowse for Population Genetics
Threespine Stickleback• Tale of 2 populations, 8 (or 12) fish from each
– Rabbit Slough, marine• ancestral, reference• High body plating
– Bearpaw Lake, freshwater • Diverged in last 10-15,000 years• Low body plating
• Pattern repeats all over northern hemisphere• Deep sequencing around restriction sites • Aiming to identify SNPs at a minimum density,
genome wide
Bill Cresko, Paul Hohenlohe, Nick Stiffler, University of Oregon
![Page 14: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/14.jpg)
GBrowse for Population Genetics
Process for threespine stickleback• Extract DNA from each fish• Break it up with restriction enzymes.• Apply RAD tags with bar code • Do an unpaired-end run on an Illumina GA2• Filter results for quality• Align it with MAQ• Make SNP calls • Visualize it with GBrowse
![Page 15: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/15.jpg)
GBrowse for Population GeneticsShows
Where we lookedAllele & genotype frequencies By population Individual genotypes
Could also show:• Frequency by
phenotype or any other characteristic
• Sliding window stats
![Page 16: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/16.jpg)
Overview
• Visualizing Next Generation Sequence in GBrowse– SAMtools, and GBrowse as a short read viewer
• Whole genome resequencing of E. coli strains
– GBrowse for population genetics• SNPs in threespine stickleback
– Other visualizations
• Next Generation Sequencing & Bioinformatics
![Page 17: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/17.jpg)
HapMap Allele Frequencies
http://hapmap.org
![Page 18: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/18.jpg)
Geolocation data
QuickTime™ and aBMP decompressor
are needed to see this picture.
QuickTime™ and aBMP decompressor
are needed to see this picture.
Ben Faga
![Page 19: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/19.jpg)
Overview
• Visualizing Next Generation Sequence in GBrowse– SAMtools, and GBrowse as a short read viewer
• Whole genome resequencing of E. coli strains
– GBrowse for population genetics• SNPs in threespine stickleback
– Other visualizations
• Next Generation Sequencing & Bioinformatics
![Page 20: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/20.jpg)
Needed Resources
To visualize NGS data you will need– Data– A computer, or two
• How big?
– Bioinformatics support • How much?
![Page 21: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/21.jpg)
Bioinformatics Support for NGS
• Examples– E. coli data
• Given aligned reads, SNP calls• Did software installs and configuration, but only
programming was the single Perl callout
– Threespine Stickleback• Given aligned reads, SNP calls• Did installs and configuration, and some scripting
(Python and Perl)
• To just visualize the data– You need someone who can install and
configure software, and write scripts.
![Page 22: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/22.jpg)
To do more than visualise …
http://www.genomeweb.com/sequencing/users-weigh-next-gen-platforms-over-half-consider-adding-systems-%E2%80%9808
Almost all survey respondents pointed out the considerable computational and bioinformatics needs that the new platforms require. “Anyone thinking of getting these instruments needs a strong IT/informatics group,” wrote one Illumina user.
“Our greatest challenge is the lack of bioinformatics support,” another said.
“Invest in file servers, computer platforms, and computational biologists,” a 454 user said.
An ABI SOLiD user said the greatest challenge has been “data management, interrogation, and visualization.”
GenomeWeb Survey
![Page 23: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/23.jpg)
Acknowledgements
OICRLincoln Stein
Scott CainOregonNick StifflerLiz Perry
Brendan Bohannon Paul Hohenlohe
Bill CreskoPatrick Phillips
OxfordSteve Taylor
Simon MacGowanZong-Pei Han
NESCentTodd VisionHilmar Lapp
SangerHeng Li
![Page 24: Visualising NGS data in GBrowse 2 August 2009 GMOD Meeting 6-7 August 2009 Dave Clements GMOD Help Desk National Evolutionary Synthesis Center (NESCent)](https://reader031.fdocuments.us/reader031/viewer/2022020801/56649ebe5503460f94bc7bc8/html5/thumbnails/24.jpg)
Thank You!
Dave ClementsGMOD Help Desk
National Evolutionary Synthesis Center
[email protected]@gmod.org
http://gmod.org/GMOD_Help_Deskhttp://nescent.org