For Research Use Only. Not for use in diagnostics procedures. © Copyright 2016 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences.
BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. All other trademarks are the sole property of their respective owners.
Long reads and heterozygous SNPs allow you to phase reads into two haplotypes
Candidate gene screening using long-read sequencing
Jenny Ekholm, Ting Hon, Yu-Chih Tsai, David Greenberg, Tyson A. Clark, Steve Kujawa
PacBio, Menlo Park, CA, USA
We have developed several candidate gene
screening applications for both neuromuscular
and neurological disorders. The power behind
these applications comes from long-read
sequencing, which gives access to previously
unresolvable and even unsequenceable genomic
regions. SMRT Sequencing offers uniform
coverage, a lack of sequence-context bias, and
very high accuracy. It can also directly detect
epigenetic signatures and characterize full-length
gene transcripts through assembly-free isoform
sequencing.
SMRT Sequencing overview
Alzheimer's disease (AD): TOMM40 gene
References
1) Roses AD, et al. (2010). A TOMM40 variable-length polymorphism predicts the age of late-onset Alzheimer's disease. Pharmacogenomics J. 10(5):375-84.
2) Sekar A, et al. (2016). Schizophrenia risk from complex variation of complement component 4. Nature. 530(7589):177-83.
Long-read sequencing data properties
Data generated from a 20 kb size-selected
human library on the PacBio RS II, using
6-hour movies and P6-C4 chemistry, and
analyzed with SMRT® Analysis v2.3.
Each SMRT Cell generates ~55,000 reads.
E. coli 20 kb-insert library, SMRT
Analysis v2.3
[Figure: template strands with and without a modified base (mA). Example: N6-methyladenine]
In addition to calling the bases, SMRT
Sequencing uses the kinetic information from
each nucleotide to distinguish between modified
and native bases.
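As a rough illustration of how kinetic information can flag a modified base, the sketch below compares the mean interpulse duration (IPD) at each template position against an unmodified control and reports positions whose ratio reaches a cutoff. The function name, the toy IPD values, and the cutoff of 2.0 are assumptions chosen for illustration; they are not taken from the poster or from SMRT Analysis.

```python
from statistics import mean

def ipd_ratio_calls(native_ipds, control_ipds, cutoff=2.0):
    """Return (position, ratio) pairs where native/control mean IPD >= cutoff.

    native_ipds and control_ipds are lists with one entry per template
    position; each entry is a list of per-read IPD measurements (seconds).
    """
    calls = []
    for pos, (native, control) in enumerate(zip(native_ipds, control_ipds)):
        ratio = mean(native) / mean(control)
        if ratio >= cutoff:
            calls.append((pos, round(ratio, 2)))
    return calls

# Toy data (hypothetical): 4 template positions, 3 reads each.
native = [[0.2, 0.3, 0.25], [0.9, 1.1, 1.0], [0.2, 0.2, 0.2], [0.3, 0.3, 0.3]]
control = [[0.2, 0.3, 0.25], [0.2, 0.2, 0.2], [0.2, 0.2, 0.2], [0.3, 0.3, 0.3]]
calls = ipd_ratio_calls(native, control)
```

Here only the second position (index 1) shows a slowed polymerase relative to the control, so it is the only candidate modification reported. A production caller would also model sequence context and test for statistical significance.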
Figure 1 (panels a and b): Experimental pipeline and Informatics pipeline

Experimental pipeline
1. Poly(A) mRNA
2. cDNA synthesis with adapters
3. Size selection & PCR amplification
4. SMRTbell ligation
5. Sequel/PacBio RS II sequencing

Informatics pipeline
- PacBio raw sequence reads
- Remove adapters and artifacts (clean sequence reads)
- Classify sequence reads
- Reads clustering (isoform clusters)
- Consensus calling (nonredundant transcript isoforms)
- Quality filtering (final isoforms)
- Map to reference genome (evidence-based gene models)

Read diagram labels: Raw; 5' primer; coding sequence; poly(A) tail; 3' primer; SMRT adapter; Reads of Insert.

Resources
- DevNet: Iso-Seq wiki page
- SampleNet: Iso-Seq Method with Clontech cDNA Synthesis Kit
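The read-classification step of the informatics pipeline can be sketched as a check for the elements shown in the read diagram: a 5' primer, a poly(A) tail, and a 3' primer. The version below is a minimal sketch using exact string matching; the primer sequences, the `min_polya` threshold, and the function names are hypothetical, and real reads would need error-tolerant matching.

```python
def revcomp(seq):
    """Reverse-complement a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))

def classify_read(read, primer5, primer3, min_polya=8):
    """Label a read 'full-length' if, on either strand, it starts with the
    5' primer, ends with the 3' primer, and has a poly(A) run directly
    upstream of the 3' primer. Returns (label, strand, transcript)."""
    for strand, seq in (("+", read), ("-", revcomp(read))):
        if seq.startswith(primer5) and seq.endswith(primer3):
            insert = seq[len(primer5):len(seq) - len(primer3)]
            tail_len = len(insert) - len(insert.rstrip("A"))
            if tail_len >= min_polya:
                return "full-length", strand, insert[:len(insert) - tail_len]
    return "non-full-length", None, None

# Hypothetical primers and transcript, for illustration only.
P5, P3 = "AAGCAGTGGT", "GTACTCTGCG"
read = P5 + "ATGGCCTTGACG" + "A" * 12 + P3
result_fwd = classify_read(read, P5, P3)
result_rev = classify_read(revcomp(read), P5, P3)
result_partial = classify_read("ATGGCCTTGACG", P5, P3)
```

Checking both strands matters because a read can come off either strand of the SMRTbell template; the sketch reports which orientation matched.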
A variable-length poly-T repeat at rs10524523,
within intron 6 of the TOMM40 gene, affects the
age of onset of AD in combination with the
APOE3 allele.1)
The C4 structural variant is highly complex:2)
- Two functionally distinct genes (isotypes): C4A
and C4B
- Both isotypes can have 1-3 functional copies
- A human endogenous retroviral (HERV) insertion
in intron 9 changes the length of the gene
[Figure: Sample 1 reads shown as All, Phase 0, and Phase 1]
Schizophrenia: C4 gene
[Figure annotations: the HERV insertion determines if the gene is Long or Short; a 2 aa difference in exon 26 determines the C4A or C4B isotype; different combinations are possible (L/S/C4A/C4B)]
We successfully captured and sequenced the
TOMM40 poly-T repeat and determined that the
sample carries one Short (15) allele and one
Very Long (34) allele.
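A call like the one above can be sketched by measuring the longest T run in each read of a phase and taking the most common length. The size cutoffs used here (Short up to 19, Long 20 to 29, Very Long 30 and above) are an assumption to verify against reference 1; the flanking sequences and function names are hypothetical.

```python
import re
from collections import Counter

def longest_t_run(seq):
    """Length of the longest uninterrupted run of T in seq (0 if none)."""
    return max((len(run) for run in re.findall(r"T+", seq)), default=0)

def size_class(n):
    """Assumed size classes; confirm cutoffs against the cited study."""
    if n <= 19:
        return "Short"
    if n <= 29:
        return "Long"
    return "Very Long"

def call_allele(reads):
    """Return (modal poly-T length, size class) for the reads of one phase."""
    lengths = Counter(longest_t_run(r) for r in reads)
    length, _ = lengths.most_common(1)[0]
    return length, size_class(length)

# Toy reads (hypothetical flanks), already split by phase.
phase0 = ["GGC" + "T" * 15 + "GAG"] * 3 + ["GGC" + "T" * 14 + "GAG"]
phase1 = ["GGC" + "T" * 34 + "GAG"] * 3
allele0 = call_allele(phase0)
allele1 = call_allele(phase1)
```

Taking the modal length rather than the mean keeps a single read with a homopolymer length error from shifting the call.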
[Figure: C4 reads split into Phase 0 and Phase 1, with intron 9 marked]
Example: Isotype C4A/L was successfully called for a sample using long-read sequencing
Gene isoform characterization
Amplification-free targeted sequencing using CRISPR/Cas9
Repeat expansion disorders are challenging to
interrogate because of their long repetitive regions.
Using CRISPR/Cas9, we are able to access the
repeat counts, interruption sequences, and
epigenetic information without introducing PCR
bias.
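Once a read spanning the repeat is in hand, the repeat count and any interruptions can be read off directly. The sketch below assumes the repeat tract has already been extracted from between its flanking sequences and walks it one unit at a time; the function name and the toy tract are hypothetical.

```python
def count_repeat_units(tract, unit="CAG"):
    """Step through tract in unit-sized windows.

    Returns (number of windows equal to unit,
             list of (window index, sequence) for windows that differ).
    Trailing bases shorter than one unit are ignored.
    """
    step = len(unit)
    count = 0
    interruptions = []
    for i in range(0, len(tract) - step + 1, step):
        window = tract[i:i + step]
        if window == unit:
            count += 1
        else:
            interruptions.append((i // step, window))
    return count, interruptions

# Toy tract: five CAG units, one CAA interruption, two more CAG units.
tract = "CAG" * 5 + "CAA" + "CAG" * 2
repeat_count, interruptions = count_repeat_units(tract)
```

The same function works for other repeat units, for example `unit="CGG"`, as long as the tract is in frame with the unit.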
Targeted sequencing and multiplexing
Targeted sequencing workflow
Barcoding options
1. Barcoded adapters
2. Barcoded Universal Primer
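With either barcoding option, pooled reads have to be assigned back to their samples after sequencing. A minimal sketch of that demultiplexing step follows, assuming each read begins with its sample barcode and using exact matching; the barcode sequences and sample names are hypothetical.

```python
def demultiplex(reads, barcodes):
    """Assign each read to the sample whose barcode it starts with.

    barcodes maps sample name -> barcode sequence. The barcode is trimmed
    from assigned reads; reads with no match go to 'unassigned'.
    """
    bins = {name: [] for name in barcodes}
    bins["unassigned"] = []
    for read in reads:
        for name, barcode in barcodes.items():
            if read.startswith(barcode):
                bins[name].append(read[len(barcode):])
                break
        else:
            bins["unassigned"].append(read)
    return bins

# Hypothetical barcodes and pooled reads.
barcodes = {"sample1": "ACGTAC", "sample2": "TGCATG"}
pooled = ["ACGTACGGGTTT", "TGCATGCCCAAA", "GGGGGGAAAA"]
bins = demultiplex(pooled, barcodes)
```

Real barcode callers score approximate matches on both read ends, since sequencing errors in the barcode would otherwise send valid reads to the unassigned bin.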
C4 plays a role in signaling which connections
between neurons should be “pruned,” or removed,
as the brain develops after childhood. The more
C4 is present, the higher the risk of developing
schizophrenia. Certain versions of the C4 gene
seem to increase the risk of developing
schizophrenia by 27 to 50 percent.
We successfully captured and sequenced the
C4 gene as part of an MHC capture panel. We
were able to see the two different C4 isotypes
(C4A and C4B) as well as the 7 kb
HERV insertion in intron 9.
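Separating the reads of a sample into its two copies relies on the point made elsewhere on this poster: long reads and heterozygous SNPs allow reads to be phased into two haplotypes. The sketch below is a simple greedy illustration of that idea, not the method used to produce these results. It assumes each read is given as a mapping from SNP position to observed base, that the heterozygous sites and their two alleles are already known, and that each read overlaps at least one site seen in an earlier read.

```python
def phase_reads(reads, het_sites):
    """Greedy two-haplotype phasing.

    reads: list of dicts {position: base}
    het_sites: dict {position: (allele_a, allele_b)}
    Returns one label per read: 0, 1, or None if the read covers no
    heterozygous site with a recognized allele.
    """
    hap0 = {}  # position -> allele assigned to phase 0
    labels = []
    for read in reads:
        informative = [p for p in read
                       if p in het_sites and read[p] in het_sites[p]]
        if not informative:
            labels.append(None)
            continue
        # Vote using sites already assigned to a phase.
        votes = sum(1 if read[p] == hap0[p] else -1
                    for p in informative if p in hap0)
        phase = 0 if votes >= 0 else 1
        # Extend the phase-0 haplotype with sites seen for the first time.
        for p in informative:
            if p not in hap0:
                a, b = het_sites[p]
                other = b if read[p] == a else a
                hap0[p] = read[p] if phase == 0 else other
        labels.append(phase)
    return labels

# Toy example: two heterozygous SNPs about 5 kb apart (hypothetical).
het_sites = {100: ("A", "G"), 5000: ("C", "T")}
reads = [{100: "A", 5000: "C"}, {100: "G", 5000: "T"}, {100: "A"}, {5000: "T"}]
labels = phase_reads(reads, het_sites)
```

The first read spans both SNPs and therefore links them, which is what lets the shorter reads covering only one site be placed in the correct phase.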
The Huntingtin gene
CAG repeat counts in HD patients
HTT gene in HD sample
Direct methylation detection of FMR1 pre-mutation sample
CGG repeat region appears to be heavily methylated (5mC)
gDNA & transcripts from SK-BR-3 cell line captured with NimbleGen Oncology Panel - example AURKA gene
Phased transcripts reveal retained introns and skipped exons
[Figure panels: Read Length; Accuracy]