Targeted RNA Sequencing, Urban Metagenomics, and Astronaut Genomics

Christopher E. Mason
Associate Professor, Department of Physiology and Biophysics & The Institute for Computational Biomedicine at Weill Cornell Medicine and the Tri-Institutional Program on Computational Biology and Medicine
Fellow of the Information Society Project, Yale Law School
February 11th, 2016
@mason_lab


104 genes, 25ng of RNA


Exome: 2% of the genome?

Human Genome Organization

From Mason and Bozinoski, Kaplan & Sadock's Comprehensive Textbook of Psychiatry, 2016


ENCODE active elements!

These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions.

Some disagree

The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.

Can RNA-Seq replace microarrays?

Marioni and Mason et al., Genome Research, 2008
RNA-Seq: An assessment of technical reproducibility and comparison with gene expression arrays
Original title: The Death Knell to Microarrays.


More DEGs found with RNA-seq
DE in Solexa: 4,959 (37%)
DE in Affy: 1,579 (12%)
Both: 6,534 (50%)
Marioni and Mason et al., Genome Research, 2008

Whilst this is interesting, a more relevant measure might be the number of genes that overlap between the two technologies.

More DE genes found with RNA-Seq

Liu et al, 2010

Estimated log2 fold-change from Solexa (y axis) and Affymetrix (x axis) are highly correlated. We consider only genes that were interrogated using both platforms and genes where the mean number of counts across lanes was greater than 0 for both the liver and kidney samples. Red and green dots represent genes called as differentially expressed based on the Solexa data at an FDR of 0.1%, with a mean number of counts greater than (red), or less than (green), 250 reads. Black dots represent genes not called as differentially expressed based on the Solexa data. The set of differentially expressed genes that show the strongest correlation between the two technologies seem to be those that are mapped to by many reads (colored red), while the correlation is weaker for differentially expressed genes mapped to by fewer reads (green).

RNA-seq gives many views of biology. RNA-seq = Love.
Li S, Nature Biotechnology, 2014.

Sequencing data: reads, alignment on HPC nodes (queries: reads/bp), alignments, sorted BAMs
References: hg19, RefSeq, miRBase, rRNA, adapters; GENCODE annotation
2. Genetic variation (SNVs and indels). Algorithms: STAR/GATK, r-make
3. Differential expression by gene, exon, splice isoform, allele, & transcript. Algorithms: STAR, r-make, ASE, limma-voom, RSEM
4. Find ncRNAs and new genes. Algorithms: r-make, Aceview
5. Gene fusion detection. Algorithms: r-make, Snowshoes
6. Predict polyA sites & gain/loss of miRNA binding sites. Algorithms: r-make, BAGET, AlexaSeq, TargetScan
7. Viruses/Bacteria/Other (remaining reads). Use: BLAST, MetaPhlAN


Many ways to sequence RNA

Sept. 9, 2014
http://www.nature.com/nbt/focus/seqc/index.html

All technologies have varying strengths

GENCODE annotation v24
Coding genes: 19,815
Noncoding genes: 25,823
Pseudogenes: 14,505
IgG/TcR/Other genes: 411
60,554 total
http://www.gencodegenes.org/stats.html

But the lncRNAs hide at low levels

Derrien et al., Genome Research, 2012
http://genome.cshlp.org/content/22/9/1775.full

What else is at low levels? A clinical example: using RNA-Seq to find chemo-resistant clones in ALL

Meyer et al, Nature Genetics, 2013


Only Significantly Associated Clinical Variable Was Early Relapse

Meyer et al, Nature Genetics, 2013

7 of 40

We see functional mutation clustering within the protein

NT5C2: 5'-nucleotidase (purine), cytosolic type II Meyer et al, Nature Genetics, 2013

5'-nucleotidase, cytosolic II

NT5C2 Mutants Confer Chemoresistance to Purine Nucleoside Analogue Treatment (6-MP, 6-TG)
Reh cells transiently infected with lentivirus carrying WT, GFP, and mutants
Meyer et al, Nature Genetics, 2013

Many mutations hide at low frequency: how do we find them?
Meyer et al, Nature Genetics, 2013

Global RNA-sequencing has wiggles

Wang, et al. 2011, Nat Rev Genetics

QIAseq Targeted RNA Panels for gene expression profiling using Digital RNA sequencing

Molecular barcodes enabling Digital RNAseq

Rapid Targeted RNA Panel Design (Life Moves Pretty Fast)

AML recurrent, relapse-specific dysregulated genes
140 patient cohort of diagnosis-relapse pairs of AML: WES, RRBS, RNA-seq on all
Differentially Expressed Genes (DEGs) were found with DESeq2, and also inverse correlation with methylation in gene promoters.
25% methylation difference, q-value 0.01, and >1.5-fold FC
Out of 140 patients, 104 genes found in at least 30% of them
Only 10ng of total RNA left from patient samples
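The selection criteria on this slide can be sketched as a two-step filter: keep per-patient calls that pass the thresholds, then keep genes that recur in at least 30% of patients. This is only an illustration, not the lab's pipeline. The record keys (`gene`, `fold_change`, `meth_diff`, `q_value`), the reading of "q-value 0.01" as an upper bound, the symmetric treatment of up- and down-regulation, and the sign test for "inverse correlation" are all assumptions.

```python
from collections import Counter


def passes_thresholds(call, min_meth_diff=0.25, max_q=0.01, min_fold=1.5):
    """Return True if one per-patient gene call meets the slide's cutoffs.

    call is a dict with hypothetical keys:
      fold_change: relapse / diagnosis expression ratio
      meth_diff:   relapse - diagnosis promoter methylation (fraction, -1..1)
      q_value:     adjusted p-value for the expression change
    """
    fc = call["fold_change"]
    md = call["meth_diff"]
    # Assumption: ">1.5-fold" applies in both directions (up or down).
    strong_change = fc > min_fold or fc < 1.0 / min_fold
    # Assumption: "inverse" means expression and methylation move oppositely.
    inverse = (fc > 1.0) != (md > 0.0)
    return (
        strong_change
        and inverse
        and abs(md) >= min_meth_diff
        and call["q_value"] <= max_q
    )


def recurrent_genes(calls_by_patient, n_patients, min_fraction=0.30):
    """Genes passing the thresholds in at least min_fraction of patients."""
    counts = Counter()
    for calls in calls_by_patient.values():
        # A set ensures each gene is counted once per patient.
        counts.update({c["gene"] for c in calls if passes_thresholds(c)})
    return sorted(g for g, n in counts.items() if n / n_patients >= min_fraction)
```

With a cohort of 140, `recurrent_genes(calls, 140)` would keep a gene only if it passes in 42 or more patients.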

QIAseq Targeted RNA Panels Leverage Barcodes (a.k.a. UMIs)

Digital sequencing using molecular barcode technology
Tagging each cDNA template with a unique barcode
Counting the number of barcodes to correct any amplification artifacts
Providing unparalleled value: accurate and unbiased gene expression analysis with NGS

QIAseq Targeted RNA Panels use a digital sequencing method in which each transcript (after mRNA is converted to cDNA) is tagged with a unique 12-base random molecular barcode prior to any amplification step. Thus, enrichment and amplification events yield a unique combination of molecular barcode and target sequence. At the end of sequencing, the relative amount of each mRNA target is determined by counting the number of unique molecular barcode-target combinations instead of reads, thereby eliminating PCR duplicates and amplification bias, resulting in more accurate, unbiased gene expression analysis.
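The counting step described here can be illustrated with a minimal sketch: group reads by target and count distinct barcodes rather than reads. The function name, the read tuples, and the barcode sequences are made up for illustration. Production pipelines usually also merge barcodes that differ only by sequencing errors, which this sketch does not attempt.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Count distinct barcodes per target.

    reads: iterable of (target, barcode) pairs, one per sequenced read.
    Returns {target: number of unique barcodes seen for that target}.
    """
    barcodes_by_target = defaultdict(set)
    for target, barcode in reads:
        barcodes_by_target[target].add(barcode)
    return {target: len(b) for target, b in barcodes_by_target.items()}


# Hypothetical reads: (gene, 12-base barcode). PCR copies repeat the same pair.
example_reads = [
    ("NT5C2", "ACGTACGTACGT"),
    ("NT5C2", "ACGTACGTACGT"),  # PCR duplicate of the read above
    ("NT5C2", "TTGCAAGGCCTA"),
    ("HIST1H1C", "GGATCCGATTAC"),
    ("HIST1H1C", "GGATCCGATTAC"),  # PCR duplicate
    ("HIST1H1C", "GGATCCGATTAC"),  # PCR duplicate
]

molecule_counts = count_unique_molecules(example_reads)

# Plain read counts, for comparison with the barcode-based counts.
read_counts = {}
for target, _ in example_reads:
    read_counts[target] = read_counts.get(target, 0) + 1
```

By read count the two genes look identical (three reads each). By barcode count NT5C2 has two original molecules and HIST1H1C has one, which is the distinction the barcodes are meant to recover.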

6 hour library prep procedure

6 hours

Very good performance metrics
ERCCs
Accuracy (vs. qPCR): R2 = 0.90
Specificity (on-target reads): >97%
Uniformity (20% of mean): >97%
Reproducibility (lab 1 vs. lab 2): R2 = 0.99
Sensitivity: detect ~0.2 copies of RNA per cell

[Figure: 128 copies, 10 tags]

Easy-to-use, Online custom builder
Choose your own gene content from 20,000+ human genes and lncRNA
Easy to use online Custom Panel Builder to tailor panel specific to your research needs
Input list of genes
Select proper controls (genomic DNA contamination control, HKGs)

Output: list of genomic coordinates for primers designed specific to genes of interest

We used a custom panel consisting of 104 genes of interest + GDCs + 10 HKGs

Targeted RNA Capture Panel ran well on the NextSeq (1x150)

UHU-24-1-E9_S57 VS NBM (Normalized to Top3 HKGs)

Easy to see switch of leukemia genes from normal bone marrow (NBM)

UHU-51-1-H11_S95(group1) AND UHU-51-2-H12_S96 (group2) VS NBM (Normalized to Top3 HKGs)

Differences from Diagnosis to Relapse relative to normal bone marrow (NBM)

HIST1H1C, HIST1H1D, HIST1H2BD, HIST2H2BE

R2 = 0.92; All = 0.86

The Others

But! There is more than one genome:
In your body's cellular democracy, YOU are a minority party:

1.3-10X bacterial:human cells (Zhu et al., 2010; Sender et al., 2016)
150:1 bacterial:human active transcripts in the gut microbiome (Qin et al., 2010)

http://biorxiv.org/content/early/2016/01/06/036103

Jessica Lee Green

Flatworms, cnidarians, Ecteinascidia turbinata, invertebrates, vertebrates, ganesh

Open-Source GIS Cloud App (iOS and Android)

City-scale metagenomics

First city-scale metagenome profile

Data Entry
1. Swab (3 min)
2. Annotate
3. GPS-tag/timestamp
Extract DNA (n=1,457 samples)
96-plex TruSeq/Qiagen Libraries
10.2 billion 125x125 DNA Seqs.
Quality Trim (Q20)
MegaBLAST-LCA alignment
Confirm with MetaPhlAN
Upload
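The "LCA" part of the MegaBLAST-LCA step can be sketched as follows, assuming each alignment hit for a read has already been converted to a root-to-leaf taxonomic lineage. The read is then assigned to the deepest rank shared by all of its hits. The function name and the example lineages are illustrative only and are not taken from the study's code.

```python
def lowest_common_ancestor(lineages):
    """Return the shared root-to-node prefix of several taxonomic lineages.

    lineages: list of lists, each ordered from the root rank to the leaf.
    Returns the longest common prefix; its last element is the LCA.
    """
    if not lineages:
        return []
    shared = []
    # zip stops at the shortest lineage, so ragged inputs are handled.
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Hypothetical hits for one read: two Pseudomonas species.
putida = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
          "Pseudomonadales", "Pseudomonadaceae", "Pseudomonas",
          "Pseudomonas putida"]
stutzeri = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
            "Pseudomonadales", "Pseudomonadaceae", "Pseudomonas",
            "Pseudomonas stutzeri"]

assignment = lowest_common_ancestor([putida, stutzeri])
```

A read whose hits disagree at the species level is reported at the genus level (here, Pseudomonas) instead of being forced onto one species, which is the conservative behavior an LCA step is meant to provide.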

Pseudomonas Density
Half of the world under our fingertips is unknown

Power to the Soil Gave Us All Kingdoms

http://www.wsj.com/articles/big-data-and-bacteria-mapping-the-new-york-subways-dna-1423159629
http://graphics.wsj.com/patho-map/?sel=stn_311

Pseudomonas putida can help absorb chemicals

HMP Comparison Shows That the Subway Looks Like Skin

[Figure: Associated Body Region; Log2 Ratio of (Observed/Expected)]
Staphylococcus epidermidis
Staphylococcus aureus
Acinetobacter radioresistens
Propionibacterium acnes
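The axis quantity in this figure, a log2 ratio of observed to expected, can be written out as a small helper. The function name and the idea of passing raw counts are assumptions for illustration. The slide does not say how the expected values were derived, so none is attempted here.

```python
import math


def log2_enrichment(observed, expected):
    """Return log2(observed / expected).

    Positive values mean more was observed than expected, negative values
    mean less, and 0 means the two agree.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected must both be positive")
    return math.log2(observed / expected)
```

On this scale each unit is a doubling: `log2_enrichment(8, 2)` is 2.0 (four times more than expected) and `log2_enrichment(1, 4)` is -2.0 (four times less).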

>600 species ride the subway with you!

Mostly harmless.

Pathogenicity markers absent.
NY Magazine, November 5th, 2013

molecular echoes

Hourly dynamics of a populated kiosk are far more heterogeneous

[Figure: hourly samples at 8:15, 9:00, 10:00, 11:00, 12:00, 13:00, 14:00, 15:00, 16:00, and 17:00]

Some areas are more stable: Gowanus Canal

blogs.ei.columbia.edu

Apr. 22, 2015

Gowanus Canal was a methanogen heaven

Hurricane-Flooded,

Staphylococcus aureus
Enterococcus faecium

All species 57552 Specific species 10 Pseudoalteromonas haloplanktis

Species diversity varies by area of the city

A persistent molecular echo of the cold, ocean water

Shewanella frigidimarina
An Antarctic species with the ability to produce eicosapentaenoic acid (EPA). It grows anaerobically by dissimilatory Fe(III) reduction. Its cells are motile and rod-shaped.
Frolova, G. M.; Gumerova, P. A.; Romanenko, L. A.; Mikhailov, V. V. (2011). "Characterization of the lipids of psychrophilic bacteria Shewanella frigidimarina isolated from sea ice of the Sea of Japan". Microbiology 80 (1): 30-36.

EPA is obtained in the human diet by eating oily fish or fish oil, e.g. cod liver, herring, mackerel, salmon, menhaden and sardine, and various types of edible seaweed.

The Hygiene Hypothesis
The hygiene hypothesis states that a lack of early childhood exposure to infectious agents, symbiotic microorganisms (e.g. gut flora or probiotics), and parasites increases susceptibility to allergic diseases by suppressing the natural development of the immune system.

"Infants born by cesarean delivery are at increased risk of asthma, obesity and type 1 diabetes, whereas breastfeeding is variably protective against these and other disorders."
- Rob Knight
http://blog.ted.com/how-microbes-could-cure-disease-rob-knight-at-ted2014/

http://j-humphries.deviantart.com/art/Forcefield-397075521

Genotype data can predict your birthplace

Genes mirror geography within Europe
Novembre et al., 2008

http://www.cnn.com/2013/09/04/tech/innovation/dna-face-sculptures/
From Heather Dewey-Hagborg's Stranger Visions at Genspace (Brooklyn)

http://demographics.coopercenter.org/DotMap/index.html

Machine-learning Image Segmentation (BIS)

Human Ancestry Prediction

Predicted Ancestry of DNA left behind can Mirror Census Data in White areas
[Figure: Collection Site = P01461; demographic data vs. ancestry prediction (Ancestry Mapper genetic match), y axis 0-100. Populations: Yoruba, Luhya, African American, Puerto Rican, Spanish, Tuscan, European-Utah, British, Finnish, Han Chinese, Japanese, Colombian, Mexican]
Afshinnekoo E, Meydan C, et al., Cell Systems, 2015.

Alleles appear more Hispanic and more Asian in downtown Manhattan

[Figure: Collection Site = P00951 (Chinatown); demographic data vs. ancestry prediction (Ancestry Mapper genetic match), y axis 0-100. Populations: Yoruba, Luhya, African American, Puerto Rican, Spanish, Tuscan, European-Utah, British, Finnish, Han Chinese, Japanese, Colombian, Mexican]
Afshinnekoo E, Meydan C, et al., Cell Systems, 2015.

North Harlem and Washington Heights show more Yoruban alleles and Puerto Rican alleles

[Figure: Collection Site = P00166; demographic data vs. ancestry prediction (Ancestry Mapper genetic match), y axis 0-100. Populations: Yoruba, Luhya, African American, Puerto Rican, Spanish, Tuscan, European-Utah, British, Finnish, Han Chinese, Japanese, Colombian, Mexican]
We can detect humans' molecular echo
Afshinnekoo E, Meydan C, et al., Cell Systems, 2015.


You can choose what DNA to leave behind

https://www.google.com/patents/US8073628

Should I ride the subway?

Yes! With Ice Cream

Washington D.C.

www.metasub.org

Now a Global Effort: 45 cities
http://www.metasub.org/interactive-map.html

Optimized PCR cycles for QIAseq FX environmental DNA samples (10-25ng)

QIAseq FX generates mechanical-quality DNA fragmentation
Sample-to-sample fragmentation reproducibility:

Customized fragment size from any input or G/C content: 100ng, 1ug

In our hands, fragmentation profiles generated using Qiagen's FX technology are highly comparable to Covaris in both reproducibility and fragment size tunability. In the top two figures, you can see that fragmentation of multiple samples using the same FX reaction condition is highly reproducible, as reported both by Bioanalyzer and by insert size calculated in downstream analysis.

Likewise, in the bottom left figure, we target three different fragment sizes with each of two input amounts, demonstrating both the tunability of the kit and also the flexibility to accommodate a range of DNA inputs.

Finally, in the lower right, we've treated human genomic DNA and a bacterial DNA mixture designed to have broad G/C content with either a 5-minute or 10-minute reaction to demonstrate consistent results regardless of sample origin.

The Olympiome Rio 2016

Before, During, After
Olympiome: Rio 2016, Tokyo 2020

Extreme Context

http://www.extrememicrobiome.org

Extreme Microbiomes for New Biology and Drug Discovery

Biosynthetic Gene Clusters show new drugs right under our fingertips

South Ferry station

PAB03: Metal payphone
PAB07: Plastic sign
PAB09: Metal stairway rail
[Figure: gene cluster maps for PAB03, PAB07, and PAB09; scale bar 1 kb]

Mohamed Donia

Clinical Context: Precision Metagenomics

Evidence of live & antibiotic resistant bacteria

Afshinnekoo E, Meydan C, et al., Cell Systems, 2015.

Metagenomics reveals the likely source of Tetracycline resistance (TetK) on both media

Antibiotic resistance genes

Afshinnekoo E, Meydan C, et al., Cell Systems, 2015.

Examine hospital settings at Chicago (Jack Gilbert) and now at WCMC/MSK

http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1005413

Genomic Classification gives more granularity of species present

Waiting three weeks for a culture is unethical
Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke

Any sufficiently advanced ignorance is indistinguishable from malice.

Still challenges on the informatics: Organism is different from pathogen.
http://read-lab-confederation.github.io/nyc-subway-anthrax-study/

Nonsense mutation

Free computational tool for anyone with suspicious metagenomic samples
https://science.onecodex.com/bacillus-anthracis-panel/

Regardless of source, we should be able to detect what is inside. MetaSUB 2.0 collaborating with CLC

Agreement of tools is close to real number of species in a sample

All Kingdoms Deserve Love In Study and Clinical Practice

All Kingdoms Deserve Pulverization for Study and Clinical Practice

MGRG polyzyme summary

Optimizing MoBio kit for better extraction

Mutanolysin Achromopeptidase Chitinase Lysozyme Lysostaphin Lyticase

@StationCDRKelly: Superb Twitter Feed!

Longitudinal, Integrative Systems Biology

Participatory Medicine with Twin Astronauts

Cells only 36 hours after being in orbit

Blood collection processing protocol
[Figure labels: Plasma; PBMCs; Ficoll plug; Mononuclear cells^; Frozen tubes*; CD4+ cells, CD8+ cells, CD19+ cells (###); Plasma; Lymphocyte depleted cells (LD)]
* All collections (Ground and ISS); one date missed in pre-flight collections
^ Performed once on pre-flight samples
# Magnetic bead based positive selection

Purity validation
On 1/15/15, a CPT tube was obtained from a volunteer donor and subjected to parallel processing with the flight subject sample

Isolation efficiency assessment by flow cytometry performed by Dr. Brian Crucian at JSC (CD4, CD8 and CD19 staining)
Antibodies:
CD8-FITC, human (clone: BW135/80)
CD19-PE, human (clone: LT19)
CD4-APC, human (clone: M-T466)

PBMCs; Lymphocyte depleted cells
CD4+ cells (91%)
CD8+ cells (88%)
CD19+ cells (72%)

Nucleic Acid Extraction
Qiagen AllPrep Kit for DNA & RNA (#80204)
https://www.qiagen.com/us/products/catalog/sample-technologies/rna-sample-technologies/dna-rna-protein/allprep-dnarna-mini-kit

Pre-flight collections yielded high quality nucleic acids for study
[Figure: DNA and RNA gels (10 Kb marker) for CD4, CD8, CD19, and LD fractions. Flight subject: Baseline (10/16/14), Post-vaccination (10/30/14), 1/15/15. Ground subject: Baseline (12/3/14), Post-vaccination (12/13/14), 1/20/15.]
RIN average = 9.85; range = 8.1-10

28/18S rRNA Quality: Good Going In

QIAGEN MagAttract DNA extraction kit

Leveraging single molecules with the 10X Chromium System

Whole Genome Sequencing (WGS) with phased reads: 20-100kb molecules

38-45.3X sequencing depth
22-23K mean molecule length
1.4-1.5 M GEMs detected
2 lanes, 2x150 HiSeq4000

NA12878 HMW control
24X increase in N50 phase block length
All Prep #1
NA12878

We can see drops in coverage when structural variants (SV) appear

TDG gene transposition

Spliced TDG inserted here
Exons
Exonic barcode signals

Scott Kelly: ISS for one year
Mark Kelly: Earth control

Telomere Length

DNA Mutations & Structural Variation
DNA Hydroxy-methylation
Chromatin
(small & large) RNA expression & RNA Methylation
Proteomics
Antibody Titers
Cytokines
DNA Methylation
B-cells / T-cells
Targeted and Global Metabolomics
Microbiome

Cognition

Vasculature

These People are Awesome
@mason_lab


Thanks to the Swabbing Teams! www.pathomap.org/people/

Gratitude to Many People and Places
Illumina: Gary Schroth, Marc Van Oene
Univ. Chicago: Yoav Gilad
Yale University: Nenad Sestan, Sherman Weissman
FDA/SEQC/Fudan Univ.: Leming Shi
NIH/UDP/NCBI: Jean & Danielle Thierry-Mieg
Baylor: Jeff Rogers
MSKCC: Danwei Huangfu, Christina Leslie, Ross Levine
HudsonAlpha: Shawn Levy
Mason Lab: Ebrahim Afshinnekoo, Sofia Ahsanuddin, Noah Alexander, Pradeep Ambrose, Marjan Bozinoski, Dhruva Chandramohan, Sagar Chhangawala, Shanin Chowdhury, Jorge Gandara, Francine Garrett-Bakelman, Elizabeth Hénaff, Sheng Li, Alexa McIntyre, Cem Meydan, Lenore Pipes, Darryl Reeves, Yogesh Saletore, Priyanka Vijay
Cornell/WCMC: Jason Banfelder, Scott Blanchard, Selina Chen-Kiang, Olivier Elemento, Yariv Houvras, Samie Jaffrey, Ari Melnick, Margaret Ross, Adam Siepel, Epigenomics Core
Horner Lab: Stacy Horner
Icahn/MSSM: Eric Schadt, Andrew Kasarskis, Joel Dudley, Ali Bashir, Bobby Sebra
ABRF: George Grills, Scott Tighe, Don Baldwin
UMMS: Maria E Figueroa
AMNH: George Amato, Mark Sidall
NYU: Jane Carlton, Julia Maritz
@mason_lab
