Cell type deconvolution - GitHub Pages...Blood is a mixture of many cell types Wholebloodcelltypes:...

Cell type deconvolution

Mikhail Dozmorov

Spring 2018

Mikhail Dozmorov Cell type deconvolution Spring 2018 1 / 19

Characterizing Cell-types

Goal: Whole transcriptome (and epigenome) profiles of individualcell-typesProblem: Whole-tissue sample is composed of two or more distinct celltypesExample: Blood samples comprise a mixture of predominantly 5 celltypes

NeutrophilsLymphocytesMonocytesBasophilsEosinophils

Blood is a mixture of many cell types

Whole blood cell types: T cells(CD8T, CD4T, Natural Killer), Bcells, Granulocytes, MonocytesBioconductor data packageavailable:library(FlowSorted.Blood.450k)

Characterizing Cell-types

Technically challenging to measure whole transcriptome expressionfrom single-cellsApproach: Computational deconvolution of cell mixtures

Reference-basedReference-freeSemi-reference-free

Reference-based cell-type deconvolution

Using an existing reference DNA methylation (DNAm) database of celltypes that are thought to be present in the tissue of interestEstimated fractions are relative (can be absolute if the dominant celltype is known)Estimated fractions can then be used as covariates in supervisedmultivariate regression models to infer differentially methylatedcytosines (DMCs) that are independent of changes in cell-typecomposition.

Advantages

Absolute or relative cell-type fractions can be estimated in eachindividual sampleIf required, they can be easily combined with batch-correction methodssuch as COMBATThe model itself is relatively assumption freeMikhail Dozmorov Cell type deconvolution Spring 2018 5 / 19

Reference-based cell-type deconvolution

Using an existing reference DNA methylation (DNAm) database of celltypes that are thought to be present in the tissue of interestEstimated fractions are relative (can be absolute if the dominant celltype is known)Estimated fractions can then be used as covariates in multivariateregression models to infer differentially methylated cytosines (DMCs)that are independent of changes in cell-type composition.

Disadvantages

Require knowledge of the main cell types that are present in the tissue.Reliable reference DNAm profiles must be available for these cell typesCannot deal with unknown confounding factorsAssume that cell–cell interactions in the sample do not affect theDNAm profiles of the individual cell typesReference profiles could be confounded by factors such as age orgenotypeMikhail Dozmorov Cell type deconvolution Spring 2018 6 / 19

Reference-free cell-type deconvolution

Inferring from the full data matrix ‘surrogate variables’, which includesources of data variation that are driven by cell-type compositionThese surrogate variables are inferred from the data without the needfor a reference DNAm database and are used as covariates in the finalsupervised multivariate regression model to infer DMCs that areindependent of changes in cell-type composition and other cofounders

Advantages

There is no requirement to know the main cell types in a tissue or tohave reference DNAm profiles; hence, in principle, they are applicableto any tissue typeDe novo (unsupervised) discovery of novel cell subtypesAllow for the possibility that cell–cell interactions alter the profiles ofindividual cell typesCan adjust simultaneously for other confounding factors, known orunknownMikhail Dozmorov Cell type deconvolution Spring 2018 7 / 19

Reference-free cell-type deconvolution

Inferring from the full data matrix ‘surrogate variables’, which includesources of data variation that are driven by cell-type compositionThese surrogate variables are inferred from the data without the needfor a reference DNAm database and are used as covariates in the finalsupervised multivariate regression model to infer DMCs that areindependent of changes in cell-type composition and other cofounders

Disadvantages

Without further biological input, they cannot provide estimates ofcell-type fractions in individual samplesPerformance is strongly dependent on model assumptions, which areoften not satisfied

Semi-reference-free cell-type deconvolution

Inferring surrogate variables representing variation due to cell-typecomposition but that, unlike a purely ‘reference-free’ approach, does soby using partial prior biological knowledge of which cytosine–guaninedinucleotides (CpGs) differ between cell typesInfer the surrogate variables from the reduced data matrix, projectedon this set of selected features

Advantages

Allow for the possibility that cell–cell interactions alter the DNAmprofiles of individual cell typesIf required, can be combined with batch-correction methods such asCOMBATMore robust to incomplete knowledge of underlying cell types in thetissue of interestCan provide approximate relative estimates of cell-type fractions inindividual samplesMikhail Dozmorov Cell type deconvolution Spring 2018 9 / 19

Semi-reference-free cell-type deconvolution

Inferring surrogate variables representing variation due to cell-typecomposition but that, unlike a purely ‘reference-free’ approach, does soby using partial prior biological knowledge of which cytosine–guaninedinucleotides (CpGs) differ between cell typesInfer the surrogate variables from the reduced data matrix, projectedon this set of selected features

Disadvantages

Performance is still strongly dependent on model assumptions, whichmay not be satisfiedInference of absolute cell-type fractions in individual samples remainschallengingThe ability to resolve highly similar cell types is limited

Computational Deconvolution of cell mixtures

Venet, D., F. Pecasse, C. Maenhaut, and H. Bersini. “Separation ofSamples into Their Constituents Using Gene Expression Data.”Bioinformatics (Oxford, England) 17 Suppl 1 (2001): S279-287.

Proffered a linear relationship: “Any cellular type present in the tissuecontributes differently to the measured expression of a given gene”“. . . start directly from the gene expression data obtained on thecomposite samples to determine mathematically the profile of expressionof the cellular types present.”

M - matrix of heterogeneous measures, genes (rows) by samples(columns)G - cell signatures, genes (rows) by cell types (columns)C - cell proportions, cell types (rows) by samples (columns)

Heterogeneous measures can be expressed as a linear combination of cellsignatures and proportions

Mij =Nct∑k

GikCik

gene i , sample j , number of cell types Nct

In matrix notation

M = GC

Venet, D., F. Pecasse, C. Maenhaut, and H. Bersini. “Separation of Samples into Their Constituents Using Gene ExpressionData.” Bioinformatics (Oxford, England) 17 Suppl 1 (2001): S279-287.

Estimating cell proportions given signature

Abbas et al, 2009Gong et al, 2011Kuhn et al, 2011Qiao et al, 2012Houseman et al, 2012Zhong et al, 2013Liebner et al, 2014Chikina et al, 2015

Gene expression deconvolution

Gene expression deconvolution is an emerging technique analogous toin silico flow cytometry

The deconvolution methods aim to computationally resolve a GEP intoits component cell types (virtual tissue dissection)

Expression DeconvolutionRequires a signature matrix consisting of marker genes and theirexpression valuesAlso requires a biological mixtureThere is also a vector which contains the cell subset of the mixture inthe signature matrix

Modeling Cell Mixtures

Mixtures (X) are a linear combination of signature matrix (S) andconcentration matrix (C)

Xm×n = Sm×k · Ck×n

Previous Work

Coupled Deconvolution: Given: X , infer: S, CNMF. Repsilber, BMC Bioinformatics, 2010Minimum polytope. Schwartz, BMC Bioinformatics, 2010

Estimation of Mixing Proportions: Given: X , S, infer: CQuadratic Programming. Gong, PLoS One, 2012LDA. Qiao, PLoS Comp Bio, 2012

Estimation of Expression Signatures: Given: X , C , infer: ScsSAM. Shen-Orr, Nature Brief Com, 2010

Cell type composition

The epigenome will vary from cell type to cell type. Blood is composedof many cell types.Houseman (2012) BMC Bioinformatics showed that this can (will)confound studies of DNA methylation performed on blood samples.Reinius (2012) PLoS One has flow-sorted blood data on 450k.Obviously, other tissues can be affected. See Guintivano (2013)Epigenetics for brain.

Cell-type deconvolution algorithms

Name DescriptionProgramminglanguage Web links

CP/QP Reference-based method using constrainedprojection

R https://github.com/sjczheng/EpiDISH

RPC Reference-based robust partial correlations R https://github.com/sjczheng/EpiDISH

CIBERSORT Reference-based support vector regressions R https://github.com/sjczheng/EpiDISH

SVA Surrogate variable analysis (reference-free) R www.bioconductor.org/SVApackage

ISVA Independent surrogate variable analysis(reference-free)

R https://cran.r-project.org/package=isva

RefFreeEWAS Reference-free deconvolution R https://cran-r-project.org/package=RefFreeEWAS

RefFreeCellMix Reference-free or semi-reference-free NMF usingrecursive QP

R https://cran-r-project.org/package=RefFreeEWAS

MeDeCom Reference-free or semi-reference-free constrainedand regularized NMF

R http://github.com/lutsik/MeDeCom

EDec Like RefFreeCellMix but applied to breast cancer ortissue

R https://github.com/BRL-BCM/EDec

RUV/RUVm Removing unwanted variation R http://www.bioconductor.org/missMethyl package

CancerLocator Inference of tumour burden and tissue of originfrom plasma cfDNA

Java https://github.com/jasminezhoulab

MethylPurify Tumour purity estimation from WGBS or RRBSdata

Python https://pypi.python.org/pypi/MethylPurify

InfiniumPurify Tumour purity estimation from Illumina Infiniumdata

Python https://bitbucket.org/zhengxiaoqi/

Teschendorff, Andrew E., and Caroline L. Relton. “Statistical and Integrative System-Level Analysis of DNA Methylation Data.”Nature Reviews Genetics, November 13, 2017. https://doi.org/10.1038/nrg.2017.86.

Cell type deconvolution - GitHub Pages...Blood is a mixture of many cell types Wholebloodcelltypes:...

Documents

Transcript of Cell type deconvolution - GitHub Pages...Blood is a mixture of many cell types Wholebloodcelltypes:...

InductionofgutregulatoryCD39 Tcells by teriflunomide ...

Immunomodulatory Activity Monophosphoryl Lipid A …iai.asm.org/content/57/5/1483.full.pdfthe suppressor Tcells ofC3H/HeJmiceare refractory to inactivation byMPLand(ii) someofthe polyclonal

Apparent viscosity - Stanford Universitybiomechanics.stanford.edu/me339_08/evans89.pdf · 2008. 11. 3. · Apparent viscosity and cortical tension of blood granulocytes determined

CLINICAL UTILITIES OF IMMATURE GRANULOCYTES

Components of the Hematologic System - MCCCfalkowl/documents/Bio217F12Unit6Ch... · Components of the Hematologic System ... –Granulocytes (neutrophils, eosinophils, basophils)

Leukocytes Are mobile units of the body’s protective system Granulocytes a.Neutrophils b.Eosinophils c.Basophils Agranulocytes a.Monocytes b.Lymphocytes.

Les granulocytes neutrophiles : morphologie, fonctions et ... · morphologie, fonctions et méthodes de quantification dans l’espèce bovine. Thèse d'exercice, Médecine vétérinaire,

Neuromyelitis Optica: Diagnosis and Treatment · Transverse myelitis Rheumatologic diseases Neuromyelitis optica Multiple sclerosis • Complement, IgG, IgM • Granulocytes • Responsive

Relative Frequencies of Alloantigen-Specific Helper CD4T ...€¦ · Hospitals NHS Foundation Trust, Cambridge, United Kingdom, 4 Laboratory of Lymphocyte Signalling and Development,

Nonviral andviral a immunodeficiency virus gene human Tcells · 11582 Medical Sciences: Woffendin et al. natantand1 mlofAIM-Vmedium.Afterinfection, cells were washedonce by centrifugation

Generation cytolytic Tcells in individuals infected by ... fileThorax 1992;47:695-701 GenerationofcytolyticTcells inindividuals infectedbyMycobacteriumtuberculosis andvaccinatedwithBCG

Comparative Adherence of Granulocytes to Endothelial …dm5migu4zj3pb.cloudfront.net/manuscripts/108000/108981/JCI78108981.pdf · Comparative Adherence ofGranulocytes to Endothelial

Immunofluorescence Catalog 2018 · Kit IFA Human Granulocytes formalin fixed 48 testsAD-PAN60 10 slides x 6 wells AD-PAN48 IFA Human Granulocytes formalin fixed 96 tests 12 slides

Studies Derivation of Transcobalamin III from Granulocytes

INDUCED ANTIBACTERIAL PROTEIN IN INSECTS ......Cells with phagocytic activity usually represent a subpopulation of insect hemocytes. Both granulocytes and Both granulocytes and plasmatocytes

Functional idiotypic and I-J Tcells - okudanaika.comokudanaika.com/pdf/5.pdf · Proc. NatL. Acad. Sci. USA78(1981) 4559 Table2. Immunochemicalcharacterization ofB6-29hybridoma suppressorfactor

images.library.wisc.eduimages.library.wisc.edu/WI/EFacs/transactions/WT1954/reference/wi... · the esophagus were removed, ... eosinophilic granulocytes in areas where the exudate

Differentiated ANCA diagnostics using BIOCHIP Mosaicstypo3.euroimmun.de/fileadmin/template/images/pdf/... · PR3 microdroplets Granulocytes (EOH) Granulocytes (HCHO) MPO microdoplet

Differentiation ofeosinophilic granulocytes ofcarp ...

hydrogen peroxide by granulocytes as a · H2O2-induced decrease in the fluorescence of scopoletin catalyzed by horseradish peroxidase. The dialysate of phagocytosing granulocytes