School of Social and Community Medicine
University of Bristol
ARIES Methylation Pre-processing and Clean-up
Geoff Woodward
Overview
• Initial QC
• Normalisation
• Batch Correction
• Data
• MWAS (Methylome-Wide Association Study)
• Results
Initial QC
Probe p-value: confidence in detection
• Background
• Negative (-ve) controls
Overall QC indicator
• High background
• Low signal
• Poor stringency
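As a rough illustration of a detection p-value, the sketch below compares a probe's intensity with a normal distribution fitted to the negative-control intensities. The slide does not give the exact formula, so the normal fit, the function name `detection_p`, and the intensities are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, stdev

def detection_p(intensity, negative_controls):
    """Probability that a background (negative-control) intensity is at
    least as large as the observed one, under an assumed normal fit."""
    background = NormalDist(mean(negative_controls), stdev(negative_controls))
    return 1.0 - background.cdf(intensity)

# Made-up negative-control intensities for one array.
neg = [95, 110, 102, 98, 105, 90, 100, 108]

p_high = detection_p(2500, neg)  # probe far above background
p_low = detection_p(101, neg)    # probe indistinguishable from background
```

A probe well above background gets a p-value near zero, while one sitting at the background level gets a p-value around 0.5 and would be flagged as not reliably detected.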
Initial QC: Control Probes
Mixture of sample-dependent and sample-independent controls
Sample independent
• Staining (Biotin/DNP)
• Hybridisation (synthetic target)
• Extension (hairpin)
Sample dependent
• Bisulfite conversion (HindIII site)
• G/T mismatch (non-specific)
• Specificity & Non-polymorphic
• Negative
Initial QC: LIMS
LIMS Control Dashboard
• Real time
• Jscript/JSON
• Zoom & scroll
• All Illumina control probes, +ve & -ve
• Area: Max / Median / Min
Initial QC: MDS
Start pre-processing
What’s affecting the data?
• Failures
• Controls
Initial QC: MDS
• Remove controls/failures
• Remove sex chromosomes
Sample Confirmation
Genotyping
• 65 SNP probes
• K-means clustering
– Call genotype
• Cross-reference with SNP data
• Calculate % match
– Fully automated in pipeline
– Stored in LIMS
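The genotype-calling and matching steps can be sketched as follows. This is a minimal illustration, not the pipeline's code: the starting cluster centres, the 0/1/2 genotype coding, the function names, and all β values and reference genotypes are assumptions.

```python
def assign(values, centres):
    """Index of the nearest centre for each value."""
    return [min(range(len(centres)), key=lambda k: abs(v - centres[k]))
            for v in values]

def call_genotypes(betas, centres=(0.1, 0.5, 0.9), iterations=20):
    """1-D k-means with three clusters; returns a call of 0, 1 or 2 per sample."""
    centres = list(centres)
    for _ in range(iterations):
        labels = assign(betas, centres)
        for k in range(len(centres)):
            members = [b for b, lab in zip(betas, labels) if lab == k]
            if members:  # leave an empty cluster's centre where it is
                centres[k] = sum(members) / len(members)
    return assign(betas, centres)

def percent_match(called, reference):
    """Percentage of SNPs where the called genotype equals the reference."""
    agree = sum(c == r for c, r in zip(called, reference))
    return 100.0 * agree / len(reference)

# Rows = SNP probes, columns = samples (made-up beta values).
snp_betas = [
    [0.05, 0.52, 0.93, 0.08, 0.48, 0.90],
    [0.91, 0.88, 0.47, 0.06, 0.51, 0.10],
    [0.49, 0.07, 0.04, 0.95, 0.92, 0.53],
]
calls_by_probe = [call_genotypes(row) for row in snp_betas]
calls_by_sample = [list(col) for col in zip(*calls_by_probe)]

# Hypothetical genotypes from independent SNP data for samples 0 and 2.
match_sample_0 = percent_match(calls_by_sample[0], [0, 2, 1])  # consistent sample
match_sample_2 = percent_match(calls_by_sample[2], [0, 0, 2])  # possible mix-up
```

A low percentage match for a sample suggests that the methylation array and the SNP data do not come from the same individual.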
Normalisation
Why?
• Cancer vs. control: not required
• More sensitive differences...
Quantile?
• Rank & scale according to a reference distribution (average)
• Not appropriate: Type I & II assays differ
– Medians at opposite ends of the β scale
– SD (across replicates) smaller in Type I probes
– Interrogate different subsets of the genome
  – Type II: greater proportion in open sea
  – Type I: greater proportion in gene promoters
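For reference, plain quantile normalisation ("rank & scale according to a reference distribution") can be sketched as below, using the mean of the sorted arrays as the reference. This is a minimal illustration with made-up values and no tie handling; it is the generic method, which the slide argues should not be applied across Type I and Type II probes together.

```python
def quantile_normalise(samples):
    """samples: list of equal-length lists, one per array.
    Each value is replaced by the reference value at its rank, where the
    reference is the mean of the sorted arrays at that rank."""
    n = len(samples[0])
    sorted_cols = [sorted(s) for s in samples]
    reference = [sum(col[i] for col in sorted_cols) / len(samples)
                 for i in range(n)]
    normalised = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalised.append(out)
    return normalised

array_a = [0.2, 0.8, 0.5]  # made-up values for three probes
array_b = [0.3, 0.9, 0.4]
norm_a, norm_b = quantile_normalise([array_a, array_b])
```

After normalisation every array has the same distribution of values, while the rank order of probes within each array is preserved.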
Normalisation: Method 1
Subset Within Array Normalisation (SWAN; minfi)
To address differences in distribution:
• No. of CpGs in probe body indicates density/location
• Distributions more similar in these groups
Approach
• Reference quantiles:
– N random Type I & II probes selected for each group
– Split meth/unmeth channels
• Linear interpolation to fit probes to reference
Doesn’t treat Type I & II separately
• BUT does decrease difference
Normalisation: Method 2
Touleimat & Tost
To address differences:
• CpG region
– Shore / Shelf / Island / Open-sea
• Treat Type I & II separately
Approach:
• Reference quantiles
– Type I used as “anchors” for each region
– More reliable / lower SD
• Estimate target quantiles
• Fit Type II to target
Normalisation: Method 3
Dasen (wateRmelon), under review
Separate QN of
• methylated Type I
• unmethylated Type I
• methylated Type II
• unmethylated Type II
intensities.
Both directions
Normalisation: Comparison
wateRmelon metrics:
Imprinted DMRs
• 237 probes within iDMRs
• iDMRs expected to be 50% methylated
• SE = SD / √N
– SD of all 237 probes
– N = number of samples

Method   iDMRs
Raw      0.00431
Dasen    0.00241
Tost     0.00214
SWAN     0.00428
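The SE = SD / √N metric can be illustrated with the sketch below. It pools the β values of all iDMR probes across samples before taking the SD, which is one reading of "SD of all 237 probes" and therefore an assumption; the function name and the values are made up.

```python
from math import sqrt
from statistics import stdev

def idmr_se(betas_by_probe):
    """betas_by_probe: one list of beta values (one per sample) per iDMR probe.
    Returns SD of all pooled betas divided by sqrt(number of samples)."""
    n_samples = len(betas_by_probe[0])
    pooled = [b for probe in betas_by_probe for b in probe]
    return stdev(pooled) / sqrt(n_samples)

# Two made-up data sets: two probes by four samples each.
tight = [[0.49, 0.50, 0.51, 0.50], [0.50, 0.52, 0.48, 0.50]]
noisy = [[0.30, 0.70, 0.45, 0.60], [0.55, 0.35, 0.65, 0.40]]

se_tight = idmr_se(tight)
se_noisy = idmr_se(noisy)
```

Because iDMR probes are expected to sit near 50% methylation, a smaller SE after normalisation indicates less technical spread around that value.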
Normalisation: Comparison
SNP probes
• 63 highly polymorphic SNP probes
• K-means clustering into 3 genotypes
• SE-like measure for each group

Method   AA          AB          BB
Raw      9.025e-05   1.910e-04   5.145e-05
Dasen    1.669e-04   2.047e-04   2.321e-05
Tost     8.253e-05   5.242e-04   1.541e-04
SWAN     NA          NA          NA
Normalisation: Comparison
wateRmelon metrics:
X-Chromosome Inactivation
• 11,232 probes
• T-test all probes for sex differences
• ROC analysis
– using p-value for sex difference
• 1 – AUC
– 0 being the perfect predictor & best sex separation

Method   X-Inact.
Raw      0.0947
Dasen    0.0889
Tost     0.0892
SWAN     0.4952
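A minimal sketch of the 1 – AUC calculation, assuming the ROC analysis asks how well the sex-difference p-values separate X-chromosome probes from the remaining probes. That framing, the function name, and the p-values are assumptions for illustration. The AUC is computed by pairwise comparison, which is equivalent to the Mann-Whitney statistic.

```python
def one_minus_auc(p_values, on_x):
    """p_values: sex-difference p-value per probe.
    on_x: True where the probe is on the X chromosome.
    AUC = fraction of (X, non-X) probe pairs in which the X probe has the
    smaller p-value, counting ties as half."""
    x_p = [p for p, x in zip(p_values, on_x) if x]
    other_p = [p for p, x in zip(p_values, on_x) if not x]
    wins = 0.0
    for px in x_p:
        for po in other_p:
            if px < po:
                wins += 1.0
            elif px == po:
                wins += 0.5
    auc = wins / (len(x_p) * len(other_p))
    return 1.0 - auc

on_x = [True, True, False, False]
perfect = one_minus_auc([1e-8, 1e-6, 0.4, 0.9], on_x)
overlap = one_minus_auc([1e-8, 0.5, 0.4, 0.9], on_x)
```

With perfect separation the metric is 0; each X probe that looks less sex-associated than a non-X probe pushes it upwards.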
Comparison: Density Plots
Metrics are great, but how do they really affect the data?
Panels: All / Type I / Type II
Comparison: Density Plots
Normalised distributions
Panels: All / Type I / Type II
Comparison: Scatter Plot
Pepsi Plot – you’ll see why!
Raw (x) vs. Normalised (y)
• Type I / Type II
Panels: SWAN / Tost / dasen
Batch Correction: Exp. Design
Bisulphite Conversion
• Excess of samples > 48
• Redundant controls
• QC and PCR
MSA4 Plate
• Well dictates chip position (Robot)
• Randomised
– Min. 4 of each time point
– Max. 1 control
– Mix of gender
Infinium 450k Chips
• 12 arrays per chip
• Throughput doubled
Batch Correction: Metadata
LIMS tracking
• Every process
• All consumables
– ~20
– Formamide to hyb. buffers
– > 1000 used so far!
• All equipment
– Fridge/centrifuge/PCR block
Batch Correction
What are we seeing?
• Bisulphite batch
Correction
• Many algorithms available
– SVD/SVA/DWD
– Gene expression
• ComBat
Chen C, Grennan K, Badner J, Zhang D, Gershon E, et al. (2011) Removing Batch Effects in Analysis of Expression Microarray Data: An Evaluation of Six Batch Adjustment Methods. PLoS ONE 6(2): e17238. doi:10.1371/journal.pone.0017238
Empirical Bayesian framework
• Create a model matrix
• Supply batch variable
• Standardise gene-wise
– Least squares approach
• Fits L/S (location/scale) model to find priors
• Adjust to empirical parametric priors
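The location/scale idea behind the steps above can be sketched for a single probe as follows. This is a deliberately simplified illustration and not ComBat itself: it has no model matrix of covariates and no empirical Bayes shrinkage of the batch estimates, and the function name and values are made up.

```python
from math import sqrt
from statistics import mean, pstdev

def location_scale_adjust(values, batches):
    """Centre and scale one probe's values within each batch, then map them
    back onto the overall mean and the pooled within-batch SD."""
    grand_mean = mean(values)
    groups = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)
    stats = {b: (mean(g), pstdev(g)) for b, g in groups.items()}
    pooled_sd = sqrt(sum(len(g) * stats[b][1] ** 2 for b, g in groups.items())
                     / len(values))
    adjusted = []
    for v, b in zip(values, batches):
        batch_mean, batch_sd = stats[b]
        z = (v - batch_mean) / batch_sd if batch_sd > 0 else 0.0
        adjusted.append(grand_mean + z * pooled_sd)
    return adjusted

# Made-up M values for one probe measured in two bisulphite batches.
m_values = [1.0, 1.2, 0.8, 2.0, 2.4, 1.6]
batches = ["A", "A", "A", "B", "B", "B"]
adjusted = location_scale_adjust(m_values, batches)
```

After adjustment both batches share the same mean and spread for this probe. ComBat additionally pools information across probes to stabilise the batch estimates, which matters when batches are small.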
Batch Correction
Example data
• Batch correct Tost-normalised data
• Use M values
• Convert back to β
• Values can escape the 0-1 limit
– Scale
– 0.02% of probes
– Distribution unaffected
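The conversion between β and M values uses the standard logit relationship, sketched here with hypothetical function names. The slide does not show the pipeline's own conversion code, so this is only the textbook form.

```python
from math import log2

def beta_to_m(beta):
    """M = log2(beta / (1 - beta)); defined for 0 < beta < 1."""
    return log2(beta / (1.0 - beta))

def m_to_beta(m):
    """Inverse transform: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)

m_half = beta_to_m(0.5)                # half-methylated maps to M = 0
round_trip = m_to_beta(beta_to_m(0.2))
```

M values are unbounded, which suits the linear models used in batch correction; β values are easier to interpret as a proportion methylated.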
Batch Correction: BEFORE
Batch Correction: AFTER
Datasets
ARIES pre-release:
• Filtered probes
• SNP probes

Age group   n
Cord        584
F7          598
TF3 (15)    64
F17         280
Antenatal   394
FOM         329
MWAS
Choice of servers:
• Epi-garrod
• BlueCrystal
Epi-garrod
Request account via IT-services for: epi-garrod.bris.ac.uk
Relatively quiet server in the dept.
No queuing system
• Check htop before running jobs
• Cord data requires ~15% RAM
Epi-garrod
Data: SAN
• Accessible from multiple servers
• /mnt/sscm3/ARIES_DATA/…
Permissions for this folder
• You must be a member of the aries group
BlueCrystal
Request an account via: https://www.acrc.bris.ac.uk/login-area/apply.cgi
Queuing handled
Data:
• /gpfs/cluster/smed/alspac-shared/aries/…
Again, permissions required:
• Member of aries group
Files
• ALN_dasen_<<time_code>>_betas.Rdata
• ALN_tost_<<time_code>>_betas.Rdata
• <<time_code>>_manifest.Rdata
• fdata.Rdata
• MWAS.r
ALN_dasen_<<time_code>>_betas.Rdata
<<time_code>>_manifest.Rdata
Fdata_new.RData
CpGassoc
CRAN http://cran.r-project.org/web/packages/CpGassoc/index.html
Tests for association between an independent variable and methylation
Option to include additional covariates
Assesses significance with:
• Holm (step-down Bonferroni)
• FDR methods
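To make the two kinds of multiple-testing correction concrete, the sketch below computes Holm-adjusted p-values and Benjamini-Hochberg adjusted p-values. Benjamini-Hochberg is used here as one common FDR method; which FDR methods the slide has in mind is an assumption. This is illustrative Python, not CpGassoc's own code, and the p-values are made up.

```python
def holm(p_values):
    """Holm step-down adjusted p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps adjusted values monotone.
        running_max = max(running_max, (m - rank) * p_values[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, idx in enumerate(order):
        rank = m - k  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

p = [0.01, 0.04, 0.03, 0.005]  # made-up per-CpG p-values
holm_adj = holm(p)
bh_adj = benjamini_hochberg(p)
```

Holm controls the family-wise error rate and is the stricter of the two; the FDR adjustment is never larger than the Holm adjustment for the same p-value.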
MWAS.r
MWAS.r continued...
MWAS.r continued...
Manhattan / QQ
Replicated the results of the following study:
450K Epigenome-Wide Scan Identifies Differential DNA Methylation in Newborns Related to Maternal Smoking during Pregnancy. Bonnie R. Joubert, et al.
Gene hits: GFI1, AHRR, MYO1G, CYP1A1
"CYP1A1 plays a key role in the aryl hydrocarbon receptor signaling pathway, which mediates the detoxification of the components of tobacco smoke." (Joubert, et al.)
Results file
BlueCrystal .bashrc
Any Questions?