ARIES Methylation Pre-processing and Clean up

Geoff Woodward

School of Social and Community Medicine, University of Bristol

Overview

Initial QC
Normalisation
Batch Correction
Data
MWAS (Methylome-Wide Assoc. Study)
Results

Initial QC

Probe p-value: confidence in detection
• Background
• -ve controls

Overall QC indicator
• High background
• Low signal
• Poor stringency
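A minimal sketch of how a detection p-value can be derived from negative control probes. It assumes a normal background model centred on the median of the negative controls with a MAD-based spread; the intensities, the 0.01 cut-off and the function names are illustrative assumptions, not the pipeline's actual settings.

```python
from statistics import NormalDist, median

def detection_p_values(probe_intensities, negative_controls):
    """Return P(background >= intensity) for each probe under a normal background model."""
    centre = median(negative_controls)
    # Robust spread: median absolute deviation, scaled to match an SD for normal data.
    spread = 1.4826 * median(abs(x - centre) for x in negative_controls)
    background = NormalDist(mu=centre, sigma=spread)
    return [1.0 - background.cdf(x) for x in probe_intensities]

def failed_fraction(p_values, threshold=0.01):
    """Fraction of probes not confidently detected (threshold is an assumed cut-off)."""
    return sum(p > threshold for p in p_values) / len(p_values)

negative_controls = [100, 120, 90, 110, 105, 95, 115, 98]  # made-up intensities
probes = [105, 400, 2500]                                  # made-up intensities
p_values = detection_p_values(probes, negative_controls)
print(p_values, failed_fraction(p_values))
```

A high failed fraction for a sample would correspond to the "high background / low signal" situation listed on the slide.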

Initial QC: Control Probes

Mixture of sample dependent/independent

Sample independent
• Staining (Biotin/DNP)
• Hybridisation (synthetic target)
• Extension (hairpin)

Sample dependent
• Bisulfite conversion (HindIII site)
• G/T mismatch (non-spec.)
• Specificity & Non-polymorphic
• Negative

Initial QC: LIMS

LIMS Control DashBoard

Real time
Jscript/JSON
Zoom & scroll
All Illumina control probes (+ve & -ve)

Area Max Median Min

Initial QC: MDS

Start pre-processing

What’s affecting the data?
• Failures
• Controls

Initial QC: MDS

Remove Controls/Failures
Remove Sex Chromosomes

Sample Confirmation

Genotyping 65 SNP probes
K-means clustering
• Call genotype
Cross reference with SNP data
Calculate % match
• Fully automated in pipeline
• Stored in LIMS
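The genotype-calling and matching steps can be sketched roughly as follows. This is an illustration only: it assumes a one-dimensional k-means over the β values of a single SNP probe, started from centres at 0, 0.5 and 1, with genotypes coded 0/1/2; the data and the function names `kmeans_1d` and `percent_match` are hypothetical.

```python
def kmeans_1d(values, centres=(0.0, 0.5, 1.0), iterations=20):
    """Cluster 1-D values around the given starting centres; return (labels, centres)."""
    centres = list(centres)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centre.
        labels = [min(range(len(centres)), key=lambda k: abs(v - centres[k]))
                  for v in values]
        # Update step: move each centre to the mean of its members.
        for k in range(len(centres)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centres[k] = sum(members) / len(members)
    return labels, centres

def percent_match(called, reference):
    """Percentage of calls that agree with the reference genotypes."""
    agree = sum(c == r for c, r in zip(called, reference))
    return 100.0 * agree / len(called)

betas = [0.03, 0.48, 0.97, 0.05, 0.52, 0.91]   # made-up β values for one SNP probe
reference = [0, 1, 2, 0, 1, 1]                 # made-up genotypes from SNP data
calls, _ = kmeans_1d(betas)
print(calls, percent_match(calls, reference))
```

A low % match for a sample would flag a possible mix-up to be checked against the LIMS record.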

Normalisation: Why?

Cancer vs. Control: not req.
More sensitive differences...

Quantile?
Rank & scale according to ref. dist. (av.)

Not appropriate: Type I & II assays differ
• Medians: opposite ends of β scale
• SD (across reps.) smaller in Type I probes
• Interrogate different subsets of the genome
– Type II: greater proportion in open-sea
– Type I: greater proportion in gene promoters
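For reference, plain quantile normalisation ("rank & scale according to an average reference distribution") can be sketched as below. This shows the baseline technique that the slide argues is not appropriate when Type I and Type II probes are pooled; the function name and data are illustrative, and ties are not averaged in this simplified version.

```python
def quantile_normalise(samples):
    """samples: list of equal-length lists, one per array. Returns normalised copies."""
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across all samples.
    reference = [sum(s[k] for s in sorted_samples) / len(samples) for k in range(n)]
    normalised = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # probe indices, lowest value first
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]                  # replace value by reference quantile
        normalised.append(out)
    return normalised

arrays = [[5, 2, 3], [4, 1, 6]]   # made-up intensities, two arrays x three probes
print(quantile_normalise(arrays))
```

After this step every array has exactly the same distribution of values, which is why pooling two probe types with different underlying distributions is a concern.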

Normalisation: Method 1

Subset Within Array Normalisation (SWAN, minfi)

To address differences in dist.:
• No. of CpGs in probe body indicates density/loc.
• Dist. more similar in these groups

Approach
• Reference quantiles:
– N random Type I & II selected for each group
– Split meth/unmeth channels
• Linear interpolation: fit probes to ref.

Doesn’t treat Type I & II separately
• BUT does decrease difference

Normalisation: Method 2

Touleimat & Tost

To address differences:
• CpG region
– Shore / Shelf / Island / Open-sea
• Treat Type I & II separately

Approach:
• Reference quantiles
– Type I used as “anchors” for each region
– More reliable / lower SD
• Estimate target quantiles
• Fit Type II to target

Normalisation: Method 3

Dasen (wateRmelon)
Under review

Separate QN of intensities:
• methylated Type I
• unmethylated Type I
• methylated Type II
• unmethylated Type II

Both directions

Normalisation: Comparison

wateRmelon metrics:

Imprinted DMRs
• 237 probes within iDMRs
• iDMRs expected to be ~50% meth.
• SE = SD / √N
– SD of all 237 probes
– N = number of samples

Method   iDMR SE
Raw      0.00431
Dasen    0.00241
Tost     0.00214
Swan     0.00428
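A worked sketch of the SE = SD / √N formula. It assumes the SD is the sample standard deviation pooled over every iDMR probe-by-sample β value, which is one reading of "SD of all 237 probes"; the function name and the tiny data set are made up and do not reproduce the figures in the table.

```python
from math import sqrt
from statistics import stdev

def idmr_standard_error(idmr_betas):
    """idmr_betas: rows = iDMR probes, columns = samples. Returns SD / sqrt(N)."""
    n_samples = len(idmr_betas[0])
    pooled = [b for row in idmr_betas for b in row]  # all probe-by-sample values
    return stdev(pooled) / sqrt(n_samples)

betas = [[0.45, 0.50],
         [0.55, 0.50]]   # made-up: 2 probes x 2 samples, centred on 50% methylation
print(idmr_standard_error(betas))
```

A smaller value means the iDMR probes sit more tightly around their expected level after normalisation.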

Normalisation: Comparison

SNP probes
• 63 highly polym. SNP probes
• K-means clustering into 3 genotypes
• SE-like measure for each group

Method   AA          AB          BB
Raw      9.025e-05   1.910e-04   5.145e-05
Dasen    1.669e-04   2.047e-04   2.321e-05
Tost     8.253e-05   5.242e-04   1.541e-04
Swan     NA          NA          NA

Normalisation: Comparison

wateRmelon metrics:

X-Chromosome Inactivation
• 11,232 probes
• T-test all probes for sex differences
• ROC analysis
– using p-val for sex diff.
• 1 - AUC (0 being the perfect predictor & best sex separation)

Method   X-Inact.
Raw      0.0947
Dasen    0.0889
Tost     0.0892
Swan     0.4952
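The 1 - AUC measure can be sketched as below, assuming each probe's sex-difference p-value is used to predict whether the probe lies on the X chromosome, with all other probes as the negatives. The pairwise loop is quadratic and only meant for illustration; names and data are hypothetical.

```python
def one_minus_auc(p_values, is_x_probe):
    """1 - AUC for predicting X-chromosome membership from sex-difference p-values."""
    x_p = [p for p, x in zip(p_values, is_x_probe) if x]
    other_p = [p for p, x in zip(p_values, is_x_probe) if not x]
    wins = 0.0
    for px in x_p:
        for po in other_p:
            if px < po:        # X probe ranked as more sex-associated: correct ordering
                wins += 1.0
            elif px == po:     # ties count as half
                wins += 0.5
    auc = wins / (len(x_p) * len(other_p))
    return 1.0 - auc

p_vals = [1e-8, 1e-5, 0.2, 0.03, 0.6, 0.9]          # made-up t-test p-values
on_x = [True, True, True, False, False, False]      # made-up probe annotation
print(one_minus_auc(p_vals, on_x))
```

A value of 0 means every X probe has a smaller p-value than every other probe; 0.5 is no better than chance.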

Comparison: Density Plots

Metrics are great, but how do they really affect the data?

[Density plots. Panels: All, Type I, Type II]

Comparison: Density Plots

Normalised distributions

[Density plots. Panels: All, Type I, Type II]

Comparison: Scatter Plot

Pepsi Plot: you’ll see why!
Raw (x) vs. Normalised (y)
• Type I, Type II

[Scatter plots. Panels: SWAN, Tost, dasen]

Comparison: Scatter Plot

Batch Correction: Exp. Design

Bisulphite Conversion
Excess of samples > 48
Redundant controls
QC and PCR

MSA4 Plate
Well dictates chip position (Robot)
Randomised
• Min. 4 of each time point
• Max 1 control
• Mix of gender

Infinium 450k Chips
12 arrays per chip
Throughput doubled

Batch Correction: Metadata

LIMS tracking
Every process
All consumables
• ~20
• Formamide to hyb. buffers
• > 1000 used so far!
All equipment
• Fridge/centrifuge/PCR block

Batch Correction

What are we seeing?
Bisulphite batch

Correction
Many algorithms available
• SVD/SVA/DWD
Gene expression

ComBat
Chen C, Grennan K, Badner J, Zhang D, Gershon E, et al. (2011) Removing Batch Effects in Analysis of Expression Microarray Data: An Evaluation of Six Batch Adjustment Methods. PLoS ONE 6(2): e17238. doi:10.1371/journal.pone.0017238

Empirical Bayesian framework
• Create a model matrix
• Supply batch var
• Standardise gene-wise
– Least squares approach
• Fits L/S (location/scale) model: find priors
• Adjust to empirical parametric priors
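To make the location/scale idea concrete, here is a deliberately simplified per-probe adjustment. It is not ComBat: there is no model matrix, no covariates and no empirical Bayes shrinkage of the batch estimates. It simply centres and rescales each batch to a common mean and spread, and it assumes every batch has at least two samples with non-zero spread. All names and values are illustrative.

```python
from math import sqrt
from statistics import mean, pstdev

def location_scale_adjust(values, batches):
    """Align each batch's mean and SD for one probe (no EB shrinkage, no covariates)."""
    grand_mean = mean(values)
    groups = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)
    # Per-batch location (mean) and scale (SD) estimates.
    stats = {b: (mean(g), pstdev(g)) for b, g in groups.items()}
    # Pooled within-batch SD, weighted by batch size.
    pooled_sd = sqrt(sum(len(g) * stats[b][1] ** 2 for b, g in groups.items())
                     / len(values))
    return [grand_mean + pooled_sd * (v - stats[b][0]) / stats[b][1]
            for v, b in zip(values, batches)]

m_values = [1.0, 1.2, 0.8, 2.0, 2.4, 1.6]        # made-up M values for one probe
batch = ["A", "A", "A", "B", "B", "B"]           # made-up bisulphite batch labels
adjusted = location_scale_adjust(m_values, batch)
print(adjusted)
```

ComBat's empirical Bayes step additionally pools information across probes to stabilise the per-batch estimates, which matters when batches are small.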

Batch Correction

Example data
Batch correct Tost norm. data
Use M values
Convert back to β
Values can escape 0-1 limit
• Scale
• 0.02% of probes
• Dist. unaffected.
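The β to M conversion and back can be sketched with the commonly used logit (base 2) definition, M = log2(β / (1 - β)). Whether the pipeline uses exactly this form, or adds an offset, is not stated on the slide, so treat this as an assumption; the sketch also does not attempt to reproduce the out-of-range values or the rescaling the slide mentions.

```python
from math import log2

def beta_to_m(beta):
    """Logit (base 2) transform; beta must lie strictly between 0 and 1."""
    return log2(beta / (1.0 - beta))

def m_to_beta(m):
    """Inverse of beta_to_m."""
    return 2.0 ** m / (2.0 ** m + 1.0)

for beta in (0.1, 0.5, 0.9):
    m = beta_to_m(beta)
    print(beta, m, m_to_beta(m))
```

Working on M values spreads out the crowded ends of the β scale, which suits methods such as ComBat that assume roughly normal data.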

Batch Correction: BEFORE

Batch Correction: AFTER

Datasets

ARIES pre-release:
Filtered probes
SNP probes

Age group    n
Cord         584
F7           598
TF3 (15)     64
F17          280
Antenatal    394
FOM          329

MWAS

Choice of servers:
Epi-garrod
BlueCrystal

Epi-garrod

Request account via IT-services for:
epi-garrod.bris.ac.uk

Relatively quiet server in the dept.
No queuing system
Check htop before running jobs
Cord data requires ~15% RAM

Epi-garrod

Data: SAN
• Accessible from multiple servers
/mnt/sscm3/ARIES_DATA/…

Permissions for this folder
You must be a member of the aries group

Blue Crystal

Request an account via:
https://www.acrc.bris.ac.uk/login-area/apply.cgi

Queuing handled

Data:
/gpfs/cluster/smed/alspac-shared/aries/…

Again, permissions required:
Member of aries group

Files

ALN_dasen_<<time_code>>_betas.Rdata
ALN_tost_<<time_code>>_betas.Rdata
<<time_code>>_manifest.Rdata
fdata.Rdata
MWAS.r

ALN_dasen_<<time_code>>_betas.Rdata

<<time_code>>_manifest.Rdata

Fdata_new.RData

CpGassoc

CRAN
http://cran.r-project.org/web/packages/CpGassoc/index.html

Tests for association between an independent variable and methylation
Option to include additional covariates

Assesses significance with:
Holm (step-down Bonferroni)
FDR methods
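The two kinds of multiple-testing adjustment named above can be sketched as follows. Holm is implemented as the standard step-down Bonferroni procedure; Benjamini-Hochberg is shown as one common FDR method, since the slide does not say which FDR method is used. This is a stand-alone illustration, not CpGassoc's code, and the p-values are made up.

```python
def holm_adjust(p_values):
    """Holm step-down Bonferroni adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)         # enforce monotonicity
        adjusted[i] = running_max
    return adjusted

def bh_adjust(p_values):
    """Benjamini-Hochberg FDR adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)  # largest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k                                      # 1-based rank, ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p = [0.01, 0.04, 0.03, 0.005]   # made-up p-values for four CpG sites
print(holm_adjust(p))
print(bh_adjust(p))
```

Holm controls the family-wise error rate and is the stricter of the two; FDR methods tolerate a controlled proportion of false positives among the reported sites.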

MWAS.r

MWAS.r continued...

MWAS.r continued...

Manhattan / QQ

Replicated the following study's results:
450K Epigenome-Wide Scan Identifies Differential DNA Methylation in Newborns Related to Maternal Smoking during Pregnancy. Bonnie R. Joubert, et al.

Gene hits: GFI1, AHRR, MYO1G, CYP1A1

"CYP1A1 plays a key role in the aryl hydrocarbon receptor signaling pathway, which mediates the detoxification of the components of tobacco smoke." - Joubert, et al.

Results file

BlueCrystal .bashrc

Any Questions?