Tumour karyotype Spectral karyotyping showing chromosomal aberrations in cancer cell lines.
Date posted: 18-Dec-2015
Chromosomal aberrations
• Segments of DNA that get duplicated
– Gains
• Segments of DNA that get deleted
– Losses
• Chromosomal aberrations are being investigated as diagnostic indicators of cancer and other diseases
– Better diagnosis of disease
– Potentially reveals biomolecular mechanisms of disease
• This research is done using array comparative genomic hybridization (aCGH)
– Measures DNA copy number changes
Array CGH
• Array CGH is a genetic technique used to identify chromosomal aberrations in cancer
• High resolution
• Full coverage
Array CGH data
• Measures log2 ratios of sample vs. normal (reference) DNA for pre-specified segments of the genome called clones
– Theoretical log2 ratios:
• 1 copy gain (duplication) = log2(3/2) = 0.58
• Neutral = log2(2/2) = 0
• 1 copy loss (deletion) = log2(1/2) = -1
• Measurement based on detection of hybridization level to probes on an array
• ~30,000 measurements per sample
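The theoretical values listed above follow from log2(sample copies / reference copies) with a diploid reference. A minimal sketch of that arithmetic; the function name and the default of 2 reference copies are illustrative assumptions, not part of the slides:

```python
import math

def theoretical_log2_ratio(sample_copies, reference_copies=2):
    """log2 of sample copy number over reference copy number (diploid reference assumed)."""
    return math.log2(sample_copies / reference_copies)

for label, copies in [("1 copy loss", 1), ("neutral", 2), ("1 copy gain", 3)]:
    print(f"{label}: {theoretical_log2_ratio(copies):+.2f}")
# prints:
# 1 copy loss: -1.00
# neutral: +0.00
# 1 copy gain: +0.58
```

Real measurements scatter around these values because of noise, which is why a statistical model is needed to call gains and losses.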
Computational challenges
• Noisy signals
• Spatial dependence between adjacent clones
• Outliers
– Systematic errors
– Copy number polymorphisms (CNPs)
Our approach
• Use a “supervised” hidden Markov model (HMM) to model spatial dependency between clones
– States are loss, neutral, gain-one, gain-many
– Infer the unobserved state sequence from the data using a standard, efficient algorithm called forwards-backwards
– This part of the model is standard
• Use a Gaussian mixture model to model the outliers separately from the inliers
– Inliers have spatial dependence
– Outliers do not
• Use prior knowledge about locations of CNPs to ‘inform’ the model about possible locations of outliers
– Several published lists of CNPs are available
– Internally generated list is more comprehensive
• Pool data across chromosomes to gain “statistical strength”
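The ideas in this list can be sketched together in a few lines. This is a simplified illustration, not the authors' implementation. The state means, standard deviations, stay probability, the broad zero-centred outlier component, the per-clone outlier priors and the toy data are all assumed values. Parameters are fixed here rather than learned, and nothing is pooled across chromosomes.

```python
import math

STATES = ["loss", "neutral", "gain-one", "gain-many"]
MEANS = [-1.0, 0.0, 0.58, 1.0]   # assumed log2-ratio mean per state
SIGMA = 0.15                     # assumed inlier standard deviation
OUTLIER_SIGMA = 1.0              # assumed broad outlier component, centred at 0
STAY = 0.97                      # assumed probability of keeping the same state

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def emission(y, k, eps):
    """Two-component mixture: state-specific inlier or state-independent outlier."""
    inlier = normal_pdf(y, MEANS[k], SIGMA)
    outlier = normal_pdf(y, 0.0, OUTLIER_SIGMA)
    return (1.0 - eps) * inlier + eps * outlier

def forward_backward(ys, outlier_prior):
    """Return per-clone posterior state probabilities (scaled forwards-backwards)."""
    n, K = len(ys), len(STATES)
    trans = [[STAY if i == j else (1.0 - STAY) / (K - 1) for j in range(K)]
             for i in range(K)]
    b = [[emission(ys[t], k, outlier_prior[t]) for k in range(K)] for t in range(n)]

    # Forward pass, normalised at every step to avoid underflow.
    alpha = []
    for t in range(n):
        if t == 0:
            a = [b[0][k] / K for k in range(K)]  # uniform initial distribution
        else:
            a = [b[t][k] * sum(alpha[t - 1][i] * trans[i][k] for i in range(K))
                 for k in range(K)]
        s = sum(a)
        alpha.append([v / s for v in a])

    # Backward pass, also normalised at every step.
    beta = [[1.0] * K for _ in range(n)]
    for t in range(n - 2, -1, -1):
        raw = [sum(trans[k][j] * b[t + 1][j] * beta[t + 1][j] for j in range(K))
               for k in range(K)]
        s = sum(raw)
        beta[t] = [v / s for v in raw]

    # Combine both passes into posteriors.
    post = []
    for t in range(n):
        g = [alpha[t][k] * beta[t][k] for k in range(K)]
        s = sum(g)
        post.append([v / s for v in g])
    return post

# Toy data: a neutral stretch with one isolated low clone, then a single-copy gain.
ys = [0.02, -0.05, 0.03, -0.9, 0.01, 0.04, 0.55, 0.62, 0.60, 0.57, 0.05]
eps = [0.01] * len(ys)
eps[3] = 0.5  # hypothetical: clone 3 lies in a known CNP, so outliers are likelier there
post = forward_backward(ys, eps)
calls = [STATES[max(range(len(STATES)), key=lambda k: p[k])] for p in post]
print(calls)
```

In this sketch the isolated low clone stays neutral because the outlier component absorbs it, while the run of elevated clones is called gain-one. Raising the outlier prior at listed CNP locations is one simple way to let such prior knowledge inform the model.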
Test data
• Mantle cell lymphoma cell lines with “ground truth”
• 123 losses and 72 gains covering approximately 1% of the human genome
• Compare results to state-of-the-art algorithms
Conclusions
• HMM framework superior to MergeLevels
• Adding robustness further improves performance over the ‘Base-HMM’
• Adding LSP information improves performance marginally over robust HMM, but importantly does not make results worse
– Motivation for using more comprehensive lists to improve results
– Some false positives: are they real?
• Pooling across chromosomes results in big performance gains
– Data more easily overwhelms incorrect priors
– Makes the algorithm less sensitive to parameter settings