DDP Stage 2 Presentation
![Page 1: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/1.jpg)
INTRODUCTION Driver Mutation Detection Bio-marker prediction Visualisation Tools Viral Genome Detection BWA v/s BWA-PSSM Appendix
Pattern Recognition In Clinical Data
Saket Choudhary
Dual Degree Project
Guide: Prof. Santosh Noronha
C G C A T C G A G C T
C G C G T C G A G C T
June 30, 2014
![Page 2: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/2.jpg)
OBJECTIVE
Next Generation Sequencing & Cancer Research:
- Driver Mutation Detection
- Bio-marker prediction using Microarray
- Viral Genome Integration
- Visualisation tools for NGS
- Bench-marking Alignment tools
- Reproducible Research
![Page 9: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/9.jpg)
WORKFLOWS FOR DRIVER MUTATION DETECTION
- Cancer = Lots of Mutations!
- Driver mutations confer a selective advantage on the cell and are therefore positively selected.
- Sites of driver mutations are targeted therapeutic sites and prognosis markers.
Problem
- Multiple prediction tools
- A different score range for each tool's predictions
- Non-overlapping results, non-overlapping formats
Aim
Unify the various predictions to help nail down the consensus.
![Page 16: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/16.jpg)
WORKFLOWS FOR DRIVER MUTATION DETECTION
Approach
- Wrap the tools in a toolbox using Galaxy
- Galaxy is a web-based framework for running bioinformatics workflows, with a focus on reproducibility of the analyses
- Combine all scores and render them as a heatmap: an easy way to pick out a few target mutations
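As a rough illustration of the combining step, the sketch below rescales each tool's scores to [0, 1] and lays them out as a mutations × tools matrix, which is the input a heatmap needs. The tool names, mutation labels, the min-max rescaling and the `higher_is_damaging` flag are all assumptions made for this example; the slides do not say how the scores were actually combined.

```python
def rescale(scores, higher_is_damaging=True):
    """Min-max rescale one tool's scores (dict: mutation -> score) to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    scaled = {}
    for mutation, score in scores.items():
        # A tool that gives every mutation the same score carries no ranking.
        unit = 0.5 if span == 0 else (score - lo) / span
        # Flip tools whose low scores mean "damaging" so that 1 is always worst.
        scaled[mutation] = unit if higher_is_damaging else 1.0 - unit
    return scaled


def combine_predictions(predictions, direction=None):
    """Build a mutations x tools matrix; None marks a missing prediction.

    predictions: dict tool -> dict mutation -> raw score
    direction:   dict tool -> bool (True if a higher raw score is more damaging)
    """
    direction = direction or {}
    tools = sorted(predictions)
    scaled = {t: rescale(predictions[t], direction.get(t, True)) for t in tools}
    mutations = sorted({m for t in tools for m in predictions[t]})
    matrix = [[scaled[t].get(m) for t in tools] for m in mutations]
    return mutations, tools, matrix


if __name__ == "__main__":
    # Hypothetical tools and mutation labels, for illustration only.
    example = {
        "tool_a": {"m1": 0.0, "m2": 10.0},
        "tool_b": {"m1": 0.9, "m3": 0.1},
    }
    rows, cols, grid = combine_predictions(example, {"tool_b": False})
    for name, row in zip(rows, grid):
        print(name, dict(zip(cols, row)))
```

Missing predictions are kept as `None` rather than zero, so a heatmap can show them as blank cells instead of as confident "benign" calls.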
![Page 19: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/19.jpg)
A sample workflow:
![Page 20: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/20.jpg)
Here is how you visualise:
![Page 21: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/21.jpg)
Here is how you focus:
![Page 22: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/22.jpg)
Problems:
- No way to run multiple tools on a dataset without data-fiddling
- No way to combine these predictions
- Irreproducibility: what cut-offs were used to filter drivers? (and much more than this)
Solutions:
- Run multiple tools (in parallel) on the same dataset
- Combine predictions, visualise, focus
- Perfectly reproducible analyses
![Page 23: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/23.jpg)
BIO-MARKER PREDICTION USING MICROARRAY DATA
Problem Definition
Given gene expression values for two sets of patients, normal and cancer, predict a small subset of genes that could be used to differentiate between them.
![Page 24: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/24.jpg)
MICROARRAYS
[Figure: microarray experiment. Labels: tumor tissue, normal tissue, cDNA, fluorescent labeling, hybridization, microarray chip]
![Page 25: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/25.jpg)
MICROARRAY: QUESTIONS WE ARE TRYING TO ANSWER
Questions
- Given expression data for 17000 genes, which of these genes are differentially expressed?
- Among the differentially expressed genes, which show the maximum association (+/-) with the cohort?
- Is there a very small subset (5/10/20...) that can help differentiate the unknown samples?
![Page 28: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/28.jpg)
PRE-PROCESSING [STANDARD WORKFLOW]
Raw Data → Background Correction → Normalization → Differentially Expressed
![Page 29: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/29.jpg)
BACKGROUND CORRECTION
Raw Data → Background Correction → Normalization → Differentially Expressed
![Page 30: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/30.jpg)
BACKGROUND CORRECTION
The Need
- Microarray spot intensities have two components: foreground + background
- Background may arise due to non-specific binding
- Background correction is an important step to correct for the ambient intensity around a spot
![Page 34: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/34.jpg)
Naive approach: Subtract the background intensities from the foreground.
What’s not right? How does one interpret negative intensities? (Loss of information + bias) [Remember, the background is itself measured from the nearby spots and not from that one spot directly.]
Alternative:
- Model the observed intensity [foreground − background] as the sum of an exponential (true signal) and a normal (random noise) component:
$$S = B + T + S_b \tag{1}$$
where $S$ = foreground, $S_b$ = background, $T$ = true signal.
![Page 35: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/35.jpg)
$B$ = random noise. We model $S - S_b$ [the observed intensity]:
$$T \sim \frac{1}{\alpha} \exp\left(-\frac{t}{\alpha}\right), \quad t > 0 \tag{2}$$
$$B \sim \mathcal{N}(\mu, \sigma^2) \tag{3}$$
$\mu$, $\sigma$ and $\alpha$ are unknowns. [Details later]
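Once µ, σ and α are known, one way to use the model in equations (1) to (3) is to replace each observed value x = S − Sb by the expected true signal given x. For an exponential T with mean α plus normal noise, this conditional mean has a closed form: it is the mean of a normal distribution with mean x − µ − σ²/α and standard deviation σ, truncated at zero. The sketch below assumes the three parameters have already been estimated; the function name and the numbers are illustrative, and the slides defer the actual estimation procedure.

```python
import math


def _normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _normal_cdf(z):
    """Standard normal CDF, written with erfc to stay accurate in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def expected_true_signal(x, mu, sigma, alpha):
    """E[T | S - Sb = x] for T ~ Exponential(mean alpha), B ~ N(mu, sigma^2).

    The posterior of T is N(m, sigma^2) truncated to t > 0,
    with m = x - mu - sigma^2 / alpha.
    Note: for extremely negative x the CDF underflows to zero; a production
    implementation would work on the log scale instead.
    """
    m = x - mu - sigma * sigma / alpha
    z = m / sigma
    return m + sigma * _normal_pdf(z) / _normal_cdf(z)


if __name__ == "__main__":
    # Illustrative parameter values, not estimates from real arrays.
    mu, sigma, alpha = 100.0, 20.0, 200.0
    for observed in (-50.0, 0.0, 100.0, 500.0):
        corrected = expected_true_signal(observed, mu, sigma, alpha)
        print(observed, round(corrected, 3))
```

Unlike plain subtraction, the corrected value is positive for every observed intensity, including negative ones, which addresses the interpretation problem raised above.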
![Page 36: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/36.jpg)
NORMALIZATION
Raw Data → Pre-processing → Normalization → Differentially Expressed
![Page 37: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/37.jpg)
NORMALISATION
The Need
- The expression levels of the majority of genes should be the same across arrays. This should be reflected in the overall intensity.
- Adjust for effects arising from array-to-array manufacturing differences, different amounts of dye, different amounts of hybridising sample, etc.
Objective
- The overall distribution of expression levels should be similar across arrays
![Page 40: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/40.jpg)
Quantile Normalization
- Associate the highest value of dataset X with the highest value of dataset Y, and so on...
- A Q-Q plot thereafter would be a perfect diagonal
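A minimal sketch of the idea, assuming the common variant in which each rank is replaced by the mean of the values holding that rank across arrays (the slide does not spell out which reference distribution is used). Ties are broken by position here, which a production implementation would handle more carefully.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length lists of intensities (one list per array)."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must contain the same number of probes")

    # Reference distribution: mean of the r-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]

    normalized = []
    for values in arrays:
        # Probe indices ordered from smallest to largest intensity.
        order = sorted(range(n), key=lambda i: values[i])
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    x = [5.0, 2.0, 3.0]
    y = [4.0, 1.0, 6.0]
    for row in quantile_normalize([x, y]):
        print(row)
```

After this step every array has exactly the same sorted values, which is why a Q-Q plot of any two normalized arrays falls on the diagonal.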
![Page 42: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/42.jpg)
NORMALIZATION
Figure: Raw intensities
Figure: Normalized intensities
![Page 43: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/43.jpg)
DIFFERENTIAL EXPRESSION
Raw Data → Background correction → Normalization → Differentially Expressed
![Page 44: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/44.jpg)
DIFFERENTIAL EXPRESSION I
Hypothesis
H0: Gene X is not differentially expressed [expression levels in the two cohorts are the same]
H1: Gene X is differentially expressed [up/down regulated]
- This is tested for multiple genes [17000 of them].
- Any test statistic employed should be able to control for multiple testing. [Details later]
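The slides leave the multiple-testing method for later. One widely used option is the Benjamini-Hochberg adjustment, which controls the false discovery rate; it is shown here only as an example of such a correction, not as the method used in this work. The function name and the p-values are illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(p, round(q, 4))
```

With 17000 genes tested at once, filtering on adjusted rather than raw p-values keeps the expected share of false positives among the reported genes under the chosen threshold.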
![Page 45: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/45.jpg)
DIFFERENTIAL EXPRESSION II
We use a modified version of the t-test. [Details later]
t-test:
$$z_i = \frac{\bar{x}_i^{C} - \bar{x}_i^{D}}{s_i} \tag{4}$$
$$s_i = \sqrt{\frac{s_{c,i}^2}{N_C} + \frac{s_{d,i}^2}{N_D}} \tag{5}$$
where $\bar{x}_i^{C}$ and $\bar{x}_i^{D}$ are the mean expression of gene $i$, and $s_{c,i}$ and $s_{d,i}$ the standard deviations, with sample sizes $N_C$ and $N_D$, for the control and disease groups respectively.
This $z_i$ statistic follows a t-distribution:
$$z_i \sim t_i \tag{6}$$
The associated p-value is given by:
![Page 46: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/46.jpg)
DIFFERENTIAL EXPRESSION III
$$p\text{-value} = 2 \, P(t_i \geq |z_i|) \tag{7}$$
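The statistic in equations (4) and (5) can be computed per gene as in the sketch below. It implements only the plain unequal-variance form written in those equations; the "modified" version mentioned on the slide is not specified, so it is not reproduced here. The function name and sample values are assumptions. The p-value of equation (7) needs the CDF of the t-distribution, which the Python standard library does not provide, so it is left out (a statistics package such as SciPy could supply it).

```python
import math
from statistics import mean, variance


def t_statistic(control, disease):
    """Return (z_i, s_i) for one gene, following equations (4) and (5).

    control, disease: expression values of the gene in each group
    (at least two values per group, so the sample variance is defined).
    """
    n_c, n_d = len(control), len(disease)
    # statistics.variance is the sample variance (divides by n - 1).
    s_i = math.sqrt(variance(control) / n_c + variance(disease) / n_d)
    z_i = (mean(control) - mean(disease)) / s_i
    return z_i, s_i


if __name__ == "__main__":
    # Illustrative expression values for a single gene.
    z, s = t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
    print(round(z, 4), round(s, 4))
```

A negative statistic here means the gene is more highly expressed in the disease group than in the control group.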
![Page 47: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/47.jpg)
SO FAR..
Raw Data → Background correction → Normalization → Differentially Expressed → Correspondence Analysis
![Page 48: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/48.jpg)
DIMENSIONALITY REDUCTION
The Need
- The list of differentially expressed genes is too long; interpretation is still not trivial
- How does one infer associations between the gene expressions and the cohorts?
- p-values are not indicative of associations
- Log fold changes (ratio of average expression over cohorts) are biologically important, but they are already part of this long sublist and hence uninformative after the filtering step.
![Page 52: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/52.jpg)
Approach
- Project data from a higher dimension (2000+ at times) to a lower dimension
- The data in the lower dimension should be reflective of the higher-dimensional data
- Try to determine the subset of genes that reveals the relationship between the expression levels and the associated cohort
- Try to avoid any kind of model assumptions
![Page 57: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/57.jpg)
CORRESPONDENCE ANALYSIS
Underlying hypothesis
There is no association between the rows [genes] and columns [samples]
- Project data onto the first 2 or 3 informative coordinates
- Treats rows (genes) and columns (samples) equivalently
- Attempts to separate dissimilar objects from each other (both genes and samples simultaneously)
- Unlike the more famous PCA, reveals the association between genes and samples (biplots)
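The hypothesis of no association can be made concrete: under independence the expected value of cell (i, j) is (row total × column total) / grand total, and correspondence analysis decomposes the departures from it. The sketch below computes the standardized residuals and the total inertia (the χ² statistic divided by the grand total) for a small non-negative table. The function name and toy numbers are assumptions; a full CA would go on to take a singular value decomposition of a suitably scaled version of this residual matrix, which is omitted here.

```python
import math


def independence_residuals(table):
    """Standardized residuals and total inertia of a non-negative table.

    table: list of rows (genes) by columns (samples); every row and
    column total must be positive.
    """
    grand_total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    residuals = []
    inertia = 0.0
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            # Expected value if rows and columns were unassociated.
            expected = row_totals[i] * col_totals[j] / grand_total
            r = (observed - expected) / math.sqrt(expected)
            res_row.append(r)
            inertia += r * r / grand_total
        residuals.append(res_row)
    return residuals, inertia


if __name__ == "__main__":
    # Toy table: rows with proportional profiles give zero inertia.
    print(independence_residuals([[10, 20], [20, 40]])[1])
    print(independence_residuals([[10, 0], [0, 10]])[1])
```

Zero inertia means the table matches the no-association hypothesis exactly; the larger the inertia, the more structure there is for the biplot to display.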
![Page 61: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/61.jpg)
CLUSTERING
![Page 62: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/62.jpg)
INTERPRETING BIPLOTS
The output of a CA is a biplot:
![Page 63: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/63.jpg)
INTERPRETING BIPLOTS
- Distances on the biplot are proportional to the χ² distances in the original higher-dimensional space
- The farther a point is from the centroid, the higher that row's contribution to the value of the χ² statistic
- The association between a row and a column is given by the angle between the lines joining the centroid to the two points (acute = positive, right = no association)
- Thus we focus on points along the ends of the axes. Positive regulation is indicated by genes appearing in the upper half.
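The angle rule can be sketched as a cosine between a gene point and a sample point, with the centroid taken as the origin of the biplot. The helper names are hypothetical, and reading an obtuse angle as a negative association is the usual biplot convention rather than something stated on the slide.

```python
import math

def association_cosine(gene_xy, sample_xy):
    """Cosine of the angle at the centroid (origin) between a gene and a sample point."""
    gx, gy = gene_xy
    sx, sy = sample_xy
    norm = math.hypot(gx, gy) * math.hypot(sx, sy)
    if norm == 0:
        return 0.0  # a point on the centroid carries no directional information
    return (gx * sx + gy * sy) / norm

def describe_association(gene_xy, sample_xy, tol=1e-9):
    """Map the angle to a label: acute = positive, right = none, obtuse = negative."""
    cos = association_cosine(gene_xy, sample_xy)
    if cos > tol:
        return "positive"
    if cos < -tol:
        return "negative"
    return "none"

print(describe_association((1.0, 1.0), (2.0, 1.0)))
```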
![Page 68: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/68.jpg)
INTERPRETING BIPLOTS
In PCA the distances between the projected points are Euclidean, whereas CA takes into account the chi-squared distances. This is relevant here, since we are dealing with expression values and are concerned with the relative levels rather than the absolute values. For example, consider:
CA vs PCA
A = 1, 2, 3
B = 10, 25, 34
Are A and B related/same?
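A small numeric sketch of the A versus B question: the Euclidean distance between the raw rows is large, while the chi-square distance between their profiles (each row divided by its total) is small. Treating A and B as the only two rows of the table when computing the column masses is an assumption made purely for illustration.

```python
import math

A = [1, 2, 3]
B = [10, 25, 34]

def euclidean(u, v):
    """Plain Euclidean distance between the raw rows (what PCA works with)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def profile(row):
    """Row profile: each entry divided by the row total."""
    total = sum(row)
    return [x / total for x in row]

def chi_square_distance(u, v):
    """Chi-square distance between the profiles of two rows of a two-row table."""
    grand_total = sum(u) + sum(v)
    col_mass = [(a + b) / grand_total for a, b in zip(u, v)]
    pu, pv = profile(u), profile(v)
    return math.sqrt(sum((a - b) ** 2 / m for a, b, m in zip(pu, pv, col_mass)))

print(euclidean(A, B))            # large: the absolute values differ a lot
print(chi_square_distance(A, B))  # small: the relative levels are similar
```

Under the chi-square distance, A and B look closely related because their expression levels follow nearly the same pattern across the three columns.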
![Page 69: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/69.jpg)
SO FAR..
Raw data → Background correction → Normalization → Differentially expressed genes → Correspondence analysis → Feature extraction
![Page 70: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/70.jpg)
FEATURE EXTRACTION & CLASSIFICATION
The Need
- Given the shortlist of genes showing association with the cohorts, we need to identify the subset of most informative genes
- CA does not answer this question. A panel of genes all exhibiting positive/negative association with the cohorts might not be very informative collectively
- Genes whose expression levels are themselves correlated add little information when placed in the same panel
![Page 74: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/74.jpg)
Approach
- Choose a classification algorithm
- Start with all features, determine the coefficients for the model
- Eliminate the least informative feature
- Re-train the model, cross-validate
- Repeat until you end up with the required set of features
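The elimination loop above can be sketched as follows. The classifier here is a ridge-regularised least-squares fit, chosen only to keep the example short; it is a stand-in, not the classifier used in the project. The function names are hypothetical, the cross-validation step is left out, and features are assumed to be on comparable scales so that weight magnitudes can be compared.

```python
import numpy as np

def fit_linear(X, y, ridge=1e-3):
    """Ridge-regularised least-squares fit of labels in {-1, +1}; returns the weights."""
    n_features = X.shape[1]
    A = X.T @ X + ridge * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

def recursive_feature_elimination(X, y, n_keep):
    """Drop the feature with the smallest |weight| until n_keep features remain."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    remaining = list(range(X.shape[1]))   # column indices still in the model
    eliminated = []                       # indices in the order they were dropped
    while len(remaining) > n_keep:
        w = fit_linear(X[:, remaining], y)        # re-train on the surviving features
        worst = int(np.argmin(np.abs(w)))         # least informative feature
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated

# Synthetic illustration: only columns 1 and 3 determine the label.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = np.sign(3 * X_demo[:, 1] - 2 * X_demo[:, 3])
kept, dropped = recursive_feature_elimination(X_demo, y_demo, n_keep=2)
print(sorted(kept), dropped)
```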
![Page 80: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/80.jpg)
SVM
- Search for a hyperplane that best separates the data, maximising the margin of separation
- Data is assumed to be linearly separable (soft margins or kernels allow it to work otherwise)
- Given the high dimension of the input, it is reasonable to assume that at that number of dimensions our data is linearly separable
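As a rough illustration of the margin idea, a linear soft-margin SVM can be trained by subgradient descent on the regularised hinge loss. The slides do not say how the SVM was trained, so the optimiser, the hyperparameters (`lam`, `lr`, `epochs`), and the toy data below are all assumptions made for this sketch.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Minimise lam/2 * |w|^2 + mean(max(0, 1 - y * (X @ w + b))) by subgradient descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)        # labels must be -1 or +1
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1              # points inside the margin or misclassified
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Classify by the side of the hyperplane w.x + b = 0."""
    return np.where(np.asarray(X, dtype=float) @ w + b >= 0, 1, -1)

# Hypothetical, linearly separable toy data.
X_toy = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y_toy = [1, 1, 1, -1, -1, -1]
w_toy, b_toy = train_linear_svm(X_toy, y_toy)
print(predict(X_toy, w_toy, b_toy).tolist())
```

The magnitudes of the learned weights `w` are what a recursive feature elimination step would rank.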
![Page 84: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/84.jpg)
SVM
Recursive feature elimination with k-fold cross-validation
- Determine the ranking of each feature by training an SVM on the given data
- Randomly partition the data into k equally sized subsets
- The model with n features is trained on k − 1 subsets and validated on the remaining subset
- This training process is repeated k times, such that each of the k subsets is used exactly once as the validation dataset
- The k results are then averaged to determine the specificity
- Eliminate the feature with the least weight and repeat
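The partitioning and averaging steps can be sketched in plain Python. The helper names and the `evaluate` callback, which stands in for "train on these indices, validate on those, return a score", are hypothetical; the specificity helper assumes that the negative class is labelled 0.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Randomly partition range(n_samples) into k folds of (nearly) equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, k, evaluate, seed=0):
    """Average evaluate(train_idx, val_idx) over k folds; each fold validates exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, val_idx))
    return sum(scores) / k

def specificity(y_true, y_pred, negative=0):
    """True negatives divided by all actual negatives."""
    predictions_on_negatives = [p for t, p in zip(y_true, y_pred) if t == negative]
    true_negatives = sum(1 for p in predictions_on_negatives if p == negative)
    return true_negatives / len(predictions_on_negatives)

# Each validation fold of 10 samples split 5 ways holds 2 samples.
print(cross_validate(10, 5, lambda train, val: len(val)))
```

In a full pipeline, `evaluate` would fit the SVM on the current feature subset and return the specificity on the validation fold, and the whole procedure would be repeated after each feature is eliminated.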
![Page 91: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/91.jpg)
CONCLUSIONS
- Developed a complete workflow to arrive at the final list of bio-markers
- The bio-markers need to be tested for biological significance and checked against previous literature reports
- Results are generated dynamically and are fully reproducible
![Page 95: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/95.jpg)
VISUALISATION TOOLS
"The power of the unaided mind is highly overrated. The real powers come from devising external aids that enhance cognitive abilities." Donald Norman
![Page 96: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/96.jpg)
PHRED SCORE VIEWER
fastq format
@SEQID
GATTTGGGGTTCAAA
+
!''*((((***+))
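A minimal sketch of how a viewer might read one four-line fastq record and turn the quality string into Phred scores, assuming Sanger encoding (ASCII offset 33). The function names and the example record are hypothetical and are not taken from the viewer's source.

```python
def parse_fastq_record(lines):
    """Split one four-line FASTQ record into (read id, sequence, quality string)."""
    header, sequence, separator, quality = [line.strip() for line in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return header[1:], sequence, quality

def phred_scores(quality, offset=33):
    """Decode a quality string: one Phred score per base (Sanger offset by default)."""
    return [ord(ch) - offset for ch in quality]

# Hypothetical record, used only to exercise the two helpers.
read_id, seq, qual = parse_fastq_record(["@read1", "GATTACA", "+", "II5!+#I"])
print(read_id, seq, phred_scores(qual))
```

Plotting these per-base scores against position gives the kind of quality profile the viewer displays.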
![Page 97: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/97.jpg)
PHRED SCORE VIEWER
![Page 98: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/98.jpg)
Need/Motivation
- Cross-platform viewer for visualising the quality of fastq reads
- No commands required, user-friendly for biologists
![Page 101: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/101.jpg)
HUMAN GENETIC VARIATION VIEWER
![Page 102: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/102.jpg)
HUMAN GENETIC VARIATION VIEWER
Need/Motivation
- Comprehensive visualisation of a catalogue of protein variants
- Could be used to discover patterns with respect to mutation sites and frequency
![Page 105: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/105.jpg)
NEXT GENERATION SEQUENCING
![Page 106: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/106.jpg)
VIRAL GENOME DETECTION
Cervical cancers have been proven to be associated with Human Papillomavirus (HPV).
Cervical cancer datasets from Indian women were put through an analysis to detect:
1. Any possible HPV integration
2. Sites of HPV integration
Who Cares?
- Prognosis
- Replacing whole genome sequencing with targeted sequencing at the sites where the virus has been detected in a cohort of samples, thus speeding up the whole process.
![Page 109: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/109.jpg)
Figure: Detecting Virus Genomes
![Page 110: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/110.jpg)
Figure: Aligned HPV genomes
![Page 111: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/111.jpg)
BWA V/S BWA-PSSM I
BWA-PSSM uses quality score matrices to improve the alignment.
@read
ACT
+
III
Assuming Sanger-encoded quality scores, all the base positions have a Phred score of 73 − 33 = 40. Given an error model of the sequencing platform, it is possible to come up with a matrix like:
    A T G C
A
T
G
C
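The arithmetic above, and one way such a matrix row could be filled in, can be sketched as follows. The error model used here (a wrong call is equally likely to be any of the three other bases, with a uniform background of 0.25) is an assumption made purely for illustration; it is not the scoring actually used by BWA-PSSM, and the function names are hypothetical.

```python
import math

def phred_from_char(ch, offset=33):
    """Sanger-encoded quality character to Phred score, e.g. 'I' -> 73 - 33 = 40."""
    return ord(ch) - offset

def error_probability(q):
    """Phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)

def base_scores(observed, q, background=0.25):
    """Log-odds score for each candidate true base, given one observed base call."""
    p_err = error_probability(q)
    scores = {}
    for base in "ATGC":
        # Assumption: an erroneous call is equally likely to be any of the 3 other bases.
        likelihood = 1 - p_err if base == observed else p_err / 3
        scores[base] = math.log2(likelihood / background)
    return scores

q = phred_from_char("I")
print(q, error_probability(q))
print(base_scores("A", q))
```

Repeating `base_scores` for every observed base and every possible Phred score would give one such 4 x 4 matrix per quality value.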
![Page 112: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/112.jpg)
BWA V/S BWA-PSSM II
for all possible Phred scores, which assigns to each possible score and a given nucleotide a score given by (i, j), reflecting the probability that a nucleotide observed by the sequencer is indeed the same nucleotide.
- Simulate genomes with different error rates and insertion-deletion ratios
- Simulate reads from the genomes
- Align reads to reference
A ROC curve can be plotted since the number of reads that are expected to match is known a priori.
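Because the reads are simulated, each alignment can be labelled as correct or incorrect, and a ROC curve follows from sweeping a threshold over some per-read score. Using the aligner's mapping quality as that score is an assumption; the function names and the toy numbers below are hypothetical.

```python
def roc_points(scores, is_correct):
    """Return (false positive rate, true positive rate) pairs, sweeping a score threshold."""
    positives = sum(is_correct)
    negatives = len(is_correct) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both correctly and incorrectly mapped reads")
    ranked = sorted(zip(scores, is_correct), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, correct) in enumerate(ranked):
        if correct:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last read sharing this score (ties move together).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical mapping qualities and whether each read mapped to its simulated origin.
curve = roc_points([60, 50, 40, 30], [True, False, True, False])
print(curve, auc(curve))
```

Computing one curve per aligner on the same simulated reads allows a direct comparison of BWA and BWA-PSSM.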
![Page 113: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/113.jpg)
BWA V/S BWA-PSSM III
Figure: ROC curve for BWA v/s BWA-PSSM mappings
![Page 114: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/114.jpg)
WRAP UP
- Developed a toolbox for driver mutation prediction.
  - Open sourced
  - Deployed to be used by the community
- Predicted a set of bio-markers for Glioma
  - Pending validation (literature, biological)
- Determined presence of HPV sequences in cervical cancers
- Tools for visualisation
  - Phred quality viewer
  - Human Genetic Variation Viewer
![Page 121: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/121.jpg)
APPENDIX
Appendix
![Page 122: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/122.jpg)
DIFFERENTIAL EXPRESSION STATISTICS I
Smyth et al. suggested linear models for modelling microarray experiments. Given a set of $n$ samples and a gene $g$ with expression levels $y_g$:
$$y_g^T = (y_{g1}, y_{g2}, \ldots, y_{gn}) \tag{8}$$
$$E(y_g) = X\alpha_g \tag{9}$$
where $X$ is the design matrix and $\alpha_g$ is an unknown coefficient vector.
$$\mathrm{var}(y_g) = W_g\sigma_g^2 \tag{10}$$
where $W_g$ is a weight matrix and $\sigma_g^2$ represents the unknown genewise variance. Consider $\beta_g$ as the log-fold change for gene $g$.
![Page 123: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/123.jpg)
DIFFERENTIAL EXPRESSION STATISTICS II
Assume the contrast to be tested is $\beta_g = c^T\alpha_g$, where $c$ is a contrast vector. Since $\alpha_g$ is unknown, given the response vectors and $X$ it is possible to fit a linear model to obtain an estimate $\hat{\alpha}_g$ of the coefficient vector, whose covariance is given by:
$$\mathrm{var}(\hat{\alpha}_g) = V_g\sigma_g^2 \tag{11}$$
where $V_g$ is independent of $\sigma_g^2$ and is positive definite.
Thus the estimate of $\beta_g$ is given by $\hat{\beta}_g = c^T\hat{\alpha}_g$. Without forcing a normal distribution on $y_g$, $\hat{\beta}_g$ is assumed to be approximately normally distributed with mean $\beta_g$:
$$\hat{\beta}_g \mid \beta_g, \sigma_g^2 \sim N(\beta_g, v_g\sigma_g^2) \tag{12}$$
![Page 124: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/124.jpg)
DIFFERENTIAL EXPRESSION STATISTICS III
where
$$v_g = c^T V_g c \tag{13}$$
The variance $s_g^2$ is assumed to follow a scaled $\chi^2$ distribution:
$$s_g^2 \mid \sigma_g^2 \sim \frac{\sigma_g^2}{d_g}\chi^2_{d_g} \tag{14}$$
where $d_g$ represents the residual degrees of freedom for gene $g$. Under the above assumptions, the statistic $t_g$ follows a t-distribution with $d_g$ degrees of freedom:
$$t_g = \frac{\hat{\beta}_g}{s_g\sqrt{v_g}}$$
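As a rough numerical sketch of the quantities defined so far, the snippet below fits the linear model for one gene and computes the ordinary t-statistic. It assumes equal weights ($W_g = I$), so that $V_g = (X^TX)^{-1}$; the function name and the toy two-group design are illustrative and not part of the original analysis.

```python
import numpy as np


def contrast_t_statistic(y, X, c):
    """Ordinary t-statistic for the contrast c^T alpha in the model E(y) = X alpha.

    Assumption: equal weights (W_g = I), hence V_g = (X^T X)^{-1}.
    Returns (beta_hat, s2, v, d, t).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    c = np.asarray(c, dtype=float)
    n, p = X.shape
    V = np.linalg.inv(X.T @ X)      # unscaled covariance V_g
    alpha_hat = V @ X.T @ y         # least-squares estimate of alpha_g
    resid = y - X @ alpha_hat
    d = n - p                       # residual degrees of freedom d_g
    s2 = resid @ resid / d          # residual variance s_g^2
    beta_hat = c @ alpha_hat        # estimated contrast (log-fold change)
    v = c @ V @ c                   # v_g = c^T V_g c
    t = beta_hat / np.sqrt(s2 * v)
    return beta_hat, s2, v, d, t


# Toy design: three samples per group, contrast = group 2 minus group 1.
X = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
beta_hat, s2, v, d, t = contrast_t_statistic(y, X, c=[-1, 1])
```

With few samples per gene, $d_g$ is small and $s_g^2$ is noisy, which is the motivation for pooling information across genes.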
![Page 125: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/125.jpg)
DIFFERENTIAL EXPRESSION STATISTICS IV
Information pooling:
Since linear models are fitted to thousands of genes, we can exploit this parallel structure: the same model is fitted to every gene. Prior distributions on $\beta_{gj}$ and $\sigma_g^2$ describe how they vary across genes:
$$\frac{1}{\sigma_g^2} \sim \frac{1}{d_0 s_0^2}\chi^2_{d_0} \tag{15}$$
Let $p_j$ be the proportion of differentially expressed genes:
$$P(\beta_{gj} \neq 0) = p_j \tag{16}$$
For the nonzero coefficients, the prior has mean zero and unscaled variance $v_0$:
$$\beta_{gj} \mid \sigma_g^2, \beta_{gj} \neq 0 \sim N(0, v_0\sigma_g^2) \tag{17}$$
![Page 126: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/126.jpg)
DIFFERENTIAL EXPRESSION STATISTICS V
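The posterior mean of $1/\sigma_g^2$ is given by $1/\tilde{s}_g^2$:
$$\tilde{s}_g^2 = \frac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g} \tag{18}$$
Thus the moderated t-statistic:
$$\tilde{t}_{gj} = \frac{\hat{\beta}_{gj}}{\tilde{s}_g\sqrt{v_{gj}}} \tag{19}$$
has $d_0 + d_g$ degrees of freedom.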
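The shrinkage in (18) and (19) can be illustrated with a few lines of Python. This sketch treats the prior parameters $d_0$ and $s_0^2$ as known inputs; in practice they are estimated from the variances of all genes, which is not shown here. The function name and the example numbers are made up for illustration.

```python
import math


def moderated_t(beta_hat, s2, v, d, d0, s0_sq):
    """Moderated t-statistic from per-gene estimates and prior parameters.

    beta_hat -- estimated contrast for the gene
    s2       -- residual variance s_g^2 with d degrees of freedom
    v        -- unscaled variance v_gj of the contrast
    d0, s0_sq -- prior degrees of freedom and prior variance (assumed known)
    Returns (posterior variance, moderated t, total degrees of freedom).
    """
    s2_tilde = (d0 * s0_sq + d * s2) / (d0 + d)   # equation (18)
    t_tilde = beta_hat / math.sqrt(s2_tilde * v)  # equation (19)
    return s2_tilde, t_tilde, d0 + d


# A gene with an unusually small sample variance is pulled towards the prior.
s2_tilde, t_tilde, df = moderated_t(beta_hat=3.0, s2=1.0, v=2.0 / 3.0,
                                    d=4, d0=4, s0_sq=3.0)
```

Here the ordinary statistic would be about 3.67, while the moderated one is about 2.60 on 8 degrees of freedom, showing how a small $s_g^2$ no longer inflates the statistic on its own.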
![Page 127: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/127.jpg)
CORRESPONDENCE ANALYSIS I
Let $N$ denote the $I \times J$ data matrix. $N$ is converted to the correspondence matrix $P$ such that:
$$P = \frac{N}{\sum_i \sum_j n_{ij}} \tag{20}$$
The row masses are represented by:
$$r_i = \sum_{j=1}^{J} p_{ij} \tag{21}$$
The column masses are represented by:
$$c_j = \sum_{i=1}^{I} p_{ij} \tag{22}$$
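Equations (20) to (22) translate directly into array operations. The following is a minimal sketch with a made-up 2 x 2 table; the function name is illustrative.

```python
import numpy as np


def correspondence_masses(N):
    """Return the correspondence matrix P and the row and column masses."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()      # equation (20): divide by the grand total
    r = P.sum(axis=1)    # equation (21): row masses
    c = P.sum(axis=0)    # equation (22): column masses
    return P, r, c


P, r, c = correspondence_masses([[10, 20], [30, 40]])
```

Both mass vectors sum to one, so they can be read as marginal frequencies of the table.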
![Page 128: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/128.jpg)
CORRESPONDENCE ANALYSIS II
For row and column masses, the diagonal matrices are given by:
$$D_r = \mathrm{diag}(r) \tag{23}$$
$$D_c = \mathrm{diag}(c) \tag{24}$$
The distance between two rows $i$ and $i'$ is given by:
$$d^2(i, i') = \sum_{j=1}^{J} \frac{1}{c_j}\left(\frac{p_{ij}}{r_i} - \frac{p_{i'j}}{r_{i'}}\right)^2 \tag{25}$$
These are Euclidean distances weighted by the inverse of the corresponding frequency, hence standardized variance-wise.
If two rows $i$ and $i'$ with identical profiles are replaced by their sum, the distances between columns do not change.
The inertia for the $i$th row profile is thus defined as:
![Page 129: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/129.jpg)
CORRESPONDENCE ANALYSIS III
$$\text{Row inertia} = \text{Row mass} \times \text{Square of distance from the centroid of the rows} \tag{26}$$
The underlying hypothesis for CA is that the rows and columns are independent. Assuming this hypothesis is true, the theoretical value of cell $(i, j)$ in a contingency table is given by:
$$E_{ij} = r_i c_j \tag{27}$$
However, the observed value at $(i, j)$ is $p_{ij}$. Thus the chi-square statistic is calculated as:
$$\chi^2 = n\sum_{j=1}^{J}\sum_{i=1}^{I} \frac{(p_{ij} - r_i c_j)^2}{r_i c_j} \tag{28}$$
where $n = \sum_i \sum_j n_{ij}$ is the grand total of the table.
Consider the centroid $z$ of the row vector points:
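The statistic in (28) can be checked numerically. The sketch below computes it from the proportions and masses; the function name and the small tables are illustrative only.

```python
import numpy as np


def chi_square_from_table(N):
    """Chi-square statistic of a contingency table, following equation (28)."""
    N = np.asarray(N, dtype=float)
    n = N.sum()                  # grand total
    P = N / n
    r = P.sum(axis=1)            # row masses
    c = P.sum(axis=0)            # column masses
    E = np.outer(r, c)           # expected proportions under independence (27)
    return n * ((P - E) ** 2 / E).sum()


chi2 = chi_square_from_table([[10, 20], [30, 40]])
chi2_independent = chi_square_from_table([[1, 2], [2, 4]])
```

A table whose rows are proportional to each other, such as the second one, has a statistic of zero because every cell equals its expected value.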
![Page 130: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/130.jpg)
CORRESPONDENCE ANALYSIS IV
$$z = [c_1, c_2, \ldots, c_J] \tag{29}$$
Using the distance relation between rows from above, the distance between the $i$th row and the centroid is given by:
$$d_{iz}^2 = \sum_{j=1}^{J} \frac{\left(\frac{p_{ij}}{r_i} - c_j\right)^2}{c_j} \tag{30}$$
which can be rewritten in terms of the expected value $\mu_{ij} = r_i c_j$ as:
$$d_{iz}^2 = \frac{1}{r_i}\sum_{j=1}^{J} \frac{(p_{ij} - \mu_{ij})^2}{\mu_{ij}} \tag{31}$$
Thus row inertia:
![Page 131: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/131.jpg)
CORRESPONDENCE ANALYSIS V
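$$r_i d_{iz}^2 = \sum_{j=1}^{J} \frac{(p_{ij} - \mu_{ij})^2}{\mu_{ij}} \tag{32}$$
The column inertia can be defined similarly.
Consider the residual matrix $S$:
$$S_{ij} = \frac{p_{ij} - \mu_{ij}}{\sqrt{\mu_{ij}}} \tag{33}$$
In order to decompose $S$ to lower dimensions, consider the singular value decomposition (SVD) of $S$:
$$S = U D_\alpha V^T \tag{34}$$
where $U$ and $V$ have orthonormal columns ($U^T U = I$ and $V^T V = I$) and $D_\alpha$ is a diagonal matrix with entries in descending order $\lambda_1, \lambda_2, \ldots$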
![Page 132: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/132.jpg)
CORRESPONDENCE ANALYSIS VI
The scores of the rows are then given by:
$$F = D_r^{-\frac{1}{2}} U D_\alpha \tag{35}$$
and the column scores are given by:
$$G = D_c^{-\frac{1}{2}} V D_\alpha \tag{36}$$
The dimension of these score matrices is $\min(I - 1, J - 1)$, and they essentially represent the coordinates of the row and column vectors in the lower-dimensional subspace.
Points in this space are arranged so that the Euclidean distance between two points corresponds to the chi-square distance in the original matrix.
In order to quantify the amount of inertia represented by this plot, we consider the following score:
![Page 133: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/133.jpg)
CORRESPONDENCE ANALYSIS VII
$$\phi^2 = \sum_{i=1}^{I} r_i d_{iz}^2 \tag{37}$$
and the amount of inertia captured by the first two principal axes is given by:
$$\frac{\lambda_1^2 + \lambda_2^2}{\phi^2} \tag{38}$$
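The whole procedure can be put together in a short NumPy sketch. It takes $S$ as the signed standardized residuals $(p_{ij} - \mu_{ij})/\sqrt{\mu_{ij}}$, the usual choice in correspondence analysis, and uses a made-up 3 x 3 table; the function name is illustrative and this is not the code used to produce the plots in this work.

```python
import numpy as np


def correspondence_analysis(N):
    """Row scores F, column scores G, singular values, total inertia and
    the share of inertia on the first two axes, for a contingency table N."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    mu = np.outer(r, c)                      # expected proportions r_i * c_j
    S = (P - mu) / np.sqrt(mu)               # signed standardized residuals
    U, lam, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * lam) / np.sqrt(r)[:, None]      # D_r^{-1/2} U D_alpha
    G = (Vt.T * lam) / np.sqrt(c)[:, None]   # D_c^{-1/2} V D_alpha
    phi2 = (S ** 2).sum()                    # total inertia, equal to sum_i r_i d_iz^2
    explained = (lam[:2] ** 2).sum() / phi2  # share captured by the first two axes
    return F, G, lam, phi2, explained


N = np.array([[20, 10, 5], [10, 20, 10], [5, 10, 20]], dtype=float)
F, G, lam, phi2, explained = correspondence_analysis(N)

# Chi-square distance between the profiles of rows 0 and 1, for comparison.
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
chi_dist2 = (((P[0] / r[0]) - (P[1] / r[1])) ** 2 / c).sum()
```

The squared Euclidean distance between `F[0]` and `F[1]` matches `chi_dist2`, and the squared singular values add up to `phi2`, which is what makes the ratio in (38) a proportion of total inertia.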
![Page 134: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/134.jpg)
SVM I
Support Vector Machines are binary classifiers. Consider a training set of (points, labels) $(x_i, y_i)$ where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, 1\}$. The idea is to search for a hyperplane that separates the points with $y_i = 1$ from those with $y_i = -1$. There could be multiple such hyperplanes; the focus, however, is only on the hyperplane with the maximum margin (on both sides). Any such hyperplane satisfies:
$$w \cdot x - b = 0 \tag{39}$$
If the data is linearly separable, two parallel hyperplanes bounding the margin can be found:
$$w \cdot x - b = 1 \tag{40}$$
$$w \cdot x - b = -1 \tag{41}$$
![Page 135: DDP Stage 2 Presentation](https://reader033.fdocuments.us/reader033/viewer/2022042516/557d5a78d8b42aba3d8b4ae5/html5/thumbnails/135.jpg)
SVM II
The distance between the two hyperplanes is $\frac{2}{\|w\|}$. Thus minimising $\|w\|$ would yield the required hyperplane.
In order to prevent misclassification, the following constraints are required:
$$(w \cdot x_i - b) \geq 1 \tag{42}$$
for $x_i$ belonging to class 1 and
$$(w \cdot x_i - b) \leq -1 \tag{43}$$
for $x_i$ belonging to class -1, which can be combined as:
$$y_i(w \cdot x_i - b) \geq 1 \tag{44}$$
and the objective function to be minimised under this constraint is: $\|w\|$
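To make the constraint (44) and the margin $2/\|w\|$ concrete, the sketch below checks a few candidate hyperplanes against a toy data set. It does not solve the optimisation problem; the helper names, the data and the candidate $(w, b)$ values are all made up for illustration.

```python
import numpy as np


def margin_width(w):
    """Distance between the hyperplanes w.x - b = 1 and w.x - b = -1."""
    return 2.0 / np.linalg.norm(w)


def satisfies_constraints(w, b, X, y):
    """True if every point fulfils y_i (w.x_i - b) >= 1 (small tolerance for rounding)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return bool(np.all(y * (X @ w - b) >= 1.0 - 1e-12))


# Toy, linearly separable data: two points per class.
X = [[2, 2], [3, 3], [0, 0], [-1, 0]]
y = [1, 1, -1, -1]

wide = satisfies_constraints([0.5, 0.5], 1.0, X, y)        # feasible, larger margin
narrow = satisfies_constraints([1.0, 1.0], 2.0, X, y)      # feasible, smaller margin
too_wide = satisfies_constraints([0.25, 0.25], 0.5, X, y)  # violates the constraints
```

Both feasible candidates separate the classes, but the one with the smaller $\|w\|$ has the wider margin, which is why $\|w\|$ is the quantity being minimised.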