SeqExpress: Introduction
Features
Visualisation Tools. Data: gene expression, gene function and gene location. Analysis: probability models, hierarchies and clusters.
Analysis Tools. Cluster analysis, refinement and validation. Mixture modelling. Graphs and hierarchies.
Data Tools. Data import/export tools (remote access of GEO; local access of tab-separated and MAGE format files). Data integration: optional underlying data and annotation database. Data manipulation.
SeqExpress: Visualisation Tools
Visualisations
Data Visualisation: Gene Expression; Gene Variance; Gene Function/Ontology; and Chromosome Features.
Analysis Visualisations: Hierarchies/Graphs; Probabilistic Methods; and Cluster Comparison.
Gene Expression: scatter plots and parallel plots (also histograms, annotation lists and gene tables).
Gene Variance: gene spectrums and gene clouds.
Gene Ontology Visualisations: TreeMaps, graphs and tables.
Chromosome Feature Visualisations.
Data Analysis: probability models, dendrograms and cluster comparison.
Example: Viewing Clusters
A cluster has been selected in the gene tab. The genes are then selected in a scatter plot, a parallel plot and the histogram.
Example: Gene Function Selection
The "binding" term has been selected from the results of an ontology term search. The term is then automatically selected in the Function tab, as well as in the open Tree Map visualisation. All genes that have been annotated with the binding term are also selected in the parallel plot.
Example: Genome Location
A combined expression profile and location-based cluster analysis has been performed and the results viewed. The parallel plot shows the similar expression profiles, whilst the two genome views show the locale of the genes. The genome view in the middle is set to auto-zoom, and so shows the locale in detail.
Example: Data Analysis
A series of models has been generated, and the genes with a high probability of belonging to one of the models have been selected in the model viewer. The genome locations of these genes and their expression profiles are then shown.
Summary
A number of visualisations are available to support a variety of tasks: expression, ontology (plus pathway and protein-protein interaction), location, hierarchies, cluster comparison, variance and probability theory.
The visualisations are inter-linked.
SeqExpress: Analysis Tools
Analysis Tools 1: Clusters, Hierarchies and Concepts
Clustering: distance based; refinement (ontology or model based); validation (C-Index).
Hierarchies: SDD*; hierarchical projection using:
Covariance*: $\mathrm{eig}(\mathrm{cov}(A))$ or $A = U S V^{T}$
Co-occurrence*: $P(g,e) = P(g) \sum_{z} P(e \mid z)\, P(z \mid g)$
*Used for global/enterprise-wide information retrieval.
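The two projection formulas can be checked numerically. The sketch below (Python with NumPy, chosen for illustration; SeqExpress itself is written in C#) uses a random, hypothetical genes-by-experiments matrix to show that the eigenvalues of the covariance matrix of the column-centred matrix equal the squared singular values divided by n - 1, so eig(cov(A)) and A = USV^T give the same projection axes. It then evaluates the co-occurrence formula for made-up probability tables. Reading g as gene, e as experiment and z as a latent aspect (as in probabilistic latent semantic analysis) is an assumption; the slide does not define the symbols.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expression matrix: 50 genes (rows) x 6 experiments (columns).
A = rng.normal(size=(50, 6))
Ac = A - A.mean(axis=0)                 # centre each experiment (column)

# Route 1: eigen-decomposition of the covariance matrix, eig(cov(A)).
cov = np.cov(Ac, rowvar=False)          # 6 x 6, divides by n - 1
evals, evecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
evals = evals[::-1]                     # descending, to match SVD ordering

# Route 2: singular value decomposition of the centred matrix, A = U S V^T.
U, S, Vt = np.linalg.svd(Ac, full_matrices=False)

# Covariance eigenvalues equal S^2 / (n - 1).
n = A.shape[0]
print(np.allclose(evals, S**2 / (n - 1)))   # True

# Co-occurrence: P(g, e) = P(g) * sum_z P(e|z) P(z|g).
# Hypothetical sizes: 4 genes, 3 experiments, 2 latent aspects z.
P_g = np.array([0.1, 0.2, 0.3, 0.4])                  # P(g)
P_z_given_g = np.array([[0.9, 0.1], [0.5, 0.5],
                        [0.2, 0.8], [0.7, 0.3]])      # each row sums to 1
P_e_given_z = np.array([[0.6, 0.3, 0.1],
                        [0.1, 0.2, 0.7]])             # each row sums to 1
# (4 x 2) @ (2 x 3): entry [g, e] is sum_z P(z|g) P(e|z).
P_ge = P_g[:, None] * (P_z_given_g @ P_e_given_z)
print(P_ge.sum())                                     # 1, up to rounding
```

The SVD route avoids forming the covariance matrix explicitly, which matters when there are many more genes than experiments.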
Cluster Distances
[Figure: example ontology graph (TERM:1 to TERM:6) and location diagram (labels Y, Z, 1 to 4).]
Expression: Pearson, cosine, Euclidean, Manhattan.
Function: information theory, $\frac{2N_3}{N_1 + N_2 + 2N_3}$.
Location: intra-gene distance; distance to feature.
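The expression and function measures above can be sketched in plain Python. Pearson and cosine distance are taken here as 1 minus the respective similarity, a common convention that the slide does not state. The slide also does not define N1, N2 and N3: the same expression reads as Lin's information-theoretic similarity when N3 is the information content of the shared ancestor and N1, N2 the extra content of each term, and as Wu and Palmer's measure when they are edge counts. The sketch uses edge counts on a hypothetical tree (the names `PARENT`, `ancestors` and `function_similarity` are illustrative), purely because they are easy to compute.

```python
import math

def pearson_distance(a, b):
    """1 - Pearson correlation of two expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / (sa * sb)

def cosine_distance(a, b):
    """1 - cosine of the angle between two profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical ontology, stored as a tree: child -> parent.
PARENT = {"A": "root", "B": "root", "A1": "A", "A2": "A", "B1": "B"}

def ancestors(term):
    """The term itself followed by its ancestors up to the root."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def function_similarity(t1, t2):
    """2*N3 / (N1 + N2 + 2*N3), with N1..N3 taken as edge counts (assumption)."""
    up1, up2 = ancestors(t1), ancestors(t2)
    common = next(t for t in up1 if t in up2)   # deepest common ancestor
    n1 = up1.index(common)                      # edges from t1 up to it
    n2 = up2.index(common)                      # edges from t2 up to it
    n3 = len(ancestors(common)) - 1             # its depth below the root
    denom = n1 + n2 + 2 * n3
    return 1.0 if denom == 0 else 2 * n3 / denom

print(function_similarity("A1", "A2"))      # 0.5
print(function_similarity("A1", "B1"))      # 0.0
print(euclidean_distance([0, 0], [3, 4]))   # 5.0
```

Terms that share only the root score 0, and a term compared with itself scores 1.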
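SAGE: Semi-Discrete Decomposition
$$
\underbrace{\begin{pmatrix}3&3&0&0\\3&3&5&5\\0&0&5&5\end{pmatrix}}_{A}
=
\underbrace{\begin{pmatrix}1&1&0\\1&1&1\\0&0&1\end{pmatrix}}_{X}
\underbrace{\begin{pmatrix}3&0&0\\0&3&0\\0&0&5\end{pmatrix}}_{D}
\underbrace{\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&1\end{pmatrix}}_{Y^{T}}
$$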
• Immunity to outliers
• Uses local density
• Describes both experiments and genes
• Hierarchical description
• Stencils mean that fold-in is possible
• Highly scalable
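A semi-discrete decomposition writes A as a weighted sum of terms d·x·yᵀ whose vectors x and y contain only -1, 0 and 1. The slides do not say which SDD algorithm SeqExpress uses; the Python/NumPy sketch below assumes the greedy alternating heuristic usually used to compute an SDD, in which each new term is fitted to the current residual by alternately choosing the best ternary x for a fixed y and vice versa. The function names `best_ternary` and `sdd` are illustrative.

```python
import numpy as np

def best_ternary(s):
    """x in {-1, 0, 1}^n maximising (x . s)^2 / ||x||^2, for a non-zero s."""
    order = np.argsort(-np.abs(s))                 # largest |s_i| first
    partial = np.cumsum(np.abs(s)[order])          # sums of the J largest |s_i|
    j = np.arange(1, len(s) + 1)
    keep = order[: int(np.argmax(partial ** 2 / j)) + 1]
    x = np.zeros(len(s))
    x[keep] = np.sign(s[keep])
    return x

def sdd(A, k, inner_iters=10):
    """Greedy SDD with at most k terms: A ~ X @ D @ Y.T, X and Y in {-1, 0, 1}."""
    R = np.asarray(A, dtype=float).copy()          # residual
    m, n = R.shape
    xs, ds, ys = [], [], []
    for _ in range(k):
        if not R.any():                            # exact fit already reached
            break
        y = np.zeros(n)
        y[np.argmax(np.linalg.norm(R, axis=0))] = 1.0   # start at heaviest column
        for _ in range(inner_iters):               # alternate x and y updates
            x = best_ternary(R @ y)
            y = best_ternary(R.T @ x)
        d = (x @ R @ y) / ((x @ x) * (y @ y))      # least-squares weight of term
        R -= d * np.outer(x, y)
        xs.append(x)
        ds.append(d)
        ys.append(y)
    X = np.array(xs).reshape(-1, m).T
    Y = np.array(ys).reshape(-1, n).T
    return X, np.diag(ds), Y

A = np.array([[3, 3, 0, 0], [3, 3, 5, 5], [0, 0, 5, 5]], dtype=float)
X, D, Y = sdd(A, 3)
print(np.diag(D))
print(np.allclose(X @ D @ Y.T, A))   # True
```

On the slide's 3 × 4 matrix this heuristic reaches an exact fit with two terms (weights 5 and 3) rather than the three shown, a reminder that an SDD need not be unique.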
Analysis Tools 2: Models and Graphs
Graphs: two-factor analysis using (1) graph connectivity and (2) edge length.
Models: N-factor analysis using the product rule, $P(A,B \mid C) = P(A \mid B,C)\,P(B \mid C)$.
Multi-factor analysis is used to identify complex features within the data (e.g. genes that both have a similar expression profile and are located on the same part of a chromosome).
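To illustrate how the product rule composes factors, the Python sketch below builds P(A, B | C) for a single model C from two hypothetical factor tables, reading B as a chromosome region (3 states) and A as an expression state (2 states). These interpretations and all numbers are assumptions for illustration, not values from SeqExpress.

```python
import numpy as np

# Hypothetical factors for one model C.
P_B_given_C = np.array([0.5, 0.3, 0.2])            # P(B | C), sums to 1
P_A_given_BC = np.array([[0.9, 0.1],               # P(A | B=0, C)
                         [0.4, 0.6],               # P(A | B=1, C)
                         [0.2, 0.8]])              # P(A | B=2, C)

# Product rule: P(A, B | C) = P(A | B, C) * P(B | C); rows index B, columns A.
P_AB_given_C = P_A_given_BC * P_B_given_C[:, None]

# The result is a proper joint distribution given C ...
print(round(P_AB_given_C.sum(), 10))               # 1.0
# ... and summing out B gives P(A | C).
P_A_given_C = P_AB_given_C.sum(axis=0)
print(P_A_given_C)
```

Adding a further factor repeats the same step, which is what lets the approach extend from two factors to N.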
Models: Discovery
Different models can be found and altered using energy parameters and tempering.
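The slides do not say how the energy parameters and tempering are implemented. One common reading, assumed here, is deterministic annealing of a mixture model: each gene's score under every model is raised to an inverse temperature beta and renormalised, so a small beta spreads membership across models and a large beta sharpens it. The Python sketch below applies this to hypothetical one-dimensional Gaussian components; `tempered_posteriors` and all values are illustrative, and this beta is not claimed to be the beta shown in the slides' figures.

```python
import numpy as np

def tempered_posteriors(x, means, sds, weights, beta):
    """P(z | x) proportional to (w_z * N(x | mu_z, sd_z)) ** beta."""
    x = np.asarray(x, dtype=float)[:, None]           # genes as rows
    log_lik = (-0.5 * ((x - means) / sds) ** 2
               - np.log(sds * np.sqrt(2 * np.pi)))    # log N(x | mu, sd)
    log_p = beta * (np.log(weights) + log_lik)        # tempered log score
    log_p -= log_p.max(axis=1, keepdims=True)         # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

means = np.array([0.0, 3.0])
sds = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])
x = [1.0, 1.4, 2.5]                                   # hypothetical gene values

for beta in (0.1, 1.0, 5.0):
    print(beta, tempered_posteriors(x, means, sds, weights, beta).round(3))
```

Starting with a small beta and raising it gradually is the usual way such tempering helps a fit avoid poor local optima.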
[Figure: four panels comparing unsupervised clusters with regulatory modules (cluster sizes and module labels), for models labelled Spline (beta 0.1), Linear (beta 0.6), Cosine (beta 1.1) and Normal (beta 0.1).]
Models: Usage
Cluster generation: high probabilities equate to cluster membership.
Fitting data: use normal tissues to fit models to genes, and disease tissues to fit genes to models. Changed behaviour equates to the likelihood of a model transition.
Combining models: complex feature identification (given feature X on condition Y).
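Cluster generation from fitted models can be sketched as thresholding each gene's membership probabilities: a gene joins model z when P(z | gene) reaches a cut-off. The threshold of 0.8, the probability matrix and the function name `clusters_from_posteriors` are hypothetical; the slides do not give the exact rule SeqExpress applies.

```python
import numpy as np

def clusters_from_posteriors(post, threshold=0.8):
    """Map each model index to the genes whose P(model | gene) >= threshold."""
    post = np.asarray(post, dtype=float)
    return {z: np.flatnonzero(post[:, z] >= threshold).tolist()
            for z in range(post.shape[1])}

post = np.array([[0.95, 0.05],   # gene 0: clearly model 0
                 [0.55, 0.45],   # gene 1: ambiguous, left unassigned
                 [0.10, 0.90],   # gene 2: model 1
                 [0.85, 0.15]])  # gene 3: model 0
print(clusters_from_posteriors(post))   # {0: [0, 3], 1: [2]}
```

With any threshold above 0.5 no gene can join two models, while genes with no dominant model stay unassigned rather than being forced into a cluster.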
Graph: Discovery
Graph connectivity equates to: the MST (minimum spanning tree) of expression values; sub-graphs of the gene ontology; chromosome relationships.
Edge distance equates to: expression distance; network (ontology) distance; linear chromosomal distance.
Graph partitioning: regular (using Metis) or irregular (Min/Max).
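As one illustration of graph-based discovery, the Python sketch below builds the minimum spanning tree of hypothetical expression profiles with Prim's algorithm (Euclidean edge lengths) and then removes the k - 1 longest tree edges, leaving k connected groups. This longest-edge cut is a simple stand-in chosen for illustration; it is not Metis and is not claimed to be SeqExpress's Min/Max partitioning. The functions `mst_edges` and `partition` are illustrative names.

```python
import math

def mst_edges(profiles):
    """Prim's algorithm on the complete graph of Euclidean distances."""
    n = len(profiles)
    in_tree = [False] * n
    best = [math.inf] * n        # cheapest known edge weight into the tree
    link = [-1] * n              # tree vertex at the other end of that edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((best[u], link[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(profiles[u], profiles[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return edges

def partition(profiles, k):
    """Cut the k - 1 longest MST edges; return a cluster label per gene."""
    n = len(profiles)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of genes")
    keep = sorted(mst_edges(profiles))[: n - k]   # the n - k shortest edges
    parent = list(range(n))                       # union-find forest
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]         # path halving
            i = parent[i]
        return i
    for _, a, b in keep:
        parent[find(a)] = find(b)
    roots = [find(i) for i in range(n)]
    ids = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    return [ids[r] for r in roots]

profiles = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [0.1, 0.3]]
print(partition(profiles, 2))   # [0, 0, 1, 1, 0]
```

Cutting the longest MST edges is equivalent to single-linkage clustering, which is why it produces irregular group sizes rather than the balanced parts a regular partitioner aims for.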
Analysis: Summary
Desktop analysis. A number of techniques are available, and they can be customised for different data sets (e.g. organism, array type). Borrows heavily from information retrieval. Probabilistic techniques show the most promise.
SeqExpress: Data Tools
Data Analysis
Data import/export tools: remote access of GEO (one-click access); import of tab-separated and MAGE formats; export of tab-separated and Bioconductor formats.
Data integration: data and annotation database, with automatic and configurable annotation mapping (e.g. SAGE tag to LocusLink (Entrez Gene?) to UniGene).
Data manipulation: transformation, filtering and constraining.
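Transformation and filtering of tab-separated expression data can be sketched with the Python standard library. The file layout assumed here (a header row of sample names, then one row per gene with the identifier in the first column followed by positive expression values) is hypothetical, as are the function names, the log2 transform and the variance cut-off; the slides do not specify SeqExpress's layout or filters.

```python
import csv
import io
import math
import statistics

def read_tsv(text):
    """Parse a tab-separated matrix into (sample names, {gene: [values]})."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(rows)
    data = {row[0]: [float(v) for v in row[1:]] for row in rows if row}
    return header[1:], data

def log2_transform(data):
    """Transformation: log2 of every (positive) expression value."""
    return {g: [math.log2(v) for v in vals] for g, vals in data.items()}

def filter_by_variance(data, min_var):
    """Filtering: keep genes whose sample variance is at least min_var."""
    return {g: vals for g, vals in data.items()
            if statistics.variance(vals) >= min_var}

TSV = "gene\ts1\ts2\ts3\nG1\t1\t2\t4\nG2\t8\t8\t8\nG3\t2\t16\t2\n"
samples, data = read_tsv(TSV)
kept = filter_by_variance(log2_transform(data), min_var=0.5)
print(samples, sorted(kept))   # ['s1', 's2', 's3'] ['G1', 'G3']
```

The flat gene G2 is dropped because its log2 values have zero variance, which is the usual motivation for variance filtering before clustering.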
Data Integration: GEO
Data Integration: Annotation Builder
SeqExpress: Summary
Summary
Written in C#; free, and runs under Windows. Not associated with any academic institution, funding body or commercial organisation. Development is still ongoing, and the plan is to develop to the Expression Application Class Specification. Looking for employment in Seattle…