SeqExpress: Introduction. Features Visualisation Tools Data: gene expression, gene function and...

SeqExpress: Introduction

Features

Visualisation Tools. Data: gene expression, gene function and gene location. Analysis: probability models, hierarchies and clusters.

Analysis Tools. Cluster analysis, refinement and validation. Mixture modelling. Graphs and hierarchies.

Data Tools. Data import/export tools (remote access of GEO, local access of tab-separated and MAGE format). Data integration: optional underlying data and annotation database. Data manipulation.

SeqExpress: Visualisation Tools

Visualisations

Data Visualisation: Gene Expression; Gene Variance; Gene Function/Ontology; and Chromosome Features.

Analysis Visualisations: Hierarchies/Graphs; Probabilistic Methods; and Cluster Comparison.

Gene Expression: Scatter Plots, Parallel Plots. Also: Histograms, Annotation Lists and Gene Tables.

Gene Variance: Gene Spectrums, Gene Clouds.

Gene Ontology Visualisations: TreeMaps, Graphs, Tables.

Chromosome Feature Visualisations

Data Analysis: Probability Models, Dendrograms, Cluster Comparison.

Example: Viewing Clusters

A cluster has been selected in the gene tab. The genes are then selected in a scatter plot, a parallel plot and the histogram.

Example: Gene Function Selection

The binding term has been selected from the results of an ontology term search. The binding term is then automatically selected in the Function tab, as well as the open Tree Map visualisation. All genes that have been annotated with the binding term are also selected in the parallel plot.

Example: Genome Location

A combined expression profile and location-based cluster analysis has been performed and the results viewed. The parallel plot shows the similar expression profiles, whilst the two genome views show the locale of the genes. The genome view in the middle is set to auto-zoom, and so shows the locale in detail.

Example: Data Analysis

A series of models has been generated, and the genes with a high probability of belonging to one of the models have been selected in the model viewer. The corresponding locations of the genes and their expression profiles are then shown.

Summary

A number of visualisations are available to support a variety of tasks: expression; ontology (plus pathway and protein-protein interaction); location; hierarchies; cluster comparison; variance; probability theory.

The visualisations are inter-linked.

SeqExpress: Analysis Tools

Analysis Tools 1: Clusters, Hierarchies and Concepts

Clustering: distance based; refinement (ontology or model based); validation (C-Index).

Hierarchies: SDD*, Hierarchical.

Projection:

Covariance*: eigen(covar(A)) or A = U S V^T

Co-occurrence*: P(g,e) = P(g) Σ_z P(e|z) P(z|g)

*Used for global/enterprise-wide information retrieval
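The covariance projection can be illustrated with a minimal sketch. It assumes that rows of A are genes and columns are experiments (the slide does not say which), and it uses plain power iteration to find the leading eigenvector of covar(A). The function names are illustrative only and are not taken from SeqExpress.

```python
import math

def covariance(rows):
    """Sample covariance between the columns of a row-major matrix."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(m)]
            for i in range(m)]

def leading_eigenpair(matrix, iterations=500):
    """Power iteration on a symmetric matrix: (eigenvalue, unit eigenvector).

    Assumption: the start vector is not orthogonal to the leading
    eigenvector; a production version would guard against that case.
    """
    m = len(matrix)
    v = [1.0 / (i + 1) for i in range(m)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    length = math.sqrt(sum(x * x for x in v))
    v = [x / length for x in v]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    value = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(m))
                for i in range(m))
    return value, v

def project(rows, axis):
    """Coordinate of each (uncentred) row along the given axis."""
    return [sum(x * a for x, a in zip(row, axis)) for row in rows]

if __name__ == "__main__":
    data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
    value, axis = leading_eigenpair(covariance(data))
    print(value, axis, project(data, axis))
```

The same axis could be obtained from the right singular vectors of the centred matrix, which is the A = U S V^T route named on the slide.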

Cluster Distances

[Figure: example ontology graph with nodes TERM:1 to TERM:6, plus a diagram labelled Y, Z and 1 to 4.]

Expression: Pearson, Cosine, Euclidean, Manhattan.

Function: information theory, 2*N3/(N1+N2+2*N3).

Location: intra-gene distance, distance to feature.
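The distance measures named on this slide can be sketched as follows. Two points are assumptions rather than statements from the slide: the correlation-style measures are turned into distances as 1 minus the similarity, and N1, N2, N3 are read as the path lengths from each term to their closest shared ancestor (N1, N2) and from that ancestor to the root (N3). The slide does not define them.

```python
import math

def pearson_distance(a, b):
    """1 - Pearson correlation (undefined for constant profiles)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1.0 - cov / (sd_a * sd_b)

def cosine_distance(a, b):
    """1 - cosine of the angle between two profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def ontology_similarity(n1, n2, n3):
    """The slide's formula 2*N3 / (N1 + N2 + 2*N3)."""
    return 2 * n3 / (n1 + n2 + 2 * n3)

if __name__ == "__main__":
    g1, g2 = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    print(pearson_distance(g1, g2), cosine_distance(g1, g2))
    print(euclidean_distance(g1, g2), manhattan_distance(g1, g2))
    print(ontology_similarity(1, 1, 2))
```

Under this reading the ontology score approaches 1 when two terms sit close to a deep shared ancestor, and falls towards 0 as they move apart.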

SAGE: Semi Discrete Decomposition

A = X D Y^T

X =
1 1 0
1 1 1
0 0 1

D =
3 0 0
0 3 0
0 0 5

Y^T =
1 0 0 0
0 1 0 0
0 0 1 1

A =
3 3 0 0
3 3 5 5
0 0 5 5

• Immunity to outliers
• Uses local density
• Describes both experiments and genes
• Hierarchical description
• Stencils mean that fold-in is possible
• Highly scalable
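The small example on this slide can be checked numerically. The sketch below reads the digits on the slide as the factors of A = X D Y^T, which is an interpretation of the flattened matrices rather than something the slide states. In a semi-discrete decomposition the entries of X and Y are restricted to -1, 0 and 1, and D is diagonal.

```python
def matmul(a, b):
    """Plain row-by-column matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

# Factors as read from the slide (assumed layout).
X = [[1, 1, 0],
     [1, 1, 1],
     [0, 0, 1]]
D = [[3, 0, 0],
     [0, 3, 0],
     [0, 0, 5]]
Y = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1],
     [0, 0, 1]]

# Rebuild the data matrix from its three factors.
A = matmul(matmul(X, D), transpose(Y))

if __name__ == "__main__":
    for row in A:
        print(row)
```

Each column of X paired with the matching column of Y marks out one rectangular block of A, and the diagonal entry of D gives that block's value, which is presumably what the slide means by "stencils".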

Analysis Tools 2: Models and Graphs

Graphs: two-factor analysis using (1) graph connectivity and (2) edge length.

Models: N-factor analysis using the product rule: P(A,B|C) = P(A|B,C) P(B|C).

Multi-factor analysis to identify complex features within the data (e.g. genes which both have a similar expression profile and are located on the same part of a chromosome).
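The product rule used for the models can be checked on a small worked example. The joint distribution below is entirely hypothetical: A might stand for "similar expression profile", B for "same chromosome region" and C for "belongs to the model", but the numbers are made up for illustration.

```python
# Hypothetical joint distribution P(A, B, C) over three binary events.
joint = {
    (1, 1, 1): 0.20, (1, 0, 1): 0.10, (0, 1, 1): 0.05, (0, 0, 1): 0.15,
    (1, 1, 0): 0.05, (1, 0, 0): 0.10, (0, 1, 0): 0.15, (0, 0, 0): 0.20,
}

def prob(a=None, b=None, c=None):
    """Marginal probability; None means 'sum over this variable'."""
    return sum(p for (x, y, z), p in joint.items()
               if (a is None or x == a)
               and (b is None or y == b)
               and (c is None or z == c))

# Left-hand side: P(A, B | C)
lhs = prob(a=1, b=1, c=1) / prob(c=1)

# Right-hand side: P(A | B, C) * P(B | C)
p_a_given_bc = prob(a=1, b=1, c=1) / prob(b=1, c=1)
p_b_given_c = prob(b=1, c=1) / prob(c=1)
rhs = p_a_given_bc * p_b_given_c

if __name__ == "__main__":
    print(lhs, rhs)
```

Chaining the rule in this way is what lets further factors be added one conditional at a time.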

Models: Discovery

Different models can be found, and altered using energy parameters and tempering.

[Figure: model discovery results comparing Unsupervised Clusters with Regulatory Modules. Each node is labelled with its size and, for the regulatory modules, a functional description (e.g. "Size: 32 ( Ribosomal and phosphate metabolism )", "Size: 53 ( Respiration and carbon regulation )", "Size: 34 ( Missing values )"). Settings shown: Spline (beta 0.1), Linear (beta 0.6), Cosine (beta 1.1), Normal (beta 0.1).]

Models: Usage

Cluster generation: High probabilities equate to cluster membership.

Fitting data: Use normal tissues to fit models to genes, use disease tissues to fit genes to models. Changed behaviour equates to likelihood of model transition.

Combining models: complex feature identification (given feature X on condition Y).
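The idea that high probabilities equate to cluster membership can be sketched as below. Everything specific here is an assumption: each model is taken to be a spherical Gaussian around a mean expression profile, the models have equal priors, and a gene joins a cluster only when its posterior probability reaches a threshold. The slides do not give the form of the SeqExpress models.

```python
import math

def posteriors(profile, model_means, sigma=1.0):
    """P(model | profile) for equal-prior spherical Gaussian models."""
    log_liks = [
        -sum((x - m) ** 2 for x, m in zip(profile, mean)) / (2 * sigma ** 2)
        for mean in model_means
    ]
    top = max(log_liks)  # subtract the maximum for numerical stability
    weights = [math.exp(l - top) for l in log_liks]
    total = sum(weights)
    return [w / total for w in weights]

def assign_clusters(profiles, model_means, threshold=0.9):
    """Map model index -> genes whose best posterior meets the threshold."""
    clusters = {k: [] for k in range(len(model_means))}
    for gene, profile in profiles.items():
        post = posteriors(profile, model_means)
        best = max(range(len(post)), key=post.__getitem__)
        if post[best] >= threshold:
            clusters[best].append(gene)
    return clusters

if __name__ == "__main__":
    # Hypothetical gene profiles and model means.
    profiles = {"g1": [0.0, 0.0], "g2": [5.0, 5.0], "g3": [2.5, 2.5]}
    means = [[0.0, 0.0], [5.0, 5.0]]
    print(assign_clusters(profiles, means))
```

A gene that sits between two models, such as g3 here, is left unassigned rather than forced into either cluster.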

Graph: Discovery

Graph connectivity equates to: MST of expression values; sub-graphs of the gene ontology; chromosome relationship.

Edge distance equates to: expression distance; network (ontology) distance; linear chromosomal distance.

Graph partitioned: regular (using Metis); irregular (Min/Max).
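Building a minimum spanning tree (MST) over expression values can be sketched with Prim's algorithm. The sketch assumes a complete graph whose edge lengths are Euclidean distances between expression profiles; the slides do not say which distance or which MST algorithm SeqExpress uses, and the gene names are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def minimum_spanning_tree(profiles):
    """Prim's algorithm; returns a list of (gene_a, gene_b, distance).

    Simple O(n^3) version for clarity, not tuned for large arrays.
    """
    genes = list(profiles)
    if not genes:
        return []
    in_tree = {genes[0]}
    edges = []
    while len(in_tree) < len(genes):
        # Shortest edge leaving the tree built so far.
        dist, u, v = min(
            (euclidean(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in genes
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges

if __name__ == "__main__":
    profiles = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 0.0]}
    for edge in minimum_spanning_tree(profiles):
        print(edge)
```

The resulting edge list supplies both factors named on the slide: which genes are connected, and how long each connecting edge is.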

Analysis: Summary

Desktop analysis. A number of techniques are available. Techniques can be customised for different data sets (e.g. organism, array type). Borrows heavily from Information Retrieval. Probabilistic techniques show most promise.

SeqExpress: Data Tools

Data Analysis

Data Import/Export tools: remote access of GEO (one-click access); import of tab-separated and MAGE format; export of tab-separated and Bioconductor format.

Data Integration: data and annotation database. Automatic and configurable annotation mapping (e.g. SAGE tag to LocusLink (Entrez Gene?) to UniGene).

Data Manipulation: transformation, filtering and constraining.

Data Integration: GEO

Data Integration: Annotation Builder

SeqExpress: Summary

Summary

Written in C#, is free and runs under Windows. Not associated with any academic institution, funding body or commercial organisation. Development is still ongoing. Plan to develop to the Expression Application Class Specification. Looking for employment in Seattle…
