Genopolis Microarray DB
a Progress Report
Marco Brandizi
Dec 12, 2005
Dottorato in Informatica (PhD in Computer Science), XIX Ciclo
Outline
Introduction
GCA Application
Main features
Demo
Demo/Gene Browser
Recently added features
Access control
Search & Save
Ongoing and future
MAGE Export
Migration to cluster
Management of knowledge about Higher Level Analysis
Other possible developments
[Diagram labels: DNA, gene, mRNA, protein, Genes Machine, Cell/Life]
Microarray Data, conceptual model
Microarray Data Management Issues
Exp. data vs. seq. data:
Context dependent (living system, experimental conditions)
Lack of a standard unit of measure
Several normalization methods
Multiple platforms and methods
No standard for data annotation
Vocabularies and terminology coherence
Details about: experiment, source, protocols, experimental conditions
Microarray Data Management Issues / 2
Evidence about data quality
What to store?
Raw Images
Computed values
Normalized values
How to find data
Systems aware of complex vocabularies (ontologies)
Data mining and experiment comparison tools
Data access control
MIAME Experiment Modeling
GCA Features
Curated experimental design representation
MIAME-compliant (although with a simplified model)
Use of controlled vocabularies
Experiment checking/publishing, with supervision
Targeted to Affymetrix platform
Chip description is simple, imported from NetAffx
Single channel technology
Access control
Users are organized into groups and access roles
Experiments belong to user groups
GCA Features
Data retrieval and visualization
Gene browser, a graphical visualization interface based on the matrix model
Search & Save data
Current content:
A set of time courses of dendritic cells (DCs) exposed to different stimuli
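The "matrix model" behind the gene browser can be pictured as a genes-by-samples table of expression values. The following is a minimal Python sketch under that assumption; the class name, probe set IDs, sample names and values are invented for illustration and are not taken from GCA.

```python
from typing import Dict, List


class ExpressionMatrix:
    """Genes-by-samples table: one row per probe set, one column per sample."""

    def __init__(self, samples: List[str]) -> None:
        self.samples = list(samples)
        self.rows: Dict[str, List[float]] = {}

    def add_row(self, probe_set: str, values: List[float]) -> None:
        # Every row must carry exactly one value per sample (column).
        if len(values) != len(self.samples):
            raise ValueError("one value per sample is required")
        self.rows[probe_set] = list(values)

    def value(self, probe_set: str, sample: str) -> float:
        """Single cell of the matrix."""
        return self.rows[probe_set][self.samples.index(sample)]

    def profile(self, probe_set: str) -> Dict[str, float]:
        """Expression of one probe set across all samples, e.g. a time course."""
        return dict(zip(self.samples, self.rows[probe_set]))


# Hypothetical time course with samples taken at 0, 2 and 4 hours.
matrix = ExpressionMatrix(["t0h", "t2h", "t4h"])
matrix.add_row("probe_A", [120.0, 340.5, 910.2])
matrix.add_row("probe_B", [88.1, 85.0, 90.3])
```

A browser built on such a structure would render one row per probe set and one column per sample, which is what makes time courses easy to compare side by side.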
Implementation & Deployment
LAMP application (Linux + Apache + MySQL + PHP)
Model-View-Controller (MVC), as far as possible:
Business objects layer
Presentation widgets (DAO-lib)
Other application control layers
GCA Features
In short:
A gene expression database application, focused on Affymetrix technology, serving as a shared facility for a distributed community of users
GCA Data Model
GCA Login
GCA Editing
GCA Experiment Checking
GCA Import of chip annotations
GCA CVs and protocols
GCA Gene Browser
GCA Access Management
[Diagram labels: Bicocca, Besta, ADMIN; Granucci, Norman, Tiranti, Andrea, Brandizi, Ottavio; Experiment 123]
User                Permissions
Brandizi, Andrea    All
Granucci            Read
Norman              Read, Write
Tiranti             All (except admin)
Ottavio             None
Role labels: All rights; All but admin; R, W, -publish; Read only
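The permissions listed above can be read as a mapping from users to roles, and from roles to allowed actions. The following Python sketch illustrates that reading only; the action names (including "publish" and "admin", inferred from the role labels), the role identifiers and the function `is_allowed` are assumptions, not the GCA security library.

```python
from typing import Dict, FrozenSet

# Hypothetical action names, inferred from the role labels on the slide.
READ, WRITE, PUBLISH, ADMIN = "read", "write", "publish", "admin"

# Each role is a fixed set of allowed actions.
ROLES: Dict[str, FrozenSet[str]] = {
    "all": frozenset({READ, WRITE, PUBLISH, ADMIN}),    # "All rights"
    "all_but_admin": frozenset({READ, WRITE, PUBLISH}),  # "All but admin"
    "read_write": frozenset({READ, WRITE}),              # "R, W, -publish"
    "read_only": frozenset({READ}),                      # "Read only"
    "none": frozenset(),
}

# Role of each user on "Experiment 123", following the permissions listed above.
EXPERIMENT_123: Dict[str, str] = {
    "Brandizi": "all",
    "Andrea": "all",
    "Granucci": "read_only",
    "Norman": "read_write",
    "Tiranti": "all_but_admin",
    "Ottavio": "none",
}


def is_allowed(user: str, action: str, acl: Dict[str, str]) -> bool:
    """Return True if the user's role on this experiment includes the action."""
    role = acl.get(user, "none")  # unknown users get no rights
    return action in ROLES[role]
```

Defaulting unknown users to the empty role keeps the check closed by default, which seems the safer choice for a shared facility.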
Access management
Based on a core library
Recent developments (security lib)
Code has been changed so that it uses the security lib
All code that interacts with the user has been wrapped with access management controls
Even malicious access attempts have been considered:
Manually typing a URL
Manually requesting an uploaded file (to be completed)
Does it work?
Yes, pretty sure
But more testing is needed
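Wrapping every user-facing entry point with an access check, so that a manually typed URL meets the same control as a normal click, can be sketched as follows. GCA itself is written in PHP; this is a Python illustration only, and `requires`, `AccessDenied`, `PERMISSIONS` and the two handlers are hypothetical names.

```python
import functools
from typing import Callable, Dict, Set, Tuple


class AccessDenied(Exception):
    """Raised when a user attempts an action they have no right to perform."""


# Hypothetical permission store: (user, experiment id) -> allowed actions.
PERMISSIONS: Dict[Tuple[str, int], Set[str]] = {
    ("Granucci", 123): {"read"},
    ("Norman", 123): {"read", "write"},
}


def requires(action: str) -> Callable:
    """Decorator that checks the permission before the handler runs."""
    def decorator(handler: Callable) -> Callable:
        @functools.wraps(handler)
        def wrapper(user: str, experiment_id: int, *args, **kwargs):
            allowed = PERMISSIONS.get((user, experiment_id), set())
            if action not in allowed:
                raise AccessDenied(
                    f"{user} may not {action} experiment {experiment_id}")
            return handler(user, experiment_id, *args, **kwargs)
        return wrapper
    return decorator


@requires("read")
def view_experiment(user: str, experiment_id: int) -> str:
    return f"experiment {experiment_id} shown to {user}"


@requires("write")
def edit_experiment(user: str, experiment_id: int, title: str) -> str:
    return f"experiment {experiment_id} renamed to {title!r} by {user}"
```

Because the check lives in the wrapper rather than in the page that links to the handler, it runs no matter how the request was produced.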
Search and Save
MAGE Export
Will allow exporting a GCA experiment to MAGE/ArrayExpress
A collaboration with EBI, in the context of u-GENE
So far:
Schema of GCA->MAGE (in AE-compatible form)
Basic code fragments (business objects in Java)
Still to do
Full code
Mappings with MGED-Ontology
Tests with AE
GCA on cluster architecture
Three machines, the minimum to have a cluster
Master (Xeon 3.2 GHz, 2 GB RAM)
+ Master clone that ensures high availability
Computation node computers (P4 3 GHz, 512 MB)
1 TB of SCSI disk, shared via NFS
Based on:
Debian (Linux)
Linux Virtual Server (load balancer)
Heartbeat (high availability)
GCA on cluster architecture
Code needs slight changes:
PHP side and sessions:
Objects that are saved in the session need to be reloaded properly
See:
http://it2.php.net/manual/en/language.oop.magic-functions.php#14473
__wakeup() is already used
__sleep() with a proper return value still has to be implemented
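In PHP, `__sleep()` returns the names of the properties to serialize and `__wakeup()` re-establishes anything that was left out, such as a database connection. The sketch below shows the same idea as a Python analogue, not as GCA's PHP code; `ExperimentView`, `SESSION_FIELDS`, `sleep` and `wakeup` are hypothetical names chosen to mirror the PHP hooks.

```python
import json
from typing import List


class ExperimentView:
    """Object kept in a session; its connection must not be serialized."""

    # Names of the attributes worth storing, comparable to the array
    # that PHP's __sleep() returns.
    SESSION_FIELDS = ("experiment_id", "selected_genes")

    def __init__(self, experiment_id: int, selected_genes: List[str]) -> None:
        self.experiment_id = experiment_id
        self.selected_genes = list(selected_genes)
        self.connection = self._connect()

    def _connect(self) -> object:
        # Placeholder for opening a real database handle.
        return object()

    def sleep(self) -> str:
        """Serialize only SESSION_FIELDS (the role of __sleep())."""
        return json.dumps(
            {name: getattr(self, name) for name in self.SESSION_FIELDS})

    @classmethod
    def wakeup(cls, payload: str) -> "ExperimentView":
        """Rebuild the object and reopen the connection (the role of __wakeup())."""
        return cls(**json.loads(payload))


view = ExperimentView(123, ["probe_A", "probe_B"])
payload = view.sleep()                      # what would go into shared session storage
restored = ExperimentView.wakeup(payload)   # what another cluster node would rebuild
```

On a cluster this matters because the node that restores the session may not be the node that saved it, so anything tied to one machine has to be recreated rather than stored.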
MySQL side:
The stable DB:
We need to specify the type of DB access: read-only (RO) mode vs. read/write (RW) mode
RO access uses the local copy of the DB
RW access uses the master copy
The temporary DB:
Only the master copy exists (port 3307 in the current deployment)
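The routing rule above (RO on the local copy, RW on the master, temporary DB only on the master) can be sketched as a small lookup. This is an illustration in Python under stated assumptions: the host names and the port 3306 for the stable DB are hypothetical, and only port 3307 for the temporary DB comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbTarget:
    host: str
    port: int


# Hypothetical hosts; 3306 is assumed for the stable DB, 3307 is from the slide.
MASTER_STABLE = DbTarget("master.example", 3306)
LOCAL_STABLE = DbTarget("localhost", 3306)
MASTER_TEMP = DbTarget("master.example", 3307)


def choose_target(database: str, mode: str) -> DbTarget:
    """Pick the server for a connection, given the DB and the access mode."""
    if mode not in ("ro", "rw"):
        raise ValueError(f"unknown access mode: {mode!r}")
    if database == "temporary":
        # Only a master copy of the temporary DB exists, whatever the mode.
        return MASTER_TEMP
    if database == "stable":
        # Reads can be served locally; writes must reach the master copy.
        return LOCAL_STABLE if mode == "ro" else MASTER_STABLE
    raise ValueError(f"unknown database: {database!r}")
```

Requiring the caller to state the mode up front is what lets read-only pages stay on the local node.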
GCA on cluster architecture
Possible other uses of the cluster
Heavy computations (normalizations)
Integration with R (graduation theses of L. Vanotti and M. Sesana)
Other integrations with R (AMDA)
Other related services
Knowledge management app.
Groupware Integration (DC-Thera)
Cytoscape Integration (DC-Thera)
Service mgmt. app.
R computations
But it has been designed for GCA
The Microarray (mA) Experiment Cycle
“Closing the loop”
What we need to model
How to model: Semantic Web Technologies
“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”
Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001
Microarrays Annotation Ontology
Microarray entities
Annotation entities
Annotation (source, target, child, parent, rank)
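One way to picture an Annotation with source, target, child, parent and rank is as a node in a thread of annotations attached to a microarray entity. The Python sketch below is only that picture: the slide does not define the fields, so their meanings in the comments are assumptions, and the `text` field and `reply` method are hypothetical additions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parent/children
class Annotation:
    source: str                               # assumed: who or what made the annotation
    target: str                               # assumed: the annotated entity
    text: str = ""                            # hypothetical payload, not on the slide
    rank: int = 0                             # assumed: position among sibling annotations
    parent: Optional["Annotation"] = None
    children: List["Annotation"] = field(default_factory=list)

    def reply(self, source: str, text: str) -> "Annotation":
        """Attach a child annotation, e.g. an answer to a comment."""
        child = Annotation(source=source, target=self.target, text=text,
                           rank=len(self.children), parent=self)
        self.children.append(child)
        return child


# Hypothetical thread on a saved data set.
comment = Annotation("Andrea", "saved_search_42", "Why was this set saved?")
first = comment.reply("Brandizi", "It is the input for the IL2 study.")
second = comment.reply("Andrea", "Thanks.")
```

In a Semantic Web setting each of these fields would more likely be a property in an ontology than an attribute of a class, so the sketch shows the shape of the data rather than its encoding.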
Examples of use
Annotating a saved search
Comments and answers to comments
Originating operations (import, intersection, merge...)
Which user is working on this data set
Why the data set is being saved
Functional family
I'm studying IL2
I'm studying schistosomiasis
disease
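Annotating a saved search with its originating operation, the user working on it and the reason for saving it can be sketched as a small record. This is a Python illustration under assumptions: `SavedSearch`, `intersection` and `merge` are hypothetical names, and the gene lists are invented examples rather than GCA data.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class SavedSearch:
    name: str
    genes: FrozenSet[str]
    user: str                        # which user is working on this data set
    reason: str                      # why the data set is being saved
    operation: str = "import"        # originating operation: import, intersection, merge...
    inputs: Tuple[str, ...] = ()     # names of the data sets the operation started from


def intersection(a: SavedSearch, b: SavedSearch, name: str,
                 user: str, reason: str) -> SavedSearch:
    """Genes found in both sets, with the provenance recorded."""
    return SavedSearch(name, a.genes & b.genes, user, reason,
                       "intersection", (a.name, b.name))


def merge(a: SavedSearch, b: SavedSearch, name: str,
          user: str, reason: str) -> SavedSearch:
    """Genes found in either set, with the provenance recorded."""
    return SavedSearch(name, a.genes | b.genes, user, reason,
                       "merge", (a.name, b.name))


# Invented example sets.
early = SavedSearch("early_response", frozenset({"IL2", "TNF", "IL6"}),
                    "Andrea", "I'm studying IL2")
late = SavedSearch("late_response", frozenset({"IL2", "IL10"}),
                   "Andrea", "I'm studying IL2")
shared = intersection(early, late, "shared_response",
                      "Andrea", "genes present in both responses")
```

Keeping the input names on the derived set means the chain of operations behind any saved search can be followed back to the original imports.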
Examples of use: AMDA
Representation of an AMDA report in a structured form
DEGs (differentially expressed genes) and gene clusters
This is a DEG set, computed by AMDA (PAM method) on samples s1, s2, s3
Correlation between chips (storing values and links to chip pairs)
Functional annotations of genes, by means of KEGG (with reporting of significance)
Import of analysis annotations into GCA
Presenting analysis annotations together with data sets
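A structured form of such a report could store the DEG set together with its method and samples, plus one correlation value per chip pair. The Python sketch below assumes Pearson correlation, which the slides do not specify, and uses invented expression values and gene names; the dictionary layout is hypothetical and is not AMDA's output format.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def chip_correlations(chips: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """One correlation value per chip pair, keyed by the pair itself."""
    return {(a, b): pearson(chips[a], chips[b])
            for a, b in combinations(sorted(chips), 2)}


# Invented expression values for three samples.
chips = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [2.0, 4.0, 6.0, 8.0],
    "s3": [4.0, 3.0, 2.0, 1.0],
}

report = {
    "deg_set": {
        "computed_by": "AMDA",
        "method": "PAM",
        "samples": ["s1", "s2", "s3"],
        "genes": ["gene_1", "gene_2"],  # placeholder gene identifiers
    },
    "chip_correlations": chip_correlations(chips),
}
```

Keying each value by its chip pair is one way to keep the "links to chip pairs" mentioned above, so that a value can be traced back to the two chips it describes.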
GCA: other possible developments
Templates for experiment insertion
Advanced CVs (taxonomies, mapping to MGED-Ontology)
Knowledge management features (with or without annotation ontology)
eGroupware, and links between eGroupware forums/documents and GCA experiments/data sets
Integration with AMDA (with or without annotation ontology)
Export of the API via Web Services technology
Integration with Taverna or Cytoscape
Connections with pathway databases (e.g. by means of Pathway Processor)