1 maxdLoad The maxd website: © 2002 Norman Morrison for Manchester Bioinformatics.

maxdLoad

The maxd website:

http://www.bioinf.man.ac.uk/microarray/maxd/

© 2002 Norman Morrison for Manchester Bioinformatics.

Overview

• Microarray Expression Data

• Some Terminology

• maxdLoad

• maxdView

Microarray Expression Data I

• A hybridisation of Probes with Genes (Reporter with Sample)

• Duplicate and/or replicate Spots on the array.

• Image analysis of one or more channels.

• Numerical data (matrix): mRNA abundance.

• Experimental data: descriptions of biological materials such as source and sample, and of processes such as protocols.

Microarray Expression Data II

• The database holds raw data resulting from the image analysis process. The data typically needs to be normalised before use.

• The experimental data is mostly stored as plain text descriptions, e.g. concepts like Extract and ExtractionProtocol.

• Elements such as protocols and samples can be reused. For example a series of Samples might be made from the same Source.

Some Terminology I

• Java – Object-Oriented Programming Language. Programs (well) written in Java are platform independent.

• RDBMS - Relational DataBase Management System - the software which provides the underlying database support, for example "Oracle 9", "SQL Server 7" or "MySQL".

• SQL - Structured Query Language - the language used for talking to the database.

• JDBC - Java DataBase Connectivity - the Java API used for sending SQL to a database and getting the results back.
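
To make the relationship between Java, SQL and JDBC concrete, here is a minimal sketch. The table name `Measurement` and its `name` column are hypothetical and are not taken from the maxdSQL schema; the JDBC URL, user and password come from the command line, and a JDBC driver for the chosen RDBMS is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class JdbcSketch {
    // Hypothetical table and column names, for illustration only.
    static final String QUERY = "SELECT name FROM Measurement WHERE name LIKE ?";

    // Sends the SQL text to the database through JDBC and collects column 1.
    static List<String> findNames(Connection conn, String pattern) throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(QUERY)) {
            ps.setString(1, pattern);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString(1));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.out.println("usage: JdbcSketch <jdbc-url> <user> <password>");
            return;
        }
        // The URL format depends on the RDBMS and its driver.
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            for (String name : findNames(conn, "%")) {
                System.out.println(name);
            }
        }
    }
}
```

The Java code stays the same whichever RDBMS is used; only the driver and the URL change.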

Some Terminology II

• XML – eXtensible Markup Language - a set of rules for designing text formats that let you structure your data. XML makes it easy for a computer to generate data, read data, and ensure that the data structure is unambiguous.

• UML – Unified Modeling Language - a standard notation for the modeling of real-world objects as a first step in developing an object-oriented design methodology. Boxes and Sticks.

• MIAME – Minimum Information About a Microarray Experiment.
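
As a small illustration of how XML lets a program read structured data, the sketch below parses a short document with the XML parser that ships with Java. The element and attribute names (`experiment`, `sample`, `source`) are made up for this example; this is not MAGE-ML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class XmlSketch {
    // Hypothetical document, for illustration only.
    static final String XML =
        "<experiment name=\"demo\">"
      + "<sample id=\"S1\" source=\"liver\"/>"
      + "<sample id=\"S2\" source=\"liver\"/>"
      + "</experiment>";

    // Parses the text and counts the <sample> elements it contains.
    static int countSamples(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList samples = doc.getElementsByTagName("sample");
            return samples.getLength();
        } catch (Exception e) {
            // Malformed XML is rejected by the parser rather than guessed at.
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("samples: " + countSamples(XML));
    }
}
```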

Even More Terminology!

• Measurements = Arrays = Columns = Hybridizations….

• Spots = Wells = Rows = Probe = Reporters (=) Feature

• Sample = Target = BioMaterial….

• Clusters = hierarchical collections of Spots or Measurements

• Attributes = associated data, for example SpotAttribute and MeasurementAttribute
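
One way to picture these terms is as a matrix whose rows are Spots and whose columns are Measurements. The class below is a sketch of that idea only; the class and field names are assumptions for illustration, not maxdView's actual data structures.

```java
import java.util.Arrays;
import java.util.List;

class ExpressionMatrix {
    final List<String> spotNames;         // one name per row
    final List<String> measurementNames;  // one name per column
    final double[][] values;              // values[row][column]

    ExpressionMatrix(List<String> spots, List<String> measurements, double[][] values) {
        if (values.length != spots.size()) {
            throw new IllegalArgumentException("one row is needed per Spot");
        }
        for (double[] row : values) {
            if (row.length != measurements.size()) {
                throw new IllegalArgumentException("one column is needed per Measurement");
            }
        }
        this.spotNames = spots;
        this.measurementNames = measurements;
        this.values = values;
    }

    // Looks up the value for one Spot in one Measurement.
    double value(String spot, String measurement) {
        int row = spotNames.indexOf(spot);
        int column = measurementNames.indexOf(measurement);
        if (row < 0 || column < 0) {
            throw new IllegalArgumentException("unknown Spot or Measurement");
        }
        return values[row][column];
    }

    public static void main(String[] args) {
        ExpressionMatrix m = new ExpressionMatrix(
            Arrays.asList("spot1", "spot2"),
            Arrays.asList("array1", "array2"),
            new double[][] {{1.5, 2.5}, {3.5, 4.5}});
        System.out.println(m.value("spot2", "array1"));
    }
}
```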

Data Models

ArrayExpress

maxdSQL

MAML (microarray markup language) : MGED community

GEML (gene expression markup language) : Rosetta

MAML + GEML = MAGE-ML (MAGE-ML & MAGE-OM)

[Diagram labels: UML, XML, SQL]

Architecture Overview I

• The maxd software has three major components:

– maxdSQL: the definition of the database in SQL

– maxdLoad: the data loading and database curation tool

– maxdView: the data visualisation and analysis tool

• All are freely available from the website.

• maxdView and maxdLoad are Java applications (not applets).

• maxdView can be used independently of the maxdSQL database.

Architecture Overview II

[Architecture diagram; labels: maxdSQL, JDBC, XML, views, flat-files, MAGE-ML]

Architecture Overview III

[Architecture diagram; labels: maxdSQL, JDBC, XML, views, flat-files, MAGE-ML]

Getting started with maxdLoad and maxdSQL

• Create an empty database.

• Create database tables.

• Install maxdLoad.

• Connect to the database.

• Load some data.

• Browse the database.

• Have a nice cup of tea.

maxdSQL

• Many different RDBMSs can host the maxdSQL data: Oracle, MySQL, PostgreSQL, Sybase and Firebird (a.k.a. Interbase) are known to work. Any system which supports the ANSI SQL92 standard should work with maxdSQL.

• The database is created by loading the schema definition file via whatever tool the RDBMS provides.

– With son-of-maxdLoad you won’t have to do this.

• ‘Cut & Paste’ into the SQL console is also an option.
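
For readers who prefer to script the table creation step, the sketch below shows one possible approach over JDBC: split the schema file into statements and run them one at a time. It is an assumption-laden illustration, not part of maxd. The splitting is naive (it assumes no semicolons inside string literals or comments), and the `CREATE TABLE` statements in `main` are invented examples, not the maxdSQL schema.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class SchemaLoaderSketch {
    // Naive split on ';'. Blank fragments are dropped.
    static List<String> splitStatements(String script) {
        List<String> statements = new ArrayList<>();
        for (String part : script.split(";")) {
            String sql = part.trim();
            if (!sql.isEmpty()) {
                statements.add(sql);
            }
        }
        return statements;
    }

    // Runs each statement against an already-open connection.
    static void createTables(Connection conn, String script) throws SQLException {
        try (Statement st = conn.createStatement()) {
            for (String sql : splitStatements(script)) {
                st.executeUpdate(sql);
            }
        }
    }

    public static void main(String[] args) {
        // Invented statements, only to show the splitting.
        String script = "CREATE TABLE a (id INT);\nCREATE TABLE b (id INT);\n";
        for (String sql : splitStatements(script)) {
            System.out.println(sql);
        }
    }
}
```

The tool supplied with the RDBMS remains the simpler route when it is available.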

maxdLoad

• maxdLoad presents an interface which is tightly related to the underlying database schema.

• JDBC provides network transparency.

• Each type of ‘object’ in the database has a corresponding form in maxdLoad.

• Links between ‘objects’ become links between forms.

• When creating these links there are normally two choices. Either an existing ‘object’ can be reused or a new ‘object’ created.
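
The "reuse or create" choice can be sketched in a few lines. This is an in-memory stand-in written for illustration, with a map playing the role of one database table; the class and method names are hypothetical and say nothing about how maxdLoad is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ReuseOrCreateSketch {
    // name -> id, standing in for the rows of one table (for example Source).
    private final Map<String, Integer> idsByName = new LinkedHashMap<>();
    private int nextId = 1;

    // Returns the id of an existing 'object' with this name, or creates a new one.
    int reuseOrCreate(String name) {
        Integer existing = idsByName.get(name);
        if (existing != null) {
            return existing;
        }
        int id = nextId++;
        idsByName.put(name, id);
        return id;
    }

    int size() {
        return idsByName.size();
    }

    public static void main(String[] args) {
        ReuseOrCreateSketch sources = new ReuseOrCreateSketch();
        int first = sources.reuseOrCreate("liver");
        int second = sources.reuseOrCreate("liver");   // reused, not duplicated
        System.out.println(first == second);
        System.out.println(sources.size());
    }
}
```

Two Samples made from the same Source would then both link to the one stored Source entry.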

Example Form

Loading Data

• Generally the loading starts at the Measurement form (although it is possible to start elsewhere).

• The easiest way to navigate is by filling in the fields from top to bottom on each form as it is displayed.

• Most fields are completed either by filling in another form or by picking an existing entry from the database.

• Some forms are completed by providing a plain text data file and a description of how the file is formatted.
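
The idea of a data file plus a description of its format can be sketched as follows. The `Format` fields (delimiter, number of header lines, which column holds the Spot name and which the value) are assumptions chosen for this example; the options maxdLoad actually offers may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class FlatFileSketch {
    // Hypothetical description of how the text file is laid out.
    static final class Format {
        final String delimiter;
        final int headerLines;
        final int nameColumn;
        final int valueColumn;

        Format(String delimiter, int headerLines, int nameColumn, int valueColumn) {
            this.delimiter = delimiter;
            this.headerLines = headerLines;
            this.nameColumn = nameColumn;
            this.valueColumn = valueColumn;
        }
    }

    static final class Row {
        final String spot;
        final double value;

        Row(String spot, double value) {
            this.spot = spot;
            this.value = value;
        }
    }

    // Skips the header lines, then reads one Spot name and one value per line.
    static List<Row> parse(String text, Format format) {
        List<Row> rows = new ArrayList<>();
        String[] lines = text.split("\\r?\\n");
        for (int i = format.headerLines; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) {
                continue;
            }
            String[] cells = lines[i].split(Pattern.quote(format.delimiter), -1);
            rows.add(new Row(cells[format.nameColumn].trim(),
                             Double.parseDouble(cells[format.valueColumn].trim())));
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "Spot\tSignal\nA1\t10.5\nA2\t3.0\n";
        for (Row row : parse(text, new Format("\t", 1, 0, 1))) {
            System.out.println(row.spot + " " + row.value);
        }
    }
}
```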

The Database Browser

• The browser displays a sub-set of the database schema and enables links between data items to be explored and altered. The browser also permits data values (such as names and descriptions) to be edited.

Future Developments

• Exporting data in MAGE-ML format is now possible in the beta release.

• Adding fields to maintain MIAME compliance.

• son-of-maxdLoad

– Improved user interface

– Database table creation facility