Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Page 1: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Bioimage database architecture and infrastructure

2005, Bio-ITR, UCSB

Page 2: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Overview

• Current system
  – Status of collection
  – Capabilities
  – Architecture
• Joint system under development
  – Capabilities
  – Architecture
• Future
  – Layered databases
  – Distributed databases

Page 3: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Current collection

• Retinal
  – Confocal microscope
  – EM (electron micrograph)
• Microtubule
  – Light
  – Atomic force microscopy (AFM)
  – DIC/Nomarski

Type                 Current   Backlog   Rate/y   Expected (4 yrs)   Total size
Retinal EM               600    19,000      500             20,000        20 GB
Retinal confocal P     3,000       500    2,400             10,000        10 GB
Retinal confocal Z         0    14,000   12,000             10,000        65 GB
Microtubule light      3,000     2,500    2,500             13,000        12 GB
Microtubule AFM          200         0    1,200              5,000        15 GB
Microtubule DIC            0         0    2.7 M               10 M        10 TB

Page 4: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Current capabilities

• Import process
• Image and metadata storage
• Web access and browsing
• Limited access by content

Page 5: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Screenshots (browsing)

Page 6: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Screenshots (search)

Page 7: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Screenshot (metadata edit)

Page 8: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Screenshot (retina meta)

Page 9: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Current architecture

• Metadata
• Database implementation
• Front end implementation
• Image import API
• Software and hardware infrastructure

Page 10: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Metadata

• Standard (image types, parameters)
  – File, size, type, TIFF data, channel info, etc.
• Retinal
  – Visible cells
  – Antibody labeling
  – Experimental conditions
  – Researcher
• Microtubule
  – Track (hand captured)
• AFM
  – Machine parameters
• Metadata sources
  – Researcher
  – Annotated Excel files
  – Proprietary image formats

Page 11: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Database implementation

• MySQL
• First generation schema
  – Image parameters
    • File, size, type, TIFF data, etc.
  – Metadata
    • Experimenter, condition, antibodies, tissue, notes, etc.

[Figure: imagemodel.svg]

Page 12: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Front end

• Apache, PHP, JavaScript
• Import proprietary image types
• Browse images
• Search by metadata
• Search by similarity
• Multi-user and release protection

Page 13: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Image and metadata import

• Excel parser for metadata
• Image import library
  – An image format API and a C/C++ library were developed for the database and client applications.
  – Currently supported proprietary image formats:
    • Metamorph Stack
    • Fluoview TIFF, BioRad PIC
    • PSIA TIFF, Nanoscope
    • Plus common formats: JPEG, TIFF, BMP, PNG…

Page 14: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Hardware and software infrastructure

• Hardware
  – Dell server with dual Intel Xeon CPUs at 2.4 GHz
  – 140 GB SCSI hard drive set up as RAID 1
  – Gigabit network switch
• Software
  – Linux (Fedora Core 2)
  – Apache web server with PHP, Perl and graphical modules
  – MySQL database server

Page 15: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Overview

• Current system
  – Status of collection
  – Capabilities
  – Architecture
• Joint system under development
  – Capabilities
  – Architecture
• Future
  – Layered databases
  – Distributed databases

Page 17: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Motivation

• Common schema between UCSB and CMU
• Support greater functionality
  – Analysis and interpretation tools
  – Ground truth
  – Semantics
  – Uncertainty
  – Complex features and distance metrics
    • MPEG-7 features
    • Other features
  – Querying and relevance feedback

Page 18: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Capabilities

• Image and metadata storage
• Web access and browsing
• Access and search by content
• Import/Export
  – Streamlined XML import/export for external tools
• Schema extensions
  – Image5d, semantic, uncertainty, analysis
• Image processing modules and tools

Page 19: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Infrastructure – Interchange XML

A unified interchange XML format is being developed for loading data into and extracting data from the database, for interaction with external client applications, and for communication between databases.

[Diagram: the database (DB) exchanges XML with external clients, the image library, and an external DB; labeled paths: interchange, import/export, remote access; client tools: ground truth tools, image processing tools]
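To make the import/export idea concrete, here is a minimal Python sketch of a round trip through an interchange document. The slides do not show the actual format, so the element names (`image`, `file`, `metadata`, `tag`) and the sample record are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET


def image_to_xml(image: dict) -> str:
    """Serialize one image record to a hypothetical interchange XML string."""
    root = ET.Element("image", id=str(image["id"]))
    ET.SubElement(root, "file").text = image["file"]
    meta = ET.SubElement(root, "metadata")
    # One <tag name="..."> element per metadata field (assumed layout).
    for key, value in sorted(image["metadata"].items()):
        ET.SubElement(meta, "tag", name=key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_image(text: str) -> dict:
    """Parse the hypothetical interchange XML back into a record."""
    root = ET.fromstring(text)
    return {
        "id": int(root.get("id")),
        "file": root.findtext("file"),
        "metadata": {t.get("name"): t.text for t in root.find("metadata")},
    }


# Illustrative record; the field names are not taken from the real schema.
record = {
    "id": 1,
    "file": "retina_001.tif",
    "metadata": {"tissue": "retina", "condition": "normal"},
}
xml_text = image_to_xml(record)
restored = xml_to_image(xml_text)
```

An external tool that can read and write such a document never needs direct access to the database, which is the decoupling the slide describes.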

Page 20: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Ground truth acquisition tools

The image processing and infrastructure teams are developing universal “ground truth” collection tools that retrieve data from the database and feed user-defined information back into it. The main communication vehicle is the XML interchange format.

At the current stage, stand-alone tools are being developed and tested; later they will be combined into a universal application that communicates directly with the database.

Page 21: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Image processing API

The API supports fast development of image processing tools, so that developers can concentrate on problem solving. It provides simple access to multi-channel image and mask information, and allows progress output, acquisition of user-defined parameters, and an automatically created filter preview.

Example of API usage: noise removal for Fluoview images

[Figure: input, noise, result]

Page 22: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Semantic data modules

• Integration of current research in automatic image analysis:
  – Cell identification
  – Layer detection
  – Cell counting
  – Microtubule detection and tracking
  – Microtubule dynamicity and global characterization

Page 23: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Modeling uncertainty

• Uncertain identification/analysis
  – Simple probability (e.g., 0.8)
  – “Is this a rod bipolar cell?”
• Imprecise location/extent/count
  – 90% accuracy in cell count
  – Line segment (single or sequence), polygon
    • Identified by a sequence of points
    • Each point Gaussian
    • Store mean x, mean y, and standard deviation
  – Circle
    • Center Gaussian point, as above
    • Radius mean r and standard deviation
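The uncertain shapes listed above can be sketched as small Python records. This is only an illustration of the stored quantities (mean x, mean y, standard deviation, radius mean and deviation); the class and function names are hypothetical and a single isotropic standard deviation per point is assumed.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class GaussianPoint:
    mean_x: float
    mean_y: float
    std: float  # assumed: one standard deviation shared by x and y


@dataclass(frozen=True)
class GaussianCircle:
    center: GaussianPoint
    mean_r: float
    std_r: float


@dataclass(frozen=True)
class UncertainLabel:
    label: str          # e.g. "rod bipolar cell"
    probability: float  # simple probability, e.g. 0.8


def mean_polyline_length(points: Sequence[GaussianPoint]) -> float:
    """Length of the polyline through the mean positions (ignores the spread)."""
    return sum(
        math.dist((a.mean_x, a.mean_y), (b.mean_x, b.mean_y))
        for a, b in zip(points, points[1:])
    )


def radius_interval(circle: GaussianCircle, k: float = 2.0) -> Tuple[float, float]:
    """Radius range covering k standard deviations around the mean radius."""
    return (circle.mean_r - k * circle.std_r, circle.mean_r + k * circle.std_r)


segment = [GaussianPoint(0.0, 0.0, 1.5), GaussianPoint(3.0, 4.0, 1.5)]
cell = GaussianCircle(GaussianPoint(10.0, 10.0, 0.5), mean_r=5.0, std_r=0.5)
guess = UncertainLabel("rod bipolar cell", 0.8)
```

A polygon is the same sequence-of-points representation with the last point joined back to the first.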

Page 24: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Schema

• Image5d
• Analysis and interpretation tools
  – Quantitative data generation
  – Semantic labeling
    • Experimental description
    • Shape and geometry
    • Domain knowledge
  – Ground truth
  – Semantic objects
• Uncertainty
• Features and distance metrics
  • MPEG-7 features
  • Other features
• Querying and relevance feedback

Page 25: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Schema (image5d)

[Diagram: image5d schema; tables and fields as labeled in the diagram]

Image:    #id, description, slide_id, slide_pos, imageprotocol_id, permission_id
Plane:    #id, image_id, imagedata_id, channel_id, timept_id, zlevel, mask_id, parent_id

PlaneT:   image_id, timept_id, plane_id
PlaneC:   image_id, channel_id, plane_id
PlaneZ:   image_id, zlevel, plane_id
PlaneTC:  image_id, timept_id, plane_id, channel_id
PlaneZT:  image_id, zlevel, plane_id, timept_id
PlaneCZ:  image_id, channel_id, plane_id, zlevel: int
PlaneTCZ: image_id, timept_id, plane_id, channel_id, zlevel

Connectors in the diagram are labeled id (1) to image_id (*).
Example image types in the diagram: Microtubule Images, Confocal Images, EM Z series

5d images
• Image is a set of bit-planes
• Group planes by which dimensions vary
• Permits
  – Multiple formats
  – Caching
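The idea of an image as a set of planes, grouped by which dimensions vary, can be sketched with an in-memory SQLite database. The table and column names follow the diagram (image, plane, image_id, channel_id, timept_id, zlevel), but the column types, the sample data, and the helper `varying_dimensions` are assumptions for illustration; the joint system itself targets PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE image (
        id INTEGER PRIMARY KEY,
        description TEXT
    );
    CREATE TABLE plane (
        id INTEGER PRIMARY KEY,
        image_id INTEGER NOT NULL REFERENCES image(id),
        channel_id INTEGER NOT NULL,
        timept_id INTEGER NOT NULL,
        zlevel INTEGER NOT NULL
    );
    """
)

# Illustrative data: one image with 2 channels x 3 z-levels at one time point.
conn.execute("INSERT INTO image (id, description) VALUES (1, 'confocal z series')")
rows = [(1, c, 0, z) for c in range(2) for z in range(3)]
conn.executemany(
    "INSERT INTO plane (image_id, channel_id, timept_id, zlevel) VALUES (?, ?, ?, ?)",
    rows,
)


def varying_dimensions(db: sqlite3.Connection, image_id: int) -> str:
    """Return which of C (channel), T (time), Z (depth) vary across the planes."""
    counts = db.execute(
        "SELECT COUNT(DISTINCT channel_id), COUNT(DISTINCT timept_id), "
        "COUNT(DISTINCT zlevel) FROM plane WHERE image_id = ?",
        (image_id,),
    ).fetchone()
    return "".join(name for name, n in zip("CTZ", counts) if n > 1)


group = varying_dimensions(conn, 1)
plane_count = conn.execute(
    "SELECT COUNT(*) FROM plane WHERE image_id = 1"
).fetchone()[0]
```

Here the planes vary in channel and depth only, so the image would fall under the grouping the diagram calls PlaneCZ.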

Page 26: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Schema (semantic objects)

[Diagram: semantic object schema; tables and fields as labeled in the diagram, with the diagram's # and * markers kept]

Target:                 target_id *, ...
Antibody_labeling:      target_id *, cell_part_id *
Cell_and_cell_part:     cell_part_id #, name, is_kind_of *, is_part_of *
Cell_part_location:     cell_part_id *, layer_id *, expt_cond
Semantic_object:        sem_obj_id #, cell_part_id *, shape: enum {round, line, polygon}, confidence, source_id *
Semantic_round_object:  sem_obj_id #*, center_x: gaussian, center_y: gaussian, radius: gaussian
Semantic_line_object:   sem_obj_id #*, start_x: gaussian, start_y: gaussian, end_x: gaussian, end_y: gaussian
Semantic_polygon_point: sem_obj_id #*, point_id #, x: gaussian, y: gaussian
Layer_order:            layer_id #, layer_order, name
Layer:                  plane_id #*, layer_id *, confidence, source_id *
Layer_thickness:        region_id #*, thickness: gaussian, source_id *
Layer_shape:            region_id #*, point_id #, x: gaussian, y: gaussian

Cell_and_cell_part references itself through is_kind_of (1 to *) and is_part_of (1 to *). The remaining connectors carry multiplicities 1, *, and 0..1; three groups of tables are labeled "domain".

• Capture semantics
• Capture uncertainty
  • Type of object: confidence
  • Position of object: Gaussian
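A small Python sketch of how the is_kind_of relationship and the confidence field could be combined in a query. The field names follow the diagram; the sample cell parts, the sample objects, and the helper functions are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CellPart:
    cell_part_id: int
    name: str
    is_kind_of: Optional[int] = None  # parent in the "kind of" hierarchy
    is_part_of: Optional[int] = None  # parent in the "part of" hierarchy


@dataclass(frozen=True)
class SemanticObject:
    sem_obj_id: int
    cell_part_id: int
    shape: str         # one of "round", "line", "polygon"
    confidence: float  # uncertainty about the type of object


# Illustrative domain knowledge, not taken from the real database.
PARTS: Dict[int, CellPart] = {
    p.cell_part_id: p
    for p in [
        CellPart(1, "bipolar cell"),
        CellPart(2, "rod bipolar cell", is_kind_of=1),
        CellPart(3, "axon terminal", is_part_of=2),
    ]
}


def is_kind_of(part_id: int, ancestor_id: int) -> bool:
    """Walk the is_kind_of chain upward from part_id looking for ancestor_id."""
    current: Optional[int] = part_id
    while current is not None:
        if current == ancestor_id:
            return True
        current = PARTS[current].is_kind_of
    return False


def find_objects(
    objects: List[SemanticObject], ancestor_id: int, min_confidence: float
) -> List[SemanticObject]:
    """Objects whose type is (a kind of) ancestor_id with enough confidence."""
    return [
        o
        for o in objects
        if o.confidence >= min_confidence and is_kind_of(o.cell_part_id, ancestor_id)
    ]


OBJECTS = [
    SemanticObject(10, 2, "round", 0.8),
    SemanticObject(11, 2, "round", 0.3),
    SemanticObject(12, 3, "line", 0.9),
]
matches = find_objects(OBJECTS, ancestor_id=1, min_confidence=0.5)
```

Object 12 is excluded because an axon terminal is part of a rod bipolar cell, not a kind of one, which is why the schema keeps the two relationships separate.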

Page 27: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Schema (analysis and features)

[Diagram: analysis and feature schema; tables and fields as labeled in the diagram]

FeatureDescriptor:     id, description, code_ptr, permission_id
FeatureInputType:      feature_id, table_name
FeatureOutputType:     feature_id, table_name
Feature_result:        id, image_id, result, timestamp (result: type (double, vector); name: string)
CellCount_result:      id, image_id, result: numeric, timestamp
CellIdentifier_result: id, image_id, result: cell_location, timestamp

Example rows:

FeatureDescriptor:  1, Cell Counter
FeatureInputTypes:  1, PlaneC
FeatureOutputType:  1, CellCount_result
CellCount_result (id, image_id, result, time): 1, 1, 103, 1

• Capture provenance
• Support type checking
• Support feature substitution
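The three goals above (provenance, type checking, feature substitution) can be sketched in Python. The "Cell Counter" feature, its PlaneC input, its CellCount_result output, and the result value 103 come from the example rows; the classes, the functions, and the second counter are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class FeatureDescriptor:
    id: int
    description: str
    input_table: str   # from FeatureInputType.table_name
    output_table: str  # from FeatureOutputType.table_name


@dataclass(frozen=True)
class FeatureResult:
    feature_id: int      # provenance: which feature produced the value
    image_id: int
    result: float
    timestamp: datetime  # provenance: when it was produced


def run_feature(
    feature: FeatureDescriptor,
    input_table: str,
    image_id: int,
    compute: Callable[[int], float],
) -> FeatureResult:
    """Type-check the input table, then record the result with provenance."""
    if input_table != feature.input_table:
        raise TypeError(
            f"{feature.description} expects {feature.input_table}, got {input_table}"
        )
    return FeatureResult(
        feature.id, image_id, compute(image_id), datetime.now(timezone.utc)
    )


def can_substitute(a: FeatureDescriptor, b: FeatureDescriptor) -> bool:
    """Two features are interchangeable if their input and output types match."""
    return a.input_table == b.input_table and a.output_table == b.output_table


cell_counter = FeatureDescriptor(1, "Cell Counter", "PlaneC", "CellCount_result")
other_counter = FeatureDescriptor(2, "Other counter (hypothetical)", "PlaneC", "CellCount_result")

# The lambda stands in for the real analysis code behind code_ptr.
row = run_feature(cell_counter, "PlaneC", image_id=1, compute=lambda image_id: 103)
```

Because inputs and outputs are declared as table names, the same check that rejects a mistyped input also tells the system which features can stand in for one another.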

Page 28: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Hardware and software components

• Hardware requirements
  – Same as original system
• Software
  – PostgreSQL backend
  – JSP / JSF front end
    • Migrate current PHP/JavaScript code into components

Page 29: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Architecture

[Diagram: layered architecture]
• WebPage
• UI Generation: View, Menu, Table (dynamic JSF components)
• Semantic Interface: Image, Cell (programmable: Image API, Model API)
• DB Storage: object-relational (Postgresql)
• Data formats shown: HTML, XML

Page 30: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Overview

• Current system
  – Status of collection
  – Capabilities
  – Architecture
• Joint system under development
  – Capabilities
  – Architecture
• Future
  – Layered databases
  – Integration with other databases
    • BIRN
    • OME metadata and schema exchange

Page 31: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Layered database

• Overlay model (interpretation) on image (raw) data
• Multiple interpretations of data
• URI references between databases
• Pro: logical distinction, multiple interpretations, flexible implementation

[Diagram: layers stacked over the raw data: Raw Image; Metadata; Semantic object interpretations 1, 2, and 3; Complex biological models 1 and 2]
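A minimal Python sketch of the layering idea: interpretation records live apart from the raw images and point at them by URI, so several interpretations can coexist for one image. The URI scheme, the host name `raw.example.org`, and all class and function names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple
from urllib.parse import urlparse


@dataclass(frozen=True)
class Interpretation:
    interpretation_id: int
    raw_image_uri: str  # reference into the raw image database
    label: str
    confidence: float


def interpretations_of(uri: str, layer: List[Interpretation]) -> List[Interpretation]:
    """All interpretations in a layer that refer to the given raw image."""
    return [i for i in layer if i.raw_image_uri == uri]


def split_reference(uri: str) -> Tuple[str, str]:
    """Split a reference into (database host, image identifier)."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.rsplit("/", 1)[-1]


RAW = "http://raw.example.org/images/42"  # hypothetical raw image URI

semantic_layer = [
    Interpretation(1, RAW, "rod bipolar cell", 0.8),
    Interpretation(2, RAW, "cone bipolar cell", 0.2),
    Interpretation(3, "http://raw.example.org/images/43", "rod bipolar cell", 0.9),
]

same_image = interpretations_of(RAW, semantic_layer)
host, image_id = split_reference(RAW)
```

The raw database never stores the interpretations, so a new model layer can be added, or hosted elsewhere, without touching the image data.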

Page 32: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

BIRN (Biomedical Informatics Research Network)

• Goals:
  – Link multiple databases with different schemas, maintained at different research institutions
    • 19 universities, 26 research groups
• Current collection
  – Three test beds centered around brain imaging of human neurological disorders and associated animal models:
    • Functional BIRN
    • Morphometry BIRN
    • Mouse BIRN

Page 33: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Integration with BIRN

• Databases at UCSB/CMU Centers can be integrated into the BIRN federation
• UCSB/CMU infrastructure supports
  – Extensive metadata for images
  – Standard XML interchange format for 5d images
  – Computational tools to refine data
    • Web based visualization and analysis tools
• We need to:
  – Translate UCSB/CMU schema to F-logic (knowledge-based mediation)
  – Link UCSB/CMU dataset to UMLS (Unified Medical Language System) ontology
  – Reference a common spatial framework
    • Standard atlas coordinate system, e.g., SMART Atlas

Page 34: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

OME

• Open Microscopy Environment
  – A set of software that interacts with a database to manage images, image metadata, image analysis and analysis results
• Designed to perform as a local system
• Integration with OME
  – Adapt OME XML image interchange mechanism
  – Adapt the database oriented modular analysis approach of OME

Page 35: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Conclusion

• Built prototype and collected ~4000 images
  – Being used internally
• Concurrent work on 2nd generation system
  – Image loading
  – Integration of tools
  – New front end

Page 36: Bioimage database architecture and infrastructure 2005, Bio-ITR, UCSB.

Intro bio slide

• Retina

Images from webvision.med.utah.edu