REMI Database: A Data Management System for Nuclear Medicine Studies
Antall Johann Fernandes
Florida Institute of Technology
30th November, 2011
Antall Johann Fernandes (Florida Tech.) REMI Database 30th November, 2011 1 / 43
REMI
Facts about REMI
Developed for the Nuclear Imaging Group of the Life Sciences Division at Lawrence Berkeley National Laboratory, California.
Supported by a subcontract on NIH grant 5-R01-EB007219-04, "Molecular Imaging of Cardiac Hypertrophy Using microPET and Pinhole SPECT".
Currently in production, housing 180 experiments and 8105 data files.
Motivation
Typical Work Process by the Nuclear Imaging Group
Perform a SPECT, DT-MRI or Micro-PET experiment.
Copy experimental data to individual researcher’s workstation.
Perform data analysis, image reconstruction and various other processing tasks on experimental data.
Extract results from processed data.
Publish results and findings based on results.
What happens to the experiment raw data or processed data?
Data resides on individual researchers' workstations under a researcher-generated directory structure.
Data is moved or copied to a central file server, into a researcher-generated location and directory structure.
Data about the experimental data may or may not be captured.
Motivation
Knowledge about Data
What experimental data has been collected?
Where does all of the experimental data reside?
Data Sharing
Can we share the experimental data with others?
Motivation
The Need for a System
Be aware of all the experimental data captured over the years.
Know where the experimental data is located.
Provide the capability to share the experimental data.
Sharing Experimental Data
Data should be stored in a standardized manner.
Data should be query-able.
Meta-data needs to be captured in a timely manner (meta-data is explained later).
Users should not need to deal with file systems.
Design Considerations
Typical Scientific Data Management Systems should have...
Creation of logical collections - mapping physical data to logical collections.
Physical data handling - storage of the physical data files.
Security support - data access authorization and change verification.
Data ownership - who is responsible for data quality and meaning.
Persistence - data lifetime.
Knowledge and information discovery - identify useful information inside the data collection.
Meta-data collection, management and access.
Design Considerations
REMI’s Design Considerations
Open access for downloading experimental data.
Users should be able to access the database from anywhere.
Authorized researchers should be able to upload experimental data.
REMI should handle the physical storage of data files (explained later).
Understanding Experiments and Their Data
SPECT Experiment
Data and Meta-Data
Patient Coordinate System
Detector Coordinate System
Meta-data
What is Meta-data?
Information about data.
Describes how data is measured, acquired, and computed.
Enables data browsing, data transfer, and data documentation.
Makes it possible to build data-independent automated tools.
Entities and Relationships
Entity
A "thing" or "object" in the real world that is distinguishable from all other objects, e.g., a Person or a Vehicle.
An entity has a set of properties, and the values of those properties identify it.
Relationship
An association among several entities, e.g., a Person owns a Vehicle.
Entities and Relationships
Entity-Relationship Data Model
Based on the perception of the real world as a set of basic objects, called entities, and of relationships among these objects.
Mapping Cardinalities
Represents constraints on the relationship.
Expresses the number of entities to which another entity can be associated via a relationship.
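To make the mapping-cardinality idea concrete, here is a minimal sketch of the Person-owns-Vehicle example as relational tables, assuming a one-to-many cardinality: one person may own many vehicles, and each vehicle has exactly one owner. The table and column names are hypothetical, and SQLite (via Python's standard sqlite3 module) is used only for illustration; it is not REMI's database.

```python
import sqlite3

def build_schema(conn: sqlite3.Connection) -> None:
    # SQLite does not enforce foreign keys unless this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    # One-to-many: each vehicle row references exactly one owner (NOT NULL FK),
    # while a person may be referenced by any number of vehicle rows.
    conn.execute(
        "CREATE TABLE vehicle ("
        " id INTEGER PRIMARY KEY,"
        " plate TEXT NOT NULL,"
        " owner_id INTEGER NOT NULL REFERENCES person(id))"
    )

def vehicles_owned(conn: sqlite3.Connection, person_name: str) -> list[str]:
    """Follow the relationship from a Person to all Vehicles they own."""
    rows = conn.execute(
        "SELECT v.plate FROM vehicle v JOIN person p ON p.id = v.owner_id "
        "WHERE p.name = ? ORDER BY v.plate",
        (person_name,),
    )
    return [plate for (plate,) in rows]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    conn.execute("INSERT INTO person (id, name) VALUES (1, 'Alice')")
    conn.executemany("INSERT INTO vehicle (plate, owner_id) VALUES (?, 1)",
                     [("ABC-123",), ("XYZ-789",)])
    print(vehicles_owned(conn, "Alice"))  # ['ABC-123', 'XYZ-789']
```

The NOT NULL foreign key is what encodes "each vehicle has exactly one owner"; a many-to-many cardinality would instead need a separate association table.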
REMI Meta-data Database Development
Logical Entity Relationship Diagram
REMI Meta-data Database Development
Physical Table Design Diagram
REMI Meta-data Database Development
Physical Table Design Diagram: Machine
REMI Meta-data Database Development
Physical Table Design Diagram: Tracer
REMI Meta-data Database Development
Physical Table Design Diagram: File Tag
REMI Meta-data Database Development
Physical Table Design Diagram: File Association
Owner ID maps to the entity that owns the file.
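One way to read this design is as a polymorphic association: a single association table links a file to whichever entity owns it. The slide names only Owner ID, so this sketch assumes an extra owner_type column saying which table that ID refers to (the same pattern Rails uses for polymorphic belongs_to, with <name>_id and <name>_type columns). The owner_type column, table names, and helper are hypothetical, not REMI's actual schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE experiment (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE data_file (id INTEGER PRIMARY KEY, filename TEXT);
-- Hypothetical association table: owner_type names the owning table,
-- owner_id is the primary key of the owning row in that table.
CREATE TABLE file_association (
    file_id    INTEGER NOT NULL REFERENCES data_file(id),
    owner_type TEXT    NOT NULL,
    owner_id   INTEGER NOT NULL
);
"""

def files_for_owner(conn: sqlite3.Connection, owner_type: str, owner_id: int) -> list[str]:
    """Return the filenames associated with one owning entity."""
    rows = conn.execute(
        "SELECT f.filename FROM file_association a "
        "JOIN data_file f ON f.id = a.file_id "
        "WHERE a.owner_type = ? AND a.owner_id = ? ORDER BY f.filename",
        (owner_type, owner_id),
    )
    return [name for (name,) in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO experiment VALUES (1, 'SPECT run')")
conn.executemany("INSERT INTO data_file VALUES (?, ?)", [(10, "proj.img"), (11, "recon.img")])
conn.executemany("INSERT INTO file_association VALUES (?, 'experiment', 1)", [(10,), (11,)])
print(files_for_owner(conn, "experiment", 1))  # ['proj.img', 'recon.img']
```

The trade-off of this pattern is that the database cannot enforce a foreign key on owner_id, because its target table varies from row to row.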
REMI Meta-data Database Development
Physical Table Design Diagram: User
REMI File Storage Development
REMI’s Design Consideration (mentioned earlier)
REMI should handle the physical storage of data files.
File Storage Schemes
(c) Single Directory: All files under a single directory.
(d) Directory Hierarchy: Individual directory for each experiment.
REMI File Storage Development
REMI File Storage
Concatenate the experiment ID and filename and generate a 64-character SHA-256 hex string.
Create the directory structure based on the SHA-256 string, directory levels, and directory name size.
Parameter directory name size determines how many characters of the hash value to use for each directory name.
Parameter directory levels determines the depth of the directory structure.
Example 1
SHA256 string: fb80f1735b3153e7a41d38390de5d9773c35259965..
directory levels: 2
directory name size: 2
=> ROOT DIR/fb/80/<experiment id> <filename>.<extension>
REMI File Storage Development
REMI File Storage: Example 2
SHA256 string: fb80f1735b3153e7a41d38390de5d9773c35259965..
directory levels: 2
directory name size: 3
=> ROOT DIR/fb8/0f1/<experiment id> <filename>.<extension>
REMI File Storage: Example 3
SHA256 string: fb80f1735b3153e7a41d38390de5d9773c35259965..
directory levels: 3
directory name size: 2
=> ROOT DIR/fb/80/f1/<experiment id> <filename>.<extension>
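The hashing scheme in these examples can be sketched in a few lines. This is an illustration under stated assumptions, not REMI's code: it assumes the experiment ID and filename are concatenated with no separator before hashing, and it uses a hypothetical stored file name of the form <experiment id>_<filename>, since the exact separator is not given. The function names are also hypothetical.

```python
import hashlib
from pathlib import PurePosixPath

def storage_digest(experiment_id: int, filename: str) -> str:
    """64-character SHA-256 hex digest of experiment ID + filename."""
    # Assumption: the two values are concatenated with no separator.
    data = f"{experiment_id}{filename}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()

def hashed_dirs(digest: str, levels: int, name_size: int) -> list[str]:
    """Cut the first levels * name_size hex characters into directory names."""
    if levels < 1 or name_size < 1 or levels * name_size > len(digest):
        raise ValueError("invalid directory levels / name size for this digest")
    return [digest[i * name_size:(i + 1) * name_size] for i in range(levels)]

def storage_path(root: str, experiment_id: int, filename: str,
                 levels: int = 2, name_size: int = 2) -> PurePosixPath:
    """Full storage path: ROOT/<dir_1>/.../<dir_levels>/<stored file name>."""
    dirs = hashed_dirs(storage_digest(experiment_id, filename), levels, name_size)
    # Hypothetical stored name: "<experiment id>_<filename>".
    return PurePosixPath(root, *dirs, f"{experiment_id}_{filename}")

# Hash prefix taken from the three examples above.
prefix = "fb80f1735b3153e7a41d38390de5d9773c35259965"
print(hashed_dirs(prefix, 2, 2))  # ['fb', '80']
print(hashed_dirs(prefix, 2, 3))  # ['fb8', '0f1']
print(hashed_dirs(prefix, 3, 2))  # ['fb', '80', 'f1']
```

With a directory name size of 2, each level holds at most 16^2 = 256 subdirectories, so two levels give at most 65,536 leaf directories. Because SHA-256 output is effectively uniform, files spread roughly evenly across them, and the location can be recomputed from the experiment ID and filename alone.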
REMI File Storage Development
File Storage
directory levels: 2
directory name size: 2
REMI Application Development
Rails Model-View-Controller (MVC) Architecture
REMI Application Development
System Design
REMI Application Development
System Component Design
REMI Application Development
Reasons to upload Data Files in Parts
Web Browsers can upload a maximum of 4 GB of data per request.
Certain data files are in excess of 50 GB in size.
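Splitting a file into parts comes down to byte-offset arithmetic. With the 4 GB per-request ceiling stated above, a 50 GB file needs at least 13 requests (50 / 4 = 12.5, rounded up). The sketch below computes the byte ranges a client would send for a given chunk size. The function name is hypothetical, and a real client would likely choose chunks far smaller than the browser limit.

```python
def chunk_ranges(total_size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return inclusive (start, end) byte offsets covering a file of total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # The last chunk is clipped to the end of the file.
    return [(start, min(start + chunk_size, total_size) - 1)
            for start in range(0, total_size, chunk_size)]

GB = 10**9
ranges = chunk_ranges(50 * GB, 4 * GB)
print(len(ranges))   # 13
print(ranges[-1])    # (48000000000, 49999999999)
```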
REMI Application Development
File Upload Process
REMI Application Development
File Upload Process
REMI Application Development
File Upload Process: Save the chunk files
REMI Application Development
File Upload Process: Concatenate the chunk files
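The save-then-concatenate steps can be sketched as follows. The sketch assumes each chunk arrives with a zero-based index and is written to its own part file in an upload directory. Once every part is present, the parts are appended in index order to the final file and then deleted. The directory layout, the chunk.partNNNNN naming, and the function names are hypothetical, not REMI's actual implementation.

```python
import shutil
import tempfile
from pathlib import Path

def save_chunk(upload_dir: Path, index: int, data: bytes) -> Path:
    """Write one uploaded chunk to its own part file."""
    upload_dir.mkdir(parents=True, exist_ok=True)
    part = upload_dir / f"chunk.part{index:05d}"
    part.write_bytes(data)
    return part

def concatenate_chunks(upload_dir: Path, expected_count: int, target: Path) -> int:
    """Append part files in index order into target; return total bytes written."""
    parts = [upload_dir / f"chunk.part{i:05d}" for i in range(expected_count)]
    missing = [p.name for p in parts if not p.exists()]
    if missing:
        raise FileNotFoundError(f"missing chunks: {missing}")
    total = 0
    with target.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)  # copies in buffered blocks, not all at once
            total += part.stat().st_size
    for part in parts:  # clean up only after a successful concatenation
        part.unlink()
    return total

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        up = Path(tmp) / "upload"
        for i, piece in enumerate([b"SPECT-", b"raw-", b"data"]):
            save_chunk(up, i, piece)
        out = Path(tmp) / "scan.dat"
        print(concatenate_chunks(up, 3, out), out.read_bytes())  # 14 b'SPECT-raw-data'
```

Because the chunks are named by index, they may arrive out of order and still be assembled correctly.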
REMI User Interface Development
JSON (JavaScript Object Notation)
A lightweight data-interchange format.
Easy for humans to read and write.
Built on two structures.
A collection of name/value pairs, realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
An ordered list of values, realized as an array, list, or sequence.
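The two JSON structures map directly onto native types in most languages. In Python's standard json module, a JSON object becomes a dict and a JSON array becomes a list. The field names in this example record are hypothetical.

```python
import json

text = '{"modality": "SPECT", "files": ["proj.img", "recon.img"], "published": false}'
record = json.loads(text)

# A JSON object becomes a Python dict (a collection of name/value pairs) ...
print(type(record).__name__)            # dict
# ... and a JSON array becomes a Python list (an ordered list of values).
print(type(record["files"]).__name__)   # list
print(record["published"])              # False

# Serializing goes the other way.
print(json.dumps({"id": 7, "tags": ["cardiac"]}))  # {"id": 7, "tags": ["cardiac"]}
```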
REMI User Interface Development
JSON Syntax
(e) Object Syntax
(f) Array Syntax
(g) Value Syntax
REMI User Interface Development
REMI Menu
id: Translates to the element ID on the HTML page.
value: Display name shown on the user interface.
link: URL to be called.
submenu: Array to contain sub-menus.
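A menu described with these four fields can be rendered recursively. This sketch assumes the menu is a JSON array of item objects, each with id, value, link, and an optional submenu array, and emits nested HTML lists. The example entries and URLs are hypothetical, and REMI's own rendering (in the browser or in Rails views) may differ.

```python
import json
from html import escape

MENU_JSON = """
[
  {"id": "menu_home", "value": "Home", "link": "/", "submenu": []},
  {"id": "menu_experiments", "value": "Experiments", "link": "/experiments",
   "submenu": [
     {"id": "menu_spect", "value": "SPECT", "link": "/experiments?modality=spect", "submenu": []}
   ]}
]
"""

def render_menu(items: list) -> str:
    """Render menu items as nested <ul>/<li> elements."""
    parts = ["<ul>"]
    for item in items:
        # id -> HTML element ID, link -> URL to call, value -> display name.
        parts.append(
            f'<li id="{escape(item["id"])}">'
            f'<a href="{escape(item["link"])}">{escape(item["value"])}</a>'
        )
        if item.get("submenu"):  # recurse only when there are sub-menus
            parts.append(render_menu(item["submenu"]))
        parts.append("</li>")
    parts.append("</ul>")
    return "".join(parts)

print(render_menu(json.loads(MENU_JSON)))
```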
REMI User Interface Development
Download Menu Options
REMI User Interface Development
Searching for Experiment based on Modality
REMI User Interface Development
Creating a New SPECT Experiment
REMI User Interface Development
Various Sections within the SPECT Experiment
Weaknesses
Weaknesses within REMI
Lack of full support for Semi-Structured Data.
Semi-structured data does not conform to the formal structure of tables and data models associated with relational databases.
Meta-data is entirely semi-structured data.
Resume functionality for uploading and downloading data files.
Server-side file compression on large files.
Meta-data Templates and User Personalization.
Future Enhancements
Provide support for Semi-Structured Data.
Relational databases with XML support.
Move from a relational model to an Entity-Attribute-Value model.
Document based databases.
User Personalization.
Provide user specific views of owned experiments.
Support saving of personalized search queries.
Meta-data
Determine the best approach to modeling meta-data.
Make the meta-data capture process less cumbersome.
Reading meta-data from header files.
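To make the Entity-Attribute-Value option concrete: instead of one column per meta-data field, each field becomes a row of (entity, attribute, value). Experiments of different modalities can then carry different fields without schema changes. This is a minimal sketch using SQLite via Python's sqlite3 module. The table, attribute names, and values are hypothetical. Values are stored as text for simplicity, which illustrates a known EAV trade-off: typing and constraints move out of the schema and into application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiment_metadata ("
    " experiment_id INTEGER NOT NULL,"
    " attribute TEXT NOT NULL,"
    " value TEXT,"
    " PRIMARY KEY (experiment_id, attribute))"
)

def set_meta(experiment_id: int, attribute: str, value: str) -> None:
    # The primary key makes INSERT OR REPLACE overwrite an existing attribute.
    conn.execute(
        "INSERT OR REPLACE INTO experiment_metadata VALUES (?, ?, ?)",
        (experiment_id, attribute, value),
    )

def get_meta(experiment_id: int) -> dict:
    """Reassemble one experiment's meta-data rows into a dict."""
    rows = conn.execute(
        "SELECT attribute, value FROM experiment_metadata "
        "WHERE experiment_id = ? ORDER BY attribute",
        (experiment_id,),
    )
    return dict(rows)

# Two experiments with different, modality-specific fields in the same table.
set_meta(1, "modality", "SPECT")
set_meta(1, "collimator", "pinhole")
set_meta(2, "modality", "DT-MRI")
set_meta(2, "b_value", "1000")
print(get_meta(1))
```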
Future Enhancements
Resume functionality for Data File Uploads.
Files uploaded in chunks.
Query the server for the size of the partially uploaded file.
Transfer only the chunks beyond the size already on the server.
Resume functionality for Data File Downloads.
Look at HTTP Byte Range Retrieval Extensions.
See how the resume functionality affects the compression process on the server.
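The two resume ideas can be sketched together. For uploads, the client asks the server how many bytes it already holds and sends only the remainder. For downloads, HTTP/1.1 byte-range requests let a client ask for the rest of a file with a header such as Range: bytes=<offset>-, and a server that supports ranges answers with 206 Partial Content. The helper names are hypothetical, and the sketch assumes the server reports the exact byte count of the partial upload. Networking is left out.

```python
def remaining_chunks(total_size: int, server_size: int,
                     chunk_size: int) -> list[tuple[int, int]]:
    """Inclusive (start, end) byte ranges still to upload, given bytes already on the server."""
    if not 0 <= server_size <= total_size:
        raise ValueError("server size must be between 0 and the local file size")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Resume exactly where the server's copy ends; the last chunk is clipped.
    return [(start, min(start + chunk_size, total_size) - 1)
            for start in range(server_size, total_size, chunk_size)]

def range_header(offset: int) -> dict:
    """Request header asking for the bytes from offset to the end of the file."""
    return {"Range": f"bytes={offset}-"}

print(remaining_chunks(total_size=10, server_size=4, chunk_size=4))  # [(4, 7), (8, 9)]
print(range_header(1024))  # {'Range': 'bytes=1024-'}
```

For downloads served with on-the-fly compression, the byte offsets refer to the compressed stream. A resumed download therefore only lines up if the server reproduces the same compressed output, which is one reason resume and compression interact.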
Questions
Demonstration
REMI Website
Thank You