REMI Database: A Data Management System for Nuclear … · 2011-12-02 · I owe many thanks to...
REMI Database: A Data Management System for Nuclear Medicine Studies
by Antall Johann Fernandes
A thesis submitted to Florida Institute of Technology
in partial fulfillment of the requirements for the degree of
Masters in
Computer Science
Melbourne, Florida
December 2011
We the undersigned committee hereby approve the attached thesis
REMI Database: A Data Management System for Nuclear Medicine Studies
by Antall Johann Fernandes
Debasis Mitra, Ph.D., Professor, Computer Science, Committee Chair, Thesis Advisor
Philip Bernhard, Ph.D., Associate Professor, Computer Science, Thesis Committee
Marcus Hohlmann, Ph.D., Associate Professor, Physics and Space Sciences, Thesis Committee
William Shoaff, Ph.D., Associate Professor, Computer Science, Department Head
Abstract
Title: REMI Database: Web Application for Research Study
Author: Antall Fernandes
Committee Chair: Debasis Mitra, Ph.D.
Medical Imaging data acquisition and processing are expensive ventures in terms
of personnel time and resources. Scientific labs, and in particular the nuclear medicine
research group of the Life Sciences Division, Lawrence Berkeley National Laboratory,
with which we work, acquire tens of terabytes of raw and processed data worth a few million
dollars. This data has served as the basis for many publications over close to a
decade. The emphasis on reducing health care research costs makes it necessary to have
imaging data openly available to other researchers and other research institutions. This
is especially true for publicly funded research data. With new scientific instruments growing
exponentially in their capability to generate research data, new infrastructure needs to
be developed and deployed to allow researchers to effectively and securely manage their
research data. Researchers need to be able to easily acquire data from instruments,
store and manage potentially large quantities of data, easily process the data, share
research resources and work spaces with colleagues both inside and outside of their
institution, search and discover across their accessible collections, and easily publish
datasets and related research artifacts. The ability to share data with other institutions
depends on the selection of a good data model, ensuring that data is captured and stored
in a consistent and standardized format, and the ease of searching for data. Meta-data,
which is information about data, proves to be important for the purpose of searching and
retrieving data. Meta-data is also important to understand an experiment and the data
associated with that experiment.
In this thesis, we describe the REsearch Medical Imaging (REMI) database. REMI
is a scientific data management database, aimed at cardiac imaging studies, that helps
researchers store their experimental data in a central location. REMI also provides the
ability, via a web interface, for other researchers to query the database and download
experiment data. REMI is built with some of the best tools and development practices
that the software industry has to offer, backed by a consistent data model, and a
framework platform that provides Application Programming Interfaces (API) for future
integrations with other third party tools.
REMI is available for everyone to browse and download experimental data. REMI
can be reached via the Uniform Resource Locator (URL) http://remi.lbl.gov/.
We also maintain a project page providing links to various documents
and presentations. The project page can be accessed via the URL
https://cs.fit.edu/Projects/REMI.
Contents
1 Introduction
1.1 Scientific Data Management
1.2 Research at the Lawrence Berkeley National Laboratory
1.3 Overview of the chapters
2 Literature Review
3 Design Considerations
4 Description of REMI
4.1 Overall description
4.2 System architecture
4.3 REMI database development
4.3.1 Roles and Abilities
4.3.2 Categories and Category-Values
4.4 REMI application development
4.4.1 Authentication
4.4.2 File Storage
4.4.3 Menu
4.4.4 Behaviour Driven Development
5 Status report
6 Lessons learned
7 Weakness and Future Enhancements
7.1 Limitations within REMI
7.1.1 Semi Structured Data
7.1.2 Resume functionality for Uploads and Downloads
7.1.3 File Compression on the Server
7.1.4 Meta-data Templates
7.2 Future Enhancements
8 Conclusion
A User Documentation
A.1 Public User
A.1.1 Downloading Experiment Data
A.2 Login into REMI
A.3 Data Authentication
A.4 Contributor
A.4.1 New Subject Work-flow
A.4.2 New Experiment Work-flow
A.5 System Administrator
A.5.1 Researchers
A.5.2 Roles
A.5.3 Categories and Category Values
B Installation Documentation
B.1 Version Control within REMI
B.2 Creating a development environment
B.3 List of Used Software Tools
C Maintenance Documentation
C.1 Model-View-Controller (MVC)
C.2 Project Layout of REMI
C.3 Adding a new Entity
D Database Documentation
D.1 List of Tables
List of Figures
3.1 SPECT Diagram
4.1 Hibernate Configuration
4.2 Rails MVC Diagram
4.3 SOAP vs REST
4.4 Software Component Diagram
4.5 Logical Entity Relationship Diagram
4.6 Physical Design Diagram
4.7 System Component Design
4.8 Proposed Directory Structure
4.9 BDD Life Cycle
4.10 Feature: System Administrator manages Categories
4.11 Feature: Users should be able to Log into REMI
4.12 Failed Cucumber Step
4.13 Step Definition
4.14 Cucumber Test Passed
6.1 Sequence Diagram for Saving Files
6.2 Sequence Diagram for Saving Files: Part 1
6.3 Sequence Diagram for Saving Files: Part 2
6.4 Sequence Diagram for Saving Files: Part 3
6.5 Sequence Diagram for Saving Files: Part 4
A.1 Download: REMI Main Page
A.2 Download: Modality Menu
A.3 Download: List of DTMRI Experiments
A.4 Download: Search based on Experiment Name
A.5 Download: Search for experiments that make use of WKY type of subjects
A.6 Log-in: REMI Main Page
A.7 Log-in: Log-in Page
A.8 New Subject: Contribute Link
A.9 New Subject: Subjects Menu
A.10 New Subject: Subjects Index Page
A.11 New Subject: Subject Detail Page
A.12 New Subject: Saving Subject Successful
A.13 New Subject: Back Button
A.14 New Subject: Updated Subject Index Page
A.15 New Subject: Searching a Subject
A.16 New Experiment: Contribute Link
A.17 New Experiment: Add New Experiment
A.18 New Experiment: Create Experiment
A.19 New Experiment: Update Experiment
A.20 New Experiment: Blank Subject Listing
A.21 New Experiment: Select Subject
A.22 New Experiment: Associated Subjects
A.23 New Experiment: Blank Tracer Listing
A.24 New Experiment: Select Tracer
A.25 New Experiment: Associated Tracers
A.26 New Experiment: Blank File Listings
A.27 New Experiment: Selecting File
A.28 New Experiment: File Details
A.29 New Experiment: Subject Information
A.30 New Experiment: Acquisition Information
A.31 New Experiment: Voxel Information
A.32 New Researcher: Click System Administration Link
A.33 New Researcher: Adding a New User
A.34 New Researcher: Filling the User Details
A.35 New Researcher: Successful Saving the User Details
A.36 New Researcher: Searching for the User
A.37 New Researcher: Log In
A.38 New Researcher: Logged In Successfully
A.39 New Role: System Administration Link
A.40 New Role: Adding a New Role
A.41 New Role: Creating a New Role
A.42 New Role: Searching for a Role
A.43 New Role: Associating the Abilities with the New Role
A.44 New Category: Category Listing and Menu
A.45 New Category: Creating a New Category
A.46 New Category: Searching the New Category
A.47 New Category: Updating the Category and Adding Values for the Category
A.48 New Category: Adding a Category Value
A.49 New Category: Category Detail Page with New Category Value Listed
A.50 New Category: Search for Species Category
A.51 New Category: Adding New Value to Species Category
A.52 New Category: Saving the New Value to Species Category
A.53 New Category: Species Menu on Main Page
A.54 New Category: Species dropdown on Subject Page
C.1 Project Layout of REMI
C.2 Create a new Entity: Scaffold Generator
C.3 Create a new Entity: Migration File
C.4 Create a new Entity: Routes File
C.5 Create a new Entity: Run Migration
C.6 Create a new Entity: Access the Entity
C.7 Create a new Entity: Adding a Menu to REMI
C.8 Create a new Entity: Adding a Menu to REMI Menu Page
C.9 Create a new Entity: New Entity Menu Item
C.10 Create a new Entity: Default Listings Page
C.11 Create a new Entity: Modified Listings Page
C.12 Create a new Entity: New Listings Page
C.13 Create a new Entity: Modifying the form.html.erb File
List of Tables
B.1 List of Used Software Tools
Acronyms
API Application Programming Interfaces
CSMD Core Scientific Meta-Data Model
CT Computed Tomography
CRUD Create, Read, Update, and Delete
DICOM Digital Imaging and COmmunications in Medicine
DOM Document Object Model
DTMRI Diffusion Tensor Magnetic Resonance Imaging (MRI)
fMRI functional MRI
ICAT Information CATalog
LAMP Linux, Apache, MySQL, and PHP
MEDIMAN Medical Image MANagement
MRI Magnetic Resonance Imaging
MVC Model-View-Controller
NEMA National Electrical Manufacturers Association
ORM Object Relational Model
PACS Picture Archiving and Communication System
PET Positron Emission Tomography
RAID Redundant Array of Independent Disks
REST REpresentational State Transfer
RPC Remote Procedure Calls
REMI REsearch Medical Imaging
SOAP Simple Object Access Protocol
SPECT Single-Photon Emission Computed Tomography
STFC Science & Technology Facilities Council
TCP Transmission Control Protocol
URL Uniform Resource Locator
WASA Workflow-based Architecture to support Scientific Applications
WSDL Web Services Description Language
Acknowledgements
I owe a tremendous debt of gratitude to the many people who made this thesis possible.
I would like to start with thanking my mentor and advisor, Dr. Debasis Mitra,
for giving me the opportunity to work with him on a number of different projects,
and eventually to work on REMI, which led to this thesis. His constant guidance,
enthusiasm, and support through all of the projects I undertook with him were
invaluable.
Special thanks go to Dr. Grant Gullberg for giving me the opportunity to work
on REMI and for all of his guidance and input that went into REMI. This project was
supported by a subcontract on NIH grant 5-R01-EB007219-04, "Molecular Imaging
of Cardiac Hypertrophy Using microPET and Pinhole SPECT". I also appreciate his
giving me the chance to work with the other researchers at the Lawrence Berkeley
National Laboratory, and more importantly a chance to do my internship in Berkeley.
I owe many thanks to Rostyslav Buchko, Archontis Giannakidis, Martin Boswell and
Andrew Hernandez for their wonderful insights and feedback during the entire course
of developing REMI. They have been very patient and were always on hand if
anything needed clarification or review. It has been a joy working with these researchers.
I extend my thanks to Stephanie Taylor and Kathleen Brennan for helping me
understand the MicroPET experiment processes, and for an entertaining time at the
UCSF Laboratory during the long MicroPET experiments.
I want to thank my committee members, Dr. Philip Bernhard and Dr. Marcus
Hohlmann. Their comments, understanding, and interest in my work were extremely
encouraging.
I would also like to thank my colleagues Mahmoud Abdallah and Daniel Eiland
for their insights and support when tackling certain issues I faced during the
development process.
I owe many thanks to the Department of Computer Science for supporting my study.
Last but not least, I sincerely thank the friends and families who offered me the most
generous help and joy during my stay in Florida.
Declaration
I declare that the work in this thesis is solely my own except where attributed and cited
to another author.
Chapter 1
Introduction
Medical imaging data acquisition and processing are expensive ventures in terms of
personnel time and resources. Hence it becomes imperative that data captured during
any research study is stored in a manner that is reliable and easily retrievable. In
many research organizations the use of a hierarchical data model for file organization
prevails. Elaborate directory structures are devised to capture as much information
related to the studies as possible while also providing a systematic and easy way to
look up data later. The hierarchical data model originated
for systems with thousands of files, yet we now store 7 to 9 orders of magnitude
more files within a single system. Advances in computing technology,
disk capacity, and processing power have paved the way for data-intensive scientific
computing. This has led to growth in raw data, which in turn has created the need
for data analysis capabilities. The importance of data analysis has brought forth the need
to store additional related meta-data in an organized way.
With the traditional hierarchical data model¹, user-defined tags can only be expressed as
part of a file name or through the file's location in the hierarchy. Any additional meta-data
that needs to be captured must be incorporated through one of
these two methods. This leads to very lengthy file or directory naming conventions,
or to very deep directory structures, neither of which is suitable for fast data access.
Also, given the familiarity of Google search, users have come to expect full-text
searches over files and documents, which the hierarchical data model does not support.
The approach falls far short of proper meta-data tracking: the meta-data is almost
impossible to query, and the approach does not scale.
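To make this limitation concrete, the following minimal sketch recovers meta-data tags from a directory path. The path layout (modality, species, subject, date) and the function name `tags_from_path` are hypothetical and do not describe REMI's actual conventions; they only illustrate that every tag must occupy a fixed position in the hierarchy.

```python
from pathlib import PurePosixPath

# Hypothetical layout (an assumption for illustration only):
#   <modality>/<species>/<subject>/<date>/<file>
LEVELS = ("modality", "species", "subject", "date")


def tags_from_path(path):
    """Recover meta-data tags from the directory levels of a file path."""
    parts = PurePosixPath(path).parts
    # Any path that deviates from the fixed layout cannot be interpreted.
    if len(parts) != len(LEVELS) + 1:
        raise ValueError(f"path does not match the expected layout: {path}")
    tags = dict(zip(LEVELS, parts))
    tags["file"] = parts[-1]
    return tags


tags = tags_from_path("SPECT/rat/WKY-01/2011-03-14/scan.img")
print(tags["species"])  # rat
```

Capturing one more tag in such a scheme, for example the tracer used, would mean inserting a new directory level and renaming every existing path, which is why this kind of scheme tends not to scale.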
Building and organizing medical data depends heavily on meta-data and the ability
to associate this meta-data with the data files. Meta-data describes how data was
measured, acquired, and computed. It is this meta-data that proves quite useful in
database searches. Meta-data can be captured in a number of different ways and at
different time points within a study. Meta-data may be generated by the instruments
used, or may consist of parameters that govern the study in some way. In most cases the meta-data
is logged in log files or parameter files generated by the instruments themselves; in
other cases the meta-data is external and exists only in the principal researcher's
mind or as notes in their journals. Consolidating all of this meta-data, which
may be dispersed across different media, is critical. Lack of meta-data results
in researchers spending a lot of time wading through entire sets of files to ascertain
the relevance of files. If researchers are to read data collected by others, then the data
must be carefully documented and must be published in forms that allow easy access
and automated manipulation. README files and other types of log and text files have
been used in the past to capture all sorts of meta-data, but this approach does not scale
well. Another set of critical information that does not make its way into text files is the
association of different data sets and files within a single study or even across different
studies. These associations across data sets are important in understanding how different
files or data sets came to be. The availability of useful and meaningful meta-data can also
aid in the process of automation. Because meta-data is information about data, good meta-data
makes it possible to build data-independent automated tools. Such tools can read the
meta-data, understand the structure and properties of the data, and then process
the data. They are generic in nature and can be used on a number of
different data sets, independent of the physical nature of the data.
¹ The file directory structure is an example of the hierarchical data model.
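As an illustration of such a data-independent tool, the sketch below decodes a raw binary image using nothing but an accompanying meta-data record. The field names (`dtype`, `byte_order`, `shape`) and the function `read_volume` are assumptions made for this example; they are not taken from REMI or from any instrument format.

```python
import struct

# Map of hypothetical meta-data type names to struct format characters.
FORMAT_CHARS = {"uint8": "B", "int16": "h", "float32": "f"}


def read_volume(raw, meta):
    """Decode raw bytes into a list of rows using only the meta-data."""
    fmt_char = FORMAT_CHARS[meta["dtype"]]
    order = "<" if meta["byte_order"] == "little" else ">"
    rows, cols = meta["shape"]
    values = struct.unpack(f"{order}{rows * cols}{fmt_char}", raw)
    # Slice the flat tuple of values into rows of `cols` elements each.
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]


# Hypothetical meta-data record describing a 2 x 3 image of 16-bit integers.
meta = {"dtype": "int16", "byte_order": "little", "shape": (2, 3)}
raw = struct.pack("<6h", 1, 2, 3, 4, 5, 6)
image = read_volume(raw, meta)
print(image)  # [[1, 2, 3], [4, 5, 6]]
```

The same function handles a different data set simply by being given a different meta-data record, without any change to the code.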
It is also fairly common practice among researchers to store all of their research
work either on a personal workstation or in a personal account on remote network storage.
They even create their own data organization schemes that they find suitable. In most
cases there are no set guidelines or standards for data organization. Having
the data spread across a number of systems is ideal from a storage point of view, but it makes
it hard to search for studies across different researchers. The varying data organization
schemes add to the complexity and are a major hurdle when it comes to sharing
research data with other research organizations. These hurdles are even more prominent
when a researcher leaves the organization, leaving other researchers with the task of
locating and understanding all of their experiment data.
Given all of the concerns mentioned above, the need for a system to capture,
organize, and share research information has gained a lot of importance. With meta-data
playing a vital part within such systems, and given that a hierarchical data
model does not suffice, most of the systems being built and deployed employ
some sort of database, be it relational, object-oriented, XML, or a
combination of them.
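A minimal sketch of the database alternative, assuming a relational store, is shown below. It uses an in-memory SQLite database from the Python standard library; the table names, column names, and sample values are all hypothetical and are not REMI's schema. The point is only that meta-data held as rows can be queried directly instead of being parsed out of file names.

```python
import sqlite3

# In-memory database with a hypothetical two-table layout (not REMI's schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE experiments (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE metadata (
        experiment_id INTEGER NOT NULL REFERENCES experiments(id),
        attr TEXT NOT NULL,
        val TEXT NOT NULL
    );
""")


def add_experiment(name, **meta):
    """Store an experiment together with arbitrary attribute/value pairs."""
    cur = conn.execute("INSERT INTO experiments (name) VALUES (?)", (name,))
    conn.executemany(
        "INSERT INTO metadata (experiment_id, attr, val) VALUES (?, ?, ?)",
        [(cur.lastrowid, attr, val) for attr, val in meta.items()],
    )
    return cur.lastrowid


def find_experiments(attr, val):
    """Return the names of experiments carrying the given meta-data value."""
    rows = conn.execute(
        "SELECT e.name FROM experiments e "
        "JOIN metadata m ON m.experiment_id = e.id "
        "WHERE m.attr = ? AND m.val = ? ORDER BY e.name",
        (attr, val),
    )
    return [name for (name,) in rows]


# Sample records invented for this example.
add_experiment("exp-a", modality="SPECT", strain="WKY")
add_experiment("exp-b", modality="PET", strain="WKY")
add_experiment("exp-c", modality="PET", strain="other")
print(find_experiments("strain", "WKY"))  # ['exp-a', 'exp-b']
```

New attributes can be attached to an experiment without restructuring anything that is already stored, which is the property the hierarchical model lacks.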
1.1 Scientific Data Management
Before introducing our work, it is worth mentioning that the problem of scientific data
management has been tackled by different groups in a number of different ways. Some
of the resulting systems are very specialized databases focusing on either a particular modality or a
specific anatomical region. Other database and data modeling efforts focus on
general scientific studies. Most of these efforts use some sort of database to store
meta-data. A scientific data management system is expected
to support the features listed below[39].
• Creation of logical collections.
The primary goal of a Data Management system is to abstract the physical data
into logical collections. The resulting view of the data is a uniform homogeneous
library collection.
• Physical data handling.
This layer maps between the physical and the logical data views. It covers
items such as data replication, backup, and caching.
• Security support.
Data access authorization and change verification. This is the basis of trusting
your data.
• Data ownership.
Define who is responsible for data quality and meaning.
• Persistence.
Definition of data lifetime. Deployment of mechanisms to counteract technology
obsolescence.
• Knowledge and information discovery.
Ability to identify useful relations and information inside the data collection.
• Meta-data collection, management and access.
Meta-data are data about data.
1.2 Research at the Lawrence Berkeley National
Laboratory
REMI has been developed for the Lawrence Berkeley National Laboratory, Life
Sciences Division, Department of Radiotracer Development & Imaging Technology. The
department uses nuclear tracers to
study biochemical processes in living organisms, and develops new tracers and imaging
systems. Research includes:
• Technology to make radioisotopes more widely available.
• Development of new radiotracers.
• Radiotracer imaging to study and guide the treatment of diseases of the brain and
heart, and cancer.
• Radiotracer imaging to enable the development of improved biofuels.
• Scintillation radiation detectors.
• Application-specific detector arrays and readout electronics.
• Image reconstruction and analysis.
The department facilities include a medical cyclotron, a Positron Emission Tomography
(PET) camera, a Single-Photon Emission Computed Tomography (SPECT) camera with
X-ray Computed Tomography (CT) scanner, and a 1.5T MRI system.
The group, for whom REMI was developed, deals with cardiac nuclear imaging
and its image reconstruction and analysis. The researchers within this group perform
experiments using a PET, SPECT, or DT-MRI machine. Data generated by
this equipment, along with the meta-data, is then uploaded into REMI.
Diagnostic techniques in nuclear medicine use radioactive tracers, which emit gamma
rays from within the body. The tracers can be given by injection, by inhalation, or orally. In the
first type of technique, single photons are detected by a gamma camera, which can view
organs from many different angles. The camera collects the gamma rays that pass through
collimator holes, and reconstruction algorithms build up an image from the points from
which radiation is emitted; this image is enhanced by a computer and viewed by a
physician on a monitor for indications of abnormal conditions. Systems of this type
are known as SPECT machines.
A more recent development is Positron Emission Tomography (PET), which is
a more precise and sophisticated technique. A positron-emitting radionuclide is
introduced, usually by injection, and accumulates in the target tissue. As it decays
it emits a positron, which promptly combines with a nearby electron, resulting in the
simultaneous emission of two identifiable gamma rays in opposite directions. These are
detected by a PET camera and give a very precise indication of their origin. PET is used
in cardiac and brain imaging.
1.3 Overview of the chapters
The remainder of this thesis is organized into the following chapters:
In Chapter 2 we review the related literature. Our review is focused on two areas.
First, we provide an overview of the different database offerings built to manage medical
imaging data. Secondly, we look at the data models that are used to model scientific data.
Chapter 3 discusses the design considerations made during the development of
REMI and the methods used to develop its database.
Chapter 4 describes REMI, detailing its goals, architecture, functionality and the
database and application development process.
Chapter 5 presents the current status report of REMI. We provide details of the
present production environment.
Chapter 6 describes the problems we faced while developing REMI and the lessons
we learned while investigating those problems and implementing solutions to them.
Chapter 7 highlights the weaknesses in the work as it stands, and suggests future areas
for consideration and ways in which the current work could be extended.
Chapter 2
Literature Review
In this chapter, we provide a review of the literature related to the work described in
this thesis. Management of research data has been approached in a number of different
ways. Some approaches focus on comparative analysis of medical images, comparing
one image to another. These are categorized as imaging databases, built with a certain
modality or anatomic region in mind. Other initiatives address the workflow that
is associated with the entire research process and how software systems could better
manage it. Research workflows and workflow management differ significantly from
those of business applications, and efforts are being made to build systems that capture
the entire workflow process within research studies. Lastly, we have scientific data
management systems, whose primary focus is the storage and retrieval of research data and
its associated meta-data and data files. Scientific data management systems differ from
the first two groups in that they neither focus on workflows nor aid in
comparative data studies. Central to all three kinds of systems is the use of meta-data, and
in most cases efficient and reliable capture of meta-data helps define the usefulness of
such systems. Modelling of this meta-data is another area of constant research, given
that scientific data is heterogeneous and distributed over heterogeneous software and
hardware platforms.
Today, medical imaging units can store and transmit medical images from multiple
modalities using Picture Archiving and Communication System (PACS)[26]. The PACS
can store and retrieve scanned documents in addition to images and reports. However,
its objective is entirely clinical, and its focus is oriented towards individual
patient information management. Such systems handle entire workflows, from a
patient's entry into the system up to the billing process. The workflow
architecture in a research lab may be quite different from that in a commercial clinical
imaging organization. Mathias Weske and Gottfried Vossen, in their work[40], discuss
the Workflow-based Architecture to support Scientific Applications (WASA) as an
approach to workflow management for scientific applications. Their work is centered
around experiments, the steps of which an experiment is composed, and the stages
in which an experiment may exist. Although our work is also centered around the
notion of an experiment, our current implementation of REMI focuses purely on the
storage and management of experiments and their files within the entire experiment life
cycle.
Within the cardiac nuclear imaging domain two types of databases exist: static
and dynamic. Static databases are the commercially available clinical systems, such as
Quantitative Perfusion SPECT[9], Emory Cardiac Toolbox[4], euHeartDB[31] and
Corridor 4DM[27], which contain normalized archived images along with prepackaged
software for analysis and visualization. The primary purpose of these databases is
to let researchers and clinicians compare their images with the ones archived within
the system. Dynamic databases, on the other hand, such as the Cardiac Atlas from
Oxford[30], MEDIMAN[34] or ours (REMI), grow dynamically with continuous contribution.
The primary target users are researchers, who may or may not have access to a
data gathering lab, and who wish to process the data on their own. REMI is freely and
easily accessible without any liability to the database owner. Our evaluation criteria
will be the frequency and ease of use for contributors (who upload data) and public
users (who download data).
Since REMI is a scientific data management system, our focus has been on defining
a data model to capture experiment information and related meta-data and designing
and building a system to manage research experiments using this data model. The
data model that REMI uses is similar to the Core Scientific Meta-Data Model (CSMD)[35],
which has been developed at the Science & Technology Facilities Council (STFC), UK.
The model captures high level information about scientific studies and the data that
they produce. The model is designed to be generic across scientific disciplines and has
application beyond scientific facilities, particularly in the structural sciences (such as
chemistry, material science, earth science, and biochemistry). The paper also describes
an Information CATalog (ICAT) system built on top of the CSMD data model. ICAT
is in many ways similar to REMI in capturing experiment and meta-data information, but
has been developed with more advanced features for integration with other medical devices.
REMI is currently not feature-rich when it comes to integration with different medical
instruments, but it does expose a REpresentational State Transfer (REST) API[29] for
future enhancements in this area.
Another effort comparable to our work was presented by Esperanza Marcos, Cesar J.
Acuna, Belen Vela, Jose M. Cavero and Juan A. Hernandez[34]. They developed a web
information system known as Medical Image MANagement (MEDIMAN) for medical
image management and processing. The system is primarily being used for PET and
functional MRI (fMRI) of the brain. MEDIMAN provides a web interface through
which the researchers could upload new clinical trials and research tasks and associate
result files, along with image files and various other documents. The paper specifically
talks about the development of the database and has no mention of how the various files
being uploaded are handled. MEDIMAN makes use of two different database technologies,
namely the Object Relational Model (ORM) and the XML model. The ORM is used to
model the structured part of the data being captured and the XML model is used to
capture the semi-structured or dynamic data. The majority of the data that MEDIMAN
deals with consists of either Digital Imaging and COmmunications in Medicine (DICOM)[3]
or ANALYZE[1] files. Meta-data is extracted from within these files and stored within an
XML structure which is then saved to the database.
The ARCHER Project[21] has developed a production-ready, open-source generic
e-Research infrastructure including a Research Repository, Scientific Dataset
Managers (both a web and a desktop application) and a Collaborative Workspace
Environment. Institutions can selectively deploy these components to greatly assist
their researchers in managing their research data. ARCHER allows researchers the
flexibility of iterative and heuristic workflows and ease of collaborative management,
and ensures that data remain well curated and publication-ready, with appropriate
meta-data, provenance, and authorization. Our interest in the ARCHER Project lies in its
data storage scheme and its data model. The ARCHER Project primarily makes
use of a key/value data store capable of dealing with large and numerous datasets. The
key/value data store was deemed inadequate for the requirements of proper data curation
and hence an additional meta-data store was used based on the Core Scientific Meta-
Data Model.
Apart from literature that talks about the different approaches to building the various
systems to manage scientific data, a couple of papers talk about the general landscape
of the area dealing with Scientific Data Management. Jim Gray and David T. Liu in
their paper Scientific Data Management in the Coming Decade[32] talk about how the
solution to scientific data management lies in moving towards Science Centers. They
state that the demand for tools and computational resources to perform scientific data
analysis is rising faster than data volumes. Science centers provide access to both the
data and the applications that analyze the data. Providing such capabilities does
away with the need for researchers to download or copy the files to a local system
and use their own resources to work on the data. This minimizes data movement and
allows collaboration among a group of scientists doing joint analysis. They also discuss
the ability to replicate data, much like Redundant Array of Independent Disks (RAID)
does today to provide both data availability and protection against data loss. Meta-data
also proves to be an enabler for sophisticated tool automation. Extensive meta-data
describing the data in standard terms is needed so people and programs can understand
the data.
A more detailed look into the characteristics of a scientific data management system
is provided by Mario Valle in his article Scientific Data Management[39]. Valle
starts with the problems that plague the scientific research field, such as scattering of
research data across a research organization, lack of important meta-data, and lack of
proper strategies for storing the research data. He then identifies critical points
that any scientific data management system should incorporate, such as creation of
logical collections, physical data handling, security support, data ownership, meta-data
collection, management and access, persistence, knowledge and information discovery,
and data dissemination and publication. REMI was developed with the intention that it
should deliver most, if not all, of the above-mentioned points.
Chapter 3
Design Considerations
In this chapter we discuss the design decisions behind REMI and the basic functionality
that REMI needs to support.
One of the primary motives for building a medical research database system is
the need to share research data and results across a number of different research
organizations and different researchers. Users need to navigate through data sets based
on different search criteria, and download the data they are interested in to their local
machines. A web interface allows anyone to access the
system, explore, and download files from anywhere in the world, provided they are
connected to the Internet. To interact with the web interface all one needs is a web
browser. Web browsers come installed on most present day operating systems, and one
can easily be downloaded from the Internet. One of our initial design decisions is
to avoid the use of any third party browser plug-ins. Uploading data to the database
can also be done independently from anywhere, provided the user has the proper privileges
assigned to them. Privileges are assigned to users via roles. A user interacts with
the database through a specific role. A user may want to upload data to REMI and
hence plays the role of a contributor. As a contributor, the user is given privileges
to upload experiments to REMI. Similarly, a user may want to add users. They do this
as a system administrator and need to have the System Administrator role assigned
to them. The System Administrator role has privileges to add, update, and delete users
in REMI. Roles are thus assigned to users, thereby allowing them certain operations
within REMI.
Data generated by medical imaging studies can be of very high volume. Given
the advancements in technology and the availability of high volume data storage
devices, it has become possible for medical equipment to capture highly precise and
detailed information, at the cost of generating huge volumes of data. To
store such high volumes of data, the storage scheme should be
flexible enough to grow as and when needed by adding more disk space.
Another necessity of the database is to be able to handle storage of files in an
automated and well distributed manner on the file system. There are a number of ways
one can go about storing files on a file system, and each of them varies in complexity. The
simplest of solutions is to have all files stored under a single directory. Another approach
is to create an individual directory for each of the research studies. The intention is to
find a solution that is able to store files in a highly efficient and distributed manner across
directories.
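One common way to spread files evenly across directories is to derive the directory from a hash of the file identifier. The Python sketch below contrasts this with the two simpler layouts mentioned above. It is only an illustration under stated assumptions: the function names, the root path and the choice of SHA-1 with two directory levels are hypothetical and are not taken from the REMI implementation.

```python
import hashlib
from pathlib import PurePosixPath


def single_directory_path(root: str, file_id: str) -> PurePosixPath:
    """Simplest layout: every file lives directly under one directory."""
    return PurePosixPath(root, file_id)


def per_study_path(root: str, study: str, file_id: str) -> PurePosixPath:
    """One directory per research study."""
    return PurePosixPath(root, study, file_id)


def sharded_path(root: str, file_id: str, levels: int = 2, width: int = 2) -> PurePosixPath:
    """Hash-based layout (assumed scheme): nested directories come from the hash.

    The first `levels` groups of `width` hex characters of the SHA-1 digest are
    used as directory names, so files spread roughly evenly over the tree.
    """
    digest = hashlib.sha1(file_id.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, file_id)


# Hypothetical root directory and file name, for illustration only.
print(sharded_path("/data/remi", "scan_001.dcm"))
```

With two levels of two hexadecimal characters each, files are distributed over at most 256 x 256 directories, and the location of a file can always be recomputed from its identifier alone.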
The database should be backed by a good data model that captures all types of data
that get generated as part of the different research studies being conducted in any single
organization. We have four levels of data sets, namely:
• Scientific Equipment generated Raw Data
Most present day scientific equipment, in addition to generating raw data,
also generates additional information regarding different machine parameters.
Parameters may be static values defined on the machines themselves or may
be parameters that can be changed through some user interface, either on the
machine itself or via an attached computer terminal. Each machine may follow
a different format for capturing this information. In most studies that we
deal with, the DICOM[3], INTERFILE[20] or ANALYZE[1] file formats are used
to capture a significant amount of meta-data.
DICOM is a standard created by the National Electrical Manufacturers
Association (NEMA) to aid the distribution and viewing of medical images, such
as CT scans, MRIs, and ultrasound. Part of the standard describes a file format
for the distribution of images. Image files that are compliant with the DICOM
standard are known as DICOM format files. A single DICOM file contains both a
header, which stores information such as the patient’s name, the type of scan and
the image dimensions, and all of the image data, which can contain information
in three dimensions.
Analyze[1] is an image processing program developed by the Biomedical Imaging
Resource at the Mayo Foundation. The Analyze format is the image
format used by the Analyze program. It stores the image data
in one file (*.img) and the header data in a separate header file (*.hdr).
• Parameter data
Raw data sometimes needs to be massaged or normalized before any analysis
of it takes place. An example of this process is re-sampling raw data for image
reconstruction. This sort of data massaging is driven by a certain process
and by the variables or parameters that belong to that process. All of the parameters
used in such data massaging also need to be captured in the database and
form part of the meta-data for that study process.
• Reconstruction Data
In nuclear medicine, the SPECT or PET camera rotates around the patient,
taking pictures of radioisotope distribution within the patient from different
angles. These "pictures" acquired from the nuclear medicine camera are called
projections. The procedure of putting the projections together to obtain an image of
the patient is called image reconstruction. Reconstructed images constitute the third
type of binary data we deal with. A wide range of reconstruction methods
can be used to reconstruct images. Each of these methods has its own
setup parameters that drive the reconstruction process, and generates parameters
as part of the reconstructed image. Again, these parameters may exist within
log files or parameter files that are generated by the reconstruction process. In
short, the typical meta-data includes processing parameters that form part of the
reconstruction process.
• Quantified Research Results
Research studies are conducted to evaluate some original research hypothesis and
are used to generate quantified research results. These results also need to be
captured so that they can be queried and shared with other research
organizations and publications.
REMI involves all four stages of data capture and attempts to include all meta-data
and lineage of the data files, thus maximizing their usefulness to any researcher,
including non-members of the lab. Figure 3.1 illustrates a particular SPECT study. The
image shows a patient lying on the machine gantry and being imaged by the detector
of the SPECT machine, which rotates around the patient. The imaging data captured by the
SPECT machine forms the raw data. Vital to SPECT imaging and the process
of image reconstruction is the presence of the two coordinate systems depicted in the
image. The first coordinate system is the Patient Coordinate System (PCS), shown by the
x, y and z axes. Points to note about the PCS are:
• the PCS origin lies at the patient’s right anterior abdomen
• x-axis points from the patient’s right to their left
• y-axis points from their chest to their back
• z-axis points from their feet to their head.
The second coordinate system is the Detector Coordinate System (DCS), shown by the
V and U axes on the detector. The geometry of the DCS is as follows:
• the DCS origin is in the detector corner behind the operator’s head
• V points from the patient’s right anterior to left posterior
• U points from the patient’s feet to head.
All of this information, along with the length of each of these axes, forms part of the
parameters that prove useful in understanding the experiment and aid the researcher
during the image reconstruction process.
Figure 3.1: The patient coordinate system and detector coordinate system are depicted in the image
Chapter 4
Description of REMI
4.1 Overall description
REMI is a web-based system built to handle medical research data and is currently
being used by researchers within the Life Sciences Division at the Lawrence Berkeley
National Laboratory, California. The primary reason for building REMI is to enable
researchers to share their research studies with other research organizations. Most data
within REMI are files that are generated from either Diffusion Tensor MRI (DTMRI) or
SPECT or Micro-PET studies using scanners that are applicable to each of the different
modalities. Each of these data acquisition and processing studies is an expensive venture
in personnel time and resources and hence there is a need to collect, tag and store the
data in a reliable and consistent manner. Although REMI operates in a centralized way,
it can be used virtually anywhere as it is web-accessible. Organizations and researchers
are allowed to query the system through an easy and intuitive web interface.
4.2 System architecture
The web interface for REMI is built using the Ruby on Rails application framework.
Ruby on Rails (henceforth ’Rails’)[33] is an open source web application framework that
uses the Model-View-Controller (MVC)[38] architecture pattern to organize application
programming. Figure 4.2 demonstrates the Rails implementation of the MVC architecture.
The framework makes web development easier by providing the developer with
’out-of-the-box’ tools that take care of common development tasks. One such important
tool is the scaffolding generator, which generates the bare-bones model, view
and controller components based on the parameters provided to it. Apart from
providing such tools, Rails also follows a couple of development principles that make
developing and using this framework quite appealing.
Figure 4.1: To define a mapping between the model class and the database table, an XML configuration file is maintained to specify this mapping. (a) Hibernate XML Configuration File. (b) User Table Definition
A key principle of Rails development is ’Convention over Configuration’ (CoC).
This means that the programmer has to specify a particular mapping only if
the programmer deviates from the conventions set by the application framework. This
is quite a refreshing change from frameworks such as Java Platform, Enterprise
Edition (J2EE)[6], where integrations between the various components in the framework
need to be recorded within configuration files. Figure 4.1(a) shows a typical Hibernate
XML configuration file that defines the mapping of a User class and its attributes to
its corresponding database table and columns. Figure 4.1(b) shows the table definition
of the User table defined on the database. Under the convention-based approach,
defining a class named User is enough for the application framework, in
this case Rails, to map it to a table named users in the database (Rails pluralizes the
class name by default). Attributes also need not
be explicitly defined on the class, as Rails figures out the related attributes by querying
the table definition within the database. Doing away with configuration files reduces the
number of files the developer needs to maintain, which results in a smaller codebase to
manage. Another principle that the Rails framework encourages is the ’Don’t Repeat
Yourself’ (DRY) principle of software development. Rails provides programming
constructs that help the developer avoid writing the same code in a number of
places. With the help of partials and modules, programmers can make the code compact
and modular by extracting common code into one of these two constructs.
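The idea behind Convention over Configuration can be illustrated outside of Rails. The Python sketch below uses the standard sqlite3 module to mimic the two conventions described above: the table name is derived from the class name, and the attributes are discovered by querying the table definition. It is not Rails or REMI code. The Model base class, the naive pluralization rule and the sample table are assumptions made for the example.

```python
import sqlite3


def table_name_for(cls) -> str:
    """Convention: class User maps to table "users".

    Naive pluralization (append "s"); Rails uses a much richer inflector.
    """
    return cls.__name__.lower() + "s"


class Model:
    """Hypothetical base class: no mapping file, everything follows convention."""

    @classmethod
    def columns(cls, conn):
        # Ask the database for the table definition instead of declaring attributes.
        rows = conn.execute(f"PRAGMA table_info({table_name_for(cls)})").fetchall()
        return [row[1] for row in rows]  # row[1] holds the column name

    @classmethod
    def find(cls, conn, record_id):
        cols = cls.columns(conn)
        query = f"SELECT {', '.join(cols)} FROM {table_name_for(cls)} WHERE id = ?"
        row = conn.execute(query, (record_id,)).fetchone()
        if row is None:
            return None
        obj = cls()
        for name, value in zip(cols, row):
            setattr(obj, name, value)
        return obj


class User(Model):
    pass  # no attributes and no mapping declared


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("Ada", "ada@example.org"))
user = User.find(conn, 1)
print(user.name, user.email)
```

The User class contains no configuration at all, which is the point of the convention: adding a column to the table makes a new attribute available without touching the class.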
Figure 4.2: The Model-View-Controller architecture as implemented in Rails
Another characteristic of Rails is the emphasis on RESTful application design.
REST[29] is a style of software architecture based around the client-server relationship.
It encourages a logical structure within the application, which means that the REMI
application can easily be opened up as an API for other tools and programs to interact
with REMI. There are a number of competing application programming interface
standards such as Remote Procedure Calls (RPC)[23], Simple Object Access Protocol
(SOAP)[18] and Web Services Description Language (WSDL)[11]. Each of these
standards has its merits, but they are heavy and verbose in their implementations
compared to a RESTful API. The REST style makes use of the URL[22] string along with
a combination of HTTP requests such as GET and POST to send instructions to the
server. Together, the type of HTTP request and the URL itself carry
information such as the component that we want to interact with, the operation that we
want to perform on that component and the ID of the particular instance that we are
interested in.
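How the HTTP request type and the URL together identify the component, the operation and the instance ID can be sketched with a small routing function. The Python fragment below is an illustration only. The action names follow common REST conventions, the use of PUT and DELETE is an assumption beyond the GET and POST requests mentioned above, and the `experiments` resource is a hypothetical example rather than a documented REMI route.

```python
import re

# Assumed REST-style conventions: (HTTP method, URL carries an ID?) -> operation.
ACTIONS = {
    ("GET", False): "index",      # list all records of a component
    ("POST", False): "create",    # add a new record
    ("GET", True): "show",        # fetch one record
    ("PUT", True): "update",      # modify one record
    ("DELETE", True): "destroy",  # remove one record
}


def route(method: str, path: str):
    """Return (component, operation, instance ID) for a request."""
    match = re.fullmatch(r"/(?P<resource>[a-z_]+)(?:/(?P<id>\d+))?", path)
    if match is None:
        raise ValueError(f"unrecognized path: {path}")
    raw_id = match.group("id")
    action = ACTIONS.get((method.upper(), raw_id is not None))
    if action is None:
        raise ValueError(f"{method} is not supported for {path}")
    return match.group("resource"), action, int(raw_id) if raw_id else None


print(route("GET", "/experiments/42"))
print(route("POST", "/experiments"))
```

The request itself carries everything the server needs, so no separate envelope or service description has to be exchanged.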
Figure 4.3 shows the implementation of the SOAP and REST protocols. The web
service example is querying a phonebook application for the details of a given user. The
ID of the user is provided to the web-service. In the SOAP implementation the request
is sent to the server via an HTTP POST request, and the data is received inside a SOAP
response envelope. Additional processing is required to unpack the SOAP response
envelope and fetch the returned data. In the REST implementation, the request is not a
request body but just a simple URL. This URL is sent to the server using a simpler
GET request, and the HTTP reply is the raw result data, not embedded inside anything:
just the data one can directly use.
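The difference in effort between the two styles can be sketched in a few lines of Python. The fragment below builds a SOAP request envelope and unpacks a SOAP response for a phonebook lookup, and compares this with the equivalent REST request. The phonebook namespace, the element names, the host name and the sample response are all hypothetical and are not taken from the original figure. Only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
PB_NS = "http://www.example.org/phonebook"             # hypothetical service namespace


def soap_request(user_id: int) -> str:
    """Build the XML body that would be sent with an HTTP POST."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{PB_NS}}}GetUserDetails")
    ET.SubElement(call, f"{{{PB_NS}}}UserID").text = str(user_id)
    return ET.tostring(envelope, encoding="unicode")


def parse_soap_response(xml_text: str) -> str:
    """Unwrap the envelope to reach the returned data."""
    root = ET.fromstring(xml_text)
    return root.find(f".//{{{PB_NS}}}Name").text


def rest_request(user_id: int) -> str:
    """The REST request is just a URL to be fetched with an HTTP GET."""
    return f"http://www.example.org/phonebook/UserDetails/{user_id}"


# Made-up response, used only to exercise the parsing step.
SAMPLE_SOAP_RESPONSE = (
    f'<soap:Envelope xmlns:soap="{SOAP_NS}" xmlns:pb="{PB_NS}">'
    "<soap:Body><pb:GetUserDetailsResponse><pb:Name>John Doe</pb:Name>"
    "</pb:GetUserDetailsResponse></soap:Body></soap:Envelope>"
)

print(soap_request(12345))
print(parse_soap_response(SAMPLE_SOAP_RESPONSE))
print(rest_request(12345))
```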
Figure 4.3: Difference between SOAP and REST. Example taken from an online resource[7]
Users interact with the REMI system via a web interface. All HTTP requests from
the clients are handled by an Apache HTTP server. The REMI system is built using the
Rails framework. Apache cannot directly communicate with the Rails application and
hence cannot by itself hand off the requests coming from the clients to the Rails
application. Phusion Passenger[17] is a module for the Apache HTTP Server that handles
deployment of Ruby applications, including those built using the Ruby on Rails
framework. Phusion Passenger is installed to help Apache with the task of forwarding
requests to the Rails application.
For any application the user experience is one of the most important aspects of
application adoption and success. To provide a better user experience and to make
the user interface more responsive we found it necessary to minimize
client-server communication. Our clients are thin clients but rely heavily on JavaScript
within our web pages to keep the user interface responsive to user inputs.
We have also made use of the File[15] and Drag and Drop[5] APIs, which form part of
the HTML5[16] specifications, and of asynchronous Ajax requests to improve the user
interaction with REMI.
The data-files and meta-data captured as part of each experiment are critical to the
research studies being conducted. REMI is therefore expected to provide a highly
reliable, fault tolerant and secure data storage solution. The use of a RAID[37] storage
device, which addresses the reliability and fault tolerance requirements, is highly
recommended. Through
the use of redundancy, most RAID levels provide protection for the data stored on the
array. This means that the data on the array can withstand even the complete failure of
one hard disk (or sometimes more) without any data loss. A good RAID implementation
can improve data availability by providing fault tolerance and by providing special
features that allow for recovery from hardware faults without disruption. A RAID
device turns a number of smaller drives into a larger array, thereby making it
possible to create a data store of larger capacity than any single storage drive.
This facilitates applications that require large amounts of contiguous disk space, and
also makes disk space management simpler. RAID systems improve performance by
allowing the controller to exploit the capabilities of multiple hard disks to get around
performance-limiting mechanical issues that plague individual hard disks. Different
RAID implementations improve performance in different ways and to different degrees,
but all improve it in some way.
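The way redundancy protects against the failure of a single disk can be illustrated with XOR parity, the mechanism used by parity-based RAID levels such as RAID 5. The Python sketch below is a toy model in which each "disk" holds one small block of bytes. It is an illustration of the principle only and says nothing about the RAID level or hardware used for REMI.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three data "disks" and one parity "disk" (toy block contents).
data = [b"RAW1", b"RAW2", b"RAW3"]
parity = xor_blocks(data)

# Simulate the complete failure of the second disk and rebuild its contents.
rebuilt = recover([data[0], data[2]], parity)
print(rebuilt)
```

Because the XOR of all data blocks and the parity block is zero, any one missing block equals the XOR of all the others, which is why such an array survives the loss of one disk without data loss.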
Figure 4.4: Various software components that make up REMI
Based on the individual components explained above, Figure 4.4 shows a high
level interaction diagram of the software components that work together to make
up the REMI system. The primary method of interacting with REMI is through a
web browser. The web interface makes use of JavaScript to handle client side user
interface manipulations and asynchronous interactions with the REMI server. The web
browser on the client side sends requests to the Apache web-server on the server. The
web-server hands off the requests to the Passenger module, which interacts with the Rails
application framework. The application framework houses the business logic to handle
all the requests coming from the client. Depending on the nature of the request, Rails
interacts with the database and may in addition interact with the underlying file
system to fetch or store data files. The Rails framework communicates with the MySQL
database via the ActiveRecord object-relational mapping API. The ActiveRecord
library comes bundled with the Rails installation, so it forms part of the web-application
framework. To interact with the file system to store and retrieve data files, we have made
use of the core File library API that comes as part of the Ruby language.
4.3 REMI database development
REMI makes use of a data model that is organized around a notion of an Experiment,
where an experiment is defined as being a body of scientific work on a particular
subject or subjects of investigation. Each experiment is a single scientific task, be it
an observation, measurement, or simulation. Raw data is generated for this task, which
is then analyzed to produce derived data. All of the data generated is then associated
with the Experiment, with proper meta-data filled out. Every experiment is conducted
and owned by a Researcher. The experiment process may include the need to make
use of an imaging device to gather and capture related experiment information. An
experiment is defined as a fundamental unit of research work, which may or may not
consist of smaller tasks. REMI defines an experiment, not as a series of events or tasks,
but as a research work that starts on a particular date and involves making use of at
most one medical device for that process. Hence on any given date an experiment could
make use of a single medical device and a number of Subjects could participate in that
experiment.
Following is a listing of all the logical entities of the data model for REMI. Figure
4.5 shows the relationships between each of the entities.
• Experiment
A fundamental unit of research work that has a name associated with it, a description
of what the experiment is about or what it aims to
accomplish, the date on which the experiment took place and the modality to which
the experiment belongs.
• Subject
A subject represents any object or phenomenon that is observed for the purpose of
research. For the kind of experiments that we address the subjects mostly belong
to the Animal kingdom. Each subject is uniquely identified by an
assigned subject tag. Other information such as the species, gender, date of birth
and health condition is also captured. It is important to ensure that information
related to a subject is made anonymous, so that it is not possible to
identify a subject based on the captured information.
• Researcher
An experiment is always conducted by a particular researcher. The researcher
owns the experiments they perform. Apart from the name of the researcher, it is
also necessary to capture the position they hold, the department they belong to
and the institution or organization they work for. Within the context of REMI,
a researcher is one who conducts experiments and then contributes to REMI by
uploading the various data-files and meta-data generated as part of the experiment
conducted.
• File
A file is a block of arbitrary information, or a resource for storing information,
which is usually based on some kind of durable storage. Files are generated in
the process of conducting an experiment and contain information related to the
experiment being performed. Apart from the actual contents of the file, additional
information such as a file name, size of the file, date of creation and modification,
content type, location of the file within a hierarchical directory structure and fixity
information such as a checksum are also maintained with the file.
• Tag
A tag is a keyword or term assigned to a piece of information such as an Internet
bookmark, digital image, or in our case the data file. This kind of meta-data helps
describe an item and allows it to be found again by browsing or searching. Tags
represent semi-structured data that can be used to describe the files being attached
to an experiment.
• Machine
A Machine represents the instrument used, as part of an experiment, to gather
and capture data. Machines are varied in purpose, construction and capabilities.
Although a machine can be defined by numerous parameters, the ones that are
important for REMI to capture are the name of the machine, the manufacturer,
the physical address of the machine and the number of detectors on the machine, if
applicable.
• Detector
A Detector contains information about the detectors on the machines, such as the
type of collimator and the detector material used for an experiment.
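The information kept with a File entity, including the fixity checksum, can be gathered with a few standard library calls. The Python sketch below is an illustration under stated assumptions: the FileRecord class, its field names and the choice of SHA-256 are hypothetical and do not describe the actual REMI tables or checksum algorithm.

```python
import hashlib
import mimetypes
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical record mirroring the File attributes listed above."""
    name: str
    size: int
    content_type: str
    location: str
    checksum: str        # fixity information
    modified_at: float
    tags: list = field(default_factory=list)


def describe_file(path: str, chunk_size: int = 1 << 20) -> FileRecord:
    """Collect file meta-data and compute a SHA-256 checksum in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    stat = os.stat(path)
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    return FileRecord(
        name=os.path.basename(path),
        size=stat.st_size,
        content_type=content_type,
        location=os.path.dirname(os.path.abspath(path)),
        checksum=digest.hexdigest(),
        modified_at=stat.st_mtime,
    )


# Demonstration on a throwaway file with made-up contents.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "projection.bin")
    with open(sample, "wb") as out:
        out.write(b"example projection data")
    record = describe_file(sample)
    print(record.name, record.size, record.checksum[:12])
```

Recomputing the checksum later and comparing it with the stored value is a simple way to detect a file that has been corrupted or altered on disk.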
Figure 4.5: Main Entities of REMI
The proposed logical entity relationship diagram for REMI is shown in Figure 4.5.
The Experiment entity is central to the entire REMI system. Experiments are conducted
by a particular Researcher and may be performed on one or more Subjects. A researcher
conducts a number of different experiments as deemed necessary within their area of
research. As part of the experiment, the researcher may make use of a particular
Machine to capture data or monitor subject and experiment conditions. A single
machine can be used to conduct different experiments. In the process of conducting
an experiment, data is captured/logged into one or more files. These files form part
of the entire experiment or could be files generated while studying a particular subject
that participates in the experiment. During the course of the experiment information
regarding the status of the subject needs to be captured. This information is represented
by attributes defined on the relationship between Subject and Experiment. Files are
generated as a result of performing an experiment. Since files cannot exist without being
part of a certain experiment, the File entity is shown as a weak entity whose existence
depends on the Experiment entity. A file can be associated with an experiment in two ways.
As mentioned earlier certain files are generated while studying a particular subject that
participates in the experiment and hence should get associated to the ’participation’
relationship of the experiment and that subject. Every other file generated that does
not pertain to any particular subject is then associated directly to the experiment.
This association between Experiment and File or Experiment-Subject and File is also
captured within the logical ER diagram.
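The relationships described above can be expressed as a small relational schema. The Python sketch below uses an in-memory SQLite database for illustration. The table and column names are assumptions and do not reproduce the REMI schema, which runs on MySQL. The File entity is modelled as a weak entity: every file must reference an experiment, may additionally reference a participation, and is removed when its experiment is deleted.

```python
import sqlite3

SCHEMA = """
CREATE TABLE researchers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE machines    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE subjects    (id INTEGER PRIMARY KEY, tag TEXT NOT NULL UNIQUE, species TEXT);
CREATE TABLE experiments (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    conducted_on  TEXT,
    researcher_id INTEGER NOT NULL REFERENCES researchers(id),
    machine_id    INTEGER REFERENCES machines(id)   -- at most one machine
);
-- The 'participation' relationship carries attributes such as subject status.
CREATE TABLE participations (
    id             INTEGER PRIMARY KEY,
    experiment_id  INTEGER NOT NULL REFERENCES experiments(id) ON DELETE CASCADE,
    subject_id     INTEGER NOT NULL REFERENCES subjects(id),
    subject_status TEXT,
    UNIQUE (experiment_id, subject_id)
);
-- Weak entity: a file cannot exist without its experiment.
CREATE TABLE files (
    id               INTEGER PRIMARY KEY,
    name             TEXT NOT NULL,
    experiment_id    INTEGER NOT NULL REFERENCES experiments(id) ON DELETE CASCADE,
    participation_id INTEGER REFERENCES participations(id) ON DELETE CASCADE
);
"""


def open_database() -> sqlite3.Connection:
    """Create an in-memory database with foreign key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


db = open_database()
tables = [row[0] for row in db.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

A file with a NULL participation_id belongs to the experiment as a whole, while a file with a participation_id belongs to one subject's participation in that experiment.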
Although the data model is centred around the Experiment entity, the primary objects
for REMI are the data files that get associated with the experiments. Having explained the
importance and the role that meta-data can play in medical data management systems
such as REMI, it is important to understand the nature and characteristics of the
meta-data being captured. All of the meta-data that gets captured can be broken down
into two distinct categories: firstly, medical image information, i.e. the data of each
image file and the results of the image processing, which can be considered
semi-structured data; secondly, information on the research study, subjects, researchers
or principal investigator, which is highly structured data. Structured meta-data are
those attributes that we know to be important at design time, and some may be
query-able, i.e. some users may want to access data-files that have particular attribute
values (e.g., "search human sinograms where reconstructed image is available", or
"search list-mode raw data for all hypertensive rats"). Other structured meta-data may
never be queried on, e.g., processing parameters, and yet must be available.
Semi-structured meta-data may be query-able, but we know neither those attributes nor
their values at design time. They are designed as tags or key-words whose values are
user provided. The query engine may do string search on them. Unstructured meta-data
are textual descriptions that are open for viewing only; the database has no search
capability on them.
Identifying the distinction between these two sets of data is critical in the design
and development of the REMI database. The structured data can be easily mapped to a
relational database model, but for the semi-structured data a relational data model does
not work well, since the semi-structured data is dynamic in nature and we know neither
those attributes nor their values at design time.
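The difference between querying structured attributes and searching user-provided tags can be sketched as follows. The Python fragment is purely illustrative: the DataFile class, its fields and the sample records are hypothetical, and the in-memory search stands in for whatever query engine the database provides.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    name: str
    # Structured meta-data: attributes known at design time.
    species: str
    data_kind: str
    # Semi-structured meta-data: free-form tags whose keys and values are user provided.
    tags: dict = field(default_factory=dict)
    # Unstructured meta-data: displayed to the user but never searched.
    notes: str = ""


def query_structured(files, **criteria):
    """Exact match on design-time attributes, as a relational query would do."""
    return [f for f in files
            if all(getattr(f, key) == value for key, value in criteria.items())]


def search_tags(files, text):
    """Case-insensitive string search over tag keys and values."""
    needle = text.lower()
    return [f for f in files
            if any(needle in f"{key} {value}".lower() for key, value in f.tags.items())]


# Made-up sample records.
catalog = [
    DataFile("rat_listmode.raw", "rat", "list-mode",
             tags={"condition": "hypertensive", "tracer": "example tracer"}),
    DataFile("human_sino.dat", "human", "sinogram",
             tags={"reconstruction": "available"}),
]

print([f.name for f in query_structured(catalog, species="rat", data_kind="list-mode")])
print([f.name for f in search_tags(catalog, "hypertensive")])
```

The structured query relies on attribute names fixed in advance, whereas the tag search makes no assumption about which keys a contributor has chosen.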
The logical ER diagram includes some additional entities to handle user privileges;
they are explained below.
4.3.1 Roles and Abilities
Roles are entities that govern how users can interact with REMI. Roles
are assigned to users of the system. Four roles are defined on the system
out of the box.
• Public User
Public User is not an explicit role defined on the system but if a user has no other
role assigned to him then he automatically is a public user. Any user by default
is a public user. As a public user one can browse and query the REMI system for
the various studies within REMI. As a public user one can download data files
associated with the various experiments. In short one can view the system via this
role but cannot contribute or add data back into the database.
• Contributor
A contributor is one that can contribute experiments to the REMI system. He can,
in addition to viewing other experiments, create, update, and delete experiments.
The other entities that he has privileges on are Subjects and Machines. So he can
perform Create, Read, Update, and Delete (CRUD) operations on these entities
too.
• Data Authenticator
It is important to ensure that the REMI system is used mainly for storing and
sharing medical experiment data. There is potential to use the system for sharing
files other than medical data files. To ensure that the experiments, and more
importantly the files being uploaded, are related to the experiment, some form of
gatekeeping is needed before experiments can be made public. A user in the Data
Authenticator role is primarily tasked with authenticating the data being uploaded
and granting permission for that particular experiment to be made public.
• System Administrator
A System Administrator is in charge of maintaining the Roles, Users, and Categories
within the system. He has CRUD privileges on all three of these entities.
4.3.2 Categories and Category-Values
Every entity/class comprises a list of attributes. An instance of an entity is described
by the data stored in each of its attributes. For some attributes, defined on a number
of entities within REMI, the value is chosen from a set of pre-defined values. Each
of these attributes has its own unique set of values to choose from. We refer to this
mapping from an attribute to its predefined list of values as categories and
category-values. Modeling data of this nature is also known as List-of-Values (LOVs)[8].
Many attributes, such as gender, species, modality, and data types, are stored within the
REMI system as categories, and the possible values for each of these attributes are stored as
category-values.
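The category and category-value mapping described above can be sketched in plain Ruby as two small relations. This is only an illustration under assumptions: the struct names, the `values_for` and `allowed_value?` helpers, and the sample values are hypothetical and are not taken from the REMI schema.

```ruby
# Hypothetical in-memory stand-ins for the categories and category-values tables.
Category      = Struct.new(:id, :name)
CategoryValue = Struct.new(:category_id, :value)

# Illustrative rows only; the real values are maintained by the System Administrator.
CATEGORIES = [
  Category.new(1, "species"),
  Category.new(2, "gender")
].freeze

CATEGORY_VALUES = [
  CategoryValue.new(1, "human"),
  CategoryValue.new(1, "rat"),
  CategoryValue.new(2, "male"),
  CategoryValue.new(2, "female")
].freeze

# Return the list of values that an attribute (category) may take.
def values_for(category_name)
  category = CATEGORIES.find { |c| c.name == category_name }
  return [] if category.nil?

  CATEGORY_VALUES.select { |v| v.category_id == category.id }.map(&:value)
end

# Check a candidate attribute value against its list of values.
def allowed_value?(category_name, value)
  values_for(category_name).include?(value)
end
```

Keeping the values in their own relation means that a new value can be offered for an attribute by adding a row, without changing the schema of the entities that use the attribute.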
Figure 4.6: Physical Design Diagram. Refer to Appendix D for a complete listing of all tables.
Having defined the logical entities essential for REMI, the natural next step is to
translate the logical ER diagram to a table design diagram. Figure 4.6 shows the various
tables and how each table is related to another. Each entity in the logical ER diagram
translates to an individual table within the database. If two entities share a one-to-many
relationship, then the table that represents the "many" side of the relationship
includes an additional attribute that maps to the unique identifier of records in the
table that represents the "one" side. Two entities that share a many-to-many
relationship have a table for each entity and an additional third table to store the
associations between them. If a many-to-many relationship has attributes defined on it,
then those attributes are added to this third table. The Experiment and Subject entities
share a many-to-many relationship, and this relationship has Condition and Weight as
attributes defined on it. The ExpSubjectAssociation table is defined on the database to
represent this relationship, and it contains columns to capture the condition and weight
attributes.
Another interesting relationship that we needed to map properly to database
tables is the relationship between File, Subject and Experiment. Depending on how
a researcher chooses to define an experiment, an experiment may contain zero, one,
or many subjects. If a researcher does choose to associate more than one subject with
an experiment, then certain files generated over the course of the experiment may
pertain to a specific subject that participated in that experiment. Hence those files
need to be associated with the relationship defined between that experiment and that
subject. Within that same experiment, some files may be generated that relate more to
the experiment than to a specific subject participating in it. Examples of such files
are experiment calibration files or README files. These files need to be associated
directly with the experiment itself.
The File entity translates to a table called file on the database. The Experiment and
Subject entities have their own tables, and the ExpSubjectAssociation table is used
to capture the many-to-many relationship between these two entities. An experiment
could generate many files, and one file could be part of a number of experiments: a
file could be generated as output of one experiment and act as an input for another.
To capture this many-to-many relationship between Experiment and File, a third table
called FileAssociation is created. Similarly, a file can be associated with the same
subject in different experiments, and an association of subject and experiment can
generate more than one file. Hence there is also a many-to-many relationship between
File and the Experiment-Subject relationship. The FileAssociation table is used to
capture both of these relationships with the File entity. The attributes defined on
the FileAssociation table are:
• file_id: Unique identifier from the File table.
• owner_id: Unique identifier either from the Experiment table or the
ExpSubjectAssociation table.
• owner_type: String field that is used to indicate if the record is related to the
Experiment table or ExpSubjectAssociation table.
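As a rough illustration of how the three attributes above let one table serve two kinds of owners, the following plain-Ruby sketch models FileAssociation rows in memory. The sample rows and the helper names `file_ids_for` and `owners_of` are hypothetical; in REMI these lookups would go through the Rails models rather than an array.

```ruby
# One row of the FileAssociation table: which file belongs to which owner.
FileAssociation = Struct.new(:file_id, :owner_id, :owner_type)

# Illustrative rows only (ids are made up for this sketch).
ASSOCIATIONS = [
  # A calibration file attached directly to experiment 1.
  FileAssociation.new(10, 1, "Experiment"),
  # A file attached to one experiment-subject pairing (row 7 of ExpSubjectAssociation).
  FileAssociation.new(11, 7, "ExpSubjectAssociation"),
  # The same file reused as an input of experiment 2.
  FileAssociation.new(11, 2, "Experiment")
].freeze

# All files owned by a given record; owner_type disambiguates which table owner_id refers to.
def file_ids_for(associations, owner_type, owner_id)
  associations
    .select { |a| a.owner_type == owner_type && a.owner_id == owner_id }
    .map(&:file_id)
end

# All owners of a given file, as [owner_type, owner_id] pairs.
def owners_of(associations, file_id)
  associations
    .select { |a| a.file_id == file_id }
    .map { |a| [a.owner_type, a.owner_id] }
end
```

Because owner_id alone is ambiguous (an Experiment and an ExpSubjectAssociation row may share the same identifier), every lookup in the sketch filters on owner_type as well.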
In designing our database we tried to follow data normalization
practices. Our database is in second normal form, which ensures that every
column value in the database is atomic and every non-key column is fully dependent
on the (entire) primary key. The database is currently not in the desired third normal
form because the file table is a big de-normalized table containing not only
the attributes of a file but also the attributes of all the meta-data associated with that
file. There is a fair amount of duplicate data stored in this table, as the same
meta-data can be shared by more than one file. The de-normalization of meta-data and
file attributes was necessary to keep the storing and retrieving of data fairly simple.
Breaking the meta-data into various entities would mean joining a large number of
tables to search for files. Also, since each file has its own meta-data attributes, updating
the meta-data for a particular file is fairly easy. This ease of handling file meta-data
comes at the cost of duplicated data. The exp_file table is a de-normalized table
containing file information as well as additional information that is not directly
associated with the file but describes the data within the file. This data forms
the meta-data pertaining to the file. Currently the additional fields contained within
exp_file are a static list of attributes that were identified, along with the researchers, as
critical data needed for storage and retrieval purposes. That this list is static is a big
limitation, which we discuss in Chapter 7.
For the database development we made use of MySQL Server 5.1. The Rails
application framework provides a facility called Migrations. Migrations are a
convenient way to alter a database in a structured and organized manner. Migrations
allow one to use Ruby to define changes to a database schema, making it possible to
use a version control system to keep the schema synchronized with the actual code. Each
change in the database schema has its own migration object, encapsulating a move up
and a move down. Rails migrations allow data migration as well as schema migration,
and migration code is database independent. We have made use of migration files to
generate all of the tables within REMI.
4.4 REMI application development
Our initial development started with a system built on the Linux, Apache, MySQL,
and PHP (LAMP) stack, based on a paper[41] published by Hugh E. Williams. This was a very
basic system that would allow anyone to upload files to a server. We tried building the
current system on top of this work, but it became evident as the project got bigger that
an established and easy-to-iterate web framework was needed; hence we decided to
move our project to the Rails framework.
Rails uses the MVC[38] architecture pattern to organize application programming
and makes use of the Active Record object-relational mapping (ORM) layer. Refer to Appendix
C for details on the MVC pattern and the layout of the different components within the
REMI source code. For every table within the REMI database there is a corresponding
model class defined within the framework. It is through these model classes that Rails
communicates with the database and performs CRUD operations on the respective
tables. A single instance of a particular model class maps to a single tuple of the
corresponding table in the database. In addition to basic CRUD operations, it is good
Rails programming practice for models to encapsulate the business logic. Business logic
can reside in the model class itself or in separate modules that the models call.
For example, the ExpFile model, in addition to the
basic CRUD operations that are performed on the database table Files, also contains
the business logic to store files on the file server. Logic related to generating the file
path, storing the intermediate files, and merging the intermediate files all resides
within the ExpFile model. Figure 4.7 shows the various software components, including
the views, controllers and models, that are part of the REMI system.
Figure 4.7: Various system components and their interaction with the System Roles
4.4.1 Authentication
Although REMI has been developed with the intention that anyone from anywhere
should be able to access and download experimental data, there are processes within REMI
that we do not want to expose to the general public. Functions such as contributing
new experiments to REMI, authorizing new experiments to be made available to
public users, and administering user resources and roles are privileged
functionality that we want to make available to certain groups of people only after proper
credential validation. A simple username and password authentication system is put in
place to validate users. Once the user's credentials are validated, different menus are
exposed depending on the user's role, through which the user can perform
more advanced actions.
To ensure that authentication rules are applied correctly, an ability module is placed
between every controller and model interface. It makes sense to have the module
sandwiched between the controller and the model because it is the model that deals with
the data layer within REMI, and before we provide access to the data, proper privilege
checks need to be made. The ability module looks at every request that a controller makes
on behalf of the user and validates that the user has permission to perform that
action on the model. The ability to perform an action on a model is associated with roles.
Assigning a role to a user in effect gives him the ability to perform the actions on models
that the role defines.
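The role-to-ability check described above can be sketched in plain Ruby as follows. This is a minimal illustration, not REMI's ability module: the table layout, the `can?` helper, and the `:publish` action name are assumptions, while the role names and the entities each role may act on follow section 4.3.1.

```ruby
CRUD = [:create, :read, :update, :delete].freeze

# Actions each role may perform, keyed by model name (layout is hypothetical).
ROLE_ABILITIES = {
  "Contributor"          => { "Experiment" => CRUD, "Subject" => CRUD, "Machine" => CRUD },
  # :publish is an assumed name for "granting permission to make an experiment public".
  "Data Authenticator"   => { "Experiment" => [:read, :publish] },
  "System Administrator" => { "Role" => CRUD, "User" => CRUD, "Category" => CRUD }
}.freeze

# A user with no role is a public user and may only browse and download.
PUBLIC_ABILITIES = { "Experiment" => [:read], "File" => [:read] }.freeze

# Called on behalf of a controller before it touches a model.
# roles is the list of role names assigned to the current user.
def can?(roles, action, model)
  return true if PUBLIC_ABILITIES.fetch(model, []).include?(action)

  roles.any? do |role|
    ROLE_ABILITIES.fetch(role, {}).fetch(model, []).include?(action)
  end
end
```

A controller would call something like `can?(current_roles, :update, "Experiment")` before invoking the model and refuse the request when the result is false.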
4.4.2 File Storage
A challenge in designing a file storage system is implementing a good file and directory
structure to store a potentially large number of files. The simplest possible solution
is to keep all of the files under a single parent directory. This design is very easy
to implement. Its disadvantage is that, at the operating system level, any file
or directory commands executed on this directory would be slow and unresponsive.
Another approach is to create an individual directory for each experiment.
This again can eventually lead to a large number of sub-directories
as the number of experiments within REMI increases. The challenge is to come
up with a mechanism to create a directory structure that can grow and shrink
autonomously. We have used the following method to create directory structures on
the fly. Every file uploaded to REMI has a file name.
We concatenate the experiment id with the file name and generate a SHA-256
hash, which is a string of 64 hexadecimal characters. Depending on two system
parameters, "directory_levels" and "directory_name_size", the directory structure
is automatically generated using the hash value, and the file is stored under the last
directory. The parameter "directory_name_size" determines how many characters
of the hash value are used to name each directory. The parameter "directory_levels"
determines the depth of the directory structure, and therefore how many successive
segments of the hash are used as directory names. Since each character is a hexadecimal
digit, a name size of n allows at most 16^n distinct directory names at each level.
Figure 4.8: Possible directory structure created when directory_levels and directory_name_size are both set to 2
The following is a sample hash that gets generated:
fb80f1735b3153e7a41d38390de5d9773c35259965b835b859a32bfa3713bd4b
If "directory_levels" is set to 2 and "directory_name_size" is set to 2,
then using the above hash value we generate directory "fb" at level 1 and
another directory "80" at level 2. Hence the resulting directory structure is
ROOT_DIRECTORY/fb/80/<experiment_id>_<filename>.<extension>
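The scheme above can be sketched in a few lines of Ruby using the standard Digest library. This is a minimal sketch under assumptions: the function names are hypothetical, the hash input is assumed to be the experiment id followed directly by the file name, and the stored file name is assumed to join the two with an underscore.

```ruby
require "digest"

# Cut the leading characters of a hex digest into one directory name per level.
def hashed_directories(hex_digest, levels, name_size)
  (0...levels).map { |level| hex_digest[level * name_size, name_size] }
end

# Build the full storage path for an uploaded file.
# Assumption: the hash input is "<experiment id><file name>" with no separator.
def storage_path(root, experiment_id, filename, levels: 2, name_size: 2)
  digest = Digest::SHA256.hexdigest("#{experiment_id}#{filename}")
  directories = hashed_directories(digest, levels, name_size)
  # Assumption: the stored name is "<experiment id>_<file name>".
  File.join(root, *directories, "#{experiment_id}_#{filename}")
end

sample = "fb80f1735b3153e7a41d38390de5d9773c35259965b835b859a32bfa3713bd4b"
puts hashed_directories(sample, 2, 2).join("/") # prints "fb/80"
```

Because the path depends only on the experiment id and the file name, the same file always maps to the same directory, and the number of directories per level is bounded by the name size rather than by the number of experiments.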
4.4.3 Menu
The main menu as seen by a public user is dynamic in nature. Menu sections such as
Modality, DataSet, and Species have their values retrieved from the REMI database.
These three menu items are stored as categories within the database, and their values
are stored in the category-values table. The reason for this is that new modalities,
datasets, or species can be added by the System Administrator, and those new entries
need to surface in the public user interface as part of the menu. All other, static menu
items are retrieved from the menu.json file that resides under the folder app/configs. The
menu structure can be changed by modifying the contents of the menu.json file.
4.4.4 Behaviour Driven Development
Behavior driven development (or BDD)[36] is an agile software development technique
that we followed. BDD focuses on obtaining a clear understanding of desired software
behavior through discussion with stakeholders. It extends Test Driven Development
(TDD) by writing test cases in a natural language that non-programmers can read.
Behavior-driven developers use their native language in combination with the ubiquitous
language of domain driven design to describe the purpose and benefit of their code.
This allows the developers to focus on why the code should be created, rather than
the technical details, and minimizes translation between the technical language in
which the code is written and the domain language spoken by the business, users,
stakeholders, project management, etc. BDD relies on the use of a very specific (and
small) vocabulary to minimize miscommunication and to ensure that everyone (the
business, developers, testers, analysts, and managers) is not only on the same page
but also uses the same words.
While using BDD, the idea is to use stories for planning and estimation. But once
we deliver working code it is more natural to talk about code in terms of features rather
than stories. And since stories in later iterations can lead to enhancements of existing
features, it is much easier to keep things organized by feature and just add new scenarios
to the existing features.
Figure 4.9: The Behavioural Driven Development Life Cycle. Picture taken from The RSpec Book[25]
We typically use Cucumber[2] to describe behavior of the application from the
outside and RSpec[10] to describe the behavior of its component parts. Test driven
development follows the familiar red/green/refactor cycle: we start off with a
failing test case, indicated by the color red; we then write code to make the failing
test pass, indicated by the color green. We can then refactor the code with the
assurance that the test we just wrote will catch any bug we introduce as a result of
the refactoring. With the addition of a higher level tool like Cucumber, we actually have
two concentric red/green/refactor cycles, as depicted in figure 4.9. Both cycles involve
taking small steps and listening to the feedback from the tools. We start with a failing
step (red) in Cucumber (the outer cycle). To get that step to pass, we drop down to RSpec
(the inner cycle) and drive the underlying code at a granular level (red/green/refactor).
At each green point in the RSpec cycle, we check the Cucumber cycle. If it is still red,
the resulting feedback should guide us to the next action in the RSpec cycle. If it is
green, we can jump out to Cucumber, refactor if appropriate, and then repeat the cycle
by writing a new failing Cucumber step.
As part of developing REMI we decided to practice BDD. Given this approach,
we ensured that we had test cases in place before writing any piece of code. Figure
4.10 depicts the scenario of Creating a new Category that forms part of the feature A
System Administrator manages Categories. Similarly, figure 4.11 illustrates two scenarios,
in each of which a user logs in either as a Contributor or as a System Administrator. Both of
these scenarios are part of the Log In feature.
Figure 4.10: A system administrator can create new categories in REMI
Figure 4.11: Users should be able to Log into REMI
To elaborate on the development process using BDD, let us look at the Login
feature and specifically its first scenario. The scenario tests the login process for a user
who has the Contributor role assigned to him. The Given...And...Then statements
define the scenario. It states that, given a user is logged in with the 'Contributor' role, when he
clicks the Contribute link he should be taken to the list of all experiments page. The list
of all experiments page is only accessible to a contributor. Testing this scenario throws
an error at the very first step: Cucumber is not able to interpret the statement "I am
logged in as a user in the (.*) role". This is shown in figure 4.12.
Figure 4.12: Cucumber fails to interpret the statement
To help Cucumber understand that particular step, we write code explaining what needs
to be done as part of that step. Figure 4.13 shows the code for that particular step. Based
on the role passed as part of the description of that step we fetch that corresponding role
from the database. Then we make use of a test fixture to create a user and assign that
user the role. Then we visit the log in page and log into REMI using this test user. Upon
logging into REMI, the user should be directed to the home page and that particular
check is being made at the end of that piece of code.
Figure 4.13: Provide a definition to a step that Cucumber does not understand.
Figure 4.14: All the Cucumber tests pass now.
Now we run the test again and notice that all the tests pass, as indicated in figure
4.14. As for the other two steps in that scenario, the cucumber-rails gem installs code
that can interpret them. Using the cucumber-rails gem one can interpret
clicking links and buttons, being redirected to or landing on pages, filling out fields in
forms and validating the value of fields on forms.
Chapter 5
Status report
REMI has been deployed into production and is being used by the Nuclear Imaging
group of the Life Sciences Division within the Lawrence Berkeley National Laboratory.
The system is being used to upload and share experimental data with other research
organizations like UC Davis and UC San Francisco in China Basin, San Francisco. As
of today, REMI contains 175 experiments and manages 8105 files that
are associated with these experiments. There is an active initiative to upload all
past experiments and their associated data files to REMI.
REMI resides on an Apple Xserve (Xserve3,1) system with the latest Mac OS X Server
10.7.1 operating system installed on it. The Xserve machine houses two Quad-Core
Intel Xeon 2.26 GHz processors and 6 GB of RAM. The server is
connected to the RAID storage through Fibre Channel cables. The RAID storage consists of
2 arrays of SATA drives.
A complete hardware configuration of REMI is provided below
1. RAID Arrays:
(a) Promise VTrak E610f - 16x 2 TB SATA
(b) Promise VTrak E610f - 16x 1 TB SATA
2. Capacity:
(a) has a raw capacity of 32 TB and a formatted capacity of 26 TB (RAID 6)
(b) has a raw capacity of 16 TB and formatted capacity of 13 TB (RAID 6)
3. Server:
• Apple Xserve (Xserve3,1)
• 2 x Quad-Core Intel Xeon 2.26 GHz processors
• 6 GB RAM
• Operating System: Mac OS X Server 10.7.1
The RAID arrays are directly attached to the server using Fibre Channel cables. The
single Apple Xserve houses the MySQL database and the application server (Rails), and is
directly connected to the RAID device. Hence all the software components that make
up REMI are present on a single server machine.
Chapter 6
Lessons learned
During the course of building REMI we faced a number of challenges along the way.
One of the basic requirements of REMI is the ability to upload research data files. The
nature of the data files varies in data types, file encoding formats and file sizes, to name
a few. One of the challenges we knew upfront was that REMI needs to handle uploading
and downloading of large files. Data files generated by various experiments vary from
a few kilobytes to a little less than 100 gigabytes of data. REMI would need to be able
to handle such large volumes of data. Uploading a file is done within the context of an
update operation of an experiment. The process is to add a file to an experiment and
save the experiment. The new file then gets uploaded to the server and gets associated
to that particular experiment. A large number of files can be added to an experiment
within a single work flow. In order to handle updating the experiment and the newly
attached files, we let each file get uploaded within a separate Ajax request, one for each
file and the final update request for the experiment is sent to the server as a regular HTTP
request. Our assumption was that each XMLHttpRequest[14] would be able to handle
the upload process of each file irrespective of the size of the file. This assumption turned
out to be false, as each browser has a maximum cap on the amount of data that can be
transferred via a single request. We tried to confirm the cap on file size for browser
uploads by looking up published information for each browser on the respective
browser websites. The limit on file size varies depending on the browser used. The
current versions of the browsers support a maximum upload file size of
4 GB (see footnotes 1, 2, and 3).
During our test phase we also noticed that REMI was not able to upload some of
our test files. For such files the browser sent a synchronization request
to the server but, instead of beginning the file transfer, did not send
any more requests. We confirmed this behavior by monitoring the requests being sent
between client and server using a tool called Wireshark[13]. As a result,
the server would open a connection and wait on the browser for the next
incoming request; since the browser did not send any further requests, the connection
on the server would time out. Although we could not get to the root cause of this problem,
we strongly believe that the data encoding of the file and the manner in which the file
was being sent from the browser to the server within the Ajax request were causing
the problem. The file should have been transferred as a binary large object (BLOB) but
instead was being transferred as a regular data file. We could have created a single BLOB
for the entire file and had the browser send it in the upload request,
but this would not be a good solution, as creating a BLOB for a large file increases the
memory footprint of the browser.
1 http://code.google.com/p/gss/issues/detail?id=94
2 http://blogs.msdn.com/b/ieinternals/archive/2011/03/10/wininet-internet-explorer-file-download-and-upload-maximum-size-limits.aspx
3 http://www.motobit.com/help/scptutl/pa98.htm
Figure 6.1: Sequence Diagram for the File Upload Process. Refer to figures 6.2, 6.3, 6.4, and 6.5 for more visible images.
To tackle the problems mentioned above, we resorted to uploading files from the
client to the server in chunks. Each chunk being uploaded is 250 MB in size, which
is well below the maximum cap on data uploaded through a single browser
request. Uploading the files in chunks also makes it possible to resume an upload in
case a file upload fails for any reason. Resuming uploads is currently not supported
by REMI and is part of our future road map; however, a file may be uploaded again
from scratch. Currently we make use of the mozSlice method to read file chunks;
this method is specific to the Firefox browser. A similar method, webkitSlice, exists for
browsers that are based on the WebKit[12] engine. Work is ongoing on
standardizing this method across all browsers as part of the HTML5 initiative. The
mozSlice/webkitSlice method returns a new BLOB containing the data in the specified
range of bytes of the source. This helps us with the second issue by ensuring that all files
are sent to the server as BLOBs.
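The byte ranges that the browser passes to the slice method can be computed as in the following sketch. REMI does this in client-side JavaScript; the arithmetic is written in Ruby here only to keep the examples in one language. The function name is hypothetical, and the 250 MB chunk size is assumed to mean binary megabytes.

```ruby
# 250 MB per chunk, as stated in the text (binary megabytes assumed).
CHUNK_SIZE = 250 * 1024 * 1024

# Return [start, stop) byte offsets for each chunk of a file,
# mirroring the (start, end) arguments of a slice call.
def chunk_ranges(file_size, chunk_size = CHUNK_SIZE)
  ranges = []
  start = 0
  while start < file_size
    # The last chunk is shorter when the size is not a multiple of chunk_size.
    stop = [start + chunk_size, file_size].min
    ranges << [start, stop]
    start = stop
  end
  ranges
end

p chunk_ranges(10, 4) # prints [[0, 4], [4, 8], [8, 10]]
```

Under these assumptions a 100 GB file is split into 410 chunks, each of which stays well below the per-request limit discussed above.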
Figure 6.1 shows the entire sequence diagram during the upload process. A detailed
explanation of each of the steps is provided below based on the different parts of the
sequence diagram mentioned above.
The upload process starts with the user clicking the Update Experiment button in
the web browser. Clicking the button calls a JavaScript function that takes all of the
meta-data added for each new file and creates one object per file and meta-data association
(Step 1.1 in figure 6.2). These objects are then loaded into an upload queue that is part
of the file uploader (Step 1.2 in figure 6.2). Once all the file objects are loaded into the
queue, we call the upload method on the file uploader (Step 1.3 in figure 6.2). Details of
the upload process are explained further down. Once the files are uploaded, another
POST request is sent to the server to update the other experiment-related details (Step 1.4
in figure 6.3), and a response indicating whether the files and experiment data were
successfully uploaded is sent back to the browser.
Figure 6.2: Sequence Diagram for the File Upload Process: Part 1
Figure 6.3: Sequence Diagram for the File Upload Process: Part 2
To describe the details of the upload process, when the upload method is called on
the file uploader, the file uploader starts to process the queue of files. For each file a
new Ajax (XMLHttpRequest) object is created (Step 1.3.1 in figure 6.2). Then we start
to transfer the file to the server in parts. We first find out the chunk size (Step 1.3.2
in figure 6.2) and then we send that chunk to the server by calling the save file API on
the experiment controller (Step 1.3.3 in figure 6.2). A response confirming that the chunk
has been saved is then received from the server (Step 1.3.4 in figure 6.3).
Figure 6.4 shows the details of saving the file chunk on the server.
Figure 6.4: Sequence Diagram for the File Upload Process: Part 3
The client browser is responsible for creating the chunks and transferring them
to the server. The browser keeps track of how much of the file has been transferred
and how much is remaining. The web browser sets an additional final flag to true when
the last chunk of the file is being transferred. This final flag indicates to the experiment
model that the entire file has been transferred and that it needs to concatenate the file
chunks. Once the chunk files are concatenated into a single file, the meta-data
associated with the file is stored in the database. This behavior is shown in figure 6.5.
Once the file is concatenated, the status response is sent back to the client browser.
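The server-side handling of chunks and the final flag can be sketched as follows. This is an illustration under assumptions rather than the ExpFile implementation: the `save_chunk` name and the ".partN" naming of intermediate files are hypothetical, and the step that stores the meta-data in the database is omitted.

```ruby
require "fileutils"

# Save one uploaded chunk next to the target file. When final is true,
# merge all chunks in order into the target and remove the intermediates.
# Assumption: chunk N is stored as "<target>.partN" (hypothetical naming).
def save_chunk(target_path, index, data, final)
  # Re-uploading a chunk with the same index overwrites the earlier copy.
  File.binwrite("#{target_path}.part#{index}", data)
  return false unless final

  part_paths = (0..index).map { |i| "#{target_path}.part#{i}" }
  File.open(target_path, "wb") do |merged|
    part_paths.each do |part_path|
      # Stream each chunk so that a large file is never held in memory at once.
      File.open(part_path, "rb") { |chunk| IO.copy_stream(chunk, merged) }
    end
  end
  FileUtils.rm_f(part_paths)
  true
end
```

The target file only comes into existence when the final chunk arrives, so a partially uploaded file never appears under its real name.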
Figure 6.5: Sequence Diagram for the File Upload Process: Part 4
Since REMI deals with medical imaging data that is critical to research
organizations, it is crucial that the data files get uploaded in a consistent
and reliable manner. Now that we upload files in chunks it becomes even more important
to ensure that files get split, copied and merged back on the server in a robust
way. The entire process needs to be fail-safe and should be able to recover from
failures. We have not yet implemented the ability to resume file uploads (restart from
where they failed), but the current file upload process does ensure that
an uploaded file appears within the REMI system only after it has been successfully
copied over to the file server. We are aware that failed uploads leave chunk
files on the file server, but any attempt to upload the same file again within
the same experiment will overwrite the previous chunk files. The combination of the
internal experiment number and the file name ensures that the file will always get copied
to the same directory structure. A failure while uploading a file does
not prevent the user from adding the file back to the experiment and trying to upload
it again. It is assumed that no two files attached to the same experiment will
have the same name and that any modification to the file names is done before
the files are uploaded to REMI.
Although we went ahead with slicing the files and uploading them using
JavaScript and Ajax, we were aware of other solutions for handling file uploads.
These solutions need the browser to run an additional plug-in to help the upload
process. One such solution is to make use of a Java applet to handle the file uploads. The
applet would run within the browser itself, but this would require
the client machine to have Java installed. Another approach would be to make
use of Flash. Flash-based file-upload utilities could be used to upload the files, which
would mean that the client would need Flash installed within the browser. One
of our design goals was to keep the client side as light-weight as possible, and hence we
decided to search for solutions other than the two mentioned above. WebSockets[19] is
another technology, providing bi-directional, full-duplex communication
channels over a single Transmission Control Protocol (TCP) socket. It is being designed
as part of the HTML5 specification with the intention of it being implemented in web
browsers and web servers.
Chapter 7
Weakness and Future Enhancements
Although REMI makes good strides in consolidating research data and making all
of the experiment files accessible via a web interface, REMI is in many ways
still in its budding phase. As the system gets used more and with constant
user feedback, new versions of REMI should make the system more mature. REMI
forms a platform for a lot of exciting future work and enhancements and, as of
now, enables the basic features of managing and sharing experiments and their data. REMI
is far from a polished, finished product and currently has a number of limitations that
we hope will be addressed over time. These limitations are listed
below.
7.1 Limitations within REMI
7.1.1 Semi Structured Data
Semi-structured data is a form of structured data that does not conform to the formal
structure of tables and data models associated with relational databases. It does,
however, contain tags or markers that separate semantic elements and enforce hierarchies
of records and fields within the data. XML files, JSON files and files in other mark-up
languages are all forms of semi-structured data. Much of the meta-data that needs to be
associated with files is also semi-structured. Our current data model has not been
designed to capture this semi-structured data in its entirety. In our current
implementation of REMI, we identified the elements within the semi-structured data that
we thought were important and useful enough for searching and display purposes. Every
element identified is associated with some modality or process that forms a part of the
experiment. The need to capture a particular element within the database was validated
by researchers within their field of research. Every element identified is then mapped
to a single column within the file table. We also provide the ability to associate tags
with a file, which goes some way towards capturing semi-structured data, but this
solution is not a scalable one.
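The column-plus-tags approach described in this section can be illustrated with a minimal Ruby sketch. The key names, the COLUMN_MAP constant and the split_metadata method are hypothetical and are not taken from REMI's schema or code; the sketch only assumes a flat JSON meta-data document.

```ruby
require "json"

# Hypothetical mapping from keys in a meta-data document to columns of the
# file table. These names are illustrative, not REMI's actual schema.
COLUMN_MAP = {
  "voxel_size"  => :voxel_size,
  "matrix_size" => :matrix_size,
  "data_type"   => :data_type
}.freeze

# Split a flat JSON meta-data document into known columns and leftover tags.
def split_metadata(json_text)
  document = JSON.parse(json_text)
  columns = {}
  tags = []
  document.each do |key, value|
    if COLUMN_MAP.key?(key)
      columns[COLUMN_MAP[key]] = value
    else
      # Anything without a dedicated column survives only as a free-form tag.
      tags << "#{key}=#{value}"
    end
  end
  { columns: columns, tags: tags }
end

sample = '{"voxel_size": 0.5, "data_type": "float32", "gating": "cardiac"}'
result = split_metadata(sample)
puts result[:columns].inspect
puts result[:tags].inspect
```

Every element that has no dedicated column loses its type and any nesting once it is reduced to a tag string, which is one way to see why this approach does not scale to arbitrary semi-structured data.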
7.1.2 Resume functionality for Uploads and Downloads
REMI needs to handle uploading and downloading of files of up to 100 GB each.
Transferring such large volumes of data is very time consuming and is prone to
disruption during uploads or downloads. The ability to resume a data upload or download
is necessary so that bandwidth, time and processing power are not spent resending data
that has already been transferred. For the upload process, we changed the behaviour of
the system to upload the data to the server in chunks. This was necessary to overcome
the file upload limits set by browsers that we discussed earlier. Uploading files in
chunks makes it possible to resume file uploads, but we have not yet built that feature
into REMI. Downloads, in contrast, happen in one shot, and any failure during a download
requires the user to start the download all over again from the very beginning.
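To make the link between chunking and resuming concrete, the following Ruby sketch computes which byte ranges remain to be sent once the number of bytes already stored on the server is known. This feature is not implemented in REMI; the method name remaining_chunks and the tiny chunk size are assumptions made purely for illustration.

```ruby
# Chunk size in bytes; deliberately tiny so the example output is readable.
CHUNK_SIZE = 4

# Return the [start, finish) byte ranges still to be uploaded, given the total
# file size and the number of bytes the server already holds for this file.
def remaining_chunks(total_size, bytes_on_server, chunk_size = CHUNK_SIZE)
  ranges = []
  offset = bytes_on_server
  while offset < total_size
    finish = [offset + chunk_size, total_size].min
    ranges << [offset, finish]
    offset = finish
  end
  ranges
end

p remaining_chunks(10, 0)  # a fresh upload sends every chunk
p remaining_chunks(10, 6)  # a resumed upload skips the first 6 bytes
```

The sketch trusts the size reported by the server; a real implementation would probably also need to verify that the partial file on the server matches the local file before appending to it.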
7.1.3 File Compression on the Server
As mentioned above, REMI handles very large files. A public user of the system can
search for experiments and then select one or many files to download. HTTP does not
support sending multiple files in response to a single request. There are a number of
documented ways to work around this, but they involve either compressing or archiving
all the individual files into one single file and then sending it to the client, or
using a third-party plug-in that sends an individual request in the background for each
of the files that the user selects to download. We decided to compress the files on the
server. That said, this solution is neither favorable nor scalable: because the files
are very large, compressing them on the server takes time and temporary disk space and
is CPU intensive. In addition, some files are already dense, and it makes no sense to
waste CPU cycles trying to compress them.
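The point about dense files can be illustrated with Ruby's standard zlib bindings. The sketch below is not part of REMI; it compares the compression ratio of repetitive text with that of pseudo-random bytes, which stand in for already-dense data. The method names and the 0.9 threshold are assumptions chosen for illustration.

```ruby
require "zlib"

# Ratio of compressed size to original size (lower means better compression).
def compression_ratio(data)
  Zlib::Deflate.deflate(data).bytesize.to_f / data.bytesize
end

# Decide from a small sample whether compressing is worth the CPU time.
# The 0.9 threshold is an arbitrary illustrative choice.
def worth_compressing?(sample, threshold = 0.9)
  compression_ratio(sample) < threshold
end

text_like  = "0.000 0.000 0.125 0.250\n" * 400  # repetitive, header-like data
dense_like = Random.new(42).bytes(10_000)       # stands in for dense data

puts worth_compressing?(text_like)
puts worth_compressing?(dense_like)
```

Sampling a small prefix of each file before archiving would be one possible way to skip compression for files that will not benefit from it, for example by adding them to the archive uncompressed.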
7.1.4 Meta-data Templates
Filling out meta-data for each file is a very time consuming and tedious process. In
most cases the values for the meta-data are the same across a group of files and even
across experiments. Initially a contributor had to fill in each meta-data field
explicitly. After a couple of suggestions, we provided a feature whereby one could club
files into groups and associate the same set of meta-data with all files within that
group. This process still required the user to fill out each of the meta-data fields
and associate them with the files. The next requirement was the ability to copy
pre-existing meta-data, so that the user could copy the meta-data and modify only the
elements that needed to change for that data-set. Due to lack of time, we provided a
stopgap feature in which the most recent copy of the meta-data is stored and a user can
choose to associate that copy with the files.
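The idea of copying existing meta-data and changing only a few elements can be sketched in Ruby as a hash merge. The template bank, the field names and the apply_template method below are hypothetical and do not describe REMI's implementation.

```ruby
# Hypothetical template bank; the field names are illustrative only.
TEMPLATES = {
  "rat_spect_default" => {
    modality: "SPECT",
    data_type: "float32",
    description: "Rat cardiac study"
  }
}.freeze

# Copy a template onto each file in a group, changing only the fields that
# differ for this data-set. Each file receives its own copy of the fields.
def apply_template(file_names, template_name, overrides = {})
  template = TEMPLATES.fetch(template_name)
  file_names.to_h { |name| [name, template.merge(overrides)] }
end

meta = apply_template(["scan_01.dat", "scan_02.dat"], "rat_spect_default",
                      { description: "Rat cardiac study, data set B" })
meta.each { |name, fields| puts "#{name}: #{fields}" }
```

Because the override hash is merged over the template, the contributor only supplies the fields that change, and the stored template itself is left untouched.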
7.2 Future Enhancements
All of the weaknesses mentioned above need to be addressed in due course, and they
form the basis of our list of future enhancements.
• Given that REMI needs to capture a large amount of meta-data, and that this meta-data
is mostly semi-structured, REMI must be able to handle semi-structured data in a more
reliable and scalable way. Currently we have identified certain elements of the
meta-data that we need to store and have defined a column for each of these elements.
This approach is very restrictive and does not address the dynamic nature of
semi-structured data. Moving forward, we need to look at how best to represent and
store meta-data within a database. XML files are a representation of semi-structured
data, and most database vendors have realized the importance of capturing and storing
semi-structured data. Different database vendors have taken different approaches to the
problem of storing semi-structured data within databases. A few databases are still
relational in nature but provide the facility of storing an entire XML document within
a row of a table. Oracle introduced a new datatype, XMLType, to facilitate native
handling of XML data in the database. The database management system then provides an
API to query and manipulate these XML documents. We similarly need to investigate
whether MySQL has capabilities to handle and store XML documents.
• Resuming file uploads and downloads is another major enhancement that is critical to
the success of REMI. For uploads, we already split the files into chunks and upload
them, which lays the foundation for a resume feature. One just needs to check the size
of the incomplete file that currently resides on the server and then tell the browser
to start uploading the data past that size. For downloads we currently do not have a
strategy in place, but the literature suggests that resuming file downloads is possible
using the HTTP Byte Range Retrieval Extensions[28]. To provide a resume feature for
downloads, we need to take a detailed look at our current download process and our
compression strategy on the server. Currently all files selected for download are
compressed into a temporary file on the server, and this compressed file is then served
to the browser. Since the file served is a temporary compressed file, careful thought
needs to be given to how this ties into resuming file downloads.
• During our live demonstrations of REMI to the various stakeholders, we noticed that
the meta-data being entered for many of the experiments was the same. This was because
many of the experiments were conducted in the same controlled environment with the same
set of parameters. This observation brings out two important needs. One is the ability
to copy meta-data over, either from other experiments or from a meta-data template
bank. We have not yet given this much thought, but it may prove necessary down the
line. Our current implementation just provides a method of copying over the last filled
meta-data.
• While dealing with the meta-data issue, the need to provide more personalized
features became evident. Personalization settings or views such as saved search
criteria, meta-data templates and owned experiments make it easier for researchers to
interact with REMI and to focus on the studies that they have uploaded into REMI.
Currently REMI does not provide any data personalization capabilities.
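For the download side of the resume enhancement, byte-range retrieval in HTTP/1.1 lets a client ask for the part of a file it is still missing with a request header such as "Range: bytes=500-". The Ruby sketch below shows one way a server could interpret a single-range header; it is not REMI code, the method names are hypothetical, and it assumes that the temporary compressed file is still present and unchanged on the server when the client resumes.

```ruby
# Parse a single-range "Range" header value such as "bytes=500-",
# "bytes=500-999" or "bytes=-200" against a file of the given size.
# Returns [first, last] as inclusive byte positions, or nil when the header
# is absent or cannot be satisfied.
def parse_byte_range(header, file_size)
  match = /\Abytes=(\d*)-(\d*)\z/.match(header.to_s)
  return nil unless match

  first_text, last_text = match[1], match[2]
  return nil if first_text.empty? && last_text.empty?

  if first_text.empty?
    # Suffix form "bytes=-N": the final N bytes of the file.
    length = last_text.to_i
    return nil if length.zero?
    first = [file_size - length, 0].max
    last = file_size - 1
  else
    first = first_text.to_i
    last = last_text.empty? ? file_size - 1 : [last_text.to_i, file_size - 1].min
  end
  return nil if first >= file_size || first > last

  [first, last]
end

# Value of the Content-Range header that accompanies a partial response.
def content_range(first, last, file_size)
  "bytes #{first}-#{last}/#{file_size}"
end

range = parse_byte_range("bytes=500-", 1000)
puts content_range(range[0], range[1], 1000)
```

If the temporary archive is deleted or rebuilt between the two requests, the byte offsets no longer refer to the same content, which is the interaction with the compression strategy that the resume item above points out.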
Chapter 8
Conclusion
The work related to this thesis started with two simple problems. The first problem
that the nuclear imaging group from the Life Sciences Division, Lawrence Berkeley
National Laboratory, faced was the scattering of experimental data. Over the years, the
nuclear imaging group has conducted a number of experiments and has collected tens of
terabytes of raw and processed data. This data resided either on individual researchers'
local machines or in individual locations on a file-server. The data was also organized
according to each researcher's preference. Having experimental data scattered across
individual local machines and organized to each researcher's liking made it hard for
other researchers to get access to the data. It was even harder for researchers to
understand the data, given that they first needed to understand how the data was being
stored. It also became difficult to track all of this data, since it was stored in
different locations.
The second problem was driven by the need to make experimental data available to
researchers belonging to other scientific research laboratories. The nuclear imaging
group collaborates with researchers from other organizations located in different parts
of the world. The need was to provide these researchers with access to all the
experiment data that is captured by the nuclear imaging group. For the same reasons as
mentioned in the first problem, it was hard to share this data with other researchers.
To address these two problems, we identified the need for a scientific data
management system that would help consolidate all of the experiment data and provide a
way to query for experimental data. We also needed a consistent way of managing and
storing experiment information. A data model had to be designed that best captured all
of the experiment information along with other related information. The data management
system needed to be accessible not only to members of the nuclear imaging group within
Lawrence Berkeley National Laboratory, but also to researchers in other research
organizations located in different parts of the world.
We developed REMI, a web-based scientific data management system, to address the above
problems. We started by understanding the experiment data that was being captured and
collected by the nuclear imaging group. The group mainly conducts SPECT, DTMRI, and
Micro-PET cardiac studies. It was necessary to identify the differences in the
experimental methodologies and the data generated for each of these studies. We also
identified the various objects that participated within a particular experiment study.
All of the above information helped us to understand the operations that REMI is to
perform and the design of the data model needed to store all of this data. REMI was
also developed with the intention of making all of the experiment data publicly
available to anyone. Uploading new experiment data, however, requires valid
credentials. While developing REMI, there were a number of challenges that we needed to
overcome. From uploading the large data files in parts to providing features that help
researchers copy meta-data templates, each of them needed to be addressed in a timely
manner to make the project a success. In this thesis, we reported the design
considerations behind the REMI database, the problems encountered, and their solutions.
Although there are a number of different databases available to manage research data
within a specific research area, REMI is the first scientific data management system
developed to address experimental data generated by cardiac nuclear imaging studies.
REMI manages a variety of data files, including raw medical imaging data files, machine
calibration data files, and image reconstruction data files. REMI makes all of this data
available to researchers in different parts of the world through an easy-to-use web
interface. REMI is also built on top of a data model that can be extended to handle
most scientific experimental data. Through the use of tags, categories and category
values, REMI can be adapted to manage experimental data generated by studies in other
similar scientific research domains.
As of today, REMI is running in a production environment and is being used to store
and share experiment data with different researchers. REMI currently stores 8105 files
and 175 experiments. That said, there are a number of weaknesses present in the REMI
database. REMI does not store all of the semi-structured data generated within each
experiment; it stores a small subset of that data, identified as necessary by
researchers. Filling in meta-data is also a tedious process, given that researchers
need to enter all of the meta-data manually. A good percentage of the meta-data is
present in various header files, and it would be useful if REMI could fetch this
meta-data from the header files themselves. We plan to address these weaknesses in our
future work. We also expect the REMI database to mature as REMI gets utilized over time
and as more researchers begin to interact with REMI.
Bibliography
[1] Mayo Clinic, Analyze Software, Retrieved from http://medical.nema.org/, Accessed in 2011.
[2] Cucumber, http://cukes.info/, Accessed in 2011.
[3] Dicom: Digital imaging and communications in medicine, http://medical.nema.org/.
[4] Emory cardiac toolbox, http://www.medx-inc.com/emory_toolbox.html, Accessed in 2011.
[5] Html5: User interaction: Drag and drop, http://dev.w3.org/html5/spec/dnd.html,
Accessed in 2011.
[6] Java 2 platform, enterprise edition (j2ee), http://java.sun.com/j2ee/overview.html,
Accessed in 2011.
[7] Learn rest: A tutorial, http://rest.elkstein.org/, Accessed in 2011.
[8] List-of-values components, Oracle Fusion Middleware Web User Interface
Developer's Guide for Oracle Application Development Framework.
[9] Qps: Quantitative perfusion spect, http://www.csaim.com/qps.php,
Accessed in 2011.
[10] Rspec: A behavior-driven development tool for ruby programmers, http://rspec.info/,
Accessed in 2011.
[11] Web services description language, http://www.w3.org/TR/wsdl,
Accessed in 2011.
[12] The webkit open source project, http://www.webkit.org/, Accessed in 2011.
[13] Wireshark, http://www.wireshark.org/, Accessed in 2011.
[14] Xmlhttprequest: W3c candidate recommendation, August 2010,
http://www.w3.org/TR/XMLHttpRequest/, Accessed in 2011.
[15] File api: W3c working draft, October 2011,
http://dev.w3.org/2006/webapi/FileAPI/, Accessed in 2011.
[16] Html5, June 2011, http://dev.w3.org/html5/spec/Overview.html, Accessed in 2011.
[17] Phusion passenger, December 2011, http://www.modrails.com/,
Accessed in 2011.
[18] Soap version 1.2, October 2011, http://www.w3.org/TR/soap/, Accessed
in 2011.
[19] The websocket api, November 2011, http://dev.w3.org/html5/websockets/,
Accessed in 2011.
[20] Todd-Pokropek A, Cradduck TD, and Deconinck F., A file format for the exchange
of nuclear medicine image data: a specification of interfile version 3.3, Nuclear
Medicine Communications (1992), 673–699.
[21] Steve Androulakis, Ashley M Buckle, Ian Atkinson, David Groenewegen, Nick
Nicholas, Andrew Treloar, and Anthony Beitz, Archer e-research tools for
research data management, The International Journal of Digital Curation 4 (2009),
no. 1.
[22] T. Berners-Lee, L. Masinter, and M. McCahill, Uniform resource locators (url),
December 1994, http://www.ietf.org/rfc/rfc1738.txt, Accessed in 2011.
[23] Andrew D. Birrell and Bruce Jay Nelson, Implementing remote procedure calls,
ACM Transactions on Computer Systems (1984), 39–59.
[24] Scott Chacon, Git, Retrieved from http://git-scm.com/.
[25] David Chelimsky, Dave Astels, Bryan Helmkamp, Dan North, Zach Dennis, and
Aslak Hellesoy, The rspec book, Pragmatic Bookshelf, December 2010.
[26] Robert H. Choplin, Johannes M. Boehme II, and C. Douglas Maynard, Picture
archiving and communication systems: an overview, Radiographics (1992).
[27] E.P. Ficaro, B.C. Lee, J.N. Kritzman, and J.R. Corbett, Corridor4dm: The
michigan method for quantitative nuclear cardiology, Proc. Am. Society Of
Nuclear Cardiology (2007), 455–465.
[28] R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, L. Masinter, P. Leach, and
T. Berners-Lee, Hypertext transfer protocol - http/1.1, June 1999,
http://tools.ietf.org/pdf/rfc2616.pdf.
[29] Roy T. Fielding and Richard N. Taylor, Principled design of the modern web
architecture, ACM Transactions on Internet Technology (2002), 115–150.
[30] Carissa G. Fonseca, Michael Backhaus, David A. Bluemke, Randall D. Britten,
Jae Do Chung, Brett R. Cowan, Ivo D. Dinov, J. Paul Finn, Peter J. Hunter,
Alan H. Kadish, Daniel C. Lee, Joao A. C. Lima, Kalyanam Shivkumar, Pau
Medrano-Gracia, Avan Suinesiaputra, Wenchao Tao, and Alistair A. Young,
The cardiac atlas project - an imaging database for computational modeling and
statistical atlases of the heart, Bioinformatics (2011).
[31] Daniele Gianni, Steve McKeever, and Nic Smith, euheartdb: A web enabled
database for geometrical models of the heart, Functional Imaging and Modeling
of the Heart 5528 (2009), no. 44, 407–416.
[32] Jim Gray, David T. Liu, Maria Nieto-Santisteban, Alex Szalay, David J. DeWitt,
and Gerd Heber, Scientific data management in the coming decade, Cyber
Technology Watch (2005).
[33] David Heinemeier Hansson, Ruby on rails, Retrieved from http://rubyonrails.org/.
[34] Esperanza Marcos, César J. Acuña, Belén Vela, José M. Cavero, and Juan A.
Hernández, A database for medical image management, Computer Methods and
Programs in Biomedicine (2007), 255–269.
[35] Brian Matthews, Shoaib Sufi, Damian Flannery, Laurent Lerusse, Tom Griffin,
Michael Gleaves, and Kerstin Kleese, Using a core scientific metadata model in
large-scale facilities, The International Journal of Digital Curation 5 (2010).
[36] Dan North, Introducing bdd, March 2006, http://dannorth.net/introducing-bdd/,
Accessed in 2011.
[37] David A. Patterson, Garth Gibson, and Randy H Katz, A case for redundant arrays
of inexpensive disks (raid), ACM SIGMOD 17 (1988), no. 3.
[38] Trygve Reenskaug, Model-view-controller (mvc) architecture, (1979), Retrieved
from http://heim.ifi.uio.no/~trygver/themes/mvc/mvc-index.html.
[39] Mario Valle, Scientific data management,
http://personal.cscs.ch/~mvalle/sdm/scientific-data-management.html.
[40] Gottfried Vossen and Mathias Weske, The wasa approach to workflow
management for scientific applications, NATO ASI Series 164 (1998), 145–164.
[41] Hugh E. Williams, Managing images with a web database application, Tech.
report, May 2002, Published on OnLamp.com, Accessed in 2011.
Appendix A
User Documentation
A.1 Public User
A.1.1 Downloading Experiment Data
1. When a user accesses the REMI website, they are taken to the REMI main page.
Figure A.1 shows the main page, which presents three menus from which the user can
choose how to search for experiments.
Figure A.1
2. The modality menu, as shown in figure A.2, lists all the modalities within
REMI. Choosing one of the modalities brings up a list of all the experiments that
belong to that modality.
Figure A.2
3. Figure A.3 shows a list of all DTMRI experiments stored in REMI. To view and
download the files associated with a particular experiment, click the green 'expand'
icon. The user then selects the files they want to download and clicks the Download
Files button.
Figure A.3
4. The download page provides a set of search fields that the user can use to search
for experiments. Figure A.4 shows a user searching for the experiment Rat Heart
SHR8 data set B.
Figure A.4
5. One can also use a text-based search to search for experiments across the parameters
shown in the table of experiments. In figure A.5 the user searches for all DTMRI
experiments that use WKY subjects.
Figure A.5
A.2 Login into REMI
1. Users that need to contribute to the REMI system need to log into the system.
Present at the bottom left corner is a log-in button indicated by the red arrow in
figure A.6. Clicking the button redirects the user to the log-in page.
Figure A.6
2. A unique username is assigned to every user of the system. To log into the REMI
system a user enters his username and password and hits the log in button as
shown in figure A.7.
Figure A.7
A.3 Data Authentication
Although we have defined a role called Data Authenticator and have defined the actions
one can perform as a data authenticator, the nuclear imaging group at the Life Sciences
Division, Lawrence Berkeley National Laboratory, has not yet decided whether to
support this policy.
A.4 Contributor
A.4.1 New Subject Work-flow
1. To add a new subject to the system you need to first click the contribute link at the
top as shown in figure A.8.
Figure A.8
2. Click the Subjects menu item, as shown in figure A.9.
Figure A.9
3. Figure A.10 shows a list of all existing subjects within REMI. To add a new subject
click the 'Add New Subject' button.
Figure A.10
4. Once on the subject detail page shown in Figure A.11, enter all the required
information regarding the subject. Labels marked in red with an asterisk are
mandatory fields. Press the 'Create Subject' button to save the subject.
Figure A.11
5. If all goes well and the subject is saved to REMI, you should see a success message
as shown in Figure A.12. The success message disappears after a while, leaving you
on the details page for any further modification if needed. If for any reason the
subject could not be saved to REMI, an appropriate error message is displayed
along with the reason the record could not be saved.
Figure A.12
6. Click the 'Back' button to return to the Subjects index page, shown in figure A.13.
Figure A.13
7. As mentioned before, the Subjects index page shows a listing of all the subjects
available in REMI. The newly created subject should be available in the list, and
we can search for that subject in a number of ways. Figure A.14 marks four numbered
items used to work with the list.
(a) 1 in the figure determines the number of records one sees on the page.
(b) 2 shows a text box that is used to filter records. Searching is done across all
columns.
(c) 3 helps to navigate the list. You can use the arrow buttons to move one
page at a time forwards or backwards. The double-arrowed buttons take you
to the beginning or the end of the list, and the numbered buttons
take you to that specific page in the list.
(d) 4 gives a summary of the number of records in the list.
Figure A.14
8. We search for all records containing ’ftha’ and the list updates depending on our
search criteria. Notice that figure A.15 shows the new subject we just added is also
displayed along with an updated summary of the number of records the filtered
list contains.
Figure A.15
A.4.2 New Experiment Work-flow
1. As a contributor you can create new experiments by first clicking on the contribute
link as shown in figure A.16.
Figure A.16
2. Clicking the contribute link takes you to the experiment listing page. This page
displays all the experiments within REMI. To add a new experiment click the ’Add
New Experiment’ button as shown in figure A.17.
Figure A.17
3. The system starts off with a bare-bones template of the experiment. On saving the
template, the appropriate experiment is created based on the modality chosen, with the
additional information that is associated with that modality. Fields in red with an
asterisk are mandatory. The 'Researcher name' field is a lookup text box. Typing in the
first couple of letters of the researcher's user name brings up a drop-down list of the
researchers that match the characters entered in that field. Figure A.18 shows a
snapshot of the experiment page.
Figure A.18
4. Saving the template created in the previous step takes the user to the edit page,
which displays all the details entered by the user when creating the experiment.
In addition, depending on the modality, the respective tabs appear at the top. Each
tab encapsulates a certain amount of related information that needs to be captured as
part of the experiment. Six actions, shown in figure A.19 and listed below, can be
performed from this page.
(a) Subjects
In most cases experiments are carried out on one or more subjects, although an
experiment may also have no subject associated with it. The Subjects tab
allows the user to associate subjects with the experiment.
(b) Tracers
Like subjects, tracers may or may not have been used within an
experiment. The Tracers tab allows the user to associate tracers with the
experiment. A tracer is a pharmaceutical that makes it possible to image
the body based on cellular function and physiology.
(c) Detectors
If the experiment has any detector information that needs to be
captured, the corresponding fields are located under the Detectors tab.
(d) Files
To associate files with the experiment, as well as to enter meta-data related
to the files, click the 'Files' tab.
(e) Update Experiment
The ’Update Experiment’ button updates the experiment with the
information that is filled on the form. At any point during the process of
filling the various sections, one can hit the ’Update Experiment’ button and
that information added by the user will get stored to REMI.
(f) Back
Back takes the user back to the experiment listings page.
Figure A.19
5. Clicking the Subjects tab takes the user to the subject page from where the user
can associate the subjects to the experiment. To add a subject to the experiment
one needs to click the ’Associate New Subject’ button. See figure A.20.
Figure A.20
6. Clicking the 'Associate New Subject' button brings up a list of all the subjects
within the REMI system as shown in figure A.21. Four actions can be taken from
here, namely:
(a) 1 shows the search box. The search box can be used to filter the subjects
based on a certain condition.
(b) 2 Check mark the subjects that need to be associated with the experiment.
(c) 3 Hit the 'Add Subject to Experiment' button once the subjects have been
selected to associate those subjects with the experiment.
(d) 4 Go back without associating any subjects with the experiment.
Figure A.21
7. The selected subjects are then added to the experiment. NOTE: The associations
are not saved until the 'Update Experiment' button is clicked, telling the system
to save the association to REMI. Figure A.22 shows the two subjects associated with
the experiment. Additional information concerning the individual subjects can be
entered at the bottom. 1 indicates the header that, when clicked, expands the section
for the corresponding subject and collapses the other sections.
Figure A.22
8. Clicking the Tracers tab takes the user to the tracer page from where the user can
associate the tracers to the experiment. To add a tracer to the experiment one
needs to click the ’Associate New Tracer’ button. See figure A.23.
Figure A.23
9. Clicking the 'Associate New Tracer' button brings up a list of all the tracers within
the REMI system as shown in figure A.24. Three actions can be taken from here,
namely:
(a) 1 Check mark the tracers that need to be associated with the experiment.
(b) 2 Hit the 'Add Tracer to Experiment' button once the tracers have been
selected to associate those tracers with the experiment.
(c) 3 Go back without associating any tracers with the experiment.
Figure A.24
10. The selected tracers are then added to the experiment. NOTE: The associations
are not saved until the ’Update Experiment’ button is clicked, telling the system
to save the association to REMI. Figure A.25 shows the two tracers associated to
the experiment.
Figure A.25
11. Clicking the Files tab lets you associate files with the experiment. Figure A.26
shows the file listing page. Files can be associated with the experiment in two ways.
One way is to click the browse button and select one or more files. The other is to
select the files and drag and drop them onto the 'Drop new files here' section.
Figure A.26
12. Once the files appear on the page as shown in figure A.27 associated meta-data is
added to it by selecting the corresponding file using the check box and then hitting
the ’Edit Meta Data’ button. A single file or multiple files can be selected at the
same time.
Figure A.27
13. Four categories of meta-data can be associated to the files selected namely file
details, subject information, acquisition information and voxel information. File
details containing file information like description and data type can be entered
here. Refer to figure A.28.
Figure A.28
14. Depending on the subjects added to the experiment, files may also be associated
to a specific subject rather than an experiment. One can choose which subject the
current selected files belong to from here. Refer to figure A.29.
Figure A.29
15. Various other parameters that are captured during data acquisition should be
entered under the Acquisition Information tab.
Figure A.30
16. The Voxel Information tab captures voxel information in case the files selected
have any voxel data associated with them.
Figure A.31
A.5 System Administrator
A.5.1 Researchers
1. To add a new researcher, a system administrator needs to click the System
Administration link. Figure A.32 shows the same.
Figure A.32
2. The user is then taken to the page containing a list of all users in REMI. To create
a new user, one needs to click the New User link as shown in figure A.33.
Figure A.33
3. The system administrator then enters the details of the new user on the user details
page. All fields marked in red are mandatory, and trying to save the details with any
of the mandatory details missing produces an appropriate error message as shown in
figure A.34.
Figure A.34
4. Figure A.35 shows the user details being successfully saved to the REMI database.
Figure A.35
5. Going back to the list of all researchers, one can search for the newly created user
using the search box, as shown in figure A.36.
Figure A.36
6. Figure A.37 shows the newly created user logging into REMI.
Figure A.37
7. The testUser successfully logs into REMI and as defined by the Contributor role,
which was assigned to testUser, they can access that menu item. This behavior is
shown in Figure A.38.
Figure A.38
A.5.2 Roles
1. To add a new role to REMI, a system administrator first needs to click the
System Administration link. This is shown in figure A.39.
Figure A.39
2. To add a new role, the system administrator then needs to click the New Role
button present on the Roles page. The roles page shows a listing of all roles
defined on REMI. Figure A.40 shows the user interface for the same.
Figure A.40
3. The system administrator then enters the role name and a description for that role
and hits the Create Role button as shown in figure A.41.
Figure A.41
4. The system administrator can also search for the new role from the roles listing
page shown in figure A.42.
Figure A.42
5. In addition to adding the role from the user interface, the system administrator
also needs to define the abilities of that role. The abilities of a role are defined in
the system file called ability.rb, which can be found under the app/models folder.
Figure A.43 shows the details of the ability file. A new condition block needs to
be added to the ability file to assign abilities to the new role.
Figure A.43
The following code shows a template for the new role condition to be added to the
ability.rb file.
class Ability
  include CanCan::Ability

  def initialize(user)
    # create a user if the user variable is
    # not already defined
    #
    # the user variable would not be defined
    # if the user is not logged into the system
    #
    # if a user logs into REMI the user variable
    # passed to the method will be defined.
    user ||= User.new

    # First we set that the user should not be able
    # to access any methods defined on the controllers.
    cannot :manage, :all

    # Code for contributor ... see the file
    # for details
    #
    # Code for system administrator ...
    # see the file for details

    # if the user is a <new role name> then
    # allow them to manage <list of controllers>
    if user.is_a_<new role name>?
      can :manage, [<list of controllers>]
    end
  end
end
A.5.3 Categories and Category Values
1. To add a new category or category value, a system administrator needs to go to the
system administration section within REMI, and click the Category Menu. This
brings up a listing of all the categories and one can add new categories too. Figure
A.44 shows the list of all categories and how to add a new category.
Figure A.44
2. The system administrator enters the category name and saves the category by clicking
the Create Category button. Figure A.45 also shows the list of category values associated
with the new category. This list is empty, as expected, because this is a new category.
Figure A.45
3. The system administrator can then search for the new category from the list of
categories page as shown in figure A.46. To add category values to this new
category, they can click the name of the category to take them to the details page
again.
Figure A.46
4. Figure A.47 shows the details page of the category testCategory. The system
administrator could change the name of the category or could go ahead and add a
list of values for this category.
Figure A.47
5. The system administrator then adds a category value name and saves it, as shown
in figure A.48.
Figure A.48
6. The new category value then appears on the category detail page. Refer to figure
A.49.
Figure A.49
7. The system administrator can also add a value to an existing category in
REMI. Figure A.50 shows the system administrator searching for the Species
category.
Figure A.50
8. Once at the details page of the Species category, the system administrator clicks
the Add New Category Value button shown in figure A.51.
Figure A.51
9. A new value, "New Animal", is added to the Species category, as shown in figure
A.52.
Figure A.52
10. The new value should appear in all places where the Species category is referenced.
Figure A.53 shows that the new value New Animal now appears in the menu listing
under Species.
Figure A.53
11. Also the new value appears in the dropdown on the Subject detail page as shown
in figure A.54.
Figure A.54
Appendix B
Installation Documentation
B.1 Version Control within REMI
For version control we use Git [24], a free, open source, distributed version control
system. Git supports two kinds of repositories: "bare" repositories, which contain only
the version history and have no working tree, and regular (non-bare) repositories, which
also contain a checked-out working copy. In the ideal world of distributed revision
control there is no central repository; people simply pull from whomever they want
changes from, and no pushes exist. In practice, a regular repository is meant to be
pulled into, and by default Git refuses a push to the branch that is currently checked
out in it. A shared repository that developers clone, pull from, and push to should
therefore be a "bare" repository.
Within our REMI development environment, REMI's "bare" repository resides
on tasman. All developers need to pull from and push to this "bare" repository.
[email protected]:/Users/antall/git_repositories/ror is the remote URL of the "bare"
repository.
B.2 Creating a development environment
The following steps set up a development environment. Please note that these
steps describe setting up the development environment on an Ubuntu machine and
should be applicable to any operating system that is a derivative of Ubuntu.
1. Installing MySQL
Depending on the operating system download a copy of MySQL 5.1 database and
install it on the machine.
2. Install additional tools and libraries
Additional tools such as curl and git are prerequisites for installing the Ruby
Version Manager (RVM). libxml2-dev and libxslt1-dev are required for nokogiri,
which is installed later as a gem. libmysqlclient16-dev is also
needed to build the mysql2 driver, which communicates with the MySQL database.
$ sudo apt-get install curl git libxml2-dev libxslt1-dev libmysqlclient16-dev
3. Install RVM.
The use of the Ruby Version Manager (RVM) to handle the ruby
execution engine and rubygems is strongly encouraged. RVM helps
manage different ruby interpreters and gems in an isolated manner. This makes
switching between interpreters easy, and we do not run the risk of accidentally
updating gems that other projects depend on. The complete installation
document can be found on the RVM website.
4. Install additional packages for RVM
A few additional packages that ruby needs when it installs the various
gems have to be installed. For additional information regarding each of these
packages, please refer to the packages page in the documentation section
of the RVM website.
$ rvm pkg install zlib
$ rvm pkg install openssl
$ rvm pkg install readline
5. Installing ruby
For development purposes we use the MRI ruby interpreter. A complete list of all
ruby interpreters that RVM supports can be displayed with the command
$ rvm list known
We currently support the MRI ruby interpreter, version 1.8.7. We install the
interpreter by passing it the locations of the packages that were installed for RVM
in the previous step.
$ rvm install ruby-1.8.7 --with-zlib-dir=$rvm_path/usr \
  --with-readline-dir=$rvm_path/usr --with-openssl-dir=$rvm_path/usr
6. Creating a Gemset and .rvmrc file
RVM provides the facility of creating self-contained, compartmentalized,
independent ruby setups. This is achieved by creating gemsets and then installing
the necessary gems within each gemset. A gemset can then be associated with a
single ruby project. Although one gemset per project is encouraged, it
is still possible to associate a number of projects with the same gemset. Keeping
one gemset per project avoids the situation where updating the gems in a
gemset for one project breaks the dependencies of the other projects
that share that gemset.
To create a gemset, the version of ruby needs to be selected first. We then create
the gemset by providing a gemset name. Below we create a gemset called remi.
$ rvm use 1.8.7
$ rvm gemset create remi
7. Create location for the codebase
Create a location to house the codebase. We also want to associate the ruby engine
and the gemset to this location. We can specify the ruby engine and the gemset
within a .rvmrc file which should be placed in the directory that we just created.
Now every time we change directory to this location or any of its subdirectories,
the .rvmrc file is run and the corresponding ruby engine and gemset are loaded.
$ mkdir remi
$ cd remi
$ echo "rvm use 1.8.7@remi" > .rvmrc
$ source .rvmrc
8. Fetching the source code
Having created the directory for the source code, we need to set up a git repository
by initializing it first. We then create an alias for the remote repository; in the
following example we have named it remi. Once the alias is set up, we pull the
source code from this location.
$ git init
$ git remote add remi [email protected]:/Users/antall/git_repositories/ror
$ git pull remi master
9. Installing the gems
After the source code has been copied all the gems associated with the project
have to be installed. The REMI project makes use of the bundler gem that takes
care of installing all the other gems present inside the Gemfile. So first we install
the bundler gem and then ask bundler to install all of the other gems.
$ gem install bundler
$ bundle install
10. Creating and loading the database
To create the development database with the defined tables and initial data, the
following commands need to be executed.
$ rake db:create
$ rake db:migrate
$ rake db:seed
11. Starting the WEBrick application server
Once all the gems have been installed and the database has been created and
loaded we can start the application server. Once the application server has started,
http://localhost:3000 should take you to REMI’s main page.
$ rails s
The steps above set up a development environment on
a Linux (Ubuntu) machine.
B.3 List of Used Software Tools
The table below contains a list of all the software tools and components that have been
used to develop REMI.
Table B.1: Used Software Summary.
Name | Description | Version
Activerecord | ORM implementation | 3.0.9
Apache | Web Server | 2.2.17
Authlogic | Simple model based ruby authentication solution | 3.0.3
CanCan | Authorization Gem for Ruby on Rails | 1.6.7
Capybara | Acceptance test framework for web applications | 1.1.1
Cucumber | BDD framework | 0.10.2
Factory_girl_rails | Rails integration for factory_girl | 1.2.0
Formtastic | A Rails form builder plugin with semantically rich and accessible markup | 2.0.2
Guard-cucumber | Guard::Cucumber automatically runs your features | 0.7.2
Guard-rspec | Guard::RSpec automatically runs your specs | 0.5.0
JSON | JavaScript Object Notation | 1.6.1
jQuery | JavaScript Library | 1.6.4
MySQL | Database | 5.1
Mysql2 | A modern, simple and very fast Mysql library for Ruby - binding to libmysql | 0.2.6
Net-scp | Pure-Ruby implementation of the SCP protocol | 1.0.4
Net-ssh | Library for programmatically tunneling connections to servers | 2.2.1
Passenger | Deployment of Ruby web applications on Apache | 3.0.9
Rails | Web framework | 3.0.9
Rails3-jquery-autocomplete | An easy and unobtrusive way to use jQuery's autocomplete with Rails 3 | 0.7.3
Rspec-rails | Rspec-2 for Rails-3 | 2.6.1
Ruby | A dynamic, open source programming language | 1.8.7
Rubyzip | A ruby library for reading and writing zip files. | 0.9.4
Smartgit | A GUI for git repositories | any version
Appendix C
Maintenance Documentation
C.1 MVC
MVC (Model-View-Controller) is the concept of encapsulating data together with its
processing (the model) and isolating it from the manipulation (the controller) and
presentation (the view) that take place in a user interface.
• Models
Models represent knowledge. A model could be a single object, or it could be
some structure of objects. There should be a one-to-one correspondence between
the model and the represented world. For example, ExpFile is a model within
REMI that maps to the real world entity File. Similarly the User model maps
to the individual researcher/user of the REMI database. Relational tables are
represented within the models.
• Views
A view is a (visual) representation of its model. It ordinarily highlights certain
attributes of the model and suppresses others. It thus acts as a presentation filter.
A view is attached to its model and gets the data necessary for the presentation
from the model by looking up attributes of the model. It may also update the
model by sending appropriate data back to the model. The view needs to know
the semantics of the attributes of the model it represents.
• Controllers
A controller is the link between a user and the system. It provides the user with
input by arranging for relevant views to present themselves in appropriate places
on the screen. It provides means for user output by presenting the user with menus
or other means of giving commands and data. The controller receives such user
output, translates it into the appropriate instructions, and passes these instructions
on to one or more of the views. For example, the UserController accepts the
GET/POST requests from the client browser and accordingly passes on the request
to the associated model or fetches a particular view.
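The division of responsibilities described above can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions: ResearcherModel, ResearcherView, and ResearcherController are hypothetical classes that are not part of REMI, and the request is modeled as a simple hash rather than a real GET/POST request.

```ruby
# Plain-Ruby sketch of the MVC roles described above.
# All class names are hypothetical and unrelated to REMI's actual code.

# Model: holds the data and the rules for changing it.
class ResearcherModel
  attr_reader :name, :institution

  def initialize(name, institution)
    @name = name
    @institution = institution
  end

  def rename(new_name)
    raise ArgumentError, 'name must not be empty' if new_name.to_s.strip.empty?
    @name = new_name
  end
end

# View: presents selected attributes of the model.
class ResearcherView
  def render(model)
    "Researcher: #{model.name} (#{model.institution})"
  end
end

# Controller: turns a user request into model updates and picks a view.
class ResearcherController
  def initialize(model, view)
    @model = model
    @view = view
  end

  def handle(request)
    @model.rename(request[:name]) if request[:action] == :update
    @view.render(@model)
  end
end

controller = ResearcherController.new(
  ResearcherModel.new('testUser', 'Example Lab'),
  ResearcherView.new
)
puts controller.handle(:action => :show)
puts controller.handle(:action => :update, :name => 'renamedUser')
```

The view never changes the model directly in this sketch; every change goes through the controller, which keeps the validation rule in one place (the model).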
C.2 Project Layout of REMI
Figure C.1: Project Layout of REMI
Figure C.1 shows the project layout of REMI. The entire REMI application resides
under the app folder. Following the MVC pattern, the app folder contains a sub-folder
for each of the Models, Views, and Controllers components. The Models directory
contains the classes that are used to store and manipulate the state of objects
stored in the REMI database. The user interface code, such as HTML and ERB templates,
needed to render the model to the user is stored under the Views directory. ERB
provides an easy to use but powerful templating system for Ruby. Using ERB, actual
Ruby code can be embedded in any plain text document to generate
content or to control flow. The Controllers directory contains
all the controllers of the application. A controller interprets the user's input, decides
how the model needs to change as a result of that input, and selects the resulting view
to be used.
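As a small illustration of how ERB mixes Ruby code with plain text, the following sketch renders a list with the erb library from the Ruby standard library. The template text, the constant CATEGORY_LIST_TEMPLATE, and the method render_category_list are hypothetical examples and are not taken from REMI's views.

```ruby
require 'erb'

# <% %> runs Ruby code; <%= %> inserts the value of an expression.
# The template and the names below are hypothetical examples.
CATEGORY_LIST_TEMPLATE = <<-TEMPLATE
<ul>
<% categories.each do |name| %>
  <li><%= name %></li>
<% end %>
</ul>
TEMPLATE

def render_category_list(categories)
  # `binding` exposes the local variable `categories` to the template.
  ERB.new(CATEGORY_LIST_TEMPLATE).result(binding)
end

puts render_category_list(['Species', 'Gender'])
```

In a Rails view the template lives in its own .html.erb file and the data is supplied by the controller, but the two tag forms behave the same way.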
C.3 Adding a new Entity
This section shows how to add a new entity to REMI.
1. To create a new entity in REMI, we start by using the scaffold generator to create
the template code. The syntax of the scaffold generator is
$ rails g scaffold <entity_name> [<column_name>:<datatype>]*
As shown in figure C.2, the scaffold generator generates the migration file, the
model file, the controller file and the various view files under the app folder. The
scaffold generator also creates additional files for testing and a helper file for
additional helper logic.
Figure C.2: Using the scaffold generator to generate a template
2. Figure C.3 shows the contents of the newly created migration file. The migration
file defines a class containing two methods, up and down. The up method creates
the table in the database, and the down method rolls back the changes made by
the migration file.
Figure C.3: Contents of the migration file
3. The scaffold generator also updates the routes.rb file with an entry for the
new_entities entity. The routes.rb file lists the RESTful routes
permitted for each entity. Figure C.4 shows this entry.
Figure C.4: Entry made by the scaffold generator in the routes.rb file
4. We need to run the migrate command to update the database with the new table
that is defined in the migration file. To run the migration script, the command is
$ rake db:migrate
To roll back the changes, the command is
$ rake db:rollback
Figure C.5 shows the output on the console confirming that the table is created in
the database.
Figure C.5: Executing the migrate command
5. Having performed all of the above steps, the new entity object is now accessible
via the web interface through the REST API. The listings page is shown in figure
C.6.
Figure C.6: One can access the new entity objects via the web browser
6. Now that we have access to the entity via the REST API, we need to be able to
access the entities via the REMI menu. An entry for the new entity needs to be
made in the menu.json file. The file currently has two menu configurations: first
the system administration menu configuration and second the contributor menu
configuration. Depending on where the menu entry for the new entities should
show up, an entry needs to be made in that section. In figure C.7, we have added
the menu entry to the contributor menu configuration. The components of the entry
are:
• id: The ID value that translates to the element ID on the HTML page. It
provides us with a handle for Document Object Model (DOM) manipulation.
• value: The display name that is shown on the user interface.
• link: The name of the RESTful API to be called on clicking that menu item.
Figure C.7: Adding an entry to the menu.json file
7. Also, since we have added the new entity to the contributor menu
configuration, a similar entry needs to be made in the file menu.html.erb. This
file is located under the folder app/views/common. It acts as the view file
that displays the menu entries across all of the REMI web pages. Figure C.8 shows
the corresponding changes made to the file.
Figure C.8: Adding an entry to the menu.html.erb file
8. Figure C.9 shows the New Entity menu item on the REMI menu.
Figure C.9: New Entity Menu Item
9. So far we have created the necessary template files, created
the table within the database, and updated the REMI menu to
display the new entity. The scaffold generator generated some default
view pages for us. Although we could use these default view pages as they are, we
need to modify them so that they are consistent with the look
and behavior of all the other REMI pages. All the view files can be found
under the app/views/new_entities directory. We start off with the listing page (file
index.html.erb). Figure C.10 shows the default code within this page and figure
C.11 shows the modified code. There are three changes that need to be made.
(a) Step 1 in figure C.11 shows that the table header is modified to add a couple
of div tags to the file. We also add the table partial just before the div tags.
The table partial contains the presentation logic for all tables. It is important
to ensure that the ID for the table tag is the same as the table name parameter
passed to the table partial. Partial templates, usually just called "partials",
are a mechanism for breaking the rendering process into more manageable
chunks. With a partial, you can move the code for rendering a particular
piece of a response to its own file.
(b) Step 2 in figure C.11 changes the display logic of the rows of the table.
Rather than having separate columns for the Show, Edit, and Delete options,
we remove the Show and Edit options. We then modify the name attribute of
the entity so that it is rendered as a link to the edit page.
(c) Step 3 in figure C.11 adds a class attribute to the link object. CSS uses
this class attribute to style that link as a button.
Figure C.10: Default listings page
Figure C.11: Modified listings page
10. Figure C.12 shows the newly modified listings page. One can see the new table
template being displayed in comparison to the page displayed in figure C.9.
Figure C.12: New listing page
11. The last file to modify is the _form.html.erb file. This file is a partial that is used
in the files new.html.erb and edit.html.erb. When creating a new entity, the
new.html.erb file is displayed, and when updating an existing entity, the edit.html.erb
file is displayed. A total of four changes are made to the _form.html.erb file,
namely:
(a) Step 1 in figure C.13: we need to tell the page to make use of the formtastic
declarative syntax. This is done by adding the semantic_form_for tag. For a
list of all the gems used in REMI, refer to table B.1.
(b) Step 2 in figure C.13 adds a timed notice to the page. This notice message is
displayed for 5 seconds and is used to display messages related to the creation and
update of objects.
(c) Step 3 in figure C.13 adds the various attributes of the object to the page. If
any new attribute is later added to the model, then that attribute should also
be listed here for display purposes.
(d) Step 4 in figure C.13 adds the submit and cancel buttons to the page.
Figure C.13: Making changes to the form.html.erb file
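To make the routes.rb entry from step 3 of this section more concrete, the sketch below lists the seven RESTful routes that a Rails 3 resources entry provides for an entity. The helper restful_routes is a hypothetical function written only for this illustration; it is not part of Rails, and the entity name new_entities is the example name used in this section.

```ruby
# Sketch of the seven RESTful routes that a Rails 3 `resources` entry
# provides for an entity. `restful_routes` is a hypothetical helper
# written for this illustration; it is not part of Rails.
def restful_routes(resource)
  [
    ['GET',    "/#{resource}",          'index'],
    ['GET',    "/#{resource}/new",      'new'],
    ['POST',   "/#{resource}",          'create'],
    ['GET',    "/#{resource}/:id",      'show'],
    ['GET',    "/#{resource}/:id/edit", 'edit'],
    ['PUT',    "/#{resource}/:id",      'update'],
    ['DELETE', "/#{resource}/:id",      'destroy']
  ]
end

# Print one line per route: HTTP verb, URL pattern, controller action.
restful_routes('new_entities').each do |verb, path, action|
  puts format('%-6s %-25s %s', verb, path, action)
end
```

The link value in menu.json and the links in the modified listing page ultimately resolve to one of these verb and path combinations, for example the edit route for the name link.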
Appendix D
Database Documentation
D.1 List of Tables
class CreateUserMetadataTemplates < ActiveRecord::Migration
  def self.up
    create_table :file_tags do |t|
      t.references :exp_file, :null => false
      t.references :tag, :null => false
      t.timestamps
    end
    add_index :file_tags, :exp_file_id
    add_index :file_tags, :tag_id

    create_table :tags do |t|
      t.string :tag_name, :null => false
      t.timestamps
    end
    add_index :tags, :tag_name

    create_table :categories do |t|
      t.string :name, :null => false
      t.text :description
      t.boolean :user_defined
      t.timestamps
    end
    add_index :categories, :name

    create_table :cat_values do |t|
      t.string :name, :null => false
      t.text :description
      t.boolean :user_defined
      t.references :category
      t.timestamps
    end
    add_index :cat_values, :category_id
    add_index :cat_values, :name

    create_table :subjects do |t|
      t.string :subject_tag, :null => false
      t.string :unique_name, :null => false
      t.references :specimen, :null => false
      t.references :gender, :null => false
      t.text :conditions
      t.date :birth_date, :null => false
      t.date :harvest_date
      t.references :reason_of_death
      t.references :condition_at_death
      t.references :heart_condition_at_death
      t.references :was_fomblin_used
      t.float :heart_weight
      t.float :body_weight
      t.timestamps
    end
    add_index :subjects, :subject_tag
    add_index :subjects, :specimen_id
    execute "ALTER TABLE subjects ADD CONSTRAINT
      constraint_uniq_subject UNIQUE (unique_name, birth_date)"

    create_table :experiments do |t|
      t.string :name, :null => false
      t.text :description
      t.date :exp_date, :null => false
      t.references :user, :null => false
      t.references :machine
      t.references :modality, :null => false
      t.string :type
      t.timestamps
    end
    add_index :experiments, :modality_id
    add_index :experiments, :user_id
    add_index :experiments, :machine_id
    add_index :experiments, :exp_date
    execute "ALTER TABLE experiments ADD CONSTRAINT
      constraint_uniq_experiment UNIQUE (modality_id, user_id, exp_date, name)"

    create_table :collimators do |t|
      t.string :name
      t.references :collimator_type, :null => false
      t.string :material
      t.float :radius
      t.float :length
      t.timestamps
    end
    add_index :collimators, :collimator_type_id

    create_table :user_metadata_templates do |t|
      t.references :user, :null => false
      t.integer :num_bytes_per_pixel
      t.references :image_byte_order
      t.string :data_dimensions
      t.integer :num_images
      t.integer :total_num_images
      t.string :gated_type
      t.integer :num_gates
      t.string :image_data_format
      t.references :data_file_type
      t.references :acquisition_mode
      t.references :data_type
      t.references :normalization_method
      t.references :reconstruction_method
      t.boolean :flood_corrected
      t.integer :num_angle_steps
      t.float :total_angle_rotation
      t.boolean :decay_corrected
      t.integer :n_projections
      t.float :time_per_projection
      t.string :direction_of_rotation
      t.float :start_angle
      t.string :pixel_dimensions
      t.string :voxel_dimensions
      t.integer :attnzcnt # Num Voxels used from Attn Map
      t.float :voxel_x_width
      t.float :voxel_y_width
      t.float :voxel_z_width
      t.text :voxel_description
      t.boolean :is_gated
      t.integer :spatial_resolution
      t.integer :n_diffusion_gradients
      t.float :b_value
      t.integer :study_duration
      t.references :subject_orientation
    end
    add_index :user_metadata_templates, :user_id, :unique => true

    create_table :files do |t|
      t.string :filename, :null => false
      t.string :content_type, :null => false
      t.string :file_path, :null => false
      t.float :file_size
      t.string :file_checksum
      t.text :file_description
      t.integer :num_bytes_per_pixel
      t.references :image_byte_order
      t.string :data_dimensions
      t.integer :num_images
      t.integer :total_num_images
      t.string :gated_type
      t.integer :num_gates
      t.string :image_data_format
      t.references :data_file_type
      t.references :acquisition_mode
      t.references :data_type
      t.references :normalization_method
      t.references :reconstruction_method
      t.boolean :flood_corrected
      t.integer :num_angle_steps
      t.float :total_angle_rotation
      t.boolean :decay_corrected
      t.integer :n_projections
      t.float :time_per_projection
      t.string :direction_of_rotation
      t.float :start_angle
      t.string :pixel_dimensions
      t.string :voxel_dimensions
      t.integer :attnzcnt # Num Voxels used from Attn Map
      t.float :voxel_x_width
      t.float :voxel_y_width
      t.float :voxel_z_width
      t.text :voxel_description
      t.boolean :is_gated
      t.integer :spatial_resolution
      t.integer :n_diffusion_gradients
      t.float :b_value
      t.integer :study_duration
      t.references :subject_orientation
      t.timestamps
    end

    create_table :file_associations, :id => false do |t|
      t.references :file
      t.references :owner, :null => false, :polymorphic => true
      t.text :tags
      t.timestamps
    end
    add_index :file_associations, :owner_id
    add_index :file_associations, :file_id

    create_table :exp_tracer_associations do |t|
      t.references :experiment, :null => false
      t.references :tracer, :null => false
    end
    add_index :exp_tracer_associations, :experiment_id
    add_index :exp_tracer_associations, :tracer_id

    create_table :exp_subject_associations do |t|
      t.references :experiment, :null => false
      t.references :subject, :null => false
      t.text :condition
      t.float :weight
    end
    add_index :exp_subject_associations, :experiment_id
    add_index :exp_subject_associations, :subject_id

    create_table :machines do |t|
      t.string :name, :null => false
      t.string :company
      t.references :location, :null => false
      t.integer :n_detectors, :default => 0, :null => false
      t.string :address
      t.string :city
      t.string :state, :limit => 2
      t.string :zipcode
      t.timestamps
    end
    add_index :machines, :name
    add_index :machines, :location_id

    create_table :energy_windows do |t|
      t.references :experiment_detector
      t.decimal :energy_window
      t.decimal :min_energy_window
      t.decimal :max_energy_window
    end
    add_index :energy_windows, :experiment_detector_id

    create_table :experiment_detectors do |t|
      t.references :spect, :null => false
      t.references :detector_head, :null => false
      t.references :collimator
      t.string :detector_material
      t.timestamps
    end
    add_index :experiment_detectors, :spect_id
    add_index :experiment_detectors, :detector_head_id
    add_index :experiment_detectors, :collimator_id

    create_table :users do |t|
      t.string :username, :null => false
      t.string :first_name
      t.string :last_name
      t.string :email, :null => false
      t.string :department
      t.string :institution
      t.string :position
      t.string :crypted_password
      t.string :password_salt
      t.string :persistence_token
      t.references :status
      t.string :address
      t.string :city
      t.string :state, :limit => 2
      t.string :zipcode
      t.timestamps
    end
    add_index :users, :username, :unique => true

    create_table :roles do |t|
      t.string :name, :null => false
      t.text :description
      t.timestamps
    end
    add_index :roles, :name, :unique => true

    create_table :user_roles, :id => false do |t|
      t.references :user, :null => false
      t.references :role, :null => false
      t.timestamps
    end
    add_index :user_roles, [:user_id, :role_id], :unique => true

    create_table :experiment_approvals, :id => false do |t|
      t.references :experiment, :null => false
      t.references :user
      t.timestamps
    end
    add_index :experiment_approvals, :experiment_id
    add_index :experiment_approvals, :user_id
  end

  def self.down
    drop_table :experiment_approvals
    drop_table :energy_windows
    drop_table :collimators
    drop_table :file_associations
    drop_table :exp_subject_associations
    drop_table :exp_tracer_associations
    drop_table :user_metadata_templates
    drop_table :files
    drop_table :experiment_detectors
    drop_table :experiments
    drop_table :machines
    drop_table :user_roles
    drop_table :users
    drop_table :roles
    drop_table :subjects
    drop_table :cat_values
    drop_table :categories
    drop_table :file_tags
    drop_table :tags
  end
end