REMI Database: A Data Management System for Nuclear Medicine Studies

by Antall Johann Fernandes

A thesis submitted to Florida Institute of Technology

in partial fulfillment of the requirements for the degree of

Masters in

Computer Science

Melbourne, Florida

December 2011

We the undersigned committee hereby approve the attached thesis

REMI Database: A Data Management System for Nuclear Medicine Studies

by Antall Johann Fernandes

Debasis Mitra, Ph.D.
Professor, Computer Science
Committee Chair, Thesis Advisor

Philip Bernhard, Ph.D.
Associate Professor, Computer Science
Thesis Committee

Marcus Hohlmann, Ph.D.
Associate Professor, Physics and Space Sciences
Thesis Committee

William Shoaff, Ph.D.
Associate Professor, Computer Science
Department Head

Abstract

Title: REMI Database: Web Application for Research Study

Author: Antall Fernandes

Committee Chair: Debasis Mitra, Ph.D.

Medical Imaging data acquisition and processing are expensive ventures in terms

of personnel time and resources. Scientific labs, and in particular the nuclear medicine

research group of the Life Sciences Division, Lawrence Berkeley National Laboratory,

that we work with, acquire tens of terabytes of raw and processed data worth a few million

dollars. This data serves as a basis for many resulting publications over close to a

decade. The emphasis on reducing health care research costs makes it necessary to have

imaging data openly available for other researchers and other research institutions. This

is especially true for publicly funded research data. With new scientific instruments growing

exponentially in their capability to generate research data, new infrastructure needs to

be developed and deployed to allow researchers to effectively and securely manage their

research data. Researchers need to be able to easily acquire data from instruments,

store and manage potentially large quantities of data, easily process the data, share

research resources and work spaces with colleagues both inside and outside of their

institution, search and discover across their accessible collections, and easily publish

datasets and related research artifacts. The ability to share data with other institutions

depends on the selection of a good data model, ensuring that data is captured and stored

in a consistent and standardized format, and the ease of searching for data. Meta-data,

which is information about data, proves to be important for the purpose of searching and

retrieving data. Meta-data is also important to understand an experiment and the data

associated with that experiment.

In this thesis, we describe the REsearch Medical Imaging (REMI) database. REMI

is a scientific data management database, aimed at cardiac imaging studies, that helps

researchers store their experimental data in a central location. REMI also provides the

ability, via a web interface, for other researchers to query the database and download

experiment data. REMI is built with some of the best tools and development practices

that the software industry has to offer, backed by a consistent data model, and a

framework platform that provides Application Programming Interfaces (API) for future

integrations with other third party tools.

REMI is available for everyone to browse and download experimental data. REMI

can be reached via the Universal Resource Locator (URL) http://remi.lbl.gov/.

We also maintained a project page, providing links to various documents

and presentations. The project page can also be accessed via the URL

https://cs.fit.edu/Projects/REMI.

Contents

1 Introduction
  1.1 Scientific Data Management
  1.2 Research at the Lawrence Berkeley National Laboratory
  1.3 Overview of the chapters
2 Literature Review
3 Design Considerations
4 Description of REMI
  4.1 Overall description
  4.2 System architecture
  4.3 REMI database development
    4.3.1 Roles and Abilities
    4.3.2 Categories and Category-Values
  4.4 REMI application development
    4.4.1 Authentication
    4.4.2 File Storage
    4.4.3 Menu
    4.4.4 Behaviour Driven Development
5 Status report
6 Lessons learned
7 Weakness and Future Enhancements
  7.1 Limitations within REMI
    7.1.1 Semi Structured Data
    7.1.2 Resume functionality for Uploads and Downloads
    7.1.3 File Compression on the Server
    7.1.4 Meta-data Templates
  7.2 Future Enhancements
8 Conclusion
A User Documentation
  A.1 Public User
    A.1.1 Downloading Experiment Data
  A.2 Login into REMI
  A.3 Data Authentication
  A.4 Contributor
    A.4.1 New Subject Work-flow
    A.4.2 New Experiment Work-flow
  A.5 System Administrator
    A.5.1 Researchers
    A.5.2 Roles
    A.5.3 Categories and Category Values
B Installation Documentation
  B.1 Version Control within REMI
  B.2 Creating a development environment
  B.3 List of Used Software Tools
C Maintenance Documentation
  C.1 Model-View-Controller (MVC)
  C.2 Project Layout of REMI
  C.3 Adding a new Entity
D Database Documentation
  D.1 List of Tables

List of Figures

3.1 SPECT Diagram
4.1 Hibernate Configuration
4.2 Rails MVC Diagram
4.3 SOAP vs REST
4.4 Software Component Diagram
4.5 Logical Entity Relationship Diagram
4.6 Physical Design Diagram
4.7 System Component Design
4.8 Proposed Directory Structure
4.9 BDD Life Cycle
4.10 Feature: System Administrator manages Categories
4.11 Feature: Users should be able to Log into REMI
4.12 Failed Cucumber Step
4.13 Step Definition
4.14 Cucumber Test Passed
6.1 Sequence Diagram for Saving Files
6.2 Sequence Diagram for Saving Files: Part 1
A.1 Download: REMI Main Page
A.2 Download: Modality Menu
A.3 Download: List of DTMRI Experiments
A.4 Download: Search based on Experiment Name
A.5 Download: Search for experiments that make use of WKY type of subjects
A.6 Log-in: REMI Main Page
A.7 Log-in: Log-in Page
A.8 New Subject: Contribute Link
A.9 New Subject: Subjects Menu
A.10 New Subject: Subjects Index Page
A.11 New Subject: Subject Detail Page
A.12 New Subject: Saving Subject Successful
A.13 New Subject: Back Button
A.14 New Subject: Updated Subject Index Page
A.15 New Subject: Searching a Subject
A.16 New Experiment: Contribute Link
A.17 New Experiment: Add New Experiment
A.18 New Experiment: Create Experiment
A.19 New Experiment: Update Experiment
A.20 New Experiment: Blank Subject Listing
A.21 New Experiment: Select Subject
A.22 New Experiment: Associated Subjects
A.23 New Experiment: Blank Tracer Listing
A.24 New Experiment: Select Tracer
A.25 New Experiment: Associated Tracers
A.26 New Experiment: Blank File Listings
A.27 New Experiment: Selecting File
A.28 New Experiment: File Details
A.29 New Experiment: Subject Information
A.30 New Experiment: Acquisition Information
A.31 New Experiment: Voxel Information
A.32 New Researcher: Click System Administration Link
A.33 New Researcher: Adding a New User
A.34 New Researcher: Filling the User Details
A.35 New Researcher: Successful Saving the User Details
A.36 New Researcher: Searching for the User
A.37 New Researcher: Log In
A.38 New Researcher: Logged In Successfully
A.39 New Role: System Administration Link
A.40 New Role: Adding a New Role
A.41 New Role: Creating a New Role
A.42 New Role: Searching for a Role
A.43 New Role: Associating the Abilities with the New Role
A.44 New Category: Category Listing and Menu
A.45 New Category: Creating a New Category
A.46 New Category: Searching the New Category
A.47 New Category: Updating the Category and Adding Values for the Category
A.48 New Category: Adding a Category Value
A.49 New Category: Category Detail Page with New Category Value Listed
A.50 New Category: Search for Species Category
A.51 New Category: Adding New Value to Species Category
A.52 New Category: Saving the New Value to Species Category
A.53 New Category: Species Menu on Main Page
A.54 New Category: Species dropdown on Subject Page
C.1 Project Layout of REMI
C.2 Create a new Entity: Scaffold Generator
C.3 Create a new Entity: Migration File
C.4 Create a new Entity: Routes File
C.5 Create a new Entity: Run Migration
C.6 Create a new Entity: Access the Entity
C.7 Create a new Entity: Adding a Menu to REMI
C.8 Create a new Entity: Adding a Menu to REMI Menu Page
C.9 Create a new Entity: New Entity Menu Item
C.10 Create a new Entity: Default Listings Page
C.11 Create a new Entity: Modified Listings Page
C.12 Create a new Entity: New Listings Page
C.13 Create a new Entity: Modifying the form.html.erb File

List of Tables

B.1 List of Used Software Tools

Acronyms

API      Application Programming Interfaces
CSMD     Core Scientific Meta-Data Model
CT       Computed Tomography
CRUD     Create, Read, Update, and Delete
DICOM    Digital Imaging and COmmunications in Medicine
DOM      Document Object Model
DTMRI    Diffusion Tensor Magnetic Resonance Imaging (MRI)
fMRI     functional MRI
ICAT     Information CATalog
LAMP     Linux, Apache, MySQL, and PHP
MEDIMAN  Medical Image MANagement
MRI      Magnetic Resonance Imaging
MVC      Model-View-Controller
NEMA     National Electrical Manufacturers Association
ORM      Object Relational Model
PACS     Picture Archiving and Communication System
PET      Positron Emission Tomography
RAID     Redundant Array of Independent Disks
REST     REpresentational State Transfer
RPC      Remote Procedure Calls
REMI     REsearch Medical Imaging
SOAP     Simple Object Access Protocol
SPECT    Single-Photon Emission Computed Tomography
STFC     Science & Technology Facilities Council
TCP      Transmission Control Protocol
URL      Universal Resource Locator
WASA     Workflow-based Architecture to support Scientific Applications
WSDL     Web Services Description Language

Acknowledgements

I owe a tremendous debt of gratitude to the many people who made this thesis possible.

I would like to start with thanking my mentor and advisor, Dr. Debasis Mitra,

for giving me the opportunity to work with him on a number of different projects,

and eventually to work on REMI, which led up to this thesis. His constant guidance,

enthusiasm, and support through all of the projects I undertook with him were

invaluable.

Special thanks go to Dr. Grant Gullberg for giving me the opportunity to work

on REMI and for all of his guidance and input that went into REMI. This project was

supported by a subcontract on NIH grant 5-R01-EB007219-04, “Molecular Imaging

of Cardiac Hypertrophy Using microPET and Pinhole SPECT”. I also appreciate his

giving me the chance to work with the other researchers at the Lawrence Berkeley

National Laboratory, and more importantly a chance to do my internship in Berkeley.

I owe many thanks to Rostyslav Buchko, Archontis Giannakidis, Martin Boswell and

Andrew Hernandez for their wonderful insights and feedback during the entire course

of developing REMI. They have been very patient and were always on hand if

anything needed clarification or review. It has been a joy working with these researchers.

I extend my thanks to Stephanie Taylor and Kathleen Brennan for helping me

understand the MicroPET experiment processes, and for an entertaining time at the

UCSF Laboratory during the long MicroPET experiments.

I want to thank my committee members, Dr. Philip Bernhard and Dr. Marcus

Hohlmann. Their comments, understanding, and interest in my work were extremely

encouraging.

I would also like to thank my colleagues Mahmoud Abdallah and Daniel Eiland,

for their insights and support when tackling certain issues I faced during the

development process.

I owe many thanks to the Department of Computer Science for supporting my study.

Last but not least, I sincerely thank the friends and families that offered me most

generous help and joy during my stay in Florida.

Declaration

I declare that the work in this thesis is solely my own except where attributed and cited

to another author.

Chapter 1

Introduction

Medical imaging data acquisition and processing are expensive ventures in terms of

personnel time and resources. Hence it becomes imperative that data captured during

any research study is stored in a manner that is reliable and easily retrievable. In

many research organizations the use of a hierarchical data model for file organization

prevails. Elaborate directory structures are devised that are deemed best suited for

capturing as much information related to the studies as possible, while also providing a

systematic and easy way to look up data later. The hierarchical data model originated

for systems with thousands of files, and yet we now store 7 to 9 orders of magnitude

more files within a single system. Advances in computing technology,

disk capacity, and processing power have paved the way for data intensive scientific

computing. This has led to growth in raw data, which in turn has resulted in the need

for data analysis capabilities. The importance of data analysis brought forth the need

for storing additional related meta-data in an organized way.

With the traditional hierarchical data model¹, user-defined tags can only be defined as

part of a file's name or its location in the hierarchical structure. Any additional meta-data

that needs to be captured would have to be incorporated in some way through one of

these two methods. This leads to very lengthy file or directory naming conventions,

or a very deep directory structure, neither of which is suitable for fast data access.

Also, given the familiarity of Google search, full-text searches for files and documents

have become popular, and such searches are not possible in the hierarchical data model.

This falls far short of proper meta-data tracking: the meta-data is almost non-queryable,

and the approach is definitely not scalable.
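The contrast above can be illustrated with a minimal sketch. It assumes a hypothetical four-level directory convention (modality/strain/date/file) and a hypothetical `data_file` table; the sample paths, the table layout, and the function names are illustrative assumptions and are not taken from REMI. The first function shows that a hierarchy can only yield the tags encoded in a path, while the catalog can be queried on any combination of meta-data columns, including ones (such as the tracer) that the path never recorded.

```python
import sqlite3
from pathlib import PurePosixPath

# Hypothetical set of meta-data columns that may be searched.
SEARCHABLE_COLUMNS = {"modality", "strain", "tracer"}


def tags_from_path(path):
    """Recover the only meta-data a directory hierarchy can hold: its path parts."""
    modality, strain, date, filename = PurePosixPath(path).parts
    return {"modality": modality, "strain": strain, "date": date, "file": filename}


def build_catalog(rows):
    """Store file locations in a table with explicit meta-data columns."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE data_file (path TEXT, modality TEXT, strain TEXT, tracer TEXT)"
    )
    db.executemany("INSERT INTO data_file VALUES (?, ?, ?, ?)", rows)
    return db


def find_paths(db, **criteria):
    """Query by any combination of meta-data columns, independent of file location."""
    unknown = set(criteria) - SEARCHABLE_COLUMNS
    if unknown:
        raise ValueError(f"not searchable: {sorted(unknown)}")
    sql = "SELECT path FROM data_file"
    if criteria:
        # Column names are checked against SEARCHABLE_COLUMNS; values are bound as parameters.
        sql += " WHERE " + " AND ".join(f"{column} = ?" for column in criteria)
    sql += " ORDER BY path"
    return [row[0] for row in db.execute(sql, tuple(criteria.values()))]


# Made-up sample records, for illustration only.
SAMPLE_ROWS = [
    ("SPECT/WKY/2011-03-14/scan01.img", "SPECT", "WKY", "Tc-99m"),
    ("PET/SHR/2011-04-02/scan07.img", "PET", "SHR", "FDG"),
    ("PET/WKY/2011-05-20/scan09.img", "PET", "WKY", "FDG"),
]
```

With the directory layout alone, finding every study that used a given tracer would require opening each file or its log; with the catalog, `find_paths(build_catalog(SAMPLE_ROWS), tracer="FDG")` answers the same question from the meta-data.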

Building and organizing medical data depends heavily on meta-data and the ability

to associate this meta-data with the data files. Meta-data describes how data was

measured, acquired, and computed. It is this meta-data that proves quite useful in

database searches. Meta-data can be captured in a number of different ways and at

different time points within a study. Meta-data may be generated by the instruments

used, or may consist of parameters that govern the study in some way. In most cases the

meta-data is recorded in log files or parameter files generated by the instruments themselves;

in other cases the meta-data is external and may exist only in the principal researcher's

mind or in notes in their journals. Consolidating all of this meta-data, which

may be dispersed across different media, is critical. Lack of meta-data results

in researchers spending a lot of time wading through entire sets of files to ascertain

relevance of files. If researchers are to read data collected by others, then the data

must be carefully documented and must be published in forms that allow easy access

and automated manipulation. README files and other types of log and text files have

been used in the past to capture all sorts of meta-data, but this approach does not scale well.

¹ The file directory structure is an example of the hierarchical data model.

Another set of critical information that does not make its way into text files is the

association of different data sets and files within a single study or even across different

studies. These associations across data sets are important in understanding how different

files or data sets came to be. The availability of useful and meaningful meta-data can also

aid in the process of automation. Meta-data is information about data. Good meta-data

makes it possible to build data independent automated tools. Tools could read the meta-

data, understand the structure and properties of the data, and then go about processing

the data. Such tools would be generic in nature and could be used on a number of

different data sets independent of the physical nature of the data.
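As a minimal sketch of such a data-independent tool, the reader below decodes a raw byte buffer using nothing but an accompanying meta-data record. The meta-data keys (`voxel_type`, `byte_order`, `width`, `height`) and the type vocabulary are assumptions made for this illustration; they do not describe REMI's actual meta-data or any instrument's file format.

```python
import struct

# Hypothetical meta-data vocabulary: declared voxel type -> struct format code.
STRUCT_CODES = {"uint8": "B", "int16": "h", "uint16": "H", "float32": "f"}


def read_slice(raw, meta):
    """Decode a raw byte buffer into rows of numbers using only its meta-data."""
    order = "<" if meta["byte_order"] == "little" else ">"
    width, height = meta["width"], meta["height"]
    fmt = f"{order}{width * height}{STRUCT_CODES[meta['voxel_type']]}"
    # Refuse to guess when the meta-data and the data disagree about size.
    if struct.calcsize(fmt) != len(raw):
        raise ValueError("meta-data does not match the size of the data")
    values = struct.unpack(fmt, raw)
    return [list(values[r * width:(r + 1) * width]) for r in range(height)]
```

The same function handles 8-bit, 16-bit, or floating-point data without modification, because everything it needs to know about the physical layout comes from the meta-data rather than from the code.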

It is also a fairly common practice among researchers to store all of their research

work either on a personal workstation or a personal account on a remote network storage.

They even create their own data organization schemes that they find suitable. In most

cases there are no set guidelines or standards maintained for data organization. Having

the data across a number of systems is ideal from a storage point of view, but that makes

it hard to search for studies across different researchers. The varying data organization

schemes add to the complexity and are a major hurdle when it comes to sharing

research data with other research organizations. These hurdles are even more prominent

when a researcher leaves the organization, leaving other researchers with the task of

locating and understanding all of their experiment data.

Given all of the concerns mentioned above, the need for a system to capture,

organize, and share research information has gained a lot of importance. With meta-

data playing a vital part within such systems and given that using a hierarchical data

model does not suffice, most of the systems being built and deployed have employed

some sort of database, be it a relational, object-oriented, or XML database, or a

combination of them.

1.1 Scientific Data Management

Before introducing our work, it is worth mentioning that the problem of scientific data

management has been tackled by different groups, in a number of different ways. Some

of them are very specialized databases focusing either on a particular modality or on a

specific anatomical region. Other database and data modeling efforts focus on

general scientific studies. Most of the efforts use some sort of database to store

meta-data. A scientific data management system is expected

to support the features listed below[39].

• Creation of logical collections.

The primary goal of a Data Management system is to abstract the physical data

into logical collections. The resulting view of the data is a uniform homogeneous

library collection.

• Physical data handling.

This layer maps between the physical and the logical data views. Here you find

items like data replication, backup, caching, etc.

• Security support.

Data access authorization and change verification. This is the basis of trusting

your data.

• Data ownership.

Define who is responsible for data quality and meaning.

• Persistence.

Definition of data lifetime. Deployment of mechanisms to counteract technology

obsolescence.

• Knowledge and information discovery.

Ability to identify useful relations and information inside the data collection.

• Meta-data collection, management and access.

Meta-data are data about data.
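Several of the features listed above can be made concrete with a small sketch. The class below is a hypothetical illustration, not part of REMI: it shows a logical collection with a named owner, a mapping from logical names to one or more physical replicas, and meta-data that can be searched. Security support and persistence are deliberately left out of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalCollection:
    """A named, owned collection that hides where its items physically live."""
    name: str
    owner: str  # data ownership: who is responsible for quality and meaning
    replicas: dict = field(default_factory=dict)  # logical name -> physical locations
    metadata: dict = field(default_factory=dict)  # logical name -> meta-data

    def register(self, logical_name, location, **meta):
        """Record a physical copy of an item; repeated calls add further replicas."""
        self.replicas.setdefault(logical_name, []).append(location)
        self.metadata.setdefault(logical_name, {}).update(meta)

    def locate(self, logical_name):
        """Physical data handling: resolve a logical name to its first replica."""
        return self.replicas[logical_name][0]

    def search(self, **criteria):
        """Information discovery: logical names whose meta-data match all criteria."""
        return sorted(
            name
            for name, meta in self.metadata.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )
```

A user of such a collection works only with logical names and meta-data; whether an item lives on a local disk, a backup volume, or both is resolved by `locate`.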

1.2 Research at the Lawrence Berkeley National

Laboratory

REMI has been developed for the Lawrence Berkeley National Laboratory, Life

Science Division, Department of Radiotracer Development & Imaging Technology. The

Department of Radiotracer Development & Imaging Technology uses nuclear tracers to

study biochemical processes in living organisms, and develops new tracers and imaging

systems. Research includes

• Technology to make radioisotopes more widely available.

• Development of new radiotracers.

• Radiotracer imaging to study and guide the treatment of diseases of the brain and

heart, and cancer.

• Radiotracer imaging to enable the development of improved biofuels.

• Scintillation radiation detectors.

• Application-specific detector arrays and readout electronics.

• Image reconstruction and analysis.

The department facilities include a medical cyclotron, a Positron Emission Tomography

(PET) camera, a Single-Photon Emission Computed Tomography (SPECT) camera with

X-ray Computed Tomography (CT) scanner, and a 1.5T MRI system.

The group, for whom REMI was developed, deals with cardiac nuclear imaging

and its image reconstruction and analysis. The researchers within this group perform

experiments using a PET, SPECT, or DT-MRI machine. Data generated by

this equipment, along with the meta-data, are then uploaded into REMI.

Diagnostic techniques in nuclear medicine use radioactive tracers which emit gamma

rays from within the body. The tracers can be given by injection, by inhalation, or orally. In the

first type of technique, single photons are detected by a gamma camera, which can view

organs from many different angles. The camera collects the gamma rays that pass through

collimator holes, and reconstruction algorithms build up an image from the points from

which radiation is emitted; this image is enhanced by a computer and viewed by a

physician on a monitor for indications of abnormal conditions. Systems of this type

are known as SPECT machines.

A more recent development is Positron Emission Tomography (PET) which is

a more precise and sophisticated technique. A positron-emitting radionuclide is

introduced, usually by injection, and accumulates in the target tissue. As it decays

it emits a positron, which promptly combines with a nearby electron resulting in the

simultaneous emission of two identifiable gamma rays in opposite directions. These are

detected by a PET camera and give a very precise indication of their origin. PET is used

in cardiac and brain imaging.

1.3 Overview of the chapters

The remainder of this thesis is organized into the following chapters:

In Chapter 2 we review the related literature. Our review is focused on two areas.

First, we provide an overview of the different database offerings built to manage medical

imaging data. Secondly, we look at the data models that are used to model scientific data.

Chapter 3 discusses the design considerations made during the development of

REMI and the methods used to develop its database.

Chapter 4 describes REMI, detailing its goals, architecture, functionality and the

database and application development process.

Chapter 5 presents the current status report of REMI. We provide details of the

present production environment.

Chapter 6 describes the problems we faced developing REMI and the lessons

we learned while investigating those problems and implementing solutions to

them.

Chapter 7 highlights the weaknesses in the work as it stands, and suggests future areas

for consideration and ways in which the current work could be extended.

Chapter 2

Literature Review

In this chapter, we provide a review of the literature related to the work described in

this thesis. Management of research data has been approached in a number of different

ways. Some approaches focus on comparative analysis of medical images comparing

one image to another. These are categorized as imaging databases, built with a certain

modality or anatomic region in mind. Other initiatives address the work flow that

is associated with the entire research process and how software systems could better

manage them. Research workflows and workflow management differ significantly from

those of business applications, and efforts are being made to build systems that capture

the entire workflow process within research studies. Lastly we have scientific data

management systems, whose primary focus is storage and retrieval of research data and

its associated meta-data and data-files. Scientific data management systems differ from

the first two groups in that they do not focus on any workflows nor do they aid in any

comparative data studies. Central to all three kinds of system is the use of meta-data, and

in most cases efficient and reliable capturing of meta-data helps define the usefulness of

such systems. Modelling of this meta-data is another area of ongoing research, given

that scientific data is heterogeneous and distributed over heterogeneous software and

hardware platforms.

Today, medical imaging units can store and transmit medical images from multiple

modalities using Picture Archiving and Communication System (PACS)[26]. The PACS

can store and retrieve scanned documents in addition to images and reports. However,

its objective is entirely clinical, and its focus is on individual

patient information management. Such systems handle entire workflows, covering

everything from a patient's entry into the system up to the billing process. The workflow

architecture in a research lab may be quite different from that in a commercial clinical

imaging organization. Mathias Weske and Gottfried Vossen, in their work[40], discuss

the Workflow-based Architecture to support Scientific Applications (WASA) as an

approach to workflow management for scientific applications. Their work is centered

around experiments, the steps of which an experiment is composed, and the stages

in which an experiment may exist. Although our work is also centered around the

notion of an experiment, our current implementation of REMI focuses purely on the

storage and management of experiments and their files, within the entire experiment life

cycle.

Within the Cardiac Nuclear Imaging domain, two types of databases exist: static

and dynamic. Static databases are the commercially available clinical systems, like

Quantitative Perfusion SPECT[9], Emory Cardiac Toolbox[4], euHeartDB[31] and

Corridor 4DM[27], containing normalized archived images along with prepackaged

software for analysis and visualization. The primary purpose of these databases is

for researchers and clinicians to compare their images with the ones archived within

the system. On the other hand, dynamic databases such as the Cardiac Atlas from Oxford[30],

MEDIMAN[34] or ours (REMI), grow dynamically with continuous contribution. The

primary target users are the researchers, who may or may not have access to any

data gathering labs, to process the data on their own. REMI is freely and easily

accessible without any liability to the database owner. Our evaluation criteria will be

the usage frequency and ease of use for contributors (who upload data) and public users

(who download data).

Since REMI is a scientific data management system, our focus has been on defining

a data model to capture experiment information and related meta-data and designing

and building a system to manage research experiments using this data model. The

data model that REMI uses is similar to the Core Scientific Meta-Data Model (CSMD)[35],

which has been developed at the Science & Technology Facilities Council (STFC), UK.

The model captures high level information about scientific studies and the data that

they produce. The model is designed to be generic across scientific disciplines and has

application beyond scientific facilities, particularly in the structural sciences (such as

chemistry, material science, earth science, and biochemistry). The paper also talks about

an Information CATalog (ICAT) system built on top of the CSMD data model. ICAT

in many ways is similar to REMI in capturing experiment and meta-data information, but

has been developed with more advanced features for integration with other medical devices.

REMI is currently not feature-rich when it comes to integration with different medical

instruments, but it does expose a REpresentational State Transfer (REST) API[29] for future

enhancements in the area.

Another effort comparable to our work was presented by Esperanza Marcos, Cesar J.

Acuna, Belen Vela, Jose M. Cavero and Juan A. Hernandez[34]. They developed a web

information system known as Medical Image MANagement (MEDIMAN) for medical

image management and processing. The system is primarily being used for PET and

functional MRI (fMRI) of the brain. MEDIMAN provides a web interface through

which the researchers could upload new clinical trials and research tasks and associate

result files, along with image files and various other documents. The paper specifically

talks about the development of the database and has no mention of how the various files

being uploaded are handled. MEDIMAN makes use of two different database technologies,

namely the Object Relational Model (ORM) and XML Model. The ORM is used to

model the structured part of the data being captured and the XML model is used to

capture the semi-structured or dynamic data. The majority of the data that MEDIMAN

deals with consists of either Digital Imaging and COmmunications in Medicine (DICOM)[3] or

ANALYZE[1] files. Meta-data is extracted from within these files and stored within an

XML structure which is then saved to the database.

The ARCHER Project[21] has developed a production-ready, open-source generic

e-Research infrastructure including a Research Repository, Scientific Dataset

Managers (both a web and a desktop application), and a Collaborative Workspace

Environment. Institutions can selectively deploy these components to greatly assist

their researchers in managing their research data. ARCHER allows researchers the

flexibility of iterative and heuristic workflows and ease of collaborative management,

and ensures that data remain well curated and publication-ready, with appropriate meta-

data, provenance, and authorization. Our interest in the ARCHER Project is their data

storage scheme and data model to store data. The ARCHER Project primarily makes

use of a key/value data store capable of dealing with large and numerous datasets. The

key/value data store was deemed inadequate for the requirements of proper data curation

and hence an additional meta-data store was used based on the Core Scientific Meta-

Data Model.

Apart from literature that talks about the different approaches to building the various

systems to manage scientific data, a couple of papers talk about the general landscape

of the area dealing with Scientific Data Management. Jim Gray and David T. Liu in

their paper Scientific Data Management in the Coming Decade[32] talk about how the

solution to scientific data management lies in moving towards Science Centers. They

state that the demand for tools and computational resources to perform scientific data

analysis is rising faster than data volumes. Science centers provide access to both the

data and the applications that analyze the data. Providing such capabilities does

away with the need for researchers to download or copy the files to a local system

and use their own resources to work on the data. This minimizes data movement and

allows collaboration among a group of scientists doing joint analysis. They also discuss

the ability to replicate data, much like Redundant Array of Independent Disks (RAID)

does today to provide both data availability and protection against data loss. Meta-data

also proves to be an enabler for sophisticated tool automation. Extensive meta-data

describing the data in standard terms is needed so people and programs can understand

the data.

A more detailed look into the characteristics of a scientific data management system

is provided by Mario Valle in his article Scientific Data Management[39]. Valle

starts with the problems that plague the scientific research field such as scattering of

research data across a research organization, lack of important meta-data and proper

data storage strategies to store the research data. He then identifies critical points

that any scientific data management system should incorporate, such as creation of

logical collections; physical data handling; security support; data ownership; meta-data

collection, management and access; persistence; knowledge and information discovery;

and data dissemination and publication. REMI was developed with the intention that it

could deliver most, if not all, of the above-mentioned points.

Chapter 3

Design Considerations

In this chapter we discuss the design decisions behind REMI and the basic functionality

that REMI needs to support.

One of the primary motives for building a medical research database system is

the need to share research data and results across a number of different research

organizations and different researchers. Users need to navigate through data sets based

on different search criteria, and download data that they are interested in to their local

machines. A web interface easily facilitates the ability to allow anyone to access the

system, explore, and download files from anywhere in the world, provided they are

connected to the Internet. To interact with the web interface all one needs is a web

browser. Web browsers come installed on most present-day operating systems, and one

can easily be downloaded from the Internet. One of our initial design decisions is

to avoid the use of any third party browser plug-ins. Uploading data to the database

can also be done independently from anywhere, provided the user has proper privileges

assigned to them. Privileges are assigned to users via roles. A user interacts with

the database through a specific role. A user may want to upload data to REMI and

hence plays the role of a contributor. As a contributor, the user is provided privileges

to upload experiments to REMI. Similarly, a user may want to add users. They do this

as a system administrator and need to have the System Administrator role assigned

to them. The System Administrator role has privileges to add and remove users

from REMI. Hence roles are assigned to users, thereby allowing them certain operations

within REMI.
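
The role-based privilege scheme described above can be sketched as follows. This is a minimal Python illustration and not REMI's Rails implementation; the role names follow the text, while the privilege strings and the helper functions `privileges_for` and `can` are hypothetical.

```python
# Hypothetical role-to-privilege table; the privilege names are illustrative only.
ROLE_PRIVILEGES = {
    "contributor": {"upload_experiment"},
    "system_administrator": {"add_user", "remove_user"},
}

# Anyone may explore the system and download files without an explicit role.
DEFAULT_PRIVILEGES = {"browse", "download"}


def privileges_for(roles):
    """Return the default privileges plus those granted by each assigned role."""
    granted = set(DEFAULT_PRIVILEGES)
    for role in roles:
        granted |= ROLE_PRIVILEGES.get(role, set())
    return granted


def can(roles, privilege):
    """Check whether a user holding `roles` may perform `privilege`."""
    return privilege in privileges_for(roles)
```

A user holding only the contributor role would therefore be allowed to upload experiments but not to add users.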

Data generated by medical imaging studies can be of very high volume. Given

the advancements in technology, and the availability of high volume data storage

devices, it has become possible for medical equipment to capture highly precise and detailed

information, at the cost of generating huge volumes of data. Given the need to be able

to store such high volumes of data, it is necessary that the storage scheme be

flexible enough to grow as and when needed by adding more disk space.

Another necessity of the database is to be able to handle storage of files in an

automated and well distributed manner on the file system. There are a number of ways

one can go about storing files on a file system, and they vary in complexity. The

simplest of solutions is to have all files stored under a single directory. Another approach

is to create an individual directory for each of the research studies. The intention is to

find a solution that is able to store files in a highly efficient and distributed manner across

directories.
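
One possible way to spread files evenly across directories is to derive the directory path from a hash of the file contents. The sketch below is an assumption made purely for illustration and is not necessarily the scheme REMI uses; the function name `sharded_path`, the two-level layout and the choice of SHA-256 are all hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root, content, filename):
    """Build a storage path whose sub-directories come from the content hash.

    The first two pairs of hex digits select two directory levels, so files
    are spread over up to 256 * 256 directories instead of piling up in one.
    """
    digest = hashlib.sha256(content).hexdigest()
    return PurePosixPath(root) / digest[:2] / digest[2:4] / f"{digest}_{filename}"
```

Because the path depends only on the file contents and name, the same file always maps to the same location, and the digest doubles as fixity information.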

The database should be backed by a good data model that captures all types of data

that gets generated as part of different research studies being conducted in any single

organization. We have four levels of data sets, namely:

• Scientific Equipment generated Raw Data

Most present-day scientific equipment, in addition to generating raw data,

also generates additional information regarding different machine parameters.

Parameters may be static values defined on the machines themselves or may

be parameters that could be changed through some user interface either on the

machine itself or via an attached computer terminal. Each machine could follow

a different format for capturing this information. In most studies that we

deal with, the DICOM[3], INTERFILE[20] or ANALYZE[1] file formats are used

to capture a significant amount of meta-data.

DICOM is a standard created by the National Electrical Manufacturers

Association (NEMA) to aid the distribution and viewing of medical images, such

as CT scans, MRIs, and ultrasound. Part of the standard describes a file format

for the distribution of images. Image files which are compliant with the DICOM

standard are known as DICOM format files. A single DICOM file contains both a

header, which stores information about the patient’s name, the type of scan, image

dimensions, etc., as well as all of the image data, which can contain information

in three dimensions.

Analyze[1] is an image processing program, written by The Biomedical Imaging

Resource at the Mayo Foundation. The Analyze format is the image file

format used by the Analyze program. It stores the image data

in one file (*.img) and the header data in a separate header file (*.hdr).

• Parameter data

Raw data sometimes may need to be massaged or normalized before any analysis

on it takes place. An example of this process is to re-sample raw data for image

reconstructions. This sort of data massaging is done based on a certain process

and the variables or parameters that belong to that process. All of the parameters

used in such data massaging also need to be captured in the database and

form part of the meta-data for that study process.

• Reconstruction Data

In nuclear medicine, the SPECT or PET camera rotates around the patient,

taking pictures of radioisotope distribution within the patient from different

angles. These ”pictures” acquired from the nuclear medicine camera are called

projections. The procedure to put the projections together to obtain a patient’s

image is called image reconstruction. Reconstructed images constitute the third

type of binary data we deal with. A wide range of reconstruction methods

can be used to reconstruct images. Each of these methods has its own

set-up parameters that drive the reconstruction process and generates parameters

as part of the reconstructed image. Again, these parameters may exist within

log files or parameter files that are generated by the reconstruction process. In

short, the typical meta-data includes processing parameters that form part of the

reconstruction process.

• Quantified Research Results

Research studies are conducted to evaluate some original research hypothesis and

are used to generate quantified research results. These results also need to be

captured so that they can be queried and shared with other research

organizations and publications.
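
The four levels of data listed above, and the lineage that links them, could be modelled roughly as below. This is only a sketch in Python; the class `DataFile`, its fields and the `lineage` helper are hypothetical names and do not reflect REMI's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class DataLevel(Enum):
    RAW = "scientific equipment generated raw data"
    PARAMETER = "parameter data"
    RECONSTRUCTION = "reconstruction data"
    RESULT = "quantified research results"


@dataclass
class DataFile:
    name: str
    level: DataLevel
    metadata: dict = field(default_factory=dict)  # e.g. processing parameters
    derived_from: Optional["DataFile"] = None     # the file this one was produced from


def lineage(data_file):
    """Walk the derived_from links back to the original raw data."""
    chain = []
    current = data_file
    while current is not None:
        chain.append(current.name)
        current = current.derived_from
    return chain
```

Keeping a link from each derived file to its source lets a reader of a reconstructed image trace it back through the processing parameters to the raw acquisition.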

REMI involves all four stages of data capture and attempts to include all meta-data

and lineage of the data files, thus maximizing their usefulness to any researcher,

even non-members of the lab. Figure 3.1 illustrates a particular SPECT study. The image

shows a patient lying on the machine gantry and being imaged by the detector

of the SPECT machine that rotates around the patient. The imaging data captured by the

SPECT machine forms the raw data captured. Vital to SPECT imaging and the process

of image reconstruction is the presence of two coordinate systems depicted in the image.

The first coordinate system is the Patient Coordinate System (PCS) shown by the x, y

and z axes. Points to note about the PCS are:

• the PCS origin lies at the patient’s right anterior abdomen

• x-axis points from the patient’s right to their left

• y-axis points from their chest to their back

• z-axis points from their feet to their head.

The second coordinate system is the Detector Coordinate System (DCS) shown by

V and U axes on the detector. The geometry of the DCS is as follows:

• the DCS origin is in the detector corner behind the operator’s head

• V points from the patient’s right anterior to left posterior

• U points from the patient’s feet to head.

All of this information, along with the length of each of these axes, forms part of the

parameters that prove useful in understanding the experiment and aid the researcher

during the image reconstruction process.

Figure 3.1: The patient coordinate system and detector coordinate system are depicted in the image

Chapter 4

Description of REMI

4.1 Overall description

REMI is a web-based system built to handle medical research data and is currently

being used by researchers within the Life Sciences Division at the Lawrence Berkeley

National Laboratory, California. The primary reason for building REMI is to enable

researchers to share their research studies with other research organizations. Most data

within REMI are files that are generated from either Diffusion Tensor MRI (DTMRI) or

SPECT or Micro-PET studies using scanners that are applicable to each of the different

modalities. Each of these data acquisition and processing studies is an expensive venture

in personnel time and resources and hence there is a need to collect, tag and store the

data in a reliable and consistent manner. Although REMI operates in a centralized way,

it can be used virtually anywhere as it is web-accessible. Organizations and researchers

are allowed to query the system through an easy and intuitive web interface.

4.2 System architecture

The web interface for REMI is built using the Ruby on Rails application framework.

Ruby on Rails (henceforth ’Rails’)[33] is an open source web application framework and

uses the MVC[38] architecture pattern to organize application programming. Figure 4.2

demonstrates the Rails implementation of the MVC architecture. The framework makes

web development an easy task by aiding the developer with ’out-of-the-box’ tools that

take care of common development tasks. One such important tool is the scaffolding

generator that generates for the developer the bare bone components of model, view

and controller based on the parameters provided to the scaffold generator. Apart from

providing such tools, Rails also follows a couple of development principles that make

developing and using this framework quite appealing.

Figure 4.1: To define a mapping between the model class and the database table, an XML configuration file is maintained to specify this mapping. (a) Hibernate XML Configuration File. (b) User Table Definition

A key principle of Rails development is ’Convention over Configuration’ (CoC).

This means that the programmer has to specify a particular mapping only if

the programmer deviates from the conventions set by the application framework. This

is quite a refreshing change as opposed to frameworks such as Java Platform, Enterprise

Edition (J2EE)[6] where integrations with the various components in the framework

need to be recorded within configuration files. Figure 4.1(a) shows a typical Hibernate

XML configuration file that defines the mapping of a User class and its attributes to

its corresponding database table and columns. Figure 4.1(b) shows the table definition

of the User table defined on the database. Under this convention,

defining a class named User automatically leads the application framework, in

this case Rails, to map it to a table named users (the pluralized class name) in the database. Also, attributes need not

be explicitly defined on the class, as Rails figures out the related attributes by querying the

table definition within the database. Doing away with configuration files reduces the

number of files the developer needs to maintain which results in a smaller codebase to

manage. Another principle that the Rails framework encourages is the ’Don’t Repeat

Yourself’ (DRY) principle of software development. Rails provides easy programming

constructs that help the user to do away with repetitive code being written in a number of

places. Programmers with the help of partials and modules can make the code compact

and modular by extracting common code into one of these two constructs.
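
The idea behind Convention over Configuration can be illustrated with a short sketch. The code below is written in Python with SQLite purely for illustration; it is not how Rails or ActiveRecord is implemented, and the `Model` base class, its methods and the naive pluralization rule are all hypothetical.

```python
import sqlite3


class Model:
    """Base class that derives its table and attributes by convention."""

    @classmethod
    def table_name(cls):
        # Naive pluralization for illustration; real frameworks handle irregular plurals.
        return cls.__name__.lower() + "s"

    @classmethod
    def column_names(cls, conn):
        # Ask the database for the table definition instead of declaring attributes.
        rows = conn.execute(f"PRAGMA table_info({cls.table_name()})").fetchall()
        return [row[1] for row in rows]

    @classmethod
    def find(cls, conn, record_id):
        # Build an instance whose attributes mirror the table columns.
        row = conn.execute(
            f"SELECT * FROM {cls.table_name()} WHERE id = ?", (record_id,)
        ).fetchone()
        if row is None:
            return None
        instance = cls()
        for name, value in zip(cls.column_names(conn), row):
            setattr(instance, name, value)
        return instance


class User(Model):
    pass  # no mapping file and no attribute declarations are needed
```

The `User` class contains no mapping information at all: the table name follows from the class name, and the attributes follow from the table definition.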

Figure 4.2: The Model-View-Controller architecture as implemented in Rails

Another characteristic of Rails is the emphasis on RESTful application design.

REST[29] is a style of software architecture based around the client-server relationship.

It encourages a logical structure within the application, which means that the REMI

application can easily be opened up as an API for other tools and programs to interact

with REMI. There are a number of competing application programming interface

standards such as Remote Procedure Calls (RPC)[23], Simple Object Access Protocol

(SOAP)[18] and Web Services Description Language (WSDL)[11]. Each of these

standards has its merits, but they are heavy and verbose in their implementations as

opposed to a RESTful API. The REST style makes use of the URL[22] string along with

a combination of the HTTP GET and POST requests to send instructions to the

server. A combination of the type of HTTP request along with the URL itself contains

information such as the component that we want to interact with, the operation that we

want to perform with that component and the ID for a particular instance that we are

interested in.
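
How the request type and the URL together identify a component, an operation and an instance ID can be sketched as follows. This is an illustrative Python function, not REMI's routing code; the function name `parse_request` and the operation names are assumptions, and only GET and POST are handled because those are the request types mentioned above.

```python
def parse_request(method, path):
    """Split an HTTP method and URL path into (component, operation, record_id)."""
    segments = [part for part in path.split("/") if part]
    if not segments:
        raise ValueError("path does not name a component")
    component = segments[0]
    record_id = int(segments[1]) if len(segments) > 1 else None

    if method == "GET":
        # Reading: a single record if an ID is present, otherwise the whole list.
        operation = "show" if record_id is not None else "index"
    elif method == "POST":
        # Writing: modify an existing record if an ID is present, otherwise create one.
        operation = "update" if record_id is not None else "create"
    else:
        raise ValueError(f"unsupported method: {method}")
    return component, operation, record_id
```

Under these assumptions, a GET request for `/experiments/42` asks to show the experiment with ID 42, while a POST to `/experiments` asks to create a new one.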

Figure 4.3 shows the same request implemented with SOAP and with REST. The web

service example is querying a phonebook application for the details of a given user. The

ID of the user is provided to the web-service. The SOAP implementation is sent to the

server via an HTTP POST request, and the data is received inside a SOAP response

envelope. Additional processing is required to process the SOAP response envelope to

fetch the returned data. Looking at the REST implementation, the construct is not a

request body, but just a simple URL. This URL is sent to the server using a simpler

GET request, and the HTTP reply is the raw result data, not embedded inside anything,

just the data one can directly use.
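
The difference in effort between the two styles can be sketched in a few lines of Python. This is not a reproduction of Figure 4.3: the operation name `GetUserDetails`, the field `UserId` and the URL layout are hypothetical, and only the SOAP 1.1 envelope namespace is taken from the standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def wrap_in_soap_envelope(operation, fields):
    """Build the XML envelope that a SOAP call sends in the body of a POST."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, operation)
    for name, value in fields.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def unwrap_soap_envelope(xml_text):
    """Extra processing step: dig the payload out of the envelope."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_NS}}}Body")
    payload = body[0]  # first element inside the Body
    return {child.tag: child.text for child in payload}


def rest_url(user_id):
    """The REST equivalent is just a URL that is fetched with a GET request."""
    return f"/phonebook/users/{user_id}"
```

The SOAP path needs both a wrapping and an unwrapping step, whereas the REST path needs only the URL.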

Figure 4.3: Difference between SOAP and REST. Example taken from an online resource[7]

Users interact with the REMI system via a web interface. All HTTP requests from

the clients are handled by an Apache HTTP server. The REMI system is built using the

Rails framework. Apache cannot directly communicate with the Rails application and

hence cannot hand off the requests coming from the clients to the Rails application.

Phusion Passenger[17] is a module for the Apache HTTP Server to handle deployment

of Ruby applications, including those built using the Ruby on Rails framework. To help

Apache with the task of request forwarding to the Rails application, Phusion Passenger

is installed.

For any application the user experience is one of the most important aspects of

application adoption and success. To provide a better user experience and to make

the user interface more responsive, we found it necessary to minimize

client-server communication. Our clients are thin clients but rely heavily on JavaScript

within our web pages to make the user interface highly responsive to user inputs.

We have also made use of the File[15] and Drag and Drop[5] APIs, which form part of

the HTML5[16] specifications, and asynchronous Ajax requests to improve the user

interaction with REMI.

Data-files and meta-data captured as part of each experiment are critical for

the research study being conducted. REMI is therefore expected to provide a highly

reliable, fault tolerant and secure data storage solution. The use of a RAID[37] storage

device, which addresses the reliability and fault-tolerance requirements, is highly recommended. Through

the use of redundancy, most RAID levels provide protection for the data stored on the

array. This means that the data on the array can withstand even the complete failure of

one hard disk (or sometimes more) without any data loss. A good RAID implementation

can improve data availability by providing fault tolerance and by providing special

features that allow for recovery from hardware faults without disruption. A RAID

device turns a number of smaller drives into a larger array, thereby making it

possible to create a data store of larger capacity than any single storage drive.

This facilitates applications that require large amounts of contiguous disk space, and

also makes disk space management simpler. RAID systems improve performance by

allowing the controller to exploit the capabilities of multiple hard disks to get around

performance-limiting mechanical issues that plague individual hard disks. Different

RAID implementations improve performance in different ways and to different degrees,

but all improve it in some way.
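
The trade-off between capacity and redundancy can be made concrete with a small calculation. The function below is an illustrative Python sketch covering only the common RAID levels 0, 1, 5 and 6 with identical drives; the function name `raid_summary` is hypothetical and nothing here describes the array actually used for REMI.

```python
def raid_summary(level, drive_count, drive_size_tb):
    """Return (usable capacity in TB, number of drive failures tolerated)."""
    minimum_drives = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum_drives:
        raise ValueError(f"RAID level {level} is not covered by this sketch")
    if drive_count < minimum_drives[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_drives[level]} drives")

    if level == 0:  # striping only: full capacity, no redundancy
        return drive_count * drive_size_tb, 0
    if level == 1:  # mirroring: every drive holds a full copy
        return drive_size_tb, drive_count - 1
    if level == 5:  # one drive's worth of parity
        return (drive_count - 1) * drive_size_tb, 1
    return (drive_count - 2) * drive_size_tb, 2  # RAID 6: two drives' worth of parity
```

For instance, four 2 TB drives in RAID 5 give 6 TB of usable space and survive the failure of one drive.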

Figure 4.4: Various software components that make up REMI

Based on the individual components explained above, Figure 4.4 shows a

high-level interaction diagram of the software components that work together to make

up the REMI system. The primary method of interacting with REMI is through a

web browser. The web interface makes use of JavaScript to handle client side user

interface manipulations and asynchronous interactions with the REMI server. The web

browser on the client side sends requests to the Apache web-server on the server. The

web-server hands off the requests to the Passenger module to interact with the Rails

application framework. The application framework houses the business logic to handle

all the requests coming from the client. Depending on the nature of the request, Rails

interacts with the database and in addition may also interact with the underlying file

system to fetch or store data files. The Rails framework communicates with the MySQL

database via the ActiveRecord object-relational mapping API. The ActiveRecord

library comes bundled with the Rails installation, so it forms part of the web-application

framework. To interact with the file system to store and retrieve data files, we have made

use of the core File class that comes as part of the Ruby language.

4.3 REMI database development

REMI makes use of a data model that is organized around a notion of an Experiment,

where an experiment is defined as being a body of scientific work on a particular

subject or subjects of investigation. Each experiment is a single scientific task, be it

an observation, measurement, or simulation. Raw data is generated for this task, which

is then analyzed to produce derived data. All of the data generated is then associated

with the Experiment, with proper meta-data filled out. Every experiment is conducted

and owned by a Researcher. The experiment process may include the need to make

use of an imaging device to gather and capture related experiment information. An

experiment is defined as a fundamental unit of research work, which may or may not

consist of smaller tasks. REMI defines an experiment, not as a series of events or tasks,

but as a research work that starts on a particular date and involves making use of at

most one medical device for that process. Hence on any given date an experiment could

make use of a single medical device and a number of Subjects could participate in that

experiment.

Following is a listing of all the logical entities of the data model for REMI. Figure

4.5 shows the relationships between each of the entities.

• Experiment

A fundamental unit of research work that has a name associated with it, a description

of what the experiment is about or what it aims to

accomplish, the date on which the experiment took place, and the modality to which

the experiment belongs.

• Subject

A subject represents any object or phenomenon that is observed for the purpose of

research. For the kind of experiments that we address, the subjects mostly belong

to the animal kingdom. Each subject is uniquely identified by an

assigned subject tag. Other information such as the species, gender, date of birth

and health condition is also captured. It is important to ensure that information

related to a subject is anonymized and that it is in no way possible to

identify a subject from the captured information.

• Researcher

An experiment is always conducted by a particular researcher. The researcher

owns the experiments they perform. Apart from the name of the researcher, it is

also necessary to capture the position they hold, the department that they belong to

and the institution or organization that they work for. Within the context of REMI,

a researcher is one who conducts experiments and then contributes to REMI by

uploading the various data-files and meta-data generated as part of the experiment

conducted.

• File

A file is a block of arbitrary information, or resource for storing information,

which is usually based on some kind of durable storage. Files are generated in

the process of conducting an experiment and contain information related to the

experiment being performed. Apart from the actual contents of the file, additional

information such as a file name, size of the file, date of creation and modification,

content type, location of the file within a hierarchical directory structure and fixity

information such as a checksum are also maintained with the file.

• Tag

A tag is a keyword or term assigned to a piece of information such as an Internet

bookmark, digital image, or in our case the data file. This kind of meta-data helps

describe an item and allows it to be found again by browsing or searching. Tags

represent semi-structured data that can be used to describe the files being attached

to an experiment.

• Machine

A Machine represents the instrument used, as part of an experiment, to gather

and capture data. Machines are varied in purpose, construction and capabilities.

Although a machine can be defined by numerous parameters, the ones that are

important for REMI to capture are the name of the machine, the manufacturer,

the physical address of the machine and number of detectors on the machine if

applicable.

• Detector

A Detector captures information about the detectors on a machine, such as the

type of collimator and the detector material used for an experiment.

Figure 4.5: Main Entities of REMI

The proposed logical entity relationship diagram for REMI is shown in Figure 4.5.

The Experiment entity is central to the entire REMI system. Experiments are conducted

by a particular Researcher and may be performed on one or more Subjects. A researcher

conducts a number of different experiments as deemed necessary within his area of

research. As part of the experiment, the researcher may make use of a particular

Machine to capture data or monitor subject and experiment conditions. A single

machine can be used to conduct different experiments. In the process of conducting

an experiment, data is captured/logged into one or more files. These files form part

of the entire experiment or could be files generated while studying a particular subject

that participates in the experiment. During the course of the experiment information

regarding the status of the subject needs to be captured. This information is represented

by attributes defined on the relationship between Subject and Experiment. Files are

generated as a result of performing an experiment. Since files cannot exist without being

part of a certain experiment, the File entity is shown as a weak entity whose existence

depends on the Experiment entity. A file can be associated with an experiment in two ways.

As mentioned earlier, certain files are generated while studying a particular subject that

participates in the experiment and hence should be associated with the ’participation’

relationship of the experiment and that subject. Every other file generated that does

not pertain to any particular subject is then associated directly with the experiment.

This association between Experiment and File or Experiment-Subject and File is also

captured within the logical ER diagram.
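
The relationships described above can be summarized in a short sketch. The Python classes below are illustrative only; the field names are assumptions and this is not REMI's actual ActiveRecord schema. Files attached to a subject's participation and files attached directly to the experiment are kept separate, mirroring the two kinds of association in the ER diagram.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Researcher:
    name: str
    institution: str


@dataclass
class Machine:
    name: str
    manufacturer: str


@dataclass
class Subject:
    tag: str      # anonymous, uniquely assigned subject tag
    species: str


@dataclass
class File:
    name: str
    checksum: str  # fixity information


@dataclass
class Participation:
    """Subject-Experiment relationship, carrying its own attributes and files."""
    subject: Subject
    status: str
    files: List[File] = field(default_factory=list)


@dataclass
class Experiment:
    name: str
    performed_on: date
    modality: str
    researcher: Researcher
    machine: Optional[Machine] = None  # at most one machine per experiment
    participations: List[Participation] = field(default_factory=list)
    files: List[File] = field(default_factory=list)  # files not tied to any subject

    def all_files(self):
        """Collect experiment-level files followed by per-subject files."""
        collected = list(self.files)
        for participation in self.participations:
            collected.extend(participation.files)
        return collected
```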

Although the data model is centred around the Experiment entity, the primary objects

for REMI are the data files that get associated with the experiments. Having explained the

importance and the role that meta-data can play in medical data management systems

such as REMI, it is important to understand the nature and characteristics of the

meta-data being captured. All of the meta-data that gets captured can be broken down into two

distinct categories: firstly, medical image information, i.e. the data of each image file

and the results of the image processing, which can be considered semi-structured

data; secondly, information on the research study, subjects, researchers or principal

investigator, which is highly structured data. Structured meta-data are those attributes

that we know to be important at design-time, and some may be query-able, i.e.

some users may want to access data-files that have particular attribute values (e.g., "search

human sinograms where reconstructed image is available", or "search list-mode raw data

for all hypertensive rats"). Other structured meta-data may never be queried on, e.g.,

processing parameters, and yet must be available. Semi-structured meta-data may be

query-able, but we know neither those attributes nor their values at

design-time. They are designed as tags or key-words whose values are user provided. The query

engine may do string search on them. Unstructured meta-data are textual descriptions

that are open for viewing only, and the database has no search capability on them.

Identifying the distinction between these two sets of data is critical in the design and development of the REMI database. The structured data can be easily mapped to a relational database model, but a relational data model does not work for the semi-structured data, which is dynamic in nature: we know neither its attributes nor their values at design time.

The logical ER diagram includes some additional entities to handle user privileges; they are explained below.

4.3.1 Roles and Abilities

Roles are entities that govern how users can interact with REMI. Roles are assigned to users of the system. Four roles are defined on the system out of the box.

• Public User

Public User is not an explicit role defined on the system; a user who has no other role assigned is automatically a public user. Any user is a public user by default. A public user can browse and query the REMI system for the various studies within REMI and can download data files associated with the various experiments. In short, one can view the system via this role but cannot contribute or add data back into the database.

• Contributor

A contributor can contribute experiments to the REMI system. In addition to viewing other experiments, he can create, update, and delete experiments. He also has privileges on Subjects and Machines, so he can perform Create, Read, Update, and Delete (CRUD) operations on these entities too.

• Data Authenticator

It is important to ensure that the REMI system is used mainly for storing and sharing medical experiment data. There is potential to use the system for file sharing purposes other than medical data files. To ensure that experiments, and more importantly the files being uploaded, are related to the experiment, some sort of gate keeping is needed before the experiments can be made public. A Data Authenticator is primarily tasked with authenticating the data being uploaded and granting permission for that particular experiment to be made public.

• System Administrator

A System Administrator is in charge of maintaining the Roles, Users and the Categories within the system. He has CRUD privileges on all three of these entities.

4.3.2 Categories and Category-Values

Every entity/class comprises a list of attributes. An instance of an entity is described by the data stored in each of its attributes. For some attributes, defined on a number of entities within REMI, the value is chosen from a set of pre-defined values. Each of these attributes has its own unique set of values to choose from. We refer to this mapping from an attribute to its predefined list of values as categories and category-values. Modeling data of this nature is also known as List-of-Values (LOVs)[8]. Many attributes, such as gender, species, modality, and data types, are stored within the REMI system as categories, and the possible values for each of these attributes are stored as category-values.
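
The category and category-value mapping described above can be sketched in plain Ruby. This is an illustration only, not the REMI schema: the Struct names, the in-memory arrays standing in for the two tables, and the sample values are all assumptions.

```ruby
# Hypothetical in-memory stand-ins for the category and category-value tables.
Category = Struct.new(:id, :name)
CategoryValue = Struct.new(:id, :category_id, :value)

CATEGORIES = [
  Category.new(1, 'gender'),
  Category.new(2, 'species')
].freeze

# Sample values are illustrative only.
CATEGORY_VALUES = [
  CategoryValue.new(1, 1, 'male'),
  CategoryValue.new(2, 1, 'female'),
  CategoryValue.new(3, 2, 'human'),
  CategoryValue.new(4, 2, 'rat')
].freeze

# Return the predefined list of values for a named category.
def allowed_values(category_name)
  category = CATEGORIES.find { |c| c.name == category_name }
  return [] unless category

  CATEGORY_VALUES.select { |v| v.category_id == category.id }.map(&:value)
end

# An attribute value is acceptable only if it appears in its category's list.
def valid_value?(category_name, value)
  allowed_values(category_name).include?(value)
end

puts allowed_values('species').join(', ')
puts valid_value?('gender', 'female')
```

Keeping the lists in data rather than in code means that a new value can be added without a schema change, which is the property the List-of-Values approach is chosen for.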

Figure 4.6: Physical Design Diagram. Refer to Appendix D for a complete listing of all tables.

Having defined the logical entities essential for REMI, the natural next step is to translate the logical ER diagram to a table design diagram. Figure 4.6 shows the various tables and how each table is related to another. Each entity in the logical ER diagram translates to an individual table within the database. If two entities share a one-to-many relationship, then the table that represents the "many" side of the relationship includes an additional attribute that maps to the unique identifier of records in the table that represents the "one" side. Two entities that share a many-to-many relationship have tables that represent each of those two entities and an additional third table to store the associations between them. If a many-to-many relationship has attributes defined on it, then those attributes are added to this third table. The Experiment and Subject entities share a many-to-many relationship, and this relationship has Condition and Weight as attributes defined on it. The ExpSubjectAssociation table is defined on the database to represent this relationship, and this table contains columns to capture the condition and weight attributes.

Another interesting relationship that we needed to model properly as database tables is the relationship between File, Subject and Experiment. Depending on how a researcher chooses to define an experiment, an experiment may contain zero, one, or many subjects. If a researcher does choose to associate more than one subject with an experiment, then certain files generated over the course of the experiment may pertain to a specific subject that participated in that experiment. Hence those files need to be associated with the relationship that is defined between that experiment and that subject. Within that same experiment, some files may be generated that relate more to the experiment than to a specific subject participating in it. Examples of such files would be experiment calibration files or README files, to name a few. These files need to be associated directly with the experiment itself. The File entity translates to a table called file on the database. The Experiment and Subject entities have their own tables, and the ExpSubjectAssociation table is used to capture the many-to-many relationship between these two entities. An experiment could generate many files and one file could be part of a number of experiments: a file could be generated as output of one experiment and that same file could act as an input for another experiment. To capture this many-to-many relationship between Experiment and File, a third table called FileAssociation is created. Similarly, a file can be associated with the same subject but different experiments, and an association of subject and experiment can generate more than one file. Hence this also forms a many-to-many relationship between File and the Experiment-Subject relationship. The FileAssociation table can be used to capture both these relationships with the File entity. The attributes defined on the FileAssociation table are:

• file_id: Unique identifier from the File table.

• owner_id: Unique identifier either from the Experiment table or the ExpSubjectAssociation table.

• owner_type: String field that is used to indicate if the record is related to the Experiment table or ExpSubjectAssociation table.
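
The way a single association table can point at two different owner tables can be sketched as follows. This is a plain-Ruby illustration under stated assumptions, not the REMI model code: the Structs, the in-memory hashes standing in for tables, the sample records, and the exact strings stored in the owner type field are all hypothetical.

```ruby
# Hypothetical stand-ins for rows of the three tables involved.
Experiment = Struct.new(:id, :name)
ExpSubjectAssociation = Struct.new(:id, :experiment_id, :subject_id, :condition, :weight)
FileAssociation = Struct.new(:file_id, :owner_id, :owner_type)

# In-memory "tables" keyed by primary key (sample data is illustrative).
EXPERIMENTS = { 1 => Experiment.new(1, 'Calibration run') }.freeze
EXP_SUBJECTS = { 7 => ExpSubjectAssociation.new(7, 1, 3, 'hypertensive', 250) }.freeze

# Resolve the owner of a file association: owner_type selects the table,
# owner_id selects the row within it.
def owner_of(association)
  case association.owner_type
  when 'Experiment'
    EXPERIMENTS.fetch(association.owner_id)
  when 'ExpSubjectAssociation'
    EXP_SUBJECTS.fetch(association.owner_id)
  else
    raise ArgumentError, "unknown owner_type: #{association.owner_type}"
  end
end

readme = FileAssociation.new(10, 1, 'Experiment')           # experiment-level file
scan = FileAssociation.new(11, 7, 'ExpSubjectAssociation')  # subject-specific file

puts owner_of(readme).name
puts owner_of(scan).condition
```

The trade-off of this design is that the owner id column cannot carry an ordinary foreign-key constraint, since the table it refers to depends on the value of the owner type column.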

While designing our database we tried to follow data normalization processes. Our database is in the second normal form, thereby ensuring that every column value in the database is atomic and every non-key column is fully dependent on the (entire) primary key. The database currently is not in the desired third normal form because the file table is a big de-normalized table containing not only the attributes of a file but also the attributes of all the meta-data that is associated with that file. There is a fair amount of duplicate data that gets stored in this table, as the same meta-data can be shared by more than one file. The de-normalization of meta-data and file attributes was necessary to make the storing and retrieving of data fairly simple. Breaking the meta-data into various entities would mean joining a large number of tables to search for files. Also, since each file has its own meta-data attributes, updating the meta-data for a particular file is fairly easy. This ease of handling meta-data related to files comes at the cost of duplication of data. The exp_file table is a de-normalized table containing file information as well as additional information that is not directly associated with the file but is information about the data within the file. This data forms the meta-data pertaining to the file. Currently the additional fields contained within exp_file are a static list of attributes that were identified along with the researchers as critical data needed for storage and retrieval purposes. Its being a static list is a big limitation that we discuss in chapter 7.

For the database development we made use of MySQL server 5.1. The Rails

application framework provides a facility called Migrations. Migrations are a

convenient way to alter a database in a structured and organized manner. Migrations

allow one to use Ruby to define changes to a database schema, making it possible to

use a version control system to keep things synchronized with the actual code. Each

change in the database schema has its own migration object, encapsulating a move up

and a move down. Rails migrations allow data migration as well as schema migration,

and migration code is database independent. We have made use of migration files to

generate all of the tables within REMI.

4.4 REMI application development

Our initial development started with a system built on the Linux, Apache, MySQL, and PHP (LAMP) stack, based on a paper[41] published by Hugh E. Williams. This was a very basic system that would allow anyone to upload files to a server. We tried building the current system on top of this work, but it became evident as the project got bigger that an established web framework that is easy to iterate on was needed, and hence we decided to move our project to the Rails framework.

Rails uses the MVC[38] architecture pattern to organize application programming and makes use of the Active Record object-relational mapping layer. Refer to appendix C for details on the MVC pattern and the layout of different components within the REMI source code. For every table within the REMI database there is a corresponding model class defined within the framework. It is through these model classes that Rails communicates with the database and performs CRUD operations on the respective tables. A single instance of a particular model class maps to a single tuple of the corresponding table in the database. In addition to basic CRUD operations, it is good Rails programming practice for models to encapsulate the business logic. Business logic could reside in the model class itself, or it could be placed in modules whose code the models call. To elaborate on this point, the ExpFile model, in addition to the basic CRUD operations that are performed on the database table Files, also contains the business logic to store files on the file-server. Logic related to generating the file path, storing the intermediate files, and merging the intermediate files all resides within the ExpFile model. Figure 4.7 shows the various software components, including the views, controllers and models, that are a part of the REMI system.

Figure 4.7: Various system components and their interaction with the System Roles

4.4.1 Authentication

Although REMI has been developed with the intention that anyone from anywhere should be able to access and download experimental data, there are processes within REMI that we do not want to expose to the general public. Functions such as contributing new experiments to REMI, authorizing new experiments to be made available to public users, and administering user resources and roles are privileged functionality that we want to make available to certain groups of people only after proper credential validation. A simple username and password authentication system is put in place to validate users. Once a user's credentials are validated, different menus are exposed depending on the role of the user, through which the user can perform more advanced actions.

To ensure that authentication rules are applied correctly, an ability module is placed between every controller and model interface. It makes sense to have the module sandwiched between the controller and the model because it is the model that deals with the data layer within REMI, and before we provide access to the data proper privilege checks need to be made. The ability module looks at every request that a controller makes on behalf of the user and validates whether the user has permission to perform that action on the model. Abilities to perform an action on a model are associated with roles. Assigning a role to a user in effect provides him the ability to perform the actions on models that the role defines.
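
The relationship between roles, abilities and models can be sketched as a simple lookup. This is a minimal plain-Ruby illustration, not REMI's ability module: the table layout, the action names (including :publish), and the exact grants per role are assumptions derived from the role descriptions in section 4.3.1.

```ruby
CRUD = [:create, :read, :update, :delete].freeze

# Hypothetical role table: role name => model name => permitted actions.
ROLE_ABILITIES = {
  'Contributor' => { 'Experiment' => CRUD, 'Subject' => CRUD, 'Machine' => CRUD },
  'Data Authenticator' => { 'Experiment' => [:read, :publish] },
  'System Administrator' => { 'Role' => CRUD, 'User' => CRUD, 'Category' => CRUD }
}.freeze

# Every user, with or without assigned roles, keeps the public abilities.
PUBLIC_ABILITIES = { 'Experiment' => [:read], 'ExpFile' => [:read] }.freeze

# Check whether a user holding the given roles may perform an action on a model.
def can?(roles, action, model)
  grants = [PUBLIC_ABILITIES] + roles.map { |role| ROLE_ABILITIES.fetch(role, {}) }
  grants.any? { |grant| grant.fetch(model, []).include?(action) }
end

puts can?([], :read, 'Experiment')                 # a public user browsing
puts can?([], :create, 'Experiment')               # a public user contributing
puts can?(['Contributor'], :create, 'Experiment')  # a contributor contributing
```

A controller would call such a check before touching the model, so that the privilege decision stays in one place rather than being repeated in every action.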

4.4.2 File Storage

A challenge in designing a file storage system is implementing a good file and directory structure to store a potentially large number of files. The simplest possible solution to storing files within a file system is to have all of the files under a single parent directory. This sort of design is very easy to implement. The disadvantage of such a scheme is that, at the operating system level, any file or directory commands executed on this directory would be slow and unresponsive. Another approach is to create an individual directory for each experiment. This again can eventually lead to a large number of sub-directories as the number of experiments within REMI increases. The challenge is to come up with a mechanism to create a directory structure that can grow and shrink autonomously. We have used the following method to create directory structures on the fly. Every file that gets uploaded to the database has a file name. We concatenate the experiment id with the file name and generate a SHA-256 hash, which is a string of 64 hexadecimal characters. Depending on two system parameters, "directory levels" and "directory name size", the directory structure is automatically generated from the hash value and the file is stored under the last directory. The parameter "directory name size" determines how many characters of the hash value are used to name each directory. The parameter "directory levels" determines the depth of the directory structure, which in turn determines the number of different directory paths that can be generated for a given "directory name size".

Figure 4.8: Possible directory structure created with directory levels and directory name size both set to 2

Following is a sample hash that gets generated:

fb80f1735b3153e7a41d38390de5d9773c35259965b835b859a32bfa3713bd4b

Now if "directory levels" is set to 2 and "directory name size" is set to 2

then using the above hash value we generate directory "fb" at level 1 and

another directory "80" at level 2. Hence the directory structure that you get is

ROOT DIRECTORY/fb/80/<experiment id> <filename>.<extension>
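
The directory derivation above can be sketched with Ruby's standard Digest library. This is a minimal sketch, not the ExpFile model's code: the helper names, the keyword parameter names, the way the experiment id and file name are concatenated before hashing, and the underscore in the stored file name are all assumptions.

```ruby
require 'digest'

# Sample hash quoted in the text above.
SAMPLE_HASH = 'fb80f1735b3153e7a41d38390de5d9773c35259965b835b859a32bfa3713bd4b'

# Hypothetical helper: take directory_name_size characters of the hex digest
# for each of directory_levels nested directories.
def hashed_directories(hex_digest, directory_levels, directory_name_size)
  (0...directory_levels).map do |level|
    hex_digest[level * directory_name_size, directory_name_size]
  end
end

# Hypothetical helper: build the full storage path for a file. Concatenating
# the id and name without a separator is an assumption.
def storage_path(root, experiment_id, filename, directory_levels: 2, directory_name_size: 2)
  digest = Digest::SHA256.hexdigest("#{experiment_id}#{filename}")
  directories = hashed_directories(digest, directory_levels, directory_name_size)
  File.join(root, *directories, "#{experiment_id}_#{filename}")
end

puts hashed_directories(SAMPLE_HASH, 2, 2).join('/')
puts storage_path('/remi_files', 42, 'scan.img')
```

With both parameters set to 2, each level can hold at most 16^2 = 256 directories, so the tree has at most 65,536 leaf directories regardless of how many experiments exist. Because the hash depends only on the experiment id and file name, the same file always maps to the same path.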

4.4.3 Menu

The main menu as seen by a public user is dynamic in nature. Menu sections such as Modality, DataSet, and Species have their values retrieved from the REMI database. These three menu items are stored as categories within the database, and their values are stored in the category-values table. The reason for this is that new modalities, datasets, or species can be added by the System Administrator, and those new entries need to surface in the public user interface as part of the menu. All other, static menu items are retrieved from the menu.json file that resides under the folder app/configs. The menu structure can be changed by modifying the contents of the menu.json file.
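
Combining database-driven menu sections with static entries from a JSON file can be sketched as follows. The structure of menu.json is not shown in this text, so the JSON shape, the labels and paths, the sample category values, and the helper name are all assumptions made for illustration.

```ruby
require 'json'

# Hypothetical contents of menu.json (assumed shape).
STATIC_MENU_JSON = <<~JSON
  [
    { "label": "Home",  "path": "/" },
    { "label": "About", "path": "/about" }
  ]
JSON

# Hypothetical stand-in for a query against the category-values table.
CATEGORY_VALUES = {
  'Modality' => %w[PET SPECT],
  'DataSet'  => %w[Sinogram List-mode],
  'Species'  => %w[Human Rat]
}.freeze

# Build the menu: one dynamic section per category, followed by static items.
def build_menu(static_json, category_values)
  dynamic = category_values.map do |category, values|
    { 'label' => category, 'children' => values.map { |v| { 'label' => v } } }
  end
  dynamic + JSON.parse(static_json)
end

build_menu(STATIC_MENU_JSON, CATEGORY_VALUES).each do |item|
  children = item.fetch('children', []).map { |child| child['label'] }
  puts children.empty? ? item['label'] : "#{item['label']}: #{children.join(', ')}"
end
```

Under this arrangement a value added to a category appears in the menu on the next request, while reordering or renaming static entries only requires editing the JSON file.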

4.4.4 Behaviour Driven Development

Behavior driven development (or BDD)[36] is an agile software development technique

that we followed. BDD focuses on obtaining a clear understanding of desired software

behavior through discussion with stakeholders. It extends Test Driven Development

(TDD) by writing test cases in a natural language that non-programmers can read.

Behavior-driven developers use their native language in combination with the ubiquitous

language of domain driven design to describe the purpose and benefit of their code.

This allows the developers to focus on why the code should be created, rather than

the technical details, and minimizes translation between the technical language in

which the code is written and the domain language spoken by the business, users,

stakeholders, project management, etc. BDD relies on the use of a very specific (and

small) vocabulary to minimize miscommunication and to ensure that everyone, that is

the business, developers, testers, analysts, and managers, are not only on the same page

but use the same words.

While using BDD, the idea is to use stories for planning and estimation. But once

we deliver working code it is more natural to talk about code in terms of features rather

than stories. And since stories in later iterations can lead to enhancements of existing

features, it is much easier to keep things organized by feature and just add new scenarios

to the existing features.

Figure 4.9: The Behaviour Driven Development Life Cycle. Picture taken from The RSpec Book[25]

We typically use Cucumber[2] to describe behavior of the application from the

outside and RSpec[10] to describe the behavior of its component parts. Test driven

development follows the familiar red/green/refactor cycle, where we start off with a

failing test case indicated by the color red, then we write code so as to make the failing

test pass which is indicative of the color green. Now we can refactor the code with the

assurance that the test we just wrote would catch any bug we introduce as a result of

the refactor. With the addition of a higher level tool like Cucumber, we actually have

two concentric red/green/refactor cycles, as depicted in figure 4.9. Both cycles involve

taking small steps and listening to the feedback from the tools. We start with a failing

step (red) in Cucumber (the outer cycle). To get that step to pass, we drop down to RSpec

(the inner cycle) and drive the underlying code at a granular level (red/green/refactor).

At each green point in the RSpec cycle, we check the Cucumber cycle. If it is still red,

the resulting feedback should guide us to the next action in the RSpec cycle. If it is

green, we can jump out to Cucumber, refactor if appropriate, and then repeat the cycle

by writing a new failing Cucumber step.

As part of developing REMI we decided to practice BDD. Given this approach, we ensured that we had test cases in place before writing any piece of code. Figure 4.10 depicts the scenario of Creating a new Category, which forms part of the feature A System Administrator manages Categories. Similarly, figure 4.11 illustrates two scenarios, each of which is a user logging in either as a Contributor or as a System Administrator. Both of these scenarios are part of the Log In feature.

Figure 4.10: A system administrator can create new categories in REMI

Figure 4.11: Users should be able to Log into REMI

To elaborate on the development process using BDD, let us look at the Login feature and specifically the first scenario. The scenario tests the login process for a user who has the Contributor role assigned to him. The Given...And...Then statements define the scenario. It states that, given a user is logged in with the 'Contributor' role, when he clicks the Contribute link he should be taken to the list of all experiments page. The list of all experiments page is only accessible to a contributor. Testing this scenario throws an error at the very first step: Cucumber is not able to interpret the statement "I am logged in as a user in the (.*) role". This is shown in figure 4.12.

Figure 4.12: Cucumber fails to interpret the statement

To help Cucumber understand that particular step, we write code explaining what needs

to be done as part of that step. Figure 4.13 shows the code for that particular step. Based

on the role passed as part of the description of that step we fetch that corresponding role

from the database. Then we make use of a test fixture to create a user and assign that

user the role. Then we visit the log in page and log into REMI using this test user. Upon

logging into REMI, the user should be directed to the home page and that particular

check is being made at the end of that piece of code.
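
How the role is extracted from the step text can be illustrated with the regular expression quoted earlier. This plain-Ruby sketch shows only the pattern matching that passes the role name into a step definition; it is not the step definition of figure 4.13, and the constant and helper names are hypothetical.

```ruby
# The step pattern quoted in the text; the capture group receives the role.
STEP_PATTERN = /^I am logged in as a user in the (.*) role$/

# Hypothetical helper: return the captured role name, or nil when the step
# text does not match the pattern.
def role_from_step(step_text)
  match = STEP_PATTERN.match(step_text)
  match && match[1]
end

puts role_from_step('I am logged in as a user in the Contributor role')
puts role_from_step('I am logged in as a user in the System Administrator role')
```

Because the role is captured rather than hard-coded, one step definition serves both scenarios of the Log In feature.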

Figure 4.13: Providing a definition for a step that Cucumber does not understand.

Figure 4.14: All the Cucumber tests pass now.

Now we run the test again and we notice that all the tests pass, as indicated in figure

4.14. As for the other two steps in that scenario, the cucumber-rails gem installs code

that can interpret the rest of the steps. Using the cucumber-rails gem one can interpret

clicking links and buttons, being redirected or landing on pages, filling out of fields in

forms and validating the value of fields on forms.

Chapter 5

Status report

REMI has been deployed into production and is being used by the Nuclear Imaging

group of the Life Sciences Division within the Lawrence Berkeley National Laboratory.

The system is being used to upload and share experimental data with other research

organizations like UC Davis and UC San Francisco in China Basin, San Francisco. As

of today REMI contains 175 experiments within the system and manages 8105 files that

are associated with one of these experiments. There is an active initiative to upload all

experiments conducted in the past, together with their associated data files, to REMI.

REMI resides on an Apple Xserve (Xserve3,1) system with the latest Mac OS X Server

10.7.1 operating system installed on it. The Apple Xserve machine houses two

Quad-Core Intel Xeon 2.26 GHz processors with 6 GB of RAM. The server is

connected to a RAID array through Fibre Channel cables. The RAID array consists of

2 SATA drives.

A complete hardware configuration of REMI is provided below

1. RAID Arrays:

(a) Promise VTrak E610f - 16x 2 TB SATA

(b) Promise VTrak E610f - 16x 1 TB SATA

2. Capacity:

(a) has a raw capacity of 32 TB and a formatted capacity of 26 TB (RAID 6)

(b) has a raw capacity of 16 TB and formatted capacity of 13 TB (RAID 6)

3. Server:

• Apple Xserve (Xserve3,1)

• 2 x Quad-Core Intel Xeon 2.26 GHz processors

• 6 GB RAM

• Operating System: Mac OS X Server 10.7.1

The RAID arrays are directly attached to the server using Fibre Channel cables. The single Apple Xserve houses the MySQL database and the application server (Rails) and is directly connected to the RAID device. Hence all the software components that make up REMI are present on a single server machine.

Chapter 6

Lessons learned

During the course of building REMI we faced a number of challenges. One of the basic requirements of REMI is the ability to upload research data files. The data files vary in data type, file encoding format and file size, to name a few. One of the challenges we knew upfront was that REMI needs to handle uploading and downloading of large files. Data files generated by various experiments vary from a few kilobytes to a little less than 100 gigabytes. REMI would need to be able to handle such large volumes of data. Uploading a file is done within the context of an update operation on an experiment. The process is to add a file to an experiment and save the experiment. The new file then gets uploaded to the server and gets associated with that particular experiment. A large number of files can be added to an experiment within a single workflow. In order to handle updating the experiment and the newly attached files, we let each file get uploaded within a separate Ajax request, one for each file, and the final update request for the experiment is sent to the server as a regular HTTP request. Our assumption was that each XMLHttpRequest[14] would be able to handle the upload of a file irrespective of its size. This assumption turned out to be false, as each browser has a maximum cap on the amount of data that can be transferred via a single request. We tried to confirm the cap on file size for browser uploads by looking up published information for each of the browsers on the respective browser websites. The limit on file size varies depending on the browser used. The current versions of all the browsers now support a maximum upload file size of 4 GB¹,²,³.

During our test phase we also noticed that REMI was not able to upload some of

our test files. For such files we noticed that the browser sends a synchronization request

to the server and instead of beginning the file transfer, the browser will just not send

any more requests. We confirmed this behavior by monitoring the requests being sent

between client and server using a tool called Wireshark[13]. Because of this behavior we

noticed that the server will start a connection and will wait on the browser for the next

incoming request, but since the browser will not send any further requests the connection

on the server will time out. Although we could not get to the root cause of this problem

we have a strong belief that the data encoding of the file and the manner in which the file

was being sent across from the browser to the server within the Ajax request was causing

a problem. The file should have been transferred as a binary large object (BLOB) but

instead was being transferred as a regular data file. We could have just created a BLOB

object for the entire file and the browser could then send the upload request to the server,

but this would not be a good solution as creating a BLOB for a large file increases the

memory footprint of the browser.

¹ http://code.google.com/p/gss/issues/detail?id=94

² http://blogs.msdn.com/b/ieinternals/archive/2011/03/10/wininet-internet-explorer-file-download-and-upload-maximum-size-limits.aspx

³ http://www.motobit.com/help/scptutl/pa98.htm

Figure 6.1: Sequence Diagram for the File Upload Process. Refer to figures 6.2, 6.3, 6.4, and 6.5 for more legible images.

To tackle the problems mentioned above, we resorted to uploading files from the client to the server in chunks. Each chunk being uploaded is 250 MB in size, which is well below the maximum cap on file size for data uploads through a single browser request. Uploading the files in chunks also makes it possible to resume an upload in case a file upload fails for any reason. The resume feature is currently not supported by REMI and is part of our future road map. However, a file may be uploaded again from scratch. Currently we make use of the mozSlice method to read file chunks, and this method is specific to the Firefox browser. A similar method, webkitSlice, exists for browsers that are based on the WebKit[12] engine. Work is ongoing on standardizing this method across all browsers as part of the HTML5 initiative. The mozSlice/webkitSlice method returns a new BLOB containing the data in the specified range of bytes of the source. This helps us with the second issue of ensuring that all files are being sent to the server as BLOB objects.
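
The byte-range arithmetic behind chunked uploads can be sketched as follows. In REMI the slicing happens in the browser through JavaScript; this Ruby version only illustrates the calculation, and the function name, the hash keys, and the reading of 250 MB as 250 × 1024 × 1024 bytes are assumptions.

```ruby
# Chunk size from the text, interpreted here in binary units (an assumption).
CHUNK_SIZE = 250 * 1024 * 1024

# Return the half-open byte ranges [start, stop) that cover a file of the
# given size, flagging the last range so the receiver knows when to merge.
def chunk_ranges(file_size, chunk_size = CHUNK_SIZE)
  ranges = []
  offset = 0
  while offset < file_size
    stop = [offset + chunk_size, file_size].min
    ranges << { start: offset, stop: stop, final: stop == file_size }
    offset = stop
  end
  ranges
end

mib = 1024 * 1024
chunk_ranges(600 * mib).each do |range|
  size_mib = (range[:stop] - range[:start]) / mib
  puts "bytes #{range[:start]}...#{range[:stop]} (#{size_mib} MiB) final=#{range[:final]}"
end
```

Each range corresponds to one slice call and one request, so a file approaching 100 gigabytes is sent as roughly 400 requests, each far below the per-request cap.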

Figure 6.1 shows the entire sequence diagram during the upload process. A detailed

explanation of each of the steps is provided below based on the different parts of the

sequence diagram mentioned above.

The upload process starts with the user clicking the Update Experiment button in the web browser. Clicking the button calls a JavaScript function that takes all of the meta-data added for each new file and creates one object per file and meta-data association (Step 1.1 in figure 6.2). These objects are then loaded into an upload queue that is part of the file uploader (Step 1.2 in figure 6.2). Once all the file objects are loaded into the queue, we call the upload method on the file uploader (Step 1.3 in figure 6.2). Details of the upload process are explained further down. Once the files are uploaded, another POST request is sent to the server to update the other experiment-related details (Step 1.4 in figure 6.3) and, depending on whether the files and experiment data have been successfully uploaded, a response is sent back to the browser.

Figure 6.2: Sequence Diagram for the File Upload Process: Part 1


To describe the details of the upload process, when the upload method is called on

the file uploader, the file uploader starts to process the queue of files. For each file a

new Ajax (XMLHttpRequest) object is created (Step 1.3.1 in figure 6.2). Then we start

to transfer the file to the server in parts. We first find out the chunk size (Step 1.3.2

in figure 6.2) and then we send that chunk to the server by calling the save file API on

the experiment controller (Step 1.3.3 in figure 6.2). A response of the chunk file being

saved is then received from the server (Step 1.3.4 in figure 6.3).

Figure 6.4 shows the details of saving the file chunk on the server.

The client browser is responsible for creating the chunk files and transferring them

to the server. The browser keeps track of how much of the file has been transferred

and how much is remaining. The web browser sets an additional final flag to true when it

is the last chunk of the file to be transferred. This final flag indicates to the experiment

model that the entire file has been transferred and that it needs to concatenate those file

chunks. Once the chunk files are concatenated and a single file is created, the meta-data

associated to the file is then stored in the database. This behavior is shown in figure 6.5.

Once the file is concatenated the status response is sent back to the client browser.
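
The server-side step of storing chunk files and concatenating them once the final flag arrives can be sketched with the Ruby standard library. This is a minimal sketch, not the REMI model code: the helper names, the chunk file naming scheme, and the decision to delete chunks after merging are assumptions.

```ruby
require 'tmpdir'

# Hypothetical helper: store one received chunk under a zero-padded index so
# that a lexical sort restores the upload order.
def save_chunk(directory, index, data)
  File.binwrite(File.join(directory, format('chunk_%05d', index)), data)
end

# Hypothetical helper: called when the final flag is set. Streams every chunk
# into the target file in order, then removes the chunk files.
def merge_chunks(directory, target_path)
  chunk_paths = Dir.glob(File.join(directory, 'chunk_*')).sort
  File.open(target_path, 'wb') do |target|
    chunk_paths.each do |path|
      File.open(path, 'rb') { |chunk| IO.copy_stream(chunk, target) }
    end
  end
  chunk_paths.each { |path| File.delete(path) }
  target_path
end

Dir.mktmpdir do |directory|
  %w[first- second- last].each_with_index { |data, i| save_chunk(directory, i, data) }
  merged = merge_chunks(directory, File.join(directory, 'merged.bin'))
  puts File.binread(merged)
end
```

Streaming each chunk with IO.copy_stream avoids reading a whole 250 MB chunk into memory, and writing the merged file only after the last chunk arrives matches the behaviour described above, where a file becomes visible only once it is complete.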

Since REMI is dealing with medical imaging data that is very critical to research

organizations it becomes crucial to ensure that the data files get uploaded in a consistent

and reliable manner. Now that we upload files in chunks it becomes even more important

to ensure that files get split, copied and merged back on the server in a very robust

way. The entire process needs to be fail-safe and should be able to recover from

failures. Currently we have not implemented the feature to resume (restart from where

it failed) file uploads but the current working of the file upload process does ensure that

the uploaded file appears within the REMI system only after it has been successfully

copied over to the file server. In the case of any failed uploads we are aware of chunk

files remaining on the file server, but any attempt to upload the same file again within

the same experiment will over-write the previous chunk files. The combination of the

internal experiment number and the file name ensures that the file will always get copied

to the same directory structure. Any failure during the process of uploading a file does

not restrict the user from adding the file back to the experiment and trying to upload

the file again. It is assumed that no two files to be attached in the same experiment will

have the same name and that any modification to the file names would be done before

deciding to upload the files to REMI.

Although we went ahead with slicing the files and uploading the files using

JavaScript and Ajax, we were aware of other solutions that exist to handle file uploads.

These solutions would need the browser to run an additional plug-in to help the upload

process. One such solution is to make use of a Java applet to handle the file uploads. The

applet would run within the browser itself but this would make it necessary that the

client machine have Java installed on the system. Another approach would be to make

use of Flash. Flash-based file-upload utilities could be used to upload the files, which

would then mean that the client would need Flash to be installed within the browser. One

of our design goals was to keep the client side as light-weight as possible and hence we

decided to search for other solutions than the two mentioned above. WebSockets[19] is

another technology providing capabilities for bi-directional, full-duplex communication

channels, over a single Transmission Control Protocol (TCP) socket. It is being designed

as part of the HTML5 specification with the intention of it being implemented in web

browsers and web servers.

Chapter 7

Weakness and Future Enhancements

Although REMI makes good strides in consolidating research data and making all of the
experiment files accessible through a web interface, it is in many ways still in its
budding phase. As the system is used more and user feedback accumulates, new versions
of REMI should make the system more mature. REMI currently enables the basic features
of managing and sharing experiments and their data, and forms a platform for future
work and enhancements. It is far from a polished, finished product and currently has a
number of limitations that we hope will be addressed over time. These limitations are
listed below.

7.1 Limitations within REMI

7.1.1 Semi Structured Data

Semi-structured data is a form of structured data that does not conform to the formal
structure of tables and data models associated with relational databases. It does,
however, contain tags or markers that separate semantic elements and enforce
hierarchies of records and fields within the data. XML files, JSON files and documents
in other markup languages are all forms of semi-structured data. Much of the meta-data
that needs to be associated with files is also semi-structured. Our current data model
has not been designed to capture this semi-structured data in its entirety. In our
current implementation of REMI, we identified the elements within the semi-structured
data that we considered important and useful enough for searching and display
purposes. Each identified element is associated with some modality or process that
forms part of the experiment. The need to capture a particular element within the
database was validated by researchers within their field of research. Every identified
element is then mapped to a single column within the file table. We also provided the
ability to associate tags with a file, which to some extent allows semi-structured data
to be captured, but this solution is not scalable.
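The limitation can be made concrete with a small Ruby sketch. The column names, the nested paths in `COLUMN_PATHS` and the sample document are hypothetical and are not taken from REMI's schema. The sketch only illustrates the general approach of projecting selected elements of a nested document onto fixed columns, which silently drops everything else.

```ruby
require "json"

# Hypothetical mapping from file-table columns to paths inside the meta-data.
COLUMN_PATHS = {
  "modality"      => %w[acquisition modality],
  "voxel_size_mm" => %w[image voxel size_mm],
  "frame_count"   => %w[acquisition frames]
}.freeze

# Keep only the elements that have a column; every other element is lost.
def project_to_columns(metadata)
  COLUMN_PATHS.transform_values { |path| metadata.dig(*path) }
end

document = JSON.parse(<<~JSON)
  {
    "acquisition": { "modality": "SPECT", "frames": 64, "collimator": "pinhole" },
    "image": { "voxel": { "size_mm": 0.4 } }
  }
JSON

row = project_to_columns(document)
# "collimator" has no column in the mapping, so it does not survive.
```

Capturing a new element under this scheme means adding a column and a mapping entry, which is why the approach does not follow the changing shape of semi-structured data.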

7.1.2 Resume functionality for Uploads and Downloads

REMI needs to handle uploading and downloading of files of up to 100GB each.
Transferring such high volumes of data is very time consuming and is prone to
disruption during uploads or downloads. The ability to resume an upload or download
is necessary so that bandwidth, time and processing power are not spent resending
data that has already been transferred. For the upload process we did change the
behaviour of the system to upload the data to the server in chunks. This was necessary
to overcome the file upload limits set by browsers that we discussed earlier. Uploading
files in chunks makes it possible to resume file uploads, but we have not yet built that
feature into REMI. Downloads, in contrast, happen in one shot, and any failure while a
download is in progress forces the user to start the download again from the very
beginning.
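Since REMI does not yet implement resuming, the following Ruby sketch is only one possible way to compute where a chunked upload could restart. The function name `resume_plan` and its parameters are assumptions. A trailing, partially written chunk is discarded by rounding the restart offset down to a chunk boundary.

```ruby
# bytes_on_server: size of the incomplete data already stored for this file
# file_size:       total size of the file being uploaded
# chunk_size:      size of each uploaded chunk (the last one may be smaller)
def resume_plan(bytes_on_server, file_size, chunk_size)
  raise ArgumentError, "chunk_size must be positive" unless chunk_size.positive?

  # Round down to a whole number of chunks so a half-written chunk is resent.
  offset = [bytes_on_server, file_size].min / chunk_size * chunk_size
  remaining = file_size - offset
  {
    offset: offset,
    next_chunk: offset / chunk_size,
    chunks_left: (remaining + chunk_size - 1) / chunk_size
  }
end
```

For example, with 2,500 bytes already stored out of 10,000 and a chunk size of 1,000, the plan restarts at byte 2,000 with chunk index 2 and eight chunks left to send.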

7.1.3 File Compression on the Server

As mentioned above, REMI handles very large files. A public user of the system can
search for experiments and then select one or many files to download. HTTP does not
support sending multiple files in response to a single request. There are a number of
documented ways to work around this, but they involve either compressing or archiving
all the individual files into one single file and then sending it to the client, or using
some third-party plug-in that, in the background, sends an individual request for each
of the files that the user selects to download. We decided to implement the solution of
compressing the files on the server. That said, this is not a favorable or scalable
solution: the files are very large, and compressing them on the server takes time and
temporary disk space and is CPU intensive. Also, some files are dense files, and it
makes no sense to waste CPU cycles trying to compress them.
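One way to avoid spending CPU cycles on dense files is to test-compress a small sample before compressing the whole file. The Ruby sketch below is not part of REMI. The function name `worth_compressing?` and the 0.9 threshold are assumptions, and "dense" is taken here to mean data with little redundancy.

```ruby
require "zlib"

# Deflate a sample at the fastest level and compare sizes. A ratio close to
# (or above) 1.0 suggests the data is already dense and should be stored as-is.
def worth_compressing?(sample, threshold = 0.9)
  return false if sample.empty?

  compressed = Zlib::Deflate.deflate(sample, Zlib::BEST_SPEED)
  compressed.bytesize.to_f / sample.bytesize < threshold
end
```

In practice the sample could be the first few megabytes of a file. Files that fail the check could be added to an archive without compression instead.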

7.1.4 Meta-data Templates

Filling out meta-data for each file is a very time consuming and tedious process. In
most cases the values for the meta-data are the same across a group of files and even
across experiments. Initially a contributor had to fill in each meta-data field
explicitly; after a couple of suggestions we provided a feature whereby one could club
files into groups and associate the same set of meta-data with all files within that
group. This process still required the user to fill out each of the meta-data fields and
associate them with the files. The next requirement was the ability to copy
pre-existing meta-data, so that the user could copy it and modify only the elements
that needed to change for that data-set. Due to lack of time we provided only a stopgap
feature, in which the most recent copy of the meta-data is stored and a user can choose
to associate that copy with the files.
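The copy-and-modify requirement can be sketched in a few lines of Ruby. The field names and the helpers `metadata_from_template` and `apply_to_group` are hypothetical, and this is an illustration of the idea rather than REMI's implementation.

```ruby
# Start from previously entered meta-data and change only what differs.
# Hash#merge returns a new Hash, so the stored template is left untouched.
def metadata_from_template(template, overrides = {})
  template.merge(overrides)
end

# Associate an independent copy of the same meta-data with every file in a group.
def apply_to_group(file_names, metadata)
  file_names.map { |name| [name, metadata.dup] }.to_h
end

last_used = { "data_type" => "float32", "tracer" => "FTHA", "frames" => 64 }
current   = metadata_from_template(last_used, "frames" => 128)
grouped   = apply_to_group(%w[scan_a.img scan_b.img], current)
```

Here `last_used` plays the role of the most recently stored copy described above. A template bank would simply hold several such hashes under different names.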

7.2 Future Enhancements

All of the weaknesses mentioned above need to be addressed in due course and form the
basis of our list of future enhancements.

• Given that REMI needs to capture a great deal of meta-data, and that meta-data is
mostly semi-structured, REMI must be able to handle semi-structured data in a more
reliable and scalable way. Currently we have identified certain elements of the
meta-data that we need to store and have defined a column for each of these elements.
This approach is very restrictive and does not address the dynamic nature of
semi-structured data. Moving forward we need to look at how best to represent and
store meta-data within a database. XML files are a representation of semi-structured
data, and most database vendors have realized the importance of capturing and storing
such data. Different database vendors have taken different approaches to the problem
of storing semi-structured data within databases. A few databases are still relational
in nature but provide the facility of storing an entire XML document within a row of a
table. Oracle introduced a new datatype, XMLType, to facilitate native handling of XML
data in the database. The database management system then provides an API to query
and manipulate these XML documents. We similarly need to investigate whether MySQL
has comparable capabilities for handling and storing XML documents.

• Resuming file uploads and downloads is another major enhancement that is critical
to the success of REMI. On the upload side we already split the files into chunks and
upload them, which lays the foundation for a resume feature for file uploads. One just
needs to check the size of the incomplete file that currently resides on the server and
then tell the browser to start uploading the data past that size. On the download side
we currently do not have a strategy in place, but the literature suggests that resuming
file downloads is possible using the HTTP Byte Range Retrieval Extensions[28]. To
provide a resume feature for downloads we need to take a detailed look at our current
download process and our compression strategy on the server. Currently all files
selected for download are compressed into a temporary file on the server, and this
compressed file is then served to the browser. Since the file served is a temporary
compressed file, careful thought needs to be given to how this ties into resuming file
downloads.

• During our live demonstrations of REMI to the various stakeholders we noticed that
the meta-data being entered for many of the experiments was the same. This was because
many of the experiments were conducted in the same controlled environment with the
same set of parameters. This observation highlights the need to be able to copy
meta-data over either from other experiments or from a meta-data template bank. We
have not yet given this much thought, but it may prove necessary down the line. Our
current implementation just provides a method of copying over the most recently filled
meta-data.

• While dealing with the meta-data issue, the need to provide more personalized
features became evident. Personalization settings or views such as saved search
criteria, meta-data templates and owned experiments make it easier for researchers to
interact with REMI and to focus on the studies that they have uploaded. Currently REMI
does not provide any data personalization capabilities.
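For the download side of the resume enhancement in the list above, the byte-range mechanism of HTTP/1.1[28] works roughly as follows: the client asks for the bytes it is missing with a Range request header, and a server that supports ranges answers with status 206 and a Content-Range header. The Ruby sketch below only builds and parses these two headers. The helper names are assumptions, and no such code exists in REMI today.

```ruby
# Header a client would send to resume after `bytes_already_received` bytes.
def resume_range_header(bytes_already_received)
  { "Range" => "bytes=#{bytes_already_received}-" }
end

# Parse a Content-Range value such as "bytes 1024-2047/4096".
# Returns nil when the value is not a byte range; total is nil when sent as "*".
def parse_content_range(value)
  match = %r{\Abytes (\d+)-(\d+)/(\d+|\*)\z}.match(value)
  return nil unless match

  {
    first: match[1].to_i,
    last: match[2].to_i,
    total: match[3] == "*" ? nil : match[3].to_i
  }
end
```

This only helps if the server returns the same bytes for the resumed request. With the current approach of serving a temporary compressed file, that file would have to still exist, unchanged, when the client comes back.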

Chapter 8

Conclusion

The work related to this thesis started with two simple problems. The first problem
faced by the nuclear imaging group from the Life Sciences Division, Lawrence Berkeley
National Laboratory, was the scattering of experimental data. Over the years the
nuclear imaging group has conducted a number of experiments and has collected tens of
terabytes of raw and processed data. This data resided either on individual
researchers' local machines or in individual locations on a file server. The data was
also organized according to each researcher's preference. Having experimental data
scattered across individual local machines and organized to each researcher's liking
made it hard for other researchers to get access to the data. It was even harder for
researchers to understand the data, given that they first needed to understand how it
was being stored. It also became difficult to track all of this data, since it was
stored in different locations. The second problem was driven by the need to make
experimental data available to researchers belonging to other scientific research
laboratories. The nuclear imaging group collaborates with researchers from other
organizations located in different parts of the world. The need was to provide these
researchers with access to all the experiment data captured by the nuclear imaging
group. For the same reasons as mentioned in the first problem, it would be hard to
share this data with other researchers.

To address these two problems, we identified the need for a scientific data management
system that would consolidate all of the experiment data and provide a way to query
it. We also needed a consistent way of managing and storing experiment information. A
data model needed to be designed that best captured all of the experiment information
along with other related information. The data management system needed to be
accessible not only to members of the nuclear imaging group within Lawrence Berkeley
National Laboratory, but also to researchers in other research organizations located
in different parts of the world.

We developed REMI, a web-based scientific data management system, to address the above
problems. We started by understanding the experiment data that was being captured and
collected by the nuclear imaging group. The group mainly conducts SPECT, DTMRI, and
Micro-PET cardiac studies. It was necessary to identify the differences in the
experimental methodologies and in the data generated for each of these studies. We
also identified the various objects that participate in a particular experiment
study. All of this information helped us understand the operations that REMI is to
perform and the design of the data model needed to store all of this data. REMI was
also developed with the intention of making all of the experiment data publicly
available to anyone. Uploading new experiment data, however, requires valid
credentials. While developing REMI, there were a number of challenges that we needed
to overcome. From uploading large data files in parts to providing features that help
researchers copy meta-data templates, each needed to be addressed in a timely manner
to make the project a success. In this thesis, we reported the design considerations
behind the REMI database, the problems encountered, and their solutions.

Although a number of different databases are available to manage research data within
specific research areas, REMI is the first scientific data management system
developed to address experimental data generated by cardiac nuclear imaging studies.
REMI manages a variety of data files, including raw medical imaging data files, machine
calibration data files, and image reconstruction data files. REMI makes all of this data
available to researchers in different parts of the world through an easy-to-use web
interface. REMI is also built on top of a data model that can be extended to handle
most scientific experimental data. Through the use of tags, categories and category
values, REMI can be adapted to manage experimental data generated by studies in other
similar scientific research domains.

As of today, REMI is running in a production environment and is being used to store
and share experiment data with different researchers. REMI currently stores 8105 files
and 175 experiments. That said, the REMI database has a number of weaknesses. REMI
does not store all of the semi-structured data generated within each experiment; it
stores a small subset, identified as necessary by researchers. Filling in meta-data is
also a tedious process, given that researchers need to enter all of it manually. A good
percentage of the meta-data is present in various header files, and it would be useful
if REMI could fetch this meta-data from the header files themselves. We plan to address
these weaknesses in our future work. We also expect the REMI database to mature as it
is used over time and as more researchers begin to interact with it.

Bibliography

[1] Mayo Clinic, Analyze Software, Retrieved from http://medical.nema.org/, Accessed in 2011.

[2] Cucumber, http://cukes.info/, Accessed in 2011.

[3] Dicom: Digital imaging and communications in medicine, http://medical.nema.org/.

[4] Emory cardiac toolbox, http://www.medx-inc.com/emory_toolbox.html, Accessed in 2011.

[5] Html5: User interaction: Drag and drop, http://dev.w3.org/html5/spec/dnd.html, Accessed in 2011.

[6] Java 2 platform, enterprise edition (j2ee), http://java.sun.com/j2ee/overview.html, Accessed in 2011.

[7] Learn rest: A tutorial, http://rest.elkstein.org/, Accessed in 2011.

[8] List-of-values components, Oracle Fusion Middleware Web User Interface Developer's Guide for Oracle Application Development Framework.

[9] Qps: Quantitative perfusion spect, http://www.csaim.com/qps.php, Accessed in 2011.

[10] Rspec: A behavior-driven development tool for ruby programmers, http://rspec.info/, Accessed in 2011.

[11] Web services description language, http://www.w3.org/TR/wsdl, Accessed in 2011.

[12] The webkit open source project, http://www.webkit.org/, Accessed in 2011.

[13] Wireshark, http://www.wireshark.org/, Accessed in 2011.

[14] Xmlhttprequest: W3c candidate recommendation, August 2010, http://www.w3.org/TR/XMLHttpRequest/, Accessed in 2011.

[15] File api: W3c working draft, October 2011, http://dev.w3.org/2006/webapi/FileAPI/, Accessed in 2011.

[16] Html5, June 2011, http://dev.w3.org/html5/spec/Overview.html, Accessed in 2011.

[17] Phusion passenger, December 2011, http://www.modrails.com/, Accessed in 2011.

[18] Soap version 1.2, October 2011, http://www.w3.org/TR/soap/, Accessed in 2011.

[19] The websocket api, November 2011, http://dev.w3.org/html5/websockets/, Accessed in 2011.

[20] Todd-Pokropek A, Cradduck TD, and Deconinck F., A file format for the exchange of nuclear medicine image data: a specification of interfile version 3.3, Nuclear Medicine Communications (1992), 673–699.

[21] Steve Androulakis, Ashley M Buckle, Ian Atkinson, David Groenewegen, Nick Nicholas, Andrew Treloar, and Anthony Beitz, Archer e-research tools for research data management, The International Journal of Digital Curation 4 (2009), no. 1.

[22] T. Berners-Lee, L. Masinter, and M. McCahill, Uniform resource locators (url), December 1994, http://www.ietf.org/rfc/rfc1738.txt, Accessed in 2011.

[23] Andrew D. Birrell and Bruce Jay Nelson, Implementing remote procedure calls, ACM Transactions on Computer Systems (1984), 39–59.

[24] Scott Chacon, Git, Retrieved from http://git-scm.com/.

[25] David Chelimsky, Dave Astels, Bryan Helmkamp, Dan North, Zach Dennis, and Aslak Hellesoy, The rspec book, Pragmatic Bookshelf, December 2010.

[26] Robert H. Choplin, Johannes M. Boehme II, and C. Douglas Maynard, Picture archiving and communication systems: an overview, Radiographics (1992).

[27] E.P. Ficaro, B.C. Lee, J.N. Kritzman, and J.R. Corbett, Corridor4dm: The michigan method for quantitative nuclear cardiology, Proc. Am. Society Of Nuclear Cardiology (2007), 455–465.

[28] R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee, Hypertext transfer protocol – http/1.1, June 1999, http://tools.ietf.org/pdf/rfc2616.pdf.

[29] Roy T. Fielding and Richard N. Taylor, Principled design of the modern web architecture, ACM Transactions on Internet Technology (2002), 115–150.

[30] Carissa G. Fonseca, Michael Backhaus, David A. Bluemke, Randall D. Britten, Jae Do Chung, Brett R. Cowan, Ivo D. Dinov, J. Paul Finn, Peter J. Hunter, Alan H. Kadish, Daniel C. Lee, Joao A. C. Lima, Kalyanam Shivkumar, Pau Medrano-Gracia, Avan Suinesiaputra, Wenchao Tao, and Alistair A. Young, The cardiac atlas project - an imaging database for computational modeling and statistical atlases of the heart, Bioinformatics (2011).

[31] Daniele Gianni, Steve McKeever, and Nic Smith, euheartdb: A web enabled database for geometrical models of the heart, Functional Imaging and Modeling of the Heart 5528 (2009), no. 44, 407–416.

[32] Jim Gray, David T. Liu, Maria Nieto-Santisteban, Alex Szalay, David J. DeWitt, and Gerd Heber, Scientific data management in the coming decade, Cyber Technology Watch (2005).

[33] David Heinemeier Hansson, Ruby on rails, Retrieved from http://rubyonrails.org/.

[34] Esperanza Marcos, César J. Acuna, Belen Vela, José M. Cavero, and Juan A. Hernandez, A database for medical image management, Computer Methods and Programs in Biomedicine (2007), 255–269.

[35] Brian Matthews, Shoaib Sufi, Damian Flannery, Laurent Lerusse, Tom Griffin, Michael Gleaves, and Kerstin Kleese, Using a core scientific metadata model in large-scale facilities, The International Journal of Digital Curation 5 (2010).

[36] Dan North, Introducing bdd, March 2006, http://dannorth.net/introducing-bdd/, Accessed in 2011.

[37] David A. Patterson, Garth Gibson, and Randy H Katz, A case for redundant arrays of inexpensive disks (raid), ACM SIGMOD 17 (1988), no. 3.

[38] Trygve Reenskaug, Model-view-controller (mvc) architecture, (1979), Retrieved from http://heim.ifi.uio.no/~trygver/themes/mvc/mvc-index.html.

[39] Mario Valle, Scientific data management, http://personal.cscs.ch/~mvalle/sdm/scientific-data-management.html.

[40] Gottfried Vossen and Mathias Weske, The wasa approach to workflow management for scientific applications, NATO ASI Series 164 (1998), 145–164.

[41] Hugh E. Williams, Managing images with a web database application, Tech. report, May 2002, Published on OnLamp.com, Accessed in 2011.

Appendix A

User Documentation

A.1 Public User

A.1.1 Downloading Experiment Data

1. When a user accesses the REMI website they are taken to the REMI main page.
Figure A.1 shows the main page, which presents the user with three menus from which
one can choose to search for experiments.

Figure A.1

2. The modality menu, as shown in figure A.2, lists all the modalities within REMI.
Choosing one of the modalities brings up a list of all the experiments that belong to
that modality.

Figure A.2

3. Figure A.3 shows a list of all DTMRI experiments stored in REMI. To view and
download the files associated with a particular experiment, click the green "expand"
icon. The user then selects the files they want to download and clicks the Download
Files button.

Figure A.3

4. The download page provides a set of search fields that the user can use to search

for experiments. Figure A.4 shows a user searching for the experiment Rat Heart

SHR8 data set B.

Figure A.4

5. One can also use a text-based search to search for experiments across the
parameters shown in the table of experiments. In figure A.5 the user searches for all
DTMRI experiments that use WKY subjects.

Figure A.5

A.2 Login into REMI

1. Users that need to contribute to the REMI system need to log into the system.

Present at the bottom left corner is a log-in button indicated by the red arrow in

figure A.6. Clicking the button redirects the user to the log-in page.

Figure A.6

2. A unique username is assigned to every user of the system. To log into the REMI

system a user enters his username and password and hits the log in button as

shown in figure A.7.

Figure A.7

A.3 Data Authentication

Although we have defined a role called Data Authenticator and the actions one can
perform in that role, the nuclear imaging group at the Life Sciences Division,
Lawrence Berkeley National Laboratory, has not yet decided whether to support this
policy.

A.4 Contributor

A.4.1 New Subject Work-flow

1. To add a new subject to the system, first click the contribute link at the top, as
shown in figure A.8.

Figure A.8

2. Click the Subjects menu item, as shown in figure A.9.

Figure A.9

3. Figure A.10 shows a list of all existing subjects within REMI. To add a new subject,
click the 'Add New Subject' button.

Figure A.10

4. Once on the subject detail page shown in figure A.11, enter all the required
information regarding the subject. Labels marked in red with an asterisk are mandatory
fields. Press the 'Create Subject' button to save the subject.

Figure A.11

5. If all goes well and the subject is saved to REMI, you should see a success message
as shown in figure A.12. The success message disappears after a while, leaving you on
the details page for any further modification. If for any reason the subject could not
be saved to REMI, an appropriate error message is displayed along with the reason the
record could not be saved.

Figure A.12

6. Click the 'Back' button to return to the Subjects index page, shown in figure A.13.

Figure A.13

7. As mentioned before, the Subjects index page shows a listing of all the subjects
available in REMI. The newly created subject should be available in the list, and we
can search for that subject in a number of ways. Figure A.14 shows the controls for
navigating the list.

(a) 1 in the figure determines the number of records shown on the page.

(b) 2 shows a text box that is used to filter records. Searching is done across all
columns.

(c) 3 helps to navigate the list. Use the arrow buttons to move one page at a time
forwards or backwards. The double-arrowed buttons take you to the beginning or the
end of the list, and the numbered buttons take you to that specific page in the list.

(d) 4 gives a summary of the number of records in the list.

Figure A.14

8. We search for all records containing ’ftha’ and the list updates depending on our

search criteria. Notice that figure A.15 shows the new subject we just added is also

displayed along with an updated summary of the number of records the filtered

list contains.

Figure A.15

A.4.2 New Experiment Work-flow

1. As a contributor you can create new experiments by first clicking on the contribute

link as shown in figure A.16.

Figure A.16

2. Clicking the contribute link takes you to the experiment listing page. This page

displays all the experiments within REMI. To add a new experiment click the ’Add

New Experiment’ button as shown in figure A.17.

Figure A.17

3. The system starts off with a bare-bones template of the experiment. On saving the
template, the appropriate experiment is created, based on the modality chosen, with the
additional information that is associated with that modality. Fields in red with an
asterisk are mandatory. The 'Researcher name' field is a lookup text box. Typing in
the first couple of letters of the researcher's user name brings up a drop-down list of
the researchers that match the characters entered in that field. Figure A.18 shows a
snapshot of the experiment page.

Figure A.18

4. Saving the template created in the previous step takes the user to the edit page,
which displays all the details entered by the user when creating the experiment. In
addition, depending on the modality, the respective tabs appear at the top. Each tab
encapsulates a set of related information that needs to be captured as part of the
experiment. The actions that can be performed from this page, shown in figure A.19,
are listed below.

(a) Subjects
In most cases experiments are carried out on one or more subjects, although an
experiment may also have no subject associated with it. The Subjects tab allows the
user to associate subjects with the experiment.

(b) Tracers
Like subjects, tracers may or may not have been used within an experiment. The Tracers
tab allows the user to associate tracers with the experiment. A tracer is a
pharmaceutical that makes it possible to image the body based on cellular function
and physiology.

(c) Detectors
If the experiment has any detector information that needs to be captured, the
corresponding fields are located under the Detectors tab.

(d) Files
To associate files with the experiment, as well as to enter meta-data related to the
files, click the 'Files' tab.

(e) Update Experiment
The 'Update Experiment' button updates the experiment with the information filled in
on the form. At any point while filling in the various sections, one can hit the
'Update Experiment' button and the information added so far will be stored in REMI.

(f) Back
Back takes the user back to the experiment listings page.

Figure A.19

5. Clicking the Subjects tab takes the user to the subject page from where the user

can associate the subjects to the experiment. To add a subject to the experiment

one needs to click the ’Associate New Subject’ button. See figure A.20.

Figure A.20

6. Clicking the 'Associate New Subject' button brings up a list of all the subjects
within the REMI system, as shown in figure A.21. Four actions can be taken from here:

(a) 1 shows the search box, which can be used to filter the subjects based on a certain
condition.

(b) 2 Check mark the subjects that need to be associated with the experiment.

(c) 3 Once the subjects have been selected, hit the 'Add Subject to Experiment' button
to associate those subjects with the experiment.

(d) 4 Go back without associating any subjects with the experiment.

Figure A.21

7. The selected subjects are then added to the experiment. NOTE: The associations are
not saved until the 'Update Experiment' button is clicked, telling the system to save
the association to REMI. Figure A.22 shows the two subjects associated with the
experiment. Additional information concerning the individual subjects can be entered
at the bottom. 1 indicates that clicking that header expands the section for the
corresponding subject and collapses the other sections.

Figure A.22

8. Clicking the Tracers tab takes the user to the tracer page from where the user can

associate the tracers to the experiment. To add a tracer to the experiment one

needs to click the ’Associate New Tracer’ button. See figure A.23.

Figure A.23

9. Clicking the 'Associate New Tracer' button brings up a list of all the tracers
within the REMI system, as shown in figure A.24. Three actions can be taken from here:

(a) 1 Check mark the tracers that need to be associated with the experiment.

(b) 2 Once the tracers have been selected, hit the 'Add Tracer to Experiment' button to
associate those tracers with the experiment.

(c) 3 Go back without associating any tracers with the experiment.

Figure A.24

10. The selected tracers are then added to the experiment. NOTE: The associations are
not saved until the 'Update Experiment' button is clicked, telling the system to save
the association to REMI. Figure A.25 shows the two tracers associated with the
experiment.

Figure A.25

11. After clicking the Files tab, you can associate files with the experiment. Figure
A.26 shows the file listing page. Files can be associated with the experiment in two
ways. One way is to click the browse button and select one or more files. The other is
to select the files and drag and drop them onto the 'Drop new files here' section.

Figure A.26

12. Once the files appear on the page as shown in figure A.27 associated meta-data is

added to it by selecting the corresponding file using the check box and then hitting

the ’Edit Meta Data’ button. A single file or multiple files can be selected at the

same time.

Figure A.27

13. Four categories of meta-data can be associated to the files selected namely file

details, subject information, acquisition information and voxel information. File

details containing file information like description and data type can be entered

here. Refer to figure A.28.

Figure A.28

14. Depending on the subjects added to the experiment, files may also be associated
with a specific subject rather than with the experiment as a whole. Here one can choose
which subject the currently selected files belong to. Refer to figure A.29.

Figure A.29

15. Various other parameters that are captured during data acquisition should be

entered under the Acquisition Information tab.

Figure A.30

16. The Voxel Information tab captures voxel information in case the files selected

have any voxel data associated with them.

Figure A.31

A.5 System Administrator

A.5.1 Researchers

1. To add a new researcher, a system administrator needs to click the System

Administration link. Figure A.32 shows the same.

Figure A.32

2. The user is then taken to the page containing a list of all users in REMI. To create

a new user, one needs to click the New User link as shown in figure A.33.

Figure A.33

3. The user then needs to enter the details of the new user in the user details page.
All fields marked in red are mandatory, and trying to save the details with any
mandatory field missing produces an appropriate error message, as shown in figure A.34.

Figure A.34

4. Figure A.35 shows the user details being successfully saved to the REMI database.

Figure A.35

5. Going back to the list of all researchers, one can search for the newly created user
using the search box, as shown in figure A.36.

Figure A.36

6. Figure A.37 shows the newly created user logging into REMI.

Figure A.37

7. The testUser successfully logs into REMI and as defined by the Contributor role,

which was assigned to testUser, they can access that menu item. This behavior is

shown in Figure A.38.

Figure A.38

A.5.2 Roles

1. To add a new role to REMI, a system administrator first needs to click the System
Administration link. This is shown in figure A.39.

Figure A.39

2. To add a new role, the system administrator then needs to click the New Role

button present on the Roles page. The roles page shows a listing of all roles

defined on REMI. Figure A.40 shows the user interface for the same.

Figure A.40

3. The system administrator then enters the role name and a description for that role

and hits the Create Role button as shown in figure A.41.

Figure A.41

4. The system administrator can also search for the new role from the roles listing

page shown in figure A.42.

Figure A.42

5. In addition to adding the role from the user interface, the system administrator

also needs to define the abilities of that role. The abilities of a role are defined in

the file ability.rb, which can be found under the app/models folder.

Figure A.43 shows the details of the ability file. A new condition block needs to

be added to the ability file to assign abilities to the new role.

Figure A.43

The following code shows a template for the new role condition to be added to the

ability.rb file.

class Ability
  include CanCan::Ability

  def initialize(user)
    # create a user if the user variable is
    # not already defined
    #
    # the user variable would not be defined
    # if the user is not logged into the system
    #
    # if a user logs into REMI the user variable
    # passed to the method will be defined.
    user ||= User.new

    # First we set that the user should not be able
    # to access any methods defined on the controllers.
    cannot :manage, :all

    # Code for contributor ... see the file
    # for details
    #
    # Code for system administrator ...
    # see the file for details

    # if the user is a <new role name> then
    # allow them to manage <list of controllers>
    if user.is_a_<new role name>?
      can :manage, [<list of controllers>]
    end
  end
end

A.5.3 Categories and Category Values

1. To add a new category or category value, a system administrator needs to go to the

system administration section within REMI, and click the Category Menu. This

brings up a listing of all the categories and one can add new categories too. Figure

A.44 shows the list of all categories and how to add a new category.

Figure A.44

2. The system administrator enters the category name and saves the category by clicking the Create

Category button. Figure A.45 also shows the list of category values associated with

the new category. This list is empty, as expected, because this is a new category.

Figure A.45

3. The system administrator can then search for the new category from the list of

categories page as shown in figure A.46. To add category values to this new

category, they can click the name of the category to take them to the details page

again.

Figure A.46

4. Figure A.47 shows the details page of the category testCategory. The system

administrator could change the name of the category or could go ahead and add a

list of values for this category.

Figure A.47

5. The system administrator then adds a category value name and saves it, as shown

in figure A.48.

Figure A.48

6. The new category value then appears on the category detail page. Refer to figure

A.49.

Figure A.49

7. The system administrator can also add a value to an existing category in

REMI. Figure A.50 shows the system administrator searching for the Species

category.

Figure A.50

8. Once at the details page of the Species category, the system administrator clicks

the Add New Category Value button shown in figure A.51.

Figure A.51

9. A new value, "New Animal", is added to the Species category, as shown in figure

A.52.

Figure A.52

10. The new value should appear in all places where the Species category is referenced.

Figure A.53 shows that the new value New Animal now appears in the menu listing

under Species.

Figure A.53

11. The new value also appears in the dropdown on the Subject detail page, as shown

in figure A.54.

Figure A.54

Appendix B

Installation Documentation

B.1 Version Control within REMI

For version control we use Git[24], a free, open source, distributed version control

system. Git supports two kinds of repositories: "bare" repositories, which have no

working tree, and regular (non-bare) repositories, which do. In the ideal world of

distributed revision control there is no central repository: people pull changes from

whomever they want, and no pushes exist. Accordingly, a non-bare repository should only

have content pulled into it; pushing into a non-bare repository is not recommended. In Git,

a "bare" repository is the one to clone and pull from, and to push to.

Within our REMI development environment, REMI's "bare" repository resides

on tasman. All developers need to pull from and push to this "bare" repository.

[email protected]:/Users/antall/git repositories/ror is the remote url to the ’bare’

repository.

B.2 Creating a development environment

The following steps set up a development environment. Please note that these

steps were written for an Ubuntu machine and

should also apply to operating systems derived from Ubuntu.

1. Installing MySQL

Depending on the operating system download a copy of MySQL 5.1 database and

install it on the machine.

2. Install additional tools and libraries

Additional tools such as curl and git are a prerequisite to installing the Ruby

Version Manager (RVM). libxml2-dev and libxslt1-dev are required for nokogiri

which will be later installed by the RVM as a gem. libmysqlclient16-dev is also

needed to build the mysql2 driver to communicate with the MySQL database.

$ sudo apt-get install curl git libxml2-dev libxslt1-dev libmysqlclient16-dev

3. Install RVM.

The use of the Ruby Version Manager, known as RVM (to handle the ruby

execution engine and other rubygems), is strongly encouraged. RVM helps manage

different ruby interpreters and gems in an isolated manner. This makes

switching between interpreters easy, and we do not run the risk of accidentally

updating gems that other projects are dependent on. The complete installation

document can be found on the RVM website.

4. Install additional packages for RVM

A few additional packages that ruby needs when it installs the various

gems must be installed. For additional information regarding each of these

packages, please refer to the packages section of the documentation

on the RVM website.

$ rvm pkg install zlib

$ rvm pkg install openssl

$ rvm pkg install readline

5. Installing ruby

For development purposes we use the MRI ruby interpreter. A complete list of all

ruby interpreters that RVM supports can be listed by the command

$ rvm list known

We currently support MRI ruby interpreter - version 1.8.7 and we install the inter-

preter passing the various packages that were installed for RVM in the previous

step.

$ rvm install ruby-1.8.7 --with-zlib-dir=$rvm_path/usr \
    --with-readline-dir=$rvm_path/usr --with-openssl-dir=$rvm_path/usr

6. Creating a Gemset and .rvmrc file

RVM provides the facility of creating self contained compartmentalized

independent ruby setups. This is achieved by creating gemsets and then installing

the necessary gems within that gemset. Gemsets can then be associated with a

single ruby project. Although it is encouraged to have one gemset per project, it

is still possible to associate a number of projects to the same gemset. Ensuring

that one gemset is associated with one project avoids the possibility of gems in a

gemset getting updated in one project which in turn could break the dependencies

of the other projects on that gemset.

To create a gemset, the version of ruby needs to be selected first. We then create

the gemset by providing a gemset name. Below we create a gemset called remi.

$ rvm use 1.8.7

$ rvm gemset create remi

7. Create location for the codebase

Create a location to house the codebase. We also want to associate the ruby engine

and the gemset to this location. We can specify the ruby engine and the gemset

within a .rvmrc file which should be placed in the directory that we just created.

Now every time we change directory to this location or any other sub directory,

the .rvmrc file is run and the corresponding ruby engine and gemset gets loaded.

$ mkdir remi

$ cd remi

$ echo "rvm use 1.8.7@remi" > .rvmrc

$ source .rvmrc

8. Fetching the source code

Having created the directory for the source code we need to set up a git repository

by initializing it first. We then create an alias for the remote repository - in the

following example we have named it remi. Once the alias is setup we pull the

source code from this location.

$ git init

$ git remote add remi [email protected]:/Users/antall/git repositories/ror

$ git pull remi master

9. Installing the gems

After the source code has been copied all the gems associated with the project

have to be installed. The REMI project makes use of the bundler gem that takes

care of installing all the other gems present inside the Gemfile. So first we install

the bundler gem and then ask bundler to install all of the other gems.

$ gem install bundler

$ bundle install

10. Creating and loading the database

To create the development database with the defined tables and initial data,

execute the following commands.

$ rake db:create

$ rake db:migrate

$ rake db:seed

11. Starting the WEBrick application server

Once all the gems have been installed and the database has been created and

loaded we can start the application server. Once the application server has started,

http://localhost:3000 should take you to REMI’s main page.

$ rails s

All of the above steps should help you in setting up a development environment on

a Linux (Ubuntu) machine.
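
Step 9 relies on a Gemfile at the root of the codebase. The actual REMI Gemfile is not reproduced in this document; the fragment below is only a sketch of what such a file could look like, assuming the gem names and versions listed in table B.1. The split into a default group and a development/test group, and the choice of gem source, are assumptions.

```ruby
# Gemfile (sketch): versions taken from table B.1.
# The grouping and the source URL are assumptions, not the REMI original.
source 'http://rubygems.org'

gem 'rails',      '3.0.9'
gem 'mysql2',     '0.2.6'
gem 'authlogic',  '3.0.3'
gem 'cancan',     '1.6.7'
gem 'formtastic', '2.0.2'
gem 'rails3-jquery-autocomplete', '0.7.3'
gem 'rubyzip',    '0.9.4'
gem 'net-ssh',    '2.2.1'
gem 'net-scp',    '1.0.4'

group :development, :test do
  gem 'rspec-rails',        '2.6.1'
  gem 'cucumber',           '0.10.2'
  gem 'capybara',           '1.1.1'
  gem 'factory_girl_rails', '1.2.0'
  gem 'guard-rspec',        '0.5.0'
  gem 'guard-cucumber',     '0.7.2'
end
```

Pinning exact versions in this way keeps bundle install reproducible across development machines that share the same gemset.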

B.3 List of Used Software Tools

The table below contains a list of all the software tools and components that have been

used to develop REMI.

Table B.1: Used Software Summary.

Name                        Description                                                                  Version
Activerecord                ORM implementation                                                           3.0.9
Apache                      Web Server                                                                   2.2.17
Authlogic                   Simple model based ruby authentication solution                              3.0.3
CanCan                      Authorization Gem for Ruby on Rails                                          1.6.7
Capybara                    Acceptance test framework for web applications                               1.1.1
Cucumber                    BDD framework                                                                0.10.2
Factory girl rails          Rails integration for factory girl                                           1.2.0
Formtastic                  A Rails form builder plugin with semantically rich and accessible markup     2.0.2
Guard-cucumber              Guard::Cucumber automatically run your features                              0.7.2
Guard-rspec                 Guard::RSpec automatically run your specs                                    0.5.0
JSON                        JavaScript Object Notation                                                   1.6.1
jQuery                      JavaScript Library                                                           1.6.4
MySQL                       Database                                                                     5.1
Mysql2                      A modern, simple and very fast Mysql library for Ruby - binding to libmysql  0.2.6
Net-scp                     Pure-Ruby implementation of the SCP protocol                                 1.0.4
Net-ssh                     Library for programmatically tunneling connections to servers                2.2.1
Passenger                   Deployment of Ruby web applications on Apache                                3.0.9
Rails                       Web framework                                                                3.0.9
Rails3-jquery-autocomplete  An easy and unobtrusive way to use jQuery's autocomplete with Rails 3        0.7.3
Rspec-rails                 Rspec-2 for Rails-3                                                          2.6.1
Ruby                        A dynamic, open source programming language                                  1.8.7
Rubyzip                     A ruby library for reading and writing zip files.                            0.9.4
Smartgit                    A GUI for git repositories                                                   any version

Appendix C

Maintenance Documentation

C.1 MVC

MVC is the concept of encapsulating data together with its processing (the model) and

isolating it from the manipulation (the controller) and presentation (the view) parts that have

to be handled on a user interface.

• Models

Models represent knowledge. A model could be a single object, or it could be

some structure of objects. There should be a one-to-one correspondence between

the model and the represented world. For example, ExpFile is a model within

REMI that maps to the real world entity File. Similarly the User model maps

to the individual researcher/user of the REMI database. Relational tables are

represented within the models.

• Views

A view is a (visual) representation of its model. It ordinarily highlights certain

attributes of the model and suppresses others. It thus acts as a presentation filter.

A view is attached to its model and gets the data necessary for the presentation

from the model by looking up attributes of the model. It may also update the

model by sending appropriate data back to the model. The view needs to know

the semantics of the attributes of the model it represents.

• Controllers

A controller is the link between a user and the system. It provides the user with

input by arranging for relevant views to present themselves in appropriate places

on the screen. It provides means for user output by presenting the user with menus

or other means of giving commands and data. The controller receives such user

output, translates it into the appropriate instructions, and passes these instructions

on to one or more of the views. For example, the UserController accepts the

GET/POST requests from the client browser and accordingly passes on the request

to the associated model or fetches a particular view.
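
The division of responsibilities described above can be illustrated with a few lines of plain Ruby. This is a minimal sketch that does not involve Rails; SketchFile, SketchFileView and SketchFilesController are hypothetical names chosen for illustration and do not exist in REMI.

```ruby
# Plain-Ruby sketch of the three MVC roles; no Rails involved.
# All class names here are hypothetical.

# Model: holds the data and the logic that belongs to it.
class SketchFile
  attr_reader :filename, :size_bytes

  def initialize(filename, size_bytes)
    @filename = filename
    @size_bytes = size_bytes
  end

  def size_kb
    (size_bytes / 1024.0).round(1)
  end
end

# View: presents selected attributes of the model and hides the rest.
class SketchFileView
  def render(file)
    "#{file.filename} (#{file.size_kb} KB)"
  end
end

# Controller: receives the request, looks up the model, picks the view.
class SketchFilesController
  def initialize(files)
    @files = files
    @view = SketchFileView.new
  end

  def show(filename)
    file = @files.find { |f| f.filename == filename }
    return "404 Not Found" if file.nil?
    @view.render(file)
  end
end

controller = SketchFilesController.new([SketchFile.new("scan_01.img", 2048)])
puts controller.show("scan_01.img")   # prints "scan_01.img (2.0 KB)"
puts controller.show("missing.img")   # prints "404 Not Found"
```

The view never decides which record to show and the model never formats output, which is the separation the three bullet points describe.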

C.2 Project Layout of REMI

Figure C.1: Project Layout of REMI

Figure C.1 shows the project layout of REMI. The entire REMI application resides

under the app folder. Following the MVC pattern, the app folder contains a sub-folder

for each of the Models, Views, and Controllers components. The Models directory

contains the classes which are used to store and manipulate the state of objects that

are stored in the REMI database. The user interface code, such as HTML and ERB templates, which is

needed to render the model to the user, is stored under the Views directory. ERB

provides an easy to use but powerful templating system for Ruby. Using ERB, actual

Ruby code can be added to any plain text document for the purposes of generating

document information details and/or flow control. The Controllers directory contains

all the controllers of the application. The controller decides what the user’s input was,

how the model needs to change as a result of that input, and which resulting view should

be used.
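
As a small illustration of how ERB mixes Ruby code into plain text, the sketch below renders a hypothetical file listing with the erb library that ships with Ruby. The template text and the files variable are made up for this example and are not taken from REMI's views.

```ruby
require 'erb'

# Hypothetical listing template: <% %> runs Ruby code (flow control),
# <%= %> inserts the value of a Ruby expression into the output.
template = <<-TEMPLATE
<ul>
<% files.each do |name| %>
  <li><%= name %></li>
<% end %>
</ul>
TEMPLATE

files = ["scan_01.img", "scan_02.img"]

# The template is evaluated against the current binding, so it can see
# the local variable files defined above.
html = ERB.new(template).result(binding)
puts html
```

In a Rails view the same tags are used, but the framework supplies the binding, so instance variables set by the controller are visible inside the template.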

C.3 Adding a new Entity

The following section shows how to go about adding a new entity to REMI.

1. To create a new entity in REMI, we start by using the scaffold generator to create

the template code. The syntax of the scaffold generator is

$ rails g scaffold <entity name> [<column name:datatype>]*

As shown in figure C.2, the scaffold generator generates the migration file, the

model file, the controller file and the various view files under the app folder. The

scaffold generator also creates additional files for testing and a helper file to put

additional helper logic.

Figure C.2: Using the scaffold generator to generate a template

2. Figure C.3 shows the contents of the newly created migration file. The migration

file defines a class containing two methods up and down. The up method creates

the table in the database and the down method rolls back the changes made by

the migration file.

Figure C.3: Contents of the migration file

3. The scaffold generator also updates the routes.rb file with an entry for the

new entity. The routes.rb file lists all the RESTful routes

that are permitted on an entity. Figure C.4 shows this entry.

Figure C.4: Entry made by the scaffold generator in the routes.rb file

4. We need to run the migrate command to update the database with the new table

that is defined in the migration file. To run the migration script, the command is

$ rake db:migrate

To roll back the changes, the command is

$ rake db:rollback

Figure C.5 shows the output on the console confirming that the table is created in

the database.

Figure C.5: Executing the migrate command

5. Having performed all of the above steps, the new entity object is now accessible

via the web interface through the REST API. The listings page is shown in figure

C.6.

Figure C.6: One can access the new entity objects via the web browser

6. Now that we have access to the entity via the REST API, we need to be able to

access the entities via the REMI menu. An entry for the new entity needs to be

made in the menu.json file. The file currently has two menu configurations: first,

the system administration menu configuration and, second, the contributor menu

configuration. Depending on where the menu item for the new entity should

show up, an entry needs to be made in that section. In figure C.7, we have added

the menu item to the contributor menu configuration. The components of the entry

are

• id: The ID value that translates to the element ID on the HTML page. It

provides us with a handle for Document Object Model (DOM) manipulation.

• value: The display name that is shown on the user interface.

• link: The name of the restful API to be called on clicking that menu item.

Figure C.7: Adding an entry to the menu.json file

7. Since we have added the new entity's menu item to the contributor menu

configuration, a similar entry needs to be made in the file menu.html.erb. This

file is located under the folder app/views/common. This file acts as the view file

to display the menu entries across all of the REMI web pages. Figure C.8 shows

the corresponding changes made to the file.

Figure C.8: Adding an entry to the menu.html.erb file

8. Figure C.9 shows the menu item for the new entity on the REMI menu.

Figure C.9: New Entity Menu Item

9. So far we have managed to create the necessary template files. We managed

to create the table within the database, and we updated the REMI menu to

display the new entity. The scaffold generator generated for us some default

view pages. Although we could make use of the same default view pages, we

need to modify these default pages, so that they are consistent with the look

and behavior of all the other REMI pages. All the view files can be found

under the app/views/new_entities directory. We start off with the listing page (file

index.html.erb). Figure C.10 shows the default code within this page and figure

C.11 shows the modified code. There are three changes that need to be made.

(a) Step 1 in figure C.11, shows that the table header is modified to add a couple

of div tags to the file. We also add the table partial just before the div tags.

The table partial contains the presentation logic for all tables. It is important

to ensure that the ID for the table tag is the same as the table name parameter

passed to the table partial. Partial templates, usually just called "partials",

are a mechanism for breaking the rendering process into more manageable

chunks. With a partial, you can move the code for rendering a particular

piece of a response to its own file.

(b) Step 2 in figure C.11, changes the display logic of the rows of the table.

Rather than having separate columns for the Show, Edit, and Delete options,

we remove the Show and Edit options. We then modify the name attribute of

the entity, turning it into a link to the edit page.

(c) Step 3 in figure C.11, adds a class attribute to the link object. CSS looks at

this class attribute and paints a button object for that link.

Figure C.10: Default listings page

Figure C.11: Modified listings page

10. Figure C.12 shows the newly modified listings page. One can see the new table

template being displayed in comparison to the page displayed in figure C.9.

Figure C.12: New listing page

11. The last file to modify is the form.html.erb file. This file is a partial that is used

in the files new.html.erb and edit.html.erb. When creating a new entity, the

new.html.erb file is displayed, and to update an existing entity the edit.html.erb

file is displayed. A total of four changes are made to the form.html.erb file,

namely:

(a) Step 1 in figure C.13: we need to tell the page to make use of the formtastic

declarative syntax. This is done by adding the semantic_form_for helper call. For a

list of all the gems used in REMI, refer to table B.1.

(b) Step 2 in figure C.13, adds a timed notice to the page. This notice message is

displayed for 5 seconds and is used to display messages related to creating and

updating of the objects.

(c) Step 3 in figure C.13, adds the various attributes of the object on the page. If

any new attribute is later added to the model, then that attribute also should

be mentioned here for display purposes.

(d) Step 4 in figure C.13, adds the submit and the cancel buttons on the page.

Figure C.13: Making changes to the form.html.erb file
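
Step 3 of this section mentions that routes.rb lists the RESTful routes permitted on an entity. In Rails 3, a single resources entry stands for seven routes. The plain-Ruby sketch below simply enumerates them; restful_routes is a hypothetical helper written for illustration, and the resource name new_entities is an assumption about how the scaffolded entity is named.

```ruby
# Lists the seven routes that a Rails 3 `resources :name` entry defines.
# restful_routes is a hypothetical helper, not part of Rails or REMI.
def restful_routes(resource)
  base = "/#{resource}"
  [
    ["GET",    base,               "index"],
    ["GET",    "#{base}/new",      "new"],
    ["POST",   base,               "create"],
    ["GET",    "#{base}/:id",      "show"],
    ["GET",    "#{base}/:id/edit", "edit"],
    ["PUT",    "#{base}/:id",      "update"],
    ["DELETE", "#{base}/:id",      "destroy"]
  ]
end

# Print one route per line: HTTP verb, URL pattern, controller action.
restful_routes("new_entities").each do |verb, path, action|
  puts format("%-6s %-24s %s", verb, path, action)
end
```

In a real application the same information can be printed with the rake routes task, which reads it from routes.rb.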

Appendix D

Database Documentation

D.1 List of Tables

class CreateUserMetadataTemplates < ActiveRecord::Migration
  def self.up
    create_table :file_tags do |t|
      t.references :exp_file, :null => false
      t.references :tag, :null => false
      t.timestamps
    end
    add_index :file_tags, :exp_file_id
    add_index :file_tags, :tag_id

    create_table :tags do |t|
      t.string :tag_name, :null => false
    end
    add_index :tags, :tag_name

    create_table :categories do |t|
      t.string :name, :null => false
      t.text :description
      t.boolean :user_defined
    end
    add_index :categories, :name

    create_table :cat_values do |t|
      t.boolean :user_defined
      t.references :category
    end
    add_index :cat_values, :category_id
    add_index :cat_values, :name

    create_table :subjects do |t|
      t.string :subject_tag, :null => false
      t.string :unique_name, :null => false
      t.references :specimen, :null => false
      t.references :gender, :null => false
      t.text :conditions
      t.date :birth_date, :null => false
      t.date :harvest_date
      t.references :reason_of_death
      t.references :condition_at_death
      t.references :heart_condition_at_death
      t.references :was_fomblin_used
      t.float :heart_weight
      t.float :body_weight
    end
    add_index :subjects, :subject_tag
    add_index :subjects, :specimen_id
    execute "ALTER TABLE subjects ADD CONSTRAINT
      constraint_uniq_subject UNIQUE (unique_name, birth_date)"

    create_table :experiments do |t|
      t.date :exp_date, :null => false
      t.references :user, :null => false
      t.references :machine
      t.references :modality, :null => false
      t.string :type
    end
    add_index :experiments, :modality_id
    add_index :experiments, :user_id
    add_index :experiments, :machine_id
    add_index :experiments, :exp_date
    execute "ALTER TABLE experiments ADD CONSTRAINT
      constraint_uniq_experiment UNIQUE (modality_id, user_id, exp_date, name)"

    create_table :collimators do |t|
      t.string :name
      t.references :collimator_type, :null => false
      t.string :material
      t.float :radius
      t.float :length
    end
    add_index :collimators, :collimator_type_id

    create_table :user_metadata_templates do |t|
      t.integer :num_bytes_per_pixel
      t.references :image_byte_order
      t.string :data_dimensions
      t.integer :num_images
      t.integer :total_num_images
      t.string :gated_type
      t.integer :num_gates
      t.string :image_data_format
      t.references :data_file_type
      t.references :acquisition_mode
      t.references :data_type
      t.references :normalization_method
      t.references :reconstruction_method
      t.boolean :flood_corrected
      t.integer :num_angle_steps
      t.float :total_angle_rotation
      t.boolean :decay_corrected
      t.integer :n_projections
      t.float :time_per_projection
      t.string :direction_of_rotation
      t.float :start_angle
      t.string :pixel_dimensions
      t.string :voxel_dimensions
      t.integer :attnzcnt # Num Voxels used from Attn Map
      t.float :voxel_x_width
      t.float :voxel_y_width
      t.float :voxel_z_width
      t.text :voxel_description
      t.boolean :is_gated
      t.integer :spatial_resolution
      t.integer :n_diffusion_gradients
      t.float :b_value
      t.integer :study_duration
      t.references :subject_orientation
    end
    add_index :user_metadata_templates, :user_id, :unique => true

    create_table :files do |t|
      t.string :file_name, :null => false
      t.string :content_type, :null => false
      t.string :file_path, :null => false
      t.float :file_size
      t.string :file_checksum
      t.text :file_description
      t.integer :num_bytes_per_pixel
      t.references :image_byte_order
      t.string :data_dimensions
      t.integer :num_images
      t.integer :total_num_images
      t.string :gated_type
      t.integer :num_gates
      t.string :image_data_format
      t.references :data_file_type
      t.references :acquisition_mode
      t.references :data_type
      t.references :normalization_method
      t.references :reconstruction_method
      t.boolean :flood_corrected
      t.integer :num_angle_steps
      t.float :total_angle_rotation
      t.boolean :decay_corrected
      t.integer :n_projections
      t.float :time_per_projection
      t.string :direction_of_rotation
      t.float :start_angle
      t.string :pixel_dimensions
      t.string :voxel_dimensions
      t.integer :attnzcnt # Num Voxels used from Attn Map
      t.float :voxel_x_width
      t.float :voxel_y_width
      t.float :voxel_z_width
      t.text :voxel_description
      t.boolean :is_gated
      t.integer :spatial_resolution
      t.integer :n_diffusion_gradients
      t.float :b_value
      t.integer :study_duration
      t.references :subject_orientation
    end

    create_table :file_associations, :id => false do |t|
      t.references :file
      t.references :owner, :null => false, :polymorphic => true
      t.text :tags
    end
    add_index :file_associations, :owner_id
    add_index :file_associations, :file_id

    create_table :exp_tracer_associations do |t|
      t.references :experiment, :null => false
      t.references :tracer, :null => false
    end
    add_index :exp_tracer_associations, :experiment_id
    add_index :exp_tracer_associations, :tracer_id

    create_table :exp_subject_associations do |t|
      t.references :subject, :null => false
      t.text :condition
      t.float :weight
    end
    add_index :exp_subject_associations, :experiment_id
    add_index :exp_subject_associations, :subject_id

    create_table :machines do |t|
      t.string :company
      t.references :location, :null => false
      t.integer :n_detectors, :default => 0, :null => false
      t.string :address
      t.string :city
      t.string :state, :limit => 2
      t.string :zip_code
    end
    add_index :machines, :name
    add_index :machines, :location_id

    create_table :energy_windows do |t|
      t.references :experiment_detector
      t.decimal :energy_window
      t.decimal :min_energy_window
      t.decimal :max_energy_window
    end
    add_index :energy_windows, :experiment_detector_id

    create_table :experiment_detectors do |t|
      t.references :spect, :null => false
      t.references :detector_head, :null => false
      t.references :collimator
      t.string :detector_material
    end
    add_index :experiment_detectors, :spect_id
    add_index :experiment_detectors, :detector_head_id
    add_index :experiment_detectors, :collimator_id

    create_table :users do |t|
      t.string :username, :null => false
      t.string :first_name
      t.string :last_name
      t.string :email, :null => false
      t.string :department
      t.string :institution
      t.string :position
      t.string :crypted_password
      t.string :password_salt
      t.string :persistence_token
      t.references :status
      t.string :address
      t.string :city
      t.string :state, :limit => 2
      t.string :zip_code
    end
    add_index :users, :username, :unique => true

    create_table :roles do |t|
    end
    add_index :roles, :name, :unique => true

    create_table :user_roles, :id => false do |t|
      t.references :role, :null => false
    end
    add_index :user_roles, [:user_id, :role_id], :unique => true

    create_table :experiment_approvals, :id => false do |t|
      t.references :user
    end
    add_index :experiment_approvals, :experiment_id
    add_index :experiment_approvals, :user_id
  end

  def self.down
    drop_table :experiment_approvals
    drop_table :energy_windows
    drop_table :collimators
    drop_table :file_associations
    drop_table :exp_subject_associations
    drop_table :exp_tracer_associations
    drop_table :user_metadata_templates
    drop_table :files
    drop_table :experiment_detectors
    drop_table :experiments
    drop_table :machines
    drop_table :user_roles
    drop_table :users
    drop_table :roles
    drop_table :subjects
    drop_table :cat_values
    drop_table :categories
    drop_table :file_tags
    drop_table :tags
  end
end
