Building Open Science Communities For Imaging Research

Open Source EHR Summit & Workshop

October 17-18, 2012

National Harbor, MD

Fred Prior¹, J. Kirby², J. Freymann², C. Jaffe³, K. Smith¹, B. Vendt¹, K. Clark¹, L. Tarbox¹

¹ Washington University School of Medicine; ² SAIC-Frederick; ³ Boston University


Open Science

A somewhat nebulous concept in the scientific community, which has been construed to mean:

Using Open Source software in scientific research;

Making data and tools available to the public to enhance basic science education;

Freely sharing tools, data and results among scientists;

Making scientific results available in Open Access journals;

Finding innovative solutions to scientific problems via crowdsourcing.

Using Open Source software to capture and manage Open Data to encourage and support research and education.

Creating Research Communities around an Open Data resource.

Open Data

Truly open access to data that was collected for one purpose but is being reused for new purposes.

The NCI Cancer Imaging Program (CIP) has funded The Cancer Imaging Archive (TCIA), an open image archive service to support cancer research.

TCIA collects, de-identifies, curates and manages rich collections of oncology images and related clinical and trial information.

Since June of 2011, over 63 institutions have submitted more than 19.5 million images.

Over 1,500 registered users from 93 countries have downloaded more than 13.5 TB of information.

TCIA encourages and supports cancer-related open science communities by hosting and managing the image archive and by providing project wiki space; it is also in the process of adding searchable clinical trial data repositories to facilitate collaborative research.

http://www.cancerimagingarchive.net/

Open Source Software supporting Open Science

Private cloud computing environment based on XCP and Xen virtualization and a LAMP stack (CentOS/Apache/MySQL/PHP)

National Biomedical Imaging Archive (NBIA) software provides image management: https://wiki.nci.nih.gov/display/NBIA/National+Biomedical+Imaging+Archive+-+NBIA

Clinical Trial Processor (CTP) software supports de-identification and secure transport: http://mircwiki.rsna.org/index.php?title=CTP-The_RSNA_Clinical_Trial_Processor

Confluence Wiki supports documentation and community outreach

Request Tracker (RT) help-ticket tracking system manages help desk functions: http://www.bestpractical.com/rt/

Extensible Neuroimaging Archive Toolkit (XNAT) supports a project-focused view of multiple image collections and research workgroups: http://xnat.org/about/xnat-publications.html

AIME image annotation and markup management system allows users to add content to the collections: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3015142/

Managing an Open Access Information Resource

A CTP pipeline de-identifies images in compliance with DICOM PS 3.15, Appendix E, and queues them for transmission to the intake server.
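The de-identification step can be illustrated with a minimal sketch. It is not CTP code and does not reproduce the normative PS 3.15 Appendix E tables: the action table covers only a few well-known identifying attributes, the dataset is assumed to be a plain dict keyed by (group, element) tuples, and the UID remapping scheme (name-based UUIDs under the 2.25 root) and the salt parameter are illustrative assumptions.

```python
import uuid

# Illustrative subset of attributes and actions, loosely modeled on the
# PS 3.15 profile action codes: "X" = remove, "Z" = replace with an empty
# value, "U" = replace the UID consistently. NOT the normative table.
ACTIONS = {
    (0x0010, 0x0010): "Z",  # Patient's Name
    (0x0010, 0x0020): "Z",  # Patient ID
    (0x0010, 0x0030): "Z",  # Patient's Birth Date
    (0x0010, 0x1040): "X",  # Patient's Address
    (0x0008, 0x0080): "X",  # Institution Name
    (0x0020, 0x000D): "U",  # Study Instance UID
    (0x0020, 0x000E): "U",  # Series Instance UID
    (0x0008, 0x0018): "U",  # SOP Instance UID
}


def remap_uid(uid: str, salt: str) -> str:
    """Deterministically replace a UID so that references between studies,
    series and instances still line up after de-identification."""
    # Name-based (SHA-1) UUID: the same salt + UID always yields the same result.
    u = uuid.uuid5(uuid.NAMESPACE_OID, salt + uid)
    return f"2.25.{u.int}"  # 2.25 is the UUID-derived UID root


def deidentify(dataset: dict, salt: str) -> dict:
    """Return a new dict with ACTIONS applied; attributes not listed are kept
    unchanged here (a real profile would be far stricter)."""
    out = {}
    for tag, value in dataset.items():
        action = ACTIONS.get(tag, "K")
        if action == "X":
            continue  # drop the attribute entirely
        if action == "Z":
            out[tag] = ""
        elif action == "U":
            out[tag] = remap_uid(value, salt)
        else:
            out[tag] = value
    return out
```

For example, `deidentify({(0x0010, 0x0010): "DOE^JANE", (0x0008, 0x0060): "CT"}, salt="project-secret")` empties the patient name and leaves the modality untouched. Keeping the salt constant within a project is what makes the UID mapping consistent across submissions.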

The intake server accepts multiple concurrent submissions. These are curated (quality control review) before being moved to the public server and made open access.
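The intake-then-curate flow can be sketched as a queue drained by a quality control step. The Submission record, the two QC rules, and the in-memory public store are hypothetical stand-ins; concurrency, transport, and the actual TCIA review criteria are left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Submission:
    site: str
    collection: str
    images: list  # each image: dict of attribute name -> value
    problems: list = field(default_factory=list)


def quality_review(sub: Submission) -> bool:
    """Hypothetical QC rules: no residual patient name, modality present."""
    for i, img in enumerate(sub.images):
        if img.get("PatientName"):
            sub.problems.append(f"image {i}: residual PatientName")
        if not img.get("Modality"):
            sub.problems.append(f"image {i}: missing Modality")
    return not sub.problems


def curate(intake: deque, public: dict) -> list:
    """Drain the intake queue; publish passing submissions, return the rest."""
    rejected = []
    while intake:
        sub = intake.popleft()
        if quality_review(sub):
            public.setdefault(sub.collection, []).extend(sub.images)
        else:
            rejected.append(sub)  # held back for follow-up with the submitting site
    return rejected
```

Rejected submissions carry their list of problems, so nothing reaches the public store until a reviewer has cleared it.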

The TCIA wiki supports research groups by providing a platform to describe the scope and intent of collections and to provide metadata, including the posting of conference abstracts and publications.

Privileged image consumers have been granted special permissions to query and download collection-specific, limited-access images, while “general” consumers cannot.
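The split between privileged and general consumers can be modeled as a per-collection access level plus per-user grants. This sketch assumes that arrangement; the class and function names are hypothetical, and only the collection names and their Public/Restricted status come from the collections list in this poster.

```python
from dataclasses import dataclass, field

# Access level per collection (a small excerpt of the collections list).
COLLECTION_ACCESS = {
    "LIDC-IDRI": "Public",
    "QIN Lung": "Restricted",
    "NLST": "Restricted",
}


@dataclass
class Consumer:
    name: str
    granted: set = field(default_factory=set)  # restricted collections this user may see


def can_access(user: Consumer, collection: str) -> bool:
    access = COLLECTION_ACCESS.get(collection)
    if access == "Public":
        return True
    if access == "Restricted":
        return collection in user.granted
    return False  # unknown collections are denied by default


def visible_collections(user: Consumer) -> list:
    return sorted(c for c in COLLECTION_ACCESS if can_access(user, c))
```

A general consumer such as `Consumer("guest")` sees only `["LIDC-IDRI"]`, while a privileged consumer granted `"QIN Lung"` also sees that collection but still not NLST. Denying unknown collections by default keeps a misconfigured collection from becoming public by accident.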


The public server hosts the curated images available for download, together with the open-source NBIA application used to download them. Downloads are recorded in NBIA’s MySQL database.

The operations management Support Center’s trouble-ticket system tracks image consumers’ issues and resolutions. Consumers reach the support center via email or telephone.

Image consumers view and download images from TCIA and can save lists of images that may be shared with colleagues. TCIA supports searches by collection, anatomy, modality, scanner manufacturer, dates, and many modality-event-specific criteria.
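A faceted search over series metadata, plus a saved list of results, might look like the following sketch. The Series fields, the sample catalog, and the shared-list store are hypothetical; NBIA's actual search and list features are not reproduced here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Series:
    uid: str
    collection: str
    anatomy: str
    modality: str
    manufacturer: str
    study_date: date


def search(series, collection: Optional[str] = None, anatomy: Optional[str] = None,
           modality: Optional[str] = None, manufacturer: Optional[str] = None,
           start: Optional[date] = None, end: Optional[date] = None) -> list:
    """Return the series matching every criterion that was supplied."""
    def matches(s: Series) -> bool:
        return ((collection is None or s.collection == collection)
                and (anatomy is None or s.anatomy == anatomy)
                and (modality is None or s.modality == modality)
                and (manufacturer is None or s.manufacturer == manufacturer)
                and (start is None or s.study_date >= start)
                and (end is None or s.study_date <= end))
    return [s for s in series if matches(s)]


# A shared list is simply a named list of series UIDs that can be passed to a colleague.
shared_lists: dict = {}


def save_shared_list(name: str, hits) -> list:
    shared_lists[name] = [s.uid for s in hits]
    return shared_lists[name]


# Hypothetical sample catalog.
catalog = [
    Series("1.1", "LIDC-IDRI", "CHEST", "CT", "VendorA", date(2010, 1, 5)),
    Series("1.2", "TCGA-GBM", "BRAIN", "MR", "VendorB", date(2011, 3, 1)),
    Series("1.3", "LIDC-IDRI", "CHEST", "CT", "VendorB", date(2011, 6, 1)),
]
```

For instance, `save_shared_list("lung-ct-vendor-b", search(catalog, anatomy="CHEST", manufacturer="VendorB"))` stores and returns `["1.3"]`. Storing UIDs rather than copies of the images keeps a shared list small and lets it be cited in a publication.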

A dashboard reporting system provides TCIA management with current and historical consumer counts, collection image counts, and image downloads.
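The dashboard counts are aggregate queries over the download log. The sketch below uses Python's built-in sqlite3 as a stand-in for NBIA's MySQL database; the download table, its columns, and the sample rows are assumptions, not NBIA's schema, though the GROUP BY query would look much the same in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE download (
    user_name   TEXT    NOT NULL,
    collection  TEXT    NOT NULL,
    image_count INTEGER NOT NULL,
    bytes       INTEGER NOT NULL,
    day         TEXT    NOT NULL  -- ISO date, e.g. '2012-09-30'
);
""")
# Hypothetical sample rows.
conn.executemany("INSERT INTO download VALUES (?, ?, ?, ?, ?)", [
    ("alice", "LIDC-IDRI", 120, 60_000_000, "2012-09-01"),
    ("bob", "LIDC-IDRI", 80, 40_000_000, "2012-09-15"),
    ("alice", "TCGA-GBM", 300, 150_000_000, "2012-10-02"),
])


def downloads_by_collection(conn, since=None):
    """Per-collection consumer count, images downloaded and bytes served,
    optionally restricted to downloads on or after `since` (ISO date)."""
    sql = """
        SELECT collection,
               COUNT(DISTINCT user_name) AS consumers,
               SUM(image_count)          AS images,
               SUM(bytes)                AS bytes
        FROM download
        WHERE (? IS NULL OR day >= ?)
        GROUP BY collection
        ORDER BY collection
    """
    return conn.execute(sql, (since, since)).fetchall()
```

Calling `downloads_by_collection(conn)` gives current totals, while passing a start date such as `"2012-09-10"` gives the figures for a reporting window, which is how historical trends could be tabulated.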


Clark, et al., SIIM 2012, http://erl.wustl.edu/publications/posters/2012posters.html

TCIA DICOM Knowledge Base

Open Data derived from clinical practice or human subjects research must be obtained with proper consent and fully de-identified before being made publicly available, to protect patient privacy and comply with applicable law.

The Cancer Imaging Archive (TCIA) staff has accumulated a wealth of knowledge on best practices and procedures for DICOM image de-identification.

In order to share this information with the wider research community, a wiki-based knowledge base was created: https://wiki.cancerimagingarchive.net/display/Public/De-identification+Knowledge+Base

DICOM permits private extensions to the standard. These “private tag” attributes may contain scientifically valuable information that must be identified and retained.

De-identification scripts are constructed for each combination of modality, vendor, and software version, and are made available.
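Retaining valuable private tags while discarding the rest can be sketched as a rule table keyed by (modality, manufacturer, software version). In DICOM, private data elements have odd group numbers, and each block of private elements (gggg,bbxx) is reserved by a private creator element at (gggg,00bb), which must be kept along with any retained element. The rule table, the example element, and the vendor and version names below are hypothetical and do not reproduce TCIA's scripts.

```python
# Hypothetical rule table: private elements worth keeping for one
# (modality, manufacturer, software version) combination.
PRIVATE_KEEP_RULES = {
    ("MR", "VendorA", "V1"): {(0x0019, 0x100C)},  # illustrative element only
}

# Odd groups that are NOT available for private use (PS 3.5).
NON_PRIVATE_ODD_GROUPS = {0x0001, 0x0003, 0x0005, 0x0007, 0xFFFF}


def is_private(tag) -> bool:
    group, _ = tag
    return group % 2 == 1 and group not in NON_PRIVATE_ODD_GROUPS


def private_creator_for(tag):
    """Element (gggg,bbxx) is reserved by the private creator at (gggg,00bb)."""
    group, element = tag
    return (group, element >> 8)


def filter_private(dataset: dict, modality: str, manufacturer: str, version: str) -> dict:
    """Keep all standard elements, plus only the private elements (and their
    creators) that the rule for this configuration says to retain."""
    keep = PRIVATE_KEEP_RULES.get((modality, manufacturer, version), set())
    creators = {private_creator_for(t) for t in keep}
    return {
        tag: value
        for tag, value in dataset.items()
        if not is_private(tag) or tag in keep or tag in creators
    }
```

A configuration with no rule keeps no private elements at all, which errs on the side of privacy until someone has reviewed that modality, vendor, and software version.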

NCI’s Collection Acceptance Criteria

The CIP Advisory Group prioritizes collections based on:

NCI grant/contract award data sharing requirements

Analysis of imaging features to be used as biomarkers

Creation of correlative signatures for multi-platform biomarkers

Creation of algorithms for detection of cancer

Testing and validating quantitative analysis techniques

Unique characteristics for clinical training.

Adding new collections is a continuous process.

Current TCIA Collections

Collection                       Subjects  Images      Access
TCGA-BRCA                        61        60,633      Public
TCGA-GBM                         294       636,029     Public
TCGA-LGG                         26        15,411      Public
TCGA-KIRC                        102       48,781      Public
REMBRANDT                        130       110,020     Public
RIDER Lung PET CT                244       269,522     Public
RIDER Breast MRI                 10        4,800       Public
RIDER Neuro MRI                  19        70,220      Public
RIDER Lung CT                    32        15,716      Public
RIDER Phantom PET-CT             20        2,231       Public
RIDER Phantom MRI                10        7,061       Public
LIDC-IDRI                        1,010     244,527     Public
FDA Lung Phantom                 3         634,256     Public
Breast-Diagnosis                 88        105,050     Public
QIN Breast                       22        26,409      Public
QIBA CT-1C                       1         69,258      Public
CT Colonography                  825       941,774     Public
Prostate-Diagnosis               53        18,584      Public
QIN Lung                         15        1,168       Restricted
QIN Phantom                      3         475         Restricted
QIN Lung Segmentation Challenge  100       7,975       Restricted
QIN Prostate                     22        25,981      Restricted
NaF Prostate                     5         41,404      Restricted
Prostate-MRI                     26        22,036      Restricted
Renal Training                   8         7,576       Restricted
Head-Neck Cetuximab              96        15,199      Restricted
NLST                             26,722    15,364,335  Restricted

TCIA Supported Research

An international research community has free access to information.

Cancer researchers can use this data to test new hypotheses and develop new analysis techniques to advance our scientific understanding of cancer.

Engineers and developers can build new analysis tools and techniques using this data as test material for developing and validating algorithms.

Educators can use it as a teaching tool for introducing students to medical imaging technology and cancer phenotypes.

NCI-supported, TCIA-centric research communities:

CIP TCGA Radiology Initiative

Lung Image Database Consortium

Quantitative Imaging Network

National Lung Screening Trial

CIP TCGA Radiology Initiative

Driven by input from its scientific community, the Cancer Imaging Program (CIP) finds itself at the junction of two powerful scientific requisites: the need for cross-disciplinary research and inter-institutional data sharing to speed scientific discovery and reduce redundancy; and the need to provide imaging phenotype data to augment large-scale genomic analysis.

https://tcga-data.nci.nih.gov/tcga/tcgaHome2.jsp

Generate a uniform data set from multiple sites

Access Training Data Sets

Federated annotation tools pull data from a central location

Publications can point to specific Collections and Shared Lists

TCIA Use Case: How TCIA Enables the TCGA Phenotype Research Groups

Visualization and Annotation Tools Use TCIA Collections and Annotation Repository

Radiologists analyze TCIA collections using ClearCanvas visualization and AIM annotations.

Annotations and markup are stored back into TCIA.

Lung Image Database Consortium (LIDC)

The Lung Image Database Consortium research project (LIDC-IDRI) involves the generation of marked-up, annotated lesions on the diagnostic and lung cancer screening thoracic CT scans found in the LIDC-IDRI image collection.

The project has created a web-accessible international resource for development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis.

https://wiki.cancerimagingarchive.net/display/Public/Lung+Image+Database+Consortium

Quantitative Imaging Network (QIN)

QIN’s mission is to improve the role of quantitative imaging in clinical decision making in oncology through the development and validation of data acquisition methods, analysis methods, and tools to tailor treatment to individual patients and to predict or monitor the response to drug or radiation therapy.

To date, fifteen centers of imaging excellence have been selected through the NIH peer review process.

TCIA actively supports five cross-network research initiatives with both public and private data collections.

https://wiki.cancerimagingarchive.net/display/Public/Quantitative+Imaging+Network+Collections

Scientific Output from TCIA Supported Research Initiatives

20 peer-reviewed publications

32 scientific presentations

A partial list of publications may be found here: http://cancerimagingarchive.net/publications.html. More are in preparation, as the work is ongoing and new projects and collections are continually being added.

Conclusions

The Cancer Imaging Archive is an investment in Open Science by the National Cancer Institute. It provides Open Access to cancer images and trial data, along with mechanisms for collaborative research.

Medical imaging research has been slow to adopt Open Science, but initiatives such as TCIA are producing substantial progress in this new direction.

Open science communities have formed around TCIA data collections and are gaining traction, as evidenced by a steadily increasing output of abstracts and publications.