Formulate User Instructions, adapted from caBench-to-Bedside (caB2B) Web Application v3.2
An easy-to-use tool for searching across caGrid
Mukesh Sharma, Washington University School of Medicine
With funding support provided by the National Institute of Standards and Technology
QI-Bench Overview
[Figure: the QI-Bench workflow with Specify, Formulate, Execute, and Analyze stages and feedback loops. Data layers shown: Reference Data Sets; Primary Data (images and other raw data); Annotation and Image Markup, Non-imaging Clinical Data; Statistical Analysis Results (relation strength). QIBO and an RDF triple store linking the concepts CT Volumetry, CT, Tumor growth, and TherapeuticEfficacy through the relations obtained_by, measure_of, and used_for.]
Analysis model: $Y = \beta_{0..n} + \beta_1(\mathrm{QIB}) + \beta_2 T + e_{ij}$
ReferenceDataSet+ = Formulate(BiomarkerDB, {DataService});
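To make the triple-store idea concrete, here is a minimal Python sketch of matching a triple pattern, roughly what a single SPARQL triple pattern does. The slide names only the concepts and the relations; which concept is the subject or object of each relation is an assumption made for illustration, and `match` is a hypothetical helper, not part of QI-Bench or caB2B.

```python
# Illustrative triples. ASSUMPTION: the pairing of subjects and objects is
# guessed for the example; the slide only names the concepts and relations.
TRIPLES = [
    ("CT Volumetry", "obtained_by", "CT"),
    ("CT Volumetry", "measure_of", "Tumor growth"),
    ("Tumor growth", "used_for", "TherapeuticEfficacy"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a (subject, predicate, object) pattern.

    None acts as a wildcard, playing the role of a variable in a
    SPARQL triple pattern such as `?x :measure_of :TumorGrowth`.
    """
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    # Which concepts are recorded as a measure of tumor growth?
    hits = match(TRIPLES, p="measure_of", o="Tumor growth")
    print([subject for subject, _, _ in hits])  # ['CT Volumetry']
```

Because the query is phrased over concepts and relations rather than UML classes, the same `match` call works unchanged as new triples are added to the store.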
Role | Use case | Supported now | Gap vs. model
Domain expert | Find data with high precision and recall | Web App (thin): perform saved searches and organize data | Granular, role-based security with single sign-on to both private and public data resources
Informaticist | Form and expose queries to find data | Desktop (thick): use UML models | Define queries in terms of RDF triples driven by ontologies (not UML)
IT systems expert | Configure knowledge resources and data services | Server-side: configure resources that use caGrid | Prepare and support a more flexible method (e.g., SPARQL) so as not to be limited by caGrid
caB2B overview
– caB2B is a tool designed to integrate and analyze diverse biomedical datasets seamlessly. It has been developed to facilitate individual steps of cancer research analyses and reduce the bench-to-bedside barrier.
– caB2B is a caGrid client that permits bench scientists, translational researchers, and clinicians to leverage data services developed under caBIG® through a graphical user interface. Its metadata-based query interface enables end users to search virtually any caGrid data service.
Example Use Cases
• Users can query for all pre-cancerous biospecimens from various caTissue instances, such as those at Washington University, Thomas Jefferson University, and Holden Comprehensive Cancer Center.
• Users can identify the samples obtained for glioblastoma multiforme (GBM) and the corresponding CT image information by querying across caTissue and NBIA using caB2B.
• Users can find out whether a sample used in an expression profiling experiment is available for a SNP analysis experiment by querying across caTissue and caArray using caB2B.
• Users can search for a particular gene by its EntrezGeneID and retrieve related information, e.g., messenger RNA and protein information, from GeneConnect.
caB2B Dependencies
• Availability of data on the caGrid
• Metadata registered in caDSR
• caGrid core services that support security, query federation and metadata
• Performance of the caGrid and data services
caB2B v3.2 Components
• caB2B Server
– Caches metadata (concept codes, class and attribute descriptions, and permissible values) from caDSR, along with the service instances to query
– Persists query results and downstream analyses
• caB2B Administrative Module
– Permits the administrator to customize the caB2B server
– Allows model metadata caching and service instance selection
– Permits the administrator to curate models (frequently used paths, categories, intermodel joins) to facilitate end-user queries
• caB2B Client Application
– Allows end users to query virtually any caGrid data service, persist salient results, and examine them in visualization windows
• caB2B Web Application
– Allows users to query microarray, imaging, and biospecimen data available on caGrid
– Supports keyword searches and highly relevant parameterized queries (saved searches)
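The difference between a keyword search and a saved (parameterized) search can be sketched as follows. This illustrates the general idea only and is not caB2B's implementation: `keyword_search`, `SavedSearch`, and the sample records and attribute names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical records standing in for biospecimen rows returned from caGrid;
# attribute names and values are illustrative only.
RECORDS = [
    {"id": 1, "tissueSite": "Lung", "pathologicalStatus": "Pre-Malignant"},
    {"id": 2, "tissueSite": "Brain", "pathologicalStatus": "Malignant"},
    {"id": 3, "tissueSite": "Lung", "pathologicalStatus": "Malignant"},
]


def keyword_search(records, keyword):
    """Case-insensitive substring match of the keyword against every value."""
    kw = keyword.lower()
    return [r for r in records if any(kw in str(v).lower() for v in r.values())]


@dataclass(frozen=True)
class SavedSearch:
    """Hypothetical parameterized query: the constrained attribute names are
    fixed when the search is saved; values are supplied at each run."""
    name: str
    attributes: tuple

    def run(self, records, **params):
        missing = set(self.attributes) - set(params)
        if missing:
            raise ValueError(f"missing parameters: {sorted(missing)}")
        # Exact match on every saved attribute.
        return [r for r in records
                if all(r.get(a) == params[a] for a in self.attributes)]


if __name__ == "__main__":
    print([r["id"] for r in keyword_search(RECORDS, "lung")])  # [1, 3]
    by_status = SavedSearch("Specimens by status", ("pathologicalStatus",))
    print([r["id"] for r in by_status.run(RECORDS, pathologicalStatus="Malignant")])  # [2, 3]
```

A keyword search casts a wide net over all values, while a saved search constrains specific attributes exactly, so the user supplies only values.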
Target Audience
• caB2B Administrative Module
Bioinformaticist (the caB2B administrator). Knowledge of the UML/domain models of caBIG tools is required. For activities such as creating multi-model categories, knowledge of Extensible Markup Language (XML) and basic familiarity with executing commands is desirable.
• caB2B Client Application
Clinical and translational research scientists. Knowledge of the UML/domain models of caBIG tools is required to create and execute queries using caB2B.
• caB2B Web Application
Clinical and translational research scientists. No special knowledge or skill is required to use the caB2B web application.
caB2B Web Application Capabilities
The caB2B Web Application allows users to:
• Sign in (optional)
• Select the type of data to search
• Select the services (databases) from which data can be retrieved
• Perform a keyword or parameterized query
• Execute queries offline
• Export data to a CSV file
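As a rough sketch of the CSV export step, the snippet below writes result rows with Python's standard `csv` module, which also takes care of quoting values that contain commas. `export_csv` and the sample rows are hypothetical; the slides do not describe how caB2B produces its CSV files.

```python
import csv
import io


def export_csv(rows, columns):
    """Serialize result rows (dicts) to CSV text with a header row.

    Columns absent from a row are written as empty cells; keys not listed
    in `columns` are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical query results; in practice the rows would come from caGrid.
    rows = [
        {"id": 1, "tissueSite": "Lung", "note": "not exported"},
        {"id": 2, "tissueSite": "Brain, left lobe"},
    ]
    print(export_csv(rows, ["id", "tissueSite"]))
```

Passing the column list explicitly keeps the file layout stable even when individual rows carry extra or missing attributes.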
Administrative module features
• Web-based administration.
• UI to search caDSR, retrieve models, and load them into the MDR.
• Discover services dynamically.
• Curate frequently used paths to speed up query building.
• Create categories to bridge the gap between the end user's view of data and the actual object-oriented representation.
• Define intermodel joins based on Common Data Element (CDE) and Data Element Concept (DEC) matches, with manual override to connect underspecified models.
• Automatic cache update between the administrative module and the caB2B server.
• Ability to reconfigure previously configured service instances.
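The CDE-based join bullet can be sketched as grouping attributes by their CDE identifier and proposing a join for every pair that comes from different models, plus administrator-supplied pairs as the manual override. Everything here is hypothetical: the model, class, and attribute names, the placeholder CDE identifiers, and `find_joins` itself. DEC matching would follow the same pattern with a different key.

```python
from collections import defaultdict

# Hypothetical metadata: (model, class, attribute) -> CDE identifier.
# The identifiers are placeholders, not real caDSR public IDs.
ATTRIBUTE_CDES = {
    ("caTissue", "Specimen", "barcode"): "CDE-A",
    ("caArray", "Sample", "externalId"): "CDE-A",
    ("caTissue", "Participant", "gender"): "CDE-B",
}


def find_joins(cdes, manual=()):
    """Propose intermodel joins between attributes that share a CDE.

    Only pairs from different models are proposed. `manual` holds
    administrator-supplied pairs for underspecified models whose attributes
    carry no matching CDE.
    """
    by_cde = defaultdict(list)
    for attribute, cde in cdes.items():
        by_cde[cde].append(attribute)

    joins = set()
    for attributes in by_cde.values():
        for i, left in enumerate(attributes):
            for right in attributes[i + 1:]:
                if left[0] != right[0]:  # skip pairs within the same model
                    joins.add(tuple(sorted((left, right))))
    for left, right in manual:  # manual override
        joins.add(tuple(sorted((left, right))))
    return joins


if __name__ == "__main__":
    # Hypothetical override for a model lacking a matching CDE.
    override = [(("NBIA", "Patient", "patientId"), ("caTissue", "Participant", "id"))]
    for join in sorted(find_joins(ATTRIBUTE_CDES, override)):
        print(join)
```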
Load Models from caDSR
[Figure: the administrative interface gets all model names from caDSR; the administrator selects the models to load, and each selected model is fetched into the caB2B MDR.]
Discover Services Dynamically
[Figure: the administrator gets the models loaded in the caB2B MDR and selects models for which to discover services; data services are discovered by domain model, and the administrator selects the service instances to use.]
Curating frequently used paths for connecting classes
Identifying the most relevant paths between classes and storing them.
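One simple way to find a candidate path between two classes is a breadth-first search over the class associations, here using the caTissue Core classes from the category example. The slides do not say how caB2B ranks path relevance, so treating the fewest-hop path as the most relevant is an assumption, and `shortest_path`, `curate`, and `curated_paths` are hypothetical names.

```python
from collections import deque

# Associations from the caTissue Core example, treated as undirected here.
ASSOCIATIONS = {
    "Participant": ["ParticipantMedicalIdentifier"],
    "Site": ["ParticipantMedicalIdentifier"],
    "ParticipantMedicalIdentifier": ["Participant", "Site"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return the fewest-hop class path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical store of curated paths, keyed by (source, target) class pair.
curated_paths = {}


def curate(graph, start, goal):
    """Compute a path once and keep it for reuse when queries are built."""
    path = shortest_path(graph, start, goal)
    if path is not None:
        curated_paths[(start, goal)] = path
    return path


if __name__ == "__main__":
    print(curate(ASSOCIATIONS, "Participant", "Site"))
    # ['Participant', 'ParticipantMedicalIdentifier', 'Site']
```

Storing the result means later query building can look the path up instead of searching the model again.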
caB2B Category
A UML class is a collection of attributes that makes sense technically to developers and bioinformaticians, but may not be intuitive to researchers and clinicians.
[Class diagram (cd Logical Model):
– domainobject::Participant: id: Long; lastName, firstName, middleName: String; birthDate: Date; sexGenotype, gender, race, ethnicity, socialSecurityNumber, activityStatus: String
– domainobject::Site: id: Long; name, type, emailAddress, activityStatus: String
– domainobject::ParticipantMedicalIdentifier: id: Long; medicalRecordNumber: String
– Participant (1, role participant) to ParticipantMedicalIdentifier (0..*, role participantMedicalIdentifierCollection)
– Site (1, role site) to ParticipantMedicalIdentifier (0..*)]
Data elements for patient demographics are spread across three classes.
* Example from caTissue Core
caB2B Category
A caB2B category:
• Is a collection of attributes that makes logical sense to researchers and clinicians
• Can contain attributes from any class, even across models, as long as a valid path exists among all the classes
• A UML class is itself a type of category
Usage
• Each caB2B administrator creates categories
• Categories may be shared across caB2B server instances
[Figure: An example caB2B category, shown against the caTissue Core classes Participant, Site, and ParticipantMedicalIdentifier.]
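The rule that a category may mix attributes from any classes, as long as a valid path connects them, amounts to a connectivity check. A minimal sketch, assuming the association graph of the caTissue Core example above; `Category`, `is_valid`, and the category name "Patient Demographics" are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

# Associations from the caTissue Core example, treated as undirected here.
ASSOCIATIONS = {
    "Participant": ["ParticipantMedicalIdentifier"],
    "Site": ["ParticipantMedicalIdentifier"],
    "ParticipantMedicalIdentifier": ["Participant", "Site"],
}


@dataclass
class Category:
    """Hypothetical category: a named list of (class, attribute) pairs."""
    name: str
    attributes: list = field(default_factory=list)

    def classes(self):
        return {cls for cls, _ in self.attributes}


def is_valid(category, graph):
    """True if every class in the category is reachable from the others.

    Intermediate classes that contribute no attribute may lie on the path,
    mirroring "as long as a valid path exists among all classes".
    """
    classes = category.classes()
    if not classes:
        return False
    start = next(iter(classes))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbor in graph.get(queue.popleft(), []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return classes <= seen


if __name__ == "__main__":
    demographics = Category("Patient Demographics", [
        ("Participant", "lastName"), ("Participant", "firstName"),
        ("Participant", "birthDate"), ("Participant", "gender"),
        ("ParticipantMedicalIdentifier", "medicalRecordNumber"),
        ("Site", "name"),
    ])
    print(is_valid(demographics, ASSOCIATIONS))  # True
```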
Defining intermodel joins using semantic metadata, with manual override to accommodate underspecified models
Two models are connected through the common bridging attributes between them.
caB2B Client Features
The end-user client is a Java application that enables end users to query for and persist data available on caGrid. It offers the following features:
• caGrid-based authentication of users, with anonymous login for users without a grid account.
• A query component with a diagrammatic view.
• The diagrammatic viewer lets the user build a directed acyclic graph of the query to be executed and connect two or more classes to be searched.
• User-based access control for experiments and saved queries: experiments and queries saved by a user are visible only to that user. The "My Experiments" and "My Search Queries" menus on the home page dashboard provide easy access to them.
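Checking that a query diagram is a directed acyclic graph can be sketched with Python's standard `graphlib` module (Python 3.9+), which also yields an order in which every class follows the classes it is connected from. This is a stand-in under that assumption; the slides do not describe how the client validates or traverses the graph, and `execution_order` is a hypothetical name.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(query_graph):
    """Order the classes of a query graph so each class follows its parents.

    `query_graph` maps each class to the set of classes it is connected
    from (graphlib's predecessor convention). A cycle means the graph is
    not a DAG, so the query is rejected.
    """
    try:
        return list(TopologicalSorter(query_graph).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"query graph is not acyclic: {err.args[1]}") from err


if __name__ == "__main__":
    # Hypothetical diagram: Participant and Site both connect to
    # ParticipantMedicalIdentifier.
    query = {"ParticipantMedicalIdentifier": {"Participant", "Site"}}
    print(execution_order(query))
```

Participant and Site may appear in either order, since neither depends on the other; only ParticipantMedicalIdentifier is guaranteed to come last.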
caB2B Client Features
• Category popularity: the "Popular Categories" menu on the home page dashboard displays the categories searched by all caB2B users, in descending order of popularity.
• User override of administrator-defined service instances: users can change the service instances configured by the administrator without using the administrative module, through the "My Settings" link on the home page dashboard or from the third step of the Search Data wizard.
• Read-only view of DCQL: the DCQL that will be executed for a query can be reviewed from the third step of the Search Data wizard.
• Grouping of query results by service instance: the results for a query can be narrowed down to those obtained from a particular service instance.
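Ordering categories by popularity is essentially a frequency count across all users' searches. A minimal sketch, assuming a hypothetical log with one category name per search; `popular_categories` and the category names are illustrative, not caB2B's API.

```python
from collections import Counter


def popular_categories(search_log, top=None):
    """Rank categories by how often they were searched, most popular first.

    `search_log` is a hypothetical list with one category name per search,
    pooled across all users. Counter.most_common keeps ties in the order
    they were first encountered.
    """
    return [name for name, _ in Counter(search_log).most_common(top)]


if __name__ == "__main__":
    log = ["Biospecimen", "Gene", "Biospecimen", "Image", "Biospecimen", "Gene"]
    print(popular_categories(log))  # ['Biospecimen', 'Gene', 'Image']
```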
caB2B Client Features
• Queries generated or executed can be saved.
• The data obtained from the query may be saved as a ‘virtual experiment’ and analyzed further.
• Saved data may be filtered to generate a custom data view.
• The end user may also visualize data in the experiment by using various graphical components.
Users can override administrator-defined service instances from “My Settings” or from the third step of the Search Data wizard.
Grouping of query results by service instance
Available results can be grouped by service instance to view results from the service of interest.
[Screenshot: 348 samples, filtered to view only the results from the caNanoLab-GME instance.]
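Grouping results by service instance, and then narrowing to one instance as in the caNanoLab-GME example, can be sketched as a simple bucketing step. The `serviceInstance` field, the `group_by_service` helper, and the rows are hypothetical; "instance-B" is a placeholder name.

```python
from collections import defaultdict


def group_by_service(results):
    """Bucket result rows by the service instance that returned them."""
    groups = defaultdict(list)
    for row in results:
        groups[row["serviceInstance"]].append(row)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical rows; "serviceInstance" is an assumed field name.
    results = [
        {"id": "s1", "serviceInstance": "caNanoLab-GME"},
        {"id": "s2", "serviceInstance": "instance-B"},
        {"id": "s3", "serviceInstance": "caNanoLab-GME"},
    ]
    groups = group_by_service(results)
    # Narrow the view to a single service instance.
    print([row["id"] for row in groups["caNanoLab-GME"]])  # ['s1', 's3']
```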
Scenarios for caB2B Use
• Use the GU/WashU instances to access public data available via the production caGrid
• Use your institutional instance to access:
– Public data available via the production caGrid
– Private data available via the institutional caGrid (local services)
• Use a public or private instance to access:
– Private data that only your (or your collaborators') caGrid credentials allow you to access
caB2B Development and Support
• caB2B Knowledge Center at Washington University Medical School
– https://wiki.nci.nih.gov/display/caB2B/caB2B (Wiki Page)
– https://cabig-kc.nci.nih.gov/CaGrid/forums (Forums)
• caB2B Developers at Georgetown University
– www.georgetown.edu/
Useful Links
• caB2B Instances
– https://cab2b.wustl.edu/cab2b/ (Washington University School of Medicine)
– https://cab2b.georgetown.edu/cab2b (Georgetown University)
• caB2B Tools Page: https://cabig.nci.nih.gov/tools/caB2B
• caB2B GForge: https://gforge.nci.nih.gov/projects/cab2b/
Acknowledgments
• Washington University
– Poornima Govindrao
– Rakesh Nagarajan
– Mark Watson
• Georgetown University
– Jim Humphries
– Baris Suzek
• NCI-CBIIT
– Ian Fore
– Juli Klemm
• SAIC-Frederick, Inc.
– William Brent Lander
– Rod Winkler
• Capability Plus Solutions
– Chris Piepenbring
• Sapient
– Stephen Goldstein
• Persistent Systems/Krishagni Solutions
– Srikanth Adiga
– Pooja Arora
– Gaurav Mehta
– Pallavi Mistry
– Chetan Patil
– Chetan Pundhir
– Deepak Shingan
– Madhumita Shrikhande
– Chandrakant Talele
– Rajesh Vyas