Overview of policies for security and data sharing
Ingolf Krüger, Barry Demchak
March 16, 2010
Roadmap
• PALMS (Physical Activity Location Measurement System)
• SOA Review
• PALMS Logical Architecture
• Policy and its composition
• Policy execution – relationship with caBIG
Feel free to ask questions!
PALMS Objectives
• Support data collection and analysis for exposure biology studies
  – Data capture from multiple devices
  – Multiple analyses and recombination of data
  – Sharing of data between investigators and projects
  – Support multiple visualizations (local and remote)
• Extensible and Flexible
  – Scalable for large data flows
  – Support large number of investigators and studies
  – Customizable datasets, calculations, and visualizations
• HIPAA Compliant and Secure
PALMS Organization and Data Flow (CI)
[Diagram. Inputs: GPS Device, Accelerometer, Others, External Data. Data: Subject Data, Raw Data. Per-study steps inside PALMS: Filtering, Scoring, Analyzing. Outputs: SPSS, ESRI, Google Maps Viewer, Other Local Viewer, Others. Authorable & Discoverable: Study Repository, Visualization Repository, Calculation Repository.]
PALMS Community
[Diagram. PIs and their studies grouped inside PALMS, forming the community.]
• Policy-driven access control
  – Subject data
  – Study data
  – Calculations
  – Visualizations
• Secure
• HIPAA Compliance
• Customized Studies
• Collaboration
• Data Reuse
[Diagram labels: Browser; Excel, Matlab, …; Study Repository; Visualization Engine; Calculation Engine.]
Data Flow
[Diagram. Roles: PI, RA, Guest. Actions: Define; Enter Subjects; Enter Observations; Refine & Start; Refine & Create. Components: PALMS studies, Study Repository, Visualization Engine, Calculation Engine.]
Policy
[Diagram. Roles: PI, RA, Guest, Admin. Relations to Policy: Defines, Uses, Applies. Components: PALMS studies.]
Policy (def.): Permission for someone to act on something
Policy (alt. def.): Conditional replacement of one workflow with another
Services and SOA
• Loose Coupling
• Late Binding
• Scalability
• Composition
• Interoperability
• Testability
[Sequence diagram 1. Lifelines: Producer, Database. Messages: StoreData(xxx), OK. Axis: Time.]
[Sequence diagram 2. Lifelines: Producer, Message Bus, Database. Messages: StoreData(xxx), OK.]
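As a minimal sketch of loose coupling and late binding, the StoreData(xxx)/OK exchange between Producer and Database can be routed through a message bus. The `MessageBus` class, the topic name, and the handler signature below are hypothetical; the slides do not specify an implementation.

```python
from typing import Any, Callable, Dict, List


class MessageBus:
    """Hypothetical in-process bus: routes a message to the handler bound to its topic."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Any], str]] = {}

    def bind(self, topic: str, handler: Callable[[Any], str]) -> None:
        # Late binding: a handler can be swapped at runtime without touching producers.
        self._handlers[topic] = handler

    def send(self, topic: str, payload: Any) -> str:
        if topic not in self._handlers:
            return "Failure"
        return self._handlers[topic](payload)


class Database:
    def __init__(self) -> None:
        self.rows: List[Any] = []

    def store_data(self, payload: Any) -> str:
        self.rows.append(payload)
        return "OK"


class Producer:
    def __init__(self, bus: MessageBus) -> None:
        self._bus = bus  # the producer knows only the bus, never the database

    def produce(self, payload: Any) -> str:
        return self._bus.send("StoreData", payload)


bus = MessageBus()
database = Database()
bus.bind("StoreData", database.store_data)
producer = Producer(bus)
reply = producer.produce("xxx")
```

Because the producer holds a reference only to the bus, the same code would work whether the database runs as a linked module, a separate process, or a networked service.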
Network Implementation
Single Server, Multiple Processes
Single Application, Linked Modules
Logical Deployment
• Malleability
• Manageability
• Dependability
• Incremental development
Logical Architecture (Preview)
[Diagram. Core: PALMS Integration System with Integration Adapter, Event Logger, Access Policies, HIPAA Policies, Failure Detection/Mitigation, Data Repository, Subject Repository. Attached through Service/Data Connectors: Producer Systems (Sensor, Sensor Adapter), Consumer Systems (Viewer, Viewer Adapter), Calculation Systems (Authoring, Prototyping, Execution, Calculation Repository).]
Composing Workflow and Policy
• Define and implement Policy Concerns
  – A class of policy decision embedded in a workflow
  – Characterized by a contract for workflow and dataflow
  – Supports reasoning regarding application correctness, completeness, and contradiction
  – Instantiated as policies inserted by stakeholders at either design time or runtime
If user in ["PIs", "RAs", "Guests"]
  Continue
Else
  Reply "Failure"
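The group check above can be sketched in Python as a policy that wraps a workflow and either lets it continue or replies "Failure". The membership table, the user names, and the `with_policy` helper are hypothetical and only illustrate the idea of a policy inserted into a workflow.

```python
from typing import Callable, Dict, Set

ALLOWED_GROUPS = ("PIs", "RAs", "Guests")


def check_access(user: str, memberships: Dict[str, Set[str]]) -> bool:
    """Return True if the user belongs to at least one allowed group."""
    return any(user in memberships.get(group, set()) for group in ALLOWED_GROUPS)


def with_policy(
    workflow: Callable[[str], str], memberships: Dict[str, Set[str]]
) -> Callable[[str], str]:
    """Wrap a workflow so the policy decides whether it continues or replies "Failure"."""

    def guarded(user: str) -> str:
        if check_access(user, memberships):
            return workflow(user)  # Continue
        return "Failure"  # Reply "Failure"

    return guarded


# Hypothetical memberships and workflow, for illustration only.
memberships = {"PIs": {"alice"}, "RAs": {"bob"}, "Guests": set()}
view_study = with_policy(lambda user: f"study data for {user}", memberships)
```

Here the workflow itself contains no access logic; the policy is attached from outside, which matches the "conditional replacement of one workflow with another" view of policy.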
Groups and Roles
• Internet2 Grouper
  – Hierarchical group management
  – Single point of control
  – Permission-based administration
  – Virtual organizations (VOs)
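Hierarchical group management means that membership in a subgroup implies membership in its parent. The sketch below shows one way to resolve such membership; it does not use or imitate the Grouper API, and the group table and user names are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical hierarchy: each group lists its direct users and its subgroups.
GROUPS: Dict[str, Dict[str, Set[str]]] = {
    "PALMS": {"users": set(), "subgroups": {"PIs", "RAs", "Guests"}},
    "PIs": {"users": {"alice"}, "subgroups": set()},
    "RAs": {"users": {"bob"}, "subgroups": set()},
    "Guests": {"users": {"carol"}, "subgroups": set()},
}


def effective_members(group: str, seen: Optional[Set[str]] = None) -> Set[str]:
    """Collect the users of a group and, recursively, of all its subgroups."""
    seen = set() if seen is None else seen
    if group in seen or group not in GROUPS:
        return set()  # unknown group, or already visited (guards against cycles)
    seen.add(group)
    members = set(GROUPS[group]["users"])
    for sub in GROUPS[group]["subgroups"]:
        members |= effective_members(sub, seen)
    return members
```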
Identity
• Establishing
  – What I have (token)
  – What I know (password)
  – What I am (biometric)
• Referencing
  – Trust relationships (certification authorities)
  – X509 Certificate
  – SAML Certificate
  – OpenID
[Diagram. Participants: Browser, Application, ID Provider. Numbered steps 1 to 5. Labels: User ID & Password, Confirm, Certificate.]
caBIG
• cancer Biomedical Informatics Grid
  – Connects scientists & practitioners: shareable & interoperable infrastructure
  – Develop standard rules & common language: easily share information
  – Tools: collecting, analyzing, integrating, disseminating cancer information
• Cornerstones
  – Federation
  – Open development
  – Open access
  – Open source
• Workspaces
  – Clinical Trial Management
  – Integrative Cancer Research
  – Tissue Banks and Pathology
  – Vocabularies & Common Data Elements
  – Architecture
  – Strategic Planning
  – Data Sharing and Intellectual Capital
  – Training
caGrid & GAARDS
• Grid Authentication & Authorization with Reliably Distributed Services
  – Services & Tools for enforcement of security policy in enterprise grid
  – Developed on Globus Toolkit
  – Provides
    – grid user management
    – identity federation
    – trust fabric provisioning and management
    – group/VO management
    – access control policy management and enforcement
    – credential delegation
    – web SSO
    – integration between security domains & grid security domain
caGrid & GAARDS
Relationship to PALMS
• Pros
  – Well supported
    – caGrid Knowledge Center (Justin Permer/Ohio State Bioinformatics)
    – Professionally managed
  – Well developed governance and development models
  – Standards-based
    – Security: X509 & SAML
    – Ontologies: Thesaurus and Metathesaurus
  – Sharing infrastructure
  – Growing community
• Cons
  – Key infrastructure out of our direct control
Questions??
Backup slides
Composing Workflow and Policy
Scenario: Add Policy to Existing Workflow
[Flowchart. Decisions: Is CNN Ready? (Yes/No), Is BBC Ready? (Yes/No), Authorized User? (Yes/No). Action: e-mail story to [email protected].]
(CNN | BBC) > story > if(authorized) > email(story, "[email protected]")
• Key issues
  – What is policy to compose?
  – Where to insert policy? ... capture all paths?
  – How to compose multiple policies?
  – How to guarantee integrity of workflow?
  – Preview: We have to address these
• Current methodologies
  – Requirement discovery and hand coding
  – Policy-based design & Inversion of Control
  – Aspect Oriented Programming
  – UML sequence chart composition
• New methodology (preview)
  – ORCA
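To make the scenario concrete, the sketch below models the workflow as a list of steps and inserts the authorization policy without editing the existing steps. It is one simple hand-coded approach, not the ORCA methodology; all function names are hypothetical, and `email_to` only records the story rather than sending mail.

```python
from typing import Callable, List, Optional

Step = Callable[[Optional[str]], Optional[str]]


def first_ready(*sources: Callable[[], Optional[str]]) -> Step:
    """(CNN | BBC): take the story from the first source that has one."""

    def step(_: Optional[str]) -> Optional[str]:
        for source in sources:
            story = source()
            if story is not None:
                return story
        return None

    return step


def if_authorized(is_authorized: Callable[[], bool]) -> Step:
    """Policy step: pass the story through only for an authorized user."""
    return lambda story: story if story is not None and is_authorized() else None


def email_to(outbox: List[str]) -> Step:
    """Stand-in for email(story, address): records the story instead of sending it."""

    def step(story: Optional[str]) -> Optional[str]:
        if story is not None:
            outbox.append(story)
        return story

    return step


def compose(*steps: Step) -> Callable[[], Optional[str]]:
    """Chain steps left to right, mirroring ">" in the slide's expression."""

    def run() -> Optional[str]:
        value: Optional[str] = None
        for step in steps:
            value = step(value)
        return value

    return run


def insert_policy(steps: List[Step], policy: Step, before: int) -> List[Step]:
    """Return a new step list with the policy inserted; the original is untouched."""
    return steps[:before] + [policy] + steps[before:]


outbox: List[str] = []
# CNN is not ready (None); BBC has a story.
base = [first_ready(lambda: None, lambda: "BBC story"), email_to(outbox)]
guarded = insert_policy(base, if_authorized(lambda: False), before=1)
allowed = insert_policy(base, if_authorized(lambda: True), before=1)
```

With a single linear path, one insertion point is enough. The "capture all paths?" issue arises as soon as the workflow branches, which a flat step list like this one cannot express.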
Architecture Definition Methodology
[Diagram. Rich Services development process in three phases.
Service Elicitation: Use Case Graph (U1 to U5), Roles, Concerns (C1 to C4, CC1 to CC3), Domain Model, Role Domain Model (R1 to R6, msg), Services (Service S1).
Rich Service Architecture: Rich Services Virtual Network, Rich Services, Router/Interceptor, Messenger/Communicator, S/D connectors, RIS (CC1 to CC5), RAS1 to RAS7, Logical Model, Hierarchic composition, Refinement.
System Architecture Definition: System of Systems Topology (H1 to H9), Infrastructure Mapping (H1:RAS1, H2:RAS2, H3:CC1, H4:RAS3, H5:RAS2, H6:RAS5, H7:RAS7, H8:RAS7, H9:RAS6), Implementation, Optimization.
Step labels: Analysis, Identification, Definition, Consolidation, Synthesis, Refinement. Loops: Logical Architecture Loop, Deployment Loop.]
User View
[Diagram. Research Feeds: Blood Pressure, GeoTracker, Camera, CO2 Sensor, Text Message. Network: Access Policies, Event Logger, Data Repository, HIPAA Policies, Subject Registry, Analysis Engines. Visualizations: Internet Browsers (Internet Explorer, Firefox, etc.), Geo Display, Export (SPSS, Excel, Crystal Reports, etc.).]
Data Flow (Today)
[Diagram. Inputs: GPS Device, Accelerometer, Others. Data: Subject Data, Raw Data. Steps inside PALMS: Tagging, Filtering, Scoring, Analyzing. Outputs: SPSS, ESRI, Others.]
Data Flow (Analysis-centric)
[Diagram. Pipeline: Raw Data, Tagging/Filtering, Scoring, Calorie Analysis, Bout Analysis, Final Product. Other labels: Accelerometer, SPSS, PALMS, Study, Subject Data, Optional.]
Data Flow (Algebraic)
Tagging Editor (T)
• Description: Specifies global and row-specific rules for excluding raw data, then produces a new database by applying rules to raw data
• Inputs
  – R = database (r)
  – e = seconds in epoch
• Outputs
  – T = database (r)
    .s = subject
    .t = time of first epoch
    .ve = one value per epoch
  – e = seconds in epoch
  – y = invalid value
• Local Params
  – r = rule of {f, vl, vh}
    .f = list of {time filter}
    .vl = lowest valid value
    .vh = highest valid value
  – rg = global rule for R
  – rr = rule for particular r
  – y = invalid value
  – Vh = valid values in hour
  – Vd = valid hours in day
• Processing: Transform R → T by using rules r to either convert Ri.ve to y or leave it as Ri.ve. Editor allows user to specify rg, rr, and y in WYSIWYG style
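The R → T transform can be sketched as follows. Several details are assumptions rather than anything stated on the slide: a time filter is taken to be an excluded [start, end) range in seconds, a row-specific rule rr replaces the global rule rg outright, and rows are keyed by (subject, time). The Vh and Vd parameters are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Row:
    s: str  # subject
    t: int  # time of first epoch, in seconds
    ve: List[int]  # one value per epoch


@dataclass
class Rule:
    vl: int  # lowest valid value
    vh: int  # highest valid value
    # Assumed shape of a time filter: an excluded [start, end) range in seconds.
    f: List[Tuple[int, int]] = field(default_factory=list)


def tag(
    R: List[Row], e: int, rg: Rule, rr: Dict[Tuple[str, int], Rule], y: int
) -> List[Row]:
    """Transform R -> T: replace each excluded epoch value with y, keep the rest."""
    T = []
    for row in R:
        rule = rr.get((row.s, row.t), rg)  # row-specific rule overrides the global one
        ve = []
        for i, value in enumerate(row.ve):
            time = row.t + i * e  # start time of the i-th epoch
            filtered = any(start <= time < end for start, end in rule.f)
            valid = rule.vl <= value <= rule.vh and not filtered
            ve.append(value if valid else y)
        T.append(Row(row.s, row.t, ve))
    return T


R = [Row("s1", 0, [5, 0, 9000, 7])]
rg = Rule(vl=1, vh=5000, f=[(180, 240)])
T = tag(R, e=60, rg=rg, rr={}, y=-1)
```

The input database R is left unchanged, so a different rule set can be applied to the same raw data later.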
Raw Observation Database (R)
• Description: A list of data records uniquely identified by subject and time
• Data Values
  – r = row of {s, t, ve}
    .s = subject
    .t = time of first epoch
    .ve = one value per epoch
  – e = seconds in epoch
Scoring Editor (S)
• Description: For each row, totals sequences of epoch values, then classifies each total into a category
• Inputs
  – T = database (r)
  – e = seconds in epoch
  – y = invalid value
  – P = database {s, a, w}
• Outputs
  – S = database {s, t, vp}
    .s = subject
    .t = time of first period
    .vp = record per period
      ..s = total count for period
      ..c = category for period
• Local Params
  – G = list of groups
    .name = group name
    .al = lowest age
    .ah = highest age
  – C = list of categories
    .name = category name
    .vl = lowest count value
    .vh = highest count value
  – p = period (day/hour/total)
• Processing: Transform T → S by using r.s.age and G.C to calculate categories from r.ve. Editor allows user to specify G, C, p in WYSIWYG style
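A rough sketch of the T → S step for a single row follows. The assumptions are mine: the period p is expressed as a number of epochs per period, invalid values y are skipped when totaling, and each age group carries its own category list. The group and category names in the example are invented.

```python
from typing import Dict, List, NamedTuple


class Category(NamedTuple):
    name: str
    vl: int  # lowest count value
    vh: int  # highest count value


class Group(NamedTuple):
    name: str
    al: int  # lowest age
    ah: int  # highest age
    C: List[Category]  # assumed: categories are defined per age group


def score_row(
    ve: List[int], age: int, G: List[Group], epochs_per_period: int, y: int
) -> List[Dict[str, object]]:
    """Total each period's epoch values, then classify each total by age group."""
    group = next((g for g in G if g.al <= age <= g.ah), None)
    vp = []
    for start in range(0, len(ve), epochs_per_period):
        window = ve[start:start + epochs_per_period]
        total = sum(v for v in window if v != y)  # assumed: invalid values are skipped
        category = None
        if group is not None:
            category = next(
                (c.name for c in group.C if c.vl <= total <= c.vh), None
            )
        vp.append({"s": total, "c": category})
    return vp


G = [Group("adult", 18, 64, [Category("low", 0, 99), Category("high", 100, 100000)])]
vp = score_row([50, 20, -1, 200, 300, 10], age=30, G=G, epochs_per_period=3, y=-1)
```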
Calories Editor (K)
• Description: For each period count, determine the number of calories expended
• Inputs
  – S = database {s, t, vp}
  – P = database {s, a, w}
• Outputs
  – K = database (s, t, kp)
• Local Params
  – k = kilocalorie formula
• Processing: Transform S → K by using a formula k(S.vp, P.a, P.w) for each vp in S. Editor allows user to specify k in WYSIWYG style
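Since the formula k is supplied by the user, the S → K step reduces to applying a function to each period count together with the subject's age and weight. In this sketch the tuple layout and the example formula are placeholders; the formula in particular is not a validated energy-expenditure model.

```python
from typing import Callable, Dict, List, Tuple

Formula = Callable[[int, int, float], float]  # k(period count, age, weight) -> kcal
SRow = Tuple[str, int, List[int]]  # (s, t, vp) with vp reduced to period counts
KRow = Tuple[str, int, List[float]]  # (s, t, kp)


def calories(S: List[SRow], P: Dict[str, Tuple[int, float]], k: Formula) -> List[KRow]:
    """Transform S -> K: apply k to every period count of every row."""
    K = []
    for s, t, vp in S:
        a, w = P[s]  # age and weight of the subject
        K.append((s, t, [k(count, a, w) for count in vp]))
    return K


def placeholder_k(count: int, a: int, w: float) -> float:
    """Placeholder only: scales the count by weight and ignores age."""
    return count * w / 1000.0


S = [("s1", 0, [1000, 2000])]
P = {"s1": (30, 70.0)}
K = calories(S, P, placeholder_k)
```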
Subject Database (P)
• Description: A list of data records uniquely identified by subject
• Data Values
  – r = row of {s, a, w}
    .s = subject
    .a = age of subject
    .w = weight of subject
Bout Editor (B)
• Description: For each period count, determine the number of bouts observed
• Inputs
  – T = database (r)
  – e = seconds in epoch
  – y = invalid value
• Outputs
  – B = database (s, t, bp)
• Local Params
  – mb = bout (minutes)
  – vl = lowest activity count
  – vh = highest activity count
  – mt = tolerance (minutes)
• Processing: Transform S → B by counting runs subject to local parameters. Editor allows user to specify local parameters in WYSIWYG style
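"Counting runs" can be read in several ways; the sketch below picks one. It assumes one-minute epochs, treats a bout as a run of at least mb minutes whose counts lie in [vl, vh], and lets a run survive up to mt consecutive out-of-range minutes. These semantics are an assumption, not the PALMS definition.

```python
from typing import List


def count_bouts(counts: List[int], mb: int, vl: int, vh: int, mt: int) -> int:
    """Count runs of at least mb minutes, tolerating gaps of up to mt minutes."""
    bouts = 0
    run = 0  # minutes in the current run, up to the last in-range minute
    gap = 0  # consecutive out-of-range minutes since the last in-range minute
    for c in counts:
        if vl <= c <= vh:
            if run > 0:
                run += gap  # a tolerated gap becomes part of the run
            gap = 0
            run += 1
        elif run > 0:
            gap += 1
            if gap > mt:  # tolerance exceeded: the run ends here
                if run >= mb:
                    bouts += 1
                run, gap = 0, 0
    if run >= mb:  # close a run that reaches the end of the data
        bouts += 1
    return bouts


minutes = [0, 5, 5, 0, 5, 5, 0, 0, 0, 5]
```

With `mb=4, vl=1, vh=10, mt=1` the single-minute dip at index 3 is tolerated, so minutes 1 to 5 form one bout; with `mt=0` the same data yields no bout.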
Artifacts
• User Stories
• Use Cases
• Access Control Patterns
• Domain Modeling
• Dataflow
• Low Fidelity UI
• Service Definitions
• Rich Service
[Groupings shown on slide: Requirements Modeling, Service Modeling.]
Use Cases
Use Case Attributes
• ID
• Name
• Priority
• Complexity
• Release Number
• Last Revised
• Description
• Actors (Primary and Secondary)
• Stakeholders
• Pre-Conditions
• Constraints
• Post-Conditions
• Triggers
• Cross References
• Flow of Events
  – Basic Flow
  – Alternative Flows
  – Exceptions
• Extensions
• Information Requirements
• Special Requirements
• Frequency of Use
• Assumptions
• Issues and Considerations
  – Issues
  – Consideration
• Process Flows
• Related Use Cases
[Flowchart. Steps: RA signs in; RA selects study; RA uploads .CSV and .GPX files; PALMS displays summary; RA confirms summary. Outcomes: accept, PALMS commits dataset; decline, PALMS abandons dataset. Error branches: All files missing or invalid, Display error; Time range overlaps, Display error.]
Low Fidelity User Interface
Domain Modeling (Overview)
Domain Modeling
Rich Service
Service Interactions (AAI)
[Sequence diagram. Lifelines: Web Browser, Browser Proxy, Authentication Repository, PALMS. Messages: ValidateCredential(UserID, Pwd) returns SAML; Retain SAML; SomeOp(study, …) + SAML; Result. Fragments: alt, loop.]
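The interaction can be sketched as a browser proxy that validates credentials once, retains the returned SAML, and attaches it to every later operation. Everything here is a stand-in: the classes are hypothetical, and the "SAML" is an opaque string rather than a real signed SAML assertion.

```python
from typing import Dict, Optional, Set


class AuthenticationRepository:
    """Hypothetical stand-in: issues an opaque token in place of a SAML assertion."""

    def __init__(self, passwords: Dict[str, str]) -> None:
        self._passwords = passwords

    def validate_credential(self, user_id: str, pwd: str) -> Optional[str]:
        if self._passwords.get(user_id) == pwd:
            return f"saml-for-{user_id}"
        return None


class Palms:
    def __init__(self, valid_tokens: Set[str]) -> None:
        self._valid = valid_tokens

    def some_op(self, study: str, saml: Optional[str]) -> str:
        if saml not in self._valid:
            return "Failure"
        return f"Result for {study}"


class BrowserProxy:
    def __init__(self, auth: AuthenticationRepository, palms: Palms) -> None:
        self._auth = auth
        self._palms = palms
        self._saml: Optional[str] = None

    def sign_in(self, user_id: str, pwd: str) -> bool:
        self._saml = self._auth.validate_credential(user_id, pwd)  # Retain SAML
        return self._saml is not None

    def some_op(self, study: str) -> str:
        return self._palms.some_op(study, self._saml)  # forward request + SAML


auth = AuthenticationRepository({"ra1": "secret"})
palms = Palms({"saml-for-ra1"})
proxy = BrowserProxy(auth, palms)
```

Keeping the token in the proxy means the browser never handles it directly, and PALMS can check it on every call.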
Service Interactions (Calculation)
[Sequence diagram. Lifelines: Web Browser, PALMS, Study, Calculation Engine, Results Repository, Protocol Repository. Messages: StartCalculation(study, protocolID, paramBlockID, resultName) / StartResult; Start Calculation (- study, + study); Initiate Result; AddResult(resultName, protocolID, paramBlock) / AddResult; GetProtocolParams(protocolID, paramBlockID) / Get Param Block / ParamBlockResult. Fragments: alt, alt.]
The Road Forward
Component Interactions
[Diagram. Components: Web Browser, Browser Proxy, PALMS, PALMS Subservices. Tiers: Client, Server, Server. Technologies: Google Web Toolkit (GWT), Mule Enterprise Service Bus. Messages: Request, Result.]
PALMS Products
• Integration
  – Mapping Engines
  – Data Mining Engines
  – Social Networks
  – Disaster Management
• Alerts and Events
• Data Subscriptions
• Data Flow Analysis (provenance flow)
• Scalable and Configurable Calculations
• Collaboration
Questions??