Data Day Health IT - Data Architecture
Transcript of Data Day Health IT - Data Architecture
Healthcare Considerations for Modern Data Architectures
Pitfalls, Challenges and Best Practices
Data Day Health 2017
Presented by: Toby Owen, VP Product Development
OnRamp
- Industry leading high security and hybrid hosting provider
- Operates multiple enterprise class data centers located in Austin, Texas and Raleigh, North Carolina
- SSAE 16 SOC II and SOC 3 audited, PCI and HIPAA compliant company
- Specializes in helping organizations meet their rigorous compliance requirements and keep their data safe
Toby Owen
- Vice President, Product Development, OnRamp
- 20 year IT veteran with operations and engineering background
- Security, IT ops at scale, hybrid cloud, compliant workload hosting
AGENDA
GOAL: Designing an app for Healthcare… that’s compliant!
• Data Stores
• App Design
• Where to Run It
• Dev Lifecycle
• Takeaways
• Q & A
Refresher on (or intro to) databases
CAP theorem
• C = Consistency
• A = Availability
• P = Partition Tolerance
Database Reference Guide – at a glance
*Adapted from http://blog.nahurst.com/visual-guide-to-nosql-systems
Why do we care?
• Scaling vertically versus horizontally
- Costs of scaling up can grow exponentially
- Scaling horizontally is linear
- Limits to scaling vertically, “indefinite” horizontal scale limit
• Data sources are increasingly distributed
• Horizontal scaling provides better geo-resiliency at the same time
• Not all data needs strict ACID compliance
More arguments favor distributed data stores
RDBMS and ACID
• Definition: Atomicity, Consistency, Isolation, Durability
• Favors Consistency over Availability
• Examples
- MSSQL
- MySQL
- Postgres
- Greenplum
- VoltDB
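To make the "A" in ACID concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names and the balances are hypothetical and are not from the talk; the point is only that a failed transfer leaves no partial update behind.

```python
import sqlite3

# In-memory database with a hypothetical two-account ledger (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.execute("INSERT INTO accounts VALUES ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back as well.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 500)` returns `False` and leaves both balances untouched, even though the credit to bob was executed before the failing debit.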
Are scalability and ACID a false tradeoff?
• Scalability and ACID are difficult to satisfy at the same time
• Not all data requires strict ACID compliance
• Relational can be a bottleneck
- Simpler models might simplify operations – easier and more efficient
• New relational DBs can be very fast AND scalable
• Many NoSQL DBs are adding features to look more like RDBMS
• Take-away: understand your data (shape and use case) and pick the right solution
NoSQL and BASE
• NoSQL Definition
- SOME of the following: non-relational, distributed, open-source, horizontally scalable, schema free, easy replication support, simple API
• BASE Definition: Basically Available, Soft state, Eventual consistency
- All data reads will eventually yield the same result
• Favors Availability over Consistency
• Let’s focus some time here exploring NoSQL databases/datastores
- Considerations based on scalability, encryption and key management
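The "eventual consistency" part of BASE can be illustrated with a toy simulation. This sketch assumes last-write-wins conflict resolution by timestamp, which is only one of several strategies real datastores use. The `Replica` class, the `sync` function and the key names are hypothetical.

```python
class Replica:
    """One node holding key -> (timestamp, value) pairs."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, timestamp):
        # Last-write-wins (an assumption of this sketch): keep the newest version.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Anti-entropy pass: exchange versions in both directions."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)


r1, r2 = Replica("r1"), Replica("r2")
r1.write("patient:42:status", "admitted", timestamp=1)
r2.write("patient:42:status", "discharged", timestamp=2)

# Before synchronization the replicas disagree (both stay available for reads).
before = (r1.read("patient:42:status"), r2.read("patient:42:status"))

sync(r1, r2)

# After synchronization every read yields the same result.
after = (r1.read("patient:42:status"), r2.read("patient:42:status"))
```

Between the two writes and the `sync` call, a reader may see either value depending on which replica answers. That window is the price paid for favoring availability.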
MongoDB
• Document oriented database (JSON). Considered “semi-structured” data
• Scalability – built in via automatic sharding (range, hash, zone)
- EA FIFA game (250+ servers), Yandex (tens of billions of objects, TBs of data, growing at 10 million file uploads/day)
• Security – encryption in-transit
- SSL/TLS client support (data in-transit)
- MongoDB Enterprise Advanced supports FIPS 140-2
- Atlas (Mongo-aaS on Amazon) does NOT support FIPS mode
• Security – encryption at-rest
- App level, external filesystem, disk level, or natively (encrypted storage engine). Native supports FIPS 140-2
• Security – key management
- Each DB has a separate key
- Can be integrated with external KMS
- Supports key rotation without downtime (via rolling restarts of replica set)
- Native encryption is only available via Enterprise Advanced version!
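The difference between range and hash sharding can be sketched in a few lines of plain Python. This is a conceptual illustration only, not how any particular database implements sharding; the shard names, the range boundaries and the use of SHA-256 are all assumptions made for the example.

```python
import bisect
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names

# Range sharding: keys below "h" go to shard0, keys from "h" up to (but not
# including) "q" go to shard1, and everything else goes to shard2.
RANGE_BOUNDS = ["h", "q"]


def range_shard(key):
    """Pick a shard by where the key falls among sorted range boundaries."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, key)]


def hash_shard(key):
    """Pick a shard from a hash of the key, spreading adjacent keys apart."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]
```

Range sharding keeps neighboring keys together, which helps range queries but can create hot spots. Hash sharding spreads load more evenly at the cost of scattering range scans across shards.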
Cassandra
• Row-oriented
• Scalability – peer-to-peer distributed system, data across all nodes
- Each node contains commit log, exchanges data across cluster every second
- All writes are automatically partitioned and replicated throughout cluster
- Apple (75,000 nodes, 10PB); Netflix (2,500 nodes, 420TB, 1 trillion requests/day)
• Security – encryption in-transit
- Supports TLS/SSL, separate configs for client-server and server-server
- FIPS compliance supported
• Security – encryption at-rest
- Open-source Cassandra relies on filesystem encryption
- Datastax (commercial version) supports at-rest encryption
• Security – key management
- Open-source Cassandra relies on filesystem encryption’s key management tools (can be complex)
- Datastax (commercial version) has native KMIP support
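How writes get "automatically partitioned and replicated" in a peer-to-peer cluster can be sketched as a token ring. This is a deliberately simplified model: it assumes one token per node, SHA-256 as the hash and a plain "next N nodes clockwise" replica placement. The `Ring` class and the node names are hypothetical.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (SHA-256 is an assumption here)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy token ring: each node owns one token; no node is special."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = min(replication_factor, len(nodes))
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key):
        """Return the nodes that store this key: the owner plus the next ones clockwise."""
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.entries)
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(self.rf)
        ]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
ring = Ring(nodes, replication_factor=3)
owners = ring.replicas("patient:42")
```

Because any node can compute `replicas(...)` from the key alone, no central coordinator is needed to decide where data lives.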
Hadoop
• Not really a database – distributed filesystem (HDFS) plus application interface (MapReduce)
• Scalability – designed for large file distribution across hundreds or thousands of servers, streaming access and large data sets
- (moving compute is cheaper than moving data)
- Facebook (21PB, 2000 machines), Spotify (1300 nodes, 42PB storage, 20TB a day ingested, 200TB a day generated by Hadoop)
• Security – encryption in-transit
- HDFS supports transparent encryption
• Security – encryption at-rest
- Supported by HDFS, application, database, or disk-level
- Lots of options for commercial support and tools to simplify management
• Security – key management
- Natively supports its own KMS
- Again, more commercial options exist to simplify
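For readers new to the MapReduce programming model, the three phases (map, shuffle, reduce) can be imitated in a single Python process. This is only a conceptual sketch of the model, not the Hadoop API; the function names and the word-count job are illustrative choices.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record: here, (word, 1)."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key: here, sum the counts."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce over an iterable of text records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

In a real cluster the map tasks run on the nodes that already hold the data blocks, which is the "moving compute is cheaper than moving data" idea from the slide.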
LOTS of others
• Key Value
- Redis
- DynamoDB
• Document Oriented
- CouchDB
- DocumentDB
• Time Series
• Graph
• + 225 more! (nosql-database.org for basic info and comparisons)
So you’ve chosen your datastore(s)
Now what?
Application architecture!
Application design
SOME Considerations for HIPAA and HITECH
• HITECH – each app zone requires firewall isolation
- Web, app, database
• Key Management
- Key Management System (KMS)
- Hardware Security Module (HSM)
- Keys database
- Key splitting – for transferring clear-text cipher keys
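One simple way to picture key splitting for transfer is XOR splitting, where a key is divided into random-looking parts that are sent over separate channels and only reveal the key when all are combined. This is a sketch under that assumption; the talk does not say which splitting scheme is meant, and the function names are hypothetical.

```python
import secrets


def split_key(key, parts=2):
    """Split a key into `parts` shares; ALL shares are needed to rebuild it."""
    if parts < 2:
        raise ValueError("need at least two parts")
    # All but the last share are pure random bytes of the same length.
    shares = [secrets.token_bytes(len(key)) for _ in range(parts - 1)]
    # The last share is the key XORed with every random share.
    last = key
    for share in shares:
        last = bytes(a ^ b for a, b in zip(last, share))
    return shares + [last]


def join_key(shares):
    """XOR all shares together to recover the original key."""
    key = bytes(len(shares[0]))  # all-zero starting value
    for share in shares:
        key = bytes(a ^ b for a, b in zip(key, share))
    return key
```

For example, `join_key(split_key(key, 3))` returns `key`, while any two of the three shares on their own are statistically independent of it.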
Reference Architecture
And more
• Many other security considerations around compliant application architecture
- Shared storage resources and shared IaaS
Supporting encryption at-rest may not be enough to achieve HIPAA or HITRUST compliance.
- Verifiable (compliant) destruction of data in a shared environment
- Encryption keys need to be managed in accordance with shared secrets or ‘key splitting’ schemes (e.g. Shamir’s secret sharing)
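Shamir's secret sharing, mentioned above, lets any `threshold` of `count` shares rebuild a secret while fewer shares reveal nothing. The following is a teaching sketch only, not production cryptography: the prime, the function names and the integer-secret interface are assumptions, and a 127-bit field is too small for a full 128-bit or 256-bit key without a larger prime or chunking.

```python
import secrets

# A Mersenne prime defining the finite field; the secret must be smaller than it.
PRIME = 2**127 - 1


def make_shares(secret, threshold, count):
    """Create `count` shares; any `threshold` of them recover the secret."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for this field")
    if not 1 <= threshold <= count:
        raise ValueError("threshold must be between 1 and count")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, count + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, evaluated mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `make_shares(secret, 3, 5)`, five key custodians each hold one share and any three of them can jointly recover the key, so no single administrator ever holds it alone.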
Next?
We’ve chosen the right datastores…
We’ve designed our application to support HITRUST or HIPAA…
Where will the app run?
Hybrid is the likely reality
• Consuming 3rd party data sources
• Capabilities of each data or app component provider
• BAA with each provider
• Peril of failing to plan
How to keep all this compliant?
• Lots to consider to get it right
• Start at the beginning – your development lifecycle
• Automate everything
• Dev/Test/Staging/Production should all account for secure design
• Use Containers?
• Maybe get some help
Key Takeaways
• Distributed data is becoming the new norm
• Data is different – data usage should dictate data technology
- (no one-size-fits-all)
• Application Architecture is key to achieving compliance
• Must consider all locations where app is running
• Consider compliance in all phases of app development (starting with design)
• Automation in development pipeline is key to building-in and maintaining compliance throughout app lifecycle
• Final consideration – are you now a service provider?
Toby Owen
VP, Product Development
[email protected]
@tobydowen
linkedin.com/in/tobyowen
Resources
• Databases and scaling:
- http://stackoverflow.com/questions/12215002/why-are-relational-databases-having-scalability-issues
- http://blog.nahurst.com/visual-guide-to-nosql-systems
- http://nosql-database.org/
• MongoDB
- https://www.mongodb.com/mongodb-architecture
- https://webassets.mongodb.com/_com_assets/collateral/MongoDB_Security_Architecture_WP.pdf
• Cassandra
- http://cassandra.apache.org/doc/latest/operating/security.html?highlight=encryption
- http://stackoverflow.com/questions/32584253/how-to-use-cassandra-with-tde-transparent-data-encryption
- http://dba.stackexchange.com/questions/6909/cassandra-encryption-at-rest
- http://www.datastax.com/products/datastax-enterprise
• Hadoop
- https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html
- Hadoop at Scale: Spotify http://cdn.oreillystatic.com/en/assets/1/event/118/The%20Evolution%20of%20Hadoop%20at%20Spotify-%20Through%20Failures%20and%20Pain%20Presentation.pdf
• Key management
- https://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing