Big Data and Health Care
Uploaded by: jeffrey-funk-creating-new-industries
Category: Business
Transcript of Big Data and Health Care
BIG DATA DIGITAL HEALTH REVOLUTION
Alex A0135681
Henri A0135487
Zheng A0121892
Pham A0095804
Yin A0119974
Kavitha A0110143
For information on other technologies, see http://www.slideshare.net/Funk98/presentations
HAVE YOU EVER
VISITED A DOCTOR?
ONE SIZE FITS ALL
FOOD FOR THOUGHT
40,000+ PATIENTS DIE IN US EACH
CONTENT
● DATA COLLECTION: SENSORS
● DATA PROCESSING: HARDWARE
● DATA ANALYZING: ALGORITHMS
● SUSTAINABLE HEALTHCARE SYSTEM
TODAY
IN FUTURE
SENSORS TODAY
● iBGStar
● iHealth wireless pulse oximeter
● Jawbone
● Withings smart body analyser
What they measure:
● CALORIES
● EATING HABITS
● SLEEP
● BODY TEMPERATURE
● HEART RATE
● BLOOD SUGAR
DETECTION
ANALYSIS
DIAGNOSTICS
CELL CULTURE
DRUG DELIVERY
THERAPEUTICS
SENSORS IN FUTURE
● Continuous glucose monitoring
● Google lens
● MicroCHIPS
● MIT batteryless power source
● Parathyroid hormone microchip injection
SENSORS IN FUTURE
● Sensor-laden transdermal patch
SENSORS IN FUTURE - BioMEMS and Microsystems
SIZE
POWER
COMMUNICATION
SENSORS IN FUTURE - Micro supercapacitors
Laser-scribed graphene micro-supercapacitors
SENSORS IN FUTURE - Reduction in MOSFET size
SENSORS IN FUTURE - External communication
SENSORS IN FUTURE - The trend in shrinking cells
SENSORS IN FUTURE - BioMEMS and Microsystems
● Size decrease
● Better and smaller communication chips and algorithms
● Micro-supercapacitors
● Together, these advances will facilitate the arrival of new implantable chips
● Implantable chips allow for unobtrusive personal medicine
● They allow for more tailored medicine
● They will require more data analysis and more processing power
Introduction
SSD vs HDD
Data Protection
● The storage medium used now receives more focus than the quantity of storage used. Storage is no longer one-size-fits-all.
● The "data deluge" is fundamentally changing the way that storage is approached.
HARDWARE - Introduction
● Provide real-time or near-real-time responses.
● Handle huge, rapidly growing data volumes.
Key Characteristics of Big Data Infrastructure:
● High processing/IOPS performance.
● Very large capacity.
HARDWARE - What's Key to Efficient Data Processing?
KEY DIFFERENTIATOR
● Big data is largely unstructured.
● Unstructured data is immutable.
● Traditional file systems have built-in functions to handle insert/update.
● For immutable data, these functions create a lot of overhead in terms of performance, the I/Os required to access data, and the ability to scale.
HARDWARE - WHY DO WE NEED A DIFFERENT APPROACH?
FIG. GROWTH OF UNSTRUCTURED DATA ANNUALLY
● Objects are kept in one large, scalable pool of storage.
● Metadata (information about the object) is stored with each object.
● An object ID is stored to locate the data.
● Objects are immutable.
● There is no file system hierarchy.
Products:
● Scality's RING architecture
● Dell DX
● EMC's Atmos
HARDWARE - OBJECT STORAGE – Choice of Storage
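The object-storage properties listed on this slide can be illustrated with a minimal Python sketch. It is not how Scality, Dell or EMC implement their products. The class names `StoredObject` and `FlatObjectStore` are hypothetical, and deriving the object ID from a SHA-256 hash of the content is an assumption (the slide only says that an object ID is stored).

```python
import hashlib
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class StoredObject:
    """An immutable object: payload plus read-only metadata."""
    data: bytes
    metadata: MappingProxyType


class FlatObjectStore:
    """One flat pool of objects keyed by ID: no directories, no in-place updates."""

    def __init__(self):
        self._pool = {}

    def put(self, data: bytes, **metadata) -> str:
        # Assumption: the ID is a content hash. Real systems may assign IDs differently.
        object_id = hashlib.sha256(data).hexdigest()
        # setdefault never overwrites: storing the same bytes again keeps the first copy.
        self._pool.setdefault(
            object_id, StoredObject(data, MappingProxyType(dict(metadata)))
        )
        return object_id

    def get(self, object_id: str) -> StoredObject:
        return self._pool[object_id]


store = FlatObjectStore()
scan_id = store.put(b"raw image bytes", modality="MRI", patient="anonymized")
print(scan_id[:12], store.get(scan_id).metadata["modality"])
```

Because there is no update path, a changed record is simply stored as a new object with a new ID, which is what removes the insert/update overhead of a traditional file system.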
● Access times: SSDs exhibit virtually no access time.
● Random I/O performance: an SSD delivers at least 6,000 IO/s, 15 times faster than an HDD (400 IO/s).
● Reliability: SSDs are 4-10 times more reliable.
HARDWARE - Storage Medium: Solid-State Drive (SSD) or Hard Disk Drive (HDD)
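The random I/O figures on this slide can be checked with a few lines of Python. The IOPS values come from the slide. The workload of one million random reads is a hypothetical example, and the calculation ignores queueing and caching.

```python
SSD_IOPS = 6000  # from the slide: "at least 6,000 IO/s"
HDD_IOPS = 400   # from the slide


def seconds_for_random_reads(num_reads: int, iops: int) -> float:
    """Time to serve num_reads random reads at a fixed IOPS rate."""
    return num_reads / iops


speedup = SSD_IOPS / HDD_IOPS
reads = 1_000_000  # hypothetical workload size
print(f"SSD is {speedup:.0f}x faster on random I/O")
print(f"HDD: {seconds_for_random_reads(reads, HDD_IOPS):.0f} s")
print(f"SSD: {seconds_for_random_reads(reads, SSD_IOPS):.0f} s")
```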
SSD
HDD
REAL TIME APPLICATIONS OF SSD
● Read-intensive video-on-demand (VOD) and image-retrieval applications.
● Emerging applications (big data/Hadoop/cloud).
HARDWARE - COMPARISON OF BOOT TIMES USING SSD & HDD
2011: throughput 250 MB/s, capacity 512 GB
2014: data transfer 1000 MB/s, capacity 4 TB
Standard 2.5-inch form factor
Further scale-down of flash lithography leads to continued performance gains and greater capacity points.
HARDWARE - Solid-State Drives (SSDs) & Moore's Law
Fig. 1. HDD areal density follows Moore's Law
Fig. 2. Average price comparison of SSD vs. HDD
HARDWARE - DATA PROTECTION – WHY DIFFER FROM TRADITIONAL APPROACHES?
RAID (REDUNDANT ARRAY OF INDEPENDENT DISKS)
● Originally designed for small-capacity disks.
● Restoring a failed drive takes longer as capacity increases.
● To shorten rebuild cycles, RAID systems ship with faster processors, leading to high energy consumption.
REPLICATION
● Copies add costs: typically 133% or more additional storage is needed for each additional copy.
● The storage system gets more expensive as the amount of data increases.
HARDWARE - Limitations of Traditional Approaches
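The point that rebuilds take longer as drives grow can be made concrete with simple arithmetic. The sketch below assumes a constant rebuild rate of 100 MB/s, a hypothetical figure that is not from the slides, and uses decimal units.

```python
def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to rewrite a whole drive at a constant rebuild rate."""
    capacity_mb = capacity_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return capacity_mb / rebuild_mb_per_s / 3600


ASSUMED_RATE_MB_S = 100  # hypothetical sustained rebuild rate
for capacity_tb in (0.5, 1, 4):
    hours = rebuild_hours(capacity_tb, ASSUMED_RATE_MB_S)
    print(f"{capacity_tb} TB drive: about {hours:.1f} hours to rebuild")
```

At a fixed rate the rebuild time scales linearly with capacity, so the window in which a second failure can cause data loss widens with every capacity increase.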
How Does it Work?
● Information Dispersal Algorithms (IDAs) separate data into unrecognizable slices of information.
● The slices are then dispersed to storage nodes in disparate storage locations.
● Dispersal can be implemented locally or across distributed sites.
● Only a pre-defined subset of the slices from the dispersed storage nodes is needed to fully retrieve all of the data.
HARDWARE - Information Dispersal - Better Approach?
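The "subset of slices" idea can be sketched in Python with the simplest possible scheme: k data slices plus one XOR parity slice, so that any k of the k + 1 slices rebuild the data. This is a deliberately simplified stand-in, not a production IDA. Real dispersal schemes tolerate several lost slices and also make each slice unrecognizable, which plain XOR parity does not. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data: bytes, k: int):
    """Split data into k equal slices plus one XOR parity slice (k + 1 in total)."""
    slice_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(slice_len * k, b"\0")
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    parity = reduce(xor_bytes, slices)
    return slices + [parity], len(data)


def retrieve(slices, original_len: int) -> bytes:
    """Rebuild the data when at most one slice (marked None) is missing."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("XOR parity tolerates only one missing slice")
    if missing:
        present = [s for s in slices if s is not None]
        slices = list(slices)
        # XOR of all surviving slices equals the missing one.
        slices[missing[0]] = reduce(xor_bytes, present)
    return b"".join(slices[:-1])[:original_len]


slices, length = disperse(b"patient record 42", k=4)
slices[2] = None  # simulate one failed storage node
print(retrieve(slices, length))
```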
● It is resilient against natural disasters and technological failures such as drive failures, system crashes and network failures.
● Data can still be accessed in real time even if there are multiple simultaneous failures across a string of hosting devices, servers or networks.
● Five 9's or more are guaranteed with overhead as low as 20%, as opposed to 3 copies requiring 200% overhead.
HARDWARE - Benefits of Information Dispersal
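The 20% versus 200% comparison follows from how overhead is counted. In the sketch below, a dispersal scheme that needs any k of n slices stores n/k times the data, so its overhead is (n - k)/k, while keeping c full copies has an overhead of c - 1. The 10-of-12 configuration is an assumption chosen because it yields 20%. The slides do not state which configuration they refer to.

```python
def dispersal_overhead(k: int, n: int) -> float:
    """Extra raw storage, as a fraction of the data, for a k-of-n dispersal scheme."""
    return (n - k) / k


def replication_overhead(copies: int) -> float:
    """Extra raw storage, as a fraction of the data, when keeping full copies."""
    return float(copies - 1)


data_pb = 1.0  # hypothetical data set of 1 PB
ida = dispersal_overhead(k=10, n=12)   # assumed configuration
rep = replication_overhead(copies=3)   # "3 copies" from the slide
print(f"10-of-12 dispersal: {ida:.0%} overhead, {data_pb * (1 + ida):.1f} PB raw")
print(f"3 copies:           {rep:.0%} overhead, {data_pb * (1 + rep):.1f} PB raw")
```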
Looking at the number of years without data loss at a 99.99999% confidence level, information dispersal does not even appear on the chart: even for a storage amount as large as 524K terabytes, the projected time without data loss is beyond anyone's lifetime (theoretically over 79 million years).
HARDWARE - Cost Savings from IDA in Petabyte Storage over RAID and Replication
Deal with huge data
Machine learning
How can the huge dataset be matched to ICD-10?
ALGORITHMS - Deal with the huge data
ICD-10 Clinical Modifications (ICD CM dataset): 69,823
• 3-7 characters
• Character 1 is alpha
• Character 2 is numeric
• Characters 3-7 can be alpha or numeric
ICD-10 Procedure Coding System (ICD-10 PCS dataset): 76,000
• 7 characters
• Each one can be alpha or numeric
• Numbers 0-9; letters A-H, J-N, P-Z
ALGORITHMS - ICD-10 introduction
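The character rules on this slide translate directly into regular expressions. The Python sketch below checks only the format described on the slide, not whether a code actually exists in the ICD-10 tables. Stripping the decimal point from codes such as "E11.9" before matching is an assumption, and the function names are hypothetical.

```python
import re

# ICD-10-CM per the slide: 3-7 characters, 1st alpha, 2nd numeric, 3rd-7th alphanumeric.
CM_PATTERN = re.compile(r"[A-Z][0-9][A-Z0-9]{1,5}")
# ICD-10-PCS per the slide: exactly 7 characters from 0-9, A-H, J-N, P-Z (no I or O).
PCS_PATTERN = re.compile(r"[0-9A-HJ-NP-Z]{7}")


def looks_like_icd10_cm(code: str) -> bool:
    """Format check only; assumes the decimal point can be ignored."""
    normalized = code.strip().upper().replace(".", "")
    return CM_PATTERN.fullmatch(normalized) is not None


def looks_like_icd10_pcs(code: str) -> bool:
    """Format check only."""
    return PCS_PATTERN.fullmatch(code.strip().upper()) is not None


for candidate in ("E11.9", "1A2", "0DTJ4ZZ", "0DTJ4ZO"):
    print(candidate, looks_like_icd10_cm(candidate), looks_like_icd10_pcs(candidate))
```

A format filter like this is a cheap first step when mapping a large nonstandard data source onto the code sets: malformed strings are rejected before any lookup against the full tables.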
Analytics Algorithms
● Machine learning
● Image retrieval system
Huge Nonstandard Data Source (4V)
● Volume
● Velocity
● Variety
● Veracity
Data Feature Selection
Huge multiple-character mapping databases
Data Analytics
ALGORITHMS - Why we need big data
Diagnosis is a relatively straightforward machine learning problem. Clinical decision making is highly suited to rule-based systems because of the nature of the data, such as ICD-10 codes and medications.
ALGORITHMS - Machine Learning in medical diagnosis
ALGORITHMS - Popular Imaging Modalities in Healthcare Domain
ALGORITHMS - Medical Image Retrieval System
*ImageCLEF medical – competition on Medical Image Processing
Two main tasks:
● Image-based retrieval
● Case-based retrieval
Source: http://www.imageclef.org/
(Chart: # of images)
ALGORITHMS - Database of ImageCLEF Medical competition
• This is the classic medical retrieval task.
• Similar to Query by Image Example.
• Given the query image, find the most similar images.
http://www.imageclef.org/
(Chart: performance)
ALGORITHMS - Image-based retrieval Algorithm
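Query by image example can be sketched in Python as follows: turn every image into a feature vector, then rank the collection by similarity to the query's vector. The sketch is not the method of any ImageCLEF participant. The feature (a four-bin grayscale histogram), the cosine similarity measure and the toy pixel lists are all assumptions chosen to keep the example self-contained.

```python
import math


def intensity_histogram(pixels, bins=4, max_value=256):
    """Normalized histogram of grayscale values 0-255 (a deliberately crude feature)."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    return [c / len(pixels) for c in counts]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_pixels, collection, top_k=3):
    """Return the names of the top_k images most similar to the query."""
    query = intensity_histogram(query_pixels)
    scored = [
        (cosine_similarity(query, intensity_histogram(pixels)), name)
        for name, pixels in collection.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy "images" as flat lists of grayscale pixel values.
collection = {
    "dark": [10, 20, 30, 40],
    "bright": [200, 220, 240, 250],
    "mixed": [10, 100, 150, 250],
}
print(rank_by_similarity([5, 15, 25, 35], collection))
```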
Performance = Difficulty * Accuracy
(Chart: # of images vs. mean average precision)
• This is a more complex task that is closer to the clinical workflow.
• A case description, with patient demographics, limited symptoms and test results including imaging studies, is provided (but not the final diagnosis).
• The goal is to retrieve cases, including images, that might best suit the provided case description.
http://www.imageclef.org/
ALGORITHMS - Case-based retrieval Algorithm
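The charts for both retrieval tasks report mean average precision. The Python sketch below uses the standard textbook definition of that metric, which may differ in detail from the official ImageCLEF evaluation. The case IDs are hypothetical, and each ranked list is assumed to contain no duplicates.

```python
def average_precision(ranked_ids, relevant_ids) -> float:
    """Average of precision@rank over the ranks where a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs) -> float:
    """Mean of average precision over (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [
    (["case7", "case2", "case9"], {"case7", "case9"}),  # hypothetical query 1
    (["case4", "case1"], {"case1"}),                    # hypothetical query 2
]
print(round(mean_average_precision(runs), 3))
```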
                   Manual calculation   Software and algorithm
Speed              Slow                 Fast
Accuracy           Hard to maintain     Precise
Level to study     Quite hard           Easy to learn
Solution level     Shallow              Deep
Machine learning   No                   Yes
Result             Hard to explain      Perspective visualization
ALGORITHMS - Manual Calculation vs. Software and Algorithm
TECHNOLOGICAL FUSION
● BioMEMS
● Hardware
● Object Storage
● Information Dispersal
● Machine Learning
● More data can be gathered to identify patterns and interactions
● Doctors will use it for diagnosis and decision-making
● Health care costs will decrease
● Individual patient care will improve
TECHNOLOGICAL FUSION CONCLUSION
THANK YOU
Q&A