Big data challenges and opportunities in healthcare: application to detecting faint signals
Dr. Greg Slabaugh
City University London
School of Informatics
Data, data, and more data
• According to IBM, 90% of the data in the world today was created in the past two years. [1]
• According to International Data Corporation, the total amount of global data is expected to grow to 2.7 zettabytes during 2012. [2]
• The data is growing exponentially (43% growth rate) and is estimated to be 7.9 zettabytes by 2015. [3]
| Term | Size in bytes |
|---|---|
| kilobyte (KB) | 10^3 |
| megabyte (MB) | 10^6 |
| gigabyte (GB) | 10^9 |
| terabyte (TB) | 10^12 |
| petabyte (PB) | 10^15 |
| exabyte (EB) | 10^18 |
| zettabyte (ZB) | 10^21 |
| yottabyte (YB) | 10^24 |
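The 2012 and 2015 figures quoted above are consistent with each other if the 43% rate is read as annual compound growth. That reading is an assumption, since the slide does not state the compounding period. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def project_volume(volume_zb, annual_growth, years):
    """Compound a data volume (in zettabytes) forward by a fixed annual rate."""
    return volume_zb * (1.0 + annual_growth) ** years

# 2.7 ZB in 2012, assumed to grow 43% per year for three years (2012 -> 2015).
estimate_2015 = project_volume(2.7, 0.43, 3)
print(f"{estimate_2015:.1f} ZB")  # 7.9 ZB
```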
US Library of Congress
• The world’s largest library
◦ 151.8 million items on 838 miles of bookshelves
◦ 34.5 million books and other print materials
◦ 13.4 million photographs
◦ 5.4 million maps
◦ 6.5 million pieces of sheet music
◦ 66.6 million manuscripts
By one estimate[4], the entire print collection is roughly 10 petabytes of data
7.9 zettabytes (2015) is >700,000 Libraries of Congress
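The comparison can be checked with the decimal unit definitions (1 PB = 10^15 bytes, 1 ZB = 10^21 bytes). This is a quick sketch using the 10-petabyte estimate for the print collection; the variable names are illustrative only.

```python
PETABYTE = 10**15   # bytes
ZETTABYTE = 10**21  # bytes

library_of_congress_bytes = 10 * PETABYTE   # estimated print collection
global_data_2015_bytes = 7.9 * ZETTABYTE    # projected global data for 2015

ratio = global_data_2015_bytes / library_of_congress_bytes
print(f"{ratio:,.0f} Libraries of Congress")  # 790,000 Libraries of Congress
```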
Sources
Images from [4]
Not limited to internet media
Large datasets are impacting nearly all areas of business
• Transactional data
◦ Walmart: >1M transactions / hour
◦ Visa: >12.5M transactions / hour[5]
• Networked sensors
◦ Surveillance
◦ Automobiles
• Product development and manufacturing
◦ Integration of data from R&D, engineering, manufacturing
◦ RFID
• Healthcare
◦ Patient records
◦ Diagnostic tests
◦ Imaging
Why does big data matter?
• Big data is not just about storing large datasets
• Rather, it is about leveraging datasets
◦ Mining datasets to find new meaning
◦ Combining datasets that have never been combined before
◦ Making more informed decisions
◦ Offering new products and services
Data is a vital asset, and analytics are the key to unlocking its potential
“We don’t have better algorithms than anyone else, we just have more data.” [7]
‐ Peter Norvig, Director of Research, Google, spoken in 2010
Recognising the value
• Oracle
◦ Integration of R (statistical programming language) into database software
• IBM
◦ InfoSphere BigInsights
• Google
◦ BigQuery (preview – invitation only): web service that enables interactive analysis on massive datasets – billions of rows
• Opera Solutions
◦ Big data analytics software based on machine learning
• Explorys
◦ Explore and compare populations of patients based on medical data records
• Apache: Hadoop
Big data in healthcare
• The past
◦ Data is a by‐product of providing healthcare services
◦ Datasets filed and never seen again; some datasets discarded; hardcopies
◦ Electronic health records (EHRs): primary value is that they make life easier for doctors and bring down storage costs
• The future
◦ Data is a central asset
◦ New analytics to mine data and help extract meaning
◦ Datasets integrated and cross‐referenced – personalised medicine
◦ Digital records aggregated across patients – population studies
• Potential applications[8]
◦ Spotting unwanted drug interactions
◦ Identifying the most effective treatments
◦ Predicting onset of disease before symptoms emerge
◦ Analysing disease patterns
Challenges in healthcare
Access to data
IT infrastructure
Analytics
Legal issues
Data privacy
Data integrity
Slide adapted from [9]
Opportunities in healthcare
• Clinical: $165B
• Public health: $9B
• R&D: $108B
• Business model: $5B
• Accounts: $47B
Slide adapted from [9]
A case study: colorectal disease
◦ Colorectal cancer is the second most prevalent cancer in Western countries[10]
◦ 940,000 cases occur annually
◦ 655,000 deaths annually
◦ If detected early, 90% of patients live at least ten years
Figure: Pre‐cancerous polyp
Colorectal cancer
• Pathophysiology
◦ Adenomatous polyps
• Benign tumours of a glandular organ (colon is a mucosal organ)
• Common, particularly in patients 50+
• Greater than 10 mm: higher likelihood of developing into cancer
◦ Cancer
• Can invade below colon surface and spread to other organs
Progression is slow (5+ years to polyp, 5+ more to cancer)
Screening methodologies
• Faecal occult blood test (FOBT)
• Optical colonoscopy (OC)
+ Effectiveness, cost, can remove polyps during procedure
‐ Invasive, sedation, risk of perforation, occlusions, physical limitations, patient compliance
Screening methodologies
• CT Colonography (CTC), or “Virtual Colonoscopy”
◦ Examination of the colon using CT imaging
◦ Patient given laxatives to clear the colon
◦ Patient consumes a faecal tagging solution designed to coat any residual stools or liquid
◦ Thin tube inserted into rectum to inflate colon with gas (CO2)
◦ Images taken with patient in prone and supine positions
◦ Images are analysed by radiologist for colorectal lesions
Data overload
• Optical colonoscopy
◦ Approx. 20 minutes of HD video per patient (often not recorded)
◦ 30,000,000 procedures performed annually worldwide
◦ Roughly 15 petabytes of data per year (and growing exponentially)
• CT colonography
◦ Each CT series (prone, supine) has roughly 500 images of size 512x512
◦ 1,000,000 procedures performed annually worldwide
◦ Roughly 0.5 petabytes of data per year (and growing exponentially)
All this data must be reviewed by a physician (gastroenterologist or radiologist)
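The annual volumes above can be reproduced with a back-of-the-envelope calculation. The sketch below uses the slide's figures plus one assumption that the slide does not state: each CT pixel is stored in 2 bytes (16-bit), with no compression. For optical colonoscopy it works backwards from the 15-petabyte total to the per-procedure size and the video bitrate that total implies.

```python
PETABYTE = 10**15  # bytes

# CT colonography: figures from the slide, except bytes_per_pixel (assumed).
series_per_procedure = 2            # prone and supine
images_per_series = 500
pixels_per_image = 512 * 512
bytes_per_pixel = 2                 # assumption: 16-bit pixels, uncompressed
ctc_procedures_per_year = 1_000_000

ctc_bytes_per_procedure = (
    series_per_procedure * images_per_series * pixels_per_image * bytes_per_pixel
)
ctc_petabytes_per_year = ctc_bytes_per_procedure * ctc_procedures_per_year / PETABYTE

# Optical colonoscopy: derive the per-procedure size implied by the annual total.
oc_petabytes_per_year = 15
oc_procedures_per_year = 30_000_000
video_seconds = 20 * 60             # approx. 20 minutes of HD video

oc_bytes_per_procedure = oc_petabytes_per_year * PETABYTE / oc_procedures_per_year
oc_megabits_per_second = oc_bytes_per_procedure * 8 / video_seconds / 10**6

print(f"CTC: {ctc_bytes_per_procedure / 10**6:.0f} MB per procedure, "
      f"{ctc_petabytes_per_year:.2f} PB per year")
print(f"OC: {oc_bytes_per_procedure / 10**6:.0f} MB per procedure, "
      f"about {oc_megabits_per_second:.1f} Mbit/s")
```

Under these assumptions the CT figure comes to about 0.52 PB per year, in line with the rough 0.5 PB quoted above, and the optical colonoscopy total corresponds to about 500 MB of video per procedure.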
CT Colonography images
• This patient has a polyp in their colon. Did you see it?
Polyps can be very subtle and difficult to detect, even for expert radiologists
Computer‐aided detection (CAD)
• CAD consists of image processing and pattern recognition algorithms designed to detect polyps that may be of interest to a physician
• CAD draws the radiologist’s attention to regions that may have otherwise been overlooked
• “Spell‐checker” for medical images
• Characterised by
◦ Sensitivity (percentage of polyps found)
◦ Number of false positives
• CAD is designed to be complementary – it is not a replacement for a physician
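The two characteristics named above can be computed from per-study counts. The following is a minimal sketch with hypothetical function names and made-up counts, not results from any study; it reports false positives as a mean per series, which is one common way to summarise them.

```python
def sensitivity(polyps_found, polyps_missed):
    """Fraction of true polyps that the CAD system marked."""
    total = polyps_found + polyps_missed
    if total == 0:
        raise ValueError("no polyps in the reference standard")
    return polyps_found / total

def false_positives_per_series(false_positive_counts):
    """Mean number of false CAD marks per CT series."""
    if not false_positive_counts:
        raise ValueError("no series provided")
    return sum(false_positive_counts) / len(false_positive_counts)

# Hypothetical counts, for illustration only.
print(f"sensitivity: {sensitivity(18, 2):.0%}")                                     # sensitivity: 90%
print(f"false positives per series: {false_positives_per_series([4, 6, 5, 3])}")   # 4.5
```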
How does CAD work?
1. Image pre‐processing
2. Organ segmentation
3. Candidate generation
4. Feature calculation
5. Classifier
6. Results
A Robust and Fast System for CTC Computer‐Aided Detection of Colorectal Lesions, Greg Slabaugh, Xiaoyun Yang, Xujiong Ye, Richard Boyes, Gareth Beddoe, Algorithms, 3(1):21‐43, special journal issue on Machine Learning for Medical Imaging, 2010.
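To make the flow of stages concrete, here is a toy sketch that chains the stages named on the slide on a tiny synthetic 2D image. It is not the method of the cited paper: every function name, threshold, and the single "surrounding lumen fraction" feature is an assumption chosen for illustration, and a real system works on 3D CT volumes with far richer features and a trained classifier.

```python
def preprocess(image, lo=-1000.0, hi=1000.0):
    """Clip intensities to [lo, hi] and rescale them to [0, 1]."""
    return [[(min(max(v, lo), hi) - lo) / (hi - lo) for v in row] for row in image]

def segment_lumen(image, air_threshold=0.25):
    """Toy organ segmentation: gas-filled lumen pixels are the dark ones."""
    return [[v < air_threshold for v in row] for row in image]

def generate_candidates(lumen):
    """Candidates are wall pixels with at least one 4-connected lumen neighbour."""
    rows, cols = len(lumen), len(lumen[0])
    candidates = []
    for r in range(rows):
        for c in range(cols):
            if lumen[r][c]:
                continue
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(0 <= nr < rows and 0 <= nc < cols and lumen[nr][nc]
                   for nr, nc in neighbours):
                candidates.append((r, c))
    return candidates

def calculate_features(lumen, candidates, radius=2):
    """One toy feature per candidate: fraction of lumen pixels in a local window.
    A bump protruding into the lumen is surrounded by more gas than flat wall."""
    rows, cols = len(lumen), len(lumen[0])
    features = []
    for r, c in candidates:
        window = [lumen[i][j]
                  for i in range(max(r - radius, 0), min(r + radius + 1, rows))
                  for j in range(max(c - radius, 0), min(c + radius + 1, cols))]
        features.append(sum(window) / len(window))
    return features

def classify(features, threshold=0.5):
    """Toy classifier: a fixed threshold on the single feature."""
    return [f > threshold for f in features]

def run_cad(image):
    """Chain the stages: pre-process, segment, generate, describe, classify."""
    lumen = segment_lumen(preprocess(image))
    candidates = generate_candidates(lumen)
    keep = classify(calculate_features(lumen, candidates))
    return [cand for cand, k in zip(candidates, keep) if k]

# Synthetic image: six rows of gas (-1000) above five rows of tissue (0),
# with one tissue pixel at (5, 5) protruding into the gas.
image = [[-1000.0] * 11 for _ in range(6)] + [[0.0] * 11 for _ in range(5)]
image[5][5] = 0.0

print(run_cad(image))  # [(5, 5)]
```

In this toy image the flat wall pixels score at most 0.4 on the feature and the protruding pixel scores 0.56, so only the bump survives the classifier.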
Performance
• Largest ever colon CAD clinical study (3000+ patients)
◦ Dr. Perry Pickhardt (U. Wisconsin)
◦ Published in Radiology 2010[12]; picked up in radiology press
◦ 4.7 false positives per series
CAD identified 15 polyps that were missed by expert radiologists
Advanced analytics
Manifold learning[13]
Advanced analytics
Population regression[14]
Multi‐modality, multi‐scale and heterogeneous
Organism, Organ, Tissue, Cells
Proteomics, Genomics, Atomic, Records
Other “dimensions” to consider
data
time
References
[1] IBM quote, microscope.co.uk
[2] International Data Corporation 2012 prediction, IDC website
[3] CenturyLink 2015 prediction, ReadWriteWeb website
[4] Estimated data in US Library of Congress, Wikipedia
[5] Visa transactions, FastCompany website
[6] McKinsey Global Institute, “Big data: The next frontier for innovation, competition, and productivity,” 2011
[7] Peter Norvig quote, CNET website
[8] Economist report on Big Data
[9] Wipro infographic
[10] World Health Organization, 2008 leading causes of death
[11] A Robust and Fast System for CTC Computer‐Aided Detection of Colorectal Lesions, Slabaugh et al., Algorithms, 3(1):21‐43, 2010.
[12] Colorectal polyps: stand‐alone performance of computer‐aided detection in a large asymptomatic screening population, Lawrence, E.M., Pickhardt, P.J., Kim, D.H., & Robbins, J.B. (2010). In Radiology, 256, 791‐798.
[13] http://scikit-learn.github.com/scikit-learn.org/dev/auto_examples/manifold/plot_lle_digits.html
[14] Population Shape Regression From Random Design Data, Davis et al., ICCV 2007