Post on 06-Jan-2022
Lab Infotech Summit, 2 March 2006
Searching Surgical Pathology Databases with Images Instead of Words
Ulysses J. Balis, MD
Director of Pathology Informatics, MGH Pathology Service
Chief of Pathology, Boston Shriners Burns Hospital
balis@helix.mgh.harvard.edu
Disclosure of Affiliations
• Aperio Technology
  – Major Shareholder
  – Scientific Advisory Board
• Living MicroSystems
  – Founder
  – Shareholder
• Impac Medical Systems
  – Scientific Advisory Board
Outline
• Current Search in Surgical Pathology
• Fundamental Search Concepts
• Extending searches to the spatial domain
• Background complexity theory
• Interactive demonstrations
Some observations about searching extant repositories (data mining)…
• What is the typical search modality, computationally?
  – Text/code-based predicate (keywords, ICD-9 codes, etc.)
• What is a more desirable search modality from an Anatomic Pathology perspective?
  – Region-of-interest based predicate
Further Observations
• Anatomic Pathology is largely visual and yet our search predicate remains firmly entrenched in text-based content retrieval.
• With the enabling reality of Whole-Slide-Imaging, it is now germane to consider the problem of content-based image retrieval (CBIR) and the associated metric of tagged metadata that can accompany such a repository.
Conventional Text-based Searches
• Flexibility of search is dependent on:
  – Level of granularity in the source database
  – Correctness of spelling
  – Correctness of codification (if present)
AND OR NOT NEAR LIKE
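The dependence on spelling can be illustrated with a minimal sketch of a keyword predicate. The report texts, case identifiers, and the `search` helper below are all hypothetical and exist only for illustration; a real laboratory information system would run such predicates in its own query language.

```python
# Hypothetical report texts; the second one contains a deliberate misspelling.
REPORTS = {
    "S06-001": "invasive ductal carcinoma, breast",
    "S06-002": "invasive ductal carcinmoa, breast",
    "S06-003": "benign fibroadenoma, breast",
}

def search(reports, all_of=(), any_of=(), none_of=()):
    """Return case ids whose text satisfies AND / OR / NOT keyword predicates."""
    hits = []
    for case_id, text in reports.items():
        lowered = text.lower()
        if not all(term in lowered for term in all_of):      # AND
            continue
        if any_of and not any(term in lowered for term in any_of):  # OR
            continue
        if any(term in lowered for term in none_of):         # NOT
            continue
        hits.append(case_id)
    return hits

# The misspelled report is silently missed by the keyword predicate.
carcinoma_cases = search(REPORTS, all_of=("carcinoma", "breast"))
not_benign = search(REPORTS, all_of=("breast",), none_of=("benign",))
```

In this toy data set the second case is a carcinoma but is never returned by the `carcinoma` query, which is the failure mode the slide attributes to spelling errors.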
8AM: What cases does this tumor remind me of?
10AM: This looks like a case I saw last week. How did I describe that one?
2PM: What is the fundamental pathogenesis going on here?
4:30PM: That back 9 yesterday was a disaster. I've really got to get my slice under control.
Realities:
• Surgical Pathology is largely visual, yet our current fundamental approach to data retrieval is text-based.
• This functional gap is largely a result of the historical difficulty of content-based image retrieval and does not represent a lack of need for this capability.
• Much of the cognitive and diagnostic methodology that represents the art of surgical pathology is based upon pattern matching and not a dependence on text.
i.e. our current primary retrieval methodology is text-based because we are constrained to this medium as our search metric.
The actual search metric is spatial.
The CCD: the fundamental enabling tool of digital image capture
Wide-Field Image Capture (type 1)
formerly known as the store and forward model
• Acquire the whole slide into the digital realm
• Image is scanned and reconstructed in some predefined time interval (minutes to hours)
• The data set is then available for display, dissemination, analysis or query.
• Better performance is achieved with increasing computational power, system memory and off-line storage.
Current State of Wide-Field Slide Scanning
• Single slide (and small set) scanning reduced to practice.
• Generally confined to a single acquisition plane
• Storage technology currently based on multiplanar TIFF / JPEG 2000 storage/compression technology.
• Optical path engineering is approaching the diffraction resolution limit of modern optical microscopy
Wide-Field Microscopy: Competing factors…
• Compression Ratio
  – Too low: digital storage is prohibitively expensive
  – Too high: image is useless, diagnostically
• Resolution (image quality):
  – Too low: image is useless, diagnostically
  – Too high: image acquisition is too time-consuming to allow for conversion to an all-digital signout paradigm.
Resolution
CCD Facts:
• Number of pixels (i.e. megapixel count) is directly proportional to the maximum number of transistors that current microphotolithographic techniques can allow on a single substrate
• Transistor count for both CCDs and CMOS imagers closely follows Moore's Law, which states that the total number of possible transistors on a chip doubles every 18 months. This has been generally accurate since the mid-1960s
• Current state of the art (mid-2005) in single-device imagers:
  – Consumer-grade imaging: 16.2 Megapixel (Canon)
  – Scientific-grade imaging: 22.6 Megapixel (Dalsa Corporation)
• This capability will likely double by the close of 2006
Some Observations
• Characteristics:
  – ~2.5 by ~7.5 cm
  – 1/3 used for label
  – 2.5 x 5.0 cm for tissue display
  – Typical light microscopy is diffraction-limited to 0.25 microns
  – Yields an effective required pixel count of 100K by 200K pixels (2.3 Gb), or a 20K MPixel image
  – This is the same thing as saying that one would need to capture 20,000 images with a 1 MPixel camera to obtain a single slide
  – Herein lies the essence of why telepathology has been so long in approaching an operational reality.
[Slide diagram labels: 7.5 cm, 5 cm, 2.5 cm]
(1000 x 25) / 0.25 microns = 100,000 linear pixels
(1000 x 50) / 0.25 microns = 200,000 linear pixels
This is a 20 GPixel image vs. a relatively insignificant 4 MPixel image
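The arithmetic above can be checked with a short sketch. The constants are the figures quoted on the slide; the function and variable names are illustrative, and the one-capture-per-megapixel step assumes tiles that do not overlap.

```python
# Figures taken from the slide; nothing here is measured data.
TISSUE_WIDTH_MM = 25      # 2.5 cm
TISSUE_LENGTH_MM = 50     # 5.0 cm
RESOLUTION_UM = 0.25      # diffraction-limited sampling interval in microns

def linear_pixels(length_mm, resolution_um):
    """Pixels needed along one edge at the given sampling interval."""
    return round(length_mm * 1000 / resolution_um)

width_px = linear_pixels(TISSUE_WIDTH_MM, RESOLUTION_UM)
length_px = linear_pixels(TISSUE_LENGTH_MM, RESOLUTION_UM)
total_px = width_px * length_px

# Express the total in megapixels; with a 1 MPixel camera and no tile
# overlap (an assumption), this is also the number of captures needed.
total_mpixel = total_px // 1_000_000
captures_with_1mp_camera = total_mpixel
```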
What happens as Moore's Law is applied to the pathology problem?

Number of     Resulting required    Year imager commonly       Time (min.) to capture single
Megapixels    number of captures    available (Moore's Law)    slide (@ 0.25 sec / image)
1             20000.00              1998                       83.33
2             10000.00              2000                       41.67
3             6666.67               2001                       27.78
4             5000.00               2003                       20.83
7             2857.14               2005                       11.90
12            1666.67               2006                       6.94
16            1250.00               2007                       5.21
22            909.09                2009                       3.79
44            454.55                2010                       1.89
88            227.27                2012                       0.95
172           116.28                2013                       0.48
344           58.14                 2015                       0.24
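The capture-count and time columns follow directly from the 20,000 MPixel slide size and the 0.25 second capture time. A minimal sketch that reproduces them is given here; the function names are illustrative, and the year column is not modelled because the slide does not state the rule used to derive it.

```python
TOTAL_MPIXEL = 20_000          # whole-slide size from the preceding slide
SECONDS_PER_CAPTURE = 0.25     # capture time quoted in the table header

def captures_required(imager_mpixel):
    """Number of non-overlapping captures needed to cover the slide."""
    return TOTAL_MPIXEL / imager_mpixel

def minutes_per_slide(imager_mpixel):
    """Total acquisition time in minutes for one slide."""
    return captures_required(imager_mpixel) * SECONDS_PER_CAPTURE / 60

IMAGER_SIZES = (1, 2, 3, 4, 7, 12, 16, 22, 44, 88, 172, 344)
rows = [
    (mp, round(captures_required(mp), 2), round(minutes_per_slide(mp), 2))
    for mp in IMAGER_SIZES
]
```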
Project Objectives
• Develop a self-training, domain-independent image segmentation / classification tool.
• Utilize this tool to create two novel image search modalities:
  – Region-of-interest query by example (image space search; not text based)
  – Retrieval of diagnostic information associated with previously classified fields, enabling dynamically generated differential diagnoses
• Explore the stochastics of multi-dimensional image space data as it applies to other emerging massively parallel data collection approaches (genomics, proteomics, etc.)
  – i.e. Morphogenomics
Some salient events leading to the present…
1994: CAP explores the possibility of computerized cytology proficiency testing
  • It became quickly obvious that data compression would play a key role in realizing this high-performance computing application
  • Lossless compression was simply inadequate with its 5:1 maximum ratio
  • Loss-based compression exhibited significant artifacts when compression ratios of greater than 50:1 were employed.
  • Data capture platforms for acquiring an entire slide surface area were simply not commercially available.
2000: Multiple whole-slide vendors enter into the fray, enabling the data acquisition component, but leaving the data compression issue as a remaining course of discovery
2001: JPEG2000 compression (wavelet-based) facilitates slightly higher compression ratios of upwards of 150:1, which is still largely inadequate for the problem of archiving comprehensive workflow, which is estimated at Petabytes to Exabytes per year.
Conventional Loss-based Image Compression
Raw Data → Compression Algorithm → Compressed data (may or may not preserve spatial organization of original data) → Restoration Algorithm → Restored Data
Depending on the selected compression ratio, restored loss-compression imagery may or may not be of diagnostic quality.
Vector Quantization
Original Image → Division of image into local domains → Extraction of Local Domain Composite Vectors → Individual assessment of each composite vector
Vectorization of each local kernel:
$V_K = \sum \{ [L \cdot x_0 y_0]_{\mathrm{Order}}, \ldots, [L \cdot x_n y_m]_{\mathrm{Order}} \}$
Initial n by n sub-region of image:
1,1  1,2  …  1,n
2,1  2,2  …  2,n
 .    .        .
n,1  n,2  …  n,n
Resultant Vector Kernel of n●n●3 dimensionality:
[ 1,1  1,2  …  n,n ]
For every location: each location is an RGB triplet; hence, each vector component is itself a triplet sub-vector.
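The flattening of an n by n sub-region into a vector of n●n●3 components can be sketched as follows. This is an illustration only: the row-major ordering, the non-overlapping tiling, and the function names are assumptions, since the slide does not specify them.

```python
def block_to_vector(block):
    """Flatten an n-by-n block of (R, G, B) triplets into n*n*3 components."""
    return [channel for row in block for pixel in row for channel in pixel]

def split_into_blocks(image, n):
    """Yield non-overlapping n-by-n blocks in row-major order.

    Assumes the image is a 2-D list of RGB triplets whose height and
    width are both multiples of n.
    """
    height, width = len(image), len(image[0])
    for top in range(0, height, n):
        for left in range(0, width, n):
            yield [row[left:left + n] for row in image[top:top + n]]

# A toy 2 x 2 image, treated as a single block with n = 2.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (10, 20, 30)],
]
vectors = [block_to_vector(block) for block in split_into_blocks(image, 2)]
```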
Vector Quantization
$V_K = \sum \{ [L \cdot x_0 y_0]_{\mathrm{Order}}, \ldots, [L \cdot x_n y_m]_{\mathrm{Order}} \}$
Query against library (vocabulary) of established vectors:
• Previously identified vector: found in the established vocabulary
• Novel vector: assignment of a unique serial number and inclusion into global vocabulary
Assembly of compressed dataset
[Diagram: example serial numbers making up the compressed dataset: 38857448643, 553246564, 53887, 554323267, 865438676, 354554343, 55565435, 446854, 446854456, 66963658, 776956468, 8865433]
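The vocabulary lookup described above can be sketched as follows. This is a simplified illustration, not the speaker's implementation: it assumes a Euclidean distance with a fixed `tolerance` decides whether a vector counts as previously identified, it uses a linear scan, and it uses list positions as serial numbers.

```python
import math

def encode(vectors, vocabulary, tolerance):
    """Replace each vector with the serial number of its nearest vocabulary entry.

    A vector farther than `tolerance` from every established entry is treated
    as novel: it is appended to the vocabulary and gets the next serial number.
    """
    codes = []
    for vec in vectors:
        best_id, best_dist = None, math.inf
        for serial, known in enumerate(vocabulary):
            dist = math.dist(vec, known)
            if dist < best_dist:
                best_id, best_dist = serial, dist
        if best_id is None or best_dist > tolerance:
            vocabulary.append(list(vec))      # novel vector joins the vocabulary
            best_id = len(vocabulary) - 1
        codes.append(best_id)
    return codes

def decode(codes, vocabulary):
    """Restore an approximation of the original vectors from serial numbers."""
    return [vocabulary[code] for code in codes]

vocabulary = []
codes = encode(
    [[0, 0, 0], [1, 0, 0], [200, 200, 200], [0, 1, 0]],
    vocabulary,
    tolerance=5.0,
)
restored = decode(codes, vocabulary)
```

The compressed dataset is the list of serial numbers; the restored vectors are vocabulary entries, so near-duplicates collapse onto one representative, which is where the loss occurs.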
VQ-Based Image Compression
Raw Data → Compressed data (preserved spatial organization of original data) → Restored Data
Depending on the selected compression ratio, restored loss-compression imagery may or may not be of diagnostic quality.
N-Space systems exhibit Maxwellian energy distributions, regardless of length-scale, making them available for modeling in reverse-discretized form.
Thus, the cluster of homomorphs created by any histologic architecture can be modeled by a family of continuous functions, simplifying computational complexity and search-space size.
Let us witness a family of orthonormal polynomials in N-space constituting a synthetic aperture cytologic image.
From: Galactic Dynamics, Binney J and Tremaine S. Princeton University Press, 1987
Consequences of Z mode modeling
• Opportunities for significantly greater compression than that currently in use
  – JPEG (lossless): 3-5:1
  – JPEG (loss): 5-25:1
  – JPEG2000 (loss): 25-200:1
• Point Spread function / Chebyshev-II Z and Volumetric Classifier Modeling
  – Generation 1: 1000:1
  – Generation 2: 10,000:1
  – Generation 3: 100,000:1
  – Generation 4 (under development): 1,000,000:1
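To put these ratios in perspective, the sketch below applies them to the 100,000 by 200,000 pixel whole-slide image discussed earlier in the talk. The 3 bytes per pixel figure is an assumption (8-bit RGB), and the names are illustrative.

```python
RAW_PIXELS = 100_000 * 200_000   # whole-slide pixel count from the earlier slide
BYTES_PER_PIXEL = 3              # assumption: uncompressed 8-bit RGB

def compressed_bytes(ratio):
    """Stored size in bytes of one slide at a given compression ratio (ratio:1)."""
    return RAW_PIXELS * BYTES_PER_PIXEL / ratio

raw_gigabytes = RAW_PIXELS * BYTES_PER_PIXEL / 1e9

# Upper end of each conventional range, then two of the quoted generations.
sizes = {ratio: compressed_bytes(ratio) for ratio in (5, 25, 200, 1_000, 1_000_000)}
```

Under these assumptions a raw slide is 60 GB, JPEG2000 at 200:1 leaves about 300 MB, and a 1,000,000:1 ratio would leave about 60 KB.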
A serendipitous intersection of disparate (or apparently disparate) fields of study: Astrophysics and Informational Theory in Multi-dimensional Histology Informational Representation.
The N-Space sparsity issue has been well explored in the general field of Galactic Dynamics with the general case solution being an exact fit for histology information theory.
Typical 2D Voronoi Projection of N Space Data
The Mean-free-path problem
• In Astrophysics: What is the incidence of two stars colliding for a given tensor volumetric distribution?
• In Histology: What is the likelihood of two comparable tensors sharing a common region in N-space for a given homomorphic stringency?
The Mean-free-path problem
• $\lambda = 1/(n\sigma)$ and $\rho = \lambda / v$
  – Mean free path of $\lambda$ and collision interval of $\rho$
    • Where $n$ is the number density, $\sigma$ is the cross section and $v$ is the random velocity
  – For our galaxy, $\rho = 10^{19}$ years
    • $\sigma = \pi (2R_\odot)^2$; $R_\odot = 6.96 \times 10^{10}$ cm
  – For vector quantization of histologic data, with use of 30-dimensional vectors or higher orders, the incidence of overlap of non-homomorphic regions is greater than 1 in $256^{30}$ ($1.766 \times 10^{72}$), which allows for unique identification of structural components.
  – When combined with multivariate Bayesian analysis, the identification profile effectively becomes a fingerprint for underlying unique histomorphic status.
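The quantities on this slide can be evaluated with a short sketch. The function names are illustrative, and reading $256^{30}$ as 256 possible levels in each of 30 vector components is an assumption about where the figure comes from. The galactic number density and velocity are not given on the slide, so the 10^19 year figure is not recomputed here.

```python
import math

def mean_free_path(number_density, cross_section):
    """lambda = 1 / (n * sigma)."""
    return 1.0 / (number_density * cross_section)

def collision_interval(number_density, cross_section, velocity):
    """rho = lambda / v."""
    return mean_free_path(number_density, cross_section) / velocity

# Stellar collision cross section from the slide: sigma = pi * (2 * R_sun)^2.
R_SUN_CM = 6.96e10
sigma_cm2 = math.pi * (2 * R_SUN_CM) ** 2

# Number of distinct 30-component vectors, assuming 256 levels per component.
distinct_vectors = 256 ** 30
```

`distinct_vectors` evaluates to roughly 1.767 x 10^72, in line with the figure quoted on the slide, and `sigma_cm2` to roughly 6.09 x 10^22 square centimetres.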
Consequences of VQ representation, in light of Maxwellian complexity
• If an image can be compressed by six logs (six orders of magnitude) and subsequently restored with minimal degradation of diagnostic clarity, is it not the case that the sum total of “knowledge” is similarly contained in the compressed data set as it is obviously present in the primary and restored data?
• Searches carried out upon the compressed data set represent an enormous computational opportunity for simplified query.
• As VQ vectors are structural homologs of repeating histologic elements, the query can be carried out by searching for a set of recurring vectors in the image set space, using a region-of-interest source template.
Complexity theory and Histopathology
• All recurrent and/or structurally self-similar patterns in nature exhibit a characteristic complexity level.
• Normal histology (two-dimensional projections of a fully realized three-dimensional structure) exhibits a fingerprint complexity pattern, which is organ-system specific.
• Disease states tend to lower complexity number.
• The number of vectors required in a generic class vocabulary to fully represent a particular organ system is specific to that organ.
• It is possible to make generic vocabularies for a given:
  – Organ system
  – Spectrum of disease manifestation
• Vocabularies of disparate systems can be pooled together into a single multi-use vocabulary.
• Consequently, use of vocabulary compression techniques represents an enormous opportunity for not only compression (VQ) but non-directed pattern matching.
Typical 2D Voronoi Projection of N Space Data
2D projections of N-space Voronoi Systems: The Voronoi algorithm can similarly be applied to clustered events in N-space. As near-neighbor collisions increase from the completely sparse prototypic case (A) to intermediate density (B) to systems where each cluster contains a significant number of events (C), the overall Voronoi segmentation hull converges upon an optimal N-space manifold. Inclusion of a new test candidate vector in any given cluster is determined solely by the candidate's N-dimensional Pythagorean distance to the current centroid of the cluster. As the cluster increases its number of constituent events, the centroid may wander or drift in N-space, based upon the statistical bias of new events. Clearly, an increased number of events allows for identification of archetypal centroids for each self-defining cluster. This, in turn, allows for more accurate classification of future candidate vectors.
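The assignment rule and the centroid drift described above can be sketched in a few lines. This is a minimal illustration: the `Cluster` class, the running-mean update, and the two-dimensional toy data are assumptions, since the slide only states that membership is decided by Euclidean distance to the current centroid.

```python
import math

class Cluster:
    """A cluster summarised by its centroid and the number of events seen."""

    def __init__(self, first_event):
        self.centroid = list(first_event)
        self.count = 1

    def add(self, event):
        """Fold a new event into the running mean, so the centroid can drift."""
        self.count += 1
        self.centroid = [
            c + (x - c) / self.count for c, x in zip(self.centroid, event)
        ]

def nearest_cluster(clusters, candidate):
    """Index of the cluster whose centroid is closest in Euclidean distance."""
    return min(
        range(len(clusters)),
        key=lambda i: math.dist(clusters[i].centroid, candidate),
    )

clusters = [Cluster([0.0, 0.0]), Cluster([10.0, 10.0])]
candidate = [2.0, 0.0]
assigned = nearest_cluster(clusters, candidate)
clusters[assigned].add(candidate)   # the centroid of cluster 0 moves to [1.0, 0.0]
```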
Typical Voronoi Function Convergence on the edge of Complexity
[Plot: Convergence with increasing Vocabulary Size]
Hypothesized Pathology uses of Region of Interest based Query
• Local feature-based differential diagnosis generation
• Assembly of an “album” of similar prior archival cases (with associated diagnosis) based upon current ROI
How does this new approach differ from traditional image analysis?
• Conventional Image Analysis
  – Algorithms are custom designed for a narrow recognition task
  – Often requires customization with expert programming
  – Low tolerance to variability in source format
• ROI-based Query and Classification
  – General matching algorithm suitable for all tissue morphologies
  – No end-user customization
  – Designed to improve with increased pool of source imagery (self-training)
Derivative Technology: Image-Based Query-by-Example
• New class of database
• User selects a query by generating an image-based ROI (region of interest)
• ROI is vectorized for comparison with the highly compressed vocabulary library.
• Similar images (with associated known diagnoses) are returned as a thumbnail gallery.
• A differential diagnosis tool is implicitly enabled
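The query-by-example flow in the list above can be sketched as follows. Everything in this example is hypothetical: the archive contents, the diagnoses, the serial numbers, and the overlap-count similarity score are stand-ins chosen for illustration, not the matching method used in the system described.

```python
from collections import Counter

# Hypothetical archive: image id -> (diagnosis, vocabulary serial numbers in the image).
ARCHIVE = {
    "img-A": ("tubular adenoma", [3, 3, 7, 9, 12]),
    "img-B": ("normal colon", [1, 2, 2, 5]),
    "img-C": ("tubular adenoma", [3, 7, 7, 12, 40]),
}

def similarity(roi_codes, image_codes):
    """Count the serial numbers shared by the ROI and the image (multiset overlap)."""
    overlap = Counter(roi_codes) & Counter(image_codes)
    return sum(overlap.values())

def query_by_example(roi_codes, archive, top_k=2):
    """Rank archived images by shared vocabulary entries; return the best matches."""
    scored = [
        (similarity(roi_codes, codes), image_id, diagnosis)
        for image_id, (diagnosis, codes) in archive.items()
    ]
    scored = [entry for entry in scored if entry[0] > 0]
    scored.sort(key=lambda entry: (-entry[0], entry[1]))
    return [(image_id, diagnosis, score) for score, image_id, diagnosis in scored[:top_k]]

# The ROI has already been vectorized and encoded as vocabulary serial numbers.
results = query_by_example([3, 7, 12], ARCHIVE)
differential = Counter(diagnosis for _, diagnosis, _ in results)
```

Tallying the diagnoses attached to the returned images, as in `differential`, is one simple way the gallery could double as a differential diagnosis list.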
Overall Application Data Flow Model
• Obtain wide-field image dataset (conventional or hyperspectral)
• Classify surface area into a fully qualified set of candidate vectors (by V.Q.)
• Re-organize vectors into N-dimensionally clustered aggregates using Voronoi space projection
• Aggregate data as Bayesian likelihood clusters, with associated case-level or field-of-interest-level diagnoses
• Instantiate the above data as an organ-specific vocabulary (BBN)
• Test new regions of interest against established vocabulary clusters
Typical Resultant Voronoi Class System Clusters as basis functions for Bayesian Belief Networks (BBNs)
Results of initial training/vocabulary construction and subsequent vocabulary challenge with 20 new cases

Initial Building
Organ system    Asymptotic Vector Pool
Liver           618000
Colon           863000
Pancreas        742000
Duodenum        817000

Vocabulary Challenge
Field Selection    Diagnostic Concordance
Diagnostic         0.9
Non-diagnostic     0.277777778
[Plot: Cumulative Growth of Vocabulary Classes by Organ System. X axis: Case Number (2 to 20). Y axis: Number of Vectors (0.0 to 3.5e+7). Series: Colon, Liver, Stomach, Esophagus, Pancreas, Small Intestine]
[Plot: Image Matching Speed as a Function of Vocabulary Size. X axis: Number of Vectors within Vocabulary (0 to 7e+6). Y axis ticks: 0.0 to 0.5]
Summary
• Vector Quantization techniques hold the promise to realize a general-utility differential diagnosis and image-based query tool
• Significant work remains with organ-specific adjudication of constitutive vectors
• Pilot data suggest strong correlation between morphology and gene expression data.
Acknowledgements
• Ronald Tompkins MD, ScD; MGH
• Mehmet Toner, PhD; MGH
• Charles Pierce
• Anastasios Markas, PhD (Atmel Corporation)