Description and retrieval of medical visual information based on language modelling
1
Description and retrieval of medical visual information based on language modelling
Antonio Foncubierta-Rodríguez
Table of contents
Motivation and introduction
Technical contributions
Experiments
Concluding remarks
2
Evolution of medical images
• 1895: Conrad Röntgen discovers X–rays
• Approximately 100 years later: anatomical, functional, motion
• Any aspect can be visualized and quantified
Imaging modalities:
• Microscopy
• Visible light
• Magnetic resonance
• X–rays
• Nuclear imaging
• Ultrasound
5
Use of medical images
Geneva University Hospitals during 2012:
• X–rays: 30,645 CT exams
• Magnetic resonance: 12,819 MRI exams
• Nuclear imaging: 1,426 PET exams
30% of world storage capacity*
* estimation
6
Dimensions of medical images
• 2D, e.g. dermatography, radiography, angiography.
• 2D + time, e.g. echography, endoscopy.
• 3D, e.g. CT, MRI, PET.
• 3D + time, e.g. functional MRI.
• 3D + other, e.g. Dual Energy CT.
7
Computer Aided Tools
• Multimodal information
• Partly annotated
• Multidimensional
How to make sense of it?
• CAD (computer-aided diagnosis)
• CBIR (content-based image retrieval)
9
Visual features
High-dimensional approaches:
• Shape description
• Point–based
• Surface–based
• Topology–based
• Full–support description
• Geometry–based
• Spectral–based
• Statistical & stochastic methods
• Video-specific methods
Low-dimensional approaches:
• Spin images
• Silhouettes and depth images
• Slice & frame analysis
10
Visual similarity
\[ I_i = \log \frac{1}{P_i} \]
• Information:
  • Specific definition
  • Low level features
• Similarity:
  • General definition
  • Higher level concepts (semantic gap)
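As a small numeric illustration of the information measure on this slide, the sketch below evaluates I = log(1/P) for a frequent and a rare event. The base-2 logarithm and the two probabilities are assumptions made for the example; the slide does not specify either.

```python
import math

def self_information(p, base=2.0):
    """Return I = log(1 / p) for an event of probability p, with 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return math.log(1.0 / p, base)

# Hypothetical probabilities: a frequent and a rare low-level feature.
common = self_information(0.5)
rare = self_information(0.125)
```

Under these assumptions the rarer feature carries more information, which is why the measure is specific but tied to low-level statistics.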
11
Bag of visual words
• BoVW aims at shortening the semantic gap
• Consists of:
  1. Partition an n–dimensional feature space into K disjoint regions
  2. Measure features at m sampling points of an image
  3. Assign each sample to one of the K regions
  4. K–bin histogram is the image descriptor
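The four steps above can be sketched as follows. This is a minimal illustration and not the implementation used in the talk: it assumes the partition is given by K centroids (for example from k-means, as in the experiments) and that each sample goes to its nearest centroid. The function name and the toy data are hypothetical.

```python
import numpy as np

def bovw_histogram(features, vocabulary):
    """Describe one image by a K-bin histogram of visual words.

    features:   (m, n) array, one n-dimensional feature per sampling point.
    vocabulary: (K, n) array, one centroid per region of the feature space.
    """
    features = np.asarray(features, dtype=float)
    vocabulary = np.asarray(vocabulary, dtype=float)
    # Squared Euclidean distance from every sample to every centroid: (m, K).
    d2 = ((features[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # step 3: nearest region for each sample
    return np.bincount(words, minlength=len(vocabulary))  # step 4: histogram

# Toy data (hypothetical): K = 3 words in a 2-D feature space, m = 5 samples.
vocabulary = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
features = np.array([[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [3.9, 4.2], [0.0, 0.2]])
hist = bovw_histogram(features, vocabulary)
```

The histogram length depends only on K, so images with different numbers of sampling points become comparable.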
12
Scientific contributions
Feature extraction and modelling using BoVW:
• Multiscale texture descriptors
• Multiscale analysis of ROIs
• Optimal vocabulary size
• Optimal bag length
• Optimal vocabularies in DECT
• Vocabulary pruning
• Language modelling
• Ground truth generation
14
Section outline
Motivation and introduction
Technical contributions
  Multi–scale texture description
  A visual grammar
  ROI detector
Experiments
Concluding remarks
15
Multidimensional description
• 3D models
  • External structure
  • Shape analysis
  • Deformation quantification
• Volumetric images
  • Internal structure
  • Pattern analysis
  • Early stage detection
17
Multi–scale texture description
Texture: the feel, appearance or consistency of a surface or a substance.
— Oxford Dictionaries
Texture contains important information about the structural arrangement of surfaces and their relationship to the surrounding environment.
— Haralick et al.
18
Wavelet analysis
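\[ \psi_{s,\tau}(t) = \frac{1}{\sqrt{s}} \, \psi\!\left( \frac{t - \tau}{s} \right) \]
\[ \Psi_{s,\tau}(\omega) = \frac{1}{\sqrt{s}} \, |s| \, \Psi(s\omega) \, e^{-j\omega\tau} \]
• ψ(t) must be zero mean
• Ψ(ω) is a bandpass filter
• Finite set of scale parameters s
• Scaling function ϕ(t) used to cover the low frequencies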
19
Wavelet analysis: filterbanks
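[Figure: magnitude responses |Ψ_s(ω)| of the filterbank along the frequency axis ω for s = 1, s = 2 and s = 4, with bandwidths B and B/2 marked.]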
20
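Isotropic wavelet analysis
• Gaussian–based functions to analyze isotropic image texture
• Difference of Gaussians is an approximation to Laplacian of Gaussians (Mexican Hat)
Difference of Gaussians
\[ g_\sigma(\mathbf{x}) = \frac{1}{\sigma_x \sigma_y \sigma_z \sqrt{(2\pi)^3}} \exp\left( -\left( \frac{(x\delta_x)^2}{2\sigma_x^2} + \frac{(y\delta_y)^2}{2\sigma_y^2} + \frac{(z\delta_z)^2}{2\sigma_z^2} \right) \right) \]
\[ \psi_j(\mathbf{x}) = g_{\sigma_1}(\mathbf{x}) - g_{\sigma_2}(\mathbf{x}), \qquad \sigma_2 = 1.6\,\sigma_1 \]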
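A minimal sketch of one 3D difference-of-Gaussians kernel follows. It makes several assumptions that are not in the slides: the same σ is used on all three axes, the voxel spacing (the δ terms) defaults to 1, and each sampled Gaussian is normalised to unit sum instead of using the analytic constant. The function names and the 21×21×21 grid are hypothetical.

```python
import numpy as np

def gaussian_3d(shape, sigma, spacing=(1.0, 1.0, 1.0)):
    """Sampled isotropic 3D Gaussian, normalised so that its samples sum to 1."""
    # Centred coordinates in physical units (index offset times voxel spacing).
    axes = [(np.arange(n) - (n - 1) / 2.0) * d for n, d in zip(shape, spacing)]
    x, y, z = np.meshgrid(*axes, indexing="ij")
    g = np.exp(-(x**2 + y**2 + z**2) / (2.0 * sigma**2))
    return g / g.sum()

def dog_3d(shape, sigma1, ratio=1.6, spacing=(1.0, 1.0, 1.0)):
    """Difference of Gaussians g_sigma1 - g_sigma2 with sigma2 = ratio * sigma1."""
    return gaussian_3d(shape, sigma1, spacing) - gaussian_3d(shape, ratio * sigma1, spacing)

kernel = dog_3d((21, 21, 21), sigma1=2.0)
```

Because both Gaussians are normalised to the same sum, the kernel is zero mean, which is the wavelet admissibility condition listed on the wavelet analysis slide.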
21
Riesz transform
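• Multidimensional extension of the Hilbert transform
• Steerable
Nth order 3D Riesz transform
\[ \widehat{\mathcal{R}^{(n_1,n_2,n_3)} f}(\boldsymbol{\omega}) = \sqrt{\frac{n_1 + n_2 + n_3}{n_1!\, n_2!\, n_3!}} \; \frac{(-j\omega_1)^{n_1} (-j\omega_2)^{n_2} (-j\omega_3)^{n_3}}{\|\boldsymbol{\omega}\|^{n_1+n_2+n_3}} \; \hat{f}(\boldsymbol{\omega}) \]
for all combinations of (n_1, n_2, n_3) with n_1 + n_2 + n_3 = N and n_1, n_2, n_3 ∈ ℕ. This yields \(\binom{N+2}{2}\) templates \(\mathcal{R}^{(n_1,n_2,n_3)}\).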
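The template count can be checked by enumerating the index triplets directly. The sketch below only lists the (n1, n2, n3) combinations; it does not compute the filters themselves, and the function name is hypothetical.

```python
from math import comb

def riesz_indices(order):
    """All (n1, n2, n3) of non-negative integers with n1 + n2 + n3 = order."""
    return [(n1, n2, order - n1 - n2)
            for n1 in range(order + 1)
            for n2 in range(order + 1 - n1)]

templates = riesz_indices(2)        # second-order transform, N = 2
expected = comb(2 + 2, 2)           # binomial(N + 2, 2)
```

For N = 2 this gives six templates, the order used later in the 3D organ identification experiment.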
22
Riesz filterbanks
• Multiscale
• Steerable bandpass filters
• Fourier domain
23
Beyond bag of visual words
• Widely used
• Strong performance variation
• Clustering:
  • Large clusters, small vocabularies
  • Small clusters, large vocabularies
Language modelling of BoVW:
• Vocabulary size
• Meaning
• Word to word relations
25
From words to grammar
Grammar: the whole system and structure of a language or of languages in general, usually taken as consisting of syntax and morphology (including inflections) and sometimes also phonology and semantics.
— Oxford Dictionaries
26
From words to grammar
[Figure: scatter plot of points marked with x.]
Visual Grammar:
• Meaning
• Synonymy
• Polysemy
27
Visual topics
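PLSA–based definition
A visual topic is an unobserved or latent variable \(z \in Z = \{z_1, \ldots, z_{N_Z}\}\) such that the probability of observing the word \(w_n\) in the visual instance \(I_i\) is
\[ P(w_n, I_i) = \sum_{j=1}^{N_Z} P(w_n \mid z_j)\, P(z_j \mid I_i). \]
[Diagram: image → topics, weighted by P(z_j | I_i); topics → visual words, weighted by P(w_n | z_j), which form the matrix W of size N_W × N_Z.]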
28
The word-topic matrix
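\[ W_{N_W \times N_Z} = \begin{pmatrix} P(w_1|z_1) & \cdots & P(w_1|z_{N_Z}) \\ P(w_2|z_1) & \cdots & P(w_2|z_{N_Z}) \\ \vdots & \ddots & \vdots \\ P(w_{N_W}|z_1) & \cdots & P(w_{N_W}|z_{N_Z}) \end{pmatrix} \rightarrow \begin{pmatrix} t_{1,1} & \cdots & t_{1,N_Z} \\ t_{2,1} & \cdots & t_{2,N_Z} \\ \vdots & \ddots & \vdots \\ t_{N_W,1} & \cdots & t_{N_W,N_Z} \end{pmatrix} \]
• Rows: relevant topics for a word
• Columns: relevant words for a topic
• Use the ratio of words as a topic–based significance t_{n,j}:
  • t_{n,j} = 1 → the most significant word for the topic
  • t_{n,j} = 1/N_W → the least significant word for the topic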
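One way to obtain significance values with the two endpoints stated above is to rank the words inside each topic. The sketch below is an assumption, since the slides do not give the exact formula: it sets t_{n,j} to the fraction of words whose probability under topic z_j does not exceed that of w_n, and it ignores ties. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def topic_significance(W):
    """Rank-based significance: 1 for the top word of a topic, 1/N_W for the last.

    W: (N_W, N_Z) array with W[n, j] = P(w_n | z_j).
    """
    W = np.asarray(W, dtype=float)
    n_words = W.shape[0]
    # Double argsort gives, per column, the 0-based rank of each word
    # (0 = lowest probability within that topic); +1 makes ranks 1-based.
    ranks = W.argsort(axis=0).argsort(axis=0) + 1
    return ranks / n_words

# Toy word-topic matrix (hypothetical): 3 words, 2 topics.
W = np.array([[0.5, 0.1],
              [0.3, 0.2],
              [0.2, 0.7]])
T = topic_significance(W)
```

Each column of the result contains the values 1/N_W, 2/N_W, ..., 1, so significance is comparable across topics regardless of how peaked each topic distribution is.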
29
Visual meaningfulness
Definition
The visual meaningfulness of a visual word \(w_n\) is its maximum topic–based significance level:
\[ m_n = \begin{cases} \max_j t_{n,j} & \text{if } \max_j t_{n,j} \ge T_{\text{meaning}} \\ 0 & \text{otherwise} \end{cases} \]
• Words below the meaningfulness threshold can be truncated.
30
Meaningfulness transformation
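Definition
\[ h = \left( n(w_1), n(w_2), \ldots, n(w_{N_W}) \right)^T \]
\[ M = \begin{pmatrix} m_1 & 0 & \cdots & 0 \\ 0 & m_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & m_{N_W} \end{pmatrix} \]
\[ h^M = M h, \qquad n(w_i^M) = m_i \cdot n(w_i) \]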
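The meaningfulness transformation can be sketched in a few lines, assuming a significance matrix with entries t_{n,j} is already available. The matrix values, the word counts and the threshold of 0.75 below are hypothetical.

```python
import numpy as np

def meaningfulness(T, threshold):
    """m_n = max_j t_{n,j} if that maximum reaches the threshold, else 0."""
    best = np.asarray(T, dtype=float).max(axis=1)
    return np.where(best >= threshold, best, 0.0)

T = np.array([[1.0, 0.25],
              [0.5, 0.75],
              [0.25, 0.5]])    # hypothetical significance values t_{n,j}
h = np.array([4.0, 2.0, 6.0])  # hypothetical word counts n(w_i)

m = meaningfulness(T, threshold=0.75)
h_M = np.diag(m) @ h           # same result as the element-wise product m * h
```

Words whose weight is zero contribute nothing to the transformed histogram, so their bins can be dropped, which is the vocabulary pruning effect.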
31
Word to word relations
Example
• A single class might have several visual appearances
• Several classes might partially share the visual appearance
• Two visual words with the same meaning belong to different histogram bins and cannot be compared
• Identifying synonymy allows these words to be compared
[Figure: example with a bimodal class and a partially shared appearance, covered by words 1 to 5.]
32
Synonymy graphs
• Word 3 is partially linked to words 1 and 2
• Words 4 and 5 are also linked
[Figure: synonymy graph over words 1 to 5.]
33
Visual synonymy
Definition
A pair of visual words w_n, w_m can be considered synonyms if the following three conditions are met:
1. There is at least one visual topic z_j to which both w_n and w_m belong.
2. w_n and w_m have a similar contextual distribution with the rest of the words.
3. w_n and w_m have a complementary distribution in the collection.
34
Synonymy value
Definition
The synonymy value of two words w_n, w_m is the maximum significance value for which both words are significant for the same visual topic. It is computed from the significance values t_{n,j}, with n = 1, …, N_W and j = 1, …, N_Z:
\[ \sigma_{nm} = \sigma_{mn} = \max_j \left\{ \min\left( t_{n,j}, t_{m,j} \right) \right\} \]
35
Synonymy transformation
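Definition
\[ S = \begin{pmatrix} 1 & s_{12} & \cdots & s_{1N_W} \\ s_{21} & 1 & \cdots & s_{2N_W} \\ \vdots & \vdots & \ddots & \vdots \\ s_{N_W1} & s_{N_W2} & \cdots & 1 \end{pmatrix}, \qquad s_{ij} = s_{ji} = \begin{cases} 1 & \text{if } i = j \\ \sigma_{ij} & \text{if } w_i, w_j \text{ are synonyms} \\ 0 & \text{otherwise} \end{cases} \]
Transformed histogram:
\[ h^S = S h; \qquad n(w_i^S) = n(w_i) + \sum_{j \neq i} s_{ij}\, n(w_j) \]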
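The synonymy value and the resulting transformation can be sketched as below. One simplification is an assumption of this example and not the talk's method: instead of testing the three synonymy conditions, two words are treated as synonyms when their synonymy value reaches a hypothetical cut-off `min_value`. The matrix values and counts are also hypothetical.

```python
import numpy as np

def synonymy_matrix(T, min_value):
    """Build S from significance values T of shape (N_W, N_Z)."""
    T = np.asarray(T, dtype=float)
    # sigma[n, m] = max_j min(t_{n,j}, t_{m,j}), symmetric by construction.
    sigma = np.minimum(T[:, None, :], T[None, :, :]).max(axis=2)
    # Assumed synonym test: keep only pairs whose value reaches min_value.
    S = np.where(sigma >= min_value, sigma, 0.0)
    np.fill_diagonal(S, 1.0)
    return S

T = np.array([[1.0, 0.25],
              [0.75, 0.5],
              [0.25, 1.0]])    # hypothetical significance values
h = np.array([4.0, 2.0, 6.0])  # hypothetical word counts

S = synonymy_matrix(T, min_value=0.75)
h_S = S @ h
```

In this toy case words 1 and 2 share part of their counts, while word 3 keeps its original bin value, so histograms that use different but synonymous words become comparable bin to bin.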
36
Word ambiguity and dimensionality
• Some visual words are sources of ambiguity if they relate to various appearances
• Their presence in the histogram is not discriminative
• Possible solution: identify polysemy and reduce their weight
[Figure: topic A and topic B.]
37
Visual polysemy
Definition
A visual word w_n is polysemic in the strict sense if all the following conditions are met:
1. There are at least two visual topics z_j, z_k to which w_n belongs (wide sense polysemy).
2. There is a visual word w_m, which is a synonym of w_n and belongs to the topic z_j.
3. There is a visual word w_l, which is a synonym of w_n and belongs to the topic z_k.
4. w_m, w_l are not synonyms.
38
Polysemy threshold
Definition
The polysemy threshold of a visual word w_n, \(T^n_{\text{polysemy}}\), is the largest value for which there are at least two topics where the word is significant above the threshold. In terms of the significance values t_{n,j} of the word:
\[ \left| \left\{ j \in \{1, \ldots, N_Z\} : t_{n,j} \ge T^n_{\text{polysemy}} \right\} \right| \ge 2 \]
39
Polysemy transformation
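Definition
\[ P = \begin{pmatrix} p_1 & 0 & \cdots & 0 \\ 0 & p_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & p_{N_W} \end{pmatrix}; \qquad p_i = 1 - T^i_{\text{polysemy}} \]
Transformed histogram:
\[ h^P = P h; \qquad n(w_i^P) = p_i \cdot n(w_i) \]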
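Read literally, the largest threshold that at least two topics still reach is the second-largest significance value of the word, which makes the polysemy weights easy to compute. The sketch below follows that reading and assumes at least two topics; the function name, matrix values and counts are hypothetical.

```python
import numpy as np

def polysemy_weights(T):
    """p_n = 1 - T_polysemy^n for significance values T of shape (N_W, N_Z >= 2)."""
    T = np.asarray(T, dtype=float)
    # Sorting each row in ascending order puts the second-largest value at -2.
    thresholds = np.sort(T, axis=1)[:, -2]
    return 1.0 - thresholds

T = np.array([[1.0, 0.25, 0.5],
              [0.75, 0.75, 0.25]])  # hypothetical significance values
h = np.array([4.0, 8.0])            # hypothetical word counts

p = polysemy_weights(T)
h_P = p * h                         # same as np.diag(p) @ h
```

The second word is highly significant for two topics, so it is considered more ambiguous and its count is reduced more strongly.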
40
Grammatical similarity
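Visual Grammar:
• Meaning: vocabulary pruning
• Synonymy: bin to bin weighting
• Polysemy: vocabulary weighting
\[ \mathrm{sim}_{\mathrm{gram}}(I_i, I_j) = \frac{(S \cdot P \cdot M \cdot h_i)^T \cdot (S \cdot P \cdot M \cdot h_j)}{\| S \cdot P \cdot M \cdot h_i \| \cdot \| S \cdot P \cdot M \cdot h_j \|} \]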
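The grammatical similarity is a cosine similarity between histograms after the three transformations. A minimal sketch follows, assuming the matrices S, P and M have already been built; the histograms and the pruning matrix in the usage lines are hypothetical, and returning 0 for an all-zero transformed histogram is a choice made for this example.

```python
import numpy as np

def grammatical_similarity(h_i, h_j, S, P, M):
    """Cosine similarity between S P M h_i and S P M h_j."""
    A = S @ P @ M
    u, v = A @ h_i, A @ h_j
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    # Assumed convention: similarity 0 when a transformed histogram is empty.
    return float(u @ v / denom) if denom > 0 else 0.0

identity = np.eye(3)
h_a = np.array([1.0, 0.0, 1.0])
h_b = np.array([1.0, 1.0, 0.0])

# With identity matrices the measure reduces to the plain cosine similarity.
baseline = grammatical_similarity(h_a, h_b, identity, identity, identity)
# Hypothetical pruning: the second word is judged not meaningful.
M = np.diag([1.0, 0.0, 1.0])
pruned = grammatical_similarity(h_a, h_b, identity, identity, M)
```

Pruning the word that only one image uses raises the similarity in this toy case, which illustrates how the grammar can change rankings without enlarging the descriptor.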
41
Section outline
Motivation and introduction
Technical contributions
  Multi–scale texture description
  A visual grammar
  ROI detector
Experiments
Concluding remarks
42
Local analysis
• Medical images contain large amounts of information
• Abnormalities and clinically relevant patterns occur only in reduced regions of interest
• Local context description:
  • Dense sampling
  • Keypoint–based analysis
43
Geodesic detection of regional extrema
1. Multi–scale difference of Gaussians relates to saliency
2. Use geodesic operations to obtain regional extrema:
   2.1 Fill hole / grind peak
   2.2 Subtract from the original DoG image
   2.3 Label each fully connected component larger than a structuring element.
44
Section outline
Motivation and introduction
Technical contributions
Experiments
  Texture analysis of 2D lung CT
  Texture analysis of 3D brain MRI
  Texture analysis of 4D lung CT
  Texture analysis of 4D lung CT using ROIs
  Visual grammar for description of 2D images
  Visual grammar for description of 3D medical images
Concluding remarks
45
Texture analysis of 2D lung CT
• Interstitial lung diseases
• TALISMAN dataset acquired at Geneva University Hospitals
  • 90 HRCT scans from 85 patients
  • 1679 annotated regions
  • 6 classes
    • fibrosis
    • ground glass
    • emphysema
    • micronodules
    • healthy tissue
    • consolidation
47
Texture analysis of 2D lung CT
[Diagram: processing pipeline]
1. Dataset
2. Wavelet transform (4 scales)
3. Energy of coefficients
4. k-means clustering
5. Visual vocabulary: word-1 = (f11, f12, ..., f1N), word-2 = (f21, f22, ..., f2N), ..., word-k = (fk1, fk2, ..., fkN)
6. Histogram of visual words for each region (k-dimensional discrete feature space)
48
Texture analysis of 2D lung CT
• Optimal number of visual words between 100 and 300
• Overall performance decreases with larger vocabularies
Keep only meaningful words
[Figure: P@1 (%) versus number of visual words (0 to 500), with one curve each for consolidation, emphysema, fibrosis, ground glass, healthy, micronodules and the geometric mean.]
49
Texture analysis of 3D brain MRI
• Texture–based segmentation of the cerebellum
• IBSR dataset provided by MGH
  • MRI from 18 adult subjects
  • Manual segmentations
    • Cerebellum cortex
    • Cerebellum white matter
51
Texture analysis of 3D brain MRI
[Diagram: processing pipeline applied to the training set and the testing set]
• Preprocessing: histogram equalization
• Feature extraction: DoG 3D wavelet (5 scales); k-means clustering produces the visual vocabulary; visual word assignment; N×N×N block visual words histogram
• Classification: nearest neighbor search in the feature space of visual words histograms
52
Texture analysis of 3D brain MRI
• Performance improves with larger block sizes:
  • Rest of brain
  • Cerebellum cortex
• Performance does not improve:
  • Cerebellum white matter
Data–driven regions of interest
53
Texture analysis of 4D lung CT
• Pulmonary embolism retrieval
• Dual Energy CT dataset acquired at Geneva University Hospitals
  • 25 patients
  • 4D data
    • x, y, z
    • Energy level of acquisition
  • Ground truth
    • Severity (Qanadli index)
    • Lobe based
55
Texture analysis of 4D lung CT
[Diagram: processing pipeline]
1. Acquisitions at energy level 1, ..., energy level 11
2. Wavelet transform (5 scales)
3. Energy of coefficients: voxel_i = (fi1, fi2, ..., fiN), a 55-dimensional continuous feature space
4. k-means clustering
5. Visual vocabulary: word-1 = (f11, f12, ..., f1N), word-2 = (f21, f22, ..., f2N), ..., word-k = (fk1, fk2, ..., fkN)
6. Visual word assignment: voxel_i = closest word
7. Lung lobes mask (lobes 1 to 5)
8. Histogram of visual words for each lobe (k-dimensional discrete feature space)
56
Texture analysis of 4D lung CT
• Performance improves with 4D data
  • 63% for P@1
  • 62% for P@5
  • 60% for P@10
• Optimal configuration
  • 2 scales, 100–150 words
• Intensive computation
  • High dimensional feature space
Analyze only part of the data: ROIs and meaningful words.

Words  Scales  P@1 (%)  P@5 (%)  P@10 (%)
50     1       55       56       56
100    1       58       55       57
150    1       58       56       56
50     2       62       58       55
100    2       62       62       60
150    2       63       62       60
50     3       58       54       55
100    3       60       59       58
150    3       57       62       58
50     5       45       52       51
100    5       57       52       51
150    5       58       52       52
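The P@k columns are precision values over the first k retrieved items. The slides do not spell out the computation, so the sketch below uses the standard definition as an assumption: the fraction of the top k results that share the query's label. The ranking and labels are hypothetical.

```python
def precision_at_k(retrieved_labels, query_label, k):
    """Fraction of the first k retrieved items that share the query's label."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = retrieved_labels[:k]
    return sum(label == query_label for label in top) / k

# Hypothetical ranked result list for a query labelled "fibrosis".
ranking = ["fibrosis", "healthy", "fibrosis", "fibrosis", "emphysema"]
p1 = precision_at_k(ranking, "fibrosis", 1)
p5 = precision_at_k(ranking, "fibrosis", 5)
```

Averaging such values over all queries gives percentages of the kind reported in the table.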
57
Texture analysis of 4D lung CT using ROIs
• Pulmonary embolism detection
• Improvements over previous approaches
  • ROI–based analysis
  • Optimal combination of energy–based vocabularies
59
Texture analysis of 4D lung CT using ROIs
• Improvements in performance
  • Optimal combination of energy–based vocabularies
  • Multi–scale regions of interest
Finer–grain analysis of significant words and synergies among them

Lobe  DECT  Words  Energy levels     SECT
LR    84 %  5      (50,130)          52 %
LL    84 %  5      (100,140)         48 %
MR    80 %  5      (40,50,130,140)   52 %
UL    76 %  25     (40,70,80,90)     60 %
UR    80 %  25     (90,120)          56 %
60
Visual grammar for description of 2D images
• Classification and retrieval of images from the biomedical literature
• ImageCLEFmed modality classification task
• 1000 training and 1000 test images
• 31 hierarchical categories
62
Visual grammar for description of 2D images
• SIFT–based visual vocabularies
• Varying number of visual topics from 25 to 350 in steps of 25
• Varying meaningfulness threshold from 50% to 100%
63
Visual grammar for description of 2D images
• Statistically significant improvement over state-of-the-art baseline
• Vocabulary reductions without effect on the accuracy
  • Up to 20% of the original vocabulary size
Analyze synonymy relations among multiple vocabularies
[Figure: classification accuracy (%) versus effective number of visual words (0 to 500) for the baseline and the grammar, with the statistical significance threshold marked.]
64
Visual grammar for description of 3D medical images
• Organ identification task
• VISCERAL dataset
• Full body CT scans
  • 15 contrast–enhanced
  • 15 not enhanced
  • 10 anatomical structures, 8 classes
66
Visual grammar for description of 3D medical images
• Riesz–based texture features
  • 3 scales
  • Riesz order 2
• Organ–specific vocabularies
  • 1000 random samples within the organ
  • 20 visual words per organ
• Visual Grammar transformation
67
Visual grammar for description of 3D medical images
• Good results for organ identification
• Reduction of vocabulary size with respect to baseline without visual grammar
[Figure: classification accuracy (%) versus vocabulary size (0 to 200).]
68
Section outline
Motivation and introduction
Technical contributions
Experiments
Concluding remarks
69
Conclusions
Feature extraction and modelling using BoVW:
• Multiscale texture descriptors: evaluation of DoG and Riesz wavelets and BoVW
• Multiscale analysis of ROIs: data–driven ROI for local analysis of lung texture
• Optimal vocabulary size: optimal vocabulary size by learning informative words
• Optimal bag length: BoVW need to cover anatomically meaningful areas
• Optimal vocabularies in DECT: specific vocabularies, combined, provide better insight into patterns
• Vocabulary pruning: removal of words using language modelling does not impact accuracy
• Language modelling: Visual Grammar transformations improve accuracy and reduce descriptor size
70
Shortcomings
• Visual grammar model is slow to train for large vocabularies; synonymy requires further restrictions (sparsity)
• Semantics is covered, but there are other aspects that can still be explored:
  • Variations of a visual word (morphology)
  • Combination rules of words in proximity (syntax)
• Bag of visual words has evolved into VLAD and Fisher Vectors, which in some aspects are more robust.
71
Future work
• Extend the visual grammar evaluation
• Extend the visual grammar to cover various languages
  • Synergies between isotropic and steerable texture descriptors
  • Synergies between text and visual description
  • Synergies between color and texture description
• Extend the language modelling to identify
  • Paradigmatic relations
  • Absence of visual words
72
Questions
73