Description and retrieval of medical visual information based on language modelling


Antonio Foncubierta-Rodríguez

Table of contents

Motivation and introduction

Technical contributions

Experiments

Concluding remarks


Evolution of medical images

• 1895: Wilhelm Conrad Röntgen discovers X–rays
• Approximately 100 years later: anatomical, functional and motion imaging
• Any aspect can be visualized and quantified

Imaging modalities
• Microscopy
• Visible light
• Magnetic resonance
• X–Rays
• Nuclear imaging
• Ultrasound


Use of medical images

Geneva University Hospitals, during 2012:
• X–Rays: 30,645 CT exams
• Magnetic resonance: 12,819 MRI exams
• Nuclear imaging: 1,426 PET exams

Medical images are estimated to occupy 30% of the world's storage capacity.


Dimensions of medical images

• 2D, e.g. dermatography, radiography, angiography
• 2D + time, e.g. echography, endoscopy
• 3D, e.g. CT, MRI, PET
• 3D + time, e.g. functional MRI
• 3D + other, e.g. Dual Energy CT


Computer Aided Tools

• Multimodal information
• Partly annotated
• Multidimensional

How to make sense of it?

• CAD (computer-aided diagnosis)
• CBIR (content-based image retrieval)


Visual features

High dimensional approaches
• Shape description: point–based, surface–based, topology–based
• Full–support description: geometry–based, spectral–based, statistical & stochastic methods
• Video–specific methods

Low dimensional approaches
• Spin images
• Silhouettes and depth images
• Slice & frame analysis


Visual similarity

$I_i = \log \frac{1}{P_i}$

• Information:
  • Specific definition
  • Low level features
• Similarity:
  • General definition
  • Higher level concepts (semantic gap)


Bag of visual words

• BoVW aims at narrowing the semantic gap
• It consists of four steps:
1. Partition an n–dimensional feature space into K disjoint regions
2. Measure features at m sampling points of an image
3. Assign each sample to one of the K regions
4. Use the K–bin histogram as the image descriptor
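The four BoVW steps can be sketched in Python with NumPy. This is a minimal illustration, not the thesis implementation: it assumes plain Lloyd's k-means for the partition (step 1) and Euclidean nearest-centroid assignment (step 3), and the function names are hypothetical.

```python
import numpy as np

def assign_words(samples, vocabulary):
    """Step 3: map each sample to the index of its closest visual word (Euclidean)."""
    distances = np.linalg.norm(samples[:, None, :] - vocabulary[None, :, :], axis=2)
    return distances.argmin(axis=1)

def build_vocabulary(features, k, iterations=20, seed=0):
    """Step 1: partition the feature space into k regions with Lloyd's k-means;
    each centroid acts as one visual word."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iterations):
        labels = assign_words(features, centroids)
        for c in range(k):
            members = features[labels == c]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return centroids

def bovw_histogram(samples, vocabulary):
    """Step 4: the k-bin histogram of word occurrences is the image descriptor."""
    return np.bincount(assign_words(samples, vocabulary), minlength=len(vocabulary))

# Toy 2-D features around three well-separated centres.
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
training = np.concatenate([c + 0.3 * rng.standard_normal((50, 2)) for c in centres])
vocabulary = build_vocabulary(training, k=3)

# Step 2: features measured at m = 10 sampling points of one "image".
image_samples = centres[0] + 0.3 * rng.standard_normal((10, 2))
histogram = bovw_histogram(image_samples, vocabulary)
```

In practice the vocabulary is learned once on a training collection and reused to describe every new image, so only steps 2 to 4 run at query time.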


Scientific contributions

Feature extraction and modelling using BOVW:
• Multiscale texture descriptors
• Multiscale analysis of ROIs
• Optimal vocabulary size
• Optimal bag length
• Optimal vocabularies in DECT
• Vocabulary pruning
• Language modelling
• Ground truth generation


Section outline


Technical contributions
• Multi–scale texture description
• A visual grammar
• ROI detector

Experiments

Concluding remarks


Multidimensional description

• 3D models
  • External structure
  • Shape analysis
  • Deformation quantification
• Volumetric images
  • Internal structure
  • Pattern analysis
  • Early stage detection


Multi–scale texture description

Texture: The feel, appearance or consistency of a surface or a substance.

— Oxford Dictionaries

Texture contains important information about the structural arrangement of surfaces and their relationship to the surrounding environment.

— Haralick et al.


Wavelet analysis


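$\psi_{s,\tau}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-\tau}{s}\right)$

$\Psi_{s,\tau}(\omega) = \frac{1}{\sqrt{s}}\,|s|\,\Psi(s\omega)\,e^{-j\omega\tau}$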

• ψ(t) must have zero mean
• Ψ(ω) is a bandpass filter
• Finite set of scale parameters s
• A scaling function ϕ(t) is used to cover the low frequencies
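The dilation and translation rule can be checked numerically. The sketch below assumes a Mexican hat mother wavelet (the negative second derivative of a Gaussian, chosen here only for illustration) and shows two properties from the slide: every daughter wavelet keeps zero mean, and the 1/√s factor keeps the energy constant across scales.

```python
import numpy as np

def mexican_hat(t):
    """Mother wavelet psi(t): negative second derivative of a Gaussian (unnormalised)."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def daughter_wavelet(t, s, tau):
    """psi_{s,tau}(t) = psi((t - tau) / s) / sqrt(s)."""
    return mexican_hat((t - tau) / s) / np.sqrt(s)

t = np.linspace(-200.0, 200.0, 400_001)
dt = t[1] - t[0]
results = {}
for s in (1.0, 2.0, 4.0):  # finite set of dyadic scales
    w = daughter_wavelet(t, s, tau=3.0)
    integral = w.sum() * dt     # zero mean: close to 0 at every scale
    energy = (w**2).sum() * dt  # constant thanks to the 1/sqrt(s) factor
    results[s] = (integral, energy)
```

For this wavelet the energy equals (3/4)√π at every scale, which is why a finite set of dyadic scales can tile the frequency axis with bandpass filters of comparable weight.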


Wavelet analysis: filterbanks

[Figure: magnitude responses |Ψs(ω)| of the filterbank for s = 1, 2, 4; each doubling of the scale halves the bandwidth (B, B/2, ...)]


Isotropic wavelet analysis
• Gaussian–based functions to analyze isotropic image texture
• The difference of Gaussians is an approximation of the Laplacian of Gaussian (Mexican hat)

Difference of Gaussians

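$g_{\sigma}(\mathbf{x}) = \frac{1}{\sigma_x\sigma_y\sigma_z\sqrt{(2\pi)^3}}\exp\!\left(-\left(\frac{(x\delta_x)^2}{2\sigma_x^2} + \frac{(y\delta_y)^2}{2\sigma_y^2} + \frac{(z\delta_z)^2}{2\sigma_z^2}\right)\right)$

$\psi_j(\mathbf{x}) = g_{\sigma_1}(\mathbf{x}) - g_{\sigma_2}(\mathbf{x}), \qquad \sigma_2 = 1.6\,\sigma_1$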
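A 3D DoG kernel can be built directly from the formula above. In this sketch δx, δy, δz are read as the voxel spacing, and the spacing values, the σ1 value and the truncation at 4 standard deviations are hypothetical choices for illustration. As a wavelet, the kernel should sum to approximately zero.

```python
import numpy as np

def gaussian_3d(radius, sigma, spacing):
    """Sample g_sigma on integer voxel offsets (x, y, z), scaled by the voxel spacing."""
    rx, ry, rz = radius
    x, y, z = np.meshgrid(np.arange(-rx, rx + 1), np.arange(-ry, ry + 1),
                          np.arange(-rz, rz + 1), indexing="ij")
    sx, sy, sz = sigma
    dx, dy, dz = spacing
    norm = 1.0 / (sx * sy * sz * np.sqrt((2.0 * np.pi) ** 3))
    return norm * np.exp(-((x * dx) ** 2 / (2 * sx**2)
                           + (y * dy) ** 2 / (2 * sy**2)
                           + (z * dz) ** 2 / (2 * sz**2)))

def dog_kernel(sigma1, spacing, ratio=1.6, truncate=4.0):
    """psi_j = g_sigma1 - g_sigma2, with sigma2 = ratio * sigma1."""
    sigma2 = tuple(ratio * s for s in sigma1)
    # Kernel radius in voxels, covering the wider Gaussian up to `truncate` standard deviations.
    radius = tuple(int(np.ceil(truncate * s / d)) for s, d in zip(sigma2, spacing))
    return gaussian_3d(radius, sigma1, spacing) - gaussian_3d(radius, sigma2, spacing)

# Hypothetical anisotropic voxels (0.7 x 0.7 x 1.5 mm) and sigma1 = 2 mm in every direction.
kernel = dog_kernel(sigma1=(2.0, 2.0, 2.0), spacing=(0.7, 0.7, 1.5))
centre = tuple(n // 2 for n in kernel.shape)
```

Expressing σ in physical units and converting through the spacing keeps the filter isotropic in the patient's space even when the voxels are not cubic.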


Riesz transform

• Multidimensional extension of the Hilbert transform
• Steerable

Nth order 3D Riesz transform

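$\widehat{\mathcal{R}^{(n_1,n_2,n_3)}f}(\boldsymbol{\omega}) = \sqrt{\frac{(n_1+n_2+n_3)!}{n_1!\,n_2!\,n_3!}}\;\frac{(-j\omega_1)^{n_1}(-j\omega_2)^{n_2}(-j\omega_3)^{n_3}}{\|\boldsymbol{\omega}\|^{n_1+n_2+n_3}}\,\hat{f}(\boldsymbol{\omega})$

for all combinations of $(n_1,n_2,n_3)$ with $n_1+n_2+n_3 = N$ and $n_1, n_2, n_3 \in \mathbb{N}$, which gives $\binom{N+2}{2}$ templates $\mathcal{R}^{(n_1,n_2,n_3)}$.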
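The Riesz templates are pure Fourier-domain multipliers, so they can be sketched with NumPy's FFT. This is an illustration rather than the toolbox used in the thesis, and the function names are hypothetical. It enumerates the $\binom{N+2}{2}$ combinations and relies on the multinomial normalisation, which makes the squared magnitudes of all templates sum to 1 at every non-zero frequency.

```python
import itertools
import math

import numpy as np

def riesz_multipliers(order, shape):
    """Fourier-domain multipliers of the N-th order 3D Riesz transform.

    Returns {(n1, n2, n3): complex array} for every n1 + n2 + n3 = order."""
    w1, w2, w3 = np.meshgrid(*(2.0 * np.pi * np.fft.fftfreq(n) for n in shape), indexing="ij")
    norm = np.sqrt(w1**2 + w2**2 + w3**2)
    norm[0, 0, 0] = 1.0  # avoid 0/0 at DC; the numerator is already 0 there when order >= 1
    multipliers = {}
    for n1, n2 in itertools.product(range(order + 1), repeat=2):
        n3 = order - n1 - n2
        if n3 < 0:
            continue
        coeff = math.sqrt(math.factorial(order)
                          / (math.factorial(n1) * math.factorial(n2) * math.factorial(n3)))
        numerator = (-1j * w1) ** n1 * (-1j * w2) ** n2 * (-1j * w3) ** n3
        multipliers[(n1, n2, n3)] = coeff * numerator / norm**order
    return multipliers

def riesz_transform(volume, order):
    """Filter a real 3D volume with every Riesz template via the FFT (real part kept)."""
    f_hat = np.fft.fftn(volume)
    return {n: np.real(np.fft.ifftn(m * f_hat))
            for n, m in riesz_multipliers(order, volume.shape).items()}

templates = riesz_multipliers(order=2, shape=(16, 16, 16))
```

In a multiscale Riesz filterbank these multipliers are combined with an isotropic bandpass wavelet at each scale; that part is left out here.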


Riesz filterbanks

• Multiscale
• Steerable bandpass filters
• Fourier domain


Beyond bag of visual words

• Widely used
• Strong performance variation
• Clustering:
  • Large clusters, small vocabularies
  • Small clusters, large vocabularies

Language modelling of BOVW: vocabulary size, meaning, word-to-word relations


From words to grammar

Grammar: The whole system and structure of a language or of languages in general, usually taken as consisting of syntax and morphology (including inflections) and sometimes also phonology and semantics.

— Oxford Dictionaries


Visual grammar: meaning, synonymy, polysemy


Visual topics

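PLSA–based definition: A visual topic is an unobserved (latent) variable $z \in Z = \{z_1, \ldots, z_{N_Z}\}$ such that the probability of observing the word $w_n$ in the visual instance $I_i$ is

$P(w_n \mid I_i) = \sum_{j=1}^{N_Z} P(w_n \mid z_j)\,P(z_j \mid I_i).$

[Figure: image → topics through P(zj|Ii) → visual words through P(wn|zj); the P(wn|zj) values form the word–topic matrix W(NW×NZ)]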
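PLSA is commonly fitted with expectation-maximisation. The sketch below shows that procedure under stated assumptions: random initialisation, a fixed number of iterations, and no tempering. The slides do not describe the training details used in the thesis. The columns of the returned P(w|z) form the word–topic matrix W.

```python
import numpy as np

def plsa(counts, n_topics, iterations=100, seed=0):
    """Fit PLSA by expectation-maximisation.

    counts: array (N_W, N_I) holding n(w_n, I_i), the occurrences of each word per image.
    Returns P(w|z) as (N_W, N_Z), P(z|I) as (N_Z, N_I) and the log-likelihood per iteration."""
    rng = np.random.default_rng(seed)
    n_words, n_images = counts.shape
    p_w_z = rng.random((n_words, n_topics))
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)
    p_z_i = rng.random((n_topics, n_images))
    p_z_i /= p_z_i.sum(axis=0, keepdims=True)
    history = []
    for _ in range(iterations):
        # Mixture from the slide: P(w|I) = sum_j P(w|z_j) P(z_j|I).
        p_w_i = p_w_z @ p_z_i
        history.append(float((counts * np.log(p_w_i + 1e-12)).sum()))
        # E-step: posterior P(z | w, I) with shape (N_W, N_Z, N_I).
        joint = p_w_z[:, :, None] * p_z_i[None, :, :]
        posterior = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both factors from the expected counts n(w, I) P(z | w, I).
        expected = counts[:, None, :] * posterior
        p_w_z = expected.sum(axis=2)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + 1e-12
        p_z_i = expected.sum(axis=0)
        p_z_i /= p_z_i.sum(axis=0, keepdims=True) + 1e-12
    return p_w_z, p_z_i, history

counts = np.random.default_rng(3).integers(0, 10, size=(12, 8)).astype(float)
pwz, pzi, hist = plsa(counts, n_topics=3, iterations=50)
```

The dense (N_W, N_Z, N_I) posterior array keeps the code short, but it grows quickly. Large vocabularies or collections would need a loop over images or sparse counts.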


The word-topic matrix

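$W_{N_W \times N_Z} = \begin{pmatrix} P(w_1 \mid z_1) & \cdots & P(w_1 \mid z_{N_Z}) \\ P(w_2 \mid z_1) & \cdots & P(w_2 \mid z_{N_Z}) \\ \vdots & \ddots & \vdots \\ P(w_{N_W} \mid z_1) & \cdots & P(w_{N_W} \mid z_{N_Z}) \end{pmatrix} \rightarrow \begin{pmatrix} t_{1,1} & \cdots & t_{1,N_Z} \\ t_{2,1} & \cdots & t_{2,N_Z} \\ \vdots & \ddots & \vdots \\ t_{N_W,1} & \cdots & t_{N_W,N_Z} \end{pmatrix}$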

• Rows: relevant topics for a word

• Columns: relevant words for a topic

• Use the ratio of words as a topic–based significance $t_{n,j}$:
  • $t_{n,j} = 1$ → the most significant word for the topic
  • $t_{n,j} = 1/N_W$ → the least significant word for the topic
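The slide gives only the endpoints of $t_{n,j}$. One reading consistent with them is a rank ratio: within each topic, words are ranked by $P(w_n \mid z_j)$, and a word's significance is its ascending rank divided by $N_W$. The sketch below implements that hypothetical reading, not a confirmed definition from the thesis.

```python
import numpy as np

def topic_significance(p_w_z):
    """Hypothetical rank-ratio reading of t_{n,j}: within each topic (column),
    the word with the highest P(w|z_j) gets 1 and the lowest gets 1/N_W."""
    n_words = p_w_z.shape[0]
    # argsort of argsort yields each word's ascending rank 0..N_W-1 within its column.
    ranks = np.argsort(np.argsort(p_w_z, axis=0, kind="stable"), axis=0, kind="stable")
    return (ranks + 1) / n_words

W = np.array([[0.50, 0.10],
              [0.30, 0.20],
              [0.15, 0.30],
              [0.05, 0.40]])
T = topic_significance(W)
# Column 0 becomes [1.0, 0.75, 0.5, 0.25]; column 1 becomes [0.25, 0.5, 0.75, 1.0].
```

A rank-based score makes significance comparable across topics whose probability distributions have very different shapes.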


Visual meaningfulness

Definition: The visual meaningfulness of a visual word $w_n$ is its maximum topic–based significance level:

$m_n = \begin{cases} \max_j \{t_{n,j}\} & \text{if } \max_j \{t_{n,j}\} \ge T_{\text{meaning}} \\ 0 & \text{otherwise} \end{cases}$

• Words below the meaningfulness threshold ($m_n = 0$) can be pruned from the vocabulary.


Meaningfulness transformation

Definition

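$\mathbf{h} = \left(n(w_1), n(w_2), \ldots, n(w_{N_W})\right)^T$

$M = \begin{pmatrix} m_1 & 0 & \cdots & 0 \\ 0 & m_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & m_{N_W} \end{pmatrix}$

$\mathbf{h}^M = M\mathbf{h}; \quad n(w_i^M) = m_i \cdot n(w_i)$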
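The meaningfulness weights and the diagonal transform translate directly into NumPy. In this sketch the significance values and the threshold T_meaning = 0.7 are made-up illustrative inputs.

```python
import numpy as np

def meaningfulness(t, threshold):
    """m_n = max_j t_{n,j} if it reaches T_meaning, otherwise 0."""
    best = t.max(axis=1)
    return np.where(best >= threshold, best, 0.0)

def meaningfulness_transform(h, m):
    """h^M = M h with M = diag(m_1, ..., m_{N_W}), i.e. n(w_i^M) = m_i * n(w_i)."""
    return np.diag(m) @ h  # equivalent to the element-wise product m * h

# Illustrative topic-based significance for 4 words and 2 topics.
t = np.array([[1.00, 0.25],
              [0.75, 0.50],
              [0.50, 0.75],
              [0.25, 0.50]])
m = meaningfulness(t, threshold=0.7)  # word 4 never reaches 0.7, so m_4 = 0
h = np.array([3.0, 1.0, 2.0, 5.0])
h_m = meaningfulness_transform(h, m)  # [3.0, 0.75, 1.5, 0.0]
```

Words with $m_n = 0$ contribute nothing after the transform, so their histogram bins can simply be dropped. This is how the vocabulary pruning reduces the descriptor size.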


Word to word relations

Example

• A single class might have several visual appearances
• Several classes might partially share a visual appearance
• Two visual words with the same meaning belong to different histogram bins and cannot be compared
• Identifying synonymy allows these words to be compared

[Figure: a bimodal class and a partially shared appearance, represented by visual words 1 to 5]


Synonymy graphs

• Word 3 is partially linked to words 1 and 2
• Words 4 and 5 are also linked

[Figure: synonymy graph over visual words 1 to 5]


Visual synonymy

Definition: A pair of visual words $w_n, w_m$ can be considered synonyms if the following three conditions are met:
1. There is at least one visual topic $z_j$ to which both $w_n$ and $w_m$ belong.
2. $w_n$ and $w_m$ have a similar contextual distribution with respect to the rest of the words.
3. $w_n$ and $w_m$ have a complementary distribution in the collection.


Synonymy value

Definition: The synonymy value of two words $w_n, w_m$ is the maximum significance value for which both words are significant for the same visual topic:

$\sigma_{nm} = \sigma_{mn} = \max_j \left\{ \min\left(t_{n,j}, t_{m,j}\right) \right\}$
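The synonymy value can be computed for all word pairs at once with NumPy broadcasting. The significance matrix below is an illustrative input, not data from the thesis.

```python
import numpy as np

def synonymy_values(t):
    """sigma_{nm} = max_j min(t_{n,j}, t_{m,j}) for every pair of words (symmetric)."""
    # Broadcast to (N_W, N_W, N_Z): element-wise minimum for each word pair and topic.
    pairwise_min = np.minimum(t[:, None, :], t[None, :, :])
    return pairwise_min.max(axis=2)

t = np.array([[1.00, 0.25],
              [0.75, 0.50],
              [0.50, 0.75],
              [0.25, 1.00]])
sigma = synonymy_values(t)
# sigma[0, 1] = 0.75 (both words highly significant in topic 1); sigma[0, 3] = 0.25.
```

The diagonal of `sigma` holds each word's own maximum significance. It is not used, because the synonymy transformation fixes $s_{ii} = 1$.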


Synonymy transformation

Definition

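$S = \begin{pmatrix} 1 & s_{12} & \cdots & s_{1N_W} \\ s_{21} & 1 & \cdots & s_{2N_W} \\ \vdots & \vdots & \ddots & \vdots \\ s_{N_W1} & s_{N_W2} & \cdots & 1 \end{pmatrix}, \quad s_{ij} = s_{ji} = \begin{cases} 1 & \text{if } i = j \\ \sigma_{ij} & \text{if } w_i, w_j \text{ are synonyms} \\ 0 & \text{otherwise} \end{cases}$

Transformed histogram:

$\mathbf{h}^S = S\mathbf{h}; \quad n(w_i^S) = n(w_i) + \sum_{j \neq i} s_{ij}\, n(w_j)$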
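The synonymy transformation can be sketched as a matrix product. The slides decide which pairs are synonyms through the three conditions of the definition. Here that decision is replaced by a hypothetical stand-in, σ ≥ 0.7, and the σ values themselves are illustrative.

```python
import numpy as np

def synonymy_matrix(sigma, synonyms):
    """S with s_ii = 1, s_ij = sigma_ij for synonym pairs and 0 otherwise.

    synonyms: boolean (N_W, N_W) mask marking which pairs count as synonyms."""
    s = np.where(synonyms, sigma, 0.0)
    np.fill_diagonal(s, 1.0)
    return s

sigma = np.array([[1.00, 0.75, 0.50, 0.25],
                  [0.75, 0.75, 0.50, 0.50],
                  [0.50, 0.50, 0.75, 0.75],
                  [0.25, 0.50, 0.75, 1.00]])
# Hypothetical rule for this illustration only: pairs with sigma >= 0.7 are synonyms.
S = synonymy_matrix(sigma, synonyms=sigma >= 0.7)
h = np.array([3.0, 1.0, 2.0, 5.0])
h_s = S @ h  # n(w_i^S) = n(w_i) + sum_{j != i} s_ij n(w_j)
```

Because S is not diagonal, it is the only grammar transform that moves mass between histogram bins. This is what lets two synonymous words match each other.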


Word ambiguity and dimensionality

• Some visual words are sources of ambiguity when they relate to various appearances
• Their presence in the histogram is not discriminative
• Possible solution: identify polysemic words and reduce their weight

[Figure: topic A and topic B]


Visual polysemy

Definition: A visual word $w_n$ is polysemic in the strict sense if all the following conditions are met:
1. There are at least two visual topics $z_j, z_k$ to which $w_n$ belongs (wide-sense polysemy).
2. There is a visual word $w_m$ which is a synonym of $w_n$ and belongs to the topic $z_j$.
3. There is a visual word $w_l$ which is a synonym of $w_n$ and belongs to the topic $z_k$.
4. $w_m$ and $w_l$ are not synonyms.


Polysemy threshold

Definition: The polysemy threshold of a visual word $w_n$, $T^n_{\text{polysemy}}$, is the largest value such that there are at least two topics for which the word is significant above the threshold:

$\left|\left\{ j \in \{1, \ldots, N_Z\} : t_{n,j} \ge T^n_{\text{polysemy}} \right\}\right| \ge 2$


Polysemy transformation

Definition

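$P = \begin{pmatrix} p_1 & 0 & \cdots & 0 \\ 0 & p_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & p_{N_W} \end{pmatrix}; \quad p_i = 1 - T^i_{\text{polysemy}}$

Transformed histogram:

$\mathbf{h}^P = P\mathbf{h}; \quad n(w_i^P) = p_i \cdot n(w_i)$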
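The largest threshold that still leaves at least two significant topics is simply the second-largest value in the word's row of the significance matrix. That gives a compact sketch of the polysemy weights and the transform. The matrix and histogram are illustrative inputs.

```python
import numpy as np

def polysemy_thresholds(t):
    """T^n_polysemy: largest T such that at least two topics have t_{n,j} >= T,
    i.e. the second-largest significance in row n (requires N_Z >= 2)."""
    return np.sort(t, axis=1)[:, -2]

def polysemy_transform(h, t):
    """h^P = P h with P = diag(p_i) and p_i = 1 - T^i_polysemy."""
    p = 1.0 - polysemy_thresholds(t)
    return p * h  # same as np.diag(p) @ h

t = np.array([[1.00, 0.25],
              [0.75, 0.50],
              [0.50, 0.75],
              [0.25, 1.00]])
h = np.array([3.0, 1.0, 2.0, 5.0])
h_p = polysemy_transform(h, t)
```

Words that are strongly significant for two topics (rows 2 and 3 here) get the lowest weights, while words tied to a single topic keep most of their count.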


Grammatical similarity

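Visual grammar:
• Meaning → vocabulary pruning
• Synonymy → bin-to-bin weighting
• Polysemy → vocabulary weighting

$\mathrm{sim}_{\text{gram}}(I_i, I_j) = \frac{(S \cdot P \cdot M \cdot \mathbf{h}_i)^T \cdot (S \cdot P \cdot M \cdot \mathbf{h}_j)}{\|S \cdot P \cdot M \cdot \mathbf{h}_i\| \cdot \|S \cdot P \cdot M \cdot \mathbf{h}_j\|}$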
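The grammatical similarity is a cosine similarity computed after the combined transform S·P·M. In the sketch below, the matrices and histograms are small hand-made examples, not values learned from data.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def grammatical_similarity(h_i, h_j, S, P, M):
    """Cosine similarity between histograms after the visual grammar transform S·P·M."""
    G = S @ P @ M
    return cosine(G @ h_i, G @ h_j)

M = np.diag([1.0, 0.75, 0.75, 0.0])   # meaningfulness: word 4 pruned
P = np.diag([0.75, 0.5, 0.5, 0.75])   # polysemy weights p_i
S = np.array([[1.0, 0.75, 0.0, 0.0],  # synonymy: words 1-2 and words 3-4 related
              [0.75, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.75],
              [0.0, 0.0, 0.75, 1.0]])
h1 = np.array([4.0, 0.0, 1.0, 2.0])
h2 = np.array([0.0, 4.0, 1.0, 7.0])
plain = cosine(h1, h2)
gram = grammatical_similarity(h1, h2, S, P, M)
```

Here h1 and h2 put their mass on words 1 and 2 respectively. The plain cosine treats these words as unrelated, whereas the synonymy term in S lets them match, so `gram` comes out clearly higher than `plain`.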


Local analysis

• Medical images contain large amounts of information
• Abnormalities and clinically relevant patterns occur only in small regions of interest
• Local context description:
  • Dense sampling
  • Keypoint–based analysis


Geodesic detection of regional extrema

1. The multi–scale difference of Gaussians is related to saliency
2. Use geodesic operations to obtain regional extrema:
2.1 Fill holes / grind peaks
2.2 Subtract the result from the original DoG image
2.3 Label each fully connected component larger than a structuring element
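The geodesic steps can be sketched in 2D with plain NumPy, treating the input as a precomputed DoG response. Several parts of this sketch are assumptions: it uses a 3×3 structuring element and 8-connectivity, "grinding peaks" is implemented as morphological reconstruction by dilation of (DoG − h) under the DoG with a hypothetical peak height h, and a minimum component size stands in for "larger than a structuring element". Only maxima are handled; minima would use the dual fill-holes operation. The thesis works on multi-scale 3D volumes.

```python
from collections import deque

import numpy as np

def dilate3x3(img):
    """Grey-level dilation with a 3x3 square structuring element (edges replicated)."""
    padded = np.pad(img, 1, mode="edge")
    rows, cols = img.shape
    out = img.copy()
    for dy in range(3):
        for dx in range(3):
            out = np.maximum(out, padded[dy:dy + rows, dx:dx + cols])
    return out

def reconstruct_by_dilation(marker, mask):
    """Geodesic reconstruction: dilate the marker under the mask until it is stable."""
    current = np.minimum(marker, mask)
    while True:
        nxt = np.minimum(dilate3x3(current), mask)
        if np.array_equal(nxt, current):
            return current
        current = nxt

def regional_maxima(dog, h, min_size):
    """Grind peaks by h, subtract from the DoG response, label large 8-connected domes."""
    domes = dog - reconstruct_by_dilation(dog - h, dog)
    binary = domes > 0
    labels = np.zeros(dog.shape, dtype=int)
    count = 0
    for y, x in zip(*np.nonzero(binary)):
        if labels[y, x] != 0:
            continue
        component, queue = [(y, x)], deque([(y, x)])
        labels[y, x] = -1  # provisional mark while flooding this component
        while queue:
            cy, cx = queue.popleft()
            for ny in range(cy - 1, cy + 2):
                for nx in range(cx - 1, cx + 2):
                    if (0 <= ny < dog.shape[0] and 0 <= nx < dog.shape[1]
                            and binary[ny, nx] and labels[ny, nx] == 0):
                        labels[ny, nx] = -1
                        component.append((ny, nx))
                        queue.append((ny, nx))
        if len(component) >= min_size:  # keep only sufficiently large components
            count += 1
            for py, px in component:
                labels[py, px] = count
    labels[labels < 0] = 0  # discard components that were too small
    return labels, count

# Synthetic "DoG response" with two blobs.
yy, xx = np.mgrid[0:40, 0:40]
response = (np.exp(-((yy - 10) ** 2 + (xx - 10) ** 2) / 8.0)
            + np.exp(-((yy - 30) ** 2 + (xx - 28) ** 2) / 8.0))
labels, count = regional_maxima(response, h=0.5, min_size=5)
```

For real volumes, a tested library routine such as a morphological reconstruction function from an image-processing package would be preferable to this loop-based illustration.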


Section outline



Experiments
• Texture analysis of 2D lung CT
• Texture analysis of 3D brain MRI
• Texture analysis of 4D lung CT
• Texture analysis of 4D lung CT using ROIs
• Visual grammar for description of 2D images
• Visual grammar for description of 3D medical images

Concluding remarks


Texture analysis of 2D lung CT

• Interstitial lung diseases
• TALISMAN dataset acquired at Geneva University Hospitals:
  • 90 HRCT scans from 85 patients
  • 1679 annotated regions
  • 6 classes: fibrosis, ground glass, emphysema, micronodules, healthy tissue, consolidation



Pipeline:
• Dataset → wavelet transform (4 scales) → energy of coefficients
• k-means clustering builds the visual vocabulary: word-1 = (f11, f12, ..., f1N), word-2 = (f21, f22, ..., f2N), ..., word-k = (fk1, fk2, ..., fkN)
• Histogram of visual words for each region (k-dimensional discrete feature space)



• Optimal number of visual words between 100 and 300
• Overall performance decreases with larger vocabularies

Keep only meaningful words.

[Figure: P@1 (%) versus number of visual words (0 to 500) for consolidation, emphysema, fibrosis, ground glass, healthy tissue, micronodules and their geometric mean]


Texture analysis of 3D brain MRI

• Texture–based segmentation of the cerebellum
• IBSR dataset provided by MGH:
  • MRI from 18 adult subjects
  • Manual segmentations:
    • Cerebellum cortex
    • Cerebellum white matter


Texture analysis of 3D brain MRI

Pipeline (training and testing sets):
• Preprocessing: histogram equalization
• Feature extraction: 3D DoG wavelet with 5 scales; k-means clustering of the training features builds the visual vocabulary; each N×N×N block is described by its histogram of visual words
• Classification: visual word assignment for the testing set and nearest neighbor search among the training histograms


Texture analysis of 3D brain MRI

• Performance improves with larger block sizes for:
  • Rest of brain
  • Cerebellum cortex
• Performance does not improve for:
  • Cerebellum white matter

Data–driven regions of interest



Texture analysis of 4D lung CT
• Pulmonary embolism retrieval
• Dual Energy CT dataset acquired at Geneva University Hospitals
• 25 patients
• 4D data:
  • x, y, z
  • Energy level of acquisition
• Ground truth:
  • Severity (Qanadli index)
  • Lobe based



Pipeline:
• For each of the 11 energy levels: wavelet transform (5 scales) and energy of the coefficients, so each voxel becomes a vector voxel_i = (f_i1, f_i2, ..., f_iN) in a 55-dimensional continuous feature space
• k-means clustering builds the visual vocabulary: word-1 = (f11, f12, ..., f1N), word-2 = (f21, f22, ..., f2N), ..., word-k = (fk1, fk2, ..., fkN)
• Each voxel is assigned its closest word
• Using the lung lobes mask, a histogram of visual words is computed for each lobe (k-dimensional discrete feature space)

Texture analysis of 4D lung CT
• Performance improves with 4D data:
  • 63% for P@1
  • 62% for P@5
  • 60% for P@10
• Optimal configuration: 2 scales, 100–150 words
• Intensive computation:
  • High dimensional feature space

Analyze only part of the data: ROIs and meaningful words.

Words  Scales  P@1 (%)  P@5 (%)  P@10 (%)
50     1       55       56       56
100    1       58       55       57
150    1       58       56       56
50     2       62       58       55
100    2       62       62       60
150    2       63       62       60
50     3       58       54       55
100    3       60       59       58
150    3       57       62       58
50     5       45       52       51
100    5       57       52       51
150    5       58       52       52
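The table reports precision at k (P@k): the fraction of the k top-ranked retrieved items that share the query's class. The slides do not state the evaluation protocol, so this sketch assumes a simple leave-one-out scheme over a precomputed distance matrix. The data and class labels are made up.

```python
import numpy as np

def precision_at_k(query_label, ranked_labels, k):
    """P@k: fraction of the k top-ranked results that share the query's class."""
    top = np.asarray(ranked_labels[:k])
    return float(np.mean(top == query_label))

def mean_precision_at_k(distances, labels, k):
    """Assumed leave-one-out protocol: each item queries all others by ascending distance."""
    labels = np.asarray(labels)
    scores = []
    for q in range(len(labels)):
        order = np.argsort(distances[q], kind="stable")
        order = order[order != q]  # exclude the query itself
        scores.append(precision_at_k(labels[q], labels[order], k))
    return float(np.mean(scores))

features = np.array([[0.0], [0.1], [5.0], [5.2]])
labels = ["class A", "class A", "class B", "class B"]
distances = np.abs(features - features.T)
p1 = mean_precision_at_k(distances, labels, k=1)  # every nearest neighbour shares the class
p2 = mean_precision_at_k(distances, labels, k=2)  # the second result is always the other class
```

In the experiments the distances would come from comparing visual word histograms, for example one histogram per lung lobe.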


Texture analysis of 4D lung CT using ROIs

• Pulmonary embolism detection
• Improvements over previous approaches:
  • ROI–based analysis
  • Optimal combination of energy–based vocabularies


Texture analysis of 4D lung CT using ROIs

• Improvements in performance:
  • Optimal combination of energy–based vocabularies
  • Multi–scale regions of interest

Finer–grain analysis of significant words and synergies among them

Lobe  DECT  Words  Energy levels       SECT
LR    84 %  5      (50, 130)           52 %
LL    84 %  5      (100, 140)          48 %
MR    80 %  5      (40, 50, 130, 140)  52 %
UL    76 %  25     (40, 70, 80, 90)    60 %
UR    80 %  25     (90, 120)           56 %


Visual grammar for description of 2D images

• Classification and retrieval of images from the biomedical literature
• ImageCLEFmed modality classification task
• 1000 training and 1000 test images

• 31 hierarchical categories



• SIFT–based visual vocabularies
• Number of visual topics varied from 25 to 350 in steps of 25
• Meaningfulness threshold varied from 50% to 100%



• Statistically significant improvement over a state-of-the-art baseline
• Vocabulary reduction without loss of accuracy:
  • Up to 20% of the original vocabulary size

Analyze synonymy relations among multiple vocabularies.

[Figure: classification accuracy (%) versus effective number of visual words (0 to 500) for the baseline and the grammar, with the statistical significance threshold]


Visual grammar for description of 3D medical images

• Organ identification task
• VISCERAL dataset
• Full body CT scans:
  • 15 contrast–enhanced
  • 15 not enhanced
• 10 anatomical structures, 8 classes



• Riesz–based texture features:
  • 3 scales
  • Riesz order 2
• Organ–specific vocabularies:
  • 1000 random samples within the organ
  • 20 visual words per organ
• Visual grammar transformation



• Good results for organ identification
• Reduced vocabulary size with respect to the baseline without visual grammar


[Figure: classification accuracy (%) versus vocabulary size (0 to 200)]


Conclusions

Feature extraction and modelling using BOVW:
• Multiscale texture descriptors: evaluation of DoG and Riesz wavelets and BOVW
• Multiscale analysis of ROIs: data–driven ROIs for local analysis of lung texture
• Optimal vocabulary size: learning informative words
• Optimal bag length: BOVW must cover anatomically meaningful areas
• Optimal vocabularies in DECT: specific vocabularies, combined, provide better insight into patterns
• Vocabulary pruning: removing words using language modelling does not impact accuracy
• Language modelling: visual grammar transformations improve accuracy and reduce descriptor size


Shortcomings

• The visual grammar model is slow to train for large vocabularies, and synonymy requires further restrictions (sparsity)
• Semantics is covered, but other aspects can still be explored:
  • Variations of a visual word (morphology)
  • Combination rules of words in proximity (syntax)
• Bag of visual words has evolved into VLAD and Fisher vectors, which are more robust in some respects


Future work

• Extend the visual grammar evaluation
• Extend the visual grammar to cover various languages
• Synergies between isotropic and steerable texture descriptors
• Synergies between text and visual description
• Synergies between color and texture description
• Extend the language modelling to identify:
  • Paradigmatic relations
  • Absence of visual words


Questions
