An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC

Grace Dasovich, Robert Kim

Midterm Presentation, August 21, 2009

Outline

• Related Work

• Data

• Modeling Approach and Results
  – Similarity Measures
  – Artificial Neural Network
  – Multivariate Linear Regression

• Conclusions

• Future Work

Related Work

• Computer-Aided Diagnosis (CADx) based on low-level image features
  – Armato et al. developed a linear discriminant classifier using features of lung nodules
  – Need to find the relationship between the image features and radiologists’ ratings

Related Work

• Image features and the semantic ratings
  – Lung Interpretations

• Barb et al. developed the Evolutionary System for Semantic Exchange of Information in Collaborative Environments (ESSENCE)

• Raicu et al. used ensemble classifiers and decision trees to predict semantic ratings

• Samala et al. used several combinations of image features and the radiologists’ ratings to classify nodules

Related Work

• Similarity
  – Li et al. investigated four different methods to compute similarity measures for lung nodules:
    – Feature-based
    – Pixel-value difference
    – Cross-correlation
    – ANN

Data

Materials

• LIDC Dataset

• 149 unique nodules
  – One slice per nodule: the slice with the largest nodule area

• 9 semantic characteristics
  – Calcification and Internal Structure showed little variation and therefore were not used

• 64 content features
  – Shape, size, intensity, and texture

Similarity Measures

• Cosine Similarity

• Jeffrey Divergence

• Euclidean Distance
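The slide lists the three measures without their formulas. A minimal Python sketch of them follows, under assumptions the slides do not state: cosine similarity and Jeffrey divergence are applied to semantic ratings (a rating vector and a normalized rating histogram, respectively), while Euclidean distance is applied to content features, as the axis pairings in the following plots suggest. Function names and the example histograms are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two rating vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def jeffrey_divergence(h, k):
    """Symmetric Jeffrey divergence between two normalized histograms.

    Each bin is compared with the bin mean m = (h_i + k_i) / 2; empty bins
    contribute nothing because x * log(x / m) tends to 0 as x tends to 0.
    """
    total = 0.0
    for h_i, k_i in zip(h, k):
        m = (h_i + k_i) / 2
        if h_i > 0:
            total += h_i * math.log(h_i / m)
        if k_i > 0:
            total += k_i * math.log(k_i / m)
    return total


def euclidean_distance(x, y):
    """Straight-line distance between two content-feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical histograms of four radiologists' ratings over bins 1 to 5.
h = [0.0, 0.0, 0.25, 0.5, 0.25]
k = [0.0, 0.25, 0.5, 0.25, 0.0]
print(jeffrey_divergence(h, k))
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```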

Similarity Measures

[Figure: scatter plot of Cosine Similarity (y-axis, -0.1 to 0.6) against Euclidean Distance (x-axis, 0 to 1)]

Similarity Measures

[Figure: scatter plot of Jeffrey Divergence (y-axis, 0 to 4) against Euclidean Distance (x-axis, 0 to 1)]

Similarity Measures

• Computed feature distance measures
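The slides do not define the individual feature distance. One plausible reading, consistent with the ANN input of 64 feature distances, is the absolute difference of each content feature between the two nodules in a pair; the hypothetical helper below assumes exactly that.

```python
def feature_distances(features_a, features_b):
    """Per-feature absolute differences for one nodule pair (assumed definition)."""
    if len(features_a) != len(features_b):
        raise ValueError("feature vectors must have the same length")
    return [abs(a - b) for a, b in zip(features_a, features_b)]


# Hypothetical 64-feature vectors for two nodules.
nodule_a = [0.1 * i for i in range(64)]
nodule_b = [0.1 * i + 0.05 for i in range(64)]
d = feature_distances(nodule_a, nodule_b)
print(len(d))  # 64
```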

Methods

• Two three-layer ANNs
  – Input (64 neurons), hidden layer (5 neurons), output (1 neuron)
  – Input (64 neurons), hidden layer (5 neurons), output (7 neurons)

• Input = 64 feature distances

• Output = semantic similarity or difference in semantic ratings

• Hyperbolic tangent activation function, backpropagation algorithm, 200 iterations
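A minimal pure-Python sketch of such a three-layer network, assuming tanh in both the hidden and output layers, online gradient descent on squared error, small random initial weights with bias terms, and one "iteration" meaning one pass over all training pairs. The class name, learning rate, and initialization are illustrative choices, not the authors' settings.

```python
import math
import random


class ThreeLayerANN:
    """Input layer -> tanh hidden layer -> tanh output layer (sketch)."""

    def __init__(self, n_in=64, n_hidden=5, n_out=1, seed=0):
        rng = random.Random(seed)
        # Small random weights; the last weight in each row multiplies a constant bias input.
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def _forward(self, x):
        x_b = list(x) + [1.0]
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x_b))) for row in self.w1]
        h_b = hidden + [1.0]
        out = [math.tanh(sum(w * v for w, v in zip(row, h_b))) for row in self.w2]
        return x_b, h_b, out

    def predict(self, x):
        return self._forward(x)[2]

    def train(self, samples, iterations=200, lr=0.05):
        """Online backpropagation on squared error; one iteration = one pass over samples."""
        for _ in range(iterations):
            for x, target in samples:
                x_b, h_b, out = self._forward(x)
                # Output deltas: (output - target) times tanh'(net) = 1 - output^2.
                d_out = [(o - t) * (1 - o * o) for o, t in zip(out, target)]
                # Hidden deltas use the weights before this step's update (bias column skipped).
                d_hid = [
                    (1 - h_b[j] ** 2) * sum(d * row[j] for d, row in zip(d_out, self.w2))
                    for j in range(len(self.w1))
                ]
                for d, row in zip(d_out, self.w2):
                    for j in range(len(row)):
                        row[j] -= lr * d * h_b[j]
                for d, row in zip(d_hid, self.w1):
                    for i in range(len(row)):
                        row[i] -= lr * d * x_b[i]


if __name__ == "__main__":
    # Synthetic stand-in for (64 feature distances, similarity) training pairs.
    rng = random.Random(1)
    xs = [[rng.random() for _ in range(64)] for _ in range(16)]
    data = [(x, [0.5 * sum(x) / 64]) for x in xs]
    net = ThreeLayerANN(n_out=1)
    net.train(data, iterations=200)
    print(net.predict(xs[0]), data[0][1])
```

Passing `n_out=7` gives the seven-output variant; `n_out=1` gives the single-output network.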

Methods

• ANN with a single output
  – 640 random pairs from all 109 nodules
  – 231 pairs from nodules with malignancy > 3
  – 496 pairs from nodules with area > 122 mm²
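The pair counts are consistent with taking every pair among a subset of nodules: 231 = 22 × 21 / 2 and 496 = 32 × 31 / 2, i.e., all pairs among 22 and 32 qualifying nodules, respectively. The sketch below assumes that reading (both nodules in a pair must meet the criterion) and uses hypothetical nodule records with a mean malignancy rating and an area in mm²; the random pairs are drawn without replacement from all possible pairs.

```python
import itertools
import random


def random_pairs(nodule_ids, n_pairs, seed=0):
    """Draw n_pairs distinct unordered pairs uniformly at random (no replacement)."""
    all_pairs = list(itertools.combinations(nodule_ids, 2))
    return random.Random(seed).sample(all_pairs, n_pairs)


def pairs_where(nodules, keep):
    """Every unordered pair whose two nodules both satisfy the predicate `keep`."""
    selected = [n["id"] for n in nodules if keep(n)]
    return list(itertools.combinations(selected, 2))


# Hypothetical records: mean malignancy rating and nodule area in mm^2.
nodules = [
    {"id": "N1", "malignancy": 4.5, "area": 310.0},
    {"id": "N2", "malignancy": 2.0, "area": 95.0},
    {"id": "N3", "malignancy": 3.5, "area": 150.0},
    {"id": "N4", "malignancy": 5.0, "area": 80.0},
]
print(pairs_where(nodules, lambda n: n["malignancy"] > 3))  # [('N1', 'N3'), ('N1', 'N4'), ('N3', 'N4')]
print(pairs_where(nodules, lambda n: n["area"] > 122))  # [('N1', 'N3')]
print(len(random_pairs([n["id"] for n in nodules], 3)))  # 3
```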

Methods

• ANN with seven outputs
  – 640 random pairs from all 109 nodules
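For the seven-output network, the teaching data would be a per-characteristic difference in semantic ratings. A sketch, assuming the target is the absolute difference of the two nodules' mean ratings on the 1-to-5 scale, scaled to [0, 1] so it fits the tanh output range; the helper name and the scaling are assumptions.

```python
# The seven characteristics kept after dropping Calcification and Internal Structure.
CHARACTERISTICS = (
    "lobulation", "malignancy", "margin", "sphericity",
    "spiculation", "subtlety", "texture",
)


def rating_difference_target(ratings_a, ratings_b, max_diff=4.0):
    """Seven-element teaching vector for one nodule pair (assumed definition).

    Each entry is the absolute difference between the two nodules' ratings for one
    characteristic, divided by max_diff (4 on a 1-to-5 scale) so it lies in [0, 1].
    """
    return [abs(ratings_a[c] - ratings_b[c]) / max_diff for c in CHARACTERISTICS]


# Hypothetical mean ratings for two nodules.
a = dict.fromkeys(CHARACTERISTICS, 3.0)
b = dict(a, malignancy=5.0, texture=1.0)
print(rating_difference_target(a, b))  # [0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.5]
```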

Methods

• Leave-one-out method
  – Cosine similarity, Jeffrey divergence, or difference in semantic ratings used as teaching data
  – An ANN trained on the entire dataset minus one image pair
  – The left-out pair used for testing
  – Correlation between the radiologists’ computed similarity and the ANN output calculated
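The leave-one-out procedure can be sketched generically. Here `make_model` is a hypothetical factory for any model with `fit` and `predict` methods, and `LineFit` is a toy stand-in for the ANN (a least-squares line on the first input) used only to show the loop; the Pearson correlation is computed between the held-out predictions and the teaching values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def leave_one_out(samples, make_model):
    """Hold out each pair once, train on the rest, and correlate predictions with targets."""
    predictions = []
    for i, (x, _) in enumerate(samples):
        model = make_model()
        model.fit(samples[:i] + samples[i + 1:])  # every pair except pair i
        predictions.append(model.predict(x))
    targets = [t for _, t in samples]
    return pearson_r(predictions, targets)


class LineFit:
    """Toy stand-in model: least-squares line on the first input value."""

    def fit(self, samples):
        xs = [x[0] for x, _ in samples]
        ys = [t for _, t in samples]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((a - mx) ** 2 for a in xs)
        self.slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
        self.intercept = my - self.slope * mx

    def predict(self, x):
        return self.slope * x[0] + self.intercept


samples = [([float(i)], 2.0 * i + 1.0) for i in range(6)]
print(leave_one_out(samples, LineFit))  # close to 1.0 for this noise-free data
```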

Results

• ANN using 640 random pairs

Results

• ANN using 231 pairs with malignancy rating > 3

Results

• ANN using 496 pairs with area > 122 mm²

Results

• ANN output vs. target values using Jeffrey divergence for the 640 pairs (r = 0.438)

[Figure: scatter plot of Target (y-axis, 0 to 1) against ANN Output (x-axis, 0 to 0.8)]

Results

• ANN using 640 random pairs and the Jeffrey divergence with seven semantic ratings

Methods

• Normalization of features
  – Min-max technique
  – Z-score technique

• Pair selection
  – Looked for matches between the k most similar images based on semantic similarity and on content similarity
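A minimal sketch of both normalization techniques and of the top-k matching step, assuming per-feature (column-wise) normalization, a population standard deviation for the z-score, and that a "match" means a nodule appearing among a query nodule's k nearest neighbours under both the semantic and the content distance. The distance functions are passed in, and all helper names are hypothetical.

```python
import math


def min_max(column):
    """Rescale values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


def z_score(column):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [0.0 if sd == 0 else (v - mean) / sd for v in column]


def normalize_features(matrix, method=z_score):
    """Normalize each feature column of a nodules-by-features matrix."""
    columns = [method(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


def top_k(query, others, distance, k):
    """Indices (into `others`) of the k items nearest to `query`."""
    order = sorted(range(len(others)), key=lambda i: distance(query, others[i]))
    return set(order[:k])


def top_k_matches(semantic, content, sem_dist, con_dist, k):
    """Per nodule: how many of its k semantic neighbours are also k content neighbours."""
    matches = []
    for q in range(len(semantic)):
        rest = [i for i in range(len(semantic)) if i != q]
        sem = top_k(semantic[q], [semantic[i] for i in rest], sem_dist, k)
        con = top_k(content[q], [content[i] for i in rest], con_dist, k)
        matches.append(len(sem & con))
    return matches


print(min_max([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```

In use, `math.dist` (Euclidean distance) can serve as the content distance, and any semantic distance with the same two-argument signature can be passed as `sem_dist`.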

Methods

• Multivariate regression analysis
  – Select features with the highest correlation coefficients
  – Feature distance measures
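A stdlib-only sketch of the regression step, assuming ordinary least squares with an intercept, solved through the normal equations, and feature selection by ranking each feature's Pearson correlation with the semantic similarity. Helper names are hypothetical, and the example data is synthetic, not from LIDC.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 for a constant sequence."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)


def select_top_features(X, y, n_keep):
    """Column indices of X with the highest correlation coefficients with y."""
    scores = [(pearson_r([row[j] for row in X], y), j) for j in range(len(X[0]))]
    return [j for _, j in sorted(scores, reverse=True)[:n_keep]]


def fit_linear(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented system [A^T A | A^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)] + [sum(a[i] * t for a, t in zip(A, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def r_squared(X, y, coef):
    """Coefficient of determination of the fitted linear model on (X, y)."""
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]
    mean = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1 - ss_res / ss_tot


# Synthetic example: rows are pairs, columns are feature distances, y is semantic similarity.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
y = [1 + 2 * a - 3 * b for a, b in X]
keep = select_top_features(X, y, 2)
X_sel = [[row[j] for j in keep] for row in X]
coef = fit_linear(X_sel, y)
print(keep, coef, r_squared(X_sel, y, coef))
```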

Methods

• Nodule analysis
  – Determine differences between selected and non-selected nodules
  – Define requirements for our model

Results

[Figure: two panels plotted against Threshold (0 to 20): Correlation (0 to 1) and Number of Pairs (0 to 2000)]

Results

             d(i, j)   d²(i, j)   exp(d(i, j))
Cosine       0.871     0.849      0.866
Jeffrey      0.647     0.633      0.608

Results

Correlation Coefficient   Feature
0.1175                    Equivalent Diameter
0.1085                    Energy (Haralick)
0.0823                    Gabor Mean 135_05
0.0647                    Convex Area
0.0467                    Gabor STD 135_04
0.0322                    Min Intensity BG
0.0295                    Markov 4
0.0280                    Variance (Haralick)
0.0265                    Gabor STD 45_05
0.0238                    SD Intensity

R² = 0.871

Results

[Figure: scatter plot of Semantic (y-axis, -0.1 to 0.6) against Content (x-axis, 0 to 1)]

Results

[Figure: distributions of semantic ratings (1 to 5) for Lobulation, Malignancy, Margin, Sphericity, Spiculation, Subtlety, and Texture, comparing the 79-nodule and 70-nodule groups]

Results

[Figure: distributions of Equivalent Diameter, Energy, Gabor Mean 135 5, Convex Area, Gabor SD 135 4, Min Intensity BG, Markov4, Variance, Gabor SD 45 5, and SD Intensity, comparing the 79-nodule and 70-nodule groups]

Results

[Figure: four panels comparing the 79-nodule and 70-nodule groups]

A. Equivalent Diameter, B. Standard Deviation of Intensity, C. Malignancy, D. Subtlety

Conclusions

Preliminary Issues

• The ANN is also not yet sufficient to predict semantic similarity from content
  – Best correlation: 0.438
  – Malignancy correlation: 0.521
  – Jeffrey divergence performed better, unlike in the linear model

• A semantic gap still exists

Conclusions

• Our linear model applies to a specific type of nodule
  – Characteristics: high malignancy, high texture, low lobulation, and low spiculation
  – Features: larger diameter, greater intensity

• Linear models are not sufficient for determining similarity
  – R² of 0.871 with the chosen nodules

Future Work

• Reduce variability among radiologists
  – Use only nodules on which the radiologists agree

• Find the best combination of content features
  – 64 may be too many
  – Currently using only 2D features

Future Work

• Different semantic distance measures
  – Some ratings are ordinal, but Jeffrey divergence is intended for categorical data

• Different methods of machine learning
  – Incorporate radiologists’ feedback into training
  – Ensemble of classifiers
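As one illustration of an ordinal-aware alternative (not proposed on the slides), the one-dimensional earth mover's distance between two rating histograms over unit-spaced bins equals the summed absolute difference of their cumulative histograms. Jeffrey divergence gives the same value, 2 ln 2, for any two non-overlapping one-hot histograms, whereas this distance grows with how far apart the ratings are. The function name is hypothetical.

```python
from itertools import accumulate


def emd_1d(h, k):
    """Earth mover's distance between two normalized histograms over ordered, unit-spaced bins."""
    # Cumulative sums turn "mass to move past each bin boundary" into a simple difference.
    return sum(abs(a - b) for a, b in zip(accumulate(h), accumulate(k)))


# Hypothetical one-hot malignancy histograms over ratings 1 to 5.
rated_1 = [1, 0, 0, 0, 0]
rated_2 = [0, 1, 0, 0, 0]
rated_5 = [0, 0, 0, 0, 1]
print(emd_1d(rated_1, rated_2))  # 1
print(emd_1d(rated_1, rated_5))  # 4
```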

Thanks for Listening

Any Questions?
