An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC
Grace Dasovich
Robert Kim
Midterm Presentation
August 21 2009
Outline
• Related Work
• Data
• Modeling Approach and Results
  – Similarity Measures
  – Artificial Neural Network
  – Multivariate Linear Regression
• Conclusions
• Future Work
• Computer-Aided Diagnosis (CADx) based on low-level image features
  – Armato et al. developed a linear discriminant classifier using features of lung nodules
  – Need to find the relationship between the image features and radiologists’ ratings
Related Work
• Image features and the semantic ratings
  – Lung Interpretations
• Barb et al. developed Evolutionary System for Semantic Exchange of Information in Collaborative Environments (ESSENCE)
• Raicu et al. used ensemble classifiers and decision trees to predict semantic ratings
• Samala et al. used several combinations of image features and the radiologists’ ratings to classify nodules
Related Work
– Similarity
  • Li et al. investigated four different methods to compute similarity measures for lung nodules
    – Feature-based
    – Pixel-value-difference
    – Cross correlation
    – ANN
Related Work
Materials
• LIDC Dataset
• 149 Unique Nodules
  – One slice per nodule, largest nodule area
• 9 Semantic Characteristics
  – Calcification and Internal Structure had little variation, thus were not used
• 64 Content Features
  – Shape, size, intensity, and texture
Data
• Cosine Similarity
• Jeffrey Divergence
• Euclidean Distance
Similarity Measures
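The three measures listed above can be sketched as follows. This is a minimal illustration, not the presenters' code. The Jeffrey divergence shown is the symmetric form taken against the average of the two distributions, which is an assumption because the slides do not give a formula. The rating vectors `r1` and `r2` are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.linalg.norm(a - b))

def jeffrey_divergence(p, q, eps=1e-12):
    """Symmetric divergence of p and q from their average m = (p + q) / 2.

    Assumed form; inputs are non-negative vectors, normalized here to sum to 1.
    eps only guards against log(0) for empty bins.
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    m = (p + q) / 2
    return float(np.sum(p * np.log((p + eps) / (m + eps))
                        + q * np.log((q + eps) / (m + eps))))

# Hypothetical ratings for two nodules: seven characteristics on a 1-5 scale.
r1 = [4, 5, 3, 2, 1, 4, 5]
r2 = [4, 4, 3, 2, 2, 5, 5]

print("cosine   :", cosine_similarity(r1, r2))
print("euclidean:", euclidean_distance(r1, r2))
print("jeffrey  :", jeffrey_divergence(r1, r2))
```

Cosine similarity grows as two nodules become more alike, while Jeffrey divergence and Euclidean distance shrink, so the sign of any correlation between them depends on which pair of measures is compared.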
[Figure: two scatter plots with Euclidean Distance on the x-axis; the y-axes are Cosine Similarity and Jeffrey Divergence]
Similarity Measures
• Computed feature distance measures
Similarity Measures
• Two three-layer ANNs
  – Input (64 neurons), hidden layer (5 neurons), output (1)
  – Input (64 neurons), hidden layer (5 neurons), output (7)
• Input = 64 feature distances
• Output = Semantic similarity or difference in semantic ratings
• Hyperbolic tangent function, backpropagation algorithm, 200 iterations
Methods
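A three-layer network of this shape can be sketched in plain NumPy. This is only an illustration under stated assumptions: tanh is applied at both the hidden and output layers, training is full-batch gradient descent on squared error, and the learning rate, weight initialization, and synthetic data are all hypothetical. The slides specify only the layer sizes, the activation, backpropagation, and 200 iterations.

```python
import numpy as np

def train_ann(X, Y, n_hidden=5, n_iter=200, lr=0.1, seed=0):
    """Train a 1-hidden-layer tanh network with backpropagation.

    X: (n_pairs, 64) feature distances, Y: (n_pairs, n_out) targets in (-1, 1).
    lr and the initial weight scale are assumptions, not values from the slides.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(0, 0.1, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n_out)); b2 = np.zeros(n_out)
    n = len(X)
    losses = []
    for _ in range(n_iter):
        # Forward pass.
        H = np.tanh(X @ W1 + b1)
        out = np.tanh(H @ W2 + b2)
        err = out - Y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass (gradients up to a constant factor absorbed into lr).
        d_out = err * (1 - out ** 2)
        d_hid = (d_out @ W2.T) * (1 - H ** 2)
        W2 -= lr * H.T @ d_out / n; b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_hid / n; b1 -= lr * d_hid.mean(axis=0)
    return (W1, b1, W2, b2), losses

def predict(params, X):
    """Forward pass with trained parameters."""
    W1, b1, W2, b2 = params
    return np.tanh(np.tanh(X @ W1 + b1) @ W2 + b2)

# Synthetic stand-in data: 640 pairs, 64 feature distances each.
rng = np.random.default_rng(0)
X = rng.random((640, 64))
y1 = X[:, :8].mean(axis=1, keepdims=True)                               # single target
y7 = np.stack([X[:, i::7].mean(axis=1) for i in range(7)], axis=1)      # seven targets

params1, losses1 = train_ann(X, y1)   # 64-5-1 network
params7, losses7 = train_ann(X, y7)   # 64-5-7 network
pred1, pred7 = predict(params1, X), predict(params7, X)
print(losses1[0], losses1[-1])
```

The same function covers both networks because the output width is read from the target array.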
• ANN with a single output
  – 640 random pairs from all 109 nodules
  – 231 pairs from nodules with malignancy > 3
  – 496 pairs from nodules with area > 122 mm²
Methods
• ANN with seven outputs
  – 640 random pairs from all 109 nodules
• Leave-one-out method
  – Cosine similarity, Jeffrey divergence, or difference in semantic ratings used as teaching data
  – An ANN trained with the entire dataset minus one image pair
  – The pair left out used for testing
  – Correlation calculated between the radiologists’ computed similarity and the ANN output
Methods
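The leave-one-out procedure above can be sketched as a loop that holds out one pair at a time, trains on the rest, and then correlates the held-out predictions with the targets. To keep the sketch short, a least-squares linear model stands in for the ANN; the `fit` and `predict` arguments are hypothetical hooks where an ANN trainer could be passed instead, and the data are synthetic.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept (stand-in for ANN training)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Predict with the coefficients returned by fit_linear."""
    return np.column_stack([X, np.ones(len(X))]) @ coef

def leave_one_out_correlation(X, y, fit, predict):
    """Hold out each pair once, predict it, and return Pearson r with the targets."""
    preds = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # every pair except pair i
        model = fit(X[keep], y[keep])
        preds[i] = predict(model, X[i:i + 1])[0]
    r = np.corrcoef(preds, y)[0, 1]
    return float(r), preds

# Synthetic stand-in data: 60 pairs, 4 feature distances, noisy linear target.
rng = np.random.default_rng(1)
X = rng.random((60, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(0, 0.1, 60)

r, preds = leave_one_out_correlation(X, y, fit_linear, predict_linear)
print("leave-one-out correlation:", r)
```

Because every prediction comes from a model that never saw that pair, the resulting correlation is an out-of-sample estimate rather than a training fit.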
• ANN using 640 random pairs
Results
• ANN using 231 pairs with malignancy rating > 3
Results
• ANN using 496 pairs with area > 122 mm2
Results
• ANN output vs. target values using Jeffrey divergence for the 640 pairs (r = 0.438)
[Figure: scatter plot with ANN Output on the x-axis and Target on the y-axis]
Results
• ANN using random 640 pairs and the Jeffrey divergence with seven semantic ratings
Results
• Normalization of Features
  – Min-Max Technique
  – Z-Score Technique
• Pair Selection
  – Looked for matches between the k most similar images based on semantic ratings and on content features
Methods
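The normalization and pair-selection steps above might look like the sketch below. The matching rule is one reading of the slide and should be treated as an assumption: a pair is kept when the second nodule is among the k nearest neighbours of the first under both the semantic and the content distance. All function names and the example data are hypothetical.

```python
import numpy as np

def min_max_normalize(F):
    """Rescale each feature column to [0, 1]; constant columns map to 0."""
    F = np.asarray(F, float)
    lo, hi = F.min(axis=0), F.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (F - lo) / span

def z_score_normalize(F):
    """Shift each feature column to mean 0 and scale to standard deviation 1."""
    F = np.asarray(F, float)
    sd = F.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)
    return (F - F.mean(axis=0)) / sd

def pairwise_euclidean(A):
    """Full matrix of Euclidean distances between the rows of A."""
    A = np.asarray(A, float)
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=2)

def k_nearest(D, k):
    """For each row of a distance matrix, indices of the k nearest other items."""
    D = np.array(D, float)            # copy, so the caller's matrix is untouched
    np.fill_diagonal(D, np.inf)       # an item is never its own neighbour
    return np.argsort(D, axis=1)[:, :k]

def matched_pairs(D_semantic, D_content, k):
    """Pairs (i, j) where j is in the top k of i under both distance matrices."""
    sem, con = k_nearest(D_semantic, k), k_nearest(D_content, k)
    pairs = set()
    for i in range(len(sem)):
        for j in set(sem[i]) & set(con[i]):
            pairs.add((min(i, int(j)), max(i, int(j))))
    return sorted(pairs)

# Hypothetical data: 6 nodules, 3 content features, 7 semantic ratings.
rng = np.random.default_rng(2)
features = z_score_normalize(rng.random((6, 3)) * [10, 200, 0.5])
ratings = rng.integers(1, 6, size=(6, 7))

pairs = matched_pairs(pairwise_euclidean(ratings), pairwise_euclidean(features), k=2)
print(pairs)
```

Raising k admits more pairs but loosens the agreement between the two rankings, so the choice of k trades sample size against how well the selected pairs match.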
• Multivariate Regression Analysis
  – Select features with highest correlation coefficients
  – Feature distance measures
Methods
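The regression step above can be sketched as: rank the feature distances by their correlation with the semantic similarity, keep the strongest ones, fit a least-squares model, and report R². Keeping ten features and ranking by absolute Pearson correlation are assumptions, and the data below are synthetic; none of this reproduces the presenters' numbers.

```python
import numpy as np

def select_top_features(X, y, n_keep=10):
    """Return indices of the n_keep columns most correlated with y, plus all r values."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    order = np.argsort(-np.abs(r))[:n_keep]   # strongest absolute correlation first
    return order, r

def fit_and_score(X, y):
    """Least-squares fit with intercept; returns coefficients and in-sample R^2."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return coef, float(r2)

# Synthetic stand-in: 300 pairs, 64 feature distances, target driven by two of them.
rng = np.random.default_rng(3)
X = rng.random((300, 64))
y = 2.0 * X[:, 3] - 1.5 * X[:, 17] + rng.normal(0, 0.05, 300)

top, r = select_top_features(X, y)
coef, r2 = fit_and_score(X[:, top], y)
print("selected columns:", top)
print("R^2:", r2)
```

The R² here is computed on the same pairs used for fitting, so it describes fit quality rather than predictive performance on unseen pairs.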
• Nodule Analysis
  – Determine differences between selected and non-selected nodules
  – Define requirements for our model
Methods
[Figure: two panels, Correlation vs. Threshold and Number of Pairs vs. Threshold, with the threshold axis running from 0 to 20]
Results
           d(i, j)    d²(i, j)    exp(d(i, j))
Cosine     0.871      0.849       0.866
Jeffrey    0.647      0.633       0.608
Results
Correlation Coefficient    Feature
0.1175                     Equivalent Diameter
0.1085                     Energy (Haralick)
0.0823                     Gabor Mean 135_05
0.0647                     Convex Area
0.0467                     Gabor STD 135_04
0.0322                     Min Intensity BG
0.0295                     Markov 4
0.0280                     Variance (Haralick)
0.0265                     Gabor STD 45_05
0.0238                     SD Intensity
R² = 0.871
Results
[Figure: scatter plot with Content on the x-axis and Semantic on the y-axis]
Results
[Figure: distributions of the seven semantic ratings (Lobulation, Malignancy, Margin, Sphericity, Spiculation, Subtlety, Texture) on a 1 to 5 scale, comparing groups of 79 and 70 nodules]
Results
[Figure: distributions of ten content features (Equivalent Diameter, Energy, Gabor Mean 135 5, Convex Area, Gabor SD 135 4, Min Intensity BG, Markov4, Variance, Gabor SD 45 5, SD Intensity), comparing groups of 79 and 70 nodules]
Results
[Figure: four panels comparing groups of 79 and 70 nodules. A. Equivalent Diameter, B. Standard Deviation of Intensity, C. Malignancy, D. Subtlety]
Results
Preliminary Issues
• The ANN is also not yet sufficient to predict semantic similarity from content
  – Best correlation 0.438
  – Malignancy correlation 0.521
  – Jeffrey divergence performed better, unlike in the linear model
• A semantic gap still exists
Conclusions
• Our linear model applies to a specific type of nodule
  – Characteristics: High malignancy, high texture, low lobulation, and low spiculation
  – Features: Larger diameter, greater intensity
• Linear models are not sufficient for determination of similarities
  – R² of 0.871 with chosen nodules
Conclusions
• Reduce variability among radiologists
  – Use only nodules with radiologists’ agreement
• Find best combination of content features
  – 64 may be too many
  – Currently only using 2D
Future Work
• Different semantic distance measures
  – Some ratings are ordinal; Jeffrey divergence is for categorical data
• Different methods of machine learning
  – Incorporate radiologists’ feedback into training
  – Ensemble of classifiers
Future Work
Thanks for Listening
Any Questions?