_____KOSYR 2001______

Rules for Melanoma Skin Cancer Diagnosis

Włodzisław Duch, K. Grąbczewski, R. Adamczak, K. Grudziński, Department of Computer Methods,

Nicholas Copernicus University, Torun, Poland.

http://www.phys.uni.torun.pl/kmk

Zdzisław Hippe

Department of Computer Chemistry and Physical Chemistry

Rzeszów University of Technology

Content:

Melanoma skin cancer data

5 methods: GTS, SSV, MLP2LN, FSM, SBL, and their results.

Final comparison of results

Conclusions & future prospects

Skin cancer

Most common skin cancers:

Basal cell carcinoma (Polish: rak podstawnokomórkowy)

Squamous cell carcinoma (Polish: rak kolczystonabłonkowy)

Melanoma: uncontrolled growth of melanocytes, the skin cells that produce the skin pigment melanin.

Main risk factors: excessive sun exposure and sunburn.

Melanoma accounts for 4% of skin cancers and is the most difficult to control; 1 in 79 Americans will develop melanoma.

Incidence has increased by almost 2000% since 1930.

Survival rate is now 84%, and 95% with early detection.

Melanoma skin cancer data summary

Collected in the Outpatient Center of Dermatology in Rzeszów, Poland.

Four classes: Benign nevus, Blue nevus, Suspicious, and Malignant.

250 cases, with almost equal class distribution.

Each record in the database has 13 attributes.

TDS (Total Dermatoscopy Score) - single index

26 new test cases.

Goal: understand the data, find simple description.

Melanoma AB attributes

Asymmetry: symmetric-spot, 1-axial asymmetry, and 2-axial asymmetry.

Border irregularity: the edges are ragged, notched, or blurred. Integer, from 0 to 8.

Melanoma CD attributes

Color: white, blue, black, red, light brown, and dark brown; several colors are possible simultaneously.

Diversity: pigment globules, pigment dots, pigment network, branched streaks, structureless areas.

Melanoma TDS index

Combine the ABCD attributes into one index, the TDS:

TDS = 1.3 Asymmetry + 0.1 Border + 0.5 Σ{Colors} + 0.5 Σ{Diversities}

The sums run over the colors and the structures present in the lesion.

Coefficients from statistical analysis.
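The TDS formula can be sketched in a few lines. This is only an illustration under stated assumptions: asymmetry is taken to be coded 0, 1, 2 (symmetric, 1-axial, 2-axial), and each color or structure present contributes 1 to its sum; the function name `tds` and its arguments are hypothetical.

```python
def tds(asymmetry, border, colors, diversities):
    """Total Dermatoscopy Score from the ABCD attributes.

    Assumed coding (not stated on the slides): asymmetry in {0, 1, 2},
    border in 0..8, colors/diversities given as collections of the
    names that are present in the lesion.
    """
    if not 0 <= asymmetry <= 2:
        raise ValueError("asymmetry must be 0, 1 or 2")
    if not 0 <= border <= 8:
        raise ValueError("border must be between 0 and 8")
    n_colors = len(set(colors))            # each color present counts once
    n_diversities = len(set(diversities))  # each structure present counts once
    return 1.3 * asymmetry + 0.1 * border + 0.5 * n_colors + 0.5 * n_diversities


# Hypothetical lesion: 1-axial asymmetry, border 4, two colors, one structure.
score = tds(1, 4, {"light brown", "dark brown"}, {"pigment network"})
print(round(score, 2))
```

Asymmetry carries by far the largest coefficient per unit, so the difference between a symmetric and a 2-axially asymmetric spot (2.6 points) is worth as much as five additional colors or structures.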

Remarks on testing

Test: only 26 cases for 4 classes.

Expected statistical accuracy should be estimated on all 276 (training + test) cases with 10-fold cross-validation. Not done with most methods!

Risk matrices are desirable: identifying a Blue nevus instead of a Benign nevus carries no risk, but confusing either with Malignant carries great risk.
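A risk matrix can be applied by weighting each cell of a confusion matrix with the cost of that particular mistake. The sketch below uses the confusion matrix reported for MLP2LN in these slides; the cost values in `RISK` are purely hypothetical placeholders chosen to reflect the remark above, not values from the study.

```python
CLASSES = ["Benign-nevus", "Blue-nevus", "Malignant", "Suspicious"]

# RISK[calculated][original]: hypothetical costs, for illustration only.
# Benign/Blue confusion costs nothing; missing a Malignant case costs most.
RISK = [
    [0, 0, 10, 5],  # calculated Benign-nevus
    [0, 0, 10, 5],  # calculated Blue-nevus
    [1, 1, 0, 1],   # calculated Malignant
    [1, 1, 2, 0],   # calculated Suspicious
]

# Confusion matrix from the MLP2LN slide (rows: calculated, columns: original).
CONFUSION = [
    [62, 5, 0, 0],
    [0, 59, 0, 0],
    [0, 0, 62, 0],
    [0, 0, 0, 62],
]


def mean_risk(confusion, risk):
    """Average cost per case: sum of count * cost over all cells, divided by the case count."""
    total = sum(sum(row) for row in confusion)
    weighted = sum(
        confusion[i][j] * risk[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )
    return weighted / total


print(mean_risk(CONFUSION, RISK))
```

With these placeholder costs the five Benign/Blue confusions contribute nothing, so a classifier with 98% accuracy still has zero mean risk, which plain accuracy cannot express.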

Methods used: GTS

GTS covering algorithm (Hippe, 1997) + recursive reduction of the number of decision rules.

Interactive, user guides the development of the learning model.

The combination of attributes used to generate the learning model is selected on the basis of Frequency and Ranking.

GTS allows many different sets of rules to be created.

In complex situations it may be rather difficult to use.

GTS results

GTS generated a large number (198) of rules.

Experimentation made it possible to find the important attributes.

Various sets of decision rules were generated:

TDS & C-blue & Asymmetry & Border (4 attributes, based on the experience of medical doctors)

TDS & C-blue & D-structureless-areas (3 attributes)

TDS & C-blue (2 attributes)

TDS (1 attribute) - poor results.

Models with 2-4 attributes give 81-85% accuracy.

Combining and generalizing these rules made it possible to select the 4 best simplified rules.

Overall: 6 errors on training, 0 errors on test set.

Methods used: SSV

Decision tree (Grąbczewski, Duch 1999)

Based on a separability criterion: the split chosen is the one with the maximum index of separability, where a split is a threshold value for a continuous attribute or a subset of values for a discrete attribute.

Easily converted into a set of crisp logical rules.

Pruning used to ensure the simplest set of rules that generalize well.

Fully automatic and very efficient; cross-validation tests provide an estimate of statistical accuracy.
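The slides do not give the separability index itself. The sketch below assumes the form commonly quoted for the SSV criterion on a continuous attribute: twice the number of pairs of vectors from different classes that the split separates, minus a penalty for vectors of the same class that it divides. Treat both the formula and the function names `ssv` and `best_split` as assumptions rather than the authors' implementation.

```python
from collections import Counter


def ssv(values, labels, split):
    """Separability of `split` on one continuous attribute (assumed SSV form)."""
    left = Counter(c for v, c in zip(values, labels) if v < split)
    right = Counter(c for v, c in zip(values, labels) if v >= split)
    n_right = sum(right.values())
    classes = set(labels)
    # Pairs (left vector of class c, right vector of another class) separated by the split.
    separated = sum(left[c] * (n_right - right[c]) for c in classes)
    # Penalty: same-class vectors that end up on both sides.
    divided = sum(min(left[c], right[c]) for c in classes)
    return 2 * separated - divided


def best_split(values, labels):
    """Try midpoints between consecutive distinct values; return the best one.

    Raises ValueError if the attribute has fewer than two distinct values.
    """
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return max(candidates, key=lambda s: ssv(values, labels, s))


# Toy data (hypothetical): two classes cleanly separated at 2.5.
print(best_split([1, 2, 3, 4], ["a", "a", "b", "b"]))
```

A tree is then grown by applying the best split recursively to each side, and pruned back to the simplest tree that still generalizes well.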

SSV results

Pruning degree is the only user-defined parameter.

Finds TDS and C-BLUE as the most important attributes. Rules are easy to understand:

IF TDS ≤ 4.85 AND C-BLUE is absent => Benign-nevus

IF TDS ≤ 4.85 AND C-BLUE is present => Blue-nevus

IF 4.85 < TDS < 5.45 => Suspicious

IF TDS ≥ 5.45 => Malignant

98% accuracy on training, 100% on test. The 5 training errors are pairs of vectors from classes C1/C2 that have identical TDS & C-BLUE values.

10xCV on all data: 97.5±0.3%
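The four SSV rules translate directly into a small function. This sketch assumes the comparison operators missing from the extracted rule text are ≤ 4.85 and ≥ 5.45, which is the only reading that makes the four rules cover every TDS value without overlap; the function name `classify` is hypothetical.

```python
def classify(tds, c_blue):
    """Apply the four crisp rules; `c_blue` is True when the color blue is present."""
    if tds <= 4.85:
        # Low TDS: the presence of blue decides between the two nevus classes.
        return "Blue-nevus" if c_blue else "Benign-nevus"
    if tds < 5.45:
        return "Suspicious"
    return "Malignant"


for case in [(4.0, False), (4.0, True), (5.0, False), (6.2, True)]:
    print(case, "->", classify(*case))
```

Because the rules use only TDS and C-BLUE, two cases from different classes with identical values of these attributes cannot be told apart.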

Methods used: MLP2LN

Constructive constrained MLP algorithm; weights are 0 or ±1 at the end of training.

The MLP is converted into an LN, a network performing a logical function (Duch, Adamczak, Grąbczewski 1996).

Network function is written as a set of crisp logical rules.

Automatic determination of crisp and fuzzy "soft-trapezoidal" membership functions.

Tradeoff: simplicity vs. accuracy explored.

Tradeoff: confidence vs. rejection rate explored.

Almost fully automatic algorithm.

MLP2LN results

Rules very similar to those from SSV were found.

Confusion matrix (rows: calculated class; columns: original class):

Calculated      Benign-nevus  Blue-nevus  Malignant  Suspicious
Benign-nevus              62           5          0           0
Blue-nevus                 0          59          0           0
Malignant                  0           0         62           0
Suspicious                 0           0          0          62
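As a quick check, overall accuracy and per-class recall can be read off this confusion matrix. The sketch below copies the counts from the slide; the helper names are hypothetical.

```python
CLASSES = ["Benign-nevus", "Blue-nevus", "Malignant", "Suspicious"]

# Rows: calculated class, columns: original class (counts from the slide).
CONFUSION = [
    [62, 5, 0, 0],
    [0, 59, 0, 0],
    [0, 0, 62, 0],
    [0, 0, 0, 62],
]


def accuracy(m):
    """Fraction of cases on the diagonal."""
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / sum(sum(row) for row in m)


def recall(m, k):
    """Fraction of original class k (column k) that was calculated as class k."""
    return m[k][k] / sum(row[k] for row in m)


print(f"accuracy: {accuracy(CONFUSION):.3f}")
for k, name in enumerate(CLASSES):
    print(f"{name}: recall {recall(CONFUSION, k):.3f}")
```

All five errors are Blue-nevus cases labelled Benign-nevus, so only the Blue-nevus recall falls below 1.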

Methods used: FSM

Feature-Space Mapping (Duch 1994)

FSM estimates probability density of training data.

Neuro-fuzzy system, based on separable transfer functions.

Constructive learning algorithm with feature selection and network pruning.

Each transfer function component is a context-dependent membership function.

Crisp logic rules from rectangular functions.

Trapezoidal, triangular, and Gaussian functions for fuzzy logic rules.
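A separable transfer function is a product of one-dimensional membership functions, one per feature. The sketch below illustrates that idea only; it is not the FSM implementation, and the intervals, centers, and widths are hypothetical. With rectangular components the node output is 0 or 1 and reads as a crisp rule; with Gaussian components it is a degree of membership.

```python
import math


def rectangular(low, high):
    """Crisp membership: 1 inside [low, high], 0 outside."""
    return lambda x: 1.0 if low <= x <= high else 0.0


def gaussian(center, width):
    """Fuzzy membership: 1 at the center, decaying smoothly with distance."""
    return lambda x: math.exp(-((x - center) ** 2) / (2 * width ** 2))


def node_activation(x, components):
    """Separable node: product of per-feature membership values."""
    activation = 1.0
    for value, membership in zip(x, components):
        activation *= membership(value)
    return activation


# Hypothetical nodes over the features (TDS, C-BLUE).
crisp_node = [rectangular(4.85, 5.45), rectangular(0, 1)]
fuzzy_node = [gaussian(5.15, 0.3), gaussian(0.0, 0.5)]

x = (5.0, 0)
print(node_activation(x, crisp_node), node_activation(x, fuzzy_node))
```

A crisp node corresponds to a rule of the form "IF each feature lies in its interval THEN class", which is how rectangular functions yield logical rules.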

FSM results

Rectangular functions used for C-rules.

7 nodes (rules) created on average.

10xCV accuracy on training 95.5±1.0%, test 100%.

Committee of 20 FSM networks: 95.5±1.1%, test 92.6%.

F-rules, Gaussian membership functions: 15 fuzzy rules, lower accuracy.

The simplest solution should be strongly preferred.

Methods used: SBL

Similarity-Based Methods (SBM): many models based on evaluation of similarity.

Similarity-Based-Learner (SBL): software implementation of SBM.

Various extensions of the k-nearest neighbor algorithms.

S-rules, more general than C-rules and F-rules.

Small number of prototype cases used to explain the data class structure.

SBL results

SBL was optimized by performing 10xCV on the training set.

Manhattan distance, feature selection: TDS & C_Blue

97.4 ± 0.3% on training, 100% test.

S-rules of the form:

IF (X sim Pi) THEN C(X)=C(Pi)

IF (|TDS(X)-TDS(Pi)| + |C_blue(X)-C_blue(Pi)|) < T(Pi) THEN C(X)=C(Pi)

Prototype selection left 13 vectors (7 for the Benign-nevus class, 2 for every other class): 97.5%, or 6 errors, on training (237 vectors), 100% on test.

7 prototypes: 91.4% on training (243 vectors), 100% on test.
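An S-rule classifier of this form can be sketched as follows. The prototypes and thresholds T(Pi) below are hypothetical placeholders, not the 13 vectors selected in the study, and returning the nearest prototype when several rules fire is a design choice of this sketch.

```python
def manhattan(x, p):
    """Manhattan distance |TDS(X)-TDS(P)| + |C_blue(X)-C_blue(P)|."""
    return sum(abs(a - b) for a, b in zip(x, p))


# (prototype (TDS, C_blue), threshold T, class) -- illustrative values only.
PROTOTYPES = [
    ((4.0, 0), 0.8, "Benign-nevus"),
    ((4.0, 1), 0.8, "Blue-nevus"),
    ((5.15, 0), 0.3, "Suspicious"),
    ((6.5, 0), 1.0, "Malignant"),
]


def classify(x, prototypes=PROTOTYPES):
    """Return the class of the nearest prototype whose S-rule fires, else None."""
    fired = [
        (manhattan(x, p), label)
        for p, threshold, label in prototypes
        if manhattan(x, p) < threshold
    ]
    return min(fired)[1] if fired else None


for x in [(4.2, 0), (4.2, 1), (5.2, 0), (9.0, 0)]:
    print(x, "->", classify(x))
```

Returning None when no rule fires makes rejected cases explicit, so they can be referred for further examination instead of being forced into a class.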

Results - comparison

Method                      Rules  Training %  Test %
SSV Tree, crisp rules           4  97.5±0.3    100
MLP2LN, crisp rules             4  98.0 all    100
GTS - final simplified          4  97.6 all    100
FSM, rectangular f.             7  95.5±1.0    100±0.0
knn + prototype selection      13  97.5±0.0    100
FSM, Gaussian f.               15  93.7±1.0    95±3.6
GTS initial rules             198  85 all      84.6
knn k=1, Manh, 2 feat.        250  97.4±0.3    100
LERS, weighted rules           21  --          96.2

Conclusions:

TDS - most important; Color-blue second.

Without TDS - many rules.

Optimize TDS: automatic aggregation of features, e.g. with a 2-layered neural network.

Very simple and reliable rules have been found.

S-rules are being improved - prototypes obtained from learning instead of selection.

The database is expanding; non-cancer data are needed.