Authors
Eduardo Pérez, Oscar Reyes and Sebastián Ventura
Source code is available for download on GitHub.
Abstract
Melanoma is the type of skin cancer with the highest mortality, and it is especially dangerous because it can spread to other parts of the body if not caught and treated early. Melanoma diagnosis is a complex task, even for expert dermatologists, mainly due to the great variety of morphologies in patients' moles. Consequently, automatic melanoma diagnosis poses the challenge of developing efficient computational methods that ease the diagnostic process and thereby aid dermatologists in decision-making. In this work, an extensive analysis was conducted to assess and illustrate the effectiveness of convolutional neural networks in coping with this complex task. To achieve this objective, twelve well-known convolutional network models were evaluated on eleven public image datasets. The experimental study comprised five phases: first, the sensitivity of the models to the optimization algorithm used for training was analyzed; then, the impact on performance of techniques such as cost-sensitive learning, data augmentation and transfer learning was assessed. The study confirmed the usefulness, effectiveness and robustness of different convolutional architectures in solving the melanoma diagnosis problem. It also provides important guidelines for researchers working in this area, easing the selection of both the proper convolutional model and technique according to the characteristics of the data.
Index
- Image datasets for melanoma diagnosis
- Optimization algorithms for training CNNs
- Weight balancing
- Transfer learning
- Data augmentation
- Combining transfer learning and data augmentation
- Complexity of the Convolutional Neural Network structures used in this work
Image datasets for melanoma diagnosis
Because the incidence of melanoma continues to increase worldwide, several private and public datasets have been published in recent years, allowing a better study of this illness and, therefore, the design of better approaches for its automatic diagnosis. The most popular private collections of dermoscopic images are the Interactive Atlas of Dermoscopy (Argenziano et al., 2004), the Dermofit Image Library (Ballerini et al., 2013), and the dataset presented by Esteva et al. (2017), who conducted a comparison with 21 dermatologists using 129,450 clinical images; to the best of our knowledge, this last dataset is the largest reported in the literature.
Regarding public datasets for studying melanoma, the largest collection can be found in the ISIC repository, comprising a total of 23,906 images labeled by expert dermatologists. This repository contains the HAM10000, MSK and UDA datasets, which appeared in (Tschandl et al., 2018), (Codella et al., 2018), and (Gutman et al., 2016), respectively. Furthermore, it provides the datasets released for the annual ISIC challenges (ISIC-2016, ISIC-2017, etc.), which are also commonly used as benchmarks by researchers. Mendonca et al. (2013), on the other hand, created the PH2 dataset, which comprises 200 high-quality dermoscopic images. Finally, Giotis et al. (2015) presented the MED-NODE dataset, which collects 170 non-dermoscopic images shot with common digital cameras; this source of data is particularly important nowadays since the use of technological devices (e.g. smartphones and tablets) is constantly growing.
| Dataset | Source | # Img. | ImbR | IntraC | InterC | DistR | Silhouette |
|---|---|---|---|---|---|---|---|
| HAM10000 | (Tschandl et al., 2018) | 7,818 | 6.024 | 8,705 | 9,770 | 0.891 | 0.213 |
| ISBI2016 | (Gutman et al., 2016) | 1,273 | 4.092 | 10,553 | 10,992 | 0.960 | 0.101 |
| ISBI2017 | (Codella et al., 2018) | 2,745 | 4.259 | 9,280 | 9,674 | 0.959 | 0.089 |
| MED-NODE | (Giotis et al., 2015) | 170 | 1.429 | 9,029 | 9,513 | 0.949 | 0.068 |
| MSK-1 | (Codella et al., 2018) | 1,088 | 2.615 | 11,753 | 14,068 | 0.835 | 0.173 |
| MSK-2 | (Codella et al., 2018) | 1,522 | 3.299 | 9,288 | 9,418 | 0.986 | 0.062 |
| MSK-3 | (Codella et al., 2018) | 225 | 10.842 | 8,075 | 8,074 | 1.000 | 0.112 |
| MSK-4 | (Codella et al., 2018) | 943 | 3.366 | 6,930 | 7,162 | 0.968 | 0.065 |
| PH2 | (Mendonca et al., 2013) | 200 | 4.000 | 12,688 | 14,928 | 0.850 | 0.210 |
| UDA-1 | (Gutman et al., 2016) | 557 | 2.503 | 11,730 | 12,243 | 0.958 | 0.083 |
| UDA-2 | (Gutman et al., 2016) | 60 | 1.609 | 11,297 | 11,601 | 0.974 | 0.020 |

Table 1: Summary of the benchmark datasets. Img: total number of images; ImbR: imbalance ratio between the benign and malignant classes; IntraC: average distance between samples of the same class; InterC: average distance between samples of different classes; DistR: ratio between IntraC and InterC; Silhouette: silhouette score.
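The characterization metrics in Table 1 can be reproduced with a few lines of NumPy. The sketch below is an illustrative assumption about how such metrics are commonly defined (ImbR as the majority/minority count ratio; IntraC and InterC as mean pairwise Euclidean distances), not the authors' exact code:

```python
import numpy as np

def dataset_stats(features, labels):
    """Toy reimplementation of the Table 1 metrics (assumed definitions)."""
    labels = np.asarray(labels)
    counts = np.bincount(labels)
    imb_r = counts.max() / counts.min()      # ImbR: majority/minority ratio

    def avg_dist(a, b):
        # average Euclidean distance over every pair drawn from a and b
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return d.mean()

    benign, malignant = features[labels == 0], features[labels == 1]
    intra_c = (avg_dist(benign, benign) + avg_dist(malignant, malignant)) / 2
    inter_c = avg_dist(benign, malignant)
    return imb_r, intra_c, inter_c, intra_c / inter_c   # last value is DistR

# Tiny synthetic example: 4 "benign" vs 2 "malignant" samples in 2-D.
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [5., 5.], [6., 5.]])
y = np.array([0, 0, 0, 0, 1, 1])
imb_r, intra_c, inter_c, dist_r = dataset_stats(X, y)
```

On real data, `features` would be the image descriptors used for the analysis; the six-sample toy set just keeps the computation inspectable. A DistR well below 1 (as here) indicates classes that are compact relative to their separation.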
Optimization algorithms for training CNNs
Table 2 shows the results obtained after training the twelve CNN models with each of the three selected optimization algorithms (SGD, Adam and RMSprop), using the learning rates recommended by their respective authors.
| Dataset | DenseNet121.adam | DenseNet121.rmsprop | DenseNet121.sgd | DenseNet169.adam | DenseNet169.rmsprop | DenseNet169.sgd | DenseNet201.adam | DenseNet201.rmsprop | DenseNet201.sgd | InceptionResNetV2.adam | InceptionResNetV2.rmsprop | InceptionResNetV2.sgd | InceptionV3.adam | InceptionV3.rmsprop | InceptionV3.sgd | InceptionV4.adam | InceptionV4.rmsprop | InceptionV4.sgd | MobileNet.adam | MobileNet.rmsprop | MobileNet.sgd | NASNetMobile.adam | NASNetMobile.rmsprop | NASNetMobile.sgd | ResNet50.adam | ResNet50.rmsprop | ResNet50.sgd | VGG16.adam | VGG16.rmsprop | VGG16.sgd | VGG19.adam | VGG19.rmsprop | VGG19.sgd | Xception.adam | Xception.rmsprop | Xception.sgd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 0.472 | 0.463 | 0.553 | 0.480 | 0.474 | 0.547 | 0.467 | 0.462 | 0.628 | 0.448 | 0.472 | 0.550 | 0.443 | 0.482 | 0.591 | 0.144 | 0.058 | 0.462 | 0.428 | 0.430 | 0.463 | 0.441 | 0.440 | 0.578 | 0.439 | 0.335 | 0.410 | 0.000 | 0.000 | 0.383 | 0.000 | 0.000 | 0.363 | 0.502 | 0.488 | 0.479 |
| ISBI2016 | 0.246 | 0.239 | 0.342 | 0.269 | 0.214 | 0.321 | 0.256 | 0.259 | 0.367 | 0.263 | 0.232 | 0.389 | 0.293 | 0.259 | 0.307 | 0.061 | 0.027 | 0.155 | 0.190 | 0.171 | 0.309 | 0.227 | 0.223 | 0.254 | 0.255 | 0.219 | 0.303 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.281 | 0.267 | 0.281 |
| ISBI2017 | 0.203 | 0.162 | 0.179 | 0.188 | 0.188 | 0.224 | 0.196 | 0.183 | 0.217 | 0.079 | 0.191 | 0.284 | 0.144 | 0.102 | 0.225 | 0.027 | 0.018 | 0.089 | 0.163 | 0.220 | 0.095 | 0.206 | 0.197 | 0.230 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.176 | 0.000 | 0.000 | 0.188 | 0.212 | 0.218 | 0.193 |
| MED-NODE | 0.559 | 0.530 | 0.506 | 0.579 | 0.561 | 0.504 | 0.529 | 0.560 | 0.508 | 0.542 | 0.579 | 0.505 | 0.551 | 0.537 | 0.567 | 0.396 | 0.258 | 0.494 | 0.377 | 0.394 | 0.533 | 0.415 | 0.314 | 0.466 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.569 | 0.560 | 0.539 |
| MSK-1 | 0.459 | 0.464 | 0.575 | 0.432 | 0.463 | 0.560 | 0.450 | 0.456 | 0.555 | 0.424 | 0.376 | 0.592 | 0.327 | 0.443 | 0.574 | 0.047 | 0.053 | 0.368 | 0.345 | 0.426 | 0.494 | 0.292 | 0.334 | 0.533 | 0.389 | 0.406 | 0.366 | 0.000 | 0.000 | 0.487 | 0.000 | 0.000 | 0.461 | 0.517 | 0.532 | 0.405 |
| MSK-2 | 0.272 | 0.268 | 0.314 | 0.275 | 0.254 | 0.323 | 0.273 | 0.279 | 0.325 | 0.279 | 0.237 | 0.327 | 0.244 | 0.257 | 0.282 | 0.036 | 0.028 | 0.197 | 0.229 | 0.260 | 0.263 | 0.244 | 0.238 | 0.269 | 0.222 | 0.232 | 0.258 | 0.000 | 0.000 | 0.239 | 0.000 | 0.000 | 0.157 | 0.295 | 0.294 | 0.232 |
| MSK-3 | 0.176 | 0.093 | 0.093 | 0.172 | 0.107 | 0.114 | 0.169 | 0.159 | 0.165 | 0.116 | 0.060 | 0.095 | 0.147 | 0.069 | 0.080 | 0.100 | 0.071 | 0.044 | 0.184 | 0.146 | 0.124 | 0.180 | 0.114 | 0.119 | 0.103 | 0.090 | 0.166 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.110 | 0.078 | 0.149 |
| MSK-4 | 0.315 | 0.273 | 0.406 | 0.273 | 0.267 | 0.391 | 0.311 | 0.286 | 0.393 | 0.336 | 0.309 | 0.398 | 0.305 | 0.333 | 0.365 | 0.061 | 0.037 | 0.270 | 0.224 | 0.290 | 0.269 | 0.157 | 0.174 | 0.302 | 0.256 | 0.271 | 0.270 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.279 | 0.291 | 0.273 |
| PH2 | 0.718 | 0.675 | 0.778 | 0.714 | 0.692 | 0.766 | 0.678 | 0.733 | 0.771 | 0.686 | 0.680 | 0.762 | 0.682 | 0.669 | 0.647 | 0.401 | 0.190 | 0.580 | 0.647 | 0.619 | 0.700 | 0.470 | 0.474 | 0.640 | 0.696 | 0.651 | 0.719 | 0.000 | 0.000 | 0.317 | 0.000 | 0.000 | 0.204 | 0.740 | 0.648 | 0.808 |
| UDA-1 | 0.396 | 0.384 | 0.386 | 0.359 | 0.349 | 0.400 | 0.368 | 0.347 | 0.394 | 0.359 | 0.380 | 0.419 | 0.330 | 0.316 | 0.336 | 0.089 | 0.057 | 0.281 | 0.279 | 0.335 | 0.360 | 0.179 | 0.277 | 0.351 | 0.310 | 0.350 | 0.330 | 0.000 | 0.000 | 0.286 | 0.000 | 0.000 | 0.259 | 0.365 | 0.396 | 0.348 |
| UDA-2 | 0.479 | 0.345 | 0.465 | 0.420 | 0.287 | 0.455 | 0.382 | 0.299 | 0.596 | 0.523 | 0.336 | 0.438 | 0.522 | 0.364 | 0.540 | 0.390 | 0.141 | 0.514 | 0.393 | 0.328 | 0.375 | 0.376 | 0.379 | 0.427 | 0.516 | 0.298 | 0.414 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.553 | 0.486 | 0.439 |
Table 2: Average MCC values on the test sets for the three optimization algorithms, each using its author-recommended learning rate.
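The spread between the .adam, .rmsprop and .sgd columns reflects how each optimizer converts the same gradient into a weight update. As a hedged, framework-free illustration (the hyper-parameter values below are the common defaults, not necessarily those used in the study), one SGD step and one Adam step look like this in NumPy:

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # vanilla SGD: move against the gradient at a fixed rate
    return w - lr * grad

def adam_step(w, grad, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-7):
    # Adam keeps running moments of the gradient and rescales each step
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad           # first moment (mean)
    v = b2 * v + (1 - b2) * grad ** 2      # second moment (uncentred variance)
    m_hat = m / (1 - b1 ** t)              # bias correction for the warm-up
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

w = np.array([1.0, -2.0])
grad = np.array([0.5, -0.5])
w_sgd = sgd_step(w, grad)
w_adam, _ = adam_step(w, grad, (np.zeros(2), np.zeros(2), 0))
```

RMSprop sits between the two: it normalizes by the second moment like Adam but keeps no first-moment average. The per-coordinate rescaling is why adaptive optimizers react so differently from SGD to a given learning rate.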
Table 3 shows the results obtained after training the twelve CNN models with the three selected optimization algorithms, this time with Adam and RMSprop using the same learning rate as SGD.
| Dataset | DenseNet121.adam | DenseNet121.rmsprop | DenseNet121.sgd | DenseNet169.adam | DenseNet169.rmsprop | DenseNet169.sgd | DenseNet201.adam | DenseNet201.rmsprop | DenseNet201.sgd | InceptionResNetV2.adam | InceptionResNetV2.rmsprop | InceptionResNetV2.sgd | InceptionV3.adam | InceptionV3.rmsprop | InceptionV3.sgd | InceptionV4.adam | InceptionV4.rmsprop | InceptionV4.sgd | MobileNet.adam | MobileNet.rmsprop | MobileNet.sgd | NASNetMobile.adam | NASNetMobile.rmsprop | NASNetMobile.sgd | ResNet50.adam | ResNet50.rmsprop | ResNet50.sgd | VGG16.adam | VGG16.rmsprop | VGG16.sgd | VGG19.adam | VGG19.rmsprop | VGG19.sgd | Xception.adam | Xception.rmsprop | Xception.sgd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 0.484 | 0.513 | 0.553 | 0.492 | 0.502 | 0.547 | 0.478 | 0.497 | 0.628 | 0.492 | 0.449 | 0.550 | 0.470 | 0.468 | 0.591 | 0.013 | 0.000 | 0.462 | 0.459 | 0.451 | 0.463 | 0.452 | 0.400 | 0.578 | 0.000 | 0.000 | 0.410 | 0.000 | 0.000 | 0.383 | 0.000 | 0.000 | 0.363 | 0.403 | 0.492 | 0.479 |
| ISBI2016 | 0.329 | 0.306 | 0.342 | 0.281 | 0.299 | 0.321 | 0.270 | 0.297 | 0.367 | 0.235 | 0.278 | 0.389 | 0.182 | 0.314 | 0.307 | 0.014 | 0.025 | 0.155 | 0.283 | 0.252 | 0.309 | 0.076 | 0.018 | 0.254 | 0.000 | 0.000 | 0.303 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.248 | 0.256 | 0.281 |
| ISBI2017 | 0.215 | 0.202 | 0.179 | 0.227 | 0.234 | 0.224 | 0.251 | 0.210 | 0.217 | 0.172 | 0.176 | 0.284 | 0.141 | 0.195 | 0.225 | 0.011 | 0.013 | 0.089 | 0.178 | 0.193 | 0.095 | 0.160 | 0.061 | 0.230 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.176 | 0.000 | 0.000 | 0.188 | 0.205 | 0.218 | 0.193 |
| MED-NODE | 0.486 | 0.511 | 0.506 | 0.521 | 0.600 | 0.504 | 0.613 | 0.603 | 0.508 | 0.529 | 0.517 | 0.505 | 0.498 | 0.563 | 0.567 | 0.122 | 0.066 | 0.494 | 0.503 | 0.545 | 0.533 | 0.284 | 0.076 | 0.466 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.657 | 0.552 | 0.539 |
| MSK-1 | 0.493 | 0.473 | 0.575 | 0.494 | 0.514 | 0.560 | 0.508 | 0.495 | 0.555 | 0.242 | 0.402 | 0.592 | 0.431 | 0.479 | 0.574 | 0.032 | 0.042 | 0.368 | 0.396 | 0.354 | 0.494 | 0.063 | 0.021 | 0.533 | 0.000 | 0.000 | 0.366 | 0.000 | 0.000 | 0.487 | 0.000 | 0.000 | 0.461 | 0.480 | 0.496 | 0.405 |
| MSK-2 | 0.280 | 0.296 | 0.314 | 0.299 | 0.285 | 0.323 | 0.262 | 0.291 | 0.325 | 0.221 | 0.245 | 0.327 | 0.185 | 0.259 | 0.282 | 0.010 | 0.014 | 0.197 | 0.233 | 0.253 | 0.263 | 0.092 | 0.049 | 0.269 | 0.000 | 0.000 | 0.258 | 0.000 | 0.000 | 0.239 | 0.000 | 0.000 | 0.157 | 0.225 | 0.285 | 0.232 |
| MSK-3 | 0.148 | 0.052 | 0.093 | 0.174 | 0.135 | 0.114 | 0.069 | 0.114 | 0.165 | 0.021 | 0.045 | 0.095 | 0.079 | 0.077 | 0.080 | 0.092 | 0.018 | 0.044 | 0.093 | 0.154 | 0.124 | 0.066 | 0.047 | 0.119 | 0.000 | 0.000 | 0.166 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.080 | 0.115 | 0.149 |
| MSK-4 | 0.303 | 0.323 | 0.406 | 0.304 | 0.291 | 0.391 | 0.266 | 0.346 | 0.393 | 0.222 | 0.214 | 0.398 | 0.178 | 0.318 | 0.365 | 0.023 | 0.036 | 0.270 | 0.278 | 0.282 | 0.269 | 0.072 | 0.035 | 0.302 | 0.000 | 0.000 | 0.270 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.303 | 0.285 | 0.273 |
| PH2 | 0.669 | 0.679 | 0.778 | 0.700 | 0.659 | 0.766 | 0.691 | 0.736 | 0.771 | 0.528 | 0.609 | 0.762 | 0.636 | 0.724 | 0.647 | 0.069 | 0.067 | 0.580 | 0.662 | 0.664 | 0.700 | 0.316 | 0.116 | 0.640 | 0.000 | 0.000 | 0.719 | 0.000 | 0.000 | 0.317 | 0.000 | 0.000 | 0.204 | 0.629 | 0.592 | 0.808 |
| UDA-1 | 0.418 | 0.379 | 0.386 | 0.410 | 0.377 | 0.400 | 0.393 | 0.407 | 0.394 | 0.368 | 0.330 | 0.419 | 0.261 | 0.357 | 0.336 | 0.058 | 0.038 | 0.281 | 0.352 | 0.340 | 0.360 | 0.074 | 0.058 | 0.351 | 0.000 | 0.000 | 0.330 | 0.000 | 0.000 | 0.286 | 0.000 | 0.000 | 0.259 | 0.339 | 0.392 | 0.348 |
| UDA-2 | 0.271 | 0.199 | 0.465 | 0.344 | 0.324 | 0.455 | 0.366 | 0.284 | 0.596 | 0.161 | 0.215 | 0.438 | 0.463 | 0.206 | 0.540 | 0.071 | 0.050 | 0.514 | 0.363 | 0.264 | 0.375 | 0.141 | 0.126 | 0.427 | 0.000 | 0.000 | 0.414 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.436 | 0.234 | 0.439 |
Table 3: Average MCC values on the test sets for the three optimization algorithms, with Adam and RMSprop using the SGD learning rate.
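All result tables report the Matthews correlation coefficient (MCC), which ranges from -1 to 1 and, unlike accuracy, gives no credit to a degenerate model that always predicts the majority class; that failure mode is the usual way a model ends up at 0.000 in these tables. A minimal binary-case implementation:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a binary confusion matrix."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0   # convention: 0 when a margin is empty

# A model that always predicts "benign" on a 90/10 dataset is 90% accurate
# yet earns MCC = 0 because it never identifies a malignant case.
degenerate = mcc(tp=0, tn=90, fp=0, fn=10)   # -> 0.0
decent = mcc(tp=9, tn=85, fp=5, fn=1)        # strong but imperfect model
perfect = mcc(tp=10, tn=90, fp=0, fn=0)      # -> 1.0
```

This imbalance-robustness is why MCC is a sensible headline metric for datasets with the imbalance ratios listed in Table 1.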
Weight balancing
Table 4 summarizes the results obtained after training the twelve CNN models when applying the weight balancing approach.
| Dataset | DenseNet121 | DenseNet169 | DenseNet201 | InceptionResNetV2 | InceptionV3 | InceptionV4 | MobileNet | NASNetMobile | ResNet50 | VGG16 | VGG19 | Xception |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 0.559 | 0.554 | 0.568 | 0.564 | 0.588 | 0.476 | 0.470 | 0.584 | 0.000 | 0.369 | 0.367 | 0.467 |
| ISBI2016 | 0.314 | 0.294 | 0.316 | 0.329 | 0.288 | 0.209 | 0.302 | 0.290 | 0.231 | 0.196 | 0.000 | 0.266 |
| ISBI2017 | 0.235 | 0.242 | 0.270 | 0.281 | 0.255 | 0.092 | 0.216 | 0.229 | 0.177 | 0.199 | 0.181 | 0.183 |
| MED-NODE | 0.496 | 0.462 | 0.464 | 0.561 | 0.540 | 0.532 | 0.575 | 0.514 | 0.000 | 0.000 | 0.000 | 0.534 |
| MSK-1 | 0.559 | 0.544 | 0.545 | 0.595 | 0.607 | 0.300 | 0.524 | 0.550 | 0.427 | 0.431 | 0.153 | 0.407 |
| MSK-2 | 0.282 | 0.303 | 0.276 | 0.323 | 0.265 | 0.202 | 0.281 | 0.235 | 0.197 | 0.256 | 0.223 | 0.221 |
| MSK-3 | 0.088 | 0.162 | 0.129 | 0.162 | 0.167 | 0.069 | 0.121 | 0.090 | 0.250 | 0.000 | 0.000 | 0.133 |
| MSK-4 | 0.318 | 0.329 | 0.311 | 0.360 | 0.331 | 0.270 | 0.252 | 0.288 | 0.244 | 0.218 | 0.139 | 0.259 |
| PH2 | 0.779 | 0.773 | 0.819 | 0.794 | 0.771 | 0.661 | 0.780 | 0.727 | 0.760 | 0.678 | 0.280 | 0.794 |
| UDA-1 | 0.432 | 0.351 | 0.406 | 0.437 | 0.368 | 0.318 | 0.395 | 0.295 | 0.364 | 0.279 | 0.059 | 0.348 |
| UDA-2 | 0.460 | 0.408 | 0.345 | 0.385 | 0.537 | 0.322 | 0.432 | 0.317 | 0.478 | 0.000 | 0.000 | 0.416 |

Table 4: Average MCC values on test sets by using weight balancing.
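Weight balancing counters the imbalance ratios of Table 1 by scaling each class's contribution to the loss inversely to its frequency. A common heuristic is the one behind scikit-learn's `class_weight='balanced'` (whether the study used this exact formula is an assumption):

```python
def balanced_class_weights(counts):
    """weight_c = n_samples / (n_classes * n_c): rarer classes weigh more."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}

# e.g. MSK-3 has roughly a 10.8:1 benign/malignant imbalance (Table 1);
# the counts below are an approximate split of its 225 images.
weights = balanced_class_weights({"benign": 206, "malignant": 19})
# each malignant sample now contributes ~10.8x more to the loss
```

In Keras-style frameworks, such a dictionary is typically passed to the training loop (e.g. as `class_weight`) so that misclassifying a minority-class lesion is penalized proportionally more.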
Transfer learning
Table 5 shows the results of each model after applying transfer learning.
| Dataset | DenseNet121 | DenseNet169 | DenseNet201 | InceptionResNetV2 | InceptionV3 | InceptionV4 | MobileNet | NASNetMobile | ResNet50 | VGG16 | VGG19 | Xception |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 0.750 | 0.757 | 0.756 | 0.674 | 0.686 | 0.724 | 0.725 | 0.709 | 0.315 | 0.546 | 0.424 | 0.702 |
| ISBI2016 | 0.441 | 0.437 | 0.444 | 0.441 | 0.427 | 0.492 | 0.481 | 0.402 | 0.350 | 0.341 | 0.248 | 0.463 |
| ISBI2017 | 0.442 | 0.439 | 0.448 | 0.412 | 0.457 | 0.444 | 0.439 | 0.410 | 0.255 | 0.272 | 0.155 | 0.456 |
| MED-NODE | 0.678 | 0.691 | 0.701 | 0.600 | 0.680 | 0.620 | 0.671 | 0.707 | 0.472 | 0.495 | 0.459 | 0.718 |
| MSK-1 | 0.658 | 0.653 | 0.665 | 0.701 | 0.668 | 0.695 | 0.654 | 0.628 | 0.539 | 0.405 | 0.357 | 0.645 |
| MSK-2 | 0.450 | 0.429 | 0.436 | 0.458 | 0.450 | 0.429 | 0.478 | 0.385 | 0.375 | 0.365 | 0.301 | 0.437 |
| MSK-3 | 0.225 | 0.328 | 0.315 | 0.084 | 0.125 | 0.383 | 0.272 | 0.201 | 0.150 | 0.013 | 0.032 | 0.000 |
| MSK-4 | 0.568 | 0.557 | 0.577 | 0.536 | 0.516 | 0.513 | 0.525 | 0.455 | 0.361 | 0.344 | 0.336 | 0.450 |
| PH2 | 0.872 | 0.909 | 0.861 | 0.716 | 0.738 | 0.781 | 0.834 | 0.618 | 0.467 | 0.635 | 0.543 | 0.773 |
| UDA-1 | 0.549 | 0.513 | 0.514 | 0.501 | 0.509 | 0.527 | 0.507 | 0.515 | 0.521 | 0.449 | 0.256 | 0.535 |
| UDA-2 | 0.332 | 0.397 | 0.465 | 0.325 | 0.406 | 0.404 | 0.471 | 0.322 | 0.484 | 0.485 | 0.443 | 0.357 |

Table 5: Average MCC values on test sets by using transfer learning.
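Transfer learning reuses convolutional filters learned on a large source dataset (typically ImageNet for the architectures above, though the exact source task is an assumption here) and retrains only the classification head on the melanoma data. The NumPy sketch below captures the mechanics, with a frozen random projection standing in for the pretrained convolutional base:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained convolutional base: a fixed (FROZEN) projection.
W_frozen = rng.normal(size=(64, 16))

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)   # ReLU features, never updated

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy target task whose labels are separable in the frozen feature space.
X = rng.normal(size=(200, 64))
F = extract_features(X)                    # computed once: the base is frozen
scores = F @ rng.normal(size=16)
y = (scores > np.median(scores)).astype(float)

# Train ONLY the classification head (a logistic regression).
w_head, b_head = np.zeros(16), 0.0
for _ in range(300):
    p = sigmoid(F @ w_head + b_head)
    g = p - y                              # d(log-loss)/d(logit)
    w_head -= 0.01 * F.T @ g / len(y)
    b_head -= 0.01 * g.mean()

accuracy = ((sigmoid(F @ w_head + b_head) > 0.5) == y).mean()
```

In a real pipeline the frozen base would be a pretrained network with its layers marked non-trainable, optionally followed by fine-tuning of the upper layers at a small learning rate.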
Data augmentation
Table 6 summarizes the results attained by each model when applying data augmentation to the training set only.
| Dataset | DenseNet121 | DenseNet169 | DenseNet201 | InceptionResNetV2 | InceptionV3 | InceptionV4 | MobileNet | NASNetMobile | ResNet50 | VGG16 | VGG19 | Xception |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 0.558 | 0.552 | 0.570 | 0.536 | 0.550 | 0.438 | 0.481 | 0.568 | 0.410 | 0.461 | 0.424 | 0.477 |
| ISBI2016 | 0.318 | 0.301 | 0.278 | 0.322 | 0.316 | 0.260 | 0.298 | 0.286 | 0.303 | 0.205 | 0.235 | 0.242 |
| ISBI2017 | 0.284 | 0.322 | 0.303 | 0.236 | 0.242 | 0.200 | 0.210 | 0.261 | 0.000 | 0.232 | 0.222 | 0.178 |
| MED-NODE | 0.485 | 0.503 | 0.514 | 0.562 | 0.614 | 0.417 | 0.639 | 0.501 | 0.000 | 0.567 | 0.504 | 0.544 |
| MSK-1 | 0.587 | 0.591 | 0.594 | 0.550 | 0.582 | 0.557 | 0.523 | 0.564 | 0.366 | 0.466 | 0.462 | 0.375 |
| MSK-2 | 0.306 | 0.295 | 0.315 | 0.261 | 0.262 | 0.190 | 0.262 | 0.271 | 0.258 | 0.222 | 0.239 | 0.236 |
| MSK-3 | 0.153 | 0.168 | 0.163 | 0.322 | 0.218 | 0.273 | 0.256 | 0.238 | 0.166 | 0.107 | 0.136 | 0.275 |
| MSK-4 | 0.300 | 0.254 | 0.282 | 0.366 | 0.323 | 0.288 | 0.204 | 0.247 | 0.270 | 0.273 | 0.214 | 0.247 |
| PH2 | 0.675 | 0.741 | 0.679 | 0.789 | 0.703 | 0.720 | 0.780 | 0.630 | 0.719 | 0.705 | 0.431 | 0.744 |
| UDA-1 | 0.378 | 0.380 | 0.395 | 0.524 | 0.386 | 0.347 | 0.380 | 0.313 | 0.330 | 0.369 | 0.375 | 0.331 |
| UDA-2 | 0.428 | 0.464 | 0.385 | 0.484 | 0.477 | 0.444 | 0.355 | 0.490 | 0.414 | 0.450 | 0.412 | 0.488 |

Table 6: Average MCC values on test sets by using data augmentation in train.
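Data augmentation enlarges the training set with label-preserving transforms of each image. Since the exact transform family is not detailed in this section, the sketch below assumes a generic flip-and-rotate set over NumPy image arrays:

```python
import numpy as np

def augment(image):
    """Yield label-preserving variants of a square H x W x C image array."""
    variants = [image, np.fliplr(image), np.flipud(image)]
    for k in (1, 2, 3):                     # 90, 180, 270 degree rotations
        variants.append(np.rot90(image, k))
    return variants

def augment_training_set(images, labels):
    """Expand the TRAINING split only; test images stay untouched here."""
    aug_x, aug_y = [], []
    for img, lab in zip(images, labels):
        for v in augment(img):
            aug_x.append(v)
            aug_y.append(lab)               # the label never changes
    return np.stack(aug_x), np.array(aug_y)

train_x = np.zeros((10, 32, 32, 3))         # 10 dummy RGB "lesion" images
train_y = np.array([0, 1] * 5)
big_x, big_y = augment_training_set(train_x, train_y)
# 6 variants per image -> 60 training samples
```

Flips and right-angle rotations are a natural choice for dermoscopic images because a lesion has no canonical orientation; in practice the transforms are usually applied on the fly inside the input pipeline rather than materialized as above.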
Table 7 summarizes the results attained by each model when applying data augmentation to both the training and test sets.
| Dataset | DenseNet121 | DenseNet169 | DenseNet201 | InceptionResNetV2 | InceptionV3 | InceptionV4 | MobileNet | NASNetMobile | ResNet50 | VGG16 | VGG19 | Xception |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 0.761 | 0.750 | 0.753 | 0.912 | 0.873 | 0.798 | 0.760 | 0.874 | 0.510 | 0.661 | 0.649 | 0.921 |
| ISBI2016 | 0.664 | 0.630 | 0.656 | 0.782 | 0.655 | 0.563 | 0.575 | 0.750 | 0.403 | 0.507 | 0.511 | 0.754 |
| ISBI2017 | 0.710 | 0.736 | 0.715 | 0.847 | 0.749 | 0.622 | 0.744 | 0.761 | 0.100 | 0.584 | 0.575 | 0.806 |
| MED-NODE | 0.518 | 0.520 | 0.514 | 0.570 | 0.618 | 0.464 | 0.660 | 0.501 | 0.100 | 0.579 | 0.540 | 0.584 |
| MSK-1 | 0.797 | 0.781 | 0.792 | 0.857 | 0.754 | 0.701 | 0.785 | 0.762 | 0.466 | 0.625 | 0.610 | 0.748 |
| MSK-2 | 0.646 | 0.619 | 0.631 | 0.791 | 0.518 | 0.462 | 0.531 | 0.711 | 0.358 | 0.457 | 0.428 | 0.753 |
| MSK-3 | 0.548 | 0.586 | 0.588 | 0.819 | 0.565 | 0.500 | 0.532 | 0.697 | 0.266 | 0.255 | 0.227 | 0.819 |
| MSK-4 | 0.655 | 0.668 | 0.696 | 0.812 | 0.693 | 0.670 | 0.596 | 0.704 | 0.370 | 0.583 | 0.467 | 0.763 |
| PH2 | 0.771 | 0.819 | 0.778 | 0.907 | 0.840 | 0.849 | 0.902 | 0.835 | 0.819 | 0.841 | 0.587 | 0.936 |
| UDA-1 | 0.484 | 0.524 | 0.501 | 0.688 | 0.489 | 0.445 | 0.535 | 0.524 | 0.430 | 0.572 | 0.555 | 0.587 |
| UDA-2 | 0.472 | 0.503 | 0.408 | 0.521 | 0.471 | 0.453 | 0.403 | 0.490 | 0.514 | 0.450 | 0.412 | 0.538 |

Table 7: Average MCC values on test sets by using data augmentation in train and test.
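Augmenting "in train and test" corresponds to test-time augmentation: the prediction for a test image is averaged over several transformed copies of it, which smooths out orientation-dependent errors. The exact averaging scheme used in the study is an assumption; a minimal sketch with a hypothetical single-image scorer `predict_proba` looks like this:

```python
import numpy as np

def predict_proba(image):
    # hypothetical stand-in for a trained model: scores the mean intensity
    return 1.0 / (1.0 + np.exp(-image.mean()))

def tta_predict(image, transforms):
    """Test-time augmentation: average the model output over variants."""
    probs = [predict_proba(t(image)) for t in transforms]
    return float(np.mean(probs))

transforms = [lambda im: im, np.fliplr, np.flipud,
              lambda im: np.rot90(im, 2)]

img = np.linspace(-1.0, 1.0, 32 * 32).reshape(32, 32)
p = tta_predict(img, transforms)   # averaged malignancy probability
```

With a real CNN, `predict_proba` would run a forward pass per variant; the averaged score is what would be thresholded into a benign/malignant decision.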
Combining transfer learning and data augmentation
Table 8 shows the results of combining transfer learning and data augmentation.
| Dataset | DenseNet121 | DenseNet169 | DenseNet201 | InceptionResNetV2 | InceptionV3 | InceptionV4 | MobileNet | NASNetMobile | ResNet50 | VGG16 | VGG19 | Xception |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 0.944 | 0.937 | 0.954 | 0.931 | 0.940 | 0.874 | 0.945 | 0.934 | 0.870 | 0.763 | 0.601 | 0.959 |
| ISBI2016 | 0.828 | 0.820 | 0.850 | 0.803 | 0.802 | 0.724 | 0.850 | 0.805 | 0.385 | 0.557 | 0.625 | 0.799 |
| ISBI2017 | 0.868 | 0.858 | 0.854 | 0.857 | 0.829 | 0.759 | 0.875 | 0.832 | 0.414 | 0.743 | 0.738 | 0.846 |
| MED-NODE | 0.751 | 0.727 | 0.698 | 0.675 | 0.732 | 0.559 | 0.741 | 0.660 | 0.289 | 0.611 | 0.486 | 0.745 |
| MSK-1 | 0.883 | 0.887 | 0.880 | 0.874 | 0.868 | 0.796 | 0.886 | 0.843 | 0.350 | 0.667 | 0.708 | 0.856 |
| MSK-2 | 0.836 | 0.822 | 0.830 | 0.807 | 0.805 | 0.719 | 0.860 | 0.785 | 0.350 | 0.563 | 0.561 | 0.815 |
| MSK-3 | 0.967 | 0.980 | 1.000 | 0.959 | 0.959 | 0.756 | 1.000 | 1.000 | 0.606 | 0.907 | 0.911 | 0.927 |
| MSK-4 | 0.865 | 0.853 | 0.864 | 0.852 | 0.844 | 0.772 | 0.890 | 0.857 | 0.482 | 0.906 | 0.825 | 0.822 |
| PH2 | 0.987 | 0.960 | 0.960 | 0.939 | 0.963 | 0.833 | 0.963 | 0.934 | 0.836 | 0.909 | 0.923 | 0.944 |
| UDA-1 | 0.718 | 0.721 | 0.764 | 0.725 | 0.720 | 0.680 | 0.781 | 0.706 | 0.463 | 0.601 | 0.585 | 0.692 |
| UDA-2 | 0.305 | 0.459 | 0.522 | 0.464 | 0.413 | 0.327 | 0.577 | 0.548 | 0.485 | 0.425 | 0.477 | 0.362 |

Table 8: Average MCC values on test sets by using transfer learning and data augmentation in train and test.
Complexity of the Convolutional Neural Network structures used in this work
The following tables report metrics that measure the complexity of the CNN models applied in this work: run time and GPU memory usage, computed over all datasets. The models were analysed under the Phase 1 settings, meaning that data augmentation was not performed. Regarding run time, two scenarios were considered: (I) how long a CNN model takes to complete 150 epochs, and (II) how long it takes to complete a full evaluation cycle (150 epochs × 10 folds × 3 repetitions). Run times are expressed in hours and GPU memory in megabytes.
| Dataset | DenseNet121 | DenseNet169 | DenseNet201 | InceptionResNetV2 | InceptionV3 | InceptionV4 | MobileNet | NASNetMobile | ResNet50 | VGG16 | VGG19 | Xception |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 3.88 | 5.14 | 6.50 | 7.83 | 3.87 | 5.46 | 1.68 | 6.03 | 3.29 | 3.81 | 4.38 | 4.62 |
| ISBI2016 | 1.13 | 1.56 | 1.93 | 2.06 | 0.97 | 1.27 | 0.40 | 1.94 | 0.79 | 0.76 | 0.85 | 0.96 |
| ISBI2017 | 1.75 | 2.37 | 2.97 | 3.36 | 1.63 | 2.22 | 0.68 | 2.85 | 1.35 | 1.45 | 1.65 | 1.79 |
| MED-NODE | 0.68 | 0.95 | 1.16 | 1.09 | 0.48 | 0.57 | 0.18 | 1.24 | 0.37 | 0.25 | 0.26 | 0.35 |
| MSK-1 | 1.06 | 1.46 | 1.81 | 1.90 | 0.90 | 1.16 | 0.36 | 1.82 | 0.72 | 0.68 | 0.77 | 0.86 |
| MSK-2 | 1.24 | 1.69 | 2.11 | 2.29 | 1.08 | 1.43 | 0.45 | 2.09 | 0.89 | 0.87 | 0.98 | 1.10 |
| MSK-3 | 0.70 | 1.00 | 1.21 | 1.14 | 0.51 | 0.61 | 0.20 | 1.28 | 0.39 | 0.29 | 0.31 | 0.38 |
| MSK-4 | 0.99 | 1.36 | 1.67 | 1.77 | 0.82 | 1.05 | 0.33 | 1.72 | 0.65 | 0.60 | 0.67 | 0.77 |
| PH2 | 0.69 | 0.97 | 1.18 | 1.12 | 0.50 | 0.58 | 0.19 | 1.25 | 0.38 | 0.27 | 0.28 | 0.37 |
| UDA-1 | 0.84 | 1.17 | 1.44 | 1.49 | 0.65 | 0.81 | 0.26 | 1.47 | 0.52 | 0.43 | 0.47 | 0.56 |
| UDA-2 | 0.66 | 0.91 | 1.11 | 1.01 | 0.44 | 0.50 | 0.16 | 1.17 | 0.33 | 0.21 | 0.22 | 0.30 |
| Average | 1.24 | 1.69 | 2.10 | 2.28 | 1.08 | 1.42 | 0.44 | 2.08 | 0.88 | 0.87 | 0.99 | 1.10 |

Table 9: Run time employed by the CNN models during training for 150 epochs. The run time is expressed in hours.
| Dataset | DenseNet121 | DenseNet169 | DenseNet201 | InceptionResNetV2 | InceptionV3 | InceptionV4 | MobileNet | NASNetMobile | ResNet50 | VGG16 | VGG19 | Xception |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 116.35 | 154.29 | 194.92 | 234.88 | 116.10 | 163.82 | 50.41 | 180.96 | 98.73 | 114.21 | 131.42 | 138.65 |
| ISBI2016 | 34.01 | 46.86 | 57.79 | 61.66 | 29.22 | 38.01 | 11.94 | 58.28 | 23.61 | 22.73 | 25.56 | 28.80 |
| ISBI2017 | 52.36 | 70.99 | 89.13 | 100.80 | 48.87 | 66.62 | 20.49 | 85.43 | 40.51 | 43.40 | 49.50 | 53.58 |
| MED-NODE | 20.46 | 28.48 | 34.86 | 32.57 | 14.47 | 17.09 | 5.46 | 37.18 | 11.05 | 7.44 | 7.79 | 10.37 |
| MSK-1 | 31.93 | 43.84 | 54.45 | 57.11 | 27.13 | 34.91 | 10.95 | 54.73 | 21.58 | 20.48 | 22.98 | 25.91 |
| MSK-2 | 37.13 | 50.74 | 63.22 | 68.74 | 32.50 | 42.83 | 13.40 | 62.68 | 26.59 | 26.17 | 29.48 | 32.99 |
| MSK-3 | 20.99 | 29.92 | 36.25 | 34.29 | 15.42 | 18.29 | 5.89 | 38.49 | 11.75 | 8.60 | 9.24 | 11.53 |
| MSK-4 | 29.72 | 40.77 | 50.04 | 53.01 | 24.48 | 31.55 | 9.88 | 51.61 | 19.63 | 17.92 | 20.01 | 23.04 |
| PH2 | 20.68 | 29.24 | 35.43 | 33.47 | 14.93 | 17.50 | 5.68 | 37.58 | 11.29 | 7.95 | 8.44 | 10.96 |
| UDA-1 | 25.16 | 35.08 | 43.13 | 44.66 | 19.62 | 24.33 | 7.71 | 44.24 | 15.46 | 12.90 | 14.23 | 16.92 |
| UDA-2 | 19.82 | 27.40 | 33.16 | 30.24 | 13.31 | 15.14 | 4.93 | 35.25 | 9.95 | 6.43 | 6.56 | 8.90 |

Table 10: Run time employed by the CNN models to complete an evaluation cycle (150 epochs × 10 folds × 3 repetitions). The run time is expressed in hours.
| CNN | GPU memory (megabytes) | Parameters (millions) |
|---|---|---|
| MobileNet | 671 | 4 |
| NASNetMobile | 703 | 5 |
| ResNet50 | 873 | 25 |
| DenseNet121 | 933 | 8 |
| DenseNet169 | 1015 | 14 |
| DenseNet201 | 1043 | 20 |
| InceptionV3 | 1085 | 23 |
| InceptionV4 | 1241 | 41 |
| Xception | 1271 | 22 |
| InceptionResNetV2 | 1437 | 55 |
| VGG16 | 2051 | 138 |
| VGG19 | 2099 | 143 |

Table 11: GPU memory used by the Convolutional Neural Network models. The models are sorted by GPU memory.
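The parameter counts in Table 11 follow directly from each architecture's layer shapes. As an illustration, the helpers below count a standard convolutional and a fully connected layer (bias included), which is how frameworks such as Keras arrive at the per-model totals:

```python
def conv2d_params(kernel, in_ch, out_ch, bias=True):
    """Parameters of a standard 2-D convolution layer."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)

def dense_params(in_units, out_units, bias=True):
    """Parameters of a fully connected layer."""
    return in_units * out_units + (out_units if bias else 0)

# VGG16's first fully connected layer (7*7*512 -> 4096) alone holds ~103M
# of its ~138M parameters, which explains its position at the bottom of
# Table 11, while its opening 3x3 conv over RGB input is tiny by comparison.
fc1 = dense_params(7 * 7 * 512, 4096)       # 102,764,544 parameters
first_conv = conv2d_params(3, 3, 64)        # 1,792 parameters
```

This is also why mostly-convolutional designs such as MobileNet or DenseNet121 stay in the single-digit millions: they replace large dense layers with global pooling before the classifier.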