Authors
Eduardo Pérez, Oscar Reyes and Sebastián Ventura
Source code is available for download on GitHub.
Abstract
Melanoma is the type of skin cancer with the highest mortality, and it is especially dangerous because it can spread to other parts of the body if not caught and treated early. Melanoma diagnosis is a complex task, even for expert dermatologists, mainly due to the great variety of morphologies in patients' moles. Consequently, automatic melanoma diagnosis poses the challenge of developing efficient computational methods that ease the diagnostic process and thereby aid dermatologists in decision-making. In this work, an extensive analysis was conducted to assess and illustrate the effectiveness of convolutional neural networks in coping with this complex task. To achieve this objective, twelve well-known convolutional network models were evaluated on eleven public image datasets. The experimental study comprised five phases: first, the sensitivity of the models to the optimization algorithm used for training was analyzed; then, the impact on performance of techniques such as cost-sensitive learning, data augmentation and transfer learning was assessed. The study confirmed the usefulness, effectiveness and robustness of different convolutional architectures in solving the melanoma diagnosis problem. It also provides important guidelines for researchers working in this area, easing the selection of both the proper convolutional model and technique according to the characteristics of the data.
Index
- Image datasets for melanoma diagnosis
- Optimization algorithms for training CNNs
- Weight balancing
- Transfer learning
- Data augmentation
- Combining transfer learning and data augmentation
- Complexity of the Convolutional Neural Network structures used in this work
Image datasets for melanoma diagnosis
Because the incidence of melanoma continues to increase worldwide, several private and public datasets have been published in recent years, allowing a better study of this illness and, therefore, the design of better approaches for its automatic diagnosis. The most popular private collections of dermoscopic images are the Interactive Atlas of Dermoscopy (Argenziano et al., 2004), the Dermofit Image Library (Ballerini et al., 2013), and the dataset presented by Esteva et al. (2017), who conducted a comparison with 21 dermatologists using 129,450 clinical images; to the best of our knowledge, this last dataset is the largest reported in the literature.
Regarding public datasets for studying melanoma, the largest collection can be found in the ISIC repository, comprising a total of 23,906 images labeled by expert dermatologists. This repository contains the HAM10000, MSK and UDA datasets, which appeared in (Tschandl et al., 2018), (Codella et al., 2018), and (Gutman et al., 2016), respectively. Furthermore, it provides the datasets released for the annual ISIC challenges (ISIC-2016, ISIC-2017, etc.), which are also commonly used as benchmarks by researchers. Mendonca et al. (2013), on the other hand, created the PH2 dataset, which comprises 200 high-quality dermoscopic images. Finally, Giotis et al. (2015) presented the MED-NODE dataset, which collects 170 non-dermoscopic images shot with common digital cameras; this source of data is particularly important nowadays since the use of technological devices (e.g. smartphones and tablets) is constantly growing.
| Dataset | Source | # Img. | ImbR | IntraC | InterC | DistR | Silhouette |
|---|---|---|---|---|---|---|---|
| HAM10000 | (Tschandl et al., 2018) | 7,818 | 6.024 | 8,705 | 9,770 | 0.891 | 0.213 |
| ISBI2016 | (Gutman et al., 2016) | 1,273 | 4.092 | 10,553 | 10,992 | 0.960 | 0.101 |
| ISBI2017 | (Codella et al., 2018) | 2,745 | 4.259 | 9,280 | 9,674 | 0.959 | 0.089 |
| MED-NODE | (Giotis et al., 2015) | 170 | 1.429 | 9,029 | 9,513 | 0.949 | 0.068 |
| MSK-1 | (Codella et al., 2018) | 1,088 | 2.615 | 11,753 | 14,068 | 0.835 | 0.173 |
| MSK-2 | (Codella et al., 2018) | 1,522 | 3.299 | 9,288 | 9,418 | 0.986 | 0.062 |
| MSK-3 | (Codella et al., 2018) | 225 | 10.842 | 8,075 | 8,074 | 1.000 | 0.112 |
| MSK-4 | (Codella et al., 2018) | 943 | 3.366 | 6,930 | 7,162 | 0.968 | 0.065 |
| PH2 | (Mendonca et al., 2013) | 200 | 4.000 | 12,688 | 14,928 | 0.850 | 0.210 |
| UDA-1 | (Gutman et al., 2016) | 557 | 2.503 | 11,730 | 12,243 | 0.958 | 0.083 |
| UDA-2 | (Gutman et al., 2016) | 60 | 1.609 | 11,297 | 11,601 | 0.974 | 0.020 |

Table 1: Summary of the benchmark datasets. Img: total number of images; ImbR: imbalance ratio between the benign and malignant classes; IntraC: average distance between samples of the same class; InterC: average distance between samples of different classes; DistR: ratio between IntraC and InterC; Silhouette: silhouette score.
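The characterization metrics in Table 1 can be reproduced with a few lines of NumPy. The sketch below is an illustrative assumption about how such metrics are commonly defined (ImbR as the majority/minority count ratio; IntraC and InterC as mean pairwise Euclidean distances), not the authors' exact code:

```python
import numpy as np

def dataset_stats(features, labels):
    """Toy reimplementation of the Table 1 metrics (assumed definitions)."""
    labels = np.asarray(labels)
    counts = np.bincount(labels)
    imb_r = counts.max() / counts.min()      # ImbR: majority/minority ratio

    def avg_dist(a, b):
        # average Euclidean distance over every pair drawn from a and b
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return d.mean()

    benign, malignant = features[labels == 0], features[labels == 1]
    intra_c = (avg_dist(benign, benign) + avg_dist(malignant, malignant)) / 2
    inter_c = avg_dist(benign, malignant)
    return imb_r, intra_c, inter_c, intra_c / inter_c   # last value is DistR

# Tiny synthetic example: 4 "benign" vs 2 "malignant" samples in 2-D.
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [5., 5.], [6., 5.]])
y = np.array([0, 0, 0, 0, 1, 1])
imb_r, intra_c, inter_c, dist_r = dataset_stats(X, y)
```

On real data, `features` would be the image descriptors used for the analysis; the six-sample toy set just keeps the computation inspectable. A DistR well below 1 (as here) indicates classes that are compact relative to their separation.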
Optimization algorithms for training CNNs
Table 2 shows the results obtained after training the twelve CNN models with each of the three selected optimization algorithms (SGD, Adam and RMSprop), using the learning rates recommended by their respective authors.
| Dataset | DenseNet121.adam | DenseNet121.rmsprop | DenseNet121.sgd | DenseNet169.adam | DenseNet169.rmsprop | DenseNet169.sgd | DenseNet201.adam | DenseNet201.rmsprop | DenseNet201.sgd | InceptionResNetV2.adam | InceptionResNetV2.rmsprop | InceptionResNetV2.sgd | InceptionV3.adam | InceptionV3.rmsprop | InceptionV3.sgd | InceptionV4.adam | InceptionV4.rmsprop | InceptionV4.sgd | MobileNet.adam | MobileNet.rmsprop | MobileNet.sgd | NASNetMobile.adam | NASNetMobile.rmsprop | NASNetMobile.sgd | ResNet50.adam | ResNet50.rmsprop | ResNet50.sgd | VGG16.adam | VGG16.rmsprop | VGG16.sgd | VGG19.adam | VGG19.rmsprop | VGG19.sgd | Xception.adam | Xception.rmsprop | Xception.sgd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 0.472 | 0.463 | 0.553 | 0.480 | 0.474 | 0.547 | 0.467 | 0.462 | 0.628 | 0.448 | 0.472 | 0.550 | 0.443 | 0.482 | 0.591 | 0.144 | 0.058 | 0.462 | 0.428 | 0.430 | 0.463 | 0.441 | 0.440 | 0.578 | 0.439 | 0.335 | 0.410 | 0.000 | 0.000 | 0.383 | 0.000 | 0.000 | 0.363 | 0.502 | 0.488 | 0.479 |
| ISBI2016 | 0.246 | 0.239 | 0.342 | 0.269 | 0.214 | 0.321 | 0.256 | 0.259 | 0.367 | 0.263 | 0.232 | 0.389 | 0.293 | 0.259 | 0.307 | 0.061 | 0.027 | 0.155 | 0.190 | 0.171 | 0.309 | 0.227 | 0.223 | 0.254 | 0.255 | 0.219 | 0.303 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.281 | 0.267 | 0.281 |
| ISBI2017 | 0.203 | 0.162 | 0.179 | 0.188 | 0.188 | 0.224 | 0.196 | 0.183 | 0.217 | 0.079 | 0.191 | 0.284 | 0.144 | 0.102 | 0.225 | 0.027 | 0.018 | 0.089 | 0.163 | 0.220 | 0.095 | 0.206 | 0.197 | 0.230 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.176 | 0.000 | 0.000 | 0.188 | 0.212 | 0.218 | 0.193 |
| MED-NODE | 0.559 | 0.530 | 0.506 | 0.579 | 0.561 | 0.504 | 0.529 | 0.560 | 0.508 | 0.542 | 0.579 | 0.505 | 0.551 | 0.537 | 0.567 | 0.396 | 0.258 | 0.494 | 0.377 | 0.394 | 0.533 | 0.415 | 0.314 | 0.466 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.569 | 0.560 | 0.539 |
| MSK-1 | 0.459 | 0.464 | 0.575 | 0.432 | 0.463 | 0.560 | 0.450 | 0.456 | 0.555 | 0.424 | 0.376 | 0.592 | 0.327 | 0.443 | 0.574 | 0.047 | 0.053 | 0.368 | 0.345 | 0.426 | 0.494 | 0.292 | 0.334 | 0.533 | 0.389 | 0.406 | 0.366 | 0.000 | 0.000 | 0.487 | 0.000 | 0.000 | 0.461 | 0.517 | 0.532 | 0.405 |
| MSK-2 | 0.272 | 0.268 | 0.314 | 0.275 | 0.254 | 0.323 | 0.273 | 0.279 | 0.325 | 0.279 | 0.237 | 0.327 | 0.244 | 0.257 | 0.282 | 0.036 | 0.028 | 0.197 | 0.229 | 0.260 | 0.263 | 0.244 | 0.238 | 0.269 | 0.222 | 0.232 | 0.258 | 0.000 | 0.000 | 0.239 | 0.000 | 0.000 | 0.157 | 0.295 | 0.294 | 0.232 |
| MSK-3 | 0.176 | 0.093 | 0.093 | 0.172 | 0.107 | 0.114 | 0.169 | 0.159 | 0.165 | 0.116 | 0.060 | 0.095 | 0.147 | 0.069 | 0.080 | 0.100 | 0.071 | 0.044 | 0.184 | 0.146 | 0.124 | 0.180 | 0.114 | 0.119 | 0.103 | 0.090 | 0.166 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.110 | 0.078 | 0.149 |
| MSK-4 | 0.315 | 0.273 | 0.406 | 0.273 | 0.267 | 0.391 | 0.311 | 0.286 | 0.393 | 0.336 | 0.309 | 0.398 | 0.305 | 0.333 | 0.365 | 0.061 | 0.037 | 0.270 | 0.224 | 0.290 | 0.269 | 0.157 | 0.174 | 0.302 | 0.256 | 0.271 | 0.270 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.279 | 0.291 | 0.273 |
| PH2 | 0.718 | 0.675 | 0.778 | 0.714 | 0.692 | 0.766 | 0.678 | 0.733 | 0.771 | 0.686 | 0.680 | 0.762 | 0.682 | 0.669 | 0.647 | 0.401 | 0.190 | 0.580 | 0.647 | 0.619 | 0.700 | 0.470 | 0.474 | 0.640 | 0.696 | 0.651 | 0.719 | 0.000 | 0.000 | 0.317 | 0.000 | 0.000 | 0.204 | 0.740 | 0.648 | 0.808 |
| UDA-1 | 0.396 | 0.384 | 0.386 | 0.359 | 0.349 | 0.400 | 0.368 | 0.347 | 0.394 | 0.359 | 0.380 | 0.419 | 0.330 | 0.316 | 0.336 | 0.089 | 0.057 | 0.281 | 0.279 | 0.335 | 0.360 | 0.179 | 0.277 | 0.351 | 0.310 | 0.350 | 0.330 | 0.000 | 0.000 | 0.286 | 0.000 | 0.000 | 0.259 | 0.365 | 0.396 | 0.348 |
| UDA-2 | 0.479 | 0.345 | 0.465 | 0.420 | 0.287 | 0.455 | 0.382 | 0.299 | 0.596 | 0.523 | 0.336 | 0.438 | 0.522 | 0.364 | 0.540 | 0.390 | 0.141 | 0.514 | 0.393 | 0.328 | 0.375 | 0.376 | 0.379 | 0.427 | 0.516 | 0.298 | 0.414 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.553 | 0.486 | 0.439 |
Table 2: Average MCC values on the test sets for the three optimization algorithms, each using its author-recommended learning rate.
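The spread between the .adam, .rmsprop and .sgd columns reflects how each optimizer converts the same gradient into a weight update. As a hedged, framework-free illustration (the hyper-parameter values below are the common defaults, not necessarily those used in the study), one SGD step and one Adam step look like this in NumPy:

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # vanilla SGD: move against the gradient at a fixed rate
    return w - lr * grad

def adam_step(w, grad, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-7):
    # Adam keeps running moments of the gradient and rescales each step
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad           # first moment (mean)
    v = b2 * v + (1 - b2) * grad ** 2      # second moment (uncentred variance)
    m_hat = m / (1 - b1 ** t)              # bias correction for the warm-up
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

w = np.array([1.0, -2.0])
grad = np.array([0.5, -0.5])
w_sgd = sgd_step(w, grad)
w_adam, _ = adam_step(w, grad, (np.zeros(2), np.zeros(2), 0))
```

RMSprop sits between the two: it normalizes by the second moment like Adam but keeps no first-moment average. The per-coordinate rescaling is why adaptive optimizers react so differently from SGD to a given learning rate.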
Table 3 shows the results obtained after training the twelve CNN models with the three selected optimization algorithms, this time with Adam and RMSprop using the same learning rate as SGD.
| Dataset | DenseNet121.adam | DenseNet121.rmsprop | DenseNet121.sgd | DenseNet169.adam | DenseNet169.rmsprop | DenseNet169.sgd | DenseNet201.adam | DenseNet201.rmsprop | DenseNet201.sgd | InceptionResNetV2.adam | InceptionResNetV2.rmsprop | InceptionResNetV2.sgd | InceptionV3.adam | InceptionV3.rmsprop | InceptionV3.sgd | InceptionV4.adam | InceptionV4.rmsprop | InceptionV4.sgd | MobileNet.adam | MobileNet.rmsprop | MobileNet.sgd | NASNetMobile.adam | NASNetMobile.rmsprop | NASNetMobile.sgd | ResNet50.adam | ResNet50.rmsprop | ResNet50.sgd | VGG16.adam | VGG16.rmsprop | VGG16.sgd | VGG19.adam | VGG19.rmsprop | VGG19.sgd | Xception.adam | Xception.rmsprop | Xception.sgd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 0.484 | 0.513 | 0.553 | 0.492 | 0.502 | 0.547 | 0.478 | 0.497 | 0.628 | 0.492 | 0.449 | 0.550 | 0.470 | 0.468 | 0.591 | 0.013 | 0.000 | 0.462 | 0.459 | 0.451 | 0.463 | 0.452 | 0.400 | 0.578 | 0.000 | 0.000 | 0.410 | 0.000 | 0.000 | 0.383 | 0.000 | 0.000 | 0.363 | 0.403 | 0.492 | 0.479 |
| ISBI2016 | 0.329 | 0.306 | 0.342 | 0.281 | 0.299 | 0.321 | 0.270 | 0.297 | 0.367 | 0.235 | 0.278 | 0.389 | 0.182 | 0.314 | 0.307 | 0.014 | 0.025 | 0.155 | 0.283 | 0.252 | 0.309 | 0.076 | 0.018 | 0.254 | 0.000 | 0.000 | 0.303 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.248 | 0.256 | 0.281 |
| ISBI2017 | 0.215 | 0.202 | 0.179 | 0.227 | 0.234 | 0.224 | 0.251 | 0.210 | 0.217 | 0.172 | 0.176 | 0.284 | 0.141 | 0.195 | 0.225 | 0.011 | 0.013 | 0.089 | 0.178 | 0.193 | 0.095 | 0.160 | 0.061 | 0.230 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.176 | 0.000 | 0.000 | 0.188 | 0.205 | 0.218 | 0.193 |
| MED-NODE | 0.486 | 0.511 | 0.506 | 0.521 | 0.600 | 0.504 | 0.613 | 0.603 | 0.508 | 0.529 | 0.517 | 0.505 | 0.498 | 0.563 | 0.567 | 0.122 | 0.066 | 0.494 | 0.503 | 0.545 | 0.533 | 0.284 | 0.076 | 0.466 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.657 | 0.552 | 0.539 |
| MSK-1 | 0.493 | 0.473 | 0.575 | 0.494 | 0.514 | 0.560 | 0.508 | 0.495 | 0.555 | 0.242 | 0.402 | 0.592 | 0.431 | 0.479 | 0.574 | 0.032 | 0.042 | 0.368 | 0.396 | 0.354 | 0.494 | 0.063 | 0.021 | 0.533 | 0.000 | 0.000 | 0.366 | 0.000 | 0.000 | 0.487 | 0.000 | 0.000 | 0.461 | 0.480 | 0.496 | 0.405 |
| MSK-2 | 0.280 | 0.296 | 0.314 | 0.299 | 0.285 | 0.323 | 0.262 | 0.291 | 0.325 | 0.221 | 0.245 | 0.327 | 0.185 | 0.259 | 0.282 | 0.010 | 0.014 | 0.197 | 0.233 | 0.253 | 0.263 | 0.092 | 0.049 | 0.269 | 0.000 | 0.000 | 0.258 | 0.000 | 0.000 | 0.239 | 0.000 | 0.000 | 0.157 | 0.225 | 0.285 | 0.232 |
| MSK-3 | 0.148 | 0.052 | 0.093 | 0.174 | 0.135 | 0.114 | 0.069 | 0.114 | 0.165 | 0.021 | 0.045 | 0.095 | 0.079 | 0.077 | 0.080 | 0.092 | 0.018 | 0.044 | 0.093 | 0.154 | 0.124 | 0.066 | 0.047 | 0.119 | 0.000 | 0.000 | 0.166 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.080 | 0.115 | 0.149 |
| MSK-4 | 0.303 | 0.323 | 0.406 | 0.304 | 0.291 | 0.391 | 0.266 | 0.346 | 0.393 | 0.222 | 0.214 | 0.398 | 0.178 | 0.318 | 0.365 | 0.023 | 0.036 | 0.270 | 0.278 | 0.282 | 0.269 | 0.072 | 0.035 | 0.302 | 0.000 | 0.000 | 0.270 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.303 | 0.285 | 0.273 |
| PH2 | 0.669 | 0.679 | 0.778 | 0.700 | 0.659 | 0.766 | 0.691 | 0.736 | 0.771 | 0.528 | 0.609 | 0.762 | 0.636 | 0.724 | 0.647 | 0.069 | 0.067 | 0.580 | 0.662 | 0.664 | 0.700 | 0.316 | 0.116 | 0.640 | 0.000 | 0.000 | 0.719 | 0.000 | 0.000 | 0.317 | 0.000 | 0.000 | 0.204 | 0.629 | 0.592 | 0.808 |
| UDA-1 | 0.418 | 0.379 | 0.386 | 0.410 | 0.377 | 0.400 | 0.393 | 0.407 | 0.394 | 0.368 | 0.330 | 0.419 | 0.261 | 0.357 | 0.336 | 0.058 | 0.038 | 0.281 | 0.352 | 0.340 | 0.360 | 0.074 | 0.058 | 0.351 | 0.000 | 0.000 | 0.330 | 0.000 | 0.000 | 0.286 | 0.000 | 0.000 | 0.259 | 0.339 | 0.392 | 0.348 |
| UDA-2 | 0.271 | 0.199 | 0.465 | 0.344 | 0.324 | 0.455 | 0.366 | 0.284 | 0.596 | 0.161 | 0.215 | 0.438 | 0.463 | 0.206 | 0.540 | 0.071 | 0.050 | 0.514 | 0.363 | 0.264 | 0.375 | 0.141 | 0.126 | 0.427 | 0.000 | 0.000 | 0.414 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.436 | 0.234 | 0.439 |
Table 3: Average MCC values on the test sets for the three optimization algorithms, with Adam and RMSprop using the SGD learning rate.
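All result tables report the Matthews correlation coefficient (MCC), which ranges from -1 to 1 and, unlike accuracy, gives no credit to a degenerate model that always predicts the majority class; that failure mode is the usual way a model ends up at 0.000 in these tables. A minimal binary-case implementation:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a binary confusion matrix."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0   # convention: 0 when a margin is empty

# A model that always predicts "benign" on a 90/10 dataset is 90% accurate
# yet earns MCC = 0 because it never identifies a malignant case.
degenerate = mcc(tp=0, tn=90, fp=0, fn=10)   # -> 0.0
decent = mcc(tp=9, tn=85, fp=5, fn=1)        # strong but imperfect model
perfect = mcc(tp=10, tn=90, fp=0, fn=0)      # -> 1.0
```

This imbalance-robustness is why MCC is a sensible headline metric for datasets with the imbalance ratios listed in Table 1.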
Weight balancing
Table 4 summarizes the results obtained after training the twelve CNN models when applying the weight balancing approach.
| Dataset | DenseNet121 | DenseNet169 | DenseNet201 | InceptionResNetV2 | InceptionV3 | InceptionV4 | MobileNet | NASNetMobile | ResNet50 | VGG16 | VGG19 | Xception |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 0.559 | 0.554 | 0.568 | 0.564 | 0.588 | 0.476 | 0.470 | 0.584 | 0.000 | 0.369 | 0.367 | 0.467 |
| ISBI2016 | 0.314 | 0.294 | 0.316 | 0.329 | 0.288 | 0.209 | 0.302 | 0.290 | 0.231 | 0.196 | 0.000 | 0.266 |
| ISBI2017 | 0.235 | 0.242 | 0.270 | 0.281 | 0.255 | 0.092 | 0.216 | 0.229 | 0.177 | 0.199 | 0.181 | 0.183 |
| MED-NODE | 0.496 | 0.462 | 0.464 | 0.561 | 0.540 | 0.532 | 0.575 | 0.514 | 0.000 | 0.000 | 0.000 | 0.534 |
| MSK-1 | 0.559 | 0.544 | 0.545 | 0.595 | 0.607 | 0.300 | 0.524 | 0.550 | 0.427 | 0.431 | 0.153 | 0.407 |
| MSK-2 | 0.282 | 0.303 | 0.276 | 0.323 | 0.265 | 0.202 | 0.281 | 0.235 | 0.197 | 0.256 | 0.223 | 0.221 |
| MSK-3 | 0.088 | 0.162 | 0.129 | 0.162 | 0.167 | 0.069 | 0.121 | 0.090 | 0.250 | 0.000 | 0.000 | 0.133 |
| MSK-4 | 0.318 | 0.329 | 0.311 | 0.360 | 0.331 | 0.270 | 0.252 | 0.288 | 0.244 | 0.218 | 0.139 | 0.259 |
| PH2 | 0.779 | 0.773 | 0.819 | 0.794 | 0.771 | 0.661 | 0.780 | 0.727 | 0.760 | 0.678 | 0.280 | 0.794 |
| UDA-1 | 0.432 | 0.351 | 0.406 | 0.437 | 0.368 | 0.318 | 0.395 | 0.295 | 0.364 | 0.279 | 0.059 | 0.348 |
| UDA-2 | 0.460 | 0.408 | 0.345 | 0.385 | 0.537 | 0.322 | 0.432 | 0.317 | 0.478 | 0.000 | 0.000 | 0.416 |

Table 4: Average MCC values on test sets by using weight balancing.
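Weight balancing counters the imbalance ratios of Table 1 by scaling each class's contribution to the loss inversely to its frequency. A common heuristic is the one behind scikit-learn's `class_weight='balanced'` (whether the study used this exact formula is an assumption):

```python
def balanced_class_weights(counts):
    """weight_c = n_samples / (n_classes * n_c): rarer classes weigh more."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}

# e.g. MSK-3 has roughly a 10.8:1 benign/malignant imbalance (Table 1);
# the counts below are an approximate split of its 225 images.
weights = balanced_class_weights({"benign": 206, "malignant": 19})
# each malignant sample now contributes ~10.8x more to the loss
```

In Keras-style frameworks, such a dictionary is typically passed to the training loop (e.g. as `class_weight`) so that misclassifying a minority-class lesion is penalized proportionally more.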
Transfer learning
Table 5 shows the results of each model after applying transfer learning.
| Dataset | DenseNet121 | DenseNet169 | DenseNet201 | InceptionResNetV2 | InceptionV3 | InceptionV4 | MobileNet | NASNetMobile | ResNet50 | VGG16 | VGG19 | Xception |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 0.750 | 0.757 | 0.756 | 0.674 | 0.686 | 0.724 | 0.725 | 0.709 | 0.315 | 0.546 | 0.424 | 0.702 |
| ISBI2016 | 0.441 | 0.437 | 0.444 | 0.441 | 0.427 | 0.492 | 0.481 | 0.402 | 0.350 | 0.341 | 0.248 | 0.463 |
| ISBI2017 | 0.442 | 0.439 | 0.448 | 0.412 | 0.457 | 0.444 | 0.439 | 0.410 | 0.255 | 0.272 | 0.155 | 0.456 |
| MED-NODE | 0.678 | 0.691 | 0.701 | 0.600 | 0.680 | 0.620 | 0.671 | 0.707 | 0.472 | 0.495 | 0.459 | 0.718 |
| MSK-1 | 0.658 | 0.653 | 0.665 | 0.701 | 0.668 | 0.695 | 0.654 | 0.628 | 0.539 | 0.405 | 0.357 | 0.645 |
| MSK-2 | 0.450 | 0.429 | 0.436 | 0.458 | 0.450 | 0.429 | 0.478 | 0.385 | 0.375 | 0.365 | 0.301 | 0.437 |
| MSK-3 | 0.225 | 0.328 | 0.315 | 0.084 | 0.125 | 0.383 | 0.272 | 0.201 | 0.150 | 0.013 | 0.032 | 0.000 |
| MSK-4 | 0.568 | 0.557 | 0.577 | 0.536 | 0.516 | 0.513 | 0.525 | 0.455 | 0.361 | 0.344 | 0.336 | 0.450 |
| PH2 | 0.872 | 0.909 | 0.861 | 0.716 | 0.738 | 0.781 | 0.834 | 0.618 | 0.467 | 0.635 | 0.543 | 0.773 |
| UDA-1 | 0.549 | 0.513 | 0.514 | 0.501 | 0.509 | 0.527 | 0.507 | 0.515 | 0.521 | 0.449 | 0.256 | 0.535 |
| UDA-2 | 0.332 | 0.397 | 0.465 | 0.325 | 0.406 | 0.404 | 0.471 | 0.322 | 0.484 | 0.485 | 0.443 | 0.357 |

Table 5: Average MCC values on test sets by using transfer learning.
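Transfer learning reuses convolutional filters learned on a large source dataset (typically ImageNet for the architectures above, though the exact source task is an assumption here) and retrains only the classification head on the melanoma data. The NumPy sketch below captures the mechanics, with a frozen random projection standing in for the pretrained convolutional base:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained convolutional base: a fixed (FROZEN) projection.
W_frozen = rng.normal(size=(64, 16))

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)   # ReLU features, never updated

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy target task whose labels are separable in the frozen feature space.
X = rng.normal(size=(200, 64))
F = extract_features(X)                    # computed once: the base is frozen
scores = F @ rng.normal(size=16)
y = (scores > np.median(scores)).astype(float)

# Train ONLY the classification head (a logistic regression).
w_head, b_head = np.zeros(16), 0.0
for _ in range(300):
    p = sigmoid(F @ w_head + b_head)
    g = p - y                              # d(log-loss)/d(logit)
    w_head -= 0.01 * F.T @ g / len(y)
    b_head -= 0.01 * g.mean()

accuracy = ((sigmoid(F @ w_head + b_head) > 0.5) == y).mean()
```

In a real pipeline the frozen base would be a pretrained network with its layers marked non-trainable, optionally followed by fine-tuning of the upper layers at a small learning rate.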
Data augmentation
Table 6 summarizes the results attained by each model when applying data augmentation to the training set only.
| Dataset | DenseNet121 | DenseNet169 | DenseNet201 | InceptionResNetV2 | InceptionV3 | InceptionV4 | MobileNet | NASNetMobile | ResNet50 | VGG16 | VGG19 | Xception |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 0.558 | 0.552 | 0.570 | 0.536 | 0.550 | 0.438 | 0.481 | 0.568 | 0.410 | 0.461 | 0.424 | 0.477 |
| ISBI2016 | 0.318 | 0.301 | 0.278 | 0.322 | 0.316 | 0.260 | 0.298 | 0.286 | 0.303 | 0.205 | 0.235 | 0.242 |
| ISBI2017 | 0.284 | 0.322 | 0.303 | 0.236 | 0.242 | 0.200 | 0.210 | 0.261 | 0.000 | 0.232 | 0.222 | 0.178 |
| MED-NODE | 0.485 | 0.503 | 0.514 | 0.562 | 0.614 | 0.417 | 0.639 | 0.501 | 0.000 | 0.567 | 0.504 | 0.544 |
| MSK-1 | 0.587 | 0.591 | 0.594 | 0.550 | 0.582 | 0.557 | 0.523 | 0.564 | 0.366 | 0.466 | 0.462 | 0.375 |
| MSK-2 | 0.306 | 0.295 | 0.315 | 0.261 | 0.262 | 0.190 | 0.262 | 0.271 | 0.258 | 0.222 | 0.239 | 0.236 |
| MSK-3 | 0.153 | 0.168 | 0.163 | 0.322 | 0.218 | 0.273 | 0.256 | 0.238 | 0.166 | 0.107 | 0.136 | 0.275 |
| MSK-4 | 0.300 | 0.254 | 0.282 | 0.366 | 0.323 | 0.288 | 0.204 | 0.247 | 0.270 | 0.273 | 0.214 | 0.247 |
| PH2 | 0.675 | 0.741 | 0.679 | 0.789 | 0.703 | 0.720 | 0.780 | 0.630 | 0.719 | 0.705 | 0.431 | 0.744 |
| UDA-1 | 0.378 | 0.380 | 0.395 | 0.524 | 0.386 | 0.347 | 0.380 | 0.313 | 0.330 | 0.369 | 0.375 | 0.331 |
| UDA-2 | 0.428 | 0.464 | 0.385 | 0.484 | 0.477 | 0.444 | 0.355 | 0.490 | 0.414 | 0.450 | 0.412 | 0.488 |

Table 6: Average MCC values on test sets by using data augmentation in train.
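Data augmentation enlarges the training set with label-preserving transforms of each image. Since the exact transform family is not detailed in this section, the sketch below assumes a generic flip-and-rotate set over NumPy image arrays:

```python
import numpy as np

def augment(image):
    """Yield label-preserving variants of a square H x W x C image array."""
    variants = [image, np.fliplr(image), np.flipud(image)]
    for k in (1, 2, 3):                     # 90, 180, 270 degree rotations
        variants.append(np.rot90(image, k))
    return variants

def augment_training_set(images, labels):
    """Expand the TRAINING split only; test images stay untouched here."""
    aug_x, aug_y = [], []
    for img, lab in zip(images, labels):
        for v in augment(img):
            aug_x.append(v)
            aug_y.append(lab)               # the label never changes
    return np.stack(aug_x), np.array(aug_y)

train_x = np.zeros((10, 32, 32, 3))         # 10 dummy RGB "lesion" images
train_y = np.array([0, 1] * 5)
big_x, big_y = augment_training_set(train_x, train_y)
# 6 variants per image -> 60 training samples
```

Flips and right-angle rotations are a natural choice for dermoscopic images because a lesion has no canonical orientation; in practice the transforms are usually applied on the fly inside the input pipeline rather than materialized as above.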
Table 7 summarizes the results attained by each model when applying data augmentation to both the training and test sets.
| Dataset | DenseNet121 | DenseNet169 | DenseNet201 | InceptionResNetV2 | InceptionV3 | InceptionV4 | MobileNet | NASNetMobile | ResNet50 | VGG16 | VGG19 | Xception |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 0.761 | 0.750 | 0.753 | 0.912 | 0.873 | 0.798 | 0.760 | 0.874 | 0.510 | 0.661 | 0.649 | 0.921 |
| ISBI2016 | 0.664 | 0.630 | 0.656 | 0.782 | 0.655 | 0.563 | 0.575 | 0.750 | 0.403 | 0.507 | 0.511 | 0.754 |
| ISBI2017 | 0.710 | 0.736 | 0.715 | 0.847 | 0.749 | 0.622 | 0.744 | 0.761 | 0.100 | 0.584 | 0.575 | 0.806 |
| MED-NODE | 0.518 | 0.520 | 0.514 | 0.570 | 0.618 | 0.464 | 0.660 | 0.501 | 0.100 | 0.579 | 0.540 | 0.584 |
| MSK-1 | 0.797 | 0.781 | 0.792 | 0.857 | 0.754 | 0.701 | 0.785 | 0.762 | 0.466 | 0.625 | 0.610 | 0.748 |
| MSK-2 | 0.646 | 0.619 | 0.631 | 0.791 | 0.518 | 0.462 | 0.531 | 0.711 | 0.358 | 0.457 | 0.428 | 0.753 |
| MSK-3 | 0.548 | 0.586 | 0.588 | 0.819 | 0.565 | 0.500 | 0.532 | 0.697 | 0.266 | 0.255 | 0.227 | 0.819 |
| MSK-4 | 0.655 | 0.668 | 0.696 | 0.812 | 0.693 | 0.670 | 0.596 | 0.704 | 0.370 | 0.583 | 0.467 | 0.763 |
| PH2 | 0.771 | 0.819 | 0.778 | 0.907 | 0.840 | 0.849 | 0.902 | 0.835 | 0.819 | 0.841 | 0.587 | 0.936 |
| UDA-1 | 0.484 | 0.524 | 0.501 | 0.688 | 0.489 | 0.445 | 0.535 | 0.524 | 0.430 | 0.572 | 0.555 | 0.587 |
| UDA-2 | 0.472 | 0.503 | 0.408 | 0.521 | 0.471 | 0.453 | 0.403 | 0.490 | 0.514 | 0.450 | 0.412 | 0.538 |

Table 7: Average MCC values on test sets by using data augmentation in train and test.
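Augmenting "in train and test" corresponds to test-time augmentation: the prediction for a test image is averaged over several transformed copies of it, which smooths out orientation-dependent errors. The exact averaging scheme used in the study is an assumption; a minimal sketch with a hypothetical single-image scorer `predict_proba` looks like this:

```python
import numpy as np

def predict_proba(image):
    # hypothetical stand-in for a trained model: scores the mean intensity
    return 1.0 / (1.0 + np.exp(-image.mean()))

def tta_predict(image, transforms):
    """Test-time augmentation: average the model output over variants."""
    probs = [predict_proba(t(image)) for t in transforms]
    return float(np.mean(probs))

transforms = [lambda im: im, np.fliplr, np.flipud,
              lambda im: np.rot90(im, 2)]

img = np.linspace(-1.0, 1.0, 32 * 32).reshape(32, 32)
p = tta_predict(img, transforms)   # averaged malignancy probability
```

With a real CNN, `predict_proba` would run a forward pass per variant; the averaged score is what would be thresholded into a benign/malignant decision.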
Combining transfer learning and data augmentation
Table 8 shows the results of combining transfer learning and data augmentation.
| Dataset | DenseNet121 | DenseNet169 | DenseNet201 | InceptionResNetV2 | InceptionV3 | InceptionV4 | MobileNet | NASNetMobile | ResNet50 | VGG16 | VGG19 | Xception |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 0.944 | 0.937 | 0.954 | 0.931 | 0.940 | 0.874 | 0.945 | 0.934 | 0.870 | 0.763 | 0.601 | 0.959 |
| ISBI2016 | 0.828 | 0.820 | 0.850 | 0.803 | 0.802 | 0.724 | 0.850 | 0.805 | 0.385 | 0.557 | 0.625 | 0.799 |
| ISBI2017 | 0.868 | 0.858 | 0.854 | 0.857 | 0.829 | 0.759 | 0.875 | 0.832 | 0.414 | 0.743 | 0.738 | 0.846 |
| MED-NODE | 0.751 | 0.727 | 0.698 | 0.675 | 0.732 | 0.559 | 0.741 | 0.660 | 0.289 | 0.611 | 0.486 | 0.745 |
| MSK-1 | 0.883 | 0.887 | 0.880 | 0.874 | 0.868 | 0.796 | 0.886 | 0.843 | 0.350 | 0.667 | 0.708 | 0.856 |
| MSK-2 | 0.836 | 0.822 | 0.830 | 0.807 | 0.805 | 0.719 | 0.860 | 0.785 | 0.350 | 0.563 | 0.561 | 0.815 |
| MSK-3 | 0.967 | 0.980 | 1.000 | 0.959 | 0.959 | 0.756 | 1.000 | 1.000 | 0.606 | 0.907 | 0.911 | 0.927 |
| MSK-4 | 0.865 | 0.853 | 0.864 | 0.852 | 0.844 | 0.772 | 0.890 | 0.857 | 0.482 | 0.906 | 0.825 | 0.822 |
| PH2 | 0.987 | 0.960 | 0.960 | 0.939 | 0.963 | 0.833 | 0.963 | 0.934 | 0.836 | 0.909 | 0.923 | 0.944 |
| UDA-1 | 0.718 | 0.721 | 0.764 | 0.725 | 0.720 | 0.680 | 0.781 | 0.706 | 0.463 | 0.601 | 0.585 | 0.692 |
| UDA-2 | 0.305 | 0.459 | 0.522 | 0.464 | 0.413 | 0.327 | 0.577 | 0.548 | 0.485 | 0.425 | 0.477 | 0.362 |

Table 8: Average MCC values on test sets by using transfer learning and data augmentation in train and test.
Complexity of the Convolutional Neural Network structures used in this work
The following tables report metrics that measure the complexity of the CNN models applied in this work: run time and GPU memory usage, computed over all datasets. The models were analysed under the Phase 1 settings, meaning that data augmentation was not performed. Regarding run time, two scenarios were considered: (I) how long a CNN model takes to complete 150 epochs, and (II) how long it takes to complete a full evaluation cycle (150 epochs × 10 folds × 3 repetitions). Run times are expressed in hours and GPU memory in megabytes.
| Dataset | DenseNet121 | DenseNet169 | DenseNet201 | InceptionResNetV2 | InceptionV3 | InceptionV4 | MobileNet | NASNetMobile | ResNet50 | VGG16 | VGG19 | Xception |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 3.88 | 5.14 | 6.50 | 7.83 | 3.87 | 5.46 | 1.68 | 6.03 | 3.29 | 3.81 | 4.38 | 4.62 |
| ISBI2016 | 1.13 | 1.56 | 1.93 | 2.06 | 0.97 | 1.27 | 0.40 | 1.94 | 0.79 | 0.76 | 0.85 | 0.96 |
| ISBI2017 | 1.75 | 2.37 | 2.97 | 3.36 | 1.63 | 2.22 | 0.68 | 2.85 | 1.35 | 1.45 | 1.65 | 1.79 |
| MED-NODE | 0.68 | 0.95 | 1.16 | 1.09 | 0.48 | 0.57 | 0.18 | 1.24 | 0.37 | 0.25 | 0.26 | 0.35 |
| MSK-1 | 1.06 | 1.46 | 1.81 | 1.90 | 0.90 | 1.16 | 0.36 | 1.82 | 0.72 | 0.68 | 0.77 | 0.86 |
| MSK-2 | 1.24 | 1.69 | 2.11 | 2.29 | 1.08 | 1.43 | 0.45 | 2.09 | 0.89 | 0.87 | 0.98 | 1.10 |
| MSK-3 | 0.70 | 1.00 | 1.21 | 1.14 | 0.51 | 0.61 | 0.20 | 1.28 | 0.39 | 0.29 | 0.31 | 0.38 |
| MSK-4 | 0.99 | 1.36 | 1.67 | 1.77 | 0.82 | 1.05 | 0.33 | 1.72 | 0.65 | 0.60 | 0.67 | 0.77 |
| PH2 | 0.69 | 0.97 | 1.18 | 1.12 | 0.50 | 0.58 | 0.19 | 1.25 | 0.38 | 0.27 | 0.28 | 0.37 |
| UDA-1 | 0.84 | 1.17 | 1.44 | 1.49 | 0.65 | 0.81 | 0.26 | 1.47 | 0.52 | 0.43 | 0.47 | 0.56 |
| UDA-2 | 0.66 | 0.91 | 1.11 | 1.01 | 0.44 | 0.50 | 0.16 | 1.17 | 0.33 | 0.21 | 0.22 | 0.30 |
| Average | 1.24 | 1.69 | 2.10 | 2.28 | 1.08 | 1.42 | 0.44 | 2.08 | 0.88 | 0.87 | 0.99 | 1.10 |

Table 9: Run time employed by the CNN models during training for 150 epochs. The run time is expressed in hours.
| Dataset | DenseNet121 | DenseNet169 | DenseNet201 | InceptionResNetV2 | InceptionV3 | InceptionV4 | MobileNet | NASNetMobile | ResNet50 | VGG16 | VGG19 | Xception |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAM10000 | 116.35 | 154.29 | 194.92 | 234.88 | 116.10 | 163.82 | 50.41 | 180.96 | 98.73 | 114.21 | 131.42 | 138.65 |
| ISBI2016 | 34.01 | 46.86 | 57.79 | 61.66 | 29.22 | 38.01 | 11.94 | 58.28 | 23.61 | 22.73 | 25.56 | 28.80 |
| ISBI2017 | 52.36 | 70.99 | 89.13 | 100.80 | 48.87 | 66.62 | 20.49 | 85.43 | 40.51 | 43.40 | 49.50 | 53.58 |
| MED-NODE | 20.46 | 28.48 | 34.86 | 32.57 | 14.47 | 17.09 | 5.46 | 37.18 | 11.05 | 7.44 | 7.79 | 10.37 |
| MSK-1 | 31.93 | 43.84 | 54.45 | 57.11 | 27.13 | 34.91 | 10.95 | 54.73 | 21.58 | 20.48 | 22.98 | 25.91 |
| MSK-2 | 37.13 | 50.74 | 63.22 | 68.74 | 32.50 | 42.83 | 13.40 | 62.68 | 26.59 | 26.17 | 29.48 | 32.99 |
| MSK-3 | 20.99 | 29.92 | 36.25 | 34.29 | 15.42 | 18.29 | 5.89 | 38.49 | 11.75 | 8.60 | 9.24 | 11.53 |
| MSK-4 | 29.72 | 40.77 | 50.04 | 53.01 | 24.48 | 31.55 | 9.88 | 51.61 | 19.63 | 17.92 | 20.01 | 23.04 |
| PH2 | 20.68 | 29.24 | 35.43 | 33.47 | 14.93 | 17.50 | 5.68 | 37.58 | 11.29 | 7.95 | 8.44 | 10.96 |
| UDA-1 | 25.16 | 35.08 | 43.13 | 44.66 | 19.62 | 24.33 | 7.71 | 44.24 | 15.46 | 12.90 | 14.23 | 16.92 |
| UDA-2 | 19.82 | 27.40 | 33.16 | 30.24 | 13.31 | 15.14 | 4.93 | 35.25 | 9.95 | 6.43 | 6.56 | 8.90 |

Table 10: Run time employed by the CNN models to complete an evaluation cycle (150 epochs × 10 folds × 3 repetitions). The run time is expressed in hours.
| CNN | GPU memory (megabytes) | Parameters (millions) |
|---|---|---|
| MobileNet | 671 | 4 |
| NASNetMobile | 703 | 5 |
| ResNet50 | 873 | 25 |
| DenseNet121 | 933 | 8 |
| DenseNet169 | 1015 | 14 |
| DenseNet201 | 1043 | 20 |
| InceptionV3 | 1085 | 23 |
| InceptionV4 | 1241 | 41 |
| Xception | 1271 | 22 |
| InceptionResNetV2 | 1437 | 55 |
| VGG16 | 2051 | 138 |
| VGG19 | 2099 | 143 |

Table 11: GPU memory used by the Convolutional Neural Network models. The models are sorted by GPU memory.
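The parameter counts in Table 11 follow directly from each architecture's layer shapes. As an illustration, the helpers below count a standard convolutional and a fully connected layer (bias included), which is how frameworks such as Keras arrive at the per-model totals:

```python
def conv2d_params(kernel, in_ch, out_ch, bias=True):
    """Parameters of a standard 2-D convolution layer."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)

def dense_params(in_units, out_units, bias=True):
    """Parameters of a fully connected layer."""
    return in_units * out_units + (out_units if bias else 0)

# VGG16's first fully connected layer (7*7*512 -> 4096) alone holds ~103M
# of its ~138M parameters, which explains its position at the bottom of
# Table 11, while its opening 3x3 conv over RGB input is tiny by comparison.
fc1 = dense_params(7 * 7 * 512, 4096)       # 102,764,544 parameters
first_conv = conv2d_params(3, 3, 64)        # 1,792 parameters
```

This is also why mostly-convolutional designs such as MobileNet or DenseNet121 stay in the single-digit millions: they replace large dense layers with global pooling before the classifier.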