This page provides the datasets and links to the implementations of the machine learning methods used in the paper "Machine learning methods for binary and multiclass classification of melanoma thickness from dermoscopic images", published in IEEE Transactions on Medical Imaging. If you use these datasets, please cite the associated publication.
1. Citation details
Aurora Sáez, Javier Sánchez-Monedero, Pedro Antonio Gutiérrez and César Hervás-Martínez, Machine learning methods for binary and multiclass classification of melanoma thickness from dermoscopic images, IEEE Transactions on Medical Imaging, Volume 35, Issue 4, pp. 1036-1045, 2016. DOI: 10.1109/TMI.2015.2506270
2. Abstract of the paper
Abstract: Thickness of the melanoma is the most important factor associated with survival in patients with melanoma. It is most commonly reported as a measurement of depth given in millimeters (mm), and computed by means of pathological examination after a biopsy of the suspected lesion. In order to avoid the use of an invasive method in the estimation of the thickness of melanoma before surgery, we propose a computational image analysis system from dermoscopic images. The proposed feature extraction is based on the clinical findings that correlate certain characteristics present in dermoscopic images and tumor depth. Two supervised classification schemes are proposed: a binary classification in which melanomas are classified into thin or thick, and a three-class scheme (thin, intermediate, and thick). The performance of several nominal classification methods, among them a recent interpretable method combining logistic regression with artificial neural networks (Logistic regression using Initial variables and Product Units, LIPU), is compared. For the three-class problem, a set of ordinal classification methods (considering the ordering relation between the three classes) is included. For the binary case, LIPU outperforms the other methods with an accuracy of 77.6%, and for the second scheme, the ordinal classification methods achieve a better balance between the accuracies obtained for all classes.
3. Datasets
This section includes the datasets for the binary and ordinal versions of the problem. We include the whole dataset and the 10-fold partitions:
Each file contains one folder per dataset, holding the 10-fold train and generalization (test) partitions. Each partition is provided in three file formats:
- matlab: files used by the ORCA framework.
- weka: Weka file format.
- nnep: JCLEC-NNEP file format (file format description available at Partitions and Source Code section of AYRNA's website)
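As a rough illustration of how a partition in the matlab format might be read outside MATLAB, the sketch below parses a plain-text file with one pattern per row, whitespace-separated numeric features, and the class label in the last column. That layout is an assumption and should be checked against the downloaded files; the function names `parse_partition` and `load_partition` and the sample values are hypothetical.

```python
from typing import List, Tuple


def parse_partition(text: str) -> Tuple[List[List[float]], List[int]]:
    """Parse whitespace-separated numeric rows.

    Assumed layout (not confirmed by this page): one pattern per row,
    features first, integer class label in the last column.
    """
    features: List[List[float]] = []
    labels: List[int] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        *values, label = fields
        features.append([float(v) for v in values])
        labels.append(int(float(label)))
    return features, labels


def load_partition(path: str) -> Tuple[List[List[float]], List[int]]:
    """Read one train or test partition file from disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_partition(handle.read())


# Made-up sample with three features per pattern, for illustration only.
sample = "0.12 0.80 1.5 1\n0.40 0.10 2.0 3\n\n0.33 0.25 0.7 2\n"
X, y = parse_partition(sample)
```

The weka and nnep files follow their own formats and would need a different reader.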
4. Links to nominal and ordinal classification implementation
We use ORCA (Ordinal Regression and Classification Algorithms), a MATLAB framework, for the following methods:
- Kernel Discriminant Analysis (KDA)
- Support Vector Machine for Classification (SVC)
- Support Vector Ordinal Regression with implicit constraints (SVORIM)
- RED-SVM, which applies the reduction from cost-sensitive ordinal ranking to weighted binary classification (RED) framework to SVM
- Kernel Discriminant Learning for Ordinal Regression (KDLOR)
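To give an idea of what the reduction behind RED-SVM does, the sketch below turns one ordinal example into K-1 weighted binary examples, one per threshold k, each asking "is the true rank greater than k?". It is a minimal illustration of the general idea, not the ORCA implementation: the one-hot threshold encoding, the unit weights (which correspond to an absolute-error cost), and the names `extend_for_red` and `rank_from_binary` are assumptions made here.

```python
from typing import List, Sequence, Tuple

BinaryExample = Tuple[List[float], int, float]  # (extended input, label, weight)


def extend_for_red(x: Sequence[float], y: int, n_classes: int) -> List[BinaryExample]:
    """Turn one ordinal example (x, y), y in 1..K, into K-1 binary examples.

    For each threshold k = 1..K-1 the extended input is x followed by a
    one-hot code for k, the label is +1 if y > k and -1 otherwise, and the
    weight is 1.0 (assumed absolute-error cost).
    """
    extended: List[BinaryExample] = []
    for k in range(1, n_classes):
        threshold_code = [1.0 if j == k else 0.0 for j in range(1, n_classes)]
        label = 1 if y > k else -1
        extended.append((list(x) + threshold_code, label, 1.0))
    return extended


def rank_from_binary(answers: Sequence[int]) -> int:
    """Predicted rank = 1 + number of thresholds answered positively."""
    return 1 + sum(1 for a in answers if a > 0)


# One "intermediate" example (rank 2 of 3) with two made-up features.
rows = extend_for_red([0.3, 0.7], y=2, n_classes=3)
```

A single binary classifier trained on the extended examples answers all K-1 threshold questions for a new pattern, and `rank_from_binary` combines those answers into a predicted rank.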
For Logistic regression using Initial variables and Product Units (LIPU) and Product Units Neural Network (PUNN), we used the source code available at http://www.uco.es/grupos/ayrna/en/partitions-and-datasets/#paguitierrez2011ieeetnn.
For Logistic Regression (LR), we used the SimpleLogistic implementation available in Weka.
5. Confusion matrices
As supplementary experimental information, we provide here generalization performance confusion matrices for all the methods in the paper:
|Binary classification problem|
|Ordinal classification problem|
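For readers who want to derive summary figures from a confusion matrix, the sketch below computes overall accuracy and per-class accuracy (sensitivity). It assumes rows are true classes and columns are predicted classes, which should be checked against the published matrices. The matrix used here is made up for illustration and does not come from the paper.

```python
from typing import List, Sequence


def accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Overall accuracy: correctly classified patterns over all patterns."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_sensitivity(cm: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of each true class that was classified correctly.

    Assumes rows are true classes and columns are predicted classes.
    """
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


# Made-up three-class matrix (thin, intermediate, thick), NOT paper results.
example_cm = [
    [40, 5, 0],
    [8, 10, 4],
    [1, 6, 12],
]
overall = accuracy(example_cm)
by_class = per_class_sensitivity(example_cm)
```

Comparing `overall` with the lowest value in `by_class` shows how a method can reach a reasonable overall accuracy while still performing poorly on a minority class.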