A n-spheres based synthetic data generator for supervised classification

Research areas:

Methodology - Machine learning

Year:

2013

Publication type:

Conference paper

Keywords:

synthetic data, data generator, data complexity, ordinal classification, ordinal regression, experimental design

Authors:

Javier Sánchez-Monedero, Pedro Antonio Gutiérrez, María Pérez-Ortiz, César Hervás-Martínez

Volume:

7902

Book title:

International Work Conference on Artificial Neural Networks (IWANN 2013)

Series:

Lecture Notes in Computer Science

Pages:

613-621

Organization:

Tenerife, Spain

Month:

12th-14th June

ISBN:

978-3-642-38678-7

BibTeX:

@conference{nSSDG2013,
author = "Javier S{\'a}nchez-Monedero and Pedro Antonio Guti{\'e}rrez and Mar{\'i}a P{\'e}rez-Ortiz and C{\'e}sar Herv{\'a}s-Mart{\'i}nez",
abstract = "Synthetic datasets can be useful in a variety of situations, specifically when new machine learning models and training algorithms are developed or when trying to seek the weaknesses of a specific method. In contrast to real-world data, synthetic datasets provide a controlled environment for analysing concrete critical points such as outlier tolerance, data dimensionality influence and class imbalance, among others. In this paper, a framework for synthetic data generation is developed with special attention to patterns ordered in the space, data dimensionality, class overlapping and data multimodality. Variables such as position, width and overlapping of data distributions in the n-dimensional space are controlled by considering them as n-spheres. The method is tested in the context of ordinal regression, a paradigm of classification where there is an order arrangement between categories. The contribution of the paper is the full control over data topology and over a set of relevant statistical properties of the data.",
booktitle = "International Work Conference on Artificial Neural Networks (IWANN 2013)",
doi = "10.1007/978-3-642-38679-4_62",
isbn = "978-3-642-38678-7",
keywords = "synthetic data, data generator, data complexity, ordinal classification, ordinal regression, experimental design",
month = "12th-14th June",
organization = "Tenerife, Spain",
pages = "613--621",
publisher = "Springer-Verlag Berlin Heidelberg",
series = "Lecture Notes in Computer Science",
title = "{A} n-spheres based synthetic data generator for supervised classification",
url = "http://dx.doi.org/10.1007/978-3-642-38679-4_62",
volume = "7902",
year = "2013",
}

Abstract:

Synthetic datasets can be useful in a variety of situations, specifically when new machine learning models and training algorithms are developed or when trying to seek the weaknesses of a specific method. In contrast to real-world data, synthetic datasets provide a controlled environment for analysing concrete critical points such as outlier tolerance, data dimensionality influence and class imbalance, among others. In this paper, a framework for synthetic data generation is developed with special attention to patterns ordered in the space, data dimensionality, class overlapping and data multimodality. Variables such as position, width and overlapping of data distributions in the n-dimensional space are controlled by considering them as n-spheres. The method is tested in the context of ordinal regression, a paradigm of classification where there is an order arrangement between categories. The contribution of the paper is the full control over data topology and over a set of relevant statistical properties of the data.
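
The idea of controlling position, width and overlap by treating each class distribution as an n-sphere can be illustrated with a minimal sketch. This is not the authors' generator, whose details are not given on this page. It assumes one ball per class, points drawn uniformly inside each ball, and centers placed in class order along the first axis. The names `sample_in_ball` and `make_ordinal_dataset` and the `overlap` parameter are hypothetical, chosen only for this illustration.

```python
import math
import random


def sample_in_ball(center, radius, rng):
    """Draw one point uniformly from the ball of the given radius around center."""
    n = len(center)
    # A vector of independent Gaussians has a uniformly distributed direction.
    direction = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(d * d for d in direction))
    # Scaling by u**(1/n) makes the points uniform in volume, not in radius.
    r = radius * rng.random() ** (1.0 / n)
    return [c + r * d / norm for c, d in zip(center, direction)]


def make_ordinal_dataset(n_classes, n_dims, per_class, radius, overlap, seed=0):
    """Place one ball per class along the first axis, in class order.

    overlap is a hypothetical knob in [0, 1): 0 means adjacent balls just
    touch, and values closer to 1 push the centers together.
    """
    rng = random.Random(seed)
    spacing = 2.0 * radius * (1.0 - overlap)
    points, labels = [], []
    for k in range(n_classes):
        center = [k * spacing] + [0.0] * (n_dims - 1)
        for _ in range(per_class):
            points.append(sample_in_ball(center, radius, rng))
            labels.append(k)
    return points, labels


# Four ordered classes in three dimensions with moderate overlap.
X, y = make_ordinal_dataset(
    n_classes=4, n_dims=3, per_class=50, radius=1.0, overlap=0.25
)
```

In this sketch the radius plays the role of the distribution width and the center spacing sets how much adjacent classes overlap. Multimodality could be approximated by assigning several balls to one class, which is left out here for brevity.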
