datapro4j

The data processing library for Java

 

 

The programmer's guide

 

Revision: 1

 

 

 

 

 

 

Please cite this document as:

 

J.R. Romero, J.M. Luna, S. Ventura (2012). datapro4j: the data processing library for Java. Dept. of Computer Science and Numerical Analysis, University of Córdoba (Spain). Available for download from http://www.uco.es/grupos/kdis/datapro4j


Knowledge Discovery and Intelligent Systems

University of Córdoba, Spain

http://www.uco.es/grupos/kdis

July 2012


 

 

 

CONTACT INFO

 

José Raúl Romero, PhD

Dept. Computer Science and Numerical Analysis

University of Córdoba, Spain

 

Email: jrromero@uco.es

Web: http://www.jrromero.net/en

 

 

 

PARTICIPANTS (IN ALPHABETICAL ORDER)

 

     de la Torre López, José, BSc. [JTL]

     Luna, José María, MSc. [JML]

     Orozco Borrego, Mario, BSc. [MOB]

     Ramírez Quesada, Aurora, MSc. [ARQ]

 

 

 

PROJECT HISTORY

 

Version | Date | Description | Participants
0.1 | July 2011 | Initial version. Intruder algorithms. | ARQ, JTL, JML, JRR
0.2 | September 2011 | Strategies and columns | MOB, JML, JRR
0.3 | April 2012 | Refactoring, performance improvements and testing | ARQ, JML, JRR
0.4 | Under development | Weka wrappers for preprocessing, association, clustering and classification | JRR
0.5 | Under development | New dataset sources from relational databases and NoSQL databases | JRR

 

 

 

 

DOCUMENT HISTORY

 

Revision | Date | Description | Author
1 | July 17, 2012 | Initial version of this document | JRR


TABLE OF CONTENTS

 

 

Table of figures
Introduction
    Purpose
    Scope
    License
Overview
To-do list
Package es::uco::kdis::datapro
Package es::uco::kdis::datapro::algorithm
Package es::uco::kdis::datapro::algorithm::base
    Class DatasetStrategy
Package es::uco::kdis::datapro::algorithm::intruder
    Class AverageAttack
    Class BandwagonAttack
    Class DatasetStatistics
    Class IntruderAttack
    Class LoveHateAttack
    Class RandomAttack
    Class ReverseBandwagonAttack
    Class SegmentAttack
Package es::uco::kdis::datapro::algorithm::preprocessing
Package es::uco::kdis::datapro::algorithm::preprocessing::discretization
    Class EqualFrequencyDiscretization
    Class EqualWidthDiscretization
    Class MDLPDiscretize
Package es::uco::kdis::datapro::algorithm::preprocessing::instance
    Class RemoveDuplicates
    Class RemovePercentage
Package es::uco::kdis::datapro::algorithm::validation
    Class KFolds
Package es::uco::kdis::datapro::dataset
    Class Dataset
    Class FileDataset
    Class InstanceIterator
    Interface IIterator
Package es::uco::kdis::datapro::dataset::Column
    Class ColumnAbstraction
    Class ColumnImpl
    Enumeration ColumnType
    Class BinaryColumn
    Class BinaryColumnImpl
    Class CategoricalColumnImpl
    Class DateColumn
    Class DateColumnImpl
    Class IntegerColumn
    Class IntegerColumnImpl
    Class NominalColumn
    Class NominalColumnImpl
    Class NumericalColumn
    Class NumericalColumnImpl
    Class RangeColumn
    Class RangeColumnImpl
Package es::uco::kdis::datapro::dataset::Source
    Class ArffDataset
    Class CsvDataset
    Class ExcelDataset
    Class KeelDataset
Package es::uco::kdis::datapro::datatypes
    Class InvalidValue
    Class EmptyValue
    Class MissingValue
    Class NullValue
    Class Range
    Class DoubleRange
Package es::uco::kdis::datapro::exception
    Class IllegalFormatSpecificationException
    Class NoSuchCategoryException
    Class NotAddedValueException
Appendix A: UML diagrams
    Package es.uco.kdis.datapro.algorithm.base
    Package es.uco.kdis.datapro.algorithm.preprocessing
    Package es.uco.kdis.datapro.dataset columns
    Package es.uco.kdis.datapro.dataset.Source
Appendix B: Extending the library
    Project structure
    Code documentation
    Coding recommendations


TABLE OF FIGURES

 

 

Package es.uco.kdis.datapro.algorithm
Package es.uco.kdis.datapro.algorithm.base
Class DatasetStrategy
Package es.uco.kdis.datapro.algorithm.intruder
Package es.uco.kdis.datapro.algorithm.preprocessing
Package es.uco.kdis.datapro.algorithm.preprocessing.discretization
Class EqualFrequencyDiscretization
Class EqualWidthDiscretization
Class MDLPDiscretize
Package es.uco.kdis.datapro.algorithm.preprocessing.instance
Class RemoveDuplicates
Class RemovePercentage
Package es.uco.kdis.datapro.algorithm.validation
Package es.uco.kdis.datapro.dataset
Class Dataset
Class FileDataset
Class InstanceIterator
Interface IIterator
Package es.uco.kdis.datapro.dataset.Column
Abstract class ColumnAbstraction
Abstract class ColumnImpl
Enumeration ColumnType
Class BinaryColumn
Class BinaryColumnImpl
Class CategoricalColumn
Class CategoricalColumnImpl
Class DateColumn
Class DateColumnImpl
Class IntegerColumn
Class IntegerColumnImpl
Class NominalColumn
Class NominalColumnImpl
Class NumericalColumn
Class NumericalColumnImpl
Class RangeColumn
Class RangeColumnImpl
Package es.uco.kdis.datapro.dataset.Source
Class ArffDataset
Class CsvDataset
Class ExcelDataset
Class KeelDataset
Package es.uco.kdis.datapro.datatypes
Class InvalidValue
Class EmptyValue
Class MissingValue
Class NullValue
Class Range
Class DoubleRange
Package es.uco.kdis.datapro.exception
Class IllegalFormatSpecificationException
Class NoSuchCategoryException
Class NotAddedValueException
Class diagram: package overview
Class diagram: package es.uco.kdis.datapro.algorithm.base
Class diagram: Package es.uco.kdis.datapro.algorithm.preprocessing
Class diagram: Package es.uco.kdis.datapro.dataset.Column
Package es.uco.kdis.datapro.dataset.Source
Class diagram: Package es.uco.kdis.datapro.datatypes
Class diagram: Package es.uco.kdis.datapro.exception

 

 

 


 

Introduction

 

 

Purpose

 

This document specifies the classes, interfaces and enumerations of the datapro4j library, detailing each type modeled within the system.

The datapro4j library is designed to fully support the efficient handling of datasets that come from different sources and declare different kinds of data types. This task often takes the Java programmer too long, especially in domains such as data analysis or data mining. Note that the library is not tied to a particular application domain: it is intended for any application that needs to handle structured data from diverse data sources in Java.

Therefore, datapro4j can be used in data mining to handle inputs or preprocess data, using either internal strategies (e.g. algorithms on instances, discretization, etc.) or external tools (e.g. Weka or any other application). It can also be used to handle outputs: for example, to migrate data to other formats, rearrange results from external tools or algorithms, execute statistical tests, etc.

It is worth mentioning that datapro4j is designed to be extended with new algorithms, data formats, column types, etc. All these aspects are independent of each other, so an algorithm can be executed regardless of the format in which its data is supplied (stored in a NoSQL database, as an ARFF file, or in any other form).

 

Scope

 

This document is intended to define the class specification for the datapro4j library.

 

License

 

Copyright © 2012 The authors (University of Córdoba, Spain)

 

This software was developed by members of the Knowledge Discovery and Intelligent Systems research group at the University of Córdoba, Spain. For further information on the library and its modifications, please visit http://www.uco.es/grupos/kdis/datapro4j

 

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

 

 

Redistribution and use of binary forms, with or without modification, are permitted if the following conditions are met:

•  Redistributions of source code must retain the above copyright notice, this list of conditions and the disclaimer above.

•  Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

•  All advertising materials or publications mentioning features or use of this software must display the following acknowledgement: “This product includes software developed by the KDIS Research Group at the University of Córdoba (Spain) and its contributors.” or cite the following reference:

   J.R. Romero, J.M. Luna, S. Ventura (2012). datapro4j: the data processing library for Java. Dept. of Computer Science and Numerical Analysis, University of Córdoba (Spain). Available for download from http://www.uco.es/grupos/kdis/datapro4j

•  Neither the name of the University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

•  Commercial use of this software or part of it is not allowed without specific prior written permission.

•  Licensing and conditions are subject to change without notice.

Note: At the moment this software is provided in binary form as a Java library. Source code is not provided (we plan to release the Java source code in the near future).

 

Overview

This document provides a list of all packages with a summary for each. Each package has a section that lists its classes, interfaces and enumeration types, with a summary for each. Each class and interface section contains a description, summary tables, detailed member descriptions and a relation table.

Private properties are omitted. Protected properties are shown when useful for external programmers.

 

To-do list

 

In the near future, this library will be updated with the following features (not necessarily in this order):

•  Listeners in strategies.

•  Graphical UI (some minor support is already provided).

•  Generation of synthetic datasets under precise constraints.

•  Multipart datasets: datasets that cannot be fully stored in memory, so they need to be split and partially retrieved.

•  Different data mining support.

•  Wrappers for different datasets and tools.

   o  A wrapper for Weka is under development.

•  Access to different databases.

   o  Access through JDBC to RDBMS engines (e.g. MySQL, Oracle) is under development.

   o  Access to NoSQL engines (e.g. Cassandra) is under development.

•  More dataset formats:

   o  Currently, the following formats are supported: ARFF, KEEL, CSV, Excel.

   o  The following formats are under development: XRFF.


 

Package es::uco::kdis::datapro

 

 

The library base package. The software is mainly divided into three different components:

•  Dataset and columns. The logical abstract representation of a dataset and its attributes.

•  Dataset and sources. The physical representation of a dataset, serialized in files, stored in databases or any other device.

•  Dataset and strategies. Any algorithm running on a single dataset, set of datasets or column.

 

Name

datapro

Qualified Name

es::uco::kdis::datapro


 

Package es::uco::kdis::datapro::algorithm

 

 

Only the public strategies are described here. Developers can easily provide their own strategies.

 

Figure 1. Package es.uco.kdis.datapro.algorithm

 

 

Name

algorithm

Qualified Name

es::uco::kdis::datapro::algorithm


 

Package es::uco::kdis::datapro::algorithm::base

 

 

Figure 2. Package es.uco.kdis.datapro.algorithm.base

 

 

Name

base

Qualified Name

es::uco::kdis::datapro::algorithm::base

 

 

Class DatasetStrategy

This class represents a generic strategy.

Strategy is a well-known design pattern in which algorithms are encapsulated in classes. A strategy can be executed either sequentially or step by step. In general, every strategy is executed according to the following sequence:

•  Creation: the strategy constructor should collect all the parameters that the algorithm requires to be initialized and executed for the first time. Build as many constructors as required.

•  Initialization: the method initialize() implements any preprocessing step that the algorithm requires before it can be executed. This preprocessing is not part of the algorithm itself, but it must be carried out before the algorithm is invoked for the first time.

•  Execution: the method execute() runs the algorithm once, using the parameters passed to the constructor and prepared during initialization. If the algorithm has finished and cannot be invoked any more, the method setExecutable(false) should be called. Otherwise, execution is allowed until the stop criteria are fulfilled.

•  Stop criteria: the method isExecutable() returns true if the algorithm can be executed once more over the dataset; false otherwise.

•  Post-execution: any post-processing step has to be implemented by the method postexec().

•  Result collection: final results are collected from the dataset, if changed, and returned by the method getResult().
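The sequence above can be sketched in plain Java. This is a minimal illustration and not the library's code: SketchStrategy is a hypothetical stand-in that only mirrors the operation names documented for DatasetStrategy, while DropNegatives and StrategyRunner are invented for this example and work on a simple list instead of a Dataset.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in mirroring the operations documented for DatasetStrategy. */
abstract class SketchStrategy {
    private boolean executable = true;              // plays the role of bExecutable

    public abstract void initialize();
    public abstract void execute();
    public abstract void postexec();
    public abstract Object getResult();

    public boolean isExecutable() { return executable; }
    protected void setExecutable(boolean value) { executable = value; }
}

/** Toy strategy: removes one negative value per execute() call until none remain. */
class DropNegatives extends SketchStrategy {
    private final List<Integer> values;

    DropNegatives(List<Integer> input) {            // creation: collect the parameters
        this.values = new ArrayList<>(input);
    }

    @Override public void initialize() {            // one-off preprocessing
        if (indexOfNegative() < 0) setExecutable(false);
    }

    @Override public void execute() {               // one step of the algorithm
        int i = indexOfNegative();
        if (i >= 0) values.remove(i);
        if (indexOfNegative() < 0) setExecutable(false);
    }

    @Override public void postexec() { /* nothing to post-process in this toy case */ }

    @Override public Object getResult() { return values; }

    private int indexOfNegative() {
        for (int i = 0; i < values.size(); i++) {
            if (values.get(i) < 0) return i;
        }
        return -1;
    }
}

class StrategyRunner {
    /** Drives a strategy through the documented sequence and returns its result. */
    static Object run(SketchStrategy strategy) {
        strategy.initialize();
        while (strategy.isExecutable()) {
            strategy.execute();
        }
        strategy.postexec();
        return strategy.getResult();
    }
}
```

With this sketch, StrategyRunner.run(new DropNegatives(Arrays.asList(3, -1, 4, -5))) performs two execution steps and returns the list [3, 4].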

 

Figure 3. Class DatasetStrategy

 

 

Name

DatasetStrategy

Qualified Name

es::uco::kdis::datapro::algorithm::base::DatasetStrategy

Visibility

public

Abstract

true

Base Classifier

 

Realized Interface

 

 

Attribute Detail

bExecutable

Execution flag. It is protected only for inheritance purposes and should never be modified directly.

Type

boolean

Default Value

true

Visibility

protected

Multiplicity

 

 

oDataset

Dataset used by the strategy.

Type

Dataset

Default Value

 

Visibility

protected

Multiplicity

 

 

 

Operation Detail

execute

This method is invoked to execute the strategy.

 

Type

void

Visibility

public

Is Abstract

true

Parameter

 

 

getDataset

Getter method for the dataset attribute.

 

Type

Dataset

Visibility

protected

Is Abstract

false

Parameter

 

 

getResult

This method returns an object containing the result of the process.

 

Type

Object

Visibility

public

Is Abstract

true

Parameter

 

 

initialize

This method runs the initialization process of the strategy.

 

 

Type

void

Visibility

public

Is Abstract

true

Parameter

 

 

isExecutable

This method returns true if the strategy is in an executable state.

 

Type

boolean

Visibility

public

Is Abstract

false

Parameter

 

 

postexec

This method should be invoked, if required, after the strategy execution.

 

Type

void

Visibility

public

Is Abstract

true

Parameter

 

 

setDataset

This method sets the dataset to be used by the strategy.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

     inout data : Dataset

 

setExecutable

This method sets the current executable state of the strategy.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

     in bExecutable : boolean

 

Relation Detail

Generalization

 

Name

 

Related Element

     EqualFrequencyDiscretization

 

Name

 

Related Element

     EqualWidthDiscretization

 

Name

 

Related Element

     MDLPDiscretize

 

 

Name

 

Related Element

     RemoveDuplicates

 

Name

 

Related Element

     IntruderAttack

 

Name

 

Related Element

     KFolds

 

Name

 

Related Element

     RemovePercentage

 

Name

 

Related Element

     DatasetStatistics


 

Package es::uco::kdis::datapro::algorithm::intruder

 

 

Figure 4. Package es.uco.kdis.datapro.algorithm.intruder

 

 

Name

intruder

Qualified Name

es::uco::kdis::datapro::algorithm::intruder

 

Class AverageAttack

This class implements the Average Attack. This attack strategy assigns the maximum value (push attack) or the minimum value (nuke attack) to the target item. The filler items are selected randomly, and their values are drawn at random from a normal distribution that uses each item's own mean and standard deviation.

For a further description, see the following paper:

B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, art. 23, 2007.
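As an illustration of the attack profile described above, the following sketch builds a single attack instance over plain arrays. It is not the library's implementation: the class and method names, the use of Double.NaN as a stand-in for a missing value, and the explicit minValue/maxValue bounds are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Random;

class AverageAttackSketch {
    /**
     * Builds one attack instance. Entries that are neither the target nor a
     * filler stay Double.NaN (assumed here to represent a missing value).
     */
    static double[] buildProfile(double[] itemMeans, double[] itemSds, int target,
                                 int[] fillers, boolean push,
                                 double minValue, double maxValue, Random rand) {
        double[] profile = new double[itemMeans.length];
        Arrays.fill(profile, Double.NaN);
        for (int item : fillers) {
            // Normal draw centred on this item's own mean, scaled by its own SD.
            profile[item] = itemMeans[item] + itemSds[item] * rand.nextGaussian();
        }
        // Push attacks promote the target; nuke attacks demote it.
        profile[target] = push ? maxValue : minValue;
        return profile;
    }
}
```

The sketch does not clamp the generated filler values to the valid range of the item; whether the library does so is not stated in this document.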

 

Name

AverageAttack

Qualified Name

es::uco::kdis::datapro::algorithm::intruder::AverageAttack

Visibility

public

Abstract

false

Base Classifier

     IntruderAttack

Realized Interface

 

 

 

Operation Detail

AverageAttack

Parameterized Constructor.

     oDataset The original dataset

     iNumAttacks The number of attack instances

     bPush The attack type (true, push; false, nuke)

     iTarget The target item (The column attribute/item index)

     iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size

     dXRand The probability of choosing an item as a selected/filler item

     iSeed The random seed

 

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     in bPush : boolean

     in dXRand : double

     in iNumAttacks : int

     in iNumFillers : int

     in iSeed : int

     in iTarget : int

     inout oDataset : Dataset

 

chooseSelectedItems

The Average Attack does not use the selected item set.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

initialize

Initialization method.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

setFillerValues

In the Average Attack, the values for filler items must be randomly generated by a Normal Distribution, using the mean and standard deviation of each item.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

setSelectedValues

The Average Attack does not use the selected item set.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

 

Relation Detail

Generalization

 

Name

 

Related Element

     IntruderAttack

 

 

Class BandwagonAttack

This class implements the Bandwagon Attack. This attack strategy assigns the maximum value (push attack) to the target item. Then a set of items, called the selected items, is chosen from among the most visible items.

Visible items are those with a high mean and a high evaluation density. For a further description, see the following paper:

B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, art. 23, 2007.

 

Name

BandwagonAttack

Qualified Name

es::uco::kdis::datapro::algorithm::intruder::BandwagonAttack

Visibility

public

Abstract

false

Base Classifier

     IntruderAttack

Realized Interface

 

 

Attribute Detail

dDensity

The density threshold, i.e. the minimum number of values in the column.

Type

double

Default Value

 

Visibility

protected

Multiplicity

 

 

dVisibility

The visibility threshold, i.e. the possibility of choosing an item to act as a selected item.

 

Type

double

Default Value

 

Visibility

protected

Multiplicity

 

 

rgdMeanSD

It stores the mean and standard deviation of the overall dataset.

 

Type

Double

Default Value

new ArrayList<Double>()

Visibility

protected

Multiplicity

0..*

 

rgoVisibilityColumns

The array of columns whose visibility exceeds the thresholds dVisibility and dDensity.

Type

Integer

Default Value

new ArrayList<Integer>()

Visibility

package

Multiplicity

0..*

 

rgoVisibilityMeans

The array of means of the columns whose visibility exceeds the thresholds dVisibility and dDensity.

 

Type

Double

Default Value

new ArrayList<Double>()

Visibility

package

Multiplicity

0..*

 

 

Operation Detail

BandwagonAttack

Parameterized Constructor:

     oDataset The original dataset

     iNumAttacks The number of attack instances

     iTarget The target item (The column attribute/item index)

     iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size

     iNumSelected The size of the selected item set

     dVisibility The visibility threshold (absolute value of the column mean)

     dDensity The density threshold (absolute number of instances, not counting null, empty or missing values in the column)

     dXRand The probability of choosing an item as a filler item

     iSeed The random seed

 

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     in dDensity : double

     in dVisibility : double

     in dXRand : double

     in iNumAttacks : int

     in iNumFillers : int

     in iNumSelected : int

     in iSeed : int

     in iTarget : int

     inout oDataset : Dataset

 

chooseSelectedItems

Create the set of selected items. The size is fixed by the iNumSelected property.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

initialize

Initialization method for the strategy.

 

 

 

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

orderArray

Order the columns using their mean as comparative metric. This method implements the QuickSort algorithm.

     iInit The initial position of the array

     iEnd The end position in the array

 

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

     in iEnd : int

     in iInit : int
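The following is a minimal sketch of a QuickSort that orders columns by their mean between two positions, as orderArray is described to do. The class name, the parallel-array layout and the descending order (most visible column first) are assumptions made for the example; the document does not state the direction of the ordering or the data structure used by the library.

```java
class MeanOrderSketch {
    /**
     * Sorts positions iInit..iEnd (inclusive) by descending mean.
     * columns[k] is the column index whose mean is means[k]; both arrays
     * are permuted together so they stay aligned.
     */
    static void orderArray(int[] columns, double[] means, int iInit, int iEnd) {
        if (iInit >= iEnd) return;
        double pivot = means[iInit + (iEnd - iInit) / 2];
        int i = iInit;
        int j = iEnd;
        while (i <= j) {
            while (means[i] > pivot) i++;   // already on the larger side
            while (means[j] < pivot) j--;   // already on the smaller side
            if (i <= j) {
                double m = means[i]; means[i] = means[j]; means[j] = m;
                int c = columns[i]; columns[i] = columns[j]; columns[j] = c;
                i++;
                j--;
            }
        }
        orderArray(columns, means, iInit, j);
        orderArray(columns, means, i, iEnd);
    }
}
```

For columns {0, 1, 2, 3} with means {3.5, 4.8, 2.1, 4.0}, a call over positions 0 to 3 leaves the columns in the order {1, 3, 0, 2}.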

 

setFillerValues

In the Bandwagon Attack, the values for filler items must be randomly generated by a Normal Distribution, using the mean and standard deviation of the overall dataset.

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

setSelectedValues

Set the values of selected items. In the Bandwagon Attack, each selected item has the maximum value.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

setVisibilityColumns

Select the columns that exceed both the visibility and the density thresholds.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter
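A possible reading of this selection step is sketched below: a column qualifies when its mean exceeds the visibility threshold and its number of present values exceeds the density threshold. The class name, the column-major double[][] layout and the use of Double.NaN for null, empty or missing values are assumptions made for this example, not the library's data model.

```java
import java.util.ArrayList;
import java.util.List;

class VisibilitySketch {
    /** Returns the indices of the columns that exceed both thresholds. */
    static List<Integer> visibleColumns(double[][] columns,
                                        double dVisibility, double dDensity) {
        List<Integer> visible = new ArrayList<>();
        for (int c = 0; c < columns.length; c++) {
            double sum = 0.0;
            int present = 0;
            for (double v : columns[c]) {
                if (!Double.isNaN(v)) {       // NaN stands in for a missing value
                    sum += v;
                    present++;
                }
            }
            if (present == 0) continue;       // an empty column has no mean
            double mean = sum / present;
            if (mean > dVisibility && present > dDensity) {
                visible.add(c);
            }
        }
        return visible;
    }
}
```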

 

 

 

Relation Detail

Generalization

 

 

Name

 

Related Element

     ReverseBandwagonAttack

 

 

 

Name

 

Related Element

     IntruderAttack

 

 

Class DatasetStatistics

 

Name

DatasetStatistics

Qualified Name

es::uco::kdis::datapro::algorithm::intruder::DatasetStatistics

Visibility

public

Abstract

false

Base Classifier

     DatasetStrategy

Realized Interface

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

DatasetStatistics

Constructor. A parameter is required:

     data Dataset over which the statistical strategy will be executed.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout data : Dataset

 

execute

It executes the algorithm.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

getResult

It returns the mean and SD in form of an ArrayList of Double values.

 

Type

ArrayList<Double>

Visibility

public

Is Abstract

false

Parameter
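The kind of result returned here can be illustrated with a short sketch that computes the mean and standard deviation over every present value of a dataset. The class name, the double[][] layout and the use of Double.NaN for missing values are assumptions. The sketch also assumes the population standard deviation (division by n), since this document does not say which variant the library uses.

```java
import java.util.ArrayList;
import java.util.Arrays;

class MeanSdSketch {
    /** Returns a two-element list: [mean, standard deviation]. */
    static ArrayList<Double> meanAndSd(double[][] rows) {
        double sum = 0.0;
        long count = 0;
        for (double[] row : rows) {
            for (double v : row) {
                if (!Double.isNaN(v)) {       // skip missing values
                    sum += v;
                    count++;
                }
            }
        }
        if (count == 0) {
            return new ArrayList<>(Arrays.asList(Double.NaN, Double.NaN));
        }
        double mean = sum / count;
        double squared = 0.0;
        for (double[] row : rows) {
            for (double v : row) {
                if (!Double.isNaN(v)) {
                    squared += (v - mean) * (v - mean);
                }
            }
        }
        double sd = Math.sqrt(squared / count);   // population SD (assumption)
        return new ArrayList<>(Arrays.asList(mean, sd));
    }
}
```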

 

 

initialize

Initialization/pre-processing method for the strategy.

 

 

 

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

postexec

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

 

Relation Detail

Generalization

 

Name

 

Related Element

     DatasetStrategy

 

 

Class IntruderAttack

IntruderAttack is the abstract base class for all the intruder attack algorithms. This class represents a generic attack used to alter the content of a dataset. It extends DatasetStrategy, whose methods are implemented and adapted to a general intruder strategy.

For a further description see the paper:

B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, art. 23, 2007.

 

Name

IntruderAttack

Qualified Name

es::uco::kdis::datapro::algorithm::intruder::IntruderAttack

Visibility

public

Abstract

true

Base Classifier

     DatasetStrategy

Realized Interface

 

 

Attribute Detail

bPush

bPush represents the version of the algorithm (true, for push attack; false for nuke attack).

 

Type

boolean

Default Value

 

Visibility

protected

Multiplicity

 

 

dXRand

dXRand represents the probability of choosing an item (attribute) as a filler item.

 

 

 

Type

double

Default Value

 

Visibility

protected

Multiplicity

 

 

iActualInstance

iActualInstance represents the dataset instance modified by the attack.

 

 

Type

int

Default Value

 

Visibility

protected

Multiplicity

 

 

iNumAttacks

iNumAttacks represents the number of attack instances that will be generated.

 

 

Type

int

Default Value

 

Visibility

protected

Multiplicity

 

 

iNumFillers

iNumFillers is the number of filler items, -1 if the filler item set size is randomly chosen.

 

Type

int

Default Value

 

Visibility

protected

Multiplicity

 

 

iNumSelected

iNumSelected is the number of selected items, -1 if the selected item set size is randomly chosen.

 

Type

int

Default Value

 

Visibility

protected

Multiplicity

 

 

iSeed

iSeed is the seed for the oRand object.

 

Type

int

Default Value

 

Visibility

protected

Multiplicity

 

 

iTarget

iTarget is the target attribute of the attack.

 

Type

int

Default Value

 

Visibility

protected

Multiplicity

 

 

oInjection

oInjection stores the attack instances.

 

Type

Dataset

Default Value

 

Visibility

protected

Multiplicity

 

 

oRand

oRand represents a random object.

 

Type

Random

Default Value

 

Visibility

protected

Multiplicity

 

 

rgoFillers

rgoFillers is the set of filler items.

 

Type

ColumnAbstraction

Default Value

new ArrayList<ColumnAbstraction>()

Visibility

protected

Multiplicity

0..*

 

rgoSelected

rgoSelected is the set of selected items.

 

Type

ColumnAbstraction

Default Value

new ArrayList<ColumnAbstraction>()

Visibility

protected

Multiplicity

0..*

 

 

Operation Detail

addAttack

Add a new instance (with all items set to the missing value) to the injection.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

 

chooseFillerItems

Select the set of filler items. This set is common for all the intruder attack algorithms.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

chooseSelectedItems

Select the set of selected items. The selection process is part of a specific intruder attack algorithm.

 

Type

void

Visibility

protected

Is Abstract

true

Parameter

 

 

createRandomSetOfFiller

Select a random set of columns to act as filler items. The set size is also randomly selected. It returns the array of dataset columns that will act as filler items.

 

Type

ArrayList<ColumnAbstraction>

Visibility

protected

Is Abstract

false

Parameter

 

 

createSetOfFiller

Select a random set of columns to act as filler items. The set size is fixed by the iNumFillers property. It returns the array of dataset columns that will act as filler items.

 

Type

ArrayList<ColumnAbstraction>

Visibility

protected

Is Abstract

false

Parameter
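The two ways of building the filler set (random size and fixed size) can be sketched as follows. This is only one plausible reading: the class and method names are invented, columns are represented by their indices, the target is assumed never to be a filler, and dXRand is assumed to be the per-column inclusion probability used when the size is random. None of these details are confirmed by this document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class FillerSelectionSketch {
    /** Returns the indices of the columns chosen as filler items. */
    static List<Integer> chooseFillers(int numColumns, int iTarget, int iNumFillers,
                                       double dXRand, Random rand) {
        List<Integer> candidates = new ArrayList<>();
        for (int c = 0; c < numColumns; c++) {
            if (c != iTarget) candidates.add(c);   // assumption: target is excluded
        }
        if (iNumFillers == -1) {
            // Random size: keep each candidate independently with probability dXRand.
            List<Integer> chosen = new ArrayList<>();
            for (int c : candidates) {
                if (rand.nextDouble() < dXRand) chosen.add(c);
            }
            return chosen;
        }
        // Fixed size: shuffle the candidates and keep the first iNumFillers.
        Collections.shuffle(candidates, rand);
        int size = Math.min(iNumFillers, candidates.size());
        return new ArrayList<>(candidates.subList(0, size));
    }
}
```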

 

 

execute

Implements the strategy of attack algorithms.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

getMeanAndSD

Calculate the mean and standard deviation of the overall dataset. It returns an array with two elements, mean and standard deviation.

 

Type

ArrayList<Double>

Visibility

protected

Is Abstract

false

Parameter

 

 

getResult

Returns the injection dataset created by the attack.

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

 

 

initialize

Initialize the algorithm to prepare the execution.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

isSelectedColumn

This method returns true if rgoSelected contains a column whose name matches the sName parameter; false otherwise.

     sName The name of the column to search for

 

Type

boolean

Visibility

protected

Is Abstract

false

Parameter

     inout sName : String

 

postexec

Post-processing after the execute method.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

setFillerValues

This method assigns the correct value to each filler item. The value depends on the specific intruder attack algorithm.

 

Type

void

Visibility

protected

Is Abstract

true

Parameter

 

 

setMaximumValue

Assign the maximum value to the target item.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

setMinimumValue

Assign the minimum value to the target item.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

setSelectedValues

The value generation process for the selected items. It also depends on the specific intruder attack algorithm.

 

Type

void

Visibility

protected

Is Abstract

true

Parameter

 

 

 

Relation Detail

Generalization

 

 

Name

 

Related Element

     AverageAttack

 

Name

 

Related Element

     DatasetStrategy

 

Name

 

Related Element

     RandomAttack

 

Name

 

Related Element

     LoveHateAttack

 

Name

 

Related Element

     BandwagonAttack

 

Name

 

Related Element

     SegmentAttack

 

 

Class LoveHateAttack

This class implements the Love/Hate Attack. This attack strategy sets the maximum value (push attack) or the minimum value (nuke attack) to the target item. The filler items are selected randomly and their values are assigned in the opposite sense of the target item.

 

For a further description see the paper:

B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, art. 23, 2007.

 

Name

LoveHateAttack

Qualified Name

es::uco::kdis::datapro::algorithm::intruder::LoveHateAttack

Visibility

public

Abstract

false

Base Classifier

     IntruderAttack

Realized Interface

 

 

 

Operation Detail

chooseSelectedItems

The Love/Hate Attack does not use the selected items.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

initialize

Initialization method.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

LoveHateAttack

Parameterized Constructor:

     oDataset The original dataset

     iNumAttacks The number of attack instances

     bPush The attack type (true, push; false, nuke)

     iTarget The target item (The column attribute/item index)

     iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size

     dXRand The probability of choosing an item as a selected/filler item

     iSeed The random seed

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     in bPush : boolean

     in dXRand : double

     in iNumAttacks : int

     in iNumFillers : int

     in iSeed : int

     in iTarget : int

     inout oDataset : Dataset

 

setFillerValues

In the Love/Hate Attack, the values for filler items must be assigned in the opposite sense of the type of attack. If it is a push attack, all the filler items will be set to minimum value; if it is a nuke attack, all the filler items will be set to maximum value.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter
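The rule described for setFillerValues can be sketched over a plain array as shown below. The class and method names, the explicit minValue/maxValue bounds and the use of Double.NaN for items that are neither target nor filler are assumptions made for this example.

```java
import java.util.Arrays;

class LoveHateSketch {
    /** Builds one Love/Hate attack instance. */
    static double[] buildProfile(int numItems, int target, int[] fillers,
                                 boolean push, double minValue, double maxValue) {
        double[] profile = new double[numItems];
        Arrays.fill(profile, Double.NaN);              // untouched items stay missing
        double targetValue = push ? maxValue : minValue;
        double fillerValue = push ? minValue : maxValue;   // opposite extreme
        for (int item : fillers) {
            profile[item] = fillerValue;
        }
        profile[target] = targetValue;
        return profile;
    }
}
```

For a push attack on item 0 with fillers {1, 3} and values ranging from 1 to 5, the instance holds 5 for the target, 1 for both fillers and a missing value elsewhere.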

 

 

setSelectedValues

The Love/Hate Attack does not use the selected items.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

 

Relation Detail

 

Generalization

 

Name

 

Related Element

     IntruderAttack

 

 

Class RandomAttack

This class implements the Random Attack. This attack strategy assigns the maximum value (push attack) or the minimum value (nuke attack) to the target item. The filler items are selected randomly and their values are drawn from a Normal Distribution, using the global dataset mean and standard deviation.

For a further description read the article:

B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.

 

Name

RandomAttack

Qualified Name

es::uco::kdis::datapro::algorithm::intruder::RandomAttack

Visibility

public

Abstract

false

Base Classifier

     IntruderAttack

Realized Interface

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

chooseSelectedItems

The Random Attack does not use the selected items.

 

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

initialize

Initialization method.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

RandomAttack

Parameterized Constructor:

    oDataset The original dataset

    iNumAttacks The number of attack instances

    bPush The attack type (true, push; false, nuke)

    iTarget The target item (The column attribute/item index)

    iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size

    dXRand The probability of choosing an item as a selected/filler item

    iSeed The random seed

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     in bPush : boolean

     in dXRand : double

     in iNumAttacks : int

     in iNumFillers : int

     in iSeed : int

     in iTarget : int

     inout oDataset : Dataset

 

setFillerValues

In the Random Attack, the values for filler items must be randomly generated by a Normal Distribution, using the mean and standard deviation of the dataset.
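
As an illustration of this rule, the following sketch computes the mean and standard deviation of a set of ratings and draws filler values from the corresponding Normal Distribution. It is independent of the datapro4j classes: the names RandomFillerSketch, meanAndSd and nextFiller are hypothetical, the population standard deviation is used, and clamping the drawn value to the rating scale is an assumption of this example rather than documented library behaviour.

```java
import java.util.Random;

// Hypothetical helper, not part of datapro4j.
final class RandomFillerSketch {

    /** Returns {mean, population standard deviation} of the given ratings. */
    static double[] meanAndSd(double[] ratings) {
        double sum = 0.0;
        for (double r : ratings) {
            sum += r;
        }
        double mean = sum / ratings.length;

        double squared = 0.0;
        for (double r : ratings) {
            squared += (r - mean) * (r - mean);
        }
        return new double[] { mean, Math.sqrt(squared / ratings.length) };
    }

    /**
     * Draws one filler value from N(mean, sd^2).
     * Assumption: the value is clamped to the rating scale [min, max].
     */
    static double nextFiller(Random rnd, double mean, double sd, double min, double max) {
        double value = mean + sd * rnd.nextGaussian();
        return Math.max(min, Math.min(max, value));
    }
}
```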

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

setSelectedValues

The Random Attack does not use the selected items.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 


 

Relation Detail

 

Generalization

 

 

Name

 

Related Element

     IntruderAttack

 

 

Class ReverseBandwagonAttack

This class implements the Reverse Bandwagon Attack. This attack strategy assigns the minimum value (nuke attack) to the target item. Then, a set of items, named selected items, is chosen from among the less visible items, i.e., those having a low mean and a high evaluation density.

For a better description read the article:

B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.

 

Name

ReverseBandwagonAttack

Qualified Name

es::uco::kdis::datapro::algorithm::intruder::ReverseBandwagonAttack

Visibility

public

Abstract

false

Base Classifier

     BandwagonAttack

Realized Interface

 

 

Operation Detail

chooseSelectedItems

Create the set of selected items. Its size is fixed by the iNumSelected property.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

initialize

Initialization method.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

ReverseBandwagonAttack

Parameterized Constructor:

    oDataset The original dataset

    iNumAttacks The number of attack instances

    iTarget The target item (The column attribute/item index)

    iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size

    iNumSelected The size of the selected item set: -1 for a random size, >0 for a fixed size

    dXVisibility The visibility threshold

    dXDensity The density threshold

    dXRand The probability of choosing an item as a selected/filler item

    iSeed The random seed

 

 

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     in dXDensity : double

     in dXRand : double

     in dXVisibility : double

     in iNumAttacks : int

     in iNumFillers : int

     in iNumSelected : int

     in iSeed : int

     in iTarget : int

     inout oDataset : Dataset

 

setSelectedValues

Set the values of selected items. In the Reverse Bandwagon Attack, each selected item has the minimum value.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

setVisibilityColumns

Select the columns that exceed the visibility and density threshold.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

 

Relation Detail

 

Generalization

 

 

Name

 

Related Element

     BandwagonAttack

 

 

Class SegmentAttack

This class implements the Segment Attack. This attack strategy assigns the maximum value (push attack) to the target item. Then, a set of selected items (the segment) is set to the maximum value. Finally, a set of filler items is randomly chosen and set to the minimum value.

For a better description read the article:

B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. 7(4):1-23, 2007.

 

Name

SegmentAttack

Qualified Name

es::uco::kdis::datapro::algorithm::intruder::SegmentAttack

Visibility

public

Abstract

false

Base Classifier

     IntruderAttack

Realized Interface

 

 

Attribute Detail

rgdMeanSD

rgdMeanSD stores the mean and standard deviation of the overall dataset.

 

Type

Double

Default Value

new ArrayList<Double>()

Visibility

protected

Multiplicity

0..*

 

 

Operation Detail

chooseSelectedItems

Create the segment, that is, the set of selected items, with the information given in rgsNamesOfSelected. It returns the array of dataset columns that will act as selected items.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

initialize

Initialization method.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

SegmentAttack

Parameterized Constructor:

    oDataset The original dataset

    iNumAttacks The number of attack instances

    iTarget The target item (The column attribute/item index)

    iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size

    rgsNamesOfSelected The array with the names of the columns that will act as selected items (the segment)

    dXRand The probability of choosing an item as a selected/filler item

    iSeed The random seed

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     in dXRand : double

     in iNumAttacks : int

     in iNumFillers : int

     in iSeed : int

     in iTarget : int

    inout oDataset : Dataset

    inout rgsNamesOfSelected : ArrayList<String>

 

 

 

 

setFillerValues

Set the value for filler items. In the Segment Attack, the minimum value is assigned.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

setSelectedValues

Set the values for the selected items. In the Segment Attack, the maximum value is assigned.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

 

Relation Detail

 

Generalization

 

Name

 

Related Element

     IntruderAttack


 

Package es::uco::kdis::datapro::algorithm::preprocessing

 

Figure 5. Package es.uco.kdis.datapro.algorithm.preprocessing

 

 

Name

preprocessing

Qualified Name

es::uco::kdis::datapro::algorithm::preprocessing


 

Package es::uco::kdis::datapro::algorithm::preprocessing::discretization

 

 

Figure 6. Package es.uco.kdis.datapro.algorithm.preprocessing.discretization

 

 

Name

discretization

Qualified Name

es::uco::kdis::datapro::algorithm::preprocessing::discretization

 

Class EqualWidthDiscretization

Equal-width discretization of a given numerical/integer column of the dataset. A RangeColumn is returned. Notice that EqualFrequencyDiscretization inherits from this class.

Figure 8. Class EqualWidthDiscretization

 

 

 

 

Name

EqualWidthDiscretization

Qualified Name

es::uco::kdis::datapro::algorithm::preprocessing::discretization::EqualWidthDiscretization

Visibility

public

Abstract

false

Base Classifier

     DatasetStrategy

Realized Interface

 

 

Attribute Detail

iBins

iBins is the number of bins.

 

Type

int

Default Value

 

Visibility

protected

Multiplicity

 

 

oCol

The column to be discretized.

 

Type

NumericalColumn

Default Value

 

Visibility

protected

Multiplicity

 

 

oRangeColumn

The column returned as result.

 

Type

RangeColumn

Default Value

 

Visibility

protected

Multiplicity

 

 

sColName

The name of the column to be discretized.

 

Type

String

Default Value

 

Visibility

protected

Multiplicity

 

 

sResName

The name of the resulting column.

 

Type

String

Default Value

 

Visibility

protected

Multiplicity

 

 

 

Operation Detail

calculateDRangeColumn

This (protected) method creates a new RangeColumn taking both the intervals given as parameter and the values comprised by the original numerical column.

    aoRanges Array of intervals

    sName Name of the new column

 

It returns the resulting RangeColumn.

 

Type

RangeColumn

Visibility

protected

Is Abstract

false

Parameter

     inout aoRanges : DoubleRange

     inout sName : String

 

EqualWidthDiscretization

Parameterized Constructor:

    oDataset The dataset to be processed.

    iBins The number of bins.

    sColName The name of the column to be processed.

    sResName The name of the resulting column.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     in iBins : int

     inout oDataset : Dataset

     inout sColName : String

     inout sResName : String

 

execute

This method runs the discretization process. Firstly, it calculates the cut-points and sets the range intervals.
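
The cut-point computation of an equal-width discretization can be sketched as follows. This is not the library code: the class EqualWidthSketch and its methods are hypothetical, plain double arrays stand in for NumericalColumn and RangeColumn, and the convention that every interval is closed on the left (with the last one also closed on the right) is an assumption of this example.

```java
// Hypothetical helper, not part of datapro4j.
final class EqualWidthSketch {

    /** Returns bins + 1 cut-points that split [min, max] into intervals of equal width. */
    static double[] cutPoints(double[] values, int bins) {
        if (values.length == 0 || bins <= 0) {
            throw new IllegalArgumentException("values must be non-empty and bins > 0");
        }
        double min = values[0];
        double max = values[0];
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / bins;
        double[] cuts = new double[bins + 1];
        for (int i = 0; i <= bins; i++) {
            cuts[i] = min + i * width;
        }
        cuts[bins] = max; // avoid rounding drift on the upper bound
        return cuts;
    }

    /** Returns the index of the interval [cuts[i], cuts[i + 1]) that contains v. */
    static int binOf(double v, double[] cuts) {
        int bins = cuts.length - 1;
        for (int i = 0; i < bins - 1; i++) {
            if (v < cuts[i + 1]) {
                return i;
            }
        }
        return bins - 1; // the last interval also includes the maximum value
    }
}
```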

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

getResult

The discretized RangeColumn is returned.

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

 

 

initialize

The initialization method. Types of the column and its values are checked.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

postexec

Not required.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

 

Relation Detail

 

Generalization

 

Name

 

Related Element

     EqualFrequencyDiscretization

 

Name

 

Related Element

     DatasetStrategy

 

 

Class EqualFrequencyDiscretization

Equal-frequency discretization of a given numerical/integer column of the dataset. A RangeColumn is returned.

Figure 7. Class EqualFrequencyDiscretization

 

Name

EqualFrequencyDiscretization

Qualified Name

es::uco::kdis::datapro::algorithm::preprocessing::discretization::EqualFrequencyDiscretization

Visibility

public

Abstract

false

Base Classifier

     DatasetStrategy

     EqualWidthDiscretization

Realized Interface

 

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

 

Notice that this class inherits from EqualWidthDiscretization.

 

EqualFrequencyDiscretization

Parametrized constructor.

Parameters:

    iBins Number of bins to be created

    oDataset Source dataset containing the column to be discretized

    sColName Name of the source column

    sResName Name of the resulting Range column

 

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     in iBins : int

     inout oDataset : Dataset

     inout sColName : String

     inout sResName : String

 

execute

This method performs the equal-frequency discretization of the column passed as parameter.
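
As a rough illustration of discretization by frequency, the sketch below picks the cut-points from the sorted values so that each interval receives approximately the same number of instances. The class EqualFrequencySketch is hypothetical and does not reproduce the datapro4j implementation; in particular, how ties and remainders are distributed is an assumption of this example.

```java
import java.util.Arrays;

// Hypothetical helper, not part of datapro4j.
final class EqualFrequencySketch {

    /** Returns bins + 1 cut-points taken from the sorted values at regular positions. */
    static double[] cutPoints(double[] values, int bins) {
        if (values.length == 0 || bins <= 0) {
            throw new IllegalArgumentException("values must be non-empty and bins > 0");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);

        double[] cuts = new double[bins + 1];
        cuts[0] = sorted[0];
        cuts[bins] = sorted[sorted.length - 1];
        for (int i = 1; i < bins; i++) {
            // Position of the first value that belongs to interval i.
            int idx = (int) ((long) i * sorted.length / bins);
            cuts[i] = sorted[idx];
        }
        return cuts;
    }
}
```

With the eight values 1, 2, 3, 4, 6, 7, 8, 9 and four bins, the cut-points are 1, 3, 6, 8 and 9, so each interval holds two values.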

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

 

Relation Detail

 

Generalization

 

Name

 

Related Element

     DatasetStrategy

 

Name

 

Related Element

     EqualWidthDiscretization

 

Class MDLPDiscretize

Figure 9. Class MDLPDiscretize

 

 

Name

MDLPDiscretize

Qualified Name

es::uco::kdis::datapro::algorithm::preprocessing::discretization::MDLPDiscretize

Visibility

public

Abstract

false

Base Classifier

     DatasetStrategy

Realized Interface

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

execute

This method runs the discretization process following the MDLP algorithm.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

 

getResult

It returns the discretized dataset.

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

 

 

initialize

The initialize() strategy method. It takes the whole dataset and distributes each column into a LinkedList of double arrays, where the first value is the concrete value of the column and the second value is the associated label.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

 

MDLPDiscretize

Constructor with parameters:

    oDataset source dataset

 

 

Note: class labels are supposed to be in the last column of the dataset.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout oDataset : Dataset

 

postexec

The postexec() strategy method

 

 

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

 

Relation Detail

 

Generalization

 

 

Name

 

Related Element

     DatasetStrategy


 

Package es::uco::kdis::datapro::algorithm::preprocessing::instance

 

 

Figure 10. Package es.uco.kdis.datapro.algorithm.preprocessing.instance

 

 

 

Name

instance

Qualified Name

es::uco::kdis::datapro::algorithm::preprocessing::instance

 

Class RemoveDuplicates

This class modifies the content of a Dataset by removing duplicate instances from this dataset.
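
The effect of removing duplicate instances can be sketched on a plain list of rows. The class RemoveDuplicatesSketch is hypothetical and unrelated to the actual implementation; it assumes that two instances are duplicates when they are equal according to equals(), and that the first occurrence is the one kept.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of datapro4j.
final class RemoveDuplicatesSketch {

    /** Returns the rows without duplicates, keeping the first occurrence and the original order. */
    static <T> List<T> distinct(List<T> rows) {
        // A LinkedHashSet drops repeated elements while preserving insertion order.
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }
}
```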

 

Figure 11. Class RemoveDuplicates

 

 

Name

RemoveDuplicates

Qualified Name

es::uco::kdis::datapro::algorithm::preprocessing::instance::RemoveDuplicates

Visibility

public

Abstract

false

Base Classifier

     DatasetStrategy

Realized Interface

 

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

execute

Execution method.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

getResult

It returns the clean dataset.

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

 

 

initialize

Initialize the algorithm to prepare the execution.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

postexec

Post-processing.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

RemoveDuplicates

Parameterized Constructor:

    oDataset The source dataset to work with.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout oDataset : Dataset

 

 

Relation Detail

 

Generalization

 

 

Name

 

Related Element

     DatasetStrategy

 

 

Class RemovePercentage

This class modifies the content of a dataset by removing a percentage of its instances.


 

Figure 12. Class RemovePercentage

 

 

Name

RemovePercentage

Qualified Name

es::uco::kdis::datapro::algorithm::preprocessing::instance::RemovePercentage

Visibility

public

Abstract

false

Base Classifier

     DatasetStrategy

Realized Interface

 

 

Attribute Detail

RANDOM

RANDOM mode, when instances to be removed are randomly selected.

 

 

Type

int

Default Value

0

Visibility

public

Multiplicity

 

 

FROMINIT

FROMINIT mode, when instances to be removed are taken from the beginning of the column.

 

 

Type

int

Default Value

1

Visibility

public

Multiplicity

 

 

FROMEND

FROMEND mode, when instances to be removed are taken from the end of the column.

 

Type

int

Default Value

2

Visibility

public

Multiplicity

 

 

oRnd

oRnd is the random generator object.

 

Type

Random

Default Value

new Random()

Visibility

public

Multiplicity

 

 

 

Operation Detail

execute

Execute method.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

getResult

Return the resulting dataset from the strategy process.

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

 

 

initialize

Initialize the algorithm to prepare the execution.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

postexec

Post-processing method.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

RemovePercentage

Parameterized Constructor:

    oDataset The source dataset

    iMode The mode of removal

    dPercentage The percentage of instances (in [0,1]) to remove from the dataset
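
The three removal modes can be illustrated on a plain list of rows. The sketch below is not the library code: the class RemovePercentageSketch is hypothetical, the constants simply mirror the values documented for RANDOM, FROMINIT and FROMEND, and rounding the number of removed rows to the nearest integer is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of datapro4j.
final class RemovePercentageSketch {

    static final int RANDOM = 0;
    static final int FROMINIT = 1;
    static final int FROMEND = 2;

    /** Returns a copy of rows without the given fraction (in [0,1]) of its elements. */
    static <T> List<T> remove(List<T> rows, int mode, double percentage, Random rnd) {
        if (percentage < 0.0 || percentage > 1.0) {
            throw new IllegalArgumentException("percentage must be in [0,1]");
        }
        // Assumption: the number of rows to remove is rounded to the nearest integer.
        int toRemove = (int) Math.round(rows.size() * percentage);
        List<T> result = new ArrayList<>(rows);

        switch (mode) {
            case FROMINIT:
                result.subList(0, toRemove).clear();
                break;
            case FROMEND:
                result.subList(result.size() - toRemove, result.size()).clear();
                break;
            case RANDOM:
                for (int k = 0; k < toRemove; k++) {
                    result.remove(rnd.nextInt(result.size())); // removes by index
                }
                break;
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
        return result;
    }
}
```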

 

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     in dPercentage : double

     in iMode : int

     inout oDataset : Dataset

 

 

Relation Detail

 

Generalization

 

Name

 

Related Element

     DatasetStrategy


 

Package es::uco::kdis::datapro::algorithm::validation  

 

Figure 13. Package es.uco.kdis.datapro.algorithm.validation

 

 

Name

validation

Qualified Name

es::uco::kdis::datapro::algorithm::validation

 

 

Class KFolds

This class implements the strategy that calculates the different partitions of the dataset using the KFolds algorithm.

Figure 14. Class es.uco.kdis.datapro.algorithm.validation.KFolds

 

Name

KFolds

Qualified Name

es::uco::kdis::datapro::algorithm::validation::KFolds

Visibility

public

Abstract

false

Base Classifier

     DatasetStrategy

Realized Interface

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

execute

It runs the KFolds algorithm. After the execution, the algorithm is not executable anymore.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

getResult

This method returns the list containing the resulting dataset partitions.

 

Type

List<Object>

Visibility

public

Is Abstract

false

Parameter

 

 

 

 

initialize

This method initializes the algorithm. The instances are sorted as a HashMap by categories.
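
One common way of building partitions after grouping the instances by category is to deal the rows of each category across the folds in turn, so that every fold keeps a similar class distribution. The sketch below illustrates that idea only; the class StratifiedFoldsSketch is hypothetical, rows are represented by their indexes, and whether KFolds distributes the instances exactly in this way is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper, not part of datapro4j.
final class StratifiedFoldsSketch {

    /** Splits the row indexes 0..labels.length-1 into k folds, class by class. */
    static List<List<Integer>> folds(String[] labels, int k, long seed) {
        // Group the row indexes by class label (insertion order kept for reproducibility).
        Map<String, List<Integer>> byClass = new LinkedHashMap<>();
        for (int i = 0; i < labels.length; i++) {
            byClass.computeIfAbsent(labels[i], key -> new ArrayList<>()).add(i);
        }

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }

        // Shuffle each class with the seeded generator and deal its rows round-robin.
        Random rnd = new Random(seed);
        int next = 0;
        for (List<Integer> rows : byClass.values()) {
            Collections.shuffle(rows, rnd);
            for (int row : rows) {
                folds.get(next % k).add(row);
                next++;
            }
        }
        return folds;
    }
}
```

Calling the method twice with the same seed yields the same partitions, which is the purpose of a seed parameter in this kind of process.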

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

KFolds

Parameterized constructor. Notice that the class column is supposed to be the last column in the dataset:

     oDataset Source dataset

     iNumberOfPartitions Number of partitions to be built

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     in iNumberOfPartitions : int

     inout oDataset : Dataset

 

KFolds

Parameterized constructor. Notice that the class column is supposed to be the last column in the dataset:

     oDataset Source dataset

     iNumberOfPartitions Number of partitions to be built

     iSeed The seed of the partition process. To reproduce a previous partition, pass the seed that was used to create it. Otherwise, the seed is randomly selected.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     in iNumberOfPartitions : int

     inout oDataset : Dataset

 

postexec

Not required.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

 

Relation Detail

 

Generalization

 

 

Name

 

Related Element

     DatasetStrategy


 

Package es::uco::kdis::datapro::dataset

 

 

Figure 15. Package es.uco.kdis.datapro.dataset

 

 

Name

dataset

Qualified Name

es::uco::kdis::datapro::dataset

 

 

Class Dataset

Dataset is the abstract base class for all the different types of dataset sources. This class fills the gap between the physical dataset (stored in a file, database, etc.) and its logical handling, where the access to attributes/columns and processing methods is provided.


 

Figure 16. Class Dataset

 

Name

Dataset

Qualified Name

es::uco::kdis::datapro::dataset::Dataset

Visibility

public

Abstract

true

Base Classifier

 

Realized Interface

 

 

Attribute Detail

iCursor

iCursor refers to the row being pointed in the dataset by the InstanceIterator.

 

 

 

Type

int

Default Value

 

Visibility

protected

Multiplicity

 

 

rgoColumns

rgoColumns is the list of columns that comprise the dataset.

 

Type

ColumnAbstraction

Default Value

 

Visibility

protected

Multiplicity

0..*

 

rgoValidBinaryFalseValues

For binary columns, it contains the list of values that will be interpreted as False when reading from the physical dataset. Writing will be performed using the first element in the list.

 

Type

String

Default Value

 

Visibility

protected

Multiplicity

0..*

 

rgoValidBinaryTrueValues

For binary columns, it contains the list of values that will be interpreted as True when reading from the physical dataset. Writing will be performed using the first element in the list.

 

Type

String

Default Value

 

Visibility

protected

Multiplicity

0..*

 

sOpenRangeDelimiter

For range columns, sOpenRangeDelimiter stores the symbol(s) that open the numerical range, right before the minimum value: e.g., '[' for [2,3]. This is used during the reading and writing of the physical dataset.

 

Type

String

Default Value

 

Visibility

protected

Multiplicity

 

 

sSeparationRangeDelimiter

For range columns, sSeparationRangeDelimiter stores the symbol(s) that separate the minimum and maximum values in a numerical range: e.g., ',' for [2,3]. This value is only used during the reading and writing of the physical dataset.

 

 

 

Type

String

Default Value

 

Visibility

protected

Multiplicity

 

 

sCloseRangeDelimiter

For range columns, sCloseRangeDelimiter stores the symbol(s) that serves to close the numerical range, right after the maximum value: e.g., ']' for [2,3]. This is only used during the reading and writing of the physical dataset.

 

Type

String

Default Value

 

Visibility

protected

Multiplicity

 


sEmptyValue

sEmptyValue stores the string that will represent an empty value in the dataset file.

 

Type

String

Default Value

 

Visibility

protected

Multiplicity

 

 

sMissingValue

sMissingValue stores the string that will represent a missing value in the dataset file.

 

Type

String

Default Value

 

Visibility

protected

Multiplicity

 

 

sNullValue

sNullValue stores the string that will represent a null value in the dataset file.

 

Type

String

Default Value

 

Visibility

protected

Multiplicity

 

 

sName

The name of the dataset.

 

Type

String

Default Value

 

Visibility

protected

Multiplicity

 

 

 

Operation Detail

addAllValues

A set of column values are inserted into the dataset structure. Notice that instance duplication is not checked.

Parameters:

     sColumnFormat String that specifies the types of the columns to be added. Types depend on the specific dataset.

 

Exceptions:

     IOException

     IllegalFormatSpecificationException

     NotAddedValueException

     IndexOutOfBoundsException

 

Type

void

Visibility

protected

Is Abstract

true

Parameter

     inout sColumnFormat : String

 

addColumn

Insert the column abstraction given as parameter in the last position of the list of columns of the dataset.

 

Parameter:

     oColumn: Column abstraction to be added

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout oColumn : ColumnAbstraction

 

addColumn

Insert a column abstraction in a given position of the list of dataset columns.

Parameters:

     oColumn: Column abstraction to be inserted

     iIndex: Position index where the column element is added in the list. The rest of column items will be shifted one position to the right.

 

Exceptions:

     UnsupportedOperationException

     ClassCastException

     NullPointerException

     IllegalArgumentException

     IndexOutOfBoundsException

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout iIndex : int

     inout oColumn : ColumnAbstraction

 

clone

Create a new dataset with exactly the same metadata and column structure. Only the structure is copied: instances from the original dataset are not added to the new one.

 

It returns the empty cloned dataset.

 

Type

Dataset

Visibility

public

Is Abstract

false

Parameter

 

 

close

Abstract method that serves to close the physical dataset source.

Exceptions:

     IOException

 

 

Type

void

Visibility

protected

Is Abstract

true

Parameter

 

 

 

copy

This method creates a new dataset with exactly the same metadata, column structure and data as the original dataset. In this case, instances from the original dataset are also copied to the new one.

A copy of the dataset is returned.

 

Type

Dataset

Visibility

public

Is Abstract

false

Parameter

 

 

Dataset

This is the default constructor of this class. By default, it sets the following parameters to their default values:

     sMissingValue: "?"

     sNullValue: "?"

     sEmptyValue: "?"

     sOpenRangeDelimiter: "["

     sSeparationRangeDelimiter: ","

     sCloseRangeDelimiter: "]"

 

Notice that using these symbols is not mandatory for reading/writing, as its applicability depends on the specific implementation of each source dataset.

 

 

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

getColumn

This method looks for a column abstraction by its index in the column list. Notice that indexes can change when one column is added or removed to/from intermediate positions.

Parameter:

     iIndex: Index of the queried column.

 

It returns a reference to the column abstraction queried.

 

Type

ColumnAbstraction

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

 

 

getColumnByName

This method returns the first column found with the name given as parameter.

Parameter:

     sName: The name of the column queried (not case-sensitive)

 

It returns the column abstraction that gives access to the column with the given name.

 

Type

ColumnAbstraction

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

getColumns

Getter method for the private property rgoColumns, which comprises the array of column abstractions in the dataset.

 

Type

List<ColumnAbstraction>

Visibility

public

Is Abstract

false

Parameter

 

 

getEmptyValue

Getter method for the private property sEmptyValue, which comprises the String that represents the symbol for the empty value in the dataset source. Notice that each specific dataset implementation can use, or not, this property accordingly.

 

Type

String

Visibility

public

Is Abstract

false

Parameter

 

 

 

getIndexOfColumn

Given a column abstraction, it searches for the index that this column occupies in the array of column abstractions in the dataset.

Parameter:

     oCol: Column to be located.

 

It returns the index of the column abstraction passed as parameter, or -1 if the column is not found.

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout oCol : ColumnAbstraction

 

getMissingValue

Getter method for the private property sMissingValue, which comprises the String that represents the symbol for the missing value in the dataset source. Notice that each specific dataset implementation can use, or not, this property accordingly.

 

Type

String

Visibility

public

Is Abstract

false

Parameter

 

 

getName

Getter method for the private property sName, which represents the name given to the dataset.

 

Type

String

Visibility

public

Is Abstract

false

Parameter

 

 

getNullValue

Getter method for the private property sNullValue, which comprises the String that represents the symbol for the null value in the dataset source. Notice that each specific dataset implementation can use or not this property accordingly.

 

Type

String

Visibility

public

Is Abstract

false

Parameter

 

 

getNumberOfDecimals

Getter method for the private property iNumberOfDecimals, which indicates the number of decimal digits used when writing numerical columns in dataset sources. Notice that this value can be used accordingly by each specific dataset source.

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

getRangeDelimiters

This method gets a list of the three values used to demarcate a range, comprising the sOpenRangeDelimiter, sSeparationRangeDelimiter and sCloseRangeDelimiter. Notice that each specific dataset source could make use of these values accordingly.
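
To show how the three delimiters could take part in reading and writing a range value, the following sketch formats and parses a range such as [2,3]. The class RangeTextSketch and its methods are hypothetical and do not belong to the library; each dataset source is free to handle these symbols differently.

```java
// Hypothetical helper, not part of datapro4j.
final class RangeTextSketch {

    /** Writes a range as open + min + separator + max + close. */
    static String format(double min, double max, String open, String sep, String close) {
        return open + min + sep + max + close;
    }

    /** Reads a range written with the given delimiters and returns {min, max}. */
    static double[] parse(String text, String open, String sep, String close) {
        if (text.length() < open.length() + close.length()
                || !text.startsWith(open) || !text.endsWith(close)) {
            throw new IllegalArgumentException("not a range: " + text);
        }
        String inner = text.substring(open.length(), text.length() - close.length());
        int cut = inner.indexOf(sep);
        if (cut < 0) {
            throw new IllegalArgumentException("missing separator: " + text);
        }
        double min = Double.parseDouble(inner.substring(0, cut).trim());
        double max = Double.parseDouble(inner.substring(cut + sep.length()).trim());
        return new double[] { min, max };
    }
}
```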

 

Type

ArrayList<String>

Visibility

public

Is Abstract

false

Parameter

 

 

getValidBinaryFalseValues

Getter method for the private property rgoValidBinaryFalseValues: the list of strings that are interpreted as false when reading and writing Boolean values in a dataset source. Notice that its specific use depends on the implementation made for each dataset source.

 

Type

ArrayList<String>

Visibility

public

Is Abstract

false

Parameter

 

 

getValidBinaryTrueValues

Getter method for the private property rgoValidBinaryTrueValues: the list of strings that are interpreted as true when reading and writing Boolean values in a dataset source. Notice that its specific use depends on the implementation made for each dataset source.

 

Type

ArrayList<String>

Visibility

public

Is Abstract

false

Parameter

 

 

merge

This method merges two datasets by adding the dataset passed as parameter to the current one.

Parameters:

     oDSInjected: The dataset to be added. Notice that this dataset must contain the same number and type of columns as this dataset object.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout oDSInjected : Dataset

 

merge

This method merges two datasets by adding the dataset passed as parameter to the dataset object this.

Parameters:

     oDataset: The dataset to be added.

     sColumnFormat: Sometimes the target dataset contains more columns than the source dataset. In those cases, the columns to be added can be explicitly specified. This parameter is a String that indicates the columns to be added. Each character in the String corresponds to a column in the target dataset. The String may comprise the following characters:

          x: Include this column.

          %: Skip this column.
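
The column format described above can be read as a mask over the target columns. The following sketch turns such a String into the list of included column indexes. The class ColumnFormatSketch is hypothetical, and rejecting any character other than x and % is an assumption of this example, not documented behaviour of merge.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of datapro4j.
final class ColumnFormatSketch {

    /** Returns the indexes of the target columns marked with 'x' in the format. */
    static List<Integer> includedColumns(String format) {
        List<Integer> included = new ArrayList<>();
        for (int i = 0; i < format.length(); i++) {
            char c = format.charAt(i);
            if (c == 'x') {
                included.add(i);          // include this column
            } else if (c != '%') {        // '%' means skip this column
                throw new IllegalArgumentException(
                        "unexpected character '" + c + "' at position " + i);
            }
        }
        return included;
    }
}
```

For example, the format "x%xx" selects the target columns 0, 2 and 3 and skips column 1.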

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout oDataset : Dataset

     inout sColumnFormat : String

 

open

Abstract protected method. This method just opens the source dataset and initializes the row cursor to the first row of data. However, each specific dataset class is responsible for its implementation, and thus defining its real scope, according to its specific properties.

Notice that each type of datasets will provide specific methods to process the full dataset. For example, file datasets provide the method readDataset.

Exceptions:

     FileNotFoundException

     IOException

     IllegalFormatSpecificationException

 

Type

void

Visibility

protected

Is Abstract

true

Parameter

 

 

removeColumn

This method removes a column from the dataset. Notice that column indexes can be modified (decreased) for the rest of columns. The column removed is returned.

Parameter:

     iIndex: Position index where the column to be removed is located.

 

Exceptions:

     UnsupportedOperationException

     IndexOutOfBoundsException

 

Type

ColumnAbstraction

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

 

setColumns

Setter method for the property rgoColumns. Even when it is a public method, notice that it should be used very carefully, mainly for those cases when the replacement of the entire set of columns is mandatory. To add or remove a single column, or just a set of them, use instead the methods addColumn and removeColumn.

Parameter:

     rgoCols: The entire list of columns in the dataset.


 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout rgoCols : List<ColumnAbstraction>

 

setEmptyValue

Setter method for the private property sEmptyValue, which comprises the String that represents the symbol for the empty value in the dataset source. Notice that each specific dataset implementation can make its own use of this property accordingly.

Parameters:

     sEmptyValue The symbol/string representing an empty value in the dataset

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout sEmptyValue : String

 

setMissingValue

Setter method for the private property sMissingValue, which comprises the String that represents the symbol for the missing value in the dataset source. Notice that each specific dataset implementation can make its own use of this property accordingly.

Parameters:

     sMissingValue The symbol/string representing a missing value in the dataset

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout sMissingValue : String

 

setName

Setter method for the private property sName, which represents the name of the dataset.

Parameter:

     sName: The name of the dataset.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

setNullValue

Setter method for the private property sNullValue, which comprises the String that represents the symbol for the null value in the dataset source. Notice that each specific dataset implementation can make its own use of this property accordingly.

Parameters:

     sNullValue The symbol/string representing a null value in the dataset

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout sNull : String

 

setNumberOfDecimals

Setter method for the private property iNumberOfDecimals, which represents the number of decimals that the programmer wants to set for numerical values. Notice that the specific applicability of this attribute directly depends on the specific implementation of the dataset source.

 

Parameter:

     iNum: The number of decimal digits that will be considered when saving numerical values.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     in iNum : int
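As an illustration of how a dataset implementation might apply the number of decimals when saving numerical values, the following sketch uses java.math.BigDecimal. The class DecimalsSketch, the method format and the HALF_UP rounding mode are assumptions made for this example; the guide does not state how datapro4j rounds.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalsSketch {

    /** Renders value with exactly iNum decimal digits (HALF_UP rounding is an assumption). */
    public static String format(double value, int iNum) {
        return BigDecimal.valueOf(value)
                .setScale(iNum, RoundingMode.HALF_UP)
                .toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(format(3.14159, 2));
    }
}
```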

 

setRangeDelimiters

This method sets the symbols that will serve as range delimiter. Notice that the specific applicability of these attributes directly depends on the specific implementation of the dataset source.

Parameters:

     sInitial: The symbol/string that represents the starting delimiter.

     sSeparator: The symbol/string that represents the value separator.

     sEnding: The symbol/string that represents the ending delimiter.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout sEnding : String

     inout sInitial : String

     inout sSeparator : String
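The role of the three range delimiters can be sketched as follows. This is a hypothetical helper, not library code: RangeDelimiterSketch, format and parse are illustrative names, and the textual layout (initial symbol, minimum, separator, maximum, ending symbol) is an assumption about how a dataset source might serialize a range.

```java
public class RangeDelimiterSketch {

    /** Writes a range as: initial delimiter, minimum, separator, maximum, ending delimiter. */
    public static String format(double min, double max,
                                String sInitial, String sSeparator, String sEnding) {
        return sInitial + min + sSeparator + max + sEnding;
    }

    /** Reads back the two bounds of a range written with the same three symbols. */
    public static double[] parse(String text,
                                 String sInitial, String sSeparator, String sEnding) {
        boolean wrapped = text.length() >= sInitial.length() + sEnding.length()
                && text.startsWith(sInitial) && text.endsWith(sEnding);
        if (!wrapped) {
            throw new IllegalArgumentException("Missing range delimiters: " + text);
        }
        String body = text.substring(sInitial.length(), text.length() - sEnding.length());
        int iSep = body.indexOf(sSeparator);
        if (iSep < 0) {
            throw new IllegalArgumentException("Missing separator: " + text);
        }
        double min = Double.parseDouble(body.substring(0, iSep).trim());
        double max = Double.parseDouble(body.substring(iSep + sSeparator.length()).trim());
        return new double[] { min, max };
    }

    public static void main(String[] args) {
        System.out.println(format(1.0, 5.0, "[", ";", "]"));
    }
}
```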

 

setValidBinaryFalseValues

Setter method of the list rgoValidBinaryFalseValues, which contains the set of strings that represent a False boolean value in the dataset. Notice that only the first value of the list should be used for serialization (reading or writing). In any case, it depends on the specific implementation of each dataset source.

 

Parameter:

     rgoValidBinaryFalseValues: The list of values that will be interpreted as False.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout rgoValidBinaryFalseValues : ArrayList<String>

 

setValidBinaryTrueValues

Setter method of the list rgoValidBinaryTrueValues, which contains the set of strings that represent a True boolean value in the dataset. Notice that only the first value of the list should be used for serialization (reading or writing). In any case, it depends on the specific implementation of each dataset source.

 

Parameter:

     rgoValidBinaryTrueValues: The list of values that will be interpreted as True.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout rgoValidBinaryTrueValues : ArrayList<String>

 

setValidBinaryValues

This method sets both the list of strings that will represent a True boolean value and the list of strings that will represent a False boolean value in the dataset. The same result can be obtained by invoking the two specific setter methods (setValidBinaryFalseValues and setValidBinaryTrueValues) separately.

Parameters:

     rgoFalseList: A list with the valid False symbols/strings

     rgoTrueList: A list with the valid True symbols/strings

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout rgoFalseList : ArrayList<String>

     inout rgoTrueList : ArrayList<String>
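A minimal sketch of how the two lists of binary symbols could be used is shown below. It assumes that every symbol in a list is accepted when reading and that the first symbol of each list is used when writing; the guide leaves this to each dataset implementation. BinaryValuesSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BinaryValuesSketch {

    private final List<String> falseValues;
    private final List<String> trueValues;

    public BinaryValuesSketch(List<String> rgoFalseList, List<String> rgoTrueList) {
        this.falseValues = new ArrayList<>(rgoFalseList);
        this.trueValues = new ArrayList<>(rgoTrueList);
    }

    /** Returns TRUE or FALSE for a recognized symbol, or null if it is in neither list. */
    public Boolean parse(String symbol) {
        if (trueValues.contains(symbol)) return Boolean.TRUE;
        if (falseValues.contains(symbol)) return Boolean.FALSE;
        return null;
    }

    /** Writes a boolean using the first symbol of the corresponding list. */
    public String format(boolean value) {
        return value ? trueValues.get(0) : falseValues.get(0);
    }

    public static void main(String[] args) {
        BinaryValuesSketch symbols = new BinaryValuesSketch(
                Arrays.asList("0", "no", "false"), Arrays.asList("1", "yes", "true"));
        System.out.println(symbols.parse("yes") + " " + symbols.format(false));
    }
}
```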

 

swapColumns

This method swaps two columns in the list of columns of the dataset. It searches for both columns and swaps their positions, so both structure and data are exchanged.

Parameters:

     oColumn1: The first column to swap.

     oColumn2: The second column to swap.

 

Exceptions:

     ColumnAbstraction

     UnsupportedOperationException

     ClassCastException

     NullPointerException

     IllegalArgumentException

     IndexOutOfBoundsException

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout oColumn1 : ColumnAbstraction

     inout oColumn2 : ColumnAbstraction

 

 

Relation Detail

 

Association

 

Name

rgoColumns

Related Element

     ColumnAbstraction

 

Dependency

 

 

Name

 

Related Element

     InstanceIterator

 

Generalization

 

 

Name

 

Related Element

     FileDataset

 

 

Class FileDataset

This abstract class represents a dataset whose source is a file. It includes the specific methods required to handle datasets stored as files.

 

Figure 17. Class FileDataset

 

Name

FileDataset

Qualified Name

es::uco::kdis::datapro::dataset::FileDataset

Visibility

public

Abstract

true

Base Classifier

     Dataset

Realized Interface

 

 

 

 

Attribute Detail

oBufferedReader

oBufferedReader is the buffer used to read the file.

 

Type

BufferedReader

Default Value

 

Visibility

protected

Multiplicity

 

 

sCommentValue

sCommentValue stores the string that indicates the beginning of a comment line in the dataset file, i.e. a line that has to be omitted from the processing.

 

Type

String

Default Value

 

Visibility

protected

Multiplicity

 

 

sFileName

sFileName is the name of the file source that contains the dataset.

 

Type

String

Default Value

 

Visibility

protected

Multiplicity

 

 

sSeparationSymbol

sSeparationSymbol stores the symbol/string that indicates the separator between values of the same instance-row (e.g., a comma, a line of the dataset file, etc.).

 

Type

String

Default Value

 

Visibility

protected

Multiplicity

 

 

 

Operation Detail

clone

This method creates a new dataset with exactly the same type and column structure as the original. Instances from the original dataset are not copied. It returns a new Dataset instance.

 

Type

Dataset

Visibility

public

Is Abstract

false

Parameter

 

 

copy

This method clones the dataset and fills the clone with the instances extracted from the original. The result is a new dataset with exactly the same type, column structure and data. It returns the copied Dataset instance.

 

Type

Dataset

Visibility

public

Is Abstract

false

Parameter
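The difference between clone (structure only) and copy (structure plus instances) can be sketched with a small in-memory table. CloneCopySketch, Table, cloneStructure and copy are hypothetical names used only for this illustration; they do not reproduce the FileDataset code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CloneCopySketch {

    /** Hypothetical table: a list of column names plus a list of rows. */
    public static class Table {
        public final List<String> columnNames;
        public final List<List<Object>> rows = new ArrayList<>();

        public Table(List<String> columnNames) {
            this.columnNames = new ArrayList<>(columnNames);
        }

        /** Same column structure, no rows (the behaviour described for clone). */
        public Table cloneStructure() {
            return new Table(columnNames);
        }

        /** Same column structure and a copy of every row (the behaviour described for copy). */
        public Table copy() {
            Table result = cloneStructure();
            for (List<Object> row : rows) {
                result.rows.add(new ArrayList<>(row));
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Table original = new Table(Arrays.asList("id", "name"));
        original.rows.add(Arrays.<Object>asList(1, "Ann"));
        System.out.println(original.cloneStructure().rows.size() + " " + original.copy().rows.size());
    }
}
```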

 

 

 

FileDataset

Default constructor. Notice that the following symbols are used by default:

     sCommentValue: "%"

     sSeparationSymbol: ","

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

FileDataset

This constructor receives the name of the file as parameter. The following symbols are used as default:

     sCommentValue: "%"

     sSeparationSymbol: ","

 

Parameter:

     sFileName: The filename of the dataset source.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout sFileName : String
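To show what the default symbols "%" and "," imply for a file reader, here is a sketch that skips comment lines and splits the remaining lines into values. It reads from a String instead of a file so that it stays self-contained. FileLineSketch and readRows are hypothetical names, and skipping empty lines is an assumption of this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FileLineSketch {

    /** Splits each non-comment line of text into its values. */
    public static List<List<String>> readRows(String text,
                                              String sCommentValue,
                                              String sSeparationSymbol) {
        List<List<String>> rows = new ArrayList<>();
        // Quote the separator so that symbols such as "|" are not read as regex operators
        String separatorRegex = Pattern.quote(sSeparationSymbol);
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty() || line.startsWith(sCommentValue)) {
                    continue; // comment or empty line: omitted from the processing
                }
                // The limit -1 keeps trailing empty values
                rows.add(Arrays.asList(line.split(separatorRegex, -1)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(readRows("% header\n1,2,3\n% note\n4,,6", "%", ","));
    }
}
```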

 

getCommentValue

Getter method of the property sCommentValue.

 

Type

String

Visibility

public

Is Abstract

false

Parameter

 

 

getFileName

Getter method of the filename of the dataset source.

 

Type

String

Visibility

public

Is Abstract

false

Parameter

 

 

getSeparationSymbol

Getter method of the property sSeparationSymbol.

 

Type

String

Visibility

public

Is Abstract

false

Parameter

 

 

readDataset

Implementations of this abstract method will read the dataset from the file specified by the constructor.

 

Parameters:

     sContentFormat: String that specifies the reading format of the dataset file. Construct the string using a sequence of control tokens:

o  % to omit a line (only one line).

o  %name to read the name of columns (only one line).

o  %col to read data (zero, one or more lines).

 

Example: the string “%%%col%%name” indicates that the first two lines must be omitted, then data is read and, finally, the last line will contain the column names.

 

     sColumnFormat: A String that contains an ordered sequence of tokens that determine the data type of each column to be read. Use the following tokens:

o  s: Nominal column

o  f: Real column

o  c: Categorical column

o  b: Binary column

o  i: Integer column

o  %: Skip this column (the column skipped is not processed)

 

Additionally, notice that other tokens can be considered depending on the specific dataset source (e.g., d for columns of type date).

 

Exceptions:

     FileNotFoundException

     IOException

     IllegalFormatSpecificationException

     NotAddedValueException

     IndexOutOfBoundsException

 

Type

void

Visibility

public

Is Abstract

true

Parameter

inout sColumnFormat : String

inout sContentFormat : String
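The sColumnFormat tokens listed above can be interpreted with a simple character scan. The following is a minimal sketch under stated assumptions: ColumnFormatSketch, Kind and parseColumnFormat are hypothetical names, only the six tokens listed in this guide are handled, and IllegalArgumentException stands in for the library's IllegalFormatSpecificationException.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnFormatSketch {

    /** Hypothetical column kinds, one per token listed for sColumnFormat. */
    public enum Kind { NOMINAL, REAL, CATEGORICAL, BINARY, INTEGER, SKIP }

    /** Translates a format string such as "sf%b" into one Kind per source column. */
    public static List<Kind> parseColumnFormat(String sColumnFormat) {
        List<Kind> kinds = new ArrayList<>();
        for (char token : sColumnFormat.toCharArray()) {
            switch (token) {
                case 's': kinds.add(Kind.NOMINAL); break;
                case 'f': kinds.add(Kind.REAL); break;
                case 'c': kinds.add(Kind.CATEGORICAL); break;
                case 'b': kinds.add(Kind.BINARY); break;
                case 'i': kinds.add(Kind.INTEGER); break;
                case '%': kinds.add(Kind.SKIP); break;
                default:
                    throw new IllegalArgumentException("Unknown column token: " + token);
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        // Four source columns: nominal, real, skipped, binary
        System.out.println(parseColumnFormat("sf%b"));
    }
}
```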

 

setCommentValue

Setter method of the property sCommentValue.

Parameter:

     sComment: The token/string indicating the symbol that represents a comment line in the dataset file.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

     inout sComment : String

 

setFileName

Setter method of the property sFileName.

Parameter:

     sFileName: The filename of the dataset source.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout sFileName : String

 

setSeparationSymbol

Setter method of the property sSeparationSymbol.

Parameter:

     sSeparationSymbol: The token used to separate the values of an instance in the same line of the dataset source.

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

     inout sSeparator : String

 

writeDataset

This abstract method defines the signature of the write method for every file dataset. Implementations of this method deal with the serialization (writing) of the current column structure into each specific file format.

Parameter:

     sOutputFile: The path where the dataset file will be saved.

 

Exception:

     IOException

 

Type

void

Visibility

public

Is Abstract

true

Parameter

     inout sOutputFile : String

 

 

Relation Detail

 

Generalization

Name

 

Related Element

     CsvDataset

 

Name

 

Related Element

     ExcelDataset

 

Name

 

Related Element

     ArffDataset

 

Name

 

Related Element

     Dataset

 

 

Class InstanceIterator

InstanceIterator is the class that implements the interface IIterator for traversing the instances of the dataset. Thus, this class represents an iterator to access each row/instance in a dataset. The instance iterator provides methods to traverse the whole set of instances and keeps the reference to the dataset being iterated.
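The traversal protocol offered by the iterator (first, isDone, next, currentInstance) can be sketched as follows. RowIterator and ListRowIterator are hypothetical stand-ins for IIterator and InstanceIterator; the rows live in an in-memory list rather than in a Dataset, and isDone is assumed to return true once no instances remain.

```java
import java.util.Arrays;
import java.util.List;

public class IteratorSketch {

    /** Hypothetical stand-in for IIterator, with the four operations it declares. */
    public interface RowIterator {
        List<Object> first();
        List<Object> currentInstance();
        boolean isDone();
        void next();
    }

    /** Iterates over rows held in memory. */
    public static class ListRowIterator implements RowIterator {
        private final List<List<Object>> rows;
        private int cursor = 0;

        public ListRowIterator(List<List<Object>> rows) { this.rows = rows; }

        public List<Object> first() { cursor = 0; return currentInstance(); }
        public List<Object> currentInstance() { return isDone() ? null : rows.get(cursor); }
        public boolean isDone() { return cursor >= rows.size(); }
        public void next() { cursor++; }
    }

    /** Typical traversal loop: reset, test, advance. */
    public static int countRows(RowIterator it) {
        int n = 0;
        for (it.first(); !it.isDone(); it.next()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("a", 1), Arrays.<Object>asList("b", 2));
        System.out.println(countRows(new ListRowIterator(rows)));
    }
}
```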

 

Figure 18. Class InstanceIterator

 

 

Name

InstanceIterator

Qualified Name

es::uco::kdis::datapro::dataset::InstanceIterator

Visibility

public

Abstract

false

Base Classifier

 

Realized Interface

     IIterator

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

currentInstance

This method returns the list of objects that form the currently pointed instance in the dataset.

 

Type

List<Object>

Visibility

public

Is Abstract

false

Parameter

 

 

first

This method returns the list of objects that form the first instance in the dataset and sets the pointer to the first instance.

 

Type

List<Object>

Visibility

public

Is Abstract

false

Parameter

 

 

InstanceIterator

Default iterator constructor.

Parameter:

     oDataset: The dataset to be covered by the iterator.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout oDataset : Dataset

 

 

 

isDone

This method returns true if the dataset has no more instances to be iterated, and false otherwise.

 

Type

boolean

Visibility

public

Is Abstract

false

Parameter

 

 

next

This method increases the instance pointer by one, i.e. sets the pointer to the next instance in the dataset.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

Relation Detail

 

Interface Realization

 

 

Name

 

Related Element

     IIterator

 

 

Interface IIterator

IIterator is the interface that any instance iterator has to implement, as InstanceIterator does.

 

Figure 19. Interface IIterator

 

 

Name

IIterator

Qualified Name

es::uco::kdis::datapro::dataset::IIterator

Visibility

public

Base Classifier

 

 

 

Operation Detail

currentInstance

The implementation of this method has to return the instance currently pointed to in the dataset as a List of Object instances.

 

 

 

Type

List<Object>

Visibility

public

Is Abstract

true

Parameter

 

 

first

An implementation of this method returns the first instance of the dataset. From here on, the current instance pointed by the iterator should be this first one.

 

Type

List<Object>

Visibility

public

Is Abstract

true

Parameter

 

 

isDone

This method should be implemented to return True if the iterator points to the last instance of the dataset. It returns False otherwise.

Type

boolean

Visibility

public

Is Abstract

true

Parameter

 

 

next

The implementation of this method increases the iterator to the next instance in the dataset.

 

Type

void

Visibility

public

Is Abstract

true

Parameter

 

 

 

Relation Detail

 

Interface Realization

 

Name

 

Related Element

    InstanceIterator


 

Package es::uco::kdis::datapro::dataset::Column

 

 

This package contains the classes related to the different types of columns supported by the library. At the moment, datapro4j provides an implementation for the following types:

      Binary column, for positive or negative values.

      Categorical column, for prefixed string values, considered as an enumeration of categories.

      Date column.

      Integer column, for numerical integer values.

      Nominal column, for free valued strings.

      Numerical column, for numerical real values.

      Range column, for those values that represent a numerical interval (minimum, maximum), where both open and close ranges can be considered.

Columns are coded following the philosophy of the bridge design pattern, where an abstraction is decoupled from its implementation. In this way, the programmer can add to the library new implementations of some of the columns provided, e.g. for performance reasons, without altering the manner in which the rest of the library (including algorithms) interacts with this column.

Therefore, every column type is implemented by at least two different classes: its abstraction, where the accessor methods to its functionalities exist, and its implementation, where these functionalities are coded, and invoked from the abstraction.

Using columns properly demands considering the following rules:

      Any code from the library (i.e. from other columns, datasets or strategies) should always invoke methods of the abstraction. Never invoke the column implementation directly (only its own abstraction should).

      Altering current abstractions may cause unexpected failures. Use generalization or provide conversion methods to build your own abstractions instead.

      Abstractions and implementations must be subclasses of ColumnAbstraction and ColumnImpl, respectively.

      Datapro4j only supports one implementation class per abstraction. If the programmer wants to have more than one implementation, then more than one abstraction should be provided, or a factory pattern should be coded.

      If new abstractions (i.e. type of columns) are provided, modify the enumeration ColumnType accordingly.
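The rules above follow from the bridge structure, which can be sketched in a few lines. The classes below (Abstraction, Impl, IntegerImpl, IntegerColumnSketch) are hypothetical and much smaller than ColumnAbstraction and ColumnImpl; they only show how the abstraction keeps the metainformation and forwards every data access to its implementation object.

```java
import java.util.ArrayList;
import java.util.List;

public class BridgeSketch {

    /** Hypothetical implementation side: owns the data. */
    public static abstract class Impl {
        public abstract int addValue(Object oValue);
        public abstract Object getElement(int iPos);
        public abstract int getSize();
    }

    /** Hypothetical abstraction side: keeps the name and delegates data access. */
    public static abstract class Abstraction {
        protected final String sName;
        protected Impl oImpl; // assigned by the subclass constructor

        protected Abstraction(String sName) { this.sName = sName; }

        public String getName() { return sName; }                        // no data access needed
        public int addValue(Object oValue) { return oImpl.addValue(oValue); }
        public Object getElement(int iPos) { return oImpl.getElement(iPos); }
        public int getSize() { return oImpl.getSize(); }
    }

    /** One possible implementation; it could be replaced without touching callers. */
    public static class IntegerImpl extends Impl {
        private final List<Integer> values = new ArrayList<>();

        public int addValue(Object oValue) {
            if (!(oValue instanceof Integer)) return 0; // rejected: wrong type
            values.add((Integer) oValue);
            return 1;
        }
        public Object getElement(int iPos) { return values.get(iPos); }
        public int getSize() { return values.size(); }
    }

    public static class IntegerColumnSketch extends Abstraction {
        public IntegerColumnSketch(String sName) {
            super(sName);
            this.oImpl = new IntegerImpl();
        }
    }

    public static void main(String[] args) {
        Abstraction column = new IntegerColumnSketch("age");
        column.addValue(42);
        System.out.println(column.getName() + " " + column.getSize());
    }
}
```

Callers only see the abstraction type, so swapping IntegerImpl for another implementation would not change any calling code.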

Figure 20. Package es.uco.kdis.datapro.dataset.Column

 

 

Name

Column

Qualified Name

es::uco::kdis::datapro::dataset::Column

 

 

Class ColumnAbstraction

This abstract class implements the functionalities common to every column in the dataset. It also defines the methods that are not coded by the implementation class because they refer to the column metainformation (e.g. name, type, etc.). These methods are implemented directly by the abstraction, since they do not require any access to data.


 

Figure 21. Abstract class ColumnAbstraction

 

 

Name

ColumnAbstraction

Qualified Name

es::uco::kdis::datapro::dataset::Column::ColumnAbstraction

Visibility

public

Abstract

true

Base Classifier

 

Realized Interface

 

 

Attribute Detail

ctColumnType

The column type, as represented by the enumeration defined by the class ColumnType.

 

 

Type

ColumnType

Default Value

 

Visibility

protected

Multiplicity

1

 

oImpl

A reference to the implementation object.

 

Type

ColumnImpl

Default Value

 

Visibility

protected

Multiplicity

1

 

sName

The name of the column.


 

Type

String

Default Value

 

Visibility

protected

Multiplicity

 

 

 

Operation Detail

addAllValues

This method calls the implementation to add a list of values at the end of the column.

Parameter:

     rgoCol The list of values to be added. The objects here contained must satisfy the type required by the column.

 

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout rgoCol : List<Object>

 

addValue

This method calls the implementation to add a single value at the end of the column.

Parameter:

     oValue The value to be added. It must satisfy the type required by the column.

 

The method returns the number of items successfully added to the column.

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout oValue : Object

 

addValue

This method calls the implementation to add a single value at the end of the column.

Parameters:

     oValue The value to be added. It must satisfy the type required by the column.

     bForce is used to indicate that the value must be added, independently of the constraints and addition policies defined by the column type.

 

The method returns the number of items successfully added to the column.

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in bForce : boolean

     inout oValue : Object

 

addValue

This method calls the implementation to add a single value at a given position in the column.

Parameters:

      iIndex indicates the element position where the item has to be added.

      oValue The value to be added. It must satisfy the type required by the column.

The method returns the number of items successfully added to the column.

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

     inout oValue : Object

 

ColumnAbstraction

Default constructor with parameters. Subclasses may override this method or create new constructors.

 

This constructor only assigns the parameter values to their respective variables. The constructor in the subclass should create the implementation object and assign it to the variable oImpl.

 

Parameters:

     ctColumnType The column type.

     sName The Name of the column to be created.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout ctColumnType : ColumnType

     inout sName : String

 

countEmptyValues

This method calls the implementation to return the number of empty values in the column set.

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countInvalidValues

This method calls the implementation to return the number of invalid values (i.e. empty, null and missing values) in the column set.

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countMissingValues

This method calls the implementation to return the number of missing values in the column set.

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countNullValues

This method calls the implementation to return the number of null values in the column set.

 

Type

int

Visibility

public

Is Abstract

false

Parameter
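The relation between the four counting methods (invalid values are the empty, null and missing values together) can be sketched with marker objects. The enum Marker below is a hypothetical stand-in for the library's empty, null and missing value objects, and InvalidCountSketch is not part of datapro4j.

```java
import java.util.Arrays;
import java.util.List;

public class InvalidCountSketch {

    /** Hypothetical markers standing in for the empty, null and missing value objects. */
    public enum Marker { EMPTY, NULL, MISSING }

    /** Counts how many elements of the column are the given marker. */
    public static int count(List<Object> values, Marker marker) {
        int n = 0;
        for (Object value : values) {
            if (value == marker) n++;
        }
        return n;
    }

    /** Invalid values are the empty, null and missing values together. */
    public static int countInvalid(List<Object> values) {
        return count(values, Marker.EMPTY)
                + count(values, Marker.NULL)
                + count(values, Marker.MISSING);
    }

    public static void main(String[] args) {
        List<Object> values = Arrays.<Object>asList(
                1, Marker.EMPTY, 3, Marker.MISSING, Marker.MISSING);
        System.out.println(countInvalid(values));
    }
}
```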

 

 

getElement

This method calls the implementation to return the element at the given position.

Parameter:

     iPos Position of the element queried.

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

     in iPos : int

 

getEmptyValue

This method calls the implementation to return the column-specific empty value. This is not the default empty value used by datapro4j (Class EmptyValue) for reading, writing or internally checking empty values, but it allows the developer to define a custom use (e.g., the symbol associated with the empty value).

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

 

 

getMissingValue

This method calls the implementation to return the column-specific missing value. This is not the default missing value used by datapro4j (Class MissingValue) for reading, writing or internally checking missing values, but it allows the developer to define a custom use (e.g., the symbol associated with a missing value).

Type

Object

Visibility

public

Is Abstract

false

Parameter

 

 

getName

This method returns the name given to the column.

 

Type

String

Visibility

public

Is Abstract

false

Parameter

 

 

getNullValue

This method calls the implementation to return the column-specific null value. This is not the default null value used by datapro4j (Class NullValue) for reading, writing or internally checking null values, but it allows the developer to define a custom null object.

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

 

 

getSize

This method calls the implementation to return the size of the column.

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

getType

This method returns the type of the column as a value of ColumnType.

 

Type

ColumnType

Visibility

public

Is Abstract

false

Parameter

 

 

getValues

This method calls the implementation to return the list of items (as instances of Object) contained in the column.

Type

List<Object>

Visibility

public

Is Abstract

false

Parameter

 

 

removeValue

This method calls the implementation to remove the element at a given position in the column.

Parameter:

     iIndex The index of the element to be removed.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

 

setEmptyValue

This method calls the implementation to set the column-specific empty value, if required. This is not the default empty value used by datapro4j (Class EmptyValue) for reading, writing or internally checking empty values, but the developer has to define its usage in the code of the proper strategies.

Parameter:

     oEmptyValue The empty value to be set.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout oEmptyValue : Object

 

setMissingValue

This method calls the implementation to set the column-specific missing value, if required. This is not the default missing value used by datapro4j (Class MissingValue) for reading, writing or internally checking missing values, but the developer has to define its usage in the code of the proper strategies.

Parameter:

     oMissingValue The missing value to be set.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout oMissingValue : Object

 

setName

This method sets the name of the column.

Parameter:

     sName The new name for the column.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

setNullValue

This method calls the implementation to set the column-specific null value, if required. This is not the default null value used by datapro4j (Class NullValue) for reading, writing or internally checking null values, but the developer has to define its usage in the code of the proper strategies.

Parameter:

     oNullValue The null value to be set.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout oNullValue : Object

 

setValue

This method calls the implementation to set the value of an element in the column at a given position.

Parameters:

     oValue The value to be added.

     iIndex The element position in the column.

 

It returns the number of elements correctly added.

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

     inout oValue : Object

 

 

 

Relation Detail

 

Association

 

 

Name

 

Related Element

     ColumnImpl

 

Name

 

Related Element

     ColumnType

 

Name

rgoColumns

Related Element

     Dataset

 

Generalization

 

Name

 

Related Element

     CategoricalColumn

 

Name

 

Related Element

     NumericalColumn

 

Name

 

Related Element

     DateColumn

 

Name

 

Related Element

     BinaryColumn

 

Name

 

Related Element

     NominalColumn

 

Name

 

Related Element

     RangeColumn

 

 

Class ColumnImpl

This abstract class serves as a base for column implementation classes. These classes comprise the real code accessing data in the column. Only metainformation is managed by its abstraction.

Note: None of its methods should be invoked directly, except by its specific abstraction. Thus, for a given column type, the abstraction is unalterable, whereas the implementation can be adapted by the programmer.

 

 

Figure 22. Abstract class ColumnImpl

 

Name

ColumnImpl

Qualified Name

es::uco::kdis::datapro::dataset::Column::ColumnImpl

Visibility

public

Abstract

true

Base Classifier

 

Realized Interface

 

 

Attribute Detail

oEmptyValue

This object represents a column-specific empty value. Notice that this is not the standard empty value object, as used by datapro4j strategies and datasets.

 

Type

Object

Default Value

null

Visibility

protected

Multiplicity

 

 

oMissingValue

This object represents a column-specific missing value. Notice that this is not the standard missing value object, as used by datapro4j strategies and datasets.

 

Type

Object

Default Value

null

Visibility

protected

Multiplicity

 

 

oNullValue

This object represents a column-specific null value. Notice that this is not the standard null value object, as used by datapro4j strategies and datasets.

 

 

 

Type

Object

Default Value

null

Visibility

protected

Multiplicity

 

 

Operation Detail

 

The following methods code the implementation for their corresponding abstraction methods.

 

addAllValues

This method implements the method addAllValues of the column abstraction, returning the number of objects successfully added.

Parameter:

     rgoCol The list of item objects to be added to the column.

 

Type

int

Visibility

public

Is Abstract

true

Parameter

     inout rgoCol : List<Object>

 

addValue

This method implements the method addValue of the column abstraction, returning the number of objects successfully added.

Parameter:

     oValue The value to be added.

 

Type

int

Visibility

public

Is Abstract

true

Parameter

     inout oValue : Object

 

addValue

This method implements the method addValue of the column abstraction, returning the number of objects successfully added.

Parameters:

     oValue The value to be added

     bForce If true, the implementation must force its addition.

Note: By default, bForce is ignored. If a specific column needs to take it into account, the subclass implementing that column should explicitly override this method.

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout oValue : Object

     in bForce : boolean
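The default handling of bForce described in the note can be sketched as follows. BaseImpl and CategoryImpl are hypothetical classes written for this illustration: the base class ignores bForce and delegates to the one-argument method, while a subclass overrides the two-argument method to bypass its own addition constraint. Registering a forced value as a new category is an assumption of this example, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ForceAddSketch {

    /** Hypothetical base implementation: bForce is ignored by default. */
    public static abstract class BaseImpl {
        public abstract int addValue(Object oValue);

        public int addValue(Object oValue, boolean bForce) {
            return addValue(oValue);
        }
    }

    /** Hypothetical categorical implementation that only accepts known categories. */
    public static class CategoryImpl extends BaseImpl {
        private final Set<String> categories = new HashSet<>(Arrays.asList("low", "high"));
        private final List<Object> values = new ArrayList<>();

        @Override
        public int addValue(Object oValue) {
            if (!categories.contains(oValue)) return 0; // constraint: unknown category
            values.add(oValue);
            return 1;
        }

        @Override
        public int addValue(Object oValue, boolean bForce) {
            if (bForce && oValue instanceof String) {
                categories.add((String) oValue); // assumption: forcing registers the category
                values.add(oValue);
                return 1;
            }
            return addValue(oValue);
        }

        public int getSize() { return values.size(); }
    }

    public static void main(String[] args) {
        CategoryImpl column = new CategoryImpl();
        System.out.println(column.addValue("mid") + " " + column.addValue("mid", true));
    }
}
```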


 

addValue

This method implements the method addValue of the column abstraction, returning the number of objects successfully added.

Parameters:

     oValue The value to be added.

     iIndex The position in the column to add the value.

 

Type

int

Visibility

public

Is Abstract

true

Parameter

     inout oValue : Object

     in iIndex : int

 

countEmptyValues

This method implements the method countEmptyValues of the column abstraction, returning the number of empty values contained in the column. -1 is returned if this value could not be calculated.

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countInvalidValues

This method implements the method countInvalidValues of the column abstraction, returning the number of invalid values (null, empty and missing values) contained in the column. -1 is returned if this value could not be calculated.

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countMissingValues

This method implements the method countMissingValues of the column abstraction, returning the number of missing values contained in the column. -1 is returned if this value could not be calculated.

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countNullValues

This method implements the method countNullValues of the column abstraction, returning the number of null values contained in the column. -1 is returned if this value could not be calculated.

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

getElement

This method implements the method getElement of the column abstraction, returning the element at the given position.

Parameter:

     iPos The position of the element to be returned.

 

 

Type

Object

Visibility

public

Is Abstract

true

Parameter

     in iPos : int

 

getEmptyValue

This method implements the method getEmptyValue of the column abstraction, returning the element representing the column-specific empty value.

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

 

 

getMissingValue

This method implements the method getMissingValue of the column abstraction, returning the element representing the column-specific missing value.

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

 

 

getNullValue

This method implements the method getNullValue of the column abstraction, returning the element representing the column-specific null value.

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

 

 

getSize

This method implements the method getSize of the column abstraction, returning the number of elements contained in the column.

 

Type

int

Visibility

public

Is Abstract

true

Parameter

 

 

 

 

getValues

This method implements the method getValues of the column abstraction, returning the list of elements (as instances of Object) contained in the column.

 

Type

List<Object>

Visibility

public

Is Abstract

true

Parameter

 

 

removeValue

This method implements the method removeValue of the column abstraction.

Parameter:

     iIndex The position in the column of the value to be removed.

 

Type

void

Visibility

public

Is Abstract

true

Parameter

     in iIndex : int

 

setEmptyValue

This method implements the method setEmptyValue of the column abstraction, setting the element representing the column-specific empty value.

Parameter:

     oEmptyValue The object representing a specific empty value in this column.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout oEmptyValue : Object

 

setMissingValue

This method implements the method setMissingValue of the column abstraction, setting the element representing the column-specific missing value.

Parameter:

     oMissingValue The object representing a specific missing value in this column.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout oMissingValue : Object

 

setNullValue

This method implements the method setNullValue of the column abstraction, setting the element representing the column-specific null value.

Parameter:

     oNullValue The object representing a specific null value in this column.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout oNullValue : Object

 

setValue

This method implements the method setValue of the column abstraction, setting the element value at the given position.

Parameters:

     oValue The object value to set.

     iIndex The position index in the column.

 

Type

int

Visibility

public

Is Abstract

true

Parameter

     in iIndex : int

     inout oValue : Object

 

 

Relation Detail

Association

 

Name

 

Related Element

     ColumnAbstraction

 

Generalization

 

 

Name

 

Related Element

     RangeColumnImpl

 

Name

 

Related Element

     NominalColumnImpl

 

Name

 

Related Element

     NumericalColumnImpl

 

Name

 

Related Element

     DateColumnImpl

 

Name

 

Related Element

     CategoricalColumnImpl

 

Name

 

Related Element

     BinaryColumnImpl

 

 

Enumeration ColumnType

This enumeration contains the different types of columns supported by datapro4j. The following types are currently supported:

     Binary

     Categorical

     Date

     Integer

     Nominal

     Numerical

     Range

Note: To check the type of a column, compare the value returned by getType with the corresponding constant of the enumeration, as in the following code (e.g. for binary columns):

 

 

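ColumnAbstraction oCol = new BinaryColumn("flag"); // "flag" is an arbitrary example name

if (oCol.getType().equals(ColumnType.Binary)) {

    // Code specific to binary columns goes here
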
}
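Since ColumnType is an enumeration, the same check can also be written with == or with a switch statement. The sketch below illustrates this with a local stand-in enumeration that only mirrors the seven constants listed above. The names ColumnTypeSketch, ColumnTypeCheck and isNumeric are hypothetical and are not part of datapro4j.

```java
// Local stand-in mirroring the constants listed in this section (assumption:
// the real enumeration is the ColumnType provided by datapro4j).
enum ColumnTypeSketch { Binary, Categorical, Date, Integer, Nominal, Numerical, Range }

class ColumnTypeCheck {

    // Hypothetical helper: tells whether a column type holds numbers.
    static boolean isNumeric(ColumnTypeSketch type) {
        switch (type) {
            case Integer:
            case Numerical:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        ColumnTypeSketch type = ColumnTypeSketch.Binary;
        // Enum constants are singletons, so == is equivalent to equals() here.
        if (type == ColumnTypeSketch.Binary) {
            System.out.println("binary column");
        }
        System.out.println(isNumeric(ColumnTypeSketch.Integer));
    }
}
```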

Figure 23. Enumeration ColumnType

 

Name

ColumnType

Qualified Name

es::uco::kdis::datapro::dataset::Column::ColumnType

Visibility

public

Abstract

false

Base Classifier

 

Realized Interface

 

 

 

Attribute Detail

Binary

Boolean attribute

 

Type

 

Default Value

 

Visibility

public

Multiplicity

 

 

Categorical

Categorical attribute

 

Type

 

Default Value

 

Visibility

public

Multiplicity

 

 

Date

Date attribute

 

Type

 

Default Value

 

Visibility

public

Multiplicity

 

 

Integer

Integer attribute

 

Type

 

Default Value

 

Visibility

public

Multiplicity

 

 

Nominal

Nominal attribute

 

Type

 

Default Value

 

Visibility

public

Multiplicity

 

 

Numerical

Numerical attribute

 

Type

 

Default Value

 

Visibility

public

Multiplicity

 

 

Range

Range attribute

Type

 

Default Value

 

Visibility

public

Multiplicity

 

 

Relation Detail

Association

 

Name

 

Related Element

     ColumnAbstraction

 

 

Class BinaryColumn

This class represents the abstraction of a binary column. It defines the methods that provide operations specific to binary data.

 

Figure 24. Class BinaryColumn

 

 

 

Name

BinaryColumn

Qualified Name

es::uco::kdis::datapro::dataset::Column::BinaryColumn

Visibility

public

Abstract

false

Base Classifier

     ColumnAbstraction

Realized Interface

 

 

Operation Detail

BinaryColumn

Default constructor. The implementation BinaryColumnImpl is invoked.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

BinaryColumn

Constructor with the name of the column as a parameter. The implementation BinaryColumnImpl is invoked.

Parameter:

     sName The name of the column.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

toCategorical

This method calls the implementation to return a categorical column generated from the binary column. The resulting categorical column defines two categories, one for each binary value (false, true).

Parameters:

     sFalseCategory The category representing the false binary value.

     sTrueCategory The category representing the true binary value.

 

Notes:

    If the value is an empty or a missing value, then a false value is considered.

    If the value is a null value, then a null value is considered.

 

Type

CategoricalColumn

Visibility

public

Is Abstract

false

Parameter

     inout sFalseCategory : String

     inout sTrueCategory : String
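The conversion rules stated in the notes above can be sketched as follows. This is only an illustration on plain Java lists: the sentinels EMPTY and MISSING stand in for the column-specific empty and missing values, the column-specific null value is modelled as a Java null, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class BinaryToCategoricalSketch {

    // Hypothetical sentinels standing in for the column-specific markers.
    static final Object EMPTY = new Object();
    static final Object MISSING = new Object();

    static List<String> toCategorical(List<Object> values,
                                      String falseCategory, String trueCategory) {
        List<String> result = new ArrayList<>();
        for (Object value : values) {
            if (value == null) {
                result.add(null);            // a null value stays a null value
            } else if (value == EMPTY || value == MISSING) {
                result.add(falseCategory);   // empty and missing count as false
            } else {
                result.add(((Boolean) value) ? trueCategory : falseCategory);
            }
        }
        return result;
    }
}
```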

 

Relation Detail

Generalization

 

Name

 

Related Element

     ColumnAbstraction

 

 

Class BinaryColumnImpl

 

This class provides the implementation code accessing real data in a binary column. Binary values are stored as objects of class Boolean.

Note: None of its methods should be directly invoked, but only from its specific abstraction.
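The split between abstraction and implementation resembles a bridge: client code talks to the abstraction, which forwards each call to an implementation object that it created itself. The following is a minimal sketch under that reading. SketchBinaryColumn and SketchBinaryImpl are hypothetical stand-ins, and the 1/0 return convention of addValue is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical implementation side: owns the real storage.
class SketchBinaryImpl {
    private final List<Boolean> values = new ArrayList<>();

    // Returns 1 if the value was stored, 0 if it was rejected (assumed convention).
    int addValue(Object value) {
        if (value != null && !(value instanceof Boolean)) {
            return 0;
        }
        values.add((Boolean) value);
        return 1;
    }

    int getSize() {
        return values.size();
    }
}

// Hypothetical abstraction side: the only class client code should touch.
class SketchBinaryColumn {
    private final String name;
    private final SketchBinaryImpl impl = new SketchBinaryImpl(); // created with the column

    SketchBinaryColumn(String name) {
        this.name = name;
    }

    String getName() { return name; }

    // Every data operation is delegated to the implementation object.
    int addValue(Object value) { return impl.addValue(value); }

    int getSize() { return impl.getSize(); }
}
```

Keeping the storage behind the abstraction allows the implementation to change without affecting client code.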

 

Figure 25. Class BinaryColumnImpl

 

 

Name

BinaryColumnImpl

Qualified Name

es::uco::kdis::datapro::dataset::Column::BinaryColumnImpl

Visibility

public

Abstract

false

Base Classifier

     ColumnImpl

Realized Interface

 

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

 

For a more complete specification of the methods inherited from  ColumnImpl, see its specifications above.

 

addAllValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout rgoCol : List<Object>

 

addValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout oValue : Object

 

addValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

     inout oValue : Object

 

BinaryColumnImpl

Default constructor.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

countEmptyValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countInvalidValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countMissingValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countNullValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

getElement

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

     in iPos : int

 

 

getSize

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

getValues

 

Type

List<Object>

Visibility

public

Is Abstract

false

Parameter

 

 

removeValue

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout iIndex : int

 

setValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

     inout oValue : Object

 

toCategorical

This method implements the method toCategorical of the binary column abstraction, converting the binary column into a categorical column.

Parameters:

     sName The name of the column. By default this property is set by the abstraction to the current name of the binary column.

     sFalseCategory The category representing the false binary value.

     sTrueCategory The category representing the true binary value.

 

Type

CategoricalColumn

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

     inout sFalseCategory : String

     inout sTrueCategory : String

 

 

Relation Detail

Generalization

 

Name

 

Related Element

     ColumnImpl

 

Class CategoricalColumn

 

This class defines the abstraction of a categorical column, where every value belongs to a predefined category. Here the methods that provide specific operations on categorical data are defined.

 

Figure 26. Class CategoricalColumn

 

 

Name

CategoricalColumn

Qualified Name

es::uco::kdis::datapro::dataset::Column::CategoricalColumn

Visibility

public

Abstract

false

Base Classifier

     ColumnAbstraction

Realized Interface

 

 

 

Operation Detail

 

addCategory

This method calls the implementation to add a new category to the set of allowable values. Categories are included as objects of class String.

Parameter:

     szCategory The new category in the column

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout szCategory : String

 

 

CategoricalColumn

Constructor with the name of the column as a parameter. The implementation CategoricalColumnImpl is invoked.

Parameter:

     sName The name of the column

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

CategoricalColumn

Default constructor. The implementation CategoricalColumnImpl is invoked.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

 

getCategoryIndex

This method calls the implementation to return the index in the list of categories of a given string. The value -1 is returned if the value is not found.

Parameter:

     szCategory The string representing the category to be searched in the list of categories

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout szCategory : String

 

 

getCategoryList

This method calls the implementation to return the list of categories in the column.

 

Type

List<Object>

Visibility

public

Is Abstract

false

Parameter

 

 

 

getCategoryName

This method calls the implementation to return the category string stored in a given position of the list of categories. null is returned if the index given is not valid.

Parameter:

     iIndex The index of the wanted category

 

Type

String

Visibility

public

Is Abstract

false

Parameter

     inout iIndex : Integer

 

 

getElementIndex

This method calls the implementation to return the element stored in a given position in the column. The category index is returned, whereas the default method getElement (inherited from ColumnAbstraction) returns the category by name. If the value is invalid, -1 is returned.

Parameter:

     iPos The index of the item in the column

 

Exceptions:

     IndexOutOfBoundsException

 

Type

Integer

Visibility

public

Is Abstract

false

Parameter

     in iPos : int

 

 

replaceCategory

This method calls the implementation to replace a given category with a new one.

Parameters:

     szOldCategory The category string to be replaced

     szNewCategory The new category string to be set

     bJoinCategory If the new category string already exists, this parameter determines whether the values of the old category are merged with the values that already belong to the existing category

 

1 is returned if the category is successfully replaced, or 0 otherwise.

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in bJoinCategory : boolean

     inout szNewCategory : String

     inout szOldCategory : String

 

 

toBinary

This method calls the implementation to return a binary column generated from the categorical column. Invalid values remain unaltered.

Parameter:

     aReferenceTrueValues The list of category strings to be considered as true values

 

Type

BinaryColumn

Visibility

public

Is Abstract

false

Parameter

     inout aReferenceTrueValues : List<String>

 

toNominal

This method calls the implementation to return a nominal column generated from the strings stored in the categorical column. Nominal values are extracted from the strings representing each category.

 

Type

NominalColumn

Visibility

public

Is Abstract

false

Parameter

 

 

toNumerical

This method calls the implementation to return an integer column generated from the index values assigned to the categories in the source column.

 

Type

IntegerColumn

Visibility

public

Is Abstract

false

Parameter

 

 

 

Relation Detail

 

Generalization

 

Name

 

Related Element

     ColumnAbstraction

 

 

Class CategoricalColumnImpl

This class provides the implementation code accessing real data in a categorical column. Categories are stored as a HashMap between a String and an Integer. Thus, internally, data are stored as an ArrayList of Integer, whereas their correspondences to categories are saved as String.

This class should never be directly invoked, apart from those invocations coming from its abstraction.
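The storage layout described above can be sketched with standard collections. This is an illustration only, assuming that indexes are handed out by a counter and that addValue returns 1 or 0. The class CategoricalStorageSketch is hypothetical and does not reproduce the datapro4j code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CategoricalStorageSketch {
    private final Map<String, Integer> categories = new HashMap<>(); // category -> index
    private final List<Integer> data = new ArrayList<>();            // one index per row
    private int nextIndex = 0;

    // Returns the new index, or -1 if the category cannot be added.
    int addCategory(String category) {
        if (category == null || categories.containsKey(category)) {
            return -1;
        }
        categories.put(category, nextIndex);
        return nextIndex++;
    }

    // Returns the index of the category, or -1 if it is unknown.
    int getCategoryIndex(String category) {
        Integer index = categories.get(category);
        return index == null ? -1 : index;
    }

    // Returns the category stored under the index, or null if there is none.
    String getCategoryName(Integer index) {
        for (Map.Entry<String, Integer> entry : categories.entrySet()) {
            if (entry.getValue().equals(index)) {
                return entry.getKey();
            }
        }
        return null;
    }

    // Adds a row only if its category is known (assumed 1/0 return convention).
    int addValue(String category) {
        int index = getCategoryIndex(category);
        if (index < 0) {
            return 0;
        }
        data.add(index);
        return 1;
    }

    int getSize() { return data.size(); }
}
```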

Figure 27. Class CategoricalColumnImpl

 

 

Name

CategoricalColumnImpl

Qualified Name

es::uco::kdis::datapro::dataset::Column::CategoricalColumnImpl

Visibility

public

Abstract

false

Base Classifier

     ColumnImpl

Realized Interface

 

 

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

 

For a more complete specification of the methods inherited from ColumnImpl, see its specification above. Notice that values can be added either as a String (identifier) or as an Integer (index); see methods addValue and addAllValues. In both cases only elements belonging to valid categories are added to the set of values in the column.

 

addCategory

This method implements the functionality of addCategory in the categorical column abstraction, adding a new category to the column. This category should not exist. It returns the index of the new category, if successfully created, or -1 if the category cannot be added.

Parameter:

     sCat The identifier of the new category

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout sCat : String

 

addValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout oValue : Object

 

addValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in bForce : boolean

     inout oValue : Object

 

addValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

     inout oValue : Object

 

CategoricalColumnImpl

Default constructor.

 

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

countEmptyValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countInvalidValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countMissingValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countNullValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

getCategoryIndex

This method implements the functionality of getCategoryIndex in the column abstraction, returning the index of the category passed as String, or -1 if the category does not exist in the list of categories of the column.

Parameter:

     sCategory The category identifier

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout sCategory : String

 

getCategoryList

This method implements the functionality of getCategoryList in the column abstraction, returning the list of category identifiers comprised by the category list. The resulting list is not sorted.

 

 

 

Type

List<Object>

Visibility

public

Is Abstract

false

Parameter

 

 

getCategoryName

This method implements the functionality of getCategoryName in the column abstraction, returning the identifier of the category whose index is passed as parameter. If the category does not exist, then null is returned.

Parameter:

     iIndex The category index

 

Type

String

Visibility

public

Is Abstract

false

Parameter

     inout iIndex : Integer

 

getElement

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

     in iPos : int

 

getElementIndex

This method implements the functionality of getElementIndex in the column abstraction, returning the category index stored at a given position. Notice that indexes in the category list do not have to be sorted or sequential, since categories may be successively created and deleted, causing gaps in the index sequence. Always consider category indexes as numerical identifiers, never as sequential indexes.

This method returns -1 if the position given is invalid.

Parameter:

     iPos The position of the item in the column.

 

Exceptions:

     IndexOutOfBoundsException

 

Type

Integer

Visibility

public

Is Abstract

false

Parameter

     in iPos : int

 

getSize

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

 

 

 

 

getValues

 

Type

List<Object>

Visibility

public

Is Abstract

false

Parameter

 

 

removeValue

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

 

 

replaceCategory

This method implements the functionality of replaceCategory in the column abstraction, updating both the category list and replacing the values in the column. 1 is returned if done; 0, otherwise.

Parameters:

     sOldCategory The old category identifier to be replaced

     sNewCategory The new category

     bJoinCategory If true, if the new category identifier already exists in the column, then the values with the old category identifier will be joined to the already existing identifier, having only one category as a result

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in bJoinCategory : boolean

     inout sNewCategory : String

     inout sOldCategory : String
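One possible reading of the join behaviour is sketched below on plain collections. The handling of the corner cases (unknown old category, name clash without joining) is an assumption, and the class ReplaceCategorySketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReplaceCategorySketch {
    final Map<String, Integer> categories = new HashMap<>(); // category -> index
    final List<Integer> data = new ArrayList<>();            // one index per row

    // Returns 1 if the category was replaced, 0 otherwise.
    int replaceCategory(String oldCategory, String newCategory, boolean joinCategory) {
        Integer oldIndex = categories.get(oldCategory);
        if (oldIndex == null || oldCategory.equals(newCategory)) {
            return 0;                               // nothing to replace
        }
        Integer existing = categories.get(newCategory);
        if (existing == null) {
            categories.remove(oldCategory);         // plain rename: the index is kept
            categories.put(newCategory, oldIndex);
            return 1;
        }
        if (!joinCategory) {
            return 0;                               // name clash and joining not allowed
        }
        for (int i = 0; i < data.size(); i++) {     // move rows to the existing category
            if (oldIndex.equals(data.get(i))) {
                data.set(i, existing);
            }
        }
        categories.remove(oldCategory);
        return 1;
    }
}
```

After a join, the index of the old category is no longer in use, which is one way gaps can appear in the index sequence.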

 

setValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

     inout oValue : Object

 

toBinary

This method implements the functionality of toBinary in the column abstraction, returning a binary column constructed from the data contained in the categorical column. The list of category identifiers considered as True values in the binary column is passed as a parameter. Category identifiers not included in the list are considered as False values. Note that invalid values are preserved.

Parameters:

     aReferenceTrueValues The list of categories representing true values

     sName The name of the new binary column

 

 

 

 

Type

BinaryColumn

Visibility

public

Is Abstract

false

Parameter

     inout aReferenceTrueValues : List<String>

     inout sName : String
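The mapping from categories to binary values can be sketched as a membership test against the reference list. In this illustration rows are plain category strings and a Java null models an invalid value. Both choices are assumptions, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CategoricalToBinarySketch {

    static List<Boolean> toBinary(List<String> rows, List<String> referenceTrueValues) {
        Set<String> trueSet = new HashSet<>(referenceTrueValues);
        List<Boolean> result = new ArrayList<>();
        for (String row : rows) {
            if (row == null) {
                result.add(null);                   // invalid values are kept as they are
            } else {
                result.add(trueSet.contains(row));  // listed categories become true
            }
        }
        return result;
    }
}
```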

 

 

toNominal

This method implements the functionality of toNominal in the column abstraction, returning a nominal column constructed from the data contained in the categorical column. Strings for the nominal column are constructed from the category identifiers.

Parameter:

     sName The name of the new nominal column

 

Type

NominalColumn

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

 

toNumerical

This method implements the functionality of toNumerical in the column abstraction, returning an integer column constructed from the data contained in the categorical column. Numbers of the integer column are extracted from the category indexes.

Parameter:

     sName The name of the new integer column

 

Type

IntegerColumn

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

Relation Detail

Generalization

 

Name

 

Related Element

     RangeColumnImpl

 

Name

 

Related Element

     ColumnImpl

 

Class DateColumn

This class represents the abstraction of a date datatype column. This type of column is specifically required by ARFF datasets.

Figure 28. Class DateColumn

 

 

Name

DateColumn

Qualified Name

es::uco::kdis::datapro::dataset::Column::DateColumn

Visibility

public

Abstract

false

Base Classifier

     ColumnAbstraction

Realized Interface

 

 

 

Operation Detail

addDateSpecification

This method calls the implementation to set the date format specification of the values in the column.

Parameter:

     oDate The format specification of the values in the date column

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout oDate : SimpleDateFormat
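As an illustration of what a date format specification does, the sketch below uses the standard java.text.SimpleDateFormat class to turn strings into Date objects. The helper parseOrNull and the pattern "yyyy-MM-dd" are chosen for the example only and say nothing about how datapro4j parses values internally.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class DateSpecificationSketch {

    // Parses text with the given specification; returns null when it does not match.
    static Date parseOrNull(SimpleDateFormat spec, String text) {
        try {
            return spec.parse(text);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        SimpleDateFormat spec = new SimpleDateFormat("yyyy-MM-dd");
        spec.setLenient(false); // reject impossible dates such as 2012-02-30
        Date date = parseOrNull(spec, "2012-07-17");
        System.out.println(spec.format(date));
        System.out.println(parseOrNull(spec, "not a date"));
    }
}
```

Note that SimpleDateFormat instances are not thread-safe, so a specification object should not be shared between threads without synchronization.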

 

DateColumn

Default constructor with no parameters. The implementation DateColumnImpl is invoked.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

DateColumn

Constructor with the name of the column as a parameter. The implementation DateColumnImpl is invoked.

Parameter:

     sName The name of the column

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

getDateSpecification

This method calls the implementation to get the date format specification of the values in the column.

 

 

Type

SimpleDateFormat

Visibility

public

Is Abstract

false

Parameter

 

 

Relation Detail

Generalization

 

Name

 

Related Element

     ColumnAbstraction

 

 

Class DateColumnImpl

This class provides the implementation code accessing real data in a date column. Values are stored as Date objects according to the format specified by a given SimpleDateFormat object. This class should not be invoked directly, only by the column abstraction.

 

Figure 29. Class DateColumnImpl

 

Name

DateColumnImpl

Qualified Name

es::uco::kdis::datapro::dataset::Column::DateColumnImpl

Visibility

public

Abstract

false

Base Classifier

     ColumnImpl

Realized Interface

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

For a more complete specification of the methods inherited from  ColumnImpl, see its specifications above.

 

 

 

 

 

addAllValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout rgoCol : List<Object>

 

 

addDateSpecification

This method implements the method addDateSpecification of the date column abstraction, setting the date format specification of the values in the column.

Parameter:

     oDate The format specification of the values in the date column

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout oDate : SimpleDateFormat

 

addValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout oValue : Object

 

addValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in bForce : boolean

     inout oValue : Object

 

addValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

     inout oValue : Object

 

DateColumnImpl

Default constructor with no parameters.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

 

 

getDateSpecification

This method implements the method getDateSpecification of the column abstraction, returning the date format specification of the values in the column.

 

Type

SimpleDateFormat

Visibility

public

Is Abstract

false

Parameter

 

 

getElement

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

         in iPos : int

 

getSize

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

getValues

 

Type

List<Object>

Visibility

public

Is Abstract

false

Parameter

 

 

removeValue

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

 

setValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

     inout oValue : Object

 

 

Relation Detail

 

Generalization

 

Name

 

Related Element

     ColumnImpl

 

Class IntegerColumn

This class represents the abstraction of an integer column. Integer columns are a specialization of numerical (real) columns.

Figure 30. Class IntegerColumn

 

 

Name

IntegerColumn

Qualified Name

es::uco::kdis::datapro::dataset::Column::IntegerColumn

Visibility

public

Abstract

false

Base Classifier

     NumericalColumn

Realized Interface

 

 

Operation Detail

 

Many methods are specializations of their respective methods in the numerical column (NumericalColumn), adapted to the domain of integer values.

 

getiMaxInterval

Analogously to getdMaxInterval in the NumericalColumn abstraction class, this method gets the maximum integer value allowed for this column.

 

Type

Integer

Visibility

public

Is Abstract

false

Parameter

 

 

getiMinInterval

Analogously to getdMinInterval in the NumericalColumn abstraction class, this method gets the minimum integer value allowed for this column.

 

Type

Integer

Visibility

public

Is Abstract

false

Parameter

 

 

 

 

getMaxValue

See getMaxValue in the specification of the NumericalColumn abstraction class.

 

Type

double

Visibility

public

Is Abstract

false

Parameter

 

 

getMinValue

For further information, see getMinValue in the specification of the NumericalColumn abstraction class.

Type

double

Visibility

public

Is Abstract

false

Parameter

 

 

IntegerColumn

Default constructor with no parameters.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

IntegerColumn

Constructor with the name of the resulting column as a parameter.

Parameter:

     sName The name of the column

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

mean

For further information, see mean in the specification of the NumericalColumn abstraction class.

 

Type

double

Visibility

public

Is Abstract

false

Parameter

 

 

setiMaxInterval

Analogously to setdMaxInterval in the NumericalColumn abstraction class, this method sets the maximum integer value allowed for this column.

Parameter:

     iMaxInterval The maximum value allowed in the column

 

Exceptions:

     IllegalAccessException if the value cannot be set.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout iMaxInterval : Integer

 

setiMinInterval

Analogously to setdMinInterval in the NumericalColumn abstraction class, this method sets the minimum integer value allowed for this column.

Parameter:

     iMinInterval The minimum value allowed in the column

 

Exceptions:

     IllegalAccessException if the value cannot be set.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

         inout iMinInterval : Integer

 

standardDeviation

For further information, see standardDeviation in the specification of the NumericalColumn abstraction class.

 

Type

double

Visibility

public

Is Abstract

false

Parameter
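The statistics above return double even though the column stores integers. A minimal sketch of both computations on a plain list follows. It assumes the population form of the standard deviation (division by n), which may differ from what the library computes, and the class IntegerStatsSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class IntegerStatsSketch {

    // Arithmetic mean; an empty list yields NaN because of the 0.0 / 0 division.
    static double mean(List<Integer> values) {
        double sum = 0.0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Population standard deviation (divides by n); the library may use n - 1 instead.
    static double standardDeviation(List<Integer> values) {
        double m = mean(values);
        double squared = 0.0;
        for (int v : values) {
            squared += (v - m) * (v - m);
        }
        return Math.sqrt(squared / values.size());
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(2, 4, 4, 4, 5, 5, 7, 9);
        System.out.println(mean(values));              // 5.0
        System.out.println(standardDeviation(values)); // 2.0
    }
}
```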

 

 

toCategorical

This method calls the implementation to return a categorical column using the values contained in the integer column, where each different value constitutes a different category.

 

Type

CategoricalColumn

Visibility

public

Is Abstract

false

Parameter

 

 

toNumerical

This method calls the implementation to return a numerical column using the values contained in the integer column, where each integer value is cast to a double value.

 

Type

NumericalColumn

Visibility

public

Is Abstract

false

Parameter
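Both conversions of an integer column can be sketched on plain lists. The illustration below assumes that categories are named after the decimal form of each distinct integer and that null entries are skipped or kept as null. The class IntegerConversionSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class IntegerConversionSketch {

    // Each distinct integer becomes one category, in order of first appearance.
    static List<String> categoriesOf(List<Integer> values) {
        Set<String> categories = new LinkedHashSet<>();
        for (Integer v : values) {
            if (v != null) {
                categories.add(String.valueOf(v));
            }
        }
        return new ArrayList<>(categories);
    }

    // Each integer is widened to a double; null entries are kept as null.
    static List<Double> toNumerical(List<Integer> values) {
        List<Double> result = new ArrayList<>();
        for (Integer v : values) {
            result.add(v == null ? null : Double.valueOf(v.doubleValue()));
        }
        return result;
    }
}
```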

 

 

 

 

 

 

Relation Detail

Generalization

 

 

Name

 

Related Element

     NumericalColumn

 

 

Class IntegerColumnImpl

This class provides the implementation code accessing real data in an integer column. This class is a specialization of the numerical column implementation (NumericalColumnImpl). Integer values are stored as objects of class Integer. This class and its methods should not be invoked directly.

Figure 31. Class IntegerColumnImpl

 

Name

IntegerColumnImpl

Qualified Name

es::uco::kdis::datapro::dataset::Column::IntegerColumnImpl

Visibility

public

Abstract

false

Base Classifier

     NumericalColumnImpl

Realized Interface

 

 

Operation Detail

 

For further information, see a complete specification of these methods in NumericalColumnImpl and ColumnImpl.

 

addValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout oValue : Object

 

addValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

     inout oValue : Object

 

getMaxValue

 

Type

double

Visibility

public

Is Abstract

false

Parameter

 

 

getMinValue

 

Type

double

Visibility

public

Is Abstract

false

Parameter

 

 

IntegerColumnImpl

Default constructor with no parameters.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

mean

 

Type

double

Visibility

public

Is Abstract

false

Parameter

 

 

 

setValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

     inout oValue : Object

 

standardDeviation

 

Type

double

Visibility

public

Is Abstract

false

Parameter

 

 

toCategorical

This method implements the method toCategorical of the abstraction, returning a categorical column using the values contained in the integer column, where each different value constitutes a different category.

Parameter:

     sName The name of the resulting column

 

Type

CategoricalColumn

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

toNumerical

This method implements the method toNumerical of the abstraction, returning a numerical column using the values contained in the integer column, where each integer value is cast to a double value.

Parameter:

     sName The name of the resulting column

 

Type

NumericalColumn

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

 

Relation Detail

Generalization

 

Name

 

Related Element

     NumericalColumnImpl

 

 

Class NominalColumn

This class represents the abstraction of a nominal column containing free-style strings as values. It defines the methods that provide operations specific to nominal values.

Figure 32. Class NominalColumn

 

 

Name

NominalColumn

Qualified Name

es::uco::kdis::datapro::dataset::Column::NominalColumn

Visibility

public

Abstract

false

Base Classifier

     ColumnAbstraction

Realized Interface

 

 

 

Operation Detail

NominalColumn

Default constructor with no parameters.

 

 

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

NominalColumn

Constructor with the name of the column as parameter.

Parameter:

     sName Name of the column

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

toCategorical

This method calls the implementation to return a categorical column, where each different string is a category (no repetition).

 

Type

CategoricalColumn

Visibility

public

Is Abstract

false

Parameter
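Turning free-style strings into categories amounts to building a dictionary of distinct strings and replacing each row by its dictionary index. The sketch below shows this on plain lists. The class name, the use of -1 for null rows and the first-appearance ordering are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NominalToCategoricalSketch {

    // Returns one code per row and appends the distinct strings to categoriesOut.
    static List<Integer> encode(List<String> rows, List<String> categoriesOut) {
        Map<String, Integer> seen = new LinkedHashMap<>(); // keeps first-appearance order
        List<Integer> codes = new ArrayList<>();
        for (String row : rows) {
            if (row == null) {
                codes.add(-1);               // assumed code for rows without a string
                continue;
            }
            Integer code = seen.get(row);
            if (code == null) {
                code = seen.size();          // next free index
                seen.put(row, code);
            }
            codes.add(code);
        }
        categoriesOut.addAll(seen.keySet());
        return codes;
    }
}
```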

 

 

toNumerical

This method calls the implementation to return a numerical column, where each string is parsed to a numerical value. If the string could not be parsed, then an empty value is added instead. Notice that upper and lower interval values are not set for the numerical column returned.

 

Type

NumericalColumn

Visibility

public

Is Abstract

false

Parameter
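The parsing rule described above can be sketched with Double.valueOf, which throws NumberFormatException for strings that are not numbers. In this illustration Double.NaN stands in for the column-specific empty value. That marker and the class name are assumptions, not the datapro4j behaviour.

```java
import java.util.ArrayList;
import java.util.List;

class NominalToNumericalSketch {

    // Hypothetical marker standing in for the column-specific empty value.
    static final Double EMPTY = Double.NaN;

    static List<Double> toNumerical(List<String> values) {
        List<Double> result = new ArrayList<>();
        for (String text : values) {
            if (text == null) {
                result.add(EMPTY);
                continue;
            }
            try {
                result.add(Double.valueOf(text.trim()));
            } catch (NumberFormatException e) {
                result.add(EMPTY);   // unparsable strings become the empty value
            }
        }
        return result;
    }
}
```

Since no interval bounds are derived from the parsed data, any range restriction would have to be set afterwards on the resulting column.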

 

 

 

Relation Detail

Generalization

 

Name

 

Related Element

     ColumnAbstraction

 

 

Class NominalColumnImpl

This class provides the implementation code accessing real data in the nominal column. Nominal values are stored as String objects. Note that these methods should not be invoked directly.

Figure 33. Class NominalColumnImpl

 

 

Name

NominalColumnImpl

Qualified Name

es::uco::kdis::datapro::dataset::Column::NominalColumnImpl

Visibility

public

Abstract

false

Base Classifier

     ColumnImpl

Realized Interface

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

 

For a more detailed specification of the methods inherited from  ColumnImpl, see its specification above.

 

addAllValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout rgoCol : List<Object>

 

addValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout oValue : Object

 

 

 

 

 

 

 

 

 

addValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

     inout oValue : Object

 

countEmptyValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countInvalidValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countMissingValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countNullValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

getElement

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

     in iPos : int

 

getSize

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

 

 

getValues

 

Type

List<Object>

Visibility

public

Is Abstract

false

Parameter

 

 

 

NominalColumnImpl

Default constructor with no parameters.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

removeValue

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

 

setValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

     inout oValue : Object

 

toCategorical

This method implements the method toCategorical of the abstraction, returning a categorical column, where each different string is a category (no repetition).

Parameter:

     sName The name of the column to be created

 

Type

CategoricalColumn

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

toNumerical

This method implements the method toNumerical of the abstraction, returning a numerical column, where each string is parsed to a numerical value. If the string could not be parsed, then an empty value is added instead. Notice that upper and lower interval values are not set for the numerical column returned.

Parameter:

     sName The name of the column

 

 

 

Type

NumericalColumn

Visibility

public

Is Abstract

false

Parameter

     inout sName : String
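The parsing behaviour described for toNumerical can be sketched in plain Java. This is not the library code: the class NominalToNumerical is hypothetical, and null is used here as a stand-in for the "empty value" of datapro4j, whose actual representation is not described in this section.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of datapro4j.
class NominalToNumerical {

    // Parses each string as a double. Null or unparsable entries become null,
    // which stands in for the library's "empty value" (an assumption).
    static List<Double> parseAll(List<String> values) {
        List<Double> result = new ArrayList<>();
        for (String s : values) {
            Double parsed = null;
            if (s != null) {
                try {
                    parsed = Double.valueOf(s.trim());
                } catch (NumberFormatException e) {
                    // Leave parsed as null: the string is not a number.
                }
            }
            result.add(parsed);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseAll(Arrays.asList("3.5", "abc", " 42 ", "")));
    }
}
```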

 

 

Relation Detail

Generalization

 

Name

 

Related Element

     ColumnImpl

 

 

Class NumericalColumn

This class represents the abstraction of a numerical (real) column.

 

Figure 34. Class NumericalColumn

 

Name

NumericalColumn

Qualified Name

es::uco::kdis::datapro::dataset::Column::NumericalColumn

Visibility

public

Abstract

false

Base Classifier

     ColumnAbstraction

Realized Interface

 

 

 

Attribute Detail

dMaxInterval

This attribute indicates the maximum value allowed in the column. This property should be accessed using getter/setter methods.

 

Type

Double

Default Value

Double.MAX_VALUE

Visibility

protected

Multiplicity

 

 

dMinInterval

This attribute indicates the minimum value allowed in the column. This property should be accessed using getter/setter methods.

 

Type

Double

Default Value

Double.MIN_VALUE

Visibility

protected

Multiplicity

 

 

 

Operation Detail

getdMaxInterval

This method returns the maximum real value allowed in the column. The implementation is not invoked, since this information is part of metadata.

 

Type

Double

Visibility

public

Is Abstract

false

Parameter

 

 

getdMinInterval

This method returns the minimum real value allowed in the column. The implementation is not invoked, since this information is part of metadata.

 

Type

Double

Visibility

public

Is Abstract

false

Parameter

 

 

getMaxValue

This method calls the implementation to get the maximum existing value in the column data.

 

Type

double

Visibility

public

Is Abstract

false

Parameter

 

 

getMinValue

This method calls the implementation to get the minimum existing value in the column data.

 

Type

double

Visibility

public

Is Abstract

false

Parameter

 

 

mean

This method calls the implementation to get the mean value of the column data.

 

Type

double

Visibility

public

Is Abstract

false

Parameter

 

 

normalize

This method calls the implementation to normalize the set of values in the numerical column.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

NumericalColumn

Default constructor with no parameters. The implementation NumericalColumnImpl is invoked.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

NumericalColumn

Constructor with the name of the column as a parameter. The implementation NumericalColumnImpl is invoked.

Parameter:

     sName The name of the column

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

setdMaxInterval

This method sets the maximum real value allowed in the column. The implementation is not invoked, since this information is part of the column metadata.

Parameter:

     dMaxInterval The maximum value allowed

 

Exceptions:

     IllegalAccessException if the value cannot be set

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout dMaxInterval : Double

 

setdMinInterval

This method sets the minimum real value allowed in the column. The implementation is not invoked, since this information is part of the column metadata.

Parameter:

     dMinInterval The minimum value allowed

 

Exceptions:

     IllegalAccessException if the value cannot be set

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout dMinInterval : Double
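One detail of the default interval bounds listed in the attribute detail can be checked in plain Java, independently of datapro4j: Double.MIN_VALUE is the smallest positive double, not the most negative one. The sketch below uses a hypothetical class IntervalBounds with an inclusive range check; whether the library performs such a check against dMinInterval and dMaxInterval is not stated in this guide, so the check itself is an assumption.

```java
// Hypothetical illustration, not part of datapro4j.
class IntervalBounds {

    // Inclusive range check against a [min, max] interval (assumed semantics).
    static boolean withinBounds(double value, double min, double max) {
        return value >= min && value <= max;
    }

    public static void main(String[] args) {
        // Double.MIN_VALUE is the smallest positive double, so it is greater than zero.
        System.out.println(Double.MIN_VALUE > 0.0);
        // With the documented defaults as bounds, a negative value falls outside the interval.
        System.out.println(withinBounds(-1.0, Double.MIN_VALUE, Double.MAX_VALUE));
        // -Double.MAX_VALUE is the most negative finite double.
        System.out.println(withinBounds(-1.0, -Double.MAX_VALUE, Double.MAX_VALUE));
    }
}
```

If negative values are expected in a column, setting the lower bound explicitly with setdMinInterval avoids relying on the default.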

 

standardDeviation

This method calls the implementation to return the standard deviation calculated from the set of values in the numerical column.

 

Type

double

Visibility

public

Is Abstract

false

Parameter

 

 

standarize

This method calls the implementation to standardize the set of values in the numerical column.

Parameters:

     dMean Value of the mean used to standardize the set of values of the column

     dVariance Value of the variance used for the standardization

 

Type

void

Visibility

public

Is Abstract

false

Parameter

in dMean : double

in dVariance : double
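This guide does not give the formula applied by standarize. A minimal sketch, assuming the conventional z-score (each value minus the mean, divided by the square root of the variance), is shown below. The class ZScore is hypothetical and does not belong to datapro4j.

```java
import java.util.Arrays;

// Hypothetical helper, not part of datapro4j.
class ZScore {

    // Returns (x - mean) / sqrt(variance) for each x (assumed formula).
    // A variance of zero yields NaN or infinite results in this sketch.
    static double[] standardize(double[] values, double mean, double variance) {
        double stdDev = Math.sqrt(variance);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (values[i] - mean) / stdDev;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] standardized = standardize(new double[] {2.0, 4.0, 6.0}, 4.0, 4.0);
        System.out.println(Arrays.toString(standardized));
    }
}
```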

 

 

toInteger

This method calls the implementation to return an integer column containing values extracted from the numerical column. It returns an IntegerColumn object.

Parameter:

     bRoundedValue if false, values are truncated; if true, values are rounded.

 

Type

IntegerColumn

Visibility

public

Is Abstract

false

Parameter

     in bRoundedValue : boolean
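The difference between the two settings of bRoundedValue can be sketched as follows. The class RealToInteger is hypothetical, and the sketch assumes that truncation discards the fractional part (toward zero) and that rounding behaves like Math.round; the exact rules used by the library are not stated here.

```java
// Hypothetical helper, not part of datapro4j.
class RealToInteger {

    // rounded == false: drop the fractional part (truncation toward zero).
    // rounded == true: round to the nearest integer as Math.round does.
    static int convert(double value, boolean rounded) {
        return rounded ? (int) Math.round(value) : (int) value;
    }

    public static void main(String[] args) {
        System.out.println(convert(2.7, false));
        System.out.println(convert(2.7, true));
        System.out.println(convert(-2.7, false));
        System.out.println(convert(-2.7, true));
    }
}
```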

 

toNominal

This method calls the implementation to return a nominal column, where strings are constructed from real values.

 

Type

NominalColumn

Visibility

public

Is Abstract

false

Parameter

 

 

Relation Detail

Generalization

 

Name

 

Related Element

     ColumnAbstraction

 

 

Class NumericalColumnImpl

This class provides the implementation code for accessing real data in a numerical column. Values are stored as objects of the class Double. Notice that this class should not be instantiated directly; only its abstraction should create it.

Figure 35. Class NumericalColumnImpl

 

 

Name

NumericalColumnImpl

Qualified Name

es::uco::kdis::datapro::dataset::Column::NumericalColumnImpl

Visibility

public

Abstract

false

Base Classifier

     ColumnImpl

Realized Interface

 

 

Attribute Detail

 

All the attributes are either private or protected.

 

Operation Detail

addAllValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout rgoCol : List<Object>

 

addValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout oValue : Object

 

addValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

     inout oValue : Object

 

countEmptyValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countInvalidValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countMissingValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countNullValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

getElement

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

     in iPos : int

 

 

getMaxValue

This method implements the method getMaxValue of the abstraction class, returning the maximum existing value in the column.

 

Type

double

Visibility

public

Is Abstract

false

Parameter

 

 

getMinValue

This method implements the method getMinValue of the abstraction class, returning the minimum existing value in the column.

 

Type

double

Visibility

public

Is Abstract

false

Parameter

 

 

getSize

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

getValues

 

Type

List<Object>

Visibility

public

Is Abstract

false

Parameter

 

 

mean

This method implements the method mean of the abstraction class, returning the mean value of the column.

 

Type

double

Visibility

public

Is Abstract

false

Parameter

 

 

normalize

This method implements the method normalize of the abstraction class, calculating and normalizing the values contained in the set of values of the column.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

NumericalColumnImpl

Default constructor with no parameters.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

removeValue

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

 

setValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

     inout oValue : Object

 

standardDeviation

This method implements the method standardDeviation of the abstraction class, returning the standard deviation value of the set of values contained in the numerical column.

 

Type

double

Visibility

public

Is Abstract

false

Parameter

 

 

standarize

This method implements the method standarize of the abstraction class, standardizing the values in the column according to the mean and variance passed as parameter.

Parameters:

     dMean Mean value considered for the standardization

     dVariance Variance value considered for the standardization

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     in dMean : double

     in dVariance : double

 

toInteger

This method implements the method toInteger of the abstraction class, returning an integer column calculated from the numerical column.

Parameters:

     sName The name of the resulting new column

     bRoundedValue If false, values are truncated; if true, values are rounded

 

 

Type

IntegerColumn

Visibility

public

Is Abstract

false

Parameter

     in bRoundedValue : boolean

     inout sName : String

 

toNominal

This method implements the method toNominal of the abstraction class, returning a nominal column whose strings are built from the numerical values in the column.

Parameter:

     sName The name of the resulting new column

 

Type

NominalColumn

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

 

Relation Detail

Generalization

 

Name

 

Related Element

     ColumnImpl

 

 

Class RangeColumn

This class represents the abstraction of a range column, whose values are intervals with a minimum and a maximum value in the range.

Figure 36. Class RangeColumn

 

 

Name

RangeColumn

Qualified Name

es::uco::kdis::datapro::dataset::Column::RangeColumn

Visibility

public

Abstract

false

Base Classifier

     ColumnAbstraction

Realized Interface

 

 

Operation Detail

RangeColumn

Default constructor with no parameters.

 

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

 

RangeColumn

Constructor with the name of the column as a parameter.

Parameter:

     sName The name of the column.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

 

toCategorical

This method calls the implementation to return a categorical column extracted from the range data contained in the column. The method returns a CategoricalColumn object.

 

Exceptions:

     NotAddedValueException

 

Type

CategoricalColumn

Visibility

public

Is Abstract

false

Parameter

 

 

 

toNumerical

This method calls the implementation to return a numerical column extracted from the range values contained in the column, according to one of the following modes:

0: The minimum value of each range is selected.

1: The maximum value of each range is selected.

2: The mean value between min and max is selected.

3: A random value in the range is selected.

 

It returns the resulting NumericalColumn object.

Parameter:

     iMode An integer between 0 and 3 indicating the conversion mode, as described above.

 

Exceptions:

     NotAddedValueException

 

Type

NumericalColumn

Visibility

public

Is Abstract

false

Parameter

     inout iMode : int
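The four conversion modes can be sketched on a single [min, max] range in plain Java. The class RangeToNumber is hypothetical and not part of datapro4j; in particular, mode 3 is assumed here to draw uniformly from the range, and an invalid mode raises IllegalArgumentException in this sketch rather than any exception of the library.

```java
import java.util.Random;

// Hypothetical helper, not part of datapro4j.
class RangeToNumber {

    // Picks one real value from the range [min, max] according to the mode.
    static double pick(double min, double max, int mode, Random rng) {
        switch (mode) {
            case 0: return min;                                   // minimum of the range
            case 1: return max;                                   // maximum of the range
            case 2: return (min + max) / 2.0;                     // mean of min and max
            case 3: return min + rng.nextDouble() * (max - min);  // uniform draw (assumed)
            default:
                throw new IllegalArgumentException("mode must be between 0 and 3: " + mode);
        }
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int mode = 0; mode <= 3; mode++) {
            System.out.println(mode + ": " + pick(10.0, 20.0, mode, rng));
        }
    }
}
```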

 

toNumericalByGaussian

This method calls the implementation to return a numerical column extracted from the range values contained in the column, according to a Gaussian distribution.

Parameters:

     dMean The arithmetic mean for the distribution

     dStdDev The standard deviation for the distribution

 

It returns the resulting NumericalColumn object.

Exceptions:

     NotAddedValueException

 

Type

NumericalColumn

Visibility

public

Is Abstract

false

Parameter

     in dMean : double

     in dStdDev : double

 

 

Relation Detail

Generalization

 

 

Name

 

Related Element

     ColumnAbstraction

 

 

Class RangeColumnImpl

This class provides the implementation of a range column (i.e. a representation of a [min, max] interval). Programmers should use the abstraction RangeColumn instead, since it hides the actual implementation of the column. Even when the implementation changes, the abstraction must remain unaltered.

Figure 37. Class RangeColumnImpl

 

 

 

Name

RangeColumnImpl

Qualified Name

es::uco::kdis::datapro::dataset::Column::RangeColumnImpl

Visibility

public

Abstract

false

Base Classifier

     ColumnImpl

Realized Interface

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

 

For a detailed specification of the methods inherited from  ColumnImpl, see its specifications above.

 

addAllValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout rgoValues : List<Object>

 

addValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     inout oValue : Object

 

addValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

     inout oValue : Object

 

countEmptyValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countInvalidValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

 

countMissingValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

countNullValues

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

getElement

 

Type

Object

Visibility

public

Is Abstract

false

Parameter

     in iPos : int

 

getSize

 

Type

int

Visibility

public

Is Abstract

false

Parameter

 

 

getValues

 

Type

List<Object>

Visibility

public

Is Abstract

false

Parameter

 

 

RangeColumn

Default constructor with no parameters.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

RangeColumn

Constructor with the name of the column as a parameter.

Parameter:

     sName The name of the column.

 

 

 

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout sName : String

 

removeValue

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

 

setValue

 

Type

int

Visibility

public

Is Abstract

false

Parameter

     in iIndex : int

     inout oValue : Object

 

toCategorical

This method implements the method toCategorical of the abstraction, returning a categorical column extracted from the range data contained in the column. The method returns the resulting CategoricalColumn object.

 

Exceptions:

     NotAddedValueException

 

Type

CategoricalColumn

Visibility

public

Is Abstract

false

Parameter

 

 

toNumerical

This method implements the method toNumerical of the abstraction, returning a numerical column extracted from the range values contained in the column, according to one of the following modes:

0: The minimum value of each range is selected.

1: The maximum value of each range is selected.

2: The mean value between min and max is selected.

3: A random value in the range is selected.

 

It returns the resulting NumericalColumn object.

Parameter:

     iMode An integer between 0 and 3 indicating the conversion mode, as described above.

 

Exceptions:

     NotAddedValueException

 

 

Type

NumericalColumn

Visibility

public

Is Abstract

false

Parameter

     in iMode : int

 

 

toNumericalByGaussian

This method implements the method toNumericalByGaussian of the abstraction, returning a numerical column extracted from the range values contained in the column, according to a Gaussian distribution.

Parameters:

     dMean The arithmetic mean for the distribution

     dStdDev The standard deviation for the distribution

 

It returns the resulting NumericalColumn object.

Exceptions:

     NotAddedValueException

 

Type

NumericalColumn

Visibility

public

Is Abstract

false

Parameter

     in dMean : double

     in dStdDev : double

 

 

Relation Detail

 

Generalization

 

 

Name

 

Related Element

     ColumnImpl


 

Package es::uco::kdis::datapro::dataset::Source

 

 

Figure 38. Package es.uco.kdis.datapro.dataset.Source

 

 

Name

Source

Qualified Name

es::uco::kdis::datapro::dataset::Source

 

 

Class ArffDataset

ArffDataset implements the ARFF (Attribute-Relation File Format) dataset file specification, as used by Weka. This is a subclass of FileDataset.

ARFF files are ASCII text files that describe a list of instances sharing a set of attributes. After a few heading lines, where the metainformation is presented, one instance per line is dumped, until the end of the file is reached.

Types of attribute in ARFF dataset files:

     @ATTRIBUTE name numeric (As numerical columns)

     @ATTRIBUTE name {value1, value2, ...} (As categorical columns)

     @ATTRIBUTE name string (As nominal columns)

     @ATTRIBUTE name date "yyyy-MM-dd HH:mm:ss" (As date columns)
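The correspondence between ARFF attribute declarations and column kinds listed above can be sketched as a small classifier. This is an illustration only: the class ArffAttributeKind is hypothetical, it is not how ArffDataset is implemented, and it assumes attribute names without embedded spaces.

```java
import java.util.Locale;

// Hypothetical helper, not part of datapro4j.
class ArffAttributeKind {

    // Maps an "@attribute name type" declaration to the column kind listed above.
    // Assumes the attribute name contains no whitespace.
    static String columnKind(String declaration) {
        // Split into keyword, name and the remaining type specification.
        String[] parts = declaration.trim().split("\\s+", 3);
        if (parts.length < 3 || !parts[0].equalsIgnoreCase("@attribute")) {
            throw new IllegalArgumentException("Not an attribute declaration: " + declaration);
        }
        String type = parts[2].trim();
        String lower = type.toLowerCase(Locale.ROOT);
        if (type.startsWith("{")) return "categorical";
        if (lower.equals("numeric")) return "numerical";
        if (lower.equals("string")) return "nominal";
        if (lower.startsWith("date")) return "date";
        throw new IllegalArgumentException("Unsupported attribute type: " + type);
    }

    public static void main(String[] args) {
        System.out.println(columnKind("@ATTRIBUTE temperature numeric"));
        System.out.println(columnKind("@ATTRIBUTE outlook {sunny, overcast, rainy}"));
        System.out.println(columnKind("@ATTRIBUTE comment string"));
        System.out.println(columnKind("@ATTRIBUTE timestamp date \"yyyy-MM-dd HH:mm:ss\""));
    }
}
```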

 

 

For a further description, visit the web site  http://www.cs.waikato.ac.nz/ml/weka/arff.html (Nov. 1st, 2008).

Figure 39. Class ArffDataset

 

Name

ArffDataset

Qualified Name

es::uco::kdis::datapro::dataset::Source::ArffDataset

Visibility

public

Abstract

false

Base Classifier

     FileDataset

Realized Interface

 


 

Attribute Detail

 

Some attributes are protected to allow reusability by inheritance.

 

ATTRIBUTE

ATTRIBUTE is the static constant string for the ARFF keyword '@attribute'.

 

Type

String

Default Value

"@attribute"

Visibility

protected

Multiplicity

 

 

DATA

DATA is the static constant string for the ARFF keyword '@data'. It defines the beginning of the data block in the ARFF file.

 

Type

String

Default Value

"@data"

Visibility

protected

Multiplicity

 

 

RELATION

RELATION is the static constant with the ARFF keyword '@relation'. It represents the beginning of the ARFF dataset definition.

 

Type

String

Default Value

"@relation"

Visibility

protected

Multiplicity

 

 

 

Operation Detail

addAllValues

This method reads the DATA block in the dataset and adds the values in the file to the corresponding column structure.

Parameter:

     sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.

o  s: Nominal column

o  f: Numerical (real) column

o  c: Categorical column

o  b: Binary column

o  d: Date column

o  %: Skip this column (do not dump its values to any column)

 

For example, the string “cbbf%%d” indicates the sequence of attributes that are read from the dataset and copied in memory to the column structure of the dataset: a categorical column, two binary columns, and a numerical column. The following two attributes are omitted. Finally, the date attribute is copied.
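The reading of a column format string described above can be sketched as a simple decoder. The class ColumnFormat is hypothetical and not part of datapro4j; it only translates each symbol into the name of the column type it denotes, using the symbols documented for ARFF datasets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of datapro4j.
class ColumnFormat {

    // Translates each symbol of the format string into a column type name.
    static List<String> decode(String format) {
        List<String> kinds = new ArrayList<>();
        for (char symbol : format.toCharArray()) {
            switch (symbol) {
                case 's': kinds.add("nominal"); break;
                case 'f': kinds.add("numerical"); break;
                case 'c': kinds.add("categorical"); break;
                case 'b': kinds.add("binary"); break;
                case 'd': kinds.add("date"); break;
                case '%': kinds.add("skip"); break;
                default:
                    throw new IllegalArgumentException("Unknown column symbol: " + symbol);
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println(decode("cbbf%%d"));
    }
}
```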

 

 

Exceptions:

     IndexOutOfBoundsException

     IOException

     NotAddedValueException

 

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

     inout sColumnFormat : String

 

 

ArffDataset

Default constructor with no parameters. No dataset filename is specified using this constructor.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

ArffDataset

Constructor with the filename of the dataset as a parameter.

Parameter:

     sFileName The filename of the dataset

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout sFileName : String

 

close

This method closes the ARFF file.

Exception:

     IOException

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

obtainMetadata

This method reads the metadata of an ARFF file. Each attribute specification is interpreted and, if required, the column structure is created in the dataset.

 

Parameter:

     sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.

o  s: Nominal column

o  f: Numerical (real) column

o  c: Categorical column

o  b: Binary column

o  d: Date column

o  %: Skip this column (do not dump its values to any column)

 

For example, the code "bbf%c" indicates that two binary columns and a numerical (real) column will be read. Then, the fourth attribute will be skipped and, finally, a categorical column will be read.

 

Exceptions:

     IOException

     InputMismatchException

 

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

     inout sColumnFormat : String

 

open

This method opens the dataset file using the name passed as a parameter to the constructor.

Exceptions:

     FileNotFoundException

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

 

readDataset

This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.

Parameters:

     sContentFormat Not considered for ARFF datasets

     sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:

o  s: Nominal column

o  f: Numerical column

o  c: Categorical column

o  b: Binary column

o  d: Date column

o  %: Skip this column

 

Exceptions:

     NotAddedValueException

     IOException

     IndexOutOfBoundsException

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout sColumnFormat : String

     inout sContentFormat : String

 

readDataset

This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.

Parameter:

     sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:

o  s: Nominal column

o  f: Numerical column

o  c: Categorical column

o  b: Binary column

o  d: Date column

o  %: Skip this column

 

Exceptions:

      NotAddedValueException

      IOException

      IndexOutOfBoundsException

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout sColumnFormat : String

 

readDataset

This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file. The value of the column format string is null.

 

Exceptions:

     NotAddedValueException

     IOException

     IndexOutOfBoundsException

 

Type

void

Visibility

public

Is Abstract

false

Parameter

 

 

writeDataset

This method opens the dataset file, writes metadata and instances, and closes the file. The column types accepted (otherwise, an InputMismatchException exception is thrown) are the following:

     Numerical

     Date

     Nominal

     Categorical

     Boolean (binary values are saved as categorical values)

 

Parameter:

     sOutputFile The filename of the dataset

 

Exceptions:

     InputMismatchException

     IOException

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout sOutputFile : String

 

 

Relation Detail

 

Generalization

 

Name

 

Related Element

     FileDataset

 

 

Class CsvDataset

CsvDataset implements the CSV (Comma-Separated Values) dataset file specification, as prescribed by the IETF specification, available from  http://tools.ietf.org/html/rfc4180 (October, 2005).

 

Figure 40. Class CsvDataset

 

 

Name

CsvDataset

Qualified Name

es::uco::kdis::datapro::dataset::Source::CsvDataset

Visibility

public

Abstract

false

Base Classifier

     FileDataset

Realized Interface

 

 

Operation Detail

addAllValues

This method adds all the values in the file to the corresponding column structure.

Parameter:

     sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.

o  s: Nominal column

o  f: Numerical (real) column

o  i: Integer column

o  c: Categorical column

o  %: Skip this column (do not dump its values to any column)

 

For example, the string “cf%%s” indicates the sequence of attributes that are read from the dataset and copied in memory to the column structure of the dataset: a categorical column and a numerical column. The following two attributes are omitted and, finally, the nominal attribute is copied.

 

Exceptions:

     IndexOutOfBoundsException

     IOException

     NotAddedValueException

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

     inout sColumnFormat : String

 

close

This method closes the CSV file.

Exception:

     IOException

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

 

CsvDataset

The default constructor of the CSV dataset with no parameters.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

CsvDataset

Constructor of the CSV dataset with its filename as a parameter.

Parameter:

     sFileName The filename of the CSV dataset

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout sFileName : String

 

obtainMetadata

This method reads the metadata of the CSV file. Notice that any metainformation in CSV files is optional.

Parameter:

     sContentFormat String that specifies the structure of the CSV file. The following symbols are used:

o  n: Indicates that a line with the attribute names is read

o  v: Indicates the block containing the instance values is read

o  %: Skip one row in the file

     sColumnFormat String that specifies the different column types to be read from the file. Each column type is represented by one of the following symbols:

o  s: Nominal column

o  f: Numerical (real) column

o  c: Categorical column

o  i: Integer column

o  %: Skip this column

 

Exceptions:

     IOException

     IllegalFormatSpecificationException

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

     inout sColumnFormat : String

     inout sContentFormat : String

 

open

This method opens the dataset CSV file using the name passed as a parameter to the constructor.

Exceptions:

     FileNotFoundException

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

readDataset

This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.

Parameters:

     sContentFormat String that specifies the structure of the CSV file. The following symbols are used:

o  n: Indicates that a line with the attribute names is read

o  v: Indicates the block containing the instance values is read

o  %: Skip one row in the file

For example, “%n%%v” omits the first line, then reads the column names, omits the next two lines and, finally, reads the dataset instances.

     sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:

o  s: Nominal column

o  f: Numerical column

o  i: Integer column

o  c: Categorical column

o  %: Skip this column

 

Exceptions:

     NotAddedValueException

     IOException

     IndexOutOfBoundsException

     IllegalFormatSpecificationException

 

 

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout sColumnFormat : String

     inout sContentFormat : String
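The effect of a content format string such as “%n%%v” can be sketched on a list of text lines. The class ContentFormat below is hypothetical and not part of datapro4j. It assumes that the symbol v consumes all remaining lines, and it splits fields on plain commas without handling quoted fields, which a reader conforming to RFC 4180 would need to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of datapro4j.
class ContentFormat {

    final List<String> names = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    // Walks the lines of a file according to the content format symbols.
    static ContentFormat read(List<String> lines, String contentFormat) {
        ContentFormat result = new ContentFormat();
        int next = 0; // index of the next line to consume
        for (char symbol : contentFormat.toCharArray()) {
            switch (symbol) {
                case '%': // skip one row
                    next++;
                    break;
                case 'n': // read one row holding the attribute names
                    result.names.addAll(Arrays.asList(lines.get(next++).split(",")));
                    break;
                case 'v': // read the block of instances (assumed: all remaining rows)
                    while (next < lines.size()) {
                        result.rows.add(lines.get(next++).split(","));
                    }
                    break;
                default:
                    throw new IllegalArgumentException("Unknown content symbol: " + symbol);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "# exported file", "age,city", "# units", "# notes", "31,Cordoba", "27,Sevilla");
        ContentFormat parsed = read(lines, "%n%%v");
        System.out.println(parsed.names + " with " + parsed.rows.size() + " instances");
    }
}
```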

 

 

readDataset

This method opens the dataset, reads metainformation and instances and, finally, closes the dataset file. This method assumes the following file format: one first line with the attribute names (metadata), followed by the instances.

Parameter:

     sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:

o  s: Nominal column

o  f: Numerical column

o  i: Integer column

o  c: Categorical column

o  %: Skip this column

 

Exceptions:

      NotAddedValueException

      IOException

      IndexOutOfBoundsException

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout sColumnFormat : String

 

 

writeDataset

This method writes a new CSV dataset file. The column types allowed for writing are the following:

      Numerical

      Integer

      Nominal

      Categorical

      Binary (binary values are saved as categorical values)

Parameter:

     sOutputFile The filename of the dataset

 

Exceptions:

     IOException

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout sOutputFile : String

 

 

 

 

 

 

Relation Detail

Generalization

 

 

Name

 

Related Element

     FileDataset

 

 

Class ExcelDataset

ExcelDataset is a class that represents a dataset conformant to the Microsoft Excel standard specification. This type of file has the basic features of all spreadsheets, using a grid of cells arranged in numbered rows and letter-named columns.

Note: This class has an external dependency on the Java library POI.

 

Figure 41. Class ExcelDataset

 

Name

ExcelDataset

Qualified Name

es::uco::kdis::datapro::dataset::Source::ExcelDataset

Visibility

public

Abstract

false

Base Classifier

     FileDataset

Realized Interface

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

addAllValues

This method adds all the values in the DATA block of the file to the corresponding column structure.

Parameter:

     sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.

o  s: Nominal column

o  f: Numerical (real) column

o  i: Integer column

o  c: Categorical column

o  %: Skip this column (do not dump its values to any column)

 

For example, the string “cf%%s” indicates the sequence of attributes that are read from the dataset and copied in memory to the column structure of the dataset: a categorical column and a numerical column. The following two attributes are omitted and, finally, the nominal attribute is copied.

Exceptions:

     IndexOutOfBoundsException

     IOException

     NotAddedValueException

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

     inout sColumnFormat : String

 

 

close

This method closes the Excel file.

Exceptions:

     IOException

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

 

ExcelDataset

Default constructor with no parameters.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

 

ExcelDataset

Constructor with the filename as parameter.

Parameter:

     sFileName The filename of the Excel dataset

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout sFileName : String

 

obtainMetadata

This method reads the metadata of the Excel file.

Parameter:

     sContentFormat String that specifies the data structure in the Excel file. The following symbols are used:

o  n: Indicates that a line with the attribute names is read

o  v: Indicates the block containing the instance values is read

o  %: Skip one row in the file

     sColumnFormat String that specifies the different column types to be read from the file. Each column type is represented by one of the following symbols:

o  s: Nominal column

o  f: Numerical (real) column

o  c: Categorical column

o  i: Integer column

o  %: Skip this column

 

Exceptions:

     IOException

     IllegalFormatSpecificationException

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

     inout sColumnFormat : String

     inout sContentFormat : String

 

 

open

This method opens the Excel file using the name passed as a parameter to the constructor.

Exceptions:

     FileNotFoundException

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

 

 

 

readDataset

This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.

Parameters:

     sContentFormat String that specifies the structure of the Excel file. The following symbols are used:

o  n: Indicates that a line with the attribute names is read

o  v: Indicates the block containing the instance values is read

o  %: Skip one row in the file

For example, “%n%%v” omits the first line, then reads the column names, omits the next two lines and, finally, reads the dataset instances.

     sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:

o  s: Nominal column

o  f: Numerical column

o  i: Integer column

o  c: Categorical column

o  %: Skip this column


 

Exceptions:

     NotAddedValueException

     IOException

     IndexOutOfBoundsException

     IllegalFormatSpecificationException

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout sColumnFormat : String

     inout sContentFormat : String

 

 

writeDataset

This method writes the dataset to a new Excel file. The column types supported for writing are the following:

      Numerical

      Integer

      Nominal

      Categorical

      Binary (binary values are saved as categorical values)

Parameter:

     sOutputFile The filename of the dataset

 

Exceptions:

     IOException

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout sOutputFile : String

 

 

Relation Detail

 

Generalization

 

 

Name

 

Related Element

     FileDataset

 

 

Class KeelDataset

KeelDataset is the class representing a dataset conformant to the KEEL (Knowledge Extraction based on Evolutionary Learning) standard specification. KeelDataset is a subclass of ArffDataset.

KEEL files are a specific subtype of ARFF files with the following kinds of attributes:

     @ATTRIBUTE name real [value1, value2] for real data

     @ATTRIBUTE name integer [value1, value2] for integer data

     @ATTRIBUTE name {value1, value2, ...} for categorical data
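The three attribute forms listed above can be told apart with two regular expressions. The sketch below is only an illustration under that assumption; KeelAttributeSketch and kindOf are hypothetical names and do not reflect how KeelDataset actually parses its header.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of datapro4j: classifies a KEEL @ATTRIBUTE line. */
final class KeelAttributeSketch {

    // @ATTRIBUTE name real [min, max]  or  @ATTRIBUTE name integer [min, max]
    private static final Pattern NUMERIC = Pattern.compile(
            "(?i)@attribute\\s+(\\S+)\\s+(real|integer)\\s*\\[\\s*([^,\\]]+)\\s*,\\s*([^\\]]+)\\]");

    // @ATTRIBUTE name {value1, value2, ...}
    private static final Pattern CATEGORICAL = Pattern.compile(
            "(?i)@attribute\\s+(\\S+)\\s*\\{([^}]*)\\}");

    /** Returns "real", "integer" or "categorical"; throws if the line matches no known form. */
    static String kindOf(String sLine) {
        String sTrimmed = sLine.trim();
        Matcher oNumeric = NUMERIC.matcher(sTrimmed);
        if (oNumeric.matches()) {
            return oNumeric.group(2).toLowerCase(Locale.ROOT);
        }
        if (CATEGORICAL.matcher(sTrimmed).matches()) {
            return "categorical";
        }
        throw new IllegalArgumentException("Not a recognised attribute line: " + sLine);
    }
}
```

The keyword is matched case-insensitively, so both @ATTRIBUTE and @attribute are accepted in this sketch.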

 

For a more detailed description of this specification, the reader can consult the following reference:

J. Alcalá-Fdez et al. KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework. Journal of Multiple-Valued Logic and Soft Computing 17:2-3 (2011) 255-287.

 

Also, for further information, visit the website http://www.keel.es.

Figure 42. Class KeelDataset

 

Name

KeelDataset

Qualified Name

es::uco::kdis::datapro::dataset::Source::KeelDataset

Visibility

public

Abstract

false

Base Classifier

     ArffDataset

Realized Interface

 

 

 

Attribute Detail

INPUTS

Constant for the keyword @inputs

 

Type

String

Default Value

"@inputs"

Visibility

protected

Multiplicity

 

 

OUTPUTS

Constant for the keyword @outputs

 

Type

String

Default Value

"@outputs"

Visibility

protected

Multiplicity

 

 

 

Operation Detail

addAllValues

This method adds all the values in the @DATA block of the file to the corresponding column structure.

Parameter:

     sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.

o  f: Numerical (real) column

o  i: Integer column

o  c: Categorical column

o  b: Binary column

o  %: Skip this column (do not dump its values to any column)

 

For example, the string “cf%%b” indicates that the first attribute of each instance is copied in memory to a categorical column and the second attribute to a numerical column. The following two attributes are omitted and, finally, the fifth attribute is copied to a binary column.
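The effect of a column format on a single instance can be sketched as follows. This is a stand-alone illustration, not the code of addAllValues: ColumnFormatSketch and select are hypothetical names, and the textual representation of binary values as true/false is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of datapro4j: applies a column format to one instance. */
final class ColumnFormatSketch {

    /** Returns the values kept for one instance, each tagged with the kind of column it goes to. */
    static List<String> select(String sColumnFormat, String[] asValues) {
        if (sColumnFormat.length() != asValues.length) {
            throw new IndexOutOfBoundsException("Format and instance differ in length");
        }
        List<String> lKept = new ArrayList<>();
        for (int i = 0; i < asValues.length; i++) {
            char cSymbol = sColumnFormat.charAt(i);
            switch (cSymbol) {
                case 'f':
                    lKept.add("numerical:" + Double.parseDouble(asValues[i]));
                    break;
                case 'i':
                    lKept.add("integer:" + Integer.parseInt(asValues[i]));
                    break;
                case 'c':
                    lKept.add("categorical:" + asValues[i]);
                    break;
                case 'b':
                    // Assumption: binary values are written as true/false
                    lKept.add("binary:" + Boolean.parseBoolean(asValues[i]));
                    break;
                case '%':
                    break; // this attribute is not copied to any column
                default:
                    throw new IllegalArgumentException("Unknown symbol: " + cSymbol);
            }
        }
        return lKept;
    }
}
```

For the instance (red, 1.5, x, y, true) and the format “cf%%b”, only the first, second and fifth values are kept.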

Exceptions:

     IndexOutOfBoundsException

     IOException

     NotAddedValueException

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

     inout sColumnFormat : String

 

 

KeelDataset

Default constructor with no parameters.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

 

KeelDataset

Constructor with the filename of the dataset as a parameter.

Parameter:

     sFileName The filename containing the dataset

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout sFileName : String

 

 

obtainMetadata

This method reads the metadata of the KEEL file.

Parameter:

     sColumnFormat String that specifies the different column types to be read from the file. Each column type is represented by one of the following symbols:

o  b: Binary column

o  f: Numerical (real) column

o  c: Categorical column

o  i: Integer column

o  %: Skip this column

 

Exceptions:

     IOException

     IllegalFormatSpecificationException

 

Type

void

Visibility

protected

Is Abstract

false

Parameter

     inout sColumnFormat : String

 

writeDataset

This method writes the dataset to a new KEEL file. Only the following column types are supported for writing:

     Numerical (real)

     Integer

     Categorical

 

Parameter:

     sOutputFile The filename of the dataset

 

Exceptions:

     IOException

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout sOutputFile : String

 

 

Relation Detail

Generalization

 

Name

 

Related Element

     ArffDataset


 

Package es::uco::kdis::datapro::datatypes

 

 

Figure 43. Package es.uco.kdis.datapro.datatypes

 

 

Name

datatypes

Qualified Name

es::uco::kdis::datapro::datatypes

 

 

Class InvalidValue

This abstract class represents any invalid value in a column. This is the base class of the following types of invalid values:

     Missing values.

     Null values.

     Empty values.

 

 

For a more detailed description, see the following reference:

Pyle, D. Data preparation for data mining. Morgan Kaufmann, 1999. ISBN: 1-55860-529-0.

 

Note. Columns may define their own invalid values. However, these values are not processed by the library; they are only intended for serialization and for specific algorithms. In general, the objects provided here for invalid values are sufficient for regular use. Moreover, these objects are notation-independent and are only used for data processing.

Figure 44. Class InvalidValue

 

Name

InvalidValue

Qualified Name

es::uco::kdis::datapro::datatypes::InvalidValue

Visibility

public

Abstract

true

Base Classifier

 

Realized Interface

 

 

Relation Detail

Generalization

 

Name

 

Related Element

     MissingValue

 

Name

 

Related Element

     EmptyValue


 

Name

 

Related Element

     NullValue

 

 

Class EmptyValue

This class represents an empty value in a variable, i.e., a value for which no real-world value can be supposed.

This class implements a singleton object, so only one instance exists at any time. The instance is obtained using the method getEmptyValue. Therefore, empty values can be compared using the operator ==.
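A minimal sketch of the singleton idea follows, assuming an eagerly created instance and a private constructor. The classes InvalidValueSketch, EmptyValueSketch and EmptyValueDemo are stand-ins written for this illustration; the library's EmptyValue may be implemented differently.

```java
/** Stand-in for the abstract base class of invalid values. */
abstract class InvalidValueSketch { }

/** Stand-in singleton; the private constructor prevents any second instance. */
final class EmptyValueSketch extends InvalidValueSketch {

    private static final EmptyValueSketch INSTANCE = new EmptyValueSketch();

    private EmptyValueSketch() { }

    /** Always returns the same object, so callers may compare with ==. */
    static EmptyValueSketch getEmptyValue() {
        return INSTANCE;
    }
}

final class EmptyValueDemo {

    /** Counts the cells holding the empty-value marker, comparing by reference. */
    static int countEmpty(Object[] aoColumn) {
        int iCount = 0;
        for (Object oCell : aoColumn) {
            if (oCell == EmptyValueSketch.getEmptyValue()) {
                iCount++;
            }
        }
        return iCount;
    }
}
```

Because every call returns the same reference, a reference comparison is sufficient and no equals method is needed in this sketch.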

Figure 45. Class EmptyValue

 

Name

EmptyValue

Qualified Name

es::uco::kdis::datapro::datatypes::EmptyValue

Visibility

public

Abstract

false

Base Classifier

     InvalidValue

Realized Interface

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

getEmptyValue

Singleton constructor for the object representing an empty value.

 

Type

EmptyValue

Visibility

public

Is Abstract

false

Parameter

 

 

 

Relation Detail

Generalization

 

Name

 

Related Element

     InvalidValue

 

 

Class MissingValue

This class represents a missing value in a variable, i.e., a value that has not been entered into the dataset, but for which an actual value exists in the real world in which the measurements were made.

This class implements a singleton object, so only one instance exists at any time. The instance is obtained using the method getMissingValue. Therefore, missing values can be compared using the operator ==.

Figure 46. Class MissingValue

 

Name

MissingValue

Qualified Name

es::uco::kdis::datapro::datatypes::MissingValue

Visibility

public

Abstract

false

Base Classifier

     InvalidValue

Realized Interface

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

getMissingValue

Singleton constructor for the object representing a missing value.

 

Type

MissingValue

Visibility

public

Is Abstract

false

Parameter

 

 

 

Relation Detail

 

Generalization

 

 

Name

 

Related Element

     InvalidValue

 

 

Class NullValue

This class represents an explicit null value in a variable.

This class implements a singleton object, so only one instance exists at any time. The instance is obtained using the method getNullValue. Therefore, null values can be compared using the operator ==. Its use allows the programmer to replace null references with a comparable object instance (e.g. in collections, comparisons, etc.).
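One situation where a marker object is useful is a collection that does not accept null, such as java.util.concurrent.ConcurrentHashMap, which rejects null values. The sketch below illustrates this under the assumption of a stand-in singleton; NullValueSketch and NullValueDemo are hypothetical names and not the library's classes.

```java
import java.util.Map;

/** Stand-in singleton for illustration only; not the library's NullValue class. */
final class NullValueSketch {

    private static final NullValueSketch INSTANCE = new NullValueSketch();

    private NullValueSketch() { }

    static NullValueSketch getNullValue() {
        return INSTANCE;
    }
}

final class NullValueDemo {

    /** Stores a reading, substituting the marker object when the reading is null. */
    static void put(Map<String, Object> mReadings, String sKey, Object oValue) {
        mReadings.put(sKey, oValue == null ? NullValueSketch.getNullValue() : oValue);
    }

    /** Tells whether the stored reading is the null marker, comparing by reference. */
    static boolean isNull(Map<String, Object> mReadings, String sKey) {
        return mReadings.get(sKey) == NullValueSketch.getNullValue();
    }
}
```

The map then holds an entry for every key, and the distinction between "no entry" and "explicit null" is preserved.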

Figure 47. Class NullValue

 

 

Name

NullValue

Qualified Name

es::uco::kdis::datapro::datatypes::NullValue

Visibility

public

Abstract

false

Base Classifier

     InvalidValue

Realized Interface

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

getNullValue

Singleton constructor for the object representing a null value.

 

Type

NullValue

Visibility

public

Is Abstract

false

Parameter

 

 

 

Relation Detail

 

Generalization

 

Name

 

Related Element

     InvalidValue

 

 

Class Range

This class is a template to represent any kind of interval defined by a minimum and a maximum limit. Each boundary can be open or closed, indicating that the boundary value is excluded from or included in the range, respectively. The template parameter C is the class of the objects contained in the range.

Figure 48. Class Range

 

Name

Range

Qualified Name

es::uco::kdis::datapro::datatypes::Range

Visibility

public

Abstract

true

Base Classifier

 

Realized Interface

 

 

Attribute Detail

 

Protected attributes with accessors (getter/setter) are omitted.

 

Operation Detail

getMaxValue

This method returns the upper interval boundary value, i.e. the maximum value in the interval (the programmer has to check whether the boundary is open or closed).

 

Type

C

Visibility

public

Is Abstract

false

Parameter

 

 

getMinValue

This method returns the lower interval boundary value, i.e. the minimum value in the interval (the programmer has to check whether the boundary is open or closed).

 

Type

C

Visibility

public

Is Abstract

false

Parameter

 

 

isOpenMax

This method returns a boolean value indicating whether the upper interval boundary is open, i.e. the maximum value is excluded from the range.

 

Type

boolean

Visibility

public

Is Abstract

false

Parameter

 

 

isOpenMin

This method returns a boolean value indicating whether the lower interval boundary is open, i.e. the minimum value is excluded from the range.

 

Type

boolean

Visibility

public

Is Abstract

false

Parameter

 

 

setMaxValue

This method sets the upper interval boundary.

Parameter:

     oMax The new maximum value

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout oMax : C

 

setMinValue

This method sets the lower interval boundary.

Parameter:

     oMin The new minimum value

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout oMin : C

 

setOpenMax

This method sets the upper interval boundary to open or closed.

Parameter:

     bOpenMax True if open; false if closed.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout bOpenMax : boolean

 

setOpenMin

This method sets the lower interval boundary to open or closed.

Parameter:

     bOpenMin True if open; false if closed.

 

Type

void

Visibility

public

Is Abstract

false

Parameter

     inout bOpenMin : boolean

 

Relation Detail

Dependency

 

Name

 

Related Element

     Range<Double>

 

 

Class DoubleRange

This class is a specialization of the template Range, where the template parameter is of type Double.

Figure 49. Class DoubleRange

 

Name

DoubleRange

Qualified Name

es::uco::kdis::datapro::datatypes::DoubleRange

Visibility

public

Abstract

false

Base Classifier

     Range<Double>

Realized Interface

 

 

Operation Detail

DoubleRange

Default constructor with no parameters. By default, the lower and upper interval boundaries are set to negative and positive infinity, respectively.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

 

 

DoubleRange

Constructor with parameters.

Parameters:

     dMin The minimum value of the range, i.e. the lower interval boundary.

     dMax The maximum value of the range, i.e. the upper interval boundary.

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     in dMax : double

     in dMin : double

 

hasValue

This method returns true if the value passed as a parameter is a valid value in the interval.

Parameter:

     dValue The value to be checked.

 

Type

boolean

Visibility

public

Is Abstract

false

Parameter

     in dValue : double

 

toString

This method returns the interval in a String format. The output format is as follows:

'[' | '('  <min> ,<max>  ')' | ']'

where a square bracket is used for a closed boundary, and a parenthesis indicates an open boundary.
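The behaviour described for hasValue and toString can be sketched with stand-in classes as follows. RangeSketch and DoubleRangeSketch are hypothetical names; the order of the constructor arguments, the closed default boundaries and the exact spacing of the printed interval are assumptions, since the guide does not state them.

```java
/** Stand-in for the Range template; only members described in this guide are sketched. */
abstract class RangeSketch<C> {
    protected C oMin;
    protected C oMax;
    protected boolean bOpenMin;
    protected boolean bOpenMax;

    C getMinValue() { return oMin; }
    C getMaxValue() { return oMax; }
    boolean isOpenMin() { return bOpenMin; }
    boolean isOpenMax() { return bOpenMax; }
    void setOpenMin(boolean bOpenMin) { this.bOpenMin = bOpenMin; }
    void setOpenMax(boolean bOpenMax) { this.bOpenMax = bOpenMax; }
}

/** Stand-in for DoubleRange. */
final class DoubleRangeSketch extends RangeSketch<Double> {

    DoubleRangeSketch(double dMin, double dMax) {
        this.oMin = dMin;
        this.oMax = dMax;
        // Assumption: both boundaries start closed
    }

    /** A boundary value belongs to the interval only if that boundary is closed. */
    boolean hasValue(double dValue) {
        boolean bAboveMin = bOpenMin ? dValue > oMin : dValue >= oMin;
        boolean bBelowMax = bOpenMax ? dValue < oMax : dValue <= oMax;
        return bAboveMin && bBelowMax;
    }

    @Override
    public String toString() {
        return (bOpenMin ? "(" : "[") + oMin + ", " + oMax + (bOpenMax ? ")" : "]");
    }
}
```

In this sketch, a range built from 0.0 and 1.0 with an open upper boundary contains 0.0 but not 1.0.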

 

Type

String

Visibility

public

Is Abstract

false

Parameter

 

 

 

Relation Detail

 

Generalization

 

 

Name

 

Related Element

     Range<Double>


 

Package es::uco::kdis::datapro::exception

 

 

Figure 50. Package es.uco.kdis.datapro.exception

 

Name

exception

Qualified Name

es::uco::kdis::datapro::exception

 

 

Class IllegalFormatSpecificationException

This class is the exception indicating that the file format under consideration does not fulfill the expected standards for such a specification.
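As the class derives from Exception, it is a checked exception that callers must catch or declare. The sketch below shows that shape with a stand-in class; IllegalFormatSpecificationSketch and FormatCheckSketch are hypothetical names, and the validation rule (rejecting unknown column symbols) is only an assumed example of when such an exception could be raised.

```java
/** Stand-in with the shape described in this guide: a checked exception carrying a message. */
class IllegalFormatSpecificationSketch extends Exception {

    private static final long serialVersionUID = 1L;

    IllegalFormatSpecificationSketch(String sMessage) {
        super(sMessage);
    }
}

final class FormatCheckSketch {

    /** Rejects column formats containing symbols other than s, f, c, i, b and %. */
    static void check(String sColumnFormat) throws IllegalFormatSpecificationSketch {
        for (int i = 0; i < sColumnFormat.length(); i++) {
            char cSymbol = sColumnFormat.charAt(i);
            if ("sfcib%".indexOf(cSymbol) < 0) {
                throw new IllegalFormatSpecificationSketch(
                        "Unexpected symbol '" + cSymbol + "' at position " + i);
            }
        }
    }
}
```

A caller would typically wrap the call in a try/catch block and report the message to the user.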

 

Figure 51. Class IllegalFormatSpecificationException

 

Name

IllegalFormatSpecificationException

Qualified Name

es::uco::kdis::datapro::exception::IllegalFormatSpecificationException

Visibility

public

Abstract

false

Base Classifier

     Exception

Realized Interface

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

IllegalFormatSpecificationException

Constructor with the error message as a parameter.

Parameter:

     string Error message

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout string : String

 

 

 

 

 

Relation Detail

 

Generalization

 

Name

 

Related Element

     Exception

 

 

Class NoSuchCategoryException

This class is the exception indicating that a certain element does not belong to the specified category, or that a category is not found.

Figure 52. Class NoSuchCategoryException

 

Name

NoSuchCategoryException

Qualified Name

es::uco::kdis::datapro::exception::NoSuchCategoryException

Visibility

public

Abstract

false

Base Classifier

     Exception

Realized Interface

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

NoSuchCategoryException

Constructor with the error message as a parameter.

Parameter:

     string Error message

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout string : String

 

Relation Detail

 

Generalization

 

 

Name

 

Related Element

     Exception

 

 

 

Class NotAddedValueException

This class is the exception indicating that a value was not successfully added to the dataset.

 

Figure 53. Class NotAddedValueException

 

Name

NotAddedValueException

Qualified Name

es::uco::kdis::datapro::exception::NotAddedValueException

Visibility

public

Abstract

false

Base Classifier

     Exception

Realized Interface

 

 

Attribute Detail

 

All attributes are private.

 

Operation Detail

NotAddedValueException

Constructor with the error message as a parameter.

Parameter:

     string Error message

 

Type

 

Visibility

public

Is Abstract

false

Parameter

     inout string : String

 

 

Relation Detail

 

Generalization

 

 

Name

 

Related Element

     Exception


 

 

Appendix A: UML diagrams

 

This appendix shows the class diagrams that represent the structure of datapro4j. The general package overview is shown first, followed by the diagram of each package.

 


Figure 54. Class diagram: package overview

 

 

Package es.uco.kdis.datapro.algorithm.base

 

Figure 55. Class diagram: package es.uco.kdis.datapro.algorithm.base

 

Package es.uco.kdis.datapro.algorithm.preprocessing

Figure 56. Class diagram: Package es.uco.kdis.datapro.algorithm.preprocessing

Package es.uco.kdis.datapro.dataset.Column

 

Figure 57. Class diagram: Package es.uco.kdis.datapro.dataset.Column

 

 

Package es.uco.kdis.datapro.dataset.Source

 

Figure 58. Class diagram: Package es.uco.kdis.datapro.dataset.Source

 


 

Package es.uco.kdis.datapro.datatypes

 

 

Figure 59. Class diagram: Package es.uco.kdis.datapro.datatypes

 

Package es.uco.kdis.datapro.exception

 

Figure 60. Class diagram: Package es.uco.kdis.datapro.exception


Appendix B: Extending the library

Project structure

 

This project is structured in three different parts:

1.   Column structure.

2.   Datasets hierarchy.

3.   Strategies.

 

Column structure

If the programmer wants to develop new columns or adapt an existing one to specific requirements, the strict separation between abstraction and implementation should be kept in mind. The abstraction implements the methods devoted to managing the column meta-information and delegates any processing, handling or query related to the actual column values to its implementation. For further information, see the Bridge design pattern (http://en.wikipedia.org/wiki/Bridge_pattern).

 

We recommend the following guidelines for the development of new columns:

     Column classes should be located in the package es.uco.kdis.datapro.dataset.Column

     For a given type of column, namely X, the abstraction class will be named XColumn, and its implementation class, XColumnImpl.

     The new column X has to be added to the enumeration ColumnType. This value is returned by the column as its type.

     Column implementations should not be directly accessed from any other class than its abstraction.
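The guidelines above can be sketched as follows, assuming a hypothetical column type named Word. None of these classes exist in datapro4j: ColumnTypeSketch, ColumnImplSketch and ColumnSketch are stand-ins for the library's own enumeration and base classes, whose actual methods are documented in their respective sections.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical enumeration; the library's ColumnType declares its own constants. */
enum ColumnTypeSketch { WORD }

/** Implementation side of the bridge: stores and retrieves the actual values. */
interface ColumnImplSketch<T> {
    void add(T oValue);
    T get(int iIndex);
    int size();
}

/** Abstraction side: keeps the meta-information and delegates value handling. */
abstract class ColumnSketch<T> {
    private final String sName;
    private final ColumnImplSketch<T> oImpl;

    ColumnSketch(String sName, ColumnImplSketch<T> oImpl) {
        this.sName = sName;
        this.oImpl = oImpl;
    }

    String getName() { return sName; }
    abstract ColumnTypeSketch getType();

    void addValue(T oValue) { oImpl.add(oValue); }
    T getValue(int iIndex) { return oImpl.get(iIndex); }
    int getSize() { return oImpl.size(); }
}

/** Only WordColumn touches this class, following the XColumn / XColumnImpl naming. */
final class WordColumnImpl implements ColumnImplSketch<String> {
    private final List<String> lValues = new ArrayList<>();
    public void add(String sValue) { lValues.add(sValue); }
    public String get(int iIndex) { return lValues.get(iIndex); }
    public int size() { return lValues.size(); }
}

final class WordColumn extends ColumnSketch<String> {
    WordColumn(String sName) { super(sName, new WordColumnImpl()); }
    @Override ColumnTypeSketch getType() { return ColumnTypeSketch.WORD; }
}
```

Client code only sees WordColumn, so the storage strategy in WordColumnImpl can change without affecting it.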

 

Datasets hierarchy

The library provides a finite number of dataset implementations (ARFF, Keel, CSV, MySql, ... and increasing), but its architecture allows the programmer to extend this part and make other datasets of interest available. Dataset classes rarely inherit directly from the top-level abstract class Dataset; for design reasons, it is advisable to create, use and maintain a proper class hierarchy in which common properties (both structural and behavioural) are defined. For example, ARFF and CSV datasets inherit from the common file-based dataset, i.e. the abstract class FileDataset. Their respective classes only define those properties that are specific to these kinds of file, whereas properties shared by all file-based datasets are defined by intermediate abstract classes. Dataset is always the root of this hierarchy, since this class links the physical dataset to the logical column structure.

 

Some guidelines to be considered:

     Dataset abstract classes for defining common properties are located in the package es.uco.kdis.datapro.dataset

     Dataset concrete classes are located in the package es.uco.kdis.datapro.dataset.Source

     Dataset classes should be named with the suffix “Dataset”, e.g., CsvDataset.

 

Apart from the constructor (with or without parameters), the main methods to pay attention to are inherited from the abstract class Dataset:

     readDataset, which allows the programmer to configure the type of columns to be filled, as well as the dataset structure.

     writeDataset, which permits the programmer to save current dataset values into the specific format.

 

These methods should fulfill the following assumptions:

     When reading, format can vary or contain errors (invalid values, missing or wrong structure, etc.).

     When reading, the original structure (meta-data) of the dataset should be recalled somehow.

     When writing, the dataset may have been read from a dataset of the same type, or not:

o  If the source dataset is of the same format, the programmer may want to overwrite it or generate a new dataset. In both cases, the resulting dataset should maintain the same structure (e.g. column types and meta-data) as the source dataset.

o  If the dataset to be written is of a different type than the source dataset (or the same type with a different structure), the programmer may want to specify the type of columns to be declared in the resulting dataset.
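The layering described in this section can be sketched with stand-in classes. DatasetSketch, FileDatasetSketch and TsvDataset are hypothetical and heavily simplified: the format-string parameters of the real readDataset are omitted, and rows are kept as plain string arrays instead of the library's column structure.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Stand-in root of the hierarchy: links a physical source to in-memory rows. */
abstract class DatasetSketch {
    protected final List<String[]> lRows = new ArrayList<>();
    abstract void readDataset() throws IOException;
    abstract void writeDataset(String sOutputFile) throws IOException;
}

/** Stand-in intermediate class holding what every file-based dataset shares. */
abstract class FileDatasetSketch extends DatasetSketch {
    protected final String sFileName;
    FileDatasetSketch(String sFileName) { this.sFileName = sFileName; }
}

/** Hypothetical concrete dataset for tab-separated files; defines only what is format-specific. */
final class TsvDataset extends FileDatasetSketch {

    TsvDataset(String sFileName) { super(sFileName); }

    @Override
    void readDataset() throws IOException {
        lRows.clear();
        for (String sLine : Files.readAllLines(Paths.get(sFileName), StandardCharsets.UTF_8)) {
            if (!sLine.isEmpty()) {
                lRows.add(sLine.split("\t", -1));
            }
        }
    }

    @Override
    void writeDataset(String sOutputFile) throws IOException {
        List<String> lLines = new ArrayList<>();
        for (String[] asRow : lRows) {
            lLines.add(String.join("\t", asRow));
        }
        Files.write(Paths.get(sOutputFile), lLines, StandardCharsets.UTF_8);
    }
}
```

Reading a file and writing it back with the same class preserves its structure, which corresponds to the first writing case listed above.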

 

Strategies

Strategies are the core and most scalable element of the library. Strategies implement algorithms on data. Strategies are independent of a specific dataset, so they can make use of more than one dataset. See DatasetStrategy in this guide for more information on the methods that should be implemented.

 

To implement your own algorithms, the following guidelines should be considered:

     Every algorithm should be a subclass of DatasetStrategy.

     Algorithms are grouped in packages from es.uco.kdis.datapro.algorithm

     Only the package es.uco.kdis.datapro.algorithm.base is required by the library. The remaining packages under es.uco.kdis.datapro.algorithm can be excluded from the programmer's distribution. Notice that each specific algorithm package may have its own external dependencies.

 

Other packages

Apart from the specific packages for columns, datasets and strategies, there are some other relevant packages to consider that may be extended as well:

     es.uco.kdis.datapro.datatypes: this package implements the auxiliary classes and datatypes used by datapro4j, for example, the classes declaring invalid values, ranges, etc.

     es.uco.kdis.datapro.exception: this package implements the exception classes. The programmer should look for a suitable standard Java exception before implementing a new class, to avoid cluttering the library with unnecessary classes.

 

 

Code documentation

Class headings are documented according to the following structure: class description, contact info and history.

 

/**

* CLASS DESCRIPTION

*

* <p>

* CONTACT INFO:

* <ul>

* <li>Jose Raul Romero, PhD                   [jrromero@uco.es]

* <p>{@link http://www.jrromero.net}

* <p><p>

* Knowledge Discovery and Intelligent Systems Research Group (KDIS) <p>

* {@link http://www.uco.es/grupos/kdis}

* </ul>

* <p>

* HISTORY:

* <ul>

* <li> INCLUDE HERE THE LIST OF CHANGES TO THIS SPECIFIC FILE

* </ul>

* <p>

*

* @author  Jose Raul Romero (JRR, 0.2, 0.3)                            EXAMPLE OF AUTHORS, INITIALS, VERSIONS

* @author  Jose Maria Luna (JML, 0.1)

* @version 0.3

*

**/

 

Each parameter and method should follow the Javadoc notation for documenting the code.

Further, remember to include the file license.txt in every distribution that includes the library or part of it.

 

Coding recommendations

 

1.     Code should be implemented following the Hungarian notation.

2.     Code and comments should be written in English.
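As a short illustration of the first recommendation, the sketch below uses type prefixes that mirror names appearing in this guide (sFileName, dMin, bOpenMax). The class NamingSketch is hypothetical and exists only to show the naming style.

```java
/** Illustrative only: identifiers carry a prefix that indicates their type. */
final class NamingSketch {

    private final String sFileName;   // s: String
    private final double dThreshold;  // d: double
    private boolean bVerbose;         // b: boolean

    NamingSketch(String sFileName, double dThreshold) {
        this.sFileName = sFileName;
        this.dThreshold = dThreshold;
    }

    void setVerbose(boolean bVerbose) {
        this.bVerbose = bVerbose;
    }

    /** Tells whether the given value is strictly above the threshold. */
    boolean exceeds(double dValue) {
        return dValue > dThreshold;
    }

    /** Returns the file name, followed by the threshold when verbose output is enabled. */
    String describe() {
        return sFileName + (bVerbose ? " (threshold " + dThreshold + ")" : "");
    }
}
```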