Package es::uco::kdis::datapro::algorithm::intruder

Figure 4. Package es.uco.kdis.datapro.algorithm.intruder

Name	intruder
Qualified Name	es::uco::kdis::datapro::algorithm::intruder

Class AverageAttack

This class implements the Average Attack. This attack strategy assigns the maximum value (push attack) or the minimum value (nuke attack) to the target item. The filler items are selected randomly, and their values are drawn from a Normal Distribution parameterized by the mean and standard deviation of each item.

For a further description see the following paper:

B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.

Name	AverageAttack
Qualified Name	es::uco::kdis::datapro::algorithm::intruder::AverageAttack
Visibility	public
Abstract	false
Base Classifier	• IntruderAttack
Realized Interface

Operation Detail

AverageAttack

Parameterized Constructor.

• oDataset The original dataset

• iNumAttacks The number of attack instances

• bPush The attack type (true, push; false, nuke)

• iTarget The target item (The column attribute/item index)

• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size

• dXRand The probability of choosing an item as a selected/filler item

• iSeed The random seed

Type
Visibility	public
Is Abstract	false
Parameter	• in bPush : boolean • in dXRand : double • in iNumAttacks : int • in iNumFillers : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset

chooseSelectedItems

The Average Attack does not use the selected item set.

Type	void
Visibility	protected
Is Abstract	false
Parameter

initialize

Initialization method.

Type	void
Visibility	public
Is Abstract	false
Parameter

setFillerValues

In the Average Attack, the values for filler items must be randomly generated by a Normal Distribution, using the mean and standard deviation of each item.

Type	void
Visibility	protected
Is Abstract	false
Parameter
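
The filler-value rule described above can be sketched as follows. This is a minimal illustration, not the DataPro implementation: the class AverageAttackSketch, the method buildProfile, the array-based profile and the clamping of values to the rating scale are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch only; class and method names are hypothetical. */
final class AverageAttackSketch {

    /**
     * Builds one attack profile over items 0..means.length-1.
     * Items that are neither the target nor a filler stay NaN (missing).
     */
    static double[] buildProfile(double[] means, double[] sds, int target, int[] fillers,
                                 boolean push, double min, double max, Random rand) {
        double[] profile = new double[means.length];
        Arrays.fill(profile, Double.NaN);
        for (int f : fillers) {
            if (f == target) {
                continue; // the target is never treated as a filler
            }
            // Draw from N(mean_f, sd_f) using the statistics of that item.
            double value = means[f] + sds[f] * rand.nextGaussian();
            // Assumption: generated values are clamped to the rating scale.
            profile[f] = Math.max(min, Math.min(max, value));
        }
        // Push attacks give the target the maximum value, nuke attacks the minimum.
        profile[target] = push ? max : min;
        return profile;
    }
}
```

Using per-item statistics is what distinguishes this attack from the Random Attack, which draws every filler value from the statistics of the overall dataset.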

setSelectedValues

The Average Attack does not use the selected item set.

Type	void
Visibility	protected
Is Abstract	false
Parameter

Relation Detail

Generalization

Name
Related Element	• IntruderAttack

Class BandwagonAttack

This class implements the Bandwagon Attack. This attack strategy assigns the maximum value (push attack) to the target item. Then a set of items, called selected items, is chosen among the most visible items.

Visible items are those with a high mean and a high evaluation density. For a further description see the following paper:

Name	BandwagonAttack
Qualified Name	es::uco::kdis::datapro::algorithm::intruder::BandwagonAttack
Visibility	public
Abstract	false
Base Classifier	• IntruderAttack
Realized Interface

Attribute Detail

dDensity

The density threshold, i.e. the minimum number of values in the column.

Type	double
Default Value
Visibility	protected
Multiplicity

dVisibility

The visibility threshold, i.e., the probability of choosing an item to act as a selected item.

Type	double
Default Value
Visibility	protected
Multiplicity

rgdMeanSD

It stores the mean and standard deviation of the overall dataset.

Type	Double
Default Value	new ArrayList<Double>()
Visibility	protected
Multiplicity	0..*

rgoVisibilityColumns

The array of columns whose visibility exceeds the thresholds dVisibility and dDensity.

Type	Integer
Default Value	new ArrayList<Integer>()
Visibility	package
Multiplicity	0..*

rgoVisibilityMeans

The array of means of the columns whose visibility exceeds the thresholds dVisibility and dDensity.

Type	Double
Default Value	new ArrayList<Double>()
Visibility	package
Multiplicity	0..*

Operation Detail

BandwagonAttack

Parameterized Constructor:

• oDataset The original dataset

• iNumAttacks The number of attack instances

• iTarget The target item (The column attribute/item index)

• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size

• iNumSelected The size of selected item set

• dVisibility The visibility threshold (absolute value of column mean).

• dDensity The density threshold (absolute value of instances without counting null, empty or missing values in the column)

• dXRand The probability of choosing an item as a filler item

• iSeed The random seed

Type
Visibility	public
Is Abstract	false
Parameter	• in dDensity : double • in dVisibility : double • in dXRand : double • in iNumAttacks : int • in iNumFillers : int • in iNumSelected : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset

chooseSelectedItems

Create the set of selected items. The size is fixed by the iNumSelected property.

Type	void
Visibility	protected
Is Abstract	false
Parameter

initialize

Initialization method for the strategy.

Type	void
Visibility	public
Is Abstract	false
Parameter

orderArray

Order the columns using their mean as the comparison metric. This method implements the QuickSort algorithm.

• iInit The initial position of the array

• iEnd The end position in the array

Type	void
Visibility	protected
Is Abstract	false
Parameter	• in iEnd : int • in iInit : int
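
As an illustration of the ordering step, the sketch below applies a QuickSort (Hoare-style partition) to two parallel lists that stand in for rgoVisibilityColumns and rgoVisibilityMeans. The class name OrderArraySketch, the explicit list parameters and the descending order (highest mean first) are assumptions; the source only states that columns are ordered by their mean.

```java
import java.util.Collections;
import java.util.List;

/** Illustrative sketch only; not the DataPro implementation. */
final class OrderArraySketch {

    /** Sorts positions iInit..iEnd (inclusive) by mean, highest mean first. */
    static void orderArray(List<Integer> columns, List<Double> means, int iInit, int iEnd) {
        if (iInit >= iEnd) {
            return;
        }
        double pivot = means.get(iInit + (iEnd - iInit) / 2);
        int i = iInit;
        int j = iEnd;
        while (i <= j) {
            while (means.get(i) > pivot) {
                i++;
            }
            while (means.get(j) < pivot) {
                j--;
            }
            if (i <= j) {
                // Keep both lists aligned: swap the mean and its column index together.
                Collections.swap(means, i, j);
                Collections.swap(columns, i, j);
                i++;
                j--;
            }
        }
        orderArray(columns, means, iInit, j);
        orderArray(columns, means, i, iEnd);
    }
}
```

Swapping both lists in the same step keeps each column index paired with its mean, so the most visible columns end up at the front of both lists.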

setFillerValues

In the Bandwagon Attack, the values for filler items must be randomly generated by a Normal Distribution, using the mean and standard deviation of the overall dataset.

Type	void
Visibility	protected
Is Abstract	false
Parameter

setSelectedValues

Set the values of selected items. In the Bandwagon Attack, each selected item has the maximum value.

Type	void
Visibility	protected
Is Abstract	false
Parameter

setVisibilityColumns

Select the columns that exceed the visibility and density threshold.

Type	void
Visibility	protected
Is Abstract	false
Parameter
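
A possible reading of this selection step is sketched below. It is only an assumption-laden illustration: the dataset is modelled as a double[][] with NaN marking missing values, the class VisibilitySketch is hypothetical, and a column is kept when its mean and its count of non-missing values reach (>=) the two thresholds.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the DataPro implementation. */
final class VisibilitySketch {

    /** Returns the indexes of the columns that satisfy both thresholds. */
    static List<Integer> visibleColumns(double[][] data, double dVisibility, double dDensity) {
        List<Integer> result = new ArrayList<>();
        int numColumns = data.length == 0 ? 0 : data[0].length;
        for (int c = 0; c < numColumns; c++) {
            double sum = 0.0;
            int count = 0;
            for (double[] row : data) {
                if (!Double.isNaN(row[c])) { // skip missing values
                    sum += row[c];
                    count++;
                }
            }
            // Density: number of present values. Visibility: column mean.
            if (count > 0 && count >= dDensity && sum / count >= dVisibility) {
                result.add(c);
            }
        }
        return result;
    }
}
```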

Relation Detail

Generalization

Name
Related Element	• ReverseBandwagonAttack

Name
Related Element	• IntruderAttack

Class DatasetStatistics

Name	DatasetStatistics
Qualified Name	es::uco::kdis::datapro::algorithm::intruder::DatasetStatistics
Visibility	public
Abstract	false
Base Classifier	• DatasetStrategy
Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

DatasetStatistics

Constructor. A parameter is required:

• data Dataset over which the statistical strategy will be executed.

Type
Visibility	public
Is Abstract	false
Parameter	• inout data : Dataset

execute

It executes the algorithm.

Type	void
Visibility	public
Is Abstract	false
Parameter

getResult

It returns the mean and SD in the form of an ArrayList of Double values.

Type	ArrayList<Double>
Visibility	public
Is Abstract	false
Parameter

initialize

Initialization/pre-processing method for the strategy.

Type	void
Visibility	public
Is Abstract	false
Parameter

postexec

Type	void
Visibility	public
Is Abstract	false
Parameter

Relation Detail

Generalization

Name
Related Element	• DatasetStrategy

Class IntruderAttack

IntruderAttack is the abstract base class for all the intruder attack algorithms. This class represents a generic attack used to alter the content of a dataset. It extends DatasetStrategy, whose methods are implemented and adapted to a general intruder strategy.

For a further description see the paper:

Name	IntruderAttack
Qualified Name	es::uco::kdis::datapro::algorithm::intruder::IntruderAttack
Visibility	public
Abstract	true
Base Classifier	• DatasetStrategy
Realized Interface

Attribute Detail

bPush

bPush represents the version of the algorithm (true, for push attack; false for nuke attack).

Type	boolean
Default Value
Visibility	protected
Multiplicity

dXRand

dXRand represents the probability of choosing an item (attribute) as a filler item.

Type	double
Default Value
Visibility	protected
Multiplicity

iActualInstance

iActualInstance represents the dataset instance modified by the attack.

Type	int
Default Value
Visibility	protected
Multiplicity

iNumAttacks

iNumAttacks represents the number of attack instances that will be generated.

Type	int
Default Value
Visibility	protected
Multiplicity

iNumFillers

iNumFillers is the number of filler items, -1 if the filler item set size is randomly chosen.

Type	int
Default Value
Visibility	protected
Multiplicity

iNumSelected

iNumSelected is the number of selected items, -1 if the selected item set size is randomly chosen.

Type	int
Default Value
Visibility	protected
Multiplicity

iSeed

iSeed is the seed for the oRand object.

Type	int
Default Value
Visibility	protected
Multiplicity

iTarget

iTarget is the target attribute of the attack.

Type	int
Default Value
Visibility	protected
Multiplicity

oInjection

oInjection stores the attack instances.

Type	Dataset
Default Value
Visibility	protected
Multiplicity

oRand

oRand represents a random object.

Type	Random
Default Value
Visibility	protected
Multiplicity

rgoFillers

rgoFillers is the set of filler items.

Type	ColumnAbstraction
Default Value	new ArrayList<ColumnAbstraction>()
Visibility	protected
Multiplicity	0..*

rgoSelected

rgoSelected is the set of selected items.

Type	ColumnAbstraction
Default Value	new ArrayList<ColumnAbstraction>()
Visibility	protected
Multiplicity	0..*

Operation Detail

addAttack

Add a new instance (with all items set to the missing value) to the injection.

Type	void
Visibility	protected
Is Abstract	false
Parameter

chooseFillerItems

Select the set of filler items. This set is common for all the intruder attack algorithms.

Type	void
Visibility	protected
Is Abstract	false
Parameter

chooseSelectedItems

Select the set of selected items. The selection process is part of a specific intruder attack algorithm.

Type	void
Visibility	protected
Is Abstract	true
Parameter

createRandomSetOfFiller

Select a random set of columns to act as filler items. The set size is also randomly selected. It returns the array of dataset columns that will act as filler items.

Type	ArrayList<ColumnAbstraction>
Visibility	protected
Is Abstract	false
Parameter

createSetOfFiller

Select a random set of columns to act as filler items. The set size is fixed by the iNumFillers property. It returns the array of dataset columns that will act as filler items.

Type	ArrayList<ColumnAbstraction>
Visibility	protected
Is Abstract	false
Parameter
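
The two filler-selection modes (random size when iNumFillers is -1, fixed size otherwise) could look like the sketch below. The class FillerSelectionSketch, the use of column indexes instead of ColumnAbstraction objects, the exclusion of the target column and the per-column use of dXRand as an inclusion probability are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative sketch only; not the DataPro implementation. */
final class FillerSelectionSketch {

    static List<Integer> chooseFillers(int numColumns, int iTarget, int iNumFillers,
                                       double dXRand, Random oRand) {
        // Every column except the target is a candidate filler.
        List<Integer> candidates = new ArrayList<>();
        for (int c = 0; c < numColumns; c++) {
            if (c != iTarget) {
                candidates.add(c);
            }
        }
        if (iNumFillers > 0) {
            // Fixed size: shuffle and keep the first iNumFillers candidates.
            Collections.shuffle(candidates, oRand);
            int size = Math.min(iNumFillers, candidates.size());
            return new ArrayList<>(candidates.subList(0, size));
        }
        // Random size: include each candidate with probability dXRand.
        List<Integer> fillers = new ArrayList<>();
        for (int c : candidates) {
            if (oRand.nextDouble() < dXRand) {
                fillers.add(c);
            }
        }
        return fillers;
    }
}
```

Passing a Random built from iSeed makes the selection reproducible across runs.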

execute

Implements the strategy of attack algorithms.

Type	void
Visibility	public
Is Abstract	false
Parameter

getMeanAndSD

Calculate the mean and standard deviation of the overall dataset. It returns an array with two elements, mean and standard deviation.

Type	ArrayList<Double>
Visibility	protected
Is Abstract	false
Parameter
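
A minimal sketch of such a computation is shown below, returning the two values in the same ArrayList<Double> shape. The class MeanSdSketch, the double[][] data model with NaN as the missing value, and the choice of the population standard deviation (dividing by n) are assumptions; the source does not say which estimator is used.

```java
import java.util.ArrayList;

/** Illustrative sketch only; not the DataPro implementation. */
final class MeanSdSketch {

    /** Returns [mean, standard deviation] over all non-missing values. */
    static ArrayList<Double> getMeanAndSD(double[][] data) {
        double sum = 0.0;
        int count = 0;
        for (double[] row : data) {
            for (double v : row) {
                if (!Double.isNaN(v)) {
                    sum += v;
                    count++;
                }
            }
        }
        ArrayList<Double> result = new ArrayList<>();
        if (count == 0) {
            // No values at all: both statistics are undefined.
            result.add(Double.NaN);
            result.add(Double.NaN);
            return result;
        }
        double mean = sum / count;
        double squares = 0.0;
        for (double[] row : data) {
            for (double v : row) {
                if (!Double.isNaN(v)) {
                    squares += (v - mean) * (v - mean);
                }
            }
        }
        result.add(mean);
        result.add(Math.sqrt(squares / count)); // population SD (assumption)
        return result;
    }
}
```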

getResult

Return the dataset injection created. It returns the object comprising the injection after the attack.

Type	Object
Visibility	public
Is Abstract	false
Parameter

initialize

Initialize the algorithm to prepare the execution.

Type	void
Visibility	public
Is Abstract	false
Parameter

isSelectedColumn

This method returns true if rgoSelected contains a column with the name given by the sName parameter, and false otherwise.

• sName The name of the column to search for.

Type	boolean
Visibility	protected
Is Abstract	false
Parameter	• inout sName : String

postexec

Post-processing after the execute method.

Type	void
Visibility	public
Is Abstract	false
Parameter

setFillerValues

This method assigns the correct value for each filler item. It depends on the intruder attack algorithm.

Type	void
Visibility	protected
Is Abstract	true
Parameter

setMaximumValue

Assign the maximum value to the target item.

Type	void
Visibility	protected
Is Abstract	false
Parameter

setMinimumValue

Assign the minimum value to the target item.

Type	void
Visibility	protected
Is Abstract	false
Parameter

setSelectedValues

The value generation process for the selected items. It also depends on the specific intruder attack algorithm.

Type	void
Visibility	protected
Is Abstract	true
Parameter

Relation Detail

Generalization

Name
Related Element	• AverageAttack

Name
Related Element	• DatasetStrategy

Name
Related Element	• RandomAttack

Name
Related Element	• LoveHateAttack

Name
Related Element	• BandwagonAttack

Name
Related Element	• SegmentAttack

Class LoveHateAttack

This class implements the Love/Hate Attack. This attack strategy sets the maximum value (push attack) or the minimum value (nuke attack) to the target item. The filler items are selected randomly and their values are assigned in the opposite sense of the target item.

For a further description see the paper:

B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.

Name	LoveHateAttack
Qualified Name	es::uco::kdis::datapro::algorithm::intruder::LoveHateAttack
Visibility	public
Abstract	false
Base Classifier	• IntruderAttack
Realized Interface

Operation Detail

chooseSelectedItems

The Love/Hate Attack does not use the selected items.

Type	void
Visibility	protected
Is Abstract	false
Parameter

initialize

Initialization method.

Type	void
Visibility	public
Is Abstract	false
Parameter

LoveHateAttack

Parameterized Constructor:

• oDataset The original dataset

• iNumAttacks The number of attack instances

• bPush The attack type (true, push; false, nuke)

• iTarget The target item (The column attribute/item index)

• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size

• dXRand The probability of choosing an item as a selected/filler item

• iSeed The random seed

Type
Visibility	public
Is Abstract	false
Parameter	• in bPush : boolean • in dXRand : double • in iNumAttacks : int • in iNumFillers : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset

setFillerValues

In the Love/Hate Attack, the values for filler items must be assigned in the opposite sense of the type of attack. If it is a push attack, all the filler items will be set to minimum value; if it is a nuke attack, all the filler items will be set to maximum value.

Type	void
Visibility	protected
Is Abstract	false
Parameter
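
The rule above is simple enough to sketch directly. The class LoveHateSketch, the array-based profile with NaN for untouched items and the explicit min/max parameters are assumptions for the example, not the DataPro API.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the DataPro implementation. */
final class LoveHateSketch {

    static double[] buildProfile(int numItems, int target, int[] fillers,
                                 boolean push, double min, double max) {
        double[] profile = new double[numItems];
        Arrays.fill(profile, Double.NaN); // untouched items stay missing
        // Fillers take the extreme opposite to the one given to the target.
        double fillerValue = push ? min : max;
        for (int f : fillers) {
            if (f != target) {
                profile[f] = fillerValue;
            }
        }
        profile[target] = push ? max : min;
        return profile;
    }
}
```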

setSelectedValues

The Love/Hate Attack does not use the selected items.

Type	void
Visibility	protected
Is Abstract	false
Parameter

Relation Detail

Generalization

Name
Related Element	• IntruderAttack

Class RandomAttack

This class implements the Random Attack. This attack strategy sets the maximum value (push attack) or the minimum value (nuke attack) to the target item. The filler items are selected randomly and their values are also chosen with a Normal Distribution, using the global dataset mean and standard deviation.

For a further description read the article:

Name	RandomAttack
Qualified Name	es::uco::kdis::datapro::algorithm::intruder::RandomAttack
Visibility	public
Abstract	false
Base Classifier	• IntruderAttack
Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

chooseSelectedItems

The Random Attack does not use the selected items.

Type	void
Visibility	protected
Is Abstract	false
Parameter

initialize

Initialization method.

Type	void
Visibility	public
Is Abstract	false
Parameter

RandomAttack

Parameterized Constructor:

• oDataset The original dataset

• iNumAttacks The number of attack instances

• bPush The attack type (true, push; false, nuke)

• iTarget The target item (The column attribute/item index)

• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size

• dXRand The probability of choosing an item as a selected/filler item

• iSeed The random seed

Type
Visibility	public
Is Abstract	false
Parameter	• in bPush : boolean • in dXRand : double • in iNumAttacks : int • in iNumFillers : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset

setFillerValues

In the Random Attack, the values for filler items must be randomly generated by a Normal Distribution, using the mean and standard deviation of the dataset.

Type	void
Visibility	protected
Is Abstract	false
Parameter

setSelectedValues

The Random Attack does not use the selected items.

Type	void
Visibility	protected
Is Abstract	false
Parameter

Figure 16. Class Dataset

Name	Dataset
Qualified Name	es::uco::kdis::datapro::dataset::Dataset
Visibility	public
Abstract	true
Base Classifier
Realized Interface

Attribute Detail

iCursor

iCursor refers to the row of the dataset currently pointed to by the InstanceIterator.

Type	int
Default Value
Visibility	protected
Multiplicity

rgoColumns

rgoColumns is the list of columns that comprise the dataset.

Type	ColumnAbstraction
Default Value
Visibility	protected
Multiplicity	0..*

rgoValidBinaryFalseValues

For binary columns, it contains the list of values that will be interpreted as False when reading from the physical dataset. Writing will be performed using the first element in the list.

Type	String
Default Value
Visibility	protected
Multiplicity	0..*

rgoValidBinaryTrueValues

For binary columns, it contains the list of values that will be interpreted as True when reading from the physical dataset. Writing will be performed using the first element in the list.

Type	String
Default Value
Visibility	protected
Multiplicity	0..*
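
The read/write asymmetry of these two lists (any listed value is accepted when reading, only the first one is used when writing) can be sketched as follows. The class BinaryValuesSketch and its read/write helpers are hypothetical, and rejecting unknown strings with an exception is an assumption.

```java
import java.util.List;

/** Illustrative sketch only; not the DataPro implementation. */
final class BinaryValuesSketch {

    /** Interprets a raw string using the lists of valid True/False values. */
    static boolean read(String raw, List<String> trueValues, List<String> falseValues) {
        if (trueValues.contains(raw)) {
            return true;
        }
        if (falseValues.contains(raw)) {
            return false;
        }
        // Assumption: unknown symbols are reported as errors.
        throw new IllegalArgumentException("Not a valid binary value: " + raw);
    }

    /** Serializes a boolean using the first element of the matching list. */
    static String write(boolean value, List<String> trueValues, List<String> falseValues) {
        return value ? trueValues.get(0) : falseValues.get(0);
    }
}
```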

sOpenRangeDelimiter

For range columns, sOpenRangeDelimiter stores the symbol(s) that open the numerical range, right before the minimum value: e.g., '[' for [2,3]. This is used during the reading and writing of the physical dataset.

Type	String
Default Value
Visibility	protected
Multiplicity

sSeparationRangeDelimiter

For range columns, sSeparationRangeDelimiter stores the symbol(s) that separate the minimum and maximum values in a numerical range: e.g., ',' for [2,3]. This value is only used during the reading and writing of the physical dataset.

Type	String
Default Value
Visibility	protected
Multiplicity

sCloseRangeDelimiter

For range columns, sCloseRangeDelimiter stores the symbol(s) that close the numerical range, right after the maximum value: e.g., ']' for [2,3]. This is only used during the reading and writing of the physical dataset.

Type	String
Default Value
Visibility	protected
Multiplicity
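
To show how the three range delimiters work together, here is a small parsing and formatting sketch. The class RangeSketch and its methods are hypothetical; how a concrete dataset source actually reads ranges depends on its implementation.

```java
/** Illustrative sketch only; not the DataPro implementation. */
final class RangeSketch {

    /** Parses a range such as "[2,3]" into {min, max}. */
    static double[] parse(String text, String open, String separator, String close) {
        String t = text.trim();
        if (t.length() < open.length() + close.length()
                || !t.startsWith(open) || !t.endsWith(close)) {
            throw new IllegalArgumentException("Malformed range: " + text);
        }
        String inner = t.substring(open.length(), t.length() - close.length());
        int pos = inner.indexOf(separator);
        if (pos < 0) {
            throw new IllegalArgumentException("Missing separator in range: " + text);
        }
        double min = Double.parseDouble(inner.substring(0, pos).trim());
        double max = Double.parseDouble(inner.substring(pos + separator.length()).trim());
        return new double[]{min, max};
    }

    /** Writes a range using the same three delimiters. */
    static String format(double min, double max, String open, String separator, String close) {
        return open + min + separator + max + close;
    }
}
```

With the default delimiters, parse("[2,3]", "[", ",", "]") yields the pair (2.0, 3.0).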

sEmptyValue

sEmptyValue stores the string that will represent an empty value in the dataset file.

Type	String
Default Value
Visibility	protected
Multiplicity

sMissingValue

sMissingValue stores the string that will represent a missing value in the dataset file.

Type	String
Default Value
Visibility	protected
Multiplicity

sNullValue

sNullValue stores the string that will represent a null value in the dataset file.

Type	String
Default Value
Visibility	protected
Multiplicity

sName

The name of the dataset.

Type	String
Default Value
Visibility	protected
Multiplicity

Operation Detail

addAllValues

A set of column values is inserted into the dataset structure. Notice that instance duplication is not checked.

Parameters:

• sColumnFormat String that specifies the types of the columns to be added. Types depend on the specific dataset.

Exceptions:

• IOException

• IllegalFormatSpecificationException

• NotAddedValueException

• IndexOutOfBoundsException

Type	void
Visibility	protected
Is Abstract	true
Parameter	• inout sColumnFormat : String

addColumn

Insert the column abstraction passed as a parameter at the last position of the dataset's list of columns.

Parameter:

• oColumn: Column abstraction to be added

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout oColumn : ColumnAbstraction

addColumn

Insert a column abstraction in a given position of the list of dataset columns.

Parameters:

• oColumn: Column abstraction to be inserted

• iIndex: Position index where the column element is added in the list. The rest of column items will be shifted one position to the right.

Exceptions:

• UnsupportedOperationException

• ClassCastException

• NullPointerException

• IllegalArgumentException

• IndexOutOfBoundsException

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout iIndex : int • inout oColumn : ColumnAbstraction

clone

Create a new dataset with exactly the same metadata and column structure. Only the structure is copied: instances from the original dataset are not added to the new one.

It returns the empty cloned dataset.

Type	Dataset
Visibility	public
Is Abstract	false
Parameter

close

Abstract method that serves to close the physical dataset source.

Exceptions:

• IOException

Type	void
Visibility	protected
Is Abstract	true
Parameter

copy

This method creates a new dataset with exactly the same metadata, column structure and data as the original dataset. In this case, instances from the original dataset are also copied to the new one.

A copy of the dataset is returned.

Type	Dataset
Visibility	public
Is Abstract	false
Parameter
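
The difference between clone (structure only) and copy (structure plus instances) can be illustrated with a toy class. TinyDatasetSketch, its fields and the method name cloneStructure are hypothetical stand-ins, not the Dataset API itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; a toy stand-in for a dataset. */
final class TinyDatasetSketch {
    final List<String> columns = new ArrayList<>();
    final List<double[]> rows = new ArrayList<>();

    /** Like clone: same column structure, no instances. */
    TinyDatasetSketch cloneStructure() {
        TinyDatasetSketch result = new TinyDatasetSketch();
        result.columns.addAll(columns);
        return result;
    }

    /** Like copy: same column structure and an independent copy of every instance. */
    TinyDatasetSketch copy() {
        TinyDatasetSketch result = cloneStructure();
        for (double[] row : rows) {
            result.rows.add(row.clone());
        }
        return result;
    }
}
```

An empty clone is a natural container for generated attack instances, since it shares the column structure of the original without carrying its data.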

Dataset

This is the default constructor of this class. By default, it sets the following parameters to their default values:

• sMissingValue: "?"

• sNullValue: "?"

• sEmptyValue: "?"

• sOpenRangeDelimiter: "["

• sSeparationRangeDelimiter: ","

• sCloseRangeDelimiter: "]"

Notice that using these symbols is not mandatory for reading/writing, as their applicability depends on the specific implementation of each source dataset.

Type
Visibility	public
Is Abstract	false
Parameter

getColumn

This method looks for a column abstraction by its index in the column list. Notice that indexes can change when one column is added or removed to/from intermediate positions.

Parameter:

• iIndex: Index of the queried column.

It returns a reference to the column abstraction queried.

Type	ColumnAbstraction
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int
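
The index shifting mentioned above behaves like insertion and removal in a java.util.List. The sketch below uses a plain list of column names as a stand-in for the list of ColumnAbstraction objects; the class ColumnIndexSketch and the column names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; column names stand in for ColumnAbstraction objects. */
final class ColumnIndexSketch {

    static List<String> demo() {
        List<String> columns = new ArrayList<>(Arrays.asList("age", "income", "label"));
        // Inserting at index 1 shifts "income" and "label" one position to the right.
        columns.add(1, "city");   // [age, city, income, label]
        // Removing index 0 shifts every remaining column one position to the left.
        columns.remove(0);        // [city, income, label]
        return columns;
    }
}
```

Code that caches a column index should therefore look the index up again after any structural change.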

getColumnByName

This method returns the first column found whose name matches the one passed as a parameter.

Parameter:

• sName: The name of the queried column (not case-sensitive).

It returns the column abstraction that gives access to the column with that name.

Type	ColumnAbstraction
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

getColumns

Getter method for the private property rgoColumns, which comprises the array of column abstractions in the dataset.

Type	List<ColumnAbstraction>
Visibility	public
Is Abstract	false
Parameter

getEmptyValue

Getter method for the private property sEmptyValue, which comprises the String that represents the symbol for the empty value in the dataset source. Notice that each specific dataset implementation can use, or not, this property accordingly.

Type	String
Visibility	public
Is Abstract	false
Parameter

getIndexOfColumn

Given a column abstraction, it searches for the index that this column occupies in the array of column abstractions in the dataset.

Parameter:

• oCol: Column to be located.

It returns the index of the column abstraction passed as a parameter, or -1 if the column is not found.

Type	int
Visibility	public
Is Abstract	false
Parameter	• inout oCol : ColumnAbstraction

getMissingValue

Getter method for the private property sMissingValue, which comprises the String that represents the symbol for the missing value in the dataset source. Notice that each specific dataset implementation can use, or not, this property accordingly.

Type	String
Visibility	public
Is Abstract	false
Parameter

getName

Getter method for the private property sName, which represents the name given to the dataset.

Type	String
Visibility	public
Is Abstract	false
Parameter

getNullValue

Getter method for the private property sNullValue, which comprises the String that represents the symbol for the null value in the dataset source. Notice that each specific dataset implementation can use or not this property accordingly.

Type	String
Visibility	public
Is Abstract	false
Parameter

getNumberOfDecimals

Getter method for the private property iNumberOfDecimals, which indicates the number of decimal digits used when writing numerical columns in dataset sources. Notice that this value can be used accordingly by each specific dataset source.

Type	int
Visibility	public
Is Abstract	false
Parameter

getRangeDelimiters

This method gets a list of the three values used to demarcate a range, comprising the sOpenRangeDelimiter, sSeparationRangeDelimiter and sCloseRangeDelimiter. Notice that each specific dataset source could make use of these values accordingly.

Type	ArrayList<String>
Visibility	public
Is Abstract	false
Parameter

getValidBinaryFalseValues

Getter method for the private property rgoValidBinaryFalseValues: the list of strings that are interpreted as false when reading and writing Boolean values in a dataset source. Notice that its specific use depends on the implementation made for each dataset source.

Type	ArrayList<String>
Visibility	public
Is Abstract	false
Parameter

getValidBinaryTrueValues

Getter method for the private property rgoValidBinaryTrueValues: the list of strings that are interpreted as true when reading and writing Boolean values in a dataset source. Notice that its specific use depends on the implementation made for each dataset source.

Type	ArrayList<String>
Visibility	public
Is Abstract	false
Parameter

merge

This method merges two datasets by adding the dataset passed as a parameter to the current one.

Parameters:

• oDSInjected: The dataset to be added. Notice that this dataset must contain the same number and types of columns as the current dataset (this).

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout oDSInjected : Dataset

merge

This method merges two datasets by adding the dataset passed as parameter to the dataset object this.

Parameters:

• oDataset: The dataset to be added.

• sColumnFormat: Sometimes the target dataset contains more columns than the source dataset. In those cases, the columns to be added can be specified explicitly. This parameter is a String that indicates the columns to be added. Each character in the String corresponds to a column in the target dataset. The String may contain the following characters:

o x: Include this column.

o %: Skip this column.

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout oDataset : Dataset • inout sColumnFormat : String
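
The column format string can be read as a per-column mask. The sketch below turns such a string into the list of included column indexes; the class ColumnFormatSketch, the method includedColumns and the rejection of any character other than x and % are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the DataPro implementation. */
final class ColumnFormatSketch {

    /** Returns the positions marked with 'x' in the format string. */
    static List<Integer> includedColumns(String sColumnFormat) {
        List<Integer> included = new ArrayList<>();
        for (int i = 0; i < sColumnFormat.length(); i++) {
            char c = sColumnFormat.charAt(i);
            if (c == 'x') {
                included.add(i);          // include this column
            } else if (c != '%') {        // '%' means skip this column
                throw new IllegalArgumentException(
                        "Unexpected character '" + c + "' at position " + i);
            }
        }
        return included;
    }
}
```

For example, the format "x%xx" includes the columns at positions 0, 2 and 3 and skips the column at position 1.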

open

Abstract protected method. This method just opens the source dataset and initializes the row cursor to the first row of data. However, each specific dataset class is responsible for its implementation, and thus defining its real scope, according to its specific properties.

Notice that each type of dataset will provide specific methods to process the full dataset. For example, file datasets provide the method readDataset.

Exceptions:

• FileNotFoundException

• IOException

• IllegalFormatSpecificationException

Type	void
Visibility	protected
Is Abstract	true
Parameter

removeColumn

This method removes a column from the dataset. Notice that column indexes can be modified (decreased) for the rest of columns. The column removed is returned.

Parameter:

• iIndex: Position index where the column to be removed is located.

Exceptions:

• UnsupportedOperationException

• IndexOutOfBoundsException

Type	ColumnAbstraction
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int

setColumns

Setter method for the property rgoColumns. Even when it is a public method, notice that it should be used very carefully, mainly for those cases when the replacement of the entire set of columns is mandatory. To add or remove a single column, or just a set of them, use instead the methods addColumn and removeColumn.

Parameter:

• rgoCols: The entire list of columns in the dataset.

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout rgoCols : List<ColumnAbstraction>

setEmptyValue

Setter method for the private property sEmptyValue, which comprises the String that represents the symbol for the empty value in the dataset source. Notice that each specific dataset implementation can make its own use of this property accordingly.

Parameters:

• sEmptyValue The symbol/string representing an empty value in the dataset

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout sEmptyValue : String

setMissingValue

Setter method for the private property sMissingValue, which comprises the String that represents the symbol for the missing value in the dataset source. Notice that each specific dataset implementation can make its own use of this property accordingly.

Parameters:

• sMissingValue The symbol/string representing a missing value in the dataset

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout sMissingValue : String

setName

Setter method for the private property sName, which represents the name of the dataset.

Parameter:

• sName: The name of the dataset.

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

setNullValue

Setter method for the private property sNullValue, which comprises the String that represents the symbol for the null value in the dataset source. Notice that each specific dataset implementation can make its own use of this property accordingly.

Parameters:

• sNullValue The symbol/string representing a null value in the dataset

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout sNull : String

setNumberOfDecimals

Setter method for the private property iNumberOfDecimals, which represents the number of decimals that the programmer wants to set for numerical values. Notice that the specific applicability of this attribute directly depends on the specific implementation of the dataset source.

Parameter:

• iNum: The number of decimal digits that will be considered when saving numerical values.

Type	void
Visibility	public
Is Abstract	false
Parameter	• in iNum : int

setRangeDelimiters

This method sets the symbols that will serve as range delimiter. Notice that the specific applicability of these attributes directly depends on the specific implementation of the dataset source.

Parameters:

• sInitial: The symbol/string that represents the starting delimiter.

• sSeparator: The symbol/string that represents the value separator.

• sEnding: The symbol/string that represents the ending delimiter.

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout sEnding : String • inout sInitial : String • inout sSeparator : String

setValidBinaryFalseValues

Setter method of the list rgoValidBinaryFalseValues, which contains the set of strings that represent a False boolean value in the dataset. Notice that only the first value of the list should be used for serialization (reading or writing). In any case, it depends on the specific implementation of each dataset source.

Parameter:

• rgoValidBinaryFalseValues: The list of values that will be interpreted as False.

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout rgoValidBinaryFalseValues : ArrayList<String>

setValidBinaryTrueValues

Setter method of the list rgoValidBinaryTrueValues, which contains the set of strings that represent a True boolean value in the dataset. Notice that only the first value of the list should be used for serialization (reading or writing). In any case, it depends on the specific implementation of each dataset source.

Parameter:

• rgoValidBinaryTrueValues: The list of values that will be interpreted as True.

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout rgoValidBinaryTrueValues : ArrayList<String>

setValidBinaryValues

This method sets both the list of strings that will represent a True boolean value and the list of strings that will represent a False boolean value in the dataset. The same result can be achieved by invoking the two specific setter methods separately.

Parameters:

• rgoFalseList: A list with the valid False symbols/strings

• rgoTrueList: A list with the valid True symbols/strings

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout rgoFalseList : ArrayList<String> • inout rgoTrueList : ArrayList<String>

swapColumns

This method swaps two columns in the list of columns of the dataset. It searches for both columns and swaps their positions, and thus both structure and data.

Parameters:

• oColumn1: The first column to swap.

• oColumn2: The second column to swap.

Exceptions:

• ColumnAbstraction

• UnsupportedOperationException

• ClassCastException

• NullPointerException

• IllegalArgumentException

• IndexOutOfBoundsException

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout oColumn1 : ColumnAbstraction • inout oColumn2 : ColumnAbstraction
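
A minimal sketch of the swap described for swapColumns, assuming the columns are kept in an ordered list. ColumnListSketch and its use of plain strings as stand-ins for columns are hypothetical; the real Dataset class may work differently. Because each column object would carry its own values, exchanging positions in the list moves structure and data together.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a dataset's column list; strings play the
// role of column objects here.
final class ColumnListSketch {
    private final List<String> columns = new ArrayList<>();

    void add(String column) {
        columns.add(column);
    }

    // Looks up both columns and exchanges their positions.
    void swapColumns(String column1, String column2) {
        int i = columns.indexOf(column1);
        int j = columns.indexOf(column2);
        if (i < 0 || j < 0) {
            throw new IllegalArgumentException("Both columns must belong to the dataset");
        }
        Collections.swap(columns, i, j);
    }

    // Returns a snapshot of the current column order.
    List<String> names() {
        return new ArrayList<>(columns);
    }
}
```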

Relation Detail

Association

Name	rgoColumns
Related Element	• ColumnAbstraction

Dependency

Name
Related Element	• InstanceIterator

Generalization

Name
Related Element	• FileDataset

Class FileDataset

This abstract class represents a dataset whose source is a file. It includes the specific methods required to handle datasets stored as files.

Figure 17. Class FileDataset

Name	FileDataset
Qualified Name	es::uco::kdis::datapro::dataset::FileDataset
Visibility	public
Abstract	true
Base Classifier	• Dataset
Realized Interface

Attribute Detail

oBufferedReader

oBufferedReader is the buffer used to read the file.

Type	BufferedReader
Default Value
Visibility	protected
Multiplicity

sCommentValue

sCommentValue stores the string that marks the beginning of a comment line in the dataset file, i.e., a line that has to be omitted from the processing.

Type	String
Default Value
Visibility	protected
Multiplicity

sFileName

sFileName is the name of the file source that contains the dataset.

Type	String
Default Value
Visibility	protected
Multiplicity

sSeparationSymbol

sSeparationSymbol stores the symbol/string that indicates the separator between values of the same instance-row (e.g., a comma, a line of the dataset file, etc.).

Type	String
Default Value
Visibility	protected
Multiplicity

Operation Detail

clone

This method creates a new dataset with exactly the same type and column structure as the original. Instances from the original dataset are not copied. It returns a new Dataset instance.

Type	Dataset
Visibility	public
Is Abstract	false
Parameter

copy

This method clones the dataset and fills the clone with the instances extracted from the original, creating a new dataset with exactly the same type, column structure and data. It returns the copied Dataset instance.

Type	Dataset
Visibility	public
Is Abstract	false
Parameter
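
The difference between clone (structure only) and copy (structure plus instances) can be sketched as follows. MiniDataset and its method names cloneStructure and copyAll are hypothetical and are used only to keep the example self-contained; they are not the datapro4j API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature dataset: column names plus rows of values.
final class MiniDataset {
    final List<String> columnNames = new ArrayList<>();
    final List<List<Object>> rows = new ArrayList<>();

    // Structure only: same columns, no instances.
    MiniDataset cloneStructure() {
        MiniDataset result = new MiniDataset();
        result.columnNames.addAll(columnNames);
        return result;
    }

    // Structure plus data: clone first, then copy every instance.
    MiniDataset copyAll() {
        MiniDataset result = cloneStructure();
        for (List<Object> row : rows) {
            result.rows.add(new ArrayList<>(row));
        }
        return result;
    }
}
```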

FileDataset

Default constructor. Notice that the following symbols are used by default:

• sCommentValue: "%"

• sSeparationSymbol: ","

Type
Visibility	public
Is Abstract	false
Parameter

FileDataset

This constructor receives the name of the file as parameter. The following symbols are used as default:

• sCommentValue: "%"

• sSeparationSymbol: ","

Parameter:

• sFileName: The filename of the dataset source.

Type
Visibility	public
Is Abstract	false
Parameter	• inout sFileName : String

getCommentValue

Getter method of the property sCommentValue.

Type	String
Visibility	public
Is Abstract	false
Parameter

getFileName

Getter method of the filename of the dataset source.

Type	String
Visibility	public
Is Abstract	false
Parameter

getSeparationSymbol

Getter method of the property sSeparationSymbol.

Type	String
Visibility	public
Is Abstract	false
Parameter

readDataset

Implementations of this abstract method will read the dataset from the file specified by the constructor.

Parameters:

• sContentFormat: String that specifies the reading format of the dataset file. Construct the string using a sequence of control tokens:

o % to omit a line (only one line).

o %name to read the name of columns (only one line).

o %col to read data (zero, one or more lines).

Example: the string “%%%col%%name” indicates that the first two lines must be omitted, then data is read and, finally, the last line will contain the column names.

• sColumnFormat: A String that contains an ordered sequence of tokens that determine the data type of each column to be read. Use the following tokens:

o s: Nominal column

o f: Real column

o c: Categorical column

o b: Binary column

o i: Integer column

o %: Skip this column (the column skipped is not processed)

Additionally, notice that other tokens can be considered depending on the specific dataset source (e.g., d for columns of type date).

Exceptions:

• FileNotFoundException

• IOException

• IllegalFormatSpecificationException

• NotAddedValueException

• IndexOutOfBoundsException

Type	void
Visibility	public
Is Abstract	true
Parameter	• inout sColumnFormat : String • inout sContentFormat : String
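
As an illustration of the sColumnFormat tokens listed for readDataset, the following sketch maps each character of the format string to a column kind. It assumes that the tokens are single characters written without separators; ColumnFormatSketch and its Kind enumeration are hypothetical names, not part of datapro4j, and source-specific tokens such as d are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a column format string; the token letters come
// from the documented list, but the class and enum names are illustrative.
final class ColumnFormatSketch {
    enum Kind { NOMINAL, REAL, CATEGORICAL, BINARY, INTEGER, SKIP }

    // Translates each token character into the kind of column it denotes.
    static List<Kind> parse(String columnFormat) {
        List<Kind> kinds = new ArrayList<>();
        for (char token : columnFormat.toCharArray()) {
            switch (token) {
                case 's': kinds.add(Kind.NOMINAL); break;
                case 'f': kinds.add(Kind.REAL); break;
                case 'c': kinds.add(Kind.CATEGORICAL); break;
                case 'b': kinds.add(Kind.BINARY); break;
                case 'i': kinds.add(Kind.INTEGER); break;
                case '%': kinds.add(Kind.SKIP); break;
                default:
                    throw new IllegalArgumentException("Unknown column token: " + token);
            }
        }
        return kinds;
    }
}
```

Under these assumptions, the string "sf%bi" would describe a nominal column, a real column, a skipped column, a binary column and an integer column, in that order.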

setCommentValue

Setter method of the property sCommentValue.

Parameter:

• sComment: The token/string indicating the symbol that represents a comment line in the dataset file.

Type	void
Visibility	protected
Is Abstract	false
Parameter	• inout sComment : String

setFileName

Setter method of the property sFileName.

Parameter:

• sFileName: The filename of the dataset source.

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout sFileName : String

setSeparationSymbol

Setter method of the property sSeparationSymbol.

Parameter:

• sSeparationSymbol: The token used to separate the values of an instance within the same line of the dataset source.

Type	void
Visibility	protected
Is Abstract	false
Parameter	• inout sSeparator : String

writeDataset

This abstract method defines the signature of the write method for every file dataset. Implementations of this method deal with the serialization (writing) of the current column structure into each specific file format.

Parameter:

• sOutputFile: The path where the dataset file will be saved.

Exception:

• IOException

Type	void
Visibility	public
Is Abstract	true
Parameter	• inout sOutputFile : String

Relation Detail

Generalization

Name
Related Element	• CsvDataset

Name
Related Element	• ExcelDataset

Name
Related Element	• ArffDataset

Name
Related Element	• Dataset

Class InstanceIterator

InstanceIterator is the class that implements the interface IIterator for covering the instances of the dataset. Thus, this class represents an iterator to access each row/instance in a dataset. The instance iterator provides methods to cover the whole set of instances and keeps the reference to the dataset being iterated.

Figure 18. Class InstanceIterator

Name	InstanceIterator
Qualified Name	es::uco::kdis::datapro::dataset::InstanceIterator
Visibility	public
Abstract	false
Base Classifier
Realized Interface	• IIterator

Attribute Detail

All attributes are private.

Operation Detail

currentInstance

This method returns the list of objects that form the currently pointed instance in the dataset.

Type	List<Object>
Visibility	public
Is Abstract	false
Parameter

first

This method returns the list of objects that form the first instance in the dataset and sets the pointer to the first instance.

Type	List<Object>
Visibility	public
Is Abstract	false
Parameter

InstanceIterator

Default iterator constructor.

Parameter:

• oDataset: The dataset to be covered by the iterator.

Type
Visibility	public
Is Abstract	false
Parameter	• inout oDataset : Dataset

isDone

This method returns true if the dataset has no more instances to be iterated. False, otherwise.

Type	boolean
Visibility	public
Is Abstract	false
Parameter

This method increases the instance pointer by one, i.e. sets the pointer to the next instance in the dataset.

Type	void
Visibility	public
Is Abstract	false
Parameter
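
The traversal protocol formed by first, isDone and currentInstance can be sketched as follows. RowIteratorSketch is a hypothetical class that iterates over plain lists instead of a real Dataset, and the name next for the operation that advances the pointer is an assumption made for this example.

```java
import java.util.List;

// Hypothetical iterator over rows; "next" is an assumed name for the
// advance operation, and rows are plain lists rather than a real Dataset.
final class RowIteratorSketch {
    private final List<List<Object>> rows;
    private int position = 0;

    RowIteratorSketch(List<List<Object>> rows) {
        this.rows = rows;
    }

    // Resets the pointer and returns the first instance (null if empty).
    List<Object> first() {
        position = 0;
        return currentInstance();
    }

    // Returns the instance currently pointed to, or null when done.
    List<Object> currentInstance() {
        return isDone() ? null : rows.get(position);
    }

    // True when there are no more instances to iterate.
    boolean isDone() {
        return position >= rows.size();
    }

    // Moves the pointer to the following instance.
    void next() {
        position++;
    }

    // Typical traversal: first(), then advance until isDone() is true.
    static int countInstances(RowIteratorSketch it) {
        int count = 0;
        for (it.first(); !it.isDone(); it.next()) {
            count++;
        }
        return count;
    }
}
```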

Relation Detail

Interface Realization

Name
Related Element	• IIterator

Interface IIterator

IIterator is the interface that any instance iterator has to implement, as InstanceIterator does.

Figure 19. Interface IIterator

Name	IIterator
Qualified Name	es::uco::kdis::datapro::dataset::IIterator
Visibility	public
Base Classifier

Operation Detail

currentInstance

The implementation of this method has to return the instance currently pointed to in the dataset, as a List of Object instances.

Type	List<Object>
Visibility	public
Is Abstract	true
Parameter

first

An implementation of this method returns the first instance of the dataset. From then on, the current instance pointed to by the iterator should be this first one.

Type	List<Object>
Visibility	public
Is Abstract	true
Parameter

isDone

This method should be implemented to return True if the iterator points to the last instance of the dataset. It returns False otherwise.

Type	boolean
Visibility	public
Is Abstract	true
Parameter

The implementation of this method increases the iterator to the next instance in the dataset.

Type	void
Visibility	public
Is Abstract	true
Parameter

Relation Detail

Interface Realization

Name
Related Element	• InstanceIterator

addValue

This method implements the method addValue of the column abstraction, returning the number of objects successfully added.

Parameters:

• oValue The value to be added.

• iIndex The position in the column to add the value.

Type	int
Visibility	public
Is Abstract	true
Parameter	• inout oValue : Object • in iIndex : int

countEmptyValues

This method implements the method countEmptyValues of the column abstraction, returning the number of empty values contained in the column. -1 is returned if this value could not be calculated.

Type	int
Visibility	public
Is Abstract	false
Parameter

countInvalidValues

This method implements the method countInvalidValues of the column abstraction, returning the number of invalid values (null, empty and missing values) contained in the column. -1 is returned if this value could not be calculated.

Type	int
Visibility	public
Is Abstract	false
Parameter

countMissingValues

This method implements the method countMissingValues of the column abstraction, returning the number of missing values contained in the column. -1 is returned if this value could not be calculated.

Type	int
Visibility	public
Is Abstract	false
Parameter

countNullValues

This method implements the method countNullValues of the column abstraction, returning the number of null values contained in the column. -1 is returned if this value could not be calculated.

Type	int
Visibility	public
Is Abstract	false
Parameter
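
The relationship between the four counting methods can be sketched as follows, assuming that empty and missing values are represented by marker objects and that the three kinds of invalid value do not overlap. InvalidCountSketch and the markers "" and "?" are hypothetical choices for this example, not datapro4j defaults.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical counters over a list of values; the empty and missing
// markers are illustrative, and a Java null stands for the null value.
final class InvalidCountSketch {
    static final Object EMPTY = "";
    static final Object MISSING = "?";

    // Counts the values equal to the given marker (null-safe).
    private static int count(List<Object> values, Object marker) {
        int n = 0;
        for (Object v : values) {
            if (Objects.equals(v, marker)) n++;
        }
        return n;
    }

    static int countEmptyValues(List<Object> values) {
        return count(values, EMPTY);
    }

    static int countMissingValues(List<Object> values) {
        return count(values, MISSING);
    }

    static int countNullValues(List<Object> values) {
        return count(values, null);
    }

    // Invalid values are the null, empty and missing values together.
    static int countInvalidValues(List<Object> values) {
        return countNullValues(values) + countEmptyValues(values) + countMissingValues(values);
    }
}
```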

getElement

This method implements the method getElement of the column abstraction, returning the element at the given position.

Parameter:

• iPos The position of the element to be returned.

Type	Object
Visibility	public
Is Abstract	true
Parameter	• in iPos : int

getEmptyValue

This method implements the method getEmptyValue of the column abstraction, returning the element representing the column-specific empty value.

Type	Object
Visibility	public
Is Abstract	false
Parameter

getMissingValue

This method implements the method getMissingValue of the column abstraction, returning the element representing the column-specific missing value.

Type	Object
Visibility	public
Is Abstract	false
Parameter

getNullValue

This method implements the method getNullValue of the column abstraction, returning the element representing the column-specific null value.

Type	Object
Visibility	public
Is Abstract	false
Parameter

getSize

This method implements the method getSize of the column abstraction, returning the number of elements contained in the column.

Type	int
Visibility	public
Is Abstract	true
Parameter

getValues

This method implements the method getValues of the column abstraction, returning the list of elements (as instances of Object) contained in the column.

Type	List<Object>
Visibility	public
Is Abstract	true
Parameter

removeValue

This method implements the method removeValue of the column abstraction.

Parameter:

• iIndex The position in the column of the value to be removed.

Type	void
Visibility	public
Is Abstract	true
Parameter	• in iIndex : int

setEmptyValue

This method implements the method setEmptyValue of the column abstraction, setting the element representing the column-specific empty value.

Parameter:

• oEmptyValue The object representing a specific empty value in this column.

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout oEmptyValue : Object

setMissingValue

This method implements the method setMissingValue of the column abstraction, setting the element representing the column-specific missing value.

Parameter:

• oMissingValue The object representing a specific missing value in this column.

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout oMissingValue : Object

setNullValue

This method implements the method setNullValue of the column abstraction, setting the element representing the column-specific null value.

Parameter:

• oNullValue The object representing a specific null value in this column.

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout oNullValue : Object

setValue

This method implements the method setValue of the column abstraction, setting the element value at the given position.

Parameters:

• oValue The object value to set.

• iIndex The position index in the column.

Type	int
Visibility	public
Is Abstract	true
Parameter	• in iIndex : int • inout oValue : Object

Relation Detail

Association

Name
Related Element	• ColumnAbstraction

Generalization

Name
Related Element	• RangeColumnImpl

Name
Related Element	• NominalColumnImpl

Name
Related Element	• NumericalColumnImpl

Name
Related Element	• DateColumnImpl

Name
Related Element	• CategoricalColumnImpl

Name
Related Element	• BinaryColumnImpl

Enumeration ColumnType

This enumeration contains the different types of columns supported by datapro4j. The following types are currently supported:

• Binary

• Categorical

• Date

• Integer

• Nominal

• Numerical

• Range

Note: To check the type of a column, code such as the following should be used (here, for binary columns):

ColumnAbstraction oCol;
…
if (oCol.getType().equals(ColumnType.Binary)) {
    …
}

Figure 23. Enumeration ColumnType

Name	ColumnType
Qualified Name	es::uco::kdis::datapro::dataset::Column::ColumnType
Visibility	public
Abstract	false
Base Classifier
Realized Interface

Attribute Detail

Binary

Boolean attribute

Type
Default Value
Visibility	public
Multiplicity

Categorical

Categorical attribute

Type
Default Value
Visibility	public
Multiplicity

Date

Date attribute

Type
Default Value
Visibility	public
Multiplicity

Integer

Integer attribute

Type
Default Value
Visibility	public
Multiplicity

Nominal

Nominal attribute

Type
Default Value
Visibility	public
Multiplicity

Numerical

Numerical attribute

Type
Default Value
Visibility	public
Multiplicity

Range

Range attribute

Type
Default Value
Visibility	public
Multiplicity

Relation Detail

Association

Name
Related Element	• ColumnAbstraction

Class BinaryColumn

This class represents the abstraction of a binary column. It defines the methods that provide operations specific to binary data.

Figure 24. Class BinaryColumn

Name	BinaryColumn
Qualified Name	es::uco::kdis::datapro::dataset::Column::BinaryColumn
Visibility	public
Abstract	false
Base Classifier	• ColumnAbstraction
Realized Interface

Operation Detail

BinaryColumn

Default constructor. The implementation BinaryColumnImpl is invoked.

Type
Visibility	public
Is Abstract	false
Parameter

BinaryColumn

Constructor with the name of the column as a parameter. The implementation BinaryColumnImpl is invoked.

Parameter:

• sName The name of the column.

Type
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

toCategorical

This method calls the implementation to return a categorical column generated from the binary column. The resulting categorical column defines two categories, one for each binary value (false, true).

Parameters:

• sFalseCategory The category representing the false binary value.

• sTrueCategory The category representing the true binary value.

Notes:

• If the value is an empty or a missing value, then a false value is considered.

• If the value is a null value, then a null value is considered.

Type	CategoricalColumn
Visibility	public
Is Abstract	false
Parameter	• inout sFalseCategory : String • inout sTrueCategory : String
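
The conversion rules stated in the notes for toCategorical can be sketched as follows. BinaryToCategoricalSketch is hypothetical, the markers "" and "?" for empty and missing values are assumptions, and plain lists are used instead of real column objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical conversion of binary values into two categories.
final class BinaryToCategoricalSketch {
    static final Object EMPTY = "";     // assumed empty marker
    static final Object MISSING = "?";  // assumed missing marker

    static List<String> toCategorical(List<Object> values,
                                      String falseCategory,
                                      String trueCategory) {
        List<String> result = new ArrayList<>();
        for (Object v : values) {
            if (v == null) {
                result.add(null);            // a null value stays null
            } else if (Boolean.TRUE.equals(v)) {
                result.add(trueCategory);
            } else {
                result.add(falseCategory);   // false, empty and missing values
            }
        }
        return result;
    }
}
```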

Relation Detail

Generalization

Name
Related Element	• ColumnAbstraction

Class BinaryColumnImpl

This class provides the implementation code accessing real data in a binary column. Binary values are stored as objects of class Boolean.

Note: None of its methods should be directly invoked, but only from its specific abstraction.

Figure 25. Class BinaryColumnImpl

Name	BinaryColumnImpl
Qualified Name	es::uco::kdis::datapro::dataset::Column::BinaryColumnImpl
Visibility	public
Abstract	false
Base Classifier	• ColumnImpl
Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

For a more complete specification of the methods inherited from ColumnImpl, see its specifications above.

addAllValues

Type	int
Visibility	public
Is Abstract	false
Parameter	• inout rgoCol : List<Object>

addValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• inout oValue : Object

addValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int • inout oValue : Object

BinaryColumnImpl

Default constructor.

Type
Visibility	public
Is Abstract	false
Parameter

countEmptyValues

Type	int
Visibility	public
Is Abstract	false
Parameter

countInvalidValues

Type	int
Visibility	public
Is Abstract	false
Parameter

countMissingValues

Type	int
Visibility	public
Is Abstract	false
Parameter

countNullValues

Type	int
Visibility	public
Is Abstract	false
Parameter

getElement

Type	Object
Visibility	public
Is Abstract	false
Parameter	• in iPos : int

getSize

Type	int
Visibility	public
Is Abstract	false
Parameter

getValues

Type	List<Object>
Visibility	public
Is Abstract	false
Parameter

removeValue

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout iIndex : int

setValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int • inout oValue : Object

toCategorical

This method implements the method toCategorical of the binary column abstraction, converting the binary column into a categorical column.

Parameters:

• sName The name of the column. By default this property is set by the abstraction to the current name of the binary column.

• sFalseCategory The category representing the false binary value.

• sTrueCategory The category representing the true binary value.

Type	CategoricalColumn
Visibility	public
Is Abstract	false
Parameter	• inout sName : String • inout sFalseCategory : String • inout sTrueCategory : String

Relation Detail

Generalization

Name
Related Element	• ColumnImpl

Class CategoricalColumn

This class defines the abstraction of a categorical column, where every value belongs to a predefined category. Here the methods that provide specific operations on categorical data are defined.

Figure 26. Class CategoricalColumn

Name	CategoricalColumn
Qualified Name	es::uco::kdis::datapro::dataset::Column::CategoricalColumn
Visibility	public
Abstract	false
Base Classifier	• ColumnAbstraction
Realized Interface

Operation Detail

addCategory

This method calls the implementation to add a new category to the set of allowable values. Categories are included as objects of class String.

Parameter:

• szCategory The new category in the column

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout szCategory : String

CategoricalColumn

Constructor with the name of the column as a parameter. The implementation CategoricalColumnImpl is invoked.

Parameter:

• sName The name of the column

Type
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

CategoricalColumn

Default constructor. The implementation CategoricalColumnImpl is invoked.

Type
Visibility	public
Is Abstract	false
Parameter

getCategoryIndex

This method calls the implementation to return the index in the list of categories of a given string. The value -1 is returned if the value is not found.

Parameter:

• szCategory The string representing the category to be searched in the list of categories

Type	int
Visibility	public
Is Abstract	false
Parameter	• inout szCategory : String

getCategoryList

This method calls the implementation to return the list of categories in the column.

Type	List<Object>
Visibility	public
Is Abstract	false
Parameter

getCategoryName

This method calls the implementation to return the category string stored in a given position of the list of categories. null is returned if the index given is not valid.

Parameter:

• iIndex The index of the wanted category

Type	String
Visibility	public
Is Abstract	false
Parameter	• inout iIndex : Integer

getElementIndex

This method calls the implementation to return the element stored in a given position in the column. The category index is returned, whereas the default method getElement (inherited from ColumnAbstraction) returns the category by name. If the value is invalid, -1 is returned.

Parameter:

• iPos The index of the item in the column

Exceptions:

• IndexOutOfBoundsException

Type	Integer
Visibility	public
Is Abstract	false
Parameter	• in iPos : int

replaceCategory

This method calls the implementation to replace a given category with a new one.

Parameters:

• szOldCategory The category string to be replaced

• szNewCategory The new category string to be set

• bJoinCategory If the new category string already exists, this parameter determines whether the values of the old category are merged with the values of that existing category

1 is returned if the category is successfully replaced, or 0 otherwise.

Type	int
Visibility	public
Is Abstract	false
Parameter	• in bJoinCategory : boolean • inout szNewCategory : String • inout szOldCategory : String

toBinary

This method calls the implementation to return a binary column generated from the categorical column. Invalid values remain unaltered.

Parameter:

• aReferenceTrueValues The list of category strings to be treated as true values

Type	BinaryColumn
Visibility	public
Is Abstract	false
Parameter	• inout aReferenceTrueValues : List<String>

toNominal

This method calls the implementation to return a nominal column generated from the strings stored in the categorical column. Nominal values are extracted from the strings representing each category.

Type	NominalColumn
Visibility	public
Is Abstract	false
Parameter

toNumerical

This method calls the implementation to return an integer column generated from the index values assigned to the categories in the source column.

Type	IntegerColumn
Visibility	public
Is Abstract	false
Parameter

Relation Detail

Generalization

Name
Related Element	• ColumnAbstraction

Class CategoricalColumnImpl

This class provides the implementation code accessing real data in a categorical column. Categories are stored as a HashMap between a String and an Integer. Thus, internally, data are stored as an ArrayList of Integer, whereas their correspondences to categories are saved as String.

This class should never be directly invoked, apart from those invocations coming from its abstraction.

Figure 27. Class CategoricalColumnImpl

Name	CategoricalColumnImpl
Qualified Name	es::uco::kdis::datapro::dataset::Column::CategoricalColumnImpl
Visibility	public
Abstract	false
Base Classifier	• ColumnImpl
Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

For a more complete specification of the methods inherited from ColumnImpl, see its specification above. Notice that values can be added both as a String (identifier) and as an Integer (index) (see methods addValue, addAllValues). In both cases only elements belonging to valid categories are added to the set of values in the column.

addCategory

This method implements the functionality of addCategory in the categorical column abstraction, adding a new category to the column. This category should not exist. It returns the index of the new category, if successfully created, or -1 if the category cannot be added.

Parameter:

• sCat The identifier of the new category

Type	int
Visibility	public
Is Abstract	false
Parameter	• inout sCat : String

addValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• inout oValue : Object

addValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• in bForce : boolean • inout oValue : Object

addValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int • inout oValue : Object

CategoricalColumnImpl

Default constructor.

Type
Visibility	public
Is Abstract	false
Parameter

countEmptyValues

Type	int
Visibility	public
Is Abstract	false
Parameter

countInvalidValues

Type	int
Visibility	public
Is Abstract	false
Parameter

countMissingValues

Type	int
Visibility	public
Is Abstract	false
Parameter

countNullValues

Type	int
Visibility	public
Is Abstract	false
Parameter

getCategoryIndex

This method implements the functionality of getCategoryIndex in the column abstraction, returning the index of the category passed as String, or -1 if the category does not exist in the list of categories of the column.

Parameter:

• sCategory The category identifier

Type	int
Visibility	public
Is Abstract	false
Parameter	• inout sCategory : String

getCategoryList

This method implements the functionality of getCategoryList in the column abstraction, returning the list of category identifiers contained in the category list. The resulting list is not sorted.

Type	List<Object>
Visibility	public
Is Abstract	false
Parameter

getCategoryName

This method implements the functionality of getCategoryName in the column abstraction, returning the identifier of the category whose index is passed as parameter. If the category does not exist, then null is returned.

Parameter:

• iIndex The category index

Type	String
Visibility	public
Is Abstract	false
Parameter	• inout iIndex : Integer

getElement

Type	Object
Visibility	public
Is Abstract	false
Parameter	• in iPos : int

getElementIndex

This method implements the functionality of getElementIndex in the column abstraction, returning the category index stored at a given position. Notice that indexes in the category list do not have to be sorted or sequential, since categories may be successively created and deleted, causing gaps in the index sequence. Always consider category indexes as numerical identifiers, never as sequential indexes.

This method returns -1 if the position given is invalid.

Parameter:

• iPos The position given in the category list.

Exceptions:

• IndexOutOfBoundsException

Type	Integer
Visibility	public
Is Abstract	false
Parameter	• in iPos : int

getSize

Type	int
Visibility	public
Is Abstract	false
Parameter

getValues

Type	List<Object>
Visibility	public
Is Abstract	false
Parameter

removeValue

Type	void
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int

replaceCategory

This method implements the functionality of replaceCategory in the column abstraction, updating both the category list and replacing the values in the column. 1 is returned if done; 0, otherwise.

Parameters:

• sOldCategory The old category identifier to be replaced

• sNewCategory The new category

• bJoinCategory If true and the new category identifier already exists in the column, the values with the old category identifier are merged into the already existing category, leaving a single category as a result

Type	int
Visibility	public
Is Abstract	false
Parameter	• in bJoinCategory : boolean • inout sNewCategory : String • inout sOldCategory : String
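
The storage scheme described for this class (a map from category identifier to index, plus a list of Integer values) and the behaviour of replaceCategory can be sketched as follows. CategoryStoreSketch is hypothetical. In particular, returning 0 when the new category already exists and joining is not requested is an assumption of this sketch, since that case is not specified here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical categorical storage: identifiers map to numeric indexes,
// and the column data holds only those indexes.
final class CategoryStoreSketch {
    private final Map<String, Integer> categories = new HashMap<>();
    private final List<Integer> data = new ArrayList<>();
    private int nextIndex = 0;  // never reused, so gaps may appear

    // Returns the new index, or -1 if the category already exists.
    int addCategory(String name) {
        if (categories.containsKey(name)) return -1;
        categories.put(name, nextIndex);
        return nextIndex++;
    }

    // Returns the index of the category, or -1 if it does not exist.
    int getCategoryIndex(String name) {
        Integer index = categories.get(name);
        return index == null ? -1 : index;
    }

    // Adds a value by identifier; returns 1 if added, 0 otherwise.
    int addValue(String name) {
        Integer index = categories.get(name);
        if (index == null) return 0;
        data.add(index);
        return 1;
    }

    // Returns the category index stored at the given position.
    int getElementIndex(int pos) {
        return data.get(pos);
    }

    // Returns 1 if the category was replaced, 0 otherwise.
    int replaceCategory(String oldName, String newName, boolean join) {
        Integer oldIndex = categories.get(oldName);
        if (oldIndex == null || oldName.equals(newName)) return 0;
        Integer existing = categories.get(newName);
        if (existing == null) {
            // Plain rename: the index and the stored values stay the same.
            categories.remove(oldName);
            categories.put(newName, oldIndex);
            return 1;
        }
        if (!join) return 0;  // assumption: refuse to merge unless asked
        // Merge: re-point every value of the old category, then drop it.
        for (int i = 0; i < data.size(); i++) {
            if (data.get(i).equals(oldIndex)) data.set(i, existing);
        }
        categories.remove(oldName);
        return 1;
    }
}
```

After a merge, the index of the removed category is left unused, which is one way the gaps mentioned for getElementIndex can arise.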

setValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int • inout oValue : Object

toBinary

This method implements the functionality of toBinary in the column abstraction, returning a binary column constructed from the data contained in the categorical column. The list of category identifiers considered as True values in the binary column is passed as a parameter. Category identifiers not included in that list are considered as False values. Note that invalid values remain unaltered.

Parameters:

• aReferenceTrueValues The list of categories representing true values

• sName The name of the new binary column

Type	BinaryColumn
Visibility	public
Is Abstract	false
Parameter	• inout aReferenceTrueValues : List<String> • inout sName : String
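
A minimal sketch of the toBinary conversion, assuming plain lists instead of column objects and using a Java null as a stand-in for any invalid value. CategoricalToBinarySketch is a hypothetical name and is not part of datapro4j.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical conversion of category identifiers into boolean values.
final class CategoricalToBinarySketch {
    static List<Boolean> toBinary(List<String> categories,
                                  List<String> referenceTrueValues) {
        Set<String> trueSet = new HashSet<>(referenceTrueValues);
        List<Boolean> result = new ArrayList<>();
        for (String category : categories) {
            if (category == null) {
                result.add(null);  // assumption: null stands for an invalid value
            } else {
                // True if listed as a reference value, False otherwise.
                result.add(trueSet.contains(category));
            }
        }
        return result;
    }
}
```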

toNominal

This method implements the functionality of toNominal in the column abstraction, returning a nominal column constructed from the data contained in the categorical column. Strings for the nominal column are constructed from the category identifiers.

Parameter:

• sName The name of the new nominal column

Type	NominalColumn
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

toNumerical

This method implements the functionality of toNumerical in the column abstraction, returning an integer column constructed from the data contained in the categorical column. Numbers of the integer column are extracted from the category indexes.

Parameter:

• sName The name of the new integer column

Type	IntegerColumn
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

Relation Detail

Generalization

Name
Related Element	• RangeColumnImpl

Name
Related Element	• ColumnImpl

Class DateColumn

This class represents the abstraction of a date datatype column. This type of column is specifically required by ARFF datasets.

Figure 28. Class DateColumn

Name	DateColumn
Qualified Name	es::uco::kdis::datapro::dataset::Column::DateColumn
Visibility	public
Abstract	false
Base Classifier	• ColumnAbstraction
Realized Interface

Operation Detail

addDateSpecification

This method calls the implementation to set the date format specification of the values in the column.

Parameter:

• oDate The format specification of the values in the date column

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout oDate : SimpleDateFormat

DateColumn

Default constructor with no parameters. The implementation DateColumnImpl is invoked.

Type
Visibility	public
Is Abstract	false
Parameter

DateColumn

Constructor with the name of the column as a parameter. The implementation DateColumnImpl is invoked.

Parameter:

• sName The name of the column

Type
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

getDateSpecification

This method calls the implementation to get the date format specification of the values in the column.

Type	SimpleDateFormat
Visibility	public
Is Abstract	false
Parameter

Relation Detail

Generalization

Name
Related Element	• ColumnAbstraction

Class DateColumnImpl

This class provides the implementation code accessing real data in a date column. Values are stored as Date objects according to the format specified by a given SimpleDateFormat object. This class should not be invoked directly, only by the column abstraction.

Figure 29. Class DateColumnImpl

Name	DateColumnImpl
Qualified Name	es::uco::kdis::datapro::dataset::Column::DateColumnImpl
Visibility	public
Abstract	false
Base Classifier	• ColumnImpl
Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

For a more complete specification of the methods inherited from ColumnImpl, see its specifications above.

addAllValues

Type	int
Visibility	public
Is Abstract	false
Parameter	• inout rgoCol : List<Object>

addDateSpecification

This method implements the method addDateSpecification of the date column abstraction, setting the date format specification of the values in the column.

Parameter:

• oDate The format specification of the values in the date column

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout oDate : SimpleDateFormat
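
The role of the date format specification can be sketched with the standard java.text.SimpleDateFormat class. DateColumnSketch, its default pattern "yyyy-MM-dd", the strict (non-lenient) parsing and the render helper are assumptions made for this example and do not describe the actual DateColumnImpl.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Locale;

// Hypothetical date column: strings are parsed into Date objects
// according to the current format specification.
final class DateColumnSketch {
    private SimpleDateFormat format;
    private final List<Date> values = new ArrayList<>();

    DateColumnSketch() {
        // Assumed default pattern, chosen only for this sketch.
        addDateSpecification(new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT));
    }

    // Sets the format specification; strict parsing is a sketch choice.
    void addDateSpecification(SimpleDateFormat newFormat) {
        newFormat.setLenient(false);
        this.format = newFormat;
    }

    SimpleDateFormat getDateSpecification() {
        return format;
    }

    // Returns 1 if the text was parsed and stored, 0 otherwise.
    int addValue(String text) {
        try {
            values.add(format.parse(text));
            return 1;
        } catch (ParseException e) {
            return 0;
        }
    }

    // Renders a stored value using the current format specification.
    String render(int pos) {
        return format.format(values.get(pos));
    }
}
```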

addValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• inout oValue : Object

addValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• in bForce : boolean • inout oValue : Object

addValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int • inout oValue : Object

DateColumnImpl

Default constructor with no parameters.

Type
Visibility	public
Is Abstract	false
Parameter

getDateSpecificaiton

This method implements the method getDateSpecification of the column abstraction, returning the date format specification of the values in the column.

Type	SimpleDateFormat
Visibility	public
Is Abstract	false
Parameter

getElement

Type	Object
Visibility	public
Is Abstract	false
Parameter	• in iPos : int

getSize

Type	int
Visibility	public
Is Abstract	false
Parameter

getValues

Type	List<Object>
Visibility	public
Is Abstract	false
Parameter

removeValue

Type	void
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int

setValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int • inout oValue : Object

Relation Detail

Generalization

Name
Related Element	• ColumnImpl

Class IntegerColumn

This class represents the abstraction of an integer column. Integer columns are a specialization of numerical (real) columns.

Figure 30. Class IntegerColumn

Name	IntegerColumn
Qualified Name	es::uco::kdis::datapro::dataset::Column::IntegerColumn
Visibility	public
Abstract	false
Base Classifier	• NumericalColumn
Realized Interface

Operation Detail

Many methods are specializations of their respective methods in the numerical column (NumericalColumn), adapted to the domain of integer values.

getiMaxInterval

Analogously to getdMaxInterval in the NumericalColumn abstraction class, this method gets the maximum integer value allowed for this column.

Type	Integer
Visibility	public
Is Abstract	false
Parameter

getiMinInterval

Analogously to getdMinInterval in the NumericalColumn abstraction class, this method gets the minimum integer value allowed for this column.

Type	Integer
Visibility	public
Is Abstract	false
Parameter

getMaxValue

See getMaxValue in the specification of the NumericalColumn abstraction class.

Type	double
Visibility	public
Is Abstract	false
Parameter

getMinValue

For further information, see getMinValue in the specification of the NumericalColumn abstraction class.

Type	double
Visibility	public
Is Abstract	false
Parameter

IntegerColumn

Default constructor with no parameters.

Type
Visibility	public
Is Abstract	false
Parameter

IntegerColumn

Constructor with the name of the resulting column as a parameter.

Parameter:

• sName The name of the column

Type
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

mean

For further information, see mean in the specification of the NumericalColumn abstraction class.

Type	double
Visibility	public
Is Abstract	false
Parameter

setiMaxInterval

Analogously to setdMaxInterval in the NumericalColumn abstraction class, this method sets the maximum integer value allowed for this column.

Parameter:

• iMaxInterval The maximum value allowed in the column

Exceptions:

• IllegalAccessException if the value cannot be set.

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout iMaxInterval : Integer

setiMinInterval

Analogously to setdMinInterval in the NumericalColumn abstraction class, this method sets the minimum integer value allowed for this column.

Parameter:

• iMinInterval The minimum value allowed in the column

Exceptions:

• IllegalAccessException if the value cannot be set.

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout iMinInterval : Integer

standardDeviation

For further information, see standardDeviation in the specification of the NumericalColumn abstraction class.

Type	double
Visibility	public
Is Abstract	false
Parameter

toCategorical

This method calls the implementation to return a categorical column using the values contained in the integer column, where each different value constitutes a different category.

Type	CategoricalColumn
Visibility	public
Is Abstract	false
Parameter

toNumerical

This method calls the implementation to return a numerical column using the values contained in the integer column, where each integer value is cast to a double value.

Type	NumericalColumn
Visibility	public
Is Abstract	false
Parameter
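
The two conversions described above can be sketched as follows. This is an illustration only: IntegerConversionSketch and its methods are hypothetical names, and plain Java collections stand in for the library's column classes. Representing categories as strings in first-seen order is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; plain lists and sets stand in for the column classes.
class IntegerConversionSketch {
    // Each integer is widened to a double, as described for toNumerical.
    static List<Double> toNumerical(List<Integer> values) {
        List<Double> result = new ArrayList<>();
        for (Integer v : values) {
            result.add(v == null ? null : Double.valueOf(v));
        }
        return result;
    }

    // Each distinct integer becomes one category label, kept in first-seen order.
    static Set<String> categories(List<Integer> values) {
        Set<String> result = new LinkedHashSet<>();
        for (Integer v : values) {
            if (v != null) {
                result.add(String.valueOf(v));
            }
        }
        return result;
    }
}
```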

Relation Detail

Generalization

Name
Related Element	• NumericalColumn

Class IntegerColumnImpl

This class provides the implementation code accessing real data in an integer column. This class is a specialization of the numerical column implementation (NumericalColumnImpl). Integer values are stored as objects of class Integer. This class and its methods should not be invoked directly.

Figure 31. Class IntegerColumnImpl

Name	IntegerColumnImpl
Qualified Name	es::uco::kdis::datapro::dataset::Column::IntegerColumnImpl
Visibility	public
Abstract	false
Base Classifier	• NumericalColumnImpl
Realized Interface

Operation Detail

For further information, see a complete specification of these methods in NumericalColumnImpl and ColumnImpl.

addValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• inout oValue : Object

addValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int • inout oValue : Object

getMaxValue

Type	double
Visibility	public
Is Abstract	false
Parameter

getMinValue

Type	double
Visibility	public
Is Abstract	false
Parameter

IntegerColumnImpl

Default constructor with no parameters.

Type
Visibility	public
Is Abstract	false
Parameter

mean

Type	double
Visibility	public
Is Abstract	false
Parameter

setValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int • inout oValue : Object

standardDeviation

Type	double
Visibility	public
Is Abstract	false
Parameter

toCategorical

This method implements the method toCategorical of the abstraction, returning a categorical column using the values contained in the integer column, where each different value constitutes a different category.

Parameter:

• sName The name of the resulting column

Type	CategoricalColumn
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

toNumerical

This method implements the method toNumerical of the abstraction, returning a numerical column using the values contained in the integer column, where each integer value is cast to a double value.

Parameter:

• sName The name of the resulting column

Type	NumericalColumn
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

Relation Detail

Generalization

Name
Related Element	• NumericalColumnImpl

Class NominalColumn

This class represents the abstraction of a nominal column containing free-form strings as values. It defines the methods that provide operations specific to nominal values.

Figure 32. Class NominalColumn

Name	NominalColumn
Qualified Name	es::uco::kdis::datapro::dataset::Column::NominalColumn
Visibility	public
Abstract	false
Base Classifier	• ColumnAbstraction
Realized Interface

Operation Detail

NominalColumn

Default constructor with no parameters.

Type
Visibility	public
Is Abstract	false
Parameter

NominalColumn

Constructor with the name of the column as parameter.

Parameter:

• sName Name of the column

Type
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

toCategorical

This method calls the implementation to return a categorical column, where each different string is a category (no repetition).

Type	CategoricalColumn
Visibility	public
Is Abstract	false
Parameter

toNumerical

This method calls the implementation to return a numerical column, where each string is parsed to a numerical value. If the string could not be parsed, then an empty value is added instead. Notice that upper and lower interval values are not set for the numerical column returned.

Type	NumericalColumn
Visibility	public
Is Abstract	false
Parameter
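
The parsing rule described for toNumerical can be sketched as follows. NominalToNumericalSketch is a hypothetical helper, not the library's code; using null to stand for the "empty value" and relying on Double.parseDouble for parsing are both assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a plain list stands in for the numerical column.
class NominalToNumericalSketch {
    static List<Double> toNumerical(List<String> values) {
        List<Double> result = new ArrayList<>();
        for (String s : values) {
            Double parsed = null;   // null stands in for the "empty value" (assumption)
            if (s != null) {
                try {
                    parsed = Double.parseDouble(s.trim());
                } catch (NumberFormatException e) {
                    // The string is not a number: keep the empty value.
                }
            }
            result.add(parsed);
        }
        return result;
    }
}
```

Under these assumptions, the strings "1.5", "abc", " 42 " and "" yield 1.5, an empty value, 42.0 and an empty value, respectively.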

Relation Detail

Generalization

Name
Related Element	• ColumnAbstraction

Class NominalColumnImpl

This class provides the implementation code accessing real data in the nominal column. Nominal values are stored as String objects. Note that these methods should not be invoked directly.

Figure 33. Class NominalColumnImpl

Name	NominalColumnImpl
Qualified Name	es::uco::kdis::datapro::dataset::Column::NominalColumnImpl
Visibility	public
Abstract	false
Base Classifier	• ColumnImpl
Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

For a more detailed specification of the methods inherited from ColumnImpl, see its specification above.

addAllValues

Type	int
Visibility	public
Is Abstract	false
Parameter	• inout rgoCol : List<Object>

addValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• inout oValue : Object

addValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int • inout oValue : Object

countEmptyValues

Type	int
Visibility	public
Is Abstract	false
Parameter

countInvalidValues

Type	int
Visibility	public
Is Abstract	false
Parameter

countMissingValues

Type	int
Visibility	public
Is Abstract	false
Parameter

countNullValues

Type	int
Visibility	public
Is Abstract	false
Parameter

getElement

Type	Object
Visibility	public
Is Abstract	false
Parameter	• in iPos : int

getSize

Type	int
Visibility	public
Is Abstract	false
Parameter

getValues

Type	List<Object>
Visibility	public
Is Abstract	false
Parameter

NominalColumnImpl

Default constructor with no parameters.

Type
Visibility	public
Is Abstract	false
Parameter

removeValue

Type	void
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int

setValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int • inout oValue : Object

toCategorical

This method implements the method toCategorical of the abstraction, returning a categorical column, where each different string is a category (no repetition).

Parameter:

• sName The name of the column to be created

Type	CategoricalColumn
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

toNumerical

This method implements the method toNumerical of the abstraction, returning a numerical column, where each string is parsed to a numerical value. If the string could not be parsed, then an empty value is added instead. Notice that upper and lower interval values are not set for the numerical column returned.

Parameter:

• sName The name of the column

Type	NumericalColumn
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

Relation Detail

Generalization

Name
Related Element	• ColumnImpl

Class NumericalColumn

This class represents the abstraction of a numerical (real) column.

Figure 34. Class NumericalColumn

Name	NumericalColumn
Qualified Name	es::uco::kdis::datapro::dataset::Column::NumericalColumn
Visibility	public
Abstract	false
Base Classifier	• ColumnAbstraction
Realized Interface

Attribute Detail

dMaxInterval

This attribute indicates the maximum value allowed in the column. This property should be accessed using getter/setter methods.

Type	Double
Default Value	Double.MAX_VALUE
Visibility	protected
Multiplicity

dMinInterval

This attribute indicates the minimum value allowed in the column. This property should be accessed using getter/setter methods.

Type	Double
Default Value	Double.MIN_VALUE
Visibility	protected
Multiplicity
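
One point that may be worth checking against the source code: in Java, Double.MIN_VALUE is the smallest positive non-zero double, not the most negative one. If the default minimum really is this constant, negative values would fall below the default interval until setdMinInterval is called. The sketch below only demonstrates the behaviour of the Java constants; the helper isWithin is hypothetical and does not claim to reproduce the library's range check.

```java
// Hypothetical illustration of the documented default bounds.
class IntervalDefaultsSketch {
    // Simple closed-interval check using the given bounds.
    static boolean isWithin(double value, double min, double max) {
        return value >= min && value <= max;
    }

    public static void main(String[] args) {
        // Double.MIN_VALUE is a tiny positive number.
        System.out.println(Double.MIN_VALUE > 0);
        // A negative value lies outside [Double.MIN_VALUE, Double.MAX_VALUE].
        System.out.println(isWithin(-1.0, Double.MIN_VALUE, Double.MAX_VALUE));
        // The most negative finite double is -Double.MAX_VALUE.
        System.out.println(isWithin(-1.0, -Double.MAX_VALUE, Double.MAX_VALUE));
    }
}
```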

Operation Detail

getdMaxInterval

This method returns the maximum real value allowed in the column. The implementation is not invoked, since this information is part of metadata.

Type	Double
Visibility	public
Is Abstract	false
Parameter

getdMinInterval

This method returns the minimum real value allowed in the column. The implementation is not invoked, since this information is part of metadata.

Type	Double
Visibility	public
Is Abstract	false
Parameter

getMaxValue

This method calls the implementation to get the maximum existing value in the column data.

Type	double
Visibility	public
Is Abstract	false
Parameter

getMinValue

This method calls the implementation to get the minimum existing value in the column data.

Type	double
Visibility	public
Is Abstract	false
Parameter

mean

This method calls the implementation to get the mean value of the column data.

Type	double
Visibility	public
Is Abstract	false
Parameter

normalize

This method calls the implementation to normalize the set of values in the numerical column.

Type	void
Visibility	public
Is Abstract	false
Parameter

NumericalColumn

Default constructor with no parameters. The implementation NumericalColumnImpl is invoked.

Type
Visibility	public
Is Abstract	false
Parameter

NumericalColumn

Constructor with the name of the column as a parameter. The implementation NumericalColumnImpl is invoked.

Parameter:

• sName The name of the column

Type
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

setdMaxInterval

This method sets the maximum real value allowed in the column. The implementation is not invoked, since this information is part of the column metadata.

Parameter:

• dMaxInterval The maximum value allowed

Exceptions:

• IllegalAccessException if the value cannot be set

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout dMaxInterval : Double

setdMinInterval

This method sets the minimum real value allowed in the column. The implementation is not invoked, since this information is part of the column metadata.

Parameter:

• dMinInterval The minimum value allowed

Exceptions:

• IllegalAccessException if the value cannot be set

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout dMinInterval : Double

standardDeviation

This method calls the implementation to return the standard deviation calculated from the set of values in the numerical column.

Type	double
Visibility	public
Is Abstract	false
Parameter

standarize

This method calls the implementation to standardize the set of values in the numerical column.

Parameters:

• dMean Value of the mean used to standardize the set of values of the column

• dVariance Value of the variance used for the standardization

Type	void
Visibility	public
Is Abstract	false
Parameter	• in dMean : double • in dVariance : double
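
The specification does not state the formula applied. A common choice, assumed here, is the z-score: each value has the mean subtracted and is then divided by the square root of the variance. StandardizeSketch is a hypothetical helper operating on a plain array rather than on the column itself.

```java
// Hypothetical helper; assumes z-score standardization (x - mean) / sqrt(variance).
class StandardizeSketch {
    static double[] standardize(double[] values, double dMean, double dVariance) {
        if (dVariance <= 0.0) {
            throw new IllegalArgumentException("variance must be positive");
        }
        double stdDev = Math.sqrt(dVariance);
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = (values[i] - dMean) / stdDev;
        }
        return result;
    }
}
```

For instance, under this assumption the values 2, 4 and 6 with mean 4 and variance 4 become -1, 0 and 1.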

toInteger

This method calls the implementation to return an integer column containing values extracted from the numerical column. It returns an IntegerColumn object.

Parameter:

• bRoundedValue if false, values are truncated; if true, values are rounded.

Type	IntegerColumn
Visibility	public
Is Abstract	false
Parameter	• in bRoundedValue : boolean
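
The difference between the two modes of bRoundedValue can be illustrated with standard Java operations. This is a sketch under assumptions: ToIntegerSketch is a hypothetical helper, and the exact rounding rule used by the library is not stated in this specification. Here, truncation is a cast to int (towards zero) and rounding uses Math.round (ties go towards positive infinity).

```java
// Hypothetical helper contrasting truncation and rounding.
class ToIntegerSketch {
    static int convert(double value, boolean bRoundedValue) {
        // Math.round returns a long; the cast narrows it to int.
        return bRoundedValue ? (int) Math.round(value) : (int) value;
    }
}
```

With these operations, 2.7 is truncated to 2 and rounded to 3, while -2.7 is truncated to -2 and rounded to -3.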

toNominal

This method calls the implementation to return a nominal column, where strings are constructed from real values.

Type	NominalColumn
Visibility	public
Is Abstract	false
Parameter

Relation Detail

Generalization

Name
Related Element	• ColumnAbstraction

Class NumericalColumnImpl

This class provides the implementation code accessing real data in a numerical column. Values are stored as objects of the class Double. Notice that this class should not be instantiated directly, except by its abstraction.

Figure 35. Class NumericalColumnImpl

Name	NumericalColumnImpl
Qualified Name	es::uco::kdis::datapro::dataset::Column::NumericalColumnImpl
Visibility	public
Abstract	false
Base Classifier	• ColumnImpl
Realized Interface

Attribute Detail

All the attributes are either private or protected.

Operation Detail

addAllValues

Type	int
Visibility	public
Is Abstract	false
Parameter	• inout rgoCol : List<Object>

addValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• inout oValue : Object

addValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int • inout oValue : Object

countEmptyValues

Type	int
Visibility	public
Is Abstract	false
Parameter

countInvalidValues

Type	int
Visibility	public
Is Abstract	false
Parameter

countMissingValues

Type	int
Visibility	public
Is Abstract	false
Parameter

countNullValues

Type	int
Visibility	public
Is Abstract	false
Parameter

getElement

Type	Object
Visibility	public
Is Abstract	false
Parameter	• in iPos : int

getMaxValue

This method implements the method getMaxValue of the abstraction class, returning the maximum existing value in the column.

Type	double
Visibility	public
Is Abstract	false
Parameter

getMinValue

This method implements the method getMinValue of the abstraction class, returning the minimum existing value in the column.

Type	double
Visibility	public
Is Abstract	false
Parameter

getSize

Type	int
Visibility	public
Is Abstract	false
Parameter

getValues

Type	List<Object>
Visibility	public
Is Abstract	false
Parameter

mean

This method implements the method mean of the abstraction class, returning the mean value of the column.

Type	double
Visibility	public
Is Abstract	false
Parameter

normalize

This method implements the method normalize of the abstraction class, normalizing the set of values contained in the column.

Type	void
Visibility	public
Is Abstract	false
Parameter

NumericalColumnImpl

Default constructor with no parameters.

Type
Visibility	public
Is Abstract	false
Parameter

removeValue

Type	void
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int

setValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int • inout oValue : Object

standardDeviation

This method implements the method standardDeviation of the abstraction class, returning the standard deviation value of the set of values contained in the numerical column.

Type	double
Visibility	public
Is Abstract	false
Parameter

standarize

This method implements the method standarize of the abstraction class, standardizing the values in the column according to the mean and variance passed as parameters.

Parameters:

• dMean Mean value considered for the standardization

• dVariance Variance value considered for the standardization

Type	void
Visibility	public
Is Abstract	false
Parameter	• in dMean : double • in dVariance : double

toInteger

This method implements the method toInteger of the abstraction class, returning an integer column calculated from the numerical column.

Parameters:

• sName The name of the resulting new column

• bRoundedValue If false, values are truncated; if true, values are rounded

Type	IntegerColumn
Visibility	public
Is Abstract	false
Parameter	• in bRoundedValue : boolean • inout sName : String

toNominal

This method implements the method toNominal of the abstraction class, returning a nominal column whose strings are constructed from the numerical values in the column.

Parameter:

• sName The name of the resulting new column

Type	NominalColumn
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

Relation Detail

Generalization

Name
Related Element	• ColumnImpl

Class RangeColumn

This class represents the abstraction of a range column, whose values are intervals with a minimum and a maximum value in the range.

Figure 36. Class RangeColumn

Name	RangeColumn
Qualified Name	es::uco::kdis::datapro::dataset::Column::RangeColumn
Visibility	public
Abstract	false
Base Classifier	• ColumnAbstraction
Realized Interface

Operation Detail

RangeColumn

Default constructor with no parameters.

Type
Visibility	public
Is Abstract	false
Parameter

RangeColumn

Constructor with the name of the column as a parameter.

Parameter:

• sName The name of the column.

Type
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

toCategorical

This method calls the implementation to return a categorical column extracted from the range data contained in the column. The method returns a CategoricalColumn object.

Exceptions:

• NotAddedValueException

Type	CategoricalColumn
Visibility	public
Is Abstract	false
Parameter

toNumerical

This method calls the implementation to return a numerical column extracted from the range values contained in the column, according to one of the following modes:

0: The minimum value of each range is selected.

1: The maximum value of each range is selected.

2: The mean value between min and max is selected.

3: A random value in the range is selected.

It returns the resulting NumericalColumn object.

Parameter:

• iMode An integer between 0 and 3 indicating the conversion mode, as described above.

Exceptions:

• NotAddedValueException

Type	NumericalColumn
Visibility	public
Is Abstract	false
Parameter	• inout iMode : int
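
The four conversion modes can be sketched for a single [min, max] range as follows. RangeToNumericalSketch is a hypothetical helper, not the library's code; drawing the random value uniformly and rejecting other mode values with an IllegalArgumentException are assumptions.

```java
import java.util.Random;

// Hypothetical helper converting one [min, max] range into a single number.
class RangeToNumericalSketch {
    static double convert(double min, double max, int iMode, Random random) {
        switch (iMode) {
            case 0:
                return min;                                       // minimum of the range
            case 1:
                return max;                                       // maximum of the range
            case 2:
                return (min + max) / 2.0;                         // mean of min and max
            case 3:
                return min + random.nextDouble() * (max - min);   // uniform value in the range (assumption)
            default:
                throw new IllegalArgumentException("iMode must be between 0 and 3");
        }
    }
}
```

For the range [1, 5], modes 0, 1 and 2 give 1, 5 and 3, and mode 3 gives some value between 1 and 5.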

toNumericalByGaussian

This method calls the implementation to return a numerical column extracted from the range values contained in the column, according to the Gauss distribution.

Parameters:

• dMean The arithmetic mean for the distribution

• dStdDev The standard deviation for the distribution

It returns the resulting NumericalColumn object.

Exceptions:

• NotAddedValueException

Type	NumericalColumn
Visibility	public
Is Abstract	false
Parameter	• in dMean : double • in dStdDev : double

Relation Detail

Generalization

Name
Related Element	• ColumnAbstraction

Class RangeColumnImpl

This class provides the implementation code accessing real data in a range column (i.e. a representation of a [min, max] interval). It should not be invoked directly: the programmer should use the abstraction, RangeColumn, since it hides the actual implementation of the column. Even when the implementation changes, the abstraction must remain unaltered.

Figure 37. Class RangeColumnImpl

Name	RangeColumnImpl
Qualified Name	es::uco::kdis::datapro::dataset::Column::RangeColumnImpl
Visibility	public
Abstract	false
Base Classifier	• ColumnImpl
Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

For a detailed specification of the methods inherited from ColumnImpl, see its specifications above.

addAllValues

Type	int
Visibility	public
Is Abstract	false
Parameter	• inout rgoValues : List<Object>

addValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• inout oValue : Object

addValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int • inout oValue : Object

countEmptyValues

Type	int
Visibility	public
Is Abstract	false
Parameter

countInvalidValues

Type	int
Visibility	public
Is Abstract	false
Parameter

countMissingValues

Type	int
Visibility	public
Is Abstract	false
Parameter

countNullValues

Type	int
Visibility	public
Is Abstract	false
Parameter

getElement

Type	Object
Visibility	public
Is Abstract	false
Parameter	• in iPos : int

getSize

Type	int
Visibility	public
Is Abstract	false
Parameter

getValues

Type	List<Object>
Visibility	public
Is Abstract	false
Parameter

RangeColumn

Default constructor with no parameters.

Type
Visibility	public
Is Abstract	false
Parameter

RangeColumn

Constructor with the name of the column as a parameter.

Parameter:

• sName The name of the column.

Type
Visibility	public
Is Abstract	false
Parameter	• inout sName : String

removeValue

Type	void
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int

setValue

Type	int
Visibility	public
Is Abstract	false
Parameter	• in iIndex : int • inout oValue : Object

toCategorical

This method implements the method toCategorical of the abstraction, returning a categorical column extracted from the range data contained in the column. The method returns the resulting CategoricalColumn object.

Exceptions:

• NotAddedValueException

Type	CategoricalColumn
Visibility	public
Is Abstract	false
Parameter

toNumerical

This method implements the method toNumerical of the abstraction, returning a numerical column extracted from the range values contained in the column, according to one of the following modes:

0: The minimum value of each range is selected.

1: The maximum value of each range is selected.

2: The mean value between min and max is selected.

3: A random value in the range is selected.

It returns the resulting NumericalColumn object.

Parameter:

• iMode An integer between 0 and 3 indicating the conversion mode, as described above.

Exceptions:

• NotAddedValueException

Type	NumericalColumn
Visibility	public
Is Abstract	false
Parameter	• in iMode : int

toNumericalByGaussian

This method implements the method toNumericalByGaussian of the abstraction, returning a numerical column extracted from the range values contained in the column, according to the Gauss distribution.

Parameters:

• dMean The arithmetic mean for the distribution

• dStdDev The standard deviation for the distribution

It returns the resulting NumericalColumn object.

Exceptions:

• NotAddedValueException

Type	NumericalColumn
Visibility	public
Is Abstract	false
Parameter	• in dMean : double • in dStdDev : double

Relation Detail

Generalization

Name
Related Element	• ColumnImpl

Attribute Detail

Some attributes are protected to allow reusability by inheritance.

ATTRIBUTE

ATTRIBUTE is the static constant string for the ARFF keyword '@attribute'.

Type	String
Default Value	"@attribute"
Visibility	protected
Multiplicity

DATA

DATA is the static constant string for the ARFF keyword '@data'. It defines the beginning of the data block in the ARFF file.

Type	String
Default Value	"@data"
Visibility	protected
Multiplicity

RELATION

RELATION is the static constant with the ARFF keyword '@relation'. It represents the beginning of the ARFF dataset definition.

Type	String
Default Value	"@relation"
Visibility	protected
Multiplicity

Operation Detail

addAllValues

This method reads the DATA block in the dataset and adds the values in the file to the corresponding column structure.

Parameter:

• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.

o s: Nominal column

o f: Numerical (real) column

o c: Categorical column

o b: Binary column

o d: Date column

o %: Skip this column (do not dump its values to any column)

For example, the string “cbbf%%d” indicates the sequence of attributes that are read from the dataset and copied in memory to the column structure of the dataset: a categorical column, two binary columns, and a numerical column. The following two attributes are omitted. Finally, the date attribute is copied.

Exceptions:

• IndexOutOfBoundsException

• IOException

• NotAddedValueException

Type	void
Visibility	protected
Is Abstract	false
Parameter	• inout sColumnFormat : String
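
The column format string can be read one symbol at a time, each symbol naming the type of the attribute at that position. The sketch below decodes such a string into readable labels. ColumnFormatSketch is a hypothetical helper; rejecting unknown symbols with an IllegalArgumentException is an assumption, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for the ARFF column format string.
class ColumnFormatSketch {
    static List<String> describe(String sColumnFormat) {
        List<String> result = new ArrayList<>();
        for (char c : sColumnFormat.toCharArray()) {
            switch (c) {
                case 's': result.add("nominal"); break;
                case 'f': result.add("numerical"); break;
                case 'c': result.add("categorical"); break;
                case 'b': result.add("binary"); break;
                case 'd': result.add("date"); break;
                case '%': result.add("skip"); break;
                default:
                    throw new IllegalArgumentException("Unknown column symbol: " + c);
            }
        }
        return result;
    }
}
```

Decoding the string cbbf%%d from the example above yields categorical, binary, binary, numerical, skip, skip, date.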

ArffDataset

Default constructor with no parameters. No dataset filename is specified using this constructor.

Type
Visibility	public
Is Abstract	false
Parameter

ArffDataset

Constructor with the filename of the dataset as a parameter.

Parameter:

• sFileName The filename of the dataset

Type
Visibility	public
Is Abstract	false
Parameter	• inout sFileName : String

close

This method closes the ARFF file.

Exception:

• IOException

Type	void
Visibility	protected
Is Abstract	false
Parameter

obtainMetadata

This method reads the metadata of an ARFF file. Each attribute specification is interpreted and, if required, the column structure is created in the dataset.

Parameter:

• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.

o s: Nominal column

o f: Numerical (real) column

o c: Categorical column

o b: Binary column

o d: Date column

o %: Skip this column (do not dump its values to any column)

For example, the code "bbf%c" indicates that two binary columns and a numerical (real) column will be read. Then, the fourth attribute will be skipped and, finally, a categorical column will be read.

Exceptions:

• IOException

• InputMismatchException

Type	void
Visibility	protected
Is Abstract	false
Parameter	• inout sColumnFormat : String
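
As a rough illustration of reading the metadata block, the sketch below collects attribute names from the header lines until the '@data' keyword is reached. It is a simplification under assumptions: ArffHeaderSketch is a hypothetical helper, keywords are matched case-insensitively, lines starting with '%' are treated as comments, and quoted attribute names containing spaces are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified reader for the header of an ARFF file.
class ArffHeaderSketch {
    static final String ATTRIBUTE = "@attribute";
    static final String DATA = "@data";

    // Returns the attribute names declared before the @data keyword.
    static List<String> attributeNames(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue;                                   // blank line or comment
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith(DATA)) {
                break;                                      // the data block begins here
            }
            if (lower.startsWith(ATTRIBUTE)) {
                String[] parts = line.split("\\s+", 3);     // keyword, name, type
                if (parts.length >= 2) {
                    names.add(parts[1]);
                }
            }
        }
        return names;
    }
}
```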

open

This method opens the dataset file using the name passed as a parameter to the constructor.

Exceptions:

• FileNotFoundException

Type	void
Visibility	protected
Is Abstract	false
Parameter

readDataset

This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.

Parameters:

• sContentFormat Not considered for ARFF datasets

• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:

o s: Nominal column

o f: Numerical column

o c: Categorical column

o b: Binary column

o d: Date column

o %: Skip this column

Exceptions:

• NotAddedValueException

• IOException

• IndexOutOfBoundsException

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout sColumnFormat : String • inout sContentFormat : String

readDataset

This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.

Parameter:

• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:

o s: Nominal column

o f: Numerical column

o c: Categorical column

o b: Binary column

o d: Date column

o %: Skip this column

Exceptions:

• NotAddedValueException

• IOException

• IndexOutOfBoundsException

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout sColumnFormat : String

readDataset

This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file. The value of the column format string is null.

Exceptions:

• NotAddedValueException

• IOException

• IndexOutOfBoundsException

Type	void
Visibility	public
Is Abstract	false
Parameter

writeDataset

This method opens the dataset file, writes metadata and instances, and closes the file. The following column types are accepted; any other type causes an InputMismatchException to be thrown:

• Numerical

• Date

• Nominal

• Categorical

• Boolean (binary values are saved as categorical values)

Parameter:

• sOutputFile The filename of the dataset

Exceptions:

• InputMismatchException

• IOException

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout sOutputFile : String

Relation Detail

Generalization

Name
Related Element	• FileDataset

Class CsvDataset

CsvDataset implements the CSV (Comma-Separated Values) dataset file specification, as prescribed by the IETF specification, available from http://tools.ietf.org/html/rfc4180 (October, 2005).

Figure 40. Class CsvDataset

Name	CsvDataset
Qualified Name	es::uco::kdis::datapro::dataset::Source::CsvDataset
Visibility	public
Abstract	false
Base Classifier	• FileDataset
Realized Interface

Operation Detail

addAllValues

This method adds all the values in the file to the corresponding column structure.

Parameter:

• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.

o s: Nominal column

o f: Numerical (real) column

o i: Integer column

o c: Categorical column

o %: Skip this column (do not dump its values to any column)

For example, the string “cf%%s” indicates the sequence of attributes that are read from the dataset and copied in memory to the column structure of the dataset: a categorical column and a numerical column. The following two attributes are omitted and, finally, the nominal attribute is copied.

Exceptions:

• IndexOutOfBoundsException

• IOException

• NotAddedValueException

Type	void
Visibility	protected
Is Abstract	false
Parameter	• inout sColumnFormat : String

close

This method closes the CSV file.

Exception:

• IOException

Type	void
Visibility	protected
Is Abstract	false
Parameter

CsvDataset

The default constructor of the CSV dataset with no parameters.

Type
Visibility	public
Is Abstract	false
Parameter

CsvDataset

Constructor of the CSV dataset with its filename as a parameter.

Parameter:

• sFileName The filename of the CSV dataset

Type
Visibility	public
Is Abstract	false
Parameter	• inout sFileName : String

obtainMetadata

This method reads the metadata of the CSV file. Notice that any metainformation in CSV files is optional.

Parameters:

• sContentFormat String that specifies the structure of the CSV file. The following symbols are used:

o n: Indicates that a line with the attribute names is read

o v: Indicates the block containing the instance values is read

o %: Skip one row in the file

• sColumnFormat String that specifies the different column types to be read from the file. Each column type is represented by one of the following symbols:

o s: Nominal column

o f: Numerical (real) column

o c: Categorical column

o i: Integer column

o %: Skip this column

Exceptions:

• IOException

• IllegalFormatSpecificationException

Type	void
Visibility	protected
Is Abstract	false
Parameter	• inout sColumnFormat : String • inout sContentFormat : String

open

This method opens the dataset CSV file using the name passed as a parameter to the constructor.

Exceptions:

• FileNotFoundException

Type	void
Visibility	protected
Is Abstract	false
Parameter

readDataset

This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.

Parameter:

• sContentFormat String that specifies the structure of the CSV file. The following symbols are used:

o n: Indicates that a line with the attribute names is read

o v: Indicates the block containing the instance values is read

o %: Skip one row in the file

For example, “%n%%v” omits the first line, then reads the column names, omits the next two lines and, finally, reads the dataset instances.

• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:

o s: Nominal column

o f: Numerical column

o i: Integer column

o c: Categorical column

o %: Skip this column
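The way a content format string drives the reading of a file can be sketched as follows. This is a minimal Java illustration under stated assumptions, not the library's implementation: the class ContentFormatSketch and its Parsed holder are hypothetical, values are assumed to be comma-separated, the symbol v is assumed to consume every remaining line, and IllegalArgumentException stands in for the library's own exception type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not part of the datapro library. */
final class ContentFormatSketch {

    /** Hypothetical result holder: attribute names and raw instance rows. */
    static final class Parsed {
        final List<String> names = new ArrayList<>();
        final List<String[]> instances = new ArrayList<>();
    }

    /** Walks the content format symbol by symbol over the lines of a file. */
    static Parsed parse(List<String> lines, String contentFormat) {
        Parsed parsed = new Parsed();
        int row = 0; // index of the next line to consume
        for (char symbol : contentFormat.toCharArray()) {
            switch (symbol) {
                case '%': // skip one row
                    row++;
                    break;
                case 'n': // one line holding the attribute names
                    parsed.names.addAll(Arrays.asList(lines.get(row++).split(",")));
                    break;
                case 'v': // assumption: the value block runs to the end of the file
                    while (row < lines.size()) {
                        parsed.instances.add(lines.get(row++).split(","));
                    }
                    break;
                default:
                    throw new IllegalArgumentException("Unknown content symbol: " + symbol);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "exported by some tool", "a,b", "", "----", "1,2", "3,4");
        Parsed parsed = parse(lines, "%n%%v");
        System.out.println(parsed.names);            // prints [a, b]
        System.out.println(parsed.instances.size()); // prints 2
    }
}
```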

Exceptions:

• NotAddedValueException

• IOException

• IndexOutOfBoundsException

• IllegalFormatSpecificationException

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout sColumnFormat : String • inout sContentFormat : String

readDataset

This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file. This method assumes the following file format: a first line with the attribute names (metadata), followed by the instances.

Parameter:

• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:

o s: Nominal column

o f: Numerical column

o i: Integer column

o c: Categorical column

o %: Skip this column
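The assumed layout (one line of attribute names, then the instances) combined with a column format can be sketched as below. This is only an illustration under stated assumptions: HeaderThenInstancesSketch and its methods are hypothetical names, values are assumed to be comma-separated, numerical values are mapped to Double and integer values to Integer, and nominal and categorical values are kept as plain strings. The library's actual column classes are not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not part of the datapro library. */
final class HeaderThenInstancesSketch {

    /** Converts one raw cell according to its column symbol. */
    static Object convert(char symbol, String raw) {
        String value = raw.trim();
        switch (symbol) {
            case 'f': return Double.valueOf(value);  // numerical column
            case 'i': return Integer.valueOf(value); // integer column
            case 's':                                // nominal column
            case 'c': return value;                  // categorical column
            default:
                throw new IllegalArgumentException("Unknown column symbol: " + symbol);
        }
    }

    /** Reads every line after the header, dropping positions marked with '%'. */
    static List<List<Object>> readInstances(List<String> lines, String columnFormat) {
        List<List<Object>> instances = new ArrayList<>();
        // Line 0 holds the attribute names, so the instances start at line 1.
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(",");
            List<Object> instance = new ArrayList<>();
            for (int i = 0; i < columnFormat.length(); i++) {
                char symbol = columnFormat.charAt(i);
                if (symbol != '%') {
                    instance.add(convert(symbol, cells[i]));
                }
            }
            instances.add(instance);
        }
        return instances;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("name,age,note,score", "ann,31,x,4.5");
        System.out.println(readInstances(lines, "si%f")); // prints [[ann, 31, 4.5]]
    }
}
```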

Exceptions:

• NotAddedValueException

• IOException

• IndexOutOfBoundsException

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout sColumnFormat : String

writeDataset

This method writes a new CSV dataset file. The column types allowed for writing are the following:

• Numerical

• Integer

• Nominal

• Categorical

• Binary (binary values are saved as categorical values)

Parameter:

• sOutputFile The filename of the dataset

Exceptions:

• IOException

Type	void
Visibility	public
Is Abstract	false
Parameter	• inout sOutputFile : String

Relation Detail

Generalization

Name
Related Element	• FileDataset

Class ExcelDataset

ExcelDataset is a class that represents a dataset that conforms to the Microsoft Excel specification. This type of file has the basic features of all spreadsheets: a grid of cells arranged in numbered rows and letter-named columns.

Note: This class has an external dependency on the Apache POI Java library.
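To make the grid addressing concrete, the following Java sketch converts zero-based row and column indexes into the usual spreadsheet cell names (A1, B1, ..., Z1, AA1, ...). It is a stand-alone illustration that does not use POI; the class ExcelCellNameSketch and its methods are hypothetical and are not part of datapro.

```java
/** Illustrative sketch only; not part of the datapro library and independent of POI. */
final class ExcelCellNameSketch {

    /** Converts a zero-based column index into its letter name (0 -> A, 26 -> AA). */
    static String columnName(int columnIndex) {
        if (columnIndex < 0) {
            throw new IllegalArgumentException("Negative column index: " + columnIndex);
        }
        StringBuilder letters = new StringBuilder();
        // Column names form a bijective base-26 numbering, hence the (n - 1) steps.
        for (int n = columnIndex + 1; n > 0; n = (n - 1) / 26) {
            letters.append((char) ('A' + (n - 1) % 26));
        }
        return letters.reverse().toString();
    }

    /** Builds a cell reference from zero-based indexes; rows are shown one-based. */
    static String cellReference(int rowIndex, int columnIndex) {
        return columnName(columnIndex) + (rowIndex + 1);
    }

    public static void main(String[] args) {
        System.out.println(columnName(0));       // prints A
        System.out.println(columnName(26));      // prints AA
        System.out.println(cellReference(9, 2)); // prints C10
    }
}
```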

Figure 41. Class ExcelDataset

Name	ExcelDataset
Qualified Name	es::uco::kdis::datapro::dataset::Source::ExcelDataset
Visibility	public
Abstract	false
Base Classifier	• FileDataset
Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

addAllValues

This method adds all the values in the DATA block of the file to the corresponding column structure.

Parameter:

• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.

o s: Nominal column

o f: Numerical (real) column

o i: Integer column

o c: Categorical column

o %: Skip this column (do not dump its values to any column)

Exceptions:

• IndexOutOfBoundsException

• IOException

• NotAddedValueException

Type	void
Visibility	protected
Is Abstract	false
Parameter	• inout sColumnFormat : String

close

Close the Excel file.

Exceptions:

• IOException

Type	void
Visibility	protected
Is Abstract	false
Parameter

ExcelDataset

Default constructor with no parameters.

Type
Visibility	public
Is Abstract	false
Parameter

ExcelDataset

Constructor with the filename as parameter.

Parameter:

• sFileName The filename of the Excel dataset

Type
Visibility	public
Is Abstract	false
Parameter	• inout sFileName : String

obtainMetadata

This method reads the metadata of the Excel file.

Parameter:

• sContentFormat String that specifies the data structure in the Excel file. The following symbols are used:

o n: Indicates that a line with the attribute names is read

o v: Indicates the block containing the instance values is read

o %: Skip one row in the file

• sColumnFormat String that specifies the different column types to be read from the file. Each column type is represented by one of the following symbols:

o s: Nominal column

o f: Numerical (real) column

o c: Categorical column

o i: Integer column

o %: Skip this column

Exceptions:

• IOException

• IllegalFormatSpecificationException

Type	void
Visibility	protected
Is Abstract	false
Parameter	• inout sColumnFormat : String • inout sContentFormat : String

open

This method opens the Excel file using the name passed as a parameter to the constructor.

Exceptions:

• FileNotFoundException

Type	void
Visibility	protected
Is Abstract	false
Parameter

readDataset

This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.

Parameter:

• sContentFormat String that specifies the structure of the Excel file. The following symbols are used:

o n: Indicates that a line with the attribute names is read

o v: Indicates the block containing the instance values is read

o %: Skip one row in the file

For example, “%n%%v” omits the first line, then reads the column names, omits the next two lines and, finally, reads the dataset instances.

• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:

o s: Nominal column

o f: Numerical column

o i: Integer column

o c: Categorical column

o %: Skip this column

Version	Date	Description	Participants
0.1	July 2011	Initial version. Intruder algorithms.	ARQ, JTL, JML, JRR
0.2	September 2011	Strategies and columns	MOB, JML, JRR
0.3	April 2012	Refactoring, performance improvements and testing	ARQ, JML, JRR
0.4	Under development	Weka wrappers for preprocessing, association, clustering and classification	JRR
0.5	Under development	New dataset sources from relational databases and noSQL databases	JRR

Revision	Date	Description	Author
1	July 17, 2012	Initial version of this document	JRR

Name	algorithm
Qualified Name	es::uco::kdis::datapro::algorithm

Name	base
Qualified Name	es::uco::kdis::datapro::algorithm::base

Name	DatasetStrategy
Qualified Name	es::uco::kdis::datapro::algorithm::base::DatasetStrategy
Visibility	public
Abstract	true
Base Classifier
Realized Interface

Name	ReverseBandwagonAttack
Qualified Name	es::uco::kdis::datapro::algorithm::intruder::ReverseBandwagonAttack
Visibility	public
Abstract	false
Base Classifier	• BandwagonAttack
Realized Interface

Name	SegmentAttack
Qualified Name	es::uco::kdis::datapro::algorithm::intruder::SegmentAttack
Visibility	public
Abstract	false
Base Classifier	• IntruderAttack
Realized Interface

Name	preprocessing
Qualified Name	es::uco::kdis::datapro::algorithm::preprocessing

Name	discretization
Qualified Name	es::uco::kdis::datapro::algorithm::preprocessing::discretization

Name	EqualWidthDiscretization
Qualified Name	es::uco::kdis::datapro::algorithm::preprocessing::discretization::EqualWidthDiscretization
Visibility	public
Abstract	false
Base Classifier	• DatasetStrategy
Realized Interface

Name	EqualFrequencyDiscretization
Qualified Name	es::uco::kdis::datapro::algorithm::preprocessing::discretization::EqualFrequencyDiscretization
Visibility	public
Abstract	false
Base Classifier	• DatasetStrategy • EqualWidthDiscretization
Realized Interface

Name	MDLPDiscretize
Qualified Name	es::uco::kdis::datapro::algorithm::preprocessing::discretization::MDLPDiscretize
Visibility	public
Abstract	false
Base Classifier	• DatasetStrategy
Realized Interface

Name	instance
Qualified Name	es::uco::kdis::datapro::algorithm::preprocessing::instance

Name	RemoveDuplicates
Qualified Name	es::uco::kdis::datapro::algorithm::preprocessing::instance::RemoveDuplicates
Visibility	public
Abstract	false
Base Classifier	• DatasetStrategy
Realized Interface

Name	RemovePercentage
Qualified Name	es::uco::kdis::datapro::algorithm::preprocessing::instance::RemovePercentage
Visibility	public
Abstract	false
Base Classifier	• DatasetStrategy
Realized Interface

Type	Random
Default Value	new Random()
Visibility	public
Multiplicity

Name	validation
Qualified Name	es::uco::kdis::datapro::algorithm::validation

Name	KFolds
Qualified Name	es::uco::kdis::datapro::algorithm::validation::KFolds
Visibility	public
Abstract	false
Base Classifier	• DatasetStrategy
Realized Interface

Name	Column
Qualified Name	es::uco::kdis::datapro::dataset::Column

Type	ColumnImpl
Default Value
Visibility	protected
Multiplicity	1

Name	Source
Qualified Name	es::uco::kdis::datapro::dataset::Source

Name	KeelDataset
Qualified Name	es::uco::kdis::datapro::dataset::Source::KeelDataset
Visibility	public
Abstract	false
Base Classifier	• ArffDataset
Realized Interface

Scope

License

Overview

To-do list

Package es::uco::kdis::datapro::algorithm

Package es::uco::kdis::datapro::algorithm::base

Class DatasetStrategy

Attribute Detail

bExecutable

oDataset

Operation Detail

execute

getDataset

getResult

initialize

isExecutable

postexec

setDataset

setExecutable

Relation Detail

Generalization

Package es::uco::kdis::datapro::algorithm::intruder

Operation Detail

AverageAttack

chooseSelectedItems

initialize

setFillerValues

setSelectedValues

Relation Detail

Generalization

Class BandwagonAttack

Attribute Detail

dDensity

dVisibility

rgdMeanSD

rgoVisibilityColumns

rgoVisibilityMeans

Operation Detail

BandwagonAttack

chooseSelectedItems

initialize

orderArray

setFillerValues

setSelectedValues

setVisibilityColumns

Relation Detail

Generalization

Class DatasetStatistics

Attribute Detail

Operation Detail

DatasetStatistics

execute

getResult

initialize

postexec

Relation Detail

Generalization

Class IntruderAttack

Attribute Detail

bPush

dXRand

iActualInstance

iNumAttacks

iNumFillers

iNumSelected

iSeed

iTarget

oInjection

oRand

rgoFillers

rgoSelected

Operation Detail

addAttack

chooseFillerItems

chooseSelectedItems

createRandomSetOfFiller

createSetOfFiller

execute

getMeanAndSD

getResult

Name	datatypes
Qualified Name	es::uco::kdis::datapro::datatypes

Name	InvalidValue
Qualified Name	es::uco::kdis::datapro::datatypes::InvalidValue
Visibility	public
Abstract	true
Base Classifier
Realized Interface

Name	EmptyValue
Qualified Name	es::uco::kdis::datapro::datatypes::EmptyValue
Visibility	public
Abstract	false
Base Classifier	• InvalidValue
Realized Interface

Name	MissingValue
Qualified Name	es::uco::kdis::datapro::datatypes::MissingValue
Visibility	public
Abstract	false
Base Classifier	• InvalidValue
Realized Interface

Name	NullValue
Qualified Name	es::uco::kdis::datapro::datatypes::NullValue
Visibility	public
Abstract	false
Base Classifier	• InvalidValue
Realized Interface

Name	DoubleRange
Qualified Name	es::uco::kdis::datapro::datatypes::DoubleRange
Visibility	public
Abstract	false
Base Classifier	• Range<Double>
Realized Interface

Name	exception
Qualified Name	es::uco::kdis::datapro::exception

Name	IllegalFormatSpecificationException
Qualified Name	es::uco::kdis::datapro::exception::IllegalFormatSpecificationException
Visibility	public
Abstract	false
Base Classifier	• Exception
Realized Interface

Name	NoSuchCategoryException
Qualified Name	es::uco::kdis::datapro::exception::NoSuchCategoryException
Visibility	public
Abstract	false
Base Classifier	• Exception
Realized Interface

Name	NotAddedValueException
Qualified Name	es::uco::kdis::datapro::exception::NotAddedValueException
Visibility	public
Abstract	false
Base Classifier	• Exception
Realized Interface