datapro4j
The data processing library for Java
The programmer's guide
Revision: 1
Please cite this document as:
J.R. Romero, J.M. Luna, S. Ventura (2012). datapro4j: the data processing library for Java. Dept. of Computer Science and Numerical Analysis, University of Córdoba (Spain). Available for download from http://www.uco.es/grupos/kdis/datapro4j
Knowledge Discovery and Intelligent Systems
University of Córdoba, Spain
http://www.uco.es/grupos/kdis
July 2012
CONTACT INFO
José Raúl Romero, PhD
Dept. Computer Science and Numerical Analysis
University of Córdoba, Spain
Email: jrromero@uco.es
Web: http://www.jrromero.net/en
PARTICIPANTS (IN ALPHABETICAL ORDER)
• de la Torre López, José. BSc. [JTL]
• Luna, José María, MSc. [JML]
• Orozco Borrego, Mario. BSc. [MOB]
• Ramírez Quesada, Aurora. MSc. [ARQ]
PROJECT HISTORY
Version | Date | Description | Participants
0.1 | July 2011 | Initial version. Intruder algorithms. | ARQ, JTL, JML, JRR
0.2 | September 2011 | Strategies and columns | MOB, JML, JRR
0.3 | April 2012 | Refactoring, performance improvements and testing | ARQ, JML, JRR
0.4 | Under development | Weka wrappers for preprocessing, association, clustering and classification | JRR
0.5 | Under development | New dataset sources from relational databases and noSQL databases | JRR
DOCUMENT HISTORY
Revision | Date | Description | Author
1 | July 17, 2012 | Initial version of this document | JRR
Package es::uco::kdis::datapro
Package es::uco::kdis::datapro::algorithm
Package es::uco::kdis::datapro::algorithm::base
Package es::uco::kdis::datapro::algorithm::intruder
Package es::uco::kdis::datapro::algorithm::preprocessing
Package es::uco::kdis::datapro::algorithm::preprocessing::discretization
Class EqualFrequencyDiscretization
Class EqualWidthDiscretization
Package es::uco::kdis::datapro::algorithm::preprocessing::instance
Package es::uco::kdis::datapro::algorithm::validation
Package es::uco::kdis::datapro::dataset
Package es::uco::kdis::datapro::dataset::Column
Package es::uco::kdis::datapro::dataset::Source
Package es::uco::kdis::datapro::datatypes
Package es::uco::kdis::datapro::exception
Class IllegalFormatSpecificationException
Package es.uco.kdis.datapro.algorithm.base
Package es.uco.kdis.datapro.algorithm.preprocessing
Package es.uco.kdis.datapro.dataset columns
Package es.uco.kdis.datapro.dataset.Source
Appendix B: Extending the library
Package es.uco.kdis.datapro.algorithm
Package es.uco.kdis.datapro.algorithm.base
Package es.uco.kdis.datapro.algorithm.intruder
Package es.uco.kdis.datapro.algorithm.preprocessing
Package es.uco.kdis.datapro.algorithm.preprocessing.discretization
Class EqualFrequencyDiscretization
Class EqualWidthDiscretization
Package es.uco.kdis.datapro.algorithm.preprocessing.instance
Package es.uco.kdis.datapro.algorithm.validation
Package es.uco.kdis.datapro.dataset
Package es.uco.kdis.datapro.dataset.Column
Abstract class ColumnAbstraction
Package es.uco.kdis.datapro.dataset.Source
Package es.uco.kdis.datapro.datatypes
Package es.uco.kdis.datapro.exception
Class IllegalFormatSpecificationException
Class diagram: package overview
Class diagram: package es.uco.kdis.datapro.algorithm.base
Class diagram: Package es.uco.kdis.datapro.algorithm.preprocessing
Class diagram: Package es.uco.kdis.datapro.dataset.Column
Package es.uco.kdis.datapro.dataset.Source
Class diagram: Package es.uco.kdis.datapro.datatypes
Class diagram: Package es.uco.kdis.datapro.exception
This document provides the class, interface, and enumeration specification for the datapro4j library. The specification details the types modeled within the system.
The datapro4j library is designed to fully support the efficient handling of datasets that come from different sources and declare different kinds of data types. This task often takes the Java programmer too long, especially in domains such as data analytics or data mining. Note that the library does not target a specific application domain; it serves any application that needs to handle structured data in Java from diverse data sources.
Therefore, datapro4j can be used in data mining to handle inputs or preprocess data, using either internal strategies (e.g. algorithms on instances, discretization, etc.) or external tools (e.g. Weka or any other application). It can also be used to handle outputs: for example, migrating data to other formats, rearranging results from external tools or algorithms, executing statistical tests, etc.
It is worth mentioning that datapro4j is designed to be extended with new algorithms, data formats, column types, etc. These aspects are independent of each other, so an algorithm can be executed regardless of the format in which its data is supplied (stored in a noSQL database, as an ARFF file, or any other).
This document is intended to define the class specification for the datapro4j library.
Copyright © 2012 The authors (University of Córdoba, Spain)
This software was developed by members of the Knowledge Discovery and Intelligent Systems research group at the University of Córdoba, Spain. For further information on the library and modifications, please visit the URL http://www.uco.es/grupos/kdis/datapro4j
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
Redistribution and use of binary forms, with or without modification, are permitted if the following conditions are met:
• Redistributions of source code must retain the above copyright notice, this list of conditions and the disclaimer above.
• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
• All advertising materials or publication mentioning features or use of this software must display the following acknowledgement: “This product includes software developed by the KDIS Research Group at the University of Córdoba (Spain) and its contributors.” or cite the following reference:
J.R. Romero, J.M. Luna, S. Ventura (2012). datapro4j: the data processing library for Java. Dept. of Computer Science and Numerical Analysis, University of Córdoba (Spain). Available for download from http://www.uco.es/grupos/kdis/datapro4j
• Neither the name of the University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
• Commercial use of this software or part of it is not allowed without specific prior written permission.
• Licensing and conditions are subject to change without notice.
Note: At the moment this software is provided in binary form as a Java library. Source code is not provided (we plan to release the Java source code in the near future).
This document provides a list of all packages with a summary of each. Each package has a section that lists its classes, interfaces and enumeration types, with a summary of each. Each class and interface section contains a description, summary tables, detailed member descriptions, and a relation table.
Private properties are omitted. Protected properties are shown when they are useful for external programmers.
In the near future, this library will be updated with the following features (not necessarily in this order):
• Listeners in strategies.
• Graphical UI. (Some minor support is already provided.)
• Generation of synthetic datasets under precise constraints.
• Multipart datasets: datasets that cannot be fully stored in memory, so they need to be split and partially retrieved.
• Different data mining support.
• Wrappers for different datasets and tools.
  o A wrapper for Weka is under development.
• Access to different databases.
  o Access through JDBC to RDBMS engines (e.g. MySQL, Oracle) is under development.
  o Access to noSQL engines (e.g. Cassandra) is under development.
• More dataset formats:
  o Currently, the following formats are supported: ARFF, KEEL, CSV, Excel
  o The following formats are under development: XRFF
The library base package. The software is mainly divided into three different components:
• Dataset and columns. The logical abstract representation of a dataset and its attributes.
• Dataset and sources. The physical representation of a dataset, serialized in files, stored in databases or any other device.
• Dataset and strategies. Any algorithm running on a single dataset, set of datasets or column.
Name |
datapro |
Qualified Name |
es::uco::kdis::datapro |
Only public strategies are described here. Developers can easily provide their own strategies.
Figure 1. Package es.uco.kdis.datapro.algorithm
Name |
algorithm |
Qualified Name |
es::uco::kdis::datapro::algorithm |
Figure 2. Package es.uco.kdis.datapro.algorithm.base
Name |
base |
Qualified Name |
es::uco::kdis::datapro::algorithm::base |
This class represents a generic strategy.
Strategy is a well-known design pattern in which algorithms are encapsulated into classes. A strategy can be executed either as a sequential or as a step-by-step process. In general, every strategy is executed according to the following sequence:
• Creation: the strategy constructor should collect all the parameters that the algorithm requires to be initialized and executed for the first time. Build as many constructors as required.
• Initialization: the method initialize() implements any preprocessing step that the algorithm requires before it can be executed. This preprocessing is not part of the algorithm itself, but it should be executed the first time the algorithm is invoked.
• Execution: the method execution() runs the algorithm once, using the parameters passed to the constructor and prepared during initialization. If the algorithm has finished and cannot be invoked any more, the method setExecutable(false) should be called. Otherwise, execution is allowed until the stop criteria are fulfilled.
• Stop criteria: the method isExecutable() returns true if the algorithm can be executed once more over the dataset; false otherwise.
• Post-execution: any post-processing step has to be implemented in the method postexec().
• Result collection: final results are collected from the dataset, if changed, and returned by the method getResult().
Figure 3. Class DatasetStrategy
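The sequence above can be sketched in plain Java. The classes below are hypothetical stand-ins written for illustration only; they are not the library's DatasetStrategy, they work on a simple list rather than a Dataset, and the name execute() for the execution step is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a strategy base class (not the datapro4j API).
abstract class StrategySketch {
    private boolean executable = true;   // execution flag, true by default

    public abstract void initialize();   // preprocessing before the first run
    public abstract void execute();      // one run of the algorithm (assumed name)
    public abstract void postexec();     // post-processing after the last run
    public abstract Object getResult();  // result collection

    public boolean isExecutable() { return executable; }
    protected void setExecutable(boolean bExecutable) { executable = bExecutable; }
}

// Toy strategy: removes duplicate values from a list in a single pass.
class RemoveDuplicatesSketch extends StrategySketch {
    private final List<Integer> data;
    private List<Integer> result;

    RemoveDuplicatesSketch(List<Integer> data) { this.data = data; }  // creation

    @Override public void initialize() { result = new ArrayList<>(); }

    @Override public void execute() {
        for (Integer value : data) {
            if (!result.contains(value)) {
                result.add(value);
            }
        }
        setExecutable(false);   // one pass is enough, so no further runs are allowed
    }

    @Override public void postexec() { /* nothing to clean up in this toy example */ }

    @Override public Object getResult() { return result; }
}

class LifecycleDemo {
    // Drives any strategy through initialization, execution, post-execution and result collection.
    static Object run(StrategySketch strategy) {
        strategy.initialize();
        while (strategy.isExecutable()) {
            strategy.execute();
        }
        strategy.postexec();
        return strategy.getResult();
    }

    public static void main(String[] args) {
        // prints [3, 1, 2]
        System.out.println(run(new RemoveDuplicatesSketch(Arrays.asList(3, 1, 3, 2, 1))));
    }
}
```

A step-by-step caller would invoke the execution method itself and check isExecutable() between calls instead of using the loop in run().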
Name |
DatasetStrategy |
Qualified Name |
es::uco::kdis::datapro::algorithm::base::DatasetStrategy |
Visibility |
public |
Abstract |
true |
Base Classifier |
|
Realized Interface |
|
Execution flag. It is protected only for inheritance purposes and should never be modified directly.
Type |
boolean |
Default Value |
true |
Visibility |
protected |
Multiplicity |
|
Dataset used by the strategy.
Type |
Dataset |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
This method is invoked to execute the strategy.
Type |
void |
Visibility |
public |
Is Abstract |
true |
Parameter |
|
Getter method for the dataset attribute.
Type |
Dataset |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
This method returns an object comprising the result of the process.
Type |
Object |
Visibility |
public |
Is Abstract |
true |
Parameter |
|
This method calls the initialization process of the strategy.
Type |
void |
Visibility |
public |
Is Abstract |
true |
Parameter |
|
This method returns true if the strategy is in an executable state.
Type |
boolean |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method should be invoked, if required, after the strategy execution.
Type |
void |
Visibility |
public |
Is Abstract |
true |
Parameter |
|
This method sets the dataset to be used by the strategy.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
• inout data : Dataset |
This method sets the current executable state of the strategy.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
• in bExecutable : boolean |
Name |
|
Related Element |
• EqualFrequencyDiscretization |
Name |
|
Related Element |
• EqualWidthDiscretization |
Name |
|
Related Element |
• MDLPDiscretize |
Name |
|
Related Element |
• RemoveDuplicates |
Name |
|
Related Element |
• IntruderAttack |
Name |
|
Related Element |
• KFolds |
Name |
|
Related Element |
• RemovePercentage |
Name |
|
Related Element |
• DatasetStatistics |
Figure 4. Package es.uco.kdis.datapro.algorithm.intruder
Name |
intruder |
Qualified Name |
es::uco::kdis::datapro::algorithm::intruder |
This class implements the Average Attack. This attack strategy assigns the maximum value (push attack) or the minimum value (nuke attack) to the target item. The filler items are selected randomly, and their values are drawn from a normal distribution that uses the mean and standard deviation of each item.
For a further description see the following paper:
B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.
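As an illustration of the description above, the following minimal sketch builds one attack instance. It does not use the datapro4j classes: the ratings matrix, the use of Double.NaN for missing values, the population standard deviation and the clamping to the rating scale are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only; class and method names are hypothetical.
class AverageAttackSketch {

    // ratings[instance][item]; Double.NaN marks a missing rating (assumed encoding).
    static double[] buildProfile(double[][] ratings, int target, int[] fillers,
                                 boolean push, double min, double max, Random rand) {
        double[] profile = new double[ratings[0].length];
        Arrays.fill(profile, Double.NaN);            // every item starts as missing
        for (int filler : fillers) {
            double[] meanSd = meanAndSd(ratings, filler);
            double value = meanSd[0] + meanSd[1] * rand.nextGaussian();
            profile[filler] = Math.max(min, Math.min(max, value));  // keep within the scale
        }
        profile[target] = push ? max : min;          // push: maximum, nuke: minimum
        return profile;
    }

    // Mean and (population) standard deviation of one item, ignoring missing values.
    static double[] meanAndSd(double[][] ratings, int item) {
        double sum = 0.0;
        int n = 0;
        for (double[] row : ratings) {
            if (!Double.isNaN(row[item])) { sum += row[item]; n++; }
        }
        double mean = n == 0 ? 0.0 : sum / n;
        double squares = 0.0;
        for (double[] row : ratings) {
            if (!Double.isNaN(row[item])) { squares += (row[item] - mean) * (row[item] - mean); }
        }
        double sd = n == 0 ? 0.0 : Math.sqrt(squares / n);
        return new double[] { mean, sd };
    }
}
```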
Name |
AverageAttack |
Qualified Name |
es::uco::kdis::datapro::algorithm::intruder::AverageAttack |
Visibility |
public |
Abstract |
false |
Base Classifier |
• IntruderAttack |
Realized Interface |
|
Parameterized Constructor.
• oDataset The original dataset
• iNumAttacks The number of attack instances
• bPush The attack type (true, push; false, nuke)
• iTarget The target item (The column attribute/item index)
• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size
• dXRand The probability of choosing an item as a selected/filler item
• iSeed The random seed
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• in bPush : boolean • in dXRand : double • in iNumAttacks : int • in iNumFillers : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset |
The Average Attack does not use the selected item set.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Initialization method.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
In the Average Attack, the values for filler items must be randomly generated by a Normal Distribution, using the mean and standard deviation of each item.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
The Average Attack does not use the selected item set.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• IntruderAttack |
This class implements the Bandwagon Attack. This attack strategy assigns the maximum value (push attack) to the target item. Then a set of items, named selected items, is chosen among the most visible items.
The visible items are those with a high mean and a high evaluation density. For a further description see the following paper:
B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.
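The selection of visible items can be sketched as a filter over the columns. This is only an illustration under stated assumptions: the ratings matrix, the NaN encoding of missing values and the strict "greater than" comparison against both thresholds are choices made for the example, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; class and method names are hypothetical.
class BandwagonSelectionSketch {

    // Returns the indices of the items whose mean exceeds dVisibility and whose
    // number of non-missing values exceeds dDensity.
    static List<Integer> visibleItems(double[][] ratings, double dVisibility, double dDensity) {
        List<Integer> visible = new ArrayList<>();
        for (int item = 0; item < ratings[0].length; item++) {
            double sum = 0.0;
            int count = 0;
            for (double[] row : ratings) {
                if (!Double.isNaN(row[item])) {   // NaN marks a missing value (assumed)
                    sum += row[item];
                    count++;
                }
            }
            if (count > dDensity && sum / count > dVisibility) {
                visible.add(item);
            }
        }
        return visible;
    }
}
```

The selected items would then be taken from this candidate list, and in the Bandwagon Attack each of them receives the maximum value.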
Name |
BandwagonAttack |
Qualified Name |
es::uco::kdis::datapro::algorithm::intruder::BandwagonAttack |
Visibility |
public |
Abstract |
false |
Base Classifier |
• IntruderAttack |
Realized Interface |
|
The density threshold, i.e. the minimum number of values in the column.
Type |
double |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
The visibility threshold, i.e. the possibility of choosing an item to act as a selected item.
Type |
double |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
It stores the mean and standard deviation of the overall dataset.
Type |
Double |
Default Value |
new ArrayList<Double>() |
Visibility |
protected |
Multiplicity |
0..* |
The array of columns whose visibility and density exceed the thresholds dXVisibility and dXDensity.
Type |
Integer |
Default Value |
new ArrayList<Integer>() |
Visibility |
package |
Multiplicity |
0..* |
The array of means of the columns whose visibility and density exceed the thresholds dXVisibility and dXDensity.
Type |
Double |
Default Value |
new ArrayList<Double>() |
Visibility |
package |
Multiplicity |
0..* |
Parameterized Constructor:
• oDataset The original dataset
• iNumAttacks The number of attack instances
• iTarget The target item (The column attribute/item index)
• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size
• iNumSelected The size of the selected item set
• dVisibility The visibility threshold (absolute value of the column mean)
• dDensity The density threshold (absolute number of instances in the column, not counting null, empty or missing values)
• dXRand The probability of choosing an item as a filler item
• iSeed The random seed
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• in dDensity : double • in dVisibility : double • in dXRand : double • in iNumAttacks : int • in iNumFillers : int • in iNumSelected : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset |
Create the set of selected items. The size is fixed by the iNumSelected property.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Initialization method for the strategy.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Order the columns using their mean as the comparison metric. This method implements the QuickSort algorithm.
• iInit The initial position in the array
• iEnd The end position in the array
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
• in iEnd : int • in iInit : int |
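A QuickSort over two parallel arrays (column indices and their means) could look like the sketch below. The class is hypothetical, and the descending order (highest mean first) is an assumption; this guide does not state the order used by the library.

```java
// Illustrative sketch only; not the library's implementation.
class MeanQuickSortSketch {
    final int[] columns;    // column indices, reordered together with the means
    final double[] means;   // mean of each column

    MeanQuickSortSketch(int[] columns, double[] means) {
        this.columns = columns;
        this.means = means;
    }

    // Sorts positions iInit..iEnd (inclusive) by descending mean.
    void sort(int iInit, int iEnd) {
        if (iInit >= iEnd) {
            return;
        }
        double pivot = means[iInit + (iEnd - iInit) / 2];
        int i = iInit;
        int j = iEnd;
        while (i <= j) {
            while (means[i] > pivot) { i++; }   // already on the "high" side
            while (means[j] < pivot) { j--; }   // already on the "low" side
            if (i <= j) {
                swap(i, j);
                i++;
                j--;
            }
        }
        sort(iInit, j);
        sort(i, iEnd);
    }

    // Swaps both arrays so each column index stays paired with its mean.
    private void swap(int a, int b) {
        double m = means[a]; means[a] = means[b]; means[b] = m;
        int c = columns[a]; columns[a] = columns[b]; columns[b] = c;
    }
}
```

For example, sorting columns {0, 1, 2, 3} with means {2.5, 4.0, 1.0, 3.5} by calling sort(0, 3) leaves the columns in the order {1, 3, 0, 2}.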
In the Bandwagon Attack, the values for filler items must be randomly generated by a Normal Distribution, using the mean and standard deviation of the overall dataset.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Set the values of selected items. In the Bandwagon Attack, each selected item has the maximum value.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Select the columns that exceed the visibility and density threshold.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• ReverseBandwagonAttack |
Name |
|
Related Element |
• IntruderAttack |
Name |
DatasetStatistics |
Qualified Name |
es::uco::kdis::datapro::algorithm::intruder::DatasetStatistics |
Visibility |
public |
Abstract |
false |
Base Classifier |
• DatasetStrategy |
Realized Interface |
|
All attributes are private.
Constructor. A parameter is required:
• data Dataset over which the statistical strategy will be executed.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout data : Dataset |
It executes the algorithm.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
It returns the mean and SD in the form of an ArrayList of Double values.
Type |
ArrayList<Double> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Initialization/Pre-processing method for the strategy.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• DatasetStrategy |
IntruderAttack is the abstract base class for all the intruder attack algorithms. This class represents a generic attack used to alter the content of a dataset. It extends DatasetStrategy, whose methods are implemented and adapted to a general intruder strategy.
For a further description see the paper:
B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.
Name |
IntruderAttack |
Qualified Name |
es::uco::kdis::datapro::algorithm::intruder::IntruderAttack |
Visibility |
public |
Abstract |
true |
Base Classifier |
• DatasetStrategy |
Realized Interface |
|
bPush represents the version of the algorithm (true, for push attack; false for nuke attack).
Type |
boolean |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
dXrand represents the probability of choosing an item (attribute) as a filler item.
Type |
double |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
iActualInstance represents the dataset instance modified by the attack.
Type |
int
Default Value |

Visibility |
protected
Multiplicity |
|
iNumAttacks represents the number of attack instances that will be generated.
Type |
int |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
iNumFillers is the number of filler items, -1 if the filler item set size is randomly chosen.
Type |
int |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
iNumSelected is the number of selected items, -1 if the selected item set size is randomly chosen.
Type |
int
Default Value |

Visibility |
protected
Multiplicity |
|
iSeed is the seed for the oRand object.
Type |
int
Default Value |

Visibility |
protected
Multiplicity |
|
iTarget is the target attribute of the attack.
Type |
int |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
oInjection stores the attack instances.
Type |
Dataset |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
oRand represents a random object.
Type |
Random |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
rgoFillers is the set of filler items.
Type |
ColumnAbstraction |
Default Value |
new ArrayList<ColumnAbstraction>() |
Visibility |
protected |
Multiplicity |
0..* |
rgoSelected is the set of selected items.
Type |
ColumnAbstraction |
Default Value |
new ArrayList<ColumnAbstraction>() |
Visibility |
protected |
Multiplicity |
0..* |
Add a new instance (all items set to the missing value) to the injection.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Select the set of filler items. This set is common for all the intruder attack algorithms.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Select the set of selected items. The selection process is part of a specific intruder attack algorithm.
Type |
void |
Visibility |
protected |
Is Abstract |
true |
Parameter |
|
Select a random set of columns to act as filler items. The set size is also randomly selected. It returns the array of dataset columns that will act as filler items.
Type |
ArrayList<ColumnAbstraction> |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Select a random set of columns to act as filler items. The set size is fixed by the iNumFillers property. It returns the array of dataset columns that will act as filler items.
Type |
ArrayList<ColumnAbstraction> |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Implements the strategy of attack algorithms.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Calculate the mean and standard deviation of the overall dataset. It returns an array with two elements, mean and standard deviation.
Type |
ArrayList<Double> |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Return the dataset injection created. It returns the object comprising the injection after the attack.
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Initialize the algorithm to prepare the execution.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method returns true if rgoSelected contains a column named as the sName parameter, false otherwise.
• sName The name of the column to be searched. It returns true if the column exists, false if not.
Type |
boolean |
Visibility |
protected |
Is Abstract |
false |
Parameter |
• inout sName : String |
Post-processing after the execute method.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method assigns the correct value for each filler item. It depends on the intruder attack algorithm.
Type |
void |
Visibility |
protected |
Is Abstract |
true |
Parameter |
|
Assign the maximum value to the target item.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Assign the minimum value to the target item.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
The value generation process for the selected items. It also depends on the specific intruder attack algorithm.
Type |
void |
Visibility |
protected |
Is Abstract |
true |
Parameter |
|
Name |
|
Related Element |
• AverageAttack |
Name |
|
Related Element |
• DatasetStrategy |
Name |
|
Related Element |
• RandomAttack |
Name |
|
Related Element |
• LoveHateAttack |
Name |
|
Related Element |
• BandwagonAttack |
Name |
|
Related Element |
• SegmentAttack |
This class implements the Love/Hate Attack. This attack strategy sets the maximum value (push attack) or the minimum value (nuke attack) to the target item. The filler items are selected randomly and their values are assigned in the opposite sense of the target item.
For a further description see the paper:
B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.
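The rule described above, where filler items receive the value opposite to the one given to the target item, is small enough to sketch directly. The class below is a hypothetical illustration on a plain array; Double.NaN stands for a missing value, which is an assumption of the example.

```java
import java.util.Arrays;

// Illustrative sketch only; class and method names are hypothetical.
class LoveHateSketch {

    // Builds one attack instance over numItems items.
    static double[] buildProfile(int numItems, int target, int[] fillers,
                                 boolean push, double min, double max) {
        double[] profile = new double[numItems];
        Arrays.fill(profile, Double.NaN);        // items not used stay missing
        double targetValue = push ? max : min;   // push: maximum, nuke: minimum
        double fillerValue = push ? min : max;   // fillers get the opposite extreme
        for (int filler : fillers) {
            profile[filler] = fillerValue;
        }
        profile[target] = targetValue;
        return profile;
    }
}
```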
Name |
LoveHateAttack |
Qualified Name |
es::uco::kdis::datapro::algorithm::intruder::LoveHateAttack |
Visibility |
public |
Abstract |
false |
Base Classifier |
• IntruderAttack |
Realized Interface |
|
The Love/Hate Attack does not use the selected items.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Initialization method.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Parameterized Constructor:
• oDataset The original dataset
• iNumAttacks The number of attack instances
• bPush The attack type (true, push; false, nuke)
• iTarget The target item (The column attribute/item index)
• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size
• dXRand The probability of choosing an item as a selected/filler item
• iSeed The random seed
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• in bPush : boolean • in dXRand : double • in iNumAttacks : int • in iNumFillers : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset |
In the Love/Hate Attack, the values for filler items must be assigned in the opposite sense of the type of attack. If it is a push attack, all the filler items will be set to minimum value; if it is a nuke attack, all the filler items will be set to maximum value.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
The Love/Hate Attack does not use the selected items.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• IntruderAttack |
This class implements the Random Attack. This attack strategy sets the maximum value (push attack) or the minimum value (nuke attack) to the target item. The filler items are selected randomly and their values are also chosen with a Normal Distribution, using the global dataset mean and standard deviation.
For a further description read the article:
B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.
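The global statistics used by this attack can be sketched as follows. The example returns the mean and standard deviation as a two-element ArrayList, mirroring the result format described for the statistics strategy in this guide; the matrix representation, the NaN encoding of missing values, the population standard deviation and the clamping to the rating scale are assumptions.

```java
import java.util.ArrayList;
import java.util.Random;

// Illustrative sketch only; class and method names are hypothetical.
class RandomAttackSketch {

    // Returns [mean, standard deviation] over all non-missing values of the dataset.
    static ArrayList<Double> meanAndSd(double[][] ratings) {
        double sum = 0.0;
        int n = 0;
        for (double[] row : ratings) {
            for (double v : row) {
                if (!Double.isNaN(v)) { sum += v; n++; }
            }
        }
        double mean = n == 0 ? 0.0 : sum / n;
        double squares = 0.0;
        for (double[] row : ratings) {
            for (double v : row) {
                if (!Double.isNaN(v)) { squares += (v - mean) * (v - mean); }
            }
        }
        ArrayList<Double> result = new ArrayList<>();
        result.add(mean);
        result.add(n == 0 ? 0.0 : Math.sqrt(squares / n));
        return result;
    }

    // Draws one filler value from N(mean, sd) and keeps it within [min, max].
    static double fillerValue(ArrayList<Double> meanSd, double min, double max, Random rand) {
        double value = meanSd.get(0) + meanSd.get(1) * rand.nextGaussian();
        return Math.max(min, Math.min(max, value));
    }
}
```

Using a Random built from a fixed seed makes the generated filler values reproducible between runs.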
Name |
RandomAttack |
Qualified Name |
es::uco::kdis::datapro::algorithm::intruder::RandomAttack |
Visibility |
public |
Abstract |
false |
Base Classifier |
• IntruderAttack |
Realized Interface |
|
All attributes are private.
The Random Attack does not use the selected items.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Initialization method.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Parameterized Constructor:
• oDataset The original dataset
• iNumAttacks The number of attack instances
• bPush The attack type (true, push; false, nuke)
• iTarget The target item (The column attribute/item index)
• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size
• dXRand The probability of choosing an item as a selected/filler item
• iSeed The random seed
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• in bPush : boolean • in dXRand : double • in iNumAttacks : int • in iNumFillers : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset |
In the Random Attack, the values for filler items must be randomly generated by a Normal Distribution, using the mean and standard deviation of the dataset.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
The Random Attack does not use the selected items.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• IntruderAttack |
This class implements the Reverse Bandwagon Attack. This attack strategy assigns the minimum value (nuke attack) to the target item. Then a set of items, named selected items, is chosen among the least visible items. These items are those with a low mean and a high evaluation density.
For a better description read the article:
B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.
Name |
ReverseBandwagonAttack |
Qualified Name |
es::uco::kdis::datapro::algorithm::intruder::ReverseBandwagonAttack |
Visibility |
public |
Abstract |
false |
Base Classifier |
• BandwagonAttack |
Realized Interface |
|
Create the set of selected items. The size is prefixed by iNumSelected property.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Initialization method.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Parameterized Constructor:
• oDataset The original dataset
• iNumAttacks The number of attack instances
• iTarget The target item (The column attribute/item index)
• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size
• iNumSelected The size of the selected item set: -1 for a random size, >0 for a fixed size
• dXVisibility The visibility threshold
• dXDensity The density threshold
• dXRand The probability of choosing an item as a selected/filler item
• iSeed The random seed
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• in dXDensity : double • in dXRand : double • in dXVisibility : double • in iNumAttacks : int • in iNumFillers : int • in iNumSelected : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset |
Set the values of selected items. In the Reverse Bandwagon Attack, each selected item has the minimum value.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Select the columns that exceed the visibility and density threshold.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• BandwagonAttack |
This class implements the Segment Attack. This attack strategy assigns the maximum value (push attack) to the target item. Then a set of selected items (the segment) is set to the maximum value. Finally, a set of filler items is randomly chosen and set to the minimum value.
For a better description read the article:
B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. 7(4):1-23, 2007.
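A minimal sketch of one Segment Attack instance is shown below. It is not the library's code: columns are represented only by their names, Double.NaN stands for a missing value, and choosing each remaining item as a filler with probability dXRand is an assumption about how the random selection might work.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; class and method names are hypothetical.
class SegmentAttackSketch {

    static double[] buildProfile(String[] columnNames, List<String> rgsNamesOfSelected,
                                 int target, double dXRand,
                                 double min, double max, Random rand) {
        double[] profile = new double[columnNames.length];
        Arrays.fill(profile, Double.NaN);
        for (int i = 0; i < columnNames.length; i++) {
            if (i == target) {
                continue;                              // the target is handled below
            }
            if (rgsNamesOfSelected.contains(columnNames[i])) {
                profile[i] = max;                      // segment items: maximum value
            } else if (rand.nextDouble() < dXRand) {
                profile[i] = min;                      // filler items: minimum value
            }
        }
        profile[target] = max;                         // push attack on the target item
        return profile;
    }
}
```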
Name |
SegmentAttack |
Qualified Name |
es::uco::kdis::datapro::algorithm::intruder::SegmentAttack |
Visibility |
public |
Abstract |
false |
Base Classifier |
• IntruderAttack |
Realized Interface |
|
rgdMeanSD stores the mean and standard deviation of the overall dataset.
Type |
Double |
Default Value |
new ArrayList<Double>() |
Visibility |
protected |
Multiplicity |
0..* |
Create the segment, i.e. the set of selected items, with the information given in rgsNamesOfSelected. It returns the array of dataset columns that will act as selected items.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Initialization method.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Parameterized Constructor:
• oDataset The original dataset
• iNumAttacks The number of attack instances
• iTarget The target item (The column attribute/item index)
• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size
• rgsNamesOfSelected The array with the names of the columns that will act as
selected items (the segment)
• dXRand The probability of choosing an item as a selected/filler item
• iSeed The random seed
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• in dXRand : double • in iNumAttacks : int • in iNumFillers : int • in iSeed : int • in iTarget : int •
inout oDataset : Dataset •
inout rgsNamesOfSelected : ArrayList<String> |
Set the value for filler items. In the Segment Attack, the minimum value is assigned.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Set the values for the selected items. In the Segment Attack, the maximum value is assigned.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• IntruderAttack |
Figure 5. Package es.uco.kdis.datapro.algorithm.preprocessing
Name |
preprocessing |
Qualified Name |
es::uco::kdis::datapro::algorithm::preprocessing |
Figure 6. Package es.uco.kdis.datapro.algorithm.preprocessing.discretization
Name |
discretization |
Qualified Name |
es::uco::kdis::datapro::algorithm::preprocessing::discretization |
Equal-width discretization of a given numerical/integer column of the dataset. A RangeColumn is returned. Notice that the class EqualFrequencyDiscretization inherits from this class.
Figure 8. Class EqualWidthDiscretization
Name |
EqualWidthDiscretization |
Qualified Name |
es::uco::kdis::datapro::algorithm::preprocessing::discretization::EqualWidthDiscretization |
Visibility |
public |
Abstract |
false |
Base Classifier |
• DatasetStrategy |
Realized Interface |
|
iBins is the number of bins.
Type |
int |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
The column to be discretized.
Type |
NumericalColumn |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
The column returned as result.
Type |
RangeColumn |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
The name of the column to be discretized.
Type |
String |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
The name of the resulting column.
Type |
String |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
This (protected) method creates a new RangeColumn taking both the intervals given as parameter and the values comprised by the original numerical column.
• aoRanges Array of intervals
• sName Name of the new column
It returns the resulting RangeColumn.
Type |
RangeColumn |
Visibility |
protected |
Is Abstract |
false |
Parameter |
• inout aoRanges : DoubleRange • inout sName : String |
Parameterized Constructor:
• oDataset The dataset to be processed.
• iBins The number of bins.
• sColName The name of the column to be processed.
• sResName The name of the resulting column.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iBins : int • inout oDataset : Dataset • inout sColName : String • inout sResName : String |
This method runs the discretization process. It first calculates the cut-points and then sets the range intervals.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
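The cut-point calculation mentioned above can be sketched as follows. This is a minimal, self-contained illustration of equal-width binning, not the library's implementation: the names EqualWidthSketch, cutPoints and binOf are hypothetical, and the handling of boundary values (a value equal to a cut-point goes to the upper interval) is an assumption.

```java
import java.util.Arrays;

final class EqualWidthSketch {

    /** Returns the bins - 1 interior cut-points that split [min, max] into equal-width intervals. */
    static double[] cutPoints(double[] values, int bins) {
        if (bins < 1 || values.length == 0) {
            throw new IllegalArgumentException("need at least one bin and one value");
        }
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / bins;
        double[] cuts = new double[bins - 1];
        for (int i = 1; i < bins; i++) {
            cuts[i - 1] = min + i * width;
        }
        return cuts;
    }

    /** Index of the interval a value falls into; values at or above the last cut go to the last bin. */
    static int binOf(double value, double[] cuts) {
        int bin = 0;
        while (bin < cuts.length && value >= cuts[bin]) {
            bin++;
        }
        return bin;
    }

    public static void main(String[] args) {
        double[] cuts = cutPoints(new double[] {0.0, 4.0, 7.0, 10.0}, 4);
        System.out.println(Arrays.toString(cuts));
    }
}
```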
The discretized RangeColumn is returned.
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
The initialization method. Types of the column and its values are checked.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Not required.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• EqualFrequencyDiscretization |
Name |
|
Related Element |
• DatasetStrategy |
Equal-frequency discretization of a given numerical/integer column of the dataset. A RangeColumn is returned.
Figure 7. Class EqualFrequencyDiscretization
Name |
EqualFrequencyDiscretization |
Qualified Name |
es::uco::kdis::datapro::algorithm::preprocessing::discretization::EqualFrequencyDiscretization |
Visibility |
public |
Abstract |
false |
Base Classifier |
• DatasetStrategy • EqualWidthDiscretization |
Realized Interface |
|
All attributes are private.
Notice that this class inherits from EqualWidthDiscretization.
Parameterized constructor.
Parameters:
• iBins Number of bins to be created
• oDataset Source dataset containing the column to be discretized
• sColName Name of the source column
• sResName Name of the resulting Range column
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iBins : int • inout oDataset : Dataset • inout sColName : String • inout sResName : String |
This method performs the equal-frequency discretization of the column specified in the constructor.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
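For comparison with the equal-width case, equal-frequency cut-points can be sketched as below. This is only an illustration under stated assumptions: the names EqualFrequencySketch and cutPoints are hypothetical, each cut is placed halfway between two neighbouring sorted values, and tied values are not treated specially. How the library itself places the cut-points is not described in this guide.

```java
import java.util.Arrays;

final class EqualFrequencySketch {

    /** Returns bins - 1 cut-points so that each interval holds roughly the same number of values. */
    static double[] cutPoints(double[] values, int bins) {
        if (bins < 1 || values.length < bins) {
            throw new IllegalArgumentException("need at least one value per bin");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] cuts = new double[bins - 1];
        for (int i = 1; i < bins; i++) {
            // Index of the first sorted value that belongs to bin i.
            int idx = (int) ((long) i * sorted.length / bins);
            // Place the cut halfway between the two neighbouring values (assumption).
            cuts[i - 1] = (sorted[idx - 1] + sorted[idx]) / 2.0;
        }
        return cuts;
    }

    public static void main(String[] args) {
        double[] cuts = cutPoints(new double[] {200, 1, 100, 2, 4, 3}, 3);
        System.out.println(Arrays.toString(cuts));
    }
}
```

Unlike equal-width binning, the outliers 100 and 200 in the usage example do not stretch the intervals: each of the three bins still receives two values.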
Name |
|
Related Element |
• DatasetStrategy |
Name |
|
Related Element |
• EqualWidthDiscretization |
Figure 9. Class MDLPDiscretize
Name |
MDLPDiscretize |
Qualified Name |
es::uco::kdis::datapro::algorithm::preprocessing::discretization::MDLPDiscretize |
Visibility |
public |
Abstract |
false |
Base Classifier |
• DatasetStrategy |
Realized Interface |
|
All attributes are private.
This method runs the discretization process following the MDLP algorithm.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
It returns the discretized dataset.
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
The initialize() strategy method. It takes the whole dataset and distributes each column into a LinkedList of double arrays, where the first value of each array is the actual value in the column and the second value is the associated label.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Constructor with parameters:
• oDataset source dataset
Note: class labels are supposed to be in the last column of the dataset.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oDataset : Dataset |
The postexec() strategy method
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• DatasetStrategy |
Figure 10. Package es.uco.kdis.datapro.algorithm.preprocessing.instance
Name |
instance |
Qualified Name |
es::uco::kdis::datapro::algorithm::preprocessing::instance |
This class modifies the content of a Dataset by removing duplicate instances from this dataset.
Figure 11. Class RemoveDuplicates
Name |
RemoveDuplicates |
Qualified Name |
es::uco::kdis::datapro::algorithm::preprocessing::instance::RemoveDuplicates |
Visibility |
public |
Abstract |
false |
Base Classifier |
• DatasetStrategy |
Realized Interface |
|
All attributes are private.
Execution method.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
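The effect of this strategy can be pictured with the following self-contained sketch, which works on plain lists instead of a Dataset. The names RemoveDuplicatesSketch and removeDuplicates are hypothetical, and keeping the first occurrence of each repeated instance is an assumption, since this guide does not state which copy survives.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class RemoveDuplicatesSketch {

    /** Returns the rows with exact duplicates dropped, keeping the first occurrence of each. */
    static List<List<Object>> removeDuplicates(List<List<Object>> rows) {
        // List.equals and List.hashCode compare element by element, so a set of
        // rows detects repeated instances; LinkedHashSet keeps the original order.
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        List<List<Object>> rows = new ArrayList<>();
        rows.add(List.<Object>of(1, "a"));
        rows.add(List.<Object>of(2, "b"));
        rows.add(List.<Object>of(1, "a"));
        System.out.println(removeDuplicates(rows));
    }
}
```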
It returns the clean dataset.
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Initialize the algorithm to prepare the execution.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Post-processing.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Parameterized Constructor:
• oDataset The source dataset to work with.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oDataset : Dataset |
Name |
|
Related Element |
• DatasetStrategy |
This class modifies the content of a dataset by removing a percentage of its instances.
Figure 12. Class RemovePercentage
Name |
RemovePercentage |
Qualified Name |
es::uco::kdis::datapro::algorithm::preprocessing::instance::RemovePercentage |
Visibility |
public |
Abstract |
false |
Base Classifier |
• DatasetStrategy |
Realized Interface |
|
RANDOM mode, when instances to be removed are randomly selected.
Type |
int |
Default Value |
0 |
Visibility |
public |
Multiplicity |
|
FROMINIT mode, when instances to be removed are taken from the beginning of the column.
Type |
int |
Default Value |
1 |
Visibility |
public |
Multiplicity |
|
FROMEND mode, when instances to be removed are taken from the end of the column.
Type |
int |
Default Value |
2 |
Visibility |
public |
Multiplicity |
|
oRnd is the random generator object.
Type |
Random |
Default Value |
new Random() |
Visibility |
public |
Multiplicity |
|
Execute method.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Return the resulting dataset from the strategy process.
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Initialize the algorithm to prepare the execution.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Post-processing method.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Parameterized Constructor:
• oDataset The source dataset
• iMode The mode of removal
• dPercentage The percentage of instances (in [0,1]) to remove from the dataset
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• in dPercentage : double • in iMode : int • inout oDataset : Dataset |
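The three removal modes can be illustrated with the self-contained sketch below. Only the mode constants and their values (RANDOM = 0, FROMINIT = 1, FROMEND = 2) come from this guide; the class and method names, the rounding of the number of instances to remove, and the preservation of the original order in RANDOM mode are assumptions of the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class RemovePercentageSketch {
    static final int RANDOM = 0;
    static final int FROMINIT = 1;
    static final int FROMEND = 2;

    /** Returns a new list without the given fraction (in [0,1]) of the rows. */
    static <T> List<T> remove(List<T> rows, int mode, double percentage, Random rnd) {
        if (percentage < 0.0 || percentage > 1.0) {
            throw new IllegalArgumentException("percentage must be in [0,1]");
        }
        int size = rows.size();
        int toRemove = (int) Math.round(size * percentage);   // rounding is an assumption
        switch (mode) {
            case FROMINIT:
                return new ArrayList<>(rows.subList(toRemove, size));
            case FROMEND:
                return new ArrayList<>(rows.subList(0, size - toRemove));
            case RANDOM: {
                List<Integer> indexes = new ArrayList<>();
                for (int i = 0; i < size; i++) {
                    indexes.add(i);
                }
                Collections.shuffle(indexes, rnd);
                // Drop the first toRemove shuffled indexes and keep the rest in order.
                List<Integer> kept = new ArrayList<>(indexes.subList(toRemove, size));
                Collections.sort(kept);
                List<T> result = new ArrayList<>();
                for (int index : kept) {
                    result.add(rows.get(index));
                }
                return result;
            }
            default:
                throw new IllegalArgumentException("unknown mode " + mode);
        }
    }
}
```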
Name |
|
Related Element |
• DatasetStrategy |
Figure 13. Package es.uco.kdis.datapro.algorithm.validation
Name |
validation |
Qualified Name |
es::uco::kdis::datapro::algorithm::validation |
This class implements the strategy that calculates the different partitions of the dataset using the KFolds algorithm.
Figure 14. Class es.uco.kdis.datapro.algorithm.validation.KFolds
Name |
KFolds |
Qualified Name |
es::uco::kdis::datapro::algorithm::validation::KFolds |
Visibility |
public |
Abstract |
false |
Base Classifier |
• DatasetStrategy |
Realized Interface |
|
All attributes are private.
It runs the KFolds algorithm. After the execution, the algorithm is not executable anymore.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method returns the list containing the resulting dataset partitions.
Type |
List<Object> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method initializes the algorithm. The instances are sorted as a HashMap by categories.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Parameterized constructor. Notice that the class column is supposed to be the last column in the dataset:
• oDataset Source dataset
• iNumberOfPartitions Number of partitions to be built
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iNumberOfPartitions : int • inout oDataset : Dataset |
Parameterized constructor. Notice that the class column is supposed to be the last column in the dataset:
• oDataset Source dataset
• iNumberOfPartitions Number of partitions to be built
• iSeed The random seed. To reproduce a previous partition, the programmer can supply the seed used for it. Otherwise, the seed is randomly selected.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iNumberOfPartitions : int • inout oDataset : Dataset |
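Since the initialization groups the instances by category and the constructor accepts a seed, a plausible reading is that the partitions are stratified by class. The sketch below illustrates that idea with plain collections; it is an assumption about the behaviour, not the library's code, and the names StratifiedFoldsSketch and folds are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class StratifiedFoldsSketch {

    /** Returns k lists of row indexes; labels.get(i) is the class label of row i. */
    static List<List<Integer>> folds(List<String> labels, int k, long seed) {
        // Group the row indexes by class label.
        Map<String, List<Integer>> byClass = new LinkedHashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            byClass.computeIfAbsent(labels.get(i), c -> new ArrayList<>()).add(i);
        }
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        // Shuffle each class and deal its rows to the folds in round-robin order,
        // so that every fold receives a similar share of each class.
        Random rnd = new Random(seed);
        int next = 0;
        for (List<Integer> rows : byClass.values()) {
            Collections.shuffle(rows, rnd);
            for (int row : rows) {
                folds.get(next % k).add(row);
                next++;
            }
        }
        return folds;
    }
}
```

Calling folds twice with the same labels, k and seed yields the same partitions, which is the reproducibility that the iSeed parameter is meant to provide.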
Not required.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• DatasetStrategy |
Figure 15. Package es.uco.kdis.datapro.dataset
Name |
dataset |
Qualified Name |
es::uco::kdis::datapro::dataset |
Dataset is the abstract base class for all the different types of dataset sources. This class fills the gap between the physical dataset (stored in a file, database, etc.) and its logical handling, where the access to attributes/columns and processing methods is provided.
Figure 16. Class Dataset
Name |
Dataset |
Qualified Name |
es::uco::kdis::datapro::dataset::Dataset |
Visibility |
public |
Abstract |
true |
Base Classifier |
|
Realized Interface |
|
iCursor refers to the row currently pointed to in the dataset by the InstanceIterator.
Type |
int |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
rgoColumns is the list of columns that comprise the dataset.
Type |
ColumnAbstraction |
Default Value |
|
Visibility |
protected |
Multiplicity |
0..* |
For binary columns, it contains the list of values that will be interpreted as False when reading from the physical dataset. Writing will be performed using the first element in the list.
Type |
String |
Default Value |
|
Visibility |
protected |
Multiplicity |
0..* |
For binary columns, it contains the list of values that will be interpreted as True when reading from the physical dataset. Writing will be performed using the first element in the list.
Type |
String |
Default Value |
|
Visibility |
protected |
Multiplicity |
0..* |
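The convention described for the two lists (any listed spelling is accepted when reading, the first element is used when writing) can be sketched as follows. The class BinaryValuesSketch and its methods are hypothetical, and exact, case-sensitive matching is an assumption of the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BinaryValuesSketch {
    private final List<String> trueValues;
    private final List<String> falseValues;

    BinaryValuesSketch(List<String> trueValues, List<String> falseValues) {
        this.trueValues = new ArrayList<>(trueValues);
        this.falseValues = new ArrayList<>(falseValues);
    }

    /** Reading: any spelling found in one of the lists is accepted. */
    boolean parse(String raw) {
        String token = raw.trim();
        if (trueValues.contains(token)) {
            return true;
        }
        if (falseValues.contains(token)) {
            return false;
        }
        throw new IllegalArgumentException("not a valid binary value: " + raw);
    }

    /** Writing: the first element of the corresponding list is used. */
    String format(boolean value) {
        return value ? trueValues.get(0) : falseValues.get(0);
    }

    public static void main(String[] args) {
        BinaryValuesSketch binary = new BinaryValuesSketch(
                Arrays.asList("1", "true", "yes"), Arrays.asList("0", "false", "no"));
        System.out.println(binary.format(binary.parse("yes")));
    }
}
```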
For range columns, sOpenRangeDelimiter stores the symbol(s) that open the numerical range, right before the minimum value: e.g., '[' for [2,3]. This is used during the reading and writing of the physical dataset.
Type |
String |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
For range columns, sSeparationRangeDelimiter stores the symbol(s) that separate the minimum and maximum values in a numerical range: e.g., ',' for [2,3]. This value is only used during the reading and writing of the physical dataset.
Type |
String |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
For range columns, sCloseRangeDelimiter stores the symbol(s) that serves to close the numerical range, right after the maximum value: e.g., ']' for [2,3]. This is only used during the reading and writing of the physical dataset.
Type |
String |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
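To make the role of the three range delimiters concrete, the following sketch parses and writes a range such as [2,3]. It is only an illustration: the class RangeFormatSketch and its methods are hypothetical, and the sketch does not handle separators that could also appear inside a number (for example "-" with negative bounds).

```java
final class RangeFormatSketch {
    private final String open;
    private final String separator;
    private final String close;

    RangeFormatSketch(String open, String separator, String close) {
        this.open = open;
        this.separator = separator;
        this.close = close;
    }

    /** Parses a textual range and returns {minimum, maximum}. */
    double[] parse(String text) {
        String t = text.trim();
        if (t.length() < open.length() + close.length()
                || !t.startsWith(open) || !t.endsWith(close)) {
            throw new IllegalArgumentException("missing range delimiters: " + text);
        }
        String body = t.substring(open.length(), t.length() - close.length());
        int sep = body.indexOf(separator);
        if (sep < 0) {
            throw new IllegalArgumentException("missing separator: " + text);
        }
        double min = Double.parseDouble(body.substring(0, sep).trim());
        double max = Double.parseDouble(body.substring(sep + separator.length()).trim());
        return new double[] {min, max};
    }

    /** Writes a range using the configured delimiters. */
    String format(double min, double max) {
        return open + min + separator + max + close;
    }

    public static void main(String[] args) {
        RangeFormatSketch ranges = new RangeFormatSketch("[", ",", "]");
        double[] bounds = ranges.parse("[2,3]");
        System.out.println(ranges.format(bounds[0], bounds[1]));
    }
}
```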
sEmptyValue stores the string that will represent an empty value in the dataset file.
Type |
String |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
sMissedValue stores the string that will represent a missing value in the dataset file.
Type |
String |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
sNullValue stores the string that will represent a null value in the dataset file.
Type |
String |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
The name of the dataset.
Type |
String |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
A set of column values is inserted into the dataset structure. Notice that instance duplication is not checked.
Parameters:
• sColumnFormat String that specifies the types of the columns to be added. Types depend on the specific dataset.
Exceptions:
• IOException
• IllegalFormatSpecificationException
• NotAddedValueException
• IndexOutOfBoundsException
Type |
void |
Visibility |
protected |
Is Abstract |
true |
Parameter |
• inout sColumnFormat : String |
Insert the column abstraction given as parameter in the last position of the list of columns of the dataset.
Parameter:
• oColumn: Column abstraction to be added
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oColumn : ColumnAbstraction |
Insert a column abstraction in a given position of the list of dataset columns.
Parameters:
• oColumn: Column abstraction to be inserted
• iIndex: Position index where the column element is added in the list. The remaining columns are shifted one position to the right.
Exceptions:
• UnsupportedOperationException
• ClassCastException
• NullPointerException
• IllegalArgumentException
• IndexOutOfBoundsException
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout iIndex : int • inout oColumn : ColumnAbstraction |
Create a new dataset with exactly the same metadata and column structure. However, only the structure is copied, since instances from the original dataset are not added to the new one.
It returns the empty cloned dataset.
Type |
Dataset |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Abstract method that serves to close the physical dataset source.
Exceptions:
• IOException
Type |
void |
Visibility |
protected |
Is Abstract |
true |
Parameter |
|
This method creates a new dataset with exactly the same metadata, column structure and data as the original dataset. In this case, instances from the original dataset are also copied to the new one.
A copy of the dataset is returned.
Type |
Dataset |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This is the default constructor of this class. By default, it sets the following parameters to their default values:
• sMissedValue: "?"
• sNullValue: "?"
• sEmptyValue: "?"
• sOpenRangeDelimiter: "["
• sSeparationRangeDelimiter: ","
• sCloseRangeDelimiter: "]"
Notice that using these symbols is not mandatory for reading/writing, as their applicability depends on the specific implementation of each source dataset.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method looks for a column abstraction by its index in the column list. Notice that indexes can change when one column is added or removed to/from intermediate positions.
Parameter:
• iIndex: Index of the queried column.
It returns a reference to the column abstraction queried.
Type |
ColumnAbstraction |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int |
This method returns the first column found with the name given as parameter.
Parameter:
• sName: The name of the queried column (not case-sensitive)
It returns the column abstraction that provides access to the column with the given name.
Type |
ColumnAbstraction |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
Getter method for the private property rgoColumns, which comprises the array of column abstractions in the dataset.
Type |
List<ColumnAbstraction> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Getter method for the private property sEmptyValue, which comprises the String that represents the symbol for the empty value in the dataset source. Notice that each specific dataset implementation can use, or not, this property accordingly.
Type |
String |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Given a column abstraction, it searches for the index that this column occupies in the array of column abstractions in the dataset.
Parameter:
• oCol: Column to be located.
It returns the index of the column abstraction passed as parameter, or -1 if the column is not found.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oCol : ColumnAbstraction |
Getter method for the private property sMissingValue, which comprises the String that represents the symbol for the missing value in the dataset source. Notice that each specific dataset implementation can use, or not, this property accordingly.
Type |
String |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Getter method for the private property sName, which represents the name given to the dataset.
Type |
String |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Getter method for the private property sNullValue, which comprises the String that represents the symbol for the null value in the dataset source. Notice that each specific dataset implementation can use or not this property accordingly.
Type |
String |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Getter method for the private property iNumberOfDecimals, which indicates the number of decimal digits used when writing numerical columns in dataset sources. Notice that this value can be used accordingly by each specific dataset source.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method gets a list of the three values used to demarcate a range, comprising the sOpenRangeDelimiter, sSeparationRangeDelimiter and sCloseRangeDelimiter. Notice that each specific dataset source could make use of these values accordingly.
Type |
ArrayList<String> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Getter method for the private property rgoValidBinaryFalseValues: the list of strings that are interpreted as false when reading and writing Boolean values in a dataset source. Notice that its specific use depends on the implementation made for each dataset source.
Type |
ArrayList<String> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Getter method for the private property rgoValidBinaryTrueValues: the list of strings that are interpreted as true when reading and writing Boolean values in a dataset source. Notice that its specific use depends on the implementation made for each dataset source.
Type |
ArrayList<String> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method merges two datasets by adding the dataset passed as parameter to the current one.
Parameters:
• oDSInjected: The dataset to be added. Notice that this dataset must contain the same number and type of columns as the current dataset (this).
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oDSInjected : Dataset |
This method merges two datasets by adding the dataset passed as parameter to the dataset object this.
Parameters:
• oDataset: The dataset to be added.
• sColumnFormat: Sometimes the target dataset contains more columns than the source dataset. In those cases, the columns to be added can be explicitly specified. This parameter is a String that indicates the columns to be added. Each character in the String corresponds to a column in the target dataset. The String may comprise the following characters:
o x: Include this column.
o %: Skip this column.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oDataset : Dataset • inout sColumnFormat : String |
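The way a column format string of this kind selects columns can be sketched with plain lists. The example below applies the mask to a single row of values; the names ColumnMaskSketch and select are hypothetical, and requiring the mask to have one character per column is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ColumnMaskSketch {

    /** Keeps the values whose position is marked 'x' and drops those marked '%'. */
    static List<Object> select(List<Object> row, String columnFormat) {
        if (columnFormat.length() != row.size()) {
            throw new IllegalArgumentException("one format character per column is expected");
        }
        List<Object> kept = new ArrayList<>();
        for (int i = 0; i < columnFormat.length(); i++) {
            char token = columnFormat.charAt(i);
            if (token == 'x') {
                kept.add(row.get(i));
            } else if (token != '%') {
                throw new IllegalArgumentException("unexpected token '" + token + "'");
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.<Object>asList("a", 1, true, 2.5);
        System.out.println(select(row, "x%x%"));
    }
}
```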
Abstract protected method. This method just opens the source dataset and initializes the row cursor to the first row of data. However, each specific dataset class is responsible for its implementation, and thus defining its real scope, according to its specific properties.
Notice that each type of dataset will provide specific methods to process the full dataset. For example, file datasets provide the method readDataset.
Exceptions:
• FileNotFoundException
• IOException
• IllegalFormatSpecificationException
Type |
void |
Visibility |
protected |
Is Abstract |
true |
Parameter |
|
This method removes a column from the dataset. Notice that column indexes can be modified (decreased) for the rest of columns. The column removed is returned.
Parameter:
• iIndex: Position index where the column to be removed is located.
Exceptions:
• UnsupportedOperationException
• IndexOutOfBoundsException
Type |
ColumnAbstraction |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int |
Setter method for the property rgoColumns. Even when it is a public method, notice that it should be used very carefully, mainly for those cases when the replacement of the entire set of columns is mandatory. To add or remove a single column, or just a set of them, use instead the methods addColumn and removeColumn.
Parameter:
• rgoCols: The entire list of columns in the dataset.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout rgoCols : List<ColumnAbstraction> |
Setter method for the private property sEmptyValue, which comprises the String that represents the symbol for the empty value in the dataset source. Notice that each specific dataset implementation can make its own use of this property accordingly.
Parameters:
• sEmptyValue The symbol/string representing an empty value in the dataset
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sEmptyValue : String |
Setter method for the private property sMissingValue, which comprises the String that represents the symbol for the missing value in the dataset source. Notice that each specific dataset implementation can make its own use of this property accordingly.
Parameters:
• sMissingValue The symbol/string representing a missing value in the dataset
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sMissingValue : String |
Setter method for the private property sName, which represents the name of the dataset.
Parameter:
• sName: The name of the dataset.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
Setter method for the private property sNullValue, which comprises the String that represents the symbol for the null value in the dataset source. Notice that each specific dataset implementation can make its own use of this property accordingly.
Parameters:
• sNullValue The symbol/string representing a null value in the dataset
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sNull : String |
Setter method for the private property iNumberOfDecimals, which represents the number of decimals that the programmer wants to set for numerical values. Notice that the specific applicability of this attribute directly depends on the specific implementation of the dataset source.
Parameter:
• iNum: The number of decimal digits that will be considered when saving numerical values.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iNum : int |
This method sets the symbols that will serve as range delimiter. Notice that the specific applicability of these attributes directly depends on the specific implementation of the dataset source.
Parameters:
• sInitial: The symbol/string that represents the starting delimiter.
• sSeparator: The symbol/string that represents the value separator.
• sEnding: The symbol/string that represents the ending delimiter.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sEnding : String • inout sInitial : String • inout sSeparator : String |
Setter method of the list rgoValidBinaryFalseValues, which contains the set of strings that represent a False boolean value in the dataset. Notice that only the first value of the list should be used for serialization (reading or writing). In any case, it depends on the specific implementation of each dataset source.
Parameter:
• rgoValidBinaryFalseValues: The list of values that will be interpreted as False.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout rgoValidBinaryFalseValues : ArrayList<String> |
Setter method of the list rgoValidBinaryTrueValues, which contains the set of strings that represent a True boolean value in the dataset. Notice that only the first value of the list should be used for serialization (reading or writing). In any case, it depends on the specific implementation of each dataset source.
Parameter:
• rgoValidBinaryTrueValues: The list of values that will be interpreted as True.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout rgoValidBinaryTrueValues : ArrayList<String> |
This method sets both the list of strings that will represent a True boolean value and the list of strings that will represent a False boolean value in the dataset. The same result can be obtained by invoking the two specific setter methods separately.
Parameters:
• rgoFalseList: A list with the valid False symbols/strings
• rgoTrueList: A list with the valid True symbols/strings
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout rgoFalseList : ArrayList<String> • inout rgoTrueList : ArrayList<String> |
This method swaps two columns in the list of columns of the dataset. It searches for both columns and swaps their positions, and thus both structure and data.
Parameters:
• oColumn1: The first column to swap.
• oColumn2: The second column to swap.
Exceptions:
• ColumnAbstraction
• UnsupportedOperationException
• ClassCastException
• NullPointerException
• IllegalArgumentException
• IndexOutOfBoundsException
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oColumn1 : ColumnAbstraction • inout oColumn2 : ColumnAbstraction |
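The "search for both columns, then exchange their positions" behaviour described above can be sketched with the standard collections API. The class SwapColumnsSketch is hypothetical and plain strings stand in for column abstractions; rejecting a column that is not in the list with an IllegalArgumentException is an assumption of the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SwapColumnsSketch {

    /** Locates both elements in the list and exchanges their positions. */
    static <T> void swap(List<T> columns, T first, T second) {
        int i = columns.indexOf(first);
        int j = columns.indexOf(second);
        if (i < 0 || j < 0) {
            throw new IllegalArgumentException("both columns must belong to the list");
        }
        Collections.swap(columns, i, j);
    }

    public static void main(String[] args) {
        List<String> columns = new ArrayList<>(Arrays.asList("id", "age", "name"));
        swap(columns, "id", "name");
        System.out.println(columns);
    }
}
```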
Name |
rgoColumns |
Related Element |
• ColumnAbstraction |
Name |
|
Related Element |
• InstanceIterator |
Name |
|
Related Element |
• FileDataset |
This abstract class represents a dataset whose source is a file. It includes the specific methods required to handle datasets stored as files.
Figure 17. Class FileDataset
Name |
FileDataset |
Qualified Name |
es::uco::kdis::datapro::dataset::FileDataset |
Visibility |
public |
Abstract |
true |
Base Classifier |
• Dataset |
Realized Interface |
|
oBufferedReader is the buffer used to read the file.
Type |
BufferedReader |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
sCommentedValue stores the string that will indicate the beginning of a comment line in the dataset file, if this line has to be omitted from the processing.
Type |
String |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
sFileName is the name of the file source that contains the dataset.
Type |
String |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
sSeparationSymbol stores the symbol/string that indicates the separator between values of the same instance-row (e.g., a comma, a line of the dataset file, etc.).
Type |
String |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
This method creates a new dataset with exactly the same type and column structure as the original. Instances from the original dataset are not copied. It returns a new Dataset instance.
Type |
Dataset |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method clones the dataset and fills the clone with the instances of the original, creating a new dataset with exactly the same type, column structure and data. It returns the copied Dataset instance.
Type |
Dataset |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Default constructor. Notice that the following symbols are used by default:
• sCommentValue: "%"
• sSeparationSymbol: ","
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This constructor receives the name of the file as parameter. The following symbols are used as default:
• sCommentValue: "%"
• sSeparationSymbol: ","
Parameter:
• sFileName: The filename of the dataset source.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sFileName : String |
Getter method of the property sCommentValue.
Type |
String |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Getter method of the filename of the dataset source.
Type |
String |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Getter method of the property sSeparationSymbol.
Type |
String |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Implementations of this abstract method will read the dataset from the file specified by the constructor.
Parameters:
• sContentFormat: String that specifies the reading format of the dataset file. Construct the string using a sequence of control tokens:
o % to omit a line (only one line).
o %name to read the name of columns (only one line).
o %col to read data (zero, one or more lines).
Example: the string “%%%col%%name” indicates that the first two lines must be omitted, then data is read and, finally, the last line will contain the column names.
• sColumnFormat: A String that contains an ordered sequence of tokens that determine the data type of each column to be read. Use the following tokens:
o s: Nominal column
o f: Real column
o c: Categorical column
o b: Binary column
o i: Integer column
o %: Skip this column (the skipped column is not processed)
Additionally, notice that other tokens can be considered depending on the specific dataset source (e.g., d for columns of type date).
Exceptions:
• FileNotFoundException
• IOException
• IllegalFormatSpecificationException
• NotAddedValueException
• IndexOutOfBoundsException
Type |
void |
Visibility |
public |
Is Abstract |
true |
Parameter |
• inout sColumnFormat : String • inout sContentFormat : String |
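As an illustration of how a sColumnFormat string maps to column types, the sketch below translates each token listed above into a descriptive label. It only covers the six tokens documented here; the names ColumnFormatSketch and describe are hypothetical, and rejecting any other token is an assumption, since specific dataset sources may accept additional ones.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnFormatSketch {

    /** Translates each token of the column format into the kind of column it denotes. */
    static List<String> describe(String columnFormat) {
        List<String> kinds = new ArrayList<>();
        for (char token : columnFormat.toCharArray()) {
            switch (token) {
                case 's': kinds.add("nominal"); break;
                case 'f': kinds.add("real"); break;
                case 'c': kinds.add("categorical"); break;
                case 'b': kinds.add("binary"); break;
                case 'i': kinds.add("integer"); break;
                case '%': kinds.add("skipped"); break;
                default:
                    throw new IllegalArgumentException("unsupported token '" + token + "'");
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        // A file with five columns where the third one is ignored.
        System.out.println(describe("sf%bi"));
    }
}
```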
Setter method of the property sCommentValue.
Parameter:
• sComment: The token/string indicating the symbol that represents a comment line in the dataset file.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
• inout sComment : String |
Setter method of the property sFileName.
Parameter:
• sFileName: The filename of the dataset source.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sFileName : String |
Setter method of the property sSeparationSymbol.
Parameter:
• sSeparationSymbol: The token used to separate the values in the same line of the dataset source.
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
• inout sSeparator : String |
This abstract method defines the signature of the write method for every file dataset. Implementations of this method deal with the serialization (writing) of the current column structure into each specific file format.
Parameter:
• sOutputFile: The path where the dataset file will be saved.
Exception:
• IOException
Type |
void |
Visibility |
public |
Is Abstract |
true |
Parameter |
• inout sOutputFile : String |
Name |
|
Related Element |
• CsvDataset |
Name |
|
Related Element |
• ExcelDataset |
Name |
|
Related Element |
• ArffDataset |
Name |
|
Related Element |
• Dataset |
InstanceIterator is the class that implements the interface IIterator to traverse the instances of the dataset. This class therefore represents an iterator that accesses each row/instance in a dataset. It provides methods to traverse the whole set of instances and keeps a reference to the dataset being iterated.
Figure 18. Class InstanceIterator
Name |
InstanceIterator |
Qualified Name |
es::uco::kdis::datapro::dataset::InstanceIterator |
Visibility |
public |
Abstract |
false |
Base Classifier |
|
Realized Interface |
• IIterator |
All attributes are private.
This method returns the list of objects that form the currently pointed instance in the dataset.
Type |
List<Object> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method returns the list of objects that form the first instance in the dataset and sets the pointer to the first instance.
Type |
List<Object> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Default iterator constructor.
Parameter:
• oDataset: The dataset to be traversed by the iterator.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oDataset : Dataset |
This method returns true if the dataset has no more instances to iterate, and false otherwise.
Type |
boolean |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method increases the instance pointer by one, i.e. sets the pointer to the next instance in the dataset.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• IIterator |
IIterator is the interface that any instance iterator has to implement, as InstanceIterator does.
Figure 19. Interface IIterator
Name |
IIterator |
Qualified Name |
es::uco::kdis::datapro::dataset::IIterator |
Visibility |
public |
Base Classifier |
|
The implementation of this method has to return the instance currently pointed to in the dataset as a List of Object instances.
Type |
List<Object> |
Visibility |
public |
Is Abstract |
true |
Parameter |
|
An implementation of this method returns the first instance of the dataset. From then on, the current instance pointed to by the iterator should be this first one.
Type |
List<Object> |
Visibility |
public |
Is Abstract |
true |
Parameter |
|
This method should be implemented to return true if the iterator points to the last instance of the dataset, and false otherwise.
Type |
boolean |
Visibility |
public |
Is Abstract |
true |
Parameter |
|
next
The implementation of this method advances the iterator to the next instance in the dataset.
Type |
void |
Visibility |
public |
Is Abstract |
true |
Parameter |
|
Name |
|
Related Element |
• InstanceIterator |
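The iterator contract described above can be sketched as follows. This is an illustration only: of the four operations, only next is named in this guide, so the names first, current and isDone, the interface RowIterator and the class ListRowIterator are assumptions, and the rows are held in a plain list rather than in a real Dataset. The sketch follows the InstanceIterator description, in which the end test is true once no instances remain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IteratorSketch {

    /** Hypothetical stand-in for IIterator. */
    interface RowIterator {
        List<Object> first();    // moves to the first instance and returns it
        List<Object> current();  // returns the instance currently pointed to
        boolean isDone();        // true when no instances remain
        void next();             // advances the pointer by one
    }

    /** Iterates over rows held in a plain list instead of a real Dataset. */
    static class ListRowIterator implements RowIterator {
        private final List<List<Object>> rows;
        private int pos = 0;

        ListRowIterator(List<List<Object>> rows) { this.rows = rows; }

        public List<Object> first() { pos = 0; return current(); }
        public List<Object> current() { return isDone() ? null : rows.get(pos); }
        public boolean isDone() { return pos >= rows.size(); }
        public void next() { pos++; }
    }

    /** Visits every row once and returns how many rows were seen. */
    static int countRows(RowIterator it) {
        int n = 0;
        for (it.first(); !it.isDone(); it.next()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = new ArrayList<List<Object>>();
        rows.add(Arrays.<Object>asList("a", 1));
        rows.add(Arrays.<Object>asList("b", 2));
        System.out.println(countRows(new ListRowIterator(rows)));
    }
}
```

Keeping the traversal logic behind an interface means that client code such as countRows does not depend on how the rows are stored.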
This package contains the classes related to the different types of columns supported by the library. At the moment, datapro4j provides an implementation for the following types:
• Binary column, for positive or negative values.
• Categorical column, for predefined string values, treated as an enumeration of categories.
• Date column.
• Integer column, for numerical integer values.
• Nominal column, for free-valued strings.
• Numerical column, for numerical real values.
• Range column, for those values that represent a numerical interval (minimum, maximum), where both open and closed ranges can be considered.
Columns are coded following the bridge design pattern, in which an abstraction is decoupled from its implementation. In this way, the programmer can add new implementations of the provided columns to the library, e.g. for performance reasons, without altering the way in which the rest of the library (including algorithms) interacts with these columns.
Therefore, every column type is implemented by at least two different classes: its abstraction, where the accessor methods to its functionalities exist, and its implementation, where these functionalities are coded, and invoked from the abstraction.
Using columns properly demands considering the following rules:
• Any code from the library (i.e. from other columns, datasets or strategies) should always invoke methods of the abstraction. Never invoke the column implementation directly; only its own abstraction should do so.
• Altering current abstractions may cause unexpected failures. Use generalization or provide conversion methods to build your own abstractions instead.
• Abstractions and implementations must be subclasses of ColumnAbstraction and ColumnImpl, respectively.
• datapro4j only supports one implementation class per abstraction. If the programmer wants more than one implementation, then more than one abstraction should be provided, or a factory pattern should be coded.
• If new abstractions (i.e. types of columns) are provided, modify the enumeration ColumnType accordingly.
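The division of work between abstraction and implementation can be illustrated with a minimal bridge sketch. The classes below (Abstraction, Impl, IntColumn, IntImpl) are hypothetical stand-ins and do not reproduce the real ColumnAbstraction and ColumnImpl signatures; they only show the delegation and the rule that the concrete abstraction creates its own implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class BridgeSketch {

    /** Implementation side: owns the data and the code that touches it. */
    static abstract class Impl {
        abstract int addValue(Object oValue);
        abstract int getSize();
    }

    /** Abstraction side: holds metainformation and delegates data access. */
    static abstract class Abstraction {
        protected final String sName;
        protected Impl oImpl;   // assigned by the concrete subclass

        Abstraction(String sName) { this.sName = sName; }

        String getName() { return sName; }   // metainformation, no data access
        int addValue(Object oValue) { return oImpl.addValue(oValue); }
        int getSize() { return oImpl.getSize(); }
    }

    /** One possible implementation, backed by an ArrayList of Integer. */
    static class IntImpl extends Impl {
        private final List<Integer> values = new ArrayList<Integer>();

        int addValue(Object oValue) {
            if (!(oValue instanceof Integer)) return 0;   // reject other types
            values.add((Integer) oValue);
            return 1;
        }

        int getSize() { return values.size(); }
    }

    /** The concrete abstraction creates and wires its implementation. */
    static class IntColumn extends Abstraction {
        IntColumn(String sName) {
            super(sName);
            this.oImpl = new IntImpl();
        }
    }

    public static void main(String[] args) {
        Abstraction col = new IntColumn("age");
        col.addValue(42);
        col.addValue("not a number");   // rejected by the implementation
        System.out.println(col.getName() + " has " + col.getSize() + " value(s)");
    }
}
```

Client code only ever holds an Abstraction reference, so IntImpl could be swapped for a different storage strategy without touching the callers.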
Figure 20. Package es.uco.kdis.datapro.dataset.Column
Name |
Column |
Qualified Name |
es::uco::kdis::datapro::dataset::Column |
This abstract class implements the functionality common to every column in the dataset. It also defines the methods that are not coded by the implementation class because they refer to the column metainformation (e.g. name, type, etc.). These methods are implemented directly by the abstraction, since they do not require any access to data.
Figure 21. Abstract class ColumnAbstraction
Name |
ColumnAbstraction |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::ColumnAbstraction |
Visibility |
public |
Abstract |
true |
Base Classifier |
|
Realized Interface |
|
The column type, as represented by the enumeration defined by the class ColumnType.
Type |
ColumnType |
Default Value |
|
Visibility |
protected |
Multiplicity |
1 |
A reference to the implementation object.
Type |
ColumnImpl |
Default Value |
|
Visibility |
protected |
Multiplicity |
1 |
The name of the column.
Type |
String |
Default Value |
|
Visibility |
protected |
Multiplicity |
|
This method calls the implementation to add a list of values at the end of the column.
Parameter:
• rgoCol The list of values to be added. The objects it contains must satisfy the type required by the column.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout rgoCol : List<Object> |
This method calls the implementation to add a single value at the end of the column.
Parameter:
• oValue The value to be added. It must satisfy the type required by the column.
The method returns the number of items successfully added to the column.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oValue : Object |
This method calls the implementation to add a single value at the end of the column.
Parameters:
• oValue The value to be added. It must satisfy the type required by the column.
• bForce is used to indicate that the value must be added, independently of the constraints and addition policies defined by the column type.
The method returns the number of items successfully added to the column.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in bForce : boolean • inout oValue : Object |
This method calls the implementation to add a single value at a given position in the column.
Parameters:
• iIndex indicates the element position where the item has to be added.
• oValue The value to be added. It must satisfy the type required by the column.
The method returns the number of items successfully added to the column.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int • inout oValue : Object |
Default constructor with parameters. Subclasses may override this method or create new constructors.
This constructor only assigns the parameter values to their respective variables. The constructor in the subclass should create the implementation object and assign it to the variable oImpl.
Parameters:
• ctColumnType The column type.
• sName The Name of the column to be created.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout ctColumnType : ColumnType • inout sName : String |
This method calls the implementation to return the number of empty values in the column set.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to return the number of invalid values (i.e. empty, null and missing values) in the column set.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to return the number of missing values in the column set.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to return the number of null values in the column set.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
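The relationship between the four counting methods can be sketched as follows: invalid values are the union of empty, missing and null values. The sentinel objects below are hypothetical stand-ins for the library's EmptyValue, MissingValue and NullValue classes, whose actual interfaces are not shown in this section, and the helper names count and countInvalid are assumptions.

```java
import java.util.Arrays;
import java.util.List;

public class InvalidCountSketch {

    // Hypothetical sentinels standing in for EmptyValue, MissingValue and NullValue.
    static final Object EMPTY = new Object();
    static final Object MISSING = new Object();
    static final Object NULL = new Object();

    /** Counts how many entries are the given sentinel (identity comparison). */
    static int count(List<Object> values, Object sentinel) {
        int n = 0;
        for (Object v : values) {
            if (v == sentinel) n++;
        }
        return n;
    }

    /** Invalid values are the union of empty, missing and null values. */
    static int countInvalid(List<Object> values) {
        return count(values, EMPTY) + count(values, MISSING) + count(values, NULL);
    }

    public static void main(String[] args) {
        List<Object> values = Arrays.<Object>asList(1.5, EMPTY, 2.0, MISSING, NULL, EMPTY);
        System.out.println(countInvalid(values));
    }
}
```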
This method calls the implementation to return the element at the given position.
Parameter:
• iPos Position of the element queried.
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iPos : int |
This method calls the implementation to return the column-specific empty value. This is not the default empty value used by datapro4j (Class EmptyValue) for reading, writing or internally checking empty values; instead, it lets the developer define a column-specific value for their own use (e.g., the symbol associated with the empty value).
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to return the column-specific missing value. This is not the default missing value used by datapro4j (Class MissingValue) for reading, writing or internally checking missing values; instead, it lets the developer define a column-specific value for their own use (e.g., the symbol associated with a missing value).
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method returns the name of the column.
Type |
String |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to return the column-specific null value. This is not the default null value used by datapro4j (Class NullValue) for reading, writing or internally checking null values; instead, it lets the developer define their own null object.
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to return the size of the column.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method returns the type of the column as a value of ColumnType.
Type |
ColumnType |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to return the list of items (as instances of Object) contained in the column.
Type |
List<Object> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to remove an element in the column at a given position.
Parameter:
• iIndex The index of the element to be removed.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int |
This method calls the implementation to set the column-specific empty value, if required. This is not the default empty value used by datapro4j (Class EmptyValue) for reading, writing or internally checking empty values, but the developer has to define its usage in the code of the proper strategies.
Parameter:
• oEmptyValue The empty value to be set.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oEmptyValue : Object |
This method calls the implementation to set the column-specific missing value, if required. This is not the default missing value used by datapro4j (Class MissingValue) for reading, writing or internally checking missing values, but the developer has to define its usage in the code of the proper strategies.
Parameter:
• oMissingValue The missing value to be set.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oMissingValue : Object |
This method sets the name of the column.
Parameter:
• sName The new name for the column.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
This method calls the implementation to set the column-specific null value, if required. This is not the default null value used by datapro4j (Class NullValue) for reading, writing or internally checking null values, but the developer has to define its usage in the code of the proper strategies.
Parameter:
• oNullValue The null value to be set.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oNullValue : Object |
This method calls the implementation to set the value of an element in the column at a given position.
Parameters:
• oValue The value to be set.
• iIndex The element position in the column.
It returns the number of elements correctly set.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int • inout oValue : Object |
Name |
|
Related Element |
• ColumnImpl |
Name |
|
Related Element |
• ColumnType |
Name |
rgoColumns |
Related Element |
• Dataset |
Name |
|
Related Element |
• CategoricalColumn |
Name |
|
Related Element |
• NumericalColumn |
Name |
|
Related Element |
• DateColumn |
Name |
|
Related Element |
• BinaryColumn |
Name |
|
Related Element |
• NominalColumn |
Name |
|
Related Element |
• RangeColumn |
This abstract class serves as a base for column implementation classes. These classes comprise the real code accessing data in the column. Only metainformation is managed by its abstraction.
Note: None of its methods should be invoked directly, except from its specific abstraction. Thus, for a given column type, the abstraction is fixed, whereas the implementation can be adapted by the programmer.
Figure 22. Abstract class ColumnImpl
Name |
ColumnImpl |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::ColumnImpl |
Visibility |
public |
Abstract |
true |
Base Classifier |
|
Realized Interface |
|
This object represents a column-specific empty value. Notice that this is not the standard empty value object, as used by datapro4j strategies and datasets.
Type |
Object |
Default Value |
null |
Visibility |
protected |
Multiplicity |
|
This object represents a column-specific missing value. Notice that this is not the standard missing value object, as used by datapro4j strategies and datasets.
Type |
Object |
Default Value |
null |
Visibility |
protected |
Multiplicity |
|
This object represents a column-specific null value. Notice that this is not the standard null value object, as used by datapro4j strategies and datasets.
Type |
Object |
Default Value |
null |
Visibility |
protected |
Multiplicity |
|
The following methods code the implementation for their corresponding abstraction methods.
This method implements the method addAllValues of the column abstraction, returning the number of objects successfully added.
Parameter:
• rgoCol The list of item objects to be added to the column.
Type |
int |
Visibility |
public |
Is Abstract |
true |
Parameter |
• inout rgoCol : List<Object> |
This method implements the method addValue of the column abstraction, returning the number of objects successfully added.
Parameter:
• oValue The value to be added.
Type |
int |
Visibility |
public |
Is Abstract |
true |
Parameter |
• inout oValue : Object |
This method implements the method addValue of the column abstraction, returning the number of objects successfully added.
Parameters:
• oValue The value to be added
• bForce If true, the implementation must force its addition.
Note: By default bForce is not considered. If it has to be honored, the subclass implementing the specific column should explicitly override this method.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oValue : Object • in bForce : boolean |
This method implements the method addValue of the column abstraction, returning the number of objects successfully added.
Parameters:
• oValue The value to be added.
• iIndex The position in the column to add the value.
Type |
int |
Visibility |
public |
Is Abstract |
true |
Parameter |
• inout oValue : Object • in iIndex : int |
This method implements the method countEmptyValue of the column abstraction, returning the number of empty values contained in the column values. -1 is returned if this value could not be calculated.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method implements the method countInvalidValue of the column abstraction, returning the number of invalid values (null, empty and missing values) contained in the column values. -1 is returned if this value could not be calculated.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method implements the method countMissingValue of the column abstraction, returning the number of missing values contained in the column values. -1 is returned if this value could not be calculated.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method implements the method countNullValue of the column abstraction, returning the number of null values contained in the column values. -1 is returned if this value could not be calculated.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method implements the method getElement of the column abstraction, returning the element at the given position.
Parameter:
• iPos The position of the element to be returned.
Type |
Object |
Visibility |
public |
Is Abstract |
true |
Parameter |
• in iPos : int |
This method implements the method getEmptyValue of the column abstraction, returning the element representing the column-specific empty value.
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method implements the method getMissingValue of the column abstraction, returning the element representing the column-specific missing value.
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method implements the method getNullValue of the column abstraction, returning the element representing the column-specific null value.
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method implements the method getSize of the column abstraction, returning the number of elements contained in the column.
Type |
int |
Visibility |
public |
Is Abstract |
true |
Parameter |
|
This method implements the method getValues of the column abstraction, returning the list of elements (as instances of Object) contained in the column.
Type |
List<Object> |
Visibility |
public |
Is Abstract |
true |
Parameter |
|
This method implements the method removeValue of the column abstraction.
Parameter:
• iIndex The position in the column of the value to be removed.
Type |
void |
Visibility |
public |
Is Abstract |
true |
Parameter |
• in iIndex : int |
This method implements the method setEmptyValue of the column abstraction, setting the element representing the column-specific empty value.
Parameter:
• oEmptyValue The object representing a specific empty value in this column.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oEmptyValue : Object |
This method implements the method setMissingValue of the column abstraction, setting the element representing the column-specific missing value.
Parameter:
• oMissingValue The object representing a specific missing value in this column.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oMissingValue : Object |
This method implements the method setNullValue of the column abstraction, setting the element representing the column-specific null value.
Parameter:
• oNullValue The object representing a specific null value in this column.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oNullValue : Object |
This method implements the method setValue of the column abstraction, setting the element value at the given position.
Parameters:
• oValue The object value to set.
• iIndex The position index in the column.
Type |
int |
Visibility |
public |
Is Abstract |
true |
Parameter |
• in iIndex : int • inout oValue : Object |
Name |
|
Related Element |
• ColumnAbstraction |
Name |
|
Related Element |
• RangeColumnImpl |
Name |
|
Related Element |
• NominalColumnImpl |
Name |
|
Related Element |
• NumericalColumnImpl |
Name |
|
Related Element |
• DateColumnImpl |
Name |
|
Related Element |
• CategoricalColumnImpl |
Name |
|
Related Element |
• BinaryColumnImpl |
This enumeration contains the different types of columns supported by datapro4j. The following types are currently supported:
• Binary
• Categorical
• Date
• Integer
• Nominal
• Numerical
• Range
Note: To check the column type, the programmer should use code such as the following (e.g. for binary columns):
ColumnAbstraction oCol;
…
if (oCol.getType().equals(ColumnType.Binary)) {
…
}
Figure 23. Enumeration ColumnType
Name |
ColumnType |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::ColumnType |
Visibility |
public |
Abstract |
false |
Base Classifier |
|
Realized Interface |
|
Boolean attribute
Type |
|
Default Value |
|
Visibility |
public |
Multiplicity |
|
Categorical attribute
Type |
|
Default Value |
|
Visibility |
public |
Multiplicity |
|
Date attribute
Type |
|
Default Value |
|
Visibility |
public |
Multiplicity |
|
Integer attribute
Type |
|
Default Value |
|
Visibility |
public |
Multiplicity |
|
Nominal attribute
Type |
|
Default Value |
|
Visibility |
public |
Multiplicity |
|
Numerical attribute
Type |
|
Default Value |
|
Visibility |
public |
Multiplicity |
|
Range attribute
Type |
|
Default Value |
|
Visibility |
public |
Multiplicity |
|
Name |
|
Related Element |
• ColumnAbstraction |
This class represents the abstraction of a binary column. It defines the methods that provide operations specific to binary data.
Figure 24. Class BinaryColumn
Name |
BinaryColumn |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::BinaryColumn |
Visibility |
public |
Abstract |
false |
Base Classifier |
• ColumnAbstraction |
Realized Interface |
|
Default constructor. The implementation BinaryColumnImpl is invoked.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Constructor with the name of the column as a parameter. The implementation BinaryColumnImpl is invoked.
Parameter:
• sName The name of the column.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
This method calls the implementation to return a categorical column generated from the binary column. The resulting categorical column defines two categories, one for each binary value (false, true).
Parameters:
• sFalseCategory The category representing the false binary value.
• sTrueCategory The category representing the true binary value.
Notes:
• If the value is an empty or a missing value, then a false value is considered.
• If the value is a null value, then a null value is considered.
Type |
CategoricalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sFalseCategory : String • inout sTrueCategory : String |
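The conversion rules in the notes above can be sketched on plain lists. This is not the library's code: the sentinel objects stand in for the datapro4j empty, missing and null values, and the class and method shown here are hypothetical, operating on List<Object> rather than on real column objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BinaryToCategoricalSketch {

    // Hypothetical sentinels for empty, missing and null entries.
    static final Object EMPTY = new Object();
    static final Object MISSING = new Object();
    static final Object NULL = new Object();

    /** Maps each binary entry to one of two category labels. */
    static List<Object> toCategorical(List<Object> binary,
                                      String sFalseCategory, String sTrueCategory) {
        List<Object> out = new ArrayList<Object>();
        for (Object v : binary) {
            if (v == NULL) {
                out.add(NULL);                    // a null value stays null
            } else if (v == EMPTY || v == MISSING) {
                out.add(sFalseCategory);          // empty or missing counts as false
            } else {
                out.add(Boolean.TRUE.equals(v) ? sTrueCategory : sFalseCategory);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> binary = Arrays.<Object>asList(true, false, MISSING, true);
        System.out.println(toCategorical(binary, "no", "yes"));
    }
}
```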
Name |
|
Related Element |
• ColumnAbstraction |
This class provides the implementation code accessing real data in a binary column. Binary values are stored as objects of class Boolean.
Note: None of its methods should be invoked directly, except from its specific abstraction.
Figure 25. Class BinaryColumnImpl
Name |
BinaryColumnImpl |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::BinaryColumnImpl |
Visibility |
public |
Abstract |
false |
Base Classifier |
• ColumnImpl |
Realized Interface |
|
All attributes are private.
For a more complete specification of the methods inherited from ColumnImpl, see its specifications above.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout rgoCol : List<Object> |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oValue : Object |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int • inout oValue : Object |
Default constructor.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iPos : int |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
List<Object> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout iIndex : int |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int • inout oValue : Object |
This method implements the method toCategorical of the binary column abstraction, converting the binary column into a categorical column.
Parameters:
• sName The name of the column. By default this property is set by the abstraction to the current name of the binary column.
• sFalseCategory The category representing the false binary value.
• sTrueCategory The category representing the true binary value.
Type |
CategoricalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String • inout sFalseCategory : String • inout sTrueCategory : String |
Name |
|
Related Element |
• ColumnImpl |
This class defines the abstraction of a categorical column, where every value belongs to a predefined category. It defines the methods that provide operations specific to categorical data.
Figure 26. Class CategoricalColumn
Name |
CategoricalColumn |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::CategoricalColumn |
Visibility |
public |
Abstract |
false |
Base Classifier |
• ColumnAbstraction |
Realized Interface |
|
This method calls the implementation to add a new category to the set of allowable values. Categories are included as objects of class String.
Parameter:
• szCategory The new category in the column
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout szCategory : String |
Constructor with the name of the column as a parameter. The implementation CategoricalColumnImpl is invoked.
Parameter:
• sName The name of the column
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
Default constructor. The implementation CategoricalColumnImpl is invoked.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to return the index in the list of categories of a given string. The value -1 is returned if the value is not found.
Parameter:
• szCategory The string representing the category to be searched in the list of categories
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout szCategory : String |
This method calls the implementation to return the list of categories in the column.
Type |
List<Object> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to return the category string stored in a given position of the list of categories. null is returned if the index given is not valid.
Parameter:
• iIndex The index of the wanted category
Type |
String |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout iIndex : Integer |
This method calls the implementation to return the element stored in a given position in the column. The category index is returned, whereas the default method getElement (inherited from ColumnAbstraction) returns the category by name. If the value is invalid, -1 is returned.
Parameter:
• iPos The index of the item in the column
Exceptions:
• IndexOutOfBoundsException
Type |
Integer |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iPos : int |
This method calls the implementation to replace a given category with a new one.
Parameters:
• szOldCategory The category string to be replaced
• szNewCategory The new category string to be set
• bJoinCategory If the new category string already exists, this parameter determines whether the values of the old category are merged with the values that already belong to the existing category
1 is returned if the category is successfully replaced, or 0 otherwise.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in bJoinCategory : boolean • inout szNewCategory : String • inout szOldCategory : String |
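One possible reading of the replacement rule is sketched below. It assumes the storage described later for CategoricalColumnImpl (a map from category string to index plus a list of indexes); the class name, the field names and the decisions to return 0 when joining is not allowed or when both names are equal are assumptions, not documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplaceCategorySketch {

    final Map<String, Integer> categories = new HashMap<String, Integer>();
    final List<Integer> data = new ArrayList<Integer>();

    /** Returns 1 if the category was replaced, 0 otherwise. */
    int replaceCategory(String szOld, String szNew, boolean bJoin) {
        Integer oldIdx = categories.get(szOld);
        if (oldIdx == null || szOld.equals(szNew)) return 0;   // nothing to replace
        Integer newIdx = categories.get(szNew);
        if (newIdx == null) {
            // Plain rename: the index and the stored values are untouched.
            categories.remove(szOld);
            categories.put(szNew, oldIdx);
            return 1;
        }
        if (!bJoin) return 0;   // assumption: a name clash without join is refused
        // Merge: every value of the old category now points to the existing one.
        for (int i = 0; i < data.size(); i++) {
            if (oldIdx.equals(data.get(i))) data.set(i, newIdx);
        }
        categories.remove(szOld);
        return 1;
    }

    public static void main(String[] args) {
        ReplaceCategorySketch col = new ReplaceCategorySketch();
        col.categories.put("red", 0);
        col.categories.put("crimson", 1);
        col.data.addAll(Arrays.asList(0, 1, 1, 0));
        System.out.println(col.replaceCategory("crimson", "red", true) + " " + col.data);
    }
}
```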
This method calls the implementation to return a binary column generated from the categorical column. Invalid values remain unaltered.
Parameter:
• aReferenceTrueValues The list of category strings to be treated as true values
Type |
BinaryColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout aReferenceTrueValues : List<String> |
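The categorical-to-binary conversion can be sketched on plain lists as follows. The single INVALID sentinel is a hypothetical stand-in for any empty, missing or null entry, and the class and method are illustrative only; the real method works on column objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CategoricalToBinarySketch {

    /** Hypothetical sentinel for any invalid (empty, missing or null) entry. */
    static final Object INVALID = new Object() {
        @Override public String toString() { return "?"; }
    };

    /** Entries listed in aReferenceTrueValues become true, all other categories false. */
    static List<Object> toBinary(List<Object> categorical, List<String> aReferenceTrueValues) {
        List<Object> out = new ArrayList<Object>();
        for (Object v : categorical) {
            if (v == INVALID) {
                out.add(INVALID);   // invalid values remain unaltered
            } else {
                out.add(Boolean.valueOf(aReferenceTrueValues.contains(v)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> column = Arrays.<Object>asList("high", "low", INVALID, "medium");
        System.out.println(toBinary(column, Arrays.asList("high", "medium")));
    }
}
```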
This method calls the implementation to return a nominal column generated from the strings stored in the categorical column. Nominal values are extracted from the strings representing each category.
Type |
NominalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to return an integer column generated from the index values assigned to the categories in the source column.
Type |
IntegerColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• ColumnAbstraction |
This class provides the implementation code accessing real data in a categorical column. Categories are stored as a HashMap between a String and an Integer. Thus, internally, data are stored as an ArrayList of Integer, whereas their correspondences to categories are saved as String.
This class should never be directly invoked, apart from those invocations coming from its abstraction.
Figure 27. Class CategoricalColumnImpl
Name |
CategoricalColumnImpl |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::CategoricalColumnImpl |
Visibility |
public |
Abstract |
false |
Base Classifier |
• ColumnImpl |
Realized Interface |
|
All attributes are private.
For a more complete specification of the methods inherited from ColumnImpl, see its specification above. Note that values can be added both as a String (identifier) and as an Integer (index); see the methods addValue and addAllValues. In both cases only elements belonging to valid categories are added to the set of values in the column.
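The storage scheme described for this class can be sketched as follows: a map from category identifier to index, plus a list of indexes for the data. Everything here is a simplified assumption (class name, field names, and an index counter that is never reused); it only illustrates how a value may be accepted either as a String identifier or as an Integer index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CategoricalStorageSketch {

    private final Map<String, Integer> categories = new HashMap<String, Integer>();
    private final List<Integer> data = new ArrayList<Integer>();
    private int nextIndex = 0;   // assumption: indexes are never reused

    /** Returns the index of the new category, or -1 if it cannot be added. */
    int addCategory(String sCat) {
        if (sCat == null || categories.containsKey(sCat)) return -1;
        categories.put(sCat, nextIndex);
        return nextIndex++;
    }

    /** Accepts a String identifier or an Integer index; returns the items added. */
    int addValue(Object oValue) {
        Integer idx = null;
        if (oValue instanceof String) {
            idx = categories.get(oValue);          // null if the category is unknown
        } else if (oValue instanceof Integer && categories.containsValue(oValue)) {
            idx = (Integer) oValue;
        }
        if (idx == null) return 0;                 // not a valid category
        data.add(idx);
        return 1;
    }

    int getSize() { return data.size(); }

    public static void main(String[] args) {
        CategoricalStorageSketch col = new CategoricalStorageSketch();
        col.addCategory("low");
        col.addCategory("high");
        col.addValue("low");      // added by identifier
        col.addValue(1);          // added by index ("high")
        col.addValue("medium");   // rejected: unknown category
        System.out.println(col.getSize());
    }
}
```

Storing indexes rather than strings keeps the data list compact and makes renaming a category a single map update.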
This method implements the functionality of addCategory in the categorical column abstraction, adding a new category to the column. This category should not exist. It returns the index of the new category, if successfully created, or -1 if the category cannot be added.
Parameter:
• sCat The identifier of the new category
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sCat : String |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oValue : Object |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in bForce : boolean • inout oValue : Object |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int • inout oValue : Object |
Default constructor.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method implements the functionality of getCategoryIndex in the column abstraction, returning the index of the category passed as String, or -1 if the category does not exist in the list of categories of the column.
Parameter:
• sCategory The category identifier
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sCategory : String |
This method implements the corresponding method of the column abstraction, returning the list of category identifiers comprised by the category list. The resulting list is not sorted.
Type |
List<Object> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method implements the functionality of getCategoryName in the column abstraction, returning the identifier of the category whose index is passed as parameter. If the category does not exist, then null is returned.
Parameter:
• iIndex The category index
Type |
String |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout iIndex : Integer |
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iPos : int |
This method implements the functionality of getElementIndex in the column abstraction, returning the category index stored at a given position. Notice that indexes in the category list need not be sorted or sequential, since categories may be successively created and deleted, causing gaps in the index sequence. Always treat category indexes as numerical identifiers, never as sequential positions.
This method returns -1 if the position given is invalid.
Parameter:
• iPos The position given in the category list.
Exceptions:
• IndexOutOfBoundsException
Type |
Integer |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iPos : int |
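The warning about gaps in the index sequence can be illustrated with a small sketch. CategoryIndexGapsSketch is a hypothetical class written for this example only, and it assumes that the index of a deleted category is never reused; the guide does not state how datapro4j assigns indexes after a deletion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of non-sequential category indexes; not the datapro4j class.
class CategoryIndexGapsSketch {
    // Index -> category identifier, kept in creation order.
    private final Map<Integer, String> namesByIndex = new LinkedHashMap<>();
    private int nextIndex = 0;

    int add(String name) {
        namesByIndex.put(nextIndex, name);
        return nextIndex++;
    }

    void remove(int iIndex) {
        namesByIndex.remove(iIndex);
    }

    // Returns the identifier, or null if no category has that index.
    String getCategoryName(Integer iIndex) {
        return namesByIndex.get(iIndex);
    }

    // Indexes that currently exist; iterate over these instead of counting from 0 to n-1.
    List<Integer> liveIndexes() {
        return new ArrayList<>(namesByIndex.keySet());
    }
}
```

After removing the second of three categories, the remaining indexes are 0 and 2, so a loop from 0 to the number of categories minus one would visit a missing index and skip an existing one.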
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
List<Object> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int |
This method implements the functionality of replaceCategory in the column abstraction, updating the category list and replacing the corresponding values in the column. It returns 1 if the replacement is done, and 0 otherwise.
Parameters:
• sOldCategory The old category identifier to be replaced
• sNewCategory The new category identifier
• bJoinCategory If true and the new category identifier already exists in the column, the values with the old category identifier are merged into the existing category, leaving a single category as a result
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in bJoinCategory : boolean • inout sNewCategory : String • inout sOldCategory : String |
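The effect of bJoinCategory can be sketched as follows. ReplaceCategorySketch is a hypothetical class, not the library's implementation. In particular, returning 0 when the new identifier already exists and bJoinCategory is false is an assumption, since the guide only says that 0 means the replacement was not done.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of rename-or-join semantics; not the datapro4j class.
class ReplaceCategorySketch {
    final Map<String, Integer> categories = new HashMap<>();
    final List<Integer> values = new ArrayList<>();

    // Returns 1 if the replacement is done, 0 otherwise.
    int replaceCategory(String sOldCategory, String sNewCategory, boolean bJoinCategory) {
        Integer oldIndex = categories.get(sOldCategory);
        if (oldIndex == null || sNewCategory == null || sNewCategory.equals(sOldCategory)) {
            return 0;
        }
        Integer newIndex = categories.get(sNewCategory);
        if (newIndex == null) {
            // Plain rename: the index is kept, so stored values need no change.
            categories.remove(sOldCategory);
            categories.put(sNewCategory, oldIndex);
            return 1;
        }
        if (!bJoinCategory) {
            // Assumption: an existing target without permission to join is refused.
            return 0;
        }
        // Join: re-point every value of the old category to the existing one.
        for (int i = 0; i < values.size(); i++) {
            if (oldIndex.equals(values.get(i))) {
                values.set(i, newIndex);
            }
        }
        categories.remove(sOldCategory);
        return 1;
    }
}
```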
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int • inout oValue : Object |
This method implements the functionality of toBinary in the column abstraction, returning a binary column constructed from the data contained in the categorical column. The list of category identifiers considered as True values in the binary column is passed as a parameter. Category identifiers not included in this list are considered as False values. Note that invalid values are observed.
Parameters:
• aReferenceTrueValues The list of categories representing true values
• sName The name of the new binary column
Type |
BinaryColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout aReferenceTrueValues : List<String> • inout sName : String |
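The mapping from categories to True and False values can be sketched with plain collections. ToBinarySketch is a hypothetical helper that works on lists rather than on datapro4j columns. Representing an invalid or empty value as null and passing it through unchanged is an assumption about what the guide means by observing invalid values.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical list-based sketch of the categorical-to-binary conversion.
class ToBinarySketch {
    static List<Boolean> toBinary(List<String> values, List<String> aReferenceTrueValues) {
        Set<String> trueSet = new HashSet<>(aReferenceTrueValues);
        List<Boolean> result = new ArrayList<>(values.size());
        for (String value : values) {
            if (value == null) {
                // Assumption: an empty value stays empty in the binary column.
                result.add(null);
            } else {
                result.add(Boolean.valueOf(trueSet.contains(value)));
            }
        }
        return result;
    }
}
```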
This method implements the functionality of toNominal in the column abstraction, returning a nominal column constructed from the data contained in the categorical column. Strings for the nominal column are constructed from the category identifiers.
Parameter:
• sName The name of the new nominal column
Type |
NominalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
This method implements the functionality of toNumerical in the column abstraction, returning an integer column constructed from the data contained in the categorical column. Numbers of the integer column are extracted from the category indexes.
Parameter:
• sName The name of the new integer column
Type |
IntegerColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
Name |
|
Related Element |
• RangeColumnImpl |
Name |
|
Related Element |
• ColumnImpl |
This class represents the abstraction of a date datatype column. This type of column is specifically required by ARFF datasets.
Figure 28. Class DateColumn
Name |
DateColumn |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::DateColumn |
Visibility |
public |
Abstract |
false |
Base Classifier |
• ColumnAbstraction |
Realized Interface |
|
This method calls the implementation to set the date format specification of the values in the column.
Parameter:
• oDate The format specification of the values in the date column
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oDate : SimpleDateFormat |
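A date format specification is a standard java.text.SimpleDateFormat object. The sketch below shows how such an object parses and formats a value using the pattern that appears in the ARFF description of this guide. DateFormatSketch and parseOrNull are hypothetical names, and returning null for text that does not match is an assumption made for the example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper showing how a SimpleDateFormat specification is applied.
class DateFormatSketch {
    // Parses the text strictly; returns null if it does not match the format.
    static Date parseOrNull(SimpleDateFormat oDate, String text) {
        // Non-lenient parsing rejects impossible dates such as month 13.
        oDate.setLenient(false);
        try {
            return oDate.parse(text);
        } catch (ParseException e) {
            return null;
        }
    }
}
```

Note that SimpleDateFormat instances are not thread-safe, so a format object shared by several threads needs external synchronization.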
Default constructor with no parameters. The implementation DateColumnImpl is invoked.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Constructor with the name of the column as a parameter. The implementation DateColumnImpl is invoked.
Parameter:
• sName The name of the column
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
This method calls the implementation to get the date format specification of the values in the column.
Type |
SimpleDateFormat |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• ColumnAbstraction |
This class provides the implementation code accessing real data in a date column. Values are stored as Date objects according to the format specified by a given SimpleDateFormat object. This class should not be invoked directly, only by the column abstraction.
Figure 29. Class DateColumnImpl
Name |
DateColumnImpl |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::DateColumnImpl |
Visibility |
public |
Abstract |
false |
Base Classifier |
• ColumnImpl |
Realized Interface |
|
All attributes are private.
For a more complete specification of the methods inherited from ColumnImpl, see its specifications above.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout rgoCol : List<Object> |
This method implements the method addDateSpecification of the date column abstraction, setting the date format specification of the values in the column.
Parameter:
• oDate The format specification of the values in the date column
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oDate : SimpleDateFormat |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oValue : Object |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in bForce : boolean • inout oValue : Object |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int • inout oValue : Object |
DateColumnImpl
Default constructor with no parameters.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method implements the method getDateSpecification of the column abstraction, returning the date format specification of the values in the column.
Type |
SimpleDateFormat |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iPos : int |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
List<Object> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int • inout oValue : Object |
Name |
|
Related Element |
• ColumnImpl |
This class represents the abstraction of an integer column. Integer columns are a specialization of numerical (real) columns.
Figure 30. Class IntegerColumn
Name |
IntegerColumn |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::IntegerColumn |
Visibility |
public |
Abstract |
false |
Base Classifier |
• NumericalColumn |
Realized Interface |
|
Many methods are specializations of their respective methods in the numerical column (NumericalColumn), adapted to the domain of integer values.
Analogously to getdMaxInterval in the NumericalColumn abstraction class, this method gets the maximum integer value allowed for this column.
Type |
Integer |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Analogously to getdMinInterval in the NumericalColumn abstraction class, this method gets the minimum integer value allowed for this column.
Type |
Integer |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
See getMaxValue in the specification of the NumericalColumn abstraction class.
Type |
double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
For further information, see getMinValue in the specification of the NumericalColumn abstraction class.
Type |
double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Default constructor with no parameters.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Constructor with the name of the resulting column as a parameter.
Parameter:
• sName The name of the column
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
For further information, see mean in the specification of the NumericalColumn abstraction class.
Type |
double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Analogously to setdMaxInterval in the NumericalColumn abstraction class, this method sets the maximum integer value allowed for this column.
Parameter:
• iMaxInterval The maximum value allowed in the column
Exceptions:
• IllegalAccessException if the value cannot be set.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout iMaxInterval : Integer |
Analogously to setdMinInterval in the NumericalColumn abstraction class, this method sets the minimum integer value allowed for this column.
Parameter:
• iMinInterval The minimum value allowed in the column
Exceptions:
• IllegalAccessException if the value cannot be set.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout iMinInterval : Integer |
For further information, see standardDeviation in the specification of the NumericalColumn abstraction class.
Type |
double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to return a categorical column using the values contained in the integer column, where each different value constitutes a different category.
Type |
CategoricalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to return a numerical column using the values contained in the integer column, where each integer value is cast to a double value.
Type |
NumericalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• NumericalColumn |
This class provides the implementation code accessing real data in an integer column. This class is a specialization of the numerical column implementation (NumericalColumnImpl). Integer values are stored as objects of class Integer. This class and its methods should not be invoked directly.
Figure 31. Class IntegerColumnImpl
Name |
IntegerColumnImpl |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::IntegerColumnImpl |
Visibility |
public |
Abstract |
false |
Base Classifier |
• NumericalColumnImpl |
Realized Interface |
|
For further information, see a complete specification of these methods in NumericalColumnImpl and ColumnImpl.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oValue : Object |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int • inout oValue : Object |
Type |
double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Default constructor with no parameters.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int • inout oValue : Object |
Type |
double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method implements the method toCategorical of the abstraction, returning a categorical column using the values contained in the integer column, where each different value constitutes a different category.
Parameter:
• sName The name of the resulting column
Type |
CategoricalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
This method implements the method toNumerical of the abstraction, returning a numerical column using the values contained in the integer column, where each integer value is cast to a double value.
Parameter:
• sName The name of the resulting column
Type |
NumericalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
Name |
|
Related Element |
• NumericalColumnImpl |
This class represents the abstraction of a nominal column containing free-style strings as values. Here the methods that provide specific operations of nominal values are defined.
Figure 32. Class NominalColumn
Name |
NominalColumn |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::NominalColumn |
Visibility |
public |
Abstract |
false |
Base Classifier |
• ColumnAbstraction |
Realized Interface |
|
Default constructor with no parameters.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Constructor with the name of the column as parameter.
Parameter:
• sName Name of the column
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
This method calls the implementation to return a categorical column, where each different string is a category (no repetition).
Type |
CategoricalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
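The rule that each different string becomes one category, with no repetition, can be sketched with standard collections. NominalToCategoricalSketch is a hypothetical helper; assigning indexes in order of first appearance is an assumption, since the guide does not state how indexes are chosen.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical list-based sketch of the nominal-to-categorical conversion.
class NominalToCategoricalSketch {
    // Maps each distinct string to an index, in order of first appearance.
    static Map<String, Integer> buildCategories(List<String> values) {
        Map<String, Integer> categories = new LinkedHashMap<>();
        for (String value : values) {
            if (value != null && !categories.containsKey(value)) {
                categories.put(value, categories.size());
            }
        }
        return categories;
    }

    // Replaces each string by its category index; empty (null) values stay null.
    static List<Integer> encode(List<String> values, Map<String, Integer> categories) {
        List<Integer> encoded = new ArrayList<>(values.size());
        for (String value : values) {
            encoded.add(categories.get(value));
        }
        return encoded;
    }
}
```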
This method calls the implementation to return a numerical column, where each string is parsed to a numerical value. If the string could not be parsed, then an empty value is added instead. Notice that upper and lower interval values are not set for the numerical column returned.
Type |
NumericalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
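The parsing rule described above (a string that cannot be parsed yields an empty value) can be sketched with Double.valueOf. NominalToNumericalSketch is a hypothetical helper, and representing an empty value as null is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical list-based sketch of the nominal-to-numerical conversion.
class NominalToNumericalSketch {
    static List<Double> parseAll(List<String> values) {
        List<Double> parsed = new ArrayList<>(values.size());
        for (String text : values) {
            Double number = null; // null stands for an empty value (assumption)
            if (text != null) {
                try {
                    number = Double.valueOf(text.trim());
                } catch (NumberFormatException e) {
                    // The string is not a number: keep the empty value.
                }
            }
            parsed.add(number);
        }
        return parsed;
    }
}
```

Because the returned column carries no interval limits, a caller who needs bounds would set them afterwards from the parsed data.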
Name |
|
Related Element |
• ColumnAbstraction |
This class provides the implementation code accessing real data in the nominal column. Nominal values are stored as String objects. Note that these methods should not be invoked directly.
Figure 33. Class NominalColumnImpl
Name |
NominalColumnImpl |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::NominalColumnImpl |
Visibility |
public |
Abstract |
false |
Base Classifier |
• ColumnImpl |
Realized Interface |
|
All attributes are private.
For a more detailed specification of the methods inherited from ColumnImpl, see its specification above.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout rgoCol : List<Object> |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oValue : Object |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int • inout oValue : Object |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iPos : int |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
List<Object> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Default constructor with no parameters.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int • inout oValue : Object |
This method implements the method toCategorical of the abstraction, returning a categorical column, where each different string is a category (no repetition).
Parameter:
• sName The name of the column to be created
Type |
CategoricalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
This method implements the method toNumerical of the abstraction, returning a numerical column, where each string is parsed to a numerical value. If the string could not be parsed, then an empty value is added instead. Notice that upper and lower interval values are not set for the numerical column returned.
Parameter:
• sName The name of the column
Type |
NumericalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
Name |
|
Related Element |
• ColumnImpl |
This class represents the abstraction of a numerical (real) column.
Figure 34. Class NumericalColumn
Name |
NumericalColumn |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::NumericalColumn |
Visibility |
public |
Abstract |
false |
Base Classifier |
• ColumnAbstraction |
Realized Interface |
|
This attribute indicates the maximum value allowed in the column. This property should be accessed using getter/setter methods.
Type |
Double |
Default Value |
Double.MAX_VALUE |
Visibility |
protected |
Multiplicity |
|
This attribute indicates the minimum value allowed in the column. This property should be accessed using getter/setter methods.
Type |
Double |
Default Value |
Double.MIN_VALUE |
Visibility |
protected |
Multiplicity |
|
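A caveat on the default values listed above: in Java, Double.MIN_VALUE is the smallest positive double (about 4.9e-324), not the most negative one, which is -Double.MAX_VALUE. How datapro4j interprets the default minimum is not stated in this guide, so the sketch below only shows what a plain interval check would do with each bound. DoubleBoundsSketch and inInterval are hypothetical names.

```java
// Hypothetical interval check illustrating the meaning of Double.MIN_VALUE.
class DoubleBoundsSketch {
    // Returns true when value lies in the closed interval [min, max].
    static boolean inInterval(double value, double min, double max) {
        return value >= min && value <= max;
    }
}
```

With Double.MIN_VALUE as the lower bound, zero and every negative number fall outside the interval; with -Double.MAX_VALUE they fall inside. Setting the minimum explicitly avoids depending on the default.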
This method returns the maximum real value allowed in the column. The implementation is not invoked, since this information is part of metadata.
Type |
Double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method returns the minimum real value allowed in the column. The implementation is not invoked, since this information is part of metadata.
Type |
Double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to get the maximum existing value in the column data.
Type |
double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to get the minimum existing value in the column data.
Type |
double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to get the mean value of the column data.
Type |
double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to normalize the set of values in the numerical column.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
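The guide does not say which normalization is applied. As an illustration only, the sketch below assumes min-max scaling of the column values to the interval [0, 1]; NormalizeSketch is a hypothetical helper working on a list without empty values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical min-max normalization; the technique used by datapro4j may differ.
class NormalizeSketch {
    static List<Double> minMaxNormalize(List<Double> values) {
        List<Double> normalized = new ArrayList<>(values.size());
        if (values.isEmpty()) {
            return normalized;
        }
        double min = Collections.min(values);
        double max = Collections.max(values);
        double span = max - min;
        for (double value : values) {
            // A constant column has no spread; map every value to 0.
            normalized.add(span == 0.0 ? 0.0 : (value - min) / span);
        }
        return normalized;
    }
}
```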
Default constructor with no parameters. The implementation NumericalColumnImpl is invoked.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Constructor with the name of the column as a parameter. The implementation NumericalColumnImpl is invoked.
Parameter:
• sName The name of the column
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
This method sets the maximum real value allowed in the column. The implementation is not invoked, since this information is part of the column metadata.
Parameter:
• dMaxInterval The maximum value allowed
Exceptions:
• IllegalAccessException if the value cannot be set
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout dMaxInterval : Double |
This method sets the minimum real value allowed in the column. The implementation is not invoked, since this information is part of the column metadata.
Parameter:
• dMinInterval The minimum value allowed
Exceptions:
• IllegalAccessException if the value cannot be set
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout dMinInterval : Double |
This method calls the implementation to return the standard deviation calculated from the set of values in the numerical column.
Type |
double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
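The mean and standard deviation of a column can be computed as in the following sketch. ColumnStatsSketch is a hypothetical helper. It assumes the population form of the standard deviation (division by n); the guide does not say whether the library divides by n or by n - 1.

```java
import java.util.List;

// Hypothetical statistics helper; assumes a non-empty list without empty values.
class ColumnStatsSketch {
    static double mean(List<Double> values) {
        double sum = 0.0;
        for (double value : values) {
            sum += value;
        }
        return sum / values.size();
    }

    // Population standard deviation: square root of the mean squared deviation.
    static double standardDeviation(List<Double> values) {
        double mean = mean(values);
        double squaredDeviations = 0.0;
        for (double value : values) {
            squaredDeviations += (value - mean) * (value - mean);
        }
        return Math.sqrt(squaredDeviations / values.size());
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the sum is 40 and the mean is 5; the squared deviations add up to 32, so the population variance is 4 and the standard deviation is 2.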
This method calls the implementation to standardize the set of values in the numerical column.
Parameters:
• dMean Value of the mean used to standardize the set of values of the column
• dVariance Value of the variance used for the standardization
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in dMean : double • in dVariance : double |
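Standardization with a given mean and variance is commonly computed as z = (x - mean) / sqrt(variance). The sketch below assumes this formula, which the guide does not spell out; StandardizeSketch is a hypothetical list-based helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical z-score standardization from a supplied mean and variance.
class StandardizeSketch {
    static List<Double> standardize(List<Double> values, double dMean, double dVariance) {
        if (dVariance <= 0.0) {
            throw new IllegalArgumentException("The variance must be positive: " + dVariance);
        }
        double stdDev = Math.sqrt(dVariance);
        List<Double> standardized = new ArrayList<>(values.size());
        for (double value : values) {
            standardized.add((value - dMean) / stdDev);
        }
        return standardized;
    }
}
```

Passing the mean and variance as parameters allows, for example, a test set to be standardized with the statistics of the training set.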
This method calls the implementation to return an integer column containing values extracted from the numerical column. It returns an IntegerColumn object.
Parameter:
• bRoundedValue If false, values are truncated; if true, values are rounded.
Type |
IntegerColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in bRoundedValue : boolean |
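The difference between truncating and rounding matters mostly for negative values and for fractions above one half. The sketch below uses a narrowing cast for truncation and Math.round for rounding; ToIntegerSketch is a hypothetical helper, and the library may use other primitives.

```java
// Hypothetical conversion of one real value to an integer value.
class ToIntegerSketch {
    static int convert(double value, boolean bRoundedValue) {
        if (bRoundedValue) {
            // Math.round returns the closest long, with ties going up.
            return (int) Math.round(value);
        }
        // A cast to int discards the fractional part (truncation toward zero).
        return (int) value;
    }
}
```

With these primitives, 2.7 becomes 2 when truncated and 3 when rounded, whereas -2.7 becomes -2 when truncated and -3 when rounded.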
This method calls the implementation to return a nominal column, where strings are constructed from real values.
Type |
NominalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• ColumnAbstraction |
This class provides the implementation code accessing real data in a numerical column. Values are stored as objects of the class Double. Notice that this class should not be instantiated directly, except by its abstraction.
Figure 35. Class NumericalColumnImpl
Name |
NumericalColumnImpl |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::NumericalColumnImpl |
Visibility |
public |
Abstract |
false |
Base Classifier |
• ColumnImpl |
Realized Interface |
|
All the attributes are either private or protected.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout rgoCol : List<Object> |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oValue : Object |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int • inout oValue : Object |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iPos : int |
This method implements the method getMaxValue of the abstraction class, returning the maximum existing value in the column.
Type |
double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method implements the method getMinValue of the abstraction class, returning the minimum existing value in the column.
Type |
double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
List<Object> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method implements the method mean of the abstraction class, returning the mean value of the column.
Type |
double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method implements the method normalize of the abstraction class, normalizing the set of values contained in the column.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Default constructor with no parameters.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int • inout oValue : Object |
This method implements the method standardDeviation of the abstraction class, returning the standard deviation value of the set of values contained in the numerical column.
Type |
double |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method implements the method standarize of the abstraction class, standardizing the values in the column according to the mean and variance passed as parameter.
Parameters:
• dMean Mean value considered for the standardization
• dVariance Variance value considered for the standardization
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in dMean : double • in dVariance : double |
This method implements the method toInteger of the abstraction class, returning an integer column calculated from the numerical column.
Parameters:
• sName The name of the resulting new column
• bRoundedValue If false, values are truncated; if true, values are rounded
Type |
IntegerColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in bRoundedValue : boolean • inout sName : String |
This method implements the method toNominal of the abstraction class, returning a nominal column whose strings are constructed from the numerical values in the column.
Parameter:
• sName The name of the resulting new column
Type |
NominalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
Name |
|
Related Element |
• ColumnImpl |
This class represents the abstraction of a range column, whose values are intervals with a minimum and a maximum value in the range.
Figure 36. Class RangeColumn
Name |
RangeColumn |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::RangeColumn |
Visibility |
public |
Abstract |
false |
Base Classifier |
• ColumnAbstraction |
Realized Interface |
|
Default constructor with no parameters.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Constructor with the name of the column as a parameter.
Parameter:
• sName The name of the column.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
This method calls the implementation to return a categorical column extracted from the range data contained in the column. The method returns a CategoricalColumn object.
Exceptions:
• NotAddedValueException
Type |
CategoricalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method calls the implementation to return a numerical column extracted from the range values contained in the column, according to one of the following modes:
0: The minimum value of each range is selected.
1: The maximum value of each range is selected.
2: The mean value between min and max is selected.
3: A random value in the range is selected.
It returns the resulting NumericalColumn object.
Parameter:
• iMode An integer between 0 and 3 indicating the conversion mode, as described above.
Exceptions:
• NotAddedValueException
Type |
NumericalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout iMode : int |
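The four conversion modes listed above can be sketched for a single [min, max] interval. RangeToNumericalSketch is a hypothetical helper; drawing the random value uniformly and rejecting modes outside 0 to 3 with an IllegalArgumentException are assumptions made for the example.

```java
import java.util.Random;

// Hypothetical conversion of one range to a single real value.
class RangeToNumericalSketch {
    static double convert(double min, double max, int iMode, Random random) {
        switch (iMode) {
            case 0:
                return min;                 // minimum of the range
            case 1:
                return max;                 // maximum of the range
            case 2:
                return (min + max) / 2.0;   // mean of min and max
            case 3:
                // Assumption: uniform draw between min and max.
                return min + random.nextDouble() * (max - min);
            default:
                throw new IllegalArgumentException("iMode must be between 0 and 3: " + iMode);
        }
    }
}
```

Passing the Random object as a parameter lets the caller fix the seed, which makes conversions in mode 3 reproducible.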
This method calls the implementation to return a numerical column extracted from the range values contained in the column, according to the Gauss distribution.
Parameters:
• dMean The arithmetic mean for the distribution
• dStdDev The standard deviation for the distribution
It returns the resulting NumericalColumn object.
Exceptions:
• NotAddedValueException
Type |
NumericalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in dMean : double • in dStdDev : double |
Name |
|
Related Element |
• ColumnAbstraction |
This class, the abstraction of a range column (i.e. a representation of a [min, max] interval), is the one that should be used by the programmer, since it hides the actual implementation of the column. Even when the implementation changes, the abstraction must remain unaltered.
Figure 37. Class RangeColumnImpl
Name |
RangeColumnImpl |
Qualified Name |
es::uco::kdis::datapro::dataset::Column::RangeColumnImpl |
Visibility |
public |
Abstract |
false |
Base Classifier |
• ColumnImpl |
Realized Interface |
|
All attributes are private.
For a detailed specification of the methods inherited from ColumnImpl, see its specifications above.
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout rgoValues : List<Object> |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oValue : Object |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int • inout oValue : Object |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
Object |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iPos : int |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Type |
List<Object> |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Default constructor with no parameters.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Constructor with the name of the column as a parameter.
Parameter:
• sName The name of the column.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sName : String |
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int |
Type |
int |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iIndex : int • inout oValue : Object |
This method implements the method toCategorical of the abstraction, returning a categorical column extracted from the range data contained in the column. The method returns the resulting CategoricalColumn object.
Exceptions:
• NotAddedValueException
Type |
CategoricalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method implements the method toNumerical of the abstraction, returning a numerical column extracted from the range values contained in the column, according to one of the following modes:
0: The minimum value of each range is selected.
1: The maximum value of each range is selected.
2: The mean value between min and max is selected.
3: A random value in the range is selected.
It returns the resulting NumericalColumn object.
Parameter:
• iMode An integer between 0 and 3 indicating the conversion mode, as described above.
Exceptions:
• NotAddedValueException
Type |
NumericalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in iMode : int |
This method implements the method toNumericalByGaussian of the abstraction, returning a numerical column extracted from the range values contained in the column, according to the Gauss distribution.
Parameters:
• dMean The arithmetic mean for the distribution
• dStdDev The standard deviation for the distribution
It returns the resulting NumericalColumn object.
Exceptions:
• NotAddedValueException
Type |
NumericalColumn |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in dMean : double • in dStdDev : double |
Name |
|
Related Element |
• ColumnImpl |
Figure 38. Package es.uco.kdis.datapro.dataset.Source
Name |
Source |
Qualified Name |
es::uco::kdis::datapro::dataset::Source |
ArffDataset implements the ARFF (Attribute-Relation File Format) dataset file specification, as used by Weka. This is a subclass of FileDataset.
ARFF files are ASCII text files that describe a list of instances sharing a set of attributes. After a few heading lines, where the metainformation is presented, one instance per line is dumped, until the end of the file is reached.
Types of attribute in ARFF dataset files:
• @ATTRIBUTE name numeric (As numerical columns)
• @ATTRIBUTE name {value1, value2, ...} (As categorical columns)
• @ATTRIBUTE name string (As nominal columns)
• @ATTRIBUTE name date "yyyy-MM-dd HH:mm:ss" (As date columns)
For a further description, visit the web site http://www.cs.waikato.ac.nz/ml/weka/arff.html (Nov. 1st, 2008).
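The correspondence between ARFF attribute declarations and column types listed above can be sketched as a small classifier. ArffAttributeSketch and its Kind enumeration are hypothetical names and do not reflect how ArffDataset parses its header. The sketch only covers the four declarations listed in this guide and assumes attribute names without spaces.

```java
import java.util.Locale;

// Hypothetical classifier for a single ARFF attribute declaration line.
class ArffAttributeSketch {
    enum Kind { NUMERICAL, CATEGORICAL, NOMINAL, DATE, UNKNOWN }

    static Kind classify(String line) {
        String trimmed = line.trim();
        // ARFF keywords are matched case-insensitively.
        if (!trimmed.toLowerCase(Locale.ROOT).startsWith("@attribute")) {
            return Kind.UNKNOWN;
        }
        // Split into keyword, attribute name and the remaining type specification.
        String[] parts = trimmed.split("\\s+", 3);
        if (parts.length < 3) {
            return Kind.UNKNOWN;
        }
        String type = parts[2].trim();
        String lowerType = type.toLowerCase(Locale.ROOT);
        if (type.startsWith("{")) {
            return Kind.CATEGORICAL;
        }
        if (lowerType.equals("numeric")) {
            return Kind.NUMERICAL;
        }
        if (lowerType.equals("string")) {
            return Kind.NOMINAL;
        }
        if (lowerType.startsWith("date")) {
            return Kind.DATE;
        }
        return Kind.UNKNOWN;
    }
}
```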
Figure 39. Class ArffDataset
Name |
ArffDataset |
Qualified Name |
es::uco::kdis::datapro::dataset::Source::ArffDataset |
Visibility |
public |
Abstract |
false |
Base Classifier |
• FileDataset |
Realized Interface |
|
Some attributes are protected to allow reusability by inheritance.
ATTRIBUTE is the static constant string for the ARFF keyword '@attribute'.
Type |
String |
Default Value |
"@attribute" |
Visibility |
protected |
Multiplicity |
|
DATA is the static constant string for the ARFF keyword '@data'. It defines the beginning of the data block in the ARFF file.
Type |
String |
Default Value |
"@data" |
Visibility |
protected |
Multiplicity |
|
RELATION is the static constant with the ARFF keyword '@relation'. It represents the beginning of the ARFF dataset definition.
Type |
String |
Default Value |
"@relation" |
Visibility |
protected |
Multiplicity |
|
This method reads the DATA block in the dataset and adds the values in the file to the corresponding column structure.
Parameter:
• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.
o s: Nominal column
o f: Numerical (real) column
o c: Categorical column
o b: Binary column
o d: Date column
o %: Skip this column (do not dump its values to any column)
For example, the string “cbbf%%d” indicates the sequence of attributes that are read from the dataset and copied in memory to the column structure of the dataset: a categorical column, two binary columns, and a numerical column. The following two attributes are omitted. Finally, the date attribute is copied.
Exceptions:
• IndexOutOfBoundsException
• IOException
• NotAddedValueException
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
• inout sColumnFormat : String |
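The column format string can be read as one symbol per attribute. The following minimal sketch decodes such a string into a list of column kinds; ColumnFormatSketch, Kind and decode are hypothetical names used only for illustration, and the library's own handling of the string may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for illustration only; it is not part of datapro4j.
final class ColumnFormatSketch {

    /** One constant per symbol described above; SKIP stands for '%'. */
    enum Kind { NOMINAL, NUMERICAL, CATEGORICAL, BINARY, DATE, SKIP }

    /** Translates each symbol of the format string into its column kind. */
    static List<Kind> decode(String sColumnFormat) {
        List<Kind> oKinds = new ArrayList<>();
        for (char cSymbol : sColumnFormat.toCharArray()) {
            switch (cSymbol) {
                case 's': oKinds.add(Kind.NOMINAL); break;
                case 'f': oKinds.add(Kind.NUMERICAL); break;
                case 'c': oKinds.add(Kind.CATEGORICAL); break;
                case 'b': oKinds.add(Kind.BINARY); break;
                case 'd': oKinds.add(Kind.DATE); break;
                case '%': oKinds.add(Kind.SKIP); break;
                default:
                    throw new IllegalArgumentException("Unknown column symbol: " + cSymbol);
            }
        }
        return oKinds;
    }

    public static void main(String[] args) {
        // prints [CATEGORICAL, BINARY, BINARY, NUMERICAL, SKIP, SKIP, DATE]
        System.out.println(decode("cbbf%%d"));
    }
}
```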
Default constructor with no parameters. No dataset filename is specified using this constructor.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Constructor with the filename of the dataset as a parameter.
Parameter:
• sFileName The filename of the dataset
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sFileName : String |
This method closes the ARFF file.
Exception:
• IOException
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
This method reads the metadata of an ARFF file. Each attribute specification is interpreted and, if required, the column structure is created in the dataset.
Parameter:
• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.
o s: Nominal column
o f: Numerical (real) column
o c: Categorical column
o b: Binary column
o d: Date column
o %: Skip this column (do not dump its values to any column)
For example, the code "bbf%c" indicates that two binary columns and a numerical (real) column will be read. Then, the fourth attribute will be skipped and, finally, a categorical column will be read.
Exceptions:
• IOException
• InputMismatchException
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
• inout sColumnFormat : String |
This method opens the dataset file using the name passed as a parameter to the constructor.
Exceptions:
• FileNotFoundException
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.
Parameters:
• sContentFormat Not considered for ARFF datasets
• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:
o s: Nominal column
o f: Numerical column
o c: Categorical column
o b: Binary column
o d: Date column
o %: Skip this column
Exceptions:
• NotAddedValueException
• IOException
• IndexOutOfBoundsException
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sColumnFormat : String • inout sContentFormat : String |
This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.
Parameter:
• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:
o s: Nominal column
o f: Numerical column
o c: Categorical column
o b: Binary column
o d: Date column
o %: Skip this column
Exceptions:
• NotAddedValueException
• IOException
• IndexOutOfBoundsException
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sColumnFormat : String |
This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file. The value of the column format string is null.
Exceptions:
• NotAddedValueException
• IOException
• IndexOutOfBoundsException
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method opens the dataset file, writes metadata and instances, and closes the file. The column types accepted (otherwise, an InputMismatchException exception is thrown) are the following:
• Numerical
• Date
• Nominal
• Categorical
• Boolean (binary values are saved as categorical values)
Parameter:
• sOutputFile The filename of the dataset
Exceptions:
• InputMismatchException
• IOException
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sOutputFile : String |
Name |
|
Related Element |
• FileDataset |
CsvDataset implements the CSV (Comma-Separated Values) dataset file specification, as prescribed by the IETF specification, available from http://tools.ietf.org/html/rfc4180 (October, 2005).
Figure 40. Class CsvDataset
Name |
CsvDataset |
Qualified Name |
es::uco::kdis::datapro::dataset::Source::CsvDataset |
Visibility |
public |
Abstract |
false |
Base Classifier |
• FileDataset |
Realized Interface |
|
This method adds all the values in the file to the corresponding column structure.
Parameter:
• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.
o s: Nominal column
o f: Numerical (real) column
o i: Integer column
o c: Categorical column
o %: Skip this column (do not dump its values to any column)
For example, the string “cf%%s” indicates the sequence of attributes that are read from the dataset and copied in memory to the column structure of the dataset: a categorical column and a numerical column. The following two attributes are omitted, and, finally, the nominal attribute is copied.
Exceptions:
• IndexOutOfBoundsException
• IOException
• NotAddedValueException
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
• inout sColumnFormat : String |
This method closes the CSV file.
Exception:
• IOException
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
The default constructor of the CSV dataset with no parameters.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Constructor of the CSV dataset with its filename as a parameter.
Parameter:
• sFileName The filename of the CSV dataset
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sFileName : String |
This method reads the metadata of the CSV file. Notice that any metainformation in CSV files is optional.
Parameters:
• sContentFormat String that specifies the structure of the CSV file. The following symbols are used:
o n: Indicates that a line with the attribute names is read
o v: Indicates the block containing the instance values is read
o %: Skip one row in the file
• sColumnFormat String that specifies the different column types to be read from the file. Each column type is represented by one of the following symbols:
o s: Nominal column
o f: Numerical (real) column
o c: Categorical column
o i: Integer column
o %: Skip this column
Exceptions:
• IOException
• IllegalFormatSpecificationException
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
• inout sColumnFormat : String • inout sContentFormat : String |
This method opens the dataset CSV file using the name passed as a parameter to the constructor.
Exceptions:
• FileNotFoundException
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.
Parameters:
• sContentFormat String that specifies the structure of the CSV file. The following symbols are used:
o n: Indicates that a line with the attribute names is read
o v: Indicates the block containing the instance values is read
o %: Skip one row in the file
For example, “%n%%v” omits the first line, then reads the column names, omits the next two lines and, finally, reads the dataset instances.
• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:
o s: Nominal column
o f: Numerical column
o i: Integer column
o c: Categorical column
o %: Skip this column
Exceptions:
• NotAddedValueException
• IOException
• IndexOutOfBoundsException
• IllegalFormatSpecificationException
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sColumnFormat : String • inout sContentFormat : String |
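The way the content format string drives the reading of a CSV file can be sketched as follows. ContentFormatSketch, Parsed and parse are hypothetical names, not the library's API. The sketch assumes that 'v' consumes every remaining line and that fields are separated by plain commas; quoted fields, as allowed by RFC 4180, are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reader for illustration only; it is not the datapro4j CSV parser.
final class ContentFormatSketch {

    /** Result holder: attribute names (possibly empty) and the raw instance rows. */
    static final class Parsed {
        final List<String> names = new ArrayList<>();
        final List<String[]> rows = new ArrayList<>();
    }

    /** Walks through the lines of the file following the content format symbols. */
    static Parsed parse(List<String> oLines, String sContentFormat) {
        Parsed oParsed = new Parsed();
        int iLine = 0;
        for (char cSymbol : sContentFormat.toCharArray()) {
            switch (cSymbol) {
                case '%': // skip one row
                    iLine++;
                    break;
                case 'n': // read the row with the attribute names
                    if (iLine < oLines.size()) {
                        oParsed.names.addAll(Arrays.asList(oLines.get(iLine).split(",")));
                    }
                    iLine++;
                    break;
                case 'v': // read the block of instances (assumed: all remaining rows)
                    while (iLine < oLines.size()) {
                        oParsed.rows.add(oLines.get(iLine).split(","));
                        iLine++;
                    }
                    break;
                default:
                    throw new IllegalArgumentException("Unknown content symbol: " + cSymbol);
            }
        }
        return oParsed;
    }

    public static void main(String[] args) {
        List<String> oLines = Arrays.asList(
                "exported file", "id,name", "", "", "1,ann", "2,bob");
        Parsed oParsed = parse(oLines, "%n%%v");
        System.out.println(oParsed.names + " / " + oParsed.rows.size() + " instances");
    }
}
```

With the format “%n%%v”, the first line is skipped, the second line provides the attribute names, two more lines are skipped and the rest are read as instances.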
This method opens the dataset, reads metainformation and instances and, finally, closes the dataset file. This method assumes the following file format: one first line with the attribute names (metadata), followed by the instances.
Parameter:
• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:
o s: Nominal column
o f: Numerical column
o i: Integer column
o c: Categorical column
o %: Skip this column
Exceptions:
• NotAddedValueException
• IOException
• IndexOutOfBoundsException
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sColumnFormat : String |
This method writes a new CSV dataset file. The column types allowed for writing are the following:
• Numerical
• Integer
• Nominal
• Categorical
• Binary (binary values are saved as categorical values)
Parameter:
• sOutputFile The filename of the dataset
Exceptions:
• IOException
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sOutputFile : String |
Name |
|
Related Element |
• FileDataset |
ExcelDataset is a class that represents a dataset conformant to the Microsoft Excel standard specification. This type of file has the basic features of all spreadsheets, using a grid of cells arranged in numbered rows and letter-named columns.
Note: This class depends on the external Java library Apache POI.
Figure 41. Class ExcelDataset
Name |
ExcelDataset |
Qualified Name |
es::uco::kdis::datapro::dataset::Source::ExcelDataset |
Visibility |
public |
Abstract |
false |
Base Classifier |
• FileDataset |
Realized Interface |
|
All attributes are private.
This method adds all the values in the DATA block of the file to the corresponding column structure.
Parameter:
• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.
o s: Nominal column
o f: Numerical (real) column
o i: Integer column
o c: Categorical column
o %: Skip this column (do not dump its values to any column)
For example, the string “cf%%s” indicates the sequence of attributes that are read from the dataset and copied in memory to the column structure of the dataset: a categorical column and a numerical column. The following two attributes are omitted, and, finally, the nominal attribute is copied.
Exceptions:
• IndexOutOfBoundsException
• IOException
• NotAddedValueException
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
• inout sColumnFormat : String |
This method closes the Excel file.
Exceptions:
• IOException
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
Default constructor with no parameters.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Constructor with the filename as parameter.
Parameter:
• sFileName The filename of the Excel dataset
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sFileName : String |
This method reads the metadata of the Excel file.
Parameters:
• sContentFormat String that specifies the data structure in the Excel file. The following symbols are used:
o n: Indicates that a line with the attribute names is read
o v: Indicates the block containing the instance values is read
o %: Skip one row in the file
• sColumnFormat String that specifies the different column types to be read from the file. Each column type is represented by one of the following symbols:
o s: Nominal column
o f: Numerical (real) column
o c: Categorical column
o i: Integer column
o %: Skip this column
Exceptions:
• IOException
• IllegalFormatSpecificationException
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
• inout sColumnFormat : String • inout sContentFormat : String |
This method opens the Excel file using the name passed as a parameter to the constructor.
Exceptions:
• FileNotFoundException
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
|
This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.
Parameters:
• sContentFormat String that specifies the structure of the Excel file. The following symbols are used:
o n: Indicates that a line with the attribute names is read
o v: Indicates the block containing the instance values is read
o %: Skip one row in the file
For example, “%n%%v” omits the first line, then reads the column names, omits the next two lines and, finally, reads the dataset instances.
• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:
o s: Nominal column
o f: Numerical column
o i: Integer column
o c: Categorical column
o %: Skip this column
Exceptions:
• NotAddedValueException
• IOException
• IndexOutOfBoundsException
• IllegalFormatSpecificationException
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sColumnFormat : String • inout sContentFormat : String |
This method writes the dataset to a new Excel file. The column types supported for writing are the following:
• Numerical
• Integer
• Nominal
• Categorical
• Binary (binary values are saved as categorical values)
Parameter:
• sOutputFile The filename of the dataset
Exceptions:
• IOException
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sOutputFile : String |
Name |
|
Related Element |
• FileDataset |
KeelDataset is the class representing a dataset conformant to the KEEL (Knowledge Extraction based on Evolutionary Learning) standard specification. KeelDataset is a subclass of ArffDataset.
KEEL files are a specific subtype of ARFF files with the following kind of attributes:
• @ATTRIBUTE name real [value1, value2] for real data
• @ATTRIBUTE name integer [value1, value2] for integer data
• @ATTRIBUTE name {value1, value2, ...} for categorical data
For a more detailed description of this specification, the reader can consult the following reference:
J. Alcalá-Fdez et al. KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework. Journal of Multiple-Valued Logic and Soft Computing 17:2-3 (2011) 255-287.
Also, for further information, visit the website http://www.keel.es.
Figure 42. Class KeelDataset
Name |
KeelDataset |
Qualified Name |
es::uco::kdis::datapro::dataset::Source::KeelDataset |
Visibility |
public |
Abstract |
false |
Base Classifier |
• ArffDataset |
Realized Interface |
|
Constant for the keyword @inputs
Type |
String |
Default Value |
"@inputs" |
Visibility |
protected |
Multiplicity |
|
Constant for the keyword @outputs
Type |
String |
Default Value |
"@outputs" |
Visibility |
protected |
Multiplicity |
|
This method adds all the values in the @DATA block of the file to the corresponding column structure.
Parameter:
• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.
o f: Numerical (real) column
o i: Integer column
o c: Categorical column
o b: Binary column
o %: Skip this column (do not dump its values to any column)
For example, the string “cf%%b” indicates the sequence of attributes that are read from the dataset and copied in memory to the column structure of the dataset: a categorical column and a numerical column. The following two attributes are omitted, and, finally, the binary attribute is copied.
Exceptions:
• IndexOutOfBoundsException
• IOException
• NotAddedValueException
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
• inout sColumnFormat : String |
Default constructor with no parameters.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Constructor with the filename of the dataset as a parameter.
Parameter:
• sFileName The filename containing the dataset
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sFileName : String |
This method reads the metadata of the KEEL file.
Parameter:
• sColumnFormat String that specifies the different column types to be read from the file. Each column type is represented by one of the following symbols:
o b: Binary column
o f: Numerical (real) column
o c: Categorical column
o i: Integer column
o %: Skip this column
Exceptions:
• IOException
• IllegalFormatSpecificationException
Type |
void |
Visibility |
protected |
Is Abstract |
false |
Parameter |
• inout sColumnFormat : String |
This method writes the dataset to a new KEEL file. Only the following types of column are supported for writing:
• Numerical (real)
• Integer
• Categorical
Parameter:
• sOutputFile The filename of the dataset
Exceptions:
• IOException
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout sOutputFile : String |
Name |
|
Related Element |
• ArffDataset |
Figure 43. Package es.uco.kdis.datapro.datatypes
Name |
datatypes |
Qualified Name |
es::uco::kdis::datapro::datatypes |
This abstract class represents any invalid value in a column. This is the base class of the following types of invalid values:
• Missing values.
• Null values.
• Empty values.
For a more detailed description, see the following reference:
Pyle, D. Data preparation for data mining. Morgan Kaufmann, 1999. ISBN: 1-55860-529-0.
Note. Columns may define their own invalid values. However, these values are not processed by the library; they are only intended for serialization and specific algorithms. In general, the objects provided here for invalid values are sufficient for regular use. Furthermore, these objects are notation-independent and only used for data processing.
Figure 44. Class InvalidValue
Name |
InvalidValue |
Qualified Name |
es::uco::kdis::datapro::datatypes::InvalidValue |
Visibility |
public |
Abstract |
true |
Base Classifier |
|
Realized Interface |
|
Name |
|
Related Element |
• MissingValue |
Name |
|
Related Element |
• EmptyValue |
Name |
|
Related Element |
• NullValue |
This class represents an empty value in a variable, i.e., the one for which no real-world value can be supposed.
This class implements a singleton, so only one instance exists at any time. The instance is obtained using the method getEmptyValue. Therefore, empty values can be compared using the operator ‘==’.
Figure 45. Class EmptyValue
Name |
EmptyValue |
Qualified Name |
es::uco::kdis::datapro::datatypes::EmptyValue |
Visibility |
public |
Abstract |
false |
Base Classifier |
• InvalidValue |
Realized Interface |
|
All attributes are private.
Singleton constructor for the object representing an empty value.
Type |
EmptyValue |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• InvalidValue |
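The singleton behaviour described above can be sketched as follows. InvalidValueSketch and EmptyValueSketch are hypothetical stand-ins for the library classes. Eager initialization is an assumption made for brevity; the library may create its instance differently.

```java
// Hypothetical stand-ins for illustration only; they are not the datapro4j classes.
final class SingletonDemo {
    public static void main(String[] args) {
        Object oCell = EmptyValueSketch.getEmptyValue();
        // Identity comparison is safe because only one instance exists.
        System.out.println(oCell == EmptyValueSketch.getEmptyValue()); // prints true
    }
}

/** Base class of all invalid values. */
abstract class InvalidValueSketch {
}

/** Singleton representing an empty value. */
final class EmptyValueSketch extends InvalidValueSketch {

    // The only instance, created when the class is loaded (an assumption of this sketch).
    private static final EmptyValueSketch INSTANCE = new EmptyValueSketch();

    // The private constructor prevents instantiation from other classes.
    private EmptyValueSketch() {
    }

    /** Returns the unique instance. */
    static EmptyValueSketch getEmptyValue() {
        return INSTANCE;
    }
}
```

The same structure would apply to missing and null values, each with its own accessor method.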
This class represents a missing value in a variable, i.e., the one that has not been entered into the dataset, but for which an actual value exists in the real-world in which the measurements were made.
This class implements a singleton, so only one instance exists at any time. The instance is obtained using the method getMissingValue. Therefore, missing values can be compared using the operator ‘==’.
Figure 46. Class MissingValue
Name |
MissingValue |
Qualified Name |
es::uco::kdis::datapro::datatypes::MissingValue |
Visibility |
public |
Abstract |
false |
Base Classifier |
• InvalidValue |
Realized Interface |
|
All attributes are private.
Singleton constructor for the object representing a missing value.
Type |
MissingValue |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• InvalidValue |
This class represents an explicit null value in a variable.
This class implements a singleton, so only one instance exists at any time. The instance is obtained using the method getNullValue. Therefore, null values can be compared using the operator ‘==’. Its use allows the programmer to replace null values with comparable object instances (e.g. in collections, comparisons, etc.).
Figure 47. Class NullValue
Name |
NullValue |
Qualified Name |
es::uco::kdis::datapro::datatypes::NullValue |
Visibility |
public |
Abstract |
false |
Base Classifier |
• InvalidValue |
Realized Interface |
|
All attributes are private.
Singleton constructor for the object representing a null value.
Type |
NullValue |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• InvalidValue |
This class is a template to represent any kind of interval delimited by a minimum and a maximum limit. Each boundary can be open or closed, indicating that the boundary value is excluded from or included in the range, respectively. The template parameter C is the class of the objects in the range.
Figure 48. Class Range
Name |
Range |
Qualified Name |
es::uco::kdis::datapro::datatypes::Range |
Visibility |
public |
Abstract |
true |
Base Classifier |
|
Realized Interface |
|
Protected attributes with accessors (getter/setter) are omitted.
This method returns the upper interval boundary value, i.e. the maximum value in the interval (the programmer has to check whether the interval is open or closed).
Type |
C |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method returns the lower interval boundary value, i.e. the minimum value in the interval (the programmer has to check whether the interval is open or closed).
Type |
C |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method returns a boolean value indicating whether the upper interval boundary is open, i.e. the maximum value is excluded from the range.
Type |
boolean |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method returns a boolean value indicating whether the lower interval boundary is open, i.e. the minimum value is excluded from the range.
Type |
boolean |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
This method sets the upper interval boundary.
Parameter:
• oMax The new maximum value
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oMax : C |
This method sets the lower interval boundary.
Parameter:
• oMin The new minimum value
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout oMin : C |
This method sets the upper interval boundary to open or closed.
Parameter:
• bOpenMax True if open; false if closed.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout bOpenMax : boolean |
This method sets the lower interval boundary to open or closed.
Parameter:
• bOpenMin True if open; false if closed.
Type |
void |
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout bOpenMin : boolean |
Name |
|
Related Element |
• Range<Double> |
This class is a specialization of the template Range, where the template parameter is of type Double.
Figure 49. Class DoubleRange
Name |
DoubleRange |
Qualified Name |
es::uco::kdis::datapro::datatypes::DoubleRange |
Visibility |
public |
Abstract |
false |
Base Classifier |
• Range<Double> |
Realized Interface |
|
Default constructor with no parameters. By default, the lower and upper interval boundaries are set to negative and positive infinity, respectively.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Constructor with parameters.
Parameters:
• dMin The minimum value of the range, i.e. the lower interval boundary.
• dMax The maximum value of the range, i.e. the upper interval boundary.
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• in dMax : double • in dMin : double |
This method returns true if the value passed as a parameter is a valid value in the interval.
Parameter:
• dValue The value to be checked.
Type |
boolean |
Visibility |
public |
Is Abstract |
false |
Parameter |
• in dValue : double |
This method returns the interval in a String format. The output format is as follows:
‘[’ | ‘(’ <min> ‘,’ <max> ‘)’ | ‘]’
where a square bracket denotes a closed boundary and a parenthesis denotes an open boundary.
Type |
String |
Visibility |
public |
Is Abstract |
false |
Parameter |
|
Name |
|
Related Element |
• Range<Double> |
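The membership check and the string format of an interval with open or closed boundaries can be sketched as follows. DoubleRangeSketch is a hypothetical, simplified class and not the library's DoubleRange; in particular, the constructor with four arguments and the exact spacing of the output are assumptions of this sketch.

```java
// Hypothetical, simplified interval for illustration only; it is not the datapro4j DoubleRange.
final class DoubleRangeSketch {

    private final double dMin;
    private final double dMax;
    private final boolean bOpenMin;
    private final boolean bOpenMax;

    DoubleRangeSketch(double dMin, double dMax, boolean bOpenMin, boolean bOpenMax) {
        this.dMin = dMin;
        this.dMax = dMax;
        this.bOpenMin = bOpenMin;
        this.bOpenMax = bOpenMax;
    }

    /** Returns true if dValue lies in the interval, honouring open and closed boundaries. */
    boolean contains(double dValue) {
        boolean bAboveMin = bOpenMin ? dValue > dMin : dValue >= dMin;
        boolean bBelowMax = bOpenMax ? dValue < dMax : dValue <= dMax;
        return bAboveMin && bBelowMax;
    }

    /** Square brackets mark closed boundaries; parentheses mark open ones. */
    @Override
    public String toString() {
        return (bOpenMin ? "(" : "[") + dMin + "," + dMax + (bOpenMax ? ")" : "]");
    }

    public static void main(String[] args) {
        DoubleRangeSketch oRange = new DoubleRangeSketch(0.0, 1.0, false, true);
        System.out.println(oRange);               // prints [0.0,1.0)
        System.out.println(oRange.contains(0.0)); // prints true (closed lower boundary)
        System.out.println(oRange.contains(1.0)); // prints false (open upper boundary)
    }
}
```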
Figure 50. Package es.uco.kdis.datapro.exception
Name |
exception |
Qualified Name |
es::uco::kdis::datapro::exception |
This class is the exception indicating that the file format under consideration does not fulfill the expected standards for such a specification.
Figure 51. Class IllegalFormatSpecificationException
Name |
IllegalFormatSpecificationException |
Qualified Name |
es::uco::kdis::datapro::exception::IllegalFormatSpecificationException |
Visibility |
public |
Abstract |
false |
Base Classifier |
• Exception |
Realized Interface |
|
All attributes are private.
Constructor with the error message as a parameter.
Parameter:
• string Error message
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout string : String |
Name |
|
Related Element |
• Exception |
This class is the exception indicating that a certain element does not belong to the specified category, or that a category is not found.
Figure 52. Class NoSuchCategoryException
Name |
NoSuchCategoryException |
Qualified Name |
es::uco::kdis::datapro::exception::NoSuchCategoryException |
Visibility |
public |
Abstract |
false |
Base Classifier |
• Exception |
Realized Interface |
|
All attributes are private.
Constructor with the error message as a parameter.
Parameter:
• string Error message
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout string : String |
Name |
|
Related Element |
• Exception |
This class is the exception indicating that a value was not successfully added to the dataset.
Figure 53. Class NotAddedValueException
Name |
NotAddedValueException |
Qualified Name |
es::uco::kdis::datapro::exception::NotAddedValueException |
Visibility |
public |
Abstract |
false |
Base Classifier |
• Exception |
Realized Interface |
|
All attributes are private.
Constructor with the error message as a parameter.
Parameter:
• string Error message
Type |
|
Visibility |
public |
Is Abstract |
false |
Parameter |
• inout string : String |
Generalization
Name |
|
Related Element |
• Exception |
This appendix shows the class diagrams that represent the structure of datapro4j. This is the general package overview. The different packages are shown next.
Figure 54. Class diagram: package overview
Figure 55. Class diagram: package es.uco.kdis.datapro.algorithm.base
Figure 56. Class diagram: Package es.uco.kdis.datapro.algorithm.preprocessing
Figure 57. Class diagram: Package es.uco.kdis.datapro.dataset.Column
Figure 58. Package es.uco.kdis.datapro.dataset.Source
Package es.uco.kdis.datapro.datatypes
Figure 59. Class diagram: Package es.uco.kdis.datapro.datatypes
Package es.uco.kdis.datapro.exception
Figure 60. Class diagram: Package es.uco.kdis.datapro.exception
This project is structured in three different parts:
1. Column structure.
2. Datasets hierarchy.
3. Strategies.
If the programmer wants to develop new columns or adapt an existing one to his own requirements, he should bear in mind the strict separation between abstraction and implementation. The former implements the methods devoted to managing the column metainformation and delegates any processing, handling or query related to the actual column values to its implementation. For further information, see the Bridge design pattern (http://en.wikipedia.org/wiki/Bridge_pattern).
We recommend the following guidelines for the development of new columns:
• Column classes should be located in the package es.uco.kdis.datapro.dataset.Column
• For a given type of column, namely X, the abstraction class will be named XColumn, and its implementation class, XColumnImpl.
• The new column X has to be added to the enumeration ColumnType. This value is returned by the column as its type.
• A column implementation should not be accessed directly from any class other than its abstraction.
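The separation between abstraction and implementation recommended above can be sketched with a hypothetical column type named Score. ScoreColumn and ScoreColumnImpl follow the XColumn / XColumnImpl naming rule but are invented for this example; they do not extend the real library classes, whose methods are documented elsewhere in this guide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Bridge-style column for illustration only; not derived from datapro4j classes.
final class BridgeDemo {
    public static void main(String[] args) {
        ScoreColumn oColumn = new ScoreColumn("score");
        oColumn.addValue(7.5);
        oColumn.addValue(9.0);
        // prints "score has 2 values"
        System.out.println(oColumn.getName() + " has " + oColumn.size() + " values");
    }
}

/** Abstraction: manages the metainformation (here, only the name) and delegates value handling. */
final class ScoreColumn {

    private final String sName;
    private final ScoreColumnImpl oImpl = new ScoreColumnImpl();

    ScoreColumn(String sName) {
        this.sName = sName;
    }

    String getName() {
        return sName;
    }

    void addValue(double dValue) {
        oImpl.add(dValue);
    }

    double getValue(int iIndex) {
        return oImpl.get(iIndex);
    }

    int size() {
        return oImpl.size();
    }
}

/** Implementation: stores and queries the actual values; only ScoreColumn accesses it. */
final class ScoreColumnImpl {

    private final List<Double> oValues = new ArrayList<>();

    void add(double dValue) {
        oValues.add(dValue);
    }

    double get(int iIndex) {
        return oValues.get(iIndex);
    }

    int size() {
        return oValues.size();
    }
}
```

Because client code only sees ScoreColumn, the storage strategy in ScoreColumnImpl could be replaced without affecting it.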
The library provides a finite, but growing, number of dataset implementations (ARFF, Keel, CSV, MySql, ...), and its architecture permits the programmer to extend this part to make his own datasets of interest available. Dataset classes rarely inherit directly from the top abstract class Dataset; for design reasons, it is advisable to create, use and maintain a class hierarchy in which common properties (both structural and behavioural) are defined. For example, ARFF and CSV datasets inherit from the common file-based dataset, i.e. the abstract class FileDataset. Their respective classes only define those properties that are specific to these kinds of file, whereas the properties common to all file-based datasets are defined by intermediate abstract classes. Dataset is always the root of this hierarchy, since this class links the physical dataset to the logical column structure.
Some guidelines to be considered:
• Dataset abstract classes for defining common properties are located in the package es.uco.kdis.datapro.dataset
• Dataset concrete classes are located in the package es.uco.kdis.datapro.dataset.Source
• Dataset classes should be named with the suffix “Dataset”, e.g. CsvDataset.
Apart from the constructor (with or without parameters), the main methods to pay attention to are inherited from the abstract class Dataset:
• readDataset, which allows the programmer to configure the type of columns to be filled, as well as the dataset structure.
• writeDataset, which permits the programmer to save the current dataset values in the specific format.
These methods should fulfill the following assumptions:
• When reading, the format can vary or contain errors (invalid values, missing or wrong structure, etc.).
• When reading, the original structure (metadata) of the dataset should be retained somehow.
• When writing, the dataset may or may not have been read from a dataset of the same type:
o If the source dataset is of the same format, the programmer may want to overwrite it or generate a new dataset. In both cases, the resulting dataset should maintain the same structure (e.g. column types and metadata) as the source dataset.
o If the dataset to be written is of a different type than the source dataset (or of the same type with a different structure), the programmer may want to specify the type of columns to be declared in the resulting dataset.
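The class hierarchy recommended for new datasets can be sketched with a hypothetical tab-separated format. DatasetSketch, FileDatasetSketch and TsvDatasetSketch are simplified stand-ins, not the library's Dataset and FileDataset classes: the signatures of readDataset and writeDataset shown here, and the storage of rows as plain strings, are assumptions made to keep the example self-contained. Only the skip symbol '%' of the column format is honoured.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical hierarchy for illustration only; these are not the datapro4j classes.
final class HierarchyDemo {
    public static void main(String[] args) throws IOException {
        Path oIn = Files.createTempFile("sketch-in", ".tsv");
        Files.write(oIn, Arrays.asList("a\tb\tc", "1\t2\t3"));
        TsvDatasetSketch oDataset = new TsvDatasetSketch(oIn.toString());
        oDataset.readDataset("s%s"); // keep the first and third attributes
        Path oOut = Files.createTempFile("sketch-out", ".tsv");
        oDataset.writeDataset(oOut.toString());
        System.out.println(Files.readAllLines(oOut));
    }
}

/** Root of the hierarchy: links the physical dataset to the in-memory structure. */
abstract class DatasetSketch {
    protected final List<List<String>> oRows = new ArrayList<>();

    abstract void readDataset(String sColumnFormat) throws IOException;

    abstract void writeDataset(String sOutputFile) throws IOException;
}

/** Intermediate abstract class: properties shared by all file-based datasets. */
abstract class FileDatasetSketch extends DatasetSketch {
    protected final String sFileName;

    FileDatasetSketch(String sFileName) {
        this.sFileName = sFileName;
    }
}

/** Concrete class: only what is specific to the tab-separated format. */
final class TsvDatasetSketch extends FileDatasetSketch {

    TsvDatasetSketch(String sFileName) {
        super(sFileName);
    }

    @Override
    void readDataset(String sColumnFormat) throws IOException {
        oRows.clear();
        for (String sLine : Files.readAllLines(Paths.get(sFileName))) {
            String[] asCells = sLine.split("\t", -1);
            List<String> oKept = new ArrayList<>();
            for (int i = 0; i < asCells.length && i < sColumnFormat.length(); i++) {
                if (sColumnFormat.charAt(i) != '%') {
                    oKept.add(asCells[i]);
                }
            }
            oRows.add(oKept);
        }
    }

    @Override
    void writeDataset(String sOutputFile) throws IOException {
        List<String> oLines = new ArrayList<>();
        for (List<String> oRow : oRows) {
            oLines.add(String.join("\t", oRow));
        }
        Files.write(Paths.get(sOutputFile), oLines);
    }
}
```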
Strategies are the core and most scalable element of the library. Strategies implement algorithms on data. Strategies are independent of a specific dataset, so they can make use of more than one dataset. See DatasetStrategy in this guide for more information on the methods that should be implemented.
To implement your own algorithms, the following guidelines should be considered:
• Every algorithm should be a subclass of DatasetStrategy.
• Algorithms are grouped in packages under es.uco.kdis.datapro.algorithm
• Only the package es.uco.kdis.datapro.algorithm.base is required by the library. The remaining packages under es.uco.kdis.datapro.algorithm could be excluded from the programmer’s distribution. Notice that each specific algorithm package may have its own external dependencies.
Apart from the specific packages for columns, datasets and strategies, there are some other relevant packages to consider that may be extended as well:
• es.uco.kdis.datapro.datatypes: this package implements the auxiliary classes and datatypes used by datapro4j, for example, the classes declaring invalid values, ranges, etc.
• es.uco.kdis.datapro.exception: this package implements the exception classes. The programmer should look for a suitable standard Java exception before implementing his own class, to avoid cluttering the library with unnecessary classes.
Class headings are documented according to the following structure: class description, contact info and history.
/**
 * CLASS DESCRIPTION
 *
 * <p>
 * CONTACT INFO:
 * <ul>
 * <li>Jose Raul Romero, PhD [jrromero@uco.es]
 * <p>{@link http://www.jrromero.net}
 * <p><p>
 * Knowledge Discovery and Intelligent Systems Research Group (KDIS) <p>
 * {@link http://www.uco.es/grupos/kdis}
 * </ul>
 * <p>
 * HISTORY:
 * <ul>
 * <li> INCLUDE HERE THE LIST OF CHANGES TO THIS SPECIFIC FILE
 * </ul>
 * <p>
 *
 * @author Jose Raul Romero (JRR, 0.2, 0.3) EXAMPLE OF AUTHORS, INITIALS, VERSIONS
 * @author Jose Maria Luna (JML, 0.1)
 * @version 0.3
 *
 **/
Each parameter and method should follow the Javadoc notation for documenting the code.
Further, remember to include the file license.txt in every distribution that includes the library or part of it.
1. Code should be implemented following the Hungarian notation.
2. Code and comments should be written in English.
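As a brief illustration of the first rule, the sketch below uses the type prefixes that appear in the method signatures of this guide (s for String, i for int, d for double, b for boolean, o for objects). The class HungarianNotationSketch and its method describe are hypothetical and serve only to show the naming style.

```java
// Hypothetical class for illustration only; it shows the naming style, not a library feature.
final class HungarianNotationSketch {

    /** Each parameter name starts with a prefix that states its type. */
    static String describe(String sFileName, int iMode, double dMean,
                           boolean bOpenMax, Object oValue) {
        return sFileName + ":" + iMode + ":" + dMean + ":" + bOpenMax + ":" + oValue;
    }

    public static void main(String[] args) {
        // prints data.csv:2:0.5:true:x
        System.out.println(describe("data.csv", 2, 0.5, true, "x"));
    }
}
```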