yat
0.14.5pre
Class splitting Data into training and validation sets.
#include <yat/classifier/SubsetGenerator.h>
Public Types
typedef Data value_type
Public Member Functions
SubsetGenerator (const Sampler &sampler, const Data &data)
Create SubDataSets.
SubsetGenerator (const Sampler &sampler, const Data &data, FeatureSelector &fs)
Create SubDataSets with feature selection.
~SubsetGenerator ()
size_t size (void) const
const Target & target (void) const
const Data & training_data (size_t i) const
const utility::Index & training_features (size_t i) const
const utility::Index & training_index (size_t i) const
const Target & training_target (size_t i) const
const Data & validation_data (size_t i) const
const utility::Index & validation_index (size_t i) const
const Target & validation_target (size_t i) const
Class splitting Data into training and validation sets.
A SubsetGenerator splits a Data object into several pairs of training and validation data sets. A Sampler selects the samples for each training Data set and the corresponding validation Data set. In addition, a FeatureSelector can be used to select features. For more details, see the constructors.
typedef Data theplu::yat::classifier::SubsetGenerator< Data >::value_type
Type of Data that is stored in SubsetGenerator.
theplu::yat::classifier::SubsetGenerator< Data >::SubsetGenerator (const Sampler &sampler, const Data &data)
Create SubDataSets.
Creates N training data sets and N validation data sets, where N equals the size of sampler. Data must be one of MatrixLookup, MatrixLookupWeighted, or KernelLookup.
In case of MatrixLookup or MatrixLookupWeighted, each column corresponds to a sample and the sampler is used to select columns. Sampler::training_index(size_t) is used to select columns for the corresponding training_data, and Sampler::validation_index(size_t) is used to select columns for the corresponding validation_data.
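The column selection described above can be sketched in a few lines. This is an illustration only, not yat code: Matrix, Partition, Subset, select_columns and make_subsets are hypothetical names, with a nested vector standing in for MatrixLookup and a pair of index vectors standing in for what Sampler::training_index(size_t) and Sampler::validation_index(size_t) return.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a data matrix: rows are features, columns are samples.
using Matrix = std::vector<std::vector<double>>;

// Hypothetical stand-in for one sampler partition: column indices of the
// training samples and of the validation samples.
struct Partition {
  std::vector<std::size_t> training;
  std::vector<std::size_t> validation;
};

// One training/validation pair of data sets.
struct Subset {
  Matrix training_data;
  Matrix validation_data;
};

// Returns a copy of data restricted to the given columns; all rows are kept.
Matrix select_columns(const Matrix& data,
                      const std::vector<std::size_t>& columns)
{
  Matrix result;
  result.reserve(data.size());
  for (const std::vector<double>& row : data) {
    std::vector<double> selected;
    selected.reserve(columns.size());
    for (std::size_t c : columns) {
      assert(c < row.size());
      selected.push_back(row[c]);
    }
    result.push_back(selected);
  }
  return result;
}

// Builds one Subset per partition, so N partitions give N training data
// sets and N validation data sets.
std::vector<Subset> make_subsets(const Matrix& data,
                                 const std::vector<Partition>& partitions)
{
  std::vector<Subset> subsets;
  subsets.reserve(partitions.size());
  for (const Partition& p : partitions) {
    Subset s;
    s.training_data = select_columns(data, p.training);
    s.validation_data = select_columns(data, p.validation);
    subsets.push_back(s);
  }
  return subsets;
}
```

In this sketch the number of rows is unchanged, since without a FeatureSelector every feature is kept in each subset.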
A KernelLookup is handled differently. A symmetric training kernel is created using Sampler::training_index(size_t) to select both rows and columns. The validation kernel is typically not symmetric: each column corresponds to a validation sample and each row corresponds to a training sample. Consequently, Sampler::training_index(size_t) is used to select rows, and Sampler::validation_index(size_t) is used to select columns.
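The row and column selection for kernels can be illustrated with a small sketch. It is not yat code: Kernel, select, training_kernel and validation_kernel are hypothetical names, and a nested vector holding a precomputed symmetric kernel matrix stands in for KernelLookup.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel: element (i, j) holds the kernel value
// between sample i and sample j of the full data set.
using Kernel = std::vector<std::vector<double>>;
using Index = std::vector<std::size_t>;

// Returns the sub-matrix of k with the given rows and columns.
Kernel select(const Kernel& k, const Index& rows, const Index& cols)
{
  Kernel result(rows.size(), std::vector<double>(cols.size(), 0.0));
  for (std::size_t r = 0; r < rows.size(); ++r) {
    assert(rows[r] < k.size());
    for (std::size_t c = 0; c < cols.size(); ++c) {
      assert(cols[c] < k[rows[r]].size());
      result[r][c] = k[rows[r]][cols[c]];
    }
  }
  return result;
}

// Training kernel: training samples select both rows and columns, so the
// result is square and symmetric whenever the full kernel is symmetric.
Kernel training_kernel(const Kernel& full, const Index& training)
{
  return select(full, training, training);
}

// Validation kernel: rows are training samples, columns are validation
// samples, so the result is in general rectangular.
Kernel validation_kernel(const Kernel& full, const Index& training,
                         const Index& validation)
{
  return select(full, training, validation);
}
```

With two training samples and one validation sample, for instance, the training kernel is 2 x 2 and the validation kernel is 2 x 1.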
sampler | Sampler that is used to select samples. |
data | Data to split up in validation and training. |
theplu::yat::classifier::SubsetGenerator< Data >::SubsetGenerator (const Sampler &sampler, const Data &data, FeatureSelector &fs)
Create SubDataSets with feature selection.
Creates N training data sets and N validation data sets, where N equals the size of sampler. The Sampler defines which samples are included in a subset. Likewise, a FeatureSelector, fs, is used to select features. The selection is based not on the entire dataset but solely on the training dataset. Data must be one of MatrixLookup, MatrixLookupWeighted, or KernelLookup.
In case of MatrixLookup or MatrixLookupWeighted, each column corresponds to a sample and the sampler is used to select columns. Sampler::training_index(size_t) is used to select columns for the corresponding training_data, and Sampler::validation_index(size_t) is used to select columns for the corresponding validation_data. The FeatureSelector is used to select features, i.e., to select rows to be included in the subsets.
A KernelLookup is handled differently. A symmetric training kernel is created using Sampler::training_index(size_t) to select both rows and columns. However, the created KernelLookup is not simply the subkernel of data; each element is recalculated using the features selected by FeatureSelector fs. In the validation kernel each column corresponds to a validation sample and each row corresponds to a training sample. Consequently, Sampler::training_index(size_t) is used to select rows, and Sampler::validation_index(size_t) is used to select columns. The same set of features is used to calculate the elements as for the training kernel, i.e., feature selection is based on training data.
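The following sketch illustrates why selecting features on the training samples alone matters, and how kernel elements can be recalculated from the selected features. It rests on several assumptions that are not part of yat: the selection rule (keep the k features with the largest variance over the training samples) is a stand-in for whatever a FeatureSelector implements, the kernel is a plain linear kernel, and Matrix, Index, variance, select_features and linear_kernel are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a data matrix: rows are features, columns are samples.
using Matrix = std::vector<std::vector<double>>;
using Index = std::vector<std::size_t>;

// Population variance of one feature (row) over the given sample columns.
double variance(const std::vector<double>& row, const Index& columns)
{
  assert(!columns.empty());
  double mean = 0.0;
  for (std::size_t c : columns) {
    assert(c < row.size());
    mean += row[c];
  }
  mean /= static_cast<double>(columns.size());
  double sum_sq = 0.0;
  for (std::size_t c : columns)
    sum_sq += (row[c] - mean) * (row[c] - mean);
  return sum_sq / static_cast<double>(columns.size());
}

// Assumed selection rule: indices of the k features with the largest
// variance, computed over the training samples only. Validation samples
// never influence the choice.
Index select_features(const Matrix& data, const Index& training, std::size_t k)
{
  assert(k <= data.size());
  std::vector<double> score(data.size());
  for (std::size_t f = 0; f < data.size(); ++f)
    score[f] = variance(data[f], training);
  Index order(data.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&score](std::size_t a, std::size_t b) {
                     return score[a] > score[b];
                   });
  order.resize(k);
  std::sort(order.begin(), order.end());  // report features in row order
  return order;
}

// Linear kernel recalculated from the selected features only. Rows follow
// row_samples and columns follow col_samples.
Matrix linear_kernel(const Matrix& data, const Index& features,
                     const Index& row_samples, const Index& col_samples)
{
  Matrix k(row_samples.size(), std::vector<double>(col_samples.size(), 0.0));
  for (std::size_t r = 0; r < row_samples.size(); ++r)
    for (std::size_t c = 0; c < col_samples.size(); ++c)
      for (std::size_t f : features) {
        assert(f < data.size());
        k[r][c] += data[f][row_samples[r]] * data[f][col_samples[c]];
      }
  return k;
}
```

Under these assumptions, the training kernel would be linear_kernel(data, features, training, training) and the validation kernel linear_kernel(data, features, training, validation), with features obtained once from select_features(data, training, k) and reused for both.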
sampler | Sampler taking care of partitioning the dataset |
data | data to be split up in validation and training. |
fs | Object selecting features for each subset |
theplu::yat::classifier::SubsetGenerator< Data >::~SubsetGenerator ()
Destructor
size_t theplu::yat::classifier::SubsetGenerator< Data >::size (void) const
const Target & theplu::yat::classifier::SubsetGenerator< Data >::target (void) const
const Data & theplu::yat::classifier::SubsetGenerator< Data >::training_data (size_t i) const
See constructors for details on how training data are generated.
const utility::Index & theplu::yat::classifier::SubsetGenerator< Data >::training_features (size_t i) const
Features that are used to create ith training data and validation data.
const utility::Index & theplu::yat::classifier::SubsetGenerator< Data >::training_index (size_t i) const
const Target & theplu::yat::classifier::SubsetGenerator< Data >::training_target (size_t i) const
const Data & theplu::yat::classifier::SubsetGenerator< Data >::validation_data (size_t i) const
See constructors for details on how validation data are generated.
const utility::Index & theplu::yat::classifier::SubsetGenerator< Data >::validation_index (size_t i) const
const Target & theplu::yat::classifier::SubsetGenerator< Data >::validation_target (size_t i) const