Class splitting Data into training and validation sets.

#include <yat/classifier/SubsetGenerator.h>

Public Types
typedef Data	value_type

Public Member Functions
	SubsetGenerator (const Sampler &sampler, const Data &data)
	Create SubDataSets.

	SubsetGenerator (const Sampler &sampler, const Data &data, FeatureSelector &fs)
	Create SubDataSets with feature selection.

	~SubsetGenerator ()

size_t	size (void) const

const Target &	target (void) const

const Data &	training_data (size_t i) const

const utility::Index &	training_features (size_t i) const

const utility::Index &	training_index (size_t i) const

const Target &	training_target (size_t i) const

const Data &	validation_data (size_t i) const

const utility::Index &	validation_index (size_t i) const

const Target &	validation_target (size_t i) const

Detailed Description

template<typename Data>
class theplu::yat::classifier::SubsetGenerator< Data >

Class splitting Data into training and validation sets.

A SubsetGenerator splits a Data object into several pairs of training and validation data sets. A Sampler selects the samples for each training data set and for the corresponding validation data set. In addition, a FeatureSelector can be used to select features. For more details, see the constructors.

Note: Data must be one of MatrixLookup, MatrixLookupWeighted, or KernelLookup.

Member Typedef Documentation

template<typename Data >

typedef Data theplu::yat::classifier::SubsetGenerator< Data >::value_type

type of Data that is stored in SubsetGenerator

Constructor & Destructor Documentation

template<typename Data >

theplu::yat::classifier::SubsetGenerator< Data >::SubsetGenerator	(	const Sampler &	sampler,
		const Data &	data
	)

Create SubDataSets.

Creates N training data sets and N validation data sets, where N equals the size of sampler. Data must be one of MatrixLookup, MatrixLookupWeighted, or KernelLookup.

In case of MatrixLookup or MatrixLookupWeighted, each column corresponds to a sample and the sampler is used to select columns. Sampler::training_index(size_t) is used to select columns for the corresponding training_data, and Sampler::validation_index(size_t) is used to select columns for the corresponding validation_data.
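
The column selection can be illustrated without yat itself. The following is a minimal sketch, assuming a plain std::vector matrix in place of MatrixLookup; the alias Matrix and the helper select_columns are hypothetical names and are not part of yat.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a MatrixLookup:
// rows are features, columns are samples.
using Matrix = std::vector<std::vector<double>>;

// Return a copy of `data` restricted to the columns (samples) listed in
// `index`. All rows (features) are kept.
Matrix select_columns(const Matrix& data, const std::vector<std::size_t>& index)
{
  Matrix result;
  result.reserve(data.size());
  for (const std::vector<double>& row : data) {
    std::vector<double> new_row;
    new_row.reserve(index.size());
    for (std::size_t col : index) {
      assert(col < row.size());  // each index must refer to an existing sample
      new_row.push_back(row[col]);
    }
    result.push_back(new_row);
  }
  return result;
}
```

With a training index of {0, 2} and a validation index of {1}, a matrix with 2 rows and 3 columns gives a training matrix with 2 rows and 2 columns and a validation matrix with 2 rows and 1 column.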

In case of a KernelLookup the procedure is slightly different. A symmetric training kernel is created using Sampler::training_index(size_t) to select both rows and columns. The validation kernel is typically not symmetric: each column corresponds to a validation sample and each row corresponds to a training sample. Consequently Sampler::training_index(size_t) is used to select rows, and Sampler::validation_index(size_t) is used to select columns.
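
The row and column selection for kernels can be sketched as follows. This is an illustration only, assuming a dense std::vector kernel matrix in place of KernelLookup; Kernel, select_kernel, training_kernel and validation_kernel are hypothetical names and are not part of yat.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a KernelLookup:
// kernel[i][j] holds the kernel value between sample i and sample j.
using Kernel = std::vector<std::vector<double>>;

// Return the sub-kernel made of the given rows and columns.
Kernel select_kernel(const Kernel& kernel,
                     const std::vector<std::size_t>& rows,
                     const std::vector<std::size_t>& columns)
{
  Kernel result;
  result.reserve(rows.size());
  for (std::size_t r : rows) {
    assert(r < kernel.size());
    std::vector<double> new_row;
    new_row.reserve(columns.size());
    for (std::size_t c : columns) {
      assert(c < kernel[r].size());
      new_row.push_back(kernel[r][c]);
    }
    result.push_back(new_row);
  }
  return result;
}

// Training kernel: training samples on both rows and columns, so a
// symmetric input kernel gives a symmetric result.
Kernel training_kernel(const Kernel& kernel,
                       const std::vector<std::size_t>& training)
{
  return select_kernel(kernel, training, training);
}

// Validation kernel: training samples as rows, validation samples as columns.
Kernel validation_kernel(const Kernel& kernel,
                         const std::vector<std::size_t>& training,
                         const std::vector<std::size_t>& validation)
{
  return select_kernel(kernel, training, validation);
}
```

For three samples with training index {0, 2} and validation index {1}, the training kernel has 2 rows and 2 columns, and the validation kernel has 2 rows and 1 column.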

Parameters

sampler	Sampler that is used to select samples.
data	Data to split into training and validation sets.

template<typename Data >

theplu::yat::classifier::SubsetGenerator< Data >::SubsetGenerator	(	const Sampler &	sampler,
		const Data &	data,
		FeatureSelector &	fs
	)

Create SubDataSets with feature selection.

Creates N training data sets and N validation data sets, where N equals the size of sampler. The Sampler defines which samples are included in a subset. Likewise a FeatureSelector, fs, is used to select features. The selection is not based on the entire dataset but solely on the training dataset. Data must be one of MatrixLookup, MatrixLookupWeighted, or KernelLookup.

In case of MatrixLookup or MatrixLookupWeighted, each column corresponds to a sample and the sampler is used to select columns. Sampler::training_index(size_t) is used to select columns for the corresponding training_data, and Sampler::validation_index(size_t) is used to select columns for the corresponding validation_data. The FeatureSelector is used to select features, i.e., to select rows to be included in the subsets.
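
The point that features are chosen from the training samples only, and then applied to both subsets, can be sketched as follows. This is an illustration under stated assumptions: the ranking rule (value range over the training samples) is a made-up stand-in for a FeatureSelector, and Table, select_features and feature_subset are hypothetical names that are not part of yat.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a MatrixLookup:
// rows are features, columns are samples.
using Table = std::vector<std::vector<double>>;

// Rank features (rows) by their value range over the training samples only
// and return the indices of the n_features best rows in ascending order.
// Validation samples never influence the ranking.
std::vector<std::size_t> select_features(const Table& data,
                                         const std::vector<std::size_t>& training,
                                         std::size_t n_features)
{
  assert(!training.empty());
  assert(n_features <= data.size());
  std::vector<double> score(data.size());
  for (std::size_t row = 0; row < data.size(); ++row) {
    double lo = data[row][training.front()];
    double hi = lo;
    for (std::size_t col : training) {
      lo = std::min(lo, data[row][col]);
      hi = std::max(hi, data[row][col]);
    }
    score[row] = hi - lo;
  }
  std::vector<std::size_t> order(data.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&score](std::size_t a, std::size_t b) {
                     return score[a] > score[b];
                   });
  order.resize(n_features);
  std::sort(order.begin(), order.end());
  return order;
}

// Build a subset from the selected rows (features) and columns (samples).
Table feature_subset(const Table& data,
                     const std::vector<std::size_t>& rows,
                     const std::vector<std::size_t>& columns)
{
  Table result;
  for (std::size_t r : rows) {
    std::vector<double> new_row;
    for (std::size_t c : columns)
      new_row.push_back(data[r][c]);
    result.push_back(new_row);
  }
  return result;
}
```

The same feature index is passed to feature_subset for the training columns and for the validation columns, so a feature that varies only in the validation samples is not selected.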

In case of a KernelLookup the procedure is slightly different. A symmetric training kernel is created using Sampler::training_index(size_t) to select both rows and columns. However, the created KernelLookup is not simply the subkernel of data; each element is recalculated using the features selected by FeatureSelector fs. In the validation kernel each column corresponds to a validation sample and each row corresponds to a training sample. Consequently Sampler::training_index(size_t) is used to select rows, and Sampler::validation_index(size_t) is used to select columns. The same set of features is used to calculate the elements as for the training kernel, i.e., feature selection is based on training data.
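
Recalculating kernel elements from a reduced feature set can be sketched as follows. This is an illustration only, assuming a linear kernel (a plain dot product) and a std::vector feature matrix; the kernel function actually used depends on the KernelLookup, and FeatureMatrix, linear_kernel and recalculated_kernel are hypothetical names that are not part of yat.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the underlying data:
// rows are features, columns are samples.
using FeatureMatrix = std::vector<std::vector<double>>;

// Linear kernel between samples a and b, using only the rows in `features`.
double linear_kernel(const FeatureMatrix& data,
                     const std::vector<std::size_t>& features,
                     std::size_t a, std::size_t b)
{
  double sum = 0.0;
  for (std::size_t f : features) {
    assert(f < data.size());
    sum += data[f][a] * data[f][b];
  }
  return sum;
}

// Recalculate a kernel whose rows are `row_samples` and whose columns are
// `column_samples`, using the selected features for every element.
FeatureMatrix recalculated_kernel(const FeatureMatrix& data,
                                  const std::vector<std::size_t>& features,
                                  const std::vector<std::size_t>& row_samples,
                                  const std::vector<std::size_t>& column_samples)
{
  FeatureMatrix kernel;
  for (std::size_t r : row_samples) {
    std::vector<double> row;
    for (std::size_t c : column_samples)
      row.push_back(linear_kernel(data, features, r, c));
    kernel.push_back(row);
  }
  return kernel;
}
```

Passing the training index for both rows and columns gives the symmetric training kernel; passing the training index for rows and the validation index for columns gives the validation kernel. Both calls receive the same feature index.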

Parameters

sampler	Sampler that partitions the dataset into training and validation samples.
data	Data to split into training and validation sets.
fs	Object selecting features for each subset.

template<typename Data >

theplu::yat::classifier::SubsetGenerator< Data >::~SubsetGenerator ( )

Destructor

Member Function Documentation

template<typename Data >

size_t theplu::yat::classifier::SubsetGenerator< Data >::size ( void ) const

Returns: number of subsets

template<typename Data >

const Target & theplu::yat::classifier::SubsetGenerator< Data >::target ( void ) const

Returns: the target for the total set

template<typename Data >

const Data & theplu::yat::classifier::SubsetGenerator< Data >::training_data ( size_t i ) const

See constructors for details on how training data are generated.

Returns: ith training data

template<typename Data >

const utility::Index & theplu::yat::classifier::SubsetGenerator< Data >::training_features ( size_t i ) const

Features that are used to create ith training data and validation data.

Returns: training features

template<typename Data >

const utility::Index & theplu::yat::classifier::SubsetGenerator< Data >::training_index ( size_t i ) const

Returns: Index of samples included in ith training data.

template<typename Data >

const Target & theplu::yat::classifier::SubsetGenerator< Data >::training_target ( size_t i ) const

Returns: Targets of ith set of training samples

template<typename Data >

const Data & theplu::yat::classifier::SubsetGenerator< Data >::validation_data ( size_t i ) const

See constructors for details on how validation data are generated.

Returns: ith validation data

template<typename Data >

const utility::Index & theplu::yat::classifier::SubsetGenerator< Data >::validation_index ( size_t i ) const

Returns: Index of samples included in ith validation data.

template<typename Data >

const Target & theplu::yat::classifier::SubsetGenerator< Data >::validation_target ( size_t i ) const

Returns: Targets of ith set of validation samples

The documentation for this class was generated from the following file:

yat/classifier/SubsetGenerator.h
