Receiver Operating Characteristic.

#include <yat/statistics/ROC.h>

Public Member Functions
	ROC (void)
	Default constructor.

void	add (double value, bool target, double weight=1.0)
	Add a data value.

double	area (void) const
	Area Under Curve, AUC.

unsigned int &	minimum_size (void)
	Threshold for p-value calculation.

const unsigned int &	minimum_size (void) const
	Threshold for p-value calculation.

double	n (void) const
	Number of samples.

double	n_neg (void) const
	Number of negative samples.

double	n_pos (void) const
	Number of positive samples.

double	p_left (void) const
	One-sided p-value (left tail).

double	p_right (void) const
	One-sided p-value (right tail).

double	p_value_one_sided (void) const
	Deprecated; use p_right().

double	p_value (void) const
	Two-sided p-value.

void	remove (double value, bool target, double weight=1.0)
	Remove a data value.

void	reset (void)
	Set everything to zero.

Detailed Description

Receiver Operating Characteristic.

Because the area under the ROC curve equals the Mann-Whitney U statistic divided by the number of (positive, negative) pairs, this class can be used to perform a Mann-Whitney U test (also known as the Wilcoxon rank-sum test).

See also: AUC

Member Function Documentation

◆ add()

void theplu::yat::statistics::ROC::add ( double value, bool target, double weight = 1.0 )

Add a data value.

Parameters

value	data value
target	`true` if value belongs to class positive
weight	how important the data point is. A zero weight means the data point is ignored. A negative weight is interpreted as removing a data point, so it typically only makes sense if a data point with the same value and target was previously added.
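The weight rules can be pictured with a small accumulator. The class `WeightedTally` below is a hypothetical sketch, not yat's implementation: it keeps the per-class weight sums that n_pos(), n_neg() and n() report, skips zero weights, and lets a negative weight cancel an earlier addition of the same value and target.

```cpp
#include <cstddef>
#include <map>

// Hypothetical sketch, not yat's implementation: per-class weight sums
// maintained under the weight rules described for add().
class WeightedTally
{
public:
    void add(double value, bool target, double weight = 1.0)
    {
        if (weight == 0.0)
            return;  // zero weight: the data point is ignored
        // A negative weight subtracts from an earlier add of the same
        // (value, target); exact cancellation removes the entry.
        std::map<double, double>& bucket = weights_[target ? 1 : 0];
        bucket[value] += weight;
        if (bucket[value] == 0.0)
            bucket.erase(value);
        if (target)
            n_pos_ += weight;
        else
            n_neg_ += weight;
    }

    double n_pos() const { return n_pos_; }       // sum of positive-class weights
    double n_neg() const { return n_neg_; }       // sum of negative-class weights
    double n() const { return n_pos_ + n_neg_; }  // sum of all weights

    // number of distinct values that still carry weight in a class
    std::size_t distinct(bool target) const
    {
        return weights_[target ? 1 : 0].size();
    }

private:
    std::map<double, double> weights_[2];  // index 1: positive, 0: negative
    double n_pos_ = 0.0;
    double n_neg_ = 0.0;
};
```

Cancellation here relies on exact floating-point equality, which holds when the same weight is added and then subtracted, but not in general after arbitrary arithmetic.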

◆ area()

double theplu::yat::statistics::ROC::area ( void ) const

Area Under Curve, AUC.

See also: AUC for how the area is calculated

Returns: Area under curve.

◆ minimum_size() [1/2]

unsigned int& theplu::yat::statistics::ROC::minimum_size ( void )

Threshold for p-value calculation.

This function can be used to change minimum_size.

Returns: reference to the minimum_size threshold

◆ minimum_size() [2/2]

const unsigned int& theplu::yat::statistics::ROC::minimum_size ( void ) const

Threshold for p-value calculation.

Threshold deciding whether the p-value is computed using the exact method or a Gaussian approximation. If either the number of positive samples, n_pos(), or the number of negative samples, n_neg(), is smaller than minimum_size, the exact method is used.
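The rule reduces to a one-line predicate. `use_exact_method` is a hypothetical helper that only restates the condition above; in yat itself the threshold is changed through the non-const overload, which returns a reference, e.g. `roc.minimum_size() = 20;`.

```cpp
// Hypothetical helper restating the documented rule: the exact method is
// used when either class is smaller than minimum_size; otherwise the
// Gaussian approximation is used. Class sizes are doubles because they
// are sums of weights.
bool use_exact_method(double n_pos, double n_neg, unsigned int minimum_size)
{
    return n_pos < minimum_size || n_neg < minimum_size;
}
```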

See also: p_value

Returns: const reference to minimum_size

◆ n()

double theplu::yat::statistics::ROC::n ( void ) const

Number of samples.

Returns: sum of weights

◆ n_neg()

double theplu::yat::statistics::ROC::n_neg ( void ) const

Number of negative samples.

Returns: sum of weights with negative target

◆ n_pos()

double theplu::yat::statistics::ROC::n_pos ( void ) const

Number of positive samples.

Returns: sum of weights with positive target

◆ p_left()

double theplu::yat::statistics::ROC::p_left ( void ) const

Calculates the probability of getting this area (or less) given that there is no difference between the two classes.

See also: p_right for more details

◆ p_right()

double theplu::yat::statistics::ROC::p_right ( void ) const

One-sided p-value.

Calculates the one-sided p-value, i.e., the probability of getting this area (or greater) given that there is no difference between the two classes.

Exact method: the function goes through all permutations and counts the fraction for which the area is greater than or equal to the area of the original permutation. If the non-zero weights are not all equal, iterating through all permutations is not sufficient, so the algorithm instead goes through all combinations, whose number (N!) quickly becomes large.

Large-sample approximation: when many data points are available (see minimum_size()), a Gaussian approximation is used and the p-value is calculated as

$P = \frac{1}{\sqrt{2\pi}} \int_{z}^{\infty} \exp{\left(-\frac{t^2}{2}\right)} dt$

where

$z = \frac{\textrm{area} - 0.5 - 0.5/(n^+ \cdot n^-)}{s}$

and

$s^2 = \frac{n+1+\sum \left(n_x \cdot (n_x^2-1)\right)} {12\cdot n^+\cdot n^-}$

where the sum runs over distinct data values (i.e., over groups of tied values) and $n_x$ is the number of data points with that value. The sum is a correction term for ties and is zero if there are no ties.

The number of samples in a group, $n^+$ (and likewise $n^-$), is calculated as $n^+ = (\sum w)^2 / \sum w^2$, where the sums run over the weights of the samples in that group.
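The approximation can be sketched as follows. `effective_size` and `gaussian_p_right` are hypothetical names; the sketch assumes there are no ties (so the correction sum is dropped), takes n in $s^2$ to be $n^+ + n^-$, and returns the upper tail $P(a \ge \textrm{area})$, matching the documented return value.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: effective group size (sum w)^2 / sum w^2.
// For unit weights this is simply the number of samples.
double effective_size(const std::vector<double>& weights)
{
    double sum = 0.0, sum_sq = 0.0;
    for (double w : weights) {
        sum += w;
        sum_sq += w * w;
    }
    return sum * sum / sum_sq;
}

// Hypothetical sketch of the large-sample p_right, assuming no ties
// (tie-correction sum taken as zero) and n = n_pos + n_neg. Returns the
// upper tail P(a >= area) of a normal distribution centered at 0.5,
// including the continuity correction 0.5 / (n_pos * n_neg).
double gaussian_p_right(double area,
                        const std::vector<double>& pos_weights,
                        const std::vector<double>& neg_weights)
{
    const double n_pos = effective_size(pos_weights);
    const double n_neg = effective_size(neg_weights);
    const double n = n_pos + n_neg;
    const double s = std::sqrt((n + 1.0) / (12.0 * n_pos * n_neg));
    const double z = (area - 0.5 - 0.5 / (n_pos * n_neg)) / s;
    // upper tail of the standard normal: 1 - Phi(z) = erfc(z / sqrt(2)) / 2
    return 0.5 * std::erfc(z / std::sqrt(2.0));
}
```

Using std::erfc for the upper tail avoids the cancellation that computing 1 - Phi(z) directly would suffer for large z.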

Returns: $P(a \ge \textrm{area})$

◆ p_value()

double theplu::yat::statistics::ROC::p_value ( void ) const

Two-sided p-value.

Calculates the probability of getting an area, a, equal to or more extreme than area:

$P(a \ge \textrm{max}(\textrm{area},1-\textrm{area})) + P(a \le \textrm{min}(\textrm{area}, 1-\textrm{area}))$

If there are no ties, the distribution of a is symmetric, so if area is greater than 0.5 this reduces to $P = 2 \cdot P(a \ge \textrm{area}) = 2 \cdot P_\textrm{one-sided}$.

Returns: two-sided p-value

See also: p_right

◆ p_value_one_sided()

double theplu::yat::statistics::ROC::p_value_one_sided ( void ) const

Deprecated: Provided for backward compatibility with the 0.10 API. Use p_right() instead.

◆ remove()

void theplu::yat::statistics::ROC::remove ( double value, bool target, double weight = 1.0 )

Remove a data value.

A data point with identical value, target, and weight must have been added prior to calling this function; otherwise an exception is thrown.
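The contract can be illustrated with a multiset of (value, target, weight) triples. `PointStore` is hypothetical, and the exception type `std::runtime_error` is an assumption, since the documentation does not say which exception yat throws.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <tuple>

// Hypothetical store illustrating the remove() contract: a point can only
// be removed if an identical (value, target, weight) triple was added
// before. std::runtime_error is an assumed exception type.
class PointStore
{
public:
    void add(double value, bool target, double weight = 1.0)
    {
        points_.insert(std::make_tuple(value, target, weight));
    }

    void remove(double value, bool target, double weight = 1.0)
    {
        auto it = points_.find(std::make_tuple(value, target, weight));
        if (it == points_.end())
            throw std::runtime_error("remove: no matching data point");
        points_.erase(it);  // erase exactly one matching copy
    }

    std::size_t size() const { return points_.size(); }

private:
    std::multiset<std::tuple<double, bool, double>> points_;
};
```

Matching uses exact equality on all three fields, so a weight that differs only by rounding counts as a different data point.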

Since: New in yat 0.9

The documentation for this class was generated from the following file:

yat/statistics/ROC.h
