yat
0.20.3pre
Receiver Operating Characteristic.
#include <yat/statistics/ROC.h>
Public Member Functions
ROC (void)
Default constructor.
void | add (double value, bool target, double weight=1.0)
Add a data value.
double | area (void) const
Area Under Curve, AUC.
unsigned int & | minimum_size (void)
threshold for p_value calculation
const unsigned int & | minimum_size (void) const
threshold for p_value calculation
double | n (void) const
number of samples
double | n_neg (void) const
number of negative samples
double | n_pos (void) const
number of positive samples
double | p_left (void) const
double | p_right (void) const
One-sided P-value.
double | p_value_one_sided (void) const
double | p_value (void) const
Two-sided p-value.
void | remove (double value, bool target, double weight=1.0)
remove a data value
void | reset (void)
Set everything to zero.
Receiver Operating Characteristic.
As the area under an ROC curve is equivalent to the Mann-Whitney U statistic, this class can be used to perform a Mann-Whitney U test (also known as the Wilcoxon rank-sum test).
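The equivalence can be illustrated with a minimal, standalone sketch that does not use yat. The function name `auc_by_pair_counting` is hypothetical, and the convention that larger values indicate the positive class is an assumption of this example, not something stated above.

```cpp
#include <cassert>
#include <vector>

// Unweighted sketch: AUC computed by direct pair counting.
// Each (positive, negative) pair scores 1 if the positive value is larger,
// 0.5 on a tie, and 0 otherwise. The sum of scores is the Mann-Whitney U
// statistic; dividing by the number of pairs gives the area.
double auc_by_pair_counting(const std::vector<double>& pos,
                            const std::vector<double>& neg)
{
  assert(!pos.empty() && !neg.empty());
  double u = 0.0;
  for (double p : pos) {
    for (double n : neg) {
      if (p > n)
        u += 1.0;
      else if (p == n)
        u += 0.5;
    }
  }
  return u / (static_cast<double>(pos.size()) *
              static_cast<double>(neg.size()));
}
```

Pair counting is quadratic in the number of samples; it is shown here only because it makes the link between U and the area explicit.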
void theplu::yat::statistics::ROC::add (double value, bool target, double weight = 1.0)
Add a data value.
value | data value |
target | true if value belongs to class positive |
weight | indicating how important the data point is. A zero weight implies the data point is ignored. A negative weight should be understood as removing a data point and thus typically only makes sense if there is a previously added data point with same value and target. |
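The weight semantics can be sketched with a standalone example that does not use yat. It assumes one natural weighted generalisation of the area, in which each (positive, negative) pair contributes the product of the two weights; whether yat computes the weighted area exactly this way is an assumption. `DataPoint` and `weighted_auc` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// One data point, mirroring the arguments of add(value, target, weight).
struct DataPoint {
  double value;
  bool target;    // true if the point belongs to the positive class
  double weight;
};

// Weighted AUC sketch (assumed definition): every (positive, negative)
// pair contributes the product of its weights, fully if the positive
// value is larger and by half on a tie.
double weighted_auc(const std::vector<DataPoint>& data)
{
  double numerator = 0.0;
  double denominator = 0.0;
  for (const DataPoint& p : data) {
    if (!p.target)
      continue;
    for (const DataPoint& n : data) {
      if (n.target)
        continue;
      const double w = p.weight * n.weight;
      denominator += w;
      if (p.value > n.value)
        numerator += w;
      else if (p.value == n.value)
        numerator += 0.5 * w;
    }
  }
  assert(denominator != 0.0);
  return numerator / denominator;
}
```

Under this definition a zero-weight point contributes nothing, and a point added once with weight w and once with weight -w cancels out of both sums, which matches the reading of a negative weight as a removal.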
double theplu::yat::statistics::ROC::area (void) const
unsigned int& theplu::yat::statistics::ROC::minimum_size (void)
threshold for p_value calculation
This function can be used to change the minimum_size.
const unsigned int& theplu::yat::statistics::ROC::minimum_size (void) const
threshold for p_value calculation
Threshold deciding whether the p-value is computed using the exact method or a Gaussian approximation. If either the number of positive samples, n_pos(void), or the number of negative samples, n_neg(void), is smaller than minimum_size, the exact method is used.
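The selection rule can be written as a one-line predicate. This is a standalone sketch of the rule as described above, not yat's code, and `use_exact_method` is a hypothetical name.

```cpp
// Sketch of the rule described above: the exact method is chosen when
// either class has fewer samples than minimum_size; otherwise the
// Gaussian approximation is used.
bool use_exact_method(double n_pos, double n_neg, unsigned int minimum_size)
{
  return n_pos < minimum_size || n_neg < minimum_size;
}
```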
double theplu::yat::statistics::ROC::n (void) const
number of samples
double theplu::yat::statistics::ROC::n_neg (void) const
number of negative samples
double theplu::yat::statistics::ROC::n_pos (void) const
number of positive samples
double theplu::yat::statistics::ROC::p_left (void) const
Calculates the probability to get this area (or less).
double theplu::yat::statistics::ROC::p_right (void) const
One-sided P-value.
Calculates the one-sided p-value, i.e., probability to get this area (or greater) given that there is no difference between the two classes.
Exact method: The function goes through all permutations and counts the fraction for which the area is greater than or equal to the area of the original permutation. If the non-zero weights are not all equal, iterating through all permutations is not sufficient, so the algorithm goes through all combinations instead, a number that quickly becomes large (N!).
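For the equal-weight case, the exact method can be sketched as follows. This standalone example does not use yat: it enumerates every distinct assignment of class labels to the data values with `std::next_permutation` and counts how often the area is at least as large as the observed one. The names `auc` and `exact_p_right` are hypothetical, and the unequal-weight case is not covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// AUC by pair counting; labels[i] is 1 for the positive class, 0 otherwise.
double auc(const std::vector<double>& values, const std::vector<int>& labels)
{
  double u = 0.0;
  double n_pos = 0.0;
  double n_neg = 0.0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (labels[i] == 0) {
      n_neg += 1.0;
      continue;
    }
    n_pos += 1.0;
    for (std::size_t j = 0; j < values.size(); ++j) {
      if (labels[j] != 0)
        continue;
      if (values[i] > values[j])
        u += 1.0;
      else if (values[i] == values[j])
        u += 0.5;
    }
  }
  return u / (n_pos * n_neg);
}

// Exact one-sided p-value (equal weights assumed): the fraction of label
// arrangements whose area is greater than or equal to the observed area.
double exact_p_right(const std::vector<double>& values, std::vector<int> labels)
{
  assert(values.size() == labels.size());
  const double observed = auc(values, labels);
  // Start from the lexicographically smallest arrangement so that
  // next_permutation visits every distinct arrangement exactly once.
  std::sort(labels.begin(), labels.end());
  double hits = 0.0;
  double total = 0.0;
  do {
    total += 1.0;
    if (auc(values, labels) >= observed)
      hits += 1.0;
  } while (std::next_permutation(labels.begin(), labels.end()));
  return hits / total;
}
```

With two positive and two negative samples and no ties there are six arrangements, so a perfectly separated data set gets a one-sided p-value of 1/6.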
Large-sample Approximation: When many data points are available, see minimum_size(), a Gaussian approximation is used and the p-value is calculated as
where
and
where the sum runs over the distinct data values (of ties), each term depending on the number of data points with that value. The sum is a correction term for ties and is zero if there are no ties.
The number of samples in a group, , is calculated as
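
For orientation, the textbook large-sample approximation for the Mann-Whitney statistic, expressed in terms of the area, can be written as below. This is the standard form and not necessarily the exact expression yat uses (for example, a continuity correction may be applied). The symbols are assumptions of this sketch: $n_+$ and $n_-$ are the numbers of positive and negative samples, $n = n_+ + n_-$, and $t_x$ is the number of data points sharing the tied value $x$.

```latex
\[
  P = \frac{1}{\sqrt{2\pi}} \int_{z}^{\infty} \exp\left(-\frac{t^2}{2}\right) dt,
  \qquad
  z = \frac{\textrm{area} - \frac{1}{2}}{s},
\]
\[
  s^2 = \frac{1}{12\, n_+ n_-}
        \left( n + 1 - \sum_{x} \frac{t_x^3 - t_x}{n (n - 1)} \right).
\]
```

Without ties every $t_x$ equals 1, the sum vanishes, and $s^2$ reduces to $(n + 1) / (12\, n_+ n_-)$.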
double theplu::yat::statistics::ROC::p_value (void) const
Two-sided p-value.
Calculates the probability to get an area, a, equal to or more extreme than area.
If there are no ties, the distribution of a is symmetric, so if area is greater than 0.5, this boils down to $2 P(a \ge \textrm{area})$, i.e. twice the one-sided p-value returned by p_right().
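A small standalone sketch of the two-sided calculation is given below. It assumes that "more extreme" means at least as far from 0.5 as the observed area, and it takes the null distribution as a precomputed list of areas, one per label arrangement. Both choices, and the name `two_sided_p`, are assumptions of this example rather than yat's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Two-sided p-value sketch: the fraction of areas in the null distribution
// that lie at least as far from 0.5 as the observed area.
double two_sided_p(const std::vector<double>& null_areas, double area)
{
  assert(!null_areas.empty());
  const double observed_deviation = std::fabs(area - 0.5);
  double hits = 0.0;
  for (double a : null_areas) {
    if (std::fabs(a - 0.5) >= observed_deviation)
      hits += 1.0;
  }
  return hits / static_cast<double>(null_areas.size());
}
```

For two positive and two negative samples without ties, the six arrangements give the areas 0, 0.25, 0.5, 0.5, 0.75 and 1. An observed area of 1 then yields 2/6, twice the one-sided value of 1/6, as expected for a symmetric distribution.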
double theplu::yat::statistics::ROC::p_value_one_sided (void) const
void theplu::yat::statistics::ROC::remove (double value, bool target, double weight = 1.0)
remove a data value
A data point with identical value, target, and weight must have been added prior to calling this function; otherwise an exception is thrown.
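The add/remove contract can be sketched with a small standalone container. `PointStore` is a hypothetical class, not part of yat, and `std::runtime_error` stands in for whatever exception type yat actually throws, which is not specified above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <tuple>

// Hypothetical store that counts identical (value, target, weight) points.
class PointStore {
public:
  void add(double value, bool target, double weight = 1.0)
  {
    ++count_[std::make_tuple(value, target, weight)];
  }

  // Removes one previously added point; throws if no identical point exists.
  void remove(double value, bool target, double weight = 1.0)
  {
    auto it = count_.find(std::make_tuple(value, target, weight));
    if (it == count_.end())
      throw std::runtime_error("remove: no matching data point");
    assert(it->second > 0);
    if (--(it->second) == 0)
      count_.erase(it);
  }

  // Total number of stored points, counting duplicates.
  std::size_t size() const
  {
    std::size_t total = 0;
    for (const auto& entry : count_)
      total += entry.second;
    return total;
  }

private:
  std::map<std::tuple<double, bool, double>, unsigned int> count_;
};
```

Matching on all three fields means that a point added with weight 2.0 cannot be removed by a call that uses the default weight of 1.0.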