neurotools.stats.pvalues module
pvalues.py
This module collects useful routines for working with p-values.
- neurotools.stats.pvalues.benjamini_hochberg_positive_correlations(pvalues, alpha)[source]
Derived from the following matlab code from Wilson Truccolo
function [pID,pN] = fdr(p,q) % FORMAT pt = fdr(p,q) % % p - vector of p-values % q - False Discovery Rate level % % pID - p-value threshold based on independence or positive dependence % pN - Nonparametric p-value threshold % % This function takes a vector of p-values and a False Discovery Rate % (FDR). It returns two p-value thresholds, one based on an assumption of % independence or positive dependence, and one that makes no assumptions % about how the tests are correlated. For imaging data, an assumption of % positive dependence is reasonable, so it should be OK to use the first % (more sensitive) threshold. % % Reference: Benjamini and Hochberg, J Royal Statistical Society. Series B % (Methodological), V 57, No. 1 (1995), pp. 289-300. % _____________________________________________________________________________ % @(#)fdr.m 1.3 Tom Nichols 02/01/18 % Wilson Truccolo: modified 10/19/2007
p = sort(p(:)); V = length(p); I = (1:V)’; cVID = 1; cVN = sum(1./(1:V)); pID = p(max(find(p<=I/V*q/cVID))); pN = p(max(find(p<=I/V*q/cVN)));
- Parameters:
pvalues (list) – p-values to correct
alpha (float in (0,1)) – target false-discovery rate
- Returns:
pID – p-value threshold based on independence or positive dependence
pN – Nonparametric p-value threshold
- neurotools.stats.pvalues.correct_pvalues_positive_dependent(pvalue_dictionary, verbose=0, alpha=0.05)[source]
Benjamini-Hochberg multiple-comparison correction assuming positive dependence.
- Parameters:
pvalue_dictionary (dict) – label -> pvalue
- Returns:
Benjamini-Hochberg corrected dictionary assuming positive correlations, entries as label -> pvalue, reject
- Return type:
dict
- neurotools.stats.pvalues.correct_pvalues(pvalue_dictionary, verbose=0, alpha=0.05)[source]
This is a convenience wrapper for statsmodels.sandbox.stats.multicomp.multipletests
This corrects for multiple comparisons using the Benjamini-Hochberg procedure, using either the variance for positive dependence or no dependence, whichever is more conservative.
- Parameters:
pvalue_dictionary (dict) – label -> pvalue This may also simply be a list of p-values.
- Returns:
Benjamini-Hochberg corrected dictionary correlations, entries as label -> pvalue, reject
- Return type:
dict
- neurotools.stats.pvalues.bootstrap_statistic(statistic, population, ntrials=1000, show_progress=False)[source]
Re-sample a statistical estimator from a single population to estimate the estimator’s uncertainty.
- Parameters:
statistic (function) – Function accepting elements in population and returning an object reflecting a statistical estimator
population (iterable) – Values in the test population.
ntrials (positive int; default 1000) – Number of random samples to use
- Returns:
samples – Length ntrials list of return-vales from statistic() evaluated on bootstrap samples population with replacement.
- Return type:
list
- neurotools.stats.pvalues.bootstrap_statistic_two_sided(statistic, test_population, null_population, ntrials=1000)[source]
Estimate the probability that statistic is significantly larger in test_population compared to null_population using bootstrap resampling.
Bootstrapped p-values addresses nonlinearity and non-Gaussian dispersion, it is not a cure for limited data. The number of data points in the test and null populations should be sufficiently large. I recommend no less than 20.
- Parameters:
statistic (function) – A function that accepts a resampled collection of data from the population and returns some scalar statistic.
test_population – Values in the test population.
null_population – Values in the null population.
ntrials (positive int; default 1000) – Number of random samples to use
- Returns:
pvalue – Bootstrapped p-value
- Return type:
float in [0,1]
- neurotools.stats.pvalues.bootstrap_median(test_population, null_population, ntrials=10000)[source]
Estimate pvalue for difference in medians using bootstrapping
- Parameters:
test_population (list of samples) – Values in the test population.
null_population (list of samples) – Values in the null population.
ntrials (positive int; default 1000) – Number of random samples to use
- Returns:
pvalue – Bootstrapped p-value
- Return type:
float in [0,1]
- neurotools.stats.pvalues.bootstrap_mean(test_population, null_population, ntrials=10000)[source]
Estimate pvalue for difference in means using bootstrapping.
- Parameters:
test_population (list of samples) – Values in the test population.
null_population (list of samples) – Values in the null population.
ntrials (positive int; default 1000) – Number of random samples to use
- Returns:
pvalue – Bootstrapped p-value
- Return type:
float in [0,1]
- neurotools.stats.pvalues.bootstrap_compare_statistic_two_sided(statistic, popA, popB, ntrials=1000)[source]
Estimate pvalue using bootstrapping
- Parameters:
popA (list of samples) – Values in population A.
popB (list of samples) – Values in population B.
statistic (function) – A function that accepts a resampled collection of data from the population and returns some scalar statistic.
ntrials (positive int; default 1000) – Number of random samples to use
- Returns:
delta (float) – Mean difference of statistic(A)-statistics(B)
pvalue (float in [0,1]) – Bootstrapped p-value
- neurotools.stats.pvalues.bootstrap_compare_statistic_two_sided_parallel(statistic, popA, popB, NA=None, NB=None, ntrials=10000)[source]
Estimate pvalue using bootstrapping
- Parameters:
statistic (function) – A function that accepts a resampled collection of data from the population and returns some scalar statistic.
popA (list of samples) – Values in population A.
popB (list of samples) – Values in population B.
NA (positive int; default len(A)) – Number of samples to draw from population A. This should be no larger than the size of A.
NB (positive int; default len(B)) – Number of samples to draw from population B. This should be no larger than the size of B.
ntrials (positive int; default 1000) – Number of random samples to use
- Returns:
delta (float) – Mean difference of statistic(A)-statistics(B)
pvalue (float in [0,1]) – Bootstrapped p-value
- neurotools.stats.pvalues.bootstrap_compare_median(popA, popB, NA=None, NB=None, ntrials=100000)[source]
Estimate pvalue for difference in medians using bootstrapping.
- Parameters:
popA (list of samples) – Values in population A.
popB (list of samples) – Values in population B.
NA (positive int; default len(A)) – Number of samples to draw from population A. This should be no larger than the size of A.
NB (positive int; default len(B)) – Number of samples to draw from population B. This should be no larger than the size of B.
ntrials (positive int; default 1000) – Number of random samples to use
- Returns:
delta (float) – Mean difference of statistic(A)-statistics(B)
pvalue (float in [0,1]) – Bootstrapped p-value
- neurotools.stats.pvalues.bootstrap_compare_mean(popA, popB, NA=None, NB=None, ntrials=100000)[source]
Estimate pvalue for difference in means using bootstrapping.
- Parameters:
popA (list of samples) – Values in population A.
popB (list of samples) – Values in population B.
NA (positive int; default len(A)) – Number of samples to draw from population A. This should be no larger than the size of A.
NB (positive int; default len(B)) – Number of samples to draw from population B. This should be no larger than the size of B.
ntrials (positive int; default 1000) – Number of random samples to use
- Returns:
delta (float) – Mean difference of statistic(A)-statistics(B)
pvalue (float in [0,1]) – Bootstrapped p-value
- neurotools.stats.pvalues.bootstrap_in_blocks(variables, blocklength, nsamples=1, replace=True, seed=None)[source]
Chop dataset into blocks, then re-compose it, sampling these blocks randomly with replacement.
This is used to compute bootstrap confidence intervals.
If `blocklength doesn’t evenly divide the number of time samples, then we include an extra block, but truncate the resulting timeseries to match the length of the original data.
This casts all variables to np.float32.
- Parameters:
variables (list of np.float32) – List of arrays. The first dimension of each array is time samples and should be the same length for all variables.
blocklength (int) – Length of blocks for block-bootstrap-resample
nsamples (positive int; default 1) – Number of bootstrap samples to generate.
replace (boolean; default True) – Whether to sample with or without replacement. Setting this to false implements a shuffle test, rather than a bootstrap.
seed (int or None; default None) – If not None, use seed to set the state of np.random If None, continue with current state.
- Returns:
result
- Return type:
list of np.float32
- neurotools.stats.pvalues.nancorrect(pvalues, alpha=0.05, method='fdr_tsbh')[source]
Wrapper for
multipletests(pvalues,alpha,method='fdr_tsbh')
instatsmodels.stats.multitest
that gracefully handlesNaN
.- Parameters:
pvalues (np.float32) – List of p-values to correct
alpha (float in (0,1); default 0.05) – False-discovery rate to correct for
method (str; default 'fdr_tsbh') – Method to use with
multipletests()
. The default is a two-stage Benjamini-Hochberg. seestatsmodels.stats.multitest.multipletests
for more options.
- Returns:
reject (np.bool)
corrected_pvalues (np.float32)