neurotools.stats.pvalues module

pvalues.py

This module collects useful routines for working with p-values.

neurotools.stats.pvalues.benjamini_hochberg_positive_correlations(pvalues, alpha)[source]

Derived from the following matlab code from Wilson Truccolo

function [pID,pN] = fdr(p,q) % FORMAT pt = fdr(p,q) % % p - vector of p-values % q - False Discovery Rate level % % pID - p-value threshold based on independence or positive dependence % pN - Nonparametric p-value threshold % % This function takes a vector of p-values and a False Discovery Rate % (FDR). It returns two p-value thresholds, one based on an assumption of % independence or positive dependence, and one that makes no assumptions % about how the tests are correlated. For imaging data, an assumption of % positive dependence is reasonable, so it should be OK to use the first % (more sensitive) threshold. % % Reference: Benjamini and Hochberg, J Royal Statistical Society. Series B % (Methodological), V 57, No. 1 (1995), pp. 289-300. % _____________________________________________________________________________ % @(#)fdr.m 1.3 Tom Nichols 02/01/18 % Wilson Truccolo: modified 10/19/2007

p = sort(p(:)); V = length(p); I = (1:V)’; cVID = 1; cVN = sum(1./(1:V)); pID = p(max(find(p<=I/V*q/cVID))); pN = p(max(find(p<=I/V*q/cVN)));

Parameters:
  • pvalues (list) – p-values to correct

  • alpha (float in (0,1)) – target false-discovery rate

Returns:

  • pID – p-value threshold based on independence or positive dependence

  • pN – Nonparametric p-value threshold

neurotools.stats.pvalues.correct_pvalues_positive_dependent(pvalue_dictionary, verbose=0, alpha=0.05)[source]

Benjamini-Hochberg multiple-comparison correction assuming positive dependence.

Parameters:

pvalue_dictionary (dict) – label -> pvalue

Returns:

Benjamini-Hochberg corrected dictionary assuming positive correlations, entries as label -> pvalue, reject

Return type:

dict

neurotools.stats.pvalues.correct_pvalues(pvalue_dictionary, verbose=0, alpha=0.05)[source]

This is a convenience wrapper for statsmodels.sandbox.stats.multicomp.multipletests

This corrects for multiple comparisons using the Benjamini-Hochberg procedure, using either the variance for positive dependence or no dependence, whichever is more conservative.

Parameters:

pvalue_dictionary (dict) – label -> pvalue This may also simply be a list of p-values.

Returns:

Benjamini-Hochberg corrected dictionary correlations, entries as label -> pvalue, reject

Return type:

dict

neurotools.stats.pvalues.bootstrap_statistic(statistic, population, ntrials=1000, show_progress=False)[source]

Re-sample a statistical estimator from a single population to estimate the estimator’s uncertainty.

Parameters:
  • statistic (function) – Function accepting elements in population and returning an object reflecting a statistical estimator

  • population (iterable) – Values in the test population.

  • ntrials (positive int; default 1000) – Number of random samples to use

Returns:

samples – Length ntrials list of return-vales from statistic() evaluated on bootstrap samples population with replacement.

Return type:

list

neurotools.stats.pvalues.bootstrap_statistic_two_sided(statistic, test_population, null_population, ntrials=1000)[source]

Estimate the probability that statistic is significantly larger in test_population compared to null_population using bootstrap resampling.

Bootstrapped p-values addresses nonlinearity and non-Gaussian dispersion, it is not a cure for limited data. The number of data points in the test and null populations should be sufficiently large. I recommend no less than 20.

Parameters:
  • statistic (function) – A function that accepts a resampled collection of data from the population and returns some scalar statistic.

  • test_population – Values in the test population.

  • null_population – Values in the null population.

  • ntrials (positive int; default 1000) – Number of random samples to use

Returns:

pvalue – Bootstrapped p-value

Return type:

float in [0,1]

neurotools.stats.pvalues.bootstrap_median(test_population, null_population, ntrials=10000)[source]

Estimate pvalue for difference in medians using bootstrapping

Parameters:
  • test_population (list of samples) – Values in the test population.

  • null_population (list of samples) – Values in the null population.

  • ntrials (positive int; default 1000) – Number of random samples to use

Returns:

pvalue – Bootstrapped p-value

Return type:

float in [0,1]

neurotools.stats.pvalues.bootstrap_mean(test_population, null_population, ntrials=10000)[source]

Estimate pvalue for difference in means using bootstrapping.

Parameters:
  • test_population (list of samples) – Values in the test population.

  • null_population (list of samples) – Values in the null population.

  • ntrials (positive int; default 1000) – Number of random samples to use

Returns:

pvalue – Bootstrapped p-value

Return type:

float in [0,1]

neurotools.stats.pvalues.bootstrap_compare_statistic_two_sided(statistic, popA, popB, ntrials=1000)[source]

Estimate pvalue using bootstrapping

Parameters:
  • popA (list of samples) – Values in population A.

  • popB (list of samples) – Values in population B.

  • statistic (function) – A function that accepts a resampled collection of data from the population and returns some scalar statistic.

  • ntrials (positive int; default 1000) – Number of random samples to use

Returns:

  • delta (float) – Mean difference of statistic(A)-statistics(B)

  • pvalue (float in [0,1]) – Bootstrapped p-value

neurotools.stats.pvalues.bootstrap_compare_statistic_two_sided_parallel(statistic, popA, popB, NA=None, NB=None, ntrials=10000)[source]

Estimate pvalue using bootstrapping

Parameters:
  • statistic (function) – A function that accepts a resampled collection of data from the population and returns some scalar statistic.

  • popA (list of samples) – Values in population A.

  • popB (list of samples) – Values in population B.

  • NA (positive int; default len(A)) – Number of samples to draw from population A. This should be no larger than the size of A.

  • NB (positive int; default len(B)) – Number of samples to draw from population B. This should be no larger than the size of B.

  • ntrials (positive int; default 1000) – Number of random samples to use

Returns:

  • delta (float) – Mean difference of statistic(A)-statistics(B)

  • pvalue (float in [0,1]) – Bootstrapped p-value

neurotools.stats.pvalues.bootstrap_compare_median(popA, popB, NA=None, NB=None, ntrials=100000)[source]

Estimate pvalue for difference in medians using bootstrapping.

Parameters:
  • popA (list of samples) – Values in population A.

  • popB (list of samples) – Values in population B.

  • NA (positive int; default len(A)) – Number of samples to draw from population A. This should be no larger than the size of A.

  • NB (positive int; default len(B)) – Number of samples to draw from population B. This should be no larger than the size of B.

  • ntrials (positive int; default 1000) – Number of random samples to use

Returns:

  • delta (float) – Mean difference of statistic(A)-statistics(B)

  • pvalue (float in [0,1]) – Bootstrapped p-value

neurotools.stats.pvalues.bootstrap_compare_mean(popA, popB, NA=None, NB=None, ntrials=100000)[source]

Estimate pvalue for difference in means using bootstrapping.

Parameters:
  • popA (list of samples) – Values in population A.

  • popB (list of samples) – Values in population B.

  • NA (positive int; default len(A)) – Number of samples to draw from population A. This should be no larger than the size of A.

  • NB (positive int; default len(B)) – Number of samples to draw from population B. This should be no larger than the size of B.

  • ntrials (positive int; default 1000) – Number of random samples to use

Returns:

  • delta (float) – Mean difference of statistic(A)-statistics(B)

  • pvalue (float in [0,1]) – Bootstrapped p-value

neurotools.stats.pvalues.bootstrap_in_blocks(variables, blocklength, nsamples=1, replace=True, seed=None)[source]

Chop dataset into blocks, then re-compose it, sampling these blocks randomly with replacement.

This is used to compute bootstrap confidence intervals.

If `blocklength doesn’t evenly divide the number of time samples, then we include an extra block, but truncate the resulting timeseries to match the length of the original data.

This casts all variables to np.float32.

Parameters:
  • variables (list of np.float32) – List of arrays. The first dimension of each array is time samples and should be the same length for all variables.

  • blocklength (int) – Length of blocks for block-bootstrap-resample

  • nsamples (positive int; default 1) – Number of bootstrap samples to generate.

  • replace (boolean; default True) – Whether to sample with or without replacement. Setting this to false implements a shuffle test, rather than a bootstrap.

  • seed (int or None; default None) – If not None, use seed to set the state of np.random If None, continue with current state.

Returns:

result

Return type:

list of np.float32

neurotools.stats.pvalues.nancorrect(pvalues, alpha=0.05, method='fdr_tsbh')[source]

Wrapper for multipletests(pvalues,alpha,method='fdr_tsbh') in statsmodels.stats.multitest that gracefully handles NaN.

Parameters:
  • pvalues (np.float32) – List of p-values to correct

  • alpha (float in (0,1); default 0.05) – False-discovery rate to correct for

  • method (str; default 'fdr_tsbh') – Method to use with multipletests(). The default is a two-stage Benjamini-Hochberg. see statsmodels.stats.multitest.multipletests for more options.

Returns:

  • reject (np.bool)

  • corrected_pvalues (np.float32)

neurotools.stats.pvalues.atac(p)[source]

Aggregate p-values using Cauchey distribution model (invariant to the degrees of freedom and therefore conservative with potenitally correlated p-values).