neurotools.stats package

Submodules

Module contents

Statistical routines.

neurotools.stats.partition_data(x, y, NFOLD=3)[source]

Partition independent variables x and dependent variables y into NFOLD crossvalidation training/testing datasets.

Parameters:
  • x (Ncovariates × Nsamples np.array) – Independent variables

  • y (Npredicted × Nsamples np.array) – Dependent variables

  • NFOLD (int; default 3) – Number of crossvalidation blocks to partition data into.

Returns:

result

Iterator with NFOLD items, each element yielding:

  • xtrain: training data for x for this block

  • ytrain: training data for y for this block

  • xtest: testing data for x for this block

  • ytest: testing data for y for this block

Return type:

iterator
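As a rough illustration of the kind of iterator described above, the sketch below splits the sample axis into contiguous folds. The name `kfold_partitions` and the contiguous-block strategy are assumptions made for illustration; they are not taken from the library's implementation.

```python
import numpy as np

def kfold_partitions(x, y, nfold=3):
    """Yield (xtrain, ytrain, xtest, ytest) for each of nfold contiguous folds.

    Hypothetical helper: x is Ncovariates × Nsamples, y is Npredicted × Nsamples.
    """
    nsamples = x.shape[1]
    # Fold boundaries along the sample axis
    edges = np.linspace(0, nsamples, nfold + 1).astype(int)
    for start, stop in zip(edges[:-1], edges[1:]):
        test = np.zeros(nsamples, dtype=bool)
        test[start:stop] = True
        yield x[:, ~test], y[:, ~test], x[:, test], y[:, test]

x = np.arange(24.0).reshape(2, 12)
y = np.arange(12.0).reshape(1, 12)
folds = list(kfold_partitions(x, y, nfold=3))
```

Each fold holds out one third of the 12 samples for testing and trains on the remaining two thirds.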

neurotools.stats.partition_trials_for_crossvalidation(x, K, shuffle=False)[source]

Split trial data into crossvalidation blocks.

Parameters:
  • x (list) – List of trial data to partition. Each entry in the list should be an NTIMEPOINTS×NVARIABLES array.

  • K (int) – Number of crossvalidation blocks to compute

  • shuffle (bool, default False) – Whether to shuffle trials before partitioning them into blocks.

Returns:

spans – List of trial indices to use for each block

Return type:

list
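One simple way to produce such a list of index blocks is sketched below. The helper `split_trials` is hypothetical, and it assumes blocks are balanced by trial count; the library may balance blocks differently (for example by number of timepoints).

```python
import numpy as np

def split_trials(ntrials, K, shuffle=False, rng=None):
    """Hypothetical sketch: split trial indices 0..ntrials-1 into K blocks."""
    order = np.arange(ntrials)
    if shuffle:
        rng = np.random.default_rng() if rng is None else rng
        order = rng.permutation(ntrials)
    # array_split tolerates ntrials not being divisible by K
    return [block.tolist() for block in np.array_split(order, K)]

spans = split_trials(10, K=3)
```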

neurotools.stats.add_constant(data, axis=None)[source]

Appends a constant feature to a multi-dimensional array of dependent variables.

Parameters:
  • data (np.array)

  • axis (int or (default) None) – Axis along which to append the constant feature
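A minimal sketch of appending a constant feature is shown below. The helper `append_ones` is hypothetical and requires an explicit axis; how the library treats axis=None is not documented here, so the sketch makes no attempt to reproduce it.

```python
import numpy as np

def append_ones(data, axis=-1):
    """Hypothetical helper: append a feature equal to 1 along `axis`."""
    data = np.asarray(data, dtype=float)
    shape = list(data.shape)
    shape[axis] = 1  # one extra slice along the chosen axis
    return np.concatenate([data, np.ones(shape)], axis=axis)

features = np.zeros((5, 3))            # 5 samples × 3 features
augmented = append_ones(features, axis=1)
```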

neurotools.stats.trial_crossvalidated_least_squares(a, b, K, regress=None, reg=1e-10, shuffle=False, errmethod='L2', **kwargs)[source]

Predicts B from A in K-fold cross-validated blocks using linear least squares, i.e. finds w such that B = Aw.

Parameters:
  • a (array) – List of trials for independent variables; for every trial, the first dimension should be time or number of samples, etc.

  • b (vector) – List of trials for dependent variables

  • K (int) – Number of cross-validation blocks

  • regress (function, optional) – Regression function, defaults to np.linalg.lstsq (if providing another function, please match the call signature of np.linalg.lstsq)

  • reg (scalar, default 1e-10) – L2 regularization penalty

  • shuffle (bool, default False) – Whether to shuffle trials before crossvalidation

  • errmethod (String) – Method used to compute the error. Can be ‘L1’ (mean absolute error); ‘L2’ (root mean-squared error) or ‘correlation’ (pearson correlation coefficient).

  • add_constant (bool, default False) – Whether to append an additional constant offset feature to the data. If this is set to True, the returned weight matrix will have one extra entry, at the end, reflecting the offset.

Returns:

  • w, array-like – model coefficients from each cross-validation block

  • bhat, array-like – predicted values of b under crossvalidation

  • error – prediction error from each cross-validation block, computed according to errmethod (root mean squared error for the default ‘L2’)
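The three errmethod options listed above can be illustrated with a short sketch. The function `prediction_error` is a hypothetical stand-in written from the parameter description, not the library's code.

```python
import numpy as np

def prediction_error(b, bhat, errmethod="L2"):
    """Hypothetical sketch of the 'L1', 'L2' and 'correlation' error measures."""
    b, bhat = np.asarray(b, dtype=float), np.asarray(bhat, dtype=float)
    if errmethod == "L1":
        return np.mean(np.abs(b - bhat))           # mean absolute error
    if errmethod == "L2":
        return np.sqrt(np.mean((b - bhat) ** 2))   # root mean squared error
    if errmethod == "correlation":
        return np.corrcoef(b, bhat)[0, 1]          # Pearson correlation
    raise ValueError("errmethod must be 'L1', 'L2' or 'correlation'")

b = [1.0, 2.0, 3.0, 4.0]
bhat = [1.0, 2.0, 3.0, 6.0]
l1 = prediction_error(b, bhat, "L1")
l2 = prediction_error(b, bhat, "L2")
cc = prediction_error(b, bhat, "correlation")
```

Note that lower is better for ‘L1’ and ‘L2’, while higher is better for ‘correlation’.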

neurotools.stats.partition_data_for_crossvalidation(a, b, K, discard_mean=False)[source]

For predicting B from A, partition both training and testing data into K-fold cross-validation blocks. This operates over the first axis of the array.

Parameters:
  • a (array) – Independent variables; First dimension should be time or number of samples, etc.

  • b (vector) – dependent variables

  • K (int) – Number of cross-validation blocks

  • discard_mean (boolean; default False) – Whether to remove the means from all variables.

Returns:

  • trainA (list) – list of training blocks for independent variables A

  • trainB (list) – list of training blocks for dependent variables B

  • testA (list) – list of testing blocks for independent variables A

  • testB (list) – list of testing blocks for dependent variables B

neurotools.stats.block_shuffle(x, blocksize=None)[source]

Shuffle a 2D array in blocks along axis 0. For example, if you provide an NTIMES × NFEATURES array, this will shuffle all features similarly in blocks along the time axis.

Parameters:
  • x (np.array) – First dimension should be time or samples; This dimension will be shuffled in blocks of size blocksize.

  • blocksize (int or (default) None) – If None, defaults to max(10,x.shape[0]//100), i.e. a block size that divides x into about 100 blocks, but never smaller than 10.

Returns:

result

Return type:

np.array
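The block-shuffling idea can be sketched as follows. `shuffle_in_blocks` and its `rng` argument are hypothetical; only the default block size is taken from the parameter description above.

```python
import numpy as np

def shuffle_in_blocks(x, blocksize=None, rng=None):
    """Hypothetical sketch: permute contiguous blocks of rows of x."""
    x = np.asarray(x)
    rng = np.random.default_rng() if rng is None else rng
    if blocksize is None:
        blocksize = max(10, x.shape[0] // 100)  # default quoted in the text
    # Split row indices into consecutive blocks (the last may be shorter)
    starts = np.arange(0, x.shape[0], blocksize)
    blocks = [np.arange(s, min(s + blocksize, x.shape[0])) for s in starts]
    order = rng.permutation(len(blocks))
    index = np.concatenate([blocks[i] for i in order])
    return x[index]

data = np.arange(40).reshape(20, 2)    # NTIMES=20, NFEATURES=2
shuffled = shuffle_in_blocks(data, blocksize=5, rng=np.random.default_rng(0))
```

Rows inside each block keep their original order, and all features move together, so short-range temporal structure is preserved while long-range alignment is destroyed.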

neurotools.stats.crossvalidated_least_squares(a, b, K, regress=None, reg=1e-10, blockshuffle=None)[source]

Predicts B from A in K-fold cross-validated blocks using linear least squares, i.e. finds w such that B = Aw.

Parameters:
  • a (array) – Independent variables; First dimension should be time or number of samples, etc.

  • b (vector) – dependent variables

  • K (int) – Number of cross-validation blocks

  • regress (function, optional) – Regression function, defaults to np.linalg.lstsq (if providing another function, please match the call signature of np.linalg.lstsq)

  • reg (scalar, default 1e-10) – L2 regularization penalty

  • blockshuffle (positive int or None, default None) – If not None, should be a positive integer indicating the block size in which to shuffle the input data before breaking it into cross-validation blocks.

Returns:

  • w, array-like – model coefficients from each cross-validation block

  • bhat, array-like – predicted values of b under crossvalidation

  • cc, number – correlation coefficient

  • rms, number – root mean squared error
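To make the procedure concrete, here is a minimal sketch of K-fold cross-validated regularized least squares. It is not the library's implementation: `kfold_least_squares` is a hypothetical name, the folds are assumed to be contiguous, and the L2 penalty is assumed to enter through the regularized normal equations rather than through np.linalg.lstsq.

```python
import numpy as np

def kfold_least_squares(a, b, K, reg=1e-10):
    """Hypothetical sketch of K-fold cross-validated ridge regression.

    a: Nsamples × Nfeatures, b: Nsamples vector. Returns the per-fold
    weights, held-out predictions, correlation and RMS error.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n = a.shape[0]
    folds = np.array_split(np.arange(n), K)    # contiguous test blocks
    weights, bhat = [], np.zeros(n)
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        A, B = a[train], b[train]
        # Regularized normal equations: (A^T A + reg I) w = A^T B
        w = np.linalg.solve(A.T @ A + reg * np.eye(a.shape[1]), A.T @ B)
        weights.append(w)
        bhat[test] = a[test] @ w               # predict the held-out block
    cc = np.corrcoef(b, bhat)[0, 1]
    rms = np.sqrt(np.mean((b - bhat) ** 2))
    return np.array(weights), bhat, cc, rms

rng = np.random.default_rng(0)
a = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
b = a @ true_w + 0.01 * rng.normal(size=200)
w, bhat, cc, rms = kfold_least_squares(a, b, K=4)
```

Every sample is predicted exactly once, by a model that never saw it during fitting, which is what makes cc and rms honest estimates of out-of-sample performance.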

neurotools.stats.fraction_explained_deviance(L, L0, Ls)[source]

Calculate the fraction of explained deviance, which is the analogue of the linear-Gaussian r² for Generalized Linear Models (GLMs).

Parameters:
  • L (np.float32) – Model likelihood(s) evaluated on held-out test data.

  • L0 (np.float32) – Baseline likelihood, calculated by using the test-data’s mean-rate as a prediction.

  • Ls (np.float32) – Saturated model likelihood(s) calculated by using the true labels as the estimated values

Returns:

– normalized explained deviance

Return type:

np.array
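The usual definition of this quantity can be sketched as below, under the assumption that L, L0 and Ls are log-likelihoods (the text above says only "likelihood"). The function name is hypothetical.

```python
import numpy as np

def explained_deviance_fraction(L, L0, Ls):
    """Hypothetical sketch, assuming L, L0, Ls are log-likelihoods."""
    L, L0, Ls = (np.asarray(v, dtype=float) for v in (L, L0, Ls))
    # Deviance is 2*(Ls - L); the factor of 2 cancels in the ratio
    return 1.0 - (Ls - L) / (Ls - L0)

fed = explained_deviance_fraction(L=-120.0, L0=-150.0, Ls=-100.0)
```

Under this definition a model no better than the baseline scores 0 and a model matching the saturated likelihood scores 1.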

neurotools.stats.nrmse(estimate, true, axis=None)[source]

Normalized root mean-squared error.

Parameters:
  • estimate (array-like) – Estimated data values

  • true (array-like) – True data values

  • axis (int; default None) – Array axis along which to operate.

Returns:

result – Root-mean-squared error between estimate and true, normalized by the variance of true.

Return type:

np.float64

neurotools.stats.weighted_avg_and_std(values, weights)[source]

Return the weighted average and standard deviation. values and weights must be NumPy ndarrays with the same shape.

Parameters:
  • values (np.array) – Array of values for which to compute (μ,σ) weighted summary statistics

  • weights (np.array) – Weights for each value

Returns:

  • mean (np.float64) – Weighted mean

  • sigma (np.float64) – Weighted standard deviation
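A common way to compute these two statistics with NumPy is sketched below. `weighted_mean_std` is a hypothetical name, and the sketch assumes the biased (population) form of the weighted variance.

```python
import numpy as np

def weighted_mean_std(values, weights):
    """Hypothetical sketch: weighted mean and (biased) weighted std."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.average(values, weights=weights)
    variance = np.average((values - mean) ** 2, weights=weights)
    return mean, np.sqrt(variance)

mean, sigma = weighted_mean_std([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])
```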

neurotools.stats.print_stats(g, name='', prefix='')[source]

Computes, prints, and returns the mean, median, minimum, and maximum.

Parameters:

g (1D np.array) – List of samples

Returns:

  • mean (np.array)

  • median (np.array)

  • minimum (np.array)

  • maximum (np.array)

neurotools.stats.outliers(x, percent=10, side='both')[source]

Identify outliers in data based on percentiles.

Parameters:
  • x (ndarray) – 1D numeric array of data values

  • percent (number) – percent between 0 and 100 to remove

  • side (str) – ‘left’, ‘right’, or ‘both’. Default is ‘both’. Remove extreme values from the left / right / both sides of the data distribution. If ‘both’, the percent is halved and removed from both the left and the right.

Returns:

Boolean array of same shape as x indicating outliers

Return type:

np.bool

neurotools.stats.reject_outliers(x, percent=10, side='both')[source]

Reject outliers from data based on percentiles.

Parameters:
  • x (ndarray) – 1D numeric array of data values

  • percent (number) – percent between 0 and 100 to remove

  • side (str) – ‘left’, ‘right’, or ‘both’. Default is ‘both’. Remove extreme values from the left / right / both sides of the data distribution. If ‘both’, the percent is halved and removed from both the left and the right.

Returns:

  • np.ndarray – Values with outliers removed

  • kept (np.int32) – Indices of values kept

  • removed (np.int32) – Indices of values removed
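The percentile rule shared by outliers and reject_outliers can be sketched as follows. `outlier_mask` is a hypothetical helper written from the parameter descriptions; the exact handling of values equal to a cut-off in the library is not known.

```python
import numpy as np

def outlier_mask(x, percent=10, side="both"):
    """Hypothetical sketch: flag values beyond percentile cut-offs."""
    x = np.asarray(x, dtype=float)
    if side == "both":
        # Half of the percentage is removed from each tail
        lo, hi = np.percentile(x, [percent / 2, 100 - percent / 2])
        return (x < lo) | (x > hi)
    if side == "left":
        return x < np.percentile(x, percent)
    if side == "right":
        return x > np.percentile(x, 100 - percent)
    raise ValueError("side must be 'left', 'right' or 'both'")

x = np.arange(100.0)
mask = outlier_mask(x, percent=10, side="both")
kept, removed = np.where(~mask)[0], np.where(mask)[0]
```

The boolean mask corresponds to the return value described for outliers, while the kept and removed index arrays mirror what reject_outliers is documented to return.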

class neurotools.stats.Description(data)[source]

Bases: object

Quick statistical description.

short()[source]

Abbreviated statistical summary

neurotools.stats.glmfit(X, Y)[source]

Wrapper for statsmodels glmfit that prepares a constant parameter and configuration options for poisson-GLM fitting. Please see the documentation for glmfit in statsmodels for more details.

This method will automatically add a constant column to the feature matrix X.

Parameters:
  • X (array-like) – An NOBSERVATIONS × K array, where NOBSERVATIONS is the number of observations and K is the number of regressors. An intercept column does not need to be included, since this wrapper adds a constant column itself. See statsmodels.tools.add_constant.

  • Y (array-like) – Array of Poisson counts. This array can be 1d or 2d.

neurotools.stats.pca(x, n_keep=None, rank_deficient=False)[source]

Performs PCA on data x, keeping the first n_keep dimensions. Usage: w, v = pca(x, n_keep=None).

Parameters:
  • x (ndarray) – NSAMPLES×NFEATURES array on which to perform PCA

  • n_keep (int) – Number of principal components to retain

Returns:

  • w – Weights (eigenvalues)

  • v – Eigenvectors (principal components)
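For readers unfamiliar with the eigenvalue formulation, a minimal PCA sketch follows. `pca_eig` is a hypothetical name; centering the data, returning eigenvectors as columns, and sorting by descending eigenvalue are assumptions, not confirmed details of the library's pca.

```python
import numpy as np

def pca_eig(x, n_keep=None):
    """Hypothetical sketch: PCA via eigendecomposition of the covariance.

    x is NSAMPLES × NFEATURES. Returns eigenvalues w (descending) and
    eigenvectors v as columns.
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / (x.shape[0] - 1)
    w, v = np.linalg.eigh(cov)          # eigh returns ascending eigenvalues
    order = np.argsort(w)[::-1]         # largest variance first
    w, v = w[order], v[:, order]
    if n_keep is not None:
        w, v = w[:n_keep], v[:, :n_keep]
    return w, v

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.1])
w, v = pca_eig(x, n_keep=2)
```

Projecting centered data onto the retained components is then `(x - x.mean(axis=0)) @ v`.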

neurotools.stats.covariance(x, y=None, sample_deficient=False, reg=0.0, centered=True)[source]

Covariance matrix for an NSAMPLES×NFEATURES matrix. By default (centered=True), the means are subtracted from the data before computing the covariance.

Parameters:
  • x (NSAMPLES×NFEATURES array-like) – Array of input features

  • y (Nsamples x Nyfeatures array-like) – Array of input features

  • sample_deficient (bool, default False) – Whether the data contains fewer samples than it does features. If False (the default), the routine raises a ValueError when there are fewer samples than features.

  • reg (positive scalar, default 0) – Diagonal regularization to add to the covariance

  • centered (boolean, default True) – Whether to subtract the means from the data before taking the covariance.

Returns:

C – Sample covariance matrix

Return type:

np.array
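The roles of centered and reg can be illustrated with a small sketch. `sample_covariance` is hypothetical, covers only the single-input case (y=None), and assumes normalization by NSAMPLES; the library may normalize by NSAMPLES − 1 instead.

```python
import numpy as np

def sample_covariance(x, reg=0.0, centered=True):
    """Hypothetical sketch for an NSAMPLES × NFEATURES array x."""
    x = np.asarray(x, dtype=float)
    if centered:
        x = x - x.mean(axis=0)
    c = x.T @ x / x.shape[0]                 # assumption: divide by NSAMPLES
    return c + reg * np.eye(x.shape[1])      # diagonal regularization

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4))
C = sample_covariance(x, reg=0.1)
```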

neurotools.stats.get_factor_analysis(X, NFACTORS)[source]

Wrapper to fit factor analysis model, extract the model, and sort by factor importance.

Parameters:
  • X (np.array) – Multivariate signal

  • NFACTORS (int) – Number of factors to fit

Returns:

  • Y – Result of fa.fit_transform(X)

  • Sigma – fa.noise_variance_

  • F – fa.components_

  • lmbda – Loadings diag(F.dot(F.T))

  • fa (sklearn.decomposition.FactorAnalysis) – Fitted factor analysis model

neurotools.stats.project_factors(X, F, S)[source]

Project observations X with noise variances S onto latent factors F. This uses the same argument/return conventions as scikit-learn’s factor analysis.

Parameters:
  • X (array-like) – data

  • F (array-like) – factor matrix

  • S (array-like) – i.i.d variances

neurotools.stats.predict_latent(fa, predict_from, X)[source]

Predict mean of all factors from predict_from factors.

Parameters:
  • fa (sklearn.decomposition.FactorAnalysis) – Fitted factor analysis model

  • predict_from (list of int) – Factor indices to use for prediction

  • X (np.array) – Underlying signal

Returns:

Xthat – Predicted means over time

Return type:

np.array

neurotools.stats.factor_predict(fa, predict_from, predict_to, X)[source]

Predict mean, variance of predict_to factors from predict_from factors.

Parameters:
  • fa (sklearn.decomposition.FactorAnalysis) – Fitted factor analysis model

  • predict_from (list of int) – Factor indices to use for prediction

  • predict_to (list of int) – Factor indices to predict

  • X (np.array) – Underlying signal

Returns:

  • Xthat (np.array) – Predicted means over time

  • Xtc (np.array) – Predicted covariance over time

neurotools.stats.nanrankdata(x, mode='fraction')[source]

A variant of rankdata that handles NaNs better. Returns normalized rank in (0,1) by default.

Parameters:
  • x (iterable) – Data to be ranked

  • mode (str; default 'fraction') – One of:

      'fraction': return rank of non-NaN values in (0,1)

      'percentile': return rank of non-NaN values in (0,100)

      None: return integer ranks (NaNs excluded)
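A NaN-aware fractional ranking can be sketched as below. `nan_fraction_rank` is a hypothetical helper covering only the 'fraction' mode; the (rank + 0.5) / N convention and the position-based tie-breaking are assumptions, chosen so that results stay strictly inside (0,1).

```python
import numpy as np

def nan_fraction_rank(x):
    """Hypothetical sketch: fractional ranks in (0, 1), NaNs left as NaN."""
    x = np.asarray(x, dtype=float)
    result = np.full(x.shape, np.nan)
    valid = ~np.isnan(x)
    # Double argsort gives 0-based ranks (ties broken by position)
    ranks = np.argsort(np.argsort(x[valid]))
    result[valid] = (ranks + 0.5) / valid.sum()
    return result

ranked = nan_fraction_rank([3.0, np.nan, 1.0, 2.0])
```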

neurotools.stats.mean_confidence_interval(data, confidence=0.95)[source]

neurotools.stats.minmax(x)[source]