neurotools.obsolete.gpu.cu.device module

Functions for querying the physical graphics card device and for sizing kernel launches to fit it.

neurotools.obsolete.gpu.cu.device.estimateThreadsPerBlock(cudamodule)[source]

This function accepts a compiled CUDA module. It estimates the number of threads from this module that can fit in one block in the current context, and returns the largest thread count that does not exceed the limits on shared memory, registers, or threads per block, rounded down to a multiple of the warp size.
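
A minimal sketch of the same estimate, assuming pycuda and a CUDA-capable device are available. It works on a single kernel rather than a whole module, and the kernel source and the name "kern" are illustrative only, not part of this module:

    import pycuda.autoinit
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    mod  = SourceModule("__global__ void kern(float *x){ x[threadIdx.x] *= 2.f; }")
    func = mod.get_function("kern")
    dev  = pycuda.autoinit.device

    warp     = dev.get_attribute(drv.device_attribute.WARP_SIZE)
    hard_max = min(
        dev.get_attribute(drv.device_attribute.MAX_THREADS_PER_BLOCK),
        func.get_attribute(drv.function_attribute.MAX_THREADS_PER_BLOCK))

    # Registers are allocated per thread, so the per-block register file
    # caps the thread count whenever the kernel uses registers at all.
    regs = func.get_attribute(drv.function_attribute.NUM_REGS)
    if regs > 0:
        max_regs = dev.get_attribute(drv.device_attribute.MAX_REGISTERS_PER_BLOCK)
        hard_max = min(hard_max, max_regs // regs)

    # Static shared memory is allocated per block, not per thread; it only
    # needs to fit within the device limit.
    smem = func.get_attribute(drv.function_attribute.SHARED_SIZE_BYTES)
    assert smem <= dev.get_attribute(drv.device_attribute.MAX_SHARED_MEMORY_PER_BLOCK)

    # Round down to a whole number of warps.
    print((hard_max // warp) * warp)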

neurotools.obsolete.gpu.cu.device.estimateBlocks(cudamodule, n_units)[source]

Called after estimateThreadsPerBlock. This function estimates the number of blocks needed to process n_units units; it will not return more blocks than there are multiprocessors.

If more blocks would be needed than there are multiprocessors, my convention is to loop within the kernel. It is unclear to me whether running more blocks than there are multiprocessors is more or less efficient than looping within blocks.
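
A minimal sketch of this estimate, assuming pycuda is available; the name estimate_blocks_sketch is illustrative, and threads_per_block would come from the previous step:

    import pycuda.autoinit
    import pycuda.driver as drv

    def estimate_blocks_sketch(n_units, threads_per_block):
        n_mp   = pycuda.autoinit.device.get_attribute(
            drv.device_attribute.MULTIPROCESSOR_COUNT)
        needed = -(-n_units // threads_per_block)  # ceiling division
        return min(needed, n_mp)                   # cap at the multiprocessor count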

neurotools.obsolete.gpu.cu.device.estimateLoop(cudamodule, n_units)[source]

Called after estimateBlocks. If there are not enough multiprocessors to handle n_units in a single pass, this returns the number of loop iterations each kernel needs to process all the data.
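
A minimal sketch of the loop count, assuming blocks and threads_per_block come from the two estimates above; the function name is illustrative:

    def estimate_loop_sketch(n_units, blocks, threads_per_block):
        # Passes needed so that blocks * threads_per_block * loops >= n_units.
        per_pass = blocks * threads_per_block
        return -(-n_units // per_pass)   # ceiling division

    print(estimate_loop_sketch(10000, 30, 256))   # -> 2, since 30 * 256 = 7680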

neurotools.obsolete.gpu.cu.device.card_info()[source]

Returns information about the current GPU device, as known to pycuda, as a string.
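
A minimal sketch in the spirit of this function, assuming pycuda is available; it formats the device name and the attribute dictionary that pycuda exposes:

    import pycuda.autoinit

    dev   = pycuda.autoinit.device
    lines = [dev.name()]
    for attr, value in dev.get_attributes().items():
        lines.append('%s: %s' % (attr, value))
    print('\n'.join(lines))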

neurotools.obsolete.gpu.cu.device.missing(*args, **kwargs)[source]