neurotools.jobs.initialize_system_cache module

Static initialization routines accompanying neurotools_cache. These routines were written to set up a caching framework for the Oscar high-performance computing cluster at Brown University, and have not yet been adapted for general use. For example, they still contain hard-coded user-specific paths.

neurotools.jobs.initialize_system_cache.purge_ram_cache(cache_identifier='neurotools_cache')[source]

Deletes the ramdisk cache. USE WITH CAUTION.

This will rm -rf the entire ramdisk_location and is EXTREMELY dangerous. It has been disabled and now raises NotImplementedError.

neurotools.jobs.initialize_system_cache.purge_ssd_cache(cache_identifier='neurotools_cache')[source]

Deletes the SSD cache. USE WITH CAUTION.

This will rm -rf the entire level2_location and is EXTREMELY dangerous. It has been disabled and now raises NotImplementedError.

neurotools.jobs.initialize_system_cache.du(location)[source]

Returns the disk usage (du) of a file.

Parameters:

location (string) – Path on filesystem

Returns:

du – File size in bytes

Return type:

integer
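
A minimal sketch of how this helper might be implemented, assuming it wraps the GNU du command-line utility (the actual implementation may differ):

import subprocess

def du(location):
    # `du -sb` prints "<size-in-bytes>\t<path>"; keep the first field.
    output = subprocess.check_output(['du', '-sb', location])
    return int(output.split()[0])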

neurotools.jobs.initialize_system_cache.reset_ramdisk(force=False, override_ramdisk_location=None)[source]

This will create a 500 GB ramdisk on Debian Linux. This allows in-RAM inter-process communication using the filesystem metaphor. It should be considered dangerous: it runs shell commands that require sudo privileges. In some cases, these commands may not execute automatically (e.g. if called from a Jupyter or IPython notebook inside a browser). In this case, one must run the commands by hand.

Parameters:

force (bool) – Modifying the configuration of a ramdisk is risky; the function fails with a warning unless force is set to True.
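
As a sketch, the commands involved resemble the following; the mount point and size here are illustrative assumptions, not the function's actual configuration:

import subprocess

# Hypothetical mount point; the real path is configurable.
ramdisk = '/media/neurotools_level1'
# Create the mount point, then mount a RAM-backed tmpfs capped at 500 GB.
subprocess.run(['sudo', 'mkdir', '-p', ramdisk], check=True)
subprocess.run(['sudo', 'mount', '-t', 'tmpfs', '-o', 'size=500G',
                'tmpfs', ramdisk], check=True)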

neurotools.jobs.initialize_system_cache.launch_cache_synchronizers(cache_identifier='neurotools_cache')[source]

Deprecated; now raises NotImplementedError.

Inter-process communication is mediated via shared caches mapped onto the filesystem. If a collection of processes is distributed over a large filesystem, they may need to share data.

Notes:

This solution originally spawned rsync jobs to keep a collection of locations in the filesystem synchronized. This is bad for the following reasons:

  • Mis-configuration can lead to loss of data.

  • Not all jobs need to share all cache values.

  • This sort of synchronization should be done lazily.

neurotools.jobs.initialize_system_cache.initialize_caches(level1='/home/mer49/.neurotools_ramdisk', level2=None, level3=None, force=False, verbose=False, cache_identifier='neurotools_cache')[source]

Static cache initialization code. This should be run with caution.

Caches can be set up in a hierarchy from fast to slow. If a cache entry is missing in the fast cache, it can be repopulated from a slower cache.

For example, a cache hierarchy might include:

  • a local RAM disk for inter-process communication between live processes

  • a local SSD for frequently used intermediate values

  • a local HDD for larger working datasets

  • a network filesystem for large databases

neurotools.jobs.ndecorator.memoize memoizes within process memory. unsafe_disk_cache memoizes within memory, SSD, and possibly HDD, in a hierarchy. This function patches neurotools.jobs.ndecorator.memoize, replacing it with the disk cacher, which causes all dependent code to automatically gain persistent disk memoization.

This function must be called before importing other libraries, since the patch only affects modules imported after it runs.
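
For example (a sketch; my_analysis_module is a hypothetical module whose functions use the memoize decorator):

import neurotools.jobs.initialize_system_cache as syscache

# Initialize caching first, so the patched memoize decorator is in place
# before any dependent module binds it at import time.
syscache.initialize_caches(
    level1='/media/neurotools_level1',  # illustrative RAM-disk path
    force=True)

import my_analysis_module  # hypothetical; its memoized functions now cache to disk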

Parameters:
  • level1 (str) – Path to a RAM disk for caching intermediate results

  • level2 (str) – Path to an SSD providing persistent storage (backing the RAM disk)

  • level3 (str) – Optional path to a hard disk; larger but slower storage space.

  • force (bool, default False) – The disk caching framework is still experimental and could lead to loss of data if there is a bug (or worse!). By default, this routine and its subroutines will not run unless forced. By requiring the user to set force=True explicitly, we hope to encourage caution when using this functionality.

Example

This code was originally called from a function like this, set up for specific configurations in the Truccolo lab:

import os

myhost = os.uname()[1]
if myhost in ('moonbase',):
    level1_location = '/media/neurotools_level1'
    level2_location = '/ssd_1/mrule'
    level3_location = '/ldisk_1/mrule'
elif myhost in ('basecamp',):
    level1_location = '/media/neurotools_level1'
    level2_location = '/home/mrule'
    level3_location = None  # no slower third tier on this host
elif myhost in ('RobotFortress','petra'):
    level1_location = '/Users/mrule/neurotools_level1'
    level2_location = '/Users/mrule'
    level3_location = None  # no slower third tier on this host
else:
    print('New system; cache locations will need configuring.')
    level1_location = level2_location = level3_location = None
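
The resolved locations would then be passed to initialize_caches, along these lines (a sketch):

from neurotools.jobs.initialize_system_cache import initialize_caches

if level1_location is not None:
    initialize_caches(
        level1=level1_location,
        level2=level2_location,
        level3=level3_location,
        force=True)
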
neurotools.jobs.initialize_system_cache.cache_test()[source]

Runs a test of the disk cache to check that everything is working; called if this script is run as __main__.