Configuring asyncmd#

Several functions configure asyncmd's resource usage at runtime. The most notable are the functions that limit resource use (e.g. the number of concurrent SLURM jobs, open files, or processes) and the functions that influence Trajectory CV value caching, such as setting the default cache type for all Trajectory objects or registering (and deregistering) h5py.File or h5py.Group objects for caching.

Show/print current configuration#

asyncmd.config.show_config() → None#

Print/show current configuration.
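As a rough illustration of how the functions documented on this page fit together, the sketch below applies a few limits and then prints the result. The helper name configure_asyncmd and all numeric values are hypothetical; the helper takes the configuration module as an argument so that it can be exercised without a SLURM cluster.

```python
def configure_asyncmd(config):
    """Apply a few resource limits; `config` is expected to be `asyncmd.config`."""
    # Cap python processes at 32 regardless of CPU count (hypothetical value).
    config.set_max_process(max_num=32)
    # Use the system's soft limit for open files, keeping the default margin.
    config.set_max_files_open(margin=30)
    # Allow at most 100 simultaneously submitted SLURM jobs (hypothetical value).
    config.set_slurm_max_jobs(100)
    # Only the settings passed here are changed, all others are kept.
    config.set_slurm_settings(min_time_between_sacct_calls=30)
    # Print the resulting configuration for a quick sanity check.
    config.show_config()


# Intended use (requires asyncmd to be installed):
#     import asyncmd
#     configure_asyncmd(asyncmd.config)
```

Calling show_config() last makes it easy to confirm that the limits were applied as intended.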

General resource usage#

asyncmd.config.set_max_process(num: int | None = None, max_num: int | None = None) → None#

Set the maximum number of concurrent python processes.

If num is None, default to os.cpu_count() / 4.

Parameters:
  • num (int, optional) – Number of processes, if None will default to 1/4 of the CPU count.

  • max_num (int, optional) – If given, the number of processes cannot exceed this number, independent of the CPU count. Mostly useful for code that runs on multiple machines (with different CPU counts) but should still avoid spawning hundreds of processes.
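The interplay of num and max_num can be sketched as follows. This is not asyncmd's implementation: the helper resolve_max_process is hypothetical, and the rounding (floor division with a minimum of one process) as well as capping an explicitly passed num are assumptions.

```python
import os


def resolve_max_process(num=None, max_num=None, cpu_count=None):
    """Return the process limit implied by `num` and `max_num` (sketch)."""
    if cpu_count is None:
        # os.cpu_count() may return None if the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    if num is None:
        # Default to a quarter of the CPUs; rounding down is an assumption.
        num = max(1, cpu_count // 4)
    if max_num is not None:
        # Never exceed the explicit upper bound (assumed to also cap `num`).
        num = min(num, max_num)
    return num
```

On a 512-core node, resolve_max_process(max_num=32, cpu_count=512) yields 32 instead of 128, which is the situation max_num is meant for.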

asyncmd.config.set_max_files_open(num: int | None = None, margin: int = 30) → None#

Set the maximum number of concurrently opened files.

By default, use the system's soft resource limit.

Parameters:
  • num (int, optional) – Maximum number of open files, if None use the system's (soft) resource limit, by default None.

  • margin (int, optional) – Safety margin to keep, i.e. we will only ever open num - margin files, by default 30.

Raises:

ValueError – If num <= margin.
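The rule described above can be sketched with the standard library. The helper resolve_max_files_open is hypothetical and not part of asyncmd; reading the soft limit via resource.getrlimit is an assumption about how the system limit would be obtained, and the resource module is only available on Unix-like systems.

```python
import resource


def resolve_max_files_open(num=None, margin=30):
    """Return how many files may actually be opened (sketch)."""
    if num is None:
        # The soft limit is the first element of the (soft, hard) tuple.
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        num = soft
    if num <= margin:
        # Mirrors the documented error condition.
        raise ValueError(f"num ({num}) must be larger than margin ({margin}).")
    # Keep `margin` file descriptors in reserve.
    return num - margin
```

With a soft limit of 1024 and the default margin, at most 994 files would be opened concurrently.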

SLURM settings and resource usage#

asyncmd.config.set_slurm_max_jobs(num: int | None) → None#

Set the maximum number of simultaneously submitted SLURM jobs.

Parameters:

num (int or None) – The maximum number of simultaneous SLURM jobs for this invocation of python/asyncmd. None means do not limit the maximum number of jobs.
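One common way to enforce such a limit in asyncio-based code is a semaphore. The sketch below only illustrates the idea and is not taken from asyncmd: the function run_limited is hypothetical, and asyncio.sleep stands in for submitting and awaiting a SLURM job.

```python
import asyncio


async def run_limited(num_jobs, max_jobs):
    """Run `num_jobs` dummy jobs and return the peak number running at once."""
    # None means "do not limit", so no semaphore is created in that case.
    semaphore = asyncio.Semaphore(max_jobs) if max_jobs is not None else None
    running = 0
    peak = 0

    async def job():
        nonlocal running, peak
        if semaphore is not None:
            await semaphore.acquire()
        try:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # placeholder for the actual job
            running -= 1
        finally:
            if semaphore is not None:
                semaphore.release()

    await asyncio.gather(*(job() for _ in range(num_jobs)))
    return peak
```

Here asyncio.run(run_limited(10, 3)) never has more than three jobs in flight, while passing None lets all ten run at the same time.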

Note

The function below is an alias for (i.e. imported from) asyncmd.slurm.config.set_slurm_settings(). It is recommended/preferred to use asyncmd.config.set_slurm_settings().

asyncmd.config.set_slurm_settings(*, sinfo_executable: str | None = None, sacct_executable: str | None = None, sbatch_executable: str | None = None, scancel_executable: str | None = None, min_time_between_sacct_calls: int | None = None, num_fails_for_broken_node: int | None = None, success_to_fail_ratio: int | None = None, exclude_nodes: list[str] | None = None) → None#

Set single or multiple settings relevant for SLURM job control.

Call this function if you want to change e.g. the path/name of SLURM executables. This function only modifies those settings for which a value other than None is passed. See set_all_slurm_settings if you want to set/modify all SLURM settings and/or reset them to their defaults.

Parameters:
  • sinfo_executable (str, optional) – Name or path to the sinfo executable, by default None.

  • sacct_executable (str, optional) – Name or path to the sacct executable, by default None.

  • sbatch_executable (str, optional) – Name or path to the sbatch executable, by default None.

  • scancel_executable (str, optional) – Name or path to the scancel executable, by default None.

  • min_time_between_sacct_calls (int, optional) – Minimum time (in seconds) between subsequent sacct calls, by default None.

  • num_fails_for_broken_node (int, optional) – Number of failed jobs we need to observe per node before declaring it to be broken (and not submitting any more jobs to it), by default None.

  • success_to_fail_ratio (int, optional) – Number of successful jobs we need to observe per node to decrease the failed job counter by one, by default None.

  • exclude_nodes (list[str], optional) – List of nodes to exclude in job submissions, by default None, which results in no excluded nodes.
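The interaction of num_fails_for_broken_node and success_to_fail_ratio can be pictured as a per-node counter. The class below is a hypothetical sketch, not asyncmd's bookkeeping: resetting the success count after each decrement and never letting the failure count drop below zero are assumptions.

```python
from collections import defaultdict


class NodeFailureTracker:
    """Track failed and successful jobs per node (illustrative sketch)."""

    def __init__(self, num_fails_for_broken_node=3, success_to_fail_ratio=50):
        self.num_fails_for_broken_node = num_fails_for_broken_node
        self.success_to_fail_ratio = success_to_fail_ratio
        self.fails = defaultdict(int)
        self.successes = defaultdict(int)
        self.broken_nodes = set()

    def record_failure(self, node):
        self.fails[node] += 1
        if self.fails[node] >= self.num_fails_for_broken_node:
            # Enough failures observed: stop submitting jobs to this node.
            self.broken_nodes.add(node)

    def record_success(self, node):
        self.successes[node] += 1
        if self.successes[node] >= self.success_to_fail_ratio:
            # Enough successes offset one earlier failure (assumed reset).
            self.successes[node] = 0
            self.fails[node] = max(0, self.fails[node] - 1)
```

A larger success_to_fail_ratio makes the tracker slower to forgive a node, so occasional failures on an otherwise healthy node still add up over a long run.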

Note

The function below is an alias for (i.e. imported from) asyncmd.slurm.config.set_all_slurm_settings(). It is recommended/preferred to use asyncmd.config.set_all_slurm_settings().

asyncmd.config.set_all_slurm_settings(*, sinfo_executable: str = 'sinfo', sacct_executable: str = 'sacct', sbatch_executable: str = 'sbatch', scancel_executable: str = 'scancel', min_time_between_sacct_calls: int = 10, num_fails_for_broken_node: int = 3, success_to_fail_ratio: int = 50, exclude_nodes: list[str] | None = None) → None#

(Re)initialize all settings relevant for SLURM job control.

Call this function if you want to change e.g. the path/name of SLURM executables. Note that this is a convenience function to set all SLURM settings at once in one central place, i.e. calling it overwrites all previous settings. If this is not intended, use the set_slurm_settings function, which changes only the passed arguments, or set/modify each setting separately in the SlurmProcess and SlurmClusterMediator classes.

Parameters:
  • sinfo_executable (str, optional) – Name or path to the sinfo executable, by default “sinfo”.

  • sacct_executable (str, optional) – Name or path to the sacct executable, by default “sacct”.

  • sbatch_executable (str, optional) – Name or path to the sbatch executable, by default “sbatch”.

  • scancel_executable (str, optional) – Name or path to the scancel executable, by default “scancel”.

  • min_time_between_sacct_calls (int, optional) – Minimum time (in seconds) between subsequent sacct calls, by default 10.

  • num_fails_for_broken_node (int, optional) – Number of failed jobs we need to observe per node before declaring it to be broken (and not submitting any more jobs to it), by default 3.

  • success_to_fail_ratio (int, optional) – Number of successful jobs we need to observe per node to decrease the failed job counter by one, by default 50.

  • exclude_nodes (list[str] or None, optional) – List of nodes to exclude in job submissions, by default None, which results in no excluded nodes.
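The difference between set_slurm_settings (change only what is passed) and set_all_slurm_settings (overwrite everything, falling back to defaults) can be sketched with a small dataclass. The names SlurmSettings, update_settings and reset_settings are hypothetical and only model the documented semantics; asyncmd stores its settings differently.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SlurmSettings:
    """Subset of the documented settings with their documented defaults."""
    sacct_executable: str = "sacct"
    sbatch_executable: str = "sbatch"
    min_time_between_sacct_calls: int = 10
    num_fails_for_broken_node: int = 3


def update_settings(current, **changes):
    # Like set_slurm_settings: values that are None leave the setting as is.
    passed = {key: val for key, val in changes.items() if val is not None}
    return replace(current, **passed)


def reset_settings(**values):
    # Like set_all_slurm_settings: everything not passed returns to its default.
    return SlurmSettings(**values)
```

After update_settings(settings, min_time_between_sacct_calls=30) a previously customized sacct_executable is kept, whereas reset_settings(min_time_between_sacct_calls=30) returns it to "sacct".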

CV value caching#

asyncmd.config.set_trajectory_cache_type(cache_type: str, copy_content: bool = True, clear_old_cache: bool = False) → None#

Set the cache type for TrajectoryFunctionWrapper values.

By default the content of the current caches is copied to the new caches. To clear the old/previously set caches (after copying their values), pass clear_old_cache=True.

Parameters:
  • cache_type (str) – One of “h5py”, “npz”, “memory”.

  • copy_content (bool, optional) – Whether to copy the current cache content to the new cache, by default True

  • clear_old_cache (bool, optional) – Whether to clear the old/previously set cache, by default False.

Raises:

ValueError – Raised if cache_type is not one of the allowed values.

asyncmd.config.register_h5py_cache(h5py_group: h5py.Group | h5py.File, copy_h5py: bool = False, copy_content: bool = True, clear_old_cache: bool = False) → None#

Register an h5py file or group for CV value caching.

Optionally copy over all cached values from the previously set h5py cache(s), including those for Trajectory objects that are currently not instantiated; see the copy_h5py argument. If it is True, all previously set caches are deregistered after copying (since their values are then also available in the newly set cache).

Note that in case the trajectory cache type is currently not “h5py”, this function sets the cache type to “h5py”, i.e. it calls set_trajectory_cache_type() with cache_type="h5py". The arguments copy_content and clear_old_cache are directly passed to set_trajectory_cache_type().

Note that an h5py.File is just a slightly special h5py.Group, so you can pass either. asyncmd will use either the file or the group as the root of its own stored values. E.g. h5py_group["asyncmd/TrajectoryFunctionValueCache"] will always point to the cached trajectory values, and if h5py_group is the top-level group (i.e. the file), you also have file["/asyncmd/TrajectoryFunctionValueCache"] == h5py_group["asyncmd/TrajectoryFunctionValueCache"].

Parameters:
  • h5py_group (h5py.Group or h5py.File) – The file or group to use for caching.

  • copy_h5py (bool, optional, by default False) – Whether to copy over all cached values from the previously set h5py cache (even for Trajectory objects that are currently not instantiated).

  • copy_content (bool, optional) – Whether to copy the current cache content to the new cache, by default True

  • clear_old_cache (bool, optional) – Whether to clear the old/previously set cache, by default False.

asyncmd.config.deregister_h5py_cache(h5py_group: h5py.Group | h5py.File)#

Deregister a given h5py_group from use as a cache for trajectory function values.

Also deregisters the given h5py_group from all asyncmd.Trajectory objects currently in existence.

Parameters:

h5py_group (h5py.Group | h5py.File) – The h5py_group to deregister.
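Registration and deregistration can be paired so that a cache is always released again, for example before its file is closed. The context manager below is one possible pattern, not something asyncmd provides: the name h5py_cache is hypothetical, and the configuration module is passed in explicitly so that the sketch does not depend on asyncmd or h5py being importable.

```python
from contextlib import contextmanager


@contextmanager
def h5py_cache(config, h5py_group, **register_kwargs):
    """Register `h5py_group` as cache and deregister it on exit (sketch).

    `config` is expected to be `asyncmd.config`; `register_kwargs` are passed
    on to register_h5py_cache (e.g. copy_h5py=True).
    """
    config.register_h5py_cache(h5py_group, **register_kwargs)
    try:
        yield h5py_group
    finally:
        # Runs even if the body raises, so the group is always deregistered.
        config.deregister_h5py_cache(h5py_group)


# Intended use (requires asyncmd and h5py to be installed):
#     with h5py.File("cache.h5", "a") as h5file:
#         with h5py_cache(asyncmd.config, h5file, copy_h5py=True):
#             ...  # run the simulation, CV values are cached in h5file
```

Nesting it inside the h5py.File context ensures the group is deregistered before the file is closed.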