Trajectories, TrajectoryFunctions, and value caching

Trajectories, TrajectoryFunctions, and value caching#

Trajectories#

The asyncmd.trajectory module contains a Trajectory class, which is the return object for all MD engines. These objects give easy access to a number of properties of the underlying trajectory, such as the length in frames or time, the integration step, and many more. Note that Trajectory objects are unique in the sense that every combination of underlying trajectory_files gives you the same object back, even if you instantiate it multiple times, i.e. is will be True for the two objects (in addition to == being True). Trajectory objects can also be pickled and unpickled. You can even change the file path of the underlying trajectories, i.e. copy or move them to another location (consider also moving the npz cache files), and still unpickle to get a working Trajectory object, as long as the relative path between your Python workdir and the trajectory files does not change. Alternatively, you can change the workdir of the Python interpreter as long as the trajectory files remain at the same location in the filesystem.
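The uniqueness property described above can be illustrated with a small registry pattern. This is a minimal sketch and not asyncmd's implementation: the class name UniqueByFiles is hypothetical, and keying the registry on absolute file paths is an assumption made only for this example.

```python
import os


class UniqueByFiles:
    """Hypothetical stand-in showing one way to get one object per set of files."""

    # Maps a tuple of absolute paths to the single instance created for it.
    _registry = {}

    def __new__(cls, trajectory_files):
        # Accept a single path or a list of paths.
        if isinstance(trajectory_files, str):
            trajectory_files = [trajectory_files]
        key = tuple(os.path.abspath(f) for f in trajectory_files)
        try:
            # Same files as before: hand back the existing object.
            return cls._registry[key]
        except KeyError:
            obj = super().__new__(cls)
            obj.trajectory_files = list(key)
            cls._registry[key] = obj
            return obj


a = UniqueByFiles("run/traj.xtc")
b = UniqueByFiles(["run/traj.xtc"])
c = UniqueByFiles("run/other.xtc")

print(a is b)  # same underlying files, same object
print(a is c)  # different files, different object
```

Because both names refer to one object, anything attached to it (such as cached values) is shared no matter where in the program the object was requested.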

TrajectoryFunctions and caching their values#

asyncmd comes with a number of TrajectoryFunctionWrapper (sub)classes, which can be used to wrap (Python) functions or arbitrary executables for easy concurrent application on Trajectory objects, either submitted via slurm or run locally. Currently included are the PyTrajectoryFunctionWrapper and the SlurmTrajectoryFunctionWrapper, but it is straightforward to implement your own (see here for an in-depth explanation). The benefit of the wrapped functions is that the calculated values are cached automatically. The caching even persists over multiple reloads and invocations of the Python interpreter.
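The idea of an awaitable, cached function applied concurrently to several inputs can be sketched with the standard library alone. The class CachedAsyncFunction and the function n_characters are hypothetical, plain strings stand in for Trajectory objects, and the cache here lives only in memory, so this does not reproduce the persistent caching that asyncmd provides.

```python
import asyncio


class CachedAsyncFunction:
    """Hypothetical stand-in: makes a blocking function awaitable and caches results."""

    def __init__(self, function):
        self.function = function
        self.n_evaluations = 0
        self._cache = {}

    async def __call__(self, trajectory_key):
        if trajectory_key not in self._cache:
            self.n_evaluations += 1
            # Run the blocking function in a worker thread so that other
            # coroutines can make progress in the meantime.
            self._cache[trajectory_key] = await asyncio.to_thread(
                self.function, trajectory_key
            )
        return self._cache[trajectory_key]


def n_characters(trajectory_key):
    # Placeholder for an expensive calculation on a trajectory.
    return len(trajectory_key)


async def main():
    wrapped = CachedAsyncFunction(n_characters)
    keys = ["a.xtc", "bb.xtc", "ccc.xtc"]
    # First pass computes all values concurrently.
    first = await asyncio.gather(*(wrapped(k) for k in keys))
    # Second pass is served from the cache without re-evaluation.
    second = await asyncio.gather(*(wrapped(k) for k in keys))
    return first, second, wrapped.n_evaluations


first, second, n_evaluations = asyncio.run(main())
print(first, second, n_evaluations)
```

The second pass returns the same values while the evaluation counter stays at three, which is the behaviour one relies on when the same function is requested repeatedly for the same trajectory.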

You can set the default caching mechanism for all Trajectory objects centrally via asyncmd.config.set_trajectory_cache_type(). The default caching mechanism creates hidden numpy npz files for every Trajectory (named after the trajectory) in which the values are stored. Other caching mechanisms are an in-memory cache and the option to store all cached values in an h5py.File or h5py.Group (highly recommended for use cases with many calculated/cached function values).

The h5py-based caching mechanism supports registering multiple h5py caches (also in read-only mode) and retrieves values from all registered cache sources. This enables you to, e.g., open multiple different h5py.File objects in read-only mode for an analysis and easily work with any of the cached values that have ever been calculated and stored in any of the files. It is as easy as registering all h5py.File objects via asyncmd.config.register_h5py_cache(). Please note that only one writable h5py cache can be registered at a time, i.e. registering a new writable h5py cache removes the previous one. Use the copy_h5py argument of asyncmd.config.register_h5py_cache() in case you want to consolidate the cached values into one (writable) h5py cache source. When you close an h5py.File registered for caching, you need to deregister it as a cache source (best directly before closing the file) using asyncmd.config.deregister_h5py_cache(). This is only necessary if you want to keep using the currently instantiated Trajectory objects, as they can otherwise no longer access (part of) their cached values.
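The registration rules for cache sources (many read-only sources, at most one writable source, lookups across all of them) can be modelled in a few lines. This is a sketch of the semantics only: the class CacheSources is hypothetical, plain dicts stand in for h5py.File objects, and the lookup order (writable source first) is an assumption not stated above.

```python
class CacheSources:
    """Hypothetical model of registered cache sources, using dicts as stores."""

    def __init__(self):
        self.writable = None
        self.read_only = []

    def register(self, store, read_only=False):
        if read_only:
            self.read_only.append(store)
        else:
            # Only one writable source at a time: a new one replaces the old.
            self.writable = store

    def deregister(self, store):
        if self.writable is store:
            self.writable = None
        self.read_only = [s for s in self.read_only if s is not store]

    def lookup(self, key):
        sources = [] if self.writable is None else [self.writable]
        for source in sources + self.read_only:
            if key in source:
                return source[key]
        raise KeyError(key)

    def store(self, key, value):
        if self.writable is None:
            raise RuntimeError("no writable cache source registered")
        self.writable[key] = value


old_run = {"cv_a": [0.1, 0.2]}
older_run = {"cv_b": [1.0, 2.0]}
current = {}

sources = CacheSources()
sources.register(old_run, read_only=True)
sources.register(older_run, read_only=True)
sources.register(current)

sources.store("cv_c", [5.0])
found = [sources.lookup(k) for k in ("cv_a", "cv_b", "cv_c")]
print(found)

# After deregistering a source its values are no longer reachable.
sources.deregister(old_run)
```

New values only ever land in the single writable store, which is why consolidating several sources into one requires an explicit copy step.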

Trajectory#

class asyncmd.Trajectory(trajectory_files: list[str] | str, structure_file: str, nstout: int | None = None, **kwargs)#

Represent a trajectory.

Keep track of the paths of the trajectory and the structure files. Caches values for (wrapped) functions acting on the trajectory. Supports pickling and unpickling with the cached values restored: if a non-persistent cache is used when pickling, the values are written to a hidden numpy npz file next to the trajectory and read back at unpickling. Supports equality checks with other Trajectory objects. Also makes available (and caches) a number of useful attributes, e.g. first_step and last_step (the first and last integration step in the trajectory), dt, first_time, last_time, and length (in frames). All properties are read-only (for the simple reason that they depend only on the underlying trajectory files). A special case is nstout, the output frequency in integration steps. Since it cannot be reliably read or inferred from the trajectory files alone, it can be set by the user (at initialization or later via the property).

Notes

first_step and last_step are only useful for trajectories that come directly from an asyncmd.mdengine.MDEngine. As soon as the trajectory has been concatenated using MDAnalysis (e.g. with the TrajectoryConcatenator), the step information is just the frame number in the trajectory part that became the first/last frame in the concatenated trajectory.

Initialize a Trajectory.

Parameters:

trajectory_files (list[str] or str) – Absolute or relative path(s) to the trajectory file(s), e.g. trr, xtc, dcd, …
structure_file (str) – Absolute or relative path to the structure file (e.g. tpr, gro).
nstout (int or None, optional) – The output frequency used when creating the trajectory, by default None

Raises:

FileNotFoundError – If the trajectory_files or the structure_file are not accessible.

__hash__() → int#: Return hash(self).

__len__() → int#

Return the number of frames in the trajectory.

Returns: The number of frames in the trajectory.
Return type: int

clear_all_cache_values() → None#

Clear all function values cached for this Trajectory.

For file-based caches, this also removes the associated cache files. Note that this just calls the clear_all_values method of the underlying TrajectoryFunctionValueCache class.

deregister_h5py_cache(h5py_group: h5py.File | h5py.Group) → None#

Deregister the given h5py_group as a source of cached values.

Parameters: h5py_group (h5py.File | h5py.Group) – The h5py_group to deregister/remove from caching.
Raises: RuntimeError – When the cache type is not a h5py cache and no deregistering is possible.

update_cache_type(copy_content: bool = True, clear_old_cache: bool = False) → None#

Update the TrajectoryFunctionValueCache this Trajectory uses.

By default the content of the current cache is copied to the new cache. See asyncmd.config.set_trajectory_cache_type() to set the cache_type. To clear the old/previously set cache (after copying its values), pass clear_old_cache=True.

Parameters:

copy_content (bool, optional) – Whether to copy the current cache content to the new cache, by default True
clear_old_cache (bool, optional) – Whether to clear the old/previously set cache, by default False.

property dt: float#: The time interval between subsequent frames (not steps) in ps.

property first_step: int | None#: Return the integration step of the first frame in the trajectory.

property first_time: float#: Return the integration time of the first frame in ps.

property last_step: int | None#: Return the integration step of the last frame in the trajectory.

property last_time: float#: Return the integration time of the last frame in ps.

property nstout: int | None#: Output frequency between subsequent frames in integration steps.
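The relation between nstout, the integrator timestep, and dt can be made concrete with a short calculation. The numbers are hypothetical (a 2 fs timestep written out every 10 steps) and the variables are plain Python values, not attributes read from a Trajectory.

```python
import math

# Hypothetical MD settings, chosen only for illustration.
integration_timestep_ps = 0.002  # time per integration step in ps
nstout = 10                      # a frame is written every 10 steps

# Time between subsequent frames (not steps) in ps.
dt = nstout * integration_timestep_ps

# Times of the first four frames, assuming the first frame is at 0 ps.
first_time = 0.0
frame_times = [first_time + i * dt for i in range(4)]

print(math.isclose(dt, 0.02))
print(frame_times)
```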

property structure_file: str#: Return relative path to the structure file.

property trajectory_files: list[str]#: Return relative paths to the trajectory files.

property trajectory_hash: int#: Return hash over the trajectory files.

TrajectoryFunctionWrappers#

class asyncmd.trajectory.PyTrajectoryFunctionWrapper(function, call_kwargs: dict[str, Any] | None = None, **kwargs)#

Wrap python functions for use on asyncmd.Trajectory.

Turns every Python callable into an asynchronous (awaitable) and cached function for application on asyncmd.Trajectory. Also works for asynchronous (awaitable) functions; their values are cached as well.

Initialize a PyTrajectoryFunctionWrapper.

Parameters:

function (callable) – The (synchronous or asynchronous) callable to wrap.
call_kwargs (dict, optional) – Keyword arguments for function. The keys are used as keyword names with the corresponding values, by default {}

async __call__(value: Trajectory) → ndarray#

Apply wrapped function asynchronously on given trajectory.

Parameters: value (asyncmd.Trajectory) – Input trajectory.
Returns: The values of the wrapped function when applied on the trajectory.
Return type: iterable, usually list or np.ndarray

property call_kwargs: dict[str, Any]#

Additional calling arguments for the wrapped function/executable.

NOTE: You can only (re)set the complete dict and not single keys!

property function#: The python callable this PyTrajectoryFunctionWrapper wraps.

property id: str#

Unique and persistent identifier.

Takes into account the wrapped function and its calling arguments.
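An identifier that stays the same across interpreter runs cannot rely on the built-in hash(), which is randomized per process for strings. The following sketch shows one way to derive a stable identifier from a function and its calling arguments. The helper persistent_id and the choice of inputs (qualified name, bytecode, JSON-serialized keyword arguments) are assumptions for illustration and do not describe how asyncmd computes id.

```python
import hashlib
import json


def persistent_id(function, call_kwargs):
    """Hypothetical helper: stable hex digest for a function plus its kwargs."""
    hasher = hashlib.sha256()
    hasher.update(function.__qualname__.encode())
    # The compiled bytecode changes when the function body changes
    # (it can also differ between Python versions).
    hasher.update(function.__code__.co_code)
    # Sort keys so that the argument order does not influence the result.
    hasher.update(json.dumps(call_kwargs, sort_keys=True, default=repr).encode())
    return hasher.hexdigest()


def distance(traj, scale=1.0, offset=0):
    # Placeholder for a function that would be applied to a trajectory.
    return scale + offset


id_a = persistent_id(distance, {"scale": 1.0, "offset": 0})
id_b = persistent_id(distance, {"offset": 0, "scale": 1.0})
id_c = persistent_id(distance, {"scale": 2.0, "offset": 0})

print(id_a == id_b)  # same function, same arguments
print(id_a == id_c)  # different arguments
```

Including the calling arguments in the identifier keeps values computed with different arguments from being mixed up in the cache.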

class asyncmd.trajectory.SlurmTrajectoryFunctionWrapper(executable, sbatch_script, *, sbatch_options: dict | None = None, call_kwargs: dict | None = None, load_results_func: Callable | None = None, **kwargs)#

Wrap executables to use on asyncmd.Trajectory via SLURM.

The execution of the job is submitted to the queueing system with the given sbatch script (template). The executable will be called with the following positional arguments:

- Full filepath of the structure file associated with the trajectory.
- Full filepath of the trajectory to calculate values for. Note that multipart trajectories result in multiple files/arguments here.
- Full filepath of the file the results should be written to, without file ending. Note that if no custom loading function is supplied, the written file is expected to have ‘npy’ format and the added ending ‘.npy’, i.e. the executable is expected to add the ending ‘.npy’ to the passed filepath (as e.g. np.save($FILEPATH, data) would do).
- Any additional arguments from call_kwargs, added as " {key} {value}" for key, value in call_kwargs.items().

See also the examples for a reference (python) implementation of multiple different functions/executables for use with this class.
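How the positional arguments and call_kwargs combine into one command line can be sketched as follows. The helper build_cmd_str, the executable name calc_cv.py, and all paths are hypothetical. Skipping empty-string values so that a bare flag is emitted is an assumption of this sketch, as is leaving the file paths unquoted.

```python
import shlex


def build_cmd_str(executable, structure_file, trajectory_files, result_file,
                  call_kwargs=None):
    """Hypothetical helper assembling the call in the order described above."""
    # Structure file, then all trajectory parts, then the result file.
    parts = [executable, structure_file, *trajectory_files, result_file]
    for key, value in (call_kwargs or {}).items():
        parts.append(key)
        # Lists are unpacked into several values after the same key.
        values = value if isinstance(value, list) else [value]
        for val in values:
            if val == "":
                # Empty string: emit only the key, i.e. a bare flag.
                continue
            parts.append(shlex.quote(str(val)))
    return " ".join(parts)


cmd_str = build_cmd_str(
    "calc_cv.py",
    "/data/run/system.tpr",
    ["/data/run/part1.xtc", "/data/run/part2.xtc"],
    "/data/run/result",
    {"--atoms": [1, 5, 9], "-v": "", "--label": "my cv"},
)
print(cmd_str)
```

Quoting each value with shlex.quote() keeps a value containing spaces, such as the label above, together as a single shell argument.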

Initialize SlurmTrajectoryFunctionWrapper.

Note that all attributes can be set via __init__ by passing them as keyword arguments.

Parameters:

executable (str) – Absolute or relative path to an executable or name of an executable available via the environment (e.g. via the $PATH variable on Linux)
sbatch_script (str) –
Path to an sbatch submission script file or string with the content of a submission script. Note that the submission script must contain the following placeholders (also see the examples folder):
- {cmd_str} : Replaced by the command to call the executable on a given trajectory.
sbatch_options (dict or None) – Dictionary of sbatch options, keys are long names for options, values are the corresponding values. The keys/long names are given without the dashes, e.g. to specify --mem=1024 the dictionary needs to be {"mem": "1024"}. To specify options without values use keys with empty strings as values, e.g. to specify --contiguous the dictionary needs to be {"contiguous": ""}. See the SLURM documentation for a full list of sbatch options (https://slurm.schedmd.com/sbatch.html). Note: This argument is passed as is to the SlurmProcess in which the computation is performed. Each call of the TrajectoryFunction triggers the creation of a new asyncmd.slurm.SlurmProcess and will use the then current sbatch_options.
call_kwargs (dict, optional) – Dictionary of additional arguments to pass to the executable. They are added to the call as pairs " {key} {val}". To pass single command line flags (like -v), set key="-v" and val="", i.e. the empty string. Lists as values are unpacked and added as (for a list with n entries) " {key} {val1} {val2} … {valn}". The values are shell escaped using shlex.quote() when writing them to the sbatch script.
load_results_func (None or function (callable)) – Function to call to customize the loading of the results. If a function is supplied, it will be called with the full path to the results file (as in the call to the executable) and should return a numpy array containing the loaded values.
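The sbatch_options convention and the {cmd_str} placeholder can be tried out with a short sketch. The helper sbatch_options_to_flags, the script template, and the command are hypothetical. How asyncmd fills the placeholder internally is not stated above, so the use of str.replace() here is an assumption.

```python
def sbatch_options_to_flags(sbatch_options):
    """Hypothetical helper: turn the options dict into long-form sbatch flags."""
    flags = []
    for name, value in (sbatch_options or {}).items():
        if value == "":
            # Option without a value, e.g. --contiguous
            flags.append(f"--{name}")
        else:
            flags.append(f"--{name}={value}")
    return flags


# Hypothetical submission script template containing the required placeholder.
SCRIPT_TEMPLATE = """#!/bin/bash
#SBATCH --ntasks=1

{cmd_str}
"""

flags = sbatch_options_to_flags({"mem": "1024", "contiguous": ""})

# str.replace leaves any other braces in the script (e.g. ${HOME}) untouched,
# which str.format would not.
script = SCRIPT_TEMPLATE.replace(
    "{cmd_str}", "calc_cv.py system.tpr traj.xtc result"
)

print(flags)
print(script)
```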

async __call__(value: Trajectory) → ndarray#

Apply wrapped function asynchronously on given trajectory.

Parameters: value (asyncmd.Trajectory) – Input trajectory.
Returns: The values of the wrapped function when applied on the trajectory.
Return type: iterable, usually list or np.ndarray

property call_kwargs: dict[str, Any]#

Additional calling arguments for the wrapped function/executable.

NOTE: You can only (re)set the complete dict and not single keys!

property executable: str#: The executable used to compute the function results.

property id: str#

Unique and persistent identifier.

Takes into account the wrapped function and its calling arguments.

property sbatch_options: dict[str, str] | None#

Dictionary of sbatch_options or None (see the corresponding __init__ argument).

NOTE: You can only (re)set the complete dict and not single keys!

property sbatch_script: str#

Content of the sbatch script (see the corresponding __init__ argument).

Can also be set with the path to a file; in this case the script is read from that file.

property slurm_jobname: str#

The jobname of the slurm job used to compute the function results.

Also used as part of the filename for the submission script that will be written (and deleted if everything goes well) for every trajectory.

NOTE: Must be unique for each SlurmTrajectoryFunctionWrapper instance. By default it includes the persistent unique ID (see id). To reset it to the default, set it to None.
