Trajectories, TrajectoryFunctions, and value caching#

The asyncmd.trajectory module contains the Trajectory class, which is the return object for all MD engines. These objects give easy access to a number of properties of the underlying trajectory, such as the length in frames or time, the integration step, and many more. Note that Trajectory objects are unique in the sense that every combination of underlying trajectory_files gives you the same object back even if you instantiate it multiple times, i.e. a comparison with is will be True for the two objects (in addition to == being True).

Trajectory objects can also be pickled and unpickled. You can even change the file path of the underlying trajectories, i.e. copy or move them to another location (consider moving the npz cache files as well), and still unpickle to get a working Trajectory object, as long as the relative path between your Python workdir and the trajectory files does not change. Alternatively, you can change the workdir of the Python interpreter, as long as the trajectory files remain at the same location in the filesystem.
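The uniqueness guarantee can be pictured with a small registry pattern. The following is only a minimal sketch of the idea, not asyncmd's actual implementation: the class name UniqueTrajectory, the _registry attribute, and the choice of absolute file paths as the registry key are all assumptions made for illustration.

```python
import os


class UniqueTrajectory:
    """Hypothetical stand-in, NOT asyncmd's Trajectory implementation."""

    # Registry mapping a key built from the file paths to the single instance.
    _registry: dict = {}

    def __new__(cls, trajectory_files, structure_file):
        if isinstance(trajectory_files, str):
            trajectory_files = [trajectory_files]
        # Assumption: identity is decided by the absolute trajectory file paths.
        key = tuple(os.path.abspath(f) for f in trajectory_files)
        try:
            # Same files as before: hand back the already existing object.
            return cls._registry[key]
        except KeyError:
            obj = super().__new__(cls)
            obj.trajectory_files = list(key)
            obj.structure_file = os.path.abspath(structure_file)
            cls._registry[key] = obj
            return obj


a = UniqueTrajectory("run.xtc", "conf.gro")
b = UniqueTrajectory(["run.xtc"], "conf.gro")
print(a is b, a == b)  # True True
```

With such a registry, values cached on one handle are visible through every other handle to the same files, which is the practical benefit of the uniqueness described above.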

asyncmd comes with a number of TrajectoryFunctionWrapper (sub)classes, which can be used to wrap (Python) functions or arbitrary executables for easy concurrent application on Trajectory objects, either submitted via SLURM or run locally. Currently included are the PyTrajectoryFunctionWrapper and the SlurmTrajectoryFunctionWrapper, but it is straightforward to implement your own (see here for an in-depth explanation). The benefit of the wrapped functions is that the calculated values are cached automatically. The caching even persists over multiple reloads and invocations of the Python interpreter. To this end, the default caching mechanism creates a hidden numpy npz file for every Trajectory (named after the trajectory) in which the values are stored. Other caching mechanisms are an in-memory cache and the option to store all cached values in a h5py.File or h5py.Group. You can set the default caching mechanism for all Trajectory objects centrally via asyncmd.config.set_default_trajectory_cache_type() or overwrite it for each Trajectory separately at init by passing cache_type.

See also

The example notebooks on the PyTrajectoryFunctionWrapper and the SlurmTrajectoryFunctionWrapper.

See also

asyncmd.config.register_h5py_cache(), the function used to register the h5py cache.

See also

asyncmd.trajectory._forget_all_trajectories() and asyncmd.trajectory._forget_trajectory(), two helper functions to remove trajectories or a specific trajectory from the internal registry of trajectories.

Trajectory#

class asyncmd.Trajectory(trajectory_files: list[str] | str, structure_file: str, nstout: int | None = None, **kwargs)#

Represent a trajectory.

Keep track of the paths of the trajectory and the structure files. Caches values for (wrapped) functions acting on the trajectory. Supports pickling and unpickling with the cached values restored: if a non-persistent cache is used when pickling, the values are written to a hidden numpy npz file next to the trajectory and read back at unpickling. Supports equality checks with other Trajectory objects. Also makes available (and caches) a number of useful attributes, e.g. first_step and last_step (the first and last integration step in the trajectory), dt, first_time, last_time, and length (in frames). All properties are read-only (for the simple reason that they depend only on the underlying trajectory files). A special case is nstout, the output frequency in integration steps. Since it cannot be reliably read or inferred from the trajectory files alone, it can be set by the user (at initialization or later via the property).

Notes

first_step and last_step are only useful for trajectories that come directly from an asyncmd.mdengine.MDEngine. As soon as the trajectory has been concatenated using MDAnalysis (e.g. with the TrajectoryConcatenator), the step information is just the frame number in the trajectory part that became the first/last frame in the concatenated trajectory.

Initialize a Trajectory.

Parameters:
  • trajectory_files (list[str] or str) – Absolute or relative path(s) to the trajectory file(s), e.g. trr, xtc, dcd, …

  • structure_file (str) – Absolute or relative path to the structure file (e.g. tpr, gro).

  • nstout (int or None, optional) – The output frequency used when creating the trajectory, by default None

Raises:

FileNotFoundError – If the trajectory_files or the structure_file are not accessible.

__hash__() → int#

Return hash(self).

__len__() → int#

Return the number of frames in the trajectory.

Returns:

The number of frames in the trajectory.

Return type:

int

clear_all_cache_values() → None#

Clear all function values cached for this Trajectory.

For file-based caches, this also removes the associated cache files. Note that this just calls the clear_all_values method of the underlying TrajectoryFunctionValueCache class.

update_cache_type(copy_content: bool = True, clear_old_cache: bool = False) → None#

Update the TrajectoryFunctionValueCache this Trajectory uses.

By default the content of the current cache is copied to the new cache. This will only have an effect if the globally set cache_type differs from what this Trajectory currently uses. See asyncmd.config.set_trajectory_cache_type() to set the cache_type. To clear the old/previously set cache (after copying its values), pass clear_old_cache=True.

Parameters:
  • copy_content (bool, optional) – Whether to copy the current cache content to the new cache, by default True

  • clear_old_cache (bool, optional) – Whether to clear the old/previously set cache, by default False.
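The interplay of the two flags can be illustrated with plain dictionaries standing in for the old and the new cache. This is a minimal sketch only: switch_cache is a hypothetical helper, and the real TrajectoryFunctionValueCache classes are not shown here. The order (copy first, then clear) follows the description above.

```python
def switch_cache(old_cache: dict, new_cache: dict,
                 copy_content: bool = True,
                 clear_old_cache: bool = False) -> dict:
    """Hypothetical helper mirroring the two documented flags with plain dicts."""
    if copy_content:
        # Copy the cached values over before touching the old cache.
        new_cache.update(old_cache)
    if clear_old_cache:
        # Only afterwards remove the values from the old cache.
        old_cache.clear()
    return new_cache


old = {"func_id_1": [0.1, 0.2, 0.3]}
new = switch_cache(old, {}, copy_content=True, clear_old_cache=True)
print(new)  # {'func_id_1': [0.1, 0.2, 0.3]}
print(old)  # {}
```

Note that with copy_content=False and clear_old_cache=True the cached values would be lost in this sketch, so the defaults are the conservative choice.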

property dt: float#

The time interval between subsequent frames (not steps) in ps.

property first_step: int | None#

Return the integration step of the first frame in the trajectory.

property first_time: float#

Return the integration time of the first frame in ps.

property last_step: int | None#

Return the integration step of the last frame in the trajectory.

property last_time: float#

Return the integration time of the last frame in ps.

property nstout: int | None#

Output frequency between subsequent frames in integration steps.
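As a small worked example of how these properties relate: since dt is the time between frames and nstout the number of integration steps between frames, their ratio is the time per integration step. The numbers below are hypothetical, and the last two relations additionally assume evenly spaced frames in a trajectory that comes directly from an MD engine (see the note on first_step and last_step above).

```python
# Hypothetical values, as they might be read from the properties documented above.
dt = 2.0            # ps between subsequent frames
nstout = 1000       # integration steps between subsequent frames
first_time = 100.0  # ps
first_step = 50000
n_frames = 51       # what len(trajectory) would return

# Time per integration step follows directly from dt and nstout.
integration_timestep = dt / nstout

# Assuming evenly spaced frames in a trajectory straight from an MD engine:
last_time = first_time + (n_frames - 1) * dt
last_step = first_step + (n_frames - 1) * nstout

print(integration_timestep, last_time, last_step)  # 0.002 200.0 100000
```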

property structure_file: str#

Return relative path to the structure file.

property trajectory_files: list[str]#

Return relative path to the trajectory files.

property trajectory_hash: int#

Return hash over the trajectory files.

TrajectoryFunctionWrappers#

class asyncmd.trajectory.PyTrajectoryFunctionWrapper(function, call_kwargs: dict[str, Any] | None = None, **kwargs)#

Wrap python functions for use on asyncmd.Trajectory.

Turns every Python callable into an asynchronous (awaitable) and cached function for application on asyncmd.Trajectory. Also works for asynchronous (awaitable) functions; their results are cached as well.

Initialize a PyTrajectoryFunctionWrapper.

Parameters:
  • function (callable) – The (synchronous or asynchronous) callable to wrap.

  • call_kwargs (dict, optional) – Keyword arguments for function; the keys are used as keyword names with the corresponding values, by default {}

async __call__(value: Trajectory) → ndarray#

Apply wrapped function asynchronously on given trajectory.

Parameters:

value (asyncmd.Trajectory) – Input trajectory.

Returns:

The values of the wrapped function when applied on the trajectory.

Return type:

iterable, usually list or np.ndarray
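The combination of "awaitable" and "cached" can be sketched with a small stand-in class. This is not asyncmd's implementation: CachedAsyncWrapper, its in-memory dict cache, the n_evaluations counter, and the use of asyncio.to_thread for synchronous callables are assumptions chosen for illustration, and plain tuples stand in for Trajectory objects.

```python
import asyncio


class CachedAsyncWrapper:
    """Hypothetical stand-in illustrating the idea, NOT asyncmd's implementation."""

    def __init__(self, function, call_kwargs=None):
        self.function = function
        self.call_kwargs = {} if call_kwargs is None else dict(call_kwargs)
        self._cache = {}        # in-memory only, for illustration
        self.n_evaluations = 0  # counts how often the function really ran

    async def __call__(self, value):
        # Assumption: the input is hashable and can serve as cache key.
        if value not in self._cache:
            self.n_evaluations += 1
            if asyncio.iscoroutinefunction(self.function):
                result = await self.function(value, **self.call_kwargs)
            else:
                # Run the synchronous callable in a thread to keep the loop free.
                result = await asyncio.to_thread(
                    self.function, value, **self.call_kwargs
                )
            self._cache[value] = result
        return self._cache[value]


def scaled_lengths(frames, scale=1.0):
    # Toy "trajectory function": one value per frame.
    return [scale * len(frame) for frame in frames]


async def main():
    wrapped = CachedAsyncWrapper(scaled_lengths, call_kwargs={"scale": 2.0})
    fake_trajs = [("ab", "cde"), ("f",)]  # tuples of strings as fake trajectories
    first = await asyncio.gather(*(wrapped(t) for t in fake_trajs))
    second = await asyncio.gather(*(wrapped(t) for t in fake_trajs))  # from cache
    return first, second, wrapped.n_evaluations


first, second, n_evaluations = asyncio.run(main())
print(first)          # [[4.0, 6.0], [2.0]]
print(n_evaluations)  # 2
```

The second round of calls returns the stored values without evaluating the function again, which is the behaviour the wrapper classes provide for real trajectories.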

property call_kwargs: dict[str, Any]#

Additional calling arguments for the wrapped function/executable.

NOTE: You can only (re)set the complete dict and not single keys!

property function#

The python callable this PyTrajectoryFunctionWrapper wraps.

property id: str#

Unique and persistent identifier.

Takes into account the wrapped function and its calling arguments.

class asyncmd.trajectory.SlurmTrajectoryFunctionWrapper(executable, sbatch_script, *, sbatch_options: dict | None = None, call_kwargs: dict | None = None, load_results_func: Callable | None = None, **kwargs)#

Wrap executables to use on asyncmd.Trajectory via SLURM.

The execution of the job is submitted to the queueing system with the given sbatch script (template). The executable will be called with the following positional arguments:

  • full filepath of the structure file associated with the trajectory

  • full filepath of the trajectory to calculate values for; note that multipart trajectories result in multiple files/arguments here.

  • full filepath of the file the results should be written to, without file ending. Note that if no custom loading function is supplied, we expect that the written file has ‘npy’ format and the added ending ‘.npy’, i.e. we expect the executable to add the ending ‘.npy’ to the passed filepath (as e.g. np.save($FILEPATH, data) would do)

  • any additional arguments from call_kwargs are added as " {key} {value}" for key, value in call_kwargs.items()

See also the examples for a reference (python) implementation of multiple different functions/executables for use with this class.

Initialize SlurmTrajectoryFunctionWrapper.

Note that all attributes can be set via __init__ by passing them as keyword arguments.

Parameters:
  • executable (str) – Absolute or relative path to an executable or name of an executable available via the environment (e.g. via the $PATH variable on Linux)

  • sbatch_script (str) –

    Path to a sbatch submission script file or string with the content of a submission script. Note that the submission script must contain the following placeholders (also see the examples folder):

    • {cmd_str} : Replaced by the command to call the executable on a given trajectory.

  • sbatch_options (dict or None) – Dictionary of sbatch options, keys are long names for options, values are the corresponding values. The keys/long names are given without the dashes, e.g. to specify --mem=1024 the dictionary needs to be {"mem": "1024"}. To specify options without values use keys with empty strings as values, e.g. to specify --contiguous the dictionary needs to be {"contiguous": ""}. See the SLURM documentation for a full list of sbatch options (https://slurm.schedmd.com/sbatch.html). Note: This argument is passed as is to the SlurmProcess in which the computation is performed. Each call of the TrajectoryFunction triggers the creation of a new asyncmd.slurm.SlurmProcess and will use the then current sbatch_options.

  • call_kwargs (dict, optional) – Dictionary of additional arguments to pass to the executable. They are added to the call as pairs " {key} {val}". To pass single command line flags (like -v), set key="-v" and val="", i.e. the empty string. Lists as values are unpacked and added as (for a list with n entries) " {key} {val1} {val2} … {valn}". The values are shell escaped using shlex.quote() when writing them to the sbatch script.

  • load_results_func (None or function (callable)) – Function to call to customize the loading of the results. If a function is supplied, it will be called with the full path to the results file (as in the call to the executable) and should return a numpy array containing the loaded values.
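How the positional arguments and call_kwargs could end up in one command string is sketched below. build_cmd_str is a hypothetical helper, not part of asyncmd, and the file names are made up. Only the values are passed through shlex.quote(), as stated above. Skipping empty-string values entirely, so that key="-v" yields a bare flag, is an assumption of this sketch.

```python
import shlex


def build_cmd_str(executable, structure_file, trajectory_files,
                  result_file, call_kwargs):
    """Hypothetical helper assembling the call described above."""
    # Positional arguments: structure file, trajectory file(s), results file.
    cmd = " ".join([executable, structure_file, *trajectory_files, result_file])
    for key, value in call_kwargs.items():
        # Lists are unpacked into several values following the same key.
        values = value if isinstance(value, (list, tuple)) else [value]
        cmd += f" {key}"
        for val in values:
            if val == "":
                continue  # assumption: empty string means "flag without value"
            cmd += f" {shlex.quote(str(val))}"
    return cmd


cmd_str = build_cmd_str(
    "my_exe",
    "conf.tpr",
    ["part1.xtc", "part2.xtc"],   # multipart trajectory: several arguments
    "out/result",                 # no file ending, the executable adds ".npy"
    {"--selection": "name CA", "--cutoffs": [0.5, 1.0], "-v": ""},
)
print(cmd_str)
# my_exe conf.tpr part1.xtc part2.xtc out/result --selection 'name CA' --cutoffs 0.5 1.0 -v
```

Quoting matters for values such as "name CA": without it, the shell would split the value into two separate arguments.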

async __call__(value: Trajectory) → ndarray#

Apply wrapped function asynchronously on given trajectory.

Parameters:

value (asyncmd.Trajectory) – Input trajectory.

Returns:

The values of the wrapped function when applied on the trajectory.

Return type:

iterable, usually list or np.ndarray

property call_kwargs: dict[str, Any]#

Additional calling arguments for the wrapped function/executable.

NOTE: You can only (re)set the complete dict and not single keys!

property executable: str#

The executable used to compute the function results.

property id: str#

Unique and persistent identifier.

Takes into account the wrapped function and its calling arguments.

property sbatch_options: dict[str, str] | None#

Dictionary of sbatch_options or None (see the corresponding __init__ argument).

NOTE: You can only (re)set the complete dict and not single keys!
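The mapping from the sbatch_options dictionary to sbatch long options, as described for the corresponding __init__ argument, can be sketched as follows. sbatch_options_to_flags is a hypothetical helper for illustration. How and where asyncmd actually passes the resulting options to sbatch is not shown here.

```python
def sbatch_options_to_flags(sbatch_options):
    """Hypothetical helper: turn the options dict into sbatch long options."""
    flags = []
    for name, value in (sbatch_options or {}).items():
        if value == "":
            flags.append(f"--{name}")          # option without value
        else:
            flags.append(f"--{name}={value}")  # option with value
    return flags


print(sbatch_options_to_flags({"mem": "1024", "contiguous": ""}))
# ['--mem=1024', '--contiguous']
```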

property sbatch_script: str#

Content of the sbatch script (see the corresponding __init__ argument).

Can also be set with the path to a file, in this case the script will be read.
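A submission script template with the required {cmd_str} placeholder might look like the sketch below. The #SBATCH lines and the command are made-up examples, and filling the placeholder with str.format is an assumption about the substitution mechanism, used here only to show the result.

```python
# Hypothetical submission script template; only {cmd_str} is required.
SBATCH_TEMPLATE = """#!/bin/bash -l
#SBATCH --time=00:10:00
#SBATCH --mem=1024

{cmd_str}
"""

# Assumption: the placeholder is filled in a str.format-like fashion.
script = SBATCH_TEMPLATE.format(cmd_str="my_exe conf.tpr traj.xtc out/result")
print(script)
```

If the substitution does work like str.format, any other literal curly braces in the script (e.g. in shell variable expansions) would need to be doubled to survive the formatting.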

property slurm_jobname: str#

The jobname of the slurm job used to compute the function results.

Also used as part of the filename for the submission script that will be written (and deleted if everything goes well) for every trajectory.

NOTE: Must be unique for each SlurmTrajectoryFunctionWrapper instance. By default it includes the persistent unique ID (see id). To reset it to the default, set it to None.