Gromacs engines#

This notebook showcases the use of the python classes used to steer gromacs from python. It will only work if the gromacs executables are available (e.g. in your $PATH variable).

There are two main classes you will use together:

  • asyncmd.gromacs.MDP, a python class which parses a gromacs molecular dynamics parameter file (.mdp) and makes its content available via a dictionary-like interface

  • the asyncmd.gromacs.GmxEngine or the asyncmd.gromacs.SlurmGmxEngine, both share a common interface and are async/await enabled python wrappers to run gromacs locally or via the SLURM workload manager, respectively

Imports and some basic checks that everything is available#

%%bash
# if using the module system to make gromacs and friends available:
# check that they are loaded!
#module list
%%bash
# unix only, check that gmx is available
which gmx
/usr/local/gromacs-2022.4/bin/gmx
%matplotlib inline
import os
import asyncio
import asyncmd
import asyncmd.gromacs as asyncgmx
Could not initialize SLURM cluster handling. If you are sure SLURM (sinfo/sacct/etc) is available try calling `asyncmd.config.set_slurm_settings()` with the appropriate arguments.

Optional: setup logging to be more verbose#

loglevel = "WARN"
loglevel = "INFO"  # comment this line if you want more logging
%config Application.log_level=loglevel
import logging
l = logging.getLogger("asyncmd")
l.setLevel(loglevel)

Setup working directory and the number of gromacs simulations to run in parallel#

n_engines = 4

scratch_dir = "gmx_engine_wdirs"
wdirs = [os.path.join(scratch_dir, f"engine_{i}") for i in range(n_engines)]

for d in wdirs:
    if not os.path.isdir(d):
        os.makedirs(d)

asyncmd.gromacs.MDP#

The MDP is a dictionary-like interface to a parsed gromacs molecular dynamics parameter file .mdp file to enable easy inspection and modification from python code. Most of the values are automatically cast to their respective types, e.g. nsteps will always be an int and ref-t will always be a list of float. The default for unknown parameters is a list of str to allow for the highest flexibility possible.

The class supports writing of its (possibly changed) content to a new .mdp file by using its .write() method and also knows if its content has been changed since parsing the original .mdp file. It even supports the (undocumented) keyformat CHARMM-GUI uses in which all - are replaced by _.

# one MDP object per engine, in principal we could use the same object but this way is more customizable,
# e.g. we could want to modify our setup have the engines run at a different temperatures
mdps = [asyncgmx.MDP("../../resources/gromacs/capped_alanine_dipeptide/md.mdp") for _ in range(n_engines)]
# lets have a look at what is inside
print("MDP has been changed since parsing: ", mdps[0].changed)
print("Parsed content:")
print("---------------")
for key, val in mdps[0].items():
    print(key, " : ", val)
MDP has been changed since parsing:  False
Parsed content:
---------------
integrator  :  md
dt  :  0.002
nsteps  :  -1
nstxout  :  20
nstvout  :  20
nstlog  :  20
nstxout-compressed  :  0
nstlist  :  50
ns-type  :  grid
cutoff-scheme  :  Verlet
rlist  :  1.1
coulombtype  :  PME
rcoulomb  :  1.1
rvdw  :  1.1
Tcoupl  :  v-rescale
tc-grps  :  ['Protein', 'SOL']
tau-t  :  [0.5, 0.5]
ref-t  :  [300.0, 300.0]
Pcoupl  :  C-rescale
tau-p  :  1.0
compressibility  :  [4.5e-05]
ref-p  :  [1.0]
gen-vel  :  no
constraints  :  h-bonds

NOTE: “Pcoupl = C-rescale” needs gromacs version >= 2021#

I.e. if you are running a recent version of gromacs (>= 2021) you should comment the line below in which we set “Pcoupl = Berendsen”.

# lets set the xtc output frequency to 0 in all MDPs, we will use the trr trajectory
# we will also increase the trr output interval by a bit and add the `continuation` parameter
nstout = 200
for i, mdp in enumerate(mdps):
    if i == 0:
        # Use something different for the first mdp
        # Note that the nstout options all have the correct datatype (i.e. integer)
        # so we can do inplace multiplication on it
        mdp["nstvout"] *= 50
        mdp["nstxout"] *= 50
        mdp["nstlog"] *= 50
    else:
        mdp['nstvout'] = nstout
        mdp["nstxout"] = nstout
        mdp["nstlog"] = nstout
    
    mdp["nstenergy"] = nstout
    mdp["nstxout-compressed"] = 0
    mdp["continuation"] = "yes"  # dont apply constraints to the initial configuration
    #mdp["Pcoupl"] = "Berendsen"
# have a look again
print("MDP has been changed since parsing: ", mdps[0].changed)
print("Parsed content:")
print("---------------")
for key, val in mdps[0].items():
    print(key, " : ", val)
MDP has been changed since parsing:  True
Parsed content:
---------------
integrator  :  md
dt  :  0.002
nsteps  :  -1
nstxout  :  1000
nstvout  :  1000
nstlog  :  1000
nstxout-compressed  :  0
nstlist  :  50
ns-type  :  grid
cutoff-scheme  :  Verlet
rlist  :  1.1
coulombtype  :  PME
rcoulomb  :  1.1
rvdw  :  1.1
Tcoupl  :  v-rescale
tc-grps  :  ['Protein', 'SOL']
tau-t  :  [0.5, 0.5]
ref-t  :  [300.0, 300.0]
Pcoupl  :  C-rescale
tau-p  :  1.0
compressibility  :  [4.5e-05]
ref-p  :  [1.0]
gen-vel  :  no
constraints  :  h-bonds
nstenergy  :  200
continuation  :  yes

asyncmd.gromacs.GmxEngine (and asyncmd.gromacs.SlurmGmxEngine)#

Both provide the functionality of the gromacs grompp and mdrun executables in one class, i.e. given molecular dynamics parameters and possibly an initial configuration they will setup and steer a MD run. Their interfaces differ only in the additional sbatch_script that the slurm engine requires at initialization time, they can otherwise be used interchangeably. Both engines need the gromacs executables to be available, specifically gmx grompp and gmx mdrun (gmx_mpi mdrun for the SlurmGmxEngine). The SlurmGmxEngine naturally also must have access to the slurm executables, specifically sbatch, sacct and scancel. However all of these can be set either at initialization time via keyword arguments or globally as attributes to the uninitialized class.

Each engine has a prepare() method (which will call grompp) and multiple methods to then run the simulation, namely run(), run_walltime() and run_nsteps(). The additional prepare_from_files() method can be used to continue a previous MD run from given deffnm and workdir (assuming all files/parts are there), note that it will (currently) not call grompp again and therefore assumes that the portable run input file (.tpr) allows for the continuation (i.e. has no or a sufficiently large integration step limit).

# Let us create a list of identical engines to showcase the power of concurrent execution :)

# we will use the `mdrun_extra_args` option to limit each engine to 2 threads
# (the box is so small that otherwise the domain decomposition might fail)
# depending on where you run and your gromacs version you might need to adapt this
mdrun_extra_args = "-nt 2"

engines = [asyncgmx.GmxEngine(mdconfig=mdp,
                              gro_file="../../resources/gromacs/capped_alanine_dipeptide/conf.gro",  # required
                              top_file="../../resources/gromacs/capped_alanine_dipeptide/topol_amber99sbildn.top",  # required
                              # optional (can be omitted or None), however naturally without an index file
                              # you can not reference custom groups in the .mdp-file or MDP object 
                              ndx_file="../../resources/gromacs/capped_alanine_dipeptide/index.ndx",
                              output_traj_type="trr",  # optional, defaults to xtc
                              mdrun_extra_args=mdrun_extra_args,
                            )
           for mdp in mdps]

After setting the molecular dynamics parameters we can prepare a gromacs MD run.#

The gromacs engines prepare() method will call grompp, as with grompp you can use a specific starting configuration (the grompp -t option) or start from the structure file (.gro) the engine got at initialization.

Lets prepare the first engine without a starting structure:#

e0 = engines[0]  # get it out of the list so tab-help/completion works
# the prepare method is an async def function (a coroutine) and must be awaited
await e0.prepare(starting_configuration=None, workdir=wdirs[0], deffnm="test")

Lets prepare all other engines at once with the same initial configuration#

We can use asyncio.gather to run all coroutines concurrently, for prepare this does not make a big difference (since it is fast), but the same mechanism enables us to run all 4 gromacs engines in parallel later.

# create an asyncmd.Trajectory of the initial configuration
init_conf = asyncmd.Trajectory(trajectory_files="../../resources/gromacs/capped_alanine_dipeptide/conf_in_alphaR.trr",
                               structure_file="../../resources/gromacs/capped_alanine_dipeptide/conf.gro",
                              )
# and prepare the engines (the return value of prepare is None)
await asyncio.gather(*(e.prepare(starting_configuration=init_conf, workdir=wdir, deffnm="test")
                       for e, wdir in zip(engines[1:], wdirs[1:])
                       )
                     )
[None, None, None]

Run the engines for a number of steps each.#

We will first run the last engine in the list alone and then all 4 concurrently for the same number of steps to show off the power of the concurrent execution of the gromacs subprocesses.

import time  # import time to be able to show off ;)
nsteps = 10000

# run one engine and time it
start = time.time()
# the engine will return an asyncmd.Trajectory with the produced trajectory (part)
traj = await engines[-1].run_steps(nsteps=nsteps, steps_per_part=True)
end = time.time()

print(f"Running one engine for {nsteps} integration steps took {round(end - start, 4)} seconds.")
print(f"The produced trajectory ({traj}) has a length of {len(traj)} frames.")
print(f"This length is the number of steps divided by the engines output frequency (={engines[-1].nstout}).")
print("Note, that we are off by plus one because the initial configuration is in the trajectory for gromacs.")
print("Note also that this is only true when explicitly passing nsteps to the `run` methods, ")
print("unfortunately the real relation between frames and steps done is a bit more involved...")
print("See the docstring for `GmxEngine.steps_done` if you are brave and want to know more ;)")
Running one engine for 10000 integration steps took 1.9287 seconds.
The produced trajectory (Trajectory(trajectory_files=gmx_engine_wdirs/engine_3/test.part0001.trr, structure_file=gmx_engine_wdirs/engine_3/test.tpr)) has a length of 51 frames.
This length is the number of steps divided by the engines output frequency (=200).
Note, that we are off by plus one because the initial configuration is in the trajectory for gromacs.
Note also that this is only true when explicitly passing nsteps to the `run` methods, 
unfortunately the real relation between frames and steps done is a bit more involved...
See the docstring for `GmxEngine.steps_done` if you are brave and want to know more ;)
# run all engines at once and time it
start = time.time()
# Now each engine will return asyncmd.Trajectory with the produced trajectory (part)
# i.e. trajs will be a list of trajectories (in the same order as the engines in the list)
trajs = await asyncio.gather(*(e.run_steps(nsteps=nsteps, steps_per_part=True) for e in engines))
end = time.time()

print(f"Running all engines for {nsteps} integration steps took {round(end - start, 4)} seconds.")
print(f"But now we have a list of {len(trajs)} trajectories with {nsteps} steps each...")
for t in trajs:
    print(t, f"with length: {len(t)}")
Running all engines for 10000 integration steps took 5.5837 seconds.
But now we have a list of 4 trajectories with 10000 steps each...
Trajectory(trajectory_files=gmx_engine_wdirs/engine_0/test.part0001.trr, structure_file=gmx_engine_wdirs/engine_0/test.tpr) with length: 11
Trajectory(trajectory_files=gmx_engine_wdirs/engine_1/test.part0001.trr, structure_file=gmx_engine_wdirs/engine_1/test.tpr) with length: 51
Trajectory(trajectory_files=gmx_engine_wdirs/engine_2/test.part0001.trr, structure_file=gmx_engine_wdirs/engine_2/test.tpr) with length: 51
Trajectory(trajectory_files=gmx_engine_wdirs/engine_3/test.part0002.trr, structure_file=gmx_engine_wdirs/engine_3/test.tpr) with length: 51

Note how the first engine has produced a different number of frames in the trajectory? The reason is the different output frequency (nstout), so the same number of integration steps will result in a different number of frames.

Use prepare_from_files to initialize new engines and pick up where we left off with the ‘old’ ones.#

# create the engines
new_engines = [asyncgmx.GmxEngine(mdconfig=mdp,
                                  gro_file="../../resources/gromacs/capped_alanine_dipeptide/conf.gro",
                                  top_file="../../resources/gromacs/capped_alanine_dipeptide/topol_amber99sbildn.top",
                                  ndx_file="../../resources/gromacs/capped_alanine_dipeptide/index.ndx",
                                  output_traj_type="trr",  # need to specify that we are using trr because xtc is the default
                                  mdrun_extra_args=mdrun_extra_args,  # again limit to 2 threads
                                  )
               for mdp in mdps]
e0 = new_engines[0]  # get one out for the autocomplete
# and initialize with prepare_from_files
await e0.prepare_from_files(workdir=wdirs[0], deffnm="test")
# and the others concurrent in one go
await asyncio.gather(*(e.prepare_from_files(workdir=wdir, deffnm="test")
                       for e, wdir in zip(new_engines[1:], wdirs[1:])
                       )
                     )
[None, None, None]

Now we can do another round of MD in all engines in parallel#

Note that the partnums indicate that we picked up exactly where we left of. We could additionally check using the trajectories .last_step and .first_step properties, compare and observe that the last step in the previous MD runs will be the first step in these here.

# run all engines at once and time it
start = time.time()
trajs = await asyncio.gather(*(e.run_steps(nsteps=nsteps, steps_per_part=True) for e in new_engines))
end = time.time()

print(f"Running all engines for {nsteps} integration steps took {round(end - start, 4)} seconds.")
print(f"But now we have a list of {len(trajs)} trajectories with {nsteps} steps each...")
for t in trajs:
    print(t, f"with length: {len(t)}")
Running all engines for 10000 integration steps took 5.6832 seconds.
But now we have a list of 4 trajectories with 10000 steps each...
Trajectory(trajectory_files=gmx_engine_wdirs/engine_0/test.part0002.trr, structure_file=gmx_engine_wdirs/engine_0/test.tpr) with length: 11
Trajectory(trajectory_files=gmx_engine_wdirs/engine_1/test.part0002.trr, structure_file=gmx_engine_wdirs/engine_1/test.tpr) with length: 51
Trajectory(trajectory_files=gmx_engine_wdirs/engine_2/test.part0002.trr, structure_file=gmx_engine_wdirs/engine_2/test.tpr) with length: 51
Trajectory(trajectory_files=gmx_engine_wdirs/engine_3/test.part0003.trr, structure_file=gmx_engine_wdirs/engine_3/test.tpr) with length: 51

Run for specified walltime#

walltime = 0.02 # 0.01 h = 36 s

# run all engines at once and time it
start = time.time()
trajs = await asyncio.gather(*(e.run_walltime(walltime) for e in new_engines))
end = time.time()

print(f"Running all engines for {walltime} h (={walltime*60*60} s) took {round(end - start, 4)} seconds.")
Running all engines for 0.02 h (=72.0 s) took 72.7034 seconds.

Run for specified walltime or number of steps (depending on what is reached first)#

We can also use the generic run() method which takes one or both of the walltime and nsteps arguments, it will finish as soon as one of the conditions is fulfilled. As the run_steps() method it also accepts the steps_per_part argument making it particularly useful to run in chunks (of length walltime) but for a fixed total number of steps.

Note that we can either check if engine.steps_done < n_steps_desired (as we do below) or call the engine.run(nsteps=n_steps_desired) method until it returns None instead of a trajectory object, which indicates that the total number of steps done in that engine is exactly the requested number of total steps.

print([e.steps_done for e in new_engines])
print([e.steps_done < (max([e.steps_done for e in new_engines]) + 20000) for e in new_engines])
[178000, 178000, 178000, 188000]
[True, True, True, True]
walltime = 0.01 # 0.01 h = 36 s
nsteps = max([e.steps_done for e in new_engines]) + 20000

print(f"Original nsteps = {nsteps}")

# make sure that nsteps is a multiple of nstout for all engines
# (this is enforced when you run for a fixed number of steps to avoid stupid one-off errors that might happen
#  because the frames and steps are not multiples of each other)
perfect_nsteps = all([nsteps % e.nstout == 0 for e in engines])
while not perfect_nsteps:
    for e in engines:
        if nsteps % e.nstout != 0:
            nsteps += e.nstout - (nsteps % e.nstout)
    perfect_nsteps = all([nsteps % e.nstout == 0 for e in engines])

print(f"Will run for {nsteps} steps in all engines!")

# now the actual trajectory generation
all_trajs = []
all_times = []
while any([e.steps_done < nsteps for e in new_engines]):
    # run all engines at once and time it
    start = time.time()
    trajs = await asyncio.gather(*(e.run(walltime=walltime, nsteps=nsteps, steps_per_part=False)
                                   for e in new_engines
                                   )
                                 )
    end = time.time()
    all_trajs.append(trajs)
    all_times.append(end-start)

print(f"Ran for a total of {len(all_times)} loops. It took us {round(sum(all_times), 4)} seconds.")
Original nsteps = 208000
Will run for 208000 steps in all engines!
Ran for a total of 1 loops. It took us 13.6632 seconds.
# the last engine could already have produced a `None` instead of a trajectory in the last iteration
# (since it is some steps ahead of the others because we ran it alone at the beginning of the notebook)
all_trajs[-1]
[Trajectory(trajectory_files=gmx_engine_wdirs/engine_0/test.part0004.trr, structure_file=gmx_engine_wdirs/engine_0/test.tpr),
 Trajectory(trajectory_files=gmx_engine_wdirs/engine_1/test.part0004.trr, structure_file=gmx_engine_wdirs/engine_1/test.tpr),
 Trajectory(trajectory_files=gmx_engine_wdirs/engine_2/test.part0004.trr, structure_file=gmx_engine_wdirs/engine_2/test.tpr),
 Trajectory(trajectory_files=gmx_engine_wdirs/engine_3/test.part0005.trr, structure_file=gmx_engine_wdirs/engine_3/test.tpr)]