Hi,
I want to compute transient solutions for several different input parameters so that I can eventually do some UQ via sampling. For convenience, let's say this is 100 transient runs in total (the actual number still depends somewhat on how far I can bring down the compute time). I've made some test runs with different np options to see how the ISSM solve scales. For my test problem I recorded the following (one run each, all with the same parameter):
np = 1: computation time 29 min
np = 2: computation time 18 min
np = 4: computation time 12.6 min
np = 8: computation time 12.3 min
So essentially, using more cores initially speeds up the computation, but I get diminishing returns. Depending on the mesh size, this sets in at smaller or larger np values, which matches the observations made here https://issm.ess.uci.edu/forum/d/155-parallel-computing back in 2017.
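To put numbers on the diminishing returns, here is a small Python sketch that turns the timings above into speedup and parallel efficiency relative to the np = 1 run. The function name `scaling_table` is only for illustration, and each timing comes from a single run, so the figures are rough.

```python
# Measured wall-clock times from my test runs (minutes), keyed by np.
timings_min = {1: 29.0, 2: 18.0, 4: 12.6, 8: 12.3}


def scaling_table(timings):
    """Return (np, speedup, efficiency) tuples relative to the np = 1 run."""
    base = timings[1]
    rows = []
    for n in sorted(timings):
        speedup = base / timings[n]      # how much faster than one core
        rows.append((n, speedup, speedup / n))  # efficiency = speedup per core
    return rows


for n, speedup, eff in scaling_table(timings_min):
    print(f"np = {n}: speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

For these timings the efficiency drops from about 81% at np = 2 to about 29% at np = 8, which is why several smaller jobs side by side look more attractive than one large one.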
Rather than running 100 solves one after another with np=8 (wait 20.5 h), it would speed things up a lot more if I ran two batches of 50 solves with np=4 in parallel (wait 10.5 h). Is there a way for me to do that? I've tried the obvious ways: running two separate Matlab windows, or calling Matlab scripts in parallel from the terminal with mpirun. The solves actually terminate without throwing any errors, and the solution at the final time seems to be correct. However, whichever process finishes second only records the transient solution starting at the time step it had reached when the first process ended. I didn't think this approach would work at all, but based on my results so far it almost does. So, is there a way I can make it work?
Example: If window 1 finishes after 200 time steps and window 2 is at time step 119 at that moment, then the md.results returned in window 1 will be a 1x200 struct with the transient solution for time steps 1 - 200, but md.results for window 2 will be a 1x82 struct covering only time steps 119 - 200, although it does contain the actual solution for the parameter passed in window 2.
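For reference, the wait times I quoted above come from the measured per-run times. A minimal Python sketch of that arithmetic follows; the helper `total_wait_hours` is just illustrative, and it assumes that two np=4 solves running side by side do not slow each other down, which I have not verified.

```python
import math


def total_wait_hours(n_runs, minutes_per_run, concurrent_jobs=1):
    """Wall time for n_runs solves when concurrent_jobs run side by side.

    Assumes every solve takes minutes_per_run regardless of what else is
    running on the machine (an optimistic assumption).
    """
    batches = math.ceil(n_runs / concurrent_jobs)
    return batches * minutes_per_run / 60.0


serial_np8 = total_wait_hours(100, 12.3)                     # one np=8 solve at a time
paired_np4 = total_wait_hours(100, 12.6, concurrent_jobs=2)  # two np=4 solves at a time

print(f"100 solves, np=8, one at a time: {serial_np8:.1f} h")
print(f"100 solves, np=4, two at a time: {paired_np4:.1f} h")
```

If the concurrent solves compete for memory bandwidth, the second figure would be larger in practice.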
Thanks and best wishes,
Nicole