Replies: 2 comments
-
Hey Dan, I'm away from my workspace for another week or two, so my support here is limited. I've been thinking about your request, and I believe the best way to do this would be with different file names.

For particles, you could make files with names like:

particles_step_124_rank_12_species_1.out

and use each line to output the position, momentum and weight of one macro-particle. This is done by creating a particle pointer and stepping through the linked list (see an example in bremsstrahlung.F90, where the optical depth is updated). It then becomes a post-processing problem: combining the particle info from all the rank files into a single collection. This doesn't have to be done inside EPOCH with MPI barriers and gathers; if each rank writes to a separate file, there are no data conflicts.

Similarly, you could have:

Ex_step_124_rank_12.out

where you output the Ex grid that is local to the rank. If you output from 1 to nx (or ny or nz), you get the data without ghost cells.

Perhaps it's not the most elegant way, but it's definitely the easiest.

Cheers,
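To make the traversal concrete, here is a minimal sketch of such a per-rank dump. The type and field names (particle, part_pos, part_p, weight, next, species_list(ispecies)%attached_list%head, rank) are my recollection of EPOCH's shared_data module and may differ between versions; check bremsstrahlung.F90 for the canonical traversal:

SUBROUTINE dump_species_to_file(ispecies, step)
  ! Hypothetical helper: one file per (step, rank, species), one
  ! macro-particle per line. List/field names are assumptions -- see above.
  USE shared_data   ! assumed to provide rank, species_list, TYPE(particle)
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: ispecies, step
  TYPE(particle), POINTER :: current
  CHARACTER(LEN=64) :: filename
  INTEGER, PARAMETER :: lun = 120   ! any free unit number

  WRITE(filename, '(A,I0,A,I0,A,I0,A)') 'particles_step_', step, &
      '_rank_', rank, '_species_', ispecies, '.out'
  OPEN(UNIT=lun, FILE=TRIM(filename), STATUS='REPLACE')

  ! Walk the linked list: position, momentum, weight on each line
  current => species_list(ispecies)%attached_list%head
  DO WHILE (ASSOCIATED(current))
    WRITE(lun, *) current%part_pos, current%part_p, current%weight
    current => current%next
  END DO

  CLOSE(lun)
END SUBROUTINE dump_species_to_file

The field dump is analogous: open an Ex_step_<step>_rank_<rank>.out file on each rank and write ex(1:nx, 1:ny, 1:nz), which skips the ghost cells.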
-
Thanks, Stuart, for the suggestion. This was great insight!
-
I'd like to save some output to the hard drive from the subroutine diagnostics.f90/output_routines, and I'm wondering what the best method to do that would be.
It would most probably require calling MPI_BARRIER at some point, to stop all the processes from writing to the same file simultaneously and to let them write one after another, in sequence.
Here is a demo of how this could be done, using the classic MPI barrier example. The question is where best to put the barrier, and how the code resumes and continues after it. In short, substituting PRINT* with OPEN/WRITE/CLOSE is essentially what is intended here (a sketch of that substitution follows the demo).
! Compilation:
!   1) gfortran: mpif90 hello_world_mpi.f90 -o hello_world_mpi.exe
!   2) Intel:    mpiifort hello_world_mpi.f90 -o hello_world_mpi.exe
! Then run it on 4 cores: mpirun -np 4 ./hello_world_mpi.exe
PROGRAM hello_world_mpi
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  INTEGER :: i, process_Rank, size_Of_Cluster, ierror

  CALL MPI_INIT(ierror)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size_Of_Cluster, ierror)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, process_Rank, ierror)

  ! Ranks take turns: on each pass exactly one rank prints, then everyone
  ! synchronises at the barrier before the next rank goes.
  DO i = 0, size_Of_Cluster - 1
    IF (i == process_Rank) THEN
      PRINT *, 'Hello World from process: ', process_Rank, ' of ', size_Of_Cluster
    END IF
    CALL MPI_BARRIER(MPI_COMM_WORLD, ierror)
  END DO

  CALL MPI_FINALIZE(ierror)
END PROGRAM hello_world_mpi
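For reference, here is a minimal sketch of that PRINT-to-file substitution: the ranks take turns appending to one shared file, serialised by the barrier. The file name diag_output.out is just a placeholder; the variables are those of the demo above.

DO i = 0, size_Of_Cluster - 1
  IF (i == process_Rank) THEN
    OPEN(UNIT=99, FILE='diag_output.out', STATUS='UNKNOWN', POSITION='APPEND')
    WRITE(99, *) 'data from rank ', process_Rank
    CLOSE(99)   ! close before the barrier so the next rank sees a flushed file
  END IF
  CALL MPI_BARRIER(MPI_COMM_WORLD, ierror)
END DO

Note that this fully serialises the ranks, so it is simple but slow at scale, which is one reason the per-rank-file scheme above avoids barriers entirely.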
And another question: should I put the barrier outside output_routines (in the code that calls it, epoch3d), or somewhere inside output_routines?
Another method is not to mess with MPI barriers at all, but to collect the data from each core, for each species, into allocated arrays, and then at some point write everything from the arrays to disk in one go, without any APPEND.
The small but still annoying problem is that each core does not broadcast the number of particles it currently holds before entering output_routines, where you have to perform "trepanation of their brains" by walking the linked lists on each core. Or is this count perhaps already available without counting the particles on each core separately? Specifically, can one call from any rank return the number of particles held on all the other cores? Yes, we could allocate more memory in the arrays than the potential number of particles, but that is not the way intuition tells you to proceed... (see the MPI_ALLGATHER sketch below).
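One candidate for this is MPI_ALLGATHER, which distributes every rank's local count to every rank in a single collective call. A minimal sketch follows; the EPOCH list/field names (species_list, attached_list, head, next) are my assumption, and some EPOCH versions may already store the count in attached_list%count, making the traversal unnecessary:

! Sketch: every rank learns every rank's particle count in one collective.
INTEGER :: local_count, ierror
INTEGER, ALLOCATABLE :: counts(:)          ! one entry per rank
TYPE(particle), POINTER :: current

ALLOCATE(counts(size_Of_Cluster))
local_count = 0
current => species_list(ispecies)%attached_list%head
DO WHILE (ASSOCIATED(current))
  local_count = local_count + 1
  current => current%next
END DO

! After this call, counts(r+1) holds rank r's local_count on EVERY rank
CALL MPI_ALLGATHER(local_count, 1, MPI_INTEGER, &
                   counts, 1, MPI_INTEGER, MPI_COMM_WORLD, ierror)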
A third way is to find the number of particles on each core by manually counting the linked lists, create a derived-type (TYPE) multidimensional array, and allocate its sub-arrays according to the particle count found for each core and each species. It is not clear whether MPI or Fortran will tolerate that kind of "harassment" of the branches of one global array, with many different processes arbitrarily allocating/deallocating them, and not just once but in parallel (see the MPI_GATHERV sketch below for a more conventional pattern).
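For this ragged, per-rank-sized collection, the standard MPI pattern is MPI_GATHERV: gather the counts first, compute displacements on the root, then pull the data into one flat buffer on rank 0. A minimal sketch, packing only the particle weights for brevity and reusing counts(:) and local_count from the previous snippet (in real EPOCH code the real kind would be the code's own num kind):

! Sketch: gather each rank's weights into one flat array on rank 0.
REAL(8), ALLOCATABLE :: local_w(:), all_w(:)
INTEGER, ALLOCATABLE :: displs(:)
INTEGER :: i, total, ierror

ALLOCATE(local_w(local_count))
! ... fill local_w by walking the linked list, as before ...

IF (process_Rank == 0) THEN
  ALLOCATE(displs(size_Of_Cluster))
  displs(1) = 0
  DO i = 2, size_Of_Cluster
    displs(i) = displs(i-1) + counts(i-1)
  END DO
  total = SUM(counts)
  ALLOCATE(all_w(total))
ELSE
  ALLOCATE(displs(1), all_w(1))   ! dummies; only significant on the root
END IF

CALL MPI_GATHERV(local_w, local_count, MPI_DOUBLE_PRECISION, &
                 all_w, counts, displs, MPI_DOUBLE_PRECISION, &
                 0, MPI_COMM_WORLD, ierror)
! Rank 0 can now write all_w to disk in one go, without APPEND.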
In all these cases, and many others, though, pausing all the running processes at a specific place with some kind of barrier, doing your work on the code's data, and then continuing looks very appealing.
Any considerations are welcome.