MUMPS error: signal 11 "segmentation fault"


7 months ago by
RR  
Hi,

Recently I have been running into the following cryptic error quite often, right at the start of solving an assembled system of equations with MUMPS:

mpirun noticed that process rank 0 with PID 152515 on node s4lx61 exited on signal 11 (Segmentation fault).

I'm solving ill-conditioned systems of EM equations. The error occurs only occasionally, for bigger meshes (more than 100k nodes), and mostly when I use 2nd-order polynomials.
What puzzles me most: I can solve the identical physical problem on the same mesh with 1st-order polynomials without any issues, but with p2 polynomials MUMPS crashes with the above error message.

If I "re-mesh" the same geometry with slightly different parameters (quality, discretization) and run the code on the "new" mesh, MUMPS is suddenly able to solve the system with p2 polynomials as expected. I'm pretty sure that it is not a matter of memory, since I have already solved bigger systems with p2. Maybe it has something to do with a few badly shaped tetrahedra affecting the structure of the assembled matrix A? Unfortunately, I did not find any useful hint about "exited on signal 11 (Segmentation fault)" via Google, but maybe someone in the community has experienced the same issue and knows where it comes from?

Thank you in advance for any suggestions!

Community: FEniCS Project

1 Answer


7 months ago by
RR  
I still don't know exactly where the error comes from, but I'm now pretty sure my issue IS related to the total available RAM and its distribution, judging from several discussions I found about it and from the following tests:

I have 1 TB available and can still solve a problem which needs ~900 GB, but MUMPS fails with signal 11 (Segmentation fault) if I increase the mesh size a bit. I tested this for 2 different meshes with varying physics and geometry but similar domain sizes/discretization. I obviously underestimated the increase in required RAM, which is substantial even if only a few percent more nodes are added to an already large model.
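The disproportionate growth can be illustrated with a back-of-the-envelope estimate. This is only a sketch under an assumption that does not come from the tests above: for 3D meshes ordered by nested dissection, the factor storage of a sparse direct solver grows roughly like N^(4/3) in the number of unknowns N (the classical bound for regular 3D grids). The function name and the 900 GB baseline are purely illustrative; actual MUMPS memory use also depends on ordering, pivoting and the number of MPI processes.

```python
def scaled_factor_memory(base_memory_gb, node_increase, exponent=4.0 / 3.0):
    """Scale a known memory footprint to a mesh with more unknowns.

    Assumes memory ~ N**exponent (a hypothetical model; N**(4/3) is the
    nested-dissection bound for regular 3D grids). node_increase is
    relative, e.g. 0.05 for 5 % more unknowns.
    """
    return base_memory_gb * (1.0 + node_increase) ** exponent


if __name__ == "__main__":
    base = 900.0  # GB, taken from the largest problem that still fits
    for increase in (0.02, 0.05, 0.10):
        estimate = scaled_factor_memory(base, increase)
        print(f"+{increase:.0%} unknowns -> about {estimate:.0f} GB")
```

Under this model, 5 % more unknowns already costs about 7 % more memory, so a problem close to the limit has very little headroom.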
Now I wonder whether MUMPS is able to estimate the required amount of RAM at the beginning of the solution process and therefore aborts right at the start, or whether the crash has yet another cause.
In any case, enabling the "out_of_core" option for MUMPS does not prevent the signal 11 (Segmentation fault) crash.
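For anyone who wants to dig further: the MUMPS analysis phase does compute an estimate of the working space before factorization (reported in INFOG(16) and INFOG(17), in MB), and raising the print level makes these statistics visible. Below is a minimal sketch of the PETSc options that could be tried, assuming MUMPS is reached through PETSc. The helper `mumps_petsc_args` is hypothetical and only builds the command-line flags; the values are illustrative, not tuned settings.

```python
def mumps_petsc_args(icntl):
    """Turn {ICNTL index: value} into PETSc command-line flags for MUMPS."""
    args = []
    for index in sorted(icntl):
        args += [f"-mat_mumps_icntl_{index}", str(icntl[index])]
    return args


# Illustrative values only (assumptions, not recommendations):
settings = {
    4: 2,    # ICNTL(4): print level; 2 adds warnings and main statistics
    14: 50,  # ICNTL(14): percentage increase of the estimated working space
    22: 1,   # ICNTL(22): 1 enables out-of-core factorization
}

if __name__ == "__main__":
    print(" ".join(mumps_petsc_args(settings)))
```

In legacy DOLFIN the same keys, without the leading dash, can presumably be set via `PETScOptions.set("mat_mumps_icntl_14", 50)` before the solver is created.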

For me the issue is solved, since I don't want to go for even larger systems anyway, though I found only assumptions and no real answer to the original question. If anyone else wants to contribute to this topic, I'm still interested.