MUMPS crashes due to "integer overflow" (error code -51) for VERY large systems

29 days ago by
This is a rather specific question regarding MUMPS, but I would be glad if someone could help me understand the following issue:

As I recently got access to a new machine, I tried to solve even larger models with highly accurate discretization in certain model areas, which require >1 TB RAM. When solving very large models with system sizes of approx. >20 M dofs directly with MUMPS, the solver crashes because of "INTEGER OVERFLOW". After enabling verbose mode, I found that the corresponding MUMPS error code is "-51".

Searching within the MUMPS User Guide, I found:
     "-51 error message. Error -51 which was previously raised in case of integer overflow during analysis is now only raised when a 32-bit external ordering is invoked on a graph with more than 2^31 - 1 edges."

1. Can someone explain what "when a 32-bit external ordering is invoked on a graph with more than 2^31 - 1 edges" means exactly? That might help me understand or resolve this problem.
2. Does FEniCS use the 32-bit or the 64-bit version of MUMPS, and if 64-bit, are there other dependencies that might force MUMPS to use 32-bit? (At least I read something like this in the user guide.)
3. Is it possible that these systems are simply too large for the (I assume) 32-bit version of MUMPS? The crash depends not only on the number of dofs (approx. >20 M) but also on the number of matrix entries, which is of course far larger when using second-order polynomials. I can hardly imagine this, as 2^32 is approx. 4.3 × 10^9.

Best regards, RR
Community: FEniCS Project

1 Answer

29 days ago by
FEniCS uses the MUMPS provided by PETSc. PETSc can be built in two modes: with 32-bit or 64-bit integer width. But PETSc deliberately refuses to cooperate with MUMPS in 64-bit mode. There is nothing we can do about this in FEniCS.
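
To put the 32-bit limit in numbers, here is a rough sketch in Python. The averages for nonzeros per row are made-up illustrative values (not taken from the model in the question), and the helper names are hypothetical. Note that a signed 32-bit integer tops out at 2^31 - 1, about 2.1 × 10^9, not at 2^32.

```python
# Largest value a signed 32-bit integer can hold. This, not 2**32,
# is the limit that matters for 32-bit index arithmetic.
INT32_MAX = 2**31 - 1


def estimated_nonzeros(n_dofs, avg_per_row):
    """Crude estimate of matrix entries: rows times average row length."""
    return n_dofs * avg_per_row


def fits_in_int32(count):
    """True if the count can be stored in a signed 32-bit integer."""
    return count <= INT32_MAX


if __name__ == "__main__":
    n_dofs = 20_000_000  # system size mentioned in the question
    # Hypothetical averages; higher-order elements give longer rows.
    for avg_per_row in (30, 80, 150):
        nnz = estimated_nonzeros(n_dofs, avg_per_row)
        status = "fits" if fits_in_int32(nnz) else "overflows"
        print(f"{avg_per_row:4d} nonzeros/row -> {nnz:.2e} entries ({status})")
```

So with enough entries per row, the count for the matrix alone can pass the 32-bit limit well before the number of dofs gets anywhere near it.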

Regarding your confusion about the number of DOFs not reaching 2^32: sparse direct solvers perform an LU (or Cholesky, or LDL^T) factorization. The factors of a sparse matrix are generally much less sparse than the matrix itself. The graph and edges in the error message refer to a certain graph model representing the sparse matrix and/or its factors.
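
The loss of sparsity (fill-in) can be illustrated with a small toy sketch in plain Python. It is not how MUMPS works internally. It only performs a symbolic elimination on the graph of a 5-point stencil on an n-by-n grid, in plain row-by-row ordering, and counts edges before and after. The function names and the grid size are assumptions chosen for illustration.

```python
def grid_graph(n):
    """Adjacency sets for an n-by-n grid (pattern of a 5-point stencil)."""
    adj = {(i, j): set() for i in range(n) for j in range(n)}
    for i in range(n):
        for j in range(n):
            for di, dj in ((1, 0), (0, 1)):
                k, l = i + di, j + dj
                if k < n and l < n:
                    adj[(i, j)].add((k, l))
                    adj[(k, l)].add((i, j))
    return adj


def count_edges(adj):
    """Number of undirected edges (off-diagonal entries of one triangle)."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def factor_edges(adj, order):
    """Symbolic elimination: count edges of the filled graph.

    Eliminating a vertex connects all of its remaining neighbours to
    each other; these new connections are the fill-in.
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    total = 0
    for v in order:
        nbrs = adj.pop(v)
        total += len(nbrs)  # off-diagonal entries in this factor column
        for u in nbrs:
            adj[u].discard(v)
            adj[u] |= nbrs - {u}
    return total


if __name__ == "__main__":
    n = 20  # small illustrative grid
    graph = grid_graph(n)
    order = sorted(graph)  # row-by-row ordering, no fill-reducing reordering
    print("edges in matrix graph:", count_edges(graph))
    print("edges in filled graph:", factor_edges(graph, order))
```

Fill-reducing orderings exist precisely to keep the second number down, which is why an ordering package operating on this graph shows up in the error message.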
Thanks for the explanation! I've read about "super_lu_dist" as an alternative to MUMPS. Is it a reasonable alternative for symmetric systems, and would it be possible to add it to the Ubuntu or conda installation?
written 29 days ago by RR  
SuperLU_dist can't be added to any of the binary distributions of FEniCS because its license does not allow it. I strongly recommend asking the copyright holders of SuperLU_dist about it.
written 29 days ago by Jan Blechta  
Good to know, but bad for the FEniCS community, as there seems to be no competitive alternative to MUMPS. I assume there are similar issues with PARDISO or WSMP.
written 29 days ago by RR  
Haha, closed source?
written 29 days ago by Jan Blechta  