FFC fails on MPI Cluster with multiple nodes: Error inside jit


11 weeks ago by
Dear FEniCS team and Users,

I have FEniCS version 2017.2.0 installed on an Ubuntu-based cluster with multiple processors and nodes (see the file "FEniCS_installation_details").

When I try to run parallel jobs (with a number of processors > 2) after cleaning the cache, the simulation crashes with the attached error message (see "FEniCS_error_output"). The problem does not arise if the same code is executed sequentially; after a first sequential execution, and as long as the cache is not removed, the parallel jobs also work fine.
Is this a problem related to this particular FEniCS version?
Does anybody have suggestions?

Thanks in advance for your time and Best Regards

File attached: FEniCSinstallation_details.txt (216 Bytes)
File attached: error_message.txt (9.21 KB)

Community: FEniCS Project
I've seen the same thing happen on a different cluster.  My workaround was just to always run a one-element-mesh "compilation" job in serial (which usually moves through the queue quickly) before submitting big parallel runs, although that's not a very satisfying solution.  I'd be interested to see if someone has a smarter way to fix this.
written 11 weeks ago by David Kamensky  
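For readers who would rather not submit a separate serial job, a related (but different) pattern is to let one MPI rank do the work first inside the parallel job and have the other ranks wait at a barrier. The sketch below is only an illustration of that idea, not a confirmed fix for this error: `run_rank0_first` and `SerialComm` are hypothetical names, the communicator is assumed to expose `Get_rank()` and `Barrier()` as mpi4py communicators do, and it is assumed that the JIT cache directory sits on a filesystem visible to every node.

```python
def run_rank0_first(comm, task):
    """Run task() on rank 0 alone, then on all remaining ranks.

    comm: any object with Get_rank() and Barrier(), for example an
          mpi4py communicator (assumption, not taken from the thread).
    task: a zero-argument callable that triggers the expensive step,
          e.g. a function that assembles a tiny form to force JIT.
    """
    rank = comm.Get_rank()
    result = None
    if rank == 0:
        # Rank 0 goes first and populates the shared cache.
        result = task()
    # Every rank waits here until rank 0 has finished.
    comm.Barrier()
    if rank != 0:
        # The other ranks run afterwards and should hit the cache.
        result = task()
    return result


class SerialComm:
    """Single-process stand-in for a communicator (hypothetical helper),
    useful for trying the pattern without an MPI installation."""

    def __init__(self, rank=0):
        self._rank = rank
        self.barriers = 0

    def Get_rank(self):
        return self._rank

    def Barrier(self):
        # Nothing to synchronise in serial; just count the calls.
        self.barriers += 1
```

In a real job one would presumably pass `MPI.COMM_WORLD` from mpi4py and a `task` that builds and assembles the forms on a very small mesh. If the compute nodes do not share the cache directory, this pattern cannot help, which may be one reason the serial warm-up does not work on every cluster.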
Thanks for this suggestion. I've tried it but it doesn't seem to work in my case: even after the 1-element sequential run the parallel job stops running with the same error message.
written 10 weeks ago by CBressan  