How do I stop the nonlinear solver as soon as it encounters nan?



The short question: How do I make the nonlinear solver stop iterating as soon as it encounters a NaN (Not a Number)?

The details:

Quite often, when attempting to solve new problems, I spend an unreasonable amount of time watching the nonlinear solver report NaN residuals until it reaches the maximum number of Newton iterations.

Here is a minimal script that demonstrates this behavior:

import fenics


function_space = fenics.FunctionSpace(fenics.UnitIntervalMesh(1), "P", 1)

u, v = fenics.Function(function_space), fenics.TestFunction(function_space)


inner, grad, tanh, dx = fenics.inner, fenics.grad, fenics.tanh, fenics.dx

F = (inner(grad(v), grad(u)) + v*tanh(u))*dx


solver = fenics.NonlinearVariationalSolver(
    problem = fenics.NonlinearVariationalProblem(
        F = F, 
        u = u, 
        bcs = fenics.DirichletBC(function_space, 1000., "near(x[0], 0.)"), 
        J = fenics.derivative(form = F, u = u)))

        
solver.parameters["newton_solver"]["maximum_iterations"] = 3
        
solver.solve()


Running this with FEniCS 2017.2.0 prints

Solving nonlinear variational problem.
Newton iteration 0: r (abs) = 1.302e+03 (tol = 1.000e-10) r (rel) = 1.000e+00 (tol = 1.000e-09)
Newton iteration 1: r (abs) = -nan (tol = 1.000e-10) r (rel) = -nan (tol = 1.000e-09)
Newton iteration 2: r (abs) = -nan (tol = 1.000e-10) r (rel) = -nan (tol = 1.000e-09)
Newton iteration 3: r (abs) = -nan (tol = 1.000e-10) r (rel) = -nan (tol = 1.000e-09)
Traceback (most recent call last):
File "newton_nan.py", line 24, in <module>
solver.solve()
RuntimeError:

*** -------------------------------------------------------------------------
*** DOLFIN encountered an error. If you are not able to resolve this issue
*** using the information listed below, you can ask for help at
***
*** fenics-support@googlegroups.com
***
*** Remember to include the error message listed below and, if possible,
*** include a *minimal* running example to reproduce the error.
***
*** -------------------------------------------------------------------------
*** Error: Unable to solve nonlinear system with NewtonSolver.
*** Reason: Newton solver did not converge because maximum number of iterations reached.
*** Where: This error was encountered inside NewtonSolver.cpp.
*** Process: 0
***
*** DOLFIN version: 2017.2.0
*** Git changeset: unknown
*** -------------------------------------------------------------------------

Aborted (core dumped)


It seems unreasonable to me that the nonlinear solver should continue after encountering NaNs. So a side question: can anyone give an example of why one would ever want to keep iterating after a NaN appears? If not, then maybe I should file this as an issue on Bitbucket. A quick search didn't find an existing issue.

There is some relevant discussion in the answer to a question about other nonlinear solver failures.

Based on a somewhat similar question about catching divergence, I expect the answer to be "do it yourself". The answer to that question shows how to define a new class that inherits from the Newton solver and overrides a method to handle that particular issue. Unfortunately, I never worked out how to do this with fenics.AdaptiveNonlinearVariationalSolver, which is critical for my work, and I wanted to post this question before digging into that again. At some point I end up redefining quite a lot of code and digging deep into undocumented territory. I also understand that catching "divergence" was a much less well-defined request than handling actual NaNs.

Edit: Set the maximum number of iterations in the example to shorten and clarify the question.

Community: FEniCS Project
Do you want the program to stop or the nonlinear solver to return with an error? Or an exception?
written 10 weeks ago by Marco Morandini  
Right now I can happily work with any of these behaviors.

Ideally, it would otherwise behave as it does now: depending on the "error_on_nonconvergence" parameter, it would either throw an error or return a value.
written 10 weeks ago by Alexander G. Zimmerman  
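The requested behavior can be sketched outside of DOLFIN as a plain scalar Newton loop. This is only an illustration under stated assumptions: newton_solve and its arguments are hypothetical names, the flag merely mirrors the "error_on_nonconvergence" parameter mentioned above, and nothing here is DOLFIN's NewtonSolver API. The loop checks the residual before every update and gives up at the first non-finite value instead of running to the iteration limit.

```python
import math


def newton_solve(residual, jacobian, u0, maximum_iterations=50,
                 absolute_tolerance=1.e-10, error_on_nonconvergence=True):
    """Scalar Newton iteration that stops at the first non-finite residual.

    Hypothetical helper for illustration; returns (u, iterations, converged).
    """
    u = u0
    for iteration in range(maximum_iterations + 1):
        r = residual(u)
        print("Newton iteration %d: r (abs) = %.3e" % (iteration, abs(r)))
        if not math.isfinite(r):
            # A NaN (or inf) residual would make the next update non-finite
            # as well, so stop here instead of iterating further.
            if error_on_nonconvergence:
                raise RuntimeError(
                    "Residual is %r at Newton iteration %d" % (r, iteration))
            return u, iteration, False
        if abs(r) < absolute_tolerance:
            return u, iteration, True
        if iteration == maximum_iterations:
            break
        u -= r / jacobian(u)
    if error_on_nonconvergence:
        raise RuntimeError("Maximum number of iterations reached")
    return u, maximum_iterations, False
```

With error_on_nonconvergence=False the caller gets converged == False and can decide what to do; with the default, the RuntimeError message states that a non-finite residual was the cause.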

1 Answer


1) Put this snippet in a file, say pippo.c:

#define _GNU_SOURCE
#include <fenv.h>
int trap_fpe_exceptions(void) {
    int excepts = FE_INVALID|FE_DIVBYZERO|FE_OVERFLOW;
    return feenableexcept(excepts);
}

and build it as a shared library, say pippo.so:

gcc -o pippo.so -fPIC -shared pippo.c


2) In Python, load the library and enable the traps before solving:

import ctypes
fexc = ctypes.CDLL('pippo.so')
fexc.trap_fpe_exceptions()

Thanks for the idea.

This almost gets there, but it has at least two critical issues.

1. The new output when run from the command line is 

    Solving nonlinear variational problem.
    Newton iteration 0: r (abs) = 1.302e+03 (tol = 1.000e-10) r (rel) = 1.000e+00 (tol = 1.000e-09)

So there's no way to know why the program stopped, and no NaN values are printed.

2. This kills Jupyter notebook kernels.


On a lesser note, this is much hackier than what I want; if I feel strongly about that, I can file an issue on Bitbucket instead. Then again, both issues above are side effects of the hack.

written 7 weeks ago by Alexander G. Zimmerman  
Also for the record, both on the Ubuntu command line and in a Jupyter notebook, I found it necessary to slightly modify the second step of the answer with

import ctypes
import os

fexc = ctypes.CDLL(os.path.abspath('pippo.so'))

since apparently ctypes.CDLL does not search the current working directory for the library.
written 7 weeks ago by Alexander G. Zimmerman  
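One possible mitigation for the first issue (no indication of why the program stopped) is Python's standard faulthandler module, which installs a handler for SIGFPE, among other fatal signals, and dumps the Python traceback before the process dies. The sketch below combines it with the absolute-path loading from the comment above. enable_fpe_traps is a hypothetical helper name, the combination has not been tested against the script in the question, and it is not expected to keep a Jupyter kernel alive.

```python
import ctypes
import faulthandler
import os


def enable_fpe_traps(library_path="pippo.so"):
    """Enable the floating-point traps from the answer, with a traceback on SIGFPE.

    Hypothetical helper: returns True if the traps were enabled and
    False if the shared library was not found.
    """
    # Dump the Python traceback to stderr when a fatal signal such as
    # SIGFPE is delivered, so the failing Python call is visible.
    faulthandler.enable()

    # ctypes.CDLL is given an absolute path, as in the comment above.
    absolute_path = os.path.abspath(library_path)
    if not os.path.exists(absolute_path):
        return False

    fexc = ctypes.CDLL(absolute_path)
    # feenableexcept returns the previously enabled mask, or -1 on failure.
    return fexc.trap_fpe_exceptions() != -1
```

Calling enable_fpe_traps() before solver.solve() should then make the traceback point at the Python line that was executing when the trap fired, not at the C++ line that produced the NaN.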