Best practice for optimizing compilation of complex forms

26 days ago by
Dear all,

for my application I need to build a quite complex form defined on a mixed FunctionSpace which consists of a n-times duplicate of scalar and vectorial FE with n being potentially quite large (up to 50) for instance. Here is a simplified example for n=10:
# simple 2D mesh
N = 50
mesh = UnitSquareMesh(N, N)

# FunctionSpace containing `nlayers` vectorial and scalar fields
nlayers = 10
U1e = VectorElement("CG", mesh.ufl_cell(), 2)
U2e = FiniteElement("CG", mesh.ufl_cell(), 1)
list_Ve = [U1e, U2e]*nlayers
V = FunctionSpace(mesh, MixedElement(list_Ve))​

Generalized strain measures consisting of the field themselves or their gradient are defined and the form contains nxn cross-products of similar strain measures as follows:

u = Function(V)
u_ = TestFunction(V)
du = TrialFunction(V)
e = Constant(0.1)

def generalized_deformation(u):
    Ug = split(u)
    list_e1 = [as_vector([sym(grad(Ug[2*i]))[j, k] for j,k in [(0,0), (1,1), (0,1)]])
                for i in range(nlayers)]
    list_e2 = [Ug[2*i+3] - Ug[2*i+1] for i in range(nlayers - 1)]
    list_e3 = [Ug[2*i+2] - Ug[2*i]
               - e*(grad(Ug[2*i+1]) + grad(Ug[2*i+3]))
                   for i in range(nlayers - 1)]
    return list_e1, list_e2, list_e3

def custom_inner(E1, E2):
    a_list = []
    for (e1, e2) in zip(E1, E2):
        for el1 in e1:
            for el2 in e2:
                s = shape(el1)
                if s:
                    m = s[0]
                    Ce = as_matrix([[1.]*m]*m)
                    a_list.append(dot(el1, dot(Ce, el2)))
                    a_list.append(dot(el1, el2))
    return sum(a_list)

dEps = generalized_deformation(du)
Eps_ = generalized_deformation(u_)
a = custom_inner(dEps, Eps_)*dx
tic = time()
print "Compilation time (split):", time()-tic
tic = time()
print "Assembly time (split):", time()-tic​

I also implemented a variant in which the similar strain measures are collected into a global vector so that the cross-products can be performed using a simple vector, Matrix, vector dot product as follows:

def generalized_deformation_vect(u):
    Ug = split(u)
    list_e1 = [sym(grad(Ug[2*i]))[j, k] for i in range(nlayers)
               for j,k in [(0,0), (1,1), (0,1)]]
    list_e2 = [Ug[2*i+3] - Ug[2*i+1] for i in range(nlayers - 1)]
    list_e3 = [Ug[2*i+2][j] - Ug[2*i][j]
               - e*(grad(Ug[2*i+1])[j] + grad(Ug[2*i+3])[j])
                   for i in range(nlayers - 1) for j in range(2)]
    return as_vector(list_e1), as_vector(list_e2), as_vector(list_e3)

tic = time()
dEps = generalized_deformation_vect(du)
Eps_ = generalized_deformation_vect(u_)
a2 = []
for i in range(3):
    n = len(dEps[i])
    C = as_matrix([[1.]*n]*n)
    a2.append(dot(dEps[i], dot(C, Eps_[i])))
a2 = sum(a2)*dx
tic = time()
print "Compilation time (vector):", time()-tic
tic = time()
print "Assembly time (vector):", time()-tic​

My question is to know if there are some general best practice advice when building such complex forms, especially regarding their compilation. Indeed:
  • using uflacs: compilation of the first (split) approach took 40s while the second (vector) took 35s, assembly time took 14s
  • using quadrature: compilation of the first (split) approach took 11s while the second (vector) took 6s, assembly time took 15s
For n=20, the first approach (split) failed with a maximum recursion depth exceeded while calling a Python object, no problem was observed with the vector approach:
  • uflacs compiled in 475s, assembly in 95s
  • quadrature compiled in 29s, assembly in 100s
I find that the compilation speed-up of quadrature is quite important, which is especially important when considering much larger n. Since quadrature is intended to disappear as uflacs is supposed to be more efficient, is there any idea why there are such differences ?

Community: FEniCS Project
Have you tried ?
written 25 days ago by Nate  
Yes but I got a JIT failure that I do not really understand...
written 23 days ago by bleyerj  
Please login to add an answer/comment or follow this question.

Similar posts:
Search »