
ffddm documentation

Minimal example

macro dimension 3// EOM            // 2D or 3D

include "ffddm.idp"

load "msh3"

int[int] LL = [2,2, 1,2, 2,2];
mesh3 ThGlobal = cube(10, 10, 10, [x, y, z], label = LL);      // global mesh

macro grad(u) [dx(u), dy(u), dz(u)]// EOM    // three-dimensional gradient

macro Varf(varfName, meshName, VhName)
    varf varfName(u,v) = int3d(meshName)(grad(u)' * grad(v)) + int3d(meshName)(v) + on(1, u = 1.0);
// EOM

// Domain decomposition
ffddmbuildDmesh( LapMesh , ThGlobal , mpiCommWorld )

macro def(i)i// EOM                         // scalar field definition
macro init(i)i// EOM                        // scalar field initialization
ffddmbuildDfespace( LapFE , LapMesh , real , def , init , P1 )

ffddmsetupOperator( Lap , LapFE , Varf )

real[int] rhsi(0);
ffddmbuildrhs( Lap , Varf , rhsi )

LapFEVhi def(ui);

// Direct solve
ui[] = Lapdirectsolve(rhsi);

Lapwritesummary

ffddmplot(LapFE,ui,"u");
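
The script above is run in parallel with ff-mpirun; for example, on 4 MPI processes (assuming it is saved as minimal.edp; -wg enables the graphics windows):

ff-mpirun -np 4 minimal.edp -wg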

Overlapping mesh decomposition

ffddmbuildDmesh(prmesh,Th,comm)

decomposes the mesh Th into overlapping submeshes. The mesh will be distributed over the mpi ranks of communicator comm. This will create and expose variables whose names will be prefixed by prmesh, see below (# is the concatenation operator). The way the initial mesh Th is partitioned depends on the value of ffddmpartitioner.

The size of the overlap between subdomains (its width in terms of number of mesh elements) is given by ffddmoverlap.

The level of refinement of the resulting submeshes with respect to the input mesh Th is given by ffddmsplit.

If ffddmexclude \(\neq 0\), the first ffddmpCS mpi ranks of comm will be excluded from the spatial domain decomposition, in order to dedicate them later to the coarse problem (for two-level preconditioners).

The label of the new border of the submeshes (the interface between the subdomains) is given by ffddminterfacelabel.

defines:

  • int prmesh#npart number of subdomains for this decomposition; should be equal to mpiSize(comm) - ffddmexclude * ffddmpCS

  • int prmesh#pCS equal to ffddmpCS

  • int prmesh#exclude equal to ffddmexclude

  • int prmesh#excluded true if ffddmexclude is true (\(\neq 0\)) and mpiRank(comm) < prmesh#pCS. In this case, this mpi rank will be excluded from the spatial domain decomposition and will only work on the coarse problem.

  • mpiComm prmesh#commddm mpi communicator for ranks participating in the spatial domain decomposition (ranks 0 to prmesh#npart - 1 in comm if prmesh#exclude is false, ranks prmesh#pCS to prmesh#pCS+prmesh#npart - 1 otherwise)

  • mpiComm prmesh#commCS mpi communicator for ranks participating in the assembly and resolution of the coarse problem for two-level preconditioners (ranks 0 to prmesh#pCS - 1 in comm)

  • mpiComm prmesh#commself self mpi communicator (this mpi rank only), used for factorizing local matrices

  • meshN[int] prmesh#aTh array (size prmesh#npart) of local meshes of the subdomains. In the standard parallel case, only the local mesh for this mpi rank prmesh#aTh[mpiRank(prmesh#commddm)] is defined (unless this mpi rank is excluded from the spatial domain decomposition, i.e. prmesh#excluded = 1, see below). In the sequential case, all local meshes are defined.

  • meshN prmesh#Thi the local mesh of the subdomain for this mpi rank, i. e. prmesh#aTh[mpiRank(prmesh#commddm)] in the parallel case

  • int prmesh#numberIntersection the number of neighbors for this mpi rank

  • int[int] prmesh#arrayIntersection the list of neighbor ranks in prmesh#commddm for this mpi rank

Remark for sequential use (see -seqddm):
  • meshN[int] prmesh#aTh array (size prmesh#npart) of local meshes of the subdomains
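
A hedged sketch of how the decomposition parameters above could be adjusted, assuming they are script-level variables read when ffddmbuildDmesh is called (the prefix MyMesh is illustrative, and ThGlobal is the global mesh from the minimal example):

ffddmoverlap = 2;          // assumption: overlap width, in number of mesh element layers
ffddmsplit = 2;            // assumption: refinement level of the submeshes with respect to ThGlobal
ffddminterfacelabel = 10;  // assumption: label of the new interface border between subdomains
ffddmbuildDmesh( MyMesh , ThGlobal , mpiCommWorld )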

Local finite element spaces

ffddmbuildDfespace(prfe,prmesh,scalar,def,init,Pk)

builds the local finite element spaces and associated distributed operators on top of the mesh decomposition prmesh. This will create and expose variables whose names will be prefixed by prfe, see below. It is assumed that ffddmbuildDmesh has already been called with prefix prmesh in order to build the mesh decomposition.

The local finite element spaces of type Pk (where Pk is the type of finite element: P1, [P2,P2,P1], …) are defined on the local meshes of the subdomains based on the mesh decomposition previously created with prefix prmesh.

scalar determines the type of data for this finite element: real or complex.

Two macros, def and init, are needed: def specifies how to define a finite element function in the finite element space Pk, and init specifies how to interpolate a scalar function onto the (possibly multiple) components of Pk. Two examples are given below:

For scalar P2 finite elements and complex-valued problems:

macro def(u) u// EOM
macro init(u) u// EOM
ffddmbuildDfespace(myFEprefix,mymeshprefix,complex,def,init,P2)

For vectorial [P2,P2,P1] finite elements and real-valued problems:

macro def(u) [u, u#B, u#C]// EOM
macro init(u) [u, u, u]// EOM
ffddmbuildDfespace(myFEprefix,mymeshprefix,real,def,init,[P2,P2,P1])

In practice, this builds the necessary distributed operators associated to the finite element space: the local partition of unity functions \((D_i)_{i=1,...,N}\) (see prfe#Dk and prfe#Dih below) as well as the function prfe#update (see below) which synchronizes local vectors \((u_i)_{i=1,...,N}\) between neighboring subdomains, performing the equivalent of \(u_i = R_i (\sum_{j=1}^N R_j^T u_j)\) or \(u_i = R_i (\sum_{j=1}^N R_j^T D_j u_j)\) in a distributed parallel environment.

prfe#scalprod (see below) performs the parallel scalar product for vectors defined on this finite element.

defines:

  • prfe#prmesh macro, saves the parent prefix prmesh of the mesh decomposition

  • prfe#K macro, saves the type of data scalar for this finite element space (real or complex)

  • func prfe#fPk saves the type of finite element Pk, e.g. P1, [P2,P2,P1], …

  • fespace prfe#Vhi the local finite element space for this mpi rank, defined on the local mesh prmesh#Thi

  • int prfe#Ndofglobal the total number of degrees of freedom \(n\) for this finite element discretization

  • prfe#mdef macro, saves the macro def giving the definition of a finite element function in the finite element space Pk

  • prfe#minit macro, saves the macro init specifying how to interpolate a scalar function onto the (possibly multiple) components of a finite element function of Pk. This is used to create the local partition of unity function in prfe#Vhi, by interpolating the local P1 partition of unity function onto the components of prfe#Vhi. For non Lagrange finite element spaces (e.g. RT0, Edge03d, …), see ffddmbuildDfespaceEdge.

  • prfe#K[int][int] prfe#Dk array (size prmesh#npart) of local partition of unity vectors in the subdomains, equivalent to \((D_i)_{i=1,...,N}\). In the standard parallel case, only the local partition of unity vector for this mpi rank prfe#Dk[mpiRank(prmesh#commddm)] is defined (unless this mpi rank is excluded from the spatial domain decomposition, i. e. prmesh#excluded = 1). In the sequential case, all local partition of unity vectors are defined.

  • matrix<prfe#K>[int] prfe#Dih array (size prmesh#npart) similar to prfe#Dk but in matrix form, allowing for easier matrix-matrix multiplications. prfe#Dih[i] is a diagonal matrix, with the diagonal equal to prfe#Dk[i].

  • fespace prfe#Vhglob the global finite element space defined on the global mesh prmesh#Thglob. Defined only if -noGlob is not used.

  • matrix<prfe#K>[int] prfe#Rih array (size prmesh#npart) of restriction matrices from the global finite element space to the local finite element spaces on the local submeshes of the subdomains. In the standard parallel case, only the restriction matrix for this mpi rank prfe#Rih[mpiRank(prmesh#commddm)] is defined (unless this mpi rank is excluded from the spatial domain decomposition, i. e. prmesh#excluded = 1). In the sequential case, all restriction matrices are defined. The restriction matrices prfe#Rih are defined only if -noGlob is not used.

  • func int prfe#update(scalar[int] ui, bool scale) The function prfe#update synchronizes the local vector ui between subdomains by exchanging the values of ui shared with neighboring subdomains (in the overlap region) using point-to-point MPI communications. If scale is true, ui is multiplied by the local partition of unity beforehand. This is equivalent to \(u_i = R_i (\sum_{j=1}^N R_j^T u_j)\) when scale is false and \(u_i = R_i (\sum_{j=1}^N R_j^T D_j u_j)\) when scale is true.

  • func scalar prfe#scalprod(scalar[int] ai, scalar[int] bi) The function prfe#scalprod computes the global scalar product of two vectors whose local restriction to the subdomain of this mpi rank are ai and bi. The result is computed as \(\sum_{j=1}^N (D_j a_j, b_j)\).
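
As an illustration, a hedged sketch continuing the Laplace minimal example above (prefixes LapMesh and LapFE), showing how prfe#update and prfe#scalprod can be used:

LapFEVhi wi = x + y;                     // a local P1 function on the subdomain mesh LapMeshThi
LapFEupdate(wi[], true);                 // w_i = R_i (sum_j R_j^T D_j w_j)
real nrm2 = LapFEscalprod(wi[], wi[]);   // global scalar product sum_j (D_j w_j, w_j)
if (mpirank == 0) cout << "||w||^2 = " << nrm2 << endl;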

Define the problem to solve

ffddmsetupOperator(pr,prfe,Varf)

builds the distributed operator associated to the variational problem given by Varf, on top of the distributed finite element space prfe. This will create and expose variables whose names will be prefixed by pr, see below. It is assumed that ffddmbuildDfespace has already been called with prefix prfe in order to define the distributed finite element space.

In practice, this builds the so-called local ‘Dirichlet’ matrices \(A_i = R_i A R_i^T\), the restrictions of the global operator \(A\) to the subdomains (see pr#aRd below). The matrices correspond to the discretization of the bilinear form given by the macro Varf, which represents the abstract variational form of the problem. These matrices are then used to implement the action of the global operator \(A\) on a local vector (the parallel matrix-vector product with \(A\)), see pr#A below.

At this point, we already have the necessary data to be able to solve the problem with a parallel direct solver (MUMPS), which is the purpose of the function pr#directsolve (see below). See ffddmbuildrhs for building the right-hand side.

The macro Varf is required to have three parameters: the name of the variational form, the mesh, and the finite element space. The variational form given in this ‘abstract’ format will then be used by ffddm to assemble the discrete operators by setting the appropriate mesh and finite element space as parameters. An example is given below:

macro myVarf(varfName, meshName, VhName)
    varf varfName(u,v) = int3d(meshName)(grad(u)' * grad(v)) + on(1, u = 1.0);
// EOM

ffddmsetupOperator(myprefix,myFEprefix,myVarf)

Remark: In this simple example, the third parameter VhName is not used. However, for more complex cases such as non-linear or time-dependent problems where the problem depends on a solution computed at a previous step, it is useful to know for which discrete finite element space the variational form is being used. See for example TODO

defines:

  • pr#prfe macro, saves the parent prefix prfe of the finite element space

  • int pr#verbosity the level of verbosity for this problem, initialized with the value of ffddmverbosity

  • pr#writesummary macro, prints a summary of timings for this problem, such as the time spent to assemble local matrices or solve the linear system.

  • matrix<prfe#K> pr#Aglobal the global matrix \(A\) corresponding to the discretization of the variational form given by the macro Varf on the global finite element space prfe#Vhglob. Defined only in the sequential case.

  • matrix<prfe#K>[int] pr#aRd array (size prfe#prmesh#npart) of so-called local ‘Dirichlet’ matrices in the subdomains; these are the restrictions of the global operator to the subdomains, equivalent to \(A_i = R_i A R_i^T\) with \(A\) the global matrix corresponding to the discretization of the variational form given by the macro Varf on the global finite element space. In the standard parallel case, only the local matrix for this mpi rank pr#aRd[mpiRank(prmesh#commddm)] is defined (unless this mpi rank is excluded from the spatial domain decomposition, i. e. prmesh#excluded = 1). In the sequential case, all local matrices are defined.

  • func prfe#K[int] pr#A(prfe#K[int] &ui) The function pr#A computes the parallel matrix-vector product, i.e. the action of the global operator \(A\) on the local vector \(u_i\). The computation is equivalent to \(R_i (\sum_{j=1}^N R_j^T D_j A_j u_j)\) and is performed in parallel using local matrices pr#aRd and the function prfe#update. In the sequential case, the global matrix pr#Aglobal is used instead.

  • func prfe#K[int] pr#AT(prfe#K[int] &ui) Similarly to pr#A, the function pr#AT computes the action of \(A^T\), the transpose of the global operator \(A\), on \(u_i\).

  • func prfe#K[int] pr#directsolve(prfe#K[int]& rhsi) The function pr#directsolve solves the linear system \(A x = b\) in parallel using the parallel direct solver MUMPS. The matrix is given to MUMPS in distributed form through the local matrices pr#aRd. The input rhsi is given as a distributed vector (rhsi is the restriction of the global right-hand side \(b\) to the subdomain of this mpi rank, see ffddmbuildrhs) and the returned vector is local as well.
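
For example, a hedged sketch continuing the Laplace example above, where the parallel direct solve is checked by applying the distributed operator pr#A to the computed solution:

real[int] Aui = LapA(ui[]);                   // action of the global operator A on the local solution
Aui -= rhsi;                                  // local residual R_i (A x - b)
real resnorm = sqrt(LapFEscalprod(Aui, Aui)); // parallel norm of the residual
if (mpirank == 0) cout << "residual norm = " << resnorm << endl;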

Remark: rectangular operators

It is possible to define a non-square distributed operator where the variational form takes two different finite element spaces of unknown and test functions. This is done through macro ffddmsetupOperatorRect which takes two FE prefixes (which must be defined on the same mesh prefix), see below:

macro myVarf(varfName, meshName, VhName)
    varf varfName([u, uB, uC], [q]) = int3d(meshName)(div(u) * q);
// EOM

ffddmsetupOperatorRect(myprefix,myFEprefixV,myFEprefixP,myVarf)

ffddmbuildrhs(pr,Varfrhs,rhs)

builds the right-hand side associated to the variational form given by Varfrhs for the problem corresponding to prefix pr. The resulting right-hand side vector rhs corresponds to the discretization of the abstract linear form given by the macro Varfrhs (see ffddmsetupOperator for more details on how to define the abstract variational form as a macro).

The input vector rhs is resized and contains the resulting local right-hand side \(R_i b\), the restriction of the global right-hand side \(b\) to the subdomain of this mpi rank. In the sequential case, the global right-hand side vector \(b\) is assembled instead.

An example is given below:

macro myVarfrhs(varfName, meshName, VhName)
    varf varfName(u,v) = intN(meshName)(v) + on(1, u = 1.0);
// EOM

real[int] rhsi(0);
ffddmbuildrhs(myprefix,myVarfrhs,rhsi)

One level preconditioners

ffddmsetupPrecond(pr,VarfPrec)

builds the one level preconditioner for problem pr. This will create and expose variables whose names will be prefixed by pr, see below. It is assumed that ffddmsetupOperator has already been called with prefix pr in order to define the problem to solve.

In practice, this builds and performs the factorization of the local matrices used in the one level preconditioner. The local matrices depend on the choice of ffddmprecond and VarfPrec, see pr#aR below.

defines:

  • string pr#prec equal to ffddmprecond. Sets the type of one level preconditioner \(M^{-1}_1\) to be used: “asm” (Additive Schwarz), “ras” (Restricted Additive Schwarz), “oras” (Optimized Restricted Additive Schwarz), “soras” (Symmetric Optimized Restricted Additive Schwarz) or “none” (no preconditioner).

  • matrix<pr#prfe#K>[int] pr#aR array (size prfe#prmesh#npart) of local matrices used for the one level preconditioner. Each mpi rank of the spatial domain decomposition performs the \(LU\) (or \(LDL^T\)) factorization of the local matrix corresponding to its subdomain using the direct solver MUMPS.

    • If VarfPrec is not a previously defined macro (just put null for example), the matrices pr#aR are set to be equal to the so-called local ‘Dirichlet’ matrices pr#aRd (see ffddmsetupOperator). This is for the classical ASM preconditioner \(M^{-1}_1 = M^{-1}_{\text{ASM}} = \sum_{i=1}^N R_i^T A_i^{-1} R_i\) or classical RAS preconditioner \(M^{-1}_1 = M^{-1}_{\text{RAS}} = \sum_{i=1}^N R_i^T D_i A_i^{-1} R_i\) (it is assumed that ffddmprecond is equal to “asm” or “ras”).

    • If VarfPrec is a macro, it is assumed that VarfPrec defines an abstract bilinear form (see ffddmsetupOperator for more details on how to define the abstract variational form as a macro).

      • If ffddmprecond is equal to “asm” or “ras”, the matrices pr#aR will be assembled as local ‘Dirichlet’ matrices in the same manner as pr#aRd, but using the bilinear form defined by VarfPrec instead. This defines the ASM preconditioner as \(M^{-1}_1 = M^{-1}_{\text{ASM}} = \sum_{i=1}^N R_i^T {(A_i^{\text{Prec}})}^{-1} R_i\) and the RAS preconditioner as \(M^{-1}_1 = M^{-1}_{\text{RAS}} = \sum_{i=1}^N R_i^T D_i {(A_i^{\text{Prec}})}^{-1} R_i\), where \(A_i^{\text{Prec}} = R_i A^{\text{Prec}} R_i^T\).

      • If ffddmprecond is equal to “oras” or “soras”, the matrices pr#aR will correspond to the discretization of the variational form VarfPrec in the subdomains \(\Omega_i\). In particular, various boundary conditions can be imposed at the interface between subdomains (corresponding to the mesh boundary of label ffddminterfacelabel set by the parent call to ffddmbuildDmesh), such as Optimized Robin boundary conditions. The ORAS preconditioner then reads \(M^{-1}_1 = M^{-1}_{\text{ORAS}} = \sum_{i=1}^N R_i^T D_i {(B_i^{\text{Prec}})}^{-1} R_i\) and the SORAS preconditioner \(M^{-1}_1 = M^{-1}_{\text{SORAS}} = \sum_{i=1}^N R_i^T D_i {(B_i^{\text{Prec}})}^{-1} D_i R_i\), where \(B_i^{\text{Prec}}\) is the local discretization of VarfPrec in \(\Omega_i\).

  • func pr#prfe#K[int] pr#PREC1(pr#prfe#K[int] &ui) The function pr#PREC1 computes the parallel application of the one level preconditioner \(M^{-1}_1\), i.e. the action of \(M^{-1}_1\) on the local vector \(u_i\). In the sequential case, it computes the action of \(M^{-1}_1\) on a global vector. The action of the inverse of the local matrices pr#aR is computed by forward-backward substitution using their \(LU\) (or \(LDL^T\)) decomposition.

  • func pr#prfe#K[int] pr#PREC(pr#prfe#K[int] &ui) The function pr#PREC corresponds to the action of the preconditioner \(M^{-1}\) for problem pr. It coincides with the one level preconditioner pr#PREC1 after the call to ffddmsetupPrecond. If a second level is subsequently added (see the next section about Two level preconditioners), it will then coincide with the two level preconditioner \(M^{-1}_2\) (see pr#PREC2level).

  • func pr#prfe#K[int] pr#fGMRES(pr#prfe#K[int]& x0i, pr#prfe#K[int]& bi, real eps, int nbiter, string sprec) The function pr#fGMRES solves the linear system \(A x = b\) in parallel using the flexible GMRES method preconditioned by \(M^{-1}\). The action of the global operator \(A\) is given by pr#A, the action of the preconditioner \(M^{-1}\) is given by pr#PREC and the scalar products are computed by pr#scalprod. More details are given in the section Solving the linear system.
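
As an illustration, a hedged sketch continuing the Laplace example above, assuming ffddmprecond can be set in the script before the call (otherwise the default is used):

ffddmprecond = "ras";                  // assumption: select the one level preconditioner type in the script
ffddmsetupPrecond( Lap , null )        // null: reuse the local 'Dirichlet' matrices (see pr#aRd)
LapFEVhi def(x0i);
x0i[] = 0;                             // zero initial guess
ui[] = LapfGMRES(x0i[], rhsi, 1.e-6, 200, "right");
Lapwritesummary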

Two level preconditioners

The main ingredient of a two level preconditioner is the so-called ‘coarse space’ matrix \(Z\).

\(Z\) is a rectangular matrix of size \(n \times n_c\), where usually \(n_c \ll n\).

\(Z\) is used to build the ‘coarse space operator’ \(E = Z^T A Z\), a square matrix of size \(n_c \times n_c\). We can then define the ‘coarse space correction operator’ \(Q = Z E^{-1} Z^T = Z (Z^T A Z)^{-1} Z^T\), which can then be used to enrich the one level preconditioner through a correction formula. The simplest one is the additive coarse correction: \(M^{-1}_2 = M^{-1}_1 + Q\). See pr#corr below for all other available correction formulas.

There are multiple ways to define a relevant coarse space \(Z\) for different classes of problems. ffddmgeneosetup defines a coarse space correction operator by building the GenEO coarse space, while ffddmcoarsemeshsetup builds the coarse space using a coarse mesh.

After a call to either ffddmgeneosetup or ffddmcoarsemeshsetup, the following variables and functions are set up:

  • int pr#ncoarsespace the size of the coarse space \(n_c\).

  • string pr#corr initialized with the value of ffddmcorrection. Specifies the type of coarse correction formula to use for the two level preconditioner. The possible values are:

    • "AD": Additive, \(M^{-1} = M^{-1}_2 = M^{-1}_1 + Q\)
    • "BNN": Balancing Neumann-Neumann, \(M^{-1} = M^{-1}_2 = (I - Q A) M^{-1}_1 (I - A Q) + Q\)
    • "ADEF1": Adapted Deflation Variant 1, \(M^{-1} = M^{-1}_2 = M^{-1}_1 (I - A Q) + Q\)
    • "ADEF2": Adapted Deflation Variant 2, \(M^{-1} = M^{-1}_2 = (I - Q A) M^{-1}_1 + Q\)
    • "RBNN1": Reduced Balancing Variant 1, \(M^{-1} = M^{-1}_2 = (I - Q A) M^{-1}_1 (I - A Q)\)
    • "RBNN2": Reduced Balancing Variant 2, \(M^{-1} = M^{-1}_2 = (I - Q A) M^{-1}_1\)
    • "none": no coarse correction, \(M^{-1} = M^{-1}_2 = M^{-1}_1\)
  • Note that AD, ADEF1 and RBNN2 only require one application of \(Q\), while BNN, ADEF2 and RBNN1 require two. The default coarse correction is ADEF1, which is cheaper and generally as robust as BNN or ADEF2.

  • func pr#prfe#K[int] pr#Q(pr#prfe#K[int] &ui) The function pr#Q computes the parallel application of the coarse correction operator \(Q\), i.e. the action of \(Q = Z E^{-1} Z^T\) on the local vector \(u_i\). In the sequential case, it computes the action of \(Q\) on a global vector. The implementation differs depending on the method used to build the coarse space (with GenEO or using a coarse mesh), but the idea is the same: the action of the transpose of the distributed operator \(Z\) on the distributed vector \(u_i\) is computed in parallel, with the contribution of all subdomains being gathered in a vector of size \(n_c\) in the mpi process of rank 0. The action of the inverse of the coarse space operator \(E\) is then computed by forward-backward substitution using its \(LU\) (or \(LDL^T\)) decomposition previously computed by the first pr#prfe#prmesh#pCS ranks of the mpi communicator. The result is then sent back to all subdomains to perform the last application of \(Z\) and obtain the resulting local vector in each subdomain.

  • func pr#prfe#K[int] pr#PREC2level(pr#prfe#K[int] &ui) The function pr#PREC2level computes the parallel application of the two level preconditioner \(M^{-1}_2\), i.e. the action of \(M^{-1}_2\) on the local vector \(u_i\). In the sequential case, it computes the action of \(M^{-1}_2\) on a global vector. The two level preconditioner depends on the choice of the coarse correction formula which is determined by pr#corr, see above.
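
A hedged sketch illustrating the default ADEF1 correction formula by composing the building blocks above (reusing the Laplace prefixes, after a second level has been set up with GenEO so that the multiplicative term involves \(A\) itself; with a coarse mesh, \(A^{\text{Aprec}}\) would appear instead):

LapFEVhi def(vi);
vi[] = 1;                                       // an arbitrary distributed test vector
real[int] Qv = LapQ(vi[]);                      // Q v
real[int] AQv = LapA(Qv);                       // A Q v
real[int] r = vi[]; r -= AQv;                   // v - A Q v
real[int] M2v = LapPREC1(r); M2v += Qv;         // M_1^{-1} (v - A Q v) + Q v
real[int] diff = LapPREC2level(vi[]); diff -= M2v;
if (mpirank == 0) cout << "ADEF1 check: " << sqrt(LapFEscalprod(diff, diff)) << endl;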

Building the GenEO coarse space

ffddmgeneosetup(pr,Varf)

This builds the GenEO coarse space for problem pr. This will create and expose variables whose names will be prefixed by pr, see below. It is assumed that ffddmsetupPrecond has already been called for prefix pr in order to define the one level preconditioner for problem pr. The GenEO coarse space is \(Z = (R_i^T D_i V_{i,k})^{i=1,...,N}_{\lambda_{i,k} \ge \tau}\), where \(V_{i,k}\) are eigenvectors corresponding to eigenvalues \(\lambda_{i,k}\) of the following local generalized eigenvalue problem in subdomain \(i\):

\(D_i A_i D_i V_{i,k} = \lambda_{i,k} A_i^{\text{Neu}} V_{i,k}\),

where \(A_i^{\text{Neu}}\) is the local Neumann matrix of subdomain \(i\) (with Neumann boundary conditions at the subdomain interface).

In practice, this builds and factorizes the local Neumann matrices \(A_i^{\text{Neu}}\) corresponding to the abstract bilinear form given by the macro Varf (see ffddmsetupOperator for more details on how to define the abstract variational form as a macro). In the GenEO method, the abstract bilinear form Varf is assumed to be the same as the one used to define the problem pr through the previous call to ffddmsetupOperator. The local generalized eigenvalue problem is then solved in each subdomain to find the eigenvectors \(V_{i,k}\) corresponding to the largest eigenvalues \(\lambda_{i,k}\) (see pr#Z below). The number of computed eigenvectors \(\nu\) is given by ffddmnu. The eigenvectors selected to enter \(Z\) correspond to eigenvalues \(\lambda_{i,k}\) larger than \(\tau\), where the threshold parameter \(\tau\) is given by ffddmtau. If ffddmtau \(= 0\), all ffddmnu eigenvectors are selected. Finally, the coarse space operator \(E = Z^T A Z\) is assembled and factorized (see pr#E below).

defines:

  • pr#prfe#K[int][int] pr#Z array of local eigenvectors \(Z_{i,k} = D_i V_{i,k}\) obtained by solving the local generalized eigenvalue problem above in the subdomain of this mpi rank using Arpack. The number of computed eigenvectors \(\nu\) is given by ffddmnu. The eigenvectors selected to enter \(Z\) correspond to eigenvalues \(\lambda_{i,k}\) larger than \(\tau\), where the threshold parameter \(\tau\) is given by ffddmtau. If ffddmtau \(= 0\), all ffddmnu eigenvectors are selected.

  • matrix<pr#prfe#K> pr#E the coarse space operator \(E = Z^T A Z\). The matrix pr#E is assembled in parallel and is factorized by the parallel direct solver MUMPS using the first pr#prfe#prmesh#pCS ranks of the mpi communicator, with mpi rank 0 as the master process. The number of mpi processes dedicated to the coarse problem is set by the underlying mesh decomposition of problem pr, which also specifies if these mpi ranks are excluded from the spatial decomposition or not. These parameters are set by ffddmpCS and ffddmexclude when calling ffddmbuildDmesh (see ffddmbuildDmesh for more details).
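
For example, a hedged sketch continuing the Laplace example, assuming ffddmnu and ffddmtau can be set in the script before the call (x0i, ui and rhsi are as in the one level sketch above):

ffddmnu = 20;          // assumption: number of eigenvectors computed per subdomain
ffddmtau = 0.1;        // assumption: threshold for selecting eigenvectors
ffddmgeneosetup( Lap , Varf )                         // build Z and factorize E = Z^T A Z
ui[] = LapfGMRES(x0i[], rhsi, 1.e-6, 200, "right");   // now preconditioned by the two level M_2^{-1}
Lapwritesummary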

Building the coarse space from a coarse mesh

ffddmcoarsemeshsetup(pr,Thc,VarfEprec,VarfAprec)

builds the coarse space for problem pr from a coarse problem which corresponds to the discretization of a variational form on a coarser mesh Thc of \(\Omega\). This will create and expose variables whose names will be prefixed by pr, see below. It is assumed that ffddmsetupPrecond has already been called for prefix pr in order to define the one level preconditioner for problem pr. The abstract variational form for the coarse problem can differ from the original problem pr and is given by macro VarfEprec (see ffddmsetupOperator for more details on how to define the abstract variational form as a macro). For example, absorption can be added in the preconditioner for wave propagation problems, see examples for Helmholtz and Maxwell equations in the Examples section.

The coarse space \(Z\) corresponds to the interpolation operator from the coarse finite element space to the original finite element space of the problem. Thus, the coarse space operator \(E = Z^T A^{\text{Eprec}} Z\) corresponds to the matrix of the problem given by VarfEprec discretized on the coarse mesh Thc and is assembled as such.

Similarly, VarfAprec specifies the global operator involved in multiplicative coarse correction formulas. For example, \(M^{-1}_{2,\text{ADEF1}} = M^{-1}_1 (I - A^{\text{Aprec}} Q) + Q\) (where \(Q = Z E^{-1} Z^T\)). \(A^{\text{Aprec}}\) defaults to \(A\) if VarfAprec is not a valid macro (you can put null for example).

defines:

  • meshN pr#ThCoarse the coarse mesh Thc

  • fespace pr#VhCoarse the coarse finite element space of type pr#prfe#fPk defined on the coarse mesh pr#ThCoarse

  • matrix<pr#prfe#K> pr#AglobEprec the global matrix \(A^{\text{Aprec}}\) corresponding to the discretization of the variational form given by the macro VarfAprec on the global finite element space pr#prfe#Vhglob. Defined only in the sequential case. pr#AglobEprec is equal to pr#Aglobal if VarfAprec is not a valid macro.

  • matrix<pr#prfe#K> pr#aRdEprec the local ‘Dirichlet’ matrix corresponding to VarfAprec; it is the local restriction of the global operator \(A^{\text{Aprec}}\) to the subdomain, equivalent to \(A^{\text{Aprec}}_i = R_i A^{\text{Aprec}} R_i^T\) with \(A^{\text{Aprec}}\) the global matrix corresponding to the discretization of the variational form given by the macro VarfAprec on the global finite element space. Defined only if this mpi rank is not excluded from the spatial domain decomposition, i. e. prmesh#excluded = 0. pr#aRdEprec is equal to pr#aRd[mpiRank(prmesh#commddm)] if VarfAprec is not a valid macro.

  • func pr#prfe#K[int] pr#AEprec(pr#prfe#K[int] &ui) The function pr#AEprec computes the parallel matrix-vector product, i.e. the action of the global operator \(A^{\text{Aprec}}\) on the local vector \(u_i\). The computation is equivalent to \(R_i (\sum_{j=1}^N R_j^T D_j A^{\text{Aprec}}_j u_j)\) and is performed in parallel using local matrices pr#aRdEprec and the function pr#prfe#update. In the sequential case, the global matrix pr#AglobEprec is used instead.

  • matrix<pr#prfe#K> pr#ZCM the interpolation operator \(Z\) from the coarse finite element space pr#VhCoarse to the global finite element space pr#prfe#Vhglob. Defined only in the sequential case.

  • matrix<pr#prfe#K> pr#ZCMi the local interpolation operator \(Z_i\) from the coarse finite element space pr#VhCoarse to the local finite element space pr#prfe#Vhi. Defined only if this mpi rank is not excluded from the spatial domain decomposition, i. e. prmesh#excluded = 0. pr#ZCMi is used for the parallel application of \(Z\) and \(Z^T\).

  • matrix<pr#prfe#K> pr#ECM the coarse space operator \(E = Z^T A^{\text{Eprec}} Z\). The matrix pr#ECM is assembled by discretizing the variational form given by VarfEprec on the coarse mesh and factorized by the parallel direct solver MUMPS using the first pr#prfe#prmesh#pCS ranks of the mpi communicator, with mpi rank 0 as the master process. The number of mpi processes dedicated to the coarse problem is set by the underlying mesh decomposition of problem pr, which also specifies if these mpi ranks are excluded from the spatial decomposition or not. These parameters are set by ffddmpCS and ffddmexclude when calling ffddmbuildDmesh (see ffddmbuildDmesh for more details).
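
For example, a hedged sketch continuing the Laplace example, where the coarse problem is built from a coarser mesh of the same unit cube and no separate VarfAprec is given (x0i, ui and rhsi are as in the sketches above):

mesh3 Thc = cube(5, 5, 5, [x, y, z], label = LL);   // coarser global mesh of the same domain
ffddmcoarsemeshsetup( Lap , Thc , Varf , null )     // E = Z^T A Z assembled on the coarse mesh
ui[] = LapfGMRES(x0i[], rhsi, 1.e-6, 200, "right");
Lapwritesummary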

Solving the linear system

func pr#prfe#K[int] pr#fGMRES(pr#prfe#K[int]& x0i, pr#prfe#K[int]& bi, real eps, int itmax, string sp)

solves the linear system for problem pr using the flexible GMRES algorithm with preconditioner \(M^{-1}\) (corresponding to pr#PREC). Returns the local vector corresponding to the restriction of the solution to pr#prfe#Vhi. x0i and bi are local distributed vectors corresponding respectively to the initial guess and the right-hand side (see ffddmbuildrhs). eps is the stopping criterion in terms of the relative decrease in residual norm. If eps \(< 0\), the residual norm itself is used instead. itmax sets the maximum number of iterations. sp selects between the "left" or "right" preconditioning variants: left preconditioned GMRES solves \(M^{-1} A x = M^{-1} b\), while right preconditioned GMRES solves \(A M^{-1} y = b\) for \(y\), with \(x = M^{-1} y\).
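
For example (a sketch reusing the Laplace prefixes and the vectors x0i, ui and rhsi from the sketches above):

// right preconditioned fGMRES, relative tolerance 1e-6, at most 200 iterations
ui[] = LapfGMRES(x0i[], rhsi, 1.e-6, 200, "right");
// left preconditioned variant with eps < 0: the residual norm itself is used as the stopping criterion
ui[] = LapfGMRES(x0i[], rhsi, -1.e-8, 200, "left");
ffddmplot(LapFE, ui, "u");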

Using HPDDM within ffddm

ffddm allows you to use HPDDM to solve your problem, effectively replacing the ffddm implementation of all parallel linear algebra computations. ffddm can then be viewed as a finite element interface for HPDDM.

You can use HPDDM features unavailable in ffddm such as advanced Krylov subspace methods implementing block and recycling techniques.

To switch to HPDDM, simply define the macro pr#withhpddm before using ffddmsetupOperator. You can then pass HPDDM options with command-line arguments or directly to the underlying HPDDM operator pr#hpddmOP. Options need to be prefixed by the operator prefix:

macro PBwithhpddm()1 // EOM
ffddmsetupOperator( PB , FE , Varf )
set(PBhpddmOP,sparams="-hpddm_PB_krylov_method gcrodr -hpddm_PB_recycle 10");

You can also choose to replace only the Krylov solver, by defining the macro pr#withhpddmkrylov before using ffddmsetupOperator. In this case, a call to pr#fGMRES will use the HPDDM Krylov solver, with ffddm providing the operator and preconditioner through pr#A and pr#PREC. You can then pass HPDDM options to the Krylov solver through command-line arguments:

macro PBwithhpddmkrylov()1 // EOM
ffddmsetupOperator( PB , FE , Varf )

For example, using restarted GCRO-DR(40) and recycling 10 Ritz vectors at each restart:

ff-mpirun -np 4 test.edp -wg -hpddm_krylov_method gcrodr -hpddm_recycle 10 -ffddm_gmres_restart 40

An example can be found in Helmholtz-2d-HPDDM-BGMRES.edp, see the Examples section.

Advanced use

Interpolation between two distributed finite element spaces

The parallel interpolation of a distributed finite element function to another distributed finite element space can be computed using the prfe#transferfromVhi macro. Internally, it uses the transfer macro from the macro_ddm.idp script. The macro is prefixed by the source finite element prefix prfe and is used as follows:

prfe#transferfromVhi(us,Vht,Pkt,rest)

where us is the distributed source FE function defined on prfe#Vhi, Vht is the target local finite element space, Pkt is the approximation space corresponding to Vht, and rest is the target local FE function defined on Vht. You can find an example below:

macro dimension()2//

include "ffddm.idp"

mesh Ths = square(10,10);

mesh Tht = square(30,20);

ffddmbuildDmesh(Ms,Ths,mpiCommWorld)

ffddmbuildDmesh(Mt,Tht,mpiCommWorld)

func Pk = [P2,P2];

macro def(u)[u,u#2]//
macro init(u)[u,u]//
ffddmbuildDfespace(FEs,Ms,real,def,init,Pk)
ffddmbuildDfespace(FEt,Mt,real,def,init,Pk)

FEsVhi def(us) = [cos(x^2+y),sin(x^2+y)];

FEtVhi def(ut);

FEstransferfromVhi(us,FEtVhi,Pk,ut)

ffddmplot(FEs,us,"u source");
ffddmplot(FEt,ut,"u target");

Local finite element spaces for non Lagrange finite elements

For Lagrange finite elements, the partition of unity \((D_i)_{i=1,...,N}\) (see prfe#Dk and prfe#Dih) is built by interpolating the local P1 partition of unity function onto the components of the Pk finite element space prfe#Vhi. For non Lagrange finite element spaces, such as Raviart–Thomas or Nédélec edge elements, the definition of the degrees of freedom can be more involved, and interpolating the P1 partition of unity functions directly is inappropriate. The idea is then to use a “pseudo” finite element Pkpart derived from Pk which is suitable for interpolating the P1 partition of unity, in the sense that it will produce a partition of unity for Pk.

For example, for first-order Nédélec edge elements (Edge03d), whose degrees of freedom are the circulations along the edges, we define the “pseudo” finite element Edge03ds0 which can be seen as a scalar Lagrange counterpart: the numbering of the degrees of freedom is the same, but they correspond to the value at the edge midpoints.

For Lagrange finite elements, the distributed finite element spaces are built using ffddmbuildDfespace. Here you must use ffddmbuildDfespaceEdge, which builds the distributed finite element space using a “pseudo” finite element to build the partition of unity:

ffddmbuildDfespaceEdge(prfe,prmesh,scalar,def,init,Pk,defpart,initpart,Pkpart)

where macros defpart and initpart specify how to define and interpolate a function in the ‘pseudo’ finite element space Pkpart, similar to def and init for Pk.
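
A hedged sketch of what such a call could look like for first-order Nédélec edge elements, assuming the element types Edge03d and Edge03ds0 are available (loaded through the appropriate plugin) and using illustrative prefixes MFE and MMesh; the complex scalar type is only an example:

macro def(u) [u, u#y, u#z]// EOM       // vectorial edge finite element function
macro init(u) [u, u, u]// EOM
macro defpart(u) u// EOM               // scalar 'pseudo' element, used only for the partition of unity
macro initpart(u) u// EOM
ffddmbuildDfespaceEdge(MFE, MMesh, complex, def, init, Edge03d, defpart, initpart, Edge03ds0)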

An example with first-order Nédélec edge elements (Edge03d + Edge03ds0) for Maxwell equations can be found in Maxwell-3d-simple.edp, see the Examples section.

Inexact coarse solves for two level methods

We have seen in the Two level preconditioners section that two level methods produce a ‘coarse space operator’ \(E\) that needs to be inverted at each iteration. By default the coarse space operator matrix is factorized by the direct solver MUMPS. This can become a bottleneck and hinder scalability for large problems, where \(E\) can become too large to be factorized efficiently. To remedy this, we can instead opt to use an iterative method to solve the coarse problem at each iteration. Moreover, in order to retain robustness, a DD preconditioner can be used to solve the inner coarse problem more efficiently.

Coarse mesh and inexact coarse solve

When the coarse problem comes from a coarse mesh discretization, a natural way to perform the inexact coarse solve is to use a one level domain decomposition method on the coarse problem, with the same subdomain partitioning for the coarse and fine meshes. This means that each processor is associated with one spatial subdomain and hosts the two local (nested) coarse and fine submeshes corresponding to this subdomain, as well as the corresponding local matrices for the two discretizations. This natural choice offers interesting benefits:

  • We naturally recover a load-balanced parallel implementation, provided that the initial partitioning is balanced.

  • The communication pattern between neighboring subdomains is the same for the coarse and fine discretizations.

  • The assembly and the application of the interpolation operator \(Z\) (and \(Z^T\)) between the fine and the coarse spaces can be computed locally in each subdomain and require no communication.

In ffddm, the first step is to build the two nested mesh decompositions using ffddmbuildDmeshNested:

ffddmbuildDmeshNested(prmesh,Thc,s,comm)

decomposes the coarse mesh Thc into overlapping submeshes and creates the fine decomposition by locally refining submeshes by a factor of s, i.e. splitting each mesh element into \(s^d\) elements, \(s \geq 1\). This will create and expose variables corresponding to both decompositions, prefixed by prmesh for the fine mesh and by prmesh#Coarse for the coarse mesh (see ffddmbuildDmesh). It also sets the integer variable prmesh#binexactCS to 1, which specifies that any two level method defined on mesh prefix prmesh will use inexact coarse solves.

The distributed finite element spaces, operators and preconditioners can then be defined for both decompositions. Here is an example where the coarse problem is solved using a one level method:

ffddmbuildDmeshNested(M, Thc, 3, mpiCommWorld)

ffddmbuildDfespace(FE, M, real, def, init, Pk)
ffddmbuildDfespace(FECoarse, MCoarse, real, def, init, Pk)

// coarse operator (Varf of E):
ffddmsetupOperator(PBCoarse, FECoarse, VarfEprec)
// one level preconditioner for the coarse problem:
ffddmsetupPrecond(PBCoarse, VarfPrecC)

// operator for the fine problem:
ffddmsetupOperator(PB, FE, Varf)
// one level preconditioner for the fine problem:
ffddmsetupPrecond(PB, VarfPrec)

// add the second level:
ffddmcoarsemeshsetup(PB, Thc, VarfEprec, null)

[...]
u[] = PBfGMRES(x0, rhs, 1.e-6, 200, "right");

Remarks:

  • Note that the different prefixes need to match: prefixes for the coarse decomposition have to be those of the fine decomposition, appended with Coarse.

  • The operator and preconditioner for the coarse problem have to be defined before those of the fine problem, because the pr#Q function is actually defined by ffddmsetupPrecond and involves a call to pr#CoarsefGMRES (which is defined by ffddmsetupPrecond for the coarse problem) for the iterative solution of the coarse problem if pr#prfe#prmesh#binexactCS \(\neq 0\).

  • In this case, ffddmcoarsemeshsetup does not use Thc or VarfEprec and only builds the local interpolation matrices between fine and coarse local finite element spaces pr#prfe#Vhi and pr#prfe#CoarseVhi to be able to apply \(Z\) and \(Z^T\).

  • The GMRES tolerance for the inner solution of the coarse problem is set by ffddminexactCStol and is equal to 0.1 by default.

In practice, these methods can give good results for wave propagation problems, where the addition of artificial absorption in the preconditioner helps with the convergence of the one level method for the inner solution of the coarse problem. You can find an example for Maxwell equations in Maxwell_Cobracavity.edp, see the Examples section. More details can be found in

M. Bonazzoli, V. Dolean, I. G. Graham, E. A. Spence, P.-H. Tournier. Domain decomposition preconditioning for the high-frequency time-harmonic Maxwell equations with absorption. Mathematics of Computation, 2019. DOI: https://doi.org/10.1090/mcom/3447
