Domain Decomposition (DD)
When a three-dimensional problem is large, that is, too large to be handled efficiently by a single process, it is necessary to distribute data among several processors, especially for solving linear systems. A natural way to do so is via domain decomposition.
Mesh Decomposition
The starting point is a collection of \(N\) sub-meshes \((Th_i)_{i=1}^N\) that together form a global mesh \(Th\):

\[Th = \bigcup_{i=1}^N Th_i\,.\]
These meshes may be overlapping or not. This decomposition induces a natural decomposition of the global finite element space \(Vh\) on \(Th\) into \(N\) local finite element spaces \((Vh_i)_{i=1}^N\), each of them defined on \(Th_i\).
Note: By global, we mean that the corresponding structure can be referred to in the code (most often only) through its local values. In computer science terms, it corresponds to distributed data where each piece of data is stored by an MPI process.
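As an illustration, a minimal ffddm sketch of this step could look as follows (assuming a 2D scalar problem; the prefixes M and FE are arbitrary names chosen for this example):

macro dimension 2// EOM      // ffddm requires the space dimension to be defined first
include "ffddm.idp"          // load the ffddm framework

mesh Th = square(50, 50);    // global mesh (here a simple structured mesh of the unit square)

ffddmbuildDmesh(M, Th, mpiCommWorld)   // decompose Th into N sub-meshes (overlapping by default), one per MPI process

macro def(u) u // EOM        // scalar unknown
macro init(u) u // EOM
ffddmbuildDfespace(FE, M, real, def, init, P1)   // local P1 finite element spaces Vh_i on each Th_i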
Distributed Linear Algebra
For a given finite element space \(Vh\), the domain decomposition induces a natural decomposition of the set of the global degrees of freedom (d.o.f.) of \(Vh\) into the \(N\) subsets of d.o.f.'s \(({\mathcal N}_i)_{i=1}^N\), each associated with the local finite element space \(Vh_i\). We thus have

\[{\mathcal N} = \bigcup_{i=1}^N {\mathcal N}_i\,,\]

but with duplication of some of the d.o.f.'s.
Associated with this decomposition of the set of d.o.f.'s \({\mathcal N}\), a distributed vector is a collection of local vectors \(({\mathbf V_i})_{1\le i\le N}\) such that the values on the duplicated d.o.f.'s are the same.
Note: In mathematical terms, this can be described as follows for a real valued problem. Let \(R_i\) be the restriction operator from \(\R^{\#{\mathcal N}}\) to \(\R^{\#{\mathcal N}_i}\), where \(\#{\mathcal N}_i\) denotes the number of elements of \({\mathcal N}_i\). A collection of local vectors \(({\mathbf V}_i)_{1\le i\le N}\in \Pi_{i=1}^N \R^{\#{\mathcal N}_i}\) is a distributed vector iff there exists a global vector \({\mathbf V}\in\R^{\#{\mathcal N}}\) such that for all \(1\le i\le N\) we have:

\[{\mathbf V}_i = R_i\,{\mathbf V}\,.\]
We will also say that the collection of local vectors \(({\mathbf V}_i)_{1\le i\le N}\) is consistent. For a complex valued problem, simply replace \(\R\) with \(\C\).
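For instance, take \(\#{\mathcal N}=3\) global d.o.f.'s and \(N=2\) subdomains with \({\mathcal N}_1=\{1,2\}\) and \({\mathcal N}_2=\{2,3\}\), so that d.o.f. 2 is duplicated. Then

\[R_1 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\end{pmatrix},\qquad R_2 = \begin{pmatrix} 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix},\]

and the collection \({\mathbf V}_1=(a,b)^T\), \({\mathbf V}_2=(b,c)^T\) is consistent since it comes from the global vector \({\mathbf V}=(a,b,c)^T\), whereas \({\mathbf V}_1=(a,b)^T\), \({\mathbf V}_2=(b',c)^T\) with \(b'\ne b\) is not.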
Partition of Unity Matrices (POUM)
Let \((D_i)_{1\le i \le N}\) be square diagonal matrices of size \(\#{\mathcal N}_i\) which form a partition of unity in the sense that:

\[\sum_{i=1}^N R_i^T\,D_i\,R_i = Id_{\R^{\#{\mathcal N}}}\,.\]
For instance, if a degree of freedom is shared by \(k\) subdomains, setting the corresponding diagonal entry of each matrix \(D_i\) to \(1/k\) yields partition of unity matrices. The matrices \(R_i\) and \(D_i\) are at the heart of distributed linear algebra.
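In the example above with \({\mathcal N}_1=\{1,2\}\) and \({\mathcal N}_2=\{2,3\}\), d.o.f. 2 is shared by \(k=2\) subdomains, so

\[D_1 = \begin{pmatrix} 1 & 0\\ 0 & 1/2\end{pmatrix},\qquad D_2 = \begin{pmatrix} 1/2 & 0\\ 0 & 1\end{pmatrix},\]

and one checks that \(R_1^T\,D_1\,R_1 + R_2^T\,D_2\,R_2 = Id_{\R^{3}}\).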
Distributed scalar product
For two global vectors \({\mathbf U}\) and \({\mathbf V}\) of size \(\#{\mathcal N}\), the formula for the scalar product \({\mathbf V}^T\,{\mathbf U}=({\mathbf U},\,{\mathbf V})\) in terms of their distributed vector counterparts makes use of the partition of unity matrices \((D_i)_{1\le i \le N}\) introduced above:

\[({\mathbf U},\,{\mathbf V}) = \sum_{i=1}^N {\mathbf V}_i^T\,D_i\,{\mathbf U}_i\,.\]
Local scalar products are performed concurrently.
Thus, the implementation is parallel except for the final sum, which corresponds to an MPI_Reduce call across the \(N\) MPI processes.
Note also that the implementation relies on the knowledge of a partition of unity, so that the FreeFEM syntax is dscalprod(Di,u,v), or equivalently myFEprefix#scalprod(u,v), where myFEprefix is a user defined prefix for the finite element space decomposition; see the ffddm documentation.
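For instance, with the FE prefix of the sketch above, a distributed scalar product could be computed as follows (a sketch; see the ffddm documentation for the exact argument types accepted by the macro):

FEVhi def(ui), def(vi);           // local finite element functions on each subdomain
ui = x + y;                       // interpolating global functions yields consistent local vectors
vi = 1.;
real s = FEscalprod(ui[], vi[]);  // global scalar product (U, V) = sum_i V_i^T D_i U_i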
Update
From a collection of local vectors \(({\mathbf U}_i)_{1\le i \le N}\), it is possible to ensure consistency of the duplicated data by modifying the distributed vector \(({\mathbf U}_i)_{1\le i \le N}\) via a call to the function myFEprefix#update(Ui, true), where myFEprefix is the user defined prefix that refers to the finite element space decomposition. This function performs the following operation for all \(1\le i \le N\):

\[{\mathbf U}_i \leftarrow R_i \left( \sum_{j=1}^N R_j^T\, D_j\, {\mathbf U}_j \right).\]
Note: The implementation corresponds to

\[{\mathbf U}_i \leftarrow D_i\,{\mathbf U}_i + \sum_{j\in \mathcal{O}(i)} R_i\,R_j^T\,D_j\,{\mathbf U}_j\,,\]

where \(\mathcal{O}(i)\) is the set of neighbors of subdomain \(i\). Therefore, the update is computed in three steps:
concurrent computing of \(D_j {\mathbf U}_j\) for all \(1\le j\le N\);
neighbor-to-neighbor MPI communications from subdomain \(j\) to subdomain \(i\) (\(R_i\,R_j^T\));
concurrent sum of neighbor contributions.
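As an illustration with the FE prefix used above (a sketch):

FEVhi def(ui);
ui[] = 1.;                  // local data, possibly inconsistent on the duplicated d.o.f.'s
FEupdate(ui[], true);       // ui_i <- R_i ( sum_j R_j^T D_j ui_j ): duplicated d.o.f.'s now hold the same value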
Distributed Matrix and Vector resulting from a variational formulation
The discretization of a variational formulation on the global mesh \(Th\) yields a global matrix \(A\) and a global right hand side \(\mathbf{RHS}\).
Thanks to the sparsity of finite element matrices for partial differential equations and thanks to the overlap between subdomains, the knowledge of the local matrix \(R_i A R_i^T\) on each subdomain \(1\le i\le N\) is sufficient to perform the matrix-vector product \(A\times \mathbf{U}\) for any global vector \(\mathbf{U}\).
Once the problem has been set up by a call to ffddmsetupOperator(myprefix, myFEprefix, myVarf), the matrix-vector product is performed by calling the function myprefix#A(Ui), where myprefix is a user defined prefix that refers to the problem at hand, which itself implicitly refers to the triplet (domain decomposition, finite element, variational formulation).
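Continuing the sketch above, where PB is an arbitrary problem prefix and Varf a user-defined variational formulation macro (here a Laplace model problem is used as an example):

macro grad(u) [dx(u), dy(u)] // EOM
macro Varf(varfName, meshName, VhName)
    varf varfName(u, v) = int2d(meshName)(grad(u)'*grad(v)) + int2d(meshName)(v) + on(1, u = 0); // EOM

ffddmsetupOperator(PB, FE, Varf)   // assembles the local matrices A_i = R_i A R_i^T

FEVhi def(ui), def(Aui);
ui = 1.;
Aui[] = PBA(ui[]);                 // distributed matrix-vector product: local part of A*U in each subdomain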
See more on problem definition in this documentation and more on distributed linear algebra in chapter 8 of “An Introduction to Domain Decomposition Methods: algorithms, theory and parallel implementation” SIAM 2015.
Distributed Linear Solvers
In many cases, we are interested in the solution of the problem in terms of the vector of d.o.f.'s \(\mathbf{X}\) that satisfies:

\[A\,\mathbf{X} = \mathbf{RHS}\,.\]
ffddm offers two parallel solvers: direct factorization and iterative solvers preconditioned by Schwarz domain decomposition methods.
Distributed Direct Solvers
In order to benefit from the sparsity of the matrix arising from a finite element discretization of a partial differential equation, a variant of Gaussian elimination, the frontal method, was developed; it automatically avoids a large number of operations involving zero terms.
A frontal solver builds an \(LU\) or Cholesky decomposition of a sparse matrix, given as the assembly of element matrices, by eliminating equations only on a subset of elements at a time.
This subset is called the front and it is essentially the transition region between the part of the system already finished and the part not touched yet.
These methods are essentially sequential since the unknowns are processed one after another, or one front after another. In order to benefit from multicore processors, the multifrontal method improves on the frontal solver by using several independent fronts at the same time.
The fronts can be worked on by different processors, which enables parallel computing. ffddm provides an interface to the parallel sparse direct solver MUMPS. These methods have the advantage of being very robust and of having a predictable cost. The main drawback is the memory requirement, which can be prohibitive, especially for three-dimensional problems.
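With ffddm, once the operator has been set up as in the sketch above, a direct solve could read (a sketch; ffddmbuildrhs assembles the distributed right hand side from the same variational formulation):

real[int] rhsi(0);
ffddmbuildrhs(PB, Varf, rhsi)   // distributed right hand side RHS_i
FEVhi def(ui);
ui[] = PBdirectsolve(rhsi);     // parallel LU/Cholesky factorization and solve through the MUMPS interface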
Schwarz methods
These methods are part of the large family of preconditioned iterative solvers. When solving the equation \(A\, \mathbf{X} = \mathbf{RHS}\), a preconditioner is a linear operator that approximates the inverse of \(A\) and whose application (the associated matrix-vector product) is much cheaper than solving the original linear system. It accelerates the solution of the latter with Krylov-type methods such as the conjugate gradient (in the symmetric positive definite case), or GMRES and BiCGSTAB in the general case. Two options are possible.
Left preconditioning: the preconditioner is applied to the left of the equation,

\[M^{-1}\,A\,\mathbf{X} = M^{-1}\,\mathbf{RHS}\,,\]

and the Krylov method is applied to the left preconditioned system, with a residual that is preconditioner dependent.

Right preconditioning: the preconditioner is inserted on the right of the operator,

\[A\,M^{-1}\,\mathbf{Y} = \mathbf{RHS}\,,\qquad \mathbf{X} = M^{-1}\,\mathbf{Y}\,,\]

and the Krylov method is applied to the right preconditioned system, with a residual that is preconditioner independent.
In both cases, if the preconditioner is efficient the number of iterations to get a converged solution is much smaller than the number of iterations of the Krylov method applied to the original equation \(A\, \mathbf{X} = \mathbf{RHS}\). Although right preconditioning seems more intricate, it is much safer to use since the convergence is checked on a residual that does not depend on the preconditioner.
In the sequel, we consider the solution of the equation \(A\, \mathbf{X} = \mathbf{RHS}\) preconditioned by domain decomposition methods and with a flexible GMRES Krylov method which is thus necessarily right preconditioned.
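In ffddm, and continuing the sketches above, this translates into a one-level preconditioner setup followed by a flexible GMRES solve (the tolerance and maximum number of iterations below are arbitrary example values):

ffddmsetupPrecond(PB, Varf)      // builds the one-level Schwarz preconditioner (RAS by default)
real[int] x0(rhsi.n);
x0 = 0;                          // initial guess
ui[] = PBfGMRES(x0, rhsi, 1.e-6, 200, "right");   // right-preconditioned flexible GMRES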
Restricted Additive Schwarz (RAS)
The RAS preconditioner reads:

\[M^{-1}_{RAS} := \sum_{j=1}^N R_j^T\,D_j\,(R_j\,A\,R_j^T)^{-1}\,R_j\,,\]

where for each subdomain \(j\) the restriction matrix \(R_j\) and the partition of unity matrix \(D_j\) have been introduced above. Note that in the original ASM (additive Schwarz method) preconditioner the partition of unity is dropped. The application of the operator \(M^{-1}_{RAS}\) to a global right hand side \(\mathbf{RHS}\) is detailed below. Recall that this global vector is distributed among processes via the local vectors \((\mathbf{RHS}_i)_{i=1}^N\). Let \(A_{j}\) denote the local matrix \((R_j\, A\,R_j^T)\). The local vector in subdomain \(i\) resulting from the matrix-vector product \(M^{-1}_{RAS}\, \mathbf{RHS}\) is obtained by computing:

\[R_i\,M^{-1}_{RAS}\,\mathbf{RHS} = \sum_{j=1}^N R_i\,R_j^T\,D_j\,A_j^{-1}\,\mathbf{RHS}_j\,.\]
This task is performed by first solving concurrently on all subdomains a linear system for \({\mathbf Y}_j\) for all \(1\le j \le N\):

\[A_j\,{\mathbf Y}_j = \mathbf{RHS}_j\,.\]
Each local vector \({\mathbf Y}_j\) is weighted by the partition of unity matrix \(D_j\). Then data transfers between neighboring subdomains implement the \(R_i\,R_j^T\,D_j\,{\mathbf Y}_j\) formula. The contributions from neighboring subdomains are summed locally. This pattern is very similar to that of the update procedure.
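In ffddm, the type of one-level preconditioner is selected through the global parameter ffddmprecond before the preconditioner is built; for RAS, which is also the default, this reads:

ffddmprecond = "ras";         // one-level preconditioner: restricted additive Schwarz
ffddmsetupPrecond(PB, Varf)   // builds M^-1_RAS from the local matrices A_j and the partition of unity D_j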
Optimized Restricted Additive Schwarz (ORAS)
The ORAS preconditioner may be seen as a variant of the RAS preconditioner. It reads:

\[M^{-1}_{ORAS} := \sum_{j=1}^N R_j^T\,D_j\,B_j^{-1}\,R_j\,,\]

where \(B_j\) are local matrices of size \(\#{\mathcal N}_j \times \#{\mathcal N}_j\) for \(1\le j \le N\). This variant is very useful when dealing with wave propagation phenomena such as Helmholtz problems in acoustics or the Maxwell system in the frequency domain for electromagnetism. Defining \(B_j\) as the discretization of the physical equation with impedance conditions on the boundary of the subdomain has been proved to be a good choice.
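As a sketch for a 2D Helmholtz problem (assuming the distributed finite element space was built with complex scalars, i.e. ffddmbuildDfespace(FE, M, complex, def, init, P1)), ORAS amounts to passing to ffddmsetupPrecond a variational formulation that carries the impedance term on the subdomain boundary; labinterface below is a placeholder for the label of that boundary, not an actual ffddm identifier:

real k = 10.;                        // wavenumber (example value)
macro VarfOpt(varfName, meshName, VhName)
    varf varfName(u, v) = int2d(meshName)(-k^2*u*v + grad(u)'*grad(v))
                        + int1d(meshName, labinterface)(1i*k*u*v)   // impedance (Robin) condition on the interface
                        + on(1, u = 0); // EOM

ffddmprecond = "oras";
ffddmsetupPrecond(PB, VarfOpt)       // the local matrices B_j now discretize Helmholtz with impedance conditions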
Two-level methods
The RAS and ORAS methods are called one-level methods in the sense that subdomains only interact with their direct neighbors. For some problems such as Darcy problems or static elasticity problems, and when the number of subdomains is large, such one-level methods may suffer from slow convergence. The fix is to add to the preconditioner an auxiliary coarse problem that couples all subdomains at each iteration and is inexpensive to calculate.
In mathematical terms, we first choose a full rank rectangular matrix \(Z\in\R^{\#{\mathcal N}\times N_C}\), where \(N_C \ll \#{\mathcal N}\) denotes the dimension of the coarse space spanned by the columns of \(Z\). We also pick a coarse matrix \(A_C\in \R^{N_C\times N_C}\). A generic one-level preconditioner \(M_1^{-1}\) is enriched by a solve on the coarse space. The simplest correction formula is additive:

\[M^{-1}_2 := M^{-1}_1 + Z\,A_C^{-1}\,Z^T\,.\]
Other correction formulas are given in the documentation. We consider two ways to build \(Z\), and thus the coarse space and the coarse problem \(A_C\); see Coarse Mesh and GenEO below.
Coarse Mesh
A first possibility is to discretize the problem on a coarse mesh, following the same principle as multigrid methods. For 3-D problems, coarsening the mesh size by a factor of 2 yields a coarse problem that is \(2^3=8\) times smaller than the original one, and hence easier to solve by a direct method. Then, \(Z\) is the interpolation matrix from the coarse finite element space to the fine one.
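In ffddm, this construction is provided by the coarse mesh setup macro; a sketch, assuming a coarse mesh Thc of the same domain and reusing the Varf macro for the coarse problem (see the ffddm documentation for the exact argument list):

mesh Thc = square(10, 10);                  // coarse mesh of the same domain (example size)
ffddmcoarsemeshsetup(PB, Thc, Varf, null)   // builds the interpolation Z and the coarse problem A_C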
GenEO
For highly heterogeneous or anisotropic problems, two-level methods based on coarse meshes might fail and a more sophisticated construction must be used. A provably robust coarse space, called GenEO, is built by first solving the following local generalized eigenvalue problem in parallel for each subdomain \(1\le i\le N\), where \(A_i^{\text{Neu}}\) denotes the local matrix resulting from the variational formulation restricted to subdomain \(i\):

\[D_i\,A_i\,D_i\,\mathbf{V}_{i,k} = \lambda_{i,k}\,A_i^{\text{Neu}}\,\mathbf{V}_{i,k}\,.\]
The eigenvectors selected to enter the coarse space correspond to eigenvalues \(\lambda_{i,k} \ge \tau\), where the threshold parameter \(\tau\) is user-defined. The precise formulas are given in this documentation. From a mathematical point of view, it has been proved that, for a symmetric positive definite matrix \(A\), the spectrum of the operator preconditioned by the two-level method with a GenEO coarse space lies in the interval \(\left[\displaystyle \frac{1}{1+k_1\,\tau} ,\, k_0 \right]\), where \(k_1\) is the maximum multiplicity of the subdomain intersections and \(k_0\) the maximum number of neighbors of a subdomain.
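In ffddm, building the GenEO coarse space requires a single additional call after the one-level setup; a sketch (ffddmnu, the number of local eigenvectors computed per subdomain, is assumed here to be the relevant tuning parameter, see the ffddm parameters documentation):

ffddmnu = 20;                 // number of local GenEO eigenvectors computed per subdomain (example value)
ffddmgeneosetup(PB, Varf)     // solves the local generalized eigenproblems and assembles Z and A_C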
Note: A heuristic that justifies this construction is as follows. We first introduce the Additive Schwarz method (ASM), which can be seen as a symmetrized variant of the RAS preconditioner:

\[M^{-1}_{ASM} := \sum_{j=1}^N R_j^T\,A_j^{-1}\,R_j\,.\]
It can be proved that the lower bound for the eigenvalues of \(M_{ASM}^{-1}\,A\) is close to zero (which is bad for convergence) whereas the upper bound depends only on the number of neighbors of a subdomain (which is good for convergence).
Second, we also introduce the following preconditioner \(M^{-1}_{NN}\):

\[M^{-1}_{NN} := \sum_{j=1}^N R_j^T\,D_j\,(A_j^{\text{Neu}})^{-1}\,D_j\,R_j\,.\]
We have a very good lower bound for the preconditioned operator \(M^{-1}_{NN}\,A\) that does not depend on the number of subdomains but only on the maximum multiplicity of intersections \(k_1\) (which is good for convergence). But the upper bound for this preconditioner is very large (which is bad for convergence).
Now, if we compare formulas for \(M^{-1}_{NN}\) and \(M^{-1}_{ASM}\), we may suspect that vectors \(\mathbf{V}_{ik}\) for which \(D_i\, (A_i^{\text{Neu}})^{-1}\,D_i\,\mathbf{V}_{ik}\) and \(A_{i}^{-1}\,\mathbf{V}_{ik}\) have very different values are responsible for the slow convergence and should contribute to the coarse space. This is a way to interpret the above generalized eigenvalue problem which controls the lower bound of the two-level preconditioned system.