# Domain Decomposition (DD)#

When the size of a three dimensional problem is large (whatever it means), it is necessary to distribute data among several processors especially for solving linear systems. A natural way is to do it via domain decomposition.

## Mesh Decomposition#

The starting point is a collection of N sub-meshes (Th_i)_{i=1}^N that together form a global mesh

This induces a natural decomposition of the global finite element space Vh on Th into N local finite element spaces (Vh_i)_{i=1}^N each of them defined on Th_i.

**Note** By global, we mean that the corresponding structure can be refered to in the code (most often only) by its local values. In computer science term, it corresponds to a distributed data where each piece of data is stored by a MPI process.

## Distributed Linear Algebra#

The domain decomposition induces a natural decomposition of the set of the global degrees of freedom (d.o.f.) {\mathcal N} of the finite element space Vh into the N subsets of d.o.f.'s ({\mathcal N})_{i=1}^N each associated with the local finite element space Vh_i. We have thus

but with duplications of some of the d.o.f.'s.

Associated with this decomposition of the set of d.o.f.'s {\mathcal N}, a *distributed vector* is a collection of local vectors ({\mathbf V_i}_{1\le i\le N}) so that the values on the duplicated d.o.f.'s are the same.

**Note** In mathematical terms, it can be described as follows for a real valued problem. For a real value problem, simply replace \R with \C. Let R_i be the restriction operator from \R^{\#{\mathcal N}} to \R^{\#{\mathcal N}_i}, where \#{\mathcal N}_i denotes the number of elements of {\mathcal N}_i. A collection of local vectors ({\mathbf V}_i)_{1\le i\le N}\in \Pi_{i=1}^N \R^{\#{\mathcal N}_i} is a distributed vector iff there exists a global vector {\mathbf V}\in\R^{\#{\mathcal N}} such that for all subset 1\le i\le N, we have:

We will also say that the collection of local vectors ({\mathbf V}_i)_{1\le i\le N} is consistent.

## Partition of Unity Matrices (POUM)#

Let (D_i)_{1\le i \le N} be square diagonal matrices of size \#{\mathcal N}_i which form a partition of unity in the sense that:

For instance if a degree of freedom is shared by k subdomains defining the corresponding entry of the diagonal matrix D to be 1/k yields partition of unity matrices. The matrices R_i and D_i are the heart of distributed linear algebra.

### Distributed scalar product#

For two global vectors {\mathbf U} and {\mathbf V} of size \#{\mathcal N}, the formula for the scalar product {\mathbf V}^T\,{\mathbf U}=({\mathbf U},\,{\mathbf V}) in terms of their distributed vector counterparts is:

Local scalar products are performed concurrently. Thus, the implementation is parallel except for the sum which corresponds to a MPI_Reduce call across the N MPI processes. Note also that the implementation relies on the knowledge of a partition of unity so that the FreeFem++ syntax is `dscalprod(Di,u,v)`

or equivalently `pr#scalprod(u,v)`

where `pr`

is a user defined prefix that refers to the domain decomposition and thus implicitely also to the partition of unity.

### Update#

From a collection of local vectors ({\mathbf U}_i)_{1\le i \le N}, it is possible ensure consistency of the duplicated data and thus creating a distributed vector ({\mathbf V}_i)_{1\le i \le N} by calling the function `pr#update(Ui, TRUE)`

where `pr`

is a user defined prefix that refers to the domain decomposition. This function performs the following operation for all 1\le i \le N:

**Note** The implementation corresponds to

where \mathcal{O}(i) is the set of neighbors of subdomain i. Therefore, the matrix vector product is computed in three steps: - concurrent computing of D_j {\mathbf U}_j for all 1\le j\le N; - neighbor to neighbor MPI-communications (R_i\,R_j^T) ; - concurrent sum of neighbor contributions.

## Distributed Matrix and Vector resulting from a variational formulation#

The discretization of a variational formulation on the global mesh Th yields a global matrix A and a global right hand side \mathbf{RHS}. Thanks to the sparsity of finite element matrices for partial differential equations and thanks to the overlap between subdomains, the knowledge of the local matrix R_i A R_i^T on each subdomain 1\le i\le N is sufficient to perform the matrix-vector product A\times \mathbf{U} for any global vector \mathbf{U}. Once the problem has been set up by a call to `ffddmsetupOperator(myprefix,myFEprefix,myVarf)`

, the matrix-vector product is performed by calling the function `pr#A(Ui)`

where `pr`

is a user defined prefix that refers to the problem at hand which itself implicitly refers to the triplet (domain decomposition, finite element, variational formulation). See more on problem defintion in this documentation and more on distributed linear algebra in chapter 8 of "An Introduction to Domain Decomposition Methods: algorithms, theory and parallel implementation" SIAM 2015.

## Distributed Linear Solvers#

In many cases, we are interested in the solution of the problem in terms of the vector of d.o.f.'s \mathbf{X} that satisfies:

`ffddm`

offers two parallel solvers: direct factorization and Schwarz domain decomposition methods.

### Distributed Direct Solvers#

In order to benefit from the sparsity of the matrix
arising from a finite element discretization of a partial differential
equation, a variant of Gauss elimination, the frontal method, that
automatically avoids a large number of operations involving zero terms
was developed.
A frontal solver builds a LU or Cholesky decomposition of a sparse matrix
given as the assembly of element matrices by eliminating equations
only on a subset of elements at a time. This subset is called the *front* and it is essentially the transition region between the part of the system already finished and the part not touched yet. These methods are basically sequential since the unknowns are processed the one after another or one front after another. In order to benefit from multicore processors, a multifrontal solver
is an improvement of the frontal solver that uses several independent fronts at the same time. The fronts can be worked on by different processors, which enables parallel computing. `ffddm`

provides an interface to the parallel sparse direct solver MUMPS.

### Schwarz methods#

We consider the solve of the equation A\, \mathbf{X} = \mathbf{RHS} by a flexible GMRES method preconditioned by domain decomposition methods.

#### Restricted Additive Schwarz (RAS)#

The RAS preconditioner reads:

Let A_{i} denote the local matrix (R_i\, A\,R_i^T). The application of the operator M^{-1}_{RAS} to a distributed right hand side (\mathbf{RHS}_i)_{i=1}^N consists in computing:

This task is performed by first solving concurrently on all subdomains a linear system for {\mathbf Y}_j for all 1\le j \le N:

Each local vector {\mathbf Y}_j is weighted by the partition of unity matrix D_j. Then data transfers between neighboring subdomains implement the R_i\,R_j^T\,D_j\,{\mathbf Y}_j formula. The contribution from neighboring subdomains are summed locally. This pattern is very similar to that of the update procedure.

#### Optimized Restricted Additive Schwarz (ORAS)#

The ORAS preconditioner may be seen as a variant of the RAS preconditioner. It reads:

where B_j are local matrices of size \#{\mathcal N}_j \times \#{\mathcal N}_j for 1\le j \le N. This variant is very useful when dealing with wave propagation phenomena such as Helmholtz problems in acoustics or Maxwell system in the frequency domain for electromagnetism. Defining B_j as the discretization of the physical equation with impedance conditions on the boundary of the subdomain

#### Two level methods#

The RAS method is called a one-level method in the sense that sub-domains only interact with their direct neighbors. For some problems such as Darcy problems or static elasticiy problems and when the number of subdomains is large, such one-level methods may suffer from a slow convergence. The fix is to add to the preconditioner an auxiliary coarse problem that couples all subdomains at each iteration and is inexpensive to calculate. We consider two ways to build this coarse problem, see below Coarse Mesh and GenEO

##### Coarse Mesh#

A first possibility is to discretize the problem on a coarse mesh, following the same principle as multi-grid methods. For 3-D problems, a coarsening of the mesh size by a factor 2, reduces by a factor 2^3=8 the size of the coarse problem which is then easier to solve by a direct method.

##### GenEO#

For highly heterogeneous or anisotropic problems, two level methods based on coarse meshes might fail and a more sophisticated construction must be used. A provable robust coarse space called GenEO is built by first solving the following local generalized eigenvalue problem in parallel for each subdomain 1\le i\le N, where A_i^{\text{Neu}} denotes the local matrix resulting from the variational formulation:

The eigenvectors selected to enter the coarse space correspond to eigenvalues \lambda_{i,k} \ge \tau, where the threshold parameter \tau is user-defined. The precise formulas are given in this documentation. From a mathematical point of view, it has been proved that for a symmetric positive definite matrix A, the spectrum of the preconditioned by the two-level method with a GenEO coarse space lies in the interval [\displaystyle \frac{1}{1+k_1\,\tau} , k_0 ].

**Note** A heuristic that justifies this construction is as follows. We first introduce the Additive Schwarz method (ASM) which can be seen as a symmetrized variant of the RAS preconditioner:

It can be proved that the lower bound for the eigenvalue of M_{ASM}^{-1}\,A is close to zero (which is bad for convergence) whereas the upper bound depends only on the number of neigbors of a subdomain (which is good for convergence).

Second, we also introduce the following preconditioner M^{-1}_{NN}:

We have a very good lower bound for the preconditioned operator M^{-1}_{NN}\,A that does not depend on the number of subdomains but only on the maximum multiplicity of intersections k_1 (which is good for convergence). But the upper bound for this preconditioner is very large (which is bad for convergence).

Now, if we compare formulas for M^{-1}_{NN} and M^{-1}_{ASM}, we may suspect that vectors \mathbf{V}_{ik} for which D_i\, (A_i^{\text{Neu}})^{-1}\,D_i\,\mathbf{V}_{ik} and A_{i}^{-1}\,\mathbf{V}_{ik} have very different values are responsible for the slow convergence and should contribute to the coarse space. This is a way to interpret the above generalized eigenvalue problem which controls the lower bound of the two-level preconditioned system.