FreeFEM Documentation on GitHub

stars - forks

# Parallelization

A first attempt of parallelization of FreeFEM is made here with MPI. An extended interface with MPI has been added to FreeFEM version 3.5, (see the MPI documentation for the functionality of the language).

## MPI

### MPI Keywords

The following keywords and concepts are used:

• mpiComm to defined a communication world

• mpiGroup to defined a group of processors in the communication world

• mpiRequest to defined a request to wait for the end of the communication

### MPI Constants

• mpisize The total number of processes,

• mpirank the id-number of my current process in {0, ..., mpisize-1},

• mpiUndefined The MPI_Undefined constant,

• mpiAnySource The MPI_ANY_SOURCE constant,

• mpiCommWorld The MPI_COMM_WORLD constant,

• [ … ] and all the keywords of MPI_Op for the reduce operator: mpiMAX, mpiMIN, mpiSUM, mpiPROD, mpiLAND, mpiLOR, mpiLXOR, mpiBAND, mpiBXOR.

### MPI Constructor

 1// Parameters
2int[int] proc1 = [1, 2], proc2 = [0, 3];
3int color = 1;
4int key = 1;
5
6// MPI ranks
7cout << "MPI rank = " << mpirank << endl;
8
9// MPI
10mpiComm comm(mpiCommWorld, 0, 0); //set a MPI_Comm to MPI_COMM_WORLD
11
12mpiGroup grp(proc1); //set MPI_Group to proc 1,2 in MPI_COMM_WORLD
13mpiGroup grp1(comm, proc1); //set MPI_Group to proc 1,2 in comm
14
15mpiComm ncomm1(mpiCommWorld, grp); //set the MPI_Comm form grp
16
17mpiComm ncomm2(comm, color, key); //MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm *ncomm)
18
19mpiRequest rq; //defined an MPI_Request
20mpiRequest[int] arq(10); //defined an array of 10 MPI_Request


### MPI Functions

 1mpiComm Comm(mpiCommWorld, 0, 0);
2
3int MPICommSize = mpiSize(Comm);
4int MPIRank = mpiRank(Comm);
5
6if (MPIRank == 0) cout << "MPI Comm size = " << MPICommSize << endl;
7cout << "MPI rank in Comm = " << mpiRank(Comm) << endl;
8
9mpiRequest Req;
10mpiRequest[int] ReqArray(10);
11
12for (int i = 0; i < MPICommSize; i++){
13     //return processor i with no Resquest in MPI_COMM_WORLD
14    processor(i);
15    //return processor any source with no Resquest in MPI_COMM_WORLD
16    processor(mpiAnySource);
17    //return processor i with no Resquest in Comm
18    processor(i, Comm);
19    //return processor i with no Resquest in Comm
20    processor(Comm, i);
21    //return processor i with Resquest rq in Comm
22    /* processor(i, Req, Comm);
23    //return processor i with Resquest rq in MPI_COMM_WORLD
24    processor(i, Req); */
25    //return processor i in MPI_COMM_WORLD in block mode for synchronously communication
26    processorblock(i);
27    //return processor any source in MPI_COMM_WORLD in block mode for synchronously communication
28    processorblock(mpiAnySource);
29    //return processor i in in Comm in block mode
30    processorblock(i, Comm);
31}
32
33mpiBarrier(Comm); //do a MPI_Barrier on communicator Comm
34mpiWaitAny(ReqArray); //wait add of Request array,
35mpiWait(Req); //wait on a Request
36real t = mpiWtime(); //return MPIWtime in second
37real tick = mpiWtick(); //return MPIWTick in second


where a processor is just a integer rank, pointer to a MPI_comm and pointer to a MPI_Request, and processorblock with a special MPI_Request.

### MPI Communicator operator

 1int status; //to get the MPI status of send / recv
2real a, b;
3
4mpiComm comm(mpiCommWorld, 0, 0);
5mpiRequest req;
6
7//send a,b asynchronously to the process 1
8processor(1) << a << b;
9//receive a,b synchronously from the process 10
10processor(10) >> a >> b;
11
12//broadcast from processor of comm to other comm processor
14//send synchronously to the process 10 the data a
15status = Send(processor(10, comm), a);
16//receive synchronously from the process 10 the data a
17status = Recv(processor(10, comm), a);
18
19//send asynchronously to the process 10 the data a without request
20status = Isend(processor(10, comm), a);
21//send asynchronously to the process 10 the data a with request
22status = Isend(processor(10, comm, req), a);
23//receive asynchronously from the process 10 the data a
24status = Irecv(processor(10, req), a);
25//Error asynchronously without request.
26// status = Irecv(processor(10), a);


where the data type of a can be of type of int, real, complex, int[int], real[int], complex[int], int[int,int], double[int,int], complex[int,int], mesh, mesh3, mesh[int], mesh3[int], matrix, matrix<complex>

1//send asynchronously to the process 10 the data a with request
2processor(10, req) << a ;
3//receive asynchronously from the process 10 the data a with request
4processor(10, req) >> a ;


If a, b are arrays or full matrices of int, real, or complex, we can use the following MPI functions:

1mpiAlltoall(a, b, [comm]);
2mpiAllgather(a, b, [comm]);
3mpiGather(a, b, processor(..) );
4mpiScatter(a, b, processor(..));
5mpiReduce(a, b, processor(..), mpiMAX);
6mpiAllReduce(a, b, comm, mpiMAX);


Thank you to Guy-Antoine Atenekeng Kahou for his help to code this interface.

### Schwarz example in parallel

This example is a rewritting of example Schwarz overlapping.

1ff-mpirun -np 2 SchwarzParallel.edp
2# OR
3mpirun -np 2 FreeFem++-mpi SchwarzParallel.edp

 1if (mpisize != 2){
2    cout << " sorry, number of processors !=2 " << endl;
3    exit(1);
4}
5
6// Parameters
7verbosity = 0;
8int interior = 2;
9int exterior = 1;
10int n = 4;
11
12// Mesh
13border a(t=1, 2){x=t; y=0; label=exterior;}
14border b(t=0, 1){x=2; y=t; label=exterior;}
15border c(t=2, 0){x=t; y=1; label=exterior;}
16border d(t=1, 0){x=1-t; y=t; label=interior;}
17border e(t=0, pi/2){x=cos(t); y=sin(t); label=interior;}
18border e1(t=pi/2, 2*pi){x=cos(t); y=sin(t); label=exterior;}
19mesh[int] Th(mpisize);
20if (mpirank == 0)
21    Th[0] = buildmesh(a(5*n) + b(5*n) + c(10*n) + d(5*n));
22else
23    Th[1] = buildmesh(e(5*n) + e1(25*n));
24
27
28// Fespace
29fespace Vh(Th[mpirank], P1);
30Vh u = 0, v;
31
32fespace Vhother(Th[1-mpirank], P1);
33Vhother U = 0;
34
35//Problem
36int i = 0;
37problem pb (u, v, init=i, solver=Cholesky)
38    = int2d(Th[mpirank])(
39          dx(u)*dx(v)
40        + dy(u)*dy(v)
41    )
42    - int2d(Th[mpirank])(
43          v
44    )
45    + on(interior, u=U)
46    + on(exterior, u= 0 )
47    ;
48
49// Loop
50for (i = 0; i < 20; i++){
51    cout << mpirank << " - Loop " << i << endl;
52
53    // Solve
54    pb;
55    //send u to the other proc, receive in U
56    processor(1-mpirank) << u[];
57    processor(1-mpirank) >> U[];
58
59    // Error
60    real err0, err1;
61    err0 = int1d(Th[mpirank],interior)(square(U - u));
62    // send err0 to the other proc, receive in err1
63    processor(1-mpirank) << err0;
64    processor(1-mpirank) >> err1;
65    real err = sqrt(err0 + err1);
66    cout << " err = " << err << " - err0 = " << err0 << " - err1 = " << err1 << endl;
67    if (err < 1e-3) break;
68}
69if (mpirank == 0)
70    plot(u, U);


Todo

script freeze in the loop

#### True parallel Schwarz example

Thank you to F. Nataf

This is a explanation of the two examples MPI-GMRES 2D and MPI-GMRES 3D, a Schwarz parallel with a complexity almost independent of the number of process (with a coarse grid preconditioner).

To solve the following Poisson problem on domain $$\Omega$$ with boundary $$\Gamma$$ in $$L^2(\Omega)$$ :

$\begin{split}\begin{array}{rcll} -\Delta u &=& f & \mbox{ in } \Omega\\ u &=& g & \mbox{ on } \Gamma \end{array}\end{split}$

where $$f$$ and $$g$$ are two given functions of $$L^2(\Omega)$$ and of $$H^{\frac12}(\Gamma)$$,

Lets introduce $$(\pi_i)_{i=1,.., N_p}$$ a regular partition of the unity of $$\Omega$$, q-e-d:

$\pi_i \in \mathcal{C}^0(\Omega) : \quad \pi_i\ge 0 \mbox{ and } \sum_{i=1}^{N_p} \pi_i =1 .$

Denote $$\Omega_i$$ the sub domain which is the support of $$\pi_i$$ function and also denote $$\Gamma_i$$ the boundary of $$\Omega_i$$.

The parallel Schwarz method is:

Let $$\ell=0$$ the iterator and a initial guest $$u^0$$ respecting the boundary condition (i.e. $$u^0_{|\Gamma} = g$$).

(37)$\begin{split}\begin{array}{rcll} \forall i = 1 .., N_p:&\nonumber\\ \displaystyle -\Delta u_i^\ell &=& f &\mbox{ in } \Omega_i\\ u_i^\ell &=& u^\ell & \mbox{ on } \Gamma_i \setminus \Gamma\\ u_i^\ell &=& g & \mbox{ on } \Gamma_i \cap \Gamma \end{array}\end{split}$
(38)$u^{\ell+1} = \sum_{i=1}^{N_p} \pi_i u_i^\ell$

After discretization with the Lagrange finite element method, with a compatible mesh $${\mathcal{T}_h}_i$$ of $$\Omega_i$$, i. e., the exist a global mesh $${\mathcal{T}_h}$$ such that $${\mathcal{T}_h}_i$$ is include in $${\mathcal{T}_h}$$.

Let us denote:

• $${V_h}_i$$ the finite element space corresponding to domain $$\Omega_i$$,

• $${\mathcal{N}_h}_i$$ is the set of the degree of freedom $$\sigma_i^k$$,

• $${\mathcal{N}^{\Gamma_i}_{hi}}$$ is the set of the degree of freedom of $${V_h}_i$$ on the boundary $$\Gamma_i$$ of $$\Omega_i$$,

• $$\sigma_i^k({v_h})$$ is the value the degree of freedom $$k$$,

• $${V_{0h}}_i= \{ {v_h} \in {V_h}_i :\forall k \in {\mathcal{N}^{\Gamma_i}_{hi}}, \quad \sigma_i^k({v_h})=0 \}$$,

• The conditional expression $$a\;?\;b:c$$ is defined like in :cC of C++ language by

$\begin{split}a?b: c \equiv \left\{ \begin{array}{l} \mbox{if } a \mbox{ is true then return b}\\ \mbox{else return } c\\ \end{array} \right..\end{split}$

Note

We never use finite element space associated to the full domain $$\Omega$$ because it is too expensive.

We have to defined to operator to build the previous algorithm:

We denote $${u_h^{\ell}}_{|i}$$ the restriction of $$u_h^\ell$$ on $${V_h}_i$$, so the discrete problem on $$\Omega_i$$ of problem (37) is find $${u_h^{\ell}}_{i}\in {V_h}_i$$ such that:

$\forall {v_h}_i\in V_{0i}: \int_{\Omega_i} \nabla {v_h}_i \cdot \nabla {u_h}^{\ell}_{i} = \int_{\Omega_i} f {v_h}_i ,\quad \forall k \in {\mathcal{N}^{\Gamma_i}_{hi}}\;:\; \sigma_i^k({u_h}^\ell_i) = (k\in \Gamma) \; ? \; g_i^k : \sigma_i^k({u_h}^{\ell}_{|i})$

where $$g_i^k$$ is the value of $$g$$ associated to the degree of freedom $$k\in {\mathcal{N}^{\Gamma_i}_{hi}}$$.

In FreeFEM, it can be written has with U is the vector corresponding to $${u_h^{\ell}}_{|i}$$ and the vector U1 is the vector corresponding to $${u_h^{\ell}}_{i}$$ is the solution of:

1real[int] U1(Ui.n);
2real[int] b = onG .* U;
3b = onG ? b : Bi ;
4U1 = Ai^-1*b;


where $$\mathtt{onG}[i] =(i \in \Gamma_i\setminus\Gamma) ? 1 : 0$$, and $$\mathtt{Bi}$$ the right of side of the problem, are defined by

 1// Fespace
2fespace Whi(Thi, P2);
3
4// Problem
5varf vPb (U, V)
6    = int3d(Thi)(
8    )
9    + int3d(Thi)(
10          F*V
11    )
12    + on(1, U=g)
13    + on(10, U=G)
14    ;
15
16varf vPbon (U, V) = on(10, U=1) + on(1, U=0);
17
18matrix Ai = vPb (Whi, Whi, solver=sparsesolver);
19real[int] onG = vPbon(0, Whi);
20real[int] Bi=vPb(0, Whi);


where the FreeFEM label of $$\Gamma$$ is 1 and the label of $$\Gamma_i\setminus \Gamma$$ is $$10$$.

To build the transfer/update part corresponding to (38) equation on process $$i$$, let us call njpart the number the neighborhood of domain of $$\Omega_i$$ (i.e: $$\pi_j$$ is none $$0$$ of $$\Omega_i$$), we store in an array jpart of size njpart all this neighborhood.

Let us introduce two array of matrix, Smj[j] to defined the vector to send from $$i$$ to $$j$$ a neighborhood process, and the matrix $$rMj[j]$$ to after to reduce owith neighborhood $$j$$ domain.

So the tranfert and update part compute $$v_i= \pi_i u_i + \sum_{j\in J_i} \pi_j u_j$$ and can be write the FreeFEM function Update:

 1func bool Update (real[int] &ui, real[int] &vi){
2    int n = jpart.n;
3    for (int j = 0; j < njpart; ++j) Usend[j][] = sMj[j]*ui;
4    mpiRequest[int] rq(n*2);
5    for (int j = 0; j < n; ++j) Irecv(processor(jpart[j], comm,rq[j]), Ri[j][]);
6    for (int j = 0; j < n; ++j) Isend(processor(jpart[j], comm, rq[j+n]), Si[j][]);
7    for (int j = 0; j < n*2; ++j) int k = mpiWaitAny(rq);
8    // apply the unity local partition
9    vi = Pii*ui; //set to pi_i u_i
10    for (int j = 0; j < njpart; ++j) vi += rMj[j]*Vrecv[j][]; //add pi_j u_j
11    return true;
12}


where the buffer are defined by:

1InitU(njpart, Whij, Thij, aThij, Usend) //defined the send buffer
2InitU(njpart, Whij, Thij, aThij, Vrecv) //defined the revc buffer


with the following macro definition:

1macro InitU(n, Vh, Th, aTh, U) Vh[int] U(n); for (int j = 0; j < n; ++j){Th = aTh[j]; U[j] = 0;}


First GMRES algorithm: you can easily accelerate the fixed point algorithm by using a parallel GMRES algorithm after the introduction the following affine $$\mathcal{A}_i$$ operator sub domain $$\Omega_i$$.

1func real[int] DJ0 (real[int]& U){
2    real[int] V(U.n), b = onG .* U;
3    b = onG ? b : Bi ;
4    V = Ai^-1*b;
5    Update(V, U);
6    V -= U;
7    return V;
8}


Where the parallel MPIGMRES or MPICG algorithm is just a simple way to solve in parallel the following $$A_i x_i = b_i, i = 1, .., N_p$$ by just changing the dot product by reduce the local dot product of all process with the following MPI code:

1template<class R> R ReduceSum1(R s, MPI_Comm *comm){
2    R r = 0;
3    MPI_Allreduce(&s, &r, 1, MPI_TYPE<R>::TYPE(), MPI_SUM, *comm );
4    return r;
5}


This is done in MPIGC dynamics library tool.

Second GMRES algorithm: Use scharwz algorithm as a preconditioner of basic GMRES method to solving the parallel problem.

 1func real[int] DJ (real[int]& U){ //the original problem
2    ++kiter;
3    real[int] V(U.n);
4    V = Ai*U;
5    V = onGi ? 0.: V; //remove boundary term
6    return V;
7}
8
9func real[int] PDJ (real[int]& U){ //the preconditioner
10    real[int] V(U.n);
11    real[int] b = onG ? 0. : U;
12    V = Ai^-1*b;
13    Update(V, U);
14    return U;
15}


Third GMRES algorithm: Add a coarse solver to the previous algorithm

First build a coarse grid on processor 0, and the

 1matrix AC, Rci, Pci;
2if (mpiRank(comm) == 0)
3    AC = vPbC(VhC, VhC, solver=sparsesolver); //the coarse problem
4
5Pci = interpolate(Whi, VhC); //the projection on coarse grid
6Rci = Pci'*Pii; //the restriction on Process i grid with the partition pi_i
7
8func bool CoarseSolve (real[int]& V, real[int]& U, mpiComm& comm){
9    // solving the coarse problem
10    real[int] Uc(Rci.n), Bc(Uc.n);
11    Uc = Rci*U;
12    mpiReduce(Uc, Bc, processor(0, comm), mpiSUM);
13    if (mpiRank(comm) == 0)
14    Uc = AC^-1*Bc;
16    V = Pci*Uc;
17}


The New preconditionner

 1func real[int] PDJC (real[int]& U){
2    // Idea: F. Nataf.
3    // 0 ~ (I C1A)(I-C2A) => I ~ - C1AC2A +C1A +C2A
4    // New Prec P= C1+C2 - C1AC2 = C1(I- A C2) +C2
5    // ( C1(I- A C2) +C2 ) Uo
6    // V = - C2*Uo
7    // ....
8    real[int] V(U.n);
9    CoarseSolve(V, U, comm);
10    V = -V; //-C2*Uo
11    U += Ai*V; //U = (I-A C2) Uo
12    real[int] b = onG ? 0. : U;
13    U = Ai^-1*b; //C1( I -A C2) Uo
14    V = U - V;
15    Update(V, U);
16    return U;
17}


The code of the 4 algorithms:

 1real epss = 1e-6;
2int rgmres = 0;
3if (gmres == 1){
4    rgmres = MPIAffineGMRES(DJ0, u[], veps=epss, nbiter=300,
5        comm=comm, dimKrylov=100, verbosity=ipart?0: 50);
6    real[int] b = onG .* u[];
7    b = onG ? b : Bi ;
8    v[] = Ai^-1*b;
9    Update(v[], u[]);
10}
11else if (gmres == 2)
12    rgmres = MPILinearGMRES(DJ, precon=PDJ, u[], Bi, veps=epss,
13        nbiter=300, comm=comm, dimKrylov=100, verbosity=ipart?0: 50);
14else if (gmres == 3)
15    rgmres = MPILinearGMRES(DJ, precon=PDJC, u[], Bi, veps=epss,
16        nbiter=300, comm=comm, dimKrylov=100, verbosity=ipart?0: 50);
17else //algo Shwarz for demo
18    for(int iter = 0; iter < 10; ++iter)
19        ...


We have all ingredient to solve in parallel if we have et the partitions of the unity. To build this partition we do:

The initial step on process $$1$$ to build a coarse mesh, $${\mathcal{T}_h}^*$$ of the full domain, and build the partition $$\pi$$ function constant equal to $$i$$ on each sub domain $$\mathcal{O}_i, i =1 ,.., N_p$$, of the grid with the metis graph partitioner [KARYPIS1995] and on each process $$i$$ in $$1..,N_p$$ do

1. Broadcast from process $$1$$, the mesh $${\mathcal{T}_h}^*$$ (call Thii in FreeFEM script), and $$\pi$$ function,

2. remark that the characteristic function $$\mathrm{1\!\!I}_{\mathcal{O}_i}$$ of domain $$\mathcal{O}_i$$, is defined by $$(\pi=i)?1:0$$,

3. Let us call $$\Pi^2_P$$ (resp. $$\Pi^2_V$$) the $$L^2$$ on $$P_h^*$$ the space of the constant finite element function per element on $${\mathcal{T}_h}^*$$ (resp. $$V_h^*$$ the space of the affine continuous finite element per element on $${\mathcal{T}_h}^*$$) and build in parallel the $$\pi_i$$ and $$\Omega_i$$, such that $$\mathcal{O}_i\ \subset \Omega_i$$ where $$\mathcal{O}_i= supp ((\Pi^2_V \Pi^2_C)^m \mathrm{1\!\!I}_{O_i})$$, and $$m$$ is a the overlaps size on the coarse mesh (generally one), (this is done in function AddLayers(Thii,suppii[],nlayer,phii[]); We choose a function $$\pi^*_i = (\Pi^2_1 \Pi^2_0)^m \mathrm{1\!\!I}_{\mathcal{O}_i}$$ so the partition of the unity is simply defined by

$\pi_i = \frac{\pi_i^*}{\sum_{j=1}^{N_p} \pi_j^*}$

The set $$J_i$$ of neighborhood of the domain $$\Omega_i$$, and the local version on $$V_{hi}$$ can be defined the array jpart and njpart with:

 1Vhi pii = piistar;
2Vhi[int] pij(npij); //local partition of 1 = pii + sum_j pij[j]
3int[int] jpart(npart);
4int njpart = 0;
5Vhi sumphi = piistar;
6for (int i = 0; i < npart; ++i)
7    if (i != ipart){
8        if (int3d(Thi)(pijstar,j) > 0){
9            pij[njpart] = pijstar;
10            sumphi[] += pij[njpart][];
11            jpart[njpart++] = i;
12        }
13    }
14pii[] = pii[] ./ sumphi[];
15for (int j = 0; j < njpart; ++j)
16pij[j][] = pij[j][] ./ sumphi[];
17jpart.resize(njpart);

4. We call $${\mathcal{T}_h}^*_{ij}$$ the sub mesh part of $${\mathcal{T}_h}_i$$ where $$\pi_j$$ are none zero. And thanks to the function trunc to build this array,

1for(int jp = 0; jp < njpart; ++jp)
2    aThij[jp] = trunc(Thi, pij[jp] > 1e-10, label=10);

5. At this step we have all on the coarse mesh, so we can build the fine final mesh by splitting all meshes: Thi, Thij[j], Thij[j] with FreeFEM trunc mesh function which do restriction and slipping.

6. The construction of the send/recv matrices sMj and freefem:rMj: can done with this code:

 1mesh3 Thij = Thi;
2fespace Whij(Thij, Pk);
3matrix Pii; Whi wpii = pii; Pii = wpii[]; //Diagonal matrix corresponding X pi_i
4matrix[int] sMj(njpart), rMj(njpart); //M send/recive case
5for (int jp = 0; jp < njpart; ++jp){
6    int j = jpart[jp];
7    Thij = aThij[jp]; //change mesh to change Whij, Whij
8    matrix I = interpolate(Whij, Whi); //Whij <- Whi
9    sMj[jp] = I*Pii; //Whi -> s Whij
10    rMj[jp] = interpolate(Whij, Whi, t=1); //Whij -> Whi
11}


To buil a not too bad application, all variables come from parameters value with the following code

1include "getARGV.idp"
2verbosity = getARGV("-vv", 0);
3int vdebug = getARGV("-d", 1);
4int ksplit = getARGV("-k", 10);
5int nloc = getARGV("-n", 25);
6string sff = getARGV("-p, ", "");
7int gmres = getARGV("-gmres", 3);
8bool dplot = getARGV("-dp", 0);
9int nC = getARGV("-N", max(nloc/10, 4));


And small include to make graphic in parallel of distribute solution of vector $$u$$ on mesh $$T_h$$ with the following interface:

1include "MPIplot.idp"
2func bool plotMPIall(mesh &Th, real[int] &u, string cm){
3    PLOTMPIALL(mesh, Pk, Th, u, {cmm=cm, nbiso=20, fill=1, dim=3, value=1});
4    return 1;
5}


Note

The cmm=cm, ... in the macro argument is a way to quote macro argument so the argument is cmm=cm, ....

## Parallel sparse solvers

Parallel sparse solvers use several processors to solve linear systems of equation. Like sequential, parallel linear solvers can be direct or iterative. In FreeFEM both are available.

### Using parallel sparse solvers in FreeFEM

We recall that the solver parameters are defined in the following commands: solve, problem, set (setting parameter of a matrix) and in the construction of the matrix corresponding to a bilinear form. In these commands, the parameter solver must be set to sparsesolver for parallel sparse solver. We have added specify parameters to these command lines for parallel sparse solvers. These are:

• lparams : vector of integer parameters (l is for the C++ type long)

• dparams : vector of real parameters

• sparams : string parameters

• datafilename : name of the file which contains solver parameters

The following four parameters are only for direct solvers and are vectors. These parameters allow the user to preprocess the matrix (see the section on sparse direct solver for more information).

• permr : row permutation (integer vector)

• permc : column permutation or inverse row permutation (integer vector)

• scaler : row scaling (real vector)

• scalec : column scaling (real vector)

There are two possibilities to control solver parameters. The first method defines parameters with lparams, dparams and sparams in .edp file.

The second one reads the solver parameters from a data file. The name of this file is specified by datafilename. If lparams, dparams, sparams or datafilename is not provided by the user, the solver’s default values are used.

To use parallel solver in FreeFEM, we need to load the dynamic library corresponding to this solver. For example to use MUMPS solver as parallel solver in FreeFEM, write in the .edp file load "MUMPS_FreeFem".

If the libraries are not loaded, the default sparse solver will be loaded (default sparse solver is UMFPACK). The Table 2 gives this new value for the different libraries.

Table 2 Default sparse solver for real and complex arithmetics when we load a parallel sparse solver library

Libraries

Default sparse solver

real

complex

MUMPS_FreeFem

mumps

mumps

real_SuperLU_DIST_FreeFem

SuperLU_DIST

previous solver

complex_SuperLU_DIST_FreeFem

previous solver

SuperLU_DIST

real_pastix_FreeFem

PaStiX

previous solver

complex_pastix_FreeFem

previous solver

PaStiX

hips_FreeFem

hips

previous solver

hypre_FreeFem

hypre

previous solver

parms_FreeFem

parms

previous solver

We also add functions (see Table 3) with no parameter to change the default sparse solver in the .edp file. To use these functions, we need to load the library corresponding to the solver. An example of using different parallel sparse solvers for the same problem is given in Direct solvers example.

Table 3 Functions that allow to change the default sparse solver for real and complex arithmetics and the result of these functions

Function

default sparse solver

real

complex

defaulttoMUMPS()

mumps

mumps

realdefaulttoSuperLUdist()

SuperLU_DIST

previous solver

complexdefaulttoSuperLUdist()

previous solver

SuperLU_DIST

realdefaultopastix()

pastix

previous solver

complexdefaulttopastix()

previous solver

pastix

defaulttohips()

hips

previous solver

defaulttohypre()

hypre

previous solver

defaulttoparms()

parms

previous solver

Tip

Test direct solvers

 1load "MUMPS_FreeFem"
2//default solver: real-> MUMPS, complex -> MUMPS
4//default solver: real-> SuperLU_DIST,
6//default solver: real-> pastix, complex -> MUMPS
7
8// Solving with pastix
9{
10    matrix A =
11        [[1, 2, 2, 1, 1],
12        [ 2, 12, 0, 10, 10],
13        [ 2, 0, 1, 0, 2],
14        [ 1, 10, 0, 22, 0.],
15        [ 1, 10, 2, 0., 22]];
16
17    real[int] xx = [1, 32, 45, 7, 2], x(5), b(5), di(5);
18    b = A*xx;
19    cout << "b =" << b << endl; cout << "xx =" << xx << endl;
20
21    set(A, solver=sparsesolver, datafilename="ffpastix_iparm_dparm.txt");
22    cout << "solve" << endl;
23    x = A^-1*b;
24    cout << "b =" << b << endl;
25    cout << "x =" << endl;
26    cout << x << endl;
27    di = xx - x;
28    if (mpirank == 0){
29        cout << "x-xx =" << endl;
30        cout << "Linf =" << di.linfty << ", L2 =" << di.l2 << endl;
31    }
32}
33
34// Solving with SuperLU_DIST
35realdefaulttoSuperLUdist();
36//default solver: real-> SuperLU_DIST, complex -> MUMPS
37{
38    matrix A =
39        [[1, 2, 2, 1, 1],
40        [ 2, 12, 0, 10, 10],
41        [ 2, 0, 1, 0, 2],
42        [ 1, 10, 0, 22, 0.],
43        [ 1, 10, 2, 0., 22]];
44
45    real[int] xx = [1, 32, 45, 7, 2], x(5), b(5), di(5);
46    b = A*xx;
47    cout << "b =" << b << endl;
48    cout << "xx =" << xx << endl;
49
50    set(A, solver=sparsesolver, datafilename="ffsuperlu_dist_fileparam.txt");
51    cout << "solve" << endl;
52    x = A^-1*b;
53    cout << "b =" << b << endl;
54    cout << "x =" << endl;
55    cout << x << endl;
56    di = xx - x;
57    if (mpirank == 0){
58        cout << "x-xx =" << endl;
59        cout << "Linf =" << di.linfty << ", L2 =" << di.l2 << endl;
60    }
61}
62
63// Solving with MUMPS
64defaulttoMUMPS();
65//default solver: real-> MUMPS, complex -> MUMPS
66{
67    matrix A =
68        [[1, 2, 2, 1, 1],
69        [ 2, 12, 0, 10, 10],
70        [ 2, 0, 1, 0, 2],
71        [ 1, 10, 0, 22, 0.],
72        [ 1, 10, 2, 0., 22]];
73
74    real[int] xx = [1, 32, 45, 7, 2], x(5), b(5), di(5);
75    b = A*xx;
76    cout << "b =" << b << endl;
77    cout << "xx =" << xx << endl;
78
79    set(A, solver=sparsesolver, datafilename="ffmumps_fileparam.txt");
80    cout << "solving solution" << endl;
81    x = A^-1*b;
82    cout << "b =" << b << endl;
83    cout << "x =" << endl;
84    cout << x << endl;
85    di = xx - x;
86    if (mpirank == 0){
87        cout << "x-xx =" << endl;
88        cout << "Linf =" << di.linfty << ", L2" << di.l2 << endl;
89    }
90}


### Sparse direct solver

In this section, we present the sparse direct solvers interfaced with FreeFEM.

#### MUMPS solver

MUltifrontal Massively Parallel Solver (MUMPS) is an open-source library.

This package solves linear system of the form $$A \: x = b$$ where $$A$$ is a square sparse matrix with a direct method. The square matrix considered in MUMPS can be either unsymmetric, symmetric positive definite or general symmetric.

The method implemented in MUMPS is a direct method based on a multifrontal approach. It constructs a direct factorization $$A \:= \: L\:U$$, $$A\: = \: L^t \: D \: L$$ depending of the symmetry of the matrix $$A$$.

MUMPS uses the following libraries :

Warning

MUMPS does not solve linear system with a rectangular matrix.

MUMPS parameters:

There are four input parameters in MUMPS. Two integers SYM and PAR, a vector of integer of size 40 INCTL and a vector of real of size 15 CNTL.

The first parameter gives the type of the matrix: 0 for unsymmetric matrix, 1 for symmetric positive matrix and 2 for general symmetric.

The second parameter defined if the host processor work during the factorization and solves steps : PAR=1 host processor working and PAR=0 host processor not working.

The parameter INCTL and CNTL is the control parameter of MUMPS. The vectors ICNTL and CNTL in MUMPS becomes with index 1 like vector in Fortran. For more details see the MUMPS user’s guide.

We describe now some elements of the main parameters of ICNTL for MUMPS.

• Input matrix parameter The input matrix is controlled by parameters ICNTL(5) and ICNTL(18).

The matrix format (resp. matrix pattern and matrix entries) are controlled by INCTL(5) (resp. INCTL(18)).

The different values of ICNTL(5) are 0 for assembled format and 1 for element format. In the current release of FreeFEM, we consider that FE matrix or matrix is storage in assembled format. Therefore, INCTL(5) is treated as 0 value.

The main option for ICNTL(18): INCLTL(18)=0 centrally on the host processor, ICNTL(18)=3 distributed the input matrix pattern and the entries (recommended option for distributed matrix by developer of MUMPS). For other values of ICNTL(18) see the MUMPS user’s guide. These values can be used also in FreeFEM.

The default option implemented in FreeFEM are ICNTL(5)=0 and ICNTL(18)=0.

• Preprocessing parameter The preprocessed matrix $$A_{p}$$ that will be effectively factored is defined by

$A_{p} = P \: D_r \: A \: Q_c \ D_c P^t$

where $$P$$ is the permutation matrix, $$Q_c$$ is the column permutation, $$D_r$$ and $$D_c$$ are diagonal matrix for respectively row and column scaling.

The ordering strategy to obtain $$P$$ is controlled by parameter ICNTL(7). The permutation of zero free diagonal $$Q_c$$ is controlled by parameter ICNTL(6). The row and column scaling is controlled by parameter ICNTL(18). These option are connected and also strongly related with ICNTL(12) (see the MUMPS user’s guide for more details).

The parameters permr, scaler, and scalec in FreeFEM allow to give permutation matrix($$P$$), row scaling ($$D_r$$) and column scaling ($$D_c$$) of the user respectively.

Calling MUMPS in FreeFEM

To call MUMPS in FreeFEM, we need to load the dynamic library MUMPS_freefem.dylib (MacOSX), MUMPS_freefem.so (Unix) or MUMPS_freefem.dll (Windows).

This is done in typing load "MUMPS_FreeFem" in the .edp file. We give now the two methods to give the option of MUMPS solver in FreeFEM.

• Solver parameters is defined in .edp file: In this method, we need to give the parameters lparams and dparams.

These parameters are defined for MUMPS by :

• lparams[0] = SYM, lparams[1] = PAR,

• $$\forall i$$ = 1,…,40, lparams[i+1] = ICNTL(i)

• $$\forall i$$ = 1,…,15, dparams[i-1] = CNTL(i)

• Reading solver parameters on a file:

The structure of data file for MUMPS in FreeFEM is : first line parameter SYM and second line parameter PAR and in the following line the different value of vectors ICNTL and CNTL. An example of this parameter file is given in ffmumpsfileparam.txt.

 10 /* SYM :: 0 for non symmetric matrix, 1 for symmetric definite positive matrix and 2 general symmetric matrix*/
21 /* PAR :: 0 host not working during factorization and solves steps, 1 host working during factorization and solves steps*/
3-1 /* ICNTL(1) :: output stream for error message */
4-1 /* ICNTL(2) :: output for diagnostic printing, statics and warning message */
5-1 /* ICNTL(3) :: for global information */
60 /* ICNTL(4) :: Level of printing for error, warning and diagnostic message */
70 /* ICNTL(5) :: matrix format : 0 assembled format, 1 elemental format. */
87 /* ICNTL(6) :: control option for permuting and/or scaling the matrix in analysis phase */
93 /* ICNTL(7) :: pivot order strategy : AMD, AMF, metis, pord scotch*/
1077 /* ICNTL(8) :: Row and Column scaling strategy */
111 /* ICNTL(9) :: 0 solve Ax = b, 1 solve the transposed system A^t x = b : parameter is not considered in the current release of FreeFEM*/
120 /* ICNTL(10) :: number of steps of iterative refinement */
130 /* ICNTL(11) :: statics related to linear system depending on ICNTL(9) */
141 /* ICNTL(12) :: constrained ordering strategy for general symmetric matrix */
150 /* ICNTL(13) :: method to control splitting of the root frontal matrix */
1620 /* ICNTL(14) :: percentage increase in the estimated working space (default 20\%)*/
170 /* ICNTL(15) :: not used in this release of MUMPS */
180 /* ICNTL(16) :: not used in this release of MUMPS */
190 /* ICNTL(17) :: not used in this release of MUMPS */
203 /* ICNTL(18) :: method for given : matrix pattern and matrix entries : */
210 /* ICNTL(19) :: method to return the Schur complement matrix */
220 /* ICNTL(20) :: right hand side form ( 0 dense form, 1 sparse form) : parameter will be set to 0 for FreeFEM */
230 /* ICNTL(21) :: 0, 1 kept distributed solution : parameter is not considered in the current release of FreeFEM */
240 /* ICNTL(22) :: controls the in-core/out-of-core (OOC) facility */
250 /* ICNTL(23) :: maximum size of the working memory in Megabyte than MUMPS can allocate per working processor */
260 /* ICNTL(24) :: control the detection of null pivot */
270 /* ICNTL(25) :: control the computation of a null space basis */
280 /* ICNTL(26) :: This parameter is only significant with Schur option (ICNTL(19) not zero). : parameter is not considered in the current release of FreeFEM */
29-8 /* ICNTL(27) (Experimental parameter subject to change in next release of MUMPS) :: control the blocking factor for multiple righthand side during the solution phase : parameter is not considered in the current release of FreeFEM */
300 /* ICNTL(28) :: not used in this release of MUMPS*/
310 /* ICNTL(29) :: not used in this release of MUMPS*/
320 /* ICNTL(30) :: not used in this release of MUMPS*/
330 /* ICNTL(31) :: not used in this release of MUMPS*/
340 /* ICNTL(32) :: not used in this release of MUMPS*/
350 /* ICNTL(33) :: not used in this release of MUMPS*/
360 /* ICNTL(34) :: not used in this release of MUMPS*/
370 /* ICNTL(35) :: not used in this release of MUMPS*/
380 /* ICNTL(36) :: not used in this release of MUMPS*/
390 /* ICNTL(37) :: not used in this release of MUMPS*/
400 /* ICNTL(38) :: not used in this release of MUMPS*/
411 /* ICNTL(39) :: not used in this release of MUMPS*/
420 /* ICNTL(40) :: not used in this release of MUMPS*/
430.01 /* CNTL(1) :: relative threshold for numerical pivoting */
441e-8 /* CNTL(2) :: stopping criteria for iterative refinement */
45-1 /* CNTL(3) :: threshold for null pivot detection */
46-1 /* CNTL(4) :: determine the threshold for partial pivoting */
470.0 /* CNTL(5) :: fixation for null pivots */
480 /* CNTL(6) :: not used in this release of MUMPS */
490 /* CNTL(7) :: not used in this release of MUMPS */
500 /* CNTL(8) :: not used in this release of MUMPS */
510 /* CNTL(9) :: not used in this release of MUMPS */
520 /* CNTL(10) :: not used in this release of MUMPS */
530 /* CNTL(11) :: not used in this release of MUMPS */
540 /* CNTL(12) :: not used in this release of MUMPS */
550 /* CNTL(13) :: not used in this release of MUMPS */
560 /* CNTL(14) :: not used in this release of MUMPS */
570 /* CNTL(15) :: not used in this release of MUMPS */


If no solver parameter is given, we used default option of MUMPS solver.

Tip

MUMPS example

A simple example of calling MUMPS in FreeFEM with this two methods is given in the Test solver MUMPS example.

#### SuperLU distributed solver

The package SuperLU_DIST solves linear systems using LU factorization. It is a free scientific library

This library provides functions to handle square or rectangular matrix in real and complex arithmetics. The method implemented in SuperLU_DIST is a supernodal method. New release of this package includes a parallel symbolic factorization. This scientific library is written in C and MPI for communications.

SuperLU_DIST parameters:

We describe now some parameters of SuperLU_DIST. The SuperLU_DIST library use a 2D-logical process group. This process grid is specified by $$nprow$$ (process row) and $$npcol$$ (process column) such that $$N_{p} = nprow \: npcol$$ where $$N_{p}$$ is the number of all process allocated for SuperLU_DIST.

The input matrix parameters is controlled by “matrix=” in sparams for internal parameter or in the third line of parameters file. The different value are

• matrix=assembled global matrix are available on all process

• matrix=distributedglobal The global matrix is distributed among all the process

• matrix=distributed The input matrix is distributed (not yet implemented)

The option arguments of SuperLU_DIST are described in the section Users-callable routine of the SuperLU users’ guide.

The parameter Fact and TRANS are specified in FreeFEM interfaces to SuperLU_DIST during the different steps. For this reason, the value given by the user for this option is not considered.

The factorization LU is calculated in SuperLU_DIST on the matrix $$A_p$$.

$A_{p} = P_{c} \: P_r \: D_r \: A \: D_{c} \: P_{c}^{t}$

where $$P_c$$ and $$P_r$$ is the row and column permutation matrix respectively, $$D_r$$ and $$D_c$$ are diagonal matrix for respectively row and column scaling.

The option argument RowPerm (resp. ColPerm) control the row (resp. column) permutation matrix. $$D_r$$ and $$D_c$$ is controlled by the parameter DiagScale.

The parameter permr, permc, scaler, and scalec in FreeFEM is provided to give row permutation, column permutation, row scaling and column scaling of the user respectively.

The other parameters for LU factorization are ParSymFact and ReplaceTinyPivot. The parallel symbolic factorization works only on a power of two processes and need the ParMetis ordering. The default option argument of SuperLU_DIST are given in the file ffsuperlu_dist_fileparam.txt.

Calling SuperLU_DIST in FreeFEM

To call SuperLU_DIST in FreeFEM, we need to load the library dynamic correspond to interface. This done by the following line load "real_superlu _DIST_FreeFem" (resp. load "complex_superlu_DIST_FreeFem") for real (resp. complex) arithmetics in the file .edp.

Solver parameters is defined in .edp file:

To call SuperLU_DIST with internal parameter, we used the parameters sparams. The value of parameters of SuperLU_DIST in sparams are defined by :

• nprow=1,

• npcol=1,

• matrix= distributedgloba,

• Fact= DOFACT,

• Equil=NO,

• ParSymbFact=NO,

• ColPerm= MMD_AT_PLUS_A,

• RowPerm= LargeDiag,

• DiagPivotThresh=1.0,

• IterRefine=DOUBLE,

• Trans=NOTRANS,

• ReplaceTinyPivot=NO,

• SolveInitialized=NO,

• PrintStat=NO,

• DiagScale=NOEQUIL

This value correspond to the parameter in the file ffsuperlu_dist_fileparam.txt. If one parameter is not specified by the user, we take the default value of SuperLU_DIST.

Reading solver parameters on a file: The structure of data file for SuperLU_DIST in FreeFEM is given in the file ffsuperlu_dist_fileparam.txt (default value of the FreeFEM interface).

 11 /* nprow : integer value */
21 /* npcol : integer value */
3distributedglobal /* matrix input : assembled, distributedglobal, distributed */
4DOFACT /* Fact : DOFACT, SamePattern, SamePattern_SameRowPerm, FACTORED */
5NO /* Equil : NO, YES */
6NO /* ParSymbFact : NO, YES */
7MMD_AT_PLUS_A /* ColPerm : NATURAL, MMD_AT_PLUS_A, MMD_ATA, METIS_AT_PLUS_A, PARMETIS, MY_PERMC */
8LargeDiag /* RowPerm : NOROWPERM, LargeDiag, MY_PERMR */
91.0 /* DiagPivotThresh : real value */
10DOUBLE /* IterRefine : NOREFINE, SINGLE, DOUBLE, EXTRA */
11NOTRANS /* Trans : NOTRANS, TRANS, CONJ*/
12NO /* ReplaceTinyPivot : NO, YES*/
13NO /* SolveInitialized : NO, YES*/
14NO /* RefineInitialized : NO, YES*/
15NO /* PrintStat : NO, YES*/
16NOEQUIL /* DiagScale : NOEQUIL, ROW, COL, BOTH*/


If no solver parameter is given, we used default option of SuperLU_DIST solver.

Tip

A simple example of calling SuperLU_DIST in FreeFEM with this two methods is given in the Solver superLU_DIST example.

#### PaStiX solver

PaStiX (Parallel Sparse matrix package) is a free scientific library under CECILL-C license. This package solves sparse linear system with a direct and block ILU(k) iterative methods. his solver can be applied to a real or complex matrix with a symmetric pattern.

PaStiX parameters:

The input matrix parameter of FreeFEM depend on PaStiX interface. matrix = assembled for non distributed matrix. It is the same parameter for SuperLU_DIST.

There are four parameters in PaStiX : iparm, dparm, perm and invp. These parameters are respectively the integer parameters (vector of size 64), real parameters (vector of size 64), permutation matrix and inverse permutation matrix respectively. iparm and dparm vectors are described in PaStiX RefCard.

The parameters permr and permc in FreeFEM are provided to give permutation matrix and inverse permutation matrix of the user respectively.

Solver parameters defined in .edp file:

To call PaStiX in FreeFEM in this case, we need to specify the parameters lparams and dparams. These parameters are defined by :

$$\forall i$$ = 0,… ,63, lparams[i] = iparm[i].

$$\forall i$$ = 0,… ,63, dparams[i] = dparm[i].

Reading solver parameters on a file:

The structure of data file for PaStiX parameters in FreeFEM is: first line structure parameters of the matrix and in the following line the value of vectors iparm and dparm in this order.

 1assembled /* matrix input :: assembled, distributed global and distributed */
2iparm[0]
3iparm[1]
4...
5...
6iparm[63]
7dparm[0]
8dparm[1]
9...
10...
11dparm[63]


An example of this file parameter is given in ffpastix_iparm_dparm.txt with a description of these parameters. This file is obtained with the example file iparm.txt and dparm.txt including in the PaStiX package.

If no solver parameter is given, we use the default option of PaStiX solver.

Tip

A simple example of calling PaStiX in FreeFEM with this two methods is given in the Solver PaStiX example.

In Table 4, we recall the different matrix considering in the different direct solvers.

Table 4 Type of matrix used by the different direct sparse solver

direct solver

square matrix

rectangular matrix

sym

sym pattern

unsym

sym

sym pattern

unsym

SuperLU_DIST

yes

yes

yes

yes

yes

yes

MUMPS

yes

yes

yes

no

no

no

Pastix

yes

yes

no

no

no

no

### Parallel sparse iterative solver

Concerning iterative solvers, we have chosen pARMS, HIPS and Hypre.

Each software implements a different type of parallel preconditioner.

So, pARMS implements algebraic domain decomposition preconditioner type such as additive Schwartz [CAI1989] and interface method; while HIPS implement hierarchical incomplete factorization and finally HYPRE implements multilevel preconditioner are AMG(Algebraic MultiGrid) and parallel approximated inverse.

To use one of these programs in FreeFEM, you have to install it independently of FreeFEM. It is also necessary to install the MPI communication library which is essential for communication between the processors and, in some cases, software partitioning graphs like METIS or Scotch.

All this preconditioners are used with Krylov subspace methods accelerators.

Krylov subspace methods are iterative methods which consist in finding a solution $$x$$ of linear system $$Ax=b$$ inside the affine space $$x_0+K_m$$ by imposing that $$b-Ax \bot \mathcal{L}_m$$, where $$K_m$$ is Krylov subspace of dimension $$m$$ defined by $$K_m=\{r_0, Ar_0, A^2r_0,...,A^{m-1}r_0\}$$ and $$\mathcal{L}_m$$ is another subspace of dimension $$m$$ which depends on type of Krylov subspace. For example in GMRES, $$\mathcal{L}_m=AK_m$$.

We realized an interface which is easy to use, so that the call of these different softwares in FreeFEM is done in the same way. You just have to load the solver and then specify the parameters to apply to the specific solvers. In the rest of this chapter, when we talk about Krylov subspace methods we mean one among GMRES, CG and BICGSTAB.

#### pARMS solver

pARMS (parallel Algebraic Multilevel Solver) is a software developed by Youssef Saad and al at University of Minnesota.

This software is specialized in the resolution of large sparse non symmetric linear systems of equation. Solvers developed in pARMS are of type “Krylov’s subspace”.

It consists of variants of GMRES like FGMRES (Flexible GMRES), DGMRES (Deflated GMRES) [SAAD2003] and BICGSTAB. pARMS also implements parallel preconditioner like RAS (Restricted Additive Schwarz) [CAI1989] and Schur Complement type preconditioner.

All these parallel preconditioners are based on the principle of domain decomposition. Thus, the matrix $$A$$ is partitioned into sub matrices $$A_i$$($$i=1,...,p$$) where p represents the number of partitions one needs. The union of $$A_i$$ forms the original matrix. The solution of the overall system is obtained by solving the local systems on $$A_i$$ (see [SMITH1996]). Therefore, a distinction is made between iterations on $$A$$ and the local iterations on $$A_i$$.

To solve the local problem on $$A_i$$ there are several preconditioners as ilut (Incomplete LU with threshold), iluk (Incomplete LU with level of fill in) and ARMS (Algebraic Recursive Multilevel Solver).

Tip

Default parameters

 1load "parms_FreeFem" //Tell FreeFem that you will use pARMS
2
3// Mesh
4border C(t=0, 2*pi){x=cos(t); y=sin(t); label=1;}
5mesh Th = buildmesh (C(50));
6
7// Fespace
8fespace Vh(Th, P2); Vh u, v;
9
10// Function
11func f= x*y;
12
13// Problem
14problem Poisson (u, v, solver=sparsesolver)
15    = int2d(Th)(
16          dx(u)*dx(v)
17        + dy(u)*dy(v) )
18    + int2d(Th)(
19        - f*v
20    )
21    + on(1, u=0) ;
22
23// Solve
24real cpu = clock();
25Poisson;
26cout << " CPU time = " << clock()-cpu << endl;
27
28// Plot
29plot(u);


In line 1, the pARMS dynamic library is loaded with interface FreeFEM. After this, in line 15 we specify that the bilinear form will be solved by the last sparse linear solver load in memory which, in this case, is pARMS.

The parameters used in pARMS in this case are the default one since the user does not have to provide any parameter.

Note

In order to see the plot of a parallel script, run the command FreeFem++-mpi -glut ffglut script.edp

Here are some default parameters:

• solver=FGMRES,

• Krylov dimension=30,

• Maximum of Krylov=1000,

• Tolerance for convergence=1e-08 (see book [SAAD2003] to understand all this parameters),

• preconditionner=Restricted Additif Schwarz [CAI1989],

• Inner Krylov dimension=5,

• Maximum of inner Krylov dimension=5,

• Inner preconditionner=ILUK.

To specify the parameters to apply to the solver, the user can either give an integer vector for integer parameters and real vectors for real parameters or provide a file which contains those parameters.

Tip

User specifies parameters inside two vectors

Lets us consider Navier-Stokes example. In this example we solve linear systems coming from discretization of Navier-Stokes equations with pARMS. Parameters of solver is specified by user.

 1load "parms_FreeFem"
2
3// Parameters
4real nu = 1.;
5int[int] iparm(16);
6real[int] dparm(6);
7for (int ii = 0; ii < 16; ii++)
8    iparm[ii] = -1;
9for (int ii = 0; ii < 6; ii++)
10    dparm[ii] = -1.0; iparm[0]=0;
11
12// Mesh
13mesh Th = square(10, 10);
14int[int] wall = [1, 3];
15int inlet = 4;
16
17// Fespace
18fespace Vh(Th, [P2, P2, P1]);
19
20// Function
21func uc = 1.;
22
23// Problem
24varf Stokes ([u, v, p], [ush, vsh, psh], solver=sparsesolver)
25    = int2d(Th)(
26          nu*(
27            dx(u)*dx(ush)
28            + dy(u)*dy(ush)
29            + dx(v)*dx(vsh)
30            + dy(v)*dy(vsh)
31        )
32        - p*psh*(1.e-6)
33        - p*(dx(ush) + dy(vsh))
34        - (dx(u) + dy(v))*psh
35    )
36    + on(wall, wall, u=0., v=0.)
37    + on(inlet, u=uc, v=0) ;
38
39matrix AA = Stokes(Vh, Vh);
40set(AA, solver=sparsesolver, lparams=iparm, dparams=dparm); //set pARMS as linear solver
41real[int] bb = Stokes(0, Vh);
42real[int] sol(AA.n);
43sol = AA^-1 * bb;


We need two vectors to specify the parameters of the linear solver. In line 5-6 of the example, we have declared these vectors(int[int] iparm(16); real[int] dparm(6);). In line 7-10 we have initialized these vectors by negative values.

We do this because all parameters values in pARMS are positive and if you do not change the negative values of one entry of this vector, the default value will be set.

In Table 7 and Table 8, we have the meaning of different entries of these vectors.

We run this example on a cluster paradent of Grid5000 and report results in Table 5.

Table 5 Convergence and time for solving linear system

$$n=471281$$ $$nnz=13\times10^6$$ $$Te=571.29$$

np

shur(iluk)

nit

time

nit

time

4

230

637.57

21

557.8

8

240

364.12

22

302.25

16

247

212.07

24

167.5

32

261

111.16

25

81.5

Table 6 Legend of Table 5

n

matrix size

nnz

number of non null entries inside matrix

nit

number of iteration for convergence

time

Time for convergence

Te

Time for constructing finite element matrix

np

number of processor

In this example, we fix the matrix size (in term of finite element, we fix the mesh) and increase the number of processors used to solve the linear system. We saw that, when the number of processors increases, the time for solving the linear equation decreases, even if the number of iteration increases. This proves that, using pARMS as solver of linear systems coming from discretization of partial differential equation in FreeFEM can decrease drastically the total time of simulation.

Table 7 Meaning of lparams corresponding variables

Entries of iparm

Significations of each entries

iparm[0]

Krylov subspace methods

Different values for this parameters are specify on Table 9

iparm[1]

Preconditionner

Different preconditionners for this parameters are specify on Table 10

iparm[2]

Krylov subspace dimension in outer iteration: default value 30

iparm[3]

Maximum of iterations in outer iteration: default value 1000

iparm[4]

Number of level in arms when used

iparm[5]

Krylov subspace dimension in inner iteration: default value 3

iparm[6]

Maximum of iterations in inner iteration: default value 3

iparm[7]

Symmetric(=1 for symmetric) or unsymmetric matrix:

default value 0(unsymmetric matrix)

iparm[8]

Overlap size between different subdomain: default value 0(no overlap)

iparm[9]

Scale the input matrix or not: Default value 1 (Matrix should be scaled)

iparm[10]

Block size in arms when used: default value 20

iparm[11]

lfil0 (ilut, iluk, and arms) : default value 20

iparm[12]

lfil for Schur complement const : default value 20

iparm[13]

lfil for Schur complement const : default value 20

iparm[14]

Multicoloring or not in ILU when used : default value 1

iparm[15]

Inner iteration : default value 0

iparm[16]

Print message when solving: default 0 (no message print)

• 0: no message is print,

• 1: Convergence informations like number of iteration and residual,

• 2: Timing for a different step like preconditioner,

• 3 : Print all informations

Table 8 Significations of dparams corresponding variables

Entries of dparm

Significations of each entries

dparm[0]

precision for outer iteration : default value 1e-08

dparm[1]

precision for inner iteration: default value 1e-2

dparm[2]

tolerance used for diagonal domain: : default value 0.1

dparm[3]

drop tolerance droptol0 (ilut, iluk, and arms) : default value 1e-2

dparm[4]

droptol for Schur complement const: default value 1e-2

dparm[5]

droptol for Schur complement const: default value 1e-2

Table 9 Krylov Solvers in pARMS

Values of iparm[0]

Krylov subspace methods

0

FGMRES (Flexible GMRES)

1

DGMRES (Deflated GMRES)

2

BICGSTAB

Table 10 Preconditionners in pARMS

Values of iparm[1]

Preconditionners type

0

additive Schwartz preconditioner with ilu0 as local preconditioner

1

additive Schwartz preconditioner with iluk as local preconditioner

2

additive Schwartz preconditioner with ilut as local preconditioner

3

additive Schwartz preconditioner with arms as local preconditioner

4

Left Schur complement preconditioner with ilu0 as local preconditioner

5

Left Schur complement preconditioner with ilut as local preconditioner

6

Left Schur complement preconditioner with iluk as local preconditioner

7

Left Schur complement preconditioner with arms as local preconditioner

8

Right Schur complement preconditioner with ilu0 as local preconditioner

9

Right Schur complement preconditioner with ilut as local preconditioner

10

Right Schur complement preconditioner with iluk as local preconditioner

11

Right Schur complement preconditioner with arms as local preconditioner

12

sch_gilu0, Schur complement preconditioner with global ilu0

13

SchurSymmetric GS preconditioner

#### Interfacing with HIPS

HIPS (Hierarchical Iterative Parallel Solver) is a scientific library that provides an efficient parallel iterative solver for very large sparse linear systems. HIPS is available as free software under the CeCILL-C licence.

HIPS implements two solver classes which are the iteratives class (GMRES, PCG) and the Direct class. Concerning preconditionners, HIPS implements a type of multilevel ILU. For further informations on those preconditionners see the HIPS documentation.

Tip

Laplacian 3D solved with HIPS

Let us consider the 3D Laplacian example inside FreeFEM package where after discretization we want to solve the linear equation with HIPS.

The following example is a Laplacian 3D using Hips as linear solver. We first load Hips solver at line 2. From line 7 to 18 we specify the parameters for the Hips solver and in line 82 we set these parameters in the linear solver.

In Table 11 results of running on Cluster Paradent of Grid5000 are reported. We can see in this running example the efficiency of parallelism.

 1load "msh3"
3
4// Parameters
5int nn = 10;
6real zmin = 0, zmax = 1;
7int[int] iparm(14);
8real[int] dparm(6);
9for (int iii = 0; iii < 14; iii++)
10    iparm[iii] = -1;
11for (int iii = 0; iii < 6; iii++)
12    dparm[iii] = -1;
13iparm[0] = 0; //use iterative solver
14iparm[1] = 1; //PCG as Krylov method
15iparm[4] = 0; //Matrix are symmetric
16iparm[5] = 1; //Pattern are also symmetric
17iparm[9] = 1; //Scale matrix
18dparm[0] = 1e-13; //Tolerance to convergence
19dparm[1] = 5e-4; //Threshold in ILUT
20dparm[2] = 5e-4; //Threshold for Schur preconditionner
21
22// Functions
23func ue = 2*x*x + 3*y*y + 4*z*z + 5*x*y + 6*x*z + 1;
24func uex = 4*x + 5*y + 6*z;
25func uey = 6*y + 5*x;
26func uez = 8*z + 6*x;
27func f = -18.;
28
29// Mesh
30mesh Th2 = square(nn, nn);
31
32int[int] rup = [0,2], rdown=[0, 1];
33int[int] rmid=[1, 1, 2, 1, 3, 1, 4, 1];
34
35mesh3 Th=buildlayers(Th2, nn, zbound=[zmin, zmax], reffacemid=rmid,
36    reffaceup = rup, reffacelow = rdown);
37
38// Fespace
39fespace Vh2(Th2, P2);
40Vh2 ux, uz, p2;
41
42fespace Vh(Th, P2);
43Vh uhe = ue;
44cout << "uhe min =" << uhe[].min << ", max =" << uhe[].max << endl;
45Vh u, v;
46Vh F;
47
48// Macro
49macro Grad3(u) [dx(u), dy(u), dz(u)] //
50
51// Problem
52varf va (u, v)
53    = int3d(Th)(
55    )
56    + int2d(Th, 2)(
57          u*v
58    )
59    - int3d(Th)(
60          f*v
61    )
62    - int2d(Th, 2)(
63          ue*v + (uex*N.x + uey*N.y + uez*N.z)*v
64    )
65    + on(1, u=ue);
66
67varf l (unused, v) = int3d(Th)(f*v);
68
69real cpu=clock();
70matrix Aa = va(Vh, Vh);
71
72F[] = va(0, Vh);
73
74if (mpirank == 0){
75    cout << "Size of A =" << Aa.n << endl;
76    cout << "Non zero coefficients =" << Aa.nbcoef << endl;
77    cout << "CPU TIME FOR FORMING MATRIX =" << clock()-cpu << endl;
78}
79
80set(Aa, solver=sparsesolver, dparams=dparm, lparams=iparm); //Set hips as linear solver
81
82// Solve
83u[] = Aa^-1*F[];
84
85// Plot
86plot(u);

Table 11 Legend of this table are give in Table 6

$$n=4\times 10^6$$ $$nnz=118 \times 10^6$$ $$Te=221.34$$

np

nit

time

8

190

120.34

16

189

61.08

32

186

31.70

64

183

23.44

Tip

Table 12 Significations of lparams corresponding to HIPS interface

Entries of iparm

Significations of each entries

iparm[0]

Strategy use for solving (Iterative=0 or Hybrid=1 or Direct=2).

Defaults values are : Iterative

iparm[1]

Krylov methods.

If iparm[0]=0, give type of Krylov methods: 0 for GMRES, 1 for PCG

iparm[2]

Maximum of iterations in outer iteration: default value 1000

iparm[3]

Krylov subspace dimension in outer iteration: default value 40

iparm[4]

Symmetric(=0 for symmetric) and 1 for unsymmetricmatrix:

default value 1 (unsymmetric matrix)

iparm[5]

Pattern of matrix are symmetric or not: default value 0

iparm[6]

Partition type of input matrix: default value 0

iparm[7]

Number of level that use the HIPS locally consistentfill-in:

Default value 2

iparm[8]

Numbering in indices array will start at 0 or 1: Default value 0

iparm[9]

Scale matrix. Default value 1

iparm[10]

Reordering use inside subdomains for reducingfill-in:

Only use for iterative. Default value 1

iparm[11]

Number of unknowns per node in the matrix non-zeropattern graph:

Default value 1

iparm[12]

This value is used to set the number of time the

normalization is applied to the matrix: Default 2.

iparm[13]

Level of informations printed during solving: Default 5.

iparm[14]

HIPS_DOMSIZE Subdomain size

 dparm[0] HIPS_PREC: Relative residual norm: Default=1e-9 dparm[1] HIPS_DROPTOL0: Numerical threshold in ILUT for interior domain (important : set 0.0 in HYBRID: Default=0.005) dparm[2] HIPS_DROPTOL1 : Numerical threshold in ILUT for Schur preconditioner: Default=0.005 dparm[3] HIPS_DROPTOLE : Numerical threshold for coupling between the interior level and Schur: Default 0.005 dparm[4] HIPS_AMALG : Numerical threshold for coupling between the interior level and Schur: Default=0.005 dparm[5] HIPS_DROPSCHUR : Numerical threshold for coupling between the interior level and Schur: Default=0.005

#### Interfacing with HYPRE

Hypre (High Level Preconditioner) is a suite of parallel preconditioner developed at Lawrence Livermore National Lab.

There are two main classes of preconditioners developed in HYPRE: AMG (Algebraic MultiGrid) and Parasails (Parallel Sparse Approximate Inverse).

Now, suppose we want to solve $$Ax=b$$.

At the heart of AMG there is a series of progressively coarser (smaller) representations of the matrix $$A$$. Given an approximation $$\hat{x}$$ to the solution $$x$$, consider solving the residual equation $$Ae=r$$ to find the error $$e$$, where $$r=b-A\hat{x}$$. A fundamental principle of AMG is that it is an algebraically smooth error. To reduce the algebraically smooth errors further, they need to be represented by a smaller defect equation (coarse grid residual equation) $$A_ce_c=r_c$$, which is cheaper to solve. After solving this coarse equation, the solution is then interpolated in fine grid represented here by matrix $$A$$. The quality of AMG depends on the choice of coarsening and interpolating operators.

The sparse approximate inverse approximates the inverse of a matrix $$A$$ by a sparse matrix $$M$$. A technical idea to construct matrix $$M$$ is to minimize the Frobenuis norm of the residual matrix $$I-MA$$. For more details on this preconditioner technics see [CHOW1997].

HYPRE implement three Krylov subspace solvers: GMRES, PCG and BiCGStab.

Tip

Laplacian 3D solved with HYPRE

Let us consider again the 3D Laplacian example inside FreeFEM package where after discretization we want to solve the linear equation with Hypre. The following example is a Laplacian 3D using Hypre as linear solver. This is the same example as Hips one, so we just show here the lines where we set some Hypre parameters.

We first load the Hypre solver at line 2. From line 6 to 18 we specifies the parameters to set to Hypre solver and in line 22 we set parameters to Hypre solver.

It should be noted that the meaning of the entries of these vectors is different from those of Hips. In the case of HYPRE, the meaning of differents entries of vectors iparm and dparm are given in Table 14 to Table 18.

In Table 19 the results of running on Cluster Paradent of Grid5000 are reported. We can see in this running example the efficiency of parallelism, in particular when AMG are use as preconditioner.

 1load "msh3"
3
4// Parameters
5int nn = 10;
6int[int] iparm(20);
7real[int] dparm(6);
8for (int iii = 0; iii < 20; iii++)
9    iparm[iii] = -1;
10for (int iii = 0; iii < 6; iii++)
11    dparm[iii] = -1;
12iparm[0] = 2; //PCG as krylov method
13iparm[1] = 0; //AMG as preconditionner 2: if ParaSails
14iparm[7] = 7; //Interpolation
15iparm[9] = 6; //AMG Coarsen type
16iparm[10] = 1; //Measure type
17iparm[16] = 2; //Additive schwarz as smoother
18dparm[0] = 1e-13; //Tolerance to convergence
19dparm[1] = 5e-4; //Threshold
20dparm[2] = 5e-4; //Truncation factor
21
22...
23
24set(Aa, solver=sparsesolver, dparams=dparm, lparams=iparm);

 iparms[0] Solver identification: 0: BiCGStab, 1: GMRES, 2: PCG. Default=1 iparms[1] Preconditioner identification: 0: BOOMER AMG, 1: PILUT, 2: Parasails, 3: Schwartz Default=0 iparms[2] Maximum of iteration: Default=1000 iparms[3] Krylov subspace dim: Default= 40 iparms[4] Solver print info level: Default=2 iparms[5] Solver log: Default=1 iparms[6] Solver stopping criteria only for BiCGStab : Default=1 dparms[0] Tolerance for convergence: Default=:math:1.0e-11
 iparms[7] AMG interpolation type: Default=6 iparms[8] Specifies the use of GSMG - geometrically smooth coarsening and interpolation: Default=1 iparms[9] AMG coarsen type: Default=6 iparms[10] Defines whether local or global measures are used: Default=1 iparms[11] AMG cycle type: Default=1 iparms[12] AMG Smoother type: Default=1 iparms[13] AMG number of levels for smoothers: Default=3 iparms[14] AMG number of sweeps for smoothers: Default=2 iparms[15] Maximum number of multigrid levels: Default=25 iparms[16] Defines which variant of the Schwartz method isused: 0: hybrid multiplicative Schwartz method (no overlap across processor boundaries) 1: hybrid additive Schwartz method (no overlap across processor boundaries) 2: additive Schwartz method 3: hybrid multiplicative Schwartz method (with overlap across processor boundaries) Default=1 iparms[17] Size of the system of PDEs: Default=1 iparms[18] Overlap for the Schwarz method: Default=1 iparms[19] Type of domain used for the Schwarz method 0: each point is a domain 1: each node is a domain (only of interest in “systems” AMG) 2: each domain is generated by agglomeration (default) dparms[1] AMG strength threshold: Default=0.25 dparms[2] Truncation factor for the interpolation: Default=1e-2 dparms[3] Sets a parameter to modify the definition of strength for diagonal dominant portions of the matrix: Default=0.9 dparms[3] Defines a smoothing parameter for the additive Schwartz method. Default=1
 iparms[7] Row size in Parallel ILUT: Default=1000 iparms[8] Set maximum number of iterations: Default=30 dparms[1] Drop tolerance in Parallel ILUT: Default=1e-5
 iparms[7] Number of levels in Parallel Sparse Approximate inverse: Default=1 iparms[8] Symmetric parameter for the ParaSails preconditioner: 0: nonsymmetric and/or indefinite problem, and nonsymmetric preconditioner 1: SPD problem, and SPD (factored) preconditioner 2: nonsymmetric, definite problem, and SPD (factored) preconditioner Default=0 dparms[1] Filters parameters. The filter parameter is used to drop small nonzeros in the preconditioner, to reduce the cost of applying the preconditioner: Default=0.1 dparms[2] Threshold parameter: Default=0.1
 iparms[7] Defines which variant of the Schwartz method isused: 0: hybrid multiplicative Schwartz method (no overlap across processor boundaries) 1: hybrid additive Schwartz method (no overlap across processor boundaries) 2: additive Schwartz method 3: hybrid multiplicative Schwartz method (with overlap across processor boundaries) Default=1 iparms[8] Overlap for the Schwartz method: Default=1 iparms[9] Type of domain used for the Schwartz method 0: each point is a domain 1: each node is a domain (only of interest in “systems” AMG) 2: each domain is generated by agglomeration (default)
Table 19 Convergence and time for solving linear system

$$n=4\times10^6$$

$$nnz=13\times10^6$$

$$Te = 571.29$$

np

AMG

nit

time

8

6

1491.83

16

5

708.49

32

4

296.22

64

4

145.64

#### Conclusion

With the different runs presented here, we wanted to illustrate the gain in time when we increase the number of processors used for the simulations. We saw that in every case the time for the construction of the finite element matrix is constant. This is normal because until now this phase is sequential in FreeFEM. In contrast, phases for solving the linear system are parallel. We saw on several examples presented here that when we increase the number of processors, in general we decrease the time used for solving the linear systems. But this is not true in every case. In several case, when we increase the number of processors the time to convergence also increases. There are two main reasons for this. First, the increase of processors can lead to the increase of volume of exchanged data across processors consequently increasing the time for solving the linear systems.

Furthermore, in decomposition domain type preconditioners, the number of processors generally corresponds to the number of sub domains. In subdomain methods, generally when we increase the number of subdomains we decrease convergence quality of the preconditioner. This can increase the time used for solving linear equations.

To end this, we should note that good use of the preconditioners interfaced in FreeFEM is empiric, because it is difficult to know what is a good preconditioner for some type of problems. Although, the efficiency of preconditioners sometimes depends on how its parameters are set. For this reason we advise the user to pay attention to the meaning of the parameters in the user guide of the iterative solvers interfaced in FreeFEM.

### Domain decomposition

In the previous section, we saw that the phases to construct a matrix are sequential. One strategy to construct the matrix in parallel is to divide geometrically the domain into subdomains. In every subdomain we construct a local submatrix and after that we assemble every submatrix to form the global matrix.

We can use this technique to solve PDE directly in domain $$\Omega$$. In this case, in every subdomains you have to define artificial boundary conditions to form consistent equations in every subdomains. After this, you solve equation in every subdomains and define a strategy to obtain the global solution.

In terms of parallel programming for FreeFEM, with MPI, this means that the user must be able to divide processors avaible for computation into subgroups of processors and also must be able to realize different type of communications in FreeFEM script. Here is a wrapper of some MPI functions.

#### Communicators and groups

Groups

mpiGroup grpe(mpiGroup gp, KN_<long>): Create MPI_Group from existing group gp by given vector.

Communicators

Communicators is an abstract MPI object which allows MPI user to communicate across group of processors. Communicators can be Intra-communicators(involves a single group) or Inter-communicators (involves two groups). When we not specify type of communicator it will be Intra-communicators

mpiComm cc(mpiComm comm, mpiGroup gp): Creates a new communicator.

comm communicator(handle), gp group which is a subset of the group of comm (handle). Return new communicator

mpiComm cc(mpiGroup gp): Same as previous constructor but default comm here is MPI_COMM_WORLD.

mpiComm cc(mpiComm comm, int color, int key): Creates new communicators based on colors and key. This constructor is based on MPI_Comm_split routine of MPI.

mpiComm cc(MPIrank p, int key): Same constructor than the last one.

Here colors and comm is defined in MPIrank. This constructor is based on MPI_Comm_split routine of MPI.

Tip

Split communicator

1mpiComm comm(mpiCommWorld, 0, 0);
2int color = mpiRank(comm)%2;
3mpiComm ccc(processor(color, comm), 0);
4mpiComm qpp(comm, 0, 0);
5mpiComm cp(ccc, color, 0);


mpiComm cc(mpiComm comm, int high): Creates an intracommunicator from an intercommunicator. comm intercommunicator, high.

Used to order the groups within comm (logical) when creating the new communicator. This constructor is based on MPI_Intercomm_merge routine of MPI.

mpiComm cc(MPIrank p1, MPIrank p2, int tag): This constructor creates an intercommuncator from two intracommunicators. p1 defined local (intra)communicator and rank in local_comm of leader (often 0) while p2 defined remote communicator and rank in peer_comm of remote leader (often 0). tag Message tag to use in constructing intercommunicator. This constructor is based on MPI_Intercomm_create.

Tip

Merge

 1mpiComm comm, cc;
2int color = mpiRank(comm)%2;
3int rk = mpiRank(comm);
4int size = mpiSize(comm);
5cout << "Color values: " << color << endl;
6mpiComm ccc(processor((rk<size/2), comm), rk);
7mpiComm cp(cc, color, 0);
9if (rk == 0){ rleader = size/2; }
10else if (rk == size/2){ rleader = 0; }
12mpiComm qqp(processor(0, ccc), processor(rleader, comm), 12345);
13int aaa = mpiSize(ccc);
14cout << "Number of processor: " << aaa << endl;


#### Process

In FreeFEM we wrap MPI process by function call processor which create internal FreeFEM object call MPIrank. This mean that do not use MPIrank in FreeFEM script.

processor(int rk): Keep process rank inside object MPIrank. Rank is inside MPI_COMM_WORLD.

processor(int rk, mpiComm cc) and processor(mpiComm cc, int rk) process rank inside communicator cc.

processor(int rk, mpiComm cc) and processor(mpiComm cc, int rk) process rank inside communicator cc.

processorblock(int rk): This function is exactlly the same than processor(int rk) but is use in case of blocking communication.

processorblock(int rk, mpiComm cc): This function is exactly the same as processor(int rk, mpiComm cc) but uses a synchronization point.

#### Points to Points communicators

In FreeFEM you can call MPI points to points communications functions.

Send(processor(int rk, mpiComm cc), Data D) : Blocking send of Data D to processor of rank rk inside communicator cc. Note that Data D can be: int, real, complex, int[int], real[int], complex[int], Mesh, Mesh3, Matrix.

Recv(processor(int rk, mpiComm cc), Data D): Receive Data D from process of rank rk in communicator cc.

Note that Data D can be: int, real, complex, int[int], real[int], complex[int], Mesh, Mesh3, Matrix and should be the same type than corresponding send.

Isend(processor(int rk, mpiComm cc), Data D) : Non blocking send of Data D to processor of rank rk inside communicator cc.

Note that Data D can be: int, real, complex, int[int], real[int], complex[int], mesh, mesh3, matrix.

Recv(processor(int rk, mpiComm cc), Data D): Receive corresponding to send.

#### Global operations

In FreeFEM you can call MPI global communication functions.

broadcast(processor(int rk, mpiComm cc), Data D): Process rk Broadcast Data D to all process inside communicator cc. Note that Data D can be: int, real, complex, int[int], real[int], complex[int], Mesh, Mesh3, Matrix.

broadcast(processor(int rk), Data D): Process rk Broadcast Data D to all process inside MPI_COMM_WORLD. Note that Data D can be: int, real, complex, int[int], real[int], complex[int], Mesh, Mesh3, Matrix.

mpiAlltoall(Data a, Data b): Sends data a from all to all processes. Receive buffer is Data b. This is done inside communicator MPI_COMM_WORLD.

mpiAlltoall(Data a, Data b, mpiComm cc): Sends data a from all to all processes. Receive buffer is Data b. This is done inside communicator cc.

mpiGather(Data a, Data b, processor(mpiComm, int rk): Gathers together values Data a from a group of processes. Process of rank rk get data on communicator rk. This function is like MPI_Gather.

mpiAllgather(Data a, Data b): Gathers Data a from all processes and distribute it to all in Data b. This is done inside communicator MPI_COMM_WORLD. This function is like MPI_Allgather.

mpiAllgather(Data a, Data b, mpiComm cc): Gathers Data a from all processes and distribute it to all in Data b. This is done inside communicator cc. This function is like MPI_Allgather.

mpiScatter(Data a,Data b,processor(int rk, mpiComm cc)): Sends Data a from one process whith rank rk to all other processes in group represented by communicator mpiComm cc.

mpiReduce(Data a, Data b, processor(int rk, mpiComm cc), MPI_Op op) Reduces values Data a on all processes to a single value Data b on process of rank rk and communicator cc.

Operation use in reduce is: MPI_Op op which can be: mpiMAX, mpiMIN, mpiSUM, mpiPROD, mpiLAND, mpiLOR, mpiLXOR, mpiBAND, mpiBXOR, mpiMAXLOC, mpiMINLOC.

Note that, for all global operations, only int[int] and real[int]` are data type take in account in FreeFEM.