#### 0. Linear system under test: size 500,000 × 500,000 with nnz = 1.7e7 nonzero entries. Source: an IPC simulation of a lattice compression scenario.
#### 1. AMGCL: CPU backend (most of the time is spent in the solve phase)
| solver | Relaxation | Coarsening | Time cost (s) |
| ---------- | ---------------- | ----------------------------------------------------- | ------------- |
| CG | gauss_seidel | ruge_stuben | 51.55 |
| CG | gauss_seidel | aggregation | 31.56 |
| CG | gauss_seidel | smoothed_aggregation (eps_strong=0.0, block_size = 3) | 38.48 |
| CG | gauss_seidel | smoothed_aggr_emin (eps_strong=0.0, block_size = 3) | 42.70 |
| CG | spai0 | aggregation | 51.30 |
| LGMRES | gauss_seidel | ruge_stuben | 25.54 |
| **LGMRES** | **gauss_seidel** | **aggregation** | **17.86** |
| LGMRES | gauss_seidel | smoothed_aggregation (eps_strong=0.0, block_size = 3) | 30.30 |
| LGMRES | gauss_seidel | smoothed_aggr_emin (eps_strong=0.0, block_size = 3) | 34.64 |
| LGMRES | spai0 | aggregation | 38.44 |
#### 2. AMGCL: CUDA backend (GPU: GeForce GTX 1080 Ti; most of the time is spent in the setup phase)
| solver | Relaxation | Coarsening | Time cost (s) |
| ------------------ | ---------- | ----------------------------------------------------- | ------------- |
| CG | spai0 | ruge_stuben | 7.35 |
| CG | spai0 | aggregation | 4.30 |
| CG | spai0 | smoothed_aggregation (eps_strong=0.0, block_size = 3) | 9.98 |
| CG | spai0 | smoothed_aggr_emin (eps_strong=0.0, block_size = 3) | 14.78 |
| BiCGStabL(L=2) | spai0 | ruge_stuben | 7.85 |
| **BiCGStabL(L=2)** | **spai0** | **aggregation** | **3.88** |
| BiCGStabL(L=2) | spai0 | smoothed_aggregation (eps_strong=0.0, block_size = 3) | 10.10 |
| BiCGStabL(L=2) | spai0 | smoothed_aggr_emin (eps_strong=0.0, block_size = 3) | 15.07 |
| LGMRES | spai0 | ruge_stuben | 7.55 |
| LGMRES | spai0 | aggregation | 4.22 |
| LGMRES | spai0 | smoothed_aggregation (eps_strong=0.0, block_size = 3) | 10.06 |
| LGMRES | spai0 | smoothed_aggr_emin (eps_strong=0.0, block_size = 3) | 14.44 |
#### 3. Time required to solve the same system with CHOLMOD and cuSolver
| Solver | Time cost (s) |
| ------------------------------------------------------------ | ------------- |
| CHOLMOD | 9.05 |
| cuSolver (GPU: GeForce GTX 1080 Ti, solver=chol, reorder=metis) | 17.11 |
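
To put the direct-solver timings next to the AMGCL results, the following minimal sketch divides each direct-solver time by the best AMGCL time on each backend (the bold rows in sections 1 and 2). The dictionary names are illustrative; the numbers are copied from the tables.

```python
# Times in seconds, copied from the tables in sections 1-3.
direct = {"CHOLMOD": 9.05, "cuSolver": 17.11}
amgcl_best = {"CPU": 17.86, "CUDA": 3.88}  # fastest AMGCL run per backend

# ratio > 1 means the AMGCL run was faster than the direct solver.
ratios = {
    (solver, backend): direct[solver] / amgcl_best[backend]
    for solver in direct
    for backend in amgcl_best
}

for (solver, backend), r in sorted(ratios.items()):
    print(f"{solver} / AMGCL {backend}: {r:.2f}")
```

On these numbers, only the CUDA backend beats both direct solvers; the best CPU run is slower than CHOLMOD and about on par with cuSolver.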
#### 4. Summary of choosing the backend, solver, preconditioner, relaxation, and coarsening in AMGCL (applies only to this linear system, not to all cases)
##### Backend:
* CUDA is roughly **5-12x** faster than the CPU (builtin) backend
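
As a rough cross-check of this range, the sketch below recomputes speedups from the two tables. Only two configurations were measured with the same solver, relaxation, and coarsening on both backends, since Gauss-Seidel is not usable on CUDA. The dictionary layout and variable names are illustrative and not taken from the original benchmark scripts.

```python
# Times in seconds, copied from the CPU and CUDA tables.
# Key: (solver, relaxation, coarsening).
cpu = {
    ("CG", "spai0", "aggregation"): 51.30,
    ("LGMRES", "spai0", "aggregation"): 38.44,
}
cuda = {
    ("CG", "spai0", "aggregation"): 4.30,
    ("LGMRES", "spai0", "aggregation"): 4.22,
}

# Like-for-like speedup: only configurations measured on both backends.
speedups = {cfg: cpu[cfg] / cuda[cfg] for cfg in cpu.keys() & cuda.keys()}
for cfg, s in sorted(speedups.items()):
    print(cfg, f"{s:.2f}x")

# Best CPU run vs. best CUDA run (these use different configurations).
best_vs_best = 17.86 / 3.88
print(f"best vs. best: {best_vs_best:.2f}x")
```

The two like-for-like pairs come out at roughly 12x and 9x, while the best-versus-best comparison is about 4.6x.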
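##### Solver:
* On CPU, the faster solvers rank FGMRES > LGMRES > CG; BiCGStab was not tested; IDR(s) and Richardson are slower
* On CUDA, CG, BiCGStab, LGMRES, and FGMRES perform similarly and are the faster group; IDR(s) and Richardson are slower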
##### Preconditioner:
* AMG is fast; other preconditioners were not tested. The composite preconditioners target saddle-point or Stokes-like systems
##### Relaxation:
* On CPU, faster: Gauss-Seidel and SPAI0; slower: Damped Jacobi, Chebyshev, ILU, SPAI1
* On CUDA, faster: SPAI0; slower: Damped Jacobi, Chebyshev, ILU, SPAI1; not usable: Gauss-Seidel
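##### Coarsening:
* CPU and CUDA lead to the same ranking, fastest to slowest: aggregation > ruge_stuben > smoothed_aggregation > smoothed_aggr_emin. The one exception in the tables is CG on CPU, where ruge_stuben was the slowest (51.55 s)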
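
The coarsening ranking can be re-derived from the tables with a short sketch like the one below, which sorts the four coarsening strategies by time within each (backend, solver) group. The CPU groups use only the gauss_seidel rows and the CUDA groups the spai0 rows; the grouping and names are illustrative choices.

```python
# Times in seconds, copied from the tables, grouped by (backend, solver).
times = {
    ("CPU", "CG"): {"ruge_stuben": 51.55, "aggregation": 31.56,
                    "smoothed_aggregation": 38.48, "smoothed_aggr_emin": 42.70},
    ("CPU", "LGMRES"): {"ruge_stuben": 25.54, "aggregation": 17.86,
                        "smoothed_aggregation": 30.30, "smoothed_aggr_emin": 34.64},
    ("CUDA", "CG"): {"ruge_stuben": 7.35, "aggregation": 4.30,
                     "smoothed_aggregation": 9.98, "smoothed_aggr_emin": 14.78},
    ("CUDA", "BiCGStabL"): {"ruge_stuben": 7.85, "aggregation": 3.88,
                            "smoothed_aggregation": 10.10, "smoothed_aggr_emin": 15.07},
    ("CUDA", "LGMRES"): {"ruge_stuben": 7.55, "aggregation": 4.22,
                         "smoothed_aggregation": 10.06, "smoothed_aggr_emin": 14.44},
}

# Sort coarsening names by measured time, fastest first, within each group.
rankings = {group: sorted(t, key=t.get) for group, t in times.items()}

for group, order in rankings.items():
    print(group, " > ".join(order))
```

Plain aggregation is fastest in every group; the relative position of ruge_stuben is the only thing that changes between groups.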