#### 0. Linear system under test: size 500,000 × 500,000, with 1.7e7 nonzero entries (nnz). Source: an IPC simulation of a lattice compression scenario.

#### 1. AMGCL: CPU backend (time is spent mainly in the solve phase)

| Solver     | Relaxation       | Coarsening                                            | Time cost (s) |
| ---------- | ---------------- | ----------------------------------------------------- | ------------- |
| CG         | gauss_seidel     | ruge_stuben                                           | 51.55         |
| CG         | gauss_seidel     | aggregation                                           | 31.56         |
| CG         | gauss_seidel     | smoothed_aggregation (eps_strong=0.0, block_size = 3) | 38.48         |
| CG         | gauss_seidel     | smoothed_aggr_emin (eps_strong=0.0, block_size = 3)   | 42.70         |
| CG         | spai0            | aggregation                                           | 51.30         |
| LGMRES     | gauss_seidel     | ruge_stuben                                           | 25.54         |
| **LGMRES** | **gauss_seidel** | **aggregation**                                       | **17.86**     |
| LGMRES     | gauss_seidel     | smoothed_aggregation (eps_strong=0.0, block_size = 3) | 30.30         |
| LGMRES     | gauss_seidel     | smoothed_aggr_emin (eps_strong=0.0, block_size = 3)   | 34.64         |
| LGMRES     | spai0            | aggregation                                           | 38.44         |
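
For readers who want to rerun the fastest row, the sketch below writes that configuration down with the parameter names used by AMGCL's runtime interface (`solver.type`, `precond.class`, `precond.relax.type`, `precond.coarsening.type`) and flattens it into `name=value` strings. This is a sketch under assumptions: these notes do not say how the parameters were actually passed, and the `flatten` helper is hypothetical, written only for illustration.

```python
# Fastest CPU row from the table above, expressed as a nested parameter tree.
# Assumption: AMGCL's runtime (property-tree) interface is being used.
best_cpu = {
    "solver": {"type": "lgmres"},
    "precond": {
        "class": "amg",
        "relax": {"type": "gauss_seidel"},
        "coarsening": {"type": "aggregation"},
    },
}


def flatten(tree, prefix=""):
    """Turn a nested dict into sorted 'a.b.c=value' strings (hypothetical helper)."""
    items = []
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            items.extend(flatten(value, path))
        else:
            items.append(f"{path}={value}")
    return sorted(items)


if __name__ == "__main__":
    for line in flatten(best_cpu):
        print(line)
```

Swapping the three `type` values reproduces any other row of the table in the same form.
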
#### 2. AMGCL: CUDA backend (GPU: GeForce GTX 1080 Ti; time is spent mainly in the setup phase)

| Solver             | Relaxation | Coarsening                                            | Time cost (s) |
| ------------------ | ---------- | ----------------------------------------------------- | ------------- |
| CG                 | spai0      | ruge_stuben                                           | 7.35          |
| CG                 | spai0      | aggregation                                           | 4.30          |
| CG                 | spai0      | smoothed_aggregation (eps_strong=0.0, block_size = 3) | 9.98          |
| CG                 | spai0      | smoothed_aggr_emin (eps_strong=0.0, block_size = 3)   | 14.78         |
| BiCGStabL(L=2)     | spai0      | ruge_stuben                                           | 7.85          |
| **BiCGStabL(L=2)** | **spai0**  | **aggregation**                                       | **3.88**      |
| BiCGStabL(L=2)     | spai0      | smoothed_aggregation (eps_strong=0.0, block_size = 3) | 10.10         |
| BiCGStabL(L=2)     | spai0      | smoothed_aggr_emin (eps_strong=0.0, block_size = 3)   | 15.07         |
| LGMRES             | spai0      | ruge_stuben                                           | 7.55          |
| LGMRES             | spai0      | aggregation                                           | 4.22          |
| LGMRES             | spai0      | smoothed_aggregation (eps_strong=0.0, block_size = 3) | 10.06         |
| LGMRES             | spai0      | smoothed_aggr_emin (eps_strong=0.0, block_size = 3)   | 14.44         |
#### 3. Time to solve the same system with CHOLMOD and cuSolver

| Solver                                                         | Time cost (s) |
| -------------------------------------------------------------- | ------------- |
| CHOLMOD                                                        | 9.05          |
| cuSolver (GPU: GeForce GTX 1080 Ti, solver=chol, reorder=metis) | 17.11         |
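
To put the direct-solver timings next to the AMGCL results, the arithmetic below divides each direct-solver time by the best AMGCL time per backend. All numbers are copied from the tables in this document; the names `direct`, `best_amgcl`, and `ratio` are illustrative only.

```python
# Times in seconds, copied from the tables in this document.
direct = {"CHOLMOD": 9.05, "cuSolver": 17.11}
best_amgcl = {
    "CPU (LGMRES, gauss_seidel, aggregation)": 17.86,
    "CUDA (BiCGStabL, spai0, aggregation)": 3.88,
}


def ratio(direct_time, amgcl_time):
    """Direct-solver time divided by AMGCL time; above 1 means AMGCL is faster."""
    return round(direct_time / amgcl_time, 2)


if __name__ == "__main__":
    for d_name, d_time in direct.items():
        for a_name, a_time in best_amgcl.items():
            print(f"{d_name} / AMGCL {a_name}: {ratio(d_time, a_time)}x")
```

On these numbers the best CUDA configuration is about 2.3× faster than CHOLMOD and about 4.4× faster than cuSolver, while the best CPU configuration takes roughly twice as long as CHOLMOD.
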
#### 4. Summary of backend, solver, preconditioner, relaxation, and coarsening choices in AMGCL (for this linear system only; not applicable to all cases)

##### Backend:

* CUDA is roughly **5-12×** faster than the CPU (builtin) backend.
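
The 5-12× range can be checked against the tables. The sketch below divides CPU time by CUDA time for the two configurations that were measured on both backends with identical settings (spai0 relaxation, aggregation coarsening), and for the best row of each backend. The numbers are copied from sections 1 and 2; the names `pairs` and `speedup` are illustrative.

```python
# (CPU seconds, CUDA seconds), copied from the tables in sections 1 and 2.
pairs = {
    "CG, spai0, aggregation": (51.30, 4.30),
    "LGMRES, spai0, aggregation": (38.44, 4.22),
    # Best row of each backend; solver and relaxation differ between the two.
    "best CPU vs best CUDA": (17.86, 3.88),
}


def speedup(cpu_seconds, cuda_seconds):
    """How many times faster the CUDA run is than the CPU run."""
    return round(cpu_seconds / cuda_seconds, 2)


if __name__ == "__main__":
    for name, (cpu_s, cuda_s) in pairs.items():
        print(f"{name}: {speedup(cpu_s, cuda_s)}x")
```

The three ratios come out at roughly 11.9×, 9.1×, and 4.6×, which is consistent with the stated range.
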
##### Solver:

* On the CPU, the faster solvers rank FGMRES > LGMRES > CG (fastest first); BiCGStab was not tested; IDR(s) and Richardson are slower.
* On CUDA, CG, BiCGStab, LGMRES, and FGMRES perform similarly and are all fast; IDR(s) and Richardson are slower.

##### Preconditioners:

* AMG is fast; the other preconditioners were not tested. The composite preconditioners target the solution of saddle point or Stokes-like systems.

##### Relaxation:

* On the CPU, faster: Gauss-Seidel and SPAI0; slower: Damped Jacobi, Chebyshev, ILU, SPAI1.
* On CUDA, faster: SPAI0; slower: Damped Jacobi, Chebyshev, ILU, SPAI1; not usable: Gauss-Seidel.

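##### Coarsening:

* The conclusion is the same on CPU and CUDA, fastest first: aggregation > ruge_stuben > smoothed_aggregation > smoothed_aggr_emin. The one exception in the tables above is CG on the CPU, where ruge_stuben (51.55 s) is the slowest of the four.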
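
The ordering can be checked per (backend, solver) group. The sketch below sorts the four coarsening strategies by measured time within each group and reports the groups whose order differs from the claimed one. Times are copied from sections 1 and 2, using only rows where the relaxation is fixed within the group; the names `times`, `ranking`, and `exceptions` are illustrative.

```python
# Seconds per coarsening strategy, grouped by (backend, solver).
# Copied from sections 1 and 2 (gauss_seidel rows on CPU, spai0 rows on CUDA).
times = {
    ("CPU", "CG"): {"ruge_stuben": 51.55, "aggregation": 31.56,
                    "smoothed_aggregation": 38.48, "smoothed_aggr_emin": 42.70},
    ("CPU", "LGMRES"): {"ruge_stuben": 25.54, "aggregation": 17.86,
                        "smoothed_aggregation": 30.30, "smoothed_aggr_emin": 34.64},
    ("CUDA", "CG"): {"ruge_stuben": 7.35, "aggregation": 4.30,
                     "smoothed_aggregation": 9.98, "smoothed_aggr_emin": 14.78},
    ("CUDA", "BiCGStabL"): {"ruge_stuben": 7.85, "aggregation": 3.88,
                            "smoothed_aggregation": 10.10, "smoothed_aggr_emin": 15.07},
    ("CUDA", "LGMRES"): {"ruge_stuben": 7.55, "aggregation": 4.22,
                         "smoothed_aggregation": 10.06, "smoothed_aggr_emin": 14.44},
}

# Claimed order, fastest first.
claimed = ["aggregation", "ruge_stuben", "smoothed_aggregation", "smoothed_aggr_emin"]


def ranking(group):
    """Coarsening names sorted by time, fastest first."""
    return sorted(group, key=group.get)


def exceptions(table, expected):
    """Groups whose measured ranking differs from the expected one."""
    return [key for key, group in table.items() if ranking(group) != expected]


if __name__ == "__main__":
    for key, group in times.items():
        print(key, ranking(group))
    print("groups deviating from the claimed order:", exceptions(times, claimed))
```

Aggregation is the fastest strategy in every group, so the practical recommendation does not depend on how the remaining three are ordered.
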