0. Linear system under test: size 500,000 × 500,000, with nnz = 1.7e7 nonzero entries. Source: an IPC simulation of a lattice compression scenario.

1. AMGCL: CPU backend (most of the time is spent in solve)

| Solver | Relaxation | Coarsening | Time cost (s) |
|--------|------------|------------|---------------|
| CG | gauss_seidel | ruge_stuben | 51.55 |
| CG | gauss_seidel | aggregation | 31.56 |
| CG | gauss_seidel | smoothed_aggregation (eps_strong=0.0, block_size=3) | 38.48 |
| CG | gauss_seidel | smoothed_aggr_emin (eps_strong=0.0, block_size=3) | 42.70 |
| CG | spai0 | aggregation | 51.30 |
| LGMRES | gauss_seidel | ruge_stuben | 25.54 |
| LGMRES | gauss_seidel | aggregation | 17.86 |
| LGMRES | gauss_seidel | smoothed_aggregation (eps_strong=0.0, block_size=3) | 30.30 |
| LGMRES | gauss_seidel | smoothed_aggr_emin (eps_strong=0.0, block_size=3) | 34.64 |
| LGMRES | spai0 | aggregation | 38.44 |

2. AMGCL: CUDA backend (GPU: GeForce GTX 1080 Ti; most of the time is spent in setup)

| Solver | Relaxation | Coarsening | Time cost (s) |
|--------|------------|------------|---------------|
| CG | spai0 | ruge_stuben | 7.35 |
| CG | spai0 | aggregation | 4.30 |
| CG | spai0 | smoothed_aggregation (eps_strong=0.0, block_size=3) | 9.98 |
| CG | spai0 | smoothed_aggr_emin (eps_strong=0.0, block_size=3) | 14.78 |
| BiCGStabL(L=2) | spai0 | ruge_stuben | 7.85 |
| BiCGStabL(L=2) | spai0 | aggregation | 3.88 |
| BiCGStabL(L=2) | spai0 | smoothed_aggregation (eps_strong=0.0, block_size=3) | 10.10 |
| BiCGStabL(L=2) | spai0 | smoothed_aggr_emin (eps_strong=0.0, block_size=3) | 15.07 |
| LGMRES | spai0 | ruge_stuben | 7.55 |
| LGMRES | spai0 | aggregation | 4.22 |
| LGMRES | spai0 | smoothed_aggregation (eps_strong=0.0, block_size=3) | 10.06 |
| LGMRES | spai0 | smoothed_aggr_emin (eps_strong=0.0, block_size=3) | 14.44 |
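
As a rough illustration, the fastest CUDA row above (BiCGStabL(L=2), spai0, aggregation) could be written down as a nested parameter set in the style of AMGCL's runtime interface. The key names used here (`solver.type`, `solver.L`, `precond.class`, `precond.relax.type`, `precond.coarsening.type`) and the type string `bicgstabl` are assumptions based on how AMGCL's runtime parameters are usually spelled, and the helper `amgcl_params` is hypothetical. Check them against the AMGCL version in use before relying on this sketch.

```python
import json


def amgcl_params(solver, relax, coarsening, **solver_opts):
    """Build a nested parameter dict (hypothetical helper).

    The key layout mirrors what AMGCL's runtime interface is assumed
    to accept; it is not taken from the benchmark itself.
    """
    return {
        "solver": {"type": solver, **solver_opts},
        "precond": {
            "class": "amg",
            "relax": {"type": relax},
            "coarsening": {"type": coarsening},
        },
    }


# Fastest CUDA row in the table: BiCGStabL(L=2) + spai0 + aggregation.
fastest_cuda = amgcl_params("bicgstabl", "spai0", "aggregation", L=2)

# Serialize to JSON, e.g. to keep the configuration next to the timings.
params_json = json.dumps(fastest_cuda, indent=2)
print(params_json)
```

Keeping each tested combination as data like this makes it easier to rerun the same rows on another matrix instead of editing template parameters by hand.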

3. Time needed to solve the same system with CHOLMOD and cuSolver

CHOLMOD 9.05s
cuSolver (GPU: GeForce GTX 1080 Ti, solver=chol, reorder=metis) 17.11s
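
To put the direct solvers in context, the short sketch below divides their timings by the fastest AMGCL rows listed in the CPU and CUDA tables. It only does arithmetic on numbers already reported here; the dictionary labels are made up for readability.

```python
# Timings in seconds, copied from the tables in this document.
direct = {
    "CHOLMOD": 9.05,
    "cuSolver": 17.11,
}
best_amgcl = {
    "AMGCL CPU (LGMRES, gauss_seidel, aggregation)": 17.86,
    "AMGCL CUDA (BiCGStabL(L=2), spai0, aggregation)": 3.88,
}

# ratio > 1 means the AMGCL configuration is faster than the direct solver.
ratios = {
    (d_name, a_name): d_time / a_time
    for d_name, d_time in direct.items()
    for a_name, a_time in best_amgcl.items()
}

for (d_name, a_name), r in ratios.items():
    print(f"{d_name} / {a_name}: {r:.2f}x")
```

On this system the best CUDA configuration comes out ahead of both direct solvers, while the best listed CPU configuration is slower than CHOLMOD and roughly on par with cuSolver.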

4. Summary of choosing the backend, solver, preconditioner, relaxation, and coarsening in AMGCL (specific to this linear system; it does not generalize to all cases)

backend:
  • CUDA is roughly 5 to 12 times faster than the CPU (builtin) backend
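
The backend speedup can be checked directly for the configurations that appear in both tables with the same solver, relaxation, and coarsening. The sketch below is only arithmetic on the reported timings; it finds the shared rows by intersecting the keys.

```python
# (solver, relaxation, coarsening) -> time in seconds, from the CPU table.
cpu = {
    ("CG", "gauss_seidel", "ruge_stuben"): 51.55,
    ("CG", "gauss_seidel", "aggregation"): 31.56,
    ("CG", "gauss_seidel", "smoothed_aggregation"): 38.48,
    ("CG", "gauss_seidel", "smoothed_aggr_emin"): 42.70,
    ("CG", "spai0", "aggregation"): 51.30,
    ("LGMRES", "gauss_seidel", "ruge_stuben"): 25.54,
    ("LGMRES", "gauss_seidel", "aggregation"): 17.86,
    ("LGMRES", "gauss_seidel", "smoothed_aggregation"): 30.30,
    ("LGMRES", "gauss_seidel", "smoothed_aggr_emin"): 34.64,
    ("LGMRES", "spai0", "aggregation"): 38.44,
}

# Same layout, from the CUDA table (CG and LGMRES rows).
cuda = {
    ("CG", "spai0", "ruge_stuben"): 7.35,
    ("CG", "spai0", "aggregation"): 4.30,
    ("CG", "spai0", "smoothed_aggregation"): 9.98,
    ("CG", "spai0", "smoothed_aggr_emin"): 14.78,
    ("LGMRES", "spai0", "ruge_stuben"): 7.55,
    ("LGMRES", "spai0", "aggregation"): 4.22,
    ("LGMRES", "spai0", "smoothed_aggregation"): 10.06,
    ("LGMRES", "spai0", "smoothed_aggr_emin"): 14.44,
}

# Only configurations measured on both backends are compared.
shared = sorted(cpu.keys() & cuda.keys())
speedup = {cfg: cpu[cfg] / cuda[cfg] for cfg in shared}

for cfg, s in speedup.items():
    print(f"{cfg}: CUDA is {s:.2f}x faster than CPU")
```

Only two rows match exactly (CG and LGMRES, each with spai0 and aggregation), and both fall inside the stated range. Rows that use gauss_seidel on the CPU have no exact CUDA counterpart, because Gauss-Seidel is not available on the CUDA backend.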
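solver:
  • On CPU, from fastest to slowest: FGMRES > LGMRES > CG; BiCGStab was not tested; IDR(s) and Richardson are slower
  • On CUDA, CG, BiCGStab, LGMRES, and FGMRES perform similarly and are among the faster options; IDR(s) and Richardson are slower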

Preconditioners:
  • AMG is fast; the other preconditioners were not tested. The composite preconditioners target saddle point or Stokes-like systems
Relaxation:
  • On CPU, faster: Gauss-Seidel and SPAI0; slower: Damped Jacobi, Chebyshev, ILU, SPAI1
  • On CUDA, faster: SPAI0; slower: Damped Jacobi, Chebyshev, ILU, SPAI1; not usable: Gauss-Seidel
Coarsening:
  • CPU and CUDA lead to the same ranking, from fastest to slowest: aggregation > ruge_stuben > smoothed aggregation > smoothed_aggr_emin
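
The coarsening ranking can be reproduced from the timing tables by sorting each backend and solver group by time. The sketch below uses the gauss_seidel rows on the CPU and the spai0 rows on CUDA, since those are the groups that cover all four coarsening methods; the grouping itself is a choice made for this illustration.

```python
# (backend, solver) -> {coarsening: time in seconds}, from the tables above.
timings = {
    ("CPU", "CG"): {
        "ruge_stuben": 51.55, "aggregation": 31.56,
        "smoothed_aggregation": 38.48, "smoothed_aggr_emin": 42.70,
    },
    ("CPU", "LGMRES"): {
        "ruge_stuben": 25.54, "aggregation": 17.86,
        "smoothed_aggregation": 30.30, "smoothed_aggr_emin": 34.64,
    },
    ("CUDA", "CG"): {
        "ruge_stuben": 7.35, "aggregation": 4.30,
        "smoothed_aggregation": 9.98, "smoothed_aggr_emin": 14.78,
    },
    ("CUDA", "BiCGStabL(L=2)"): {
        "ruge_stuben": 7.85, "aggregation": 3.88,
        "smoothed_aggregation": 10.10, "smoothed_aggr_emin": 15.07,
    },
    ("CUDA", "LGMRES"): {
        "ruge_stuben": 7.55, "aggregation": 4.22,
        "smoothed_aggregation": 10.06, "smoothed_aggr_emin": 14.44,
    },
}

# Ranking stated in the summary, fastest first.
claimed = ["aggregation", "ruge_stuben",
           "smoothed_aggregation", "smoothed_aggr_emin"]

# Sort the coarsening methods of each group by measured time.
ranking = {group: sorted(t, key=t.get) for group, t in timings.items()}

for group, order in ranking.items():
    status = "matches" if order == claimed else "differs from"
    print(f"{group}: {' > '.join(order)} ({status} the summary)")
```

With these numbers, the three CUDA groups and the CPU LGMRES group reproduce the summary order. The CPU CG group is the exception: aggregation is still fastest there, but ruge_stuben comes out slowest.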