Factors Impacting Performance of Multithreaded Sparse Triangular Solve
Michael Wolf (Sandia National Laboratories)
Mike Heroux (Sandia National Laboratories)
Erik Boman (Sandia National Laboratories)
As computational science applications grow more parallel on multi-core supercomputers with hundreds of thousands of cores, it will become increasingly difficult for solvers to scale. Our approach is to use hybrid MPI/threaded numerical algorithms, which reduce the number of MPI tasks and increase the parallel efficiency of the solver. This approach, however, requires efficient threaded numerical kernels on the multi-core nodes. In this paper, we focus on improving the performance of a multithreaded sparse triangular solver, an important kernel for preconditioning. We analyze three factors that affect the parallel performance of this threaded kernel and obtain good scalability on the multi-core nodes for a range of matrix sizes.
Parallel and Distributed Computing, Numerical Algorithms for CS&E