A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators
Hatem Ltaief (University of Tennessee)
Stanimire Tomov (University of Tennessee)
Rajib Nath (University of Tennessee)
Peng Du (University of Tennessee)
Jack Dongarra (University of Tennessee)
Abstract:
We present a Cholesky factorization for multicore systems with GPU
accelerators. The challenges in developing scalable, high-performance
algorithms for these emerging systems stem from their heterogeneity,
their massive parallelism, and the huge gap between the GPUs’ compute
power and the CPU-GPU communication speed. We describe an approach
that is largely based on software infrastructures already
developed for homogeneous multicores and hybrid GPU-based computing.
This results in a scalable hybrid Cholesky factorization of unprecedented
performance. In particular, using NVIDIA’s Tesla S1070 (4 C1060
GPUs, each with 30 cores @1.44 GHz) connected to two dual-core AMD
Opteron @1.8GHz processors, we reach up to 1.163 TFlop/s in single
and up to 275 GFlop/s in double precision arithmetic. Compared with
the performance of the embarrassingly parallel xGEMM over four GPUs,
where no communication between GPUs is involved, our algorithm still
runs at 73% and 84% of that rate for single and double precision arithmetic, respectively.
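The abstract does not spell out the algorithm itself. Purely as an illustrative assumption, the sketch below shows a generic right-looking tile Cholesky factorization in Python/NumPy. It splits the matrix into nb-by-nb tiles and applies four tile kernels (POTRF on the diagonal tile, TRSM on the panel below it, SYRK and GEMM on the trailing submatrix), which are the kind of tasks a multicore or multi-GPU runtime could schedule. It runs serially on the CPU and is not the authors' hybrid implementation. The function name `tiled_cholesky` and the tile size `nb` are hypothetical.

```python
import numpy as np


def tiled_cholesky(A: np.ndarray, nb: int) -> np.ndarray:
    """Return lower-triangular L with A = L @ L.T, computed tile by tile.

    Illustrative sketch: assumes A is symmetric positive definite and that
    its order n is a multiple of the (hypothetical) tile size nb.
    """
    n = A.shape[0]
    if A.shape != (n, n) or n % nb != 0:
        raise ValueError("A must be square with order divisible by nb")
    W = np.array(A, dtype=float)  # working copy, overwritten in place
    t = n // nb  # number of tile rows / tile columns

    def T(i: int, j: int) -> np.ndarray:
        # Basic slicing returns a view, so writes go straight into W.
        return W[i * nb:(i + 1) * nb, j * nb:(j + 1) * nb]

    for k in range(t):
        # POTRF: factor the current diagonal tile.
        T(k, k)[:] = np.linalg.cholesky(T(k, k))
        Lkk = T(k, k)
        for i in range(k + 1, t):
            # TRSM: L_ik = A_ik L_kk^{-T}, i.e. solve L_kk X^T = A_ik^T.
            # (A general solve is used here for simplicity.)
            T(i, k)[:] = np.linalg.solve(Lkk, T(i, k).T).T
        for i in range(k + 1, t):
            # SYRK: update a diagonal tile of the trailing submatrix.
            T(i, i)[:] -= T(i, k) @ T(i, k).T
            for j in range(k + 1, i):
                # GEMM: update an off-diagonal tile below the diagonal.
                T(i, j)[:] -= T(i, k) @ T(j, k).T
    # Tiles above the diagonal were never touched; zero them out.
    return np.tril(W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((8, 8))
    A = M @ M.T + 8 * np.eye(8)  # symmetric positive definite test matrix
    L = tiled_cholesky(A, nb=2)
    print(np.allclose(L @ L.T, A))  # True
    print(np.allclose(L, np.linalg.cholesky(A)))  # True
```

In this formulation most of the O(n³) work lands in the trailing SYRK and GEMM tile updates, which helps explain why the performance of xGEMM is a natural reference point for the factorization's speed.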
Keywords:
Numerical Algorithms for CS&E