Applying Parallel Design Techniques to Template Matching with GPUs
Robert Anderson (University of Texas at Dallas)
Steven Kirtzic (University of Texas at Dallas)
Ovidiu Daescu (University of Texas at Dallas)
Designing algorithms for data parallelism can yield significant performance gains on SIMD architectures. General-purpose GPUs also benefit from careful analysis of memory usage and data flow, because they combine high throughput with system memory bottlenecks. In this paper we present a template matching algorithm that is designed from the outset for the GPU architecture and achieves more than an order of magnitude speedup over traditional algorithms that were designed for the CPU and reimplemented on the GPU. This shows that it is desirable not only to adapt existing algorithms to run on GPUs, but also to design future algorithms with the GPU architecture in mind.
Parallel and Distributed Computing, Image Processing