Efficient code design for RISC processors

Powerful RISC processors are of major importance in today's market, since they are used both in workstations and in the most recent parallel computers.

In this talk, we outline the main features of RISC-based architectures and try to identify the main performance bottlenecks. We describe some of the classical tuning techniques and show their impact on code performance.
We illustrate the talk with examples drawn from linear algebra computational kernels and from industrial codes. We show the performance obtained on some of the fastest RISC processors currently available (DEC Alpha, IBM P2SC, UltraSPARC, MIPS R10000, HP PA, ...).
We also show how the choice between a vector and a RISC processor can influence code design.

The tutorial will cover the following topics:
  • Introduction
  • The RISC concept
  • Architectural design (superscalar and superpipelined RISC, memory hierarchy, ...)
  • Overview of current RISC processors
  • Performance models for RISC processors (using examples)
  • Code tuning techniques (blocking, loop unrolling, copying, prefetching, ...); use of profilers and performance analyzers
  • Use of numerical libraries
  • Examples using industrial codes
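
As a rough illustration of two of the tuning techniques listed above, the sketch below shows generic textbook forms of loop unrolling and blocking in C. It is not taken from the tutorial material: the function names dot_unrolled4 and matmul_blocked, the unroll factor of 4, and the block-size parameter bs are hypothetical choices made for this example. Unrolling with several independent partial sums gives a pipelined or superscalar floating-point unit independent operations to overlap; blocking keeps tiles of the matrices in cache while they are reused.

```c
#include <assert.h>
#include <stddef.h>

/* Loop unrolling (hypothetical example): dot product with the loop body
   replicated four times. Four independent partial sums remove the
   dependence on a single running sum, so successive additions can overlap
   in the floating-point pipeline. */
double dot_unrolled4(const double *x, const double *y, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)          /* clean-up loop when n is not a multiple of 4 */
        s0 += x[i] * y[i];
    return (s0 + s1) + (s2 + s3);
}

/* Blocking / tiling (hypothetical example): C += A * B for n x n row-major
   matrices, processed in bs x bs tiles so that the tiles being reused stay
   in cache. The innermost loop runs over j, giving stride-1 access to B and C. */
void matmul_blocked(const double *A, const double *B, double *C,
                    size_t n, size_t bs)
{
    assert(bs > 0);             /* block size must be positive */
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                /* clip tiles at the matrix edge when n is not a multiple of bs */
                size_t imax = ii + bs < n ? ii + bs : n;
                size_t kmax = kk + bs < n ? kk + bs : n;
                size_t jmax = jj + bs < n ? jj + bs : n;
                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        double a = A[i * n + k];   /* reused across the j loop */
                        for (size_t j = jj; j < jmax; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

In practice the unroll factor and block size would be tuned to the register file and cache sizes of the target processor. Note also that the unrolled dot product reorders the floating-point additions, so its result can differ from the straightforward loop in the last bits.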

Note that I have run this course in the past with hands-on sessions. If there is interest, it may be possible to hold a practical session on RISC workstations, provided there are no more than 15 participants.