Efficient code design for RISC processors
The availability of powerful RISC processors is of major importance in today's market since they are used both in workstations and in the most recent parallel computers.
In this talk, we outline the main features of RISC-based architectures and identify the main performance bottlenecks. We describe some of the classical tuning techniques
and show their impact on code performance.
We illustrate our talk with examples drawn from computational kernels used in linear algebra and from industrial codes. We show the performance obtained on some of the fastest RISC processors currently available (DEC Alpha, IBM P2SC, UltraSPARC, MIPS R10000, HP PA, ...).
We also show how the choice of a vector or a RISC processor can influence the code design.
The content of the tutorial will be the following:
- The RISC concept
- Architectural design (superscalar and superpipelined RISC, memory hierarchy, ...)
- Overview of current RISC processors
- Performance models for RISC processors (using examples)
- Code tuning techniques (blocking, loop unrolling, copying, prefetching, ...), use of profilers and performance analyzers
- Use of numerical libraries
- Examples using industrial codes
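As an illustration of the blocking technique listed above, here is a minimal sketch in C of a matrix multiply written first as a plain triple loop and then in blocked (tiled) form. It is not taken from the tutorial material: the function names matmul_naive and matmul_blocked and the tile size BS are hypothetical, and a suitable BS depends on the cache sizes of the target processor.

```c
#include <stddef.h>

/* Hypothetical tile size (an assumption, not a recommended value):
   in practice it is tuned so that the tiles being worked on fit
   in the cache level being targeted. */
#define BS 32

/* Reference version: C += A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double s = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
    }
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Blocked version: the same arithmetic, reordered so that each
   BS x BS tile of A, B and C is reused while it is still in cache.
   The min_sz() bounds handle n that is not a multiple of BS. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BS) {
        size_t i_end = min_sz(ii + BS, n);
        for (size_t kk = 0; kk < n; kk += BS) {
            size_t k_end = min_sz(kk + BS, n);
            for (size_t jj = 0; jj < n; jj += BS) {
                size_t j_end = min_sz(jj + BS, n);
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        /* Unit-stride inner loop over a row of B and C. */
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

Both routines accumulate into C, so the caller is expected to zero C first when a plain product is wanted. Whether the blocked form is actually faster, and by how much, depends on the matrix size, the compiler and the memory hierarchy of the machine, which is why measurement with a profiler remains necessary.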
Note that I have already run this course in the past with hands-on sessions. If there is interest, it may be possible to hold a practical session using RISC workstations, provided that there are no more than 15 participants.