Om SIMD Programming Manual for Linux and Wi
A number of widely used contemporary processors have instruction-set extensions for improved performance in multi-media applications. The aim is to allow operations to proceed on multiple pixels each clock cycle. Such instruction-sets have been incorporated both in specialist DSPchips such as the Texas C62xx (Texas Instruments, 1998) and in general purpose CPU chips like the Intel IA32 (Intel, 2000) or the AMD K6 (Advanced Micro Devices, 1999). These instruction-set extensions are typically based on the Single Instruc- tion-stream Multiple Data-stream (SIMD) model in which a single instruction causes the same mathematical operation to be carried out on several operands, or pairs of operands, at the same time. The level or parallelism supported ranges from two floating point operations, at a time on the AMD K6 architecture to 16 byte operations at a time on the Intel P4 architecture. Whereas processor architectures are moving towards greater levels of parallelism, the most widely used programming languages such as C, Java and Delphi are structured around a model of computation in which operations takeplace on a single value at a time. This was appropriate when processors worked this way, but has become an impediment to programmers seeking to make use of the performance offered by multi-media instruction -sets. The introduction of SIMD instruction sets (Peleg et al.