What is MFLOPS?

MFLOPS
Million Floating-Point Operations Per Second

     People use MFLOPS to mean different things, but a general definition would be the number of full word-size floating-point (fp) multiply operations that can be performed per second (the M stands for 'Million'). On most CPUs, fp add or subtract operations take less time than a multiply, and fp divide is the slowest of all. Older CPUs take many clock cycles to complete one FLOP, so even at high clock speeds their FLOP rate can be low. An example is the 486DX4/100, which is rated at about 6MFLOPS. Compare this to the 200MHz R4400, which is rated at about 35MFLOPS: twice the clock speed, but nearly six times the FLOP rate. For older processors, clock speed is clearly no indication of MFLOP rate.
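As a rough sketch, assuming a CPU's rating is simply its clock speed divided by the average number of clocks it spends per FLOP, the ratings quoted above can be read backwards to show why clock speed alone says so little. The function names are illustrative only.

```python
def mflops(clock_mhz: float, cycles_per_flop: float) -> float:
    """Peak MFLOPS for a CPU that completes one FLOP every `cycles_per_flop` clocks."""
    return clock_mhz / cycles_per_flop


def implied_cycles_per_flop(clock_mhz: float, rated_mflops: float) -> float:
    """Average clocks per FLOP implied by a published MFLOPS rating."""
    return clock_mhz / rated_mflops


# Approximate ratings quoted in the text: (clock in MHz, rated MFLOPS).
cpus = {
    "486DX4/100": (100.0, 6.0),
    "R4400 200MHz": (200.0, 35.0),
}

for name, (mhz, rated) in cpus.items():
    cpf = implied_cycles_per_flop(mhz, rated)
    print(f"{name}: ~{cpf:.1f} clocks per FLOP")
```

Under that assumption the 486DX4/100 spends roughly 17 clocks per FLOP, against roughly 6 for the R4400.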
     Newer designs don't make things clearer. If anything the situation is more complex, because it is often the reverse: the MFLOPS rating can exceed the clock speed in MHz. CPUs like the R10000 can do two fp operations each clock cycle, giving a rating of 400MFLOPS at 200MHz. The R8000 is even more confusing since it has two fp execution units, each capable of doing two fp ops/clock, giving it a rating of 360MFLOPS at 90MHz! (That's ten times faster than an Intel P90.)
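The peak figures quoted above are plain multiplication: clock rate times the number of fp operations the chip can start per clock. A minimal Python sketch of that arithmetic (the helper name `peak_mflops` is hypothetical) reproduces both ratings:

```python
def peak_mflops(clock_mhz: int, fp_ops_per_clock: int) -> int:
    """Theoretical peak: every fp unit issues its maximum ops on every clock."""
    return clock_mhz * fp_ops_per_clock


r10000 = peak_mflops(200, 2)     # two fp ops per clock cycle
r8000 = peak_mflops(90, 2 * 2)   # two fp units x two fp ops per clock each

print(r10000, r8000)  # 400 360
```

Because the R8000's per-clock throughput is double the R10000's, it reaches almost the same peak at less than half the clock speed. Both numbers are theoretical peaks: they assume every fp unit is busy on every clock.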
     Again, the nature of the task is important. A 64-bit CPU that can do 400MFLOPS may be fine, but if one's work only needs 32-bit processing then much of the CPU's capability is being wasted. CPUs like the R5000 address this problem, aiming at markets that do not need 64-bit floating-point processing. Future designs like MDMX should reduce this wastage, but they will also make measuring CPU performance even harder. Perhaps CPU capability would be a better metric, but no one has yet devised such a test. Instead there is a wide variety of benchmarks, and one must use the most appropriate test as a basis for decision making.
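The wastage argument is about width: a 64-bit datapath doing 32-bit work leaves half of each register idle. Packed (SIMD) extensions, the kind of approach MDMX represents, fit several narrower values into one wide register so that one operation works on all of them. The Python sketch below only simulates that idea in software by packing two 32-bit floats into one 64-bit word; the function names are hypothetical and nothing here models real MDMX instructions.

```python
import struct


def pack_pair(a: float, b: float) -> int:
    """Pack two 32-bit floats into one 64-bit word (a in the low half)."""
    return struct.unpack("<Q", struct.pack("<ff", a, b))[0]


def unpack_pair(word: int) -> tuple[float, float]:
    """Split a 64-bit word back into its two 32-bit float lanes."""
    return struct.unpack("<ff", struct.pack("<Q", word))


def packed_add(x: int, y: int) -> int:
    """Lane-wise add: one 'operation' on 64-bit words performs two 32-bit adds."""
    x0, x1 = unpack_pair(x)
    y0, y1 = unpack_pair(y)
    return pack_pair(x0 + y0, x1 + y1)  # results rounded back to 32-bit floats


word = packed_add(pack_pair(1.5, 2.25), pack_pair(0.5, 4.0))
print(unpack_pair(word))  # (2.0, 6.25)
```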
     All this talk of MFLOPS is fine, but it misses one very important point: memory bandwidth. A fast CPU may sound impressive, and PR people will always talk in terms of theoretical peak performance, but in reality a CPU's achievable performance is ultimately limited by the rate at which it can access data from the various kinds of memory (L1/L2 cache and main RAM). A fast CPU in a system with low memory bandwidth will not perform anywhere near its theoretical peak (e.g. the 500MHz Alpha). I have studied this effect on the 195MHz R10000, and the results are very interesting.
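One standard way to make this point concrete is a roofline-style bound (a later model, not the method used in this article): a loop can run no faster than the CPU's peak, and no faster than memory can deliver its operands. The sketch below applies it to a simple streaming loop, y[i] = a*x[i] + y[i], on 64-bit values. The 300MB/sec bandwidth is an assumed value chosen for illustration, not a measurement of any real system.

```python
def attainable_mflops(peak_mflops: float, bandwidth_mb_s: float,
                      flops_per_elem: int, bytes_per_elem: int) -> float:
    """Roofline-style bound: the lower of the compute peak and the memory limit."""
    # MB/s times flops-per-byte gives millions of flops per second.
    memory_bound = bandwidth_mb_s * flops_per_elem / bytes_per_elem
    return min(peak_mflops, memory_bound)


# y[i] = a * x[i] + y[i] on 64-bit values:
# 2 flops per element; read x, read y, write y => 3 x 8 = 24 bytes per element.
FLOPS_PER_ELEM = 2
BYTES_PER_ELEM = 24

peak = 195.0 * 2      # 195MHz R10000 at two fp ops per clock = 390 MFLOPS
bandwidth = 300.0     # MB/sec sustained from main memory (assumed)

print(attainable_mflops(peak, bandwidth, FLOPS_PER_ELEM, BYTES_PER_ELEM))  # 25.0
```

With these assumed numbers the loop is capped at 25MFLOPS, a small fraction of the 390MFLOPS peak, no matter how fast the CPU itself is.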
     What is important here with regard to the Nintendo 64 (N64) is that SGI have given it a very high memory bandwidth indeed (500MB/sec peak, i.e. almost 4 times the 133MB/sec peak of standard 32-bit, 33MHz PCI). The N64's memory design uses Rambus technology, which is also used in SGI's IMPACT graphics technology.
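The "almost 4 times" comparison follows from the usual peak-rate formula: clock rate times bus width in bytes. A small sketch, assuming standard 32-bit PCI at 33.33MHz (the helper name is hypothetical):

```python
def peak_mb_per_s(clock_mhz: float, bus_width_bits: int) -> float:
    """Theoretical peak transfer rate: one bus-width transfer per clock."""
    return clock_mhz * bus_width_bits / 8


pci = peak_mb_per_s(33.33, 32)  # standard 32-bit, 33MHz PCI
n64 = 500.0                      # N64 peak figure quoted in the text (MB/sec)

print(f"PCI peak: {pci:.0f}MB/sec")
print(f"N64 / PCI: {n64 / pci:.2f}x")
```

The ratio comes out at about 3.75, which matches "almost 4 times". Both are peak figures; sustained rates on either bus will be lower.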

