Question submitted by (22 June 1999)
I've heard a lot about making sure your data is 32-bit aligned for best performance. I haven't been able to find any resources on the internet about how to do this. Does the compiler (I use Microsoft VC++) do it automatically? How does alignment improve performance?
The 32-byte alignment comes from the fact that a Pentium's cache
line is 32 bytes long. Anything that stays within a single 32-byte line
avoids the stall incurred by reading a second cache line.
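To make the boundary rule concrete, here is a minimal sketch of the arithmetic, assuming the 32-byte line size mentioned above (many later CPUs use 64). The names `kCacheLineSize`, `straddles_cache_line`, and `object_straddles_cache_line` are made up for illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed line size: 32 bytes, as on the original Pentium.
// Adjust for the target CPU; this constant is an illustration only.
constexpr std::uintptr_t kCacheLineSize = 32;

// Returns true if the byte range [addr, addr + size) touches more than
// one cache line, i.e. reading it may require fetching a second line.
inline bool straddles_cache_line(std::uintptr_t addr, std::size_t size) {
    assert(size > 0);
    const std::uintptr_t first_line = addr / kCacheLineSize;
    const std::uintptr_t last_line = (addr + size - 1) / kCacheLineSize;
    return first_line != last_line;
}

// Convenience wrapper (hypothetical helper) for a real object in memory.
template <typename T>
bool object_straddles_cache_line(const T& object) {
    return straddles_cache_line(
        reinterpret_cast<std::uintptr_t>(&object), sizeof(T));
}
```

For example, an 8-byte value at offset 24 ends at byte 31 and stays inside one line, while the same value at offset 28 runs from byte 28 to byte 35 and crosses into the next line.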
If you've got some tight loops that run for many iterations (e.g. software texture mapping), then the 32-byte alignment won't buy you much: you'll be spending an enormous amount of time in that code, but you only incur the stall(s) the first time through. You should be aware, though, that if an instruction spans a 32-byte boundary, there can be another stall incurred.
Many compilers do optimize the code so that subroutines and many 'basic blocks' of code are somewhat aligned. Of course, this all depends on the compiler...
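The answer above is about how compilers align code. For data, alignment can also be requested explicitly rather than left to the compiler. The sketch below uses the standard `alignas` specifier from C++11, which postdates this answer; older Microsoft compilers spell the same request `__declspec(align(32))`. The `Vertex` type and the `is_aligned` helper are hypothetical, and the size check assumes a 4-byte `float`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-byte record, aligned so that each instance starts on
// a 32-byte boundary and therefore fits in one 32-byte cache line.
struct alignas(32) Vertex {
    float position[4];
    float color[4];
};

static_assert(alignof(Vertex) == 32, "Vertex should be 32-byte aligned");
// Assumes sizeof(float) == 4, which holds on common platforms.
static_assert(sizeof(Vertex) == 32, "Vertex should fill exactly one line");

// Returns true if p is a multiple of alignment (a power of two).
inline bool is_aligned(const void* p, std::uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

Because the size is a multiple of the alignment, every element of a `Vertex` array starts on a 32-byte boundary, so no element crosses into a second line.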
Response provided by Paul Nettle
This article was originally an entry in flipCode's Fountain of Knowledge, an open Question and Answer column that no longer exists.