Thursday, 30 September 2010

Memory alignment: theory and C++ examples

--- [ e-on software research ] ---

1. Alignment theory
a. Definition
b. How processors fetch memory
c. Data structure padding

2. C++ examples
a. What the C++ specification says
b. GCC and Visual C++ x86 / x86-64 implementation
c. Benchmarks
d. Controlling alignment and padding
e. Common data type size and alignment

3. References


1. Alignment theory


This post is a digest of material that can be found on the web. The sources used are listed in the References section.


a. Definition


The alignment of a given variable is the largest power of two that divides its address, that is, the largest power-of-two value for which:
address modulo alignment = 0
A variable whose alignment is N is said to be N-byte aligned.

Note
– Different types can have different alignment requirements
– If x > y, and both x and y are power-of-two values, a variable that is x-byte aligned is also y-byte aligned

Example
Address (bytes)   Alignment
0x00              infinite
0x01              1-byte
0x02              2-byte
0x03              1-byte
0x04              4-byte (so also 2-byte)
0x05              1-byte
0x06              2-byte
0x07              1-byte
0x08              8-byte (so also 4-byte and 2-byte)
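
As a minimal sketch of the definition above, the alignment of an address can be computed by isolating its lowest set bit. The function names alignment_of_address and is_aligned are made up for this post, and returning 0 for address 0 is only a convention standing in for "infinite".

```cpp
#include <cstdint>

// Returns the largest power of two that divides `address`.
// Address 0 is divisible by every power of two, so this sketch returns 0
// as a sentinel meaning "infinite" (an assumption, not a standard convention).
constexpr std::uintptr_t alignment_of_address(std::uintptr_t address) {
    // (~address + 1) is the two's complement negation in unsigned arithmetic;
    // ANDing it with the address keeps only the lowest set bit.
    return address & (~address + 1);
}

// Returns true if `address` is a multiple of `alignment`.
// `alignment` is assumed to be a non-zero power of two.
constexpr bool is_aligned(std::uintptr_t address, std::uintptr_t alignment) {
    return (address & (alignment - 1)) == 0;
}

static_assert(alignment_of_address(0x04) == 4, "0x04 is 4-byte aligned");
static_assert(is_aligned(0x08, 2), "an 8-byte aligned address is also 2-byte aligned");
```

For instance, alignment_of_address(0x06) gives 2 and alignment_of_address(0x08) gives 8, which matches the table.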


b. How processors fetch memory


Aligned address
– read the chunk and place it into the register

Unaligned address
– read the first chunk that the unaligned address overlaps
– shift out the "unwanted" bytes from the first chunk
– read the second chunk that the unaligned address overlaps
– shift out the unwanted bytes from the second chunk
– merge the two chunks together for placement in the register

Compared to reading a single chunk, that is a lot of work!
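
The steps above can be sketched in software. This is only a toy model, not what any particular processor does: it assumes a hypothetical "bus" that returns 4-byte chunks starting at multiples of 4, little-endian byte order, and the made-up names read_aligned_chunk and read_u32.

```cpp
#include <cstddef>
#include <cstdint>

// Toy model of the memory bus: it can only return a 4-byte chunk that
// starts at a multiple of 4. Bytes are assembled in little-endian order.
inline std::uint32_t read_aligned_chunk(const std::uint8_t* memory,
                                        std::size_t chunk_start) {
    return  static_cast<std::uint32_t>(memory[chunk_start])
         | (static_cast<std::uint32_t>(memory[chunk_start + 1]) << 8)
         | (static_cast<std::uint32_t>(memory[chunk_start + 2]) << 16)
         | (static_cast<std::uint32_t>(memory[chunk_start + 3]) << 24);
}

// Reads a 32-bit value at any byte offset using aligned chunk reads only.
// The caller must make sure the chunk following `offset` is readable.
inline std::uint32_t read_u32(const std::uint8_t* memory, std::size_t offset) {
    const std::size_t misalignment = offset % 4;
    const std::size_t first_chunk = offset - misalignment;

    const std::uint32_t low = read_aligned_chunk(memory, first_chunk);
    if (misalignment == 0) {
        return low;  // aligned: a single chunk read is enough
    }

    const std::uint32_t high = read_aligned_chunk(memory, first_chunk + 4);
    const unsigned shift = static_cast<unsigned>(misalignment * 8);

    // Shift the unwanted bytes out of each chunk, then merge what is left.
    return (low >> shift) | (high << (32 - shift));
}
```

With memory holding the bytes 0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, read_u32(memory, 1) needs two chunk reads and returns 0x44332211.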


Some processors just aren't willing to do all of that work for you. On an unaligned access, they may:
– raise an exception (68000)
– do nothing
– silently do something wrong (Altivec, Itanium)
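
Because the behavior differs so much between processors, C++ code should not dereference a misaligned pointer directly (doing so is undefined behavior). A common portable alternative is to copy the bytes with std::memcpy and let the compiler pick the right instructions. A minimal sketch, where load_u32_unaligned is a name made up for this post:

```cpp
#include <cstdint>
#include <cstring>

// Reads a 32-bit value from a location that may not be 4-byte aligned.
// std::memcpy has no alignment requirement on its arguments, so this is
// well defined whatever the address of `bytes` is.
inline std::uint32_t load_u32_unaligned(const unsigned char* bytes) {
    std::uint32_t value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

The value comes back in the byte order of the machine that runs the code, exactly as a direct load would on a processor that tolerates unaligned access.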


c. Data structure padding


Compilers add unnamed data members (padding) to structures:
– between members, to keep each member aligned on its required alignment
– after the last member, to keep every element of an array of the structure aligned

Note
To satisfy both constraints, the alignment requirement of a structure is the strictest alignment requirement of its members.

Example
For this example, assume:
– char: 1-byte aligned, takes 1 byte
– int: 2-byte aligned, takes 2 bytes

struct S // must be 2-byte aligned
{
   char c1; // can be placed at any address
   int i;   // must be 2-byte aligned
   char c2; // can be placed at any address
};
S s[2]; // sizeof(s) == 12 bytes (4 bytes of padding)


Address   Variable
0x0       s[0].c1
0x1       unnamed member (padding)
0x2       s[0].i
0x3       s[0].i
0x4       s[0].c2
0x5       unnamed member (padding)
0x6       s[1].c1
0x7       unnamed member (padding)
0x8       s[1].i
0x9       s[1].i
0xA       s[1].c2
0xB       unnamed member (padding)

Tips
– We could have saved all 4 padding bytes by placing c2 just before i
– With power-of-two alignments, declaring members in ascending or descending order of size gives an optimal size, but writing readable code should be your primary goal
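
The effect of member order can be checked on a real compiler with sizeof and offsetof. A minimal sketch, assuming short is 2 bytes and 2-byte aligned (true on the platforms listed in section 2.e, but not guaranteed by the standard), so that it can stand in for the 2-byte int of the example. The struct names are made up for this post, and static_assert needs C++11.

```cpp
#include <cstddef>

// Same member order as the example: char, 2-byte integer, char.
struct Padded {
    char  c1;
    short i;   // stands in for the 2-byte int of the example
    char  c2;
};

// Same members, with the two chars grouped before the 2-byte integer.
struct Reordered {
    char  c1;
    char  c2;
    short i;
};

// One padding byte after c1 so that i starts on an even offset.
static_assert(offsetof(Padded, i) == 2, "i is 2-byte aligned");
// One more padding byte after c2 so that array elements stay 2-byte aligned.
static_assert(sizeof(Padded) == 6, "4 bytes of data + 2 bytes of padding");
// Grouping the chars leaves no hole to fill.
static_assert(offsetof(Reordered, i) == 2, "i directly follows c2");
static_assert(sizeof(Reordered) == 4, "no padding needed");
```

An array of two Padded objects therefore takes 12 bytes, against 8 bytes for two Reordered objects.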


2. C++ examples


a. What the C++ specification says


The C++ memory model [intro.memory] (1.7 §1)
The fundamental storage unit in the C++ memory model is the byte. A byte […] is composed of a contiguous sequence of bits, the number of which is implementation-defined.
Types (3.9 §5)
[…] The alignment of a complete object type is an implementation-defined integer value representing a number of bytes; an object is allocated at an address that meets the alignment requirements of its object type.
Sizeof (5.3.3 §2)
When applied to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array.
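
Taken together, these rules imply that sizeof(T) is a multiple of the alignment of T, which is what keeps every element of an array correctly aligned. A minimal sketch of that check, assuming a C++11 compiler for alignof and static_assert (the 2003 standard quoted here has neither). Sample and both function names are made up for this post.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical class with internal and trailing padding candidates.
struct Sample {
    char   c;
    double d;
};

// sizeof includes the trailing padding needed for arrays, so it must be
// a multiple of the alignment requirement.
template <typename T>
constexpr bool size_is_multiple_of_alignment() {
    return sizeof(T) % alignof(T) == 0;
}

static_assert(size_is_multiple_of_alignment<Sample>(), "holds for classes");
static_assert(size_is_multiple_of_alignment<double>(), "holds for built-in types");

// Checks at run time that an object sits at an address meeting the
// alignment requirement of its type.
template <typename T>
bool is_properly_aligned(const T* object) {
    return reinterpret_cast<std::uintptr_t>(object) % alignof(T) == 0;
}
```

Calling is_properly_aligned on each element of a Sample array should return true for every element, whatever the actual size and alignment values are on the target.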


b. GCC and Visual C++ x86/x86-64 implementation


For performance reasons, all types are aligned on their natural lengths, except items that are greater than 8 bytes in length. It is recommended that all structures larger than 16 bytes be aligned on 16-byte boundaries.

In general, for the best performance, align data as follows:
– align 8-bit data at any address
– align 16-bit data to be contained within an aligned four-byte word
– align 32-bit data so that its base address is a multiple of four
– align 64-bit data so that its base address is a multiple of eight
– align 80-bit data so that its base address is a multiple of sixteen
– align 128-bit data so that its base address is a multiple of sixteen

Some SSE2 instructions on x86 CPUs (the aligned load and store forms, such as MOVDQA) require their data to be 128-bit (16-byte) aligned, and using aligned data on these architectures can bring substantial performance advantages.
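
One way to obtain 16-byte aligned data for this kind of instruction is the C++11 alignas specifier. A minimal sketch, assuming a C++11 compiler and a 4-byte float; Vec4f and is_16_byte_aligned are names made up for this post, and no SSE2 intrinsic is actually called here.

```cpp
#include <cstdint>

// Four floats forced onto a 16-byte boundary, the layout expected by
// 128-bit aligned loads and stores.
struct alignas(16) Vec4f {
    float lanes[4];
};

static_assert(alignof(Vec4f) == 16, "alignas raised the alignment to 16");
static_assert(sizeof(Vec4f) == 16, "assumes a 4-byte float");

// Returns true if the pointer can be used with an aligned 128-bit access.
inline bool is_16_byte_aligned(const void* pointer) {
    return reinterpret_cast<std::uintptr_t>(pointer) % 16 == 0;
}
```

Every element of a Vec4f array then sits on a 16-byte boundary, because the size of the type is itself a multiple of 16.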


c. Benchmarks


Copying a double from source[i] to dest[i], over 9,000,000 iterations.

Unaligned / aligned access time ratio:
– Pentium III (731 MHz): 3.25 times slower
– Pentium 4 (2.53 GHz): 2 times slower
– Itanium 2 (900 MHz): 459 times slower
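
The code behind these figures is not reproduced here. As a rough idea of how such a measurement can be set up, here is a minimal sketch, assuming C++11 for std::chrono. The function timed_copy is made up for this post; it goes through std::memcpy so that the accesses stay well defined even when the pointers are misaligned.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>

// Copies `count` doubles one by one from `source` to `dest` and returns the
// elapsed time in seconds. Both buffers are passed as byte pointers so that
// they may start at any address, aligned or not.
inline double timed_copy(const unsigned char* source, unsigned char* dest,
                         std::size_t count) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        // Element-wise copy, mirroring dest[i] = source[i].
        std::memcpy(dest + i * sizeof(double),
                    source + i * sizeof(double),
                    sizeof(double));
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

To compare, call it once on the start of two double arrays, then once with both pointers moved forward by one byte (the arrays need one spare element for that), and divide the two times. The result depends heavily on the CPU and the compiler: an optimizer may turn the loop into a single block copy, and recent x86 processors handle unaligned accesses far better than the ones measured above.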


d. Controlling alignment and padding


Visual C++ 2008
#pragma pack(4) // members are at most 4-byte aligned
struct S
{
   char c;   // 1-byte aligned
   double d; // 4-byte aligned instead of 8-byte aligned
             // causes warning C4121
};
#pragma pack() // reset to default

The #pragma pack(N) directive caps the alignment of members at N bytes.
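
The effect of the directive can be verified with offsetof and sizeof. A minimal sketch, assuming an 8-byte double and a C++11 compiler; the struct names are made up for this post, and the push/pop form of the pragma is used so that the previous packing is restored afterwards.

```cpp
#include <cstddef>

// Laid out with the compiler's default packing.
struct DefaultLayout {
    char   c;
    double d;
};

#pragma pack(push, 4)  // save the current packing, then cap alignment at 4
struct Pack4Layout {
    char   c;
    double d;  // placed on a 4-byte boundary instead of its natural one
};
#pragma pack(pop)      // restore the saved packing

static_assert(offsetof(Pack4Layout, d) == 4, "d follows 3 padding bytes");
static_assert(sizeof(Pack4Layout) == 12, "1 + 3 (padding) + 8");
static_assert(alignof(Pack4Layout) == 4, "the structure alignment is capped too");
static_assert(sizeof(Pack4Layout) <= sizeof(DefaultLayout),
              "packing never makes the structure bigger");
```

The size of DefaultLayout is left unchecked on purpose: it is 16 where double is 8-byte aligned inside structures, and 12 where it is only 4-byte aligned.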


GCC 4
GCC understands #pragma pack just as Visual C++ does, but it also offers a more precise, per-member syntax:
struct foo
{
   int x[2] __attribute__ ((aligned (8))); // at least 8-byte aligned
};

struct foo
{
   char a;
   int x[2] __attribute__ ((packed)); // pack this member right behind a
};
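
Both attributes can be observed through offsetof. A minimal sketch, assuming GCC or a compatible compiler such as Clang, a 4-byte int, and C++11 for static_assert; the struct names are made up for this post and everything is guarded so that other compilers simply skip it.

```cpp
#include <cstddef>

#if defined(__GNUC__)

// The aligned attribute raises the alignment of x to at least 8 bytes.
struct AlignedMember {
    char a;
    int  x[2] __attribute__ ((aligned (8)));
};

// The packed attribute drops the alignment of x to 1 byte.
struct PackedMember {
    char a;
    int  x[2] __attribute__ ((packed));
};

static_assert(offsetof(AlignedMember, x) == 8, "7 padding bytes after a");
static_assert(offsetof(PackedMember, x) == 1, "x starts right behind a");
static_assert(sizeof(PackedMember) == 9, "1 + 8, no padding at all");

#endif
```

Keep in mind that a packed member may end up misaligned, so accessing it can cost more, or fail, on processors that do not tolerate unaligned accesses.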


e. Common data type size and alignment


VISUAL C++ / GCC (WIN32)
type      size (bytes)   alignment (bytes)
void *    4              4
bool      1              1
char      1              1
short     2              2
int       4              4
long      4              4
float     4              4
double    8              8

VISUAL C++ (WIN64)
type      size (bytes)   alignment (bytes)
void *    8              8
bool      1              1
char      1              1
short     2              2
int       4              4
long      4              4
float     4              4
double    8              8

MAC OS 10.6 (32 bits)
type      size (bytes)   alignment (bytes)
void *    4              4
bool      1              1
char      1              1
short     2              2
int       4              4
long      4              4
float     4              4
double    8              4

MAC OS 10.6 (64 bits)
type      size (bytes)   alignment (bytes)
void *    8              8
bool      1              1
char      1              1
short     2              2
int       4              4
long      8              8
float     4              4
double    8              8
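
The same kind of table can be produced for any other compiler and platform. A minimal sketch, assuming C++11 for alignof; append_row and size_and_alignment_table are names made up for this post, and the values obtained depend entirely on the compiler and target.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Appends one "type <tab> size <tab> alignment" line for the type T.
template <typename T>
void append_row(std::ostringstream& out, const char* name) {
    out << name << '\t' << sizeof(T) << '\t' << alignof(T) << '\n';
}

// Builds the table for the types listed in this section.
inline std::string size_and_alignment_table() {
    std::ostringstream out;
    out << "type\tsize\talignment\n";
    append_row<void*>(out, "void *");
    append_row<bool>(out, "bool");
    append_row<char>(out, "char");
    append_row<short>(out, "short");
    append_row<int>(out, "int");
    append_row<long>(out, "long");
    append_row<float>(out, "float");
    append_row<double>(out, "double");
    return out.str();
}
```

Printing the returned string, for example with std::cout, gives one line per type in the same order as the tables above.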


3. References


C++ specification
INTERNATIONAL STANDARD ISO/IEC 14882 Second edition 2003-10-15

Wikipedia
http://en.wikipedia.org/wiki/Data_structure_alignment

IBM
http://www.ibm.com/developerworks/library/pa-dalign

Microsoft MSDN
http://msdn.microsoft.com/en-us/library/aa290049%28VS.71%29.aspx

Intel
http://software.intel.com/en-us/articles/data-alignment-when-migrating-to-64-bit-intel-architecture
