C array initialisation behind the scenes
5/27/2025
What happens when you initialize an array in C?
Let’s see what assembly instructions (x86_64) are used by the compiler when we initialize an array with zeros.
Our simple test program will be this one, with the size varying between the tests.
#define SIZE 1
int main(void) {
    int tab[SIZE] = { 0 };
    return 0;
}
Compiler: gcc 15.1, platform: x86_64
You can use https://godbolt.org/ to test by yourself.
Arrays with size known at compile time
SIZE = 1
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 0
mov eax, 0
pop rbp
ret
With a size of 1, the compiler treats the array as if we had declared a single int:
it stores 0 in the 4 bytes at rbp-4.
SIZE = 2
Same output, but with a single 8-byte store: mov QWORD PTR [rbp-8], 0 sets tab[0] and tab[1] at once.
SIZE = 3
main:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-12], 0
mov DWORD PTR [rbp-4], 0
mov eax, 0
pop rbp
ret
Now we have 2 instructions:
- the first one setting 8 bytes to 0 (tab[0] and tab[1])
- the second one setting 4 bytes to 0 (tab[2])
SIZE = 4
Same as before but setting 2 QWORDS to 0
4x4 bytes = 16 bytes = 128 bits
SIZE = 5
main:
push rbp
mov rbp, rsp
pxor xmm0, xmm0
movaps XMMWORD PTR [rbp-32], xmm0
movd DWORD PTR [rbp-16], xmm0
mov eax, 0
pop rbp
ret
Things become interesting: with 5 ints, we need 5x4 bytes = 20 bytes = 160 bits.
That means, in theory, that we should use another mov instruction for a total of 3 movs, like this:
main:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-20], 0
mov QWORD PTR [rbp-12], 0
mov DWORD PTR [rbp-4], 0
mov eax, 0
pop rbp
ret
But that’s not what the compiler is doing, instead it uses SIMD (Single Instruction Multiple Data) instructions.
Those instructions operate on 128-bit (or 256-bit, 512-bit) operands held in special registers (xmm, ymm, zmm).
Let’s describe each instruction used here.
pxor xmm0, xmm0
pxor a, b does a bitwise XOR between a and b and stores the result in a, or a ^= b, if you prefer.
Here it XORs the register xmm0 with itself, which sets it to 0.
xmm0 is a 128-bit register.
movaps XMMWORD PTR [rbp-32], xmm0
movaps: Move Aligned Packed Single Precision Floating-Point Values. Here we move xmm0 (128 bits) to the address rbp-32.
Except we need 20 bytes, so it should be rbp-20, right?
Yes, but per the spec, when movaps has a memory operand, that operand must be aligned on a 16-byte boundary.
The next multiple of 16 greater than 20 is 32, that’s why the array starts at rbp-32.
After this instruction, we are left with 160-128 = 32 bits to set to 0, or 4 bytes.
This leads us to the next instruction.
movd DWORD PTR [rbp-16], xmm0
movd: Move Doubleword
It moves 32 bits from xmm0 to rbp-16. Since xmm0 is still full of zeros, this completes the array assignment to 0 (tab[4]).
I don’t know if reusing xmm0 is faster than doing a mov DWORD PTR [rbp-16], 0, but let’s say it is.
SIZE 5 to 20
The compiler uses several SIMD stores, all from the same zeroed xmm0 register.
SIZE = 21
New instructions appear!
main:
push rbp
mov rbp, rsp
lea rdx, [rbp-96]
mov eax, 0
mov ecx, 10
mov rdi, rdx
rep stosq
mov rdx, rdi
mov DWORD PTR [rdx], eax
add rdx, 4
mov eax, 0
pop rbp
ret
Let’s describe step by step what’s happening there.
lea rdx, [rbp-96]
lea: Load Effective Address, loads the address rbp-96 into the register rdx.
We should be at rbp-(21*4) = rbp-84, but I guess it’s aligned on a 16-byte boundary, and 96 is the next multiple of 16 greater than 84.
So rbp-96 is the address of tab[0].
mov eax, 0
Straightforward: moving 0 into eax, but I don’t know why yet
mov ecx, 10
Straightforward: moving 10 into ecx, but I don’t know why
Note that we are using the 32-bit names of the registers (eax, ecx), instead of rax and rcx.
But it doesn’t matter: on x86_64, writing to a 32-bit register sets the upper 32 bits of the full 64-bit register to zero.
mov rdi, rdx
Straightforward: moving rdx into rdi, so basically it puts the address of tab[0] into rdi.
rep stosq
Now that’s really something new to me.
repeat store quadword: copies the value of rax into [rdi] (at the address contained in rdi), X times.
X is read from rcx, which we set through ecx.
rdi is automatically incremented after each copy by sizeof(qword) = 8 bytes, for a total of 10 * 8 = 80, so rdi = rbp-16 at the end of the instruction.
The previous mov instructions now make sense:
- rax (written through eax) holds the value to copy
- rcx (written through ecx) holds the number of iterations
- rdi is the start address of the copying process
So, we have assigned zeros to 80 bytes but remember our array is 84-bytes long, so we still have work to do.
mov rdx, rdi
rdi is saved into rdx, so rdx is now rbp-16, or &tab[20].
mov DWORD PTR [rdx], eax
We already know what this does: it sets tab[20] to 0! (Remember eax contains zero.)
add rdx, 4
The compiler adds 4 to rdx, which becomes rbp-12, even though we don’t need it anymore since every cell of our array has already been set to zero.
Phew, all this work for 21 ints, let’s continue.
SIZE 22 to 2049
About the same instructions with rep stosq
SIZE = 2050
main:
push rbp
mov rbp, rsp
sub rsp, 8208
lea rax, [rbp-8208]
mov edx, 8200
mov esi, 0
mov rdi, rax
call memset
mov eax, 0
leave
ret
Finally, some changes. As usual, time to understand what’s going on.
sub rsp, 8208
The compiler subtracts 8208 from rsp.
That means we are reserving 8208 bytes on the stack.
So our array of 2050 ints is of size 2050 * 4 = 8200 bytes.
Like before, for the 16-byte alignment we use the next multiple of 16 which is 8208.
All good.
lea rax, [rbp-8208]
Now we know what that means, we load the address rbp-8208 (&tab[0]) into rax.
mov edx, 8200
mov esi, 0
mov rdi, rax
I took those 3 mov as a block, because those are arguments for the next instruction.
8200 -> edx: the size of our array, in bytes
0 -> esi: the value we want to assign to each cell
rax -> rdi: the address of our array (&tab[0])
call memset
Here we have a call to a function called memset.
The memset function (provided by the libc) is used to fill memory with a certain value.
The calling convention for x86_64 on Linux (the System V AMD64 ABI) passes integer and pointer arguments 1 to 6 in RDI, RSI, RDX, RCX, R8 and R9.
Here we only need 3, so RDI, RSI and RDX(EDX) are used.
Address into memory: RDI - 1st arg (&tab[0])
Assignment value: RSI - 2nd arg (0)
Size to fill (bytes): RDX - 3rd arg (8200 = sizeof(tab))
So the compiler decided that from 8200 bytes on, it’s better to leave the job of initializing the array to the libc.
And it’s actually the same for arrays of greater sizes.
We’ve seen how array initialization is done for fixed-size arrays. C99 also allows VLAs (Variable Length Arrays), and I may investigate what happens in that case, probably in a future post!
Thanks for reading :)