C array initialisation behind the scenes

5/27/2025

What happens when you initialize an array in C?

Let’s see what assembly instructions (x86_64) are used by the compiler when we initialize an array with zeros.

Our simple test program will be this one, with the size varying between the tests.

#define SIZE 1

int main(void) {
    int tab[SIZE] = { 0 };
    return 0;
}
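Before looking at the assembly, it may help to recall what `= { 0 }` means at the C level: the first element is initialised explicitly and the remaining ones are zero-initialised. Here is a minimal sketch that checks this; `all_zero` and `check_zero_init` are hypothetical helper names used only for illustration, and SIZE is set to 21 arbitrarily.

```c
#include <assert.h>
#include <stddef.h>

#define SIZE 21

/* Hypothetical helper: returns 1 if every element of a[0..n-1] is zero. */
static int all_zero(const int *a, size_t n) {
    assert(a != NULL);
    for (size_t i = 0; i < n; i++) {
        if (a[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/* Builds the same array as the test program and checks its contents. */
static int check_zero_init(void) {
    int tab[SIZE] = { 0 };
    return all_zero(tab, SIZE);
}
```

Calling `check_zero_init()` from a `main` should return 1 whatever value SIZE has, which is the behaviour the compiler has to implement in the listings that follow.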

Compiler: gcc 15.1, platform: x86_64

You can use https://godbolt.org/ to try it yourself.

Arrays with size known at compile time

SIZE = 1

main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 0
        mov     eax, 0
        pop     rbp
        ret

With a size of 1, the compiler treats the array as if we had declared a single int: it moves 0 into the 4 bytes at rbp-4.

SIZE = 2

Same output, but with a single QWORD (8-byte) store at rbp-8.

SIZE = 3

main:
        push    rbp
        mov     rbp, rsp
        mov     QWORD PTR [rbp-12], 0
        mov     DWORD PTR [rbp-4], 0
        mov     eax, 0
        pop     rbp
        ret

Now we have 2 instructions:

the first one setting 8 bytes to 0 (tab[0] and tab[1])
the second one setting 4 bytes to 0 (tab[2])
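The two stores above can be mimicked in C as a rough sketch. `zero_three_ints` is a hypothetical name, and the code assumes a 4-byte int, as on x86_64; it models the store sizes only, not what gcc generates from it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(int) == 4, "sketch assumes a 4-byte int");

/* Zeroes three ints with one 8-byte store followed by one 4-byte store,
   mirroring the QWORD + DWORD pair in the SIZE = 3 listing. */
static void zero_three_ints(int tab[3]) {
    const uint64_t zero64 = 0;
    const uint32_t zero32 = 0;
    memcpy(&tab[0], &zero64, sizeof zero64);  /* covers tab[0] and tab[1] */
    memcpy(&tab[2], &zero32, sizeof zero32);  /* covers tab[2] */
}
```

`memcpy` is used here instead of pointer casts so that the 8-byte write stays well defined in C.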

SIZE = 4

Same as before but setting 2 QWORDS to 0

4x4 bytes = 16 bytes = 128 bits

SIZE = 5

main:
        push    rbp
        mov     rbp, rsp
        pxor    xmm0, xmm0
        movaps  XMMWORD PTR [rbp-32], xmm0
        movd    DWORD PTR [rbp-16], xmm0
        mov     eax, 0
        pop     rbp
        ret

Things become interesting. With 5 ints, we need 5x4 bytes = 20 bytes = 160 bits.
In theory, that means one more mov instruction, for a total of 3 movs, like this:

main:
        push    rbp
        mov     rbp, rsp
        mov     QWORD PTR [rbp-20], 0
        mov     QWORD PTR [rbp-12], 0
        mov     DWORD PTR [rbp-4], 0
        mov     eax, 0
        pop     rbp
        ret

But that’s not what the compiler is doing. Instead, it uses SIMD (Single Instruction Multiple Data) instructions.

Those instructions can manipulate 128-bit (or 256-bit, 512-bit) operands held in dedicated registers (xmm, ymm, zmm respectively).

Let’s describe each instruction used here.

pxor    xmm0, xmm0

pxor a, b does a bitwise XOR between a and b (a ^ b, if you prefer) and stores the result in a.
Here it does an XOR of the register xmm0 with itself, setting it to 0.
xmm0 is a 128-bit register.

movaps  XMMWORD PTR [rbp-32], xmm0

movaps: Move Aligned Packed Single Precision Floating-Point Values. Here we move xmm0 (128 bits) into the 16 bytes starting at rbp-32.

Except we need 20 bytes, so it should be rbp-20, right?

Yes, but per the spec, the memory operand of movaps must be aligned on a 16-byte boundary.
The next multiple of 16 greater than 20 is 32, which is why we reserve 32 bytes.
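The rounding used in this reasoning can be written as a small helper. This is only a sketch of the arithmetic, not of how gcc actually lays out the stack frame; `round_up` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds n up to the next multiple of align.
   align must be a non-zero power of two (16 in this article). */
static size_t round_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}
```

With this helper, `round_up(20, 16)` gives 32, matching the offset seen for 5 ints.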

After this instruction, we are left with 160-128 = 32 bits to set to 0, or 4 bytes.

This leads us to the next instruction.

movd    DWORD PTR [rbp-16], xmm0

movd: Move Doubleword.
It moves the low 32 bits of xmm0 to rbp-16. Since xmm0 is still full of zeros, this completes the array assignment to 0 (tab[4]).

I don’t know if reusing xmm0 is faster than doing a mov DWORD PTR [rbp-16], 0, but let’s say it is.
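For readers who want to reproduce this sequence from C, here is a hedged sketch using the SSE2 intrinsics from `<emmintrin.h>`. `zero_five_ints` is a hypothetical name, the compiler remains free to pick different instructions than the ones in the listing, and a plain `memset` fallback is used when SSE2 is not available.

```c
#include <assert.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Zeroes 5 ints. tab must be 16-byte aligned for the aligned store. */
static void zero_five_ints(int *tab) {
    assert(tab != NULL);
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();     /* all 128 bits set to 0 */
    _mm_store_si128((__m128i *)tab, zero);  /* aligned 16-byte store: tab[0..3] */
    tab[4] = _mm_cvtsi128_si32(zero);       /* low 32 bits of the register: tab[4] */
#else
    memset(tab, 0, 5 * sizeof *tab);        /* portable fallback */
#endif
}
```

The caller is responsible for the alignment, for example by declaring the array with `_Alignas(16)`.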

SIZE 5 to 20

The compiler uses multiple SIMD instructions, all with the xmm0 register.

SIZE = 21

New instructions appear!

main:
        push    rbp
        mov     rbp, rsp
        lea     rdx, [rbp-96]
        mov     eax, 0
        mov     ecx, 10
        mov     rdi, rdx
        rep stosq
        mov     rdx, rdi
        mov     DWORD PTR [rdx], eax
        add     rdx, 4
        mov     eax, 0
        pop     rbp
        ret

Let’s describe step by step what’s happening there.

lea     rdx, [rbp-96]

lea: Load Effective Address. It loads the address rbp-96 into the register rdx.
We would expect rbp-(21*4) = rbp-84, but I guess the array is aligned on a 16-byte boundary, and 96 is the next multiple of 16 after 84.
So rbp-96 is the address of tab[0].

mov     eax, 0

Straightforward: moving 0 into eax, but I don’t know why yet.

mov     ecx, 10

Straightforward: moving 10 into ecx, but I don’t know why yet either.

Note that we are using the 32-bit names of the registers, instead of rax and rcx.
It doesn’t matter: on x86_64, writing to a 32-bit register sets the upper 32 bits of the full 64-bit register to zero.

mov     rdi, rdx

Straightforward: moving rdx into rdi, so basically it puts the address of tab[0] into rdi.

rep stosq

Now that’s really something new to me.
repeat store quadword: copies the value of rax into [rdi] (the address contained in rdi), X times.
X is fetched from rcx (which we set through ecx).

rdi is automatically incremented after each copy by sizeof(qword) = 8 bytes, for a total of 10 * 8 = 80, so rdi = rbp-96+80 = rbp-16 at the end of the instruction.

The previous mov instructions now make sense:

eax (the low half of rax) stores the value to copy
ecx (the low half of rcx) stores the iteration count
rdi is the start address of the copying process
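To make the roles of the three registers concrete, here is a small C model of the instruction. It is a sketch that assumes the direction flag is clear (addresses grow upwards); `rep_stosq` is a hypothetical function whose parameters are simply named after the registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models `rep stosq`: stores the 8-byte value rax at rdi, rcx times,
   advancing rdi by 8 after each store. Returns the final value of rdi. */
static unsigned char *rep_stosq(unsigned char *rdi, uint64_t rax, uint64_t rcx) {
    assert(rdi != NULL);
    while (rcx != 0) {
        memcpy(rdi, &rax, sizeof rax);  /* store one quadword */
        rdi += sizeof rax;              /* rdi += 8 */
        rcx--;                          /* one iteration consumed */
    }
    return rdi;
}
```

Called with a count of 10 on an 84-byte buffer, it returns a pointer 80 bytes past the start and leaves the last 4 bytes untouched.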

So, we have assigned zeros to 80 bytes, but remember our array is 84 bytes long, so we still have work to do.

mov     rdx, rdi

rdi is saved into rdx, so rdx is now rbp-16, or &tab[20].

mov     DWORD PTR [rdx], eax

We already know what this does: it sets tab[20] to 0! (Remember eax contains zero.)

add     rdx, 4

The compiler increments rdx by 4, giving rbp-12, even though we don’t need it anymore since we have already set every cell of our array to zero.
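The split between the rep stosq part and the trailing store is just a division by 8 with a remainder. The sketch below computes it; `fill_plan` and `plan_fill` are hypothetical names, a 4-byte int is assumed, and this only mirrors the arithmetic observed for 21 ints rather than gcc's actual decision logic.

```c
#include <assert.h>
#include <stddef.h>

static_assert(sizeof(int) == 4, "sketch assumes a 4-byte int");

struct fill_plan {
    size_t qwords;      /* number of 8-byte stores (the rep stosq count) */
    size_t tail_bytes;  /* bytes left over for a final, smaller store */
};

/* Splits an array of n_ints ints into 8-byte chunks plus a remainder. */
static struct fill_plan plan_fill(size_t n_ints) {
    size_t bytes = n_ints * sizeof(int);
    struct fill_plan p = { bytes / 8, bytes % 8 };
    return p;
}
```

For 21 ints this gives 10 quadwords and 4 trailing bytes, which matches the 10 loaded into ecx and the single DWORD store after the loop.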

Phew, all this work for 21 ints, let’s continue.

SIZE 22 to 2049

About the same instructions, built around rep stosq.

SIZE = 2050

main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 8208
        lea     rax, [rbp-8208]
        mov     edx, 8200
        mov     esi, 0
        mov     rdi, rax
        call    memset
        mov     eax, 0
        leave
        ret

Finally, some changes. As usual, it’s time to understand what’s going on.

sub     rsp, 8208

The compiler subtracts 8208 from rsp.
That means we are reserving 8208 bytes on the stack.
Our array of 2050 ints has a size of 2050 * 4 = 8200 bytes.

Like before, for the 16-byte alignment we use the next multiple of 16, which is 8208.
All good.

lea     rax, [rbp-8208]

Now we know what that means: we load the address rbp-8208 (&tab[0]) into rax.

mov     edx, 8200
mov     esi, 0
mov     rdi, rax

I grouped these 3 movs together because they set up the arguments for the next instruction.

8200 -> edx: the size of our array, in bytes
0 -> esi: the value we want to write into each byte of the array
rax -> rdi: the address of our array (&tab[0])

call    memset

Here we have a call to a function called memset.
The memset function (provided by the libc) is used to fill memory with a certain value.

The calling convention used here (System V AMD64, the one used on Linux x86_64) passes the first six integer or pointer arguments in RDI, RSI, RDX, RCX, R8, R9.
Here we only need 3, so RDI, RSI (ESI) and RDX (EDX) are used.

Address in memory: RDI - 1st arg (&tab[0])
Assignment value: RSI - 2nd arg (0)
Size to fill (bytes): RDX - 3rd arg (8200 = sizeof(tab))
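Written back in C, the three arguments correspond to an ordinary `memset` call. A minimal sketch follows, with `zero_with_memset` as a hypothetical wrapper name:

```c
#include <assert.h>
#include <string.h>

#define SIZE 2050

/* Zeroes n_ints ints with a single memset call:
   (address, fill byte, size in bytes), in that argument order.
   Returns what memset returns, i.e. the destination pointer. */
static void *zero_with_memset(int *tab, size_t n_ints) {
    assert(tab != NULL);
    return memset(tab, 0, n_ints * sizeof *tab);
}
```

For an array declared as `int tab[SIZE]`, calling `zero_with_memset(tab, SIZE)` passes a byte count of `sizeof(tab)`, the same 8200 seen in edx when int is 4 bytes wide.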

So the compiler decided that from 8200 bytes on, it’s better to leave the job of initializing the array to the libc.

And it’s actually the same for arrays of greater sizes.

We’ve seen how array initialization is done for fixed-size arrays. C99 also allows VLAs (Variable Length Arrays), and I may investigate what happens in that case, probably in a future post!

Thanks for reading :)