Tiniest hello world
10/15/2024
What’s the goal?
Trying to produce the smallest binary for a C Hello World.
Why?
For fun and learning. Any other questions?
What does the hello world do?
Write “Hello World!” to the standard output.
Environment, platform, architecture?
Platform: Linux x86_64
Compiler: gcc
Executable format: ELF 64-bit
How do we measure the size of the output binary?
Some would measure the size by the number of opcodes in the resulting binary.
For simplicity we will measure the file size of the resulting binary:
stat -c %s a.out
This command prints the size of the file in bytes.
Note that there are other ways to get the file size, but imho this is the simplest and most convenient one.
All code used here is available in my github repo.
Let’s start with the base code (01):
#include <stdio.h>
int main () {
    printf("Hello World!");
}
Pretty simple, right?
But this simple 5-line program compiles to a 15448-byte binary.
That’s waaaaaaaaay too much for us.
Let’s find ways to shrink that.
gcc optimizations
gcc has optimization flags that you can use to increase a program’s speed, reduce its size…
You can check what’s available for your version of gcc with gcc -Q --help=optimizers.
I tried several options like -Os and -Oz, which optimize for size, but the output size did not change.
Time to try something else.
Replacing printf by a lower level function (02)
What if we replaced the printf call by the write syscall?
Would it change anything?
Let’s see.
#include <unistd.h>
int main () {
    write(1, "Hello World!", 12);
}
Binary size: 15448 bytes
No changes, but we will keep the write version for the next steps.
Get rid of the main function and the C runtime (03)
Did you know that you could write a C program without a main function?
Let’s see how this works.
#include <unistd.h>
void _start () {
    write(1, "Hello World!", 12);
    _exit(0);
}
Okay so what happened there?
_start is actually the entrypoint of a program run on Linux.
The _start function provided by the C runtime:
- initializes C runtime stuff
- gets arguments (argc, argv)
- calls your main function
- handles the returned result and exits
Note that this is MY understanding of how things work and it is not an exhaustive list.
What changed in the code?
We replaced the main function by the _start function.
We had to explicitly call _exit to terminate the program (it segfaults if we don’t, because nothing called _start, so there is no valid address to return to).
Note that we could use exit instead of _exit, the binary size is the same.
Time to compile
/usr/bin/ld: /tmp/cc2wHwD6.o: in function `_start':
03_hello.c:(.text+0x0): multiple definition of `_start'; /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../lib/Scrt1.o:(.text+0x0): first defined here
/usr/bin/ld: /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../lib/Scrt1.o: in function `_start':
(.text+0x1b): undefined reference to `main'
collect2: error: ld returned 1 exit status
UH-OH what is going on?
The linker says that
- it found multiple definitions of _start
- main was not found
Indeed we forgot to tell gcc NOT to use the C runtime’s _start function (which expects a main function to be found).
Let’s change this: gcc -nostartfiles <YOUR_C_FILE>
Recompiling… and YES we got our new binary!
Do not forget to test it before getting its size, we want a small binary but above all a WORKING binary :)
Binary size: 14296 bytes
YES! Finally some improvement. I know, I know, it’s not that much, but we can do better.
Get rid of the libc (04)
This one will be tough.
Libc wraps Linux system calls and provides convenient functions to deal with strings, memory…
How do we talk to the OS without the libc?
We need assembly.
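#include <unistd.h>
#include <stdint.h>
void _start () {
    /* write(1, "Hello World!", 12): syscall number 1 goes in rax */
    asm (
        "movq $1, %%rax;"
        "movq %0, %%rdi;"
        "movq %1, %%rsi;"
        "movq %2, %%rdx;"
        "syscall;"
        :
        : "r"((uint64_t)1), "r"((const char*)"Hello World!"), "r"((size_t)12)
        /* syscall also overwrites rcx and r11, and the kernel reads the string from memory */
        : "%rax", "%rdi", "%rsi", "%rdx", "%rcx", "%r11", "memory"
    );
    /* exit(0): syscall number 60 goes in rax */
    asm (
        "movq $60, %%rax;"
        "movq $0, %%rdi;"
        "syscall;"
        :
        :
        : "%rax", "%rdi", "%rcx", "%r11"
    );
}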
We replaced our system calls with the equivalent assembly code.
Since we don’t need libc anymore we can compile like this: gcc -nostdlib <C_FILE>.
Notes:
The asm keyword is a GNU extension; it might not work with other compilers or in strict ISO C mode, use __asm__ if needed.
That syntax is called extended asm: it lets us read C variables from assembly, see %0, %1 and %2.
For more information about this, look at the gcc asm doc.
Binary size: 13784 bytes
Ok so we reduced the size a bit.
But it should not be that big, where do those kilobytes come from?
The next steps will be heavily inspired by this blog post that I recommend reading.
Actually most of those bytes come from linking and ELF (Executable and Linkable Format) sections.
For now we got rid of libc, but we still rely on the C language.
Let’s get rid of C, farewell my friend.
Get rid of C (05)
That means we cannot use our .c file anymore.
Luckily we already wrote a good amount of assembly in the previous step.
global _start
section .data
    msg: db "Hello world!"
section .text
_start:
    ; write(1, msg, 12)
    mov rax, 1
    mov rdi, 1
    mov rsi, msg
    mov rdx, 12
    syscall
    ; exit(0)
    mov rax, 60
    mov rdi, 0
    syscall
Some things to notice here:
- Reversed src/dst operand order (destination comes first)
- mov instead of movq
- no ”$” before values (optional)
This is to comply with nasm syntax, which is Intel-style, whereas gcc inline asm uses AT&T syntax.
Finally, the build command:
nasm -f elf64 05_hello.asm && gcc 05_hello.o -o a.out -no-pie -nostdlib
First, we generate an object file with nasm from our assembly code.
Second, we still use gcc to link the object file (though we don’t need gcc anymore).
Binary size: 8928 bytes
Nice, almost 5k saved!
Get rid of gcc
Since we do not have any C code we don’t need gcc, let’s link our executable ourselves.
nasm -f elf64 05_hello.asm && ld -m elf_x86_64 05_hello.o -o a.out
Binary size: 8848 bytes
Not much of an improvement but it’s still better than before.
What’s next?
You have seen many occurrences of ELF, elf64… let’s dive in :)
ELF optimization (06)
We don’t need symbols and relocation information since we don’t use any functions or dynamically linked libs.
To remove them use strip -s ./a.out.
From now on, I will strip symbols after every a.out generation.
Binary size: 8488 bytes
An ELF file has different sections.
By examining them (e.g. with readelf), .data seems to take a lot of space, let’s remove it.
global _start
section .text
_start:
    ; write(1, msg, 12)
    mov rax, 1
    mov rdi, 1
    mov rsi, msg
    mov rdx, 12
    syscall
    ; exit(0)
    mov rax, 60
    mov rdi, 0
    syscall
    ; the string now lives in .text, right after the code
    msg: db "Hello world!"
Binary size: 4360 bytes
Bingo! The size got divided by 2.
Can we shrink MORE?
Let’s look at the bytes contained in our binary now.
You can use a tool like xxd to get a hexdump.
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000 .ELF............
00000010: 0200 3e00 0100 0000 0010 4000 0000 0000 ..>.......@.....
00000020: 4000 0000 0000 0000 4810 0000 0000 0000 @.......H.......
00000030: 0000 0000 4000 3800 0200 4000 0300 0200 ....@.8...@.....
00000040: 0100 0000 0400 0000 0000 0000 0000 0000 ................
00000050: 0000 4000 0000 0000 0000 4000 0000 0000 ..@.......@.....
00000060: b000 0000 0000 0000 b000 0000 0000 0000 ................
00000070: 0010 0000 0000 0000 0100 0000 0500 0000 ................
00000080: 0010 0000 0000 0000 0010 4000 0000 0000 ..........@.....
00000090: 0010 4000 0000 0000 3300 0000 0000 0000 ..@.....3.......
000000a0: 3300 0000 0000 0000 0010 0000 0000 0000 3...............
000000b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
...
00000ff0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001000: b801 0000 00bf 0100 0000 48be 2710 4000 ..........H.'.@.
00001010: 0000 0000 ba0c 0000 000f 05b8 3c00 0000 ............<...
00001020: bf00 0000 000f 0548 656c 6c6f 2077 6f72 .......Hello wor
00001030: 6c64 2100 2e73 6873 7472 7461 6200 2e74 ld!..shstrtab..t
00001040: 6578 7400 0000 0000 0000 0000 0000 0000 ext.............
00001050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001080: 0000 0000 0000 0000 0b00 0000 0100 0000 ................
00001090: 0600 0000 0000 0000 0010 4000 0000 0000 ..........@.....
000010a0: 0010 0000 0000 0000 3300 0000 0000 0000 ........3.......
000010b0: 0000 0000 0000 0000 1000 0000 0000 0000 ................
000010c0: 0000 0000 0000 0000 0100 0000 0300 0000 ................
000010d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000010e0: 3310 0000 0000 0000 1100 0000 0000 0000 3...............
000010f0: 0000 0000 0000 0000 0100 0000 0000 0000 ................
00001100: 0000 0000 0000 0000 ........
Notice the bytes from 0xb0 to 0xfff: they are all zeros, we should find a way to remove them.
It’s time to manually build our binary, without the assembler or the linker.
What do we need?
ELF header + Program Header Table + program code.
Note that in a usual ELF exe you would find symbols, relocation code and many sections for different parts of your program (code, readonly data, uninitialized data, initialized data…).
ELF header
64 bytes
Needed for the OS to understand what kind of ELF file it is.
readelf -h ./a.out
This command shows that
- our ELF header is 64 bytes long
- program header table starts at byte 64 (so just after the ELF header)
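- each program header is 56 bytes long
So, to get the first part of our file, we need the first 120 bytes of our binary: the 64-byte ELF header plus the first 56-byte program header (the file declares two program headers, but we only keep the first one and will patch it later).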
head -c 120 ./a.out
Now our actual code
Let’s see where our code is (.text section) and how long it is.
readelf -S ./a.out
There are 3 section headers, starting at offset 0x1048:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000401000 00001000
0000000000000033 0000000000000000 AX 0 0 16
[ 2] .shstrtab STRTAB 0000000000000000 00001033
0000000000000011 0000000000000000 0 0 1
If we look at the .text section, we can see its size 0x33 (51 bytes) and its offset in the file 0x1000 (4096).
Okay, how do we dump that part of the file?
You could use many tools like tail, head, xxd or even dd.
Here is a simple example with tail and head:
tail -c +4097 ./a.out | head -c 51
The trick here is to use tail’s +X syntax, so tail reads from byte X until the end of the file.
Note the 4097 instead of 4096, because tail counts from 1, not 0.
Then we pipe that output to head to only get the first 51 bytes, our code :)
Now that we have the ELF header, the program header table and the code, let’s concatenate everything.
cat <(head -c 120 ./a.out) <(tail -c +4097 ./a.out | head -c 51) > ./a2.out
Some details about this command:
cat concatenates the files passed as args to its stdout.
Here we don’t have file names but the outputs of 2 commands.
With bash’s process substitution ”<(list)”, cat actually sees temporary file descriptors created by bash, corresponding to the output of the command ”list”.
Do not forget to set your new file executable: chmod +x ./a2.out
Our small binary won’t work yet. Why?
We moved the executable part to a different offset in the file, which no longer matches the information stored in the ELF header / program header table.
Check for yourself with readelf -a ./a2.out
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x401000
Start of program headers: 64 (bytes into file)
Start of section headers: 4168 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 2
Size of section headers: 64 (bytes)
Number of section headers: 3
Section header string table index: 2
readelf: Error: Reading 192 bytes extends past end of file for section headers
readelf: Error: Section headers are not available!
readelf: Error: Reading 112 bytes extends past end of file for program headers
We need to manually modify some bytes of the binary with a hex editor.
Modify the ELF
To modify the ELF we need to understand which bytes of the ELF must be changed.
See https://man7.org/linux/man-pages/man5/elf.5.html for a description of the ELF header.
Entrypoint address
As we saw with readelf, the current entrypoint address, 0x401000, is wrong.
It should be 0x400078 because our code is now at position 120 (0x78) in the file, and the file is loaded at 0x400000.
Entrypoint address is the 5th field of the ELF header, starting at the 24th byte (0x18) and 64 bits long (8 bytes) (for my architecture x86_64).
In my editor, I can see 00 10 40 00 00 00 00 00.
My architecture is little-endian, so 0x00 is the least significant byte, then goes 0x10, etc…
We should have 78 00 40 00 00 00 00 00
Start of section header
Originally that was 4168 (0x1048) but we don’t have a section header anymore so it should be 0.
Start of section header is the 7th field of the ELF header, starting at the 40th byte (0x28) and 8 bytes long.
Number of program headers
Originally 2, but we have only 1 program header.
Nb of program headers is the 11th field of the ELF header, starting at the 56th byte (0x38), 2 bytes long.
Number of section headers
Originally 3 but we now have 0 section headers.
Nb of section headers is the 13th field of the ELF header, starting at the 60th byte (0x3C), 2 bytes long.
Section header string table index
Originally 2 but we now have 0 sections, this should be 0.
It’s the last field of the ELF header, starting at the 62nd byte (0x3E), 2 bytes long.
Now that we are done with the ELF header, some changes must be made to the program header table.
Program header table
The offset from the beginning of the file at which the first byte of the segment resides changed.
In the program header we kept, this field was 0x0 (the header of the code segment, which we dropped, had 0x1000); it must now be 0x78.
It’s the 3rd field of the program header, 72nd byte (0x48), 8 bytes long.
The virtual address of the segment has changed
Originally 0x400000, now 0x400078.
It’s the 4th field of the program header, since the program header follows the ELF header, it’s at the 80th byte (0x50), 8 bytes long.
Flags related to the segment (permissions rwx)
Originally 0x004 which means read-only, we need at least read and exec for our code so 0x005.
It’s the 2nd field of the program header (for x86_64 arch), 68th byte (0x44), 4 bytes long.
Relocate the Hello World! string
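The mov rsi instruction still holds the old address of the string, 0x0000000000401027 (8 bytes on a 64bit arch).
Since we moved our .text section with our custom string, the call to write(1) refers to a wrong address. The string sits 0x27 bytes after the start of the code, so its new address is 0x400078 + 0x27 = 0x000000000040009f.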
We now have a functional binary (check it yourself ;)
Final binary size: 171 bytes
If you checked https://ech0.re/building-the-smallest-elf-program/, you can see that our binary is one byte smaller than the one built there.
It’s because our “Hello world!” string does not contain the line feed (\n).
That’s all folks!
Final binary:
7f454c4602010100000000000000000002003e0001000000780040000000
000040000000000000000000000000000000000000004000380001000000
000000000100000005000000780000000000000078004000000000000000
400000000000b000000000000000b0000000000000000010000000000000
b801000000bf0100000048be9f00400000000000ba0c0000000f05b83c00
0000bf000000000f0548656c6c6f20776f726c6421