Here’s how I reduced the assembly binary size in Tiny Binaries from 456 bytes to 114 bytes.
Shrinking the Code
Below is the original assembly code (asm-naive in the results):
;
; hi.s: unoptimized linux x86-64 assembly implementation.
;
bits 64
global _start
section .rodata
; "hi!\n"
hi db "hi!", 10
len equ $ - hi
section .text
_start:
mov rax, 1 ; write
mov rdi, 1 ; fd
mov rsi, hi ; msg
mov rdx, len ; len
syscall ; call write()
mov rax, 60 ; exit
mov rdi, 0 ; exit code
syscall ; call exit()
This produces a 456 byte binary with 39 bytes of code and 4 bytes of data:
$ make
nasm -f elf64 -o hi.o hi.s
ld -s -static -nostdinc -o hi hi.o
$ wc -c ./hi
456 ./hi
$ objdump -hd -Mintel ./hi
...
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000027 0000000000400080 0000000000400080 00000080 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .rodata 00000004 00000000004000a8 00000000004000a8 000000a8 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
Disassembly of section .text:
0000000000400080 <.text>:
400080: b8 01 00 00 00 mov eax,0x1
400085: bf 01 00 00 00 mov edi,0x1
40008a: 48 be a8 00 40 00 00 movabs rsi,0x4000a8
400091: 00 00 00
400094: ba 04 00 00 00 mov edx,0x4
400099: 0f 05 syscall
40009b: b8 3c 00 00 00 mov eax,0x3c
4000a0: bf 00 00 00 00 mov edi,0x0
4000a5: 0f 05 syscall
First, we replace the unnecessary 5 byte instructions with smaller equivalents:
diff --git a/src/asm-naive/hi.s b/src/asm-naive/hi.s
index 9d17cab..3694091 100644
--- a/src/asm-naive/hi.s
+++ b/src/asm-naive/hi.s
@@ -14,12 +14,12 @@ section .rodata
section .text
_start:
- mov rax, 1 ; write
- mov rdi, 1 ; fd
+ inc al ; write
+ inc edi ; fd
mov rsi, hi ; msg
- mov rdx, len ; len
+ mov dl, len ; len
syscall ; call write()
- mov rax, 60 ; exit
- mov rdi, 0 ; exit code
+ mov al, 60 ; exit
+ xor edi, edi ; exit code
syscall ; call exit()
Notes:
- inc al works because Linux zeros registers on process init.
- inc edi is 2 bytes. Another 2 byte option is mov edi, eax. The other candidates (inc dil, inc di, mov dil, al, and mov di, ax) are all 3 bytes.
- xor edi, edi is 2 bytes. The other candidates (mov dil, 0, mov di, 0, mov edi, 0, xor dil, dil, xor di, di, and xor rdi, rdi) are all 3-5 bytes.
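The size comparison in these notes can be tabulated with a short Python sketch. The byte strings below are hand-assembled from the x86-64 encoding rules and are an assumption, not nasm output from the post; an assembler may pick an equivalent alternative encoding of the same length. Note that mov edi, eax only sets edi to 1 because eax already holds 1 at that point.

```python
# Candidate encodings for "set edi to 1" and "set edi to 0".
# ASSUMPTION: bytes are hand-assembled, not captured from nasm; confirm with
# an assembler before relying on them.
SET_FD_TO_1 = {
    "inc edi":      "ff c7",
    "mov edi, eax": "89 c7",      # relies on eax already being 1
    "inc dil":      "40 fe c7",   # REX prefix needed to address dil
    "inc di":       "66 ff c7",   # operand-size prefix for 16-bit
    "mov dil, al":  "40 88 c7",
    "mov di, ax":   "66 89 c7",
}
SET_EXIT_CODE_TO_0 = {
    "xor edi, edi": "31 ff",
    "mov dil, 0":   "40 b7 00",
    "mov di, 0":    "66 bf 00 00",
    "mov edi, 0":   "bf 00 00 00 00",
    "xor dil, dil": "40 30 ff",
    "xor di, di":   "66 31 ff",
    "xor rdi, rdi": "48 31 ff",
}


def sizes(table):
    """Map each mnemonic to its encoded length in bytes."""
    return {asm: len(bytes.fromhex(hexstr)) for asm, hexstr in table.items()}


if __name__ == "__main__":
    for table in (SET_FD_TO_1, SET_EXIT_CODE_TO_0):
        for asm, size in sorted(sizes(table).items(), key=lambda kv: kv[1]):
            print(f"{size} bytes  {asm}")
```

The 8-bit and 16-bit forms lose because they need a REX or operand-size prefix, while the 32-bit forms need neither.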
These changes shrink the binary size to 440 bytes, with 24 bytes of code and 4 bytes of data:
$ make
nasm -f elf64 -o hi.o hi.s
ld -s -static -nostdinc -o hi hi.o
$ wc -c ./hi
440 ./hi
$ objdump -hd -Mintel ./hi
...
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000018 0000000000400080 0000000000400080 00000080 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .rodata 00000004 0000000000400098 0000000000400098 00000098 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
Disassembly of section .text:
0000000000400080 <.text>:
400080: fe c0 inc al
400082: ff c7 inc edi
400084: 48 be 98 00 40 00 00 movabs rsi,0x400098
40008b: 00 00 00
40008e: b2 04 mov dl,0x4
400090: 0f 05 syscall
400092: b0 3c mov al,0x3c
400094: 31 ff xor edi,edi
400096: 0f 05 syscall
The code is now 24 bytes, of which 10 are one large mov instruction.
We can drop 2 bytes of code, 4 bytes of data, and the .rodata section by doing the following:
- Remove mov rsi, hi (-10 bytes, good riddance).
- Drop the .rodata section (-4 bytes of data plus .rodata section overhead).
- Encode "hi!\n" as a 32-bit integer and push it to the stack (+5 bytes). Hint: "hi!\n" = 68 69 21 0a, encoded as 0x0a216968, plus one byte for the push opcode.
- Copy rsp to rsi (+3 bytes). This gives write a valid pointer.
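As a quick check of the hint and the byte budget, here is a minimal Python sketch. It assumes only the facts in the list above plus the standard encoding of push imm32 (opcode 0x68 followed by a little-endian immediate); the variable names are illustrative.

```python
import struct

MESSAGE = b"hi!\n"

# Read the four message bytes as one little-endian 32-bit integer.
imm32 = int.from_bytes(MESSAGE, "little")

# push imm32 is opcode 0x68 followed by the immediate in little-endian order,
# so the message bytes end up in memory in their original order.
push_insn = b"\x68" + struct.pack("<I", imm32)

# Code-size budget for this step, using the byte counts from the list above:
# drop the 10 byte mov, add the push, add the 3 byte mov rsi, rsp.
code_before = 24
code_after = code_before - 10 + len(push_insn) + 3

if __name__ == "__main__":
    print(f"{imm32:#010x}")
    print(push_insn.hex(" "))
    print(code_after)
```

In 64-bit mode the push stores 8 bytes (the immediate is sign-extended), but the first 4 bytes at rsp are the message, which is all that write needs with a length of 4.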
Here’s the result:
bits 64
; "hi!\n", encoded as 32-bit little-endian int
str: equ 0x0a216968
section .text
global _start
_start:
push dword str ; push str (68 68 69 21 0a)
inc al ; write() (fe c0)
inc edi ; fd (ff c7)
mov rsi, rsp ; msg (48 89 e6)
mov dl, 4 ; len (b2 04)
syscall ; call write() (0f 05)
mov al, 60 ; exit() (b0 3c)
xor edi, edi ; exit code (31 ff)
syscall ; call exit() (0f 05)
This produces a 360 byte binary with 22 bytes of code and no data section:
$ make
nasm -f elf64 -o hi.o hi.s
ld -s -static -nostdinc -o hi hi.o
$ ./hi
hi!
$ wc -c ./hi
360 ./hi
$ objdump -h ./hi
...
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000016 0000000000400080 0000000000400080 00000080 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
This is the smallest legitimate assembly implementation that I could cook up. It’s available in the companion GitHub repository and shown in the results as asm-opt.
Dirty Tricks
In Tiny ELF Files: Revisited in 2021, Nathan Otterness created a 114 byte static x86-64 Linux binary by overlapping portions of the ELF header with the program header, then embedding the code in unverified* gaps of the ELF header.
(* Unverified by Linux, that is. Junk in these fields causes readelf and objdump to give these binaries the stink eye, as we’ll see shortly.)
Nathan also created a handy table showing which ELF header bytes are unverified by Linux. In particular, there are two unverified 12 byte regions at offsets 4 and 40 which could store our 22 bytes of code.
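To see which header fields those two regions cover, the following Python sketch computes the field offsets of the 64-bit ELF header. The field list follows the standard Elf64_Ehdr layout; the region tuples and size constants are illustrative names taken from the numbers in the paragraph above, and which bytes the kernel ignores is taken from the post, not checked here.

```python
import struct

# Elf64_Ehdr fields in file order, with struct format codes.
ELF64_EHDR = [
    ("e_ident", "16s"), ("e_type", "H"), ("e_machine", "H"),
    ("e_version", "I"), ("e_entry", "Q"), ("e_phoff", "Q"),
    ("e_shoff", "Q"), ("e_flags", "I"), ("e_ehsize", "H"),
    ("e_phentsize", "H"), ("e_phnum", "H"), ("e_shentsize", "H"),
    ("e_shnum", "H"), ("e_shstrndx", "H"),
]


def field_offsets(fields):
    """Return {name: (offset, size)} for a packed little-endian layout."""
    offsets, pos = {}, 0
    for name, fmt in fields:
        size = struct.calcsize("<" + fmt)
        offsets[name] = (pos, size)
        pos += size
    return offsets


OFFSETS = field_offsets(ELF64_EHDR)

# The two 12 byte regions from the text, as (start, length).
REGIONS = [(4, 12), (40, 12)]
CODE_BYTES = 22
JUMP_BYTES = 2  # a short jump is needed to hop from one region to the other

if __name__ == "__main__":
    for name, (offset, size) in OFFSETS.items():
        print(f"{offset:2d}  {size:2d}  {name}")
    capacity = sum(length for _, length in REGIONS)
    print("capacity:", capacity, "needed:", CODE_BYTES + JUMP_BYTES)
```

The region at offset 4 is the part of e_ident after the 4 byte magic number, and the region at offset 40 spans e_shoff and e_flags. Together they hold 24 bytes, which is exactly the 22 bytes of code plus a 2 byte jump.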
I reordered the code and divided it into two chunks:
- code_0: First 10 bytes, plus a two byte jump to code_1.
- code_1: Remaining 12 bytes.
Here’s the assembly for the two chunks:
; entry point
; (10 bytes of code plus a 2 byte jump to code_1)
code_0:
push dword str ; push string onto stack (68 68 69 21 0a)
inc al ; write() (fe c0)
mov rsi, rsp ; str (48 89 e6)
jmp code_1 ; jump to next chunk (eb 18)
; ...
; second code chunk
; (12 bytes)
code_1:
inc edi ; fd (ff c7)
mov dl, 4 ; len (b2 04)
syscall ; call write() (0f 05)
mov al, 60 ; exit() (b0 3c)
xor edi, edi ; 0 exit code (31 ff)
syscall ; call exit() (0f 05)
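The eb 18 encoding of the jump can be derived from the chunk offsets. This is a minimal Python sketch, assuming code_0 sits at file offset 4 and code_1 at file offset 40 (the two regions described earlier); the helper name short_jump is hypothetical.

```python
def short_jump(jmp_offset, target_offset):
    """Encode a `jmp rel8` located at jmp_offset that lands on target_offset.

    rel8 is measured from the end of the 2 byte jump instruction.
    """
    rel = target_offset - (jmp_offset + 2)
    if not -128 <= rel <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([0xEB, rel & 0xFF])


CODE_0_OFFSET = 4    # first unverified region
CODE_1_OFFSET = 40   # second unverified region
CODE_0_BODY = 10     # push (5) + inc al (2) + mov rsi, rsp (3)

jmp_offset = CODE_0_OFFSET + CODE_0_BODY
jmp_insn = short_jump(jmp_offset, CODE_1_OFFSET)

if __name__ == "__main__":
    print(jmp_insn.hex(" "))
```

The jump starts at offset 14 and ends at offset 16, so the displacement to offset 40 is 24, or 0x18. Because the displacement is relative, it is the same whether it is computed from file offsets or from load addresses.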
These changes shrink the binary to 114 bytes. Linux will still happily execute the binary, but readelf, objdump, and file can’t make sense of it:
$ make
nasm -f bin -o hi hi.s
chmod a+x hi
$ ./hi
hi!
$ wc -c ./hi
114 ./hi
$ objdump -hd -Mintel ./hi
objdump: ./hi: file format not recognized
$ readelf -SW ./hi
There are 65329 section headers, starting at offset 0x3a:
readelf: Warning: The e_shentsize field in the ELF header is larger than the size of an ELF section header
readelf: Error: Reading 1014951344 bytes extends past end of file for section headers
readelf: Error: Too many program headers - 0x50f - the file is not that big
$ file ./hi
./hi: ELF, unknown class 104
$ hd ./hi
00000000 7f 45 4c 46 68 68 69 21 0a fe c0 48 89 e6 eb 18 |.ELFhhi!...H....|
00000010 02 00 3e 00 01 00 00 00 04 80 02 00 00 00 00 00 |..>.............|
00000020 3a 00 00 00 00 00 00 00 ff c7 b2 04 0f 05 b0 3c |:..............<|
00000030 31 ff 0f 05 40 00 38 00 01 00 01 00 00 00 05 00 |1...@.8.........|
00000040 00 00 00 00 00 00 00 00 00 00 00 80 02 00 00 00 |................|
00000050 00 00 00 00 00 00 00 00 00 00 72 00 00 00 00 00 |..........r.....|
00000060 00 00 72 00 00 00 00 00 00 00 00 00 00 00 00 00 |..r.............|
00000070 00 00 |..|
00000072
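The overlap is easier to see by decoding the dump as a 64-bit ELF header and program header. The sketch below is a Python illustration, not part of the original build: the bytes are transcribed from the hexdump above and the struct formats follow the standard Elf64_Ehdr and Elf64_Phdr layouts.

```python
import struct

# Bytes transcribed from the hexdump above.
HI = bytes.fromhex("""
7f 45 4c 46 68 68 69 21 0a fe c0 48 89 e6 eb 18
02 00 3e 00 01 00 00 00 04 80 02 00 00 00 00 00
3a 00 00 00 00 00 00 00 ff c7 b2 04 0f 05 b0 3c
31 ff 0f 05 40 00 38 00 01 00 01 00 00 00 05 00
00 00 00 00 00 00 00 00 00 00 00 80 02 00 00 00
00 00 00 00 00 00 00 00 00 00 72 00 00 00 00 00
00 00 72 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00
""")

# Elf64_Ehdr as a packed little-endian struct (64 bytes).
(e_ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
 e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
 ) = struct.unpack_from("<16sHHIQQQIHHHHHH", HI, 0)

# Elf64_Phdr (56 bytes), read from wherever e_phoff points.
(p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
 ) = struct.unpack_from("<IIQQQQQQ", HI, e_phoff)

# Translate the entry point back to a file offset within the single segment.
entry_file_offset = e_entry - p_vaddr + p_offset

if __name__ == "__main__":
    print("file size:        ", len(HI))
    print("e_phoff:          ", e_phoff, "(header is", e_ehsize, "bytes)")
    print("phdr ends at:     ", e_phoff + e_phentsize)
    print("entry file offset:", entry_file_offset)
    print("code_0:", HI[4:16].hex(" "))
    print("code_1:", HI[40:52].hex(" "))
```

The program header starts at offset 58, six bytes before the 64 byte ELF header ends, and its 56 bytes run to offset 114, the end of the file. The entry point maps back to file offset 4, where code_0 begins.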
It’ll even run as a Docker image:
$ cat Dockerfile
FROM scratch
COPY ./hi /hi
ENTRYPOINT ["/hi"]
$ docker build -t zoinks .
Sending build context to Docker daemon 8.704kB
Step 1/3 : FROM scratch
--->
Step 2/3 : COPY ./hi /hi
---> 336acd7e2d94
Step 3/3 : ENTRYPOINT ["/hi"]
---> Running in e20eca61de44
Removing intermediate container e20eca61de44
---> 297e5d7db5f8
Successfully built 297e5d7db5f8
Successfully tagged zoinks:latest
$ docker images zoinks --format '{{.Size}}'
114B
$ docker run --rm -it zoinks
hi!
$
This glorious monstrosity is included in the companion repository and shown in the results as asm-elf.
Links
If you enjoyed this post, you may also like:
- Tiny Binaries Repository: Companion repository with source code, build instructions, and additional details for this post.
- Tiny ELF Files: Revisited in 2021: Tiny static x86-64 Linux binaries, including a table of unverified ELF header bytes.
- A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux: Classic article on tiny 32-bit static binaries.
- My Own Private Binary: Sequel to A Whirlwind Tutorial where the author creates a 0 byte executable using a kernel module.
Update (2022-01-02): Shorten, fix typos, improve grammar.