After days and days and days of troubleshooting odd problems in my homebrew x86_64 OS, I found out that they were caused by my C compiler. After spending all these years arguing with everyone that it is easier to make a hobby OS in pure assembly, I decided to make this OS with asm and C, and I just proved to myself that C is evil! Seriously, C is not evil, but it does hide a lot of things, which makes it hard to know what your OS actually does.
So after disassembling all my C code and inspecting the assembly code, I found this:
55 push rbp
4889E5 mov rbp,rsp
48897DE8 mov [rbp-0x18],rdi
488975E0 mov [rbp-0x20],rsi
488955D8 mov [rbp-0x28],rdx
C745FC00000000 mov dword [rbp-0x4],0x0
C9 leave
C3 ret
Notice how the function never decrements the stack pointer? The arguments and the local variable are stored below the stack pointer. Can you imagine how insane that is? What would happen if an interrupt triggered while in that function? If the handler runs on the same stack, the CPU pushes the interrupt frame right over that data. The correct code would be:
55 push rbp
4889E5 mov rbp,rsp
4883EC28 ---> sub rsp,byte +0x28 <---
48897DE8 mov [rbp-0x18],rdi
488975E0 mov [rbp-0x20],rsi
488955D8 mov [rbp-0x28],rdx
C745FC00000000 mov dword [rbp-0x4],0x0
C9 leave
C3 ret
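To make the interrupt question concrete, here is a small user-space simulation in C. It is only a sketch under stated assumptions: the stack is modeled as a byte array, `rbp` is assumed to be 16-byte aligned after the prologue, and the interrupt is assumed to be delivered on the same stack (no privilege change, no IST). In 64-bit mode the CPU then aligns RSP down to 16 bytes and pushes SS, RSP, RFLAGS, CS and RIP, which is 0x28 bytes. All names (`push_interrupt_frame`, `red_zone_locals_survive`) and the marker values are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_SIZE 256u

/* Hypothetical model of a tiny stack: array offsets play the role of addresses. */
static uint8_t stack_mem[STACK_SIZE];

static void store64(uint64_t addr, uint64_t value)
{
    assert(addr + 8 <= STACK_SIZE);
    memcpy(&stack_mem[addr], &value, sizeof value);
}

static uint64_t load64(uint64_t addr)
{
    uint64_t value;
    assert(addr + 8 <= STACK_SIZE);
    memcpy(&value, &stack_mem[addr], sizeof value);
    return value;
}

/* Models interrupt delivery in 64-bit mode without a stack switch:
 * align RSP down to 16 bytes, then push SS, RSP, RFLAGS, CS, RIP.
 * The pushed values are arbitrary markers. Returns the new RSP. */
static uint64_t push_interrupt_frame(uint64_t rsp)
{
    const uint64_t frame[5] = { 0x10, rsp, 0x202, 0x08, 0xDEADBEEF };
    rsp &= ~(uint64_t)0xF;
    for (int i = 0; i < 5; i++) {
        rsp -= 8;
        store64(rsp, frame[i]);
    }
    return rsp;
}

/* Replays the function body from the listings, then an interrupt.
 * reserve_space != 0 models the version with "sub rsp, 0x28".
 * Returns 1 if the three spilled arguments are intact afterwards. */
int red_zone_locals_survive(int reserve_space)
{
    const uint64_t rbp = 0x80;      /* after push rbp / mov rbp,rsp (assumed aligned) */
    uint64_t rsp = rbp;

    if (reserve_space)
        rsp -= 0x28;                /* sub rsp, 0x28 */

    store64(rbp - 0x18, 0x1111);    /* mov [rbp-0x18], rdi */
    store64(rbp - 0x20, 0x2222);    /* mov [rbp-0x20], rsi */
    store64(rbp - 0x28, 0x3333);    /* mov [rbp-0x28], rdx */

    push_interrupt_frame(rsp);      /* interrupt arrives here */

    return load64(rbp - 0x18) == 0x1111 &&
           load64(rbp - 0x20) == 0x2222 &&
           load64(rbp - 0x28) == 0x3333;
}
```

In this model, `red_zone_locals_survive(0)` reports the data as clobbered, because the 0x28-byte interrupt frame lands exactly on `rbp-0x28` through `rbp-0x1`, while `red_zone_locals_survive(1)` reports it as intact.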
It turns out that this behavior (using the stack without ever moving the stack pointer) is normal according to the amd64 ABI. There is a thing called the "red zone". The red zone is a 128-byte area just below the stack pointer that is guaranteed to be left untouched by signal and interrupt handlers (I'm not sure how though). To quote the ABI:
The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers. Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone.
So my solution was to just disable that damn red zone with the gcc flag "-mno-red-zone". I'm guessing that the compiler does that to improve performance because it assumes that your code will be running in ring 3, so when an interrupt occurs, the stack will change because the handler will run in ring 0. Yeah sure, it improves performance because there is one less instruction in the code, but I think that's a huge assumption to make. It definitely isn't the case when you are writing kernel code anyway.
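If you want to see the effect of the flag yourself, a tiny leaf function is enough. The function below is only my guess at the shape of the original (three 8-byte arguments and one int local); the name `leaf_like` is made up, and it returns the local just so the result is observable, which the function in the listing did not do. The exact assembly will vary with the compiler version and options.

```c
/* leaf.c: hypothetical reconstruction of a leaf function similar to the
 * one in the listings above (the real source is not shown in this post).
 *
 * Compare the generated assembly with and without the red zone:
 *
 *   gcc -O0 -S -masm=intel leaf.c -o with_red_zone.s
 *   gcc -O0 -mno-red-zone -S -masm=intel leaf.c -o without_red_zone.s
 *
 * With -mno-red-zone, the prologue should contain a "sub rsp, ..." before
 * anything is stored below rbp.
 */
int leaf_like(long a, long b, long c)
{
    int i = 0;      /* the dword local, [rbp-0x4] in the listings */

    (void)a;        /* arguments are unused; at -O0 they still get spilled */
    (void)b;
    (void)c;

    return i;
}
```

For a kernel, the flag belongs in the compiler flags of every C file that can be interrupted on its own stack, not just the file where the problem showed up.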