Linux – How does Linux know when to allocate more pages to the call stack?

Given the program below, segfault() will (as the name implies) crash the program with a segmentation fault by reading 256 KiB below the stack. nofault(), however, gradually reads further and further below the stack, all the way down to 1 MiB below it, and never segfaults.

Also, running segfault() after nofault() does not result in an error either.

If I put sleep() calls into nofault() and use the time to cat /proc/$pid/maps, I see that the allocated stack space grows between the first and second call, which explains why segfault() doesn’t crash afterwards: there’s plenty of memory by then.
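
The same measurement could also be taken from inside the process instead of from a second terminal. The following is only a minimal sketch, assuming a Linux system where /proc/self/maps is readable and the main thread's stack is labelled [stack]; the function name stack_mapping_size is made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the size in bytes of the mapping labelled
 * [stack] in /proc/self/maps, or -1 if it cannot be found (Linux only). */
long stack_mapping_size(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    long size = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;
        /* Each line begins with "start-end" in hexadecimal. */
        if (strstr(line, "[stack]") &&
            sscanf(line, "%lx-%lx", &start, &end) == 2) {
            size = (long)(end - start);
            break;
        }
    }
    fclose(f);
    return size;
}
```

Printing the result of stack_mapping_size() before and after the loop in nofault() should show the same growth that the two sleep() calls make visible.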

But the disassembly shows no change to %rsp. This makes sense, since changing it would corrupt the call stack.

I had assumed that either the maximum stack size is baked into the binary at compile time (which, in retrospect, would be very hard for a compiler to do) or the kernel just periodically checks %rsp and adds a buffer beyond it.

How does the kernel know when to increase stack memory?

#include <stdio.h>
#include <unistd.h>

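/* Reads upward, starting 256 KiB below the local variable x.
   Crashes if it is the first function called. */
void segfault(void){
  char * x;
  int a;
  for( x = (char *)&x-1024*256; x<(char *)(&x+1); x++){
    a = *x & 0xFF;
    printf("%p = 0x%02x\n",(void *)x,a);
  }
}

/* Reads downward one byte at a time, from x to 1 MiB below it.
   Never crashes. */
void nofault(void){
  char * x;
  int a;
  sleep(20);  /* time to inspect /proc/$pid/maps before the reads */
  for( x = (char *)(&x); x>(char *)&x-1024*1024; x--){
    a = *x & 0xFF;
    printf("%p = 0x%02x\n",(void *)x,a);
  }
  sleep(20);  /* time to inspect it again after the reads */
}

int main(void){
  nofault();   /* after this call the stack mapping has grown */
  segfault();  /* so this no longer faults */
  return 0;
}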

Solution

When you access an unmapped page, the processor raises a page fault. The kernel’s page fault handler checks whether the faulting address is reasonably close to the process’s %rsp and, if so, allocates some memory and resumes the process. If the address is too far below %rsp, the kernel passes the fault along to the process as a signal (SIGSEGV).
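
To make that decision concrete, here is a small user-space model of it. This is a sketch only, not kernel code: the names fault_action, classify_stack_fault and max_gap are hypothetical, max_gap stands in for whatever "reasonably close" means on a given kernel, and the other checks a real kernel makes (for example the stack size limit) are left out.

```c
#include <stdint.h>

/* Hypothetical names: this models the decision in user space and is
 * not taken from the kernel. */
enum fault_action { GROW_STACK, DELIVER_SIGSEGV };

/* address: the faulting address
 * sp:      the stack pointer at the time of the fault
 * max_gap: how far below sp still counts as "reasonably close" (assumed) */
enum fault_action classify_stack_fault(uintptr_t address, uintptr_t sp,
                                       uintptr_t max_gap)
{
    if (address >= sp)
        return GROW_STACK;  /* not below sp, so the distance check passes */
    /* Below sp: allow growth only within the permitted gap. */
    return (sp - address <= max_gap) ? GROW_STACK : DELIVER_SIGSEGV;
}
```

In this model nofault() always faults just below the memory it touched a moment earlier, while segfault() makes its first access far below anything touched so far.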

I tried to find the precise definition of which addresses are close enough to %rsp to trigger stack growth, and came up with this from linux/arch/x86/mm/fault.c:

/*
 * Accessing the stack below %sp is always a bug.
 * The large cushion allows instructions like enter
 * and pusha to work. ("enter $65535, $31" pushes
 * 32 pointers and then decrements %sp by 65535.)
 */
if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
        bad_area(regs, error_code, address);
        return;
}

But when experimenting with your program, I found that 65536+32*sizeof(unsigned long) is not the actual dividing point between segfault and no segfault. It appears to be about twice that value. Therefore, I will stick with the vague “reasonably close” as my official answer.
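
For scale, the numbers involved can be worked out directly. The sketch below assumes nothing beyond the expression in the quoted check and the offset used in the question; the function names are made up for illustration. On x86-64 Linux, where unsigned long is 8 bytes, the cushion is 65536 + 32 * 8 = 65792 bytes and twice that is 131584 bytes, whereas segfault() starts 262144 bytes (256 KiB) below its local variable, which is past either figure.

```c
#include <stddef.h>

/* Cushion from the quoted check: 65536 + 32 * sizeof(unsigned long).
 * Evaluates to 65792 when unsigned long is 8 bytes. */
size_t quoted_cushion(void)
{
    return 65536 + 32 * sizeof(unsigned long);
}

/* Offset used by segfault() in the question: 256 KiB below &x. */
size_t segfault_offset(void)
{
    return 1024 * 256;
}
```

Comparing segfault_offset() with 2 * quoted_cushion() shows that the first access in segfault() is out of range whichever of the two thresholds applies.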
