Virtual Memory + Page Faults

Memory Page

A memory page is the basic unit of memory management from the perspective of the Operating System (OS). All code and data are arranged in groups of pages. This mechanism exists because a system can have less RAM installed than its maximum address space, so data sometimes needs to get swapped between secondary storage (HDD/SSD) and RAM/physical memory. In the case of large applications, it is entirely possible that while the application is running, not all of it is held in RAM.

So basically, page tables are how the OS remembers where each set of working data lives, and whether it's currently in physical memory or out on secondary storage.
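
To make the page number / offset idea concrete, here is a minimal C sketch (the split rule is standard; the variable names are just for illustration) showing how a virtual address decomposes into a page number, which the page table is indexed by, and an offset within that page:

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

int main() {
    // Page size is set by the hardware/OS; 4096 bytes is typical on x86-64
    size_t pagesize = sysconf(_SC_PAGESIZE);

    int x = 42;
    uintptr_t addr = (uintptr_t)&x;

    // A virtual address splits into a page number (looked up in the
    // page table) and an offset within that page
    uintptr_t page_number = addr / pagesize;
    uintptr_t offset = addr % pagesize;

    printf("address     = %#lx\n", (unsigned long)addr);
    printf("page number = %#lx\n", (unsigned long)page_number);
    printf("offset      = %#lx\n", (unsigned long)offset);
    return 0;
}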

What is a Page Fault?

A page fault is a hardware exception that is raised when a process tries to access a memory page that is not currently loaded into physical memory. It most commonly happens when a process accesses data that hasn't been used recently, or that hasn't been loaded into memory at all yet.

What basically happens:

  1. The CPU tries to translate a virtual address, but the page table entry says the page is not present in physical memory.

  2. The CPU raises a page fault exception and hands control over to the OS.

  3. The OS finds the data (e.g. on secondary storage), loads it into a free physical page, and updates the page table.

  4. The faulting instruction is restarted, and this time the access succeeds.

Most commonly occurs when:

  1. The page was swapped out to secondary storage to free up RAM.

  2. The page belongs to a memory-mapped file and hasn't been read from disk yet.

  3. The page was allocated lazily (demand paging) and is being touched for the first time.

Bringing data in from secondary storage is obviously much slower than accessing physical memory directly, so if a process triggers a lot of page faults, the slowdown starts to show in its performance.
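
If you are curious which pages of a mapping are actually resident in physical memory at a given moment, Linux exposes this through the mincore(2) syscall. A minimal sketch, using an arbitrary 16-page anonymous mapping:

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

int main() {
    size_t pagesize = sysconf(_SC_PAGESIZE);
    size_t npages = 16;
    size_t len = npages * pagesize;

    // Anonymous mapping; physical pages are only allocated on first touch
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    buf[0] = 1;               // touch the first page...
    buf[8 * pagesize] = 1;    // ...and the ninth

    // One byte per page; bit 0 set means "resident in physical memory"
    unsigned char vec[16];
    if (mincore(buf, len, vec) != 0) {
        perror("mincore");
        return 1;
    }

    for (size_t i = 0; i < npages; i++)
        printf("page %2zu: %s\n", i, (vec[i] & 1) ? "resident" : "not resident");

    munmap(buf, len);
    return 0;
}

Only the two touched pages should report as resident: anonymous pages only get physical memory on first access.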

Virtual Memory

Virtual memory provides the illusion of a larger memory space than what is physically available. Each process sees a contiguous address space divided into fixed-size pages, and the page table maps those virtual pages to physical memory. This illusion works because the OS can swap pages between storage and physical memory/RAM as needed.
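
One way to see the "illusion" part directly: on 64-bit Linux you can usually reserve far more address space than the machine has RAM. A minimal sketch (the 64 GB figure is arbitrary, and whether the mmap succeeds depends on the kernel's overcommit settings):

#include <stdio.h>
#include <sys/mman.h>

int main() {
    // 64 GB of address space, likely more than the machine's RAM.
    // With MAP_NORESERVE the kernel reserves no RAM or swap up front;
    // physical pages are only allocated if a page is actually touched
    size_t len = 64UL * 1024 * 1024 * 1024;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("reserved %zu bytes of virtual address space at %p\n", len, (void *)p);
    munmap(p, len);
    return 0;
}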

When an application tries to read from a page that isn't available in physical memory, the CPU raises a page fault; if servicing it requires reading from the storage device, it counts as a major page fault.

How a Major Page Fault works

The best way to understand this is to see it in action, so let's write a simple program that:

  1. Creates a big file

  2. Maps big file into memory

  3. In a loop, touches each page to try to trigger a major page fault for each one

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/resource.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <string.h>
#include <errno.h>

int main() {
    const char *filename = "bigfile.bin";
    size_t size = 1024UL * 1024 * 1024;  // 1 GB
    size_t pagesize = sysconf(_SC_PAGESIZE);
    struct rusage usage;

    printf("Creating %s (%zu bytes)...\n", filename, size);

    // Create or truncate the file
    int fd = open(filename, O_RDWR | O_CREAT | O_TRUNC, 0666);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    // Extend file to desired size using ftruncate
    if (ftruncate(fd, size) != 0) {
        perror("ftruncate");
        close(fd);
        return 1;
    }

    printf("File created. Mapping to memory...\n");

    // Map the file into memory
    char *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    printf("Mapped. Touching each page (~%zu bytes per step)...\n", pagesize);
    fflush(stdout);

    // Touch each page
    for (size_t i = 0; i < size; i += pagesize) {
        map[i] = 1;      // write to force disk read into RAM
        getrusage(RUSAGE_SELF, &usage);
        printf("Touched page %zu : majflt=%ld minflt=%ld\n", i/pagesize, usage.ru_majflt, usage.ru_minflt);
    }

    printf("Done touching pages.\n");

    munmap(map, size);
    close(fd);

    printf("All done.\n");
    return 0;
}

On my system, the command to compile this is:

gcc prog_file_page_fault.c -o prog_file_page_fault

Running it should output something like this (exact counts will vary):

Creating bigfile.bin (1073741824 bytes)...
File created. Mapping to memory...
Mapped. Touching each page (~4096 bytes per step)...
Touched page 0 : majflt=1 minflt=155
Touched page 1 : majflt=2 minflt=155
Touched page 2 : majflt=3 minflt=155
Touched page 3 : majflt=4 minflt=155
...
Touched page 262142 : majflt=262143 minflt=155
Touched page 262143 : majflt=262144 minflt=155
Done touching pages.
All done.

The majflt counter is incrementing by 1 for every touched page: each write hits a page of the mapped file that is not yet in physical memory, so the OS has to fetch it from the storage device, and that counts as a major page fault.

minflt not moving means: none of the touched pages were already sitting in the OS page cache, so no fault could be resolved without going to the disk. The ~155 minor faults were all generated during program startup (loading libraries, setting up the stack and heap).

Usually a program wouldn't print its own internal statistics this way, but at least on my computer the loop finishes too quickly for simply monitoring /proc/$(pidof prog_file_page_fault)/stat from the outside to be conveniently feasible.
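
For reference, if you do want to read the counters from /proc: according to proc(5), minflt and majflt are fields 10 and 12 of /proc/[pid]/stat. A rough sketch that parses them for the current process (parsing is a bit fiddly because field 2 may contain spaces):

#include <stdio.h>
#include <string.h>

int main() {
    char buf[4096];
    FILE *f = fopen("/proc/self/stat", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return 1;
    }
    fclose(f);

    // Field 2 (comm) may contain spaces and parentheses,
    // so resume parsing after the last ')'
    char *p = strrchr(buf, ')');
    if (!p)
        return 1;
    p += 2;  // skip ") "

    // Fields after comm: state(3) ppid(4) pgrp(5) session(6) tty_nr(7)
    // tpgid(8) flags(9) minflt(10) cminflt(11) majflt(12)
    unsigned long minflt = 0, majflt = 0;
    sscanf(p, "%*c %*d %*d %*d %*d %*d %*u %lu %*u %lu", &minflt, &majflt);

    printf("minflt=%lu majflt=%lu\n", minflt, majflt);
    return 0;
}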

What does this mean for MariaDB?

If a major page fault happens, the OS must read data from the storage device, which is slow.

This is a major problem for MariaDB, because InnoDB manages its own caching layer for data and indexes (the buffer pool), and the OS should not be swapping it out: the buffer pool is meant to stay in RAM. If Linux is forced to push parts of the buffer pool out to the storage device, every later access to those parts triggers a major page fault inside InnoDB's own caching layer, causing extreme slowdowns for queries.

Basically, if this happens, the OS is forced to do work that MariaDB expected to avoid or to control itself.

The root cause can boil down to many different things:

  1. innodb_buffer_pool_size set so large that the buffer pool plus everything else on the host no longer fits in RAM.

  2. Other processes on the same machine competing for memory.

  3. A vm.swappiness setting that makes the kernel too eager to swap application memory instead of dropping page cache.

  4. General memory pressure, e.g. from large backups or file copies filling up the page cache.

Ideally, during normal operation there should be no major page faults at all; minor page faults are OK.
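
Conceptually, a process that must never be swapped out can pin its pages in RAM with mlockall(2); MariaDB's memlock option requests essentially this behavior for the server process. A minimal sketch (it needs CAP_IPC_LOCK or a high enough RLIMIT_MEMLOCK to succeed):

#include <stdio.h>
#include <sys/mman.h>

int main() {
    // Pin all current and future pages of this process into RAM,
    // so the kernel can never swap them out
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return 1;
    }
    printf("memory locked; pages will not be swapped out\n");

    // ... allocate buffers and do work; new pages are locked too ...

    munlockall();
    return 0;
}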

It's also important to remember that MariaDB still relies on the OS page cache for other things, like temporary files, binary logs, etc.

innodb_buffer_pool_size

Configuring innodb_buffer_pool_size

This variable controls how much memory InnoDB dedicates to caching data and index pages. On a dedicated database server, a common starting point is around 70-80% of the available RAM, leaving headroom for the OS, per-connection buffers, and the page cache that other files (temporary files, binary logs) still depend on.

Buffer Pool Usage

How full the pool actually is can be checked via SHOW GLOBAL STATUS, e.g. the Innodb_buffer_pool_pages_total, Innodb_buffer_pool_pages_free and Innodb_buffer_pool_pages_data counters.

Buffer Pool Hit Ratio

To calculate the Hit Ratio:

hit_ratio = (1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)) * 100
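
As a quick worked example with made-up counter values: say Innodb_buffer_pool_reads is 1,000 (reads that had to go to disk) and Innodb_buffer_pool_read_requests is 100,000 (all logical read requests). Then:

hit_ratio = (1 - (1000 / 100000)) * 100 = 99

so 99% of read requests were served straight from the buffer pool.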

Ideally the result of this hit ratio calculation should be above 95%. That by itself doesn't mean queries will perform well, only that the database can quickly get at the data and indexes it needs.

Important: Not every problem can be fixed by tweaking system-wide configuration. If, for example, too much data is indexed (lots of unnecessary, old indexes that could be eliminated), those indexes will use up pages in the buffer pool without giving any performance boost.