Using MMAP to optimize data reading

Sergey Svistunov · Apr 19, 2026 18:25

When competing on HighLoad.Fun, every microsecond matters. One of the most impactful optimizations you can make is replacing conventional I/O with memory-mapped files via the mmap system call. This article explains how and why this technique works, with practical examples in C++, Go, Rust, and C#.

How HighLoad.Fun Delivers Input

On the HighLoad.Fun platform, your solution’s input arrives through STDIN. However, unlike a typical interactive terminal session, STDIN here is backed by an actual file in RAM (likely on a tmpfs filesystem). This is a crucial detail: because it’s a real file, it has a known size and is seekable - properties that a normal terminal STDIN does not have.

This means you can treat file descriptor 0 (STDIN_FILENO) just like any regular file: seek to determine its size, and then memory-map the entire contents into your process’s address space.

What Is MMAP and Why Does It Help?

The mmap system call creates a mapping between a region of virtual memory and a file. Instead of copying data from kernel buffers into user-space buffers (which is what read(), scanf(), std::cin, and similar functions do), mmap lets your program access the file contents directly through memory pointers.

The performance advantages are significant:

Elimination of copy overhead. Traditional I/O follows a path like: disk/RAM -> kernel page cache -> user-space buffer. With mmap, your process accesses the page cache directly - there is no extra copy step.

No system call per read. With read(), every chunk of data requires a system call (a context switch into the kernel and back). With mmap, after the initial setup, all data access happens through regular memory loads - no further syscalls needed.

Simplified parsing. Once the file is mapped, you have a single contiguous char* buffer containing all the input. You can scan through it with pointer arithmetic, which is often simpler and faster than managing buffered reads.

Kernel-level prefetching. When combined with the MAP_POPULATE flag (more on this below), the kernel reads the entire file into memory eagerly, so your processing loop never stalls on a page fault.

The Core Technique

The approach is straightforward:

  1. Get the file size by seeking to the end of STDIN (file descriptor 0).
  2. Call mmap to map the entire file into your address space.
  3. Parse directly from the mapped buffer using pointer arithmetic.

C++

#include <unistd.h>
#include <sys/mman.h>
#include <cstdio>
#include <cstdlib>

// ...

off_t fsize = lseek(0, 0, SEEK_END);
char* buffer = (char*)mmap(
    nullptr,                    // let the kernel choose the address
    fsize,                      // map the entire file
    PROT_READ,                  // read-only access
    MAP_PRIVATE | MAP_POPULATE, // private mapping, prefault all pages
    0,                          // file descriptor 0 = STDIN
    0                           // offset 0 = start of file
);
if (buffer == MAP_FAILED) {
    std::perror("mmap");
    std::exit(1);
}

// buffer now points to the entire input; process it as needed
// ...

munmap(buffer, fsize);

Go

import (
    "io"
    "os"
    "syscall"
)

// ...

f := os.NewFile(0, "stdin")
fsize, _ := f.Seek(0, io.SeekEnd) // os.SEEK_END is deprecated; use io.SeekEnd

buffer, _ := syscall.Mmap(
    0,                                        // fd 0 = STDIN
    0,                                        // offset
    int(fsize),                               // length
    syscall.PROT_READ,                        // read-only
    syscall.MAP_PRIVATE|syscall.MAP_POPULATE, // private + prefault
)
defer syscall.Munmap(buffer)

// buffer is a []byte containing the entire input
// ...

Rust

use std::os::unix::io::FromRawFd;
use std::io::Seek;

// ...

let mut stdin_file = unsafe { std::fs::File::from_raw_fd(0) };
let fsize = stdin_file.seek(std::io::SeekFrom::End(0)).unwrap() as usize;

let buffer = unsafe {
    libc::mmap(
        std::ptr::null_mut(),
        fsize,
        libc::PROT_READ,
        libc::MAP_PRIVATE | libc::MAP_POPULATE,
        0,   // fd 0 = STDIN
        0,   // offset
    )
};
assert!(buffer != libc::MAP_FAILED, "mmap failed");

let data = unsafe { std::slice::from_raw_parts(buffer as *const u8, fsize) };

// data is a &[u8] slice over the entire input
// ...

unsafe { libc::munmap(buffer, fsize); }
std::mem::forget(stdin_file); // prevent double-close of fd 0

C#

using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using Microsoft.Win32.SafeHandles;

// ...

// Wrap fd 0 (STDIN) in a FileStream; ownsHandle: false so fd 0 isn't closed twice
var stdin = new FileStream(new SafeFileHandle((IntPtr)0, false), FileAccess.Read);
var mmf = MemoryMappedFile.CreateFromFile(
    stdin,
    null,                              // no mapping name
    0,                                 // 0 = use the file's size
    MemoryMappedFileAccess.Read,
    HandleInheritability.None,
    true                               // leaveOpen: don't close the stream on dispose
);

var accessor = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read);
long fsize = stdin.Length;             // accessor.Capacity is rounded up to a page multiple

unsafe
{
    byte* ptr = null;
    accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);

    // ptr is a byte* over the entire input
    // ...

    accessor.SafeMemoryMappedViewHandle.ReleasePointer();
}
accessor.Dispose();
mmf.Dispose();

Understanding the Key Flags

MAP_PRIVATE

This creates a copy-on-write mapping: any write to the mapped region would allocate a private copy of the affected page rather than modify the underlying file. For our read-only use case this is the correct choice - our process cannot touch the source data, and the kernel does not need to write anything back.

MAP_POPULATE

This is the real performance booster. Without this flag, the kernel sets up the virtual memory mapping lazily: pages are only loaded when your code first accesses them, triggering a page fault each time. With MAP_POPULATE, the kernel pre-faults all pages at mmap time, reading the entire file into memory upfront. Your subsequent sequential scan then hits fully resident pages with zero page faults - which can make a dramatic difference on large inputs.

PROT_READ

Specifies read-only access to the mapped region. This is both a correctness safeguard and a signal to the kernel that it can share these pages more aggressively.

Huge Pages: Squeezing Out More Performance

The original wiki article mentions that “Huge Pages feature is available” on the HighLoad.Fun platform. This refers to the Linux kernel’s ability to use larger memory pages (typically 2 MB instead of the default 4 KB).

Why does page size matter? Every virtual-to-physical address translation goes through the TLB (Translation Lookaside Buffer) - a small hardware cache in the CPU. With 4 KB pages, a 100 MB input requires ~25,600 page table entries. With 2 MB huge pages, the same input needs only ~50 entries, dramatically reducing TLB misses.

To request huge pages in your mmap call, add the MAP_HUGETLB flag:

char* buffer = (char*)mmap(
    nullptr, fsize,
    PROT_READ,
    MAP_PRIVATE | MAP_POPULATE | MAP_HUGETLB,
    0, 0
);

// Fall back to regular pages if huge pages aren't available
if (buffer == MAP_FAILED) {
    buffer = (char*)mmap(
        nullptr, fsize,
        PROT_READ,
        MAP_PRIVATE | MAP_POPULATE,
        0, 0
    );
}

Note that huge pages may not always be available, and MAP_HUGETLB carries an important restriction: on Linux it is supported only for anonymous mappings and for files on a hugetlbfs filesystem, so applying it to a regular (e.g. tmpfs-backed) STDIN will typically fail with EINVAL unless the platform is specifically configured for it. For a hugetlb mapping, the length must also be a multiple of the huge page size. The fallback to regular pages is therefore essential, not just good practice.


Discussion (3)
Sergey Svistunov · Apr 24, 2026 15:59

I’ve just tuned a little bit the server, MAP_HUGETLB should work now.

Danylo Mocherniuk · Apr 23, 2026 09:52

Doesn’t work for me either.

zielaj · Apr 21, 2026 11:53

Does MAP_HUGETLB work for anyone on highload.fun? I remember trying it a long time ago on “Parse integers” (C++), and also tried it yesterday, and it doesn’t seem to work for me. Is it supposed to work on highload.fun?
