Linux – Do I need to tune sysctl.conf to run MongoDB under linux?

We see sporadic large writes to disk in the MongoDB logs that effectively lock MongoDB for long periods of time. Many people online report similar problems, but I have not found a good answer so far.

 Tue Mar 11 09:42:49.818 [DataFileSync] flushing mmaps took 75264ms  for 46 files

According to the mongo statistics, the average mmap flush time on my server is about 100 milliseconds.

Most of our MongoDB data is updated within the span of a few hours. This makes me wonder whether we need to adjust the Linux sysctl virtual memory parameters, as described in the performance guide for Neo4j, another database that uses memory-mapped files: http://docs.neo4j.org/chunked/stable/linux-performance-guide.html

There are a lot of blocks going out to IO, way more than expected for the write speed we
are seeing in the benchmark. Another observation that can be made is that the Linux kernel
has spawned a process called “flush-x:x” (run top) that seems to be consuming a lot of
resources.

The problem here is that the Linux kernel is trying to be smart and write out dirty pages
from the virtual memory. As the benchmark will memory map a 1GB file and do random writes
it is likely that this will result in 1/4 of the memory pages available on the system to
be marked as dirty. The Neo4j kernel is not sending any system calls to the Linux kernel to
write out these pages to disk however the Linux kernel decided to start doing so and it
is a very bad decision. The result is that instead of doing sequential like writes down
to disk (the logical log file) we are now doing random writes writing regions of the
memory mapped file to disk.

top shows that we do have a flush process that has been running for a long time, so this seems to match.

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    28352 mongod    20   0  153g 3.2g 3.1g S  3.3 42.3 299:18.36 mongod
     3678 root      20   0     0    0    0 S  0.3  0.0  26:27.88 flush-253:1

The recommended Neo4J sysctl settings are

    vm.dirty_background_ratio = 50
    vm.dirty_ratio = 80

Are these settings relevant to a MongoDB installation?

Solution

The short answer is “yes”. Which values to choose depends largely on your write pattern. The relevant background is how MongoDB manages its memory mappings, and it does nothing unexpected there.

One consideration is that in a web-facing database application, you are probably more concerned with latency than with throughput. vm.dirty_background_ratio sets the threshold at which the kernel starts writing out dirty pages in the background, while vm.dirty_ratio sets the threshold at which it stops accepting new writes (i.e., blocks the writer) until dirty pages have been flushed.
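To make the two thresholds concrete, here is a minimal sketch that converts the percentages into byte counts. It assumes the kernel applies the ratios to the memory it considers available for dirty pages, which is somewhat less than total RAM; the function name and the 8 GiB figure are illustrative, not taken from the question.

```python
def dirty_thresholds(dirtyable_bytes, background_ratio, dirty_ratio):
    """Return (background_bytes, blocking_bytes) for the given percentages.

    dirtyable_bytes is an assumption supplied by the caller: the amount of
    memory the kernel would consider available for dirty pages.
    """
    for value in (background_ratio, dirty_ratio):
        if not 0 <= value <= 100:
            raise ValueError("ratios are percentages between 0 and 100")
    background = dirtyable_bytes * background_ratio // 100
    blocking = dirtyable_bytes * dirty_ratio // 100
    return background, blocking


GIB = 1024 ** 3
# Hypothetical machine with 8 GiB of dirtyable memory and the Neo4j values.
bg, block = dirty_thresholds(8 * GIB, 50, 80)
print(f"background writeback starts near {bg / GIB:.1f} GiB of dirty pages")
print(f"writers start blocking near {block / GIB:.1f} GiB of dirty pages")
```

The gap between the two numbers is the amount of dirty data that can pile up before writers stall, which is why the ratio pair matters more than either value alone.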

If you’re hammering a relatively small working set, you can set both values fairly high and rely on Mongo’s (or the operating system’s) periodic time-based flushes to disk to commit writes.

If you are doing a high volume of inserts along with some modifications, which sounds like it may be your case, it is a balancing act that depends on the mix of inserts and rewrites. Starting to flush too early writes out pages that will soon be rewritten, “wasting” IO. Starting to flush too late can cause pauses while a large number of writes is flushed at once.
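One way to see which side of that balance a server is on is to watch how much dirty memory is pending. The sketch below parses the `Dirty` and `Writeback` fields of `/proc/meminfo`; the helper names are made up for illustration, and how to interpret the numbers for a given workload is left open.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        # Most fields are reported in kB; a few are plain counts.
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def dirty_summary(text):
    """Return (dirty_kb, writeback_kb); a missing field counts as 0."""
    info = parse_meminfo(text)
    return info.get("Dirty", 0), info.get("Writeback", 0)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            dirty_kb, writeback_kb = dirty_summary(f.read())
        print(f"Dirty: {dirty_kb} kB, Writeback: {writeback_kb} kB")
    except OSError:
        print("/proc/meminfo is not available on this system")
```

Sampling this periodically during a write burst shows whether dirty pages accumulate steadily or are written back as they appear.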

If you are doing mostly inserts, then you will most likely want a large dirty_ratio (to avoid blocking) and a relatively small dirty_background_ratio (small enough that the kernel is always writing as you insert, which reduces latency, and just large enough to linearize some of the writes).

The right solution is to replay some dummy data under various settings of these sysctl parameters and optimize by brute force, keeping in mind your targets for average latency and total throughput.
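To compare runs, the flush durations can be pulled out of the mongod log. This is a small sketch that assumes the log line format shown in the question; the function names and the choice of summary statistics are illustrative.

```python
import re
import statistics

# Matches lines such as:
#   [DataFileSync] flushing mmaps took 75264ms  for 46 files
FLUSH_RE = re.compile(r"flushing mmaps took (\d+)ms\s+for (\d+) files")


def flush_times_ms(log_lines):
    """Return the flush durations (in ms) found in the given log lines."""
    times = []
    for line in log_lines:
        match = FLUSH_RE.search(line)
        if match:
            times.append(int(match.group(1)))
    return times


def summarize(times):
    """Summarize flush durations for one run; None if there were none."""
    if not times:
        return None
    return {
        "count": len(times),
        "median_ms": statistics.median(times),
        "max_ms": max(times),
    }


sample = [
    " Tue Mar 11 09:42:49.818 [DataFileSync] flushing mmaps took 75264ms  for 46 files",
]
print(summarize(flush_times_ms(sample)))
```

Running the same replay once per candidate setting and comparing the maximum flush time is one way to judge latency, while the total run time gives a rough view of throughput.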
