"always" marks all user mappings as MADV_HUGEPAGE; while "never" marks all
mappings as MADV_NOHUGEPAGE. The default setting "default" does not change any
settings. Note that all the madvise calls are part of the default extent hooks
by design, so that customized extent hooks have complete control over the
mappings including hugepage settings.
We have a buffer overrun that manifests in the case where arena indices higher
than the number of CPUs are accessed before arena indices lower than the number
of CPUs. This fixes the bug and adds a test.
On glibc and Android's bionic, strerror_r returns char* when
_GNU_SOURCE is defined.
Add a configure check for this rather than assume glibc is the
only libc that behaves this way.
We compute the max size required to satisfy an alignment. However this can be
quite pessimistic, especially with frequent reuse (and combined with state-based
fragmentation). This commit adds one more fit step specific to aligned
allocations, searching in all potential fit size classes.
All the invocations of AC_COMPILE_IFELSE inside JE_CXXFLAGS_ADD were
running 'the compiler and compilation flags of the current language'
which was always the C compiler and the CXXFLAGS were never being tested
against a C++ compiler. This patch fixes this issue by temporarily
changing the chosen compiler to C++ by pushing it over the stack and
popping it immediately after the compilation check.
The arena-associated stats are now all prefixed with arena_stats_, and live in
their own file. Likewise, malloc_bin_stats_t -> bin_stats_t, also in its own
file.
When purging, large allocations are usually the ones that cross the npages_limit
threshold, simply because they are "large". This means we often leave the large
extent around for a while, which has the downsides of: 1) high RSS and 2) more
chance of them getting fragmented. Given that they are not likely to be reused
very soon (LRU), let's over purge by 1 extent (which is often large and not
reused frequently).
Coalescing is a small price to pay for large allocations since they happen less
frequently. This reduces fragmentation while also potentially improving
locality.
When allocating from dirty extents (which we always prefer if available), large
active extents can get split even if the new allocation is much smaller, in
which case the introduced fragmentation causes high long term damage. This new
option controls the threshold to reuse and split an existing active extent. We
avoid using a large extent for much smaller sizes, in order to reduce
fragmentation. In some workload, adding the threshold improves virtual memory
usage by >10x.
While working on #852, I noticed the prng state is atomic. This is the only
atomic use of prng in all of jemalloc. Instead, use a threadlocal prng
state if possible to avoid unnecessary cache line contention.
Added an upper bound on how many pages we can decay during the current run.
Without this, decay could have unbounded increase in stashed, since other
threads could add new pages into the extents.
This option controls the max size when grow_retained. This is useful when we
have customized extent hooks reserving physical memory (e.g. 1G huge pages).
Without this feature, the default increasing sequence could result in fragmented
and wasted physical memory.
This attempts to use VM_OVERCOMMIT OID - newly introduced in -CURRENT
few days ago, specifically for this purpose - instead of querying the
sysctl by its string name. Due to how syctlbyname(3) works, this means
we do one syscall during binary startup instead of two.
Signed-off-by: Edward Tomasz Napierala <trasz@FreeBSD.org>
This avoids sysctl(2) syscall during binary startup, using the value
passed in the ELF aux vector instead.
Signed-off-by: Edward Tomasz Napierala <trasz@FreeBSD.org>
We observed that arena 0 can have much more metadata allocated comparing to
other arenas. Tune the auto mode to only switch to huge page on the 5th block
(instead of 3 previously) for a0.