Refactor the arenas array, which contains pointers to all extant arenas,
such that it starts out as a sparse array of maximum size, and use
double-checked atomics-based reads as the basis for fast and simple
arena_get(). Additionally, reduce arenas_lock's role such that it only
protects against arena initalization races. These changes remove the
possibility for arena lookups to trigger locking, which resolves at
least one known (fork-related) deadlock.
This resolves#315.
Attempt mmap-based in-place huge reallocation by plumbing new_addr into
chunk_alloc_mmap(). This can dramatically speed up incremental huge
reallocation.
This resolves#335.
Separate run trees by index, replacing the previous quantize logic.
Quantization by index is now performed only on insertion / removal from
the tree, and not on node comparison, saving some cpu. This also means
we don't have to dereference the miscelm* pointers, saving half of the
memory loads from miscelms/mapbits that have fallen out of cache. A
linear scan of the indicies appears to be fast enough.
The only cost of this is an extra tree array in each arena.
Use a single uint64_t in nstime_t to store nanoseconds rather than using
struct timespec. This reduces fragility around conversions between long
and uint64_t, especially missing casts that only cause problems on
32-bit platforms.
This is an alternative to the existing ratio-based unused dirty page
purging, and is intended to eventually become the sole purging
mechanism.
Add mallctls:
- opt.purge
- opt.decay_time
- arena.<i>.decay
- arena.<i>.decay_time
- arenas.decay_time
- stats.arenas.<i>.decay_time
This resolves#325.
- Combine multiple runtime branches into a single malloc_slow check.
- Avoid calling arena_choose / size2index / index2size on fast path.
- A few micro optimizations.
ex_destroy iterates over the tree using post-order traversal so nodes
can be removed and processed by the callback function without paying the
cost to rebalance the tree. The destruction process cannot be stopped
once started.
Fix xallocx(..., MALLOCX_ZERO to zero the last full trailing page of
large allocations that have been randomly assigned an offset of 0 when
--enable-cache-oblivious configure option is enabled. This addresses a
special case missed in d260f442ce (Fix
xallocx(..., MALLOCX_ZERO) bugs.).
Don't assume Bourne shell is in /bin/sh when running size_classes.sh .
Consider __sparcv9 a synonym for __sparc64__ when defining LG_QUANTUM.
This resolves#275.
Add arena_prof_tctx_reset() and use it instead of arena_prof_tctx_set()
when resetting the tctx pointer during reallocation, which happens
whenever an originally sampled reallocated object is not sampled during
reallocation.
This regression was introduced by
594c759f37 (Optimize
arena_prof_tctx_set().)
Fix prof_realloc() to call prof_free_sampled_object() after calling
prof_malloc_sample_object(). Prior to this fix, if tctx and old_tctx
were the same, the tctx could have been prematurely destroyed.
Make one call to prof_active_get_unlocked() per allocation event, and
use the result throughout the relevant functions that handle an
allocation event. Also add a missing check in prof_realloc(). These
fixes protect allocation events against concurrent prof_active changes.
Fix heap profiling to distinguish among otherwise identical sample sites
with interposed resets (triggered via the "prof.reset" mallctl). This
bug could cause data structure corruption that would most likely result
in a segfault.