rallocx() for an alignment-constrained request may end up with a
smaller-than-worst-case size if in-place reallocation succeeds due to
serendipitous alignment. In such cases, sampling may not happen.
Look up chunk metadata via the radix tree, rather than using
CHUNK_ADDR2BASE().
Propagate pointer's containing extent.
Minimize extent lookups by doing a single lookup (e.g. in free()) and
propagating the pointer's extent into nearly all the functions that may
need it.
Use pszind_t size classes rather than szind_t size classes, and always
reserve space for NPSIZES elements. This removes unused heaps that are
not multiples of the page size, and adds (currently) unused heaps for
all huge size classes, with the immediate benefit that the size of
arena_t allocations is constant (no longer dependent on chunk size).
These compute size classes and indices similarly to size2index(),
index2size() and s2u(), respectively, but using the subset of size
classes that are multiples of the page size. Note that pszind_t and
szind_t are not interchangeable.
b2c0d6322d (Add witness, a simple online
locking validator.) caused a broad propagation of tsd throughout the
internal API, but tsd_fetch() was designed to fail prior to tsd
bootstrapping. Fix this by splitting tsd_t into non-nullable tsd_t and
nullable tsdn_t, and modifying all internal APIs that do not critically
rely on tsd to take nullable pointers. Furthermore, add the
tsd_booted_get() function so that tsdn_fetch() can probe whether tsd
bootstrapping is complete and return NULL if not. All dangerous
conversions of nullable pointers are tsdn_tsd() calls that assert-fail
on invalid conversion.
This is a broader application of optimizations to malloc() and free() in
f4a0f32d34 (Fast-path improvement:
reduce # of branches and unnecessary operations.).
This resolves#321.
If the OS overcommits:
- Commit all mappings in pages_map() regardless of whether the caller
requested committed memory.
- Linux-specific: Specify MAP_NORESERVE to avoid
unfortunate interactions with heuristic overcommit mode during
fork(2).
This resolves#193.
Prior to 767d85061a (Refactor arenas array
(fixes deadlock).), it was possible under some circumstances for
arena_get() to trigger recreation of the arenas cache during tsd
cleanup, and the arenas cache would then be leaked. In principle a
similar issue could still occur as a side effect of decay-based purging,
which calls arena_tdata_get(). Fix arenas_tdata_cleanup() by setting
tsd->arenas_tdata_bypass to true, so that arena_tdata_get() will
gracefully fail (an expected behavior) rather than recreating
tsd->arena_tdata.
Reported by Christopher Ferris <cferris@google.com>.
Add HUGE_MAXCLASS overflow checks that are specific to heap profiling
code paths. This fixes test failures that were introduced by
0c516a00c4 (Make *allocx() size class
overflow behavior defined.).
Refactor the arenas array, which contains pointers to all extant arenas,
such that it starts out as a sparse array of maximum size, and use
double-checked atomics-based reads as the basis for fast and simple
arena_get(). Additionally, reduce arenas_lock's role such that it only
protects against arena initalization races. These changes remove the
possibility for arena lookups to trigger locking, which resolves at
least one known (fork-related) deadlock.
This resolves#315.
Use a single uint64_t in nstime_t to store nanoseconds rather than using
struct timespec. This reduces fragility around conversions between long
and uint64_t, especially missing casts that only cause problems on
32-bit platforms.
This is an alternative to the existing ratio-based unused dirty page
purging, and is intended to eventually become the sole purging
mechanism.
Add mallctls:
- opt.purge
- opt.decay_time
- arena.<i>.decay
- arena.<i>.decay_time
- arenas.decay_time
- stats.arenas.<i>.decay_time
This resolves#325.
When using LinuxThreads, malloc bootstrapping deadlocks, since
malloc_tsd_boot0() ends up calling pthread_setspecific(), which causes
recursive allocation. Fix it by moving the malloc_tsd_boot0() call to
malloc_init_hard_recursible().
The deadlock was introduced by 8bb3198f72
(Refactor/fix arenas manipulation.), when tsd_boot() was split and the
top half, tsd_boot0(), got an extra tsd_wrapper_set() call.
- Combine multiple runtime branches into a single malloc_slow check.
- Avoid calling arena_choose / size2index / index2size on fast path.
- A few micro optimizations.
Simplify imallocx_prof_sample() to always operate on usize rather than
sometimes using size. This avoids redundant usize computations and
more closely fits the style adopted by i[rx]allocx_prof_sample() to fix
sampling bugs.
Fix ixallocx_prof_sample() to never modify nor create sampled small
allocations. xallocx() is in general incapable of moving small
allocations, so this fix removes buggy code without loss of generality.