Rewrite arena_slab_regind() to provide sufficient constant data for
the compiler to perform division strength reduction. This replaces
more general manual strength reduction that was implemented before
arena_bin_info was compile-time-constant. It would be possible to
slightly improve on the compiler-generated division code by taking
advantage of range limits that the compiler doesn't know about.
Add extent serial numbers and use them where appropriate as a sort key
that is higher priority than address, so that the allocation policy
prefers older extents.
This resolves#147.
Add an "over-size" extent heap in which to store extents which exceed
the maximum size class (plus cache-oblivious padding, if enabled).
Remove psz2ind_clamp() and use psz2ind() instead so that trying to
allocate the maximum size class can in principle succeed. In practice,
this allows assertions to hold so that OOM errors can be successfully
generated.
Simplify decay-based purging attempts to only be triggered when the
epoch is advanced, rather than every time purgeable memory increases.
In a correctly functioning system (not previously the case; see below),
this only causes a behavior difference if during subsequent purge
attempts the least recently used (LRU) purgeable memory extent is
initially too large to be purged, but that memory is reused between
attempts and one or more of the next LRU purgeable memory extents are
small enough to be purged. In practice this is an arbitrary behavior
change that is within the set of acceptable behaviors.
As for the purging fix, assure that arena->decay.ndirty is recorded
*after* the epoch advance and associated purging occurs. Prior to this
fix, it was possible for purging during epoch advance to cause a
substantially underrepresentative (arena->ndirty - arena->decay.ndirty),
i.e. the number of dirty pages attributed to the current epoch was too
low, and a series of unintended purges could result. This fix is also
relevant in the context of the simplification described above, but the
bug's impact would be limited to over-purging at epoch advances.
When an allocation is large enough to trigger multiple dumps, use
modular math rather than subtraction to reset the interval counter.
Prior to this change, it was possible for a single allocation to cause
many subsequent allocations to all trigger profile dumps.
When updating usable size for a sampled object, try to cancel out
the difference between LARGE_MINCLASS and usable size from the interval
counter.
Look up chunk metadata via the radix tree, rather than using
CHUNK_ADDR2BASE().
Propagate pointer's containing extent.
Minimize extent lookups by doing a single lookup (e.g. in free()) and
propagating the pointer's extent into nearly all the functions that may
need it.
Use pszind_t size classes rather than szind_t size classes, and always
reserve space for NPSIZES elements. This removes unused heaps that are
not multiples of the page size, and adds (currently) unused heaps for
all huge size classes, with the immediate benefit that the size of
arena_t allocations is constant (no longer dependent on chunk size).
b2c0d6322d (Add witness, a simple online
locking validator.) caused a broad propagation of tsd throughout the
internal API, but tsd_fetch() was designed to fail prior to tsd
bootstrapping. Fix this by splitting tsd_t into non-nullable tsd_t and
nullable tsdn_t, and modifying all internal APIs that do not critically
rely on tsd to take nullable pointers. Furthermore, add the
tsd_booted_get() function so that tsdn_fetch() can probe whether tsd
bootstrapping is complete and return NULL if not. All dangerous
conversions of nullable pointers are tsdn_tsd() calls that assert-fail
on invalid conversion.
This is a broader application of optimizations to malloc() and free() in
f4a0f32d34 (Fast-path improvement:
reduce # of branches and unnecessary operations.).
This resolves#321.
Refactor ph to support configurable comparison functions. Use a cpp
macro code generation form equivalent to the rb macros so that pairing
heaps can be used for both run heaps and chunk heaps.
Remove per node parent pointers, and instead use leftmost siblings' prev
pointers to track parents.
Fix multi-pass sibling merging to iterate over intermediate results
using a FIFO, rather than a LIFO. Use this fixed sibling merging
implementation for both merge phases of the auxiliary twopass algorithm
(first merging the aux list, then replacing the root with its merged
children). This fixes both degenerate merge behavior and the potential
for deep recursion.
This regression was introduced by
6bafa6678f (Pairing heap).
This resolves#371.
Use pairing heap instead of red black tree in arena runs_avail. The
extra links are unioned with the bitmap_t, so this change doesn't use
any extra memory.
Canaries show this change to be a 1% cpu win, and 2% latency win. In
particular, large free()s, and small bin frees are now O(1) (barring
coalescing).
I also tested changing bin->runs to be a pairing heap, but saw a much
smaller win, and it would mean increasing the size of arena_run_s by two
pointers, so I left that as an rb-tree for now.
Add missing stats.arenas.<i>.{dss,lg_dirty_mult,decay_time}
initialization.
Fix stats.arenas.<i>.{pactive,pdirty} to read under the protection of
the arena mutex.