Fix arena_run_first_best_fit() to search all potentially non-empty
runs_avail heaps, rather than ignoring the heap that contains runs
larger than large_maxclass, but less than chunksize.
This fixes a regression caused by
f193fd80cf (Refactor runs_avail.).
This resolves#493.
Fix paren placement so that QUANTUM_CEILING() applies to the correct
portion of the expression that computes how much memory to base_alloc().
In practice this bug had no impact. This was caused by
5d8db15db9 (Simplify run quantization.),
which in turn fixed an over-allocation regression caused by
3c4d92e82a (Add per size class huge
allocation statistics.).
Fix arena_run_alloc_large_helper() to not convert size to usize when
searching for the first best fit via arena_run_first_best_fit(). This
allows the search to consider the optimal quantized size class, so that
e.g. allocating and deallocating 40 KiB in a tight loop can reuse the
same memory.
This regression was nominally caused by
5707d6f952 (Quantize szad trees by size
class.), but it did not commonly cause problems until
8a03cf039c (Implement cache index
randomization for large allocations.). These regressions were first
released in 4.0.0.
This resolves#487.
Fix chunk_alloc_cache() to support decommitted allocation, and use this
ability in arena_chunk_alloc_internal() and arena_stash_dirty(), so that
chunks don't get permanently stuck in a hybrid state.
This resolves#487.
Do not call s2u() during alloc_size computation, since any necessary
ceiling increase is taken care of later by extent_first_best_fit() -->
extent_size_quantize_ceil(), and the s2u() call may erroneously cause a
higher quantization result.
Remove an overly strict overflow check that was added in
4a7852137d (Fix extent_recycle()'s
cache-oblivious padding support.).
Add padding *after* computing the size class, so that the optimal size
class isn't skipped during search for a usable extent. This regression
was caused by b46261d58b (Implement
cache-oblivious support for huge size classes.).
Add an "over-size" extent heap in which to store extents which exceed
the maximum size class (plus cache-oblivious padding, if enabled).
Remove psz2ind_clamp() and use psz2ind() instead so that trying to
allocate the maximum size class can in principle succeed. In practice,
this allows assertions to hold so that OOM errors can be successfully
generated.
Fix extent_alloc_cache[_locked]() to support decommitted allocation, and
use this ability in arena_stash_dirty(), so that decommitted extents are
not needlessly committed during purging. In practice this does not
happen on any currently supported systems, because both extent merging
and decommit must be implemented; all supported systems implement one
xor the other.
rtree_node_init spinlocks the node, allocates, and then sets the node.
This is under heavy contention at the top of the tree if many threads
start to allocate at the same time.
Instead, take a per-rtree sleeping mutex to reduce spinning. Tested
both pthreads and osx OSSpinLock, and both reduce spinning adequately
Previous benchmark time:
./ttest1 500 100
~15s
New benchmark time:
./ttest1 500 100
.57s
Fix zone_force_unlock() to reinitialize, rather than unlocking mutexes,
since OS X 10.12 cannot tolerate a child unlocking mutexes that were
locked by its parent.
Refactor; this was a side effect of experimenting with zone
{de,re}registration during fork(2).
Fix zone_force_unlock() to reinitialize, rather than unlocking mutexes,
since OS X 10.12 cannot tolerate a child unlocking mutexes that were
locked by its parent.
Refactor; this was a side effect of experimenting with zone
{de,re}registration during fork(2).
The raw clock variant is slow (even relative to plain CLOCK_MONOTONIC),
whereas the coarse clock variant is faster than CLOCK_MONOTONIC, but
still has resolution (~1ms) that is adequate for our purposes.
This resolves#479.