This lets us put more allocations on an "almost as fast" path after a flush.
This results in around a 4% reduction in malloc cycles in prod workloads
(corresponding to about a 0.1% reduction in overall cycles).
With this, we track all of the empty, full, and low water states together. This
simplifies a lot of the tracking logic, since we now don't need the
cache_bin_info_t for state queries (except for some debugging).
I.e. the tcache code just calls a cache-bin function to finish flush (and move
pointers around, etc.). It doesn't directly access the cache-bin's owned memory
any more.
Previously, we took an array of cache_bin_info_ts and an index, and dereferenced
ourselves. But infos for other cache_bins aren't relevant to any particular
cache bin, so that should be the caller's job.
This is debug only and we keep it off the fast path. Moving it here simplifies
the internal logic.
This never tries to junk on regions that were shrunk via xallocx. I think this
is fine for two reasons:
- The shrunk-with-xallocx case is rare.
- We don't always do that anyway before this diff (it depends on the opt
settings and extent hooks in effect).
The small and large pathways share most of their logic, even if some of the
individual operations are different. We pull out the common logic into a
force-inlined function, and then specialize twice, once for each value of
"small".
The only time sharing an rtree context saves across extent operations isn't a
no-op is when tsd is unavailable. But this happens only in situations like
thread death or initialization, and we don't care about shaving off every
possible cycle in such scenarios.
Previously, tcache fill/flush (as well as small alloc/dalloc on the arena) may
potentially drop the bin lock for slab_alloc and slab_dalloc. This commit
refactors the logic so that the slab calls happen in the same function / level
as the bin lock / unlock. The main purpose is to be able to use flat combining
without having to keep track of stack state.
In the meantime, this change reduces the locking, especially for slab_dalloc
calls, where nothing happens after the call.