Use chains of cached objects, rather than using arrays of pointers.
Since tcache_bin_t is no longer dynamically sized, convert tcache_t's
tbin to an array of structures, rather than an array of pointers. This
implicitly removes tcache_bin_{create,destroy}(), which further
simplifies the fast path for malloc/free.
Use cacheline alignment for tcache_t allocations.
Remove runtime configuration option for number of tcache bin slots, and
replace it with a boolean option for enabling/disabling tcache.
Limit the number of tcache objects to the lesser of TCACHE_NSLOTS_MAX
and 2X the number of regions per run for the size class.
For GC-triggered flush, discard 3/4 of the objects below the low water
mark, rather than 1/2.
Convert chunks_dirty from a red-black tree to a doubly linked list,
and use it to purge dirty pages from chunks in FIFO order.
Add a lock around the code that purges dirty pages via madvise(2), in
order to avoid kernel contention. If lock acquisition fails,
indefinitely postpone purging dirty pages.
Add a lower limit of one chunk worth of dirty pages per arena for
purging, in addition to the active:dirty ratio.
When purging, purge all dirty pages from at least one chunk, but rather
than purging enough pages to drop to half the purging threshold, merely
drop to the threshold.
Use left-leaning 2-3 red-black trees instead of left-leaning 2-3-4
red-black trees. This reduces maximum tree height from (3 lg n) to
(2 lg n).
Do lazy balance fixup, rather than transforming the tree during the down
pass. This improves insert/remove speed by ~30%.
Use callback-based iteration rather than macros.