This checks whether or not we're reentrant using thread-local data, and, if we
are, moves certain internal allocations to use arena 0 (which should be properly
initialized after bootstrapping).
The immediate thing this allows is spinning up threads in arena_new, which will
enable spinning up background threads there.
1) Re-organize TSD so that frequently accessed fields are closer to the
beginning and more compact. Assuming 64-bit, the first 2.5 cachelines now
contains everything needed on tcache fast path, expect the tcache struct itself.
2) Re-organize tcache and tbins. Take lg_fill_div out of tbin, and reduce tbin
to 24 bytes (down from 32). Split tbins into tbins_small and tbins_large, and
place tbins_small close to the beginning.
With the tcache change, we plan to leave some blank space when !config_debug
(unused tbins, witnesses) at the end of the tsd. Let's not touch the memory.
The embedded tcache is initialized upon tsd initialization. The avail arrays
for the tbins will be allocated / deallocated accordingly during init / cleanup.
With this change, the pointer to the auto tcache will always be available, as
long as we have access to the TSD. tcache_available() (called in tcache_get())
is provided to check if we should use tcache.
This will facilitate embedding tcache into tsd, which will require proper
initialization cannot be done via the static initializer. Make tsd->rtree_ctx
to be initialized via rtree_ctx_data_init().
Compact extent_t to 128 bytes on 64-bit systems by moving
arena_slab_data_t's nfree into extent_t's e_bits.
Cacheline-align extent_t structures so that they always cross the
minimum number of cacheline boundaries.
Re-order extent_t fields such that all fields except the slab bitmap
(and overlaid heap profiling context pointer) are in the first
cacheline.
This resolves#461.
Remove tree-structured bitmap support, in order to reduce complexity and
ease maintenance. No bitmaps larger than 512 bits have been necessary
since before 4.0.0, and there is no current plan that would increase
maximum bitmap size. Although tree-structured bitmaps were used on
32-bit platforms prior to this change, the overall benefits were
questionable (higher metadata overhead, higher bitmap modification cost,
marginally lower search cost).
For extents which do not delay coalescing, use first fit layout policy
rather than first-best fit layout policy. This packs extents toward
older virtual memory mappings, but at the cost of higher search overhead
in the common case.
This resolves#711.
A fixed max spin count is used -- with benchmark results showing it
solves almost all problems. As the benchmark used was rather intense,
the upper bound could be a little bit high. However it should offer a
good tradeoff between spinning and blocking.
Expand and restructure the rtree API such that all common operations can
be achieved with minimal work, regardless of whether the rtree leaf
fields are independent versus packed into a single atomic pointer.
This allows leaf elements to differ in size from internal node elements.
In principle it would be more correct to use a different type for each
level of the tree, but due to implementation details related to atomic
operations, we use casts anyway, thus counteracting the value of
additional type correctness. Furthermore, such a scheme would require
function code generation (via cpp macros), as well as either unwieldy
type names for leaves or type aliases, e.g.
typedef struct rtree_elm_d2_s rtree_leaf_elm_t;
This alternate strategy would be more correct, and with less code
duplication, but probably not worth the complexity.
Rather than storing usize only for large (and prof-promoted)
allocations, store the size class index for allocations that reside
within the extent, such that the size class index is valid for all
extents that contain extant allocations, and invalid otherwise (mainly
to make debugging simpler).