This will be the centralized component of the coming hugepage allocator; the
source of larger chunks of memory from which smaller ones can be obtained.
These had no uses and complicated the API. As a rule we now expect to only use
thread-local randomization for contention-reduction reasons, so we only pay the
API costs and never get the functionality benefits.
This introduces a new sort of edata_t; a pageslab, and a set to manage them.
This is part of a series of a commits to implement a hugepage allocator; the
pageset will be per-arena, and track small page allocations requests within a
larger extent allocated from a centralized hugepage allocator.
The mallctlbymib_long helper was copy-pasted from mallctlbymib_short, and
incorrectly used its output variable (a char *) rather than the output variable
of the mallctl call it was using (a uint64_t), causing breakages when
sizeof(char *) differed from sizeof(uint64_t).
The existing checks are good at finding such issues (on tcache flush), but not
so good at pinpointing them. Debug mode can find them, but sometimes debug mode
slows down a program so much that hard-to-hit bugs can take a long time to
crash.
This commit adds functionality to keep programs mostly on their fast paths,
while also checking every sized delete argument they get.
Previously, all tests with more than two levels came in powers of 2. It's
usefule to check cases where we have a partially filled group at above the
second level.
These simplify a lot of the bit_util module, which had grown bits and pieces of
this functionality across a variety of places over the years.
While we're here, kill off BIT_UTIL_INLINE and don't do reentrancy testing for
bit_util.
For now, this is just a stub containing the ecaches, with no surrounding code
changed. Eventually all the core allocator bits will be moved in, in the
subsequent stack of commits.
Algorithmically, a size greater than 1024 ZB could access one-past-the-end of
the sizes array. This couldn't really happen since SIZE_MAX is less than 1024
ZB on all platforms we support (and we pick the arguments to this function to be
reasonable anyways), but it's not like there's any reason *not* to fix it,
either.
The goal of `qr_meld()` is to change the following four fields
`(a->prev, a->prev->next, b->prev, b->prev->next)` from the values
`(a->prev, a, b->prev, b)` to `(b->prev, b, a->prev, a)`.
This commit changes
```
a->prev->next = b;
b->prev->next = a;
temp = a->prev;
a->prev = b->prev;
b->prev = temp;
```
to
```
temp = a->prev;
a->prev = b->prev;
b->prev = temp;
a->prev->next = a;
b->prev->next = b;
```
The benefit is that we can use `b->prev->next` for `temp`, and so
there's no need to pass in `a_type`.
The restriction is that `b` cannot be a `qr_next()` macro, so users
of `qr_meld()` must pay attention. (Before this change, neither `a`
nor `b` could be a `qr_next()` macro.)
Previously, large allocations in tcaches would have their sizes reduced during
stats estimation. Added a test, which fails before this change but passes now.
This fixes a bug introduced in 5934846612, which
was itself fixing a bug introduced in 9c0549007d.
This lets us put more allocations on an "almost as fast" path after a flush.
This results in around a 4% reduction in malloc cycles in prod workloads
(corresponding to about a 0.1% reduction in overall cycles).
Previously, we took an array of cache_bin_info_ts and an index, and dereferenced
ourselves. But infos for other cache_bins aren't relevant to any particular
cache bin, so that should be the caller's job.
This is debug only and we keep it off the fast path. Moving it here simplifies
the internal logic.
This never tries to junk on regions that were shrunk via xallocx. I think this
is fine for two reasons:
- The shrunk-with-xallocx case is rare.
- We don't always do that anyway before this diff (it depends on the opt
settings and extent hooks in effect).
Make the event module to accept two event types, and pass around the event
context. Use bytes-based events to trigger tcache GC on deallocation, and get
rid of the tcache ticker.
Add options stats_interval and stats_interval_opts to allow interval based stats
printing. This provides an easy way to collect stats without code changes,
because opt.stats_print may not work (some binaries never exit).
This will eventually completely wrap the eset, and handle concurrency,
allocation, and deallocation. For now, we only pull out the mutex from the
eset.
Fold the tsd_state check onto the event threshold check. The fast threshold is
set to 0 when tsd switch to non-nominal.
The fast_threshold can be reset by remote threads, to refect the non nominal tsd
state change.
Develop new data structure and code logic for holding profiling
related information stored in the extent that may be needed after the
extent is released, which in particular is the case for the
reallocation code path (e.g. in `rallocx()` and `xallocx()`). The
data structure is a generalization of `prof_tctx_t`: we previously
only copy out the `prof_tctx` before the extent is released, but we
may be in need of additional fields. Currently the only additional
field is the allocation time field, but there may be more fields in
the future.
The restructuring also resolved a bug: `prof_realloc()` mistakenly
passed the new `ptr` to `prof_free_sampled_object()`, but passing in
the `old_ptr` would crash because it's already been released. Now
the essential profiling information is collectively copied out early
and safely passed to `prof_free_sampled_object()` after the extent is
released.
Specifically, the extent_arena_[g|s]et functions and the address randomization.
These are the only things that tie the extent struct itself to the arena code.
The -1 value of low_water indicates if the cache has been depleted and
refilled. Track the status explicitly in the tcache struct.
This allows the fast path to check if (cur_ptr > low_water), instead of >=,
which avoids reaching slow path when the last item is allocated.