This mallctl accepts an arena_config_t structure which
can be used to customize the behavior of the arena.
Right now it contains extent_hooks and a new option,
metadata_use_hooks, which controls whether the extent
hooks are also used for metadata allocation.
The metadata_use_hooks option has two main use cases:
1. In heterogeneous memory systems, to avoid metadata
being placed on potentially slower memory.
2. To avoid virtual memory being leaked as a result of a metadata allocation
failure originating in an extent hook.
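As a sketch of the intended usage (the mallctl name
"experimental.arenas_create_ext" and the exact arena_config_t layout are
assumptions here; check your version's headers):

    #include <limits.h>
    #include <stddef.h>
    #include <jemalloc/jemalloc.h>

    /* A user-provided extent_hooks_t; its definition is omitted here. */
    extern extent_hooks_t my_hooks;

    unsigned
    create_arena_with_custom_hooks(void) {
        arena_config_t cfg = {
            .extent_hooks = &my_hooks,
            /* Keep metadata off the hook-managed (possibly slower) memory. */
            .metadata_use_hooks = false,
        };
        unsigned arena_ind;
        size_t sz = sizeof(arena_ind);
        if (mallctl("experimental.arenas_create_ext", &arena_ind, &sz, &cfg,
            sizeof(cfg)) != 0) {
            return UINT_MAX; /* Arena creation failed. */
        }
        return arena_ind;
    }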
Existing backtrace implementations skip native stack frames from runtimes like
Python. The hook allows users to augment backtraces in order to attribute
allocations to native functions in heap profiles.
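A minimal sketch of installing such a hook; the mallctl name and hook
signature below are assumptions based on this change:

    #include <stddef.h>
    #include <jemalloc/jemalloc.h>

    static void
    my_backtrace_hook(void **vec, unsigned *len, unsigned max_len) {
        /* Rewrite or append frames in vec (up to max_len), e.g. mapping
         * interpreter trampolines back to the native functions executing
         * them. */
        (void)vec; (void)len; (void)max_len;
    }

    static void
    install_backtrace_hook(void) {
        prof_backtrace_hook_t hook = &my_backtrace_hook;
        mallctl("experimental.hooks.prof_backtrace", NULL, NULL, &hook,
            sizeof(hook));
    }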
This change allows every allocator conforming to the PAI to communicate that it
has deferred some work for the future. Without it, if a background thread goes
into an indefinite sleep, there is no way to notify it of upcoming deferred
work.
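One way to picture the mechanism (hypothetical shapes; the real PAI functions
take more parameters):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct pai_s pai_t;
    struct pai_s {
        /* Every PAI call reports, via an out-parameter, whether it
         * deferred work (e.g. purging) for later. */
        void (*dalloc)(pai_t *self, void *ptr, bool *deferred_work_generated);
    };

    void background_thread_wakeup_early(void); /* Hypothetical. */

    static void
    pai_dalloc(pai_t *self, void *ptr) {
        bool deferred_work_generated = false;
        self->dalloc(self, ptr, &deferred_work_generated);
        if (deferred_work_generated) {
            /* Nudge a possibly indefinitely-sleeping background thread. */
            background_thread_wakeup_early();
        }
    }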
Previously, the calculation of sleep time between wakeups was implemented
within the background thread. As a result, decay- and HPA-specific logic was
mixed into the background thread implementation. With this change, the
background thread delegates the calculation to the arena, which in turn
delegates it to the PAI. The next step is to implement the actual calculation
of time until deferred work in the HPA.
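Schematically, with hypothetical names, the delegation looks like:

    #include <stdint.h>

    typedef struct pai_s pai_t;
    struct pai_s {
        /* Nanoseconds until this allocator's deferred work should run;
         * UINT64_MAX would mean "nothing pending, sleep indefinitely". */
        uint64_t (*time_until_deferred_work)(pai_t *self);
    };

    typedef struct arena_s {
        pai_t *pa;
    } arena_t;

    /* The background thread no longer computes decay/HPA wakeups itself;
     * it asks the arena, which forwards the question to its PAI. */
    static uint64_t
    arena_time_until_deferred_work(arena_t *arena) {
        return arena->pa->time_until_deferred_work(arena->pa);
    }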
Retained pages are those which haven't been touched and are unbacked from the
OS's perspective. For a pageslab, their number should equal "total pages in
slab" minus "touched pages".
By force-inlining everything that would otherwise be a macro, we get the same
effect (it's not clear in the first place that this is actually a good idea, but
it avoids making any changes to the existing performance profile).
This makes the code more maintainable (in anticipation of subsequent changes),
as well as making performance profiles and debug info more readable (we get
"real" line numbers, instead of making everything point to the macro definition
of all associated functions).
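To illustrate the technique (a toy example, not the actual code):

    /* Macro version: expands at every call site, so profiles and debug
     * info attribute all of it to the #define. */
    #define SUM_SQ(sum, x) do { (sum) += (x) * (x); } while (0)

    /* Force-inlined version: effectively the same codegen, but tools see
     * a real function with real line numbers. (JEMALLOC_ALWAYS_INLINE
     * boils down to roughly the following.) */
    #define ALWAYS_INLINE static inline __attribute__((always_inline))

    ALWAYS_INLINE void
    sum_sq(long *sum, long x) {
        *sum += x * x;
    }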
The edata_cache_small had a fill/flush heuristic. In retrospect, this was a
premature optimization; more testing indicates that an unbounded cache is
effectively fine here, and moreover we spend a nontrivial amount of time doing
unnecessary filling/flushing.
As the HPA takes on a larger and larger fraction of all allocations, any
theoretical differences in allocation patterns should shrink. The HPA is more
efficient with its metadata in general, so it still comes out ahead on metadata
usage anyway.
We wait a while after deciding a huge extent should get hugified to see if it
gets purged before long. This avoids hugifying extents that might shortly get
dehugified for purging.
Rename and reuse the support code for the hpa_dehugification_threshold option
for this purpose, since that option is now ignored.
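A sketch of the hysteresis check, with hypothetical names and plain nanosecond
timestamps:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct hpdata_s {
        /* When the slab became hugification-eligible; 0 if not eligible. */
        uint64_t hugify_eligible_since_ns;
    } hpdata_t;

    static bool
    hpa_should_hugify_now(const hpdata_t *ps, uint64_t now_ns,
        uint64_t hugify_delay_ns) {
        if (ps->hugify_eligible_since_ns == 0) {
            return false;
        }
        /* Hugify only once the slab has stayed eligible for the full
         * delay; slabs purged in the meantime lose their eligibility. */
        return now_ns - ps->hugify_eligible_since_ns >= hugify_delay_ns;
    }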
This is a simple multi-producer, single-consumer queue. The intended use case
is in the HPA, as we begin supporting hpdatas that move between hpa_shards. We
take just a single CAS as the cost to send a message (or a batch of messages) in
the low-contention case, and lock-freedom lets us avoid some lock-ordering
issues.
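The core of such a queue can be sketched as follows (a generic illustration,
not the actual implementation): producers push onto a LIFO list with a single
CAS, and the lone consumer grabs the whole list with one exchange and reverses
it to recover FIFO order.

    #include <stdatomic.h>
    #include <stddef.h>

    typedef struct node_s node_t;
    struct node_s {
        node_t *next;
    };

    typedef struct mpsc_queue_s {
        _Atomic(node_t *) head; /* Most recently pushed element. */
    } mpsc_queue_t;

    /* Producers: link in a batch (already chained via ->next, from first
     * to last) with a single CAS in the uncontended case. */
    static void
    mpsc_queue_push_batch(mpsc_queue_t *q, node_t *first, node_t *last) {
        node_t *head = atomic_load_explicit(&q->head, memory_order_relaxed);
        do {
            last->next = head;
        } while (!atomic_compare_exchange_weak_explicit(&q->head, &head,
            first, memory_order_release, memory_order_relaxed));
    }

    /* The single consumer: take everything at once, then reverse the
     * LIFO chain to process messages in FIFO order. */
    static node_t *
    mpsc_queue_pop_all(mpsc_queue_t *q) {
        node_t *lifo = atomic_exchange_explicit(&q->head, NULL,
            memory_order_acquire);
        node_t *fifo = NULL;
        while (lifo != NULL) {
            node_t *next = lifo->next;
            lifo->next = fifo;
            fifo = lifo;
            lifo = next;
        }
        return fifo;
    }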
Now that all merging goes through try_acquire_edata_neighbor, the mergeability
checks (including head state checking) are done before reaching the merge hook.
In other words, the merge hook will never be called if the head states don't
agree.
Instead of passing down the new_addr, pass down the active edata, which allows
us to always use a neighbor-acquiring semantic. In other words, this tells us
both the original edata and the neighbor's address. With this change, only
neighbors of a "known" edata can be acquired, i.e. acquiring an edata based on
an arbitrary address is no longer possible.
This avoids the addr-based mutexes (i.e. the mutex_pool) and instead relies on
the metadata tracked in the rtree leaf: the head state and extent_state. Before
the neighbor edata is accessed (e.g. for coalescing), the states are verified
first -- only neighbor edatas from the same arena and with the same state will
be accessed.
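A simplified sketch of the check (illustrative field names; the real code
reads these out of the rtree leaf):

    #include <stdbool.h>

    typedef enum {
        extent_state_active,
        extent_state_dirty,
        extent_state_muzzy,
        extent_state_retained
    } extent_state_t;

    typedef struct edata_s {
        unsigned arena_ind;
        extent_state_t state;
        bool is_head;
    } edata_t;

    static bool
    neighbor_mergeable(const edata_t *self, const edata_t *forward_neighbor) {
        /* Only neighbors from the same arena and in the same state may
         * be accessed at all. */
        if (forward_neighbor->arena_ind != self->arena_ind ||
            forward_neighbor->state != self->state) {
            return false;
        }
        /* Never merge across the head of another allocation range. */
        return !forward_neighbor->is_head;
    }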
When retain is on and extent_grow_retained fails (e.g. due to split hook
failures), we try extent_alloc_wrapper as a last resort. Set the is_head bit in
that case to be consistent. The extent allocated this way will be retained
properly, but not merged with other extents.
Before this change, purge/hugify decisions had several sharp edges that could
lead to pathological behavior if tuning parameters weren't carefully chosen.
This commit is the first of a series; it introduces basic "make every hugepage
with dirty pages purgeable" functionality, and the next commit expands that
functionality with a smarter policy for picking hugepages to purge.
Previously, the dehugify logic would *never* dehugify a hugepage unless it was
dirtier than the dehugification threshold. This could lead to situations in
which these pages (which themselves could never be purged) would push us above
the maximum allowed dirty pages in the shard. That in turn forced immediate
purging of any pages deallocated in non-hugified hugepages, which placed
nonobvious practical limitations on the relationships between various config
settings.
Instead, we make our preference against dehugifying-to-purge a soft one rather
than a hard one. We'll avoid purging hugified pages, but only so long as we can
do so by purging non-hugified ones. If we need to purge them to satisfy our
dirty-page limits, or to hugify other, more worthy candidates, we'll still do
so.
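One way to express the soft preference (hypothetical fields; a comparator over
purge candidates):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct hpdata_s {
        bool huge;
        size_t ndirty;
    } hpdata_t;

    /* Non-hugified slabs sort first (the soft preference); within each
     * class, dirtier slabs sort first. Hugified slabs stay in the
     * ordering, so they can still be purged when limits demand it. */
    static int
    purge_candidate_cmp(const hpdata_t *a, const hpdata_t *b) {
        if (a->huge != b->huge) {
            return a->huge ? 1 : -1;
        }
        return (a->ndirty < b->ndirty) - (a->ndirty > b->ndirty);
    }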
It tracks pageslabs. Soon, we'll have another bitmap (to track dirty pages)
that we want to disambiguate.
While we're here, fix an out-of-date comment.