Commit Graph

1531 Commits

Author SHA1 Message Date
Qi Wang
83f3294027 Small refactors around 7bb05e0. 2021-09-27 16:05:13 -07:00
Qi Wang
3c4b717ffc Remove unused header base_structs.h. 2021-09-27 16:05:13 -07:00
Qi Wang
deb8e62a83 Implement guard pages.
Adding guarded extents, which are regular extents surrounded by guard pages
(mprotected).  To reduce syscalls, small guarded extents are cached as a
separate eset in ecache, and decay through the dirty / muzzy / retained pipeline
as usual.
2021-09-26 16:30:15 -07:00
Piotr Balcer
7bb05e04be add experimental.arenas_create_ext mallctl
This mallctl accepts an arena_config_t structure which
can be used to customize the behavior of the arena.
Right now it contains extent_hooks and a new option,
metadata_use_hooks, which controls whether the extent
hooks are also used for metadata allocation.

The medata_use_hooks option has two main use cases:

1. In heterogeneous memory systems, to avoid metadata
being placed on potentially slower memory.

2. Avoiding virtual memory from being leaked as a result
of metadata allocation failure originating in an extent hook.
2021-09-24 13:43:18 -07:00
Alex Lapenkou
a9031a0970 Allow setting a dump hook
If users want to be notified when a heap dump occurs, they can set this hook.
2021-09-22 15:04:01 -07:00
Alex Lapenkou
f7d46b8119 Allow setting custom backtrace hook
Existing backtrace implementations skip native stack frames from runtimes like
Python. The hook allows to augment the backtraces to attribute allocations to
native functions in heap profiles.
2021-09-22 15:04:01 -07:00
Alex Lapenkou
6e848a005e Remove opt_background_thread_hpa_interval_max_ms
Now that HPA can communicate the time until its deferred work should be done,
this option is not used anymore.
2021-09-17 16:56:41 -07:00
Alex Lapenkou
8229cc77c5 Wake up background threads on demand
This change allows every allocator conforming to PAI communicate that it
deferred some work for the future. Without it if a background thread goes into
indefinite sleep, there is no way to notify it about upcoming deferred work.
2021-09-17 16:56:41 -07:00
Alex Lapenkou
97da57c13a HPA: Add min_purge_interval_ms option
This rate limiting option is required to avoid purging too often.
2021-09-17 16:56:41 -07:00
Alex Lapenkou
b8b8027f19 Allow PAI to calculate time until deferred work
Previously the calculation of sleep time between wakeups was implemented within
background_thread. This resulted in some parts of decay and hpa specific
logic mixing with background thread implementation. In this change, background
thread delegates this calculation to arena and it, in turn, delegates it to PAI.
The next step is to implement the actual calculation of time until deferred work
in HPA.
2021-09-17 16:56:41 -07:00
Alex Lapenkou
c01a885e94 HPA: Correctly calculate retained pages
Retained pages are those which haven't been touched and are unbacked from OS
perspective. For a pageslab their number should equal "total pages in slab"
minus "touched pages".
2021-08-20 18:06:17 -07:00
Qi Wang
5884a076fb Rename prof.dump_prefix to prof.prefix
This better aligns with our naming convention.  The option has not been included
in any upstream release yet.
2021-08-12 23:04:29 -07:00
David Goldblatt
6f41ba55ee Mutex: Make spin count configurable.
Don't document it since we don't want to support this as a "real" setting, but
it's handy for testing.
2021-08-05 10:13:53 -07:00
David Goldblatt
dae24589bc PH: Insert-below-min fast-path. 2021-08-02 15:02:49 -07:00
David Goldblatt
40d53e007c ph: Add aux-list counting and pre-merging. 2021-08-02 15:02:49 -07:00
David Goldblatt
dcb7b83fac Eset: Cache summary information for heap edatas.
This lets us do a single array scan to find first fits, instead of taking a
cache miss per examined size class.
2021-08-02 15:02:49 -07:00
David Goldblatt
252e0942d0 Eset: Pull per-pszind data into structs.
We currently have one for stats and one for the data.  The data struct is just a
wrapper around the edata_heap_t, but this will change shortly.
2021-08-02 15:02:49 -07:00
David Goldblatt
dc0a4b8b2f Edata: Pull out comparison fields into a summary.
For now, this is a no-op; eventually, it will allow some caching in the eset.
2021-08-02 15:02:49 -07:00
David Goldblatt
0170dd198a Edata: Fix a couple typos.
Some readability-enhancing whitespace, and a spelling error.
2021-08-02 15:02:49 -07:00
David Goldblatt
08a4cc0969 Pairing heap: inline functions instead of macros.
By force-inlining everything that would otherwise be a macro, we get the same
effect (it's not clear in the first place that this is actually a good idea, but
it avoids making any changes to the existing performance profile).

This makes the code more maintainable (in anticipation of subsequent changes),
as well as making performance profiles and debug info more readable (we get
"real" line numbers, instead of making everything point to the macro definition
of all associated functions).
2021-08-02 15:02:49 -07:00
David Goldblatt
92a1e38f52 edata_cache: Allow unbounded fast caching.
The edata_cache_small had a fill/flush heuristic.  In retrospect, this was a
premature optimization; more testing indicates that an unbounded cache is
effectively fine here, and moreover we spend a nontrivial amount of time doing
unnecessary filling/flushing.

As the HPA takes on a larger and larger fraction of all allocations, any
theoretical differences in allocation patterns should shrink.  The HPA is more
efficient with its metadata in general, so it still comes out ahead on metadata
usage anyways.
2021-07-26 15:14:37 -07:00
David Goldblatt
d93eef2f40 HPA: Introduce a redesigned hpa_central_t.
For now, this only handles allocating virtual address space to shards, with no
reuse.  This is framework, though; it will change over time.
2021-07-23 21:59:59 -07:00
David Goldblatt
e09eac1d4e Remove hpa_central.
This is now dead code.
2021-07-23 21:59:59 -07:00
Alex Lapenkou
aaea4fd1e6 Add more documentation to decay.c
It took me a while to understand why some things are implemented the way they
are, so hopefully it will help future readers.
2021-07-22 23:19:09 -07:00
Alex Lapenkou
4b633b9a81 Clean up background thread sleep computation
Isolate the computation of purge interval from background thread logic and
move into more suitable file.
2021-07-22 23:19:09 -07:00
David Goldblatt
6630c59896 HPA: Hugification hysteresis.
We wait a while after deciding a huge extent should get hugified to see if it
gets purged before long.  This avoids hugifying extents that might shortly get
dehugified for purging.

Rename and use the hpa_dehugification_threshold option support code for this,
since it's now ignored.
2021-07-12 17:59:18 -07:00
David Goldblatt
113938b6f4 HPA: Pull out a hooks type.
For now, this is a no-op change.  In a subsequent commit, it will be useful for
testing.
2021-07-12 17:59:18 -07:00
David Goldblatt
1d4a7666d5 HPA: Do deferred operations on background threads. 2021-07-12 17:59:18 -07:00
David Goldblatt
583284f2d9 Add HPA deferral functionality. 2021-07-12 17:59:18 -07:00
David Goldblatt
47d8a7e6b0 psset: Purge empty slabs first.
These are particularly good candidates for purging (listed in the diff).
2021-07-12 17:59:18 -07:00
David Goldblatt
41fd56605e HPA: Purge across retained extents.
This lets us cut down on the number of expensive system calls we perform.
2021-07-12 17:59:18 -07:00
David Goldblatt
347523517b PAI: Fix a typo. 2021-07-12 17:59:11 -07:00
David Goldblatt
de033f56c0 mpsc_queue: Add module.
This is a simple multi-producer, single-consumer queue.  The intended use case
is in the HPA, as we begin supporting hpdatas that move between hpa_shards.  We
take just a single CAS as the cost to send a message (or a batch of messages) in
the low-contention case, and lock-freedom lets us avoid some lock-ordering
issues.
2021-06-24 14:55:49 -07:00
David Goldblatt
4452a4812f Add opt.experimental_infallible_new.
This allows a guarantee that operator new never throws.

Fix the .gitignore rules to include test/integration/cpp while we're here.
2021-06-24 12:22:51 -07:00
David Carlier
4fb93a18ee extent_can_acquire_neighbor typo fix 2021-06-19 08:13:11 -07:00
Vineet Gupta
2381efab57 ARC: add Minimum allocation alignment
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2021-06-03 13:43:38 -07:00
David Goldblatt
36c6bfb963 SEC: Allow arbitrarily many shards, cached sizes. 2021-05-22 08:17:41 -07:00
David Goldblatt
5417938215 Red-black tree: add summarize/filter.
This allows tracking extra information in the nodes of an red-black tree to
filter searches in the tree to just those that match some property.
2021-05-12 11:14:23 -07:00
David Goldblatt
aea91b8c33 Clean up some minor data structure inconsistencies
Namely, unify the include guard styling with the majority of the project, and do
flat_bitmap -> fb, to match its naming convention.
2021-05-12 11:14:23 -07:00
Qi Wang
7dc77527ba Delete the mutex_pool module. 2021-03-29 17:19:53 -07:00
Qi Wang
3093d9455e Move the edata mergeability related functions to extent.h. 2021-03-29 17:19:53 -07:00
Qi Wang
7c964b0352 Add rtree_write_range(): writing the same content to multiple leaf elements.
Apply to emap_(de)register_interior which became noticeable in perf profiles.
2021-03-29 17:19:53 -07:00
Qi Wang
add636596a Stop checking head state in the merge hook.
Now that all merging go through try_acquire_edata_neighbor, the mergeablility
checks (including head state checking) are done before reaching the merge hook.
In other words, merge hook will never be called if the head state doesn't agree.
2021-03-29 17:19:53 -07:00
Qi Wang
49b7d7f0a4 Passing down the original edata on the expand path.
Instead of passing down the new_addr, pass down the active edata which allows us
to always use a neighbor-acquiring semantic.  In other words, this tells us both
the original edata and neighbor address.  With this change, only neighbors of a
"known" edata can be acquired, i.e. acquiring an edata based on an arbitrary
address isn't possible anymore.
2021-03-29 17:19:53 -07:00
Qi Wang
1784939688 Use rtree tracked states to protect edata outside of ecache locks.
This avoids the addr-based mutexes (i.e. the mutex_pool), and instead relies on
the metadata tracked in rtree leaf: the head state and extent_state.  Before
trying to access the neighbor edata (e.g. for coalescing), the states will be
verified first -- only neighbor edatas from the same arena and with the same
state will be accessed.
2021-03-29 17:19:53 -07:00
Qi Wang
9ea235f8fe Add witness_assert_positive_depth_to_rank(). 2021-03-29 17:19:53 -07:00
Qi Wang
4d8c22f9a5 Store edata->state in rtree leaf and make edata_t 128B aligned.
Verified that this doesn't result in any real increase of edata_t bytes
allocated.
2021-03-29 17:19:53 -07:00
Qi Wang
70d1541c5b Track extent is_head state in rtree leaf. 2021-03-29 17:19:53 -07:00
Evers Chen
a137a68252 Remove redundant declaration, pac_retain_grow_limit_get_set was declared twice in pac.h 2021-03-29 16:42:46 -07:00
Qi Wang
22be724af4 Set is_head in extent_alloc_wrapper w/ retain.
When retain is on, when extent_grow_retained failed (e.g. due to split hook
failures), we'll try extent_alloc_wrapper as the last resort.  Set the is_head
bit in that case to be consistent.  The allocated extent in that case will be
retained properly, but not merged with other extents.
2021-03-12 10:20:08 -08:00
David Goldblatt
73ca4b8ef8 HPA: Use dirtiest-first purging.
This seems to be practically beneficial, despite some pathological corner cases.
2021-02-19 15:10:54 -08:00
David Goldblatt
0f6c420f83 HPA: Make purging/hugifying more principled.
Before this change, purge/hugify decisions had several sharp edges that could
lead to pathological behavior if tuning parameters weren't carefully chosen.
It's the first of a series; this introduces basic "make every hugepage with
dirty pages purgeable" functionality, and the next commit expands that
functionality to have a smarter policy for picking hugepages to purge.

Previously, the dehugify logic would *never* dehugify a hugepage unless it was
dirtier than the dehugification threshold.  This can lead to situations in which
these pages (which themselves could never be purged) would push us above the
maximum allowed dirty pages in the shard.  This forces immediate purging of any
pages deallocated in non-hugified hugepages, which in turn places nonobvious
practical limitations on the relationships between various config settings.

Instead, we make our preference not to dehugify to purge a soft one rather than
a hard one.  We'll avoid purging them, but only so long as we can do so by
purging non-hugified pages.  If we need to purge them to satisfy our dirty page
limits, or to hugify other, more worthy candidates, we'll still do so.
2021-02-19 15:10:54 -08:00
David Goldblatt
6bddb92ad6 psset: Rename "bitmap" to "pageslab_bitmap".
It tracks pageslabs.  Soon, we'll have another bitmap (to track dirty pages)
that we want to disambiguate.

While we're here, fix an out-of-date comment.
2021-02-19 15:10:54 -08:00
David Goldblatt
154aa5fcc1 Use the flat bitmap for eset and psset bitmaps.
This is simpler (note that the eset field comment was actually incorrect!), and
slightly faster.
2021-02-19 15:10:54 -08:00
David Goldblatt
d21d5b46b6 Edata: Move sn into its own field.
This lets the bins use a fragmentation avoidance policy that matches the HPA's
(without affecting the PAC).
2021-02-19 15:10:54 -08:00
David Goldblatt
fb327368db SEC: Expand option configurability.
This change pulls the SEC options into a struct, which simplifies their handling
across various modules (e.g. PA needs to forward on SEC options from the
malloc_conf string, but it doesn't really need to know their names).  While
we're here, make some of the fixed constants configurable, and unify naming from
the configuration options to the internals.
2021-02-19 15:10:54 -08:00
David Goldblatt
cdae6706a6 SEC: Use batch fills.
Currently, this doesn't help much, since no PAI implementation supports
flushing.  This will change in subsequent commits.
2021-02-19 15:10:54 -08:00
David Goldblatt
480f3b11cd Add a batch allocation interface to the PAI.
For now, no real allocator actually implements this interface; this will change
in subsequent diffs.
2021-02-19 15:10:54 -08:00
David Goldblatt
bf448d7a5a SEC: Reduce lock hold times.
Only flush a subset of extents during flushing, and drop the lock while doing
so.
2021-02-19 15:10:54 -08:00
David Goldblatt
1944ebbe7f HPA: Implement batch deallocation.
This saves O(n) mutex locks/unlocks during SEC flush.
2021-02-19 15:10:54 -08:00
David Goldblatt
f47b4c2cd8 PAI/SEC: Add a dalloc_batch function.
This lets the SEC flush all of its items in a single call, rather than flushing
everything at once.
2021-02-19 15:10:54 -08:00
David Goldblatt
4b8870c7db SEC: Fix a comment typo. 2021-02-19 15:10:54 -08:00
Qi Wang
a11be50332 Implement opt.cache_oblivious.
Keep config.cache_oblivious for now to remain backward-compatible.
2021-02-11 11:32:01 -08:00
Qi Wang
041145c272 Report the correct and wrong sizes on sized dealloc bug detection. 2021-02-08 14:42:27 -08:00
Qi Wang
f3b2668b32 Report the offending pointer on sized dealloc bug detection. 2021-02-08 14:42:27 -08:00
David Goldblatt
edbfe6912c Inline malloc fastpath into operator new.
This saves a small but non-negligible amount of CPU in C++ programs.
2021-02-08 14:17:47 -08:00
David Goldblatt
79f81a3732 HPA: Make dirty_mult configurable. 2021-02-04 20:58:31 -08:00
David Goldblatt
32dd153796 HPA: Make dehugification threshold configurable. 2021-02-04 20:58:31 -08:00
David Goldblatt
4790db15ed HPA: make the hugification threshold configurable. 2021-02-04 20:58:31 -08:00
David Goldblatt
b3df80bc79 Pull HPA options into a containing struct.
Currently that just means max_alloc, but we're about to add more.  While we're
touching these lines anyways, tweak things to be more in line with testing.
2021-02-04 20:58:31 -08:00
David Goldblatt
bdb7307ff2 fxp: Add FXP_INIT_PERCENT
This lets us specify fxp values easily in source.
2021-02-04 20:58:31 -08:00
David Goldblatt
caef4c2868 FXP: add fxp_mul_frac.
This can multiply size_ts by a fraction without the risk of overflow.
2021-02-04 20:58:31 -08:00
David Goldblatt
56e85c0e47 HPA: Use a whole-shard purging heuristic.
Previously, we used only hpdata-local information to decide whether to purge.
2021-02-04 20:58:31 -08:00
David Goldblatt
dc886e5608 hpdata: Return the number of pages to be purged.
We'll use this in the next commit.
2021-02-04 20:58:31 -08:00
David Goldblatt
9fd9c876bb psset: keep aggregate stats.
This will let us quickly query these stats to make purging decisions quickly.
2021-02-04 20:58:31 -08:00
David Goldblatt
da63f23e68 HPA: Track pending purges/hugifies in the psset.
This finishes the refactoring of the HPA/psset interactions the past few commits
have been building towards.

Rather than the HPA removing and then reinserting hpdatas, it simply begins
updates and ends them.  These updates can set flags on the hpdata that prevent
it from being returned for certain types of requests.  For example, it can call
hpdata_alloc_allowed_set(hpdata, false) during an update, at which point the
given hpdata will no longer be returned for psset_pick_alloc requests.

This has various of benefits:
- It maintains stats correctness during purges and hugifies.
- It allows simpler and more explicit concurrency control for the various
  special cases (e.g. allocations are disallowed during purge, but not during
  hugify).
- It lets allocations and deallocations avoid disturbing the purging and
  hugification orderings.  If an hpdata "loses its place" in one of the queues
  just do to an alloc / dalloc, it can result in pathological edge cases where
  very hot, very full hugepages never get hugified  (and cold extents on the
  same hugepage as hot ones never get purged).

The key benefit though is that tracking hpdatas to be purged / hugified in a
principled way will let us do delayed purging and hugification.  Eventually this
will let us move these operations to background threads, but in the short term
the benefit is that it will let us have global purging policies (e.g. purge when
the entire arena has too many dirty pages, rather than any particular hugepage).
2021-02-04 20:58:31 -08:00
David Goldblatt
bf64557ed6 Move empty slab tracking to the psset.
We're moving towards a world in which purging decisions are less rigidly
enforced at a single-hugepage level.  In that world, it makes sense to keep
around some hpdatas which are not completely purged, in which case we'll need to
track them.
2021-02-04 20:58:31 -08:00
David Goldblatt
99fc0717e6 psset: Reconceptualize insertion/removal.
Really, this isn't a functional change, just a naming change.  We start thinking
of pageslabs as being always in the psset.  What we used to think of as removal
is now thought of as being in the psset, but in the process of being updated
(and therefore, unavalable for serving new allocations).

This is in preparation of subsequent changes to support deferred purging;
allocations will still be in the psset for the purposes of choosing when to
purge, but not for purposes of allocation/deallocation.
2021-02-04 20:58:31 -08:00
David Goldblatt
d3e5ea03c5 HPA: Track dirty stats. 2021-02-04 20:58:31 -08:00
David Goldblatt
68a1666e91 hpdata: Rename "dirty" to "touched".
This matches the usage in the rest of the codebase.
2021-02-04 20:58:31 -08:00
David Goldblatt
be0d7a53f3 HPA: Don't track inactive pages.
This is really only useful for human consumption.  Correspondingly, emit it only
in the human-readable stats, and let everybody else compute from the hugepage
size and nactive.
2021-02-04 20:58:31 -08:00
David Goldblatt
55e0f60ca1 psset stats: Simplify handling.
We can treat the huge and nonhuge cases uniformly using huge state as an array
index.
2021-02-04 20:58:31 -08:00
David Goldblatt
94cd9444c5 HPA: Some minor reformattings. 2021-02-04 20:58:31 -08:00
David Goldblatt
b25ee5d88e HPA: Add purge stats. 2021-02-04 20:58:31 -08:00
David Goldblatt
746ea3de6f HPA stats: Allow some derived stats.
However, we put them in their own struct, to avoid the messiness that the arena
has (mixing derived and non-derived stats in the arena_stats_t).
2021-02-04 20:58:31 -08:00
David Goldblatt
30b9e8162b HPA: Generalize purging.
Previously, we would purge a hugepage only when it's completely empty.  With
this change, we can purge even when only partially empty.  Although the
heuristic here is still fairly primitive, this infrastructure can scale to
become more advanced.
2021-02-04 20:58:31 -08:00
David Goldblatt
70692cfb13 hpdata: Add state changing helpers.
We're about to allow hugepage subextent purging; get as much of our metadata
handling ready as possible.
2021-02-04 20:58:31 -08:00
David Goldblatt
9b75808be1 flat bitmap: Add a bitwise and/or/not.
We're about to need them.
2021-02-04 20:58:31 -08:00
David Goldblatt
2ae966222f hpdata: track per-page dirty state. 2021-02-04 20:58:31 -08:00
David Goldblatt
ff4086aa6b hpdata: count active pages instead of free ones.
This will be more consistent with later naming choices.
2021-02-04 20:58:31 -08:00
David Goldblatt
3624dd42ff hpdata: Add a comment for hpdata_consistent. 2021-02-04 20:58:31 -08:00
David Goldblatt
20140629b4 Bin: Move stats closer to the mutex.
This is a slight cache locality optimization.
2021-02-04 14:10:43 -08:00
David Goldblatt
c259323ab3 Use ticker_geom_t for arena tcache decay. 2021-02-04 14:10:43 -08:00
David Goldblatt
8edfc5b170 Add ticker_geom_t.
This lets a single ticker object drive events across a large number of different
tick streams while sharing state.
2021-02-04 14:10:43 -08:00
David Goldblatt
3967329813 Arena: share bin offsets in a global.
This saves us a cache miss when lookup up the arena bin offset in a remote
arena during tcache flush.  All arenas share the base offset, and so we don't
need to look it up repeatedly for each arena.  Secondarily, it shaves 288 bytes
off the arena on, e.g., x86-64.
2021-02-04 14:10:43 -08:00
David Goldblatt
2fcbd18115 Cache bin: Don't reverse flush order.
The items we pick to flush matter a lot, but the order in which they get flushed
doesn't; just use forward scans.  This simplifies the accessing code, both in
terms of the C and the generated assembly (i.e. this speeds up the flush
pathways).
2021-02-04 14:10:43 -08:00
David Goldblatt
4c46e11365 Cache an arena's index in the arena.
This saves us a pointer hop down some perf-sensitive paths.
2021-02-04 14:10:43 -08:00
David Goldblatt
229994a204 Tcache flush: keep common path state in registers.
By carefully force-inlining the division constants and the operation sum count,
we can eliminate redundant operations in the arena-level dalloc function.  Do
so.
2021-02-04 14:10:43 -08:00
David Goldblatt
31a629c3de Tcache flush: prefetch edata contents.
This frontloads more of the miss latency.  It also moves it to a pathway where
we have not yet acquired any locks, so that it should (hopefully) reduce hold
times.
2021-02-04 14:10:43 -08:00
David Goldblatt
9f9247a62e Tcache fluhing: increase cache miss parallelism.
In practice, many rtree_leaf_elm accesses are cache misses.  By restructuring,
we can make it more likely that these misses occur without blocking us from
starting later lookups, taking more of those misses in parallel.
2021-02-04 14:10:43 -08:00