Commit Graph

1810 Commits

Author SHA1 Message Date
Stan Angelov
912324a1ac Add debug check outside of the loop in hpa_alloc_batch.
This optimizes the whole loop away for non-debug builds.
2021-10-01 14:40:43 -07:00
David CARLIER
cf9724531a Darwin malloc_size override support proposal.
Darwin has an API similar to Linux/FreeBSD's malloc_usable_size.
2021-10-01 14:32:40 -07:00
Qi Wang
ab0f1604b4 Delay the atexit call to prof_log_start().
So that atexit() is only done when prof_log is used.
2021-09-29 13:35:50 -07:00
David Carlier
11b6db7448 CPU affinity on BSD platforms support. 2021-09-28 11:40:21 -07:00
Qi Wang
83f3294027 Small refactors around 7bb05e0. 2021-09-27 16:05:13 -07:00
Qi Wang
deb8e62a83 Implement guard pages.
Adding guarded extents, which are regular extents surrounded by guard pages
(mprotected).  To reduce syscalls, small guarded extents are cached as a
separate eset in ecache, and decay through the dirty / muzzy / retained pipeline
as usual.
2021-09-26 16:30:15 -07:00
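The guard page idea described above can be illustrated with a minimal POSIX sketch. This is not the code from the commit: `toy_guarded_alloc` and `toy_guarded_free` are hypothetical helpers, the sketch assumes a system with `mmap`/`mprotect`, and it omits the caching of small guarded extents entirely.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t toy_page_size(void) {
    return (size_t)sysconf(_SC_PAGESIZE);
}

/* Hypothetical helper: map npages usable pages with one inaccessible
 * guard page before and one after. Returns NULL on failure. */
static void *toy_guarded_alloc(size_t npages) {
    assert(npages > 0);
    size_t page = toy_page_size();
    size_t total = (npages + 2) * page;
    uint8_t *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return NULL;
    }
    /* Revoke all access to the first and last page of the mapping. */
    if (mprotect(base, page, PROT_NONE) != 0 ||
        mprotect(base + total - page, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return base + page; /* first usable byte, just past the leading guard */
}

/* Hypothetical helper: unmap the usable pages together with both guards. */
static void toy_guarded_free(void *usable, size_t npages) {
    size_t page = toy_page_size();
    munmap((uint8_t *)usable - page, (npages + 2) * page);
}
```

Any access one byte before or after the usable range then faults immediately, which is the property guarded extents rely on. Each guarded allocation in this sketch costs three system calls, which suggests why caching such extents is worthwhile.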
Piotr Balcer
7bb05e04be add experimental.arenas_create_ext mallctl
This mallctl accepts an arena_config_t structure which
can be used to customize the behavior of the arena.
Right now it contains extent_hooks and a new option,
metadata_use_hooks, which controls whether the extent
hooks are also used for metadata allocation.

The metadata_use_hooks option has two main use cases:

1. In heterogeneous memory systems, to avoid metadata
being placed on potentially slower memory.

2. Avoiding virtual memory from being leaked as a result
of metadata allocation failure originating in an extent hook.
2021-09-24 13:43:18 -07:00
Alex Lapenkou
a9031a0970 Allow setting a dump hook
If users want to be notified when a heap dump occurs, they can set this hook.
2021-09-22 15:04:01 -07:00
Alex Lapenkou
f7d46b8119 Allow setting custom backtrace hook
Existing backtrace implementations skip native stack frames from runtimes like
Python. The hook allows augmenting the backtraces to attribute allocations to
native functions in heap profiles.
2021-09-22 15:04:01 -07:00
Qi Wang
523cfa55c5 Guard prof related mallctl with opt_prof.
The prof initialization is done only when opt_prof is true.  This change makes
sure the prof_* mallctls only have limited read access (i.e. no access to prof
internals) when opt_prof is false.

In addition, initialize the global prof mutexes even if opt_prof is false.  This
makes sure the mutex stats are set properly.
2021-09-20 10:42:16 -07:00
Alex Lapenkou
6e848a005e Remove opt_background_thread_hpa_interval_max_ms
Now that HPA can communicate the time until its deferred work should be done,
this option is not used anymore.
2021-09-17 16:56:41 -07:00
Alex Lapenkou
8229cc77c5 Wake up background threads on demand
This change allows every allocator conforming to PAI to communicate that it
deferred some work for the future. Without it, if a background thread goes into
indefinite sleep, there is no way to notify it about upcoming deferred work.
2021-09-17 16:56:41 -07:00
Alex Lapenkou
97da57c13a HPA: Add min_purge_interval_ms option
This rate limiting option is required to avoid purging too often.
2021-09-17 16:56:41 -07:00
Alex Lapenkou
b8b8027f19 Allow PAI to calculate time until deferred work
Previously the calculation of sleep time between wakeups was implemented within
background_thread. This resulted in some parts of decay and hpa specific
logic mixing with background thread implementation. In this change, background
thread delegates this calculation to arena and it, in turn, delegates it to PAI.
The next step is to implement the actual calculation of time until deferred work
in HPA.
2021-09-17 16:56:41 -07:00
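The delegation chain described above (background thread, then arena, then PAI) can be sketched roughly as follows. All names here are hypothetical (`toy_pai_t`, `TOY_DEFERRED_WORK_NONE`, and so on), and the two toy allocators simply return a stored value instead of computing anything real.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: "no deferred work, sleep indefinitely". */
#define TOY_DEFERRED_WORK_NONE UINT64_MAX

/* Toy page allocator interface with a single hook. */
typedef struct toy_pai_s toy_pai_t;
struct toy_pai_s {
    uint64_t (*time_until_deferred_work)(toy_pai_t *self);
};

/* Two toy implementations; the interface is the first member so the
 * pointer casts below are valid. */
typedef struct { toy_pai_t pai; uint64_t ns_until_decay; } toy_pac_t;
typedef struct { toy_pai_t pai; uint64_t ns_until_purge; } toy_hpa_t;

static uint64_t toy_pac_time(toy_pai_t *self) {
    return ((toy_pac_t *)self)->ns_until_decay;
}

static uint64_t toy_hpa_time(toy_pai_t *self) {
    return ((toy_hpa_t *)self)->ns_until_purge;
}

typedef struct { toy_pai_t *pais[2]; } toy_arena_t;

/* The arena only aggregates; it knows nothing about decay or purging. */
static uint64_t toy_arena_time_until_deferred_work(toy_arena_t *arena) {
    uint64_t best = TOY_DEFERRED_WORK_NONE;
    for (size_t i = 0; i < 2; i++) {
        uint64_t t = arena->pais[i]->time_until_deferred_work(arena->pais[i]);
        if (t < best) {
            best = t;
        }
    }
    return best;
}

/* The background thread sleeps until the earliest deadline of any arena. */
static uint64_t toy_background_thread_sleep_ns(toy_arena_t *arenas, size_t n) {
    uint64_t best = TOY_DEFERRED_WORK_NONE;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = toy_arena_time_until_deferred_work(&arenas[i]);
        if (t < best) {
            best = t;
        }
    }
    return best;
}
```

With this shape, allocator-specific timing logic stays behind the interface, and the background thread code reduces to taking a minimum.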
Qi Wang
8b24cb8fdf Don't assume initialized arena in the default alloc hook.
Specifically, this change allows the default alloc hook to be used during
arenas.create.  One use case is to invoke the default alloc hook in a customized
hook arena, i.e. the default hooks can be read out of a default arena, then
create customized ones based on these hooks.  Note that mixing the default with
customized hooks is not recommended, and should only be considered when the
customization is simple and straightforward.
2021-08-25 14:19:25 -07:00
Qi Wang
5884a076fb Rename prof.dump_prefix to prof.prefix
This better aligns with our naming convention.  The option has not been included
in any upstream release yet.
2021-08-12 23:04:29 -07:00
Alex Lapenkou
f58064b932 Verify that HPA is used before calling its functions
This change eliminates the possibility of PA calling functions of uninitialized
HPA.
2021-08-05 16:43:28 -07:00
David Goldblatt
27f71242b7 Mutex: Tweak internal spin count.
The recent pairing heap optimizations flattened the lock hold time profile.
This was a win for raw cycle counts, but ended up causing us to "just miss"
acquiring the mutex before sleeping more often.  Bump those counts.
2021-08-05 14:33:16 -07:00
David Goldblatt
6f41ba55ee Mutex: Make spin count configurable.
Don't document it since we don't want to support this as a "real" setting, but
it's handy for testing.
2021-08-05 10:13:53 -07:00
David Goldblatt
dcb7b83fac Eset: Cache summary information for heap edatas.
This lets us do a single array scan to find first fits, instead of taking a
cache miss per examined size class.
2021-08-02 15:02:49 -07:00
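A rough sketch of the summary-caching idea: keep, next to the per-size-class heaps, a flat array holding each heap's current minimum, so a first-fit search reads one contiguous array instead of touching every heap. The names and the single `uint64_t` key are assumptions made for illustration; the actual summary contents are not spelled out in the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NCLASSES 8
#define TOY_EMPTY UINT64_MAX /* marks a size class with no extents */

typedef struct {
    /* Cached summary: smallest key (e.g. an address or serial number)
     * in each size class heap, or TOY_EMPTY when that heap is empty. */
    uint64_t min_key[TOY_NCLASSES];
} toy_eset_t;

/* Returns the size class holding the best candidate among all classes
 * at least as large as min_class, or -1 if none fits. Only the summary
 * array is read; the heaps themselves are never dereferenced. */
static int toy_eset_first_fit(const toy_eset_t *eset, int min_class) {
    assert(min_class >= 0);
    int best = -1;
    for (int i = min_class; i < TOY_NCLASSES; i++) {
        if (eset->min_key[i] == TOY_EMPTY) {
            continue;
        }
        if (best < 0 || eset->min_key[i] < eset->min_key[best]) {
            best = i;
        }
    }
    return best;
}
```

The cost is that every heap insertion or removal must also refresh the matching `min_key` entry.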
David Goldblatt
252e0942d0 Eset: Pull per-pszind data into structs.
We currently have one for stats and one for the data.  The data struct is just a
wrapper around the edata_heap_t, but this will change shortly.
2021-08-02 15:02:49 -07:00
David Goldblatt
08a4cc0969 Pairing heap: inline functions instead of macros.
By force-inlining everything that would otherwise be a macro, we get the same
effect (it's not clear in the first place that this is actually a good idea, but
it avoids making any changes to the existing performance profile).

This makes the code more maintainable (in anticipation of subsequent changes),
as well as making performance profiles and debug info more readable (we get
"real" line numbers, instead of making everything point to the macro definition
of all associated functions).
2021-08-02 15:02:49 -07:00
David Goldblatt
92a1e38f52 edata_cache: Allow unbounded fast caching.
The edata_cache_small had a fill/flush heuristic.  In retrospect, this was a
premature optimization; more testing indicates that an unbounded cache is
effectively fine here, and moreover we spend a nontrivial amount of time doing
unnecessary filling/flushing.

As the HPA takes on a larger and larger fraction of all allocations, any
theoretical differences in allocation patterns should shrink.  The HPA is more
efficient with its metadata in general, so it still comes out ahead on metadata
usage anyways.
2021-07-26 15:14:37 -07:00
David Goldblatt
d93eef2f40 HPA: Introduce a redesigned hpa_central_t.
For now, this only handles allocating virtual address space to shards, with no
reuse.  This is a framework, though; it will change over time.
2021-07-23 21:59:59 -07:00
David Goldblatt
e09eac1d4e Remove hpa_central.
This is now dead code.
2021-07-23 21:59:59 -07:00
Alex Lapenkou
aaea4fd1e6 Add more documentation to decay.c
It took me a while to understand why some things are implemented the way they
are, so hopefully it will help future readers.
2021-07-22 23:19:09 -07:00
Alex Lapenkou
4b633b9a81 Clean up background thread sleep computation
Isolate the computation of the purge interval from the background thread logic
and move it into a more suitable file.
2021-07-22 23:19:09 -07:00
David Goldblatt
6630c59896 HPA: Hugification hysteresis.
We wait a while after deciding a huge extent should get hugified to see if it
gets purged before long.  This avoids hugifying extents that might shortly get
dehugified for purging.

Rename and use the hpa_dehugification_threshold option support code for this,
since it's now ignored.
2021-07-12 17:59:18 -07:00
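The hysteresis can be pictured as a two-step decision: record when an extent became a hugification candidate, and only hugify once it has survived a delay without being purged. The sketch below uses hypothetical names and a caller-supplied millisecond clock; it is not the HPA's actual bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool huge;             /* already backed by a huge page */
    bool hugify_pending;   /* a candidate, waiting out the delay */
    uint64_t requested_ms; /* time at which it became a candidate */
} toy_hpdata_t;

/* Mark the extent as a candidate; do not hugify yet. */
static void toy_hugify_request(toy_hpdata_t *hp, uint64_t now_ms) {
    if (!hp->huge && !hp->hugify_pending) {
        hp->hugify_pending = true;
        hp->requested_ms = now_ms;
    }
}

/* A purge in the meantime cancels the pending hugification. */
static void toy_purge(toy_hpdata_t *hp) {
    hp->hugify_pending = false;
}

/* Hugify only if the candidate has stayed pending for delay_ms. */
static bool toy_hugify_if_ready(toy_hpdata_t *hp, uint64_t now_ms,
                                uint64_t delay_ms) {
    assert(now_ms >= hp->requested_ms);
    if (!hp->hugify_pending || now_ms - hp->requested_ms < delay_ms) {
        return false;
    }
    hp->hugify_pending = false;
    hp->huge = true;
    return true;
}
```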
David Goldblatt
113938b6f4 HPA: Pull out a hooks type.
For now, this is a no-op change.  In a subsequent commit, it will be useful for
testing.
2021-07-12 17:59:18 -07:00
David Goldblatt
1d4a7666d5 HPA: Do deferred operations on background threads. 2021-07-12 17:59:18 -07:00
David Goldblatt
583284f2d9 Add HPA deferral functionality. 2021-07-12 17:59:18 -07:00
David Goldblatt
ace329d11b HPA batch dalloc: Just do one deferred work check.
We only need to do one check per batch dalloc, not one check per dalloc in the
batch.
2021-07-12 17:59:18 -07:00
David Goldblatt
47d8a7e6b0 psset: Purge empty slabs first.
These are particularly good candidates for purging (listed in the diff).
2021-07-12 17:59:18 -07:00
David Goldblatt
41fd56605e HPA: Purge across retained extents.
This lets us cut down on the number of expensive system calls we perform.
2021-07-12 17:59:18 -07:00
David Goldblatt
d202218e86 HPA: Fix typos with big performance implications.
This fixes two simple but significant typos in the HPA:
- The conf string parsing accidentally set a min value of PAGE for
  hpa_sec_batch_fill_extra; i.e. allocating 4096 extra pages every time we
  attempted to allocate a single page.  This puts us over the SEC flush limit,
  so we then immediately flush all but one of them (probably triggering
  purging).
- The HPA was using the default PAI batch alloc implementation, which meant it
  did not actually get any locking advantages.

This snuck by because I did all the performance testing without using the PAI
interface or config settings.  When I cleaned it up and put everything behind
nice interfaces, I only did correctness checks, and didn't try any performance
ones.
2021-06-24 16:26:55 -07:00
David Goldblatt
4452a4812f Add opt.experimental_infallible_new.
This allows a guarantee that operator new never throws.

Fix the .gitignore rules to include test/integration/cpp while we're here.
2021-06-24 12:22:51 -07:00
David Goldblatt
0689448b1e Travis: Unbreak the builds.
In the hopes of future-proofing as much as possible, jump to the latest
distribution Travis supports.
2021-06-24 07:40:28 -07:00
David Goldblatt
36c6bfb963 SEC: Allow arbitrarily many shards, cached sizes. 2021-05-22 08:17:41 -07:00
David Goldblatt
aea91b8c33 Clean up some minor data structure inconsistencies
Namely, unify the include guard styling with the majority of the project, and do
flat_bitmap -> fb, to match its naming convention.
2021-05-12 11:14:23 -07:00
David Goldblatt
1f688490e1 Stats: Fix a printing bug when hpa_dirty_mult = -1
Missed a layer of indirection.
2021-05-05 19:45:25 -07:00
David Goldblatt
4f7cb3a413 Sized deallocation: fix a typo.
dealloction -> deallocation.
2021-05-04 16:46:15 -07:00
Qi Wang
9b523c6c15 Refactor the locking in extent_recycle().
Hold the ecache lock across extent_recycle_extract() and extent_recycle_split(),
so that the extent_deactivate after split can avoid re-taking the ecache mutex.
2021-03-31 14:42:33 -07:00
Qi Wang
ce68f326b0 Avoid the release & re-acquire of the ecache locks around the merge hook. 2021-03-31 14:42:33 -07:00
Qi Wang
7dc77527ba Delete the mutex_pool module. 2021-03-29 17:19:53 -07:00
Qi Wang
03d95cba88 Remove the unnecessary arena_ind_set in base_alloc_edata().
All edata alloc sites are already followed with proper edata_init().
2021-03-29 17:19:53 -07:00
Qi Wang
3093d9455e Move the edata mergeability related functions to extent.h. 2021-03-29 17:19:53 -07:00
Qi Wang
7c964b0352 Add rtree_write_range(): writing the same content to multiple leaf elements.
Apply to emap_(de)register_interior which became noticeable in perf profiles.
2021-03-29 17:19:53 -07:00
Qi Wang
add636596a Stop checking head state in the merge hook.
Now that all merging goes through try_acquire_edata_neighbor, the mergeability
checks (including head state checking) are done before reaching the merge hook.
In other words, merge hook will never be called if the head state doesn't agree.
2021-03-29 17:19:53 -07:00
Qi Wang
49b7d7f0a4 Passing down the original edata on the expand path.
Instead of passing down the new_addr, pass down the active edata which allows us
to always use a neighbor-acquiring semantic.  In other words, this tells us both
the original edata and neighbor address.  With this change, only neighbors of a
"known" edata can be acquired, i.e. acquiring an edata based on an arbitrary
address isn't possible anymore.
2021-03-29 17:19:53 -07:00
Qi Wang
1784939688 Use rtree tracked states to protect edata outside of ecache locks.
This avoids the addr-based mutexes (i.e. the mutex_pool), and instead relies on
the metadata tracked in rtree leaf: the head state and extent_state.  Before
trying to access the neighbor edata (e.g. for coalescing), the states will be
verified first -- only neighbor edatas from the same arena and with the same
state will be accessed.
2021-03-29 17:19:53 -07:00
Qi Wang
4d8c22f9a5 Store edata->state in rtree leaf and make edata_t 128B aligned.
Verified that this doesn't result in any real increase of edata_t bytes
allocated.
2021-03-29 17:19:53 -07:00
Qi Wang
70d1541c5b Track extent is_head state in rtree leaf. 2021-03-29 17:19:53 -07:00
Qi Wang
862219e461 Add quiescence sync before deleting base during arena_destroy. 2021-03-29 17:19:53 -07:00
Qi Wang
61afb6a405 Fix locking on arena_i_destroy_ctl().
The ctl_mtx should be held to protect against concurrent arenas.create.
2021-03-22 23:18:52 -07:00
Qi Wang
3913077146 Mark head state during dss alloc.
Specifically, the extent_dalloc_gap relies on the correct head state to
coalesce.
2021-03-12 19:17:25 -08:00
Qi Wang
22be724af4 Set is_head in extent_alloc_wrapper w/ retain.
When retain is on and extent_grow_retained fails (e.g. due to split hook
failures), we'll try extent_alloc_wrapper as the last resort.  Set the is_head
bit in that case to be consistent.  The allocated extent in that case will be
retained properly, but not merged with other extents.
2021-03-12 10:20:08 -08:00
David Goldblatt
73ca4b8ef8 HPA: Use dirtiest-first purging.
This seems to be practically beneficial, despite some pathological corner cases.
2021-02-19 15:10:54 -08:00
David Goldblatt
0f6c420f83 HPA: Make purging/hugifying more principled.
Before this change, purge/hugify decisions had several sharp edges that could
lead to pathological behavior if tuning parameters weren't carefully chosen.
It's the first of a series; this introduces basic "make every hugepage with
dirty pages purgeable" functionality, and the next commit expands that
functionality to have a smarter policy for picking hugepages to purge.

Previously, the dehugify logic would *never* dehugify a hugepage unless it was
dirtier than the dehugification threshold.  This can lead to situations in which
these pages (which themselves could never be purged) would push us above the
maximum allowed dirty pages in the shard.  This forces immediate purging of any
pages deallocated in non-hugified hugepages, which in turn places nonobvious
practical limitations on the relationships between various config settings.

Instead, we make our preference not to dehugify to purge a soft one rather than
a hard one.  We'll avoid purging them, but only so long as we can do so by
purging non-hugified pages.  If we need to purge them to satisfy our dirty page
limits, or to hugify other, more worthy candidates, we'll still do so.
2021-02-19 15:10:54 -08:00
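The "soft preference" described above can be sketched as a selection function: purge from non-hugified hugepages while any of them has dirty pages, and fall back to hugified ones only when nothing else is left. The names are hypothetical, and picking the dirtiest page within each group is an assumption borrowed from the later dirtiest-first commit rather than something this commit states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool huge;     /* currently hugified */
    size_t ndirty; /* dirty pages inside this hugepage */
} toy_hugepage_t;

/* Returns the index of the hugepage to purge next, or -1 when no
 * hugepage has dirty pages. Hugified pages are chosen only if no
 * non-hugified page has anything to purge. */
static int toy_pick_purge(const toy_hugepage_t *hp, size_t n) {
    int best_nonhuge = -1;
    int best_huge = -1;
    for (size_t i = 0; i < n; i++) {
        if (hp[i].ndirty == 0) {
            continue;
        }
        int *best = hp[i].huge ? &best_huge : &best_nonhuge;
        if (*best < 0 || hp[i].ndirty > hp[*best].ndirty) {
            *best = (int)i;
        }
    }
    return best_nonhuge >= 0 ? best_nonhuge : best_huge;
}
```

Under the old hard rule the hugified page would never be returned here, so its dirty pages could keep the shard above its dirty limit indefinitely.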
David Goldblatt
6bddb92ad6 psset: Rename "bitmap" to "pageslab_bitmap".
It tracks pageslabs.  Soon, we'll have another bitmap (to track dirty pages)
that we want to disambiguate.

While we're here, fix an out-of-date comment.
2021-02-19 15:10:54 -08:00
David Goldblatt
154aa5fcc1 Use the flat bitmap for eset and psset bitmaps.
This is simpler (note that the eset field comment was actually incorrect!), and
slightly faster.
2021-02-19 15:10:54 -08:00
David Goldblatt
271a676dcd hpdata: early bailout for longest free range.
A number of common special cases allow us to stop iterating through an hpdata's
bitmap earlier rather than later.
2021-02-19 15:10:54 -08:00
David Goldblatt
d21d5b46b6 Edata: Move sn into its own field.
This lets the bins use a fragmentation avoidance policy that matches the HPA's
(without affecting the PAC).
2021-02-19 15:10:54 -08:00
David Goldblatt
fb327368db SEC: Expand option configurability.
This change pulls the SEC options into a struct, which simplifies their handling
across various modules (e.g. PA needs to forward on SEC options from the
malloc_conf string, but it doesn't really need to know their names).  While
we're here, make some of the fixed constants configurable, and unify naming from
the configuration options to the internals.
2021-02-19 15:10:54 -08:00
David Goldblatt
ce9386370a HPA: Implement batch allocation. 2021-02-19 15:10:54 -08:00
David Goldblatt
cdae6706a6 SEC: Use batch fills.
Currently, this doesn't help much, since no PAI implementation supports
flushing.  This will change in subsequent commits.
2021-02-19 15:10:54 -08:00
David Goldblatt
480f3b11cd Add a batch allocation interface to the PAI.
For now, no real allocator actually implements this interface; this will change
in subsequent diffs.
2021-02-19 15:10:54 -08:00
David Goldblatt
bf448d7a5a SEC: Reduce lock hold times.
Only flush a subset of extents during flushing, and drop the lock while doing
so.
2021-02-19 15:10:54 -08:00
David Goldblatt
1944ebbe7f HPA: Implement batch deallocation.
This saves O(n) mutex locks/unlocks during SEC flush.
2021-02-19 15:10:54 -08:00
David Goldblatt
f47b4c2cd8 PAI/SEC: Add a dalloc_batch function.
This lets the SEC flush all of its items in a single call, rather than flushing
them one at a time.
2021-02-19 15:10:54 -08:00
Qi Wang
a11be50332 Implement opt.cache_oblivious.
Keep config.cache_oblivious for now to remain backward-compatible.
2021-02-11 11:32:01 -08:00
Jordan Rome
8c5e5f50a2 Fix stats for "tcache_max" (was "lg_tcache_max")
This opt was changed here: c8209150f9
and looks like this got missed.

Also update the write type to be unsigned.
2021-02-10 23:01:46 -08:00
Qi Wang
041145c272 Report the correct and wrong sizes on sized dealloc bug detection. 2021-02-08 14:42:27 -08:00
Qi Wang
f3b2668b32 Report the offending pointer on sized dealloc bug detection. 2021-02-08 14:42:27 -08:00
David Goldblatt
edbfe6912c Inline malloc fastpath into operator new.
This saves a small but non-negligible amount of CPU in C++ programs.
2021-02-08 14:17:47 -08:00
David Goldblatt
79f81a3732 HPA: Make dirty_mult configurable. 2021-02-04 20:58:31 -08:00
David Goldblatt
32dd153796 HPA: Make dehugification threshold configurable. 2021-02-04 20:58:31 -08:00
David Goldblatt
4790db15ed HPA: make the hugification threshold configurable. 2021-02-04 20:58:31 -08:00
David Goldblatt
b3df80bc79 Pull HPA options into a containing struct.
Currently that just means max_alloc, but we're about to add more.  While we're
touching these lines anyways, tweak things to be more in line with testing.
2021-02-04 20:58:31 -08:00
David Goldblatt
56e85c0e47 HPA: Use a whole-shard purging heuristic.
Previously, we used only hpdata-local information to decide whether to purge.
2021-02-04 20:58:31 -08:00
David Goldblatt
dc886e5608 hpdata: Return the number of pages to be purged.
We'll use this in the next commit.
2021-02-04 20:58:31 -08:00
David Goldblatt
9fd9c876bb psset: keep aggregate stats.
This will let us query these stats quickly to make purging decisions.
2021-02-04 20:58:31 -08:00
David Goldblatt
da63f23e68 HPA: Track pending purges/hugifies in the psset.
This finishes the refactoring of the HPA/psset interactions the past few commits
have been building towards.

Rather than the HPA removing and then reinserting hpdatas, it simply begins
updates and ends them.  These updates can set flags on the hpdata that prevent
it from being returned for certain types of requests.  For example, it can call
hpdata_alloc_allowed_set(hpdata, false) during an update, at which point the
given hpdata will no longer be returned for psset_pick_alloc requests.

This has various benefits:
- It maintains stats correctness during purges and hugifies.
- It allows simpler and more explicit concurrency control for the various
  special cases (e.g. allocations are disallowed during purge, but not during
  hugify).
- It lets allocations and deallocations avoid disturbing the purging and
  hugification orderings.  If an hpdata "loses its place" in one of the queues
  just due to an alloc / dalloc, it can result in pathological edge cases where
  very hot, very full hugepages never get hugified (and cold extents on the
  same hugepage as hot ones never get purged).

The key benefit though is that tracking hpdatas to be purged / hugified in a
principled way will let us do delayed purging and hugification.  Eventually this
will let us move these operations to background threads, but in the short term
the benefit is that it will let us have global purging policies (e.g. purge when
the entire arena has too many dirty pages, rather than any particular hugepage).
2021-02-04 20:58:31 -08:00
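The begin/end update protocol described above might look roughly like this. Only the names `hpdata_alloc_allowed_set` and `psset_pick_alloc` come from the commit message; the `toy_` versions below, their signatures, and the linear scan are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool updating;      /* between update_begin and update_end */
    bool alloc_allowed; /* may be returned for allocation requests */
    size_t nfree_pages;
} toy_hpdata_t;

/* The hpdata stays in the set; it is only marked as being updated. */
static void toy_psset_update_begin(toy_hpdata_t *hp) {
    assert(!hp->updating);
    hp->updating = true;
}

static void toy_psset_update_end(toy_hpdata_t *hp) {
    assert(hp->updating);
    hp->updating = false;
}

/* Flags may only change inside an update. */
static void toy_hpdata_alloc_allowed_set(toy_hpdata_t *hp, bool allowed) {
    assert(hp->updating);
    hp->alloc_allowed = allowed;
}

/* Returns the first hpdata that can serve npages, skipping any that is
 * mid-update or has allocation disallowed; NULL if none qualifies. */
static toy_hpdata_t *toy_psset_pick_alloc(toy_hpdata_t *slabs, size_t n,
                                          size_t npages) {
    for (size_t i = 0; i < n; i++) {
        if (!slabs[i].updating && slabs[i].alloc_allowed &&
            slabs[i].nfree_pages >= npages) {
            return &slabs[i];
        }
    }
    return NULL;
}
```

Because nothing is removed and reinserted, aggregate statistics and queue positions can stay valid for the whole duration of a purge or hugify.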
David Goldblatt
0ea3d6307c CTL, Stats: report HPA empty slab stats. 2021-02-04 20:58:31 -08:00
David Goldblatt
bf64557ed6 Move empty slab tracking to the psset.
We're moving towards a world in which purging decisions are less rigidly
enforced at a single-hugepage level.  In that world, it makes sense to keep
around some hpdatas which are not completely purged, in which case we'll need to
track them.
2021-02-04 20:58:31 -08:00
David Goldblatt
99fc0717e6 psset: Reconceptualize insertion/removal.
Really, this isn't a functional change, just a naming change.  We start thinking
of pageslabs as being always in the psset.  What we used to think of as removal
is now thought of as being in the psset, but in the process of being updated
(and therefore, unavailable for serving new allocations).

This is in preparation of subsequent changes to support deferred purging;
allocations will still be in the psset for the purposes of choosing when to
purge, but not for purposes of allocation/deallocation.
2021-02-04 20:58:31 -08:00
David Goldblatt
061cabb712 HPA stats: report retained instead of inactive.
This more closely maps to the PAC.
2021-02-04 20:58:31 -08:00
David Goldblatt
d3e5ea03c5 HPA: Track dirty stats. 2021-02-04 20:58:31 -08:00
David Goldblatt
68a1666e91 hpdata: Rename "dirty" to "touched".
This matches the usage in the rest of the codebase.
2021-02-04 20:58:31 -08:00
David Goldblatt
be0d7a53f3 HPA: Don't track inactive pages.
This is really only useful for human consumption.  Correspondingly, emit it only
in the human-readable stats, and let everybody else compute from the hugepage
size and nactive.
2021-02-04 20:58:31 -08:00
David Goldblatt
55e0f60ca1 psset stats: Simplify handling.
We can treat the huge and nonhuge cases uniformly using huge state as an array
index.
2021-02-04 20:58:31 -08:00
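The simplification amounts to replacing a pair of separately named fields with a two-element array indexed by the huge flag. A minimal sketch, with hypothetical struct and field names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t npageslabs;
    size_t nactive;
} toy_bin_stats_t;

typedef struct {
    /* Index 0 holds the non-huge stats, index 1 the huge stats. */
    toy_bin_stats_t slabs[2];
} toy_psset_stats_t;

/* One code path serves both cases: the bool converts to 0 or 1 and
 * selects the matching entry, so no if/else on the huge state is needed. */
static void toy_stats_add(toy_psset_stats_t *stats, bool huge,
                          size_t nactive) {
    assert(stats != NULL);
    toy_bin_stats_t *bin = &stats->slabs[(size_t)huge];
    bin->npageslabs++;
    bin->nactive += nactive;
}
```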
David Goldblatt
94cd9444c5 HPA: Some minor reformattings. 2021-02-04 20:58:31 -08:00
David Goldblatt
b25ee5d88e HPA: Add purge stats. 2021-02-04 20:58:31 -08:00
David Goldblatt
746ea3de6f HPA stats: Allow some derived stats.
However, we put them in their own struct, to avoid the messiness that the arena
has (mixing derived and non-derived stats in the arena_stats_t).
2021-02-04 20:58:31 -08:00
David Goldblatt
30b9e8162b HPA: Generalize purging.
Previously, we would purge a hugepage only when it's completely empty.  With
this change, we can purge even when only partially empty.  Although the
heuristic here is still fairly primitive, this infrastructure can scale to
become more advanced.
2021-02-04 20:58:31 -08:00
David Goldblatt
70692cfb13 hpdata: Add state changing helpers.
We're about to allow hugepage subextent purging; get as much of our metadata
handling ready as possible.
2021-02-04 20:58:31 -08:00
David Goldblatt
2ae966222f hpdata: track per-page dirty state. 2021-02-04 20:58:31 -08:00
David Goldblatt
ff4086aa6b hpdata: count active pages instead of free ones.
This will be more consistent with later naming choices.
2021-02-04 20:58:31 -08:00
David Goldblatt
20140629b4 Bin: Move stats closer to the mutex.
This is a slight cache locality optimization.
2021-02-04 14:10:43 -08:00
David Goldblatt
c259323ab3 Use ticker_geom_t for arena tcache decay. 2021-02-04 14:10:43 -08:00
David Goldblatt
8edfc5b170 Add ticker_geom_t.
This lets a single ticker object drive events across a large number of different
tick streams while sharing state.
2021-02-04 14:10:43 -08:00