This change allows every allocator conforming to PAI communicate that it
deferred some work for the future. Without it if a background thread goes into
indefinite sleep, there is no way to notify it about upcoming deferred work.
Previously the calculation of sleep time between wakeups was implemented within
background_thread. This resulted in some parts of decay and hpa specific
logic mixing with background thread implementation. In this change, background
thread delegates this calculation to arena and it, in turn, delegates it to PAI.
The next step is to implement the actual calculation of time until deferred work
in HPA.
Prior to the change you could specify --enable-prof-libunwind without
--enable-prof which would do effectively nothing. This was confusing as I
expected --enable-prof-libunwind to act like --enable-prof, but use libunwind.
There is a race between the doc generation and the doc installation,
so make the install depend on the build for doc.
Signed-off-by: Mingli Yu <mingli.yu@windriver.com>
Specifically, this change allows the default alloc hook to used during
arenas.create. One use case is to invoke the default alloc hook in a customized
hook arena, i.e. the default hooks can be read out of a default arena, then
create customized ones based on these hooks. Note that mixing the default with
customized hooks is not recommended, and should only be considered when the
customization is simple and straightforward.
Retained pages are those which haven't been touched and are unbacked from OS
perspective. For a pageslab their number should equal "total pages in slab"
minus "touched pages".
When clang sees an unknown warning option, unlike gcc it doesn't fail the build
with error. It issues a warning. Hence JE_CFLAGS_ADD with warning options that
didnt't exist in clang would still mark those options as available. This led to
several warnings when built with clang or "gcc" on OSX. This change fixes those
warnings by simply making clang fail builds with non-existent warning options.
The recent pairing heap optimizations flattened the lock hold time profile.
This was a win for raw cycle counts, but ended up causing us to "just miss"
acquiring the mutex before sleeping more often. Bump those counts.
By force-inlining everything that would otherwise be a macro, we get the same
effect (it's not clear in the first place that this is actually a good idea, but
it avoids making any changes to the existing performance profile).
This makes the code more maintainable (in anticipation of subsequent changes),
as well as making performance profiles and debug info more readable (we get
"real" line numbers, instead of making everything point to the macro definition
of all associated functions).
The edata_cache_small had a fill/flush heuristic. In retrospect, this was a
premature optimization; more testing indicates that an unbounded cache is
effectively fine here, and moreover we spend a nontrivial amount of time doing
unnecessary filling/flushing.
As the HPA takes on a larger and larger fraction of all allocations, any
theoretical differences in allocation patterns should shrink. The HPA is more
efficient with its metadata in general, so it still comes out ahead on metadata
usage anyways.
We wait a while after deciding a huge extent should get hugified to see if it
gets purged before long. This avoids hugifying extents that might shortly get
dehugified for purging.
Rename and use the hpa_dehugification_threshold option support code for this,
since it's now ignored.
On OS X, "gcc" is really just clang anyways, so this combination gets tested by
the gcc test. This is purely redundant, and (since it runs early in the output)
increases time to signal for real breakages further down in the list.
This fixes two simple but significant typos in the HPA:
- The conf string parsing accidentally set a min value of PAGE for
hpa_sec_batch_fill_extra; i.e. allocating 4096 extra pages every time we
attempted to allocate a single page. This puts us over the SEC flush limit,
so we then immediately flush all but one of them (probably triggering
purging).
- The HPA was using the default PAI batch alloc implementation, which meant it
did not actually get any locking advantages.
This snuck by because I did all the performance testing without using the PAI
interface or config settings. When I cleaned it up and put everything behind
nice interfaces, I only did correctness checks, and didn't try any performance
ones.
This is a simple multi-producer, single-consumer queue. The intended use case
is in the HPA, as we begin supporting hpdatas that move between hpa_shards. We
take just a single CAS as the cost to send a message (or a batch of messages) in
the low-contention case, and lock-freedom lets us avoid some lock-ordering
issues.