Validate that small allocations (i.e. those with `size <= SC_SMALL_MAXCLASS`)
which are sampled for profiling maintain the expected invariants even
though they now take up less space.
With `--with-jemalloc-prefix=`, and without `-fno-builtin` or `-O1`, both clang and gcc may optimize out
`malloc` calls whose result is unused; comparing the result to NULL does not necessarily count as a use either.
This won't be a problem in most client programs, since it only affects truly unused pointers, but in
tests it's important that allocations actually execute.
`-fno-builtin` should disable this optimization for both gcc and clang, and applying it only to test code should hopefully not be an issue.
Another alternative is to force a "use" of the result, but that would require more changes and might miss other optimization-related issues.
This should resolve https://github.com/jemalloc/jemalloc/issues/2091
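For illustration, a minimal stand-alone program (not taken from the test suite) in which the whole `malloc` call can be elided under optimization:

```c
/* Built with e.g. `cc -O2 demo.c`, both gcc and clang may treat malloc as a
 * builtin and delete the call entirely, because the pointer is never truly
 * used; `cc -O2 -fno-builtin demo.c` keeps the call. */
#include <stdlib.h>

int
main(void) {
	void *p = malloc(16);
	/* A NULL comparison alone does not necessarily keep the call alive. */
	if (p == NULL) {
		return 1;
	}
	free(p);
	return 0;
}
```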
On deallocation, sampled pointers (specially aligned) get junked and stashed
into tcache (to prevent immediate reuse). The expected behavior is that a
read-after-free sees corrupted (junk-filled) data and is stopped there, while
write-after-free is detected when the stashed pointers are flushed.
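A conceptual sketch of the check described above; the function names and the junk byte value are illustrative, not jemalloc internals:

```c
/* On free, fill the object with a junk byte and stash it instead of making it
 * immediately reusable; on flush, verify the junk pattern is intact, so any
 * write-after-free in between is detected. */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define JUNK_BYTE 0x5a

static void
stash_junked(void *ptr, size_t usize) {
	memset(ptr, JUNK_BYTE, usize);
	/* ... push ptr onto the tcache stash here ... */
}

static bool
stashed_ptr_corrupted(const void *ptr, size_t usize) {
	const unsigned char *p = ptr;
	for (size_t i = 0; i < usize; i++) {
		if (p[i] != JUNK_BYTE) {
			return true;	/* write-after-free detected */
		}
	}
	return false;
}
```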
Add guarded extents: regular extents surrounded by guard pages
(`mprotect`-ed). To reduce syscalls, small guarded extents are cached as a
separate eset in ecache, and decay through the dirty / muzzy / retained pipeline
as usual.
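A minimal sketch of the guard-page idea itself, using plain `mmap`/`mprotect` rather than the jemalloc implementation; the helper name is made up, `usable_size` is assumed to be page-aligned, and error handling is mostly omitted:

```c
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS on some libcs */
#include <stddef.h>
#include <sys/mman.h>

/* Reserve two extra pages around the usable range and make them
 * inaccessible, so any access past either end faults immediately. */
static void *
alloc_guarded(size_t usable_size, size_t page_size) {
	size_t total = usable_size + 2 * page_size;
	char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED) {
		return NULL;
	}
	/* Leading and trailing guard pages. */
	mprotect(base, page_size, PROT_NONE);
	mprotect(base + page_size + usable_size, page_size, PROT_NONE);
	return base + page_size;
}
```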
Existing backtrace implementations skip native stack frames from runtimes like
Python. The hook allows augmenting the backtraces to attribute allocations to
native functions in heap profiles.
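A purely hypothetical sketch of what such a hook can look like; the type name, signature, and registration mechanism here are assumptions for illustration, not the interface added by this change:

```c
/* The idea: a runtime embedder supplies its own frame collector so heap
 * profiles attribute allocations to meaningful frames instead of the skipped
 * native ones. Hook shape is assumed for illustration only. */
typedef void (*backtrace_hook_t)(void **frames, unsigned *len, unsigned max_len);

static void
my_runtime_backtrace(void **frames, unsigned *len, unsigned max_len) {
	/* A real implementation would walk the runtime's own call stack and
	 * write up to max_len entries into frames; here we record nothing. */
	(void)frames;
	(void)max_len;
	*len = 0;
}
```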
There is a race between the doc generation and the doc installation,
so make the doc install target depend on the doc build target.
Signed-off-by: Mingli Yu <mingli.yu@windriver.com>
This is a simple multi-producer, single-consumer queue. The intended use case
is in the HPA, as we begin supporting hpdatas that move between hpa_shards. We
take just a single CAS as the cost to send a message (or a batch of messages) in
the low-contention case, and lock-freedom lets us avoid some lock-ordering
issues.
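A conceptual sketch of the single-CAS send and batched receive, in standard C11 atomics rather than the jemalloc code; a pre-linked batch of nodes can be pushed with the same single CAS by pointing the last node's `next` at the observed head:

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct msg_s msg_t;
struct msg_s {
	msg_t *next;
};

typedef struct {
	_Atomic(msg_t *) head;
} mpsc_queue_t;

/* Producers prepend onto the atomic list head with one CAS. */
static void
mpsc_push(mpsc_queue_t *q, msg_t *m) {
	msg_t *cur = atomic_load_explicit(&q->head, memory_order_relaxed);
	do {
		m->next = cur;
	} while (!atomic_compare_exchange_weak_explicit(&q->head, &cur, m,
	    memory_order_release, memory_order_relaxed));
}

/* The single consumer detaches the whole batch at once (most recent first). */
static msg_t *
mpsc_pop_all(mpsc_queue_t *q) {
	return atomic_exchange_explicit(&q->head, NULL, memory_order_acquire);
}
```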
Using an edata_t both for hugepages and the allocations within those hugepages
was convenient at first, but has outlived its usefulness. Representing
hugepages explicitly, with their own data structure, will make future
development easier.
This was promised in the review of the introduction of geom_grow, but would have
been painful to do there because of the series that introduced it. Now that
those are committed, renaming is easier.
Previously, all the small size classes were cached. However, this has downsides
-- particularly when the page size is greater than 4K (e.g. iOS), which results
in a much higher SMALL_MAXCLASS.
This change allows tcache_max to be set to lower values, to better control
resources taken by tcache.
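A minimal sketch of how a lower cap might be requested at startup, assuming the limit is exposed to `malloc_conf` under the name `tcache_max` and that the build is unprefixed (check the relevant version's man page):

```c
/* jemalloc reads this global (or the MALLOC_CONF environment variable) at
 * startup; here the tcache is capped at size classes of at most 4 KiB. */
const char *malloc_conf = "tcache_max:4096";
```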
This will be the centralized component of the coming hugepage allocator: the
source of larger chunks of memory from which smaller ones can be obtained.
This introduces a new sort of edata_t: a pageslab, and a set to manage them.
This is part of a series of commits to implement a hugepage allocator; the
pageset will be per-arena, and will track small page allocation requests within a
larger extent allocated from a centralized hugepage allocator.
The existing checks are good at finding such size-mismatch issues (on tcache flush), but not
so good at pinpointing them. Debug mode can find them, but sometimes debug mode
slows down a program so much that hard-to-hit bugs can take a long time to
crash.
This commit adds functionality to keep programs mostly on their fast paths,
while also checking every sized delete argument they get.
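An illustration of the bug class being targeted, using the public `sdallocx()` API and assuming an unprefixed build with headers installed as `<jemalloc/jemalloc.h>`; the caller shown is correct, with the mismatch case noted in a comment:

```c
#include <stdlib.h>
#include <jemalloc/jemalloc.h>

int
main(void) {
	void *p = malloc(100);
	if (p == NULL) {
		return 1;
	}
	/* Correct: the size matches the original request. Passing a smaller
	 * value, e.g. sdallocx(p, 50, 0), is exactly the kind of sized-delete
	 * mismatch the fast-path check described above aims to catch cheaply. */
	sdallocx(p, 100, 0);
	return 0;
}
```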