Commit Graph

607 Commits

Author SHA1 Message Date
Qi Wang
d131331310 Avoid eager purging on the dedicated oversize arena when using bg thds.
We have observed new workload patterns (namely ML training type) that cycle
through oversized allocations frequently, because 1) the dataset might be sparse
which is faster to go through, and 2) GPU accelerated.  As a result, the eager
purging from the oversize arena becomes a bottleneck.  To offer an easy
solution, allow normal purging of the oversized extents when background threads
are enabled.
2023-06-27 11:57:41 -07:00
Qi Wang
d4a2b8bab1 Add the prof_sys_thread_name feature in the prof_recent unit test.
This tests the combination of the prof_recent and thread_name features.
Verified that it catches the issue being fixed in this PR.

Also explicitly set thread name in test/unit/prof_recent.  This fixes the name
testing when no default thread name is set (e.g. FreeBSD).
2023-05-11 09:10:57 -07:00
Kevin Svetlitski
70344a2d38 Make eligible functions static
The codebase is already very disciplined about marking as `static` any
function that can be, but a few appear to have slipped through the
cracks.
2023-05-08 15:00:02 -07:00
Qi Wang
434a68e221 Disallow decay during reentrancy.
Decay should not be triggered during reentrant calls (may cause lock order
reversal / deadlocks).  Added a delay_trigger flag to the tickers to bypass
decay when reentrancy_level is not zero.
2023-04-05 10:16:37 -07:00
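The delay_trigger behavior described in 434a68e221 can be illustrated with a minimal sketch. The type and function names below (`sketch_ticker_t`, `sketch_ticker_tick`) are hypothetical and are not taken from the jemalloc sources; the sketch only assumes that an expired ticker is held back while the flag is set and fires on the next tick made without it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ticker that fires once every `nticks` calls. */
typedef struct {
	int32_t tick;   /* ticks remaining before the next trigger */
	int32_t nticks; /* reload value applied after a trigger */
} sketch_ticker_t;

static void
sketch_ticker_init(sketch_ticker_t *t, int32_t nticks) {
	assert(nticks > 0);
	t->tick = nticks;
	t->nticks = nticks;
}

/*
 * Returns true when the ticker fires.  delay_trigger stands in for
 * "reentrancy_level != 0": an expired ticker is held at zero instead of
 * firing, so it fires on the first later tick with delay_trigger false.
 */
static bool
sketch_ticker_tick(sketch_ticker_t *t, bool delay_trigger) {
	if (t->tick > 0) {
		t->tick--;
	}
	if (t->tick > 0) {
		return false; /* not yet expired */
	}
	if (delay_trigger) {
		return false; /* expired, but triggering is deferred */
	}
	t->tick = t->nticks;
	return true;
}
```

In this sketch a deferred trigger is not lost: the caller simply sees it on the first non-reentrant tick.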
Qi Wang
ce0b7ab6c8 Inline the storage for thread name in prof_tdata_t.
The previous approach managed the thread name in a separate buffer, which causes
races because the thread name update (triggered by new samples) can happen at
the same time as prof dumping (which reads the thread names) -- these two
operations are under separate locks to avoid blocking each other.  Implemented
the thread name storage as part of the tdata struct, which resolves the lifetime
issue and also avoids internal alloc / dalloc during prof_sample.
2023-04-05 10:03:12 -07:00
Qi Wang
6cab460a45 Add a multithreaded test for prof_sys_thread_name.
Verified that this catches the issue being fixed in 5fd5583.
2023-04-05 10:03:12 -07:00
Qi Wang
8b64be3441 Explicit arena assignment in test_tcache_max.
Otherwise the associated arena could change with percpu arena enabled.
2023-03-22 15:16:43 -07:00
Qi Wang
8e7353a19b Explicit arena assignment in test_thread_idle.
Otherwise the associated arena could change with percpu arena enabled.
2023-03-22 15:16:43 -07:00
Qi Wang
71bc1a3d91 Avoid assuming the arena id in test when percpu_arena is used. 2023-03-13 10:50:10 -07:00
Qi Wang
97b313c7d4 More conservative setting for /test/unit/background_thread_enable.
Lower the thread and arena count to avoid resource exhaustion on 32-bit.
2023-02-16 14:42:21 -08:00
Qi Wang
8580c65f81 Implement prof sample hooks "experimental.hooks.prof_sample(_free)".
The added hooks hooks.prof_sample and hooks.prof_sample_free are intended to
allow advanced users to track additional information, to enable new ways of
profiling on top of the jemalloc heap profile and sample features.

The sample hook is invoked after the allocation and backtracing, and forwards
both the allocation and backtrace to the user hook; the sample_free hook
happens before the actual deallocation, and forwards only the ptr and usz to the
hook.
2022-12-07 16:06:49 -08:00
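The hook ordering described in 8580c65f81 (sample hook after allocation and backtracing, sample_free hook before deallocation) can be sketched without jemalloc. Everything below is a stand-in: the hook typedefs, the `sketch_` functions, and the byte-counting hooks are hypothetical and only model the argument flow stated in the commit message, not the real mallctl interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook types modeled on the commit text. */
typedef void (*sketch_sample_hook_t)(const void *ptr, size_t size,
    void **backtrace, unsigned backtrace_len);
typedef void (*sketch_sample_free_hook_t)(const void *ptr, size_t usize);

static sketch_sample_hook_t sample_hook = NULL;
static sketch_sample_free_hook_t sample_free_hook = NULL;

/* Allocate first, then forward the allocation and backtrace to the hook. */
static void *
sketch_sampled_alloc(size_t size, void **backtrace, unsigned backtrace_len) {
	void *ptr = malloc(size);
	if (ptr != NULL && sample_hook != NULL) {
		sample_hook(ptr, size, backtrace, backtrace_len);
	}
	return ptr;
}

/* Forward ptr and usable size to the hook before releasing the memory. */
static void
sketch_sampled_free(void *ptr, size_t usize) {
	if (ptr != NULL && sample_free_hook != NULL) {
		sample_free_hook(ptr, usize);
	}
	free(ptr);
}

/* Example user hooks: keep a running total of live sampled bytes. */
static size_t live_bytes = 0;

static void
count_alloc(const void *ptr, size_t size, void **bt, unsigned bt_len) {
	(void)ptr; (void)bt; (void)bt_len;
	live_bytes += size;
}

static void
count_free(const void *ptr, size_t usize) {
	(void)ptr;
	live_bytes -= usize;
}
```

A user would install both hooks together so that every sampled allocation counted on the way in is also counted on the way out.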
Qi Wang
143e9c4a2f Enable fast thread locals for dealloc-only threads.
Previously, if a thread did only deallocations, it stayed on the slow path /
minimal initialized state forever.  However, dealloc-only is a valid pattern for
dedicated reclamation threads -- this means thread cache is disabled (no batched
flush) for them, which causes high overhead and contention.

Added the condition to fully initialize TSD when a fair amount of dealloc
activities are observed.
2022-10-25 09:54:38 -07:00
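The promotion condition in 143e9c4a2f can be pictured as a simple counter. This is a sketch under assumptions: the struct, the function name, and the threshold of 128 are all hypothetical, since the commit message only says "a fair amount of dealloc activities".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_DEALLOC_THRESHOLD 128 /* assumed value, not from jemalloc */

/* Hypothetical per-thread state. */
typedef struct {
	bool fully_initialized;  /* true once the thread cache is usable */
	uint32_t slow_deallocs;  /* deallocations seen in the minimal state */
} sketch_tsd_t;

/*
 * Called on every deallocation.  Returns true if this call can use the
 * fast (cached) path.  A thread in the minimal state is promoted once it
 * has performed SKETCH_DEALLOC_THRESHOLD slow deallocations.
 */
static bool
sketch_on_dealloc(sketch_tsd_t *tsd) {
	if (tsd->fully_initialized) {
		return true;
	}
	if (++tsd->slow_deallocs >= SKETCH_DEALLOC_THRESHOLD) {
		tsd->fully_initialized = true; /* later calls take the fast path */
	}
	return false;
}
```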
Guangli Dai
ba19d2cb78 Add arena-level name.
An arena-level name can help identify manual arenas.
2022-09-16 15:04:59 -07:00
Guangli Dai
a0734fd6ee Making jemalloc max stack depth a runtime option 2022-09-12 13:56:22 -07:00
Guangli Dai
42daa1ac44 Add double free detection using slab bitmap for debug build
Add a sanity check for double frees in the arena, in case the tcache has been flushed.
2022-09-06 12:54:21 -07:00
Ivan Zaitsev
36366f3c4c Add double free detection in thread cache for debug build
Add new runtime option `debug_double_free_max_scan` that specifies the max
number of stack entries to scan in the cache bin when trying to detect the
double free bug (currently debug build only).
2022-08-04 16:58:22 -07:00
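The bounded scan that 36366f3c4c describes can be sketched as follows. The bin layout, capacity, and function names are hypothetical; the only idea carried over from the commit message is that at most a configured number of the most recently cached entries are compared against the pointer being freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BIN_CAP 8 /* assumed capacity for illustration */

/* Hypothetical cache bin: a stack of cached pointers, newest on top. */
typedef struct {
	void *stack[SKETCH_BIN_CAP];
	size_t ncached;
} sketch_bin_t;

/* Scan at most max_scan entries, starting from the top of the stack. */
static bool
sketch_is_double_free(const sketch_bin_t *bin, const void *ptr,
    size_t max_scan) {
	size_t scanned = 0;
	for (size_t i = bin->ncached; i > 0 && scanned < max_scan;
	    i--, scanned++) {
		if (bin->stack[i - 1] == ptr) {
			return true;
		}
	}
	return false;
}

/* Cache ptr; returns false if a double free was detected instead. */
static bool
sketch_bin_free(sketch_bin_t *bin, void *ptr, size_t max_scan) {
	if (sketch_is_double_free(bin, ptr, max_scan)) {
		return false;
	}
	assert(bin->ncached < SKETCH_BIN_CAP);
	bin->stack[bin->ncached++] = ptr;
	return true;
}
```

The scan limit trades detection coverage for cost: a pointer buried deeper than `max_scan` entries goes unnoticed in this sketch.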
Alex Lapenkou
5b1f2cc5d7 Implement pvalloc replacement
Despite being an obsolete function, pvalloc is still present in GLIBC and should
work correctly when jemalloc replaces libc allocator.
2022-05-18 17:01:09 -07:00
Qi Wang
66c889500a Make test/unit/background_thread_enable more conservative.
To avoid resource exhaustion on 32-bit platforms.
2022-05-04 15:32:57 -07:00
cuishuang
9a242f16d9 fix some typos
Signed-off-by: cuishuang <imcusg@gmail.com>
2022-04-25 11:29:00 -07:00
Qi Wang
0e29ad4efa Rename zero_realloc option "strict" to "alloc".
With realloc(ptr, 0) being UB per C23, the option name "strict" makes less sense
now.  Rename to "alloc" which describes the behavior.
2022-04-20 10:27:25 -07:00
Charles
eaaa368bab Add comments and use meaningful vars in sz_psz2ind. 2022-03-24 16:56:59 -07:00
Alex Lapenkou
52631c90f6 Fix size class calculation for sec
Due to a bug in sec initialization, the number of cached size classes
was equal to 198. The bug caused the creation of more than a hundred
unused bins, although it didn't affect the caching logic.
2022-03-22 17:45:55 -07:00
Qi Wang
20f9802e4f Avoid overflow warnings in test/unit/safety_check. 2022-01-27 10:29:54 -08:00
yunxu
b798fabdf7 Add prof_leak_error option
The option makes the process exit with error code 1 if a memory leak
is detected. This is useful for implementing automated tools that rely
on leak detection.
2022-01-21 16:24:20 -08:00
Qi Wang
648b3b9f76 Lower the num_threads in the stress test of test/unit/prof_recent
This takes a fair amount of resources.  Under high concurrency it was causing
resource exhaustion such as pthread_create and mmap failures.
2022-01-11 16:58:56 -08:00
Qi Wang
6230cc88b6 Add background thread sleep retry in test/unit/hpa_background_thread
Under high concurrency / heavy test load (e.g. using run_tests.sh), the
background thread may not get scheduled for a longer period of time.  Retry 100
times max before bailing out.
2022-01-07 10:28:28 -08:00
Qi Wang
d660683d3d Fix test config of lg_san_uaf_align.
The option may be configure-disabled, which resulted in the invalid options
output from the tests.
2022-01-04 11:03:51 -08:00
Qi Wang
dfdd7562f5 Rename san_enabled() to san_guard_enabled(). 2021-12-29 14:44:43 -08:00
Qi Wang
e491cef9ab Add stats for stashed bytes in tcache. 2021-12-29 14:44:43 -08:00
Qi Wang
b75822bc6e Implement use-after-free detection using junk and stash.
On deallocation, sampled pointers (specially aligned) get junked and stashed
into tcache (to prevent immediate reuse).  The expected behavior is to have
read-after-free corrupted and stopped by the junk-filling, while
write-after-free is checked when flushing the stashed pointers.
2021-12-29 14:44:43 -08:00
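The junk-and-stash idea in b75822bc6e can be reduced to two small helpers. This is a sketch under assumptions: the junk byte value and the function names are hypothetical, and the stash itself (the tcache holding the freed pointers) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_JUNK 0x5b /* assumed junk byte, chosen for illustration */

/* On free: overwrite the region so that stale reads see junk. */
static void
sketch_junk_on_free(unsigned char *ptr, size_t size) {
	memset(ptr, SKETCH_JUNK, size);
}

/*
 * On flush of the stash: any byte that is no longer junk means something
 * wrote to the region after it was freed.
 */
static bool
sketch_junk_intact(const unsigned char *ptr, size_t size) {
	for (size_t i = 0; i < size; i++) {
		if (ptr[i] != SKETCH_JUNK) {
			return false;
		}
	}
	return true;
}
```

Read-after-free is not detected directly here; it is made visible because the reader gets the junk pattern rather than plausible data.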
Qi Wang
d038160f3b Fix shadowed variable usage.
Verified with EXTRA_CFLAGS=-Wshadow.
2021-12-23 10:55:08 -08:00
Qi Wang
bd70d8fc0f Make the profiling settings for tests explicit.
Many profiling related tests make assumptions on the profiling settings,
e.g. opt_prof is off by default, and prof_active is default on when opt_prof is
on.  However the default settings can be changed via --with-malloc-conf at build
time.  Fixing the tests by adding the assumed settings explicitly.
2021-12-22 20:10:28 -08:00
Qi Wang
837b37c4ce Fix the time-since computation in HPA.
nstime module guarantees monotonic clock update within a single nstime_t.  This
means, if two separate nstime_t variables are read and updated separately,
nstime_subtract between them may result in underflow.  Fixed by switching to the
time since utility provided by nstime.
2021-12-21 23:37:22 -08:00
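The underflow described in 837b37c4ce, and the "time since" fix, can be reproduced in a few lines. The types and functions below are hypothetical stand-ins for the nstime utilities, with the raw clock reading passed in as a parameter so the behavior is deterministic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timestamp in nanoseconds. */
typedef struct {
	uint64_t ns;
} sketch_time_t;

/* Monotonic update: t never moves backwards, even if the raw clock does. */
static void
sketch_time_update(sketch_time_t *t, uint64_t raw_clock_ns) {
	if (raw_clock_ns > t->ns) {
		t->ns = raw_clock_ns;
	}
}

/* Unsafe across independently updated values: wraps if now < past. */
static uint64_t
sketch_subtract(const sketch_time_t *now, const sketch_time_t *past) {
	return now->ns - past->ns;
}

/* Safe: "now" starts as a copy of past, so the result cannot underflow. */
static uint64_t
sketch_ns_since(const sketch_time_t *past, uint64_t raw_clock_ns) {
	sketch_time_t now = *past;
	sketch_time_update(&now, raw_clock_ns);
	return now.ns - past->ns;
}
```

The monotonic guarantee only holds within one timestamp variable, which is why the safe version derives "now" from the earlier value instead of reading a fresh one.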
Qi Wang
310af725b0 Add nstime_ns_since which obtains the duration since the input time. 2021-12-21 23:37:22 -08:00
mweisgut
bb5052ce90 Fix base_ehooks_get_for_metadata 2021-12-20 15:37:53 -08:00
Alex Lapenkou
800ce49c19 San: Bump alloc frequently reused guarded allocations
To utilize a separate retained area for guarded extents, use bump alloc
to allocate those extents.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
f56f5b9930 Pass 'frequent_reuse' hint to PAI
Currently used only for guarding purposes, the hint is used to determine
if the allocation is supposed to be frequently reused. For example, it
might urge the allocator to ensure the allocation is cached.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
2c70e8d351 Rename 'arena_decay' to 'arena_util'
While initially this file contained helper functions for one particular
test, its usage has since spread across different test files. Its purpose has
shifted towards a collection of handy arena ctl wrappers.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
0f6da1257d San: Implement bump alloc
The new allocator will be used to allocate guarded extents used as slabs
for guarded small allocations.
2021-12-15 10:39:17 -08:00
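For readers unfamiliar with the term used in 0f6da1257d, a bump allocator hands out memory by advancing a cursor through a reserved region. The sketch below is generic and is not the jemalloc implementation; all names are hypothetical, and it omits the guard pages and retained-area handling that the commit refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator over a caller-provided region. */
typedef struct {
	uintptr_t cur; /* next free address */
	uintptr_t end; /* one past the end of the region */
} sketch_bump_t;

static void
sketch_bump_init(sketch_bump_t *b, void *base, size_t size) {
	b->cur = (uintptr_t)base;
	b->end = b->cur + size;
}

/*
 * Returns size bytes aligned to align (a power of two), or NULL when the
 * region is exhausted.  Individual allocations are never freed.
 */
static void *
sketch_bump_alloc(sketch_bump_t *b, size_t size, size_t align) {
	assert(align != 0 && (align & (align - 1)) == 0);
	uintptr_t start = (b->cur + (align - 1)) & ~(uintptr_t)(align - 1);
	if (start < b->cur || start > b->end || size > b->end - start) {
		return NULL;
	}
	b->cur = start + size;
	return (void *)start;
}
```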
Alex Lapenkou
34b00f8969 San: Avoid running san tests with prof enabled
With prof enabled, the number of page-aligned allocations doesn't match the
number of slab "ends" because prof allocations skew the addresses. This
leads to overflow of the 'pages' array and hard-to-debug failures.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
62f9c54d2a San: Rename 'guard' to 'san'
This prepares the foundation for more sanitizer-related work in the
future.
2021-12-15 10:39:17 -08:00
Qi Wang
400c59895a Fix uninitialized nstime reading / updating on the stack in hpa.
In order for nstime_update to handle non-monotonic clocks, it requires the input
nstime to be initialized -- when reading for the first time, zero init has to be
done.  Otherwise random stack value may be seen as clocks and returned.
2021-11-16 16:54:12 -08:00
Qi Wang
4d56aaeca5 Optimize away the tsd_fast() check on free fastpath.
To ensure that the free fastpath can tolerate uninitialized tsd, improved the
static initializer for rtree_ctx in tsd.
2021-10-28 10:05:59 -07:00
Alex Lapenkou
8daac7958f Redefine functions with test hooks only for tests
Android build has issues with these defines, this will allow the build to
succeed if it doesn't need to build the tests.
2021-10-15 15:25:36 -07:00
Alex Lapenkou
c9ebff0fd6 Initialize deferred_work_generated
As the code evolves, some code paths that have previously assigned
deferred_work_generated may cease being reached. This would leave the value
uninitialized. This change initializes the value for safety.
2021-10-07 11:50:38 -07:00
David CARLIER
cf9724531a Darwin malloc_size override support proposal.
Darwin has an API similar to Linux/FreeBSD's malloc_usable_size.
2021-10-01 14:32:40 -07:00
Qi Wang
83f3294027 Small refactors around 7bb05e0. 2021-09-27 16:05:13 -07:00
Qi Wang
deb8e62a83 Implement guard pages.
Adding guarded extents, which are regular extents surrounded by guard pages
(mprotected).  To reduce syscalls, small guarded extents are cached as a
separate eset in ecache, and decay through the dirty / muzzy / retained pipeline
as usual.
2021-09-26 16:30:15 -07:00
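The layout that deb8e62a83 describes (a regular extent surrounded by mprotected guard pages) can be sketched with plain POSIX calls. This is a minimal illustration assuming a system with `mmap`, `mprotect`, and `MAP_ANONYMOUS`; the function names are hypothetical, and the caching of guarded extents in a separate eset is not modeled.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map npages usable pages with one inaccessible guard page on each side.
 * Returns a pointer to the first usable byte, or NULL on failure.
 */
static void *
sketch_guarded_alloc(size_t npages) {
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t total = (npages + 2) * page;
	char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED) {
		return NULL;
	}
	/* Revoke access to the leading and trailing pages. */
	if (mprotect(base, page, PROT_NONE) != 0 ||
	    mprotect(base + (npages + 1) * page, page, PROT_NONE) != 0) {
		munmap(base, total);
		return NULL;
	}
	return base + page;
}

/* Unmap the usable pages together with both guard pages. */
static void
sketch_guarded_free(void *ptr, size_t npages) {
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	int ret = munmap((char *)ptr - page, (npages + 2) * page);
	assert(ret == 0);
	(void)ret;
}
```

Each guarded allocation in this sketch costs three system calls, which is the overhead the commit reduces by caching small guarded extents instead of recreating them.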
Piotr Balcer
7bb05e04be add experimental.arenas_create_ext mallctl
This mallctl accepts an arena_config_t structure which
can be used to customize the behavior of the arena.
Right now it contains extent_hooks and a new option,
metadata_use_hooks, which controls whether the extent
hooks are also used for metadata allocation.

The metadata_use_hooks option has two main use cases:

1. In heterogeneous memory systems, to avoid metadata
being placed on potentially slower memory.

2. To prevent virtual memory from being leaked as a result
of metadata allocation failure originating in an extent hook.
2021-09-24 13:43:18 -07:00
Alex Lapenkou
a9031a0970 Allow setting a dump hook
If users want to be notified when a heap dump occurs, they can set this hook.
2021-09-22 15:04:01 -07:00