server-skynet-source-3rd-jemalloc


Author	SHA1	Message	Date
Shirui Cheng	e4817c8d89	Cleanup cache_bin_info_t* info input args	2023-10-25 10:27:31 -07:00
guangli-dai	6fb3b6a8e4	Refactor the tcache initialization: 1. Pre-generate all default tcache ncached_max in tcache_boot; 2. Add getters returning default ncached_max and ncached_max_set; 3. Refactor tcache init so that it is always init with a given setting.	2023-10-18 14:11:46 -07:00
guangli-dai	8a22d10b83	Allow setting default ncached_max for each bin through malloc_conf	2023-10-18 14:11:46 -07:00
guangli-dai	630f7de952	Add mallctl to set and get ncached_max of each cache_bin. 1. `thread_tcache_ncached_max_read_sizeclass` allows users to get the ncached_max of the bin with the input sizeclass, passed in through oldp (rounded up to the next bin size if the given size is not an exact bin size). 2. `thread_tcache_ncached_max_write` takes in a char array representing the settings for bins in the tcache.	2023-10-17 14:53:23 -07:00
guangli-dai	6b197fdd46	Pre-generate ncached_max for all bins for better tcache_max tuning experience.	2023-10-17 14:53:23 -07:00
Shirui Cheng	36becb1302	metadata usage breakdowns: tracking edata and rtree usages	2023-10-11 11:56:01 -07:00
Qi Wang	72cfdce718	Allocate tcache stack from base allocator. When using metadata_thp, allocate tcache bin stacks from base0, which means they will be placed on huge pages along with other metadata, instead of mixed with other regular allocations. In order to do so, modified the base allocator to support limited reuse: freed tcache stacks (from thread termination) will be returned to base0 and made available for reuse, but no merging will be attempted since they were bump allocated out of base blocks. These reused base extents are managed using separately allocated base edata_t -- they are cached in base->edata_avail when the extent is all allocated. One tricky part is that stats updating must be skipped for such reused extents (since they were accounted for already, and there is no purging for base). This requires tracking the "if is reused" state explicitly and bypassing the stats updates when allocating from them.	2023-09-18 12:18:32 -07:00
guangli-dai	a442d9b895	Enable per-tcache tcache_max 1. add tcache_max and nhbins into tcache_t so that they are per-tcache, with one auto tcache per thread, it's also per-thread; 2. add mallctl for each thread to set its own tcache_max (of its auto tcache); 3. store the maximum number of items in each bin instead of using a global storage; 4. add tests for the modifications above. 5. Rename `nhbins` and `tcache_maxclass` to `global_do_not_change_nhbins` and `global_do_not_change_tcache_maxclass`.	2023-09-06 10:47:14 -07:00
guangli-dai	fbca96c433	Remove unnecessary parameters for cache_bin_postincrement.	2023-09-06 10:47:14 -07:00
Evers Chen	7d9eceaf38	Fix false array bounds warning in gcc 12.3.0. 1. Error: array subscript 232 is above array bounds of ‘size_t[232]’ in gcc 12.3.0; 2. The change also optimizes the code.	2023-09-05 14:33:55 -07:00
Kevin Svetlitski	3aae792b10	Fix infinite purging loop in HPA. As reported in #2449, under certain circumstances it's possible to get stuck in an infinite loop attempting to purge from the HPA. We now handle this by validating the HPA settings at the end of configuration parsing and either normalizing them or aborting depending on if `abort_conf` is set.	2023-08-08 14:36:19 -07:00
Kevin Svetlitski	7e54dd1ddb	Define `PROF_TCTX_SENTINEL` instead of using magic numbers This makes the code more readable on its own, and also sets the stage for more cleanly handling the pointer provenance lints in a following commit.	2023-07-24 14:40:42 -07:00
Kevin Svetlitski	589c63b424	Make eligible global variables `static` and/or `const` For better or worse, Jemalloc has a significant number of global variables. Making all eligible global variables `static` and/or `const` at least makes it slightly easier to reason about them, as these qualifications communicate to the programmer restrictions on their use without having to `grep` the whole codebase.	2023-07-06 14:15:12 -07:00
Qi Wang	602edd7566	Enabled -Wstrict-prototypes and fixed warnings.	2023-07-06 12:00:02 -07:00
Kevin Svetlitski	ebd7e99f5c	Add a test-case for small profiled allocations Validate that small allocations (i.e. those with `size <= SC_SMALL_MAXCLASS`) which are sampled for profiling maintain the expected invariants even though they now take up less space.	2023-07-03 16:19:06 -07:00
Kevin Svetlitski	e1338703ef	Address compiler warnings in the unit tests	2023-07-03 16:06:35 -07:00
Qi Wang	d131331310	Avoid eager purging on the dedicated oversize arena when using bg thds. We have observed new workload patterns (namely ML training type) that cycle through oversized allocations frequently, because 1) the dataset might be sparse which is faster to go through, and 2) GPU accelerated. As a result, the eager purging from the oversize arena becomes a bottleneck. To offer an easy solution, allow normal purging of the oversized extents when background threads are enabled.	2023-06-27 11:57:41 -07:00
Qi Wang	d4a2b8bab1	Add the prof_sys_thread_name feature in the prof_recent unit test. This tests the combination of the prof_recent and thread_name features. Verified that it catches the issue being fixed in this PR. Also explicitly set thread name in test/unit/prof_recent. This fixes the name testing when no default thread name is set (e.g. FreeBSD).	2023-05-11 09:10:57 -07:00
Kevin Svetlitski	70344a2d38	Make eligible functions `static` The codebase is already very disciplined in making any function which can be `static`, but there are a few that appear to have slipped through the cracks.	2023-05-08 15:00:02 -07:00
Qi Wang	434a68e221	Disallow decay during reentrancy. Decay should not be triggered during reentrant calls (may cause lock order reversal / deadlocks). Added a delay_trigger flag to the tickers to bypass decay when rentrancy_level is not zero.	2023-04-05 10:16:37 -07:00
Qi Wang	ce0b7ab6c8	Inline the storage for thread name in prof_tdata_t. The previous approach managed the thread name in a separate buffer, which causes races because the thread name update (triggered by new samples) can happen at the same time as prof dumping (which reads the thread names) -- these two operations are under separate locks to avoid blocking each other. Implemented the thread name storage as part of the tdata struct, which resolves the lifetime issue and also avoids internal alloc / dalloc during prof_sample.	2023-04-05 10:03:12 -07:00
Qi Wang	6cab460a45	Add a multithreaded test for prof_sys_thread_name. Verified that this catches the issue being fixed in `5fd5583`.	2023-04-05 10:03:12 -07:00
Qi Wang	8b64be3441	Explicit arena assignment in test_tcache_max. Otherwise the associated arena could change with percpu arena enabled.	2023-03-22 15:16:43 -07:00
Qi Wang	8e7353a19b	Explicit arena assignment in test_thread_idle. Otherwise the associated arena could change with percpu arena enabled.	2023-03-22 15:16:43 -07:00
Qi Wang	71bc1a3d91	Avoid assuming the arena id in test when percpu_arena is used.	2023-03-13 10:50:10 -07:00
Qi Wang	97b313c7d4	More conservative setting for /test/unit/background_thread_enable. Lower the thread and arena count to avoid resource exhaustion on 32-bit.	2023-02-16 14:42:21 -08:00
Qi Wang	8580c65f81	Implement prof sample hooks "experimental.hooks.prof_sample(_free)". The added hooks hooks.prof_sample and hooks.prof_sample_free are intended to allow advanced users to track additional information, to enable new ways of profiling on top of the jemalloc heap profile and sample features. The sample hook is invoked after the allocation and backtracing, and forwards both the allocation and backtrace to the user hook; the sample_free hook happens before the actual deallocation, and forwards only the ptr and usz to the hook.	2022-12-07 16:06:49 -08:00
Qi Wang	143e9c4a2f	Enable fast thread locals for dealloc-only threads. Previously if a thread does only deallocations, it stays on the slow path / minimal initialized state forever. However, dealloc-only is a valid pattern for dedicated reclamation threads -- this means thread cache is disabled (no batched flush) for them, which causes high overhead and contention. Added the condition to fully initialize TSD when a fair amount of dealloc activities are observed.	2022-10-25 09:54:38 -07:00
Guangli Dai	ba19d2cb78	Add arena-level name. An arena-level name can help identify manual arenas.	2022-09-16 15:04:59 -07:00
Guangli Dai	a0734fd6ee	Making jemalloc max stack depth a runtime option	2022-09-12 13:56:22 -07:00
Guangli Dai	42daa1ac44	Add double free detection using slab bitmap for debug build Add a sanity check for double free issue in the arena in case that the tcache has been flushed.	2022-09-06 12:54:21 -07:00
Ivan Zaitsev	36366f3c4c	Add double free detection in thread cache for debug build Add new runtime option `debug_double_free_max_scan` that specifies the max number of stack entries to scan in the cache bit when trying to detect the double free bug (currently debug build only).	2022-08-04 16:58:22 -07:00
Alex Lapenkou	5b1f2cc5d7	Implement pvalloc replacement Despite being an obsolete function, pvalloc is still present in GLIBC and should work correctly when jemalloc replaces libc allocator.	2022-05-18 17:01:09 -07:00
Qi Wang	66c889500a	Make test/unit/background_thread_enable more conservative. To avoid resource exhaustion on 32-bit platforms.	2022-05-04 15:32:57 -07:00
cuishuang	9a242f16d9	fix some typos Signed-off-by: cuishuang <imcusg@gmail.com>	2022-04-25 11:29:00 -07:00
Qi Wang	0e29ad4efa	Rename zero_realloc option "strict" to "alloc". With realloc(ptr, 0) being UB per C23, the option name "strict" makes less sense now. Rename to "alloc" which describes the behavior.	2022-04-20 10:27:25 -07:00
Charles	eaaa368bab	Add comments and use meaningful vars in sz_psz2ind.	2022-03-24 16:56:59 -07:00
Alex Lapenkou	52631c90f6	Fix size class calculation for sec. Due to a bug in sec initialization, the number of cached size classes was equal to 198. The bug caused the creation of more than a hundred unused bins, although it didn't affect the caching logic.	2022-03-22 17:45:55 -07:00
Qi Wang	20f9802e4f	Avoid overflow warnings in test/unit/safety_check.	2022-01-27 10:29:54 -08:00
yunxu	b798fabdf7	Add prof_leak_error option. The option makes the process exit with error code 1 if a memory leak is detected. This is useful for implementing automated tools that rely on leak detection.	2022-01-21 16:24:20 -08:00
Qi Wang	648b3b9f76	Lower the num_threads in the stress test of test/unit/prof_recent This takes a fair amount of resources. Under high concurrency it was causing resource exhaustion such as pthread_create and mmap failures.	2022-01-11 16:58:56 -08:00
Qi Wang	6230cc88b6	Add background thread sleep retry in test/unit/hpa_background_thread Under high concurrency / heavy test load (e.g. using run_tests.sh), the background thread may not get scheduled for a longer period of time. Retry 100 times max before bailing out.	2022-01-07 10:28:28 -08:00
Qi Wang	d660683d3d	Fix test config of lg_san_uaf_align. The option may be configure-disabled, which resulted in the invalid options output from the tests.	2022-01-04 11:03:51 -08:00
Qi Wang	dfdd7562f5	Rename san_enabled() to san_guard_enabled().	2021-12-29 14:44:43 -08:00
Qi Wang	e491cef9ab	Add stats for stashed bytes in tcache.	2021-12-29 14:44:43 -08:00
Qi Wang	b75822bc6e	Implement use-after-free detection using junk and stash. On deallocation, sampled pointers (specially aligned) get junked and stashed into tcache (to prevent immediate reuse). The expected behavior is to have read-after-free corrupted and stopped by the junk-filling, while write-after-free is checked when flushing the stashed pointers.	2021-12-29 14:44:43 -08:00
Qi Wang	d038160f3b	Fix shadowed variable usage. Verified with EXTRA_CFLAGS=-Wshadow.	2021-12-23 10:55:08 -08:00
Qi Wang	bd70d8fc0f	Make the profiling settings for tests explicit. Many profiling related tests make assumptions on the profiling settings, e.g. opt_prof is off by default, and prof_active is default on when opt_prof is on. However the default settings can be changed via --with-malloc-conf at build time. Fixing the tests by adding the assumed settings explicitly.	2021-12-22 20:10:28 -08:00
Qi Wang	837b37c4ce	Fix the time-since computation in HPA. nstime module guarantees monotonic clock update within a single nstime_t. This means, if two separate nstime_t variables are read and updated separately, nstime_subtract between them may result in underflow. Fixed by switching to the time since utility provided by nstime.	2021-12-21 23:37:22 -08:00
Qi Wang	310af725b0	Add nstime_ns_since which obtains the duration since the input time.	2021-12-21 23:37:22 -08:00


623 Commits