server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
guangli-dai	6fb3b6a8e4	Refactor the tcache initialization. 1. Pre-generate all default tcache ncached_max in tcache_boot; 2. Add getters returning default ncached_max and ncached_max_set; 3. Refactor tcache init so that it is always initialized with a given setting.	2023-10-18 14:11:46 -07:00
guangli-dai	630f7de952	Add mallctl to set and get ncached_max of each cache_bin. 1. `thread_tcache_ncached_max_read_sizeclass` allows users to get the ncached_max of the bin with the input sizeclass, passed in through oldp (rounded up to the next bin size if an exact bin size is not given). 2. `thread_tcache_ncached_max_write` takes in a char array representing the settings for bins in the tcache.	2023-10-17 14:53:23 -07:00
guangli-dai	6b197fdd46	Pre-generate ncached_max for all bins for better tcache_max tuning experience.	2023-10-17 14:53:23 -07:00
Qi Wang	72cfdce718	Allocate tcache stack from base allocator. When using metadata_thp, allocate tcache bin stacks from base0, which means they will be placed on huge pages along with other metadata, instead of mixed with other regular allocations. In order to do so, modified the base allocator to support limited reuse: freed tcache stacks (from thread termination) will be returned to base0 and made available for reuse, but no merging will be attempted since they were bump allocated out of base blocks. These reused base extents are managed using separately allocated base edata_t; they are cached in base->edata_avail when the extent is all allocated. One tricky part is that stats updating must be skipped for such reused extents (since they were accounted for already, and there is no purging for base). This requires tracking the "is reused" state explicitly and bypassing the stats updates when allocating from them.	2023-09-18 12:18:32 -07:00
guangli-dai	a442d9b895	Enable per-tcache tcache_max. 1. Add tcache_max and nhbins into tcache_t so that they are per-tcache; with one auto tcache per thread, they are also per-thread; 2. Add mallctl for each thread to set its own tcache_max (of its auto tcache); 3. Store the maximum number of items in each bin instead of using a global storage; 4. Add tests for the modifications above; 5. Rename `nhbins` and `tcache_maxclass` to `global_do_not_change_nhbins` and `global_do_not_change_tcache_maxclass`.	2023-09-06 10:47:14 -07:00
guangli-dai	fbca96c433	Remove unnecessary parameters for cache_bin_postincrement.	2023-09-06 10:47:14 -07:00
Kevin Svetlitski	3e82f357bb	Fix all optimization-inhibiting integer-to-pointer casts. Following from PR #2481, we replace all integer-to-pointer casts [which hide pointer provenance information (and thus inhibit optimizations)](https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html) with equivalent operations that preserve this information. I have enabled the corresponding clang-tidy check in our static analysis CI so that we do not get bitten by this again in the future.	2023-07-24 14:40:42 -07:00
Kevin Svetlitski	41e0b857be	Make headers self-contained by fixing `#include`s. Header files are now self-contained, which makes the relationships between the files clearer, and crucially allows LSP tools like `clangd` to function correctly in all of our header files. I have verified that the headers are self-contained (aside from the various Windows shims) by compiling them as if they were C files; in a follow-up commit I plan to add this to CI to ensure we don't regress on this front.	2023-07-14 09:06:32 -07:00
Amaury Séchet	f2b28906e6	Some nits in cache_bin.h	2023-05-01 10:21:17 -07:00
Guangli Dai	ce29b4c3d9	Refactor the remote / cross thread cache bin stats reading. Refactored cache_bin.h so that only one function is racy.	2022-09-06 19:41:19 -07:00
Ivan Zaitsev	36366f3c4c	Add double free detection in thread cache for debug build. Add new runtime option `debug_double_free_max_scan` that specifies the max number of stack entries to scan in the cache bin when trying to detect the double free bug (currently debug build only).	2022-08-04 16:58:22 -07:00
Alex Lapenkou	ca709c3139	Fix failed assertion due to racy memory access. While calculating the number of stashed pointers, multiple variables potentially modified by a concurrent thread were used for the calculation. This led to some inconsistencies, correctly detected by the assertions. The change eliminates some possible inconsistencies by using unmodified variables and reading a concurrently modified one only once. The assertions are also omitted for the cases where we acknowledge potential inconsistencies.	2022-02-17 09:35:52 -08:00
Qi Wang	eabe889162	Rename full_position to low_bound in cache_bin.h.	2021-12-29 14:44:43 -08:00
Qi Wang	01d61a3c6f	Fix a conversion warning.	2021-12-29 14:44:43 -08:00
Qi Wang	e491cef9ab	Add stats for stashed bytes in tcache.	2021-12-29 14:44:43 -08:00
Qi Wang	b75822bc6e	Implement use-after-free detection using junk and stash. On deallocation, sampled pointers (specially aligned) get junked and stashed into tcache (to prevent immediate reuse). The expected behavior is to have read-after-free corrupted and stopped by the junk-filling, while write-after-free is checked when flushing the stashed pointers.	2021-12-29 14:44:43 -08:00
David Goldblatt	2fcbd18115	Cache bin: Don't reverse flush order. The items we pick to flush matter a lot, but the order in which they get flushed doesn't; just use forward scans. This simplifies the accessing code, both in terms of the C and the generated assembly (i.e. this speeds up the flush pathways).	2021-02-04 14:10:43 -08:00
David Goldblatt	a011c4c22d	cache_bin: Separate out local and remote accesses. This fixes an incorrect debug-mode assert: - T1 starts an arena stats update and reads stack_head from another thread's cache bin, when that cache bin has 1 item in it. - T2 allocates from that cache bin. The cache_bin's stack_head now points to a NULL pointer, since the cache bin is empty. - T1 re-reads the cache_bin's stack_head to perform an assertion check (since it previously saw that the bin was nonempty, whatever stack_head points to should be non-NULL).	2021-01-08 14:18:08 -08:00
Yinan Zhang	be5e49f4fa	Add a batch mode for cache_bin_alloc()	2020-11-16 20:58:01 -08:00
Yinan Zhang	566c4a8594	Slight changes to cache bin internal functions	2020-11-16 20:58:01 -08:00
Qi Wang	bf72188f80	Allow opt.tcache_max to accept small size classes. Previously all the small size classes were cached. However, this has downsides, particularly when the page size is greater than 4K (e.g. iOS), which results in a much higher SMALL_MAXCLASS. This change allows tcache_max to be set to lower values, to better control the resources taken by tcache.	2020-10-24 20:43:44 -07:00
David Goldblatt	ea51e97bb8	Add SEC module: a small extent cache. This can be used to take pressure off a more centralized, worse-sharded allocator without requiring a full break of the arena abstraction.	2020-10-23 11:14:34 -07:00
David Goldblatt	b58dea8d1b	Cache bin: expose ncached_max publicly.	2020-05-16 13:34:23 -07:00
David Goldblatt	cd29ebefd0	Tcache: treat small and large cache bins uniformly	2020-04-14 15:20:19 -07:00
David Goldblatt	92485032b2	Cache bin: improve comments.	2020-03-12 11:54:19 -07:00
David Goldblatt	d701a085c2	Fast path: allow low-water mark changes. This lets us put more allocations on an "almost as fast" path after a flush. This results in around a 4% reduction in malloc cycles in prod workloads (corresponding to about a 0.1% reduction in overall cycles).	2020-03-12 11:54:19 -07:00
David Goldblatt	397da03865	Cache bin: rewrite to track more state. With this, we track all of the empty, full, and low water states together. This simplifies a lot of the tracking logic, since we now don't need the cache_bin_info_t for state queries (except for some debugging).	2020-03-12 11:54:19 -07:00
David Goldblatt	d498a4bb08	Cache bin: Add an emptiness assertion.	2020-03-12 11:54:19 -07:00
David Goldblatt	6a7aa46ef7	Cache bin: Add a debug method for init checking.	2020-03-12 11:54:19 -07:00
David Goldblatt	370c1ea007	Cache bin: Write the unit test in terms of the API. I.e. stop allowing the unit test to have secret access to implementation internals.	2020-03-12 11:54:19 -07:00
David Goldblatt	7f5ebd211c	Cache bin: set low-water internally.	2020-03-12 11:54:19 -07:00
David Goldblatt	60113dfe3b	Cache bin: Move in initialization code.	2020-03-12 11:54:19 -07:00
David Goldblatt	44529da852	Cache-bin: Make flush modifications internal. I.e. the tcache code just calls a cache-bin function to finish flush (and move pointers around, etc.). It doesn't directly access the cache-bin's owned memory any more.	2020-03-12 11:54:19 -07:00
David Goldblatt	ff6acc6ed5	Cache bin: simplify names and argument ordering. We always start with the cache bin, then its info (if necessary).	2020-03-12 11:54:19 -07:00
David Goldblatt	e1dcc557d6	Cache bin: Only take the relevant cache_bin_info_t. Previously, we took an array of cache_bin_info_ts and an index, and dereferenced ourselves. But infos for other cache_bins aren't relevant to any particular cache bin, so that should be the caller's job.	2020-03-12 11:54:19 -07:00
David Goldblatt	1b00d808d7	cache_bin: Don't let arena see empty position.	2020-03-12 11:54:19 -07:00
David Goldblatt	d303f30796	cache_bin nflush -> n. We're going to use it on the fill pathway as well.	2020-03-12 11:54:19 -07:00
David Goldblatt	74d36d78ef	Cache bin: Make ncached_max a query on the info_t.	2020-03-12 11:54:19 -07:00
David Goldblatt	b66c0973cc	cache_bin: Don't allow direct internals access.	2020-03-12 11:54:19 -07:00
David Goldblatt	909c501b07	Cache_bin: Shouldn't know about tcache. Instead, have it take the cache_bin_info_ts to use by pointer. While we're here, add a src file for the cache bin.	2020-03-12 11:54:19 -07:00
David T. Goldblatt	34b7165fde	Put szind_t, pszind_t in sz.h.	2020-02-18 11:22:09 -08:00
Yinan Zhang	05681e387a	Optimize cache_bin_alloc_easy for malloc fast path. `tcache_bin_info` is not accessed on malloc fast path but the compiler reserves a register for it, as well as an additional register for `tcache_bin_info[ind].stack_size`. The optimization gets rid of the need for the two registers.	2019-10-21 16:43:45 -07:00
Yinan Zhang	4fe50bc7d0	Fix amd64 MSVC warning	2019-10-18 10:16:29 -07:00
Qi Wang	785b84e603	Make cache_bin_sz_t unsigned. The bin size type was made signed only because the low_water could go -1, which was already removed.	2019-09-04 13:37:07 -07:00
Qi Wang	23dc7a7fba	Fix index type for cache_bin_alloc_easy.	2019-09-04 13:37:07 -07:00
Qi Wang	0043e68d4c	Track low_water == -1 case explicitly. The -1 value of low_water indicates whether the cache has been depleted and refilled. Track the status explicitly in the tcache struct. This allows the fast path to check if (cur_ptr > low_water), instead of >=, which avoids reaching the slow path when the last item is allocated.	2019-08-21 16:00:38 -07:00
Qi Wang	937ca1db9f	Store ncached_max * ptr_size in tcache_bin_info. With the cache bin metadata switched to pointers, ncached_max is usually accessed and multiplied by sizeof(ptr). Store the results in tcache_bin_info for direct access, and add a helper function for the ncached_max value.	2019-08-19 12:23:24 -07:00
Qi Wang	7599c82d48	Redesign the cache bin metadata for fast path. Implement the pointer-based metadata for tcache bins: - 3 pointers are maintained to represent each bin; - 2 of the pointers are compressed on 64-bit; - is_full / is_empty done through pointer comparison. Compared to the previous counter-based design: - fast path sped up by ~15% in benchmarks; - direct pointer comparison and dereference; - no need to access tcache_bin_info in the common case.	2019-08-19 12:21:44 -07:00
Dave Watson	e2ab215324	Refactor tcache_dalloc_small. Add a cache_bin_dalloc_easy (to match the alloc_easy function), and use it in tcache_dalloc_small. It will also be used in the new free fastpath.	2018-11-12 13:20:37 -08:00
Dave Watson	09adf18f1a	Remove a branch from cache_bin_alloc_easy. Combine the branches for checking for an empty cache_bin, and checking for the low watermark.	2018-10-15 08:18:15 -07:00

54 Commits