server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Qi Wang	3025b021b9	Optimize mutex and bin alignment / locality.	2023-10-23 20:28:26 -07:00
guangli-dai	e2cd27132a	Change stack_size assertion back to the more compatible one.	2023-10-23 20:28:26 -07:00
guangli-dai	d88fa71bbd	Fix nfill = 0 bug when ncached_max is 1	2023-10-18 14:11:46 -07:00
guangli-dai	6fb3b6a8e4	Refactor the tcache initialization 1. Pre-generate all default tcache ncached_max in tcache_boot; 2. Add getters returning default ncached_max and ncached_max_set; 3. Refactor tcache init so that it is always init with a given setting.	2023-10-18 14:11:46 -07:00
guangli-dai	8a22d10b83	Allow setting default ncached_max for each bin through malloc_conf	2023-10-18 14:11:46 -07:00
guangli-dai	630f7de952	Add mallctl to set and get ncached_max of each cache_bin. 1. `thread_tcache_ncached_max_read_sizeclass` allows users to get the ncached_max of the bin with the input sizeclass, passed in through oldp (will be upper casted if not an exact bin size is given). 2. `thread_tcache_ncached_max_write` takes in a char array representing the settings for bins in the tcache.	2023-10-17 14:53:23 -07:00
guangli-dai	6b197fdd46	Pre-generate ncached_max for all bins for better tcache_max tuning experience.	2023-10-17 14:53:23 -07:00
Shirui Cheng	36becb1302	metadata usage breakdowns: tracking edata and rtree usages	2023-10-11 11:56:01 -07:00
Qi Wang	72cfdce718	Allocate tcache stack from base allocator When using metadata_thp, allocate tcache bin stacks from base0, which means they will be placed on huge pages along with other metadata, instead of mixed with other regular allocations. In order to do so, modified the base allocator to support limited reuse: freed tcached stacks (from thread termination) will be returned to base0 and made available for reuse, but no merging will be attempted since they were bump allocated out of base blocks. These reused base extents are managed using separately allocated base edata_t -- they are cached in base->edata_avail when the extent is all allocated. One tricky part is, stats updating must be skipped for such reused extents (since they were accounted for already, and there is no purging for base). This requires tracking the "if is reused" state explicitly and bypass the stats updates when allocating from them.	2023-09-18 12:18:32 -07:00
guangli-dai	a442d9b895	Enable per-tcache tcache_max 1. add tcache_max and nhbins into tcache_t so that they are per-tcache, with one auto tcache per thread, it's also per-thread; 2. add mallctl for each thread to set its own tcache_max (of its auto tcache); 3. store the maximum number of items in each bin instead of using a global storage; 4. add tests for the modifications above. 5. Rename `nhbins` and `tcache_maxclass` to `global_do_not_change_nhbins` and `global_do_not_change_tcache_maxclass`.	2023-09-06 10:47:14 -07:00
guangli-dai	fbca96c433	Remove unnecessary parameters for cache_bin_postincrement.	2023-09-06 10:47:14 -07:00
Qi Wang	7d563a8f81	Update safety check message to remove --enable-debug when it's already on.	2023-09-05 14:15:45 -07:00
Kevin Svetlitski	da66aa391f	Enable a few additional warnings for CI and fix the issues they uncovered - `-Wmissing-prototypes` and `-Wmissing-variable-declarations` are helpful for finding dead code and/or things that should be `static` but aren't marked as such. - `-Wunused-macros` is of similar utility, but for identifying dead macros. - `-Wunreachable-code` and `-Wunreachable-code-aggressive` do exactly what they say: flag unreachable code.	2023-08-11 13:56:23 -07:00
Kevin Svetlitski	3aae792b10	Fix infinite purging loop in HPA As reported in #2449, under certain circumstances it's possible to get stuck in an infinite loop attempting to purge from the HPA. We now handle this by validating the HPA settings at the end of configuration parsing and either normalizing them or aborting depending on if `abort_conf` is set.	2023-08-08 14:36:19 -07:00
Kevin Svetlitski	424dd61d57	Issue a warning upon directly accessing an arena's bins An arena's bins should normally be accessed via the `arena_get_bin` function, which properly takes into account bin-shards. To ensure that we don't accidentally commit code which incorrectly accesses the bins directly, we mark the field with `__attribute__((deprecated))` with an appropriate warning message, and suppress the warning in the few places where directly accessing the bins is allowed.	2023-08-04 15:47:05 -07:00
Kevin Svetlitski	07a2eab3ed	Stop over-reporting memory usage from sampled small allocations @interwq noticed [while reviewing an earlier PR](https://github.com/jemalloc/jemalloc/pull/2478#discussion_r1256217261) that I missed modifying this statistics accounting in line with the rest of the changes from #2459. This is now fixed, such that sampled small allocations increment the `.nmalloc`/`.ndalloc` of their effective bin size instead of over-reporting memory usage by attributing all such allocations to `SC_LARGE_MINCLASS`.	2023-08-03 16:12:22 -07:00
Qi Wang	6816b23862	Include the unrecognized malloc conf option in the error message. Previously the option causing trouble will not be printed, unless the option key:value pair format is found.	2023-08-02 10:44:55 -07:00
Kevin Svetlitski	62648c88e5	Ensured sampled allocations are properly deallocated during `arena_reset` Sampled allocations were not being demoted before being deallocated during an `arena_reset` operation.	2023-08-01 11:35:37 -07:00
Kevin Svetlitski	9ba1e1cb37	Make `ctl_arena_clear` slightly more efficient While this function isn't particularly hot, (accounting for just 0.27% of time spent inside the allocator on average across the fleet), looking at the generated assembly and performance profiles does show we're dispatching to multiple different `memset`s when we could instead be just tail-calling `memset` once, reducing code size and marginally improving performance.	2023-07-31 14:44:04 -07:00
Kevin Svetlitski	3e82f357bb	Fix all optimization-inhibiting integer-to-pointer casts Following from PR #2481, we replace all integer-to-pointer casts [which hide pointer provenance information (and thus inhibit optimizations)](https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html) with equivalent operations that preserve this information. I have enabled the corresponding clang-tidy check in our static analysis CI so that we do not get bitten by this again in the future.	2023-07-24 14:40:42 -07:00
Kevin Svetlitski	1431153695	Define `SBRK_INVALID` instead of using a magic number	2023-07-24 14:40:42 -07:00
Kevin Svetlitski	7e54dd1ddb	Define `PROF_TCTX_SENTINEL` instead of using magic numbers This makes the code more readable on its own, and also sets the stage for more cleanly handling the pointer provenance lints in a following commit.	2023-07-24 14:40:42 -07:00
Kevin Svetlitski	c49c17f128	Suppress verbose frame address warnings These warnings are not useful, and make the output of some CI jobs enormous and difficult to read, so let's suppress them.	2023-07-24 10:44:17 -07:00
Kevin Svetlitski	cdb2c0e02f	Implement C23's `free_sized` and `free_aligned_sized` [N2699 - Sized Memory Deallocation](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2699.htm) introduced two new functions which were incorporated into the C23 standard, `free_sized` and `free_aligned_sized`. Both already have analogues in Jemalloc, all we are doing here is adding the appropriate wrappers.	2023-07-20 15:06:41 -07:00
Kevin Svetlitski	589c63b424	Make eligible global variables `static` and/or `const` For better or worse, Jemalloc has a significant number of global variables. Making all eligible global variables `static` and/or `const` at least makes it slightly easier to reason about them, as these qualifications communicate to the programmer restrictions on their use without having to `grep` the whole codebase.	2023-07-06 14:15:12 -07:00
Qi Wang	602edd7566	Enabled -Wstrict-prototypes and fixed warnings.	2023-07-06 12:00:02 -07:00
Kevin Svetlitski	5a858c64d6	Reduce the memory overhead of sampled small allocations Previously, small allocations which were sampled as part of heap profiling were rounded up to `SC_LARGE_MINCLASS`. This additional memory usage becomes problematic when the page size is increased, as noted in #2358. Small allocations are now rounded up to the nearest multiple of `PAGE` instead, reducing the memory overhead by a factor of 4 in the most extreme cases.	2023-07-03 16:19:06 -07:00
Qi Wang	d131331310	Avoid eager purging on the dedicated oversize arena when using bg thds. We have observed new workload patterns (namely ML training type) that cycle through oversized allocations frequently, because 1) the dataset might be sparse which is faster to go through, and 2) GPU accelerated. As a result, the eager purging from the oversize arena becomes a bottleneck. To offer an easy solution, allow normal purging of the oversized extents when background threads are enabled.	2023-06-27 11:57:41 -07:00
Kevin Svetlitski	bb0333e745	Fix remaining static analysis warnings Fix or suppress the remaining warnings generated by static analysis. This is a necessary step before we can incorporate static analysis into CI. Where possible, I've preferred to modify the code itself instead of just disabling the warning with a magic comment, so that if we decide to use different static analysis tools in the future we will be covered against them raising similar warnings.	2023-06-23 11:50:29 -07:00
Qi Wang	86eb49b478	Fix the arena selection for oversized allocations. Use the per-arena oversize_threshold, instead of the global setting.	2023-06-06 15:03:13 -07:00
Christos Zoulas	5832ef6589	Use a local variable to set the alignment for this particular allocation instead of changing mmap_flags which makes the change permanent. This was enforcing large alignments for allocations that did not need it causing fragmentation. Reported by Andreas Gustafsson.	2023-05-31 14:44:24 -07:00
Kevin Svetlitski	6d4aa33753	Extract the calculation of psset heap assignment for an hpdata into a common function This is in preparation for upcoming changes I plan to make to this logic. Extracting it into a common function will make this easier and less error-prone, and cleans up the existing code regardless.	2023-05-31 11:44:04 -07:00
Arne Welzel	d59e30cbc9	Rename fallback_impl to fallbackNewImpl and prune in jeprof The existing fallback_impl name seemed a bit generic and given it's static probably okay to rename. Closes #2451	2023-05-31 11:41:09 -07:00
Kevin Svetlitski	9c32689e57	Fix bug where hpa_shard was not being destroyed It appears that this was a simple mistake where `hpa_shard_disable` was being called instead of `hpa_shard_destroy`. At present `hpa_shard_destroy` is not called anywhere at all outside of test-cases, which further suggests that this is a bug. @davidtgoldblatt noted however that since HPA is disabled for manual arenas and we don't support destruction for auto arenas that presently there is no way to actually trigger this bug. Nonetheless, it should be fixed.	2023-05-18 14:17:38 -07:00
Kevin Svetlitski	3e2ba7a651	Remove dead stores detected by static analysis None of these are harmful, and they are almost certainly optimized away by the compiler. The motivation for fixing them anyway is that we'd like to enable static analysis as part of CI, and the first step towards that is resolving the warnings it produces at present.	2023-05-11 20:27:49 -07:00
Kevin Svetlitski	0288126d9c	Fix possible `NULL` pointer dereference from `mallctl("prof.prefix", ...)` Static analysis flagged this issue. Here is a minimal program which causes a segfault within Jemalloc: ``` #include <jemalloc/jemalloc.h> const char *malloc_conf = "prof:true"; int main() { mallctl("prof.prefix", NULL, NULL, NULL, 0); } ``` Fixed by checking if `prefix` is `NULL`.	2023-05-11 14:47:50 -07:00
Qi Wang	94ace05832	Fix the prof thread_name reference in prof_recent dump. As pointed out in #2434, the thread_name in prof_tdata_t was changed in #2407. This also requires an update for the prof_recent dump, specifically the emitter expects a "char **" which is fixed in this commit.	2023-05-11 09:10:57 -07:00
Qi Wang	6ea8a7e928	Add config detection for JEMALLOC_HAVE_PTHREAD_SET_NAME_NP and use it for the background thread name setting.	2023-05-11 09:10:57 -07:00
auxten	5bac384970	If ptr present check if alloc_ctx.edata == NULL	2023-05-10 17:18:22 -07:00
auxten	019cccc293	Make arenas_lookup_ctl triable	2023-05-10 17:18:22 -07:00
Kevin Svetlitski	dc0a184f8d	Fix possible `NULL` pointer dereference in `VERIFY_READ` Static analysis flagged this. Fixed by simply checking `oldlenp` before dereferencing it.	2023-05-09 10:57:09 -07:00
Kevin Svetlitski	12311fe6c3	Fix segfault in `extent_try_coalesce_impl` Static analysis flagged this. `extent_record` was passing `NULL` as the value for `coalesced` to `extent_try_coalesce`, which in turn passes that argument to `extent_try_coalesce_impl`, where it is written to without checking if it is `NULL`. I can confirm from reviewing the fleetwide coredump data that this was in fact being hit in production.	2023-05-09 10:55:44 -07:00
Kevin Svetlitski	70344a2d38	Make eligible functions `static` The codebase is already very disciplined in making any function which can be `static`, but there are a few that appear to have slipped through the cracks.	2023-05-08 15:00:02 -07:00
Kevin Svetlitski	fc680128e0	Remove errant `assert` in `arena_extent_alloc_large` This codepath may generate deferred work when the HPA is enabled. See also [@davidtgoldblatt's relevant comment on the PR which introduced this](https://github.com/jemalloc/jemalloc/pull/2107#discussion_r699770967) which prevented a similarly incorrect `assert` from being added elsewhere.	2023-05-01 10:00:30 -07:00
Eric Mueller	521970fb2e	Check for equality instead of assigning in asserts in hpa_from_pai. It appears like a simple typo means we're unconditionally overwriting some fields in hpa_from_pai when asserts are enabled. From hpa_shard_init, it looks like these fields have these values anyway, so this shouldn't cause bugs, but if something is wrong it seems better to have these asserts in place. See issue #2412.	2023-04-17 20:57:48 -07:00
Qi Wang	ce0b7ab6c8	Inline the storage for thread name in prof_tdata_t. The previous approach managed the thread name in a separate buffer, which causes races because the thread name update (triggered by new samples) can happen at the same time as prof dumping (which reads the thread names) -- these two operations are under separate locks to avoid blocking each other. Implemented the thread name storage as part of the tdata struct, which resolves the lifetime issue and also avoids internal alloc / dalloc during prof_sample.	2023-04-05 10:03:12 -07:00
Amaury Séchet	f743690739	Remove unused mutex from hpa_central	2023-03-10 11:25:47 -08:00
Qi Wang	c7805f1eb5	Add a header in HPA stats for the nonfull slabs.	2023-02-17 13:31:27 -08:00
Qi Wang	b6125120ac	Add an explicit name to the dedicated oversize arena.	2023-02-17 13:31:09 -08:00
Qi Wang	5fd55837bb	Fix thread_name updating for heap profiling. The current thread name reading path updates the name every time, which requires both alloc and dalloc -- and the temporary NULL value in the middle causes races where the prof dump read path gets NULLed in the middle. Minimize the changes in this commit to isolate the bugfix testing; will also refactor the whole thread name paths later.	2023-02-15 17:49:40 -08:00

1873 Commits