server-skynet-source-3rd-jemalloc

project-base/server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
guangli-dai	8a22d10b83	Allow setting default ncached_max for each bin through malloc_conf	2023-10-18 14:11:46 -07:00
guangli-dai	867eedfc58	Fix the bug in dalloc promoted allocations. An allocation small enough will be promoted so that it does not share an extent with others. However, when dalloc, such allocations may not be dalloc as a promoted one if nbins < SC_NBINS. This commit fixes the bug.	2023-10-17 14:53:23 -07:00
guangli-dai	630f7de952	Add mallctl to set and get ncached_max of each cache_bin. 1. `thread_tcache_ncached_max_read_sizeclass` allows users to get the ncached_max of the bin with the input sizeclass, passed in through oldp (will be upper casted if not an exact bin size is given). 2. `thread_tcache_ncached_max_write` takes in a char array representing the settings for bins in the tcache.	2023-10-17 14:53:23 -07:00
guangli-dai	6b197fdd46	Pre-generate ncached_max for all bins for better tcache_max tuning experience.	2023-10-17 14:53:23 -07:00
Shirui Cheng	36becb1302	metadata usage breakdowns: tracking edata and rtree usages	2023-10-11 11:56:01 -07:00
Qi Wang	005f20aa7f	Fix comments about malloc_conf to enable logging.	2023-10-04 11:49:10 -07:00
guangli-dai	7a9e4c9073	Mark jemalloc.h as system header to resolve header conflicts.	2023-10-04 11:41:30 -07:00
Qi Wang	72cfdce718	Allocate tcache stack from base allocator When using metadata_thp, allocate tcache bin stacks from base0, which means they will be placed on huge pages along with other metadata, instead of mixed with other regular allocations. In order to do so, modified the base allocator to support limited reuse: freed tcached stacks (from thread termination) will be returned to base0 and made available for reuse, but no merging will be attempted since they were bump allocated out of base blocks. These reused base extents are managed using separately allocated base edata_t -- they are cached in base->edata_avail when the extent is all allocated. One tricky part is, stats updating must be skipped for such reused extents (since they were accounted for already, and there is no purging for base). This requires tracking the "if is reused" state explicitly and bypass the stats updates when allocating from them.	2023-09-18 12:18:32 -07:00
guangli-dai	a442d9b895	Enable per-tcache tcache_max 1. add tcache_max and nhbins into tcache_t so that they are per-tcache, with one auto tcache per thread, it's also per-thread; 2. add mallctl for each thread to set its own tcache_max (of its auto tcache); 3. store the maximum number of items in each bin instead of using a global storage; 4. add tests for the modifications above. 5. Rename `nhbins` and `tcache_maxclass` to `global_do_not_change_nhbins` and `global_do_not_change_tcache_maxclass`.	2023-09-06 10:47:14 -07:00
guangli-dai	fbca96c433	Remove unnecessary parameters for cache_bin_postincrement.	2023-09-06 10:47:14 -07:00
Qi Wang	b71da25b8a	Fix reading CPU id using rdtscp. As pointed out in #2527, the correct register containing CPU id should be ecx instead edx.	2023-08-28 11:46:39 -07:00
Kevin Svetlitski	da66aa391f	Enable a few additional warnings for CI and fix the issues they uncovered - `-Wmissing-prototypes` and `-Wmissing-variable-declarations` are helpful for finding dead code and/or things that should be `static` but aren't marked as such. - `-Wunused-macros` is of similar utility, but for identifying dead macros. - `-Wunreachable-code` and `-Wunreachable-code-aggressive` do exactly what they say: flag unreachable code.	2023-08-11 13:56:23 -07:00
Kevin Svetlitski	d2c9ed3d1e	Ensure short `read(2)`s/`write(2)`s are properly handled by IO utilities `read(2)` and `write(2)` may read or write fewer bytes than were requested. In order to robustly ensure that all of the requested bytes are read/written, these edge-cases must be handled.	2023-08-11 13:36:24 -07:00
Kevin Svetlitski	4f50f782fa	Use compiler-provided assume builtins when available There are several benefits to this: 1. It's cleaner and more reliable to use the builtin to inform the compiler of assumptions instead of hoping that the optimizer understands your intentions. 2. `clang` will warn you if any of your assumptions would produce side-effects (which the compiler will discard). [This blog post](https://fastcompression.blogspot.com/2019/01/compiler-checked-contracts.html) by Yann Collet highlights that a hazard of using the `unreachable()`-based method of signaling assumptions is that it can sometimes result in additional instructions being generated (see [this Godbolt link](https://godbolt.org/z/lKNMs3) from the blog post for an example).	2023-08-08 14:59:36 -07:00
Kevin Svetlitski	3aae792b10	Fix infinite purging loop in HPA As reported in #2449, under certain circumstances it's possible to get stuck in an infinite loop attempting to purge from the HPA. We now handle this by validating the HPA settings at the end of configuration parsing and either normalizing them or aborting depending on if `abort_conf` is set.	2023-08-08 14:36:19 -07:00
Kevin Svetlitski	424dd61d57	Issue a warning upon directly accessing an arena's bins An arena's bins should normally be accessed via the `arena_get_bin` function, which properly takes into account bin-shards. To ensure that we don't accidentally commit code which incorrectly accesses the bins directly, we mark the field with `__attribute__((deprecated))` with an appropriate warning message, and suppress the warning in the few places where directly accessing the bins is allowed.	2023-08-04 15:47:05 -07:00
Kevin Svetlitski	120abd703a	Add support for the `deprecated` attribute This is useful for enforcing the usage of getter/setter functions to access fields which are considered private or have unique access constraints.	2023-08-04 15:47:05 -07:00
Kevin Svetlitski	b01d496646	Add an override for the compile-time malloc_conf to `jemalloc_internal_overrides.h`	2023-07-31 14:53:15 -07:00
Kevin Svetlitski	8ff7e7d6c3	Remove errant `#include`s in public `jemalloc.h` header In an attempt to make all headers self-contained, I inadvertently added `#include`s which refer to intermediate, generated headers that aren't included in the final install. Closes #2489.	2023-07-25 16:26:50 -07:00
Kevin Svetlitski	3e82f357bb	Fix all optimization-inhibiting integer-to-pointer casts Following from PR #2481, we replace all integer-to-pointer casts [which hide pointer provenance information (and thus inhibit optimizations)](https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html) with equivalent operations that preserve this information. I have enabled the corresponding clang-tidy check in our static analysis CI so that we do not get bitten by this again in the future.	2023-07-24 14:40:42 -07:00
Kevin Svetlitski	4827bb17bd	Remove vestigial `TCACHE_STATE_*` macros	2023-07-24 14:40:42 -07:00
Kevin Svetlitski	7e54dd1ddb	Define `PROF_TCTX_SENTINEL` instead of using magic numbers This makes the code more readable on its own, and also sets the stage for more cleanly handling the pointer provenance lints in a following commit.	2023-07-24 14:40:42 -07:00
Kevin Svetlitski	c49c17f128	Suppress verbose frame address warnings These warnings are not useful, and make the output of some CI jobs enormous and difficult to read, so let's suppress them.	2023-07-24 10:44:17 -07:00
Kevin Svetlitski	cdb2c0e02f	Implement C23's `free_sized` and `free_aligned_sized` [N2699 - Sized Memory Deallocation](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2699.htm) introduced two new functions which were incorporated into the C23 standard, `free_sized` and `free_aligned_sized`. Both already have analogues in Jemalloc, all we are doing here is adding the appropriate wrappers.	2023-07-20 15:06:41 -07:00
Kevin Svetlitski	41e0b857be	Make headers self-contained by fixing `#include`s Header files are now self-contained, which makes the relationships between the files clearer, and crucially allows LSP tools like `clangd` to function correctly in all of our header files. I have verified that the headers are self-contained (aside from the various Windows shims) by compiling them as if they were C files – in a follow-up commit I plan to add this to CI to ensure we don't regress on this front.	2023-07-14 09:06:32 -07:00
Kevin Svetlitski	856db56f6e	Move tsd implementation details into `tsd_internals.h` This is a prerequisite to achieving self-contained headers. Previously, the various tsd implementation headers (`tsd_generic.h`, `tsd_tls.h`, `tsd_malloc_thread_cleanup.h`, and `tsd_win.h`) relied implicitly on being included in `tsd.h` after a variety of dependencies had been defined above them. This commit instead makes these dependencies explicit by splitting them out into a separate file, `tsd_internals.h`, which each of the tsd implementation headers includes directly.	2023-07-14 09:06:32 -07:00
Kevin Svetlitski	36ca0c1b7d	Stop concealing pointer provenance in `phn_link_get` At least for LLVM, [casting from an integer to a pointer hides provenance information](https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html) and inhibits optimizations. Here's a [Godbolt link](https://godbolt.org/z/5bYPcKoWT) showing how this change removes a couple unnecessary branches in `phn_merge_siblings`, which is a very hot function. Canary profiles show only minor improvements (since most of the cost of this function is in cache misses), but there's no reason we shouldn't take it.	2023-07-13 15:12:31 -07:00
Kevin Svetlitski	1d9e9c2ed6	Fix inconsistent parameter names between definition/declaration pairs For the sake of consistency, function definitions and their corresponding declarations should use the same names for parameters. I've enabled this check in static analysis to prevent this issue from occurring again in the future.	2023-07-13 12:59:47 -07:00
Kevin Svetlitski	589c63b424	Make eligible global variables `static` and/or `const` For better or worse, Jemalloc has a significant number of global variables. Making all eligible global variables `static` and/or `const` at least makes it slightly easier to reason about them, as these qualifications communicate to the programmer restrictions on their use without having to `grep` the whole codebase.	2023-07-06 14:15:12 -07:00
Qi Wang	e249d1a2a1	Remove unreachable code.	2023-07-06 12:06:06 -07:00
Qi Wang	602edd7566	Enabled -Wstrict-prototypes and fixed warnings.	2023-07-06 12:00:02 -07:00
Kevin Svetlitski	5a858c64d6	Reduce the memory overhead of sampled small allocations Previously, small allocations which were sampled as part of heap profiling were rounded up to `SC_LARGE_MINCLASS`. This additional memory usage becomes problematic when the page size is increased, as noted in #2358. Small allocations are now rounded up to the nearest multiple of `PAGE` instead, reducing the memory overhead by a factor of 4 in the most extreme cases.	2023-07-03 16:19:06 -07:00
Kevin Svetlitski	f2e00d2fd3	Remove trailing whitespace Additionally, added a GitHub Action to ensure no more trailing whitespace will creep in again in the future. I'm excluding Markdown files from this check, since trailing whitespace is significant there, and also excluding `build-aux/install-sh` because there is significant trailing whitespace on the line that sets `defaultIFS`.	2023-06-23 11:58:18 -07:00
Kevin Svetlitski	bb0333e745	Fix remaining static analysis warnings Fix or suppress the remaining warnings generated by static analysis. This is a necessary step before we can incorporate static analysis into CI. Where possible, I've preferred to modify the code itself instead of just disabling the warning with a magic comment, so that if we decide to use different static analysis tools in the future we will be covered against them raising similar warnings.	2023-06-23 11:50:29 -07:00
Kevin Svetlitski	210f0d0b2b	Fix read of uninitialized data in `prof_free` In #2433, I inadvertently introduced a regression which causes the use of uninitialized data. Namely, the control path I added for the safety check in `arena_prof_info_get` neglected to set `prof_info->alloc_tctx` when the check fails, resulting in `prof_info.alloc_tctx` being uninitialized [when it is read at the end of `prof_free`](`90176f8a87/include/jemalloc/internal/prof_inlines.h (L272)`).	2023-06-15 18:30:05 -07:00
Kevin Svetlitski	90176f8a87	Fix segfault in rb `_tree_remove` Static analysis flagged this. It's possible to segfault in the `_tree_remove` function generated by `rb_gen`, as `nodep` may still be `NULL` after the initial for loop. I can confirm from reviewing the fleetwide coredump data that this was in fact being hit in production, primarily through `tctx_tree_remove`, and much more rarely through `gctx_tree_remove`.	2023-06-07 14:48:41 -07:00
Qi Wang	86eb49b478	Fix the arena selection for oversized allocations. Use the per-arena oversize_threshold, instead of the global setting.	2023-06-06 15:03:13 -07:00
Kevin Svetlitski	6d4aa33753	Extract the calculation of psset heap assignment for an hpdata into a common function This is in preparation for upcoming changes I plan to make to this logic. Extracting it into a common function will make this easier and less error-prone, and cleans up the existing code regardless.	2023-05-31 11:44:04 -07:00
Qi Wang	d577e9b588	Explicitly cast to unsigned for MALLOCX_ARENA and _TCACHE defines.	2023-05-26 11:52:42 -07:00
Qi Wang	a2259f9fa6	Fix the include path of "jemalloc_internal_overrides.h".	2023-05-25 15:22:02 -07:00
Kevin Svetlitski	4e6f1e9208	Allow overriding `LG_PAGE` This is useful for our internal builds where we override the configuration in the header files generated by autoconf.	2023-05-17 13:55:38 -07:00
Kevin Svetlitski	3e2ba7a651	Remove dead stores detected by static analysis None of these are harmful, and they are almost certainly optimized away by the compiler. The motivation for fixing them anyway is that we'd like to enable static analysis as part of CI, and the first step towards that is resolving the warnings it produces at present.	2023-05-11 20:27:49 -07:00
Qi Wang	6ea8a7e928	Add config detection for JEMALLOC_HAVE_PTHREAD_SET_NAME_NP. and use it on the background thread name setting.	2023-05-11 09:10:57 -07:00
Kevin Svetlitski	70344a2d38	Make eligible functions `static` The codebase is already very disciplined in making any function which can be `static`, but there are a few that appear to have slipped through the cracks.	2023-05-08 15:00:02 -07:00
Kevin Svetlitski	6841110bd6	Make `edata_cmp_summary_comp` 30% faster `edata_cmp_summary_comp` is one of the very hottest functions, taking up 3% of all time spent inside Jemalloc. I noticed that all existing callsites rely only on the sign of the value returned by this function, so I came up with this equivalent branchless implementation which preserves this property. After empirical measurement, I have found that this implementation is 30% faster, therefore representing a 1% speed-up to the allocator as a whole. At @interwq's suggestion, I've applied the same optimization to `edata_esnead_comp` in case this function becomes hotter in the future.	2023-05-04 09:59:17 -07:00
Amaury Séchet	f2b28906e6	Some nits in cache_bin.h	2023-05-01 10:21:17 -07:00
guangli-dai	5f64ad60cd	Remove locked flag set in malloc_mutex_trylock As a hint flag of the lock, parameter locked should be set only when the lock is gained or freed.	2023-04-06 10:57:04 -07:00
Qi Wang	434a68e221	Disallow decay during reentrancy. Decay should not be triggered during reentrant calls (may cause lock order reversal / deadlocks). Added a delay_trigger flag to the tickers to bypass decay when rentrancy_level is not zero.	2023-04-05 10:16:37 -07:00
Qi Wang	e62aa478c7	Rearrange the bools in prof_tdata_t to save some bytes. This lowered the sizeof(prof_tdata_t) from 200 to 192 which is a round size class. Afterwards the tdata_t size remain unchanged with the last commit, which effectively inlined the storage of thread names for free.	2023-04-05 10:03:12 -07:00
Qi Wang	ce0b7ab6c8	Inline the storage for thread name in prof_tdata_t. The previous approach managed the thread name in a separate buffer, which causes races because the thread name update (triggered by new samples) can happen at the same time as prof dumping (which reads the thread names) -- these two operations are under separate locks to avoid blocking each other. Implemented the thread name storage as part of the tdata struct, which resolves the lifetime issue and also avoids internal alloc / dalloc during prof_sample.	2023-04-05 10:03:12 -07:00

1 2 3 4 5 ...

1602 Commits