server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
David Goldblatt	40672b0b78	Remove duplicate logging in malloc.	2020-06-16 10:33:55 -07:00
Jon Haslam	4aea743279	High Resolution Timestamps for Profiling	2020-06-15 12:12:49 -07:00
David Goldblatt	6cdac3c573	Tcache: Make flush fractions configurable.	2020-05-16 13:34:23 -07:00
David Goldblatt	ee72bf1cfd	Tcache: Add tcache gc delay option. This can reduce flushing frequency for small size classes.	2020-05-16 13:34:23 -07:00
David Goldblatt	d338dd45d7	Tcache: Make incremental gc bytes configurable.	2020-05-16 13:34:23 -07:00
David Goldblatt	181093173d	Tcache: make slot sizing configurable.	2020-05-16 13:34:23 -07:00
David Goldblatt	634afc4124	Tcache: Make size computation configurable.	2020-05-16 13:34:23 -07:00
David Goldblatt	5dead37a9d	Allow narenas:default. This can be useful when you know you want to override some lower-priority configuration setting with its default value, but don't know what that value would be.	2020-05-14 10:30:08 -07:00
Yinan Zhang	f533ab6da6	Add forking handling for stats	2020-05-11 15:35:06 -07:00
David Goldblatt	f1f8a75496	Let opt.zero propagate to core allocation. I.e. set dopts->zero early on if opt.zero is true, rather than leaving it set by the entry-point function (malloc, calloc, etc.) and then memsetting. This avoids situations where we zero once in the large-alloc pathway and then again via memset.	2020-05-04 12:36:45 -07:00
David Goldblatt	cd29ebefd0	Tcache: treat small and large cache bins uniformly	2020-04-14 15:20:19 -07:00
David Goldblatt	a13fbad374	Tcache: split up fast and slow path data.	2020-04-14 15:20:19 -07:00
David Goldblatt	79ae7f9211	Rtree: Remove the per-field accessors. We instead split things into "edata" and "metadata".	2020-04-10 13:12:47 -07:00
David Goldblatt	294b276fc7	PA: Parameterize emap. Move emap_global to arena. This lets us test the PA module without interfering with the global emap used by the real allocator (the one not under test).	2020-04-10 13:12:47 -07:00
Yinan Zhang	09cd79495f	Encapsulate buffer allocation failure in buffered writer	2020-04-01 09:41:20 -07:00
David T. Goldblatt	d936b46d3a	Add malloc_conf_2_conf_harder. This comes in handy when you're just a user of a canary system who wants to change settings set by the configuration system itself.	2020-03-31 06:25:08 -07:00
Yinan Zhang	2256ef8961	Add option to fetch system thread name on each prof sample	2020-03-24 21:39:57 -07:00
Yinan Zhang	ba783b3a0f	Remove prof -> thread_event dependency	2020-03-12 13:55:00 -07:00
Yinan Zhang	441d88d1c7	Rewrite profiling thread event	2020-03-12 13:55:00 -07:00
David Goldblatt	d701a085c2	Fast path: allow low-water mark changes. This lets us put more allocations on an "almost as fast" path after a flush. This results in around a 4% reduction in malloc cycles in prod workloads (corresponding to about a 0.1% reduction in overall cycles).	2020-03-12 11:54:19 -07:00
David Goldblatt	79f1ee2fc0	Move junking out of arena/tcache code. This is debug only and we keep it off the fast path. Moving it here simplifies the internal logic. This never tries to junk on regions that were shrunk via xallocx. I think this is fine for two reasons: (1) the shrunk-with-xallocx case is rare; (2) we don't always do that anyway before this diff (it depends on the opt settings and extent hooks in effect).	2020-03-12 11:54:19 -07:00
David T. Goldblatt	162c2bcf31	Background thread: take base as a parameter.	2020-02-18 11:22:09 -08:00
David T. Goldblatt	29436fa056	Break prof and tcache knowledge of b0.	2020-02-18 11:22:09 -08:00
David T. Goldblatt	a0c1f4ac57	Rtree: take the base allocator as a parameter. This facilitates better testing by avoiding mixing of the "real" base with the base used by the rtree under test.	2020-02-18 11:22:09 -08:00
David T. Goldblatt	7013716aaa	Emap: Take (and propagate) a zeroed parameter. Rtree needs this, and we should really treat them similarly.	2020-02-18 11:22:09 -08:00
David Goldblatt	7e6c8a7286	Emap: Standardize naming. Namespace everything under emap_, always specify what it is we're looking up (emap_lookup -> emap_edata_lookup), and use "ctx" over "info".	2020-02-17 10:50:51 -08:00
David Goldblatt	06e42090f7	Make jemalloc.c use the emap interface. While we're here, we'll also clean up some style nits.	2020-02-17 10:50:51 -08:00
David Goldblatt	f7d9c6c42d	Emap: Move in alloc_ctx lookup functionality.	2020-02-17 10:50:51 -08:00
David Goldblatt	9b5d105fc3	Emap: Move in iealloc. This is logically scoped to the emap.	2020-02-17 10:50:51 -08:00
David Goldblatt	01f255161c	Add emap, for tracking extent locking.	2020-02-17 10:50:51 -08:00
Yinan Zhang	9cac3fa8f5	Encapsulate buffer allocation in buffered writer	2020-02-04 13:21:58 -08:00
Yinan Zhang	bdc08b5158	Better naming buffered writer	2020-02-04 13:21:58 -08:00
Qi Wang	e896522616	Abbreviate thread-event to te.	2020-02-04 13:07:05 -08:00
Qi Wang	5e500523a0	Remove thread_event_boot().	2020-02-04 00:18:15 -08:00
Qi Wang	97dd79db6c	Implement deallocation events. Make the event module accept two event types, and pass around the event context. Use bytes-based events to trigger tcache GC on deallocation, and get rid of the tcache ticker.	2020-02-04 00:18:15 -08:00
Qi Wang	974222c626	Add safety check on sdallocx slow / sampled path.	2020-01-31 00:04:22 -08:00
Qi Wang	88d9eca848	Enforce page alignment for sampled allocations. This allows sampled allocations to be checked through alignment, and therefore enables sized deallocation regardless of cache_oblivious.	2020-01-31 00:04:22 -08:00
Qi Wang	88b0e03a4e	Implement opt.stats_interval and the _opts options. Add options stats_interval and stats_interval_opts to allow interval based stats printing. This provides an easy way to collect stats without code changes, because opt.stats_print may not work (some binaries never exit).	2020-01-29 09:57:55 -08:00
Yinan Zhang	f81341a48b	Fallback to unbuffered printing if OOM	2020-01-21 17:09:44 -08:00
Qi Wang	dab81bd315	Rework and fix the assertions on malloc fastpath. The first half of the malloc fastpath may execute before malloc_init. Make the assertions work in that case.	2020-01-14 15:00:41 -08:00
Yinan Zhang	2b604a3016	Record request size in prof recent entries	2020-01-10 12:01:01 -08:00
Yinan Zhang	40a391408c	Define constructor for buffered writer argument	2020-01-10 11:59:02 -08:00
Yinan Zhang	6d8e616902	Make buffered writer an independent module	2020-01-10 11:59:02 -08:00
Yinan Zhang	6b6b4709b3	Unify buffered writer naming	2020-01-09 14:31:31 -08:00
Yinan Zhang	9a60cf54ec	Last-N profiling mode	2019-12-30 15:58:57 -08:00
David Goldblatt	c8dae890c8	Extent -> Ehooks: Move over default hooks.	2019-12-20 10:18:40 -08:00
Qi Wang	d5031ea824	Allow dallocx and sdallocx after tsd destruction. After a thread turns into purgatory / reincarnated state, still allow dallocx and sdallocx to function normally.	2019-12-19 11:17:03 -08:00
Qi Wang	dd649c9485	Optimize away the tsd_fast() check on fastpath. Fold the tsd_state check onto the event threshold check. The fast threshold is set to 0 when tsd switches to non-nominal. The fast_threshold can be reset by remote threads, to reflect the non-nominal tsd state change.	2019-12-11 23:44:20 -08:00
Yinan Zhang	055478cca8	Threshold is no longer updated before prof_realloc()	2019-12-10 16:31:05 -08:00
Yinan Zhang	6945371778	Change tsdn to tsd for profiling code path	2019-11-22 16:31:56 -08:00
Yinan Zhang	b55419f9b9	Restructure profiling. Develop new data structure and code logic for holding profiling related information stored in the extent that may be needed after the extent is released, which in particular is the case for the reallocation code path (e.g. in `rallocx()` and `xallocx()`). The data structure is a generalization of `prof_tctx_t`: we previously only copied out the `prof_tctx` before the extent is released, but we may be in need of additional fields. Currently the only additional field is the allocation time field, but there may be more fields in the future. The restructuring also resolved a bug: `prof_realloc()` mistakenly passed the new `ptr` to `prof_free_sampled_object()`, but passing in the `old_ptr` would crash because it's already been released. Now the essential profiling information is collectively copied out early and safely passed to `prof_free_sampled_object()` after the extent is released.	2019-11-22 16:31:56 -08:00
Qi Wang	cb1a1f4ada	Remove the unnecessary alloc_ctx on free_fastpath.	2019-11-16 13:41:13 -08:00
Qi Wang	7160617107	Add branch hints to free_fastpath. Explicitly mark the non-slab case unlikely. Previously there were jumps in the common case.	2019-11-16 13:41:13 -08:00
Qi Wang	a787d2f5b3	Prefer getaffinity() to detect number of CPUs.	2019-11-15 16:24:38 -08:00
Qi Wang	836d7a7e69	Check for large size first in the uncommon case of malloc. Larger sizes are not that uncommon compared to !tsd_fast.	2019-11-11 13:30:20 -08:00
Yinan Zhang	97f93fa0f2	Pull tcache GC events into thread event handler	2019-11-04 16:07:56 -08:00
Yinan Zhang	198f02e797	Pull prof_accumbytes into thread event handler	2019-11-04 15:21:16 -08:00
Yinan Zhang	152c0ef954	Build a general purpose thread event handler	2019-11-04 11:15:50 -08:00
David T. Goldblatt	de81a4eada	Add stats counters for number of zero reallocs	2019-10-29 17:48:44 -07:00
David T. Goldblatt	9cfa805947	Realloc: Make behavior of realloc(ptr, 0) configurable.	2019-10-29 17:48:44 -07:00
David T. Goldblatt	ee961c2310	Merge realloc and rallocx pathways.	2019-10-29 17:48:44 -07:00
Yinan Zhang	05681e387a	Optimize cache_bin_alloc_easy for malloc fast path. `tcache_bin_info` is not accessed on malloc fast path but the compiler reserves a register for it, as well as an additional register for `tcache_bin_info[ind].stack_size`. The optimization gets rid of the need for the two registers.	2019-10-21 16:43:45 -07:00
David T. Goldblatt	723ccc6c27	Extents: Split out extent struct.	2019-09-23 23:06:27 -07:00
Yinan Zhang	adce29c885	Optimize for prof_active off. Move the handling of `prof_active` off case completely to slow path, so as to reduce register pressure on malloc fast path.	2019-08-27 14:48:56 -07:00
Yinan Zhang	49e6fbce78	Always adjust thread_(de)allocated	2019-08-26 11:56:41 -07:00
Yinan Zhang	9e031c1d11	Bug fix for prof_active switch. The bug is subtle but critical: if application performs the following three actions in sequence: (a) turn `prof_active` off, (b) make at least one allocation that triggers the malloc slow path via the `if (unlikely(bytes_until_sample < 0))` path, and (c) turn `prof_active` back on, then the application would never get another sample (until a very very long time later). The fix is to properly reset `bytes_until_sample` rather than throwing it all the way to `SSIZE_MAX`. A minor side change is to call `prof_active_get_unlocked()` rather than directly grabbing the `prof_active` variable - it is the very reason why we defined the `prof_active_get_unlocked()` function.	2019-08-22 13:00:10 -07:00
Qi Wang	7599c82d48	Redesign the cache bin metadata for fast path. Implement the pointer-based metadata for tcache bins: 3 pointers are maintained to represent each bin; 2 of the pointers are compressed on 64-bit; is_full / is_empty done through pointer comparison. Compared to the previous counter based design: fast-path speed up ~15% in benchmarks; direct pointer comparison and de-reference; no need to access tcache_bin_info in common case.	2019-08-19 12:21:44 -07:00
Yinan Zhang	28ed9b9a51	Buffer stats printing. Without buffering, `malloc_stats_print` would invoke the write back call (which could mean an expensive `malloc_write_fd` call) for every single `printf` (including printing each line break and each leading tab/space for indentation).	2019-08-13 09:40:11 -07:00
Qi Wang	85f0cb2d0c	Add indent to individual options for confirm_conf.	2019-07-25 17:00:31 -07:00
Qi Wang	f32f23d6cc	Fix posix_memalign with input size 0. Return a valid pointer instead of failed assertion.	2019-07-18 00:43:23 -07:00
Yinan Zhang	c92ac30601	Add confirm_conf option. If the confirm_conf option is set, when the program starts, each of the four malloc_conf strings will be printed, and each option will be printed when being set.	2019-05-22 09:38:39 -07:00
Yinan Zhang	13e88ae970	Fix assert in free fastpath. rtree_szind_slab_read_fast() may not have initialized alloc_ctx.szind unless its return value has been confirmed to be true.	2019-05-15 09:42:52 -07:00
Yinan Zhang	259b15dec5	Improve macro readability in malloc_conf_init. Define more readable macros than yes and no.	2019-05-08 14:15:03 -07:00
David Goldblatt	33e1dad680	Safety checks: Add a redzoning feature.	2019-04-15 16:48:12 -07:00
mgrice	d3d7a8ef09	remove compare and branch in fast path for c++ operator delete[]. Summary: sdallocx is checking a flag that will never be set (at least in the provided C++ destructor implementation). This branch will probably only rarely be mispredicted; however, the change removes two instructions in sdallocx and one at the callsite (to zero out flags).	2019-04-08 10:59:05 -07:00
Qi Wang	0101d5ebef	Avoid check_min for opt_lg_extent_max_active_fit. This fixes a compiler warning.	2019-03-29 15:56:53 -07:00
Qi Wang	788a657cee	Allow low values of oversize_threshold to disable the feature. We should allow a way to easily disable the feature (e.g. not reserving the arena id at all).	2019-03-29 11:33:00 -07:00
Qi Wang	e3db480f6f	Rename huge_threshold to oversize_threshold. The keyword huge tends to remind people of huge pages which is not relevant to the feature.	2019-01-25 13:15:45 -08:00
Qi Wang	350809dc5d	Set huge_threshold to 8M by default. This feature uses a dedicated arena to handle huge requests, which significantly improves VM fragmentation. In production workload we tested it often reduces VM size by >30%.	2019-01-24 13:29:23 -08:00
Qi Wang	7a815c1b7c	Un-experimental the huge_threshold feature.	2019-01-16 12:28:57 -08:00
Qi Wang	bbe8e6a909	Avoid creating bg thds for huge arena alone. For low arena count settings, the huge threshold feature may trigger an unwanted bg thd creation. Given that the huge arena does eager purging by default, bypass bg thd creation when initializing the huge arena.	2019-01-15 16:00:34 -08:00
Qi Wang	98b56ab23d	Store the bin shard selection in TSD. This avoids having to choose bin shard on the fly, also will allow flexible bin binding for each thread.	2018-12-03 17:17:03 -08:00
Qi Wang	3f9f2833f6	Add opt.bin_shards to specify number of bin shards. The option uses the same format as "slab_sizes" to specify number of shards for each bin size.	2018-12-03 17:17:03 -08:00
Qi Wang	37b8913925	Add support for sharded bins within an arena. This makes it possible to have multiple set of bins in an arena, which improves arena scalability because the bins (especially the small ones) are always the limiting factor in production workload. A bin shard is picked on allocation; each extent tracks the bin shard id for deallocation. The shard size will be determined using runtime options.	2018-12-03 17:17:03 -08:00
Dave Watson	794e29c0ab	Add a free() and sdallocx(where flags=0) fastpath. Add unsized and sized deallocation fastpaths. Similar to the malloc() fastpath, this removes all frame manipulation for the majority of free() calls. The performance advantages here are less than that of the malloc() fastpath, but from prod tests seems to still be half a percent or so of improvement. Stats and sampling are both supported (sdallocx needs a sampling check, for rtree lookups slab will only be set for unsampled objects). We don't support flush; any flush requests go to the slowpath.	2018-11-12 13:20:37 -08:00
Dave Watson	0f8313659e	malloc: Add a fastpath. This diff adds a fastpath that assumes size <= SC_LOOKUP_MAXCLASS, and that we hit tcache. If either of these is false, we fall back to the previous codepath (renamed 'malloc_default'). Crucially, we only tail call malloc_default, and with the same kind and number of arguments, so that both clang and gcc tail-calling will kick in - therefore malloc() gets treated as a leaf function, and there are no caller-saved registers. Previously malloc() contained 5 caller saved registers on x64, resulting in at least 10 extra memory-movement instructions. In microbenchmarks this results in up to ~10% improvement in malloc() fastpath. In real programs, this is a ~1% CPU and latency improvement overall.	2018-10-18 08:32:19 -07:00
Dave Watson	ac34afb403	drop bump_empty_alloc option. Size class lookup support used instead.	2018-10-17 08:50:58 -07:00
gnzlbg	01e2a38e5a	Make `smallocx` symbol name depend on the `JEMALLOC_VERSION_GID`. This commit concatenates the `JEMALLOC_VERSION_GID` to the `smallocx` symbol name, such that the symbol ends up exported as `smallocx_{git_hash}`.	2018-10-17 07:12:28 -07:00
gnzlbg	741fca1bb7	Hide smallocx even when enabled from the library API. The experimental `smallocx` API is not exposed via header files, requiring the users to peek at `jemalloc`'s source code to manually add the external declarations to their own programs. This should reinforce that `smallocx` is experimental, and that `jemalloc` does not offer any kind of backwards compatibility or ABI guarantees for it.	2018-10-17 07:12:28 -07:00
gnzlbg	08260a6b94	Add experimental API: smallocx_return_t smallocx(size, flags). --- Motivation: This new experimental memory-allocation API returns a pointer to the allocation as well as the usable size of the allocated memory region. The `s` in `smallocx` stands for `sized`-`mallocx`, attempting to convey that this API returns the size of the allocated memory region. It should allow C++ P0901r0 [0] and Rust Alloc::alloc_excess to make use of it. The main purpose of these APIs is to improve telemetry. It is more accurate to register `smallocx(size, flags)` than `smallocx(nallocx(size), flags)`, for example. The latter will always line up perfectly with the existing size classes, causing a loss of telemetry information about the internal fragmentation induced by potentially poor size-class choices. Instrumenting `nallocx` does not help much since user code can cache its result and use it repeatedly. --- Implementation: The implementation adds a new `usize` option to `static_opts_s` and a `usize` variable to `dynamic_opts_s`. These are then used to cache the result of `sz_index2size` and similar functions in the code paths in which they are unconditionally invoked. In the code-paths in which these functions are not unconditionally invoked, `smallocx` calls, as opposed to `mallocx`, these functions explicitly. --- [0]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0901r0.html	2018-10-17 07:12:28 -07:00
Dave Watson	325e3305fc	remove malloc_init() off the fastpath	2018-10-15 10:11:08 -07:00
David Goldblatt	88771fa013	Bootstrapping: don't overwrite opt_prof_prefix.	2018-09-12 17:06:06 -07:00
David Goldblatt	e8ec9528ab	Allow the use of readlinkat over readlink. This can be useful in situations where readlink is disallowed.	2018-08-03 14:04:32 -07:00
Tyler Etzel	b664bd7935	Add logging for sampled allocations: prof_opt_log flag starts logging automatically at runtime; prof_log_{start,stop} mallctl for manual control.	2018-08-01 13:27:11 -07:00
David Goldblatt	41b7372ead	TSD: Add fork support to tsd_nominal_tsds. In case of multithreaded fork, we want to leave the child in a reasonable state, in which tsd_nominal_tsds is either empty or contains only the forking thread.	2018-07-26 17:22:25 -07:00
David Goldblatt	3aba072cef	SC: Remove global data. The global data is mostly only used at initialization, or for easy access to values we could compute statically. Instead of consuming that space (and risking TLB misses), we can just pass around a pointer to stack data during bootstrapping.	2018-07-23 13:37:08 -07:00
Qi Wang	4bc48718b2	Tolerate experimental features for abort_conf. Do not abort on unrecognized experimental options. This helps us test experimental features with abort_conf enabled.	2018-07-17 20:40:32 -07:00
David Goldblatt	55e5cc1341	SC: Make some key size classes static. The largest small class, smallest large class, and largest large class may all be needed down fast paths; to avoid the risk of touching another cache line, we can make them available as constants.	2018-07-12 20:53:06 -07:00
David T. Goldblatt	5112d9e5fd	Add MALLOC_CONF parsing for dynamic slab sizes. This actually enables us to change the values.	2018-07-12 20:53:06 -07:00
David T. Goldblatt	4610ffa942	Bootstrapping: Parse MALLOC_CONF before using slab sizes. I.e., parse before booting the bin module or sz module. This lets us tweak size class settings before committing to them by letting them leak into other modules. This commit does not actually do any tweaking of the size classes; it just changes bootstrapping order; this may help bisecting any bootstrapping failures on poorly-tested architectures.	2018-07-12 20:53:06 -07:00

479 Commits