server-skynet-source-3rd-jemalloc

project-base/server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Qi Wang	c8209150f9	Switch from opt.lg_tcache_max to opt.tcache_max. Though for convenience, keep parsing lg_tcache_max.	2020-10-22 20:40:41 -07:00
Qi Wang	3de19ba401	Eagerly detect double free and sized dealloc bugs for large sizes.	2020-10-15 10:03:16 -07:00
Qi Wang	a9aa6f6d0f	Fix the alloc_ctx check in free_fastpath. The sanity check requires a functional TSD, which free_fastpath only guarantees after the threshold branch. Move the check function to afterwards.	2020-10-12 19:02:27 -07:00
David Goldblatt	b971f7c4dd	Add "default" option to slab sizes. This comes in handy when overriding earlier settings to test alternate ones. We don't really include tests for this, but I claim that's OK here: - It's fairly straightforward - It's fairly hard to test well - This entire code path is undocumented and mostly for our internal experimentation in the first place. - I tested manually.	2020-10-07 12:54:29 -07:00
David Goldblatt	ab274a23b9	Add narenas_ratio. This allows setting arenas per cpu dynamically, rather than forcing the user to know the number of CPUs in advance if they want a particular CPU/space tradeoff.	2020-08-12 16:41:57 -07:00
David Goldblatt	eaed1e39be	Add sized-delete size-checking functionality. The existing checks are good at finding such issues (on tcache flush), but not so good at pinpointing them. Debug mode can find them, but sometimes debug mode slows down a program so much that hard-to-hit bugs can take a long time to crash. This commit adds functionality to keep programs mostly on their fast paths, while also checking every sized delete argument they get.	2020-08-05 19:34:05 -07:00
David Goldblatt	60993697d8	Prof: Add prof_unbias. This gives more accurate attribution of bytes and counts to stack traces, without introducing backwards incompatibilities in heap-profile parsing tools. We track the ideal reported (to the end user) number of bytes more carefully inside core jemalloc. When dumping heap profiles, instead of outputting our counts directly, we output counts that will cause parsing tools to give a result close to the value we want. We retain the old version as an opt setting, to let users who are tracking values on a per-component basis keep their metrics stable until they decide to switch.	2020-08-05 18:33:55 -07:00
Yinan Zhang	978f830ee3	Add batch allocation API	2020-07-31 09:16:50 -07:00
David Carlier	00f06c9beb	enabling mpss on solaris/illumos. Reusing the Linux configuration as much as possible, aligning the address range to HUGEPAGE.	2020-07-06 09:59:10 -07:00
Yinan Zhang	d460333efb	Improve naming for prof system thread name option	2020-06-24 14:32:01 -07:00
Yinan Zhang	24bbf376ce	Unify arena flag reading and selection	2020-06-19 11:06:05 -07:00
Yinan Zhang	e128b170a0	Do not fallback to auto arena when manual arena is requested	2020-06-19 11:06:05 -07:00
Yinan Zhang	95a59d2f72	Unify tcache flag reading and selection	2020-06-19 11:06:05 -07:00
Yinan Zhang	4b0c008489	Unify zero flag reading and setting	2020-06-19 11:06:05 -07:00
Yinan Zhang	2a84f9b8fc	Unify alignment flag reading and computation	2020-06-19 11:06:05 -07:00
Yinan Zhang	40fa6674a9	Fix prof timestamp conf reading	2020-06-17 16:02:51 -07:00
David Goldblatt	40672b0b78	Remove duplicate logging in malloc.	2020-06-16 10:33:55 -07:00
Jon Haslam	4aea743279	High Resolution Timestamps for Profiling	2020-06-15 12:12:49 -07:00
David Goldblatt	6cdac3c573	Tcache: Make flush fractions configurable.	2020-05-16 13:34:23 -07:00
David Goldblatt	ee72bf1cfd	Tcache: Add tcache gc delay option. This can reduce flushing frequency for small size classes.	2020-05-16 13:34:23 -07:00
David Goldblatt	d338dd45d7	Tcache: Make incremental gc bytes configurable.	2020-05-16 13:34:23 -07:00
David Goldblatt	181093173d	Tcache: make slot sizing configurable.	2020-05-16 13:34:23 -07:00
David Goldblatt	634afc4124	Tcache: Make size computation configurable.	2020-05-16 13:34:23 -07:00
David Goldblatt	5dead37a9d	Allow narenas:default. This can be useful when you know you want to override some lower-priority configuration setting with its default value, but don't know what that value would be.	2020-05-14 10:30:08 -07:00
Yinan Zhang	f533ab6da6	Add forking handling for stats	2020-05-11 15:35:06 -07:00
David Goldblatt	f1f8a75496	Let opt.zero propagate to core allocation. I.e. set dopts->zero early on if opt.zero is true, rather than leaving it set by the entry-point function (malloc, calloc, etc.) and then memsetting. This avoids situations where we zero once in the large-alloc pathway and then again via memset.	2020-05-04 12:36:45 -07:00
David Goldblatt	cd29ebefd0	Tcache: treat small and large cache bins uniformly	2020-04-14 15:20:19 -07:00
David Goldblatt	a13fbad374	Tcache: split up fast and slow path data.	2020-04-14 15:20:19 -07:00
David Goldblatt	79ae7f9211	Rtree: Remove the per-field accessors. We instead split things into "edata" and "metadata".	2020-04-10 13:12:47 -07:00
David Goldblatt	294b276fc7	PA: Parameterize emap. Move emap_global to arena. This lets us test the PA module without interfering with the global emap used by the real allocator (the one not under test).	2020-04-10 13:12:47 -07:00
Yinan Zhang	09cd79495f	Encapsulate buffer allocation failure in buffered writer	2020-04-01 09:41:20 -07:00
David T. Goldblatt	d936b46d3a	Add malloc_conf_2_conf_harder. This comes in handy when you're just a user of a canary system who wants to change settings set by the configuration system itself.	2020-03-31 06:25:08 -07:00
Yinan Zhang	2256ef8961	Add option to fetch system thread name on each prof sample	2020-03-24 21:39:57 -07:00
Yinan Zhang	ba783b3a0f	Remove prof -> thread_event dependency	2020-03-12 13:55:00 -07:00
Yinan Zhang	441d88d1c7	Rewrite profiling thread event	2020-03-12 13:55:00 -07:00
David Goldblatt	d701a085c2	Fast path: allow low-water mark changes. This lets us put more allocations on an "almost as fast" path after a flush. This results in around a 4% reduction in malloc cycles in prod workloads (corresponding to about a 0.1% reduction in overall cycles).	2020-03-12 11:54:19 -07:00
David Goldblatt	79f1ee2fc0	Move junking out of arena/tcache code. This is debug only and we keep it off the fast path. Moving it here simplifies the internal logic. This never tries to junk on regions that were shrunk via xallocx. I think this is fine for two reasons: - The shrunk-with-xallocx case is rare. - We don't always do that anyway before this diff (it depends on the opt settings and extent hooks in effect).	2020-03-12 11:54:19 -07:00
David T. Goldblatt	162c2bcf31	Background thread: take base as a parameter.	2020-02-18 11:22:09 -08:00
David T. Goldblatt	29436fa056	Break prof and tcache knowledge of b0.	2020-02-18 11:22:09 -08:00
David T. Goldblatt	a0c1f4ac57	Rtree: take the base allocator as a parameter. This facilitates better testing by avoiding mixing of the "real" base with the base used by the rtree under test.	2020-02-18 11:22:09 -08:00
David T. Goldblatt	7013716aaa	Emap: Take (and propagate) a zeroed parameter. Rtree needs this, and we should really treat them similarly.	2020-02-18 11:22:09 -08:00
David Goldblatt	7e6c8a7286	Emap: Standardize naming. Namespace everything under emap_, always specify what it is we're looking up (emap_lookup -> emap_edata_lookup), and use "ctx" over "info".	2020-02-17 10:50:51 -08:00
David Goldblatt	06e42090f7	Make jemalloc.c use the emap interface. While we're here, we'll also clean up some style nits.	2020-02-17 10:50:51 -08:00
David Goldblatt	f7d9c6c42d	Emap: Move in alloc_ctx lookup functionality.	2020-02-17 10:50:51 -08:00
David Goldblatt	9b5d105fc3	Emap: Move in iealloc. This is logically scoped to the emap.	2020-02-17 10:50:51 -08:00
David Goldblatt	01f255161c	Add emap, for tracking extent locking.	2020-02-17 10:50:51 -08:00
Yinan Zhang	9cac3fa8f5	Encapsulate buffer allocation in buffered writer	2020-02-04 13:21:58 -08:00
Yinan Zhang	bdc08b5158	Better naming buffered writer	2020-02-04 13:21:58 -08:00
Qi Wang	e896522616	Abbreviate thread-event to te.	2020-02-04 13:07:05 -08:00
Qi Wang	5e500523a0	Remove thread_event_boot().	2020-02-04 00:18:15 -08:00
Qi Wang	97dd79db6c	Implement deallocation events. Make the event module accept two event types, and pass around the event context. Use bytes-based events to trigger tcache GC on deallocation, and get rid of the tcache ticker.	2020-02-04 00:18:15 -08:00
Qi Wang	974222c626	Add safety check on sdallocx slow / sampled path.	2020-01-31 00:04:22 -08:00
Qi Wang	88d9eca848	Enforce page alignment for sampled allocations. This allows sampled allocations to be checked through alignment, therefore enabling sized deallocation regardless of cache_oblivious.	2020-01-31 00:04:22 -08:00
Qi Wang	88b0e03a4e	Implement opt.stats_interval and the _opts options. Add options stats_interval and stats_interval_opts to allow interval based stats printing. This provides an easy way to collect stats without code changes, because opt.stats_print may not work (some binaries never exit).	2020-01-29 09:57:55 -08:00
Yinan Zhang	f81341a48b	Fallback to unbuffered printing if OOM	2020-01-21 17:09:44 -08:00
Qi Wang	dab81bd315	Rework and fix the assertions on malloc fastpath. The first half of the malloc fastpath may execute before malloc_init. Make the assertions work in that case.	2020-01-14 15:00:41 -08:00
Yinan Zhang	2b604a3016	Record request size in prof recent entries	2020-01-10 12:01:01 -08:00
Yinan Zhang	40a391408c	Define constructor for buffered writer argument	2020-01-10 11:59:02 -08:00
Yinan Zhang	6d8e616902	Make buffered writer an independent module	2020-01-10 11:59:02 -08:00
Yinan Zhang	6b6b4709b3	Unify buffered writer naming	2020-01-09 14:31:31 -08:00
Yinan Zhang	9a60cf54ec	Last-N profiling mode	2019-12-30 15:58:57 -08:00
David Goldblatt	c8dae890c8	Extent -> Ehooks: Move over default hooks.	2019-12-20 10:18:40 -08:00
Qi Wang	d5031ea824	Allow dallocx and sdallocx after tsd destruction. After a thread turns into purgatory / reincarnated state, still allow dallocx and sdallocx to function normally.	2019-12-19 11:17:03 -08:00
Qi Wang	dd649c9485	Optimize away the tsd_fast() check on fastpath. Fold the tsd_state check onto the event threshold check. The fast threshold is set to 0 when tsd switches to non-nominal. The fast_threshold can be reset by remote threads, to reflect the non-nominal tsd state change.	2019-12-11 23:44:20 -08:00
Yinan Zhang	055478cca8	Threshold is no longer updated before prof_realloc()	2019-12-10 16:31:05 -08:00
Yinan Zhang	6945371778	Change tsdn to tsd for profiling code path	2019-11-22 16:31:56 -08:00
Yinan Zhang	b55419f9b9	Restructure profiling. Develop new data structure and code logic for holding profiling-related information stored in the extent that may be needed after the extent is released, which in particular is the case for the reallocation code path (e.g. in `rallocx()` and `xallocx()`). The data structure is a generalization of `prof_tctx_t`: we previously only copied out the `prof_tctx` before the extent is released, but we may need additional fields. Currently the only additional field is the allocation time field, but there may be more fields in the future. The restructuring also resolved a bug: `prof_realloc()` mistakenly passed the new `ptr` to `prof_free_sampled_object()`, but passing in the `old_ptr` would crash because it's already been released. Now the essential profiling information is collectively copied out early and safely passed to `prof_free_sampled_object()` after the extent is released.	2019-11-22 16:31:56 -08:00
Qi Wang	cb1a1f4ada	Remove the unnecessary alloc_ctx on free_fastpath.	2019-11-16 13:41:13 -08:00
Qi Wang	7160617107	Add branch hints to free_fastpath. Explicitly mark the non-slab case unlikely. Previously there were jumps in the common case.	2019-11-16 13:41:13 -08:00
Qi Wang	a787d2f5b3	Prefer getaffinity() to detect number of CPUs.	2019-11-15 16:24:38 -08:00
Qi Wang	836d7a7e69	Check for large size first in the uncommon case of malloc. Larger sizes are not that uncommon compared to !tsd_fast.	2019-11-11 13:30:20 -08:00
Yinan Zhang	97f93fa0f2	Pull tcache GC events into thread event handler	2019-11-04 16:07:56 -08:00
Yinan Zhang	198f02e797	Pull prof_accumbytes into thread event handler	2019-11-04 15:21:16 -08:00
Yinan Zhang	152c0ef954	Build a general purpose thread event handler	2019-11-04 11:15:50 -08:00
David T. Goldblatt	de81a4eada	Add stats counters for number of zero reallocs	2019-10-29 17:48:44 -07:00
David T. Goldblatt	9cfa805947	Realloc: Make behavior of realloc(ptr, 0) configurable.	2019-10-29 17:48:44 -07:00
David T. Goldblatt	ee961c2310	Merge realloc and rallocx pathways.	2019-10-29 17:48:44 -07:00
Yinan Zhang	05681e387a	Optimize cache_bin_alloc_easy for malloc fast path. `tcache_bin_info` is not accessed on the malloc fast path, but the compiler reserves a register for it, as well as an additional register for `tcache_bin_info[ind].stack_size`. The optimization gets rid of the need for the two registers.	2019-10-21 16:43:45 -07:00
David T. Goldblatt	723ccc6c27	Extents: Split out extent struct.	2019-09-23 23:06:27 -07:00
Yinan Zhang	adce29c885	Optimize for prof_active off. Move the handling of the `prof_active` off case completely to the slow path, so as to reduce register pressure on the malloc fast path.	2019-08-27 14:48:56 -07:00
Yinan Zhang	49e6fbce78	Always adjust thread_(de)allocated	2019-08-26 11:56:41 -07:00
Yinan Zhang	9e031c1d11	Bug fix for prof_active switch. The bug is subtle but critical: if the application performs the following three actions in sequence: (a) turn `prof_active` off, (b) make at least one allocation that triggers the malloc slow path via the `if (unlikely(bytes_until_sample < 0))` path, and (c) turn `prof_active` back on, then the application would never get another sample (until a very, very long time later). The fix is to properly reset `bytes_until_sample` rather than throwing it all the way to `SSIZE_MAX`. A minor side change is to call `prof_active_get_unlocked()` rather than directly grabbing the `prof_active` variable; that is the very reason we defined the `prof_active_get_unlocked()` function.	2019-08-22 13:00:10 -07:00
Qi Wang	7599c82d48	Redesign the cache bin metadata for fast path. Implement the pointer-based metadata for tcache bins -- - 3 pointers are maintained to represent each bin; - 2 of the pointers are compressed on 64-bit; - is_full / is_empty done through pointer comparison; Comparing to the previous counter based design -- - fast-path speed up ~15% in benchmarks - direct pointer comparison and de-reference - no need to access tcache_bin_info in common case	2019-08-19 12:21:44 -07:00
Yinan Zhang	28ed9b9a51	Buffer stats printing. Without buffering, `malloc_stats_print` would invoke the write-back call (which could mean an expensive `malloc_write_fd` call) for every single `printf` (including printing each line break and each leading tab/space for indentation).	2019-08-13 09:40:11 -07:00
Qi Wang	85f0cb2d0c	Add indent to individual options for confirm_conf.	2019-07-25 17:00:31 -07:00
Qi Wang	f32f23d6cc	Fix posix_memalign with input size 0. Return a valid pointer instead of failed assertion.	2019-07-18 00:43:23 -07:00
Yinan Zhang	c92ac30601	Add confirm_conf option. If the confirm_conf option is set, when the program starts, each of the four malloc_conf strings will be printed, and each option will be printed when being set.	2019-05-22 09:38:39 -07:00
Yinan Zhang	13e88ae970	Fix assert in free fastpath. rtree_szind_slab_read_fast() may not have initialized alloc_ctx.szind unless its return value has been confirmed to be true.	2019-05-15 09:42:52 -07:00
Yinan Zhang	259b15dec5	Improve macro readability in malloc_conf_init. Define more readable macros than yes and no.	2019-05-08 14:15:03 -07:00
David Goldblatt	33e1dad680	Safety checks: Add a redzoning feature.	2019-04-15 16:48:12 -07:00
mgrice	d3d7a8ef09	remove compare and branch in fast path for c++ operator delete[]. Summary: sdallocx is checking a flag that will never be set (at least in the provided C++ destructor implementation). This branch will probably only rarely be mispredicted; however, removing it saves two instructions in sdallocx and one at the callsite (to zero out flags).	2019-04-08 10:59:05 -07:00
Qi Wang	0101d5ebef	Avoid check_min for opt_lg_extent_max_active_fit. This fixes a compiler warning.	2019-03-29 15:56:53 -07:00
Qi Wang	788a657cee	Allow low values of oversize_threshold to disable the feature. We should allow a way to easily disable the feature (e.g. not reserving the arena id at all).	2019-03-29 11:33:00 -07:00
Qi Wang	e3db480f6f	Rename huge_threshold to oversize_threshold. The keyword huge tends to remind people of huge pages, which are not relevant to the feature.	2019-01-25 13:15:45 -08:00
Qi Wang	350809dc5d	Set huge_threshold to 8M by default. This feature uses a dedicated arena to handle huge requests, which significantly improves VM fragmentation. In the production workloads we tested, it often reduces VM size by >30%.	2019-01-24 13:29:23 -08:00
Qi Wang	7a815c1b7c	Un-experimental the huge_threshold feature.	2019-01-16 12:28:57 -08:00
Qi Wang	bbe8e6a909	Avoid creating bg thds for huge arena alone. For low arena count settings, the huge threshold feature may trigger an unwanted bg thd creation. Given that the huge arena does eager purging by default, bypass bg thd creation when initializing the huge arena.	2019-01-15 16:00:34 -08:00
Qi Wang	98b56ab23d	Store the bin shard selection in TSD. This avoids having to choose bin shard on the fly, also will allow flexible bin binding for each thread.	2018-12-03 17:17:03 -08:00
Qi Wang	3f9f2833f6	Add opt.bin_shards to specify number of bin shards. The option uses the same format as "slab_sizes" to specify number of shards for each bin size.	2018-12-03 17:17:03 -08:00
Qi Wang	37b8913925	Add support for sharded bins within an arena. This makes it possible to have multiple sets of bins in an arena, which improves arena scalability because the bins (especially the small ones) are always the limiting factor in production workloads. A bin shard is picked on allocation; each extent tracks the bin shard id for deallocation. The shard size will be determined using runtime options.	2018-12-03 17:17:03 -08:00


495 Commits