server-skynet-source-3rd-jemalloc

project-base/server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Qi Wang	1a0503367b	Revert "Refactor profiling" This reverts commit 0b462407ae84a62b3c097f0e9f18df487a47d9a7.	2019-07-29 14:10:15 -07:00
Yinan Zhang	0b462407ae	Refactor profiling Refactored core profiling codebase into two logical parts: (a) `prof_data.c`: core internal data structure managing & dumping; (b) `prof.c`: mutexes & outward-facing APIs. Some internal functions had to be exposed out, but there are not that many of them if the modularization is (hopefully) clean enough.	2019-07-29 13:55:00 -07:00
Yinan Zhang	7618b0b8e4	Refactor prof log `prof.c` is growing too long, so trying to modularize it. There are a few internal functions that had to be exposed but I think it is a fair trade-off.	2019-07-29 13:55:00 -07:00
Qi Wang	85f0cb2d0c	Add indent to individual options for confirm_conf.	2019-07-25 17:00:31 -07:00
Qi Wang	bc0998a905	Invoke arena_dalloc_promoted() properly w/o tcache. When tcache was disabled, the dalloc promoted case was missing.	2019-07-24 18:30:54 -07:00
Qi Wang	1d148f353a	Optimize max_active_fit in first_fit. Stop scanning once reached the first max_active_fit size.	2019-07-24 11:28:45 -07:00
Qi Wang	4e36ce34c1	Track the leaked VM space via the abandoned_vm counter. The counter is 0 unless metadata allocation failed (indicates OOM), and is mainly for sanity checking.	2019-07-24 11:24:22 -07:00
Qi Wang	42807fcd9e	extent_dalloc instead of leak when register fails. extent_register may only fail if the underlying extent and region got stolen / coalesced before we lock. Avoid doing extent_leak (which purges the region) since we don't really own the region.	2019-07-23 22:34:45 -07:00
Qi Wang	57dbab5d6b	Avoid leaking extents / VM when split is not supported. This can only happen on Windows and with opt.retain disabled (which isn't the default). The solution is suboptimal; however, this is not a common case, as retain is the long-term plan for all platforms anyway.	2019-07-23 22:18:55 -07:00
Qi Wang	9a86c65abc	Implement retain on Windows. The VirtualAlloc and VirtualFree APIs are different because MEM_DECOMMIT cannot be used across multiple VirtualAlloc regions. To properly support decommit, only allow merge / split within the same region -- this is done by tracking the "is_head" state of extents and not merging cross-region. Add a new state is_head (only relevant for retain && !maps_coalesce), which is true for the first extent in each VirtualAlloc region. Determine if two extents can be merged based on the head state, and use serial numbers for sanity checks.	2019-07-23 22:18:55 -07:00
Qi Wang	f32f23d6cc	Fix posix_memalign with input size 0. Return a valid pointer instead of failed assertion.	2019-07-18 00:43:23 -07:00
Yinan Zhang	e0a0c8d4bf	Fix a bug in prof_dump_write The original logic can be disastrous if `PROF_DUMP_BUFSIZE` is less than `slen` -- `prof_dump_buf_end + slen <= PROF_DUMP_BUFSIZE` would always be `false`, so `memcpy` would always try to copy `PROF_DUMP_BUFSIZE - prof_dump_buf_end` chars, which can be dangerous: in the last round of the `while` loop it would not only illegally read the memory beyond `s` (which might not always be disastrous), but it would also illegally overwrite the memory beyond `prof_dump_buf` (which can be pretty disastrous). `slen` probably has never gone beyond `PROF_DUMP_BUFSIZE` so we were just lucky.	2019-07-16 15:15:32 -07:00
Yinan Zhang	d26636d566	Fix logic in printing `cbopaque` can now be overridden without overriding `write_cb` in the first place. (Otherwise there would be no need to have the `cbopaque` parameter in `malloc_message`.)	2019-07-16 14:54:23 -07:00
Qi Wang	1a71533511	Avoid blocking on background thread lock for stats. Background threads may run for a long time, especially when the # of dirty pages is high. Avoid blocking stats calls because of this (which may cause latency spikes).	2019-05-22 14:28:38 -07:00
Qi Wang	e13cf65a5f	Add experimental.arenas.i.pactivep. The new experimental mallctl exposes the arena pactive counter to applications, which allows fast read w/o going through the mallctl / epoch steps. This is particularly useful when frequent balancing is required, e.g. when having multiple manual arenas, and threads are multiplexed to them based on usage.	2019-05-22 14:27:58 -07:00
Yinan Zhang	c92ac30601	Add confirm_conf option If the confirm_conf option is set, when the program starts, each of the four malloc_conf strings will be printed, and each option will be printed when being set.	2019-05-22 09:38:39 -07:00
Yinan Zhang	4c63b0e76a	Improve memory utilization tests Added tests for large size classes and expanded the tests to cover a wider range of allocation sizes.	2019-05-21 12:57:06 -07:00
Vaibhav Jain	2d6d099fed	Fix GCC-9.1 warning with macro GET_ARG_NUMERIC GCC-9.1 reports the following error when compiling src/malloc_io.c with CFLAGS='-Werror': src/malloc_io.c: In function ‘malloc_vsnprintf’: src/malloc_io.c:369:2: error: case label value exceeds maximum value for type [-Werror] 369 | case '?' | 0x80: \ | ^~~~ src/malloc_io.c:581:5: note: in expansion of macro ‘GET_ARG_NUMERIC’ 581 | GET_ARG_NUMERIC(val, 'p'); | ^~~~~~~~~~~~~~~ ... <snip> cc1: all warnings being treated as errors make: *** [Makefile:388: src/malloc_io.sym.o] Error 1 The warning is reported because the type 'char' is 'signed char' by default, and or-ing 0x80 into '?' produces a case-label value (191) beyond the signed char range (maximum 127). The patch fixes this by explicitly casting the 'len' variable to 'unsigned char' inside the 'switch' statement, so that the value of the expression " '?' | 0x80 " falls within the legal values of the variable 'len'.	2019-05-21 11:20:07 -07:00
Qi Wang	07c44847c2	Track nfills and nflushes for arenas.i.small / large. Small is added purely for convenience. Large flushes weren't tracked before and can be useful in analysis. Large fill simply reports nmalloc, since there is no batch fill for large currently.	2019-05-15 10:05:09 -07:00
Yinan Zhang	13e88ae970	Fix assert in free fastpath rtree_szind_slab_read_fast() may not have initialized alloc_ctx.szind unless its return value has been confirmed to be true.	2019-05-15 09:42:52 -07:00
Yinan Zhang	259b15dec5	Improve macro readability in malloc_conf_init Define more readable macros than yes and no.	2019-05-08 14:15:03 -07:00
Dave Watson	5679751208	Remove best fit This option saves a few CPU cycles, but potentially adds a lot of fragmentation - so much so that there are workarounds like max_active. Instead, let's just drop it entirely. It only made a difference in one service I tested (0.3% CPU regression), while many services saw a memory win (also small, less than 1% mem P99)	2019-05-08 13:15:19 -07:00
Dave Watson	b62d126df8	Add max_active_fit to first_fit The max_active_fit check is currently only on the best_fit path, add it to the first_fit path also.	2019-05-08 13:15:19 -07:00
Doron Roberts-Kedes	7fc4f2a32c	Add nonfull_slabs to bin_stats_t. When config_stats is enabled track the size of bin->slabs_nonfull in the new nonfull_slabs counter in bin_stats_t. This metric should be useful for establishing an upper ceiling on the savings possible by meshing.	2019-04-29 13:35:02 -07:00
Qi Wang	1aabab5fdc	Enforce TLS_MODEL attribute. Caught by @zoulasc in #1460. The attribute needs to be added in the headers as well.	2019-04-16 11:07:15 -07:00
David Goldblatt	33e1dad680	Safety checks: Add a redzoning feature.	2019-04-15 16:48:12 -07:00
David Goldblatt	b92c9a1a81	Safety checks: Indirect through a function. This will let us share code on failure pathways.	2019-04-15 16:48:12 -07:00
David Goldblatt	f95a88fcd9	Safety checks: Expose config value via mallctl and stats.	2019-04-15 16:48:12 -07:00
David Goldblatt	f4d24f05e1	Move extra size checks behind a config flag. This will let us turn that flag into a generic "turn on runtime checks" flag that guards other functionality we have planned.	2019-04-15 16:48:12 -07:00
Yinan Zhang	7ee3897740	Separate tests for extent utilization API As title.	2019-04-10 13:03:20 -07:00
mgrice	d3d7a8ef09	remove compare and branch in fast path for c++ operator delete[] Summary: sdallocx is checking a flag that will never be set (at least in the provided C++ destructor implementation). This branch will probably only rarely be mispredicted however it removes two instructions in sdallocx and one at the callsite (to zero out flags).	2019-04-08 10:59:05 -07:00
Qi Wang	93084cdc89	Ensure page alignment on extent_alloc. This is discovered and suggested by @jasone in #1468. When custom extent hooks are in use, we should ensure page alignment on the extent alloc path, instead of relying on the user hooks to do so.	2019-04-04 13:49:37 -07:00
Yinan Zhang	9aab3f2be0	Add memory utilization analytics to mallctl The analytics tool is put under experimental.utilization namespace in mallctl. Input is one pointer or an array of pointers and the output is a list of memory utilization statistics.	2019-04-04 13:48:39 -07:00
Qi Wang	978a7a21ae	Use iallocztm instead of ialloc in prof_log functions. Explicitly use iallocztm for internal allocations. ialloc could trigger arena creation, which may cause lock order reversal (narenas_mtx and log_mtx).	2019-04-02 16:53:00 -07:00
Qi Wang	0101d5ebef	Avoid check_min for opt_lg_extent_max_active_fit. This fixes a compiler warning.	2019-03-29 15:56:53 -07:00
Qi Wang	59d9891948	Add the missing unlock in the error path of extent_register.	2019-03-29 15:56:53 -07:00
Qi Wang	788a657cee	Allow low values of oversize_threshold to disable the feature. We should allow a way to easily disable the feature (e.g. not reserving the arena id at all).	2019-03-29 11:33:00 -07:00
Qi Wang	a4d017f5e5	Output message before aborting on tcache size-matching check.	2019-03-29 11:33:00 -07:00
Qi Wang	fb56766ca9	Eagerly purge oversized merged extents. This change improves memory usage slightly, at virtually no CPU cost.	2019-03-14 17:34:55 -07:00
Qi Wang	b804d0f019	Fall back to 32-bit when 8-bit atomics are missing for TSD. When this happens, fast-path operations might slow down. However, such a case is very rare.	2019-03-09 12:52:06 -08:00
Dave Rigby	cbdb1807ce	Stringify tls_callback linker directive Proposed fix for #1444 - ensure that `tls_callback` in the `#pragma comment(linker)` directive gets the same prefix added as it does in the C declaration.	2019-02-22 12:43:35 -08:00
Qi Wang	18450d0abe	Guard libgcc unwind init with opt_prof. Only triggers libgcc unwind init when prof is enabled. This helps work around some bootstrapping issues.	2019-02-21 16:04:47 -08:00
Qi Wang	2db2d2ef5e	Make background_thread not dependent on libdl. When not using libdl, still allows background_thread to be enabled.	2019-02-06 21:00:59 -08:00
Qi Wang	e13400c919	Sanity check szind on tcache flush. This adds some overhead to the tcache flush path (which is one of the popular paths). Guard it behind a config option.	2019-02-01 12:31:34 -08:00
Qi Wang	b33eb26dee	Tweak the spacing for the total_wait_time per second.	2019-01-28 15:37:19 -08:00
Qi Wang	e3db480f6f	Rename huge_threshold to oversize_threshold. The keyword huge tends to remind people of huge pages, which are not relevant to the feature.	2019-01-25 13:15:45 -08:00
Qi Wang	350809dc5d	Set huge_threshold to 8M by default. This feature uses a dedicated arena to handle huge requests, which significantly improves VM fragmentation. In the production workloads we tested, it often reduces VM size by >30%.	2019-01-24 13:29:23 -08:00
Qi Wang	522d1e7b4b	Tweak the spacing for nrequests in stats output.	2019-01-23 17:42:12 -08:00
Qi Wang	8c9571376e	Fix stats output (rate for total # of requests). The rate calculation for the total row was missing.	2019-01-23 17:42:12 -08:00
Qi Wang	7a815c1b7c	Un-experimental the huge_threshold feature.	2019-01-16 12:28:57 -08:00
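
The prof_dump_write fix (e0a0c8d4bf) hinges on one rule: each copy into the dump buffer must be bounded by the free space left in the buffer, not by the full input length. A minimal sketch of that bounded-copy loop follows. It is not jemalloc's code: `DUMP_BUFSIZE`, `dump_buf`, `dump_write`, `dump_flush`, and the in-memory `sink` (standing in for the dump file) are hypothetical, and the buffer is deliberately tiny so that inputs longer than it are the common case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size and names for illustration; not jemalloc's code.
 * The buffer is deliberately tiny so long inputs are exercised. */
#define DUMP_BUFSIZE 8

static char dump_buf[DUMP_BUFSIZE];
static size_t dump_buf_end = 0;

/* Hypothetical in-memory sink standing in for the dump file.
 * Zero-initialized, so it stays NUL-terminated while it has room. */
static char sink[256];
static size_t sink_len = 0;

/* Move the buffered bytes to the sink and empty the buffer. */
static void dump_flush(void) {
    /* Guard the demo sink; keep one byte for the terminating NUL. */
    assert(sink_len + dump_buf_end < sizeof(sink));
    memcpy(sink + sink_len, dump_buf, dump_buf_end);
    sink_len += dump_buf_end;
    dump_buf_end = 0;
}

/* Append s, copying at most the free space in dump_buf per iteration,
 * so neither s nor dump_buf is accessed out of bounds even when
 * strlen(s) > DUMP_BUFSIZE. */
static void dump_write(const char *s) {
    size_t slen = strlen(s);
    size_t i = 0;
    while (i < slen) {
        if (dump_buf_end == DUMP_BUFSIZE) {
            dump_flush(); /* buffer full: drain it before copying more */
        }
        size_t space = DUMP_BUFSIZE - dump_buf_end;
        size_t remaining = slen - i;
        size_t n = remaining < space ? remaining : space;
        memcpy(&dump_buf[dump_buf_end], &s[i], n);
        dump_buf_end += n;
        i += n;
    }
}
```

For example, `dump_write("hello, "); dump_write("a string longer than the buffer"); dump_flush();` leaves the concatenated text in `sink`, even though the second string is nearly four times the 8-byte buffer. The `min(space, remaining)` step is what the original condition lacked: it never copies more than the buffer can hold or more than the input still has.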

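The GCC 9.1 error in 2d6d099fed can be reproduced in miniature. The sketch below is hypothetical: the helper name and its purpose are illustrative and are not taken from src/malloc_io.c. It shows the cast-in-the-switch approach the commit describes. `'?' | 0x80` evaluates to 191, which exceeds `SCHAR_MAX` where plain `char` is signed, whereas `(unsigned char)c` ranges over 0..255, so the label is in range and reachable.

```c
#include <assert.h>
#include <limits.h>

/* '?' | 0x80 is 0x3F | 0x80 == 0xBF == 191, above SCHAR_MAX (127).
 * Where plain char is signed, `switch (c)` on a char can never match
 * this label, which is what GCC 9.1 reports. */
static_assert(('?' | 0x80) > SCHAR_MAX, "label exceeds signed char range");

/* Hypothetical helper: returns 1 if c is '?' with its high bit set.
 * Casting the controlling expression to unsigned char keeps its value
 * in 0..UCHAR_MAX, so the case label below is valid and reachable. */
static int is_high_bit_question(char c) {
    switch ((unsigned char)c) {
    case '?' | 0x80:
        return 1;
    default:
        return 0;
    }
}
```

The cast changes only how the byte is interpreted inside the `switch`; the bit pattern of `c` is unchanged, so callers do not need to alter how they store the value.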
1351 Commits