server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
David T. Goldblatt	3d84bd57f4	Arena: Add helper function arena_get_from_extent.	2019-09-23 23:06:27 -07:00
David T. Goldblatt	c97d255752	Eset: Remove temporary declaration.	2019-09-23 23:06:27 -07:00
David T. Goldblatt	ce5b128f10	Remove the undefined extent_size_quantize declarations.	2019-09-23 23:06:27 -07:00
David T. Goldblatt	821dd53a1d	Extent -> Eset: Rename arena members.	2019-09-23 23:06:27 -07:00
David T. Goldblatt	e144b21e4b	Extent -> Eset: Move fork handling.	2019-09-23 23:06:27 -07:00
David T. Goldblatt	77bbb35a92	Extent -> Eset: Move extent fit functions.	2019-09-23 23:06:27 -07:00
David T. Goldblatt	1210af9a4e	Extent -> Eset: Move insertion and removal.	2019-09-23 23:06:27 -07:00
David T. Goldblatt	a42861540e	Extents -> Eset: Convert some stats getters.	2019-09-23 23:06:27 -07:00
David T. Goldblatt	820f070c6b	Move page quantization to sz module.	2019-09-23 23:06:27 -07:00
David T. Goldblatt	63d1b7a7a7	Extents -> Eset: move extents_state_get.	2019-09-23 23:06:27 -07:00
David T. Goldblatt	b416b96a39	Extents -> Eset: rename/move extents_init.	2019-09-23 23:06:27 -07:00
David T. Goldblatt	4e5e43f22e	Rename extents_t -> eset_t.	2019-09-23 23:06:27 -07:00
David T. Goldblatt	723ccc6c27	Extents: Split out extent struct.	2019-09-23 23:06:27 -07:00
David T. Goldblatt	41187bdfb0	Extents: Break extent-struct/arena interactions Specifically, the extent_arena_[g|s]et functions and the address randomization. These are the only things that tie the extent struct itself to the arena code.	2019-09-23 23:06:27 -07:00
David T. Goldblatt	529cfe2abc	Arena: rename arena_structs_b.h -> arena_structs.h arena_structs_a.h was removed in the previous commit.	2019-09-23 23:06:27 -07:00
David T. Goldblatt	e7cf84a8dd	Rearrange slab data and constants The constants logically belong in the sc module. The slab data bitmap isn't really scoped to an arena; move it to its own module.	2019-09-23 23:06:27 -07:00
zhxchen17	b7c7df24ba	Add max_per_bg_thd stats for per background thread mutexes. Added a new stats row to aggregate the maximum value of mutex counters for each background thread. Given that the per bg thd mutex is not expected to be contended, this counter is mainly for sanity check / debugging.	2019-09-13 09:23:57 -07:00
zhxchen17	4b76c684bb	Add "prof.dump_prefix" to override filename prefixes for dumps.	2019-09-12 22:26:03 -07:00
zhxchen17	242af439b8	Rename "prof_dump_seq_mtx" to "prof_dump_filename_mtx".	2019-09-12 22:26:03 -07:00
Yinan Zhang	93d6151800	Pass tsd down to prof_backtrace()	2019-09-05 10:57:43 -07:00
Qi Wang	785b84e603	Make cache_bin_sz_t unsigned. The bin size type was made signed only because the low_water could go -1, which was already removed.	2019-09-04 13:37:07 -07:00
Qi Wang	23dc7a7fba	Fix index type for cache_bin_alloc_easy.	2019-09-04 13:37:07 -07:00
Yinan Zhang	57b81c078e	Pull thread_(de)allocated out of config_stats	2019-08-26 11:56:41 -07:00
Qi Wang	0043e68d4c	Track low_water == -1 case explicitly. The -1 value of low_water indicates if the cache has been depleted and refilled. Track the status explicitly in the tcache struct. This allows the fast path to check if (cur_ptr > low_water), instead of >=, which avoids reaching slow path when the last item is allocated.	2019-08-21 16:00:38 -07:00
Qi Wang	937ca1db9f	Store ncached_max * ptr_size in tcache_bin_info. With the cache bin metadata switched to pointers, ncached_max is usually accessed and multiplied by sizeof(ptr). Store the results in tcache_bin_info for direct access, and add a helper function for the ncached_max value.	2019-08-19 12:23:24 -07:00
Qi Wang	7599c82d48	Redesign the cache bin metadata for fast path. Implement the pointer-based metadata for tcache bins -- - 3 pointers are maintained to represent each bin; - 2 of the pointers are compressed on 64-bit; - is_full / is_empty done through pointer comparison; Compared to the previous counter based design -- - fast-path speed up ~15% in benchmarks - direct pointer comparison and de-reference - no need to access tcache_bin_info in common case	2019-08-19 12:21:44 -07:00
Qi Wang	e2c7584361	Simplify / refactor tcache_dalloc_large.	2019-08-14 13:08:23 -07:00
Qi Wang	9c5c2a2c86	Unify the signature of tcache_flush small and large.	2019-08-14 13:08:23 -07:00
Yinan Zhang	8c8466fa6e	Add compact json option for emitter JSON format is largely meant for machine-machine communication, so adding the option to the emitter. According to local testing, the savings in terms of bytes outputted is around 50% for stats printing and around 25% for prof log printing.	2019-08-09 09:53:41 -07:00
Yinan Zhang	7fc6b1b259	Add buffered writer The buffered writer adopts a signature identical to `write_cb`, so that it can be plugged into anywhere `write_cb` appears.	2019-08-09 09:44:29 -07:00
Yinan Zhang	39343555d6	Report stats for tdatas_mtx and prof_dump_mtx	2019-08-09 09:24:16 -07:00
Yinan Zhang	07ce2434bf	Refactor profiling Refactored core profiling codebase into two logical parts: (a) `prof_data.c`: core internal data structure managing & dumping; (b) `prof.c`: mutexes & outward-facing APIs. Some internal functions had to be exposed out, but there are not that many of them if the modularization is (hopefully) clean enough.	2019-08-07 19:48:28 -07:00
Yinan Zhang	56126d0d2d	Refactor prof log Prof logging is conceptually separate from core profiling, so split it out as a module of its own. There are a few internal functions that had to be exposed but I think it is a fair trade-off.	2019-08-07 13:53:45 -07:00
Yinan Zhang	56c8ecffc1	Correct tsd layout graph Augmented the tsd layout graph so that the two recently added fields, `offset_state` and `bytes_until_sample`, are properly reflected. As is shown, the cache footprint is 16 bytes larger than before.	2019-08-05 15:30:20 -07:00
Yinan Zhang	9344d25488	Workaround to address g++ unused variable warnings g++ 5.5.0+ complained `parameter ‘expected’ set but not used [-Werror=unused-but-set-parameter]` (despite that `expected` is in fact used).	2019-07-30 11:37:56 -07:00
Qi Wang	5742473cc8	Revert "Refactor prof log" This reverts commit `7618b0b8e4`.	2019-07-29 14:10:15 -07:00
Qi Wang	1a0503367b	Revert "Refactor profiling" This reverts commit `0b462407ae`.	2019-07-29 14:10:15 -07:00
Yinan Zhang	0b462407ae	Refactor profiling Refactored core profiling codebase into two logical parts: (a) `prof_data.c`: core internal data structure managing & dumping; (b) `prof.c`: mutexes & outward-facing APIs. Some internal functions had to be exposed out, but there are not that many of them if the modularization is (hopefully) clean enough.	2019-07-29 13:55:00 -07:00
Yinan Zhang	7618b0b8e4	Refactor prof log `prof.c` is growing too long, so trying to modularize it. There are a few internal functions that had to be exposed but I think it is a fair trade-off.	2019-07-29 13:55:00 -07:00
Qi Wang	a3fa597921	Refactor arena_dalloc() / _sdalloc().	2019-07-24 18:30:54 -07:00
Qi Wang	bc0998a905	Invoke arena_dalloc_promoted() properly w/o tcache. When tcache was disabled, the dalloc promoted case was missing.	2019-07-24 18:30:54 -07:00
Qi Wang	4e36ce34c1	Track the leaked VM space via the abandoned_vm counter. The counter is 0 unless metadata allocation failed (indicates OOM), and is mainly for sanity checking.	2019-07-24 11:24:22 -07:00
Qi Wang	9a86c65abc	Implement retain on Windows. The VirtualAlloc and VirtualFree APIs are different because MEM_DECOMMIT cannot be used across multiple VirtualAlloc regions. To properly support decommit, only allow merge / split within the same region -- this is done by tracking the "is_head" state of extents and not merging cross-region. Add a new state is_head (only relevant for retain && !maps_coalesce), which is true for the first extent in each VirtualAlloc region. Determine if two extents can be merged based on the head state, and use serial numbers for sanity checks.	2019-07-23 22:18:55 -07:00
Yinan Zhang	a2a693e722	Remove prof_accumbytes in arena `prof_accumbytes` was supposed to be replaced by `prof_accum` in https://github.com/jemalloc/jemalloc/pull/623.	2019-07-16 15:18:52 -07:00
Yinan Zhang	d26636d566	Fix logic in printing `cbopaque` can now be overridden without overriding `write_cb` in the first place. (Otherwise there would be no need to have the `cbopaque` parameter in `malloc_message`.)	2019-07-16 14:54:23 -07:00
Yinan Zhang	7720b6e385	Fix redzone setting and checking	2019-07-11 20:51:29 -07:00
Yinan Zhang	c92ac30601	Add confirm_conf option If the confirm_conf option is set, when the program starts, each of the four malloc_conf strings will be printed, and each option will be printed when being set.	2019-05-22 09:38:39 -07:00
Qi Wang	07c44847c2	Track nfills and nflushes for arenas.i.small / large. Small is added purely for convenience. Large flushes weren't tracked before and can be useful in analysis. Large fill simply reports nmalloc, since there is no batch fill for large currently.	2019-05-15 10:05:09 -07:00
Doron Roberts-Kedes	7fc4f2a32c	Add nonfull_slabs to bin_stats_t. When config_stats is enabled track the size of bin->slabs_nonfull in the new nonfull_slabs counter in bin_stats_t. This metric should be useful for establishing an upper ceiling on the savings possible by meshing.	2019-04-29 13:35:02 -07:00
Yinan Zhang	ae124b8684	Improve size class header Mainly fixing typos. The only non-trivial change is in the computation for SC_NPSIZES, though the result wouldn't be any different when SC_NGROUP = 4 as is always the case at the moment.	2019-04-24 10:45:12 -07:00
Qi Wang	1aabab5fdc	Enforce TLS_MODEL attribute. Caught by @zoulasc in #1460. The attribute needs to be added in the headers as well.	2019-04-16 11:07:15 -07:00
David Goldblatt	33e1dad680	Safety checks: Add a redzoning feature.	2019-04-15 16:48:12 -07:00
David Goldblatt	b92c9a1a81	Safety checks: Indirect through a function. This will let us share code on failure pathways.	2019-04-15 16:48:12 -07:00
David Goldblatt	f4d24f05e1	Move extra size checks behind a config flag. This will let us turn that flag into a generic "turn on runtime checks" flag that guards other functionality we have planned.	2019-04-15 16:48:12 -07:00
zoulasc	7f7935cf78	Add an autoconf feature test for format_arg and a jemalloc-specific macro for it.	2019-04-15 15:14:46 -07:00
zoulasc	14e4176758	Fix incorrect macro use. Compiling with warnings produces missing prototype warnings.	2019-04-15 15:14:46 -07:00
zoulasc	020b5dc7ac	Convert the format generator function to an annotated format function, so that the generated formats can be checked by the compiler.	2019-04-15 15:14:46 -07:00
mgrice	d3d7a8ef09	remove compare and branch in fast path for c++ operator delete[] Summary: sdallocx is checking a flag that will never be set (at least in the provided C++ destructor implementation). This branch will probably only rarely be mispredicted however it removes two instructions in sdallocx and one at the callsite (to zero out flags).	2019-04-08 10:59:05 -07:00
Yinan Zhang	9aab3f2be0	Add memory utilization analytics to mallctl The analytics tool is put under experimental.utilization namespace in mallctl. Input is one pointer or an array of pointers and the output is a list of memory utilization statistics.	2019-04-04 13:48:39 -07:00
Qi Wang	fb56766ca9	Eagerly purge oversized merged extents. This change improves memory usage slightly, at virtually no CPU cost.	2019-03-14 17:34:55 -07:00
Qi Wang	f6c30cbafa	Remove some unused comments.	2019-03-14 17:34:55 -07:00
Qi Wang	b804d0f019	Fallback to 32-bit when 8-bit atomics are missing for TSD. When it happens, this might cause a slowdown on the fast path operations. However such case is very rare.	2019-03-09 12:52:06 -08:00
Qi Wang	06f0850427	Detect if 8-bit atomics are available. In some rare cases (older compiler, e.g. gcc 4.2 w/ MIPS), 8-bit atomics might be unavailable. Detect such cases so that we can work around them.	2019-03-09 12:52:06 -08:00
Jason Evans	14d3686c9f	Do not use #pragma GCC diagnostic with gcc < 4.6. This regression was introduced by `3d29d11ac2` (Clean compilation -Wextra).	2019-03-09 12:10:30 -08:00
Jason Evans	775fe302a7	Remove JE_FORCE_SYNC_COMPARE_AND_SWAP_[48]. These macros have been unused since `d4ac7582f3` (Introduce a backport of C11 atomics).	2019-02-22 14:22:16 -08:00
Jason Evans	dca7060d5e	Avoid redefining tsd_t. This fixes a build failure when integrating with FreeBSD's libc. This regression was introduced by `d1e11d48d4` (Move tsd link and in_hook after tcache.).	2019-02-20 20:27:55 -08:00
Qi Wang	8e9a613122	Disable muzzy decay by default.	2019-02-04 14:38:54 -08:00
Qi Wang	e13400c919	Sanity check szind on tcache flush. This adds some overhead to the tcache flush path (which is one of the popular paths). Guard it behind a config option.	2019-02-01 12:31:34 -08:00
Qi Wang	e3db480f6f	Rename huge_threshold to oversize_threshold. The keyword "huge" tends to remind people of huge pages, which are not relevant to the feature.	2019-01-25 13:15:45 -08:00
Qi Wang	350809dc5d	Set huge_threshold to 8M by default. This feature uses a dedicated arena to handle huge requests, which significantly improves VM fragmentation. In the production workloads we tested, it often reduces VM size by >30%.	2019-01-24 13:29:23 -08:00
Qi Wang	bbe8e6a909	Avoid creating bg thds for huge arena alone. For low arena count settings, the huge threshold feature may trigger an unwanted bg thd creation. Given that the huge arena does eager purging by default, bypass bg thd creation when initializing the huge arena.	2019-01-15 16:00:34 -08:00
Qi Wang	f459454afe	Avoid potential issues on extent zero-out. When custom extent_hooks or transparent huge pages are in use, the purging semantics may change, which means we may not get zeroed pages on repopulating. Fixing the issue by manually memset for such cases.	2019-01-11 19:16:12 -08:00
Leonardo Santagada	daa0e436ba	implement malloc_getcpu for windows	2019-01-08 14:34:45 -08:00
Qi Wang	7241bf5b74	Only read arena index from extent on the tcache flush path. Add extent_arena_ind_get() to avoid loading the actual arena ptr in case we just need to check arena matching.	2018-12-18 15:19:30 -08:00
Alexander Zinoviev	36de5189c7	Add rate counters to stats	2018-12-18 09:59:41 -08:00
Qi Wang	98b56ab23d	Store the bin shard selection in TSD. This avoids having to choose bin shard on the fly, also will allow flexible bin binding for each thread.	2018-12-03 17:17:03 -08:00
Qi Wang	3f9f2833f6	Add opt.bin_shards to specify number of bin shards. The option uses the same format as "slab_sizes" to specify number of shards for each bin size.	2018-12-03 17:17:03 -08:00
Qi Wang	37b8913925	Add support for sharded bins within an arena. This makes it possible to have multiple sets of bins in an arena, which improves arena scalability because the bins (especially the small ones) are always the limiting factor in production workload. A bin shard is picked on allocation; each extent tracks the bin shard id for deallocation. The shard size will be determined using runtime options.	2018-12-03 17:17:03 -08:00
Dave Watson	b23336af96	mutex: fix trylock spin wait contention If there are 3 or more threads spin-waiting on the same mutex, there will be excessive exclusive cacheline contention because pthread_trylock() immediately tries to CAS in a new value, instead of first checking if the lock is locked. This diff adds a 'locked' hint flag, and we will only spin wait without trylock()ing while set. I don't know of any other portable way to get the same behavior as pthread_mutex_lock(). This is pretty easy to test via ttest, e.g. ./ttest1 500 3 10000 1 100 Throughput is nearly 3x as fast. This blames to the mutex profiling changes, however, we almost never have 3 or more threads contending in properly configured production workloads, but still worth fixing.	2018-11-28 15:17:02 -08:00
Qi Wang	c4063ce439	Set the default number of background threads to 4. The setting has been tested in production for a while. No negative effect while we were able to reduce number of threads per process.	2018-11-16 09:35:12 -08:00
Qi Wang	43f3b1ad0c	Deprecate OSSpinLock.	2018-11-14 08:44:05 -08:00
Dave Watson	13c237c7ef	Add a fastpath for arena_slab_reg_alloc_batch Also adds a configure.ac check for __builtin_popcount, which is used in the new fastpath.	2018-11-14 07:09:11 -08:00
Dave Watson	17aa470760	add extent_nfree_sub	2018-11-14 07:09:11 -08:00
Qi Wang	1f56115704	Fix tcache_flush (follow up `cd2931a`). Also catch invalid tcache id.	2018-11-13 08:54:09 -08:00
Dave Watson	e2ab215324	refactor tcache_dalloc_small Add a cache_bin_dalloc_easy (to match the alloc_easy function), and use it in tcache_dalloc_small. It will also be used in the new free fastpath.	2018-11-12 13:20:37 -08:00
Dave Watson	5e795297b3	rtree: add rtree_szind_slab_read_fast For a free fastpath, we want something that will not make additional calls. Assume most free() calls will hit the L1 cache, and use a custom rtree function for this. Additionally, roll the ptr=NULL check in to the rtree cache check.	2018-11-12 13:20:37 -08:00
Justin Hibbits	be0749f591	Restrict lwsync to powerpc64 only Nearly all 32-bit powerpc hardware treats lwsync as sync, and some cores (Freescale e500) trap lwsync as an illegal instruction, which then gets emulated in the kernel. To avoid unnecessary traps on the e500, use sync on all 32-bit powerpc. This pessimizes 32-bit software running on 64-bit hardware, but those numbers should be slim.	2018-10-24 11:18:55 -07:00
Edward Tomasz Napierala	ceba1dde27	Make use of pthread_set_name_np(3) on FreeBSD.	2018-10-24 10:06:37 -07:00
Dave Watson	936bc2aa15	prof: Fix memory regression The diff 'refactor prof accum...' moved the bytes_until_sample subtraction before the load of tdata. If tdata is null, tdata_get(true) will overwrite bytes_until_sample, but we still sample the current allocation. Instead, do the subtraction and check logic again, to keep the previous behavior. blame-rev: `0ac524308d`	2018-10-23 12:39:57 -07:00
Dave Watson	0ec656eb71	ticker: add ticker_trytick For the fastpath, we want to tick, but undo the tick and jump to the slowpath if ticker would fire.	2018-10-18 08:32:19 -07:00
Dave Watson	ac34afb403	drop bump_empty_alloc option. Size class lookup support used instead.	2018-10-17 08:50:58 -07:00
Dave Watson	4edbb7c64c	sz: Support 0 size in size2index lookup/compute	2018-10-17 08:50:58 -07:00
gnzlbg	08260a6b94	Add experimental API: smallocx_return_t smallocx(size, flags) --- Motivation: This new experimental memory-allocation API returns a pointer to the allocation as well as the usable size of the allocated memory region. The `s` in `smallocx` stands for `sized`-`mallocx`, attempting to convey that this API returns the size of the allocated memory region. It should allow C++ P0901r0 [0] and Rust Alloc::alloc_excess to make use of it. The main purpose of these APIs is to improve telemetry. It is more accurate to register `smallocx(size, flags)` than `smallocx(nallocx(size), flags)`, for example. The latter will always line up perfectly with the existing size classes, causing a loss of telemetry information about the internal fragmentation induced by potentially poor size-classes choices. Instrumenting `nallocx` does not help much since user code can cache its result and use it repeatedly. --- Implementation: The implementation adds a new `usize` option to `static_opts_s` and a `usize` variable to `dynamic_opts_s`. These are then used to cache the result of `sz_index2size` and similar functions in the code paths in which they are unconditionally invoked. In the code-paths in which these functions are not unconditionally invoked, `smallocx` calls, as opposed to `mallocx`, these functions explicitly. --- [0]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0901r0.html	2018-10-17 07:12:28 -07:00
Dave Watson	325e3305fc	remove malloc_init() off the fastpath	2018-10-15 10:11:08 -07:00
Dave Watson	997d86acc6	restrict bytes_until_sample to int64_t. This allows optimal asm generation of sub bytes_until_sample, usize; je; for x86 arch. Subtraction is unconditional, and only flags are checked for the jump, no extra compare is necessary. This also reduces register pressure.	2018-10-15 08:24:12 -07:00
Dave Watson	0ac524308d	refactor prof accum, so that tdata is not loaded if we aren't going to sample.	2018-10-15 08:24:12 -07:00
Dave Watson	9ed3bdc848	move bytes until sample to tsd. Fastpath allocation does not need to load tdata now, avoiding several branches.	2018-10-15 08:24:12 -07:00
Dave Watson	09adf18f1a	Remove a branch from cache_bin_alloc_easy Combine the branches for checking for an empty cache_bin, and checking for the low watermark.	2018-10-15 08:18:15 -07:00
Rajeev Misra	115ce93562	bit_util: Don't use __builtin_clz on s390x There's an optimizer bug upstream that results in test failures; reported at https://bugzilla.redhat.com/show_bug.cgi?id=1619354. This works around the failure reported at https://github.com/jemalloc/jemalloc/issues/1307.	2018-09-20 11:25:17 -07:00
Rajeev Misra	4c548a61c8	Bit_util: Use intrinsics for pow2_ceil, where available.	2018-08-15 19:38:31 -07:00
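
Commit b23336af96 in the log above describes adding a 'locked' hint flag so that spin-waiting threads read the flag first and only attempt the lock once it looks free. A minimal sketch of that idea in C11 atomics follows. The names `hint_lock_t`, `hint_lock_try`, `hint_lock_spin`, and `hint_lock_unlock` are hypothetical and chosen for illustration; this is not jemalloc's mutex code, and the `atomic_flag` stands in for the underlying pthread mutex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration; not jemalloc's mutex struct. */
typedef struct {
    atomic_flag lock;   /* the actual lock word; acquiring it is a read-modify-write */
    atomic_bool locked; /* advisory hint: true while some thread holds the lock */
} hint_lock_t;

static void hint_lock_init(hint_lock_t *l) {
    atomic_flag_clear(&l->lock);
    atomic_init(&l->locked, false);
}

/* One acquisition attempt. This always writes to the lock's cacheline. */
static bool hint_lock_try(hint_lock_t *l) {
    if (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire)) {
        return false; /* someone else holds the lock */
    }
    atomic_store_explicit(&l->locked, true, memory_order_relaxed);
    return true;
}

/*
 * Spin for at most max_spins iterations. While the hint says "locked",
 * only a plain load is issued, so waiters do not fight over exclusive
 * ownership of the cacheline. The lock attempt happens only when the
 * hint looks clear.
 */
static bool hint_lock_spin(hint_lock_t *l, int max_spins) {
    for (int i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(&l->locked, memory_order_relaxed)) {
            continue; /* still held: keep waiting without trying */
        }
        if (hint_lock_try(l)) {
            return true;
        }
    }
    return false; /* caller would fall back to a blocking lock here */
}

static void hint_lock_unlock(hint_lock_t *l) {
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    /* Clear the hint before releasing so waiters see it no later than the release. */
    atomic_store_explicit(&l->locked, false, memory_order_relaxed);
    atomic_flag_clear_explicit(&l->lock, memory_order_release);
}
```

The hint is only advisory: a waiter that reads a stale value either spins one extra iteration or makes one failed attempt, so correctness still rests entirely on the lock word itself.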

1000 Commits