server-skynet-source-3rd-jemalloc

project-base/server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Chris Peterson	a82070ef5f	Add JEMALLOC_ALLOC_JUNK and JEMALLOC_FREE_JUNK macros Replace hardcoded 0xa5 and 0x5a junk values with JEMALLOC_ALLOC_JUNK and JEMALLOC_FREE_JUNK macros, respectively.	2016-03-31 11:23:29 -07:00
Jason Evans	f86bc081d6	Update a comment.	2016-03-31 11:19:46 -07:00
Jason Evans	ce7c0f999b	Fix potential chunk leaks. Move chunk_dalloc_arena()'s implementation into chunk_dalloc_wrapper(), so that if the dalloc hook fails, proper decommit/purge/retain cascading occurs. This fixes three potential chunk leaks on OOM paths, one during dss-based chunk allocation, one during chunk header commit (currently relevant only on Windows), and one during rtree write (e.g. if rtree node allocation fails). Merge chunk_purge_arena() into chunk_purge_default() (refactor, no change to functionality).	2016-03-30 18:36:04 -07:00
Jason Evans	61a6dfcd5f	Constify various internal arena APIs.	2016-03-23 16:15:42 -07:00
Jason Evans	613cdc80f6	Convert arena_bin_t's runs from a tree to a heap.	2016-03-08 13:48:27 -08:00
Dave Watson	4a0dbb5ac8	Use pairing heap for arena->runs_avail Use pairing heap instead of red black tree in arena runs_avail. The extra links are unioned with the bitmap_t, so this change doesn't use any extra memory. Canaries show this change to be a 1% cpu win, and 2% latency win. In particular, large free()s, and small bin frees are now O(1) (barring coalescing). I also tested changing bin->runs to be a pairing heap, but saw a much smaller win, and it would mean increasing the size of arena_run_s by two pointers, so I left that as an rb-tree for now.	2016-03-08 13:48:27 -08:00
Jason Evans	022f6891fa	Avoid a potential innocuous compiler warning. Add a cast to avoid comparing a ssize_t value to a uint64_t value that is always larger than a 32-bit ssize_t. This silences an innocuous compiler warning from e.g. gcc 4.2.1 about the comparison always having the same result.	2016-03-02 22:45:37 -08:00
Dmitri Smirnov	33184bf698	Fix stack corruption and uninitialized var warning Stack corruption happens in x64 bit This resolves #347.	2016-02-29 15:22:53 -08:00
Jason Evans	3c07f803aa	Fix stats.arenas.<i>.[...] for --disable-stats case. Add missing stats.arenas.<i>.{dss,lg_dirty_mult,decay_time} initialization. Fix stats.arenas.<i>.{pactive,pdirty} to read under the protection of the arena mutex.	2016-02-27 20:40:13 -08:00
Jason Evans	40ee9aa957	Fix stats.cactive accounting regression. Fix stats.cactive accounting to always increase/decrease by multiples of the chunk size, even for huge size classes that are not multiples of the chunk size, e.g. {2.5, 3, 3.5, 5, 7} MiB with 2 MiB chunk size. This regression was introduced by `155bfa7da1` (Normalize size classes.) and first released in 4.0.0. This resolves #336.	2016-02-27 15:35:52 -08:00
Jason Evans	3763d3b5f9	Refactor arena_cactive_update() into arena_cactive_{add,sub}(). This removes an implicit conversion from size_t to ssize_t. For cactive decreases, the size_t value was intentionally underflowed to generate "negative" values (actually positive values above the positive range of ssize_t), and the conversion to ssize_t was undefined according to C language semantics. This regression was perpetuated by `1522937e9c` (Fix the cactive statistic.) and first release in 4.0.0, which in retrospect only fixed one of two problems introduced by `aa5113b1fd` (Refactor overly large/complex functions) and first released in 3.5.0.	2016-02-26 17:29:35 -08:00
Jason Evans	42ce80e15a	Silence miscellaneous 64-to-32-bit data loss warnings. This resolves #341.	2016-02-25 20:51:00 -08:00
Jason Evans	8282a2ad97	Remove a superfluous comment.	2016-02-25 16:44:48 -08:00
Jason Evans	0c516a00c4	Make *allocx() size class overflow behavior defined. Limit supported size and alignment to HUGE_MAXCLASS, which in turn is now limited to be less than PTRDIFF_MAX. This resolves #278 and #295.	2016-02-25 15:29:49 -08:00
Jason Evans	767d85061a	Refactor arenas array (fixes deadlock). Refactor the arenas array, which contains pointers to all extant arenas, such that it starts out as a sparse array of maximum size, and use double-checked atomics-based reads as the basis for fast and simple arena_get(). Additionally, reduce arenas_lock's role such that it only protects against arena initalization races. These changes remove the possibility for arena lookups to trigger locking, which resolves at least one known (fork-related) deadlock. This resolves #315.	2016-02-24 23:58:10 -08:00
Dave Watson	3812729167	Fix arena_size computation. Fix arena_size arena_new() computation to incorporate runs_avail_nclasses elements for runs_avail, rather than (runs_avail_nclasses - 1) elements. Since offsetof(arena_t, runs_avail) is used rather than sizeof(arena_t) for the first term of the computation, all of the runs_avail elements must be added into the second term. This bug was introduced (by Jason Evans) while merging pull request #330 as `3417a304cc` (Separate arena_avail trees).	2016-02-24 20:10:02 -08:00
Dave Watson	cd86c1481a	Fix arena_run_first_best_fit Merge of `3417a304cc` looks like a small bug: first_best_fit doesn't scan through all the classes, since ind is offset from runs_avail_nclasses by run_avail_bias.	2016-02-24 17:50:02 -08:00
Jason Evans	9e1810ca9d	Silence miscellaneous 64-to-32-bit data loss warnings.	2016-02-24 13:03:48 -08:00
Jason Evans	9f4ee6034c	Refactor jemalloc_ffs() into ffs_(). Use appropriate versions to resolve 64-to-32-bit data loss warnings.	2016-02-24 13:03:48 -08:00
Jason Evans	ae45142adc	Collapse arena_avail_tree_* into arena_run_tree_*. These tree types converged to become identical, yet they still had independently generated red-black tree implementations.	2016-02-23 18:27:24 -08:00
Dave Watson	3417a304cc	Separate arena_avail trees Separate run trees by index, replacing the previous quantize logic. Quantization by index is now performed only on insertion / removal from the tree, and not on node comparison, saving some cpu. This also means we don't have to dereference the miscelm* pointers, saving half of the memory loads from miscelms/mapbits that have fallen out of cache. A linear scan of the indicies appears to be fast enough. The only cost of this is an extra tree array in each arena.	2016-02-23 18:09:36 -08:00
Jason Evans	0da8ce1e96	Use table lookup for run_quantize_{floor,ceil}(). Reduce run quantization overhead by generating lookup tables during bootstrapping, and using the tables for all subsequent run quantization.	2016-02-22 16:47:34 -08:00
Jason Evans	08551eee58	Fix run_quantize_ceil(). In practice this bug had limited impact (and then only by increasing chunk fragmentation) because run_quantize_ceil() returned correct results except for inputs that could only arise from aligned allocation requests that required more than page alignment. This bug existed in the original run quantization implementation, which was introduced by `8a03cf039c` (Implement cache index randomization for large allocations.).	2016-02-22 16:28:00 -08:00
Jason Evans	a9a4684792	Test run quantization. Also rename run_quantize_*() to improve clarity. These tests demonstrate that run_quantize_ceil() is flawed.	2016-02-22 14:58:05 -08:00
Jason Evans	9bad079039	Refactor time_* into nstime_*. Use a single uint64_t in nstime_t to store nanoseconds rather than using struct timespec. This reduces fragility around conversions between long and uint64_t, especially missing casts that only cause problems on 32-bit platforms.	2016-02-21 21:39:05 -08:00
Jason Evans	243f7a0508	Implement decay-based unused dirty page purging. This is an alternative to the existing ratio-based unused dirty page purging, and is intended to eventually become the sole purging mechanism. Add mallctls: - opt.purge - opt.decay_time - arena.<i>.decay - arena.<i>.decay_time - arenas.decay_time - stats.arenas.<i>.decay_time This resolves #325.	2016-02-19 20:56:21 -08:00
Jason Evans	1a4ad3c0fa	Refactor out arena_compute_npurge(). Refactor out arena_compute_npurge() by integrating its logic into arena_stash_dirty() as an incremental computation.	2016-02-19 20:32:37 -08:00
Jason Evans	4985dc681e	Refactor arena_ralloc_no_move(). Refactor early return logic in arena_ralloc_no_move() to return early on failure rather than on success.	2016-02-19 20:32:37 -08:00
Jason Evans	578cd16581	Refactor arena_malloc_hard() out of arena_malloc().	2016-02-19 20:32:32 -08:00
Jason Evans	34676d3369	Refactor prng* from cpp macros into inline functions. Remove 32-bit variant, convert prng64() to prng_lg_range(), and add prng_range().	2016-02-19 20:29:06 -08:00
Qi Wang	f4a0f32d34	Fast-path improvement: reduce # of branches and unnecessary operations. - Combine multiple runtime branches into a single malloc_slow check. - Avoid calling arena_choose / size2index / index2size on fast path. - A few micro optimizations.	2015-11-10 14:28:34 -08:00
Joshua Kahn	13b4015531	Allow const keys for lookup Signed-off-by: Steve Dougherty <sdougherty@barracuda.com> This resolves #281.	2015-11-09 15:48:05 -08:00
Mike Hommey	f97298bfc1	Remove arena_run_dalloc_decommit(). This resolves #284.	2015-11-09 15:38:30 -08:00
Jason Evans	a784e411f2	Fix a xallocx(..., MALLOCX_ZERO) bug. Fix xallocx(..., MALLOCX_ZERO to zero the last full trailing page of large allocations that have been randomly assigned an offset of 0 when --enable-cache-oblivious configure option is enabled. This addresses a special case missed in `d260f442ce` (Fix xallocx(..., MALLOCX_ZERO) bugs.).	2015-09-24 22:21:55 -07:00
Jason Evans	d260f442ce	Fix xallocx(..., MALLOCX_ZERO) bugs. Zero all trailing bytes of large allocations when --enable-cache-oblivious configure option is enabled. This regression was introduced by `8a03cf039c` (Implement cache index randomization for large allocations.). Zero trailing bytes of huge allocations when resizing from/to a size class that is not a multiple of the chunk size.	2015-09-24 16:38:45 -07:00
Jason Evans	e56b24e3a2	Make arena_dalloc_large_locked_impl() static.	2015-09-20 09:58:10 -07:00
Jason Evans	9a505b768c	Centralize xallocx() size[+extra] overflow checks.	2015-09-15 14:39:58 -07:00
Jason Evans	676df88e48	Rename arena_maxclass to large_maxclass. arena_maxclass is no longer an appropriate name, because arenas also manage huge allocations.	2015-09-11 20:50:20 -07:00
Jason Evans	560a4e1e01	Fix xallocx() bugs. Fix xallocx() bugs related to the 'extra' parameter when specified as non-zero.	2015-09-11 20:40:34 -07:00
Dmitry-Me	a306a60651	Reduce variables scope	2015-09-04 10:42:33 -07:00
Jason Evans	d01fd19755	Rename index_t to szind_t to avoid an existing type on Solaris. This resolves #256.	2015-08-19 15:21:32 -07:00
Jason Evans	5ef33a9f2b	Don't bitshift by negative amounts. Don't bitshift by negative amounts when encoding/decoding run sizes in chunk header maps. This affected systems with page sizes greater than 8 KiB. Reported by Ingvar Hagelund <ingvar@redpill-linpro.com>.	2015-08-19 14:16:30 -07:00
Jason Evans	1f27abc1b1	Refactor arena_mapbits_{small,large}_set() to not preserve unzeroed. Fix arena_run_split_large_helper() to treat newly committed memory as zeroed.	2015-08-11 16:45:47 -07:00
Jason Evans	45186f0c07	Refactor arena_mapbits unzeroed flag management. Only set the unzeroed flag when initializing the entire mapbits entry, rather than mutating just the unzeroed bit. This simplifies the possible mapbits state transitions.	2015-08-10 23:03:34 -07:00
Jason Evans	de249c8679	Arena chunk decommit cleanups and fixes. Decommit arena chunk header during chunk deallocation if the rest of the chunk is decommitted.	2015-08-10 17:13:59 -07:00
Jason Evans	8fadb1a8c2	Implement chunk hook support for page run commit/decommit. Cascade from decommit to purge when purging unused dirty pages, so that it is possible to decommit cleaned memory rather than just purging. For non-Windows debug builds, decommit runs rather than purging them, since this causes access of deallocated runs to segfault. This resolves #251.	2015-08-07 00:50:58 -07:00
Jason Evans	5716d97f75	Fix an in-place growing large reallocation regression. Fix arena_ralloc_large_grow() to properly account for large_pad, so that in-place large reallocation succeeds when possible, rather than always failing. This regression was introduced by `8a03cf039c` (Implement cache index randomization for large allocations.)	2015-08-06 23:45:45 -07:00
Jason Evans	b49a334a64	Generalize chunk management hooks. Add the "arena.<i>.chunk_hooks" mallctl, which replaces and expands on the "arena.<i>.chunk.{alloc,dalloc,purge}" mallctls. The chunk hooks allow control over chunk allocation/deallocation, decommit/commit, purging, and splitting/merging, such that the application can rely on jemalloc's internal chunk caching and retaining functionality, yet implement a variety of chunk management mechanisms and policies. Merge the chunks_[sz]ad_{mmap,dss} red-black trees into chunks_[sz]ad_retained. This slightly reduces how hard jemalloc tries to honor the dss precedence setting; prior to this change the precedence setting was also consulted when recycling chunks. Fix chunk purging. Don't purge chunks in arena_purge_stashed(); instead deallocate them in arena_unstash_purged(), so that the dirty memory linkage remains valid until after the last time it is used. This resolves #176 and #201.	2015-08-03 21:49:02 -07:00
Jason Evans	50883deb6e	Change arena_palloc_large() parameter from size to usize. This change merely documents that arena_palloc_large() always receives usize as its argument.	2015-07-23 17:13:18 -07:00
Jason Evans	5fae7dc1b3	Fix MinGW-related portability issues. Create and use FMT* macros that are equivalent to the PRI* macros that inttypes.h defines. This allows uniform use of the Unix-specific format specifiers, e.g. "%zu", as well as avoiding Windows-specific definitions of e.g. PRIu64. Add ffs()/ffsl() support for compiling with gcc. Extract compatibility definitions of ENOENT, EINVAL, EAGAIN, EPERM, ENOMEM, and ENORANGE into include/msvc_compat/windows_extra.h and use the file for tests as well as for core jemalloc code.	2015-07-23 13:56:25 -07:00

1 2 3 4

171 Commits