server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Jason Evans	1ff09534b5	Fix prof_realloc() regression. Mostly revert the prof_realloc() changes in 498856f44a30b31fe713a18eb2fc7c6ecf3a9f63 (Move slabs out of chunks.) so that prof_free_sampled_object() is called when appropriate. Leave the prof_tctx_[re]set() optimization in place, but add an assertion to verify that all eight cases are correctly handled. Add a comment to make clear the code ordering, so that the regression originally fixed by ea8d97b8978a0c0423f0ed64332463a25b787c3d (Fix prof_{malloc,free}_sample_object() call order in prof_realloc().) is not repeated. This resolves #499.	2017-01-17 15:16:37 -08:00
Jason Evans	ffbb7dac3d	Remove leading blank lines from function bodies. This resolves #535.	2017-01-13 14:49:24 -08:00
David Goldblatt	77cccac8cd	Break up headers into constituent parts. This is part of a broader change to make header files better represent the dependencies between one another (see https://github.com/jemalloc/jemalloc/issues/533). It breaks up component headers into smaller parts that can be made to have a simpler dependency graph. For the autogenerated headers (smoothstep.h and size_classes.h), no splitting was necessary, so I didn't add support to emit multiple headers.	2017-01-12 15:43:51 -08:00
David Goldblatt	94c5d22a4d	Remove mb.h, which is unused	2017-01-11 13:24:30 -08:00
John Paul Adrian Glaubitz	77de5f27d8	Use better pre-processor defines for sparc64. Currently, jemalloc detects sparc64 targets by checking whether __sparc64__ is defined. However, this definition is used on BSD targets only. Linux targets define both __sparc__ and __arch64__ for sparc64. Since this also works on BSD, use __sparc__ and __arch64__ instead of __sparc64__ to detect sparc64 targets.	2017-01-10 17:39:54 -08:00
Jason Evans	edf1bafb2b	Implement arena.<i>.destroy . Add MALLCTL_ARENAS_DESTROYED for accessing destroyed arena stats as an analogue to MALLCTL_ARENAS_ALL. This resolves #382.	2017-01-06 18:58:46 -08:00
Jason Evans	6edbedd916	Range-check mib[1] --> arena_ind casts.	2017-01-06 18:58:46 -08:00
Jason Evans	c0a05e6aba	Move static ctl_epoch variable into ctl_stats_t (as epoch).	2017-01-06 18:58:45 -08:00
Jason Evans	d778dd2afc	Refactor ctl_stats_t. Refactor ctl_stats_t to be a demand-zeroed non-growing data structure. To keep the size from being onerous (~60 MiB) on 32-bit systems, convert the arenas field to contain pointers rather than directly embedded ctl_arena_stats_t elements.	2017-01-06 18:58:45 -08:00
Jason Evans	0f04bb1d6f	Rename the arenas.extend mallctl to arenas.create.	2017-01-06 18:58:45 -08:00
Jason Evans	3dc4e83ccb	Add MALLCTL_ARENAS_ALL. Add the MALLCTL_ARENAS_ALL cpp macro as a fixed index for use in accessing the arena.<i>.{purge,decay,dss} and stats.arenas.<i>.* mallctls, and deprecate access via the arenas.narenas index (to be removed in 6.0.0).	2017-01-06 18:58:45 -08:00
Jason Evans	027ace8519	Reindent.	2017-01-06 18:58:45 -08:00
Jason Evans	a0dd3a4483	Implement per arena base allocators. Add/rename related mallctls: - Add stats.arenas.<i>.base . - Rename stats.arenas.<i>.metadata to stats.arenas.<i>.internal . - Add stats.arenas.<i>.resident . Modify the arenas.extend mallctl to take an optional (extent_hooks_t *) argument so that it is possible for all base allocations to be serviced by the specified extent hooks. This resolves #463.	2016-12-26 18:08:28 -08:00
Jason Evans	a6e86810d8	Refactor purging and splitting/merging. Split purging into lazy and forced variants. Use the forced variant for zeroing dss. Add support for NULL function pointers as an opt-out mechanism for the dalloc, commit, decommit, purge_lazy, purge_forced, split, and merge fields of extent_hooks_t. Add short-circuiting checks in large_ralloc_no_move_{shrink,expand}() so that no attempt is made if splitting/merging is not supported. This resolves #268.	2016-12-26 18:08:16 -08:00
Jason Evans	884fa22b8c	Rename arena_decay_t's ndirty to nunpurged.	2016-12-26 17:59:43 -08:00
Jason Evans	411697adcd	Use exponential series to size extents. If virtual memory is retained, allocate extents such that their sizes form an exponentially growing series. This limits the number of disjoint virtual memory ranges so that extent merging can be effective even if multiple arenas' extent allocation requests are highly interleaved. This resolves #462.	2016-12-26 17:59:42 -08:00
Jason Evans	c1baa0a9b7	Add huge page configuration and pages_[no]huge(). Add the --with-lg-hugepage configure option, but automatically configure LG_HUGEPAGE even if it isn't specified. Add the pages_[no]huge() functions, which toggle huge page state via madvise(..., MADV_[NO]HUGEPAGE) calls.	2016-12-26 17:59:34 -08:00
Jason Evans	bacb6afc6c	Simplify arena_slab_regind(). Rewrite arena_slab_regind() to provide sufficient constant data for the compiler to perform division strength reduction. This replaces more general manual strength reduction that was implemented before arena_bin_info was compile-time-constant. It would be possible to slightly improve on the compiler-generated division code by taking advantage of range limits that the compiler doesn't know about.	2016-12-23 10:34:34 -08:00
Jason Evans	69c26cdb01	Add some missing explicit casts.	2016-12-13 13:38:11 -08:00
Dave Watson	2319152d9f	jemalloc cpp new/delete bindings. Adds cpp bindings for jemalloc, along with necessary autoconf settings. This is mostly to add sized deallocation support, which can't be added from C directly. Sized deallocation is ~10% microbench improvement. * Import ax_cxx_compile_stdcxx.m4 from the autoconf repo, seems like the easiest way to get c++14 detection. * Adds various other changes, like CXXFLAGS, to configure.ac. * Adds new rules to Makefile.in for src/jemalloc-cpp.cpp, and a basic unittest. * Both new and delete are overridden, to ensure jemalloc is used for both. * TODO future enhancement of avoiding extra PLT thunks for new and delete - sdallocx and malloc are publicly exported jemalloc symbols, using an alias would link them directly. Unfortunately, was having trouble getting it to play nice with jemalloc's namespace support. Testing: Tested gcc 4.8, gcc 5, gcc 5.2, clang 4.0. Only gcc >= 5 has sized deallocation support, verified that the rest build correctly. Tested mac osx and Centos. Tested --with-jemalloc-prefix and --without-export. This resolves #202.	2016-12-12 18:36:06 -08:00
Jason Evans	d4c5aceb7c	Add a_type parameter to qr_{meld,split}().	2016-12-12 18:16:51 -08:00
Jason Evans	acb7b1f53e	Add --disable-syscall. This resolves #517.	2016-12-03 16:50:58 -08:00
Jason Evans	32127949a3	Enable overriding JEMALLOC_{ALLOC,FREE}_JUNK. This resolves #509.	2016-11-22 10:58:58 -08:00
Jason Evans	c3b85f2585	Style fixes.	2016-11-22 10:58:23 -08:00
Jason Evans	5234be2133	Add pthread_atfork(3) feature test. Some versions of Android provide a pthreads library without providing pthread_atfork(), so in practice a separate feature test is necessary for the latter.	2016-11-17 15:14:57 -08:00
Jason Evans	fda60be799	Update a comment.	2016-11-17 11:50:52 -08:00
Jason Evans	a64123ce13	Refactor madvise(2) configuration. Add feature tests for the MADV_FREE and MADV_DONTNEED flags to madvise(2), so that MADV_FREE is detected and used for Linux kernel versions 4.5 and newer. Refactor pages_purge() so that on systems which support both flags, MADV_FREE is preferred over MADV_DONTNEED. This resolves #387.	2016-11-17 10:31:57 -08:00
Jason Evans	a38acf716e	Add extent serial numbers. Add extent serial numbers and use them where appropriate as a sort key that is higher priority than address, so that the allocation policy prefers older extents. This resolves #147.	2016-11-15 13:08:33 -08:00
Jason Evans	cda59f9970	Rename atomic_*_{uint32,uint64,u}() to atomic_*_{u32,u64,zu}(). This change conforms to naming conventions throughout the codebase.	2016-11-07 11:27:48 -08:00
Jason Evans	2e46b13ad5	Revert "Define 64-bits atomics unconditionally" This reverts commit c2942e2c0e097e7c75a3addd0b9c87758f91692e. This resolves #495.	2016-11-07 10:53:35 -08:00
Jason Evans	04b463546e	Refactor prng to not use 64-bit atomics on 32-bit platforms. This resolves #495.	2016-11-07 10:52:44 -08:00
Jason Evans	ea9961acdb	Fix psz/pind edge cases. Add an "over-size" extent heap in which to store extents which exceed the maximum size class (plus cache-oblivious padding, if enabled). Remove psz2ind_clamp() and use psz2ind() instead so that trying to allocate the maximum size class can in principle succeed. In practice, this allows assertions to hold so that OOM errors can be successfully generated.	2016-11-03 22:33:34 -07:00
Jason Evans	8dd5ea87ca	Fix extent_alloc_cache[_locked]() to support decommitted allocation. Fix extent_alloc_cache[_locked]() to support decommitted allocation, and use this ability in arena_stash_dirty(), so that decommitted extents are not needlessly committed during purging. In practice this does not happen on any currently supported systems, because both extent merging and decommit must be implemented; all supported systems implement one xor the other.	2016-11-03 22:33:23 -07:00
Jason Evans	4f7d8c2dee	Update symbol mangling.	2016-11-03 15:00:02 -07:00
Dave Watson	25f7bbcf28	Fix long spinning in rtree_node_init. rtree_node_init spinlocks the node, allocates, and then sets the node. This is under heavy contention at the top of the tree if many threads start to allocate at the same time. Instead, take a per-rtree sleeping mutex to reduce spinning. Tested both pthreads and osx OSSpinLock, and both reduce spinning adequately. Previous benchmark time: ./ttest1 500 100 ~15s. New benchmark time: ./ttest1 500 100 .57s.	2016-11-02 20:30:53 -07:00
Jason Evans	d82f2b3473	Do not use syscall(2) on OS X 10.12 (deprecated).	2016-11-02 19:18:33 -07:00
Jason Evans	795f6689de	Add os_unfair_lock support. OS X 10.12 deprecated OSSpinLock; os_unfair_lock is the recommended replacement.	2016-11-02 18:09:45 -07:00
Jason Evans	d9f7b2a430	Fix/refactor zone allocator integration code. Fix zone_force_unlock() to reinitialize, rather than unlocking mutexes, since OS X 10.12 cannot tolerate a child unlocking mutexes that were locked by its parent. Refactor; this was a side effect of experimenting with zone {de,re}registration during fork(2).	2016-11-02 18:06:40 -07:00
Jason Evans	90b60eeae4	Add an assertion in witness_owner().	2016-10-31 15:28:22 -07:00
Jason Evans	6a834d94bb	Refactor witness_unlock() to fix undefined test behavior. This resolves #396.	2016-10-31 11:49:12 -07:00
Jason Evans	6c80321aed	Use CLOCK_MONOTONIC_COARSE rather than CLOCK_MONOTONIC_RAW. The raw clock variant is slow (even relative to plain CLOCK_MONOTONIC), whereas the coarse clock variant is faster than CLOCK_MONOTONIC, but still has resolution (~1ms) that is adequate for our purposes. This resolves #479.	2016-10-29 22:58:18 -07:00
Dave Watson	8309388408	Support static linking of jemalloc with glibc. glibc defines its malloc implementation with several weak and strong symbols: strong_alias (__libc_calloc, __calloc) weak_alias (__libc_calloc, calloc) strong_alias (__libc_free, __cfree) weak_alias (__libc_free, cfree) strong_alias (__libc_free, __free) strong_alias (__libc_free, free) strong_alias (__libc_malloc, __malloc) strong_alias (__libc_malloc, malloc). The issue is not with the weak symbols, but that other parts of glibc depend on __libc_malloc explicitly. Defining them in terms of jemalloc APIs allows the linker to drop glibc's malloc.o completely from the link, and static linking no longer results in symbol collisions. Another wrinkle: jemalloc during initialization calls sysconf to get the number of CPUs. GLIBC allocates for the first time before setting up isspace (and other related) tables, which are used by sysconf. Instead, use the pthread API to get the number of CPUs with GLIBC, which seems to work. This resolves #442.	2016-10-28 15:08:19 -07:00
Jason Evans	48d4adfbeb	Avoid negation of unsigned numbers. Rather than relying on two's complement negation for alignment mask generation, use bitwise not and addition. This dodges warnings from MSVC, and should be strength-reduced by compiler optimization anyway.	2016-10-27 21:26:33 -07:00
Jason Evans	b54d160dc4	Do not (recursively) allocate within tsd_fetch(). Refactor tsd so that tsdn_fetch() does not trigger allocation, since allocation could cause infinite recursion. This resolves #458.	2016-10-20 23:59:12 -07:00
Jason Evans	577d4572b0	Make dss operations lockless. Rather than protecting dss operations with a mutex, use atomic operations. This has negligible impact on synchronization overhead during typical dss allocation, but is a substantial improvement for extent_in_dss() and the newly added extent_dss_mergeable(), which can be called multiple times during extent deallocations. This change also has the advantage of avoiding tsd in deallocation paths associated with purging, which resolves potential deadlocks during thread exit due to attempted tsd resurrection. This resolves #425.	2016-10-13 15:37:00 -07:00
Jason Evans	e5effef428	Add/use adaptive spinning. Add spin_t and spin_{init,adaptive}(), which provide a simple abstraction for adaptive spinning. Adaptively spin during busy waits in bootstrapping and rtree node initialization.	2016-10-13 14:55:39 -07:00
Jason Evans	9acd5cf178	Remove all vestiges of chunks. Remove mallctls: - opt.lg_chunk - stats.cactive This resolves #464.	2016-10-12 11:55:43 -07:00
Jason Evans	63b5657aa5	Remove ratio-based purging. Make decay-based purging the default (and only) mode. Remove associated mallctls: - opt.purge - opt.lg_dirty_mult - arena.<i>.lg_dirty_mult - arenas.lg_dirty_mult - stats.arenas.<i>.lg_dirty_mult This resolves #385.	2016-10-12 10:40:27 -07:00
Jason Evans	b4b4a77848	Fix and simplify decay-based purging. Simplify decay-based purging attempts to only be triggered when the epoch is advanced, rather than every time purgeable memory increases. In a correctly functioning system (not previously the case; see below), this only causes a behavior difference if during subsequent purge attempts the least recently used (LRU) purgeable memory extent is initially too large to be purged, but that memory is reused between attempts and one or more of the next LRU purgeable memory extents are small enough to be purged. In practice this is an arbitrary behavior change that is within the set of acceptable behaviors. As for the purging fix, assure that arena->decay.ndirty is recorded after the epoch advance and associated purging occurs. Prior to this fix, it was possible for purging during epoch advance to cause a substantially underrepresentative (arena->ndirty - arena->decay.ndirty), i.e. the number of dirty pages attributed to the current epoch was too low, and a series of unintended purges could result. This fix is also relevant in the context of the simplification described above, but the bug's impact would be limited to over-purging at epoch advances.	2016-10-11 15:30:01 -07:00
Jason Evans	5f11fb7d43	Do not advance decay epoch when time goes backwards. Instead, move the epoch backward in time. Additionally, add nstime_monotonic() and use it in debug builds to assert that time only goes backward if nstime_update() is using a non-monotonic time source.	2016-10-10 22:15:10 -07:00
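
Commit 48d4adfbeb above describes generating alignment masks with bitwise not and addition instead of unary negation of an unsigned value. The following is a minimal sketch of that idea, not the code from the commit: the helper names align_mask() and align_ceiling() are hypothetical, and the sketch assumes the alignment is a nonzero power of two.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: build the mask that clears all bits below a
 * power-of-two alignment. For unsigned arithmetic, ~a + 1 equals the
 * two's complement negation of a, so no unary minus is applied to an
 * unsigned operand (the construct MSVC warns about).
 */
static size_t align_mask(size_t alignment) {
    return ~alignment + 1;
}

/*
 * Hypothetical helper: round size up to the next multiple of alignment.
 * Assumes alignment is a nonzero power of two and that
 * size + (alignment - 1) does not overflow.
 */
static size_t align_ceiling(size_t size, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + (alignment - 1)) & align_mask(alignment);
}
```

Under these assumptions, align_ceiling(13, 8) rounds up to 16, while a size that is already a multiple of the alignment is returned unchanged.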

811 Commits