server-skynet-source-3rd-jemalloc

project-base/server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Jason Evans	e723f99dec	Alphabetize private symbol names.	2017-02-28 15:06:27 -08:00
Jason Evans	d84d2909c3	Fix/enhance THP integration. Detect whether chunks start off as THP-capable by default (according to the state of /sys/kernel/mm/transparent_hugepage/enabled), and use this as the basis for whether to call pages_nohuge() once per chunk during first purge of any of the chunk's page runs. Add the --disable-thp configure option, as well as the the opt.thp mallctl. This resolves #541.	2017-02-28 14:25:06 -08:00
Jason Evans	1027a2682b	Add some missing explicit casts. This resolves #614.	2017-02-27 14:41:01 -08:00
Jason Evans	1e2c9ef8d6	Fix huge-aligned allocation. This regression was caused by b9408d77a63a54fd331f9b81c884f68e6d57f2e5 (Fix/simplify chunk_recycle() allocation size computations.). This resolves #647.	2017-02-27 11:08:19 -08:00
Jason Evans	08c24e7c1a	Relax witness assertions related to prof_gdump(). In some cases the prof machinery allocates (in order to modify the bt2gctx hash table), and such operations are synchronized via bt2gctx_mtx. Rather than asserting that no locks are held on entry into functions that may call prof_gdump(), make the weaker assertion that no "core" locks are held. The prof machinery enqueues dumps triggered by prof_gdump() calls when bt2gctx_mtx is held, so this weakened assertion avoids false failures in such cases.	2017-02-23 10:08:42 -08:00
Jason Evans	f56cb9a68e	Add witness_assert_depth[_to_rank](). This makes it possible to make lock state assertions about precisely which locks are held.	2017-02-23 10:08:42 -08:00
Jason Evans	7034e6baa1	Enable mutex witnesses even when !isthreaded. This fixes interactions with witness_assert_depth[_to_rank](), which was added in dad74bd3c811ca2b1af1fd57b28f2456da5ba08b (Convert witness_assert_lockless() to witness_assert_lock_depth().).	2017-02-23 10:08:42 -08:00
Jason Evans	3ecc3c8486	Fix/refactor tcaches synchronization. Synchronize tcaches with tcaches_mtx rather than ctl_mtx. Add missing synchronization for tcache flushing. This bug was introduced by 1cb181ed632e7573fb4eab194e4d216867222d27 (Implement explicit tcache support.), which was first released in 4.0.0.	2017-02-23 10:06:15 -08:00
Jason Evans	b49c649bc1	Fix lock order reversal during gdump.	2017-01-24 12:50:06 -08:00
Jason Evans	dad74bd3c8	Convert witness_assert_lockless() to witness_assert_lock_depth(). This makes it possible to make lock state assertions about precisely which locks are held.	2017-01-24 12:50:06 -08:00
Mike Hommey	c68bb41793	Don't rely on OSX SDK malloc/malloc.h for malloc_zone struct definitions The SDK jemalloc is built against might be not be the latest for various reasons, but the resulting binary ought to work on newer versions of OSX. In order to ensure this, we need the fullest definitions possible, so copy what we need from the latest version of malloc/malloc.h available on opensource.apple.com.	2017-01-17 20:12:24 -08:00
John Paul Adrian Glaubitz	9389335b86	Use better pre-processor defines for sparc64 Currently, jemalloc detects sparc64 targets by checking whether __sparc64__ is defined. However, this definition is used on BSD targets only. Linux targets define both __sparc__ and __arch64__ for sparc64. Since this also works on BSD, rather use __sparc__ and __arch64__ instead of __sparc64__ to detect sparc64 targets.	2017-01-13 09:01:33 -08:00
Jason Evans	145f3cd173	Add --disable-syscall. This resolves #517.	2016-12-03 16:56:19 -08:00
Jason Evans	e98a620c59	Mark partially purged arena chunks as non-hugepage. Add the pages_[no]huge() functions, which toggle huge page state via madvise(..., MADV_[NO]HUGEPAGE) calls. The first time a page run is purged from within an arena chunk, call pages_nohuge() to tell the kernel to make no further attempts to back the chunk with huge pages. Upon arena chunk deletion, restore the associated virtual memory to its original state via pages_huge(). This resolves #243.	2016-11-24 00:15:55 -08:00
Jason Evans	fc11f3cb84	Enable overriding JEMALLOC_{ALLOC,FREE}_JUNK. This resolves #509.	2016-11-22 11:02:28 -08:00
Jason Evans	949a27fc32	Add pthread_atfork(3) feature test. Some versions of Android provide a pthreads library without providing pthread_atfork(), so in practice a separate feature test is necessary for the latter.	2016-11-17 15:16:27 -08:00
Jason Evans	62f2d84e7a	Refactor madvise(2) configuration. Add feature tests for the MADV_FREE and MADV_DONTNEED flags to madvise(2), so that MADV_FREE is detected and used for Linux kernel versions 4.5 and newer. Refactor pages_purge() so that on systems which support both flags, MADV_FREE is preferred over MADV_DONTNEED. This resolves #387.	2016-11-17 10:37:48 -08:00
Jason Evans	0d6a472db9	Avoid gcc tautological-compare warnings.	2016-11-16 18:53:59 -08:00
Jason Evans	87004d238c	Avoid negation of unsigned numbers. Rather than relying on two's complement negation for alignment mask generation, use bitwise not and addition. This dodges warnings from MSVC, and should be strength-reduced by compiler optimization anyway.	2016-11-15 14:04:35 -08:00
Jason Evans	5c77af98b1	Add extent serial numbers. Add extent serial numbers and use them where appropriate as a sort key that is higher priority than address, so that the allocation policy prefers older extents. This resolves #147.	2016-11-15 13:33:40 -08:00
Jason Evans	a2af09f025	Remove overly restrictive stats_cactive_{add,sub}() assertions. This fixes a regression caused by 40ee9aa9577ea5eb6616c10b9e6b0fa7e6796821 (Fix stats.cactive accounting regression.) and first released in 4.1.0.	2016-11-11 22:19:10 -08:00
Jason Evans	7b8e74f48f	Revert "Define 64-bits atomics unconditionally" This reverts commit af33e9a59735a2ee72132d3dd6e23fae6d296e34. This resolves #495.	2016-11-07 11:51:05 -08:00
Jason Evans	5d6cb6eb66	Refactor prng to not use 64-bit atomics on 32-bit platforms. This resolves #495.	2016-11-07 11:50:59 -08:00
Jason Evans	e9012630ac	Fix chunk_alloc_cache() to support decommitted allocation. Fix chunk_alloc_cache() to support decommitted allocation, and use this ability in arena_chunk_alloc_internal() and arena_stash_dirty(), so that chunks don't get permanently stuck in a hybrid state. This resolves #487.	2016-11-03 22:36:30 -07:00
Jason Evans	dd3ed23aea	Update symbol mangling.	2016-11-03 14:55:58 -07:00
Jason Evans	da206df10b	Do not use syscall(2) on OS X 10.12 (deprecated).	2016-11-02 19:35:12 -07:00
Jason Evans	3f2b8d9cfa	Add os_unfair_lock support. OS X 10.12 deprecated OSSpinLock; os_unfair_lock is the recommended replacement.	2016-11-02 19:35:12 -07:00
Jason Evans	a99e0fa2d2	Fix/refactor zone allocator integration code. Fix zone_force_unlock() to reinitialize, rather than unlocking mutexes, since OS X 10.12 cannot tolerate a child unlocking mutexes that were locked by its parent. Refactor; this was a side effect of experimenting with zone {de,re}registration during fork(2).	2016-11-02 19:35:09 -07:00
Jason Evans	4752a54eeb	Refactor witness_unlock() to fix undefined test behavior. This resolves #396.	2016-10-31 11:51:39 -07:00
Jason Evans	1d57c03e33	Use CLOCK_MONOTONIC_COARSE rather than COARSE_MONOTONIC_RAW. The raw clock variant is slow (even relative to plain CLOCK_MONOTONIC), whereas the coarse clock variant is faster than CLOCK_MONOTONIC, but still has resolution (~1ms) that is adequate for our purposes. This resolves #479.	2016-10-29 22:59:42 -07:00
Dave Watson	ed84764a2a	Support static linking of jemalloc with glibc glibc defines its malloc implementation with several weak and strong symbols: strong_alias (__libc_calloc, __calloc) weak_alias (__libc_calloc, calloc) strong_alias (__libc_free, __cfree) weak_alias (__libc_free, cfree) strong_alias (__libc_free, __free) strong_alias (__libc_free, free) strong_alias (__libc_malloc, __malloc) strong_alias (__libc_malloc, malloc) The issue is not with the weak symbols, but that other parts of glibc depend on __libc_malloc explicitly. Defining them in terms of jemalloc API's allows the linker to drop glibc's malloc.o completely from the link, and static linking no longer results in symbol collisions. Another wrinkle: jemalloc during initialization calls sysconf to get the number of CPU's. GLIBC allocates for the first time before setting up isspace (and other related) tables, which are used by sysconf. Instead, use the pthread API to get the number of CPUs with GLIBC, which seems to work. This resolves #442.	2016-10-28 15:10:19 -07:00
Jason Evans	962a2979e3	Do not (recursively) allocate within tsd_fetch(). Refactor tsd so that tsdn_fetch() does not trigger allocation, since allocation could cause infinite recursion. This resolves #458.	2016-10-21 00:27:37 -07:00
Jason Evans	e2bcf037d4	Make dss operations lockless. Rather than protecting dss operations with a mutex, use atomic operations. This has negligible impact on synchronization overhead during typical dss allocation, but is a substantial improvement for chunk_in_dss() and the newly added chunk_dss_mergeable(), which can be called multiple times during chunk deallocations. This change also has the advantage of avoiding tsd in deallocation paths associated with purging, which resolves potential deadlocks during thread exit due to attempted tsd resurrection. This resolves #425.	2016-10-13 15:33:56 -07:00
Jason Evans	9737685943	Add/use adaptive spinning. Add spin_t and spin_{init,adaptive}(), which provide a simple abstraction for adaptive spinning. Adaptively spin during busy waits in bootstrapping and rtree node initialization.	2016-10-13 14:58:38 -07:00
Jason Evans	d419bb09ef	Fix and simplify decay-based purging. Simplify decay-based purging attempts to only be triggered when the epoch is advanced, rather than every time purgeable memory increases. In a correctly functioning system (not previously the case; see below), this only causes a behavior difference if during subsequent purge attempts the least recently used (LRU) purgeable memory extent is initially too large to be purged, but that memory is reused between attempts and one or more of the next LRU purgeable memory extents are small enough to be purged. In practice this is an arbitrary behavior change that is within the set of acceptable behaviors. As for the purging fix, assure that arena->decay.ndirty is recorded after the epoch advance and associated purging occurs. Prior to this fix, it was possible for purging during epoch advance to cause a substantially underrepresentative (arena->ndirty - arena->decay.ndirty), i.e. the number of dirty pages attributed to the current epoch was too low, and a series of unintended purges could result. This fix is also relevant in the context of the simplification described above, but the bug's impact would be limited to over-purging at epoch advances.	2016-10-11 15:50:05 -07:00
Jason Evans	45a5bf6772	Do not advance decay epoch when time goes backwards. Instead, move the epoch backward in time. Additionally, add nstime_monotonic() and use it in debug builds to assert that time only goes backward if nstime_update() is using a non-monotonic time source.	2016-10-10 22:31:37 -07:00
Jason Evans	94e7ffa979	Refactor arena->decay_* into arena->decay.* (arena_decay_t).	2016-10-10 22:22:59 -07:00
Jason Evans	b732c395b7	Refine nstime_update(). Add missing #include <time.h>. The critical time facilities appear to have been transitively included via unistd.h and sys/time.h, but in principle this omission was capable of having caused clock_gettime(CLOCK_MONOTONIC, ...) to have been overlooked in favor of gettimeofday(), which in turn could cause spurious non-monotonic time updates. Refactor nstime_get() out of nstime_update() and add configure tests for all variants. Add CLOCK_MONOTONIC_RAW support (Linux-specific) and mach_absolute_time() support (OS X-specific). Do not fall back to clock_gettime(CLOCK_REALTIME, ...). This was a fragile Linux-specific workaround, which we're unlikely to use at all now that clock_gettime(CLOCK_MONOTONIC_RAW, ...) is supported, and if we have no choice besides non-monotonic clocks, gettimeofday() is only incrementally worse.	2016-10-10 11:40:46 -07:00
Jason Evans	5d8db15db9	Simplify run quantization.	2016-10-06 15:58:38 -07:00
Jason Evans	f193fd80cf	Refactor runs_avail. Use pszind_t size classes rather than szind_t size classes, and always reserve space for NPSIZES elements. This removes unused heaps that are not multiples of the page size, and adds (currently) unused heaps for all huge size classes, with the immediate benefit that the size of arena_t allocations is constant (no longer dependent on chunk size).	2016-10-04 19:48:50 -07:00
Jason Evans	1abb49f09d	Implement pz2ind(), pind2sz(), and psz2u(). These compute size classes and indices similarly to size2index(), index2size() and s2u(), respectively, but using the subset of size classes that are multiples of the page size. Note that pszind_t and szind_t are not interchangeable.	2016-10-04 16:29:19 -07:00
Jason Evans	bcd5424b1c	Use TSDN_NULL rather than NULL as appropriate.	2016-10-04 15:56:56 -07:00
Mike Hommey	af33e9a597	Define 64-bits atomics unconditionally They are used on all platforms in prng.h.	2016-10-04 12:18:14 -07:00
Eric Le Bihan	b54c0c2925	Fix LG_QUANTUM definition for sparc64 GCC 4.9.3 cross-compiled for sparc64 defines __sparc_v9__, not __sparc64__ nor __sparcv9. This prevents LG_QUANTUM from being defined properly. Adding this new value to the check solves the issue.	2016-09-26 15:14:59 -07:00
Elliot Ronaghan	5acef864f2	Don't use compact red-black trees with the pgi compiler Some bug (either in the red-black tree code, or in the pgi compiler) seems to cause red-black trees to become unbalanced. This issue seems to go away if we don't use compact red-black trees. Since red-black trees don't seem to be used much anymore, I opted for what seems to be an easy fix here instead of digging in and trying to find the root cause of the bug. Some context in case it's helpful: I experienced a ton of segfaults while using pgi as Chapel's target compiler with jemalloc 4.0.4. The little bit of debugging I did pointed me somewhere deep in red-black tree manipulation, but I didn't get a chance to investigate further. It looks like 4.2.0 replaced most uses of red-black trees with pairing-heaps, which seems to avoid whatever bug I was hitting. However, `make check_unit` was still failing on the rb test, so I figured the core issue was just being masked. Here's the `make check_unit` failure: ```sh === test/unit/rb === test_rb_empty: pass tree_recurse:test/unit/rb.c:90: Failed assertion: (((_Bool) (((uintptr_t) (left_node)->link.rbn_right_red) & ((size_t)1)))) == (false) --> true != false: Node should be black test_rb_random:test/unit/rb.c:274: Failed assertion: (imbalances) == (0) --> 1 != 0: Tree is unbalanced tree_recurse:test/unit/rb.c:90: Failed assertion: (((_Bool) (((uintptr_t) (left_node)->link.rbn_right_red) & ((size_t)1)))) == (false) --> true != false: Node should be black test_rb_random:test/unit/rb.c:274: Failed assertion: (imbalances) == (0) --> 1 != 0: Tree is unbalanced node_remove:test/unit/rb.c:190: Failed assertion: (imbalances) == (0) --> 2 != 0: Tree is unbalanced <jemalloc>: test/unit/rb.c:43: Failed assertion: "pathp[-1].cmp < 0" test/test.sh: line 22: 12926 Aborted Test harness error ``` While starting to debug I saw the RB_COMPACT option and decided to check if turning that off resolved the bug. It seems to have fixed it (`make check_unit` passes and the segfaults under Chapel are gone) so it seems like on okay work-around. I'd imagine this has performance implications for red-black trees under pgi, but if they're not going to be used much anymore it's probably not a big deal.	2016-09-26 11:08:45 -07:00
Elliot Ronaghan	d1207f0d37	Check for __builtin_unreachable at configure time Add a configure check for __builtin_unreachable instead of basing its availability on the __GNUC__ version. On OS X using gcc (a real gcc, not the bundled version that's just a gcc front-end) leads to a linker assertion: https://github.com/jemalloc/jemalloc/issues/266 It turns out that this is caused by a gcc bug resulting from the use of __builtin_unreachable(): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57438 To work around this bug, check that __builtin_unreachable() actually works at configure time, and if it doesn't use abort() instead. The check is based on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57438#c21. With this `make check` passes with a homebrew installed gcc-5 and gcc-6.	2016-09-26 10:44:37 -07:00
Jason Evans	20cd2de5ef	Add a missing prof_alloc_rollback() call. In the case where prof_alloc_prep() is called with an over-estimate of allocation size, and sampling doesn't end up being triggered, the tctx must be discarded.	2016-06-08 10:12:38 -07:00
Jason Evans	05a9e4ac65	Fix potential VM map fragmentation regression. Revert 245ae6036c09cc11a72fab4335495d95cddd5beb (Support --with-lg-page values larger than actual page size.), because it could cause VM map fragmentation if the kernel grows mmap()ed memory downward. This resolves #391.	2016-06-07 14:21:21 -07:00
Jason Evans	73d3d58dc2	Optimize witness fast path. Short-circuit commonly called witness functions so that they only execute in debug builds, and remove equivalent guards from mutex functions. This avoids pointless code execution in witness_assert_lockless(), which is typically called twice per allocation/deallocation function invocation. Inline commonly called witness functions so that optimized builds can completely remove calls as dead code.	2016-05-11 15:38:06 -07:00
Jason Evans	c1e00ef2a6	Resolve bootstrapping issues when embedded in FreeBSD libc. b2c0d6322d2307458ae2b28545f8a5c9903d7ef5 (Add witness, a simple online locking validator.) caused a broad propagation of tsd throughout the internal API, but tsd_fetch() was designed to fail prior to tsd bootstrapping. Fix this by splitting tsd_t into non-nullable tsd_t and nullable tsdn_t, and modifying all internal APIs that do not critically rely on tsd to take nullable pointers. Furthermore, add the tsd_booted_get() function so that tsdn_fetch() can probe whether tsd bootstrapping is complete and return NULL if not. All dangerous conversions of nullable pointers are tsdn_tsd() calls that assert-fail on invalid conversion.	2016-05-10 22:51:33 -07:00

1 2 3 4 5 ...

495 Commits