server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Jason Evans	5d8db15db9	Simplify run quantization.	2016-10-06 15:58:38 -07:00
Jason Evans	f193fd80cf	Refactor runs_avail. Use pszind_t size classes rather than szind_t size classes, and always reserve space for NPSIZES elements. This removes unused heaps that are not multiples of the page size, and adds (currently) unused heaps for all huge size classes, with the immediate benefit that the size of arena_t allocations is constant (no longer dependent on chunk size).	2016-10-04 19:48:50 -07:00
Jason Evans	1abb49f09d	Implement psz2ind(), pind2sz(), and psz2u(). These compute size classes and indices similarly to size2index(), index2size() and s2u(), respectively, but using the subset of size classes that are multiples of the page size. Note that pszind_t and szind_t are not interchangeable.	2016-10-04 16:29:19 -07:00
Jason Evans	bcd5424b1c	Use TSDN_NULL rather than NULL as appropriate.	2016-10-04 15:56:56 -07:00
Mike Hommey	af33e9a597	Define 64-bits atomics unconditionally. They are used on all platforms in prng.h.	2016-10-04 12:18:14 -07:00
Eric Le Bihan	b54c0c2925	Fix LG_QUANTUM definition for sparc64. GCC 4.9.3 cross-compiled for sparc64 defines __sparc_v9__, not __sparc64__ nor __sparcv9. This prevents LG_QUANTUM from being defined properly. Adding this new value to the check solves the issue.	2016-09-26 15:14:59 -07:00
Elliot Ronaghan	5acef864f2	Don't use compact red-black trees with the pgi compiler. Some bug (either in the red-black tree code, or in the pgi compiler) seems to cause red-black trees to become unbalanced. This issue seems to go away if we don't use compact red-black trees. Since red-black trees don't seem to be used much anymore, I opted for what seems to be an easy fix here instead of digging in and trying to find the root cause of the bug. Some context in case it's helpful: I experienced a ton of segfaults while using pgi as Chapel's target compiler with jemalloc 4.0.4. The little bit of debugging I did pointed me somewhere deep in red-black tree manipulation, but I didn't get a chance to investigate further. It looks like 4.2.0 replaced most uses of red-black trees with pairing-heaps, which seems to avoid whatever bug I was hitting. However, `make check_unit` was still failing on the rb test, so I figured the core issue was just being masked. Here's the `make check_unit` failure: ```sh === test/unit/rb === test_rb_empty: pass tree_recurse:test/unit/rb.c:90: Failed assertion: (((_Bool) (((uintptr_t) (left_node)->link.rbn_right_red) & ((size_t)1)))) == (false) --> true != false: Node should be black test_rb_random:test/unit/rb.c:274: Failed assertion: (imbalances) == (0) --> 1 != 0: Tree is unbalanced tree_recurse:test/unit/rb.c:90: Failed assertion: (((_Bool) (((uintptr_t) (left_node)->link.rbn_right_red) & ((size_t)1)))) == (false) --> true != false: Node should be black test_rb_random:test/unit/rb.c:274: Failed assertion: (imbalances) == (0) --> 1 != 0: Tree is unbalanced node_remove:test/unit/rb.c:190: Failed assertion: (imbalances) == (0) --> 2 != 0: Tree is unbalanced <jemalloc>: test/unit/rb.c:43: Failed assertion: "pathp[-1].cmp < 0" test/test.sh: line 22: 12926 Aborted Test harness error ``` While starting to debug I saw the RB_COMPACT option and decided to check if turning that off resolved the bug. It seems to have fixed it (`make check_unit` passes and the segfaults under Chapel are gone) so it seems like an okay work-around. I'd imagine this has performance implications for red-black trees under pgi, but if they're not going to be used much anymore it's probably not a big deal.	2016-09-26 11:08:45 -07:00
Elliot Ronaghan	d1207f0d37	Check for __builtin_unreachable at configure time. Add a configure check for __builtin_unreachable instead of basing its availability on the __GNUC__ version. On OS X using gcc (a real gcc, not the bundled version that's just a gcc front-end) leads to a linker assertion: https://github.com/jemalloc/jemalloc/issues/266 It turns out that this is caused by a gcc bug resulting from the use of __builtin_unreachable(): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57438 To work around this bug, check that __builtin_unreachable() actually works at configure time, and if it doesn't use abort() instead. The check is based on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57438#c21. With this `make check` passes with a homebrew installed gcc-5 and gcc-6.	2016-09-26 10:44:37 -07:00
Jason Evans	20cd2de5ef	Add a missing prof_alloc_rollback() call. In the case where prof_alloc_prep() is called with an over-estimate of allocation size, and sampling doesn't end up being triggered, the tctx must be discarded.	2016-06-08 10:12:38 -07:00
Jason Evans	05a9e4ac65	Fix potential VM map fragmentation regression. Revert 245ae6036c09cc11a72fab4335495d95cddd5beb (Support --with-lg-page values larger than actual page size.), because it could cause VM map fragmentation if the kernel grows mmap()ed memory downward. This resolves #391.	2016-06-07 14:21:21 -07:00
Jason Evans	73d3d58dc2	Optimize witness fast path. Short-circuit commonly called witness functions so that they only execute in debug builds, and remove equivalent guards from mutex functions. This avoids pointless code execution in witness_assert_lockless(), which is typically called twice per allocation/deallocation function invocation. Inline commonly called witness functions so that optimized builds can completely remove calls as dead code.	2016-05-11 15:38:06 -07:00
Jason Evans	c1e00ef2a6	Resolve bootstrapping issues when embedded in FreeBSD libc. b2c0d6322d2307458ae2b28545f8a5c9903d7ef5 (Add witness, a simple online locking validator.) caused a broad propagation of tsd throughout the internal API, but tsd_fetch() was designed to fail prior to tsd bootstrapping. Fix this by splitting tsd_t into non-nullable tsd_t and nullable tsdn_t, and modifying all internal APIs that do not critically rely on tsd to take nullable pointers. Furthermore, add the tsd_booted_get() function so that tsdn_fetch() can probe whether tsd bootstrapping is complete and return NULL if not. All dangerous conversions of nullable pointers are tsdn_tsd() calls that assert-fail on invalid conversion.	2016-05-10 22:51:33 -07:00
Jason Evans	919e4a0ea9	Add LG_QUANTUM definition for the RISC-V architecture.	2016-05-06 17:15:32 -07:00
Jason Evans	1326010cf4	Update private_symbols.txt.	2016-05-06 14:50:58 -07:00
Jason Evans	3ef51d7f73	Optimize the fast paths of calloc() and [m,d,sd]allocx(). This is a broader application of optimizations to malloc() and free() in f4a0f32d340985de477bbe329ecdaecd69ed1055 (Fast-path improvement: reduce # of branches and unnecessary operations.). This resolves #321.	2016-05-06 14:37:39 -07:00
Jason Evans	c2f970c32b	Modify pages_map() to support mapping uncommitted virtual memory. If the OS overcommits: - Commit all mappings in pages_map() regardless of whether the caller requested committed memory. - Linux-specific: Specify MAP_NORESERVE to avoid unfortunate interactions with heuristic overcommit mode during fork(2). This resolves #193.	2016-05-05 18:56:17 -07:00
Jason Evans	04c3c0f9a0	Add the stats.retained and stats.arenas.<i>.retained statistics. This resolves #367.	2016-05-03 22:11:35 -07:00
Jason Evans	90827a3f3e	Fix huge_palloc() regression. Split arena_choose() into arena_[i]choose() and use arena_ichoose() for arena lookup during internal allocation. This fixes huge_palloc() so that it always succeeds during extent node allocation. This regression was introduced by 66cd953514a18477eb49732e40d5c2ab5f1b12c5 (Do not allocate metadata via non-auto arenas, nor tcaches.).	2016-05-03 17:19:15 -07:00
Jason Evans	108c4a11e9	Fix witness/fork() interactions. Fix witness to clear its list of owned mutexes in the child if platform-specific malloc_mutex code re-initializes mutexes rather than unlocking them.	2016-04-26 10:47:22 -07:00
Jason Evans	174c0c3a9c	Fix fork()-related lock rank ordering reversals.	2016-04-25 23:16:20 -07:00
Jason Evans	71d94828a2	Fix degenerate mb_write() compilation error. This resolves #375.	2016-04-22 21:27:17 -07:00
Jason Evans	19ff2cefba	Implement the arena.<i>.reset mallctl. This makes it possible to discard all of an arena's allocations in a single operation. This resolves #146.	2016-04-22 15:20:06 -07:00
Jason Evans	66cd953514	Do not allocate metadata via non-auto arenas, nor tcaches. This assures that all internally allocated metadata come from the first opt_narenas arenas, i.e. the automatically multiplexed arenas.	2016-04-22 15:19:59 -07:00
Jason Evans	b6e07d2389	Fix malloc_mutex_assert_[not_]owner() for --enable-lazy-lock case.	2016-04-18 15:42:09 -07:00
Jason Evans	ab0cfe01fa	Update private_symbols.txt. Change test-related mangling to simplify symbol filtering. The following commands can be used to detect missing/obsolete symbol mangling, with the caveat that the full set of symbols is based on the union of symbols generated by all configurations, some of which are platform-specific: ./autogen.sh --enable-debug --enable-prof --enable-lazy-lock make all tests nm -a lib/libjemalloc.a src/*.jet.o \ |grep " [TDBCR] " \ |awk '{print $3}' \ |sed -e 's/^\(je_\|jet_\(n_\)\?\)\([a-zA-Z0-9_]*\)/\3/g' \ |LC_COLLATE=C sort -u \ |grep -v \ -e '^\(malloc\|calloc\|posix_memalign\|aligned_alloc\|realloc\|free\)$' \ -e '^\(m\|r\|x\|s\|d\|sd\|n\)allocx$' \ -e '^mallctl\(\|nametomib\|bymib\)$' \ -e '^malloc_\(stats_print\|usable_size\|message\)$' \ -e '^\(memalign\|valloc\)$' \ -e '^__\(malloc\|memalign\|realloc\|free\)_hook$' \ -e '^pthread_create$' \ > /tmp/private_symbols.txt	2016-04-18 15:23:35 -07:00
Rajat Goel	a0c632c9d5	Update private_symbols.txt. Add 4 missing symbols.	2016-04-18 11:54:09 -07:00
Jason Evans	1423ee9016	Fix style nits.	2016-04-17 13:44:59 -07:00
Jason Evans	1b5830178f	Fix malloc_mutex_[un]lock() to conditionally check witness. Also remove tautological cassert(config_debug) calls.	2016-04-17 13:44:59 -07:00
Jason Evans	2288424325	s/MALLOC_MUTEX_RANK_OMIT/WITNESS_RANK_OMIT/ This fixes a compilation error caused by b2c0d6322d2307458ae2b28545f8a5c9903d7ef5 (Add witness, a simple online locking validator.). This resolves #375.	2016-04-14 12:18:55 -07:00
Jason Evans	a15841cc7d	Fix a compilation error. Fix a compilation error that occurs if Valgrind is not enabled. This regression was caused by b2c0d6322d2307458ae2b28545f8a5c9903d7ef5 (Add witness, a simple online locking validator.).	2016-04-14 02:12:33 -07:00
Jason Evans	b2c0d6322d	Add witness, a simple online locking validator. This resolves #358.	2016-04-14 02:09:28 -07:00
Jason Evans	8413463f3a	Fix a style nit.	2016-04-12 23:18:25 -07:00
Jason Evans	667eca2ac2	Simplify RTREE_HEIGHT_MAX definition. Use 1U rather than ZU(1) in macro definitions, so that the preprocessor can evaluate the resulting expressions.	2016-04-11 02:35:00 -07:00
Jason Evans	245ae6036c	Support --with-lg-page values larger than actual page size. During over-allocation in preparation for creating aligned mappings, allocate one more page than necessary if PAGE is the actual page size, so that trimming still succeeds even if the system returns a mapping that has less than PAGE alignment. This allows compiling with e.g. 64 KiB "pages" on systems that actually use 4 KiB pages. Note that for e.g. --with-lg-page=21, it is also necessary to increase the chunk size (e.g. --with-malloc-conf=lg_chunk:22) so that there are at least two "pages" per chunk. In practice this isn't a particularly compelling configuration because so much (unusable) virtual memory is dedicated to chunk headers.	2016-04-11 02:35:00 -07:00
Jason Evans	96aa67aca8	Clean up char vs. uint8_t in junk filling code. Consistently use uint8_t rather than char for junk filling code.	2016-04-11 02:26:35 -07:00
Jason Evans	c6a2c39404	Refactor/fix ph. Refactor ph to support configurable comparison functions. Use a cpp macro code generation form equivalent to the rb macros so that pairing heaps can be used for both run heaps and chunk heaps. Remove per node parent pointers, and instead use leftmost siblings' prev pointers to track parents. Fix multi-pass sibling merging to iterate over intermediate results using a FIFO, rather than a LIFO. Use this fixed sibling merging implementation for both merge phases of the auxiliary twopass algorithm (first merging the aux list, then replacing the root with its merged children). This fixes both degenerate merge behavior and the potential for deep recursion. This regression was introduced by 6bafa6678fc36483e638f1c3a0a9bf79fb89bfc9 (Pairing heap). This resolves #371.	2016-04-11 02:15:42 -07:00
Jason Evans	2ee2f1ec57	Reduce differences between alternative bitmap implementations.	2016-04-06 10:38:47 -07:00
Jason Evans	4a8abbb400	Fix bitmap_sfu() regression. Fix bitmap_sfu() to shift by LG_BITMAP_GROUP_NBITS rather than hard-coded 6 when using linear (non-USE_TREE) bitmap search. In practice this affects only 64-bit systems for which sizeof(long) is not 8 (i.e. Windows), since USE_TREE is defined for 32-bit systems. This regression was caused by b8823ab02607d6f03febd32ac504bb6188c54047 (Use linear scan for small bitmaps). This resolves #368.	2016-04-06 10:32:06 -07:00
Chris Peterson	a82070ef5f	Add JEMALLOC_ALLOC_JUNK and JEMALLOC_FREE_JUNK macros. Replace hardcoded 0xa5 and 0x5a junk values with JEMALLOC_ALLOC_JUNK and JEMALLOC_FREE_JUNK macros, respectively.	2016-03-31 11:23:29 -07:00
Jason Evans	ce7c0f999b	Fix potential chunk leaks. Move chunk_dalloc_arena()'s implementation into chunk_dalloc_wrapper(), so that if the dalloc hook fails, proper decommit/purge/retain cascading occurs. This fixes three potential chunk leaks on OOM paths, one during dss-based chunk allocation, one during chunk header commit (currently relevant only on Windows), and one during rtree write (e.g. if rtree node allocation fails). Merge chunk_purge_arena() into chunk_purge_default() (refactor, no change to functionality).	2016-03-30 18:36:04 -07:00
Chris Peterson	f3060284c5	Remove unused arenas_extend() function declaration. The arenas_extend() function was renamed to arenas_init() in commit 8bb3198f72fc7587dc93527f9f19fb5be52fa553, but its function declaration was not removed from jemalloc_internal.h.in.	2016-03-26 01:03:24 -07:00
Jason Evans	af3184cac0	Use abort() for fallback implementations of unreachable().	2016-03-24 01:42:08 -07:00
Jason Evans	61a6dfcd5f	Constify various internal arena APIs.	2016-03-23 16:15:42 -07:00
Jason Evans	6a885198c2	Always inline performance-critical rtree operations.	2016-03-23 16:15:42 -07:00
Jason Evans	6c460ad91b	Optimize rtree_get(). Specialize fast path to avoid code that cannot execute for dependent loads. Manually unroll.	2016-03-22 17:54:35 -07:00
Jason Evans	22af74e106	Refactor out signed/unsigned comparisons.	2016-03-15 09:40:02 -07:00
Rajeev Misra	ca18f2834e	typecast address to pointer to byte to avoid unaligned memory access error	2016-03-10 22:49:05 -08:00
Jason Evans	613cdc80f6	Convert arena_bin_t's runs from a tree to a heap.	2016-03-08 13:48:27 -08:00
Dave Watson	4a0dbb5ac8	Use pairing heap for arena->runs_avail. Use pairing heap instead of red black tree in arena runs_avail. The extra links are unioned with the bitmap_t, so this change doesn't use any extra memory. Canaries show this change to be a 1% cpu win, and 2% latency win. In particular, large free()s, and small bin frees are now O(1) (barring coalescing). I also tested changing bin->runs to be a pairing heap, but saw a much smaller win, and it would mean increasing the size of arena_run_s by two pointers, so I left that as an rb-tree for now.	2016-03-08 13:48:27 -08:00
Jason Evans	f8d80d62a8	Refactor ph_merge_ordered() out of ph_merge().	2016-03-08 13:48:27 -08:00

427 Commits
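
Commits d1207f0d37 and af3184cac0 describe selecting __builtin_unreachable() only when a configure-time check shows it works, and falling back to abort() otherwise. A minimal sketch of that pattern follows. The names HAVE_BUILTIN_UNREACHABLE, my_unreachable(), color_t and color_name() are hypothetical, chosen for illustration; they are not jemalloc's actual identifiers.

```c
#include <stdlib.h>

/*
 * HAVE_BUILTIN_UNREACHABLE is a hypothetical macro. The assumption is that
 * a configure script defines it only after a small test program using
 * __builtin_unreachable() has compiled and linked successfully.
 */
#ifdef HAVE_BUILTIN_UNREACHABLE
#  define my_unreachable() __builtin_unreachable()
#else
#  define my_unreachable() abort()  /* fallback: terminate instead */
#endif

typedef enum { COLOR_RED, COLOR_BLACK } color_t;

/* Every valid color_t is handled in the switch; the trailing call
 * marks the path that should never execute. */
static const char *color_name(color_t c) {
    switch (c) {
    case COLOR_RED:
        return "red";
    case COLOR_BLACK:
        return "black";
    }
    my_unreachable();
}
```

With the fallback, reaching the marked path terminates the process through abort() rather than invoking undefined behavior, which trades a possible optimization for a predictable failure.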