server-skynet-source-3rd-jemalloc

project-base/server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Dave Watson	2319152d9f	jemalloc cpp new/delete bindings Adds cpp bindings for jemalloc, along with necessary autoconf settings. This is mostly to add sized deallocation support, which can't be added from C directly. Sized deallocation is ~10% microbench improvement. * Import ax_cxx_compile_stdcxx.m4 from the autoconf repo, seems like the easiest way to get c++14 detection. * Adds various other changes, like CXXFLAGS, to configure.ac. * Adds new rules to Makefile.in for src/jemalloc-cpp.cpp, and a basic unittest. * Both new and delete are overridden, to ensure jemalloc is used for both. * TODO future enhancement of avoiding extra PLT thunks for new and delete - sdallocx and malloc are publicly exported jemalloc symbols, using an alias would link them directly. Unfortunately, was having trouble getting it to play nice with jemalloc's namespace support. Testing: Tested gcc 4.8, gcc 5, gcc 5.2, clang 4.0. Only gcc >= 5 has sized deallocation support, verified that the rest build correctly. Tested mac osx and Centos. Tested --with-jemalloc-prefix and --without-export. This resolves #202.	2016-12-12 18:36:06 -08:00
Jason Evans	acb7b1f53e	Add --disable-syscall. This resolves #517.	2016-12-03 16:50:58 -08:00
Jason Evans	5234be2133	Add pthread_atfork(3) feature test. Some versions of Android provide a pthreads library without providing pthread_atfork(), so in practice a separate feature test is necessary for the latter.	2016-11-17 15:14:57 -08:00
Jason Evans	a64123ce13	Refactor madvise(2) configuration. Add feature tests for the MADV_FREE and MADV_DONTNEED flags to madvise(2), so that MADV_FREE is detected and used for Linux kernel versions 4.5 and newer. Refactor pages_purge() so that on systems which support both flags, MADV_FREE is preferred over MADV_DONTNEED. This resolves #387.	2016-11-17 10:31:57 -08:00
Jason Evans	aec5a051e8	Avoid gcc type-limits warnings.	2016-11-16 18:28:38 -08:00
Maks Naumov	95974c0440	Remove size_t -> unsigned -> size_t conversion.	2016-11-16 11:23:31 -08:00
Jason Evans	8a4528bdd1	Uniformly cast mallctl[bymib]() oldp/newp arguments to (void *). This avoids warnings in some cases, and is otherwise generally good hygiene.	2016-11-15 15:01:03 -08:00
Jason Evans	a38acf716e	Add extent serial numbers. Add extent serial numbers and use them where appropriate as a sort key that is higher priority than address, so that the allocation policy prefers older extents. This resolves #147.	2016-11-15 13:08:33 -08:00
Jason Evans	c0a667112c	Fix arena_reset() crashing bug. This regression was caused by 498856f44a30b31fe713a18eb2fc7c6ecf3a9f63 (Move slabs out of chunks.).	2016-11-15 10:34:02 -08:00
Jason Evans	cda59f9970	Rename atomic__{uint32,uint64,u}() to atomic__{u32,u64,zu}(). This change conforms to naming conventions throughout the codebase.	2016-11-07 11:27:48 -08:00
Jason Evans	04b463546e	Refactor prng to not use 64-bit atomics on 32-bit platforms. This resolves #495.	2016-11-07 10:52:44 -08:00
Jason Evans	a967fae362	Fix/simplify extent_recycle() allocation size computations. Do not call s2u() during alloc_size computation, since any necessary ceiling increase is taken care of later by extent_first_best_fit() --> extent_size_quantize_ceil(), and the s2u() call may erroneously cause a higher quantization result. Remove an overly strict overflow check that was added in 4a7852137d8b6598fdb90ea8e1fd3bc8a8b94a3a (Fix extent_recycle()'s cache-oblivious padding support.).	2016-11-03 23:49:21 -07:00
Jason Evans	4a7852137d	Fix extent_recycle()'s cache-oblivious padding support. Add padding after computing the size class, so that the optimal size class isn't skipped during search for a usable extent. This regression was caused by b46261d58b449cc4c099ed2384451a2499688f0e (Implement cache-oblivious support for huge size classes.).	2016-11-03 22:33:35 -07:00
Jason Evans	ea9961acdb	Fix psz/pind edge cases. Add an "over-size" extent heap in which to store extents which exceed the maximum size class (plus cache-oblivious padding, if enabled). Remove psz2ind_clamp() and use psz2ind() instead so that trying to allocate the maximum size class can in principle succeed. In practice, this allows assertions to hold so that OOM errors can be successfully generated.	2016-11-03 22:33:34 -07:00
Jason Evans	8dd5ea87ca	Fix extent_alloc_cache[_locked]() to support decommitted allocation. Fix extent_alloc_cache[_locked]() to support decommitted allocation, and use this ability in arena_stash_dirty(), so that decommitted extents are not needlessly committed during purging. In practice this does not happen on any currently supported systems, because both extent merging and decommit must be implemented; all supported systems implement one xor the other.	2016-11-03 22:33:23 -07:00
Dave Watson	25f7bbcf28	Fix long spinning in rtree_node_init rtree_node_init spinlocks the node, allocates, and then sets the node. This is under heavy contention at the top of the tree if many threads start to allocate at the same time. Instead, take a per-rtree sleeping mutex to reduce spinning. Tested both pthreads and osx OSSpinLock, and both reduce spinning adequately Previous benchmark time: ./ttest1 500 100 ~15s New benchmark time: ./ttest1 500 100 .57s	2016-11-02 20:30:53 -07:00
Dave Watson	712fde79fd	Check for existance of CPU_COUNT macro before using it. This resolves #485.	2016-11-02 20:05:40 -07:00
Jason Evans	d82f2b3473	Do not use syscall(2) on OS X 10.12 (deprecated).	2016-11-02 19:18:33 -07:00
Jason Evans	795f6689de	Add os_unfair_lock support. OS X 10.12 deprecated OSSpinLock; os_unfair_lock is the recommended replacement.	2016-11-02 18:09:45 -07:00
Jason Evans	d9f7b2a430	Fix/refactor zone allocator integration code. Fix zone_force_unlock() to reinitialize, rather than unlocking mutexes, since OS X 10.12 cannot tolerate a child unlocking mutexes that were locked by its parent. Refactor; this was a side effect of experimenting with zone {de,re}registration during fork(2).	2016-11-02 18:06:40 -07:00
Jason Evans	7b0a8b74f0	malloc_stats_print() fixes/cleanups. Fix and clean up various malloc_stats_print() issues caused by 0ba5b9b6189e16a983d8922d8c5cb6ab421906e8 (Add "J" (JSON) support to malloc_stats_print().).	2016-11-01 15:26:35 -07:00
Jason Evans	0ba5b9b618	Add "J" (JSON) support to malloc_stats_print(). This resolves #474.	2016-10-31 22:30:49 -07:00
Jason Evans	b93f63b3eb	Fix extent_rtree acquire() to release element on error. This resolves #480.	2016-10-31 16:32:33 -07:00
Jason Evans	6c80321aed	Use CLOCK_MONOTONIC_COARSE rather than COARSE_MONOTONIC_RAW. The raw clock variant is slow (even relative to plain CLOCK_MONOTONIC), whereas the coarse clock variant is faster than CLOCK_MONOTONIC, but still has resolution (~1ms) that is adequate for our purposes. This resolves #479.	2016-10-29 22:58:18 -07:00
Jason Evans	d87037a62c	Use syscall(2) rather than {open,read,close}(2) during boot. Some applications wrap various system calls, and if they call the allocator in their wrappers, unexpected reentry can result. This is not a general solution (many other syscalls are spread throughout the code), but this resolves a bootstrapping issue that is apparently common. This resolves #443.	2016-10-29 22:41:04 -07:00
Jason Evans	1dcd0aa07f	Do not mark malloc_conf as weak on Windows. This works around malloc_conf not being properly initialized by at least the cygwin toolchain. Prior build system changes to use -Wl,--[no-]whole-archive may be necessary for malloc_conf resolution to work properly as a non-weak symbol (not tested).	2016-10-29 00:13:11 -07:00
Jason Evans	6ec2d8e279	Do not mark malloc_conf as weak for unit tests. This is generally correct (no need for weak symbols since no jemalloc library is involved in the link phase), and avoids linking problems (apparently unininitialized non-NULL malloc_conf) when using cygwin with gcc.	2016-10-28 23:03:25 -07:00
Dave Watson	8309388408	Support static linking of jemalloc with glibc glibc defines its malloc implementation with several weak and strong symbols: strong_alias (__libc_calloc, __calloc) weak_alias (__libc_calloc, calloc) strong_alias (__libc_free, __cfree) weak_alias (__libc_free, cfree) strong_alias (__libc_free, __free) strong_alias (__libc_free, free) strong_alias (__libc_malloc, __malloc) strong_alias (__libc_malloc, malloc) The issue is not with the weak symbols, but that other parts of glibc depend on __libc_malloc explicitly. Defining them in terms of jemalloc API's allows the linker to drop glibc's malloc.o completely from the link, and static linking no longer results in symbol collisions. Another wrinkle: jemalloc during initialization calls sysconf to get the number of CPU's. GLIBC allocates for the first time before setting up isspace (and other related) tables, which are used by sysconf. Instead, use the pthread API to get the number of CPUs with GLIBC, which seems to work. This resolves #442.	2016-10-28 15:08:19 -07:00
Jason Evans	68e14c9884	Fix over-sized allocation of rtree leaf nodes. Use the correct level metadata when allocating child nodes so that leaf nodes don't end up over-sized (2^16 elements vs 2^4 elements).	2016-10-28 00:16:55 -07:00
Jason Evans	977103c897	Uniformly cast mallctl[bymib]() oldp/newp arguments to (void *). This avoids warnings in some cases, and is otherwise generally good hygiene.	2016-10-27 21:31:25 -07:00
Jason Evans	b54d160dc4	Do not (recursively) allocate within tsd_fetch(). Refactor tsd so that tsdn_fetch() does not trigger allocation, since allocation could cause infinite recursion. This resolves #458.	2016-10-20 23:59:12 -07:00
Jason Evans	577d4572b0	Make dss operations lockless. Rather than protecting dss operations with a mutex, use atomic operations. This has negligible impact on synchronization overhead during typical dss allocation, but is a substantial improvement for extent_in_dss() and the newly added extent_dss_mergeable(), which can be called multiple times during extent deallocations. This change also has the advantage of avoiding tsd in deallocation paths associated with purging, which resolves potential deadlocks during thread exit due to attempted tsd resurrection. This resolves #425.	2016-10-13 15:37:00 -07:00
Jason Evans	e5effef428	Add/use adaptive spinning. Add spin_t and spin_{init,adaptive}(), which provide a simple abstraction for adaptive spinning. Adaptively spin during busy waits in bootstrapping and rtree node initialization.	2016-10-13 14:55:39 -07:00
Jason Evans	9acd5cf178	Remove all vestiges of chunks. Remove mallctls: - opt.lg_chunk - stats.cactive This resolves #464.	2016-10-12 11:55:43 -07:00
Jason Evans	63b5657aa5	Remove ratio-based purging. Make decay-based purging the default (and only) mode. Remove associated mallctls: - opt.purge - opt.lg_dirty_mult - arena.<i>.lg_dirty_mult - arenas.lg_dirty_mult - stats.arenas.<i>.lg_dirty_mult This resolves #385.	2016-10-12 10:40:27 -07:00
Jason Evans	b4b4a77848	Fix and simplify decay-based purging. Simplify decay-based purging attempts to only be triggered when the epoch is advanced, rather than every time purgeable memory increases. In a correctly functioning system (not previously the case; see below), this only causes a behavior difference if during subsequent purge attempts the least recently used (LRU) purgeable memory extent is initially too large to be purged, but that memory is reused between attempts and one or more of the next LRU purgeable memory extents are small enough to be purged. In practice this is an arbitrary behavior change that is within the set of acceptable behaviors. As for the purging fix, assure that arena->decay.ndirty is recorded after the epoch advance and associated purging occurs. Prior to this fix, it was possible for purging during epoch advance to cause a substantially underrepresentative (arena->ndirty - arena->decay.ndirty), i.e. the number of dirty pages attributed to the current epoch was too low, and a series of unintended purges could result. This fix is also relevant in the context of the simplification described above, but the bug's impact would be limited to over-purging at epoch advances.	2016-10-11 15:30:01 -07:00
Jason Evans	5f11fb7d43	Do not advance decay epoch when time goes backwards. Instead, move the epoch backward in time. Additionally, add nstime_monotonic() and use it in debug builds to assert that time only goes backward if nstime_update() is using a non-monotonic time source.	2016-10-10 22:15:10 -07:00
Jason Evans	ee0c74b77a	Refactor arena->decay_* into arena->decay.* (arena_decay_t).	2016-10-10 20:32:19 -07:00
Jason Evans	e0164bc63c	Refine nstime_update(). Add missing #include <time.h>. The critical time facilities appear to have been transitively included via unistd.h and sys/time.h, but in principle this omission was capable of having caused clock_gettime(CLOCK_MONOTONIC, ...) to have been overlooked in favor of gettimeofday(), which in turn could cause spurious non-monotonic time updates. Refactor nstime_get() out of nstime_update() and add configure tests for all variants. Add CLOCK_MONOTONIC_RAW support (Linux-specific) and mach_absolute_time() support (OS X-specific). Do not fall back to clock_gettime(CLOCK_REALTIME, ...). This was a fragile Linux-specific workaround, which we're unlikely to use at all now that clock_gettime(CLOCK_MONOTONIC_RAW, ...) is supported, and if we have no choice besides non-monotonic clocks, gettimeofday() is only incrementally worse.	2016-10-10 10:33:59 -07:00
Jason Evans	b6c0867142	Reduce "thread.arena" mallctl contention. This resolves #460.	2016-10-04 09:54:18 -07:00
Jason Evans	a5a8d7ae8d	Remove a size class assertion from extent_size_quantize_floor(). Extent coalescence can result in legitimate calls to extent_size_quantize_floor() with size larger than LARGE_MAXCLASS.	2016-10-03 14:45:27 -07:00
Jason Evans	871a9498e1	Fix size class overflow bugs. Avoid calling s2u() on raw extent sizes in extent_recycle(). Clamp psz2ind() (implemented as psz2ind_clamp()) when inserting/removing into/from size-segregated extent heaps.	2016-10-03 14:18:55 -07:00
Jason Evans	3c8c3e9e9b	Close file descriptor after reading "/proc/sys/vm/overcommit_memory". This bug was introduced by c2f970c32b527660a33fa513a76d913c812dcf7c (Modify pages_map() to support mapping uncommitted virtual memory.). This resolves #399.	2016-09-26 15:55:40 -07:00
Jason Evans	5ff1839133	Formatting fixes.	2016-09-26 11:00:32 -07:00
Jason Evans	0222fb41d1	Add various mutex ownership assertions.	2016-09-23 12:21:34 -07:00
Jason Evans	e3187ec6b6	Fix large_dalloc_impl() to always lock large_mtx.	2016-09-23 12:21:34 -07:00
Jason Evans	fd96974040	Add new_addr validation in extent_recycle().	2016-09-23 12:21:25 -07:00
Jason Evans	f6d01ff4b7	Protect extents_dirty access with extents_mtx. This fixes race conditions during purging.	2016-09-22 11:57:28 -07:00
Jason Evans	bc49157d21	Fix extent_recycle() to exclude other arenas' extents. When attempting to recycle an extent at a specified address, check that the extent belongs to the correct arena.	2016-09-22 11:53:19 -07:00
Qi Wang	1cb399b630	Fix arena_bind(). When tsd is not in nominal state (e.g. during thread termination), we should not increment nthreads.	2016-09-22 09:13:45 -07:00

1 2 3 4 5 ...

660 Commits