server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Qi Wang	b693c7868e	Implementing opt.background_thread. Added opt.background_thread to enable background threads, which currently handle purging. When enabled, decay ticks will not trigger purging (which will be left to the background threads). We limit the max number of threads to NCPUs. When percpu arena is enabled, set CPU affinity for the background threads as well. The sleep interval of background threads is dynamic and determined by computing the number of pages to purge in the future (based on backlog).	2017-05-23 12:26:20 -07:00
Jason Evans	909f0482e4	Automatically generate private symbol name mangling macros. Rather than using a manually maintained list of internal symbols to drive name mangling, add a compilation phase to automatically extract the list of internal symbols. This resolves #677.	2017-05-11 23:06:54 -07:00
Jason Evans	b9ab04a191	Refactor !opt.munmap to opt.retain.	2017-04-29 09:24:12 -07:00
Jason Evans	c67c3e4a63	Replace --disable-munmap with opt.munmap. Control use of munmap(2) via a run-time option rather than a compile-time option (with the same per platform default). The old behavior of --disable-munmap can be achieved with --with-malloc-conf=munmap:false. This partially resolves #580.	2017-04-24 20:37:16 -07:00
Jason Evans	e2cc6280ed	Remove --enable-code-coverage. This option hasn't been particularly useful since the original pre-3.0.0 push to broaden test coverage. This partially resolves #580.	2017-04-24 16:33:04 -07:00
Jason Evans	0f63396b23	Remove --disable-cc-silence. The explicit compiler warning suppression controlled by this option is universally desirable, so remove the ability to disable suppression. This partially resolves #580.	2017-04-24 15:02:45 -07:00
Jason Evans	af76f0e5d2	Remove --with-lg-tiny-min. This option isn't useful in practice. This partially resolves #580.	2017-04-24 11:48:28 -07:00
David Goldblatt	425253e2cd	Enable -Wundef, when supported. This can catch bugs in which one header defines a numeric constant, and another uses it without including the defining header. Undefined preprocessor symbols expand to '0', so that this will compile fine, silently doing the math wrong.	2017-04-21 17:03:56 -07:00
Jason Evans	3823effe12	Remove --enable-ivsalloc. Continue to use ivsalloc() when --enable-debug is specified (and add assertions to guard against 0 size), but stop providing a documented explicit semantics-changing band-aid to dodge undefined behavior in sallocx() and malloc_usable_size(). ivsalloc() remains compiled in, unlike when #211 restored --enable-ivsalloc, and if JEMALLOC_FORCE_IVSALLOC is defined during compilation, sallocx() and malloc_usable_size() will still use ivsalloc(). This partially resolves #580.	2017-04-21 14:34:35 -07:00
Jason Evans	4403c9ab44	Remove --disable-tcache. Simplify configuration by removing the --disable-tcache option, but replace the testing for that configuration with --with-malloc-conf=tcache:false. Fix the thread.arena and thread.tcache.flush mallctls to work correctly if tcache is disabled. This partially resolves #580.	2017-04-21 10:06:12 -07:00
Jason Evans	7cbcd2e2b7	Fix pages_purge_forced() to discard pages on non-Linux systems. madvise(..., MADV_DONTNEED) only causes demand-zeroing on Linux, so fall back to overlaying a new mapping.	2017-03-13 18:19:57 -07:00
Qi Wang	ec532e2c5c	Implement per-CPU arena. The new feature, opt.percpu_arena, determines thread-arena association dynamically based on CPU id. Three modes are supported: "percpu", "phycpu" and disabled. "percpu" uses the current core id (with help from sched_getcpu()) directly as the arena index, while "phycpu" will assign threads on the same physical CPU to the same arena. In other words, "percpu" means # of arenas == # of CPUs, while "phycpu" has # of arenas == 1/2 * (# of CPUs). Note that no runtime check on whether hyper threading is enabled is added yet. When enabled, threads will be migrated between arenas when a CPU change is detected. In the current design, to reduce overhead from reading CPU id, each arena tracks the thread accessed most recently. When a new thread comes in, we will read CPU id and update arena if necessary.	2017-03-08 23:19:01 -08:00
David Goldblatt	d4ac7582f3	Introduce a backport of C11 atomics. This introduces a backport of C11 atomics. It has four implementations; ranked in order of preference, they are: (1) GCC/Clang __atomic builtins; (2) GCC/Clang __sync builtins; (3) MSVC _Interlocked builtins; (4) C11 atomics, from <stdatomic.h>. The primary advantages are: (a) Close adherence to the standard API gives us a defined memory model. (b) Type safety: atomic objects are now separate types from non-atomic ones, so that it's impossible to mix up atomic and non-atomic updates (which is undefined behavior that compilers are starting to take advantage of). (c) Efficiency: we can specify ordering for operations, avoiding fences and atomic operations on strongly ordered architectures (example: `atomic_write_u32(ptr, val);` involves a CAS loop, whereas `atomic_store(ptr, val, ATOMIC_RELEASE);` is a plain store). This diff leaves in the current atomics API (implementing them in terms of the backport). This lets us transition uses over piecemeal. Testing: This is by nature hard to test. I've manually tested the first three options on Linux on gcc by futzing with the #defines manually, on FreeBSD with gcc and clang, on MSVC, and on OS X with clang. All of these were x86 machines though, and we don't have any test infrastructure set up for non-x86 platforms.	2017-03-03 13:40:59 -08:00
Jason Evans	f5cf9b19c8	Determine rtree levels at compile time. Rather than dynamically building a table to aid per level computations, define a constant table at compile time. Omit both high and low insignificant bits. Use one to three tree levels, depending on the number of significant bits.	2017-02-08 18:50:03 -08:00
Jason Evans	c0cc5db871	Replace tabs following #define with spaces. This resolves #564.	2017-01-20 21:45:53 -08:00
Mike Hommey	0f7376eb62	Don't rely on OSX SDK malloc/malloc.h for malloc_zone struct definitions. The SDK jemalloc is built against might not be the latest for various reasons, but the resulting binary ought to work on newer versions of OSX. In order to ensure this, we need the fullest definitions possible, so copy what we need from the latest version of malloc/malloc.h available on opensource.apple.com.	2017-01-17 20:13:28 -08:00
Jason Evans	c1baa0a9b7	Add huge page configuration and pages_[no]huge(). Add the --with-lg-hugepage configure option, but automatically configure LG_HUGEPAGE even if it isn't specified. Add the pages_[no]huge() functions, which toggle huge page state via madvise(..., MADV_[NO]HUGEPAGE) calls.	2016-12-26 17:59:34 -08:00
Jason Evans	acb7b1f53e	Add --disable-syscall. This resolves #517.	2016-12-03 16:50:58 -08:00
Jason Evans	5234be2133	Add pthread_atfork(3) feature test. Some versions of Android provide a pthreads library without providing pthread_atfork(), so in practice a separate feature test is necessary for the latter.	2016-11-17 15:14:57 -08:00
Jason Evans	a64123ce13	Refactor madvise(2) configuration. Add feature tests for the MADV_FREE and MADV_DONTNEED flags to madvise(2), so that MADV_FREE is detected and used for Linux kernel versions 4.5 and newer. Refactor pages_purge() so that on systems which support both flags, MADV_FREE is preferred over MADV_DONTNEED. This resolves #387.	2016-11-17 10:31:57 -08:00
Jason Evans	d82f2b3473	Do not use syscall(2) on OS X 10.12 (deprecated).	2016-11-02 19:18:33 -07:00
Jason Evans	795f6689de	Add os_unfair_lock support. OS X 10.12 deprecated OSSpinLock; os_unfair_lock is the recommended replacement.	2016-11-02 18:09:45 -07:00
Jason Evans	6c80321aed	Use CLOCK_MONOTONIC_COARSE rather than CLOCK_MONOTONIC_RAW. The raw clock variant is slow (even relative to plain CLOCK_MONOTONIC), whereas the coarse clock variant is faster than CLOCK_MONOTONIC, but still has resolution (~1ms) that is adequate for our purposes. This resolves #479.	2016-10-29 22:58:18 -07:00
Jason Evans	e0164bc63c	Refine nstime_update(). Add missing #include <time.h>. The critical time facilities appear to have been transitively included via unistd.h and sys/time.h, but in principle this omission was capable of having caused clock_gettime(CLOCK_MONOTONIC, ...) to have been overlooked in favor of gettimeofday(), which in turn could cause spurious non-monotonic time updates. Refactor nstime_get() out of nstime_update() and add configure tests for all variants. Add CLOCK_MONOTONIC_RAW support (Linux-specific) and mach_absolute_time() support (OS X-specific). Do not fall back to clock_gettime(CLOCK_REALTIME, ...). This was a fragile Linux-specific workaround, which we're unlikely to use at all now that clock_gettime(CLOCK_MONOTONIC_RAW, ...) is supported, and if we have no choice besides non-monotonic clocks, gettimeofday() is only incrementally worse.	2016-10-10 10:33:59 -07:00
Elliot Ronaghan	1167e9eff3	Check for __builtin_unreachable at configure time. Add a configure check for __builtin_unreachable instead of basing its availability on the __GNUC__ version. On OS X, using gcc (a real gcc, not the bundled version that's just a gcc front-end) leads to a linker assertion: https://github.com/jemalloc/jemalloc/issues/266 It turns out that this is caused by a gcc bug resulting from the use of __builtin_unreachable(): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57438 To work around this bug, check that __builtin_unreachable() actually works at configure time, and if it doesn't, use abort() instead. The check is based on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57438#c21. With this, `make check` passes with a Homebrew-installed gcc-5 and gcc-6.	2016-07-07 13:28:44 -07:00
Jason Evans	c8c3cbdf47	Miscellaneous s/chunk/extent/ updates.	2016-06-05 20:42:24 -07:00
Jason Evans	03eea4fb8b	Better document --enable-ivsalloc.	2016-06-05 20:42:24 -07:00
Jason Evans	17c021c177	Remove redzone support. This resolves #369.	2016-05-13 10:27:33 -07:00
Jason Evans	ba5c709517	Remove quarantine support.	2016-05-13 10:25:05 -07:00
Jason Evans	9a8add1510	Remove Valgrind support.	2016-05-13 09:56:18 -07:00
Jason Evans	c2f970c32b	Modify pages_map() to support mapping uncommitted virtual memory. If the OS overcommits: - Commit all mappings in pages_map() regardless of whether the caller requested committed memory. - Linux-specific: Specify MAP_NORESERVE to avoid unfortunate interactions with heuristic overcommit mode during fork(2). This resolves #193.	2016-05-05 18:56:17 -07:00
Jason Evans	9f4ee6034c	Refactor jemalloc_ffs() into ffs_(). Use appropriate versions to resolve 64-to-32-bit data loss warnings.	2016-02-24 13:03:48 -08:00
Jason Evans	ecae12323d	Fix overflow in prng_range(). Add jemalloc_ffs64() and use it instead of jemalloc_ffsl() in prng_range(), since long is not guaranteed to be a 64-bit type.	2016-02-20 23:41:33 -08:00
Jason Evans	f829009929	Add --with-malloc-conf. Add --with-malloc-conf, which makes it possible to embed a default options string during configuration.	2016-02-19 20:29:06 -08:00
Jason Evans	d059b9d6a1	Implement support for non-coalescing maps on MinGW. - Do not reallocate huge objects in place if the number of backing chunks would change. - Do not cache multi-chunk mappings. This resolves #213.	2015-07-24 18:39:14 -07:00
Jason Evans	8a03cf039c	Implement cache index randomization for large allocations. Extract szad size quantization into {extent,run}_quantize(), and quantize szad run sizes to the union of valid small region run sizes and large run sizes. Refactor iteration in arena_run_first_fit() to use run_quantize{,_first,_next}(), and add support for padded large runs. For large allocations that have no specified alignment constraints, compute a pseudo-random offset from the beginning of the first backing page that is a multiple of the cache line size. Under typical configurations with 4-KiB pages and 64-byte cache lines this results in a uniform distribution among 64 page boundary offsets. Add the --disable-cache-oblivious option, primarily intended for performance testing. This resolves #13.	2015-05-06 13:27:39 -07:00
Jason Evans	e0a08a1496	Restore --enable-ivsalloc. However, unlike before it was removed, do not force --enable-ivsalloc when Darwin zone allocator integration is enabled, since the zone allocator code uses ivsalloc() regardless of whether malloc_usable_size() and sallocx() do. This resolves #211.	2015-03-18 21:06:58 -07:00
Mike Hommey	7c46fd59cc	Make --without-export actually work. `9906660` added a --without-export configure option to avoid exporting jemalloc symbols, but the option didn't actually work.	2015-03-04 21:49:15 +09:00
Jason Evans	cbf3a6d703	Move centralized chunk management into arenas. Migrate all centralized data structures related to huge allocations and recyclable chunks into arena_t, so that each arena can manage huge allocations and recyclable virtual memory completely independently of other arenas. Add chunk node caching to arenas, in order to avoid contention on the base allocator. Use chunks_rtree to look up huge allocations rather than a red-black tree. Maintain a per arena unsorted list of huge allocations (which will be needed to enumerate huge allocations during arena reset). Remove the --enable-ivsalloc option, make ivsalloc() always available, and use it for size queries if --enable-debug is enabled. The only practical implications to this removal are that 1) ivsalloc() is now always available during live debugging (and the underlying radix tree is available during core-based debugging), and 2) size query validation can no longer be enabled independent of --enable-debug. Remove the stats.chunks.{current,total,high} mallctls, and replace their underlying statistics with simpler atomically updated counters used exclusively for gdump triggering. These statistics are no longer very useful because each arena manages chunks independently, and per arena statistics provide similar information. Simplify chunk synchronization code, now that base chunk allocation cannot cause recursive lock acquisition.	2015-02-12 00:15:56 -08:00
Daniel Micay	b74041fb6e	Ignore MALLOC_CONF in set{uid,gid,cap} binaries. This eliminates the malloc tunables as tools for an attacker. Closes #173	2014-12-14 15:36:15 -08:00
Jason Evans	e12eaf93dc	Style and spelling fixes.	2014-12-08 16:34:04 -08:00
Chih-hung Hsieh	59cd80e6c6	Add a C11 atomics-based implementation of atomic.h API.	2014-12-06 21:17:49 -08:00
Jason Evans	81e547566e	Add --with-lg-tiny-min, generalize --with-lg-quantum.	2014-10-10 22:35:07 -07:00
Jason Evans	fc0b3b7383	Add configure options. Add: --with-lg-page, --with-lg-page-sizes, --with-lg-size-class-group, --with-lg-quantum. Get rid of STATIC_PAGE_SHIFT in favor of directly setting LG_PAGE. Fix various edge conditions exposed by the configure options.	2014-10-09 22:44:37 -07:00
Eric Wong	4dcf04bfc0	Correctly detect adaptive mutexes in pthreads. PTHREAD_MUTEX_ADAPTIVE_NP is an enum on glibc and not a macro, so we must test for its existence by attempting compilation.	2014-09-29 16:10:40 -07:00
Sara Golemon	3e24afa28e	Test for availability of malloc hooks via autoconf. __*_hook() is glibc-specific, but on at least one glibc platform (homebrew), the __GLIBC__ define isn't set correctly and we miss being able to use these hooks. Do a feature test for it during configuration so that we enable it anywhere the hooks are actually available.	2014-08-22 15:19:21 -07:00
Richard Diamond	994fad9bda	Add check for madvise(2) to configure.ac. Some platforms, such as Google's Portable Native Client, use Newlib and thus lack access to madvise(2). In those instances, pages_purge() is transformed into a no-op.	2014-06-03 09:32:49 -07:00
Richard Diamond	9c3a10fdf6	Try to use __builtin_ffsl if ffsl is unavailable. Some platforms (like those using Newlib) don't have ffs/ffsl. This commit adds a check to configure.ac for __builtin_ffsl if ffsl isn't found. __builtin_ffsl performs the same function as ffsl, and has the added benefit of being available on any platform using a GCC-compatible compiler. This change does not address the use of ffs in the MALLOCX_ARENA() macro.	2014-06-02 07:44:50 -07:00
Jason Evans	d04047cc29	Add size class computation capability. Add size class computation capability, currently used only as validation of the size class lookup tables. Generalize the size class spacing used for bins, for eventual use throughout the full range of allocation sizes.	2014-05-28 21:06:46 -07:00
Jason Evans	e2deab7a75	Refactor huge allocation to be managed by arenas. Refactor huge allocation to be managed by arenas (though the global red-black tree of huge allocations remains for lookup during deallocation). This is the logical conclusion of recent changes that 1) made per arena dss precedence apply to huge allocation, and 2) made it possible to replace the per arena chunk allocation/deallocation functions. Remove the top level huge stats, and replace them with per arena huge stats. Normalize function names and types to dalloc (some were dealloc). Remove the --enable-mremap option. As jemalloc currently operates, this is a performance regression for some applications, but planned work to logarithmically space huge size classes should provide similar amortized performance. The motivation for this change was that mremap-based huge reallocation forced leaky abstractions that prevented refactoring.	2014-05-15 22:36:41 -07:00
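
Commit 8a03cf039c in the log above describes choosing a pseudo-random offset within the first backing page that is a multiple of the cache line size. The following is only a rough sketch of that arithmetic, not jemalloc's code: the constants are the "typical configuration" values quoted in the commit message, and `sketch_random_offset` and the `SKETCH_*` macros are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, taken from the "typical configuration" in the message. */
#define SKETCH_PAGE      ((size_t)4096)
#define SKETCH_CACHELINE ((size_t)64)

/*
 * Hypothetical helper: map a pseudo-random value r to a byte offset from
 * the start of the first backing page. The offset is always a multiple of
 * the cache line size and always falls inside the page.
 */
static size_t
sketch_random_offset(uint64_t r)
{
	size_t nslots = SKETCH_PAGE / SKETCH_CACHELINE; /* 4096 / 64 = 64 */
	size_t offset = (size_t)(r % nslots) * SKETCH_CACHELINE;

	assert(offset < SKETCH_PAGE);
	assert(offset % SKETCH_CACHELINE == 0);
	return offset;
}
```

With these constants there are 64 possible offsets (0, 64, ..., 4032), which matches the "64 page boundary offsets" figure in the message. Because 64 divides 2^64, a uniformly distributed `r` yields a uniform choice among them.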

58 Commits
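
Commit d4ac7582f3 in the log contrasts a write implemented as a CAS loop with a store that carries an explicit release ordering. As a minimal sketch of that difference, the block below uses the standard `<stdatomic.h>` API rather than jemalloc's own backport, so the `sketch_*` function names are hypothetical and the exact jemalloc wrappers may look different.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical: emulate an atomic write with a compare-and-swap loop. */
static void
sketch_write_cas(_Atomic uint32_t *p, uint32_t val)
{
	uint32_t old = atomic_load_explicit(p, memory_order_relaxed);

	/* On failure, old is refreshed with the current value; retry. */
	while (!atomic_compare_exchange_weak_explicit(p, &old, val,
	    memory_order_seq_cst, memory_order_relaxed)) {
	}
}

/* Hypothetical: the same write as a store with release ordering. */
static void
sketch_write_release(_Atomic uint32_t *p, uint32_t val)
{
	atomic_store_explicit(p, val, memory_order_release);
}

/* Exercise both paths on a local atomic and return the final value. */
static uint32_t
sketch_roundtrip(uint32_t val)
{
	_Atomic uint32_t x;

	atomic_init(&x, 0);
	sketch_write_cas(&x, val);
	assert(atomic_load_explicit(&x, memory_order_acquire) == val);
	sketch_write_release(&x, val + 1);
	return atomic_load_explicit(&x, memory_order_acquire);
}
```

On strongly ordered architectures such as x86, compilers typically lower the release store to an ordinary store instruction, while the CAS loop needs a locked read-modify-write. That is the efficiency argument the commit message makes.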