server-skynet-source-3rd-jemalloc


Author	SHA1	Message	Date
Jason Evans	b9408d77a6	Fix/simplify chunk_recycle() allocation size computations. Remove outer CHUNK_CEILING(s2u(...)) from alloc_size computation, since s2u() may overflow (and return 0), and CHUNK_CEILING() is only needed around the alignment portion of the computation. This fixes a regression caused by 5707d6f952c71baa2f19102479859012982ac821 (Quantize szad trees by size class.) and first released in 4.0.0. This resolves #497.	2016-11-11 22:18:39 -08:00
Jason Evans	2cdf07aba9	Fix extent_quantize() to handle greater-than-huge-size extents. Allocation requests can't directly create extents that exceed HUGE_MAXCLASS, but extent merging can create them. This fixes a regression caused by 8a03cf039cd06f9fa6972711195055d865673966 (Implement cache index randomization for large allocations.) and first released in 4.0.0. This resolves #497.	2016-11-11 22:17:27 -08:00
Jason Evans	5d6cb6eb66	Refactor prng to not use 64-bit atomics on 32-bit platforms. This resolves #495.	2016-11-07 11:50:59 -08:00
Jason Evans	a4e83e8593	Fix run leak. Fix arena_run_first_best_fit() to search all potentially non-empty runs_avail heaps, rather than ignoring the heap that contains runs larger than large_maxclass, but less than chunksize. This fixes a regression caused by f193fd80cf1f99bce2bc9f5f4a8b149219965da2 (Refactor runs_avail.). This resolves #493.	2016-11-07 09:43:39 -08:00
Jason Evans	28b7e42e44	Fix arena data structure size calculation. Fix paren placement so that QUANTUM_CEILING() applies to the correct portion of the expression that computes how much memory to base_alloc(). In practice this bug had no impact. This was caused by 5d8db15db91c85d47b343cfc07fc6ea736f0de48 (Simplify run quantization.), which in turn fixed an over-allocation regression caused by 3c4d92e82a31f652a7c77ca937a02d0185085b06 (Add per size class huge allocation statistics.).	2016-11-04 15:00:08 -07:00
Jason Evans	32896a902b	Fix large allocation to search optimal size class heap. Fix arena_run_alloc_large_helper() to not convert size to usize when searching for the first best fit via arena_run_first_best_fit(). This allows the search to consider the optimal quantized size class, so that e.g. allocating and deallocating 40 KiB in a tight loop can reuse the same memory. This regression was nominally caused by 5707d6f952c71baa2f19102479859012982ac821 (Quantize szad trees by size class.), but it did not commonly cause problems until 8a03cf039cd06f9fa6972711195055d865673966 (Implement cache index randomization for large allocations.). These regressions were first released in 4.0.0. This resolves #487.	2016-11-03 22:36:30 -07:00
Jason Evans	e9012630ac	Fix chunk_alloc_cache() to support decommitted allocation. Fix chunk_alloc_cache() to support decommitted allocation, and use this ability in arena_chunk_alloc_internal() and arena_stash_dirty(), so that chunks don't get permanently stuck in a hybrid state. This resolves #487.	2016-11-03 22:36:30 -07:00
Dave Watson	6c56e194b0	Check for existence of CPU_COUNT macro before using it. This resolves #485.	2016-11-02 19:54:19 -07:00
Jason Evans	da206df10b	Do not use syscall(2) on OS X 10.12 (deprecated).	2016-11-02 19:35:12 -07:00
Jason Evans	3f2b8d9cfa	Add os_unfair_lock support. OS X 10.12 deprecated OSSpinLock; os_unfair_lock is the recommended replacement.	2016-11-02 19:35:12 -07:00
Jason Evans	a99e0fa2d2	Fix/refactor zone allocator integration code. Fix zone_force_unlock() to reinitialize, rather than unlocking mutexes, since OS X 10.12 cannot tolerate a child unlocking mutexes that were locked by its parent. Refactor; this was a side effect of experimenting with zone {de,re}registration during fork(2).	2016-11-02 19:35:09 -07:00
Jason Evans	b599b32280	Add "J" (JSON) support to malloc_stats_print(). This resolves #474.	2016-11-01 15:32:37 -07:00
Jason Evans	1d57c03e33	Use CLOCK_MONOTONIC_COARSE rather than CLOCK_MONOTONIC_RAW. The raw clock variant is slow (even relative to plain CLOCK_MONOTONIC), whereas the coarse clock variant is faster than CLOCK_MONOTONIC, but still has resolution (~1ms) that is adequate for our purposes. This resolves #479.	2016-10-29 22:59:42 -07:00
Jason Evans	c443b67561	Use syscall(2) rather than {open,read,close}(2) during boot. Some applications wrap various system calls, and if they call the allocator in their wrappers, unexpected reentry can result. This is not a general solution (many other syscalls are spread throughout the code), but this resolves a bootstrapping issue that is apparently common. This resolves #443.	2016-10-29 22:46:52 -07:00
Jason Evans	e46f8f97bc	Do not mark malloc_conf as weak on Windows. This works around malloc_conf not being properly initialized by at least the cygwin toolchain. Prior build system changes to use -Wl,--[no-]whole-archive may be necessary for malloc_conf resolution to work properly as a non-weak symbol (not tested).	2016-10-29 00:16:30 -07:00
Jason Evans	35799a5030	Do not mark malloc_conf as weak for unit tests. This is generally correct (no need for weak symbols since no jemalloc library is involved in the link phase), and avoids linking problems (apparently uninitialized non-NULL malloc_conf) when using cygwin with gcc.	2016-10-28 23:21:14 -07:00
Dave Watson	ed84764a2a	Support static linking of jemalloc with glibc. glibc defines its malloc implementation with several weak and strong symbols: strong_alias (__libc_calloc, __calloc); weak_alias (__libc_calloc, calloc); strong_alias (__libc_free, __cfree); weak_alias (__libc_free, cfree); strong_alias (__libc_free, __free); strong_alias (__libc_free, free); strong_alias (__libc_malloc, __malloc); strong_alias (__libc_malloc, malloc). The issue is not with the weak symbols, but that other parts of glibc depend on __libc_malloc explicitly. Defining them in terms of jemalloc APIs allows the linker to drop glibc's malloc.o completely from the link, and static linking no longer results in symbol collisions. Another wrinkle: jemalloc calls sysconf during initialization to get the number of CPUs. glibc allocates for the first time before setting up isspace (and other related) tables, which are used by sysconf. Instead, use the pthread API to get the number of CPUs with glibc, which seems to work. This resolves #442.	2016-10-28 15:10:19 -07:00
Jason Evans	dc553d52d8	Fix over-sized allocation of rtree leaf nodes. Use the correct level metadata when allocating child nodes so that leaf nodes don't end up over-sized (2^16 elements vs 2^4 elements).	2016-10-28 00:41:15 -07:00
Jason Evans	962a2979e3	Do not (recursively) allocate within tsd_fetch(). Refactor tsd so that tsdn_fetch() does not trigger allocation, since allocation could cause infinite recursion. This resolves #458.	2016-10-21 00:27:37 -07:00
Jason Evans	e2bcf037d4	Make dss operations lockless. Rather than protecting dss operations with a mutex, use atomic operations. This has negligible impact on synchronization overhead during typical dss allocation, but is a substantial improvement for chunk_in_dss() and the newly added chunk_dss_mergeable(), which can be called multiple times during chunk deallocations. This change also has the advantage of avoiding tsd in deallocation paths associated with purging, which resolves potential deadlocks during thread exit due to attempted tsd resurrection. This resolves #425.	2016-10-13 15:33:56 -07:00
Jason Evans	9737685943	Add/use adaptive spinning. Add spin_t and spin_{init,adaptive}(), which provide a simple abstraction for adaptive spinning. Adaptively spin during busy waits in bootstrapping and rtree node initialization.	2016-10-13 14:58:38 -07:00
Jason Evans	a2539fab95	Disallow 0x5a junk filling when running in Valgrind. Explicitly disallow junk:true and junk:free runtime settings when running in Valgrind, since deallocation-time junk filling and redzone validation cause false positive Valgrind reports. This resolves #470.	2016-10-12 22:58:40 -07:00
Jason Evans	d419bb09ef	Fix and simplify decay-based purging. Simplify decay-based purging attempts to only be triggered when the epoch is advanced, rather than every time purgeable memory increases. In a correctly functioning system (not previously the case; see below), this only causes a behavior difference if during subsequent purge attempts the least recently used (LRU) purgeable memory extent is initially too large to be purged, but that memory is reused between attempts and one or more of the next LRU purgeable memory extents are small enough to be purged. In practice this is an arbitrary behavior change that is within the set of acceptable behaviors. As for the purging fix, assure that arena->decay.ndirty is recorded after the epoch advance and associated purging occurs. Prior to this fix, it was possible for purging during epoch advance to cause a substantially underrepresentative (arena->ndirty - arena->decay.ndirty), i.e. the number of dirty pages attributed to the current epoch was too low, and a series of unintended purges could result. This fix is also relevant in the context of the simplification described above, but the bug's impact would be limited to over-purging at epoch advances.	2016-10-11 15:50:05 -07:00
Jason Evans	45a5bf6772	Do not advance decay epoch when time goes backwards. Instead, move the epoch backward in time. Additionally, add nstime_monotonic() and use it in debug builds to assert that time only goes backward if nstime_update() is using a non-monotonic time source.	2016-10-10 22:31:37 -07:00
Jason Evans	94e7ffa979	Refactor arena->decay_* into arena->decay.* (arena_decay_t).	2016-10-10 22:22:59 -07:00
Jason Evans	b732c395b7	Refine nstime_update(). Add missing #include <time.h>. The critical time facilities appear to have been transitively included via unistd.h and sys/time.h, but in principle this omission was capable of having caused clock_gettime(CLOCK_MONOTONIC, ...) to have been overlooked in favor of gettimeofday(), which in turn could cause spurious non-monotonic time updates. Refactor nstime_get() out of nstime_update() and add configure tests for all variants. Add CLOCK_MONOTONIC_RAW support (Linux-specific) and mach_absolute_time() support (OS X-specific). Do not fall back to clock_gettime(CLOCK_REALTIME, ...). This was a fragile Linux-specific workaround, which we're unlikely to use at all now that clock_gettime(CLOCK_MONOTONIC_RAW, ...) is supported, and if we have no choice besides non-monotonic clocks, gettimeofday() is only incrementally worse.	2016-10-10 11:40:46 -07:00
Jason Evans	5d8db15db9	Simplify run quantization.	2016-10-06 15:58:38 -07:00
Jason Evans	f193fd80cf	Refactor runs_avail. Use pszind_t size classes rather than szind_t size classes, and always reserve space for NPSIZES elements. This removes unused heaps that are not multiples of the page size, and adds (currently) unused heaps for all huge size classes, with the immediate benefit that the size of arena_t allocations is constant (no longer dependent on chunk size).	2016-10-04 19:48:50 -07:00
Jason Evans	1abb49f09d	Implement psz2ind(), pind2sz(), and psz2u(). These compute size classes and indices similarly to size2index(), index2size() and s2u(), respectively, but using the subset of size classes that are multiples of the page size. Note that pszind_t and szind_t are not interchangeable.	2016-10-04 16:29:19 -07:00
Jason Evans	bcd5424b1c	Use TSDN_NULL rather than NULL as appropriate.	2016-10-04 15:56:56 -07:00
Jason Evans	79647fe465	Close file descriptor after reading "/proc/sys/vm/overcommit_memory". This bug was introduced by c2f970c32b527660a33fa513a76d913c812dcf7c (Modify pages_map() to support mapping uncommitted virtual memory.). This resolves #399.	2016-09-26 15:58:44 -07:00
Jason Evans	57cddffca6	Formatting fixes.	2016-09-26 11:01:59 -07:00
Mike Hommey	11b5da7533	Change how the default zone is found. On OS X 10.12, malloc_default_zone returns a special zone that is not present in the list of registered zones. That zone uses a "lite zone" if one is present (apparently enabled when malloc stack logging is enabled), or the first registered zone otherwise. In practice this means unless malloc stack logging is enabled, the first registered zone is the default. So get the list of zones to get the first one, instead of relying on malloc_default_zone.	2016-09-26 11:01:37 -07:00
Elliot Ronaghan	a6a8e40f7d	Fix a Valgrind regression in chunk_recycle(). Fix a latent Valgrind bug exposed by d412624b25eed2b5c52b7d94a71070d3aab03cb4 (Move retaining out of default chunk hooks).	2016-09-26 10:30:57 -07:00
Qi Wang	57ed894f8a	Fix arena_bind(). When tsd is not in nominal state (e.g. during thread termination), we should not increment nthreads.	2016-09-23 14:39:29 -07:00
Jason Evans	fa09fe798a	Fix rallocx() sampling code to not eagerly commit sampler update. rallocx() for an alignment-constrained request may end up with a smaller-than-worst-case size if in-place reallocation succeeds due to serendipitous alignment. In such cases, sampling may not happen.	2016-06-08 10:14:25 -07:00
Jason Evans	a7fdcc8b09	Fix opt_zero-triggered in-place huge reallocation zeroing. Fix huge_ralloc_no_move_expand() to update the extent's zeroed attribute based on the intersection of the previous value and that of the newly merged trailing extent.	2016-06-08 10:10:08 -07:00
Elliot Ronaghan	c7d5298027	Fix a Valgrind regression in chunk_alloc_wrapper(). This regression was caused by d412624b25eed2b5c52b7d94a71070d3aab03cb4 (Move retaining out of default chunk hooks).	2016-06-07 14:30:39 -07:00
Elliot Ronaghan	9de0094e6e	Fix a Valgrind regression in calloc(). This regression was caused by 3ef51d7f733ac6432e80fa902a779ab5b98d74f6 (Optimize the fast paths of calloc() and [m,d,sd]allocx().).	2016-06-07 14:27:24 -07:00
Jason Evans	05a9e4ac65	Fix potential VM map fragmentation regression. Revert 245ae6036c09cc11a72fab4335495d95cddd5beb (Support --with-lg-page values larger than actual page size.), because it could cause VM map fragmentation if the kernel grows mmap()ed memory downward. This resolves #391.	2016-06-07 14:21:21 -07:00
Elliot Ronaghan	48384dc2d8	Fix mixed decl in nstime.c. Fix mixed decl in the gettimeofday() branch of nstime_update().	2016-06-07 14:08:19 -07:00
Jason Evans	09d7bdb314	Propagate tsdn to default chunk hooks. This avoids bootstrapping issues for configurations that require allocation during tsd initialization. This resolves #390.	2016-06-07 14:00:58 -07:00
Jason Evans	1c35f63797	Guard tsdn_tsd() call with tsdn_null() check.	2016-05-11 16:52:58 -07:00
Jason Evans	0fc1317fc6	Mangle tested functions as n_witness_* rather than witness_*_impl.	2016-05-11 16:14:20 -07:00
Jason Evans	73d3d58dc2	Optimize witness fast path. Short-circuit commonly called witness functions so that they only execute in debug builds, and remove equivalent guards from mutex functions. This avoids pointless code execution in witness_assert_lockless(), which is typically called twice per allocation/deallocation function invocation. Inline commonly called witness functions so that optimized builds can completely remove calls as dead code.	2016-05-11 15:38:06 -07:00
Jason Evans	7790a0ba40	Fix chunk accounting related to triggering gdump profiles. Fix in place huge reallocation to update the chunk counters that are used for triggering gdump profiles.	2016-05-11 00:56:30 -07:00
Jason Evans	c1e00ef2a6	Resolve bootstrapping issues when embedded in FreeBSD libc. b2c0d6322d2307458ae2b28545f8a5c9903d7ef5 (Add witness, a simple online locking validator.) caused a broad propagation of tsd throughout the internal API, but tsd_fetch() was designed to fail prior to tsd bootstrapping. Fix this by splitting tsd_t into non-nullable tsd_t and nullable tsdn_t, and modifying all internal APIs that do not critically rely on tsd to take nullable pointers. Furthermore, add the tsd_booted_get() function so that tsdn_fetch() can probe whether tsd bootstrapping is complete and return NULL if not. All dangerous conversions of nullable pointers are tsdn_tsd() calls that assert-fail on invalid conversion.	2016-05-10 22:51:33 -07:00
Jason Evans	0c12dcabc5	Fix tsd bootstrapping for a0malloc().	2016-05-07 16:55:36 -07:00
Jason Evans	3ef51d7f73	Optimize the fast paths of calloc() and [m,d,sd]allocx(). This is a broader application of optimizations to malloc() and free() in f4a0f32d340985de477bbe329ecdaecd69ed1055 (Fast-path improvement: reduce # of branches and unnecessary operations.). This resolves #321.	2016-05-06 14:37:39 -07:00
Jason Evans	c2f970c32b	Modify pages_map() to support mapping uncommitted virtual memory. If the OS overcommits: - Commit all mappings in pages_map() regardless of whether the caller requested committed memory. - Linux-specific: Specify MAP_NORESERVE to avoid unfortunate interactions with heuristic overcommit mode during fork(2). This resolves #193.	2016-05-05 18:56:17 -07:00


585 Commits