This has the dual advantages of allowing for sparsely used large
allocations, and relying on the kernel to supply zeroed pages, which
tends to be very fast on modern systems.
In the process, I changed the implementation of rtree_elm_acquire so that it
won't even try to CAS if its initial read (getting the extent + lock bit)
indicates that the CAS is doomed to fail. This can significantly improve
performance under contention.
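A minimal sketch of that check-before-CAS pattern (the function name and lock-bit layout here are illustrative, not the actual rtree code):
```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative only: the low bit of the element doubles as the lock bit.
 * If a relaxed load already shows it set, skip the CAS entirely, since it
 * is doomed to fail; the caller can retry or fall back as appropriate.
 */
static bool
elm_try_acquire(atomic_uintptr_t *elm) {
	uintptr_t v = atomic_load_explicit(elm, memory_order_relaxed);
	if (v & 1) {
		return false;	/* Already locked; don't bother with the CAS. */
	}
	return atomic_compare_exchange_strong_explicit(elm, &v, v | 1,
	    memory_order_acquire, memory_order_relaxed);
}
```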
The new feature, opt.percpu_arena, determines thread-arena association
dynamically based on CPU id. Three modes are supported: "percpu", "phycpu",
and disabled.
"percpu" uses the current core id (with help from sched_getcpu())
directly as the arena index, while "phycpu" will assign threads on the
same physical CPU to the same arena. In other words, "percpu" means # of
arenas == # of CPUs, while "phycpu" has # of arenas == 1/2 * (# of
CPUs). Note that no runtime check for whether hyperthreading is enabled
has been added yet.
When enabled, threads will be migrated between arenas when a CPU change
is detected. In the current design, to reduce the overhead of reading the CPU
id, each arena tracks the thread that accessed it most recently. When a new
thread comes in, we read the CPU id and update its arena if necessary.
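A rough sketch of the CPU-to-arena mapping described above, assuming Linux's sched_getcpu() and a simple sibling layout for "phycpu" (names and layout are illustrative):
```c
#define _GNU_SOURCE
#include <sched.h>

enum { PERCPU_ARENA_PERCPU, PERCPU_ARENA_PHYCPU };

/* Map the current CPU to an arena index. */
static unsigned
cpu_to_arena_ind(int mode, unsigned ncpus) {
	int cpu = sched_getcpu();	/* Current core id. */
	if (cpu < 0) {
		return 0;		/* Fallback if the id is unavailable. */
	}
	if (mode == PERCPU_ARENA_PERCPU) {
		/* "percpu": # of arenas == # of CPUs. */
		return (unsigned)cpu;
	}
	/*
	 * "phycpu": assume hyperthread siblings are cpu and cpu + ncpus/2,
	 * so both map to the same arena; # of arenas == 1/2 * (# of CPUs).
	 */
	return (unsigned)cpu % (ncpus / 2);
}
```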
When witness is enabled, lock rank order needs to be preserved during
prefork, not only for each arena, but also across arenas. This change
breaks arena_prefork into further stages to ensure valid rank order
across arenas. Also changed test/unit/fork to use a manual arena to
catch this case.
In the process, we can do some strength reduction, changing the fetch-adds and
fetch-subs to be simple loads followed by stores, since the modifications all
occur while holding the mutex.
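As an illustration of that strength reduction (struct and field names are hypothetical), an update that used to be an atomic fetch-add can become a load followed by a store, because the mutex already serializes all writers:
```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
	pthread_mutex_t	mtx;
	atomic_size_t	nactive;	/* Only modified with mtx held. */
} stats_t;

static void
stats_nactive_add(stats_t *stats, size_t n) {
	/* Caller holds stats->mtx, so no other writer can race with this. */
	size_t cur = atomic_load_explicit(&stats->nactive,
	    memory_order_relaxed);
	atomic_store_explicit(&stats->nactive, cur + n,
	    memory_order_relaxed);
}
```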
The C11 atomics backport removed this #define, which degraded atomic 64-bit
reads to require a lock even on platforms that support them. This commit fixes
that.
This fixes tcache_flush for manual tcaches, which weren't able to find
the correct arena they were associated with. Also changed the decay test to
cover this case (by using manually created arenas).
This simplifies what would be pairing heap operations to the equivalent
of LIFO queue operations. This is a complementary optimization in the
context of delayed coalescing for cached extents.
These functions select the easiest-to-remove element in the heap, which
is either the most recently inserted aux list element or the root. If
no calls are made to first() or remove_first(), the behavior (and time
complexity) is the same as for a LIFO queue.
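A simplified sketch of that selection rule (the node and heap layout here are not the real pairing-heap implementation): the newest aux-list element is preferred because it can be unlinked without restructuring the heap, and only when the aux list is empty does the root get returned.
```c
#include <stddef.h>

typedef struct node_s node_t;
struct node_s {
	node_t	*next;		/* Aux-list link (LIFO of recent inserts). */
	/* ... heap child/sibling links elided ... */
};

typedef struct {
	node_t	*root;
	node_t	*aux;		/* Most recently inserted, not yet melded. */
} heap_t;

/* Return the easiest-to-remove element: newest aux node, else the root. */
static node_t *
heap_any(heap_t *heap) {
	return (heap->aux != NULL) ? heap->aux : heap->root;
}
```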
Rather than purging uncoalesced extents, perform just enough incremental
coalescing to purge only fully coalesced extents. In the absence of
cached extent reuse, the immediate versus delayed incremental purging
algorithms result in the same purge order.
This resolves #655.
Fix the test_decay_ticker test to carefully control slab
creation/destruction such that the decay backlog reliably reaches zero.
Use an isolated arena so that no extraneous allocation can confuse the
situation. Speed up time during the latter part of the test so that the
entire decay time can expire in a reasonable amount of wall time.
In the C11 atomics backport, we couldn't use not_reached() in
atomic_enum_to_builtin (in atomic_gcc_atomic.h), since atomic.h was hermetic and
assert.h wasn't; there was a dependency issue. assert.h is hermetic now, so we
can include it.
This is the first header refactoring diff, #533. It splits the assert and util
components into separate, hermetic, header files. In the process, it splits out
two of the large sub-components of util (the stdio.h replacement, and bit
manipulation routines) into their own components (malloc_io.h and bit_util.h).
This is mostly to break up cyclic dependencies, but it also breaks off a good
chunk of the catch-all-ness of util, which is nice.
Convert the nrequests field to be partially derived, and the curlextents
field to be fully derived, in order to reduce the number of stats updates
needed during common operations.
This change affects ndalloc stats during arena reset, because it is no
longer possible to cancel out ndalloc effects (curlextents would become
negative).
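For instance, a fully derived counter can be computed at stats-read time from counters that are already maintained; a minimal sketch with illustrative field names:
```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
	uint64_t	nmalloc;	/* Bumped on each large allocation. */
	uint64_t	ndalloc;	/* Bumped on each large deallocation. */
} lextent_stats_t;

/* Derived on demand rather than updated on every alloc/dalloc. */
static size_t
lextent_curlextents(const lextent_stats_t *stats) {
	return (size_t)(stats->nmalloc - stats->ndalloc);
}
```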
This introduces a backport of C11 atomics. It has four implementations; ranked
in order of preference, they are:
- GCC/Clang __atomic builtins
- GCC/Clang __sync builtins
- MSVC _Interlocked builtins
- C11 atomics, from <stdatomic.h>
The primary advantages are:
- Close adherence to the standard API gives us a defined memory model.
- Type safety: atomic objects are now separate types from non-atomic ones, so
that it's impossible to mix up atomic and non-atomic updates (which is
undefined behavior that compilers are starting to take advantage of).
- Efficiency: we can specify ordering for operations, avoiding fences and
    atomic operations on strongly ordered architectures (example:
    `atomic_write_u32(ptr, val);` involves a CAS loop, whereas
    `atomic_store(ptr, val, ATOMIC_RELEASE);` is a plain store).
This diff leaves in the current atomics API (implementing them in terms of the
backport). This lets us transition uses over piecemeal.
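For reference, the standard <stdatomic.h> style that the backport mirrors looks roughly like this (the backport's wrappers differ in naming, so treat this purely as a sketch of the API's shape):
```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * The object has a distinct atomic type, so a plain uint32_t can't be
 * passed in by accident.
 */
static _Atomic uint32_t nthreads;

static uint32_t
nthreads_inc(void) {
	/* Relaxed ordering: a pure counter needs no fences. */
	return atomic_fetch_add_explicit(&nthreads, 1, memory_order_relaxed);
}

static void
nthreads_store(uint32_t v) {
	/* A release store, not a CAS loop, on strongly ordered CPUs. */
	atomic_store_explicit(&nthreads, v, memory_order_release);
}
```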
Testing:
This is by nature hard to test. I've manually tested the first three options on
Linux with gcc by futzing with the #defines, on FreeBSD with gcc and
clang, on MSVC, and on OS X with clang. All of these were x86 machines, though,
and we don't have any test infrastructure set up for non-x86 platforms.
In the long term, we'll transition to C99-style inline semantics. In the
short term, this will allow both styles to coexist without breaking one another.
This avoids signed/unsigned comparison warnings when specifying integer
constants as inputs.
Clean up whitespace and add clarifying parentheses for
CONF_HANDLE_SIZE_T(opt_lg_chunk, ...).
Detect whether chunks start off as THP-capable by default (according to
the state of /sys/kernel/mm/transparent_hugepage/enabled), and use this
as the basis for whether to call pages_nohuge() once per chunk during
first purge of any of the chunk's page runs.
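A hedged sketch of that detection and the per-chunk opt-out (assuming the Linux sysfs convention where the active setting is bracketed, e.g. "[always] madvise never"):
```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Nonzero if new mappings are THP-capable by default ("[always]"). */
static int
thp_default_enabled(void) {
	char buf[64];
	FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
	if (f == NULL) {
		return 0;
	}
	int enabled = fgets(buf, sizeof(buf), f) != NULL &&
	    strstr(buf, "[always]") != NULL;
	fclose(f);
	return enabled;
}

/* Roughly what a pages_nohuge() call boils down to on Linux. */
static int
pages_nohuge_sketch(void *addr, size_t size) {
	return madvise(addr, size, MADV_NOHUGEPAGE);
}
```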
Add the --disable-thp configure option, as well as the opt.thp
mallctl.
This resolves #541.
Convert CFLAGS to be a concatenation:
CFLAGS := CONFIGURE_CFLAGS SPECIFIED_CFLAGS EXTRA_CFLAGS
This ordering makes it possible to override the flags set by the
configure script both during and after configuration, with CFLAGS and
EXTRA_CFLAGS, respectively.
This resolves #619.
When multiple threads call stats_print, a race can happen as we read the
counters in separate mallctl calls, and the removed assertion could fail when
other operations happen in between those mallctl calls. For simplicity, output
"race" in the utilization field in this case.
This resolves #616.
Fix lg_chunk clamping to take into account cache-oblivious large
allocation. This regression only resulted in incorrect behavior if
!config_fill (false unless --disable-fill specified) and
config_cache_oblivious (true unless --disable-cache-oblivious
specified).
This regression was introduced by
8a03cf039c (Implement cache index
randomization for large allocations.), which was first released in
4.0.0.
This resolves #555.
Remove obsolete unit test scaffolding for extent quantization. Remove
redundant assertions. Add an assertion to
extents_first_best_fit_locked() that should help prevent aligned
allocation regressions.
Implement and test a JSON validation parser. Use the parser to validate
JSON output from malloc_stats_print(), with a significant subset of
supported output options.
This resolves #583.
Fix chunk_alloc_dss() to account for bytes that are not a multiple of
the chunk size. This regression was introduced by
e2bcf037d4 (Make dss operations
lockless.), which was first released in 4.3.0.
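The accounting involved is roughly the following (an illustration of chunk-boundary padding, not the actual fix): when the current dss end is not chunk-aligned, the pad up to the next chunk boundary has to be counted in addition to the requested size.
```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes to extend the dss by so the allocation starts chunk-aligned;
 * assumes chunksize is a power of two.  Names are hypothetical.
 */
static size_t
dss_incr_sketch(uintptr_t dss_max, size_t size, size_t chunksize) {
	size_t gap = (chunksize - ((size_t)dss_max & (chunksize - 1))) &
	    (chunksize - 1);
	return gap + size;
}
```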