project-base/server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Jason Evans	cb9b44914e	Remove obsolete (incorrect) assertions. This regression was introduced by `88fef7ceda` (Refactor huge_*() calls into arena internals.), and went undetected because of the --enable-debug regression.	2015-02-15 20:13:28 -08:00
Daniel Micay	a95018ee81	Attempt to expand huge allocations in-place. This adds support for expanding huge allocations in-place by requesting memory at a specific address from the chunk allocator. It's currently only implemented for the chunk recycling path, although in theory it could also be done by optimistically allocating new chunks. On Linux, it could attempt an in-place mremap. However, that won't work in practice since the heap is grown downwards and memory is not unmapped (in a normal build, at least). Repeated vector reallocation micro-benchmark: #include <string.h> #include <stdlib.h> int main(void) { for (size_t i = 0; i < 100; i++) { void *ptr = NULL; size_t old_size = 0; for (size_t size = 4; size < (1 << 30); size *= 2) { ptr = realloc(ptr, size); if (!ptr) return 1; memset(ptr + old_size, 0xff, size - old_size); old_size = size; } free(ptr); } } The glibc allocator fails to do any in-place reallocations on this benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it elides the cost of copies via mremap, which is currently not something that jemalloc can use. With this improvement, jemalloc still fails to do any in-place huge reallocations for the first outer loop, but then succeeds 100% of the time for the remaining 99 iterations. The time spent doing allocations and copies drops down to under 5%, with nearly all of it spent doing purging + faulting (when huge pages are disabled) and the array memset. An improved mremap API (MREMAP_RETAIN - #138) would be far more general but this is a portable optimization and would still be useful on Linux for xallocx. Numbers with transparent huge pages enabled: glibc (copies elided via MREMAP_MAYMOVE): 8.471s; jemalloc: 17.816s; jemalloc + no-op madvise: 13.236s; jemalloc + this commit: 6.787s; jemalloc + this commit + no-op madvise: 6.144s. Numbers with transparent huge pages disabled: glibc (copies elided via MREMAP_MAYMOVE): 15.403s; jemalloc: 39.456s; jemalloc + no-op madvise: 12.768s; jemalloc + this commit: 15.534s; jemalloc + this commit + no-op madvise: 6.354s. Closes #137	2014-10-05 14:47:01 -07:00
Daniel Micay	4cfe55166e	Add support for sized deallocation. This adds a new `sdallocx` function to the external API, allowing the size to be passed by the caller. It avoids some extra reads in the thread cache fast path. In the case where stats are enabled, this avoids the work of calculating the size from the pointer. An assertion validates the size that's passed in, so enabling debugging will allow users of the API to debug cases where an incorrect size is passed in. The performance win for a contrived microbenchmark doing an allocation and immediately freeing it is ~10%. It may have a different impact on a real workload. Closes #28	2014-09-08 17:34:24 -07:00
Mike Hommey	b54aef1d8c	Fixup after `3a730df` (Avoid pointer arithmetic on void*[...])	2014-05-28 09:46:09 -07:00
Mike Hommey	3a730dfd50	Avoid pointer arithmetic on void* in test/integration/rallocx.c	2014-05-27 15:26:28 -07:00
Jason Evans	e2deab7a75	Refactor huge allocation to be managed by arenas. Refactor huge allocation to be managed by arenas (though the global red-black tree of huge allocations remains for lookup during deallocation). This is the logical conclusion of recent changes that 1) made per arena dss precedence apply to huge allocation, and 2) made it possible to replace the per arena chunk allocation/deallocation functions. Remove the top level huge stats, and replace them with per arena huge stats. Normalize function names and types to dalloc (some were dealloc). Remove the --enable-mremap option. As jemalloc currently operates, this is a performance regression for some applications, but planned work to logarithmically space huge size classes should provide similar amortized performance. The motivation for this change was that mremap-based huge reallocation forced leaky abstractions that prevented refactoring.	2014-05-15 22:36:41 -07:00
aravind	fb7fe50a88	Add support for user-specified chunk allocators/deallocators. Add new mallctl endpoints "arena.<i>.chunk.alloc" and "arena.<i>.chunk.dealloc" to allow userspace to configure jemalloc's chunk allocator and deallocator on a per-arena basis.	2014-05-12 10:46:03 -07:00
Jason Evans	4d434adb14	Make dss non-optional, and fix an "arena.<i>.dss" mallctl bug. Make dss non-optional on all platforms which support sbrk(2). Fix the "arena.<i>.dss" mallctl to return an error if "primary" or "secondary" precedence is specified, but sbrk(2) is not supported.	2014-04-15 12:09:48 -07:00
Jason Evans	9790b9667f	Remove the allocm() API, which is superseded by the allocx() API.	2014-04-14 22:32:31 -07:00
Jason Evans	ada8447cf6	Reduce maximum tested alignment. Reduce maximum tested alignment from 2^29 to 2^25. Some systems may not have enough contiguous virtual memory to satisfy the larger alignment, but the smaller alignment is still adequate to test multi-chunk alignment.	2014-03-30 11:22:23 -07:00
Jason Evans	c2dcfd8ded	Convert ALLOCM_ARENA() test to MALLOCX_ARENA() test.	2014-03-28 10:40:03 -07:00
Jason Evans	2850e90d0d	Remove flawed alignment-related overflow test. Remove the allocm() test equivalent to the mallocx() test removed in the previous commit. The flawed test attempted to cause OOM due to large request size and alignment constraint. Although this test "passed" on 64-bit systems due to the virtual memory hole, it could pass on some 32-bit systems.	2014-01-29 10:58:32 -08:00
Jason Evans	a184d3fcde	Fix/remove flawed alignment-related overflow tests. Fix/remove three related flawed tests that attempted to cause OOM due to large request size and alignment constraint. Although these tests "passed" on 64-bit systems due to the virtual memory hole, they could pass on some 32-bit systems.	2014-01-28 18:09:59 -08:00
Jason Evans	b2c31660be	Extract profiling code from [re]allocation functions. Extract profiling code from malloc(), imemalign(), calloc(), realloc(), mallocx(), rallocx(), and xallocx(). This slightly reduces the amount of code compiled into the fast paths, but the primary benefit is the combinatorial complexity reduction. Simplify iralloc[t]() by creating a separate ixalloc() that handles the no-move cases. Further simplify [mrxn]allocx() (and by implication [mrn]allocm()) to make request size overflows due to size class and/or alignment constraints trigger undefined behavior (detected by debug-only assertions). Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling backtrace creation in imemalign(). This bug impacted posix_memalign() and aligned_alloc().	2014-01-12 15:41:05 -08:00
Jason Evans	e935c07e00	Add rallocx() test of both alignment and zeroing.	2013-12-16 13:37:21 -08:00
Jason Evans	5a658b9c75	Add zero/align tests for rallocx().	2013-12-15 15:54:18 -08:00
Jason Evans	d82a5e6a34	Implement the allocx() API. Implement the allocx() API, which is a successor to the allocm() API. The allocx() functions are slightly simpler to use because they have fewer parameters, they directly return the results of primary interest, and mallocx()/rallocx() avoid the strict aliasing pitfall that allocm()/rallocm() share with posix_memalign(). The following code violates strict aliasing rules: foo_t *foo; allocm((void **)&foo, NULL, 42, 0); whereas the following is safe: foo_t *foo; void *p; allocm(&p, NULL, 42, 0); foo = (foo_t *)p; mallocx() does not have this problem: foo_t *foo = (foo_t *)mallocx(42, 0);	2013-12-12 22:35:52 -08:00
Jason Evans	0f4f1efd94	Add mq (message queue) to test infrastructure. Add mtx (mutex) to test infrastructure, in order to avoid bootstrapping complications that would result from directly using malloc_mutex. Rename test infrastructure's thread abstraction from je_thread to thd. Fix some header ordering issues.	2013-12-12 14:41:02 -08:00
Jason Evans	a4f124f59f	Normalize #define whitespace. Consistently use a tab rather than a space following #define.	2013-12-08 22:28:27 -08:00
Jason Evans	2a83ed0284	Refactor tests. Refactor tests to use explicit testing assertions, rather than diff'ing test output. This makes the test code a bit shorter, more explicitly encodes testing intent, and makes test failure diagnosis more straightforward.	2013-12-08 20:52:21 -08:00
Jason Evans	86abd0dcd8	Refactor to support more varied testing. Refactor the test harness to support three types of tests: - unit: White box unit tests. These tests have full access to all internal jemalloc library symbols. Though in actuality all symbols are prefixed by jet_, macro-based name mangling abstracts this away from test code. - integration: Black box integration tests. These tests link with the installable shared jemalloc library, and with the exception of some utility code and configure-generated macro definitions, they have no access to jemalloc internals. - stress: Black box stress tests. These tests link with the installable shared jemalloc library, as well as with an internal allocator with symbols prefixed by jet_ (same as for unit tests) that can be used to allocate data structures that are internal to the test code. Move existing tests into test/{unit,integration}/ as appropriate. Split out internal parts of jemalloc_defs.h.in and put them in jemalloc_internal_defs.h.in. This reduces internals exposure to applications that #include <jemalloc/jemalloc.h>. Refactor jemalloc.h header generation so that a single header file results, and the prototypes can be used to generate jet_ prototypes for tests. Split jemalloc.h.in into multiple parts (jemalloc_defs.h.in, jemalloc_macros.h.in, jemalloc_protos.h.in, jemalloc_mangle.h.in) and use a shell script to generate a unified jemalloc.h at configure time. Change the default private namespace prefix from "" to "je_". Add missing private namespace mangling. Remove hard-coded private_namespace.h. Instead generate it and private_unnamespace.h from private_symbols.txt. Use similar logic for public symbols, which aids in name mangling for jet_ symbols. Add test_warn() and test_fail(). Replace existing exit(1) calls with test_fail() calls.	2013-12-03 22:06:59 -08:00

71 Commits
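
The strict aliasing point in commit d82a5e6a34 also applies to posix_memalign(), which that message names as sharing the pitfall. The following is a minimal sketch of the safe pattern using only standard POSIX calls, so it does not depend on jemalloc being installed. The type `foo_t` and the helper `foo_alloc_aligned()` are hypothetical names chosen for illustration and appear nowhere in jemalloc.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example type; not part of jemalloc. */
typedef struct {
    int id;
    double value;
} foo_t;

/*
 * Hypothetical helper: allocate a zeroed foo_t aligned to `alignment` bytes.
 *
 * posix_memalign() stores its result through a void **. Passing
 * (void **)&foo for a foo_t *foo would access a foo_t * object through an
 * lvalue of type void *, which is the aliasing violation the commit message
 * describes. Instead, pass the address of a real void * and convert after.
 *
 * `alignment` must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure.
 */
static foo_t *foo_alloc_aligned(size_t alignment) {
    void *p = NULL;

    if (posix_memalign(&p, alignment, sizeof(foo_t)) != 0)
        return NULL;

    /* Debug-only check of the alignment guarantee. */
    assert(((uintptr_t)p % alignment) == 0);

    memset(p, 0, sizeof(foo_t));
    return (foo_t *)p;
}
```

A caller would write `foo_t *foo = foo_alloc_aligned(64);`, check the result for NULL, and release it with `free(foo)`. With mallocx() the intermediate `void *` is unnecessary because the pointer is returned directly, as the commit message notes.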