server-skynet-source-3rd-jemalloc

project-base/server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Jason Evans	57efa7bb0e	Avoid atexit(3) when possible, disable prof_final by default. atexit(3) can deadlock internally during its own initialization if jemalloc calls atexit() during jemalloc initialization. Mitigate the impact by restructuring prof initialization to avoid calling atexit() unless the registered function will actually dump a final heap profile. Additionally, disable prof_final by default so that this land mine is opt-in rather than opt-out. This resolves #144.	2014-10-08 18:08:00 -07:00
Jason Evans	3c3b3b1a94	Fix a docbook element nesting nit. According to the docbook documentation for <funcprototype>, its parent must be <funcsynopsis>; fix accordingly. Nonetheless, the man page processor fails badly when this construct is embedded in a <para> (which is documented to be legal), although the html processor does fine.	2014-10-05 14:48:44 -07:00
Daniel Micay	a95018ee81	Attempt to expand huge allocations in-place. This adds support for expanding huge allocations in-place by requesting memory at a specific address from the chunk allocator. It's currently only implemented for the chunk recycling path, although in theory it could also be done by optimistically allocating new chunks. On Linux, it could attempt an in-place mremap. However, that won't work in practice since the heap is grown downwards and memory is not unmapped (in a normal build, at least). Repeated vector reallocation micro-benchmark: #include <string.h> #include <stdlib.h> int main(void) { for (size_t i = 0; i < 100; i++) { void ptr = NULL; size_t old_size = 0; for (size_t size = 4; size < (1 << 30); size = 2) { ptr = realloc(ptr, size); if (!ptr) return 1; memset(ptr + old_size, 0xff, size - old_size); old_size = size; } free(ptr); } } The glibc allocator fails to do any in-place reallocations on this benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it elides the cost of copies via mremap, which is currently not something that jemalloc can use. With this improvement, jemalloc still fails to do any in-place huge reallocations for the first outer loop, but then succeeds 100% of the time for the remaining 99 iterations. The time spent doing allocations and copies drops down to under 5%, with nearly all of it spent doing purging + faulting (when huge pages are disabled) and the array memset. An improved mremap API (MREMAP_RETAIN - #138) would be far more general but this is a portable optimization and would still be useful on Linux for xallocx. Numbers with transparent huge pages enabled: glibc (copies elided via MREMAP_MAYMOVE): 8.471s jemalloc: 17.816s jemalloc + no-op madvise: 13.236s jemalloc + this commit: 6.787s jemalloc + this commit + no-op madvise: 6.144s Numbers with transparent huge pages disabled: glibc (copies elided via MREMAP_MAYMOVE): 15.403s jemalloc: 39.456s jemalloc + no-op madvise: 12.768s jemalloc + this commit: 15.534s jemalloc + this commit + no-op madvise: 6.354s Closes #137	2014-10-05 14:47:01 -07:00
Jason Evans	e9a3fa2e09	Add missing header includes in jemalloc/jemalloc.h . Add stdlib.h, stdbool.h, and stdint.h to jemalloc/jemalloc.h so that applications only have to #include <jemalloc/jemalloc.h>. This resolves #132.	2014-10-05 12:05:37 -07:00
Jason Evans	fc12c0b8bc	Implement/test/fix prof-related mallctl's. Implement/test/fix the opt.prof_thread_active_init, prof.thread_active_init, and thread.prof.active mallctl's. Test/fix the thread.prof.name mallctl. Refactor opt_prof_active to be read-only and move mutable state into the prof_active variable. Stop leaning on ctl-related locking for protection.	2014-10-03 23:25:30 -07:00
Jason Evans	20c31deaae	Test prof.reset mallctl and fix numerous discovered bugs.	2014-10-02 23:01:10 -07:00
Daniel Micay	4cfe55166e	Add support for sized deallocation. This adds a new `sdallocx` function to the external API, allowing the size to be passed by the caller. It avoids some extra reads in the thread cache fast path. In the case where stats are enabled, this avoids the work of calculating the size from the pointer. An assertion validates the size that's passed in, so enabling debugging will allow users of the API to debug cases where an incorrect size is passed in. The performance win for a contrived microbenchmark doing an allocation and immediately freeing it is ~10%. It may have a different impact on a real workload. Closes #28	2014-09-08 17:34:24 -07:00
Jason Evans	602c8e0971	Implement per thread heap profiling. Rename data structures (prof_thr_cnt_t-->prof_tctx_t, prof_ctx_t-->prof_gctx_t), and convert to storing a prof_tctx_t for sampled objects. Convert PROF_ALLOC_PREP() to prof_alloc_prep(), since precise backtrace depth within jemalloc functions is no longer an issue (pprof prunes irrelevant frames). Implement mallctl's: - prof.reset implements full sample data reset, and optional change of sample interval. - prof.lg_sample reads the current sample interval (opt.lg_prof_sample was the permanent source of truth prior to prof.reset). - thread.prof.name provides naming capability for threads within heap profile dumps. - thread.prof.active makes it possible to activate/deactivate heap profiling for individual threads. Modify the heap dump files to contain per thread heap profile data. This change is incompatible with the existing pprof, which will require enhancements to read and process the enriched data.	2014-08-19 21:31:16 -07:00
Jason Evans	b4d62cd61b	Minor doc edit.	2014-05-15 22:46:24 -07:00
Jason Evans	e2deab7a75	Refactor huge allocation to be managed by arenas. Refactor huge allocation to be managed by arenas (though the global red-black tree of huge allocations remains for lookup during deallocation). This is the logical conclusion of recent changes that 1) made per arena dss precedence apply to huge allocation, and 2) made it possible to replace the per arena chunk allocation/deallocation functions. Remove the top level huge stats, and replace them with per arena huge stats. Normalize function names and types to dalloc (some were dealloc). Remove the --enable-mremap option. As jemalloc currently operates, this is a performace regression for some applications, but planned work to logarithmically space huge size classes should provide similar amortized performance. The motivation for this change was that mremap-based huge reallocation forced leaky abstractions that prevented refactoring.	2014-05-15 22:36:41 -07:00
aravind	fb7fe50a88	Add support for user-specified chunk allocators/deallocators. Add new mallctl endpoints "arena<i>.chunk.alloc" and "arena<i>.chunk.dealloc" to allow userspace to configure jemalloc's chunk allocator and deallocator on a per-arena basis.	2014-05-12 10:46:03 -07:00
Jason Evans	bd87b01999	Optimize Valgrind integration. Forcefully disable tcache if running inside Valgrind, and remove Valgrind calls in tcache-specific code. Restructure Valgrind-related code to move most Valgrind calls out of the fast path functions. Take advantage of static knowledge to elide some branches in JEMALLOC_VALGRIND_REALLOC().	2014-04-15 16:49:57 -07:00
Jason Evans	ecd3e59ca3	Remove the "opt.valgrind" mallctl. Remove the "opt.valgrind" mallctl because it is unnecessary -- jemalloc automatically detects whether it is running inside valgrind.	2014-04-15 14:33:50 -07:00
Jason Evans	a2c719b374	Remove the "arenas.purge" mallctl. Remove the "arenas.purge" mallctl, which was obsoleted by the "arena.<i>.purge" mallctl in 3.1.0.	2014-04-15 12:46:28 -07:00
Jason Evans	4d434adb14	Make dss non-optional, and fix an "arena.<i>.dss" mallctl bug. Make dss non-optional on all platforms which support sbrk(2). Fix the "arena.<i>.dss" mallctl to return an error if "primary" or "secondary" precedence is specified, but sbrk(2) is not supported.	2014-04-15 12:09:48 -07:00
Jason Evans	24a4ba77e1	Update MALLOCX_ARENA() documentation. Update MALLOCX_ARENA() documentation to no longer claim that it has no effect for huge region allocations.	2014-04-14 22:38:59 -07:00
Jason Evans	9790b9667f	Remove the allocm() API, which is superceded by the allocx() API.	2014-04-14 22:32:31 -07:00
Jason Evans	9c62ed44b0	Document how dss precedence affects huge allocation.	2014-03-31 09:16:59 -07:00
Jason Evans	b2c31660be	Extract profiling code from [re]allocation functions. Extract profiling code from malloc(), imemalign(), calloc(), realloc(), mallocx(), rallocx(), and xallocx(). This slightly reduces the amount of code compiled into the fast paths, but the primary benefit is the combinatorial complexity reduction. Simplify iralloc[t]() by creating a separate ixalloc() that handles the no-move cases. Further simplify [mrxn]allocx() (and by implication [mrn]allocm()) to make request size overflows due to size class and/or alignment constraints trigger undefined behavior (detected by debug-only assertions). Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling backtrace creation in imemalign(). This bug impacted posix_memalign() and aligned_alloc().	2014-01-12 15:41:05 -08:00
Jason Evans	d8a390020c	Fix a few mallctl() documentation errors. Normalize mallctl() order (code and documentation).	2013-12-19 21:40:41 -08:00
Jason Evans	de73296d6b	Add mallctl*() unit tests.	2013-12-19 21:40:13 -08:00
Jason Evans	1393d79a4c	Remove ENOMEM from the documented set of mallctl() errors. mallctl() always returns EINVAL and does partial result copying when *oldlenp is to short to hold the requested value, rather than returning ENOMEM. Therefore remove ENOMEM from the documented set of possible errors.	2013-12-18 15:35:45 -08:00
Jason Evans	d82a5e6a34	Implement the allocx() API. Implement the allocx() API, which is a successor to the allocm() API. The allocx() functions are slightly simpler to use because they have fewer parameters, they directly return the results of primary interest, and mallocx()/rallocx() avoid the strict aliasing pitfall that allocm()/rallocx() share with posix_memalign(). The following code violates strict aliasing rules: foo_t foo; allocm((void )&foo, NULL, 42, 0); whereas the following is safe: foo_t foo; void p; allocm(&p, NULL, 42, 0); foo = (foo_t )p; mallocx() does not have this problem: foo_t foo = (foo_t )mallocx(42, 0);	2013-12-12 22:35:52 -08:00
Jason Evans	39e7fd0580	Fix ALLOCM_ARENA(a) handling in rallocm(). Fix rallocm() to use the specified arena for allocation, not just deallocation. Clarify ALLOCM_ARENA(a) documentation.	2013-11-25 18:02:35 -08:00
Jason Evans	aabaf851b2	Add ids for all mallctl entries. Add ids for all mallctl entries, so that external documents can link to arbitrary mallctl entries.	2013-10-30 14:52:09 -07:00
Jason Evans	705328ca46	Clarify how to use malloc_conf. Clarify that malloc_conf is intended only for compile-time configuration, since jemalloc may be initialized before main() is entered.	2013-03-19 16:28:41 -07:00
Jason Evans	1bf2743e08	Add clipping support to lg_chunk option processing. Modify processing of the lg_chunk option so that it clips an out-of-range input to the edge of the valid range. This makes it possible to request the minimum possible chunk size without intimate knowledge of allocator internals. Submitted by Ian Lepore (see FreeBSD PR bin/174641).	2012-12-23 08:51:48 -08:00
Jan Beich	ed90c97332	document what stats.active does not track Based on http://www.canonware.com/pipermail/jemalloc-discuss/2012-March/000164.html	2012-11-06 16:30:24 -08:00
Jason Evans	e3d13060c8	Purge unused dirty pages in a fragmentation-reducing order. Purge unused dirty pages in an order that first performs clean/dirty run defragmentation, in order to mitigate available run fragmentation. Remove the limitation that prevented purging unless at least one chunk worth of dirty pages had accumulated in an arena. This limitation was intended to avoid excessive purging for small applications, but the threshold was arbitrary, and the effect of questionable utility. Relax opt_lg_dirty_mult from 5 to 3. This compensates for increased likelihood of allocating clean runs, given the same ratio of clean:dirty runs, and reduces the potential for repeated purging in pathological large malloc/free loops that push the active:dirty page ratio just over the purge threshold.	2012-11-06 00:59:53 -08:00
Jason Evans	609ae595f0	Add arena-specific and selective dss allocation. Add the "arenas.extend" mallctl, so that it is possible to create new arenas that are outside the set that jemalloc automatically multiplexes threads onto. Add the ALLOCM_ARENA() flag for {,r,d}allocm(), so that it is possible to explicitly allocate from a particular arena. Add the "opt.dss" mallctl, which controls the default precedence of dss allocation relative to mmap allocation. Add the "arena.<i>.dss" mallctl, which makes it possible to set the default dss precedence on a per arena or global basis. Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge". Add the "stats.arenas.<i>.dss" mallctl.	2012-10-12 18:26:16 -07:00
Jason Evans	174b70efb4	Disable tcache by default if running inside Valgrind. Disable tcache by default if running inside Valgrind, in order to avoid making unallocated objects appear reachable to Valgrind.	2012-05-15 23:31:53 -07:00
Jason Evans	781fe75e0a	Auto-detect whether running inside Valgrind. Auto-detect whether running inside Valgrind, thus removing the need to manually specify MALLOC_CONF=valgrind:true.	2012-05-15 14:48:14 -07:00
Jason Evans	80fe0478e6	Generalize "stats.mapped" documentation. Generalize "stats.mapped" documentation to state that all inactive chunks are omitted, now that it is possible for mmap'ed chunks to be omitted in addition to DSS chunks.	2012-05-09 23:08:48 -07:00
Jason Evans	2e671ffbad	Add the --enable-mremap option. Add the --enable-mremap option, and disable the use of mremap(2) by default, for the same reason that freeing chunks via munmap(2) is disabled by default on Linux: semi-permanent VM map fragmentation.	2012-05-09 16:12:00 -07:00
Jason Evans	d926c90500	Fix Valgrind URL in documentation. Reported by Daichi GOTO.	2012-04-25 23:17:57 -07:00
Jason Evans	8f0e0eb1c0	Fix a memory corruption bug in chunk_alloc_dss(). Fix a memory corruption bug in chunk_alloc_dss() that was due to claiming newly allocated memory is zeroed. Reverse order of preference between mmap() and sbrk() to prefer mmap(). Clean up management of 'zero' parameter in chunk_alloc*().	2012-04-21 13:33:48 -07:00
Jason Evans	0b25fe79aa	Update prof defaults to match common usage. Change the "opt.lg_prof_sample" default from 0 to 19 (1 B to 512 KiB). Change the "opt.prof_accum" default from true to false. Add the "opt.prof_final" mallctl, so that "opt.prof_prefix" need not be abused to disable final profile dumping.	2012-04-17 16:39:33 -07:00
Jason Evans	25a000e896	Update pprof (from gperftools 2.0).	2012-04-17 15:49:30 -07:00
Jason Evans	59ae2766af	Add the --disable-munmap option. Add the --disable-munmap option, remove the configure test that attempted to detect the VM allocation quirk known to exist on Linux x86[_64], and make --disable-munmap implicit on Linux.	2012-04-16 18:08:58 -07:00
Jason Evans	d6abcbb14b	Always disable redzone by default. Always disable redzone by default, even when --enable-debug is specified. The memory overhead for redzones can be substantial, which makes this feature something that should only be opted into.	2012-04-12 17:09:54 -07:00
Jason Evans	122449b073	Implement Valgrind support, redzones, and quarantine. Implement Valgrind support, as well as the redzone and quarantine features, which help Valgrind detect memory errors. Redzones are only implemented for small objects because the changes necessary to support redzones around large and huge objects are complicated by in-place reallocation, to the point that it isn't clear that the maintenance burden is worth the incremental improvement to Valgrind support. Merge arena_salloc() and arena_salloc_demote(). Refactor i[v]salloc() to expose the 'demote' option.	2012-04-11 11:46:18 -07:00
Jason Evans	b147611b52	Add utrace(2)-based tracing (--enable-utrace).	2012-04-05 13:36:17 -07:00
Jason Evans	48db6167e7	Remove obsolete "config.dynamic_page_shift" mallctl documentation.	2012-04-03 01:33:55 -07:00
Jason Evans	ae4c7b4b40	Clean up PAGE macros. s/PAGE_SHIFT/LG_PAGE/g and s/PAGE_SIZE/PAGE/g. Remove remnants of the dynamic-page-shift code. Rename the "arenas.pagesize" mallctl to "arenas.page". Remove the "arenas.chunksize" mallctl, which is redundant with "opt.lg_chunk".	2012-04-02 07:04:34 -07:00
Jason Evans	d4be8b7b6e	Add the "thread.tcache.enabled" mallctl.	2012-03-26 19:02:49 -07:00
Jason Evans	7091b415bb	Fix various documentation formatting regressions.	2012-03-19 09:36:44 -07:00
Jason Evans	e7b8fa18d2	Rename the "tcache.flush" mallctl to "thread.tcache.flush".	2012-03-16 17:09:32 -07:00
Jason Evans	0a0bbf63e5	Implement aligned_alloc(). Implement aligned_alloc(), which was added in the C11 standard. The function is weakly specified to the point that a minimally compliant implementation would be painful to use (size must be an integral multiple of alignment!), which in practice makes posix_memalign() a safer choice.	2012-03-13 12:55:21 -07:00
Jason Evans	4507f34628	Remove the lg_tcache_gc_sweep option. Remove the lg_tcache_gc_sweep option, because it is no longer very useful. Prior to the addition of dynamic adjustment of tcache fill count, it was possible for fill/flush overhead to be a problem, but this problem no longer occurs.	2012-03-05 14:34:37 -08:00
Jason Evans	7e77eaffff	Add the --disable-experimental option.	2012-03-02 17:47:37 -08:00

1 2

60 Commits