server-skynet-source-3rd-jemalloc

project-base/server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Daniel Micay	4cfe55166e	Add support for sized deallocation. This adds a new `sdallocx` function to the external API, allowing the size to be passed by the caller. It avoids some extra reads in the thread cache fast path. In the case where stats are enabled, this avoids the work of calculating the size from the pointer. An assertion validates the size that's passed in, so enabling debugging will allow users of the API to debug cases where an incorrect size is passed in. The performance win for a contrived microbenchmark doing an allocation and immediately freeing it is ~10%. It may have a different impact on a real workload. Closes #28	2014-09-08 17:34:24 -07:00
Jason Evans	b718cf77e9	Optimize [nmd]alloc() fast paths. Optimize [nmd]alloc() fast paths such that the (flags == 0) case is streamlined, flags decoding only happens to the minimum degree necessary, and no conditionals are repeated.	2014-09-07 14:40:19 -07:00
Sara Golemon	3e24afa28e	Test for availability of malloc hooks via autoconf. __*_hook() is glibc, but on at least one glibc platform (homebrew), the __GLIBC__ define isn't set correctly and we miss being able to use these hooks. Do a feature test for it during configuration so that we enable it anywhere the hooks are actually available.	2014-08-22 15:19:21 -07:00
Jason Evans	602c8e0971	Implement per thread heap profiling. Rename data structures (prof_thr_cnt_t-->prof_tctx_t, prof_ctx_t-->prof_gctx_t), and convert to storing a prof_tctx_t for sampled objects. Convert PROF_ALLOC_PREP() to prof_alloc_prep(), since precise backtrace depth within jemalloc functions is no longer an issue (pprof prunes irrelevant frames). Implement mallctl's: - prof.reset implements full sample data reset, and optional change of sample interval. - prof.lg_sample reads the current sample interval (opt.lg_prof_sample was the permanent source of truth prior to prof.reset). - thread.prof.name provides naming capability for threads within heap profile dumps. - thread.prof.active makes it possible to activate/deactivate heap profiling for individual threads. Modify the heap dump files to contain per thread heap profile data. This change is incompatible with the existing pprof, which will require enhancements to read and process the enriched data.	2014-08-19 21:31:16 -07:00
Richard Diamond	94ed6812bc	Don't catch fork()ing events for Native Client. Native Client doesn't allow forking, thus there is no need to catch fork()ing events for Native Client. Additionally, without this commit, jemalloc will introduce an unresolved pthread_atfork() in PNaCl Rust bins.	2014-06-02 07:45:33 -07:00
Jason Evans	e2deab7a75	Refactor huge allocation to be managed by arenas. Refactor huge allocation to be managed by arenas (though the global red-black tree of huge allocations remains for lookup during deallocation). This is the logical conclusion of recent changes that 1) made per arena dss precedence apply to huge allocation, and 2) made it possible to replace the per arena chunk allocation/deallocation functions. Remove the top level huge stats, and replace them with per arena huge stats. Normalize function names and types to dalloc (some were dealloc). Remove the --enable-mremap option. As jemalloc currently operates, this is a performance regression for some applications, but planned work to logarithmically space huge size classes should provide similar amortized performance. The motivation for this change was that mremap-based huge reallocation forced leaky abstractions that prevented refactoring.	2014-05-15 22:36:41 -07:00
aravind	fb7fe50a88	Add support for user-specified chunk allocators/deallocators. Add new mallctl endpoints "arena<i>.chunk.alloc" and "arena<i>.chunk.dealloc" to allow userspace to configure jemalloc's chunk allocator and deallocator on a per-arena basis.	2014-05-12 10:46:03 -07:00
Jason Evans	a344dd01c7	Fix coding style nits.	2014-05-01 15:51:30 -07:00
Jason Evans	6f001059aa	Simplify backtracing. Simplify backtracing to not ignore any frames, and compensate for this in pprof in order to increase flexibility with respect to function-based refactoring even in the presence of non-deterministic inlining. Modify pprof to blacklist all jemalloc allocation entry points including non-standard ones like mallocx(), and ignore all allocator-internal frames. Prior to this change, pprof excluded the specifically blacklisted functions from backtraces, but it left allocator-internal frames intact.	2014-04-22 20:55:09 -07:00
Jason Evans	bd87b01999	Optimize Valgrind integration. Forcefully disable tcache if running inside Valgrind, and remove Valgrind calls in tcache-specific code. Restructure Valgrind-related code to move most Valgrind calls out of the fast path functions. Take advantage of static knowledge to elide some branches in JEMALLOC_VALGRIND_REALLOC().	2014-04-15 16:49:57 -07:00
Jason Evans	ecd3e59ca3	Remove the "opt.valgrind" mallctl. Remove the "opt.valgrind" mallctl because it is unnecessary -- jemalloc automatically detects whether it is running inside valgrind.	2014-04-15 14:33:50 -07:00
Jason Evans	9790b9667f	Remove the *allocm() API, which is superseded by the *allocx() API.	2014-04-14 22:32:31 -07:00
Jason Evans	9b0cbf0850	Remove support for non-prof-promote heap profiling metadata. Make promotion of sampled small objects to large objects mandatory, so that profiling metadata can always be stored in the chunk map, rather than requiring one pointer per small region in each small-region page run. In practice the non-prof-promote code was only useful when using jemalloc to track all objects and report them as leaks at program exit. However, Valgrind is at least as good a tool for this particular use case. Furthermore, the non-prof-promote code is getting in the way of some optimizations that will make heap profiling much cheaper for the predominant use case (sampling a small representative proportion of all allocations).	2014-04-11 14:24:51 -07:00
Ben Maurer	be8e59f5a6	Don't dereference chunk->arena in free() hot path. When you call free() we load chunk->arena even though that data isn't used on the tcache hot path. In profiling some FB applications, I found that ~30% of the dTLB misses in the free() function come from this line. With 4 MB chunks, the arena_chunk_t->map is ~32 KB (1024 pages in the chunk, four 8-byte pointers in arena_chunk_map_t). This means there's only a 1/8 chance of the page containing chunk->arena also containing the map bits.	2014-04-05 15:59:08 -07:00
Max Wang	fbb31029a5	Use arena dss prec instead of default for huge allocs. Pass a dss_prec_t parameter to huge_{m,p,r}alloc instead of defaulting to the chunk dss prec.	2014-03-28 13:43:58 -07:00
Jason Evans	e2206edebc	Fix unused variable warnings.	2014-01-21 14:59:13 -08:00
Jason Evans	b2c31660be	Extract profiling code from [re]allocation functions. Extract profiling code from malloc(), imemalign(), calloc(), realloc(), mallocx(), rallocx(), and xallocx(). This slightly reduces the amount of code compiled into the fast paths, but the primary benefit is the combinatorial complexity reduction. Simplify iralloc[t]() by creating a separate ixalloc() that handles the no-move cases. Further simplify [mrxn]allocx() (and by implication [mrn]allocm()) to make request size overflows due to size class and/or alignment constraints trigger undefined behavior (detected by debug-only assertions). Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling backtrace creation in imemalign(). This bug impacted posix_memalign() and aligned_alloc().	2014-01-12 15:41:05 -08:00
Jason Evans	0405312921	Fix an uninitialized variable read in xallocx().	2013-12-20 15:52:01 -08:00
Jason Evans	665769357c	Optimize arena_prof_ctx_set(). Refactor such that arena_prof_ctx_set() receives usize as an argument, and use it to determine whether to handle ptr as a small region, rather than reading the chunk page map.	2013-12-15 21:57:02 -08:00
Jason Evans	d82a5e6a34	Implement the *allocx() API. Implement the *allocx() API, which is a successor to the *allocm() API. The *allocx() functions are slightly simpler to use because they have fewer parameters, they directly return the results of primary interest, and mallocx()/rallocx() avoid the strict aliasing pitfall that allocm()/rallocm() share with posix_memalign(). The following code violates strict aliasing rules: foo_t *foo; allocm((void **)&foo, NULL, 42, 0); whereas the following is safe: foo_t *foo; void *p; allocm(&p, NULL, 42, 0); foo = (foo_t *)p; mallocx() does not have this problem: foo_t *foo = (foo_t *)mallocx(42, 0);	2013-12-12 22:35:52 -08:00
Jason Evans	7369232544	Silence some unused variable warnings.	2013-12-10 13:51:52 -08:00
Jason Evans	52b30691f9	Remove unused variable.	2013-12-02 15:16:39 -08:00
Jason Evans	addad093f8	Clean up malloc_ncpus(). Clean up malloc_ncpus() by replacing incorrectly indented if..else branches with a ?: expression. Submitted by Igor Podlesny.	2013-11-29 16:19:44 -08:00
Jason Evans	39e7fd0580	Fix ALLOCM_ARENA(a) handling in rallocm(). Fix rallocm() to use the specified arena for allocation, not just deallocation. Clarify ALLOCM_ARENA(a) documentation.	2013-11-25 18:02:35 -08:00
Leonard Crestez	ac4403cacb	Delay pthread_atfork registering. This function causes recursive allocation on LinuxThreads. Signed-off-by: Crestez Dan Leonard <lcrestez@ixiacom.com>	2013-10-24 16:40:31 -07:00
Jason Evans	1d1cee127a	Add a missing mutex unlock in malloc_init_hard() error path. Add a missing mutex unlock in a malloc_init_hard() error path (failed mutex initialization). In practice this bug was very unlikely to ever trigger, but if it did, application deadlock would likely result. Reported by Pat Lynch.	2013-10-21 15:04:12 -07:00
Jason Evans	e2985a2381	Avoid (x < 0) comparison for unsigned x. Avoid (min < 0) comparison for unsigned min in malloc_conf_init(). This bug had no practical consequences. Reported by Pat Lynch.	2013-10-21 15:01:44 -07:00
Jason Evans	6556e28be1	Prefer not_reached() over assert(false) where appropriate.	2013-10-21 14:56:27 -07:00
Jason Evans	543abf7e6c	Fix inlining warning. Add the JEMALLOC_ALWAYS_INLINE_C macro and use it for always-inlined functions declared in .c files. This fixes a function attribute inconsistency for debug builds that resulted in (harmless) compiler warnings about functions not being inlinable. Reported by Ricardo Nabinger Sanchez.	2013-10-19 17:26:00 -07:00
Alexandre Perrin	dd6ef0302f	malloc_conf_init: revert errno value when readlink(2) fails.	2013-10-13 15:33:15 -07:00
Jason Evans	88c222c8e9	Fix a prof-related locking order bug. Fix a locking order bug that could cause deadlock during fork if heap profiling were enabled.	2013-02-06 11:59:30 -08:00
Jason Evans	bbe29d374d	Fix potential TLS-related memory corruption. Avoid writing to uninitialized TLS as a side effect of deallocation. Initializing TLS during deallocation is unsafe because it is possible that a thread never did any allocation, and that TLS has already been deallocated by the threads library, resulting in write-after-free corruption. These fixes affect prof_tdata and quarantine; all other uses of TLS are already safe, whether intentionally (as for tcache) or unintentionally (as for arenas).	2013-01-31 14:23:48 -08:00
Jason Evans	d1b6e18a99	Revert opt_abort and opt_junk refactoring. Revert refactoring of opt_abort and opt_junk declarations. clang accepts the config_*-based declarations (and generates correct code), but gcc complains with: error: initializer element is not constant	2013-01-22 16:54:26 -08:00
Jason Evans	ba175a2bfb	Use config_* instead of JEMALLOC_*. Convert a couple of stragglers from JEMALLOC_* to use config_*.	2013-01-22 12:14:45 -08:00
Jason Evans	88393cb0eb	Add and use JEMALLOC_ALWAYS_INLINE. Add JEMALLOC_ALWAYS_INLINE and use it to guarantee that the entire fast paths of the primary allocation/deallocation functions are inlined.	2013-01-22 08:45:43 -08:00
Garrett Cooper	6e6164ae15	Don't mangle errno with free(3) if utrace(2) fails. This ensures POLA on FreeBSD (at least) as free(3) is generally assumed to not fiddle around with errno. Signed-off-by: Garrett Cooper <yanegomi@gmail.com>	2012-12-24 10:30:57 -08:00
Jason Evans	1bf2743e08	Add clipping support to lg_chunk option processing. Modify processing of the lg_chunk option so that it clips an out-of-range input to the edge of the valid range. This makes it possible to request the minimum possible chunk size without intimate knowledge of allocator internals. Submitted by Ian Lepore (see FreeBSD PR bin/174641).	2012-12-23 08:51:48 -08:00
Jason Evans	609ae595f0	Add arena-specific and selective dss allocation. Add the "arenas.extend" mallctl, so that it is possible to create new arenas that are outside the set that jemalloc automatically multiplexes threads onto. Add the ALLOCM_ARENA() flag for {,r,d}allocm(), so that it is possible to explicitly allocate from a particular arena. Add the "opt.dss" mallctl, which controls the default precedence of dss allocation relative to mmap allocation. Add the "arena.<i>.dss" mallctl, which makes it possible to set the default dss precedence on a per arena or global basis. Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge". Add the "stats.arenas.<i>.dss" mallctl.	2012-10-12 18:26:16 -07:00
Jason Evans	2cc11ff837	Make malloc_usable_size() implementation consistent with prototype. Use JEMALLOC_USABLE_SIZE_CONST for the malloc_usable_size() implementation as well as the prototype, for consistency's sake.	2012-10-09 16:29:21 -07:00
Jason Evans	b5225928fe	Fix fork(2)-related mutex acquisition order. Fix mutex acquisition order inversion for the chunks rtree and the base mutex. Chunks rtree acquisition was introduced by the previous commit, so this bug was short-lived.	2012-10-09 16:16:00 -07:00
Jason Evans	20f1fc95ad	Fix fork(2)-related deadlocks. Add a library constructor for jemalloc that initializes the allocator. This fixes a race that could occur if threads were created by the main thread prior to any memory allocation, followed by fork(2), and then memory allocation in the child process. Fix the prefork/postfork functions to acquire/release the ctl, prof, and rtree mutexes. This fixes various fork() child process deadlocks, but one possible deadlock remains (intentionally) unaddressed: prof backtracing can acquire runtime library mutexes, so deadlock is still possible if heap profiling is enabled during fork(). This deadlock is known to be a real issue in at least the case of libgcc-based backtracing. Reported by tfengjun.	2012-10-09 15:21:46 -07:00
Corey Richardson	1d553f72cb	If sysconf() fails, the number of CPUs is reported as UINT_MAX, not 1 as it should be	2012-10-08 15:45:38 -07:00
Jason Evans	5c710cee78	Remove const from __*_hook variable declarations. Remove const from __*_hook variable declarations, so that glibc can modify them during process forking.	2012-05-23 16:09:22 -07:00
Jason Evans	174b70efb4	Disable tcache by default if running inside Valgrind. Disable tcache by default if running inside Valgrind, in order to avoid making unallocated objects appear reachable to Valgrind.	2012-05-15 23:31:53 -07:00
Jason Evans	781fe75e0a	Auto-detect whether running inside Valgrind. Auto-detect whether running inside Valgrind, thus removing the need to manually specify MALLOC_CONF=valgrind:true.	2012-05-15 14:48:14 -07:00
Jason Evans	58ad1e4956	Return early in _malloc_{pre,post}fork() if uninitialized. Avoid mutex operations in _malloc_{pre,post}fork() unless jemalloc has been initialized. Reported by David Xu.	2012-05-11 17:40:16 -07:00
Mike Hommey	fd97b1dfc7	Add support for MSVC. Tested with MSVC 8, 32 and 64 bits.	2012-05-01 11:32:11 -07:00
Mike Hommey	da99e31105	Replace JEMALLOC_ATTR with various different macros when it makes sense. These newly added macros will be used to implement the equivalent under MSVC. Also, move the definitions to headers, where they make more sense, and for some, are even more useful there (e.g. malloc).	2012-04-30 17:57:31 -07:00
Mike Hommey	a14bce85e8	Use Get/SetLastError on Win32. Using errno on win32 doesn't quite work, because the value set in a shared library can't be read from e.g. an executable calling the function setting errno. At the same time, since buferror always uses errno/GetLastError, don't pass it.	2012-04-30 16:50:55 -07:00
Jason Evans	3fb50b0407	Fix a PROF_ALLOC_PREP() error path. Fix a PROF_ALLOC_PREP() error path to initialize the return value to NULL.	2012-04-25 13:13:44 -07:00

109 Commits