server-skynet-source-3rd-jemalloc

project-base/server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
David Goldblatt	44f9bd147a	Header refactoring: unify and de-catchall rtree module.	2017-05-31 13:08:45 -07:00
Qi Wang	7578b0e929	Explicitly say so when aborting on opt_abort_conf.	2017-05-30 17:37:35 -07:00
Qi Wang	d5ef5ae934	Add opt.stats_print_opts. The value is passed to atexit(3)-triggered malloc_stats_print() calls.	2017-05-29 11:54:00 -07:00
Qi Wang	b86d271cbf	Added opt_abort_conf: abort on invalid config options.	2017-05-26 21:14:28 -07:00
David Goldblatt	18ecbfa89e	Header refactoring: unify and de-catchall mutex module	2017-05-24 15:27:30 -07:00
David Goldblatt	9f822a1fd7	Header refactoring: unify and de-catchall witness code.	2017-05-24 15:27:30 -07:00
Qi Wang	b693c7868e	Implementing opt.background_thread. Added opt.background_thread to enable background threads, which handles purging currently. When enabled, decay ticks will not trigger purging (which will be left to the background threads). We limit the max number of threads to NCPUs. When percpu arena is enabled, set CPU affinity for the background threads as well. The sleep interval of background threads is dynamic and determined by computing number of pages to purge in the future (based on backlog).	2017-05-23 12:26:20 -07:00
David Goldblatt	26c792e61a	Allow mutexes to take a lock ordering enum at construction. This lets us specify whether and how mutexes of the same rank are allowed to be acquired. Currently, we only allow two polices (only a single mutex at a given rank at a time, and mutexes acquired in ascending order), but we can plausibly allow more (e.g. the "release uncontended mutexes before blocking").	2017-05-19 14:21:27 -07:00
Jason Evans	6e62c62862	Refactor decay_time into decay_ms. Support millisecond resolution for decay times. Among other use cases this makes it possible to specify a short initial dirty-->muzzy decay phase, followed by a longer muzzy-->clean decay phase. This resolves #812.	2017-05-18 11:33:45 -07:00
Jason Evans	18a83681cf	Refactor (MALLOCX_ARENA_MAX + 1) to be MALLOCX_ARENA_LIMIT. This resolves #673.	2017-05-14 10:14:23 -07:00
Jason Evans	909f0482e4	Automatically generate private symbol name mangling macros. Rather than using a manually maintained list of internal symbols to drive name mangling, add a compilation phase to automatically extract the list of internal symbols. This resolves #677.	2017-05-11 23:06:54 -07:00
David Goldblatt	209f2926b8	Header refactoring: tsd - cleanup and dependency breaking. This removes the tsd macros (which are used only for tsd_t in real builds). We break up the circular dependencies involving tsd. We also move all tsd access through getters and setters. This allows us to assert that we only touch data when tsd is in a valid state. We simplify the usages of the x macro trick, removing all the customizability (get/set, init, cleanup), moving the lifetime logic to tsd_init and tsd_cleanup. This lets us make initialization order independent of order within tsd_t.	2017-05-01 10:49:56 -07:00
Jason Evans	b9ab04a191	Refactor !opt.munmap to opt.retain.	2017-04-29 09:24:12 -07:00
David Goldblatt	89e2d3c12b	Header refactoring: ctl - unify and remove from catchall. In order to do this, we introduce the mutex_prof module, which breaks a circular dependency between ctl and prof.	2017-04-25 09:51:38 -07:00
Jason Evans	c67c3e4a63	Replace --disable-munmap with opt.munmap. Control use of munmap(2) via a run-time option rather than a compile-time option (with the same per platform default). The old behavior of --disable-munmap can be achieved with --with-malloc-conf=munmap:false. This partially resolves #580.	2017-04-24 20:37:16 -07:00
David Goldblatt	31b43219db	Header refactoring: size_classes module - remove from the catchall	2017-04-24 10:33:21 -07:00
David Goldblatt	bf2dc7e678	Header refactoring: ticker module - remove from the catchall and unify.	2017-04-24 10:33:21 -07:00
David Goldblatt	4d2e4bf5eb	Get rid of most of the various inline macros.	2017-04-24 10:33:21 -07:00
David Goldblatt	425253e2cd	Enable -Wundef, when supported. This can catch bugs in which one header defines a numeric constant, and another uses it without including the defining header. Undefined preprocessor symbols expand to '0', so that this will compile fine, silently doing the math wrong.	2017-04-21 17:03:56 -07:00
Jason Evans	3823effe12	Remove --enable-ivsalloc. Continue to use ivsalloc() when --enable-debug is specified (and add assertions to guard against 0 size), but stop providing a documented explicit semantics-changing band-aid to dodge undefined behavior in sallocx() and malloc_usable_size(). ivsalloc() remains compiled in, unlike when #211 restored --enable-ivsalloc, and if JEMALLOC_FORCE_IVSALLOC is defined during compilation, sallocx() and malloc_usable_size() will still use ivsalloc(). This partially resolves #580.	2017-04-21 14:34:35 -07:00
Jason Evans	4403c9ab44	Remove --disable-tcache. Simplify configuration by removing the --disable-tcache option, but replace the testing for that configuration with --with-malloc-conf=tcache:false. Fix the thread.arena and thread.tcache.flush mallctls to work correctly if tcache is disabled. This partially resolves #580.	2017-04-21 10:06:12 -07:00
Jason Evans	da4cff0279	Support --with-lg-page values larger than system page size. All mappings continue to be PAGE-aligned, even if the system page size is smaller. This change is primarily intended to provide a mechanism for supporting multiple page sizes with the same binary; smaller page sizes work better in conjunction with jemalloc's design. This resolves #467.	2017-04-18 19:01:04 -07:00
David Goldblatt	38e847c1c5	Header refactoring: unify spin.h and move it out of the catch-all.	2017-04-18 18:35:03 -07:00
David Goldblatt	7ebc83894f	Header refactoring: move jemalloc_internal_types.h out of the catch-all	2017-04-18 18:35:03 -07:00
David Goldblatt	d9ec36e22d	Header refactoring: move assert.h out of the catch-all	2017-04-18 18:35:03 -07:00
David Goldblatt	f692e6c214	Header refactoring: move util.h out of the catchall	2017-04-18 18:35:03 -07:00
David Goldblatt	54373be084	Header refactoring: move malloc_io.h out of the catchall	2017-04-18 18:35:03 -07:00
Qi Wang	c2fcf9c2cf	Switch to fine-grained reentrancy support. Previously we had a general detection and support of reentrancy, at the cost of having branches and inc / dec operations on fast paths. To avoid taxing fast paths, we move the reentrancy operations onto tsd slow state, and only modify reentrancy level around external calls (that might trigger reentrancy).	2017-04-14 19:48:06 -07:00
Qi Wang	b348ba29bb	Bundle 3 branches on fast path into tsd_state. Added tsd_state_nominal_slow, which on fast path malloc() incorporates tcache_enabled check, and on fast path free() bundles both malloc_slow and tcache_enabled branches.	2017-04-14 16:58:08 -07:00
Qi Wang	ccfe68a916	Pass alloc_ctx down profiling path. With this change, when profiling is enabled, we avoid doing redundant rtree lookups. Also changed dalloc_atx_t to alloc_atx_t, as it's now used on allocation path as well (to speed up profiling).	2017-04-12 13:55:39 -07:00
Qi Wang	f35213bae4	Pass dalloc_ctx down the sdalloc path. This avoids redundant rtree lookups.	2017-04-12 13:55:39 -07:00
David Goldblatt	e709fae1d7	Header refactoring: move atomic.h out of the catch-all	2017-04-11 11:52:30 -07:00
David Goldblatt	743d940dc3	Header refactoring: Split up jemalloc_internal.h This is a biggy. jemalloc_internal.h has been doing multiple jobs for a while now: - The source of system-wide definitions. - The catch-all include file. - The module header file for jemalloc.c This commit splits up this functionality. The system-wide definitions responsibility has moved to jemalloc_preamble.h. The catch-all include file is now jemalloc_internal_includes.h. The module headers for jemalloc.c are now in jemalloc_internal_[externs\|inlines\|types].h, just as they are for the other modules.	2017-04-11 11:52:30 -07:00
Qi Wang	bfa530b75b	Pass dealloc_ctx down free() fast path. This gets rid of the redundent rtree lookup down fast path.	2017-04-11 09:58:12 -07:00
Qi Wang	04ef218d87	Move reentrancy_level to the beginning of TSD.	2017-04-07 16:25:43 -07:00
David Goldblatt	b407a65401	Add basic reentrancy-checking support, and allow arena_new to reenter. This checks whether or not we're reentrant using thread-local data, and, if we are, moves certain internal allocations to use arena 0 (which should be properly initialized after bootstrapping). The immediate thing this allows is spinning up threads in arena_new, which will enable spinning up background threads there.	2017-04-07 14:10:27 -07:00
Qi Wang	fde3e20cc0	Integrate auto tcache into TSD. The embedded tcache is initialized upon tsd initialization. The avail arrays for the tbins will be allocated / deallocated accordingly during init / cleanup. With this change, the pointer to the auto tcache will always be available, as long as we have access to the TSD. tcache_available() (called in tcache_get()) is provided to check if we should use tcache.	2017-04-07 09:55:14 -07:00
David Goldblatt	bc32ec3503	Move arena-tracking atomics in jemalloc.c to C11-style	2017-04-05 16:25:37 -07:00
Qi Wang	d3cda3423c	Do proper cleanup for tsd_state_reincarnated. Also enable arena_bind under non-nominal state, as the cleanup will be handled correctly now.	2017-04-04 00:34:49 -07:00
Qi Wang	e6b074472e	Force inline ifree to avoid function call costs on fast path. Without ALWAYS_INLINE, sometimes ifree() gets compiled into its own function, which adds overhead on the fast path.	2017-03-24 17:54:28 -07:00
Jason Evans	5e67fbc367	Push down iealloc() calls. Call iealloc() as deep into call chains as possible without causing redundant calls.	2017-03-22 18:33:32 -07:00
Jason Evans	51a2ec92a1	Remove extent dereferences from the deallocation fast paths.	2017-03-22 18:33:32 -07:00
Jason Evans	4f341412e5	Remove extent arg from isalloc() and arena_salloc().	2017-03-22 18:33:32 -07:00
Qi Wang	ad91762635	Not re-binding iarena when migrate between arenas.	2017-03-21 14:05:20 -07:00
Jason Evans	64e458f5cd	Implement two-phase decay-based purging. Split decay-based purging into two phases, the first of which uses lazy purging to convert dirty pages to "muzzy", and the second of which uses forced purging, decommit, or unmapping to convert pages to clean or destroy them altogether. Not all operating systems support lazy purging, yet the application may provide extent hooks that implement lazy purging, so care must be taken to dynamically omit the first phase when necessary. The mallctl interfaces change as follows: - opt.decay_time --> opt.{dirty,muzzy}_decay_time - arena.<i>.decay_time --> arena.<i>.{dirty,muzzy}_decay_time - arenas.decay_time --> arenas.{dirty,muzzy}_decay_time - stats.arenas.<i>.pdirty --> stats.arenas.<i>.p{dirty,muzzy} - stats.arenas.<i>.{npurge,nmadvise,purged} --> stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged} This resolves #521.	2017-03-15 13:13:47 -07:00
Qi Wang	ec532e2c5c	Implement per-CPU arena. The new feature, opt.percpu_arena, determines thread-arena association dynamically based CPU id. Three modes are supported: "percpu", "phycpu" and disabled. "percpu" uses the current core id (with help from sched_getcpu()) directly as the arena index, while "phycpu" will assign threads on the same physical CPU to the same arena. In other words, "percpu" means # of arenas == # of CPUs, while "phycpu" has # of arenas == 1/2 * (# of CPUs). Note that no runtime check on whether hyper threading is enabled is added yet. When enabled, threads will be migrated between arenas when a CPU change is detected. In the current design, to reduce overhead from reading CPU id, each arena tracks the thread accessed most recently. When a new thread comes in, we will read CPU id and update arena if necessary.	2017-03-08 23:19:01 -08:00
Qi Wang	8721e19c04	Fix arena_prefork lock rank order for witness. When witness is enabled, lock rank order needs to be preserved during prefork, not only for each arena, but also across arenas. This change breaks arena_prefork into further stages to ensure valid rank order across arenas. Also changed test/unit/fork to use a manual arena to catch this case.	2017-03-08 23:07:27 -08:00
Qi Wang	01f47f11a6	Store associated arena in tcache. This fixes tcache_flush for manual tcaches, which wasn't able to find the correct arena it associated with. Also changed the decay test to cover this case (by using manually created arenas).	2017-03-07 12:58:11 -08:00
Jason Evans	379dd44c57	Add casts to CONF_HANDLE_T_U(). This avoids signed/unsigned comparison warnings when specifying integer constants as inputs.	2017-02-28 17:18:25 -08:00
Jason Evans	ab25d3c987	Synchronize arena->tcache_ql with arena->tcache_ql_mtx. This replaces arena->lock synchronization.	2017-02-16 09:39:46 -08:00
Jason Evans	5f11830754	Replace spin_init() with SPIN_INITIALIZER.	2017-02-08 18:50:03 -08:00
Jason Evans	1bac516aaa	Optimize compute_size_with_overflow(). Do not check for overflow unless it is actually a possibility.	2017-02-03 19:13:05 -08:00
Jason Evans	767ffa2b5f	Fix compute_size_with_overflow(). Fix compute_size_with_overflow() to use a high_bits mask that has the high bits set, rather than the low bits. This regression was introduced by `5154ff32ee` (Unify the allocation paths).	2017-02-03 19:13:05 -08:00
Jason Evans	1b6e43507e	Fix/refactor tcaches synchronization. Synchronize tcaches with tcaches_mtx rather than ctl_mtx. Add missing synchronization for tcache flushing. This bug was introduced by `1cb181ed63` (Implement explicit tcache support.), which was first released in 4.0.0.	2017-02-01 16:43:46 -08:00
David Goldblatt	85d2841818	Fix a bug in which a potentially invalid usize replaced size In the refactoring that unified the allocation paths, usize was substituted for size. This worked fine under the default test configuration, but triggered asserts when we started beefing up our CI testing. This change fixes the issue, and clarifies the comment describing the argument selection that it got wrong.	2017-01-25 15:50:59 -08:00
Tamir Duberstein	0874b648e0	Avoid redeclaring glibc's secure_getenv Avoid the name secure_getenv to avoid redeclaring secure_getenv when secure_getenv is present but its use is manually disabled via ac_cv_func_secure_getenv=no.	2017-01-25 11:24:32 -08:00
Jason Evans	c0cc5db871	Replace tabs following #define with spaces. This resolves #564.	2017-01-20 21:45:53 -08:00
Jason Evans	f408643a4c	Remove extraneous parens around return arguments. This resolves #540.	2017-01-20 21:43:07 -08:00
Jason Evans	c4c2592c83	Update brace style. Add braces around single-line blocks, and remove line breaks before function-opening braces. This resolves #537.	2017-01-20 21:43:07 -08:00
David Goldblatt	5154ff32ee	Unify the allocation paths This unifies the allocation paths for malloc, posix_memalign, aligned_alloc, calloc, memalign, valloc, and mallocx, so that they all share common code where they can. There's more work that could be done here, but I think this is the smallest discrete change in this direction.	2017-01-20 12:15:53 -08:00
Jason Evans	ffbb7dac3d	Remove leading blank lines from function bodies. This resolves #535.	2017-01-13 14:49:24 -08:00
Jason Evans	edf1bafb2b	Implement arena.<i>.destroy . Add MALLCTL_ARENAS_DESTROYED for accessing destroyed arena stats as an analogue to MALLCTL_ARENAS_ALL. This resolves #382.	2017-01-06 18:58:46 -08:00
Jason Evans	0f04bb1d6f	Rename the arenas.extend mallctl to arenas.create.	2017-01-06 18:58:45 -08:00
Jason Evans	a0dd3a4483	Implement per arena base allocators. Add/rename related mallctls: - Add stats.arenas.<i>.base . - Rename stats.arenas.<i>.metadata to stats.arenas.<i>.internal . - Add stats.arenas.<i>.resident . Modify the arenas.extend mallctl to take an optional (extent_hooks_t *) argument so that it is possible for all base allocations to be serviced by the specified extent hooks. This resolves #463.	2016-12-26 18:08:28 -08:00
Jason Evans	5234be2133	Add pthread_atfork(3) feature test. Some versions of Android provide a pthreads library without providing pthread_atfork(), so in practice a separate feature test is necessary for the latter.	2016-11-17 15:14:57 -08:00
Jason Evans	aec5a051e8	Avoid gcc type-limits warnings.	2016-11-16 18:28:38 -08:00
Jason Evans	ea9961acdb	Fix psz/pind edge cases. Add an "over-size" extent heap in which to store extents which exceed the maximum size class (plus cache-oblivious padding, if enabled). Remove psz2ind_clamp() and use psz2ind() instead so that trying to allocate the maximum size class can in principle succeed. In practice, this allows assertions to hold so that OOM errors can be successfully generated.	2016-11-03 22:33:34 -07:00
Dave Watson	712fde79fd	Check for existance of CPU_COUNT macro before using it. This resolves #485.	2016-11-02 20:05:40 -07:00
Jason Evans	1dcd0aa07f	Do not mark malloc_conf as weak on Windows. This works around malloc_conf not being properly initialized by at least the cygwin toolchain. Prior build system changes to use -Wl,--[no-]whole-archive may be necessary for malloc_conf resolution to work properly as a non-weak symbol (not tested).	2016-10-29 00:13:11 -07:00
Jason Evans	6ec2d8e279	Do not mark malloc_conf as weak for unit tests. This is generally correct (no need for weak symbols since no jemalloc library is involved in the link phase), and avoids linking problems (apparently unininitialized non-NULL malloc_conf) when using cygwin with gcc.	2016-10-28 23:03:25 -07:00
Dave Watson	8309388408	Support static linking of jemalloc with glibc glibc defines its malloc implementation with several weak and strong symbols: strong_alias (__libc_calloc, __calloc) weak_alias (__libc_calloc, calloc) strong_alias (__libc_free, __cfree) weak_alias (__libc_free, cfree) strong_alias (__libc_free, __free) strong_alias (__libc_free, free) strong_alias (__libc_malloc, __malloc) strong_alias (__libc_malloc, malloc) The issue is not with the weak symbols, but that other parts of glibc depend on __libc_malloc explicitly. Defining them in terms of jemalloc API's allows the linker to drop glibc's malloc.o completely from the link, and static linking no longer results in symbol collisions. Another wrinkle: jemalloc during initialization calls sysconf to get the number of CPU's. GLIBC allocates for the first time before setting up isspace (and other related) tables, which are used by sysconf. Instead, use the pthread API to get the number of CPUs with GLIBC, which seems to work. This resolves #442.	2016-10-28 15:08:19 -07:00
Jason Evans	b54d160dc4	Do not (recursively) allocate within tsd_fetch(). Refactor tsd so that tsdn_fetch() does not trigger allocation, since allocation could cause infinite recursion. This resolves #458.	2016-10-20 23:59:12 -07:00
Jason Evans	577d4572b0	Make dss operations lockless. Rather than protecting dss operations with a mutex, use atomic operations. This has negligible impact on synchronization overhead during typical dss allocation, but is a substantial improvement for extent_in_dss() and the newly added extent_dss_mergeable(), which can be called multiple times during extent deallocations. This change also has the advantage of avoiding tsd in deallocation paths associated with purging, which resolves potential deadlocks during thread exit due to attempted tsd resurrection. This resolves #425.	2016-10-13 15:37:00 -07:00
Jason Evans	e5effef428	Add/use adaptive spinning. Add spin_t and spin_{init,adaptive}(), which provide a simple abstraction for adaptive spinning. Adaptively spin during busy waits in bootstrapping and rtree node initialization.	2016-10-13 14:55:39 -07:00
Jason Evans	9acd5cf178	Remove all vestiges of chunks. Remove mallctls: - opt.lg_chunk - stats.cactive This resolves #464.	2016-10-12 11:55:43 -07:00
Jason Evans	63b5657aa5	Remove ratio-based purging. Make decay-based purging the default (and only) mode. Remove associated mallctls: - opt.purge - opt.lg_dirty_mult - arena.<i>.lg_dirty_mult - arenas.lg_dirty_mult - stats.arenas.<i>.lg_dirty_mult This resolves #385.	2016-10-12 10:40:27 -07:00
Qi Wang	1cb399b630	Fix arena_bind(). When tsd is not in nominal state (e.g. during thread termination), we should not increment nthreads.	2016-09-22 09:13:45 -07:00
Jason Evans	7be2ebc23f	Make tsd cleanup functions optional, remove noop cleanup functions.	2016-06-05 20:42:24 -07:00
Jason Evans	819417580e	Fix rallocx() sampling code to not eagerly commit sampler update. rallocx() for an alignment-constrained request may end up with a smaller-than-worst-case size if in-place reallocation succeeds due to serendipitous alignment. In such cases, sampling may not happen.	2016-06-05 20:42:24 -07:00
Jason Evans	a83a31c1c5	Relax opt_lg_chunk clamping constraints.	2016-06-05 20:42:24 -07:00
Jason Evans	22588dda6e	Rename most remaining chunk APIs to extent.	2016-06-05 20:42:23 -07:00
Jason Evans	0c4932eb1e	s/chunk_lookup/extent_lookup/g, s/chunks_rtree/extents_rtree/g	2016-06-05 20:42:23 -07:00
Jason Evans	9c305c9e5c	s/chunk_hook/extent_hook/g	2016-06-05 20:42:23 -07:00
Jason Evans	7d63fed0fd	Rename huge to large.	2016-06-05 20:42:23 -07:00
Jason Evans	498856f44a	Move slabs out of chunks.	2016-06-05 20:42:23 -07:00
Jason Evans	ed2c2427a7	Use huge size class infrastructure for large size classes.	2016-06-05 20:42:18 -07:00
Jason Evans	8c9be3e837	Refactor rtree to always use base_alloc() for node allocation.	2016-06-03 12:27:41 -07:00
Jason Evans	db72272bef	Use rtree-based chunk lookups rather than pointer bit twiddling. Look up chunk metadata via the radix tree, rather than using CHUNK_ADDR2BASE(). Propagate pointer's containing extent. Minimize extent lookups by doing a single lookup (e.g. in free()) and propagating the pointer's extent into nearly all the functions that may need it.	2016-06-03 12:27:41 -07:00
Jason Evans	3aea827f5e	Simplify run quantization.	2016-05-16 12:21:27 -07:00
Jason Evans	7bb00ae9d6	Refactor runs_avail. Use pszind_t size classes rather than szind_t size classes, and always reserve space for NPSIZES elements. This removes unused heaps that are not multiples of the page size, and adds (currently) unused heaps for all huge size classes, with the immediate benefit that the size of arena_t allocations is constant (no longer dependent on chunk size).	2016-05-16 12:21:21 -07:00
Jason Evans	226c446979	Implement pz2ind(), pind2sz(), and psz2u(). These compute size classes and indices similarly to size2index(), index2size() and s2u(), respectively, but using the subset of size classes that are multiples of the page size. Note that pszind_t and szind_t are not interchangeable.	2016-05-13 10:31:54 -07:00
Jason Evans	627372b459	Initialize arena_bin_info at compile time rather than at boot time. This resolves #370.	2016-05-13 10:31:30 -07:00
Jason Evans	17c021c177	Remove redzone support. This resolves #369.	2016-05-13 10:27:33 -07:00
Jason Evans	ba5c709517	Remove quarantine support.	2016-05-13 10:25:05 -07:00
Jason Evans	9a8add1510	Remove Valgrind support.	2016-05-13 09:56:18 -07:00
Jason Evans	a397045323	Use TSDN_NULL rather than NULL as appropriate.	2016-05-12 21:07:08 -07:00
Jason Evans	c1e00ef2a6	Resolve bootstrapping issues when embedded in FreeBSD libc. `b2c0d6322d` (Add witness, a simple online locking validator.) caused a broad propagation of tsd throughout the internal API, but tsd_fetch() was designed to fail prior to tsd bootstrapping. Fix this by splitting tsd_t into non-nullable tsd_t and nullable tsdn_t, and modifying all internal APIs that do not critically rely on tsd to take nullable pointers. Furthermore, add the tsd_booted_get() function so that tsdn_fetch() can probe whether tsd bootstrapping is complete and return NULL if not. All dangerous conversions of nullable pointers are tsdn_tsd() calls that assert-fail on invalid conversion.	2016-05-10 22:51:33 -07:00
Jason Evans	0c12dcabc5	Fix tsd bootstrapping for a0malloc().	2016-05-07 16:55:36 -07:00
Jason Evans	3ef51d7f73	Optimize the fast paths of calloc() and [m,d,sd]allocx(). This is a broader application of optimizations to malloc() and free() in `f4a0f32d34` (Fast-path improvement: reduce # of branches and unnecessary operations.). This resolves #321.	2016-05-06 14:37:39 -07:00
Jason Evans	c2f970c32b	Modify pages_map() to support mapping uncommitted virtual memory. If the OS overcommits: - Commit all mappings in pages_map() regardless of whether the caller requested committed memory. - Linux-specific: Specify MAP_NORESERVE to avoid unfortunate interactions with heuristic overcommit mode during fork(2). This resolves #193.	2016-05-05 18:56:17 -07:00
Jason Evans	108c4a11e9	Fix witness/fork() interactions. Fix witness to clear its list of owned mutexes in the child if platform-specific malloc_mutex code re-initializes mutexes rather than unlocking them.	2016-04-26 10:47:22 -07:00
Jason Evans	174c0c3a9c	Fix fork()-related lock rank ordering reversals.	2016-04-25 23:16:20 -07:00
Jason Evans	259f8ebbfc	Fix arena_choose_hard() regression. This regression was caused by `66cd953514` (Do not allocate metadata via non-auto arenas, nor tcaches.).	2016-04-22 22:21:31 -07:00
Jason Evans	66cd953514	Do not allocate metadata via non-auto arenas, nor tcaches. This assures that all internally allocated metadata come from the first opt_narenas arenas, i.e. the automatically multiplexed arenas.	2016-04-22 15:19:59 -07:00
Jason Evans	b2c0d6322d	Add witness, a simple online locking validator. This resolves #358.	2016-04-14 02:09:28 -07:00
Jason Evans	39f58755a7	Fix a potential tsd cleanup leak. Prior to `767d85061a` (Refactor arenas array (fixes deadlock).), it was possible under some circumstances for arena_get() to trigger recreation of the arenas cache during tsd cleanup, and the arenas cache would then be leaked. In principle a similar issue could still occur as a side effect of decay-based purging, which calls arena_tdata_get(). Fix arenas_tdata_cleanup() by setting tsd->arenas_tdata_bypass to true, so that arena_tdata_get() will gracefully fail (an expected behavior) rather than recreating tsd->arena_tdata. Reported by Christopher Ferris <cferris@google.com>.	2016-02-27 21:18:15 -08:00
Jason Evans	9d2c10f2e8	Add more HUGE_MAXCLASS overflow checks. Add HUGE_MAXCLASS overflow checks that are specific to heap profiling code paths. This fixes test failures that were introduced by `0c516a00c4` (Make *allocx() size class overflow behavior defined.).	2016-02-25 16:42:15 -08:00
Jason Evans	0c516a00c4	Make *allocx() size class overflow behavior defined. Limit supported size and alignment to HUGE_MAXCLASS, which in turn is now limited to be less than PTRDIFF_MAX. This resolves #278 and #295.	2016-02-25 15:29:49 -08:00
Jason Evans	767d85061a	Refactor arenas array (fixes deadlock). Refactor the arenas array, which contains pointers to all extant arenas, such that it starts out as a sparse array of maximum size, and use double-checked atomics-based reads as the basis for fast and simple arena_get(). Additionally, reduce arenas_lock's role such that it only protects against arena initalization races. These changes remove the possibility for arena lookups to trigger locking, which resolves at least one known (fork-related) deadlock. This resolves #315.	2016-02-24 23:58:10 -08:00
Jason Evans	9e1810ca9d	Silence miscellaneous 64-to-32-bit data loss warnings.	2016-02-24 13:03:48 -08:00
Jason Evans	0931cecbfa	Use ssize_t for readlink() rather than int.	2016-02-24 13:03:48 -08:00
Jason Evans	8f683b94a7	Make opt_narenas unsigned rather than size_t.	2016-02-24 13:03:48 -08:00
Jason Evans	9bad079039	Refactor time_* into nstime_*. Use a single uint64_t in nstime_t to store nanoseconds rather than using struct timespec. This reduces fragility around conversions between long and uint64_t, especially missing casts that only cause problems on 32-bit platforms.	2016-02-21 21:39:05 -08:00
Jason Evans	243f7a0508	Implement decay-based unused dirty page purging. This is an alternative to the existing ratio-based unused dirty page purging, and is intended to eventually become the sole purging mechanism. Add mallctls: - opt.purge - opt.decay_time - arena.<i>.decay - arena.<i>.decay_time - arenas.decay_time - stats.arenas.<i>.decay_time This resolves #325.	2016-02-19 20:56:21 -08:00
Jason Evans	db927b6727	Refactor arenas_cache tsd. Refactor arenas_cache tsd into arenas_tdata, which is a structure of type arena_tdata_t.	2016-02-19 20:32:37 -08:00
Jason Evans	f829009929	Add --with-malloc-conf. Add --with-malloc-conf, which makes it possible to embed a default options string during configuration.	2016-02-19 20:29:06 -08:00
Cosmin Paraschiv	9cb481a73f	Call malloc_test_boot0() from malloc_init_hard_recursible(). When using LinuxThreads, malloc bootstrapping deadlocks, since malloc_tsd_boot0() ends up calling pthread_setspecific(), which causes recursive allocation. Fix it by moving the malloc_tsd_boot0() call to malloc_init_hard_recursible(). The deadlock was introduced by `8bb3198f72` (Refactor/fix arenas manipulation.), when tsd_boot() was split and the top half, tsd_boot0(), got an extra tsd_wrapper_set() call.	2016-01-11 11:10:39 -08:00
Qi Wang	f4a0f32d34	Fast-path improvement: reduce # of branches and unnecessary operations. - Combine multiple runtime branches into a single malloc_slow check. - Avoid calling arena_choose / size2index / index2size on fast path. - A few micro optimizations.	2015-11-10 14:28:34 -08:00
Jason Evans	21523297fc	Add mallocx() OOM tests.	2015-09-17 15:27:28 -07:00
Jason Evans	3263be6efb	Simplify imallocx_prof_sample(). Simplify imallocx_prof_sample() to always operate on usize rather than sometimes using size. This avoids redundant usize computations and more closely fits the style adopted by i[rx]allocx_prof_sample() to fix sampling bugs.	2015-09-17 10:19:28 -07:00
Jason Evans	4be9c79f88	Fix irallocx_prof_sample(). Fix irallocx_prof_sample() to always allocate large regions, even when alignment is non-zero.	2015-09-17 10:17:55 -07:00
Jason Evans	38e2c8fa9c	Fix ixallocx_prof_sample(). Fix ixallocx_prof_sample() to never modify nor create sampled small allocations. xallocx() is in general incapable of moving small allocations, so this fix removes buggy code without loss of generality.	2015-09-17 10:05:56 -07:00
Jason Evans	9a505b768c	Centralize xallocx() size[+extra] overflow checks.	2015-09-15 14:39:58 -07:00
Jason Evans	8c485b02a6	Fix ixallocx_prof() to check for size greater than HUGE_MAXCLASS.	2015-09-15 00:51:09 -07:00
Jason Evans	708ed79834	Resolve an unsupported special case in arena_prof_tctx_set(). Add arena_prof_tctx_reset() and use it instead of arena_prof_tctx_set() when resetting the tctx pointer during reallocation, which happens whenever an originally sampled reallocated object is not sampled during reallocation. This regression was introduced by `594c759f37` (Optimize arena_prof_tctx_set().)	2015-09-14 23:57:58 -07:00
Jason Evans	23f6e103c8	Fix ixallocx_prof_sample() argument order reversal. Fix ixallocx_prof() to pass usize_max and zero to ixallocx_prof_sample() in the correct order.	2015-09-14 23:57:09 -07:00
Jason Evans	ce9a4e3479	s/max_usize/usize_max/g	2015-09-14 23:55:54 -07:00
Jason Evans	d9704042ee	s/oldptr/old_ptr/g	2015-09-14 23:55:54 -07:00
Jason Evans	cec0d63d8b	Make one call to prof_active_get_unlocked() per allocation event. Make one call to prof_active_get_unlocked() per allocation event, and use the result throughout the relevant functions that handle an allocation event. Also add a missing check in prof_realloc(). These fixes protect allocation events against concurrent prof_active changes.	2015-09-14 23:55:48 -07:00
Jason Evans	ef363de701	Fix irealloc_prof() to prof_alloc_rollback() on OOM.	2015-09-14 23:54:42 -07:00
Jason Evans	46ff049128	Optimize irallocx_prof() to optimistically update the sampler state.	2015-09-14 22:47:18 -07:00
Jason Evans	4acb6c7ff3	Fix ixallocx_prof() size+extra overflow. Fix ixallocx_prof() to clamp the extra parameter if size+extra would overflow HUGE_MAXCLASS.	2015-09-14 22:47:12 -07:00
Mike Hommey	0a116faf95	Force initialization of the init_lock in malloc_init_hard on Windows XP This resolves #269.	2015-09-04 10:35:20 -07:00
Jason Evans	30949da601	Fix arenas_cache_cleanup() and arena_get_hard(). Fix arenas_cache_cleanup() and arena_get_hard() to handle allocation/deallocation within the application's thread-specific data cleanup functions even after arenas_cache is torn down. This is a more general fix that complements `45e9f66c28` (Fix arenas_cache_cleanup().).	2015-08-27 20:32:35 -07:00
Christopher Ferris	45e9f66c28	Fix arenas_cache_cleanup(). Fix arenas_cache_cleanup() to handle allocation/deallocation within the application's thread-specific data cleanup functions even after arenas_cache is torn down.	2015-08-21 12:33:17 -07:00
Matthijs	c1a6a51e40	MSVC compatibility changes - Decorate public function with __declspec(allocator) and __declspec(restrict), just like MSVC 1900 - Support JEMALLOC_HAS_RESTRICT by defining the restrict keyword - Move __declspec(nothrow) between 'void' and '*' so it compiles once more	2015-08-04 09:01:48 -07:00
Jason Evans	00632609df	Move JEMALLOC_NOTHROW just after return type. Only use __declspec(nothrow) in C++ mode. This resolves #244.	2015-07-21 08:21:13 -07:00
Mike Hommey	50cd636eed	Remove JEMALLOC_ALLOC_SIZE annotations on functions not returning pointers As per gcc documentation: The alloc_size attribute is used to tell the compiler that the function return value points to memory (...) This resolves #245.	2015-07-21 09:16:07 +09:00
Jason Evans	ae93d6bf36	Avoid function prototype incompatibilities. Add various function attributes to the exported functions to give the compiler more information to work with during optimization, and also specify throw() when compiling with C++ on Linux, in order to adequately match what __THROW does in glibc. This resolves #237.	2015-07-10 16:09:40 -07:00
Matthijs	a1aaf949a5	Optimizations for Windows - Set opt_lg_chunk based on run-time OS setting - Verify LG_PAGE is compatible with run-time OS setting - When targeting Windows Vista or newer, use SRWLOCK instead of CRITICAL_SECTION - When targeting Windows Vista or newer, statically initialize init_lock	2015-06-25 22:53:58 +02:00
Jason Evans	241abc601b	Fix size class overflow handling when profiling is enabled. Fix size class overflow handling for malloc(), posix_memalign(), memalign(), calloc(), and realloc() when profiling is enabled. Remove an assertion that erroneously caused arena_sdalloc() to fail when profiling was enabled. This resolves #232.	2015-06-23 18:56:14 -07:00
Jason Evans	dc0610a714	Add alignment assertions to public aligned allocation functions.	2015-06-22 18:48:58 -07:00
Jason Evans	8a03cf039c	Implement cache index randomization for large allocations. Extract szad size quantization into {extent,run}_quantize(), and . quantize szad run sizes to the union of valid small region run sizes and large run sizes. Refactor iteration in arena_run_first_fit() to use run_quantize{,_first,_next(), and add support for padded large runs. For large allocations that have no specified alignment constraints, compute a pseudo-random offset from the beginning of the first backing page that is a multiple of the cache line size. Under typical configurations with 4-KiB pages and 64-byte cache lines this results in a uniform distribution among 64 page boundary offsets. Add the --disable-cache-oblivious option, primarily intended for performance testing. This resolves #13.	2015-05-06 13:27:39 -07:00
Igor Podlesny	95e88de0aa	Concise JEMALLOC_HAVE_ISSETUGID case in secure_getenv().	2015-04-30 11:48:56 -07:00
Jason Evans	e0a08a1496	Restore --enable-ivsalloc. However, unlike before it was removed do not force --enable-ivsalloc when Darwin zone allocator integration is enabled, since the zone allocator code uses ivsalloc() regardless of whether malloc_usable_size() and sallocx() do. This resolves #211.	2015-03-18 21:06:58 -07:00
Jason Evans	b01186cebd	Remove redundant tcache_boot() call.	2015-02-15 14:04:55 -08:00
Jason Evans	cbf3a6d703	Move centralized chunk management into arenas. Migrate all centralized data structures related to huge allocations and recyclable chunks into arena_t, so that each arena can manage huge allocations and recyclable virtual memory completely independently of other arenas. Add chunk node caching to arenas, in order to avoid contention on the base allocator. Use chunks_rtree to look up huge allocations rather than a red-black tree. Maintain a per arena unsorted list of huge allocations (which will be needed to enumerate huge allocations during arena reset). Remove the --enable-ivsalloc option, make ivsalloc() always available, and use it for size queries if --enable-debug is enabled. The only practical implications to this removal are that 1) ivsalloc() is now always available during live debugging (and the underlying radix tree is available during core-based debugging), and 2) size query validation can no longer be enabled independent of --enable-debug. Remove the stats.chunks.{current,total,high} mallctls, and replace their underlying statistics with simpler atomically updated counters used exclusively for gdump triggering. These statistics are no longer very useful because each arena manages chunks independently, and per arena statistics provide similar information. Simplify chunk synchronization code, now that base chunk allocation cannot cause recursive lock acquisition.	2015-02-12 00:15:56 -08:00
Jason Evans	1cb181ed63	Implement explicit tcache support. Add the MALLOCX_TCACHE() and MALLOCX_TCACHE_NONE macros, which can be used in conjunction with the *allocx() API. Add the tcache.create, tcache.flush, and tcache.destroy mallctls. This resolves #145.	2015-02-09 17:44:48 -08:00
Jason Evans	4581b97809	Implement metadata statistics. There are three categories of metadata: - Base allocations are used for bootstrap-sensitive internal allocator data structures. - Arena chunk headers comprise pages which track the states of the non-metadata pages. - Internal allocations differ from application-originated allocations in that they are for internal use, and that they are omitted from heap profiles. The metadata statistics comprise the metadata categories as follows: - stats.metadata: All metadata -- base + arena chunk headers + internal allocations. - stats.arenas.<i>.metadata.mapped: Arena chunk headers. - stats.arenas.<i>.metadata.allocated: Internal allocations. This is reported separately from the other metadata statistics because it overlaps with the allocated and active statistics, whereas the other metadata statistics do not. Base allocations are not reported separately, though their magnitude can be computed by subtracting the arena-specific metadata. This resolves #163.	2015-01-23 23:34:43 -08:00
Jason Evans	10aff3f3e1	Refactor bootstrapping to delay tsd initialization. Refactor bootstrapping to delay tsd initialization, primarily to support integration with FreeBSD's libc. Refactor a0() for internal-only use, and add the bootstrap_{malloc,calloc,free}() API for use by FreeBSD's libc. This separation limits use of the a0() functions to metadata allocation, which doesn't require malloc/calloc/free API compatibility. This resolves #170.	2015-01-22 14:04:27 -08:00
Jason Evans	bc96876f99	Fix arenas_cache_cleanup(). Fix arenas_cache_cleanup() to check whether arenas_cache is NULL before deallocation, rather than checking arenas.	2015-01-22 14:02:56 -08:00
Jason Evans	44b57b8e8b	Fix OOM handling in memalign() and valloc(). Fix memalign() and valloc() to heed imemalign()'s return value. Reported by Kurt Wampler.	2015-01-16 18:04:17 -08:00
Guilherme Goncalves	2c5cb613df	Introduce two new modes of junk filling: "alloc" and "free". In addition to true/false, opt.junk can now be either "alloc" or "free", giving applications the possibility of junking memory only on allocation or deallocation. This resolves #172.	2014-12-14 17:07:26 -08:00
Daniel Micay	b74041fb6e	Ignore MALLOC_CONF in set{uid,gid,cap} binaries. This eliminates the malloc tunables as tools for an attacker. Closes #173	2014-12-14 15:36:15 -08:00
Jason Evans	e12eaf93dc	Style and spelling fixes.	2014-12-08 16:34:04 -08:00
Daniel Micay	dc65213111	rm unused arena wrangling from xallocx It has no use for the arena_t since unlike rallocx it never makes a new memory allocation. It's just an unused parameter in ixalloc_helper.	2014-10-30 23:19:34 -07:00
Jason Evans	cfc5706f69	Miscellaneous cleanups.	2014-10-30 23:18:45 -07:00
Daniel Micay	d33f834591	avoid redundant chunk header reads * use sized deallocation in iralloct_realign * iralloc and ixalloc always need the old size, so pass it in from the caller where it's often already calculated	2014-10-30 17:06:38 -07:00
Daniel Micay	809b0ac391	mark huge allocations as unlikely This cleans up the fast path a bit more by moving away more code.	2014-10-30 17:06:38 -07:00
Jason Evans	81e547566e	Add --with-lg-tiny-min, generalize --with-lg-quantum.	2014-10-10 22:35:07 -07:00
Jason Evans	9b75677e53	Don't fetch tsd in a0{d,}alloc(). Don't fetch tsd in a0{d,}alloc(), because doing so can cause infinite recursion on systems that require an allocated tsd wrapper.	2014-10-10 18:19:20 -07:00
Jason Evans	fc0b3b7383	Add configure options. Add: --with-lg-page --with-lg-page-sizes --with-lg-size-class-group --with-lg-quantum Get rid of STATIC_PAGE_SHIFT, in favor of directly setting LG_PAGE. Fix various edge conditions exposed by the configure options.	2014-10-09 22:44:37 -07:00
Jason Evans	3a8b9b1fd9	Fix a recursive lock acquisition regression. Fix a recursive lock acquisition regression, which was introduced by `8bb3198f72` (Refactor/fix arenas manipulation.).	2014-10-08 00:54:16 -07:00
Daniel Micay	f22214a29d	Use regular arena allocation for huge tree nodes. This avoids grabbing the base mutex, as a step towards fine-grained locking for huge allocations. The thread cache also provides a tiny (~3%) improvement for serial huge allocations.	2014-10-07 23:57:09 -07:00
Jason Evans	8bb3198f72	Refactor/fix arenas manipulation. Abstract arenas access to use arena_get() (or a0get() where appropriate) rather than directly reading e.g. arenas[ind]. Prior to the addition of the arenas.extend mallctl, the worst possible outcome of directly accessing arenas was a stale read, but arenas.extend may allocate and assign a new array to arenas. Add a tsd-based arenas_cache, which amortizes arenas reads. This introduces some subtle bootstrapping issues, with tsd_boot() now being split into tsd_boot[01]() to support tsd wrapper allocation bootstrapping, as well as an arenas_cache_bypass tsd variable which dynamically terminates allocation of arenas_cache itself. Promote a0malloc(), a0calloc(), and a0free() to be generally useful for internal allocation, and use them in several places (more may be appropriate). Abstract arena->nthreads management and fix a missing decrement during thread destruction (recent tsd refactoring left arenas_cleanup() unused). Change arena_choose() to propagate OOM, and handle OOM in all callers. This is important for providing consistent allocation behavior when the MALLOCX_ARENA() flag is being used. Prior to this fix, it was possible for an OOM to result in allocation silently allocating from a different arena than the one specified.	2014-10-07 23:14:57 -07:00
Jason Evans	155bfa7da1	Normalize size classes. Normalize size classes to use the same number of size classes per size doubling (currently hard coded to 4), across the intire range of size classes. Small size classes already used this spacing, but in order to support this change, additional small size classes now fill [4 KiB .. 16 KiB). Large size classes range from [16 KiB .. 4 MiB). Huge size classes now support non-multiples of the chunk size in order to fill (4 MiB .. 16 MiB).	2014-10-06 01:45:13 -07:00
Jason Evans	0800afd03f	Silence a compiler warning.	2014-10-04 14:59:17 -07:00
Jason Evans	029d44cf8b	Fix tsd cleanup regressions. Fix tsd cleanup regressions that were introduced in `5460aa6f66` (Convert all tsd variables to reside in a single tsd structure.). These regressions were twofold: 1) tsd_tryget() should never (and need never) return NULL. Rename it to tsd_fetch() and simplify all callers. 2) tsd__set() must only be called when tsd is in the nominal state, because cleanup happens during the nominal-->purgatory transition, and re-initialization must not happen while in the purgatory state. Add tsd_nominal() and use it as needed. Note that tsd_{p,}_get() can still be used as long as no re-initialization that would require cleanup occurs. This means that e.g. the thread_allocated counter can be updated unconditionally.	2014-10-04 11:22:55 -07:00
Jason Evans	fc12c0b8bc	Implement/test/fix prof-related mallctl's. Implement/test/fix the opt.prof_thread_active_init, prof.thread_active_init, and thread.prof.active mallctl's. Test/fix the thread.prof.name mallctl. Refactor opt_prof_active to be read-only and move mutable state into the prof_active variable. Stop leaning on ctl-related locking for protection.	2014-10-03 23:25:30 -07:00
Jason Evans	551ebc4364	Convert to uniform style: cond == false --> !cond	2014-10-03 10:16:09 -07:00
Dave Rigby	e3a16fce5e	Mark malloc_conf as a weak symbol This fixes issue #113 - je_malloc_conf is not respected on OS X	2014-09-29 15:05:55 -07:00
Jason Evans	5460aa6f66	Convert all tsd variables to reside in a single tsd structure.	2014-09-23 02:36:08 -07:00
Jason Evans	c3e9e7b041	Fix irallocx_prof() sample logic. Fix irallocx_prof() sample logic to only update the threshold counter after it knows what size the allocation ended up being. This regression was caused by `6e73dc194e` (Fix a profile sampling race.), which did not make it into any releases prior to this fix.	2014-09-11 17:04:03 -07:00
Jason Evans	9c640bfdd4	Apply likely()/unlikely() to allocation/deallocation fast paths.	2014-09-11 17:01:58 -07:00
Jason Evans	91566fc079	Fix mallocx() to always honor MALLOCX_ARENA() when profiling.	2014-09-11 13:15:33 -07:00
Daniel Micay	23fdf8b359	mark some conditions as unlikely * assertion failure * malloc_init failure * malloc not already initialized (in malloc_init) * running in valgrind * thread cache disabled at runtime Clang and GCC already consider a comparison with NULL or -1 to be cold, so many branches (out-of-memory) are already correctly considered as cold and marking them is not important.	2014-09-10 21:49:42 -04:00
Jason Evans	6e73dc194e	Fix a profile sampling race. Fix a profile sampling race that was due to preparing to sample, yet doing nothing to assure that the context remains valid until the stats are updated. These regressions were caused by `602c8e0971` (Implement per thread heap profiling.), which did not make it into any releases prior to these fixes.	2014-09-09 19:47:09 -07:00
Jason Evans	a2260c95cd	Fix sdallocx() assertion. Refactor sdallocx() and nallocx() to share inallocx(), and fix an sdallocx() assertion to check usize rather than size.	2014-09-09 10:39:15 -07:00
Daniel Micay	4cfe55166e	Add support for sized deallocation. This adds a new `sdallocx` function to the external API, allowing the size to be passed by the caller. It avoids some extra reads in the thread cache fast path. In the case where stats are enabled, this avoids the work of calculating the size from the pointer. An assertion validates the size that's passed in, so enabling debugging will allow users of the API to debug cases where an incorrect size is passed in. The performance win for a contrived microbenchmark doing an allocation and immediately freeing it is ~10%. It may have a different impact on a real workload. Closes #28	2014-09-08 17:34:24 -07:00
Jason Evans	b718cf77e9	Optimize [nmd]alloc() fast paths. Optimize [nmd]alloc() fast paths such that the (flags == 0) case is streamlined, flags decoding only happens to the minimum degree necessary, and no conditionals are repeated.	2014-09-07 14:40:19 -07:00
Sara Golemon	3e24afa28e	Test for availability of malloc hooks via autoconf __*_hook() is glibc, but on at least one glibc platform (homebrew), the __GLIBC__ define isn't set correctly and we miss being able to use these hooks. Do a feature test for it during configuration so that we enable it anywhere the hooks are actually available.	2014-08-22 15:19:21 -07:00
Jason Evans	602c8e0971	Implement per thread heap profiling. Rename data structures (prof_thr_cnt_t-->prof_tctx_t, prof_ctx_t-->prof_gctx_t), and convert to storing a prof_tctx_t for sampled objects. Convert PROF_ALLOC_PREP() to prof_alloc_prep(), since precise backtrace depth within jemalloc functions is no longer an issue (pprof prunes irrelevant frames). Implement mallctl's: - prof.reset implements full sample data reset, and optional change of sample interval. - prof.lg_sample reads the current sample interval (opt.lg_prof_sample was the permanent source of truth prior to prof.reset). - thread.prof.name provides naming capability for threads within heap profile dumps. - thread.prof.active makes it possible to activate/deactivate heap profiling for individual threads. Modify the heap dump files to contain per thread heap profile data. This change is incompatible with the existing pprof, which will require enhancements to read and process the enriched data.	2014-08-19 21:31:16 -07:00
Richard Diamond	94ed6812bc	Don't catch fork()ing events for Native Client. Native Client doesn't allow forking, thus there is no need to catch fork()ing events for Native Client. Additionally, without this commit, jemalloc will introduce an unresolved pthread_atfork() in PNaCl Rust bins.	2014-06-02 07:45:33 -07:00
Jason Evans	e2deab7a75	Refactor huge allocation to be managed by arenas. Refactor huge allocation to be managed by arenas (though the global red-black tree of huge allocations remains for lookup during deallocation). This is the logical conclusion of recent changes that 1) made per arena dss precedence apply to huge allocation, and 2) made it possible to replace the per arena chunk allocation/deallocation functions. Remove the top level huge stats, and replace them with per arena huge stats. Normalize function names and types to dalloc (some were dealloc). Remove the --enable-mremap option. As jemalloc currently operates, this is a performace regression for some applications, but planned work to logarithmically space huge size classes should provide similar amortized performance. The motivation for this change was that mremap-based huge reallocation forced leaky abstractions that prevented refactoring.	2014-05-15 22:36:41 -07:00
aravind	fb7fe50a88	Add support for user-specified chunk allocators/deallocators. Add new mallctl endpoints "arena<i>.chunk.alloc" and "arena<i>.chunk.dealloc" to allow userspace to configure jemalloc's chunk allocator and deallocator on a per-arena basis.	2014-05-12 10:46:03 -07:00
Jason Evans	a344dd01c7	Fix coding sytle nits.	2014-05-01 15:51:30 -07:00
Jason Evans	6f001059aa	Simplify backtracing. Simplify backtracing to not ignore any frames, and compensate for this in pprof in order to increase flexibility with respect to function-based refactoring even in the presence of non-deterministic inlining. Modify pprof to blacklist all jemalloc allocation entry points including non-standard ones like mallocx(), and ignore all allocator-internal frames. Prior to this change, pprof excluded the specifically blacklisted functions from backtraces, but it left allocator-internal frames intact.	2014-04-22 20:55:09 -07:00
Jason Evans	bd87b01999	Optimize Valgrind integration. Forcefully disable tcache if running inside Valgrind, and remove Valgrind calls in tcache-specific code. Restructure Valgrind-related code to move most Valgrind calls out of the fast path functions. Take advantage of static knowledge to elide some branches in JEMALLOC_VALGRIND_REALLOC().	2014-04-15 16:49:57 -07:00
Jason Evans	ecd3e59ca3	Remove the "opt.valgrind" mallctl. Remove the "opt.valgrind" mallctl because it is unnecessary -- jemalloc automatically detects whether it is running inside valgrind.	2014-04-15 14:33:50 -07:00
Jason Evans	9790b9667f	Remove the allocm() API, which is superceded by the allocx() API.	2014-04-14 22:32:31 -07:00
Jason Evans	9b0cbf0850	Remove support for non-prof-promote heap profiling metadata. Make promotion of sampled small objects to large objects mandatory, so that profiling metadata can always be stored in the chunk map, rather than requiring one pointer per small region in each small-region page run. In practice the non-prof-promote code was only useful when using jemalloc to track all objects and report them as leaks at program exit. However, Valgrind is at least as good a tool for this particular use case. Furthermore, the non-prof-promote code is getting in the way of some optimizations that will make heap profiling much cheaper for the predominant use case (sampling a small representative proportion of all allocations).	2014-04-11 14:24:51 -07:00
Ben Maurer	be8e59f5a6	Don't dereference chunk->arena in free() hot path When you call free() we load chunk->arena even though that data isn't used on the tcache hot path. In profiling some FB applications, I found that ~30% of the dTLB misses in the free() function come from this line. With 4 MB chunks, the arena_chunk_t->map is ~ 32 KB (1024 pages in the chunk, 4 8 byte pointers in arena_chunk_map_t). This means there's only a 1/8 chance of the page containing chunk->arena also comtaining the map bits.	2014-04-05 15:59:08 -07:00
Max Wang	fbb31029a5	Use arena dss prec instead of default for huge allocs. Pass a dss_prec_t parameter to huge_{m,p,r}alloc instead of defaulting to the chunk dss prec.	2014-03-28 13:43:58 -07:00
Jason Evans	e2206edebc	Fix unused variable warnings.	2014-01-21 14:59:13 -08:00
Jason Evans	b2c31660be	Extract profiling code from [re]allocation functions. Extract profiling code from malloc(), imemalign(), calloc(), realloc(), mallocx(), rallocx(), and xallocx(). This slightly reduces the amount of code compiled into the fast paths, but the primary benefit is the combinatorial complexity reduction. Simplify iralloc[t]() by creating a separate ixalloc() that handles the no-move cases. Further simplify [mrxn]allocx() (and by implication [mrn]allocm()) to make request size overflows due to size class and/or alignment constraints trigger undefined behavior (detected by debug-only assertions). Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling backtrace creation in imemalign(). This bug impacted posix_memalign() and aligned_alloc().	2014-01-12 15:41:05 -08:00
Jason Evans	0405312921	Fix an uninitialized variable read in xallocx().	2013-12-20 15:52:01 -08:00
Jason Evans	665769357c	Optimize arena_prof_ctx_set(). Refactor such that arena_prof_ctx_set() receives usize as an argument, and use it to determine whether to handle ptr as a small region, rather than reading the chunk page map.	2013-12-15 21:57:02 -08:00
Jason Evans	d82a5e6a34	Implement the allocx() API. Implement the allocx() API, which is a successor to the allocm() API. The allocx() functions are slightly simpler to use because they have fewer parameters, they directly return the results of primary interest, and mallocx()/rallocx() avoid the strict aliasing pitfall that allocm()/rallocx() share with posix_memalign(). The following code violates strict aliasing rules: foo_t foo; allocm((void )&foo, NULL, 42, 0); whereas the following is safe: foo_t foo; void p; allocm(&p, NULL, 42, 0); foo = (foo_t )p; mallocx() does not have this problem: foo_t foo = (foo_t )mallocx(42, 0);	2013-12-12 22:35:52 -08:00
Jason Evans	7369232544	Silence some unused variable warnings.	2013-12-10 13:51:52 -08:00
Jason Evans	52b30691f9	Remove unused variable.	2013-12-02 15:16:39 -08:00
Jason Evans	addad093f8	Clean up malloc_ncpus(). Clean up malloc_ncpus() by replacing incorrectly indented if..else branches with a ?: expression. Submitted by Igor Podlesny.	2013-11-29 16:19:44 -08:00
Jason Evans	39e7fd0580	Fix ALLOCM_ARENA(a) handling in rallocm(). Fix rallocm() to use the specified arena for allocation, not just deallocation. Clarify ALLOCM_ARENA(a) documentation.	2013-11-25 18:02:35 -08:00
Leonard Crestez	ac4403cacb	Delay pthread_atfork registering. This function causes recursive allocation on LinuxThreads. Signed-off-by: Crestez Dan Leonard <lcrestez@ixiacom.com>	2013-10-24 16:40:31 -07:00
Jason Evans	1d1cee127a	Add a missing mutex unlock in malloc_init_hard() error path. Add a missing mutex unlock in a malloc_init_hard() error path (failed mutex initialization). In practice this bug was very unlikely to ever trigger, but if it did, application deadlock would likely result. Reported by Pat Lynch.	2013-10-21 15:04:12 -07:00
Jason Evans	e2985a2381	Avoid (x < 0) comparison for unsigned x. Avoid (min < 0) comparison for unsigned min in malloc_conf_init(). This bug had no practical consequences. Reported by Pat Lynch.	2013-10-21 15:01:44 -07:00
Jason Evans	6556e28be1	Prefer not_reached() over assert(false) where appropriate.	2013-10-21 14:56:27 -07:00
Jason Evans	543abf7e6c	Fix inlining warning. Add the JEMALLOC_ALWAYS_INLINE_C macro and use it for always-inlined functions declared in .c files. This fixes a function attribute inconsistency for debug builds that resulted in (harmless) compiler warnings about functions not being inlinable. Reported by Ricardo Nabinger Sanchez.	2013-10-19 17:26:00 -07:00
Alexandre Perrin	dd6ef0302f	malloc_conf_init: revert errno value when readlink(2) fail.	2013-10-13 15:33:15 -07:00
Jason Evans	88c222c8e9	Fix a prof-related locking order bug. Fix a locking order bug that could cause deadlock during fork if heap profiling were enabled.	2013-02-06 11:59:30 -08:00
Jason Evans	bbe29d374d	Fix potential TLS-related memory corruption. Avoid writing to uninitialized TLS as a side effect of deallocation. Initializing TLS during deallocation is unsafe because it is possible that a thread never did any allocation, and that TLS has already been deallocated by the threads library, resulting in write-after-free corruption. These fixes affect prof_tdata and quarantine; all other uses of TLS are already safe, whether intentionally (as for tcache) or unintentionally (as for arenas).	2013-01-31 14:23:48 -08:00
Jason Evans	d1b6e18a99	Revert opt_abort and opt_junk refactoring. Revert refactoring of opt_abort and opt_junk declarations. clang accepts the config_*-based declarations (and generates correct code), but gcc complains with: error: initializer element is not constant	2013-01-22 16:54:26 -08:00
Jason Evans	ba175a2bfb	Use config_* instead of JEMALLOC_. Convert a couple of stragglers from JEMALLOC_ to use config_*.	2013-01-22 12:14:45 -08:00
Jason Evans	88393cb0eb	Add and use JEMALLOC_ALWAYS_INLINE. Add JEMALLOC_ALWAYS_INLINE and use it to guarantee that the entire fast paths of the primary allocation/deallocation functions are inlined.	2013-01-22 08:45:43 -08:00
Garrett Cooper	6e6164ae15	Don't mangle errno with free(3) if utrace(2) fails This ensures POLA on FreeBSD (at least) as free(3) is generally assumed to not fiddle around with errno. Signed-off-by: Garrett Cooper <yanegomi@gmail.com>	2012-12-24 10:30:57 -08:00
Jason Evans	1bf2743e08	Add clipping support to lg_chunk option processing. Modify processing of the lg_chunk option so that it clips an out-of-range input to the edge of the valid range. This makes it possible to request the minimum possible chunk size without intimate knowledge of allocator internals. Submitted by Ian Lepore (see FreeBSD PR bin/174641).	2012-12-23 08:51:48 -08:00
Jason Evans	609ae595f0	Add arena-specific and selective dss allocation. Add the "arenas.extend" mallctl, so that it is possible to create new arenas that are outside the set that jemalloc automatically multiplexes threads onto. Add the ALLOCM_ARENA() flag for {,r,d}allocm(), so that it is possible to explicitly allocate from a particular arena. Add the "opt.dss" mallctl, which controls the default precedence of dss allocation relative to mmap allocation. Add the "arena.<i>.dss" mallctl, which makes it possible to set the default dss precedence on a per arena or global basis. Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge". Add the "stats.arenas.<i>.dss" mallctl.	2012-10-12 18:26:16 -07:00
Jason Evans	2cc11ff837	Make malloc_usable_size() implementation consistent with prototype. Use JEMALLOC_USABLE_SIZE_CONST for the malloc_usable_size() implementation as well as the prototype, for consistency's sake.	2012-10-09 16:29:21 -07:00
Jason Evans	b5225928fe	Fix fork(2)-related mutex acquisition order. Fix mutex acquisition order inversion for the chunks rtree and the base mutex. Chunks rtree acquisition was introduced by the previous commit, so this bug was short-lived.	2012-10-09 16:16:00 -07:00
Jason Evans	20f1fc95ad	Fix fork(2)-related deadlocks. Add a library constructor for jemalloc that initializes the allocator. This fixes a race that could occur if threads were created by the main thread prior to any memory allocation, followed by fork(2), and then memory allocation in the child process. Fix the prefork/postfork functions to acquire/release the ctl, prof, and rtree mutexes. This fixes various fork() child process deadlocks, but one possible deadlock remains (intentionally) unaddressed: prof backtracing can acquire runtime library mutexes, so deadlock is still possible if heap profiling is enabled during fork(). This deadlock is known to be a real issue in at least the case of libgcc-based backtracing. Reported by tfengjun.	2012-10-09 15:21:46 -07:00
Corey Richardson	1d553f72cb	If sysconf() fails, the number of CPUs is reported as UINT_MAX, not 1 as it should be	2012-10-08 15:45:38 -07:00
Jason Evans	5c710cee78	Remove const from ___hook variable declarations. Remove const from ___hook variable declarations, so that glibc can modify them during process forking.	2012-05-23 16:09:22 -07:00
Jason Evans	174b70efb4	Disable tcache by default if running inside Valgrind. Disable tcache by default if running inside Valgrind, in order to avoid making unallocated objects appear reachable to Valgrind.	2012-05-15 23:31:53 -07:00
Jason Evans	781fe75e0a	Auto-detect whether running inside Valgrind. Auto-detect whether running inside Valgrind, thus removing the need to manually specify MALLOC_CONF=valgrind:true.	2012-05-15 14:48:14 -07:00
Jason Evans	58ad1e4956	Return early in _malloc_{pre,post}fork() if uninitialized. Avoid mutex operations in _malloc_{pre,post}fork() unless jemalloc has been initialized. Reported by David Xu.	2012-05-11 17:40:16 -07:00
Mike Hommey	fd97b1dfc7	Add support for MSVC Tested with MSVC 8 32 and 64 bits.	2012-05-01 11:32:11 -07:00
Mike Hommey	da99e31105	Replace JEMALLOC_ATTR with various different macros when it makes sense Theses newly added macros will be used to implement the equivalent under MSVC. Also, move the definitions to headers, where they make more sense, and for some, are even more useful there (e.g. malloc).	2012-04-30 17:57:31 -07:00
Mike Hommey	a14bce85e8	Use Get/SetLastError on Win32 Using errno on win32 doesn't quite work, because the value set in a shared library can't be read from e.g. an executable calling the function setting errno. At the same time, since buferror always uses errno/GetLastError, don't pass it.	2012-04-30 16:50:55 -07:00
Jason Evans	3fb50b0407	Fix a PROF_ALLOC_PREP() error path. Fix a PROF_ALLOC_PREP() error path to initialize the return value to NULL.	2012-04-25 13:13:44 -07:00
Jason Evans	8694e2e7b9	Silence compiler warnings.	2012-04-23 13:05:32 -07:00
Mike Hommey	a19e87fbad	Add support for Mingw	2012-04-21 21:27:46 -07:00
Jason Evans	a8f8d7540d	Remove mmap_unaligned. Remove mmap_unaligned, which was used to heuristically decide whether to optimistically call mmap() in such a way that could reduce the total number of system calls. If I remember correctly, the intention of mmap_unaligned was to avoid always executing the slow path in the presence of ASLR. However, that reasoning seems to have been based on a flawed understanding of how ASLR actually works. Although ASLR apparently causes mmap() to ignore address requests, it does not cause total placement randomness, so there is a reasonable expectation that iterative mmap() calls will start returning chunk-aligned mappings once the first chunk has been properly aligned.	2012-04-21 19:17:21 -07:00
Jason Evans	606f1fdc3c	Put CONF_HANDLE_() keys in quotes. Put CONF_HANDLE_() keys in quotes, so that they aren't mangled when --with-private-namespace is used.	2012-04-20 21:39:14 -07:00
Jason Evans	86e58583bb	Make special FreeBSD function overrides visible. Make special FreeBSD libc/libthr function overrides for _malloc_prefork(), _malloc_postfork(), and _malloc_thread_cleanup() visible.	2012-04-18 19:01:00 -07:00
Jason Evans	0b25fe79aa	Update prof defaults to match common usage. Change the "opt.lg_prof_sample" default from 0 to 19 (1 B to 512 KiB). Change the "opt.prof_accum" default from true to false. Add the "opt.prof_final" mallctl, so that "opt.prof_prefix" need not be abused to disable final profile dumping.	2012-04-17 16:39:33 -07:00
Jason Evans	7ca0fdfb85	Disable munmap() if it causes VM map holes. Add a configure test to determine whether common mmap()/munmap() patterns cause VM map holes, and only use munmap() to discard unused chunks if the problem does not exist. Unify the chunk caching for mmap and dss. Fix options processing to limit lg_chunk to be large enough that redzones will always fit.	2012-04-12 20:20:58 -07:00
Jason Evans	d6abcbb14b	Always disable redzone by default. Always disable redzone by default, even when --enable-debug is specified. The memory overhead for redzones can be substantial, which makes this feature something that should only be opted into.	2012-04-12 17:09:54 -07:00
Mike Hommey	b8325f9cb0	Call base_boot before chunk_boot0 Chunk_boot0 calls rtree_new, which calls base_alloc, which locks the base_mtx mutex. That mutex is initialized in base_boot.	2012-04-12 11:42:20 -07:00
Jason Evans	5ff709c264	Normalize aligned allocation algorithms. Normalize arena_palloc(), chunk_alloc_mmap_slow(), and chunk_recycle_dss() to use the same algorithm for trimming over-allocation. Add the ALIGNMENT_ADDR2BASE(), ALIGNMENT_ADDR2OFFSET(), and ALIGNMENT_CEILING() macros, and use them where appropriate. Remove the run_size_p parameter from sa2u(). Fix a potential deadlock in chunk_recycle_dss() that was introduced by `eae269036c` (Add alignment support to chunk_alloc()).	2012-04-11 18:13:45 -07:00
Jason Evans	122449b073	Implement Valgrind support, redzones, and quarantine. Implement Valgrind support, as well as the redzone and quarantine features, which help Valgrind detect memory errors. Redzones are only implemented for small objects because the changes necessary to support redzones around large and huge objects are complicated by in-place reallocation, to the point that it isn't clear that the maintenance burden is worth the incremental improvement to Valgrind support. Merge arena_salloc() and arena_salloc_demote(). Refactor i[v]salloc() to expose the 'demote' option.	2012-04-11 11:46:18 -07:00
Jason Evans	a1ee7838e1	Rename labels. Rename labels from FOO to label_foo in order to avoid system macro definitions, in particular OUT and ERROR on mingw. Reported by Mike Hommey.	2012-04-10 15:07:44 -07:00
Jason Evans	b147611b52	Add utrace(2)-based tracing (--enable-utrace).	2012-04-05 13:36:17 -07:00
Jason Evans	02b231205e	Fix threaded initialization and enable it on Linux. Reported by Mike Hommey.	2012-04-05 11:06:23 -07:00
Jason Evans	01b3fe55ff	Add a0malloc(), a0calloc(), and a0free(). Add a0malloc(), a0calloc(), and a0free(), which are used by FreeBSD's libc to allocate/deallocate TLS in static binaries.	2012-04-03 19:25:48 -07:00
Jason Evans	633aaff967	Postpone mutex initialization on FreeBSD. Postpone mutex initialization on FreeBSD until after base allocation is safe.	2012-04-03 19:25:30 -07:00
Jason Evans	ae4c7b4b40	Clean up PAGE macros. s/PAGE_SHIFT/LG_PAGE/g and s/PAGE_SIZE/PAGE/g. Remove remnants of the dynamic-page-shift code. Rename the "arenas.pagesize" mallctl to "arenas.page". Remove the "arenas.chunksize" mallctl, which is redundant with "opt.lg_chunk".	2012-04-02 07:04:34 -07:00
Jason Evans	f004737267	Revert "Avoid NULL check in free() and malloc_usable_size()." This reverts commit `96d4120ac0`. ivsalloc() depends on chunks_rtree being initialized. This can be worked around via a NULL pointer check. However, thread_allocated_tsd_get() also depends on initialization having occurred, and there is no way to guard its call in free() that is cheaper than checking whether ptr is NULL.	2012-04-02 15:18:24 -07:00
Jason Evans	96d4120ac0	Avoid NULL check in free() and malloc_usable_size(). Generalize isalloc() to handle NULL pointers in such a way that the NULL checking overhead is only paid when introspecting huge allocations (or NULL). This allows free() and malloc_usable_size() to no longer check for NULL. Submitted by Igor Bukanov and Mike Hommey.	2012-04-02 14:50:03 -07:00
Mike Hommey	80b25932ca	Move last bit of zone initialization in zone.c, and lazy-initialize	2012-04-02 14:15:20 -07:00
Jason Evans	4eeb52f080	Remove vsnprintf() and strtoumax() validation. Remove code that validates malloc_vsnprintf() and malloc_strtoumax() against their namesakes. The validation code has adequately served its usefulness at this point, and it isn't worth dealing with the different formatting for %p with glibc versus other implementations for NULL pointers ("(nil)" vs. "0x0"). Reported by Mike Hommey.	2012-04-02 02:30:24 -07:00
Mike Hommey	3c2ba0dcbc	Avoid crashes when system libraries use the purgeable zone allocator	2012-03-30 11:03:20 -07:00

... 3 4 5 6 7 ...

487 Commits