server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Jason Evans	0f04bb1d6f	Rename the arenas.extend mallctl to arenas.create.	2017-01-06 18:58:45 -08:00
Jason Evans	3dc4e83ccb	Add MALLCTL_ARENAS_ALL. Add the MALLCTL_ARENAS_ALL cpp macro as a fixed index for use in accessing the arena.<i>.{purge,decay,dss} and stats.arenas.<i>.* mallctls, and deprecate access via the arenas.narenas index (to be removed in 6.0.0).	2017-01-06 18:58:45 -08:00
Jason Evans	a0dd3a4483	Implement per arena base allocators. Add/rename related mallctls: - Add stats.arenas.<i>.base . - Rename stats.arenas.<i>.metadata to stats.arenas.<i>.internal . - Add stats.arenas.<i>.resident . Modify the arenas.extend mallctl to take an optional (extent_hooks_t *) argument so that it is possible for all base allocations to be serviced by the specified extent hooks. This resolves #463.	2016-12-26 18:08:28 -08:00
Jason Evans	a6e86810d8	Refactor purging and splitting/merging. Split purging into lazy and forced variants. Use the forced variant for zeroing dss. Add support for NULL function pointers as an opt-out mechanism for the dalloc, commit, decommit, purge_lazy, purge_forced, split, and merge fields of extent_hooks_t. Add short-circuiting checks in large_ralloc_no_move_{shrink,expand}() so that no attempt is made if splitting/merging is not supported. This resolves #268.	2016-12-26 18:08:16 -08:00
Jason Evans	884fa22b8c	Rename arena_decay_t's ndirty to nunpurged.	2016-12-26 17:59:43 -08:00
Jason Evans	411697adcd	Use exponential series to size extents. If virtual memory is retained, allocate extents such that their sizes form an exponentially growing series. This limits the number of disjoint virtual memory ranges so that extent merging can be effective even if multiple arenas' extent allocation requests are highly interleaved. This resolves #462.	2016-12-26 17:59:42 -08:00
Jason Evans	c1baa0a9b7	Add huge page configuration and pages_[no]huge(). Add the --with-lg-hugepage configure option, but automatically configure LG_HUGEPAGE even if it isn't specified. Add the pages_[no]huge() functions, which toggle huge page state via madvise(..., MADV_[NO]HUGEPAGE) calls.	2016-12-26 17:59:34 -08:00
Jason Evans	bacb6afc6c	Simplify arena_slab_regind(). Rewrite arena_slab_regind() to provide sufficient constant data for the compiler to perform division strength reduction. This replaces more general manual strength reduction that was implemented before arena_bin_info was compile-time-constant. It would be possible to slightly improve on the compiler-generated division code by taking advantage of range limits that the compiler doesn't know about.	2016-12-23 10:34:34 -08:00
Jason Evans	69c26cdb01	Add some missing explicit casts.	2016-12-13 13:38:11 -08:00
Dave Watson	2319152d9f	jemalloc cpp new/delete bindings. Adds cpp bindings for jemalloc, along with necessary autoconf settings. This is mostly to add sized deallocation support, which can't be added from C directly. Sized deallocation yields a ~10% microbenchmark improvement. * Import ax_cxx_compile_stdcxx.m4 from the autoconf repo, seems like the easiest way to get c++14 detection. * Adds various other changes, like CXXFLAGS, to configure.ac. * Adds new rules to Makefile.in for src/jemalloc-cpp.cpp, and a basic unittest. * Both new and delete are overridden, to ensure jemalloc is used for both. * TODO future enhancement of avoiding extra PLT thunks for new and delete - sdallocx and malloc are publicly exported jemalloc symbols, using an alias would link them directly. Unfortunately, was having trouble getting it to play nice with jemalloc's namespace support. Testing: Tested gcc 4.8, gcc 5, gcc 5.2, clang 4.0. Only gcc >= 5 has sized deallocation support, verified that the rest build correctly. Tested mac osx and Centos. Tested --with-jemalloc-prefix and --without-export. This resolves #202.	2016-12-12 18:36:06 -08:00
Jason Evans	d4c5aceb7c	Add a_type parameter to qr_{meld,split}().	2016-12-12 18:16:51 -08:00
Jason Evans	acb7b1f53e	Add --disable-syscall. This resolves #517.	2016-12-03 16:50:58 -08:00
Jason Evans	32127949a3	Enable overriding JEMALLOC_{ALLOC,FREE}_JUNK. This resolves #509.	2016-11-22 10:58:58 -08:00
Jason Evans	c3b85f2585	Style fixes.	2016-11-22 10:58:23 -08:00
Jason Evans	5234be2133	Add pthread_atfork(3) feature test. Some versions of Android provide a pthreads library without providing pthread_atfork(), so in practice a separate feature test is necessary for the latter.	2016-11-17 15:14:57 -08:00
Jason Evans	fda60be799	Update a comment.	2016-11-17 11:50:52 -08:00
Jason Evans	a64123ce13	Refactor madvise(2) configuration. Add feature tests for the MADV_FREE and MADV_DONTNEED flags to madvise(2), so that MADV_FREE is detected and used for Linux kernel versions 4.5 and newer. Refactor pages_purge() so that on systems which support both flags, MADV_FREE is preferred over MADV_DONTNEED. This resolves #387.	2016-11-17 10:31:57 -08:00
Jason Evans	a38acf716e	Add extent serial numbers. Add extent serial numbers and use them where appropriate as a sort key that is higher priority than address, so that the allocation policy prefers older extents. This resolves #147.	2016-11-15 13:08:33 -08:00
Jason Evans	cda59f9970	Rename atomic_*_{uint32,uint64,u}() to atomic_*_{u32,u64,zu}(). This change conforms to naming conventions throughout the codebase.	2016-11-07 11:27:48 -08:00
Jason Evans	2e46b13ad5	Revert "Define 64-bits atomics unconditionally" This reverts commit `c2942e2c0e`. This resolves #495.	2016-11-07 10:53:35 -08:00
Jason Evans	04b463546e	Refactor prng to not use 64-bit atomics on 32-bit platforms. This resolves #495.	2016-11-07 10:52:44 -08:00
Jason Evans	ea9961acdb	Fix psz/pind edge cases. Add an "over-size" extent heap in which to store extents which exceed the maximum size class (plus cache-oblivious padding, if enabled). Remove psz2ind_clamp() and use psz2ind() instead so that trying to allocate the maximum size class can in principle succeed. In practice, this allows assertions to hold so that OOM errors can be successfully generated.	2016-11-03 22:33:34 -07:00
Jason Evans	8dd5ea87ca	Fix extent_alloc_cache[_locked]() to support decommitted allocation. Fix extent_alloc_cache[_locked]() to support decommitted allocation, and use this ability in arena_stash_dirty(), so that decommitted extents are not needlessly committed during purging. In practice this does not happen on any currently supported systems, because both extent merging and decommit must be implemented; all supported systems implement one xor the other.	2016-11-03 22:33:23 -07:00
Jason Evans	4f7d8c2dee	Update symbol mangling.	2016-11-03 15:00:02 -07:00
Dave Watson	25f7bbcf28	Fix long spinning in rtree_node_init. rtree_node_init spinlocks the node, allocates, and then sets the node. This is under heavy contention at the top of the tree if many threads start to allocate at the same time. Instead, take a per-rtree sleeping mutex to reduce spinning. Tested both pthreads and osx OSSpinLock, and both reduce spinning adequately. Previous benchmark time: ./ttest1 500 100 ~15s; new benchmark time: ./ttest1 500 100 .57s	2016-11-02 20:30:53 -07:00
Jason Evans	d82f2b3473	Do not use syscall(2) on OS X 10.12 (deprecated).	2016-11-02 19:18:33 -07:00
Jason Evans	795f6689de	Add os_unfair_lock support. OS X 10.12 deprecated OSSpinLock; os_unfair_lock is the recommended replacement.	2016-11-02 18:09:45 -07:00
Jason Evans	d9f7b2a430	Fix/refactor zone allocator integration code. Fix zone_force_unlock() to reinitialize, rather than unlocking mutexes, since OS X 10.12 cannot tolerate a child unlocking mutexes that were locked by its parent. Refactor; this was a side effect of experimenting with zone {de,re}registration during fork(2).	2016-11-02 18:06:40 -07:00
Jason Evans	90b60eeae4	Add an assertion in witness_owner().	2016-10-31 15:28:22 -07:00
Jason Evans	6a834d94bb	Refactor witness_unlock() to fix undefined test behavior. This resolves #396.	2016-10-31 11:49:12 -07:00
Jason Evans	6c80321aed	Use CLOCK_MONOTONIC_COARSE rather than CLOCK_MONOTONIC_RAW. The raw clock variant is slow (even relative to plain CLOCK_MONOTONIC), whereas the coarse clock variant is faster than CLOCK_MONOTONIC, but still has resolution (~1ms) that is adequate for our purposes. This resolves #479.	2016-10-29 22:58:18 -07:00
Dave Watson	8309388408	Support static linking of jemalloc with glibc. glibc defines its malloc implementation with several weak and strong symbols: strong_alias (__libc_calloc, __calloc) weak_alias (__libc_calloc, calloc) strong_alias (__libc_free, __cfree) weak_alias (__libc_free, cfree) strong_alias (__libc_free, __free) strong_alias (__libc_free, free) strong_alias (__libc_malloc, __malloc) strong_alias (__libc_malloc, malloc) The issue is not with the weak symbols, but that other parts of glibc depend on __libc_malloc explicitly. Defining them in terms of jemalloc APIs allows the linker to drop glibc's malloc.o completely from the link, and static linking no longer results in symbol collisions. Another wrinkle: jemalloc during initialization calls sysconf to get the number of CPUs. GLIBC allocates for the first time before setting up isspace (and other related) tables, which are used by sysconf. Instead, use the pthread API to get the number of CPUs with GLIBC, which seems to work. This resolves #442.	2016-10-28 15:08:19 -07:00
Jason Evans	48d4adfbeb	Avoid negation of unsigned numbers. Rather than relying on two's complement negation for alignment mask generation, use bitwise not and addition. This dodges warnings from MSVC, and should be strength-reduced by compiler optimization anyway.	2016-10-27 21:26:33 -07:00
Jason Evans	b54d160dc4	Do not (recursively) allocate within tsd_fetch(). Refactor tsd so that tsdn_fetch() does not trigger allocation, since allocation could cause infinite recursion. This resolves #458.	2016-10-20 23:59:12 -07:00
Jason Evans	577d4572b0	Make dss operations lockless. Rather than protecting dss operations with a mutex, use atomic operations. This has negligible impact on synchronization overhead during typical dss allocation, but is a substantial improvement for extent_in_dss() and the newly added extent_dss_mergeable(), which can be called multiple times during extent deallocations. This change also has the advantage of avoiding tsd in deallocation paths associated with purging, which resolves potential deadlocks during thread exit due to attempted tsd resurrection. This resolves #425.	2016-10-13 15:37:00 -07:00
Jason Evans	e5effef428	Add/use adaptive spinning. Add spin_t and spin_{init,adaptive}(), which provide a simple abstraction for adaptive spinning. Adaptively spin during busy waits in bootstrapping and rtree node initialization.	2016-10-13 14:55:39 -07:00
Jason Evans	9acd5cf178	Remove all vestiges of chunks. Remove mallctls: - opt.lg_chunk - stats.cactive This resolves #464.	2016-10-12 11:55:43 -07:00
Jason Evans	63b5657aa5	Remove ratio-based purging. Make decay-based purging the default (and only) mode. Remove associated mallctls: - opt.purge - opt.lg_dirty_mult - arena.<i>.lg_dirty_mult - arenas.lg_dirty_mult - stats.arenas.<i>.lg_dirty_mult This resolves #385.	2016-10-12 10:40:27 -07:00
Jason Evans	b4b4a77848	Fix and simplify decay-based purging. Simplify decay-based purging attempts to only be triggered when the epoch is advanced, rather than every time purgeable memory increases. In a correctly functioning system (not previously the case; see below), this only causes a behavior difference if during subsequent purge attempts the least recently used (LRU) purgeable memory extent is initially too large to be purged, but that memory is reused between attempts and one or more of the next LRU purgeable memory extents are small enough to be purged. In practice this is an arbitrary behavior change that is within the set of acceptable behaviors. As for the purging fix, assure that arena->decay.ndirty is recorded after the epoch advance and associated purging occurs. Prior to this fix, it was possible for purging during epoch advance to cause a substantially underrepresentative (arena->ndirty - arena->decay.ndirty), i.e. the number of dirty pages attributed to the current epoch was too low, and a series of unintended purges could result. This fix is also relevant in the context of the simplification described above, but the bug's impact would be limited to over-purging at epoch advances.	2016-10-11 15:30:01 -07:00
Jason Evans	5f11fb7d43	Do not advance decay epoch when time goes backwards. Instead, move the epoch backward in time. Additionally, add nstime_monotonic() and use it in debug builds to assert that time only goes backward if nstime_update() is using a non-monotonic time source.	2016-10-10 22:15:10 -07:00
Jason Evans	ee0c74b77a	Refactor arena->decay_* into arena->decay.* (arena_decay_t).	2016-10-10 20:32:19 -07:00
Jason Evans	e0164bc63c	Refine nstime_update(). Add missing #include <time.h>. The critical time facilities appear to have been transitively included via unistd.h and sys/time.h, but in principle this omission was capable of having caused clock_gettime(CLOCK_MONOTONIC, ...) to have been overlooked in favor of gettimeofday(), which in turn could cause spurious non-monotonic time updates. Refactor nstime_get() out of nstime_update() and add configure tests for all variants. Add CLOCK_MONOTONIC_RAW support (Linux-specific) and mach_absolute_time() support (OS X-specific). Do not fall back to clock_gettime(CLOCK_REALTIME, ...). This was a fragile Linux-specific workaround, which we're unlikely to use at all now that clock_gettime(CLOCK_MONOTONIC_RAW, ...) is supported, and if we have no choice besides non-monotonic clocks, gettimeofday() is only incrementally worse.	2016-10-10 10:33:59 -07:00
Jason Evans	871a9498e1	Fix size class overflow bugs. Avoid calling s2u() on raw extent sizes in extent_recycle(). Clamp psz2ind() (implemented as psz2ind_clamp()) when inserting/removing into/from size-segregated extent heaps.	2016-10-03 14:18:55 -07:00
Eric Le Bihan	df0d273a07	Fix LG_QUANTUM definition for sparc64. GCC 4.9.3 cross-compiled for sparc64 defines __sparc_v9__, not __sparc64__ nor __sparcv9. This prevents LG_QUANTUM from being defined properly. Adding this new value to the check solves the issue.	2016-09-26 15:13:07 -07:00
Jason Evans	61f467e16a	Avoid self assignment in tsd_set().	2016-09-23 12:21:34 -07:00
Jason Evans	0222fb41d1	Add various mutex ownership assertions.	2016-09-23 12:21:34 -07:00
Jason Evans	73868b60f2	Fix extent_{before,last,past}() to return page-aligned results.	2016-09-23 12:21:34 -07:00
Jason Evans	f6d01ff4b7	Protect extents_dirty access with extents_mtx. This fixes race conditions during purging.	2016-09-22 11:57:28 -07:00
Elliot Ronaghan	1167e9eff3	Check for __builtin_unreachable at configure time. Add a configure check for __builtin_unreachable instead of basing its availability on the __GNUC__ version. On OS X, using gcc (a real gcc, not the bundled version that's just a gcc front-end) leads to a linker assertion: https://github.com/jemalloc/jemalloc/issues/266 It turns out that this is caused by a gcc bug resulting from the use of __builtin_unreachable(): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57438 To work around this bug, check that __builtin_unreachable() actually works at configure time, and if it doesn't, use abort() instead. The check is based on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57438#c21. With this, `make check` passes with a homebrew installed gcc-5 and gcc-6.	2016-07-07 13:28:44 -07:00
Mike Hommey	c2942e2c0e	Define 64-bits atomics unconditionally. They are used on all platforms in prng.h.	2016-06-09 23:17:39 +09:00
Mike Hommey	0dad5b7719	Fix extent_*_get to build with MSVC	2016-06-09 22:00:18 +09:00
Elliot Ronaghan	8a1a794b0c	Don't use compact red-black trees with the pgi compiler. Some bug (either in the red-black tree code, or in the pgi compiler) seems to cause red-black trees to become unbalanced. This issue seems to go away if we don't use compact red-black trees. Since red-black trees don't seem to be used much anymore, I opted for what seems to be an easy fix here instead of digging in and trying to find the root cause of the bug. Some context in case it's helpful: I experienced a ton of segfaults while using pgi as Chapel's target compiler with jemalloc 4.0.4. The little bit of debugging I did pointed me somewhere deep in red-black tree manipulation, but I didn't get a chance to investigate further. It looks like 4.2.0 replaced most uses of red-black trees with pairing-heaps, which seems to avoid whatever bug I was hitting. However, `make check_unit` was still failing on the rb test, so I figured the core issue was just being masked. Here's the `make check_unit` failure: ```sh === test/unit/rb === test_rb_empty: pass tree_recurse:test/unit/rb.c:90: Failed assertion: (((_Bool) (((uintptr_t) (left_node)->link.rbn_right_red) & ((size_t)1)))) == (false) --> true != false: Node should be black test_rb_random:test/unit/rb.c:274: Failed assertion: (imbalances) == (0) --> 1 != 0: Tree is unbalanced tree_recurse:test/unit/rb.c:90: Failed assertion: (((_Bool) (((uintptr_t) (left_node)->link.rbn_right_red) & ((size_t)1)))) == (false) --> true != false: Node should be black test_rb_random:test/unit/rb.c:274: Failed assertion: (imbalances) == (0) --> 1 != 0: Tree is unbalanced node_remove:test/unit/rb.c:190: Failed assertion: (imbalances) == (0) --> 2 != 0: Tree is unbalanced <jemalloc>: test/unit/rb.c:43: Failed assertion: "pathp[-1].cmp < 0" test/test.sh: line 22: 12926 Aborted Test harness error ``` While starting to debug I saw the RB_COMPACT option and decided to check if turning that off resolved the bug. It seems to have fixed it (`make check_unit` passes and the segfaults under Chapel are gone) so it seems like an okay work-around. I'd imagine this has performance implications for red-black trees under pgi, but if they're not going to be used much anymore it's probably not a big deal.	2016-06-08 14:48:55 -07:00
Jason Evans	dd752c1ffd	Fix potential VM map fragmentation regression. Revert `245ae6036c` (Support --with-lg-page values larger than actual page size.), because it could cause VM map fragmentation if the kernel grows mmap()ed memory downward. This resolves #391.	2016-06-07 14:15:49 -07:00
Jason Evans	4e910fc958	Fix extent_alloc_dss() regressions. Page-align the gap, if any, and add/use extent_dalloc_gap(), which registers the gap extent before deallocation.	2016-06-05 21:00:02 -07:00
Jason Evans	04942c3d90	Remove a stray memset(), and fix a junk filling test regression.	2016-06-05 21:00:02 -07:00
Jason Evans	f8f0542194	Modify extent hook functions to take an (extent_t *) argument. This facilitates the application accessing its own extent allocator metadata during hook invocations. This resolves #259.	2016-06-05 21:00:02 -07:00
Jason Evans	6f29a83924	Add rtree lookup path caching. rtree-based extent lookups remain more expensive than chunk-based run lookups, but with this optimization the fast path slowdown is ~3 CPU cycles per metadata lookup (on Intel Core i7-4980HQ), versus ~11 cycles prior. The path caching speedup tends to degrade gracefully unless allocated memory is spread far apart (as is the case when using a mixture of sbrk() and mmap()).	2016-06-05 20:59:57 -07:00
Jason Evans	7be2ebc23f	Make tsd cleanup functions optional, remove noop cleanup functions.	2016-06-05 20:42:24 -07:00
Jason Evans	b14fdaaca0	Add a missing prof_alloc_rollback() call. In the case where prof_alloc_prep() is called with an over-estimate of allocation size, and sampling doesn't end up being triggered, the tctx must be discarded.	2016-06-05 20:42:24 -07:00
Jason Evans	c8c3cbdf47	Miscellaneous s/chunk/extent/ updates.	2016-06-05 20:42:24 -07:00
Jason Evans	a43db1c608	Relax NBINS constraint (max 255 --> max 256).	2016-06-05 20:42:24 -07:00
Jason Evans	751f2c332d	Remove obsolete stats.arenas.<i>.metadata.mapped mallctl. Rename stats.arenas.<i>.metadata.allocated mallctl to stats.arenas.<i>.metadata .	2016-06-05 20:42:24 -07:00
Jason Evans	03eea4fb8b	Better document --enable-ivsalloc.	2016-06-05 20:42:24 -07:00
Jason Evans	22588dda6e	Rename most remaining chunk APIs to extent.	2016-06-05 20:42:23 -07:00
Jason Evans	0c4932eb1e	s/chunk_lookup/extent_lookup/g, s/chunks_rtree/extents_rtree/g	2016-06-05 20:42:23 -07:00
Jason Evans	4a55daa363	s/CHUNK_HOOKS_INITIALIZER/EXTENT_HOOKS_INITIALIZER/g	2016-06-05 20:42:23 -07:00
Jason Evans	c9a76481d8	Rename chunks_{cached,retained,mtx} to extents_{cached,retained,mtx}.	2016-06-05 20:42:23 -07:00
Jason Evans	9c305c9e5c	s/chunk_hook/extent_hook/g	2016-06-05 20:42:23 -07:00
Jason Evans	7d63fed0fd	Rename huge to large.	2016-06-05 20:42:23 -07:00
Jason Evans	714d1640f3	Update private symbols.	2016-06-05 20:42:23 -07:00
Jason Evans	498856f44a	Move slabs out of chunks.	2016-06-05 20:42:23 -07:00
Jason Evans	d28e5a6696	Improve interval-based profile dump triggering. When an allocation is large enough to trigger multiple dumps, use modular math rather than subtraction to reset the interval counter. Prior to this change, it was possible for a single allocation to cause many subsequent allocations to all trigger profile dumps. When updating usable size for a sampled object, try to cancel out the difference between LARGE_MINCLASS and usable size from the interval counter.	2016-06-05 20:42:23 -07:00
Jason Evans	ed2c2427a7	Use huge size class infrastructure for large size classes.	2016-06-05 20:42:18 -07:00
Jason Evans	b46261d58b	Implement cache-oblivious support for huge size classes.	2016-06-03 12:27:41 -07:00
Jason Evans	4731cd47f7	Allow chunks to not be naturally aligned. Precisely size extents for huge size classes that aren't multiples of chunksize.	2016-06-03 12:27:41 -07:00
Jason Evans	741967e79d	Remove CHUNK_ADDR2BASE() and CHUNK_ADDR2OFFSET().	2016-06-03 12:27:41 -07:00
Jason Evans	23c52c895f	Make extent_prof_tctx_[gs]et() atomic.	2016-06-03 12:27:41 -07:00
Jason Evans	760bf11b23	Add extent_dirty_[gs]et().	2016-06-03 12:27:41 -07:00
Jason Evans	47613afc34	Convert rtree from per chunk to per page. Refactor [de]registration to maintain interior rtree entries for slabs.	2016-06-03 12:27:41 -07:00
Jason Evans	5c6be2bdd3	Refactor chunk_purge_wrapper() to take extent argument.	2016-06-03 12:27:41 -07:00
Jason Evans	0eb6f08959	Refactor chunk_[de]commit_wrapper() to take extent arguments.	2016-06-03 12:27:41 -07:00
Jason Evans	6c94470822	Refactor chunk_dalloc_{cache,wrapper}() to take extent arguments. Rename arena_extent_[d]alloc() to extent_[d]alloc(). Move all chunk [de]registration responsibility into chunk.c.	2016-06-03 12:27:41 -07:00
Jason Evans	de0305a7f3	Add/use chunk_split_wrapper(). Remove redundant ptr/oldsize args from huge_*(). Refactor huge/chunk/arena code boundaries.	2016-06-03 12:27:41 -07:00
Jason Evans	1ad060584f	Add/use chunk_merge_wrapper().	2016-06-03 12:27:41 -07:00
Jason Evans	384e88f451	Add/use chunk_commit_wrapper().	2016-06-03 12:27:41 -07:00
Jason Evans	56e0031d7d	Add/use chunk_decommit_wrapper().	2016-06-03 12:27:41 -07:00
Jason Evans	4d2d9cec5a	Merge chunk_alloc_base() into its only caller.	2016-06-03 12:27:41 -07:00
Jason Evans	fc0372a15e	Replace extent_tree_szad_* with extent_heap_*.	2016-06-03 12:27:41 -07:00
Jason Evans	ffa45a5331	Use rtree rather than [sz]ad trees for chunk split/coalesce operations.	2016-06-03 12:27:41 -07:00
Jason Evans	93e79c5c3f	Remove redundant chunk argument from chunk_{,de,re}register().	2016-06-03 12:27:41 -07:00
Jason Evans	9aea58d9a2	Add extent_past_get().	2016-06-03 12:27:41 -07:00
Jason Evans	d78846c989	Replace extent_achunk_[gs]et() with extent_slab_[gs]et().	2016-06-03 12:27:41 -07:00
Jason Evans	fae8344098	Add extent_active_[gs]et(). Always initialize extents' runs_dirty and chunks_cache linkage.	2016-06-03 12:27:41 -07:00
Jason Evans	6f71844659	Move PAGE definitions to pages.h.	2016-06-03 12:27:41 -07:00
Jason Evans	e75e9be130	Add rtree element witnesses.	2016-06-03 12:27:41 -07:00
Jason Evans	8c9be3e837	Refactor rtree to always use base_alloc() for node allocation.	2016-06-03 12:27:41 -07:00
Jason Evans	db72272bef	Use rtree-based chunk lookups rather than pointer bit twiddling. Look up chunk metadata via the radix tree, rather than using CHUNK_ADDR2BASE(). Propagate pointer's containing extent. Minimize extent lookups by doing a single lookup (e.g. in free()) and propagating the pointer's extent into nearly all the functions that may need it.	2016-06-03 12:27:41 -07:00
Jason Evans	2d2b4e98c9	Add element acquire/release capabilities to rtree. This makes it possible to acquire short-term "ownership" of rtree elements so that it is possible to read an extent pointer and read the extent's contents with a guarantee that the element will not be modified until the ownership is released. This is intended as a mechanism for resolving rtree read/write races rather than as a way to lock extents.	2016-06-03 12:27:33 -07:00
Jason Evans	a7a6f5bc96	Rename extent_node_t to extent_t.	2016-05-16 12:21:28 -07:00
Jason Evans	3aea827f5e	Simplify run quantization.	2016-05-16 12:21:27 -07:00
Jason Evans	7bb00ae9d6	Refactor runs_avail. Use pszind_t size classes rather than szind_t size classes, and always reserve space for NPSIZES elements. This removes unused heaps that are not multiples of the page size, and adds (currently) unused heaps for all huge size classes, with the immediate benefit that the size of arena_t allocations is constant (no longer dependent on chunk size).	2016-05-16 12:21:21 -07:00
Jason Evans	226c446979	Implement psz2ind(), pind2sz(), and psz2u(). These compute size classes and indices similarly to size2index(), index2size() and s2u(), respectively, but using the subset of size classes that are multiples of the page size. Note that pszind_t and szind_t are not interchangeable.	2016-05-13 10:31:54 -07:00
Jason Evans	627372b459	Initialize arena_bin_info at compile time rather than at boot time. This resolves #370.	2016-05-13 10:31:30 -07:00
Jason Evans	b683734b43	Implement BITMAP_INFO_INITIALIZER(nbits). This allows static initialization of bitmap_info_t structures.	2016-05-13 10:27:48 -07:00
Jason Evans	17c021c177	Remove redzone support. This resolves #369.	2016-05-13 10:27:33 -07:00
Jason Evans	ba5c709517	Remove quarantine support.	2016-05-13 10:25:05 -07:00
Jason Evans	9a8add1510	Remove Valgrind support.	2016-05-13 09:56:18 -07:00
Jason Evans	a397045323	Use TSDN_NULL rather than NULL as appropriate.	2016-05-12 21:07:08 -07:00
Jason Evans	73d3d58dc2	Optimize witness fast path. Short-circuit commonly called witness functions so that they only execute in debug builds, and remove equivalent guards from mutex functions. This avoids pointless code execution in witness_assert_lockless(), which is typically called twice per allocation/deallocation function invocation. Inline commonly called witness functions so that optimized builds can completely remove calls as dead code.	2016-05-11 15:38:06 -07:00
Jason Evans	c1e00ef2a6	Resolve bootstrapping issues when embedded in FreeBSD libc. `b2c0d6322d` (Add witness, a simple online locking validator.) caused a broad propagation of tsd throughout the internal API, but tsd_fetch() was designed to fail prior to tsd bootstrapping. Fix this by splitting tsd_t into non-nullable tsd_t and nullable tsdn_t, and modifying all internal APIs that do not critically rely on tsd to take nullable pointers. Furthermore, add the tsd_booted_get() function so that tsdn_fetch() can probe whether tsd bootstrapping is complete and return NULL if not. All dangerous conversions of nullable pointers are tsdn_tsd() calls that assert-fail on invalid conversion.	2016-05-10 22:51:33 -07:00
Jason Evans	919e4a0ea9	Add LG_QUANTUM definition for the RISC-V architecture.	2016-05-06 17:15:32 -07:00
Jason Evans	1326010cf4	Update private_symbols.txt.	2016-05-06 14:50:58 -07:00
Jason Evans	3ef51d7f73	Optimize the fast paths of calloc() and [m,d,sd]allocx(). This is a broader application of optimizations to malloc() and free() in `f4a0f32d34` (Fast-path improvement: reduce # of branches and unnecessary operations.). This resolves #321.	2016-05-06 14:37:39 -07:00
Jason Evans	c2f970c32b	Modify pages_map() to support mapping uncommitted virtual memory. If the OS overcommits: - Commit all mappings in pages_map() regardless of whether the caller requested committed memory. - Linux-specific: Specify MAP_NORESERVE to avoid unfortunate interactions with heuristic overcommit mode during fork(2). This resolves #193.	2016-05-05 18:56:17 -07:00
Jason Evans	04c3c0f9a0	Add the stats.retained and stats.arenas.<i>.retained statistics. This resolves #367.	2016-05-03 22:11:35 -07:00
Jason Evans	90827a3f3e	Fix huge_palloc() regression. Split arena_choose() into arena_[i]choose() and use arena_ichoose() for arena lookup during internal allocation. This fixes huge_palloc() so that it always succeeds during extent node allocation. This regression was introduced by `66cd953514` (Do not allocate metadata via non-auto arenas, nor tcaches.).	2016-05-03 17:19:15 -07:00
Jason Evans	108c4a11e9	Fix witness/fork() interactions. Fix witness to clear its list of owned mutexes in the child if platform-specific malloc_mutex code re-initializes mutexes rather than unlocking them.	2016-04-26 10:47:22 -07:00
Jason Evans	174c0c3a9c	Fix fork()-related lock rank ordering reversals.	2016-04-25 23:16:20 -07:00
Jason Evans	71d94828a2	Fix degenerate mb_write() compilation error. This resolves #375.	2016-04-22 21:27:17 -07:00
Jason Evans	19ff2cefba	Implement the arena.<i>.reset mallctl. This makes it possible to discard all of an arena's allocations in a single operation. This resolves #146.	2016-04-22 15:20:06 -07:00
Jason Evans	66cd953514	Do not allocate metadata via non-auto arenas, nor tcaches. This assures that all internally allocated metadata come from the first opt_narenas arenas, i.e. the automatically multiplexed arenas.	2016-04-22 15:19:59 -07:00
Jason Evans	b6e07d2389	Fix malloc_mutex_assert_[not_]owner() for --enable-lazy-lock case.	2016-04-18 15:42:09 -07:00
Jason Evans	ab0cfe01fa	Update private_symbols.txt. Change test-related mangling to simplify symbol filtering. The following commands can be used to detect missing/obsolete symbol mangling, with the caveat that the full set of symbols is based on the union of symbols generated by all configurations, some of which are platform-specific: ./autogen.sh --enable-debug --enable-prof --enable-lazy-lock; make all tests; nm -a lib/libjemalloc.a src/*.jet.o | grep " [TDBCR] " | awk '{print $3}' | sed -e 's/^\(je_\|jet_\(n_\)\?\)\([a-zA-Z0-9_]*\)/\3/g' | LC_COLLATE=C sort -u | grep -v -e '^\(malloc\|calloc\|posix_memalign\|aligned_alloc\|realloc\|free\)$' -e '^\(m\|r\|x\|s\|d\|sd\|n\)allocx$' -e '^mallctl\(\|nametomib\|bymib\)$' -e '^malloc_\(stats_print\|usable_size\|message\)$' -e '^\(memalign\|valloc\)$' -e '^__\(malloc\|memalign\|realloc\|free\)_hook$' -e '^pthread_create$' > /tmp/private_symbols.txt	2016-04-18 15:23:35 -07:00
Rajat Goel	a0c632c9d5	Update private_symbols.txt. Add 4 missing symbols.	2016-04-18 11:54:09 -07:00
Jason Evans	1423ee9016	Fix style nits.	2016-04-17 13:44:59 -07:00
Jason Evans	1b5830178f	Fix malloc_mutex_[un]lock() to conditionally check witness. Also remove tautological cassert(config_debug) calls.	2016-04-17 13:44:59 -07:00
Jason Evans	2288424325	s/MALLOC_MUTEX_RANK_OMIT/WITNESS_RANK_OMIT/ This fixes a compilation error caused by `b2c0d6322d` (Add witness, a simple online locking validator.). This resolves #375.	2016-04-14 12:18:55 -07:00
Jason Evans	a15841cc7d	Fix a compilation error. Fix a compilation error that occurs if Valgrind is not enabled. This regression was caused by `b2c0d6322d` (Add witness, a simple online locking validator.).	2016-04-14 02:12:33 -07:00
Jason Evans	b2c0d6322d	Add witness, a simple online locking validator. This resolves #358.	2016-04-14 02:09:28 -07:00
Jason Evans	8413463f3a	Fix a style nit.	2016-04-12 23:18:25 -07:00
Jason Evans	667eca2ac2	Simplify RTREE_HEIGHT_MAX definition. Use 1U rather than ZU(1) in macro definitions, so that the preprocessor can evaluate the resulting expressions.	2016-04-11 02:35:00 -07:00
Jason Evans	245ae6036c	Support --with-lg-page values larger than actual page size. During over-allocation in preparation for creating aligned mappings, allocate one more page than necessary if PAGE is the actual page size, so that trimming still succeeds even if the system returns a mapping that has less than PAGE alignment. This allows compiling with e.g. 64 KiB "pages" on systems that actually use 4 KiB pages. Note that for e.g. --with-lg-page=21, it is also necessary to increase the chunk size (e.g. --with-malloc-conf=lg_chunk:22) so that there are at least two "pages" per chunk. In practice this isn't a particularly compelling configuration because so much (unusable) virtual memory is dedicated to chunk headers.	2016-04-11 02:35:00 -07:00
Jason Evans	96aa67aca8	Clean up char vs. uint8_t in junk filling code. Consistently use uint8_t rather than char for junk filling code.	2016-04-11 02:26:35 -07:00
Jason Evans	c6a2c39404	Refactor/fix ph. Refactor ph to support configurable comparison functions. Use a cpp macro code generation form equivalent to the rb macros so that pairing heaps can be used for both run heaps and chunk heaps. Remove per node parent pointers, and instead use leftmost siblings' prev pointers to track parents. Fix multi-pass sibling merging to iterate over intermediate results using a FIFO, rather than a LIFO. Use this fixed sibling merging implementation for both merge phases of the auxiliary twopass algorithm (first merging the aux list, then replacing the root with its merged children). This fixes both degenerate merge behavior and the potential for deep recursion. This regression was introduced by `6bafa6678f` (Pairing heap). This resolves #371.	2016-04-11 02:15:42 -07:00
Jason Evans	2ee2f1ec57	Reduce differences between alternative bitmap implementations.	2016-04-06 10:38:47 -07:00
Jason Evans	4a8abbb400	Fix bitmap_sfu() regression. Fix bitmap_sfu() to shift by LG_BITMAP_GROUP_NBITS rather than hard-coded 6 when using linear (non-USE_TREE) bitmap search. In practice this affects only 64-bit systems for which sizeof(long) is not 8 (i.e. Windows), since USE_TREE is defined for 32-bit systems. This regression was caused by `b8823ab026` (Use linear scan for small bitmaps). This resolves #368.	2016-04-06 10:32:06 -07:00
Chris Peterson	a82070ef5f	Add JEMALLOC_ALLOC_JUNK and JEMALLOC_FREE_JUNK macros Replace hardcoded 0xa5 and 0x5a junk values with JEMALLOC_ALLOC_JUNK and JEMALLOC_FREE_JUNK macros, respectively.	2016-03-31 11:23:29 -07:00
Jason Evans	ce7c0f999b	Fix potential chunk leaks. Move chunk_dalloc_arena()'s implementation into chunk_dalloc_wrapper(), so that if the dalloc hook fails, proper decommit/purge/retain cascading occurs. This fixes three potential chunk leaks on OOM paths, one during dss-based chunk allocation, one during chunk header commit (currently relevant only on Windows), and one during rtree write (e.g. if rtree node allocation fails). Merge chunk_purge_arena() into chunk_purge_default() (refactor, no change to functionality).	2016-03-30 18:36:04 -07:00
Chris Peterson	f3060284c5	Remove unused arenas_extend() function declaration. The arenas_extend() function was renamed to arenas_init() in commit `8bb3198f72`, but its function declaration was not removed from jemalloc_internal.h.in.	2016-03-26 01:03:24 -07:00
Jason Evans	af3184cac0	Use abort() for fallback implementations of unreachable().	2016-03-24 01:42:08 -07:00
Jason Evans	61a6dfcd5f	Constify various internal arena APIs.	2016-03-23 16:15:42 -07:00
Jason Evans	6a885198c2	Always inline performance-critical rtree operations.	2016-03-23 16:15:42 -07:00
Jason Evans	6c460ad91b	Optimize rtree_get(). Specialize fast path to avoid code that cannot execute for dependent loads. Manually unroll.	2016-03-22 17:54:35 -07:00
Jason Evans	22af74e106	Refactor out signed/unsigned comparisons.	2016-03-15 09:40:02 -07:00
Rajeev Misra	ca18f2834e	typecast address to pointer to byte to avoid unaligned memory access error	2016-03-10 22:49:05 -08:00
Jason Evans	613cdc80f6	Convert arena_bin_t's runs from a tree to a heap.	2016-03-08 13:48:27 -08:00
Dave Watson	4a0dbb5ac8	Use pairing heap for arena->runs_avail Use pairing heap instead of red black tree in arena runs_avail. The extra links are unioned with the bitmap_t, so this change doesn't use any extra memory. Canaries show this change to be a 1% cpu win, and 2% latency win. In particular, large free()s, and small bin frees are now O(1) (barring coalescing). I also tested changing bin->runs to be a pairing heap, but saw a much smaller win, and it would mean increasing the size of arena_run_s by two pointers, so I left that as an rb-tree for now.	2016-03-08 13:48:27 -08:00
Jason Evans	f8d80d62a8	Refactor ph_merge_ordered() out of ph_merge().	2016-03-08 13:48:27 -08:00
Dave Watson	6bafa6678f	Pairing heap Initial implementation of a twopass pairing heap with aux list. Research papers linked in comments. Where search/nsearch/last aren't needed, this gives much faster first(), delete(), and insert(). Insert is O(1), and first/delete don't have to walk the whole tree. Also tested rb_old with parent pointers - it was better than the current rb.h for memory loads, but still much worse than a pairing heap. An array-based heap would be much faster if everything fits in memory, but on a cold cache it has many more memory loads for most operations.	2016-03-08 13:46:19 -08:00
Jason Evans	022f6891fa	Avoid a potential innocuous compiler warning. Add a cast to avoid comparing a ssize_t value to a uint64_t value that is always larger than a 32-bit ssize_t. This silences an innocuous compiler warning from e.g. gcc 4.2.1 about the comparison always having the same result.	2016-03-02 22:45:37 -08:00
Jason Evans	3c07f803aa	Fix stats.arenas.<i>.[...] for --disable-stats case. Add missing stats.arenas.<i>.{dss,lg_dirty_mult,decay_time} initialization. Fix stats.arenas.<i>.{pactive,pdirty} to read under the protection of the arena mutex.	2016-02-27 20:40:13 -08:00
Jason Evans	69acd25a64	Add/alphabetize private symbols.	2016-02-27 15:35:52 -08:00
Jason Evans	40ee9aa957	Fix stats.cactive accounting regression. Fix stats.cactive accounting to always increase/decrease by multiples of the chunk size, even for huge size classes that are not multiples of the chunk size, e.g. {2.5, 3, 3.5, 5, 7} MiB with 2 MiB chunk size. This regression was introduced by `155bfa7da1` (Normalize size classes.) and first released in 4.0.0. This resolves #336.	2016-02-27 15:35:52 -08:00
Jason Evans	20fad3430c	Refactor some bitmap cpp logic.	2016-02-26 14:43:39 -08:00
Dave Watson	b8823ab026	Use linear scan for small bitmaps For small bitmaps, a linear scan of the bitmap is slightly faster than a tree search - bitmap_t is more compact, and there are fewer writes since we don't have to propagate state transitions up the tree. On x86_64 with the current settings, I'm seeing ~.5%-1% CPU improvement in production canaries with this change. The old tree code is left since 32bit sizes are much larger (and ffsl smaller), and maybe the run sizes will change in the future. This resolves #339.	2016-02-26 14:21:10 -08:00
Jason Evans	01ecdf32d6	Miscellaneous bitmap refactoring.	2016-02-26 14:21:10 -08:00
Jason Evans	42ce80e15a	Silence miscellaneous 64-to-32-bit data loss warnings. This resolves #341.	2016-02-25 20:51:00 -08:00
Jason Evans	0c516a00c4	Make *allocx() size class overflow behavior defined. Limit supported size and alignment to HUGE_MAXCLASS, which in turn is now limited to be less than PTRDIFF_MAX. This resolves #278 and #295.	2016-02-25 15:29:49 -08:00
Jason Evans	767d85061a	Refactor arenas array (fixes deadlock). Refactor the arenas array, which contains pointers to all extant arenas, such that it starts out as a sparse array of maximum size, and use double-checked atomics-based reads as the basis for fast and simple arena_get(). Additionally, reduce arenas_lock's role such that it only protects against arena initialization races. These changes remove the possibility for arena lookups to trigger locking, which resolves at least one known (fork-related) deadlock. This resolves #315.	2016-02-24 23:58:10 -08:00
Jason Evans	c7a9a6c86b	Attempt mmap-based in-place huge reallocation. Attempt mmap-based in-place huge reallocation by plumbing new_addr into chunk_alloc_mmap(). This can dramatically speed up incremental huge reallocation. This resolves #335.	2016-02-24 17:23:18 -08:00
Jason Evans	aa63d5d377	Fix ffs_zu() compilation error on MinGW. This regression was caused by `9f4ee6034c` (Refactor jemalloc_ffs() into ffs_().).	2016-02-24 14:01:47 -08:00
Jason Evans	9e1810ca9d	Silence miscellaneous 64-to-32-bit data loss warnings.	2016-02-24 13:03:48 -08:00
Jason Evans	1c42a04cc6	Change lg_floor() return type from size_t to unsigned.	2016-02-24 13:03:48 -08:00
Jason Evans	8f683b94a7	Make opt_narenas unsigned rather than size_t.	2016-02-24 13:03:48 -08:00
Jason Evans	603b3bd413	Make nhbins unsigned rather than size_t.	2016-02-24 13:03:48 -08:00
Jason Evans	9f4ee6034c	Refactor jemalloc_ffs() into ffs_(). Use appropriate versions to resolve 64-to-32-bit data loss warnings.	2016-02-24 13:03:48 -08:00
Dmitri Smirnov	b41a07c31a	Fix Windows build issues This resolves #333.	2016-02-23 18:55:45 -08:00
Jason Evans	ae45142adc	Collapse arena_avail_tree_* into arena_run_tree_*. These tree types converged to become identical, yet they still had independently generated red-black tree implementations.	2016-02-23 18:27:24 -08:00
Dave Watson	3417a304cc	Separate arena_avail trees Separate run trees by index, replacing the previous quantize logic. Quantization by index is now performed only on insertion / removal from the tree, and not on node comparison, saving some cpu. This also means we don't have to dereference the miscelm* pointers, saving half of the memory loads from miscelms/mapbits that have fallen out of cache. A linear scan of the indices appears to be fast enough. The only cost of this is an extra tree array in each arena.	2016-02-23 18:09:36 -08:00
Dave Watson	2b1fc90b7b	Remove rbt_nil Since this is an intrusive tree, rbt_nil is the whole size of the node and can be quite large. For example, miscelm is ~100 bytes.	2016-02-23 18:09:25 -08:00
Jason Evans	0da8ce1e96	Use table lookup for run_quantize_{floor,ceil}(). Reduce run quantization overhead by generating lookup tables during bootstrapping, and using the tables for all subsequent run quantization.	2016-02-22 16:47:34 -08:00
Jason Evans	a9a4684792	Test run quantization. Also rename run_quantize_*() to improve clarity. These tests demonstrate that run_quantize_ceil() is flawed.	2016-02-22 14:58:05 -08:00
Jason Evans	817d9030a5	Indentation style cleanup.	2016-02-22 10:44:58 -08:00
Jason Evans	9bad079039	Refactor time_* into nstime_*. Use a single uint64_t in nstime_t to store nanoseconds rather than using struct timespec. This reduces fragility around conversions between long and uint64_t, especially missing casts that only cause problems on 32-bit platforms.	2016-02-21 21:39:05 -08:00
Jason Evans	56139dc403	Remove _WIN32-specific struct timespec declaration. struct timespec is already defined by the system (at least on MinGW).	2016-02-20 23:43:17 -08:00
Jason Evans	ecae12323d	Fix overflow in prng_range(). Add jemalloc_ffs64() and use it instead of jemalloc_ffsl() in prng_range(), since long is not guaranteed to be a 64-bit type.	2016-02-20 23:41:33 -08:00
Jason Evans	aac93f414e	Add symbol mangling for prng_[lg_]range().	2016-02-20 11:26:00 -08:00
rustyx	3c2c5a5071	Fix warning in ipalloc	2016-02-20 10:55:18 -08:00
Christopher Ferris	effaf7d40f	Fix a typo in the ckh_search() prototype.	2016-02-20 10:26:17 -08:00
Jason Evans	a0aaad1afa	Handle unaligned keys in hash(). Reported by Christopher Ferris <cferris@google.com>.	2016-02-20 10:23:48 -08:00
Jason Evans	243f7a0508	Implement decay-based unused dirty page purging. This is an alternative to the existing ratio-based unused dirty page purging, and is intended to eventually become the sole purging mechanism. Add mallctls: - opt.purge - opt.decay_time - arena.<i>.decay - arena.<i>.decay_time - arenas.decay_time - stats.arenas.<i>.decay_time This resolves #325.	2016-02-19 20:56:21 -08:00
Jason Evans	8e82af1166	Implement smoothstep table generation. Check in a generated smootherstep table as smoothstep.h rather than generating it at configure time, since not all systems (e.g. Windows) have dc.	2016-02-19 20:56:15 -08:00
Jason Evans	db927b6727	Refactor arenas_cache tsd. Refactor arenas_cache tsd into arenas_tdata, which is a structure of type arena_tdata_t.	2016-02-19 20:32:37 -08:00
Jason Evans	578cd16581	Refactor arena_malloc_hard() out of arena_malloc().	2016-02-19 20:32:32 -08:00
Jason Evans	34676d3369	Refactor prng* from cpp macros into inline functions. Remove 32-bit variant, convert prng64() to prng_lg_range(), and add prng_range().	2016-02-19 20:29:06 -08:00
Jason Evans	c87ab25d18	Use ticker for incremental tcache GC.	2016-02-19 20:29:06 -08:00
Jason Evans	9998000b2b	Implement ticker. Implement ticker, which provides a simple API for ticking off some number of events before indicating that the ticker has hit its limit.	2016-02-19 20:29:06 -08:00
Jason Evans	94451d184b	Flesh out time_*() API.	2016-02-19 20:29:06 -08:00
Cameron Evans	e5d5a4a517	Add time_update().	2016-02-19 20:29:06 -08:00
Jason Evans	f829009929	Add --with-malloc-conf. Add --with-malloc-conf, which makes it possible to embed a default options string during configuration.	2016-02-19 20:29:06 -08:00
Jason Evans	ef349f3f94	Fix arena_sdalloc() line wrapping.	2016-02-19 20:29:06 -08:00
Jason Evans	f9e3459f75	Tweak code to allow compilation of concatenated src/*.c sources. This resolves #294.	2015-11-12 11:06:41 -08:00
Jason Evans	a6ec1c869e	Fix a comment.	2015-11-12 10:51:32 -08:00
Qi Wang	f4a0f32d34	Fast-path improvement: reduce # of branches and unnecessary operations. - Combine multiple runtime branches into a single malloc_slow check. - Avoid calling arena_choose / size2index / index2size on fast path. - A few micro optimizations.	2015-11-10 14:28:34 -08:00
Joshua Kahn	e8ab0ab9c0	Add function to destroy tree ex_destroy iterates over the tree using post-order traversal so nodes can be removed and processed by the callback function without paying the cost to rebalance the tree. The destruction process cannot be stopped once started.	2015-11-09 15:56:18 -08:00
Joshua Kahn	13b4015531	Allow const keys for lookup Signed-off-by: Steve Dougherty <sdougherty@barracuda.com> This resolves #281.	2015-11-09 15:48:05 -08:00
Steve Dougherty	bd418ce11e	Assert compact color bit is unused Signed-off-by: Joshua Kahn <jkahn@barracuda.com> This resolves #280.	2015-11-09 15:44:30 -08:00
Jason Evans	a784e411f2	Fix a xallocx(..., MALLOCX_ZERO) bug. Fix xallocx(..., MALLOCX_ZERO) to zero the last full trailing page of large allocations that have been randomly assigned an offset of 0 when --enable-cache-oblivious configure option is enabled. This addresses a special case missed in `d260f442ce` (Fix xallocx(..., MALLOCX_ZERO) bugs.).	2015-09-24 22:21:55 -07:00
Craig Rodrigues	66814c1a52	Fix tsd_boot1() to use explicit 'void' parameter list.	2015-09-20 21:57:32 -07:00
Jason Evans	6d91929e52	Address portability issues on Solaris. Don't assume Bourne shell is in /bin/sh when running size_classes.sh . Consider __sparcv9 a synonym for __sparc64__ when defining LG_QUANTUM. This resolves #275.	2015-09-15 10:42:36 -07:00
Jason Evans	708ed79834	Resolve an unsupported special case in arena_prof_tctx_set(). Add arena_prof_tctx_reset() and use it instead of arena_prof_tctx_set() when resetting the tctx pointer during reallocation, which happens whenever an originally sampled reallocated object is not sampled during reallocation. This regression was introduced by `594c759f37` (Optimize arena_prof_tctx_set().)	2015-09-14 23:57:58 -07:00
Jason Evans	ea8d97b897	Fix prof_{malloc,free}_sample_object() call order in prof_realloc(). Fix prof_realloc() to call prof_free_sampled_object() after calling prof_malloc_sample_object(). Prior to this fix, if tctx and old_tctx were the same, the tctx could have been prematurely destroyed.	2015-09-14 23:57:52 -07:00
Jason Evans	cec0d63d8b	Make one call to prof_active_get_unlocked() per allocation event. Make one call to prof_active_get_unlocked() per allocation event, and use the result throughout the relevant functions that handle an allocation event. Also add a missing check in prof_realloc(). These fixes protect allocation events against concurrent prof_active changes.	2015-09-14 23:55:48 -07:00
Jason Evans	676df88e48	Rename arena_maxclass to large_maxclass. arena_maxclass is no longer an appropriate name, because arenas also manage huge allocations.	2015-09-11 20:50:20 -07:00
Jason Evans	560a4e1e01	Fix xallocx() bugs. Fix xallocx() bugs related to the 'extra' parameter when specified as non-zero.	2015-09-11 20:40:34 -07:00
Jason Evans	a00b10735a	Fix "prof.reset" mallctl-related corruption. Fix heap profiling to distinguish among otherwise identical sample sites with interposed resets (triggered via the "prof.reset" mallctl). This bug could cause data structure corruption that would most likely result in a segfault.	2015-09-09 23:16:10 -07:00
Jason Evans	b4330b02a8	Fix pointer comparison with undefined behavior. This didn't cause bad code generation in the one case spot-checked (gcc 4.8.1), but had the potential to do so. This bug was introduced by `594c759f37` (Optimize arena_prof_tctx_set().).	2015-09-04 10:31:41 -07:00
Jason Evans	594c759f37	Optimize arena_prof_tctx_set(). Optimize arena_prof_tctx_set() to avoid reading run metadata when deciding whether it's actually necessary to write.	2015-09-02 14:52:24 -07:00
Jason Evans	b5c2a347d7	Silence compiler warnings for unreachable code. Reported by Ingvar Hagelund.	2015-08-19 23:28:34 -07:00
Jason Evans	d01fd19755	Rename index_t to szind_t to avoid an existing type on Solaris. This resolves #256.	2015-08-19 15:21:32 -07:00
Jason Evans	5ef33a9f2b	Don't bitshift by negative amounts. Don't bitshift by negative amounts when encoding/decoding run sizes in chunk header maps. This affected systems with page sizes greater than 8 KiB. Reported by Ingvar Hagelund <ingvar@redpill-linpro.com>.	2015-08-19 14:16:30 -07:00
Jason Evans	85ae064e96	Fix a comment.	2015-08-13 14:54:06 -07:00
Jason Evans	fead75fd52	Fix gcc build failure (define __has_builtin).	2015-08-12 16:46:09 -07:00
Jason Evans	7928f62273	Check whether gcc version supports __builtin_unreachable().	2015-08-12 16:38:39 -07:00
Jason Evans	694d0829c0	Update list of private symbols.	2015-08-12 13:03:43 -07:00
Jason Evans	1f27abc1b1	Refactor arena_mapbits_{small,large}_set() to not preserve unzeroed. Fix arena_run_split_large_helper() to treat newly committed memory as zeroed.	2015-08-11 16:45:47 -07:00
Jason Evans	6bdeddb697	Fix build failure. This regression was introduced by `de249c8679` (Arena chunk decommit cleanups and fixes.). This resolves #254.	2015-08-10 23:42:33 -07:00
Jason Evans	45186f0c07	Refactor arena_mapbits unzeroed flag management. Only set the unzeroed flag when initializing the entire mapbits entry, rather than mutating just the unzeroed bit. This simplifies the possible mapbits state transitions.	2015-08-10 23:03:34 -07:00
Jason Evans	de249c8679	Arena chunk decommit cleanups and fixes. Decommit arena chunk header during chunk deallocation if the rest of the chunk is decommitted.	2015-08-10 17:13:59 -07:00
Jason Evans	8fadb1a8c2	Implement chunk hook support for page run commit/decommit. Cascade from decommit to purge when purging unused dirty pages, so that it is possible to decommit cleaned memory rather than just purging. For non-Windows debug builds, decommit runs rather than purging them, since this causes access of deallocated runs to segfault. This resolves #251.	2015-08-07 00:50:58 -07:00
Daniel Micay	67c46a9e53	work around _FORTIFY_SOURCE false positive In builds with profiling disabled (default), the opt_prof_prefix array has a one byte length as a micro-optimization. This will cause the usage of write in the unused profiling code to be statically detected as a buffer overflow by Bionic's _FORTIFY_SOURCE implementation as it tries to detect read overflows in addition to write overflows. This works around the problem by informing the compiler that not_reached() means code is unreachable in release builds.	2015-08-04 17:09:43 -04:00
Matthijs	c1a6a51e40	MSVC compatibility changes - Decorate public function with __declspec(allocator) and __declspec(restrict), just like MSVC 1900 - Support JEMALLOC_HAS_RESTRICT by defining the restrict keyword - Move __declspec(nothrow) between 'void' and '*' so it compiles once more	2015-08-04 09:01:48 -07:00
Jason Evans	b49a334a64	Generalize chunk management hooks. Add the "arena.<i>.chunk_hooks" mallctl, which replaces and expands on the "arena.<i>.chunk.{alloc,dalloc,purge}" mallctls. The chunk hooks allow control over chunk allocation/deallocation, decommit/commit, purging, and splitting/merging, such that the application can rely on jemalloc's internal chunk caching and retaining functionality, yet implement a variety of chunk management mechanisms and policies. Merge the chunks_[sz]ad_{mmap,dss} red-black trees into chunks_[sz]ad_retained. This slightly reduces how hard jemalloc tries to honor the dss precedence setting; prior to this change the precedence setting was also consulted when recycling chunks. Fix chunk purging. Don't purge chunks in arena_purge_stashed(); instead deallocate them in arena_unstash_purged(), so that the dirty memory linkage remains valid until after the last time it is used. This resolves #176 and #201.	2015-08-03 21:49:02 -07:00
Jason Evans	d059b9d6a1	Implement support for non-coalescing maps on MinGW. - Do not reallocate huge objects in place if the number of backing chunks would change. - Do not cache multi-chunk mappings. This resolves #213.	2015-07-24 18:39:14 -07:00
Jason Evans	87ccb55547	Fix huge_palloc() to handle size rather than usize input. huge_ralloc() passes a size that may not be precisely a size class, so make huge_palloc() handle the more general case of a size input rather than usize. This regression appears to have been introduced by the addition of in-place huge reallocation; as such it was never incorporated into a release.	2015-07-23 17:18:49 -07:00
Jason Evans	4becdf21dc	Fix sa2u() regression. Take large_pad into account when determining whether an aligned allocation can be satisfied by a large size class. This regression was introduced by `8a03cf039c` (Implement cache index randomization for large allocations.).	2015-07-23 17:14:11 -07:00
Jason Evans	71cd2f08ff	Leave PRI* macros defined after using them to define FMT. Macro expansion happens too late for the #undef directives to work as a mechanism for preventing accidental direct use of the PRI macros.	2015-07-23 15:50:09 -07:00
Jason Evans	5fae7dc1b3	Fix MinGW-related portability issues. Create and use FMT* macros that are equivalent to the PRI* macros that inttypes.h defines. This allows uniform use of the Unix-specific format specifiers, e.g. "%zu", as well as avoiding Windows-specific definitions of e.g. PRIu64. Add ffs()/ffsl() support for compiling with gcc. Extract compatibility definitions of ENOENT, EINVAL, EAGAIN, EPERM, ENOMEM, and ERANGE into include/msvc_compat/windows_extra.h and use the file for tests as well as for core jemalloc code.	2015-07-23 13:56:25 -07:00
Jason Evans	e42c309eba	Add JEMALLOC_FORMAT_PRINTF(). Replace JEMALLOC_ATTR(format(printf, ...)) with JEMALLOC_FORMAT_PRINTF(), so that configuration feature tests can omit the attribute if it would cause extraneous compilation warnings.	2015-07-22 15:44:47 -07:00
Jason Evans	5bd879646c	Change default chunk size from 256 KiB to 2 MiB. This change improves interaction with transparent huge pages, e.g. reduced page faults (at least in the absence of unused dirty page purging).	2015-07-15 17:15:26 -07:00
Jason Evans	aa2826621e	Revert to first-best-fit run/chunk allocation. This effectively reverts `97c04a9383` (Use first-fit rather than first-best-fit run/chunk allocation.). In some pathological cases, first-fit search dominates allocation time, and it also tends not to converge as readily on a steady state of memory layout, since precise allocation order has a bigger effect than for first-best-fit.	2015-07-15 17:15:19 -07:00
Jason Evans	dde067264d	Fix an integer overflow bug in {size2index,s2u}_compute(). This {bug,regression} was introduced by `155bfa7da1` (Normalize size classes.). This resolves #241.	2015-07-09 21:36:33 -07:00
Jason Evans	0313607e66	Fix MinGW build warnings. Conditionally define ENOENT, EINVAL, etc. (was unconditional). Add/use PRIzu, PRIzd, and PRIzx for use in malloc_printf() calls. gcc issued (harmless) warnings since e.g. "%zu" should be "%Iu" on Windows, and the alternative to this workaround would have been to disable the function attributes which cause gcc to look for type mismatches in formatted printing function calls.	2015-07-07 20:10:28 -07:00
Matthijs	a1aaf949a5	Optimizations for Windows - Set opt_lg_chunk based on run-time OS setting - Verify LG_PAGE is compatible with run-time OS setting - When targeting Windows Vista or newer, use SRWLOCK instead of CRITICAL_SECTION - When targeting Windows Vista or newer, statically initialize init_lock	2015-06-25 22:53:58 +02:00
Jason Evans	241abc601b	Fix size class overflow handling when profiling is enabled. Fix size class overflow handling for malloc(), posix_memalign(), memalign(), calloc(), and realloc() when profiling is enabled. Remove an assertion that erroneously caused arena_sdalloc() to fail when profiling was enabled. This resolves #232.	2015-06-23 18:56:14 -07:00
Jason Evans	0a9f9a4d51	Convert arena_maybe_purge() recursion to iteration. This resolves #235.	2015-06-22 18:50:58 -07:00
Jason Evans	713b844bff	Update a comment.	2015-06-15 12:01:05 -07:00
Chi-hung Hsieh	c073f8167a	Fix type errors in C11 versions of atomic_*() functions.	2015-05-27 20:33:18 -07:00
Jason Evans	836bbe9951	Impose a minimum tcache count for small size classes. Now that small allocation runs have fewer regions due to run metadata residing in chunk headers, an explicit minimum tcache count is needed to make sure that tcache adequately amortizes synchronization overhead.	2015-05-19 17:47:16 -07:00
Jason Evans	6591ff09d8	Fix arena_dalloc() performance regression. Take into account large_pad when computing whether to pass the deallocation request to tcache_dalloc_large(), so that the largest cacheable size makes it back to tcache. This regression was introduced by `8a03cf039c` (Implement cache index randomization for large allocations.).	2015-05-19 17:44:45 -07:00
Jason Evans	fd5f9e43c3	Avoid atomic operations for dependent rtree reads.	2015-05-15 17:02:30 -07:00
Jason Evans	c451831264	Fix type punning in calls to atomic operation functions.	2015-05-07 22:35:40 -07:00
Jason Evans	8a03cf039c	Implement cache index randomization for large allocations. Extract szad size quantization into {extent,run}_quantize(), and quantize szad run sizes to the union of valid small region run sizes and large run sizes. Refactor iteration in arena_run_first_fit() to use run_quantize{,_first,_next}(), and add support for padded large runs. For large allocations that have no specified alignment constraints, compute a pseudo-random offset from the beginning of the first backing page that is a multiple of the cache line size. Under typical configurations with 4-KiB pages and 64-byte cache lines this results in a uniform distribution among 64 page boundary offsets. Add the --disable-cache-oblivious option, primarily intended for performance testing. This resolves #13.	2015-05-06 13:27:39 -07:00
Jason Evans	562d266511	Add the "stats.arenas.<i>.lg_dirty_mult" mallctl.	2015-03-24 16:41:38 -07:00
Jason Evans	4acd75a694	Add the "stats.allocated" mallctl.	2015-03-23 17:26:53 -07:00
Igor Podlesny	8ad6bf360f	Fix indentation inconsistencies.	2015-03-22 00:09:04 -07:00
Jason Evans	e0a08a1496	Restore --enable-ivsalloc. However, unlike before it was removed do not force --enable-ivsalloc when Darwin zone allocator integration is enabled, since the zone allocator code uses ivsalloc() regardless of whether malloc_usable_size() and sallocx() do. This resolves #211.	2015-03-18 21:06:58 -07:00
Jason Evans	8d6a3e8321	Implement dynamic per arena control over dirty page purging. Add mallctls: - arenas.lg_dirty_mult is initialized via opt.lg_dirty_mult, and can be modified to change the initial lg_dirty_mult setting for newly created arenas. - arena.<i>.lg_dirty_mult controls an individual arena's dirty page purging threshold, and synchronously triggers any purging that may be necessary to maintain the constraint. - arena.<i>.chunk.purge allows the per arena dirty page purging function to be replaced. This resolves #93.	2015-03-18 18:55:33 -07:00
Mike Hommey	c9db461ffb	Use InterlockedCompareExchange instead of non-existing InterlockedCompareExchange32	2015-03-17 12:09:30 +09:00
Jason Evans	04211e2266	Fix heap profiling regressions. Remove the prof_tctx_state_destroying transitory state and instead add the tctx_uid field, so that the tuple <thr_uid, tctx_uid> uniquely identifies a tctx. This assures that tctx's are well ordered even when more than two with the same thr_uid coexist. A previous attempted fix based on prof_tctx_state_destroying was only sufficient for protecting against two coexisting tctx's, but it also introduced a new dumping race. These regressions were introduced by `602c8e0971` (Implement per thread heap profiling.) and `764b00023f` (Fix a heap profiling regression.).	2015-03-16 15:11:06 -07:00
Jason Evans	764b00023f	Fix a heap profiling regression. Add the prof_tctx_state_destroying transitionary state to fix a race between a thread destroying a tctx and another thread creating a new equivalent tctx. This regression was introduced by `602c8e0971` (Implement per thread heap profiling.).	2015-03-14 14:01:35 -07:00
Jason Evans	fbd8d773ad	Fix unsigned comparison underflow. These bugs only affected tests and debug builds.	2015-03-11 23:14:50 -07:00
Jason Evans	f5c8f37259	Normalize rdelm/rd structure field naming.	2015-03-10 18:29:49 -07:00
Jason Evans	38e42d311c	Refactor dirty run linkage to reduce sizeof(extent_node_t).	2015-03-10 18:15:40 -07:00
Jason Evans	97c04a9383	Use first-fit rather than first-best-fit run/chunk allocation. This tends to more effectively pack active memory toward low addresses. However, additional tree searches are required in many cases, so whether this change stands the test of time will depend on real-world benchmarks.	2015-03-06 20:21:41 -08:00
Jason Evans	f044bb219e	Change default chunk size from 4 MiB to 256 KiB. Recent changes have improved huge allocation scalability, which removes upward pressure to set the chunk size so large that huge allocations are rare. Smaller chunks are more likely to completely drain, so set the default to the smallest size that doesn't leave excessive unusable trailing space in chunk headers.	2015-03-06 20:18:34 -08:00
Mike Hommey	4d871f73af	Preserve LastError when calling TlsGetValue TlsGetValue has a semantic difference with pthread_getspecific, in that it can return a non-error NULL value, so it always sets the LastError. But allocator callers may not be expecting calling e.g. free() to change the value of the last error, so preserve it.	2015-03-04 09:50:33 -08:00
Mike Hommey	7c46fd59cc	Make --without-export actually work `9906660` added a --without-export configure option to avoid exporting jemalloc symbols, but the option didn't actually work.	2015-03-04 21:49:15 +09:00
Jason Evans	99bd94fb65	Fix chunk cache races. These regressions were introduced by `ee41ad409a` (Integrate whole chunks into unused dirty page purging machinery.).	2015-02-18 16:40:53 -08:00
Jason Evans	738e089a2e	Rename "dirty chunks" to "cached chunks". Rename "dirty chunks" to "cached chunks", in order to avoid overloading the term "dirty". Fix the regression caused by `339c2b23b2` (Fix chunk_unmap() to propagate dirty state.), and actually address what that change attempted, which is to only purge chunks once, and propagate whether zeroed pages resulted into chunk_record().	2015-02-18 01:15:50 -08:00
Jason Evans	339c2b23b2	Fix chunk_unmap() to propagate dirty state. Fix chunk_unmap() to propagate whether a chunk is dirty, and modify dirty chunk purging to record this information so it can be passed to chunk_unmap(). Since the broken version of chunk_unmap() claimed that all chunks were clean, this resulted in potential memory corruption for purging implementations that do not zero (e.g. MADV_FREE). This regression was introduced by `ee41ad409a` (Integrate whole chunks into unused dirty page purging machinery.).	2015-02-17 22:25:56 -08:00
Jason Evans	47701b22ee	arena_chunk_dirty_node_init() --> extent_node_dirty_linkage_init()	2015-02-17 22:23:10 -08:00
Jason Evans	eafebfdfbe	Remove obsolete type arena_chunk_miscelms_t.	2015-02-17 16:12:31 -08:00
Jason Evans	a4e1888d1a	Simplify extent_node_t and add extent_node_init().	2015-02-17 15:13:52 -08:00
Jason Evans	ee41ad409a	Integrate whole chunks into unused dirty page purging machinery. Extend per arena unused dirty page purging to manage unused dirty chunks in addition to unused dirty runs. Rather than immediately unmapping deallocated chunks (or purging them in the --disable-munmap case), store them in a separate set of trees, chunks_[sz]ad_dirty. Preferentially allocate dirty chunks. When excessive unused dirty pages accumulate, purge runs and chunks in integrated LRU order (and unmap chunks in the --enable-munmap case). Refactor extent_node_t to provide accessor functions.	2015-02-16 21:02:17 -08:00
Jason Evans	40ab8f98e4	Remove more obsolete (incorrect) assertions. This regression was introduced by `88fef7ceda` (Refactor huge_*() calls into arena internals.), and went undetected because of the --enable-debug regression.	2015-02-15 20:26:45 -08:00
Jason Evans	cb9b44914e	Remove obsolete (incorrect) assertions. This regression was introduced by `88fef7ceda` (Refactor huge_*() calls into arena internals.), and went undetected because of the --enable-debug regression.	2015-02-15 20:13:28 -08:00
Jason Evans	2195ba4e1f	Normalize _link and link_ fields to all be *_link.	2015-02-15 16:43:52 -08:00
Jason Evans	41cfe03f39	If MALLOCX_ARENA(a) is specified, use it during tcache fill.	2015-02-13 15:28:56 -08:00
Jason Evans	5f7140b045	Make prof_tctx accesses atomic. Although exceedingly unlikely, it appears that writes to the prof_tctx field of arena_chunk_map_misc_t could be reordered such that a stale value could be read during deallocation, with profiler metadata corruption and invalid pointer dereferences being the most likely effects.	2015-02-12 15:54:53 -08:00
Jason Evans	88fef7ceda	Refactor huge_*() calls into arena internals. Make redirects to the huge_*() API the arena code's responsibility, since arenas now take responsibility for all allocation sizes.	2015-02-12 14:06:37 -08:00
Jason Evans	cbf3a6d703	Move centralized chunk management into arenas. Migrate all centralized data structures related to huge allocations and recyclable chunks into arena_t, so that each arena can manage huge allocations and recyclable virtual memory completely independently of other arenas. Add chunk node caching to arenas, in order to avoid contention on the base allocator. Use chunks_rtree to look up huge allocations rather than a red-black tree. Maintain a per arena unsorted list of huge allocations (which will be needed to enumerate huge allocations during arena reset). Remove the --enable-ivsalloc option, make ivsalloc() always available, and use it for size queries if --enable-debug is enabled. The only practical implications to this removal are that 1) ivsalloc() is now always available during live debugging (and the underlying radix tree is available during core-based debugging), and 2) size query validation can no longer be enabled independent of --enable-debug. Remove the stats.chunks.{current,total,high} mallctls, and replace their underlying statistics with simpler atomically updated counters used exclusively for gdump triggering. These statistics are no longer very useful because each arena manages chunks independently, and per arena statistics provide similar information. Simplify chunk synchronization code, now that base chunk allocation cannot cause recursive lock acquisition.	2015-02-12 00:15:56 -08:00
Jason Evans	051eae8cc5	Remove unnecessary xchg* lock prefixes.	2015-02-10 16:05:52 -08:00
Jason Evans	1cb181ed63	Implement explicit tcache support. Add the MALLOCX_TCACHE() and MALLOCX_TCACHE_NONE macros, which can be used in conjunction with the *allocx() API. Add the tcache.create, tcache.flush, and tcache.destroy mallctls. This resolves #145.	2015-02-09 17:44:48 -08:00
Jason Evans	23694b0745	Fix arena_get() for (!init_if_missing && refresh_if_missing) case. Fix arena_get() to refresh the cache as needed in the (!init_if_missing && refresh_if_missing) case. This flaw was introduced by the initial arena_get() implementation, which was part of `8bb3198f72` (Refactor/fix arenas manipulation.).	2015-02-09 17:43:10 -08:00
Jason Evans	8d0e04d42f	Refactor rtree to be lock-free. Recent huge allocation refactoring associates huge allocations with arenas, but it remains necessary to quickly look up huge allocation metadata during reallocation/deallocation. A global radix tree remains a good solution to this problem, but locking would have become the primary bottleneck after (upcoming) migration of chunk management from global to per arena data structures. This lock-free implementation uses double-checked reads to traverse the tree, so that in the steady state, each read or write requires only a single atomic operation. This implementation also assures that no more than two tree levels actually exist, through a combination of careful virtual memory allocation which makes large sparse nodes cheap, and skipping the root node on x64 (possible because the top 16 bits are all 0 in practice).	2015-02-04 16:51:53 -08:00
Jason Evans	c810fcea1f	Add (x != 0) assertion to lg_floor(x). lg_floor(0) is undefined, but depending on compiler options may not cause a crash. This assertion makes it harder to accidentally abuse lg_floor().	2015-02-04 16:51:53 -08:00
Jason Evans	f500a10b2e	Refactor base_alloc() to guarantee demand-zeroed memory. Refactor base_alloc() to guarantee that allocations are carved from demand-zeroed virtual memory. This supports sparse data structures such as multi-page radix tree nodes. Enhance base_alloc() to keep track of fragments which were too small to support previous allocation requests, and try to consume them during subsequent requests. This becomes important when request sizes commonly approach or exceed the chunk size (as could radix tree node allocations).	2015-02-04 16:51:53 -08:00
Jason Evans	918a1a5b3f	Reduce extent_node_t size to fit in one cache line.	2015-02-04 16:51:53 -08:00
Jason Evans	a55dfa4b0a	Implement more atomic operations. - atomic_*_p(). - atomic_cas_*(). - atomic_write_*().	2015-02-04 16:50:05 -08:00
Jason Evans	f8723572d8	Add missing prototypes for bootstrap_{malloc,calloc,free}().	2015-02-04 16:50:04 -08:00
Jason Evans	5b8ed5b7c9	Implement the prof.gdump mallctl. This feature makes it possible to toggle the gdump feature on/off during program execution, whereas the opt.prof_dump mallctl value can only be set during program startup. This resolves #72.	2015-01-25 21:21:35 -08:00
Jason Evans	4581b97809	Implement metadata statistics. There are three categories of metadata: - Base allocations are used for bootstrap-sensitive internal allocator data structures. - Arena chunk headers comprise pages which track the states of the non-metadata pages. - Internal allocations differ from application-originated allocations in that they are for internal use, and that they are omitted from heap profiles. The metadata statistics comprise the metadata categories as follows: - stats.metadata: All metadata -- base + arena chunk headers + internal allocations. - stats.arenas.<i>.metadata.mapped: Arena chunk headers. - stats.arenas.<i>.metadata.allocated: Internal allocations. This is reported separately from the other metadata statistics because it overlaps with the allocated and active statistics, whereas the other metadata statistics do not. Base allocations are not reported separately, though their magnitude can be computed by subtracting the arena-specific metadata. This resolves #163.	2015-01-23 23:34:43 -08:00
Jason Evans	10aff3f3e1	Refactor bootstrapping to delay tsd initialization. Refactor bootstrapping to delay tsd initialization, primarily to support integration with FreeBSD's libc. Refactor a0() for internal-only use, and add the bootstrap_{malloc,calloc,free}() API for use by FreeBSD's libc. This separation limits use of the a0() functions to metadata allocation, which doesn't require malloc/calloc/free API compatibility. This resolves #170.	2015-01-22 14:04:27 -08:00
Abhishek Kulkarni	b617df81bb	Add missing symbols to private_symbols.txt. This resolves #185.	2015-01-21 12:44:35 -08:00
Guilherme Goncalves	51f86346c0	Add a isblank definition for MSVC < 2013	2015-01-09 14:33:46 -08:00
Guilherme Goncalves	2c5cb613df	Introduce two new modes of junk filling: "alloc" and "free". In addition to true/false, opt.junk can now be either "alloc" or "free", giving applications the possibility of junking memory only on allocation or deallocation. This resolves #172.	2014-12-14 17:07:26 -08:00
Daniel Micay	b74041fb6e	Ignore MALLOC_CONF in set{uid,gid,cap} binaries. This eliminates the malloc tunables as tools for an attacker. Closes #173	2014-12-14 15:36:15 -08:00
Jason Evans	e12eaf93dc	Style and spelling fixes.	2014-12-08 16:34:04 -08:00
Chih-hung Hsieh	59cd80e6c6	Add a C11 atomics-based implementation of atomic.h API.	2014-12-06 21:17:49 -08:00
Jason Evans	a18c2b1f15	Style fixes.	2014-12-05 17:49:47 -08:00
Daniel Micay	879e76a9e5	teach the dss chunk allocator to handle new_addr This provides in-place expansion of huge allocations when the end of the allocation is at the end of the sbrk heap. There's already the ability to extend in-place via recycled chunks but this handles the initial growth of the heap via repeated vector / string reallocations. A possible future extension could allow realloc to go from the following: \| huge allocation \| recycled chunks \| ^ dss_end To a larger allocation built from recycled and new chunks: \| huge allocation \| ^ dss_end Doing that would involve teaching the chunk recycling code to request new chunks to satisfy the request. The chunk_dss code wouldn't require any further changes. #include <stdlib.h> int main(void) { size_t chunk = 4 * 1024 * 1024; void *ptr = NULL; for (size_t size = chunk; size < chunk * 128; size *= 2) { ptr = realloc(ptr, size); if (!ptr) return 1; } } dss:secondary: 0.083s dss:primary: 0.083s After: dss:secondary: 0.083s dss:primary: 0.003s The dss heap grows in the upwards direction, so the oldest chunks are at the low addresses and they are used first. Linux prefers to grow the mmap heap downwards, so the trick will not work in the current mmap chunk allocator as a huge allocation will only be at the top of the heap in a contrived case.	2014-11-28 16:11:19 -08:00
Guilherme Goncalves	a2136025c4	Remove extra definition of je_tsd_boot on win32.	2014-11-18 19:08:18 -02:00
Jason Evans	9cf2be0a81	Make quarantine_init() static.	2014-11-07 14:50:38 -08:00
Jason Evans	c002a5c800	Fix two quarantine regressions. Fix quarantine to actually update tsd when expanding, and to avoid double initialization (leaking the first quarantine) due to recursive initialization. This resolves #161.	2014-11-04 18:03:11 -08:00
Jason Evans	d7a9bab92d	Fix arena_sdalloc() to use promoted size (second attempt). Unlike the preceding attempted fix, this version avoids the potential for converting an invalid bin index to a size class.	2014-10-31 22:26:24 -07:00
Jason Evans	6da2e9d4f6	Fix arena_sdalloc() to use promoted size.	2014-10-31 17:08:13 -07:00
Jason Evans	cfc5706f69	Miscellaneous cleanups.	2014-10-30 23:18:45 -07:00
Daniel Micay	d33f834591	avoid redundant chunk header reads * use sized deallocation in iralloct_realign * iralloc and ixalloc always need the old size, so pass it in from the caller where it's often already calculated	2014-10-30 17:06:38 -07:00
Daniel Micay	809b0ac391	mark huge allocations as unlikely This cleans up the fast path a bit more by moving away more code.	2014-10-30 17:06:38 -07:00
Jason Evans	9b41ac909f	Fix huge allocation statistics.	2014-10-14 22:20:00 -07:00
Jason Evans	3c4d92e82a	Add per size class huge allocation statistics. Add per size class huge allocation statistics, and normalize various stats: - Change the arenas.nlruns type from size_t to unsigned. - Add the arenas.nhchunks and arenas.hchunks.<i>.size mallctl's. - Replace the stats.arenas.<i>.bins.<j>.allocated mallctl with stats.arenas.<i>.bins.<j>.curregs . - Add the stats.arenas.<i>.hchunks.<j>.nmalloc, stats.arenas.<i>.hchunks.<j>.ndalloc, stats.arenas.<i>.hchunks.<j>.nrequests, and stats.arenas.<i>.hchunks.<j>.curhchunks mallctl's.	2014-10-12 23:02:10 -07:00
Jason Evans	44c97b712e	Fix a prof_tctx_t/prof_tdata_t cleanup race. Fix a prof_tctx_t/prof_tdata_t cleanup race by storing a copy of thr_uid in prof_tctx_t, so that the associated tdata need not be present during tctx teardown.	2014-10-12 13:03:20 -07:00
Jason Evans	381c23dd9d	Remove arena_dalloc_bin_run() clean page preservation. Remove code in arena_dalloc_bin_run() that preserved the "clean" state of trailing clean pages by splitting them into a separate run during deallocation. This was a useful mechanism for reducing dirty page churn when bin runs comprised many pages, but bin runs are now quite small. Remove the nextind field from arena_run_t now that it is no longer needed, and change arena_run_t's bin field (arena_bin_t *) to binind (index_t). These two changes remove 8 bytes of chunk header overhead per page, which saves 1/512 of all arena chunk memory.	2014-10-10 23:01:03 -07:00
Jason Evans	81e547566e	Add --with-lg-tiny-min, generalize --with-lg-quantum.	2014-10-10 22:35:07 -07:00
Jason Evans	fc0b3b7383	Add configure options. Add: --with-lg-page --with-lg-page-sizes --with-lg-size-class-group --with-lg-quantum Get rid of STATIC_PAGE_SHIFT, in favor of directly setting LG_PAGE. Fix various edge conditions exposed by the configure options.	2014-10-09 22:44:37 -07:00
Daniel Micay	f22214a29d	Use regular arena allocation for huge tree nodes. This avoids grabbing the base mutex, as a step towards fine-grained locking for huge allocations. The thread cache also provides a tiny (~3%) improvement for serial huge allocations.	2014-10-07 23:57:09 -07:00
Jason Evans	8bb3198f72	Refactor/fix arenas manipulation. Abstract arenas access to use arena_get() (or a0get() where appropriate) rather than directly reading e.g. arenas[ind]. Prior to the addition of the arenas.extend mallctl, the worst possible outcome of directly accessing arenas was a stale read, but arenas.extend may allocate and assign a new array to arenas. Add a tsd-based arenas_cache, which amortizes arenas reads. This introduces some subtle bootstrapping issues, with tsd_boot() now being split into tsd_boot[01]() to support tsd wrapper allocation bootstrapping, as well as an arenas_cache_bypass tsd variable which dynamically terminates allocation of arenas_cache itself. Promote a0malloc(), a0calloc(), and a0free() to be generally useful for internal allocation, and use them in several places (more may be appropriate). Abstract arena->nthreads management and fix a missing decrement during thread destruction (recent tsd refactoring left arenas_cleanup() unused). Change arena_choose() to propagate OOM, and handle OOM in all callers. This is important for providing consistent allocation behavior when the MALLOCX_ARENA() flag is being used. Prior to this fix, it was possible for an OOM to result in allocation silently allocating from a different arena than the one specified.	2014-10-07 23:14:57 -07:00
Jason Evans	155bfa7da1	Normalize size classes. Normalize size classes to use the same number of size classes per size doubling (currently hard coded to 4), across the entire range of size classes. Small size classes already used this spacing, but in order to support this change, additional small size classes now fill [4 KiB .. 16 KiB). Large size classes range from [16 KiB .. 4 MiB). Huge size classes now support non-multiples of the chunk size in order to fill (4 MiB .. 16 MiB).	2014-10-06 01:45:13 -07:00
Daniel Micay	a95018ee81	Attempt to expand huge allocations in-place. This adds support for expanding huge allocations in-place by requesting memory at a specific address from the chunk allocator. It's currently only implemented for the chunk recycling path, although in theory it could also be done by optimistically allocating new chunks. On Linux, it could attempt an in-place mremap. However, that won't work in practice since the heap is grown downwards and memory is not unmapped (in a normal build, at least). Repeated vector reallocation micro-benchmark: #include <string.h> #include <stdlib.h> int main(void) { for (size_t i = 0; i < 100; i++) { void *ptr = NULL; size_t old_size = 0; for (size_t size = 4; size < (1 << 30); size *= 2) { ptr = realloc(ptr, size); if (!ptr) return 1; memset(ptr + old_size, 0xff, size - old_size); old_size = size; } free(ptr); } } The glibc allocator fails to do any in-place reallocations on this benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it elides the cost of copies via mremap, which is currently not something that jemalloc can use. With this improvement, jemalloc still fails to do any in-place huge reallocations for the first outer loop, but then succeeds 100% of the time for the remaining 99 iterations. The time spent doing allocations and copies drops down to under 5%, with nearly all of it spent doing purging + faulting (when huge pages are disabled) and the array memset. An improved mremap API (MREMAP_RETAIN - #138) would be far more general but this is a portable optimization and would still be useful on Linux for xallocx. Numbers with transparent huge pages enabled: glibc (copies elided via MREMAP_MAYMOVE): 8.471s jemalloc: 17.816s jemalloc + no-op madvise: 13.236s jemalloc + this commit: 6.787s jemalloc + this commit + no-op madvise: 6.144s Numbers with transparent huge pages disabled: glibc (copies elided via MREMAP_MAYMOVE): 15.403s jemalloc: 39.456s jemalloc + no-op madvise: 12.768s jemalloc + this commit: 15.534s jemalloc + this commit + no-op madvise: 6.354s Closes #137	2014-10-05 14:47:01 -07:00
Jason Evans	16854ebeb7	Don't disable tcache for lazy-lock. Don't disable tcache when lazy-lock is configured. There already exists a mechanism to disable tcache, but doing so automatically due to lazy-lock causes surprising performance behavior.	2014-10-04 15:00:51 -07:00
Jason Evans	34e85b4182	Make prof-related inline functions always-inline.	2014-10-04 11:26:05 -07:00
Jason Evans	029d44cf8b	Fix tsd cleanup regressions. Fix tsd cleanup regressions that were introduced in `5460aa6f66` (Convert all tsd variables to reside in a single tsd structure.). These regressions were twofold: 1) tsd_tryget() should never (and need never) return NULL. Rename it to tsd_fetch() and simplify all callers. 2) tsd_*_set() must only be called when tsd is in the nominal state, because cleanup happens during the nominal-->purgatory transition, and re-initialization must not happen while in the purgatory state. Add tsd_nominal() and use it as needed. Note that tsd_*{p,}_get() can still be used as long as no re-initialization that would require cleanup occurs. This means that e.g. the thread_allocated counter can be updated unconditionally.	2014-10-04 11:22:55 -07:00
Jason Evans	fc12c0b8bc	Implement/test/fix prof-related mallctl's. Implement/test/fix the opt.prof_thread_active_init, prof.thread_active_init, and thread.prof.active mallctl's. Test/fix the thread.prof.name mallctl. Refactor opt_prof_active to be read-only and move mutable state into the prof_active variable. Stop leaning on ctl-related locking for protection.	2014-10-03 23:25:30 -07:00
Jason Evans	551ebc4364	Convert to uniform style: cond == false --> !cond	2014-10-03 10:16:09 -07:00
Jason Evans	20c31deaae	Test prof.reset mallctl and fix numerous discovered bugs.	2014-10-02 23:01:10 -07:00
Eric Wong	4dcf04bfc0	correctly detect adaptive mutexes in pthreads PTHREAD_MUTEX_ADAPTIVE_NP is an enum on glibc and not a macro, we must test for their existence by attempting compilation.	2014-09-29 16:10:40 -07:00
Jason Evans	5d9732f2cf	Merge pull request #129 from daverigby/msvc_lg_floor Use MSVC intrinsics for lg_floor	2014-09-29 15:15:31 -07:00
Jason Evans	0c5dd03e88	Move small run metadata into the arena chunk header. Move small run metadata into the arena chunk header, with multiple expected benefits: - Lower run fragmentation due to reduced run sizes; runs are more likely to completely drain when there are fewer total regions. - Improved cache behavior. Prior to this change, run headers were always page-aligned, which put extra pressure on some CPU cache sets. The degree to which this was a problem was hardware dependent, but it likely hurt some even for the most advanced modern hardware. - Buffer overruns/underruns are less likely to corrupt allocator metadata. - Size classes between 4 KiB and 16 KiB become reasonable to support without any special handling, and the runs are small enough that dirty unused pages aren't a significant concern.	2014-09-29 01:31:39 -07:00
Jason Evans	f97e5ac4ec	Implement compile-time bitmap size computation.	2014-09-28 14:43:11 -07:00
Jason Evans	6ef80d68f0	Fix profile dumping race. Fix a race that caused a non-critical assertion failure. To trigger the race, a thread had to be part way through initializing a new sample, such that it was discoverable by the dumping thread, but not yet linked into its gctx by the time a later dump phase would normally have reset its state to 'nominal'. Additionally, lock access to the state field during modification to transition to the dumping state. It's not apparent that this oversight could have caused an actual problem due to outer locking that protects the dumping machinery, but the added locking pedantically follows the stated locking protocol for the state field.	2014-09-24 22:23:43 -07:00
Dave Rigby	112704cfbf	Use MSVC intrinsics for lg_floor When using MSVC make use of its intrinsic functions (supported on x86, amd64 & ARM) for lg_floor.	2014-09-24 11:55:02 +01:00
Jason Evans	5460aa6f66	Convert all tsd variables to reside in a single tsd structure.	2014-09-23 02:36:08 -07:00
Jason Evans	9c640bfdd4	Apply likely()/unlikely() to allocation/deallocation fast paths.	2014-09-11 17:01:58 -07:00
Daniel Micay	23fdf8b359	mark some conditions as unlikely * assertion failure * malloc_init failure * malloc not already initialized (in malloc_init) * running in valgrind * thread cache disabled at runtime Clang and GCC already consider a comparison with NULL or -1 to be cold, so many branches (out-of-memory) are already correctly considered as cold and marking them is not important.	2014-09-10 21:49:42 -04:00
Daniel Micay	6b5609d23b	add likely / unlikely macros	2014-09-10 17:36:32 -04:00
Jason Evans	6e73dc194e	Fix a profile sampling race. Fix a profile sampling race that was due to preparing to sample, yet doing nothing to assure that the context remains valid until the stats are updated. These regressions were caused by `602c8e0971` (Implement per thread heap profiling.), which did not make it into any releases prior to these fixes.	2014-09-09 19:47:09 -07:00
Jason Evans	6fd53da030	Fix prof_tdata_get()-related regressions. Fix prof_tdata_get() to avoid dereferencing an invalid tdata pointer (when it's PROF_TDATA_STATE_{REINCARNATED,PURGATORY}). Fix prof_tdata_get() callers to check for invalid results besides NULL (PROF_TDATA_STATE_{REINCARNATED,PURGATORY}). These regressions were caused by `602c8e0971` (Implement per thread heap profiling.), which did not make it into any releases prior to these fixes.	2014-09-09 15:29:34 -07:00
Daniel Micay	a62812eacc	fix isqalloct (should call isdalloct)	2014-09-08 21:46:17 -04:00
Daniel Micay	4cfe55166e	Add support for sized deallocation. This adds a new `sdallocx` function to the external API, allowing the size to be passed by the caller. It avoids some extra reads in the thread cache fast path. In the case where stats are enabled, this avoids the work of calculating the size from the pointer. An assertion validates the size that's passed in, so enabling debugging will allow users of the API to debug cases where an incorrect size is passed in. The performance win for a contrived microbenchmark doing an allocation and immediately freeing it is ~10%. It may have a different impact on a real workload. Closes #28	2014-09-08 17:34:24 -07:00
Jason Evans	b718cf77e9	Optimize [nmd]alloc() fast paths. Optimize [nmd]alloc() fast paths such that the (flags == 0) case is streamlined, flags decoding only happens to the minimum degree necessary, and no conditionals are repeated.	2014-09-07 14:40:19 -07:00
Jason Evans	c21b05ea09	Whitespace cleanups.	2014-09-04 22:27:26 -07:00
Qinfan Wu	ff6a31d3b9	Refactor chunk map. Break the chunk map into two separate arrays, in order to improve cache locality. This is related to issue #23.	2014-09-04 22:22:52 -07:00
Sara Golemon	3e24afa28e	Test for availability of malloc hooks via autoconf __*_hook() is glibc, but on at least one glibc platform (homebrew), the __GLIBC__ define isn't set correctly and we miss being able to use these hooks. Do a feature test for it during configuration so that we enable it anywhere the hooks are actually available.	2014-08-22 15:19:21 -07:00
Jason Evans	602c8e0971	Implement per thread heap profiling. Rename data structures (prof_thr_cnt_t-->prof_tctx_t, prof_ctx_t-->prof_gctx_t), and convert to storing a prof_tctx_t for sampled objects. Convert PROF_ALLOC_PREP() to prof_alloc_prep(), since precise backtrace depth within jemalloc functions is no longer an issue (pprof prunes irrelevant frames). Implement mallctl's: - prof.reset implements full sample data reset, and optional change of sample interval. - prof.lg_sample reads the current sample interval (opt.lg_prof_sample was the permanent source of truth prior to prof.reset). - thread.prof.name provides naming capability for threads within heap profile dumps. - thread.prof.active makes it possible to activate/deactivate heap profiling for individual threads. Modify the heap dump files to contain per thread heap profile data. This change is incompatible with the existing pprof, which will require enhancements to read and process the enriched data.	2014-08-19 21:31:16 -07:00
Jason Evans	1628e8615e	Add rb_empty().	2014-08-19 21:05:54 -07:00
Jason Evans	3a81cbd2d4	Dump heap profile backtraces in a stable order. Also iterate over per thread stats in a stable order, which prepares the way for stable ordering of per thread heap profile dumps.	2014-08-19 21:05:54 -07:00
Jason Evans	ab532e9799	Directly embed prof_ctx_t's bt.	2014-08-19 21:05:54 -07:00
Jason Evans	b41ccdb125	Convert prof_tdata_t's bt2cnt to a comprehensive map. Treat prof_tdata_t's bt2cnt as a comprehensive map of the thread's extant allocation samples (do not limit the total number of entries). This helps prepare the way for per thread heap profiling.	2014-08-19 21:05:54 -07:00
Jason Evans	070b3c3fbd	Fix and refactor runs_dirty-based purging. Fix runs_dirty-based purging to also purge dirty pages in the spare chunk. Refactor runs_dirty manipulation into arena_dirty_{insert,remove}(), and move the arena->ndirty accounting into those functions. Remove the u.ql_link field from arena_chunk_map_t, and get rid of the enclosing union for u.rb_link, since only rb_link remains. Remove the ndirty field from arena_chunk_t.	2014-08-14 14:45:58 -07:00
Qinfan Wu	e8a2fd83a2	arena->npurgatory is no longer needed since we drop arena's lock after stashing all the purgeable runs.	2014-08-12 09:50:01 -07:00
Qinfan Wu	90737fcda1	Remove chunks_dirty tree, nruns_avail and nruns_adjac since we no longer need to maintain the tree for dirty page purging.	2014-08-12 09:50:00 -07:00
Qinfan Wu	04d60a132b	Maintain all the dirty runs in a linked list for each arena	2014-08-12 09:50:00 -07:00
Jason Evans	a2ea54c986	Add atomic operations tests and fix latent bugs.	2014-08-06 23:36:19 -07:00
Manuel A. Fernandez Montecelo	ffa259841c	Add OpenRISC/or1k LG_QUANTUM size definition	2014-07-29 23:11:26 +01:00
Richard Diamond	994fad9bda	Add check for madvise(2) to configure.ac. Some platforms, such as Google's Portable Native Client, use Newlib and thus lack access to madvise(2). In those instances, pages_purge() is transformed into a no-op.	2014-06-03 09:32:49 -07:00
Richard Diamond	9c3a10fdf6	Try to use __builtin_ffsl if ffsl is unavailable. Some platforms (like those using Newlib) don't have ffs/ffsl. This commit adds a check to configure.ac for __builtin_ffsl if ffsl isn't found. __builtin_ffsl performs the same function as ffsl, and has the added benefit of being available on any platform utilizing a GCC-compatible compiler. This change does not address the use of ffs in the MALLOCX_ARENA() macro.	2014-06-02 07:44:50 -07:00
Jason Evans	0b5c92213f	Fix fallback lg_floor() implementations.	2014-06-01 22:05:08 -07:00
Jason Evans	1f6d77e1f6	Use KQU() rather than QU() where applicable. Fix KZI() and KQI() to append LL rather than ULL.	2014-05-28 21:17:42 -07:00
Jason Evans	d04047cc29	Add size class computation capability. Add size class computation capability, currently used only as validation of the size class lookup tables. Generalize the size class spacing used for bins, for eventual use throughout the full range of allocation sizes.	2014-05-28 21:06:46 -07:00
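
The size class commits above (`d04047cc29`, `155bfa7da1`) describe a spacing of four size classes per size doubling, and `c810fcea1f` adds an (x != 0) assertion to lg_floor(x). As a rough illustration only, the following sketch rounds a request up to such a class. It is not jemalloc's code: the names `sketch_lg_floor` and `sketch_size_round_up` are hypothetical, the 16-byte quantum is an assumption, and tiny size classes are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed parameters for this sketch (not taken from any jemalloc build). */
#define SKETCH_LG_QUANTUM 4 /* 16-byte minimum spacing (assumption) */
#define SKETCH_LG_GROUP   2 /* 2^2 = 4 size classes per size doubling */

/* Floor of log2(x); undefined for 0, hence the assertion. */
static unsigned sketch_lg_floor(size_t x) {
    assert(x != 0);
    unsigned lg = 0;
    while (x >>= 1) {
        lg++;
    }
    return lg;
}

/* Round size up to the nearest class in the 4-per-doubling series. */
static size_t sketch_size_round_up(size_t size) {
    assert(size != 0 && size <= SIZE_MAX / 2);
    /* x is the log2 of the power of two at or above size. */
    unsigned x = sketch_lg_floor((size << 1) - 1);
    /* Below the first full group, classes are spaced one quantum apart;
     * above it, each doubling is divided into four equal steps. */
    unsigned lg_delta = (x < SKETCH_LG_GROUP + SKETCH_LG_QUANTUM + 1)
        ? SKETCH_LG_QUANTUM
        : x - SKETCH_LG_GROUP - 1;
    size_t delta_mask = ((size_t)1 << lg_delta) - 1;
    return (size + delta_mask) & ~delta_mask;
}
```

Under these assumptions a 129-byte request rounds up to 160, since the classes between 128 and 256 are 160, 192, 224, and 256, and a 4097-byte request rounds up to 5120.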

825 Commits