server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Jason Evans	613cdc80f6	Convert arena_bin_t's runs from a tree to a heap.	2016-03-08 13:48:27 -08:00
Dave Watson	4a0dbb5ac8	Use pairing heap for arena->runs_avail. Use a pairing heap instead of a red-black tree in arena runs_avail. The extra links are unioned with the bitmap_t, so this change doesn't use any extra memory. Canaries show this change to be a 1% CPU win and a 2% latency win. In particular, large free()s and small bin frees are now O(1) (barring coalescing). I also tested changing bin->runs to be a pairing heap, but saw a much smaller win, and it would mean increasing the size of arena_run_s by two pointers, so I left that as an rb-tree for now.	2016-03-08 13:48:27 -08:00
Jason Evans	022f6891fa	Avoid a potential innocuous compiler warning. Add a cast to avoid comparing a ssize_t value to a uint64_t value that is always larger than a 32-bit ssize_t. This silences an innocuous compiler warning from e.g. gcc 4.2.1 about the comparison always having the same result.	2016-03-02 22:45:37 -08:00
Dmitri Smirnov	33184bf698	Fix stack corruption and uninitialized var warning. Stack corruption happens in x64 (64-bit) builds. This resolves #347.	2016-02-29 15:22:53 -08:00
Jason Evans	3c07f803aa	Fix stats.arenas.<i>.[...] for --disable-stats case. Add missing stats.arenas.<i>.{dss,lg_dirty_mult,decay_time} initialization. Fix stats.arenas.<i>.{pactive,pdirty} to read under the protection of the arena mutex.	2016-02-27 20:40:13 -08:00
Jason Evans	40ee9aa957	Fix stats.cactive accounting regression. Fix stats.cactive accounting to always increase/decrease by multiples of the chunk size, even for huge size classes that are not multiples of the chunk size, e.g. {2.5, 3, 3.5, 5, 7} MiB with 2 MiB chunk size. This regression was introduced by `155bfa7da1` (Normalize size classes.) and first released in 4.0.0. This resolves #336.	2016-02-27 15:35:52 -08:00
Jason Evans	3763d3b5f9	Refactor arena_cactive_update() into arena_cactive_{add,sub}(). This removes an implicit conversion from size_t to ssize_t. For cactive decreases, the size_t value was intentionally underflowed to generate "negative" values (actually positive values above the positive range of ssize_t), and the conversion to ssize_t was undefined according to C language semantics. This regression was perpetuated by `1522937e9c` (Fix the cactive statistic.) and first released in 4.0.0, which in retrospect only fixed one of two problems introduced by `aa5113b1fd` (Refactor overly large/complex functions) and first released in 3.5.0.	2016-02-26 17:29:35 -08:00
Jason Evans	42ce80e15a	Silence miscellaneous 64-to-32-bit data loss warnings. This resolves #341.	2016-02-25 20:51:00 -08:00
Jason Evans	8282a2ad97	Remove a superfluous comment.	2016-02-25 16:44:48 -08:00
Jason Evans	0c516a00c4	Make *allocx() size class overflow behavior defined. Limit supported size and alignment to HUGE_MAXCLASS, which in turn is now limited to be less than PTRDIFF_MAX. This resolves #278 and #295.	2016-02-25 15:29:49 -08:00
Jason Evans	767d85061a	Refactor arenas array (fixes deadlock). Refactor the arenas array, which contains pointers to all extant arenas, such that it starts out as a sparse array of maximum size, and use double-checked atomics-based reads as the basis for fast and simple arena_get(). Additionally, reduce arenas_lock's role such that it only protects against arena initialization races. These changes remove the possibility for arena lookups to trigger locking, which resolves at least one known (fork-related) deadlock. This resolves #315.	2016-02-24 23:58:10 -08:00
Dave Watson	3812729167	Fix arena_size computation. Fix arena_size arena_new() computation to incorporate runs_avail_nclasses elements for runs_avail, rather than (runs_avail_nclasses - 1) elements. Since offsetof(arena_t, runs_avail) is used rather than sizeof(arena_t) for the first term of the computation, all of the runs_avail elements must be added into the second term. This bug was introduced (by Jason Evans) while merging pull request #330 as `3417a304cc` (Separate arena_avail trees).	2016-02-24 20:10:02 -08:00
Dave Watson	cd86c1481a	Fix arena_run_first_best_fit. The merge of `3417a304cc` appears to have introduced a small bug: first_best_fit doesn't scan through all the classes, since ind is offset from runs_avail_nclasses by run_avail_bias.	2016-02-24 17:50:02 -08:00
Jason Evans	9e1810ca9d	Silence miscellaneous 64-to-32-bit data loss warnings.	2016-02-24 13:03:48 -08:00
Jason Evans	9f4ee6034c	Refactor jemalloc_ffs() into ffs_(). Use appropriate versions to resolve 64-to-32-bit data loss warnings.	2016-02-24 13:03:48 -08:00
Jason Evans	ae45142adc	Collapse arena_avail_tree_* into arena_run_tree_*. These tree types converged to become identical, yet they still had independently generated red-black tree implementations.	2016-02-23 18:27:24 -08:00
Dave Watson	3417a304cc	Separate arena_avail trees. Separate run trees by index, replacing the previous quantize logic. Quantization by index is now performed only on insertion / removal from the tree, and not on node comparison, saving some CPU. This also means we don't have to dereference the miscelm* pointers, saving half of the memory loads from miscelms/mapbits that have fallen out of cache. A linear scan of the indices appears to be fast enough. The only cost of this is an extra tree array in each arena.	2016-02-23 18:09:36 -08:00
Jason Evans	0da8ce1e96	Use table lookup for run_quantize_{floor,ceil}(). Reduce run quantization overhead by generating lookup tables during bootstrapping, and using the tables for all subsequent run quantization.	2016-02-22 16:47:34 -08:00
Jason Evans	08551eee58	Fix run_quantize_ceil(). In practice this bug had limited impact (and then only by increasing chunk fragmentation) because run_quantize_ceil() returned correct results except for inputs that could only arise from aligned allocation requests that required more than page alignment. This bug existed in the original run quantization implementation, which was introduced by `8a03cf039c` (Implement cache index randomization for large allocations.).	2016-02-22 16:28:00 -08:00
Jason Evans	a9a4684792	Test run quantization. Also rename run_quantize_*() to improve clarity. These tests demonstrate that run_quantize_ceil() is flawed.	2016-02-22 14:58:05 -08:00
Jason Evans	9bad079039	Refactor time_* into nstime_*. Use a single uint64_t in nstime_t to store nanoseconds rather than using struct timespec. This reduces fragility around conversions between long and uint64_t, especially missing casts that only cause problems on 32-bit platforms.	2016-02-21 21:39:05 -08:00
Jason Evans	243f7a0508	Implement decay-based unused dirty page purging. This is an alternative to the existing ratio-based unused dirty page purging, and is intended to eventually become the sole purging mechanism. Add mallctls: - opt.purge - opt.decay_time - arena.<i>.decay - arena.<i>.decay_time - arenas.decay_time - stats.arenas.<i>.decay_time This resolves #325.	2016-02-19 20:56:21 -08:00
Jason Evans	1a4ad3c0fa	Refactor out arena_compute_npurge(). Refactor out arena_compute_npurge() by integrating its logic into arena_stash_dirty() as an incremental computation.	2016-02-19 20:32:37 -08:00
Jason Evans	4985dc681e	Refactor arena_ralloc_no_move(). Refactor early return logic in arena_ralloc_no_move() to return early on failure rather than on success.	2016-02-19 20:32:37 -08:00
Jason Evans	578cd16581	Refactor arena_malloc_hard() out of arena_malloc().	2016-02-19 20:32:32 -08:00
Jason Evans	34676d3369	Refactor prng* from cpp macros into inline functions. Remove 32-bit variant, convert prng64() to prng_lg_range(), and add prng_range().	2016-02-19 20:29:06 -08:00
Qi Wang	f4a0f32d34	Fast-path improvement: reduce # of branches and unnecessary operations. - Combine multiple runtime branches into a single malloc_slow check. - Avoid calling arena_choose / size2index / index2size on fast path. - A few micro optimizations.	2015-11-10 14:28:34 -08:00
Joshua Kahn	13b4015531	Allow const keys for lookup. Signed-off-by: Steve Dougherty <sdougherty@barracuda.com>. This resolves #281.	2015-11-09 15:48:05 -08:00
Mike Hommey	f97298bfc1	Remove arena_run_dalloc_decommit(). This resolves #284.	2015-11-09 15:38:30 -08:00
Jason Evans	a784e411f2	Fix a xallocx(..., MALLOCX_ZERO) bug. Fix xallocx(..., MALLOCX_ZERO) to zero the last full trailing page of large allocations that have been randomly assigned an offset of 0 when the --enable-cache-oblivious configure option is enabled. This addresses a special case missed in `d260f442ce` (Fix xallocx(..., MALLOCX_ZERO) bugs.).	2015-09-24 22:21:55 -07:00
Jason Evans	d260f442ce	Fix xallocx(..., MALLOCX_ZERO) bugs. Zero all trailing bytes of large allocations when --enable-cache-oblivious configure option is enabled. This regression was introduced by `8a03cf039c` (Implement cache index randomization for large allocations.). Zero trailing bytes of huge allocations when resizing from/to a size class that is not a multiple of the chunk size.	2015-09-24 16:38:45 -07:00
Jason Evans	e56b24e3a2	Make arena_dalloc_large_locked_impl() static.	2015-09-20 09:58:10 -07:00
Jason Evans	9a505b768c	Centralize xallocx() size[+extra] overflow checks.	2015-09-15 14:39:58 -07:00
Jason Evans	676df88e48	Rename arena_maxclass to large_maxclass. arena_maxclass is no longer an appropriate name, because arenas also manage huge allocations.	2015-09-11 20:50:20 -07:00
Jason Evans	560a4e1e01	Fix xallocx() bugs. Fix xallocx() bugs related to the 'extra' parameter when specified as non-zero.	2015-09-11 20:40:34 -07:00
Dmitry-Me	a306a60651	Reduce variables scope	2015-09-04 10:42:33 -07:00
Jason Evans	d01fd19755	Rename index_t to szind_t to avoid an existing type on Solaris. This resolves #256.	2015-08-19 15:21:32 -07:00
Jason Evans	5ef33a9f2b	Don't bitshift by negative amounts. Don't bitshift by negative amounts when encoding/decoding run sizes in chunk header maps. This affected systems with page sizes greater than 8 KiB. Reported by Ingvar Hagelund <ingvar@redpill-linpro.com>.	2015-08-19 14:16:30 -07:00
Jason Evans	1f27abc1b1	Refactor arena_mapbits_{small,large}_set() to not preserve unzeroed. Fix arena_run_split_large_helper() to treat newly committed memory as zeroed.	2015-08-11 16:45:47 -07:00
Jason Evans	45186f0c07	Refactor arena_mapbits unzeroed flag management. Only set the unzeroed flag when initializing the entire mapbits entry, rather than mutating just the unzeroed bit. This simplifies the possible mapbits state transitions.	2015-08-10 23:03:34 -07:00
Jason Evans	de249c8679	Arena chunk decommit cleanups and fixes. Decommit arena chunk header during chunk deallocation if the rest of the chunk is decommitted.	2015-08-10 17:13:59 -07:00
Jason Evans	8fadb1a8c2	Implement chunk hook support for page run commit/decommit. Cascade from decommit to purge when purging unused dirty pages, so that it is possible to decommit cleaned memory rather than just purging. For non-Windows debug builds, decommit runs rather than purging them, since this causes access of deallocated runs to segfault. This resolves #251.	2015-08-07 00:50:58 -07:00
Jason Evans	5716d97f75	Fix an in-place growing large reallocation regression. Fix arena_ralloc_large_grow() to properly account for large_pad, so that in-place large reallocation succeeds when possible, rather than always failing. This regression was introduced by `8a03cf039c` (Implement cache index randomization for large allocations.)	2015-08-06 23:45:45 -07:00
Jason Evans	b49a334a64	Generalize chunk management hooks. Add the "arena.<i>.chunk_hooks" mallctl, which replaces and expands on the "arena.<i>.chunk.{alloc,dalloc,purge}" mallctls. The chunk hooks allow control over chunk allocation/deallocation, decommit/commit, purging, and splitting/merging, such that the application can rely on jemalloc's internal chunk caching and retaining functionality, yet implement a variety of chunk management mechanisms and policies. Merge the chunks_[sz]ad_{mmap,dss} red-black trees into chunks_[sz]ad_retained. This slightly reduces how hard jemalloc tries to honor the dss precedence setting; prior to this change the precedence setting was also consulted when recycling chunks. Fix chunk purging. Don't purge chunks in arena_purge_stashed(); instead deallocate them in arena_unstash_purged(), so that the dirty memory linkage remains valid until after the last time it is used. This resolves #176 and #201.	2015-08-03 21:49:02 -07:00
Jason Evans	50883deb6e	Change arena_palloc_large() parameter from size to usize. This change merely documents that arena_palloc_large() always receives usize as its argument.	2015-07-23 17:13:18 -07:00
Jason Evans	5fae7dc1b3	Fix MinGW-related portability issues. Create and use FMT* macros that are equivalent to the PRI* macros that inttypes.h defines. This allows uniform use of the Unix-specific format specifiers, e.g. "%zu", as well as avoiding Windows-specific definitions of e.g. PRIu64. Add ffs()/ffsl() support for compiling with gcc. Extract compatibility definitions of ENOENT, EINVAL, EAGAIN, EPERM, ENOMEM, and ENORANGE into include/msvc_compat/windows_extra.h and use the file for tests as well as for core jemalloc code.	2015-07-23 13:56:25 -07:00
Jason Evans	aa2826621e	Revert to first-best-fit run/chunk allocation. This effectively reverts `97c04a9383` (Use first-fit rather than first-best-fit run/chunk allocation.). In some pathological cases, first-fit search dominates allocation time, and it also tends not to converge as readily on a steady state of memory layout, since precise allocation order has a bigger effect than for first-best-fit.	2015-07-15 17:15:19 -07:00
Jason Evans	0313607e66	Fix MinGW build warnings. Conditionally define ENOENT, EINVAL, etc. (was unconditional). Add/use PRIzu, PRIzd, and PRIzx for use in malloc_printf() calls. gcc issued (harmless) warnings since e.g. "%zu" should be "%Iu" on Windows, and the alternative to this workaround would have been to disable the function attributes which cause gcc to look for type mismatches in formatted printing function calls.	2015-07-07 20:10:28 -07:00
Jason Evans	bce61d61bb	Move a variable declaration closer to its use.	2015-07-07 09:32:05 -07:00
Jason Evans	0a9f9a4d51	Convert arena_maybe_purge() recursion to iteration. This resolves #235.	2015-06-22 18:50:58 -07:00
Jason Evans	5154175cf1	Fix performance regression in arena_palloc(). Pass large allocation requests to arena_malloc() when possible. This regression was introduced by `155bfa7da1` (Normalize size classes.).	2015-05-19 17:42:31 -07:00
Jason Evans	8a03cf039c	Implement cache index randomization for large allocations. Extract szad size quantization into {extent,run}_quantize(), and quantize szad run sizes to the union of valid small region run sizes and large run sizes. Refactor iteration in arena_run_first_fit() to use run_quantize{,_first,_next}(), and add support for padded large runs. For large allocations that have no specified alignment constraints, compute a pseudo-random offset from the beginning of the first backing page that is a multiple of the cache line size. Under typical configurations with 4-KiB pages and 64-byte cache lines this results in a uniform distribution among 64 page boundary offsets. Add the --disable-cache-oblivious option, primarily intended for performance testing. This resolves #13.	2015-05-06 13:27:39 -07:00
Jason Evans	65db63cf3f	Fix in-place shrinking huge reallocation purging bugs. Fix the shrinking case of huge_ralloc_no_move_similar() to purge the correct number of pages, at the correct offset. This regression was introduced by `8d6a3e8321` (Implement dynamic per arena control over dirty page purging.). Fix huge_ralloc_no_move_shrink() to purge the correct number of pages. This bug was introduced by `9673983443` (Purge/zero sub-chunk huge allocations as necessary.).	2015-03-25 19:10:06 -07:00
Jason Evans	562d266511	Add the "stats.arenas.<i>.lg_dirty_mult" mallctl.	2015-03-24 16:41:38 -07:00
Jason Evans	bd16ea49c3	Fix signed/unsigned comparison in arena_lg_dirty_mult_valid().	2015-03-24 15:59:28 -07:00
Jason Evans	8d6a3e8321	Implement dynamic per arena control over dirty page purging. Add mallctls: - arenas.lg_dirty_mult is initialized via opt.lg_dirty_mult, and can be modified to change the initial lg_dirty_mult setting for newly created arenas. - arena.<i>.lg_dirty_mult controls an individual arena's dirty page purging threshold, and synchronously triggers any purging that may be necessary to maintain the constraint. - arena.<i>.chunk.purge allows the per arena dirty page purging function to be replaced. This resolves #93.	2015-03-18 18:55:33 -07:00
Jason Evans	bc45d41d23	Fix a declaration-after-statement regression.	2015-03-11 16:50:40 -07:00
Jason Evans	f5c8f37259	Normalize rdelm/rd structure field naming.	2015-03-10 18:29:49 -07:00
Jason Evans	38e42d311c	Refactor dirty run linkage to reduce sizeof(extent_node_t).	2015-03-10 18:15:40 -07:00
Jason Evans	97c04a9383	Use first-fit rather than first-best-fit run/chunk allocation. This tends to more effectively pack active memory toward low addresses. However, additional tree searches are required in many cases, so whether this change stands the test of time will depend on real-world benchmarks.	2015-03-06 20:21:41 -08:00
Jason Evans	5707d6f952	Quantize szad trees by size class. Treat sizes that round down to the same size class as size-equivalent in trees that are used to search for first best fit, so that there are only as many "firsts" as there are size classes. This comes closer to the ideal of first fit.	2015-03-06 20:21:41 -08:00
Jason Evans	99bd94fb65	Fix chunk cache races. These regressions were introduced by `ee41ad409a` (Integrate whole chunks into unused dirty page purging machinery.).	2015-02-18 16:40:53 -08:00
Jason Evans	738e089a2e	Rename "dirty chunks" to "cached chunks". Rename "dirty chunks" to "cached chunks", in order to avoid overloading the term "dirty". Fix the regression caused by `339c2b23b2` (Fix chunk_unmap() to propagate dirty state.), and actually address what that change attempted, which is to only purge chunks once, and propagate whether zeroed pages resulted into chunk_record().	2015-02-18 01:15:50 -08:00
Jason Evans	339c2b23b2	Fix chunk_unmap() to propagate dirty state. Fix chunk_unmap() to propagate whether a chunk is dirty, and modify dirty chunk purging to record this information so it can be passed to chunk_unmap(). Since the broken version of chunk_unmap() claimed that all chunks were clean, this resulted in potential memory corruption for purging implementations that do not zero (e.g. MADV_FREE). This regression was introduced by `ee41ad409a` (Integrate whole chunks into unused dirty page purging machinery.).	2015-02-17 22:25:56 -08:00
Jason Evans	47701b22ee	arena_chunk_dirty_node_init() --> extent_node_dirty_linkage_init()	2015-02-17 22:23:10 -08:00
Jason Evans	a4e1888d1a	Simplify extent_node_t and add extent_node_init().	2015-02-17 15:13:52 -08:00
Jason Evans	ee41ad409a	Integrate whole chunks into unused dirty page purging machinery. Extend per arena unused dirty page purging to manage unused dirty chunks in addition to unused dirty runs. Rather than immediately unmapping deallocated chunks (or purging them in the --disable-munmap case), store them in a separate set of trees, chunks_[sz]ad_dirty. Preferentially allocate dirty chunks. When excessive unused dirty pages accumulate, purge runs and chunks in integrated LRU order (and unmap chunks in the --enable-munmap case). Refactor extent_node_t to provide accessor functions.	2015-02-16 21:02:17 -08:00
Jason Evans	2195ba4e1f	Normalize _link and link_ fields to all be *_link.	2015-02-15 16:43:52 -08:00
Jason Evans	88fef7ceda	Refactor huge_() calls into arena internals. Make redirects to the huge_() API the arena code's responsibility, since arenas now take responsibility for all allocation sizes.	2015-02-12 14:06:37 -08:00
Jason Evans	cbf3a6d703	Move centralized chunk management into arenas. Migrate all centralized data structures related to huge allocations and recyclable chunks into arena_t, so that each arena can manage huge allocations and recyclable virtual memory completely independently of other arenas. Add chunk node caching to arenas, in order to avoid contention on the base allocator. Use chunks_rtree to look up huge allocations rather than a red-black tree. Maintain a per arena unsorted list of huge allocations (which will be needed to enumerate huge allocations during arena reset). Remove the --enable-ivsalloc option, make ivsalloc() always available, and use it for size queries if --enable-debug is enabled. The only practical implications to this removal are that 1) ivsalloc() is now always available during live debugging (and the underlying radix tree is available during core-based debugging), and 2) size query validation can no longer be enabled independent of --enable-debug. Remove the stats.chunks.{current,total,high} mallctls, and replace their underlying statistics with simpler atomically updated counters used exclusively for gdump triggering. These statistics are no longer very useful because each arena manages chunks independently, and per arena statistics provide similar information. Simplify chunk synchronization code, now that base chunk allocation cannot cause recursive lock acquisition.	2015-02-12 00:15:56 -08:00
Jason Evans	1cb181ed63	Implement explicit tcache support. Add the MALLOCX_TCACHE() and MALLOCX_TCACHE_NONE macros, which can be used in conjunction with the *allocx() API. Add the tcache.create, tcache.flush, and tcache.destroy mallctls. This resolves #145.	2015-02-09 17:44:48 -08:00
Mike Hommey	6505733012	Make opt.lg_dirty_mult work as documented. The documentation for opt.lg_dirty_mult says: "Per-arena minimum ratio (log base 2) of active to dirty pages. Some dirty unused pages may be allowed to accumulate, within the limit set by the ratio (or one chunk worth of dirty pages, whichever is greater) (...)". The restriction in parentheses currently doesn't happen. This makes jemalloc aggressively madvise(), which in turn increases the number of page faults significantly. For instance, this resulted in several(!) hundred(!) milliseconds startup regression on Firefox for Android. This may require further tweaking, but starting with actually doing what the documentation says is a good start.	2015-02-04 07:16:55 +09:00
Jason Evans	4581b97809	Implement metadata statistics. There are three categories of metadata: - Base allocations are used for bootstrap-sensitive internal allocator data structures. - Arena chunk headers comprise pages which track the states of the non-metadata pages. - Internal allocations differ from application-originated allocations in that they are for internal use, and that they are omitted from heap profiles. The metadata statistics comprise the metadata categories as follows: - stats.metadata: All metadata -- base + arena chunk headers + internal allocations. - stats.arenas.<i>.metadata.mapped: Arena chunk headers. - stats.arenas.<i>.metadata.allocated: Internal allocations. This is reported separately from the other metadata statistics because it overlaps with the allocated and active statistics, whereas the other metadata statistics do not. Base allocations are not reported separately, though their magnitude can be computed by subtracting the arena-specific metadata. This resolves #163.	2015-01-23 23:34:43 -08:00
Guilherme Goncalves	9c6a8d3b0c	Move variable declaration to the top of its block for MSVC compatibility.	2014-12-17 14:46:35 -02:00
Guilherme Goncalves	2c5cb613df	Introduce two new modes of junk filling: "alloc" and "free". In addition to true/false, opt.junk can now be either "alloc" or "free", giving applications the possibility of junking memory only on allocation or deallocation. This resolves #172.	2014-12-14 17:07:26 -08:00
Jason Evans	e12eaf93dc	Style and spelling fixes.	2014-12-08 16:34:04 -08:00
Jason Evans	d49cb68b9e	Fix more pointer arithmetic undefined behavior. Reported by Guilherme Gonçalves. This resolves #166.	2014-11-17 10:31:59 -08:00
Jason Evans	2012d5a560	Fix pointer arithmetic undefined behavior. Reported by Denis Denisov.	2014-11-17 09:54:49 -08:00
Jason Evans	2b2f6dc1e4	Disable arena_dirty_count() validation.	2014-11-01 02:29:10 -07:00
Daniel Micay	809b0ac391	Mark huge allocations as unlikely. This cleans up the fast path a bit more by moving away more code.	2014-10-30 17:06:38 -07:00
Jason Evans	af1f592763	Use JEMALLOC_INLINE_C everywhere it's appropriate.	2014-10-30 16:38:08 -07:00
Daniel Micay	a9ea10d27c	Use sized deallocation internally for ralloc. The size of the source allocation is known at this point, so reading the chunk header can be avoided for the small size class fast path. This is not very useful right now, but it provides a significant performance boost with an alternate ralloc entry point taking the old size.	2014-10-16 15:39:59 -04:00
Jason Evans	9b41ac909f	Fix huge allocation statistics.	2014-10-14 22:20:00 -07:00
Jason Evans	3c4d92e82a	Add per size class huge allocation statistics. Add per size class huge allocation statistics, and normalize various stats: - Change the arenas.nlruns type from size_t to unsigned. - Add the arenas.nhchunks and arenas.hchunks.<i>.size mallctl's. - Replace the stats.arenas.<i>.bins.<j>.allocated mallctl with stats.arenas.<i>.bins.<j>.curregs . - Add the stats.arenas.<i>.hchunks.<j>.nmalloc, stats.arenas.<i>.hchunks.<j>.ndalloc, stats.arenas.<i>.hchunks.<j>.nrequests, and stats.arenas.<i>.hchunks.<j>.curhchunks mallctl's.	2014-10-12 23:02:10 -07:00
Jason Evans	381c23dd9d	Remove arena_dalloc_bin_run() clean page preservation. Remove code in arena_dalloc_bin_run() that preserved the "clean" state of trailing clean pages by splitting them into a separate run during deallocation. This was a useful mechanism for reducing dirty page churn when bin runs comprised many pages, but bin runs are now quite small. Remove the nextind field from arena_run_t now that it is no longer needed, and change arena_run_t's bin field (arena_bin_t *) to binind (index_t). These two changes remove 8 bytes of chunk header overhead per page, which saves 1/512 of all arena chunk memory.	2014-10-10 23:01:03 -07:00
Jason Evans	fc0b3b7383	Add configure options. Add: --with-lg-page --with-lg-page-sizes --with-lg-size-class-group --with-lg-quantum Get rid of STATIC_PAGE_SHIFT, in favor of directly setting LG_PAGE. Fix various edge conditions exposed by the configure options.	2014-10-09 22:44:37 -07:00
Jason Evans	8bb3198f72	Refactor/fix arenas manipulation. Abstract arenas access to use arena_get() (or a0get() where appropriate) rather than directly reading e.g. arenas[ind]. Prior to the addition of the arenas.extend mallctl, the worst possible outcome of directly accessing arenas was a stale read, but arenas.extend may allocate and assign a new array to arenas. Add a tsd-based arenas_cache, which amortizes arenas reads. This introduces some subtle bootstrapping issues, with tsd_boot() now being split into tsd_boot[01]() to support tsd wrapper allocation bootstrapping, as well as an arenas_cache_bypass tsd variable which dynamically terminates allocation of arenas_cache itself. Promote a0malloc(), a0calloc(), and a0free() to be generally useful for internal allocation, and use them in several places (more may be appropriate). Abstract arena->nthreads management and fix a missing decrement during thread destruction (recent tsd refactoring left arenas_cleanup() unused). Change arena_choose() to propagate OOM, and handle OOM in all callers. This is important for providing consistent allocation behavior when the MALLOCX_ARENA() flag is being used. Prior to this fix, it was possible for an OOM to result in allocation silently allocating from a different arena than the one specified.	2014-10-07 23:14:57 -07:00
Jason Evans	155bfa7da1	Normalize size classes. Normalize size classes to use the same number of size classes per size doubling (currently hard coded to 4), across the entire range of size classes. Small size classes already used this spacing, but in order to support this change, additional small size classes now fill [4 KiB .. 16 KiB). Large size classes range from [16 KiB .. 4 MiB). Huge size classes now support non-multiples of the chunk size in order to fill (4 MiB .. 16 MiB).	2014-10-06 01:45:13 -07:00
Daniel Micay	a95018ee81	Attempt to expand huge allocations in-place. This adds support for expanding huge allocations in-place by requesting memory at a specific address from the chunk allocator. It's currently only implemented for the chunk recycling path, although in theory it could also be done by optimistically allocating new chunks. On Linux, it could attempt an in-place mremap. However, that won't work in practice since the heap is grown downwards and memory is not unmapped (in a normal build, at least). Repeated vector reallocation micro-benchmark: #include <string.h> #include <stdlib.h> int main(void) { for (size_t i = 0; i < 100; i++) { void *ptr = NULL; size_t old_size = 0; for (size_t size = 4; size < (1 << 30); size *= 2) { ptr = realloc(ptr, size); if (!ptr) return 1; memset(ptr + old_size, 0xff, size - old_size); old_size = size; } free(ptr); } } The glibc allocator fails to do any in-place reallocations on this benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it elides the cost of copies via mremap, which is currently not something that jemalloc can use. With this improvement, jemalloc still fails to do any in-place huge reallocations for the first outer loop, but then succeeds 100% of the time for the remaining 99 iterations. The time spent doing allocations and copies drops down to under 5%, with nearly all of it spent doing purging + faulting (when huge pages are disabled) and the array memset. An improved mremap API (MREMAP_RETAIN - #138) would be far more general but this is a portable optimization and would still be useful on Linux for xallocx. 
Numbers with transparent huge pages enabled: glibc (copies elided via MREMAP_MAYMOVE): 8.471s; jemalloc: 17.816s; jemalloc + no-op madvise: 13.236s; jemalloc + this commit: 6.787s; jemalloc + this commit + no-op madvise: 6.144s. Numbers with transparent huge pages disabled: glibc (copies elided via MREMAP_MAYMOVE): 15.403s; jemalloc: 39.456s; jemalloc + no-op madvise: 12.768s; jemalloc + this commit: 15.534s; jemalloc + this commit + no-op madvise: 6.354s. Closes #137.	2014-10-05 14:47:01 -07:00
Jason Evans	f11a6776c7	Fix OOM-related regression in arena_tcache_fill_small(). Fix an OOM-related regression in arena_tcache_fill_small() that caused cache corruption that would almost certainly expose the application to undefined behavior, usually in the form of an allocation request returning an already-allocated region, or somewhat less likely, a freed region that had already been returned to the arena, thus making it available to the arena for any purpose. This regression was introduced by `9c43c13a35` (Reverse tcache fill order.), and was present in all releases from 2.2.0 through 3.6.0. This resolves #98.	2014-10-05 13:05:10 -07:00
Jason Evans	551ebc4364	Convert to uniform style: cond == false --> !cond	2014-10-03 10:16:09 -07:00
Jason Evans	0c5dd03e88	Move small run metadata into the arena chunk header. Move small run metadata into the arena chunk header, with multiple expected benefits: - Lower run fragmentation due to reduced run sizes; runs are more likely to completely drain when there are fewer total regions. - Improved cache behavior. Prior to this change, run headers were always page-aligned, which put extra pressure on some CPU cache sets. The degree to which this was a problem was hardware dependent, but it likely hurt some even for the most advanced modern hardware. - Buffer overruns/underruns are less likely to corrupt allocator metadata. - Size classes between 4 KiB and 16 KiB become reasonable to support without any special handling, and the runs are small enough that dirty unused pages aren't a significant concern.	2014-09-29 01:31:39 -07:00
Jason Evans	5460aa6f66	Convert all tsd variables to reside in a single tsd structure.	2014-09-23 02:36:08 -07:00
Jason Evans	9c640bfdd4	Apply likely()/unlikely() to allocation/deallocation fast paths.	2014-09-11 17:01:58 -07:00
Jason Evans	b718cf77e9	Optimize [nmd]alloc() fast paths. Optimize [nmd]alloc() fast paths such that the (flags == 0) case is streamlined, flags decoding only happens to the minimum degree necessary, and no conditionals are repeated.	2014-09-07 14:40:19 -07:00
Qinfan Wu	ff6a31d3b9	Refactor chunk map. Break the chunk map into two separate arrays, in order to improve cache locality. This is related to issue #23.	2014-09-04 22:22:52 -07:00
Jason Evans	070b3c3fbd	Fix and refactor runs_dirty-based purging. Fix runs_dirty-based purging to also purge dirty pages in the spare chunk. Refactor runs_dirty manipulation into arena_dirty_{insert,remove}(), and move the arena->ndirty accounting into those functions. Remove the u.ql_link field from arena_chunk_map_t, and get rid of the enclosing union for u.rb_link, since only rb_link remains. Remove the ndirty field from arena_chunk_t.	2014-08-14 14:45:58 -07:00
Qinfan Wu	e8a2fd83a2	arena->npurgatory is no longer needed since we drop arena's lock after stashing all the purgeable runs.	2014-08-12 09:50:01 -07:00
Qinfan Wu	90737fcda1	Remove chunks_dirty tree, nruns_avail and nruns_adjac since we no longer need to maintain the tree for dirty page purging.	2014-08-12 09:50:00 -07:00
Qinfan Wu	e970800c78	Purge dirty pages from the beginning of the dirty list.	2014-08-12 09:50:00 -07:00
Qinfan Wu	a244e5078e	Add dirty page counting for debug	2014-08-12 09:50:00 -07:00
Qinfan Wu	04d60a132b	Maintain all the dirty runs in a linked list for each arena	2014-08-12 09:50:00 -07:00
Jason Evans	1522937e9c	Fix the cactive statistic. Fix the cactive statistic to decrease (rather than increase) when active memory decreases. This regression was introduced by `aa5113b1fd` (Refactor overly large/complex functions) and first released in 3.5.0.	2014-08-06 23:43:39 -07:00
Qinfan Wu	ea73eb8f3e	Reintroduce the comment that was removed in `f9ff603`.	2014-08-06 16:43:01 -07:00
Qinfan Wu	55c9aa1038	Fix the bug that causes not allocating free run with lowest address.	2014-08-06 16:10:08 -07:00
Richard Diamond	9c3a10fdf6	Try to use __builtin_ffsl if ffsl is unavailable. Some platforms (like those using Newlib) don't have ffs/ffsl. This commit adds a check to configure.ac for __builtin_ffsl if ffsl isn't found. __builtin_ffsl performs the same function as ffsl, and has the added benefit of being available on any platform utilizing a GCC-compatible compiler. This change does not address the use of ffs in the MALLOCX_ARENA() macro.	2014-06-02 07:44:50 -07:00
Jason Evans	d04047cc29	Add size class computation capability. Add size class computation capability, currently used only as validation of the size class lookup tables. Generalize the size class spacing used for bins, for eventual use throughout the full range of allocation sizes.	2014-05-28 21:06:46 -07:00
Jason Evans	e2deab7a75	Refactor huge allocation to be managed by arenas. Refactor huge allocation to be managed by arenas (though the global red-black tree of huge allocations remains for lookup during deallocation). This is the logical conclusion of recent changes that 1) made per arena dss precedence apply to huge allocation, and 2) made it possible to replace the per arena chunk allocation/deallocation functions. Remove the top level huge stats, and replace them with per arena huge stats. Normalize function names and types to dalloc (some were dealloc). Remove the --enable-mremap option. As jemalloc currently operates, this is a performance regression for some applications, but planned work to logarithmically space huge size classes should provide similar amortized performance. The motivation for this change was that mremap-based huge reallocation forced leaky abstractions that prevented refactoring.	2014-05-15 22:36:41 -07:00
aravind	fb7fe50a88	Add support for user-specified chunk allocators/deallocators. Add new mallctl endpoints "arena<i>.chunk.alloc" and "arena<i>.chunk.dealloc" to allow userspace to configure jemalloc's chunk allocator and deallocator on a per-arena basis.	2014-05-12 10:46:03 -07:00
Jason Evans	3541a904d6	Refactor small_size2bin and small_bin2size. Refactor small_size2bin and small_bin2size to be inline functions rather than directly accessed arrays.	2014-04-16 17:14:33 -07:00
Jason Evans	3e3caf03af	Merge pull request #73 from bmaurer/smallmalloc Smaller malloc hot path	2014-04-16 16:33:21 -07:00
Ben Maurer	021136ce4d	Create a const array with only a small bin to size map	2014-04-16 14:31:24 -07:00
Jason Evans	bd87b01999	Optimize Valgrind integration. Forcefully disable tcache if running inside Valgrind, and remove Valgrind calls in tcache-specific code. Restructure Valgrind-related code to move most Valgrind calls out of the fast path functions. Take advantage of static knowledge to elide some branches in JEMALLOC_VALGRIND_REALLOC().	2014-04-15 16:49:57 -07:00
Jason Evans	4d434adb14	Make dss non-optional, and fix an "arena.<i>.dss" mallctl bug. Make dss non-optional on all platforms which support sbrk(2). Fix the "arena.<i>.dss" mallctl to return an error if "primary" or "secondary" precedence is specified, but sbrk(2) is not supported.	2014-04-15 12:09:48 -07:00
Jason Evans	9b0cbf0850	Remove support for non-prof-promote heap profiling metadata. Make promotion of sampled small objects to large objects mandatory, so that profiling metadata can always be stored in the chunk map, rather than requiring one pointer per small region in each small-region page run. In practice the non-prof-promote code was only useful when using jemalloc to track all objects and report them as leaks at program exit. However, Valgrind is at least as good a tool for this particular use case. Furthermore, the non-prof-promote code is getting in the way of some optimizations that will make heap profiling much cheaper for the predominant use case (sampling a small representative proportion of all allocations).	2014-04-11 14:24:51 -07:00
Ben Maurer	f9ff60346d	refactoring for bits splitting	2014-04-10 12:43:54 -07:00
Chris Pride	20a8c78bfe	Fix a crashing case where arena_chunk_init_hard returns NULL. This happens when it fails to allocate a new chunk; arena_chunk_alloc then passes the NULL result into arena_avail_insert without any checks, which causes a crash when arena_avail_insert tries to check chunk->ndirty. This was introduced by the refactoring of arena_chunk_alloc, which previously would have returned NULL immediately after calling chunk_alloc. That NULL is now the return value of arena_chunk_init_hard, so we need to check it and not continue if it is NULL.	2014-03-25 22:36:05 -07:00
Erwan Legrand	69e9fbb9c1	Fix typo	2014-02-14 12:48:58 +01:00
Jason Evans	aa5113b1fd	Refactor overly large/complex functions. Refactor overly large functions by breaking out helper functions. Refactor overly complex multi-purpose functions into separate more specific functions.	2014-01-14 16:23:03 -08:00
Jason Evans	b2c31660be	Extract profiling code from [re]allocation functions. Extract profiling code from malloc(), imemalign(), calloc(), realloc(), mallocx(), rallocx(), and xallocx(). This slightly reduces the amount of code compiled into the fast paths, but the primary benefit is the combinatorial complexity reduction. Simplify iralloc[t]() by creating a separate ixalloc() that handles the no-move cases. Further simplify [mrxn]allocx() (and by implication [mrn]allocm()) to make request size overflows due to size class and/or alignment constraints trigger undefined behavior (detected by debug-only assertions). Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling backtrace creation in imemalign(). This bug impacted posix_memalign() and aligned_alloc().	2014-01-12 15:41:05 -08:00
Jason Evans	6b694c4d47	Add junk/zero filling unit tests, and fix discovered bugs. Fix growing large reallocation to junk fill new space. Fix huge deallocation to junk fill when munmap is disabled.	2014-01-07 16:54:17 -08:00
Jason Evans	0d6c5d8bd0	Add quarantine unit tests. Verify that freed regions are quarantined, and that redzone corruption is detected. Introduce a testing idiom for intercepting/replacing internal functions. In this case the replaced function is ordinarily a static function, but the idiom should work similarly for library-private functions.	2013-12-17 15:19:12 -08:00
Jason Evans	6e62984ef6	Don't junk-fill reallocations unless usize changes. Don't junk fill reallocations for which the request size is less than the current usable size, but not enough smaller to cause a size class change. Unlike malloc()/calloc()/realloc(), *allocx() contractually treats the full usize as the allocation, so a caller can ask for zeroed memory via mallocx() and a series of rallocx() calls that all specify MALLOCX_ZERO, and be assured that all newly allocated bytes will be zeroed and made available to the application without danger of allocator mutation until the size class decreases enough to cause usize reduction.	2013-12-15 21:57:09 -08:00
Jason Evans	d82a5e6a34	Implement the *allocx() API. Implement the *allocx() API, which is a successor to the *allocm() API. The *allocx() functions are slightly simpler to use because they have fewer parameters, they directly return the results of primary interest, and mallocx()/rallocx() avoid the strict aliasing pitfall that allocm()/rallocm() share with posix_memalign(). The following code violates strict aliasing rules: foo_t *foo; allocm((void **)&foo, NULL, 42, 0); whereas the following is safe: foo_t *foo; void *p; allocm(&p, NULL, 42, 0); foo = (foo_t *)p; mallocx() does not have this problem: foo_t *foo = (foo_t *)mallocx(42, 0);	2013-12-12 22:35:52 -08:00
Jason Evans	c368f8c8a2	Remove unnecessary zeroing in arena_palloc().	2013-10-29 18:31:17 -07:00
Jason Evans	dda90f59e2	Fix a Valgrind integration flaw. Fix a Valgrind integration flaw that caused Valgrind warnings about reads of uninitialized memory in internal zero-initialized data structures (relevant to tcache and prof code).	2013-10-19 23:48:40 -07:00
Jason Evans	87a02d2bb1	Fix a Valgrind integration flaw. Fix a Valgrind integration flaw that caused Valgrind warnings about reads of uninitialized memory in arena chunk headers.	2013-10-19 21:40:20 -07:00
Jason Evans	88c222c8e9	Fix a prof-related locking order bug. Fix a locking order bug that could cause deadlock during fork if heap profiling were enabled.	2013-02-06 11:59:30 -08:00
Jason Evans	06912756cc	Fix Valgrind integration. Fix Valgrind integration to annotate all internally allocated memory in a way that keeps Valgrind happy about internal data structure access.	2013-01-31 17:02:53 -08:00
Jason Evans	38067483c5	Tighten valgrind integration. Tighten valgrind integration such that immediately after memory is validated or zeroed, valgrind is told to forget the memory's 'defined' state. The only place newly allocated memory should be left marked as 'defined' is in the public functions (e.g. calloc() and realloc()).	2013-01-21 20:04:42 -08:00
Jason Evans	a3b3386ddd	Avoid arena_prof_accum()-related locking when possible. Refactor arena_prof_accum() and its callers to avoid arena locking when prof_interval is 0 (as when profiling is disabled). Reported by Ben Maurer.	2012-11-13 13:47:53 -08:00
Jason Evans	abf6739317	Tweak chunk purge order according to fragmentation. Tweak chunk purge order to purge unfragmented chunks from high to low memory. This facilitates dirty run reuse.	2012-11-07 10:08:34 -08:00
Jason Evans	e3d13060c8	Purge unused dirty pages in a fragmentation-reducing order. Purge unused dirty pages in an order that first performs clean/dirty run defragmentation, in order to mitigate available run fragmentation. Remove the limitation that prevented purging unless at least one chunk worth of dirty pages had accumulated in an arena. This limitation was intended to avoid excessive purging for small applications, but the threshold was arbitrary, and the effect of questionable utility. Relax opt_lg_dirty_mult from 5 to 3. This compensates for increased likelihood of allocating clean runs, given the same ratio of clean:dirty runs, and reduces the potential for repeated purging in pathological large malloc/free loops that push the active:dirty page ratio just over the purge threshold.	2012-11-06 00:59:53 -08:00
Jason Evans	609ae595f0	Add arena-specific and selective dss allocation. Add the "arenas.extend" mallctl, so that it is possible to create new arenas that are outside the set that jemalloc automatically multiplexes threads onto. Add the ALLOCM_ARENA() flag for {,r,d}allocm(), so that it is possible to explicitly allocate from a particular arena. Add the "opt.dss" mallctl, which controls the default precedence of dss allocation relative to mmap allocation. Add the "arena.<i>.dss" mallctl, which makes it possible to set the default dss precedence on a per arena or global basis. Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge". Add the "stats.arenas.<i>.dss" mallctl.	2012-10-12 18:26:16 -07:00
Jason Evans	7de92767c2	Fix mlockall()/madvise() interaction. mlockall(2) can cause purging via madvise(2) to fail. Fix purging code to check whether madvise() succeeded, and base zeroed page metadata on the result. Reported by Olivier Lecomte.	2012-10-08 18:04:49 -07:00
Jason Evans	f1966e1dc7	Update a comment.	2012-05-16 00:35:08 -07:00
Jason Evans	d8ceef6c55	Fix large calloc() zeroing bugs. Refactor code such that arena_mapbits_{large,small}_set() always preserves the unzeroed flag, and manually manipulate the unzeroed flag in the one case where it actually gets reset (in arena_chunk_purge()). This fixes unzeroed preservation bugs in arena_run_split() and arena_ralloc_large_grow(). These bugs caused large calloc() to return non-zeroed memory under some circumstances.	2012-05-10 21:49:43 -07:00
Jason Evans	30fe12b866	Add arena chunk map assertions.	2012-05-10 21:49:43 -07:00
Jason Evans	5b0c99649f	Refactor arena_run_alloc(). Refactor duplicated arena_run_alloc() code into arena_run_alloc_helper().	2012-05-10 21:49:43 -07:00
Jason Evans	80737c3323	Further optimize and harden arena_salloc(). Further optimize arena_salloc() to only look at the binind chunk map bits in the common case. Add more sanity checks to arena_salloc() that detect chunk map inconsistencies for large allocations (whether due to allocator bugs or application bugs).	2012-05-02 16:11:03 -07:00
Jason Evans	203484e2ea	Optimize malloc() and free() fast paths. Embed the bin index for small page runs into the chunk page map, in order to omit [...] in the following dependent load sequence: ptr-->mapelm-->[run-->bin-->]bin_info Move various non-critical code out of the inlined function chain into helper functions (tcache_event_hard(), arena_dalloc_small(), and locking).	2012-05-02 00:30:36 -07:00
Mike Hommey	da99e31105	Replace JEMALLOC_ATTR with various different macros when it makes sense These newly added macros will be used to implement the equivalent under MSVC. Also, move the definitions to headers, where they make more sense, and for some, are even more useful there (e.g. malloc).	2012-04-30 17:57:31 -07:00
Mike Hommey	8b49971d0c	Avoid variable length arrays and remove declarations within code MSVC doesn't support C99, and building as C++ to be able to use them is dangerous, as C++ and C99 are incompatible. Introduce a VARIABLE_ARRAY macro that either uses VLA when supported, or alloca() otherwise. Note that using alloca() inside loops doesn't quite work like VLAs, thus the use of VARIABLE_ARRAY there is discouraged. It might be worth investigating ways to check whether VARIABLE_ARRAY is used in such context at runtime in debug builds and bail out if that happens.	2012-04-29 00:25:34 -07:00
Jason Evans	f54166e7ef	Add missing Valgrind annotations.	2012-04-23 22:41:36 -07:00
Jason Evans	f7088e6c99	Make arena_salloc() an inline function.	2012-04-19 18:28:03 -07:00
Mike Hommey	666c5bf7a8	Add a pages_purge function to wrap madvise(JEMALLOC_MADV_PURGE) calls This will be used to implement the feature on mingw, which doesn't have madvise.	2012-04-18 18:57:48 -07:00
Jason Evans	78f7352259	Clean up a few config-related conditionals/asserts. Clean up a few config-related conditionals to avoid unnecessary dependencies on prof symbols. Use cassert() rather than assert() everywhere that it's appropriate.	2012-04-18 13:38:40 -07:00
Jason Evans	7ca0fdfb85	Disable munmap() if it causes VM map holes. Add a configure test to determine whether common mmap()/munmap() patterns cause VM map holes, and only use munmap() to discard unused chunks if the problem does not exist. Unify the chunk caching for mmap and dss. Fix options processing to limit lg_chunk to be large enough that redzones will always fit.	2012-04-12 20:20:58 -07:00
Jason Evans	5ff709c264	Normalize aligned allocation algorithms. Normalize arena_palloc(), chunk_alloc_mmap_slow(), and chunk_recycle_dss() to use the same algorithm for trimming over-allocation. Add the ALIGNMENT_ADDR2BASE(), ALIGNMENT_ADDR2OFFSET(), and ALIGNMENT_CEILING() macros, and use them where appropriate. Remove the run_size_p parameter from sa2u(). Fix a potential deadlock in chunk_recycle_dss() that was introduced by `eae269036c` (Add alignment support to chunk_alloc()).	2012-04-11 18:13:45 -07:00
Jason Evans	122449b073	Implement Valgrind support, redzones, and quarantine. Implement Valgrind support, as well as the redzone and quarantine features, which help Valgrind detect memory errors. Redzones are only implemented for small objects because the changes necessary to support redzones around large and huge objects are complicated by in-place reallocation, to the point that it isn't clear that the maintenance burden is worth the incremental improvement to Valgrind support. Merge arena_salloc() and arena_salloc_demote(). Refactor i[v]salloc() to expose the 'demote' option.	2012-04-11 11:46:18 -07:00

267 Commits