server-skynet-source-3rd-jemalloc

project-base/server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Jason Evans	a4e1888d1a	Simplify extent_node_t and add extent_node_init().	2015-02-17 15:13:52 -08:00
Jason Evans	ee41ad409a	Integrate whole chunks into unused dirty page purging machinery. Extend per arena unused dirty page purging to manage unused dirty chunks in aaddtion to unused dirty runs. Rather than immediately unmapping deallocated chunks (or purging them in the --disable-munmap case), store them in a separate set of trees, chunks_[sz]ad_dirty. Preferrentially allocate dirty chunks. When excessive unused dirty pages accumulate, purge runs and chunks in ingegrated LRU order (and unmap chunks in the --enable-munmap case). Refactor extent_node_t to provide accessor functions.	2015-02-16 21:02:17 -08:00
Jason Evans	2195ba4e1f	Normalize _link and link_ fields to all be *_link.	2015-02-15 16:43:52 -08:00
Jason Evans	88fef7ceda	Refactor huge_() calls into arena internals. Make redirects to the huge_() API the arena code's responsibility, since arenas now take responsibility for all allocation sizes.	2015-02-12 14:06:37 -08:00
Jason Evans	cbf3a6d703	Move centralized chunk management into arenas. Migrate all centralized data structures related to huge allocations and recyclable chunks into arena_t, so that each arena can manage huge allocations and recyclable virtual memory completely independently of other arenas. Add chunk node caching to arenas, in order to avoid contention on the base allocator. Use chunks_rtree to look up huge allocations rather than a red-black tree. Maintain a per arena unsorted list of huge allocations (which will be needed to enumerate huge allocations during arena reset). Remove the --enable-ivsalloc option, make ivsalloc() always available, and use it for size queries if --enable-debug is enabled. The only practical implications to this removal are that 1) ivsalloc() is now always available during live debugging (and the underlying radix tree is available during core-based debugging), and 2) size query validation can no longer be enabled independent of --enable-debug. Remove the stats.chunks.{current,total,high} mallctls, and replace their underlying statistics with simpler atomically updated counters used exclusively for gdump triggering. These statistics are no longer very useful because each arena manages chunks independently, and per arena statistics provide similar information. Simplify chunk synchronization code, now that base chunk allocation cannot cause recursive lock acquisition.	2015-02-12 00:15:56 -08:00
Jason Evans	1cb181ed63	Implement explicit tcache support. Add the MALLOCX_TCACHE() and MALLOCX_TCACHE_NONE macros, which can be used in conjunction with the *allocx() API. Add the tcache.create, tcache.flush, and tcache.destroy mallctls. This resolves #145.	2015-02-09 17:44:48 -08:00
Mike Hommey	6505733012	Make opt.lg_dirty_mult work as documented The documentation for opt.lg_dirty_mult says: Per-arena minimum ratio (log base 2) of active to dirty pages. Some dirty unused pages may be allowed to accumulate, within the limit set by the ratio (or one chunk worth of dirty pages, whichever is greater) (...) The restriction in parentheses currently doesn't happen. This makes jemalloc aggressively madvise(), which in turns increases the amount of page faults significantly. For instance, this resulted in several(!) hundred(!) milliseconds startup regression on Firefox for Android. This may require further tweaking, but starting with actually doing what the documentation says is a good start.	2015-02-04 07:16:55 +09:00
Jason Evans	4581b97809	Implement metadata statistics. There are three categories of metadata: - Base allocations are used for bootstrap-sensitive internal allocator data structures. - Arena chunk headers comprise pages which track the states of the non-metadata pages. - Internal allocations differ from application-originated allocations in that they are for internal use, and that they are omitted from heap profiles. The metadata statistics comprise the metadata categories as follows: - stats.metadata: All metadata -- base + arena chunk headers + internal allocations. - stats.arenas.<i>.metadata.mapped: Arena chunk headers. - stats.arenas.<i>.metadata.allocated: Internal allocations. This is reported separately from the other metadata statistics because it overlaps with the allocated and active statistics, whereas the other metadata statistics do not. Base allocations are not reported separately, though their magnitude can be computed by subtracting the arena-specific metadata. This resolves #163.	2015-01-23 23:34:43 -08:00
Guilherme Goncalves	9c6a8d3b0c	Move variable declaration to the top its block for MSVC compatibility.	2014-12-17 14:46:35 -02:00
Guilherme Goncalves	2c5cb613df	Introduce two new modes of junk filling: "alloc" and "free". In addition to true/false, opt.junk can now be either "alloc" or "free", giving applications the possibility of junking memory only on allocation or deallocation. This resolves #172.	2014-12-14 17:07:26 -08:00
Jason Evans	e12eaf93dc	Style and spelling fixes.	2014-12-08 16:34:04 -08:00
Jason Evans	d49cb68b9e	Fix more pointer arithmetic undefined behavior. Reported by Guilherme Gonçalves. This resolves #166.	2014-11-17 10:31:59 -08:00
Jason Evans	2012d5a560	Fix pointer arithmetic undefined behavior. Reported by Denis Denisov.	2014-11-17 09:54:49 -08:00
Jason Evans	2b2f6dc1e4	Disable arena_dirty_count() validation.	2014-11-01 02:29:10 -07:00
Daniel Micay	809b0ac391	mark huge allocations as unlikely This cleans up the fast path a bit more by moving away more code.	2014-10-30 17:06:38 -07:00
Jason Evans	af1f592763	Use JEMALLOC_INLINE_C everywhere it's appropriate.	2014-10-30 16:38:08 -07:00
Daniel Micay	a9ea10d27c	use sized deallocation internally for ralloc The size of the source allocation is known at this point, so reading the chunk header can be avoided for the small size class fast path. This is not very useful right now, but it provides a significant performance boost with an alternate ralloc entry point taking the old size.	2014-10-16 15:39:59 -04:00
Jason Evans	9b41ac909f	Fix huge allocation statistics.	2014-10-14 22:20:00 -07:00
Jason Evans	3c4d92e82a	Add per size class huge allocation statistics. Add per size class huge allocation statistics, and normalize various stats: - Change the arenas.nlruns type from size_t to unsigned. - Add the arenas.nhchunks and arenas.hchunks.<i>.size mallctl's. - Replace the stats.arenas.<i>.bins.<j>.allocated mallctl with stats.arenas.<i>.bins.<j>.curregs . - Add the stats.arenas.<i>.hchunks.<j>.nmalloc, stats.arenas.<i>.hchunks.<j>.ndalloc, stats.arenas.<i>.hchunks.<j>.nrequests, and stats.arenas.<i>.hchunks.<j>.curhchunks mallctl's.	2014-10-12 23:02:10 -07:00
Jason Evans	381c23dd9d	Remove arena_dalloc_bin_run() clean page preservation. Remove code in arena_dalloc_bin_run() that preserved the "clean" state of trailing clean pages by splitting them into a separate run during deallocation. This was a useful mechanism for reducing dirty page churn when bin runs comprised many pages, but bin runs are now quite small. Remove the nextind field from arena_run_t now that it is no longer needed, and change arena_run_t's bin field (arena_bin_t *) to binind (index_t). These two changes remove 8 bytes of chunk header overhead per page, which saves 1/512 of all arena chunk memory.	2014-10-10 23:01:03 -07:00
Jason Evans	fc0b3b7383	Add configure options. Add: --with-lg-page --with-lg-page-sizes --with-lg-size-class-group --with-lg-quantum Get rid of STATIC_PAGE_SHIFT, in favor of directly setting LG_PAGE. Fix various edge conditions exposed by the configure options.	2014-10-09 22:44:37 -07:00
Jason Evans	8bb3198f72	Refactor/fix arenas manipulation. Abstract arenas access to use arena_get() (or a0get() where appropriate) rather than directly reading e.g. arenas[ind]. Prior to the addition of the arenas.extend mallctl, the worst possible outcome of directly accessing arenas was a stale read, but arenas.extend may allocate and assign a new array to arenas. Add a tsd-based arenas_cache, which amortizes arenas reads. This introduces some subtle bootstrapping issues, with tsd_boot() now being split into tsd_boot[01]() to support tsd wrapper allocation bootstrapping, as well as an arenas_cache_bypass tsd variable which dynamically terminates allocation of arenas_cache itself. Promote a0malloc(), a0calloc(), and a0free() to be generally useful for internal allocation, and use them in several places (more may be appropriate). Abstract arena->nthreads management and fix a missing decrement during thread destruction (recent tsd refactoring left arenas_cleanup() unused). Change arena_choose() to propagate OOM, and handle OOM in all callers. This is important for providing consistent allocation behavior when the MALLOCX_ARENA() flag is being used. Prior to this fix, it was possible for an OOM to result in allocation silently allocating from a different arena than the one specified.	2014-10-07 23:14:57 -07:00
Jason Evans	155bfa7da1	Normalize size classes. Normalize size classes to use the same number of size classes per size doubling (currently hard coded to 4), across the intire range of size classes. Small size classes already used this spacing, but in order to support this change, additional small size classes now fill [4 KiB .. 16 KiB). Large size classes range from [16 KiB .. 4 MiB). Huge size classes now support non-multiples of the chunk size in order to fill (4 MiB .. 16 MiB).	2014-10-06 01:45:13 -07:00
Daniel Micay	a95018ee81	Attempt to expand huge allocations in-place. This adds support for expanding huge allocations in-place by requesting memory at a specific address from the chunk allocator. It's currently only implemented for the chunk recycling path, although in theory it could also be done by optimistically allocating new chunks. On Linux, it could attempt an in-place mremap. However, that won't work in practice since the heap is grown downwards and memory is not unmapped (in a normal build, at least). Repeated vector reallocation micro-benchmark: #include <string.h> #include <stdlib.h> int main(void) { for (size_t i = 0; i < 100; i++) { void ptr = NULL; size_t old_size = 0; for (size_t size = 4; size < (1 << 30); size = 2) { ptr = realloc(ptr, size); if (!ptr) return 1; memset(ptr + old_size, 0xff, size - old_size); old_size = size; } free(ptr); } } The glibc allocator fails to do any in-place reallocations on this benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it elides the cost of copies via mremap, which is currently not something that jemalloc can use. With this improvement, jemalloc still fails to do any in-place huge reallocations for the first outer loop, but then succeeds 100% of the time for the remaining 99 iterations. The time spent doing allocations and copies drops down to under 5%, with nearly all of it spent doing purging + faulting (when huge pages are disabled) and the array memset. An improved mremap API (MREMAP_RETAIN - #138) would be far more general but this is a portable optimization and would still be useful on Linux for xallocx. Numbers with transparent huge pages enabled: glibc (copies elided via MREMAP_MAYMOVE): 8.471s jemalloc: 17.816s jemalloc + no-op madvise: 13.236s jemalloc + this commit: 6.787s jemalloc + this commit + no-op madvise: 6.144s Numbers with transparent huge pages disabled: glibc (copies elided via MREMAP_MAYMOVE): 15.403s jemalloc: 39.456s jemalloc + no-op madvise: 12.768s jemalloc + this commit: 15.534s jemalloc + this commit + no-op madvise: 6.354s Closes #137	2014-10-05 14:47:01 -07:00
Jason Evans	f11a6776c7	Fix OOM-related regression in arena_tcache_fill_small(). Fix an OOM-related regression in arena_tcache_fill_small() that caused cache corruption that would almost certainly expose the application to undefined behavior, usually in the form of an allocation request returning an already-allocated region, or somewhat less likely, a freed region that had already been returned to the arena, thus making it available to the arena for any purpose. This regression was introduced by `9c43c13a35` (Reverse tcache fill order.), and was present in all releases from 2.2.0 through 3.6.0. This resolves #98.	2014-10-05 13:05:10 -07:00
Jason Evans	551ebc4364	Convert to uniform style: cond == false --> !cond	2014-10-03 10:16:09 -07:00
Jason Evans	0c5dd03e88	Move small run metadata into the arena chunk header. Move small run metadata into the arena chunk header, with multiple expected benefits: - Lower run fragmentation due to reduced run sizes; runs are more likely to completely drain when there are fewer total regions. - Improved cache behavior. Prior to this change, run headers were always page-aligned, which put extra pressure on some CPU cache sets. The degree to which this was a problem was hardware dependent, but it likely hurt some even for the most advanced modern hardware. - Buffer overruns/underruns are less likely to corrupt allocator metadata. - Size classes between 4 KiB and 16 KiB become reasonable to support without any special handling, and the runs are small enough that dirty unused pages aren't a significant concern.	2014-09-29 01:31:39 -07:00
Jason Evans	5460aa6f66	Convert all tsd variables to reside in a single tsd structure.	2014-09-23 02:36:08 -07:00
Jason Evans	9c640bfdd4	Apply likely()/unlikely() to allocation/deallocation fast paths.	2014-09-11 17:01:58 -07:00
Jason Evans	b718cf77e9	Optimize [nmd]alloc() fast paths. Optimize [nmd]alloc() fast paths such that the (flags == 0) case is streamlined, flags decoding only happens to the minimum degree necessary, and no conditionals are repeated.	2014-09-07 14:40:19 -07:00
Qinfan Wu	ff6a31d3b9	Refactor chunk map. Break the chunk map into two separate arrays, in order to improve cache locality. This is related to issue #23.	2014-09-04 22:22:52 -07:00
Jason Evans	070b3c3fbd	Fix and refactor runs_dirty-based purging. Fix runs_dirty-based purging to also purge dirty pages in the spare chunk. Refactor runs_dirty manipulation into arena_dirty_{insert,remove}(), and move the arena->ndirty accounting into those functions. Remove the u.ql_link field from arena_chunk_map_t, and get rid of the enclosing union for u.rb_link, since only rb_link remains. Remove the ndirty field from arena_chunk_t.	2014-08-14 14:45:58 -07:00
Qinfan Wu	e8a2fd83a2	arena->npurgatory is no longer needed since we drop arena's lock after stashing all the purgeable runs.	2014-08-12 09:50:01 -07:00
Qinfan Wu	90737fcda1	Remove chunks_dirty tree, nruns_avail and nruns_adjac since we no longer need to maintain the tree for dirty page purging.	2014-08-12 09:50:00 -07:00
Qinfan Wu	e970800c78	Purge dirty pages from the beginning of the dirty list.	2014-08-12 09:50:00 -07:00
Qinfan Wu	a244e5078e	Add dirty page counting for debug	2014-08-12 09:50:00 -07:00
Qinfan Wu	04d60a132b	Maintain all the dirty runs in a linked list for each arena	2014-08-12 09:50:00 -07:00
Jason Evans	1522937e9c	Fix the cactive statistic. Fix the cactive statistic to decrease (rather than increase) when active memory decreases. This regression was introduced by `aa5113b1fd` (Refactor overly large/complex functions) and first released in 3.5.0.	2014-08-06 23:43:39 -07:00
Qinfan Wu	ea73eb8f3e	Reintroduce the comment that was removed in `f9ff603`.	2014-08-06 16:43:01 -07:00
Qinfan Wu	55c9aa1038	Fix the bug that causes not allocating free run with lowest address.	2014-08-06 16:10:08 -07:00
Richard Diamond	9c3a10fdf6	Try to use __builtin_ffsl if ffsl is unavailable. Some platforms (like those using Newlib) don't have ffs/ffsl. This commit adds a check to configure.ac for __builtin_ffsl if ffsl isn't found. __builtin_ffsl performs the same function as ffsl, and has the added benefit of being available on any platform utilizing Gcc-compatible compiler. This change does not address the used of ffs in the MALLOCX_ARENA() macro.	2014-06-02 07:44:50 -07:00
Jason Evans	d04047cc29	Add size class computation capability. Add size class computation capability, currently used only as validation of the size class lookup tables. Generalize the size class spacing used for bins, for eventual use throughout the full range of allocation sizes.	2014-05-28 21:06:46 -07:00
Jason Evans	e2deab7a75	Refactor huge allocation to be managed by arenas. Refactor huge allocation to be managed by arenas (though the global red-black tree of huge allocations remains for lookup during deallocation). This is the logical conclusion of recent changes that 1) made per arena dss precedence apply to huge allocation, and 2) made it possible to replace the per arena chunk allocation/deallocation functions. Remove the top level huge stats, and replace them with per arena huge stats. Normalize function names and types to dalloc (some were dealloc). Remove the --enable-mremap option. As jemalloc currently operates, this is a performace regression for some applications, but planned work to logarithmically space huge size classes should provide similar amortized performance. The motivation for this change was that mremap-based huge reallocation forced leaky abstractions that prevented refactoring.	2014-05-15 22:36:41 -07:00
aravind	fb7fe50a88	Add support for user-specified chunk allocators/deallocators. Add new mallctl endpoints "arena<i>.chunk.alloc" and "arena<i>.chunk.dealloc" to allow userspace to configure jemalloc's chunk allocator and deallocator on a per-arena basis.	2014-05-12 10:46:03 -07:00
Jason Evans	3541a904d6	Refactor small_size2bin and small_bin2size. Refactor small_size2bin and small_bin2size to be inline functions rather than directly accessed arrays.	2014-04-16 17:14:33 -07:00
Jason Evans	3e3caf03af	Merge pull request #73 from bmaurer/smallmalloc Smaller malloc hot path	2014-04-16 16:33:21 -07:00
Ben Maurer	021136ce4d	Create a const array with only a small bin to size map	2014-04-16 14:31:24 -07:00
Jason Evans	bd87b01999	Optimize Valgrind integration. Forcefully disable tcache if running inside Valgrind, and remove Valgrind calls in tcache-specific code. Restructure Valgrind-related code to move most Valgrind calls out of the fast path functions. Take advantage of static knowledge to elide some branches in JEMALLOC_VALGRIND_REALLOC().	2014-04-15 16:49:57 -07:00
Jason Evans	4d434adb14	Make dss non-optional, and fix an "arena.<i>.dss" mallctl bug. Make dss non-optional on all platforms which support sbrk(2). Fix the "arena.<i>.dss" mallctl to return an error if "primary" or "secondary" precedence is specified, but sbrk(2) is not supported.	2014-04-15 12:09:48 -07:00
Jason Evans	9b0cbf0850	Remove support for non-prof-promote heap profiling metadata. Make promotion of sampled small objects to large objects mandatory, so that profiling metadata can always be stored in the chunk map, rather than requiring one pointer per small region in each small-region page run. In practice the non-prof-promote code was only useful when using jemalloc to track all objects and report them as leaks at program exit. However, Valgrind is at least as good a tool for this particular use case. Furthermore, the non-prof-promote code is getting in the way of some optimizations that will make heap profiling much cheaper for the predominant use case (sampling a small representative proportion of all allocations).	2014-04-11 14:24:51 -07:00

1 2 3

102 Commits