server-skynet-source-3rd-jemalloc

project-base/server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Jason Evans	8fadb1a8c2	Implement chunk hook support for page run commit/decommit. Cascade from decommit to purge when purging unused dirty pages, so that it is possible to decommit cleaned memory rather than just purging. For non-Windows debug builds, decommit runs rather than purging them, since this causes access of deallocated runs to segfault. This resolves #251.	2015-08-07 00:50:58 -07:00
Jason Evans	b49a334a64	Generalize chunk management hooks. Add the "arena.<i>.chunk_hooks" mallctl, which replaces and expands on the "arena.<i>.chunk.{alloc,dalloc,purge}" mallctls. The chunk hooks allow control over chunk allocation/deallocation, decommit/commit, purging, and splitting/merging, such that the application can rely on jemalloc's internal chunk caching and retaining functionality, yet implement a variety of chunk management mechanisms and policies. Merge the chunks_[sz]ad_{mmap,dss} red-black trees into chunks_[sz]ad_retained. This slightly reduces how hard jemalloc tries to honor the dss precedence setting; prior to this change the precedence setting was also consulted when recycling chunks. Fix chunk purging. Don't purge chunks in arena_purge_stashed(); instead deallocate them in arena_unstash_purged(), so that the dirty memory linkage remains valid until after the last time it is used. This resolves #176 and #201.	2015-08-03 21:49:02 -07:00
Jason Evans	4acd75a694	Add the "stats.allocated" mallctl.	2015-03-23 17:26:53 -07:00
Jason Evans	8d6a3e8321	Implement dynamic per arena control over dirty page purging. Add mallctls: - arenas.lg_dirty_mult is initialized via opt.lg_dirty_mult, and can be modified to change the initial lg_dirty_mult setting for newly created arenas. - arena.<i>.lg_dirty_mult controls an individual arena's dirty page purging threshold, and synchronously triggers any purging that may be necessary to maintain the constraint. - arena.<i>.chunk.purge allows the per arena dirty page purging function to be replaced. This resolves #93.	2015-03-18 18:55:33 -07:00
Jason Evans	38e42d311c	Refactor dirty run linkage to reduce sizeof(extent_node_t).	2015-03-10 18:15:40 -07:00
Jason Evans	99bd94fb65	Fix chunk cache races. These regressions were introduced by `ee41ad409a` (Integrate whole chunks into unused dirty page purging machinery.).	2015-02-18 16:40:53 -08:00
Jason Evans	738e089a2e	Rename "dirty chunks" to "cached chunks". Rename "dirty chunks" to "cached chunks", in order to avoid overloading the term "dirty". Fix the regression caused by `339c2b23b2` (Fix chunk_unmap() to propagate dirty state.), and actually address what that change attempted, which is to only purge chunks once, and propagate whether zeroed pages resulted into chunk_record().	2015-02-18 01:15:50 -08:00
Jason Evans	47701b22ee	arena_chunk_dirty_node_init() --> extent_node_dirty_linkage_init()	2015-02-17 22:23:10 -08:00
Jason Evans	a4e1888d1a	Simplify extent_node_t and add extent_node_init().	2015-02-17 15:13:52 -08:00
Jason Evans	ee41ad409a	Integrate whole chunks into unused dirty page purging machinery. Extend per arena unused dirty page purging to manage unused dirty chunks in aaddtion to unused dirty runs. Rather than immediately unmapping deallocated chunks (or purging them in the --disable-munmap case), store them in a separate set of trees, chunks_[sz]ad_dirty. Preferrentially allocate dirty chunks. When excessive unused dirty pages accumulate, purge runs and chunks in ingegrated LRU order (and unmap chunks in the --enable-munmap case). Refactor extent_node_t to provide accessor functions.	2015-02-16 21:02:17 -08:00
Jason Evans	cbf3a6d703	Move centralized chunk management into arenas. Migrate all centralized data structures related to huge allocations and recyclable chunks into arena_t, so that each arena can manage huge allocations and recyclable virtual memory completely independently of other arenas. Add chunk node caching to arenas, in order to avoid contention on the base allocator. Use chunks_rtree to look up huge allocations rather than a red-black tree. Maintain a per arena unsorted list of huge allocations (which will be needed to enumerate huge allocations during arena reset). Remove the --enable-ivsalloc option, make ivsalloc() always available, and use it for size queries if --enable-debug is enabled. The only practical implications to this removal are that 1) ivsalloc() is now always available during live debugging (and the underlying radix tree is available during core-based debugging), and 2) size query validation can no longer be enabled independent of --enable-debug. Remove the stats.chunks.{current,total,high} mallctls, and replace their underlying statistics with simpler atomically updated counters used exclusively for gdump triggering. These statistics are no longer very useful because each arena manages chunks independently, and per arena statistics provide similar information. Simplify chunk synchronization code, now that base chunk allocation cannot cause recursive lock acquisition.	2015-02-12 00:15:56 -08:00
Jason Evans	1cb181ed63	Implement explicit tcache support. Add the MALLOCX_TCACHE() and MALLOCX_TCACHE_NONE macros, which can be used in conjunction with the *allocx() API. Add the tcache.create, tcache.flush, and tcache.destroy mallctls. This resolves #145.	2015-02-09 17:44:48 -08:00
Jason Evans	8d0e04d42f	Refactor rtree to be lock-free. Recent huge allocation refactoring associates huge allocations with arenas, but it remains necessary to quickly look up huge allocation metadata during reallocation/deallocation. A global radix tree remains a good solution to this problem, but locking would have become the primary bottleneck after (upcoming) migration of chunk management from global to per arena data structures. This lock-free implementation uses double-checked reads to traverse the tree, so that in the steady state, each read or write requires only a single atomic operation. This implementation also assures that no more than two tree levels actually exist, through a combination of careful virtual memory allocation which makes large sparse nodes cheap, and skipping the root node on x64 (possible because the top 16 bits are all 0 in practice).	2015-02-04 16:51:53 -08:00
Jason Evans	f500a10b2e	Refactor base_alloc() to guarantee demand-zeroed memory. Refactor base_alloc() to guarantee that allocations are carved from demand-zeroed virtual memory. This supports sparse data structures such as multi-page radix tree nodes. Enhance base_alloc() to keep track of fragments which were too small to support previous allocation requests, and try to consume them during subsequent requests. This becomes important when request sizes commonly approach or exceed the chunk size (as could radix tree node allocations).	2015-02-04 16:51:53 -08:00
Jason Evans	a55dfa4b0a	Implement more atomic operations. - atomic__p(). - atomic_cas_(). - atomic_write_*().	2015-02-04 16:50:05 -08:00
Jason Evans	5b8ed5b7c9	Implement the prof.gdump mallctl. This feature makes it possible to toggle the gdump feature on/off during program execution, whereas the the opt.prof_dump mallctl value can only be set during program startup. This resolves #72.	2015-01-25 21:21:35 -08:00
Jason Evans	4581b97809	Implement metadata statistics. There are three categories of metadata: - Base allocations are used for bootstrap-sensitive internal allocator data structures. - Arena chunk headers comprise pages which track the states of the non-metadata pages. - Internal allocations differ from application-originated allocations in that they are for internal use, and that they are omitted from heap profiles. The metadata statistics comprise the metadata categories as follows: - stats.metadata: All metadata -- base + arena chunk headers + internal allocations. - stats.arenas.<i>.metadata.mapped: Arena chunk headers. - stats.arenas.<i>.metadata.allocated: Internal allocations. This is reported separately from the other metadata statistics because it overlaps with the allocated and active statistics, whereas the other metadata statistics do not. Base allocations are not reported separately, though their magnitude can be computed by subtracting the arena-specific metadata. This resolves #163.	2015-01-23 23:34:43 -08:00
Jason Evans	10aff3f3e1	Refactor bootstrapping to delay tsd initialization. Refactor bootstrapping to delay tsd initialization, primarily to support integration with FreeBSD's libc. Refactor a0() for internal-only use, and add the bootstrap_{malloc,calloc,free}() API for use by FreeBSD's libc. This separation limits use of the a0() functions to metadata allocation, which doesn't require malloc/calloc/free API compatibility. This resolves #170.	2015-01-22 14:04:27 -08:00
Abhishek Kulkarni	b617df81bb	Add missing symbols to private_symbols.txt. This resolves #185.	2015-01-21 12:44:35 -08:00
Guilherme Goncalves	2c5cb613df	Introduce two new modes of junk filling: "alloc" and "free". In addition to true/false, opt.junk can now be either "alloc" or "free", giving applications the possibility of junking memory only on allocation or deallocation. This resolves #172.	2014-12-14 17:07:26 -08:00
Jason Evans	9cf2be0a81	Make quarantine_init() static.	2014-11-07 14:50:38 -08:00
Jason Evans	c002a5c800	Fix two quarantine regressions. Fix quarantine to actually update tsd when expanding, and to avoid double initialization (leaking the first quarantine) due to recursive initialization. This resolves #161.	2014-11-04 18:03:11 -08:00
Jason Evans	9b41ac909f	Fix huge allocation statistics.	2014-10-14 22:20:00 -07:00
Jason Evans	fc0b3b7383	Add configure options. Add: --with-lg-page --with-lg-page-sizes --with-lg-size-class-group --with-lg-quantum Get rid of STATIC_PAGE_SHIFT, in favor of directly setting LG_PAGE. Fix various edge conditions exposed by the configure options.	2014-10-09 22:44:37 -07:00
Jason Evans	8bb3198f72	Refactor/fix arenas manipulation. Abstract arenas access to use arena_get() (or a0get() where appropriate) rather than directly reading e.g. arenas[ind]. Prior to the addition of the arenas.extend mallctl, the worst possible outcome of directly accessing arenas was a stale read, but arenas.extend may allocate and assign a new array to arenas. Add a tsd-based arenas_cache, which amortizes arenas reads. This introduces some subtle bootstrapping issues, with tsd_boot() now being split into tsd_boot[01]() to support tsd wrapper allocation bootstrapping, as well as an arenas_cache_bypass tsd variable which dynamically terminates allocation of arenas_cache itself. Promote a0malloc(), a0calloc(), and a0free() to be generally useful for internal allocation, and use them in several places (more may be appropriate). Abstract arena->nthreads management and fix a missing decrement during thread destruction (recent tsd refactoring left arenas_cleanup() unused). Change arena_choose() to propagate OOM, and handle OOM in all callers. This is important for providing consistent allocation behavior when the MALLOCX_ARENA() flag is being used. Prior to this fix, it was possible for an OOM to result in allocation silently allocating from a different arena than the one specified.	2014-10-07 23:14:57 -07:00
Jason Evans	155bfa7da1	Normalize size classes. Normalize size classes to use the same number of size classes per size doubling (currently hard coded to 4), across the intire range of size classes. Small size classes already used this spacing, but in order to support this change, additional small size classes now fill [4 KiB .. 16 KiB). Large size classes range from [16 KiB .. 4 MiB). Huge size classes now support non-multiples of the chunk size in order to fill (4 MiB .. 16 MiB).	2014-10-06 01:45:13 -07:00
Jason Evans	029d44cf8b	Fix tsd cleanup regressions. Fix tsd cleanup regressions that were introduced in `5460aa6f66` (Convert all tsd variables to reside in a single tsd structure.). These regressions were twofold: 1) tsd_tryget() should never (and need never) return NULL. Rename it to tsd_fetch() and simplify all callers. 2) tsd__set() must only be called when tsd is in the nominal state, because cleanup happens during the nominal-->purgatory transition, and re-initialization must not happen while in the purgatory state. Add tsd_nominal() and use it as needed. Note that tsd_{p,}_get() can still be used as long as no re-initialization that would require cleanup occurs. This means that e.g. the thread_allocated counter can be updated unconditionally.	2014-10-04 11:22:55 -07:00
Jason Evans	fc12c0b8bc	Implement/test/fix prof-related mallctl's. Implement/test/fix the opt.prof_thread_active_init, prof.thread_active_init, and thread.prof.active mallctl's. Test/fix the thread.prof.name mallctl. Refactor opt_prof_active to be read-only and move mutable state into the prof_active variable. Stop leaning on ctl-related locking for protection.	2014-10-03 23:25:30 -07:00
Jason Evans	20c31deaae	Test prof.reset mallctl and fix numerous discovered bugs.	2014-10-02 23:01:10 -07:00
Jason Evans	0c5dd03e88	Move small run metadata into the arena chunk header. Move small run metadata into the arena chunk header, with multiple expected benefits: - Lower run fragmentation due to reduced run sizes; runs are more likely to completely drain when there are fewer total regions. - Improved cache behavior. Prior to this change, run headers were always page-aligned, which put extra pressure on some CPU cache sets. The degree to which this was a problem was hardware dependent, but it likely hurt some even for the most advanced modern hardware. - Buffer overruns/underruns are less likely to corrupt allocator metadata. - Size classes between 4 KiB and 16 KiB become reasonable to support without any special handling, and the runs are small enough that dirty unused pages aren't a significant concern.	2014-09-29 01:31:39 -07:00
Jason Evans	5460aa6f66	Convert all tsd variables to reside in a single tsd structure.	2014-09-23 02:36:08 -07:00
Jason Evans	6e73dc194e	Fix a profile sampling race. Fix a profile sampling race that was due to preparing to sample, yet doing nothing to assure that the context remains valid until the stats are updated. These regressions were caused by `602c8e0971` (Implement per thread heap profiling.), which did not make it into any releases prior to these fixes.	2014-09-09 19:47:09 -07:00
Daniel Micay	4cfe55166e	Add support for sized deallocation. This adds a new `sdallocx` function to the external API, allowing the size to be passed by the caller. It avoids some extra reads in the thread cache fast path. In the case where stats are enabled, this avoids the work of calculating the size from the pointer. An assertion validates the size that's passed in, so enabling debugging will allow users of the API to debug cases where an incorrect size is passed in. The performance win for a contrived microbenchmark doing an allocation and immediately freeing it is ~10%. It may have a different impact on a real workload. Closes #28	2014-09-08 17:34:24 -07:00
Jason Evans	b718cf77e9	Optimize [nmd]alloc() fast paths. Optimize [nmd]alloc() fast paths such that the (flags == 0) case is streamlined, flags decoding only happens to the minimum degree necessary, and no conditionals are repeated.	2014-09-07 14:40:19 -07:00
Qinfan Wu	ff6a31d3b9	Refactor chunk map. Break the chunk map into two separate arrays, in order to improve cache locality. This is related to issue #23.	2014-09-04 22:22:52 -07:00
Jason Evans	602c8e0971	Implement per thread heap profiling. Rename data structures (prof_thr_cnt_t-->prof_tctx_t, prof_ctx_t-->prof_gctx_t), and convert to storing a prof_tctx_t for sampled objects. Convert PROF_ALLOC_PREP() to prof_alloc_prep(), since precise backtrace depth within jemalloc functions is no longer an issue (pprof prunes irrelevant frames). Implement mallctl's: - prof.reset implements full sample data reset, and optional change of sample interval. - prof.lg_sample reads the current sample interval (opt.lg_prof_sample was the permanent source of truth prior to prof.reset). - thread.prof.name provides naming capability for threads within heap profile dumps. - thread.prof.active makes it possible to activate/deactivate heap profiling for individual threads. Modify the heap dump files to contain per thread heap profile data. This change is incompatible with the existing pprof, which will require enhancements to read and process the enriched data.	2014-08-19 21:31:16 -07:00
Jason Evans	d04047cc29	Add size class computation capability. Add size class computation capability, currently used only as validation of the size class lookup tables. Generalize the size class spacing used for bins, for eventual use throughout the full range of allocation sizes.	2014-05-28 21:06:46 -07:00
Jason Evans	e2deab7a75	Refactor huge allocation to be managed by arenas. Refactor huge allocation to be managed by arenas (though the global red-black tree of huge allocations remains for lookup during deallocation). This is the logical conclusion of recent changes that 1) made per arena dss precedence apply to huge allocation, and 2) made it possible to replace the per arena chunk allocation/deallocation functions. Remove the top level huge stats, and replace them with per arena huge stats. Normalize function names and types to dalloc (some were dealloc). Remove the --enable-mremap option. As jemalloc currently operates, this is a performace regression for some applications, but planned work to logarithmically space huge size classes should provide similar amortized performance. The motivation for this change was that mremap-based huge reallocation forced leaky abstractions that prevented refactoring.	2014-05-15 22:36:41 -07:00
aravind	fb7fe50a88	Add support for user-specified chunk allocators/deallocators. Add new mallctl endpoints "arena<i>.chunk.alloc" and "arena<i>.chunk.dealloc" to allow userspace to configure jemalloc's chunk allocator and deallocator on a per-arena basis.	2014-05-12 10:46:03 -07:00
Jason Evans	3541a904d6	Refactor small_size2bin and small_bin2size. Refactor small_size2bin and small_bin2size to be inline functions rather than directly accessed arrays.	2014-04-16 17:14:33 -07:00
Jason Evans	3e3caf03af	Merge pull request #73 from bmaurer/smallmalloc Smaller malloc hot path	2014-04-16 16:33:21 -07:00
Ben Maurer	021136ce4d	Create a const array with only a small bin to size map	2014-04-16 14:31:24 -07:00
Ben Maurer	6c39f9e059	refactor profiling. only use a bytes till next sample variable.	2014-04-16 13:43:30 -07:00
Ben Maurer	a7619b7fa5	outline rare tcache_get codepaths	2014-04-16 13:36:56 -07:00
Jason Evans	bd87b01999	Optimize Valgrind integration. Forcefully disable tcache if running inside Valgrind, and remove Valgrind calls in tcache-specific code. Restructure Valgrind-related code to move most Valgrind calls out of the fast path functions. Take advantage of static knowledge to elide some branches in JEMALLOC_VALGRIND_REALLOC().	2014-04-15 16:49:57 -07:00
Jason Evans	ecd3e59ca3	Remove the "opt.valgrind" mallctl. Remove the "opt.valgrind" mallctl because it is unnecessary -- jemalloc automatically detects whether it is running inside valgrind.	2014-04-15 14:33:50 -07:00
Jason Evans	9790b9667f	Remove the allocm() API, which is superceded by the allocx() API.	2014-04-14 22:32:31 -07:00
Jason Evans	9b0cbf0850	Remove support for non-prof-promote heap profiling metadata. Make promotion of sampled small objects to large objects mandatory, so that profiling metadata can always be stored in the chunk map, rather than requiring one pointer per small region in each small-region page run. In practice the non-prof-promote code was only useful when using jemalloc to track all objects and report them as leaks at program exit. However, Valgrind is at least as good a tool for this particular use case. Furthermore, the non-prof-promote code is getting in the way of some optimizations that will make heap profiling much cheaper for the predominant use case (sampling a small representative proportion of all allocations).	2014-04-11 14:24:51 -07:00
Jason Evans	8a26eaca7f	Add private namespace mangling for huge_dss_prec_get().	2014-03-31 09:31:38 -07:00
Jason Evans	772163b4f3	Add heap profiling tests. Fix a regression in prof_dump_ctx() due to an uninitized variable. This was caused by revision `4f37ef693e`, so no releases are affected.	2014-01-17 15:40:52 -08:00
Jason Evans	b2c31660be	Extract profiling code from [re]allocation functions. Extract profiling code from malloc(), imemalign(), calloc(), realloc(), mallocx(), rallocx(), and xallocx(). This slightly reduces the amount of code compiled into the fast paths, but the primary benefit is the combinatorial complexity reduction. Simplify iralloc[t]() by creating a separate ixalloc() that handles the no-move cases. Further simplify [mrxn]allocx() (and by implication [mrn]allocm()) to make request size overflows due to size class and/or alignment constraints trigger undefined behavior (detected by debug-only assertions). Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling backtrace creation in imemalign(). This bug impacted posix_memalign() and aligned_alloc().	2014-01-12 15:41:05 -08:00
Jason Evans	6b694c4d47	Add junk/zero filling unit tests, and fix discovered bugs. Fix growing large reallocation to junk fill new space. Fix huge deallocation to junk fill when munmap is disabled.	2014-01-07 16:54:17 -08:00
Jason Evans	b980cc774a	Add rtree unit tests.	2014-01-02 16:17:15 -08:00
Jason Evans	0d6c5d8bd0	Add quarantine unit tests. Verify that freed regions are quarantined, and that redzone corruption is detected. Introduce a testing idiom for intercepting/replacing internal functions. In this case the replaced function is ordinarily a static function, but the idiom should work similarly for library-private functions.	2013-12-17 15:19:12 -08:00
Jason Evans	d82a5e6a34	Implement the allocx() API. Implement the allocx() API, which is a successor to the allocm() API. The allocx() functions are slightly simpler to use because they have fewer parameters, they directly return the results of primary interest, and mallocx()/rallocx() avoid the strict aliasing pitfall that allocm()/rallocx() share with posix_memalign(). The following code violates strict aliasing rules: foo_t foo; allocm((void )&foo, NULL, 42, 0); whereas the following is safe: foo_t foo; void p; allocm(&p, NULL, 42, 0); foo = (foo_t )p; mallocx() does not have this problem: foo_t foo = (foo_t )mallocx(42, 0);	2013-12-12 22:35:52 -08:00
Jason Evans	86abd0dcd8	Refactor to support more varied testing. Refactor the test harness to support three types of tests: - unit: White box unit tests. These tests have full access to all internal jemalloc library symbols. Though in actuality all symbols are prefixed by jet_, macro-based name mangling abstracts this away from test code. - integration: Black box integration tests. These tests link with the installable shared jemalloc library, and with the exception of some utility code and configure-generated macro definitions, they have no access to jemalloc internals. - stress: Black box stress tests. These tests link with the installable shared jemalloc library, as well as with an internal allocator with symbols prefixed by jet_ (same as for unit tests) that can be used to allocate data structures that are internal to the test code. Move existing tests into test/{unit,integration}/ as appropriate. Split out internal parts of jemalloc_defs.h.in and put them in jemalloc_internal_defs.h.in. This reduces internals exposure to applications that #include <jemalloc/jemalloc.h>. Refactor jemalloc.h header generation so that a single header file results, and the prototypes can be used to generate jet_ prototypes for tests. Split jemalloc.h.in into multiple parts (jemalloc_defs.h.in, jemalloc_macros.h.in, jemalloc_protos.h.in, jemalloc_mangle.h.in) and use a shell script to generate a unified jemalloc.h at configure time. Change the default private namespace prefix from "" to "je_". Add missing private namespace mangling. Remove hard-coded private_namespace.h. Instead generate it and private_unnamespace.h from private_symbols.txt. Use similar logic for public symbols, which aids in name mangling for jet_ symbols. Add test_warn() and test_fail(). Replace existing exit(1) calls with test_fail() calls.	2013-12-03 22:06:59 -08:00

1 2 3

106 Commits