server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Qi Wang	c2fcf9c2cf	Switch to fine-grained reentrancy support. Previously we had a general detection and support of reentrancy, at the cost of having branches and inc / dec operations on fast paths. To avoid taxing fast paths, we move the reentrancy operations onto tsd slow state, and only modify reentrancy level around external calls (that might trigger reentrancy).	2017-04-14 19:48:06 -07:00
Qi Wang	ccfe68a916	Pass alloc_ctx down profiling path. With this change, when profiling is enabled, we avoid doing redundant rtree lookups. Also changed dalloc_ctx_t to alloc_ctx_t, as it's now used on allocation path as well (to speed up profiling).	2017-04-12 13:55:39 -07:00
Qi Wang	f35213bae4	Pass dalloc_ctx down the sdalloc path. This avoids redundant rtree lookups.	2017-04-12 13:55:39 -07:00
David Goldblatt	743d940dc3	Header refactoring: Split up jemalloc_internal.h. This is a biggy. jemalloc_internal.h has been doing multiple jobs for a while now: - The source of system-wide definitions. - The catch-all include file. - The module header file for jemalloc.c. This commit splits up this functionality. The system-wide definitions responsibility has moved to jemalloc_preamble.h. The catch-all include file is now jemalloc_internal_includes.h. The module headers for jemalloc.c are now in jemalloc_internal_[externs|inlines|types].h, just as they are for the other modules.	2017-04-11 11:52:30 -07:00
Qi Wang	04ef218d87	Move reentrancy_level to the beginning of TSD.	2017-04-07 16:25:43 -07:00
David Goldblatt	b407a65401	Add basic reentrancy-checking support, and allow arena_new to reenter. This checks whether or not we're reentrant using thread-local data, and, if we are, moves certain internal allocations to use arena 0 (which should be properly initialized after bootstrapping). The immediate thing this allows is spinning up threads in arena_new, which will enable spinning up background threads there.	2017-04-07 14:10:27 -07:00
Qi Wang	36bd90b962	Optimizing TSD and thread cache layout. 1) Re-organize TSD so that frequently accessed fields are closer to the beginning and more compact. Assuming 64-bit, the first 2.5 cachelines now contain everything needed on tcache fast path, except the tcache struct itself. 2) Re-organize tcache and tbins. Take lg_fill_div out of tbin, and reduce tbin to 24 bytes (down from 32). Split tbins into tbins_small and tbins_large, and place tbins_small close to the beginning.	2017-04-07 14:06:17 -07:00
David Goldblatt	56b72c7b17	Transition arena struct fields to C11 atomics	2017-04-05 16:25:37 -07:00
David Goldblatt	7da04a6b09	Convert prng module to use C11-style atomics	2017-04-04 16:45:52 -07:00
Jason Evans	07f4f93434	Move arena_slab_data_t's nfree into extent_t's e_bits. Compact extent_t to 128 bytes on 64-bit systems by moving arena_slab_data_t's nfree into extent_t's e_bits. Cacheline-align extent_t structures so that they always cross the minimum number of cacheline boundaries. Re-order extent_t fields such that all fields except the slab bitmap (and overlaid heap profiling context pointer) are in the first cacheline. This resolves #461.	2017-03-27 22:43:39 -07:00
Jason Evans	c8021d01f6	Implement bitmap_ffu(), which finds the first unset bit.	2017-03-24 17:52:46 -07:00
Qi Wang	362e356675	Profile per arena base mutex, instead of just a0.	2017-03-23 00:03:28 -07:00
Qi Wang	d3fde1c124	Refactor mutex profiling code with x-macros.	2017-03-23 00:03:28 -07:00
Qi Wang	20b8c70e9f	Added extents_dirty / _muzzy mutexes, as well as decay_dirty / _muzzy.	2017-03-23 00:03:28 -07:00
Qi Wang	64c5f5c174	Added "stats.mutexes.reset" mallctl to reset all mutex stats. Also switched from the term "lock" to "mutex".	2017-03-23 00:03:28 -07:00
Qi Wang	ca9074deff	Added lock profiling and output for global locks (ctl, prof and base).	2017-03-23 00:03:28 -07:00
Qi Wang	0fb5c0e853	Add arena lock stats output.	2017-03-23 00:03:28 -07:00
Qi Wang	a4f176af57	Output bin lock profiling results to malloc_stats. Two counters are included for the small bins: lock contention rate, and max lock waiting time.	2017-03-23 00:03:28 -07:00
Jason Evans	5e67fbc367	Push down iealloc() calls. Call iealloc() as deep into call chains as possible without causing redundant calls.	2017-03-22 18:33:32 -07:00
Jason Evans	51a2ec92a1	Remove extent dereferences from the deallocation fast paths.	2017-03-22 18:33:32 -07:00
Jason Evans	4f341412e5	Remove extent arg from isalloc() and arena_salloc().	2017-03-22 18:33:32 -07:00
Jason Evans	99d68445ef	Incorporate szind/slab into rtree leaves. Expand and restructure the rtree API such that all common operations can be achieved with minimal work, regardless of whether the rtree leaf fields are independent versus packed into a single atomic pointer.	2017-03-22 18:33:32 -07:00
Jason Evans	f50d6009fe	Remove binind field from arena_slab_data_t. binind is now redundant; the containing extent_t's szind field always provides the same value.	2017-03-22 18:33:32 -07:00
Jason Evans	e8921cf2eb	Convert extent_t's usize to szind. Rather than storing usize only for large (and prof-promoted) allocations, store the size class index for allocations that reside within the extent, such that the size class index is valid for all extents that contain extant allocations, and invalid otherwise (mainly to make debugging simpler).	2017-03-22 18:33:32 -07:00
Jason Evans	64e458f5cd	Implement two-phase decay-based purging. Split decay-based purging into two phases, the first of which uses lazy purging to convert dirty pages to "muzzy", and the second of which uses forced purging, decommit, or unmapping to convert pages to clean or destroy them altogether. Not all operating systems support lazy purging, yet the application may provide extent hooks that implement lazy purging, so care must be taken to dynamically omit the first phase when necessary. The mallctl interfaces change as follows: - opt.decay_time --> opt.{dirty,muzzy}_decay_time - arena.<i>.decay_time --> arena.<i>.{dirty,muzzy}_decay_time - arenas.decay_time --> arenas.{dirty,muzzy}_decay_time - stats.arenas.<i>.pdirty --> stats.arenas.<i>.p{dirty,muzzy} - stats.arenas.<i>.{npurge,nmadvise,purged} --> stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged} This resolves #521.	2017-03-15 13:13:47 -07:00
Jason Evans	38a5bfc816	Move arena_t's purging field into arena_decay_t.	2017-03-15 13:13:47 -07:00
Jason Evans	765edd67b4	Refactor decay-related function parametrization. Refactor most of the decay-related functions to take as parameters the decay_t and associated extents_t structures to operate on. This prepares for supporting both lazy and forced purging on different decay schedules.	2017-03-15 13:13:47 -07:00
David Goldblatt	ee202efc79	Convert remaining arena_stats_t fields to atomics. These were all size_ts, so we have atomics support for them on all platforms, and the conversion is straightforward. Left non-atomic is curlextents, which AFAICT is not used atomically anywhere.	2017-03-13 18:22:33 -07:00
David Goldblatt	4fc2acf5ae	Switch atomic uint64_ts in arena_stats_t to C11 atomics. I expect this to be the trickiest conversion we will see, since we want atomics on 64-bit platforms, but are also always able to piggyback on some sort of external synchronization on non-64 bit platforms.	2017-03-13 18:22:33 -07:00
Jason Evans	3a2b183d5f	Convert arena_t's purging field to non-atomic bool. The decay mutex already protects all accesses.	2017-03-10 10:14:30 -08:00
Qi Wang	ec532e2c5c	Implement per-CPU arena. The new feature, opt.percpu_arena, determines thread-arena association dynamically based on CPU id. Three modes are supported: "percpu", "phycpu" and disabled. "percpu" uses the current core id (with help from sched_getcpu()) directly as the arena index, while "phycpu" will assign threads on the same physical CPU to the same arena. In other words, "percpu" means # of arenas == # of CPUs, while "phycpu" has # of arenas == 1/2 * (# of CPUs). Note that no runtime check on whether hyper-threading is enabled is added yet. When enabled, threads will be migrated between arenas when a CPU change is detected. In the current design, to reduce overhead from reading CPU id, each arena tracks the thread that accessed it most recently. When a new thread comes in, we will read CPU id and update arena if necessary.	2017-03-08 23:19:01 -08:00
Qi Wang	8721e19c04	Fix arena_prefork lock rank order for witness. When witness is enabled, lock rank order needs to be preserved during prefork, not only for each arena, but also across arenas. This change breaks arena_prefork into further stages to ensure valid rank order across arenas. Also changed test/unit/fork to use a manual arena to catch this case.	2017-03-08 23:07:27 -08:00
Jason Evans	e201e24904	Perform delayed coalescing prior to purging. Rather than purging uncoalesced extents, perform just enough incremental coalescing to purge only fully coalesced extents. In the absence of cached extent reuse, the immediate versus delayed incremental purging algorithms result in the same purge order. This resolves #655.	2017-03-07 10:25:12 -08:00
David Goldblatt	4f1e94658a	Change arena to use the atomic functions for ssize_t instead of the union strategy	2017-03-06 18:49:19 -08:00
Jason Evans	04d8fcb745	Optimize malloc_large_stats_t maintenance. Convert the nrequests field to be partially derived, and the curlextents to be fully derived, in order to reduce the number of stats updates needed during common operations. This change affects ndalloc stats during arena reset, because it is no longer possible to cancel out ndalloc effects (curlextents would become negative).	2017-03-04 08:18:31 -08:00
Jason Evans	fd058f572b	Immediately purge cached extents if decay_time is 0. This fixes a regression caused by `54269dc0ed` (Remove obsolete arena_maybe_purge() call.), as well as providing a general fix. This resolves #665.	2017-03-02 19:43:06 -08:00
Jason Evans	d61a5f76b2	Convert arena_decay_t's time to be atomically synchronized.	2017-03-02 19:43:06 -08:00
Jason Evans	472fef2e12	Fix {allocated,nmalloc,ndalloc,nrequests}_large stats regression. This fixes a regression introduced by `d433471f58` (Derive {allocated,nmalloc,ndalloc,nrequests}_large stats.).	2017-02-27 11:18:07 -08:00
Jason Evans	54269dc0ed	Remove obsolete arena_maybe_purge() call. Remove a call to arena_maybe_purge() that was necessary for ratio-based purging, but is obsolete in the context of decay-based purging.	2017-02-21 12:46:41 -08:00
Jason Evans	2dfc5b5aac	Disable coalescing of cached extents. Extent splitting and coalescing is a major component of large allocation overhead, and disabling coalescing of cached extents provides a simple and effective hysteresis mechanism. Once two-phase purging is implemented, it will probably make sense to leave coalescing disabled for the first phase, but coalesce during the second phase.	2017-02-16 20:11:50 -08:00
Jason Evans	b0654b95ed	Fix arena->stats.mapped accounting. Mapped memory increases when extent_alloc_wrapper() succeeds, and decreases when extent_dalloc_wrapper() is called (during purging).	2017-02-16 15:52:11 -08:00
Jason Evans	f8fee6908d	Synchronize arena->decay with arena->decay.mtx. This removes the last use of arena->lock.	2017-02-16 09:39:46 -08:00
Jason Evans	d433471f58	Derive {allocated,nmalloc,ndalloc,nrequests}_large stats. This mildly reduces stats update overhead during normal operation.	2017-02-16 09:39:46 -08:00
Jason Evans	ab25d3c987	Synchronize arena->tcache_ql with arena->tcache_ql_mtx. This replaces arena->lock synchronization.	2017-02-16 09:39:46 -08:00
Jason Evans	6b5cba4191	Convert arena->stats synchronization to atomics.	2017-02-16 09:39:46 -08:00
Jason Evans	fa2d64c94b	Convert arena->prof_accumbytes synchronization to atomics.	2017-02-16 09:39:46 -08:00
Jason Evans	b779522b9b	Convert arena->dss_prec synchronization to atomics.	2017-02-16 09:39:46 -08:00
Jason Evans	d27f29b468	Disentangle arena and extent locking. Refactor arena and extent locking protocols such that arena and extent locks are never held when calling into the extent_*_wrapper() API. This requires extra care during purging since the arena lock no longer protects the inner purging logic. It also requires extra care to protect extents from being merged with adjacent extents. Convert extent_t's 'active' flag to an enumerated 'state', so that retained extents are explicitly marked as such, rather than depending on ring linkage state. Refactor the extent collections (and their synchronization) for cached and retained extents into extents_t. Incorporate LRU functionality to support purging. Incorporate page count accounting, which replaces arena->ndirty and arena->stats.retained. Assert that no core locks are held when entering any internal [de]allocation functions. This is in addition to existing assertions that no locks are held when entering external [de]allocation functions. Audit and document synchronization protocols for all arena_t fields. This fixes a potential deadlock due to recursive allocation during gdump, in a similar fashion to `b49c649bc1` (Fix lock order reversal during gdump.), but with a necessarily much broader code impact.	2017-02-01 16:43:46 -08:00
Jason Evans	c0cc5db871	Replace tabs following #define with spaces. This resolves #564.	2017-01-20 21:45:53 -08:00
Jason Evans	f408643a4c	Remove extraneous parens around return arguments. This resolves #540.	2017-01-20 21:43:07 -08:00
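
The per-CPU arena entry in the log above (ec532e2c5c) describes two ways of mapping a CPU id to an arena index. The sketch below illustrates that mapping only; it is not jemalloc's code. The names `percpu_mode_t`, `percpu_narenas` and `percpu_arena_index` are hypothetical, and the "phycpu" branch assumes that logical CPUs `i` and `i + ncpus/2` are hyper-thread siblings and that `ncpus` is even, which the commit message does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three opt.percpu_arena settings. */
typedef enum {
    PERCPU_DISABLED,
    PERCPU_PER_CPU,    /* "percpu": one arena per logical CPU */
    PERCPU_PER_PHYCPU  /* "phycpu": one arena per physical CPU */
} percpu_mode_t;

/* Number of arenas implied by a mode; 0 means the feature is off. */
static size_t
percpu_narenas(percpu_mode_t mode, size_t ncpus) {
    switch (mode) {
    case PERCPU_PER_CPU:
        return ncpus;
    case PERCPU_PER_PHYCPU:
        return ncpus / 2;
    default:
        return 0;
    }
}

/*
 * Map a CPU id (for example, the value returned by sched_getcpu())
 * to an arena index.
 *
 * Assumption (not from the commit message): in "phycpu" mode, logical
 * CPUs i and i + ncpus/2 share a physical core, and ncpus is even.
 */
static size_t
percpu_arena_index(percpu_mode_t mode, size_t cpu_id, size_t ncpus) {
    assert(mode != PERCPU_DISABLED);
    assert(cpu_id < ncpus);
    if (mode == PERCPU_PER_PHYCPU) {
        assert(ncpus % 2 == 0);
        if (cpu_id >= ncpus / 2) {
            /* Fold the second sibling onto the first sibling's arena. */
            return cpu_id - ncpus / 2;
        }
    }
    return cpu_id;
}
```

Under these assumptions, on an 8-CPU machine a thread running on CPU 5 would use arena 5 in "percpu" mode and arena 1 in "phycpu" mode.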

301 Commits