Commit Graph

307 Commits

Author SHA1 Message Date
David Goldblatt
4d2e4bf5eb Get rid of most of the various inline macros. 2017-04-24 10:33:21 -07:00
Jason Evans
4403c9ab44 Remove --disable-tcache.
Simplify configuration by removing the --disable-tcache option, but
replace the testing for that configuration with
--with-malloc-conf=tcache:false.

Fix the thread.arena and thread.tcache.flush mallctls to work correctly
if tcache is disabled.

This partially resolves #580.
2017-04-21 10:06:12 -07:00
Qi Wang
5aa46f027d Bypass extent tracking for auto arenas.
Tracking extents is required by arena_reset.  To support this, the extent
linkage was used for tracking 1) large allocations, and 2) full slabs.  However
modifying the extent linkage could be an expensive operation as it likely incurs
cache misses.  Since we forbid arena_reset on auto arenas, let's bypass the
linkage operations for auto arenas.
2017-04-21 00:29:18 -07:00
David Goldblatt
d9ec36e22d Header refactoring: move assert.h out of the catch-all 2017-04-18 18:35:03 -07:00
David Goldblatt
f692e6c214 Header refactoring: move util.h out of the catchall 2017-04-18 18:35:03 -07:00
Jason Evans
881fbf762f Prefer old/low extent_t structures during reuse.
Rather than using a LIFO queue to track available extent_t structures,
use a red-black tree, and always choose the oldest/lowest available
during reuse.
2017-04-17 14:47:45 -07:00
Qi Wang
c2fcf9c2cf Switch to fine-grained reentrancy support.
Previously we had a general detection and support of reentrancy, at the cost of
having branches and inc / dec operations on fast paths.  To avoid taxing fast
paths, we move the reentrancy operations onto tsd slow state, and only modify
reentrancy level around external calls (that might trigger reentrancy).
2017-04-14 19:48:06 -07:00
Qi Wang
ccfe68a916 Pass alloc_ctx down profiling path.
With this change, when profiling is enabled, we avoid doing redundant rtree
lookups. Also changed dalloc_atx_t to alloc_atx_t, as it's now used on
allocation path as well (to speed up profiling).
2017-04-12 13:55:39 -07:00
Qi Wang
f35213bae4 Pass dalloc_ctx down the sdalloc path.
This avoids redundant rtree lookups.
2017-04-12 13:55:39 -07:00
David Goldblatt
743d940dc3 Header refactoring: Split up jemalloc_internal.h
This is a biggy.  jemalloc_internal.h has been doing multiple jobs for a while
now:
- The source of system-wide definitions.
- The catch-all include file.
- The module header file for jemalloc.c

This commit splits up this functionality.  The system-wide definitions
responsibility has moved to jemalloc_preamble.h.  The catch-all include file is
now jemalloc_internal_includes.h.  The module headers for jemalloc.c are now in
jemalloc_internal_[externs|inlines|types].h, just as they are for the other
modules.
2017-04-11 11:52:30 -07:00
Qi Wang
04ef218d87 Move reentrancy_level to the beginning of TSD. 2017-04-07 16:25:43 -07:00
David Goldblatt
b407a65401 Add basic reentrancy-checking support, and allow arena_new to reenter.
This checks whether or not we're reentrant using thread-local data, and, if we
are, moves certain internal allocations to use arena 0 (which should be properly
initialized after bootstrapping).

The immediate thing this allows is spinning up threads in arena_new, which will
enable spinning up background threads there.
2017-04-07 14:10:27 -07:00
Qi Wang
36bd90b962 Optimizing TSD and thread cache layout.
1) Re-organize TSD so that frequently accessed fields are closer to the
beginning and more compact.  Assuming 64-bit, the first 2.5 cachelines now
contains everything needed on tcache fast path, expect the tcache struct itself.

2) Re-organize tcache and tbins.  Take lg_fill_div out of tbin, and reduce tbin
to 24 bytes (down from 32). Split tbins into tbins_small and tbins_large, and
place tbins_small close to the beginning.
2017-04-07 14:06:17 -07:00
David Goldblatt
56b72c7b17 Transition arena struct fields to C11 atomics 2017-04-05 16:25:37 -07:00
David Goldblatt
7da04a6b09 Convert prng module to use C11-style atomics 2017-04-04 16:45:52 -07:00
Jason Evans
07f4f93434 Move arena_slab_data_t's nfree into extent_t's e_bits.
Compact extent_t to 128 bytes on 64-bit systems by moving
arena_slab_data_t's nfree into extent_t's e_bits.

Cacheline-align extent_t structures so that they always cross the
minimum number of cacheline boundaries.

Re-order extent_t fields such that all fields except the slab bitmap
(and overlaid heap profiling context pointer) are in the first
cacheline.

This resolves #461.
2017-03-27 22:43:39 -07:00
Jason Evans
c8021d01f6 Implement bitmap_ffu(), which finds the first unset bit. 2017-03-24 17:52:46 -07:00
Qi Wang
362e356675 Profile per arena base mutex, instead of just a0. 2017-03-23 00:03:28 -07:00
Qi Wang
d3fde1c124 Refactor mutex profiling code with x-macros. 2017-03-23 00:03:28 -07:00
Qi Wang
20b8c70e9f Added extents_dirty / _muzzy mutexes, as well as decay_dirty / _muzzy. 2017-03-23 00:03:28 -07:00
Qi Wang
64c5f5c174 Added "stats.mutexes.reset" mallctl to reset all mutex stats.
Also switched from the term "lock" to "mutex".
2017-03-23 00:03:28 -07:00
Qi Wang
ca9074deff Added lock profiling and output for global locks (ctl, prof and base). 2017-03-23 00:03:28 -07:00
Qi Wang
0fb5c0e853 Add arena lock stats output. 2017-03-23 00:03:28 -07:00
Qi Wang
a4f176af57 Output bin lock profiling results to malloc_stats.
Two counters are included for the small bins: lock contention rate, and
max lock waiting time.
2017-03-23 00:03:28 -07:00
Jason Evans
5e67fbc367 Push down iealloc() calls.
Call iealloc() as deep into call chains as possible without causing
redundant calls.
2017-03-22 18:33:32 -07:00
Jason Evans
51a2ec92a1 Remove extent dereferences from the deallocation fast paths. 2017-03-22 18:33:32 -07:00
Jason Evans
4f341412e5 Remove extent arg from isalloc() and arena_salloc(). 2017-03-22 18:33:32 -07:00
Jason Evans
99d68445ef Incorporate szind/slab into rtree leaves.
Expand and restructure the rtree API such that all common operations can
be achieved with minimal work, regardless of whether the rtree leaf
fields are independent versus packed into a single atomic pointer.
2017-03-22 18:33:32 -07:00
Jason Evans
f50d6009fe Remove binind field from arena_slab_data_t.
binind is now redundant; the containing extent_t's szind field always
provides the same value.
2017-03-22 18:33:32 -07:00
Jason Evans
e8921cf2eb Convert extent_t's usize to szind.
Rather than storing usize only for large (and prof-promoted)
allocations, store the size class index for allocations that reside
within the extent, such that the size class index is valid for all
extents that contain extant allocations, and invalid otherwise (mainly
to make debugging simpler).
2017-03-22 18:33:32 -07:00
Jason Evans
64e458f5cd Implement two-phase decay-based purging.
Split decay-based purging into two phases, the first of which uses lazy
purging to convert dirty pages to "muzzy", and the second of which uses
forced purging, decommit, or unmapping to convert pages to clean or
destroy them altogether.  Not all operating systems support lazy
purging, yet the application may provide extent hooks that implement
lazy purging, so care must be taken to dynamically omit the first phase
when necessary.

The mallctl interfaces change as follows:
- opt.decay_time --> opt.{dirty,muzzy}_decay_time
- arena.<i>.decay_time --> arena.<i>.{dirty,muzzy}_decay_time
- arenas.decay_time --> arenas.{dirty,muzzy}_decay_time
- stats.arenas.<i>.pdirty --> stats.arenas.<i>.p{dirty,muzzy}
- stats.arenas.<i>.{npurge,nmadvise,purged} -->
  stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}

This resolves #521.
2017-03-15 13:13:47 -07:00
Jason Evans
38a5bfc816 Move arena_t's purging field into arena_decay_t. 2017-03-15 13:13:47 -07:00
Jason Evans
765edd67b4 Refactor decay-related function parametrization.
Refactor most of the decay-related functions to take as parameters the
decay_t and associated extents_t structures to operate on.  This
prepares for supporting both lazy and forced purging on different decay
schedules.
2017-03-15 13:13:47 -07:00
David Goldblatt
ee202efc79 Convert remaining arena_stats_t fields to atomics
These were all size_ts, so we have atomics support for them on all platforms, so
the conversion is straightforward.

Left non-atomic is curlextents, which AFAICT is not used atomically anywhere.
2017-03-13 18:22:33 -07:00
David Goldblatt
4fc2acf5ae Switch atomic uint64_ts in arena_stats_t to C11 atomics
I expect this to be the trickiest conversion we will see, since we want atomics
on 64-bit platforms, but are also always able to piggyback on some sort of
external synchronization on non-64 bit platforms.
2017-03-13 18:22:33 -07:00
Jason Evans
3a2b183d5f Convert arena_t's purging field to non-atomic bool.
The decay mutex already protects all accesses.
2017-03-10 10:14:30 -08:00
Qi Wang
ec532e2c5c Implement per-CPU arena.
The new feature, opt.percpu_arena, determines thread-arena association
dynamically based CPU id. Three modes are supported: "percpu", "phycpu"
and disabled.

"percpu" uses the current core id (with help from sched_getcpu())
directly as the arena index, while "phycpu" will assign threads on the
same physical CPU to the same arena. In other words, "percpu" means # of
arenas == # of CPUs, while "phycpu" has # of arenas == 1/2 * (# of
CPUs). Note that no runtime check on whether hyper threading is enabled
is added yet.

When enabled, threads will be migrated between arenas when a CPU change
is detected. In the current design, to reduce overhead from reading CPU
id, each arena tracks the thread accessed most recently. When a new
thread comes in, we will read CPU id and update arena if necessary.
2017-03-08 23:19:01 -08:00
Qi Wang
8721e19c04 Fix arena_prefork lock rank order for witness.
When witness is enabled, lock rank order needs to be preserved during
prefork, not only for each arena, but also across arenas. This change
breaks arena_prefork into further stages to ensure valid rank order
across arenas. Also changed test/unit/fork to use a manual arena to
catch this case.
2017-03-08 23:07:27 -08:00
Jason Evans
e201e24904 Perform delayed coalescing prior to purging.
Rather than purging uncoalesced extents, perform just enough incremental
coalescing to purge only fully coalesced extents.  In the absence of
cached extent reuse, the immediate versus delayed incremental purging
algorithms result in the same purge order.

This resolves #655.
2017-03-07 10:25:12 -08:00
David Goldblatt
4f1e94658a Change arena to use the atomic functions for ssize_t instead of the union strategy 2017-03-06 18:49:19 -08:00
Jason Evans
04d8fcb745 Optimize malloc_large_stats_t maintenance.
Convert the nrequests field to be partially derived, and the curlextents
to be fully derived, in order to reduce the number of stats updates
needed during common operations.

This change affects ndalloc stats during arena reset, because it is no
longer possible to cancel out ndalloc effects (curlextents would become
negative).
2017-03-04 08:18:31 -08:00
Jason Evans
fd058f572b Immediately purge cached extents if decay_time is 0.
This fixes a regression caused by
54269dc0ed (Remove obsolete
arena_maybe_purge() call.), as well as providing a general fix.

This resolves #665.
2017-03-02 19:43:06 -08:00
Jason Evans
d61a5f76b2 Convert arena_decay_t's time to be atomically synchronized. 2017-03-02 19:43:06 -08:00
Jason Evans
472fef2e12 Fix {allocated,nmalloc,ndalloc,nrequests}_large stats regression.
This fixes a regression introduced by
d433471f58 (Derive
{allocated,nmalloc,ndalloc,nrequests}_large stats.).
2017-02-27 11:18:07 -08:00
Jason Evans
54269dc0ed Remove obsolete arena_maybe_purge() call.
Remove a call to arena_maybe_purge() that was necessary for ratio-based
purging, but is obsolete in the context of decay-based purging.
2017-02-21 12:46:41 -08:00
Jason Evans
2dfc5b5aac Disable coalescing of cached extents.
Extent splitting and coalescing is a major component of large allocation
overhead, and disabling coalescing of cached extents provides a simple
and effective hysteresis mechanism.  Once two-phase purging is
implemented, it will probably make sense to leave coalescing disabled
for the first phase, but coalesce during the second phase.
2017-02-16 20:11:50 -08:00
Jason Evans
b0654b95ed Fix arena->stats.mapped accounting.
Mapped memory increases when extent_alloc_wrapper() succeeds, and
decreases when extent_dalloc_wrapper() is called (during purging).
2017-02-16 15:52:11 -08:00
Jason Evans
f8fee6908d Synchronize arena->decay with arena->decay.mtx.
This removes the last use of arena->lock.
2017-02-16 09:39:46 -08:00
Jason Evans
d433471f58 Derive {allocated,nmalloc,ndalloc,nrequests}_large stats.
This mildly reduces stats update overhead during normal operation.
2017-02-16 09:39:46 -08:00
Jason Evans
ab25d3c987 Synchronize arena->tcache_ql with arena->tcache_ql_mtx.
This replaces arena->lock synchronization.
2017-02-16 09:39:46 -08:00