Commit Graph

315 Commits

Author SHA1 Message Date
Jason Evans
64e458f5cd Implement two-phase decay-based purging.
Split decay-based purging into two phases, the first of which uses lazy
purging to convert dirty pages to "muzzy", and the second of which uses
forced purging, decommit, or unmapping to convert pages to clean or
destroy them altogether.  Not all operating systems support lazy
purging, yet the application may provide extent hooks that implement
lazy purging, so care must be taken to dynamically omit the first phase
when necessary.

The mallctl interfaces change as follows:
- opt.decay_time --> opt.{dirty,muzzy}_decay_time
- arena.<i>.decay_time --> arena.<i>.{dirty,muzzy}_decay_time
- arenas.decay_time --> arenas.{dirty,muzzy}_decay_time
- stats.arenas.<i>.pdirty --> stats.arenas.<i>.p{dirty,muzzy}
- stats.arenas.<i>.{npurge,nmadvise,purged} -->
  stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}

This resolves #521.
2017-03-15 13:13:47 -07:00
Qi Wang
ec532e2c5c Implement per-CPU arena.
The new feature, opt.percpu_arena, determines thread-arena association
dynamically based CPU id. Three modes are supported: "percpu", "phycpu"
and disabled.

"percpu" uses the current core id (with help from sched_getcpu())
directly as the arena index, while "phycpu" will assign threads on the
same physical CPU to the same arena. In other words, "percpu" means # of
arenas == # of CPUs, while "phycpu" has # of arenas == 1/2 * (# of
CPUs). Note that no runtime check on whether hyper threading is enabled
is added yet.

When enabled, threads will be migrated between arenas when a CPU change
is detected. In the current design, to reduce overhead from reading CPU
id, each arena tracks the thread accessed most recently. When a new
thread comes in, we will read CPU id and update arena if necessary.
2017-03-08 23:19:01 -08:00
Qi Wang
8721e19c04 Fix arena_prefork lock rank order for witness.
When witness is enabled, lock rank order needs to be preserved during
prefork, not only for each arena, but also across arenas. This change
breaks arena_prefork into further stages to ensure valid rank order
across arenas. Also changed test/unit/fork to use a manual arena to
catch this case.
2017-03-08 23:07:27 -08:00
Qi Wang
01f47f11a6 Store associated arena in tcache.
This fixes tcache_flush for manual tcaches, which wasn't able to find
the correct arena it associated with. Also changed the decay test to
cover this case (by using manually created arenas).
2017-03-07 12:58:11 -08:00
Jason Evans
cc75c35db5 Add any() and remove_any() to ph.
These functions select the easiest-to-remove element in the heap, which
is either the most recently inserted aux list element or the root.  If
no calls are made to first() or remove_first(), the behavior (and time
complexity) is the same as for a LIFO queue.
2017-03-07 10:25:33 -08:00
Jason Evans
8547ee11c3 Fix flakiness in test_decay_ticker.
Fix the test_decay_ticker test to carefully control slab
creation/destruction such that the decay backlog reliably reaches zero.
Use an isolated arena so that no extraneous allocation can confuse the
situation.  Speed up time during the latter part of the test so that the
entire decay time can expire in a reasonable amount of wall time.
2017-03-07 10:25:12 -08:00
David Goldblatt
438efede78 Add atomic types for ssize_t 2017-03-06 18:49:19 -08:00
David Goldblatt
e9852b5776 Disentangle assert and util
This is the first header refactoring diff, #533.  It splits the assert and util
components into separate, hermetic, header files.  In the process, it splits out
two of the large sub-components of util (the stdio.h replacement, and bit
manipulation routines) into their own components (malloc_io.h and bit_util.h).
This is mostly to break up cyclic dependencies, but it also breaks off a good
chunk of the catch-all-ness of util, which is nice.
2017-03-06 15:08:43 -08:00
David Goldblatt
d4ac7582f3 Introduce a backport of C11 atomics
This introduces a backport of C11 atomics.  It has four implementations; ranked
in order of preference, they are:
- GCC/Clang __atomic builtins
- GCC/Clang __sync builtins
- MSVC _Interlocked builtins
- C11 atomics, from <stdatomic.h>

The primary advantages are:
- Close adherence to the standard API gives us a defined memory model.
- Type safety: atomic objects are now separate types from non-atomic ones, so
  that it's impossible to mix up atomic and non-atomic updates (which is
  undefined behavior that compilers are starting to take advantage of).
- Efficiency: we can specify ordering for operations, avoiding fences and
  atomic operations on strongly ordered architectures (example:
  `atomic_write_u32(ptr, val);` involves a CAS loop, whereas
  `atomic_store(ptr, val, ATOMIC_RELEASE);` is a plain store.

This diff leaves in the current atomics API (implementing them in terms of the
backport).  This lets us transition uses over piecemeal.

Testing:
This is by nature hard to test. I've manually tested the first three options on
Linux on gcc by futzing with the #defines manually, on freebsd with gcc and
clang, on MSVC, and on OS X with clang.  All of these were x86 machines though,
and we don't have any test infrastructure set up for non-x86 platforms.
2017-03-03 13:40:59 -08:00
Jason Evans
fd058f572b Immediately purge cached extents if decay_time is 0.
This fixes a regression caused by
54269dc0ed (Remove obsolete
arena_maybe_purge() call.), as well as providing a general fix.

This resolves #665.
2017-03-02 19:43:06 -08:00
Jason Evans
de49674fbd Use MALLOC_CONF rather than malloc_conf for tests.
malloc_conf does not reliably work with MSVC, which complains of
"inconsistent dll linkage", i.e. its inability to support the
application overriding malloc_conf when dynamically linking/loading.
Work around this limitation by adding test harness support for per test
shell script sourcing, and converting all tests to use MALLOC_CONF
instead of malloc_conf.
2017-02-23 08:57:02 -08:00
Jason Evans
de8a68e853 Enhance spin_adaptive() to yield after several iterations.
This avoids worst case behavior if e.g. another thread is preempted
while owning the resource the spinning thread is waiting for.
2017-02-08 18:50:03 -08:00
Jason Evans
650c070e10 Remove rtree support for 0 (NULL) keys.
NULL can never actually be inserted in practice, and removing support
allows a branch to be removed from the fast path.
2017-02-08 18:50:03 -08:00
Jason Evans
f5cf9b19c8 Determine rtree levels at compile time.
Rather than dynamically building a table to aid per level computations,
define a constant table at compile time.  Omit both high and low
insignificant bits.  Use one to three tree levels, depending on the
number of significant bits.
2017-02-08 18:50:03 -08:00
Jason Evans
3bd6d8e41d Conditianalize lg_tcache_max use on JEMALLOC_TCACHE. 2017-02-07 12:15:36 -08:00
Jason Evans
d27f29b468 Disentangle arena and extent locking.
Refactor arena and extent locking protocols such that arena and
extent locks are never held when calling into the extent_*_wrapper()
API.  This requires extra care during purging since the arena lock no
longer protects the inner purging logic.  It also requires extra care to
protect extents from being merged with adjacent extents.

Convert extent_t's 'active' flag to an enumerated 'state', so that
retained extents are explicitly marked as such, rather than depending on
ring linkage state.

Refactor the extent collections (and their synchronization) for cached
and retained extents into extents_t.  Incorporate LRU functionality to
support purging.  Incorporate page count accounting, which replaces
arena->ndirty and arena->stats.retained.

Assert that no core locks are held when entering any internal
[de]allocation functions.  This is in addition to existing assertions
that no locks are held when entering external [de]allocation functions.

Audit and document synchronization protocols for all arena_t fields.

This fixes a potential deadlock due to recursive allocation during
gdump, in a similar fashion to b49c649bc1
(Fix lock order reversal during gdump.), but with a necessarily much
broader code impact.
2017-02-01 16:43:46 -08:00
Jason Evans
d0e93ada51 Add witness_assert_depth[_to_rank]().
This makes it possible to make lock state assertions about precisely
which locks are held.
2017-02-01 16:43:46 -08:00
Jason Evans
190f81c6d5 Silence harmless warnings discovered via run_tests.sh. 2017-02-01 11:29:12 -08:00
Jason Evans
c0cc5db871 Replace tabs following #define with spaces.
This resolves #564.
2017-01-20 21:45:53 -08:00
Jason Evans
f408643a4c Remove extraneous parens around return arguments.
This resolves #540.
2017-01-20 21:43:07 -08:00
Jason Evans
c4c2592c83 Update brace style.
Add braces around single-line blocks, and remove line breaks before
function-opening braces.

This resolves #537.
2017-01-20 21:43:07 -08:00
Jason Evans
9eb1b1c881 Fix --disable-stats support.
Fix numerous regressions that were exposed by --disable-stats, both in
the core library and in the tests.
2017-01-19 18:31:07 -08:00
Jason Evans
66bf773ef2 Test JSON output of malloc_stats_print() and fix bugs.
Implement and test a JSON validation parser.  Use the parser to validate
JSON output from malloc_stats_print(), with a significant subset of
supported output options.

This resolves #551.
2017-01-19 14:05:00 -08:00
Jason Evans
1ff09534b5 Fix prof_realloc() regression.
Mostly revert the prof_realloc() changes in
498856f44a (Move slabs out of chunks.) so
that prof_free_sampled_object() is called when appropriate.  Leave the
prof_tctx_[re]set() optimization in place, but add an assertion to
verify that all eight cases are correctly handled.  Add a comment to
make clear the code ordering, so that the regression originally fixed by
ea8d97b897 (Fix
prof_{malloc,free}_sample_object() call order in prof_realloc().) is not
repeated.

This resolves #499.
2017-01-17 15:16:37 -08:00
Jason Evans
8115f05b26 Add nullptr support to sized delete operators. 2017-01-17 14:30:15 -08:00
Jason Evans
ffbb7dac3d Remove leading blank lines from function bodies.
This resolves #535.
2017-01-13 14:49:24 -08:00
David Goldblatt
77cccac8cd Break up headers into constituent parts
This is part of a broader change to make header files better represent the
dependencies between one another (see
https://github.com/jemalloc/jemalloc/issues/533). It breaks up component headers
into smaller parts that can be made to have a simpler dependency graph.

For the autogenerated headers (smoothstep.h and size_classes.h), no splitting
was necessary, so I didn't add support to emit multiple headers.
2017-01-12 15:43:51 -08:00
Jason Evans
edf1bafb2b Implement arena.<i>.destroy .
Add MALLCTL_ARENAS_DESTROYED for accessing destroyed arena stats as an
analogue to MALLCTL_ARENAS_ALL.

This resolves #382.
2017-01-06 18:58:46 -08:00
Jason Evans
3f291d59ad Refactor test extent hook code to be reusable.
Move test extent hook code from the extent integration test into a
header, and normalize the out-of-band controls and introspection.
Also refactor the base unit test to use the header.
2017-01-06 18:58:46 -08:00
Jason Evans
dc2125cf95 Replace the arenas.initialized mallctl with arena.<i>.initialized . 2017-01-06 18:58:46 -08:00
Jason Evans
0f04bb1d6f Rename the arenas.extend mallctl to arenas.create. 2017-01-06 18:58:45 -08:00
Jason Evans
3dc4e83ccb Add MALLCTL_ARENAS_ALL.
Add the MALLCTL_ARENAS_ALL cpp macro as a fixed index for use
in accessing the arena.<i>.{purge,decay,dss} and stats.arenas.<i>.*
mallctls, and deprecate access via the arenas.narenas index (to be
removed in 6.0.0).
2017-01-06 18:58:45 -08:00
Jason Evans
a0dd3a4483 Implement per arena base allocators.
Add/rename related mallctls:
- Add stats.arenas.<i>.base .
- Rename stats.arenas.<i>.metadata to stats.arenas.<i>.internal .
- Add stats.arenas.<i>.resident .

Modify the arenas.extend mallctl to take an optional (extent_hooks_t *)
argument so that it is possible for all base allocations to be serviced
by the specified extent hooks.

This resolves #463.
2016-12-26 18:08:28 -08:00
Jason Evans
a6e86810d8 Refactor purging and splitting/merging.
Split purging into lazy and forced variants.  Use the forced variant for
zeroing dss.

Add support for NULL function pointers as an opt-out mechanism for the
dalloc, commit, decommit, purge_lazy, purge_forced, split, and merge
fields of extent_hooks_t.

Add short-circuiting checks in large_ralloc_no_move_{shrink,expand}() so
that no attempt is made if splitting/merging is not supported.

This resolves #268.
2016-12-26 18:08:16 -08:00
Jason Evans
c1baa0a9b7 Add huge page configuration and pages_[no}huge().
Add the --with-lg-hugepage configure option, but automatically configure
LG_HUGEPAGE even if it isn't specified.

Add the pages_[no]huge() functions, which toggle huge page state via
madvise(..., MADV_[NO]HUGEPAGE) calls.
2016-12-26 17:59:34 -08:00
Jason Evans
bacb6afc6c Simplify arena_slab_regind().
Rewrite arena_slab_regind() to provide sufficient constant data for
the compiler to perform division strength reduction.  This replaces
more general manual strength reduction that was implemented before
arena_bin_info was compile-time-constant.  It would be possible to
slightly improve on the compiler-generated division code by taking
advantage of range limits that the compiler doesn't know about.
2016-12-23 10:34:34 -08:00
Dave Watson
2319152d9f jemalloc cpp new/delete bindings
Adds cpp bindings for jemalloc, along with necessary autoconf settings.
This is mostly to add sized deallocation support, which can't be added
from C directly.  Sized deallocation is ~10% microbench improvement.

* Import ax_cxx_compile_stdcxx.m4 from the autoconf repo, seems like the
  easiest way to get c++14 detection.
* Adds various other changes, like CXXFLAGS, to configure.ac.
* Adds new rules to Makefile.in for src/jemalloc-cpp.cpp, and a basic
  unittest.
* Both new and delete are overridden, to ensure jemalloc is used for
  both.
* TODO future enhancement of avoiding extra PLT thunks for new and
  delete - sdallocx and malloc are publicly exported jemalloc symbols,
  using an alias would link them directly.  Unfortunately, was having
  trouble getting it to play nice with jemalloc's namespace support.

Testing:
Tested gcc 4.8, gcc 5, gcc 5.2, clang 4.0.  Only gcc >= 5 has sized
deallocation support, verified that the rest build correctly.

Tested mac osx and Centos.

Tested --with-jemalloc-prefix and --without-export.

This resolves #202.
2016-12-12 18:36:06 -08:00
Jason Evans
d4c5aceb7c Add a_type parameter to qr_{meld,split}(). 2016-12-12 18:16:51 -08:00
Jason Evans
8a4528bdd1 Uniformly cast mallctl[bymib]() oldp/newp arguments to (void *).
This avoids warnings in some cases, and is otherwise generally good
hygiene.
2016-11-15 15:01:03 -08:00
Jason Evans
2c95154501 Add packing test, which verifies stable layout policy. 2016-11-15 13:08:33 -08:00
Jason Evans
c25e711cf9 Reduce memory usage for sdallocx() test_alignment_and_size. 2016-11-11 23:50:35 -08:00
Jason Evans
5e0373c815 Fix test_prng_lg_range_zu() to work on 32-bit systems. 2016-11-07 11:50:11 -08:00
Jason Evans
cda59f9970 Rename atomic_*_{uint32,uint64,u}() to atomic_*_{u32,u64,zu}().
This change conforms to naming conventions throughout the codebase.
2016-11-07 11:27:48 -08:00
Jason Evans
04b463546e Refactor prng to not use 64-bit atomics on 32-bit platforms.
This resolves #495.
2016-11-07 10:52:44 -08:00
Jason Evans
ea9961acdb Fix psz/pind edge cases.
Add an "over-size" extent heap in which to store extents which exceed
the maximum size class (plus cache-oblivious padding, if enabled).
Remove psz2ind_clamp() and use psz2ind() instead so that trying to
allocate the maximum size class can in principle succeed.  In practice,
this allows assertions to hold so that OOM errors can be successfully
generated.
2016-11-03 22:33:34 -07:00
Dave Watson
25f7bbcf28 Fix long spinning in rtree_node_init
rtree_node_init spinlocks the node, allocates, and then sets the node.
This is under heavy contention at the top of the tree if many threads
start to allocate at the same time.

Instead, take a per-rtree sleeping mutex to reduce spinning.  Tested
both pthreads and osx OSSpinLock, and both reduce spinning adequately

Previous benchmark time:
./ttest1 500 100
~15s

New benchmark time:
./ttest1 500 100
.57s
2016-11-02 20:30:53 -07:00
Jason Evans
795f6689de Add os_unfair_lock support.
OS X 10.12 deprecated OSSpinLock; os_unfair_lock is the recommended
replacement.
2016-11-02 18:09:45 -07:00
Jason Evans
b54072dfee Call _exit(2) rather than exit(3) in forked child.
_exit(2) is async-signal-safe, whereas exit(3) is not.
2016-11-02 18:05:19 -07:00
Jason Evans
bde815dc40 Reduce memory requirements for regression tests.
This is intended to drop memory usage to a level that AppVeyor test
instances can handle.

This resolves #393.
2016-10-28 11:23:24 -07:00
Jason Evans
970d293257 Periodically purge in memory-intensive integration tests.
This resolves #393.
2016-10-28 11:00:36 -07:00