Commit Graph

2021 Commits

Author SHA1 Message Date
David Goldblatt
21a68e2d22 Convert rtree code to use C11 atomics
In the process, I changed the implementation of rtree_elm_acquire so that it
won't even try to CAS if its initial read (getting the extent + lock bit)
indicates that the CAS is doomed to fail.  This can significantly improve
performance under contention.
2017-03-13 12:05:27 -07:00
Jason Evans
3a2b183d5f Convert arena_t's purging field to non-atomic bool.
The decay mutex already protects all accesses.
2017-03-10 10:14:30 -08:00
Jason Evans
75fddc786c Fix ATOMIC_{ACQUIRE,RELEASE,ACQ_REL} definitions. 2017-03-09 00:57:37 -08:00
Qi Wang
f84471edc3 Add documentation for percpu_arena in jemalloc.xml.in. 2017-03-08 23:19:01 -08:00
Qi Wang
ec532e2c5c Implement per-CPU arena.
The new feature, opt.percpu_arena, determines thread-arena association
dynamically based CPU id. Three modes are supported: "percpu", "phycpu"
and disabled.

"percpu" uses the current core id (with help from sched_getcpu())
directly as the arena index, while "phycpu" will assign threads on the
same physical CPU to the same arena. In other words, "percpu" means # of
arenas == # of CPUs, while "phycpu" has # of arenas == 1/2 * (# of
CPUs). Note that no runtime check on whether hyper threading is enabled
is added yet.

When enabled, threads will be migrated between arenas when a CPU change
is detected. In the current design, to reduce overhead from reading CPU
id, each arena tracks the thread accessed most recently. When a new
thread comes in, we will read CPU id and update arena if necessary.
2017-03-08 23:19:01 -08:00
Qi Wang
8721e19c04 Fix arena_prefork lock rank order for witness.
When witness is enabled, lock rank order needs to be preserved during
prefork, not only for each arena, but also across arenas. This change
breaks arena_prefork into further stages to ensure valid rank order
across arenas. Also changed test/unit/fork to use a manual arena to
catch this case.
2017-03-08 23:07:27 -08:00
David Goldblatt
8adab26972 Convert extents_t's npages field to use C11-style atomics
In the process, we can do some strength reduction, changing the fetch-adds and
fetch-subs to be simple loads followed by stores, since the modifications all
occur while holding the mutex.
2017-03-08 21:27:09 -08:00
David Goldblatt
dafadce622 Reintroduce JEMALLOC_ATOMIC_U64
The C11 atomics backport removed this #define, which degraded atomic 64-bit
reads to require a lock even on platforms that support them.  This commit fixes
that.
2017-03-08 21:26:37 -08:00
Qi Wang
01f47f11a6 Store associated arena in tcache.
This fixes tcache_flush for manual tcaches, which wasn't able to find
the correct arena it associated with. Also changed the decay test to
cover this case (by using manually created arenas).
2017-03-07 12:58:11 -08:00
Jason Evans
cdce93e4a3 Use any-best-fit for cached extent allocation.
This simplifies what would be pairing heap operations to the equivalent
of LIFO queue operations.  This is a complementary optimization in the
context of delayed coalescing for cached extents.
2017-03-07 10:25:33 -08:00
Jason Evans
cc75c35db5 Add any() and remove_any() to ph.
These functions select the easiest-to-remove element in the heap, which
is either the most recently inserted aux list element or the root.  If
no calls are made to first() or remove_first(), the behavior (and time
complexity) is the same as for a LIFO queue.
2017-03-07 10:25:33 -08:00
Jason Evans
e201e24904 Perform delayed coalescing prior to purging.
Rather than purging uncoalesced extents, perform just enough incremental
coalescing to purge only fully coalesced extents.  In the absence of
cached extent reuse, the immediate versus delayed incremental purging
algorithms result in the same purge order.

This resolves #655.
2017-03-07 10:25:12 -08:00
Jason Evans
8547ee11c3 Fix flakiness in test_decay_ticker.
Fix the test_decay_ticker test to carefully control slab
creation/destruction such that the decay backlog reliably reaches zero.
Use an isolated arena so that no extraneous allocation can confuse the
situation.  Speed up time during the latter part of the test so that the
entire decay time can expire in a reasonable amount of wall time.
2017-03-07 10:25:12 -08:00
David Goldblatt
4f1e94658a Change arena to use the atomic functions for ssize_t instead of the union strategy 2017-03-06 18:49:19 -08:00
David Goldblatt
438efede78 Add atomic types for ssize_t 2017-03-06 18:49:19 -08:00
David Goldblatt
424e3428b1 Make type abbreviations consistent: ssize_t is zd everywhere 2017-03-06 18:49:19 -08:00
David Goldblatt
84326c566a Insert not_reached after an exhaustive switch
In the C11 atomics backport, we couldn't use not_reached() in
atomic_enum_to_builtin (in atomic_gcc_atomic.h), since atomic.h was hermetic and
assert.h wasn't; there was a dependency issue.  assert.h is hermetic now, so we
can include it.
2017-03-06 15:08:43 -08:00
David Goldblatt
e9852b5776 Disentangle assert and util
This is the first header refactoring diff, #533.  It splits the assert and util
components into separate, hermetic, header files.  In the process, it splits out
two of the large sub-components of util (the stdio.h replacement, and bit
manipulation routines) into their own components (malloc_io.h and bit_util.h).
This is mostly to break up cyclic dependencies, but it also breaks off a good
chunk of the catch-all-ness of util, which is nice.
2017-03-06 15:08:43 -08:00
Jason Evans
04d8fcb745 Optimize malloc_large_stats_t maintenance.
Convert the nrequests field to be partially derived, and the curlextents
to be fully derived, in order to reduce the number of stats updates
needed during common operations.

This change affects ndalloc stats during arena reset, because it is no
longer possible to cancel out ndalloc effects (curlextents would become
negative).
2017-03-04 08:18:31 -08:00
David Goldblatt
d4ac7582f3 Introduce a backport of C11 atomics
This introduces a backport of C11 atomics.  It has four implementations; ranked
in order of preference, they are:
- GCC/Clang __atomic builtins
- GCC/Clang __sync builtins
- MSVC _Interlocked builtins
- C11 atomics, from <stdatomic.h>

The primary advantages are:
- Close adherence to the standard API gives us a defined memory model.
- Type safety: atomic objects are now separate types from non-atomic ones, so
  that it's impossible to mix up atomic and non-atomic updates (which is
  undefined behavior that compilers are starting to take advantage of).
- Efficiency: we can specify ordering for operations, avoiding fences and
  atomic operations on strongly ordered architectures (example:
  `atomic_write_u32(ptr, val);` involves a CAS loop, whereas
  `atomic_store(ptr, val, ATOMIC_RELEASE);` is a plain store.

This diff leaves in the current atomics API (implementing them in terms of the
backport).  This lets us transition uses over piecemeal.

Testing:
This is by nature hard to test. I've manually tested the first three options on
Linux on gcc by futzing with the #defines manually, on freebsd with gcc and
clang, on MSVC, and on OS X with clang.  All of these were x86 machines though,
and we don't have any test infrastructure set up for non-x86 platforms.
2017-03-03 13:40:59 -08:00
David Goldblatt
957b8c5f21 Stop #define-ining away 'inline'
In the long term, we'll transition to C99-style inline semantics.  In the
short-term, this will allow both styles to coexist without breaking one another.
2017-03-03 13:40:59 -08:00
Jason Evans
fd058f572b Immediately purge cached extents if decay_time is 0.
This fixes a regression caused by
54269dc0ed (Remove obsolete
arena_maybe_purge() call.), as well as providing a general fix.

This resolves #665.
2017-03-02 19:43:06 -08:00
Jason Evans
d61a5f76b2 Convert arena_decay_t's time to be atomically synchronized. 2017-03-02 19:43:06 -08:00
Jason Evans
ff55f07eb6 Fix typos. 2017-03-01 15:31:30 -08:00
Qi Wang
aa1de06e3a Small style fix in ctl.c 2017-03-01 15:21:39 -08:00
charsyam
a8c9e9c651 fix typo sytem -> system 2017-03-01 08:40:05 -08:00
Jason Evans
04380e79f1 Merge branch 'rc-4.5.0' 2017-02-28 19:09:23 -08:00
Jason Evans
379dd44c57 Add casts to CONF_HANDLE_T_U().
This avoids signed/unsigned comparison warnings when specifying integer
constants as inputs.
2017-02-28 17:18:25 -08:00
Jason Evans
700253e1f2 Update ChangeLog for 4.5.0. 2017-02-28 16:21:05 -08:00
Jason Evans
2406c22f36 Add casts to CONF_HANDLE_T_U().
This avoids signed/unsigned comparison warnings when specifying integer
constants as inputs.

Clean up whitespace and add clarifying parentheses for
CONF_HANDLE_SIZE_T(opt_lg_chunk, ...).
2017-02-28 16:20:44 -08:00
Jason Evans
e723f99dec Alphabetize private symbol names. 2017-02-28 15:06:27 -08:00
Jason Evans
cbb6720861 Update ChangeLog for 4.5.0. 2017-02-28 14:25:26 -08:00
Jason Evans
d84d2909c3 Fix/enhance THP integration.
Detect whether chunks start off as THP-capable by default (according to
the state of /sys/kernel/mm/transparent_hugepage/enabled), and use this
as the basis for whether to call pages_nohuge() once per chunk during
first purge of any of the chunk's page runs.

Add the --disable-thp configure option, as well as the the opt.thp
mallctl.

This resolves #541.
2017-02-28 14:25:06 -08:00
Jason Evans
766ddcd0f2 restructure *CFLAGS configuration.
Convert CFLAGS to be a concatenation:

  CFLAGS := CONFIGURE_CFLAGS SPECIFIED_CFLAGS EXTRA_CFLAGS

This ordering makes it possible to override the flags set by the
configure script both during and after configuration, with CFLAGS and
EXTRA_CFLAGS, respectively.

This resolves #619.
2017-02-28 12:54:40 -08:00
Jason Evans
25d50a943a Dodge 32-bit-clang-specific backtracing failure.
This disables run_tests.sh configurations that use the combination of
32-bit clang and heap profiling.
2017-02-28 10:59:27 -08:00
Jason Evans
4a068644c7 Put -D_REENTRANT in CPPFLAGS rather than CFLAGS.
This regression was introduced by
194d6f9de8 (Restructure *CFLAGS/*CXXFLAGS
configuration.).
2017-02-28 01:21:26 -08:00
Qi Wang
7b53fe928e Handle race in stats_arena_bins_print
When multiple threads calling stats_print, race could happen as we read the
counters in separate mallctl calls; and the removed assertion could fail when
other operations happened in between the mallctl calls. For simplicity, output
"race" in the utilization field in this case.

This resolves #616.
2017-02-27 15:25:38 -08:00
Jason Evans
7c124830a1 Fix lg_chunk clamping for config_cache_oblivious.
Fix lg_chunk clamping to take into account cache-oblivious large
allocation.  This regression only resulted in incorrect behavior if
!config_fill (false unless --disable-fill specified) and
config_cache_oblivious (true unless --disable-cache-oblivious
specified).

This regression was introduced by
8a03cf039c (Implement cache index
randomization for large allocations.), which was first released in
4.0.0.

This resolves #555.
2017-02-27 15:19:41 -08:00
Jason Evans
1027a2682b Add some missing explicit casts.
This resolves #614.
2017-02-27 14:41:01 -08:00
Jason Evans
472fef2e12 Fix {allocated,nmalloc,ndalloc,nrequests}_large stats regression.
This fixes a regression introduced by
d433471f58 (Derive
{allocated,nmalloc,ndalloc,nrequests}_large stats.).
2017-02-27 11:18:07 -08:00
Jason Evans
079b8bee37 Tidy up extent quantization.
Remove obsolete unit test scaffolding for extent quantization.  Remove
redundant assertions.  Add an assertion to
extents_first_best_fit_locked() that should help prevent aligned
allocation regressions.
2017-02-27 11:17:47 -08:00
Jason Evans
1e2c9ef8d6 Fix huge-aligned allocation.
This regression was caused by
b9408d77a6 (Fix/simplify chunk_recycle()
allocation size computations.).

This resolves #647.
2017-02-27 11:08:19 -08:00
Jason Evans
d727596bcb Update a comment. 2017-02-26 11:05:27 -08:00
Jason Evans
ed19a48928 Silence harmless warnings discovered via run_tests.sh. 2017-02-26 11:04:58 -08:00
Jason Evans
54d2d697b2 Test JSON output of malloc_stats_print() and fix bugs.
Implement and test a JSON validation parser.  Use the parser to validate
JSON output from malloc_stats_print(), with a significant subset of
supported output options.

This resolves #583.
2017-02-26 11:04:58 -08:00
Jason Evans
61d26425e5 Fix JSON-mode output for !config_stats and/or !config_prof cases.
These bugs were introduced by b599b32280
(Add "J" (JSON) support to malloc_stats_print().), which was first
released in 4.3.0.

This resolves #615.
2017-02-26 11:04:58 -08:00
Jason Evans
adae7cfc4a Fix chunk_alloc_dss() regression.
Fix chunk_alloc_dss() to account for bytes that are not a multiple of
the chunk size.  This regression was introduced by
e2bcf037d4 (Make dss operations
lockless.), which was first released in 4.3.0.
2017-02-26 10:53:26 -08:00
Qi Wang
c2323e13a5 Get rid of witness in malloc_mutex_t when !(configured w/ debug).
We don't touch witness at all when config_debug == false.  Let's only pay the
memory cost in malloc_mutex_s when needed. Note that when !config_debug, we keep
the field in a union so that we don't have to do #ifdefs in multiple places.
2017-02-24 09:41:29 -08:00
Jason Evans
08c24e7c1a Relax witness assertions related to prof_gdump().
In some cases the prof machinery allocates (in order to modify the
bt2gctx hash table), and such operations are synchronized via
bt2gctx_mtx.  Rather than asserting that no locks are held on entry
into functions that may call prof_gdump(), make the weaker assertion
that no "core" locks are held.  The prof machinery enqueues dumps
triggered by prof_gdump() calls when bt2gctx_mtx is held, so this
weakened assertion avoids false failures in such cases.
2017-02-23 10:08:42 -08:00
Jason Evans
f56cb9a68e Add witness_assert_depth[_to_rank]().
This makes it possible to make lock state assertions about precisely
which locks are held.
2017-02-23 10:08:42 -08:00