Commit Graph

169 Commits

Author SHA1 Message Date
Jason Evans
e5effef428 Add/use adaptive spinning.
Add spin_t and spin_{init,adaptive}(), which provide a simple
abstraction for adaptive spinning.

Adaptively spin during busy waits in bootstrapping and rtree node
initialization.
2016-10-13 14:55:39 -07:00
Jason Evans
9acd5cf178 Remove all vestiges of chunks.
Remove mallctls:
- opt.lg_chunk
- stats.cactive

This resolves #464.
2016-10-12 11:55:43 -07:00
Jason Evans
871a9498e1 Fix size class overflow bugs.
Avoid calling s2u() on raw extent sizes in extent_recycle().

Clamp psz2ind() (implemented as psz2ind_clamp()) when inserting/removing
into/from size-segregated extent heaps.
2016-10-03 14:18:55 -07:00
Eric Le Bihan
df0d273a07 Fix LG_QUANTUM definition for sparc64
GCC 4.9.3 cross-compiled for sparc64 defines __sparc_v9__, not
__sparc64__ nor __sparcv9. This prevents LG_QUANTUM from being defined
properly. Adding this new value to the check solves the issue.
2016-09-26 15:13:07 -07:00
Elliot Ronaghan
8a1a794b0c Don't use compact red-black trees with the pgi compiler
Some bug (either in the red-black tree code, or in the pgi compiler) seems to
cause red-black trees to become unbalanced. This issue seems to go away if we
don't use compact red-black trees. Since red-black trees don't seem to be used
much anymore, I opted for what seems to be an easy fix here instead of digging
in and trying to find the root cause of the bug.

Some context in case it's helpful:

I experienced a ton of segfaults while using pgi as Chapel's target compiler
with jemalloc 4.0.4. The little bit of debugging I did pointed me somewhere
deep in red-black tree manipulation, but I didn't get a chance to investigate
further. It looks like 4.2.0 replaced most uses of red-black trees with
pairing-heaps, which seems to avoid whatever bug I was hitting.

However, `make check_unit` was still failing on the rb test, so I figured the
core issue was just being masked. Here's the `make check_unit` failure:

```sh
=== test/unit/rb ===
test_rb_empty: pass
tree_recurse:test/unit/rb.c:90: Failed assertion: (((_Bool) (((uintptr_t) (left_node)->link.rbn_right_red) & ((size_t)1)))) == (false) --> true != false: Node should be black
test_rb_random:test/unit/rb.c:274: Failed assertion: (imbalances) == (0) --> 1 != 0: Tree is unbalanced
tree_recurse:test/unit/rb.c:90: Failed assertion: (((_Bool) (((uintptr_t) (left_node)->link.rbn_right_red) & ((size_t)1)))) == (false) --> true != false: Node should be black
test_rb_random:test/unit/rb.c:274: Failed assertion: (imbalances) == (0) --> 1 != 0: Tree is unbalanced
node_remove:test/unit/rb.c:190: Failed assertion: (imbalances) == (0) --> 2 != 0: Tree is unbalanced
<jemalloc>: test/unit/rb.c:43: Failed assertion: "pathp[-1].cmp < 0"
test/test.sh: line 22: 12926 Aborted
Test harness error
```

While starting to debug I saw the RB_COMPACT option and decided to check if
turning that off resolved the bug. It seems to have fixed it (`make check_unit`
passes and the segfaults under Chapel are gone) so it seems like on okay
work-around. I'd imagine this has performance implications for red-black trees
under pgi, but if they're not going to be used much anymore it's probably not a
big deal.
2016-06-08 14:48:55 -07:00
Jason Evans
dd752c1ffd Fix potential VM map fragmentation regression.
Revert 245ae6036c (Support --with-lg-page
values larger than actual page size.), because it could cause VM map
fragmentation if the kernel grows mmap()ed memory downward.

This resolves #391.
2016-06-07 14:15:49 -07:00
Jason Evans
7be2ebc23f Make tsd cleanup functions optional, remove noop cleanup functions. 2016-06-05 20:42:24 -07:00
Jason Evans
751f2c332d Remove obsolete stats.arenas.<i>.metadata.mapped mallctl.
Rename stats.arenas.<i>.metadata.allocated mallctl to
stats.arenas.<i>.metadata .
2016-06-05 20:42:24 -07:00
Jason Evans
03eea4fb8b Better document --enable-ivsalloc. 2016-06-05 20:42:24 -07:00
Jason Evans
0c4932eb1e s/chunk_lookup/extent_lookup/g, s/chunks_rtree/extents_rtree/g 2016-06-05 20:42:23 -07:00
Jason Evans
7d63fed0fd Rename huge to large. 2016-06-05 20:42:23 -07:00
Jason Evans
ed2c2427a7 Use huge size class infrastructure for large size classes. 2016-06-05 20:42:18 -07:00
Jason Evans
4731cd47f7 Allow chunks to not be naturally aligned.
Precisely size extents for huge size classes that aren't multiples of
chunksize.
2016-06-03 12:27:41 -07:00
Jason Evans
d78846c989 Replace extent_achunk_[gs]et() with extent_slab_[gs]et(). 2016-06-03 12:27:41 -07:00
Jason Evans
fae8344098 Add extent_active_[gs]et().
Always initialize extents' runs_dirty and chunks_cache linkage.
2016-06-03 12:27:41 -07:00
Jason Evans
6f71844659 Move *PAGE* definitions to pages.h. 2016-06-03 12:27:41 -07:00
Jason Evans
8c9be3e837 Refactor rtree to always use base_alloc() for node allocation. 2016-06-03 12:27:41 -07:00
Jason Evans
db72272bef Use rtree-based chunk lookups rather than pointer bit twiddling.
Look up chunk metadata via the radix tree, rather than using
CHUNK_ADDR2BASE().

Propagate pointer's containing extent.

Minimize extent lookups by doing a single lookup (e.g. in free()) and
propagating the pointer's extent into nearly all the functions that may
need it.
2016-06-03 12:27:41 -07:00
Jason Evans
a7a6f5bc96 Rename extent_node_t to extent_t. 2016-05-16 12:21:28 -07:00
Jason Evans
7bb00ae9d6 Refactor runs_avail.
Use pszind_t size classes rather than szind_t size classes, and always
reserve space for NPSIZES elements.  This removes unused heaps that are
not multiples of the page size, and adds (currently) unused heaps for
all huge size classes, with the immediate benefit that the size of
arena_t allocations is constant (no longer dependent on chunk size).
2016-05-16 12:21:21 -07:00
Jason Evans
226c446979 Implement pz2ind(), pind2sz(), and psz2u().
These compute size classes and indices similarly to size2index(),
index2size() and s2u(), respectively, but using the subset of size
classes that are multiples of the page size.  Note that pszind_t and
szind_t are not interchangeable.
2016-05-13 10:31:54 -07:00
Jason Evans
17c021c177 Remove redzone support.
This resolves #369.
2016-05-13 10:27:33 -07:00
Jason Evans
ba5c709517 Remove quarantine support. 2016-05-13 10:25:05 -07:00
Jason Evans
9a8add1510 Remove Valgrind support. 2016-05-13 09:56:18 -07:00
Jason Evans
73d3d58dc2 Optimize witness fast path.
Short-circuit commonly called witness functions so that they only
execute in debug builds, and remove equivalent guards from mutex
functions.  This avoids pointless code execution in
witness_assert_lockless(), which is typically called twice per
allocation/deallocation function invocation.

Inline commonly called witness functions so that optimized builds can
completely remove calls as dead code.
2016-05-11 15:38:06 -07:00
Jason Evans
c1e00ef2a6 Resolve bootstrapping issues when embedded in FreeBSD libc.
b2c0d6322d (Add witness, a simple online
locking validator.) caused a broad propagation of tsd throughout the
internal API, but tsd_fetch() was designed to fail prior to tsd
bootstrapping.  Fix this by splitting tsd_t into non-nullable tsd_t and
nullable tsdn_t, and modifying all internal APIs that do not critically
rely on tsd to take nullable pointers.  Furthermore, add the
tsd_booted_get() function so that tsdn_fetch() can probe whether tsd
bootstrapping is complete and return NULL if not.  All dangerous
conversions of nullable pointers are tsdn_tsd() calls that assert-fail
on invalid conversion.
2016-05-10 22:51:33 -07:00
Jason Evans
919e4a0ea9 Add LG_QUANTUM definition for the RISC-V architecture. 2016-05-06 17:15:32 -07:00
Jason Evans
3ef51d7f73 Optimize the fast paths of calloc() and [m,d,sd]allocx().
This is a broader application of optimizations to malloc() and free() in
f4a0f32d34 (Fast-path improvement:
reduce # of branches and unnecessary operations.).

This resolves #321.
2016-05-06 14:37:39 -07:00
Jason Evans
90827a3f3e Fix huge_palloc() regression.
Split arena_choose() into arena_[i]choose() and use arena_ichoose() for
arena lookup during internal allocation.  This fixes huge_palloc() so
that it always succeeds during extent node allocation.

This regression was introduced by
66cd953514 (Do not allocate metadata via
non-auto arenas, nor tcaches.).
2016-05-03 17:19:15 -07:00
Jason Evans
66cd953514 Do not allocate metadata via non-auto arenas, nor tcaches.
This assures that all internally allocated metadata come from the
first opt_narenas arenas, i.e. the automatically multiplexed arenas.
2016-04-22 15:19:59 -07:00
Jason Evans
b2c0d6322d Add witness, a simple online locking validator.
This resolves #358.
2016-04-14 02:09:28 -07:00
Jason Evans
245ae6036c Support --with-lg-page values larger than actual page size.
During over-allocation in preparation for creating aligned mappings,
allocate one more page than necessary if PAGE is the actual page size,
so that trimming still succeeds even if the system returns a mapping
that has less than PAGE alignment.  This allows compiling with e.g. 64
KiB "pages" on systems that actually use 4 KiB pages.

Note that for e.g. --with-lg-page=21, it is also necessary to increase
the chunk size (e.g. --with-malloc-conf=lg_chunk:22) so that there are
at least two "pages" per chunk.  In practice this isn't a particularly
compelling configuration because so much (unusable) virtual memory is
dedicated to chunk headers.
2016-04-11 02:35:00 -07:00
Jason Evans
c6a2c39404 Refactor/fix ph.
Refactor ph to support configurable comparison functions.  Use a cpp
macro code generation form equivalent to the rb macros so that pairing
heaps can be used for both run heaps and chunk heaps.

Remove per node parent pointers, and instead use leftmost siblings' prev
pointers to track parents.

Fix multi-pass sibling merging to iterate over intermediate results
using a FIFO, rather than a LIFO.  Use this fixed sibling merging
implementation for both merge phases of the auxiliary twopass algorithm
(first merging the aux list, then replacing the root with its merged
children).  This fixes both degenerate merge behavior and the potential
for deep recursion.

This regression was introduced by
6bafa6678f (Pairing heap).

This resolves #371.
2016-04-11 02:15:42 -07:00
Chris Peterson
f3060284c5 Remove unused arenas_extend() function declaration.
The arenas_extend() function was renamed to arenas_init() in commit
8bb3198f72, but its function declaration
was not removed from jemalloc_internal.h.in.
2016-03-26 01:03:24 -07:00
Dave Watson
6bafa6678f Pairing heap
Initial implementation of a twopass pairing heap with aux list.
Research papers linked in comments.

Where search/nsearch/last aren't needed, this gives much faster first(),
delete(), and insert().  Insert is O(1), and first/delete don't have to
walk the whole tree.

Also tested rb_old with parent pointers - it was better than the current
rb.h for memory loads, but still much worse than a pairing heap.

An array-based heap would be much faster if everything fits in memory,
but on a cold cache it has many more memory loads for most operations.
2016-03-08 13:46:19 -08:00
Jason Evans
0c516a00c4 Make *allocx() size class overflow behavior defined.
Limit supported size and alignment to HUGE_MAXCLASS, which in turn is
now limited to be less than PTRDIFF_MAX.

This resolves #278 and #295.
2016-02-25 15:29:49 -08:00
Jason Evans
767d85061a Refactor arenas array (fixes deadlock).
Refactor the arenas array, which contains pointers to all extant arenas,
such that it starts out as a sparse array of maximum size, and use
double-checked atomics-based reads as the basis for fast and simple
arena_get().  Additionally, reduce arenas_lock's role such that it only
protects against arena initalization races.  These changes remove the
possibility for arena lookups to trigger locking, which resolves at
least one known (fork-related) deadlock.

This resolves #315.
2016-02-24 23:58:10 -08:00
Jason Evans
1c42a04cc6 Change lg_floor() return type from size_t to unsigned. 2016-02-24 13:03:48 -08:00
Jason Evans
8f683b94a7 Make opt_narenas unsigned rather than size_t. 2016-02-24 13:03:48 -08:00
Jason Evans
9bad079039 Refactor time_* into nstime_*.
Use a single uint64_t in nstime_t to store nanoseconds rather than using
struct timespec.  This reduces fragility around conversions between long
and uint64_t, especially missing casts that only cause problems on
32-bit platforms.
2016-02-21 21:39:05 -08:00
rustyx
3c2c5a5071 Fix warning in ipalloc 2016-02-20 10:55:18 -08:00
Jason Evans
243f7a0508 Implement decay-based unused dirty page purging.
This is an alternative to the existing ratio-based unused dirty page
purging, and is intended to eventually become the sole purging
mechanism.

Add mallctls:
- opt.purge
- opt.decay_time
- arena.<i>.decay
- arena.<i>.decay_time
- arenas.decay_time
- stats.arenas.<i>.decay_time

This resolves #325.
2016-02-19 20:56:21 -08:00
Jason Evans
8e82af1166 Implement smoothstep table generation.
Check in a generated smootherstep table as smoothstep.h rather than
generating it at configure time, since not all systems (e.g. Windows)
have dc.
2016-02-19 20:56:15 -08:00
Jason Evans
db927b6727 Refactor arenas_cache tsd.
Refactor arenas_cache tsd into arenas_tdata, which is a structure of
type arena_tdata_t.
2016-02-19 20:32:37 -08:00
Jason Evans
34676d3369 Refactor prng* from cpp macros into inline functions.
Remove 32-bit variant, convert prng64() to prng_lg_range(), and add
prng_range().
2016-02-19 20:29:06 -08:00
Jason Evans
9998000b2b Implement ticker.
Implement ticker, which provides a simple API for ticking off some
number of events before indicating that the ticker has hit its limit.
2016-02-19 20:29:06 -08:00
Cameron Evans
e5d5a4a517 Add time_update(). 2016-02-19 20:29:06 -08:00
Jason Evans
f829009929 Add --with-malloc-conf.
Add --with-malloc-conf, which makes it possible to embed a default
options string during configuration.
2016-02-19 20:29:06 -08:00
Qi Wang
f4a0f32d34 Fast-path improvement: reduce # of branches and unnecessary operations.
- Combine multiple runtime branches into a single malloc_slow check.
- Avoid calling arena_choose / size2index / index2size on fast path.
- A few micro optimizations.
2015-11-10 14:28:34 -08:00
Jason Evans
a784e411f2 Fix a xallocx(..., MALLOCX_ZERO) bug.
Fix xallocx(..., MALLOCX_ZERO to zero the last full trailing page of
large allocations that have been randomly assigned an offset of 0 when
--enable-cache-oblivious configure option is enabled.  This addresses a
special case missed in d260f442ce (Fix
xallocx(..., MALLOCX_ZERO) bugs.).
2015-09-24 22:21:55 -07:00