Commit Graph

507 Commits

Author SHA1 Message Date
Jason Evans
613cdc80f6 Convert arena_bin_t's runs from a tree to a heap. 2016-03-08 13:48:27 -08:00
Dave Watson
4a0dbb5ac8 Use pairing heap for arena->runs_avail
Use pairing heap instead of red black tree in arena runs_avail.  The
extra links are unioned with the bitmap_t, so this change doesn't use
any extra memory.

Canaries show this change to be a 1% cpu win, and 2% latency win.  In
particular, large free()s, and small bin frees are now O(1) (barring
coalescing).

I also tested changing bin->runs to be a pairing heap, but saw a much
smaller win, and it would mean increasing the size of arena_run_s by two
pointers, so I left that as an rb-tree for now.
2016-03-08 13:48:27 -08:00
Dave Watson
6bafa6678f Pairing heap
Initial implementation of a twopass pairing heap with aux list.
Research papers linked in comments.

Where search/nsearch/last aren't needed, this gives much faster first(),
delete(), and insert().  Insert is O(1), and first/delete don't have to
walk the whole tree.

Also tested rb_old with parent pointers - it was better than the current
rb.h for memory loads, but still much worse than a pairing heap.

An array-based heap would be much faster if everything fits in memory,
but on a cold cache it has many more memory loads for most operations.
2016-03-08 13:46:19 -08:00
Jason Evans
022f6891fa Avoid a potential innocuous compiler warning.
Add a cast to avoid comparing a ssize_t value to a uint64_t value that
is always larger than a 32-bit ssize_t.  This silences an innocuous
compiler warning from e.g. gcc 4.2.1 about the comparison always having
the same result.
2016-03-02 22:45:37 -08:00
Dmitri Smirnov
33184bf698 Fix stack corruption and uninitialized var warning
Stack corruption happens in x64 bit

This resolves #347.
2016-02-29 15:22:53 -08:00
Jason Evans
39f58755a7 Fix a potential tsd cleanup leak.
Prior to 767d85061a (Refactor arenas array
(fixes deadlock).), it was possible under some circumstances for
arena_get() to trigger recreation of the arenas cache during tsd
cleanup, and the arenas cache would then be leaked.  In principle a
similar issue could still occur as a side effect of decay-based purging,
which calls arena_tdata_get().  Fix arenas_tdata_cleanup() by setting
tsd->arenas_tdata_bypass to true, so that arena_tdata_get() will
gracefully fail (an expected behavior) rather than recreating
tsd->arena_tdata.

Reported by Christopher Ferris <cferris@google.com>.
2016-02-27 21:18:15 -08:00
Jason Evans
3c07f803aa Fix stats.arenas.<i>.[...] for --disable-stats case.
Add missing stats.arenas.<i>.{dss,lg_dirty_mult,decay_time}
initialization.

Fix stats.arenas.<i>.{pactive,pdirty} to read under the protection of
the arena mutex.
2016-02-27 20:40:13 -08:00
Jason Evans
40ee9aa957 Fix stats.cactive accounting regression.
Fix stats.cactive accounting to always increase/decrease by multiples of
the chunk size, even for huge size classes that are not multiples of the
chunk size, e.g. {2.5, 3, 3.5, 5, 7} MiB with 2 MiB chunk size.  This
regression was introduced by 155bfa7da1
(Normalize size classes.) and first released in 4.0.0.

This resolves #336.
2016-02-27 15:35:52 -08:00
Jason Evans
3763d3b5f9 Refactor arena_cactive_update() into arena_cactive_{add,sub}().
This removes an implicit conversion from size_t to ssize_t.  For cactive
decreases, the size_t value was intentionally underflowed to generate
"negative" values (actually positive values above the positive range of
ssize_t), and the conversion to ssize_t was undefined according to C
language semantics.

This regression was perpetuated by
1522937e9c (Fix the cactive statistic.)
and first release in 4.0.0, which in retrospect only fixed one of two
problems introduced by aa5113b1fd
(Refactor overly large/complex functions) and first released in 3.5.0.
2016-02-26 17:29:35 -08:00
buchgr
d412624b25 Move retaining out of default chunk hooks
This fixes chunk allocation to reuse retained memory even if an
application-provided chunk allocation function is in use.

This resolves #307.
2016-02-26 15:24:13 -08:00
Dave Watson
b8823ab026 Use linear scan for small bitmaps
For small bitmaps, a linear scan of the bitmap is slightly faster than
a tree search - bitmap_t is more compact, and there are fewer writes
since we don't have to propogate state transitions up the tree.
On x86_64 with the current settings, I'm seeing ~.5%-1% CPU improvement
in production canaries with this change.

The old tree code is left since 32bit sizes are much larger (and ffsl
smaller), and maybe the run sizes will change in the future.

This resolves #339.
2016-02-26 14:21:10 -08:00
Jason Evans
01ecdf32d6 Miscellaneous bitmap refactoring. 2016-02-26 14:21:10 -08:00
Jason Evans
42ce80e15a Silence miscellaneous 64-to-32-bit data loss warnings.
This resolves #341.
2016-02-25 20:51:00 -08:00
Jason Evans
8282a2ad97 Remove a superfluous comment. 2016-02-25 16:44:48 -08:00
Jason Evans
9d2c10f2e8 Add more HUGE_MAXCLASS overflow checks.
Add HUGE_MAXCLASS overflow checks that are specific to heap profiling
code paths.  This fixes test failures that were introduced by
0c516a00c4 (Make *allocx() size class
overflow behavior defined.).
2016-02-25 16:42:15 -08:00
Jason Evans
0c516a00c4 Make *allocx() size class overflow behavior defined.
Limit supported size and alignment to HUGE_MAXCLASS, which in turn is
now limited to be less than PTRDIFF_MAX.

This resolves #278 and #295.
2016-02-25 15:29:49 -08:00
Jason Evans
767d85061a Refactor arenas array (fixes deadlock).
Refactor the arenas array, which contains pointers to all extant arenas,
such that it starts out as a sparse array of maximum size, and use
double-checked atomics-based reads as the basis for fast and simple
arena_get().  Additionally, reduce arenas_lock's role such that it only
protects against arena initalization races.  These changes remove the
possibility for arena lookups to trigger locking, which resolves at
least one known (fork-related) deadlock.

This resolves #315.
2016-02-24 23:58:10 -08:00
Dave Watson
3812729167 Fix arena_size computation.
Fix arena_size arena_new() computation to incorporate
runs_avail_nclasses elements for runs_avail, rather than
(runs_avail_nclasses - 1) elements.  Since offsetof(arena_t, runs_avail)
is used rather than sizeof(arena_t) for the first term of the
computation, all of the runs_avail elements must be added into the
second term.

This bug was introduced (by Jason Evans) while merging pull request #330
as 3417a304cc (Separate arena_avail
trees).
2016-02-24 20:10:02 -08:00
Dave Watson
cd86c1481a Fix arena_run_first_best_fit
Merge of 3417a304cc looks like a small
bug: first_best_fit doesn't scan through all the classes, since ind is
offset from runs_avail_nclasses by run_avail_bias.
2016-02-24 17:50:02 -08:00
Jason Evans
c7a9a6c86b Attempt mmap-based in-place huge reallocation.
Attempt mmap-based in-place huge reallocation by plumbing new_addr into
chunk_alloc_mmap().  This can dramatically speed up incremental huge
reallocation.

This resolves #335.
2016-02-24 17:23:18 -08:00
Jason Evans
ca8fffb5c1 Silence miscellaneous 64-to-32-bit data loss warnings. 2016-02-24 13:16:51 -08:00
Jason Evans
9e1810ca9d Silence miscellaneous 64-to-32-bit data loss warnings. 2016-02-24 13:03:48 -08:00
Jason Evans
0931cecbfa Use ssize_t for readlink() rather than int. 2016-02-24 13:03:48 -08:00
Jason Evans
8f683b94a7 Make opt_narenas unsigned rather than size_t. 2016-02-24 13:03:48 -08:00
Jason Evans
603b3bd413 Make nhbins unsigned rather than size_t. 2016-02-24 13:03:48 -08:00
Jason Evans
8dd5115ede Explicitly cast mib[] elements to unsigned where appropriate. 2016-02-24 13:03:48 -08:00
Jason Evans
9f4ee6034c Refactor jemalloc_ffs*() into ffs_*().
Use appropriate versions to resolve 64-to-32-bit data loss warnings.
2016-02-24 13:03:48 -08:00
Jason Evans
ae45142adc Collapse arena_avail_tree_* into arena_run_tree_*.
These tree types converged to become identical, yet they still had
independently generated red-black tree implementations.
2016-02-23 18:27:24 -08:00
Dave Watson
3417a304cc Separate arena_avail trees
Separate run trees by index, replacing the previous quantize logic.
Quantization by index is now performed only on insertion / removal from
the tree, and not on node comparison, saving some cpu.  This also means
we don't have to dereference the miscelm* pointers, saving half of the
memory loads from miscelms/mapbits that have fallen out of cache.  A
linear scan of the indicies appears to be fast enough.

The only cost of this is an extra tree array in each arena.
2016-02-23 18:09:36 -08:00
Jason Evans
0da8ce1e96 Use table lookup for run_quantize_{floor,ceil}().
Reduce run quantization overhead by generating lookup tables during
bootstrapping, and using the tables for all subsequent run quantization.
2016-02-22 16:47:34 -08:00
Jason Evans
08551eee58 Fix run_quantize_ceil().
In practice this bug had limited impact (and then only by increasing
chunk fragmentation) because run_quantize_ceil() returned correct
results except for inputs that could only arise from aligned allocation
requests that required more than page alignment.

This bug existed in the original run quantization implementation, which
was introduced by 8a03cf039c (Implement
cache index randomization for large allocations.).
2016-02-22 16:28:00 -08:00
Jason Evans
a9a4684792 Test run quantization.
Also rename run_quantize_*() to improve clarity.  These tests
demonstrate that run_quantize_ceil() is flawed.
2016-02-22 14:58:05 -08:00
Jason Evans
9bad079039 Refactor time_* into nstime_*.
Use a single uint64_t in nstime_t to store nanoseconds rather than using
struct timespec.  This reduces fragility around conversions between long
and uint64_t, especially missing casts that only cause problems on
32-bit platforms.
2016-02-21 21:39:05 -08:00
Jason Evans
788d29d397 Fix Windows-specific prof-related compilation portability issues. 2016-02-20 23:46:14 -08:00
Jason Evans
fd9cd7a6cc Fix time_update() to compile and work on MinGW. 2016-02-20 23:45:22 -08:00
rustyx
efbee86278 Prevent MSVC from optimizing away tls_callback (resolves #318) 2016-02-20 10:52:53 -08:00
rustyx
7f283980f0 getpid() fix for Win32 2016-02-20 10:52:53 -08:00
Jason Evans
243f7a0508 Implement decay-based unused dirty page purging.
This is an alternative to the existing ratio-based unused dirty page
purging, and is intended to eventually become the sole purging
mechanism.

Add mallctls:
- opt.purge
- opt.decay_time
- arena.<i>.decay
- arena.<i>.decay_time
- arenas.decay_time
- stats.arenas.<i>.decay_time

This resolves #325.
2016-02-19 20:56:21 -08:00
Jason Evans
1a4ad3c0fa Refactor out arena_compute_npurge().
Refactor out arena_compute_npurge() by integrating its logic into
arena_stash_dirty() as an incremental computation.
2016-02-19 20:32:37 -08:00
Jason Evans
db927b6727 Refactor arenas_cache tsd.
Refactor arenas_cache tsd into arenas_tdata, which is a structure of
type arena_tdata_t.
2016-02-19 20:32:37 -08:00
Jason Evans
4985dc681e Refactor arena_ralloc_no_move().
Refactor early return logic in arena_ralloc_no_move() to return early on
failure rather than on success.
2016-02-19 20:32:37 -08:00
Jason Evans
578cd16581 Refactor arena_malloc_hard() out of arena_malloc(). 2016-02-19 20:32:32 -08:00
Jason Evans
34676d3369 Refactor prng* from cpp macros into inline functions.
Remove 32-bit variant, convert prng64() to prng_lg_range(), and add
prng_range().
2016-02-19 20:29:06 -08:00
Jason Evans
c87ab25d18 Use ticker for incremental tcache GC. 2016-02-19 20:29:06 -08:00
Jason Evans
9998000b2b Implement ticker.
Implement ticker, which provides a simple API for ticking off some
number of events before indicating that the ticker has hit its limit.
2016-02-19 20:29:06 -08:00
Jason Evans
94451d184b Flesh out time_*() API. 2016-02-19 20:29:06 -08:00
Cameron Evans
e5d5a4a517 Add time_update(). 2016-02-19 20:29:06 -08:00
Jason Evans
f829009929 Add --with-malloc-conf.
Add --with-malloc-conf, which makes it possible to embed a default
options string during configuration.
2016-02-19 20:29:06 -08:00
Cosmin Paraschiv
9cb481a73f Call malloc_test_boot0() from malloc_init_hard_recursible().
When using LinuxThreads, malloc bootstrapping deadlocks, since
malloc_tsd_boot0() ends up calling pthread_setspecific(), which causes
recursive allocation.  Fix it by moving the malloc_tsd_boot0() call to
malloc_init_hard_recursible().

The deadlock was introduced by 8bb3198f72
(Refactor/fix arenas manipulation.), when tsd_boot() was split and the
top half, tsd_boot0(), got an extra tsd_wrapper_set() call.
2016-01-11 11:10:39 -08:00
Jason Evans
f9e3459f75 Tweak code to allow compilation of concatenated src/*.c sources.
This resolves #294.
2015-11-12 11:06:41 -08:00