Commit Graph

492 Commits

Author SHA1 Message Date
Jason Evans
0c516a00c4 Make *allocx() size class overflow behavior defined.
Limit supported size and alignment to HUGE_MAXCLASS, which in turn is
now limited to be less than PTRDIFF_MAX.

This resolves #278 and #295.
2016-02-25 15:29:49 -08:00
Jason Evans
767d85061a Refactor arenas array (fixes deadlock).
Refactor the arenas array, which contains pointers to all extant arenas,
such that it starts out as a sparse array of maximum size, and use
double-checked atomics-based reads as the basis for fast and simple
arena_get().  Additionally, reduce arenas_lock's role such that it only
protects against arena initalization races.  These changes remove the
possibility for arena lookups to trigger locking, which resolves at
least one known (fork-related) deadlock.

This resolves #315.
2016-02-24 23:58:10 -08:00
Dave Watson
3812729167 Fix arena_size computation.
Fix arena_size arena_new() computation to incorporate
runs_avail_nclasses elements for runs_avail, rather than
(runs_avail_nclasses - 1) elements.  Since offsetof(arena_t, runs_avail)
is used rather than sizeof(arena_t) for the first term of the
computation, all of the runs_avail elements must be added into the
second term.

This bug was introduced (by Jason Evans) while merging pull request #330
as 3417a304cc (Separate arena_avail
trees).
2016-02-24 20:10:02 -08:00
Dave Watson
cd86c1481a Fix arena_run_first_best_fit
Merge of 3417a304cc looks like a small
bug: first_best_fit doesn't scan through all the classes, since ind is
offset from runs_avail_nclasses by run_avail_bias.
2016-02-24 17:50:02 -08:00
Jason Evans
c7a9a6c86b Attempt mmap-based in-place huge reallocation.
Attempt mmap-based in-place huge reallocation by plumbing new_addr into
chunk_alloc_mmap().  This can dramatically speed up incremental huge
reallocation.

This resolves #335.
2016-02-24 17:23:18 -08:00
Jason Evans
ca8fffb5c1 Silence miscellaneous 64-to-32-bit data loss warnings. 2016-02-24 13:16:51 -08:00
Jason Evans
9e1810ca9d Silence miscellaneous 64-to-32-bit data loss warnings. 2016-02-24 13:03:48 -08:00
Jason Evans
0931cecbfa Use ssize_t for readlink() rather than int. 2016-02-24 13:03:48 -08:00
Jason Evans
8f683b94a7 Make opt_narenas unsigned rather than size_t. 2016-02-24 13:03:48 -08:00
Jason Evans
603b3bd413 Make nhbins unsigned rather than size_t. 2016-02-24 13:03:48 -08:00
Jason Evans
8dd5115ede Explicitly cast mib[] elements to unsigned where appropriate. 2016-02-24 13:03:48 -08:00
Jason Evans
9f4ee6034c Refactor jemalloc_ffs*() into ffs_*().
Use appropriate versions to resolve 64-to-32-bit data loss warnings.
2016-02-24 13:03:48 -08:00
Jason Evans
ae45142adc Collapse arena_avail_tree_* into arena_run_tree_*.
These tree types converged to become identical, yet they still had
independently generated red-black tree implementations.
2016-02-23 18:27:24 -08:00
Dave Watson
3417a304cc Separate arena_avail trees
Separate run trees by index, replacing the previous quantize logic.
Quantization by index is now performed only on insertion / removal from
the tree, and not on node comparison, saving some cpu.  This also means
we don't have to dereference the miscelm* pointers, saving half of the
memory loads from miscelms/mapbits that have fallen out of cache.  A
linear scan of the indicies appears to be fast enough.

The only cost of this is an extra tree array in each arena.
2016-02-23 18:09:36 -08:00
Jason Evans
0da8ce1e96 Use table lookup for run_quantize_{floor,ceil}().
Reduce run quantization overhead by generating lookup tables during
bootstrapping, and using the tables for all subsequent run quantization.
2016-02-22 16:47:34 -08:00
Jason Evans
08551eee58 Fix run_quantize_ceil().
In practice this bug had limited impact (and then only by increasing
chunk fragmentation) because run_quantize_ceil() returned correct
results except for inputs that could only arise from aligned allocation
requests that required more than page alignment.

This bug existed in the original run quantization implementation, which
was introduced by 8a03cf039c (Implement
cache index randomization for large allocations.).
2016-02-22 16:28:00 -08:00
Jason Evans
a9a4684792 Test run quantization.
Also rename run_quantize_*() to improve clarity.  These tests
demonstrate that run_quantize_ceil() is flawed.
2016-02-22 14:58:05 -08:00
Jason Evans
9bad079039 Refactor time_* into nstime_*.
Use a single uint64_t in nstime_t to store nanoseconds rather than using
struct timespec.  This reduces fragility around conversions between long
and uint64_t, especially missing casts that only cause problems on
32-bit platforms.
2016-02-21 21:39:05 -08:00
Jason Evans
788d29d397 Fix Windows-specific prof-related compilation portability issues. 2016-02-20 23:46:14 -08:00
Jason Evans
fd9cd7a6cc Fix time_update() to compile and work on MinGW. 2016-02-20 23:45:22 -08:00
rustyx
efbee86278 Prevent MSVC from optimizing away tls_callback (resolves #318) 2016-02-20 10:52:53 -08:00
rustyx
7f283980f0 getpid() fix for Win32 2016-02-20 10:52:53 -08:00
Jason Evans
243f7a0508 Implement decay-based unused dirty page purging.
This is an alternative to the existing ratio-based unused dirty page
purging, and is intended to eventually become the sole purging
mechanism.

Add mallctls:
- opt.purge
- opt.decay_time
- arena.<i>.decay
- arena.<i>.decay_time
- arenas.decay_time
- stats.arenas.<i>.decay_time

This resolves #325.
2016-02-19 20:56:21 -08:00
Jason Evans
1a4ad3c0fa Refactor out arena_compute_npurge().
Refactor out arena_compute_npurge() by integrating its logic into
arena_stash_dirty() as an incremental computation.
2016-02-19 20:32:37 -08:00
Jason Evans
db927b6727 Refactor arenas_cache tsd.
Refactor arenas_cache tsd into arenas_tdata, which is a structure of
type arena_tdata_t.
2016-02-19 20:32:37 -08:00
Jason Evans
4985dc681e Refactor arena_ralloc_no_move().
Refactor early return logic in arena_ralloc_no_move() to return early on
failure rather than on success.
2016-02-19 20:32:37 -08:00
Jason Evans
578cd16581 Refactor arena_malloc_hard() out of arena_malloc(). 2016-02-19 20:32:32 -08:00
Jason Evans
34676d3369 Refactor prng* from cpp macros into inline functions.
Remove 32-bit variant, convert prng64() to prng_lg_range(), and add
prng_range().
2016-02-19 20:29:06 -08:00
Jason Evans
c87ab25d18 Use ticker for incremental tcache GC. 2016-02-19 20:29:06 -08:00
Jason Evans
9998000b2b Implement ticker.
Implement ticker, which provides a simple API for ticking off some
number of events before indicating that the ticker has hit its limit.
2016-02-19 20:29:06 -08:00
Jason Evans
94451d184b Flesh out time_*() API. 2016-02-19 20:29:06 -08:00
Cameron Evans
e5d5a4a517 Add time_update(). 2016-02-19 20:29:06 -08:00
Jason Evans
f829009929 Add --with-malloc-conf.
Add --with-malloc-conf, which makes it possible to embed a default
options string during configuration.
2016-02-19 20:29:06 -08:00
Cosmin Paraschiv
9cb481a73f Call malloc_test_boot0() from malloc_init_hard_recursible().
When using LinuxThreads, malloc bootstrapping deadlocks, since
malloc_tsd_boot0() ends up calling pthread_setspecific(), which causes
recursive allocation.  Fix it by moving the malloc_tsd_boot0() call to
malloc_init_hard_recursible().

The deadlock was introduced by 8bb3198f72
(Refactor/fix arenas manipulation.), when tsd_boot() was split and the
top half, tsd_boot0(), got an extra tsd_wrapper_set() call.
2016-01-11 11:10:39 -08:00
Jason Evans
f9e3459f75 Tweak code to allow compilation of concatenated src/*.c sources.
This resolves #294.
2015-11-12 11:06:41 -08:00
Dmitry-Me
ea59ebf4d3 Reuse previously computed value 2015-11-12 10:45:49 -08:00
Qi Wang
f4a0f32d34 Fast-path improvement: reduce # of branches and unnecessary operations.
- Combine multiple runtime branches into a single malloc_slow check.
- Avoid calling arena_choose / size2index / index2size on fast path.
- A few micro optimizations.
2015-11-10 14:28:34 -08:00
Joshua Kahn
13b4015531 Allow const keys for lookup
Signed-off-by: Steve Dougherty <sdougherty@barracuda.com>

This resolves #281.
2015-11-09 15:48:05 -08:00
Mike Hommey
f97298bfc1 Remove arena_run_dalloc_decommit().
This resolves #284.
2015-11-09 15:38:30 -08:00
Jason Evans
a784e411f2 Fix a xallocx(..., MALLOCX_ZERO) bug.
Fix xallocx(..., MALLOCX_ZERO to zero the last full trailing page of
large allocations that have been randomly assigned an offset of 0 when
--enable-cache-oblivious configure option is enabled.  This addresses a
special case missed in d260f442ce (Fix
xallocx(..., MALLOCX_ZERO) bugs.).
2015-09-24 22:21:55 -07:00
Jason Evans
d36c7ebb00 Work around an NPTL-specific TSD issue.
Work around a potentially bad thread-specific data initialization
interaction with NPTL (glibc's pthreads implementation).

This resolves #283.
2015-09-24 16:53:18 -07:00
Jason Evans
d260f442ce Fix xallocx(..., MALLOCX_ZERO) bugs.
Zero all trailing bytes of large allocations when
--enable-cache-oblivious configure option is enabled.  This regression
was introduced by 8a03cf039c (Implement
cache index randomization for large allocations.).

Zero trailing bytes of huge allocations when resizing from/to a size
class that is not a multiple of the chunk size.
2015-09-24 16:38:45 -07:00
Jason Evans
fb64ec29ec Fix prof_tctx_dump_iter() to filter.
Fix prof_tctx_dump_iter() to filter out nodes that were created after
heap profile dumping started.  Prior to this fix, spurious entries with
arbitrary object/byte counts could appear in heap profiles, which
resulted in jeprof inaccuracies or failures.
2015-09-21 18:37:55 -07:00
Jason Evans
e56b24e3a2 Make arena_dalloc_large_locked_impl() static. 2015-09-20 09:58:10 -07:00
Jason Evans
21523297fc Add mallocx() OOM tests. 2015-09-17 15:27:28 -07:00
Jason Evans
3ca0cf6a68 Fix prof_alloc_rollback().
Fix prof_alloc_rollback() to read tdata from thread-specific data rather
than dereferencing a potentially invalid tctx.
2015-09-17 14:49:50 -07:00
Jason Evans
3263be6efb Simplify imallocx_prof_sample().
Simplify imallocx_prof_sample() to always operate on usize rather than
sometimes using size.  This avoids redundant usize computations and
more closely fits the style adopted by i[rx]allocx_prof_sample() to fix
sampling bugs.
2015-09-17 10:19:28 -07:00
Jason Evans
4be9c79f88 Fix irallocx_prof_sample().
Fix irallocx_prof_sample() to always allocate large regions, even when
alignment is non-zero.
2015-09-17 10:17:55 -07:00
Jason Evans
38e2c8fa9c Fix ixallocx_prof_sample().
Fix ixallocx_prof_sample() to never modify nor create sampled small
allocations.  xallocx() is in general incapable of moving small
allocations, so this fix removes buggy code without loss of generality.
2015-09-17 10:05:56 -07:00
Jason Evans
9a505b768c Centralize xallocx() size[+extra] overflow checks. 2015-09-15 14:39:58 -07:00