Commit Graph

155 Commits

Author SHA1 Message Date
Dave Watson
cd86c1481a Fix arena_run_first_best_fit
Merge of 3417a304cc looks like a small
bug: first_best_fit doesn't scan through all the classes, since ind is
offset from runs_avail_nclasses by run_avail_bias.
2016-02-24 17:50:02 -08:00
Jason Evans
9e1810ca9d Silence miscellaneous 64-to-32-bit data loss warnings. 2016-02-24 13:03:48 -08:00
Jason Evans
9f4ee6034c Refactor jemalloc_ffs*() into ffs_*().
Use appropriate versions to resolve 64-to-32-bit data loss warnings.
2016-02-24 13:03:48 -08:00
Jason Evans
ae45142adc Collapse arena_avail_tree_* into arena_run_tree_*.
These tree types converged to become identical, yet they still had
independently generated red-black tree implementations.
2016-02-23 18:27:24 -08:00
Dave Watson
3417a304cc Separate arena_avail trees
Separate run trees by index, replacing the previous quantize logic.
Quantization by index is now performed only on insertion / removal from
the tree, and not on node comparison, saving some cpu.  This also means
we don't have to dereference the miscelm* pointers, saving half of the
memory loads from miscelms/mapbits that have fallen out of cache.  A
linear scan of the indicies appears to be fast enough.

The only cost of this is an extra tree array in each arena.
2016-02-23 18:09:36 -08:00
Jason Evans
0da8ce1e96 Use table lookup for run_quantize_{floor,ceil}().
Reduce run quantization overhead by generating lookup tables during
bootstrapping, and using the tables for all subsequent run quantization.
2016-02-22 16:47:34 -08:00
Jason Evans
08551eee58 Fix run_quantize_ceil().
In practice this bug had limited impact (and then only by increasing
chunk fragmentation) because run_quantize_ceil() returned correct
results except for inputs that could only arise from aligned allocation
requests that required more than page alignment.

This bug existed in the original run quantization implementation, which
was introduced by 8a03cf039c (Implement
cache index randomization for large allocations.).
2016-02-22 16:28:00 -08:00
Jason Evans
a9a4684792 Test run quantization.
Also rename run_quantize_*() to improve clarity.  These tests
demonstrate that run_quantize_ceil() is flawed.
2016-02-22 14:58:05 -08:00
Jason Evans
9bad079039 Refactor time_* into nstime_*.
Use a single uint64_t in nstime_t to store nanoseconds rather than using
struct timespec.  This reduces fragility around conversions between long
and uint64_t, especially missing casts that only cause problems on
32-bit platforms.
2016-02-21 21:39:05 -08:00
Jason Evans
243f7a0508 Implement decay-based unused dirty page purging.
This is an alternative to the existing ratio-based unused dirty page
purging, and is intended to eventually become the sole purging
mechanism.

Add mallctls:
- opt.purge
- opt.decay_time
- arena.<i>.decay
- arena.<i>.decay_time
- arenas.decay_time
- stats.arenas.<i>.decay_time

This resolves #325.
2016-02-19 20:56:21 -08:00
Jason Evans
1a4ad3c0fa Refactor out arena_compute_npurge().
Refactor out arena_compute_npurge() by integrating its logic into
arena_stash_dirty() as an incremental computation.
2016-02-19 20:32:37 -08:00
Jason Evans
4985dc681e Refactor arena_ralloc_no_move().
Refactor early return logic in arena_ralloc_no_move() to return early on
failure rather than on success.
2016-02-19 20:32:37 -08:00
Jason Evans
578cd16581 Refactor arena_malloc_hard() out of arena_malloc(). 2016-02-19 20:32:32 -08:00
Jason Evans
34676d3369 Refactor prng* from cpp macros into inline functions.
Remove 32-bit variant, convert prng64() to prng_lg_range(), and add
prng_range().
2016-02-19 20:29:06 -08:00
Qi Wang
f4a0f32d34 Fast-path improvement: reduce # of branches and unnecessary operations.
- Combine multiple runtime branches into a single malloc_slow check.
- Avoid calling arena_choose / size2index / index2size on fast path.
- A few micro optimizations.
2015-11-10 14:28:34 -08:00
Joshua Kahn
13b4015531 Allow const keys for lookup
Signed-off-by: Steve Dougherty <sdougherty@barracuda.com>

This resolves #281.
2015-11-09 15:48:05 -08:00
Mike Hommey
f97298bfc1 Remove arena_run_dalloc_decommit().
This resolves #284.
2015-11-09 15:38:30 -08:00
Jason Evans
a784e411f2 Fix a xallocx(..., MALLOCX_ZERO) bug.
Fix xallocx(..., MALLOCX_ZERO to zero the last full trailing page of
large allocations that have been randomly assigned an offset of 0 when
--enable-cache-oblivious configure option is enabled.  This addresses a
special case missed in d260f442ce (Fix
xallocx(..., MALLOCX_ZERO) bugs.).
2015-09-24 22:21:55 -07:00
Jason Evans
d260f442ce Fix xallocx(..., MALLOCX_ZERO) bugs.
Zero all trailing bytes of large allocations when
--enable-cache-oblivious configure option is enabled.  This regression
was introduced by 8a03cf039c (Implement
cache index randomization for large allocations.).

Zero trailing bytes of huge allocations when resizing from/to a size
class that is not a multiple of the chunk size.
2015-09-24 16:38:45 -07:00
Jason Evans
e56b24e3a2 Make arena_dalloc_large_locked_impl() static. 2015-09-20 09:58:10 -07:00
Jason Evans
9a505b768c Centralize xallocx() size[+extra] overflow checks. 2015-09-15 14:39:58 -07:00
Jason Evans
676df88e48 Rename arena_maxclass to large_maxclass.
arena_maxclass is no longer an appropriate name, because arenas also
manage huge allocations.
2015-09-11 20:50:20 -07:00
Jason Evans
560a4e1e01 Fix xallocx() bugs.
Fix xallocx() bugs related to the 'extra' parameter when specified as
non-zero.
2015-09-11 20:40:34 -07:00
Dmitry-Me
a306a60651 Reduce variables scope 2015-09-04 10:42:33 -07:00
Jason Evans
d01fd19755 Rename index_t to szind_t to avoid an existing type on Solaris.
This resolves #256.
2015-08-19 15:21:32 -07:00
Jason Evans
5ef33a9f2b Don't bitshift by negative amounts.
Don't bitshift by negative amounts when encoding/decoding run sizes in
chunk header maps.  This affected systems with page sizes greater than 8
KiB.

Reported by Ingvar Hagelund <ingvar@redpill-linpro.com>.
2015-08-19 14:16:30 -07:00
Jason Evans
1f27abc1b1 Refactor arena_mapbits_{small,large}_set() to not preserve unzeroed.
Fix arena_run_split_large_helper() to treat newly committed memory as
zeroed.
2015-08-11 16:45:47 -07:00
Jason Evans
45186f0c07 Refactor arena_mapbits unzeroed flag management.
Only set the unzeroed flag when initializing the entire mapbits entry,
rather than mutating just the unzeroed bit.  This simplifies the
possible mapbits state transitions.
2015-08-10 23:03:34 -07:00
Jason Evans
de249c8679 Arena chunk decommit cleanups and fixes.
Decommit arena chunk header during chunk deallocation if the rest of the
chunk is decommitted.
2015-08-10 17:13:59 -07:00
Jason Evans
8fadb1a8c2 Implement chunk hook support for page run commit/decommit.
Cascade from decommit to purge when purging unused dirty pages, so that
it is possible to decommit cleaned memory rather than just purging.  For
non-Windows debug builds, decommit runs rather than purging them, since
this causes access of deallocated runs to segfault.

This resolves #251.
2015-08-07 00:50:58 -07:00
Jason Evans
5716d97f75 Fix an in-place growing large reallocation regression.
Fix arena_ralloc_large_grow() to properly account for large_pad, so that
in-place large reallocation succeeds when possible, rather than always
failing.  This regression was introduced by
8a03cf039c (Implement cache index
randomization for large allocations.)
2015-08-06 23:45:45 -07:00
Jason Evans
b49a334a64 Generalize chunk management hooks.
Add the "arena.<i>.chunk_hooks" mallctl, which replaces and expands on
the "arena.<i>.chunk.{alloc,dalloc,purge}" mallctls.  The chunk hooks
allow control over chunk allocation/deallocation, decommit/commit,
purging, and splitting/merging, such that the application can rely on
jemalloc's internal chunk caching and retaining functionality, yet
implement a variety of chunk management mechanisms and policies.

Merge the chunks_[sz]ad_{mmap,dss} red-black trees into
chunks_[sz]ad_retained.  This slightly reduces how hard jemalloc tries
to honor the dss precedence setting; prior to this change the precedence
setting was also consulted when recycling chunks.

Fix chunk purging.  Don't purge chunks in arena_purge_stashed(); instead
deallocate them in arena_unstash_purged(), so that the dirty memory
linkage remains valid until after the last time it is used.

This resolves #176 and #201.
2015-08-03 21:49:02 -07:00
Jason Evans
50883deb6e Change arena_palloc_large() parameter from size to usize.
This change merely documents that arena_palloc_large() always receives
usize as its argument.
2015-07-23 17:13:18 -07:00
Jason Evans
5fae7dc1b3 Fix MinGW-related portability issues.
Create and use FMT* macros that are equivalent to the PRI* macros that
inttypes.h defines.  This allows uniform use of the Unix-specific format
specifiers, e.g. "%zu", as well as avoiding Windows-specific definitions
of e.g. PRIu64.

Add ffs()/ffsl() support for compiling with gcc.

Extract compatibility definitions of ENOENT, EINVAL, EAGAIN, EPERM,
ENOMEM, and ENORANGE into include/msvc_compat/windows_extra.h and
use the file for tests as well as for core jemalloc code.
2015-07-23 13:56:25 -07:00
Jason Evans
aa2826621e Revert to first-best-fit run/chunk allocation.
This effectively reverts 97c04a9383 (Use
first-fit rather than first-best-fit run/chunk allocation.).  In some
pathological cases, first-fit search dominates allocation time, and it
also tends not to converge as readily on a steady state of memory
layout, since precise allocation order has a bigger effect than for
first-best-fit.
2015-07-15 17:15:19 -07:00
Jason Evans
0313607e66 Fix MinGW build warnings.
Conditionally define ENOENT, EINVAL, etc. (was unconditional).

Add/use PRIzu, PRIzd, and PRIzx for use in malloc_printf() calls.  gcc issued
(harmless) warnings since e.g. "%zu" should be "%Iu" on Windows, and the
alternative to this workaround would have been to disable the function
attributes which cause gcc to look for type mismatches in formatted printing
function calls.
2015-07-07 20:10:28 -07:00
Jason Evans
bce61d61bb Move a variable declaration closer to its use. 2015-07-07 09:32:05 -07:00
Jason Evans
0a9f9a4d51 Convert arena_maybe_purge() recursion to iteration.
This resolves #235.
2015-06-22 18:50:58 -07:00
Jason Evans
5154175cf1 Fix performance regression in arena_palloc().
Pass large allocation requests to arena_malloc() when possible.  This
regression was introduced by 155bfa7da1
(Normalize size classes.).
2015-05-19 17:42:31 -07:00
Jason Evans
8a03cf039c Implement cache index randomization for large allocations.
Extract szad size quantization into {extent,run}_quantize(), and .
quantize szad run sizes to the union of valid small region run sizes and
large run sizes.

Refactor iteration in arena_run_first_fit() to use
run_quantize{,_first,_next(), and add support for padded large runs.

For large allocations that have no specified alignment constraints,
compute a pseudo-random offset from the beginning of the first backing
page that is a multiple of the cache line size.  Under typical
configurations with 4-KiB pages and 64-byte cache lines this results in
a uniform distribution among 64 page boundary offsets.

Add the --disable-cache-oblivious option, primarily intended for
performance testing.

This resolves #13.
2015-05-06 13:27:39 -07:00
Jason Evans
65db63cf3f Fix in-place shrinking huge reallocation purging bugs.
Fix the shrinking case of huge_ralloc_no_move_similar() to purge the
correct number of pages, at the correct offset.  This regression was
introduced by 8d6a3e8321 (Implement
dynamic per arena control over dirty page purging.).

Fix huge_ralloc_no_move_shrink() to purge the correct number of pages.
This bug was introduced by 9673983443
(Purge/zero sub-chunk huge allocations as necessary.).
2015-03-25 19:10:06 -07:00
Jason Evans
562d266511 Add the "stats.arenas.<i>.lg_dirty_mult" mallctl. 2015-03-24 16:41:38 -07:00
Jason Evans
bd16ea49c3 Fix signed/unsigned comparison in arena_lg_dirty_mult_valid(). 2015-03-24 15:59:28 -07:00
Jason Evans
8d6a3e8321 Implement dynamic per arena control over dirty page purging.
Add mallctls:
- arenas.lg_dirty_mult is initialized via opt.lg_dirty_mult, and can be
  modified to change the initial lg_dirty_mult setting for newly created
  arenas.
- arena.<i>.lg_dirty_mult controls an individual arena's dirty page
  purging threshold, and synchronously triggers any purging that may be
  necessary to maintain the constraint.
- arena.<i>.chunk.purge allows the per arena dirty page purging function
  to be replaced.

This resolves #93.
2015-03-18 18:55:33 -07:00
Jason Evans
bc45d41d23 Fix a declaration-after-statement regression. 2015-03-11 16:50:40 -07:00
Jason Evans
f5c8f37259 Normalize rdelm/rd structure field naming. 2015-03-10 18:29:49 -07:00
Jason Evans
38e42d311c Refactor dirty run linkage to reduce sizeof(extent_node_t). 2015-03-10 18:15:40 -07:00
Jason Evans
97c04a9383 Use first-fit rather than first-best-fit run/chunk allocation.
This tends to more effectively pack active memory toward low addresses.
However, additional tree searches are required in many cases, so whether
this change stands the test of time will depend on real-world
benchmarks.
2015-03-06 20:21:41 -08:00
Jason Evans
5707d6f952 Quantize szad trees by size class.
Treat sizes that round down to the same size class as size-equivalent
in trees that are used to search for first best fit, so that there are
only as many "firsts" as there are size classes.  This comes closer to
the ideal of first fit.
2015-03-06 20:21:41 -08:00
Jason Evans
99bd94fb65 Fix chunk cache races.
These regressions were introduced by
ee41ad409a (Integrate whole chunks into
unused dirty page purging machinery.).
2015-02-18 16:40:53 -08:00