Commit Graph

813 Commits

Author SHA1 Message Date
Jason Evans
ffa45a5331 Use rtree rather than [sz]ad trees for chunk split/coalesce operations. 2016-06-03 12:27:41 -07:00
Jason Evans
93e79c5c3f Remove redundant chunk argument from chunk_{,de,re}register(). 2016-06-03 12:27:41 -07:00
Jason Evans
f442254bdf Fix opt_zero-triggered in-place huge reallocation zeroing.
Fix huge_ralloc_no_move_expand() to update the extent's zeroed attribute
based on the intersection of the previous value and that of the newly
merged trailing extent.
2016-06-03 12:27:41 -07:00
Jason Evans
d78846c989 Replace extent_achunk_[gs]et() with extent_slab_[gs]et(). 2016-06-03 12:27:41 -07:00
Jason Evans
fae8344098 Add extent_active_[gs]et().
Always initialize extents' runs_dirty and chunks_cache linkage.
2016-06-03 12:27:41 -07:00
Jason Evans
b2a9fae886 Set/unset rtree node for last chunk of extents.
Set/unset rtree node for last chunk of extents, so that the rtree can be
used for chunk coalescing.
2016-06-03 12:27:41 -07:00
Jason Evans
e75e9be130 Add rtree element witnesses. 2016-06-03 12:27:41 -07:00
Jason Evans
8c9be3e837 Refactor rtree to always use base_alloc() for node allocation. 2016-06-03 12:27:41 -07:00
Jason Evans
db72272bef Use rtree-based chunk lookups rather than pointer bit twiddling.
Look up chunk metadata via the radix tree, rather than using
CHUNK_ADDR2BASE().

Propagate pointer's containing extent.

Minimize extent lookups by doing a single lookup (e.g. in free()) and
propagating the pointer's extent into nearly all the functions that may
need it.
2016-06-03 12:27:41 -07:00
Jason Evans
2d2b4e98c9 Add element acquire/release capabilities to rtree.
This makes it possible to acquire short-term "ownership" of rtree
elements so that it is possible to read an extent pointer *and* read the
extent's contents with a guarantee that the element will not be modified
until the ownership is released.  This is intended as a mechanism for
resolving rtree read/write races rather than as a way to lock extents.
2016-06-03 12:27:33 -07:00
Jason Evans
a7a6f5bc96 Rename extent_node_t to extent_t. 2016-05-16 12:21:28 -07:00
Jason Evans
3aea827f5e Simplify run quantization. 2016-05-16 12:21:27 -07:00
Jason Evans
7bb00ae9d6 Refactor runs_avail.
Use pszind_t size classes rather than szind_t size classes, and always
reserve space for NPSIZES elements.  This removes unused heaps that are
not multiples of the page size, and adds (currently) unused heaps for
all huge size classes, with the immediate benefit that the size of
arena_t allocations is constant (no longer dependent on chunk size).
2016-05-16 12:21:21 -07:00
Jason Evans
226c446979 Implement pz2ind(), pind2sz(), and psz2u().
These compute size classes and indices similarly to size2index(),
index2size() and s2u(), respectively, but using the subset of size
classes that are multiples of the page size.  Note that pszind_t and
szind_t are not interchangeable.
2016-05-13 10:31:54 -07:00
Jason Evans
627372b459 Initialize arena_bin_info at compile time rather than at boot time.
This resolves #370.
2016-05-13 10:31:30 -07:00
Jason Evans
b683734b43 Implement BITMAP_INFO_INITIALIZER(nbits).
This allows static initialization of bitmap_info_t structures.
2016-05-13 10:27:48 -07:00
Jason Evans
17c021c177 Remove redzone support.
This resolves #369.
2016-05-13 10:27:33 -07:00
Jason Evans
ba5c709517 Remove quarantine support. 2016-05-13 10:25:05 -07:00
Jason Evans
9a8add1510 Remove Valgrind support. 2016-05-13 09:56:18 -07:00
Jason Evans
a397045323 Use TSDN_NULL rather than NULL as appropriate. 2016-05-12 21:07:08 -07:00
Jason Evans
1c35f63797 Guard tsdn_tsd() call with tsdn_null() check. 2016-05-11 16:52:58 -07:00
Jason Evans
0fc1317fc6 Mangle tested functions as n_witness_* rather than witness_*_impl. 2016-05-11 16:14:20 -07:00
Jason Evans
73d3d58dc2 Optimize witness fast path.
Short-circuit commonly called witness functions so that they only
execute in debug builds, and remove equivalent guards from mutex
functions.  This avoids pointless code execution in
witness_assert_lockless(), which is typically called twice per
allocation/deallocation function invocation.

Inline commonly called witness functions so that optimized builds can
completely remove calls as dead code.
2016-05-11 15:38:06 -07:00
Jason Evans
7790a0ba40 Fix chunk accounting related to triggering gdump profiles.
Fix in place huge reallocation to update the chunk counters that are
used for triggering gdump profiles.
2016-05-11 00:56:30 -07:00
Jason Evans
c1e00ef2a6 Resolve bootstrapping issues when embedded in FreeBSD libc.
b2c0d6322d (Add witness, a simple online
locking validator.) caused a broad propagation of tsd throughout the
internal API, but tsd_fetch() was designed to fail prior to tsd
bootstrapping.  Fix this by splitting tsd_t into non-nullable tsd_t and
nullable tsdn_t, and modifying all internal APIs that do not critically
rely on tsd to take nullable pointers.  Furthermore, add the
tsd_booted_get() function so that tsdn_fetch() can probe whether tsd
bootstrapping is complete and return NULL if not.  All dangerous
conversions of nullable pointers are tsdn_tsd() calls that assert-fail
on invalid conversion.
2016-05-10 22:51:33 -07:00
Jason Evans
0c12dcabc5 Fix tsd bootstrapping for a0malloc(). 2016-05-07 16:55:36 -07:00
Jason Evans
3ef51d7f73 Optimize the fast paths of calloc() and [m,d,sd]allocx().
This is a broader application of optimizations to malloc() and free() in
f4a0f32d34 (Fast-path improvement:
reduce # of branches and unnecessary operations.).

This resolves #321.
2016-05-06 14:37:39 -07:00
Jason Evans
c2f970c32b Modify pages_map() to support mapping uncommitted virtual memory.
If the OS overcommits:
- Commit all mappings in pages_map() regardless of whether the caller
  requested committed memory.
- Linux-specific: Specify MAP_NORESERVE to avoid
  unfortunate interactions with heuristic overcommit mode during
  fork(2).

This resolves #193.
2016-05-05 18:56:17 -07:00
Jason Evans
dc391adc65 Scale leak report summary according to sampling probability.
This makes the numbers reported in the leak report summary closely match
those reported by jeprof.

This resolves #356.
2016-05-04 12:14:36 -07:00
Jason Evans
04c3c0f9a0 Add the stats.retained and stats.arenas.<i>.retained statistics.
This resolves #367.
2016-05-03 22:11:35 -07:00
Jason Evans
90827a3f3e Fix huge_palloc() regression.
Split arena_choose() into arena_[i]choose() and use arena_ichoose() for
arena lookup during internal allocation.  This fixes huge_palloc() so
that it always succeeds during extent node allocation.

This regression was introduced by
66cd953514 (Do not allocate metadata via
non-auto arenas, nor tcaches.).
2016-05-03 17:19:15 -07:00
Jason Evans
108c4a11e9 Fix witness/fork() interactions.
Fix witness to clear its list of owned mutexes in the child if
platform-specific malloc_mutex code re-initializes mutexes rather than
unlocking them.
2016-04-26 10:47:22 -07:00
Jason Evans
174c0c3a9c Fix fork()-related lock rank ordering reversals. 2016-04-25 23:16:20 -07:00
Jason Evans
7e6749595a Fix arena reset effects on large/huge stats.
Reset large curruns to 0 during arena reset.

Do not increase huge ndalloc stats during arena reset.
2016-04-25 13:26:54 -07:00
Jason Evans
259f8ebbfc Fix arena_choose_hard() regression.
This regression was caused by 66cd953514
(Do not allocate metadata via non-auto arenas, nor tcaches.).
2016-04-22 22:21:31 -07:00
Jason Evans
19ff2cefba Implement the arena.<i>.reset mallctl.
This makes it possible to discard all of an arena's allocations in a
single operation.

This resolves #146.
2016-04-22 15:20:06 -07:00
Jason Evans
66cd953514 Do not allocate metadata via non-auto arenas, nor tcaches.
This assures that all internally allocated metadata come from the
first opt_narenas arenas, i.e. the automatically multiplexed arenas.
2016-04-22 15:19:59 -07:00
Jason Evans
c9a4bf9170 Reduce a variable scope. 2016-04-22 14:56:58 -07:00
Jason Evans
ab0cfe01fa Update private_symbols.txt.
Change test-related mangling to simplify symbol filtering.

The following commands can be used to detect missing/obsolete symbol
mangling, with the caveat that the full set of symbols is based on the
union of symbols generated by all configurations, some of which are
platform-specific:

./autogen.sh --enable-debug --enable-prof --enable-lazy-lock
make all tests
nm -a lib/libjemalloc.a src/*.jet.o \
  |grep " [TDBCR] " \
  |awk '{print $3}' \
  |sed -e 's/^\(je_\|jet_\(n_\)\?\)\([a-zA-Z0-9_]*\)/\3/g' \
  |LC_COLLATE=C sort -u \
  |grep -v \
   -e '^\(malloc\|calloc\|posix_memalign\|aligned_alloc\|realloc\|free\)$' \
   -e '^\(m\|r\|x\|s\|d\|sd\|n\)allocx$' \
   -e '^mallctl\(\|nametomib\|bymib\)$' \
   -e '^malloc_\(stats_print\|usable_size\|message\)$' \
   -e '^\(memalign\|valloc\)$' \
   -e '^__\(malloc\|memalign\|realloc\|free\)_hook$' \
   -e '^pthread_create$' \
  > /tmp/private_symbols.txt
2016-04-18 15:23:35 -07:00
Jason Evans
1423ee9016 Fix style nits. 2016-04-17 13:44:59 -07:00
Jason Evans
1b5830178f Fix malloc_mutex_[un]lock() to conditionally check witness.
Also remove tautological cassert(config_debug) calls.
2016-04-17 13:44:59 -07:00
Jason Evans
d9394d0ca8 Convert base_mtx locking protocol comments to assertions. 2016-04-17 13:44:58 -07:00
Jason Evans
b2c0d6322d Add witness, a simple online locking validator.
This resolves #358.
2016-04-14 02:09:28 -07:00
rustyx
00432331b8 Fix 64-to-32 conversion warnings in 32-bit mode 2016-04-12 09:34:09 -07:00
Jason Evans
e7642715ac Fix malloc_stats_print() to print correct opt.narenas value.
This regression was caused by 8f683b94a7
(Make opt_narenas unsigned rather than size_t.).
2016-04-11 18:47:18 -07:00
Jason Evans
245ae6036c Support --with-lg-page values larger than actual page size.
During over-allocation in preparation for creating aligned mappings,
allocate one more page than necessary if PAGE is the actual page size,
so that trimming still succeeds even if the system returns a mapping
that has less than PAGE alignment.  This allows compiling with e.g. 64
KiB "pages" on systems that actually use 4 KiB pages.

Note that for e.g. --with-lg-page=21, it is also necessary to increase
the chunk size (e.g. --with-malloc-conf=lg_chunk:22) so that there are
at least two "pages" per chunk.  In practice this isn't a particularly
compelling configuration because so much (unusable) virtual memory is
dedicated to chunk headers.
2016-04-11 02:35:00 -07:00
Jason Evans
c6a2c39404 Refactor/fix ph.
Refactor ph to support configurable comparison functions.  Use a cpp
macro code generation form equivalent to the rb macros so that pairing
heaps can be used for both run heaps and chunk heaps.

Remove per node parent pointers, and instead use leftmost siblings' prev
pointers to track parents.

Fix multi-pass sibling merging to iterate over intermediate results
using a FIFO, rather than a LIFO.  Use this fixed sibling merging
implementation for both merge phases of the auxiliary twopass algorithm
(first merging the aux list, then replacing the root with its merged
children).  This fixes both degenerate merge behavior and the potential
for deep recursion.

This regression was introduced by
6bafa6678f (Pairing heap).

This resolves #371.
2016-04-11 02:15:42 -07:00
Jason Evans
2ee2f1ec57 Reduce differences between alternative bitmap implementations. 2016-04-06 10:38:47 -07:00
Chris Peterson
a82070ef5f Add JEMALLOC_ALLOC_JUNK and JEMALLOC_FREE_JUNK macros
Replace hardcoded 0xa5 and 0x5a junk values with JEMALLOC_ALLOC_JUNK and
JEMALLOC_FREE_JUNK macros, respectively.
2016-03-31 11:23:29 -07:00
Jason Evans
f86bc081d6 Update a comment. 2016-03-31 11:19:46 -07:00
Jason Evans
ce7c0f999b Fix potential chunk leaks.
Move chunk_dalloc_arena()'s implementation into chunk_dalloc_wrapper(),
so that if the dalloc hook fails, proper decommit/purge/retain cascading
occurs.  This fixes three potential chunk leaks on OOM paths, one during
dss-based chunk allocation, one during chunk header commit (currently
relevant only on Windows), and one during rtree write (e.g. if rtree
node allocation fails).

Merge chunk_purge_arena() into chunk_purge_default() (refactor, no
change to functionality).
2016-03-30 18:36:04 -07:00
Chris Peterson
0bc716ae27 Fix -Wunreachable-code warning in malloc_vsnprintf().
Variables s and slen are declared inside a switch statement, but outside
a case scope. clang reports these variable definitions as "unreachable",
though this is not really meaningful in this case. This is the only
-Wunreachable-code warning in jemalloc.

src/util.c:501:5 [-Wunreachable-code] code will never be executed

This resolves #364.
2016-03-26 23:24:33 -07:00
Jason Evans
61a6dfcd5f Constify various internal arena APIs. 2016-03-23 16:15:42 -07:00
Jason Evans
f6bd2e5a17 Code formatting fixes. 2016-03-23 16:15:42 -07:00
Jason Evans
6c460ad91b Optimize rtree_get().
Specialize fast path to avoid code that cannot execute for dependent
loads.

Manually unroll.
2016-03-22 17:54:35 -07:00
Jason Evans
22af74e106 Refactor out signed/unsigned comparisons. 2016-03-15 09:40:02 -07:00
Jason Evans
613cdc80f6 Convert arena_bin_t's runs from a tree to a heap. 2016-03-08 13:48:27 -08:00
Dave Watson
4a0dbb5ac8 Use pairing heap for arena->runs_avail
Use pairing heap instead of red black tree in arena runs_avail.  The
extra links are unioned with the bitmap_t, so this change doesn't use
any extra memory.

Canaries show this change to be a 1% cpu win, and 2% latency win.  In
particular, large free()s, and small bin frees are now O(1) (barring
coalescing).

I also tested changing bin->runs to be a pairing heap, but saw a much
smaller win, and it would mean increasing the size of arena_run_s by two
pointers, so I left that as an rb-tree for now.
2016-03-08 13:48:27 -08:00
Dave Watson
6bafa6678f Pairing heap
Initial implementation of a twopass pairing heap with aux list.
Research papers linked in comments.

Where search/nsearch/last aren't needed, this gives much faster first(),
delete(), and insert().  Insert is O(1), and first/delete don't have to
walk the whole tree.

Also tested rb_old with parent pointers - it was better than the current
rb.h for memory loads, but still much worse than a pairing heap.

An array-based heap would be much faster if everything fits in memory,
but on a cold cache it has many more memory loads for most operations.
2016-03-08 13:46:19 -08:00
Jason Evans
022f6891fa Avoid a potential innocuous compiler warning.
Add a cast to avoid comparing a ssize_t value to a uint64_t value that
is always larger than a 32-bit ssize_t.  This silences an innocuous
compiler warning from e.g. gcc 4.2.1 about the comparison always having
the same result.
2016-03-02 22:45:37 -08:00
Dmitri Smirnov
33184bf698 Fix stack corruption and uninitialized var warning
Stack corruption happens in x64 bit

This resolves #347.
2016-02-29 15:22:53 -08:00
Jason Evans
39f58755a7 Fix a potential tsd cleanup leak.
Prior to 767d85061a (Refactor arenas array
(fixes deadlock).), it was possible under some circumstances for
arena_get() to trigger recreation of the arenas cache during tsd
cleanup, and the arenas cache would then be leaked.  In principle a
similar issue could still occur as a side effect of decay-based purging,
which calls arena_tdata_get().  Fix arenas_tdata_cleanup() by setting
tsd->arenas_tdata_bypass to true, so that arena_tdata_get() will
gracefully fail (an expected behavior) rather than recreating
tsd->arena_tdata.

Reported by Christopher Ferris <cferris@google.com>.
2016-02-27 21:18:15 -08:00
Jason Evans
3c07f803aa Fix stats.arenas.<i>.[...] for --disable-stats case.
Add missing stats.arenas.<i>.{dss,lg_dirty_mult,decay_time}
initialization.

Fix stats.arenas.<i>.{pactive,pdirty} to read under the protection of
the arena mutex.
2016-02-27 20:40:13 -08:00
Jason Evans
40ee9aa957 Fix stats.cactive accounting regression.
Fix stats.cactive accounting to always increase/decrease by multiples of
the chunk size, even for huge size classes that are not multiples of the
chunk size, e.g. {2.5, 3, 3.5, 5, 7} MiB with 2 MiB chunk size.  This
regression was introduced by 155bfa7da1
(Normalize size classes.) and first released in 4.0.0.

This resolves #336.
2016-02-27 15:35:52 -08:00
Jason Evans
3763d3b5f9 Refactor arena_cactive_update() into arena_cactive_{add,sub}().
This removes an implicit conversion from size_t to ssize_t.  For cactive
decreases, the size_t value was intentionally underflowed to generate
"negative" values (actually positive values above the positive range of
ssize_t), and the conversion to ssize_t was undefined according to C
language semantics.

This regression was perpetuated by
1522937e9c (Fix the cactive statistic.)
and first release in 4.0.0, which in retrospect only fixed one of two
problems introduced by aa5113b1fd
(Refactor overly large/complex functions) and first released in 3.5.0.
2016-02-26 17:29:35 -08:00
buchgr
d412624b25 Move retaining out of default chunk hooks
This fixes chunk allocation to reuse retained memory even if an
application-provided chunk allocation function is in use.

This resolves #307.
2016-02-26 15:24:13 -08:00
Dave Watson
b8823ab026 Use linear scan for small bitmaps
For small bitmaps, a linear scan of the bitmap is slightly faster than
a tree search - bitmap_t is more compact, and there are fewer writes
since we don't have to propogate state transitions up the tree.
On x86_64 with the current settings, I'm seeing ~.5%-1% CPU improvement
in production canaries with this change.

The old tree code is left since 32bit sizes are much larger (and ffsl
smaller), and maybe the run sizes will change in the future.

This resolves #339.
2016-02-26 14:21:10 -08:00
Jason Evans
01ecdf32d6 Miscellaneous bitmap refactoring. 2016-02-26 14:21:10 -08:00
Jason Evans
42ce80e15a Silence miscellaneous 64-to-32-bit data loss warnings.
This resolves #341.
2016-02-25 20:51:00 -08:00
Jason Evans
8282a2ad97 Remove a superfluous comment. 2016-02-25 16:44:48 -08:00
Jason Evans
9d2c10f2e8 Add more HUGE_MAXCLASS overflow checks.
Add HUGE_MAXCLASS overflow checks that are specific to heap profiling
code paths.  This fixes test failures that were introduced by
0c516a00c4 (Make *allocx() size class
overflow behavior defined.).
2016-02-25 16:42:15 -08:00
Jason Evans
0c516a00c4 Make *allocx() size class overflow behavior defined.
Limit supported size and alignment to HUGE_MAXCLASS, which in turn is
now limited to be less than PTRDIFF_MAX.

This resolves #278 and #295.
2016-02-25 15:29:49 -08:00
Jason Evans
767d85061a Refactor arenas array (fixes deadlock).
Refactor the arenas array, which contains pointers to all extant arenas,
such that it starts out as a sparse array of maximum size, and use
double-checked atomics-based reads as the basis for fast and simple
arena_get().  Additionally, reduce arenas_lock's role such that it only
protects against arena initalization races.  These changes remove the
possibility for arena lookups to trigger locking, which resolves at
least one known (fork-related) deadlock.

This resolves #315.
2016-02-24 23:58:10 -08:00
Dave Watson
3812729167 Fix arena_size computation.
Fix arena_size arena_new() computation to incorporate
runs_avail_nclasses elements for runs_avail, rather than
(runs_avail_nclasses - 1) elements.  Since offsetof(arena_t, runs_avail)
is used rather than sizeof(arena_t) for the first term of the
computation, all of the runs_avail elements must be added into the
second term.

This bug was introduced (by Jason Evans) while merging pull request #330
as 3417a304cc (Separate arena_avail
trees).
2016-02-24 20:10:02 -08:00
Dave Watson
cd86c1481a Fix arena_run_first_best_fit
Merge of 3417a304cc looks like a small
bug: first_best_fit doesn't scan through all the classes, since ind is
offset from runs_avail_nclasses by run_avail_bias.
2016-02-24 17:50:02 -08:00
Jason Evans
c7a9a6c86b Attempt mmap-based in-place huge reallocation.
Attempt mmap-based in-place huge reallocation by plumbing new_addr into
chunk_alloc_mmap().  This can dramatically speed up incremental huge
reallocation.

This resolves #335.
2016-02-24 17:23:18 -08:00
Jason Evans
ca8fffb5c1 Silence miscellaneous 64-to-32-bit data loss warnings. 2016-02-24 13:16:51 -08:00
Jason Evans
9e1810ca9d Silence miscellaneous 64-to-32-bit data loss warnings. 2016-02-24 13:03:48 -08:00
Jason Evans
0931cecbfa Use ssize_t for readlink() rather than int. 2016-02-24 13:03:48 -08:00
Jason Evans
8f683b94a7 Make opt_narenas unsigned rather than size_t. 2016-02-24 13:03:48 -08:00
Jason Evans
603b3bd413 Make nhbins unsigned rather than size_t. 2016-02-24 13:03:48 -08:00
Jason Evans
8dd5115ede Explicitly cast mib[] elements to unsigned where appropriate. 2016-02-24 13:03:48 -08:00
Jason Evans
9f4ee6034c Refactor jemalloc_ffs*() into ffs_*().
Use appropriate versions to resolve 64-to-32-bit data loss warnings.
2016-02-24 13:03:48 -08:00
Jason Evans
ae45142adc Collapse arena_avail_tree_* into arena_run_tree_*.
These tree types converged to become identical, yet they still had
independently generated red-black tree implementations.
2016-02-23 18:27:24 -08:00
Dave Watson
3417a304cc Separate arena_avail trees
Separate run trees by index, replacing the previous quantize logic.
Quantization by index is now performed only on insertion / removal from
the tree, and not on node comparison, saving some cpu.  This also means
we don't have to dereference the miscelm* pointers, saving half of the
memory loads from miscelms/mapbits that have fallen out of cache.  A
linear scan of the indicies appears to be fast enough.

The only cost of this is an extra tree array in each arena.
2016-02-23 18:09:36 -08:00
Jason Evans
0da8ce1e96 Use table lookup for run_quantize_{floor,ceil}().
Reduce run quantization overhead by generating lookup tables during
bootstrapping, and using the tables for all subsequent run quantization.
2016-02-22 16:47:34 -08:00
Jason Evans
08551eee58 Fix run_quantize_ceil().
In practice this bug had limited impact (and then only by increasing
chunk fragmentation) because run_quantize_ceil() returned correct
results except for inputs that could only arise from aligned allocation
requests that required more than page alignment.

This bug existed in the original run quantization implementation, which
was introduced by 8a03cf039c (Implement
cache index randomization for large allocations.).
2016-02-22 16:28:00 -08:00
Jason Evans
a9a4684792 Test run quantization.
Also rename run_quantize_*() to improve clarity.  These tests
demonstrate that run_quantize_ceil() is flawed.
2016-02-22 14:58:05 -08:00
Jason Evans
9bad079039 Refactor time_* into nstime_*.
Use a single uint64_t in nstime_t to store nanoseconds rather than using
struct timespec.  This reduces fragility around conversions between long
and uint64_t, especially missing casts that only cause problems on
32-bit platforms.
2016-02-21 21:39:05 -08:00
Jason Evans
788d29d397 Fix Windows-specific prof-related compilation portability issues. 2016-02-20 23:46:14 -08:00
Jason Evans
fd9cd7a6cc Fix time_update() to compile and work on MinGW. 2016-02-20 23:45:22 -08:00
rustyx
efbee86278 Prevent MSVC from optimizing away tls_callback (resolves #318) 2016-02-20 10:52:53 -08:00
rustyx
7f283980f0 getpid() fix for Win32 2016-02-20 10:52:53 -08:00
Jason Evans
243f7a0508 Implement decay-based unused dirty page purging.
This is an alternative to the existing ratio-based unused dirty page
purging, and is intended to eventually become the sole purging
mechanism.

Add mallctls:
- opt.purge
- opt.decay_time
- arena.<i>.decay
- arena.<i>.decay_time
- arenas.decay_time
- stats.arenas.<i>.decay_time

This resolves #325.
2016-02-19 20:56:21 -08:00
Jason Evans
1a4ad3c0fa Refactor out arena_compute_npurge().
Refactor out arena_compute_npurge() by integrating its logic into
arena_stash_dirty() as an incremental computation.
2016-02-19 20:32:37 -08:00
Jason Evans
db927b6727 Refactor arenas_cache tsd.
Refactor arenas_cache tsd into arenas_tdata, which is a structure of
type arena_tdata_t.
2016-02-19 20:32:37 -08:00
Jason Evans
4985dc681e Refactor arena_ralloc_no_move().
Refactor early return logic in arena_ralloc_no_move() to return early on
failure rather than on success.
2016-02-19 20:32:37 -08:00
Jason Evans
578cd16581 Refactor arena_malloc_hard() out of arena_malloc(). 2016-02-19 20:32:32 -08:00
Jason Evans
34676d3369 Refactor prng* from cpp macros into inline functions.
Remove 32-bit variant, convert prng64() to prng_lg_range(), and add
prng_range().
2016-02-19 20:29:06 -08:00
Jason Evans
c87ab25d18 Use ticker for incremental tcache GC. 2016-02-19 20:29:06 -08:00
Jason Evans
9998000b2b Implement ticker.
Implement ticker, which provides a simple API for ticking off some
number of events before indicating that the ticker has hit its limit.
2016-02-19 20:29:06 -08:00
Jason Evans
94451d184b Flesh out time_*() API. 2016-02-19 20:29:06 -08:00
Cameron Evans
e5d5a4a517 Add time_update(). 2016-02-19 20:29:06 -08:00
Jason Evans
f829009929 Add --with-malloc-conf.
Add --with-malloc-conf, which makes it possible to embed a default
options string during configuration.
2016-02-19 20:29:06 -08:00
Cosmin Paraschiv
9cb481a73f Call malloc_test_boot0() from malloc_init_hard_recursible().
When using LinuxThreads, malloc bootstrapping deadlocks, since
malloc_tsd_boot0() ends up calling pthread_setspecific(), which causes
recursive allocation.  Fix it by moving the malloc_tsd_boot0() call to
malloc_init_hard_recursible().

The deadlock was introduced by 8bb3198f72
(Refactor/fix arenas manipulation.), when tsd_boot() was split and the
top half, tsd_boot0(), got an extra tsd_wrapper_set() call.
2016-01-11 11:10:39 -08:00
Jason Evans
f9e3459f75 Tweak code to allow compilation of concatenated src/*.c sources.
This resolves #294.
2015-11-12 11:06:41 -08:00
Dmitry-Me
ea59ebf4d3 Reuse previously computed value 2015-11-12 10:45:49 -08:00
Qi Wang
f4a0f32d34 Fast-path improvement: reduce # of branches and unnecessary operations.
- Combine multiple runtime branches into a single malloc_slow check.
- Avoid calling arena_choose / size2index / index2size on fast path.
- A few micro optimizations.
2015-11-10 14:28:34 -08:00
Joshua Kahn
13b4015531 Allow const keys for lookup
Signed-off-by: Steve Dougherty <sdougherty@barracuda.com>

This resolves #281.
2015-11-09 15:48:05 -08:00
Mike Hommey
f97298bfc1 Remove arena_run_dalloc_decommit().
This resolves #284.
2015-11-09 15:38:30 -08:00
Jason Evans
a784e411f2 Fix a xallocx(..., MALLOCX_ZERO) bug.
Fix xallocx(..., MALLOCX_ZERO to zero the last full trailing page of
large allocations that have been randomly assigned an offset of 0 when
--enable-cache-oblivious configure option is enabled.  This addresses a
special case missed in d260f442ce (Fix
xallocx(..., MALLOCX_ZERO) bugs.).
2015-09-24 22:21:55 -07:00
Jason Evans
d36c7ebb00 Work around an NPTL-specific TSD issue.
Work around a potentially bad thread-specific data initialization
interaction with NPTL (glibc's pthreads implementation).

This resolves #283.
2015-09-24 16:53:18 -07:00
Jason Evans
d260f442ce Fix xallocx(..., MALLOCX_ZERO) bugs.
Zero all trailing bytes of large allocations when
--enable-cache-oblivious configure option is enabled.  This regression
was introduced by 8a03cf039c (Implement
cache index randomization for large allocations.).

Zero trailing bytes of huge allocations when resizing from/to a size
class that is not a multiple of the chunk size.
2015-09-24 16:38:45 -07:00
Jason Evans
fb64ec29ec Fix prof_tctx_dump_iter() to filter.
Fix prof_tctx_dump_iter() to filter out nodes that were created after
heap profile dumping started.  Prior to this fix, spurious entries with
arbitrary object/byte counts could appear in heap profiles, which
resulted in jeprof inaccuracies or failures.
2015-09-21 18:37:55 -07:00
Jason Evans
e56b24e3a2 Make arena_dalloc_large_locked_impl() static. 2015-09-20 09:58:10 -07:00
Jason Evans
21523297fc Add mallocx() OOM tests. 2015-09-17 15:27:28 -07:00
Jason Evans
3ca0cf6a68 Fix prof_alloc_rollback().
Fix prof_alloc_rollback() to read tdata from thread-specific data rather
than dereferencing a potentially invalid tctx.
2015-09-17 14:49:50 -07:00
Jason Evans
3263be6efb Simplify imallocx_prof_sample().
Simplify imallocx_prof_sample() to always operate on usize rather than
sometimes using size.  This avoids redundant usize computations and
more closely fits the style adopted by i[rx]allocx_prof_sample() to fix
sampling bugs.
2015-09-17 10:19:28 -07:00
Jason Evans
4be9c79f88 Fix irallocx_prof_sample().
Fix irallocx_prof_sample() to always allocate large regions, even when
alignment is non-zero.
2015-09-17 10:17:55 -07:00
Jason Evans
38e2c8fa9c Fix ixallocx_prof_sample().
Fix ixallocx_prof_sample() to never modify nor create sampled small
allocations.  xallocx() is in general incapable of moving small
allocations, so this fix removes buggy code without loss of generality.
2015-09-17 10:05:56 -07:00
Jason Evans
9a505b768c Centralize xallocx() size[+extra] overflow checks. 2015-09-15 14:39:58 -07:00
Dmitry-Me
78ae1ac486 Reduce variable scope.
This resolves #274.
2015-09-15 11:19:20 -07:00
Jason Evans
8c485b02a6 Fix ixallocx_prof() to check for size greater than HUGE_MAXCLASS. 2015-09-15 00:51:09 -07:00
Jason Evans
708ed79834 Resolve an unsupported special case in arena_prof_tctx_set().
Add arena_prof_tctx_reset() and use it instead of arena_prof_tctx_set()
when resetting the tctx pointer during reallocation, which happens
whenever an originally sampled reallocated object is not sampled during
reallocation.

This regression was introduced by
594c759f37 (Optimize
arena_prof_tctx_set().)
2015-09-14 23:57:58 -07:00
Jason Evans
23f6e103c8 Fix ixallocx_prof_sample() argument order reversal.
Fix ixallocx_prof() to pass usize_max and zero to ixallocx_prof_sample()
in the correct order.
2015-09-14 23:57:09 -07:00
Jason Evans
ce9a4e3479 s/max_usize/usize_max/g 2015-09-14 23:55:54 -07:00
Jason Evans
d9704042ee s/oldptr/old_ptr/g 2015-09-14 23:55:54 -07:00
Jason Evans
cec0d63d8b Make one call to prof_active_get_unlocked() per allocation event.
Make one call to prof_active_get_unlocked() per allocation event, and
use the result throughout the relevant functions that handle an
allocation event.  Also add a missing check in prof_realloc().  These
fixes protect allocation events against concurrent prof_active changes.
2015-09-14 23:55:48 -07:00
Jason Evans
ef363de701 Fix irealloc_prof() to prof_alloc_rollback() on OOM. 2015-09-14 23:54:42 -07:00
Jason Evans
46ff049128 Optimize irallocx_prof() to optimistically update the sampler state. 2015-09-14 22:47:18 -07:00
Jason Evans
4acb6c7ff3 Fix ixallocx_prof() size+extra overflow.
Fix ixallocx_prof() to clamp the extra parameter if size+extra would
overflow HUGE_MAXCLASS.
2015-09-14 22:47:12 -07:00
Jason Evans
676df88e48 Rename arena_maxclass to large_maxclass.
arena_maxclass is no longer an appropriate name, because arenas also
manage huge allocations.
2015-09-11 20:50:20 -07:00
Jason Evans
560a4e1e01 Fix xallocx() bugs.
Fix xallocx() bugs related to the 'extra' parameter when specified as
non-zero.
2015-09-11 20:40:34 -07:00
Jason Evans
a00b10735a Fix "prof.reset" mallctl-related corruption.
Fix heap profiling to distinguish among otherwise identical sample sites
with interposed resets (triggered via the "prof.reset" mallctl).  This
bug could cause data structure corruption that would most likely result
in a segfault.
2015-09-09 23:16:10 -07:00
Dmitry-Me
a306a60651 Reduce variables scope 2015-09-04 10:42:33 -07:00
Mike Hommey
0a116faf95 Force initialization of the init_lock in malloc_init_hard on Windows XP
This resolves #269.
2015-09-04 10:35:20 -07:00
Jason Evans
594c759f37 Optimize arena_prof_tctx_set().
Optimize arena_prof_tctx_set() to avoid reading run metadata when
deciding whether it's actually necessary to write.
2015-09-02 14:52:24 -07:00
Mike Hommey
4a2a3c9a6e Don't purge junk filled chunks when shrinking huge allocations
When junk filling is enabled, shrinking an allocation fills the bytes
that were previously allocated but now aren't. Purging the chunk before
doing that is just a waste of time.

This resolves #260.
2015-08-27 22:00:09 -07:00
Mike Hommey
6d8075f1e6 Fix chunk purge hook calls for in-place huge shrinking reallocation.
Fix chunk purge hook calls for in-place huge shrinking reallocation to
specify the old chunk size rather than the new chunk size.  This bug
caused no correctness issues for the default chunk purge function, but
was visible to custom functions set via the "arena.<i>.chunk_hooks"
mallctl.

This resolves #264.
2015-08-27 20:32:57 -07:00
Jason Evans
30949da601 Fix arenas_cache_cleanup() and arena_get_hard().
Fix arenas_cache_cleanup() and arena_get_hard() to handle
allocation/deallocation within the application's thread-specific data
cleanup functions even after arenas_cache is torn down.

This is a more general fix that complements
45e9f66c28 (Fix arenas_cache_cleanup().).
2015-08-27 20:32:35 -07:00
Christopher Ferris
45e9f66c28 Fix arenas_cache_cleanup().
Fix arenas_cache_cleanup() to handle allocation/deallocation within the
application's thread-specific data cleanup functions even after
arenas_cache is torn down.
2015-08-21 12:33:17 -07:00
Jason Evans
d01fd19755 Rename index_t to szind_t to avoid an existing type on Solaris.
This resolves #256.
2015-08-19 15:21:32 -07:00
Jason Evans
5ef33a9f2b Don't bitshift by negative amounts.
Don't bitshift by negative amounts when encoding/decoding run sizes in
chunk header maps.  This affected systems with page sizes greater than 8
KiB.

Reported by Ingvar Hagelund <ingvar@redpill-linpro.com>.
2015-08-19 14:16:30 -07:00
Jason Evans
56af64dc19 Fix a strict aliasing violation. 2015-08-12 16:38:20 -07:00
Jason Evans
6ed18cb348 Fix chunk_dalloc_arena() re: zeroing due to purge. 2015-08-12 15:20:34 -07:00
Jason Evans
03bf5b67be Try to decommit new chunks.
Always leave decommit disabled on non-Windows systems.
2015-08-12 10:26:54 -07:00
Jason Evans
1f27abc1b1 Refactor arena_mapbits_{small,large}_set() to not preserve unzeroed.
Fix arena_run_split_large_helper() to treat newly committed memory as
zeroed.
2015-08-11 16:45:47 -07:00
Jason Evans
45186f0c07 Refactor arena_mapbits unzeroed flag management.
Only set the unzeroed flag when initializing the entire mapbits entry,
rather than mutating just the unzeroed bit.  This simplifies the
possible mapbits state transitions.
2015-08-10 23:03:34 -07:00
Jason Evans
de249c8679 Arena chunk decommit cleanups and fixes.
Decommit arena chunk header during chunk deallocation if the rest of the
chunk is decommitted.
2015-08-10 17:13:59 -07:00
Jason Evans
8fadb1a8c2 Implement chunk hook support for page run commit/decommit.
Cascade from decommit to purge when purging unused dirty pages, so that
it is possible to decommit cleaned memory rather than just purging.  For
non-Windows debug builds, decommit runs rather than purging them, since
this causes access of deallocated runs to segfault.

This resolves #251.
2015-08-07 00:50:58 -07:00
Jason Evans
5716d97f75 Fix an in-place growing large reallocation regression.
Fix arena_ralloc_large_grow() to properly account for large_pad, so that
in-place large reallocation succeeds when possible, rather than always
failing.  This regression was introduced by
8a03cf039c (Implement cache index
randomization for large allocations.)
2015-08-06 23:45:45 -07:00
Matthijs
c1a6a51e40 MSVC compatibility changes
- Decorate public function with __declspec(allocator) and __declspec(restrict), just like MSVC 1900
- Support JEMALLOC_HAS_RESTRICT by defining the restrict keyword
- Move __declspec(nothrow) between 'void' and '*' so it compiles once more
2015-08-04 09:01:48 -07:00
Jason Evans
b49a334a64 Generalize chunk management hooks.
Add the "arena.<i>.chunk_hooks" mallctl, which replaces and expands on
the "arena.<i>.chunk.{alloc,dalloc,purge}" mallctls.  The chunk hooks
allow control over chunk allocation/deallocation, decommit/commit,
purging, and splitting/merging, such that the application can rely on
jemalloc's internal chunk caching and retaining functionality, yet
implement a variety of chunk management mechanisms and policies.

Merge the chunks_[sz]ad_{mmap,dss} red-black trees into
chunks_[sz]ad_retained.  This slightly reduces how hard jemalloc tries
to honor the dss precedence setting; prior to this change the precedence
setting was also consulted when recycling chunks.

Fix chunk purging.  Don't purge chunks in arena_purge_stashed(); instead
deallocate them in arena_unstash_purged(), so that the dirty memory
linkage remains valid until after the last time it is used.

This resolves #176 and #201.
2015-08-03 21:49:02 -07:00
Jason Evans
d059b9d6a1 Implement support for non-coalescing maps on MinGW.
- Do not reallocate huge objects in place if the number of backing
  chunks would change.
- Do not cache multi-chunk mappings.

This resolves #213.
2015-07-24 18:39:14 -07:00
Jason Evans
40cbd30d50 Fix huge_ralloc_no_move() to succeed more often.
Fix huge_ralloc_no_move() to succeed if an allocation request results in
the same usable size as the existing allocation, even if the request
size is smaller than the usable size.  This bug did not cause
correctness issues, but it could cause unnecessary moves during
reallocation.
2015-07-24 18:20:48 -07:00
Jason Evans
87ccb55547 Fix huge_palloc() to handle size rather than usize input.
huge_ralloc() passes a size that may not be precisely a size class, so
make huge_palloc() handle the more general case of a size input rather
than usize.

This regression appears to have been introduced by the addition of
in-place huge reallocation; as such it was never incorporated into a
release.
2015-07-23 17:18:49 -07:00
Jason Evans
50883deb6e Change arena_palloc_large() parameter from size to usize.
This change merely documents that arena_palloc_large() always receives
usize as its argument.
2015-07-23 17:13:18 -07:00
Jason Evans
5fae7dc1b3 Fix MinGW-related portability issues.
Create and use FMT* macros that are equivalent to the PRI* macros that
inttypes.h defines.  This allows uniform use of the Unix-specific format
specifiers, e.g. "%zu", as well as avoiding Windows-specific definitions
of e.g. PRIu64.

Add ffs()/ffsl() support for compiling with gcc.

Extract compatibility definitions of ENOENT, EINVAL, EAGAIN, EPERM,
ENOMEM, and ENORANGE into include/msvc_compat/windows_extra.h and
use the file for tests as well as for core jemalloc code.
2015-07-23 13:56:25 -07:00
Jason Evans
e42c309eba Add JEMALLOC_FORMAT_PRINTF().
Replace JEMALLOC_ATTR(format(printf, ...). with
JEMALLOC_FORMAT_PRINTF(), so that configuration feature tests can
omit the attribute if it would cause extraneous compilation warnings.
2015-07-22 15:44:47 -07:00
Jason Evans
00632609df Move JEMALLOC_NOTHROW just after return type.
Only use __declspec(nothrow) in C++ mode.

This resolves #244.
2015-07-21 08:21:13 -07:00
Mike Hommey
50cd636eed Remove JEMALLOC_ALLOC_SIZE annotations on functions not returning pointers
As per gcc documentation:
  The alloc_size attribute is used to tell the compiler that the function
  return value points to memory (...)

This resolves #245.
2015-07-21 09:16:07 +09:00
Jason Evans
f2bc85298c Add the config.cache_oblivious mallctl. 2015-07-17 16:38:25 -07:00
Jason Evans
aa2826621e Revert to first-best-fit run/chunk allocation.
This effectively reverts 97c04a9383 (Use
first-fit rather than first-best-fit run/chunk allocation.).  In some
pathological cases, first-fit search dominates allocation time, and it
also tends not to converge as readily on a steady state of memory
layout, since precise allocation order has a bigger effect than for
first-best-fit.
2015-07-15 17:15:19 -07:00
Jason Evans
ae93d6bf36 Avoid function prototype incompatibilities.
Add various function attributes to the exported functions to give the
compiler more information to work with during optimization, and also
specify throw() when compiling with C++ on Linux, in order to adequately
match what __THROW does in glibc.

This resolves #237.
2015-07-10 16:09:40 -07:00
Jason Evans
d508ec71eb Fix a variable declaration typo. 2015-07-07 20:28:22 -07:00
Jason Evans
b946086b08 Use jemalloc_ffs() rather than ffs(). 2015-07-07 20:16:25 -07:00
Jason Evans
0313607e66 Fix MinGW build warnings.
Conditionally define ENOENT, EINVAL, etc. (was unconditional).

Add/use PRIzu, PRIzd, and PRIzx for use in malloc_printf() calls.  gcc issued
(harmless) warnings since e.g. "%zu" should be "%Iu" on Windows, and the
alternative to this workaround would have been to disable the function
attributes which cause gcc to look for type mismatches in formatted printing
function calls.
2015-07-07 20:10:28 -07:00
Jason Evans
0dd3ad3841 Fix an assignment type warning for tls_callback. 2015-07-07 20:10:27 -07:00
Jason Evans
bce61d61bb Move a variable declaration closer to its use. 2015-07-07 09:32:05 -07:00
Matthijs
a1aaf949a5 Optimizations for Windows
- Set opt_lg_chunk based on run-time OS setting
- Verify LG_PAGE is compatible with run-time OS setting
- When targeting Windows Vista or newer, use SRWLOCK instead of CRITICAL_SECTION
- When targeting Windows Vista or newer, statically initialize init_lock
2015-06-25 22:53:58 +02:00
Jason Evans
241abc601b Fix size class overflow handling when profiling is enabled.
Fix size class overflow handling for malloc(), posix_memalign(),
memalign(), calloc(), and realloc() when profiling is enabled.

Remove an assertion that erroneously caused arena_sdalloc() to fail when
profiling was enabled.

This resolves #232.
2015-06-23 18:56:14 -07:00
Jason Evans
0a9f9a4d51 Convert arena_maybe_purge() recursion to iteration.
This resolves #235.
2015-06-22 18:50:58 -07:00
Jason Evans
dc0610a714 Add alignment assertions to public aligned allocation functions. 2015-06-22 18:48:58 -07:00
Jason Evans
4f6f2b131e Fix two valgrind integration regressions.
The regressions were never merged into the master branch.
2015-06-22 14:38:06 -07:00
Jason Evans
56048baeb4 Clarify relationship between stats.resident and stats.mapped. 2015-05-29 19:21:10 -07:00
Jason Evans
09983d2f54 Bypass tcache when draining quarantined allocations.
This avoids the potential surprise of deallocating an object with one
tcache specified, and having the object cached in a different tcache
once it drains from the quarantine.
2015-05-29 19:20:36 -07:00
Jason Evans
836bbe9951 Impose a minimum tcache count for small size classes.
Now that small allocation runs have fewer regions due to run metadata
residing in chunk headers, an explicit minimum tcache count is needed to
make sure that tcache adequately amortizes synchronization overhead.
2015-05-19 17:47:16 -07:00
Jason Evans
5154175cf1 Fix performance regression in arena_palloc().
Pass large allocation requests to arena_malloc() when possible.  This
regression was introduced by 155bfa7da1
(Normalize size classes.).
2015-05-19 17:42:31 -07:00
Jason Evans
5aa50a2834 Fix nhbins calculation.
This regression was introduced by
155bfa7da1 (Normalize size classes.).
2015-05-19 17:40:37 -07:00
Jason Evans
fd5f9e43c3 Avoid atomic operations for dependent rtree reads. 2015-05-15 17:02:30 -07:00
Jason Evans
8a03cf039c Implement cache index randomization for large allocations.
Extract szad size quantization into {extent,run}_quantize(), and .
quantize szad run sizes to the union of valid small region run sizes and
large run sizes.

Refactor iteration in arena_run_first_fit() to use
run_quantize{,_first,_next(), and add support for padded large runs.

For large allocations that have no specified alignment constraints,
compute a pseudo-random offset from the beginning of the first backing
page that is a multiple of the cache line size.  Under typical
configurations with 4-KiB pages and 64-byte cache lines this results in
a uniform distribution among 64 page boundary offsets.

Add the --disable-cache-oblivious option, primarily intended for
performance testing.

This resolves #13.
2015-05-06 13:27:39 -07:00
Jason Evans
7041720ac2 Rename pprof to jeprof.
This rename avoids installation collisions with the upstream gperftools.
Additionally, jemalloc's per thread heap profile functionality
introduced an incompatible file format, so it's now worthwhile to
clearly distinguish jemalloc's version of this script from the upstream
version.

This resolves #229.
2015-05-01 12:31:12 -07:00
Jason Evans
8e33c21d2d Prefer /proc/<pid>/task/<pid>/maps over /proc/<pid>/maps on Linux.
This resolves #227.
2015-05-01 09:03:20 -07:00
Igor Podlesny
95e88de0aa Concise JEMALLOC_HAVE_ISSETUGID case in secure_getenv(). 2015-04-30 11:48:56 -07:00
Jason Evans
65db63cf3f Fix in-place shrinking huge reallocation purging bugs.
Fix the shrinking case of huge_ralloc_no_move_similar() to purge the
correct number of pages, at the correct offset.  This regression was
introduced by 8d6a3e8321 (Implement
dynamic per arena control over dirty page purging.).

Fix huge_ralloc_no_move_shrink() to purge the correct number of pages.
This bug was introduced by 9673983443
(Purge/zero sub-chunk huge allocations as necessary.).
2015-03-25 19:10:06 -07:00
Jason Evans
562d266511 Add the "stats.arenas.<i>.lg_dirty_mult" mallctl. 2015-03-24 16:41:38 -07:00
Jason Evans
bd16ea49c3 Fix signed/unsigned comparison in arena_lg_dirty_mult_valid(). 2015-03-24 15:59:28 -07:00
Jason Evans
d324ca8933 Fix arena_get() usage.
Fix arena_get() calls that specify refresh_if_missing=false.  In
ctl_refresh() and ctl.c's arena_purge(), these calls attempted to only
refresh once, but did so in an unreliable way.
arena_i_lg_dirty_mult_ctl() was simply wrong to pass
refresh_if_missing=false.
2015-03-24 12:33:12 -07:00
Igor Podlesny
ef0a0cc328 We have pages_unmap(ret, size) so we use it. 2015-03-23 21:12:33 -07:00
Jason Evans
4acd75a694 Add the "stats.allocated" mallctl. 2015-03-23 17:26:53 -07:00
Qinfan Wu
fd5901ce30 Fix a compile error caused by mixed declarations and code. 2015-03-21 10:18:39 -07:00
Jason Evans
7e336e7359 Fix lg_dirty_mult-related stats printing.
This regression was introduced by
8d6a3e8321 (Implement dynamic per arena
control over dirty page purging.).

This resolves #215.
2015-03-20 18:08:10 -07:00
Jason Evans
e0a08a1496 Restore --enable-ivsalloc.
However, unlike before it was removed do not force --enable-ivsalloc
when Darwin zone allocator integration is enabled, since the zone
allocator code uses ivsalloc() regardless of whether
malloc_usable_size() and sallocx() do.

This resolves #211.
2015-03-18 21:06:58 -07:00
Jason Evans
8d6a3e8321 Implement dynamic per arena control over dirty page purging.
Add mallctls:
- arenas.lg_dirty_mult is initialized via opt.lg_dirty_mult, and can be
  modified to change the initial lg_dirty_mult setting for newly created
  arenas.
- arena.<i>.lg_dirty_mult controls an individual arena's dirty page
  purging threshold, and synchronously triggers any purging that may be
  necessary to maintain the constraint.
- arena.<i>.chunk.purge allows the per arena dirty page purging function
  to be replaced.

This resolves #93.
2015-03-18 18:55:33 -07:00
Jason Evans
04211e2266 Fix heap profiling regressions.
Remove the prof_tctx_state_destroying transitory state and instead add
the tctx_uid field, so that the tuple <thr_uid, tctx_uid> uniquely
identifies a tctx.  This assures that tctx's are well ordered even when
more than two with the same thr_uid coexist.  A previous attempted fix
based on prof_tctx_state_destroying was only sufficient for protecting
against two coexisting tctx's, but it also introduced a new dumping
race.

These regressions were introduced by
602c8e0971 (Implement per thread heap
profiling.) and 764b00023f (Fix a heap
profiling regression.).
2015-03-16 15:11:06 -07:00
Jason Evans
262146dfc4 Eliminate innocuous compiler warnings. 2015-03-14 14:34:16 -07:00
Jason Evans
764b00023f Fix a heap profiling regression.
Add the prof_tctx_state_destroying transitionary state to fix a race
between a thread destroying a tctx and another thread creating a new
equivalent tctx.

This regression was introduced by
602c8e0971 (Implement per thread heap
profiling.).
2015-03-14 14:01:35 -07:00
Mike Hommey
f69e2f6fda Use the error code given to buferror on Windows
a14bce85 made buferror not take an error code, and make the Windows
code path for buferror use GetLastError, while the alternative code
paths used errno. Then 2a83ed02 made buferror take an error code
again, and while it changed the non-Windows code paths to use that
error code, the Windows code path was not changed accordingly.
2015-03-13 13:54:02 -07:00
Jason Evans
d69964bd2d Fix a heap profiling regression.
Fix prof_tctx_comp() to incorporate tctx state into the comparison.
During a dump it is possible for both a purgatory tctx and an otherwise
equivalent nominal tctx to reside in the tree at the same time.

This regression was introduced by
602c8e0971 (Implement per thread heap
profiling.).
2015-03-12 16:25:18 -07:00
Jason Evans
fbd8d773ad Fix unsigned comparison underflow.
These bugs only affected tests and debug builds.
2015-03-11 23:14:50 -07:00
Jason Evans
bc45d41d23 Fix a declaration-after-statement regression. 2015-03-11 16:50:40 -07:00
Jason Evans
f5c8f37259 Normalize rdelm/rd structure field naming. 2015-03-10 18:29:49 -07:00
Jason Evans
38e42d311c Refactor dirty run linkage to reduce sizeof(extent_node_t). 2015-03-10 18:15:40 -07:00
Jason Evans
04ca7580db Fix a chunk_recycle() regression.
This regression was introduced by
97c04a9383 (Use first-fit rather than
first-best-fit run/chunk allocation.).
2015-03-06 23:25:13 -08:00
Jason Evans
97c04a9383 Use first-fit rather than first-best-fit run/chunk allocation.
This tends to more effectively pack active memory toward low addresses.
However, additional tree searches are required in many cases, so whether
this change stands the test of time will depend on real-world
benchmarks.
2015-03-06 20:21:41 -08:00
Jason Evans
5707d6f952 Quantize szad trees by size class.
Treat sizes that round down to the same size class as size-equivalent
in trees that are used to search for first best fit, so that there are
only as many "firsts" as there are size classes.  This comes closer to
the ideal of first fit.
2015-03-06 20:21:41 -08:00
Jason Evans
35e3fd9a63 Fix a compilation error and an incorrect assertion. 2015-02-18 16:51:51 -08:00
Jason Evans
99bd94fb65 Fix chunk cache races.
These regressions were introduced by
ee41ad409a (Integrate whole chunks into
unused dirty page purging machinery.).
2015-02-18 16:40:53 -08:00
Jason Evans
738e089a2e Rename "dirty chunks" to "cached chunks".
Rename "dirty chunks" to "cached chunks", in order to avoid overloading
the term "dirty".

Fix the regression caused by 339c2b23b2
(Fix chunk_unmap() to propagate dirty state.), and actually address what
that change attempted, which is to only purge chunks once, and propagate
whether zeroed pages resulted into chunk_record().
2015-02-18 01:15:50 -08:00
Jason Evans
339c2b23b2 Fix chunk_unmap() to propagate dirty state.
Fix chunk_unmap() to propagate whether a chunk is dirty, and modify
dirty chunk purging to record this information so it can be passed to
chunk_unmap().  Since the broken version of chunk_unmap() claimed that
all chunks were clean, this resulted in potential memory corruption for
purging implementations that do not zero (e.g. MADV_FREE).

This regression was introduced by
ee41ad409a (Integrate whole chunks into
unused dirty page purging machinery.).
2015-02-17 22:25:56 -08:00
Jason Evans
47701b22ee arena_chunk_dirty_node_init() --> extent_node_dirty_linkage_init() 2015-02-17 22:23:10 -08:00
Jason Evans
a4e1888d1a Simplify extent_node_t and add extent_node_init(). 2015-02-17 15:13:52 -08:00
Jason Evans
ee41ad409a Integrate whole chunks into unused dirty page purging machinery.
Extend per arena unused dirty page purging to manage unused dirty chunks
in aaddtion to unused dirty runs.  Rather than immediately unmapping
deallocated chunks (or purging them in the --disable-munmap case), store
them in a separate set of trees, chunks_[sz]ad_dirty.  Preferrentially
allocate dirty chunks.  When excessive unused dirty pages accumulate,
purge runs and chunks in ingegrated LRU order (and unmap chunks in the
--enable-munmap case).

Refactor extent_node_t to provide accessor functions.
2015-02-16 21:02:17 -08:00
Jason Evans
2195ba4e1f Normalize *_link and link_* fields to all be *_link. 2015-02-15 16:43:52 -08:00
Jason Evans
b01186cebd Remove redundant tcache_boot() call. 2015-02-15 14:04:55 -08:00
Jason Evans
41cfe03f39 If MALLOCX_ARENA(a) is specified, use it during tcache fill. 2015-02-13 15:28:56 -08:00
Jason Evans
88fef7ceda Refactor huge_*() calls into arena internals.
Make redirects to the huge_*() API the arena code's responsibility,
since arenas now take responsibility for all allocation sizes.
2015-02-12 14:06:37 -08:00
Daniel Micay
1eaf3b6f34 add missing check for new_addr chunk size
8ddc93293c switched this to over using the
address tree in order to avoid false negatives, so it now needs to check
that the size of the free extent is large enough to satisfy the request.
2015-02-12 15:46:30 -05:00
Jason Evans
cbf3a6d703 Move centralized chunk management into arenas.
Migrate all centralized data structures related to huge allocations and
recyclable chunks into arena_t, so that each arena can manage huge
allocations and recyclable virtual memory completely independently of
other arenas.

Add chunk node caching to arenas, in order to avoid contention on the
base allocator.

Use chunks_rtree to look up huge allocations rather than a red-black
tree.  Maintain a per arena unsorted list of huge allocations (which
will be needed to enumerate huge allocations during arena reset).

Remove the --enable-ivsalloc option, make ivsalloc() always available,
and use it for size queries if --enable-debug is enabled.  The only
practical implications to this removal are that 1) ivsalloc() is now
always available during live debugging (and the underlying radix tree is
available during core-based debugging), and 2) size query validation can
no longer be enabled independent of --enable-debug.

Remove the stats.chunks.{current,total,high} mallctls, and replace their
underlying statistics with simpler atomically updated counters used
exclusively for gdump triggering.  These statistics are no longer very
useful because each arena manages chunks independently, and per arena
statistics provide similar information.

Simplify chunk synchronization code, now that base chunk allocation
cannot cause recursive lock acquisition.
2015-02-12 00:15:56 -08:00
Jason Evans
f30e261c5b Update ckh to support metadata allocation tracking. 2015-02-12 00:15:24 -08:00
Jason Evans
064dbfbaf7 Fix a regression in tcache_bin_flush_small().
Fix a serious regression in tcache_bin_flush_small() that was introduced
by 1cb181ed63 (Implement explicit tcache
support.).
2015-02-12 00:15:16 -08:00
Jason Evans
9e561e8d3f Test and fix tcache ID recycling. 2015-02-10 09:03:48 -08:00
Jason Evans
1cb181ed63 Implement explicit tcache support.
Add the MALLOCX_TCACHE() and MALLOCX_TCACHE_NONE macros, which can be
used in conjunction with the *allocx() API.

Add the tcache.create, tcache.flush, and tcache.destroy mallctls.

This resolves #145.
2015-02-09 17:44:48 -08:00
Jason Evans
8d0e04d42f Refactor rtree to be lock-free.
Recent huge allocation refactoring associates huge allocations with
arenas, but it remains necessary to quickly look up huge allocation
metadata during reallocation/deallocation.  A global radix tree remains
a good solution to this problem, but locking would have become the
primary bottleneck after (upcoming) migration of chunk management from
global to per arena data structures.

This lock-free implementation uses double-checked reads to traverse the
tree, so that in the steady state, each read or write requires only a
single atomic operation.

This implementation also assures that no more than two tree levels
actually exist, through a combination of careful virtual memory
allocation which makes large sparse nodes cheap, and skipping the root
node on x64 (possible because the top 16 bits are all 0 in practice).
2015-02-04 16:51:53 -08:00
Jason Evans
f500a10b2e Refactor base_alloc() to guarantee demand-zeroed memory.
Refactor base_alloc() to guarantee that allocations are carved from
demand-zeroed virtual memory.  This supports sparse data structures such
as multi-page radix tree nodes.

Enhance base_alloc() to keep track of fragments which were too small to
support previous allocation requests, and try to consume them during
subsequent requests.  This becomes important when request sizes commonly
approach or exceed the chunk size (as could radix tree node
allocations).
2015-02-04 16:51:53 -08:00
Jason Evans
8ddc93293c Fix chunk_recycle()'s new_addr functionality.
Fix chunk_recycle()'s new_addr functionality to search by address rather
than just size if new_addr is specified.  The functionality added by
a95018ee81 (Attempt to expand huge
allocations in-place.) only worked if the two search orders happened to
return the same results (e.g. in simple test cases).
2015-02-04 16:50:04 -08:00
Mike Hommey
6505733012 Make opt.lg_dirty_mult work as documented
The documentation for opt.lg_dirty_mult says:
    Per-arena minimum ratio (log base 2) of active to dirty
    pages.  Some dirty unused pages may be allowed to accumulate,
    within the limit set by the ratio (or one chunk worth of dirty
    pages, whichever is greater) (...)

The restriction in parentheses currently doesn't happen. This makes
jemalloc aggressively madvise(), which in turns increases the amount
of page faults significantly.

For instance, this resulted in several(!) hundred(!) milliseconds
startup regression on Firefox for Android.

This may require further tweaking, but starting with actually doing
what the documentation says is a good start.
2015-02-04 07:16:55 +09:00
Felix Janda
008267b9f6 util.c: strerror_r returns char* only on glibc 2015-02-03 18:58:02 +01:00
Jason Evans
5b8ed5b7c9 Implement the prof.gdump mallctl.
This feature makes it possible to toggle the gdump feature on/off during
program execution, whereas the the opt.prof_dump mallctl value can only
be set during program startup.

This resolves #72.
2015-01-25 21:21:35 -08:00
Jason Evans
0fd663e9c5 Avoid pointless chunk_recycle() call.
Avoid calling chunk_recycle() for mmap()ed chunks if config_munmap is
disabled, in which case there are never any recyclable chunks.

This resolves #164.
2015-01-25 17:31:24 -08:00
Sébastien Marie
eee27b2a38 huge_node_locked don't have to unlock huge_mtx
in src/huge.c, after each call of huge_node_locked(), huge_mtx is
already unlocked. don't unlock it twice (it is a undefined behaviour).
2015-01-25 15:12:28 +01:00
Jason Evans
4581b97809 Implement metadata statistics.
There are three categories of metadata:

- Base allocations are used for bootstrap-sensitive internal allocator
  data structures.
- Arena chunk headers comprise pages which track the states of the
  non-metadata pages.
- Internal allocations differ from application-originated allocations
  in that they are for internal use, and that they are omitted from heap
  profiles.

The metadata statistics comprise the metadata categories as follows:

- stats.metadata: All metadata -- base + arena chunk headers + internal
  allocations.
- stats.arenas.<i>.metadata.mapped: Arena chunk headers.
- stats.arenas.<i>.metadata.allocated: Internal allocations.  This is
  reported separately from the other metadata statistics because it
  overlaps with the allocated and active statistics, whereas the other
  metadata statistics do not.

Base allocations are not reported separately, though their magnitude can
be computed by subtracting the arena-specific metadata.

This resolves #163.
2015-01-23 23:34:43 -08:00
Guilherme Goncalves
ec98a44662 Use the correct type for opt.junk when printing stats. 2015-01-23 11:01:42 -02:00
Jason Evans
10aff3f3e1 Refactor bootstrapping to delay tsd initialization.
Refactor bootstrapping to delay tsd initialization, primarily to support
integration with FreeBSD's libc.

Refactor a0*() for internal-only use, and add the
bootstrap_{malloc,calloc,free}() API for use by FreeBSD's libc.  This
separation limits use of the a0*() functions to metadata allocation,
which doesn't require malloc/calloc/free API compatibility.

This resolves #170.
2015-01-22 14:04:27 -08:00
Jason Evans
bc96876f99 Fix arenas_cache_cleanup().
Fix arenas_cache_cleanup() to check whether arenas_cache is NULL before
deallocation, rather than checking arenas.
2015-01-22 14:02:56 -08:00
Jason Evans
44b57b8e8b Fix OOM handling in memalign() and valloc().
Fix memalign() and valloc() to heed imemalign()'s return value.

Reported by Kurt Wampler.
2015-01-16 18:04:17 -08:00
Jason Evans
24057f3da8 Fix an infinite recursion bug related to a0/tsd bootstrapping.
This resolves #184.
2015-01-14 16:27:31 -08:00
Guilherme Goncalves
9c6a8d3b0c Move variable declaration to the top its block for MSVC compatibility. 2014-12-17 14:46:35 -02:00
Guilherme Goncalves
2c5cb613df Introduce two new modes of junk filling: "alloc" and "free".
In addition to true/false, opt.junk can now be either "alloc" or "free",
giving applications the possibility of junking memory only on allocation
or deallocation.

This resolves #172.
2014-12-14 17:07:26 -08:00
Daniel Micay
b74041fb6e Ignore MALLOC_CONF in set{uid,gid,cap} binaries.
This eliminates the malloc tunables as tools for an attacker.

Closes #173
2014-12-14 15:36:15 -08:00
Jason Evans
e12eaf93dc Style and spelling fixes. 2014-12-08 16:34:04 -08:00
Jason Evans
1036ddbf11 Fix OOM cleanup in huge_palloc().
Fix OOM cleanup in huge_palloc() to call idalloct() rather than
base_node_dalloc().  This bug is a result of incomplete refactoring, and
has no impact other than leaking memory during OOM.
2014-12-04 16:42:42 -08:00
Daniel Micay
879e76a9e5 teach the dss chunk allocator to handle new_addr
This provides in-place expansion of huge allocations when the end of the
allocation is at the end of the sbrk heap. There's already the ability
to extend in-place via recycled chunks but this handles the initial
growth of the heap via repeated vector / string reallocations.

A possible future extension could allow realloc to go from the following:

    | huge allocation | recycled chunks |
                                        ^ dss_end

To a larger allocation built from recycled *and* new chunks:

    |                      huge allocation                      |
                                                                ^ dss_end

Doing that would involve teaching the chunk recycling code to request
new chunks to satisfy the request. The chunk_dss code wouldn't require
any further changes.

    #include <stdlib.h>

    int main(void) {
        size_t chunk = 4 * 1024 * 1024;
        void *ptr = NULL;
        for (size_t size = chunk; size < chunk * 128; size *= 2) {
            ptr = realloc(ptr, size);
            if (!ptr) return 1;
        }
    }

dss:secondary: 0.083s
dss:primary: 0.083s

After:

dss:secondary: 0.083s
dss:primary: 0.003s

The dss heap grows in the upwards direction, so the oldest chunks are at
the low addresses and they are used first. Linux prefers to grow the
mmap heap downwards, so the trick will not work in the *current* mmap
chunk allocator as a huge allocation will only be at the top of the heap
in a contrived case.
2014-11-28 16:11:19 -08:00
Jason Evans
d49cb68b9e Fix more pointer arithmetic undefined behavior.
Reported by Guilherme Gonçalves.

This resolves #166.
2014-11-17 10:31:59 -08:00
Jason Evans
2012d5a560 Fix pointer arithmetic undefined behavior.
Reported by Denis Denisov.
2014-11-17 09:54:49 -08:00
Jason Evans
9cf2be0a81 Make quarantine_init() static. 2014-11-07 14:50:38 -08:00
Jason Evans
c002a5c800 Fix two quarantine regressions.
Fix quarantine to actually update tsd when expanding, and to avoid
double initialization (leaking the first quarantine) due to recursive
initialization.

This resolves #161.
2014-11-04 18:03:11 -08:00
Jason Evans
2b2f6dc1e4 Disable arena_dirty_count() validation. 2014-11-01 02:29:10 -07:00
Jason Evans
82cb603ed7 Don't dereference NULL tdata in prof_{enter,leave}().
It is possible for the thread's tdata to be NULL late during thread
destruction, so take care not to dereference a NULL pointer in such
cases.
2014-11-01 00:20:28 -07:00
Daniel Micay
dc65213111 rm unused arena wrangling from xallocx
It has no use for the arena_t since unlike rallocx it never makes a new
memory allocation. It's just an unused parameter in ixalloc_helper.
2014-10-30 23:19:34 -07:00
Jason Evans
cfc5706f69 Miscellaneous cleanups. 2014-10-30 23:18:45 -07:00
Daniel Micay
d33f834591 avoid redundant chunk header reads
* use sized deallocation in iralloct_realign
* iralloc and ixalloc always need the old size, so pass it in from the
  caller where it's often already calculated
2014-10-30 17:06:38 -07:00
Daniel Micay
809b0ac391 mark huge allocations as unlikely
This cleans up the fast path a bit more by moving away more code.
2014-10-30 17:06:38 -07:00
Jason Evans
c93ed81cd0 Fix prof_{enter,leave}() calls to pass tdata_self. 2014-10-30 16:50:33 -07:00
Jason Evans
af1f592763 Use JEMALLOC_INLINE_C everywhere it's appropriate. 2014-10-30 16:38:08 -07:00
Jason Evans
8f47e3d82b Merge pull request #151 from thestinger/ralloc
use sized deallocation internally for ralloc
2014-10-16 13:12:05 -07:00
Daniel Micay
a9ea10d27c use sized deallocation internally for ralloc
The size of the source allocation is known at this point, so reading the
chunk header can be avoided for the small size class fast path. This is
not very useful right now, but it provides a significant performance
boost with an alternate ralloc entry point taking the old size.
2014-10-16 15:39:59 -04:00
Jason Evans
c83bccd273 Initialize chunks_mtx for all configurations.
This resolves #150.
2014-10-16 12:33:18 -07:00
Jason Evans
9673983443 Purge/zero sub-chunk huge allocations as necessary.
Purge trailing pages during shrinking huge reallocation when resulting
size is not a multiple of the chunk size.  Similarly, zero pages if
necessary during growing huge reallocation when the resulting size is
not a multiple of the chunk size.
2014-10-15 18:02:02 -07:00
Jason Evans
bf8d6a1092 Add small run utilization to stats output.
Add the 'util' column, which reports the proportion of available regions
that are currently in use for each small size class.  Small run
utilization is the complement of external fragmentation.  For example,
utilization of 0.75 indicates that 25% of small run memory is consumed
by external fragmentation, in other (more obtuse) words, 33% external
fragmentation overhead.

This resolves #27.
2014-10-15 16:18:42 -07:00
Jason Evans
9b41ac909f Fix huge allocation statistics. 2014-10-14 22:20:00 -07:00
Jason Evans
3c4d92e82a Add per size class huge allocation statistics.
Add per size class huge allocation statistics, and normalize various
stats:
- Change the arenas.nlruns type from size_t to unsigned.
- Add the arenas.nhchunks and arenas.hchunks.<i>.size mallctl's.
- Replace the stats.arenas.<i>.bins.<j>.allocated mallctl with
  stats.arenas.<i>.bins.<j>.curregs .
- Add the stats.arenas.<i>.hchunks.<j>.nmalloc,
  stats.arenas.<i>.hchunks.<j>.ndalloc,
  stats.arenas.<i>.hchunks.<j>.nrequests, and
  stats.arenas.<i>.hchunks.<j>.curhchunks mallctl's.
2014-10-12 23:02:10 -07:00
Jason Evans
44c97b712e Fix a prof_tctx_t/prof_tdata_t cleanup race.
Fix a prof_tctx_t/prof_tdata_t cleanup race by storing a copy of thr_uid
in prof_tctx_t, so that the associated tdata need not be present during
tctx teardown.
2014-10-12 13:03:20 -07:00
Jason Evans
381c23dd9d Remove arena_dalloc_bin_run() clean page preservation.
Remove code in arena_dalloc_bin_run() that preserved the "clean" state
of trailing clean pages by splitting them into a separate run during
deallocation.  This was a useful mechanism for reducing dirty page
churn when bin runs comprised many pages, but bin runs are now quite
small.

Remove the nextind field from arena_run_t now that it is no longer
needed, and change arena_run_t's bin field (arena_bin_t *) to binind
(index_t).  These two changes remove 8 bytes of chunk header overhead
per page, which saves 1/512 of all arena chunk memory.
2014-10-10 23:01:03 -07:00
Jason Evans
81e547566e Add --with-lg-tiny-min, generalize --with-lg-quantum. 2014-10-10 22:35:07 -07:00
Jason Evans
9b75677e53 Don't fetch tsd in a0{d,}alloc().
Don't fetch tsd in a0{d,}alloc(), because doing so can cause infinite
recursion on systems that require an allocated tsd wrapper.
2014-10-10 18:19:20 -07:00
Jason Evans
fc0b3b7383 Add configure options.
Add:
  --with-lg-page
  --with-lg-page-sizes
  --with-lg-size-class-group
  --with-lg-quantum

Get rid of STATIC_PAGE_SHIFT, in favor of directly setting LG_PAGE.

Fix various edge conditions exposed by the configure options.
2014-10-09 22:44:37 -07:00
Jason Evans
57efa7bb0e Avoid atexit(3) when possible, disable prof_final by default.
atexit(3) can deadlock internally during its own initialization if
jemalloc calls atexit() during jemalloc initialization.  Mitigate the
impact by restructuring prof initialization to avoid calling atexit()
unless the registered function will actually dump a final heap profile.

Additionally, disable prof_final by default so that this land mine is
opt-in rather than opt-out.

This resolves #144.
2014-10-08 18:08:00 -07:00
Jason Evans
3a8b9b1fd9 Fix a recursive lock acquisition regression.
Fix a recursive lock acquisition regression, which was introduced by
8bb3198f72 (Refactor/fix arenas
manipulation.).
2014-10-08 00:54:16 -07:00
Daniel Micay
f22214a29d Use regular arena allocation for huge tree nodes.
This avoids grabbing the base mutex, as a step towards fine-grained
locking for huge allocations. The thread cache also provides a tiny
(~3%) improvement for serial huge allocations.
2014-10-07 23:57:09 -07:00
Jason Evans
8bb3198f72 Refactor/fix arenas manipulation.
Abstract arenas access to use arena_get() (or a0get() where appropriate)
rather than directly reading e.g. arenas[ind].  Prior to the addition of
the arenas.extend mallctl, the worst possible outcome of directly
accessing arenas was a stale read, but arenas.extend may allocate and
assign a new array to arenas.

Add a tsd-based arenas_cache, which amortizes arenas reads.  This
introduces some subtle bootstrapping issues, with tsd_boot() now being
split into tsd_boot[01]() to support tsd wrapper allocation
bootstrapping, as well as an arenas_cache_bypass tsd variable which
dynamically terminates allocation of arenas_cache itself.

Promote a0malloc(), a0calloc(), and a0free() to be generally useful for
internal allocation, and use them in several places (more may be
appropriate).

Abstract arena->nthreads management and fix a missing decrement during
thread destruction (recent tsd refactoring left arenas_cleanup()
unused).

Change arena_choose() to propagate OOM, and handle OOM in all callers.
This is important for providing consistent allocation behavior when the
MALLOCX_ARENA() flag is being used.  Prior to this fix, it was possible
for an OOM to result in allocation silently allocating from a different
arena than the one specified.
2014-10-07 23:14:57 -07:00
Jason Evans
bf40641c5c Fix a prof_tctx_t destruction race. 2014-10-06 16:35:11 -07:00
Jason Evans
155bfa7da1 Normalize size classes.
Normalize size classes to use the same number of size classes per size
doubling (currently hard coded to 4), across the intire range of size
classes.  Small size classes already used this spacing, but in order to
support this change, additional small size classes now fill [4 KiB .. 16
KiB).  Large size classes range from [16 KiB .. 4 MiB).  Huge size
classes now support non-multiples of the chunk size in order to fill (4
MiB .. 16 MiB).
2014-10-06 01:45:13 -07:00
Daniel Micay
a95018ee81 Attempt to expand huge allocations in-place.
This adds support for expanding huge allocations in-place by requesting
memory at a specific address from the chunk allocator.

It's currently only implemented for the chunk recycling path, although
in theory it could also be done by optimistically allocating new chunks.
On Linux, it could attempt an in-place mremap. However, that won't work
in practice since the heap is grown downwards and memory is not unmapped
(in a normal build, at least).

Repeated vector reallocation micro-benchmark:

    #include <string.h>
    #include <stdlib.h>

    int main(void) {
        for (size_t i = 0; i < 100; i++) {
            void *ptr = NULL;
            size_t old_size = 0;
            for (size_t size = 4; size < (1 << 30); size *= 2) {
                ptr = realloc(ptr, size);
                if (!ptr) return 1;
                memset(ptr + old_size, 0xff, size - old_size);
                old_size = size;
            }
            free(ptr);
        }
    }

The glibc allocator fails to do any in-place reallocations on this
benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it
elides the cost of copies via mremap, which is currently not something
that jemalloc can use.

With this improvement, jemalloc still fails to do any in-place huge
reallocations for the first outer loop, but then succeeds 100% of the
time for the remaining 99 iterations. The time spent doing allocations
and copies drops down to under 5%, with nearly all of it spent doing
purging + faulting (when huge pages are disabled) and the array memset.

An improved mremap API (MREMAP_RETAIN - #138) would be far more general
but this is a portable optimization and would still be useful on Linux
for xallocx.

Numbers with transparent huge pages enabled:

glibc (copies elided via MREMAP_MAYMOVE): 8.471s

jemalloc: 17.816s
jemalloc + no-op madvise: 13.236s

jemalloc + this commit: 6.787s
jemalloc + this commit + no-op madvise: 6.144s

Numbers with transparent huge pages disabled:

glibc (copies elided via MREMAP_MAYMOVE): 15.403s

jemalloc: 39.456s
jemalloc + no-op madvise: 12.768s

jemalloc + this commit: 15.534s
jemalloc + this commit + no-op madvise: 6.354s

Closes #137
2014-10-05 14:47:01 -07:00
Jason Evans
f11a6776c7 Fix OOM-related regression in arena_tcache_fill_small().
Fix an OOM-related regression in arena_tcache_fill_small() that caused
cache corruption that would almost certainly expose the application to
undefined behavior, usually in the form of an allocation request
returning an already-allocated region, or somewhat less likely, a freed
region that had already been returned to the arena, thus making it
available to the arena for any purpose.

This regression was introduced by
9c43c13a35 (Reverse tcache fill order.),
and was present in all releases from 2.2.0 through 3.6.0.

This resolves #98.
2014-10-05 13:05:10 -07:00
Jason Evans
f04a0bef99 Fix prof regressions.
Fix prof regressions related to tdata (main per thread profiling data
structure) destruction:
- Deadlock.  The fix for this was intended to be part of
  20c31deaae (Test prof.reset mallctl and
  fix numerous discovered bugs.) but the fix was left incomplete.
- Destruction race.  Detaching tdata just prior to destruction without
  holding the tdatas lock made it possible for another thread to destroy
  the tdata out from under the thread that was on its way to doing so.
2014-10-04 15:03:49 -07:00
Jason Evans
0800afd03f Silence a compiler warning. 2014-10-04 14:59:17 -07:00
Jason Evans
029d44cf8b Fix tsd cleanup regressions.
Fix tsd cleanup regressions that were introduced in
5460aa6f66 (Convert all tsd variables to
reside in a single tsd structure.).  These regressions were twofold:

1) tsd_tryget() should never (and need never) return NULL.  Rename it to
   tsd_fetch() and simplify all callers.
2) tsd_*_set() must only be called when tsd is in the nominal state,
   because cleanup happens during the nominal-->purgatory transition,
   and re-initialization must not happen while in the purgatory state.
   Add tsd_nominal() and use it as needed.  Note that tsd_*{p,}_get()
   can still be used as long as no re-initialization that would require
   cleanup occurs.  This means that e.g. the thread_allocated counter
   can be updated unconditionally.
2014-10-04 11:22:55 -07:00
Jason Evans
fc12c0b8bc Implement/test/fix prof-related mallctl's.
Implement/test/fix the opt.prof_thread_active_init,
prof.thread_active_init, and thread.prof.active mallctl's.

Test/fix the thread.prof.name mallctl.

Refactor opt_prof_active to be read-only and move mutable state into the
prof_active variable.  Stop leaning on ctl-related locking for
protection.
2014-10-03 23:25:30 -07:00
Jason Evans
551ebc4364 Convert to uniform style: cond == false --> !cond 2014-10-03 10:16:09 -07:00
Jason Evans
20c31deaae Test prof.reset mallctl and fix numerous discovered bugs. 2014-10-02 23:01:10 -07:00
Daniel Micay
f8034540a1 Implement in-place huge allocation shrinking.
Trivial example:

    #include <stdlib.h>

    int main(void) {
        void *ptr = malloc(1024 * 1024 * 8);
        if (!ptr) return 1;
        ptr = realloc(ptr, 1024 * 1024 * 4);
        if (!ptr) return 1;
    }

Before:

    mmap(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcfff000000
    mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcffec00000
    madvise(0x7fcfff000000, 8388608, MADV_DONTNEED) = 0

After:

    mmap(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1934800000
    madvise(0x7f1934c00000, 4194304, MADV_DONTNEED) = 0

Closes #134
2014-10-01 16:55:03 -07:00
Dave Rigby
e3a16fce5e Mark malloc_conf as a weak symbol
This fixes issue #113 - je_malloc_conf is not respected on OS X
2014-09-29 15:05:55 -07:00
Jason Evans
0c5dd03e88 Move small run metadata into the arena chunk header.
Move small run metadata into the arena chunk header, with multiple
expected benefits:
- Lower run fragmentation due to reduced run sizes; runs are more likely
  to completely drain when there are fewer total regions.
- Improved cache behavior.  Prior to this change, run headers were
  always page-aligned, which put extra pressure on some CPU cache sets.
  The degree to which this was a problem was hardware dependent, but it
  likely hurt some even for the most advanced modern hardware.
- Buffer overruns/underruns are less likely to corrupt allocator
  metadata.
- Size classes between 4 KiB and 16 KiB become reasonable to support
  without any special handling, and the runs are small enough that dirty
  unused pages aren't a significant concern.
2014-09-29 01:31:39 -07:00
Jason Evans
f97e5ac4ec Implement compile-time bitmap size computation. 2014-09-28 14:43:11 -07:00
Jason Evans
6ef80d68f0 Fix profile dumping race.
Fix a race that caused a non-critical assertion failure.  To trigger the
race, a thread had to be part way through initializing a new sample,
such that it was discoverable by the dumping thread, but not yet linked
into its gctx by the time a later dump phase would normally have reset
its state to 'nominal'.

Additionally, lock access to the state field during modification to
transition to the dumping state.  It's not apparent that this oversight
could have caused an actual problem due to outer locking that protects
the dumping machinery, but the added locking pedantically follows the
stated locking protocol for the state field.
2014-09-24 22:23:43 -07:00
Jason Evans
5460aa6f66 Convert all tsd variables to reside in a single tsd structure. 2014-09-23 02:36:08 -07:00
Jason Evans
9d8f3d2033 Fix prof regressions.
Don't use atomic_add_uint64(), because it isn't available on 32-bit
platforms.

Fix forking support functions to manage all prof-related mutexes.

These regressions were introduced by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-11 18:09:14 -07:00
Jason Evans
c3e9e7b041 Fix irallocx_prof() sample logic.
Fix irallocx_prof() sample logic to only update the threshold counter
after it knows what size the allocation ended up being.  This regression
was caused by 6e73dc194e (Fix a profile
sampling race.), which did not make it into any releases prior to this
fix.
2014-09-11 17:04:03 -07:00
Jason Evans
9c640bfdd4 Apply likely()/unlikely() to allocation/deallocation fast paths. 2014-09-11 17:01:58 -07:00
Jason Evans
91566fc079 Fix mallocx() to always honor MALLOCX_ARENA() when profiling. 2014-09-11 13:15:33 -07:00
Daniel Micay
23fdf8b359 mark some conditions as unlikely
* assertion failure
* malloc_init failure
* malloc not already initialized (in malloc_init)
* running in valgrind
* thread cache disabled at runtime

Clang and GCC already consider a comparison with NULL or -1 to be cold,
so many branches (out-of-memory) are already correctly considered as
cold and marking them is not important.
2014-09-10 21:49:42 -04:00
Jason Evans
6e73dc194e Fix a profile sampling race.
Fix a profile sampling race that was due to preparing to sample, yet
doing nothing to assure that the context remains valid until the stats
are updated.

These regressions were caused by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-09 19:47:09 -07:00
Jason Evans
6fd53da030 Fix prof_tdata_get()-related regressions.
Fix prof_tdata_get() to avoid dereferencing an invalid tdata pointer
(when it's PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).

Fix prof_tdata_get() callers to check for invalid results besides NULL
(PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).

These regressions were caused by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-09 15:29:34 -07:00
Jason Evans
a2260c95cd Fix sdallocx() assertion.
Refactor sdallocx() and nallocx() to share inallocx(), and fix an
sdallocx() assertion to check usize rather than size.
2014-09-09 10:39:15 -07:00
Daniel Micay
4cfe55166e Add support for sized deallocation.
This adds a new `sdallocx` function to the external API, allowing the
size to be passed by the caller.  It avoids some extra reads in the
thread cache fast path.  In the case where stats are enabled, this
avoids the work of calculating the size from the pointer.

An assertion validates the size that's passed in, so enabling debugging
will allow users of the API to debug cases where an incorrect size is
passed in.

The performance win for a contrived microbenchmark doing an allocation
and immediately freeing it is ~10%.  It may have a different impact on a
real workload.

Closes #28
2014-09-08 17:34:24 -07:00
Jason Evans
b718cf77e9 Optimize [nmd]alloc() fast paths.
Optimize [nmd]alloc() fast paths such that the (flags == 0) case is
streamlined, flags decoding only happens to the minimum degree
necessary, and no conditionals are repeated.
2014-09-07 14:40:19 -07:00
Jason Evans
c21b05ea09 Whitespace cleanups. 2014-09-04 22:27:26 -07:00
Qinfan Wu
ff6a31d3b9 Refactor chunk map.
Break the chunk map into two separate arrays, in order to improve cache
locality. This is related to issue #23.
2014-09-04 22:22:52 -07:00
Qinfan Wu
58799f6d1c Remove junk filling in tcache_bin_flush_small().
Junk filling is done in arena_dalloc_bin_locked(), so arena_alloc_junk_small()
is redundant. Also, we should use arena_dalloc_junk_small() instead of
arena_alloc_junk_small().
2014-08-26 21:28:31 -07:00