Fix a Valgrind integration flaw that caused Valgrind warnings about
reads of uninitialized memory in internal zero-initialized data
structures (relevant to tcache and prof code).
Tighten valgrind integration such that immediately after memory is
validated or zeroed, valgrind is told to forget the memory's 'defined'
state. The only place newly allocated memory should be left marked as
'defined' is in the public functions (e.g. calloc() and realloc()).
Purge unused dirty pages in an order that first performs clean/dirty run
defragmentation, in order to mitigate available run fragmentation.
Remove the limitation that prevented purging unless at least one chunk
worth of dirty pages had accumulated in an arena. This limitation was
intended to avoid excessive purging for small applications, but the
threshold was arbitrary, and the effect of questionable utility.
Relax opt_lg_dirty_mult from 5 to 3. This compensates for increased
likelihood of allocating clean runs, given the same ratio of clean:dirty
runs, and reduces the potential for repeated purging in pathological
large malloc/free loops that push the active:dirty page ratio just over
the purge threshold.
Add the "arenas.extend" mallctl, so that it is possible to create new
arenas that are outside the set that jemalloc automatically multiplexes
threads onto.
Add the ALLOCM_ARENA() flag for {,r,d}allocm(), so that it is possible
to explicitly allocate from a particular arena.
Add the "opt.dss" mallctl, which controls the default precedence of dss
allocation relative to mmap allocation.
Add the "arena.<i>.dss" mallctl, which makes it possible to set the
default dss precedence on a per arena or global basis.
Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge".
Add the "stats.arenas.<i>.dss" mallctl.
mlockall(2) can cause purging via madvise(2) to fail. Fix purging code
to check whether madvise() succeeded, and base zeroed page metadata on
the result.
Reported by Olivier Lecomte.
Refactor code such that arena_mapbits_{large,small}_set() always
preserves the unzeroed flag, and manually manipulate the unzeroed flag
in the one case where it actually gets reset (in arena_chunk_purge()).
This fixes unzeroed preservation bugs in arena_run_split() and
arena_ralloc_large_grow(). These bugs caused large calloc() to return
non-zeroed memory under some circumstances.
Further optimize arena_salloc() to only look at the binind chunk map
bits in the common case.
Add more sanity checks to arena_salloc() that detect chunk map
inconsistencies for large allocations (whether due to allocator bugs or
application bugs).
Embed the bin index for small page runs into the chunk page map, in
order to omit [...] in the following dependent load sequence:
ptr-->mapelm-->[run-->bin-->]bin_info
Move various non-critcal code out of the inlined function chain into
helper functions (tcache_event_hard(), arena_dalloc_small(), and
locking).
Theses newly added macros will be used to implement the equivalent under
MSVC. Also, move the definitions to headers, where they make more sense,
and for some, are even more useful there (e.g. malloc).
MSVC doesn't support C99, and building as C++ to be able to use them is
dangerous, as C++ and C99 are incompatible.
Introduce a VARIABLE_ARRAY macro that either uses VLA when supported,
or alloca() otherwise. Note that using alloca() inside loops doesn't
quite work like VLAs, thus the use of VARIABLE_ARRAY there is discouraged.
It might be worth investigating ways to check whether VARIABLE_ARRAY is
used in such context at runtime in debug builds and bail out if that
happens.
Clean up a few config-related conditionals to avoid unnecessary
dependencies on prof symbols. Use cassert() rather than assert()
everywhere that it's appropriate.
Add a configure test to determine whether common mmap()/munmap()
patterns cause VM map holes, and only use munmap() to discard unused
chunks if the problem does not exist.
Unify the chunk caching for mmap and dss.
Fix options processing to limit lg_chunk to be large enough that
redzones will always fit.
Normalize arena_palloc(), chunk_alloc_mmap_slow(), and
chunk_recycle_dss() to use the same algorithm for trimming
over-allocation.
Add the ALIGNMENT_ADDR2BASE(), ALIGNMENT_ADDR2OFFSET(), and
ALIGNMENT_CEILING() macros, and use them where appropriate.
Remove the run_size_p parameter from sa2u().
Fix a potential deadlock in chunk_recycle_dss() that was introduced by
eae269036c (Add alignment support to
chunk_alloc()).
Implement Valgrind support, as well as the redzone and quarantine
features, which help Valgrind detect memory errors. Redzones are only
implemented for small objects because the changes necessary to support
redzones around large and huge objects are complicated by in-place
reallocation, to the point that it isn't clear that the maintenance
burden is worth the incremental improvement to Valgrind support.
Merge arena_salloc() and arena_salloc_demote().
Refactor i[v]salloc() to expose the 'demote' option.
s/PAGE_SHIFT/LG_PAGE/g and s/PAGE_SIZE/PAGE/g.
Remove remnants of the dynamic-page-shift code.
Rename the "arenas.pagesize" mallctl to "arenas.page".
Remove the "arenas.chunksize" mallctl, which is redundant with
"opt.lg_chunk".
Acquire/release arena bin locks as part of the prefork/postfork. This
bug made deadlock in the child between fork and exec a possibility.
Split jemalloc_postfork() into jemalloc_postfork_{parent,child}() so
that the child can reinitialize mutexes rather than unlocking them. In
practice, this bug tended not to cause problems.
Program-generate small size class tables for all valid combinations of
LG_TINY_MIN, LG_QUANTUM, and PAGE_SHIFT. Use the appropriate table to generate
all relevant data structures, and remove the distinction between
tiny/quantum/cacheline/subpage bins.
Remove --enable-dynamic-page-shift. This option didn't prove useful in
practice, and it prevented optimizations.
Add Tilera architecture support.
Fix an interaction between arena_dissociate_bin_run() and
arena_bin_lower_run() that made it possible for bin->runcur to point to
a run other than the lowest non-full run. This bug violated jemalloc's
layout policy, but did not affect correctness.
When tiny size class support was first added, it was intended to support
truly tiny size classes (even 2 bytes). However, this wasn't very
useful in practice, so the minimum tiny size class has been limited to
sizeof(void *) for a long time now. This is too small to be standards
compliant, but other commonly used malloc implementations do not even
bother using a 16-byte quantum on systems with vector units (SSE2+,
AltiVEC, etc.). As such, it is safe in practice to support an 8-byte
tiny size class on 64-bit systems that support 16-byte types.
tcache_get() is inlined, so do the config_tcache check inside
tcache_get() and simplify its callers.
Make arena_malloc() an inline function, since it is part of the malloc()
fast path.
Remove conditional logic that cause build issues if --disable-tcache was
specified.
Remove structure magic, because 1) it is no longer conditional, and 2)
it stopped being very effective at detecting memory corruption several
years ago.
Convert configuration-related cpp conditional logic to use static
constant variables, e.g.:
#ifdef JEMALLOC_DEBUG
[...]
#endif
becomes:
if (config_debug) {
[...]
}
The advantage is clearer, more concise code. The main disadvantage is
that data structures no longer have conditionally defined fields, so
they pay the cost of all fields regardless of whether they are used. In
practice, this is only a minor concern; config_stats will go away in an
upcoming change, and config_prof is the only other major feature that
depends on more than a few special-purpose fields.
Properly handle boundary conditions for sampled region promotion in
rallocm(). Prior to this fix, some combinations of 'size' and 'extra'
values could cause erroneous behavior. Additionally, size class
recording for promoted regions was incorrect.
Fix assertions in arena_purge() to accurately reflect the constraints in
arena_maybe_purge(). There were two bugs here, one of which merely
weakened the assertion, and the other of which referred to an
uninitialized variable (typo; used npurgatory instead of
arena->npurgatory).