Refactor permuted backtrace test allocation that was originally used
only by the prof_accum test, so that it can be used by other heap
profiling test binaries.
This adds a new `sdallocx` function to the external API, allowing the
size to be passed by the caller. It avoids some extra reads in the
thread cache fast path. In the case where stats are enabled, this
avoids the work of calculating the size from the pointer.
An assertion validates the size that's passed in, so enabling debugging
will allow users of the API to debug cases where an incorrect size is
passed in.
The performance win for a contrived microbenchmark doing an allocation
and immediately freeing it is ~10%. It may have a different impact on a
real workload.
Closes#28
Use nallocx() rather than mallctl() to trigger initialization, because
nallocx() has no side effects other than initialization, whereas
mallctl() does a bunch of internal memory allocation.
Refactor huge allocation to be managed by arenas (though the global
red-black tree of huge allocations remains for lookup during
deallocation). This is the logical conclusion of recent changes that 1)
made per arena dss precedence apply to huge allocation, and 2) made it
possible to replace the per arena chunk allocation/deallocation
functions.
Remove the top level huge stats, and replace them with per arena huge
stats.
Normalize function names and types to *dalloc* (some were *dealloc*).
Remove the --enable-mremap option. As jemalloc currently operates, this
is a performace regression for some applications, but planned work to
logarithmically space huge size classes should provide similar amortized
performance. The motivation for this change was that mremap-based huge
reallocation forced leaky abstractions that prevented refactoring.
Add new mallctl endpoints "arena<i>.chunk.alloc" and
"arena<i>.chunk.dealloc" to allow userspace to configure
jemalloc's chunk allocator and deallocator on a per-arena
basis.
Make dss non-optional on all platforms which support sbrk(2).
Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
"secondary" precedence is specified, but sbrk(2) is not supported.
The hash code, which has MurmurHash3 at its core, generates different
output depending on system endianness, so adapt the expected output on
big-endian systems. MurmurHash3 code also makes the assumption that
unaligned access is okay (not true on all systems), but jemalloc only
hashes data structures that have sufficient alignment to dodge this
limitation.
Reduce maximum tested alignment from 2^29 to 2^25. Some systems may not
have enough contiguous virtual memory to satisfy the larger alignment,
but the smaller alignment is still adequate to test multi-chunk
alignment.
p_test_fail() was passing a va_list to two separate functions with the
expectation that no reset would occur. Refactor p_test_fail()'s callers
to instead format two strings and pass them to p_test_fail().
Add a missing parameter to an assert_u64_eq() call, which the compiler
warned about after the assertion macro refactoring.
Restore the essence of 898960247a, which
sabotages tail call optimization. This is necessary even when the
mutually recursive functions are in separate compilation units.
If mremap(2) is used for huge reallocation, physical pages are mapped to
new virtual addresses rather than data being copied to new pages. This
bypasses the normal junk filling that would happen during allocation, so
add junk filling that is specific to this case.
Break prof_accum into multiple compilation units, in order to thwart
compiler optimizations such as inlining and tail call optimization that
would alter backtraces.
Remove the allocm() test equivalent to the mallocx() test removed in the
previous commit. The flawed test attempted to cause OOM due to large
request size and alignment constraint. Although this test "passed" on
64-bit systems due to the virtual memory hole, it could pass on some
32-bit systems.
Fix/remove three related flawed tests that attempted to cause OOM due to
large request size and alignment constraint. Although these tests
"passed" on 64-bit systems due to the virtual memory hole, they could
pass on some 32-bit systems.
Re-structure alloc_[01](), which are mutually tail-recursive functions,
to do (unnecessary) work post-recursion so that the compiler cannot
perform tail call optimization, thus preserving intentionally unique
call paths in captured backtraces.
Fix stress tests such that testlib code uses the jet_ allocator, but
test code uses libjemalloc.
Generate jemalloc_{rename,mangle}.h, the former because it's needed for
the stress test name mangling fix, and the latter for consistency. As
an artifact of this change, some (but not all) definitions related to
the experimental API are absent from the headers unless the feature is
enabled at configure time.
Extract profiling code from malloc(), imemalign(), calloc(), realloc(),
mallocx(), rallocx(), and xallocx(). This slightly reduces the amount
of code compiled into the fast paths, but the primary benefit is the
combinatorial complexity reduction.
Simplify iralloc[t]() by creating a separate ixalloc() that handles the
no-move cases.
Further simplify [mrxn]allocx() (and by implication [mrn]allocm()) to
make request size overflows due to size class and/or alignment
constraints trigger undefined behavior (detected by debug-only
assertions).
Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling
backtrace creation in imemalign(). This bug impacted posix_memalign()
and aligned_alloc().
Add unit tests for pow2_ceil(), malloc_strtoumax(), and
malloc_snprintf().
Fix numerous bugs in malloc_strotumax() error handling/reporting. These
bugs could have caused application-visible issues for some seldom used
(0X... and 0... prefixes) or malformed MALLOC_CONF or mallctl() argument
strings, but otherwise they had no impact.
Fix numerous bugs in malloc_snprintf(). These bugs were not exercised
by existing malloc_*printf() calls, so they had no impact.
Reduce rtree memory usage by storing booleans (1 byte each) rather than
pointers. The rtree code is only used to record whether jemalloc manages
a chunk of memory, so there's no need to store pointers in the rtree.
Increase rtree node size to 64 KiB in order to reduce tree depth from 13
to 3 on 64-bit systems. The conversion to more compact leaf nodes was
enough by itself to make the rtree depth 1 on 32-bit systems; due to the
fact that root nodes are smaller than the specified node size if
possible, the node size change has no impact on 32-bit systems (assuming
default chunk size).
Verify that freed regions are quarantined, and that redzone corruption
is detected.
Introduce a testing idiom for intercepting/replacing internal functions.
In this case the replaced function is ordinarily a static function, but
the idiom should work similarly for library-private functions.
Move je_* definitions from jemalloc_macros.h.in to jemalloc_defs.h.in,
because only the latter is an autoconf header (#undef substitution
occurs).
Fix unit tests to use automatic mangling, so that e.g. mallocx is
macro-substituted to becom jet_mallocx.
Implement the *allocx() API, which is a successor to the *allocm() API.
The *allocx() functions are slightly simpler to use because they have
fewer parameters, they directly return the results of primary interest,
and mallocx()/rallocx() avoid the strict aliasing pitfall that
allocm()/rallocx() share with posix_memalign(). The following code
violates strict aliasing rules:
foo_t *foo;
allocm((void **)&foo, NULL, 42, 0);
whereas the following is safe:
foo_t *foo;
void *p;
allocm(&p, NULL, 42, 0);
foo = (foo_t *)p;
mallocx() does not have this problem:
foo_t *foo = (foo_t *)mallocx(42, 0);
Add mtx (mutex) to test infrastructure, in order to avoid bootstrapping
complications that would result from directly using malloc_mutex.
Rename test infrastructure's thread abstraction from je_thread to thd.
Fix some header ordering issues.
Refactor array declarations to remove some dubious casts.
Reduce array size to what is actually used.
Extract magic numbers into cpp macro definitions.
Add JEMALLOC_INLINE_C and use it instead of JEMALLOC_INLINE in .c files,
so that the annotated functions are always static.
Remove SFMT's inline-related macros and use jemalloc's instead, so that
there's no danger of interactions with jemalloc's definitions that
disable inlining for debug builds.
Add probabability distribution utility code that enables generation of
random deviates drawn from normal, Chi-square, and Gamma distributions.
Fix format strings in several of the assert_* macros (remove a %s).
Clean up header issues; it's critical that system headers are not
included after internal definitions potentially do things like:
#define inline
Fix the build system to incorporate header dependencies for the test
library C files.
Integrate the SIMD-oriented Fast Mersenne Twister (SFMT) 1.3.3 into the
test infrastructure.
The sfmt_t state encapsulation modification comes from Crux
(http://www.canonware.com/Crux/) and enables multiple
concurrent PRNGs.
test/unit/SFMT.c is an adaptation of SFMT's test.c that performs all the
same validation, both for 32- and 64-bit generation.
Refactor tests to use explicit testing assertions, rather than diff'ing
test output. This makes the test code a bit shorter, more explicitly
encodes testing intent, and makes test failure diagnosis more
straightforward.
Refactor the test harness to support three types of tests:
- unit: White box unit tests. These tests have full access to all
internal jemalloc library symbols. Though in actuality all symbols
are prefixed by jet_, macro-based name mangling abstracts this away
from test code.
- integration: Black box integration tests. These tests link with
the installable shared jemalloc library, and with the exception of
some utility code and configure-generated macro definitions, they have
no access to jemalloc internals.
- stress: Black box stress tests. These tests link with the installable
shared jemalloc library, as well as with an internal allocator with
symbols prefixed by jet_ (same as for unit tests) that can be used to
allocate data structures that are internal to the test code.
Move existing tests into test/{unit,integration}/ as appropriate.
Split out internal parts of jemalloc_defs.h.in and put them in
jemalloc_internal_defs.h.in. This reduces internals exposure to
applications that #include <jemalloc/jemalloc.h>.
Refactor jemalloc.h header generation so that a single header file
results, and the prototypes can be used to generate jet_ prototypes for
tests. Split jemalloc.h.in into multiple parts (jemalloc_defs.h.in,
jemalloc_macros.h.in, jemalloc_protos.h.in, jemalloc_mangle.h.in) and
use a shell script to generate a unified jemalloc.h at configure time.
Change the default private namespace prefix from "" to "je_".
Add missing private namespace mangling.
Remove hard-coded private_namespace.h. Instead generate it and
private_unnamespace.h from private_symbols.txt. Use similar logic for
public symbols, which aids in name mangling for jet_ symbols.
Add test_warn() and test_fail(). Replace existing exit(1) calls with
test_fail() calls.
Fix dangerous casts of int variables to pointers in thread join function
calls. On LP64 systems, int and pointers are different sizes, so
writes can corrupt memory.
Add the "arenas.extend" mallctl, so that it is possible to create new
arenas that are outside the set that jemalloc automatically multiplexes
threads onto.
Add the ALLOCM_ARENA() flag for {,r,d}allocm(), so that it is possible
to explicitly allocate from a particular arena.
Add the "opt.dss" mallctl, which controls the default precedence of dss
allocation relative to mmap allocation.
Add the "arena.<i>.dss" mallctl, which makes it possible to set the
default dss precedence on a per arena or global basis.
Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge".
Add the "stats.arenas.<i>.dss" mallctl.
Using errno on win32 doesn't quite work, because the value set in a shared
library can't be read from e.g. an executable calling the function setting
errno.
At the same time, since buferror always uses errno/GetLastError, don't pass
it.
MSVC doesn't support C99, and building as C++ to be able to use them is
dangerous, as C++ and C99 are incompatible.
Introduce a VARIABLE_ARRAY macro that either uses VLA when supported,
or alloca() otherwise. Note that using alloca() inside loops doesn't
quite work like VLAs, thus the use of VARIABLE_ARRAY there is discouraged.
It might be worth investigating ways to check whether VARIABLE_ARRAY is
used in such context at runtime in debug builds and bail out if that
happens.
Since we're now including jemalloc_internal.h, all the required headers
are already pulled. This will avoid having to fiddle with headers that can
or can't be used with MSVC. Also, now that we use malloc_printf, we can use
util.h's definition of assert instead of assert.h's.
This reverts commit 96d4120ac0.
ivsalloc() depends on chunks_rtree being initialized. This can be
worked around via a NULL pointer check. However,
thread_allocated_tsd_get() also depends on initialization having
occurred, and there is no way to guard its call in free() that is
cheaper than checking whether ptr is NULL.