p_test_fail() was passing a va_list to two separate functions with the
expectation that no reset would occur. Refactor p_test_fail()'s callers
to instead format two strings and pass them to p_test_fail().
Add a missing parameter to an assert_u64_eq() call, which the compiler
warned about after the assertion macro refactoring.
This happens when it fails to allocate a new chunk. Which
arena_chunk_alloc then passes into arena_avail_insert without any
checks. This then causes a crash when arena_avail_insert tries
to check chunk->ndirty.
This was introduced by the refactoring of arena_chunk_alloc
which previously would have returned NULL immediately after
calling chunk_alloc. This is now the return from
arena_chunk_init_hard so we need to check that return, and
not continue if it was NULL.
Restore the essence of 898960247a, which
sabotages tail call optimization. This is necessary even when the
mutually recursive functions are in separate compilation units.
If mremap(2) is used for huge reallocation, physical pages are mapped to
new virtual addresses rather than data being copied to new pages. This
bypasses the normal junk filling that would happen during allocation, so
add junk filling that is specific to this case.
Make sure that emmintrin.h can be #include'd without causing a
compilation error, rather than blindly defining HAVE_SSE2 based on
architecture. Attempts to force SSE2 compilation on a 32-bit Ubuntu
13.10 system running as a VMware guest resulted in a no-win choice
without any obvious explanation besides toolchain misconfiguration/bug:
- Suffer compilation failure due to __MMX__, __SSE__, and __SSE2__ not
being defined, even if -mmmx, -msse, and -msse2 are manually
specified (note that they appear to be enabled by default).
- Manually define __MMX__, __SSE__, and __SSE2__, and suffer compiler
warnings that they are already automatically defined. This results in
successful compilation and execution, but the noise is intolerable.
Break prof_accum into multiple compilation units, in order to thwart
compiler optimizations such as inlining and tail call optimization that
would alter backtraces.
Add a cpp #define that removes 'restrict' keyword usage unless the
compiler definitely supports C99. As written, 'restrict' is only
enabled if the compiler supports the -std=gnu99 option (e.g. gcc and
llvm).
Reported by Tobias Hieta.
Remove the allocm() test equivalent to the mallocx() test removed in the
previous commit. The flawed test attempted to cause OOM due to large
request size and alignment constraint. Although this test "passed" on
64-bit systems due to the virtual memory hole, it could pass on some
32-bit systems.
Avoid copying "jeprof" to a 1-byte buffer within prof_boot0() when heap
profiling is disabled. Although this is dead code under such
conditions, the compiler doesn't figure that part out.
Reported by Eduardo Silva.
Fix/remove three related flawed tests that attempted to cause OOM due to
large request size and alignment constraint. Although these tests
"passed" on 64-bit systems due to the virtual memory hole, they could
pass on some 32-bit systems.
Re-structure alloc_[01](), which are mutually tail-recursive functions,
to do (unnecessary) work post-recursion so that the compiler cannot
perform tail call optimization, thus preserving intentionally unique
call paths in captured backtraces.
Fix stress tests such that testlib code uses the jet_ allocator, but
test code uses libjemalloc.
Generate jemalloc_{rename,mangle}.h, the former because it's needed for
the stress test name mangling fix, and the latter for consistency. As
an artifact of this change, some (but not all) definitions related to
the experimental API are absent from the headers unless the feature is
enabled at configure time.
Refactor prof_dump() to use a two pass algorithm, and prof_leave() prior
to the second pass. This avoids write(2) system calls while holding
critical prof resources.
Fix prof_dump() to close the dump file descriptor for all relevant error
paths.
Minimize the size of prof-related static buffers when prof is disabled.
This saves roughly 65 KiB of application memory for non-prof builds.
Refactor prof_ctx_init() out of prof_lookup_global().
Refactor overly large functions by breaking out helper functions.
Refactor overly complex multi-purpose functions into separate more
specific functions.
Extract profiling code from malloc(), imemalign(), calloc(), realloc(),
mallocx(), rallocx(), and xallocx(). This slightly reduces the amount
of code compiled into the fast paths, but the primary benefit is the
combinatorial complexity reduction.
Simplify iralloc[t]() by creating a separate ixalloc() that handles the
no-move cases.
Further simplify [mrxn]allocx() (and by implication [mrn]allocm()) to
make request size overflows due to size class and/or alignment
constraints trigger undefined behavior (detected by debug-only
assertions).
Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling
backtrace creation in imemalign(). This bug impacted posix_memalign()
and aligned_alloc().
Add unit tests for pow2_ceil(), malloc_strtoumax(), and
malloc_snprintf().
Fix numerous bugs in malloc_strotumax() error handling/reporting. These
bugs could have caused application-visible issues for some seldom used
(0X... and 0... prefixes) or malformed MALLOC_CONF or mallctl() argument
strings, but otherwise they had no impact.
Fix numerous bugs in malloc_snprintf(). These bugs were not exercised
by existing malloc_*printf() calls, so they had no impact.
Reduce rtree memory usage by storing booleans (1 byte each) rather than
pointers. The rtree code is only used to record whether jemalloc manages
a chunk of memory, so there's no need to store pointers in the rtree.
Increase rtree node size to 64 KiB in order to reduce tree depth from 13
to 3 on 64-bit systems. The conversion to more compact leaf nodes was
enough by itself to make the rtree depth 1 on 32-bit systems; due to the
fact that root nodes are smaller than the specified node size if
possible, the node size change has no impact on 32-bit systems (assuming
default chunk size).