Convert the man page source from roff to DocBook, and generate html and
roff output. Modify the build system such that the documentation can be
built as part of the release process, so that users need not have
DocBook tools installed.
Many mallctl*() end points require no locking, so push the locking down
to just the functions that need it. This is of particular import for
"thread.allocated" and "thread.deallocated", which are intended as a
low-overhead way to introspect per thread allocation activity.
Replace the single-character run-time flags with key/value pairs, which
can be set via the malloc_conf global, /etc/malloc.conf, and the
MALLOC_CONF environment variable.
Replace the JEMALLOC_PROF_PREFIX environment variable with the
"opt.prof_prefix" option.
Replace umax2s() with u2s().
Fix a regression due to the recent heap profiling accuracy improvements:
prof_{m,re}alloc() must set the object's profiling context regardless of
whether it is sampled.
Fix management of the CHUNK_MAP_CLASS chunk map bits, such that all
large object (re-)allocation paths correctly initialize the bits. Prior
to this fix, in-place realloc() cleared the bits, resulting in incorrect
reported object size from arena_salloc_demote(). After this fix the
non-demoted bit pattern is all zeros (instead of all ones), which makes
it easier to assure that the bits are properly set.
Inline the heap sampling code that is executed for every allocation
event (regardless of whether a sample is taken).
Combine all prof TLS data into a single data structure, in order to
reduce the TLS lookup volume.
Add the "thread.allocated" and "thread.deallocated" mallctls, which can
be used to query the total number of bytes ever allocated/deallocated by
the calling thread.
Add s2u() and sa2u(), which can be used to compute the usable size that
will result from an allocation request of a particular size/alignment.
Re-factor ipalloc() to use sa2u().
Enhance the heap profiler to trigger samples based on usable size,
rather than request size. This has a subtle, but important, impact on
the accuracy of heap sampling. For example, previous to this change,
16- and 17-byte objects were sampled at nearly the same rate, but
17-byte objects actually consume 32 bytes each. Therefore it was
possible for the sample to be somewhat skewed compared to actual memory
usage of the allocated objects.
Fix the newsize argument to arena_run_trim_tail() that
arena_dalloc_bin_run() passes. Previously, oldsize-newsize (i.e. the
complement) was passed, which could erroneously cause dirty pages to be
returned to the clean available runs tree. Prior to the
CHUNK_MAP_ZEROED --> CHUNK_MAP_UNZEROED conversion, this bug merely
caused dirty pages to be unaccounted for (and therefore never get
purged), but with CHUNK_MAP_UNZEROED, this could cause dirty pages to be
treated as zeroed (i.e. memory corruption).
Split arena_dissociate_bin_run() out of arena_dalloc_bin_run(), so that
arena_bin_malloc_hard() can avoid dissociation when recovering from
losing a race. This fixes a bug introduced by a recent attempted fix.
Fix a regression in arena_ralloc_large_grow() that was introduced by
recent fixes.
Move part of arena_bin_lower_run() into the callers, since the
conditions under which it should be called differ slightly between
callers.
Fix arena_chunk_purge() to omit run size in the last map entry for each
run it temporarily allocates.
In arena_ralloc_large_grow(), update the map element for the end of the
newly grown run, rather than the interior map element that was the
beginning of the appended run. This is a long-standing bug, and it had
the potential to cause massive corruption, but triggering it required
roughly the following sequence of events:
1) Large in-place growing realloc(), with left-over space in the run
that followed the large object.
2) Allocation of the remainder run left over from (1).
3) Deallocation of the remainder run *before* deallocation of the
large run, with unfortunate interior map state left over from
previous run allocation/deallocation activity, such that one or
more pages of allocated memory would be treated as part of the
remainder run during run coalescing.
In summary, this was a bad bug, but it was difficult to trigger.
In arena_bin_malloc_hard(), if another thread wins the race to allocate
a bin run, dispose of the spare run via arena_bin_lower_run() rather
than arena_run_dalloc(), since the run has already been prepared for use
as a bin run. This bug has existed since March 14, 2010:
e00572b384
mmap()/munmap() without arena->lock or bin->lock.
Fix bugs in arena_dalloc_bin_run(), arena_trim_head(),
arena_trim_tail(), and arena_ralloc_large_grow() that could cause the
CHUNK_MAP_UNZEROED map bit to become corrupted. These are all
long-standing bugs, but the chances of them actually causing problems
was much lower before the CHUNK_MAP_ZEROED --> CHUNK_MAP_UNZEROED
conversion.
Fix a large run statistics regression in arena_ralloc_large_grow() that
was introduced on September 17, 2010:
8e3c3c61b5
Add {,r,s,d}allocm().
Add debug code to validate that supposedly pre-zeroed memory really is.
Preserve CHUNK_MAP_UNZEROED when allocating small runs, because it is
possible that untouched pages will be returned to the tree of clean
runs, where the CHUNK_MAP_UNZEROED flag matters. Prior to the
conversion from CHUNK_MAP_ZEROED, this was already a bug, but in the
worst case extra zeroing occurred. After the conversion, this bug made
it possible to incorrectly treat pages as pre-zeroed.
Fix a regression added by revision:
3377ffa1f4
Change CHUNK_MAP_ZEROED to CHUNK_MAP_UNZEROED.
A modified chunk->map dereference was missing the subtraction of
map_bias, which caused incorrect chunk map initialization, as well as
potential corruption of the first non-header page of memory within each
chunk.
Re-organize code for --enable-prof-libgcc so that configure doesn't
report both libgcc and libunwind support as being configured in. This
change has no impact on how jemalloc is actually configured/built.
Add test/jemalloc_test.h.in, which is processed to include
jemalloc/jemalloc@install_suffix@.h, so that test programs can include
it without worrying about the install suffix.
Fix a bug in leak context count reporting that tended to cause the
number of contexts to be underreported. The reported number of leaked
objects and bytes were not affected by this bug.
Add the R option to control whether cumulative heap profile data
are maintained. Add the T option to control the size of per thread
backtrace caches, primarily because when the R option is specified,
backtraces that no longer have allocations associated with them are
discarded as soon as no thread caches refer to them.
Remove malloc_swap_enable(), which was obsoleted by the "swap.fds"
mallctl. The prototype for malloc_swap_enable() was removed from
jemalloc/jemalloc.h, but the function itself was accidentally left in
place.
Base dynamic structure size on offsetof(), rather than subtracting the
size of the dynamic structure member. Results could differ on systems
with strict data structure alignment requirements.
Invert the chunk map bit that tracks whether a page is zeroed, so that
for zeroed arena chunks, the interior of the page map does not need to
be initialized (as it consists entirely of zero bytes).
It is common to have to specify something like JEMALLOC_OPTIONS=F31i,
because interval-based dumps are often unuseful or too expensive.
Therefore, disable interval-based dumps by default. To get the previous
default behavior it is now necessary to specify 31I as part of the
options.
Use INT_MAX instead of MAX_INT in ALLOCM_ALIGN(), and #include
<limits.h> in order to get its definition.
Modify prof code related to hash tables to avoid aliasing warnings from
gcc 4.1.2 (gcc 4.4.0 and 4.4.3 do not warn).
Remove assertions that malloc_{pre,post}fork() are only called if
threading is enabled. This was true of these functions in the context
of FreeBSD's libc, but now the functions are called unconditionally as a
result of registering them with pthread_atfork().