Implement malloc_vsnprintf() (a subset of vsnprintf(3)) as well as
several other printing functions based on it, so that formatted printing
can be relied upon without concern for inducing a dependency on floating
point runtime support. Replace malloc_write() calls with
malloc_*printf() where doing so simplifies the code.
Add name mangling for library-private symbols in the data and BSS
sections. Adjust CONF_HANDLE_*() macros in malloc_conf_init() to expose
all opt_* variable use to cpp so that proper mangling occurs.
Remove the lg_tcache_gc_sweep option, because it is no longer
very useful. Prior to the addition of dynamic adjustment of tcache fill
count, it was possible for fill/flush overhead to be a problem, but this
problem no longer occurs.
Add the --with-mangling configure option, which can be used to specify
name mangling on a per public symbol basis that takes precedence over
--with-jemalloc-prefix.
Expose the memalign() and valloc() overrides even if
--with-jemalloc-prefix is specified. This change does no real harm, and
simplifies the code.
Add nallocm(), which computes the real allocation size that would result
from the corresponding allocm() call. nallocm() is a functional
superset of OS X's malloc_good_size(), in that it takes alignment
constraints into account.
When jemalloc is used as a libc malloc replacement (i.e. not prefixed),
some particular setups may end up inconsistently calling malloc from
libc and free from jemalloc, or the other way around.
glibc provides hooks to make its functions use alternative
implementations. Use them.
Submitted by Karl Tomlinson and Mike Hommey.
Do not enforce minimum alignment in memalign(). This is a non-standard
function, and there is disagreement over whether to enforce minimum
alignment. Solaris documentation (whence memalign() originated) says
that minimum alignment is required:
The value of alignment must be a power of two and must be greater than
or equal to the size of a word.
However, Linux's manual page says in its NOTES section:
memalign() may not check that the boundary parameter is correct.
This is descriptive rather than prescriptive, but applications with
bad assumptions about memalign() exist, so be as forgiving as possible.
Reported by Mike Hommey.
Program-generate small size class tables for all valid combinations of
LG_TINY_MIN, LG_QUANTUM, and PAGE_SHIFT. Use the appropriate table to generate
all relevant data structures, and remove the distinction between
tiny/quantum/cacheline/subpage bins.
Remove --enable-dynamic-page-shift. This option didn't prove useful in
practice, and it prevented optimizations.
Add Tilera architecture support.
Remove opt.lg_prof_bt_max, and hard code it to 7. The original
intention of this option was to enable faster backtracing by limiting
backtrace depth. However, this makes graphical pprof output very
difficult to interpret. In practice, decreasing sampling frequency is a
better mechanism for limiting profiling overhead.
Remove the opt.lg_prof_tcmax option and hard-code a cache size of 1024.
This setting is something that users just shouldn't have to worry about.
If lock contention actually ends up being a problem, the simple solution
available to the user is to reduce sampling frequency.
Fix an interaction between arena_dissociate_bin_run() and
arena_bin_lower_run() that made it possible for bin->runcur to point to
a run other than the lowest non-full run. This bug violated jemalloc's
layout policy, but did not affect correctness.
When tiny size class support was first added, it was intended to support
truly tiny size classes (even 2 bytes). However, this wasn't very
useful in practice, so the minimum tiny size class has been limited to
sizeof(void *) for a long time now. This is too small to be standards
compliant, but other commonly used malloc implementations do not even
bother using a 16-byte quantum on systems with vector units (SSE2+,
AltiVEC, etc.). As such, it is safe in practice to support an 8-byte
tiny size class on 64-bit systems that support 16-byte types.
Do not enable lazy locking by default, because:
- It's fragile (applications can subvert detection of multi-threaded
mode).
- Thread caching amortizes locking overhead in the default
configuration.
tcache_get() is inlined, so do the config_tcache check inside
tcache_get() and simplify its callers.
Make arena_malloc() an inline function, since it is part of the malloc()
fast path.
Remove conditional logic that cause build issues if --disable-tcache was
specified.
Remove structure magic, because 1) it is no longer conditional, and 2)
it stopped being very effective at detecting memory corruption several
years ago.
Convert configuration-related cpp conditional logic to use static
constant variables, e.g.:
#ifdef JEMALLOC_DEBUG
[...]
#endif
becomes:
if (config_debug) {
[...]
}
The advantage is clearer, more concise code. The main disadvantage is
that data structures no longer have conditionally defined fields, so
they pay the cost of all fields regardless of whether they are used. In
practice, this is only a minor concern; config_stats will go away in an
upcoming change, and config_prof is the only other major feature that
depends on more than a few special-purpose fields.
Fix the logic in stats_print() such that if the "a" flag is passed in
without the "m" flag, merged statistics will be printed even if only one
arena is initialized.
Fix huge_ralloc() to remove the old memory region from tree of huge
allocations *before* calling mremap(2), in order to make sure that no
other thread acquires the old memory region via mmap() and encounters
stale metadata in the tree.
Reported by: Rich Prohaska
Refactor the SO and REV such that they are set via autoconf variables,
@so@ and @rev@. These variables are both needed by the jemalloc.sh
script, so this unifies their definitions.
Fix prof_lookup() to artificially raise curobjs for all paths through
the code that creates a new entry in the per thread bt2cnt hash table.
This fixes a race condition that could corrupt memory if prof_accum were
false, and a non-default lg_prof_tcmax were used and/or threads were
destroyed.
Add a missing prof_malloc() call in allocm(). Before this fix, negative
object/byte counts could be observed in heap profiles for applications
that use allocm().
Rewrite prof_alloc_prep() as a cpp macro, PROF_ALLOC_PREP(), in order to
remove any doubt as to whether an additional stack frame is created.
Prior to this change, it was assumed that inlining would reduce the
total number of frames in the backtrace, but in practice behavior wasn't
completely predictable.
Create imemalign() and call it from posix_memalign(), memalign(), and
valloc(), so that all entry points require the same number of stack
frames to be ignored during backtracing.