Remove the lg_tcache_gc_sweep option, because it is no longer
very useful. Prior to the addition of dynamic adjustment of tcache fill
count, it was possible for fill/flush overhead to be a problem, but this
problem no longer occurs.
Add the --with-mangling configure option, which can be used to specify
name mangling on a per public symbol basis that takes precedence over
--with-jemalloc-prefix.
Expose the memalign() and valloc() overrides even if
--with-jemalloc-prefix is specified. This change does no real harm, and
simplifies the code.
Add nallocm(), which computes the real allocation size that would result
from the corresponding allocm() call. nallocm() is a functional
superset of OS X's malloc_good_size(), in that it takes alignment
constraints into account.
When jemalloc is used as a libc malloc replacement (i.e. not prefixed),
some particular setups may end up inconsistently calling malloc from
libc and free from jemalloc, or the other way around.
glibc provides hooks to make its functions use alternative
implementations. Use them.
Submitted by Karl Tomlinson and Mike Hommey.
Do not enforce minimum alignment in memalign(). This is a non-standard
function, and there is disagreement over whether to enforce minimum
alignment. Solaris documentation (whence memalign() originated) says
that minimum alignment is required:
The value of alignment must be a power of two and must be greater than
or equal to the size of a word.
However, Linux's manual page says in its NOTES section:
memalign() may not check that the boundary parameter is correct.
This is descriptive rather than prescriptive, but applications with
bad assumptions about memalign() exist, so be as forgiving as possible.
Reported by Mike Hommey.
Program-generate small size class tables for all valid combinations of
LG_TINY_MIN, LG_QUANTUM, and PAGE_SHIFT. Use the appropriate table to generate
all relevant data structures, and remove the distinction between
tiny/quantum/cacheline/subpage bins.
Remove --enable-dynamic-page-shift. This option didn't prove useful in
practice, and it prevented optimizations.
Add Tilera architecture support.
Remove opt.lg_prof_bt_max, and hard code it to 7. The original
intention of this option was to enable faster backtracing by limiting
backtrace depth. However, this makes graphical pprof output very
difficult to interpret. In practice, decreasing sampling frequency is a
better mechanism for limiting profiling overhead.
Remove the opt.lg_prof_tcmax option and hard-code a cache size of 1024.
This setting is something that users just shouldn't have to worry about.
If lock contention actually ends up being a problem, the simple solution
available to the user is to reduce sampling frequency.
Fix an interaction between arena_dissociate_bin_run() and
arena_bin_lower_run() that made it possible for bin->runcur to point to
a run other than the lowest non-full run. This bug violated jemalloc's
layout policy, but did not affect correctness.
When tiny size class support was first added, it was intended to support
truly tiny size classes (even 2 bytes). However, this wasn't very
useful in practice, so the minimum tiny size class has been limited to
sizeof(void *) for a long time now. This is too small to be standards
compliant, but other commonly used malloc implementations do not even
bother using a 16-byte quantum on systems with vector units (SSE2+,
AltiVEC, etc.). As such, it is safe in practice to support an 8-byte
tiny size class on 64-bit systems that support 16-byte types.
Do not enable lazy locking by default, because:
- It's fragile (applications can subvert detection of multi-threaded
mode).
- Thread caching amortizes locking overhead in the default
configuration.
tcache_get() is inlined, so do the config_tcache check inside
tcache_get() and simplify its callers.
Make arena_malloc() an inline function, since it is part of the malloc()
fast path.
Remove conditional logic that cause build issues if --disable-tcache was
specified.
Remove structure magic, because 1) it is no longer conditional, and 2)
it stopped being very effective at detecting memory corruption several
years ago.
Convert configuration-related cpp conditional logic to use static
constant variables, e.g.:
#ifdef JEMALLOC_DEBUG
[...]
#endif
becomes:
if (config_debug) {
[...]
}
The advantage is clearer, more concise code. The main disadvantage is
that data structures no longer have conditionally defined fields, so
they pay the cost of all fields regardless of whether they are used. In
practice, this is only a minor concern; config_stats will go away in an
upcoming change, and config_prof is the only other major feature that
depends on more than a few special-purpose fields.
Fix the logic in stats_print() such that if the "a" flag is passed in
without the "m" flag, merged statistics will be printed even if only one
arena is initialized.
Fix huge_ralloc() to remove the old memory region from tree of huge
allocations *before* calling mremap(2), in order to make sure that no
other thread acquires the old memory region via mmap() and encounters
stale metadata in the tree.
Reported by: Rich Prohaska
Fix the logic in stats_print() such that if the "a" flag is passed in
without the "m" flag, merged statistics will be printed even if only one
arena is initialized.
Fix huge_ralloc() to remove the old memory region from tree of huge
allocations *before* calling mremap(2), in order to make sure that no
other thread acquires the old memory region via mmap() and encounters
stale metadata in the tree.
Reported by: Rich Prohaska