Migrate all centralized data structures related to huge allocations and
recyclable chunks into arena_t, so that each arena can manage huge
allocations and recyclable virtual memory completely independently of
other arenas.
Add chunk node caching to arenas, in order to avoid contention on the
base allocator.
Use chunks_rtree to look up huge allocations rather than a red-black
tree. Maintain a per arena unsorted list of huge allocations (which
will be needed to enumerate huge allocations during arena reset).
Remove the --enable-ivsalloc option, make ivsalloc() always available,
and use it for size queries if --enable-debug is enabled. The only
practical implications to this removal are that 1) ivsalloc() is now
always available during live debugging (and the underlying radix tree is
available during core-based debugging), and 2) size query validation can
no longer be enabled independent of --enable-debug.
Remove the stats.chunks.{current,total,high} mallctls, and replace their
underlying statistics with simpler atomically updated counters used
exclusively for gdump triggering. These statistics are no longer very
useful because each arena manages chunks independently, and per arena
statistics provide similar information.
Simplify chunk synchronization code, now that base chunk allocation
cannot cause recursive lock acquisition.
It often happens that code changes introduce mixed declarations, that then
break building with Visual Studio. Since the code style is to not use
mixed declarations anyways, we might as well enforce it with -Werror.
Add:
--with-lg-page
--with-lg-page-sizes
--with-lg-size-class-group
--with-lg-quantum
Get rid of STATIC_PAGE_SHIFT, in favor of directly setting LG_PAGE.
Fix various edge conditions exposed by the configure options.
Revert 6716aa8352 (Force use of TLS if
heap profiling is enabled.). No existing tests indicate that this is
necessary, nor does code inspection uncover any potential issues. Most
likely the original commit covered up a bug related to tsd-internal
allocation that has since been fixed.
It has an unused variable, so it was always failing (at least with gcc
4.9.1). Alternatively, the `-Werror` flag could be removed if it isn't
strictly necessary.
This adds a new `sdallocx` function to the external API, allowing the
size to be passed by the caller. It avoids some extra reads in the
thread cache fast path. In the case where stats are enabled, this
avoids the work of calculating the size from the pointer.
An assertion validates the size that's passed in, so enabling debugging
will allow users of the API to debug cases where an incorrect size is
passed in.
The performance win for a contrived microbenchmark doing an allocation
and immediately freeing it is ~10%. It may have a different impact on a
real workload.
Closes#28
Relax the "are we in a git repo?" check to succeed even if the top level
jemalloc directory is not at the top level of the git repo.
Add git tag filtering so that only version triplets match when
generating VERSION.
Add fallback bogus VERSION creation, so that in the worst case, rather
than generating empty values for e.g. JEMALLOC_VERSION_MAJOR,
configuration ends up generating useless constants.
__*_hook() is glibc, but on at least one glibc platform (homebrew),
the __GLIBC__ define isn't set correctly and we miss being able to
use these hooks.
Do a feature test for it during configuration so that we enable it
anywhere the hooks are actually available.
On Windows, srcroot may start with "drive:", which confuses autoconf's
AC_CONFIG_* macros. The macros works equally well without ${srcroot},
provided some adjustment to Makefile.in.
Now this code is more portable and now people can use faster shells than
Bash such as Dash.
To use a faster shell with autoconf set the CONFIG_SHELL environment
variable to the shell and run the configure script with the shell.
When building with -O0, GCC doesn't use builtins for ffs and ffsl calls,
and uses library function calls instead. But the Android NDK doesn't have
those functions exported from any library, leading to build failure.
However, using __builtin_ffs* uses the builtin inlines.
Some platforms, such as Google's Portable Native Client, use Newlib and
thus lack access to madvise(2). In those instances, pages_purge() is
transformed into a no-op.
Some platforms (like those using Newlib) don't have ffs/ffsl. This
commit adds a check to configure.ac for __builtin_ffsl if ffsl isn't
found. __builtin_ffsl performs the same function as ffsl, and has the
added benefit of being available on any platform utilizing
Gcc-compatible compiler.
This change does not address the used of ffs in the MALLOCX_ARENA()
macro.
Add size class computation capability, currently used only as validation
of the size class lookup tables. Generalize the size class spacing used
for bins, for eventual use throughout the full range of allocation
sizes.
Refactor huge allocation to be managed by arenas (though the global
red-black tree of huge allocations remains for lookup during
deallocation). This is the logical conclusion of recent changes that 1)
made per arena dss precedence apply to huge allocation, and 2) made it
possible to replace the per arena chunk allocation/deallocation
functions.
Remove the top level huge stats, and replace them with per arena huge
stats.
Normalize function names and types to *dalloc* (some were *dealloc*).
Remove the --enable-mremap option. As jemalloc currently operates, this
is a performace regression for some applications, but planned work to
logarithmically space huge size classes should provide similar amortized
performance. The motivation for this change was that mremap-based huge
reallocation forced leaky abstractions that prevented refactoring.
Make dss non-optional on all platforms which support sbrk(2).
Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
"secondary" precedence is specified, but sbrk(2) is not supported.
Remove autoconf code that explicitly disabled libgcc-based backtracing
on i[3456]86. There is no mention of which platforms/compilers
exhibited problems when this code was added, and chances are good that
any gcc toolchain issues have long since been fixed.
Specify -fno-omit-frame-pointer when using __builtin_frame_address() and
__builtin_return_address() for backtracing. This fixes backtracing
failures on e.g. i686 for optimized builds.
The hash code, which has MurmurHash3 at its core, generates different
output depending on system endianness, so adapt the expected output on
big-endian systems. MurmurHash3 code also makes the assumption that
unaligned access is okay (not true on all systems), but jemalloc only
hashes data structures that have sufficient alignment to dodge this
limitation.
Make sure that emmintrin.h can be #include'd without causing a
compilation error, rather than blindly defining HAVE_SSE2 based on
architecture. Attempts to force SSE2 compilation on a 32-bit Ubuntu
13.10 system running as a VMware guest resulted in a no-win choice
without any obvious explanation besides toolchain misconfiguration/bug:
- Suffer compilation failure due to __MMX__, __SSE__, and __SSE2__ not
being defined, even if -mmmx, -msse, and -msse2 are manually
specified (note that they appear to be enabled by default).
- Manually define __MMX__, __SSE__, and __SSE2__, and suffer compiler
warnings that they are already automatically defined. This results in
successful compilation and execution, but the noise is intolerable.