The option makes jemalloc use prctl with PR_SET_VMA to tag memory mappings with
"jemalloc_pg" or "jemalloc_pg_overcommit". This allows to easily identify
jemalloc's mappings in /proc/<pid>/maps. PR_SET_VMA is only available in Linux
5.17 and above.
Before this commit, in case FreeBSD libc jemalloc was overridden by another
jemalloc, proper thread shutdown callback was involved only for the overriding
jemalloc. A call to _malloc_thread_cleanup from libthr would be redirected to
user jemalloc, leaving data about dead threads hanging in system jemalloc. This
change tackles the issue in two ways. First, for current and old system
jemallocs, which we can not modify, the overriding jemalloc would locate and
invoke system cleanup routine. For upcoming jemalloc integrations, the cleanup
registering function will also be redirected to user jemalloc, which means that
system jemalloc's cleanup routine will be registered in user's jemalloc and a
single call to _malloc_thread_cleanup will be sufficient to invoke both
callbacks.
To avoid potential issues with removing unintended files after 'make
uninstall', spaces are no longer allowed in install suffix. It's worth
mentioning, that with GNU Make on Linux spaces in install suffix didn't
work anyway, leading to errors in the Makefile. But being verbose
about this restriction makes it more transparent for the developers.
On deallocation, sampled pointers (specially aligned) get junked and stashed
into tcache (to prevent immediate reuse). The expected behavior is to have
read-after-free corrupted and stopped by the junk-filling, while
write-after-free is checked when flushing the stashed pointers.
Adding guarded extents, which are regular extents surrounded by guard pages
(mprotected). To reduce syscalls, small guarded extents are cached as a
separate eset in ecache, and decay through the dirty / muzzy / retained pipeline
as usual.
Prior to the change you could specify --enable-prof-libunwind without
--enable-prof which would do effectively nothing. This was confusing as I
expected --enable-prof-libunwind to act like --enable-prof, but use libunwind.
When clang sees an unknown warning option, unlike gcc it doesn't fail the build
with error. It issues a warning. Hence JE_CFLAGS_ADD with warning options that
didnt't exist in clang would still mark those options as available. This led to
several warnings when built with clang or "gcc" on OSX. This change fixes those
warnings by simply making clang fail builds with non-existent warning options.
This hints to the compiler that it should care more about space than CPU (among
other things). In cases where the compiler lacks profile-guided information,
this can be a substantial space savings.
For now, we mark the mallctl or atexit driven profiling and stats functions that
take up the most space.
At least one libc (musl) defines pthread_setname_np without defining
pthread_getname_np. Detect the presence of each individually, rather than
inferring both must be defined if set is.
These are detected at configure time while they are glibc
specifics. the bionic equivalent is not api compatible
and dlopen is restricted in this platform.
Specify the maximum number of regions in a slab, which is
(<lg-page> - <lg-tiny-min>) by default. This increases the limit of slab sizes
specified by "slab_sizes" in malloc_conf. This should never be less than
the default value. The max value of this option is related to LG_BITMAP_MAXBITS
(see more in bitmap.h).
For example, on a 4k page size system, if we:
1) configure jemalloc with with --with-lg-slab-maxregs=12.
2) export MALLOC_CONF="slab_sizes:9-16:4"
The slab size of 16 bytes is set to 4 pages. Previously, the default
lg-slab-maxregs is 9 (i.e. 12 - 3). The max slab size of 16 bytes is 2 pages
(i.e. (1<<9) * 16 bytes). By increasing the value from 9 to 12, the max slab
size can be set by MALLOC_CONF is 16 pages (i.e. (1<<12) * 16 bytes).
The existing checks are good at finding such issues (on tcache flush), but not
so good at pinpointing them. Debug mode can find them, but sometimes debug mode
slows down a program so much that hard-to-hit bugs can take a long time to
crash.
This commit adds functionality to keep programs mostly on their fast paths,
while also checking every sized delete argument they get.
These simplify a lot of the bit_util module, which had grown bits and pieces of
this functionality across a variety of places over the years.
While we're here, kill off BIT_UTIL_INLINE and don't do reentrancy testing for
bit_util.
Summary:
Add support for C++17 over-aligned allocation:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0035r4.html
Supporting all 10 operators means we avoid thunking thru libstdc++-v3/libsupc++ and just call jemalloc directly.
It's also worth noting that there is now an aligned *and sized* operator delete:
```
void operator delete(void* ptr, std::size_t size, std::align_val_t al) noexcept;
```
If JeMalloc did not provide this, the default implementation would ignore the size parameter entirely:
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/libsupc%2B%2B/del_opsa.cc#L30-L33
(I must also update ax_cxx_compile_stdcxx.m4 to a newer version with C++17 support.)
Test Plan:
Wrote a simple test that allocates and then deletes an over-aligned type:
```
struct alignas(32) Foo {};
Foo *f;
int main()
{
f = new Foo;
delete f;
}
```
Before this change, both new and delete go thru PLT, and we end up calling regular old free:
```
(gdb) disassemble
Dump of assembler code for function main():
...
0x00000000004029b7 <+55>: call 0x4022d0 <_ZnwmSt11align_val_t@plt>
...
0x00000000004029d5 <+85>: call 0x4022e0 <_ZdlPvmSt11align_val_t@plt>
...
(gdb) s
free (ptr=0x7ffff6408020) at /home/engshare/third-party2/jemalloc/master/src/jemalloc.git-trunk/src/jemalloc.c:2842
2842 if (!free_fastpath(ptr, 0, false)) {
```
After this change, we directly call new/delete and ultimately call sdallocx:
```
(gdb) disassemble
Dump of assembler code for function main():
...
0x0000000000402b77 <+55>: call 0x496ca0 <operator new(unsigned long, std::align_val_t)>
...
0x0000000000402b95 <+85>: call 0x496e60 <operator delete(void*, unsigned long, std::align_val_t)>
...
(gdb) s
116 je_sdallocx_noflags(ptr, size);
```
Clang since r369414 (clang-10) can now check -Wimplicit-fallthrough for
C code, and use the GNU C style attribute to denote fallthrough.
Move the test from header only to autoconf. The previous test used
brittle version detection which did not work for newer clang that
supported this feature.
The attribute has to be its own statement, hence the added `;`. It also
can only precede case statements, so the final cases should be
explicitly terminated with break statements.
Fixes commit 3d29d11ac2 ("Clean compilation -Wextra")
Link: 1e0affb6e5
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>