Proposed fix for #1444 - ensure that `tls_callback` in the `#pragma comment(linker)`directive gets the same prefix added as it does i the C declaration.
In case of multithreaded fork, we want to leave the child in a reasonable state,
in which tsd_nominal_tsds is either empty or contains only the forking thread.
Before this commit jemalloc produced many warnings when compiled with -Wextra
with both Clang and GCC. This commit fixes the issues raised by these warnings
or suppresses them if they were spurious at least for the Clang and GCC
versions covered by CI.
This commit:
* adds `JEMALLOC_DIAGNOSTIC` macros: `JEMALLOC_DIAGNOSTIC_{PUSH,POP}` are
used to modify the stack of enabled diagnostics. The
`JEMALLOC_DIAGNOSTIC_IGNORE_...` macros are used to ignore a concrete
diagnostic.
* adds `JEMALLOC_FALLTHROUGH` macro to explicitly state that falling
through `case` labels in a `switch` statement is intended
* Removes all UNUSED annotations on function parameters. The warning
-Wunused-parameter is now disabled globally in
`jemalloc_internal_macros.h` for all translation units that include
that header. It is never re-enabled since that header cannot be
included by users.
* locally suppresses some -Wextra diagnostics:
* `-Wmissing-field-initializer` is buggy in older Clang and GCC versions,
where it does not understanding that, in C, `= {0}` is a common C idiom
to initialize a struct to zero
* `-Wtype-bounds` is suppressed in a particular situation where a generic
macro, used in multiple different places, compares an unsigned integer for
smaller than zero, which is always true.
* `-Walloc-larger-than-size=` diagnostics warn when an allocation function is
called with a size that is too large (out-of-range). These are suppressed in
the parts of the tests where `jemalloc` explicitly does this to test that the
allocation functions fail properly.
* adds a new CI build bot that runs the log unit test on CI.
Closes#1196 .
While working on #852, I noticed the prng state is atomic. This is the only
atomic use of prng in all of jemalloc. Instead, use a threadlocal prng
state if possible to avoid unnecessary cache line contention.
We use the minimal_initilized tsd (which requires no cleanup) for free()
specifically, if tsd hasn't been initialized yet.
Any other activity will transit the state from minimal to normal. This is to
workaround the case where a thread has no malloc calls in its lifetime until
during thread termination, free() happens after tls destructors.
This removes the tsd macros (which are used only for tsd_t in real builds). We
break up the circular dependencies involving tsd.
We also move all tsd access through getters and setters. This allows us to
assert that we only touch data when tsd is in a valid state.
We simplify the usages of the x macro trick, removing all the customizability
(get/set, init, cleanup), moving the lifetime logic to tsd_init and tsd_cleanup.
This lets us make initialization order independent of order within tsd_t.
Previously we had a general detection and support of reentrancy, at the cost of
having branches and inc / dec operations on fast paths. To avoid taxing fast
paths, we move the reentrancy operations onto tsd slow state, and only modify
reentrancy level around external calls (that might trigger reentrancy).
Added tsd_state_nominal_slow, which on fast path malloc() incorporates
tcache_enabled check, and on fast path free() bundles both malloc_slow and
tcache_enabled branches.
This is a biggy. jemalloc_internal.h has been doing multiple jobs for a while
now:
- The source of system-wide definitions.
- The catch-all include file.
- The module header file for jemalloc.c
This commit splits up this functionality. The system-wide definitions
responsibility has moved to jemalloc_preamble.h. The catch-all include file is
now jemalloc_internal_includes.h. The module headers for jemalloc.c are now in
jemalloc_internal_[externs|inlines|types].h, just as they are for the other
modules.
The embedded tcache is initialized upon tsd initialization. The avail arrays
for the tbins will be allocated / deallocated accordingly during init / cleanup.
With this change, the pointer to the auto tcache will always be available, as
long as we have access to the TSD. tcache_available() (called in tcache_get())
is provided to check if we should use tcache.
This will facilitate embedding tcache into tsd, which will require proper
initialization cannot be done via the static initializer. Make tsd->rtree_ctx
to be initialized via rtree_ctx_data_init().
This avoids a gcc diagnostic note:
note: The ABI for passing parameters with 64-byte alignment has
changed in GCC 4.6
This note related to the cacheline alignment of rtree_ctx_t, which was
introduced by 4a346f5593 (Replace rtree
path cache with LRU cache.).
There are three categories of metadata:
- Base allocations are used for bootstrap-sensitive internal allocator
data structures.
- Arena chunk headers comprise pages which track the states of the
non-metadata pages.
- Internal allocations differ from application-originated allocations
in that they are for internal use, and that they are omitted from heap
profiles.
The metadata statistics comprise the metadata categories as follows:
- stats.metadata: All metadata -- base + arena chunk headers + internal
allocations.
- stats.arenas.<i>.metadata.mapped: Arena chunk headers.
- stats.arenas.<i>.metadata.allocated: Internal allocations. This is
reported separately from the other metadata statistics because it
overlaps with the allocated and active statistics, whereas the other
metadata statistics do not.
Base allocations are not reported separately, though their magnitude can
be computed by subtracting the arena-specific metadata.
This resolves#163.
Refactor bootstrapping to delay tsd initialization, primarily to support
integration with FreeBSD's libc.
Refactor a0*() for internal-only use, and add the
bootstrap_{malloc,calloc,free}() API for use by FreeBSD's libc. This
separation limits use of the a0*() functions to metadata allocation,
which doesn't require malloc/calloc/free API compatibility.
This resolves#170.
Abstract arenas access to use arena_get() (or a0get() where appropriate)
rather than directly reading e.g. arenas[ind]. Prior to the addition of
the arenas.extend mallctl, the worst possible outcome of directly
accessing arenas was a stale read, but arenas.extend may allocate and
assign a new array to arenas.
Add a tsd-based arenas_cache, which amortizes arenas reads. This
introduces some subtle bootstrapping issues, with tsd_boot() now being
split into tsd_boot[01]() to support tsd wrapper allocation
bootstrapping, as well as an arenas_cache_bypass tsd variable which
dynamically terminates allocation of arenas_cache itself.
Promote a0malloc(), a0calloc(), and a0free() to be generally useful for
internal allocation, and use them in several places (more may be
appropriate).
Abstract arena->nthreads management and fix a missing decrement during
thread destruction (recent tsd refactoring left arenas_cleanup()
unused).
Change arena_choose() to propagate OOM, and handle OOM in all callers.
This is important for providing consistent allocation behavior when the
MALLOCX_ARENA() flag is being used. Prior to this fix, it was possible
for an OOM to result in allocation silently allocating from a different
arena than the one specified.
Fix tsd cleanup regressions that were introduced in
5460aa6f66 (Convert all tsd variables to
reside in a single tsd structure.). These regressions were twofold:
1) tsd_tryget() should never (and need never) return NULL. Rename it to
tsd_fetch() and simplify all callers.
2) tsd_*_set() must only be called when tsd is in the nominal state,
because cleanup happens during the nominal-->purgatory transition,
and re-initialization must not happen while in the purgatory state.
Add tsd_nominal() and use it as needed. Note that tsd_*{p,}_get()
can still be used as long as no re-initialization that would require
cleanup occurs. This means that e.g. the thread_allocated counter
can be updated unconditionally.
Implement the *allocx() API, which is a successor to the *allocm() API.
The *allocx() functions are slightly simpler to use because they have
fewer parameters, they directly return the results of primary interest,
and mallocx()/rallocx() avoid the strict aliasing pitfall that
allocm()/rallocx() share with posix_memalign(). The following code
violates strict aliasing rules:
foo_t *foo;
allocm((void **)&foo, NULL, 42, 0);
whereas the following is safe:
foo_t *foo;
void *p;
allocm(&p, NULL, 42, 0);
foo = (foo_t *)p;
mallocx() does not have this problem:
foo_t *foo = (foo_t *)mallocx(42, 0);
Fix malloc_tsd_dalloc() to bypass tcache when dallocating, so that there
is no danger of causing tcache reincarnation during thread exit.
Whether this infinite loop occurs depends on the pthreads TSD
implementation; it is known to occur on Solaris.
Submitted by Markus Eberspächer.
When using LinuxThreads pthread_setspecific triggers recursive
allocation on all threads. Work around this by creating a global linked
list of in-progress tsd initializations.
This modifies the _tsd_get_wrapper macro-generated function. When it has
to initialize an TSD object it will push the item to the linked list
first. If this causes a recursive allocation then the _get_wrapper
request is satisfied from the list. When pthread_setspecific returns the
item is removed from the list.
This effectively adds a very poor substitute for real TLS used only
during pthread_setspecific allocation recursion.
Signed-off-by: Crestez Dan Leonard <lcrestez@ixiacom.com>
Embed the bin index for small page runs into the chunk page map, in
order to omit [...] in the following dependent load sequence:
ptr-->mapelm-->[run-->bin-->]bin_info
Move various non-critcal code out of the inlined function chain into
helper functions (tcache_event_hard(), arena_dalloc_small(), and
locking).