Add the JEMALLOC_ALWAYS_INLINE_C macro and use it for always-inlined
functions declared in .c files. This fixes a function attribute
inconsistency for debug builds that resulted in (harmless) compiler
warnings about functions not being inlinable.
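A sketch of the intended shape, assuming GCC-style attributes (the in-tree
definition may differ in detail):

    #include <stddef.h>

    /*
     * For always-inlined functions defined in .c files: static linkage, plus
     * the always_inline attribute only when inlining is actually in effect.
     */
    #ifdef JEMALLOC_DEBUG
    #  define JEMALLOC_ALWAYS_INLINE_C static inline
    #else
    #  define JEMALLOC_ALWAYS_INLINE_C \
          static inline __attribute__((always_inline))
    #endif

    /* Hypothetical example function. */
    JEMALLOC_ALWAYS_INLINE_C size_t
    size_max_example(size_t a, size_t b)
    {
        return (a > b ? a : b);
    }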
Reported by Ricardo Nabinger Sanchez.
Avoid writing to uninitialized TLS as a side effect of deallocation.
Initializing TLS during deallocation is unsafe because it is possible
that a thread never did any allocation, and that TLS has already been
deallocated by the threads library, resulting in write-after-free
corruption. These fixes affect prof_tdata and quarantine; all other
uses of TLS are already safe, whether intentionally (as for tcache) or
unintentionally (as for arenas).
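A sketch of the resulting pattern on the deallocation path, with hypothetical
names; the point is to read the thread-local slot without ever initializing it
here:

    #include <stdlib.h>

    typedef struct quarantine_s quarantine_t;        /* opaque, for the sketch */
    static __thread quarantine_t *quarantine_tls;    /* hypothetical TLS slot */

    void
    quarantine_example(void *ptr)
    {
        /*
         * Never initialize TLS from the deallocation path: the thread may
         * never have allocated, and its TLS may already have been torn down
         * by the threads library.
         */
        quarantine_t *q = quarantine_tls;

        if (q == NULL) {
            /* Fall back to immediate deallocation (sketch). */
            free(ptr);
            return;
        }
        /* ... otherwise place ptr in this thread's quarantine ... */
    }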
Revert refactoring of opt_abort and opt_junk declarations. clang
accepts the config_*-based declarations (and generates correct code),
but gcc complains with:
error: initializer element is not constant
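A reduced example of the construct gcc rejects (assuming config_debug is a
static const bool, as in jemalloc's headers):

    #include <stdbool.h>

    static const bool config_debug = false;    /* normally set via configure */

    /*
     * clang folds the const and accepts this, but in C a 'static const'
     * object is not a constant expression, so gcc rejects this file-scope
     * initializer with "error: initializer element is not constant".
     */
    bool opt_abort = config_debug;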
This ensures POLA (principle of least astonishment) on FreeBSD (at least),
since free(3) is generally assumed not to fiddle with errno.
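A minimal sketch of the idea (hypothetical wrapper; the real deallocation path
is more involved):

    #include <errno.h>

    void
    free_example(void *ptr)
    {
        int saved_errno = errno;

        /* ... deallocation work that may clobber errno ... */
        (void)ptr;

        /* free(3) is generally assumed to leave errno alone. */
        errno = saved_errno;
    }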
Signed-off-by: Garrett Cooper <yanegomi@gmail.com>
Modify processing of the lg_chunk option so that it clips an
out-of-range input to the edge of the valid range. This makes it
possible to request the minimum possible chunk size without intimate
knowledge of allocator internals.
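For example, an application can now simply ask for the smallest chunk size and
let the allocator clip the value (sketch; assumes an unprefixed build where
the application-provided option symbol is named malloc_conf):

    /* An out-of-range request is clipped to the nearest valid lg_chunk. */
    const char *malloc_conf = "lg_chunk:0";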
Submitted by Ian Lepore (see FreeBSD PR bin/174641).
Add the "arenas.extend" mallctl, so that it is possible to create new
arenas that are outside the set that jemalloc automatically multiplexes
threads onto.
Add the ALLOCM_ARENA() flag for {,r,d}allocm(), so that it is possible
to explicitly allocate from a particular arena.
Add the "opt.dss" mallctl, which controls the default precedence of dss
allocation relative to mmap allocation.
Add the "arena.<i>.dss" mallctl, which makes it possible to set the
default dss precedence on a per arena or global basis.
Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge".
Add the "stats.arenas.<i>.dss" mallctl.
Fix mutex acquisition order inversion for the chunks rtree and the base
mutex. Chunks rtree acquisition was introduced by the previous commit,
so this bug was short-lived.
Add a library constructor for jemalloc that initializes the allocator.
This fixes a race that could occur if threads were created by the main
thread prior to any memory allocation, followed by fork(2), and then
memory allocation in the child process.
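A sketch of the constructor hook (GCC/clang attribute syntax; the real routine
invokes jemalloc's internal bootstrap function):

    #include <stdlib.h>

    /*
     * Runs at library load time, before main(), so allocator state (including
     * its mutexes) is initialized in the parent before any threads are
     * created and before any fork(2).
     */
    __attribute__((constructor))
    static void
    jemalloc_constructor_example(void)
    {
        /* Trigger initialization, e.g. via a harmless allocation. */
        free(malloc(1));
    }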
Fix the prefork/postfork functions to acquire/release the ctl, prof, and
rtree mutexes. This fixes various fork() child process deadlocks, but
one possible deadlock remains (intentionally) unaddressed: prof
backtracing can acquire runtime library mutexes, so deadlock is still
possible if heap profiling is enabled during fork(). This deadlock is
known to be a real issue in at least the case of libgcc-based
backtracing.
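The hooks follow the usual pthread_atfork() pattern; a sketch with stand-in
mutexes:

    #include <pthread.h>

    static pthread_mutex_t ctl_mtx_ex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t prof_mtx_ex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t rtree_mtx_ex = PTHREAD_MUTEX_INITIALIZER;

    /*
     * Acquire every allocator mutex before fork() so the child never inherits
     * a lock held by a thread that does not exist in the child.
     */
    static void
    prefork_example(void)
    {
        pthread_mutex_lock(&ctl_mtx_ex);
        pthread_mutex_lock(&prof_mtx_ex);
        pthread_mutex_lock(&rtree_mtx_ex);
    }

    static void
    postfork_example(void)
    {
        pthread_mutex_unlock(&rtree_mtx_ex);
        pthread_mutex_unlock(&prof_mtx_ex);
        pthread_mutex_unlock(&ctl_mtx_ex);
    }

    /* Registered once during initialization (the child side may instead
     * reinitialize its mutexes):
     *     pthread_atfork(prefork_example, postfork_example, postfork_example);
     */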
Reported by tfengjun.
These newly added macros will be used to implement the equivalent under
MSVC. Also, move the definitions to headers, where they make more sense and
where some of them (e.g. malloc) are even more useful.
Using errno on win32 doesn't quite work, because a value set inside a shared
library can't be read from, e.g., an executable that calls the function which
set it.
At the same time, since buferror() always reads errno/GetLastError() itself,
stop passing the error value to it.
Remove mmap_unaligned, which was used to heuristically decide whether to
optimistically call mmap() in such a way that could reduce the total
number of system calls. If I remember correctly, the intention of
mmap_unaligned was to avoid always executing the slow path in the
presence of ASLR. However, that reasoning seems to have been based on a
flawed understanding of how ASLR actually works. Although ASLR
apparently causes mmap() to ignore address requests, it does not cause
total placement randomness, so there is a reasonable expectation that
iterative mmap() calls will start returning chunk-aligned mappings once
the first chunk has been properly aligned.
Change the "opt.lg_prof_sample" default from 0 to 19 (1 B to 512 KiB).
Change the "opt.prof_accum" default from true to false.
Add the "opt.prof_final" mallctl, so that "opt.prof_prefix" need not be
abused to disable final profile dumping.
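A usage sketch (application-provided option string; assumes an unprefixed
build with profiling compiled in):

    /* Enable profiling and disable the final dump without abusing
     * "opt.prof_prefix". */
    const char *malloc_conf = "prof:true,prof_final:false";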
Add a configure test to determine whether common mmap()/munmap()
patterns cause VM map holes, and only use munmap() to discard unused
chunks if the problem does not exist.
Unify the chunk caching for mmap and dss.
Fix options processing to limit lg_chunk to be large enough that
redzones will always fit.
Always disable redzone by default, even when --enable-debug is
specified. The memory overhead for redzones can be substantial, which
makes this feature something that should only be opted into.
Normalize arena_palloc(), chunk_alloc_mmap_slow(), and
chunk_recycle_dss() to use the same algorithm for trimming
over-allocation.
Add the ALIGNMENT_ADDR2BASE(), ALIGNMENT_ADDR2OFFSET(), and
ALIGNMENT_CEILING() macros, and use them where appropriate.
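In spirit these are the standard power-of-two alignment helpers (sketch; the
in-tree definitions may differ in detail):

    #include <stdint.h>
    #include <stddef.h>

    /* alignment must be a power of two in all three macros. */
    #define ALIGNMENT_ADDR2BASE(a, alignment)                           \
        ((void *)((uintptr_t)(a) & ~((uintptr_t)(alignment) - 1)))
    #define ALIGNMENT_ADDR2OFFSET(a, alignment)                         \
        ((size_t)((uintptr_t)(a) & ((alignment) - 1)))
    #define ALIGNMENT_CEILING(s, alignment)                             \
        (((s) + ((alignment) - 1)) & ~((alignment) - 1))

    /* e.g. ALIGNMENT_CEILING(5000, 4096) == 8192; over-allocating by
     * (alignment - 1) and trimming to ALIGNMENT_CEILING(addr, alignment)
     * yields an aligned base, which is the shared trimming algorithm. */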
Remove the run_size_p parameter from sa2u().
Fix a potential deadlock in chunk_recycle_dss() that was introduced by
eae269036c (Add alignment support to
chunk_alloc()).
Implement Valgrind support, as well as the redzone and quarantine
features, which help Valgrind detect memory errors. Redzones are only
implemented for small objects because the changes necessary to support
redzones around large and huge objects are complicated by in-place
reallocation, to the point that it isn't clear that the maintenance
burden is worth the incremental improvement to Valgrind support.
Merge arena_salloc() and arena_salloc_demote().
Refactor i[v]salloc() to expose the 'demote' option.
s/PAGE_SHIFT/LG_PAGE/g and s/PAGE_SIZE/PAGE/g.
Remove remnants of the dynamic-page-shift code.
Rename the "arenas.pagesize" mallctl to "arenas.page".
Remove the "arenas.chunksize" mallctl, which is redundant with
"opt.lg_chunk".
This reverts commit 96d4120ac0.
ivsalloc() depends on chunks_rtree being initialized. This can be
worked around via a NULL pointer check. However,
thread_allocated_tsd_get() also depends on initialization having
occurred, and there is no way to guard its call in free() that is
cheaper than checking whether ptr is NULL.
Generalize isalloc() to handle NULL pointers in such a way that the NULL
checking overhead is only paid when introspecting huge allocations (or
NULL). This allows free() and malloc_usable_size() to no longer check
for NULL.
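A sketch of the resulting shape (hypothetical helpers and placeholder chunk
size; the key point is that the common, non-chunk-aligned path pays nothing
for NULL handling):

    #include <stddef.h>
    #include <stdint.h>

    #define EX_CHUNK_SIZE ((uintptr_t)1 << 22)    /* placeholder chunk size */

    /* Stand-ins for the real size lookups. */
    size_t arena_salloc_example(const void *ptr);
    size_t huge_salloc_example(const void *ptr);

    size_t
    isalloc_example(const void *ptr)
    {
        void *chunk = (void *)((uintptr_t)ptr & ~(EX_CHUNK_SIZE - 1));

        if (chunk != ptr)
            return (arena_salloc_example(ptr));    /* small/large: non-NULL */
        /* Chunk-aligned: either a huge allocation or NULL. */
        if (ptr == NULL)
            return (0);
        return (huge_salloc_example(ptr));
    }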
Submitted by Igor Bukanov and Mike Hommey.
Remove code that validates malloc_vsnprintf() and malloc_strtoumax()
against their namesakes. The validation code has served its purpose at this
point, and it isn't worth dealing with the different formatting of %p for
NULL pointers ("(nil)" vs. "0x0") in glibc versus other implementations.
Reported by Mike Hommey.
Check for NULL ptr in malloc_usable_size(), rather than just asserting
that ptr is non-NULL. This matches behavior of other implementations
(e.g., glibc and tcmalloc).
Use FreeBSD-specific functions (_pthread_mutex_init_calloc_cb(),
_malloc_{pre,post}fork()) to avoid bootstrapping issues due to
allocation in libc and libthr.
Add malloc_strtoumax() and use it instead of strtoul(). Disable
validation code in malloc_vsnprintf() and malloc_strtoumax() until
jemalloc is initialized. This is necessary because locale
initialization causes allocation for both vsnprintf() and strtoumax().
Force the lazy-lock feature on in order to avoid pthread_self(),
because it causes allocation.
Use syscall(SYS_write, ...) rather than write(...), because libthr wraps
write() and causes allocation. Without this workaround, it would not be
possible to print error messages in malloc_conf_init() without
substantially reworking bootstrapping.
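A sketch of the write workaround:

    #include <sys/syscall.h>
    #include <unistd.h>

    /*
     * Bypass libthr's write() wrapper (which can allocate) when emitting
     * messages during bootstrap.
     */
    static void
    bootstrap_write_example(const char *s, size_t len)
    {
        (void)syscall(SYS_write, STDERR_FILENO, s, len);
    }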
Fix choose_arena_hard() to look at how many threads are assigned to the
candidate choice, rather than checking whether the arena is
uninitialized. This bug potentially caused more arenas to be
initialized than necessary.
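A sketch of the corrected selection (assumes, as in jemalloc, that arenas[0]
is always initialized; locking and lazy arena creation omitted):

    typedef struct {
        unsigned nthreads;    /* threads currently assigned to this arena */
        /* ... */
    } arena_t;

    static arena_t *
    choose_arena_example(arena_t **arenas, unsigned narenas)
    {
        unsigned i, choose = 0;

        /* Prefer the initialized arena with the fewest assigned threads,
         * rather than keying the decision off of uninitialized slots. */
        for (i = 1; i < narenas; i++) {
            if (arenas[i] != NULL &&
                arenas[i]->nthreads < arenas[choose]->nthreads)
                choose = i;
        }
        arenas[choose]->nthreads++;
        return (arenas[choose]);
    }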