Commit Graph

80 Commits

Author SHA1 Message Date
Hao Liu
1541ffc765 configure: add --with-lg-slab-maxregs configure option.
Specify the maximum number of regions in a slab, which is
(<lg-page> - <lg-tiny-min>) by default. This increases the limit of slab sizes
specified by "slab_sizes" in malloc_conf. This should never be less than
the default value. The max value of this option is related to LG_BITMAP_MAXBITS
(see more in bitmap.h).

For example, on a 4k page size system, if we:
  1) configure jemalloc with with --with-lg-slab-maxregs=12.
  2) export MALLOC_CONF="slab_sizes:9-16:4"
The slab size of 16 bytes is set to 4 pages. Previously, the default
lg-slab-maxregs is 9 (i.e. 12 - 3). The max slab size of 16 bytes is 2 pages
(i.e. (1<<9) * 16 bytes). By increasing the value from 9 to 12, the max slab
size can be set by MALLOC_CONF is 16 pages (i.e. (1<<12) * 16 bytes).
2020-09-16 13:58:38 -07:00
David Goldblatt
eaed1e39be Add sized-delete size-checking functionality.
The existing checks are good at finding such issues (on tcache flush), but not
so good at pinpointing them.  Debug mode can find them, but sometimes debug mode
slows down a program so much that hard-to-hit bugs can take a long time to
crash.

This commit adds functionality to keep programs mostly on their fast paths,
while also checking every sized delete argument they get.
2020-08-05 19:34:05 -07:00
David Carlier
00f06c9beb enabling mpss on solaris/illumos.
reusing slighty linux configuration as possible, aligning the
 address range to HUGEPAGE.
2020-07-06 09:59:10 -07:00
Jon Haslam
4aea743279 High Resolution Timestamps for Profiling 2020-06-15 12:12:49 -07:00
David Goldblatt
f4d24f05e1 Move extra size checks behind a config flag.
This will let us turn that flag into a generic "turn on runtime checks" flag
that guards other functionality we have planned.
2019-04-15 16:48:12 -07:00
Qi Wang
f6c30cbafa Remove some unused comments. 2019-03-14 17:34:55 -07:00
Qi Wang
06f0850427 Detect if 8-bit atomics are available.
In some rare cases (older compiler, e.g. gcc 4.2 w/ MIPS), 8-bit atomics might
be unavailable.  Detect such cases so that we can workaround.
2019-03-09 12:52:06 -08:00
Jason Evans
775fe302a7 Remove JE_FORCE_SYNC_COMPARE_AND_SWAP_[48].
These macros have been unused since
d4ac7582f3 (Introduce a backport of C11
atomics).
2019-02-22 14:22:16 -08:00
Qi Wang
e13400c919 Sanity check szind on tcache flush.
This adds some overhead to the tcache flush path (which is one of the
popular paths).  Guard it behind a config option.
2019-02-01 12:31:34 -08:00
Qi Wang
43f3b1ad0c Deprecate OSSpinLock. 2018-11-14 08:44:05 -08:00
Dave Watson
13c237c7ef Add a fastpath for arena_slab_reg_alloc_batch
Also adds a configure.ac check for __builtin_popcount, which is used
in the new fastpath.
2018-11-14 07:09:11 -08:00
gnzlbg
08260a6b94 Add experimental API: smallocx_return_t smallocx(size, flags)
---

Motivation:

This new experimental memory-allocaction API returns a pointer to
the allocation as well as the usable size of the allocated memory
region.

The `s` in `smallocx` stands for `sized`-`mallocx`, attempting to
convey that this API returns the size of the allocated memory region.

It should allow C++ P0901r0 [0] and Rust Alloc::alloc_excess to make
use of it.

The main purpose of these APIs is to improve telemetry. It is more accurate
to register `smallocx(size, flags)` than `smallocx(nallocx(size), flags)`,
for example. The latter will always line up perfectly with the existing
size classes, causing a loss of telemetry information about the internal
fragmentation induced by potentially poor size-classes choices.

Instrumenting `nallocx` does not help much since user code can cache its
result and use it repeatedly.

---

Implementation:

The implementation adds a new `usize` option to `static_opts_s` and an `usize`
variable to `dynamic_opts_s`. These are then used to cache the result of
`sz_index2size` and similar functions in the code paths in which they are
unconditionally invoked. In the code-paths in which these functions are not
unconditionally invoked, `smallocx` calls, as opposed to `mallocx`, these
functions explicitly.

---

[0]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0901r0.html
2018-10-17 07:12:28 -07:00
David Goldblatt
e8ec9528ab Allow the use of readlinkat over readlink.
This can be useful in situations where readlink is disallowed.
2018-08-03 14:04:32 -07:00
Christopher Ferris
f78d4ca3fb Modify configure to determine return value of strerror_r.
On glibc and Android's bionic, strerror_r returns char* when
_GNU_SOURCE is defined.

Add a configure check for this rather than assume glibc is the
only libc that behaves this way.
2018-01-10 21:01:18 -08:00
David Goldblatt
ccd09050aa Add configure-time detection for madvise(..., MADV_DO[NT]DUMP) 2017-10-16 15:35:49 -07:00
Qi Wang
31ab38be5f Define MADV_FREE on our own when needed.
On x86 Linux, we define our own MADV_FREE if madvise(2) is available, but no
MADV_FREE is detected.  This allows the feature to be built in and enabled with
runtime detection.
2017-10-11 15:49:22 -07:00
David Goldblatt
1245faae90 Power: disable the CPU_SPINWAIT macro.
Quoting from https://github.com/jemalloc/jemalloc/issues/761 :

[...] reading the Power ISA documentation[1], the assembly in [the CPU_SPINWAIT
macro] isn't correct anyway (as @marxin points out): the setting of the
program-priority register is "sticky", and we never undo the lowering.

We could do something similar, but given that we don't have testing here in the
first place, I'm inclined to simply not try. I'll put something up reverting the
problematic commit tomorrow.

[1] Book II, chapter 3 of the 2.07B or 3.0B ISA documents.
2017-10-04 18:37:23 -07:00
Qi Wang
8fdd9a5797 Implement opt.metadata_thp
This option enables transparent huge page for base allocators (require
MADV_HUGEPAGE support).
2017-08-11 14:51:20 -07:00
David T. Goldblatt
9761b449c8 Add a logging facility.
This sets up a hierarchical logging facility, so that we can add logging
statements liberally, and turn them on in a fine-grained manner.
2017-07-20 17:58:37 -07:00
Qi Wang
a3f4977217 Add thread name for background threads. 2017-06-23 10:54:54 -07:00
Jason Evans
13685ab1b7 Normalize background thread configuration.
Also fix a compilation error #ifndef JEMALLOC_PTHREAD_CREATE_WRAPPER.
2017-06-08 23:01:26 -07:00
Jason Evans
c606a87d2a Add the --disable-thp option to support cross compiling.
This resolves #669.
2017-05-30 11:30:54 -07:00
Qi Wang
b693c7868e Implementing opt.background_thread.
Added opt.background_thread to enable background threads, which handles purging
currently.  When enabled, decay ticks will not trigger purging (which will be
left to the background threads).  We limit the max number of threads to NCPUs.
When percpu arena is enabled, set CPU affinity for the background threads as
well.

The sleep interval of background threads is dynamic and determined by computing
number of pages to purge in the future (based on backlog).
2017-05-23 12:26:20 -07:00
Jason Evans
909f0482e4 Automatically generate private symbol name mangling macros.
Rather than using a manually maintained list of internal symbols to
drive name mangling, add a compilation phase to automatically extract
the list of internal symbols.

This resolves #677.
2017-05-11 23:06:54 -07:00
Jason Evans
b9ab04a191 Refactor !opt.munmap to opt.retain. 2017-04-29 09:24:12 -07:00
Jason Evans
c67c3e4a63 Replace --disable-munmap with opt.munmap.
Control use of munmap(2) via a run-time option rather than a
compile-time option (with the same per platform default).  The old
behavior of --disable-munmap can be achieved with
--with-malloc-conf=munmap:false.

This partially resolves #580.
2017-04-24 20:37:16 -07:00
Jason Evans
e2cc6280ed Remove --enable-code-coverage.
This option hasn't been particularly useful since the original pre-3.0.0
push to broaden test coverage.

This partially resolves #580.
2017-04-24 16:33:04 -07:00
Jason Evans
0f63396b23 Remove --disable-cc-silence.
The explicit compiler warning suppression controlled by this option is
universally desirable, so remove the ability to disable suppression.

This partially resolves #580.
2017-04-24 15:02:45 -07:00
Jason Evans
af76f0e5d2 Remove --with-lg-tiny-min.
This option isn't useful in practice.

This partially resolves #580.
2017-04-24 11:48:28 -07:00
David Goldblatt
425253e2cd Enable -Wundef, when supported.
This can catch bugs in which one header defines a numeric constant, and another
uses it without including the defining header. Undefined preprocessor symbols
expand to '0', so that this will compile fine, silently doing the math wrong.
2017-04-21 17:03:56 -07:00
Jason Evans
3823effe12 Remove --enable-ivsalloc.
Continue to use ivsalloc() when --enable-debug is specified (and add
assertions to guard against 0 size), but stop providing a documented
explicit semantics-changing band-aid to dodge undefined behavior in
sallocx() and malloc_usable_size().  ivsalloc() remains compiled in,
unlike when #211 restored --enable-ivsalloc, and if
JEMALLOC_FORCE_IVSALLOC is defined during compilation, sallocx() and
malloc_usable_size() will still use ivsalloc().

This partially resolves #580.
2017-04-21 14:34:35 -07:00
Jason Evans
4403c9ab44 Remove --disable-tcache.
Simplify configuration by removing the --disable-tcache option, but
replace the testing for that configuration with
--with-malloc-conf=tcache:false.

Fix the thread.arena and thread.tcache.flush mallctls to work correctly
if tcache is disabled.

This partially resolves #580.
2017-04-21 10:06:12 -07:00
Jason Evans
7cbcd2e2b7 Fix pages_purge_forced() to discard pages on non-Linux systems.
madvise(..., MADV_DONTNEED) only causes demand-zeroing on Linux, so fall
back to overlaying a new mapping.
2017-03-13 18:19:57 -07:00
Qi Wang
ec532e2c5c Implement per-CPU arena.
The new feature, opt.percpu_arena, determines thread-arena association
dynamically based CPU id. Three modes are supported: "percpu", "phycpu"
and disabled.

"percpu" uses the current core id (with help from sched_getcpu())
directly as the arena index, while "phycpu" will assign threads on the
same physical CPU to the same arena. In other words, "percpu" means # of
arenas == # of CPUs, while "phycpu" has # of arenas == 1/2 * (# of
CPUs). Note that no runtime check on whether hyper threading is enabled
is added yet.

When enabled, threads will be migrated between arenas when a CPU change
is detected. In the current design, to reduce overhead from reading CPU
id, each arena tracks the thread accessed most recently. When a new
thread comes in, we will read CPU id and update arena if necessary.
2017-03-08 23:19:01 -08:00
David Goldblatt
d4ac7582f3 Introduce a backport of C11 atomics
This introduces a backport of C11 atomics.  It has four implementations; ranked
in order of preference, they are:
- GCC/Clang __atomic builtins
- GCC/Clang __sync builtins
- MSVC _Interlocked builtins
- C11 atomics, from <stdatomic.h>

The primary advantages are:
- Close adherence to the standard API gives us a defined memory model.
- Type safety: atomic objects are now separate types from non-atomic ones, so
  that it's impossible to mix up atomic and non-atomic updates (which is
  undefined behavior that compilers are starting to take advantage of).
- Efficiency: we can specify ordering for operations, avoiding fences and
  atomic operations on strongly ordered architectures (example:
  `atomic_write_u32(ptr, val);` involves a CAS loop, whereas
  `atomic_store(ptr, val, ATOMIC_RELEASE);` is a plain store.

This diff leaves in the current atomics API (implementing them in terms of the
backport).  This lets us transition uses over piecemeal.

Testing:
This is by nature hard to test. I've manually tested the first three options on
Linux on gcc by futzing with the #defines manually, on freebsd with gcc and
clang, on MSVC, and on OS X with clang.  All of these were x86 machines though,
and we don't have any test infrastructure set up for non-x86 platforms.
2017-03-03 13:40:59 -08:00
Jason Evans
f5cf9b19c8 Determine rtree levels at compile time.
Rather than dynamically building a table to aid per level computations,
define a constant table at compile time.  Omit both high and low
insignificant bits.  Use one to three tree levels, depending on the
number of significant bits.
2017-02-08 18:50:03 -08:00
Jason Evans
c0cc5db871 Replace tabs following #define with spaces.
This resolves #564.
2017-01-20 21:45:53 -08:00
Mike Hommey
0f7376eb62 Don't rely on OSX SDK malloc/malloc.h for malloc_zone struct definitions
The SDK jemalloc is built against might be not be the latest for various
reasons, but the resulting binary ought to work on newer versions of
OSX.

In order to ensure this, we need the fullest definitions possible, so
copy what we need from the latest version of malloc/malloc.h available
on opensource.apple.com.
2017-01-17 20:13:28 -08:00
Jason Evans
c1baa0a9b7 Add huge page configuration and pages_[no}huge().
Add the --with-lg-hugepage configure option, but automatically configure
LG_HUGEPAGE even if it isn't specified.

Add the pages_[no]huge() functions, which toggle huge page state via
madvise(..., MADV_[NO]HUGEPAGE) calls.
2016-12-26 17:59:34 -08:00
Jason Evans
acb7b1f53e Add --disable-syscall.
This resolves #517.
2016-12-03 16:50:58 -08:00
Jason Evans
5234be2133 Add pthread_atfork(3) feature test.
Some versions of Android provide a pthreads library without providing
pthread_atfork(), so in practice a separate feature test is necessary
for the latter.
2016-11-17 15:14:57 -08:00
Jason Evans
a64123ce13 Refactor madvise(2) configuration.
Add feature tests for the MADV_FREE and MADV_DONTNEED flags to
madvise(2), so that MADV_FREE is detected and used for Linux kernel
versions 4.5 and newer.  Refactor pages_purge() so that on systems which
support both flags, MADV_FREE is preferred over MADV_DONTNEED.

This resolves #387.
2016-11-17 10:31:57 -08:00
Jason Evans
d82f2b3473 Do not use syscall(2) on OS X 10.12 (deprecated). 2016-11-02 19:18:33 -07:00
Jason Evans
795f6689de Add os_unfair_lock support.
OS X 10.12 deprecated OSSpinLock; os_unfair_lock is the recommended
replacement.
2016-11-02 18:09:45 -07:00
Jason Evans
6c80321aed Use CLOCK_MONOTONIC_COARSE rather than COARSE_MONOTONIC_RAW.
The raw clock variant is slow (even relative to plain CLOCK_MONOTONIC),
whereas the coarse clock variant is faster than CLOCK_MONOTONIC, but
still has resolution (~1ms) that is adequate for our purposes.

This resolves #479.
2016-10-29 22:58:18 -07:00
Jason Evans
e0164bc63c Refine nstime_update().
Add missing #include <time.h>.  The critical time facilities appear to
have been transitively included via unistd.h and sys/time.h, but in
principle this omission was capable of having caused
clock_gettime(CLOCK_MONOTONIC, ...) to have been overlooked in favor of
gettimeofday(), which in turn could cause spurious non-monotonic time
updates.

Refactor nstime_get() out of nstime_update() and add configure tests for
all variants.

Add CLOCK_MONOTONIC_RAW support (Linux-specific) and
mach_absolute_time() support (OS X-specific).

Do not fall back to clock_gettime(CLOCK_REALTIME, ...).  This was a
fragile Linux-specific workaround, which we're unlikely to use at all
now that clock_gettime(CLOCK_MONOTONIC_RAW, ...) is supported, and if we
have no choice besides non-monotonic clocks, gettimeofday() is only
incrementally worse.
2016-10-10 10:33:59 -07:00
Elliot Ronaghan
1167e9eff3 Check for __builtin_unreachable at configure time
Add a configure check for __builtin_unreachable instead of basing its
availability on the __GNUC__ version. On OS X using gcc (a real gcc, not the
bundled version that's just a gcc front-end) leads to a linker assertion:

    https://github.com/jemalloc/jemalloc/issues/266

It turns out that this is caused by a gcc bug resulting from the use of
__builtin_unreachable():

    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57438

To work around this bug, check that __builtin_unreachable() actually works at
configure time, and if it doesn't use abort() instead. The check is based on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57438#c21.

With this `make check` passes with a homebrew installed gcc-5 and gcc-6.
2016-07-07 13:28:44 -07:00
Jason Evans
c8c3cbdf47 Miscellaneous s/chunk/extent/ updates. 2016-06-05 20:42:24 -07:00
Jason Evans
03eea4fb8b Better document --enable-ivsalloc. 2016-06-05 20:42:24 -07:00
Jason Evans
17c021c177 Remove redzone support.
This resolves #369.
2016-05-13 10:27:33 -07:00