Commit Graph

754 Commits

Author SHA1 Message Date
Jason Evans
36195c8f4d Disable percpu_arena by default. 2017-05-23 15:32:50 -07:00
Qi Wang
eeefdf3ce8 Fix # of unpurged pages in decay algorithm.
When # of dirty pages move below npages_limit (e.g. they are reused), we should
not lower number of unpurged pages because that would cause the reused pages to
be double counted in the backlog (as a result, decay happen slower than it
should).  Instead, set number of unpurged to the greater of current npages and
npages_limit.

Added an assertion: the ceiling # of pages should be greater than npages_limit.
2017-05-23 13:48:30 -07:00
Qi Wang
0eae838b0d Check for background thread inactivity on extents_dalloc.
To avoid background threads sleeping forever with idle arenas, we eagerly check
background threads' sleep time after extents_dalloc, and signal the thread if
necessary.
2017-05-23 12:26:20 -07:00
Qi Wang
5f5ed2198e Add profiling for the background thread mutex. 2017-05-23 12:26:20 -07:00
Qi Wang
2bee0c6251 Add background thread related stats. 2017-05-23 12:26:20 -07:00
Qi Wang
b693c7868e Implementing opt.background_thread.
Added opt.background_thread to enable background threads, which handles purging
currently.  When enabled, decay ticks will not trigger purging (which will be
left to the background threads).  We limit the max number of threads to NCPUs.
When percpu arena is enabled, set CPU affinity for the background threads as
well.

The sleep interval of background threads is dynamic and determined by computing
number of pages to purge in the future (based on backlog).
2017-05-23 12:26:20 -07:00
David Goldblatt
3f685e8824 Protect the rtree/extent interactions with a mutex pool.
Instead of embedding a lock bit in rtree leaf elements, we associate extents
with a small set of mutexes.  This gets us two things:

- We can use the system mutexes.  This (hypothetically) protects us from
  priority inversion, and lets us stop doing a backoff/sleep loop, instead
  opting for precise wakeups from the mutex.
- Cuts down on the number of mutex acquisitions we have to do (from 4 in the
  worst case to two).

We end up simplifying most of the rtree code (which no longer has to deal with
locking or concurrency at all), at the cost of additional complexity in the
extent code: since the mutex protecting the rtree leaf elements is determined by
reading the extent out of those elements, the initial read is racy, so that we
may acquire an out of date mutex.  We re-check the extent in the leaf after
acquiring the mutex to protect us from this race.
2017-05-19 14:21:27 -07:00
David Goldblatt
26c792e61a Allow mutexes to take a lock ordering enum at construction.
This lets us specify whether and how mutexes of the same rank are allowed to be
acquired.  Currently, we only allow two polices (only a single mutex at a given
rank at a time, and mutexes acquired in ascending order), but we can plausibly
allow more (e.g. the "release uncontended mutexes before blocking").
2017-05-19 14:21:27 -07:00
Jason Evans
6e62c62862 Refactor *decay_time into *decay_ms.
Support millisecond resolution for decay times.  Among other use cases
this makes it possible to specify a short initial dirty-->muzzy decay
phase, followed by a longer muzzy-->clean decay phase.

This resolves #812.
2017-05-18 11:33:45 -07:00
Qi Wang
baf3e294e0 Add stats: arena uptime. 2017-05-18 10:04:28 -07:00
Jason Evans
18a83681cf Refactor (MALLOCX_ARENA_MAX + 1) to be MALLOCX_ARENA_LIMIT.
This resolves #673.
2017-05-14 10:14:23 -07:00
Jason Evans
909f0482e4 Automatically generate private symbol name mangling macros.
Rather than using a manually maintained list of internal symbols to
drive name mangling, add a compilation phase to automatically extract
the list of internal symbols.

This resolves #677.
2017-05-11 23:06:54 -07:00
Jason Evans
a4ae9707da Remove unused private_unnamespace infrastructure. 2017-05-11 23:06:54 -07:00
Jason Evans
a268af5085 Stop depending on JEMALLOC_N() for function interception during testing.
Instead, always define function pointers for interceptable functions,
but mark them const unless testing, so that the compiler can optimize
out the pointer dereferences.
2017-05-11 23:06:54 -07:00
Jason Evans
81ef365622 Avoid compiler warnings on Windows. 2017-05-11 18:06:20 -07:00
Jason Evans
11d2f39d96 Remove mutex_prof_data_t redeclaration.
Redeclaration causes compilations failures with e.g. gcc 4.2.1 on
FreeBSD.  This regression was introduced by
89e2d3c12b (Header refactoring: ctl -
unify and remove from catchall.).
2017-05-11 10:49:43 -07:00
Jason Evans
0798fe6e70 Fix rtree_leaf_elm_szind_slab_update().
Re-read the leaf element when atomic CAS fails due to a race with
another thread that has locked the leaf element, since
atomic_compare_exchange_strong_p() overwrites the expected value with
the actual value on failure.  This regression was introduced by
0ee0e0c155 (Implement compact rtree leaf
element representation.).

This resolves #798.
2017-05-03 08:52:33 -07:00
Jason Evans
344dd342dd rtree_leaf_elm_extent_write() --> rtree_leaf_elm_extent_lock_write()
Refactor rtree_leaf_elm_extent_write() as
rtree_leaf_elm_extent_lock_write(), so that whether the leaf element is
currently acquired is separate from what lock state to write.  This
allows for a relaxed atomic read when releasing the lock.
2017-05-03 08:52:33 -07:00
Qi Wang
fc1aaf13fe Revert "Use trylock in tcache_bin_flush when possible."
This reverts commit 8584adc451.  Production
results not favorable.  Will investigate separately.
2017-05-01 14:49:42 -07:00
David Goldblatt
209f2926b8 Header refactoring: tsd - cleanup and dependency breaking.
This removes the tsd macros (which are used only for tsd_t in real builds).  We
break up the circular dependencies involving tsd.

We also move all tsd access through getters and setters.  This allows us to
assert that we only touch data when tsd is in a valid state.

We simplify the usages of the x macro trick, removing all the customizability
(get/set, init, cleanup), moving the lifetime logic to tsd_init and tsd_cleanup.
This lets us make initialization order independent of order within tsd_t.
2017-05-01 10:49:56 -07:00
Jason Evans
c86c8f4ffb Add extent_destroy_t and use it during arena destruction.
Add the extent_destroy_t extent destruction hook to extent_hooks_t, and
use it during arena destruction.  This hook explicitly communicates to
the callee that the extent must be destroyed or tracked for later reuse,
lest it be permanently leaked.  Prior to this change, retained extents
could unintentionally be leaked if extent retention was enabled.

This resolves #560.
2017-04-29 09:24:12 -07:00
Jason Evans
b9ab04a191 Refactor !opt.munmap to opt.retain. 2017-04-29 09:24:12 -07:00
Qi Wang
d901a37775 Revert "Use try_flush first in tcache_dalloc."
This reverts commit b0c2a28280.  Production
benchmark shows this caused significant regression in both CPU and memory
consumption.  Will investigate separately later on.
2017-04-28 10:59:04 -07:00
Qi Wang
b0c2a28280 Use try_flush first in tcache_dalloc.
Only do must_flush if try_flush didn't manage to free anything.
2017-04-25 17:21:33 -07:00
Qi Wang
8584adc451 Use trylock in tcache_bin_flush when possible.
During tcache gc, use tcache_bin_try_flush_small / _large so that we can skip
items with their bins locked already.
2017-04-25 17:21:33 -07:00
Qi Wang
05775a3736 Avoid prof_dump during reentrancy. 2017-04-25 12:54:36 -07:00
David Goldblatt
268843ac68 Header refactoring: pages.h - unify and remove from catchall. 2017-04-25 09:51:38 -07:00
David Goldblatt
dab4beb277 Header refactoring: hash - unify and remove from catchall. 2017-04-25 09:51:38 -07:00
David Goldblatt
89e2d3c12b Header refactoring: ctl - unify and remove from catchall.
In order to do this, we introduce the mutex_prof module, which breaks a circular
dependency between ctl and prof.
2017-04-25 09:51:38 -07:00
Jason Evans
c67c3e4a63 Replace --disable-munmap with opt.munmap.
Control use of munmap(2) via a run-time option rather than a
compile-time option (with the same per platform default).  The old
behavior of --disable-munmap can be achieved with
--with-malloc-conf=munmap:false.

This partially resolves #580.
2017-04-24 20:37:16 -07:00
Jason Evans
e2cc6280ed Remove --enable-code-coverage.
This option hasn't been particularly useful since the original pre-3.0.0
push to broaden test coverage.

This partially resolves #580.
2017-04-24 16:33:04 -07:00
Jason Evans
0f63396b23 Remove --disable-cc-silence.
The explicit compiler warning suppression controlled by this option is
universally desirable, so remove the ability to disable suppression.

This partially resolves #580.
2017-04-24 15:02:45 -07:00
Qi Wang
f970c497dc Implement malloc_mutex_trylock() w/ proper stats update. 2017-04-24 13:23:55 -07:00
Jason Evans
af76f0e5d2 Remove --with-lg-tiny-min.
This option isn't useful in practice.

This partially resolves #580.
2017-04-24 11:48:28 -07:00
David Goldblatt
120c7a747f Header refactoring: bitmap - unify and remove from catchall. 2017-04-24 10:33:21 -07:00
David Goldblatt
d6b5c7e0f6 Header refactoring: stats - unify and remove from catchall 2017-04-24 10:33:21 -07:00
David Goldblatt
36abf78aa9 Header refactoring: move smoothstep.h out of the catchall. 2017-04-24 10:33:21 -07:00
David Goldblatt
31b43219db Header refactoring: size_classes module - remove from the catchall 2017-04-24 10:33:21 -07:00
David Goldblatt
68da2361d2 Header refactoring: ckh module - remove from the catchall and unify. 2017-04-24 10:33:21 -07:00
David Goldblatt
bf2dc7e678 Header refactoring: ticker module - remove from the catchall and unify. 2017-04-24 10:33:21 -07:00
David Goldblatt
fa3ad730c4 Header refactoring: prng module - remove from the catchall and unify. 2017-04-24 10:33:21 -07:00
David Goldblatt
4d2e4bf5eb Get rid of most of the various inline macros. 2017-04-24 10:33:21 -07:00
David Goldblatt
425253e2cd Enable -Wundef, when supported.
This can catch bugs in which one header defines a numeric constant, and another
uses it without including the defining header. Undefined preprocessor symbols
expand to '0', so that this will compile fine, silently doing the math wrong.
2017-04-21 17:03:56 -07:00
Jason Evans
3823effe12 Remove --enable-ivsalloc.
Continue to use ivsalloc() when --enable-debug is specified (and add
assertions to guard against 0 size), but stop providing a documented
explicit semantics-changing band-aid to dodge undefined behavior in
sallocx() and malloc_usable_size().  ivsalloc() remains compiled in,
unlike when #211 restored --enable-ivsalloc, and if
JEMALLOC_FORCE_IVSALLOC is defined during compilation, sallocx() and
malloc_usable_size() will still use ivsalloc().

This partially resolves #580.
2017-04-21 14:34:35 -07:00
Jim Chen
ae248a2160 Use openat syscall if available
Some architectures like AArch64 may not have the open syscall because it
was superseded by the openat syscall, so check and use SYS_openat if
SYS_open is not available.

Additionally, Android headers for AArch64 define SYS_open to __NR_open,
even though __NR_open is undefined. Undefine SYS_open in that case so
SYS_openat is used.
2017-04-21 10:58:42 -07:00
Jason Evans
4403c9ab44 Remove --disable-tcache.
Simplify configuration by removing the --disable-tcache option, but
replace the testing for that configuration with
--with-malloc-conf=tcache:false.

Fix the thread.arena and thread.tcache.flush mallctls to work correctly
if tcache is disabled.

This partially resolves #580.
2017-04-21 10:06:12 -07:00
Qi Wang
5aa46f027d Bypass extent tracking for auto arenas.
Tracking extents is required by arena_reset.  To support this, the extent
linkage was used for tracking 1) large allocations, and 2) full slabs.  However
modifying the extent linkage could be an expensive operation as it likely incurs
cache misses.  Since we forbid arena_reset on auto arenas, let's bypass the
linkage operations for auto arenas.
2017-04-21 00:29:18 -07:00
Jason Evans
da4cff0279 Support --with-lg-page values larger than system page size.
All mappings continue to be PAGE-aligned, even if the system page size
is smaller.  This change is primarily intended to provide a mechanism
for supporting multiple page sizes with the same binary; smaller page
sizes work better in conjunction with jemalloc's design.

This resolves #467.
2017-04-18 19:01:04 -07:00
Jason Evans
45f087eb03 Revert "Remove BITMAP_USE_TREE."
Some systems use a native 64 KiB page size, which means that the bitmap
for the smallest size class can be 8192 bits, not just 512 bits as when
the page size is 4 KiB.  Linear search in bitmap_{sfu,ffu}() is
unacceptably slow for such large bitmaps.

This reverts commit 7c00f04ff4.
2017-04-18 19:01:04 -07:00
David Goldblatt
38e847c1c5 Header refactoring: unify spin.h and move it out of the catch-all. 2017-04-18 18:35:03 -07:00