Commit Graph

1475 Commits

Author SHA1 Message Date
Qi Wang
719583f14a Fix large.nflushes in the merged stats. 2019-08-28 23:37:00 -07:00
Yinan Zhang
adce29c885 Optimize for prof_active off
Move the handling of `prof_active` off case completely to slow path,
so as to reduce register pressure on malloc fast path.
2019-08-27 14:48:56 -07:00
Yinan Zhang
49e6fbce78 Always adjust thread_(de)allocated 2019-08-26 11:56:41 -07:00
Yinan Zhang
57b81c078e Pull thread_(de)allocated out of config_stats 2019-08-26 11:56:41 -07:00
Yinan Zhang
9e031c1d11 Bug fix for prof_active switch
The bug is subtle but critical: if application performs the following
three actions in sequence: (a) turn `prof_active` off, (b) make at
least one allocation that triggers the malloc slow path via the
`if (unlikely(bytes_until_sample < 0))` path, and (c) turn
`prof_active` back on, then the application would never get another
sample (until a very very long time later).

The fix is to properly reset `bytes_until_sample` rather than
throwing it all the way to `SSIZE_MAX`.

A side minor change is to call `prof_active_get_unlocked()` rather
than directly grabbing the `prof_active` variable - it is the very
reason why we defined the `prof_active_get_unlocked()` function.
2019-08-22 13:00:10 -07:00
Qi Wang
0043e68d4c Track low_water == -1 case explicitly.
The -1 value of low_water indicates if the cache has been depleted and
refilled.  Track the status explicitly in the tcache struct.

This allows the fast path to check if (cur_ptr > low_water), instead of >=,
which avoids reaching slow path when the last item is allocated.
2019-08-21 16:00:38 -07:00
Qi Wang
937ca1db9f Store ncached_max * ptr_size in tcache_bin_info.
With the cache bin metadata switched to pointers, ncached_max is usually
accessed and timed by sizeof(ptr). Store the results in tcache_bin_info for
direct access, and add a helper function for the ncached_max value.
2019-08-19 12:23:24 -07:00
Qi Wang
7599c82d48 Redesign the cache bin metadata for fast path.
Implement the pointer-based metadata for tcache bins --
- 3 pointers are maintained to represent each bin;
- 2 of the pointers are compressed on 64-bit;
- is_full / is_empty done through pointer comparison;

Comparing to the previous counter based design --
- fast-path speed up ~15% in benchmarks
- direct pointer comparison and de-reference
- no need to access tcache_bin_info in common case
2019-08-19 12:21:44 -07:00
Qi Wang
9c5c2a2c86 Unify the signature of tcache_flush small and large. 2019-08-14 13:08:23 -07:00
Yinan Zhang
28ed9b9a51 Buffer stats printing
Without buffering `malloc_stats_print` would invoke the write back
call (which could mean an expensive `malloc_write_fd` call) for every
single `printf` (including printing each line break and each leading
tab/space for indentation).
2019-08-13 09:40:11 -07:00
Yinan Zhang
eb70fef8ca Make compact json format as default
Saves 20-50% of the output size.
2019-08-12 13:59:50 -07:00
Yinan Zhang
a219cfcda3 Clear tcache prof_accumbytes in tcache_flush_cache
`tcache->prof_accumbytes` should always be cleared after being
transferred to arena; otherwise the allocations would be double
counted, leading to excessive prof dumps.
2019-08-12 09:08:09 -07:00
Yinan Zhang
ad3f7dbfa0 Buffer prof_log_stop
Make use of the new buffered writer for the output of `prof_log_stop`.
2019-08-12 09:06:01 -07:00
Qi Wang
5934846612 Fix large bin index accessed through cache bin descriptor. 2019-08-11 16:31:12 -07:00
Qi Wang
22746d3c9f Properly dalloc prof nodes with idalloctm.
The prof_alloc_node is allocated through ialloc as internal.  Switch to
idalloctm with tcache and is_internal properly set.
2019-08-09 10:29:49 -07:00
Yinan Zhang
7fc6b1b259 Add buffered writer
The buffered writer adopts a signature identical to `write_cb`,
so that it can be plugged into anywhere `write_cb` appears.
2019-08-09 09:44:29 -07:00
Yinan Zhang
39343555d6 Report stats for tdatas_mtx and prof_dump_mtx 2019-08-09 09:24:16 -07:00
Qi Wang
87e2400cbb Fix tcaches mutex pre- / post-fork handling. 2019-08-08 10:55:32 -07:00
Yinan Zhang
07ce2434bf Refactor profiling
Refactored core profiling codebase into two logical parts:

(a) `prof_data.c`: core internal data structure managing & dumping;
(b) `prof.c`: mutexes & outward-facing APIs.

Some internal functions had to be exposed out, but there are not
that many of them if the modularization is (hopefully) clean enough.
2019-08-07 19:48:28 -07:00
Yinan Zhang
56126d0d2d Refactor prof log
Prof logging is conceptually seperate from core profiling, so
split it out as a module of its own.  There are a few internal
functions that had to be exposed but I think it is a fair trade-off.
2019-08-07 13:53:45 -07:00
Qi Wang
8a94ac25d5 Sanity check on prof dump buffer size. 2019-08-01 17:55:45 -07:00
Yinan Zhang
82b8aaaeb6 Quick fix for prof log printing
The emitter APIs used were incorrect, a side effect of which was
extra lines being printed.
2019-07-30 19:31:28 -07:00
Qi Wang
c9cdc1b27f Limit to exact fit on Windows with retain off.
W/o retain, split and merge are disallowed on Windows.  Avoid doing first-fit
which needs splitting almost always.  Instead, try exact fit only and bail out
early.
2019-07-29 16:19:36 -07:00
Qi Wang
5742473cc8 Revert "Refactor prof log"
This reverts commit 7618b0b8e4.
2019-07-29 14:10:15 -07:00
Qi Wang
1a0503367b Revert "Refactor profiling"
This reverts commit 0b462407ae.
2019-07-29 14:10:15 -07:00
Yinan Zhang
0b462407ae Refactor profiling
Refactored core profiling codebase into two logical parts:

(a) `prof_data.c`: core internal data structure managing & dumping;
(b) `prof.c`: mutexes & outward-facing APIs.

Some internal functions had to be exposed out, but there are not
that many of them if the modularization is (hopefully) clean enough.
2019-07-29 13:55:00 -07:00
Yinan Zhang
7618b0b8e4 Refactor prof log
`prof.c` is growing too long, so trying to modularize it.  There are
a few internal functions that had to be exposed but I think it is a
fair trade-off.
2019-07-29 13:55:00 -07:00
Qi Wang
85f0cb2d0c Add indent to individual options for confirm_conf. 2019-07-25 17:00:31 -07:00
Qi Wang
bc0998a905 Invoke arena_dalloc_promoted() properly w/o tcache.
When tcache was disabled, the dalloc promoted case was missing.
2019-07-24 18:30:54 -07:00
Qi Wang
1d148f353a Optimize max_active_fit in first_fit.
Stop scanning once reached the first max_active_fit size.
2019-07-24 11:28:45 -07:00
Qi Wang
4e36ce34c1 Track the leaked VM space via the abandoned_vm counter.
The counter is 0 unless metadata allocation failed (indicates OOM), and is
mainly for sanity checking.
2019-07-24 11:24:22 -07:00
Qi Wang
42807fcd9e extent_dalloc instead of leak when register fails.
extent_register may only fail if the underlying extent and region got stolen /
coalesced before we lock.  Avoid doing extent_leak (which purges the region)
since we don't really own the region.
2019-07-23 22:34:45 -07:00
Qi Wang
57dbab5d6b Avoid leaking extents / VM when split is not supported.
This can only happen on Windows and with opt.retain disabled (which isn't the
default).  The solution is suboptimal, however not a common case as retain is
the long term plan for all platforms anyway.
2019-07-23 22:18:55 -07:00
Qi Wang
9a86c65abc Implement retain on Windows.
The VirtualAlloc and VirtualFree APIs are different because MEM_DECOMMIT cannot
be used across multiple VirtualAlloc regions.  To properly support decommit,
only allow merge / split within the same region -- this is done by tracking the
"is_head" state of extents and not merging cross-region.

Add a new state is_head (only relevant for retain && !maps_coalesce), which is
true for the first extent in each VirtualAlloc region.  Determine if two extents
can be merged based on the head state, and use serial numbers for sanity checks.
2019-07-23 22:18:55 -07:00
Qi Wang
f32f23d6cc Fix posix_memalign with input size 0.
Return a valid pointer instead of failed assertion.
2019-07-18 00:43:23 -07:00
Yinan Zhang
e0a0c8d4bf Fix a bug in prof_dump_write
The original logic can be disastrous if `PROF_DUMP_BUFSIZE` is less
than `slen` -- `prof_dump_buf_end + slen <= PROF_DUMP_BUFSIZE` would
always be `false`, so `memcpy` would always try to copy
`PROF_DUMP_BUFSIZE - prof_dump_buf_end` chars, which can be
dangerous: in the last round of the `while` loop it would not only
illegally read the memory beyond `s` (which might not always be
disastrous), but it would also illegally overwrite the memory beyond
`prof_dump_buf` (which can be pretty disastrous).  `slen` probably
has never gone beyond `PROF_DUMP_BUFSIZE` so we were just lucky.
2019-07-16 15:15:32 -07:00
Yinan Zhang
d26636d566 Fix logic in printing
`cbopaque` can now be overriden without overriding `write_cb` in
the first place.  (Otherwise there would be no need to have the
`cbopaque` parameter in `malloc_message`.)
2019-07-16 14:54:23 -07:00
Qi Wang
1a71533511 Avoid blocking on background thread lock for stats.
Background threads may run for a long time, especially when the # of dirty pages
is high.  Avoid blocking stats calls because of this (which may cause latency
spikes).
2019-05-22 14:28:38 -07:00
Qi Wang
e13cf65a5f Add experimental.arenas.i.pactivep.
The new experimental mallctl exposes the arena pactive counter to applications,
which allows fast read w/o going through the mallctl / epoch steps.  This is
particularly useful when frequent balancing is required, e.g. when having
multiple manual arenas, and threads are multiplexed to them based on usage.
2019-05-22 14:27:58 -07:00
Yinan Zhang
c92ac30601 Add confirm_conf option
If the confirm_conf option is set, when the program starts, each of
the four malloc_conf strings will be printed, and each option will
be printed when being set.
2019-05-22 09:38:39 -07:00
Yinan Zhang
4c63b0e76a Improve memory utilization tests
Added tests for large size classes and expanded the tests to
cover wider range of allocation sizes.
2019-05-21 12:57:06 -07:00
Vaibhav Jain
2d6d099fed Fix GCC-9.1 warning with macro GET_ARG_NUMERIC
GCC-9.1 reports following error when trying to compile file
src/malloc_io.c and with CFLAGS='-Werror' :

src/malloc_io.c: In function ‘malloc_vsnprintf’:
src/malloc_io.c:369:2: error: case label value exceeds maximum value for type [-Werror]
  369 |  case '?' | 0x80:      \
      |  ^~~~
src/malloc_io.c:581:5: note: in expansion of macro ‘GET_ARG_NUMERIC’
  581 |     GET_ARG_NUMERIC(val, 'p');
      |     ^~~~~~~~~~~~~~~
...
<snip>
cc1: all warnings being treated as errors
make: *** [Makefile:388: src/malloc_io.sym.o] Error 1

The warning is reported as by default the type 'char' is 'signed char'
and or-ing 0x80 will turn the case label char negative which will be
beyond the printable ascii range (0 - 127).

The patch fixes this by explicitly casting the 'len' variable as
unsigned char' inside the 'switch' statement so that value of
expression " '?' | 0x80 " falls within the legal values of the
variable 'len'.
2019-05-21 11:20:07 -07:00
Qi Wang
07c44847c2 Track nfills and nflushes for arenas.i.small / large.
Small is added purely for convenience.  Large flushes wasn't tracked before and
can be useful in analysis.  Large fill simply reports nmalloc, since there is no
batch fill for large currently.
2019-05-15 10:05:09 -07:00
Yinan Zhang
13e88ae970 Fix assert in free fastpath
rtree_szind_slab_read_fast() may have not initialized
alloc_ctx.szind, unless after confirming the return is true.
2019-05-15 09:42:52 -07:00
Yinan Zhang
259b15dec5 Improve macro readability in malloc_conf_init
Define more readable macros than yes and no.
2019-05-08 14:15:03 -07:00
Dave Watson
5679751208 Remove best fit
This option saves a few CPU cycles, but potentially adds a lot of
fragmentation - so much so that there are workarounds like
max_active.  Instead, let's just drop it entirely.  It only made
a difference in one service I tested (.3% cpu regression), while
many services saw a memory win (also small, less than 1% mem P99)
2019-05-08 13:15:19 -07:00
Dave Watson
b62d126df8 Add max_active_fit to first_fit
The max_active_fit check is currently only on the best_fit
path, add it to the first_fit path also.
2019-05-08 13:15:19 -07:00
Doron Roberts-Kedes
7fc4f2a32c Add nonfull_slabs to bin_stats_t.
When config_stats is enabled track the size of bin->slabs_nonfull in
the new nonfull_slabs counter in bin_stats_t. This metric should be
useful for establishing an upper ceiling on the savings possible by
meshing.
2019-04-29 13:35:02 -07:00
Qi Wang
1aabab5fdc Enforce TLS_MODEL attribute.
Caught by @zoulasc in #1460.  The attribute needs to be added in the headers as
well.
2019-04-16 11:07:15 -07:00
David Goldblatt
33e1dad680 Safety checks: Add a redzoning feature. 2019-04-15 16:48:12 -07:00
David Goldblatt
b92c9a1a81 Safety checks: Indirect through a function.
This will let us share code on failure pathways.pathways
2019-04-15 16:48:12 -07:00
David Goldblatt
f95a88fcd9 Safety checks: Expose config value via mallctl and stats. 2019-04-15 16:48:12 -07:00
David Goldblatt
f4d24f05e1 Move extra size checks behind a config flag.
This will let us turn that flag into a generic "turn on runtime checks" flag
that guards other functionality we have planned.
2019-04-15 16:48:12 -07:00
Yinan Zhang
7ee3897740 Separate tests for extent utilization API
As title.
2019-04-10 13:03:20 -07:00
mgrice
d3d7a8ef09 remove compare and branch in fast path for c++ operator delete[]
Summary: sdallocx is checking a flag that will never be set (at least in the provided C++ destructor implementation).  This branch will probably only rarely be mispredicted however it removes two instructions in sdallocx and one at the callsite (to zero out flags).
2019-04-08 10:59:05 -07:00
Qi Wang
93084cdc89 Ensure page alignment on extent_alloc.
This is discovered and suggested by @jasone in #1468.  When custom extent hooks
are in use, we should ensure page alignment on the extent alloc path, instead of
relying on the user hooks to do so.
2019-04-04 13:49:37 -07:00
Yinan Zhang
9aab3f2be0 Add memory utilization analytics to mallctl
The analytics tool is put under experimental.utilization namespace in
mallctl.  Input is one pointer or an array of pointers and the output
is a list of memory utilization statistics.
2019-04-04 13:48:39 -07:00
Qi Wang
978a7a21ae Use iallocztm instead of ialloc in prof_log functions.
Explicitly use iallocztm for internal allocations.  ialloc could trigger arena
creation, which may cause lock order reversal (narenas_mtx and log_mtx).
2019-04-02 16:53:00 -07:00
Qi Wang
0101d5ebef Avoid check_min for opt_lg_extent_max_active_fit.
This fixes a compiler warning.
2019-03-29 15:56:53 -07:00
Qi Wang
59d9891948 Add the missing unlock in the error path of extent_register. 2019-03-29 15:56:53 -07:00
Qi Wang
788a657cee Allow low values of oversize_threshold to disable the feature.
We should allow a way to easily disable the feature (e.g. not reserving the
arena id at all).
2019-03-29 11:33:00 -07:00
Qi Wang
a4d017f5e5 Output message before aborting on tcache size-matching check. 2019-03-29 11:33:00 -07:00
Qi Wang
fb56766ca9 Eagerly purge oversized merged extents.
This change improves memory usage slightly, at virtually no CPU cost.
2019-03-14 17:34:55 -07:00
Qi Wang
b804d0f019 Fallback to 32-bit when 8-bit atomics are missing for TSD.
When it happens, this might cause a slowdown on the fast path operations.
However such case is very rare.
2019-03-09 12:52:06 -08:00
Dave Rigby
cbdb1807ce Stringify tls_callback linker directive
Proposed fix for #1444 - ensure that `tls_callback` in the `#pragma comment(linker)`directive gets the same prefix added as it does i the C declaration.
2019-02-22 12:43:35 -08:00
Qi Wang
18450d0abe Guard libgcc unwind init with opt_prof.
Only triggers libgcc unwind init when prof is enabled.  This helps workaround
some bootstrapping issues.
2019-02-21 16:04:47 -08:00
Qi Wang
2db2d2ef5e Make background_thread not dependent on libdl.
When not using libdl, still allows background_thread to be enabled.
2019-02-06 21:00:59 -08:00
Qi Wang
e13400c919 Sanity check szind on tcache flush.
This adds some overhead to the tcache flush path (which is one of the
popular paths).  Guard it behind a config option.
2019-02-01 12:31:34 -08:00
Qi Wang
b33eb26dee Tweak the spacing for the total_wait_time per second. 2019-01-28 15:37:19 -08:00
Qi Wang
e3db480f6f Rename huge_threshold to oversize_threshold.
The keyword huge tend to remind people of huge pages which is not relevent to
the feature.
2019-01-25 13:15:45 -08:00
Qi Wang
350809dc5d Set huge_threshold to 8M by default.
This feature uses an dedicated arena to handle huge requests, which
significantly improves VM fragmentation.  In production workload we tested it
often reduces VM size by >30%.
2019-01-24 13:29:23 -08:00
Qi Wang
522d1e7b4b Tweak the spacing for nrequests in stats output. 2019-01-23 17:42:12 -08:00
Qi Wang
8c9571376e Fix stats output (rate for total # of requests).
The rate calculation for the total row was missing.
2019-01-23 17:42:12 -08:00
Qi Wang
7a815c1b7c Un-experimental the huge_threshold feature. 2019-01-16 12:28:57 -08:00
Qi Wang
bbe8e6a909 Avoid creating bg thds for huge arena lone.
For low arena count settings, the huge threshold feature may trigger an unwanted
bg thd creation.  Given that the huge arena does eager purging by default,
bypass bg thd creation when initializing the huge arena.
2019-01-15 16:00:34 -08:00
Qi Wang
f459454afe Avoid potential issues on extent zero-out.
When custom extent_hooks or transparent huge pages are in use, the purging
semantics may change, which means we may not get zeroed pages on repopulating.
Fixing the issue by manually memset for such cases.
2019-01-11 19:16:12 -08:00
Qi Wang
0ecd5addb1 Force purge on thread death only when w/o bg thds. 2019-01-11 19:15:34 -08:00
Qi Wang
7241bf5b74 Only read arena index from extent on the tcache flush path.
Add exten_arena_ind_get() to avoid loading the actual arena ptr in case we just
need to check arena matching.
2018-12-18 15:19:30 -08:00
Alexander Zinoviev
36de5189c7 Add rate counters to stats 2018-12-18 09:59:41 -08:00
Qi Wang
99f4eefb61 Fix incorrect stats mreging with sharded bins.
With sharded bins, we may not flush all items from the same arena in one run.
Adjust the stats merging logic accordingly.
2018-12-07 18:16:15 -08:00
Qi Wang
98b56ab23d Store the bin shard selection in TSD.
This avoids having to choose bin shard on the fly, also will allow flexible bin
binding for each thread.
2018-12-03 17:17:03 -08:00
Qi Wang
45bb4483ba Add stats for arenas.bin.i.nshards. 2018-12-03 17:17:03 -08:00
Qi Wang
3f9f2833f6 Add opt.bin_shards to specify number of bin shards.
The option uses the same format as "slab_sizes" to specify number of shards for
each bin size.
2018-12-03 17:17:03 -08:00
Qi Wang
37b8913925 Add support for sharded bins within an arena.
This makes it possible to have multiple set of bins in an arena, which improves
arena scalability because the bins (especially the small ones) are always the
limiting factor in production workload.

A bin shard is picked on allocation; each extent tracks the bin shard id for
deallocation.  The shard size will be determined using runtime options.
2018-12-03 17:17:03 -08:00
Dave Watson
b23336af96 mutex: fix trylock spin wait contention
If there are 3 or more threads spin-waiting on the same mutex,
there will be excessive exclusive cacheline contention because
pthread_trylock() immediately tries to CAS in a new value, instead
of first checking if the lock is locked.

This diff adds a 'locked' hint flag, and we will only spin wait
without trylock()ing while set.  I don't know of any other portable
way to get the same behavior as pthread_mutex_lock().

This is pretty easy to test via ttest, e.g.

./ttest1 500 3 10000 1 100

Throughput is nearly 3x as fast.

This blames to the mutex profiling changes, however, we almost never
have 3 or more threads contending in properly configured production
workloads, but still worth fixing.
2018-11-28 15:17:02 -08:00
Qi Wang
c4063ce439 Set the default number of background threads to 4.
The setting has been tested in production for a while.  No negative effect while
we were able to reduce number of threads per process.
2018-11-16 09:35:12 -08:00
Qi Wang
43f3b1ad0c Deprecate OSSpinLock. 2018-11-14 08:44:05 -08:00
Dave Watson
13c237c7ef Add a fastpath for arena_slab_reg_alloc_batch
Also adds a configure.ac check for __builtin_popcount, which is used
in the new fastpath.
2018-11-14 07:09:11 -08:00
Dave Watson
17aa470760 add extent_nfree_sub 2018-11-14 07:09:11 -08:00
Dave Watson
4b82872ebf arena: Refactor tcache_fill to batch fill from slab
Refactor tcache_fill, introducing a new function arena_slab_reg_alloc_batch,
which will fill multiple pointers from a slab.

There should be no functional changes here, but allows future optimization
on reg_alloc_batch.
2018-11-14 07:09:11 -08:00
Qi Wang
57553c3b1a Avoid touching all pages in extent_recycle for debug build.
We may have a large number of pages with *zero set (since they are populated on
demand).  Only check the first page to avoid paging in all of them.
2018-11-13 08:54:48 -08:00
Qi Wang
1f56115704 Fix tcache_flush (follow up cd2931a).
Also catch invalid tcache id.
2018-11-13 08:54:09 -08:00
Dave Watson
794e29c0ab Add a free() and sdallocx(where flags=0) fastpath
Add unsized and sized deallocation fastpaths.  Similar to the malloc()
fastpath, this removes all frame manipulation for the majority of
free() calls.  The performance advantages here are less than that
of the malloc() fastpath, but from prod tests seems to still be half
a percent or so of improvement.

Stats and sampling a both supported (sdallocx needs a sampling check,
for rtree lookups slab will only be set for unsampled objects).

We don't support flush, any flush requests go to the slowpath.
2018-11-12 13:20:37 -08:00
Edward Tomasz Napierala
a4c6b9ae01 Restore a FreeBSD-specific getpagesize(3) optimization.
It was removed in 0771ff2cea.
Add a comment explaining its purpose.
2018-11-09 14:14:49 -08:00
Qi Wang
cd2931ad9b Fix tcaches_flush.
The regression was introduced in 3a1363b.
2018-11-09 13:11:37 -08:00
Qi Wang
7ee0b6cc37 Properly trigger decay on tcache destory.
When destroying tcache, decay may not be triggered since tsd is non-nominal.
Explicitly decay to avoid pathological cases.
2018-11-09 11:03:19 -08:00
Qi Wang
d66f976628 Optimize large deallocation.
We eagerly coalesce large buffers when deallocating, however the previous logic
around this introduced extra lock overhead -- when coalescing we always lock the
neighbors even if they are active, while for active extents nothing can be done.

This commit checks if the neighbor extents are potentially active before
locking, and avoids locking if possible.  This speeds up large_dalloc by ~20%.
It also fixes some undesired behavior: we could stop coalescing because a small
buffer was merged, while a large neighbor was ignored on the other side.
2018-11-08 13:35:59 -08:00
Qi Wang
8dabf81df1 Bypass extent_dalloc when retain is enabled.
When retain is enabled, the default dalloc hook does nothing (since we avoid
munmap).  But the overhead preparing the call is high, specifically the extent
de-register and re-register involve locking and extent / rtree modifications.
Bypass the call with retain in this diff.
2018-11-08 11:32:25 -08:00
Qi Wang
50b473c883 Set commit properly for FreeBSD w/ overcommit.
When overcommit is enabled, commit needs to be set when doing mmap().  The
regression was introduced in f80c97e.
2018-11-05 09:47:04 -08:00
Edward Tomasz Napierala
ceba1dde27 Make use of pthread_set_name_np(3) on FreeBSD. 2018-10-24 10:06:37 -07:00
Dave Watson
0f8313659e malloc: Add a fastpath
This diff adds a fastpath that assumes size <= SC_LOOKUP_MAXCLASS, and
that we hit tcache.  If either of these is false, we fall back to
the previous codepath (renamed 'malloc_default').

Crucially, we only tail call malloc_default, and with the same kind
and number of arguments, so that both clang and gcc tail-calling
will kick in - therefore malloc() gets treated as a leaf function,
and there are *no* caller-saved registers.   Previously malloc() contained
5 caller saved registers on x64, resulting in at least 10 extra
memory-movement instructions.

In microbenchmarks this results in up to ~10% improvement in malloc()
fastpath.  In real programs, this is a ~1% CPU and latency improvement
overall.
2018-10-18 08:32:19 -07:00
Dave Watson
ac34afb403 drop bump_empty_alloc option. Size class lookup support used instead. 2018-10-17 08:50:58 -07:00
Dave Watson
4edbb7c64c sz: Support 0 size in size2index lookup/compute 2018-10-17 08:50:58 -07:00
gnzlbg
01e2a38e5a Make smallocx symbol name depend on the JEMALLOC_VERSION_GID
This comments concatenates the `JEMALLOC_VERSION_GID` to the
`smallocx` symbol name, such that the symbol ends up exported
as `smallocx_{git_hash}`.
2018-10-17 07:12:28 -07:00
gnzlbg
741fca1bb7 Hide smallocx even when enabled from the library API
The experimental `smallocx` API is not exposed via header files,
requiring the users to peek at `jemalloc`'s source code to manually
add the external declarations to their own programs.

This should reinforce that `smallocx` is experimental, and that `jemalloc`
does not offer any kind of backwards compatiblity or ABI gurantees for it.
2018-10-17 07:12:28 -07:00
gnzlbg
08260a6b94 Add experimental API: smallocx_return_t smallocx(size, flags)
---

Motivation:

This new experimental memory-allocaction API returns a pointer to
the allocation as well as the usable size of the allocated memory
region.

The `s` in `smallocx` stands for `sized`-`mallocx`, attempting to
convey that this API returns the size of the allocated memory region.

It should allow C++ P0901r0 [0] and Rust Alloc::alloc_excess to make
use of it.

The main purpose of these APIs is to improve telemetry. It is more accurate
to register `smallocx(size, flags)` than `smallocx(nallocx(size), flags)`,
for example. The latter will always line up perfectly with the existing
size classes, causing a loss of telemetry information about the internal
fragmentation induced by potentially poor size-classes choices.

Instrumenting `nallocx` does not help much since user code can cache its
result and use it repeatedly.

---

Implementation:

The implementation adds a new `usize` option to `static_opts_s` and an `usize`
variable to `dynamic_opts_s`. These are then used to cache the result of
`sz_index2size` and similar functions in the code paths in which they are
unconditionally invoked. In the code-paths in which these functions are not
unconditionally invoked, `smallocx` calls, as opposed to `mallocx`, these
functions explicitly.

---

[0]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0901r0.html
2018-10-17 07:12:28 -07:00
Dave Watson
325e3305fc remove malloc_init() off the fastpath 2018-10-15 10:11:08 -07:00
Dave Watson
997d86acc6 restrict bytes_until_sample to int64_t. This allows optimal asm
generation of sub bytes_until_sample, usize; je; for x86 arch.
Subtraction is unconditional, and only flags are checked for the jump,
no extra compare is necessary.  This also reduces register pressure.
2018-10-15 08:24:12 -07:00
Dave Watson
d1a861fa80 add a check for SC_LARGE_MAXCLASS
If we assume SC_LARGE_MAXCLASS will always fit in a SSIZE_T, then we can
optimize some checks by unconditional subtraction, and then checking flags
only, without a compare statement in x86.
2018-10-15 08:24:12 -07:00
Dave Watson
9ed3bdc848 move bytes until sample to tsd. Fastpath allocation does not need
to load tdata now, avoiding several branches.
2018-10-15 08:24:12 -07:00
jsteemann
856319dc8a check return value of malloc_read_fd
in case `malloc_read_fd` returns a negative error number, the result
would afterwards be casted to an unsigned size_t, and may have
theoretically caused an out-of-bounds memory access in the following
`strncmp` call.
2018-10-11 17:25:20 -07:00
Edward Tomasz Napierala
f80c97e477 Rework the way jemalloc uses mmap(2) on FreeBSD.
This makes it directly use MAP_EXCL and MAP_ALIGNED() instead
of weird workarounds involving mapping at random places and then
unmapping parts of them.
2018-10-06 22:06:56 -07:00
Edward Tomasz Napierala
676cdd6679 Disable runtime detection of lazy purging support on FreeBSD.
The check doesn't seem to serve any purpose here, and this shaves
off three syscalls on binary startup.
2018-10-06 22:06:56 -07:00
David Goldblatt
88771fa013 Bootstrapping: don't overwrite opt_prof_prefix. 2018-09-12 17:06:06 -07:00
David Carlier
0771ff2cea FreeBSD build changes and allow to run the tests. 2018-08-09 10:41:20 -07:00
David Goldblatt
e8ec9528ab Allow the use of readlinkat over readlink.
This can be useful in situations where readlink is disallowed.
2018-08-03 14:04:32 -07:00
Tyler Etzel
126252a7e6 Add stats for the size of extent_avail heap 2018-08-02 10:16:06 -07:00
Tyler Etzel
c14e6c0819 Add extents information to mallocstats output
- Show number/bytes of extents of each size that are dirty, muzzy, retained.
2018-08-02 10:16:06 -07:00
Tyler Etzel
5e23f96dd4 Add unit tests for logging 2018-08-01 13:27:11 -07:00
Tyler Etzel
b664bd7935 Add logging for sampled allocations
- prof_opt_log flag starts logging automatically at runtime
- prof_log_{start,stop} mallctl for manual control
2018-08-01 13:27:11 -07:00
Tyler Etzel
eb261e53a6 Small refactoring of emitter
- Make API more clear for using as standalone json emitter
- Support cases that weren't possible before, e.g.
	- emitting primitive values in an array
	- emitting nested arrays
2018-08-01 13:27:11 -07:00
David Goldblatt
41b7372ead TSD: Add fork support to tsd_nominal_tsds.
In case of multithreaded fork, we want to leave the child in a reasonable state,
in which tsd_nominal_tsds is either empty or contains only the forking thread.
2018-07-26 17:22:25 -07:00
David Goldblatt
013ab26c86 TSD: Add a tsd_nominal_list death assertion.
A thread should have had its state transition away from nominal before it dies.
This change adds that to the list of thread death assertions.
2018-07-26 17:22:25 -07:00
David Goldblatt
3aba072cef SC: Remove global data.
The global data is mostly only used at initialization, or for easy access to
values we could compute statically.  Instead of consuming that space (and
risking TLB misses), we can just pass around a pointer to stack data during
bootstrapping.
2018-07-23 13:37:08 -07:00
Qi Wang
4bc48718b2 Tolerate experimental features for abort_conf.
Not aborting with unrecognized experimental options.  This helps us testing
experimental features with abort_conf enabled.
2018-07-17 20:40:32 -07:00
David Goldblatt
55e5cc1341 SC: Make some key size classes static.
The largest small class, smallest large class, and largest large class may all
be needed down fast paths; to avoid the risk of touching another cache line, we
can make them available as constants.
2018-07-12 20:53:06 -07:00
David T. Goldblatt
5112d9e5fd Add MALLOC_CONF parsing for dynamic slab sizes.
This actually enables us to change the values.
2018-07-12 20:53:06 -07:00
David T. Goldblatt
4610ffa942 Bootstrapping: Parse MALLOC_CONF before using slab sizes.
I.e., parse before booting the bin module or sz module.  This lets us tweak size
class settings before committing to them by letting them leak into other
modules.

This commit does not actually do any tweaking of the size classes; it *just*
chanchanges bootstrapping order; this may help bisecting any bootstrapping
failures on poorly-tested architectures.
2018-07-12 20:53:06 -07:00
David T. Goldblatt
a7f68aed3e SC: Add page customization functionality. 2018-07-12 20:53:06 -07:00
David T. Goldblatt
017dca198c SC module: Add a note on style. 2018-07-12 20:53:06 -07:00
David Goldblatt
0552aad91b Kill size_classes.sh.
We've moved size class computations to boot time; they were being used only to
check that the computations resulted in equal values.
2018-07-12 20:53:06 -07:00
David Goldblatt
4f55c0ec22 Translate size class computation from bash shell into C.
This is the last big step in making size classes a runtime computation rather
than a configure-time one.

The compile-time computation has been left in, for now, to allow assertion
checking that the results are identical.
2018-07-12 20:53:06 -07:00
David Goldblatt
e904f813b4 Hide size class computation behind a layer of indirection.
This class removes almost all the dependencies on size_classes.h, accessing the
data there only via the new module sc.h, which does not depend on any
configuration options.

In a subsequent commit, we'll remove the configure-time size class computations,
doing them at boot time, instead.
2018-07-12 20:53:06 -07:00
gnzlbg
3d29d11ac2 Clean compilation -Wextra
Before this commit jemalloc produced many warnings when compiled with -Wextra
with both Clang and GCC. This commit fixes the issues raised by these warnings
or suppresses them if they were spurious at least for the Clang and GCC
versions covered by CI.

This commit:

* adds `JEMALLOC_DIAGNOSTIC` macros: `JEMALLOC_DIAGNOSTIC_{PUSH,POP}` are
  used to modify the stack of enabled diagnostics. The
  `JEMALLOC_DIAGNOSTIC_IGNORE_...` macros are used to ignore a concrete
  diagnostic.

* adds `JEMALLOC_FALLTHROUGH` macro to explicitly state that falling
  through `case` labels in a `switch` statement is intended

* Removes all UNUSED annotations on function parameters. The warning
  -Wunused-parameter is now disabled globally in
  `jemalloc_internal_macros.h` for all translation units that include
  that header. It is never re-enabled since that header cannot be
  included by users.

* locally suppresses some -Wextra diagnostics:

  * `-Wmissing-field-initializer` is buggy in older Clang and GCC versions,
    where it does not understanding that, in C, `= {0}` is a common C idiom
    to initialize a struct to zero

  * `-Wtype-bounds` is suppressed in a particular situation where a generic
    macro, used in multiple different places, compares an unsigned integer for
    smaller than zero, which is always true.

  * `-Walloc-larger-than-size=` diagnostics warn when an allocation function is
    called with a size that is too large (out-of-range). These are suppressed in
    the parts of the tests where `jemalloc` explicitly does this to test that the
    allocation functions fail properly.

* adds a new CI build bot that runs the log unit test on CI.

Closes #1196 .
2018-07-09 21:40:42 -07:00
Qi Wang
cdf15b458a Rename huge_threshold to experimental, and tweak documentation. 2018-06-29 10:35:02 -07:00
Qi Wang
1302af4c43 Add ctl and stats for opt.huge_threshold. 2018-06-29 10:35:02 -07:00
Qi Wang
79522b2fc2 Refactor arena_is_auto. 2018-06-29 10:35:02 -07:00
Qi Wang
94a88c26f4 Implement huge arena: opt.huge_threshold.
The feature allows using a dedicated arena for huge allocations.  We want the
addtional arena to separate huge allocation because: 1) mixing small extents
with huge ones causes fragmentation over the long run (this feature reduces VM
size significantly); 2) with many arenas, huge extents rarely get reused across
threads; and 3) huge allocations happen way less frequently, therefore no
concerns for lock contention.
2018-06-29 10:35:02 -07:00
Qi Wang
77a71ef2b7 Fall back to the default pthread_create if RTLD_NEXT fails. 2018-06-28 13:18:21 -07:00
David Goldblatt
d1e11d48d4 Move tsd link and in_hook after tcache.
This can lead to better cache utilization down the common paths where we don't
touch the link.
2018-06-27 13:39:02 -07:00
Qi Wang
fec1ef7c91 Fix arena locking in tcache_bin_flush_large().
This regression was introduced in c834912 (incorrect arena used).
2018-06-26 23:13:15 -07:00
Qi Wang
0ff7ff3ec7 Optimize ixalloc by avoiding a size lookup. 2018-06-05 21:03:51 -07:00
Qi Wang
c834912aa9 Avoid taking large_mtx for auto arenas.
On tcache flush path, we can avoid touching the large_mtx for auto arenas, since
it was only needed for manual arenas where arena_reset is allowed.
2018-06-05 15:16:03 -07:00
Qi Wang
9bd8deb260 Fix stats output for opt.lg_extent_max_active_fit. 2018-06-05 10:23:28 -07:00
Qi Wang
d22e150320 Avoid taking extents_muzzy mutex when muzzy is disabled.
When muzzy decay is disabled, no need to allocate from extents_muzzy.  This
saves us a couple of mutex operations down the extents_alloc path.
2018-05-24 14:40:56 -07:00
David Goldblatt
a7f749c9af Hooks: Protect against reentrancy.
Previously, we made the user deal with this themselves, but that's not good
enough; if hooks may allocate, we should test the allocation pathways down
hooks.  If we're doing that, we might as well actually implement the protection
for the user.
2018-05-18 11:43:03 -07:00
David Goldblatt
0379235f47 Tests: Shouldn't be able to change global slowness.
This can help ensure that we don't leave slowness changes behind in case of
resource exhaustion.
2018-05-18 11:43:03 -07:00
David Goldblatt
59e371f463 Hooks: Add a hook exhaustion test.
When we run out of space in which to store hooks, we should return EAGAIN from
the mallctl, but not otherwise misbehave.
2018-05-18 11:43:03 -07:00
David Goldblatt
bb071db92e Mallctl: Add experimental.hooks.[install|remove]. 2018-05-18 11:43:03 -07:00
David Goldblatt
126e9a84a5 Hooks: move the "extra" pointer into the hook_t itself.
This simplifies the mallctl call to install a hook, which should only take a
single argument.
2018-05-18 11:43:03 -07:00
David Goldblatt
cb0707c0fc Hooks: hook the realloc pathways that move/expand. 2018-05-18 11:43:03 -07:00
David Goldblatt
67270040a5 Hooks: hook the realloc paths that act as pure malloc/free. 2018-05-18 11:43:03 -07:00
David Goldblatt
83e516154c Hooks: hook the pure-expand function. 2018-05-18 11:43:03 -07:00
David Goldblatt
c154f5881b Hooks: hook the pure-deallocation functions. 2018-05-18 11:43:03 -07:00
David Goldblatt
226327cf66 Hooks: hook the pure-allocation functions. 2018-05-18 11:43:03 -07:00
David Goldblatt
fe0e399385 Hooks: add an early-exit path for the common no-hook case. 2018-05-18 11:43:03 -07:00
David Goldblatt
5ae6e7cbfa Add "hook" module.
The hook module allows a low-reader-overhead way of finding hooks to invoke and
calling them.

For now, none of the allocation pathways are tied into the hooks; this will come
later.
2018-05-18 11:43:03 -07:00
David Goldblatt
c7a87e0e0b Rename hooks module to test_hooks.
"Hooks" is really the best name for the module that will contain the publicly
exposed hooks.  So lets rename the current "hooks" module (that hook external
dependencies, for reentrancy testing) to "test_hooks".
2018-05-18 11:43:03 -07:00
David Goldblatt
e870829e64 TSD: Add the ability to enter a global slow path.
This gives any thread the ability to send other threads down slow paths the next
time they fetch tsd.
2018-05-18 11:43:03 -07:00
David Goldblatt
982c10de35 TSD: Make all state access happen through a function.
Shortly, tsd state will be atomic and have some complicated enough logic down
the state-setting path that we should be aware of it.
2018-05-18 11:43:03 -07:00
Qi Wang
09edea3f5c Tweak the format of the per arena summary section.
Increase the width to ensure enough space for long running programs.
2018-05-17 12:58:56 -07:00
Qi Wang
312352faa8 Fix background thread index issues with max_background_threads. 2018-05-15 12:25:23 -07:00
Qi Wang
e8a63b87c3 Fix an incorrect assertion.
When configured with --with-lg-page, it's possible for the configured page size
to be greater than the system page size, in which case the page address may only
be aligned with the system page size.
2018-05-09 23:52:56 -07:00
Latchesar Ionkov
a32b7bd567 Mallctl: Add arenas.lookup
Implement a new mallctl operation that allows looking up the arena a
region of memory belongs to.
2018-05-01 13:14:36 -07:00
Qi Wang
b8f4c730ef Remove an incorrect assertion.
Background threads are created without holding the global background_thread
lock, which mean paused state is possible (and fine).
2018-04-18 14:17:08 -07:00
Qi Wang
dedfeecc4e Invoke dlsym() on demand.
If no lazy lock or background thread is enabled, avoid dlsym pthread_create on
boot.
2018-04-18 11:20:21 -07:00
David Goldblatt
c95284df1a Avoid a resource leak down extent split failure paths.
Previously, we would leak the extent and memory associated with a salvageable
portion of an extent that we were trying to split in three, in the case where
the first split attempt succeeded and the second failed.
2018-04-18 08:19:41 -07:00
Qi Wang
e40b2f75bd Fix abort_conf processing.
When abort_conf is set, make sure we always error out at the end of the options
processing loop.
2018-04-17 18:23:53 -07:00
Qi Wang
0fadf4a2e3 Add UNUSED to avoid compiler warnings. 2018-04-16 13:50:21 -07:00
Qi Wang
3f0dc64c6b Allow setting extent hooks on uninitialized auto arenas.
Setting extent hooks can result in initializing an unused auto arena.  This is
useful to install extent hooks on auto arenas from the beginning.
2018-04-11 21:21:54 -07:00
Jason Evans
4937309620 Silence a compiler warning. 2018-04-10 17:59:00 -07:00
Dave Watson
8b14f3abc0 background_thread: add max thread count config
Looking at the thread counts in our services, jemalloc's background thread
is useful, but mostly idle.  Add a config option to tune down the number of threads.
2018-04-10 14:01:45 -07:00
Rajeev Misra
5f51882a0a Stack address should not be used for ordering mutexes 2018-04-10 10:16:57 -07:00
Qi Wang
d3e0976a2c Fix type warning on Windows.
Add cast since read / write has unsigned return type on windows.
2018-04-09 16:50:30 -07:00
Qi Wang
4df483f0fd Fix arguments passed to extent_init. 2018-04-09 16:35:58 -07:00
Qi Wang
2dccf45640 Control idump and gdump with prof_active. 2018-04-09 16:35:14 -07:00
Dave Watson
6d02421730 extents: Remove preserve_lru feature.
preserve_lru feature adds lots of complication, for little value.
Removing it means merged extents are re-added to the lru list, and may
take longer to madvise away than they otherwise would.

Canaries after removal seem flat for several services (no change).
2018-04-02 12:40:28 -07:00
Qi Wang
21eb0d15a6 Fix a background_thread shutdown issue.
1) make sure background thread 0 is always created; and 2) fix synchronization
between thread 0 and the control thread.
2018-04-02 10:03:47 -07:00
Qi Wang
956c4ad6b5 Change mutable option output in stats to avoid stringify issues. 2018-03-15 14:42:48 -07:00
Qi Wang
baffeb1d0a Fix a typo in stats. 2018-03-15 14:42:48 -07:00
David Goldblatt
4c36cd2cc5 Stats printing: Convert arena large stats to use emitter.
This completes the conversion; we now have only structured text output.
2018-03-09 11:47:17 -08:00
David Goldblatt
4eed989bbf Stats printing: convert arena bin stats to use emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
a9f3cedc6e Stats printing: remove a spurious newline.
This was left over from a previous emitter conversion.  It didn't affect the
correctness of the output.
2018-03-09 11:47:17 -08:00
David Goldblatt
a1738f4efd Stats printing: Make arena mutex stats use the emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
07fb707623 Stats printing: convert most per-arena stats to use the emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
8fc850695d Stats printing: convert paging and alloc counts to use the emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
bc6620f73e Stats printing: convert decay stats to use the emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
a6ef061c43 Stats printing: Move emitter cutoff point into stats_arena_print. 2018-03-09 11:47:17 -08:00
David Goldblatt
cbde666d9a Stats printing: move stats_print_helper to use emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
86c61d4a57 Stats printing: Move global mutex stats to use emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
9e1846b004 Stats printing: move non-mutex arena stats to the emitter.
Another step in the conversion process.  The mutex is a little different,
because we we want to emit it as an array.
2018-03-09 11:47:17 -08:00
David Goldblatt
8076b28721 Stats printing: Remove explicit callback passing to stats_print_helper.
This makes the emitter the only source of callback information, which is a step
towards where we want to be.
2018-03-09 11:47:17 -08:00
David Goldblatt
0d20eda127 Stats printing: Move emitter -> manual cutoff point.
This makes it so that the "general" portion of the stats code is completely
agnostic to emitter type.
2018-03-09 11:47:17 -08:00
David Goldblatt
ec31d476ff Stats printing: Convert profiling stats to use the emitter.
While we're at it, print them in table form, too.
2018-03-09 11:47:17 -08:00
David Goldblatt
e5acc35400 Stats printing: Convert general arena stats to use the emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
4a335e0c6f Stats printing: convert config and opt output to use emitter.
This is a step along the path towards using the emitter for all stats output.
2018-03-09 11:47:17 -08:00
David Goldblatt
b646f89173 Stats printing: Convert header and footer to use emitter. 2018-03-09 11:47:17 -08:00
Qi Wang
e4f090e8df Add opt.thp which allows explicit hugepage usage.
"always" marks all user mappings as MADV_HUGEPAGE; while "never" marks all
mappings as MADV_NOHUGEPAGE. The default setting "default" does not change any
settings.  Note that all the madvise calls are part of the default extent hooks
by design, so that customized extent hooks have complete control over the
mappings including hugepage settings.
2018-03-08 13:08:06 -08:00
Qi Wang
efa40532dc Remove config.thp which wasn't in use. 2018-03-08 13:08:06 -08:00
David Goldblatt
26b1c13982 Background threads: fix an indexing bug.
We have a buffer overrun that manifests in the case where arena indices higher
than the number of CPUs are accessed before arena indices lower than the number
of CPUs.  This fixes the bug and adds a test.
2018-02-27 19:43:05 -08:00
Christopher Ferris
f78d4ca3fb Modify configure to determine return value of strerror_r.
On glibc and Android's bionic, strerror_r returns char* when
_GNU_SOURCE is defined.

Add a configure check for this rather than assume glibc is the
only libc that behaves this way.
2018-01-10 21:01:18 -08:00
Qi Wang
ba5992fe9a Improve the fit for aligned allocation.
We compute the max size required to satisfy an alignment.  However this can be
quite pessimistic, especially with frequent reuse (and combined with state-based
fragmentation).  This commit adds one more fit step specific to aligned
allocations, searching in all potential fit size classes.
2018-01-05 14:27:58 -08:00
Rajeev Misra
f47e39d11a handle 32 bit mutex counters 2018-01-04 11:08:17 -08:00
David Goldblatt
d41b19f9c7 Implement arena regind computation using div_info_t.
This eliminates the need to generate an enormous switch statement in
arena_slab_regind.
2017-12-21 14:25:43 -08:00
David Goldblatt
21f7c13d0b Add the div module, which allows fast division by dynamic values. 2017-12-21 14:25:43 -08:00
David T. Goldblatt
7f1b02e3fa Split up and standardize naming of stats code.
The arena-associated stats are now all prefixed with arena_stats_, and live in
their own file.  Likewise, malloc_bin_stats_t -> bin_stats_t, also in its own
file.
2017-12-18 16:29:10 -08:00
David T. Goldblatt
901d94a2b0 Rename cache_alloc_easy to cache_bin_alloc_easy.
This lives in the cache_bin module; just a typo.
2017-12-18 16:29:10 -08:00
David T. Goldblatt
8aafa270fd Move bin stats code from arena to bin module. 2017-12-18 16:29:10 -08:00
David T. Goldblatt
48bb4a056b Move bin forking code from arena to bin module. 2017-12-18 16:29:10 -08:00
David T. Goldblatt
a8dd8876fb Move bin initialization from arena module to bin module. 2017-12-18 16:29:10 -08:00
David T. Goldblatt
4bf4a1c4ea Pull out arena_bin_info_t and arena_bin_t into their own file.
In the process, kill arena_bin_index, which is unused.  To follow are several
diffs continuing this separation.
2017-12-18 16:29:10 -08:00
Qi Wang
740bdd68b1 Over purge by 1 extent always.
When purging, large allocations are usually the ones that cross the npages_limit
threshold, simply because they are "large".  This means we often leave the large
extent around for a while, which has the downsides of: 1) high RSS and 2) more
chance of them getting fragmented.  Given that they are not likely to be reused
very soon (LRU), let's over purge by 1 extent (which is often large and not
reused frequently).
2017-12-18 12:57:07 -08:00
Qi Wang
5e0332890f Output opt.lg_extent_max_active_fit in stats. 2017-12-14 15:49:15 -08:00
Qi Wang
955b1d9cc5 Fix extent deregister on the leak path.
On leak path we should not adjust gdump when deregister.
2017-12-08 22:22:03 -08:00
Qi Wang
6e841f618a Add more tests for extent hooks failure paths. 2017-11-28 21:52:49 -08:00
Qi Wang
26a8f82c48 Add missing deregister before extents_leak.
This fixes an regression introduced by 211b1f3 (refactor extent split).
2017-11-19 21:12:40 -08:00
Qi Wang
e475d03752 Avoid setting zero and commit if split fails in extent_recycle. 2017-11-19 21:12:27 -08:00
Qi Wang
3e64dae802 Eagerly coalesce large extents.
Coalescing is a small price to pay for large allocations since they happen less
frequently.  This reduces fragmentation while also potentially improving
locality.
2017-11-16 15:32:02 -08:00
Qi Wang
eb1b08daae Fix an extent coalesce bug.
When coalescing, we should take both extents off the LRU list; otherwise decay
can grab the existing outer extent through extents_evict.
2017-11-16 15:32:02 -08:00
Qi Wang
fac706836f Add opt.lg_extent_max_active_fit
When allocating from dirty extents (which we always prefer if available), large
active extents can get split even if the new allocation is much smaller, in
which case the introduced fragmentation causes high long term damage.  This new
option controls the threshold to reuse and split an existing active extent.  We
avoid using a large extent for much smaller sizes, in order to reduce
fragmentation.  In some workload, adding the threshold improves virtual memory
usage by >10x.
2017-11-16 15:32:02 -08:00
Qi Wang
282a3faa17 Use extent_heap_first for best fit.
extent_heap_any makes the layout less predictable and as a result incurs more
fragmentation.
2017-11-16 15:32:02 -08:00
Dave Watson
d6feed6e66 Use tsd offset_state instead of atomic
While working on #852, I noticed the prng state is atomic.  This is the only
atomic use of prng in all of jemalloc.  Instead, use a threadlocal prng
state if possible to avoid unnecessary cache line contention.
2017-11-14 08:58:18 -08:00
Qi Wang
cb3b72b975 Fix base allocator THP auto mode locking and stats.
Added proper synchronization for switching to using THP in auto mode.  Also
fixed stats for number of THPs used.
2017-11-09 16:14:12 -08:00
Qi Wang
b5d071c266 Fix unbounded increase in stash_decayed.
Added an upper bound on how many pages we can decay during the current run.
Without this, decay could have unbounded increase in stashed, since other
threads could add new pages into the extents.
2017-11-08 16:33:30 -08:00
Qi Wang
6dd5681ab7 Use hugepage alignment for base allocator.
This gives us an easier way to tell if the allocation is for metadata in the
extent hooks.
2017-11-03 19:37:13 -07:00
Qi Wang
e422fa8e7e Add arena.i.retain_grow_limit
This option controls the max size when grow_retained.  This is useful when we
have customized extent hooks reserving physical memory (e.g. 1G huge pages).
Without this feature, the default increasing sequence could result in fragmented
and wasted physical memory.
2017-11-03 13:53:33 -07:00
Edward Tomasz Napierala
9f455e2786 Try to use sysctl(3) instead of sysctlbyname(3).
This attempts to use VM_OVERCOMMIT OID - newly introduced in -CURRENT
few days ago, specifically for this purpose - instead of querying the
sysctl by its string name.  Due to how syctlbyname(3) works, this means
we do one syscall during binary startup instead of two.

Signed-off-by: Edward Tomasz Napierala <trasz@FreeBSD.org>
2017-11-03 08:25:39 -07:00
Edward Tomasz Napierala
d591df05c8 Use getpagesize(3) under FreeBSD.
This avoids sysctl(2) syscall during binary startup, using the value
passed in the ELF aux vector instead.

Signed-off-by: Edward Tomasz Napierala <trasz@FreeBSD.org>
2017-11-03 08:25:39 -07:00
Qi Wang
58eba024c0 metadata_thp: auto mode adjustment for a0.
We observed that arena 0 can have much more metadata allocated comparing to
other arenas.  Tune the auto mode to only switch to huge page on the 5th block
(instead of 3 previously) for a0.
2017-11-01 13:52:06 -07:00
Qi Wang
47203d5f42 Output all counters for bin mutex stats.
The saved space is not worth the trouble of missing counters.
2017-10-19 16:31:54 -07:00
David Goldblatt
d14bbf8d81 Add a "dumpable" bit to the extent state.
Currently, this is unused (i.e. all extents are always marked dumpable).  In the
future, we'll begin using this functionality.
2017-10-16 15:35:49 -07:00
David Goldblatt
bbaa72422b Add pages_dontdump and pages_dodump.
This will, eventually, enable us to avoid dumping eden regions.
2017-10-16 15:35:49 -07:00
David Goldblatt
211b1f3c7d Factor out extent-splitting core from extent lifetime management.
Before this commit, extent_recycle_split intermingles the splitting of an extent
and the return of parts of that extent to a given extents_t.  After it, that
logic is separated.  This will enable splitting extents that don't live in any
extents_t (as the grow retained region soon will).
2017-10-16 15:35:49 -07:00
David Goldblatt
5bad01c38e Document some of the internal extent functions. 2017-10-16 15:35:49 -07:00
Qi Wang
31ab38be5f Define MADV_FREE on our own when needed.
On x86 Linux, we define our own MADV_FREE if madvise(2) is available, but no
MADV_FREE is detected.  This allows the feature to be built in and enabled with
runtime detection.
2017-10-11 15:49:22 -07:00
Qi Wang
7e74093c96 Set isthreaded manually.
Avoid relying pthread_once which creates dependency during init.
2017-10-05 22:57:56 -07:00
Qi Wang
a2e6eb2c22 Delay background_thread_ctl_init to right before thread creation.
ctl_init sets isthreaded, which means it should be done without holding any
locks.
2017-10-05 22:57:56 -07:00
Qi Wang
79e83451ff Enable a0 metadata thp on the 3rd base block.
Since we allocate rtree nodes from a0's base, it's pushed to over 1 block on
initialization right away, which makes the auto thp mode less effective on a0.
We change a0 to make the switch on the 3rd block instead.
2017-10-05 13:39:03 -07:00
David Goldblatt
1245faae90 Power: disable the CPU_SPINWAIT macro.
Quoting from https://github.com/jemalloc/jemalloc/issues/761 :

[...] reading the Power ISA documentation[1], the assembly in [the CPU_SPINWAIT
macro] isn't correct anyway (as @marxin points out): the setting of the
program-priority register is "sticky", and we never undo the lowering.

We could do something similar, but given that we don't have testing here in the
first place, I'm inclined to simply not try. I'll put something up reverting the
problematic commit tomorrow.

[1] Book II, chapter 3 of the 2.07B or 3.0B ISA documents.
2017-10-04 18:37:23 -07:00
Dave Watson
7c6c99b829 Use ph instead of rb tree for extents_avail_
There does not seem to be any overlap between usage of
extent_avail and extent_heap, so we can use the same hook.

The only remaining usage of rb trees is in the profiling code,
which has some 'interesting' iteration constraints.

Fixes #888
2017-10-04 12:23:03 -07:00
David Goldblatt
8a7ee3014c Logging: capitalize log macro.
Dodge a name-conflict with the math.h logarithm function. D'oh.
2017-10-02 20:44:43 -07:00
Qi Wang
0720192a32 Add runtime detection of lazy purging support.
It's possible to build with lazy purge enabled but depoly to systems without
such support.  In this case, rely on the boot time detection instead of keep
making unnecessary madvise calls (which all returns EINVAL).
2017-09-26 17:26:22 -07:00
Qi Wang
eaa58a5026 Put static keyword first.
Fix a warning by -Wold-style-declaration.
2017-09-21 12:18:10 -07:00
Qi Wang
9b20a4bf70 Clear cache bin ql postfork.
This fixes a regression in 9c05490, which introduced the new cache bin ql.  The
list needs to be cleaned up after fork, same as tcache_ql.
2017-09-12 16:16:12 -07:00
Qi Wang
a315688be0 Relax constraints on reentrancy for extent hooks.
If we guarantee no malloc activity in extent hooks, it's possible to make
customized hooks working on arena 0.  Remove the non-a0 assertion to enable such
use cases.
2017-08-31 11:03:34 -07:00
Qi Wang
e55c3ca267 Add stats for metadata_thp.
Report number of THPs used in arena and aggregated stats.
2017-08-30 16:47:32 -07:00
Qi Wang
47b20bb654 Change opt.metadata_thp to [disabled,auto,always].
To avoid the high RSS caused by THP + low usage arena (i.e. THP becomes a
significant percentage), added a new "auto" option which will only start using
THP after a base allocator used up the first THP region.  Starting from the
second hugepage (in a single arena), "auto" behaves the same as "always",
i.e. madvise hugepage right away.
2017-08-30 16:47:32 -07:00
David Goldblatt
9c0549007d Make arena stats collection go through cache bins.
This eliminates the need for the arena stats code to "know" about tcaches; all
that it needs is a cache_bin_array_descriptor_t to tell it where to find
cache_bins whose stats it should aggregate.
2017-08-16 17:48:44 -07:00
David Goldblatt
f3170baa30 Pull out caching for a bin into its own file.
This is the first step towards breaking up the tcache and arena (since they
interact primarily at the bin level).  It should also make a future arena
caching implementation more straightforward.
2017-08-16 17:48:44 -07:00
Qi Wang
3ec279ba1c Fix test/unit/pages.
As part of the metadata_thp support, We now have a separate swtich
(JEMALLOC_HAVE_MADVISE_HUGE) for MADV_HUGEPAGE availability.  Use that instead
of JEMALLOC_THP (which doesn't guard pages_huge anymore) in tests.
2017-08-11 15:57:12 -07:00
Qi Wang
8fdd9a5797 Implement opt.metadata_thp
This option enables transparent huge page for base allocators (require
MADV_HUGEPAGE support).
2017-08-11 14:51:20 -07:00
Ryan Libby
048c6679cd Remove external linkage for spin_adaptive
The external linkage for spin_adaptive was not used, and the inline
declaration of spin_adaptive that was used caused a probem on FreeBSD
where CPU_SPINWAIT is implemented as a call to a static procedure for
x86 architectures.
2017-08-08 10:30:21 -07:00
Qi Wang
1ab2ab294c Only read szind if ptr is not paged aligned in sdallocx.
If ptr is not page aligned, we know the allocation was not sampled. In this case
use the size passed into sdallocx directly w/o accessing rtree.  This improve
sdallocx efficiency in the common case (not sampled && small allocation).
2017-07-31 15:47:48 -07:00
Qi Wang
3800e55a2c Bypass extent_alloc_wrapper_hard for no_move_expand.
When retain is enabled, we should not attempt mmap for in-place expansion
(large_ralloc_no_move), because it's virtually impossible to succeed, and causes
unnecessary syscalls (which can cause lock contention under load).
2017-07-31 14:04:17 -07:00
David Goldblatt
e6aeceb606 Logging: log using the log var names directly.
Currently we have to log by writing something like:

  static log_var_t log_a_b_c = LOG_VAR_INIT("a.b.c");
  log (log_a_b_c, "msg");

This is sort of annoying.  Let's just write:

  log("a.b.c", "msg");
2017-07-24 14:55:54 -07:00
Qinfan Wu
b28f31e7ed Split out cold code path in newImpl
I noticed that the whole newImpl is inlined. Since OOM handling code is
rarely executed, we should only inline the hot path.
2017-07-24 13:37:02 -07:00
David Goldblatt
a9f7732d45 Logging: allow logging with empty varargs.
Currently, the log macro requires at least one argument after the format string,
because of the way the preprocessor handles varargs macros.  We can hide some of
that irritation by pushing the extra arguments into a varargs function.
2017-07-22 09:38:19 -07:00
Y. T. Chung
aa6c282137 Validates fd before calling fcntl 2017-07-22 07:46:30 -07:00
David T. Goldblatt
e215a7bc18 Add entry and exit logging to all core functions.
I.e. mallloc, free, the allocx API, the posix extensions.
2017-07-20 17:58:37 -07:00
David T. Goldblatt
9761b449c8 Add a logging facility.
This sets up a hierarchical logging facility, so that we can add logging
statements liberally, and turn them on in a fine-grained manner.
2017-07-20 17:58:37 -07:00
Y. T. Chung
0975b88dfd Fall back to FD_CLOEXEC when O_CLOEXEC is unavailable.
Older Linux systems don't have O_CLOEXEC.  If that's the case, we fcntl
immediately after open, to minimize the length of the racy period in
which an
operation in another thread can leak a file descriptor to a child.
2017-07-20 14:13:33 -07:00
David Goldblatt
0a4f5a7eea Fix deadlock in multithreaded fork in OS X.
On OS X, we rely on the zone machinery to call our prefork and postfork
handlers.

In zone_force_unlock, we call jemalloc_postfork_child, reinitializing all our
mutexes regardless of state, since the mutex implementation will assert if the
tid of the unlocker is different from that of the locker.  This has the effect
of unlocking the mutexes, but also fails to wake any threads waiting on them in
the parent.

To fix this, we track whether or not we're the parent or child after the fork,
and unlock or reinit as appropriate.

This resolves #895.
2017-07-10 18:17:12 -07:00
Qi Wang
cb032781bd Add extent_grow_mtx in pre_ / post_fork handlers.
This fixed the issue that could cause the child process to stuck after fork.
2017-06-29 17:01:18 -07:00
Qi Wang
aa363f9388 Fix pthread_sigmask() usage to block all signals. 2017-06-26 11:27:21 -07:00
Qi Wang
57beeb2fcb Switch ctl to explicitly use tsd instead of tsdn. 2017-06-23 13:27:53 -07:00
Qi Wang
425463a446 Check arena in current context in pre_reentrancy. 2017-06-23 13:27:53 -07:00
Qi Wang
d6eb8ac8f3 Set reentrancy when invoking customized extent hooks.
Customized extent hooks may malloc / free thus trigger reentry.  Support this
behavior by adding reentrancy on hook functions.
2017-06-23 13:27:53 -07:00
Jason Evans
d49ac4c709 Fix assertion typos.
Reported by Conrad Meyer.
2017-06-23 11:48:00 -07:00
Qi Wang
a3f4977217 Add thread name for background threads. 2017-06-23 10:54:54 -07:00
Qi Wang
52fc887b49 Avoid inactivity_check within background threads.
Passing is_background_thread down the decay path, so that background thread
itself won't attempt inactivity_check.  This fixes an issue with background
thread doing trylock on a mutex it already owns.
2017-06-22 16:53:58 -07:00
Jason Evans
37f3fa0941 Mask signals during background thread creation.
This prevents signals from being inadvertently delivered to background
threads.
2017-06-20 17:47:38 -07:00
Qi Wang
d35c037e03 Clear tcache_ql after fork in child. 2017-06-19 21:53:07 -07:00
Qi Wang
9b1befabbb Add minimal initialized TSD.
We use the minimal_initilized tsd (which requires no cleanup) for free()
specifically, if tsd hasn't been initialized yet.

Any other activity will transit the state from minimal to normal.  This is to
workaround the case where a thread has no malloc calls in its lifetime until
during thread termination, free() happens after tls destructors.
2017-06-15 17:55:53 -07:00
Qi Wang
ae93fb08e2 Pass tsd to tcache_flush(). 2017-06-15 17:55:53 -07:00
Qi Wang
84f6c2cae0 Log decay->nunpurged before purging.
During purging, we may unlock decay->mtx.  Therefore we should finish logging
decay related counters before attempt to purge.
2017-06-14 20:18:02 -07:00
Qi Wang
a4d6fe73cf Only abort on dlsym when necessary.
If neither background_thread nor lazy_lock is in use, do not abort on dlsym
errors.
2017-06-14 13:27:41 -07:00
Qi Wang
d955d6f2be Fix extent_hooks in extent_grow_retained().
This issue caused the default extent alloc function to be incorrectly
used even when arena.<i>.extent_hooks is set.  This bug was introduced
by 411697adcd (Use exponential series to
size extents.), which was first released in 5.0.0.
2017-06-14 09:34:29 -07:00
Qi Wang
394df9519d Combine background_thread started / paused into state. 2017-06-12 08:56:14 -07:00
Qi Wang
b83b5ad44a Not re-enable background thread after fork.
Avoid calling pthread_create in postfork handlers.
2017-06-12 08:56:14 -07:00
Qi Wang
464cb60490 Move background thread creation to background_thread_0.
To avoid complications, avoid invoking pthread_create "internally", instead rely
on thread0 to launch new threads, and also terminating threads when asked.
2017-06-12 08:56:14 -07:00
Jason Evans
13685ab1b7 Normalize background thread configuration.
Also fix a compilation error #ifndef JEMALLOC_PTHREAD_CREATE_WRAPPER.
2017-06-08 23:01:26 -07:00
Jason Evans
94d655b8bd Update a UTRACE() size argument. 2017-06-08 15:33:52 -07:00
Qi Wang
5642f03cae Add internal tsd for background_thread. 2017-06-08 10:02:18 -07:00
Qi Wang
73713fbb27 Drop high rank locks when creating threads.
Avoid holding arenas_lock and background_thread_lock when creating background
threads, because pthread_create may take internal locks, and potentially cause
deadlock with jemalloc internal locks.
2017-06-08 10:02:18 -07:00
Qi Wang
00869e39a3 Make tsd no-cleanup during tsd reincarnation.
Since tsd cleanup isn't guaranteed when reincarnated, we set up tsd in a way
that needs no cleanup, by making it going through slow path instead.
2017-06-07 11:03:49 -07:00
Qi Wang
29c2577ee0 Remove assertions on extent_hooks being default.
It's possible to customize the extent_hooks while still using part of the
default implementation.
2017-06-05 10:56:40 -07:00
Qi Wang
3a813946fb Take background thread lock when setting extent hooks. 2017-06-05 10:56:25 -07:00
Qi Wang
530c07a45b Set reentrancy level to 1 during init.
This makes sure we go down slow path w/ a0 in init.
2017-06-02 12:59:21 -07:00
Qi Wang
340071f0cf Set isthreaded when enabling background_thread. 2017-06-01 17:34:49 -07:00
Qi Wang
c84ec3e9da Fix background thread creation.
The state initialization should be done before pthread_create.
2017-06-01 09:00:07 -07:00
Jason Evans
b511232fcd Refactor/fix background_thread/percpu_arena bootstrapping.
Refactor bootstrapping such that dlsym() is called during the
bootstrapping phase that can tolerate reentrant allocation.
2017-06-01 08:55:27 -07:00
David Goldblatt
fa35463d56 Witness assertions: only assert locklessness when non-reentrant.
Previously we could still hit these assertions down error paths or in the
extended API.
2017-05-31 17:02:54 -07:00
Qi Wang
508f54b02b Use real pthread_create for creating background threads. 2017-05-31 16:48:13 -07:00
David Goldblatt
8261e581be Header refactoring: Pull size helpers out of jemalloc module. 2017-05-31 13:08:45 -07:00
David Goldblatt
041e041e1f Header refactoring: unify and de-catchall mutex_pool. 2017-05-31 13:08:45 -07:00
David Goldblatt
98774e64a4 Header refactoring: unify and de-catchall extent_mmap module. 2017-05-31 13:08:45 -07:00
David Goldblatt
93284bb53d Header refactoring: unify and de-catchall extent_dss. 2017-05-31 13:08:45 -07:00
David Goldblatt
44f9bd147a Header refactoring: unify and de-catchall rtree module. 2017-05-31 13:08:45 -07:00
Jason Evans
10d090aae9 Pass the O_CLOEXEC flag to open(2).
This resolves #528.
2017-05-31 08:50:35 -07:00
Qi Wang
66813916b5 Track background thread status separately at fork.
Use a separate boolean to track the enabled status, instead of leaving the
global background thread status inconsistent.
2017-05-31 08:27:31 -07:00
Qi Wang
2e4d1a4e30 Output total_wait_ns for bin mutexes. 2017-05-30 22:25:11 -07:00
Qi Wang
7578b0e929 Explicitly say so when aborting on opt_abort_conf. 2017-05-30 17:37:35 -07:00
Jason Evans
c606a87d2a Add the --disable-thp option to support cross compiling.
This resolves #669.
2017-05-30 11:30:54 -07:00
Qi Wang
bf6673a070 Fix npages during arena_decay_epoch_advance().
We do not lock extents while advancing epoch.  This change makes sure that we
only read npages from extents once in order to avoid any inconsistency.
2017-05-30 10:26:53 -07:00
Jason Evans
168793a1c1 Fix extent_grow_next management.
Fix management of extent_grow_next to serialize operations that may grow
retained memory.  This assures that the sizes of the newly allocated
extents correspond to the size classes in the intended growth sequence.

Fix management of extent_grow_next to skip size classes if a request is
too large to be satisfied by the next size in the growth sequence.  This
avoids the potential for an arbitrary number of requests to bypass
triggering extent_grow_next increases.

This resolves #858.
2017-05-29 17:27:18 -07:00
Jason Evans
a16114866a Fix OOM paths in extent_grow_retained(). 2017-05-29 17:27:18 -07:00
Qi Wang
d5ef5ae934 Add opt.stats_print_opts.
The value is passed to atexit(3)-triggered malloc_stats_print() calls.
2017-05-29 11:54:00 -07:00
Qi Wang
b86d271cbf Added opt_abort_conf: abort on invalid config options. 2017-05-26 21:14:28 -07:00
Qi Wang
927239b910 Cleanup smoothstep.sh / .h.
h_step_sum was used to compute moving sum.  Not in use anymore.
2017-05-25 16:52:10 -07:00
Qi Wang
1df18d7c83 Fix stats.mapped during deallocation. 2017-05-24 15:57:46 -07:00
David Goldblatt
18ecbfa89e Header refactoring: unify and de-catchall mutex module 2017-05-24 15:27:30 -07:00
David Goldblatt
9f822a1fd7 Header refactoring: unify and de-catchall witness code. 2017-05-24 15:27:30 -07:00
Jason Evans
196a53c2ae Do not assume dss never decreases.
An sbrk() caller outside jemalloc can decrease the dss, so add a
separate atomic boolean to explicitly track whether jemalloc is
concurrently calling sbrk(), rather than depending on state outside
jemalloc's full control.

This resolves #802.
2017-05-23 15:31:29 -07:00
Jason Evans
9b1038d19c Do not hold the base mutex while calling extent hooks.
Drop the base mutex while allocating new base blocks, because extent
allocation can enter code that prohibits holding non-core mutexes, e.g.
the extent_[d]alloc() and extent_purge_forced_wrapper() calls in
extent_alloc_dss().

This partially resolves #802.
2017-05-23 15:31:29 -07:00
Qi Wang
eeefdf3ce8 Fix # of unpurged pages in decay algorithm.
When # of dirty pages move below npages_limit (e.g. they are reused), we should
not lower number of unpurged pages because that would cause the reused pages to
be double counted in the backlog (as a result, decay happen slower than it
should).  Instead, set number of unpurged to the greater of current npages and
npages_limit.

Added an assertion: the ceiling # of pages should be greater than npages_limit.
2017-05-23 13:48:30 -07:00
Qi Wang
0eae838b0d Check for background thread inactivity on extents_dalloc.
To avoid background threads sleeping forever with idle arenas, we eagerly check
background threads' sleep time after extents_dalloc, and signal the thread if
necessary.
2017-05-23 12:26:20 -07:00
Qi Wang
5f5ed2198e Add profiling for the background thread mutex. 2017-05-23 12:26:20 -07:00
Qi Wang
2bee0c6251 Add background thread related stats. 2017-05-23 12:26:20 -07:00
Qi Wang
b693c7868e Implementing opt.background_thread.
Added opt.background_thread to enable background threads, which handles purging
currently.  When enabled, decay ticks will not trigger purging (which will be
left to the background threads).  We limit the max number of threads to NCPUs.
When percpu arena is enabled, set CPU affinity for the background threads as
well.

The sleep interval of background threads is dynamic and determined by computing
number of pages to purge in the future (based on backlog).
2017-05-23 12:26:20 -07:00
David Goldblatt
3f685e8824 Protect the rtree/extent interactions with a mutex pool.
Instead of embedding a lock bit in rtree leaf elements, we associate extents
with a small set of mutexes.  This gets us two things:

- We can use the system mutexes.  This (hypothetically) protects us from
  priority inversion, and lets us stop doing a backoff/sleep loop, instead
  opting for precise wakeups from the mutex.
- Cuts down on the number of mutex acquisitions we have to do (from 4 in the
  worst case to two).

We end up simplifying most of the rtree code (which no longer has to deal with
locking or concurrency at all), at the cost of additional complexity in the
extent code: since the mutex protecting the rtree leaf elements is determined by
reading the extent out of those elements, the initial read is racy, so that we
may acquire an out of date mutex.  We re-check the extent in the leaf after
acquiring the mutex to protect us from this race.
2017-05-19 14:21:27 -07:00
David Goldblatt
26c792e61a Allow mutexes to take a lock ordering enum at construction.
This lets us specify whether and how mutexes of the same rank are allowed to be
acquired.  Currently, we only allow two polices (only a single mutex at a given
rank at a time, and mutexes acquired in ascending order), but we can plausibly
allow more (e.g. the "release uncontended mutexes before blocking").
2017-05-19 14:21:27 -07:00
Jason Evans
6e62c62862 Refactor *decay_time into *decay_ms.
Support millisecond resolution for decay times.  Among other use cases
this makes it possible to specify a short initial dirty-->muzzy decay
phase, followed by a longer muzzy-->clean decay phase.

This resolves #812.
2017-05-18 11:33:45 -07:00
Qi Wang
baf3e294e0 Add stats: arena uptime. 2017-05-18 10:04:28 -07:00
Jason Evans
18a83681cf Refactor (MALLOCX_ARENA_MAX + 1) to be MALLOCX_ARENA_LIMIT.
This resolves #673.
2017-05-14 10:14:23 -07:00
Jason Evans
909f0482e4 Automatically generate private symbol name mangling macros.
Rather than using a manually maintained list of internal symbols to
drive name mangling, add a compilation phase to automatically extract
the list of internal symbols.

This resolves #677.
2017-05-11 23:06:54 -07:00
Jason Evans
a268af5085 Stop depending on JEMALLOC_N() for function interception during testing.
Instead, always define function pointers for interceptable functions,
but mark them const unless testing, so that the compiler can optimize
out the pointer dereferences.
2017-05-11 23:06:54 -07:00
Qi Wang
fc1aaf13fe Revert "Use trylock in tcache_bin_flush when possible."
This reverts commit 8584adc451.  Production
results not favorable.  Will investigate separately.
2017-05-01 14:49:42 -07:00
David Goldblatt
209f2926b8 Header refactoring: tsd - cleanup and dependency breaking.
This removes the tsd macros (which are used only for tsd_t in real builds).  We
break up the circular dependencies involving tsd.

We also move all tsd access through getters and setters.  This allows us to
assert that we only touch data when tsd is in a valid state.

We simplify the usages of the x macro trick, removing all the customizability
(get/set, init, cleanup), moving the lifetime logic to tsd_init and tsd_cleanup.
This lets us make initialization order independent of order within tsd_t.
2017-05-01 10:49:56 -07:00
Jason Evans
c86c8f4ffb Add extent_destroy_t and use it during arena destruction.
Add the extent_destroy_t extent destruction hook to extent_hooks_t, and
use it during arena destruction.  This hook explicitly communicates to
the callee that the extent must be destroyed or tracked for later reuse,
lest it be permanently leaked.  Prior to this change, retained extents
could unintentionally be leaked if extent retention was enabled.

This resolves #560.
2017-04-29 09:24:12 -07:00
Jason Evans
b9ab04a191 Refactor !opt.munmap to opt.retain. 2017-04-29 09:24:12 -07:00
Qi Wang
5c56603e91 Inline tcache_bin_flush_small_impl / _large_impl. 2017-04-27 17:49:39 -07:00
Qi Wang
8584adc451 Use trylock in tcache_bin_flush when possible.
During tcache gc, use tcache_bin_try_flush_small / _large so that we can skip
items with their bins locked already.
2017-04-25 17:21:33 -07:00
Qi Wang
e2aad5e810 Remove redundant extent lookup in tcache_bin_flush_large. 2017-04-25 16:50:12 -07:00
Qi Wang
05775a3736 Avoid prof_dump during reentrancy. 2017-04-25 12:54:36 -07:00
David Goldblatt
268843ac68 Header refactoring: pages.h - unify and remove from catchall. 2017-04-25 09:51:38 -07:00
David Goldblatt
dab4beb277 Header refactoring: hash - unify and remove from catchall. 2017-04-25 09:51:38 -07:00
David Goldblatt
89e2d3c12b Header refactoring: ctl - unify and remove from catchall.
In order to do this, we introduce the mutex_prof module, which breaks a circular
dependency between ctl and prof.
2017-04-25 09:51:38 -07:00
Jason Evans
c67c3e4a63 Replace --disable-munmap with opt.munmap.
Control use of munmap(2) via a run-time option rather than a
compile-time option (with the same per platform default).  The old
behavior of --disable-munmap can be achieved with
--with-malloc-conf=munmap:false.

This partially resolves #580.
2017-04-24 20:37:16 -07:00
Qi Wang
cf6035e1ee Use trylock in arena_decay_impl().
If another thread is working on decay, we don't have to wait for the mutex.
2017-04-24 13:23:55 -07:00
Qi Wang
f970c497dc Implement malloc_mutex_trylock() w/ proper stats update. 2017-04-24 13:23:55 -07:00
David Goldblatt
31b43219db Header refactoring: size_classes module - remove from the catchall 2017-04-24 10:33:21 -07:00
David Goldblatt
68da2361d2 Header refactoring: ckh module - remove from the catchall and unify. 2017-04-24 10:33:21 -07:00
David Goldblatt
bf2dc7e678 Header refactoring: ticker module - remove from the catchall and unify. 2017-04-24 10:33:21 -07:00
David Goldblatt
fa3ad730c4 Header refactoring: prng module - remove from the catchall and unify. 2017-04-24 10:33:21 -07:00
David Goldblatt
4d2e4bf5eb Get rid of most of the various inline macros. 2017-04-24 10:33:21 -07:00
David Goldblatt
425253e2cd Enable -Wundef, when supported.
This can catch bugs in which one header defines a numeric constant, and another
uses it without including the defining header. Undefined preprocessor symbols
expand to '0', so that this will compile fine, silently doing the math wrong.
2017-04-21 17:03:56 -07:00
Jason Evans
3823effe12 Remove --enable-ivsalloc.
Continue to use ivsalloc() when --enable-debug is specified (and add
assertions to guard against 0 size), but stop providing a documented
explicit semantics-changing band-aid to dodge undefined behavior in
sallocx() and malloc_usable_size().  ivsalloc() remains compiled in,
unlike when #211 restored --enable-ivsalloc, and if
JEMALLOC_FORCE_IVSALLOC is defined during compilation, sallocx() and
malloc_usable_size() will still use ivsalloc().

This partially resolves #580.
2017-04-21 14:34:35 -07:00
Jason Evans
b2a8453a3f Remove --disable-tls.
This option is no longer useful, because TLS is correctly configured
automatically on all supported platforms.

This partially resolves #580.
2017-04-21 11:12:29 -07:00
Jim Chen
ae248a2160 Use openat syscall if available
Some architectures like AArch64 may not have the open syscall because it
was superseded by the openat syscall, so check and use SYS_openat if
SYS_open is not available.

Additionally, Android headers for AArch64 define SYS_open to __NR_open,
even though __NR_open is undefined. Undefine SYS_open in that case so
SYS_openat is used.
2017-04-21 10:58:42 -07:00
Jason Evans
4403c9ab44 Remove --disable-tcache.
Simplify configuration by removing the --disable-tcache option, but
replace the testing for that configuration with
--with-malloc-conf=tcache:false.

Fix the thread.arena and thread.tcache.flush mallctls to work correctly
if tcache is disabled.

This partially resolves #580.
2017-04-21 10:06:12 -07:00