Commit Graph

1288 Commits

Author SHA1 Message Date
Yinan Zhang
4afd709d1f Restructure setters for profiling info
Explicitly define three setters:

- `prof_tctx_reset()`: set `prof_tctx` to `1U`, if we don't know in
advance whether the allocation is large or not;
- `prof_tctx_reset_sampled()`: set `prof_tctx` to `1U`, if we already
know in advance that the allocation is large;
- `prof_info_set()`: set a real `prof_tctx`, and also set other
profiling info e.g. the allocation time.

Code structure wise, the prof level is kept as a thin wrapper, the
large level only provides low level setter APIs, and the arena level
carries out the main logic.
2019-12-17 10:01:28 -08:00
Yinan Zhang
1d01e4c770 Initialization utilities for nstime 2019-12-16 16:08:56 -08:00
Qi Wang
dd649c9485 Optimize away the tsd_fast() check on fastpath.
Fold the tsd_state check onto the event threshold check.  The fast threshold is
set to 0 when tsd switch to non-nominal.

The fast_threshold can be reset by remote threads, to refect the non nominal tsd
state change.
2019-12-11 23:44:20 -08:00
Qi Wang
1decf958d1 Fix incorrect usage of cassert. 2019-12-11 14:02:59 -08:00
Yinan Zhang
45836d7fd3 Pass nstime_t pointer for profiling 2019-12-11 11:38:16 -08:00
Yinan Zhang
7d2bac5a38 Refactor destroy code path for prof_tctx 2019-12-10 16:31:05 -08:00
Yinan Zhang
055478cca8 Threshold is no longer updated before prof_realloc() 2019-12-10 16:31:05 -08:00
Yinan Zhang
7e3671911f Get rid of old indentation style for prof 2019-12-06 09:47:51 -08:00
Yinan Zhang
dfdd46f6c1 Refactor prof_tctx_t creation 2019-12-06 09:47:51 -08:00
Yinan Zhang
aa1d71fb7a Rename prof_tctx to alloc_tctx in prof_info_t 2019-12-06 09:47:51 -08:00
Yinan Zhang
5e0b090992 No need to pass usize to prof_tctx_set() 2019-12-06 09:47:51 -08:00
David Goldblatt
1b1e76acfe Disable some spuriously-triggering warnings 2019-12-04 13:45:17 -08:00
Yinan Zhang
5c47a30227 Guard C++ aligned APIs 2019-11-25 18:02:16 -08:00
Yinan Zhang
6945371778 Change tsdn to tsd for profiling code path 2019-11-22 16:31:56 -08:00
Yinan Zhang
b55419f9b9 Restructure profiling
Develop new data structure and code logic for holding profiling
related information stored in the extent that may be needed after the
extent is released, which in particular is the case for the
reallocation code path (e.g. in `rallocx()` and `xallocx()`).  The
data structure is a generalization of `prof_tctx_t`: we previously
only copy out the `prof_tctx` before the extent is released, but we
may be in need of additional fields. Currently the only additional
field is the allocation time field, but there may be more fields in
the future.

The restructuring also resolved a bug: `prof_realloc()` mistakenly
passed the new `ptr` to `prof_free_sampled_object()`, but passing in
the `old_ptr` would crash because it's already been released.  Now
the essential profiling information is collectively copied out early
and safely passed to `prof_free_sampled_object()` after the extent is
released.
2019-11-22 16:31:56 -08:00
Mark Santaniello
8b2c2a596d Support C++17 over-aligned allocation
Summary:
Add support for C++17 over-aligned allocation:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0035r4.html

Supporting all 10 operators means we avoid thunking thru libstdc++-v3/libsupc++ and just call jemalloc directly.

It's also worth noting that there is now an aligned *and sized* operator delete:
```
void operator delete(void* ptr, std::size_t size, std::align_val_t al) noexcept;
```

If JeMalloc did not provide this, the default implementation would ignore the size parameter entirely:
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/libsupc%2B%2B/del_opsa.cc#L30-L33

(I must also update ax_cxx_compile_stdcxx.m4 to a newer version with C++17 support.)

Test Plan:
Wrote a simple test that allocates and then deletes an over-aligned type:
```
struct alignas(32) Foo {};
Foo *f;

int main()
{
  f = new Foo;
  delete f;
}
```

Before this change, both new and delete go thru PLT, and we end up calling regular old free:
```
(gdb) disassemble
Dump of assembler code for function main():
...
   0x00000000004029b7 <+55>:    call   0x4022d0 <_ZnwmSt11align_val_t@plt>
...
   0x00000000004029d5 <+85>:    call   0x4022e0 <_ZdlPvmSt11align_val_t@plt>
...
(gdb) s
free (ptr=0x7ffff6408020) at /home/engshare/third-party2/jemalloc/master/src/jemalloc.git-trunk/src/jemalloc.c:2842
2842            if (!free_fastpath(ptr, 0, false)) {
```

After this change, we directly call new/delete and ultimately call sdallocx:
```
(gdb) disassemble
Dump of assembler code for function main():
...
   0x0000000000402b77 <+55>:    call   0x496ca0 <operator new(unsigned long, std::align_val_t)>
...
   0x0000000000402b95 <+85>:    call   0x496e60 <operator delete(void*, unsigned long, std::align_val_t)>
...
(gdb) s
116             je_sdallocx_noflags(ptr, size);
```
2019-11-22 10:14:16 -08:00
Qi Wang
9a3c738009 Refactor arena_bin_malloc_hard(). 2019-11-21 11:41:26 -08:00
Qi Wang
9a7ae3c97f Reduce footprint of bin_t.
Avoid storing mutex_prof_data_t in bin_t.  Added bin_stats_data_t which is used
for reporting bin stats.
2019-11-21 11:08:36 -08:00
Qi Wang
cb1a1f4ada Remove the unnecessary alloc_ctx on free_fastpath. 2019-11-16 13:41:13 -08:00
Qi Wang
7160617107 Add branch hints to free_fastpath.
Explicityly mark the non-slab case unlikely.  Previously there were jumps in the
common case.
2019-11-16 13:41:13 -08:00
Qi Wang
a787d2f5b3 Prefer getaffinity() to detect number of CPUs. 2019-11-15 16:24:38 -08:00
Qi Wang
04cb7d4d6b Bail out early for muzzy decay.
This avoids taking the muzzy decay mutex with the default setting.
2019-11-15 16:24:15 -08:00
Qi Wang
836d7a7e69 Check for large size first in the uncommon case of malloc.
Larger sizes are not that uncommon comparing to !tsd_fast.
2019-11-11 13:30:20 -08:00
Qi Wang
da50d8ce87 Refactor and optimize prof sampling initialization.
Makes the prof sample prng use the tsd prng_state.  This allows us to properly
initialize the sample interval event, without having to create tdata.  As a
result, tdata will be created on demand (when a thread reaches the sample
interval bytes allocated), instead of on the first allocation.
2019-11-11 10:35:37 -08:00
Qi Wang
bc774a3519 Rename tsd->offset_state to tsd->prng_state. 2019-11-11 10:35:37 -08:00
Qi Wang
19a51abf33 Avoid arena->offset_state when tsd not available for prng.
Use stack locals and remove the offset_state in arena.
2019-11-11 10:35:37 -08:00
Nick Desaulniers
d01b425e5d Add -Wimplicit-fallthrough checks if supported
Clang since r369414 (clang-10) can now check -Wimplicit-fallthrough for
C code, and use the GNU C style attribute to denote fallthrough.

Move the test from header only to autoconf. The previous test used
brittle version detection which did not work for newer clang that
supported this feature.

The attribute has to be its own statement, hence the added `;`. It also
can only precede case statements, so the final cases should be
explicitly terminated with break statements.

Fixes commit 3d29d11ac2 ("Clean compilation -Wextra")
Link: 1e0affb6e5
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
2019-11-08 13:03:03 -08:00
Yinan Zhang
43f0ce92d8 Define general purpose tsd_thread_event_init() 2019-11-04 16:07:56 -08:00
Yinan Zhang
97f93fa0f2 Pull tcache GC events into thread event handler 2019-11-04 16:07:56 -08:00
Yinan Zhang
198f02e797 Pull prof_accumbytes into thread event handler 2019-11-04 15:21:16 -08:00
Yinan Zhang
152c0ef954 Build a general purpose thread event handler 2019-11-04 11:15:50 -08:00
RingsC
6924f83cb2 use SYS_openat when available
some architecture like AArch64 may not have the open syscall, but have
openat syscall. so check and use SYS_openat if SYS_openat available if
SYS_open is not supported at init_thp_state.
2019-11-01 13:06:40 -07:00
David T. Goldblatt
de81a4eada Add stats counters for number of zero reallocs 2019-10-29 17:48:44 -07:00
David T. Goldblatt
9cfa805947 Realloc: Make behavior of realloc(ptr, 0) configurable. 2019-10-29 17:48:44 -07:00
David T. Goldblatt
ee961c2310 Merge realloc and rallocx pathways. 2019-10-29 17:48:44 -07:00
Yinan Zhang
bd6e28d6a3 Guard slabcur fetching in extent_util 2019-10-28 17:27:51 -07:00
Yinan Zhang
4786099a3a Increase column width for global malloc/free rate 2019-10-24 14:54:51 -07:00
Yinan Zhang
05681e387a Optimize cache_bin_alloc_easy for malloc fast path
`tcache_bin_info` is not accessed on malloc fast path but the
compiler reserves a register for it, as well as an additional
register for `tcache_bin_info[ind].stack_size`.  The optimization
gets rid of the need for the two registers.
2019-10-21 16:43:45 -07:00
Yinan Zhang
4fe50bc7d0 Fix amd64 MSVC warning 2019-10-18 10:16:29 -07:00
Yinan Zhang
4fbbc817c1 Simplify time setting and getting for prof log 2019-10-16 09:24:52 -07:00
Yinan Zhang
66e07f986d Suppress tdata creation in reentrancy
This change suppresses tdata initialization and prof sample threshold
update in interrupting malloc calls.  Interrupting calls have no need
for tdata.  Delaying tdata creation aligns better with our lazy tdata
creation principle, and it also helps us gain control back from
interrupting calls more quickly and reduces any risk of delegating
tdata creation to an interrupting call.
2019-10-04 08:52:50 -07:00
Yinan Zhang
beb7c16e94 Guard prof_active reset by opt_prof
Set `prof_active` to read-only when `opt_prof` is turned off.
2019-10-02 11:42:53 -07:00
David T. Goldblatt
3d84bd57f4 Arena: Add helper function arena_get_from_extent. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
821dd53a1d Extent -> Eset: Rename arena members. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
e144b21e4b Extent -> Eset: Move fork handling. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
77bbb35a92 Extent -> Eset: Move extent fit functions. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
1210af9a4e Extent -> Eset: Move insertion and removal. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
a42861540e Extents -> Eset: Convert some stats getters. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
820f070c6b Move page quantization to sz module. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
63d1b7a7a7 Extents -> Eset: move extents_state_get. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
b416b96a39 Extents -> Eset: rename/move extents_init. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
e6180fe1b4 Eset: Add a source file.
This will let us move extents_* functions over one by one.
2019-09-23 23:06:27 -07:00
David T. Goldblatt
4e5e43f22e Rename extents_t -> eset_t. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
723ccc6c27 Extents: Split out extent struct. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
41187bdfb0 Extents: Break extent-struct/arena interactions
Specifically, the extent_arena_[g|s]et functions and the address randomization.

These are the only things that tie the extent struct itself to the arena code.
2019-09-23 23:06:27 -07:00
David T. Goldblatt
e7cf84a8dd Rearrange slab data and constants
The constants logically belong in the sc module. The slab data bitmap isn't
really scoped to an arena; move it to its own module.
2019-09-23 23:06:27 -07:00
Qi Wang
ac5185f73e Fix tcache bin stack alignment.
Set the proper alignment when allocating space for the tcache bin stack.
2019-09-13 12:32:29 -07:00
zhxchen17
b7c7df24ba Add max_per_bg_thd stats for per background thread mutexes.
Added a new stats row to aggregate the maximum value of mutex counters for each
background threads.  Given that the per bg thd mutex is not expected to be
contended, this counter is mainly for sanity check / debugging.
2019-09-13 09:23:57 -07:00
zhxchen17
4b76c684bb Add "prof.dump_prefix" to override filename prefixes for dumps. 2019-09-12 22:26:03 -07:00
zhxchen17
242af439b8 Rename "prof_dump_seq_mtx" to "prof_dump_filename_mtx". 2019-09-12 22:26:03 -07:00
Yinan Zhang
93d6151800 Pass tsd down to prof_backtrace() 2019-09-05 10:57:43 -07:00
Yinan Zhang
671f120e26 Fix prof_backtrace() reentrancy level 2019-09-05 10:57:43 -07:00
Qi Wang
785b84e603 Make cache_bin_sz_t unsigned.
The bin size type was made signed only because the low_water could go -1, which
was already removed.
2019-09-04 13:37:07 -07:00
Qi Wang
719583f14a Fix large.nflushes in the merged stats. 2019-08-28 23:37:00 -07:00
Yinan Zhang
adce29c885 Optimize for prof_active off
Move the handling of `prof_active` off case completely to slow path,
so as to reduce register pressure on malloc fast path.
2019-08-27 14:48:56 -07:00
Yinan Zhang
49e6fbce78 Always adjust thread_(de)allocated 2019-08-26 11:56:41 -07:00
Yinan Zhang
57b81c078e Pull thread_(de)allocated out of config_stats 2019-08-26 11:56:41 -07:00
Yinan Zhang
9e031c1d11 Bug fix for prof_active switch
The bug is subtle but critical: if application performs the following
three actions in sequence: (a) turn `prof_active` off, (b) make at
least one allocation that triggers the malloc slow path via the
`if (unlikely(bytes_until_sample < 0))` path, and (c) turn
`prof_active` back on, then the application would never get another
sample (until a very very long time later).

The fix is to properly reset `bytes_until_sample` rather than
throwing it all the way to `SSIZE_MAX`.

A side minor change is to call `prof_active_get_unlocked()` rather
than directly grabbing the `prof_active` variable - it is the very
reason why we defined the `prof_active_get_unlocked()` function.
2019-08-22 13:00:10 -07:00
Qi Wang
0043e68d4c Track low_water == -1 case explicitly.
The -1 value of low_water indicates if the cache has been depleted and
refilled.  Track the status explicitly in the tcache struct.

This allows the fast path to check if (cur_ptr > low_water), instead of >=,
which avoids reaching slow path when the last item is allocated.
2019-08-21 16:00:38 -07:00
Qi Wang
937ca1db9f Store ncached_max * ptr_size in tcache_bin_info.
With the cache bin metadata switched to pointers, ncached_max is usually
accessed and timed by sizeof(ptr). Store the results in tcache_bin_info for
direct access, and add a helper function for the ncached_max value.
2019-08-19 12:23:24 -07:00
Qi Wang
7599c82d48 Redesign the cache bin metadata for fast path.
Implement the pointer-based metadata for tcache bins --
- 3 pointers are maintained to represent each bin;
- 2 of the pointers are compressed on 64-bit;
- is_full / is_empty done through pointer comparison;

Comparing to the previous counter based design --
- fast-path speed up ~15% in benchmarks
- direct pointer comparison and de-reference
- no need to access tcache_bin_info in common case
2019-08-19 12:21:44 -07:00
Qi Wang
9c5c2a2c86 Unify the signature of tcache_flush small and large. 2019-08-14 13:08:23 -07:00
Yinan Zhang
28ed9b9a51 Buffer stats printing
Without buffering `malloc_stats_print` would invoke the write back
call (which could mean an expensive `malloc_write_fd` call) for every
single `printf` (including printing each line break and each leading
tab/space for indentation).
2019-08-13 09:40:11 -07:00
Yinan Zhang
eb70fef8ca Make compact json format as default
Saves 20-50% of the output size.
2019-08-12 13:59:50 -07:00
Yinan Zhang
a219cfcda3 Clear tcache prof_accumbytes in tcache_flush_cache
`tcache->prof_accumbytes` should always be cleared after being
transferred to arena; otherwise the allocations would be double
counted, leading to excessive prof dumps.
2019-08-12 09:08:09 -07:00
Yinan Zhang
ad3f7dbfa0 Buffer prof_log_stop
Make use of the new buffered writer for the output of `prof_log_stop`.
2019-08-12 09:06:01 -07:00
Qi Wang
5934846612 Fix large bin index accessed through cache bin descriptor. 2019-08-11 16:31:12 -07:00
Qi Wang
22746d3c9f Properly dalloc prof nodes with idalloctm.
The prof_alloc_node is allocated through ialloc as internal.  Switch to
idalloctm with tcache and is_internal properly set.
2019-08-09 10:29:49 -07:00
Yinan Zhang
7fc6b1b259 Add buffered writer
The buffered writer adopts a signature identical to `write_cb`,
so that it can be plugged into anywhere `write_cb` appears.
2019-08-09 09:44:29 -07:00
Yinan Zhang
39343555d6 Report stats for tdatas_mtx and prof_dump_mtx 2019-08-09 09:24:16 -07:00
Qi Wang
87e2400cbb Fix tcaches mutex pre- / post-fork handling. 2019-08-08 10:55:32 -07:00
Yinan Zhang
07ce2434bf Refactor profiling
Refactored core profiling codebase into two logical parts:

(a) `prof_data.c`: core internal data structure managing & dumping;
(b) `prof.c`: mutexes & outward-facing APIs.

Some internal functions had to be exposed out, but there are not
that many of them if the modularization is (hopefully) clean enough.
2019-08-07 19:48:28 -07:00
Yinan Zhang
56126d0d2d Refactor prof log
Prof logging is conceptually seperate from core profiling, so
split it out as a module of its own.  There are a few internal
functions that had to be exposed but I think it is a fair trade-off.
2019-08-07 13:53:45 -07:00
Qi Wang
8a94ac25d5 Sanity check on prof dump buffer size. 2019-08-01 17:55:45 -07:00
Yinan Zhang
82b8aaaeb6 Quick fix for prof log printing
The emitter APIs used were incorrect, a side effect of which was
extra lines being printed.
2019-07-30 19:31:28 -07:00
Qi Wang
c9cdc1b27f Limit to exact fit on Windows with retain off.
W/o retain, split and merge are disallowed on Windows.  Avoid doing first-fit
which needs splitting almost always.  Instead, try exact fit only and bail out
early.
2019-07-29 16:19:36 -07:00
Qi Wang
5742473cc8 Revert "Refactor prof log"
This reverts commit 7618b0b8e4.
2019-07-29 14:10:15 -07:00
Qi Wang
1a0503367b Revert "Refactor profiling"
This reverts commit 0b462407ae.
2019-07-29 14:10:15 -07:00
Yinan Zhang
0b462407ae Refactor profiling
Refactored core profiling codebase into two logical parts:

(a) `prof_data.c`: core internal data structure managing & dumping;
(b) `prof.c`: mutexes & outward-facing APIs.

Some internal functions had to be exposed out, but there are not
that many of them if the modularization is (hopefully) clean enough.
2019-07-29 13:55:00 -07:00
Yinan Zhang
7618b0b8e4 Refactor prof log
`prof.c` is growing too long, so trying to modularize it.  There are
a few internal functions that had to be exposed but I think it is a
fair trade-off.
2019-07-29 13:55:00 -07:00
Qi Wang
85f0cb2d0c Add indent to individual options for confirm_conf. 2019-07-25 17:00:31 -07:00
Qi Wang
bc0998a905 Invoke arena_dalloc_promoted() properly w/o tcache.
When tcache was disabled, the dalloc promoted case was missing.
2019-07-24 18:30:54 -07:00
Qi Wang
1d148f353a Optimize max_active_fit in first_fit.
Stop scanning once reached the first max_active_fit size.
2019-07-24 11:28:45 -07:00
Qi Wang
4e36ce34c1 Track the leaked VM space via the abandoned_vm counter.
The counter is 0 unless metadata allocation failed (indicates OOM), and is
mainly for sanity checking.
2019-07-24 11:24:22 -07:00
Qi Wang
42807fcd9e extent_dalloc instead of leak when register fails.
extent_register may only fail if the underlying extent and region got stolen /
coalesced before we lock.  Avoid doing extent_leak (which purges the region)
since we don't really own the region.
2019-07-23 22:34:45 -07:00
Qi Wang
57dbab5d6b Avoid leaking extents / VM when split is not supported.
This can only happen on Windows and with opt.retain disabled (which isn't the
default).  The solution is suboptimal, however not a common case as retain is
the long term plan for all platforms anyway.
2019-07-23 22:18:55 -07:00
Qi Wang
9a86c65abc Implement retain on Windows.
The VirtualAlloc and VirtualFree APIs are different because MEM_DECOMMIT cannot
be used across multiple VirtualAlloc regions.  To properly support decommit,
only allow merge / split within the same region -- this is done by tracking the
"is_head" state of extents and not merging cross-region.

Add a new state is_head (only relevant for retain && !maps_coalesce), which is
true for the first extent in each VirtualAlloc region.  Determine if two extents
can be merged based on the head state, and use serial numbers for sanity checks.
2019-07-23 22:18:55 -07:00
Qi Wang
f32f23d6cc Fix posix_memalign with input size 0.
Return a valid pointer instead of failed assertion.
2019-07-18 00:43:23 -07:00
Yinan Zhang
e0a0c8d4bf Fix a bug in prof_dump_write
The original logic can be disastrous if `PROF_DUMP_BUFSIZE` is less
than `slen` -- `prof_dump_buf_end + slen <= PROF_DUMP_BUFSIZE` would
always be `false`, so `memcpy` would always try to copy
`PROF_DUMP_BUFSIZE - prof_dump_buf_end` chars, which can be
dangerous: in the last round of the `while` loop it would not only
illegally read the memory beyond `s` (which might not always be
disastrous), but it would also illegally overwrite the memory beyond
`prof_dump_buf` (which can be pretty disastrous).  `slen` probably
has never gone beyond `PROF_DUMP_BUFSIZE` so we were just lucky.
2019-07-16 15:15:32 -07:00
Yinan Zhang
d26636d566 Fix logic in printing
`cbopaque` can now be overriden without overriding `write_cb` in
the first place.  (Otherwise there would be no need to have the
`cbopaque` parameter in `malloc_message`.)
2019-07-16 14:54:23 -07:00