Commit Graph

1201 Commits

Author SHA1 Message Date
Yinan Zhang
05681e387a Optimize cache_bin_alloc_easy for malloc fast path
`tcache_bin_info` is not accessed on malloc fast path but the
compiler reserves a register for it, as well as an additional
register for `tcache_bin_info[ind].stack_size`.  The optimization
gets rid of the need for the two registers.
2019-10-21 16:43:45 -07:00
Yinan Zhang
4fe50bc7d0 Fix amd64 MSVC warning 2019-10-18 10:16:29 -07:00
Yinan Zhang
4fbbc817c1 Simplify time setting and getting for prof log 2019-10-16 09:24:52 -07:00
Yinan Zhang
66e07f986d Suppress tdata creation in reentrancy
This change suppresses tdata initialization and prof sample threshold
update in interrupting malloc calls.  Interrupting calls have no need
for tdata.  Delaying tdata creation aligns better with our lazy tdata
creation principle, and it also helps us gain control back from
interrupting calls more quickly and reduces any risk of delegating
tdata creation to an interrupting call.
2019-10-04 08:52:50 -07:00
Yinan Zhang
beb7c16e94 Guard prof_active reset by opt_prof
Set `prof_active` to read-only when `opt_prof` is turned off.
2019-10-02 11:42:53 -07:00
David T. Goldblatt
3d84bd57f4 Arena: Add helper function arena_get_from_extent. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
821dd53a1d Extent -> Eset: Rename arena members. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
e144b21e4b Extent -> Eset: Move fork handling. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
77bbb35a92 Extent -> Eset: Move extent fit functions. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
1210af9a4e Extent -> Eset: Move insertion and removal. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
a42861540e Extents -> Eset: Convert some stats getters. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
820f070c6b Move page quantization to sz module. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
63d1b7a7a7 Extents -> Eset: move extents_state_get. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
b416b96a39 Extents -> Eset: rename/move extents_init. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
e6180fe1b4 Eset: Add a source file.
This will let us move extents_* functions over one by one.
2019-09-23 23:06:27 -07:00
David T. Goldblatt
4e5e43f22e Rename extents_t -> eset_t. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
723ccc6c27 Extents: Split out extent struct. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
41187bdfb0 Extents: Break extent-struct/arena interactions
Specifically, the extent_arena_[g|s]et functions and the address randomization.

These are the only things that tie the extent struct itself to the arena code.
2019-09-23 23:06:27 -07:00
David T. Goldblatt
e7cf84a8dd Rearrange slab data and constants
The constants logically belong in the sc module. The slab data bitmap isn't
really scoped to an arena; move it to its own module.
2019-09-23 23:06:27 -07:00
Qi Wang
ac5185f73e Fix tcache bin stack alignment.
Set the proper alignment when allocating space for the tcache bin stack.
2019-09-13 12:32:29 -07:00
zhxchen17
b7c7df24ba Add max_per_bg_thd stats for per background thread mutexes.
Added a new stats row to aggregate the maximum value of mutex counters for each
background threads.  Given that the per bg thd mutex is not expected to be
contended, this counter is mainly for sanity check / debugging.
2019-09-13 09:23:57 -07:00
zhxchen17
4b76c684bb Add "prof.dump_prefix" to override filename prefixes for dumps. 2019-09-12 22:26:03 -07:00
zhxchen17
242af439b8 Rename "prof_dump_seq_mtx" to "prof_dump_filename_mtx". 2019-09-12 22:26:03 -07:00
Yinan Zhang
93d6151800 Pass tsd down to prof_backtrace() 2019-09-05 10:57:43 -07:00
Yinan Zhang
671f120e26 Fix prof_backtrace() reentrancy level 2019-09-05 10:57:43 -07:00
Qi Wang
785b84e603 Make cache_bin_sz_t unsigned.
The bin size type was made signed only because the low_water could go -1, which
was already removed.
2019-09-04 13:37:07 -07:00
Qi Wang
719583f14a Fix large.nflushes in the merged stats. 2019-08-28 23:37:00 -07:00
Yinan Zhang
adce29c885 Optimize for prof_active off
Move the handling of `prof_active` off case completely to slow path,
so as to reduce register pressure on malloc fast path.
2019-08-27 14:48:56 -07:00
Yinan Zhang
49e6fbce78 Always adjust thread_(de)allocated 2019-08-26 11:56:41 -07:00
Yinan Zhang
57b81c078e Pull thread_(de)allocated out of config_stats 2019-08-26 11:56:41 -07:00
Yinan Zhang
9e031c1d11 Bug fix for prof_active switch
The bug is subtle but critical: if application performs the following
three actions in sequence: (a) turn `prof_active` off, (b) make at
least one allocation that triggers the malloc slow path via the
`if (unlikely(bytes_until_sample < 0))` path, and (c) turn
`prof_active` back on, then the application would never get another
sample (until a very very long time later).

The fix is to properly reset `bytes_until_sample` rather than
throwing it all the way to `SSIZE_MAX`.

A side minor change is to call `prof_active_get_unlocked()` rather
than directly grabbing the `prof_active` variable - it is the very
reason why we defined the `prof_active_get_unlocked()` function.
2019-08-22 13:00:10 -07:00
Qi Wang
0043e68d4c Track low_water == -1 case explicitly.
The -1 value of low_water indicates if the cache has been depleted and
refilled.  Track the status explicitly in the tcache struct.

This allows the fast path to check if (cur_ptr > low_water), instead of >=,
which avoids reaching slow path when the last item is allocated.
2019-08-21 16:00:38 -07:00
Qi Wang
937ca1db9f Store ncached_max * ptr_size in tcache_bin_info.
With the cache bin metadata switched to pointers, ncached_max is usually
accessed and timed by sizeof(ptr). Store the results in tcache_bin_info for
direct access, and add a helper function for the ncached_max value.
2019-08-19 12:23:24 -07:00
Qi Wang
7599c82d48 Redesign the cache bin metadata for fast path.
Implement the pointer-based metadata for tcache bins --
- 3 pointers are maintained to represent each bin;
- 2 of the pointers are compressed on 64-bit;
- is_full / is_empty done through pointer comparison;

Comparing to the previous counter based design --
- fast-path speed up ~15% in benchmarks
- direct pointer comparison and de-reference
- no need to access tcache_bin_info in common case
2019-08-19 12:21:44 -07:00
Qi Wang
9c5c2a2c86 Unify the signature of tcache_flush small and large. 2019-08-14 13:08:23 -07:00
Yinan Zhang
28ed9b9a51 Buffer stats printing
Without buffering `malloc_stats_print` would invoke the write back
call (which could mean an expensive `malloc_write_fd` call) for every
single `printf` (including printing each line break and each leading
tab/space for indentation).
2019-08-13 09:40:11 -07:00
Yinan Zhang
eb70fef8ca Make compact json format as default
Saves 20-50% of the output size.
2019-08-12 13:59:50 -07:00
Yinan Zhang
a219cfcda3 Clear tcache prof_accumbytes in tcache_flush_cache
`tcache->prof_accumbytes` should always be cleared after being
transferred to arena; otherwise the allocations would be double
counted, leading to excessive prof dumps.
2019-08-12 09:08:09 -07:00
Yinan Zhang
ad3f7dbfa0 Buffer prof_log_stop
Make use of the new buffered writer for the output of `prof_log_stop`.
2019-08-12 09:06:01 -07:00
Qi Wang
5934846612 Fix large bin index accessed through cache bin descriptor. 2019-08-11 16:31:12 -07:00
Qi Wang
22746d3c9f Properly dalloc prof nodes with idalloctm.
The prof_alloc_node is allocated through ialloc as internal.  Switch to
idalloctm with tcache and is_internal properly set.
2019-08-09 10:29:49 -07:00
Yinan Zhang
7fc6b1b259 Add buffered writer
The buffered writer adopts a signature identical to `write_cb`,
so that it can be plugged into anywhere `write_cb` appears.
2019-08-09 09:44:29 -07:00
Yinan Zhang
39343555d6 Report stats for tdatas_mtx and prof_dump_mtx 2019-08-09 09:24:16 -07:00
Qi Wang
87e2400cbb Fix tcaches mutex pre- / post-fork handling. 2019-08-08 10:55:32 -07:00
Yinan Zhang
07ce2434bf Refactor profiling
Refactored core profiling codebase into two logical parts:

(a) `prof_data.c`: core internal data structure managing & dumping;
(b) `prof.c`: mutexes & outward-facing APIs.

Some internal functions had to be exposed out, but there are not
that many of them if the modularization is (hopefully) clean enough.
2019-08-07 19:48:28 -07:00
Yinan Zhang
56126d0d2d Refactor prof log
Prof logging is conceptually seperate from core profiling, so
split it out as a module of its own.  There are a few internal
functions that had to be exposed but I think it is a fair trade-off.
2019-08-07 13:53:45 -07:00
Qi Wang
8a94ac25d5 Sanity check on prof dump buffer size. 2019-08-01 17:55:45 -07:00
Yinan Zhang
82b8aaaeb6 Quick fix for prof log printing
The emitter APIs used were incorrect, a side effect of which was
extra lines being printed.
2019-07-30 19:31:28 -07:00
Qi Wang
c9cdc1b27f Limit to exact fit on Windows with retain off.
W/o retain, split and merge are disallowed on Windows.  Avoid doing first-fit
which needs splitting almost always.  Instead, try exact fit only and bail out
early.
2019-07-29 16:19:36 -07:00
Qi Wang
5742473cc8 Revert "Refactor prof log"
This reverts commit 7618b0b8e4.
2019-07-29 14:10:15 -07:00