Commit Graph

1124 Commits

Author SHA1 Message Date
David Goldblatt
d78fe241ac Extent -> Ehooks: Move commit and decommit hooks. 2019-12-20 10:18:40 -08:00
David Goldblatt
5459ec9dae Extent -> Ehooks: Move destroy hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
bac8e2e5a6 Extent -> Ehooks: Move dalloc hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
dc8b4e6e13 Extent -> Ehooks: Move alloc hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
703fbc0ff5 Introduce unsafe reentrancy guards.
We have to work to circumvent the safety checks in pre_reentrancy when going
down extent hook pathways.  Instead, let's explicitly have checked and unchecked
guards.
2019-12-20 10:18:40 -08:00
David Goldblatt
ae0d8e8591 Move extent ehook calls into ehooks 2019-12-20 10:18:40 -08:00
David Goldblatt
ba8b9ecbcb Add ehooks module 2019-12-20 10:18:40 -08:00
David Goldblatt
837119a948 base_structs.h: Remove some mid-line tabs. 2019-12-20 10:18:40 -08:00
David Goldblatt
9f6eb09585 Extents: Eagerly initialize extent hooks.
When deferred initialization was added, initializing required copying
sizeof(extent_hooks_t) bytes after a pointer chase. Today, it's just a single
pointer loaded from the base_t. In subsequent diffs, we'll get rid of even that.
2019-12-20 10:18:40 -08:00
David Goldblatt
4278f84603 Move extent hook getters/setters to arena.c
This is where they're logically scoped; they access arena data.
2019-12-20 10:18:40 -08:00
Qi Wang
d5031ea824 Allow dallocx and sdallocx after tsd destruction.
After a thread turns into purgatory / reincarnated state, still allow dallocx
and sdallocx to function normally.
2019-12-19 11:17:03 -08:00
Yinan Zhang
4afd709d1f Restructure setters for profiling info
Explicitly define three setters:

- `prof_tctx_reset()`: set `prof_tctx` to `1U`, if we don't know in
advance whether the allocation is large or not;
- `prof_tctx_reset_sampled()`: set `prof_tctx` to `1U`, if we already
know in advance that the allocation is large;
- `prof_info_set()`: set a real `prof_tctx`, and also set other
profiling info e.g. the allocation time.

Code structure wise, the prof level is kept as a thin wrapper, the
large level only provides low level setter APIs, and the arena level
carries out the main logic.
2019-12-17 10:01:28 -08:00
Yinan Zhang
1d01e4c770 Initialization utilities for nstime 2019-12-16 16:08:56 -08:00
Qi Wang
dd649c9485 Optimize away the tsd_fast() check on fastpath.
Fold the tsd_state check onto the event threshold check.  The fast threshold is
set to 0 when tsd switch to non-nominal.

The fast_threshold can be reset by remote threads, to refect the non nominal tsd
state change.
2019-12-11 23:44:20 -08:00
Yinan Zhang
45836d7fd3 Pass nstime_t pointer for profiling 2019-12-11 11:38:16 -08:00
Yinan Zhang
7d2bac5a38 Refactor destroy code path for prof_tctx 2019-12-10 16:31:05 -08:00
Yinan Zhang
055478cca8 Threshold is no longer updated before prof_realloc() 2019-12-10 16:31:05 -08:00
Yinan Zhang
7e3671911f Get rid of old indentation style for prof 2019-12-06 09:47:51 -08:00
Yinan Zhang
dfdd46f6c1 Refactor prof_tctx_t creation 2019-12-06 09:47:51 -08:00
Yinan Zhang
aa1d71fb7a Rename prof_tctx to alloc_tctx in prof_info_t 2019-12-06 09:47:51 -08:00
Yinan Zhang
5e0b090992 No need to pass usize to prof_tctx_set() 2019-12-06 09:47:51 -08:00
David Goldblatt
1b1e76acfe Disable some spuriously-triggering warnings 2019-12-04 13:45:17 -08:00
Yinan Zhang
6945371778 Change tsdn to tsd for profiling code path 2019-11-22 16:31:56 -08:00
Yinan Zhang
b55419f9b9 Restructure profiling
Develop new data structure and code logic for holding profiling
related information stored in the extent that may be needed after the
extent is released, which in particular is the case for the
reallocation code path (e.g. in `rallocx()` and `xallocx()`).  The
data structure is a generalization of `prof_tctx_t`: we previously
only copy out the `prof_tctx` before the extent is released, but we
may be in need of additional fields. Currently the only additional
field is the allocation time field, but there may be more fields in
the future.

The restructuring also resolved a bug: `prof_realloc()` mistakenly
passed the new `ptr` to `prof_free_sampled_object()`, but passing in
the `old_ptr` would crash because it's already been released.  Now
the essential profiling information is collectively copied out early
and safely passed to `prof_free_sampled_object()` after the extent is
released.
2019-11-22 16:31:56 -08:00
Mark Santaniello
8b2c2a596d Support C++17 over-aligned allocation
Summary:
Add support for C++17 over-aligned allocation:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0035r4.html

Supporting all 10 operators means we avoid thunking thru libstdc++-v3/libsupc++ and just call jemalloc directly.

It's also worth noting that there is now an aligned *and sized* operator delete:
```
void operator delete(void* ptr, std::size_t size, std::align_val_t al) noexcept;
```

If JeMalloc did not provide this, the default implementation would ignore the size parameter entirely:
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/libsupc%2B%2B/del_opsa.cc#L30-L33

(I must also update ax_cxx_compile_stdcxx.m4 to a newer version with C++17 support.)

Test Plan:
Wrote a simple test that allocates and then deletes an over-aligned type:
```
struct alignas(32) Foo {};
Foo *f;

int main()
{
  f = new Foo;
  delete f;
}
```

Before this change, both new and delete go thru PLT, and we end up calling regular old free:
```
(gdb) disassemble
Dump of assembler code for function main():
...
   0x00000000004029b7 <+55>:    call   0x4022d0 <_ZnwmSt11align_val_t@plt>
...
   0x00000000004029d5 <+85>:    call   0x4022e0 <_ZdlPvmSt11align_val_t@plt>
...
(gdb) s
free (ptr=0x7ffff6408020) at /home/engshare/third-party2/jemalloc/master/src/jemalloc.git-trunk/src/jemalloc.c:2842
2842            if (!free_fastpath(ptr, 0, false)) {
```

After this change, we directly call new/delete and ultimately call sdallocx:
```
(gdb) disassemble
Dump of assembler code for function main():
...
   0x0000000000402b77 <+55>:    call   0x496ca0 <operator new(unsigned long, std::align_val_t)>
...
   0x0000000000402b95 <+85>:    call   0x496e60 <operator delete(void*, unsigned long, std::align_val_t)>
...
(gdb) s
116             je_sdallocx_noflags(ptr, size);
```
2019-11-22 10:14:16 -08:00
Qi Wang
9a7ae3c97f Reduce footprint of bin_t.
Avoid storing mutex_prof_data_t in bin_t.  Added bin_stats_data_t which is used
for reporting bin stats.
2019-11-21 11:08:36 -08:00
Yinan Zhang
73510dfd15 Revert "Fix bug in prof_realloc"
This reverts commit 3b5eecf102.
2019-11-15 15:13:39 -08:00
Yinan Zhang
3b5eecf102 Fix bug in prof_realloc
We should pass in `old_ptr` rather than the new `ptr` to
`prof_free_sampled_object()` when `old_ptr` points to a sampled
allocation.
2019-11-15 13:28:33 -08:00
Leonardo Santagada
c462753cc8 Use __forceinline for JEMALLOC_ALWAYS_INLINE on msvc 2019-11-12 13:50:25 -08:00
Qi Wang
da50d8ce87 Refactor and optimize prof sampling initialization.
Makes the prof sample prng use the tsd prng_state.  This allows us to properly
initialize the sample interval event, without having to create tdata.  As a
result, tdata will be created on demand (when a thread reaches the sample
interval bytes allocated), instead of on the first allocation.
2019-11-11 10:35:37 -08:00
Qi Wang
bc774a3519 Rename tsd->offset_state to tsd->prng_state. 2019-11-11 10:35:37 -08:00
Qi Wang
19a51abf33 Avoid arena->offset_state when tsd not available for prng.
Use stack locals and remove the offset_state in arena.
2019-11-11 10:35:37 -08:00
Nick Desaulniers
d01b425e5d Add -Wimplicit-fallthrough checks if supported
Clang since r369414 (clang-10) can now check -Wimplicit-fallthrough for
C code, and use the GNU C style attribute to denote fallthrough.

Move the test from header only to autoconf. The previous test used
brittle version detection which did not work for newer clang that
supported this feature.

The attribute has to be its own statement, hence the added `;`. It also
can only precede case statements, so the final cases should be
explicitly terminated with break statements.

Fixes commit 3d29d11ac2 ("Clean compilation -Wextra")
Link: 1e0affb6e5
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
2019-11-08 13:03:03 -08:00
Yinan Zhang
43f0ce92d8 Define general purpose tsd_thread_event_init() 2019-11-04 16:07:56 -08:00
Yinan Zhang
97f93fa0f2 Pull tcache GC events into thread event handler 2019-11-04 16:07:56 -08:00
Yinan Zhang
198f02e797 Pull prof_accumbytes into thread event handler 2019-11-04 15:21:16 -08:00
Yinan Zhang
152c0ef954 Build a general purpose thread event handler 2019-11-04 11:15:50 -08:00
David T. Goldblatt
de81a4eada Add stats counters for number of zero reallocs 2019-10-29 17:48:44 -07:00
David T. Goldblatt
9cfa805947 Realloc: Make behavior of realloc(ptr, 0) configurable. 2019-10-29 17:48:44 -07:00
Yinan Zhang
05681e387a Optimize cache_bin_alloc_easy for malloc fast path
`tcache_bin_info` is not accessed on malloc fast path but the
compiler reserves a register for it, as well as an additional
register for `tcache_bin_info[ind].stack_size`.  The optimization
gets rid of the need for the two registers.
2019-10-21 16:43:45 -07:00
Yinan Zhang
4fe50bc7d0 Fix amd64 MSVC warning 2019-10-18 10:16:29 -07:00
Yinan Zhang
4fbbc817c1 Simplify time setting and getting for prof log 2019-10-16 09:24:52 -07:00
Yinan Zhang
66e07f986d Suppress tdata creation in reentrancy
This change suppresses tdata initialization and prof sample threshold
update in interrupting malloc calls.  Interrupting calls have no need
for tdata.  Delaying tdata creation aligns better with our lazy tdata
creation principle, and it also helps us gain control back from
interrupting calls more quickly and reduces any risk of delegating
tdata creation to an interrupting call.
2019-10-04 08:52:50 -07:00
Yinan Zhang
beb7c16e94 Guard prof_active reset by opt_prof
Set `prof_active` to read-only when `opt_prof` is turned off.
2019-10-02 11:42:53 -07:00
David T. Goldblatt
3d84bd57f4 Arena: Add helper function arena_get_from_extent. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
c97d255752 Eset: Remove temporary declaration. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
ce5b128f10 Remove the undefined extent_size_quantize declarations. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
821dd53a1d Extent -> Eset: Rename arena members. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
e144b21e4b Extent -> Eset: Move fork handling. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
77bbb35a92 Extent -> Eset: Move extent fit functions. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
1210af9a4e Extent -> Eset: Move insertion and removal. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
a42861540e Extents -> Eset: Convert some stats getters. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
820f070c6b Move page quantization to sz module. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
63d1b7a7a7 Extents -> Eset: move extents_state_get. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
b416b96a39 Extents -> Eset: rename/move extents_init. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
4e5e43f22e Rename extents_t -> eset_t. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
723ccc6c27 Extents: Split out extent struct. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
41187bdfb0 Extents: Break extent-struct/arena interactions
Specifically, the extent_arena_[g|s]et functions and the address randomization.

These are the only things that tie the extent struct itself to the arena code.
2019-09-23 23:06:27 -07:00
David T. Goldblatt
529cfe2abc Arena: rename arena_structs_b.h -> arena_structs.h
arena_structs_a.h was removed in the previous commit.
2019-09-23 23:06:27 -07:00
David T. Goldblatt
e7cf84a8dd Rearrange slab data and constants
The constants logically belong in the sc module. The slab data bitmap isn't
really scoped to an arena; move it to its own module.
2019-09-23 23:06:27 -07:00
zhxchen17
b7c7df24ba Add max_per_bg_thd stats for per background thread mutexes.
Added a new stats row to aggregate the maximum value of mutex counters for each
background threads.  Given that the per bg thd mutex is not expected to be
contended, this counter is mainly for sanity check / debugging.
2019-09-13 09:23:57 -07:00
zhxchen17
4b76c684bb Add "prof.dump_prefix" to override filename prefixes for dumps. 2019-09-12 22:26:03 -07:00
zhxchen17
242af439b8 Rename "prof_dump_seq_mtx" to "prof_dump_filename_mtx". 2019-09-12 22:26:03 -07:00
Yinan Zhang
93d6151800 Pass tsd down to prof_backtrace() 2019-09-05 10:57:43 -07:00
Qi Wang
785b84e603 Make cache_bin_sz_t unsigned.
The bin size type was made signed only because the low_water could go -1, which
was already removed.
2019-09-04 13:37:07 -07:00
Qi Wang
23dc7a7fba Fix index type for cache_bin_alloc_easy. 2019-09-04 13:37:07 -07:00
Yinan Zhang
57b81c078e Pull thread_(de)allocated out of config_stats 2019-08-26 11:56:41 -07:00
Qi Wang
0043e68d4c Track low_water == -1 case explicitly.
The -1 value of low_water indicates if the cache has been depleted and
refilled.  Track the status explicitly in the tcache struct.

This allows the fast path to check if (cur_ptr > low_water), instead of >=,
which avoids reaching slow path when the last item is allocated.
2019-08-21 16:00:38 -07:00
Qi Wang
937ca1db9f Store ncached_max * ptr_size in tcache_bin_info.
With the cache bin metadata switched to pointers, ncached_max is usually
accessed and timed by sizeof(ptr). Store the results in tcache_bin_info for
direct access, and add a helper function for the ncached_max value.
2019-08-19 12:23:24 -07:00
Qi Wang
7599c82d48 Redesign the cache bin metadata for fast path.
Implement the pointer-based metadata for tcache bins --
- 3 pointers are maintained to represent each bin;
- 2 of the pointers are compressed on 64-bit;
- is_full / is_empty done through pointer comparison;

Comparing to the previous counter based design --
- fast-path speed up ~15% in benchmarks
- direct pointer comparison and de-reference
- no need to access tcache_bin_info in common case
2019-08-19 12:21:44 -07:00
Qi Wang
e2c7584361 Simplify / refactor tcache_dalloc_large. 2019-08-14 13:08:23 -07:00
Qi Wang
9c5c2a2c86 Unify the signature of tcache_flush small and large. 2019-08-14 13:08:23 -07:00
Yinan Zhang
8c8466fa6e Add compact json option for emitter
JSON format is largely meant for machine-machine communication, so
adding the option to the emitter.  According to local testing, the
savings in terms of bytes outputted is around 50% for stats printing
and around 25% for prof log printing.
2019-08-09 09:53:41 -07:00
Yinan Zhang
7fc6b1b259 Add buffered writer
The buffered writer adopts a signature identical to `write_cb`,
so that it can be plugged into anywhere `write_cb` appears.
2019-08-09 09:44:29 -07:00
Yinan Zhang
39343555d6 Report stats for tdatas_mtx and prof_dump_mtx 2019-08-09 09:24:16 -07:00
Yinan Zhang
07ce2434bf Refactor profiling
Refactored core profiling codebase into two logical parts:

(a) `prof_data.c`: core internal data structure managing & dumping;
(b) `prof.c`: mutexes & outward-facing APIs.

Some internal functions had to be exposed out, but there are not
that many of them if the modularization is (hopefully) clean enough.
2019-08-07 19:48:28 -07:00
Yinan Zhang
56126d0d2d Refactor prof log
Prof logging is conceptually seperate from core profiling, so
split it out as a module of its own.  There are a few internal
functions that had to be exposed but I think it is a fair trade-off.
2019-08-07 13:53:45 -07:00
Yinan Zhang
56c8ecffc1 Correct tsd layout graph
Augmented the tsd layout graph so that the two recently added fields,
`offset_state` and `bytes_until_sample`, are properly reflected.
As is shown, the cache footprint is 16 bytes larger than before.
2019-08-05 15:30:20 -07:00
Yinan Zhang
9344d25488 Workaround to address g++ unused variable warnings
g++ 5.5.0+ complained `parameter ‘expected’ set but not used
[-Werror=unused-but-set-parameter]` (despite that `expected` is in
fact used).
2019-07-30 11:37:56 -07:00
Qi Wang
5742473cc8 Revert "Refactor prof log"
This reverts commit 7618b0b8e4.
2019-07-29 14:10:15 -07:00
Qi Wang
1a0503367b Revert "Refactor profiling"
This reverts commit 0b462407ae.
2019-07-29 14:10:15 -07:00
Yinan Zhang
0b462407ae Refactor profiling
Refactored core profiling codebase into two logical parts:

(a) `prof_data.c`: core internal data structure managing & dumping;
(b) `prof.c`: mutexes & outward-facing APIs.

Some internal functions had to be exposed out, but there are not
that many of them if the modularization is (hopefully) clean enough.
2019-07-29 13:55:00 -07:00
Yinan Zhang
7618b0b8e4 Refactor prof log
`prof.c` is growing too long, so trying to modularize it.  There are
a few internal functions that had to be exposed but I think it is a
fair trade-off.
2019-07-29 13:55:00 -07:00
Qi Wang
a3fa597921 Refactor arena_dalloc() / _sdalloc(). 2019-07-24 18:30:54 -07:00
Qi Wang
bc0998a905 Invoke arena_dalloc_promoted() properly w/o tcache.
When tcache was disabled, the dalloc promoted case was missing.
2019-07-24 18:30:54 -07:00
Qi Wang
4e36ce34c1 Track the leaked VM space via the abandoned_vm counter.
The counter is 0 unless metadata allocation failed (indicates OOM), and is
mainly for sanity checking.
2019-07-24 11:24:22 -07:00
Qi Wang
9a86c65abc Implement retain on Windows.
The VirtualAlloc and VirtualFree APIs are different because MEM_DECOMMIT cannot
be used across multiple VirtualAlloc regions.  To properly support decommit,
only allow merge / split within the same region -- this is done by tracking the
"is_head" state of extents and not merging cross-region.

Add a new state is_head (only relevant for retain && !maps_coalesce), which is
true for the first extent in each VirtualAlloc region.  Determine if two extents
can be merged based on the head state, and use serial numbers for sanity checks.
2019-07-23 22:18:55 -07:00
Yinan Zhang
a2a693e722 Remove prof_accumbytes in arena
`prof_accumbytes` was supposed to be replaced by `prof_accum` in
https://github.com/jemalloc/jemalloc/pull/623.
2019-07-16 15:18:52 -07:00
Yinan Zhang
d26636d566 Fix logic in printing
`cbopaque` can now be overriden without overriding `write_cb` in
the first place.  (Otherwise there would be no need to have the
`cbopaque` parameter in `malloc_message`.)
2019-07-16 14:54:23 -07:00
Yinan Zhang
7720b6e385 Fix redzone setting and checking 2019-07-11 20:51:29 -07:00
Yinan Zhang
c92ac30601 Add confirm_conf option
If the confirm_conf option is set, when the program starts, each of
the four malloc_conf strings will be printed, and each option will
be printed when being set.
2019-05-22 09:38:39 -07:00
Qi Wang
07c44847c2 Track nfills and nflushes for arenas.i.small / large.
Small is added purely for convenience.  Large flushes wasn't tracked before and
can be useful in analysis.  Large fill simply reports nmalloc, since there is no
batch fill for large currently.
2019-05-15 10:05:09 -07:00
Doron Roberts-Kedes
7fc4f2a32c Add nonfull_slabs to bin_stats_t.
When config_stats is enabled track the size of bin->slabs_nonfull in
the new nonfull_slabs counter in bin_stats_t. This metric should be
useful for establishing an upper ceiling on the savings possible by
meshing.
2019-04-29 13:35:02 -07:00
Yinan Zhang
ae124b8684 Improve size class header
Mainly fixing typos.  The only non-trivial change is in the
computation for SC_NPSIZES, though the result wouldn't be any
different when SC_NGROUP = 4 as is always the case at the moment.
2019-04-24 10:45:12 -07:00
Qi Wang
1aabab5fdc Enforce TLS_MODEL attribute.
Caught by @zoulasc in #1460.  The attribute needs to be added in the headers as
well.
2019-04-16 11:07:15 -07:00
David Goldblatt
33e1dad680 Safety checks: Add a redzoning feature. 2019-04-15 16:48:12 -07:00
David Goldblatt
b92c9a1a81 Safety checks: Indirect through a function.
This will let us share code on failure pathways.pathways
2019-04-15 16:48:12 -07:00
David Goldblatt
f4d24f05e1 Move extra size checks behind a config flag.
This will let us turn that flag into a generic "turn on runtime checks" flag
that guards other functionality we have planned.
2019-04-15 16:48:12 -07:00
zoulasc
7f7935cf78 Add an autoconf feature test for format_arg and a jemalloc-specific
macro for it.
2019-04-15 15:14:46 -07:00
zoulasc
14e4176758 Fix incorrect macro use.
Compiling with warnings produces missing prototype warnings.
2019-04-15 15:14:46 -07:00
zoulasc
020b5dc7ac Convert the format generator function to an annotated format function,
so that the generated formats can be checked by the compiler.
2019-04-15 15:14:46 -07:00
mgrice
d3d7a8ef09 remove compare and branch in fast path for c++ operator delete[]
Summary: sdallocx is checking a flag that will never be set (at least in the provided C++ destructor implementation).  This branch will probably only rarely be mispredicted however it removes two instructions in sdallocx and one at the callsite (to zero out flags).
2019-04-08 10:59:05 -07:00
Yinan Zhang
9aab3f2be0 Add memory utilization analytics to mallctl
The analytics tool is put under experimental.utilization namespace in
mallctl.  Input is one pointer or an array of pointers and the output
is a list of memory utilization statistics.
2019-04-04 13:48:39 -07:00
Qi Wang
fb56766ca9 Eagerly purge oversized merged extents.
This change improves memory usage slightly, at virtually no CPU cost.
2019-03-14 17:34:55 -07:00
Qi Wang
f6c30cbafa Remove some unused comments. 2019-03-14 17:34:55 -07:00
Qi Wang
b804d0f019 Fallback to 32-bit when 8-bit atomics are missing for TSD.
When it happens, this might cause a slowdown on the fast path operations.
However such case is very rare.
2019-03-09 12:52:06 -08:00
Qi Wang
06f0850427 Detect if 8-bit atomics are available.
In some rare cases (older compiler, e.g. gcc 4.2 w/ MIPS), 8-bit atomics might
be unavailable.  Detect such cases so that we can workaround.
2019-03-09 12:52:06 -08:00
Jason Evans
14d3686c9f Do not use #pragma GCC diagnostic with gcc < 4.6.
This regression was introduced by
3d29d11ac2 (Clean compilation -Wextra).
2019-03-09 12:10:30 -08:00
Jason Evans
775fe302a7 Remove JE_FORCE_SYNC_COMPARE_AND_SWAP_[48].
These macros have been unused since
d4ac7582f3 (Introduce a backport of C11
atomics).
2019-02-22 14:22:16 -08:00
Jason Evans
dca7060d5e Avoid redefining tsd_t.
This fixes a build failure when integrating with FreeBSD's libc.  This
regression was introduced by d1e11d48d4
(Move tsd link and in_hook after tcache.).
2019-02-20 20:27:55 -08:00
Qi Wang
8e9a613122 Disable muzzy decay by default. 2019-02-04 14:38:54 -08:00
Qi Wang
e13400c919 Sanity check szind on tcache flush.
This adds some overhead to the tcache flush path (which is one of the
popular paths).  Guard it behind a config option.
2019-02-01 12:31:34 -08:00
Qi Wang
e3db480f6f Rename huge_threshold to oversize_threshold.
The keyword huge tend to remind people of huge pages which is not relevent to
the feature.
2019-01-25 13:15:45 -08:00
Qi Wang
350809dc5d Set huge_threshold to 8M by default.
This feature uses an dedicated arena to handle huge requests, which
significantly improves VM fragmentation.  In production workload we tested it
often reduces VM size by >30%.
2019-01-24 13:29:23 -08:00
Qi Wang
bbe8e6a909 Avoid creating bg thds for huge arena lone.
For low arena count settings, the huge threshold feature may trigger an unwanted
bg thd creation.  Given that the huge arena does eager purging by default,
bypass bg thd creation when initializing the huge arena.
2019-01-15 16:00:34 -08:00
Qi Wang
f459454afe Avoid potential issues on extent zero-out.
When custom extent_hooks or transparent huge pages are in use, the purging
semantics may change, which means we may not get zeroed pages on repopulating.
Fixing the issue by manually memset for such cases.
2019-01-11 19:16:12 -08:00
Leonardo Santagada
daa0e436ba implement malloc_getcpu for windows 2019-01-08 14:34:45 -08:00
Qi Wang
7241bf5b74 Only read arena index from extent on the tcache flush path.
Add exten_arena_ind_get() to avoid loading the actual arena ptr in case we just
need to check arena matching.
2018-12-18 15:19:30 -08:00
Alexander Zinoviev
36de5189c7 Add rate counters to stats 2018-12-18 09:59:41 -08:00
Qi Wang
98b56ab23d Store the bin shard selection in TSD.
This avoids having to choose bin shard on the fly, also will allow flexible bin
binding for each thread.
2018-12-03 17:17:03 -08:00
Qi Wang
3f9f2833f6 Add opt.bin_shards to specify number of bin shards.
The option uses the same format as "slab_sizes" to specify number of shards for
each bin size.
2018-12-03 17:17:03 -08:00
Qi Wang
37b8913925 Add support for sharded bins within an arena.
This makes it possible to have multiple set of bins in an arena, which improves
arena scalability because the bins (especially the small ones) are always the
limiting factor in production workload.

A bin shard is picked on allocation; each extent tracks the bin shard id for
deallocation.  The shard size will be determined using runtime options.
2018-12-03 17:17:03 -08:00
Dave Watson
b23336af96 mutex: fix trylock spin wait contention
If there are 3 or more threads spin-waiting on the same mutex,
there will be excessive exclusive cacheline contention because
pthread_trylock() immediately tries to CAS in a new value, instead
of first checking if the lock is locked.

This diff adds a 'locked' hint flag, and we will only spin wait
without trylock()ing while set.  I don't know of any other portable
way to get the same behavior as pthread_mutex_lock().

This is pretty easy to test via ttest, e.g.

./ttest1 500 3 10000 1 100

Throughput is nearly 3x as fast.

This blames to the mutex profiling changes, however, we almost never
have 3 or more threads contending in properly configured production
workloads, but still worth fixing.
2018-11-28 15:17:02 -08:00
Qi Wang
c4063ce439 Set the default number of background threads to 4.
The setting has been tested in production for a while.  No negative effect while
we were able to reduce number of threads per process.
2018-11-16 09:35:12 -08:00
Qi Wang
43f3b1ad0c Deprecate OSSpinLock. 2018-11-14 08:44:05 -08:00
Dave Watson
13c237c7ef Add a fastpath for arena_slab_reg_alloc_batch
Also adds a configure.ac check for __builtin_popcount, which is used
in the new fastpath.
2018-11-14 07:09:11 -08:00
Dave Watson
17aa470760 add extent_nfree_sub 2018-11-14 07:09:11 -08:00
Qi Wang
1f56115704 Fix tcache_flush (follow up cd2931a).
Also catch invalid tcache id.
2018-11-13 08:54:09 -08:00
Dave Watson
e2ab215324 refactor tcache_dalloc_small
Add a cache_bin_dalloc_easy (to match the alloc_easy function),
and use it in tcache_dalloc_small.  It will also be used in the
new free fastpath.
2018-11-12 13:20:37 -08:00
Dave Watson
5e795297b3 rtree: add rtree_szind_slab_read_fast
For a free fastpath, we want something that will not make additional
calls.  Assume most free() calls will hit the L1 cache, and use
a custom rtree function for this.

Additionally, roll the ptr=NULL check in to the rtree cache check.
2018-11-12 13:20:37 -08:00
Justin Hibbits
be0749f591 Restrict lwsync to powerpc64 only
Nearly all 32-bit powerpc hardware treats lwsync as sync, and some cores
(Freescale e500) trap lwsync as an illegal instruction, which then gets
emulated in the kernel.  To avoid unnecessary traps on the e500, use
sync on all 32-bit powerpc.  This pessimizes 32-bit software running on
64-bit hardware, but those numbers should be slim.
2018-10-24 11:18:55 -07:00
Edward Tomasz Napierala
ceba1dde27 Make use of pthread_set_name_np(3) on FreeBSD. 2018-10-24 10:06:37 -07:00
Dave Watson
936bc2aa15 prof: Fix memory regression
The diff 'refactor prof accum...' moved the bytes_until_sample
subtraction before the load of tdata.  If tdata is null,
tdata_get(true) will overwrite bytes_until_sample, but we
still sample the current allocation.   Instead, do the subtraction
and check logic again, to keep the previous behavior.

blame-rev: 0ac524308d
2018-10-23 12:39:57 -07:00
Dave Watson
0ec656eb71 ticker: add ticker_trytick
For the fastpath, we want to tick, but undo the tick and jump to the
slowpath if ticker would fire.
2018-10-18 08:32:19 -07:00
Dave Watson
ac34afb403 drop bump_empty_alloc option. Size class lookup support used instead. 2018-10-17 08:50:58 -07:00
Dave Watson
4edbb7c64c sz: Support 0 size in size2index lookup/compute 2018-10-17 08:50:58 -07:00
gnzlbg
01e2a38e5a Make smallocx symbol name depend on the JEMALLOC_VERSION_GID
This comments concatenates the `JEMALLOC_VERSION_GID` to the
`smallocx` symbol name, such that the symbol ends up exported
as `smallocx_{git_hash}`.
2018-10-17 07:12:28 -07:00
gnzlbg
741fca1bb7 Hide smallocx even when enabled from the library API
The experimental `smallocx` API is not exposed via header files,
requiring the users to peek at `jemalloc`'s source code to manually
add the external declarations to their own programs.

This should reinforce that `smallocx` is experimental, and that `jemalloc`
does not offer any kind of backwards compatiblity or ABI gurantees for it.
2018-10-17 07:12:28 -07:00
gnzlbg
08260a6b94 Add experimental API: smallocx_return_t smallocx(size, flags)
---

Motivation:

This new experimental memory-allocaction API returns a pointer to
the allocation as well as the usable size of the allocated memory
region.

The `s` in `smallocx` stands for `sized`-`mallocx`, attempting to
convey that this API returns the size of the allocated memory region.

It should allow C++ P0901r0 [0] and Rust Alloc::alloc_excess to make
use of it.

The main purpose of these APIs is to improve telemetry. It is more accurate
to register `smallocx(size, flags)` than `smallocx(nallocx(size), flags)`,
for example. The latter will always line up perfectly with the existing
size classes, causing a loss of telemetry information about the internal
fragmentation induced by potentially poor size-classes choices.

Instrumenting `nallocx` does not help much since user code can cache its
result and use it repeatedly.

---

Implementation:

The implementation adds a new `usize` option to `static_opts_s` and an `usize`
variable to `dynamic_opts_s`. These are then used to cache the result of
`sz_index2size` and similar functions in the code paths in which they are
unconditionally invoked. In the code-paths in which these functions are not
unconditionally invoked, `smallocx` calls, as opposed to `mallocx`, these
functions explicitly.

---

[0]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0901r0.html
2018-10-17 07:12:28 -07:00
Dave Watson
325e3305fc remove malloc_init() off the fastpath 2018-10-15 10:11:08 -07:00
Dave Watson
997d86acc6 restrict bytes_until_sample to int64_t. This allows optimal asm
generation of sub bytes_until_sample, usize; je; for x86 arch.
Subtraction is unconditional, and only flags are checked for the jump,
no extra compare is necessary.  This also reduces register pressure.
2018-10-15 08:24:12 -07:00
Dave Watson
0ac524308d refactor prof accum, so that tdata is not loaded if we aren't going to sample. 2018-10-15 08:24:12 -07:00
Dave Watson
9ed3bdc848 move bytes until sample to tsd. Fastpath allocation does not need
to load tdata now, avoiding several branches.
2018-10-15 08:24:12 -07:00
Dave Watson
09adf18f1a Remove a branch from cache_bin_alloc_easy
Combine the branches for checking for an empty cache_bin, and
checking for the low watermark.
2018-10-15 08:18:15 -07:00
Rajeev Misra
115ce93562 bit_util: Don't use __builtin_clz on s390x
There's an optimizer bug upstream that results in test failures; reported at
https://bugzilla.redhat.com/show_bug.cgi?id=1619354.  This works around the
failure reported at https://github.com/jemalloc/jemalloc/issues/1307.
2018-09-20 11:25:17 -07:00
Rajeev Misra
4c548a61c8 Bit_util: Use intrinsics for pow2_ceil, where available. 2018-08-15 19:38:31 -07:00
David Carlier
0771ff2cea FreeBSD build changes and allow to run the tests. 2018-08-09 10:41:20 -07:00
David Goldblatt
e8ec9528ab Allow the use of readlinkat over readlink.
This can be useful in situations where readlink is disallowed.
2018-08-03 14:04:32 -07:00
Tyler Etzel
126252a7e6 Add stats for the size of extent_avail heap 2018-08-02 10:16:06 -07:00
Tyler Etzel
c14e6c0819 Add extents information to mallocstats output
- Show number/bytes of extents of each size that are dirty, muzzy, retained.
2018-08-02 10:16:06 -07:00