The -1 value of low_water indicates whether the cache has been depleted and
refilled. Track this status explicitly in the tcache struct instead.
This allows the fast path to check if (cur_ptr > low_water), instead of >=,
which avoids hitting the slow path when the last cached item is allocated.
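A rough sketch of the idea (struct, field, and constant names here are
assumptions for illustration, not the actual jemalloc definitions): the
depleted-and-refilled status gets its own flag next to the bins instead of
being encoded as a low_water sentinel.

```c
#include <stdbool.h>
#include <stdint.h>

#define TCACHE_NBINS_SKETCH 64  /* illustrative bound, not the real constant */

typedef struct cache_bin_s {
	uintptr_t cur_ptr;   /* next item to hand out */
	uintptr_t low_water; /* lowest fill level seen since the last GC pass */
} cache_bin_t;

typedef struct tcache_s {
	cache_bin_t bins[TCACHE_NBINS_SKETCH];
	/*
	 * Previously signalled by low_water == -1: the bin was depleted and
	 * then refilled since the last GC pass.  Tracking it here lets
	 * low_water stay a plain position, so the fast path can compare
	 * against it strictly.
	 */
	bool bin_refilled[TCACHE_NBINS_SKETCH];
} tcache_t;
```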
With the cache bin metadata switched to pointers, ncached_max is usually
accessed after being multiplied by sizeof(ptr). Store the pre-multiplied result
in tcache_bin_info for direct access, and add a helper function that derives
the ncached_max value from it.
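A minimal sketch of that layout, with assumed names (the real struct and
helper may differ):

```c
#include <stddef.h>

typedef struct cache_bin_info_s {
	/* ncached_max pre-multiplied by sizeof(void *), i.e. the bin stack
	 * size in bytes, since that is the form most accesses want. */
	size_t stack_size;
} cache_bin_info_t;

/* Helper to recover the item count when it is actually needed. */
static inline size_t
cache_bin_info_ncached_max(const cache_bin_info_t *info) {
	return info->stack_size / sizeof(void *);
}
```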
Implement the pointer-based metadata for tcache bins --
- 3 pointers are maintained to represent each bin;
- 2 of the pointers are compressed on 64-bit;
- is_full / is_empty done through pointer comparison.
Compared to the previous counter-based design --
- fast-path speed-up of ~15% in benchmarks;
- direct pointer comparison and dereference;
- no need to access tcache_bin_info in the common case.
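Roughly, the bin layout looks like the sketch below; the names, integer
widths, and choice of which positions are compressed are assumptions for
illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t low_bits_t;	/* assumed width of the compressed positions */

typedef struct cache_bin_s {
	/* Full pointer to the next available item on the bin's stack. */
	uintptr_t cur_ptr;
	/* The other two positions are kept compressed (low bits only) on
	 * 64-bit, since the whole stack fits within a small region. */
	low_bits_t low_bits_full;
	low_bits_t low_bits_empty;
} cache_bin_t;

/* is_full / is_empty become plain (low-bits) pointer comparisons. */
static inline bool
cache_bin_is_full(const cache_bin_t *bin) {
	return (low_bits_t)bin->cur_ptr == bin->low_bits_full;
}

static inline bool
cache_bin_is_empty(const cache_bin_t *bin) {
	return (low_bits_t)bin->cur_ptr == bin->low_bits_empty;
}
```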
Without buffering, `malloc_stats_print` would invoke the write callback
(which could mean an expensive `malloc_write_fd` call) for every single
`printf`, including printing each line break and each leading tab/space
for indentation.
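As a sketch of the buffering (the wrapper below is illustrative and relies
only on the documented write_cb interface of `malloc_stats_print`; the buffer
size is an assumption), the expensive sink gets called once per buffer flush
rather than once per `printf`:

```c
#include <string.h>

#define STATS_BUFSIZE 65536	/* assumed buffer size */

typedef struct {
	void (*write_cb)(void *, const char *);	/* the expensive sink */
	void *cbopaque;
	size_t end;
	char buf[STATS_BUFSIZE];
} buf_writer_t;

static void
buf_writer_flush(buf_writer_t *w) {
	w->buf[w->end] = '\0';
	w->write_cb(w->cbopaque, w->buf);
	w->end = 0;
}

/* Passed as the write_cb to the stats printer; accumulates short pieces. */
static void
buf_writer_cb(void *opaque, const char *s) {
	buf_writer_t *w = (buf_writer_t *)opaque;
	size_t slen = strlen(s);
	if (w->end + slen >= STATS_BUFSIZE) {
		buf_writer_flush(w);
	}
	if (slen >= STATS_BUFSIZE) {
		/* Too large to buffer; pass through directly. */
		w->write_cb(w->cbopaque, s);
		return;
	}
	memcpy(w->buf + w->end, s, slen);
	w->end += slen;
}
```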
`tcache->prof_accumbytes` should always be cleared after being
transferred to the arena; otherwise the allocations would be double
counted, leading to excessive prof dumps.
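Sketch of the intended invariant at the transfer point (the function name and
the arena-side call are stand-ins, not the exact internal API):

```c
/* At the point where a tcache's accumulated bytes are handed to the arena. */
static void
tcache_prof_flush(tsdn_t *tsdn, arena_t *arena, tcache_t *tcache) {
	uint64_t accumbytes = tcache->prof_accumbytes;
	if (accumbytes > 0) {
		arena_prof_accum(tsdn, arena, accumbytes);
		/* Must clear, or these bytes get counted again on the next
		 * flush, inflating the accumulator and triggering extra
		 * prof dumps. */
		tcache->prof_accumbytes = 0;
	}
}
```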
JSON format is largely meant for machine-machine communication, so add an
option to the emitter for more compact output. According to local testing, the
savings in terms of bytes outputted are around 50% for stats printing and
around 25% for prof log printing.
Refactored core profiling codebase into two logical parts:
(a) `prof_data.c`: core internal data structure management & dumping;
(b) `prof.c`: mutexes & outward-facing APIs.
Some internal functions had to be exposed, but there are not that many
of them if the modularization is (hopefully) clean enough.
Prof logging is conceptually separate from core profiling, so
split it out as a module of its own. A few internal functions had
to be exposed, but I think it is a fair trade-off.
Augmented the tsd layout graph so that the two recently added fields,
`offset_state` and `bytes_until_sample`, are properly reflected.
As shown, the cache footprint is 16 bytes larger than before.
Without retain, split and merge are disallowed on Windows. Avoid doing
first-fit, which almost always needs splitting. Instead, try exact fit only
and bail out early.
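A sketch of that allocation-path decision under the stated assumptions (the
helper shown is hypothetical; only the shape of the check is the point):

```c
/* When split/merge are unavailable (Windows without retain), consult only
 * the exact size class; anything larger would have to be split. */
if (!maps_coalesce && !opt_retain) {
	/* extents_first_fit_exact() is a hypothetical exact-fit lookup. */
	extent_t *ext = extents_first_fit_exact(extents, size);
	return ext;	/* NULL: bail out early and map a fresh extent instead */
}
/* Otherwise, fall through to the usual first-fit search across size classes. */
```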
`prof.c` is growing too long, so trying to modularize it. There are
a few internal functions that had to be exposed but I think it is a
fair trade-off.
extent_register may only fail if the underlying extent and region got stolen /
coalesced before we lock. Avoid doing extent_leak (which purges the region)
since we don't really own the region.
This can only happen on Windows and with opt.retain disabled (which isn't the
default). The solution is suboptimal; however, this is not a common case, as
retain is the long-term plan for all platforms anyway.
The VirtualAlloc and VirtualFree APIs are different in that MEM_DECOMMIT cannot
be used across multiple VirtualAlloc regions. To properly support decommit,
only allow merge / split within the same region -- this is done by tracking the
"is_head" state of extents and not merging cross-region.
Add a new state is_head (only relevant for retain && !maps_coalesce), which is
true for the first extent in each VirtualAlloc region. Determine whether two
extents can be merged based on the head state, and use serial numbers for
sanity checks.
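A simplified sketch of the merge check (accessor names are illustrative, not
the actual internal API):

```c
#include <stdbool.h>

/* Merging extent `a` (lower address) with `b` (the extent right after it)
 * is refused whenever `b` is the head of its own VirtualAlloc region, since
 * the merged range would then span two regions and MEM_DECOMMIT on it would
 * be invalid. */
static bool
extent_can_merge(const extent_t *a, const extent_t *b) {
	if (extent_is_head_get(b)) {
		return false;
	}
	/* Serial numbers give a cheap sanity check on the ordering. */
	return extent_sn_get(a) <= extent_sn_get(b);
}
```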
The original logic can be disastrous if `PROF_DUMP_BUFSIZE` is less
than `slen` -- `prof_dump_buf_end + slen <= PROF_DUMP_BUFSIZE` would
always be `false`, so `memcpy` would always try to copy
`PROF_DUMP_BUFSIZE - prof_dump_buf_end` chars, which can be
dangerous: in the last round of the `while` loop it would not only
illegally read the memory beyond `s` (which might not always be
disastrous), but it would also illegally overwrite the memory beyond
`prof_dump_buf` (which can be pretty disastrous). `slen` probably
has never gone beyond `PROF_DUMP_BUFSIZE` so we were just lucky.
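A sketch of the corrected buffered write (the buffer names mirror the
description above; the buffer size and the flush helper are stand-ins):

```c
#include <string.h>

#define PROF_DUMP_BUFSIZE 65536	/* assumed value */

static char prof_dump_buf[PROF_DUMP_BUFSIZE];
static size_t prof_dump_buf_end;

static void prof_dump_flush(void);	/* writes the buffer out, resets the cursor */

static void
prof_dump_write(const char *s) {
	size_t slen = strlen(s);
	size_t i = 0;
	while (i < slen) {
		/* Copy no more than what remains of s and what still fits in
		 * the buffer, so neither side can be overrun even when
		 * slen > PROF_DUMP_BUFSIZE. */
		size_t n = PROF_DUMP_BUFSIZE - prof_dump_buf_end;
		if (n > slen - i) {
			n = slen - i;
		}
		memcpy(&prof_dump_buf[prof_dump_buf_end], &s[i], n);
		prof_dump_buf_end += n;
		i += n;
		if (prof_dump_buf_end == PROF_DUMP_BUFSIZE) {
			prof_dump_flush();
		}
	}
}
```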
`cbopaque` can now be overridden without overriding `write_cb` in
the first place. (Otherwise there would be no need to have the
`cbopaque` parameter in `malloc_message`.)
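Usage sketch of what this enables (assuming the standard jemalloc public API;
error handling omitted): override the global `malloc_message` and pass only a
`cbopaque`, leaving `write_cb` as NULL.

```c
#include <stdio.h>
#include <jemalloc/jemalloc.h>

static void
my_message(void *cbopaque, const char *s) {
	fputs(s, (FILE *)cbopaque);	/* cbopaque now actually reaches here */
}

int
main(void) {
	malloc_message = my_message;
	/* write_cb == NULL, but the custom cbopaque is still honored. */
	malloc_stats_print(NULL, stderr, NULL);
	return 0;
}
```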
The file is included in the list of source files in Makefile.in,
but it is missing from the project files. This causes the
build to fail due to unresolved symbols.
Background threads may run for a long time, especially when the number of dirty
pages is high. Avoid blocking stats calls on them, which could otherwise cause
latency spikes.
The new experimental mallctl exposes the arena pactive counter to applications,
which allows fast reads without going through the usual mallctl / epoch steps.
This is particularly useful when frequent balancing is required, e.g. when there
are multiple manual arenas and threads are multiplexed to them based on usage.
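Usage sketch (the exact mallctl name is an assumption based on the description
above; verify against the implementation): read the pointer once, then poll
the counter directly.

```c
#include <stdio.h>
#include <jemalloc/jemalloc.h>

int
main(void) {
	const size_t *pactive;
	size_t sz = sizeof(pactive);
	/* One-time read to obtain a pointer to arena 0's pactive counter;
	 * "experimental.arenas.0.pactivep" is assumed, not confirmed. */
	if (mallctl("experimental.arenas.0.pactivep", &pactive, &sz, NULL, 0) != 0) {
		return 1;
	}
	/* Subsequent reads need no mallctl call or epoch refresh. */
	printf("arena 0 active pages: %zu\n", *pactive);
	return 0;
}
```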