server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
David Goldblatt	b3df80bc79	Pull HPA options into a containing struct. Currently that just means max_alloc, but we're about to add more. While we're touching these lines anyways, tweak things to be more in line with testing.	2021-02-04 20:58:31 -08:00
David Goldblatt	bdb7307ff2	fxp: Add FXP_INIT_PERCENT This lets us specify fxp values easily in source.	2021-02-04 20:58:31 -08:00
David Goldblatt	caef4c2868	FXP: add fxp_mul_frac. This can multiply size_ts by a fraction without the risk of overflow.	2021-02-04 20:58:31 -08:00
David Goldblatt	dc886e5608	hpdata: Return the number of pages to be purged. We'll use this in the next commit.	2021-02-04 20:58:31 -08:00
David Goldblatt	9fd9c876bb	psset: keep aggregate stats. This will let us query these stats quickly to make purging decisions.	2021-02-04 20:58:31 -08:00
David Goldblatt	da63f23e68	HPA: Track pending purges/hugifies in the psset. This finishes the refactoring of the HPA/psset interactions the past few commits have been building towards. Rather than the HPA removing and then reinserting hpdatas, it simply begins updates and ends them. These updates can set flags on the hpdata that prevent it from being returned for certain types of requests. For example, it can call hpdata_alloc_allowed_set(hpdata, false) during an update, at which point the given hpdata will no longer be returned for psset_pick_alloc requests. This has various benefits: - It maintains stats correctness during purges and hugifies. - It allows simpler and more explicit concurrency control for the various special cases (e.g. allocations are disallowed during purge, but not during hugify). - It lets allocations and deallocations avoid disturbing the purging and hugification orderings. If an hpdata "loses its place" in one of the queues just due to an alloc / dalloc, it can result in pathological edge cases where very hot, very full hugepages never get hugified (and cold extents on the same hugepage as hot ones never get purged). The key benefit though is that tracking hpdatas to be purged / hugified in a principled way will let us do delayed purging and hugification. Eventually this will let us move these operations to background threads, but in the short term the benefit is that it will let us have global purging policies (e.g. purge when the entire arena has too many dirty pages, rather than any particular hugepage).	2021-02-04 20:58:31 -08:00
David Goldblatt	bf64557ed6	Move empty slab tracking to the psset. We're moving towards a world in which purging decisions are less rigidly enforced at a single-hugepage level. In that world, it makes sense to keep around some hpdatas which are not completely purged, in which case we'll need to track them.	2021-02-04 20:58:31 -08:00
David Goldblatt	99fc0717e6	psset: Reconceptualize insertion/removal. Really, this isn't a functional change, just a naming change. We start thinking of pageslabs as being always in the psset. What we used to think of as removal is now thought of as being in the psset, but in the process of being updated (and therefore, unavailable for serving new allocations). This is in preparation for subsequent changes to support deferred purging; allocations will still be in the psset for the purposes of choosing when to purge, but not for purposes of allocation/deallocation.	2021-02-04 20:58:31 -08:00
David Goldblatt	68a1666e91	hpdata: Rename "dirty" to "touched". This matches the usage in the rest of the codebase.	2021-02-04 20:58:31 -08:00
David Goldblatt	be0d7a53f3	HPA: Don't track inactive pages. This is really only useful for human consumption. Correspondingly, emit it only in the human-readable stats, and let everybody else compute from the hugepage size and nactive.	2021-02-04 20:58:31 -08:00
David Goldblatt	55e0f60ca1	psset stats: Simplify handling. We can treat the huge and nonhuge cases uniformly using huge state as an array index.	2021-02-04 20:58:31 -08:00
David Goldblatt	30b9e8162b	HPA: Generalize purging. Previously, we would purge a hugepage only when it's completely empty. With this change, we can purge even when only partially empty. Although the heuristic here is still fairly primitive, this infrastructure can scale to become more advanced.	2021-02-04 20:58:31 -08:00
David Goldblatt	70692cfb13	hpdata: Add state changing helpers. We're about to allow hugepage subextent purging; get as much of our metadata handling ready as possible.	2021-02-04 20:58:31 -08:00
David Goldblatt	9b75808be1	flat bitmap: Add a bitwise and/or/not. We're about to need them.	2021-02-04 20:58:31 -08:00
David Goldblatt	c259323ab3	Use ticker_geom_t for arena tcache decay.	2021-02-04 14:10:43 -08:00
David Goldblatt	8edfc5b170	Add ticker_geom_t. This lets a single ticker object drive events across a large number of different tick streams while sharing state.	2021-02-04 14:10:43 -08:00
David Goldblatt	2fcbd18115	Cache bin: Don't reverse flush order. The items we pick to flush matter a lot, but the order in which they get flushed doesn't; just use forward scans. This simplifies the accessing code, both in terms of the C and the generated assembly (i.e. this speeds up the flush pathways).	2021-02-04 14:10:43 -08:00
David Goldblatt	229994a204	Tcache flush: keep common path state in registers. By carefully force-inlining the division constants and the operation sum count, we can eliminate redundant operations in the arena-level dalloc function. Do so.	2021-02-04 14:10:43 -08:00
Azat Khuzhin	a943172b73	Add runtime detection for MADV_DONTNEED zeroes pages (mostly for qemu) qemu does not support this, yet [1], and you can get a very tricky assert if you run a program with jemalloc in use under qemu: <jemalloc>: ../contrib/jemalloc/src/extent.c:1195: Failed assertion: "p[i] == 0" [1]: https://patchwork.kernel.org/patch/10576637/ Here is a simple example that shows the problem [2]: // Gist to check possible issues with MADV_DONTNEED // For example it is not supported by qemu user // There is a patch for this [1], but it hasn't been applied. // [1]: https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05422.html #include <sys/mman.h> #include <stdio.h> #include <stddef.h> #include <assert.h> #include <string.h> int main(int argc, char **argv) { void *addr = mmap(NULL, 1<<16, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); if (addr == MAP_FAILED) { perror("mmap"); return 1; } memset(addr, 'A', 1<<16); if (!madvise(addr, 1<<16, MADV_DONTNEED)) { puts("MADV_DONTNEED does not return error. Check memory."); for (int i = 0; i < 1<<16; ++i) { assert(((unsigned char *)addr)[i] == 0); } } else { perror("madvise"); } if (munmap(addr, 1<<16)) { perror("munmap"); return 1; } return 0; } ### unpatched qemu $ qemu-x86_64-static /tmp/test-MADV_DONTNEED MADV_DONTNEED does not return error. Check memory. test-MADV_DONTNEED: /tmp/test-MADV_DONTNEED.c:19: main: Assertion `((unsigned char *)addr)[i] == 0' failed. qemu: uncaught target signal 6 (Aborted) - core dumped Aborted (core dumped) ### patched qemu (by returning ENOSYS error) $ qemu-x86_64 /tmp/test-MADV_DONTNEED madvise: Success ### patch for qemu to return ENOSYS diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 897d20c076..5540792e0e 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -11775,7 +11775,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, turns private file-backed mappings into anonymous mappings. This will break MADV_DONTNEED. This is a hint, so ignoring and returning success is ok. */ - return 0; + return ENOSYS; #endif #ifdef TARGET_NR_fcntl64 case TARGET_NR_fcntl64: [2]: https://gist.github.com/azat/12ba2c825b710653ece34dba7f926ece v2: - review fixes - add opt_dont_trust_madvise v3: - review fixes - rename opt_dont_trust_madvise to opt_trust_madvise	2021-01-20 20:08:30 -08:00
David Goldblatt	a011c4c22d	cache_bin: Separate out local and remote accesses. This fixes an incorrect debug-mode assert: - T1 starts an arena stats update and reads stack_head from another thread's cache bin, when that cache bin has 1 item in it. - T2 allocates from that cache bin. The cache_bin's stack_head now points to a NULL pointer, since the cache bin is empty. - T1 re-reads the cache_bin's stack_head to perform an assertion check (since it previously saw that the bin was nonempty, whatever stack_head points to should be non-NULL).	2021-01-08 14:18:08 -08:00
Yinan Zhang	4352cbc21c	Add alignment tests for prof stats	2021-01-07 20:39:49 -08:00
Yinan Zhang	54f3351f1f	Add mallctl for prof stats fetching	2021-01-07 20:39:49 -08:00
Yinan Zhang	40fa4d29d3	Track per size class internal fragmentation	2021-01-07 20:39:49 -08:00
David Goldblatt	83cad746ae	prof_log: cassert(config_prof) in public functions This lets the compiler infer that the code is dead in builds where profiling is disabled, saving on space there.	2021-01-04 14:55:49 -08:00
Yinan Zhang	4557c0a67d	Enable ctl on partial mib and partial name	2020-12-18 10:39:58 -08:00
Yinan Zhang	006dd0414e	Add partial name-to-mib functionality	2020-12-18 10:39:58 -08:00
Yinan Zhang	f2e1a5be77	Do not fail on partial ctl path for ctl_nametomib() We do not fail on partial ctl path when the given `mib` array is shorter than the given name, and we should keep the behavior the same in the reverse case, which I feel is also the more natural way.	2020-12-18 10:39:58 -08:00
David Goldblatt	1e3b8636ff	HPA: Remove unused malloc_conf options.	2020-12-08 12:10:48 -08:00
David Goldblatt	f51948d9e1	psset unit test: fix a bug. The next commit adds assertions that reveal a bug in the test code (double-free). Fix it.	2020-12-07 06:21:08 -08:00
David Goldblatt	54c94c1679	flat bitmap: add scount / ucount functions. These can compute the number of set or unset bits in a subrange of the bitmap.	2020-12-07 06:21:08 -08:00
David Goldblatt	734e72ce8f	bit_util: Guarantee popcount's presence. Implement popcount generically, so that we can rely on it being present.	2020-12-07 06:21:08 -08:00
David Goldblatt	d9f7e6c668	hpdata: Add a test. We're about to make the functionality here more complicated; testing hpdata directly (rather than relying on user's tests) will make debugging easier.	2020-12-07 06:21:08 -08:00
David Goldblatt	f7cf23aa4d	psset: Relegate alloc/dalloc to test code. This is no longer part of the "core" functionality; we only need the stub implementations as an end-to-end test of hpdata + psset interactions when metadata is being modified. Treat them accordingly.	2020-12-07 06:21:08 -08:00
David Goldblatt	ca30b5db2b	Introduce hpdata_t. Using an edata_t both for hugepages and the allocations within those hugepages was convenient at first, but has outlived its usefulness. Representing hugepages explicitly, with their own data structure, will make future development easier.	2020-12-07 06:21:08 -08:00
David Goldblatt	4a15008cfb	HPA unit test: skip if unsupported. Previously, we replicated the logic in hpa_supported in the test as well.	2020-12-07 06:21:08 -08:00
David Goldblatt	43af63fff4	HPA: Manage whole hugepages at a time. This redesigns the HPA implementation to allow us to manage hugepages all at once, locally, without relying on a global fallback.	2020-12-07 06:21:08 -08:00
David Goldblatt	c1b2a77933	psset: Move in stats. A later change will benefit from having these functions pulled into a psset-module set of functions.	2020-12-07 06:21:08 -08:00
David Goldblatt	d0a991d47b	psset: Add insert/remove functions. These will allow us to (for instance) move pageslabs from a psset dedicated to not-yet-hugeified pages to one dedicated to hugeified ones.	2020-12-07 06:21:08 -08:00
David Goldblatt	ecd39418ac	Add fxp: A fixed-point math library. This will be used in the next commit to allow non-integer values for narenas_ratio.	2020-12-04 23:48:19 -08:00
Yinan Zhang	d96e4525ad	Route batch allocation of small batch size to tcache	2020-11-16 20:58:01 -08:00
Yinan Zhang	ac480136d7	Split out locality checking in batch allocation tests	2020-11-16 20:58:01 -08:00
Yinan Zhang	be5e49f4fa	Add a batch mode for cache_bin_alloc()	2020-11-16 20:58:01 -08:00
Yinan Zhang	4a65f34930	Fix a cache bin test	2020-11-16 20:58:01 -08:00
Yinan Zhang	9545c2cd36	Add sample interval to prof last-N dump	2020-11-13 15:33:27 -08:00
David Goldblatt	cf2549a149	Add a per-arena oversize_threshold. This can let manual arenas trade off memory and CPU the way auto arenas do.	2020-11-13 13:45:35 -08:00
David Goldblatt	4ca3d91e96	Rename geom_grow -> exp_grow. This was promised in the review of the introduction of geom_grow, but would have been painful to do there because of the series that introduced it. Now that those are committed, renaming is easier.	2020-11-13 13:42:33 -08:00
David Goldblatt	03a6047111	Edata cache small: rewrite. In previous designs, this was intended to be a sort of cache that couldn't fail. In the current design, we want to use it just as a contention reduction mechanism. Rewrite it with those goals in mind.	2020-11-05 12:34:43 -08:00
David Goldblatt	1b3ee75667	Add experimental.thread.activity_callback. This (experimental, undocumented) functionality can be used by users to track various statistics of interest at a finer level of granularity than the thread.	2020-11-05 12:33:25 -08:00
Qi Wang	bf72188f80	Allow opt.tcache_max to accept small size classes. Previously all the small size classes were cached. However this has downsides -- particularly when page size is greater than 4K (e.g. iOS), which will result in much higher SMALL_MAXCLASS. This change allows tcache_max to be set to lower values, to better control resources taken by tcache.	2020-10-24 20:43:44 -07:00
David Goldblatt	d16849c91d	psset: Do first-fit based on slab age. This functions more like the serial number strategy of the ecache and hpa_central_t. Longer-lived slabs are more likely to continue to live for longer in the future.	2020-10-23 11:14:34 -07:00
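
The fxp commits in the log above (ecd39418ac, bdb7307ff2, caef4c2868) mention a fixed-point library, a way to write percentages in source, and a multiply that scales a size_t by a fraction without overflow. The following is a minimal sketch of that idea, not the jemalloc code: it assumes a 16.16 fixed-point layout, and every name in it (sketch_fxp_t, SKETCH_FXP_ONE, sketch_fxp_from_percent, sketch_mul_frac) is hypothetical. It multiplies first when the product is guaranteed to fit in 64 bits, and shifts first otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16.16 fixed-point type: the low 16 bits hold the fraction. */
typedef uint32_t sketch_fxp_t;

/* The value 1.0 in this assumed representation. */
#define SKETCH_FXP_ONE ((sketch_fxp_t)1 << 16)

/* Build a fixed-point fraction from a whole percentage in [0, 100]. */
static inline sketch_fxp_t
sketch_fxp_from_percent(unsigned pct) {
	assert(pct <= 100);
	return (sketch_fxp_t)(((uint64_t)pct << 16) / 100);
}

/* Multiply x by frac (at most 1.0) without overflowing 64-bit arithmetic. */
static inline size_t
sketch_mul_frac(size_t x_orig, sketch_fxp_t frac) {
	assert(frac <= SKETCH_FXP_ONE);
	uint64_t x = (uint64_t)x_orig;
	if (x < ((uint64_t)1 << 48)) {
		/* x < 2^48 and frac <= 2^16, so x * frac fits in 64 bits. */
		return (size_t)((x * frac) >> 16);
	}
	/* Otherwise drop the low 16 bits of x first, trading precision for safety. */
	return (size_t)((x >> 16) * (uint64_t)frac);
}
```

Under these assumptions, sketch_mul_frac(400, sketch_fxp_from_percent(25)) yields 100. Because the fraction never exceeds 1.0, the result is never larger than the input, so the final cast back to size_t cannot truncate.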

519 Commits