server-skynet-source-3rd-jemalloc

project-base/server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Jason Evans	61a6dfcd5f	Constify various internal arena APIs.	2016-03-23 16:15:42 -07:00
Jason Evans	6a885198c2	Always inline performance-critical rtree operations.	2016-03-23 16:15:42 -07:00
Jason Evans	6c460ad91b	Optimize rtree_get(). Specialize fast path to avoid code that cannot execute for dependent loads. Manually unroll.	2016-03-22 17:54:35 -07:00
Jason Evans	22af74e106	Refactor out signed/unsigned comparisons.	2016-03-15 09:40:02 -07:00
Rajeev Misra	ca18f2834e	typecast address to pointer to byte to avoid unaligned memory access error	2016-03-10 22:49:05 -08:00
Jason Evans	613cdc80f6	Convert arena_bin_t's runs from a tree to a heap.	2016-03-08 13:48:27 -08:00
Dave Watson	4a0dbb5ac8	Use pairing heap for arena->runs_avail Use pairing heap instead of red black tree in arena runs_avail. The extra links are unioned with the bitmap_t, so this change doesn't use any extra memory. Canaries show this change to be a 1% cpu win, and 2% latency win. In particular, large free()s, and small bin frees are now O(1) (barring coalescing). I also tested changing bin->runs to be a pairing heap, but saw a much smaller win, and it would mean increasing the size of arena_run_s by two pointers, so I left that as an rb-tree for now.	2016-03-08 13:48:27 -08:00
Jason Evans	f8d80d62a8	Refactor ph_merge_ordered() out of ph_merge().	2016-03-08 13:48:27 -08:00
Dave Watson	6bafa6678f	Pairing heap Initial implementation of a twopass pairing heap with aux list. Research papers linked in comments. Where search/nsearch/last aren't needed, this gives much faster first(), delete(), and insert(). Insert is O(1), and first/delete don't have to walk the whole tree. Also tested rb_old with parent pointers - it was better than the current rb.h for memory loads, but still much worse than a pairing heap. An array-based heap would be much faster if everything fits in memory, but on a cold cache it has many more memory loads for most operations.	2016-03-08 13:46:19 -08:00
Jason Evans	022f6891fa	Avoid a potential innocuous compiler warning. Add a cast to avoid comparing a ssize_t value to a uint64_t value that is always larger than a 32-bit ssize_t. This silences an innocuous compiler warning from e.g. gcc 4.2.1 about the comparison always having the same result.	2016-03-02 22:45:37 -08:00
Jason Evans	3c07f803aa	Fix stats.arenas.<i>.[...] for --disable-stats case. Add missing stats.arenas.<i>.{dss,lg_dirty_mult,decay_time} initialization. Fix stats.arenas.<i>.{pactive,pdirty} to read under the protection of the arena mutex.	2016-02-27 20:40:13 -08:00
Jason Evans	69acd25a64	Add/alphabetize private symbols.	2016-02-27 15:35:52 -08:00
Jason Evans	40ee9aa957	Fix stats.cactive accounting regression. Fix stats.cactive accounting to always increase/decrease by multiples of the chunk size, even for huge size classes that are not multiples of the chunk size, e.g. {2.5, 3, 3.5, 5, 7} MiB with 2 MiB chunk size. This regression was introduced by 155bfa7da18cab0d21d87aa2dce4554166836f5d (Normalize size classes.) and first released in 4.0.0. This resolves #336.	2016-02-27 15:35:52 -08:00
Jason Evans	20fad3430c	Refactor some bitmap cpp logic.	2016-02-26 14:43:39 -08:00
Dave Watson	b8823ab026	Use linear scan for small bitmaps For small bitmaps, a linear scan of the bitmap is slightly faster than a tree search - bitmap_t is more compact, and there are fewer writes since we don't have to propogate state transitions up the tree. On x86_64 with the current settings, I'm seeing ~.5%-1% CPU improvement in production canaries with this change. The old tree code is left since 32bit sizes are much larger (and ffsl smaller), and maybe the run sizes will change in the future. This resolves #339.	2016-02-26 14:21:10 -08:00
Jason Evans	01ecdf32d6	Miscellaneous bitmap refactoring.	2016-02-26 14:21:10 -08:00
Jason Evans	42ce80e15a	Silence miscellaneous 64-to-32-bit data loss warnings. This resolves #341.	2016-02-25 20:51:00 -08:00
Jason Evans	0c516a00c4	Make *allocx() size class overflow behavior defined. Limit supported size and alignment to HUGE_MAXCLASS, which in turn is now limited to be less than PTRDIFF_MAX. This resolves #278 and #295.	2016-02-25 15:29:49 -08:00
Jason Evans	767d85061a	Refactor arenas array (fixes deadlock). Refactor the arenas array, which contains pointers to all extant arenas, such that it starts out as a sparse array of maximum size, and use double-checked atomics-based reads as the basis for fast and simple arena_get(). Additionally, reduce arenas_lock's role such that it only protects against arena initalization races. These changes remove the possibility for arena lookups to trigger locking, which resolves at least one known (fork-related) deadlock. This resolves #315.	2016-02-24 23:58:10 -08:00
Jason Evans	c7a9a6c86b	Attempt mmap-based in-place huge reallocation. Attempt mmap-based in-place huge reallocation by plumbing new_addr into chunk_alloc_mmap(). This can dramatically speed up incremental huge reallocation. This resolves #335.	2016-02-24 17:23:18 -08:00
Jason Evans	aa63d5d377	Fix ffs_zu() compilation error on MinGW. This regression was caused by 9f4ee6034c3ac6a8c8b5f9a0d76822fb2fd90c41 (Refactor jemalloc_ffs() into ffs_().).	2016-02-24 14:01:47 -08:00
Jason Evans	9e1810ca9d	Silence miscellaneous 64-to-32-bit data loss warnings.	2016-02-24 13:03:48 -08:00
Jason Evans	1c42a04cc6	Change lg_floor() return type from size_t to unsigned.	2016-02-24 13:03:48 -08:00
Jason Evans	8f683b94a7	Make opt_narenas unsigned rather than size_t.	2016-02-24 13:03:48 -08:00
Jason Evans	603b3bd413	Make nhbins unsigned rather than size_t.	2016-02-24 13:03:48 -08:00
Jason Evans	9f4ee6034c	Refactor jemalloc_ffs() into ffs_(). Use appropriate versions to resolve 64-to-32-bit data loss warnings.	2016-02-24 13:03:48 -08:00
Dmitri Smirnov	b41a07c31a	Fix Windows build issues This resolves #333.	2016-02-23 18:55:45 -08:00
Jason Evans	ae45142adc	Collapse arena_avail_tree_* into arena_run_tree_*. These tree types converged to become identical, yet they still had independently generated red-black tree implementations.	2016-02-23 18:27:24 -08:00
Dave Watson	3417a304cc	Separate arena_avail trees Separate run trees by index, replacing the previous quantize logic. Quantization by index is now performed only on insertion / removal from the tree, and not on node comparison, saving some cpu. This also means we don't have to dereference the miscelm* pointers, saving half of the memory loads from miscelms/mapbits that have fallen out of cache. A linear scan of the indicies appears to be fast enough. The only cost of this is an extra tree array in each arena.	2016-02-23 18:09:36 -08:00
Dave Watson	2b1fc90b7b	Remove rbt_nil Since this is an intrusive tree, rbt_nil is the whole size of the node and can be quite large. For example, miscelm is ~100 bytes.	2016-02-23 18:09:25 -08:00
Jason Evans	0da8ce1e96	Use table lookup for run_quantize_{floor,ceil}(). Reduce run quantization overhead by generating lookup tables during bootstrapping, and using the tables for all subsequent run quantization.	2016-02-22 16:47:34 -08:00
Jason Evans	a9a4684792	Test run quantization. Also rename run_quantize_*() to improve clarity. These tests demonstrate that run_quantize_ceil() is flawed.	2016-02-22 14:58:05 -08:00
Jason Evans	817d9030a5	Indentation style cleanup.	2016-02-22 10:44:58 -08:00
Jason Evans	9bad079039	Refactor time_* into nstime_*. Use a single uint64_t in nstime_t to store nanoseconds rather than using struct timespec. This reduces fragility around conversions between long and uint64_t, especially missing casts that only cause problems on 32-bit platforms.	2016-02-21 21:39:05 -08:00
Jason Evans	56139dc403	Remove _WIN32-specific struct timespec declaration. struct timespec is already defined by the system (at least on MinGW).	2016-02-20 23:43:17 -08:00
Jason Evans	ecae12323d	Fix overflow in prng_range(). Add jemalloc_ffs64() and use it instead of jemalloc_ffsl() in prng_range(), since long is not guaranteed to be a 64-bit type.	2016-02-20 23:41:33 -08:00
Jason Evans	aac93f414e	Add symbol mangling for prng_[lg_]range().	2016-02-20 11:26:00 -08:00
rustyx	3c2c5a5071	Fix warning in ipalloc	2016-02-20 10:55:18 -08:00
Christopher Ferris	effaf7d40f	Fix a typo in the ckh_search() prototype.	2016-02-20 10:26:17 -08:00
Jason Evans	a0aaad1afa	Handle unaligned keys in hash(). Reported by Christopher Ferris <cferris@google.com>.	2016-02-20 10:23:48 -08:00
Jason Evans	243f7a0508	Implement decay-based unused dirty page purging. This is an alternative to the existing ratio-based unused dirty page purging, and is intended to eventually become the sole purging mechanism. Add mallctls: - opt.purge - opt.decay_time - arena.<i>.decay - arena.<i>.decay_time - arenas.decay_time - stats.arenas.<i>.decay_time This resolves #325.	2016-02-19 20:56:21 -08:00
Jason Evans	8e82af1166	Implement smoothstep table generation. Check in a generated smootherstep table as smoothstep.h rather than generating it at configure time, since not all systems (e.g. Windows) have dc.	2016-02-19 20:56:15 -08:00
Jason Evans	db927b6727	Refactor arenas_cache tsd. Refactor arenas_cache tsd into arenas_tdata, which is a structure of type arena_tdata_t.	2016-02-19 20:32:37 -08:00
Jason Evans	578cd16581	Refactor arena_malloc_hard() out of arena_malloc().	2016-02-19 20:32:32 -08:00
Jason Evans	34676d3369	Refactor prng* from cpp macros into inline functions. Remove 32-bit variant, convert prng64() to prng_lg_range(), and add prng_range().	2016-02-19 20:29:06 -08:00
Jason Evans	c87ab25d18	Use ticker for incremental tcache GC.	2016-02-19 20:29:06 -08:00
Jason Evans	9998000b2b	Implement ticker. Implement ticker, which provides a simple API for ticking off some number of events before indicating that the ticker has hit its limit.	2016-02-19 20:29:06 -08:00
Jason Evans	94451d184b	Flesh out time_*() API.	2016-02-19 20:29:06 -08:00
Cameron Evans	e5d5a4a517	Add time_update().	2016-02-19 20:29:06 -08:00
Jason Evans	f829009929	Add --with-malloc-conf. Add --with-malloc-conf, which makes it possible to embed a default options string during configuration.	2016-02-19 20:29:06 -08:00

1 2 3 4 5 ...

435 Commits