server-skynet-source-3rd-jemalloc

project-base/server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Qi Wang	50b473c883	Set commit properly for FreeBSD w/ overcommit. When overcommit is enabled, commit needs to be set when doing mmap(). The regression was introduced in f80c97e.	2018-11-05 09:47:04 -08:00
Edward Tomasz Napierala	ceba1dde27	Make use of pthread_set_name_np(3) on FreeBSD.	2018-10-24 10:06:37 -07:00
Dave Watson	0f8313659e	malloc: Add a fastpath This diff adds a fastpath that assumes size <= SC_LOOKUP_MAXCLASS, and that we hit tcache. If either of these is false, we fall back to the previous codepath (renamed 'malloc_default'). Crucially, we only tail call malloc_default, and with the same kind and number of arguments, so that both clang and gcc tail-calling will kick in - therefore malloc() gets treated as a leaf function, and there are no caller-saved registers. Previously malloc() contained 5 caller saved registers on x64, resulting in at least 10 extra memory-movement instructions. In microbenchmarks this results in up to ~10% improvement in malloc() fastpath. In real programs, this is a ~1% CPU and latency improvement overall.	2018-10-18 08:32:19 -07:00
Dave Watson	ac34afb403	drop bump_empty_alloc option. Size class lookup support used instead.	2018-10-17 08:50:58 -07:00
Dave Watson	4edbb7c64c	sz: Support 0 size in size2index lookup/compute	2018-10-17 08:50:58 -07:00
gnzlbg	01e2a38e5a	Make `smallocx` symbol name depend on the `JEMALLOC_VERSION_GID` This commit concatenates the `JEMALLOC_VERSION_GID` to the `smallocx` symbol name, such that the symbol ends up exported as `smallocx_{git_hash}`.	2018-10-17 07:12:28 -07:00
gnzlbg	741fca1bb7	Hide smallocx even when enabled from the library API The experimental `smallocx` API is not exposed via header files, requiring the users to peek at `jemalloc`'s source code to manually add the external declarations to their own programs. This should reinforce that `smallocx` is experimental, and that `jemalloc` does not offer any kind of backwards compatibility or ABI guarantees for it.	2018-10-17 07:12:28 -07:00
gnzlbg	08260a6b94	Add experimental API: smallocx_return_t smallocx(size, flags) --- Motivation: This new experimental memory-allocation API returns a pointer to the allocation as well as the usable size of the allocated memory region. The `s` in `smallocx` stands for `sized`-`mallocx`, attempting to convey that this API returns the size of the allocated memory region. It should allow C++ P0901r0 [0] and Rust Alloc::alloc_excess to make use of it. The main purpose of these APIs is to improve telemetry. It is more accurate to register `smallocx(size, flags)` than `smallocx(nallocx(size), flags)`, for example. The latter will always line up perfectly with the existing size classes, causing a loss of telemetry information about the internal fragmentation induced by potentially poor size-class choices. Instrumenting `nallocx` does not help much since user code can cache its result and use it repeatedly. --- Implementation: The implementation adds a new `usize` option to `static_opts_s` and a `usize` variable to `dynamic_opts_s`. These are then used to cache the result of `sz_index2size` and similar functions in the code paths in which they are unconditionally invoked. In the code paths in which these functions are not unconditionally invoked, `smallocx` calls, as opposed to `mallocx`, these functions explicitly. --- [0]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0901r0.html	2018-10-17 07:12:28 -07:00
Dave Watson	325e3305fc	remove malloc_init() off the fastpath	2018-10-15 10:11:08 -07:00
Dave Watson	997d86acc6	restrict bytes_until_sample to int64_t. This allows optimal asm generation of sub bytes_until_sample, usize; je; for x86 arch. Subtraction is unconditional, and only flags are checked for the jump, no extra compare is necessary. This also reduces register pressure.	2018-10-15 08:24:12 -07:00
Dave Watson	d1a861fa80	add a check for SC_LARGE_MAXCLASS If we assume SC_LARGE_MAXCLASS will always fit in a SSIZE_T, then we can optimize some checks by unconditional subtraction, and then checking flags only, without a compare statement in x86.	2018-10-15 08:24:12 -07:00
Dave Watson	9ed3bdc848	move bytes until sample to tsd. Fastpath allocation does not need to load tdata now, avoiding several branches.	2018-10-15 08:24:12 -07:00
jsteemann	856319dc8a	check return value of `malloc_read_fd` In case `malloc_read_fd` returns a negative error number, the result would afterwards be cast to an unsigned size_t, and could theoretically cause an out-of-bounds memory access in the following `strncmp` call.	2018-10-11 17:25:20 -07:00
Edward Tomasz Napierala	f80c97e477	Rework the way jemalloc uses mmap(2) on FreeBSD. This makes it directly use MAP_EXCL and MAP_ALIGNED() instead of weird workarounds involving mapping at random places and then unmapping parts of them.	2018-10-06 22:06:56 -07:00
Edward Tomasz Napierala	676cdd6679	Disable runtime detection of lazy purging support on FreeBSD. The check doesn't seem to serve any purpose here, and this shaves off three syscalls on binary startup.	2018-10-06 22:06:56 -07:00
David Goldblatt	88771fa013	Bootstrapping: don't overwrite opt_prof_prefix.	2018-09-12 17:06:06 -07:00
David Carlier	0771ff2cea	FreeBSD build changes and allow to run the tests.	2018-08-09 10:41:20 -07:00
David Goldblatt	e8ec9528ab	Allow the use of readlinkat over readlink. This can be useful in situations where readlink is disallowed.	2018-08-03 14:04:32 -07:00
Tyler Etzel	126252a7e6	Add stats for the size of extent_avail heap	2018-08-02 10:16:06 -07:00
Tyler Etzel	c14e6c0819	Add extents information to mallocstats output - Show number/bytes of extents of each size that are dirty, muzzy, retained.	2018-08-02 10:16:06 -07:00
Tyler Etzel	5e23f96dd4	Add unit tests for logging	2018-08-01 13:27:11 -07:00
Tyler Etzel	b664bd7935	Add logging for sampled allocations - prof_opt_log flag starts logging automatically at runtime - prof_log_{start,stop} mallctl for manual control	2018-08-01 13:27:11 -07:00
Tyler Etzel	eb261e53a6	Small refactoring of emitter - Make API more clear for using as standalone json emitter - Support cases that weren't possible before, e.g. - emitting primitive values in an array - emitting nested arrays	2018-08-01 13:27:11 -07:00
David Goldblatt	41b7372ead	TSD: Add fork support to tsd_nominal_tsds. In case of multithreaded fork, we want to leave the child in a reasonable state, in which tsd_nominal_tsds is either empty or contains only the forking thread.	2018-07-26 17:22:25 -07:00
David Goldblatt	013ab26c86	TSD: Add a tsd_nominal_list death assertion. A thread should have had its state transition away from nominal before it dies. This change adds that to the list of thread death assertions.	2018-07-26 17:22:25 -07:00
David Goldblatt	3aba072cef	SC: Remove global data. The global data is mostly only used at initialization, or for easy access to values we could compute statically. Instead of consuming that space (and risking TLB misses), we can just pass around a pointer to stack data during bootstrapping.	2018-07-23 13:37:08 -07:00
Qi Wang	4bc48718b2	Tolerate experimental features for abort_conf. Do not abort on unrecognized experimental options. This helps us test experimental features with abort_conf enabled.	2018-07-17 20:40:32 -07:00
David Goldblatt	55e5cc1341	SC: Make some key size classes static. The largest small class, smallest large class, and largest large class may all be needed down fast paths; to avoid the risk of touching another cache line, we can make them available as constants.	2018-07-12 20:53:06 -07:00
David T. Goldblatt	5112d9e5fd	Add MALLOC_CONF parsing for dynamic slab sizes. This actually enables us to change the values.	2018-07-12 20:53:06 -07:00
David T. Goldblatt	4610ffa942	Bootstrapping: Parse MALLOC_CONF before using slab sizes. I.e., parse before booting the bin module or sz module. This lets us tweak size class settings before committing to them by letting them leak into other modules. This commit does not actually do any tweaking of the size classes; it just changes bootstrapping order; this may help bisecting any bootstrapping failures on poorly-tested architectures.	2018-07-12 20:53:06 -07:00
David T. Goldblatt	a7f68aed3e	SC: Add page customization functionality.	2018-07-12 20:53:06 -07:00
David T. Goldblatt	017dca198c	SC module: Add a note on style.	2018-07-12 20:53:06 -07:00
David Goldblatt	0552aad91b	Kill size_classes.sh. We've moved size class computations to boot time; they were being used only to check that the computations resulted in equal values.	2018-07-12 20:53:06 -07:00
David Goldblatt	4f55c0ec22	Translate size class computation from bash shell into C. This is the last big step in making size classes a runtime computation rather than a configure-time one. The compile-time computation has been left in, for now, to allow assertion checking that the results are identical.	2018-07-12 20:53:06 -07:00
David Goldblatt	e904f813b4	Hide size class computation behind a layer of indirection. This commit removes almost all the dependencies on size_classes.h, accessing the data there only via the new module sc.h, which does not depend on any configuration options. In a subsequent commit, we'll remove the configure-time size class computations, doing them at boot time, instead.	2018-07-12 20:53:06 -07:00
gnzlbg	3d29d11ac2	Clean compilation -Wextra Before this commit jemalloc produced many warnings when compiled with -Wextra with both Clang and GCC. This commit fixes the issues raised by these warnings or suppresses them if they were spurious at least for the Clang and GCC versions covered by CI. This commit: * adds `JEMALLOC_DIAGNOSTIC` macros: `JEMALLOC_DIAGNOSTIC_{PUSH,POP}` are used to modify the stack of enabled diagnostics. The `JEMALLOC_DIAGNOSTIC_IGNORE_...` macros are used to ignore a concrete diagnostic. * adds `JEMALLOC_FALLTHROUGH` macro to explicitly state that falling through `case` labels in a `switch` statement is intended * Removes all UNUSED annotations on function parameters. The warning -Wunused-parameter is now disabled globally in `jemalloc_internal_macros.h` for all translation units that include that header. It is never re-enabled since that header cannot be included by users. * locally suppresses some -Wextra diagnostics: * `-Wmissing-field-initializers` is buggy in older Clang and GCC versions, where it does not understand that, in C, `= {0}` is a common idiom to initialize a struct to zero * `-Wtype-limits` is suppressed in a particular situation where a generic macro, used in multiple different places, compares an unsigned integer for smaller than zero, which is always false. * `-Walloc-size-larger-than=` diagnostics warn when an allocation function is called with a size that is too large (out-of-range). These are suppressed in the parts of the tests where `jemalloc` explicitly does this to test that the allocation functions fail properly. * adds a new CI build bot that runs the log unit test on CI. Closes #1196 .	2018-07-09 21:40:42 -07:00
Qi Wang	cdf15b458a	Rename huge_threshold to experimental, and tweak documentation.	2018-06-29 10:35:02 -07:00
Qi Wang	1302af4c43	Add ctl and stats for opt.huge_threshold.	2018-06-29 10:35:02 -07:00
Qi Wang	79522b2fc2	Refactor arena_is_auto.	2018-06-29 10:35:02 -07:00
Qi Wang	94a88c26f4	Implement huge arena: opt.huge_threshold. The feature allows using a dedicated arena for huge allocations. We want the additional arena to separate huge allocation because: 1) mixing small extents with huge ones causes fragmentation over the long run (this feature reduces VM size significantly); 2) with many arenas, huge extents rarely get reused across threads; and 3) huge allocations happen way less frequently, therefore no concerns for lock contention.	2018-06-29 10:35:02 -07:00
Qi Wang	77a71ef2b7	Fall back to the default pthread_create if RTLD_NEXT fails.	2018-06-28 13:18:21 -07:00
David Goldblatt	d1e11d48d4	Move tsd link and in_hook after tcache. This can lead to better cache utilization down the common paths where we don't touch the link.	2018-06-27 13:39:02 -07:00
Qi Wang	fec1ef7c91	Fix arena locking in tcache_bin_flush_large(). This regression was introduced in c834912 (incorrect arena used).	2018-06-26 23:13:15 -07:00
Qi Wang	0ff7ff3ec7	Optimize ixalloc by avoiding a size lookup.	2018-06-05 21:03:51 -07:00
Qi Wang	c834912aa9	Avoid taking large_mtx for auto arenas. On tcache flush path, we can avoid touching the large_mtx for auto arenas, since it was only needed for manual arenas where arena_reset is allowed.	2018-06-05 15:16:03 -07:00
Qi Wang	9bd8deb260	Fix stats output for opt.lg_extent_max_active_fit.	2018-06-05 10:23:28 -07:00
Qi Wang	d22e150320	Avoid taking extents_muzzy mutex when muzzy is disabled. When muzzy decay is disabled, no need to allocate from extents_muzzy. This saves us a couple of mutex operations down the extents_alloc path.	2018-05-24 14:40:56 -07:00
David Goldblatt	a7f749c9af	Hooks: Protect against reentrancy. Previously, we made the user deal with this themselves, but that's not good enough; if hooks may allocate, we should test the allocation pathways down hooks. If we're doing that, we might as well actually implement the protection for the user.	2018-05-18 11:43:03 -07:00
David Goldblatt	0379235f47	Tests: Shouldn't be able to change global slowness. This can help ensure that we don't leave slowness changes behind in case of resource exhaustion.	2018-05-18 11:43:03 -07:00
David Goldblatt	59e371f463	Hooks: Add a hook exhaustion test. When we run out of space in which to store hooks, we should return EAGAIN from the mallctl, but not otherwise misbehave.	2018-05-18 11:43:03 -07:00
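
The signed countdown described in 997d86acc6 and d1a861fa80 (subtract unconditionally, then branch on the resulting flags) can be sketched roughly as follows. This is a minimal illustration, not jemalloc's actual code: `sample_state_t` and `sample_countdown` are hypothetical names, and treating a negative counter as the sampling trigger is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-thread counter; not jemalloc's tsd layout. */
typedef struct {
    int64_t bytes_until_sample;
} sample_state_t;

/*
 * Subtract the allocation size unconditionally, then test only the sign of
 * the result. Because the counter is signed, a compiler can emit a single
 * subtraction followed by a flag-based jump, with no separate compare.
 * Returns true once the counter has gone negative (assumed trigger).
 */
static bool
sample_countdown(sample_state_t *st, size_t usize) {
    /* Precondition mirroring the SC_LARGE_MAXCLASS check: size fits in a signed type. */
    assert(usize <= (size_t)INT64_MAX);
    st->bytes_until_sample -= (int64_t)usize;
    return st->bytes_until_sample < 0;
}
```

A caller would reset `bytes_until_sample` after a `true` result; that reset logic is left out of the sketch.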

1827 Commits