server-skynet-source-3rd-jemalloc

project-base/server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Qi Wang	8dabf81df1	Bypass extent_dalloc when retain is enabled. When retain is enabled, the default dalloc hook does nothing (since we avoid munmap). But the overhead of preparing the call is high: specifically, the extent de-register and re-register involve locking and extent/rtree modifications. This diff bypasses the call when retain is enabled.	2018-11-08 11:32:25 -08:00
Qi Wang	50b473c883	Set commit properly for FreeBSD w/ overcommit. When overcommit is enabled, commit needs to be set when doing mmap(). The regression was introduced in f80c97e.	2018-11-05 09:47:04 -08:00
Justin Hibbits	be0749f591	Restrict lwsync to powerpc64 only. Nearly all 32-bit powerpc hardware treats lwsync as sync, and some cores (Freescale e500) trap lwsync as an illegal instruction, which then gets emulated in the kernel. To avoid unnecessary traps on the e500, use sync on all 32-bit powerpc. This pessimizes 32-bit software running on 64-bit hardware, but that case should be rare.	2018-10-24 11:18:55 -07:00
Edward Tomasz Napierala	ceba1dde27	Make use of pthread_set_name_np(3) on FreeBSD.	2018-10-24 10:06:37 -07:00
Dave Watson	936bc2aa15	prof: Fix memory regression. The diff 'refactor prof accum...' moved the bytes_until_sample subtraction before the load of tdata. If tdata is null, tdata_get(true) will overwrite bytes_until_sample, but we still sample the current allocation. Instead, do the subtraction and check again, to keep the previous behavior. blame-rev: 0ac524308d3f636d1a4b5149fa7adf24cf426d9c	2018-10-23 12:39:57 -07:00
Dave Watson	0f8313659e	malloc: Add a fastpath. This diff adds a fastpath that assumes size <= SC_LOOKUP_MAXCLASS and a tcache hit. If either assumption is false, we fall back to the previous codepath (renamed 'malloc_default'). Crucially, we only tail call malloc_default, with the same kind and number of arguments, so that tail-call optimization kicks in under both clang and gcc; malloc() is therefore treated as a leaf function, and there are no caller-saved registers. Previously malloc() used 5 caller-saved registers on x64, resulting in at least 10 extra memory-movement instructions. In microbenchmarks this yields up to ~10% improvement in the malloc() fastpath. In real programs, it is a ~1% CPU and latency improvement overall.	2018-10-18 08:32:19 -07:00
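The fastpath/slowpath split described above can be modeled in a small standalone sketch. Everything here is hypothetical (`sketch_malloc`, `sketch_malloc_default`, the 512-byte bin spacing, and a `_Thread_local` array standing in for the tcache); it is not jemalloc's code, only an illustration of the structure the commit describes: the fast function handles one cheap case and ends every other path in a tail call with an identical argument list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical limits; jemalloc's real cutoff is SC_LOOKUP_MAXCLASS. */
#define SKETCH_LOOKUP_MAXCLASS 4096
#define SKETCH_NBINS ((SKETCH_LOOKUP_MAXCLASS + 511) / 512 + 1) /* 9 bins */
#define SKETCH_BIN_CAP 4

typedef struct {
    void *slots[SKETCH_BIN_CAP];
    int ncached;
} sketch_bin_t;

/* Toy per-thread cache standing in for a tcache. */
static _Thread_local sketch_bin_t sketch_tcache[SKETCH_NBINS];

static size_t sketch_bin_index(size_t size) {
    return (size + 511) / 512; /* 0 -> bin 0, 1..512 -> bin 1, ... */
}

/* Slow path: handles every case (here it simply defers to the system malloc). */
static void *sketch_malloc_default(size_t size) {
    return malloc(size == 0 ? 1 : size);
}

/* Fast path: only small sizes that hit the cache. Each miss is a tail call
 * with the same argument list, which can let gcc and clang emit a jump
 * instead of a call (sibling-call optimization, typically at -O2). */
void *sketch_malloc(size_t size) {
    if (size > SKETCH_LOOKUP_MAXCLASS) {
        return sketch_malloc_default(size);
    }
    sketch_bin_t *bin = &sketch_tcache[sketch_bin_index(size)];
    if (bin->ncached == 0) {
        return sketch_malloc_default(size);
    }
    return bin->slots[--bin->ncached];
}

/* Returns 1 if ptr was cached for reuse, 0 if the bin is full or the size is
 * not cacheable. */
int sketch_tcache_push(size_t size, void *ptr) {
    assert(ptr != NULL);
    if (size > SKETCH_LOOKUP_MAXCLASS) {
        return 0;
    }
    sketch_bin_t *bin = &sketch_tcache[sketch_bin_index(size)];
    if (bin->ncached == SKETCH_BIN_CAP) {
        return 0;
    }
    bin->slots[bin->ncached++] = ptr;
    return 1;
}
```

Whether the compiler actually turns the tail calls into jumps depends on the optimization level and target; inspecting the generated assembly is the way to confirm it.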
Dave Watson	0ec656eb71	ticker: add ticker_trytick. For the fastpath, we want to tick, but undo the tick and jump to the slowpath if the ticker would fire.	2018-10-18 08:32:19 -07:00
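A minimal sketch of the try-tick idea, assuming a simple countdown ticker. The type and function names (`sketch_ticker_t`, `sketch_ticker_trytick`) are hypothetical and this is not jemalloc's `ticker.h`. The fast path decrements; if that would make the ticker fire, it undoes the decrement and reports it, so the caller can take the slow path where the real tick (and its periodic work) happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical countdown ticker modeled on the commit message. */
typedef struct {
    int32_t tick;   /* ticks left before the event fires */
    int32_t nticks; /* value reloaded after the event fires */
} sketch_ticker_t;

static void sketch_ticker_init(sketch_ticker_t *t, int32_t nticks) {
    assert(nticks >= 0);
    t->tick = nticks;
    t->nticks = nticks;
}

/* Slow-path tick: decrement; on firing, reload and return true. */
static bool sketch_ticker_tick(sketch_ticker_t *t) {
    if (--t->tick < 0) {
        t->tick = t->nticks;
        return true;
    }
    return false;
}

/* Fast-path tick: decrement, but if that would fire, undo the decrement and
 * return true so the caller falls back to the slow path (which ticks for real). */
static bool sketch_ticker_trytick(sketch_ticker_t *t) {
    if (--t->tick < 0) {
        t->tick++; /* undo */
        return true;
    }
    return false;
}
```

A fast path would call `sketch_ticker_trytick` and, on `true`, jump to a slow path that calls `sketch_ticker_tick` and runs the periodic work, so the event still fires exactly once.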
Dave Watson	ac34afb403	drop bump_empty_alloc option. Size class lookup support is used instead.	2018-10-17 08:50:58 -07:00
Dave Watson	4edbb7c64c	sz: Support 0 size in size2index lookup/compute	2018-10-17 08:50:58 -07:00
Dave Watson	2b112ea593	add test for zero-sized alloc and aligned alloc	2018-10-17 08:50:58 -07:00
gnzlbg	01e2a38e5a	Make `smallocx` symbol name depend on the `JEMALLOC_VERSION_GID`. This commit concatenates the `JEMALLOC_VERSION_GID` to the `smallocx` symbol name, such that the symbol ends up exported as `smallocx_{git_hash}`.	2018-10-17 07:12:28 -07:00
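Token pasting is the usual C preprocessor way to bake a build identifier into an exported symbol name. The sketch below uses hypothetical macro names and a placeholder `SKETCH_VERSION_GID` of `abc123`; it does not reproduce jemalloc's actual macros. The two-level concatenation is needed so that the GID macro is expanded before `##` pastes it.

```c
#include <assert.h>

/* Placeholder build id; a real build would pass e.g. -DSKETCH_VERSION_GID=<hash>. */
#ifndef SKETCH_VERSION_GID
#define SKETCH_VERSION_GID abc123
#endif

/* Paste without expanding, plus a wrapper that expands its arguments first. */
#define SKETCH_CONCAT_HELPER(a, b) a##b
#define SKETCH_CONCAT(a, b) SKETCH_CONCAT_HELPER(a, b)
/* SKETCH_VERSIONED(foo) -> foo_<gid>, e.g. foo_abc123. */
#define SKETCH_VERSIONED(name) SKETCH_CONCAT(name##_, SKETCH_VERSION_GID)

/* Defines a function whose linker-visible name is sketch_api_abc123. */
int SKETCH_VERSIONED(sketch_api)(int x) {
    return x + 1;
}
```

Because the symbol name changes with every build id, code that hard-codes a declaration for one build will fail to link against another, which fits the intent of keeping the API experimental.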
gnzlbg	837de32496	Test smallocx on Travis-CI. This commit updates the gen_travis script with a new build bot that covers the experimental `smallocx` API and updates the Travis CI script to test this API.	2018-10-17 07:12:28 -07:00
gnzlbg	741fca1bb7	Hide smallocx even when enabled from the library API. The experimental `smallocx` API is not exposed via header files, requiring users to peek at `jemalloc`'s source code to manually add the external declarations to their own programs. This should reinforce that `smallocx` is experimental, and that `jemalloc` does not offer any kind of backwards compatibility or ABI guarantees for it.	2018-10-17 07:12:28 -07:00
gnzlbg	730e57b08f	Adapts mallocx integration tests for smallocx	2018-10-17 07:12:28 -07:00
gnzlbg	08260a6b94	Add experimental API: smallocx_return_t smallocx(size, flags) --- Motivation: This new experimental memory-allocation API returns a pointer to the allocation as well as the usable size of the allocated memory region. The `s` in `smallocx` stands for `sized`-`mallocx`, attempting to convey that this API returns the size of the allocated memory region. It should allow C++ P0901r0 [0] and Rust Alloc::alloc_excess to make use of it. The main purpose of these APIs is to improve telemetry. It is more accurate to register `smallocx(size, flags)` than `smallocx(nallocx(size), flags)`, for example. The latter will always line up perfectly with the existing size classes, causing a loss of telemetry information about the internal fragmentation induced by potentially poor size-class choices. Instrumenting `nallocx` does not help much since user code can cache its result and use it repeatedly. --- Implementation: The implementation adds a new `usize` option to `static_opts_s` and a `usize` variable to `dynamic_opts_s`. These are then used to cache the result of `sz_index2size` and similar functions in the code paths in which they are unconditionally invoked. In code paths where these functions are not unconditionally invoked, `smallocx` (unlike `mallocx`) calls them explicitly. --- [0]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0901r0.html	2018-10-17 07:12:28 -07:00
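The core idea of `smallocx` is that one call returns both the pointer and the usable size, rather than pairing an allocation with a separate size query. Below is a hedged, standalone illustration: `sketch_sized_alloc_t`, `sketch_smallocx`, and the round-up-to-16-bytes size classes are all hypothetical, and the allocation is delegated to the C library `malloc`. It shows only the shape of such an interface, not jemalloc's implementation or its real size classes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type: the pointer plus the usable size actually reserved. */
typedef struct {
    void *ptr;
    size_t size;
} sketch_sized_alloc_t;

/* Toy size classes: round the request up to a multiple of 16 bytes (minimum 16). */
static size_t sketch_usable_size(size_t size) {
    if (size == 0) {
        size = 1;
    }
    return (size + 15) & ~(size_t)15;
}

/* Allocates at least `size` bytes and reports how many bytes are usable. */
static sketch_sized_alloc_t sketch_smallocx(size_t size) {
    sketch_sized_alloc_t r = {NULL, 0};
    if (size > SIZE_MAX - 15) {
        return r; /* rounding up would overflow */
    }
    size_t usize = sketch_usable_size(size);
    assert(usize >= size);
    r.ptr = malloc(usize);
    if (r.ptr != NULL) {
        r.size = usize;
    }
    return r;
}
```

A caller such as a C++ or Rust allocator adaptor could record `r.size` alongside the requested size, which is the telemetry use case the commit message describes: the gap between the two is the internal fragmentation caused by the size classes.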
Dave Watson	325e3305fc	remove malloc_init() off the fastpath	2018-10-15 10:11:08 -07:00
Dave Watson	997d86acc6	restrict bytes_until_sample to int64_t. This allows optimal asm generation of sub bytes_until_sample, usize; je; for x86 arch. Subtraction is unconditional, and only flags are checked for the jump, no extra compare is necessary. This also reduces register pressure.	2018-10-15 08:24:12 -07:00
Dave Watson	d1a861fa80	add a check for SC_LARGE_MAXCLASS. If we assume SC_LARGE_MAXCLASS will always fit in a SSIZE_T, then we can optimize some checks by doing an unconditional subtraction and then checking only the flags, without a compare instruction on x86.	2018-10-15 08:24:12 -07:00
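The two commits above rely on the same pattern: keep a signed counter, subtract unconditionally, and branch on the sign of the result. A minimal sketch follows; `sketch_bytes_until_sample`, `sketch_should_sample`, `SKETCH_LARGE_MAXCLASS`, and the 1 MiB starting budget are hypothetical, and whether the compiler really reuses the flags set by `sub` instead of emitting a compare depends on the target and optimization level. The `_Static_assert` mirrors the idea behind the SC_LARGE_MAXCLASS check: if the largest request fits in the signed type, the cast before subtracting cannot wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical largest size class; must fit in the signed counter type. */
#define SKETCH_LARGE_MAXCLASS (UINT64_C(1) << 40)
_Static_assert(SKETCH_LARGE_MAXCLASS <= (uint64_t)INT64_MAX,
               "largest size class must fit in int64_t");

/* Hypothetical per-thread sampling budget (the commit keeps a similar counter in tsd). */
static _Thread_local int64_t sketch_bytes_until_sample = INT64_C(1) << 20;

/* Returns true when this allocation should take the sampling slow path. */
static inline bool sketch_should_sample(size_t usize) {
    assert((uint64_t)usize <= SKETCH_LARGE_MAXCLASS);
    /* Subtract first, then test only the sign: no separate comparison of
     * usize against the remaining budget is needed. */
    sketch_bytes_until_sample -= (int64_t)usize;
    return sketch_bytes_until_sample < 0;
}
```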
Dave Watson	0ac524308d	refactor prof accum, so that tdata is not loaded if we aren't going to sample.	2018-10-15 08:24:12 -07:00
Dave Watson	9ed3bdc848	move bytes until sample to tsd. Fastpath allocation does not need to load tdata now, avoiding several branches.	2018-10-15 08:24:12 -07:00
Dave Watson	09adf18f1a	Remove a branch from cache_bin_alloc_easy. Combine the branches for checking for an empty cache_bin and checking for the low watermark.	2018-10-15 08:18:15 -07:00
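A sketch of the branch-merging trick under stated assumptions: the struct layout, the convention that `low_water` becomes -1 after a miss, and all names are hypothetical; this is not jemalloc's `cache_bin.h`. Decrementing first means an empty bin produces `ncached == -1`, which can never exceed a valid low watermark, so the empty check folds into the (rarely taken) watermark branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CAP 8

/* Hypothetical cache bin: a LIFO stack of cached pointers plus a low watermark. */
typedef struct {
    int32_t ncached;   /* number of cached pointers, stored in stack[0..ncached) */
    int32_t low_water; /* minimum ncached since the last reset; -1 after a miss */
    void *stack[SKETCH_CAP];
} sketch_cache_bin_t;

/* Pops a cached pointer. Invariant: -1 <= low_water <= ncached. */
static void *sketch_cache_bin_alloc_easy(sketch_cache_bin_t *bin, bool *success) {
    assert(bin->low_water >= -1 && bin->low_water <= bin->ncached);
    bin->ncached--;
    /* One branch covers both "new low watermark" and "bin was empty". */
    if (bin->ncached <= bin->low_water) {
        bin->low_water = bin->ncached;
        if (bin->ncached == -1) {
            bin->ncached = 0; /* restore: nothing was popped */
            *success = false;
            return NULL;
        }
    }
    *success = true;
    return bin->stack[bin->ncached];
}
```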
jsteemann	856319dc8a	check return value of `malloc_read_fd`. In case `malloc_read_fd` returns a negative error number, the result would afterwards be cast to an unsigned size_t, which could theoretically have caused an out-of-bounds memory access in the following `strncmp` call.	2018-10-11 17:25:20 -07:00
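The bug class here is a signed error return that is later cast to `size_t`. A minimal POSIX sketch (the function name is hypothetical, and plain `read(2)` stands in for jemalloc's internal `malloc_read_fd`) checks the sign before the cast:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Returns true only if reading fd yields exactly `expected`. */
static bool sketch_fd_content_is(int fd, const char *expected) {
    assert(expected != NULL);
    char buf[128];
    ssize_t nread = read(fd, buf, sizeof(buf));
    if (nread < 0) {
        /* Error (e.g. EBADF). Rejecting it before the cast matters:
         * (size_t)-1 is SIZE_MAX, and using that as a strncmp length
         * could read past the end of buf. */
        return false;
    }
    size_t len = (size_t)nread;
    return len == strlen(expected) && strncmp(buf, expected, len) == 0;
}
```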
Edward Tomasz Napierala	f80c97e477	Rework the way jemalloc uses mmap(2) on FreeBSD. This makes it directly use MAP_EXCL and MAP_ALIGNED() instead of weird workarounds involving mapping at random places and then unmapping parts of them.	2018-10-06 22:06:56 -07:00
Edward Tomasz Napierala	676cdd6679	Disable runtime detection of lazy purging support on FreeBSD. The check doesn't seem to serve any purpose here, and this shaves off three syscalls on binary startup.	2018-10-06 22:06:56 -07:00
Rajeev Misra	115ce93562	bit_util: Don't use __builtin_clz on s390x. There's an optimizer bug upstream that results in test failures; reported at https://bugzilla.redhat.com/show_bug.cgi?id=1619354. This works around the failure reported at https://github.com/jemalloc/jemalloc/issues/1307.	2018-09-20 11:25:17 -07:00
David Goldblatt	88771fa013	Bootstrapping: don't overwrite opt_prof_prefix.	2018-09-12 17:06:06 -07:00
rustyx	9f43defb6e	Add sc.c to the MSVC project	2018-09-04 12:58:05 -07:00
Rajeev Misra	4c548a61c8	Bit_util: Use intrinsics for pow2_ceil, where available.	2018-08-15 19:38:31 -07:00
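One common way to implement a power-of-two ceiling with compiler intrinsics is to count the leading zeros of `x - 1`. The sketch below is an assumption about the general technique, not jemalloc's `bit_util.h`; the function name is hypothetical, and it falls back to a portable bit-smearing sequence when the builtin is unavailable. It also avoids the builtin on s390x, echoing the optimizer-bug workaround listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds x up to the next power of two (x itself if already one); valid for
 * 0 <= x <= 2^63, with 0 and 1 both mapping to 1. */
static inline uint64_t sketch_pow2_ceil_u64(uint64_t x) {
    assert(x <= (UINT64_C(1) << 63));
    if (x <= 1) {
        return 1;
    }
#if (defined(__GNUC__) || defined(__clang__)) && !defined(__s390x__)
    /* The highest set bit of x - 1 sits just below the answer; clz locates it,
     * typically in a single instruction. x - 1 >= 1, so clz is well defined. */
    return UINT64_C(1) << (64 - __builtin_clzll(x - 1));
#else
    /* Portable fallback: smear the highest set bit rightward, then add one. */
    x--;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    x |= x >> 32;
    return x + 1;
#endif
}
```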
gnzlbg	36eb0b3d77	Add valgrind build bots to CI. This commit adds two build bots to CI that test the release builds of jemalloc on Linux and macOS under valgrind. The macOS build is not enabled because valgrind reports errors about reads of uninitialized memory in some tests and segfaults in others.	2018-08-13 10:59:20 -07:00
David Goldblatt	1f71e1ca43	Add hook microbenchmark.	2018-08-09 13:16:54 -07:00
David Carlier	0771ff2cea	FreeBSD build changes, and allow the tests to run.	2018-08-09 10:41:20 -07:00
David Goldblatt	e8ec9528ab	Allow the use of readlinkat over readlink. This can be useful in situations where readlink is disallowed.	2018-08-03 14:04:32 -07:00
Tyler Etzel	126252a7e6	Add stats for the size of extent_avail heap	2018-08-02 10:16:06 -07:00
Tyler Etzel	c14e6c0819	Add extents information to mallocstats output: show the number and bytes of extents of each size that are dirty, muzzy, or retained.	2018-08-02 10:16:06 -07:00
Tyler Etzel	33f1aa5bad	Fix comment on SC_NPSIZES.	2018-08-02 10:16:06 -07:00
Tyler Etzel	5e23f96dd4	Add unit tests for logging	2018-08-01 13:27:11 -07:00
Tyler Etzel	b664bd7935	Add logging for sampled allocations. The prof_opt_log flag starts logging automatically at runtime; the prof_log_{start,stop} mallctls provide manual control.	2018-08-01 13:27:11 -07:00
Tyler Etzel	eb261e53a6	Small refactoring of emitter: make the API clearer for use as a standalone JSON emitter, and support cases that weren't possible before, e.g. emitting primitive values in an array and emitting nested arrays.	2018-08-01 13:27:11 -07:00
David Goldblatt	41b7372ead	TSD: Add fork support to tsd_nominal_tsds. In case of multithreaded fork, we want to leave the child in a reasonable state, in which tsd_nominal_tsds is either empty or contains only the forking thread.	2018-07-26 17:22:25 -07:00
David Goldblatt	013ab26c86	TSD: Add a tsd_nominal_list death assertion. A thread should have had its state transition away from nominal before it dies. This change adds that to the list of thread death assertions.	2018-07-26 17:22:25 -07:00
David Goldblatt	3aba072cef	SC: Remove global data. The global data is mostly only used at initialization, or for easy access to values we could compute statically. Instead of consuming that space (and risking TLB misses), we can just pass around a pointer to stack data during bootstrapping.	2018-07-23 13:37:08 -07:00
Qi Wang	4bc48718b2	Tolerate experimental features for abort_conf. Do not abort on unrecognized experimental options. This helps us test experimental features with abort_conf enabled.	2018-07-17 20:40:32 -07:00
gnzlbg	6deed86deb	Test that .travis.yml has been produced by gen_travis.py on CI. This commit checks on Travis-CI that the current `.travis.yml` file equals the output of the `gen_travis.py` script, and updates the `.travis.yml` file accordingly.	2018-07-17 17:55:50 -07:00
gnzlbg	0eb0641cac	Simplify output of gen_travis.py script. This commit simplifies the output of the `gen_travis.py` script by reusing addons. The `.travis.yml` script is updated to reflect these changes.	2018-07-17 17:55:50 -07:00
David Goldblatt	55e5cc1341	SC: Make some key size classes static. The largest small class, smallest large class, and largest large class may all be needed down fast paths; to avoid the risk of touching another cache line, we can make them available as constants.	2018-07-12 20:53:06 -07:00
David T. Goldblatt	5112d9e5fd	Add MALLOC_CONF parsing for dynamic slab sizes. This actually enables us to change the values.	2018-07-12 20:53:06 -07:00
David T. Goldblatt	4610ffa942	Bootstrapping: Parse MALLOC_CONF before using slab sizes. I.e., parse before booting the bin module or sz module. This lets us tweak size class settings before committing to them, i.e. before they leak into other modules. This commit does not actually do any tweaking of the size classes; it just changes the bootstrapping order, which may help with bisecting any bootstrapping failures on poorly-tested architectures.	2018-07-12 20:53:06 -07:00
David T. Goldblatt	a7f68aed3e	SC: Add page customization functionality.	2018-07-12 20:53:06 -07:00
David T. Goldblatt	017dca198c	SC module: Add a note on style.	2018-07-12 20:53:06 -07:00
David Goldblatt	5b7fc9056c	Remove the --with-lg-page-sizes configure option. This appears to be unused.	2018-07-12 20:53:06 -07:00

2975 Commits