server-skynet-source-3rd-jemalloc

project-base/server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Yinan Zhang	6d8e616902	Make buffered writer an independent module	2020-01-10 11:59:02 -08:00
Yinan Zhang	6b6b4709b3	Unify buffered writer naming	2020-01-09 14:31:31 -08:00
Yinan Zhang	9a60cf54ec	Last-N profiling mode	2019-12-30 15:58:57 -08:00
David Goldblatt	c8dae890c8	Extent -> Ehooks: Move over default hooks.	2019-12-20 10:18:40 -08:00
Qi Wang	d5031ea824	Allow dallocx and sdallocx after tsd destruction. After a thread turns into purgatory / reincarnated state, still allow dallocx and sdallocx to function normally.	2019-12-19 11:17:03 -08:00
Qi Wang	dd649c9485	Optimize away the tsd_fast() check on fastpath. Fold the tsd_state check onto the event threshold check. The fast threshold is set to 0 when tsd switch to non-nominal. The fast_threshold can be reset by remote threads, to refect the non nominal tsd state change.	2019-12-11 23:44:20 -08:00
Yinan Zhang	055478cca8	Threshold is no longer updated before prof_realloc()	2019-12-10 16:31:05 -08:00
Yinan Zhang	6945371778	Change tsdn to tsd for profiling code path	2019-11-22 16:31:56 -08:00
Yinan Zhang	b55419f9b9	Restructure profiling Develop new data structure and code logic for holding profiling related information stored in the extent that may be needed after the extent is released, which in particular is the case for the reallocation code path (e.g. in `rallocx()` and `xallocx()`). The data structure is a generalization of `prof_tctx_t`: we previously only copy out the `prof_tctx` before the extent is released, but we may be in need of additional fields. Currently the only additional field is the allocation time field, but there may be more fields in the future. The restructuring also resolved a bug: `prof_realloc()` mistakenly passed the new `ptr` to `prof_free_sampled_object()`, but passing in the `old_ptr` would crash because it's already been released. Now the essential profiling information is collectively copied out early and safely passed to `prof_free_sampled_object()` after the extent is released.	2019-11-22 16:31:56 -08:00
Qi Wang	cb1a1f4ada	Remove the unnecessary alloc_ctx on free_fastpath.	2019-11-16 13:41:13 -08:00
Qi Wang	7160617107	Add branch hints to free_fastpath. Explicityly mark the non-slab case unlikely. Previously there were jumps in the common case.	2019-11-16 13:41:13 -08:00
Qi Wang	a787d2f5b3	Prefer getaffinity() to detect number of CPUs.	2019-11-15 16:24:38 -08:00
Qi Wang	836d7a7e69	Check for large size first in the uncommon case of malloc. Larger sizes are not that uncommon comparing to !tsd_fast.	2019-11-11 13:30:20 -08:00
Yinan Zhang	97f93fa0f2	Pull tcache GC events into thread event handler	2019-11-04 16:07:56 -08:00
Yinan Zhang	198f02e797	Pull prof_accumbytes into thread event handler	2019-11-04 15:21:16 -08:00
Yinan Zhang	152c0ef954	Build a general purpose thread event handler	2019-11-04 11:15:50 -08:00
David T. Goldblatt	de81a4eada	Add stats counters for number of zero reallocs	2019-10-29 17:48:44 -07:00
David T. Goldblatt	9cfa805947	Realloc: Make behavior of realloc(ptr, 0) configurable.	2019-10-29 17:48:44 -07:00
David T. Goldblatt	ee961c2310	Merge realloc and rallocx pathways.	2019-10-29 17:48:44 -07:00
Yinan Zhang	05681e387a	Optimize cache_bin_alloc_easy for malloc fast path `tcache_bin_info` is not accessed on malloc fast path but the compiler reserves a register for it, as well as an additional register for `tcache_bin_info[ind].stack_size`. The optimization gets rid of the need for the two registers.	2019-10-21 16:43:45 -07:00
David T. Goldblatt	723ccc6c27	Extents: Split out extent struct.	2019-09-23 23:06:27 -07:00
Yinan Zhang	adce29c885	Optimize for prof_active off Move the handling of `prof_active` off case completely to slow path, so as to reduce register pressure on malloc fast path.	2019-08-27 14:48:56 -07:00
Yinan Zhang	49e6fbce78	Always adjust thread_(de)allocated	2019-08-26 11:56:41 -07:00
Yinan Zhang	9e031c1d11	Bug fix for prof_active switch The bug is subtle but critical: if application performs the following three actions in sequence: (a) turn `prof_active` off, (b) make at least one allocation that triggers the malloc slow path via the `if (unlikely(bytes_until_sample < 0))` path, and (c) turn `prof_active` back on, then the application would never get another sample (until a very very long time later). The fix is to properly reset `bytes_until_sample` rather than throwing it all the way to `SSIZE_MAX`. A side minor change is to call `prof_active_get_unlocked()` rather than directly grabbing the `prof_active` variable - it is the very reason why we defined the `prof_active_get_unlocked()` function.	2019-08-22 13:00:10 -07:00
Qi Wang	7599c82d48	Redesign the cache bin metadata for fast path. Implement the pointer-based metadata for tcache bins -- - 3 pointers are maintained to represent each bin; - 2 of the pointers are compressed on 64-bit; - is_full / is_empty done through pointer comparison; Comparing to the previous counter based design -- - fast-path speed up ~15% in benchmarks - direct pointer comparison and de-reference - no need to access tcache_bin_info in common case	2019-08-19 12:21:44 -07:00
Yinan Zhang	28ed9b9a51	Buffer stats printing Without buffering `malloc_stats_print` would invoke the write back call (which could mean an expensive `malloc_write_fd` call) for every single `printf` (including printing each line break and each leading tab/space for indentation).	2019-08-13 09:40:11 -07:00
Qi Wang	85f0cb2d0c	Add indent to individual options for confirm_conf.	2019-07-25 17:00:31 -07:00
Qi Wang	f32f23d6cc	Fix posix_memalign with input size 0. Return a valid pointer instead of failed assertion.	2019-07-18 00:43:23 -07:00
Yinan Zhang	c92ac30601	Add confirm_conf option If the confirm_conf option is set, when the program starts, each of the four malloc_conf strings will be printed, and each option will be printed when being set.	2019-05-22 09:38:39 -07:00
Yinan Zhang	13e88ae970	Fix assert in free fastpath rtree_szind_slab_read_fast() may have not initialized alloc_ctx.szind, unless after confirming the return is true.	2019-05-15 09:42:52 -07:00
Yinan Zhang	259b15dec5	Improve macro readability in malloc_conf_init Define more readable macros than yes and no.	2019-05-08 14:15:03 -07:00
David Goldblatt	33e1dad680	Safety checks: Add a redzoning feature.	2019-04-15 16:48:12 -07:00
mgrice	d3d7a8ef09	remove compare and branch in fast path for c++ operator delete[] Summary: sdallocx is checking a flag that will never be set (at least in the provided C++ destructor implementation). This branch will probably only rarely be mispredicted however it removes two instructions in sdallocx and one at the callsite (to zero out flags).	2019-04-08 10:59:05 -07:00
Qi Wang	0101d5ebef	Avoid check_min for opt_lg_extent_max_active_fit. This fixes a compiler warning.	2019-03-29 15:56:53 -07:00
Qi Wang	788a657cee	Allow low values of oversize_threshold to disable the feature. We should allow a way to easily disable the feature (e.g. not reserving the arena id at all).	2019-03-29 11:33:00 -07:00
Qi Wang	e3db480f6f	Rename huge_threshold to oversize_threshold. The keyword huge tend to remind people of huge pages which is not relevent to the feature.	2019-01-25 13:15:45 -08:00
Qi Wang	350809dc5d	Set huge_threshold to 8M by default. This feature uses an dedicated arena to handle huge requests, which significantly improves VM fragmentation. In production workload we tested it often reduces VM size by >30%.	2019-01-24 13:29:23 -08:00
Qi Wang	7a815c1b7c	Un-experimental the huge_threshold feature.	2019-01-16 12:28:57 -08:00
Qi Wang	bbe8e6a909	Avoid creating bg thds for huge arena lone. For low arena count settings, the huge threshold feature may trigger an unwanted bg thd creation. Given that the huge arena does eager purging by default, bypass bg thd creation when initializing the huge arena.	2019-01-15 16:00:34 -08:00
Qi Wang	98b56ab23d	Store the bin shard selection in TSD. This avoids having to choose bin shard on the fly, also will allow flexible bin binding for each thread.	2018-12-03 17:17:03 -08:00
Qi Wang	3f9f2833f6	Add opt.bin_shards to specify number of bin shards. The option uses the same format as "slab_sizes" to specify number of shards for each bin size.	2018-12-03 17:17:03 -08:00
Qi Wang	37b8913925	Add support for sharded bins within an arena. This makes it possible to have multiple set of bins in an arena, which improves arena scalability because the bins (especially the small ones) are always the limiting factor in production workload. A bin shard is picked on allocation; each extent tracks the bin shard id for deallocation. The shard size will be determined using runtime options.	2018-12-03 17:17:03 -08:00
Dave Watson	794e29c0ab	Add a free() and sdallocx(where flags=0) fastpath Add unsized and sized deallocation fastpaths. Similar to the malloc() fastpath, this removes all frame manipulation for the majority of free() calls. The performance advantages here are less than that of the malloc() fastpath, but from prod tests seems to still be half a percent or so of improvement. Stats and sampling a both supported (sdallocx needs a sampling check, for rtree lookups slab will only be set for unsampled objects). We don't support flush, any flush requests go to the slowpath.	2018-11-12 13:20:37 -08:00
Dave Watson	0f8313659e	malloc: Add a fastpath This diff adds a fastpath that assumes size <= SC_LOOKUP_MAXCLASS, and that we hit tcache. If either of these is false, we fall back to the previous codepath (renamed 'malloc_default'). Crucially, we only tail call malloc_default, and with the same kind and number of arguments, so that both clang and gcc tail-calling will kick in - therefore malloc() gets treated as a leaf function, and there are no caller-saved registers. Previously malloc() contained 5 caller saved registers on x64, resulting in at least 10 extra memory-movement instructions. In microbenchmarks this results in up to ~10% improvement in malloc() fastpath. In real programs, this is a ~1% CPU and latency improvement overall.	2018-10-18 08:32:19 -07:00
Dave Watson	ac34afb403	drop bump_empty_alloc option. Size class lookup support used instead.	2018-10-17 08:50:58 -07:00
gnzlbg	01e2a38e5a	Make `smallocx` symbol name depend on the `JEMALLOC_VERSION_GID` This comments concatenates the `JEMALLOC_VERSION_GID` to the `smallocx` symbol name, such that the symbol ends up exported as `smallocx_{git_hash}`.	2018-10-17 07:12:28 -07:00
gnzlbg	741fca1bb7	Hide smallocx even when enabled from the library API The experimental `smallocx` API is not exposed via header files, requiring the users to peek at `jemalloc`'s source code to manually add the external declarations to their own programs. This should reinforce that `smallocx` is experimental, and that `jemalloc` does not offer any kind of backwards compatiblity or ABI gurantees for it.	2018-10-17 07:12:28 -07:00
gnzlbg	08260a6b94	Add experimental API: smallocx_return_t smallocx(size, flags) --- Motivation: This new experimental memory-allocaction API returns a pointer to the allocation as well as the usable size of the allocated memory region. The `s` in `smallocx` stands for `sized`-`mallocx`, attempting to convey that this API returns the size of the allocated memory region. It should allow C++ P0901r0 [0] and Rust Alloc::alloc_excess to make use of it. The main purpose of these APIs is to improve telemetry. It is more accurate to register `smallocx(size, flags)` than `smallocx(nallocx(size), flags)`, for example. The latter will always line up perfectly with the existing size classes, causing a loss of telemetry information about the internal fragmentation induced by potentially poor size-classes choices. Instrumenting `nallocx` does not help much since user code can cache its result and use it repeatedly. --- Implementation: The implementation adds a new `usize` option to `static_opts_s` and an `usize` variable to `dynamic_opts_s`. These are then used to cache the result of `sz_index2size` and similar functions in the code paths in which they are unconditionally invoked. In the code-paths in which these functions are not unconditionally invoked, `smallocx` calls, as opposed to `mallocx`, these functions explicitly. --- [0]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0901r0.html	2018-10-17 07:12:28 -07:00
Dave Watson	325e3305fc	remove malloc_init() off the fastpath	2018-10-15 10:11:08 -07:00
David Goldblatt	88771fa013	Bootstrapping: don't overwrite opt_prof_prefix.	2018-09-12 17:06:06 -07:00
David Goldblatt	e8ec9528ab	Allow the use of readlinkat over readlink. This can be useful in situations where readlink is disallowed.	2018-08-03 14:04:32 -07:00
Tyler Etzel	b664bd7935	Add logging for sampled allocations - prof_opt_log flag starts logging automatically at runtime - prof_log_{start,stop} mallctl for manual control	2018-08-01 13:27:11 -07:00
David Goldblatt	41b7372ead	TSD: Add fork support to tsd_nominal_tsds. In case of multithreaded fork, we want to leave the child in a reasonable state, in which tsd_nominal_tsds is either empty or contains only the forking thread.	2018-07-26 17:22:25 -07:00
David Goldblatt	3aba072cef	SC: Remove global data. The global data is mostly only used at initialization, or for easy access to values we could compute statically. Instead of consuming that space (and risking TLB misses), we can just pass around a pointer to stack data during bootstrapping.	2018-07-23 13:37:08 -07:00
Qi Wang	4bc48718b2	Tolerate experimental features for abort_conf. Not aborting with unrecognized experimental options. This helps us testing experimental features with abort_conf enabled.	2018-07-17 20:40:32 -07:00
David Goldblatt	55e5cc1341	SC: Make some key size classes static. The largest small class, smallest large class, and largest large class may all be needed down fast paths; to avoid the risk of touching another cache line, we can make them available as constants.	2018-07-12 20:53:06 -07:00
David T. Goldblatt	5112d9e5fd	Add MALLOC_CONF parsing for dynamic slab sizes. This actually enables us to change the values.	2018-07-12 20:53:06 -07:00
David T. Goldblatt	4610ffa942	Bootstrapping: Parse MALLOC_CONF before using slab sizes. I.e., parse before booting the bin module or sz module. This lets us tweak size class settings before committing to them by letting them leak into other modules. This commit does not actually do any tweaking of the size classes; it just chanchanges bootstrapping order; this may help bisecting any bootstrapping failures on poorly-tested architectures.	2018-07-12 20:53:06 -07:00
David Goldblatt	e904f813b4	Hide size class computation behind a layer of indirection. This class removes almost all the dependencies on size_classes.h, accessing the data there only via the new module sc.h, which does not depend on any configuration options. In a subsequent commit, we'll remove the configure-time size class computations, doing them at boot time, instead.	2018-07-12 20:53:06 -07:00
gnzlbg	3d29d11ac2	Clean compilation -Wextra Before this commit jemalloc produced many warnings when compiled with -Wextra with both Clang and GCC. This commit fixes the issues raised by these warnings or suppresses them if they were spurious at least for the Clang and GCC versions covered by CI. This commit: * adds `JEMALLOC_DIAGNOSTIC` macros: `JEMALLOC_DIAGNOSTIC_{PUSH,POP}` are used to modify the stack of enabled diagnostics. The `JEMALLOC_DIAGNOSTIC_IGNORE_...` macros are used to ignore a concrete diagnostic. * adds `JEMALLOC_FALLTHROUGH` macro to explicitly state that falling through `case` labels in a `switch` statement is intended * Removes all UNUSED annotations on function parameters. The warning -Wunused-parameter is now disabled globally in `jemalloc_internal_macros.h` for all translation units that include that header. It is never re-enabled since that header cannot be included by users. * locally suppresses some -Wextra diagnostics: * `-Wmissing-field-initializer` is buggy in older Clang and GCC versions, where it does not understanding that, in C, `= {0}` is a common C idiom to initialize a struct to zero * `-Wtype-bounds` is suppressed in a particular situation where a generic macro, used in multiple different places, compares an unsigned integer for smaller than zero, which is always true. * `-Walloc-larger-than-size=` diagnostics warn when an allocation function is called with a size that is too large (out-of-range). These are suppressed in the parts of the tests where `jemalloc` explicitly does this to test that the allocation functions fail properly. * adds a new CI build bot that runs the log unit test on CI. Closes #1196 .	2018-07-09 21:40:42 -07:00
Qi Wang	cdf15b458a	Rename huge_threshold to experimental, and tweak documentation.	2018-06-29 10:35:02 -07:00
Qi Wang	79522b2fc2	Refactor arena_is_auto.	2018-06-29 10:35:02 -07:00
Qi Wang	94a88c26f4	Implement huge arena: opt.huge_threshold. The feature allows using a dedicated arena for huge allocations. We want the addtional arena to separate huge allocation because: 1) mixing small extents with huge ones causes fragmentation over the long run (this feature reduces VM size significantly); 2) with many arenas, huge extents rarely get reused across threads; and 3) huge allocations happen way less frequently, therefore no concerns for lock contention.	2018-06-29 10:35:02 -07:00
Qi Wang	0ff7ff3ec7	Optimize ixalloc by avoiding a size lookup.	2018-06-05 21:03:51 -07:00
David Goldblatt	cb0707c0fc	Hooks: hook the realloc pathways that move/expand.	2018-05-18 11:43:03 -07:00
David Goldblatt	67270040a5	Hooks: hook the realloc paths that act as pure malloc/free.	2018-05-18 11:43:03 -07:00
David Goldblatt	83e516154c	Hooks: hook the pure-expand function.	2018-05-18 11:43:03 -07:00
David Goldblatt	c154f5881b	Hooks: hook the pure-deallocation functions.	2018-05-18 11:43:03 -07:00
David Goldblatt	226327cf66	Hooks: hook the pure-allocation functions.	2018-05-18 11:43:03 -07:00
David Goldblatt	5ae6e7cbfa	Add "hook" module. The hook module allows a low-reader-overhead way of finding hooks to invoke and calling them. For now, none of the allocation pathways are tied into the hooks; this will come later.	2018-05-18 11:43:03 -07:00
Qi Wang	e40b2f75bd	Fix abort_conf processing. When abort_conf is set, make sure we always error out at the end of the options processing loop.	2018-04-17 18:23:53 -07:00
Qi Wang	0fadf4a2e3	Add UNUSED to avoid compiler warnings.	2018-04-16 13:50:21 -07:00
Dave Watson	8b14f3abc0	background_thread: add max thread count config Looking at the thread counts in our services, jemalloc's background thread is useful, but mostly idle. Add a config option to tune down the number of threads.	2018-04-10 14:01:45 -07:00
Qi Wang	e4f090e8df	Add opt.thp which allows explicit hugepage usage. "always" marks all user mappings as MADV_HUGEPAGE; while "never" marks all mappings as MADV_NOHUGEPAGE. The default setting "default" does not change any settings. Note that all the madvise calls are part of the default extent hooks by design, so that customized extent hooks have complete control over the mappings including hugepage settings.	2018-03-08 13:08:06 -08:00
Qi Wang	fac706836f	Add opt.lg_extent_max_active_fit When allocating from dirty extents (which we always prefer if available), large active extents can get split even if the new allocation is much smaller, in which case the introduced fragmentation causes high long term damage. This new option controls the threshold to reuse and split an existing active extent. We avoid using a large extent for much smaller sizes, in order to reduce fragmentation. In some workload, adding the threshold improves virtual memory usage by >10x.	2017-11-16 15:32:02 -08:00
Qi Wang	a2e6eb2c22	Delay background_thread_ctl_init to right before thread creation. ctl_init sets isthreaded, which means it should be done without holding any locks.	2017-10-05 22:57:56 -07:00
David Goldblatt	8a7ee3014c	Logging: capitalize log macro. Dodge a name-conflict with the math.h logarithm function. D'oh.	2017-10-02 20:44:43 -07:00
Qi Wang	eaa58a5026	Put static keyword first. Fix a warning by -Wold-style-declaration.	2017-09-21 12:18:10 -07:00
Qi Wang	47b20bb654	Change opt.metadata_thp to [disabled,auto,always]. To avoid the high RSS caused by THP + low usage arena (i.e. THP becomes a significant percentage), added a new "auto" option which will only start using THP after a base allocator used up the first THP region. Starting from the second hugepage (in a single arena), "auto" behaves the same as "always", i.e. madvise hugepage right away.	2017-08-30 16:47:32 -07:00
Qi Wang	8fdd9a5797	Implement opt.metadata_thp This option enables transparent huge page for base allocators (require MADV_HUGEPAGE support).	2017-08-11 14:51:20 -07:00
Qi Wang	1ab2ab294c	Only read szind if ptr is not paged aligned in sdallocx. If ptr is not page aligned, we know the allocation was not sampled. In this case use the size passed into sdallocx directly w/o accessing rtree. This improve sdallocx efficiency in the common case (not sampled && small allocation).	2017-07-31 15:47:48 -07:00
David Goldblatt	e6aeceb606	Logging: log using the log var names directly. Currently we have to log by writing something like: static log_var_t log_a_b_c = LOG_VAR_INIT("a.b.c"); log (log_a_b_c, "msg"); This is sort of annoying. Let's just write: log("a.b.c", "msg");	2017-07-24 14:55:54 -07:00
David Goldblatt	a9f7732d45	Logging: allow logging with empty varargs. Currently, the log macro requires at least one argument after the format string, because of the way the preprocessor handles varargs macros. We can hide some of that irritation by pushing the extra arguments into a varargs function.	2017-07-22 09:38:19 -07:00
David T. Goldblatt	e215a7bc18	Add entry and exit logging to all core functions. I.e. mallloc, free, the allocx API, the posix extensions.	2017-07-20 17:58:37 -07:00
David T. Goldblatt	9761b449c8	Add a logging facility. This sets up a hierarchical logging facility, so that we can add logging statements liberally, and turn them on in a fine-grained manner.	2017-07-20 17:58:37 -07:00
Qi Wang	cb032781bd	Add extent_grow_mtx in pre_ / post_fork handlers. This fixed the issue that could cause the child process to stuck after fork.	2017-06-29 17:01:18 -07:00
Qi Wang	57beeb2fcb	Switch ctl to explicitly use tsd instead of tsdn.	2017-06-23 13:27:53 -07:00
Qi Wang	425463a446	Check arena in current context in pre_reentrancy.	2017-06-23 13:27:53 -07:00
Jason Evans	d49ac4c709	Fix assertion typos. Reported by Conrad Meyer.	2017-06-23 11:48:00 -07:00
Qi Wang	9b1befabbb	Add minimal initialized TSD. We use the minimal_initilized tsd (which requires no cleanup) for free() specifically, if tsd hasn't been initialized yet. Any other activity will transit the state from minimal to normal. This is to workaround the case where a thread has no malloc calls in its lifetime until during thread termination, free() happens after tls destructors.	2017-06-15 17:55:53 -07:00
Qi Wang	b83b5ad44a	Not re-enable background thread after fork. Avoid calling pthread_create in postfork handlers.	2017-06-12 08:56:14 -07:00
Jason Evans	94d655b8bd	Update a UTRACE() size argument.	2017-06-08 15:33:52 -07:00
Qi Wang	73713fbb27	Drop high rank locks when creating threads. Avoid holding arenas_lock and background_thread_lock when creating background threads, because pthread_create may take internal locks, and potentially cause deadlock with jemalloc internal locks.	2017-06-08 10:02:18 -07:00
Qi Wang	00869e39a3	Make tsd no-cleanup during tsd reincarnation. Since tsd cleanup isn't guaranteed when reincarnated, we set up tsd in a way that needs no cleanup, by making it going through slow path instead.	2017-06-07 11:03:49 -07:00
Qi Wang	530c07a45b	Set reentrancy level to 1 during init. This makes sure we go down slow path w/ a0 in init.	2017-06-02 12:59:21 -07:00
Jason Evans	b511232fcd	Refactor/fix background_thread/percpu_arena bootstrapping. Refactor bootstrapping such that dlsym() is called during the bootstrapping phase that can tolerate reentrant allocation.	2017-06-01 08:55:27 -07:00
David Goldblatt	fa35463d56	Witness assertions: only assert locklessness when non-reentrant. Previously we could still hit these assertions down error paths or in the extended API.	2017-05-31 17:02:54 -07:00
David Goldblatt	8261e581be	Header refactoring: Pull size helpers out of jemalloc module.	2017-05-31 13:08:45 -07:00
David Goldblatt	98774e64a4	Header refactoring: unify and de-catchall extent_mmap module.	2017-05-31 13:08:45 -07:00
David Goldblatt	93284bb53d	Header refactoring: unify and de-catchall extent_dss.	2017-05-31 13:08:45 -07:00
David Goldblatt	44f9bd147a	Header refactoring: unify and de-catchall rtree module.	2017-05-31 13:08:45 -07:00
Qi Wang	7578b0e929	Explicitly say so when aborting on opt_abort_conf.	2017-05-30 17:37:35 -07:00
Qi Wang	d5ef5ae934	Add opt.stats_print_opts. The value is passed to atexit(3)-triggered malloc_stats_print() calls.	2017-05-29 11:54:00 -07:00
Qi Wang	b86d271cbf	Added opt_abort_conf: abort on invalid config options.	2017-05-26 21:14:28 -07:00
David Goldblatt	18ecbfa89e	Header refactoring: unify and de-catchall mutex module	2017-05-24 15:27:30 -07:00
David Goldblatt	9f822a1fd7	Header refactoring: unify and de-catchall witness code.	2017-05-24 15:27:30 -07:00
Qi Wang	b693c7868e	Implementing opt.background_thread. Added opt.background_thread to enable background threads, which handles purging currently. When enabled, decay ticks will not trigger purging (which will be left to the background threads). We limit the max number of threads to NCPUs. When percpu arena is enabled, set CPU affinity for the background threads as well. The sleep interval of background threads is dynamic and determined by computing number of pages to purge in the future (based on backlog).	2017-05-23 12:26:20 -07:00
David Goldblatt	26c792e61a	Allow mutexes to take a lock ordering enum at construction. This lets us specify whether and how mutexes of the same rank are allowed to be acquired. Currently, we only allow two polices (only a single mutex at a given rank at a time, and mutexes acquired in ascending order), but we can plausibly allow more (e.g. the "release uncontended mutexes before blocking").	2017-05-19 14:21:27 -07:00
Jason Evans	6e62c62862	Refactor decay_time into decay_ms. Support millisecond resolution for decay times. Among other use cases this makes it possible to specify a short initial dirty-->muzzy decay phase, followed by a longer muzzy-->clean decay phase. This resolves #812.	2017-05-18 11:33:45 -07:00
Jason Evans	18a83681cf	Refactor (MALLOCX_ARENA_MAX + 1) to be MALLOCX_ARENA_LIMIT. This resolves #673.	2017-05-14 10:14:23 -07:00
Jason Evans	909f0482e4	Automatically generate private symbol name mangling macros. Rather than using a manually maintained list of internal symbols to drive name mangling, add a compilation phase to automatically extract the list of internal symbols. This resolves #677.	2017-05-11 23:06:54 -07:00
David Goldblatt	209f2926b8	Header refactoring: tsd - cleanup and dependency breaking. This removes the tsd macros (which are used only for tsd_t in real builds). We break up the circular dependencies involving tsd. We also move all tsd access through getters and setters. This allows us to assert that we only touch data when tsd is in a valid state. We simplify the usages of the x macro trick, removing all the customizability (get/set, init, cleanup), moving the lifetime logic to tsd_init and tsd_cleanup. This lets us make initialization order independent of order within tsd_t.	2017-05-01 10:49:56 -07:00
Jason Evans	b9ab04a191	Refactor !opt.munmap to opt.retain.	2017-04-29 09:24:12 -07:00
David Goldblatt	89e2d3c12b	Header refactoring: ctl - unify and remove from catchall. In order to do this, we introduce the mutex_prof module, which breaks a circular dependency between ctl and prof.	2017-04-25 09:51:38 -07:00
Jason Evans	c67c3e4a63	Replace --disable-munmap with opt.munmap. Control use of munmap(2) via a run-time option rather than a compile-time option (with the same per platform default). The old behavior of --disable-munmap can be achieved with --with-malloc-conf=munmap:false. This partially resolves #580.	2017-04-24 20:37:16 -07:00
David Goldblatt	31b43219db	Header refactoring: size_classes module - remove from the catchall	2017-04-24 10:33:21 -07:00
David Goldblatt	bf2dc7e678	Header refactoring: ticker module - remove from the catchall and unify.	2017-04-24 10:33:21 -07:00
David Goldblatt	4d2e4bf5eb	Get rid of most of the various inline macros.	2017-04-24 10:33:21 -07:00
David Goldblatt	425253e2cd	Enable -Wundef, when supported. This can catch bugs in which one header defines a numeric constant, and another uses it without including the defining header. Undefined preprocessor symbols expand to '0', so that this will compile fine, silently doing the math wrong.	2017-04-21 17:03:56 -07:00
Jason Evans	3823effe12	Remove --enable-ivsalloc. Continue to use ivsalloc() when --enable-debug is specified (and add assertions to guard against 0 size), but stop providing a documented explicit semantics-changing band-aid to dodge undefined behavior in sallocx() and malloc_usable_size(). ivsalloc() remains compiled in, unlike when #211 restored --enable-ivsalloc, and if JEMALLOC_FORCE_IVSALLOC is defined during compilation, sallocx() and malloc_usable_size() will still use ivsalloc(). This partially resolves #580.	2017-04-21 14:34:35 -07:00
Jason Evans	4403c9ab44	Remove --disable-tcache. Simplify configuration by removing the --disable-tcache option, but replace the testing for that configuration with --with-malloc-conf=tcache:false. Fix the thread.arena and thread.tcache.flush mallctls to work correctly if tcache is disabled. This partially resolves #580.	2017-04-21 10:06:12 -07:00
Jason Evans	da4cff0279	Support --with-lg-page values larger than system page size. All mappings continue to be PAGE-aligned, even if the system page size is smaller. This change is primarily intended to provide a mechanism for supporting multiple page sizes with the same binary; smaller page sizes work better in conjunction with jemalloc's design. This resolves #467.	2017-04-18 19:01:04 -07:00
David Goldblatt	38e847c1c5	Header refactoring: unify spin.h and move it out of the catch-all.	2017-04-18 18:35:03 -07:00
David Goldblatt	7ebc83894f	Header refactoring: move jemalloc_internal_types.h out of the catch-all	2017-04-18 18:35:03 -07:00
David Goldblatt	d9ec36e22d	Header refactoring: move assert.h out of the catch-all	2017-04-18 18:35:03 -07:00
David Goldblatt	f692e6c214	Header refactoring: move util.h out of the catchall	2017-04-18 18:35:03 -07:00
David Goldblatt	54373be084	Header refactoring: move malloc_io.h out of the catchall	2017-04-18 18:35:03 -07:00
Qi Wang	c2fcf9c2cf	Switch to fine-grained reentrancy support. Previously we had a general detection and support of reentrancy, at the cost of having branches and inc / dec operations on fast paths. To avoid taxing fast paths, we move the reentrancy operations onto tsd slow state, and only modify reentrancy level around external calls (that might trigger reentrancy).	2017-04-14 19:48:06 -07:00
Qi Wang	b348ba29bb	Bundle 3 branches on fast path into tsd_state. Added tsd_state_nominal_slow, which on fast path malloc() incorporates tcache_enabled check, and on fast path free() bundles both malloc_slow and tcache_enabled branches.	2017-04-14 16:58:08 -07:00
Qi Wang	ccfe68a916	Pass alloc_ctx down profiling path. With this change, when profiling is enabled, we avoid doing redundant rtree lookups. Also changed dalloc_atx_t to alloc_atx_t, as it's now used on allocation path as well (to speed up profiling).	2017-04-12 13:55:39 -07:00
Qi Wang	f35213bae4	Pass dalloc_ctx down the sdalloc path. This avoids redundant rtree lookups.	2017-04-12 13:55:39 -07:00
David Goldblatt	e709fae1d7	Header refactoring: move atomic.h out of the catch-all	2017-04-11 11:52:30 -07:00
David Goldblatt	743d940dc3	Header refactoring: Split up jemalloc_internal.h This is a biggy. jemalloc_internal.h has been doing multiple jobs for a while now: - The source of system-wide definitions. - The catch-all include file. - The module header file for jemalloc.c This commit splits up this functionality. The system-wide definitions responsibility has moved to jemalloc_preamble.h. The catch-all include file is now jemalloc_internal_includes.h. The module headers for jemalloc.c are now in jemalloc_internal_[externs\|inlines\|types].h, just as they are for the other modules.	2017-04-11 11:52:30 -07:00
Qi Wang	bfa530b75b	Pass dealloc_ctx down free() fast path. This gets rid of the redundent rtree lookup down fast path.	2017-04-11 09:58:12 -07:00
Qi Wang	04ef218d87	Move reentrancy_level to the beginning of TSD.	2017-04-07 16:25:43 -07:00
David Goldblatt	b407a65401	Add basic reentrancy-checking support, and allow arena_new to reenter. This checks whether or not we're reentrant using thread-local data, and, if we are, moves certain internal allocations to use arena 0 (which should be properly initialized after bootstrapping). The immediate thing this allows is spinning up threads in arena_new, which will enable spinning up background threads there.	2017-04-07 14:10:27 -07:00
Qi Wang	fde3e20cc0	Integrate auto tcache into TSD. The embedded tcache is initialized upon tsd initialization. The avail arrays for the tbins will be allocated / deallocated accordingly during init / cleanup. With this change, the pointer to the auto tcache will always be available, as long as we have access to the TSD. tcache_available() (called in tcache_get()) is provided to check if we should use tcache.	2017-04-07 09:55:14 -07:00
David Goldblatt	bc32ec3503	Move arena-tracking atomics in jemalloc.c to C11-style	2017-04-05 16:25:37 -07:00
Qi Wang	d3cda3423c	Do proper cleanup for tsd_state_reincarnated. Also enable arena_bind under non-nominal state, as the cleanup will be handled correctly now.	2017-04-04 00:34:49 -07:00
Qi Wang	e6b074472e	Force inline ifree to avoid function call costs on fast path. Without ALWAYS_INLINE, sometimes ifree() gets compiled into its own function, which adds overhead on the fast path.	2017-03-24 17:54:28 -07:00
Jason Evans	5e67fbc367	Push down iealloc() calls. Call iealloc() as deep into call chains as possible without causing redundant calls.	2017-03-22 18:33:32 -07:00
Jason Evans	51a2ec92a1	Remove extent dereferences from the deallocation fast paths.	2017-03-22 18:33:32 -07:00
Jason Evans	4f341412e5	Remove extent arg from isalloc() and arena_salloc().	2017-03-22 18:33:32 -07:00
Qi Wang	ad91762635	Not re-binding iarena when migrate between arenas.	2017-03-21 14:05:20 -07:00
Jason Evans	64e458f5cd	Implement two-phase decay-based purging. Split decay-based purging into two phases, the first of which uses lazy purging to convert dirty pages to "muzzy", and the second of which uses forced purging, decommit, or unmapping to convert pages to clean or destroy them altogether. Not all operating systems support lazy purging, yet the application may provide extent hooks that implement lazy purging, so care must be taken to dynamically omit the first phase when necessary. The mallctl interfaces change as follows: - opt.decay_time --> opt.{dirty,muzzy}_decay_time - arena.<i>.decay_time --> arena.<i>.{dirty,muzzy}_decay_time - arenas.decay_time --> arenas.{dirty,muzzy}_decay_time - stats.arenas.<i>.pdirty --> stats.arenas.<i>.p{dirty,muzzy} - stats.arenas.<i>.{npurge,nmadvise,purged} --> stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged} This resolves #521.	2017-03-15 13:13:47 -07:00
Qi Wang	ec532e2c5c	Implement per-CPU arena. The new feature, opt.percpu_arena, determines thread-arena association dynamically based CPU id. Three modes are supported: "percpu", "phycpu" and disabled. "percpu" uses the current core id (with help from sched_getcpu()) directly as the arena index, while "phycpu" will assign threads on the same physical CPU to the same arena. In other words, "percpu" means # of arenas == # of CPUs, while "phycpu" has # of arenas == 1/2 * (# of CPUs). Note that no runtime check on whether hyper threading is enabled is added yet. When enabled, threads will be migrated between arenas when a CPU change is detected. In the current design, to reduce overhead from reading CPU id, each arena tracks the thread accessed most recently. When a new thread comes in, we will read CPU id and update arena if necessary.	2017-03-08 23:19:01 -08:00
Qi Wang	8721e19c04	Fix arena_prefork lock rank order for witness. When witness is enabled, lock rank order needs to be preserved during prefork, not only for each arena, but also across arenas. This change breaks arena_prefork into further stages to ensure valid rank order across arenas. Also changed test/unit/fork to use a manual arena to catch this case.	2017-03-08 23:07:27 -08:00
Qi Wang	01f47f11a6	Store associated arena in tcache. This fixes tcache_flush for manual tcaches, which wasn't able to find the correct arena it associated with. Also changed the decay test to cover this case (by using manually created arenas).	2017-03-07 12:58:11 -08:00
Jason Evans	379dd44c57	Add casts to CONF_HANDLE_T_U(). This avoids signed/unsigned comparison warnings when specifying integer constants as inputs.	2017-02-28 17:18:25 -08:00
Jason Evans	ab25d3c987	Synchronize arena->tcache_ql with arena->tcache_ql_mtx. This replaces arena->lock synchronization.	2017-02-16 09:39:46 -08:00

1 2 3 4 5 ...

487 Commits