server-skynet-source-3rd-jemalloc

Author	SHA1	Message	Date
Dave Watson	997d86acc6	restrict bytes_until_sample to int64_t. This allows optimal asm generation of sub bytes_until_sample, usize; je; for x86 arch. Subtraction is unconditional, and only flags are checked for the jump, no extra compare is necessary. This also reduces register pressure.	2018-10-15 08:24:12 -07:00
Dave Watson	9ed3bdc848	move bytes until sample to tsd. Fastpath allocation does not need to load tdata now, avoiding several branches.	2018-10-15 08:24:12 -07:00
David Goldblatt	41b7372ead	TSD: Add fork support to tsd_nominal_tsds. In case of multithreaded fork, we want to leave the child in a reasonable state, in which tsd_nominal_tsds is either empty or contains only the forking thread.	2018-07-26 17:22:25 -07:00
David Goldblatt	d1e11d48d4	Move tsd link and in_hook after tcache. This can lead to better cache utilization down the common paths where we don't touch the link.	2018-06-27 13:39:02 -07:00
David Goldblatt	a7f749c9af	Hooks: Protect against reentrancy. Previously, we made the user deal with this themselves, but that's not good enough; if hooks may allocate, we should test the allocation pathways down hooks. If we're doing that, we might as well actually implement the protection for the user.	2018-05-18 11:43:03 -07:00
David Goldblatt	0379235f47	Tests: Shouldn't be able to change global slowness. This can help ensure that we don't leave slowness changes behind in case of resource exhaustion.	2018-05-18 11:43:03 -07:00
David Goldblatt	e870829e64	TSD: Add the ability to enter a global slow path. This gives any thread the ability to send other threads down slow paths the next time they fetch tsd.	2018-05-18 11:43:03 -07:00
David Goldblatt	feff510b9f	TSD: Pull name mangling into a macro.	2018-05-18 11:43:03 -07:00
David Goldblatt	39d6420c0c	TSD: Make state atomic. This will let us change the state of another thread remotely, eventually.	2018-05-18 11:43:03 -07:00
David Goldblatt	982c10de35	TSD: Make all state access happen through a function. Shortly, tsd state will be atomic and have some complicated enough logic down the state-setting path that we should be aware of it.	2018-05-18 11:43:03 -07:00
Dave Watson	d6feed6e66	Use tsd offset_state instead of atomic. While working on #852, I noticed the prng state is atomic. This is the only atomic use of prng in all of jemalloc. Instead, use a threadlocal prng state if possible to avoid unnecessary cache line contention.	2017-11-14 08:58:18 -08:00
Qi Wang	9b1befabbb	Add minimal initialized TSD. We use the minimal_initialized tsd (which requires no cleanup) for free() specifically, if tsd hasn't been initialized yet. Any other activity will transit the state from minimal to normal. This is to work around the case where a thread has no malloc calls in its lifetime until during thread termination, free() happens after tls destructors.	2017-06-15 17:55:53 -07:00
Jason Evans	faaf458bad	Remove redundant typedefs. Pre-C11 compilers do not support typedef redefinition.	2017-06-08 13:28:57 -07:00
Qi Wang	5642f03cae	Add internal tsd for background_thread.	2017-06-08 10:02:18 -07:00
Qi Wang	00869e39a3	Make tsd no-cleanup during tsd reincarnation. Since tsd cleanup isn't guaranteed when reincarnated, we set up tsd in a way that needs no cleanup, by making it going through slow path instead.	2017-06-07 11:03:49 -07:00
David Goldblatt	44f9bd147a	Header refactoring: unify and de-catchall rtree module.	2017-05-31 13:08:45 -07:00
David Goldblatt	9f822a1fd7	Header refactoring: unify and de-catchall witness code.	2017-05-24 15:27:30 -07:00
David Goldblatt	3f685e8824	Protect the rtree/extent interactions with a mutex pool. Instead of embedding a lock bit in rtree leaf elements, we associate extents with a small set of mutexes. This gets us two things: - We can use the system mutexes. This (hypothetically) protects us from priority inversion, and lets us stop doing a backoff/sleep loop, instead opting for precise wakeups from the mutex. - Cuts down on the number of mutex acquisitions we have to do (from 4 in the worst case to two). We end up simplifying most of the rtree code (which no longer has to deal with locking or concurrency at all), at the cost of additional complexity in the extent code: since the mutex protecting the rtree leaf elements is determined by reading the extent out of those elements, the initial read is racy, so that we may acquire an out of date mutex. We re-check the extent in the leaf after acquiring the mutex to protect us from this race.	2017-05-19 14:21:27 -07:00
David Goldblatt	209f2926b8	Header refactoring: tsd - cleanup and dependency breaking. This removes the tsd macros (which are used only for tsd_t in real builds). We break up the circular dependencies involving tsd. We also move all tsd access through getters and setters. This allows us to assert that we only touch data when tsd is in a valid state. We simplify the usages of the x macro trick, removing all the customizability (get/set, init, cleanup), moving the lifetime logic to tsd_init and tsd_cleanup. This lets us make initialization order independent of order within tsd_t.	2017-05-01 10:49:56 -07:00
David Goldblatt	77cccac8cd	Break up headers into constituent parts. This is part of a broader change to make header files better represent the dependencies between one another (see https://github.com/jemalloc/jemalloc/issues/533). It breaks up component headers into smaller parts that can be made to have a simpler dependency graph. For the autogenerated headers (smoothstep.h and size_classes.h), no splitting was necessary, so I didn't add support to emit multiple headers.	2017-01-12 15:43:51 -08:00
Jason Evans	69c26cdb01	Add some missing explicit casts.	2016-12-13 13:38:11 -08:00
Jason Evans	b54d160dc4	Do not (recursively) allocate within tsd_fetch(). Refactor tsd so that tsdn_fetch() does not trigger allocation, since allocation could cause infinite recursion. This resolves #458.	2016-10-20 23:59:12 -07:00
Jason Evans	61f467e16a	Avoid self assignment in tsd_set().	2016-09-23 12:21:34 -07:00
Jason Evans	6f29a83924	Add rtree lookup path caching. rtree-based extent lookups remain more expensive than chunk-based run lookups, but with this optimization the fast path slowdown is ~3 CPU cycles per metadata lookup (on Intel Core i7-4980HQ), versus ~11 cycles prior. The path caching speedup tends to degrade gracefully unless allocated memory is spread far apart (as is the case when using a mixture of sbrk() and mmap()).	2016-06-05 20:59:57 -07:00
Jason Evans	7be2ebc23f	Make tsd cleanup functions optional, remove noop cleanup functions.	2016-06-05 20:42:24 -07:00
Jason Evans	e75e9be130	Add rtree element witnesses.	2016-06-03 12:27:41 -07:00
Jason Evans	ba5c709517	Remove quarantine support.	2016-05-13 10:25:05 -07:00
Jason Evans	c1e00ef2a6	Resolve bootstrapping issues when embedded in FreeBSD libc. `b2c0d6322d` (Add witness, a simple online locking validator.) caused a broad propagation of tsd throughout the internal API, but tsd_fetch() was designed to fail prior to tsd bootstrapping. Fix this by splitting tsd_t into non-nullable tsd_t and nullable tsdn_t, and modifying all internal APIs that do not critically rely on tsd to take nullable pointers. Furthermore, add the tsd_booted_get() function so that tsdn_fetch() can probe whether tsd bootstrapping is complete and return NULL if not. All dangerous conversions of nullable pointers are tsdn_tsd() calls that assert-fail on invalid conversion.	2016-05-10 22:51:33 -07:00
Jason Evans	174c0c3a9c	Fix fork()-related lock rank ordering reversals.	2016-04-25 23:16:20 -07:00
Jason Evans	66cd953514	Do not allocate metadata via non-auto arenas, nor tcaches. This assures that all internally allocated metadata come from the first opt_narenas arenas, i.e. the automatically multiplexed arenas.	2016-04-22 15:19:59 -07:00
Jason Evans	b2c0d6322d	Add witness, a simple online locking validator. This resolves #358.	2016-04-14 02:09:28 -07:00
Jason Evans	db927b6727	Refactor arenas_cache tsd. Refactor arenas_cache tsd into arenas_tdata, which is a structure of type arena_tdata_t.	2016-02-19 20:32:37 -08:00
Craig Rodrigues	66814c1a52	Fix tsd_boot1() to use explicit 'void' parameter list.	2015-09-20 21:57:32 -07:00
Mike Hommey	4d871f73af	Preserve LastError when calling TlsGetValue. TlsGetValue has a semantic difference with pthread_getspecific, in that it can return a non-error NULL value, so it always sets the LastError. But allocator callers may not be expecting calling e.g. free() to change the value of the last error, so preserve it.	2015-03-04 09:50:33 -08:00
Jason Evans	10aff3f3e1	Refactor bootstrapping to delay tsd initialization. Refactor bootstrapping to delay tsd initialization, primarily to support integration with FreeBSD's libc. Refactor a0() for internal-only use, and add the bootstrap_{malloc,calloc,free}() API for use by FreeBSD's libc. This separation limits use of the a0() functions to metadata allocation, which doesn't require malloc/calloc/free API compatibility. This resolves #170.	2015-01-22 14:04:27 -08:00
Guilherme Goncalves	a2136025c4	Remove extra definition of je_tsd_boot on win32.	2014-11-18 19:08:18 -02:00
Jason Evans	8bb3198f72	Refactor/fix arenas manipulation. Abstract arenas access to use arena_get() (or a0get() where appropriate) rather than directly reading e.g. arenas[ind]. Prior to the addition of the arenas.extend mallctl, the worst possible outcome of directly accessing arenas was a stale read, but arenas.extend may allocate and assign a new array to arenas. Add a tsd-based arenas_cache, which amortizes arenas reads. This introduces some subtle bootstrapping issues, with tsd_boot() now being split into tsd_boot[01]() to support tsd wrapper allocation bootstrapping, as well as an arenas_cache_bypass tsd variable which dynamically terminates allocation of arenas_cache itself. Promote a0malloc(), a0calloc(), and a0free() to be generally useful for internal allocation, and use them in several places (more may be appropriate). Abstract arena->nthreads management and fix a missing decrement during thread destruction (recent tsd refactoring left arenas_cleanup() unused). Change arena_choose() to propagate OOM, and handle OOM in all callers. This is important for providing consistent allocation behavior when the MALLOCX_ARENA() flag is being used. Prior to this fix, it was possible for an OOM to result in allocation silently allocating from a different arena than the one specified.	2014-10-07 23:14:57 -07:00
Jason Evans	029d44cf8b	Fix tsd cleanup regressions. Fix tsd cleanup regressions that were introduced in `5460aa6f66` (Convert all tsd variables to reside in a single tsd structure.). These regressions were twofold: 1) tsd_tryget() should never (and need never) return NULL. Rename it to tsd_fetch() and simplify all callers. 2) tsd_<foo>_set() must only be called when tsd is in the nominal state, because cleanup happens during the nominal-->purgatory transition, and re-initialization must not happen while in the purgatory state. Add tsd_nominal() and use it as needed. Note that tsd_<foo>{p,}_get() can still be used as long as no re-initialization that would require cleanup occurs. This means that e.g. the thread_allocated counter can be updated unconditionally.	2014-10-04 11:22:55 -07:00
Jason Evans	5460aa6f66	Convert all tsd variables to reside in a single tsd structure.	2014-09-23 02:36:08 -07:00
Jason Evans	0f4f1efd94	Add mq (message queue) to test infrastructure. Add mtx (mutex) to test infrastructure, in order to avoid bootstrapping complications that would result from directly using malloc_mutex. Rename test infrastructure's thread abstraction from je_thread to thd. Fix some header ordering issues.	2013-12-12 14:41:02 -08:00
Jason Evans	a4f124f59f	Normalize #define whitespace. Consistently use a tab rather than a space following #define.	2013-12-08 22:28:27 -08:00
Leonard Crestez	cb17fc6a8f	Add support for LinuxThreads. When using LinuxThreads pthread_setspecific triggers recursive allocation on all threads. Work around this by creating a global linked list of in-progress tsd initializations. This modifies the _tsd_get_wrapper macro-generated function. When it has to initialize a TSD object it will push the item to the linked list first. If this causes a recursive allocation then the _get_wrapper request is satisfied from the list. When pthread_setspecific returns the item is removed from the list. This effectively adds a very poor substitute for real TLS used only during pthread_setspecific allocation recursion. Signed-off-by: Crestez Dan Leonard <lcrestez@ixiacom.com>	2013-10-24 18:25:19 -07:00
Mike Hommey	a19e87fbad	Add support for Mingw	2012-04-21 21:27:46 -07:00
Jason Evans	7ad54c1c30	Fix chunk allocation/deallocation bugs. Fix chunk_alloc_dss() to zero memory when requested. Fix chunk_dealloc() to avoid chunk_dealloc_mmap() for dss-allocated memory. Fix huge_palloc() to always junk fill when requested. Improve chunk_recycle() to report that memory is zeroed as a side effect of pages_purge().	2012-04-21 16:04:51 -07:00
Mike Hommey	13067ec835	Remove extra argument for malloc_tsd_cleanup_register. Bookkeeping an extra argument that actually only stores a function pointer for a function we already have is not very useful.	2012-04-18 19:25:01 -07:00
Mike Hommey	8ad483fe60	Remove initialization of the non-TLS tsd wrapper from static memory. Using static memory when malloc_tsd_malloc fails means all threads share the same wrapper and thus the same wrapped value. This defeats the purpose of TSD.	2012-04-18 19:23:53 -07:00
Mike Hommey	7ff1ce4131	Initialize all members of non-TLS tsd wrapper when creating it. Not setting the initialized member leads to randomly calling the cleanup function in cases it shouldn't be called (and isn't called in other implementations).	2012-04-18 19:23:32 -07:00
Jason Evans	3cc1f1aa69	Add tls_model configuration. The tls_model attribute isn't supported by clang (yet?), so add a configure test that defines JEMALLOC_TLS_MODEL appropriately.	2012-04-03 22:30:05 -07:00
Jason Evans	f2296deb57	Clean up tsd (no functional changes).	2012-03-30 12:36:52 -07:00
Jason Evans	41b6afb834	Port to FreeBSD. Use FreeBSD-specific functions (_pthread_mutex_init_calloc_cb(), _malloc_{pre,post}fork()) to avoid bootstrapping issues due to allocation in libc and libthr. Add malloc_strtoumax() and use it instead of strtoul(). Disable validation code in malloc_vsnprintf() and malloc_strtoumax() until jemalloc is initialized. This is necessary because locale initialization causes allocation for both vsnprintf() and strtoumax(). Force the lazy-lock feature on in order to avoid pthread_self(), because it causes allocation. Use syscall(SYS_write, ...) rather than write(...), because libthr wraps write() and causes allocation. Without this workaround, it would not be possible to print error messages in malloc_conf_init() without substantially reworking bootstrapping. Fix choose_arena_hard() to look at how many threads are assigned to the candidate choice, rather than checking whether the arena is uninitialized. This bug potentially caused more arenas to be initialized than necessary.	2012-02-02 23:09:53 -08:00
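
Commit 997d86acc6 above describes keeping bytes_until_sample as a signed 64-bit counter so that the fast path can subtract unconditionally and branch on the result alone. The following is a minimal sketch of that idea, not jemalloc's actual code: the thread-local variable stands in for the tsd field, the helper names `sample_countdown` and `sample_reset` are hypothetical, and the "below zero" trigger is an assumption chosen for illustration (the commit message does not spell out the exact condition).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread counter kept in tsd. */
static _Thread_local int64_t bytes_until_sample = 0;

/* Re-arm the countdown; `interval` is an assumed sampling interval in bytes. */
static void sample_reset(int64_t interval) {
    bytes_until_sample = interval;
}

/*
 * Fast-path check: subtract the allocation size unconditionally, then
 * test only the sign of the result. Because the counter is signed, a
 * compiler can branch on the flags set by the subtraction itself
 * instead of issuing a separate compare.
 */
static bool sample_countdown(size_t usize) {
    bytes_until_sample -= (int64_t)usize;
    return bytes_until_sample < 0; /* assumed trigger condition */
}
```

A caller would invoke `sample_countdown(usize)` on each allocation and, when it returns true, take the slow sampling path and call `sample_reset()` with the next interval.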

51 Commits