Commit Graph

55 Commits

Author SHA1 Message Date
Yinan Zhang
56c8ecffc1 Correct tsd layout graph
Augmented the tsd layout graph so that the two recently added fields,
`offset_state` and `bytes_until_sample`, are properly reflected.
As is shown, the cache footprint is 16 bytes larger than before.
2019-08-05 15:30:20 -07:00
Qi Wang
b804d0f019 Fallback to 32-bit when 8-bit atomics are missing for TSD.
When it happens, this might cause a slowdown on the fast path operations.
However such case is very rare.
2019-03-09 12:52:06 -08:00
Qi Wang
98b56ab23d Store the bin shard selection in TSD.
This avoids having to choose bin shard on the fly, also will allow flexible bin
binding for each thread.
2018-12-03 17:17:03 -08:00
Qi Wang
37b8913925 Add support for sharded bins within an arena.
This makes it possible to have multiple set of bins in an arena, which improves
arena scalability because the bins (especially the small ones) are always the
limiting factor in production workload.

A bin shard is picked on allocation; each extent tracks the bin shard id for
deallocation.  The shard size will be determined using runtime options.
2018-12-03 17:17:03 -08:00
Dave Watson
997d86acc6 restrict bytes_until_sample to int64_t. This allows optimal asm
generation of sub bytes_until_sample, usize; je; for x86 arch.
Subtraction is unconditional, and only flags are checked for the jump,
no extra compare is necessary.  This also reduces register pressure.
2018-10-15 08:24:12 -07:00
Dave Watson
9ed3bdc848 move bytes until sample to tsd. Fastpath allocation does not need
to load tdata now, avoiding several branches.
2018-10-15 08:24:12 -07:00
David Goldblatt
41b7372ead TSD: Add fork support to tsd_nominal_tsds.
In case of multithreaded fork, we want to leave the child in a reasonable state,
in which tsd_nominal_tsds is either empty or contains only the forking thread.
2018-07-26 17:22:25 -07:00
David Goldblatt
d1e11d48d4 Move tsd link and in_hook after tcache.
This can lead to better cache utilization down the common paths where we don't
touch the link.
2018-06-27 13:39:02 -07:00
David Goldblatt
a7f749c9af Hooks: Protect against reentrancy.
Previously, we made the user deal with this themselves, but that's not good
enough; if hooks may allocate, we should test the allocation pathways down
hooks.  If we're doing that, we might as well actually implement the protection
for the user.
2018-05-18 11:43:03 -07:00
David Goldblatt
0379235f47 Tests: Shouldn't be able to change global slowness.
This can help ensure that we don't leave slowness changes behind in case of
resource exhaustion.
2018-05-18 11:43:03 -07:00
David Goldblatt
e870829e64 TSD: Add the ability to enter a global slow path.
This gives any thread the ability to send other threads down slow paths the next
time they fetch tsd.
2018-05-18 11:43:03 -07:00
David Goldblatt
feff510b9f TSD: Pull name mangling into a macro. 2018-05-18 11:43:03 -07:00
David Goldblatt
39d6420c0c TSD: Make state atomic.
This will let us change the state of another thread remotely, eventually.
2018-05-18 11:43:03 -07:00
David Goldblatt
982c10de35 TSD: Make all state access happen through a function.
Shortly, tsd state will be atomic and have some complicated enough logic down
the state-setting path that we should be aware of it.
2018-05-18 11:43:03 -07:00
Dave Watson
d6feed6e66 Use tsd offset_state instead of atomic
While working on #852, I noticed the prng state is atomic.  This is the only
atomic use of prng in all of jemalloc.  Instead, use a threadlocal prng
state if possible to avoid unnecessary cache line contention.
2017-11-14 08:58:18 -08:00
Qi Wang
9b1befabbb Add minimal initialized TSD.
We use the minimal_initilized tsd (which requires no cleanup) for free()
specifically, if tsd hasn't been initialized yet.

Any other activity will transit the state from minimal to normal.  This is to
workaround the case where a thread has no malloc calls in its lifetime until
during thread termination, free() happens after tls destructors.
2017-06-15 17:55:53 -07:00
Jason Evans
faaf458bad Remove redundant typedefs.
Pre-C11 compilers do not support typedef redefinition.
2017-06-08 13:28:57 -07:00
Qi Wang
5642f03cae Add internal tsd for background_thread. 2017-06-08 10:02:18 -07:00
Qi Wang
00869e39a3 Make tsd no-cleanup during tsd reincarnation.
Since tsd cleanup isn't guaranteed when reincarnated, we set up tsd in a way
that needs no cleanup, by making it going through slow path instead.
2017-06-07 11:03:49 -07:00
David Goldblatt
44f9bd147a Header refactoring: unify and de-catchall rtree module. 2017-05-31 13:08:45 -07:00
David Goldblatt
9f822a1fd7 Header refactoring: unify and de-catchall witness code. 2017-05-24 15:27:30 -07:00
David Goldblatt
3f685e8824 Protect the rtree/extent interactions with a mutex pool.
Instead of embedding a lock bit in rtree leaf elements, we associate extents
with a small set of mutexes.  This gets us two things:

- We can use the system mutexes.  This (hypothetically) protects us from
  priority inversion, and lets us stop doing a backoff/sleep loop, instead
  opting for precise wakeups from the mutex.
- Cuts down on the number of mutex acquisitions we have to do (from 4 in the
  worst case to two).

We end up simplifying most of the rtree code (which no longer has to deal with
locking or concurrency at all), at the cost of additional complexity in the
extent code: since the mutex protecting the rtree leaf elements is determined by
reading the extent out of those elements, the initial read is racy, so that we
may acquire an out of date mutex.  We re-check the extent in the leaf after
acquiring the mutex to protect us from this race.
2017-05-19 14:21:27 -07:00
David Goldblatt
209f2926b8 Header refactoring: tsd - cleanup and dependency breaking.
This removes the tsd macros (which are used only for tsd_t in real builds).  We
break up the circular dependencies involving tsd.

We also move all tsd access through getters and setters.  This allows us to
assert that we only touch data when tsd is in a valid state.

We simplify the usages of the x macro trick, removing all the customizability
(get/set, init, cleanup), moving the lifetime logic to tsd_init and tsd_cleanup.
This lets us make initialization order independent of order within tsd_t.
2017-05-01 10:49:56 -07:00
David Goldblatt
77cccac8cd Break up headers into constituent parts
This is part of a broader change to make header files better represent the
dependencies between one another (see
https://github.com/jemalloc/jemalloc/issues/533). It breaks up component headers
into smaller parts that can be made to have a simpler dependency graph.

For the autogenerated headers (smoothstep.h and size_classes.h), no splitting
was necessary, so I didn't add support to emit multiple headers.
2017-01-12 15:43:51 -08:00
Jason Evans
69c26cdb01 Add some missing explicit casts. 2016-12-13 13:38:11 -08:00
Jason Evans
b54d160dc4 Do not (recursively) allocate within tsd_fetch().
Refactor tsd so that tsdn_fetch() does not trigger allocation, since
allocation could cause infinite recursion.

This resolves #458.
2016-10-20 23:59:12 -07:00
Jason Evans
61f467e16a Avoid self assignment in tsd_set(). 2016-09-23 12:21:34 -07:00
Jason Evans
6f29a83924 Add rtree lookup path caching.
rtree-based extent lookups remain more expensive than chunk-based run
lookups, but with this optimization the fast path slowdown is ~3 CPU
cycles per metadata lookup (on Intel Core i7-4980HQ), versus ~11 cycles
prior.  The path caching speedup tends to degrade gracefully unless
allocated memory is spread far apart (as is the case when using a
mixture of sbrk() and mmap()).
2016-06-05 20:59:57 -07:00
Jason Evans
7be2ebc23f Make tsd cleanup functions optional, remove noop cleanup functions. 2016-06-05 20:42:24 -07:00
Jason Evans
e75e9be130 Add rtree element witnesses. 2016-06-03 12:27:41 -07:00
Jason Evans
ba5c709517 Remove quarantine support. 2016-05-13 10:25:05 -07:00
Jason Evans
c1e00ef2a6 Resolve bootstrapping issues when embedded in FreeBSD libc.
b2c0d6322d (Add witness, a simple online
locking validator.) caused a broad propagation of tsd throughout the
internal API, but tsd_fetch() was designed to fail prior to tsd
bootstrapping.  Fix this by splitting tsd_t into non-nullable tsd_t and
nullable tsdn_t, and modifying all internal APIs that do not critically
rely on tsd to take nullable pointers.  Furthermore, add the
tsd_booted_get() function so that tsdn_fetch() can probe whether tsd
bootstrapping is complete and return NULL if not.  All dangerous
conversions of nullable pointers are tsdn_tsd() calls that assert-fail
on invalid conversion.
2016-05-10 22:51:33 -07:00
Jason Evans
174c0c3a9c Fix fork()-related lock rank ordering reversals. 2016-04-25 23:16:20 -07:00
Jason Evans
66cd953514 Do not allocate metadata via non-auto arenas, nor tcaches.
This assures that all internally allocated metadata come from the
first opt_narenas arenas, i.e. the automatically multiplexed arenas.
2016-04-22 15:19:59 -07:00
Jason Evans
b2c0d6322d Add witness, a simple online locking validator.
This resolves #358.
2016-04-14 02:09:28 -07:00
Jason Evans
db927b6727 Refactor arenas_cache tsd.
Refactor arenas_cache tsd into arenas_tdata, which is a structure of
type arena_tdata_t.
2016-02-19 20:32:37 -08:00
Craig Rodrigues
66814c1a52 Fix tsd_boot1() to use explicit 'void' parameter list. 2015-09-20 21:57:32 -07:00
Mike Hommey
4d871f73af Preserve LastError when calling TlsGetValue
TlsGetValue has a semantic difference with pthread_getspecific, in that it
can return a non-error NULL value, so it always sets the LastError.
But allocator callers may not be expecting calling e.g. free() to change
the value of the last error, so preserve it.
2015-03-04 09:50:33 -08:00
Jason Evans
10aff3f3e1 Refactor bootstrapping to delay tsd initialization.
Refactor bootstrapping to delay tsd initialization, primarily to support
integration with FreeBSD's libc.

Refactor a0*() for internal-only use, and add the
bootstrap_{malloc,calloc,free}() API for use by FreeBSD's libc.  This
separation limits use of the a0*() functions to metadata allocation,
which doesn't require malloc/calloc/free API compatibility.

This resolves #170.
2015-01-22 14:04:27 -08:00
Guilherme Goncalves
a2136025c4 Remove extra definition of je_tsd_boot on win32. 2014-11-18 19:08:18 -02:00
Jason Evans
8bb3198f72 Refactor/fix arenas manipulation.
Abstract arenas access to use arena_get() (or a0get() where appropriate)
rather than directly reading e.g. arenas[ind].  Prior to the addition of
the arenas.extend mallctl, the worst possible outcome of directly
accessing arenas was a stale read, but arenas.extend may allocate and
assign a new array to arenas.

Add a tsd-based arenas_cache, which amortizes arenas reads.  This
introduces some subtle bootstrapping issues, with tsd_boot() now being
split into tsd_boot[01]() to support tsd wrapper allocation
bootstrapping, as well as an arenas_cache_bypass tsd variable which
dynamically terminates allocation of arenas_cache itself.

Promote a0malloc(), a0calloc(), and a0free() to be generally useful for
internal allocation, and use them in several places (more may be
appropriate).

Abstract arena->nthreads management and fix a missing decrement during
thread destruction (recent tsd refactoring left arenas_cleanup()
unused).

Change arena_choose() to propagate OOM, and handle OOM in all callers.
This is important for providing consistent allocation behavior when the
MALLOCX_ARENA() flag is being used.  Prior to this fix, it was possible
for an OOM to result in allocation silently allocating from a different
arena than the one specified.
2014-10-07 23:14:57 -07:00
Jason Evans
029d44cf8b Fix tsd cleanup regressions.
Fix tsd cleanup regressions that were introduced in
5460aa6f66 (Convert all tsd variables to
reside in a single tsd structure.).  These regressions were twofold:

1) tsd_tryget() should never (and need never) return NULL.  Rename it to
   tsd_fetch() and simplify all callers.
2) tsd_*_set() must only be called when tsd is in the nominal state,
   because cleanup happens during the nominal-->purgatory transition,
   and re-initialization must not happen while in the purgatory state.
   Add tsd_nominal() and use it as needed.  Note that tsd_*{p,}_get()
   can still be used as long as no re-initialization that would require
   cleanup occurs.  This means that e.g. the thread_allocated counter
   can be updated unconditionally.
2014-10-04 11:22:55 -07:00
Jason Evans
5460aa6f66 Convert all tsd variables to reside in a single tsd structure. 2014-09-23 02:36:08 -07:00
Jason Evans
0f4f1efd94 Add mq (message queue) to test infrastructure.
Add mtx (mutex) to test infrastructure, in order to avoid bootstrapping
complications that would result from directly using malloc_mutex.

Rename test infrastructure's thread abstraction from je_thread to thd.

Fix some header ordering issues.
2013-12-12 14:41:02 -08:00
Jason Evans
a4f124f59f Normalize #define whitespace.
Consistently use a tab rather than a space following #define.
2013-12-08 22:28:27 -08:00
Leonard Crestez
cb17fc6a8f Add support for LinuxThreads.
When using LinuxThreads pthread_setspecific triggers recursive
allocation on all threads. Work around this by creating a global linked
list of in-progress tsd initializations.

This modifies the _tsd_get_wrapper macro-generated function. When it has
to initialize an TSD object it will push the item to the linked list
first. If this causes a recursive allocation then the _get_wrapper
request is satisfied from the list. When pthread_setspecific returns the
item is removed from the list.

This effectively adds a very poor substitute for real TLS used only
during pthread_setspecific allocation recursion.

Signed-off-by: Crestez Dan Leonard <lcrestez@ixiacom.com>
2013-10-24 18:25:19 -07:00
Mike Hommey
a19e87fbad Add support for Mingw 2012-04-21 21:27:46 -07:00
Jason Evans
7ad54c1c30 Fix chunk allocation/deallocation bugs.
Fix chunk_alloc_dss() to zero memory when requested.

Fix chunk_dealloc() to avoid chunk_dealloc_mmap() for dss-allocated
memory.

Fix huge_palloc() to always junk fill when requested.

Improve chunk_recycle() to report that memory is zeroed as a side effect
of pages_purge().
2012-04-21 16:04:51 -07:00
Mike Hommey
13067ec835 Remove extra argument for malloc_tsd_cleanup_register
Bookkeeping an extra argument that actually only stores a function pointer
for a function we already have is not very useful.
2012-04-18 19:25:01 -07:00
Mike Hommey
8ad483fe60 Remove initialization of the non-TLS tsd wrapper from static memory
Using static memory when malloc_tsd_malloc fails means all threads share
the same wrapper and thus the same wrapped value. This defeats the purpose
of TSD.
2012-04-18 19:23:53 -07:00