#define	JEMALLOC_HUGE_C_
#include "jemalloc/internal/jemalloc_internal.h"

/******************************************************************************/
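
/*
 * Look up the extent node that tracks the huge allocation starting at ptr,
 * via the chunk radix tree.
 */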
static extent_node_t *
huge_node_get(const void *ptr)
{
	extent_node_t *node;

	node = chunk_lookup(ptr, true);
	assert(!extent_node_achunk_get(node));

	return (node);
}
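
/*
 * Register the node for ptr in the chunk radix tree; returns true on error.
 */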
static bool
huge_node_set(const void *ptr, extent_node_t *node)
{

	assert(extent_node_addr_get(node) == ptr);
	assert(!extent_node_achunk_get(node));
	return (chunk_register(ptr, node));
}
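
/* Remove the chunk radix tree mapping from ptr to its extent node. */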
static void
huge_node_unset(const void *ptr, const extent_node_t *node)
{

	chunk_deregister(ptr, node);
}
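
/*
 * Round the request up to a usable size class and delegate to huge_palloc()
 * with chunksize alignment.
 */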
void *
huge_malloc(tsd_t *tsd, arena_t *arena, size_t size, bool zero,
    tcache_t *tcache)
{
	size_t usize;

	usize = s2u(size);
	if (usize == 0) {
		/* size_t overflow. */
		return (NULL);
	}

	return (huge_palloc(tsd, arena, usize, chunksize, zero, tcache));
}
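
/*
 * Allocate a huge (chunk-aligned) region: allocate an extent node to track
 * it, obtain chunks from the arena, register the node in the chunk radix
 * tree, link it into the arena's huge list, and finally zero- or junk-fill
 * as requested.
 */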
void *
huge_palloc(tsd_t *tsd, arena_t *arena, size_t size, size_t alignment,
    bool zero, tcache_t *tcache)
{
	void *ret;
	size_t usize;
	extent_node_t *node;
	bool is_zeroed;

	/* Allocate one or more contiguous chunks for this request. */

	usize = sa2u(size, alignment);
	if (unlikely(usize == 0))
		return (NULL);
	assert(usize >= chunksize);

	/* Allocate an extent node with which to track the chunk. */
	node = ipallocztm(tsd, CACHELINE_CEILING(sizeof(extent_node_t)),
	    CACHELINE, false, tcache, true, arena);
	if (node == NULL)
		return (NULL);

	/*
	 * Copy zero into is_zeroed and pass the copy to chunk_alloc(), so that
	 * it is possible to make correct junk/zero fill decisions below.
	 */
	is_zeroed = zero;
	arena = arena_choose(tsd, arena);
	if (unlikely(arena == NULL) || (ret = arena_chunk_alloc_huge(arena,
	    size, alignment, &is_zeroed)) == NULL) {
		idalloctm(tsd, node, tcache, true);
		return (NULL);
	}

	extent_node_init(node, arena, ret, size, is_zeroed);

	if (huge_node_set(ret, node)) {
		arena_chunk_dalloc_huge(arena, ret, size);
		idalloctm(tsd, node, tcache, true);
		return (NULL);
	}

	/* Insert node into huge. */
	malloc_mutex_lock(&arena->huge_mtx);
	ql_elm_new(node, ql_link);
	ql_tail_insert(&arena->huge, node, ql_link);
	malloc_mutex_unlock(&arena->huge_mtx);

	if (zero || (config_fill && unlikely(opt_zero))) {
		if (!is_zeroed)
			memset(ret, 0, size);
	} else if (config_fill && unlikely(opt_junk_alloc))
		memset(ret, 0xa5, size);

	return (ret);
}
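
/*
 * Under JEMALLOC_JET the implementation is compiled as
 * huge_dalloc_junk_impl() and exposed through a huge_dalloc_junk function
 * pointer, so that tests can interpose the junk-fill hook.
 */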
#ifdef JEMALLOC_JET
#undef huge_dalloc_junk
#define	huge_dalloc_junk JEMALLOC_N(huge_dalloc_junk_impl)
#endif
static void
huge_dalloc_junk(void *ptr, size_t usize)
{

	if (config_fill && have_dss && unlikely(opt_junk_free)) {
		/*
		 * Only bother junk filling if the chunk isn't about to be
		 * unmapped.
		 */
		if (!config_munmap || (have_dss && chunk_in_dss(ptr)))
			memset(ptr, 0x5a, usize);
	}
}
#ifdef JEMALLOC_JET
#undef huge_dalloc_junk
#define	huge_dalloc_junk JEMALLOC_N(huge_dalloc_junk)
huge_dalloc_junk_t *huge_dalloc_junk = JEMALLOC_N(huge_dalloc_junk_impl);
#endif
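
/*
 * Resize in place without mapping or unmapping chunks: purge or junk-fill
 * the trailing space when shrinking, update the node's size and zeroed state
 * under huge_mtx, notify the arena, and fill the newly usable space when
 * growing.
 */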
static void
huge_ralloc_no_move_similar(void *ptr, size_t oldsize, size_t usize,
    size_t size, size_t extra, bool zero)
{
	size_t usize_next;
	extent_node_t *node;
	arena_t *arena;
	chunk_purge_t *chunk_purge;
	bool zeroed;

	/* Increase usize to incorporate extra. */
	while (usize < s2u(size+extra) && (usize_next = s2u(usize+1)) < oldsize)
		usize = usize_next;

	if (oldsize == usize)
		return;

	node = huge_node_get(ptr);
	arena = extent_node_arena_get(node);

	malloc_mutex_lock(&arena->lock);
	chunk_purge = arena->chunk_purge;
	malloc_mutex_unlock(&arena->lock);

	/* Fill if necessary (shrinking). */
	if (oldsize > usize) {
		size_t sdiff = oldsize - usize;
		zeroed = !chunk_purge_wrapper(arena, chunk_purge, ptr, usize,
		    sdiff);
		if (config_fill && unlikely(opt_junk_free)) {
			memset((void *)((uintptr_t)ptr + usize), 0x5a, sdiff);
			zeroed = false;
		}
	} else
		zeroed = true;

	malloc_mutex_lock(&arena->huge_mtx);
	/* Update the size of the huge allocation. */
	assert(extent_node_size_get(node) != usize);
	extent_node_size_set(node, usize);
	/* Clear node's zeroed field if zeroing failed above. */
	extent_node_zeroed_set(node, extent_node_zeroed_get(node) && zeroed);
	malloc_mutex_unlock(&arena->huge_mtx);

	arena_chunk_ralloc_huge_similar(arena, ptr, oldsize, usize);

	/* Fill if necessary (growing). */
	if (oldsize < usize) {
		if (zero || (config_fill && unlikely(opt_zero))) {
			if (!zeroed) {
				memset((void *)((uintptr_t)ptr + oldsize), 0,
				    usize - oldsize);
			}
		} else if (config_fill && unlikely(opt_junk_alloc)) {
			memset((void *)((uintptr_t)ptr + oldsize), 0xa5, usize -
			    oldsize);
		}
	}
}
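
/*
 * Shrink in place: purge (and optionally junk-fill) the space being
 * released, update the node's size and zeroed state under huge_mtx, then
 * return the excess chunks to the arena.
 */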
static void
huge_ralloc_no_move_shrink(void *ptr, size_t oldsize, size_t usize)
{
	extent_node_t *node;
	arena_t *arena;
	chunk_purge_t *chunk_purge;
	bool zeroed;

	node = huge_node_get(ptr);
	arena = extent_node_arena_get(node);

	malloc_mutex_lock(&arena->lock);
	chunk_purge = arena->chunk_purge;
	malloc_mutex_unlock(&arena->lock);

	if (oldsize > usize) {
		size_t sdiff = oldsize - usize;
		zeroed = !chunk_purge_wrapper(arena, chunk_purge,
		    CHUNK_ADDR2BASE((uintptr_t)ptr + usize),
		    CHUNK_ADDR2OFFSET((uintptr_t)ptr + usize), sdiff);
		if (config_fill && unlikely(opt_junk_free)) {
			huge_dalloc_junk((void *)((uintptr_t)ptr + usize),
			    sdiff);
			zeroed = false;
		}
	} else
		zeroed = true;

	malloc_mutex_lock(&arena->huge_mtx);
	/* Update the size of the huge allocation. */
	extent_node_size_set(node, usize);
	/* Clear node's zeroed field if zeroing failed above. */
	extent_node_zeroed_set(node, extent_node_zeroed_get(node) && zeroed);
	malloc_mutex_unlock(&arena->huge_mtx);

	/* Zap the excess chunks. */
	arena_chunk_ralloc_huge_shrink(arena, ptr, oldsize, usize);
}
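
/*
 * Attempt to expand in place by requesting the address range immediately
 * following the existing chunks from the arena; returns true on failure.
 */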
static bool
|
|
|
|
huge_ralloc_no_move_expand(void *ptr, size_t oldsize, size_t size, bool zero) {
|
2014-10-06 08:54:10 +08:00
|
|
|
size_t usize;
|
2014-11-28 03:22:36 +08:00
|
|
|
extent_node_t *node;
|
Attempt to expand huge allocations in-place.
This adds support for expanding huge allocations in-place by requesting
memory at a specific address from the chunk allocator.
It's currently only implemented for the chunk recycling path, although
in theory it could also be done by optimistically allocating new chunks.
On Linux, it could attempt an in-place mremap. However, that won't work
in practice since the heap is grown downwards and memory is not unmapped
(in a normal build, at least).
Repeated vector reallocation micro-benchmark:
#include <string.h>
#include <stdlib.h>
int main(void) {
for (size_t i = 0; i < 100; i++) {
void *ptr = NULL;
size_t old_size = 0;
for (size_t size = 4; size < (1 << 30); size *= 2) {
ptr = realloc(ptr, size);
if (!ptr) return 1;
memset(ptr + old_size, 0xff, size - old_size);
old_size = size;
}
free(ptr);
}
}
The glibc allocator fails to do any in-place reallocations on this
benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it
elides the cost of copies via mremap, which is currently not something
that jemalloc can use.
With this improvement, jemalloc still fails to do any in-place huge
reallocations for the first outer loop, but then succeeds 100% of the
time for the remaining 99 iterations. The time spent doing allocations
and copies drops down to under 5%, with nearly all of it spent doing
purging + faulting (when huge pages are disabled) and the array memset.
An improved mremap API (MREMAP_RETAIN - #138) would be far more general
but this is a portable optimization and would still be useful on Linux
for xallocx.
Numbers with transparent huge pages enabled:
glibc (copies elided via MREMAP_MAYMOVE): 8.471s
jemalloc: 17.816s
jemalloc + no-op madvise: 13.236s
jemalloc + this commit: 6.787s
jemalloc + this commit + no-op madvise: 6.144s
Numbers with transparent huge pages disabled:
glibc (copies elided via MREMAP_MAYMOVE): 15.403s
jemalloc: 39.456s
jemalloc + no-op madvise: 12.768s
jemalloc + this commit: 15.534s
jemalloc + this commit + no-op madvise: 6.354s
Closes #137
2014-10-04 13:39:32 +08:00
|
|
|
arena_t *arena;
|
2014-10-16 09:02:02 +08:00
|
|
|
bool is_zeroed_subchunk, is_zeroed_chunk;
|
Attempt to expand huge allocations in-place.
This adds support for expanding huge allocations in-place by requesting
memory at a specific address from the chunk allocator.
It's currently only implemented for the chunk recycling path, although
in theory it could also be done by optimistically allocating new chunks.
On Linux, it could attempt an in-place mremap. However, that won't work
in practice since the heap is grown downwards and memory is not unmapped
(in a normal build, at least).
Repeated vector reallocation micro-benchmark:
#include <string.h>
#include <stdlib.h>
int main(void) {
for (size_t i = 0; i < 100; i++) {
void *ptr = NULL;
size_t old_size = 0;
for (size_t size = 4; size < (1 << 30); size *= 2) {
ptr = realloc(ptr, size);
if (!ptr) return 1;
memset(ptr + old_size, 0xff, size - old_size);
old_size = size;
}
free(ptr);
}
}
The glibc allocator fails to do any in-place reallocations on this
benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it
elides the cost of copies via mremap, which is currently not something
that jemalloc can use.
With this improvement, jemalloc still fails to do any in-place huge
reallocations for the first outer loop, but then succeeds 100% of the
time for the remaining 99 iterations. The time spent doing allocations
and copies drops down to under 5%, with nearly all of it spent doing
purging + faulting (when huge pages are disabled) and the array memset.
An improved mremap API (MREMAP_RETAIN - #138) would be far more general
but this is a portable optimization and would still be useful on Linux
for xallocx.
Numbers with transparent huge pages enabled:
glibc (copies elided via MREMAP_MAYMOVE): 8.471s
jemalloc: 17.816s
jemalloc + no-op madvise: 13.236s
jemalloc + this commit: 6.787s
jemalloc + this commit + no-op madvise: 6.144s
Numbers with transparent huge pages disabled:
glibc (copies elided via MREMAP_MAYMOVE): 15.403s
jemalloc: 39.456s
jemalloc + no-op madvise: 12.768s
jemalloc + this commit: 15.534s
jemalloc + this commit + no-op madvise: 6.354s
Closes #137
2014-10-04 13:39:32 +08:00
|
|
|
|
2014-10-06 08:54:10 +08:00
|
|
|
usize = s2u(size);
|
|
|
|
if (usize == 0) {
|
|
|
|
/* size_t overflow. */
|
Attempt to expand huge allocations in-place.
This adds support for expanding huge allocations in-place by requesting
memory at a specific address from the chunk allocator.
It's currently only implemented for the chunk recycling path, although
in theory it could also be done by optimistically allocating new chunks.
On Linux, it could attempt an in-place mremap. However, that won't work
in practice since the heap is grown downwards and memory is not unmapped
(in a normal build, at least).
Repeated vector reallocation micro-benchmark:
#include <string.h>
#include <stdlib.h>
int main(void) {
for (size_t i = 0; i < 100; i++) {
void *ptr = NULL;
size_t old_size = 0;
for (size_t size = 4; size < (1 << 30); size *= 2) {
ptr = realloc(ptr, size);
if (!ptr) return 1;
memset(ptr + old_size, 0xff, size - old_size);
old_size = size;
}
free(ptr);
}
}
The glibc allocator fails to do any in-place reallocations on this
benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it
elides the cost of copies via mremap, which is currently not something
that jemalloc can use.
With this improvement, jemalloc still fails to do any in-place huge
reallocations for the first outer loop, but then succeeds 100% of the
time for the remaining 99 iterations. The time spent doing allocations
and copies drops down to under 5%, with nearly all of it spent doing
purging + faulting (when huge pages are disabled) and the array memset.
An improved mremap API (MREMAP_RETAIN - #138) would be far more general
but this is a portable optimization and would still be useful on Linux
for xallocx.
Numbers with transparent huge pages enabled:
glibc (copies elided via MREMAP_MAYMOVE): 8.471s
jemalloc: 17.816s
jemalloc + no-op madvise: 13.236s
jemalloc + this commit: 6.787s
jemalloc + this commit + no-op madvise: 6.144s
Numbers with transparent huge pages disabled:
glibc (copies elided via MREMAP_MAYMOVE): 15.403s
jemalloc: 39.456s
jemalloc + no-op madvise: 12.768s
jemalloc + this commit: 15.534s
jemalloc + this commit + no-op madvise: 6.354s
Closes #137
2014-10-04 13:39:32 +08:00
|
|
|
return (true);
|
|
|
|
}
|
|
|
|
|
Move centralized chunk management into arenas.
Migrate all centralized data structures related to huge allocations and
recyclable chunks into arena_t, so that each arena can manage huge
allocations and recyclable virtual memory completely independently of
other arenas.
Add chunk node caching to arenas, in order to avoid contention on the
base allocator.
Use chunks_rtree to look up huge allocations rather than a red-black
tree. Maintain a per arena unsorted list of huge allocations (which
will be needed to enumerate huge allocations during arena reset).
Remove the --enable-ivsalloc option, make ivsalloc() always available,
and use it for size queries if --enable-debug is enabled. The only
practical implications to this removal are that 1) ivsalloc() is now
always available during live debugging (and the underlying radix tree is
available during core-based debugging), and 2) size query validation can
no longer be enabled independent of --enable-debug.
Remove the stats.chunks.{current,total,high} mallctls, and replace their
underlying statistics with simpler atomically updated counters used
exclusively for gdump triggering. These statistics are no longer very
useful because each arena manages chunks independently, and per arena
statistics provide similar information.
Simplify chunk synchronization code, now that base chunk allocation
cannot cause recursive lock acquisition.
2015-02-12 04:24:27 +08:00
|
|
|
node = huge_node_get(ptr);
|
2015-02-16 10:04:46 +08:00
|
|
|
arena = extent_node_arena_get(node);
|
Move centralized chunk management into arenas.
Migrate all centralized data structures related to huge allocations and
recyclable chunks into arena_t, so that each arena can manage huge
allocations and recyclable virtual memory completely independently of
other arenas.
Add chunk node caching to arenas, in order to avoid contention on the
base allocator.
Use chunks_rtree to look up huge allocations rather than a red-black
tree. Maintain a per arena unsorted list of huge allocations (which
will be needed to enumerate huge allocations during arena reset).
Remove the --enable-ivsalloc option, make ivsalloc() always available,
and use it for size queries if --enable-debug is enabled. The only
practical implications to this removal are that 1) ivsalloc() is now
always available during live debugging (and the underlying radix tree is
available during core-based debugging), and 2) size query validation can
no longer be enabled independent of --enable-debug.
Remove the stats.chunks.{current,total,high} mallctls, and replace their
underlying statistics with simpler atomically updated counters used
exclusively for gdump triggering. These statistics are no longer very
useful because each arena manages chunks independently, and per arena
statistics provide similar information.
Simplify chunk synchronization code, now that base chunk allocation
cannot cause recursive lock acquisition.
2015-02-12 04:24:27 +08:00
|
|
|
malloc_mutex_lock(&arena->huge_mtx);
|
2015-02-16 10:04:46 +08:00
|
|
|
is_zeroed_subchunk = extent_node_zeroed_get(node);
|
Move centralized chunk management into arenas.
Migrate all centralized data structures related to huge allocations and
recyclable chunks into arena_t, so that each arena can manage huge
allocations and recyclable virtual memory completely independently of
other arenas.
Add chunk node caching to arenas, in order to avoid contention on the
base allocator.
Use chunks_rtree to look up huge allocations rather than a red-black
tree. Maintain a per arena unsorted list of huge allocations (which
will be needed to enumerate huge allocations during arena reset).
Remove the --enable-ivsalloc option, make ivsalloc() always available,
and use it for size queries if --enable-debug is enabled. The only
practical implications to this removal are that 1) ivsalloc() is now
always available during live debugging (and the underlying radix tree is
available during core-based debugging), and 2) size query validation can
no longer be enabled independent of --enable-debug.
Remove the stats.chunks.{current,total,high} mallctls, and replace their
underlying statistics with simpler atomically updated counters used
exclusively for gdump triggering. These statistics are no longer very
useful because each arena manages chunks independently, and per arena
statistics provide similar information.
Simplify chunk synchronization code, now that base chunk allocation
cannot cause recursive lock acquisition.
2015-02-12 04:24:27 +08:00
|
|
|
malloc_mutex_unlock(&arena->huge_mtx);
|
Attempt to expand huge allocations in-place.
This adds support for expanding huge allocations in-place by requesting
memory at a specific address from the chunk allocator.
It's currently only implemented for the chunk recycling path, although
in theory it could also be done by optimistically allocating new chunks.
On Linux, it could attempt an in-place mremap. However, that won't work
in practice since the heap is grown downwards and memory is not unmapped
(in a normal build, at least).
Repeated vector reallocation micro-benchmark:
#include <string.h>
#include <stdlib.h>
int main(void) {
for (size_t i = 0; i < 100; i++) {
void *ptr = NULL;
size_t old_size = 0;
for (size_t size = 4; size < (1 << 30); size *= 2) {
ptr = realloc(ptr, size);
if (!ptr) return 1;
memset(ptr + old_size, 0xff, size - old_size);
old_size = size;
}
free(ptr);
}
}
The glibc allocator fails to do any in-place reallocations on this
benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it
elides the cost of copies via mremap, which is currently not something
that jemalloc can use.
With this improvement, jemalloc still fails to do any in-place huge
reallocations for the first outer loop, but then succeeds 100% of the
time for the remaining 99 iterations. The time spent doing allocations
and copies drops down to under 5%, with nearly all of it spent doing
purging + faulting (when huge pages are disabled) and the array memset.
An improved mremap API (MREMAP_RETAIN - #138) would be far more general
but this is a portable optimization and would still be useful on Linux
for xallocx.
Numbers with transparent huge pages enabled:
glibc (copies elided via MREMAP_MAYMOVE): 8.471s
jemalloc: 17.816s
jemalloc + no-op madvise: 13.236s
jemalloc + this commit: 6.787s
jemalloc + this commit + no-op madvise: 6.144s
Numbers with transparent huge pages disabled:
glibc (copies elided via MREMAP_MAYMOVE): 15.403s
jemalloc: 39.456s
jemalloc + no-op madvise: 12.768s
jemalloc + this commit: 15.534s
jemalloc + this commit + no-op madvise: 6.354s
Closes #137
2014-10-04 13:39:32 +08:00
|
|
|
|
|
|
|
/*
|
2014-10-16 09:02:02 +08:00
|
|
|
* Copy zero into is_zeroed_chunk and pass the copy to chunk_alloc(), so
|
|
|
|
* that it is possible to make correct junk/zero fill decisions below.
|
Attempt to expand huge allocations in-place.
This adds support for expanding huge allocations in-place by requesting
memory at a specific address from the chunk allocator.
It's currently only implemented for the chunk recycling path, although
in theory it could also be done by optimistically allocating new chunks.
On Linux, it could attempt an in-place mremap. However, that won't work
in practice since the heap is grown downwards and memory is not unmapped
(in a normal build, at least).
Repeated vector reallocation micro-benchmark:
#include <string.h>
#include <stdlib.h>
int main(void) {
for (size_t i = 0; i < 100; i++) {
void *ptr = NULL;
size_t old_size = 0;
for (size_t size = 4; size < (1 << 30); size *= 2) {
ptr = realloc(ptr, size);
if (!ptr) return 1;
memset(ptr + old_size, 0xff, size - old_size);
old_size = size;
}
free(ptr);
}
}
The glibc allocator fails to do any in-place reallocations on this
benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it
elides the cost of copies via mremap, which is currently not something
that jemalloc can use.
With this improvement, jemalloc still fails to do any in-place huge
reallocations for the first outer loop, but then succeeds 100% of the
time for the remaining 99 iterations. The time spent doing allocations
and copies drops down to under 5%, with nearly all of it spent doing
purging + faulting (when huge pages are disabled) and the array memset.
An improved mremap API (MREMAP_RETAIN - #138) would be far more general
but this is a portable optimization and would still be useful on Linux
for xallocx.
Numbers with transparent huge pages enabled:
glibc (copies elided via MREMAP_MAYMOVE): 8.471s
jemalloc: 17.816s
jemalloc + no-op madvise: 13.236s
jemalloc + this commit: 6.787s
jemalloc + this commit + no-op madvise: 6.144s
Numbers with transparent huge pages disabled:
glibc (copies elided via MREMAP_MAYMOVE): 15.403s
jemalloc: 39.456s
jemalloc + no-op madvise: 12.768s
jemalloc + this commit: 15.534s
jemalloc + this commit + no-op madvise: 6.354s
Closes #137
2014-10-04 13:39:32 +08:00
|
|
|
*/
|
2014-10-16 09:02:02 +08:00
|
|
|
is_zeroed_chunk = zero;
|
Attempt to expand huge allocations in-place.
This adds support for expanding huge allocations in-place by requesting
memory at a specific address from the chunk allocator.
It's currently only implemented for the chunk recycling path, although
in theory it could also be done by optimistically allocating new chunks.
On Linux, it could attempt an in-place mremap. However, that won't work
in practice since the heap is grown downwards and memory is not unmapped
(in a normal build, at least).
Repeated vector reallocation micro-benchmark:
#include <string.h>
#include <stdlib.h>
int main(void) {
for (size_t i = 0; i < 100; i++) {
void *ptr = NULL;
size_t old_size = 0;
for (size_t size = 4; size < (1 << 30); size *= 2) {
ptr = realloc(ptr, size);
if (!ptr) return 1;
memset(ptr + old_size, 0xff, size - old_size);
old_size = size;
}
free(ptr);
}
}
The glibc allocator fails to do any in-place reallocations on this
benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it
elides the cost of copies via mremap, which is currently not something
that jemalloc can use.
With this improvement, jemalloc still fails to do any in-place huge
reallocations for the first outer loop, but then succeeds 100% of the
time for the remaining 99 iterations. The time spent doing allocations
and copies drops down to under 5%, with nearly all of it spent doing
purging + faulting (when huge pages are disabled) and the array memset.
An improved mremap API (MREMAP_RETAIN - #138) would be far more general
but this is a portable optimization and would still be useful on Linux
for xallocx.
Numbers with transparent huge pages enabled:
glibc (copies elided via MREMAP_MAYMOVE): 8.471s
jemalloc: 17.816s
jemalloc + no-op madvise: 13.236s
jemalloc + this commit: 6.787s
jemalloc + this commit + no-op madvise: 6.144s
Numbers with transparent huge pages disabled:
glibc (copies elided via MREMAP_MAYMOVE): 15.403s
jemalloc: 39.456s
jemalloc + no-op madvise: 12.768s
jemalloc + this commit: 15.534s
jemalloc + this commit + no-op madvise: 6.354s
Closes #137
2014-10-04 13:39:32 +08:00
|
|
|
|
2014-10-15 13:20:00 +08:00
|
|
|
if (arena_chunk_ralloc_huge_expand(arena, ptr, oldsize, usize,
|
2014-10-16 09:02:02 +08:00
|
|
|
&is_zeroed_chunk))
|
2014-10-15 13:20:00 +08:00
|
|
|
return (true);
|
Attempt to expand huge allocations in-place.
This adds support for expanding huge allocations in-place by requesting
memory at a specific address from the chunk allocator.
It's currently only implemented for the chunk recycling path, although
in theory it could also be done by optimistically allocating new chunks.
On Linux, it could attempt an in-place mremap. However, that won't work
in practice since the heap is grown downwards and memory is not unmapped
(in a normal build, at least).
Repeated vector reallocation micro-benchmark:
#include <string.h>
#include <stdlib.h>
int main(void) {
for (size_t i = 0; i < 100; i++) {
void *ptr = NULL;
size_t old_size = 0;
for (size_t size = 4; size < (1 << 30); size *= 2) {
ptr = realloc(ptr, size);
if (!ptr) return 1;
memset(ptr + old_size, 0xff, size - old_size);
old_size = size;
}
free(ptr);
}
}
The glibc allocator fails to do any in-place reallocations on this
benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it
elides the cost of copies via mremap, which is currently not something
that jemalloc can use.
With this improvement, jemalloc still fails to do any in-place huge
reallocations for the first outer loop, but then succeeds 100% of the
time for the remaining 99 iterations. The time spent doing allocations
and copies drops down to under 5%, with nearly all of it spent doing
purging + faulting (when huge pages are disabled) and the array memset.
An improved mremap API (MREMAP_RETAIN - #138) would be far more general
but this is a portable optimization and would still be useful on Linux
for xallocx.
Numbers with transparent huge pages enabled:
glibc (copies elided via MREMAP_MAYMOVE): 8.471s
jemalloc: 17.816s
jemalloc + no-op madvise: 13.236s
jemalloc + this commit: 6.787s
jemalloc + this commit + no-op madvise: 6.144s
Numbers with transparent huge pages disabled:
glibc (copies elided via MREMAP_MAYMOVE): 15.403s
jemalloc: 39.456s
jemalloc + no-op madvise: 12.768s
jemalloc + this commit: 15.534s
jemalloc + this commit + no-op madvise: 6.354s
Closes #137
2014-10-04 13:39:32 +08:00
|
|
|
|
Move centralized chunk management into arenas.
Migrate all centralized data structures related to huge allocations and
recyclable chunks into arena_t, so that each arena can manage huge
allocations and recyclable virtual memory completely independently of
other arenas.
Add chunk node caching to arenas, in order to avoid contention on the
base allocator.
Use chunks_rtree to look up huge allocations rather than a red-black
tree. Maintain a per arena unsorted list of huge allocations (which
will be needed to enumerate huge allocations during arena reset).
Remove the --enable-ivsalloc option, make ivsalloc() always available,
and use it for size queries if --enable-debug is enabled. The only
practical implications to this removal are that 1) ivsalloc() is now
always available during live debugging (and the underlying radix tree is
available during core-based debugging), and 2) size query validation can
no longer be enabled independent of --enable-debug.
Remove the stats.chunks.{current,total,high} mallctls, and replace their
underlying statistics with simpler atomically updated counters used
exclusively for gdump triggering. These statistics are no longer very
useful because each arena manages chunks independently, and per arena
statistics provide similar information.
Simplify chunk synchronization code, now that base chunk allocation
cannot cause recursive lock acquisition.
2015-02-12 04:24:27 +08:00
|
|
|
malloc_mutex_lock(&arena->huge_mtx);
|
Attempt to expand huge allocations in-place.
This adds support for expanding huge allocations in-place by requesting
memory at a specific address from the chunk allocator.
It's currently only implemented for the chunk recycling path, although
in theory it could also be done by optimistically allocating new chunks.
On Linux, it could attempt an in-place mremap. However, that won't work
in practice since the heap is grown downwards and memory is not unmapped
(in a normal build, at least).
Repeated vector reallocation micro-benchmark:
#include <string.h>
#include <stdlib.h>
int main(void) {
for (size_t i = 0; i < 100; i++) {
void *ptr = NULL;
size_t old_size = 0;
for (size_t size = 4; size < (1 << 30); size *= 2) {
ptr = realloc(ptr, size);
if (!ptr) return 1;
memset(ptr + old_size, 0xff, size - old_size);
old_size = size;
}
free(ptr);
}
}
The glibc allocator fails to do any in-place reallocations on this
benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it
elides the cost of copies via mremap, which is currently not something
that jemalloc can use.
With this improvement, jemalloc still fails to do any in-place huge
reallocations for the first outer loop, but then succeeds 100% of the
time for the remaining 99 iterations. The time spent doing allocations
and copies drops down to under 5%, with nearly all of it spent doing
purging + faulting (when huge pages are disabled) and the array memset.
An improved mremap API (MREMAP_RETAIN - #138) would be far more general
but this is a portable optimization and would still be useful on Linux
for xallocx.
Numbers with transparent huge pages enabled:
glibc (copies elided via MREMAP_MAYMOVE): 8.471s
jemalloc: 17.816s
jemalloc + no-op madvise: 13.236s
jemalloc + this commit: 6.787s
jemalloc + this commit + no-op madvise: 6.144s
Numbers with transparent huge pages disabled:
glibc (copies elided via MREMAP_MAYMOVE): 15.403s
jemalloc: 39.456s
jemalloc + no-op madvise: 12.768s
jemalloc + this commit: 15.534s
jemalloc + this commit + no-op madvise: 6.354s
Closes #137
2014-10-04 13:39:32 +08:00
|
|
|
/* Update the size of the huge allocation. */
|
2015-02-16 10:04:46 +08:00
|
|
|
extent_node_size_set(node, usize);
|
Move centralized chunk management into arenas.
Migrate all centralized data structures related to huge allocations and
recyclable chunks into arena_t, so that each arena can manage huge
allocations and recyclable virtual memory completely independently of
other arenas.
Add chunk node caching to arenas, in order to avoid contention on the
base allocator.
Use chunks_rtree to look up huge allocations rather than a red-black
tree. Maintain a per arena unsorted list of huge allocations (which
will be needed to enumerate huge allocations during arena reset).
Remove the --enable-ivsalloc option, make ivsalloc() always available,
and use it for size queries if --enable-debug is enabled. The only
practical implications to this removal are that 1) ivsalloc() is now
always available during live debugging (and the underlying radix tree is
available during core-based debugging), and 2) size query validation can
no longer be enabled independent of --enable-debug.
Remove the stats.chunks.{current,total,high} mallctls, and replace their
underlying statistics with simpler atomically updated counters used
exclusively for gdump triggering. These statistics are no longer very
useful because each arena manages chunks independently, and per arena
statistics provide similar information.
Simplify chunk synchronization code, now that base chunk allocation
cannot cause recursive lock acquisition.
2015-02-12 04:24:27 +08:00
|
|
|
malloc_mutex_unlock(&arena->huge_mtx);
|

	if (zero || (config_fill && unlikely(opt_zero))) {
		if (!is_zeroed_subchunk) {
			memset((void *)((uintptr_t)ptr + oldsize), 0,
			    CHUNK_CEILING(oldsize) - oldsize);
		}
		if (!is_zeroed_chunk) {
			memset((void *)((uintptr_t)ptr +
			    CHUNK_CEILING(oldsize)), 0, usize -
			    CHUNK_CEILING(oldsize));
		}
	} else if (config_fill && unlikely(opt_junk_alloc)) {
		memset((void *)((uintptr_t)ptr + oldsize), 0xa5, usize -
		    oldsize);
	}

	return (false);
}

Implement in-place huge allocation shrinking.
Trivial example:
    #include <stdlib.h>
    int main(void) {
        void *ptr = malloc(1024 * 1024 * 8);
        if (!ptr) return 1;
        ptr = realloc(ptr, 1024 * 1024 * 4);
        if (!ptr) return 1;
    }
Before:
    mmap(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcfff000000
    mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcffec00000
    madvise(0x7fcfff000000, 8388608, MADV_DONTNEED) = 0
After:
    mmap(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1934800000
    madvise(0x7f1934c00000, 4194304, MADV_DONTNEED) = 0
Closes #134

bool
huge_ralloc_no_move(void *ptr, size_t oldsize, size_t size, size_t extra,
    bool zero)
{
	size_t usize;

	/* Both allocations must be huge to avoid a move. */
	if (oldsize < chunksize)
		return (true);

	assert(s2u(oldsize) == oldsize);
	usize = s2u(size);
	if (usize == 0) {
		/* size_t overflow. */
		return (true);
	}

	/*
	 * Avoid moving the allocation if the existing chunk size accommodates
	 * the new size.
	 */
	if (CHUNK_CEILING(oldsize) >= CHUNK_CEILING(usize)
	    && CHUNK_CEILING(oldsize) <= CHUNK_CEILING(s2u(size+extra))) {
		huge_ralloc_no_move_similar(ptr, oldsize, usize, size, extra,
		    zero);
		return (false);
	}

	if (!maps_coalesce)
		return (true);

	/* Shrink the allocation in-place. */
	if (CHUNK_CEILING(oldsize) >= CHUNK_CEILING(usize)) {
		huge_ralloc_no_move_shrink(ptr, oldsize, usize);
		return (false);
	}

	/* Attempt to expand the allocation in-place. */
	if (huge_ralloc_no_move_expand(ptr, oldsize, size + extra, zero)) {
		if (extra == 0)
			return (true);

		/* Try again, this time without extra. */
		return (huge_ralloc_no_move_expand(ptr, oldsize, size, zero));
	}
	return (false);
}
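
The shrink half of this function is the in-place shrinking path described in the
notes above (Closes #134); a minimal caller-level sketch, again assuming the
stable xallocx() entry point and illustrative sizes:
    #include <jemalloc/jemalloc.h>
    void shrink_in_place(void) {
        void *p = mallocx(8 * 1024 * 1024, 0);
        if (!p) return;
        /*
         * xallocx() never moves p; when a huge allocation shrinks, the code
         * above releases the unused tail (the madvise(MADV_DONTNEED) seen in
         * the strace comparison) rather than mapping a new region.
         */
        size_t usize = xallocx(p, 4 * 1024 * 1024, 0, 0);
        (void)usize;
        dallocx(p, 0);
    }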

Add {,r,s,d}allocm().
Add allocm(), rallocm(), sallocm(), and dallocm(), which are a
functional superset of malloc(), calloc(), posix_memalign(),
malloc_usable_size(), and free().
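
The experimental *allocm() functions named above were later superseded by the
non-experimental *allocx() family; a small sketch of the modern equivalents
(illustrative sizes and flags), which for chunk-sized requests end up in the
huge_* functions in this file:
    #include <jemalloc/jemalloc.h>
    void allocx_roundtrip(void) {
        /* Aligned, zeroed allocation large enough for the huge path. */
        void *p = mallocx(8 * 1024 * 1024,
            MALLOCX_ALIGN(2 * 1024 * 1024) | MALLOCX_ZERO);
        if (!p) return;
        size_t usize = sallocx(p, 0);              /* Size query. */
        void *q = rallocx(p, 16 * 1024 * 1024, 0); /* May move the allocation. */
        if (q != NULL)
            p = q;
        (void)usize;
        dallocx(p, 0);                             /* Deallocation. */
    }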

void *
huge_ralloc(tsd_t *tsd, arena_t *arena, void *ptr, size_t oldsize, size_t size,
    size_t extra, size_t alignment, bool zero, tcache_t *tcache)
{
	void *ret;
	size_t copysize;

	/* Try to avoid moving the allocation. */
	if (!huge_ralloc_no_move(ptr, oldsize, size, extra, zero))
		return (ptr);

	/*
	 * size and oldsize are different enough that we need to use a
	 * different size class. In that case, fall back to allocating new
	 * space and copying.
	 */
	if (alignment > chunksize) {
		ret = huge_palloc(tsd, arena, size + extra, alignment, zero,
		    tcache);
	} else
		ret = huge_malloc(tsd, arena, size + extra, zero, tcache);

	if (ret == NULL) {
		if (extra == 0)
			return (NULL);
		/* Try again, this time without extra. */
		if (alignment > chunksize) {
			ret = huge_palloc(tsd, arena, size, alignment, zero,
			    tcache);
		} else
			ret = huge_malloc(tsd, arena, size, zero, tcache);

		if (ret == NULL)
			return (NULL);
	}

	/*
	 * Copy at most size bytes (not size+extra), since the caller has no
	 * expectation that the extra bytes will be reliably preserved.
	 */
	copysize = (size < oldsize) ? size : oldsize;
	memcpy(ret, ptr, copysize);
	isqalloc(tsd, ptr, oldsize, tcache);
	return (ret);
}
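
A hypothetical internal wrapper to make the calling convention explicit (the
wrapper name is invented for illustration): when huge_ralloc() returns NULL the
old pointer is untouched and still owned by the caller; on success it has either
been resized in place or freed after the copy, and any extra slack is
best-effort only, since the fallback above copies at most size bytes.
    static void *
    huge_grow(tsd_t *tsd, arena_t *arena, void *ptr, size_t oldsize,
        size_t newsize)
    {
        /* zero = false, no extra slack, default chunk alignment, no tcache. */
        void *ret = huge_ralloc(tsd, arena, ptr, oldsize, newsize, 0,
            chunksize, false, NULL);

        if (ret == NULL)
            return (NULL);  /* Reallocation failed; ptr remains valid. */
        return (ret);       /* ptr resized in place (ret == ptr) or moved. */
    }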

void
huge_dalloc(tsd_t *tsd, void *ptr, tcache_t *tcache)
{
	extent_node_t *node;
	arena_t *arena;

	node = huge_node_get(ptr);
	arena = extent_node_arena_get(node);
	huge_node_unset(ptr, node);
	malloc_mutex_lock(&arena->huge_mtx);
	ql_remove(&arena->huge, node, ql_link);
	malloc_mutex_unlock(&arena->huge_mtx);

	huge_dalloc_junk(extent_node_addr_get(node),
	    extent_node_size_get(node));
	arena_chunk_dalloc_huge(extent_node_arena_get(node),
	    extent_node_addr_get(node), extent_node_size_get(node));
	idalloctm(tsd, node, tcache, true);
}

arena_t *
huge_aalloc(const void *ptr)
{

	return (extent_node_arena_get(huge_node_get(ptr)));
}

size_t
huge_salloc(const void *ptr)
{
	size_t size;
	extent_node_t *node;
	arena_t *arena;

	node = huge_node_get(ptr);
	arena = extent_node_arena_get(node);
	malloc_mutex_lock(&arena->huge_mtx);
	size = extent_node_size_get(node);
	malloc_mutex_unlock(&arena->huge_mtx);

	return (size);
}

prof_tctx_t *
huge_prof_tctx_get(const void *ptr)
{
	prof_tctx_t *tctx;
	extent_node_t *node;
	arena_t *arena;

	node = huge_node_get(ptr);
	arena = extent_node_arena_get(node);
	malloc_mutex_lock(&arena->huge_mtx);
	tctx = extent_node_prof_tctx_get(node);
	malloc_mutex_unlock(&arena->huge_mtx);

	return (tctx);
}

void
huge_prof_tctx_set(const void *ptr, prof_tctx_t *tctx)
{
	extent_node_t *node;
	arena_t *arena;

	node = huge_node_get(ptr);
	arena = extent_node_arena_get(node);
	malloc_mutex_lock(&arena->huge_mtx);
	extent_node_prof_tctx_set(node, tctx);
	malloc_mutex_unlock(&arena->huge_mtx);
}