Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
/******************************************************************************/
|
|
|
|
#ifdef JEMALLOC_H_TYPES
|
|
|
|
|
|
|
|
/* Maximum bitmap bit count is 2^LG_BITMAP_MAXBITS. */
|
2016-05-30 09:34:50 +08:00
|
|
|
#define LG_BITMAP_MAXBITS LG_SLAB_MAXREGS
|
2014-09-29 05:43:11 +08:00
|
|
|
#define BITMAP_MAXBITS (ZU(1) << LG_BITMAP_MAXBITS)
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
|
|
|
|
typedef struct bitmap_level_s bitmap_level_t;
|
|
|
|
typedef struct bitmap_info_s bitmap_info_t;
|
|
|
|
typedef unsigned long bitmap_t;
|
|
|
|
#define LG_SIZEOF_BITMAP LG_SIZEOF_LONG
|
|
|
|
|
|
|
|
/* Number of bits per group. */
|
|
|
|
#define LG_BITMAP_GROUP_NBITS (LG_SIZEOF_BITMAP + 3)
|
2016-04-07 09:30:02 +08:00
|
|
|
#define BITMAP_GROUP_NBITS (1U << LG_BITMAP_GROUP_NBITS)
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
#define BITMAP_GROUP_NBITS_MASK (BITMAP_GROUP_NBITS-1)
|
|
|
|
|
2016-02-25 00:04:43 +08:00
|
|
|
/*
|
|
|
|
* Do some analysis on how big the bitmap is before we use a tree. For a brute
|
2016-04-07 02:54:44 +08:00
|
|
|
* force linear search, if we would have to call ffs_lu() more than 2^3 times,
|
|
|
|
* use a tree instead.
|
2016-02-25 00:04:43 +08:00
|
|
|
*/
|
2016-02-27 06:43:39 +08:00
|
|
|
#if LG_BITMAP_MAXBITS - LG_BITMAP_GROUP_NBITS > 3
|
2016-04-07 09:30:02 +08:00
|
|
|
# define BITMAP_USE_TREE
|
2016-02-25 00:04:43 +08:00
|
|
|
#endif
|
|
|
|
|
2014-09-29 05:43:11 +08:00
|
|
|
/* Number of groups required to store a given number of bits. */
|
|
|
|
#define BITMAP_BITS2GROUPS(nbits) \
|
2016-04-07 09:30:02 +08:00
|
|
|
(((nbits) + BITMAP_GROUP_NBITS_MASK) >> LG_BITMAP_GROUP_NBITS)
|
2014-09-29 05:43:11 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Number of groups required at a particular level for a given number of bits.
|
|
|
|
*/
|
|
|
|
#define BITMAP_GROUPS_L0(nbits) \
|
|
|
|
BITMAP_BITS2GROUPS(nbits)
|
|
|
|
#define BITMAP_GROUPS_L1(nbits) \
|
|
|
|
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(nbits))
|
|
|
|
#define BITMAP_GROUPS_L2(nbits) \
|
|
|
|
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS((nbits))))
|
|
|
|
#define BITMAP_GROUPS_L3(nbits) \
|
|
|
|
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS( \
|
|
|
|
BITMAP_BITS2GROUPS((nbits)))))
|
2016-04-07 09:30:02 +08:00
|
|
|
#define BITMAP_GROUPS_L4(nbits) \
|
|
|
|
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS( \
|
|
|
|
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS((nbits))))))
|
2014-09-29 05:43:11 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Assuming the number of levels, number of groups required for a given number
|
|
|
|
* of bits.
|
|
|
|
*/
|
|
|
|
#define BITMAP_GROUPS_1_LEVEL(nbits) \
|
|
|
|
BITMAP_GROUPS_L0(nbits)
|
|
|
|
#define BITMAP_GROUPS_2_LEVEL(nbits) \
|
|
|
|
(BITMAP_GROUPS_1_LEVEL(nbits) + BITMAP_GROUPS_L1(nbits))
|
|
|
|
#define BITMAP_GROUPS_3_LEVEL(nbits) \
|
|
|
|
(BITMAP_GROUPS_2_LEVEL(nbits) + BITMAP_GROUPS_L2(nbits))
|
|
|
|
#define BITMAP_GROUPS_4_LEVEL(nbits) \
|
|
|
|
(BITMAP_GROUPS_3_LEVEL(nbits) + BITMAP_GROUPS_L3(nbits))
|
2016-04-07 09:30:02 +08:00
|
|
|
#define BITMAP_GROUPS_5_LEVEL(nbits) \
|
|
|
|
(BITMAP_GROUPS_4_LEVEL(nbits) + BITMAP_GROUPS_L4(nbits))
|
2014-09-29 05:43:11 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Maximum number of groups required to support LG_BITMAP_MAXBITS.
|
|
|
|
*/
|
2016-04-07 09:30:02 +08:00
|
|
|
#ifdef BITMAP_USE_TREE
|
2016-02-25 00:04:43 +08:00
|
|
|
|
2014-09-29 05:43:11 +08:00
|
|
|
#if LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS
|
|
|
|
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_1_LEVEL(BITMAP_MAXBITS)
|
|
|
|
#elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 2
|
|
|
|
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_2_LEVEL(BITMAP_MAXBITS)
|
|
|
|
#elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 3
|
|
|
|
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_3_LEVEL(BITMAP_MAXBITS)
|
|
|
|
#elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 4
|
|
|
|
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_4_LEVEL(BITMAP_MAXBITS)
|
2016-04-07 09:30:02 +08:00
|
|
|
#elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 5
|
|
|
|
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_5_LEVEL(BITMAP_MAXBITS)
|
2014-09-29 05:43:11 +08:00
|
|
|
#else
|
|
|
|
# error "Unsupported bitmap size"
|
|
|
|
#endif
|
|
|
|
|
2016-04-07 09:30:02 +08:00
|
|
|
/*
|
|
|
|
* Maximum number of levels possible. This could be statically computed based
|
|
|
|
* on LG_BITMAP_MAXBITS:
|
|
|
|
*
|
|
|
|
* #define BITMAP_MAX_LEVELS \
|
|
|
|
* (LG_BITMAP_MAXBITS / LG_SIZEOF_BITMAP) \
|
|
|
|
* + !!(LG_BITMAP_MAXBITS % LG_SIZEOF_BITMAP)
|
|
|
|
*
|
|
|
|
* However, that would not allow the generic BITMAP_INFO_INITIALIZER() macro, so
|
|
|
|
* instead hardcode BITMAP_MAX_LEVELS to the largest number supported by the
|
|
|
|
* various cascading macros. The only additional cost this incurs is some
|
|
|
|
* unused trailing entries in bitmap_info_t structures; the bitmaps themselves
|
|
|
|
* are not impacted.
|
|
|
|
*/
|
|
|
|
#define BITMAP_MAX_LEVELS 5
|
|
|
|
|
|
|
|
#define BITMAP_INFO_INITIALIZER(nbits) { \
|
|
|
|
/* nbits. */ \
|
|
|
|
nbits, \
|
|
|
|
/* nlevels. */ \
|
|
|
|
(BITMAP_GROUPS_L0(nbits) > BITMAP_GROUPS_L1(nbits)) + \
|
|
|
|
(BITMAP_GROUPS_L1(nbits) > BITMAP_GROUPS_L2(nbits)) + \
|
|
|
|
(BITMAP_GROUPS_L2(nbits) > BITMAP_GROUPS_L3(nbits)) + \
|
|
|
|
(BITMAP_GROUPS_L3(nbits) > BITMAP_GROUPS_L4(nbits)) + 1, \
|
|
|
|
/* levels. */ \
|
|
|
|
{ \
|
|
|
|
{0}, \
|
|
|
|
{BITMAP_GROUPS_L0(nbits)}, \
|
|
|
|
{BITMAP_GROUPS_L1(nbits) + BITMAP_GROUPS_L0(nbits)}, \
|
|
|
|
{BITMAP_GROUPS_L2(nbits) + BITMAP_GROUPS_L1(nbits) + \
|
|
|
|
BITMAP_GROUPS_L0(nbits)}, \
|
|
|
|
{BITMAP_GROUPS_L3(nbits) + BITMAP_GROUPS_L2(nbits) + \
|
|
|
|
BITMAP_GROUPS_L1(nbits) + BITMAP_GROUPS_L0(nbits)}, \
|
|
|
|
{BITMAP_GROUPS_L4(nbits) + BITMAP_GROUPS_L3(nbits) + \
|
|
|
|
BITMAP_GROUPS_L2(nbits) + BITMAP_GROUPS_L1(nbits) \
|
|
|
|
+ BITMAP_GROUPS_L0(nbits)} \
|
|
|
|
} \
|
|
|
|
}
|
|
|
|
|
|
|
|
#else /* BITMAP_USE_TREE */
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
|
2016-04-07 09:30:02 +08:00
|
|
|
#define BITMAP_GROUPS_MAX BITMAP_BITS2GROUPS(BITMAP_MAXBITS)
|
2016-02-25 00:04:43 +08:00
|
|
|
|
2016-04-07 09:30:02 +08:00
|
|
|
#define BITMAP_INFO_INITIALIZER(nbits) { \
|
|
|
|
/* nbits. */ \
|
|
|
|
nbits, \
|
|
|
|
/* ngroups. */ \
|
|
|
|
BITMAP_BITS2GROUPS(nbits) \
|
|
|
|
}
|
2016-02-25 00:04:43 +08:00
|
|
|
|
2016-04-07 09:30:02 +08:00
|
|
|
#endif /* BITMAP_USE_TREE */
|
2016-02-25 00:04:43 +08:00
|
|
|
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
#endif /* JEMALLOC_H_TYPES */
|
|
|
|
/******************************************************************************/
|
|
|
|
#ifdef JEMALLOC_H_STRUCTS
|
|
|
|
|
|
|
|
struct bitmap_level_s {
|
|
|
|
/* Offset of this level's groups within the array of groups. */
|
|
|
|
size_t group_offset;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct bitmap_info_s {
|
|
|
|
/* Logical number of bits in bitmap (stored at bottom level). */
|
|
|
|
size_t nbits;
|
|
|
|
|
2016-04-07 09:30:02 +08:00
|
|
|
#ifdef BITMAP_USE_TREE
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
/* Number of levels necessary for nbits. */
|
|
|
|
unsigned nlevels;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Only the first (nlevels+1) elements are used, and levels are ordered
|
|
|
|
* bottom to top (e.g. the bottom level is stored in levels[0]).
|
|
|
|
*/
|
|
|
|
bitmap_level_t levels[BITMAP_MAX_LEVELS+1];
|
2016-04-07 09:30:02 +08:00
|
|
|
#else /* BITMAP_USE_TREE */
|
2016-02-25 00:04:43 +08:00
|
|
|
/* Number of groups necessary for nbits. */
|
|
|
|
size_t ngroups;
|
2016-04-07 09:30:02 +08:00
|
|
|
#endif /* BITMAP_USE_TREE */
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
#endif /* JEMALLOC_H_STRUCTS */
|
|
|
|
/******************************************************************************/
|
|
|
|
#ifdef JEMALLOC_H_EXTERNS
|
|
|
|
|
|
|
|
void bitmap_info_init(bitmap_info_t *binfo, size_t nbits);
|
|
|
|
void bitmap_init(bitmap_t *bitmap, const bitmap_info_t *binfo);
|
2016-02-27 05:59:41 +08:00
|
|
|
size_t bitmap_size(const bitmap_info_t *binfo);
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
|
|
|
|
#endif /* JEMALLOC_H_EXTERNS */
|
|
|
|
/******************************************************************************/
|
|
|
|
#ifdef JEMALLOC_H_INLINES
|
|
|
|
|
|
|
|
#ifndef JEMALLOC_ENABLE_INLINE
|
|
|
|
bool bitmap_full(bitmap_t *bitmap, const bitmap_info_t *binfo);
|
|
|
|
bool bitmap_get(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit);
|
|
|
|
void bitmap_set(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit);
|
|
|
|
size_t bitmap_sfu(bitmap_t *bitmap, const bitmap_info_t *binfo);
|
|
|
|
void bitmap_unset(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit);
|
|
|
|
#endif
|
|
|
|
|
|
|
|
#if (defined(JEMALLOC_ENABLE_INLINE) || defined(JEMALLOC_BITMAP_C_))
|
|
|
|
JEMALLOC_INLINE bool
|
|
|
|
bitmap_full(bitmap_t *bitmap, const bitmap_info_t *binfo)
|
|
|
|
{
|
2016-04-07 09:30:02 +08:00
|
|
|
#ifdef BITMAP_USE_TREE
|
2016-02-25 04:42:23 +08:00
|
|
|
size_t rgoff = binfo->levels[binfo->nlevels].group_offset - 1;
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
bitmap_t rg = bitmap[rgoff];
|
|
|
|
/* The bitmap is full iff the root group is 0. */
|
|
|
|
return (rg == 0);
|
2016-02-25 00:04:43 +08:00
|
|
|
#else
|
|
|
|
size_t i;
|
|
|
|
|
|
|
|
for (i = 0; i < binfo->ngroups; i++) {
|
|
|
|
if (bitmap[i] != 0)
|
|
|
|
return (false);
|
|
|
|
}
|
|
|
|
return (true);
|
|
|
|
#endif
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
JEMALLOC_INLINE bool
|
|
|
|
bitmap_get(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit)
|
|
|
|
{
|
|
|
|
size_t goff;
|
|
|
|
bitmap_t g;
|
|
|
|
|
|
|
|
assert(bit < binfo->nbits);
|
|
|
|
goff = bit >> LG_BITMAP_GROUP_NBITS;
|
|
|
|
g = bitmap[goff];
|
2016-02-27 05:59:41 +08:00
|
|
|
return (!(g & (ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK))));
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
JEMALLOC_INLINE void
|
|
|
|
bitmap_set(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit)
|
|
|
|
{
|
|
|
|
size_t goff;
|
|
|
|
bitmap_t *gp;
|
|
|
|
bitmap_t g;
|
|
|
|
|
|
|
|
assert(bit < binfo->nbits);
|
2014-10-04 01:16:09 +08:00
|
|
|
assert(!bitmap_get(bitmap, binfo, bit));
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
goff = bit >> LG_BITMAP_GROUP_NBITS;
|
|
|
|
gp = &bitmap[goff];
|
|
|
|
g = *gp;
|
2016-02-27 05:59:41 +08:00
|
|
|
assert(g & (ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK)));
|
|
|
|
g ^= ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK);
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
*gp = g;
|
|
|
|
assert(bitmap_get(bitmap, binfo, bit));
|
2016-04-07 09:30:02 +08:00
|
|
|
#ifdef BITMAP_USE_TREE
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
/* Propagate group state transitions up the tree. */
|
|
|
|
if (g == 0) {
|
|
|
|
unsigned i;
|
|
|
|
for (i = 1; i < binfo->nlevels; i++) {
|
|
|
|
bit = goff;
|
|
|
|
goff = bit >> LG_BITMAP_GROUP_NBITS;
|
|
|
|
gp = &bitmap[binfo->levels[i].group_offset + goff];
|
|
|
|
g = *gp;
|
2016-02-27 05:59:41 +08:00
|
|
|
assert(g & (ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK)));
|
|
|
|
g ^= ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK);
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
*gp = g;
|
|
|
|
if (g != 0)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
2016-02-25 00:04:43 +08:00
|
|
|
#endif
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* sfu: set first unset. */
|
|
|
|
JEMALLOC_INLINE size_t
|
|
|
|
bitmap_sfu(bitmap_t *bitmap, const bitmap_info_t *binfo)
|
|
|
|
{
|
|
|
|
size_t bit;
|
|
|
|
bitmap_t g;
|
|
|
|
unsigned i;
|
|
|
|
|
2014-10-04 01:16:09 +08:00
|
|
|
assert(!bitmap_full(bitmap, binfo));
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
|
2016-04-07 09:30:02 +08:00
|
|
|
#ifdef BITMAP_USE_TREE
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
i = binfo->nlevels - 1;
|
|
|
|
g = bitmap[binfo->levels[i].group_offset];
|
2016-02-25 02:32:45 +08:00
|
|
|
bit = ffs_lu(g) - 1;
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
while (i > 0) {
|
|
|
|
i--;
|
|
|
|
g = bitmap[binfo->levels[i].group_offset + bit];
|
2016-02-25 02:32:45 +08:00
|
|
|
bit = (bit << LG_BITMAP_GROUP_NBITS) + (ffs_lu(g) - 1);
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
}
|
2016-02-25 00:04:43 +08:00
|
|
|
#else
|
|
|
|
i = 0;
|
|
|
|
g = bitmap[0];
|
|
|
|
while ((bit = ffs_lu(g)) == 0) {
|
|
|
|
i++;
|
|
|
|
g = bitmap[i];
|
|
|
|
}
|
2016-04-07 01:38:47 +08:00
|
|
|
bit = (i << LG_BITMAP_GROUP_NBITS) + (bit - 1);
|
2016-02-25 00:04:43 +08:00
|
|
|
#endif
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
bitmap_set(bitmap, binfo, bit);
|
|
|
|
return (bit);
|
|
|
|
}
|
|
|
|
|
|
|
|
JEMALLOC_INLINE void
|
|
|
|
bitmap_unset(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit)
|
|
|
|
{
|
|
|
|
size_t goff;
|
|
|
|
bitmap_t *gp;
|
|
|
|
bitmap_t g;
|
2016-02-25 00:04:43 +08:00
|
|
|
UNUSED bool propagate;
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
|
|
|
|
assert(bit < binfo->nbits);
|
|
|
|
assert(bitmap_get(bitmap, binfo, bit));
|
|
|
|
goff = bit >> LG_BITMAP_GROUP_NBITS;
|
|
|
|
gp = &bitmap[goff];
|
|
|
|
g = *gp;
|
|
|
|
propagate = (g == 0);
|
2016-02-27 05:59:41 +08:00
|
|
|
assert((g & (ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK))) == 0);
|
|
|
|
g ^= ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK);
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
*gp = g;
|
2014-10-04 01:16:09 +08:00
|
|
|
assert(!bitmap_get(bitmap, binfo, bit));
|
2016-04-07 09:30:02 +08:00
|
|
|
#ifdef BITMAP_USE_TREE
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
/* Propagate group state transitions up the tree. */
|
|
|
|
if (propagate) {
|
|
|
|
unsigned i;
|
|
|
|
for (i = 1; i < binfo->nlevels; i++) {
|
|
|
|
bit = goff;
|
|
|
|
goff = bit >> LG_BITMAP_GROUP_NBITS;
|
|
|
|
gp = &bitmap[binfo->levels[i].group_offset + goff];
|
|
|
|
g = *gp;
|
|
|
|
propagate = (g == 0);
|
2016-02-27 05:59:41 +08:00
|
|
|
assert((g & (ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK)))
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
== 0);
|
2016-02-27 05:59:41 +08:00
|
|
|
g ^= ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK);
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
*gp = g;
|
2014-10-04 01:16:09 +08:00
|
|
|
if (!propagate)
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
2016-04-07 09:30:02 +08:00
|
|
|
#endif /* BITMAP_USE_TREE */
|
Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
are available. Furthermore, revert to preferring the
lowest available region (as jemalloc did with its old
bitmap-based approach).
- tcache_t: Move read-only tcache_bin_t metadata into
tcache_bin_info_t, and add a contiguous array of pointers
to tcache_t in order to track cached objects. This
substantially increases the size of tcache_t, but results
in much higher data locality for common tcache operations.
As a side benefit, it is again possible to efficiently
flush the least recently used cached objects, so this
change changes flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
2011-03-17 01:30:13 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
#endif
|
|
|
|
|
|
|
|
#endif /* JEMALLOC_H_INLINES */
|
|
|
|
/******************************************************************************/
|