Update ChangeLog.
parent 04ca7580db
commit 54673fd8d7
151
ChangeLog
@@ -5,6 +5,155 @@ found in the git revision history:
 
     https://github.com/jemalloc/jemalloc
 
+* 4.0.0 (XXX) See https://github.com/jemalloc/jemalloc/milestones/4.0.0 for
+  remaining work.
+
+  This version contains many speed and space optimizations, both minor and
+  major. The major themes are generalization, unification, and simplification.
+  Although many of these optimizations cause no visible behavior change, their
+  cumulative effect is substantial.
+
+  New features:
+  - Normalize size class spacing to be consistent across the complete size
+    range. By default there are four size classes per size doubling, but this
+    is now configurable via the --with-lg-size-class-group option. Also add the
+    --with-lg-page, --with-lg-page-sizes, --with-lg-quantum, and
+    --with-lg-tiny-min options, which can be used to tweak page and size class
+    settings. Impacts:
+    + Worst case performance for incrementally growing/shrinking reallocation
+      is improved because there are far fewer size classes, and therefore
+      copying happens less often.
+    + Internal fragmentation is limited to 20% for all but the smallest size
+      classes (those less than four times the quantum). (1B + 4 KiB)
+      and (1B + 4 MiB) previously suffered nearly 50% internal fragmentation.
+    + Chunk fragmentation tends to be lower because there are fewer distinct run
+      sizes to pack.
+  - Add support for explicit tcaches. The "tcache.create", "tcache.flush", and
+    "tcache.destroy" mallctls control tcache lifetime and flushing, and the
+    MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to the *allocx() API
+    control which tcache is used for each operation.
+  - Implement per thread heap profiling, as well as the ability to
+    enable/disable heap profiling on a per thread basis. Add the "prof.reset",
+    "prof.lg_sample", "thread.prof.name", "thread.prof.active",
+    "opt.prof_thread_active_init", and "prof.thread_active_init" mallctls.
+  - Add support for per arena application-specified chunk allocators, configured
+    via the "arena.<i>.chunk.alloc" and "arena.<i>.chunk.dalloc" mallctls.
+  - Refactor huge allocation to be managed by arenas, so that arenas now
+    function as general purpose independent allocators. This is important in
+    the context of user-specified chunk allocators, aside from the scalability
+    benefits. Related new statistics:
+    + The "stats.arenas.<i>.huge.allocated", "stats.arenas.<i>.huge.nmalloc",
+      "stats.arenas.<i>.huge.ndalloc", and "stats.arenas.<i>.huge.nrequests"
+      mallctls provide high level per arena huge allocation statistics.
+    + The "arenas.nhchunks", "arenas.hchunks.<i>.size",
+      "stats.arenas.<i>.hchunks.<j>.nmalloc",
+      "stats.arenas.<i>.hchunks.<j>.ndalloc",
+      "stats.arenas.<i>.hchunks.<j>.nrequests", and
+      "stats.arenas.<i>.hchunks.<j>.curhchunks" mallctls provide per size class
+      statistics.
+  - Add the 'util' column to malloc_stats_print() output, which reports the
+    proportion of available regions that are currently in use for each small
+    size class.
+  - Add "alloc" and "free" modes for junk filling (see the "opt.junk"
+    mallctl), so that it is possible to separately enable junk filling for
+    allocation versus deallocation.
+  - Add the jemalloc-config script, which provides information about how
+    jemalloc was configured, and how to integrate it into application builds.
+  - Add metadata statistics, which are accessible via the "stats.metadata",
+    "stats.arenas.<i>.metadata.mapped", and
+    "stats.arenas.<i>.metadata.allocated" mallctls.
+  - Add the "prof.gdump" mallctl, which makes it possible to toggle the gdump
+    feature on/off during program execution.
+  - Add sdallocx(), which implements sized deallocation. The primary
+    optimization over dallocx() is the removal of a metadata read, which often
+    suffers an L1 cache miss.
+  - Add missing header includes in jemalloc/jemalloc.h, so that applications
+    only have to #include <jemalloc/jemalloc.h>.
+  - Add support for additional platforms:
+    + Bitrig
+    + Cygwin
+    + DragonFlyBSD
+    + iOS
+    + OpenBSD
+    + OpenRISC/or1k
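The size class spacing described in the first "New features" entry can be illustrated with a small sketch. It assumes the default of four size classes per size doubling and ignores the tiny and quantum-spaced classes, the page size, and every configure option. The names `size_class_round_up`, `lg_floor_sz`, and `LG_SIZE_CLASS_GROUP` are hypothetical and are not taken from the jemalloc sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed default: 2^2 = 4 size classes per size doubling. */
#define LG_SIZE_CLASS_GROUP 2

/* floor(log2(x)) for x > 0, computed with a simple shift loop. */
static unsigned lg_floor_sz(size_t x) {
    unsigned lg = 0;
    assert(x != 0);
    while ((x >>= 1) != 0) {
        lg++;
    }
    return lg;
}

/*
 * Round a request up to the nearest size class, assuming every doubling
 * (2^k, 2^(k+1)] is split into four equally spaced classes. Only meaningful
 * above the smallest classes, hence the lower bound on size.
 */
static size_t size_class_round_up(size_t size) {
    assert(size > ((size_t)1 << LG_SIZE_CLASS_GROUP));
    assert(size <= SIZE_MAX / 2);
    /* The smallest power of two >= size is 2^lg_ceil. */
    unsigned lg_ceil = lg_floor_sz(size - 1) + 1;
    /* Spacing between classes within the doubling that ends at 2^lg_ceil. */
    size_t delta = (size_t)1 << (lg_ceil - 1 - LG_SIZE_CLASS_GROUP);
    /* delta is a power of two, so masking rounds up to a multiple of it. */
    return (size + delta - 1) & ~(delta - 1);
}
```

Under these assumptions a request of 4 KiB + 1 byte rounds up to 5 KiB, so just under 20% of the allocation is unused, which is consistent with the 20% bound stated in the entry.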
+
+  Optimizations:
+  - Switch run and chunk allocation from first-best-fit (among best-fit
+    candidates, choose the lowest in memory) to first-fit (among all candidates,
+    choose the lowest in memory). This tends to reduce chunk and virtual memory
+    fragmentation, respectively.
+  - Maintain dirty runs in per arena LRUs rather than in per arena trees of
+    dirty-run-containing chunks. In practice this change significantly reduces
+    dirty page purging volume.
+  - Integrate whole chunks into the unused dirty page purging machinery. This
+    reduces the cost of repeated huge allocation/deallocation, because it
+    effectively introduces a cache of chunks.
+  - Split the arena chunk map into two separate arrays, in order to increase
+    cache locality for the frequently accessed bits.
+  - Move small run metadata out of runs, into arena chunk headers. This reduces
+    run fragmentation, smaller runs reduce external fragmentation for small size
+    classes, and packed (less uniformly aligned) metadata layout improves CPU
+    cache set distribution.
+  - Micro-optimize the fast paths for the public API functions.
+  - Refactor thread-specific data to reside in a single structure. This assures
+    that only a single TLS read is necessary per call into the public API.
+  - Implement in-place huge allocation growing and shrinking.
+  - Refactor rtree (radix tree for chunk lookups) to be lock-free, and make
+    additional optimizations that reduce maximum lookup depth to one or two
+    levels. This resolves what was a concurrency bottleneck for per arena huge
+    allocation, because a global data structure is critical for determining
+    which arenas own which huge allocations.
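The difference between the two placement policies named in the first "Optimizations" entry can be sketched as follows. This is an illustration only: it scans a flat array, whereas the allocator keeps its free runs and chunks in its own data structures, and `free_run_t`, `pick_first_best_fit`, `pick_first_fit`, and `sample_runs` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one free run: start address and length. */
typedef struct {
    uintptr_t addr;
    size_t size;
} free_run_t;

/* First-best-fit: smallest sufficient run; ties go to the lowest address. */
static int pick_first_best_fit(const free_run_t *runs, int n, size_t want) {
    int best = -1;
    assert(runs != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        if (runs[i].size < want) {
            continue; /* too small to satisfy the request */
        }
        if (best < 0 || runs[i].size < runs[best].size ||
            (runs[i].size == runs[best].size &&
             runs[i].addr < runs[best].addr)) {
            best = i;
        }
    }
    return best; /* index of the chosen run, or -1 if none fits */
}

/* First-fit: lowest-addressed run that is large enough, whatever its size. */
static int pick_first_fit(const free_run_t *runs, int n, size_t want) {
    int best = -1;
    assert(runs != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        if (runs[i].size >= want &&
            (best < 0 || runs[i].addr < runs[best].addr)) {
            best = i;
        }
    }
    return best; /* index of the chosen run, or -1 if none fits */
}

/* Sample free runs used to contrast the two policies. */
static const free_run_t sample_runs[] = {
    {0x1000, 8}, {0x5000, 4}, {0x3000, 4}, {0x0800, 2},
};
```

For a request of size 4 against `sample_runs`, first-best-fit picks the exact-size run at 0x3000, while first-fit picks the larger run at 0x1000 because it lies lower in memory.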
+
+  Incompatible changes:
+  - Replace --enable-cc-silence with --disable-cc-silence to suppress spurious
+    warnings by default.
+  - Assure that the constness of malloc_usable_size()'s return type matches that
+    of the system implementation.
+  - Change the heap profile dump format to support per thread heap profiling,
+    and enhance pprof with the --thread=<n> option. As a result, the bundled
+    pprof must now be used rather than the upstream (gperftools) pprof.
+  - Disable "opt.prof_final" by default, in order to avoid atexit(3), which can
+    internally deadlock on some platforms.
+  - Change the "arenas.nlruns" mallctl type from size_t to unsigned.
+  - Replace the "stats.arenas.<i>.bins.<j>.allocated" mallctl with
+    "stats.arenas.<i>.bins.<j>.curregs".
+  - Ignore MALLOC_CONF in set{uid,gid,cap} binaries.
+  - Ignore MALLOCX_ARENA(a) in dallocx(), in favor of using the
+    MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to control tcache usage.
+
+  Removed features:
+  - Remove the *allocm() API, which is superseded by the *allocx() API.
+  - Remove the --enable-dss options, and make dss non-optional on all platforms
+    which support sbrk(2).
+  - Remove the "arenas.purge" mallctl, which was obsoleted by the
+    "arena.<i>.purge" mallctl in 3.1.0.
+  - Remove the unnecessary "opt.valgrind" mallctl; jemalloc automatically
+    detects whether it is running inside Valgrind.
+  - Remove the "stats.huge.allocated", "stats.huge.nmalloc", and
+    "stats.huge.ndalloc" mallctls.
+  - Remove the --enable-mremap option.
+  - Remove the --enable-ivsalloc option, and merge its functionality into
+    --enable-debug.
+  - Remove the "stats.chunks.current", "stats.chunks.total", and
+    "stats.chunks.high" mallctls.
+
+  Bug fixes:
+  - Fix the cactive statistic to decrease (rather than increase) when active
+    memory decreases. This regression was first released in 3.5.0.
+  - Fix OOM handling in memalign() and valloc(). A variant of this bug existed
+    in all releases since 2.0.0, which introduced these functions.
+  - Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
+    "secondary" precedence is specified, but sbrk(2) is not supported.
+  - Fix fallback lg_floor() implementations to handle extremely large inputs.
+  - Ensure the default purgeable zone is after the default zone on OS X.
+  - Fix latent bugs in atomic_*().
+  - Fix the "arena.<i>.dss" mallctl to handle read-only calls.
+  - Fix tls_model configuration to enable the initial-exec model when possible.
+  - Mark malloc_conf as a weak symbol so that the application can override it.
+  - Correctly detect glibc's adaptive pthread mutexes.
+  - Fix the --without-export configure option.
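As an illustration of the lg_floor() entry above, here is a minimal sketch of a portable floor(log2(x)) that stays correct up to SIZE_MAX. It is not the jemalloc implementation and makes no claim about what the original bug was. It only shows one approach that avoids arithmetic which could overflow for very large inputs, by using nothing but right shifts. The name `lg_floor_portable` is hypothetical, and the code assumes the bit width of size_t is a power of two.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/*
 * floor(log2(x)) for x > 0 via binary search on the position of the highest
 * set bit. Only right shifts are used, so no intermediate value can overflow,
 * even when x == SIZE_MAX.
 */
static unsigned lg_floor_portable(size_t x) {
    unsigned lg = 0;
    assert(x != 0);
    for (unsigned shift = (unsigned)(sizeof(size_t) * CHAR_BIT) / 2;
         shift != 0; shift >>= 1) {
        if ((x >> shift) != 0) {
            /* The highest set bit is in the upper part: keep that part. */
            x >>= shift;
            lg += shift;
        }
    }
    return lg;
}
```

With a 64-bit size_t the loop tests shifts of 32, 16, 8, 4, 2, and 1, so it always finishes in six steps regardless of the input.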
+
 * 3.6.0 (March 31, 2014)
 
   This version contains a critical bug fix for a regression present in 3.5.0 and
@@ -21,7 +170,7 @@ found in the git revision history:
     backtracing to be reliable.
   - Use dss allocation precedence for huge allocations as well as small/large
     allocations.
-  - Fix test assertion failure message formatting. This bug did not manifect on
+  - Fix test assertion failure message formatting. This bug did not manifest on
     x86_64 systems because of implementation subtleties in va_list.
   - Fix inconsequential test failures for hash and SFMT code.
 