Update ChangeLog for 5.0.0.
parent
bff8db439c
commit
aae8fd95fb
187
ChangeLog
@@ -4,6 +4,193 @@ brevity. Much more detail can be found in the git revision history:

    https://github.com/jemalloc/jemalloc

* 5.0.0 (June 13, 2017)

  Unlike all previous jemalloc releases, this release does not use naturally
  aligned "chunks" for virtual memory management, and instead uses page-aligned
  "extents". This change has few externally visible effects, but the internal
  impacts are... extensive. Many other internal changes combine to make this
  the most cohesively designed version of jemalloc so far, with ample
  opportunity for further enhancements.

  Continuous integration is now an integral aspect of development thanks to the
  efforts of @davidtgoldblatt, and the dev branch tends to remain reasonably
  stable on the tested platforms (Linux, FreeBSD, macOS, and Windows). As a
  side effect the official release frequency may decrease over time.

  New features:
  - Implement optional per-CPU arena support; threads choose which arena to use
    based on current CPU rather than on fixed thread-->arena associations.
    (@interwq)
  - Implement two-phase decay of unused dirty pages. Pages transition from
    dirty-->muzzy-->clean, where the first phase transition relies on
    madvise(... MADV_FREE) semantics, and the second phase transition discards
    pages such that they are replaced with demand-zeroed pages on next access.
    (@jasone)
  - Increase decay time resolution from seconds to milliseconds. (@jasone)
  - Implement opt-in per CPU background threads, and use them for asynchronous
    decay-driven unused dirty page purging. (@interwq)
  - Add mutex profiling, which collects a variety of statistics useful for
    diagnosing overhead/contention issues. (@interwq)
  - Add C++ new/delete operator bindings. (@djwatson)
  - Support manually created arena destruction, such that all data and metadata
    are discarded. Add MALLCTL_ARENAS_DESTROYED for accessing merged stats
    associated with destroyed arenas. (@jasone)
  - Add MALLCTL_ARENAS_ALL as a fixed index for use in accessing
    merged/destroyed arena statistics via mallctl. (@jasone)
  - Add opt.abort_conf to optionally abort if invalid configuration options are
    detected during initialization. (@interwq)
  - Add opt.stats_print_opts, so that e.g. JSON output can be selected for the
    stats dumped during exit if opt.stats_print is true. (@jasone)
  - Add --with-version=VERSION for use when embedding jemalloc into another
    project's git repository. (@jasone)
  - Add --disable-thp to support cross compiling. (@jasone)
  - Add --with-lg-hugepage to support cross compiling. (@jasone)
  - Add mallctl interfaces (various authors):
    + background_thread
    + opt.abort_conf
    + opt.retain
    + opt.percpu_arena
    + opt.background_thread
    + opt.{dirty,muzzy}_decay_ms
    + opt.stats_print_opts
    + arena.<i>.initialized
    + arena.<i>.destroy
    + arena.<i>.{dirty,muzzy}_decay_ms
    + arena.<i>.extent_hooks
    + arenas.{dirty,muzzy}_decay_ms
    + arenas.bin.<i>.slab_size
    + arenas.nlextents
    + arenas.lextent.<i>.size
    + arenas.create
    + stats.background_thread.{num_threads,num_runs,run_interval}
    + stats.mutexes.{ctl,background_thread,prof,reset}.
      {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
      num_owner_switch}
    + stats.arenas.<i>.{dirty,muzzy}_decay_ms
    + stats.arenas.<i>.uptime
    + stats.arenas.<i>.{pmuzzy,base,internal,resident}
    + stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}
    + stats.arenas.<i>.bins.<j>.{nslabs,reslabs,curslabs}
    + stats.arenas.<i>.bins.<j>.mutex.
      {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
      num_owner_switch}
    + stats.arenas.<i>.lextents.<j>.{nmalloc,ndalloc,nrequests,curlextents}
    + stats.arenas.<i>.mutexes.{large,extent_avail,extents_dirty,extents_muzzy,
      extents_retained,decay_dirty,decay_muzzy,base,tcache_list}.
      {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
      num_owner_switch}

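  The two-phase decay entry above can be read as a small state machine. The
  following is a minimal sketch under stated assumptions, not jemalloc's
  implementation: the type and function names are hypothetical, and the fixed
  per-page deadline is a simplification chosen for illustration (the entry
  does not describe the actual purging schedule). The two timeouts stand in
  for opt.dirty_decay_ms and opt.muzzy_decay_ms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical page states; the names follow the changelog wording only. */
typedef enum { PAGE_ACTIVE, PAGE_DIRTY, PAGE_MUZZY, PAGE_CLEAN } page_state_t;

typedef struct {
    page_state_t state;
    uint64_t     since_ms;  /* time at which the page entered its state */
} page_t;

/* An in-use page becomes unused at now_ms: it starts out dirty. */
static void page_free(page_t *p, uint64_t now_ms) {
    assert(p->state == PAGE_ACTIVE);
    p->state = PAGE_DIRTY;
    p->since_ms = now_ms;
}

/*
 * Advance a page by at most one decay phase.
 * Assumption: a page moves on once it has spent the full timeout in its
 * current state.
 */
static void page_decay(page_t *p, uint64_t now_ms,
                       uint64_t dirty_decay_ms, uint64_t muzzy_decay_ms) {
    if (p->state == PAGE_DIRTY && now_ms - p->since_ms >= dirty_decay_ms) {
        /* Phase 1: a real allocator would rely on madvise(... MADV_FREE). */
        p->state = PAGE_MUZZY;
        p->since_ms = now_ms;
    } else if (p->state == PAGE_MUZZY &&
               now_ms - p->since_ms >= muzzy_decay_ms) {
        /* Phase 2: discard, so the next access sees a demand-zeroed page. */
        p->state = PAGE_CLEAN;
        p->since_ms = now_ms;
    }
}
```

  With both timeouts set to 10000 ms, a page freed at t=100 would turn muzzy
  at t=10100 and clean at t=20100 in this model.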
  Portability improvements:
  - Improve reentrant allocation support, such that deadlock is less likely if
    e.g. a system library call in turn allocates memory. (@davidtgoldblatt,
    @interwq)
  - Support static linking of jemalloc with glibc. (@djwatson)

  Optimizations and refactors:
  - Organize virtual memory as "extents" of virtual memory pages, rather than as
    naturally aligned "chunks", and store all metadata in arbitrarily distant
    locations. This reduces virtual memory external fragmentation, and will
    interact better with huge pages (not yet explicitly supported). (@jasone)
  - Fold large and huge size classes together; only small and large size classes
    remain. (@jasone)
  - Unify the allocation paths, and merge most fast-path branching decisions.
    (@davidtgoldblatt, @interwq)
  - Embed per thread automatic tcache into thread-specific data, which reduces
    conditional branches and dereferences. Also reorganize tcache to increase
    fast-path data locality. (@interwq)
  - Rewrite atomics to closely model the C11 API, convert various
    synchronization from mutex-based to atomic, and use the explicit memory
    ordering control to resolve various hypothetical races without increasing
    synchronization overhead. (@davidtgoldblatt)
  - Extensively optimize rtree via various methods:
    + Add multiple layers of rtree lookup caching, since rtree lookups are now
      part of fast-path deallocation. (@interwq)
    + Determine rtree layout at compile time. (@jasone)
    + Make the tree shallower for common configurations. (@jasone)
    + Embed the root node in the top-level rtree data structure, thus avoiding
      one level of indirection. (@jasone)
    + Further specialize leaf elements as compared to internal node elements,
      and directly embed extent metadata needed for fast-path deallocation.
      (@jasone)
    + Ignore leading always-zero address bits (architecture-specific).
      (@jasone)
  - Reorganize headers (ongoing work) to make them hermetic, and disentangle
    various module dependencies. (@davidtgoldblatt)
  - Convert various internal data structures such as size class metadata from
    boot-time-initialized to compile-time-initialized. Propagate resulting data
    structure simplifications, such as making arena metadata fixed-size.
    (@jasone)
  - Simplify size class lookups when constrained to size classes that are
    multiples of the page size. This speeds lookups, but the primary benefit is
    complexity reduction in code that was the source of numerous regressions.
    (@jasone)
  - Lock individual extents when possible for localized extent operations,
    rather than relying on a top-level arena lock. (@davidtgoldblatt, @jasone)
  - Use first fit layout policy instead of best fit, in order to improve
    packing. (@jasone)
  - If munmap(2) is not in use, use an exponential series to grow each arena's
    virtual memory, so that the number of disjoint virtual memory mappings
    remains low. (@jasone)
  - Implement per arena base allocators, so that arenas never share any virtual
    memory pages. (@jasone)
  - Automatically generate private symbol name mangling macros. (@jasone)
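
  To illustrate the difference between the first fit and best fit policies
  named in the list above, here is a minimal sketch over a flat array of free
  extents. The extent_t record, the address-sorted array, and both function
  names are assumptions made for this example; jemalloc's own data structures
  are not shown in this changelog.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free-extent record, for illustration only. */
typedef struct {
    size_t addr;
    size_t size;
} extent_t;

/*
 * First fit: the array is assumed sorted by ascending address, so the first
 * extent that is large enough is also the lowest-addressed one.
 * Returns the index of the chosen extent, or -1 if none fits.
 */
static int first_fit(const extent_t *free_list, size_t n, size_t want) {
    assert(free_list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (free_list[i].size >= want) {
            return (int)i;
        }
    }
    return -1;
}

/*
 * Best fit: scan everything and keep the smallest extent that still fits.
 * Returns the index of the chosen extent, or -1 if none fits.
 */
static int best_fit(const extent_t *free_list, size_t n, size_t want) {
    assert(free_list != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (free_list[i].size >= want &&
            (best < 0 || free_list[i].size < free_list[best].size)) {
            best = (int)i;
        }
    }
    return best;
}
```

  In this model first fit keeps reusing low addresses, which tends to leave
  the high end of the address range free; best fit minimizes the leftover in
  the chosen extent instead.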
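  The effect of growing virtual memory along an exponential series can be
  checked with a little arithmetic. The sketch below assumes plain doubling
  from a hypothetical base mapping size; the changelog only says "exponential
  series", so the growth factor and the function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the k-th mapping (k = 0, 1, 2, ...) when each mapping doubles. */
static size_t grow_size(size_t base, unsigned k) {
    assert(k < sizeof(size_t) * 8);
    return base << k;
}

/* Number of mappings needed until their combined size covers total bytes. */
static unsigned mappings_needed(size_t base, size_t total) {
    unsigned k = 0;
    size_t covered = 0;
    while (covered < total) {
        covered += grow_size(base, k);
        k++;
    }
    return k;
}
```

  Under these assumptions, covering 1 GiB starting from a 2 MiB mapping takes
  10 mappings, whereas fixed 2 MiB mappings would take 512.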
  Incompatible changes:
  - Replace chunk hooks with an expanded/normalized set of extent hooks.
    (@jasone)
  - Remove ratio-based purging. (@jasone)
  - Remove --disable-tcache. (@jasone)
  - Remove --disable-tls. (@jasone)
  - Remove --enable-ivsalloc. (@jasone)
  - Remove --with-lg-size-class-group. (@jasone)
  - Remove --with-lg-tiny-min. (@jasone)
  - Remove --disable-cc-silence. (@jasone)
  - Remove --enable-code-coverage. (@jasone)
  - Remove --disable-munmap (replaced by opt.retain). (@jasone)
  - Remove Valgrind support. (@jasone)
  - Remove quarantine support. (@jasone)
  - Remove redzone support. (@jasone)
  - Remove mallctl interfaces (various authors):
    + config.munmap
    + config.tcache
    + config.tls
    + config.valgrind
    + opt.lg_chunk
    + opt.purge
    + opt.lg_dirty_mult
    + opt.decay_time
    + opt.quarantine
    + opt.redzone
    + opt.thp
    + arena.<i>.lg_dirty_mult
    + arena.<i>.decay_time
    + arena.<i>.chunk_hooks
    + arenas.initialized
    + arenas.lg_dirty_mult
    + arenas.decay_time
    + arenas.bin.<i>.run_size
    + arenas.nlruns
    + arenas.lrun.<i>.size
    + arenas.nhchunks
    + arenas.hchunk.<i>.size
    + arenas.extend
    + stats.cactive
    + stats.arenas.<i>.lg_dirty_mult
    + stats.arenas.<i>.decay_time
    + stats.arenas.<i>.metadata.{mapped,allocated}
    + stats.arenas.<i>.{npurge,nmadvise,purged}
    + stats.arenas.<i>.huge.{allocated,nmalloc,ndalloc,nrequests}
    + stats.arenas.<i>.bins.<j>.{nruns,reruns,curruns}
    + stats.arenas.<i>.lruns.<j>.{nmalloc,ndalloc,nrequests,curruns}
    + stats.arenas.<i>.hchunks.<j>.{nmalloc,ndalloc,nrequests,curhchunks}

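  Several removed mallctl names appear to have counterparts in the list of
  added interfaces. The table below is a sketch for code that needs to
  translate old names; the pairings are inferred by matching the two lists in
  this entry and are assumptions, as are the type and function names. Note
  that the decay_time settings were expressed in seconds while the *_decay_ms
  settings are in milliseconds, so values need scaling and not just renaming.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical old-name/new-name pair; pairings are inferred, not official. */
typedef struct {
    const char *removed;
    const char *added;
} mallctl_rename_t;

static const mallctl_rename_t renames[] = {
    { "opt.decay_time",          "opt.dirty_decay_ms" },
    { "arena.<i>.decay_time",    "arena.<i>.dirty_decay_ms" },
    { "arena.<i>.chunk_hooks",   "arena.<i>.extent_hooks" },
    { "arenas.decay_time",       "arenas.dirty_decay_ms" },
    { "arenas.bin.<i>.run_size", "arenas.bin.<i>.slab_size" },
    { "arenas.nlruns",           "arenas.nlextents" },
    { "arenas.lrun.<i>.size",    "arenas.lextent.<i>.size" },
    { "arenas.extend",           "arenas.create" },
};

/* Return the apparent 5.0.0 counterpart of a removed name, or NULL. */
static const char *mallctl_counterpart(const char *removed) {
    assert(removed != NULL);
    for (size_t i = 0; i < sizeof(renames) / sizeof(renames[0]); i++) {
        if (strcmp(renames[i].removed, removed) == 0) {
            return renames[i].added;
        }
    }
    return NULL;
}
```

  Names with no entry in the table, such as opt.quarantine, return NULL here
  because the corresponding feature was removed outright.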
  Bug fixes:
  - Improve interval-based profile dump triggering to dump only one profile when
    a single allocation's size exceeds the interval. (@jasone)
  - Use prefixed function names (as controlled by --with-jemalloc-prefix) when
    pruning backtrace frames in jeprof. (@jasone)

* 4.5.0 (February 28, 2017)

  This is the first release to benefit from much broader continuous integration