Update ChangeLog.

2015-02-23 22:28:43 -08:00
commit 54673fd8d7
parent 04ca7580db
1 changed file with 150 additions and 1 deletion
--- a/151
+++ b/151
@@ -5,6 +5,155 @@ found in the git revision history:
    https://github.com/jemalloc/jemalloc
 * 4.0.0 (XXX) See https://github.com/jemalloc/jemalloc/milestones/4.0.0 for
              remaining work.
  This version contains many speed and space optimizations, both minor and
  major.  The major themes are generalization, unification, and simplification.
  Although many of these optimizations cause no visible behavior change, their
  cumulative effect is substantial.
  New features:
  - Normalize size class spacing to be consistent across the complete size
    range.  By default there are four size classes per size doubling, but this
    is now configurable via the --with-lg-size-class-group option.  Also add the
    --with-lg-page, --with-lg-page-sizes, --with-lg-quantum, and
    --with-lg-tiny-min options, which can be used to tweak page and size class
    settings.  Impacts:
    + Worst case performance for incrementally growing/shrinking reallocation
      is improved because there are far fewer size classes, and therefore
      copying happens less often.
    + Internal fragmentation is limited to 20% for all but the smallest size
      classes (those less than four times the quantum).  (1B + 4 KiB)
      and (1B + 4 MiB) previously suffered nearly 50% internal fragmentation.
    + Chunk fragmentation tends to be lower because there are fewer distinct run
      sizes to pack.
  - Add support for explicit tcaches.  The "tcache.create", "tcache.flush", and
    "tcache.destroy" mallctls control tcache lifetime and flushing, and the
    MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to the *allocx() API
    control which tcache is used for each operation.
  - Implement per thread heap profiling, as well as the ability to
    enable/disable heap profiling on a per thread basis.  Add the "prof.reset",
    "prof.lg_sample", "thread.prof.name", "thread.prof.active",
    "opt.prof_thread_active_init", and "prof.thread_active_init" mallctls.
  - Add support for per arena application-specified chunk allocators, configured
    via the "arena.<i>.chunk.alloc" and "arena.<i>.chunk.dalloc" mallctls.
  - Refactor huge allocation to be managed by arenas, so that arenas now
    function as general purpose independent allocators.  This is important in
    the context of user-specified chunk allocators, aside from the scalability
    benefits.  Related new statistics:
    + The "stats.arenas.<i>.huge.allocated", "stats.arenas.<i>.huge.nmalloc",
      "stats.arenas.<i>.huge.ndalloc", and "stats.arenas.<i>.huge.nrequests"
      mallctls provide high level per arena huge allocation statistics.
    + The "arenas.nhchunks", "arenas.hchunks.<i>.size",
      "stats.arenas.<i>.hchunks.<j>.nmalloc",
      "stats.arenas.<i>.hchunks.<j>.ndalloc",
      "stats.arenas.<i>.hchunks.<j>.nrequests", and
      "stats.arenas.<i>.hchunks.<j>.curhchunks" mallctls provide per size class
      statistics.
  - Add the 'util' column to malloc_stats_print() output, which reports the
    proportion of available regions that are currently in use for each small
    size class.
  - Add "alloc" and "free" modes for junk filling (see the "opt.junk"
    mallctl), so that it is possible to separately enable junk filling for
    allocation versus deallocation.
  - Add the jemalloc-config script, which provides information about how
    jemalloc was configured, and how to integrate it into application builds.
  - Add metadata statistics, which are accessible via the "stats.metadata",
    "stats.arenas.<i>.metadata.mapped", and
    "stats.arenas.<i>.metadata.allocated" mallctls.
  - Add the "prof.gdump" mallctl, which makes it possible to toggle the gdump
    feature on/off during program execution.
  - Add sdallocx(), which implements sized deallocation.  The primary
    optimization over dallocx() is the removal of a metadata read, which often
    suffers an L1 cache miss.
  - Add missing header includes in jemalloc/jemalloc.h, so that applications
    only have to #include <jemalloc/jemalloc.h>.
  - Add support for additional platforms:
    + Bitrig
    + Cygwin
    + DragonFlyBSD
    + iOS
    + OpenBSD
    + OpenRISC/or1k
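The normalized size class spacing described in the first New features item can be sketched as follows. This is a minimal illustration, assuming a 16-byte quantum and the default of four classes per size doubling; LG_QUANTUM, LG_GROUP, size_class_ceil() and internal_frag() are hypothetical names, not jemalloc internals, and tiny classes below the quantum are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameters for this sketch: a 16-byte quantum (lg 4) and
 * four size classes per size doubling (lg 2). */
#define LG_QUANTUM 4
#define LG_GROUP   2
#define QUANTUM    ((size_t)1 << LG_QUANTUM)
#define GROUP_SIZE ((size_t)1 << LG_GROUP)

/* floor(log2(x)) for x > 0, using a simple shift loop. */
static unsigned lg_floor_loop(size_t x) {
    unsigned lg = 0;
    assert(x > 0);
    while (x >>= 1)
        lg++;
    return lg;
}

/* Round a request up to its size class.  Sizes up to GROUP_SIZE quanta are
 * spaced one quantum apart; above that, each doubling (2^k, 2^(k+1)] holds
 * GROUP_SIZE equally spaced classes, i.e. the spacing is 2^(k - LG_GROUP).
 * (Overflow for sizes near SIZE_MAX is not handled in this sketch.) */
static size_t size_class_ceil(size_t size) {
    size_t delta;
    if (size == 0)
        size = 1;
    if (size <= GROUP_SIZE * QUANTUM)
        delta = QUANTUM;
    else
        delta = (size_t)1 << (lg_floor_loop(size - 1) - LG_GROUP);
    return (size + delta - 1) & ~(delta - 1);
}

/* Fraction of the chosen size class that the request leaves unused. */
static double internal_frag(size_t size) {
    size_t cls = size_class_ceil(size);
    return (double)(cls - size) / (double)cls;
}
```

With these parameters a 4097-byte request (1B + 4 KiB) rounds up to 5120 bytes, just under 20% waste, whereas rounding up to the next power of two would give 8192 bytes, nearly 50% waste. Requests of four quanta or less (e.g. 17 bytes rounding to 32) are the exception to the 20% bound, matching the "all but the smallest size classes" caveat.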
  Optimizations:
  - Switch run and chunk allocation from first-best-fit (among best-fit
    candidates, choose the lowest in memory) to first-fit (among all candidates,
    choose the lowest in memory).  This tends to reduce chunk and virtual memory
    fragmentation, respectively.
  - Maintain dirty runs in per arena LRUs rather than in per arena trees of
    dirty-run-containing chunks.  In practice this change significantly reduces
    dirty page purging volume.
  - Integrate whole chunks into the unused dirty page purging machinery.  This
    reduces the cost of repeated huge allocation/deallocation, because it
    effectively introduces a cache of chunks.
  - Split the arena chunk map into two separate arrays, in order to increase
    cache locality for the frequently accessed bits.
  - Move small run metadata out of runs, into arena chunk headers.  This reduces
    run fragmentation.  Smaller runs reduce external fragmentation for small
    size classes, and the packed (less uniformly aligned) metadata layout
    improves CPU cache set distribution.
  - Micro-optimize the fast paths for the public API functions.
  - Refactor thread-specific data to reside in a single structure.  This ensures
    that only a single TLS read is necessary per call into the public API.
  - Implement in-place huge allocation growing and shrinking.
  - Refactor rtree (radix tree for chunk lookups) to be lock-free, and make
    additional optimizations that reduce maximum lookup depth to one or two
    levels.  This resolves what was a concurrency bottleneck for per arena huge
    allocation, because a global data structure is critical for determining
    which arenas own which huge allocations.
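The difference between the two placement policies in the first Optimizations item can be sketched over a plain array of free extents. This is a minimal sketch: free_extent_t, first_best_fit() and first_fit() are hypothetical names, and a real allocator would keep candidates in ordered trees rather than scanning an array linearly.

```c
#include <assert.h>
#include <stddef.h>

/* A free run or chunk, identified by start address and size (toy layout). */
typedef struct {
    size_t addr;
    size_t size;
} free_extent_t;

/* First-best-fit: among extents large enough, choose the smallest; among
 * equally small candidates, choose the lowest address.  Returns the index of
 * the chosen extent, or -1 if none fits. */
static int first_best_fit(const free_extent_t *ext, int n, size_t req) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (ext[i].size < req)
            continue;
        if (best < 0 || ext[i].size < ext[best].size ||
            (ext[i].size == ext[best].size && ext[i].addr < ext[best].addr))
            best = i;
    }
    return best;
}

/* First-fit: among all extents large enough, choose the lowest address. */
static int first_fit(const free_extent_t *ext, int n, size_t req) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (ext[i].size >= req && (best < 0 || ext[i].addr < ext[best].addr))
            best = i;
    }
    return best;
}
```

For free extents at 0x10000 (size 0x1000), 0x1000 (size 0x4000) and 0x8000 (size 0x1000), a 0x1000-byte request goes to the exact-size extent at 0x8000 under first-best-fit, but to the larger, lower extent at 0x1000 under first-fit. Preferring low addresses tends to keep live allocations packed together, which is roughly the intuition behind the fragmentation claim above.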
  Incompatible changes:
  - Replace --enable-cc-silence with --disable-cc-silence, so that spurious
    warnings are suppressed by default.
  - Ensure that the constness of malloc_usable_size()'s return type matches that
    of the system implementation.
  - Change the heap profile dump format to support per thread heap profiling,
    and enhance pprof with the --thread=<n> option.  As a result, the bundled
    pprof must now be used rather than the upstream (gperftools) pprof.
  - Disable "opt.prof_final" by default, in order to avoid atexit(3), which can
    internally deadlock on some platforms.
  - Change the "arenas.nlruns" mallctl type from size_t to unsigned.
  - Replace the "stats.arenas.<i>.bins.<j>.allocated" mallctl with
    "stats.arenas.<i>.bins.<j>.curregs".
  - Ignore MALLOC_CONF in set{uid,gid,cap} binaries.
  - Ignore MALLOCX_ARENA(a) in dallocx(), in favor of using the
    MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to control tcache usage.
  Removed features:
  - Remove the *allocm() API, which is superseded by the *allocx() API.
  - Remove the --enable-dss options, and make dss non-optional on all platforms
    which support sbrk(2).
  - Remove the "arenas.purge" mallctl, which was obsoleted by the
    "arena.<i>.purge" mallctl in 3.1.0.
  - Remove the unnecessary "opt.valgrind" mallctl; jemalloc automatically
    detects whether it is running inside Valgrind.
  - Remove the "stats.huge.allocated", "stats.huge.nmalloc", and
    "stats.huge.ndalloc" mallctls.
  - Remove the --enable-mremap option.
  - Remove the --enable-ivsalloc option, and merge its functionality into
    --enable-debug.
  - Remove the "stats.chunks.current", "stats.chunks.total", and
    "stats.chunks.high" mallctls.
  Bug fixes:
  - Fix the cactive statistic to decrease (rather than increase) when active
    memory decreases.  This regression was first released in 3.5.0.
  - Fix OOM handling in memalign() and valloc().  A variant of this bug existed
    in all releases since 2.0.0, which introduced these functions.
  - Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
    "secondary" precedence is specified, but sbrk(2) is not supported.
  - Fix fallback lg_floor() implementations to handle extremely large inputs.
  - Ensure the default purgeable zone is after the default zone on OS X.
  - Fix latent bugs in atomic_*().
  - Fix the "arena.<i>.dss" mallctl to handle read-only calls.
  - Fix tls_model configuration to enable the initial-exec model when possible.
  - Mark malloc_conf as a weak symbol so that the application can override it.
  - Correctly detect glibc's adaptive pthread mutexes.
  - Fix the --without-export configure option.
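The lg_floor() item in the Bug fixes list concerns fallback implementations, presumably the code paths used when no compiler intrinsic is available. A common pitfall in such code is computing 2^(k+1) by adding one to an all-ones bit pattern, which wraps to zero when the most significant bit is set. The following is a minimal sketch of a builtin-free floor(log2(x)) that avoids that overflow; lg_floor_fallback() is a hypothetical name, and this is an illustration of the hazard rather than jemalloc's exact fallback.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bits in size_t. */
#define SIZE_T_BITS (sizeof(size_t) * CHAR_BIT)

/* floor(log2(x)) for x > 0, without compiler builtins. */
static unsigned lg_floor_fallback(size_t x) {
    assert(x != 0);
    /* Smear the highest set bit (bit k) into every lower position, so that
     * x becomes 2^(k+1) - 1. */
    for (unsigned shift = 1; shift < SIZE_T_BITS; shift <<= 1)
        x |= x >> shift;
    /* If bit k was the top bit, x is SIZE_MAX and x + 1 would wrap to 0, so
     * this extremely large input must be handled explicitly. */
    if (x == SIZE_MAX)
        return (unsigned)(SIZE_T_BITS - 1);
    x++; /* Now exactly 2^(k+1). */
    /* Count trailing zeros of the power of two to recover k + 1. */
    unsigned bits = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        bits++;
    }
    return bits - 1;
}
```

Without the SIZE_MAX check, any input with the top bit set would reach the trailing-zero loop with x == 0 and never terminate; a shift-based or intrinsic-based variant would instead return a wrong result.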
 * 3.6.0 (March 31, 2014)
  This version contains a critical bug fix for a regression present in 3.5.0 and
@@ -21,7 +170,7 @@ found in the git revision history:
    backtracing to be reliable.
  - Use dss allocation precedence for huge allocations as well as small/large
    allocations.
-  - Fix test assertion failure message formatting.  This bug did not manifect on
+  - Fix test assertion failure message formatting.  This bug did not manifest on
    x86_64 systems because of implementation subtleties in va_list.
  - Fix inconsequential test failures for hash and SFMT code.