Commit Graph

60 Commits

Author SHA1 Message Date
Jason Evans
57efa7bb0e Avoid atexit(3) when possible, disable prof_final by default.
atexit(3) can deadlock internally during its own initialization if
jemalloc calls atexit() during jemalloc initialization.  Mitigate the
impact by restructuring prof initialization to avoid calling atexit()
unless the registered function will actually dump a final heap profile.

Additionally, disable prof_final by default so that this land mine is
opt-in rather than opt-out.

This resolves #144.
2014-10-08 18:08:00 -07:00
Jason Evans
3c3b3b1a94 Fix a docbook element nesting nit.
According to the docbook documentation for <funcprototype>, its parent
must be <funcsynopsis>; fix accordingly.  Nonetheless, the man page
processor fails badly when this construct is embedded in a <para> (which
is documented to be legal), although the html processor does fine.
2014-10-05 14:48:44 -07:00
Daniel Micay
a95018ee81 Attempt to expand huge allocations in-place.
This adds support for expanding huge allocations in-place by requesting
memory at a specific address from the chunk allocator.

It's currently only implemented for the chunk recycling path, although
in theory it could also be done by optimistically allocating new chunks.
On Linux, it could attempt an in-place mremap. However, that won't work
in practice since the heap is grown downwards and memory is not unmapped
(in a normal build, at least).

Repeated vector reallocation micro-benchmark:

    #include <string.h>
    #include <stdlib.h>

    int main(void) {
        for (size_t i = 0; i < 100; i++) {
            void *ptr = NULL;
            size_t old_size = 0;
            for (size_t size = 4; size < (1 << 30); size *= 2) {
                ptr = realloc(ptr, size);
                if (!ptr) return 1;
                memset(ptr + old_size, 0xff, size - old_size);
                old_size = size;
            }
            free(ptr);
        }
    }

The glibc allocator fails to do any in-place reallocations on this
benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it
elides the cost of copies via mremap, which is currently not something
that jemalloc can use.

With this improvement, jemalloc still fails to do any in-place huge
reallocations for the first outer loop, but then succeeds 100% of the
time for the remaining 99 iterations. The time spent doing allocations
and copies drops down to under 5%, with nearly all of it spent doing
purging + faulting (when huge pages are disabled) and the array memset.

An improved mremap API (MREMAP_RETAIN - #138) would be far more general
but this is a portable optimization and would still be useful on Linux
for xallocx.

Numbers with transparent huge pages enabled:

glibc (copies elided via MREMAP_MAYMOVE): 8.471s

jemalloc: 17.816s
jemalloc + no-op madvise: 13.236s

jemalloc + this commit: 6.787s
jemalloc + this commit + no-op madvise: 6.144s

Numbers with transparent huge pages disabled:

glibc (copies elided via MREMAP_MAYMOVE): 15.403s

jemalloc: 39.456s
jemalloc + no-op madvise: 12.768s

jemalloc + this commit: 15.534s
jemalloc + this commit + no-op madvise: 6.354s

Closes #137
2014-10-05 14:47:01 -07:00
Jason Evans
e9a3fa2e09 Add missing header includes in jemalloc/jemalloc.h .
Add stdlib.h, stdbool.h, and stdint.h to jemalloc/jemalloc.h so that
applications only have to #include <jemalloc/jemalloc.h>.

This resolves #132.
2014-10-05 12:05:37 -07:00
Jason Evans
fc12c0b8bc Implement/test/fix prof-related mallctl's.
Implement/test/fix the opt.prof_thread_active_init,
prof.thread_active_init, and thread.prof.active mallctl's.

Test/fix the thread.prof.name mallctl.

Refactor opt_prof_active to be read-only and move mutable state into the
prof_active variable.  Stop leaning on ctl-related locking for
protection.
2014-10-03 23:25:30 -07:00
Jason Evans
20c31deaae Test prof.reset mallctl and fix numerous discovered bugs. 2014-10-02 23:01:10 -07:00
Daniel Micay
4cfe55166e Add support for sized deallocation.
This adds a new `sdallocx` function to the external API, allowing the
size to be passed by the caller.  It avoids some extra reads in the
thread cache fast path.  In the case where stats are enabled, this
avoids the work of calculating the size from the pointer.

An assertion validates the size that's passed in, so enabling debugging
will allow users of the API to debug cases where an incorrect size is
passed in.

The performance win for a contrived microbenchmark doing an allocation
and immediately freeing it is ~10%.  It may have a different impact on a
real workload.

Closes #28
2014-09-08 17:34:24 -07:00
Jason Evans
602c8e0971 Implement per thread heap profiling.
Rename data structures (prof_thr_cnt_t-->prof_tctx_t,
prof_ctx_t-->prof_gctx_t), and convert to storing a prof_tctx_t for
sampled objects.

Convert PROF_ALLOC_PREP() to prof_alloc_prep(), since precise backtrace
depth within jemalloc functions is no longer an issue (pprof prunes
irrelevant frames).

Implement mallctl's:
- prof.reset implements full sample data reset, and optional change of
  sample interval.
- prof.lg_sample reads the current sample interval (opt.lg_prof_sample
  was the permanent source of truth prior to prof.reset).
- thread.prof.name provides naming capability for threads within heap
  profile dumps.
- thread.prof.active makes it possible to activate/deactivate heap
  profiling for individual threads.

Modify the heap dump files to contain per thread heap profile data.
This change is incompatible with the existing pprof, which will require
enhancements to read and process the enriched data.
2014-08-19 21:31:16 -07:00
Jason Evans
b4d62cd61b Minor doc edit. 2014-05-15 22:46:24 -07:00
Jason Evans
e2deab7a75 Refactor huge allocation to be managed by arenas.
Refactor huge allocation to be managed by arenas (though the global
red-black tree of huge allocations remains for lookup during
deallocation).  This is the logical conclusion of recent changes that 1)
made per arena dss precedence apply to huge allocation, and 2) made it
possible to replace the per arena chunk allocation/deallocation
functions.

Remove the top level huge stats, and replace them with per arena huge
stats.

Normalize function names and types to *dalloc* (some were *dealloc*).

Remove the --enable-mremap option.  As jemalloc currently operates, this
is a performace regression for some applications, but planned work to
logarithmically space huge size classes should provide similar amortized
performance.  The motivation for this change was that mremap-based huge
reallocation forced leaky abstractions that prevented refactoring.
2014-05-15 22:36:41 -07:00
aravind
fb7fe50a88 Add support for user-specified chunk allocators/deallocators.
Add new mallctl endpoints "arena<i>.chunk.alloc" and
"arena<i>.chunk.dealloc" to allow userspace to configure
jemalloc's chunk allocator and deallocator on a per-arena
basis.
2014-05-12 10:46:03 -07:00
Jason Evans
bd87b01999 Optimize Valgrind integration.
Forcefully disable tcache if running inside Valgrind, and remove
Valgrind calls in tcache-specific code.

Restructure Valgrind-related code to move most Valgrind calls out of the
fast path functions.

Take advantage of static knowledge to elide some branches in
JEMALLOC_VALGRIND_REALLOC().
2014-04-15 16:49:57 -07:00
Jason Evans
ecd3e59ca3 Remove the "opt.valgrind" mallctl.
Remove the "opt.valgrind" mallctl because it is unnecessary -- jemalloc
automatically detects whether it is running inside valgrind.
2014-04-15 14:33:50 -07:00
Jason Evans
a2c719b374 Remove the "arenas.purge" mallctl.
Remove the "arenas.purge" mallctl, which was obsoleted by the
"arena.<i>.purge" mallctl in 3.1.0.
2014-04-15 12:46:28 -07:00
Jason Evans
4d434adb14 Make dss non-optional, and fix an "arena.<i>.dss" mallctl bug.
Make dss non-optional on all platforms which support sbrk(2).

Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
"secondary" precedence is specified, but sbrk(2) is not supported.
2014-04-15 12:09:48 -07:00
Jason Evans
24a4ba77e1 Update MALLOCX_ARENA() documentation.
Update MALLOCX_ARENA() documentation to no longer claim that it has no
effect for huge region allocations.
2014-04-14 22:38:59 -07:00
Jason Evans
9790b9667f Remove the *allocm() API, which is superceded by the *allocx() API. 2014-04-14 22:32:31 -07:00
Jason Evans
9c62ed44b0 Document how dss precedence affects huge allocation. 2014-03-31 09:16:59 -07:00
Jason Evans
b2c31660be Extract profiling code from [re]allocation functions.
Extract profiling code from malloc(), imemalign(), calloc(), realloc(),
mallocx(), rallocx(), and xallocx().  This slightly reduces the amount
of code compiled into the fast paths, but the primary benefit is the
combinatorial complexity reduction.

Simplify iralloc[t]() by creating a separate ixalloc() that handles the
no-move cases.

Further simplify [mrxn]allocx() (and by implication [mrn]allocm()) to
make request size overflows due to size class and/or alignment
constraints trigger undefined behavior (detected by debug-only
assertions).

Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling
backtrace creation in imemalign().  This bug impacted posix_memalign()
and aligned_alloc().
2014-01-12 15:41:05 -08:00
Jason Evans
d8a390020c Fix a few mallctl() documentation errors.
Normalize mallctl() order (code and documentation).
2013-12-19 21:40:41 -08:00
Jason Evans
de73296d6b Add mallctl*() unit tests. 2013-12-19 21:40:13 -08:00
Jason Evans
1393d79a4c Remove ENOMEM from the documented set of *mallctl() errors.
*mallctl() always returns EINVAL and does partial result copying when
*oldlenp is to short to hold the requested value, rather than returning
ENOMEM.  Therefore remove ENOMEM from the documented set of possible
errors.
2013-12-18 15:35:45 -08:00
Jason Evans
d82a5e6a34 Implement the *allocx() API.
Implement the *allocx() API, which is a successor to the *allocm() API.
The *allocx() functions are slightly simpler to use because they have
fewer parameters, they directly return the results of primary interest,
and mallocx()/rallocx() avoid the strict aliasing pitfall that
allocm()/rallocx() share with posix_memalign().  The following code
violates strict aliasing rules:

    foo_t *foo;
    allocm((void **)&foo, NULL, 42, 0);

whereas the following is safe:

    foo_t *foo;
    void *p;
    allocm(&p, NULL, 42, 0);
    foo = (foo_t *)p;

mallocx() does not have this problem:

    foo_t *foo = (foo_t *)mallocx(42, 0);
2013-12-12 22:35:52 -08:00
Jason Evans
39e7fd0580 Fix ALLOCM_ARENA(a) handling in rallocm().
Fix rallocm() to use the specified arena for allocation, not just
deallocation.

Clarify ALLOCM_ARENA(a) documentation.
2013-11-25 18:02:35 -08:00
Jason Evans
aabaf851b2 Add ids for all mallctl entries.
Add ids for all mallctl entries, so that external documents can link to
arbitrary mallctl entries.
2013-10-30 14:52:09 -07:00
Jason Evans
705328ca46 Clarify how to use malloc_conf.
Clarify that malloc_conf is intended only for compile-time
configuration, since jemalloc may be initialized before main() is
entered.
2013-03-19 16:28:41 -07:00
Jason Evans
1bf2743e08 Add clipping support to lg_chunk option processing.
Modify processing of the lg_chunk option so that it clips an
out-of-range input to the edge of the valid range.  This makes it
possible to request the minimum possible chunk size without intimate
knowledge of allocator internals.

Submitted by Ian Lepore (see FreeBSD PR bin/174641).
2012-12-23 08:51:48 -08:00
Jan Beich
ed90c97332 document what stats.active does not track
Based on http://www.canonware.com/pipermail/jemalloc-discuss/2012-March/000164.html
2012-11-06 16:30:24 -08:00
Jason Evans
e3d13060c8 Purge unused dirty pages in a fragmentation-reducing order.
Purge unused dirty pages in an order that first performs clean/dirty run
defragmentation, in order to mitigate available run fragmentation.

Remove the limitation that prevented purging unless at least one chunk
worth of dirty pages had accumulated in an arena.  This limitation was
intended to avoid excessive purging for small applications, but the
threshold was arbitrary, and the effect of questionable utility.

Relax opt_lg_dirty_mult from 5 to 3.  This compensates for increased
likelihood of allocating clean runs, given the same ratio of clean:dirty
runs, and reduces the potential for repeated purging in pathological
large malloc/free loops that push the active:dirty page ratio just over
the purge threshold.
2012-11-06 00:59:53 -08:00
Jason Evans
609ae595f0 Add arena-specific and selective dss allocation.
Add the "arenas.extend" mallctl, so that it is possible to create new
arenas that are outside the set that jemalloc automatically multiplexes
threads onto.

Add the ALLOCM_ARENA() flag for {,r,d}allocm(), so that it is possible
to explicitly allocate from a particular arena.

Add the "opt.dss" mallctl, which controls the default precedence of dss
allocation relative to mmap allocation.

Add the "arena.<i>.dss" mallctl, which makes it possible to set the
default dss precedence on a per arena or global basis.

Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge".

Add the "stats.arenas.<i>.dss" mallctl.
2012-10-12 18:26:16 -07:00
Jason Evans
174b70efb4 Disable tcache by default if running inside Valgrind.
Disable tcache by default if running inside Valgrind, in order to avoid
making unallocated objects appear reachable to Valgrind.
2012-05-15 23:31:53 -07:00
Jason Evans
781fe75e0a Auto-detect whether running inside Valgrind.
Auto-detect whether running inside Valgrind, thus removing the need to
manually specify MALLOC_CONF=valgrind:true.
2012-05-15 14:48:14 -07:00
Jason Evans
80fe0478e6 Generalize "stats.mapped" documentation.
Generalize "stats.mapped" documentation to state that all inactive
chunks are omitted, now that it is possible for mmap'ed chunks to be
omitted in addition to DSS chunks.
2012-05-09 23:08:48 -07:00
Jason Evans
2e671ffbad Add the --enable-mremap option.
Add the --enable-mremap option, and disable the use of mremap(2) by
default, for the same reason that freeing chunks via munmap(2) is
disabled by default on Linux: semi-permanent VM map fragmentation.
2012-05-09 16:12:00 -07:00
Jason Evans
d926c90500 Fix Valgrind URL in documentation.
Reported by Daichi GOTO.
2012-04-25 23:17:57 -07:00
Jason Evans
8f0e0eb1c0 Fix a memory corruption bug in chunk_alloc_dss().
Fix a memory corruption bug in chunk_alloc_dss() that was due to
claiming newly allocated memory is zeroed.

Reverse order of preference between mmap() and sbrk() to prefer mmap().

Clean up management of 'zero' parameter in chunk_alloc*().
2012-04-21 13:33:48 -07:00
Jason Evans
0b25fe79aa Update prof defaults to match common usage.
Change the "opt.lg_prof_sample" default from 0 to 19 (1 B to 512 KiB).

Change the "opt.prof_accum" default from true to false.

Add the "opt.prof_final" mallctl, so that "opt.prof_prefix" need not be
abused to disable final profile dumping.
2012-04-17 16:39:33 -07:00
Jason Evans
25a000e896 Update pprof (from gperftools 2.0). 2012-04-17 15:49:30 -07:00
Jason Evans
59ae2766af Add the --disable-munmap option.
Add the --disable-munmap option, remove the configure test that
attempted to detect the VM allocation quirk known to exist on Linux
x86[_64], and make --disable-munmap implicit on Linux.
2012-04-16 18:08:58 -07:00
Jason Evans
d6abcbb14b Always disable redzone by default.
Always disable redzone by default, even when --enable-debug is
specified.  The memory overhead for redzones can be substantial, which
makes this feature something that should only be opted into.
2012-04-12 17:09:54 -07:00
Jason Evans
122449b073 Implement Valgrind support, redzones, and quarantine.
Implement Valgrind support, as well as the redzone and quarantine
features, which help Valgrind detect memory errors.  Redzones are only
implemented for small objects because the changes necessary to support
redzones around large and huge objects are complicated by in-place
reallocation, to the point that it isn't clear that the maintenance
burden is worth the incremental improvement to Valgrind support.

Merge arena_salloc() and arena_salloc_demote().

Refactor i[v]salloc() to expose the 'demote' option.
2012-04-11 11:46:18 -07:00
Jason Evans
b147611b52 Add utrace(2)-based tracing (--enable-utrace). 2012-04-05 13:36:17 -07:00
Jason Evans
48db6167e7 Remove obsolete "config.dynamic_page_shift" mallctl documentation. 2012-04-03 01:33:55 -07:00
Jason Evans
ae4c7b4b40 Clean up *PAGE* macros.
s/PAGE_SHIFT/LG_PAGE/g and s/PAGE_SIZE/PAGE/g.

Remove remnants of the dynamic-page-shift code.

Rename the "arenas.pagesize" mallctl to "arenas.page".

Remove the "arenas.chunksize" mallctl, which is redundant with
"opt.lg_chunk".
2012-04-02 07:04:34 -07:00
Jason Evans
d4be8b7b6e Add the "thread.tcache.enabled" mallctl. 2012-03-26 19:02:49 -07:00
Jason Evans
7091b415bb Fix various documentation formatting regressions. 2012-03-19 09:36:44 -07:00
Jason Evans
e7b8fa18d2 Rename the "tcache.flush" mallctl to "thread.tcache.flush". 2012-03-16 17:09:32 -07:00
Jason Evans
0a0bbf63e5 Implement aligned_alloc().
Implement aligned_alloc(), which was added in the C11 standard.  The
function is weakly specified to the point that a minimally compliant
implementation would be painful to use (size must be an integral
multiple of alignment!), which in practice makes posix_memalign() a
safer choice.
2012-03-13 12:55:21 -07:00
Jason Evans
4507f34628 Remove the lg_tcache_gc_sweep option.
Remove the lg_tcache_gc_sweep option, because it is no longer
very useful.  Prior to the addition of dynamic adjustment of tcache fill
count, it was possible for fill/flush overhead to be a problem, but this
problem no longer occurs.
2012-03-05 14:34:37 -08:00
Jason Evans
7e77eaffff Add the --disable-experimental option. 2012-03-02 17:47:37 -08:00