Commit Graph

120 Commits

Author SHA1 Message Date
Daniel Micay
f22214a29d Use regular arena allocation for huge tree nodes.
This avoids grabbing the base mutex, as a step towards fine-grained
locking for huge allocations. The thread cache also provides a tiny
(~3%) improvement for serial huge allocations.
2014-10-07 23:57:09 -07:00
Jason Evans
8bb3198f72 Refactor/fix arenas manipulation.
Abstract arenas access to use arena_get() (or a0get() where appropriate)
rather than directly reading e.g. arenas[ind].  Prior to the addition of
the arenas.extend mallctl, the worst possible outcome of directly
accessing arenas was a stale read, but arenas.extend may allocate and
assign a new array to arenas.

Add a tsd-based arenas_cache, which amortizes arenas reads.  This
introduces some subtle bootstrapping issues, with tsd_boot() now being
split into tsd_boot[01]() to support tsd wrapper allocation
bootstrapping, as well as an arenas_cache_bypass tsd variable which
dynamically terminates allocation of arenas_cache itself.

Promote a0malloc(), a0calloc(), and a0free() to be generally useful for
internal allocation, and use them in several places (more may be
appropriate).

Abstract arena->nthreads management and fix a missing decrement during
thread destruction (recent tsd refactoring left arenas_cleanup()
unused).

Change arena_choose() to propagate OOM, and handle OOM in all callers.
This is important for providing consistent allocation behavior when the
MALLOCX_ARENA() flag is being used.  Prior to this fix, it was possible
for an OOM to cause an allocation to be silently satisfied from a
different arena than the one specified.
2014-10-07 23:14:57 -07:00
Jason Evans
155bfa7da1 Normalize size classes.
Normalize size classes to use the same number of size classes per size
doubling (currently hard coded to 4), across the entire range of size
classes.  Small size classes already used this spacing, but in order to
support this change, additional small size classes now fill [4 KiB .. 16
KiB).  Large size classes range from [16 KiB .. 4 MiB).  Huge size
classes now support non-multiples of the chunk size in order to fill (4
MiB .. 16 MiB).
2014-10-06 01:45:13 -07:00
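
The spacing rule described in the commit above (four size classes per size doubling) can be illustrated with a small sketch. This is not jemalloc's implementation: the function name `size_class_round_up` and the lower bound of 16 bytes are assumptions made for illustration, and the sketch ignores the special handling that the smallest size classes receive in practice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* From the commit message: the number of classes per doubling is 4. */
#define CLASSES_PER_DOUBLING 4

/*
 * Hypothetical helper: round size up to the nearest size class, assuming
 * every interval (2^k, 2^(k+1)] is divided into 4 equally spaced classes.
 */
static size_t size_class_round_up(size_t size)
{
    size_t base = 1;
    size_t delta;

    /* Assumed bounds: keeps delta >= 1 and avoids overflow below. */
    assert(size >= 16 && size <= SIZE_MAX / 2);

    /* Find the largest power of two strictly below size. */
    while (base * 2 < size)
        base *= 2;

    /* Spacing between classes inside (base, 2 * base]. */
    delta = base / CLASSES_PER_DOUBLING;

    /* Round up to a multiple of delta (delta is a power of two). */
    return (size + delta - 1) & ~(delta - 1);
}
```

Under these assumptions, a request of 4097 bytes rounds up to 5 KiB, and the classes between 4 KiB and 16 KiB come out as 5, 6, 7, 8, 10, 12, 14, and 16 KiB.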
Daniel Micay
a95018ee81 Attempt to expand huge allocations in-place.
This adds support for expanding huge allocations in-place by requesting
memory at a specific address from the chunk allocator.

It's currently only implemented for the chunk recycling path, although
in theory it could also be done by optimistically allocating new chunks.
On Linux, it could attempt an in-place mremap. However, that won't work
in practice since the heap is grown downwards and memory is not unmapped
(in a normal build, at least).

Repeated vector reallocation micro-benchmark:

    #include <string.h>
    #include <stdlib.h>

    int main(void) {
        for (size_t i = 0; i < 100; i++) {
            void *ptr = NULL;
            size_t old_size = 0;
            for (size_t size = 4; size < (1 << 30); size *= 2) {
                ptr = realloc(ptr, size);
                if (!ptr) return 1;
                memset((char *)ptr + old_size, 0xff, size - old_size);
                old_size = size;
            }
            free(ptr);
        }
    }

The glibc allocator fails to do any in-place reallocations on this
benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it
elides the cost of copies via mremap, which is currently not something
that jemalloc can use.

With this improvement, jemalloc still fails to do any in-place huge
reallocations for the first outer loop, but then succeeds 100% of the
time for the remaining 99 iterations. The time spent doing allocations
and copies drops down to under 5%, with nearly all of it spent doing
purging + faulting (when huge pages are disabled) and the array memset.

An improved mremap API (MREMAP_RETAIN - #138) would be far more general
but this is a portable optimization and would still be useful on Linux
for xallocx.

Numbers with transparent huge pages enabled:

glibc (copies elided via MREMAP_MAYMOVE): 8.471s

jemalloc: 17.816s
jemalloc + no-op madvise: 13.236s

jemalloc + this commit: 6.787s
jemalloc + this commit + no-op madvise: 6.144s

Numbers with transparent huge pages disabled:

glibc (copies elided via MREMAP_MAYMOVE): 15.403s

jemalloc: 39.456s
jemalloc + no-op madvise: 12.768s

jemalloc + this commit: 15.534s
jemalloc + this commit + no-op madvise: 6.354s

Closes #137
2014-10-05 14:47:01 -07:00
Jason Evans
47395a1b4c Avoid purging in microbench when lazy-lock is enabled. 2014-10-04 14:59:38 -07:00
Jason Evans
029d44cf8b Fix tsd cleanup regressions.
Fix tsd cleanup regressions that were introduced in
5460aa6f66 (Convert all tsd variables to
reside in a single tsd structure.).  These regressions were twofold:

1) tsd_tryget() should never (and need never) return NULL.  Rename it to
   tsd_fetch() and simplify all callers.
2) tsd_*_set() must only be called when tsd is in the nominal state,
   because cleanup happens during the nominal-->purgatory transition,
   and re-initialization must not happen while in the purgatory state.
   Add tsd_nominal() and use it as needed.  Note that tsd_*{p,}_get()
   can still be used as long as no re-initialization that would require
   cleanup occurs.  This means that e.g. the thread_allocated counter
   can be updated unconditionally.
2014-10-04 11:22:55 -07:00
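
The two rules in the commit above can be modelled with a toy state machine. Everything in this sketch is hypothetical: the type and function names are illustrative rather than jemalloc's, and only the nominal and purgatory states mentioned in the message are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of the states named in the message. */
typedef enum { MODEL_NOMINAL, MODEL_PURGATORY } model_state_t;

typedef struct {
    model_state_t state;
    void *cached;              /* stands in for data that needs cleanup */
    uint64_t thread_allocated; /* plain counter, never needs cleanup */
} model_tsd_t;

static bool model_nominal(const model_tsd_t *tsd)
{
    return tsd->state == MODEL_NOMINAL;
}

/* Setter for data that requires cleanup: only legal in the nominal state. */
static bool model_cached_set(model_tsd_t *tsd, void *p)
{
    assert(tsd != NULL);
    if (!model_nominal(tsd))
        return false; /* would re-initialize after cleanup already ran */
    tsd->cached = p;
    return true;
}

/* Counter update: safe in any state because nothing needs cleanup. */
static void model_allocated_add(model_tsd_t *tsd, uint64_t n)
{
    tsd->thread_allocated += n;
}

/* Cleanup runs on the nominal --> purgatory transition. */
static void model_cleanup(model_tsd_t *tsd)
{
    tsd->cached = NULL;
    tsd->state = MODEL_PURGATORY;
}
```

In this model, a caller checks `model_nominal()` (or the setter's return value) before storing anything that would need another cleanup pass, while the counter can be bumped unconditionally.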
Jason Evans
b72d4abc5f Skip test_prof_thread_name_validation if !config_prof. 2014-10-03 23:41:53 -07:00
Jason Evans
fc12c0b8bc Implement/test/fix prof-related mallctl's.
Implement/test/fix the opt.prof_thread_active_init,
prof.thread_active_init, and thread.prof.active mallctl's.

Test/fix the thread.prof.name mallctl.

Refactor opt_prof_active to be read-only and move mutable state into the
prof_active variable.  Stop leaning on ctl-related locking for
protection.
2014-10-03 23:25:30 -07:00
Jason Evans
551ebc4364 Convert to uniform style: cond == false --> !cond 2014-10-03 10:16:09 -07:00
Jason Evans
ebbd0c91f0 Remove obsolete comment. 2014-10-02 23:05:23 -07:00
Jason Evans
20c31deaae Test prof.reset mallctl and fix numerous discovered bugs. 2014-10-02 23:01:10 -07:00
Jason Evans
cc9e626ea9 Refactor permuted backtrace test allocation.
Refactor permuted backtrace test allocation that was originally used
only by the prof_accum test, so that it can be used by other heap
profiling test binaries.
2014-10-01 22:28:23 -07:00
Jason Evans
f97e5ac4ec Implement compile-time bitmap size computation. 2014-09-28 14:43:11 -07:00
Jason Evans
5460aa6f66 Convert all tsd variables to reside in a single tsd structure. 2014-09-23 02:36:08 -07:00
Daniel Micay
4cfe55166e Add support for sized deallocation.
This adds a new `sdallocx` function to the external API, allowing the
size to be passed by the caller.  It avoids some extra reads in the
thread cache fast path.  In the case where stats are enabled, this
avoids the work of calculating the size from the pointer.

An assertion validates the size that's passed in, so enabling debugging
will allow users of the API to debug cases where an incorrect size is
passed in.

The performance win for a contrived microbenchmark doing an allocation
and immediately freeing it is ~10%.  It may have a different impact on a
real workload.

Closes #28
2014-09-08 17:34:24 -07:00
Jason Evans
c3f8650749 Add relevant function attributes to [msn]allocx(). 2014-09-08 16:47:51 -07:00
Jason Evans
a1f3929ffd Thwart optimization of free(malloc(1)) in microbench. 2014-09-08 16:23:48 -07:00
Daniel Micay
c3bfe9569a avoid conflict with the POSIX timer_t type
It hits a compilation error with glibc 2.19 without a rename.
2014-09-08 01:20:44 -04:00
Jason Evans
423d78a21b Add microbench tests. 2014-09-07 19:58:04 -07:00
Jason Evans
b67ec3c497 Add a simple timer implementation for use in benchmarking. 2014-09-07 19:57:24 -07:00
Jason Evans
c21b05ea09 Whitespace cleanups. 2014-09-04 22:27:26 -07:00
Jason Evans
1628e8615e Add rb_empty(). 2014-08-19 21:05:54 -07:00
Jason Evans
586c8ede42 Fix arena.<i>.dss mallctl to handle read-only calls. 2014-08-15 12:20:20 -07:00
Jason Evans
a2ea54c986 Add atomic operations tests and fix latent bugs. 2014-08-06 23:36:19 -07:00
Mike Hommey
999e1b5cc7 Fix thd_join on win64 2014-06-01 20:50:24 -07:00
Jason Evans
1f6d77e1f6 Use KQU() rather than QU() where applicable.
Fix KZI() and KQI() to append LL rather than ULL.
2014-05-28 21:17:42 -07:00
Jason Evans
99118622ff Use nallocx() rather than mallctl() to trigger initialization.
Use nallocx() rather than mallctl() to trigger initialization, because
nallocx() has no side effects other than initialization, whereas
mallctl() does a bunch of internal memory allocation.
2014-05-28 11:23:01 -07:00
Jason Evans
26f44df742 Make sure initialization occurs prior to running tests. 2014-05-28 11:08:17 -07:00
Mike Hommey
b54aef1d8c Fixup after 3a730df (Avoid pointer arithmetic on void*[...]) 2014-05-28 09:46:09 -07:00
Mike Hommey
17767b5f2b Correctly return exit code from thd_join on Windows 2014-05-28 09:43:30 -07:00
Mike Hommey
26246af977 Define INFINITY when it's not defined 2014-05-28 09:41:28 -07:00
Mike Hommey
12f74e680c Move platform headers and tricks from jemalloc_internal.h.in to a new jemalloc_internal_decls.h header 2014-05-28 09:38:10 -07:00
Mike Hommey
a9df1ae622 Use ULL suffix instead of LLU for unsigned long longs
MSVC only supports the former.
2014-05-27 15:45:14 -07:00
Mike Hommey
3a730dfd50 Avoid pointer arithmetic on void* in test/integration/rallocx.c 2014-05-27 15:26:28 -07:00
Mike Hommey
86e2e703ff Rename "small" local variable, because windows headers #define it 2014-05-27 15:20:31 -07:00
Mike Hommey
7330c3770a Use C99 variadic macros instead of GCC ones 2014-05-27 15:17:00 -07:00
Mike Hommey
f41f143668 Replace variable arrays in tests with VARIABLE_ARRAY 2014-05-27 15:10:38 -07:00
Mike Hommey
47d58a01ff Define _CRT_SPINCOUNT in test/src/mtx.c like in src/mutex.c 2014-05-27 15:05:05 -07:00
Jason Evans
e2deab7a75 Refactor huge allocation to be managed by arenas.
Refactor huge allocation to be managed by arenas (though the global
red-black tree of huge allocations remains for lookup during
deallocation).  This is the logical conclusion of recent changes that 1)
made per arena dss precedence apply to huge allocation, and 2) made it
possible to replace the per arena chunk allocation/deallocation
functions.

Remove the top level huge stats, and replace them with per arena huge
stats.

Normalize function names and types to *dalloc* (some were *dealloc*).

Remove the --enable-mremap option.  As jemalloc currently operates, this
is a performance regression for some applications, but planned work to
logarithmically space huge size classes should provide similar amortized
performance.  The motivation for this change was that mremap-based huge
reallocation forced leaky abstractions that prevented refactoring.
2014-05-15 22:36:41 -07:00
aravind
fb7fe50a88 Add support for user-specified chunk allocators/deallocators.
Add new mallctl endpoints "arena<i>.chunk.alloc" and
"arena<i>.chunk.dealloc" to allow userspace to configure
jemalloc's chunk allocator and deallocator on a per-arena
basis.
2014-05-12 10:46:03 -07:00
Jason Evans
a344dd01c7 Fix coding style nits. 2014-05-01 15:51:30 -07:00
Jason Evans
ecd3e59ca3 Remove the "opt.valgrind" mallctl.
Remove the "opt.valgrind" mallctl because it is unnecessary -- jemalloc
automatically detects whether it is running inside valgrind.
2014-04-15 14:33:50 -07:00
Jason Evans
a2c719b374 Remove the "arenas.purge" mallctl.
Remove the "arenas.purge" mallctl, which was obsoleted by the
"arena.<i>.purge" mallctl in 3.1.0.
2014-04-15 12:46:28 -07:00
Jason Evans
4d434adb14 Make dss non-optional, and fix an "arena.<i>.dss" mallctl bug.
Make dss non-optional on all platforms which support sbrk(2).

Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
"secondary" precedence is specified, but sbrk(2) is not supported.
2014-04-15 12:09:48 -07:00
Jason Evans
9790b9667f Remove the *allocm() API, which is superseded by the *allocx() API. 2014-04-14 22:32:31 -07:00
Jason Evans
e64b1b7be9 Enable big-endian mode for SFMT.
Add cpp logic to enable big-endian mode in SFMT.  This should fix SFMT
tests on e.g. MIPS and SPARC.
2014-03-30 17:24:24 -07:00
Jason Evans
df3f27024f Adapt hash tests to big-endian systems.
The hash code, which has MurmurHash3 at its core, generates different
output depending on system endianness, so adapt the expected output on
big-endian systems.  MurmurHash3 code also makes the assumption that
unaligned access is okay (not true on all systems), but jemalloc only
hashes data structures that have sufficient alignment to dodge this
limitation.
2014-03-30 16:27:08 -07:00
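
The endianness dependence described in the commit above comes from reading multi-byte words directly from memory. The sketch below is a generic illustration, not code from jemalloc or MurmurHash3, and all three helper names are hypothetical. It shows a native load (result depends on host byte order), an explicit little-endian load (same result everywhere), and a runtime probe for the host byte order. Using `memcpy` for the native load also sidesteps the unaligned access assumption the message mentions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical probe: returns 1 on a big-endian host, 0 otherwise. */
static int host_is_big_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Native load: the value depends on the host byte order. */
static uint32_t load_native32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v); /* memcpy is safe for unaligned input */
    return v;
}

/* Explicit little-endian load: the value is the same on every host. */
static uint32_t load_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] |
        ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) |
        ((uint32_t)p[3] << 24);
}
```

The commit itself took the other route and adapted the expected test output on big-endian systems rather than changing how the hash reads its input.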
Jason Evans
ada8447cf6 Reduce maximum tested alignment.
Reduce maximum tested alignment from 2^29 to 2^25.  Some systems may not
have enough contiguous virtual memory to satisfy the larger alignment,
but the smaller alignment is still adequate to test multi-chunk
alignment.
2014-03-30 11:22:23 -07:00
Jason Evans
ab8c79fdaf Fix message formatting errors uncovered by p_test_fail() refactoring. 2014-03-30 11:21:09 -07:00
Jason Evans
e3f27cfced Fix p_test_fail()'s va_list abuse.
p_test_fail() was passing a va_list to two separate functions with the
expectation that no reset would occur.  Refactor p_test_fail()'s callers
to instead format two strings and pass them to p_test_fail().

Add a missing parameter to an assert_u64_eq() call, which the compiler
warned about after the assertion macro refactoring.
2014-03-29 23:14:32 -07:00
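
The va_list problem described in the commit above is a general C pitfall: once a `va_list` has been consumed by one `v*printf` call, its value is indeterminate and it cannot be passed to a second call. The commit fixed this by having callers format two strings up front. A common alternative, shown here only as an illustration with a hypothetical `format_twice` helper, is to duplicate the list with C99 `va_copy`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the same arguments into two buffers.
 * Each vsnprintf() call gets its own va_list, so neither call sees a
 * list that the other has already consumed.
 */
static void format_twice(char *a, size_t a_len, char *b, size_t b_len,
    const char *fmt, ...)
{
    va_list ap, ap2;

    assert(fmt != NULL);
    va_start(ap, fmt);
    va_copy(ap2, ap); /* independent copy for the second consumer */
    vsnprintf(a, a_len, fmt, ap);
    vsnprintf(b, b_len, fmt, ap2);
    va_end(ap2);
    va_end(ap);
}
```

Calling `format_twice(a, sizeof(a), b, sizeof(b), "%d-%s", 7, "x")` fills both buffers with the same text.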