Commit Graph

449 Commits

Author SHA1 Message Date
Jason Evans
81e547566e Add --with-lg-tiny-min, generalize --with-lg-quantum. 2014-10-10 22:35:07 -07:00
Jason Evans
9b75677e53 Don't fetch tsd in a0{d,}alloc().
Don't fetch tsd in a0{d,}alloc(), because doing so can cause infinite
recursion on systems that require an allocated tsd wrapper.
2014-10-10 18:19:20 -07:00
Jason Evans
fc0b3b7383 Add configure options.
Add:
  --with-lg-page
  --with-lg-page-sizes
  --with-lg-size-class-group
  --with-lg-quantum

Get rid of STATIC_PAGE_SHIFT, in favor of directly setting LG_PAGE.

Fix various edge conditions exposed by the configure options.
2014-10-09 22:44:37 -07:00
Jason Evans
57efa7bb0e Avoid atexit(3) when possible, disable prof_final by default.
atexit(3) can deadlock internally during its own initialization if
jemalloc calls atexit() during jemalloc initialization.  Mitigate the
impact by restructuring prof initialization to avoid calling atexit()
unless the registered function will actually dump a final heap profile.

Additionally, disable prof_final by default so that this land mine is
opt-in rather than opt-out.

This resolves #144.
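Avoiding atexit(3) comes down to registering the exit handler only when it will actually do work. The sketch below is minimal and hypothetical. `opt_prof`, `opt_prof_final`, `prof_final_dump()`, and `prof_boot_sketch()` are named after the options for illustration, and this is not jemalloc's initialization code. It does not reproduce the deadlock. It shows only the conditional registration, with prof_final defaulting to off:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical option flags; prof_final defaults to false (opt-in). */
static bool opt_prof = false;
static bool opt_prof_final = false;
static bool final_dump_registered = false;

/* Stand-in for the exit-time heap profile dump. */
static void prof_final_dump(void) {
    fprintf(stderr, "final heap profile dump\n");
}

/* Call atexit() only if the handler would actually dump a profile, so the
 * default configuration never touches atexit() during initialization.
 * Returns true on error (boolean error convention). */
static bool prof_boot_sketch(void) {
    if (opt_prof && opt_prof_final) {
        if (atexit(prof_final_dump) != 0)
            return true;
        final_dump_registered = true;
    }
    return false;
}
```

With the defaults, no handler is registered. Enabling both options registers exactly one handler.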
2014-10-08 18:08:00 -07:00
Jason Evans
3a8b9b1fd9 Fix a recursive lock acquisition regression.
Fix a recursive lock acquisition regression, which was introduced by
8bb3198f72 (Refactor/fix arenas
manipulation.).
2014-10-08 00:54:16 -07:00
Daniel Micay
f22214a29d Use regular arena allocation for huge tree nodes.
This avoids grabbing the base mutex, as a step towards fine-grained
locking for huge allocations. The thread cache also provides a tiny
(~3%) improvement for serial huge allocations.
2014-10-07 23:57:09 -07:00
Jason Evans
8bb3198f72 Refactor/fix arenas manipulation.
Abstract arenas access to use arena_get() (or a0get() where appropriate)
rather than directly reading e.g. arenas[ind].  Prior to the addition of
the arenas.extend mallctl, the worst possible outcome of directly
accessing arenas was a stale read, but arenas.extend may allocate and
assign a new array to arenas.

Add a tsd-based arenas_cache, which amortizes arenas reads.  This
introduces some subtle bootstrapping issues, with tsd_boot() now being
split into tsd_boot[01]() to support tsd wrapper allocation
bootstrapping, as well as an arenas_cache_bypass tsd variable which
dynamically terminates allocation of arenas_cache itself.

Promote a0malloc(), a0calloc(), and a0free() to be generally useful for
internal allocation, and use them in several places (more may be
appropriate).

Abstract arena->nthreads management and fix a missing decrement during
thread destruction (recent tsd refactoring left arenas_cleanup()
unused).

Change arena_choose() to propagate OOM, and handle OOM in all callers.
This is important for providing consistent allocation behavior when the
MALLOCX_ARENA() flag is being used.  Prior to this fix, it was possible
for an OOM to result in allocation silently allocating from a different
arena than the one specified.
2014-10-07 23:14:57 -07:00
Jason Evans
bf40641c5c Fix a prof_tctx_t destruction race. 2014-10-06 16:35:11 -07:00
Jason Evans
155bfa7da1 Normalize size classes.
Normalize size classes to use the same number of size classes per size
doubling (currently hard coded to 4) across the entire range of size
classes.  Small size classes already used this spacing, but in order to
support this change, additional small size classes now fill [4 KiB .. 16
KiB).  Large size classes range from [16 KiB .. 4 MiB).  Huge size
classes now support non-multiples of the chunk size in order to fill (4
MiB .. 16 MiB).
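The normalized spacing can be sketched as a rounding function. This is a minimal illustration rather than jemalloc's code. It assumes a 16-byte quantum and four classes per doubling, and it ignores tiny (sub-quantum) classes. `size_class_ceil()` and `lg_floor()` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

#define LG_QUANTUM 4   /* assumed 16-byte quantum */
#define LG_GROUP   2   /* assumed 2^2 = 4 size classes per doubling */

/* floor(log2(x)) for x > 0 */
static size_t lg_floor(size_t x) {
    size_t r = 0;
    while (x >>= 1)
        r++;
    return r;
}

/* Round size (> 0) up to its size class. */
static size_t size_class_ceil(size_t size) {
    size_t quantum = (size_t)1 << LG_QUANTUM;
    /* Up to 4 * quantum, classes are simply quantum-spaced: 16, 32, 48, 64. */
    if (size <= (quantum << LG_GROUP))
        return (size + quantum - 1) & ~(quantum - 1);
    /* Otherwise size - 1 lies in [2^lg, 2^(lg+1)), and that doubling is split
     * into four equal steps of 2^(lg - 2) bytes. */
    size_t lg = lg_floor(size - 1);
    size_t delta = (size_t)1 << (lg - LG_GROUP);
    return (size + delta - 1) & ~(delta - 1);
}
```

For example, the four classes in (4 KiB, 8 KiB] come out as 5 KiB, 6 KiB, 7 KiB, and 8 KiB.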
2014-10-06 01:45:13 -07:00
Daniel Micay
a95018ee81 Attempt to expand huge allocations in-place.
This adds support for expanding huge allocations in-place by requesting
memory at a specific address from the chunk allocator.

It's currently only implemented for the chunk recycling path, although
in theory it could also be done by optimistically allocating new chunks.
On Linux, it could attempt an in-place mremap. However, that won't work
in practice since the heap is grown downwards and memory is not unmapped
(in a normal build, at least).

Repeated vector reallocation micro-benchmark:

    #include <string.h>
    #include <stdlib.h>

    int main(void) {
        for (size_t i = 0; i < 100; i++) {
            void *ptr = NULL;
            size_t old_size = 0;
            for (size_t size = 4; size < (1 << 30); size *= 2) {
                ptr = realloc(ptr, size);
                if (!ptr) return 1;
                memset((char *)ptr + old_size, 0xff, size - old_size);
                old_size = size;
            }
            free(ptr);
        }
    }

The glibc allocator fails to do any in-place reallocations on this
benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it
elides the cost of copies via mremap, which is currently not something
that jemalloc can use.

With this improvement, jemalloc still fails to do any in-place huge
reallocations for the first outer loop, but then succeeds 100% of the
time for the remaining 99 iterations. The time spent doing allocations
and copies drops down to under 5%, with nearly all of it spent doing
purging + faulting (when huge pages are disabled) and the array memset.

An improved mremap API (MREMAP_RETAIN - #138) would be far more general
but this is a portable optimization and would still be useful on Linux
for xallocx.

Numbers with transparent huge pages enabled:

glibc (copies elided via MREMAP_MAYMOVE): 8.471s

jemalloc: 17.816s
jemalloc + no-op madvise: 13.236s

jemalloc + this commit: 6.787s
jemalloc + this commit + no-op madvise: 6.144s

Numbers with transparent huge pages disabled:

glibc (copies elided via MREMAP_MAYMOVE): 15.403s

jemalloc: 39.456s
jemalloc + no-op madvise: 12.768s

jemalloc + this commit: 15.534s
jemalloc + this commit + no-op madvise: 6.354s

Closes #137
2014-10-05 14:47:01 -07:00
Jason Evans
f11a6776c7 Fix OOM-related regression in arena_tcache_fill_small().
Fix an OOM-related regression in arena_tcache_fill_small() that caused
cache corruption that would almost certainly expose the application to
undefined behavior, usually in the form of an allocation request
returning an already-allocated region, or somewhat less likely, a freed
region that had already been returned to the arena, thus making it
available to the arena for any purpose.

This regression was introduced by
9c43c13a35 (Reverse tcache fill order.),
and was present in all releases from 2.2.0 through 3.6.0.

This resolves #98.
2014-10-05 13:05:10 -07:00
Jason Evans
f04a0bef99 Fix prof regressions.
Fix prof regressions related to tdata (main per thread profiling data
structure) destruction:
- Deadlock.  The fix for this was intended to be part of
  20c31deaae (Test prof.reset mallctl and
  fix numerous discovered bugs.) but the fix was left incomplete.
- Destruction race.  Detaching tdata just prior to destruction without
  holding the tdatas lock made it possible for another thread to destroy
  the tdata out from under the thread that was on its way to doing so.
2014-10-04 15:03:49 -07:00
Jason Evans
0800afd03f Silence a compiler warning. 2014-10-04 14:59:17 -07:00
Jason Evans
029d44cf8b Fix tsd cleanup regressions.
Fix tsd cleanup regressions that were introduced in
5460aa6f66 (Convert all tsd variables to
reside in a single tsd structure.).  These regressions were twofold:

1) tsd_tryget() should never (and need never) return NULL.  Rename it to
   tsd_fetch() and simplify all callers.
2) tsd_*_set() must only be called when tsd is in the nominal state,
   because cleanup happens during the nominal-->purgatory transition,
   and re-initialization must not happen while in the purgatory state.
   Add tsd_nominal() and use it as needed.  Note that tsd_*{p,}_get()
   can still be used as long as no re-initialization that would require
   cleanup occurs.  This means that e.g. the thread_allocated counter
   can be updated unconditionally.
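The nominal/purgatory rule can be modeled as a small state machine. This is a hypothetical sketch, not jemalloc's tsd implementation. `tsd_sketch_t`, `tsd_arena_set()`, and `tsd_cleanup()` are illustrative names. Two fields stand in for the real tsd contents: one that requires cleanup and one that does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    TSD_STATE_UNINITIALIZED,
    TSD_STATE_NOMINAL,
    TSD_STATE_PURGATORY,    /* cleanup has run; must not re-initialize */
    TSD_STATE_REINCARNATED
} tsd_state_t;

typedef struct {
    tsd_state_t state;
    unsigned long thread_allocated;  /* needs no cleanup: always updatable */
    void *arena;                     /* needs cleanup: set only when nominal */
} tsd_sketch_t;

static bool tsd_nominal(const tsd_sketch_t *tsd) {
    return tsd->state == TSD_STATE_NOMINAL;
}

/* Setting a field that would require cleanup is only safe in the nominal
 * state; returns false if the update was refused. */
static bool tsd_arena_set(tsd_sketch_t *tsd, void *arena) {
    if (!tsd_nominal(tsd))
        return false;
    tsd->arena = arena;
    return true;
}

/* Cleanup happens on the nominal --> purgatory transition. */
static void tsd_cleanup(tsd_sketch_t *tsd) {
    tsd->arena = NULL;
    tsd->state = TSD_STATE_PURGATORY;
}
```

After cleanup, the sketch refuses to set `arena`, but the counter can still be bumped, which mirrors the thread_allocated remark above.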
2014-10-04 11:22:55 -07:00
Jason Evans
fc12c0b8bc Implement/test/fix prof-related mallctl's.
Implement/test/fix the opt.prof_thread_active_init,
prof.thread_active_init, and thread.prof.active mallctl's.

Test/fix the thread.prof.name mallctl.

Refactor opt_prof_active to be read-only and move mutable state into the
prof_active variable.  Stop leaning on ctl-related locking for
protection.
2014-10-03 23:25:30 -07:00
Jason Evans
551ebc4364 Convert to uniform style: cond == false --> !cond 2014-10-03 10:16:09 -07:00
Jason Evans
20c31deaae Test prof.reset mallctl and fix numerous discovered bugs. 2014-10-02 23:01:10 -07:00
Daniel Micay
f8034540a1 Implement in-place huge allocation shrinking.
Trivial example:

    #include <stdlib.h>

    int main(void) {
        void *ptr = malloc(1024 * 1024 * 8);
        if (!ptr) return 1;
        ptr = realloc(ptr, 1024 * 1024 * 4);
        if (!ptr) return 1;
    }

Before:

    mmap(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcfff000000
    mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcffec00000
    madvise(0x7fcfff000000, 8388608, MADV_DONTNEED) = 0

After:

    mmap(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1934800000
    madvise(0x7f1934c00000, 4194304, MADV_DONTNEED) = 0

Closes #134
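The "After" trace shows the shrink keeping the original mapping and purging only its trailing pages. A minimal POSIX sketch of that idea follows. It assumes that madvise(MADV_DONTNEED) is an acceptable purge and that the new size is page aligned (jemalloc's own boundaries were chunk aligned). `huge_shrink_in_place()` is a hypothetical name, and all chunk bookkeeping is omitted:

```c
#define _DEFAULT_SOURCE  /* expose MAP_ANONYMOUS and madvise() on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Shrink [ptr, ptr + old_size) to new_size bytes without moving it by
 * discarding the physical pages behind the tail. Returns 0 on success. */
static int huge_shrink_in_place(void *ptr, size_t old_size, size_t new_size) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (new_size >= old_size || new_size % page != 0)
        return -1;
    return madvise((char *)ptr + new_size, old_size - new_size, MADV_DONTNEED);
}
```

The pointer returned to the caller stays the same, so no copy is needed. The mapping itself is not shortened, which matches the single madvise() call in the trace.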
2014-10-01 16:55:03 -07:00
Dave Rigby
e3a16fce5e Mark malloc_conf as a weak symbol
This fixes issue #113 - je_malloc_conf is not respected on OS X
2014-09-29 15:05:55 -07:00
Jason Evans
0c5dd03e88 Move small run metadata into the arena chunk header.
Move small run metadata into the arena chunk header, with multiple
expected benefits:
- Lower run fragmentation due to reduced run sizes; runs are more likely
  to completely drain when there are fewer total regions.
- Improved cache behavior.  Prior to this change, run headers were
  always page-aligned, which put extra pressure on some CPU cache sets.
  The degree to which this was a problem was hardware dependent, but it
  likely hurt performance somewhat even on the most advanced modern hardware.
- Buffer overruns/underruns are less likely to corrupt allocator
  metadata.
- Size classes between 4 KiB and 16 KiB become reasonable to support
  without any special handling, and the runs are small enough that dirty
  unused pages aren't a significant concern.
2014-09-29 01:31:39 -07:00
Jason Evans
f97e5ac4ec Implement compile-time bitmap size computation. 2014-09-28 14:43:11 -07:00
Jason Evans
6ef80d68f0 Fix profile dumping race.
Fix a race that caused a non-critical assertion failure.  To trigger the
race, a thread had to be part way through initializing a new sample,
such that it was discoverable by the dumping thread, but not yet linked
into its gctx by the time a later dump phase would normally have reset
its state to 'nominal'.

Additionally, lock access to the state field during modification to
transition to the dumping state.  It's not apparent that this oversight
could have caused an actual problem due to outer locking that protects
the dumping machinery, but the added locking pedantically follows the
stated locking protocol for the state field.
2014-09-24 22:23:43 -07:00
Jason Evans
5460aa6f66 Convert all tsd variables to reside in a single tsd structure. 2014-09-23 02:36:08 -07:00
Jason Evans
9d8f3d2033 Fix prof regressions.
Don't use atomic_add_uint64(), because it isn't available on 32-bit
platforms.

Fix forking support functions to manage all prof-related mutexes.

These regressions were introduced by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-11 18:09:14 -07:00
Jason Evans
c3e9e7b041 Fix irallocx_prof() sample logic.
Fix irallocx_prof() sample logic to only update the threshold counter
after it knows what size the allocation ended up being.  This regression
was caused by 6e73dc194e (Fix a profile
sampling race.), which did not make it into any releases prior to this
fix.
2014-09-11 17:04:03 -07:00
Jason Evans
9c640bfdd4 Apply likely()/unlikely() to allocation/deallocation fast paths. 2014-09-11 17:01:58 -07:00
Jason Evans
91566fc079 Fix mallocx() to always honor MALLOCX_ARENA() when profiling. 2014-09-11 13:15:33 -07:00
Daniel Micay
23fdf8b359 mark some conditions as unlikely
* assertion failure
* malloc_init failure
* malloc not already initialized (in malloc_init)
* running in valgrind
* thread cache disabled at runtime

Clang and GCC already consider a comparison with NULL or -1 to be cold,
so many branches (out-of-memory) are already correctly considered as
cold and marking them is not important.
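Such hints are conventionally spelled with GCC/Clang's `__builtin_expect()`. The sketch below is minimal and assumes a GCC-compatible compiler. `malloc_sketch()` and `malloc_initialized` are hypothetical stand-ins for an allocator entry point and its init flag:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Branch hints: the condition is expected to be true / false. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static int malloc_initialized = 0;

static int malloc_init_sketch(void) {
    /* Initialization happens once, so the "not yet initialized" branch is cold. */
    if (unlikely(!malloc_initialized))
        malloc_initialized = 1;
    return 0;
}

static void *malloc_sketch(size_t size) {
    /* Initialization failure is also a cold path. */
    if (unlikely(malloc_init_sketch() != 0))
        return NULL;
    return malloc(size);
}
```

The hints only affect code layout and static branch prediction. They do not change behavior.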
2014-09-10 21:49:42 -04:00
Jason Evans
6e73dc194e Fix a profile sampling race.
Fix a profile sampling race that was due to preparing to sample, yet
doing nothing to assure that the context remains valid until the stats
are updated.

These regressions were caused by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-09 19:47:09 -07:00
Jason Evans
6fd53da030 Fix prof_tdata_get()-related regressions.
Fix prof_tdata_get() to avoid dereferencing an invalid tdata pointer
(when it's PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).

Fix prof_tdata_get() callers to check for invalid results besides NULL
(PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).

These regressions were caused by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-09 15:29:34 -07:00
Jason Evans
a2260c95cd Fix sdallocx() assertion.
Refactor sdallocx() and nallocx() to share inallocx(), and fix an
sdallocx() assertion to check usize rather than size.
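The difference between checking size and checking usize shows up when a caller passes its original request size. In this hypothetical sketch, classes are assumed to be 16-byte multiples. Real classes are coarser at larger sizes. `usize_of()` and `sized_free_size_ok()` are illustrative names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed rounding: every size class is a multiple of 16 bytes. */
static size_t usize_of(size_t size) {
    return (size + 15) & ~(size_t)15;
}

/* A caller may pass the size it requested, which can be smaller than the
 * usable size backing the allocation, so compare usizes, not raw sizes. */
static bool sized_free_size_ok(size_t passed_size, size_t actual_usize) {
    /* A check of passed_size == actual_usize would reject valid (100, 112). */
    return usize_of(passed_size) == actual_usize;
}
```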
2014-09-09 10:39:15 -07:00
Daniel Micay
4cfe55166e Add support for sized deallocation.
This adds a new `sdallocx` function to the external API, allowing the
size to be passed by the caller.  It avoids some extra reads in the
thread cache fast path.  In the case where stats are enabled, this
avoids the work of calculating the size from the pointer.

An assertion validates the size that's passed in, so enabling debugging
will allow users of the API to debug cases where an incorrect size is
passed in.

The performance win for a contrived microbenchmark doing an allocation
and immediately freeing it is ~10%.  It may have a different impact on a
real workload.

Closes #28
2014-09-08 17:34:24 -07:00
Jason Evans
b718cf77e9 Optimize [nmd]alloc() fast paths.
Optimize [nmd]alloc() fast paths such that the (flags == 0) case is
streamlined, flags decoding only happens to the minimum degree
necessary, and no conditionals are repeated.
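A fast path that streamlines the flags == 0 case decodes only what a nonzero flags word requires. The sketch below is hypothetical and uses a flag layout loosely modeled on mallocx() flags. Its exact bit assignments are assumptions: bits 0-5 hold lg(alignment), bit 6 requests zeroing, and bits 8 and up hold arena index + 1. The `SKETCH_*` macros and `decode_flags()` are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_LG_ALIGN_MASK 0x3f
#define SKETCH_ZERO          0x40
#define SKETCH_ARENA(a)      ((int)(((unsigned)(a) + 1) << 8))

typedef struct {
    size_t alignment;  /* 0: default alignment */
    bool zero;
    int arena_ind;     /* -1: choose an arena automatically */
} decoded_flags_t;

static decoded_flags_t decode_flags(int flags) {
    decoded_flags_t d = {0, false, -1};
    if (flags == 0)    /* common case: nothing to decode */
        return d;
    int lg_align = flags & SKETCH_LG_ALIGN_MASK;
    if (lg_align != 0)
        d.alignment = (size_t)1 << lg_align;
    d.zero = (flags & SKETCH_ZERO) != 0;
    if ((flags >> 8) != 0)
        d.arena_ind = (flags >> 8) - 1;  /* undo the +1 bias */
    return d;
}
```

Biasing the arena index by one lets a zero flags word mean "all defaults", so the common case needs a single comparison.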
2014-09-07 14:40:19 -07:00
Jason Evans
c21b05ea09 Whitespace cleanups. 2014-09-04 22:27:26 -07:00
Qinfan Wu
ff6a31d3b9 Refactor chunk map.
Break the chunk map into two separate arrays, in order to improve cache
locality. This is related to issue #23.
2014-09-04 22:22:52 -07:00
Qinfan Wu
58799f6d1c Remove junk filling in tcache_bin_flush_small().
Junk filling is done in arena_dalloc_bin_locked(), so arena_alloc_junk_small()
is redundant. Also, we should use arena_dalloc_junk_small() instead of
arena_alloc_junk_small().
2014-08-26 21:28:31 -07:00
Sara Golemon
3e24afa28e Test for availability of malloc hooks via autoconf
__*_hook() is glibc, but on at least one glibc platform (homebrew),
the __GLIBC__ define isn't set correctly and we miss being able to
use these hooks.

Do a feature test for it during configuration so that we enable it
anywhere the hooks are actually available.
2014-08-22 15:19:21 -07:00
Jason Evans
602c8e0971 Implement per thread heap profiling.
Rename data structures (prof_thr_cnt_t-->prof_tctx_t,
prof_ctx_t-->prof_gctx_t), and convert to storing a prof_tctx_t for
sampled objects.

Convert PROF_ALLOC_PREP() to prof_alloc_prep(), since precise backtrace
depth within jemalloc functions is no longer an issue (pprof prunes
irrelevant frames).

Implement mallctl's:
- prof.reset implements full sample data reset, and optional change of
  sample interval.
- prof.lg_sample reads the current sample interval (opt.lg_prof_sample
  was the permanent source of truth prior to prof.reset).
- thread.prof.name provides naming capability for threads within heap
  profile dumps.
- thread.prof.active makes it possible to activate/deactivate heap
  profiling for individual threads.

Modify the heap dump files to contain per thread heap profile data.
This change is incompatible with the existing pprof, which will require
enhancements to read and process the enriched data.
2014-08-19 21:31:16 -07:00
Jason Evans
3a81cbd2d4 Dump heap profile backtraces in a stable order.
Also iterate over per thread stats in a stable order, which prepares the
way for stable ordering of per thread heap profile dumps.
2014-08-19 21:05:54 -07:00
Jason Evans
ab532e9799 Directly embed prof_ctx_t's bt. 2014-08-19 21:05:54 -07:00
Jason Evans
b41ccdb125 Convert prof_tdata_t's bt2cnt to a comprehensive map.
Treat prof_tdata_t's bt2cnt as a comprehensive map of the thread's
extant allocation samples (do not limit the total number of entries).
This helps prepare the way for per thread heap profiling.
2014-08-19 21:05:54 -07:00
Jason Evans
586c8ede42 Fix arena.<i>.dss mallctl to handle read-only calls. 2014-08-15 12:20:20 -07:00
Jason Evans
070b3c3fbd Fix and refactor runs_dirty-based purging.
Fix runs_dirty-based purging to also purge dirty pages in the spare
chunk.

Refactor runs_dirty manipulation into arena_dirty_{insert,remove}(), and
move the arena->ndirty accounting into those functions.

Remove the u.ql_link field from arena_chunk_map_t, and get rid of the
enclosing union for u.rb_link, since only rb_link remains.

Remove the ndirty field from arena_chunk_t.
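Routing all runs_dirty manipulation through insert/remove helpers keeps the page count consistent by construction. The sketch below is minimal and uses hypothetical, heavily simplified types (`run_t`, `arena_sketch_t`). Runs are appended to a circular doubly linked list, so purging can start from the head (the oldest runs), and only the two helpers touch `ndirty`:

```c
#include <assert.h>
#include <stddef.h>

typedef struct run_s {
    struct run_s *prev, *next;
    size_t npages;           /* dirty pages in this run */
} run_t;

typedef struct {
    run_t runs_dirty;        /* sentinel of a circular list */
    size_t ndirty;           /* maintained only by insert/remove below */
} arena_sketch_t;

static void arena_init(arena_sketch_t *a) {
    a->runs_dirty.prev = a->runs_dirty.next = &a->runs_dirty;
    a->ndirty = 0;
}

/* Append at the tail so the oldest dirty runs stay at the head. */
static void arena_dirty_insert(arena_sketch_t *a, run_t *r) {
    r->prev = a->runs_dirty.prev;
    r->next = &a->runs_dirty;
    r->prev->next = r;
    a->runs_dirty.prev = r;
    a->ndirty += r->npages;
}

static void arena_dirty_remove(arena_sketch_t *a, run_t *r) {
    r->prev->next = r->next;
    r->next->prev = r->prev;
    r->prev = r->next = NULL;
    a->ndirty -= r->npages;
}
```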
2014-08-14 14:45:58 -07:00
Qinfan Wu
e8a2fd83a2 arena->npurgatory is no longer needed since we drop arena's lock
after stashing all the purgeable runs.
2014-08-12 09:50:01 -07:00
Qinfan Wu
90737fcda1 Remove chunks_dirty tree, nruns_avail and nruns_adjac since we no
longer need to maintain the tree for dirty page purging.
2014-08-12 09:50:00 -07:00
Qinfan Wu
e970800c78 Purge dirty pages from the beginning of the dirty list. 2014-08-12 09:50:00 -07:00
Qinfan Wu
a244e5078e Add dirty page counting for debug 2014-08-12 09:50:00 -07:00
Qinfan Wu
04d60a132b Maintain all the dirty runs in a linked list for each arena 2014-08-12 09:50:00 -07:00
Jason Evans
1522937e9c Fix the cactive statistic.
Fix the cactive statistic to decrease (rather than increase) when active
memory decreases.  This regression was introduced by
aa5113b1fd (Refactor overly large/complex
functions) and first released in 3.5.0.
2014-08-06 23:43:39 -07:00
Qinfan Wu
ea73eb8f3e Reintroduce the comment that was removed in f9ff603. 2014-08-06 16:43:01 -07:00
Qinfan Wu
55c9aa1038 Fix a bug that prevented allocating the free run with the lowest address. 2014-08-06 16:10:08 -07:00
Mike Hommey
6f533c1903 Ensure the default purgeable zone is after the default zone on OS X 2014-06-10 07:18:29 -07:00
Richard Diamond
994fad9bda Add check for madvise(2) to configure.ac.
Some platforms, such as Google's Portable Native Client, use Newlib and
thus lack access to madvise(2).  In those instances, pages_purge() is
transformed into a no-op.
2014-06-03 09:32:49 -07:00
Chris Peterson
70807bc54b Fix -Wsometimes-uninitialized warnings 2014-06-02 07:53:52 -07:00
Chris Peterson
3e310b34eb Fix -Wsign-compare warnings 2014-06-02 07:51:33 -07:00
Richard Diamond
94ed6812bc Don't catch fork()ing events for Native Client.
Native Client doesn't allow forking, thus there is no need to catch
fork()ing events for Native Client.

Additionally, without this commit, jemalloc will introduce an unresolved
pthread_atfork() in PNaCl Rust bins.
2014-06-02 07:45:33 -07:00
Richard Diamond
9c3a10fdf6 Try to use __builtin_ffsl if ffsl is unavailable.
Some platforms (like those using Newlib) don't have ffs/ffsl.  This
commit adds a check to configure.ac for __builtin_ffsl if ffsl isn't
found.  __builtin_ffsl performs the same function as ffsl, and has the
added benefit of being available on any platform utilizing a
GCC-compatible compiler.

This change does not address the use of ffs in the MALLOCX_ARENA()
macro.
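The configure-time fallback can be sketched as a macro switch. This is a minimal example. `HAVE_FFSL` stands in for whatever symbol configure defines, and `sketch_ffsl` / `lowest_set_bit()` are hypothetical names. When the macro is absent, the GCC/Clang builtin is used:

```c
#include <assert.h>

/* Hypothetical configure result: HAVE_FFSL is defined when libc has ffsl(). */
#ifdef HAVE_FFSL
#  include <strings.h>
#  define sketch_ffsl(x) ffsl(x)
#else
#  define sketch_ffsl(x) __builtin_ffsl(x)
#endif

/* 1-based index of the least significant set bit; 0 if no bit is set. */
static int lowest_set_bit(long bits) {
    return sketch_ffsl(bits);
}
```

Both paths share ffsl()'s contract, so callers need not know which one was selected.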
2014-06-02 07:44:50 -07:00
Jason Evans
d04047cc29 Add size class computation capability.
Add size class computation capability, currently used only as validation
of the size class lookup tables.  Generalize the size class spacing used
for bins, for eventual use throughout the full range of allocation
sizes.
2014-05-28 21:06:46 -07:00
Jason Evans
e2deab7a75 Refactor huge allocation to be managed by arenas.
Refactor huge allocation to be managed by arenas (though the global
red-black tree of huge allocations remains for lookup during
deallocation).  This is the logical conclusion of recent changes that 1)
made per arena dss precedence apply to huge allocation, and 2) made it
possible to replace the per arena chunk allocation/deallocation
functions.

Remove the top level huge stats, and replace them with per arena huge
stats.

Normalize function names and types to *dalloc* (some were *dealloc*).

Remove the --enable-mremap option.  As jemalloc currently operates, this
is a performance regression for some applications, but planned work to
logarithmically space huge size classes should provide similar amortized
performance.  The motivation for this change was that mremap-based huge
reallocation forced leaky abstractions that prevented refactoring.
2014-05-15 22:36:41 -07:00
aravind
fb7fe50a88 Add support for user-specified chunk allocators/deallocators.
Add new mallctl endpoints "arena.<i>.chunk.alloc" and
"arena.<i>.chunk.dealloc" to allow userspace to configure
jemalloc's chunk allocator and deallocator on a per-arena
basis.
2014-05-12 10:46:03 -07:00
Jason Evans
a344dd01c7 Fix coding style nits. 2014-05-01 15:51:30 -07:00
Jason Evans
6f001059aa Simplify backtracing.
Simplify backtracing to not ignore any frames, and compensate for this
in pprof in order to increase flexibility with respect to function-based
refactoring even in the presence of non-deterministic inlining.  Modify
pprof to blacklist all jemalloc allocation entry points including
non-standard ones like mallocx(), and ignore all allocator-internal
frames.  Prior to this change, pprof excluded the specifically
blacklisted functions from backtraces, but it left allocator-internal
frames intact.
2014-04-22 20:55:09 -07:00
Lucian Adrian Grijincu
9d4e13f45a prof_backtrace: use unw_backtrace
unw_backtrace:
- does internal per-thread caching
- doesn't acquire an internal lock
2014-04-22 18:39:47 -07:00
Jason Evans
3541a904d6 Refactor small_size2bin and small_bin2size.
Refactor small_size2bin and small_bin2size to be inline functions rather
than directly accessed arrays.
2014-04-16 17:14:33 -07:00
Jason Evans
3e3caf03af Merge pull request #73 from bmaurer/smallmalloc
Smaller malloc hot path
2014-04-16 16:33:21 -07:00
Ben Maurer
021136ce4d Create a const array with only a small bin to size map 2014-04-16 14:31:24 -07:00
Ben Maurer
6c39f9e059 refactor profiling: only use a bytes-until-next-sample variable. 2014-04-16 13:43:30 -07:00
Ben Maurer
a7619b7fa5 outline rare tcache_get codepaths 2014-04-16 13:36:56 -07:00
Jason Evans
bd87b01999 Optimize Valgrind integration.
Forcefully disable tcache if running inside Valgrind, and remove
Valgrind calls in tcache-specific code.

Restructure Valgrind-related code to move most Valgrind calls out of the
fast path functions.

Take advantage of static knowledge to elide some branches in
JEMALLOC_VALGRIND_REALLOC().
2014-04-15 16:49:57 -07:00
Jason Evans
ecd3e59ca3 Remove the "opt.valgrind" mallctl.
Remove the "opt.valgrind" mallctl because it is unnecessary -- jemalloc
automatically detects whether it is running inside valgrind.
2014-04-15 14:33:50 -07:00
Jason Evans
a2c719b374 Remove the "arenas.purge" mallctl.
Remove the "arenas.purge" mallctl, which was obsoleted by the
"arena.<i>.purge" mallctl in 3.1.0.
2014-04-15 12:46:28 -07:00
Jason Evans
4d434adb14 Make dss non-optional, and fix an "arena.<i>.dss" mallctl bug.
Make dss non-optional on all platforms which support sbrk(2).

Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
"secondary" precedence is specified, but sbrk(2) is not supported.
2014-04-15 12:09:48 -07:00
Jason Evans
9790b9667f Remove the *allocm() API, which is superseded by the *allocx() API. 2014-04-14 22:32:31 -07:00
Jason Evans
9b0cbf0850 Remove support for non-prof-promote heap profiling metadata.
Make promotion of sampled small objects to large objects mandatory, so
that profiling metadata can always be stored in the chunk map, rather
than requiring one pointer per small region in each small-region page
run.  In practice the non-prof-promote code was only useful when using
jemalloc to track all objects and report them as leaks at program exit.
However, Valgrind is at least as good a tool for this particular use
case.

Furthermore, the non-prof-promote code is getting in the way of
some optimizations that will make heap profiling much cheaper for the
predominant use case (sampling a small representative proportion of all
allocations).
2014-04-11 14:24:51 -07:00
Jason Evans
f4e026f525 Merge pull request #70 from bmaurer/bitsplitrefactor
refactoring for bits splitting
2014-04-10 13:02:28 -07:00
Ben Maurer
f9ff60346d refactoring for bits splitting 2014-04-10 12:43:54 -07:00
Ben Maurer
be8e59f5a6 Don't dereference chunk->arena in free() hot path
When you call free() we load chunk->arena even though that
data isn't used on the tcache hot path.

In profiling some FB applications, I found that ~30% of the
dTLB misses in the free() function come from this line. With
4 MB chunks, the arena_chunk_t->map is ~32 KB (1024 pages
in the chunk, four 8-byte pointers in arena_chunk_map_t). This
means there's only a 1/8 chance of the page containing
chunk->arena also containing the map bits.
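The 1/8 figure follows from simple arithmetic, shown here as a small worked computation. It assumes 4 KiB pages, 4 MiB chunks, and a map entry of four 8-byte fields, as stated above. The macro and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LG_PAGE_SKETCH  12                          /* 4 KiB pages */
#define CHUNK_SIZE      ((size_t)4 << 20)           /* 4 MiB chunks */
#define MAP_ENTRY_SIZE  (4 * sizeof(uint64_t))      /* four 8-byte fields */

/* One map entry per page in the chunk. */
static size_t map_bytes(void) {
    return (CHUNK_SIZE >> LG_PAGE_SKETCH) * MAP_ENTRY_SIZE;
}

/* Number of pages the map spans; a given entry lands on one of them. */
static size_t map_pages(void) {
    return map_bytes() >> LG_PAGE_SKETCH;
}
```

1024 entries of 32 bytes span 32 KiB, or 8 pages. A lookup therefore touches the same page as chunk->arena only about 1 time in 8, and each chunk->arena load is likely an extra page (and TLB entry).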
2014-04-05 15:59:08 -07:00
Jason Evans
9480a23005 Merge pull request #59 from HarryWeppner/dev
FreeBSD memory (leak) profiling support
2014-03-29 16:47:08 -07:00
Jason Evans
57fb8e94ae Merge pull request #61 from mxw/huge-dss-prec
Use arena dss prec instead of default for huge allocs.
2014-03-28 14:48:56 -07:00
Harald Weppner
c2da2591be Consistently use debug lib(s) if present
Fixes a situation where nm uses the debug lib but
addr2line does not, which completely messes up the symbol
lookup.
2014-03-28 13:47:59 -07:00
Max Wang
fbb31029a5 Use arena dss prec instead of default for huge allocs.
Pass a dss_prec_t parameter to huge_{m,p,r}alloc instead of defaulting
to the chunk dss prec.
2014-03-28 13:43:58 -07:00
Chris Pride
20a8c78bfe Fix a crashing case where arena_chunk_init_hard returns NULL.
arena_chunk_init_hard returns NULL when it fails to allocate a new chunk,
and arena_chunk_alloc then passes that NULL into arena_avail_insert
without any checks. This causes a crash when arena_avail_insert tries
to check chunk->ndirty.

This was introduced by the refactoring of arena_chunk_alloc
which previously would have returned NULL immediately after
calling chunk_alloc. This is now the return from
arena_chunk_init_hard so we need to check that return, and
not continue if it was NULL.
2014-03-25 22:36:05 -07:00
Harald Weppner
bf543df20c Enable profiling / leak detection in FreeBSD
* Assumes procfs is mounted at /proc, cf.
  <http://www.freebsd.org/doc/en/articles/linux-users/procfs.html>
2014-03-17 23:53:00 -07:00
Jason Evans
940fdfd5ee Fix junk filling for mremap(2)-based huge reallocation.
If mremap(2) is used for huge reallocation, physical pages are mapped to
new virtual addresses rather than data being copied to new pages.  This
bypasses the normal junk filling that would happen during allocation, so
add junk filling that is specific to this case.
2014-02-25 12:37:25 -08:00
Erwan Legrand
69e9fbb9c1 Fix typo 2014-02-14 12:48:58 +01:00
Jason Evans
0c4e743eaf Test and fix malloc_printf("%%"). 2014-01-22 09:00:27 -08:00
Jason Evans
e2206edebc Fix unused variable warnings. 2014-01-21 14:59:13 -08:00
Jason Evans
772163b4f3 Add heap profiling tests.
Fix a regression in prof_dump_ctx() due to an uninitialized variable.  This
was caused by revision 4f37ef693e, so no
releases are affected.
2014-01-17 15:40:52 -08:00
Jason Evans
eefdd02e70 Fix a variable prototype/definition mismatch. 2014-01-16 18:04:30 -08:00
Jason Evans
4f37ef693e Refactor prof_dump() to reduce contention.
Refactor prof_dump() to use a two pass algorithm, and prof_leave() prior
to the second pass.  This avoids write(2) system calls while holding
critical prof resources.

Fix prof_dump() to close the dump file descriptor for all relevant error
paths.

Minimize the size of prof-related static buffers when prof is disabled.
This saves roughly 65 KiB of application memory for non-prof builds.

Refactor prof_ctx_init() out of prof_lookup_global().
2014-01-16 13:36:38 -08:00
Jason Evans
fb1775e47e Refactor prof_lookup() by extracting prof_lookup_global(). 2014-01-14 17:04:34 -08:00
Jason Evans
aa5113b1fd Refactor overly large/complex functions.
Refactor overly large functions by breaking out helper functions.

Refactor overly complex multi-purpose functions into separate more
specific functions.
2014-01-14 16:23:03 -08:00
Jason Evans
b2c31660be Extract profiling code from [re]allocation functions.
Extract profiling code from malloc(), imemalign(), calloc(), realloc(),
mallocx(), rallocx(), and xallocx().  This slightly reduces the amount
of code compiled into the fast paths, but the primary benefit is the
combinatorial complexity reduction.

Simplify iralloc[t]() by creating a separate ixalloc() that handles the
no-move cases.

Further simplify [mrxn]allocx() (and by implication [mrn]allocm()) to
make request size overflows due to size class and/or alignment
constraints trigger undefined behavior (detected by debug-only
assertions).

Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling
backtrace creation in imemalign().  This bug impacted posix_memalign()
and aligned_alloc().
2014-01-12 15:41:05 -08:00
Jason Evans
6b694c4d47 Add junk/zero filling unit tests, and fix discovered bugs.
Fix growing large reallocation to junk fill new space.

Fix huge deallocation to junk fill when munmap is disabled.
2014-01-07 16:54:17 -08:00
Jason Evans
e18c25d23d Add util unit tests, and fix discovered bugs.
Add unit tests for pow2_ceil(), malloc_strtoumax(), and
malloc_snprintf().

Fix numerous bugs in malloc_strtoumax() error handling/reporting.  These
bugs could have caused application-visible issues for some seldom used
(0X... and 0... prefixes) or malformed MALLOC_CONF or mallctl() argument
strings, but otherwise they had no impact.

Fix numerous bugs in malloc_snprintf().  These bugs were not exercised
by existing malloc_*printf() calls, so they had no impact.
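The 0X... and 0... prefixes mentioned above are what the standard strtoumax() handles when base is 0. Below is a minimal sketch of a checked wrapper that reports each failure mode (no digits, trailing junk, overflow) instead of returning a partial value. `parse_umax()` is hypothetical and is not jemalloc's malloc_strtoumax():

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>

/* Returns true on error; on success stores the parsed value in *out. */
static bool parse_umax(const char *s, uintmax_t *out) {
    char *end;
    errno = 0;
    /* Base 0: "0x"/"0X" selects hex, a leading "0" selects octal. */
    uintmax_t v = strtoumax(s, &end, 0);
    if (errno != 0)        /* e.g. ERANGE on overflow */
        return true;
    if (end == s)          /* no digits at all */
        return true;
    if (*end != '\0')      /* trailing garbage such as "12abc" */
        return true;
    *out = v;
    return false;
}
```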
2014-01-06 20:41:09 -08:00
Jason Evans
b954bc5d3a Convert rtree from (void *) to (uint8_t) storage.
Reduce rtree memory usage by storing booleans (1 byte each) rather than
pointers.  The rtree code is only used to record whether jemalloc manages
a chunk of memory, so there's no need to store pointers in the rtree.

Increase rtree node size to 64 KiB in order to reduce tree depth from 13
to 3 on 64-bit systems.  The conversion to more compact leaf nodes was
enough by itself to make the rtree depth 1 on 32-bit systems; due to the
fact that root nodes are smaller than the specified node size if
possible, the node size change has no impact on 32-bit systems (assuming
default chunk size).
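The 64-bit depth of 3 can be reproduced with a small calculation under stated assumptions. Chunks are 4 MiB, so the low 22 address bits are not part of the key. Leaf entries are 1 byte, interior nodes hold 8-byte child pointers, and every node is 64 KiB (the real root may be smaller). `rtree_depth()` is a hypothetical helper:

```c
#include <assert.h>

/* Levels needed to index key_bits bits, given 2^lg_node-byte nodes,
 * 1-byte leaf entries, and 2^lg_ptr-byte interior pointers. */
static unsigned rtree_depth(unsigned key_bits, unsigned lg_node, unsigned lg_ptr) {
    unsigned leaf_bits = lg_node;              /* 65536 one-byte entries */
    unsigned interior_bits = lg_node - lg_ptr; /* e.g. 8192 pointers */
    if (key_bits <= leaf_bits)
        return 1;
    unsigned remaining = key_bits - leaf_bits;
    return 1 + (remaining + interior_bits - 1) / interior_bits;
}
```

On 64-bit systems the key is 64 - 22 = 42 bits, which is covered exactly by 16 + 13 + 13, giving 3 levels. On 32-bit systems only 10 key bits remain, and one leaf-sized level suffices.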
2014-01-02 17:36:38 -08:00
Jason Evans
b980cc774a Add rtree unit tests. 2014-01-02 16:17:15 -08:00
Jason Evans
0405312921 Fix an uninitialized variable read in xallocx(). 2013-12-20 15:52:01 -08:00
Jason Evans
d8a390020c Fix a few mallctl() documentation errors.
Normalize mallctl() order (code and documentation).
2013-12-19 21:40:41 -08:00
Jason Evans
0d6c5d8bd0 Add quarantine unit tests.
Verify that freed regions are quarantined, and that redzone corruption
is detected.

Introduce a testing idiom for intercepting/replacing internal functions.
In this case the replaced function is ordinarily a static function, but
the idiom should work similarly for library-private functions.
2013-12-17 15:19:12 -08:00
Jason Evans
6e62984ef6 Don't junk-fill reallocations unless usize changes.
Don't junk fill reallocations for which the request size is less than
the current usable size, but not enough smaller to cause a size class
change.  Unlike malloc()/calloc()/realloc(), *allocx() contractually
treats the full usize as the allocation, so a caller can ask for zeroed
memory via mallocx() and a series of rallocx() calls that all specify
MALLOCX_ZERO, and be assured that all newly allocated bytes will be
zeroed and made available to the application without danger of allocator
mutation until the size class decreases enough to cause usize reduction.
2013-12-15 21:57:09 -08:00
Jason Evans
665769357c Optimize arena_prof_ctx_set().
Refactor such that arena_prof_ctx_set() receives usize as an argument,
and use it to determine whether to handle ptr as a small region, rather
than reading the chunk page map.
2013-12-15 21:57:02 -08:00
Jason Evans
d82a5e6a34 Implement the *allocx() API.
Implement the *allocx() API, which is a successor to the *allocm() API.
The *allocx() functions are slightly simpler to use because they have
fewer parameters, they directly return the results of primary interest,
and mallocx()/rallocx() avoid the strict aliasing pitfall that
allocm()/rallocm() share with posix_memalign().  The following code
violates strict aliasing rules:

    foo_t *foo;
    allocm((void **)&foo, NULL, 42, 0);

whereas the following is safe:

    foo_t *foo;
    void *p;
    allocm(&p, NULL, 42, 0);
    foo = (foo_t *)p;

mallocx() does not have this problem:

    foo_t *foo = (foo_t *)mallocx(42, 0);
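posix_memalign() has the same out-parameter shape as allocm(), so the same safe idiom applies: receive the result in a genuine void * and convert the value afterwards. A minimal sketch using only the standard POSIX call; the contents of foo_t and the alloc_foo() helper are hypothetical:

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

typedef struct {
    int x;      /* hypothetical payload */
} foo_t;

/* Allocate a 64-byte-aligned foo_t without punning foo_t ** to void **. */
static foo_t *
alloc_foo(void)
{
    void *p;    /* posix_memalign() writes through a real void * */

    if (posix_memalign(&p, 64, sizeof(foo_t)) != 0)
        return NULL;
    return (foo_t *)p;  /* convert the value, not the pointer-to-pointer */
}
```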
2013-12-12 22:35:52 -08:00
Jason Evans
6edc97db15 Fix inline-related macro issues.
Add JEMALLOC_INLINE_C and use it instead of JEMALLOC_INLINE in .c files,
so that the annotated functions are always static.

Remove SFMT's inline-related macros and use jemalloc's instead, so that
there's no danger of interactions with jemalloc's definitions that
disable inlining for debug builds.
2013-12-10 14:35:34 -08:00
Jason Evans
7369232544 Silence some unused variable warnings. 2013-12-10 13:51:52 -08:00
Jason Evans
a4f124f59f Normalize #define whitespace.
Consistently use a tab rather than a space following #define.
2013-12-08 22:28:27 -08:00
Jason Evans
2a83ed0284 Refactor tests.
Refactor tests to use explicit testing assertions, rather than diff'ing
test output.  This makes the test code a bit shorter, more explicitly
encodes testing intent, and makes test failure diagnosis more
straightforward.
2013-12-08 20:52:21 -08:00
Jason Evans
6668853596 Avoid deprecated sbrk(2) on OS X.
Avoid referencing sbrk(2) on OS X, because it is deprecated as of OS X
10.9 (Mavericks), and the compiler warns against using it.
2013-12-03 21:49:36 -08:00
Jason Evans
52b30691f9 Remove unused variable. 2013-12-02 15:16:39 -08:00
Jason Evans
addad093f8 Clean up malloc_ncpus().
Clean up malloc_ncpus() by replacing incorrectly indented if..else
branches with a ?: expression.

Submitted by Igor Podlesny.
2013-11-29 16:19:44 -08:00
Jason Evans
39e7fd0580 Fix ALLOCM_ARENA(a) handling in rallocm().
Fix rallocm() to use the specified arena for allocation, not just
deallocation.

Clarify ALLOCM_ARENA(a) documentation.
2013-11-25 18:02:35 -08:00
Jason Evans
d6df91438a Fix a potential infinite loop during thread exit.
Fix malloc_tsd_dalloc() to bypass tcache when deallocating, so that there
is no danger of causing tcache reincarnation during thread exit.
Whether this infinite loop occurs depends on the pthreads TSD
implementation; it is known to occur on Solaris.

Submitted by Markus Eberspächer.
2013-11-19 18:01:45 -08:00
Jason Evans
c368f8c8a2 Remove unnecessary zeroing in arena_palloc(). 2013-10-29 18:31:17 -07:00
Jason Evans
239692b18e Fix whitespace. 2013-10-28 12:41:37 -07:00
Leonard Crestez
cb17fc6a8f Add support for LinuxThreads.
When using LinuxThreads, pthread_setspecific triggers recursive
allocation on all threads. Work around this by creating a global linked
list of in-progress tsd initializations.

This modifies the _tsd_get_wrapper macro-generated function. When it has
to initialize a TSD object, it will push the item to the linked list
first. If this causes a recursive allocation then the _get_wrapper
request is satisfied from the list. When pthread_setspecific returns the
item is removed from the list.

This effectively adds a very poor substitute for real TLS used only
during pthread_setspecific allocation recursion.

Signed-off-by: Crestez Dan Leonard <lcrestez@ixiacom.com>
2013-10-24 18:25:19 -07:00
Leonard Crestez
ac4403cacb Delay pthread_atfork registering.
This function causes recursive allocation on LinuxThreads.

Signed-off-by: Crestez Dan Leonard <lcrestez@ixiacom.com>
2013-10-24 16:40:31 -07:00
Jason Evans
93f39f8d23 Fix a file descriptor leak in a prof_dump_maps() error path.
Reported by Pat Lynch.
2013-10-21 15:07:40 -07:00
Jason Evans
1d1cee127a Add a missing mutex unlock in malloc_init_hard() error path.
Add a missing mutex unlock in a malloc_init_hard() error path (failed
mutex initialization).  In practice this bug was very unlikely to ever
trigger, but if it did, application deadlock would likely result.

Reported by Pat Lynch.
2013-10-21 15:04:12 -07:00
Jason Evans
e2985a2381 Avoid (x < 0) comparison for unsigned x.
Avoid (min < 0) comparison for unsigned min in malloc_conf_init().  This
bug had no practical consequences.

Reported by Pat Lynch.
2013-10-21 15:01:44 -07:00
Jason Evans
30e7cb1118 Fix a data race for large allocation stats counters.
Reported by Pat Lynch.
2013-10-21 15:00:06 -07:00
Jason Evans
f1c3da8b02 Consistently use malloc_mutex_prefork().
Consistently use malloc_mutex_prefork() instead of malloc_mutex_lock()
in all prefork functions.
2013-10-21 14:59:10 -07:00
Jason Evans
6556e28be1 Prefer not_reached() over assert(false) where appropriate. 2013-10-21 14:56:27 -07:00
Jason Evans
d504477935 Fix a compiler warning.
Fix a compiler warning in chunk_record() that was due to reading node
rather than xnode.  In practice this did not cause any correctness
issue, but dataflow analysis in some compilers cannot tell that node and
xnode are always equal in cases that the read is reached.
2013-10-20 15:11:01 -07:00
Jason Evans
7b65180b32 Fix a race condition in the "arenas.extend" mallctl.
Fix a race condition in the "arenas.extend" mallctl that could lead to
internal data structure corruption.  The race could be hit if one
thread called the "arenas.extend" mallctl while another thread
concurrently triggered initialization of one of the lazily created
arenas.
2013-10-20 14:39:33 -07:00
Jason Evans
dda90f59e2 Fix a Valgrind integration flaw.
Fix a Valgrind integration flaw that caused Valgrind warnings about
reads of uninitialized memory in internal zero-initialized data
structures (relevant to tcache and prof code).
2013-10-19 23:48:40 -07:00
Jason Evans
87a02d2bb1 Fix a Valgrind integration flaw.
Fix a Valgrind integration flaw that caused Valgrind warnings about
reads of uninitialized memory in arena chunk headers.
2013-10-19 21:40:20 -07:00
Jason Evans
543abf7e6c Fix inlining warning.
Add the JEMALLOC_ALWAYS_INLINE_C macro and use it for always-inlined
functions declared in .c files.  This fixes a function attribute
inconsistency for debug builds that resulted in (harmless) compiler
warnings about functions not being inlinable.

Reported by Ricardo Nabinger Sanchez.
2013-10-19 17:26:00 -07:00
Jason Evans
3ab682d341 Silence an unused variable warning.
Reported by Ricardo Nabinger Sanchez.
2013-10-19 17:25:17 -07:00
Alexandre Perrin
dd6ef0302f malloc_conf_init: revert errno value when readlink(2) fails. 2013-10-13 15:33:15 -07:00
Jason Evans
4f929aa948 Fix another deadlock related to chunk_record().
Fix chunk_record() to unlock chunks_mtx before deallocating a base
node, in order to avoid potential deadlock.  This fix addresses the
second of two similar bugs.
2013-04-22 22:36:18 -07:00
Jason Evans
741fbc6ba4 Fix deadlock related to chunk_record().
Fix chunk_record() to unlock chunks_mtx before deallocating a base node,
in order to avoid potential deadlock.

Reported by Tudor Bosman.
2013-04-17 09:57:11 -07:00
Jason Evans
88c222c8e9 Fix a prof-related locking order bug.
Fix a locking order bug that could cause deadlock during fork if heap
profiling were enabled.
2013-02-06 11:59:30 -08:00
Jason Evans
06912756cc Fix Valgrind integration.
Fix Valgrind integration to annotate all internally allocated memory in
a way that keeps Valgrind happy about internal data structure access.
2013-01-31 17:02:53 -08:00
Jason Evans
a7a28c334e Fix a chunk recycling bug.
Fix a chunk recycling bug that could cause the allocator to lose track
of whether a chunk was zeroed.  On FreeBSD, NetBSD, and OS X, it could
cause corruption if allocating via sbrk(2) (unlikely unless running with
the "dss:primary" option specified).  This was completely harmless on
Linux unless using mlockall(2) (and unlikely even then, unless the
--disable-munmap configure option or the "dss:primary" option was
specified).  This regression was introduced in 3.1.0 by the
mlockall(2)/madvise(2) interaction fix.
2013-01-31 16:53:58 -08:00
Jason Evans
d0e942e466 Fix two quarantine bugs.
Internal reallocation of the quarantined object array leaked the old array.

Reallocation failure for internal reallocation of the quarantined object
array (very unlikely) resulted in memory corruption.
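Both bugs concern the grow-by-copy pattern. A minimal sketch of that pattern done safely on a simplified, contiguous array, using plain malloc()/free() as stand-ins for jemalloc's internal allocation functions (objs_grow_sketch() is hypothetical): the old array is freed only after the copy, and on failure it is left intact.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: double the capacity of an array of object pointers.
 * Requires *cap > 0. Returns false on allocation failure, in which case
 * *objs and *cap are unchanged and still owned by the caller. */
static bool
objs_grow_sketch(void ***objs, size_t *cap)
{
    size_t new_cap = *cap * 2;
    void **tmp = malloc(new_cap * sizeof(void *));

    if (tmp == NULL)
        return false;       /* keep using the old array; nothing corrupted */
    memcpy(tmp, *objs, *cap * sizeof(void *));
    free(*objs);            /* release the old array so it does not leak */
    *objs = tmp;
    *cap = new_cap;
    return true;
}
```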
2013-01-31 14:43:54 -08:00
Jason Evans
bbe29d374d Fix potential TLS-related memory corruption.
Avoid writing to uninitialized TLS as a side effect of deallocation.
Initializing TLS during deallocation is unsafe because it is possible
that a thread never did any allocation, and that TLS has already been
deallocated by the threads library, resulting in write-after-free
corruption.  These fixes affect prof_tdata and quarantine; all other
uses of TLS are already safe, whether intentionally (as for tcache) or
unintentionally (as for arenas).
2013-01-31 14:23:48 -08:00
Jason Evans
d1b6e18a99 Revert opt_abort and opt_junk refactoring.
Revert refactoring of opt_abort and opt_junk declarations.  clang
accepts the config_*-based declarations (and generates correct code),
but gcc complains with:

  error: initializer element is not constant
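Standard C does not treat a const-qualified variable as a constant expression, so using one to initialize a file-scope object is not portable; per the message above, gcc rejected it while clang accepted it. A minimal illustration with hypothetical names, showing the rejected form in a comment and a preprocessor-based form that any C compiler accepts:

```c
#include <stdbool.h>

/* Not portable: config_debug_sketch is const-qualified, but it is not a
 * constant expression in C, so gcc reports "initializer element is not
 * constant".
 *
 *     static const bool config_debug_sketch = true;
 *     bool opt_abort_sketch = config_debug_sketch;
 */

/* Portable: the choice is made by the preprocessor, so each initializer
 * is a plain constant. CONFIG_DEBUG_SKETCH is hypothetical. */
#define CONFIG_DEBUG_SKETCH 1
#if CONFIG_DEBUG_SKETCH
static bool opt_abort_sketch = true;
#else
static bool opt_abort_sketch = false;
#endif
```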
2013-01-22 16:54:26 -08:00
Jason Evans
ba175a2bfb Use config_* instead of JEMALLOC_*.
Convert a couple of stragglers from JEMALLOC_* to use config_*.
2013-01-22 12:14:45 -08:00
Jason Evans
ae03bf6a57 Update hash from MurmurHash2 to MurmurHash3.
Update hash from MurmurHash2 to MurmurHash3, primarily because the
latter generates 128 bits in a single call for no extra cost, which
simplifies integration with cuckoo hashing.
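Cuckoo hashing needs two independent hash values per key, and a single 128-bit hash can supply both. A minimal sketch of splitting such a result into two bucket indices, assuming a power-of-two table size and taking the 128-bit hash as already computed; cuckoo_buckets_sketch() is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: derive both cuckoo bucket indices from one 128-bit
 * hash, given as two 64-bit halves, for a table of 2^lg_nbuckets buckets
 * (lg_nbuckets < 64). */
static void
cuckoo_buckets_sketch(const uint64_t hash[2], unsigned lg_nbuckets,
    size_t *bucket0, size_t *bucket1)
{
    uint64_t mask = (UINT64_C(1) << lg_nbuckets) - 1;

    *bucket0 = (size_t)(hash[0] & mask);  /* first half: first choice */
    *bucket1 = (size_t)(hash[1] & mask);  /* second half: alternate */
}
```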
2013-01-22 12:02:08 -08:00
Jason Evans
88393cb0eb Add and use JEMALLOC_ALWAYS_INLINE.
Add JEMALLOC_ALWAYS_INLINE and use it to guarantee that the entire fast
paths of the primary allocation/deallocation functions are inlined.
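A minimal sketch of how such a macro is commonly defined, assuming GCC/Clang's always_inline attribute and MSVC's __forceinline; ALWAYS_INLINE_SKETCH and round_up_sketch() are hypothetical and not jemalloc's actual definitions:

```c
#include <stddef.h>

#if defined(__GNUC__)
#  define ALWAYS_INLINE_SKETCH static inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define ALWAYS_INLINE_SKETCH static __forceinline
#else
#  define ALWAYS_INLINE_SKETCH static inline
#endif

/* Hypothetical fast-path helper: with GCC/Clang/MSVC the compiler is asked
 * to inline it even when optimization is otherwise disabled. */
ALWAYS_INLINE_SKETCH size_t
round_up_sketch(size_t size, size_t quantum)
{
    /* quantum must be a power of two */
    return (size + quantum - 1) & ~(quantum - 1);
}
```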
2013-01-22 08:45:43 -08:00
Jason Evans
38067483c5 Tighten valgrind integration.
Tighten valgrind integration such that immediately after memory is
validated or zeroed, valgrind is told to forget the memory's 'defined'
state.  The only place newly allocated memory should be left marked as
'defined' is in the public functions (e.g. calloc() and realloc()).
2013-01-21 20:04:42 -08:00
Jason Evans
14a2c6a698 Avoid validating freshly mapped memory.
Move validation of supposedly zeroed pages from chunk_alloc() to
chunk_recycle().  There is little point to validating newly mapped
memory returned by chunk_alloc_mmap(), and memory that comes from sbrk()
is explicitly zeroed, so there is little risk to assuming that
chunk_alloc_dss() actually does the zeroing properly.

This relaxation of validation can make a big difference to application
startup time and overall system usage on platforms that use jemalloc as
the system allocator (namely FreeBSD).

Submitted by Ian Lepore <ian@FreeBSD.org>.
2013-01-21 19:56:34 -08:00
Garrett Cooper
6e6164ae15 Don't mangle errno with free(3) if utrace(2) fails
This ensures POLA (the principle of least astonishment) on FreeBSD (at
least), as free(3) is generally assumed not to modify errno.
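The usual fix is to save errno before the call that may clobber it and restore it afterwards. A minimal sketch; trace_event_sketch() is a hypothetical stand-in for a utrace(2) call that fails and sets errno:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical tracing hook that fails, clobbering errno the way a failed
 * utrace(2) call would. */
static void
trace_event_sketch(void *ptr)
{
    (void)ptr;
    errno = EPERM;
}

/* Free ptr and record a trace event without changing the caller's errno. */
static void
free_traced_sketch(void *ptr)
{
    int saved_errno = errno;

    trace_event_sketch(ptr);    /* may clobber errno */
    free(ptr);
    errno = saved_errno;        /* caller sees errno unchanged */
}
```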

Signed-off-by: Garrett Cooper <yanegomi@gmail.com>
2012-12-24 10:30:57 -08:00
Jason Evans
1bf2743e08 Add clipping support to lg_chunk option processing.
Modify processing of the lg_chunk option so that it clips an
out-of-range input to the edge of the valid range.  This makes it
possible to request the minimum possible chunk size without intimate
knowledge of allocator internals.
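Clipping replaces an out-of-range value with the nearest bound rather than rejecting it. A minimal sketch; clip_sketch() is hypothetical, and the bounds used below are illustrative, not jemalloc's actual lg_chunk limits:

```c
#include <stddef.h>

/* Hypothetical sketch: clip v to the inclusive range [min, max]. */
static size_t
clip_sketch(size_t v, size_t min, size_t max)
{
    if (v < min)
        return min;
    if (v > max)
        return max;
    return v;
}
```

With such clipping, a setting like lg_chunk:0 would select the smallest permitted chunk size without the user needing to know what that minimum is.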

Submitted by Ian Lepore (see FreeBSD PR bin/174641).
2012-12-23 08:51:48 -08:00
Jason Evans
1271185b87 Fix chunk_recycle() Valgrind integration.
Fix chunk_recycle() to unconditionally inform Valgrind that returned
memory is undefined.  This fixes Valgrind warnings that would result
from a huge allocation being freed, then recycled for use as an arena
chunk.  The arena code would write metadata to the chunk header, and
Valgrind would consider these invalid writes.
2012-12-12 10:12:18 -08:00
Jason Evans
6eb84fbe31 Fix "arenas.extend" mallctl to return the number of arenas.
Reported by Mike Hommey.
2012-11-29 22:13:04 -08:00
Jason Evans
a3b3386ddd Avoid arena_prof_accum()-related locking when possible.
Refactor arena_prof_accum() and its callers to avoid arena locking when
prof_interval is 0 (as when profiling is disabled).

Reported by Ben Maurer.
2012-11-13 13:47:53 -08:00
Jason Evans
abf6739317 Tweak chunk purge order according to fragmentation.
Tweak chunk purge order to purge unfragmented chunks from high to low
memory.  This facilitates dirty run reuse.
2012-11-07 10:08:34 -08:00
Mike Hommey
847ff223de Don't register jemalloc's zone allocator if something else already replaced the system default zone 2012-11-06 16:06:59 -08:00
Jason Evans
e3d13060c8 Purge unused dirty pages in a fragmentation-reducing order.
Purge unused dirty pages in an order that first performs clean/dirty run
defragmentation, in order to mitigate available run fragmentation.

Remove the limitation that prevented purging unless at least one chunk
worth of dirty pages had accumulated in an arena.  This limitation was
intended to avoid excessive purging for small applications, but the
threshold was arbitrary, and the effect of questionable utility.

Relax opt_lg_dirty_mult from 5 to 3.  This compensates for increased
likelihood of allocating clean runs, given the same ratio of clean:dirty
runs, and reduces the potential for repeated purging in pathological
large malloc/free loops that push the active:dirty page ratio just over
the purge threshold.
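opt_lg_dirty_mult is a base-2 logarithm: an arena tolerates up to one dirty page per 2^lg_dirty_mult active pages, so moving from 5 to 3 raises the tolerated dirty fraction from 1/32 to 1/8 of active pages. A minimal sketch of such a threshold test, assuming purging starts once dirty pages exceed nactive >> lg_dirty_mult; should_purge_sketch() is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: should an arena purge, given its active and dirty
 * page counts? Up to (nactive >> lg_dirty_mult) dirty pages are tolerated. */
static bool
should_purge_sketch(size_t nactive, size_t ndirty, unsigned lg_dirty_mult)
{
    size_t threshold = nactive >> lg_dirty_mult;

    return ndirty > threshold;
}
```

With 800 active pages, a setting of 5 tolerates 25 dirty pages, while a setting of 3 tolerates 100.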
2012-11-06 00:59:53 -08:00
Jason Evans
34457f5144 Fix deadlock in the arenas.purge mallctl.
Fix deadlock in the arenas.purge mallctl due to recursive mutex
acquisition.
2012-11-03 21:18:28 -07:00
Jason Evans
12efefb195 Fix dss/mmap allocation precedence code.
Fix dss/mmap allocation precedence code to use recyclable mmap memory
only after primary dss allocation fails.
2012-10-16 22:06:56 -07:00
Jason Evans
a5c80f893e Add ctl_mutex protection to arena_i_dss_ctl().
Add ctl_mutex protection to arena_i_dss_ctl(), since ctl_stats.narenas is
accessed.
2012-10-15 12:48:59 -07:00
Jason Evans
609ae595f0 Add arena-specific and selective dss allocation.
Add the "arenas.extend" mallctl, so that it is possible to create new
arenas that are outside the set that jemalloc automatically multiplexes
threads onto.

Add the ALLOCM_ARENA() flag for {,r,d}allocm(), so that it is possible
to explicitly allocate from a particular arena.

Add the "opt.dss" mallctl, which controls the default precedence of dss
allocation relative to mmap allocation.

Add the "arena.<i>.dss" mallctl, which makes it possible to set the
default dss precedence on a per arena or global basis.

Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge".

Add the "stats.arenas.<i>.dss" mallctl.
2012-10-12 18:26:16 -07:00
Jan Beich
d0ffd8ed4f mark _pthread_mutex_init_calloc_cb as public explicitly
Mozilla build hides everything by default using a visibility pragma and
unhides only explicitly listed headers. But this doesn't work on FreeBSD
because _pthread_mutex_init_calloc_cb is neither documented nor exposed
via any header.
2012-10-10 09:10:37 -07:00
Jason Evans
2cc11ff837 Make malloc_usable_size() implementation consistent with prototype.
Use JEMALLOC_USABLE_SIZE_CONST for the malloc_usable_size()
implementation as well as the prototype, for consistency's sake.
2012-10-09 16:29:21 -07:00
Jason Evans
b5225928fe Fix fork(2)-related mutex acquisition order.
Fix mutex acquisition order inversion for the chunks rtree and the base
mutex.  Chunks rtree acquisition was introduced by the previous commit,
so this bug was short-lived.
2012-10-09 16:16:00 -07:00
Jason Evans
20f1fc95ad Fix fork(2)-related deadlocks.
Add a library constructor for jemalloc that initializes the allocator.
This fixes a race that could occur if threads were created by the main
thread prior to any memory allocation, followed by fork(2), and then
memory allocation in the child process.

Fix the prefork/postfork functions to acquire/release the ctl, prof, and
rtree mutexes.  This fixes various fork() child process deadlocks, but
one possible deadlock remains (intentionally) unaddressed: prof
backtracing can acquire runtime library mutexes, so deadlock is still
possible if heap profiling is enabled during fork().  This deadlock is
known to be a real issue in at least the case of libgcc-based
backtracing.
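Both fixes follow a common pattern: initialize from a library constructor, and register fork handlers that hold the allocator's locks across fork(). A minimal sketch, assuming POSIX threads and the GCC/Clang constructor attribute; all names are hypothetical, and a real allocator would lock every internal mutex in a fixed order rather than a single one:

```c
#include <pthread.h>

/* Hypothetical allocator-wide lock and initialization flag. */
static pthread_mutex_t sketch_mtx = PTHREAD_MUTEX_INITIALIZER;
static int sketch_initialized;

/* Before fork(): take the lock so that no other thread holds it mid-update
 * when the child's copy of memory is made. */
static void
sketch_prefork(void)
{
    pthread_mutex_lock(&sketch_mtx);
}

/* After fork(), in both parent and child: release the lock taken above. */
static void
sketch_postfork(void)
{
    pthread_mutex_unlock(&sketch_mtx);
}

/* Runs at load time (for the main program, before main()), so
 * initialization happens before the application can start threads. */
__attribute__((constructor))
static void
sketch_init(void)
{
    if (pthread_atfork(sketch_prefork, sketch_postfork, sketch_postfork) == 0)
        sketch_initialized = 1;
}
```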

Reported by tfengjun.
2012-10-09 15:21:46 -07:00
Jason Evans
7de92767c2 Fix mlockall()/madvise() interaction.
mlockall(2) can cause purging via madvise(2) to fail.  Fix purging code
to check whether madvise() succeeded, and base zeroed page metadata on
the result.
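A minimal sketch of the idea, assuming Linux semantics, where a successful MADV_DONTNEED on private anonymous memory makes the pages read back as zero and the call fails on locked pages; run_meta_sketch_t and purge_sketch() are hypothetical:

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical per-run metadata recording whether pages are known zeroed. */
typedef struct {
    bool zeroed;
} run_meta_sketch_t;

/* Purge [addr, addr + len) and record zeroedness from the actual result:
 * if madvise() fails (e.g. under mlockall(2)), the old contents remain. */
static void
purge_sketch(void *addr, size_t len, run_meta_sketch_t *meta)
{
    meta->zeroed = (madvise(addr, len, MADV_DONTNEED) == 0);
}
```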

Reported by Olivier Lecomte.
2012-10-08 18:04:49 -07:00
Jason Evans
f4c3f8545b Fix error return value in thread_tcache_enabled_ctl().
Reported by Corey Richardson.
2012-10-08 15:48:04 -07:00
Corey Richardson
1d553f72cb If sysconf() fails, the number of CPUs is reported as UINT_MAX, not 1 as it should be 2012-10-08 15:45:38 -07:00
Corey Richardson
35579afb55 Remove unused variable and branch (reported by clang-analyzer) 2012-10-08 15:45:38 -07:00
Jason Evans
5c710cee78 Remove const from __*_hook variable declarations.
Remove const from __*_hook variable declarations, so that glibc can
modify them during process forking.
2012-05-23 16:09:22 -07:00
Jason Evans
f1966e1dc7 Update a comment. 2012-05-16 00:35:08 -07:00
Jason Evans
174b70efb4 Disable tcache by default if running inside Valgrind.
Disable tcache by default if running inside Valgrind, in order to avoid
making unallocated objects appear reachable to Valgrind.
2012-05-15 23:31:53 -07:00
Jason Evans
781fe75e0a Auto-detect whether running inside Valgrind.
Auto-detect whether running inside Valgrind, thus removing the need to
manually specify MALLOC_CONF=valgrind:true.
2012-05-15 14:48:14 -07:00
Jason Evans
58ad1e4956 Return early in _malloc_{pre,post}fork() if uninitialized.
Avoid mutex operations in _malloc_{pre,post}fork() unless jemalloc has
been initialized.

Reported by David Xu.
2012-05-11 17:40:16 -07:00
Jason Evans
d8ceef6c55 Fix large calloc() zeroing bugs.
Refactor code such that arena_mapbits_{large,small}_set() always
preserves the unzeroed flag, and manually manipulate the unzeroed flag
in the one case where it actually gets reset (in arena_chunk_purge()).
This fixes unzeroed preservation bugs in arena_run_split() and
arena_ralloc_large_grow().  These bugs caused large calloc() to return
non-zeroed memory under some circumstances.
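Preserving one flag while rewriting the rest of a bit-field word is a mask-and-merge. A minimal sketch with a hypothetical bit layout (not jemalloc's actual chunk map encoding), in which the setter keeps the existing unzeroed bit and only a purge path clears it explicitly:

```c
#include <stddef.h>

/* Hypothetical map-bits layout: bit 0 marks a page as possibly non-zero. */
#define MAPBITS_UNZEROED_SKETCH ((size_t)0x1)

/* Overwrite all map bits except the unzeroed flag, which is preserved. */
static size_t
mapbits_set_sketch(size_t mapbits, size_t new_bits)
{
    return (mapbits & MAPBITS_UNZEROED_SKETCH) |
        (new_bits & ~MAPBITS_UNZEROED_SKETCH);
}

/* The one place the flag is reset: the pages were just purged. */
static size_t
mapbits_purged_sketch(size_t mapbits)
{
    return mapbits & ~MAPBITS_UNZEROED_SKETCH;
}
```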
2012-05-10 21:49:43 -07:00
Jason Evans
30fe12b866 Add arena chunk map assertions. 2012-05-10 21:49:43 -07:00
Jason Evans
5b0c99649f Refactor arena_run_alloc().
Refactor duplicated arena_run_alloc() code into
arena_run_alloc_helper().
2012-05-10 21:49:43 -07:00
Jason Evans
2e671ffbad Add the --enable-mremap option.
Add the --enable-mremap option, and disable the use of mremap(2) by
default, for the same reason that freeing chunks via munmap(2) is
disabled by default on Linux: semi-permanent VM map fragmentation.
2012-05-09 16:12:00 -07:00
Jason Evans
374d26a43b Fix chunk_recycle() to stop leaking trailing chunks.
Fix chunk_recycle() to correctly compute trailsize and re-insert
trailing chunks.  This fixes a major virtual memory leak.

Simplify chunk_record() to avoid dropping/re-acquiring chunks_mtx.
2012-05-09 14:48:35 -07:00
Jason Evans
de6fbdb72c Fix chunk_alloc_mmap() bugs.
Simplify chunk_alloc_mmap() to no longer attempt map extension.  The
extra complexity isn't warranted, because although in the success case
it saves one system call as compared to immediately falling back to
chunk_alloc_mmap_slow(), it also makes the failure case even more
expensive.  This simplification removes two bugs:

- For Windows platforms, pages_unmap() wasn't being called for unaligned
  mappings prior to falling back to chunk_alloc_mmap_slow().  This
  caused permanent virtual memory leaks.
- For non-Windows platforms, alignment greater than chunksize caused
  pages_map() to be called with size 0 when attempting map extension.
  This always resulted in an mmap() error, and subsequent fallback to
  chunk_alloc_mmap_slow().
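chunk_alloc_mmap_slow() is named but not shown here. The general technique for aligned mappings is to over-allocate by the alignment and unmap the misaligned head and tail. A minimal sketch of that technique, assuming POSIX mmap(2)/munmap(2), page-multiple size and alignment, and power-of-two alignment; aligned_map_sketch() is hypothetical and not presented as jemalloc's code:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical sketch: map size bytes aligned to alignment. Both must be
 * multiples of the page size, and alignment a power of two. Overflow of
 * size + alignment is not checked. */
static void *
aligned_map_sketch(size_t size, size_t alignment)
{
    size_t alloc_size = size + alignment;   /* room to slide to alignment */
    void *p = mmap(NULL, alloc_size, PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;

    uintptr_t addr = (uintptr_t)p;
    uintptr_t aligned = (addr + alignment - 1) & ~(uintptr_t)(alignment - 1);
    size_t leadsize = aligned - addr;
    size_t trailsize = alloc_size - leadsize - size;

    /* Unmap the excess on both sides; every range stays page-aligned. */
    if (leadsize != 0)
        munmap(p, leadsize);
    if (trailsize != 0)
        munmap((void *)(aligned + size), trailsize);
    return (void *)aligned;
}
```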
2012-05-09 13:05:04 -07:00
Jason Evans
34a8cf6c40 Fix a base allocator deadlock.
Fix a base allocator deadlock due to chunk_recycle() calling back into
the base allocator.
2012-05-02 20:41:42 -07:00
Mike Hommey
c584fc75bb Don't use sizeof() on a VARIABLE_ARRAY
In the alloca() case, the variable is a pointer, so sizeof() yields the pointer size rather than the array size.
2012-05-02 16:33:19 -07:00
Mike Hommey
3597e91482 Allow je_malloc_message to be overridden when linking statically
If an application wants to override je_malloc_message, it is better to define
the symbol locally than to change its value in main(), which might be too late
for various reasons.

Due to je_malloc_message being initialized in util.c, statically linking
jemalloc with an application defining je_malloc_message fails due to
"multiple definition of" the symbol.

Defining it without a value (like je_malloc_conf) makes it more easily
overridable.
2012-05-02 16:25:41 -07:00
Jason Evans
80737c3323 Further optimize and harden arena_salloc().
Further optimize arena_salloc() to only look at the binind chunk map
bits in the common case.

Add more sanity checks to arena_salloc() that detect chunk map
inconsistencies for large allocations (whether due to allocator bugs or
application bugs).
2012-05-02 16:11:03 -07:00
Jason Evans
889ec59bd3 Make malloc_write() non-inline.
Make malloc_write() non-inline, in order to resolve its dependency on
je_malloc_write().
2012-05-02 02:08:03 -07:00
Jason Evans
203484e2ea Optimize malloc() and free() fast paths.
Embed the bin index for small page runs into the chunk page map, in
order to omit [...] in the following dependent load sequence:
  ptr-->mapelm-->[run-->bin-->]bin_info

Move various non-critical code out of the inlined function chain into
helper functions (tcache_event_hard(), arena_dalloc_small(), and
locking).
2012-05-02 00:30:36 -07:00
Mike Hommey
fd97b1dfc7 Add support for MSVC
Tested with MSVC 8, 32- and 64-bit.
2012-05-01 11:32:11 -07:00
Mike Hommey
da99e31105 Replace JEMALLOC_ATTR with various different macros when it makes sense
These newly added macros will be used to implement the equivalent under
MSVC. Also, move the definitions to headers, where they make more sense,
and for some, are even more useful there (e.g. malloc).
2012-04-30 17:57:31 -07:00
Mike Hommey
a14bce85e8 Use Get/SetLastError on Win32
Using errno on win32 doesn't quite work, because the value set in a shared
library can't be read from e.g. an executable calling the function setting
errno.

At the same time, since buferror always uses errno/GetLastError, don't pass
it.
2012-04-30 16:50:55 -07:00
Mike Hommey
af04b744bd Remove the VOID macro
Windows headers define a VOID macro.
2012-04-30 16:42:30 -07:00
Mike Hommey
8b49971d0c Avoid variable length arrays and remove declarations within code
MSVC doesn't support C99, and building as C++ to be able to use them is
dangerous, as C++ and C99 are incompatible.

Introduce a VARIABLE_ARRAY macro that either uses VLA when supported,
or alloca() otherwise. Note that using alloca() inside loops doesn't
quite work like VLAs, thus the use of VARIABLE_ARRAY there is discouraged.
It might be worth investigating ways to check whether VARIABLE_ARRAY is
used in such context at runtime in debug builds and bail out if that
happens.
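A minimal sketch of such a macro, assuming MSVC's _alloca() from <malloc.h> as the fallback; VARIABLE_ARRAY_SKETCH and sum_below_sketch() are hypothetical and may differ from jemalloc's actual definition. It also shows the sizeof() pitfall addressed by c584fc75bb: in the _alloca() branch the name is a pointer.

```c
#include <stddef.h>
#include <string.h>

#if defined(_MSC_VER)
#  include <malloc.h>
/* No C99 VLAs: fall back to stack allocation via _alloca(). */
#  define VARIABLE_ARRAY_SKETCH(type, name, count) \
    type *name = (type *)_alloca(sizeof(type) * (count))
#else
/* C99 variable length array. */
#  define VARIABLE_ARRAY_SKETCH(type, name, count) type name[(count)]
#endif

/* Hypothetical user: sum 0 + 1 + ... + (n - 1) via a temporary array.
 * Requires n > 0. Declarations stay at the top of the block. */
static size_t
sum_below_sketch(size_t n)
{
    VARIABLE_ARRAY_SKETCH(size_t, vals, n);
    size_t i, sum = 0;

    /* Byte size is sizeof(element) * n, never sizeof(vals): in the
     * _alloca() branch, sizeof(vals) would be the size of a pointer. */
    memset(vals, 0, sizeof(size_t) * n);
    for (i = 0; i < n; i++)
        vals[i] += i;
    for (i = 0; i < n; i++)
        sum += vals[i];
    return sum;
}
```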
2012-04-29 00:25:34 -07:00
Jason Evans
f278994029 Fix more prof_tdata resurrection corner cases. 2012-04-28 23:27:13 -07:00
Jason Evans
0050a0f7e6 Handle prof_tdata resurrection.
Handle prof_tdata resurrection during thread shutdown, similarly to how
tcache and quarantine handle resurrection.
2012-04-28 18:14:24 -07:00
Jason Evans
95ff6aadca Don't set prof_tdata during thread cleanup.
Don't set prof_tdata during thread cleanup, because doing so will cause
the cleanup function to be called again, the second time with a NULL
argument.
2012-04-28 14:15:28 -07:00
Jason Evans
3fb50b0407 Fix a PROF_ALLOC_PREP() error path.
Fix a PROF_ALLOC_PREP() error path to initialize the return value to
NULL.
2012-04-25 13:13:44 -07:00
Jason Evans
6b9ed67b4b Fix the "epoch" mallctl.
Fix the "epoch" mallctl to update cached stats even if the passed in
epoch is 0.
2012-04-25 13:12:46 -07:00
Jason Evans
f54166e7ef Add missing Valgrind annotations. 2012-04-23 22:41:36 -07:00
Jason Evans
7e060397a3 Fix quarantine_grow() bugs. 2012-04-23 22:07:30 -07:00
Jason Evans
9cd351d147 Add usize sanity checking to quarantine. 2012-04-23 21:43:18 -07:00
Jason Evans
577dd84660 Handle quarantine resurrection during thread exit.
Handle quarantine resurrection during thread exit in much the same way
as tcache resurrection is handled.
2012-04-23 21:14:26 -07:00
Jason Evans
87667a86a0 Fix two CHILD() macro calls in the ctl tree. 2012-04-23 19:54:15 -07:00
Jason Evans
65f343a632 Fix ctl regression.
Fix ctl to correctly compute the number of children at each level of the
ctl tree.
2012-04-23 19:31:45 -07:00
Jason Evans
8694e2e7b9 Silence compiler warnings. 2012-04-23 13:05:32 -07:00
Mike Hommey
461ad5c87a Avoid using a union for ctl_node_s
MSVC doesn't support C99, and as such doesn't support designated
initialization of structs and unions. As there is never a mix of
indexed and named nodes, it is pretty straightforward to use a
different type for each.
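A minimal sketch of that approach with hypothetical type names and illustrative node contents: each node type begins with a common header recording its kind, so both can be initialized positionally without designated initializers, and code holding a header pointer can check the kind before converting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical common header shared by both node kinds. */
typedef struct {
    bool named;
} ctl_node_sketch_t;

/* Hypothetical node whose children are looked up by name. */
typedef struct {
    ctl_node_sketch_t node;     /* must be the first member */
    const char *name;
} ctl_named_node_sketch_t;

/* Hypothetical node whose children are looked up by numeric index. */
typedef struct {
    ctl_node_sketch_t node;     /* must be the first member */
    size_t nchildren;
} ctl_indexed_node_sketch_t;

/* Positional initializers only: no designated initializers required. */
static const ctl_named_node_sketch_t epoch_node = {{true}, "epoch"};
static const ctl_indexed_node_sketch_t indexed_node = {{false}, 4};

/* Recover the named view from a header pointer, or NULL if not named.
 * Valid because a pointer to a struct also points to its first member. */
static const ctl_named_node_sketch_t *
ctl_named_node_sketch(const ctl_node_sketch_t *node)
{
    return node->named ? (const ctl_named_node_sketch_t *)node : NULL;
}
```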
2012-04-23 11:43:44 -07:00
Jason Evans
52386b2dc6 Fix heap profiling bugs.
Fix a potential deadlock that could occur during interval- and
growth-triggered heap profile dumps.

Fix an off-by-one heap profile statistics bug that could be observed in
interval- and growth-triggered heap profiles.

Fix heap profile dump filename sequence numbers (regression during
conversion to malloc_snprintf()).
2012-04-22 16:00:11 -07:00
Mike Hommey
08e2221e99 Remove leftovers from the vsnprintf check in malloc_vsnprintf
Commit 4eeb52f removed vsnprintf validation, but left a now unused va_copy.
It so happens that MSVC doesn't support va_copy.
2012-04-21 21:29:12 -07:00
Mike Hommey
a19e87fbad Add support for Mingw 2012-04-21 21:27:46 -07:00