Commit Graph

288 Commits

Jason Evans
f04a0bef99 Fix prof regressions.
Fix prof regressions related to tdata (main per thread profiling data
structure) destruction:
- Deadlock.  The fix for this was intended to be part of
  20c31deaae (Test prof.reset mallctl and
  fix numerous discovered bugs.) but the fix was left incomplete.
- Destruction race.  Detaching tdata just prior to destruction without
  holding the tdatas lock made it possible for another thread to destroy
  the tdata out from under the thread that was on its way to doing so.
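
A minimal sketch of the race-free detach; names like tdatas_mtx, tdata_list_remove(), and tdata_destroy() are stand-ins, not jemalloc's actual internals:

    #include <pthread.h>

    typedef struct prof_tdata_s prof_tdata_t;       /* stand-in type */
    extern void tdata_list_remove(prof_tdata_t *);  /* hypothetical */
    extern void tdata_destroy(prof_tdata_t *);      /* hypothetical */

    static pthread_mutex_t tdatas_mtx = PTHREAD_MUTEX_INITIALIZER;

    static void
    prof_tdata_detach_sketch(prof_tdata_t *tdata)
    {
        /* Hold the tdatas lock across the detach so no other thread can
         * find and destroy the tdata while this thread is on its way to
         * doing so. */
        pthread_mutex_lock(&tdatas_mtx);
        tdata_list_remove(tdata);
        pthread_mutex_unlock(&tdatas_mtx);
        tdata_destroy(tdata);  /* unreachable by other threads now */
    }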
2014-10-04 15:03:49 -07:00
Jason Evans
0800afd03f Silence a compiler warning. 2014-10-04 14:59:17 -07:00
Jason Evans
029d44cf8b Fix tsd cleanup regressions.
Fix tsd cleanup regressions that were introduced in
5460aa6f66 (Convert all tsd variables to
reside in a single tsd structure.).  These regressions were twofold:

1) tsd_tryget() should never (and need never) return NULL.  Rename it to
   tsd_fetch() and simplify all callers.
2) tsd_*_set() must only be called when tsd is in the nominal state,
   because cleanup happens during the nominal-->purgatory transition,
   and re-initialization must not happen while in the purgatory state.
   Add tsd_nominal() and use it as needed.  Note that tsd_*{p,}_get()
   can still be used as long as no re-initialization that would require
   cleanup occurs.  This means that e.g. the thread_allocated counter
   can be updated unconditionally.
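
A hedged sketch of rule 2, with invented names (the real tsd layout differs):

    #include <stdbool.h>
    #include <stdint.h>

    typedef enum { TSD_NOMINAL, TSD_PURGATORY } tsd_state_t;  /* simplified */
    typedef struct {
        tsd_state_t state;
        uint64_t thread_allocated;
        void *tcache;
    } tsd_t;

    static bool
    tsd_nominal(tsd_t *tsd)
    {
        return (tsd->state == TSD_NOMINAL);
    }

    /* Plain updates of existing data are always safe... */
    static void
    tsd_thread_allocated_add(tsd_t *tsd, uint64_t n)
    {
        tsd->thread_allocated += n;
    }

    /* ...but a set that could re-initialize state requiring cleanup must
     * be skipped once the nominal-->purgatory transition has begun. */
    static void
    tsd_tcache_set(tsd_t *tsd, void *tcache)
    {
        if (tsd_nominal(tsd))
            tsd->tcache = tcache;
    }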
2014-10-04 11:22:55 -07:00
Jason Evans
fc12c0b8bc Implement/test/fix prof-related mallctl's.
Implement/test/fix the opt.prof_thread_active_init,
prof.thread_active_init, and thread.prof.active mallctl's.

Test/fix the thread.prof.name mallctl.

Refactor opt_prof_active to be read-only and move mutable state into the
prof_active variable.  Stop leaning on ctl-related locking for
protection.
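
A sketch of the split, assuming hypothetical lock and variable names:

    #include <pthread.h>
    #include <stdbool.h>

    /* Read-only after option parsing; only seeds the mutable state. */
    static bool opt_prof_active = true;

    /* Mutable runtime state with its own lock, rather than leaning on
     * ctl-related locking. */
    static bool prof_active;
    static pthread_mutex_t prof_active_mtx = PTHREAD_MUTEX_INITIALIZER;

    bool
    prof_active_get_sketch(void)
    {
        bool active;

        pthread_mutex_lock(&prof_active_mtx);
        active = prof_active;
        pthread_mutex_unlock(&prof_active_mtx);
        return active;
    }

    bool
    prof_active_set_sketch(bool active)
    {
        bool old;

        pthread_mutex_lock(&prof_active_mtx);
        old = prof_active;
        prof_active = active;
        pthread_mutex_unlock(&prof_active_mtx);
        return old;
    }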
2014-10-03 23:25:30 -07:00
Jason Evans
551ebc4364 Convert to uniform style: cond == false --> !cond 2014-10-03 10:16:09 -07:00
Jason Evans
20c31deaae Test prof.reset mallctl and fix numerous discovered bugs. 2014-10-02 23:01:10 -07:00
Daniel Micay
f8034540a1 Implement in-place huge allocation shrinking.
Trivial example:

    #include <stdlib.h>

    int main(void) {
        void *ptr = malloc(1024 * 1024 * 8);
        if (!ptr) return 1;
        ptr = realloc(ptr, 1024 * 1024 * 4);
        if (!ptr) return 1;
    }

Before:

    mmap(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcfff000000
    mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcffec00000
    madvise(0x7fcfff000000, 8388608, MADV_DONTNEED) = 0

After:

    mmap(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1934800000
    madvise(0x7f1934c00000, 4194304, MADV_DONTNEED) = 0

Closes #134
2014-10-01 16:55:03 -07:00
Dave Rigby
e3a16fce5e Mark malloc_conf as a weak symbol
This fixes issue #113: je_malloc_conf is not respected on OS X.
2014-09-29 15:05:55 -07:00
Jason Evans
0c5dd03e88 Move small run metadata into the arena chunk header.
Move small run metadata into the arena chunk header, with multiple
expected benefits:
- Lower run fragmentation due to reduced run sizes; runs are more likely
  to completely drain when there are fewer total regions.
- Improved cache behavior.  Prior to this change, run headers were
  always page-aligned, which put extra pressure on some CPU cache sets.
  The degree to which this was a problem was hardware dependent, but it
  likely hurt some workloads even on the most advanced modern hardware.
- Buffer overruns/underruns are less likely to corrupt allocator
  metadata.
- Size classes between 4 KiB and 16 KiB become reasonable to support
  without any special handling, and the runs are small enough that dirty
  unused pages aren't a significant concern.
2014-09-29 01:31:39 -07:00
Jason Evans
f97e5ac4ec Implement compile-time bitmap size computation. 2014-09-28 14:43:11 -07:00
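
A generic illustration of the idea (not jemalloc's actual macros): derive the number of bitmap groups from the bit count entirely in the preprocessor, so arrays can be statically sized:

    #include <stdint.h>

    #define BITMAP_GROUP_NBITS 64
    /* Number of 64-bit groups needed to hold nbits bits, computed at
     * compile time so it can size a static array. */
    #define BITMAP_NGROUPS(nbits) \
        (((nbits) + BITMAP_GROUP_NBITS - 1) / BITMAP_GROUP_NBITS)

    static uint64_t bitmap[BITMAP_NGROUPS(1000)];  /* 16 groups */
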
Jason Evans
6ef80d68f0 Fix profile dumping race.
Fix a race that caused a non-critical assertion failure.  To trigger the
race, a thread had to be part way through initializing a new sample,
such that it was discoverable by the dumping thread, but not yet linked
into its gctx by the time a later dump phase would normally have reset
its state to 'nominal'.

Additionally, lock access to the state field during modification to
transition to the dumping state.  It's not apparent that this oversight
could have caused an actual problem due to outer locking that protects
the dumping machinery, but the added locking pedantically follows the
stated locking protocol for the state field.
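
The pedantic rule, sketched with invented names: every write to the state field happens with its lock held, even when outer locks already serialize the writers:

    #include <pthread.h>

    typedef enum { STATE_NOMINAL, STATE_DUMPING } prof_state_t;
    typedef struct {
        pthread_mutex_t lock;
        prof_state_t state;
    } prof_tdata_t;  /* stand-in structure */

    static void
    prof_tdata_begin_dump(prof_tdata_t *tdata)
    {
        /* Transition under tdata->lock, per the stated protocol. */
        pthread_mutex_lock(&tdata->lock);
        tdata->state = STATE_DUMPING;
        pthread_mutex_unlock(&tdata->lock);
    }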
2014-09-24 22:23:43 -07:00
Jason Evans
5460aa6f66 Convert all tsd variables to reside in a single tsd structure. 2014-09-23 02:36:08 -07:00
Jason Evans
9d8f3d2033 Fix prof regressions.
Don't use atomic_add_uint64(), because it isn't available on 32-bit
platforms.

Fix forking support functions to manage all prof-related mutexes.

These regressions were introduced by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
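
A sketch of the portable alternative, under the assumption that a mutex-protected add stands in for the atomic on platforms without 64-bit atomics:

    #include <pthread.h>
    #include <stdint.h>

    static pthread_mutex_t ctr_mtx = PTHREAD_MUTEX_INITIALIZER;

    /* 32-bit platforms may lack a 64-bit atomic add; a lock works
     * everywhere, at some cost. */
    static uint64_t
    counter_add_sketch(uint64_t *ctr, uint64_t n)
    {
        uint64_t result;

        pthread_mutex_lock(&ctr_mtx);
        result = (*ctr += n);
        pthread_mutex_unlock(&ctr_mtx);
        return result;
    }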
2014-09-11 18:09:14 -07:00
Jason Evans
c3e9e7b041 Fix irallocx_prof() sample logic.
Fix irallocx_prof() sample logic to only update the threshold counter
after it knows what size the allocation ended up being.  This regression
was caused by 6e73dc194e (Fix a profile
sampling race.), which did not make it into any releases prior to this
fix.
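
The corrected ordering, sketched with hypothetical helpers: reallocate first, then account for the size that actually resulted:

    #include <stddef.h>

    extern void *iralloc(void *ptr, size_t size, int flags);  /* hypothetical */
    extern size_t isalloc(const void *ptr);                   /* hypothetical */
    extern void prof_sample_accum_update(size_t usize);       /* hypothetical */

    void *
    irallocx_prof_sketch(void *ptr, size_t size, int flags)
    {
        void *p = iralloc(ptr, size, flags);
        if (p == NULL)
            return NULL;
        /* Only now is the resulting size known; update the threshold
         * counter with what the allocation ended up being. */
        prof_sample_accum_update(isalloc(p));
        return p;
    }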
2014-09-11 17:04:03 -07:00
Jason Evans
9c640bfdd4 Apply likely()/unlikely() to allocation/deallocation fast paths. 2014-09-11 17:01:58 -07:00
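
The usual GCC/Clang idiom behind such annotations (jemalloc's actual macro names may differ):

    #include <stdbool.h>
    #include <stddef.h>

    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    extern bool malloc_initialized;     /* hypothetical */
    extern bool malloc_init(void);      /* hypothetical; true on failure */
    extern void *imalloc(size_t size);  /* hypothetical */

    void *
    malloc_fast_path_sketch(size_t size)
    {
        /* One-time initialization and its failure are both cold; keep
         * the hot path free of mispredicted branches. */
        if (unlikely(!malloc_initialized) && unlikely(malloc_init()))
            return NULL;
        return imalloc(size);
    }
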
Jason Evans
91566fc079 Fix mallocx() to always honor MALLOCX_ARENA() when profiling. 2014-09-11 13:15:33 -07:00
Daniel Micay
23fdf8b359 mark some conditions as unlikely
* assertion failure
* malloc_init failure
* malloc not already initialized (in malloc_init)
* running in valgrind
* thread cache disabled at runtime

Clang and GCC already consider a comparison with NULL or -1 to be cold,
so many branches (e.g. out-of-memory checks) are already correctly
treated as cold, and marking them is not important.
2014-09-10 21:49:42 -04:00
Jason Evans
6e73dc194e Fix a profile sampling race.
Fix a profile sampling race that was due to preparing to sample, yet
doing nothing to assure that the context remains valid until the stats
are updated.

These regressions were caused by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-09 19:47:09 -07:00
Jason Evans
6fd53da030 Fix prof_tdata_get()-related regressions.
Fix prof_tdata_get() to avoid dereferencing an invalid tdata pointer
(when it's PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).

Fix prof_tdata_get() callers to check for invalid results besides NULL
(PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).

These regressions were caused by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-09 15:29:34 -07:00
Jason Evans
a2260c95cd Fix sdallocx() assertion.
Refactor sdallocx() and nallocx() to share inallocx(), and fix an
sdallocx() assertion to check usize rather than size.
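
A sketch of the shared helper, assuming hypothetical internal names: both entry points convert (size, flags) to a usable size up front, and the assertion compares like with like:

    #include <assert.h>
    #include <stddef.h>

    extern size_t s2u(size_t size);                 /* hypothetical: size -> usize */
    extern size_t sa2u(size_t size, size_t align);  /* hypothetical: aligned usize */
    extern size_t flags_to_alignment(int flags);    /* hypothetical */
    extern size_t isalloc(const void *ptr);         /* hypothetical */
    extern void idalloc(void *ptr);                 /* hypothetical */

    static size_t
    inallocx_sketch(size_t size, int flags)
    {
        return (flags == 0) ? s2u(size)
                            : sa2u(size, flags_to_alignment(flags));
    }

    void
    sdallocx_sketch(void *ptr, size_t size, int flags)
    {
        /* Assert on usize, not the raw request size. */
        assert(inallocx_sketch(size, flags) == isalloc(ptr));
        idalloc(ptr);
    }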
2014-09-09 10:39:15 -07:00
Daniel Micay
4cfe55166e Add support for sized deallocation.
This adds a new `sdallocx` function to the external API, allowing the
size to be passed by the caller.  It avoids some extra reads in the
thread cache fast path.  In the case where stats are enabled, this
avoids the work of calculating the size from the pointer.

An assertion validates the size that's passed in, so enabling debugging
will allow users of the API to debug cases where an incorrect size is
passed in.

The performance win for a contrived microbenchmark doing an allocation
and immediately freeing it is ~10%.  It may have a different impact on a
real workload.
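
A usage sketch of the public API: the size passed to sdallocx() must be valid for the allocation being freed.

    #include <jemalloc/jemalloc.h>

    int
    main(void)
    {
        void *p = mallocx(4096, 0);
        if (p == NULL)
            return 1;
        /* The caller already knows the size, so the allocator need not
         * recompute it from the pointer. */
        sdallocx(p, 4096, 0);
        return 0;
    }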

Closes #28
2014-09-08 17:34:24 -07:00
Jason Evans
b718cf77e9 Optimize [nmd]alloc() fast paths.
Optimize [nmd]alloc() fast paths such that the (flags == 0) case is
streamlined, flags decoding only happens to the minimum degree
necessary, and no conditionals are repeated.
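
The shape of the streamlined path, sketched with hypothetical helpers: test flags exactly once, and decode only on the slow branch:

    #include <stddef.h>

    #define likely(x) __builtin_expect(!!(x), 1)

    extern void *fast_alloc(size_t size);                /* hypothetical */
    extern void *decoded_alloc(size_t size, int flags);  /* hypothetical */

    void *
    mallocx_sketch(size_t size, int flags)
    {
        /* Decode align/zero/tcache/arena bits only when needed. */
        if (likely(flags == 0))
            return fast_alloc(size);
        return decoded_alloc(size, flags);
    }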
2014-09-07 14:40:19 -07:00
Jason Evans
c21b05ea09 Whitespace cleanups. 2014-09-04 22:27:26 -07:00
Qinfan Wu
ff6a31d3b9 Refactor chunk map.
Break the chunk map into two separate arrays, in order to improve cache
locality. This is related to issue #23.
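
The general technique, illustrated (names and fields are illustrative, not the actual jemalloc layout): split an array of structures into parallel arrays so fields that are scanned together share cache lines:

    #include <stddef.h>

    /* Before: one struct per page; scanning the hot field also pulls
     * the cold one into cache. */
    typedef struct {
        size_t bits;   /* hot: run/page state consulted on every op */
        void  *misc;   /* cold: rarely-touched bookkeeping */
    } map_aos_t;

    /* After: parallel arrays, so a sweep over the hot field is dense. */
    typedef struct {
        size_t *bits;
        void  **misc;
    } map_soa_t;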
2014-09-04 22:22:52 -07:00
Qinfan Wu
58799f6d1c Remove junk filling in tcache_bin_flush_small().
Junk filling is done in arena_dalloc_bin_locked(), so arena_alloc_junk_small()
is redundant. Also, we should use arena_dalloc_junk_small() instead of
arena_alloc_junk_small().
2014-08-26 21:28:31 -07:00
Sara Golemon
3e24afa28e Test for availability of malloc hooks via autoconf
The __*_hook() functions are a glibc feature, but on at least one
glibc-based platform (Homebrew), the __GLIBC__ define isn't set
correctly, so we miss being able to use these hooks.

Do a feature test for it during configuration so that we enable it
anywhere the hooks are actually available.
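
A sketch of the kind of probe a configure-time check might compile and link (the actual autoconf test may differ):

    /* Compiles only where glibc's malloc hooks are declared. */
    #include <malloc.h>

    int
    main(void)
    {
        return (__free_hook == NULL) ? 0 : 1;
    }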
2014-08-22 15:19:21 -07:00
Jason Evans
602c8e0971 Implement per thread heap profiling.
Rename data structures (prof_thr_cnt_t-->prof_tctx_t,
prof_ctx_t-->prof_gctx_t), and convert to storing a prof_tctx_t for
sampled objects.

Convert PROF_ALLOC_PREP() to prof_alloc_prep(), since precise backtrace
depth within jemalloc functions is no longer an issue (pprof prunes
irrelevant frames).

Implement mallctl's:
- prof.reset implements full sample data reset, and optional change of
  sample interval.
- prof.lg_sample reads the current sample interval (opt.lg_prof_sample
  was the permanent source of truth prior to prof.reset).
- thread.prof.name provides naming capability for threads within heap
  profile dumps.
- thread.prof.active makes it possible to activate/deactivate heap
  profiling for individual threads.

Modify the heap dump files to contain per thread heap profile data.
This change is incompatible with the existing pprof, which will require
enhancements to read and process the enriched data.
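
A hedged usage sketch of the mallctl's listed above, driven through the public mallctl() interface:

    #include <stdbool.h>
    #include <stddef.h>
    #include <jemalloc/jemalloc.h>

    void
    profile_this_thread(void)
    {
        bool active = true;
        const char *name = "worker-0";
        size_t lg_sample;
        size_t sz = sizeof(lg_sample);

        /* Name this thread in heap profile dumps. */
        mallctl("thread.prof.name", NULL, NULL, &name, sizeof(name));
        /* Activate profiling for this thread only. */
        mallctl("thread.prof.active", NULL, NULL, &active, sizeof(active));
        /* Read the current sample interval. */
        mallctl("prof.lg_sample", &lg_sample, &sz, NULL, 0);
    }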
2014-08-19 21:31:16 -07:00
Jason Evans
3a81cbd2d4 Dump heap profile backtraces in a stable order.
Also iterate over per thread stats in a stable order, which prepares the
way for stable ordering of per thread heap profile dumps.
2014-08-19 21:05:54 -07:00
Jason Evans
ab532e9799 Directly embed prof_ctx_t's bt. 2014-08-19 21:05:54 -07:00
Jason Evans
b41ccdb125 Convert prof_tdata_t's bt2cnt to a comprehensive map.
Treat prof_tdata_t's bt2cnt as a comprehensive map of the thread's
extant allocation samples (do not limit the total number of entries).
This helps prepare the way for per thread heap profiling.
2014-08-19 21:05:54 -07:00
Jason Evans
586c8ede42 Fix arena.<i>.dss mallctl to handle read-only calls. 2014-08-15 12:20:20 -07:00
Jason Evans
070b3c3fbd Fix and refactor runs_dirty-based purging.
Fix runs_dirty-based purging to also purge dirty pages in the spare
chunk.

Refactor runs_dirty manipulation into arena_dirty_{insert,remove}(), and
move the arena->ndirty accounting into those functions.

Remove the u.ql_link field from arena_chunk_map_t, and get rid of the
enclosing union for u.rb_link, since only rb_link remains.

Remove the ndirty field from arena_chunk_t.
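
A sketch of the centralized accounting, with invented names: insert/remove keep ndirty in sync so callers cannot forget to.

    #include <stddef.h>

    /* Hypothetical structures; arena->head is a circular sentinel
     * (head.next == head.prev == &head when the list is empty). */
    typedef struct dirty_run_s {
        struct dirty_run_s *next, *prev;
        size_t npages;
    } dirty_run_t;

    typedef struct {
        dirty_run_t head;
        size_t ndirty;
    } arena_t;

    static void
    arena_dirty_insert(arena_t *arena, dirty_run_t *run)
    {
        run->next = arena->head.next;
        run->prev = &arena->head;
        run->next->prev = run;
        arena->head.next = run;
        arena->ndirty += run->npages;  /* accounting lives here... */
    }

    static void
    arena_dirty_remove(arena_t *arena, dirty_run_t *run)
    {
        run->prev->next = run->next;
        run->next->prev = run->prev;
        arena->ndirty -= run->npages;  /* ...and here, not in callers */
    }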
2014-08-14 14:45:58 -07:00
Qinfan Wu
e8a2fd83a2 arena->npurgatory is no longer needed since we drop arena's lock
after stashing all the purgeable runs.
2014-08-12 09:50:01 -07:00
Qinfan Wu
90737fcda1 Remove chunks_dirty tree, nruns_avail and nruns_adjac since we no
longer need to maintain the tree for dirty page purging.
2014-08-12 09:50:00 -07:00
Qinfan Wu
e970800c78 Purge dirty pages from the beginning of the dirty list. 2014-08-12 09:50:00 -07:00
Qinfan Wu
a244e5078e Add dirty page counting for debug 2014-08-12 09:50:00 -07:00
Qinfan Wu
04d60a132b Maintain all the dirty runs in a linked list for each arena 2014-08-12 09:50:00 -07:00
Jason Evans
1522937e9c Fix the cactive statistic.
Fix the cactive statistic to decrease (rather than increase) when active
memory decreases.  This regression was introduced by
aa5113b1fd (Refactor overly large/complex
functions) and first released in 3.5.0.
2014-08-06 23:43:39 -07:00
Qinfan Wu
ea73eb8f3e Reintroduce the comment that was removed in f9ff603. 2014-08-06 16:43:01 -07:00
Qinfan Wu
55c9aa1038 Fix a bug that prevented allocating the free run with the lowest address. 2014-08-06 16:10:08 -07:00
Mike Hommey
6f533c1903 Ensure the default purgeable zone is after the default zone on OS X 2014-06-10 07:18:29 -07:00
Richard Diamond
994fad9bda Add check for madvise(2) to configure.ac.
Some platforms, such as Google's Portable Native Client, use Newlib and
thus lack access to madvise(2).  In those instances, pages_purge() is
transformed into a no-op.
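
A sketch of the resulting shape; the guard macro name is an assumption about what the configure check defines:

    #include <stddef.h>
    #ifdef JEMALLOC_HAVE_MADVISE
    #include <sys/mman.h>
    #endif

    static void
    pages_purge_sketch(void *addr, size_t length)
    {
    #ifdef JEMALLOC_HAVE_MADVISE
        madvise(addr, length, MADV_DONTNEED);
    #else
        (void)addr; (void)length;  /* no madvise(2): no-op */
    #endif
    }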
2014-06-03 09:32:49 -07:00
Chris Peterson
70807bc54b Fix -Wsometimes-uninitialized warnings 2014-06-02 07:53:52 -07:00
Chris Peterson
3e310b34eb Fix -Wsign-compare warnings 2014-06-02 07:51:33 -07:00
Richard Diamond
94ed6812bc Don't catch fork()ing events for Native Client.
Native Client doesn't allow forking, thus there is no need to catch
fork()ing events for Native Client.

Additionally, without this commit, jemalloc will introduce an unresolved
pthread_atfork() in PNaCl Rust bins.
2014-06-02 07:45:33 -07:00
Richard Diamond
9c3a10fdf6 Try to use __builtin_ffsl if ffsl is unavailable.
Some platforms (like those using Newlib) don't have ffs/ffsl.  This
commit adds a check to configure.ac for __builtin_ffsl if ffsl isn't
found.  __builtin_ffsl performs the same function as ffsl, and has the
added benefit of being available on any platform with a GCC-compatible
compiler.

This change does not address the use of ffs in the MALLOCX_ARENA()
macro.
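
A sketch of the fallback; the config macro name is hypothetical:

    #ifndef JEMALLOC_HAVE_FFSL  /* hypothetical configure result */
    static inline int
    ffsl(long bitmap)
    {
        /* Same contract as ffsl(3): 1-based index of the least
         * significant set bit, or 0 if no bits are set. */
        return __builtin_ffsl(bitmap);
    }
    #endif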
2014-06-02 07:44:50 -07:00
Jason Evans
d04047cc29 Add size class computation capability.
Add size class computation capability, currently used only as validation
of the size class lookup tables.  Generalize the size class spacing used
for bins, for eventual use throughout the full range of allocation
sizes.
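
The general scheme being validated, sketched: sizes grow in groups with a per-group delta, so a size class can be computed from its parameters instead of looked up (parameters below are illustrative, not jemalloc's exact ones):

    #include <stddef.h>

    /* size = 2^lg_base + ndelta * 2^lg_delta; e.g. lg_base=6, lg_delta=4,
     * ndelta=3 gives 64 + 3*16 = 112. */
    static size_t
    size_class_sketch(unsigned lg_base, unsigned lg_delta, unsigned ndelta)
    {
        return ((size_t)1 << lg_base) + ((size_t)ndelta << lg_delta);
    }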
2014-05-28 21:06:46 -07:00
Jason Evans
e2deab7a75 Refactor huge allocation to be managed by arenas.
Refactor huge allocation to be managed by arenas (though the global
red-black tree of huge allocations remains for lookup during
deallocation).  This is the logical conclusion of recent changes that 1)
made per arena dss precedence apply to huge allocation, and 2) made it
possible to replace the per arena chunk allocation/deallocation
functions.

Remove the top level huge stats, and replace them with per arena huge
stats.

Normalize function names and types to *dalloc* (some were *dealloc*).

Remove the --enable-mremap option.  As jemalloc currently operates, this
is a performance regression for some applications, but planned work to
logarithmically space huge size classes should provide similar amortized
performance.  The motivation for this change was that mremap-based huge
reallocation forced leaky abstractions that prevented refactoring.
2014-05-15 22:36:41 -07:00
aravind
fb7fe50a88 Add support for user-specified chunk allocators/deallocators.
Add new mallctl endpoints "arena.<i>.chunk.alloc" and
"arena.<i>.chunk.dealloc" to allow userspace to configure
jemalloc's chunk allocator and deallocator on a per-arena
basis.
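
A hedged usage sketch; the chunk_alloc_t signature shown is an assumption about this version's interface:

    #include <stdbool.h>
    #include <stddef.h>
    #include <jemalloc/jemalloc.h>

    /* Assumed signature for the per-arena chunk allocator hook. */
    typedef void *(chunk_alloc_t)(size_t size, size_t alignment, bool *zero,
        unsigned arena_ind);

    extern void *my_chunk_alloc(size_t, size_t, bool *, unsigned);  /* user-supplied */

    void
    install_chunk_alloc(void)
    {
        chunk_alloc_t *hook = my_chunk_alloc;

        /* Install the hook for arena 0. */
        mallctl("arena.0.chunk.alloc", NULL, NULL, &hook, sizeof(hook));
    }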
2014-05-12 10:46:03 -07:00
Jason Evans
a344dd01c7 Fix coding style nits. 2014-05-01 15:51:30 -07:00