Commit Graph

3361 Commits

Author SHA1 Message Date
Jason Evans
0c5dd03e88 Move small run metadata into the arena chunk header.
Move small run metadata into the arena chunk header, with multiple
expected benefits:
- Lower run fragmentation due to reduced run sizes; runs are more likely
  to completely drain when there are fewer total regions.
- Improved cache behavior.  Prior to this change, run headers were
  always page-aligned, which put extra pressure on some CPU cache sets.
  The degree to which this was a problem was hardware dependent, but it
  likely hurt some even for the most advanced modern hardware.
- Buffer overruns/underruns are less likely to corrupt allocator
  metadata.
- Size classes between 4 KiB and 16 KiB become reasonable to support
  without any special handling, and the runs are small enough that dirty
  unused pages aren't a significant concern.
2014-09-29 01:31:39 -07:00
Jason Evans
f97e5ac4ec Implement compile-time bitmap size computation. 2014-09-28 14:43:11 -07:00
Jason Evans
6ef80d68f0 Fix profile dumping race.
Fix a race that caused a non-critical assertion failure.  To trigger the
race, a thread had to be part way through initializing a new sample,
such that it was discoverable by the dumping thread, but not yet linked
into its gctx by the time a later dump phase would normally have reset
its state to 'nominal'.

Additionally, lock access to the state field during modification to
transition to the dumping state.  It's not apparent that this oversight
could have caused an actual problem due to outer locking that protects
the dumping machinery, but the added locking pedantically follows the
stated locking protocol for the state field.
2014-09-24 22:23:43 -07:00
Dave Rigby
112704cfbf Use MSVC intrinsics for lg_floor
When using MSVC make use of its intrinsic functions (supported on
x86, amd64 & ARM) for lg_floor.
2014-09-24 11:55:02 +01:00
Dave Rigby
70bdee07d9 autoconf: Support cygwin in addition to mingw 2014-09-24 11:31:56 +01:00
Jason Evans
eb5376ab9e Add instructions for installing from non-packaged sources. 2014-09-23 09:21:49 -07:00
Jason Evans
5460aa6f66 Convert all tsd variables to reside in a single tsd structure. 2014-09-23 02:36:08 -07:00
Jason Evans
42f5955938 Ignore jemalloc.pc . 2014-09-21 21:40:38 -07:00
Nick White
913e9a8a85 Generate a pkg-config file 2014-09-19 22:27:35 +01:00
Daniel Micay
f1cf3ea475 fix tls_model autoconf test
It has an unused variable, so it was always failing (at least with gcc
4.9.1). Alternatively, the `-Werror` flag could be removed if it isn't
strictly necessary.
2014-09-16 04:42:33 -04:00
Valerii Hiora
ebca69c9fb Fixed iOS build after OR1 changes 2014-09-12 07:24:28 +03:00
Jason Evans
9d8f3d2033 Fix prof regressions.
Don't use atomic_add_uint64(), because it isn't available on 32-bit
platforms.

Fix forking support functions to manage all prof-related mutexes.

These regressions were introduced by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-11 18:09:14 -07:00
Jason Evans
c3e9e7b041 Fix irallocx_prof() sample logic.
Fix irallocx_prof() sample logic to only update the threshold counter
after it knows what size the allocation ended up being.  This regression
was caused by 6e73dc194e (Fix a profile
sampling race.), which did not make it into any releases prior to this
fix.
2014-09-11 17:04:03 -07:00
Jason Evans
9c640bfdd4 Apply likely()/unlikely() to allocation/deallocation fast paths. 2014-09-11 17:01:58 -07:00
Jason Evans
91566fc079 Fix mallocx() to always honor MALLOCX_ARENA() when profiling. 2014-09-11 13:15:33 -07:00
Daniel Micay
23fdf8b359 mark some conditions as unlikely
* assertion failure
* malloc_init failure
* malloc not already initialized (in malloc_init)
* running in valgrind
* thread cache disabled at runtime

Clang and GCC already consider a comparison with NULL or -1 to be cold,
so many branches (out-of-memory) are already correctly considered as
cold and marking them is not important.
2014-09-10 21:49:42 -04:00
Daniel Micay
6b5609d23b add likely / unlikely macros 2014-09-10 17:36:32 -04:00
Jason Evans
61beeb9f69 Add sdallocx() to list of functions to prune in pprof. 2014-09-10 08:49:29 -07:00
Jason Evans
6e73dc194e Fix a profile sampling race.
Fix a profile sampling race that was due to preparing to sample, yet
doing nothing to assure that the context remains valid until the stats
are updated.

These regressions were caused by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-09 19:47:09 -07:00
Jason Evans
6fd53da030 Fix prof_tdata_get()-related regressions.
Fix prof_tdata_get() to avoid dereferencing an invalid tdata pointer
(when it's PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).

Fix prof_tdata_get() callers to check for invalid results besides NULL
(PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).

These regressions were caused by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-09 15:29:34 -07:00
Jason Evans
7c17e1670d Fix threaded heap profile bug in pprof.
Fix ReadThreadedHeapProfile to pass the correct parameters to
AdjustSamples.
2014-09-09 15:29:34 -07:00
Jason Evans
a2260c95cd Fix sdallocx() assertion.
Refactor sdallocx() and nallocx() to share inallocx(), and fix an
sdallocx() assertion to check usize rather than size.
2014-09-09 10:39:15 -07:00
Bert Maher
d95e704fea Support threaded heap profiles in pprof
- Add a --thread N option to select profile for thread N (otherwise, all
      threads will be printed)
    - The $profile map now has a {threads} element that is a map from thread id to
      a profile that has the same format as the {profile} element
    - Refactor ReadHeapProfile into smaller components and use them to implement
      ReadThreadedHeapProfile
2014-09-09 10:01:35 -07:00
Jason Evans
ffe93419d5 Merge pull request #115 from thestinger/isqalloct
fix isqalloct (should call isdalloct)
2014-09-08 20:19:08 -07:00
Daniel Micay
a62812eacc fix isqalloct (should call isdalloct) 2014-09-08 21:46:17 -04:00
Daniel Micay
4cfe55166e Add support for sized deallocation.
This adds a new `sdallocx` function to the external API, allowing the
size to be passed by the caller.  It avoids some extra reads in the
thread cache fast path.  In the case where stats are enabled, this
avoids the work of calculating the size from the pointer.

An assertion validates the size that's passed in, so enabling debugging
will allow users of the API to debug cases where an incorrect size is
passed in.

The performance win for a contrived microbenchmark doing an allocation
and immediately freeing it is ~10%.  It may have a different impact on a
real workload.

Closes #28
2014-09-08 17:34:24 -07:00
Jason Evans
c3f8650749 Add relevant function attributes to [msn]allocx(). 2014-09-08 16:47:51 -07:00
Jason Evans
a1f3929ffd Thwart optimization of free(malloc(1)) in microbench. 2014-09-08 16:23:48 -07:00
Jason Evans
c54f93f186 Merge pull request #114 from thestinger/timer
avoid conflict with the POSIX timer_t type
2014-09-07 22:41:47 -07:00
Daniel Micay
c3bfe9569a avoid conflict with the POSIX timer_t type
It hits a compilation error with glibc 2.19 without a rename.
2014-09-08 01:20:44 -04:00
Jason Evans
423d78a21b Add microbench tests. 2014-09-07 19:58:04 -07:00
Jason Evans
b67ec3c497 Add a simple timer implementation for use in benchmarking. 2014-09-07 19:57:24 -07:00
Jason Evans
82e88d1ecf Move typedefs from jemalloc_protos.h.in to jemalloc_typedefs.h.in.
Move typedefs from jemalloc_protos.h.in to jemalloc_typedefs.h.in, so
that typedefs aren't redefined when compiling stress tests.
2014-09-07 19:55:03 -07:00
Jason Evans
b718cf77e9 Optimize [nmd]alloc() fast paths.
Optimize [nmd]alloc() fast paths such that the (flags == 0) case is
streamlined, flags decoding only happens to the minimum degree
necessary, and no conditionals are repeated.
2014-09-07 14:40:19 -07:00
Jason Evans
c21b05ea09 Whitespace cleanups. 2014-09-04 22:27:26 -07:00
Qinfan Wu
ff6a31d3b9 Refactor chunk map.
Break the chunk map into two separate arrays, in order to improve cache
locality. This is related to issue #23.
2014-09-04 22:22:52 -07:00
Jason Evans
f34f6037e8 Disable autom4te cache. 2014-09-02 17:49:29 -07:00
Jason Evans
a5a658ab48 Make VERSION generation more robust.
Relax the "are we in a git repo?" check to succeed even if the top level
jemalloc directory is not at the top level of the git repo.

Add git tag filtering so that only version triplets match when
generating VERSION.

Add fallback bogus VERSION creation, so that in the worst case, rather
than generating empty values for e.g. JEMALLOC_VERSION_MAJOR,
configuration ends up generating useless constants.
2014-09-02 15:07:07 -07:00
Jason Evans
3ebf6db2c7 Merge pull request #108 from wqfish/dev
Remove junk filling in tcache_bin_flush_small().
2014-08-27 12:04:01 -07:00
Qinfan Wu
58799f6d1c Remove junk filling in tcache_bin_flush_small().
Junk filling is done in arena_dalloc_bin_locked(), so arena_alloc_junk_small()
is redundant. Also, we should use arena_dalloc_junk_small() instead of
arena_alloc_junk_small().
2014-08-26 21:28:31 -07:00
Sara Golemon
3e24afa28e Test for availability of malloc hooks via autoconf
__*_hook() is glibc, but on at least one glibc platform (homebrew),
the __GLIBC__ define isn't set correctly and we miss being able to
use these hooks.

Do a feature test for it during configuration so that we enable it
anywhere the hooks are actually available.
2014-08-22 15:19:21 -07:00
Jason Evans
602c8e0971 Implement per thread heap profiling.
Rename data structures (prof_thr_cnt_t-->prof_tctx_t,
prof_ctx_t-->prof_gctx_t), and convert to storing a prof_tctx_t for
sampled objects.

Convert PROF_ALLOC_PREP() to prof_alloc_prep(), since precise backtrace
depth within jemalloc functions is no longer an issue (pprof prunes
irrelevant frames).

Implement mallctl's:
- prof.reset implements full sample data reset, and optional change of
  sample interval.
- prof.lg_sample reads the current sample interval (opt.lg_prof_sample
  was the permanent source of truth prior to prof.reset).
- thread.prof.name provides naming capability for threads within heap
  profile dumps.
- thread.prof.active makes it possible to activate/deactivate heap
  profiling for individual threads.

Modify the heap dump files to contain per thread heap profile data.
This change is incompatible with the existing pprof, which will require
enhancements to read and process the enriched data.
2014-08-19 21:31:16 -07:00
Jason Evans
1628e8615e Add rb_empty(). 2014-08-19 21:05:54 -07:00
Jason Evans
3a81cbd2d4 Dump heap profile backtraces in a stable order.
Also iterate over per thread stats in a stable order, which prepares the
way for stable ordering of per thread heap profile dumps.
2014-08-19 21:05:54 -07:00
Jason Evans
ab532e9799 Directly embed prof_ctx_t's bt. 2014-08-19 21:05:54 -07:00
Jason Evans
b41ccdb125 Convert prof_tdata_t's bt2cnt to a comprehensive map.
Treat prof_tdata_t's bt2cnt as a comprehensive map of the thread's
extant allocation samples (do not limit the total number of entries).
This helps prepare the way for per thread heap profiling.
2014-08-19 21:05:54 -07:00
Jason Evans
586c8ede42 Fix arena.<i>.dss mallctl to handle read-only calls. 2014-08-15 12:20:20 -07:00
Jason Evans
070b3c3fbd Fix and refactor runs_dirty-based purging.
Fix runs_dirty-based purging to also purge dirty pages in the spare
chunk.

Refactor runs_dirty manipulation into arena_dirty_{insert,remove}(), and
move the arena->ndirty accounting into those functions.

Remove the u.ql_link field from arena_chunk_map_t, and get rid of the
enclosing union for u.rb_link, since only rb_link remains.

Remove the ndirty field from arena_chunk_t.
2014-08-14 14:45:58 -07:00
Qinfan Wu
e8a2fd83a2 arena->npurgatory is no longer needed since we drop arena's lock
after stashing all the purgeable runs.
2014-08-12 09:50:01 -07:00
Qinfan Wu
90737fcda1 Remove chunks_dirty tree, nruns_avail and nruns_adjac since we no
longer need to maintain the tree for dirty page purging.
2014-08-12 09:50:00 -07:00