Commit Graph

68 Commits

Author SHA1 Message Date
Jason Evans
262146dfc4 Eliminate innocuous compiler warnings. 2015-03-14 14:34:16 -07:00
Jason Evans
764b00023f Fix a heap profiling regression.
Add the prof_tctx_state_destroying transitionary state to fix a race
between a thread destroying a tctx and another thread creating a new
equivalent tctx.

This regression was introduced by
602c8e0971 (Implement per thread heap
profiling.).
2015-03-14 14:01:35 -07:00
Jason Evans
d69964bd2d Fix a heap profiling regression.
Fix prof_tctx_comp() to incorporate tctx state into the comparison.
During a dump it is possible for both a purgatory tctx and an otherwise
equivalent nominal tctx to reside in the tree at the same time.

This regression was introduced by
602c8e0971 (Implement per thread heap
profiling.).
2015-03-12 16:25:18 -07:00
Jason Evans
1cb181ed63 Implement explicit tcache support.
Add the MALLOCX_TCACHE() and MALLOCX_TCACHE_NONE macros, which can be
used in conjunction with the *allocx() API.

Add the tcache.create, tcache.flush, and tcache.destroy mallctls.

This resolves #145.
2015-02-09 17:44:48 -08:00
Jason Evans
5b8ed5b7c9 Implement the prof.gdump mallctl.
This feature makes it possible to toggle the gdump feature on/off during
program execution, whereas the the opt.prof_dump mallctl value can only
be set during program startup.

This resolves #72.
2015-01-25 21:21:35 -08:00
Jason Evans
4581b97809 Implement metadata statistics.
There are three categories of metadata:

- Base allocations are used for bootstrap-sensitive internal allocator
  data structures.
- Arena chunk headers comprise pages which track the states of the
  non-metadata pages.
- Internal allocations differ from application-originated allocations
  in that they are for internal use, and that they are omitted from heap
  profiles.

The metadata statistics comprise the metadata categories as follows:

- stats.metadata: All metadata -- base + arena chunk headers + internal
  allocations.
- stats.arenas.<i>.metadata.mapped: Arena chunk headers.
- stats.arenas.<i>.metadata.allocated: Internal allocations.  This is
  reported separately from the other metadata statistics because it
  overlaps with the allocated and active statistics, whereas the other
  metadata statistics do not.

Base allocations are not reported separately, though their magnitude can
be computed by subtracting the arena-specific metadata.

This resolves #163.
2015-01-23 23:34:43 -08:00
Jason Evans
82cb603ed7 Don't dereference NULL tdata in prof_{enter,leave}().
It is possible for the thread's tdata to be NULL late during thread
destruction, so take care not to dereference a NULL pointer in such
cases.
2014-11-01 00:20:28 -07:00
Jason Evans
cfc5706f69 Miscellaneous cleanups. 2014-10-30 23:18:45 -07:00
Jason Evans
c93ed81cd0 Fix prof_{enter,leave}() calls to pass tdata_self. 2014-10-30 16:50:33 -07:00
Jason Evans
af1f592763 Use JEMALLOC_INLINE_C everywhere it's appropriate. 2014-10-30 16:38:08 -07:00
Jason Evans
44c97b712e Fix a prof_tctx_t/prof_tdata_t cleanup race.
Fix a prof_tctx_t/prof_tdata_t cleanup race by storing a copy of thr_uid
in prof_tctx_t, so that the associated tdata need not be present during
tctx teardown.
2014-10-12 13:03:20 -07:00
Jason Evans
57efa7bb0e Avoid atexit(3) when possible, disable prof_final by default.
atexit(3) can deadlock internally during its own initialization if
jemalloc calls atexit() during jemalloc initialization.  Mitigate the
impact by restructuring prof initialization to avoid calling atexit()
unless the registered function will actually dump a final heap profile.

Additionally, disable prof_final by default so that this land mine is
opt-in rather than opt-out.

This resolves #144.
2014-10-08 18:08:00 -07:00
Jason Evans
bf40641c5c Fix a prof_tctx_t destruction race. 2014-10-06 16:35:11 -07:00
Jason Evans
f04a0bef99 Fix prof regressions.
Fix prof regressions related to tdata (main per thread profiling data
structure) destruction:
- Deadlock.  The fix for this was intended to be part of
  20c31deaae (Test prof.reset mallctl and
  fix numerous discovered bugs.) but the fix was left incomplete.
- Destruction race.  Detaching tdata just prior to destruction without
  holding the tdatas lock made it possible for another thread to destroy
  the tdata out from under the thread that was on its way to doing so.
2014-10-04 15:03:49 -07:00
Jason Evans
029d44cf8b Fix tsd cleanup regressions.
Fix tsd cleanup regressions that were introduced in
5460aa6f66 (Convert all tsd variables to
reside in a single tsd structure.).  These regressions were twofold:

1) tsd_tryget() should never (and need never) return NULL.  Rename it to
   tsd_fetch() and simplify all callers.
2) tsd_*_set() must only be called when tsd is in the nominal state,
   because cleanup happens during the nominal-->purgatory transition,
   and re-initialization must not happen while in the purgatory state.
   Add tsd_nominal() and use it as needed.  Note that tsd_*{p,}_get()
   can still be used as long as no re-initialization that would require
   cleanup occurs.  This means that e.g. the thread_allocated counter
   can be updated unconditionally.
2014-10-04 11:22:55 -07:00
Jason Evans
fc12c0b8bc Implement/test/fix prof-related mallctl's.
Implement/test/fix the opt.prof_thread_active_init,
prof.thread_active_init, and thread.prof.active mallctl's.

Test/fix the thread.prof.name mallctl.

Refactor opt_prof_active to be read-only and move mutable state into the
prof_active variable.  Stop leaning on ctl-related locking for
protection.
2014-10-03 23:25:30 -07:00
Jason Evans
551ebc4364 Convert to uniform style: cond == false --> !cond 2014-10-03 10:16:09 -07:00
Jason Evans
20c31deaae Test prof.reset mallctl and fix numerous discovered bugs. 2014-10-02 23:01:10 -07:00
Jason Evans
6ef80d68f0 Fix profile dumping race.
Fix a race that caused a non-critical assertion failure.  To trigger the
race, a thread had to be part way through initializing a new sample,
such that it was discoverable by the dumping thread, but not yet linked
into its gctx by the time a later dump phase would normally have reset
its state to 'nominal'.

Additionally, lock access to the state field during modification to
transition to the dumping state.  It's not apparent that this oversight
could have caused an actual problem due to outer locking that protects
the dumping machinery, but the added locking pedantically follows the
stated locking protocol for the state field.
2014-09-24 22:23:43 -07:00
Jason Evans
5460aa6f66 Convert all tsd variables to reside in a single tsd structure. 2014-09-23 02:36:08 -07:00
Jason Evans
9d8f3d2033 Fix prof regressions.
Don't use atomic_add_uint64(), because it isn't available on 32-bit
platforms.

Fix forking support functions to manage all prof-related mutexes.

These regressions were introduced by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-11 18:09:14 -07:00
Jason Evans
6e73dc194e Fix a profile sampling race.
Fix a profile sampling race that was due to preparing to sample, yet
doing nothing to assure that the context remains valid until the stats
are updated.

These regressions were caused by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-09 19:47:09 -07:00
Jason Evans
6fd53da030 Fix prof_tdata_get()-related regressions.
Fix prof_tdata_get() to avoid dereferencing an invalid tdata pointer
(when it's PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).

Fix prof_tdata_get() callers to check for invalid results besides NULL
(PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).

These regressions were caused by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-09 15:29:34 -07:00
Jason Evans
602c8e0971 Implement per thread heap profiling.
Rename data structures (prof_thr_cnt_t-->prof_tctx_t,
prof_ctx_t-->prof_gctx_t), and convert to storing a prof_tctx_t for
sampled objects.

Convert PROF_ALLOC_PREP() to prof_alloc_prep(), since precise backtrace
depth within jemalloc functions is no longer an issue (pprof prunes
irrelevant frames).

Implement mallctl's:
- prof.reset implements full sample data reset, and optional change of
  sample interval.
- prof.lg_sample reads the current sample interval (opt.lg_prof_sample
  was the permanent source of truth prior to prof.reset).
- thread.prof.name provides naming capability for threads within heap
  profile dumps.
- thread.prof.active makes it possible to activate/deactivate heap
  profiling for individual threads.

Modify the heap dump files to contain per thread heap profile data.
This change is incompatible with the existing pprof, which will require
enhancements to read and process the enriched data.
2014-08-19 21:31:16 -07:00
Jason Evans
3a81cbd2d4 Dump heap profile backtraces in a stable order.
Also iterate over per thread stats in a stable order, which prepares the
way for stable ordering of per thread heap profile dumps.
2014-08-19 21:05:54 -07:00
Jason Evans
ab532e9799 Directly embed prof_ctx_t's bt. 2014-08-19 21:05:54 -07:00
Jason Evans
b41ccdb125 Convert prof_tdata_t's bt2cnt to a comprehensive map.
Treat prof_tdata_t's bt2cnt as a comprehensive map of the thread's
extant allocation samples (do not limit the total number of entries).
This helps prepare the way for per thread heap profiling.
2014-08-19 21:05:54 -07:00
Chris Peterson
3e310b34eb Fix -Wsign-compare warnings 2014-06-02 07:51:33 -07:00
Jason Evans
6f001059aa Simplify backtracing.
Simplify backtracing to not ignore any frames, and compensate for this
in pprof in order to increase flexibility with respect to function-based
refactoring even in the presence of non-deterministic inlining.  Modify
pprof to blacklist all jemalloc allocation entry points including
non-standard ones like mallocx(), and ignore all allocator-internal
frames.  Prior to this change, pprof excluded the specifically
blacklisted functions from backtraces, but it left allocator-internal
frames intact.
2014-04-22 20:55:09 -07:00
Lucian Adrian Grijincu
9d4e13f45a prof_backtrace: use unw_backtrace
unw_backtrace:
- does internal per-thread caching
- doesn't acquire an internal lock
2014-04-22 18:39:47 -07:00
Ben Maurer
6c39f9e059 refactor profiling. only use a bytes till next sample variable. 2014-04-16 13:43:30 -07:00
Jason Evans
9b0cbf0850 Remove support for non-prof-promote heap profiling metadata.
Make promotion of sampled small objects to large objects mandatory, so
that profiling metadata can always be stored in the chunk map, rather
than requiring one pointer per small region in each small-region page
run.  In practice the non-prof-promote code was only useful when using
jemalloc to track all objects and report them as leaks at program exit.
However, Valgrind is at least as good a tool for this particular use
case.

Furthermore, the non-prof-promote code is getting in the way of
some optimizations that will make heap profiling much cheaper for the
predominant use case (sampling a small representative proportion of all
allocations).
2014-04-11 14:24:51 -07:00
Harald Weppner
c2da2591be Consistently use debug lib(s) if present
Fixes a situation where nm uses the debug lib but
addr2line does not, which completely messes up the symbol
lookup.
2014-03-28 13:47:59 -07:00
Harald Weppner
bf543df20c Enable profiling / leak detection in FreeBSD
* Assumes procfs is mounted at /proc, cf.
  <http://www.freebsd.org/doc/en/articles/linux-users/procfs.html>
2014-03-17 23:53:00 -07:00
Jason Evans
772163b4f3 Add heap profiling tests.
Fix a regression in prof_dump_ctx() due to an uninitized variable.  This
was caused by revision 4f37ef693e, so no
releases are affected.
2014-01-17 15:40:52 -08:00
Jason Evans
eefdd02e70 Fix a variable prototype/definition mismatch. 2014-01-16 18:04:30 -08:00
Jason Evans
4f37ef693e Refactor prof_dump() to reduce contention.
Refactor prof_dump() to use a two pass algorithm, and prof_leave() prior
to the second pass.  This avoids write(2) system calls while holding
critical prof resources.

Fix prof_dump() to close the dump file descriptor for all relevant error
paths.

Minimize the size of prof-related static buffers when prof is disabled.
This saves roughly 65 KiB of application memory for non-prof builds.

Refactor prof_ctx_init() out of prof_lookup_global().
2014-01-16 13:36:38 -08:00
Jason Evans
fb1775e47e Refactor prof_lookup() by extracting prof_lookup_global(). 2014-01-14 17:04:34 -08:00
Jason Evans
239692b18e Fix whitespace. 2013-10-28 12:41:37 -07:00
Jason Evans
93f39f8d23 Fix a file descriptor leak in a prof_dump_maps() error path.
Reported by Pat Lynch.
2013-10-21 15:07:40 -07:00
Jason Evans
f1c3da8b02 Consistently use malloc_mutex_prefork().
Consistently use malloc_mutex_prefork() instead of malloc_mutex_lock()
in all prefork functions.
2013-10-21 14:59:10 -07:00
Jason Evans
6556e28be1 Prefer not_reached() over assert(false) where appropriate. 2013-10-21 14:56:27 -07:00
Jason Evans
bbe29d374d Fix potential TLS-related memory corruption.
Avoid writing to uninitialized TLS as a side effect of deallocation.
Initializing TLS during deallocation is unsafe because it is possible
that a thread never did any allocation, and that TLS has already been
deallocated by the threads library, resulting in write-after-free
corruption.  These fixes affect prof_tdata and quarantine; all other
uses of TLS are already safe, whether intentionally (as for tcache) or
unintentionally (as for arenas).
2013-01-31 14:23:48 -08:00
Jason Evans
ae03bf6a57 Update hash from MurmurHash2 to MurmurHash3.
Update hash from MurmurHash2 to MurmurHash3, primarily because the
latter generates 128 bits in a single call for no extra cost, which
simplifies integration with cuckoo hashing.
2013-01-22 12:02:08 -08:00
Jason Evans
a3b3386ddd Avoid arena_prof_accum()-related locking when possible.
Refactor arena_prof_accum() and its callers to avoid arena locking when
prof_interval is 0 (as when profiling is disabled).

Reported by Ben Maurer.
2012-11-13 13:47:53 -08:00
Jason Evans
20f1fc95ad Fix fork(2)-related deadlocks.
Add a library constructor for jemalloc that initializes the allocator.
This fixes a race that could occur if threads were created by the main
thread prior to any memory allocation, followed by fork(2), and then
memory allocation in the child process.

Fix the prefork/postfork functions to acquire/release the ctl, prof, and
rtree mutexes.  This fixes various fork() child process deadlocks, but
one possible deadlock remains (intentionally) unaddressed: prof
backtracing can acquire runtime library mutexes, so deadlock is still
possible if heap profiling is enabled during fork().  This deadlock is
known to be a real issue in at least the case of libgcc-based
backtracing.

Reported by tfengjun.
2012-10-09 15:21:46 -07:00
Jason Evans
f278994029 Fix more prof_tdata resurrection corner cases. 2012-04-28 23:27:13 -07:00
Jason Evans
0050a0f7e6 Handle prof_tdata resurrection.
Handle prof_tdata resurrection during thread shutdown, similarly to how
tcache and quarantine handle resurrection.
2012-04-28 18:14:24 -07:00
Jason Evans
95ff6aadca Don't set prof_tdata during thread cleanup.
Don't set prof_tdata during thread cleanup, because doing so will cause
the cleanup function to be called again, the second time with a NULL
argument.
2012-04-28 14:15:28 -07:00
Jason Evans
52386b2dc6 Fix heap profiling bugs.
Fix a potential deadlock that could occur during interval- and
growth-triggered heap profile dumps.

Fix an off-by-one heap profile statistics bug that could be observed in
interval- and growth-triggered heap profiles.

Fix heap profile dump filename sequence numbers (regression during
conversion to malloc_snprintf()).
2012-04-22 16:00:11 -07:00