Commit Graph

75 Commits

Jason Evans
3e292475ee Implement atomic operations for x86/x64.
Add inline assembly implementations of atomic_{add,sub}_uint{32,64}()
for x86/x64, in order to support compilers that are missing the relevant
gcc intrinsics.
2011-03-24 16:48:11 -07:00
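
For illustration, a minimal sketch of such a fallback for the 32-bit add
(the 64-bit and subtraction variants follow the same pattern; details of
the real routine may differ):

    static inline uint32_t
    atomic_add_uint32(uint32_t *p, uint32_t x)
    {
        uint32_t t = x;

        /* lock xadd atomically adds t into *p and leaves the old
         * value of *p in t. */
        asm volatile (
            "lock; xaddl %0, %1;"
            : "+r" (t), "=m" (*p)   /* Outputs. */
            : "m" (*p)              /* Inputs. */
        );
        return (t + x);             /* The new value. */
    }
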
Jason Evans
9f949f9d82 Revert "Add support for libunwind backtrace caching."
This reverts commit adc675c8ef.

The original commit added support for a non-standard libunwind API, so
it was not of general utility.
2011-03-22 20:44:40 -07:00
je@facebook.com
adc675c8ef Add support for libunwind backtrace caching.
Use libunwind's unw_tdep_trace() if it is available.
2011-03-23 17:45:57 -07:00
Jason Evans
38d9210c46 Fix error detection for ipalloc() when profiling.
sa2u() returns 0 on overflow, but the profiling code was blindly calling
sa2u() and allowing the error to silently propagate, ultimately ending
in a later assertion failure.  Refactor all ipalloc() callers to call
sa2u(), check for overflow before calling ipalloc(), and pass usize
rather than size.  This allows ipalloc() to avoid calling sa2u() in the
common case.
2011-03-23 00:37:29 -07:00
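
A sketch of the refactored caller pattern described above (the exact
argument lists of sa2u() and ipalloc() are assumptions):

    size_t usize;
    void *ret;

    usize = sa2u(size, alignment, NULL);
    if (usize == 0)
        return (NULL);      /* Overflow is caught before allocating. */
    ret = ipalloc(usize, alignment, false);
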
Jason Evans
47e57f9bda Avoid overflow in arena_run_regind().
Fix a regression due to:
    Remove an arena_bin_run_size_calc() constraint.
    2a6f2af6e4
The removed constraint required that small run headers fit in one page,
which indirectly limited runs such that they would not cause overflow in
arena_run_regind().  Add an explicit constraint to
arena_bin_run_size_calc() based on the largest number of regions that
arena_run_regind() can handle (2^11 as currently configured).
2011-03-22 09:00:56 -07:00
Jason Evans
1dcb4f86b2 Dynamically adjust tcache fill count.
Dynamically adjust tcache fill count (number of objects allocated per
tcache refill) such that if GC has to flush inactive objects, the fill
count gradually decreases.  Conversely, if refills occur while the fill
count is depressed, the fill count gradually increases back to its
maximum value.
2011-03-21 00:18:17 -07:00
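
A sketch of one way to keep such an adaptive fill count, as a divisor of
the bin's maximum (the field names are assumptions):

    /* Number of objects fetched per refill: */
    fill = tbin->ncached_max >> tbin->lg_fill_div;
    /* GC flushes inactive objects  -> lg_fill_div++ (smaller fills).
     * Refill while fills depressed -> lg_fill_div-- (back toward the
     * maximum). */
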
Jason Evans
893a0ed7c8 Use OSSpinLock*() for locking on OS X.
pthread_mutex_lock() can call malloc() on OS X (!!!), which causes
deadlock.  Work around this by using spinlocks that are built of more
primitive stuff.
2011-03-18 19:30:18 -07:00
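
A minimal sketch of locking built on OSSpinLock, which allocates
nothing:

    #include <libkern/OSAtomic.h>

    static OSSpinLock mtx = OS_SPINLOCK_INIT;

    OSSpinLockLock(&mtx);      /* Spins; never calls malloc(). */
    /* ... critical section ... */
    OSSpinLockUnlock(&mtx);
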
Jason Evans
763baa6cfc Add atomic operation support for OS X. 2011-03-18 19:10:31 -07:00
Jason Evans
92d3284ff8 Add atomic.[ch].
Add atomic.[ch], which should have been part of the previous commit.
2011-03-18 18:15:37 -07:00
Jason Evans
0657f12acd Add the "stats.cactive" mallctl.
Add the "stats.cactive" mallctl, which can be used to efficiently and
repeatedly query approximately how much active memory the application is
utilizing.
2011-03-18 17:56:14 -07:00
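
Example usage, assuming (per the "efficiently and repeatedly" wording)
that the mallctl returns a pointer to the live counter:

    size_t *cactive;
    size_t sz = sizeof(cactive);

    mallctl("stats.cactive", (void *)&cactive, &sz, NULL, 0);
    /* From here on, poll *cactive directly; no further mallctl()
     * calls are needed. */
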
Jason Evans
597632be18 Improve thread-->arena assignment.
Rather than blindly assigning threads to arenas in round-robin fashion,
choose the lowest-numbered arena that currently has the smallest number
of threads assigned to it.

Add the "stats.arenas.<i>.nthreads" mallctl.
2011-03-18 13:41:33 -07:00
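
A sketch of the selection policy (variable and field names are
assumptions):

    arena_t *choose = arenas[0];
    unsigned i;

    for (i = 1; i < narenas; i++) {
        /* Strict '<' keeps the lowest-numbered arena on ties. */
        if (arenas[i]->nthreads < choose->nthreads)
            choose = arenas[i];
    }
    choose->nthreads++;   /* Reported via stats.arenas.<i>.nthreads. */
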
Jason Evans
84c8eefeff Use bitmaps to track small regions.
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills.  Fix this in two places:

  - arena_run_t: Use a new bitmap implementation to track which regions
                 are available.  Furthermore, revert to preferring the
                 lowest available region (as jemalloc did with its old
                 bitmap-based approach).

  - tcache_t: Move read-only tcache_bin_t metadata into
              tcache_bin_info_t, and add a contiguous array of pointers
              to tcache_t in order to track cached objects.  This
              substantially increases the size of tcache_t, but results
              in much higher data locality for common tcache operations.
              As a side benefit, it is again possible to efficiently
              flush the least recently used cached objects, so this
              change switches flushing from MRU to LRU order.

The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast.  In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.

Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.

Use JEMALLOC_DEBUG rather than NDEBUG.

Add dassert(), and use it for debug-only asserts.
2011-03-17 16:29:32 -07:00
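
For illustration, a sketch of a two-level lowest-set-bit search, with
the convention that a set bit means "region available" (all names are
assumptions):

    static size_t
    bitmap_sfu(unsigned long *groups, unsigned long *summary)
    {
        /* One summary bit per group word: find the first group with
         * a free region, then the first free region within it. */
        unsigned g = __builtin_ffsl((long)*summary) - 1;
        unsigned b = __builtin_ffsl((long)groups[g]) - 1;

        groups[g] &= ~(1UL << b);        /* Mark the region used. */
        if (groups[g] == 0)
            *summary &= ~(1UL << g);     /* The group is now full. */
        return ((size_t)g * sizeof(long) * 8 + b);
    }
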
Jason Evans
77f350be08 Improve backtracing-related configuration.
Clean up configuration for backtracing when profiling is enabled, and
document the configuration logic in INSTALL.

Disable libgcc-based backtracing except on x64 (where it is known to
work).

Add the --disable-prof-gcc option.
2011-03-15 22:23:12 -07:00
Jason Evans
b602daa671 Clean up after arena_bin_info_t change.
Fix a couple of problems related to the addition of arena_bin_info_t.
2011-03-15 22:19:45 -07:00
Jason Evans
49f7e8f35a Create arena_bin_info_t.
Move read-only fields from arena_bin_t into arena_bin_info_t, primarily
in order to avoid false cacheline sharing.
2011-03-15 13:59:15 -07:00
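
A sketch of the split (field names are assumptions):

    /* Written once at boot, read on every allocation; keeping these
     * fields out of arena_bin_t keeps them off mutated cachelines. */
    typedef struct arena_bin_info_s {
        size_t      reg_size;   /* Size of each region. */
        size_t      run_size;   /* Total size of a run. */
        uint32_t    nregs;      /* Regions per run. */
    } arena_bin_info_t;
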
Jason Evans
41ade967c2 Reduce size of small_size2bin lookup table.
Convert all direct small_size2bin[...] accesses to SMALL_SIZE2BIN(...)
macro calls, and use a couple of cheap math operations to allow
compacting the table by 4X or 8X, on 32- and 64-bit systems,
respectively.
2011-03-15 11:12:11 -07:00
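
A sketch of the macro shape this implies (LG_TINY_MIN is an assumed name
for the compaction shift):

    /* All sizes mapping to the same ((s)-1) >> LG_TINY_MIN index share
     * one entry, shrinking the table by 2^LG_TINY_MIN. */
    #define SMALL_SIZE2BIN(s) (small_size2bin[((s) - 1) >> LG_TINY_MIN])
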
Jason Evans
ff7450727f Expand a comment regarding geometric sampling. 2011-03-15 11:12:11 -07:00
Jason Evans
814b9bda7f Fix a cpp logic regression.
Fix a cpp logic error that was introduced by the recent commit:
	Fix "thread.{de,}allocatedp" mallctl.
2011-03-06 23:03:33 -08:00
Arun Sharma
af5d6987f8 Build both PIC and non-PIC static libraries
When jemalloc is linked into an executable (as opposed to a shared
library), compiling with -fno-pic can have significant advantages,
mainly because we don't have to go through the GOT (global offset
table).

Users who want to link jemalloc into a shared library that could
be dlopened need to link with libjemalloc_pic.a or libjemalloc.so.
2011-03-02 11:14:50 -08:00
Jason Evans
655f04a5a4 Fix style nits. 2011-02-13 18:44:59 -08:00
Jason Evans
9dcad2dfd1 Fix "thread.{de,}allocatedp" mallctl.
For the non-TLS case (as on OS X), if the "thread.{de,}allocatedp"
mallctl was called before any allocation occurred for that thread, the
TSD was still NULL, thus putting the application at risk of
dereferencing NULL.  Fix this by refactoring the initialization code,
and making it part of the conditional logic for all per thread
allocation counter accesses.
2011-02-13 18:11:54 -08:00
Jason Evans
f256680f87 Fix ALLOCM_LG_ALIGN definition.
Fix ALLOCM_LG_ALIGN to take a parameter and use it.  Apparently, an
editing error left ALLOCM_LG_ALIGN with the same definition as
ALLOCM_LG_ALIGN_MASK.
2011-01-26 08:24:24 -08:00
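
The corrected shape, in sketch form (the mask value shown is an
assumption):

    #define ALLOCM_LG_ALIGN(la)   (la)         /* Takes lg(alignment). */
    #define ALLOCM_LG_ALIGN_MASK  ((int)0x3f)  /* The mask it had been
                                                * mistakenly defined as. */
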
Jason Evans
dbd3832d20 Fix assertion typos.
s/=/==/ in several assertions, as well as fixing spelling errors.
2011-01-14 17:37:27 -08:00
Jason Evans
8ad0eacfb3 Update various comments. 2010-12-17 18:07:53 -08:00
Jason Evans
50ac670d09 Remove high_water from tcache_bin_t.
Remove the high_water field from tcache_bin_t, since it is not useful
for anything.
2010-12-16 14:12:48 -08:00
Jason Evans
cfdc8cfbd6 Use mremap(2) for huge realloc().
If mremap(2) is available and supports MREMAP_FIXED, use it for huge
realloc().

Initialize rtree later during bootstrapping, so that --enable-debug
--enable-dss works.

Fix a minor swap_avail stats bug.
2010-11-30 16:50:58 -08:00
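
A sketch of the mremap(2) path (addresses and the fallback are
illustrative; MREMAP_FIXED must be paired with MREMAP_MAYMOVE):

    /* Requires _GNU_SOURCE and <sys/mman.h>. */
    void *ret = mremap(old_addr, old_size, new_size,
        MREMAP_MAYMOVE | MREMAP_FIXED, new_addr);
    if (ret == MAP_FAILED) {
        /* Fall back to map/copy/unmap. */
    }
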
Jason Evans
ce93055c49 Use madvise(..., MADV_FREE) on OS X.
Use madvise(..., MADV_FREE) rather than msync(..., MS_KILLPAGES) on OS
X, since it works for at least OS X 10.5 and 10.6.
2010-10-24 13:03:07 -07:00
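
In sketch form:

    /* Pages are reclaimed lazily and the mapping stays intact: */
    madvise(addr, length, MADV_FREE);
    /* Previously: msync(addr, length, MS_KILLPAGES). */
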
Jason Evans
e73397062a Replace JEMALLOC_OPTIONS with MALLOC_CONF.
Replace the single-character run-time flags with key/value pairs, which
can be set via the malloc_conf global, /etc/malloc.conf, and the
MALLOC_CONF environment variable.

Replace the JEMALLOC_PROF_PREFIX environment variable with the
"opt.prof_prefix" option.

Replace umax2s() with u2s().
2010-10-23 18:37:06 -07:00
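
For illustration, the three ways an option string can now be supplied
(the option values are examples only):

    /* 1) A compile-time default, via a global the application defines: */
    const char *malloc_conf = "prof:true,prof_prefix:/tmp/jeprof";

    /* 2) A /etc/malloc.conf symlink whose target is the option string,
     * or 3) the environment:
     *     MALLOC_CONF="prof:true,prof_prefix:/tmp/jeprof" ./app */
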
Jason Evans
e4f7846f1f Fix heap profiling bugs.
Fix a regression due to the recent heap profiling accuracy improvements:
prof_{m,re}alloc() must set the object's profiling context regardless of
whether it is sampled.

Fix management of the CHUNK_MAP_CLASS chunk map bits, such that all
large object (re-)allocation paths correctly initialize the bits.  Prior
to this fix, in-place realloc() cleared the bits, resulting in incorrect
reported object size from arena_salloc_demote().  After this fix the
non-demoted bit pattern is all zeros (instead of all ones), which makes
it easier to assure that the bits are properly set.
2010-10-22 10:45:59 -07:00
Jason Evans
81b4e6eb6f Fix a heap profiling regression.
Call prof_ctx_set() in all paths through prof_{m,re}alloc().

Inline arena_prof_ctx_get().
2010-10-20 20:52:00 -07:00
Jason Evans
4d6a134e13 Inline the fast path for heap sampling.
Inline the heap sampling code that is executed for every allocation
event (regardless of whether a sample is taken).

Combine all prof TLS data into a single data structure, in order to
reduce the TLS lookup volume.
2010-10-20 19:05:59 -07:00
Jason Evans
93443689a4 Add per thread allocation counters, and enhance heap sampling.
Add the "thread.allocated" and "thread.deallocated" mallctls, which can
be used to query the total number of bytes ever allocated/deallocated by
the calling thread.

Add s2u() and sa2u(), which can be used to compute the usable size that
will result from an allocation request of a particular size/alignment.

Refactor ipalloc() to use sa2u().

Enhance the heap profiler to trigger samples based on usable size,
rather than request size.  This has a subtle, but important, impact on
the accuracy of heap sampling.  For example, previous to this change,
16- and 17-byte objects were sampled at nearly the same rate, but
17-byte objects actually consume 32 bytes each.  Therefore it was
possible for the sample to be somewhat skewed compared to actual memory
usage of the allocated objects.
2010-10-20 17:39:18 -07:00
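
Example of reading one of the new counters (error handling omitted):

    uint64_t allocated;
    size_t sz = sizeof(allocated);

    mallctl("thread.allocated", &allocated, &sz, NULL, 0);
    /* allocated == total bytes ever allocated by this thread. */
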
Jason Evans
940a2e02b2 Fix numerous arena bugs.
In arena_ralloc_large_grow(), update the map element for the end of the
newly grown run, rather than the interior map element that was the
beginning of the appended run.  This is a long-standing bug, and it had
the potential to cause massive corruption, but triggering it required
roughly the following sequence of events:
  1) Large in-place growing realloc(), with left-over space in the run
     that followed the large object.
  2) Allocation of the remainder run left over from (1).
  3) Deallocation of the remainder run *before* deallocation of the
     large run, with unfortunate interior map state left over from
     previous run allocation/deallocation activity, such that one or
     more pages of allocated memory would be treated as part of the
     remainder run during run coalescing.
In summary, this was a bad bug, but it was difficult to trigger.

In arena_bin_malloc_hard(), if another thread wins the race to allocate
a bin run, dispose of the spare run via arena_bin_lower_run() rather
than arena_run_dalloc(), since the run has already been prepared for use
as a bin run.  This bug has existed since March 14, 2010:
    e00572b384
    mmap()/munmap() without arena->lock or bin->lock.

Fix bugs in arena_dalloc_bin_run(), arena_trim_head(),
arena_trim_tail(), and arena_ralloc_large_grow() that could cause the
CHUNK_MAP_UNZEROED map bit to become corrupted.  These are all
long-standing bugs, but the chances of them actually causing problems
was much lower before the CHUNK_MAP_ZEROED --> CHUNK_MAP_UNZEROED
conversion.

Fix a large run statistics regression in arena_ralloc_large_grow() that
was introduced on September 17, 2010:
    8e3c3c61b5
    Add {,r,s,d}allocm().

Add debug code to validate that supposedly pre-zeroed memory really is.
2010-10-17 17:52:14 -07:00
Jason Evans
c6e950665c Increase PRN 'a' and 'c' constants.
Increase PRN 'a' and 'c' constants, so that high bits tend to cascade
more.
2010-10-03 00:22:46 -07:00
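
In sketch form, with the result taken from the high bits (the macro
names are assumptions):

    state = state * PRN_A + PRN_C;    /* Linear congruential step. */
    r = state >> (32 - lg_range);     /* Keep only the high lg_range
                                       * bits, which cascade the most. */
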
Jason Evans
588a32cd84 Increase default backtrace depth from 4 to 128.
Increase the default backtrace depth, because shallow backtraces tend to
result in confusing pprof output graphs.
2010-10-02 22:38:14 -07:00
Jason Evans
a881cd2c61 Make cumulative heap profile data optional.
Add the R option to control whether cumulative heap profile data
are maintained.  Add the T option to control the size of per thread
backtrace caches, primarily because when the R option is specified,
backtraces that no longer have allocations associated with them are
discarded as soon as no thread caches refer to them.
2010-10-02 21:40:26 -07:00
Jason Evans
3377ffa1f4 Change CHUNK_MAP_ZEROED to CHUNK_MAP_UNZEROED.
Invert the chunk map bit that tracks whether a page is zeroed, so that
for zeroed arena chunks, the interior of the page map does not need to
be initialized (as it consists entirely of zero bytes).
2010-10-01 17:53:37 -07:00
Jason Evans
7393f44ff0 Omit chunk header in arena chunk map.
Omit the first map_bias elements of the map in arena_chunk_t.  This
avoids barely spilling over into an extra chunk header page for common
chunk sizes.
2010-10-01 17:35:43 -07:00
Jason Evans
37dab02e52 Disable interval-based profile dumps by default.
It is common to have to specify something like JEMALLOC_OPTIONS=F31i,
because interval-based dumps are often not useful or too expensive.
Therefore, disable interval-based dumps by default.  To get the previous
default behavior it is now necessary to specify 31I as part of the
options.
2010-09-30 17:10:17 -07:00
Jason Evans
6005f0710c Add the "arenas.purge" mallctl. 2010-09-30 16:55:08 -07:00
Jason Evans
075e77cad4 Fix compiler warnings and errors.
Use INT_MAX instead of MAX_INT in ALLOCM_ALIGN(), and #include
<limits.h> in order to get its definition.

Modify prof code related to hash tables to avoid aliasing warnings from
gcc 4.1.2 (gcc 4.4.0 and 4.4.3 do not warn).
2010-09-20 19:53:25 -07:00
Jason Evans
355b438c85 Fix compiler warnings.
Add --enable-cc-silence, which can be used to silence harmless warnings.

Fix an aliasing bug in ckh_pointer_hash().
2010-09-20 19:20:48 -07:00
Jason Evans
6a0d2918ce Add memalign() and valloc() overrides.
If memalign() and/or valloc() are present on the system, override them
in order to avoid mixed allocator usage.
2010-09-20 16:52:41 -07:00
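
A sketch of such an override, routed through an entry point jemalloc
already intercepts (valloc() can be handled the same way, with the page
size as the alignment):

    void *
    memalign(size_t alignment, size_t size)
    {
        void *ret = NULL;

        posix_memalign(&ret, alignment, size);
        return (ret);
    }
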
Jason Evans
a09f55c87d Wrap strerror_r().
Create the buferror() function, which wraps strerror_r().  This is
necessary because glibc provides a non-standard strerror_r().
2010-09-20 16:05:41 -07:00
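
A sketch of such a wrapper (the feature-test condition is an
assumption):

    static int
    buferror(int errnum, char *buf, size_t buflen)
    {
    #if defined(__GLIBC__) && defined(_GNU_SOURCE)
        /* GNU strerror_r() returns char * and may ignore buf. */
        char *b = strerror_r(errnum, buf, buflen);
        if (b != buf) {
            strncpy(buf, b, buflen);
            buf[buflen - 1] = '\0';
        }
        return (0);
    #else
        /* POSIX strerror_r() fills buf and returns an error code. */
        return (strerror_r(errnum, buf, buflen));
    #endif
    }
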
Jason Evans
a094babe33 Add gcc attributes for *allocm() prototypes. 2010-09-17 17:35:42 -07:00
Jason Evans
8e3c3c61b5 Add {,r,s,d}allocm().
Add allocm(), rallocm(), sallocm(), and dallocm(), which are a
functional superset of malloc(), calloc(), posix_memalign(),
malloc_usable_size(), and free().
2010-09-17 15:46:18 -07:00
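
Example usage of the new family (error handling abbreviated):

    void *p;
    size_t rsize;

    if (allocm(&p, &rsize, 4096, ALLOCM_ALIGN(4096) | ALLOCM_ZERO)
        == ALLOCM_SUCCESS) {
        /* rsize holds the usable size (cf. malloc_usable_size()). */
        dallocm(p, 0);
    }
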
Jason Evans
8d7a94b275 Fix porting regressions.
Fix new build failures and test failures on Linux that were introduced
by the port to OS X.
2010-09-11 23:38:12 -07:00
Jason Evans
2dbecf1f62 Port to Mac OS X.
Add Mac OS X support, based in large part on the OS X support in
Mozilla's version of jemalloc.
2010-09-11 18:20:16 -07:00
Jordan DeLong
2206e1acc1 Add MAP_NORESERVE support.
Add MAP_NORESERVE to the chunk_mmap() case being used by
chunk_swap_enable(), if the system supports it.
2010-05-11 11:46:53 -07:00
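
In sketch form:

    int flags = MAP_PRIVATE | MAP_ANON;
    #ifdef MAP_NORESERVE
        flags |= MAP_NORESERVE;   /* Skip swap reservation if supported. */
    #endif
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
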
Jason Evans
ecea0f6125 Fix junk filling of cached large objects.
Use the size argument to tcache_dalloc_large() to control the number of
bytes set to 0x5a when junk filling is enabled, rather than accessing a
non-existent arena bin.  This bug was capable of corrupting an
arbitrarily large memory region, depending on what followed the arena
data structure in memory (typically zeroed memory, another arena_t, or a
red-black tree node for a huge object).
2010-04-28 12:00:59 -07:00
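
The fix, in sketch form (treating opt_junk as the junk-fill toggle is an
assumption):

    void
    tcache_dalloc_large(tcache_t *tcache, void *ptr, size_t size)
    {
        if (opt_junk)
            memset(ptr, 0x5a, size);   /* Fill exactly size bytes. */
        /* ... stash ptr in the appropriate tcache bin ... */
    }
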