Commit Graph

659 Commits

Author SHA1 Message Date
Jason Evans
40cbd30d50 Fix huge_ralloc_no_move() to succeed more often.
Fix huge_ralloc_no_move() to succeed if an allocation request results in
the same usable size as the existing allocation, even if the request
size is smaller than the usable size.  This bug did not cause
correctness issues, but it could cause unnecessary moves during
reallocation.
2015-07-24 18:20:48 -07:00
Jason Evans
87ccb55547 Fix huge_palloc() to handle size rather than usize input.
huge_ralloc() passes a size that may not be precisely a size class, so
make huge_palloc() handle the more general case of a size input rather
than usize.

This regression appears to have been introduced by the addition of
in-place huge reallocation; as such it was never incorporated into a
release.
2015-07-23 17:18:49 -07:00
Jason Evans
50883deb6e Change arena_palloc_large() parameter from size to usize.
This change merely documents that arena_palloc_large() always receives
usize as its argument.
2015-07-23 17:13:18 -07:00
Jason Evans
5fae7dc1b3 Fix MinGW-related portability issues.
Create and use FMT* macros that are equivalent to the PRI* macros that
inttypes.h defines.  This allows uniform use of the Unix-specific format
specifiers, e.g. "%zu", as well as avoiding Windows-specific definitions
of e.g. PRIu64.

Add ffs()/ffsl() support for compiling with gcc.

Extract compatibility definitions of ENOENT, EINVAL, EAGAIN, EPERM,
ENOMEM, and ENORANGE into include/msvc_compat/windows_extra.h and
use the file for tests as well as for core jemalloc code.
2015-07-23 13:56:25 -07:00
Jason Evans
e42c309eba Add JEMALLOC_FORMAT_PRINTF().
Replace JEMALLOC_ATTR(format(printf, ...). with
JEMALLOC_FORMAT_PRINTF(), so that configuration feature tests can
omit the attribute if it would cause extraneous compilation warnings.
2015-07-22 15:44:47 -07:00
Jason Evans
00632609df Move JEMALLOC_NOTHROW just after return type.
Only use __declspec(nothrow) in C++ mode.

This resolves #244.
2015-07-21 08:21:13 -07:00
Mike Hommey
50cd636eed Remove JEMALLOC_ALLOC_SIZE annotations on functions not returning pointers
As per gcc documentation:
  The alloc_size attribute is used to tell the compiler that the function
  return value points to memory (...)

This resolves #245.
2015-07-21 09:16:07 +09:00
Jason Evans
f2bc85298c Add the config.cache_oblivious mallctl. 2015-07-17 16:38:25 -07:00
Jason Evans
aa2826621e Revert to first-best-fit run/chunk allocation.
This effectively reverts 97c04a9383 (Use
first-fit rather than first-best-fit run/chunk allocation.).  In some
pathological cases, first-fit search dominates allocation time, and it
also tends not to converge as readily on a steady state of memory
layout, since precise allocation order has a bigger effect than for
first-best-fit.
2015-07-15 17:15:19 -07:00
Jason Evans
ae93d6bf36 Avoid function prototype incompatibilities.
Add various function attributes to the exported functions to give the
compiler more information to work with during optimization, and also
specify throw() when compiling with C++ on Linux, in order to adequately
match what __THROW does in glibc.

This resolves #237.
2015-07-10 16:09:40 -07:00
Jason Evans
d508ec71eb Fix a variable declaration typo. 2015-07-07 20:28:22 -07:00
Jason Evans
b946086b08 Use jemalloc_ffs() rather than ffs(). 2015-07-07 20:16:25 -07:00
Jason Evans
0313607e66 Fix MinGW build warnings.
Conditionally define ENOENT, EINVAL, etc. (was unconditional).

Add/use PRIzu, PRIzd, and PRIzx for use in malloc_printf() calls.  gcc issued
(harmless) warnings since e.g. "%zu" should be "%Iu" on Windows, and the
alternative to this workaround would have been to disable the function
attributes which cause gcc to look for type mismatches in formatted printing
function calls.
2015-07-07 20:10:28 -07:00
Jason Evans
0dd3ad3841 Fix an assignment type warning for tls_callback. 2015-07-07 20:10:27 -07:00
Jason Evans
bce61d61bb Move a variable declaration closer to its use. 2015-07-07 09:32:05 -07:00
Matthijs
a1aaf949a5 Optimizations for Windows
- Set opt_lg_chunk based on run-time OS setting
- Verify LG_PAGE is compatible with run-time OS setting
- When targeting Windows Vista or newer, use SRWLOCK instead of CRITICAL_SECTION
- When targeting Windows Vista or newer, statically initialize init_lock
2015-06-25 22:53:58 +02:00
Jason Evans
241abc601b Fix size class overflow handling when profiling is enabled.
Fix size class overflow handling for malloc(), posix_memalign(),
memalign(), calloc(), and realloc() when profiling is enabled.

Remove an assertion that erroneously caused arena_sdalloc() to fail when
profiling was enabled.

This resolves #232.
2015-06-23 18:56:14 -07:00
Jason Evans
0a9f9a4d51 Convert arena_maybe_purge() recursion to iteration.
This resolves #235.
2015-06-22 18:50:58 -07:00
Jason Evans
dc0610a714 Add alignment assertions to public aligned allocation functions. 2015-06-22 18:48:58 -07:00
Jason Evans
4f6f2b131e Fix two valgrind integration regressions.
The regressions were never merged into the master branch.
2015-06-22 14:38:06 -07:00
Jason Evans
56048baeb4 Clarify relationship between stats.resident and stats.mapped. 2015-05-29 19:21:10 -07:00
Jason Evans
09983d2f54 Bypass tcache when draining quarantined allocations.
This avoids the potential surprise of deallocating an object with one
tcache specified, and having the object cached in a different tcache
once it drains from the quarantine.
2015-05-29 19:20:36 -07:00
Jason Evans
836bbe9951 Impose a minimum tcache count for small size classes.
Now that small allocation runs have fewer regions due to run metadata
residing in chunk headers, an explicit minimum tcache count is needed to
make sure that tcache adequately amortizes synchronization overhead.
2015-05-19 17:47:16 -07:00
Jason Evans
5154175cf1 Fix performance regression in arena_palloc().
Pass large allocation requests to arena_malloc() when possible.  This
regression was introduced by 155bfa7da1
(Normalize size classes.).
2015-05-19 17:42:31 -07:00
Jason Evans
5aa50a2834 Fix nhbins calculation.
This regression was introduced by
155bfa7da1 (Normalize size classes.).
2015-05-19 17:40:37 -07:00
Jason Evans
fd5f9e43c3 Avoid atomic operations for dependent rtree reads. 2015-05-15 17:02:30 -07:00
Jason Evans
8a03cf039c Implement cache index randomization for large allocations.
Extract szad size quantization into {extent,run}_quantize(), and .
quantize szad run sizes to the union of valid small region run sizes and
large run sizes.

Refactor iteration in arena_run_first_fit() to use
run_quantize{,_first,_next(), and add support for padded large runs.

For large allocations that have no specified alignment constraints,
compute a pseudo-random offset from the beginning of the first backing
page that is a multiple of the cache line size.  Under typical
configurations with 4-KiB pages and 64-byte cache lines this results in
a uniform distribution among 64 page boundary offsets.

Add the --disable-cache-oblivious option, primarily intended for
performance testing.

This resolves #13.
2015-05-06 13:27:39 -07:00
Jason Evans
7041720ac2 Rename pprof to jeprof.
This rename avoids installation collisions with the upstream gperftools.
Additionally, jemalloc's per thread heap profile functionality
introduced an incompatible file format, so it's now worthwhile to
clearly distinguish jemalloc's version of this script from the upstream
version.

This resolves #229.
2015-05-01 12:31:12 -07:00
Jason Evans
8e33c21d2d Prefer /proc/<pid>/task/<pid>/maps over /proc/<pid>/maps on Linux.
This resolves #227.
2015-05-01 09:03:20 -07:00
Igor Podlesny
95e88de0aa Concise JEMALLOC_HAVE_ISSETUGID case in secure_getenv(). 2015-04-30 11:48:56 -07:00
Jason Evans
65db63cf3f Fix in-place shrinking huge reallocation purging bugs.
Fix the shrinking case of huge_ralloc_no_move_similar() to purge the
correct number of pages, at the correct offset.  This regression was
introduced by 8d6a3e8321 (Implement
dynamic per arena control over dirty page purging.).

Fix huge_ralloc_no_move_shrink() to purge the correct number of pages.
This bug was introduced by 9673983443
(Purge/zero sub-chunk huge allocations as necessary.).
2015-03-25 19:10:06 -07:00
Jason Evans
562d266511 Add the "stats.arenas.<i>.lg_dirty_mult" mallctl. 2015-03-24 16:41:38 -07:00
Jason Evans
bd16ea49c3 Fix signed/unsigned comparison in arena_lg_dirty_mult_valid(). 2015-03-24 15:59:28 -07:00
Jason Evans
d324ca8933 Fix arena_get() usage.
Fix arena_get() calls that specify refresh_if_missing=false.  In
ctl_refresh() and ctl.c's arena_purge(), these calls attempted to only
refresh once, but did so in an unreliable way.
arena_i_lg_dirty_mult_ctl() was simply wrong to pass
refresh_if_missing=false.
2015-03-24 12:33:12 -07:00
Igor Podlesny
ef0a0cc328 We have pages_unmap(ret, size) so we use it. 2015-03-23 21:12:33 -07:00
Jason Evans
4acd75a694 Add the "stats.allocated" mallctl. 2015-03-23 17:26:53 -07:00
Qinfan Wu
fd5901ce30 Fix a compile error caused by mixed declarations and code. 2015-03-21 10:18:39 -07:00
Jason Evans
7e336e7359 Fix lg_dirty_mult-related stats printing.
This regression was introduced by
8d6a3e8321 (Implement dynamic per arena
control over dirty page purging.).

This resolves #215.
2015-03-20 18:08:10 -07:00
Jason Evans
e0a08a1496 Restore --enable-ivsalloc.
However, unlike before it was removed do not force --enable-ivsalloc
when Darwin zone allocator integration is enabled, since the zone
allocator code uses ivsalloc() regardless of whether
malloc_usable_size() and sallocx() do.

This resolves #211.
2015-03-18 21:06:58 -07:00
Jason Evans
8d6a3e8321 Implement dynamic per arena control over dirty page purging.
Add mallctls:
- arenas.lg_dirty_mult is initialized via opt.lg_dirty_mult, and can be
  modified to change the initial lg_dirty_mult setting for newly created
  arenas.
- arena.<i>.lg_dirty_mult controls an individual arena's dirty page
  purging threshold, and synchronously triggers any purging that may be
  necessary to maintain the constraint.
- arena.<i>.chunk.purge allows the per arena dirty page purging function
  to be replaced.

This resolves #93.
2015-03-18 18:55:33 -07:00
Jason Evans
04211e2266 Fix heap profiling regressions.
Remove the prof_tctx_state_destroying transitory state and instead add
the tctx_uid field, so that the tuple <thr_uid, tctx_uid> uniquely
identifies a tctx.  This assures that tctx's are well ordered even when
more than two with the same thr_uid coexist.  A previous attempted fix
based on prof_tctx_state_destroying was only sufficient for protecting
against two coexisting tctx's, but it also introduced a new dumping
race.

These regressions were introduced by
602c8e0971 (Implement per thread heap
profiling.) and 764b00023f (Fix a heap
profiling regression.).
2015-03-16 15:11:06 -07:00
Jason Evans
262146dfc4 Eliminate innocuous compiler warnings. 2015-03-14 14:34:16 -07:00
Jason Evans
764b00023f Fix a heap profiling regression.
Add the prof_tctx_state_destroying transitionary state to fix a race
between a thread destroying a tctx and another thread creating a new
equivalent tctx.

This regression was introduced by
602c8e0971 (Implement per thread heap
profiling.).
2015-03-14 14:01:35 -07:00
Mike Hommey
f69e2f6fda Use the error code given to buferror on Windows
a14bce85 made buferror not take an error code, and make the Windows
code path for buferror use GetLastError, while the alternative code
paths used errno. Then 2a83ed02 made buferror take an error code
again, and while it changed the non-Windows code paths to use that
error code, the Windows code path was not changed accordingly.
2015-03-13 13:54:02 -07:00
Jason Evans
d69964bd2d Fix a heap profiling regression.
Fix prof_tctx_comp() to incorporate tctx state into the comparison.
During a dump it is possible for both a purgatory tctx and an otherwise
equivalent nominal tctx to reside in the tree at the same time.

This regression was introduced by
602c8e0971 (Implement per thread heap
profiling.).
2015-03-12 16:25:18 -07:00
Jason Evans
fbd8d773ad Fix unsigned comparison underflow.
These bugs only affected tests and debug builds.
2015-03-11 23:14:50 -07:00
Jason Evans
bc45d41d23 Fix a declaration-after-statement regression. 2015-03-11 16:50:40 -07:00
Jason Evans
f5c8f37259 Normalize rdelm/rd structure field naming. 2015-03-10 18:29:49 -07:00
Jason Evans
38e42d311c Refactor dirty run linkage to reduce sizeof(extent_node_t). 2015-03-10 18:15:40 -07:00
Jason Evans
04ca7580db Fix a chunk_recycle() regression.
This regression was introduced by
97c04a9383 (Use first-fit rather than
first-best-fit run/chunk allocation.).
2015-03-06 23:25:13 -08:00
Jason Evans
97c04a9383 Use first-fit rather than first-best-fit run/chunk allocation.
This tends to more effectively pack active memory toward low addresses.
However, additional tree searches are required in many cases, so whether
this change stands the test of time will depend on real-world
benchmarks.
2015-03-06 20:21:41 -08:00
Jason Evans
5707d6f952 Quantize szad trees by size class.
Treat sizes that round down to the same size class as size-equivalent
in trees that are used to search for first best fit, so that there are
only as many "firsts" as there are size classes.  This comes closer to
the ideal of first fit.
2015-03-06 20:21:41 -08:00
Jason Evans
35e3fd9a63 Fix a compilation error and an incorrect assertion. 2015-02-18 16:51:51 -08:00
Jason Evans
99bd94fb65 Fix chunk cache races.
These regressions were introduced by
ee41ad409a (Integrate whole chunks into
unused dirty page purging machinery.).
2015-02-18 16:40:53 -08:00
Jason Evans
738e089a2e Rename "dirty chunks" to "cached chunks".
Rename "dirty chunks" to "cached chunks", in order to avoid overloading
the term "dirty".

Fix the regression caused by 339c2b23b2
(Fix chunk_unmap() to propagate dirty state.), and actually address what
that change attempted, which is to only purge chunks once, and propagate
whether zeroed pages resulted into chunk_record().
2015-02-18 01:15:50 -08:00
Jason Evans
339c2b23b2 Fix chunk_unmap() to propagate dirty state.
Fix chunk_unmap() to propagate whether a chunk is dirty, and modify
dirty chunk purging to record this information so it can be passed to
chunk_unmap().  Since the broken version of chunk_unmap() claimed that
all chunks were clean, this resulted in potential memory corruption for
purging implementations that do not zero (e.g. MADV_FREE).

This regression was introduced by
ee41ad409a (Integrate whole chunks into
unused dirty page purging machinery.).
2015-02-17 22:25:56 -08:00
Jason Evans
47701b22ee arena_chunk_dirty_node_init() --> extent_node_dirty_linkage_init() 2015-02-17 22:23:10 -08:00
Jason Evans
a4e1888d1a Simplify extent_node_t and add extent_node_init(). 2015-02-17 15:13:52 -08:00
Jason Evans
ee41ad409a Integrate whole chunks into unused dirty page purging machinery.
Extend per arena unused dirty page purging to manage unused dirty chunks
in aaddtion to unused dirty runs.  Rather than immediately unmapping
deallocated chunks (or purging them in the --disable-munmap case), store
them in a separate set of trees, chunks_[sz]ad_dirty.  Preferrentially
allocate dirty chunks.  When excessive unused dirty pages accumulate,
purge runs and chunks in ingegrated LRU order (and unmap chunks in the
--enable-munmap case).

Refactor extent_node_t to provide accessor functions.
2015-02-16 21:02:17 -08:00
Jason Evans
2195ba4e1f Normalize *_link and link_* fields to all be *_link. 2015-02-15 16:43:52 -08:00
Jason Evans
b01186cebd Remove redundant tcache_boot() call. 2015-02-15 14:04:55 -08:00
Jason Evans
41cfe03f39 If MALLOCX_ARENA(a) is specified, use it during tcache fill. 2015-02-13 15:28:56 -08:00
Jason Evans
88fef7ceda Refactor huge_*() calls into arena internals.
Make redirects to the huge_*() API the arena code's responsibility,
since arenas now take responsibility for all allocation sizes.
2015-02-12 14:06:37 -08:00
Daniel Micay
1eaf3b6f34 add missing check for new_addr chunk size
8ddc93293c switched this to over using the
address tree in order to avoid false negatives, so it now needs to check
that the size of the free extent is large enough to satisfy the request.
2015-02-12 15:46:30 -05:00
Jason Evans
cbf3a6d703 Move centralized chunk management into arenas.
Migrate all centralized data structures related to huge allocations and
recyclable chunks into arena_t, so that each arena can manage huge
allocations and recyclable virtual memory completely independently of
other arenas.

Add chunk node caching to arenas, in order to avoid contention on the
base allocator.

Use chunks_rtree to look up huge allocations rather than a red-black
tree.  Maintain a per arena unsorted list of huge allocations (which
will be needed to enumerate huge allocations during arena reset).

Remove the --enable-ivsalloc option, make ivsalloc() always available,
and use it for size queries if --enable-debug is enabled.  The only
practical implications to this removal are that 1) ivsalloc() is now
always available during live debugging (and the underlying radix tree is
available during core-based debugging), and 2) size query validation can
no longer be enabled independent of --enable-debug.

Remove the stats.chunks.{current,total,high} mallctls, and replace their
underlying statistics with simpler atomically updated counters used
exclusively for gdump triggering.  These statistics are no longer very
useful because each arena manages chunks independently, and per arena
statistics provide similar information.

Simplify chunk synchronization code, now that base chunk allocation
cannot cause recursive lock acquisition.
2015-02-12 00:15:56 -08:00
Jason Evans
f30e261c5b Update ckh to support metadata allocation tracking. 2015-02-12 00:15:24 -08:00
Jason Evans
064dbfbaf7 Fix a regression in tcache_bin_flush_small().
Fix a serious regression in tcache_bin_flush_small() that was introduced
by 1cb181ed63 (Implement explicit tcache
support.).
2015-02-12 00:15:16 -08:00
Jason Evans
9e561e8d3f Test and fix tcache ID recycling. 2015-02-10 09:03:48 -08:00
Jason Evans
1cb181ed63 Implement explicit tcache support.
Add the MALLOCX_TCACHE() and MALLOCX_TCACHE_NONE macros, which can be
used in conjunction with the *allocx() API.

Add the tcache.create, tcache.flush, and tcache.destroy mallctls.

This resolves #145.
2015-02-09 17:44:48 -08:00
Jason Evans
8d0e04d42f Refactor rtree to be lock-free.
Recent huge allocation refactoring associates huge allocations with
arenas, but it remains necessary to quickly look up huge allocation
metadata during reallocation/deallocation.  A global radix tree remains
a good solution to this problem, but locking would have become the
primary bottleneck after (upcoming) migration of chunk management from
global to per arena data structures.

This lock-free implementation uses double-checked reads to traverse the
tree, so that in the steady state, each read or write requires only a
single atomic operation.

This implementation also assures that no more than two tree levels
actually exist, through a combination of careful virtual memory
allocation which makes large sparse nodes cheap, and skipping the root
node on x64 (possible because the top 16 bits are all 0 in practice).
2015-02-04 16:51:53 -08:00
Jason Evans
f500a10b2e Refactor base_alloc() to guarantee demand-zeroed memory.
Refactor base_alloc() to guarantee that allocations are carved from
demand-zeroed virtual memory.  This supports sparse data structures such
as multi-page radix tree nodes.

Enhance base_alloc() to keep track of fragments which were too small to
support previous allocation requests, and try to consume them during
subsequent requests.  This becomes important when request sizes commonly
approach or exceed the chunk size (as could radix tree node
allocations).
2015-02-04 16:51:53 -08:00
Jason Evans
8ddc93293c Fix chunk_recycle()'s new_addr functionality.
Fix chunk_recycle()'s new_addr functionality to search by address rather
than just size if new_addr is specified.  The functionality added by
a95018ee81 (Attempt to expand huge
allocations in-place.) only worked if the two search orders happened to
return the same results (e.g. in simple test cases).
2015-02-04 16:50:04 -08:00
Mike Hommey
6505733012 Make opt.lg_dirty_mult work as documented
The documentation for opt.lg_dirty_mult says:
    Per-arena minimum ratio (log base 2) of active to dirty
    pages.  Some dirty unused pages may be allowed to accumulate,
    within the limit set by the ratio (or one chunk worth of dirty
    pages, whichever is greater) (...)

The restriction in parentheses currently doesn't happen. This makes
jemalloc aggressively madvise(), which in turns increases the amount
of page faults significantly.

For instance, this resulted in several(!) hundred(!) milliseconds
startup regression on Firefox for Android.

This may require further tweaking, but starting with actually doing
what the documentation says is a good start.
2015-02-04 07:16:55 +09:00
Felix Janda
008267b9f6 util.c: strerror_r returns char* only on glibc 2015-02-03 18:58:02 +01:00
Jason Evans
5b8ed5b7c9 Implement the prof.gdump mallctl.
This feature makes it possible to toggle the gdump feature on/off during
program execution, whereas the the opt.prof_dump mallctl value can only
be set during program startup.

This resolves #72.
2015-01-25 21:21:35 -08:00
Jason Evans
0fd663e9c5 Avoid pointless chunk_recycle() call.
Avoid calling chunk_recycle() for mmap()ed chunks if config_munmap is
disabled, in which case there are never any recyclable chunks.

This resolves #164.
2015-01-25 17:31:24 -08:00
Sébastien Marie
eee27b2a38 huge_node_locked don't have to unlock huge_mtx
in src/huge.c, after each call of huge_node_locked(), huge_mtx is
already unlocked. don't unlock it twice (it is a undefined behaviour).
2015-01-25 15:12:28 +01:00
Jason Evans
4581b97809 Implement metadata statistics.
There are three categories of metadata:

- Base allocations are used for bootstrap-sensitive internal allocator
  data structures.
- Arena chunk headers comprise pages which track the states of the
  non-metadata pages.
- Internal allocations differ from application-originated allocations
  in that they are for internal use, and that they are omitted from heap
  profiles.

The metadata statistics comprise the metadata categories as follows:

- stats.metadata: All metadata -- base + arena chunk headers + internal
  allocations.
- stats.arenas.<i>.metadata.mapped: Arena chunk headers.
- stats.arenas.<i>.metadata.allocated: Internal allocations.  This is
  reported separately from the other metadata statistics because it
  overlaps with the allocated and active statistics, whereas the other
  metadata statistics do not.

Base allocations are not reported separately, though their magnitude can
be computed by subtracting the arena-specific metadata.

This resolves #163.
2015-01-23 23:34:43 -08:00
Guilherme Goncalves
ec98a44662 Use the correct type for opt.junk when printing stats. 2015-01-23 11:01:42 -02:00
Jason Evans
10aff3f3e1 Refactor bootstrapping to delay tsd initialization.
Refactor bootstrapping to delay tsd initialization, primarily to support
integration with FreeBSD's libc.

Refactor a0*() for internal-only use, and add the
bootstrap_{malloc,calloc,free}() API for use by FreeBSD's libc.  This
separation limits use of the a0*() functions to metadata allocation,
which doesn't require malloc/calloc/free API compatibility.

This resolves #170.
2015-01-22 14:04:27 -08:00
Jason Evans
bc96876f99 Fix arenas_cache_cleanup().
Fix arenas_cache_cleanup() to check whether arenas_cache is NULL before
deallocation, rather than checking arenas.
2015-01-22 14:02:56 -08:00
Jason Evans
44b57b8e8b Fix OOM handling in memalign() and valloc().
Fix memalign() and valloc() to heed imemalign()'s return value.

Reported by Kurt Wampler.
2015-01-16 18:04:17 -08:00
Jason Evans
24057f3da8 Fix an infinite recursion bug related to a0/tsd bootstrapping.
This resolves #184.
2015-01-14 16:27:31 -08:00
Guilherme Goncalves
9c6a8d3b0c Move variable declaration to the top its block for MSVC compatibility. 2014-12-17 14:46:35 -02:00
Guilherme Goncalves
2c5cb613df Introduce two new modes of junk filling: "alloc" and "free".
In addition to true/false, opt.junk can now be either "alloc" or "free",
giving applications the possibility of junking memory only on allocation
or deallocation.

This resolves #172.
2014-12-14 17:07:26 -08:00
Daniel Micay
b74041fb6e Ignore MALLOC_CONF in set{uid,gid,cap} binaries.
This eliminates the malloc tunables as tools for an attacker.

Closes #173
2014-12-14 15:36:15 -08:00
Jason Evans
e12eaf93dc Style and spelling fixes. 2014-12-08 16:34:04 -08:00
Jason Evans
1036ddbf11 Fix OOM cleanup in huge_palloc().
Fix OOM cleanup in huge_palloc() to call idalloct() rather than
base_node_dalloc().  This bug is a result of incomplete refactoring, and
has no impact other than leaking memory during OOM.
2014-12-04 16:42:42 -08:00
Daniel Micay
879e76a9e5 teach the dss chunk allocator to handle new_addr
This provides in-place expansion of huge allocations when the end of the
allocation is at the end of the sbrk heap. There's already the ability
to extend in-place via recycled chunks but this handles the initial
growth of the heap via repeated vector / string reallocations.

A possible future extension could allow realloc to go from the following:

    | huge allocation | recycled chunks |
                                        ^ dss_end

To a larger allocation built from recycled *and* new chunks:

    |                      huge allocation                      |
                                                                ^ dss_end

Doing that would involve teaching the chunk recycling code to request
new chunks to satisfy the request. The chunk_dss code wouldn't require
any further changes.

    #include <stdlib.h>

    int main(void) {
        size_t chunk = 4 * 1024 * 1024;
        void *ptr = NULL;
        for (size_t size = chunk; size < chunk * 128; size *= 2) {
            ptr = realloc(ptr, size);
            if (!ptr) return 1;
        }
    }

dss:secondary: 0.083s
dss:primary: 0.083s

After:

dss:secondary: 0.083s
dss:primary: 0.003s

The dss heap grows in the upwards direction, so the oldest chunks are at
the low addresses and they are used first. Linux prefers to grow the
mmap heap downwards, so the trick will not work in the *current* mmap
chunk allocator as a huge allocation will only be at the top of the heap
in a contrived case.
2014-11-28 16:11:19 -08:00
Jason Evans
d49cb68b9e Fix more pointer arithmetic undefined behavior.
Reported by Guilherme Gonçalves.

This resolves #166.
2014-11-17 10:31:59 -08:00
Jason Evans
2012d5a560 Fix pointer arithmetic undefined behavior.
Reported by Denis Denisov.
2014-11-17 09:54:49 -08:00
Jason Evans
9cf2be0a81 Make quarantine_init() static. 2014-11-07 14:50:38 -08:00
Jason Evans
c002a5c800 Fix two quarantine regressions.
Fix quarantine to actually update tsd when expanding, and to avoid
double initialization (leaking the first quarantine) due to recursive
initialization.

This resolves #161.
2014-11-04 18:03:11 -08:00
Jason Evans
2b2f6dc1e4 Disable arena_dirty_count() validation. 2014-11-01 02:29:10 -07:00
Jason Evans
82cb603ed7 Don't dereference NULL tdata in prof_{enter,leave}().
It is possible for the thread's tdata to be NULL late during thread
destruction, so take care not to dereference a NULL pointer in such
cases.
2014-11-01 00:20:28 -07:00
Daniel Micay
dc65213111 rm unused arena wrangling from xallocx
It has no use for the arena_t since unlike rallocx it never makes a new
memory allocation. It's just an unused parameter in ixalloc_helper.
2014-10-30 23:19:34 -07:00
Jason Evans
cfc5706f69 Miscellaneous cleanups. 2014-10-30 23:18:45 -07:00
Daniel Micay
d33f834591 avoid redundant chunk header reads
* use sized deallocation in iralloct_realign
* iralloc and ixalloc always need the old size, so pass it in from the
  caller where it's often already calculated
2014-10-30 17:06:38 -07:00
Daniel Micay
809b0ac391 mark huge allocations as unlikely
This cleans up the fast path a bit more by moving away more code.
2014-10-30 17:06:38 -07:00
Jason Evans
c93ed81cd0 Fix prof_{enter,leave}() calls to pass tdata_self. 2014-10-30 16:50:33 -07:00
Jason Evans
af1f592763 Use JEMALLOC_INLINE_C everywhere it's appropriate. 2014-10-30 16:38:08 -07:00
Jason Evans
8f47e3d82b Merge pull request #151 from thestinger/ralloc
use sized deallocation internally for ralloc
2014-10-16 13:12:05 -07:00
Daniel Micay
a9ea10d27c use sized deallocation internally for ralloc
The size of the source allocation is known at this point, so reading the
chunk header can be avoided for the small size class fast path. This is
not very useful right now, but it provides a significant performance
boost with an alternate ralloc entry point taking the old size.
2014-10-16 15:39:59 -04:00
Jason Evans
c83bccd273 Initialize chunks_mtx for all configurations.
This resolves #150.
2014-10-16 12:33:18 -07:00
Jason Evans
9673983443 Purge/zero sub-chunk huge allocations as necessary.
Purge trailing pages during shrinking huge reallocation when resulting
size is not a multiple of the chunk size.  Similarly, zero pages if
necessary during growing huge reallocation when the resulting size is
not a multiple of the chunk size.
2014-10-15 18:02:02 -07:00
Jason Evans
bf8d6a1092 Add small run utilization to stats output.
Add the 'util' column, which reports the proportion of available regions
that are currently in use for each small size class.  Small run
utilization is the complement of external fragmentation.  For example,
utilization of 0.75 indicates that 25% of small run memory is consumed
by external fragmentation, in other (more obtuse) words, 33% external
fragmentation overhead.

This resolves #27.
2014-10-15 16:18:42 -07:00
Jason Evans
9b41ac909f Fix huge allocation statistics. 2014-10-14 22:20:00 -07:00
Jason Evans
3c4d92e82a Add per size class huge allocation statistics.
Add per size class huge allocation statistics, and normalize various
stats:
- Change the arenas.nlruns type from size_t to unsigned.
- Add the arenas.nhchunks and arenas.hchunks.<i>.size mallctl's.
- Replace the stats.arenas.<i>.bins.<j>.allocated mallctl with
  stats.arenas.<i>.bins.<j>.curregs .
- Add the stats.arenas.<i>.hchunks.<j>.nmalloc,
  stats.arenas.<i>.hchunks.<j>.ndalloc,
  stats.arenas.<i>.hchunks.<j>.nrequests, and
  stats.arenas.<i>.hchunks.<j>.curhchunks mallctl's.
2014-10-12 23:02:10 -07:00
Jason Evans
44c97b712e Fix a prof_tctx_t/prof_tdata_t cleanup race.
Fix a prof_tctx_t/prof_tdata_t cleanup race by storing a copy of thr_uid
in prof_tctx_t, so that the associated tdata need not be present during
tctx teardown.
2014-10-12 13:03:20 -07:00
Jason Evans
381c23dd9d Remove arena_dalloc_bin_run() clean page preservation.
Remove code in arena_dalloc_bin_run() that preserved the "clean" state
of trailing clean pages by splitting them into a separate run during
deallocation.  This was a useful mechanism for reducing dirty page
churn when bin runs comprised many pages, but bin runs are now quite
small.

Remove the nextind field from arena_run_t now that it is no longer
needed, and change arena_run_t's bin field (arena_bin_t *) to binind
(index_t).  These two changes remove 8 bytes of chunk header overhead
per page, which saves 1/512 of all arena chunk memory.
2014-10-10 23:01:03 -07:00
Jason Evans
81e547566e Add --with-lg-tiny-min, generalize --with-lg-quantum. 2014-10-10 22:35:07 -07:00
Jason Evans
9b75677e53 Don't fetch tsd in a0{d,}alloc().
Don't fetch tsd in a0{d,}alloc(), because doing so can cause infinite
recursion on systems that require an allocated tsd wrapper.
2014-10-10 18:19:20 -07:00
Jason Evans
fc0b3b7383 Add configure options.
Add:
  --with-lg-page
  --with-lg-page-sizes
  --with-lg-size-class-group
  --with-lg-quantum

Get rid of STATIC_PAGE_SHIFT, in favor of directly setting LG_PAGE.

Fix various edge conditions exposed by the configure options.
2014-10-09 22:44:37 -07:00
Jason Evans
57efa7bb0e Avoid atexit(3) when possible, disable prof_final by default.
atexit(3) can deadlock internally during its own initialization if
jemalloc calls atexit() during jemalloc initialization.  Mitigate the
impact by restructuring prof initialization to avoid calling atexit()
unless the registered function will actually dump a final heap profile.

Additionally, disable prof_final by default so that this land mine is
opt-in rather than opt-out.

This resolves #144.
2014-10-08 18:08:00 -07:00
Jason Evans
3a8b9b1fd9 Fix a recursive lock acquisition regression.
Fix a recursive lock acquisition regression, which was introduced by
8bb3198f72 (Refactor/fix arenas
manipulation.).
2014-10-08 00:54:16 -07:00
Daniel Micay
f22214a29d Use regular arena allocation for huge tree nodes.
This avoids grabbing the base mutex, as a step towards fine-grained
locking for huge allocations. The thread cache also provides a tiny
(~3%) improvement for serial huge allocations.
2014-10-07 23:57:09 -07:00
Jason Evans
8bb3198f72 Refactor/fix arenas manipulation.
Abstract arenas access to use arena_get() (or a0get() where appropriate)
rather than directly reading e.g. arenas[ind].  Prior to the addition of
the arenas.extend mallctl, the worst possible outcome of directly
accessing arenas was a stale read, but arenas.extend may allocate and
assign a new array to arenas.

Add a tsd-based arenas_cache, which amortizes arenas reads.  This
introduces some subtle bootstrapping issues, with tsd_boot() now being
split into tsd_boot[01]() to support tsd wrapper allocation
bootstrapping, as well as an arenas_cache_bypass tsd variable which
dynamically terminates allocation of arenas_cache itself.

Promote a0malloc(), a0calloc(), and a0free() to be generally useful for
internal allocation, and use them in several places (more may be
appropriate).

Abstract arena->nthreads management and fix a missing decrement during
thread destruction (recent tsd refactoring left arenas_cleanup()
unused).

Change arena_choose() to propagate OOM, and handle OOM in all callers.
This is important for providing consistent allocation behavior when the
MALLOCX_ARENA() flag is being used.  Prior to this fix, it was possible
for an OOM to result in allocation silently allocating from a different
arena than the one specified.
2014-10-07 23:14:57 -07:00
Jason Evans
bf40641c5c Fix a prof_tctx_t destruction race. 2014-10-06 16:35:11 -07:00
Jason Evans
155bfa7da1 Normalize size classes.
Normalize size classes to use the same number of size classes per size
doubling (currently hard coded to 4), across the intire range of size
classes.  Small size classes already used this spacing, but in order to
support this change, additional small size classes now fill [4 KiB .. 16
KiB).  Large size classes range from [16 KiB .. 4 MiB).  Huge size
classes now support non-multiples of the chunk size in order to fill (4
MiB .. 16 MiB).
2014-10-06 01:45:13 -07:00
Daniel Micay
a95018ee81 Attempt to expand huge allocations in-place.
This adds support for expanding huge allocations in-place by requesting
memory at a specific address from the chunk allocator.

It's currently only implemented for the chunk recycling path, although
in theory it could also be done by optimistically allocating new chunks.
On Linux, it could attempt an in-place mremap. However, that won't work
in practice since the heap is grown downwards and memory is not unmapped
(in a normal build, at least).

Repeated vector reallocation micro-benchmark:

    #include <string.h>
    #include <stdlib.h>

    int main(void) {
        for (size_t i = 0; i < 100; i++) {
            void *ptr = NULL;
            size_t old_size = 0;
            for (size_t size = 4; size < (1 << 30); size *= 2) {
                ptr = realloc(ptr, size);
                if (!ptr) return 1;
                memset(ptr + old_size, 0xff, size - old_size);
                old_size = size;
            }
            free(ptr);
        }
    }

The glibc allocator fails to do any in-place reallocations on this
benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it
elides the cost of copies via mremap, which is currently not something
that jemalloc can use.

With this improvement, jemalloc still fails to do any in-place huge
reallocations for the first outer loop, but then succeeds 100% of the
time for the remaining 99 iterations. The time spent doing allocations
and copies drops down to under 5%, with nearly all of it spent doing
purging + faulting (when huge pages are disabled) and the array memset.

An improved mremap API (MREMAP_RETAIN - #138) would be far more general
but this is a portable optimization and would still be useful on Linux
for xallocx.

Numbers with transparent huge pages enabled:

glibc (copies elided via MREMAP_MAYMOVE): 8.471s

jemalloc: 17.816s
jemalloc + no-op madvise: 13.236s

jemalloc + this commit: 6.787s
jemalloc + this commit + no-op madvise: 6.144s

Numbers with transparent huge pages disabled:

glibc (copies elided via MREMAP_MAYMOVE): 15.403s

jemalloc: 39.456s
jemalloc + no-op madvise: 12.768s

jemalloc + this commit: 15.534s
jemalloc + this commit + no-op madvise: 6.354s

Closes #137
2014-10-05 14:47:01 -07:00
Jason Evans
f11a6776c7 Fix OOM-related regression in arena_tcache_fill_small().
Fix an OOM-related regression in arena_tcache_fill_small() that caused
cache corruption that would almost certainly expose the application to
undefined behavior, usually in the form of an allocation request
returning an already-allocated region, or somewhat less likely, a freed
region that had already been returned to the arena, thus making it
available to the arena for any purpose.

This regression was introduced by
9c43c13a35 (Reverse tcache fill order.),
and was present in all releases from 2.2.0 through 3.6.0.

This resolves #98.
2014-10-05 13:05:10 -07:00
Jason Evans
f04a0bef99 Fix prof regressions.
Fix prof regressions related to tdata (main per thread profiling data
structure) destruction:
- Deadlock.  The fix for this was intended to be part of
  20c31deaae (Test prof.reset mallctl and
  fix numerous discovered bugs.) but the fix was left incomplete.
- Destruction race.  Detaching tdata just prior to destruction without
  holding the tdatas lock made it possible for another thread to destroy
  the tdata out from under the thread that was on its way to doing so.
2014-10-04 15:03:49 -07:00
Jason Evans
0800afd03f Silence a compiler warning. 2014-10-04 14:59:17 -07:00
Jason Evans
029d44cf8b Fix tsd cleanup regressions.
Fix tsd cleanup regressions that were introduced in
5460aa6f66 (Convert all tsd variables to
reside in a single tsd structure.).  These regressions were twofold:

1) tsd_tryget() should never (and need never) return NULL.  Rename it to
   tsd_fetch() and simplify all callers.
2) tsd_*_set() must only be called when tsd is in the nominal state,
   because cleanup happens during the nominal-->purgatory transition,
   and re-initialization must not happen while in the purgatory state.
   Add tsd_nominal() and use it as needed.  Note that tsd_*{p,}_get()
   can still be used as long as no re-initialization that would require
   cleanup occurs.  This means that e.g. the thread_allocated counter
   can be updated unconditionally.
2014-10-04 11:22:55 -07:00
Jason Evans
fc12c0b8bc Implement/test/fix prof-related mallctl's.
Implement/test/fix the opt.prof_thread_active_init,
prof.thread_active_init, and thread.prof.active mallctl's.

Test/fix the thread.prof.name mallctl.

Refactor opt_prof_active to be read-only and move mutable state into the
prof_active variable.  Stop leaning on ctl-related locking for
protection.
2014-10-03 23:25:30 -07:00
Jason Evans
551ebc4364 Convert to uniform style: cond == false --> !cond 2014-10-03 10:16:09 -07:00
Jason Evans
20c31deaae Test prof.reset mallctl and fix numerous discovered bugs. 2014-10-02 23:01:10 -07:00
Daniel Micay
f8034540a1 Implement in-place huge allocation shrinking.
Trivial example:

    #include <stdlib.h>

    int main(void) {
        void *ptr = malloc(1024 * 1024 * 8);
        if (!ptr) return 1;
        ptr = realloc(ptr, 1024 * 1024 * 4);
        if (!ptr) return 1;
    }

Before:

    mmap(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcfff000000
    mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcffec00000
    madvise(0x7fcfff000000, 8388608, MADV_DONTNEED) = 0

After:

    mmap(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1934800000
    madvise(0x7f1934c00000, 4194304, MADV_DONTNEED) = 0

Closes #134
2014-10-01 16:55:03 -07:00
Dave Rigby
e3a16fce5e Mark malloc_conf as a weak symbol
This fixes issue #113 - je_malloc_conf is not respected on OS X
2014-09-29 15:05:55 -07:00
Jason Evans
0c5dd03e88 Move small run metadata into the arena chunk header.
Move small run metadata into the arena chunk header, with multiple
expected benefits:
- Lower run fragmentation due to reduced run sizes; runs are more likely
  to completely drain when there are fewer total regions.
- Improved cache behavior.  Prior to this change, run headers were
  always page-aligned, which put extra pressure on some CPU cache sets.
  The degree to which this was a problem was hardware dependent, but it
  likely hurt some even for the most advanced modern hardware.
- Buffer overruns/underruns are less likely to corrupt allocator
  metadata.
- Size classes between 4 KiB and 16 KiB become reasonable to support
  without any special handling, and the runs are small enough that dirty
  unused pages aren't a significant concern.
2014-09-29 01:31:39 -07:00
Jason Evans
f97e5ac4ec Implement compile-time bitmap size computation. 2014-09-28 14:43:11 -07:00
Jason Evans
6ef80d68f0 Fix profile dumping race.
Fix a race that caused a non-critical assertion failure.  To trigger the
race, a thread had to be part way through initializing a new sample,
such that it was discoverable by the dumping thread, but not yet linked
into its gctx by the time a later dump phase would normally have reset
its state to 'nominal'.

Additionally, lock access to the state field during modification to
transition to the dumping state.  It's not apparent that this oversight
could have caused an actual problem due to outer locking that protects
the dumping machinery, but the added locking pedantically follows the
stated locking protocol for the state field.
2014-09-24 22:23:43 -07:00
Jason Evans
5460aa6f66 Convert all tsd variables to reside in a single tsd structure. 2014-09-23 02:36:08 -07:00
Jason Evans
9d8f3d2033 Fix prof regressions.
Don't use atomic_add_uint64(), because it isn't available on 32-bit
platforms.

Fix forking support functions to manage all prof-related mutexes.

These regressions were introduced by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-11 18:09:14 -07:00
Jason Evans
c3e9e7b041 Fix irallocx_prof() sample logic.
Fix irallocx_prof() sample logic to only update the threshold counter
after it knows what size the allocation ended up being.  This regression
was caused by 6e73dc194e (Fix a profile
sampling race.), which did not make it into any releases prior to this
fix.
2014-09-11 17:04:03 -07:00
Jason Evans
9c640bfdd4 Apply likely()/unlikely() to allocation/deallocation fast paths. 2014-09-11 17:01:58 -07:00
Jason Evans
91566fc079 Fix mallocx() to always honor MALLOCX_ARENA() when profiling. 2014-09-11 13:15:33 -07:00
Daniel Micay
23fdf8b359 mark some conditions as unlikely
* assertion failure
* malloc_init failure
* malloc not already initialized (in malloc_init)
* running in valgrind
* thread cache disabled at runtime

Clang and GCC already consider a comparison with NULL or -1 to be cold,
so many branches (out-of-memory) are already correctly considered as
cold and marking them is not important.
2014-09-10 21:49:42 -04:00
Jason Evans
6e73dc194e Fix a profile sampling race.
Fix a profile sampling race that was due to preparing to sample, yet
doing nothing to assure that the context remains valid until the stats
are updated.

These regressions were caused by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-09 19:47:09 -07:00
Jason Evans
6fd53da030 Fix prof_tdata_get()-related regressions.
Fix prof_tdata_get() to avoid dereferencing an invalid tdata pointer
(when it's PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).

Fix prof_tdata_get() callers to check for invalid results besides NULL
(PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).

These regressions were caused by
602c8e0971 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
2014-09-09 15:29:34 -07:00
Jason Evans
a2260c95cd Fix sdallocx() assertion.
Refactor sdallocx() and nallocx() to share inallocx(), and fix an
sdallocx() assertion to check usize rather than size.
2014-09-09 10:39:15 -07:00
Daniel Micay
4cfe55166e Add support for sized deallocation.
This adds a new `sdallocx` function to the external API, allowing the
size to be passed by the caller.  It avoids some extra reads in the
thread cache fast path.  In the case where stats are enabled, this
avoids the work of calculating the size from the pointer.

An assertion validates the size that's passed in, so enabling debugging
will allow users of the API to debug cases where an incorrect size is
passed in.

The performance win for a contrived microbenchmark doing an allocation
and immediately freeing it is ~10%.  It may have a different impact on a
real workload.

Closes #28
2014-09-08 17:34:24 -07:00
Jason Evans
b718cf77e9 Optimize [nmd]alloc() fast paths.
Optimize [nmd]alloc() fast paths such that the (flags == 0) case is
streamlined, flags decoding only happens to the minimum degree
necessary, and no conditionals are repeated.
2014-09-07 14:40:19 -07:00
Jason Evans
c21b05ea09 Whitespace cleanups. 2014-09-04 22:27:26 -07:00
Qinfan Wu
ff6a31d3b9 Refactor chunk map.
Break the chunk map into two separate arrays, in order to improve cache
locality. This is related to issue #23.
2014-09-04 22:22:52 -07:00
Qinfan Wu
58799f6d1c Remove junk filling in tcache_bin_flush_small().
Junk filling is done in arena_dalloc_bin_locked(), so arena_alloc_junk_small()
is redundant. Also, we should use arena_dalloc_junk_small() instead of
arena_alloc_junk_small().
2014-08-26 21:28:31 -07:00
Sara Golemon
3e24afa28e Test for availability of malloc hooks via autoconf
__*_hook() is glibc, but on at least one glibc platform (homebrew),
the __GLIBC__ define isn't set correctly and we miss being able to
use these hooks.

Do a feature test for it during configuration so that we enable it
anywhere the hooks are actually available.
2014-08-22 15:19:21 -07:00
Jason Evans
602c8e0971 Implement per thread heap profiling.
Rename data structures (prof_thr_cnt_t-->prof_tctx_t,
prof_ctx_t-->prof_gctx_t), and convert to storing a prof_tctx_t for
sampled objects.

Convert PROF_ALLOC_PREP() to prof_alloc_prep(), since precise backtrace
depth within jemalloc functions is no longer an issue (pprof prunes
irrelevant frames).

Implement mallctl's:
- prof.reset implements full sample data reset, and optional change of
  sample interval.
- prof.lg_sample reads the current sample interval (opt.lg_prof_sample
  was the permanent source of truth prior to prof.reset).
- thread.prof.name provides naming capability for threads within heap
  profile dumps.
- thread.prof.active makes it possible to activate/deactivate heap
  profiling for individual threads.

Modify the heap dump files to contain per thread heap profile data.
This change is incompatible with the existing pprof, which will require
enhancements to read and process the enriched data.
2014-08-19 21:31:16 -07:00
Jason Evans
3a81cbd2d4 Dump heap profile backtraces in a stable order.
Also iterate over per thread stats in a stable order, which prepares the
way for stable ordering of per thread heap profile dumps.
2014-08-19 21:05:54 -07:00
Jason Evans
ab532e9799 Directly embed prof_ctx_t's bt. 2014-08-19 21:05:54 -07:00
Jason Evans
b41ccdb125 Convert prof_tdata_t's bt2cnt to a comprehensive map.
Treat prof_tdata_t's bt2cnt as a comprehensive map of the thread's
extant allocation samples (do not limit the total number of entries).
This helps prepare the way for per thread heap profiling.
2014-08-19 21:05:54 -07:00
Jason Evans
586c8ede42 Fix arena.<i>.dss mallctl to handle read-only calls. 2014-08-15 12:20:20 -07:00
Jason Evans
070b3c3fbd Fix and refactor runs_dirty-based purging.
Fix runs_dirty-based purging to also purge dirty pages in the spare
chunk.

Refactor runs_dirty manipulation into arena_dirty_{insert,remove}(), and
move the arena->ndirty accounting into those functions.

Remove the u.ql_link field from arena_chunk_map_t, and get rid of the
enclosing union for u.rb_link, since only rb_link remains.

Remove the ndirty field from arena_chunk_t.
2014-08-14 14:45:58 -07:00
Qinfan Wu
e8a2fd83a2 arena->npurgatory is no longer needed since we drop arena's lock
after stashing all the purgeable runs.
2014-08-12 09:50:01 -07:00
Qinfan Wu
90737fcda1 Remove chunks_dirty tree, nruns_avail and nruns_adjac since we no
longer need to maintain the tree for dirty page purging.
2014-08-12 09:50:00 -07:00
Qinfan Wu
e970800c78 Purge dirty pages from the beginning of the dirty list. 2014-08-12 09:50:00 -07:00
Qinfan Wu
a244e5078e Add dirty page counting for debug 2014-08-12 09:50:00 -07:00
Qinfan Wu
04d60a132b Maintain all the dirty runs in a linked list for each arena 2014-08-12 09:50:00 -07:00
Jason Evans
1522937e9c Fix the cactive statistic.
Fix the cactive statistic to decrease (rather than increase) when active
memory decreases.  This regression was introduced by
aa5113b1fd (Refactor overly large/complex
functions) and first released in 3.5.0.
2014-08-06 23:43:39 -07:00
Qinfan Wu
ea73eb8f3e Reintroduce the comment that was removed in f9ff603. 2014-08-06 16:43:01 -07:00
Qinfan Wu
55c9aa1038 Fix the bug that causes not allocating free run with lowest address. 2014-08-06 16:10:08 -07:00
Mike Hommey
6f533c1903 Ensure the default purgeable zone is after the default zone on OS X 2014-06-10 07:18:29 -07:00
Richard Diamond
994fad9bda Add check for madvise(2) to configure.ac.
Some platforms, such as Google's Portable Native Client, use Newlib and
thus lack access to madvise(2).  In those instances, pages_purge() is
transformed into a no-op.
2014-06-03 09:32:49 -07:00
Chris Peterson
70807bc54b Fix -Wsometimes-uninitialized warnings 2014-06-02 07:53:52 -07:00
Chris Peterson
3e310b34eb Fix -Wsign-compare warnings 2014-06-02 07:51:33 -07:00
Richard Diamond
94ed6812bc Don't catch fork()ing events for Native Client.
Native Client doesn't allow forking, thus there is no need to catch
fork()ing events for Native Client.

Additionally, without this commit, jemalloc will introduce an unresolved
pthread_atfork() in PNaCl Rust bins.
2014-06-02 07:45:33 -07:00
Richard Diamond
9c3a10fdf6 Try to use __builtin_ffsl if ffsl is unavailable.
Some platforms (like those using Newlib) don't have ffs/ffsl.  This
commit adds a check to configure.ac for __builtin_ffsl if ffsl isn't
found.  __builtin_ffsl performs the same function as ffsl, and has the
added benefit of being available on any platform utilizing
Gcc-compatible compiler.

This change does not address the used of ffs in the MALLOCX_ARENA()
macro.
2014-06-02 07:44:50 -07:00
Jason Evans
d04047cc29 Add size class computation capability.
Add size class computation capability, currently used only as validation
of the size class lookup tables.  Generalize the size class spacing used
for bins, for eventual use throughout the full range of allocation
sizes.
2014-05-28 21:06:46 -07:00
Jason Evans
e2deab7a75 Refactor huge allocation to be managed by arenas.
Refactor huge allocation to be managed by arenas (though the global
red-black tree of huge allocations remains for lookup during
deallocation).  This is the logical conclusion of recent changes that 1)
made per arena dss precedence apply to huge allocation, and 2) made it
possible to replace the per arena chunk allocation/deallocation
functions.

Remove the top level huge stats, and replace them with per arena huge
stats.

Normalize function names and types to *dalloc* (some were *dealloc*).

Remove the --enable-mremap option.  As jemalloc currently operates, this
is a performace regression for some applications, but planned work to
logarithmically space huge size classes should provide similar amortized
performance.  The motivation for this change was that mremap-based huge
reallocation forced leaky abstractions that prevented refactoring.
2014-05-15 22:36:41 -07:00
aravind
fb7fe50a88 Add support for user-specified chunk allocators/deallocators.
Add new mallctl endpoints "arena<i>.chunk.alloc" and
"arena<i>.chunk.dealloc" to allow userspace to configure
jemalloc's chunk allocator and deallocator on a per-arena
basis.
2014-05-12 10:46:03 -07:00
Jason Evans
a344dd01c7 Fix coding sytle nits. 2014-05-01 15:51:30 -07:00
Jason Evans
6f001059aa Simplify backtracing.
Simplify backtracing to not ignore any frames, and compensate for this
in pprof in order to increase flexibility with respect to function-based
refactoring even in the presence of non-deterministic inlining.  Modify
pprof to blacklist all jemalloc allocation entry points including
non-standard ones like mallocx(), and ignore all allocator-internal
frames.  Prior to this change, pprof excluded the specifically
blacklisted functions from backtraces, but it left allocator-internal
frames intact.
2014-04-22 20:55:09 -07:00
Lucian Adrian Grijincu
9d4e13f45a prof_backtrace: use unw_backtrace
unw_backtrace:
- does internal per-thread caching
- doesn't acquire an internal lock
2014-04-22 18:39:47 -07:00
Jason Evans
3541a904d6 Refactor small_size2bin and small_bin2size.
Refactor small_size2bin and small_bin2size to be inline functions rather
than directly accessed arrays.
2014-04-16 17:14:33 -07:00
Jason Evans
3e3caf03af Merge pull request #73 from bmaurer/smallmalloc
Smaller malloc hot path
2014-04-16 16:33:21 -07:00
Ben Maurer
021136ce4d Create a const array with only a small bin to size map 2014-04-16 14:31:24 -07:00
Ben Maurer
6c39f9e059 refactor profiling. only use a bytes till next sample variable. 2014-04-16 13:43:30 -07:00
Ben Maurer
a7619b7fa5 outline rare tcache_get codepaths 2014-04-16 13:36:56 -07:00
Jason Evans
bd87b01999 Optimize Valgrind integration.
Forcefully disable tcache if running inside Valgrind, and remove
Valgrind calls in tcache-specific code.

Restructure Valgrind-related code to move most Valgrind calls out of the
fast path functions.

Take advantage of static knowledge to elide some branches in
JEMALLOC_VALGRIND_REALLOC().
2014-04-15 16:49:57 -07:00
Jason Evans
ecd3e59ca3 Remove the "opt.valgrind" mallctl.
Remove the "opt.valgrind" mallctl because it is unnecessary -- jemalloc
automatically detects whether it is running inside valgrind.
2014-04-15 14:33:50 -07:00
Jason Evans
a2c719b374 Remove the "arenas.purge" mallctl.
Remove the "arenas.purge" mallctl, which was obsoleted by the
"arena.<i>.purge" mallctl in 3.1.0.
2014-04-15 12:46:28 -07:00
Jason Evans
4d434adb14 Make dss non-optional, and fix an "arena.<i>.dss" mallctl bug.
Make dss non-optional on all platforms which support sbrk(2).

Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
"secondary" precedence is specified, but sbrk(2) is not supported.
2014-04-15 12:09:48 -07:00
Jason Evans
9790b9667f Remove the *allocm() API, which is superceded by the *allocx() API. 2014-04-14 22:32:31 -07:00
Jason Evans
9b0cbf0850 Remove support for non-prof-promote heap profiling metadata.
Make promotion of sampled small objects to large objects mandatory, so
that profiling metadata can always be stored in the chunk map, rather
than requiring one pointer per small region in each small-region page
run.  In practice the non-prof-promote code was only useful when using
jemalloc to track all objects and report them as leaks at program exit.
However, Valgrind is at least as good a tool for this particular use
case.

Furthermore, the non-prof-promote code is getting in the way of
some optimizations that will make heap profiling much cheaper for the
predominant use case (sampling a small representative proportion of all
allocations).
2014-04-11 14:24:51 -07:00
Jason Evans
f4e026f525 Merge pull request #70 from bmaurer/bitsplitrefactor
refactoring for bits splitting
2014-04-10 13:02:28 -07:00
Ben Maurer
f9ff60346d refactoring for bits splitting 2014-04-10 12:43:54 -07:00
Ben Maurer
be8e59f5a6 Don't dereference chunk->arena in free() hot path
When you call free() we load chunk->arena even though that
data isn't used on the tcache hot path.

In profiling some FB applications, I found that ~30% of the
dTLB misses in the free() function come from this line. With
4 MB chunks, the arena_chunk_t->map is ~ 32 KB (1024 pages
in the chunk, 4 8 byte pointers in arena_chunk_map_t). This
means there's only a 1/8 chance of the page containing
chunk->arena also comtaining the map bits.
2014-04-05 15:59:08 -07:00
Jason Evans
9480a23005 Merge pull request #59 from HarryWeppner/dev
FreeBSD memory (leak) profiling support
2014-03-29 16:47:08 -07:00
Jason Evans
57fb8e94ae Merge pull request #61 from mxw/huge-dss-prec
Use arena dss prec instead of default for huge allocs.
2014-03-28 14:48:56 -07:00
Harald Weppner
c2da2591be Consistently use debug lib(s) if present
Fixes a situation where nm uses the debug lib but
addr2line does not, which completely messes up the symbol
lookup.
2014-03-28 13:47:59 -07:00
Max Wang
fbb31029a5 Use arena dss prec instead of default for huge allocs.
Pass a dss_prec_t parameter to huge_{m,p,r}alloc instead of defaulting
to the chunk dss prec.
2014-03-28 13:43:58 -07:00
Chris Pride
20a8c78bfe Fix a crashing case where arena_chunk_init_hard returns NULL.
This happens when it fails to allocate a new chunk. Which
arena_chunk_alloc then passes into arena_avail_insert without any
checks. This then causes a crash when arena_avail_insert tries
to check chunk->ndirty.

This was introduced by the refactoring of arena_chunk_alloc
which previously would have returned NULL immediately after
calling chunk_alloc. This is now the return from
arena_chunk_init_hard so we need to check that return, and
not continue if it was NULL.
2014-03-25 22:36:05 -07:00
Harald Weppner
bf543df20c Enable profiling / leak detection in FreeBSD
* Assumes procfs is mounted at /proc, cf.
  <http://www.freebsd.org/doc/en/articles/linux-users/procfs.html>
2014-03-17 23:53:00 -07:00
Jason Evans
940fdfd5ee Fix junk filling for mremap(2)-based huge reallocation.
If mremap(2) is used for huge reallocation, physical pages are mapped to
new virtual addresses rather than data being copied to new pages.  This
bypasses the normal junk filling that would happen during allocation, so
add junk filling that is specific to this case.
2014-02-25 12:37:25 -08:00
Erwan Legrand
69e9fbb9c1 Fix typo 2014-02-14 12:48:58 +01:00
Jason Evans
0c4e743eaf Test and fix malloc_printf("%%"). 2014-01-22 09:00:27 -08:00
Jason Evans
e2206edebc Fix unused variable warnings. 2014-01-21 14:59:13 -08:00
Jason Evans
772163b4f3 Add heap profiling tests.
Fix a regression in prof_dump_ctx() due to an uninitized variable.  This
was caused by revision 4f37ef693e, so no
releases are affected.
2014-01-17 15:40:52 -08:00
Jason Evans
eefdd02e70 Fix a variable prototype/definition mismatch. 2014-01-16 18:04:30 -08:00
Jason Evans
4f37ef693e Refactor prof_dump() to reduce contention.
Refactor prof_dump() to use a two pass algorithm, and prof_leave() prior
to the second pass.  This avoids write(2) system calls while holding
critical prof resources.

Fix prof_dump() to close the dump file descriptor for all relevant error
paths.

Minimize the size of prof-related static buffers when prof is disabled.
This saves roughly 65 KiB of application memory for non-prof builds.

Refactor prof_ctx_init() out of prof_lookup_global().
2014-01-16 13:36:38 -08:00
Jason Evans
fb1775e47e Refactor prof_lookup() by extracting prof_lookup_global(). 2014-01-14 17:04:34 -08:00
Jason Evans
aa5113b1fd Refactor overly large/complex functions.
Refactor overly large functions by breaking out helper functions.

Refactor overly complex multi-purpose functions into separate more
specific functions.
2014-01-14 16:23:03 -08:00
Jason Evans
b2c31660be Extract profiling code from [re]allocation functions.
Extract profiling code from malloc(), imemalign(), calloc(), realloc(),
mallocx(), rallocx(), and xallocx().  This slightly reduces the amount
of code compiled into the fast paths, but the primary benefit is the
combinatorial complexity reduction.

Simplify iralloc[t]() by creating a separate ixalloc() that handles the
no-move cases.

Further simplify [mrxn]allocx() (and by implication [mrn]allocm()) to
make request size overflows due to size class and/or alignment
constraints trigger undefined behavior (detected by debug-only
assertions).

Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling
backtrace creation in imemalign().  This bug impacted posix_memalign()
and aligned_alloc().
2014-01-12 15:41:05 -08:00
Jason Evans
6b694c4d47 Add junk/zero filling unit tests, and fix discovered bugs.
Fix growing large reallocation to junk fill new space.

Fix huge deallocation to junk fill when munmap is disabled.
2014-01-07 16:54:17 -08:00
Jason Evans
e18c25d23d Add util unit tests, and fix discovered bugs.
Add unit tests for pow2_ceil(), malloc_strtoumax(), and
malloc_snprintf().

Fix numerous bugs in malloc_strotumax() error handling/reporting.  These
bugs could have caused application-visible issues for some seldom used
(0X... and 0... prefixes) or malformed MALLOC_CONF or mallctl() argument
strings, but otherwise they had no impact.

Fix numerous bugs in malloc_snprintf().  These bugs were not exercised
by existing malloc_*printf() calls, so they had no impact.
2014-01-06 20:41:09 -08:00
Jason Evans
b954bc5d3a Convert rtree from (void *) to (uint8_t) storage.
Reduce rtree memory usage by storing booleans (1 byte each) rather than
pointers.  The rtree code is only used to record whether jemalloc manages
a chunk of memory, so there's no need to store pointers in the rtree.

Increase rtree node size to 64 KiB in order to reduce tree depth from 13
to 3 on 64-bit systems.  The conversion to more compact leaf nodes was
enough by itself to make the rtree depth 1 on 32-bit systems; due to the
fact that root nodes are smaller than the specified node size if
possible, the node size change has no impact on 32-bit systems (assuming
default chunk size).
2014-01-02 17:36:38 -08:00
Jason Evans
b980cc774a Add rtree unit tests. 2014-01-02 16:17:15 -08:00
Jason Evans
0405312921 Fix an uninitialized variable read in xallocx(). 2013-12-20 15:52:01 -08:00
Jason Evans
d8a390020c Fix a few mallctl() documentation errors.
Normalize mallctl() order (code and documentation).
2013-12-19 21:40:41 -08:00
Jason Evans
0d6c5d8bd0 Add quarantine unit tests.
Verify that freed regions are quarantined, and that redzone corruption
is detected.

Introduce a testing idiom for intercepting/replacing internal functions.
In this case the replaced function is ordinarily a static function, but
the idiom should work similarly for library-private functions.
2013-12-17 15:19:12 -08:00
Jason Evans
6e62984ef6 Don't junk-fill reallocations unless usize changes.
Don't junk fill reallocations for which the request size is less than
the current usable size, but not enough smaller to cause a size class
change.  Unlike malloc()/calloc()/realloc(), *allocx() contractually
treats the full usize as the allocation, so a caller can ask for zeroed
memory via mallocx() and a series of rallocx() calls that all specify
MALLOCX_ZERO, and be assured that all newly allocated bytes will be
zeroed and made available to the application without danger of allocator
mutation until the size class decreases enough to cause usize reduction.
2013-12-15 21:57:09 -08:00
Jason Evans
665769357c Optimize arena_prof_ctx_set().
Refactor such that arena_prof_ctx_set() receives usize as an argument,
and use it to determine whether to handle ptr as a small region, rather
than reading the chunk page map.
2013-12-15 21:57:02 -08:00
Jason Evans
d82a5e6a34 Implement the *allocx() API.
Implement the *allocx() API, which is a successor to the *allocm() API.
The *allocx() functions are slightly simpler to use because they have
fewer parameters, they directly return the results of primary interest,
and mallocx()/rallocx() avoid the strict aliasing pitfall that
allocm()/rallocx() share with posix_memalign().  The following code
violates strict aliasing rules:

    foo_t *foo;
    allocm((void **)&foo, NULL, 42, 0);

whereas the following is safe:

    foo_t *foo;
    void *p;
    allocm(&p, NULL, 42, 0);
    foo = (foo_t *)p;

mallocx() does not have this problem:

    foo_t *foo = (foo_t *)mallocx(42, 0);
2013-12-12 22:35:52 -08:00
Jason Evans
6edc97db15 Fix inline-related macro issues.
Add JEMALLOC_INLINE_C and use it instead of JEMALLOC_INLINE in .c files,
so that the annotated functions are always static.

Remove SFMT's inline-related macros and use jemalloc's instead, so that
there's no danger of interactions with jemalloc's definitions that
disable inlining for debug builds.
2013-12-10 14:35:34 -08:00
Jason Evans
7369232544 Silence some unused variable warnings. 2013-12-10 13:51:52 -08:00
Jason Evans
a4f124f59f Normalize #define whitespace.
Consistently use a tab rather than a space following #define.
2013-12-08 22:28:27 -08:00
Jason Evans
2a83ed0284 Refactor tests.
Refactor tests to use explicit testing assertions, rather than diff'ing
test output.  This makes the test code a bit shorter, more explicitly
encodes testing intent, and makes test failure diagnosis more
straightforward.
2013-12-08 20:52:21 -08:00
Jason Evans
6668853596 Avoid deprecated sbrk(2) on OS X.
Avoid referencing sbrk(2) on OS X, because it is deprecated as of OS X
10.9 (Mavericks), and the compiler warns against using it.
2013-12-03 21:49:36 -08:00
Jason Evans
52b30691f9 Remove unused variable. 2013-12-02 15:16:39 -08:00
Jason Evans
addad093f8 Clean up malloc_ncpus().
Clean up malloc_ncpus() by replacing incorrectly indented if..else
branches with a ?: expression.

Submitted by Igor Podlesny.
2013-11-29 16:19:44 -08:00
Jason Evans
39e7fd0580 Fix ALLOCM_ARENA(a) handling in rallocm().
Fix rallocm() to use the specified arena for allocation, not just
deallocation.

Clarify ALLOCM_ARENA(a) documentation.
2013-11-25 18:02:35 -08:00
Jason Evans
d6df91438a Fix a potential infinite loop during thread exit.
Fix malloc_tsd_dalloc() to bypass tcache when dallocating, so that there
is no danger of causing tcache reincarnation during thread exit.
Whether this infinite loop occurs depends on the pthreads TSD
implementation; it is known to occur on Solaris.

Submitted by Markus Eberspächer.
2013-11-19 18:01:45 -08:00
Jason Evans
c368f8c8a2 Remove unnecessary zeroing in arena_palloc(). 2013-10-29 18:31:17 -07:00
Jason Evans
239692b18e Fix whitespace. 2013-10-28 12:41:37 -07:00
Leonard Crestez
cb17fc6a8f Add support for LinuxThreads.
When using LinuxThreads pthread_setspecific triggers recursive
allocation on all threads. Work around this by creating a global linked
list of in-progress tsd initializations.

This modifies the _tsd_get_wrapper macro-generated function. When it has
to initialize an TSD object it will push the item to the linked list
first. If this causes a recursive allocation then the _get_wrapper
request is satisfied from the list. When pthread_setspecific returns the
item is removed from the list.

This effectively adds a very poor substitute for real TLS used only
during pthread_setspecific allocation recursion.

Signed-off-by: Crestez Dan Leonard <lcrestez@ixiacom.com>
2013-10-24 18:25:19 -07:00
Leonard Crestez
ac4403cacb Delay pthread_atfork registering.
This function causes recursive allocation on LinuxThreads.

Signed-off-by: Crestez Dan Leonard <lcrestez@ixiacom.com>
2013-10-24 16:40:31 -07:00
Jason Evans
93f39f8d23 Fix a file descriptor leak in a prof_dump_maps() error path.
Reported by Pat Lynch.
2013-10-21 15:07:40 -07:00
Jason Evans
1d1cee127a Add a missing mutex unlock in malloc_init_hard() error path.
Add a missing mutex unlock in a malloc_init_hard() error path (failed
mutex initialization).  In practice this bug was very unlikely to ever
trigger, but if it did, application deadlock would likely result.

Reported by Pat Lynch.
2013-10-21 15:04:12 -07:00
Jason Evans
e2985a2381 Avoid (x < 0) comparison for unsigned x.
Avoid (min < 0) comparison for unsigned min in malloc_conf_init().  This
bug had no practical consequences.

Reported by Pat Lynch.
2013-10-21 15:01:44 -07:00
Jason Evans
30e7cb1118 Fix a data race for large allocation stats counters.
Reported by Pat Lynch.
2013-10-21 15:00:06 -07:00
Jason Evans
f1c3da8b02 Consistently use malloc_mutex_prefork().
Consistently use malloc_mutex_prefork() instead of malloc_mutex_lock()
in all prefork functions.
2013-10-21 14:59:10 -07:00
Jason Evans
6556e28be1 Prefer not_reached() over assert(false) where appropriate. 2013-10-21 14:56:27 -07:00
Jason Evans
d504477935 Fix a compiler warning.
Fix a compiler warning in chunk_record() that was due to reading node
rather than xnode.  In practice this did not cause any correctness
issue, but dataflow analysis in some compilers cannot tell that node and
xnode are always equal in cases that the read is reached.
2013-10-20 15:11:01 -07:00
Jason Evans
7b65180b32 Fix a race condition in the "arenas.extend" mallctl.
Fix a race condition in the "arenas.extend" mallctl that could lead to
internal data structure corruption.  The race could be hit if one
thread called the "arenas.extend" mallctl while another thread
concurrently triggered initialization of one of the lazily created
arenas.
2013-10-20 14:39:33 -07:00
Jason Evans
dda90f59e2 Fix a Valgrind integration flaw.
Fix a Valgrind integration flaw that caused Valgrind warnings about
reads of uninitialized memory in internal zero-initialized data
structures (relevant to tcache and prof code).
2013-10-19 23:48:40 -07:00
Jason Evans
87a02d2bb1 Fix a Valgrind integration flaw.
Fix a Valgrind integration flaw that caused Valgrind warnings about
reads of uninitialized memory in arena chunk headers.
2013-10-19 21:40:20 -07:00
Jason Evans
543abf7e6c Fix inlining warning.
Add the JEMALLOC_ALWAYS_INLINE_C macro and use it for always-inlined
functions declared in .c files.  This fixes a function attribute
inconsistency for debug builds that resulted in (harmless) compiler
warnings about functions not being inlinable.

Reported by Ricardo Nabinger Sanchez.
2013-10-19 17:26:00 -07:00
Jason Evans
3ab682d341 Silence an unused variable warning.
Reported by Ricardo Nabinger Sanchez.
2013-10-19 17:25:17 -07:00
Alexandre Perrin
dd6ef0302f malloc_conf_init: revert errno value when readlink(2) fail. 2013-10-13 15:33:15 -07:00
Jason Evans
4f929aa948 Fix another deadlock related to chunk_record().
Fix chunk_record() to unlock chunks_mtx before deallocating a base
node, in order to avoid potential deadlock.  This fix addresses the
second of two similar bugs.
2013-04-22 22:36:18 -07:00
Jason Evans
741fbc6ba4 Fix deadlock related to chunk_record().
Fix chunk_record() to unlock chunks_mtx before deallocating a base node,
in order to avoid potential deadlock.

Reported by Tudor Bosman.
2013-04-17 09:57:11 -07:00
Jason Evans
88c222c8e9 Fix a prof-related locking order bug.
Fix a locking order bug that could cause deadlock during fork if heap
profiling were enabled.
2013-02-06 11:59:30 -08:00
Jason Evans
06912756cc Fix Valgrind integration.
Fix Valgrind integration to annotate all internally allocated memory in
a way that keeps Valgrind happy about internal data structure access.
2013-01-31 17:02:53 -08:00
Jason Evans
a7a28c334e Fix a chunk recycling bug.
Fix a chunk recycling bug that could cause the allocator to lose track
of whether a chunk was zeroed.  On FreeBSD, NetBSD, and OS X, it could
cause corruption if allocating via sbrk(2) (unlikely unless running with
the "dss:primary" option specified).  This was completely harmless on
Linux unless using mlockall(2) (and unlikely even then, unless the
--disable-munmap configure option or the "dss:primary" option was
specified).  This regression was introduced in 3.1.0 by the
mlockall(2)/madvise(2) interaction fix.
2013-01-31 16:53:58 -08:00
Jason Evans
d0e942e466 Fix two quarantine bugs.
Internal reallocation of the quarantined object array leaked the old array.

Reallocation failure for internal reallocation of the quarantined object
array (very unlikely) resulted in memory corruption.
2013-01-31 14:43:54 -08:00
Jason Evans
bbe29d374d Fix potential TLS-related memory corruption.
Avoid writing to uninitialized TLS as a side effect of deallocation.
Initializing TLS during deallocation is unsafe because it is possible
that a thread never did any allocation, and that TLS has already been
deallocated by the threads library, resulting in write-after-free
corruption.  These fixes affect prof_tdata and quarantine; all other
uses of TLS are already safe, whether intentionally (as for tcache) or
unintentionally (as for arenas).
2013-01-31 14:23:48 -08:00
Jason Evans
d1b6e18a99 Revert opt_abort and opt_junk refactoring.
Revert refactoring of opt_abort and opt_junk declarations.  clang
accepts the config_*-based declarations (and generates correct code),
but gcc complains with:

  error: initializer element is not constant
2013-01-22 16:54:26 -08:00
Jason Evans
ba175a2bfb Use config_* instead of JEMALLOC_*.
Convert a couple of stragglers from JEMALLOC_* to use config_*.
2013-01-22 12:14:45 -08:00
Jason Evans
ae03bf6a57 Update hash from MurmurHash2 to MurmurHash3.
Update hash from MurmurHash2 to MurmurHash3, primarily because the
latter generates 128 bits in a single call for no extra cost, which
simplifies integration with cuckoo hashing.
2013-01-22 12:02:08 -08:00
Jason Evans
88393cb0eb Add and use JEMALLOC_ALWAYS_INLINE.
Add JEMALLOC_ALWAYS_INLINE and use it to guarantee that the entire fast
paths of the primary allocation/deallocation functions are inlined.
2013-01-22 08:45:43 -08:00
Jason Evans
38067483c5 Tighten valgrind integration.
Tighten valgrind integration such that immediately after memory is
validated or zeroed, valgrind is told to forget the memory's 'defined'
state.  The only place newly allocated memory should be left marked as
'defined' is in the public functions (e.g. calloc() and realloc()).
2013-01-21 20:04:42 -08:00
Jason Evans
14a2c6a698 Avoid validating freshly mapped memory.
Move validation of supposedly zeroed pages from chunk_alloc() to
chunk_recycle().  There is little point to validating newly mapped
memory returned by chunk_alloc_mmap(), and memory that comes from sbrk()
is explicitly zeroed, so there is little risk to assuming that
chunk_alloc_dss() actually does the zeroing properly.

This relaxation of validation can make a big difference to application
startup time and overall system usage on platforms that use jemalloc as
the system allocator (namely FreeBSD).

Submitted by Ian Lepore <ian@FreeBSD.org>.
2013-01-21 19:56:34 -08:00
Garrett Cooper
6e6164ae15 Don't mangle errno with free(3) if utrace(2) fails
This ensures POLA on FreeBSD (at least) as free(3) is generally assumed
to not fiddle around with errno.

Signed-off-by: Garrett Cooper <yanegomi@gmail.com>
2012-12-24 10:30:57 -08:00
Jason Evans
1bf2743e08 Add clipping support to lg_chunk option processing.
Modify processing of the lg_chunk option so that it clips an
out-of-range input to the edge of the valid range.  This makes it
possible to request the minimum possible chunk size without intimate
knowledge of allocator internals.

Submitted by Ian Lepore (see FreeBSD PR bin/174641).
2012-12-23 08:51:48 -08:00
Jason Evans
1271185b87 Fix chunk_recycle() Valgrind integration.
Fix chunk_recycyle() to unconditionally inform Valgrind that returned
memory is undefined.  This fixes Valgrind warnings that would result
from a huge allocation being freed, then recycled for use as an arena
chunk.  The arena code would write metadata to the chunk header, and
Valgrind would consider these invalid writes.
2012-12-12 10:12:18 -08:00
Jason Evans
6eb84fbe31 Fix "arenas.extend" mallctl to return the number of arenas.
Reported by Mike Hommey.
2012-11-29 22:13:04 -08:00
Jason Evans
a3b3386ddd Avoid arena_prof_accum()-related locking when possible.
Refactor arena_prof_accum() and its callers to avoid arena locking when
prof_interval is 0 (as when profiling is disabled).

Reported by Ben Maurer.
2012-11-13 13:47:53 -08:00
Jason Evans
abf6739317 Tweak chunk purge order according to fragmentation.
Tweak chunk purge order to purge unfragmented chunks from high to low
memory.  This facilitates dirty run reuse.
2012-11-07 10:08:34 -08:00
Mike Hommey
847ff223de Don't register jemalloc's zone allocator if something else already replaced the system default zone 2012-11-06 16:06:59 -08:00
Jason Evans
e3d13060c8 Purge unused dirty pages in a fragmentation-reducing order.
Purge unused dirty pages in an order that first performs clean/dirty run
defragmentation, in order to mitigate available run fragmentation.

Remove the limitation that prevented purging unless at least one chunk
worth of dirty pages had accumulated in an arena.  This limitation was
intended to avoid excessive purging for small applications, but the
threshold was arbitrary, and the effect of questionable utility.

Relax opt_lg_dirty_mult from 5 to 3.  This compensates for increased
likelihood of allocating clean runs, given the same ratio of clean:dirty
runs, and reduces the potential for repeated purging in pathological
large malloc/free loops that push the active:dirty page ratio just over
the purge threshold.
2012-11-06 00:59:53 -08:00
Jason Evans
34457f5144 Fix deadlock in the arenas.purge mallctl.
Fix deadlock in the arenas.purge mallctl due to recursive mutex
acquisition.
2012-11-03 21:18:28 -07:00
Jason Evans
12efefb195 Fix dss/mmap allocation precedence code.
Fix dss/mmap allocation precedence code to use recyclable mmap memory
only after primary dss allocation fails.
2012-10-16 22:06:56 -07:00
Jason Evans
a5c80f893e Add ctl_mutex proection to arena_i_dss_ctl().
Add ctl_mutex proection to arena_i_dss_ctl(), since ctl_stats.narenas is
accessed.
2012-10-15 12:48:59 -07:00
Jason Evans
609ae595f0 Add arena-specific and selective dss allocation.
Add the "arenas.extend" mallctl, so that it is possible to create new
arenas that are outside the set that jemalloc automatically multiplexes
threads onto.

Add the ALLOCM_ARENA() flag for {,r,d}allocm(), so that it is possible
to explicitly allocate from a particular arena.

Add the "opt.dss" mallctl, which controls the default precedence of dss
allocation relative to mmap allocation.

Add the "arena.<i>.dss" mallctl, which makes it possible to set the
default dss precedence on a per arena or global basis.

Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge".

Add the "stats.arenas.<i>.dss" mallctl.
2012-10-12 18:26:16 -07:00
Jan Beich
d0ffd8ed4f mark _pthread_mutex_init_calloc_cb as public explicitly
Mozilla build hides everything by default using visibility pragma and
unhides only explicitly listed headers. But this doesn't work on FreeBSD
because _pthread_mutex_init_calloc_cb is neither documented nor exposed
via any header.
2012-10-10 09:10:37 -07:00
Jason Evans
2cc11ff837 Make malloc_usable_size() implementation consistent with prototype.
Use JEMALLOC_USABLE_SIZE_CONST for the malloc_usable_size()
implementation as well as the prototype, for consistency's sake.
2012-10-09 16:29:21 -07:00
Jason Evans
b5225928fe Fix fork(2)-related mutex acquisition order.
Fix mutex acquisition order inversion for the chunks rtree and the base
mutex.  Chunks rtree acquisition was introduced by the previous commit,
so this bug was short-lived.
2012-10-09 16:16:00 -07:00
Jason Evans
20f1fc95ad Fix fork(2)-related deadlocks.
Add a library constructor for jemalloc that initializes the allocator.
This fixes a race that could occur if threads were created by the main
thread prior to any memory allocation, followed by fork(2), and then
memory allocation in the child process.

Fix the prefork/postfork functions to acquire/release the ctl, prof, and
rtree mutexes.  This fixes various fork() child process deadlocks, but
one possible deadlock remains (intentionally) unaddressed: prof
backtracing can acquire runtime library mutexes, so deadlock is still
possible if heap profiling is enabled during fork().  This deadlock is
known to be a real issue in at least the case of libgcc-based
backtracing.

Reported by tfengjun.
2012-10-09 15:21:46 -07:00
Jason Evans
7de92767c2 Fix mlockall()/madvise() interaction.
mlockall(2) can cause purging via madvise(2) to fail.  Fix purging code
to check whether madvise() succeeded, and base zeroed page metadata on
the result.

Reported by Olivier Lecomte.
2012-10-08 18:04:49 -07:00
Jason Evans
f4c3f8545b Fix error return value in thread_tcache_enabled_ctl().
Reported by Corey Richardson.
2012-10-08 15:48:04 -07:00
Corey Richardson
1d553f72cb If sysconf() fails, the number of CPUs is reported as UINT_MAX, not 1 as it should be 2012-10-08 15:45:38 -07:00
Corey Richardson
35579afb55 Remove unused variable and branch (reported by clang-analzyer) 2012-10-08 15:45:38 -07:00
Jason Evans
5c710cee78 Remove const from __*_hook variable declarations.
Remove const from __*_hook variable declarations, so that glibc can
modify them during process forking.
2012-05-23 16:09:22 -07:00
Jason Evans
f1966e1dc7 Update a comment. 2012-05-16 00:35:08 -07:00
Jason Evans
174b70efb4 Disable tcache by default if running inside Valgrind.
Disable tcache by default if running inside Valgrind, in order to avoid
making unallocated objects appear reachable to Valgrind.
2012-05-15 23:31:53 -07:00
Jason Evans
781fe75e0a Auto-detect whether running inside Valgrind.
Auto-detect whether running inside Valgrind, thus removing the need to
manually specify MALLOC_CONF=valgrind:true.
2012-05-15 14:48:14 -07:00
Jason Evans
58ad1e4956 Return early in _malloc_{pre,post}fork() if uninitialized.
Avoid mutex operations in _malloc_{pre,post}fork() unless jemalloc has
been initialized.

Reported by David Xu.
2012-05-11 17:40:16 -07:00
Jason Evans
d8ceef6c55 Fix large calloc() zeroing bugs.
Refactor code such that arena_mapbits_{large,small}_set() always
preserves the unzeroed flag, and manually manipulate the unzeroed flag
in the one case where it actually gets reset (in arena_chunk_purge()).
This fixes unzeroed preservation bugs in arena_run_split() and
arena_ralloc_large_grow().  These bugs caused large calloc() to return
non-zeroed memory under some circumstances.
2012-05-10 21:49:43 -07:00
Jason Evans
30fe12b866 Add arena chunk map assertions. 2012-05-10 21:49:43 -07:00
Jason Evans
5b0c99649f Refactor arena_run_alloc().
Refactor duplicated arena_run_alloc() code into
arena_run_alloc_helper().
2012-05-10 21:49:43 -07:00
Jason Evans
2e671ffbad Add the --enable-mremap option.
Add the --enable-mremap option, and disable the use of mremap(2) by
default, for the same reason that freeing chunks via munmap(2) is
disabled by default on Linux: semi-permanent VM map fragmentation.
2012-05-09 16:12:00 -07:00
Jason Evans
374d26a43b Fix chunk_recycle() to stop leaking trailing chunks.
Fix chunk_recycle() to correctly compute trailsize and re-insert
trailing chunks.  This fixes a major virtual memory leak.

Simplify chunk_record() to avoid dropping/re-acquiring chunks_mtx.
2012-05-09 14:48:35 -07:00
Jason Evans
de6fbdb72c Fix chunk_alloc_mmap() bugs.
Simplify chunk_alloc_mmap() to no longer attempt map extension.  The
extra complexity isn't warranted, because although in the success case
it saves one system call as compared to immediately falling back to
chunk_alloc_mmap_slow(), it also makes the failure case even more
expensive.  This simplification removes two bugs:

- For Windows platforms, pages_unmap() wasn't being called for unaligned
  mappings prior to falling back to chunk_alloc_mmap_slow().  This
  caused permanent virtual memory leaks.
- For non-Windows platforms, alignment greater than chunksize caused
  pages_map() to be called with size 0 when attempting map extension.
  This always resulted in an mmap() error, and subsequent fallback to
  chunk_alloc_mmap_slow().
2012-05-09 13:05:04 -07:00
Jason Evans
34a8cf6c40 Fix a base allocator deadlock.
Fix a base allocator deadlock due to chunk_recycle() calling back into
the base allocator.
2012-05-02 20:41:42 -07:00
Mike Hommey
c584fc75bb Don't use sizeof() on a VARIABLE_ARRAY
In the alloca() case, this fails to be the right size.
2012-05-02 16:33:19 -07:00
Mike Hommey
3597e91482 Allow je_malloc_message to be overridden when linking statically
If an application wants to override je_malloc_message, it is better to define
the symbol locally than to change its value in main(), which might be too late
for various reasons.

Due to je_malloc_message being initialized in util.c, statically linking
jemalloc with an application defining je_malloc_message fails due to
"multiple definition of" the symbol.

Defining it without a value (like je_malloc_conf) makes it more easily
overridable.
2012-05-02 16:25:41 -07:00
Jason Evans
80737c3323 Further optimize and harden arena_salloc().
Further optimize arena_salloc() to only look at the binind chunk map
bits in the common case.

Add more sanity checks to arena_salloc() that detect chunk map
inconsistencies for large allocations (whether due to allocator bugs or
application bugs).
2012-05-02 16:11:03 -07:00
Jason Evans
889ec59bd3 Make malloc_write() non-inline.
Make malloc_write() non-inline, in order to resolve its dependency on
je_malloc_write().
2012-05-02 02:08:03 -07:00
Jason Evans
203484e2ea Optimize malloc() and free() fast paths.
Embed the bin index for small page runs into the chunk page map, in
order to omit [...] in the following dependent load sequence:
  ptr-->mapelm-->[run-->bin-->]bin_info

Move various non-critcal code out of the inlined function chain into
helper functions (tcache_event_hard(), arena_dalloc_small(), and
locking).
2012-05-02 00:30:36 -07:00
Mike Hommey
fd97b1dfc7 Add support for MSVC
Tested with MSVC 8 32 and 64 bits.
2012-05-01 11:32:11 -07:00
Mike Hommey
da99e31105 Replace JEMALLOC_ATTR with various different macros when it makes sense
Theses newly added macros will be used to implement the equivalent under
MSVC. Also, move the definitions to headers, where they make more sense,
and for some, are even more useful there (e.g. malloc).
2012-04-30 17:57:31 -07:00
Mike Hommey
a14bce85e8 Use Get/SetLastError on Win32
Using errno on win32 doesn't quite work, because the value set in a shared
library can't be read from e.g. an executable calling the function setting
errno.

At the same time, since buferror always uses errno/GetLastError, don't pass
it.
2012-04-30 16:50:55 -07:00
Mike Hommey
af04b744bd Remove the VOID macro
Windows headers define a VOID macro.
2012-04-30 16:42:30 -07:00
Mike Hommey
8b49971d0c Avoid variable length arrays and remove declarations within code
MSVC doesn't support C99, and building as C++ to be able to use them is
dangerous, as C++ and C99 are incompatible.

Introduce a VARIABLE_ARRAY macro that either uses VLA when supported,
or alloca() otherwise. Note that using alloca() inside loops doesn't
quite work like VLAs, thus the use of VARIABLE_ARRAY there is discouraged.
It might be worth investigating ways to check whether VARIABLE_ARRAY is
used in such context at runtime in debug builds and bail out if that
happens.
2012-04-29 00:25:34 -07:00
Jason Evans
f278994029 Fix more prof_tdata resurrection corner cases. 2012-04-28 23:27:13 -07:00
Jason Evans
0050a0f7e6 Handle prof_tdata resurrection.
Handle prof_tdata resurrection during thread shutdown, similarly to how
tcache and quarantine handle resurrection.
2012-04-28 18:14:24 -07:00
Jason Evans
95ff6aadca Don't set prof_tdata during thread cleanup.
Don't set prof_tdata during thread cleanup, because doing so will cause
the cleanup function to be called again, the second time with a NULL
argument.
2012-04-28 14:15:28 -07:00
Jason Evans
3fb50b0407 Fix a PROF_ALLOC_PREP() error path.
Fix a PROF_ALLOC_PREP() error path to initialize the return value to
NULL.
2012-04-25 13:13:44 -07:00
Jason Evans
6b9ed67b4b Fix the "epoch" mallctl.
Fix the "epoch" mallctl to update cached stats even if the passed in
epoch is 0.
2012-04-25 13:12:46 -07:00
Jason Evans
f54166e7ef Add missing Valgrind annotations. 2012-04-23 22:41:36 -07:00