Commit Graph

190 Commits

Author SHA1 Message Date
Jason Evans
8bb3198f72 Refactor/fix arenas manipulation.
Abstract arenas access to use arena_get() (or a0get() where appropriate)
rather than directly reading e.g. arenas[ind].  Prior to the addition of
the arenas.extend mallctl, the worst possible outcome of directly
accessing arenas was a stale read, but arenas.extend may allocate and
assign a new array to arenas.

Add a tsd-based arenas_cache, which amortizes arenas reads.  This
introduces some subtle bootstrapping issues, with tsd_boot() now being
split into tsd_boot[01]() to support tsd wrapper allocation
bootstrapping, as well as an arenas_cache_bypass tsd variable which
dynamically terminates allocation of arenas_cache itself.

Promote a0malloc(), a0calloc(), and a0free() to be generally useful for
internal allocation, and use them in several places (more may be
appropriate).

Abstract arena->nthreads management and fix a missing decrement during
thread destruction (recent tsd refactoring left arenas_cleanup()
unused).

Change arena_choose() to propagate OOM, and handle OOM in all callers.
This is important for providing consistent allocation behavior when the
MALLOCX_ARENA() flag is being used.  Prior to this fix, it was possible
for an OOM to result in allocation silently allocating from a different
arena than the one specified.
2014-10-07 23:14:57 -07:00
Jason Evans
155bfa7da1 Normalize size classes.
Normalize size classes to use the same number of size classes per size
doubling (currently hard coded to 4), across the intire range of size
classes.  Small size classes already used this spacing, but in order to
support this change, additional small size classes now fill [4 KiB .. 16
KiB).  Large size classes range from [16 KiB .. 4 MiB).  Huge size
classes now support non-multiples of the chunk size in order to fill (4
MiB .. 16 MiB).
2014-10-06 01:45:13 -07:00
Daniel Micay
a95018ee81 Attempt to expand huge allocations in-place.
This adds support for expanding huge allocations in-place by requesting
memory at a specific address from the chunk allocator.

It's currently only implemented for the chunk recycling path, although
in theory it could also be done by optimistically allocating new chunks.
On Linux, it could attempt an in-place mremap. However, that won't work
in practice since the heap is grown downwards and memory is not unmapped
(in a normal build, at least).

Repeated vector reallocation micro-benchmark:

    #include <string.h>
    #include <stdlib.h>

    int main(void) {
        for (size_t i = 0; i < 100; i++) {
            void *ptr = NULL;
            size_t old_size = 0;
            for (size_t size = 4; size < (1 << 30); size *= 2) {
                ptr = realloc(ptr, size);
                if (!ptr) return 1;
                memset(ptr + old_size, 0xff, size - old_size);
                old_size = size;
            }
            free(ptr);
        }
    }

The glibc allocator fails to do any in-place reallocations on this
benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it
elides the cost of copies via mremap, which is currently not something
that jemalloc can use.

With this improvement, jemalloc still fails to do any in-place huge
reallocations for the first outer loop, but then succeeds 100% of the
time for the remaining 99 iterations. The time spent doing allocations
and copies drops down to under 5%, with nearly all of it spent doing
purging + faulting (when huge pages are disabled) and the array memset.

An improved mremap API (MREMAP_RETAIN - #138) would be far more general
but this is a portable optimization and would still be useful on Linux
for xallocx.

Numbers with transparent huge pages enabled:

glibc (copies elided via MREMAP_MAYMOVE): 8.471s

jemalloc: 17.816s
jemalloc + no-op madvise: 13.236s

jemalloc + this commit: 6.787s
jemalloc + this commit + no-op madvise: 6.144s

Numbers with transparent huge pages disabled:

glibc (copies elided via MREMAP_MAYMOVE): 15.403s

jemalloc: 39.456s
jemalloc + no-op madvise: 12.768s

jemalloc + this commit: 15.534s
jemalloc + this commit + no-op madvise: 6.354s

Closes #137
2014-10-05 14:47:01 -07:00
Jason Evans
551ebc4364 Convert to uniform style: cond == false --> !cond 2014-10-03 10:16:09 -07:00
Jason Evans
5460aa6f66 Convert all tsd variables to reside in a single tsd structure. 2014-09-23 02:36:08 -07:00
Jason Evans
9c640bfdd4 Apply likely()/unlikely() to allocation/deallocation fast paths. 2014-09-11 17:01:58 -07:00
Daniel Micay
a62812eacc fix isqalloct (should call isdalloct) 2014-09-08 21:46:17 -04:00
Daniel Micay
4cfe55166e Add support for sized deallocation.
This adds a new `sdallocx` function to the external API, allowing the
size to be passed by the caller.  It avoids some extra reads in the
thread cache fast path.  In the case where stats are enabled, this
avoids the work of calculating the size from the pointer.

An assertion validates the size that's passed in, so enabling debugging
will allow users of the API to debug cases where an incorrect size is
passed in.

The performance win for a contrived microbenchmark doing an allocation
and immediately freeing it is ~10%.  It may have a different impact on a
real workload.

Closes #28
2014-09-08 17:34:24 -07:00
Jason Evans
b718cf77e9 Optimize [nmd]alloc() fast paths.
Optimize [nmd]alloc() fast paths such that the (flags == 0) case is
streamlined, flags decoding only happens to the minimum degree
necessary, and no conditionals are repeated.
2014-09-07 14:40:19 -07:00
Manuel A. Fernandez Montecelo
ffa259841c Add OpenRISC/or1k LG_QUANTUM size definition 2014-07-29 23:11:26 +01:00
Richard Diamond
9c3a10fdf6 Try to use __builtin_ffsl if ffsl is unavailable.
Some platforms (like those using Newlib) don't have ffs/ffsl.  This
commit adds a check to configure.ac for __builtin_ffsl if ffsl isn't
found.  __builtin_ffsl performs the same function as ffsl, and has the
added benefit of being available on any platform utilizing
Gcc-compatible compiler.

This change does not address the used of ffs in the MALLOCX_ARENA()
macro.
2014-06-02 07:44:50 -07:00
Jason Evans
d04047cc29 Add size class computation capability.
Add size class computation capability, currently used only as validation
of the size class lookup tables.  Generalize the size class spacing used
for bins, for eventual use throughout the full range of allocation
sizes.
2014-05-28 21:06:46 -07:00
Mike Hommey
12f74e680c Move platform headers and tricks from jemalloc_internal.h.in to a new jemalloc_internal_decls.h header 2014-05-28 09:38:10 -07:00
Mike Hommey
22bc570fba Move __func__ to jemalloc_internal_macros.h
test/integration/aligned_alloc.c needs it.
2014-05-27 15:52:49 -07:00
Jason Evans
e2deab7a75 Refactor huge allocation to be managed by arenas.
Refactor huge allocation to be managed by arenas (though the global
red-black tree of huge allocations remains for lookup during
deallocation).  This is the logical conclusion of recent changes that 1)
made per arena dss precedence apply to huge allocation, and 2) made it
possible to replace the per arena chunk allocation/deallocation
functions.

Remove the top level huge stats, and replace them with per arena huge
stats.

Normalize function names and types to *dalloc* (some were *dealloc*).

Remove the --enable-mremap option.  As jemalloc currently operates, this
is a performace regression for some applications, but planned work to
logarithmically space huge size classes should provide similar amortized
performance.  The motivation for this change was that mremap-based huge
reallocation forced leaky abstractions that prevented refactoring.
2014-05-15 22:36:41 -07:00
aravind
fb7fe50a88 Add support for user-specified chunk allocators/deallocators.
Add new mallctl endpoints "arena<i>.chunk.alloc" and
"arena<i>.chunk.dealloc" to allow userspace to configure
jemalloc's chunk allocator and deallocator on a per-arena
basis.
2014-05-12 10:46:03 -07:00
Lucian Adrian Grijincu
9d4e13f45a prof_backtrace: use unw_backtrace
unw_backtrace:
- does internal per-thread caching
- doesn't acquire an internal lock
2014-04-22 18:39:47 -07:00
Jason Evans
3541a904d6 Refactor small_size2bin and small_bin2size.
Refactor small_size2bin and small_bin2size to be inline functions rather
than directly accessed arrays.
2014-04-16 17:14:33 -07:00
Jason Evans
3e3caf03af Merge pull request #73 from bmaurer/smallmalloc
Smaller malloc hot path
2014-04-16 16:33:21 -07:00
Ben Maurer
021136ce4d Create a const array with only a small bin to size map 2014-04-16 14:31:24 -07:00
Jason Evans
bd87b01999 Optimize Valgrind integration.
Forcefully disable tcache if running inside Valgrind, and remove
Valgrind calls in tcache-specific code.

Restructure Valgrind-related code to move most Valgrind calls out of the
fast path functions.

Take advantage of static knowledge to elide some branches in
JEMALLOC_VALGRIND_REALLOC().
2014-04-15 16:49:57 -07:00
Jason Evans
ecd3e59ca3 Remove the "opt.valgrind" mallctl.
Remove the "opt.valgrind" mallctl because it is unnecessary -- jemalloc
automatically detects whether it is running inside valgrind.
2014-04-15 14:33:50 -07:00
Jason Evans
4d434adb14 Make dss non-optional, and fix an "arena.<i>.dss" mallctl bug.
Make dss non-optional on all platforms which support sbrk(2).

Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
"secondary" precedence is specified, but sbrk(2) is not supported.
2014-04-15 12:09:48 -07:00
Jason Evans
9790b9667f Remove the *allocm() API, which is superceded by the *allocx() API. 2014-04-14 22:32:31 -07:00
Ben Maurer
be8e59f5a6 Don't dereference chunk->arena in free() hot path
When you call free() we load chunk->arena even though that
data isn't used on the tcache hot path.

In profiling some FB applications, I found that ~30% of the
dTLB misses in the free() function come from this line. With
4 MB chunks, the arena_chunk_t->map is ~ 32 KB (1024 pages
in the chunk, 4 8 byte pointers in arena_chunk_map_t). This
means there's only a 1/8 chance of the page containing
chunk->arena also comtaining the map bits.
2014-04-05 15:59:08 -07:00
Max Wang
fbb31029a5 Use arena dss prec instead of default for huge allocs.
Pass a dss_prec_t parameter to huge_{m,p,r}alloc instead of defaulting
to the chunk dss prec.
2014-03-28 13:43:58 -07:00
Jason Evans
f234dc51b9 Fix name mangling for stress tests.
Fix stress tests such that testlib code uses the jet_ allocator, but
test code uses libjemalloc.

Generate jemalloc_{rename,mangle}.h, the former because it's needed for
the stress test name mangling fix, and the latter for consistency.  As
an artifact of this change, some (but not all) definitions related to
the experimental API are absent from the headers unless the feature is
enabled at configure time.
2014-01-16 17:38:01 -08:00
Jason Evans
b2c31660be Extract profiling code from [re]allocation functions.
Extract profiling code from malloc(), imemalign(), calloc(), realloc(),
mallocx(), rallocx(), and xallocx().  This slightly reduces the amount
of code compiled into the fast paths, but the primary benefit is the
combinatorial complexity reduction.

Simplify iralloc[t]() by creating a separate ixalloc() that handles the
no-move cases.

Further simplify [mrxn]allocx() (and by implication [mrn]allocm()) to
make request size overflows due to size class and/or alignment
constraints trigger undefined behavior (detected by debug-only
assertions).

Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling
backtrace creation in imemalign().  This bug impacted posix_memalign()
and aligned_alloc().
2014-01-12 15:41:05 -08:00
Jason Evans
b954bc5d3a Convert rtree from (void *) to (uint8_t) storage.
Reduce rtree memory usage by storing booleans (1 byte each) rather than
pointers.  The rtree code is only used to record whether jemalloc manages
a chunk of memory, so there's no need to store pointers in the rtree.

Increase rtree node size to 64 KiB in order to reduce tree depth from 13
to 3 on 64-bit systems.  The conversion to more compact leaf nodes was
enough by itself to make the rtree depth 1 on 32-bit systems; due to the
fact that root nodes are smaller than the specified node size if
possible, the node size change has no impact on 32-bit systems (assuming
default chunk size).
2014-01-02 17:36:38 -08:00
Jason Evans
d82a5e6a34 Implement the *allocx() API.
Implement the *allocx() API, which is a successor to the *allocm() API.
The *allocx() functions are slightly simpler to use because they have
fewer parameters, they directly return the results of primary interest,
and mallocx()/rallocx() avoid the strict aliasing pitfall that
allocm()/rallocx() share with posix_memalign().  The following code
violates strict aliasing rules:

    foo_t *foo;
    allocm((void **)&foo, NULL, 42, 0);

whereas the following is safe:

    foo_t *foo;
    void *p;
    allocm(&p, NULL, 42, 0);
    foo = (foo_t *)p;

mallocx() does not have this problem:

    foo_t *foo = (foo_t *)mallocx(42, 0);
2013-12-12 22:35:52 -08:00
Jason Evans
a4f124f59f Normalize #define whitespace.
Consistently use a tab rather than a space following #define.
2013-12-08 22:28:27 -08:00
Jason Evans
dc1bed6227 Fix more test refactoring issues. 2013-12-05 21:44:25 -08:00
Jason Evans
14990b83d1 Fix test refactoring issues for Linux. 2013-12-05 17:58:32 -08:00
Jason Evans
86abd0dcd8 Refactor to support more varied testing.
Refactor the test harness to support three types of tests:
- unit: White box unit tests.  These tests have full access to all
  internal jemalloc library symbols.  Though in actuality all symbols
  are prefixed by jet_, macro-based name mangling abstracts this away
  from test code.
- integration: Black box integration tests.  These tests link with
  the installable shared jemalloc library, and with the exception of
  some utility code and configure-generated macro definitions, they have
  no access to jemalloc internals.
- stress: Black box stress tests.  These tests link with the installable
  shared jemalloc library, as well as with an internal allocator with
  symbols prefixed by jet_ (same as for unit tests) that can be used to
  allocate data structures that are internal to the test code.

Move existing tests into test/{unit,integration}/ as appropriate.

Split out internal parts of jemalloc_defs.h.in and put them in
jemalloc_internal_defs.h.in.  This reduces internals exposure to
applications that #include <jemalloc/jemalloc.h>.

Refactor jemalloc.h header generation so that a single header file
results, and the prototypes can be used to generate jet_ prototypes for
tests.  Split jemalloc.h.in into multiple parts (jemalloc_defs.h.in,
jemalloc_macros.h.in, jemalloc_protos.h.in, jemalloc_mangle.h.in) and
use a shell script to generate a unified jemalloc.h at configure time.

Change the default private namespace prefix from "" to "je_".

Add missing private namespace mangling.

Remove hard-coded private_namespace.h.  Instead generate it and
private_unnamespace.h from private_symbols.txt.  Use similar logic for
public symbols, which aids in name mangling for jet_ symbols.

Add test_warn() and test_fail().  Replace existing exit(1) calls with
test_fail() calls.
2013-12-03 22:06:59 -08:00
Jason Evans
543abf7e6c Fix inlining warning.
Add the JEMALLOC_ALWAYS_INLINE_C macro and use it for always-inlined
functions declared in .c files.  This fixes a function attribute
inconsistency for debug builds that resulted in (harmless) compiler
warnings about functions not being inlinable.

Reported by Ricardo Nabinger Sanchez.
2013-10-19 17:26:00 -07:00
Riku Voipio
daf6d0446c Add aarch64 LG_QUANTUM size definition
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2013-05-07 11:05:01 -07:00
Jason Evans
a491585157 Add no-op bodies to VALGRIND_*() macro stubs.
Add no-op bodies to VALGRIND_*() macro stubs so that they can be used in
contexts like the following without generating a compiler warning about
the 'if' statement having an empty body:

	if (config_valgrind)
		VALGRIND_MAKE_MEM_UNDEFINED(ret, size);
2013-03-06 11:19:31 -08:00
Mike Frysinger
9f9897ad42 fix building for s390 systems
Checking for __s390x__ means you work on s390x, but not s390 (32bit)
systems.  So use __s390__ which works for both.

With this, `make check` passes on s390.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2013-03-06 11:03:48 -08:00
Jason Evans
06912756cc Fix Valgrind integration.
Fix Valgrind integration to annotate all internally allocated memory in
a way that keeps Valgrind happy about internal data structure access.
2013-01-31 17:02:53 -08:00
Jason Evans
dd0438ee6b Specify 'inline' in addition to always_inline attribute.
Specify both inline and __attribute__((always_inline)), in order to
avoid warnings when using newer versions of gcc.
2013-01-22 20:43:04 -08:00
Jason Evans
ae03bf6a57 Update hash from MurmurHash2 to MurmurHash3.
Update hash from MurmurHash2 to MurmurHash3, primarily because the
latter generates 128 bits in a single call for no extra cost, which
simplifies integration with cuckoo hashing.
2013-01-22 12:02:08 -08:00
Jason Evans
88393cb0eb Add and use JEMALLOC_ALWAYS_INLINE.
Add JEMALLOC_ALWAYS_INLINE and use it to guarantee that the entire fast
paths of the primary allocation/deallocation functions are inlined.
2013-01-22 08:45:43 -08:00
Garrett Cooper
13e4e24c42 Fix build break on *BSD
Linux uses alloca.h; many other operating systems define alloca(3) in
stdlib.h.

Signed-off-by: Garrett Cooper <yanegomi@gmail.com>
2012-12-24 10:32:16 -08:00
Jason Evans
609ae595f0 Add arena-specific and selective dss allocation.
Add the "arenas.extend" mallctl, so that it is possible to create new
arenas that are outside the set that jemalloc automatically multiplexes
threads onto.

Add the ALLOCM_ARENA() flag for {,r,d}allocm(), so that it is possible
to explicitly allocate from a particular arena.

Add the "opt.dss" mallctl, which controls the default precedence of dss
allocation relative to mmap allocation.

Add the "arena.<i>.dss" mallctl, which makes it possible to set the
default dss precedence on a per arena or global basis.

Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge".

Add the "stats.arenas.<i>.dss" mallctl.
2012-10-12 18:26:16 -07:00
Jason Evans
dd03a2e377 Define LG_QUANTUM for hppa.
Submitted by Jory Pratt.
2012-10-08 15:41:06 -07:00
Jason Evans
781fe75e0a Auto-detect whether running inside Valgrind.
Auto-detect whether running inside Valgrind, thus removing the need to
manually specify MALLOC_CONF=valgrind:true.
2012-05-15 14:48:14 -07:00
Jason Evans
2e671ffbad Add the --enable-mremap option.
Add the --enable-mremap option, and disable the use of mremap(2) by
default, for the same reason that freeing chunks via munmap(2) is
disabled by default on Linux: semi-permanent VM map fragmentation.
2012-05-09 16:12:00 -07:00
Jason Evans
8d5865eb57 Make CACHELINE a raw constant.
Make CACHELINE a raw constant in order to work around a
__declspec(align()) limitation.

Submitted by Mike Hommey.
2012-05-02 01:22:16 -07:00
Jason Evans
203484e2ea Optimize malloc() and free() fast paths.
Embed the bin index for small page runs into the chunk page map, in
order to omit [...] in the following dependent load sequence:
  ptr-->mapelm-->[run-->bin-->]bin_info

Move various non-critcal code out of the inlined function chain into
helper functions (tcache_event_hard(), arena_dalloc_small(), and
locking).
2012-05-02 00:30:36 -07:00
Mike Hommey
fd97b1dfc7 Add support for MSVC
Tested with MSVC 8 32 and 64 bits.
2012-05-01 11:32:11 -07:00
Mike Hommey
a14bce85e8 Use Get/SetLastError on Win32
Using errno on win32 doesn't quite work, because the value set in a shared
library can't be read from e.g. an executable calling the function setting
errno.

At the same time, since buferror always uses errno/GetLastError, don't pass
it.
2012-04-30 16:50:55 -07:00
Mike Hommey
8b49971d0c Avoid variable length arrays and remove declarations within code
MSVC doesn't support C99, and building as C++ to be able to use them is
dangerous, as C++ and C99 are incompatible.

Introduce a VARIABLE_ARRAY macro that either uses VLA when supported,
or alloca() otherwise. Note that using alloca() inside loops doesn't
quite work like VLAs, thus the use of VARIABLE_ARRAY there is discouraged.
It might be worth investigating ways to check whether VARIABLE_ARRAY is
used in such context at runtime in debug builds and bail out if that
happens.
2012-04-29 00:25:34 -07:00
Mike Hommey
a5288ca934 Remove unused #includes 2012-04-21 21:32:09 -07:00
Mike Hommey
a19e87fbad Add support for Mingw 2012-04-21 21:27:46 -07:00
Jason Evans
f7088e6c99 Make arena_salloc() an inline function. 2012-04-19 18:28:03 -07:00
Jason Evans
b57d3ec571 Add atomic(9) implementations of atomic operations.
Add atomic(9) implementations of atomic operations.  These are used on
FreeBSD for non-x86 architectures.
2012-04-17 13:27:39 -07:00
Mike Hommey
45f208e112 Replace fprintf with malloc_printf in tests. 2012-04-16 23:05:39 -07:00
Jason Evans
7ca0fdfb85 Disable munmap() if it causes VM map holes.
Add a configure test to determine whether common mmap()/munmap()
patterns cause VM map holes, and only use munmap() to discard unused
chunks if the problem does not exist.

Unify the chunk caching for mmap and dss.

Fix options processing to limit lg_chunk to be large enough that
redzones will always fit.
2012-04-12 20:20:58 -07:00
Jason Evans
5ff709c264 Normalize aligned allocation algorithms.
Normalize arena_palloc(), chunk_alloc_mmap_slow(), and
chunk_recycle_dss() to use the same algorithm for trimming
over-allocation.

Add the ALIGNMENT_ADDR2BASE(), ALIGNMENT_ADDR2OFFSET(), and
ALIGNMENT_CEILING() macros, and use them where appropriate.

Remove the run_size_p parameter from sa2u().

Fix a potential deadlock in chunk_recycle_dss() that was introduced by
eae269036c (Add alignment support to
chunk_alloc()).
2012-04-11 18:13:45 -07:00
Jason Evans
122449b073 Implement Valgrind support, redzones, and quarantine.
Implement Valgrind support, as well as the redzone and quarantine
features, which help Valgrind detect memory errors.  Redzones are only
implemented for small objects because the changes necessary to support
redzones around large and huge objects are complicated by in-place
reallocation, to the point that it isn't clear that the maintenance
burden is worth the incremental improvement to Valgrind support.

Merge arena_salloc() and arena_salloc_demote().

Refactor i[v]salloc() to expose the 'demote' option.
2012-04-11 11:46:18 -07:00
Jason Evans
b147611b52 Add utrace(2)-based tracing (--enable-utrace). 2012-04-05 13:36:17 -07:00
Jason Evans
01b3fe55ff Add a0malloc(), a0calloc(), and a0free().
Add a0malloc(), a0calloc(), and a0free(), which are used by FreeBSD's
libc to allocate/deallocate TLS in static binaries.
2012-04-03 19:25:48 -07:00
Jason Evans
ae4c7b4b40 Clean up *PAGE* macros.
s/PAGE_SHIFT/LG_PAGE/g and s/PAGE_SIZE/PAGE/g.

Remove remnants of the dynamic-page-shift code.

Rename the "arenas.pagesize" mallctl to "arenas.page".

Remove the "arenas.chunksize" mallctl, which is redundant with
"opt.lg_chunk".
2012-04-02 07:04:34 -07:00
Jason Evans
f004737267 Revert "Avoid NULL check in free() and malloc_usable_size()."
This reverts commit 96d4120ac0.

ivsalloc() depends on chunks_rtree being initialized.  This can be
worked around via a NULL pointer check.  However,
thread_allocated_tsd_get() also depends on initialization having
occurred, and there is no way to guard its call in free() that is
cheaper than checking whether ptr is NULL.
2012-04-02 15:18:24 -07:00
Jason Evans
96d4120ac0 Avoid NULL check in free() and malloc_usable_size().
Generalize isalloc() to handle NULL pointers in such a way that the NULL
checking overhead is only paid when introspecting huge allocations (or
NULL).  This allows free() and malloc_usable_size() to no longer check
for NULL.

Submitted by Igor Bukanov and Mike Hommey.
2012-04-02 14:50:03 -07:00
Mike Hommey
80b25932ca Move last bit of zone initialization in zone.c, and lazy-initialize 2012-04-02 14:15:20 -07:00
Jason Evans
4eeb52f080 Remove vsnprintf() and strtoumax() validation.
Remove code that validates malloc_vsnprintf() and malloc_strtoumax()
against their namesakes.  The validation code has adequately served its
usefulness at this point, and it isn't worth dealing with the different
formatting for %p with glibc versus other implementations for NULL
pointers ("(nil)" vs. "0x0").

Reported by Mike Hommey.
2012-04-02 02:30:24 -07:00
Mike Hommey
1a0e777024 Add a SYS_write definition on systems where it is not defined in headers
Namely, in the Android NDK headers, SYS_write is not defined; but
__NR_write is.
2012-03-30 10:21:41 -07:00
Jason Evans
41b6afb834 Port to FreeBSD.
Use FreeBSD-specific functions (_pthread_mutex_init_calloc_cb(),
_malloc_{pre,post}fork()) to avoid bootstrapping issues due to
allocation in libc and libthr.

Add malloc_strtoumax() and use it instead of strtoul().  Disable
validation code in malloc_vsnprintf() and malloc_strtoumax() until
jemalloc is initialized.  This is necessary because locale
initialization causes allocation for both vsnprintf() and strtoumax().

Force the lazy-lock feature on in order to avoid pthread_self(),
because it causes allocation.

Use syscall(SYS_write, ...) rather than write(...), because libthr wraps
write() and causes allocation.  Without this workaround, it would not be
possible to print error messages in malloc_conf_init() without
substantially reworking bootstrapping.

Fix choose_arena_hard() to look at how many threads are assigned to the
candidate choice, rather than checking whether the arena is
uninitialized.  This bug potentially caused more arenas to be
initialized than necessary.
2012-02-02 23:09:53 -08:00
Jason Evans
9225a1991a Add JEMALLOC_CC_SILENCE_INIT().
Add JEMALLOC_CC_SILENCE_INIT(), which provides succinct syntax for
initializing a variable to avoid a spurious compiler warning.
2012-03-23 15:39:07 -07:00
Jason Evans
cd9a1346e9 Implement tsd.
Implement tsd, which is a TLS/TSD abstraction that uses one or both
internally.  Modify bootstrapping such that no tsd's are utilized until
allocation is safe.

Remove malloc_[v]tprintf(), and use malloc_snprintf() instead.

Fix %p argument size handling in malloc_vsnprintf().

Fix a long-standing statistics-related bug in the "thread.arena"
mallctl that could cause crashes due to linked list corruption.
2012-03-23 15:14:55 -07:00
Jason Evans
e24c7af35d Invert NO_TLS to JEMALLOC_TLS. 2012-03-19 10:21:17 -07:00
Jason Evans
6508bc6931 Remove #include <sys/sysctl.h>.
Remove #include <sys/sysctl.h>, which is no longer needed (now using
sysconf(3) to get number of CPUs).
2012-03-15 17:07:42 -07:00
Jason Evans
4e2e3dd9cf Fix fork-related bugs.
Acquire/release arena bin locks as part of the prefork/postfork.  This
bug made deadlock in the child between fork and exec a possibility.

Split jemalloc_postfork() into jemalloc_postfork_{parent,child}() so
that the child can reinitialize mutexes rather than unlocking them.  In
practice, this bug tended not to cause problems.
2012-03-13 16:31:41 -07:00
Jason Evans
4c2faa8a7c Fix a regression in JE_COMPILABLE().
Revert JE_COMPILABLE() so that it detects link errors.  Cross-compiling
should still work as long as a valid configure cache is provided.

Clean up some comments/whitespace.
2012-03-13 11:09:23 -07:00
Jason Evans
d81e4bdd5c Implement malloc_vsnprintf().
Implement malloc_vsnprintf() (a subset of vsnprintf(3)) as well as
several other printing functions based on it, so that formatted printing
can be relied upon without concern for inducing a dependency on floating
point runtime support.  Replace malloc_write() calls with
malloc_*printf() where doing so simplifies the code.

Add name mangling for library-private symbols in the data and BSS
sections.  Adjust CONF_HANDLE_*() macros in malloc_conf_init() to expose
all opt_* variable use to cpp so that proper mangling occurs.
2012-03-07 16:19:19 -08:00
Jason Evans
3492daf1ce Add SH4 and mips architecture support.
Submitted by Andreas Vinsander.
2012-03-05 12:16:57 -08:00
Jason Evans
84f7cdb0c5 Rename prn to prng.
Rename prn to prng so that Windows doesn't choke when trying to create
a file named prn.h.
2012-03-02 15:59:45 -08:00
Jason Evans
0a5489e37d Add --with-mangling.
Add the --with-mangling configure option, which can be used to specify
name mangling on a per public symbol basis that takes precedence over
--with-jemalloc-prefix.

Expose the memalign() and valloc() overrides even if
--with-jemalloc-prefix is specified.  This change does no real harm, and
simplifies the code.
2012-03-01 17:19:20 -08:00
Jason Evans
7e15dab94d Add nallocm().
Add nallocm(), which computes the real allocation size that would result
from the corresponding allocm() call.  nallocm() is a functional
superset of OS X's malloc_good_size(), in that it takes alignment
constraints into account.
2012-02-29 12:56:37 -08:00
Jason Evans
4bb0983013 Use glibc allocator hooks.
When jemalloc is used as a libc malloc replacement (i.e. not prefixed),
some particular setups may end up inconsistently calling malloc from
libc and free from jemalloc, or the other way around.

glibc provides hooks to make its functions use alternative
implementations.  Use them.

Submitted by Karl Tomlinson and Mike Hommey.
2012-02-29 10:37:27 -08:00
Jason Evans
c90ad71237 Remove the sysv option. 2012-02-28 20:31:37 -08:00
Jason Evans
b172610317 Simplify small size class infrastructure.
Program-generate small size class tables for all valid combinations of
LG_TINY_MIN, LG_QUANTUM, and PAGE_SHIFT.  Use the appropriate table to generate
all relevant data structures, and remove the distinction between
tiny/quantum/cacheline/subpage bins.

Remove --enable-dynamic-page-shift.  This option didn't prove useful in
practice, and it prevented optimizations.

Add Tilera architecture support.
2012-02-28 16:50:47 -08:00
Jason Evans
ef8897b4b9 Make 8-byte tiny size class non-optional.
When tiny size class support was first added, it was intended to support
truly tiny size classes (even 2 bytes).  However, this wasn't very
useful in practice, so the minimum tiny size class has been limited to
sizeof(void *) for a long time now.  This is too small to be standards
compliant, but other commonly used malloc implementations do not even
bother using a 16-byte quantum  on systems with vector units (SSE2+,
AltiVEC, etc.).  As such, it is safe in practice to support an 8-byte
tiny size class on 64-bit systems that support 16-byte types.
2012-02-13 15:03:59 -08:00
Jason Evans
4162627757 Remove the swap feature.
Remove the swap feature, which enabled per application swap files.  In
practice this feature has not proven itself useful to users.
2012-02-13 10:56:17 -08:00
Jason Evans
fd56043c53 Remove magic.
Remove structure magic, because 1) it is no longer conditional, and 2)
it stopped being very effective at detecting memory corruption several
years ago.
2012-02-13 10:24:43 -08:00
Jason Evans
7372b15a31 Reduce cpp conditional logic complexity.
Convert configuration-related cpp conditional logic to use static
constant variables, e.g.:

  #ifdef JEMALLOC_DEBUG
    [...]
  #endif

becomes:

  if (config_debug) {
    [...]
  }

The advantage is clearer, more concise code.  The main disadvantage is
that data structures no longer have conditionally defined fields, so
they pay the cost of all fields regardless of whether they are used.  In
practice, this is only a minor concern; config_stats will go away in an
upcoming change, and config_prof is the only other major feature that
depends on more than a few special-purpose fields.
2012-02-10 20:22:09 -08:00
Jason Evans
04ca1efe35 Adjust relative #include for private_namespace.h. 2011-07-30 17:58:07 -07:00
Jason Evans
746e77a06b Add the --with-private-namespace option.
Add the --with-private-namespace option to make it possible to work
around library-private symbols being exposed in static libraries.
2011-07-30 16:40:52 -07:00
Jason Evans
7427525c28 Move repo contents in jemalloc/ to top level. 2011-03-31 20:36:17 -07:00