Clean up the manpage and conditionalize various portions according to how
jemalloc is configured.

Modify arena_malloc() API to avoid unnecessary choose_arena() calls.  Remove
unnecessary code from choose_arena().

Enable lazy-lock by default, now that choose_arena() is both faster and out of
the critical path.

Implement objdir support in the build system.
Jason Evans 2009-06-25 18:06:48 -07:00
parent b7924f50c0
commit cc00a15770
8 changed files with 469 additions and 229 deletions

jemalloc/INSTALL Normal file

@ -0,0 +1,172 @@
Building and installing jemalloc can be as simple as typing the following while
in the root directory of the source tree:
    ./configure
    make
    make install
=== Advanced configuration =====================================================
The 'configure' script supports numerous options that allow control of which
functionality is enabled, where jemalloc is installed, etc. Optionally, pass
any of the following arguments (not a definitive list) to 'configure':
--help
Print a definitive list of options.
--prefix=<install-root-dir>
Set the base directory in which to install. For example:
./configure --prefix=/usr/local
will cause files to be installed into /usr/local/include, /usr/local/lib,
and /usr/local/man.
--with-rpath=<colon-separated-rpath>
Embed one or more library paths, so that jemalloc's shared library can
find the libraries it is linked to. This works only on ELF-based systems.
--enable-debug
Enable assertions and validation code. This incurs a substantial
performance hit, but is very useful during application development.
--enable-stats
Enable statistics gathering functionality. Use the 'P' option to print
detailed allocation statistics at exit, and/or the 'U' option to print a
detailed allocation trace log.
--disable-tiny
Disable tiny (sub-quantum-sized) object support. Technically it is not
legal for a malloc implementation to allocate objects with less than
quantum alignment (8 or 16 bytes, depending on architecture), but in
practice it never causes any problems if, for example, 4-byte allocations
are 4-byte-aligned.
--disable-mag
Disable thread-specific caches for sub-page-sized objects. Objects are
cached and released in bulk using "magazines" -- a term coined by the
developers of Solaris's umem allocator.
--disable-balance
Disable dynamic rebalancing of thread-->arena assignments.
--enable-dss
Enable support for page allocation/deallocation via sbrk(2), in addition to
mmap(2).
--enable-fill
Enable support for junk/zero filling of memory. Use the 'J' option to
control junk filling, or the 'Z' option to control zero filling.
--enable-xmalloc
Enable support for optional immediate termination due to out-of-memory
errors, as is commonly implemented by "xmalloc" wrapper functions for malloc.
Use the 'X' option to control termination behavior.
--enable-sysv
Enable support for System V semantics, wherein malloc(0) returns NULL
rather than a minimal allocation. Use the 'V' option to control System V
compatibility.
--enable-dynamic-page-shift
Under most conditions, the system page size never changes (usually 4KiB or
8KiB, depending on architecture and configuration), and unless this option
is enabled, jemalloc assumes that page size can safely be determined during
configuration and hard-coded. Enabling dynamic page size determination has
a measurable impact on performance, since the compiler is forced to load
the page size from memory rather than embedding immediate values.
--disable-lazy-lock
Disable code that wraps pthread_create() to detect when an application
switches from single-threaded to multi-threaded mode, so that it can avoid
mutex locking/unlocking operations while in single-threaded mode. In
practice, this feature usually has little impact on performance unless
magazines are disabled.
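
The lazy-lock behavior described under --disable-lazy-lock can be sketched
roughly as follows. This is a minimal illustration, not jemalloc's actual
code: the names isthreaded, lazy_mutex_lock(), lazy_mutex_unlock(), and
note_thread_creation() are hypothetical, and a real implementation would set
the flag from inside a wrapper around pthread_create() rather than through an
explicit call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical flag: false until the first thread is created. */
static bool isthreaded = false;

/* Counts real lock acquisitions, purely for illustration. */
static unsigned lock_ops = 0;

/* Lock only when the process is known to be multi-threaded. */
static void lazy_mutex_lock(pthread_mutex_t *mutex)
{
    if (isthreaded) {
        int err = pthread_mutex_lock(mutex);
        assert(err == 0);
        (void)err;
        lock_ops++;
    }
}

/* Unlock only when lazy_mutex_lock() would have locked. */
static void lazy_mutex_unlock(pthread_mutex_t *mutex)
{
    if (isthreaded) {
        int err = pthread_mutex_unlock(mutex);
        assert(err == 0);
        (void)err;
    }
}

/*
 * Stand-in for the work an interposed pthread_create() wrapper would do
 * before handing off to the real function (assumption, see above).
 */
static void note_thread_creation(void)
{
    isthreaded = true;
}
```

While the flag is false, the lock and unlock calls reduce to a single branch,
which is the saving the option description refers to.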
The following environment variables (not a definitive list) impact configure's
behavior:
CFLAGS="?"
Pass these flags to the compiler. You probably shouldn't define this unless
you know what you are doing. (Use EXTRA_CFLAGS instead.)
EXTRA_CFLAGS="?"
Append these flags to CFLAGS. This makes it possible to add flags such as
-Werror, while allowing the configure script to determine what other flags
are appropriate for the specified configuration.
The configure script specifically checks whether an optimization flag (-O*)
is specified in EXTRA_CFLAGS, and refrains from specifying an optimization
level if it finds that one has already been specified.
CPPFLAGS="?"
Pass these flags to the C preprocessor. Note that CFLAGS is not passed to
'cpp' when 'configure' is looking for include files, so you must use
CPPFLAGS instead if you need to help 'configure' find header files.
LD_LIBRARY_PATH="?"
'ld' uses this colon-separated list to find libraries.
LDFLAGS="?"
Pass these flags when linking.
PATH="?"
'configure' uses this to find programs.
=== Advanced compilation =======================================================
To run integrated regression tests, type:
make check
To clean up build results to varying degrees, use the following make targets:
clean
distclean
relclean
=== Advanced installation ======================================================
Optionally, define make variables when invoking make, including (not
exclusively):
INCLUDEDIR="?"
Use this as the installation prefix for header files.
LIBDIR="?"
Use this as the installation prefix for libraries.
MANDIR="?"
Use this as the installation prefix for man pages.
CC="?"
Use this to invoke the C compiler.
CFLAGS="?"
Pass these flags to the compiler.
CPPFLAGS="?"
Pass these flags to the C preprocessor.
LDFLAGS="?"
Pass these flags when linking.
PATH="?"
Use this to search for programs used during configuration and building.
=== Development ================================================================
If you intend to make non-trivial changes to jemalloc, use the 'autogen.sh'
script rather than 'configure'. This re-generates 'configure', enables
configuration dependency rules, and enables re-generation of automatically
generated source files.
The build system supports using an object directory separate from the source
tree. For example, you can create an 'obj' directory, and from within that
directory, issue configuration and build commands:
    autoconf
    mkdir obj
    cd obj
    ../configure --enable-autogen
    make


@ -11,10 +11,8 @@ SHELL := /bin/sh
CC := @CC@ CC := @CC@
# Configuration parameters. # Configuration parameters.
BINDIR := @BINDIR@
INCLUDEDIR := @INCLUDEDIR@ INCLUDEDIR := @INCLUDEDIR@
LIBDIR := @LIBDIR@ LIBDIR := @LIBDIR@
DATADIR := @DATADIR@
MANDIR := @MANDIR@ MANDIR := @MANDIR@
# Build parameters. # Build parameters.
@ -34,20 +32,20 @@ endif
REV := 0 REV := 0
# File lists. # File lists.
CHDRS := src/jemalloc.h CHDRS := @srcroot@src/jemalloc.h @objroot@src/jemalloc_defs.h
CSRCS := src/jemalloc.c CSRCS := @srcroot@src/jemalloc.c
DSO := lib/libjemalloc.so.$(REV) DSOS := @objroot@lib/libjemalloc.so.$(REV) @objroot@lib/libjemalloc.so
MAN3 := doc/jemalloc.3 MAN3 := @objroot@doc/jemalloc.3
.PHONY: all dist install check clean distclean relclean .PHONY: all dist install check clean distclean relclean
# Default target. # Default target.
all: $(DSO) all: $(DSOS)
src/%.o: src/%.c @objroot@src/%.o: @srcroot@src/%.c
$(CC) $(CFLAGS) -c $(CPPFLAGS) -o $@ $+ $(CC) $(CFLAGS) -c $(CPPFLAGS) -o $@ $+
$(DSO): $(CSRCS:%.c=%.o) $(DSOS): $(CSRCS:@srcroot@%.c=@objroot@%.o)
@mkdir -p $(@D) @mkdir -p $(@D)
gcc -shared -o $@ $+ $(LDFLAGS) $(LIBS) gcc -shared -o $@ $+ $(LDFLAGS) $(LIBS)
ln -sf libjemalloc.so.$(REV) lib/libjemalloc.so ln -sf libjemalloc.so.$(REV) lib/libjemalloc.so
@ -59,7 +57,10 @@ install:
install -m 644 $$h $(INCLUDEDIR); \ install -m 644 $$h $(INCLUDEDIR); \
done done
install -d $(LIBDIR) install -d $(LIBDIR)
install -m 755 $(DSO) $(LIBDIR) @for s in $(DSOS); do \
echo "install -m 755 $$s $(LIBDIR)"; \
install -m 755 $$s $(LIBDIR); \
done
install -d $(MANDIR) install -d $(MANDIR)
@for m in $(MAN3); do \ @for m in $(MAN3); do \
echo "install -m 644 $$m $(MANDIR)/man3"; \ echo "install -m 644 $$m $(MANDIR)/man3"; \
@ -69,9 +70,9 @@ done
check: check:
clean: clean:
rm -f src/*.o rm -f @objroot@src/*.o
rm -f lib/libjemalloc.so rm -f @objroot@lib/libjemalloc.so
rm -f lib/libjemalloc.so.$(REV) rm -f @objroot@lib/libjemalloc.so.$(REV)
distclean: clean distclean: clean
rm -f @objroot@config.log rm -f @objroot@config.log

jemalloc/README Normal file

@ -0,0 +1,4 @@
jemalloc is a general-purpose scalable concurrent malloc(3) implementation.
The INSTALL file contains information on how to configure, build, and install
jemalloc.


@ -41,7 +41,7 @@ MANDIR=`eval echo $mandir`
MANDIR=`eval echo $MANDIR` MANDIR=`eval echo $MANDIR`
AC_SUBST([MANDIR]) AC_SUBST([MANDIR])
cfgoutputs="Makefile" cfgoutputs="Makefile doc/jemalloc.3"
cfghdrs="src/jemalloc_defs.h" cfghdrs="src/jemalloc_defs.h"
dnl If CFLAGS isn't defined and using gcc, set CFLAGS to something reasonable. dnl If CFLAGS isn't defined and using gcc, set CFLAGS to something reasonable.
@ -219,6 +219,12 @@ if test "x$enable_stats" = "x1" ; then
AC_DEFINE([JEMALLOC_STATS], [ ]) AC_DEFINE([JEMALLOC_STATS], [ ])
fi fi
AC_SUBST([enable_stats]) AC_SUBST([enable_stats])
if test "x$enable_stats" = "x0" ; then
roff_stats=".\\\" "
else
roff_stats=""
fi
AC_SUBST([roff_stats])
dnl Enable tiny allocations by default. dnl Enable tiny allocations by default.
AC_ARG_ENABLE([tiny], AC_ARG_ENABLE([tiny],
@ -235,6 +241,15 @@ if test "x$enable_tiny" = "x1" ; then
AC_DEFINE([JEMALLOC_TINY], [ ]) AC_DEFINE([JEMALLOC_TINY], [ ])
fi fi
AC_SUBST([enable_tiny]) AC_SUBST([enable_tiny])
if test "x$enable_tiny" = "x0" ; then
roff_tiny=".\\\" "
roff_no_tiny=""
else
roff_tiny=""
roff_no_tiny=".\\\" "
fi
AC_SUBST([roff_tiny])
AC_SUBST([roff_no_tiny])
dnl Enable magazines by default. dnl Enable magazines by default.
AC_ARG_ENABLE([mag], AC_ARG_ENABLE([mag],
@ -251,6 +266,12 @@ if test "x$enable_mag" = "x1" ; then
AC_DEFINE([JEMALLOC_MAG], [ ]) AC_DEFINE([JEMALLOC_MAG], [ ])
fi fi
AC_SUBST([enable_mag]) AC_SUBST([enable_mag])
if test "x$enable_mag" = "x0" ; then
roff_mag=".\\\" "
else
roff_mag=""
fi
AC_SUBST([roff_mag])
dnl Enable dynamic arena load balancing by default. dnl Enable dynamic arena load balancing by default.
AC_ARG_ENABLE([balance], AC_ARG_ENABLE([balance],
@ -267,6 +288,12 @@ if test "x$enable_balance" = "x1" ; then
AC_DEFINE([JEMALLOC_BALANCE], [ ]) AC_DEFINE([JEMALLOC_BALANCE], [ ])
fi fi
AC_SUBST([enable_balance]) AC_SUBST([enable_balance])
if test "x$enable_balance" = "x0" ; then
roff_balance=".\\\" "
else
roff_balance=""
fi
AC_SUBST([roff_balance])
dnl Do not enable allocation from DSS by default. dnl Do not enable allocation from DSS by default.
AC_ARG_ENABLE([dss], AC_ARG_ENABLE([dss],
@ -283,6 +310,12 @@ if test "x$enable_dss" = "x1" ; then
AC_DEFINE([JEMALLOC_DSS], [ ]) AC_DEFINE([JEMALLOC_DSS], [ ])
fi fi
AC_SUBST([enable_dss]) AC_SUBST([enable_dss])
if test "x$enable_dss" = "x0" ; then
roff_dss=".\\\" "
else
roff_dss=""
fi
AC_SUBST([roff_dss])
dnl Do not support the junk/zero filling option by default. dnl Do not support the junk/zero filling option by default.
AC_ARG_ENABLE([fill], AC_ARG_ENABLE([fill],
@ -299,6 +332,12 @@ if test "x$enable_fill" = "x1" ; then
AC_DEFINE([JEMALLOC_FILL], [ ]) AC_DEFINE([JEMALLOC_FILL], [ ])
fi fi
AC_SUBST([enable_fill]) AC_SUBST([enable_fill])
if test "x$enable_fill" = "x0" ; then
roff_fill=".\\\" "
else
roff_fill=""
fi
AC_SUBST([roff_fill])
dnl Do not support the xmalloc option by default. dnl Do not support the xmalloc option by default.
AC_ARG_ENABLE([xmalloc], AC_ARG_ENABLE([xmalloc],
@ -315,6 +354,12 @@ if test "x$enable_xmalloc" = "x1" ; then
AC_DEFINE([JEMALLOC_XMALLOC], [ ]) AC_DEFINE([JEMALLOC_XMALLOC], [ ])
fi fi
AC_SUBST([enable_xmalloc]) AC_SUBST([enable_xmalloc])
if test "x$enable_xmalloc" = "x0" ; then
roff_xmalloc=".\\\" "
else
roff_xmalloc=""
fi
AC_SUBST([roff_xmalloc])
dnl Do not support the SYSV option by default. dnl Do not support the SYSV option by default.
AC_ARG_ENABLE([sysv], AC_ARG_ENABLE([sysv],
@ -331,6 +376,12 @@ if test "x$enable_sysv" = "x1" ; then
AC_DEFINE([JEMALLOC_SYSV], [ ]) AC_DEFINE([JEMALLOC_SYSV], [ ])
fi fi
AC_SUBST([enable_sysv]) AC_SUBST([enable_sysv])
if test "x$enable_sysv" = "x0" ; then
roff_sysv=".\\\" "
else
roff_sysv=""
fi
AC_SUBST([roff_sysv])
dnl Do not determine page shift at run time by default. dnl Do not determine page shift at run time by default.
AC_ARG_ENABLE([dynamic_page_shift], AC_ARG_ENABLE([dynamic_page_shift],
@ -380,6 +431,7 @@ dnl ============================================================================
dnl jemalloc configuration. dnl jemalloc configuration.
dnl dnl
jemalloc_version=`cat ${srcroot}VERSION` jemalloc_version=`cat ${srcroot}VERSION`
AC_DEFINE_UNQUOTED([JEMALLOC_VERSION], ["$jemalloc_version"])
AC_SUBST([jemalloc_version]) AC_SUBST([jemalloc_version])
dnl ============================================================================ dnl ============================================================================
@ -400,21 +452,24 @@ AC_RUN_IFELSE([AC_LANG_PROGRAM(
return 0; return 0;
]])], ]])],
AC_MSG_RESULT([yes]), AC_MSG_RESULT([yes])
roff_tls="",
AC_MSG_RESULT([no]) AC_MSG_RESULT([no])
roff_tls=".\\\" "
AC_DEFINE_UNQUOTED([NO_TLS], [ ])) AC_DEFINE_UNQUOTED([NO_TLS], [ ]))
AC_SUBST([roff_tls])
dnl Do not enable lazy locking by default. dnl Enable lazy locking by default.
AC_ARG_ENABLE([lazy_lock], AC_ARG_ENABLE([lazy_lock],
[AS_HELP_STRING([--enable-lazy-lock], [AS_HELP_STRING([--enable-lazy-lock],
[Enable lazy locking (avoid locking unless multiple threads)])], [Disable lazy locking (always lock, even when single-threaded)])],
[if test "x$enable_lazy_lock" = "xno" ; then [if test "x$enable_lazy_lock" = "xno" ; then
enable_lazy_lock="0" enable_lazy_lock="0"
else else
enable_lazy_lock="1" enable_lazy_lock="1"
fi fi
], ],
[enable_lazy_lock="0"] [enable_lazy_lock="1"]
) )
if test "x$enable_lazy_lock" = "x1" ; then if test "x$enable_lazy_lock" = "x1" ; then
AC_CHECK_HEADERS([dlfcn.h], , [AC_MSG_ERROR([dlfcn.h is missing])]) AC_CHECK_HEADERS([dlfcn.h], , [AC_MSG_ERROR([dlfcn.h is missing])])
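
The roff_* variables added above appear to work as line prefixes: when a
feature is disabled, the variable holds the roff comment introducer (a period,
a backslash, a double quote, and a space), so every man page line that starts
with the matching @roff_...@ token is commented out once configure substitutes
it. The following is a small sketch of that substitution step in C, assuming a
plain token-for-value replacement; substitute() is a hypothetical helper and
is not part of autoconf or jemalloc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy line into out, replacing every occurrence of token with value.
 * Returns 0 on success, or -1 if out (of size outsz) is too small.
 */
static int substitute(const char *line, const char *token, const char *value,
                      char *out, size_t outsz)
{
    size_t toklen = strlen(token);
    size_t vallen = strlen(value);
    size_t used = 0;

    assert(toklen > 0);
    if (outsz == 0)
        return -1;

    while (*line != '\0') {
        if (strncmp(line, token, toklen) == 0) {
            /* Leave room for the terminating NUL. */
            if (used + vallen >= outsz)
                return -1;
            memcpy(out + used, value, vallen);
            used += vallen;
            line += toklen;
        } else {
            if (used + 1 >= outsz)
                return -1;
            out[used++] = *line++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

With an empty value the line is emitted unchanged apart from the removed
token, which corresponds to the enabled case in the configure logic above.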


@ -1,5 +1,5 @@
.\" Copyright (c) 2006-2008 Jason Evans <jasone@canonware.com>.
.\" Copyright (c) 2009 Facebook, Inc. All rights reserved. .\" Copyright (c) 2009 Facebook, Inc. All rights reserved.
.\" Copyright (c) 2006-2008 Jason Evans <jasone@canonware.com>.
.\" All rights reserved. .\" All rights reserved.
.\" Copyright (c) 1980, 1991, 1993 .\" Copyright (c) 1980, 1991, 1993
.\" The Regents of the University of California. All rights reserved. .\" The Regents of the University of California. All rights reserved.
@ -42,7 +42,7 @@
.Nm malloc , calloc , posix_memalign , realloc , free , malloc_usable_size .Nm malloc , calloc , posix_memalign , realloc , free , malloc_usable_size
.Nd general purpose memory allocation functions .Nd general purpose memory allocation functions
.Sh LIBRARY .Sh LIBRARY
.Lb libc .Lb libjemalloc
.Sh SYNOPSIS .Sh SYNOPSIS
.In stdlib.h .In stdlib.h
.Ft void * .Ft void *
@ -55,22 +55,23 @@
.Fn realloc "void *ptr" "size_t size" .Fn realloc "void *ptr" "size_t size"
.Ft void .Ft void
.Fn free "void *ptr" .Fn free "void *ptr"
.In jemalloc.h
.Ft size_t
.Fn malloc_usable_size "const void *ptr"
.Ft const char * .Ft const char *
.Va jemalloc_options ; .Va jemalloc_options ;
.Ft void .Ft void
.Fo \*(lp*jemalloc_message\*(rp .Fo \*(lp*jemalloc_message\*(rp
.Fa "const char *p1" "const char *p2" "const char *p3" "const char *p4" .Fa "const char *p1" "const char *p2" "const char *p3" "const char *p4"
.Fc .Fc
.In malloc_np.h
.Ft size_t
.Fn malloc_usable_size "const void *ptr"
.Sh DESCRIPTION .Sh DESCRIPTION
The The
.Fn malloc .Fn malloc
function allocates function allocates
.Fa size .Fa size
bytes of uninitialized memory. bytes of uninitialized memory.
The allocated space is suitably aligned (after possible pointer coercion) The allocated space is suitably aligned
@roff_tiny@(after possible pointer coercion)
for storage of any type of object. for storage of any type of object.
.Pp .Pp
The The
@ -187,31 +188,32 @@ flags being set) become fatal.
The process will call The process will call
.Xr abort 3 .Xr abort 3
in these cases. in these cases.
.It B @roff_balance@@roff_tls@.It B
Double/halve the per-arena lock contention threshold at which a thread is @roff_balance@@roff_tls@Double/halve the per-arena lock contention threshold at
randomly re-assigned to an arena. @roff_balance@@roff_tls@which a thread is randomly re-assigned to an arena.
This dynamic load balancing tends to push threads away from highly contended @roff_balance@@roff_tls@This dynamic load balancing tends to push threads away
arenas, which avoids worst case contention scenarios in which threads @roff_balance@@roff_tls@from highly contended arenas, which avoids worst case
disproportionately utilize arenas. @roff_balance@@roff_tls@contention scenarios in which threads disproportionately
However, due to the highly dynamic load that applications may place on the @roff_balance@@roff_tls@utilize arenas.
allocator, it is impossible for the allocator to know in advance how sensitive @roff_balance@@roff_tls@However, due to the highly dynamic load that
it should be to contention over arenas. @roff_balance@@roff_tls@applications may place on the allocator, it is
Therefore, some applications may benefit from increasing or decreasing this @roff_balance@@roff_tls@impossible for the allocator to know in advance how
threshold parameter. @roff_balance@@roff_tls@sensitive it should be to contention over arenas.
This option is not available for some configurations (non-PIC). @roff_balance@@roff_tls@Therefore, some applications may benefit from increasing
@roff_balance@@roff_tls@or decreasing this threshold parameter.
.It C .It C
Double/halve the size of the maximum size class that is a multiple of the Double/halve the size of the maximum size class that is a multiple of the
cacheline size (64). cacheline size (64).
Above this size, subpage spacing (256 bytes) is used for size classes. Above this size, subpage spacing (256 bytes) is used for size classes.
The default value is 512 bytes. The default value is 512 bytes.
.It D @roff_dss@.It D
Use @roff_dss@Use
.Xr sbrk 2 @roff_dss@.Xr sbrk 2
to acquire memory in the data storage segment (DSS). @roff_dss@to acquire memory in the data storage segment (DSS).
This option is enabled by default. @roff_dss@This option is enabled by default.
See the @roff_dss@See the
.Dq M @roff_dss@.Dq M
option for related information and interactions. @roff_dss@option for related information and interactions.
.It F .It F
Double/halve the per-arena maximum number of dirty unused pages that are Double/halve the per-arena maximum number of dirty unused pages that are
allowed to accumulate before informing the kernel about at least half of those allowed to accumulate before informing the kernel about at least half of those
@ -222,46 +224,48 @@ physical memory becomes scarce and the pages remain unused.
The default is 512 pages per arena; The default is 512 pages per arena;
.Ev JEMALLOC_OPTIONS=10f .Ev JEMALLOC_OPTIONS=10f
will prevent any dirty unused pages from accumulating. will prevent any dirty unused pages from accumulating.
.It G @roff_mag@@roff_tls@.It G
When there are multiple threads, use thread-specific caching for objects that @roff_mag@@roff_tls@When there are multiple threads, use thread-specific caching
are smaller than one page. @roff_mag@@roff_tls@for objects that are smaller than one page.
This option is enabled by default. @roff_mag@@roff_tls@This option is enabled by default.
Thread-specific caching allows many allocations to be satisfied without @roff_mag@@roff_tls@Thread-specific caching allows many allocations to be
performing any thread synchronization, at the cost of increased memory use. @roff_mag@@roff_tls@satisfied without performing any thread synchronization, at
See the @roff_mag@@roff_tls@the cost of increased memory use.
.Dq R @roff_mag@@roff_tls@See the
option for related tuning information. @roff_mag@@roff_tls@.Dq R
This option is not available for some configurations (non-PIC). @roff_mag@@roff_tls@option for related tuning information.
.It J @roff_fill@.It J
Each byte of new memory allocated by @roff_fill@Each byte of new memory allocated by
.Fn malloc @roff_fill@.Fn malloc
or @roff_fill@or
.Fn realloc @roff_fill@.Fn realloc
will be initialized to 0xa5. @roff_fill@will be initialized to 0xa5.
All memory returned by @roff_fill@All memory returned by
.Fn free @roff_fill@.Fn free
or @roff_fill@or
.Fn realloc @roff_fill@.Fn realloc
will be initialized to 0x5a. @roff_fill@will be initialized to 0x5a.
This is intended for debugging and will impact performance negatively. @roff_fill@This is intended for debugging and will impact performance
@roff_fill@negatively.
.It K .It K
Double/halve the virtual memory chunk size. Double/halve the virtual memory chunk size.
The default chunk size is 1 MB. The default chunk size is 1 MB.
.It M @roff_dss@.It M
Use @roff_dss@Use
.Xr mmap 2 @roff_dss@.Xr mmap 2
to acquire anonymously mapped memory. @roff_dss@to acquire anonymously mapped memory.
This option is enabled by default. @roff_dss@This option is enabled by default.
If both the @roff_dss@If both the
.Dq D @roff_dss@.Dq D
and @roff_dss@and
.Dq M @roff_dss@.Dq M
options are enabled, the allocator prefers the DSS over anonymous mappings, @roff_dss@options are enabled, the allocator prefers the DSS over anonymous
but allocation only fails if memory cannot be acquired via either method. @roff_dss@mappings, but allocation only fails if memory cannot be acquired via
If neither option is enabled, then the @roff_dss@either method.
.Dq M @roff_dss@If neither option is enabled, then the
option is implicitly enabled in order to assure that there is a method for @roff_dss@.Dq M
acquiring memory. @roff_dss@option is implicitly enabled in order to assure that there is a method
@roff_dss@for acquiring memory.
.It N .It N
Double/halve the number of arenas. Double/halve the number of arenas.
The default number of arenas is two times the number of CPUs, or one if there The default number of arenas is two times the number of CPUs, or one if there
@ -279,88 +283,70 @@ Double/halve the size of the maximum size class that is a multiple of the
quantum (8 or 16 bytes, depending on architecture). quantum (8 or 16 bytes, depending on architecture).
Above this size, cacheline spacing is used for size classes. Above this size, cacheline spacing is used for size classes.
The default value is 128 bytes. The default value is 128 bytes.
.It R @roff_mag@@roff_tls@.It R
Double/halve magazine size, which approximately doubles/halves the number of @roff_mag@@roff_tls@Double/halve magazine size, which approximately
rounds in each magazine. @roff_mag@@roff_tls@doubles/halves the number of rounds in each magazine.
Magazines are used by the thread-specific caching machinery to acquire and @roff_mag@@roff_tls@Magazines are used by the thread-specific caching machinery
release objects in bulk. @roff_mag@@roff_tls@to acquire and release objects in bulk.
Increasing the magazine size decreases locking overhead, at the expense of @roff_mag@@roff_tls@Increasing the magazine size decreases locking overhead, at
increased memory usage. @roff_mag@@roff_tls@the expense of increased memory usage.
This option is not available for some configurations (non-PIC). @roff_stats@.It U
.It U @roff_stats@Generate a verbose trace log via
Generate @roff_stats@.Fn jemalloc_message
.Dq utrace @roff_stats@for all allocation operations.
entries for @roff_sysv@.It V
.Xr ktrace 1 , @roff_sysv@Attempting to allocate zero bytes will return a
for all operations. @roff_sysv@.Dv NULL
Consult the source for details on this option. @roff_sysv@pointer instead of a valid pointer.
.It V @roff_sysv@(The default behavior is to make a minimal allocation and return a
Attempting to allocate zero bytes will return a @roff_sysv@pointer to it.)
.Dv NULL @roff_sysv@This option is provided for System V compatibility.
pointer instead of @roff_sysv@@roff_xmalloc@This option is incompatible with the
a valid pointer. @roff_sysv@@roff_xmalloc@.Dq X
(The default behavior is to make a minimal allocation and return a @roff_sysv@@roff_xmalloc@option.
pointer to it.) @roff_xmalloc@.It X
This option is provided for System V compatibility. @roff_xmalloc@Rather than return failure for any allocation function, display a
This option is incompatible with the @roff_xmalloc@diagnostic message on
.Dq X @roff_xmalloc@.Dv stderr
option. @roff_xmalloc@and cause the program to drop core (using
.It X @roff_xmalloc@.Xr abort 3 ) .
Rather than return failure for any allocation function, @roff_xmalloc@This option should be set at compile time by including the
display a diagnostic message on @roff_xmalloc@following in the source code:
.Dv stderr @roff_xmalloc@.Bd -literal -offset indent
and cause the program to drop @roff_xmalloc@jemalloc_options = "X";
core (using @roff_xmalloc@.Ed
.Xr abort 3 ) . @roff_fill@.It Z
This option should be set at compile time by including the following in @roff_fill@Each byte of new memory allocated by
the source code: @roff_fill@.Fn malloc
.Bd -literal -offset indent @roff_fill@or
jemalloc_options = "X"; @roff_fill@.Fn realloc
.Ed @roff_fill@will be initialized to 0.
.It Z @roff_fill@Note that this initialization only happens once for each byte, so
Each byte of new memory allocated by @roff_fill@.Fn realloc
.Fn malloc @roff_fill@calls do not zero memory that was previously allocated.
or @roff_fill@This is intended for debugging and will impact performance
.Fn realloc @roff_fill@negatively.
will be initialized to 0.
Note that this initialization only happens once for each byte, so
.Fn realloc
calls do not zero memory that was previously allocated.
This is intended for debugging and will impact performance negatively.
.El .El
.Pp .Pp
The @roff_fill@The
.Dq J @roff_fill@.Dq J
and @roff_fill@and
.Dq Z @roff_fill@.Dq Z
options are intended for testing and debugging. @roff_fill@options are intended for testing and debugging.
An application which changes its behavior when these options are used @roff_fill@An application which changes its behavior when these options are used
is flawed. @roff_fill@is flawed.
.Sh IMPLEMENTATION NOTES .Sh IMPLEMENTATION NOTES
Traditionally, allocators have used @roff_dss@Traditionally, allocators have used
.Xr sbrk 2 @roff_dss@.Xr sbrk 2
to obtain memory, which is suboptimal for several reasons, including race @roff_dss@to obtain memory, which is suboptimal for several reasons, including
conditions, increased fragmentation, and artificial limitations on maximum @roff_dss@race conditions, increased fragmentation, and artificial limitations
usable memory. @roff_dss@on maximum usable memory.
This allocator uses both @roff_dss@This allocator uses both
.Xr sbrk 2 @roff_dss@.Xr sbrk 2
and @roff_dss@and
.Xr mmap 2 @roff_dss@.Xr mmap 2
by default, but it can be configured at run time to use only one or the other. @roff_dss@by default, but it can be configured at run time to use only one or
If resource limits are not a primary concern, the preferred configuration is @roff_dss@the other.
.Ev JEMALLOC_OPTIONS=dM
or
.Ev JEMALLOC_OPTIONS=DM .
When so configured, the
.Ar datasize
resource limit has little practical effect for typical applications; use
.Ev JEMALLOC_OPTIONS=Dm
if that is a concern.
Regardless of allocator configuration, the
.Ar vmemoryuse
resource limit can be used to bound the total virtual memory used by a
process, as described in
.Xr limits 1 .
.Pp .Pp
This allocator uses multiple arenas in order to reduce lock contention for This allocator uses multiple arenas in order to reduce lock contention for
threaded programs on multi-processor systems. threaded programs on multi-processor systems.
@ -375,13 +361,14 @@ improve performance, mainly due to reduced cache performance.
However, it may make sense to reduce the number of arenas if an application However, it may make sense to reduce the number of arenas if an application
does not make much use of the allocation functions. does not make much use of the allocation functions.
.Pp .Pp
In addition to multiple arenas, this allocator supports thread-specific @roff_mag@In addition to multiple arenas, this allocator supports
caching for small objects (smaller than one page), in order to make it @roff_mag@thread-specific caching for small objects (smaller than one page), in
possible to completely avoid synchronization for most small allocation requests. @roff_mag@order to make it possible to completely avoid synchronization for most
Such caching allows very fast allocation in the common case, but it increases @roff_mag@small allocation requests.
memory usage and fragmentation, since a bounded number of objects can remain @roff_mag@Such caching allows very fast allocation in the common case, but it
allocated in each thread cache. @roff_mag@increases memory usage and fragmentation, since a bounded number of
.Pp @roff_mag@objects can remain allocated in each thread cache.
@roff_mag@.Pp
Memory is conceptually broken into equal-sized chunks, where the chunk size is Memory is conceptually broken into equal-sized chunks, where the chunk size is
a power of two that is greater than the page size. a power of two that is greater than the page size.
Chunks are always aligned to multiples of the chunk size. Chunks are always aligned to multiples of the chunk size.
@ -406,12 +393,16 @@ determine all metadata regarding small and large allocations in constant time.
.Pp .Pp
Small objects are managed in groups by page runs. Small objects are managed in groups by page runs.
Each run maintains a bitmap that tracks which regions are in use. Each run maintains a bitmap that tracks which regions are in use.
Allocation requests that are no more than half the quantum (8 or 16, depending @roff_tiny@Allocation requests that are no more than half the quantum (8 or 16,
on architecture) are rounded up to the nearest power of two. @roff_tiny@depending on architecture) are rounded up to the nearest power of
Allocation requests that are more than half the quantum, but no more than the @roff_tiny@two.
minimum cacheline-multiple size class (see the Allocation requests that are
@roff_tiny@more than half the quantum, but
no more than the minimum cacheline-multiple size class (see the
.Dq Q .Dq Q
option) are rounded up to the nearest multiple of the quantum. option) are rounded up to the nearest multiple of the
@roff_tiny@quantum.
@roff_no_tiny@quantum (8 or 16, depending on architecture).
Allocation requests that are more than the minimum cacheline-multiple size Allocation requests that are more than the minimum cacheline-multiple size
class, but no more than the minimum subpage-multiple size class (see the class, but no more than the minimum subpage-multiple size class (see the
.Dq C .Dq C
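
The rounding rules in the hunk above can be illustrated with a short C sketch.
It assumes a 16-byte quantum, the default limits quoted in the man page (128
bytes for the "Q" option, 512 bytes for the "C" option), 64-byte cacheline
spacing, and 256-byte subpage spacing. The function and macro names are
hypothetical, any minimum tiny size the implementation may enforce is ignored,
and the upper bound of subpage spacing is not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define QUANTUM    ((size_t)16)   /* assumed; 8 on some architectures */
#define CACHELINE  ((size_t)64)   /* cacheline spacing */
#define SUBPAGE    ((size_t)256)  /* subpage spacing */
#define QSPACE_MAX ((size_t)128)  /* default limit set by the "Q" option */
#define CSPACE_MAX ((size_t)512)  /* default limit set by the "C" option */

/* Smallest power of two that is >= x (x must be at least 1). */
static size_t pow2_ceil(size_t x)
{
    size_t p = 1;

    while (p < x)
        p <<= 1;
    return p;
}

/* Round size up to the nearest multiple of multiple. */
static size_t round_up(size_t size, size_t multiple)
{
    return ((size + multiple - 1) / multiple) * multiple;
}

/* Map a small request size to the size class it would be rounded up to. */
static size_t small_size_class(size_t size)
{
    assert(size > 0);
    if (size <= QUANTUM / 2)
        return pow2_ceil(size);            /* tiny: power of two */
    if (size <= QSPACE_MAX)
        return round_up(size, QUANTUM);    /* quantum-spaced */
    if (size <= CSPACE_MAX)
        return round_up(size, CACHELINE);  /* cacheline-spaced */
    return round_up(size, SUBPAGE);        /* subpage-spaced */
}
```

When tiny objects are disabled, the first branch would not apply and every
request up to the quantum-spaced limit would be rounded to a multiple of the
quantum.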
@ -440,26 +431,26 @@ rather than the normal policy of trying to continue if at all possible.
It is probably also a good idea to recompile the program with suitable It is probably also a good idea to recompile the program with suitable
options and symbols for debugger support. options and symbols for debugger support.
.Pp .Pp
If the program starts to give unusual results, coredump or generally behave @roff_fill@If the program starts to give unusual results, coredump or generally
differently without emitting any of the messages mentioned in the next @roff_fill@behave differently without emitting any of the messages mentioned in
section, it is likely because it depends on the storage being filled with @roff_fill@the next section, it is likely because it depends on the storage
zero bytes. @roff_fill@being filled with zero bytes.
Try running it with the @roff_fill@Try running it with the
.Dq Z @roff_fill@.Dq Z
option set; @roff_fill@option set;
if that improves the situation, this diagnosis has been confirmed. @roff_fill@if that improves the situation, this diagnosis has been confirmed.
If the program still misbehaves, @roff_fill@If the program still misbehaves,
the likely problem is accessing memory outside the allocated area. @roff_fill@the likely problem is accessing memory outside the allocated area.
.Pp @roff_fill@.Pp
Alternatively, if the symptoms are not easy to reproduce, setting the @roff_fill@Alternatively, if the symptoms are not easy to reproduce, setting the
.Dq J @roff_fill@.Dq J
option may help provoke the problem. @roff_fill@option may help provoke the problem.
.Pp @roff_fill@.Pp
In truly difficult cases, the @roff_stats@In truly difficult cases, the
.Dq U @roff_stats@.Dq U
option, if supported by the kernel, can provide a detailed trace of @roff_stats@option can provide a detailed trace of all calls made to these
all calls made to these functions. @roff_stats@functions.
.Pp @roff_stats@.Pp
Unfortunately this implementation does not provide much detail about Unfortunately this implementation does not provide much detail about
the problems it detects; the performance impact for storing such information the problems it detects; the performance impact for storing such information
would be prohibitive. would be prohibitive.
@ -476,7 +467,7 @@ If the
option is set, all warnings are treated as errors. option is set, all warnings are treated as errors.
.Pp .Pp
The The
.Va _malloc_message .Va jemalloc_message
variable allows the programmer to override the function which emits variable allows the programmer to override the function which emits
the text strings forming the errors and warnings if for some reason the text strings forming the errors and warnings if for some reason
the the
@ -486,7 +477,7 @@ Please note that doing anything which tries to allocate memory in
this function is likely to result in a crash or deadlock. this function is likely to result in a crash or deadlock.
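A replacement message function might look like the sketch below. The four-string signature matches the jemalloc_message declaration in jemalloc.h later in this diff. Everything else is an assumption for illustration: the pointer is redeclared locally as a stand-in so the sketch compiles without the library, and `quiet_message`, `install_message_hook`, and `message_calls` are hypothetical names. The function uses write(2) directly so that it never allocates memory.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Stand-in for the pointer declared in jemalloc.h (assumption: declared
 * locally here so that the sketch builds without linking the library).
 */
static void (*jemalloc_message)(const char *p1, const char *p2,
    const char *p3, const char *p4);

/* Hypothetical counter, present only so the behaviour can be observed. */
static unsigned message_calls;

/* Emit the four fragments with write(2); no allocation, no stdio. */
static void
quiet_message(const char *p1, const char *p2, const char *p3, const char *p4)
{
	const char *parts[4] = {p1, p2, p3, p4};
	int i;

	assert(p1 != NULL && p2 != NULL && p3 != NULL && p4 != NULL);
	for (i = 0; i < 4; i++) {
		size_t len = strlen(parts[i]);
		ssize_t n;

		if (len == 0)
			continue;
		n = write(STDERR_FILENO, parts[i], len);
		(void)n;	/* Nothing useful can be done on failure. */
	}
	message_calls++;
}

/* Install the override; ideally done before the first allocation. */
static void
install_message_hook(void)
{
	jemalloc_message = quiet_message;
}
```

In a real program the local stand-in would be dropped in favour of including jemalloc.h and assigning to the pointer it declares.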
.Pp .Pp
All messages are prefixed by All messages are prefixed by
.Dq Ao Ar progname Ac Ns Li : (malloc) . .Dq <jemalloc>: .
.Sh RETURN VALUES .Sh RETURN VALUES
The The
.Fn malloc .Fn malloc
@ -564,15 +555,12 @@ on calls to these functions:
jemalloc_options = "X"; jemalloc_options = "X";
.Ed .Ed
.Sh SEE ALSO .Sh SEE ALSO
.Xr limits 1 ,
.Xr madvise 2 , .Xr madvise 2 ,
.Xr mmap 2 , .Xr mmap 2 ,
.Xr sbrk 2 , .Xr sbrk 2 ,
.Xr alloca 3 , .Xr alloca 3 ,
.Xr atexit 3 , .Xr atexit 3 ,
.Xr getpagesize 3 , .Xr getpagesize 3
.Xr memory 3 ,
.Xr posix_memalign 3
.Sh STANDARDS .Sh STANDARDS
The The
.Fn malloc , .Fn malloc ,


@ -1178,8 +1178,8 @@ static bool size2bin_init_hard(void);
static unsigned malloc_ncpus(void); static unsigned malloc_ncpus(void);
static bool malloc_init_hard(void); static bool malloc_init_hard(void);
static void thread_cleanup(void *arg); static void thread_cleanup(void *arg);
void jemalloc_prefork(void); static void jemalloc_prefork(void);
void jemalloc_postfork(void); static void jemalloc_postfork(void);
/* /*
* End function prototypes. * End function prototypes.
@ -1231,9 +1231,10 @@ umax2s(uintmax_t x, char *s)
# define assert(e) do { \ # define assert(e) do { \
if (!(e)) { \ if (!(e)) { \
char line_buf[UMAX2S_BUFSIZE]; \ char line_buf[UMAX2S_BUFSIZE]; \
jemalloc_message(__FILE__, ":", umax2s(__LINE__, \ jemalloc_message("<jemalloc>: ", __FILE__, ":", \
line_buf), ": Failed assertion: "); \ umax2s(__LINE__, line_buf)); \
jemalloc_message("\"", #e, "\"\n", ""); \ jemalloc_message(": Failed assertion: ", "\"", #e, \
"\"\n"); \
abort(); \ abort(); \
} \ } \
} while (0) } while (0)
@ -1250,15 +1251,17 @@ utrace(const void *addr, size_t len)
assert(len == sizeof(malloc_utrace_t)); assert(len == sizeof(malloc_utrace_t));
if (ut->p == NULL && ut->s == 0 && ut->r == NULL) if (ut->p == NULL && ut->s == 0 && ut->r == NULL)
malloc_printf("%d x USER malloc_init()\n", getpid()); malloc_printf("<jemalloc>:utrace: %d malloc_init()\n",
getpid());
else if (ut->p == NULL && ut->r != NULL) { else if (ut->p == NULL && ut->r != NULL) {
malloc_printf("%d x USER %p = malloc(%zu)\n", getpid(), ut->r, malloc_printf("<jemalloc>:utrace: %d %p = malloc(%zu)\n",
ut->s); getpid(), ut->r, ut->s);
} else if (ut->p != NULL && ut->r != NULL) { } else if (ut->p != NULL && ut->r != NULL) {
malloc_printf("%d x USER %p = realloc(%p, %zu)\n", getpid(), malloc_printf("<jemalloc>:utrace: %d %p = realloc(%p, %zu)\n",
ut->r, ut->p, ut->s); getpid(), ut->r, ut->p, ut->s);
} else } else
malloc_printf("%d x USER free(%p)\n", getpid(), ut->p); malloc_printf("<jemalloc>:utrace: %d free(%p)\n", getpid(),
ut->p);
return (0); return (0);
} }
@ -2247,11 +2250,6 @@ choose_arena(void)
* introduces a bootstrapping issue. * introduces a bootstrapping issue.
*/ */
#ifndef NO_TLS #ifndef NO_TLS
if (isthreaded == false) {
/* Avoid the overhead of TLS for single-threaded operation. */
return (arenas[0]);
}
ret = arenas_map; ret = arenas_map;
if (ret == NULL) { if (ret == NULL) {
ret = choose_arena_hard(); ret = choose_arena_hard();
@ -3405,11 +3403,9 @@ arena_malloc_large(arena_t *arena, size_t size, bool zero)
} }
static inline void * static inline void *
arena_malloc(arena_t *arena, size_t size, bool zero) arena_malloc(size_t size, bool zero)
{ {
assert(arena != NULL);
assert(arena->magic == ARENA_MAGIC);
assert(size != 0); assert(size != 0);
assert(QUANTUM_CEILING(size) <= arena_maxclass); assert(QUANTUM_CEILING(size) <= arena_maxclass);
@ -3418,7 +3414,7 @@ arena_malloc(arena_t *arena, size_t size, bool zero)
if (opt_mag) { if (opt_mag) {
mag_rack_t *rack = mag_rack; mag_rack_t *rack = mag_rack;
if (rack == NULL) { if (rack == NULL) {
rack = mag_rack_create(arena); rack = mag_rack_create(choose_arena());
if (rack == NULL) if (rack == NULL)
return (NULL); return (NULL);
mag_rack = rack; mag_rack = rack;
@ -3427,9 +3423,9 @@ arena_malloc(arena_t *arena, size_t size, bool zero)
return (mag_rack_alloc(rack, size, zero)); return (mag_rack_alloc(rack, size, zero));
} else } else
#endif #endif
return (arena_malloc_small(arena, size, zero)); return (arena_malloc_small(choose_arena(), size, zero));
} else } else
return (arena_malloc_large(arena, size, zero)); return (arena_malloc_large(choose_arena(), size, zero));
} }
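The effect of dropping the arena parameter can be shown with a toy model. In the hunk above, the magazine path only needs an arena when a rack must be created, so choosing the arena inside arena_malloc() lets cached allocations skip the lookup. The sketch below is not jemalloc code: every name is hypothetical, a one-slot cache stands in for the magazine rack, and all objects share one fixed size to keep the cache trivially safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_OBJ_SIZE ((size_t)64)	/* Assumption: one fixed object size. */

typedef struct { int unused; } toy_arena_t;

static toy_arena_t toy_arena0;
static unsigned choose_calls;	/* Counts how often the arena is looked up. */
static void *cache_slot;	/* One-object stand-in for a magazine rack. */

/* Stand-in for choose_arena(): the cost we want off the fast path. */
static toy_arena_t *
toy_choose_arena(void)
{
	choose_calls++;
	return (&toy_arena0);
}

/* Slow path: allocate from the arena (here simply malloc). */
static void *
toy_arena_alloc(toy_arena_t *arena, size_t size)
{
	(void)arena;
	(void)size;
	return (malloc(TOY_OBJ_SIZE));
}

/* Return an object to the cache; free it if the slot is occupied. */
static void
toy_cache_put(void *ptr)
{
	if (cache_slot == NULL)
		cache_slot = ptr;
	else
		free(ptr);
}

/* No arena argument: the arena is chosen only when the cache misses. */
static void *
toy_malloc(size_t size)
{
	assert(size != 0 && size <= TOY_OBJ_SIZE);
	if (cache_slot != NULL) {
		void *ret = cache_slot;

		cache_slot = NULL;
		return (ret);	/* Fast path: no arena lookup at all. */
	}
	return (toy_arena_alloc(toy_choose_arena(), size));
}
```

Allocating, returning the object to the cache, and allocating again leaves `choose_calls` at 1, which is the saving the new arena_malloc() signature is after.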
static inline void * static inline void *
@ -3439,7 +3435,7 @@ imalloc(size_t size)
assert(size != 0); assert(size != 0);
if (size <= arena_maxclass) if (size <= arena_maxclass)
return (arena_malloc(choose_arena(), size, false)); return (arena_malloc(size, false));
else else
return (huge_malloc(size, false)); return (huge_malloc(size, false));
} }
@ -3449,7 +3445,7 @@ icalloc(size_t size)
{ {
if (size <= arena_maxclass) if (size <= arena_maxclass)
return (arena_malloc(choose_arena(), size, true)); return (arena_malloc(size, true));
else else
return (huge_malloc(size, true)); return (huge_malloc(size, true));
} }
@ -3553,7 +3549,7 @@ ipalloc(size_t alignment, size_t size)
if (ceil_size <= PAGE_SIZE || (alignment <= PAGE_SIZE if (ceil_size <= PAGE_SIZE || (alignment <= PAGE_SIZE
&& ceil_size <= arena_maxclass)) && ceil_size <= arena_maxclass))
ret = arena_malloc(choose_arena(), ceil_size, false); ret = arena_malloc(ceil_size, false);
else { else {
size_t run_size; size_t run_size;
@ -4113,7 +4109,7 @@ arena_ralloc(void *ptr, size_t size, size_t oldsize)
* need to move the object. In that case, fall back to allocating new * need to move the object. In that case, fall back to allocating new
* space and copying. * space and copying.
*/ */
ret = arena_malloc(choose_arena(), size, false); ret = arena_malloc(size, false);
if (ret == NULL) if (ret == NULL)
return (NULL); return (NULL);
@ -5725,7 +5721,7 @@ thread_cleanup(void *arg)
* is threaded here. * is threaded here.
*/ */
void static void
jemalloc_prefork(void) jemalloc_prefork(void)
{ {
bool again; bool again;
@ -5773,7 +5769,7 @@ jemalloc_prefork(void)
#endif #endif
} }
void static void
jemalloc_postfork(void) jemalloc_postfork(void)
{ {
unsigned i; unsigned i;


@ -28,10 +28,24 @@
******************************************************************************* *******************************************************************************
*/ */
#ifdef __cplusplus
extern "C" {
#endif
#ifndef JEMALLOC_H_
#define JEMALLOC_H_
#include "jemalloc_defs.h"
size_t malloc_usable_size(const void *ptr);
extern const char *jemalloc_options; extern const char *jemalloc_options;
extern void (*jemalloc_message)(const char *p1, const char *p2, extern void (*jemalloc_message)(const char *p1, const char *p2,
const char *p3, const char *p4); const char *p3, const char *p4);
void jemalloc_thread_cleanup(void); #endif /* JEMALLOC_H_ */
void jemalloc_prefork(void);
void jemalloc_postfork(void); #ifdef __cplusplus
};
#endif


@ -28,6 +28,14 @@
******************************************************************************* *******************************************************************************
*/ */
#ifndef JEMALLOC_DEFS_H_
#define JEMALLOC_DEFS_H_
/*
* jemalloc version string.
*/
#undef JEMALLOC_VERSION
/* /*
* Hyper-threaded CPUs may need a special instruction inside spin loops in * Hyper-threaded CPUs may need a special instruction inside spin loops in
* order to yield to another virtual CPU. * order to yield to another virtual CPU.
@ -92,3 +100,5 @@
/* sizeof(void *) == 2^SIZEOF_PTR_2POW. */ /* sizeof(void *) == 2^SIZEOF_PTR_2POW. */
#undef SIZEOF_PTR_2POW #undef SIZEOF_PTR_2POW
#endif /* JEMALLOC_DEFS_H_ */