For the non-TLS case (as on OS X), if the "thread.{de,}allocatedp"
mallctl was called before any allocation occurred for that thread, the
TSD was still NULL, thus putting the application at risk of
dereferencing NULL. Fix this by refactoring the initialization code,
and making it part of the conditional logic for all per-thread
allocation counter accesses.
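
The lazy-initialization pattern described above can be sketched with
POSIX thread-specific data. This is a minimal illustration, not the
jemalloc code: the type and function names (thread_counters_t,
counters_get(), thread_allocated_read(), thread_allocated_add()) are
hypothetical, and calloc() stands in for whatever internal allocation
path a real allocator would need in order to avoid recursing into
itself.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread counters; not jemalloc's actual layout. */
typedef struct {
    uint64_t allocated;
    uint64_t deallocated;
} thread_counters_t;

static pthread_key_t counters_key;
static pthread_once_t counters_once = PTHREAD_ONCE_INIT;

static void counters_key_create(void)
{
    /* free() releases the per-thread struct when the thread exits. */
    (void)pthread_key_create(&counters_key, free);
}

/*
 * Every access goes through this getter, so the TSD slot is created on
 * first use regardless of whether that first use is a read or an update.
 */
static thread_counters_t *counters_get(void)
{
    thread_counters_t *c;

    pthread_once(&counters_once, counters_key_create);
    c = pthread_getspecific(counters_key);
    if (c == NULL) {
        /* Assumption: calloc() is safe to call here. */
        c = calloc(1, sizeof(*c));
        if (c == NULL)
            return NULL;
        if (pthread_setspecific(counters_key, c) != 0) {
            free(c);
            return NULL;
        }
    }
    return c;
}

/* Read path: never dereferences NULL, even before any allocation. */
uint64_t thread_allocated_read(void)
{
    thread_counters_t *c = counters_get();
    return (c != NULL) ? c->allocated : 0;
}

/* Update path: shares the same initialization check. */
void thread_allocated_add(size_t usize)
{
    thread_counters_t *c = counters_get();
    if (c != NULL)
        c->allocated += usize;
}
```

Calling thread_allocated_read() before any thread_allocated_add() in
this sketch returns 0 instead of dereferencing an uninitialized slot.
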
If mremap(2) is available and supports MREMAP_FIXED, use it for huge
realloc().
Initialize rtree later during bootstrapping, so that --enable-debug
--enable-dss works.
Fix a minor swap_avail stats bug.
Replace the single-character run-time flags with key/value pairs, which
can be set via the malloc_conf global, /etc/malloc.conf, and the
MALLOC_CONF environment variable.
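
As a rough illustration of key/value option strings, the sketch below
looks up one key in a comma-separated list of key:value pairs. The
helper conf_lookup() is hypothetical and is not jemalloc's parser; it
assumes pairs are separated by ',' and that key and value are separated
by ':'.

```c
#include <stddef.h>
#include <string.h>

/*
 * Look up key in a string of the assumed form "k1:v1,k2:v2".
 * On success, copy the value into val (NUL-terminated) and return 1.
 * Return 0 if the key is absent or the value does not fit in val.
 */
int conf_lookup(const char *conf, const char *key, char *val, size_t vallen)
{
    size_t klen = strlen(key);
    const char *p = conf;

    while (*p != '\0') {
        /* Delimit the current pair. */
        const char *comma = strchr(p, ',');
        size_t len = (comma != NULL) ? (size_t)(comma - p) : strlen(p);
        const char *colon = memchr(p, ':', len);

        /* Match only when the whole key matches, not just a prefix. */
        if (colon != NULL && (size_t)(colon - p) == klen &&
            strncmp(p, key, klen) == 0) {
            size_t vlen = len - klen - 1;
            if (vlen >= vallen)
                return 0;
            memcpy(val, colon + 1, vlen);
            val[vlen] = '\0';
            return 1;
        }

        /* Advance past this pair and its trailing comma, if any. */
        p += len;
        if (*p == ',')
            p++;
    }
    return 0;
}
```

With the string "prof:true,prof_prefix:/tmp/app" (an illustrative
value), looking up "prof" yields "true" and looking up "prof_prefix"
yields "/tmp/app".
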
Replace the JEMALLOC_PROF_PREFIX environment variable with the
"opt.prof_prefix" option.
Replace umax2s() with u2s().
Add the "thread.allocated" and "thread.deallocated" mallctls, which can
be used to query the total number of bytes ever allocated/deallocated by
the calling thread.
Add s2u() and sa2u(), which can be used to compute the usable size that
will result from an allocation request of a particular size/alignment.
Refactor ipalloc() to use sa2u().
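
To make the request-size to usable-size mapping concrete, here is a
deliberately simplified sketch. It assumes a single 16-byte quantum for
all sizes, which is not how the real size classes are laid out, and the
names s2u_sketch() and sa2u_sketch() are hypothetical stand-ins rather
than the functions added by this change.

```c
#include <stddef.h>

/* Assumption: one 16-byte quantum covers every size class. */
#define SKETCH_QUANTUM ((size_t)16)

/* Round a request size up to the next multiple of the quantum. */
size_t s2u_sketch(size_t size)
{
    if (size == 0)
        size = 1;
    return (size + SKETCH_QUANTUM - 1) & ~(SKETCH_QUANTUM - 1);
}

/*
 * Usable size for an aligned request. Assumes alignment is a power of
 * two. For alignments above the quantum, this sketch also rounds the
 * size up to a multiple of the alignment.
 */
size_t sa2u_sketch(size_t size, size_t alignment)
{
    size_t usize = s2u_sketch(size);

    if (alignment > SKETCH_QUANTUM)
        usize = (usize + alignment - 1) & ~(alignment - 1);
    return usize;
}
```

Under these assumptions, s2u_sketch(16) is 16 and s2u_sketch(17) is 32,
while sa2u_sketch(17, 64) is 64.
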
Enhance the heap profiler to trigger samples based on usable size,
rather than request size. This has a subtle, but important, impact on
the accuracy of heap sampling. For example, prior to this change,
16- and 17-byte objects were sampled at nearly the same rate, but
17-byte objects actually consume 32 bytes each. Therefore it was
possible for the sample to be somewhat skewed compared to actual memory
usage of the allocated objects.
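
The skew can be illustrated with a toy sampler that takes one sample
each time a running byte count crosses a fixed interval. This is an
assumption made for the sake of arithmetic, not the profiler's actual
sampling mechanism, and count_samples() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the samples taken over nallocs allocations when each allocation
 * charges charged_size bytes against a fixed sampling interval.
 * Assumes charged_size is smaller than interval.
 */
uint64_t count_samples(size_t charged_size, uint64_t nallocs,
    uint64_t interval)
{
    uint64_t accum = 0;
    uint64_t samples = 0;
    uint64_t i;

    for (i = 0; i < nallocs; i++) {
        accum += charged_size;
        if (accum >= interval) {
            samples++;
            accum -= interval;
        }
    }
    return samples;
}
```

With 1024 allocations and a 512-byte interval, charging the request
size gives 32 samples for 16-byte objects and 34 for 17-byte objects.
Charging the usable size of the 17-byte objects (32 bytes, per the
example above) gives 64 samples, which matches the roughly 2:1 ratio in
the memory they actually consume.
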
Add test/jemalloc_test.h.in, which is processed to include
jemalloc/jemalloc@install_suffix@.h, so that test programs can include
it without worrying about the install suffix.
Add allocm(), rallocm(), sallocm(), and dallocm(), which are a
functional superset of malloc(), calloc(), posix_memalign(),
malloc_usable_size(), and free().