Refactor code such that arena_mapbits_{large,small}_set() always
preserves the unzeroed flag, and manually manipulate the unzeroed flag
in the one case where it actually gets reset (in arena_chunk_purge()).
This fixes unzeroed preservation bugs in arena_run_split() and
arena_ralloc_large_grow(). These bugs caused large calloc() to return
non-zeroed memory under some circumstances.
Add the --enable-mremap option, and disable the use of mremap(2) by
default, for the same reason that freeing chunks via munmap(2) is
disabled by default on Linux: semi-permanent VM map fragmentation.
Further optimize arena_salloc() to only look at the binind chunk map
bits in the common case.
Add more sanity checks to arena_salloc() that detect chunk map
inconsistencies for large allocations (whether due to allocator bugs or
application bugs).
Embed the bin index for small page runs into the chunk page map, in
order to omit [...] in the following dependent load sequence:
ptr-->mapelm-->[run-->bin-->]bin_info
Move various non-critcal code out of the inlined function chain into
helper functions (tcache_event_hard(), arena_dalloc_small(), and
locking).
Theses newly added macros will be used to implement the equivalent under
MSVC. Also, move the definitions to headers, where they make more sense,
and for some, are even more useful there (e.g. malloc).
- Use the extensions autoconf finds for object and executable files.
- Remove the sorev variable, and replace SOREV definition with sorev's.
- Default to je_ prefix on win32.
Using errno on win32 doesn't quite work, because the value set in a shared
library can't be read from e.g. an executable calling the function setting
errno.
At the same time, since buferror always uses errno/GetLastError, don't pass
it.
MSVC doesn't support C99, and building as C++ to be able to use them is
dangerous, as C++ and C99 are incompatible.
Introduce a VARIABLE_ARRAY macro that either uses VLA when supported,
or alloca() otherwise. Note that using alloca() inside loops doesn't
quite work like VLAs, thus the use of VARIABLE_ARRAY there is discouraged.
It might be worth investigating ways to check whether VARIABLE_ARRAY is
used in such context at runtime in debug builds and bail out if that
happens.
MSVC doesn't support C99, and as such doesn't support designated
initialization of structs and unions. As there is never a mix of
indexed and named nodes, it is pretty straightforward to use a
different type for each.
Fix a potential deadlock that could occur during interval- and
growth-triggered heap profile dumps.
Fix an off-by-one heap profile statistics bug that could be observed in
interval- and growth-triggered heap profiles.
Fix heap profile dump filename sequence numbers (regression during
conversion to malloc_snprintf()).
Remove mmap_unaligned, which was used to heuristically decide whether to
optimistically call mmap() in such a way that could reduce the total
number of system calls. If I remember correctly, the intention of
mmap_unaligned was to avoid always executing the slow path in the
presence of ASLR. However, that reasoning seems to have been based on a
flawed understanding of how ASLR actually works. Although ASLR
apparently causes mmap() to ignore address requests, it does not cause
total placement randomness, so there is a reasonable expectation that
iterative mmap() calls will start returning chunk-aligned mappings once
the first chunk has been properly aligned.
Fix chunk_alloc_dss() to zero memory when requested.
Fix chunk_dealloc() to avoid chunk_dealloc_mmap() for dss-allocated
memory.
Fix huge_palloc() to always junk fill when requested.
Improve chunk_recycle() to report that memory is zeroed as a side effect
of pages_purge().
Fix a memory corruption bug in chunk_alloc_dss() that was due to
claiming newly allocated memory is zeroed.
Reverse order of preference between mmap() and sbrk() to prefer mmap().
Clean up management of 'zero' parameter in chunk_alloc*().
Using static memory when malloc_tsd_malloc fails means all threads share
the same wrapper and thus the same wrapped value. This defeats the purpose
of TSD.
Not setting the initialized member leads to randomly calling the cleanup
function in cases it shouldn't be called (and isn't called in other
implementations).
Change the "opt.lg_prof_sample" default from 0 to 19 (1 B to 512 KiB).
Change the "opt.prof_accum" default from true to false.
Add the "opt.prof_final" mallctl, so that "opt.prof_prefix" need not be
abused to disable final profile dumping.
Add a configure test to determine whether common mmap()/munmap()
patterns cause VM map holes, and only use munmap() to discard unused
chunks if the problem does not exist.
Unify the chunk caching for mmap and dss.
Fix options processing to limit lg_chunk to be large enough that
redzones will always fit.
Normalize arena_palloc(), chunk_alloc_mmap_slow(), and
chunk_recycle_dss() to use the same algorithm for trimming
over-allocation.
Add the ALIGNMENT_ADDR2BASE(), ALIGNMENT_ADDR2OFFSET(), and
ALIGNMENT_CEILING() macros, and use them where appropriate.
Remove the run_size_p parameter from sa2u().
Fix a potential deadlock in chunk_recycle_dss() that was introduced by
eae269036c (Add alignment support to
chunk_alloc()).
Implement Valgrind support, as well as the redzone and quarantine
features, which help Valgrind detect memory errors. Redzones are only
implemented for small objects because the changes necessary to support
redzones around large and huge objects are complicated by in-place
reallocation, to the point that it isn't clear that the maintenance
burden is worth the incremental improvement to Valgrind support.
Merge arena_salloc() and arena_salloc_demote().
Refactor i[v]salloc() to expose the 'demote' option.
Use $((...)) for math in size_classes.h rather than expr, because it is
much faster. This is not supported syntax in the classic Bourne shell,
but all modern sh implementations support it, including bash, zsh, and
ash.