While calculating the number of stashed pointers, multiple variables
potentially modified by a concurrent thread were used for the
calculation. This led to some inconsistencies, correctly detected by
the assertions. The change eliminates some possible inconsistencies by
using unmodified variables and only once a concurrently modified one.
The assertions are omitted for the cases where we acknowledge potential
inconsistencies too.
These jobs take about 20 minutes to complete. We don't want to enable
them until we switch to unlimited concurrency plan, otherwise the builds
will take way too long.
Implement the generation of Travis jobs for Windows. Currently, the
generated jobs replicate Appveyor setup and complete successfully. There
is support for MinGW GCC and MSVC compilers as well as 64 and 32 bit
compilation. Linux and MacOS jobs behave identically, but some
environment variables change - CROSS_COMPILE_32BIT=yes is added for
builds with cross compilation, empty COMPILER_FLAGS are not set anymore.
The option makes the process to exit with error code 1 if a memory leak
is detected. This is useful for implementing automated tools that rely
on leak detection.
To avoid potential issues with removing unintended files after 'make
uninstall', spaces are no longer allowed in install suffix. It's worth
mentioning, that with GNU Make on Linux spaces in install suffix didn't
work anyway, leading to errors in the Makefile. But being verbose
about this restriction makes it more transparent for the developers.
With DSS as primary, the default merge impl will (correctly) decline to merge
when one of the extent is non-dss. The error path should tolerate the
not-merged extent being in a merging state.
Under high concurrency / heavy test load (e.g. using run_tests.sh), the
background thread may not get scheduled for a longer period of time. Retry 100
times max before bailing out.
On deallocation, sampled pointers (specially aligned) get junked and stashed
into tcache (to prevent immediate reuse). The expected behavior is to have
read-after-free corrupted and stopped by the junk-filling, while
write-after-free is checked when flushing the stashed pointers.
Many profiling related tests make assumptions on the profiling settings,
e.g. opt_prof is off by default, and prof_active is default on when opt_prof is
on. However the default settings can be changed via --with-malloc-conf at build
time. Fixing the tests by adding the assumed settings explicitly.
Also refactor the handling of the non-deterministic case. Notably allow the
case with narenas set to proceed w/o warnings, to not affect existing valid use
cases.
nstime module guarantees monotonic clock update within a single nstime_t. This
means, if two separate nstime_t variables are read and updated separately,
nstime_subtract between them may result in underflow. Fixed by switching to the
time since utility provided by nstime.
Determinitic number of CPUs is important for percpu arena to work
correctly, since it uses cpu index - sched_getcpu(), and if it will
greater then number of CPUs bad thing will happen, or assertion will be
failed in debug build:
<jemalloc>: ../contrib/jemalloc/src/jemalloc.c:321: Failed assertion: "ind <= narenas_total_get()"
Aborted (core dumped)
Number of CPUs can be obtained from the following places:
- sched_getaffinity()
- sysconf(_SC_NPROCESSORS_ONLN)
- sysconf(_SC_NPROCESSORS_CONF)
For the sched_getaffinity() you may simply use taskset(1) to run program
on a different cpu, and in case it will be not first, percpu will work
incorrectly, i.e.:
$ taskset --cpu-list $(( $(getconf _NPROCESSORS_ONLN)-1 )) <your_program>
_SC_NPROCESSORS_ONLN uses /sys/devices/system/cpu/online, LXD/LXC
virtualize /sys/devices/system/cpu/online file [1], and so when you run
container with limited limits.cpus it will bind randomly selected CPU to
it
[1]: https://github.com/lxc/lxcfs/issues/301
_SC_NPROCESSORS_CONF uses /sys/devices/system/cpu/cpu*, and AFAIK nobody
playing with dentries there.
So if all three of these are equal, percpu arenas should work correctly.
And a small note regardless _SC_NPROCESSORS_ONLN/_SC_NPROCESSORS_CONF,
musl uses sched_getaffinity() for both. So this will also increase the
entropy.
Also note, that you can check is percpu arena really applied using
abort_conf:true.
Refs: https://github.com/jemalloc/jemalloc/pull/1939
Refs: https://github.com/ClickHouse/ClickHouse/issues/32806
v2: move malloc_cpu_count_is_deterministic() into
malloc_init_hard_recursible() since _SC_NPROCESSORS_CONF does
allocations for readdir()
v3:
- mark cpu_count_is_deterministic static
- check only if percpu arena is enabled
- check narenas
Currently used only for guarding purposes, the hint is used to determine
if the allocation is supposed to be frequently reused. For example, it
might urge the allocator to ensure the allocation is cached.
While initially this file contained helper functions for one particular
test, now its usage spread across different test files. Purpose has
shifted towards a collection of handy arena ctl wrappers.
With prof enabled, number of page aligned allocations doesn't match the
number of slab "ends" because prof allocations skew the addresses. It
leads to 'pages' array overflow and hard to debug failures.