The existing checks are good at finding such issues (on tcache flush), but not
so good at pinpointing them. Debug mode can find them, but sometimes debug mode
slows down a program so much that hard-to-hit bugs can take a long time to
crash.
This commit adds functionality to keep programs mostly on their fast paths,
while also checking every sized delete argument they get.
This gives more accurate attribution of bytes and counts to stack traces,
without introducing backwards incompatibilities in heap-profile parsing tools.
We track the ideal reported (to the end user) number of bytes more carefully
inside core jemalloc. When dumping heap profiles, insteading of outputting our
counts directly, we output counts that will cause parsing tools to give a result
close to the value we want.
We retain the old version as an opt setting, to let users who are tracking
values on a per-component basis to keep their metrics stable until they decide
to switch.
This can be useful when you know you want to override some lower-priority
configuration setting with its default value, but don't know what that value
would be.
I.e. set dopts->zero early on if opt.zero is true, rather than leaving it set by
the entry-point function (malloc, calloc, etc.) and then memsetting. This
avoids situations where we zero once in the large-alloc pathway and then again
via memset.
This lets us put more allocations on an "almost as fast" path after a flush.
This results in around a 4% reduction in malloc cycles in prod workloads
(corresponding to about a 0.1% reduction in overall cycles).
This is debug only and we keep it off the fast path. Moving it here simplifies
the internal logic.
This never tries to junk on regions that were shrunk via xallocx. I think this
is fine for two reasons:
- The shrunk-with-xallocx case is rare.
- We don't always do that anyway before this diff (it depends on the opt
settings and extent hooks in effect).
Make the event module to accept two event types, and pass around the event
context. Use bytes-based events to trigger tcache GC on deallocation, and get
rid of the tcache ticker.
Add options stats_interval and stats_interval_opts to allow interval based stats
printing. This provides an easy way to collect stats without code changes,
because opt.stats_print may not work (some binaries never exit).