This hints to the compiler that it should care more about space than CPU (among
other things). In cases where the compiler lacks profile-guided information,
this can be a substantial space savings.
For now, we mark the mallctl or atexit driven profiling and stats functions that
take up the most space.
Develop new data structure and code logic for holding profiling
related information stored in the extent that may be needed after the
extent is released, which in particular is the case for the
reallocation code path (e.g. in `rallocx()` and `xallocx()`). The
data structure is a generalization of `prof_tctx_t`: we previously
only copy out the `prof_tctx` before the extent is released, but we
may be in need of additional fields. Currently the only additional
field is the allocation time field, but there may be more fields in
the future.
The restructuring also resolved a bug: `prof_realloc()` mistakenly
passed the new `ptr` to `prof_free_sampled_object()`, but passing in
the `old_ptr` would crash because it's already been released. Now
the essential profiling information is collectively copied out early
and safely passed to `prof_free_sampled_object()` after the extent is
released.
Prof logging is conceptually seperate from core profiling, so
split it out as a module of its own. There are a few internal
functions that had to be exposed but I think it is a fair trade-off.