Rewrite prof_alloc_prep() as a C preprocessor macro, PROF_ALLOC_PREP(),
in order to remove any doubt as to whether an additional stack frame is
created. Prior to this change, it was assumed that inlining would reduce
the total number of frames in the backtrace, but in practice the
compiler's inlining behavior was not completely predictable.

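The difference between a helper function and a macro can be sketched as
follows. This is only an illustration, not the profiling code itself:
record_frame_fn(), RECORD_FRAME(), and the alloc_via_*() callers are
hypothetical names, and __func__ stands in for "which function this code
belongs to" rather than for a real backtrace.

```c
#include <stddef.h>

/* Hypothetical stand-in for a backtrace: remembers only the name of the
 * function that lexically contains the recording code. */
static const char *recorded_frame;

/* Function version: the recording code belongs to the helper. Whether a
 * separate stack frame exists at run time is up to the compiler's
 * inlining decisions. */
static void record_frame_fn(void)
{
    recorded_frame = __func__;
}

/* Macro version: the preprocessor pastes the body into the caller, so
 * the recording code is always part of the caller itself. */
#define RECORD_FRAME() do { recorded_frame = __func__; } while (0)

static const char *alloc_via_function(void)
{
    record_frame_fn();
    return recorded_frame;
}

static const char *alloc_via_macro(void)
{
    RECORD_FRAME();
    return recorded_frame;
}
```

With the macro, the number of frames to skip no longer depends on what
the optimizer decides to do with a helper function.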
Create imemalign() and call it from posix_memalign(), memalign(), and
valloc(), so that all entry points require the same number of stack
frames to be ignored during backtracing.

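The shape of this change can be sketched roughly as below. The sketch_*
names and imemalign_sketch() are hypothetical, the helper's signature is
an assumption, and the system posix_memalign() is used as a stand-in for
the allocator's internal aligned allocation path.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical shared helper. A profiling allocator would capture its
 * backtrace here, always exactly one call below the public entry
 * point. */
static int imemalign_sketch(void **memptr, size_t alignment, size_t size)
{
    return posix_memalign(memptr, alignment, size);
}

int sketch_posix_memalign(void **memptr, size_t alignment, size_t size)
{
    return imemalign_sketch(memptr, alignment, size);
}

void *sketch_memalign(size_t alignment, size_t size)
{
    void *ret = NULL;

    /* The stand-in backend needs at least pointer-sized alignment. */
    if (alignment < sizeof(void *))
        alignment = sizeof(void *);
    if (imemalign_sketch(&ret, alignment, size) != 0)
        ret = NULL;
    return ret;
}

void *sketch_valloc(size_t size)
{
    void *ret = NULL;
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || imemalign_sketch(&ret, (size_t)page, size) != 0)
        ret = NULL;
    return ret;
}
```

Each public entry point sits exactly one call above the shared helper,
so a single skip count works for all three.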
Properly handle boundary conditions for sampled region promotion in
rallocm(). Prior to this fix, some combinations of 'size' and 'extra'
values could cause erroneous behavior. Additionally, size class
recording for promoted regions was incorrect.
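
This message does not list the failing 'size'/'extra' combinations. The
sketch below only illustrates one hazard of this kind: when a sampled
small request is promoted to a larger size, 'extra' has to be recomputed
without unsigned underflow. SKETCH_SMALL_MAXCLASS, its value, and
sketch_promote() are assumptions for illustration, not the allocator's
actual code.

```c
#include <stddef.h>

/* Assumed largest small size class; the value is illustrative only. */
#define SKETCH_SMALL_MAXCLASS ((size_t)3584)

struct sketch_request {
    size_t size;   /* minimum size to allocate */
    size_t extra;  /* optional additional bytes beyond size */
};

/* Promote a small request to the first size above the small classes,
 * keeping the original upper bound (size + extra) where possible. The
 * caller is assumed to have ruled out overflow of size + extra. */
static struct sketch_request sketch_promote(size_t size, size_t extra)
{
    struct sketch_request r = { size, extra };
    size_t promoted = SKETCH_SMALL_MAXCLASS + 1;

    if (size <= SKETCH_SMALL_MAXCLASS) {
        size_t limit = size + extra;

        r.size = promoted;
        /* A plain "limit - promoted" would wrap around whenever the
         * promoted size already exceeds the original upper bound. */
        r.extra = (promoted >= limit) ? 0 : limit - promoted;
    }
    return r;
}
```

For example, under these assumptions sketch_promote(100, 0) yields size
3585 with extra 0, rather than a huge wrapped-around 'extra'.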