Many mallctl*() endpoints require no locking, so push the locking down
to just the functions that need it. This is of particular import for
"thread.allocated" and "thread.deallocated", which are intended as a
low-overhead way to introspect per-thread allocation activity.
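
As a rough illustration of the pattern (not the actual jemalloc code), the
sketch below shows a dispatcher that takes no lock itself and leaves the
decision to each handler. Every name in it (ctl_mtx, ctl_lock_count,
demo_mallctl, the "demo.shared_counter" endpoint, the -1 error return) is
hypothetical and chosen only for the example. Only the "thread.allocated"
endpoint name comes from the text above, and its handler here just reads a
stand-in thread-local counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is not jemalloc's implementation. */
static pthread_mutex_t ctl_mtx = PTHREAD_MUTEX_INITIALIZER;
static unsigned ctl_lock_count;  /* demo only: how often ctl_mtx was taken */
static uint64_t shared_counter;  /* shared state, protected by ctl_mtx */
static _Thread_local uint64_t thread_allocated; /* per-thread, no lock needed */

typedef int (*ctl_handler_t)(uint64_t *out);

/* Touches shared state, so the handler acquires the mutex itself. */
static int shared_counter_ctl(uint64_t *out) {
    pthread_mutex_lock(&ctl_mtx);
    ctl_lock_count++;
    *out = ++shared_counter;
    pthread_mutex_unlock(&ctl_mtx);
    return 0;
}

/* Reads only thread-local state, so it never touches ctl_mtx. */
static int thread_allocated_ctl(uint64_t *out) {
    *out = thread_allocated;
    return 0;
}

static const struct {
    const char *name;
    ctl_handler_t handler;
} ctl_table[] = {
    {"demo.shared_counter", shared_counter_ctl},
    {"thread.allocated", thread_allocated_ctl},
};

/* Dispatcher: no lock is taken here; each handler decides for itself. */
static int demo_mallctl(const char *name, uint64_t *out) {
    assert(name != NULL && out != NULL);
    for (size_t i = 0; i < sizeof(ctl_table) / sizeof(ctl_table[0]); i++) {
        if (strcmp(name, ctl_table[i].name) == 0) {
            return ctl_table[i].handler(out);
        }
    }
    return -1; /* unknown name (hypothetical error convention) */
}
```

With this shape, a lookup of "thread.allocated" goes through the dispatcher
without acquiring ctl_mtx at all, while handlers that touch shared state still
serialize on it.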