Don't use atomic_add_uint64(), because it isn't available on 32-bit
platforms.
Fix forking support functions to manage all prof-related mutexes.
These regressions were introduced by
602c8e0971160e4b85b08b16cf8a2375aa24bc04 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.