While working on #852, I noticed the prng state is atomic. This is the only atomic use of prng in all of jemalloc. Instead, use a threadlocal prng state if possible to avoid unnecessary cache line contention.