For bin-related allocation, protect data structures with bin locks rather than arena locks. Arena locks remain for run allocation/deallocation and other miscellaneous operations. Restructure statistics counters to maintain per-bin allocated/nmalloc/ndalloc, but continue to provide arena-wide statistics via aggregation in the ctl code.
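The locking and statistics split described above can be illustrated with a minimal sketch. This is not jemalloc's actual code: every `sketch_*` name, the `SKETCH_NBINS` constant, and the field layout are hypothetical, chosen only to show the idea that each bin carries its own mutex and its own allocated/nmalloc/ndalloc counters, while arena-wide totals are computed on demand by summing over the bins.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NBINS 4 /* hypothetical bin count, for illustration only */

/* Per-bin counters; protected by the owning bin's lock. */
typedef struct {
    size_t   allocated; /* bytes currently allocated from this bin */
    uint64_t nmalloc;   /* cumulative allocation count */
    uint64_t ndalloc;   /* cumulative deallocation count */
} sketch_bin_stats_t;

typedef struct {
    pthread_mutex_t    lock; /* bin lock: guards bin state and stats */
    sketch_bin_stats_t stats;
} sketch_bin_t;

typedef struct {
    pthread_mutex_t lock; /* arena lock: run alloc/dalloc and misc. work */
    sketch_bin_t    bins[SKETCH_NBINS];
} sketch_arena_t;

static void sketch_arena_init(sketch_arena_t *arena) {
    pthread_mutex_init(&arena->lock, NULL);
    for (unsigned i = 0; i < SKETCH_NBINS; i++) {
        pthread_mutex_init(&arena->bins[i].lock, NULL);
        arena->bins[i].stats = (sketch_bin_stats_t){0, 0, 0};
    }
}

/* Record an allocation; only the bin lock is taken, not the arena lock. */
static void sketch_bin_malloc_record(sketch_arena_t *arena, unsigned binind,
                                     size_t size) {
    assert(binind < SKETCH_NBINS);
    sketch_bin_t *bin = &arena->bins[binind];
    pthread_mutex_lock(&bin->lock);
    bin->stats.allocated += size;
    bin->stats.nmalloc++;
    pthread_mutex_unlock(&bin->lock);
}

/* Record a deallocation under the same bin lock. */
static void sketch_bin_dalloc_record(sketch_arena_t *arena, unsigned binind,
                                     size_t size) {
    assert(binind < SKETCH_NBINS);
    sketch_bin_t *bin = &arena->bins[binind];
    pthread_mutex_lock(&bin->lock);
    bin->stats.allocated -= size;
    bin->stats.ndalloc++;
    pthread_mutex_unlock(&bin->lock);
}

/* Aggregate per-bin counters into arena-wide totals, one bin at a time. */
static sketch_bin_stats_t sketch_arena_stats_merge(sketch_arena_t *arena) {
    sketch_bin_stats_t total = {0, 0, 0};
    for (unsigned i = 0; i < SKETCH_NBINS; i++) {
        sketch_bin_t *bin = &arena->bins[i];
        pthread_mutex_lock(&bin->lock);
        total.allocated += bin->stats.allocated;
        total.nmalloc += bin->stats.nmalloc;
        total.ndalloc += bin->stats.ndalloc;
        pthread_mutex_unlock(&bin->lock);
    }
    return total;
}
```

In this sketch, threads allocating from different bins never contend on a shared lock, and the cost of producing arena-wide numbers moves to the (presumably much rarer) statistics query. Because the merge locks bins one at a time, the totals are not an atomic snapshot across all bins; that trade-off is a property of this sketch and is not something the text above specifies.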
jemalloc is a general-purpose scalable concurrent malloc(3) implementation. The INSTALL file contains information on how to configure, build, and install jemalloc.