Looking at the thread counts in our services, jemalloc's background thread
is useful, but mostly idle. Add a config option to tune down the number of threads.
We have a buffer overrun that manifests in the case where arena indices higher
than the number of CPUs are accessed before arena indices lower than the number
of CPUs. This fixes the bug and adds a test.