This change: 1) makes sure background thread 0 is always created; and 2) fixes synchronization between background thread 0 and the control thread.
There is a buffer overrun that manifests when arena indices higher than the number of CPUs are accessed before arena indices lower than the number of CPUs. This fixes the bug and adds a test.
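
For reviewers who want a picture of the access order involved, here is a minimal, self-contained sketch. It is a hypothetical model and not the allocator's code: `NCPUS`, `slot_t`, `slot_for_arena`, and `touch_arena` are illustrative names, and the modulo mapping is an assumed way to keep an index in range, not necessarily what this change does. It only shows the two properties described above: slot 0 exists no matter which arena is touched first, and an arena index above the CPU count never indexes past the per-CPU array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not the allocator's code: one slot per CPU. */
#define NCPUS 4

typedef struct {
    int started;   /* nonzero once this slot's worker has been created */
    unsigned hits; /* number of arena accesses routed to this slot */
} slot_t;

static slot_t slots[NCPUS];

/* Assumed mapping: fold any arena index onto a valid slot index. */
static size_t slot_for_arena(size_t arena_ind) {
    return arena_ind % NCPUS;
}

/* Record an access to an arena, creating slot 0 first if needed. */
static void touch_arena(size_t arena_ind) {
    /* Slot 0 is created unconditionally, whatever index comes first. */
    if (!slots[0].started) {
        slots[0].started = 1;
    }
    size_t s = slot_for_arena(arena_ind);
    assert(s < NCPUS); /* the access must stay inside the array */
    slots[s].started = 1;
    slots[s].hits++;
}
```

Calling `touch_arena(NCPUS + 1)` before `touch_arena(1)` reproduces the ordering from the description: high index first, low index second. In this model both calls land in slot 1 and slot 0 is already marked as started.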