Move small run metadata into the arena chunk header, with multiple
expected benefits:
- Lower run fragmentation due to reduced run sizes; runs are more likely
to completely drain when there are fewer total regions.
- Improved cache behavior. Prior to this change, run headers were
always page-aligned, which put extra pressure on some CPU cache sets.
The degree to which this was a problem was hardware dependent, but it
likely hurt some even for the most advanced modern hardware.
- Buffer overruns/underruns are less likely to corrupt allocator
metadata.
- Size classes between 4 KiB and 16 KiB become reasonable to support
without any special handling, and the runs are small enough that dirty
unused pages aren't a significant concern.