For small bitmaps, a linear scan of the bitmap is slightly faster than
a tree search - bitmap_t is more compact, and there are fewer writes
since we don't have to propogate state transitions up the tree.
On x86_64 with the current settings, I'm seeing ~.5%-1% CPU improvement
in production canaries with this change.
The old tree code is left since 32bit sizes are much larger (and ffsl
smaller), and maybe the run sizes will change in the future.
This resolves#339.
Some platforms (like those using Newlib) don't have ffs/ffsl. This
commit adds a check to configure.ac for __builtin_ffsl if ffsl isn't
found. __builtin_ffsl performs the same function as ffsl, and has the
added benefit of being available on any platform utilizing
Gcc-compatible compiler.
This change does not address the used of ffs in the MALLOCX_ARENA()
macro.