Dave Watson 0f8313659e malloc: Add a fastpath
This diff adds a fastpath that assumes size <= SC_LOOKUP_MAXCLASS, and
that we hit tcache.  If either of these is false, we fall back to
the previous codepath (renamed 'malloc_default').

Crucially, malloc_default is only ever reached via a tail call, with the
same kind and number of arguments, so that both clang and gcc can apply
tail-call optimization. malloc() is therefore treated as a leaf function
and saves *no* callee-saved registers. Previously malloc() saved 5
callee-saved registers on x64, costing at least 10 extra
memory-movement instructions (a push and a pop for each).
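
The shape described above can be sketched in a few lines of C. This is
only an illustration, not the code from this diff: every name in it
(toy_malloc, toy_malloc_default, toy_free, FAST_MAX_SIZE, CACHE_SLOTS)
is hypothetical, the "tcache" is a tiny per-thread stack of fixed-size
blocks, and FAST_MAX_SIZE merely stands in for SC_LOOKUP_MAXCLASS. The
noinline attribute is the gcc/clang spelling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names here are hypothetical; this is not jemalloc's code. */
#define FAST_MAX_SIZE 64 /* stand-in for SC_LOOKUP_MAXCLASS */
#define CACHE_SLOTS 8

/* Toy per-thread cache: a stack of blocks, each FAST_MAX_SIZE bytes. */
static _Thread_local void *cache[CACHE_SLOTS];
static _Thread_local int cache_count;

/* Slow path, kept out of line. Small requests are rounded up so that
 * any small block can later be reused for any other small request. */
__attribute__((noinline))
static void *toy_malloc_default(size_t size) {
    return malloc(size <= FAST_MAX_SIZE ? FAST_MAX_SIZE : size);
}

/* Fast path: two cheap checks, then a cache pop or a tail call. */
void *toy_malloc(size_t size) {
    if (size <= FAST_MAX_SIZE && cache_count > 0) {
        return cache[--cache_count];
    }
    /* Same argument list and return type as this function, and nothing
     * left to do afterwards, so an optimizing compiler can emit a jump
     * here instead of a call. */
    return toy_malloc_default(size);
}

/* Sized free: small blocks go back to the cache when there is room. */
void toy_free(void *ptr, size_t size) {
    assert(ptr != NULL);
    if (size <= FAST_MAX_SIZE && cache_count < CACHE_SLOTS) {
        cache[cache_count++] = ptr;
        return;
    }
    free(ptr);
}
```

Because the fallback is the last thing toy_malloc does and takes the
same arguments, the fast path needs no stack frame of its own. Whether
a given compiler actually emits a jump depends on optimization level
and target; inspecting the generated assembly is the way to confirm it.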

In microbenchmarks this results in up to ~10% improvement in malloc()
fastpath.  In real programs, this is a ~1% CPU and latency improvement
overall.
2018-10-18 08:32:19 -07:00