Makes the prof sample prng use the tsd prng_state. This allows us to properly
initialize the sample interval event, without having to create tdata. As a
result, tdata will be created on demand (when a thread reaches the sample
interval bytes allocated), instead of on the first allocation.
This change suppresses tdata initialization and prof sample threshold
update in interrupting malloc calls. Interrupting calls have no need
for tdata. Delaying tdata creation aligns better with our lazy tdata
creation principle, and it also helps us gain control back from
interrupting calls more quickly and reduces any risk of delegating
tdata creation to an interrupting call.
The diff 'refactor prof accum...' moved the bytes_until_sample
subtraction before the load of tdata. If tdata is null,
tdata_get(true) will overwrite bytes_until_sample, but we
still sample the current allocation. Instead, do the subtraction
and check logic again, to keep the previous behavior.
blame-rev: 0ac524308d
generation of sub bytes_until_sample, usize; je; for x86 arch.
Subtraction is unconditional, and only flags are checked for the jump,
no extra compare is necessary. This also reduces register pressure.
This removes the tsd macros (which are used only for tsd_t in real builds). We
break up the circular dependencies involving tsd.
We also move all tsd access through getters and setters. This allows us to
assert that we only touch data when tsd is in a valid state.
We simplify the usages of the x macro trick, removing all the customizability
(get/set, init, cleanup), moving the lifetime logic to tsd_init and tsd_cleanup.
This lets us make initialization order independent of order within tsd_t.
With this change, when profiling is enabled, we avoid doing redundant rtree
lookups. Also changed dalloc_atx_t to alloc_atx_t, as it's now used on
allocation path as well (to speed up profiling).