De-duplicate `tolower()`
Move the duplicated `maptolower` ASCII case-conversion tables to `isc_ascii`, where they can be shared, and replace the various hot-path `tolower` loops with calls to shared, newly optimized `isc_ascii` implementations.

My measurements suggest it is worth keeping BIND's own `tolower` table, because it is faster than `<ctype.h>` (presumably because it avoids the extra indirection required for locale support).
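A minimal sketch of the shared-table approach, for illustration only. The names `ascii_tolower_table`, `ascii_tolower_init()`, `ascii_tolower()` and `ascii_lowercopy()` are hypothetical, not the real `isc_ascii` identifiers, and the table is filled at startup for brevity where a real implementation would more likely use a precomputed `static const` array. The table maps only `A`..`Z`, so unlike `tolower()` its result never depends on the current locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical shared ASCII lowercase table. Every byte outside
 * 'A'..'Z', including all bytes >= 0x80, maps to itself.
 */
static uint8_t ascii_tolower_table[256];

/* Fill the table once; a real table would be a static const initializer. */
static void
ascii_tolower_init(void) {
	for (unsigned int c = 0; c < 256; c++) {
		ascii_tolower_table[c] = (c >= 'A' && c <= 'Z')
						 ? (uint8_t)(c + ('a' - 'A'))
						 : (uint8_t)c;
	}
}

/* Single indexed load; no locale state is consulted. */
static inline uint8_t
ascii_tolower(uint8_t c) {
	return ascii_tolower_table[c];
}

/* Lowercasing copy loop, kept deliberately simple. */
static void
ascii_lowercopy(uint8_t *dst, const uint8_t *src, size_t len) {
	assert(dst != NULL && src != NULL);
	for (size_t i = 0; i < len; i++) {
		dst[i] = ascii_tolower(src[i]);
	}
}
```

Callers would run `ascii_tolower_init()` once; after that, each conversion is one table lookup indexed by the byte value.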
I have done a clean-up pass over the code to use `<ctype.h>` or `isc_ascii` rather than ad-hoc conversions.
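For illustration, the kind of hand-written conversion such a clean-up replaces, next to its `<ctype.h>` equivalent. This is a hypothetical before/after, not code taken from BIND. The cast matters: `tolower()` requires an argument representable as `unsigned char` (or equal to `EOF`), and a plain `char` may be signed, so passing a byte >= 0x80 directly is undefined behavior.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical "before": an ad-hoc ASCII-only conversion. */
static char
adhoc_lower(char c) {
	if (c >= 'A' && c <= 'Z') {
		c += 'a' - 'A';
	}
	return c;
}

/* Hypothetical "after": <ctype.h>, with the required unsigned char cast. */
static char
ctype_lower(char c) {
	return (char)tolower((unsigned char)c);
}
```

In the default `"C"` locale the two agree; in other locales `tolower()` may also map non-ASCII bytes, which is one reason a fixed ASCII table can be the better fit for protocol data such as DNS names.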
I have found that LLVM will autovectorize a simple `tolower()` copying loop but not a comparison loop, so the portable hand-vectorized `tolower8()` seems to be a win for longer comparisons. For strings, or the trailing parts of strings, shorter than 8 bytes, I have kept the loops simple, to make it easier for the compiler and/or CPU to make them go fast.
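A minimal sketch of the idea, assuming length-counted byte strings: a portable SWAR ("SIMD within a register") `tolower8()` that lowercases all eight bytes of a `uint64_t` using only masks, additions and a shift, used by a comparison loop on whole 8-byte chunks, with a plain byte loop for the tail. The helpers `tolower1()`, `load8()`, `caseless_equal()` and the `BYTES8()` macro are hypothetical names for this illustration, and the exact bit tricks in `isc_ascii` may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicate a byte value into all eight bytes of a 64-bit word. */
#define BYTES8(b) ((uint64_t)(b) * UINT64_C(0x0101010101010101))

/* Lowercase 'A'..'Z' in all eight bytes at once; bytes >= 0x80 unchanged. */
static inline uint64_t
tolower8(uint64_t octets) {
	/* Drop each top bit so the additions below cannot carry between bytes. */
	uint64_t heptets = octets & BYTES8(0x7F);
	/* A byte's top bit becomes set iff its low 7 bits are > 'Z'. */
	uint64_t gt_Z = heptets + BYTES8(0x7F - 'Z');
	/* A byte's top bit becomes set iff its low 7 bits are >= 'A'. */
	uint64_t ge_A = heptets + BYTES8(0x80 - 'A');
	/* 'A' <= byte <= 'Z', excluding bytes whose own top bit was set. */
	uint64_t upper = ~octets & (ge_A ^ gt_Z) & BYTES8(0x80);
	/* Move each flag from bit 7 to bit 5 (0x20 == 'a' - 'A') and apply it. */
	return octets | (upper >> 2);
}

/* Scalar version for the short tail. */
static inline uint8_t
tolower1(uint8_t c) {
	return (c >= 'A' && c <= 'Z') ? (uint8_t)(c + ('a' - 'A')) : c;
}

/* Unaligned-safe 8-byte load in native byte order. */
static inline uint64_t
load8(const uint8_t *p) {
	uint64_t w;
	memcpy(&w, p, sizeof(w));
	return w;
}

/* Case-insensitive ASCII equality of two len-byte strings. */
static bool
caseless_equal(const uint8_t *a, const uint8_t *b, size_t len) {
	assert(a != NULL && b != NULL);
	for (; len >= 8; a += 8, b += 8, len -= 8) {
		if (tolower8(load8(a)) != tolower8(load8(b))) {
			return false;
		}
	}
	/* Fewer than 8 bytes left: keep the loop simple. */
	for (; len > 0; a++, b++, len--) {
		if (tolower1(*a) != tolower1(*b)) {
			return false;
		}
	}
	return true;
}
```

Because each byte is handled independently and only equality is tested, native byte order is fine here; an ordering comparison on whole words would need the bytes in big-endian order.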
(The main thing that seems to be missing from vector instruction sets and compiler intrinsics for this application is a way to load or store up to `len` bytes of a string into or from a register. It would make the tail end of strings much easier to deal with.)
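To make the tail problem concrete: simply loading a full 8-byte word at the end of a string is not an option in portable C, because reading past the end of the buffer is undefined behavior and can fault at a page boundary. A portable stand-in with the right semantics is a variable-length `memcpy` into a zeroed word, sketched here under the hypothetical name `load_upto8()`. A variable-length copy like this is unlikely to compile to a single load instruction, which is exactly the missing primitive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical "load up to len bytes" emulation: copy min(len, 8) bytes
 * into a zeroed 64-bit word. Bytes beyond the copied ones read as zero,
 * so two words loaded with the same len compare equal iff their first
 * min(len, 8) bytes are equal.
 */
static inline uint64_t
load_upto8(const uint8_t *p, size_t len) {
	assert(p != NULL);
	uint64_t w = 0;
	size_t n = len < sizeof(w) ? len : sizeof(w);
	memcpy(&w, p, n);
	return w;
}
```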