journal performance improvements - limit fsync()
Version: 9.19.5-dev (Development Release) id:b13d973
Observation:
In a trivial test on a laptop, UPDATE performance on a single zone is about 1/300 of the query performance on the same zone. Putting the journal on tmpfs brings it up to about 1/8 of the query performance on a 4-core machine (8 virtual cores with multi-threading). (That kind of makes sense because of the serialization?)
From a quick look, I guess this is because the fsync() syscall is a real performance drag.
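A rough way to check that guess outside of named is to time a series of small appends with and without fsync() after each write. The sketch below is only an illustration: time_appends() and its parameters are made-up names, and the write pattern is an assumption that does not reproduce what the journal code actually does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical micro-benchmark: append `count` records of `size` bytes
 * to `path`, optionally calling fsync() after every record.
 * Returns the elapsed wall-clock time in seconds, or -1.0 on error.
 * The file is removed again before returning.
 */
static double
time_appends(const char *path, int count, size_t size, bool do_fsync) {
	char buf[4096];
	struct timespec t0, t1;
	double elapsed = -1.0;
	int fd;

	assert(path != NULL && size <= sizeof(buf));
	memset(buf, 'x', sizeof(buf));

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0) {
		return (-1.0);
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < count; i++) {
		if (write(fd, buf, size) != (ssize_t)size) {
			goto out;
		}
		/* This is the call whose cost is being measured. */
		if (do_fsync && fsync(fd) != 0) {
			goto out;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	elapsed = (double)(t1.tv_sec - t0.tv_sec) +
		  (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
out:
	close(fd);
	unlink(path);
	return (elapsed);
}
```

Calling time_appends() twice on the same filesystem, once with do_fsync set to false and once with true, gives two numbers to compare. The ratio depends heavily on the disk and filesystem, so no expected values are given here.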
This issue is for discussing possibilities to improve UPDATE performance.
In the code I can see two relatively radical but simple opportunities for optimization:
- Make the use of fsync() for journal files configurable. IIUC, fsync() guards only against hardware failure or a kernel crash; it should not matter from the perspective of a crashing process. Once the data are in the kernel buffer after write(), the process can crash and the filesystem will retain the content anyway. An fsync knob would allow users to trade journal resiliency against a system crash (but again, not a named crash) for performance.
  - It might even make sense for secondary zones, which can be re-transferred at will.
  - It is not unheard of, see https://www.knot-dns.cz/docs/latest/singlehtml/#journal-db-mode.
- Even when fsync() is required/configured, I think that in dns_journal_compact() it should be good enough to call fsync() once at the very end. There should be zero risk until the new journal is renamed to the original name, so there is no need to fsync() it before that point. That should improve performance too.
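Both ideas can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not BIND code: journal_sync_t, journal_append() and journal_compact_sketch() are hypothetical names, the knob is not an existing option, and the real dns_journal_compact() operates on journal structures rather than an array of strings. A fully crash-safe rename would usually also sync the containing directory, which is left out here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical knob: whether journal writes are followed by fsync(). */
typedef enum { JOURNAL_SYNC_ALWAYS, JOURNAL_SYNC_NEVER } journal_sync_t;

/* Write the whole buffer, retrying on short writes and EINTR. */
static int
write_all(int fd, const void *buf, size_t len) {
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0) {
			if (errno == EINTR) {
				continue;
			}
			return (-1);
		}
		p += n;
		len -= (size_t)n;
	}
	return (0);
}

/* Idea 1: append one transaction, calling fsync() only if configured. */
static int
journal_append(int fd, const void *txn, size_t len, journal_sync_t mode) {
	if (write_all(fd, txn, len) != 0) {
		return (-1);
	}
	if (mode == JOURNAL_SYNC_ALWAYS && fsync(fd) != 0) {
		return (-1);
	}
	return (0);
}

/*
 * Idea 2: write the compacted journal to a temporary file without any
 * per-transaction fsync(), sync at most once at the very end, and only
 * then rename it over the original name.
 */
static int
journal_compact_sketch(const char *path, const char *tmppath,
		       const char *const *txns, size_t ntxns,
		       journal_sync_t mode) {
	int fd;

	assert(path != NULL && tmppath != NULL);

	fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0) {
		return (-1);
	}
	for (size_t i = 0; i < ntxns; i++) {
		/* The old journal is still intact, so no sync is needed yet. */
		if (journal_append(fd, txns[i], strlen(txns[i]),
				   JOURNAL_SYNC_NEVER) != 0) {
			goto fail;
		}
	}
	/* Single fsync() just before the new file replaces the old one. */
	if (mode == JOURNAL_SYNC_ALWAYS && fsync(fd) != 0) {
		goto fail;
	}
	if (close(fd) != 0 || rename(tmppath, path) != 0) {
		unlink(tmppath);
		return (-1);
	}
	return (0);
fail:
	close(fd);
	unlink(tmppath);
	return (-1);
}
```

With JOURNAL_SYNC_NEVER the compaction performs no fsync() at all. With JOURNAL_SYNC_ALWAYS it performs exactly one, regardless of how many transactions are copied.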
Side note about fsync(): it is a trap, because on journaled filesystems it can force all other transactions in the filesystem journal to be flushed to disk before it even starts flushing the intended file, so it might be even slower than we would like.