Improve what happens with the server after an operator tries to load a broken config.
Problem summary
Recently we had a support ticket in which the root cause of the customer's service interruption turned out to be self-inflicted: the customer had tried to load a config that the parser rejected (because of a duplicated subnet). According to @tmark's diagnosis, this unsuccessful attempt to load a new config had the non-obvious effect of causing the server to stop listening for DHCP packets.
What should we do instead?
While it may not be possible to define, for every case, what ought to happen when an operator attempts to load a new config that fails to parse, the behavior in this instance could be improved. The config load operation would have returned an error, and the server did log messages about the parser failure, e.g.:
kea4-logs.5:2020-03-14 05:47:51.882 ERROR [kea-dhcp4.dhcp4/10414] DHCP4_PARSER_FAIL failed to create or run parser for configuration element shared-networks: duplicate network 'relay-89.36.121.129' found in the configuration (:0:2859)
kea4-logs.5:2020-03-14 05:48:06.686 ERROR [kea-dhcp4.dhcp4/10414] DHCP4_PARSER_FAIL failed to create or run parser for configuration element shared-networks: duplicate network 'relay-89.36.121.129' found in the configuration (:0:2859)
BUT it was not obvious to the operator, even after the fact, that their unsuccessful attempt had caused the server to stop responding to clients.
- A preferred alternative would be to revert to the previously-working config and to continue processing requests.
- If this is not possible, then at least the log and error messages should be rewritten to convey more urgency and to make it plain to the operator that their unsuccessful attempt to load a new config has affected service.
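
To make the first alternative concrete, here is a minimal sketch of a "validate first, commit only on success" reload. It is not Kea code: the `Server` class, the `validate` function, the `ConfigError` exception, and the shape of the returned dictionary are all hypothetical names chosen for illustration, and the duplicate-name check merely stands in for the real parser.

```python
import copy


class ConfigError(Exception):
    """Raised when a candidate configuration fails validation."""


def validate(config):
    # Hypothetical stand-in for the real parser: reject duplicate
    # shared-network names, the failure reported in the logs above.
    seen = set()
    for net in config.get("shared-networks", []):
        name = net["name"]
        if name in seen:
            raise ConfigError(
                f"duplicate network '{name}' found in the configuration"
            )
        seen.add(name)


class Server:
    """Toy server that keeps its running config if a reload fails."""

    def __init__(self, config):
        validate(config)
        self.config = copy.deepcopy(config)
        self.listening = True

    def reload(self, candidate):
        # Validate the candidate before touching any running state.
        try:
            validate(candidate)
        except ConfigError as err:
            # Nothing has been changed, so the previous config stays
            # active and the server keeps listening.
            return {
                "result": 1,
                "text": f"config rejected, previous config still active: {err}",
            }
        # Commit only after validation has succeeded.
        self.config = copy.deepcopy(candidate)
        return {"result": 0, "text": "config applied"}
```

The point of the sketch is the ordering: because validation happens before any running state is modified, a rejected config cannot leave the server half-configured, and the error text tells the operator explicitly which config is in effect.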