Configurable thresholds for alerting
One of the main purposes of a dashboard is to display warnings and alerts: indicators of degradation or failure.
As an administrator, I will want to adjust the thresholds for these alerts so that they reflect the conditions that are alarming to me specifically. These may vary with the criticality of the service being monitored (e.g. whether it is a paid production service, a service for internal users, a test network, or something like free guest Wi-Fi).
I would like Stork to assign defaults for these thresholds, let me adjust the default values globally (i.e. Stork-wide), and let me override them on a per-server basis.
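The three-level scheme above (built-in default, global adjustment, per-server override) can be sketched as a simple layered lookup. This is a minimal sketch, not Stork's implementation: the types `Thresholds` and `Override`, the function `Effective`, the server names, and the default percentages are all hypothetical placeholders. A nil field in an override means "inherit from the level below".

```go
package main

import "fmt"

// Thresholds holds hypothetical pool-utilization alert levels, in percent.
type Thresholds struct {
	PoolWarning  float64 // "approaching high" (yellow)
	PoolCritical float64 // "high" (red)
}

// Override holds optional replacements; a nil field means "inherit".
type Override struct {
	PoolWarning  *float64
	PoolCritical *float64
}

// builtinDefaults are placeholder values, not Stork's actual defaults.
var builtinDefaults = Thresholds{PoolWarning: 80, PoolCritical: 95}

// apply returns base with every non-nil field of o substituted.
func apply(base Thresholds, o Override) Thresholds {
	if o.PoolWarning != nil {
		base.PoolWarning = *o.PoolWarning
	}
	if o.PoolCritical != nil {
		base.PoolCritical = *o.PoolCritical
	}
	return base
}

// Effective resolves the thresholds for one server: built-in defaults,
// then global (Stork-wide) overrides, then that server's overrides.
func Effective(global Override, perServer map[string]Override, server string) Thresholds {
	t := apply(builtinDefaults, global)
	if o, ok := perServer[server]; ok {
		t = apply(t, o)
	}
	return t
}

func ptr(v float64) *float64 { return &v }

func main() {
	global := Override{PoolWarning: ptr(75)}
	perServer := map[string]Override{
		"guest-wifi": {PoolCritical: ptr(99)},
	}
	fmt.Println(Effective(global, perServer, "guest-wifi")) // {75 99}
	fmt.Println(Effective(global, perServer, "prod-dhcp"))  // {75 95}
}
```

Using pointers rather than zero values to mean "not set" keeps a deliberate override of 0 distinguishable from "inherit".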
Thresholds we may eventually want to enable alarming on:
- Pool utilization (high: red; approaching high: yellow)
- LPS (leases per second), high and low. Ideally we would alarm on variance from the usual LPS, but I don't know whether there is any way for us to determine what is usual, given that there may be a lot of daily and weekly variation.
- CPU utilization (on Kea, and maybe also on the database backend?)
- Number of rejected leases?
- Is there something we should monitor with respect to the LFC (e.g. does it build up a backlog)?
- Database connection quality (delay in responses?)
- Other platform factors (temperature: is that something we get? Is there an alarm for low available memory?)
- Report in the UI when an updated package is available. Stork packages are likely to have security vulnerabilities because of web dependencies from outside Stork; presumably, a new package for the same Stork version indicates a security fix.
- Conflicts when the operator is running multiple Kea servers with the same address range, using a shared lease database
- Ring buffer length/size, to identify when overly long buffers cause cascading retries
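For the first two items, evaluating a metric against its thresholds could look like the minimal sketch below. The names `Severity`, `PoolSeverity`, and `LPSOutOfRange`, the choice of "at or above" as the trigger, and treating an empty pool as OK are assumptions for illustration, not Stork behavior. It uses fixed high/low bounds for LPS, not the "variance from usual" approach, which remains an open question.

```go
package main

import "fmt"

// Severity is a hypothetical alert level for one metric.
type Severity int

const (
	OK       Severity = iota
	Warning           // yellow: approaching high
	Critical          // red: high
)

func (s Severity) String() string {
	switch s {
	case Warning:
		return "yellow"
	case Critical:
		return "red"
	default:
		return "ok"
	}
}

// PoolSeverity compares pool utilization (assigned/total, in percent)
// with the warning and critical levels; reaching a level triggers it.
func PoolSeverity(assigned, total uint64, warnPct, critPct float64) Severity {
	if total == 0 {
		return OK // assumption: an empty pool cannot be exhausted
	}
	util := 100 * float64(assigned) / float64(total)
	switch {
	case util >= critPct:
		return Critical
	case util >= warnPct:
		return Warning
	default:
		return OK
	}
}

// LPSOutOfRange reports whether a leases-per-second rate falls
// outside the configured [low, high] band.
func LPSOutOfRange(lps, low, high float64) bool {
	return lps < low || lps > high
}

func main() {
	fmt.Println(PoolSeverity(170, 200, 80, 95)) // 85% -> yellow
	fmt.Println(PoolSeverity(195, 200, 80, 95)) // 97.5% -> red
	fmt.Println(LPSOutOfRange(0.2, 1, 500))     // true
}
```

The thresholds passed in here would come from whatever per-server resolution Stork adopts, so the evaluation logic stays independent of where the values were configured.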