# Overview
Issue #252 calls for adding a Leases Per Second statistic to Kea. This discussion
describes the basic design to accomplish this. After some debate, the name of
the new value will actually be Responses Per Second, or RPS.

(Note: this discussion uses pseudo-code that should resemble Go and/or SQL to
convey ideas only; please do not focus on any lack of syntactical accuracy.)

The dashboard will show two values for RPS per Kea daemon, measured over different
intervals:

- interval_1 = 15 minutes
- interval_2 = 24 hours

These values could be configurable. If so, we should enforce that:

- interval_1 < interval_2
- interval_1 > (statistics pull rate * 2)

RPS is loosely calculated as:

(number of response packets sent during interval) / (interval width in secs)

where response packets are:

- DHCPv4 = DHCPACKs
- DHCPv6 = DHCPV6_REPLYs

If at some point later we care to add additional packet types (e.g. DHCPOFFERs
and DHCPV6_ADVERTISEs), we can, and the label is still meaningful.

If the data we have for a given Kea daemon does not span an entire interval, we will
display the value based on the data we do have. We could toggle the column color or
put an asterisk next to it to signify that we do not yet have a full interval.
For example, if we only have 12 hours of data, we could alter the 24-hour column's
appearance.
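The two constraints on configurable intervals listed above could be checked with something like the following. This is only a sketch: the function name `validateIntervals` and the choice to pass every value as whole seconds are assumptions made for illustration, not part of the design.

```go
package main

import (
	"errors"
	"fmt"
)

// validateIntervals checks the two proposed constraints. All arguments are
// in seconds; pullInterval is the statistics pull period (hypothetical
// parameter names, not taken from Stork).
func validateIntervals(interval1, interval2, pullInterval int64) error {
	if interval1 >= interval2 {
		return errors.New("interval_1 must be shorter than interval_2")
	}
	if interval1 <= 2*pullInterval {
		return errors.New("interval_1 must be longer than twice the statistics pull rate")
	}
	return nil
}

func main() {
	// Defaults from this discussion, with an assumed 60-second pull period.
	fmt.Println(validateIntervals(15*60, 24*60*60, 60))

	// An interval_1 of 90 seconds is too short for a 60-second pull period.
	fmt.Println(validateIntervals(90, 24*60*60, 60))
}
```

Requiring interval_1 to exceed twice the pull period means the shorter interval should always contain at least two history rows.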
# Getting the Data Needed
We will not be adding anything new to Kea to support this. The data will be derived
from the following existing Kea statistics:

- pkt4-ack-sent (v4 servers)
- pkt6-reply-sent (v6 servers)

These statistics are not currently mined by Stork, so the Kea StatsPuller will
need to be extended to retrieve and store them. Alternatively, we could add a
new puller if we want more individualized control.

We will need to retain the last recorded value and sample time of this statistic
for each daemon. We can use a map of values:
```
type SampledValue struct {
    Sampled_at int64 // time statistic was recorded (secs since epoch)
    Value      int64 // e.g. value of pkt4-ack-sent or pkt6-reply-sent
}

// Last sampled value per daemon, keyed by daemon_id.
ResponsesSent := make(map[int64]SampledValue)
```
These values are a Kea daemon's running count of how many responses it has sent
since startup, statistics reset, or rollover (unlikely). For now, this map will
likely be held only in memory and not persisted to storage.

We will use these values, along with the value pulled at each pull cycle, to
create and persist a running history of the incremental changes (aka the deltas) in
responses sent between consecutive statistic pulls, in a new table:
```
ResponsesSentHistory {
    daemon_id      - bigint
    interval_start - timestamp // timestamp of this interval
    duration       - bigint    // seconds in this interval
    responses_sent - bigint    // number of responses in this interval
}
```
This will produce one row per daemon per pull iteration. Each row represents
the difference between the previous absolute value (from the map) and the newly mined
absolute value for a given statistic. We also save the difference between the
two sample times so we have a precise measure of the interval described by the row.

If we assume a statistic pull rate of 60 seconds, then this will produce 1440 rows
per daemon per 24 hours. Rows can be aged off this table once they are more than interval_2 old.
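The row count above is simply the retention window divided by the pull period. A small sketch of that arithmetic, where the helper name `expectedRows` is made up for illustration:

```go
package main

import "fmt"

// expectedRows estimates how many history rows one daemon accumulates over
// retentionSecs when one row is written every pullSecs seconds.
func expectedRows(retentionSecs, pullSecs int64) int64 {
	if pullSecs <= 0 {
		return 0
	}
	return retentionSecs / pullSecs
}

func main() {
	// 24 hours of history at one pull per minute.
	fmt.Println(expectedRows(24*60*60, 60)) // 1440
}
```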
## On each statistic pull iteration
For each Kea daemon, we do the following:

1. Pull the new statistic value from the daemon:
```
sampled_at := time.Now().Unix() // secs since epoch, to match SampledValue
value := pktX-<type>-sent from Kea getStatistic()
```
2. Calculate the delta:
```
// Fetch the previously recorded value and time recorded.
previous_sampled_at := ResponsesSent[daemon_id].Sampled_at
previous_value := ResponsesSent[daemon_id].Value

if value >= previous_value {
    // New value is not smaller, so we assume we have contiguous data.
    // An unchanged value means no responses were sent (delta of zero).
    responses_sent = value - previous_value
} else {
    // We have either a Kea restart, reset, or statistic rollover. This value
    // then represents the number of packets sent since that event occurred.
    responses_sent = value
}

// Calculate the time between the two samples.
duration := sampled_at - previous_sampled_at
```
3. Insert a new row into ResponsesSentHistory:
```
INSERT INTO ResponsesSentHistory (daemon_id, interval_start, responses_sent, duration)
VALUES (daemon_id, sampled_at, responses_sent, duration)
```
4. Update previous values in ResponsesSent:
```
// Map entries cannot be modified field by field in Go, so replace the whole struct.
ResponsesSent[daemon_id] = SampledValue{Sampled_at: sampled_at, Value: value}
```
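Steps 2 through 4 above can be sketched as a single Go function. This is an illustration rather than the Stork implementation: `HistoryRow` and `recordSample` are hypothetical names, the field names are Go-style variants of those in the pseudo-code, and skipping the row on a daemon's first pull is an assumption, since the steps above do not say what to do when the map has no previous entry.

```go
package main

import "fmt"

// SampledValue mirrors the struct described earlier in this document.
type SampledValue struct {
	SampledAt int64 // time the statistic was recorded (secs since epoch)
	Value     int64 // e.g. value of pkt4-ack-sent or pkt6-reply-sent
}

// HistoryRow is a hypothetical in-memory form of a ResponsesSentHistory row.
type HistoryRow struct {
	DaemonID      int64
	IntervalStart int64 // set to the sample time, following the insert in step 3
	Duration      int64 // seconds covered by this row
	ResponsesSent int64
}

// recordSample applies steps 2 through 4 for one daemon. The boolean result
// is false when there is no previous sample, in which case no row is
// produced (an assumption, see the lead-in).
func recordSample(last map[int64]SampledValue, daemonID, sampledAt, value int64) (HistoryRow, bool) {
	prev, seen := last[daemonID]
	// Step 4: remember the new absolute value for the next pull.
	last[daemonID] = SampledValue{SampledAt: sampledAt, Value: value}
	if !seen {
		return HistoryRow{}, false
	}

	// Step 2: a smaller value implies a restart, reset, or rollover, so the
	// new value itself is the count since that event.
	responses := value
	if value >= prev.Value {
		responses = value - prev.Value
	}

	// Step 3: the row to insert.
	return HistoryRow{
		DaemonID:      daemonID,
		IntervalStart: sampledAt,
		Duration:      sampledAt - prev.SampledAt,
		ResponsesSent: responses,
	}, true
}

func main() {
	last := make(map[int64]SampledValue)
	recordSample(last, 1, 1000, 500) // first pull: no row yet

	row, _ := recordSample(last, 1, 1060, 800) // contiguous data
	fmt.Println(row.ResponsesSent, row.Duration)

	row, _ = recordSample(last, 1, 1120, 40) // value dropped: daemon restarted
	fmt.Println(row.ResponsesSent, row.Duration)
}
```

Using `>=` rather than `>` matters for idle daemons: an unchanged counter should yield a delta of zero, not the full running total.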
After all daemons have been processed, records older than the current time - (interval_2 + interval_1) could be deleted.
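The cleanup cutoff can be illustrated as follows. In practice this would most likely be a single DELETE against the history table; the in-memory filter below only demonstrates the comparison, and both function names are hypothetical.

```go
package main

import "fmt"

// pruneCutoff returns the timestamp (secs since epoch) before which history
// rows are no longer needed: now - (interval_2 + interval_1).
func pruneCutoff(now, interval1, interval2 int64) int64 {
	return now - (interval2 + interval1)
}

// prune keeps only the interval_start values at or after the cutoff.
func prune(intervalStarts []int64, cutoff int64) []int64 {
	kept := make([]int64, 0, len(intervalStarts))
	for _, start := range intervalStarts {
		if start >= cutoff {
			kept = append(kept, start)
		}
	}
	return kept
}

func main() {
	now := int64(100000)
	cutoff := pruneCutoff(now, 15*60, 24*60*60)
	fmt.Println(cutoff)                                      // 12700
	fmt.Println(prune([]int64{12000, 12700, 99940}, cutoff)) // [12700 99940]
}
```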
## Fetching RPS for Display
The RPS for all daemons for an interval could be fetched in a single select:
```
SELECT daemon_id, SUM(responses_sent) AS responses, SUM(duration) AS duration
FROM ResponsesSentHistory
WHERE interval_start >= ? AND interval_start < ?
GROUP BY daemon_id;
```
This would produce a single row per daemon:

daemon_id, responses, duration

RPS = responses / duration

If the duration is less than the desired interval minus a tolerance, we can earmark the value.
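Turning one aggregated row into a displayed value might look like the sketch below. The function name `rps`, the `toleranceSecs` parameter, and the asterisk marker are assumptions for illustration; the tolerance is meant to absorb small timing differences between pulls.

```go
package main

import "fmt"

// rps converts one aggregated row (responses, duration) into a rate and
// reports whether the row covers the requested interval. All durations are
// in seconds; toleranceSecs is a hypothetical allowance for pull jitter.
func rps(responses, durationSecs, intervalSecs, toleranceSecs int64) (rate float64, full bool) {
	if durationSecs <= 0 {
		// No data yet for this daemon: report zero and earmark it.
		return 0, false
	}
	rate = float64(responses) / float64(durationSecs)
	full = durationSecs >= intervalSecs-toleranceSecs
	return rate, full
}

func main() {
	// Only 12 hours of data for the 24-hour column: show the rate, earmarked.
	rate, full := rps(432000, 12*60*60, 24*60*60, 120)
	marker := ""
	if !full {
		marker = "*"
	}
	fmt.Printf("%.1f%s\n", rate, marker) // 10.0*
}
```

Dividing by the summed duration rather than the nominal interval keeps the rate accurate even when the data covers only part of the interval.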
## Authors (please add yourself when you contribute):
List of authors as of July 1st, 2020:

* Thomas Markwalder