|
|
# Overview
|
|
|
|
|
|
Issue #252 calls for adding a Leases Per Second statistic to Kea. This discussion
describes the basic design to accomplish this. After some debate, the new value
will actually be named Responses Per Second, or RPS.
|
|
|
|
|
|
(Note: this discussion uses pseudo-code that should resemble Go and/or SQL to
convey ideas only; please do not focus on any lack of syntactic accuracy.)
|
|
|
|
|
|
The dashboard will show two values for RPS per Kea daemon, measured at different
|
|
|
intervals:
|
|
|
|
|
|
- interval_1 = 15 minutes
|
|
|
- interval_2 = 24 hours
|
|
|
|
|
|
These values could be configurable. If so, we should enforce that:
|
|
|
|
|
|
- interval_1 < interval_2
|
|
|
- interval_1 > (statistics pull rate * 2)
|
|
|
|
|
|
RPS is loosely calculated as:
|
|
|
|
|
|
(number of response packets sent during interval) / (interval width in secs)
|
|
|
|
|
|
where response packets are:
|
|
|
|
|
|
- DHCPv4 = DHCPACKs
- DHCPv6 = DHCPV6_REPLYs
|
|
|
|
|
|
If at some point later we care to count additional packet types (e.g. DHCPOFFERs
and DHCPV6_ADVERTISEs), we can do so and the label will still be meaningful.
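To make the arithmetic concrete, the RPS formula above can be sketched as a one-line helper (the function name is hypothetical):

```go
package main

import "fmt"

// rps computes responses per second for an interval, guarding against an
// empty interval. Hypothetical helper illustrating the formula above.
func rps(responsesSent, durationSecs int64) float64 {
	if durationSecs == 0 {
		return 0
	}
	return float64(responsesSent) / float64(durationSecs)
}

func main() {
	// e.g. 27000 DHCPACKs sent over a 15 minute (900 second) interval:
	fmt.Println(rps(27000, 900)) // 30
}
```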
|
|
|
|
|
|
If the data we have for a given Kea daemon does not span an entire interval, we will
display the value based on the data we do have. We could toggle the column color or
put an asterisk next to it, to signify that we do not yet have a full interval.

For example, if we only have 12 hours of data, we could alter the 24 hour column's
appearance.
|
|
|
|
|
|
# Getting the Data Needed
|
|
|
|
|
|
We will not be adding anything new to Kea to support this. The data will be derived
|
|
|
from the following existing Kea statistics:
|
|
|
|
|
|
- pkt4-ack-sent (v4 servers)
|
|
|
- pkt6-reply-sent (v6 servers)
|
|
|
|
|
|
These statistics are not currently mined by Stork and so the Kea StatsPuller will
|
|
|
need to be extended to retrieve and store them. Alternatively, we could add a
|
|
|
new puller if we want more individualized control.
|
|
|
|
|
|
We will need to retain the last recorded value and sample time for this statistic
|
|
|
for each daemon. We can use a map of values:
|
|
|
|
|
|
```
|
|
|
type SampledValue struct {
    SampledAt int64 // time statistic was recorded (secs since epoch)
    Value     int64 // e.g. value of pkt4-ack-sent or pkt6-reply-sent
}

ResponsesSent := make(map[int64]SampledValue) // keyed by daemon_id
|
|
|
```
|
|
|
|
|
|
These values are a Kea daemon's running count of how many responses it has sent
|
|
|
since startup, statistics reset, or rollover (unlikely). For now this map will
|
|
|
likely only be held in memory and not persisted to storage.
|
|
|
|
|
|
We will use these values along with the value pulled at each pull cycle to
|
|
|
create and persist a running history of the incremental changes (aka the deltas) in
|
|
|
responses sent between consecutive statistic pulls, in a new table:
|
|
|
|
|
|
```
|
|
|
ResponsesSentHistory {
|
|
|
daemon_id - bigint
interval_start - timestamp // start time of this interval
duration - bigint // seconds in this interval
responses_sent - bigint // number of responses in this interval
|
|
|
}
|
|
|
```
|
|
|
|
|
|
This will produce one row per daemon per pull iteration. Each row represents
|
|
|
the difference between the previous absolute value (from the map) and the newly mined
|
|
|
absolute value for a given statistic. We also save the difference between the
|
|
|
two sample times so we have a precise measure of the interval described by the row.
|
|
|
|
|
|
If we assume a statistic pull rate of 60 seconds, this will produce 1440 rows
per daemon per day. Rows can be aged off this table once they are more than
interval_2 old.
|
|
|
|
|
|
## On each statistic pull iteration
|
|
|
|
|
|
For each Kea daemon, we do the following:
|
|
|
|
|
|
1. Pull the new statistic value from the daemon:
|
|
|
|
|
|
```
|
|
|
sampled_at := time.Now()
|
|
|
value := pktX-<type>-sent via Kea statistic-get
|
|
|
```
|
|
|
|
|
|
2. Calculate the delta:
|
|
|
|
|
|
```
|
|
|
// Fetch the previously recorded value and time recorded.
previous_sampled_at := ResponsesSent[daemon_id].SampledAt
previous_value := ResponsesSent[daemon_id].Value

if value >= previous_value {
    // New value is not smaller, so we assume we have contiguous data.
    responses_sent = value - previous_value
} else {
    // We have either a Kea restart, a statistics reset, or a statistic
    // rollover. The new value then represents the number of packets sent
    // since that event occurred.
    responses_sent = value
}

// Calculate the time between the two samples.
duration := sampled_at - previous_sampled_at
|
|
|
```
|
|
|
|
|
|
3. Insert a new row into ResponsesSentHistory
|
|
|
|
|
|
```
|
|
|
insert into ResponsesSentHistory (daemon_id, interval_start, responses_sent, duration)
|
|
|
```
|
|
|
|
|
|
4. Update previous values in ResponsesSent:
|
|
|
|
|
|
```
|
|
|
ResponsesSent[daemon_id] = SampledValue{SampledAt: sampled_at, Value: value}
|
|
|
```
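Steps 2 through 4 can be sketched together as one runnable function. This is a sketch only: the names recordSample and responsesSent are hypothetical, and the real puller would also perform the step 3 insert.

```go
package main

import "fmt"

type SampledValue struct {
	SampledAt int64 // time statistic was recorded (secs since epoch)
	Value     int64 // value of pkt4-ack-sent or pkt6-reply-sent
}

// responsesSent holds the last sample per daemon, as described above.
var responsesSent = map[int64]SampledValue{}

// recordSample performs steps 2 and 4 for one daemon and returns the row
// that step 3 would insert: (responses sent, interval duration in seconds).
func recordSample(daemonID, value, sampledAt int64) (respSent, duration int64) {
	prev, seen := responsesSent[daemonID]
	if seen && value >= prev.Value {
		// New value is not smaller: assume contiguous data.
		respSent = value - prev.Value
	} else {
		// First sample, or a Kea restart / statistics reset / rollover:
		// the absolute value is the count since that event.
		respSent = value
	}
	if seen {
		duration = sampledAt - prev.SampledAt
	}
	responsesSent[daemonID] = SampledValue{SampledAt: sampledAt, Value: value}
	return respSent, duration
}

func main() {
	recordSample(1, 1000, 1000)      // first sample, no prior data
	r, d := recordSample(1, 1300, 1060) // one 60 second pull cycle later
	fmt.Println(r, d) // 300 60
}
```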
|
|
|
|
|
|
After all daemons have been processed, records older than the current time minus (interval_2 + interval_1) could be deleted.
|
|
|
|
|
|
## Fetching RPS for Display
|
|
|
The RPS for all daemons for an interval could be fetched in a single SELECT:
|
|
|
|
|
|
```
|
|
|
SELECT daemon_id, SUM(responses_sent) AS responses, SUM(duration) AS duration
FROM ResponsesSentHistory
WHERE interval_start >= ? AND interval_start < ?
GROUP BY daemon_id;
|
|
|
```
|
|
|
|
|
|
This would produce a single row per daemon:
|
|
|
|
|
|
daemon_id, responses, duration
|
|
|
|
|
|
RPS = responses / duration
|
|
|
|
|
|
If the duration is less than the desired interval minus some tolerance, we can earmark the value.
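Turning the summed query results into a display value and a "partial interval" flag could look like the following. The name displayRPS and its parameters are hypothetical, meant only to illustrate the earmarking rule.

```go
package main

import "fmt"

// displayRPS converts one daemon's summed responses and duration into the
// RPS value and a flag marking an incomplete interval. Hypothetical helper.
func displayRPS(responses, duration, wantedSecs, toleranceSecs int64) (rps float64, partial bool) {
	if duration > 0 {
		rps = float64(responses) / float64(duration)
	}
	partial = duration < wantedSecs-toleranceSecs
	return rps, partial
}

func main() {
	// A full 15 minute interval:
	r, p := displayRPS(27000, 900, 900, 60)
	fmt.Println(r, p) // 30 false

	// Only 12 hours of data against the 24 hour column:
	r, p = displayRPS(43200, 43200, 86400, 120)
	fmt.Println(r, p) // 1 true
}
```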
|
|
|
|
|
|
## Questions
|
|
|
1. What do you think about reusing old records in ResponsesSentHistory instead of deleting them?
When new values are about to be inserted, we would first look for an old record.
If one is found, it would be reused for the new data; otherwise an insert would be used.
This way we do not have to look for old records, delete them, and then wait for
PostgreSQL's vacuum to reclaim storage.
|
|
|
|
|
|
Response: It is simpler and possibly faster to "add to the end" and age off "from the beginning". This lets the table operate like a FIFO queue and the logic is straightforward. Trying to reuse existing but obsolete records would be more complicated and possibly slower; it would require more trips to the database or a stored procedure. I don't think we are looking at a volume of records where waiting for PostgreSQL to reclaim space is going to be an issue; if it is, then it isn't tuned properly. I believe we should follow the KISS principle here, until or unless it proves insufficient. Either way, the implementation could be changed readily enough.
|
|
|
|
|
|
## Authors (please add yourself when you contribute):
|
|
|
|
|
|
List of authors as of July 1st, 2020:
|
|
|
* Thomas Markwalder |
|
|