Thomas Markwalder · 010279b7
--- a/designs/Responses-Per-Second-Statistics-for-Kea.md
+++ b/designs/Responses-Per-Second-Statistics-for-Kea.md
+# Overview
+
+Issue #252 calls for adding a Leases Per Second statistic to Kea.  This discussion
+describes the basic design to accomplish this.  After some debate, the name of
+the new value will actually be Responses Per Second or RPS.
+
+(Note: this discussion uses pseudo-code that resembles Go and/or SQL to convey
+ideas only; please do not focus on any lack of syntactic accuracy.)
+
+The dashboard will show two values for RPS per Kea daemon, measured at different
+intervals:
+
+- interval_1 = 15 minutes
+- interval_2 = 24 hours
+
+These values could be configurable. If so, we should enforce that:
+
+- interval_1 < interval_2 
+- interval_1 > (statistics pull rate * 2)
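+
+If the intervals do become configurable, the two constraints could be checked
+when the configuration is loaded. The following is a minimal sketch only; the
+function name and the use of plain seconds for every parameter are assumptions
+made for illustration, not part of the design.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+// validateIntervals checks the two constraints listed above. The name and
+// signature are hypothetical; all arguments are in seconds, and pullSecs
+// stands for the statistics pull rate.
+func validateIntervals(interval1Secs, interval2Secs, pullSecs int64) error {
+	if interval1Secs >= interval2Secs {
+		return errors.New("interval_1 must be shorter than interval_2")
+	}
+	if interval1Secs <= 2*pullSecs {
+		return errors.New("interval_1 must exceed twice the statistics pull rate")
+	}
+	return nil
+}
+
+func main() {
+	// 15 minutes and 24 hours, with an assumed 60-second pull rate.
+	err := validateIntervals(15*60, 24*60*60, 60)
+	fmt.Println("defaults valid:", err == nil)
+}
+```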
+
+RPS is loosely calculated as:
+
+    (number of response packets sent during interval) / (interval width in secs)
+
+where response packets are:
+
+    DHCPv4 = DHCPACKs
+    DHCPv6 = DHCPV6_REPLYs
+
+If at some point we want to add additional packet types (e.g. DHCPOFFERs
+and DHCPV6_ADVERTISEs), we can, and the label will still be meaningful.
+
+If the data we have for a given Kea daemon does not span an entire interval, we will
+display the value based on the data we do have.  We could toggle the column color or
+put an asterisk next to it to signify that we do not yet have a full interval.
+For example, if we have only 12 hours of data, we could alter the 24 hour column's
+appearance.
+
+# Getting the Data Needed
+
+We will not be adding anything new to Kea to support this.  The data will be derived
+from the following existing Kea statistics:
+
+- pkt4-ack-sent (v4 servers) 
+- pkt6-reply-sent (v6 servers). 
+
+These statistics are not currently mined by Stork and so the Kea StatsPuller will
+need to be extended to retrieve and store them.  Alternatively, we could add a
+new puller if we want more individualized control. 
+
+We will need to retain the last recorded value and sample time for this statistic
+for each daemon.  We can use a map of values:
+
+```
+    type SampledValue struct {
+        Sampled_at int64  // time statistic was recorded (secs since epoch)
+        Value      int64  // e.g. value of pkt4-ack-sent or pkt6-reply-sent 
+    }
+
+    ResponsesSent := make(map[int64]SampledValue)  // keyed by daemon_id
+```
+
+These values are a Kea daemon's running count of how many responses it has sent
+since startup, statistics reset, or rollover (unlikely).  For now this map will
+likely only be held in memory and not persisted to storage.
+
+We will use these values along with the value pulled at each pull cycle to
+create and persist a running history of the incremental changes (aka the deltas) in 
+responses sent between consecutive statistic pulls, in a new table:
+
+```
+    ResponsesSentHistory {
+        daemon_id        - bigint
+        interval_start   - timestamp // timestamp of this interval 
+        duration         - bigint    // seconds in this interval
+        responses_sent   - bigint    // number of responses in this interval
+    }
+```
+
+This will produce one row per daemon per pull iteration.  Each row represents 
+the difference between the previous absolute value (from the map) and the newly mined 
+absolute value for a given statistic.  We also save the difference between the
+two sample times so we have a precise measure of the interval described by the row.
+
+If we assume one statistic pull every 60 seconds, then this will produce 1440 rows
+per daemon per 24 hours (86400 / 60). Rows can be aged off this table once they
+are more than interval_2 old.
+
+## On each statistic pull iteration
+
+For each Kea daemon, we do the following:
+
+1. Pull the new statistic value from the daemon:
+
+```  
+    sampled_at := time.Now().Unix()  // secs since epoch, to match SampledValue
+    value := pktX-<type>-sent from Kea getStatistic()
+```  
+
+2. Calculate the delta:
+
+```
+    // Fetch the previously recorded value and the time it was recorded.
+    previous_sampled_at := ResponsesSent[daemon_id].Sampled_at
+    previous_value := ResponsesSent[daemon_id].Value
+
+    if (value >= previous_value) {
+        // New value is not smaller, so we assume we have contiguous data.
+        // An unchanged value means no responses were sent (delta of 0).
+        responses_sent = value - previous_value
+    } else {
+        // Kea has restarted, the statistic was reset, or it rolled over. The
+        // value then represents the number of packets sent since that event.
+        responses_sent = value
+    }
+
+    // Calculate the time between the two samples.
+    duration := sampled_at - previous_sampled_at
+```
+
+3. Insert a new row into ResponsesSentHistory
+
+```  
+    insert (daemon_id, sampled_at, responses_sent, duration)
+```  
+
+4. Update previous values in ResponsesSent:   
+
+```  
+    // Go cannot assign to a struct field inside a map, so replace the entry.
+    ResponsesSent[daemon_id] = SampledValue{Sampled_at: sampled_at, Value: value}
+```  
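+
+Steps 2 through 4 can be sketched as a single Go function. This is a rough
+illustration under stated assumptions, not the actual puller code: the
+`HistoryRow` type and the `recordSample` function are hypothetical names, the
+database insert is left to the caller, and the choice to skip the very first
+sample from a daemon (when the map has no previous value to diff against) is an
+assumption this document does not spell out. An unchanged counter is treated as
+a delta of zero.
+
+```go
+package main
+
+import "fmt"
+
+// SampledValue mirrors the struct defined earlier in this document.
+type SampledValue struct {
+	Sampled_at int64 // time statistic was recorded (secs since epoch)
+	Value      int64 // running count reported by the daemon
+}
+
+// HistoryRow is a hypothetical in-memory form of a ResponsesSentHistory row.
+type HistoryRow struct {
+	DaemonID      int64
+	SampledAt     int64
+	Duration      int64
+	ResponsesSent int64
+}
+
+// recordSample applies steps 2 and 4 to one new sample and returns the row
+// that step 3 would insert. ok is false for the first sample from a daemon
+// (an assumption: there is nothing to diff against yet).
+func recordSample(last map[int64]SampledValue, daemonID, sampledAt, value int64) (row HistoryRow, ok bool) {
+	prev, seen := last[daemonID]
+	// Step 4: remember the new absolute value for the next pull.
+	last[daemonID] = SampledValue{Sampled_at: sampledAt, Value: value}
+	if !seen {
+		return HistoryRow{}, false
+	}
+	// Step 2: compute the delta between consecutive samples.
+	delta := value - prev.Value
+	if delta < 0 {
+		// Restart, reset, or rollover: count what was sent since that event.
+		delta = value
+	}
+	return HistoryRow{
+		DaemonID:      daemonID,
+		SampledAt:     sampledAt,
+		Duration:      sampledAt - prev.Sampled_at,
+		ResponsesSent: delta,
+	}, true
+}
+
+func main() {
+	last := make(map[int64]SampledValue)
+	// Four pulls from daemon 1: growth, an idle minute, then a counter reset.
+	samples := []struct{ at, value int64 }{{1000, 500}, {1060, 800}, {1120, 800}, {1180, 40}}
+	for _, s := range samples {
+		if row, ok := recordSample(last, 1, s.at, s.value); ok {
+			fmt.Printf("%+v\n", row)
+		}
+	}
+}
+```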
+
+After all daemons have been processed, records older than the current time - (interval_2 + interval_1) could be deleted.
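+
+The retention arithmetic is simple enough to show directly. The sketch below
+uses hypothetical helper names and plain seconds throughout; it only
+illustrates how the deletion cutoff and the expected number of rows per daemon
+follow from the intervals and the pull rate.
+
+```go
+package main
+
+import "fmt"
+
+// pruneCutoff returns the epoch second before which history rows could be
+// deleted: current time - (interval_2 + interval_1). Hypothetical helper.
+func pruneCutoff(nowSecs, interval1Secs, interval2Secs int64) int64 {
+	return nowSecs - (interval2Secs + interval1Secs)
+}
+
+// rowsPerDaemon estimates how many history rows one daemon accumulates over
+// windowSecs when a row is written every pullSecs. Hypothetical helper.
+func rowsPerDaemon(windowSecs, pullSecs int64) int64 {
+	if pullSecs <= 0 {
+		return 0
+	}
+	return windowSecs / pullSecs
+}
+
+func main() {
+	const interval1, interval2, pull = 15 * 60, 24 * 60 * 60, 60
+	fmt.Println("rows per daemon over interval_2:", rowsPerDaemon(interval2, pull))
+	fmt.Println("cutoff:", pruneCutoff(1600000000, interval1, interval2))
+}
+```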
+
+## Fetching RPS for Display
+The RPS for all daemons for an interval could be fetched in a single select:
+ 
+```  
+    SELECT daemon_id, SUM(responses_sent) AS responses, SUM(duration) AS duration
+        FROM ResponsesSentHistory
+        WHERE interval_start >= ? AND interval_start < ?
+        GROUP BY daemon_id;
+```  
+
+   This would produce a single row per daemon:
+
+     daemon_id,  responses, duration
+
+     RPS = responses / duration
+
+   If the duration is less than the desired interval minus some tolerance, we can flag the value as covering only a partial interval.
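+
+Turning one aggregated row into a displayable value could look like the sketch
+below. The function name, the tolerance parameter, and the handling of a zero
+duration are assumptions for illustration; the design above only states that
+RPS is responses divided by duration and that short durations can be marked.
+
+```go
+package main
+
+import "fmt"
+
+// rps converts one aggregated query row into a rate. partial reports whether
+// the summed duration falls short of the requested interval by more than
+// toleranceSecs, in which case the dashboard could mark the value.
+func rps(responses, durationSecs, intervalSecs, toleranceSecs int64) (rate float64, partial bool) {
+	if durationSecs <= 0 {
+		// Assumption: no usable history yet, so report 0 and mark it.
+		return 0, true
+	}
+	rate = float64(responses) / float64(durationSecs)
+	partial = durationSecs < intervalSecs-toleranceSecs
+	return rate, partial
+}
+
+func main() {
+	// 12 hours of data shown in the 24 hour column, with a 2 minute tolerance.
+	rate, partial := rps(54000, 43200, 86400, 120)
+	fmt.Printf("rps=%.2f partial=%v\n", rate, partial)
+}
+```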
+
+## Authors (please add yourself when you contribute):
+
+List of authors as of July 1st, 2020:
+* Thomas Markwalder
\ No newline at end of file