... | ... | @@ -86,7 +86,8 @@ Most BIND 9 users have added BIND 9 to their existing fault monitoring systems b |
|
|
| 5.4 | Zone signing status | DNSSEC details, key information, signature validity period | | 1 |
|
|
|
| 5.5 | Zone/rr signing performance | monitoring where BIND is in signing and resigning new or updated zones, both the status and time it takes to complete the signing operation. I realize this is potentially very detailed and complicated, but think of the use case where an auth publisher has a few very large zones - how can they track their signing process? | | 1 |
|
|
|
| 5.6 | Query activity | Queries per second received and answered, with a time series so you can identify changes from the usual pattern, including time-of-day (TOD) patterns. | | 1 |
|
|
|
| 5.7 | RPZ reporting | logging of RPZ 'matches', with the name of the RPZ, name of the answer zone and action taken (rewrites, nxdomain, etc.) and counters (e.g. 15 minute intervals). This is for the purpose of proving to management that the RPZ service is worthwhile and impactful. The user wants to know how much of an impact each RPZ zone is having. | | 1 |
|
|
|
| 5.7 | RPZ statistics | logging of RPZ 'matches', with the name of the RPZ, name of the answer zone and action taken (rewrites, nxdomain, etc.) and counters (e.g. 15 minute intervals). This is for the purpose of proving to management that the RPZ service is worthwhile and impactful. The user wants to know how much of an impact each RPZ zone is having. | | 1 |
|
|
|
| 5.7 | RPZ analysis | logging of RPZ 'matches', with the name of the RPZ, name of the answer zone and action taken (rewrites, nxdomain, etc.) and counters (e.g. 15 minute intervals). This is for the purpose of proving to management that the RPZ service is worthwhile and impactful. The user wants to know how much of an impact each RPZ zone is having. | | 2 |
|
|
|
| 5.8 | RPZ client reporting | log of clients that are presumably infected based on their DNS requests for malware zones. | | >1 |
|
|
|
|
|
|
|
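The RPZ counters described in 5.7 could be sketched roughly as follows. This is only an illustration of bucketing match events into 15 minute intervals per RPZ and action; the event tuple layout `(timestamp, rpz_name, action)` and the function name `rpz_counts` are assumptions made for the example, not an existing BIND log format or API.

```python
from collections import Counter

INTERVAL = 15 * 60  # 15 minute buckets, in seconds


def rpz_counts(events, interval=INTERVAL):
    """Count RPZ matches per (bucket start, RPZ name, action).

    `events` is an iterable of hypothetical (timestamp, rpz_name, action)
    tuples, with the timestamp in seconds.
    """
    counts = Counter()
    for ts, rpz_name, action in events:
        # Round the timestamp down to the start of its interval.
        bucket = int(ts // interval) * interval
        counts[(bucket, rpz_name, action)] += 1
    return counts


events = [
    (0, "rpz.example", "nxdomain"),
    (899, "rpz.example", "nxdomain"),
    (900, "rpz.example", "rewrite"),
    (1000, "other.example", "nxdomain"),
]
counts = rpz_counts(events)
```

Summing these counters per RPZ over a reporting period would give the per-zone impact figure the user is asking for.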
... | ... | @@ -95,13 +96,13 @@ Sophisticated DNS operators do extensive data analysis, but so much data is gene |
|
|
|
|
|
| # | Feature | Details | Feasibility | Release or GL#? |
|
|
|
| ------ | ------ | ------ | ------ | ------ |
|
|
|
| 6.1 | Query details | easily monitor the volume of queries and responses, rrtypes, response codes, by TCP vs UDP, perhaps by some response size buckets. This is a baseline function that everyone needs, and these statistics should be available on a per-server basis from BIND today. Ideal if these can be displayed both per-server and aggregated across clusters of servers. | | |
|
|
|
| 6.2 | Query & Response log analysis | One use case here is drilling down to look at actual queries during a spike in one of the usual metrics observed above to find out the query source, name queried for, etc., to identify the source of malicious traffic. This is the sort of data that would ideally be analyzed and discarded, because keeping a lot of it around would become expensive. One issue is that today, enabling DNSTAP on BIND requires restarting BIND. | | |
|
|
|
| 6.3 | Response latency | (average, max, min, mode - time between receiving a query and sending a response, as well as for resolvers, **whether the response came from cache or not**) (Serious providers will use test clients at various locations in the network to continuously test/audit the dns service. We should consider attempting to support that at some point in the future.) | | |
|
|
|
| 6.4 | Cache hit ratio | % of queries answered from cache (time series) | | |
|
|
|
| 6.5 | Cache aging | cache size, average ttl of records in cache, # of records pre-fetched and # of those that expired without being re-queried, top 500(?) records most frequently queried, cache cleaning (how dirty is the cache) | | |
|
|
|
| 6.6 | Cache visualization | some chart that helps to visualize what sort of data is in the cache, how much is being renewed with short TTLs, how much is being prefetched, etc. The ultimate goal is to help the user optimize the cache so it is most efficient for their purpose. How much memory is consumed. | | |
|
|
|
| 6.7 | Memory utilization | what is named's current memory allocation being used for. Esp needed by hybrid server operators (amt used for auth vs recursive) | | |
|
|
|
| 6.1 | Query details | easily monitor the volume of queries and responses, rrtypes, response codes, by TCP vs UDP, perhaps by some response size buckets. This is a baseline function that everyone needs, and these statistics should be available on a per-server basis from BIND today. Include queries that are dropped. Ideal if these can be displayed both per-server and aggregated across clusters of servers. | | 1 |
|
|
|
| 6.2 | Query & Response log analysis | One use case here is drilling down to look at actual queries during a spike in one of the usual metrics observed above to find out the query source, name queried for, etc., to identify the source of malicious traffic. This is the sort of data that would ideally be analyzed and discarded, because keeping a lot of it around would become expensive. One issue is that today, enabling DNSTAP on BIND requires restarting BIND. | | >1 |
|
|
|
| 6.3 | Response latency | (average, max, min, mode - time between receiving a query and sending a response, as well as for resolvers, **whether the response came from cache or not**) (Serious providers will use test clients at various locations in the network to continuously test/audit the dns service. We should consider attempting to support that at some point in the future.) | | >1 |
|
|
|
| 6.4 | Cache hit ratio | % of queries answered from cache (time series) | | 1 |
|
|
|
| 6.5 | Cache aging | cache size, average ttl of records in cache, # of records pre-fetched and # of those that expired without being re-queried, top 500(?) records most frequently queried, cache cleaning (how dirty is the cache) | | 1 |
|
|
|
| 6.6 | Cache visualization | some chart that helps to visualize what sort of data is in the cache, how much is being renewed with short TTLs, how much is being prefetched, etc. The ultimate goal is to help the user optimize the cache so it is most efficient for their purpose. How much memory is consumed. | | >1 |
|
|
|
| 6.7 | Memory utilization | what is named's current memory allocation being used for. Esp needed by hybrid server operators (amt used for auth vs recursive) | | 1 |
|
|
|
|
|
|
|
|
|
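As a rough sketch of the cache hit ratio time series in 6.4, the percentage per interval can be derived from periodic samples of cumulative hit and miss counters. The `Sample` type, its field names, and `hit_ratio_series` are hypothetical names chosen for this example; how the counters are actually collected from the server is left out.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # seconds since some epoch
    hits: int         # cumulative cache hits at this time
    misses: int       # cumulative cache misses at this time


def hit_ratio_series(samples):
    """Return [(timestamp, hit_percent)] for each interval between samples."""
    series = []
    for prev, cur in zip(samples, samples[1:]):
        hits = cur.hits - prev.hits
        misses = cur.misses - prev.misses
        if hits < 0 or misses < 0:
            continue  # counters went backwards, e.g. after a restart
        total = hits + misses
        if total == 0:
            continue  # no queries in this interval, ratio is undefined
        series.append((cur.timestamp, 100.0 * hits / total))
    return series


samples = [
    Sample(0, 0, 0),
    Sample(60, 75, 25),
    Sample(120, 75, 25),
    Sample(180, 165, 35),
]
series = hit_ratio_series(samples)
```

Working from counter deltas rather than totals keeps each point specific to its interval, so a change from the usual pattern is visible instead of being averaged away over the server's uptime.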
## BIND troubleshooting use cases
|
... | ... | @@ -109,13 +110,14 @@ These could be 'tools' or simply test cases. These are some tasks we want to fac |
|
|
|
|
|
| # | Problem | Details | Feasibility | Release or GL#? |
|
|
|
| ------ | ------ | ------ | ------ | ------ |
|
|
|
| 7.1 | Performance troubleshooting | What is BIND doing (while it is eating memory, eating CPU, not responding, apparently twiddling its thumbs or ..?) Do I need to increase any of my throttles because I'm getting close to the limits? | | |
|
|
|
| 7.2 | Cache analysis | What's in cache (by RTYPE - real entries, not the expired ones, although expired but not yet cleaned up might also be interesting) | | |
|
|
|
| 7.1 | Performance troubleshooting | What is BIND doing (while it is eating memory, eating CPU, not responding, apparently twiddling its thumbs or ..?) Do I need to increase any of my throttles because I'm getting close to the limits? | | 1 |
|
|
|
| 7.2 | Cache analysis | What's in cache (by RTYPE - real entries, not the expired ones, although expired but not yet cleaned up might also be interesting) | | 1? |
|
|
|
| 7.3 | Cache cleanup | What's expired in cache and still not cleaned up?| | |
|
|
|
| 7.4 | Memory utilization | How much memory are my auth zones occupying? How much memory is RRL using?| | |
|
|
|
| 7.5 | Throttling | Throttling features include RRL, fetch-limits, client-quotas, TCP quotas. Is this server being throttled by fetch-limits, or is this zone being throttled by fetch-limits? So, log each instance of crossing a threshold where throttling kicks in, and again when crossing back below it on the way down. | | |
|
|
|
| 7.6 | SRTT++ | See a list of servers for a domain and the current and historical srtt values for those servers. Which server will BIND query for this domain and why. Also, which servers are EDNS capable? | | |
|
|
|
| 7.7 | Cookies | what % of clients are avoiding RRL by providing cookies | | |
|
|
|
| 7.4 | Memory utilization | How much memory are my auth zones occupying? How much memory is RRL using?| | 1 |
|
|
|
| 7.5 | Throttling | Throttling features include RRL, fetch-limits, client-quotas, TCP quotas. Is this server being throttled by fetch-limits, or is this zone being throttled by fetch-limits? So, log each instance of crossing a threshold where throttling kicks in, and again when crossing back below it on the way down. | | 1 |
|
|
|
| 7.5.1 | Cookies | what % of clients are avoiding RRL by providing cookies | | ? |
|
|
|
| 7.6 | SRTT++ | See a list of servers for a domain and the current and historical srtt values for those servers. Which server will BIND query for this domain and why. Also, which servers are EDNS capable? | | 1 |
|
|
|
|
|
|
|
|
|
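The threshold logging asked for in 7.5 amounts to recording only the transitions, once on the way up and once on the way down, rather than every sample above the limit. A minimal sketch, assuming a stream of `(timestamp, value)` samples for some throttled resource and a single fixed `limit` (both hypothetical inputs for illustration, not an existing BIND interface):

```python
def threshold_crossings(samples, limit):
    """Return [(timestamp, message)] for each crossing of `limit`.

    `samples` is an iterable of (timestamp, value) pairs in time order.
    An event is emitted when the value first reaches the limit and again
    when it first drops back below it.
    """
    events = []
    throttled = False
    for ts, value in samples:
        over = value >= limit
        if over != throttled:
            message = "throttling started" if over else "throttling ended"
            events.append((ts, message))
            throttled = over
    return events


samples = [(0, 10), (1, 95), (2, 120), (3, 99), (4, 100), (5, 50)]
events = threshold_crossings(samples, limit=100)
```

A value hovering right at the limit produces an event on every flip, as the sample data shows. A real implementation might use separate upper and lower thresholds to damp that, which is a design choice outside what the table specifies.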
## Application Infrastructure
|
|
|
Web app. It is OK to support a limited set of operating systems for the platform.
|
... | ... | |