stork issues
https://gitlab.isc.org/isc-projects/stork/-/issues
2023-05-09T15:48:55Z

https://gitlab.isc.org/isc-projects/stork/-/issues/777
Log client fingerprinting data
2023-05-09T15:48:55Z
Vicky Risk (vicky@isc.org)
Identifying client device types via 'fingerprinting' is a common feature of DHCP management utilities. Some users who want to do this themselves are asking where they can get the raw data from Kea. They could then use one of the open source databases such as https://fingerbank.org to determine via post processing what device type the client most likely is.
Can we log the order in which the client requests options as well as the vendor ID for use by a fingerprinting service? (e.g. options 55 and 60 from the REQUEST for DHCPv4)
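As a rough illustration of the raw data being asked for, the sketch below pulls option 55 (Parameter Request List, keeping the order the client used) and option 60 (Vendor Class Identifier) out of a DHCPv4 options field. It is not Kea hook code: the function names and the output shape are hypothetical, and the input is assumed to be the option bytes that follow the DHCP magic cookie.

```python
def parse_dhcpv4_options(options: bytes) -> dict:
    """Parse a DHCPv4 options field (code/length/value triples) into a dict."""
    parsed = {}
    i = 0
    while i < len(options):
        code = options[i]
        if code == 0:          # Pad option: a single byte with no length
            i += 1
            continue
        if code == 255:        # End option: stop parsing
            break
        if i + 1 >= len(options):
            break              # truncated option, nothing more to read
        length = options[i + 1]
        parsed[code] = options[i + 2 : i + 2 + length]
        i += 2 + length
    return parsed


def fingerprint_fields(options: bytes) -> dict:
    """Return the two fields a fingerprinting service would typically want.

    The key names here are illustrative assumptions, not a Kea log format.
    """
    opts = parse_dhcpv4_options(options)
    request_list = opts.get(55, b"")   # option 55: requested option codes, in order
    vendor_class = opts.get(60, b"")   # option 60: vendor class identifier
    return {
        "param_request_list": ",".join(str(code) for code in request_list),
        "vendor_class_id": vendor_class.decode("ascii", errors="replace"),
    }


# Example: message type (53), then option 55 with four codes, then option 60.
sample = bytes([53, 1, 3, 55, 4, 1, 3, 6, 15, 60, 8]) + b"MSFT 5.0" + bytes([255])
fields = fingerprint_fields(sample)
```

A comma-separated list of option codes in request order is the kind of string that could then be handed to a lookup service in post processing.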
This could be added to the existing forensic log hook, or we could create another hook.
outstanding

https://gitlab.isc.org/isc-projects/stork/-/issues/37
Req: 1.1.1 - Import application server list
2020-11-10T16:33:36Z
Vicky Risk (vicky@isc.org)
As an admin I can bulk import a list of BIND, Kea and/or Prometheus servers in a .csv formatted file into Stork. This spreadsheet may have columns that don't match those that Stork wants, so the columns will need to be examined and some UI provided for selecting the columns to match to the data required by Stork.
outstanding

https://gitlab.isc.org/isc-projects/stork/-/issues/44
Req 2.1 - Leases list
2022-09-12T13:42:40Z
Vicky Risk (vicky@isc.org)
As an administrator I want to browse a list of leases sorted by default from most recent to oldest, with sorting by any fields in the lease. (Perhaps we can limit the fields to sort on once we see what fields are available, in order to make the sorting faster with indices.)
I am often going to be looking for information on a particular device or lease, so I want to search based on MAC address or IP address (omnibox search).
1. [ ] I would like to see current leases as well as historical information in the same display, *if possible*. So, for example, if I enter a MAC address, I would want to see the current lease and prior lease(s).
1. [ ] This will be a long, multi-page list so it would be convenient if I could filter based on some column contents (such as a partial MAC or IP address)
1. [ ] It is not necessary that this be updated in real time. A list of leases that is current as of say, 5 - 15 minutes ago is adequate. It is more important that the lookup is responsive.
1. [ ] It is not necessary that the screen is refreshed automatically while this panel is active on the display, it is ok to require that I push a button to 'refresh' the list. I would prefer that there is some data in the panel when I click on it, rather than having it blank and have to wait for it to populate.
1. [ ] This should not require querying all the DHCP servers; it should come from a central lease db in Stork. I am thinking it is updated by notification from the DHCP servers, after some initialization process where it gets all the current leases.
Details
* If we can also do a reverse DNS lookup on the IP address (this can be a process triggered by the admin, it doesn't have to happen magically) to populate a hostname field, that would be good too.
* The Lease list must also include which *server* owns the lease.
* I may need to save a lot of lease history. It should be possible for me to configure how many hours or days of history I want to preserve, and at some point we should have some kind of log rotation.
* [ ] Total active leases per server
* [ ] Total active leases per service (if there is more than 1 server in the service)
* [ ] # of New leases in the past (configurable period, start with 15 minutes)
* [ ] Leases per second (could be a toggle to display this or the # of new leases in the past period). both metrics could be based on just the most recent period.
* [ ] Historic register of each MAC address seen by the system, with leases assigned, dates, times. This may be exactly the same as the forensic log, so it is fine if we just link to the default location for the forensic log.
* [ ] Ability to browse current lease list. This does not have to be real time, and can/probably should be read-only. This should work for Memfile.
* [ ] Ability to browse current lease list. This does not have to be real time, and can/probably should be read-only. This should work for lease backend.
backlog

https://gitlab.isc.org/isc-projects/stork/-/issues/46
Req 2.3 - Kea Degradation Canary
2023-07-25T13:39:26Z
Vicky Risk (vicky@isc.org)
As an administrator, I need a clear visual indicator when a Kea server/service is becoming overloaded. This alerts me that I need to take some action to prevent further degradation or failure of the service.
As an administrator, if this alarm occurs frequently I would like to be able to customize the level that constitutes an alarming value.
If there is a separate panel of alerts or logged events, I would expect to see these threshold-crossing alarms included there.
It would be ideal if this is available without requiring that I install Grafana or Prometheus, as I may have a small deployment of one or two servers.
possible use cases:
- increasing `secs` reported by clients
- for users with an external lease db, a query that times how long the db takes to do a select, to see whether the db itself or the connection to the db is degraded
- any sort of statistics about the ring buffer, to alert when the buffer is growing excessively (this might be possible with the Stork agent but not with Kea)
- something that could help people detect conflicts when they are running multiple Keas with the same address range, using a shared lease db, because these can also lead to cascading performance issues
Details
* We will need to decide what metric or combination of metrics to base this alarm condition on.
* We discussed the fact that increasing delay in responding to client requests might be an indicator of a service degradation and a leading indicator of Kea server failure.
backlog

https://gitlab.isc.org/isc-projects/stork/-/issues/48
Req 2.4.2 - Pool Utilization alerts
2022-11-16T11:54:51Z
Vicky Risk (vicky@isc.org)
As a user I need to quickly be able to see the utilization % (per pool), and quickly identify pools that have very high utilization. This is often done with colors or other graphical indicators to make it easy to spot pools with high utilization with a quick visual scan.
See req 2.4 (subnets list) and 2.4.1 (displaying pool utilization) for dependencies required for this.
1. [ ] I would like to have some sort of ALARM or alert when a pool passes a threshold of utilization so this can be displayed at the server level rather than the pool level.
1. [ ] I would like to be able to configure the threshold that constitutes an alert.
1. [ ] I would expect this alert to be prominent, and included in an alerts panel so that I will notice that I need to take action or clients may be denied services.
1. [ ] I would also like to identify pools with very low utilization, because I am going to have to make a configuration change to ensure address availability and I may want to identify a low-utilization pool for a shared network or something.
* [ ] This request for an alert refers to the Stork UI. If it is possible to also have an alarm raised in Prometheus and/or Grafana, that is highly desirable because those apps will have the ability to send these alarms to external services (email, Slack, PagerDuty).
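The threshold behaviour requested in the list above could be sketched roughly as follows. This is only an illustration, not Stork code: the `Pool` class, the function names, and the 10% low-utilization threshold are assumptions, and the 80% high threshold is simply the value floated as a possible default in this issue.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    server: str      # server that owns the pool (hypothetical field)
    name: str        # pool label, e.g. an address range
    total: int       # total addresses in the pool
    assigned: int    # addresses currently leased

    @property
    def utilization(self) -> float:
        """Utilization as a percentage; an empty pool counts as 0%."""
        return 100.0 * self.assigned / self.total if self.total else 0.0


def classify(pool: Pool, high: float = 80.0, low: float = 10.0) -> str:
    """Label a pool 'high', 'low' or 'ok' against configurable thresholds."""
    if pool.utilization >= high:
        return "high"
    if pool.utilization <= low:
        return "low"
    return "ok"


def servers_with_alerts(pools, high: float = 80.0) -> dict:
    """Roll pool-level alerts up to the server level: server -> alerting pools."""
    alerts = {}
    for pool in pools:
        if classify(pool, high=high) == "high":
            alerts.setdefault(pool.server, []).append(pool.name)
    return alerts


pools = [
    Pool("kea-1", "192.0.2.0/25", total=100, assigned=85),
    Pool("kea-1", "192.0.2.128/25", total=100, assigned=5),
    Pool("kea-2", "198.51.100.0/24", total=200, assigned=100),
]
labels = [classify(p) for p in pools]
alerts = servers_with_alerts(pools)
```

Keeping both thresholds as parameters reflects the request that the alert level be configurable, and the same classification also surfaces the low-utilization pools mentioned above.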
Details
* Default pool utilization% for alarm? 80%?
* If there is any way to extrapolate from recent usage patterns to estimate how long the pool will last (e.g. you have another 24 hours until this pool will likely be exhausted) that could be cool.
backlog

https://gitlab.isc.org/isc-projects/stork/-/issues/53
Req 4.4 - Clients refusing offers
2022-11-16T11:54:51Z
Vicky Risk (vicky@isc.org)
As administrator I want to be able to see per server which clients are declining offers by subnets. (This is a possible indication that there are duplicate IP addresses out there.)
If we are fairly sure the reason is an IP address conflict, it would be good to label this in the UI as IP address conflicts. Perhaps later on we would also like to see offers not responded to. This is an indication of a misconfiguration and is useful for troubleshooting.
backlog

https://gitlab.isc.org/isc-projects/stork/-/issues/54
Req 5.1 View Zones List
2021-06-01T08:44:42Z
Vicky Risk (vicky@isc.org)
As an administrator I would like to be able to browse a list of DNS zones that I am publishing, along with a bunch of information on the zone.
1. This is likely to be a very large table, with pages of data, so I would like to be able to apply filters to make it more manageable.
1. I want to be able to accommodate up to 2M small zones, a zone with 2M RRs, and 100 views.
1. I would like to be able to sort this by zone name, zone type, time of last update (this might be the default sort), zone size? signing status (signed/unsigned/expired?), #RRs.
1. This zone list should include 'dynamic', 'traditional', catalog, automatic, mirror, root hints, forward, stub, static stub zones.
1. I would like to know the zone type and permit filtering based on zone type.
1. I would like to search based on ... (?cnames?)
1. I would like to know which slaves are publishing that zone
1. I may know a zone name, or partial zone name and will want to know more about that zone.
backlog
Matthijs Mekking (matthijs@isc.org)

https://gitlab.isc.org/isc-projects/stork/-/issues/55
Req 5.1 - Zone Transfer Impact
2021-06-01T08:44:42Z
Vicky Risk (vicky@isc.org)
From BIND GL issue #513
As an administrator I need to determine the impact of large zone updates on operations.
I may see a drop in QPS performance and want to investigate whether this was caused by a large zone transfer. I will need to see information that will help me identify which zone, how large it is, when it was updated, so that I can see if I can adjust the configuration to ameliorate the impact of large zone transfers.
Details
* Add metrics on the size of the IXFRs e.g. min, max and average size of IXFRs
* Add the same details to the XFR log on the master that are reported on the secondary:
* transfer of 'example.com/IN' from 127.0.0.1#7753: Transfer completed: 1 messages, 14 records, 986 bytes, 0.001 secs (986000 bytes/sec). The log on the master currently only reports that the transfer started and ended.
backlog

https://gitlab.isc.org/isc-projects/stork/-/issues/56
Req 5.3 - View Zone Status
2021-06-01T08:44:40Z
Vicky Risk (vicky@isc.org)
From a user: "It would be very helpful for us to have the various zone timers exposed through the statistics channel. The information is currently available through `rndc zonestatus`, but it would be far easier for us to monitor the servers if this were accessible through the stats channel.
Our use case would be to monitor for zones approaching expiration. We'd like to use the stats channel to pull the full list of zones with the timers in one operation, and then parse the data."
backlog

https://gitlab.isc.org/isc-projects/stork/-/issues/57
Req 5.4 - Zone Signing Status
2021-06-01T08:44:41Z
Vicky Risk (vicky@isc.org)
As an admin I want to see DNSSEC details, key information, signature validity period, when is the next key rollover, when is the next resign, and what is the zone that will be resigned next.. nsec3
backlog

https://gitlab.isc.org/isc-projects/stork/-/issues/58
Req 5.5 - View NTAs
2023-04-11T16:19:44Z
Vicky Risk (vicky@isc.org)
As an administrator, I need to see what Negative trust anchors are configured. I may have help desk staff that need to be prepared to answer questions about zones that may stop validating.
Questions I have:
* What NTAs are active?
* For the NTAs configured, when do they expire?
* I also want to see any 'permanent NTAs' (zones listed in `validate-except`).
backlog

https://gitlab.isc.org/isc-projects/stork/-/issues/60
Req 5.7 - View RPZ Statistics
2021-06-01T08:44:42Z
Vicky Risk (vicky@isc.org)
As an administrator I need to know how much of an impact RPZ is having.
I may be either introducing RPZ for the first time, or trialing an additional RPZ feed and attempting to evaluate how many more matches are found with the addition of a new zone(s). I would like to be able to report the number of possible 'bad' queries blocked to management, to justify the cost of commercial RPZ feeds.
The most basic metric is a global counter (e.g. at 15 minute intervals) of RPZ matches. If we just have a global counter of RPZ matches, then if the user adds a new RPZ feed, they can look to see how much that number changed by.
backlog

https://gitlab.isc.org/isc-projects/stork/-/issues/61
Req 5.7.2 - RPZ Detail
2021-06-01T08:44:42Z
Vicky Risk (vicky@isc.org)
As a user, I would like to know how many RPZ matches are coming from *each* RPZ zone. RPZ zones are evaluated in the order they are configured, so if two zones include the same filter, the 'match' will be attributed to the first RPZ listed.
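To make the attribution rule concrete, here is a deliberately simplified model of first-match counting. It assumes each policy zone is just a set of exact query names, which ignores the other trigger types and precedence rules real RPZ processing has; the function name and data layout are hypothetical.

```python
from collections import Counter


def attribute_matches(rpz_zones, queries):
    """Count matches per RPZ zone, crediting only the first zone that matches.

    rpz_zones: list of (zone_name, set_of_blocked_names) in configured order.
    queries:   iterable of query names seen by the resolver.
    """
    counts = Counter({zone_name: 0 for zone_name, _ in rpz_zones})
    for qname in queries:
        for zone_name, blocked in rpz_zones:
            if qname in blocked:
                counts[zone_name] += 1
                break  # later zones listing the same name get no credit
    return dict(counts)


zones = [
    ("feed-a", {"bad.example", "evil.example"}),
    ("feed-b", {"bad.example", "worse.example"}),
]
seen = ["bad.example", "worse.example", "good.example", "bad.example"]
per_zone = attribute_matches(zones, seen)
```

In this example both feeds list `bad.example`, but every hit on it is credited to `feed-a` because it is configured first, so per-zone numbers only make sense alongside the zone order.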
If we can communicate this (the order of the RPZ zones and its relationship to how many answers were blocked by each zone) in the UI that would be helpful.
backlog

https://gitlab.isc.org/isc-projects/stork/-/issues/62
Req 5.7.1 - RPZ Response Actions
2021-06-01T08:44:41Z
Vicky Risk (vicky@isc.org)
As a user, I would like to investigate RPZ matches to determine or estimate the type of abuse being blocked by RPZ.
I can extrapolate the type of abuse (malware, legal filtering, etc) based on the type of RPZ action.
Report statistics on the type of RPZ action taken (type of action, rewrites, NXDOMAIN etc.)
backlog

https://gitlab.isc.org/isc-projects/stork/-/issues/65
Req 6.5 - Cache Details
2021-06-01T08:44:41Z
Vicky Risk (vicky@isc.org)
As a user I would like to see details on what is in the cache in order to determine why the cache hit ratio might be low. The purpose of displaying this data is to help guide me about configuration settings that could improve the cache effectiveness.
Useful details would include
* cache size (memory, # of records)
* average ttl of records in cache (perhaps also min and max ttl?)
* breakdown by record type, status (valid vs expired)
* LRU of records pre-fetched
* LRU of records that expired without being re-queried
* top 500(?) records most frequently queried
* cache cleaning (how dirty is the cache)
backlog

https://gitlab.isc.org/isc-projects/stork/-/issues/66
Req 6.7 - Memory Utilization
2021-06-01T08:44:42Z
Vicky Risk (vicky@isc.org)
As a user, I would like to know what named's current memory allocation is being used for.
* If I am running low on available memory, I want to identify possible options for reducing memory consumption with a configuration change.
* Alternatively, this will help me identify 'runaway' processes that are eating memory and not freeing it as part of a troubleshooting exercise.
* When I am operating a hybrid server I need to see the amount of memory being used for auth vs recursive functions.
Some of this information may be available by querying the machine rather than the service.
We may want to review what would be presented. As an operator, I am not going to benefit from really cryptic references to processes inside BIND that I cannot control or stop. However, ISC technical support might want some long list of arcane stuff that I cannot interpret.
backlog

https://gitlab.isc.org/isc-projects/stork/-/issues/67
Req 7.1 - Performance Troubleshooting
2021-06-01T08:44:41Z
Vicky Risk (vicky@isc.org)
As a user, I am looking for information that should be flagged that may help me understand what is limiting performance currently.
I am particularly concerned about maximizing performance of my resolver.
What are the critical resources I need to monitor, besides memory?
- [x] CPU
- [ ] threads
- [ ] sockets??
- [ ] TCP connections
- [ ] 'clients'?
what else?
What information is available on what is tying up these resources?
Quote from Cathy: "What is BIND doing (while it is eating memory, eating CPU, not responding, apparently twiddling its thumbs or ..?)"
backlog

https://gitlab.isc.org/isc-projects/stork/-/issues/68
Req 7.2 - Throttling
2021-06-01T08:44:41Z
Vicky Risk (vicky@isc.org)
I would like to know if I am throttling traffic based on configured limits. If so, I might want to change these limits to throttle more or less.
These limits are typically designed to protect the system from being overwhelmed in case of a DDoS. However, sometimes the throttles are set low enough that they impact throughput unnecessarily during normal operation.
Priorities
* Fetch-limits
* clients per query
* client-quotas
* TCP quotas
* RRL
? Is this server being throttled by fetch-limits or is this zone being throttled by fetch-limits?
Log instances of crossing the thresholds where throttling kicks in, and again when you cross the threshold on the way down.
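One way to read this request is as edge-triggered logging: an event is recorded only when the measured value moves across the limit, not on every sample above it. The sketch below assumes a generic stream of `(timestamp, value)` samples and a single numeric threshold; the function name and event strings are made up for illustration and do not correspond to any existing BIND or Stork log message.

```python
def crossing_events(samples, threshold):
    """Return (timestamp, event) pairs for each crossing of the threshold.

    samples:   iterable of (timestamp, value) in time order.
    threshold: value at or above which throttling is considered active.
    """
    events = []
    throttling = False
    for timestamp, value in samples:
        active = value >= threshold
        if active and not throttling:
            events.append((timestamp, "throttling started"))   # crossed on the way up
        elif throttling and not active:
            events.append((timestamp, "throttling stopped"))   # crossed on the way down
        throttling = active
    return events


samples = [(0, 5), (1, 12), (2, 15), (3, 8), (4, 11)]
events = crossing_events(samples, threshold=10)
```

Logging only the transitions keeps the log short during a sustained event while still showing how long each throttling episode lasted.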
Several specific problems we would like to address are:
* https://gitlab.isc.org/isc-projects/bind9/issues/665 Add "rndc fetchlimits" command to dump currently-active ADB rate-limited servers and zones
* https://gitlab.isc.org/isc-projects/bind9/issues/915 Add ability to determine frozen zones
* https://gitlab.isc.org/isc-projects/bind9/issues/1232 [ISC-support #15166] expose zone timers (reload, refresh, expire) via stats channel
backlog

https://gitlab.isc.org/isc-projects/stork/-/issues/69
Req 7.2.1 - Throttling and cookies
2021-06-01T08:44:41Z
Vicky Risk (vicky@isc.org)
As an operator, I would like to know: what % of clients are avoiding RRL by providing cookies?
backlog

https://gitlab.isc.org/isc-projects/stork/-/issues/72
Req 7.4 - Cache cleanup
2021-06-01T08:44:42Z
Vicky Risk (vicky@isc.org)
As an administrator of a resolver, I want to maximize the utility of my memory allocated for cache. I need to know: what has expired in cache and still not been cleaned up?
backlog