... | @@ -8,16 +8,16 @@ Stork is intended to provide a centralized focus for monitoring and managing BIN |
... | @@ -8,16 +8,16 @@ Stork is intended to provide a centralized focus for monitoring and managing BIN |
|
The main benefit in the first release will be aggregating and presenting selected data useful for common operational management of DHCP and DNS services and servers. In later releases we would also like to manage the configuration of these servers, but we are starting with monitoring. In this initial release we will need to establish an extensible, high-performance infrastructure for collecting, storing, and analyzing DNS and DHCP activity data.
|
|
The main benefit in the first release will be aggregating and presenting selected data useful for common operational management of DHCP and DNS services and servers. In later releases we would also like to manage the configuration of these servers, but we are starting with monitoring. In this initial release we will need to establish an extensible, high-performance infrastructure for collecting, storing, and analyzing DNS and DHCP activity data.
|
|
|
|
|
|
### Northbound interfaces - API and GUI
|
|
### Northbound interfaces - API and GUI
|
|
Ideally, the northbound interfaces will support both a graphical display (via an ordinary web server) and an API. When we get to the point that we are managing configurations, users will certainly want APIs to integrate with their provisioning systems. In the context of the first release, there may be a native GUI as well as an API to a data visualization tool such as Grafana, and possibly another to Nagios, Cacti, or another fault management and/or alerting system. I expect traditional BIND users will be less interested in a BIND-specific GUI; Kea users have had less time to develop their own systems and will be more interested. Kea users also have relatively more interest in a GUI to drive configuration changes because of the lack of alternatives.
|
|
Ideally, the northbound interfaces will support both a graphical display (via an ordinary web server) and an API. When we get to the point that we are managing configurations, users will certainly want APIs to integrate with their provisioning systems. In the context of the first release, there may be a native GUI as well as an API to a data visualization tool such as Grafana, and possibly another to Nagios, Cacti, or another fault management and/or alerting system. I expect traditional BIND users will be less interested in a BIND-specific GUI; Kea users have had less time to develop their own systems and will be more interested. Kea users also have relatively more interest in a GUI to drive configuration changes because of the lack of alternatives.
|
|
|
|
|
|
### Minimum useful product
|
|
### Minimum useful product
|
|
* **Kea users** - Anterius alternative, with pool utilization monitoring at a minimum.
|
|
* **Kea users** - Anterius alternative, with pool utilization monitoring at a minimum.
|
|
* **BIND users** - solve one or more use cases with respect to metrics that are not easily accomplished with existing tools: performance tuning, zone update timing, zone signing timing, or cache management.
|
|
* **BIND users** - solve one or more use cases with respect to metrics that are not easily accomplished with existing tools: performance tuning, zone update timing, zone signing timing, or cache management.
|
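
As an illustration of the pool utilization monitoring mentioned for Kea users, here is a minimal Python sketch. The `PoolStats` structure, its field names, and the 80% threshold are assumptions made for this example; they are not part of any Stork or Kea interface described in this document.

```python
from dataclasses import dataclass


@dataclass
class PoolStats:
    """Address counts for one pool or subnet (hypothetical structure)."""
    name: str
    total_addresses: int
    assigned_addresses: int


def utilization_percent(stats: PoolStats) -> float:
    """Return the share of addresses in use, as a percentage."""
    if stats.total_addresses <= 0:
        # Avoid dividing by zero for empty or misreported pools.
        return 0.0
    return 100.0 * stats.assigned_addresses / stats.total_addresses


def pools_over_threshold(pools, threshold=80.0):
    """Return pools at or above the threshold, fullest first."""
    busy = [p for p in pools if utilization_percent(p) >= threshold]
    return sorted(busy, key=utilization_percent, reverse=True)
```

A dashboard could call `pools_over_threshold` on each refresh to highlight pools that are close to exhaustion.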
|
|
|
|
|
## Glossary
|
|
## Glossary
|
|
|
|
|
|
* Roles: admin (can do all possible actions), user (has limited access)
|
|
* Roles: admin (can do all possible actions), user (has limited access)
|
|
*
|
|
*
|
|
|
|
|
|
## Common Requirements
|
|
## Common Requirements
|
|
|
|
|
... | @@ -33,11 +33,11 @@ It is ideal that the northbound interfaces support both a graphical display (via |
... | @@ -33,11 +33,11 @@ It is ideal that the northbound interfaces support both a graphical display (via |
|
| 1.9 | Import application server list | The user has an inventory of BIND and Kea servers in a spreadsheet and would like to import this into Stork without retyping it. This spreadsheet will have columns that don't match those that Stork wants, so the columns will need to be examined. | | 1 |
|
|
| 1.9 | Import application server list | The user has an inventory of BIND and Kea servers in a spreadsheet and would like to import this into Stork without retyping it. This spreadsheet will have columns that don't match those that Stork wants, so the columns will need to be examined. | | 1 |
|
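
One way the column examination in requirement 1.9 could work is an explicit mapping from spreadsheet columns to target fields, applied to a CSV export of the spreadsheet. The sketch below assumes hypothetical target field names (`address`, `app_type`); the real Stork schema is not defined in this document.

```python
import csv
import io


def import_servers(csv_text, column_map):
    """Map spreadsheet columns to target fields.

    column_map maps each target field name (hypothetical) to the
    spreadsheet column that holds it. Returns (rows, errors), where
    errors is a list of (line_number, message) tuples.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    missing = [col for col in column_map.values() if col not in headers]
    if missing:
        raise ValueError(f"columns not found in spreadsheet: {missing}")

    rows, errors = [], []
    for record in reader:
        # Short rows yield None for absent cells, so fall back to "".
        row = {field: (record[col] or "").strip()
               for field, col in column_map.items()}
        empty = [field for field, value in row.items() if not value]
        if empty:
            errors.append((reader.line_num, f"missing value for {empty}"))
        else:
            rows.append(row)
    return rows, errors
```

Spreadsheet columns that are not named in `column_map` are ignored, and rows with missing values are reported instead of being imported silently.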
|
|
|
|
|
|
|
|
|
## Kea Monitoring
|
|
## Kea Monitoring
|
|
This is a bit more than 'monitoring' because it also requires some reading and analysis of configuration data (pools, host reservations), but it is all still read-only.
|
|
This is a bit more than 'monitoring' because it also requires some reading and analysis of configuration data (pools, host reservations), but it is all still read-only.
|
|
|
|
|
|
| # | Feature | Details | Feasibility | Milestone |
|
|
| # | Feature | Details | Feasibility | Milestone |
|
|
| ------ | ------ | ------ | ------ | ------ |
|
|
| ------ | ------ | ------ | ------ | ------ |
|
|
| 2.1 | Leases list | Human-readable list of leases, sorted by default from most recent to oldest, with sorting by any field in the lease and search based on MAC address or IP address. The lease database must also include which server owns the lease. If we can also do a reverse DNS lookup on the IP address (this can be a process triggered by the admin, it doesn't have to happen magically) to populate a hostname field, that would be good too. This should not require querying all the DHCP servers - it should come from a central lease db in Stork. I am thinking it is updated by notification from the DHCP servers, after some initialization process where it gets all the current leases. | | 1 |
|
|
| 2.1 | Leases list | Human-readable list of leases, sorted by default from most recent to oldest, with sorting by any field in the lease and search based on MAC address or IP address. The lease database must also include which server owns the lease. If we can also do a reverse DNS lookup on the IP address (this can be a process triggered by the admin, it doesn't have to happen magically) to populate a hostname field, that would be good too. This should not require querying all the DHCP servers - it should come from a central lease db in Stork. I am thinking it is updated by notification from the DHCP servers, after some initialization process where it gets all the current leases. | | 1 |
|
|
| 2.2 | Hosts list| human-readable list of host reservations, with sorting by IP, date assigned, host name. Show if the lease has actually been requested/assigned. Perhaps pxeboot file option value? hostname option value? | | 1 |
|
|
| 2.2 | Hosts list| human-readable list of host reservations, with sorting by IP, date assigned, host name. Show if the lease has actually been requested/assigned. Perhaps pxeboot file option value? hostname option value? | | 1 |
|
|
| 2.3 | Kea response times | It needs to be possible, for example, to see whether a backlog of unfulfilled requests is building up, as an indicator that the Kea server is becoming overloaded. | | 1 |
|
|
| 2.3 | Kea response times | It needs to be possible, for example, to see whether a backlog of unfulfilled requests is building up, as an indicator that the Kea server is becoming overloaded. | | 1 |
|
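
The lease list behavior in requirement 2.1 (newest first by default, sortable by any field, searchable by MAC or IP address) might look roughly like the following Python sketch. The `Lease` record and its field names are assumptions for illustration only; the central lease database schema is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Lease:
    """One lease as held in a central lease store (hypothetical fields)."""
    ip_address: str
    mac_address: str
    server: str  # which DHCP server owns the lease
    assigned_at: datetime
    hostname: str = ""


def normalize_mac(mac):
    """Compare MAC addresses regardless of case or separator style."""
    return mac.replace("-", ":").lower()


def list_leases(leases, query=None, sort_key="assigned_at", descending=True):
    """Filter by exact MAC or IP match, then sort by the chosen field.

    The default ordering is most recent first. Sorting by ip_address
    here is a plain string sort, not a numeric address sort.
    """
    if query:
        needle = query.strip().lower()
        leases = [
            lease for lease in leases
            if lease.ip_address.lower() == needle
            or normalize_mac(lease.mac_address) == normalize_mac(needle)
        ]
    return sorted(leases, key=lambda lease: getattr(lease, sort_key),
                  reverse=descending)
```

Because the function works on an in-memory list, it matches the requirement that the list come from a central store and not from live queries to every DHCP server.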
... | @@ -62,11 +62,11 @@ This is a list of things that are not strictly monitoring. Putting them on a sep |
... | @@ -62,11 +62,11 @@ This is a list of things that are not strictly monitoring. Putting them on a sep |
|
| 4.4 | Clients refusing offers | See all addresses declined by clients - troubleshooting | | 1 |
|
|
| 4.4 | Clients refusing offers | See all addresses declined by clients - troubleshooting | | 1 |
|
|
| 4.5 | | | | |
|
|
| 4.5 | | | | |
|
|
|
|
|
|
## BIND Status and Activity
|
|
## BIND Status and Activity
|
|
Most BIND 9 users have added BIND 9 to their existing fault monitoring systems by now. What is lacking is any integrated way to manage application + server performance together, and any way to view the status of events that are not queries or responses, such as interactions between servers (IXFR/AXFRs), journal updates, signing operations and the like.
|
|
Most BIND 9 users have added BIND 9 to their existing fault monitoring systems by now. What is lacking is any integrated way to manage application + server performance together, and any way to view the status of events that are not queries or responses, such as interactions between servers (IXFR/AXFRs), journal updates, signing operations and the like.
|
|
|
|
|
|
| # | Feature | Details | Feasibility | Milestone |
|
|
| # | Feature | Details | Feasibility | Milestone |
|
|
| ------ | ------ | ------ | ------ | ------ |
|
|
| ------ | ------ | ------ | ------ | ------ |
|
|
| 5.1 | Zone list | human-readable list of zones, sortable by zone name, time of last update (this might be the default sort), zone size? signing status (signed/unsigned/expired?), #RRs. 'dynamic' or 'traditional' zone files | | 1 |
|
|
| 5.1 | Zone list | human-readable list of zones, sortable by zone name, time of last update (this might be the default sort), zone size? signing status (signed/unsigned/expired?), #RRs. 'dynamic' or 'traditional' zone files | | 1 |
|
|
| 5.4 | Zone signing status | DNSSEC details, key information, signature validity period | | 1 |
|
|
| 5.4 | Zone signing status | DNSSEC details, key information, signature validity period | | 1 |
|
|
| 5.5 | Zone/rr signing performance | monitoring where BIND is in signing and resigning new or updated zones, both the status and time it takes to complete the signing operation. I realize this is potentially very detailed and complicated, but think of the use case where an auth publisher has a few very large zones - how can they track their signing process? | | 1 |
|
|
| 5.5 | Zone/rr signing performance | monitoring where BIND is in signing and resigning new or updated zones, both the status and time it takes to complete the signing operation. I realize this is potentially very detailed and complicated, but think of the use case where an auth publisher has a few very large zones - how can they track their signing process? | | 1 |
|
... | @@ -75,10 +75,10 @@ Most BIND 9 users have added BIND 9 to their existing fault monitoring systems b |
... | @@ -75,10 +75,10 @@ Most BIND 9 users have added BIND 9 to their existing fault monitoring systems b |
|
|
|
|
|
|
|
|
|
## BIND Performance Details
|
|
## BIND Performance Details
|
|
Two problems that operators want to address: how can I improve performance (and improving cache utilization is one way to improve performance) and what is my memory being used for?
|
|
Two problems that operators want to address: how can I improve performance (and improving cache utilization is one way to improve performance) and what is my memory being used for?
|
|
|
|
|
|
| # | Feature | Details | Feasibility | Milestone |
|
|
| # | Feature | Details | Feasibility | Milestone |
|
|
| ------ | ------ | ------ | ------ | ------ |
|
|
| ------ | ------ | ------ | ------ | ------ |
|
|
| 6.1 | Query details | Easily monitor the volume of queries and responses, rrtypes, and response codes, by TCP vs UDP, and perhaps by response size buckets. This is a baseline function that everyone needs, and these statistics should be available on a per-server basis from BIND today. Include queries that are dropped. Ideally these can be displayed both per-server and aggregated across clusters of servers. | | 1 |
|
|
| 6.1 | Query details | Easily monitor the volume of queries and responses, rrtypes, and response codes, by TCP vs UDP, and perhaps by response size buckets. This is a baseline function that everyone needs, and these statistics should be available on a per-server basis from BIND today. Include queries that are dropped. Ideally these can be displayed both per-server and aggregated across clusters of servers. | | 1 |
|
|
| 6.4 | Cache hit ratio | % of queries answered from cache (time series) | | 1 |
|
|
| 6.4 | Cache hit ratio | % of queries answered from cache (time series) | | 1 |
|
|
| 6.5 | Cache aging | cache size, average TTL of records in cache, # of records pre-fetched and # of those that expired without being re-queried, top 500(?) records most frequently queried, cache cleaning (how dirty is the cache) | | 1 |
|
|
| 6.5 | Cache aging | cache size, average TTL of records in cache, # of records pre-fetched and # of those that expired without being re-queried, top 500(?) records most frequently queried, cache cleaning (how dirty is the cache) | | 1 |
|
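
For the cache hit ratio time series in requirement 6.4, one possible approach is to sample cumulative hit and miss counters periodically and compute the ratio over each interval. The sketch below assumes such counters are available per server; the sample layout is hypothetical, and hits / (hits + misses) is only an approximation of "% of queries answered from cache".

```python
def cache_hit_ratio_series(samples):
    """Turn cumulative counters into a per-interval hit ratio series.

    samples is a time-ordered list of
    (timestamp, cumulative_hits, cumulative_misses) tuples.
    Returns a list of (timestamp, percent), where percent is None for
    intervals with no cache lookups or with a counter reset (for
    example after a server restart).
    """
    series = []
    for (_, hits0, misses0), (t1, hits1, misses1) in zip(samples, samples[1:]):
        hits = hits1 - hits0
        misses = misses1 - misses0
        total = hits + misses
        if hits < 0 or misses < 0 or total == 0:
            series.append((t1, None))
        else:
            series.append((t1, 100.0 * hits / total))
    return series
```

Reporting `None` for empty or reset intervals keeps a restart from showing up as a misleading spike or dip in the graph.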
... | @@ -89,7 +89,7 @@ Two problems that operators want to address: how can I improve performance (and |
... | @@ -89,7 +89,7 @@ Two problems that operators want to address: how can I improve performance (and |
|
These could be 'tools' or simply test cases. These are some tasks we want to facilitate.
|
|
These could be 'tools' or simply test cases. These are some tasks we want to facilitate.
|
|
|
|
|
|
| # | Problem | Details | Feasibility | Milestone |
|
|
| # | Problem | Details | Feasibility | Milestone |
|
|
| ------ | ------ | ------ | ------ | ------ |
|
|
| ------ | ------ | ------ | ------ | ------ |
|
|
| 7.1 | Performance troubleshooting | What is BIND doing (while it is eating memory, eating CPU, not responding, apparently twiddling its thumbs or ..?) Do I need to increase any of my throttles because I'm getting close to the limits? | | 1 |
|
|
| 7.1 | Performance troubleshooting | What is BIND doing (while it is eating memory, eating CPU, not responding, apparently twiddling its thumbs or ..?) Do I need to increase any of my throttles because I'm getting close to the limits? | | 1 |
|
|
| 7.2 | Cache analysis | What's in cache (by RTYPE - real entries, not the expired ones, although expired but not yet cleaned up might also be interesting) | | 1? |
|
|
| 7.2 | Cache analysis | What's in cache (by RTYPE - real entries, not the expired ones, although expired but not yet cleaned up might also be interesting) | | 1? |
|
|
| 7.3 | Cache cleanup | What's expired in cache and still not cleaned up?| | |
|
|
| 7.3 | Cache cleanup | What's expired in cache and still not cleaned up?| | |
|
... | @@ -103,7 +103,7 @@ These could be 'tools' or simply test cases. These are some tasks we want to fac |
... | @@ -103,7 +103,7 @@ These could be 'tools' or simply test cases. These are some tasks we want to fac |
|
Web app. OK to support limited OS for the platform
|
|
Web app. OK to support limited OS for the platform
|
|
|
|
|
|
| # | Feature | Details | Feasibility | Milestone |
|
|
| # | Feature | Details | Feasibility | Milestone |
|
|
| ------ | ------ | ------ | ------ | ------ |
|
|
| ------ | ------ | ------ | ------ | ------ |
|
|
| 10.1 | Installation | We definitely are going to want a package... with all our dependencies, we may need an SCL approach. | | |
|
|
| 10.1 | Installation | We definitely are going to want a package... with all our dependencies, we may need an SCL approach. | | |
|
|
| 10.2 | User authentication | local authentication is adequate for 1.0, later versions will require network authentication | | |
|
|
| 10.2 | User authentication | local authentication is adequate for 1.0, later versions will require network authentication | | |
|
... | @@ -120,17 +120,17 @@ Web app. OK to support limited OS for the platform |
... | @@ -120,17 +120,17 @@ Web app. OK to support limited OS for the platform |
|
* DNS Activity - time series data showing queries per second
|
|
* DNS Activity - time series data showing queries per second
|
|
* Zone list (name, class, signer status) - would be nice to have visual indicator of signer status
|
|
* Zone list (name, class, signer status) - would be nice to have visual indicator of signer status
|
|
* DNSSEC panel - NTAs
|
|
* DNSSEC panel - NTAs
|
|
* RPZ stats
|
|
* RPZ stats
|
|
* Performance-related information - drill down per server - QPS, leases per second, memory, cache, button to go view logs
|
|
* Performance-related information - drill down per server - QPS, leases per second, memory, cache, button to go view logs
|
|
* Admin page - view, add, remove users, groups, manage user permissions
|
|
* Admin page - view, add, remove users, groups, manage user permissions
|
|
* Alerts list -
|
|
* Alerts list -
|
|
* We will have a list of guidelines for accessibility. At a minimum, the UI must be accessible to users with color blindness, and it should also be accessible to people with other vision impairments.
|
|
* We will have a list of guidelines for accessibility. At a minimum, the UI must be accessible to users with color blindness, and it should also be accessible to people with other vision impairments.
|
|
|
|
|
|
|
|
|
|
## Significant Questions/Issues
|
|
## Significant Questions/Issues
|
|
| # | Question | Response | Date resolved |
|
|
| # | Question | Response | Date resolved |
|
|
| ------ | ------ | ------ | ------ |
|
|
| ------ | ------ | ------ | ------ |
|
|
| A | OSes supported | FreeBSD 12 and Ubuntu 18.04<br/>*more coming in milestone 2* | Oct 7, 2019 |
|
|
| A | OSes supported | FreeBSD 12 and Ubuntu 18.04<br/>*more coming in milestone 2* | Oct 7, 2019 |
|
|
| B | Docker | Supported, but optional. | Oct 9, 2019 |
|
|
| B | Docker | Supported, but optional. | Oct 9, 2019 |
|
|
| B | Does Stork support Kea/BIND built and installed by hand? Or must they be installed the Stork way only? | | |
|
|
| B | Does Stork support Kea/BIND built and installed by hand? Or must they be installed the Stork way only? | | |
|
|
| | Product naming | | |
|
|
| | Product naming | | |
|
... | @@ -140,4 +140,4 @@ Web app. OK to support limited OS for the platform |
... | @@ -140,4 +140,4 @@ Web app. OK to support limited OS for the platform |
|
|
|
|
|
Glossary:
|
|
Glossary:
|
|
- Application server: a machine running Kea or BIND (or MySQL, PostgreSQL, or Cassandra in use by Kea)
|
|
- Application server: a machine running Kea or BIND (or MySQL, PostgreSQL, or Cassandra in use by Kea)
|
|
- TOD pattern: Time of Day pattern |
|
- TOD pattern: Time of Day pattern |
|
\ No newline at end of file |
|
|