... | ... | @@ -19,7 +19,7 @@ |
|
|
Comments from Vicky
|
|
|
|
|
|
I think you have identified the primary issues.
|
|
|
I know these comments don't really belong here, but I didn't have another place handy to put them.
|
|
|
|
|
|
|
|
|
**Screen Resolution:** (I think we need to lower our expectations here)
|
|
|
I took a poll of our own infrastructure team, and their screen resolutions are all over the place. Ask 4 people and you get 4 very different answers.
|
... | ... | @@ -32,7 +32,7 @@ I took a poll of our own infrastructure team, and their screen resolution is all |
|
|
## Target user
|
|
|
Currently, many of our Kea users chose Kea because they want maximum control and they are relatively confident of their technical skills. They are trying to save money by using open source. They are likely to have a senior engineer with good network and UNIX administration skills managing the network. This person may be the ONLY one managing Kea, and they probably have a number of other critical responsibilities. They chose Kea over existing commercial systems that have extensive management capabilities and controls. They may have used ISC DHCP previously, and don't want to take on supporting very old software. In some cases they have struggled with the initial Kea installation and configuration, particularly with determining how to design the system to give the right clients the intended options and addresses.
|
|
|
|
|
|
Many of these are greenfield ISPs and they don't have legacy systems to integrate with. Their clients are pretty homogeneous and they have good control over what clients are used, so they think their requirements are basic. However, they are growing fast, or hope to grow fast, so they are focused on scalability and high availability. They also commonly have some plan for differentiating clients to provide tiered service and improve revenue, which they hope to implement with Kea.
|
|
|
|
|
|
|
|
|
Some Kea users are universities, and they may choose Kea because it reportedly has better IPv6 support than ISC DHCP. They may have a large and complicated network, but be trialing Kea in one department or location. They may have more administrative systems (e.g. user authentication, established fault monitoring, help desk) to integrate with, but they might have more of an academic interest in Kea, and their users might be more flexible about being 'experimented on.'
|
|
|
|
... | ... | @@ -47,17 +47,17 @@ These are requirements in order of urgency, and in the order I think we should b |
|
|
* How can I get this thing to alert me when there is a problem? (check dashboard, set up Grafana alerts?)
|
|
|
|
|
|
**level 2 - App/System details**
|
|
|
Network admin reports some problem, possibly connectivity or reachability.
|
|
|
|
|
|
Now that there is a problem, how do I troubleshoot it?
|
|
|
- is it the app, or the server itself? (look at app and host machine in 1 panel)
|
|
|
- look at the history, when did it start? (look at logs)
|
|
|
- what do these log messages (stats, alarms) mean (check documentation)
|
|
|
- is it a connectivity problem? (ping, traceroute, look at other clients in same subnet)
|
|
|
- is it an issue with the relay, CMTS or CPE? (look at clients/lease data by relay)
|
|
|
- what other things happened at the time the problem started? (check Grafana)
|
|
|
|
|
|
- Was there a configuration change? (check last reload time)
|
|
|
- Is this intermittent or on-going?
|
|
|
- If the problem seems to be a client that can contact Kea but isn't getting a lease, why not? What pool would they hit? Is the pool exhausted?
|
|
|
|
|
|
- Is there an increase in NAKs? Can I see which clients are getting NAKed? Can I see the contents of the client request?
|
|
|
- Is there congestion? What is causing it? Can I see which clients are repeatedly sending discovers or requests?
|
|
|
- Given a MAC address, can I tell when that client last had a lease, and when it expired? (this kind of problem is much more likely to be pursued in an enterprise than at an ISP)
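As a rough illustration of the "Is the pool exhausted?" question above, here is a minimal sketch of flagging nearly full subnets from address counters. It assumes the statistics have already been collected into a flat mapping of name to integer, using Kea-style names such as `subnet[1].total-addresses` and `subnet[1].assigned-addresses`. The function names, the sample data, and the 95% threshold are all hypothetical, and the check is per subnet rather than per pool.

```python
import re

# Assumption: `stats` maps Kea-style statistic names to plain integers,
# already flattened from whatever source collected them.
STAT_RE = re.compile(r"^subnet\[(\d+)\]\.(total-addresses|assigned-addresses)$")


def subnet_utilization(stats):
    """Return {subnet_id: fraction of addresses currently assigned}."""
    per_subnet = {}
    for name, value in stats.items():
        match = STAT_RE.match(name)
        if match:
            subnet_id = int(match.group(1))
            per_subnet.setdefault(subnet_id, {})[match.group(2)] = value

    result = {}
    for subnet_id, counts in per_subnet.items():
        total = counts.get("total-addresses", 0)
        assigned = counts.get("assigned-addresses", 0)
        if total > 0:  # skip subnets with no addresses configured
            result[subnet_id] = assigned / total
    return result


def nearly_exhausted(stats, threshold=0.95):
    """Return sorted subnet ids whose utilization is at or above threshold."""
    usage = subnet_utilization(stats)
    return sorted(sid for sid, used in usage.items() if used >= threshold)


# Hypothetical sample data, for illustration only.
sample = {
    "subnet[1].total-addresses": 200,
    "subnet[1].assigned-addresses": 198,
    "subnet[2].total-addresses": 100,
    "subnet[2].assigned-addresses": 10,
    "pkt4-received": 1234,  # unrelated counters are ignored
}
print(nearly_exhausted(sample))
```

A view like this would answer only the exhaustion part of the question; working out which pool a given client would hit still depends on the configuration (client classes, relay, shared networks).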
|
... | ... | @@ -71,22 +71,22 @@ Now that there is a problem, how do I troubleshoot it? |
|
|
- What parts of the system are the slowest? Is it the db, or the Kea server CPU or something else? (tbd)
|
|
|
|
|
|
**level 4 - More efficient operations**
|
|
|
* My boss would like some kind of report to see how many devices we have on the network. She would like a breakdown by device type (laptop, mobile, polycom, server, printer). (lease list, fingerprinting) She would like to see the growth in clients over time to use for projections. She would like to see the time to get a lease (responsiveness) or other random statistics.
|
|
|
|
|
|
* We have a process for bringing up a new POP or bringing up a new customer that we would like to integrate with Kea. This might include a custom hook, or custom provisioning step.
|
|
|
* We want to offer a new service, with longer/shorter leases, fewer/more addresses per CPE, IPv6 only, dual-stack...
|
|
|
* We hired another person to help with DHCP. Can we share the configuration tasks and keep track of the changes with this thing? (tbd)
|
|
|
|
|
|
* We are able to plan ahead and foresee needing to do some maintenance. Can I schedule that here? How do I gradually bleed traffic off a machine to take it out of service? How do I trigger the failover so I can work on the other partner in the pair? (tbd)
|
|
|
* Someone screwed up the DHCP! How do I tell which of my colleagues did that? (tbd)
|
|
|
|
|
|
--------
|
|
|
|
|
|
## BIND/DNS User
|
|
|
|
|
|
* The typical BIND user who might try Stork is going to already have other tools for managing BIND. Most BIND users have Nagios, Zabbix, Cacti or something similar for fault management. They will also have provisioning systems. These vary widely, but there are many open source tools out there for zone file provisioning, as well as many home-grown tools.
|
|
|
|
|
|
* BIND users know they have to be able to mine the logs for information. Information about how to set up logs, what to look for - these are the hottest queries in the KB and the most popular webinars. However, over-logging can seriously impact BIND performance.
|
|
|
* BIND users have been asking for a supported Prometheus exporter, because it is essential in DNS to monitor the query make-up to spot DDoS attacks.
|
|
|
|
|
|
|
|
|
These users are looking for solutions to problems they can't address with existing tools. By definition, none of these are simple problems and all will likely require work in BIND as well as in Stork. These include:
|
|
|
- troubleshooting performance problems. This requires looking at the platform, activity, and the application together. Ultimately, it also requires looking at configuration options that affect performance.
|
|
|
|
|
|
- troubleshooting issues related to what is in cache. This may require looking at data in the ADB which is not currently exposed to the user.
|
|
|
- monitoring and troubleshooting issues related to zone file updates. These can impact query performance. In the case of signed zones, this may also require monitoring signing operations, which can take a long time (e.g. an hour) for very large zone files.
|
|