This page contains minutes from the first Stork brainstorming session held in Gdańsk on 2019-06-11.
We went through the Stork 1.0 requirements and updated/clarified several aspects, adding more precise descriptions of some requirements.
- ACTION: go through the requirements list and turn rough notes into specific questions.
This chart describes a possible data flow. Kea (or BIND) can generate log entries, statistics and possibly events. Those can be exported using a pull or push model to various environments, such as Nagios, Zabbix, Logstash, Prometheus, InfluxDB, ... and then fancy-looking graphs can be generated using Kibana (a dedicated solution for Logstash) or Grafana (which supports a number of backends, including Prometheus and InfluxDB).
We see customer and user interest particularly in the Prometheus + Grafana pair, so this is the preferred solution, unless there are very strong technical reasons to go a different route.
Prometheus supports a pull model (preferred by Prometheus: it pulls data from monitored entities such as Kea, which requires Kea to implement an HTTP server endpoint) and a push model (Kea pushes data to Prometheus), with the optional ability to deploy a gateway that translates push into pull.
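To make the pull model concrete, here is a minimal sketch of the kind of HTTP endpoint Kea would need to expose: a handler serving statistics in the Prometheus text exposition format. The metric names, values, and the port are invented for illustration; a real exporter would map actual Kea statistics here.

```python
# Sketch of the pull model: a tiny HTTP endpoint exposing metrics in the
# Prometheus text exposition format, which Prometheus scrapes periodically.
# Metric names and values below are made up for illustration.
from http.server import BaseHTTPRequestHandler, HTTPServer

def collect_metrics():
    # A real exporter would query live Kea statistics; stubbed here.
    stats = {
        "kea_dhcp4_packets_received_total": 1234,
        "kea_dhcp4_leases_assigned_total": 987,
    }
    lines = [f"{name} {value}" for name, value in stats.items()]
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = collect_metrics().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To run the exporter (blocks the process, so commented out here;
# port 9547 is an arbitrary choice):
# HTTPServer(("", 9547), MetricsHandler).serve_forever()
```

Prometheus would then be configured to scrape this `/metrics` path on its usual interval.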
The funny pictogram in the upper right corner means fancy charts.
Overall Stork environment
Disclaimer: this is definitely a work in progress. Expect significant changes. The whiteboard sketch is... well, sketchy, so here's a more verbose description of each component.
Starting roughly from top to bottom: Stork will be a web app, used via a browser. The Stork UI (STUI) will need to support some form of plugins. Some plugins will provide additional UI with active elements on the browser (client) side.
The DB - Prometheus - Grafana chain shows the general data flow.
We will need to provide an authentication service. Stork 1.0 will likely have it local/internal, but in Stork 2.0 we will need to be able to integrate with an external/remote auth service (LDAP? Kerberos?).
The major part of the solution will be Stork itself. It will expose a REST API. The area marked with a dotted red line is presented in more detail in the next chart. Stork will have its own DB.
Much discussion went around how to communicate with BIND and Kea servers. We went through several different approaches. One of them was to have a Stork AGEnt (STAG), a small agent responsible for installing, version-checking, and upgrading BIND and Kea. After much discussion, this concept morphed into a desire to extend the CA (Control Agent from Kea) to provide that service. The benefit of that approach is that there are fewer entities to maintain and the overall architecture becomes simpler. The CA will need to be extended to be able to:
- upgrade Kea
- talk with BIND (via an API that initially talks to rndc but maybe later directly to named)
- upgrade BIND
All of the above will be done with a hook; the CA already supports a hooks mechanism.
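Since Stork would drive all of this through the CA, here is a brief sketch of the command shape involved. The CA accepts JSON commands POSTed over HTTP; `version-get` is an existing Kea command, while an upgrade command implemented by a new hook is purely hypothetical and would follow the same shape. The URL and port are assumptions.

```python
# Sketch of talking to the Kea Control Agent (CA): commands are JSON
# objects POSTed over HTTP. "version-get" exists today; any new commands
# added by an upgrade hook are hypothetical here.
import json
import urllib.request

def build_command(command, service=None):
    payload = {"command": command}
    if service:
        # e.g. ["dhcp4"] to have the CA forward the command to a daemon
        payload["service"] = service
    return payload

def send_command(command, ca_url="http://localhost:8000/", service=None):
    req = urllib.request.Request(
        ca_url,
        data=json.dumps(build_command(command, service)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (not run here, needs a live CA):
# send_command("version-get")
```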
Much discussion revolved around the problem of being able to talk with different Kea (and presumably BIND) versions at the same time. The initial assumption to always require only the latest Kea/BIND version was convenient, but sadly unrealistic. The same is true for requiring all Kea instances to always use the config backend, or even a specific backend type (e.g. always MySQL). Again, this would be very convenient, but it's unrealistic. We want Stork to be deployable in current production networks, and they use a variety of backends (memfile, MySQL, PostgreSQL, even Cassandra).
Initially we hoped that Stork DB could be a slightly extended config backend DB and Stork would import the DB from all managed Kea instances. Sadly, this won't be possible as several instances may have conflicting data (such as each instance having subnets with subnet-id = 1).
We also discussed how to solve the logging and monitoring problem. @godfryd proposed to look at Logstash, an open source solution he used successfully in an earlier project. It can receive vast amounts of log messages and store them in a structured format. It scales well: it can be deployed as a single instance, but clustering is available if the volume of logs is too large for a single server to handle. It has a query language for making structured queries (more powerful than a simple string search). It has its own visualization solution called Kibana (which looks somewhat similar to Grafana). A question was raised whether it is possible (and makes sense) to store logs in Prometheus. There is some similarity between Kea logs and events (every Kea log message has its own unique identifier). It would be very useful to be able to chart graphs for certain events (e.g. a time series chart of when Kea reported ALLOCATION_FAIL warning logs).
The consensus is that the amount of data logged/reported for monitoring is significant and scales with the number of monitored servers. We don't want this data to go through Stork as it would become an instant bottleneck. We want Stork to only configure the stream of data (Stork sends "hey kea #123, send your logs to XYZ").
We also discussed Stork's internal architecture. There will be a pool of worker processes that will handle incoming HTTP connections. Each process will have a request handler that will either process the request immediately, if possible, or put a specific task in a queue for processing in the background. Some tasks may be long running (such as an upgrade that may take many minutes). In general, browsers will time out after 30s, so we need to make this async. The backend will be a process that is always running and conducts a number of tasks:
- periodic status polls (communicate with CA)
- install/upgrade Kea and BIND
- read/edit configuration of specific Kea/BIND server
Obviously the backend has to run all the time, not only when someone is browsing the UI.
Backend will communicate with Kea and BIND servers (actually it will talk to CA, which in turn will talk to Kea/BIND).
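The fast-path/slow-path request handling described above can be sketched as follows. This is a minimal illustration using a thread and an in-process queue; all names are invented, and a real Stork would presumably use a proper task system rather than this toy.

```python
# Sketch of the request-handling idea: quick requests are served inline,
# long-running tasks (e.g. an upgrade) are queued for a background worker
# so the HTTP response returns well before a browser's ~30 s timeout.
import queue
import threading

task_queue = queue.Queue()
results = {}

def background_worker():
    while True:
        task_id, func = task_queue.get()
        if func is None:           # sentinel to stop the worker
            break
        results[task_id] = func()  # run the long task off the request path
        task_queue.task_done()

def handle_request(task_id, func, long_running=False):
    if not long_running:
        return func()                    # fast path: answer immediately
    task_queue.put((task_id, func))      # slow path: queue and return
    return {"status": "queued", "task": task_id}

worker = threading.Thread(target=background_worker, daemon=True)
worker.start()
```

In Stork the queued work would be the periodic polls, installs/upgrades, and config edits listed above, running regardless of whether anyone has the UI open.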
In the future (after 1.0) we will need some way to support plugins. They will need to be able to extend the backend, the process pool, and the UI.
We also listed requirements for the technology that will be selected:
- must provide an easy way to implement and run unit tests
- must support asynchronous processing (I/O, subprocesses, threads, etc.); essential, as many operations will take a long time
- must be "commonly" used tech: reasonably popular, easy to install on popular platforms (likely Ubuntu or CentOS), with some community around it; we don't want to use some dead or obscure system