- Overview
- Workflow
- EventsCenter
- Event Types
- Alerts Rules Engine
- Event Log in DB
- Subscriber / Web Browser / UI
- Open questions
- Problems to be resolved
- Comments about Scope and Requirements
Overview
EventsCenter is a module of the Stork server that is responsible for collecting events and dispatching them to subscribers.
An Event is a piece of information about a change in the system, for example a new network discovered in Kea or a new machine added to the Stork server.
A Subscriber is a particular web browser. It registers for a given collection of events based on the criteria it provides. The subscriber receives a stream of events from EventsCenter. These events may cause the web browser to update parts of the UI.
Workflow
graph LR
P1[Kea Stats Puller] --> A4
P2[Kea Status Puller] --> A5
P3[Machine & App State Puller] --> A1
P3 --> A2
P3 --> A3
P4[Kea Hosts Puller] --> A6
A1([New Subnet 7]) --> EC[EventsCenter]
A2([Updated Network 2]) --> EC[EventsCenter]
A3([Removed App 1]) --> EC[EventsCenter]
A4([Updated Stats for Subnet 4]) --> EC[EventsCenter]
A5([Changed HA Status for Daemon 8]) --> EC[EventsCenter]
A6([New Host Reservation 3]) --> EC[EventsCenter]
EC --> C[Alerts Rules Engine]
C --> EC
EC --> D[Event Log in DB<br/><br/>1. event x<br/>2. event y <A><br/>3. event z]
EC -->|SSE| E1[Web Browser 1]
EC -->|SSE| E2[Web Browser 2]
EC -->|SSE| E3[Web Browser 3]
style EC fill:#f9f
style C fill:#fdf
style D fill:#dff
EventsCenter collects events sent to it by various modules of the Stork server. It first checks with the Alerts Rules Engine whether the event qualifies as an alert. The event is then stored in the database, in the Event Log table. The next step is broadcasting the event to subscribers: EventsCenter checks the criteria submitted by each subscriber and sends the event to those whose criteria match. The event is delivered to the web browser using Server-Sent Events (SSE).
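The steps above can be sketched as a single function that handles one event. This is only an illustration under stated assumptions: the `Event` fields, the `Subscriber` type, and the `Handle` method are hypothetical names, the event log is an in-memory slice standing in for the database table, and delivery appends to a slice instead of writing to an SSE stream.

```go
package main

import "fmt"

// Event is a hypothetical minimal event; the field names are assumptions.
type Event struct {
	Text       string
	EntityType string // e.g. "subnet", "app", "daemon"
	EntityID   int64
	Alert      bool // set when the rules check flags the event
}

// Subscriber models one web browser together with its subscription criteria.
type Subscriber struct {
	Name       string
	EntityType string // empty means "all events"
	Received   []Event
}

// Matches reports whether the subscriber's criteria accept the event.
func (s *Subscriber) Matches(ev Event) bool {
	return s.EntityType == "" || s.EntityType == ev.EntityType
}

// EventsCenter ties together the rules check, the event log and the subscribers.
type EventsCenter struct {
	IsAlert     func(Event) bool // stand-in for the Alerts Rules Engine
	Log         []Event          // stand-in for the Event Log table
	Subscribers []*Subscriber
}

// Handle runs one event through the steps: rules check, store, broadcast.
func (ec *EventsCenter) Handle(ev Event) {
	if ec.IsAlert != nil {
		ev.Alert = ec.IsAlert(ev)
	}
	ec.Log = append(ec.Log, ev)
	for _, s := range ec.Subscribers {
		if s.Matches(ev) {
			s.Received = append(s.Received, ev) // real delivery would use SSE
		}
	}
}

func main() {
	dashboard := &Subscriber{Name: "dashboard"}
	subnets := &Subscriber{Name: "subnets page", EntityType: "subnet"}
	ec := &EventsCenter{
		IsAlert:     func(ev Event) bool { return ev.EntityType == "daemon" },
		Subscribers: []*Subscriber{dashboard, subnets},
	}
	ec.Handle(Event{Text: "New Subnet 7", EntityType: "subnet", EntityID: 7})
	ec.Handle(Event{Text: "Changed HA Status for Daemon 8", EntityType: "daemon", EntityID: 8})
	fmt.Println(len(ec.Log), len(dashboard.Received), len(subnets.Received))
}
```

In this sketch the alert flag is set before the event is stored, so the log entry and the broadcast copy carry the same flag.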
EventsCenter
EventsCenter is a goroutine that collects events via an input channel.
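A minimal sketch of such a goroutine follows. The names `NewEventsCenter`, `AddEvent` and `Shutdown` are assumptions made for illustration, and the loop only records event texts where a real implementation would run the rules check, store the event and broadcast it.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical placeholder; the real structure is still to be proposed.
type Event struct {
	Text string
}

// EventsCenter owns the input channel and the goroutine that drains it.
type EventsCenter struct {
	in   chan Event
	done sync.WaitGroup
	seen []string // written only by the loop goroutine
}

// NewEventsCenter starts the collecting goroutine with a buffered input channel.
func NewEventsCenter(buffer int) *EventsCenter {
	ec := &EventsCenter{in: make(chan Event, buffer)}
	ec.done.Add(1)
	go ec.loop()
	return ec
}

// loop receives events until the input channel is closed.
func (ec *EventsCenter) loop() {
	defer ec.done.Done()
	for ev := range ec.in {
		ec.seen = append(ec.seen, ev.Text)
	}
}

// AddEvent is what other server modules would call to submit an event.
func (ec *EventsCenter) AddEvent(ev Event) {
	ec.in <- ev
}

// Shutdown closes the input channel, waits for the loop to finish,
// and returns the texts of the events that were processed.
func (ec *EventsCenter) Shutdown() []string {
	close(ec.in)
	ec.done.Wait()
	return ec.seen
}

func main() {
	ec := NewEventsCenter(16)
	ec.AddEvent(Event{Text: "server started"})
	ec.AddEvent(Event{Text: "agent added"})
	fmt.Println(ec.Shutdown())
}
```

Routing everything through one channel means events are processed in the order they arrive, and callers block only when the buffer is full.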
Event Types
Description | Status |
---|---|
server started/stopped | implemented |
agent added/removed | implemented |
server - agent/kea/bind communication issues | implemented |
monitoring enabled/disabled | todo |
kea config change detected | todo |
ha status change | todo |
cross threshold of pool utilization | todo |
Kea reconfiguration (config-set) | todo |
Alerts Rules Engine
The Alerts Rules Engine is a function that determines whether a given event should raise an alarm. It is based on rules defined by the Stork administrator. A rule can be, for example, a global threshold for subnet pool utilization.
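As an illustration, a threshold rule of this kind could be evaluated as below. The `Rule` and `Event` shapes, the `"pool-utilization"` kind string and the `ShouldAlert` name are assumptions for the sketch, not a settled design.

```go
package main

import "fmt"

// Event carries a kind and a hypothetical numeric payload,
// e.g. pool utilization in percent.
type Event struct {
	Kind  string
	Value float64
}

// Rule is a hypothetical administrator-defined threshold for one event kind.
type Rule struct {
	Kind      string
	Threshold float64
}

// ShouldAlert reports whether any rule of the same kind has its
// threshold reached or exceeded by the event's value.
func ShouldAlert(rules []Rule, ev Event) bool {
	for _, r := range rules {
		if r.Kind == ev.Kind && ev.Value >= r.Threshold {
			return true
		}
	}
	return false
}

func main() {
	rules := []Rule{{Kind: "pool-utilization", Threshold: 90}}
	fmt.Println(ShouldAlert(rules, Event{Kind: "pool-utilization", Value: 95}))
	fmt.Println(ShouldAlert(rules, Event{Kind: "pool-utilization", Value: 40}))
}
```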
Event Log in DB
Event Log is a table in the database. It stores information about events. Each entry contains:
- date of event
- description of event in text form
- reference(s) to related entities (foreign keys to other tables, e.g. Subnet)
The design of the table allows filtering entries by related entity or by entity type, so the UI can present events either for a given entity or for a whole entity type.
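The entry layout and the two kinds of filtering can be sketched in memory as follows. The type and field names are assumptions, a slice stands in for the database table, and treating an ID of 0 as "any entity of this type" is a convention invented for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Relation points at one related entity, mirroring a foreign key.
type Relation struct {
	EntityType string // e.g. "subnet", "machine"
	EntityID   int64
}

// LogEntry holds the three pieces of data listed above.
type LogEntry struct {
	CreatedAt time.Time
	Text      string
	Relations []Relation
}

// Filter returns entries related to the given entity type.
// An id of 0 matches any entity of that type (assumed convention).
func Filter(entries []LogEntry, entityType string, id int64) []LogEntry {
	var out []LogEntry
	for _, e := range entries {
		for _, r := range e.Relations {
			if r.EntityType == entityType && (id == 0 || r.EntityID == id) {
				out = append(out, e)
				break // add each entry at most once
			}
		}
	}
	return out
}

func main() {
	now := time.Now()
	entries := []LogEntry{
		{CreatedAt: now, Text: "subnet 7 added", Relations: []Relation{{"subnet", 7}}},
		{CreatedAt: now, Text: "subnet 4 stats updated", Relations: []Relation{{"subnet", 4}}},
		{CreatedAt: now, Text: "machine 1 added", Relations: []Relation{{"machine", 1}}},
	}
	fmt.Println(len(Filter(entries, "subnet", 0)), len(Filter(entries, "subnet", 7)))
}
```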
Subscriber / Web Browser / UI
When a user opens a particular page in the Stork web UI, the code of that page should subscribe to the events the page is interested in. For example, the machines page should subscribe to events related to machines, and the page of a particular application should subscribe to events connected with that application. The main dashboard should subscribe to all events. Each page should read the events sent to it and update the UI accordingly.
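On the server side, a per-page subscription could look roughly like the handler below. This is a sketch under assumptions: the `/sse` path, the `type` query parameter and the helper names are hypothetical, and each subscriber is assumed to have its own event channel fed by EventsCenter. The wire format (`data: ...` followed by a blank line) is the standard SSE message framing.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Event is a hypothetical minimal event.
type Event struct {
	Text       string
	EntityType string
}

// matches reports whether an event passes the subscriber's filter;
// an empty filter means "all events" (as for the main dashboard).
func matches(filter string, ev Event) bool {
	return filter == "" || filter == ev.EntityType
}

// formatSSE renders an event as one SSE message (assumes single-line text).
func formatSSE(ev Event) string {
	return fmt.Sprintf("data: %s\n\n", ev.Text)
}

// sseHandler streams matching events from this subscriber's channel.
func sseHandler(events <-chan Event) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		filter := r.URL.Query().Get("type") // hypothetical parameter name
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		flusher, _ := w.(http.Flusher)
		for {
			select {
			case <-r.Context().Done(): // browser went away
				return
			case ev, ok := <-events:
				if !ok {
					return
				}
				if !matches(filter, ev) {
					continue
				}
				fmt.Fprint(w, formatSSE(ev))
				if flusher != nil {
					flusher.Flush() // push the message out immediately
				}
			}
		}
	}
}

func main() {
	events := make(chan Event, 2)
	events <- Event{Text: "machine 1 added", EntityType: "machine"}
	events <- Event{Text: "subnet 7 added", EntityType: "subnet"}
	close(events)

	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/sse?type=machine", nil)
	sseHandler(events)(rec, req)
	fmt.Print(rec.Body.String())
}
```

The `main` function drives the handler with an in-memory recorder, so the sketch runs without opening a network port.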
Open questions
- Are the rules persistent? For example: as an admin I log in, set a rule to get an alert about X, then log out. X happens, and after some time I log in again. Will I see a notification?
- I think we need to manage the event types somehow. There will be a lot of them very soon. A separate table with event types? It should be easily extensible, possibly by different independent devs.
- Don't assume that SSE will be the only way to consume alerts. It will be for a while, but later down the road we may integrate with some existing notification systems, like Prometheus' Alertmanager or Stack Exchange's Bosun.
Problems to be resolved
- More descriptive name for EventsCenter
- List currently envisioned events. Let's see if there are some obvious patterns.
- Propose Event structure.
Comments about Scope and Requirements
Alarming has 3 components:
1. the application notification that there is a fault or condition that needs attention (these can come from both Kea and Stork)
2. the management process that determines whether to raise or silence an alarm, severity, alarm channels, history, etc.
3. the management of and communication with alerting channels, such as email, PagerDuty, text, etc.
We have already determined we are going to rely on an external application for #3; that is not something we want to integrate directly into Stork.
The more we look at #2, the more it seems to me that perhaps we should not build a full-featured capability in Stork either. Most organizations of any size have an established fault management application, like Nagios or Zabbix, that monitors for alerts on all their critical applications. These can be pretty elaborate, and it would be a pretty significant chunk of functionality to add to Stork. We should see how far we can go with a direct integration from Kea to these systems. We can satisfy most requirements by pointing to, e.g., Kea integration with Nagios. So, I am thinking maybe we want to build only a minimal management capability for alerts in Stork, with the expectation that Kea admins will primarily rely on some other dedicated fault management system for their alert channel.
So what Kea alerts should we display in Stork? There will be some application conditions that only Stork knows about, so only Stork can raise an alarm. These include things that are Stork application events, like monitoring and un-monitoring servers, adding and removing Stork user accounts, etc. Events that arise from a combination of Kea configuration data and usage data, primarily pool utilization, might be a special category of Kea alarm that is only available in Stork. If possible, it would be ideal if Stork could be configured to forward these to whatever 'standard' alerting system we integrate Kea with.
When an alarm is raised from Kea, say an HA pair state change, that information could flow from Kea to both Stork and an alarm system (e.g. Nagios). Given the choice, a user who may have been alerted by Nagios is going to want to view the alarm in Stork because there is a better chance they can drill down to get more information in Stork (not yet, but eventually). So, we should reflect critical Kea events in Stork even if they are also in Nagios, and even if we can't provide a lot of fancy threshold-crossing and severity management features in Stork.
One way to show Kea events in Stork might be by creating logging channels in Kea that Stork 'subscribes' to. We could set Stork to show the highest severity log messages.