|
|
[[_TOC_]]
|
|
|
|
|
|
# TODO List
|
|
|
|
|
|
- authentication methods: username/password, apikey/token? what is the relation to session?
|
|
|
- update proprietary authorization solution to match what casbin can do
|
|
|
- define URIs for REST service
|
|
|
- define RPC API
|
|
|
- extend the data model: include both locally specified data and the data fetched from remote services
|
|
|
|
|
|
# Stork Base Design
|
|
|
|
|
|
This is the initial design document for Stork. It introduces and describes the foundation of the system, on top of which the features will be gradually built in subsequent releases. The scope of this document includes but is not limited to the following considerations:
|
|
|
|
|
|
- technologies selected,
|
|
|
- data model (including access control lists),
|
|
|
- user authentication and authorization mechanisms,
|
|
|
- integration with existing third party libraries and tools,
|
|
|
- service discovery and integration with existing installations of Kea and BIND,
|
|
|
- software upgrade mechanisms including Stork components upgrades and Kea/BIND upgrades,
|
|
|
- plugins design,
|
|
|
- packaging and releases
|
|
|
|
|
|
The document is meant to address [Base Stork Requirements](https://gitlab.isc.org/isc-projects/stork/wikis/Stork-1.0-Requirements) (Work in Progress).
|
|
|
|
|
|
**This document is at an early stage of development and is subject to review and substantial changes. It reflects the authors' personal views on the subject. It is not set in stone, and whenever it makes a statement like "we will implement it like this or that", it should be read as "the design authors think that it should be implemented this or that way". Everything in this document may be considered a final plan (set in stone) when this note is gone and there are no more open issues in the appendix at the bottom of this document.**
|
|
|
|
|
|
# Introduction
|
|
|
|
|
|
## Authors (please add yourself when you contribute):
|
|
|
|
|
|
List of authors as of December 18th, 2019:
|
|
|
* Marcin Siodelski
|
|
|
* Michal Nowikowski
|
|
|
* Matthijs Mekking
|
|
|
|
|
|
## Related Documents
|
|
|
|
|
|
- [Requirements for the First Release](Stork-1.0-Requirements)
|
|
|
- [Development Environment](Development-Environment)
|
|
|
- [Web UI Design](Web-UI-Design)
|
|
|
|
|
|
## Terminology
|
|
|
|
|
|
### Releases
|
|
|
|
|
|
The release model and versioning are still to be determined. As of September 2019, the team has not agreed upon all of the [requirements](https://gitlab.isc.org/isc-projects/stork/wikis/Stork-1.0-Requirements) for the initial release. In our discussions to date, we have referred to this release as Stork 1.0, but by convention a 1.0 release is production ready. At this early stage of planning and development it seems impractical to refer to the initial release as 1.0, because advertising a 1.0 release would require that it is production ready and adds significant value over the existing systems already used to monitor Kea and BIND. One option still open at this point is to make an early release demonstrating an intermediate stage of the product and asking users for feedback.
|
|
|
|
|
|
Due to these considerations, this document uses *codenames* to distinguish between the releases, rather than version numbers. The first release of Stork, regardless of whether it is production ready, is referred to as *Stork Sprout*. In other words, if the first release we make is production ready, *Stork Sprout* is equivalent to *Stork 1.0*. If we decide to release a non-production ready version of Stork, *Stork Sprout* may mean *Stork 0.8*.
|
|
|
|
|
|
### Abbreviations and Terms
|
|
|
|
|
|
These are the terms used in this document.
|
|
|
|
|
|
* **ANA**: Authorization and Authentication component of Stork. It provides mechanisms to verify that a user has permission to access the system and specifies what the user can do in the system. The **ANA** component also provides means for the super user to create new users and define their permissions.
|
|
|
* **Application**: A **Program** or a group of **Programs** providing some functionality. For example, the Kea application comprises several daemons, including the control agent and the DHCPv4 daemon. An **Application** of the same type cannot run more than once on the same **Machine**.
|
|
|
* **Current State**: a collection of runtime information about the application, daemon or service which provides the details about its operation and allows for determining whether it is operating as desired or with issues. It also includes information about the issues.
|
|
|
* **Component**: A functional part of the system. It can be a third party software integrated with Stork (e.g. Prometheus, Logstash instance, database instance) or it can be a part of Stork distributed on remote machines, e.g. **STAG**.
|
|
|
* **Condition**: boolean-like information about a daemon, application or service indicating whether it is in the desired state. The two possible values are `healthy` and `unhealthy`. The determination is made by comparing the current state with the desired state. The condition may be accompanied by a brief description of the problem which gives the user a hint why the daemon, application or service is unhealthy.
|
|
|
* **Daemon**: A software component that is part of an application. For example, the *named* daemon is part of a BIND9 application. In the case of Kea, *kea-dhcp4* is also a daemon. A daemon may have the `running` or `not running` status. It may have the `not running` status on purpose (because it is optional or not needed in the current configuration) or because of a failure. Whether it is not running on purpose or because of a failure can be determined by examining the desired state of the daemon.
|
|
|
* **Desired State**: a collection of information about an application, daemon or service which defines how the administrator wants the application, daemon or service to behave. The set of parameters belonging to the desired state may not be the same as the set of parameters describing the current state, because the current state may include some diagnostic information, e.g. CPU utilization or application uptime. However, it must always be possible to determine whether the application, service or daemon is in the desired state by examining the current state.
|
|
|
* **LAR**: Locally Applicable Resource, a resource which is stored in the **StorkDB** and not on the remote machine.
|
|
|
* **Program**: Synonym of daemon.
|
|
|
* **Machine**: A machine running one or more **Services** which Stork connects to.
|
|
|
* **RAR**: Remotely Applicable Resource, a resource which is used and stored on the remote machine besides being stored in **StorkDB**.
|
|
|
* **Resource**: A representative piece of information, typically stored in the StorkDB, upon which CRUD operations can be performed. Examples of resources: a Stork user, a subnet, a DNS zone.
|
|
|
* **Service**: A set of **Applications** that interact together to provide something useful for the end user. For example, the DHCP service may have several Kea **Applications** to provide High Availability. A **Service** is managed by Stork via an API, e.g. DHCP service, DNS service, Prometheus instance.
|
|
|
* **STAG**: Stork Agent, a daemon running on a Machine which manages the Services local to this machine. Example functions of **STAG**: report software versions, perform **Service** upgrades.
|
|
|
* **StorkDB**: PostgreSQL database used by Stork to store critical information about Stork itself (e.g. configuration information) and about other components of the system, such as managed servers, network topology, users with their credentials and many more.
|
|
|
* **Status**: boolean-like information indicating whether a daemon or service has been started and has not terminated. The two defined `status` values are `running` and `not running`.
|
|
|
* **StorkCLI**: Stork command line interface, a command line tool which can be used instead of the **StorkUI** to manage the system.
|
|
|
* **StorkUI**: The graphical user interface provided by Stork and available to the user via web browser.
|
|
|
* **Supported Software**: The software installed on **Machine** and providing services that Stork has integration with, e.g. Kea software, BIND software, Prometheus software etc.
|
|
|
* **System**: The Stork application along with all required components and all services managed by Stork.
|
|
|
|
|
|
#### Clarifications
|
|
|
|
|
|
The discussion about the clarifications has been moved to: https://gitlab.isc.org/isc-projects/stork/issues/111
|
|
|
|
|
|
# Technology
|
|
|
|
|
|
In the second half of 2019, the team debated the most suitable technology for implementing Stork. We compared the available frameworks and programming languages for the frontend and the backend using various criteria, such as maturity, popularity, whether they are actively developed, and whether they provide easy to use libraries for creating the desired functionality of Stork. We also took into account the performance of the programming language and the framework. Finally, we gave significant consideration to the ease of deployment of an application written in the given framework.
|
|
|
|
|
|
Most of the candidate technologies for frontend were based on Angular, from which we selected PrimeNG for the following reasons:
|
|
|
* One of the team members worked with PrimeNG before and had good experience with it.
|
|
|
* PrimeNG supports multi column table sorting.
|
|
|
* PrimeNG has support for charts.
|
|
|
|
|
|
The leading candidate technologies for the backend implementations were:
|
|
|
* Golang + gin-gonic framework for building REST API.
|
|
|
* Python + Flask
|
|
|
|
|
|
Despite Python and Flask being more mature, we chose Golang because we believed that the following features of Golang outweigh its lower level of maturity:
|
|
|
- Ease of deployment of the application, because the application is compiled into a single binary.
|
|
|
- Native support for concurrency in Golang which makes it more performant than Python.
|
|
|
|
|
|
In addition, some of the team members reported good experiences writing applications in Go.
|
|
|
|
|
|
Finally, we have chosen PostgreSQL as the database for Stork. It is considered one of the most advanced open source relational databases. One of the nice PostgreSQL features, not available in MySQL, is the ability to write stored procedures in C (or Go compiled as a C library). In addition, there are performant drivers and ORM libraries available in Golang for PostgreSQL, e.g. [go-pg](https://github.com/go-pg/pg).
|
|
|
|
|
|
The decision about selecting these technologies was made on the regular Stork call on September 24th, 2019 (see the appendix).
|
|
|
|
|
|
As we further explored various frameworks for building REST services with Golang, we came across [goswagger](https://goswagger.io), which allows for defining an API using YAML notation and autogenerating the boilerplate code relying on the *net/http* standard library. It also generates documentation for the REST API from the YAML file. Finally, there are very useful online tools which can be used to test the API. As a result, we have made a decision to use [goswagger](https://goswagger.io) instead of gin-gonic.
|
|
|
|
|
|
The Rake tool has been chosen as a build automation tool for the reasons stated [here](https://gitlab.isc.org/isc-projects/stork/wikis/Development-Environment#rationale).
|
|
|
|
|
|
# Architecture Overview
|
|
|
|
|
|
```mermaid
|
|
|
graph TB
|
|
|
|
|
|
ui --- |HTTP REST|gin
|
|
|
gin --- |gRPC|stagbind
|
|
|
gin --- |gRPC|stagkea
|
|
|
gin --- |gRPC|stagprom
|
|
|
gin --- |query|prom
|
|
|
bindexp --- |Scrape|prom
|
|
|
keaexp --- |Scrape|prom
|
|
|
|
|
|
subgraph "Web Browser"
|
|
|
ui[StorkUI - PrimeNG]
|
|
|
end
|
|
|
|
|
|
subgraph "Server host"
|
|
|
gin[Stork Web Service - Swagger]
|
|
|
gin --- pgsql(StorkDB - PostgreSQL)
|
|
|
end
|
|
|
style pgsql fill:#ccc
|
|
|
|
|
|
subgraph "Prometheus host"
|
|
|
stagprom[STAG] --- prom[Prometheus Service]
|
|
|
end
|
|
|
style prom fill:#ccc
|
|
|
|
|
|
subgraph "BIND host"
|
|
|
stagbind[STAG] --- bind[BIND]
|
|
|
bind --- bindexp[Bind Prometheus Exporter]
|
|
|
end
|
|
|
|
|
|
subgraph "Kea host"
|
|
|
stagkea[STAG] --- keaca[Kea CA]
|
|
|
stagkea[STAG] --- keadhcp4[Kea DHCP4]
|
|
|
keaca --- keaexp[Kea Prometheus Exporter]
|
|
|
keadhcp4 --- keaexp[Kea Prometheus Exporter]
|
|
|
end
|
|
|
|
|
|
```
|
|
|
|
|
|
# Web Service
|
|
|
|
|
|
```mermaid
|
|
|
graph TB
|
|
|
|
|
|
browser --- nginx
|
|
|
nginx --- |Static|static(Angular/PrimeNG bundle + other files)
|
|
|
nginx --- |API|net_http
|
|
|
|
|
|
subgraph "Go service"
|
|
|
net_http[net/http] --- middleware[middleware: session/auth/etc]
|
|
|
middleware --- api_dispatch[go-swagger API dispatcher]
|
|
|
api_dispatch --- login[business logic]
|
|
|
end
|
|
|
|
|
|
```
|
|
|
|
|
|
# Stork Agent
|
|
|
|
|
|
The Stork Agent (STAG) is software written in Golang and distributed with Stork. It runs on the Machines (managed systems) hosting the various services Stork communicates with. Individual services often provide APIs which are directly used by Stork to perform certain actions. For example, the Kea Control Agent provides a REST API which is used by Stork to run commands against Kea DHCPv4 and/or DHCPv6 servers to fetch a server's configuration, update it, or otherwise affect the server's operation. Those APIs, however, do not provide enough control over the machine and are not designed for housekeeping operations, such as triggering software updates or updating configurations of services which do not provide APIs. STAG is meant to fill this gap.
|
|
|
|
|
|
The functions of STAG include, but are not limited to the following:
|
|
|
|
|
|
* Check what supported software is installed on the Machine and report its version, e.g. running `kea-dhcp4 -v`.
|
|
|
* Start the supported software, e.g. start Kea server, start BIND, start Prometheus etc.
|
|
|
* Trigger updates of the supported software on user's request.
|
|
|
* Add new targets to Prometheus service, e.g. adding new BIND instance causes Stork to update the list of targets from which the Prometheus scrapes the data. This may require access to the Machine running Prometheus and adding new target file.
|
|
|
* Managing Prometheus exporters on the Machines (need to clarify...)
|
|
|
* Returning a log of the remote service to Stork for display.
|
|
|
* ...
|
|
|
|
|
|
STAG is functionally going to be a service performing certain tasks on Stork's request. It won't hold any representational data model to be accessed or modified by the client (Stork being the client). Therefore, it will have to expose an RPC type of API rather than a REST API. Without yet going into much implementation detail for individual STAG functions, we can already say that there will be two types of requests sent to STAG:
|
|
|
|
|
|
* The request initiating a short action followed by a response after the action completes. For example, the request asking for the version of the Service.
|
|
|
* The request initiating a long action. In this case, the RPC client receives a response from STAG immediately after STAG receives the request. That way STAG confirms the reception of the request and kicks off the desired action, which runs in the background. The client receives a notification when the action is complete. For example, an upgrade of the software on the Machine takes a long time. Stork instructs STAG to start the upgrade and receives a response that STAG will perform this action. STAG sends a notification to Stork when the upgrade completes.
|
|
|
|
|
|
There are many different approaches to implementing RPC. A JSON-RPC 1.0 codec is provided with the standard *net/rpc/jsonrpc* library. However, JSON-RPC 1.0 lacks a mechanism to send notifications, i.e. requests for which the client expects no response. JSON-RPC 2.0 introduces batch commands (multiple commands sent within a single request), which can be processed concurrently. However, the available Go libraries implementing JSON-RPC 2.0 often lack part of this functionality, e.g. batch requests.
|
|
|
|
|
|
A good alternative to JSON-RPC is [gRPC](https://grpc.io), using [Protocol Buffers](https://developers.google.com/protocol-buffers) for data serialization. It is a mature and battle-tested solution which meets our requirements.
|
|
|
|
|
|
We have implemented a simple PoC using gRPC. It is available on the *experiments/grpc* branch of the Stork project.
|
|
|
|
|
|
The following are the major strengths of gRPC:
|
|
|
* It is possible to generate both the client and the server code from the `.proto` file. This file contains the API definition.
|
|
|
* It uses a mature and efficient message exchange format, Protocol Buffers.
|
|
|
* It provides mechanisms for streaming responses and bidirectional communication. Streaming responses are useful for implementing features such as a live view of remote logs.
|
|
|
* It has built-in support for TLS.
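To make the first strength concrete, a hypothetical `.proto` sketch of such an agent API could look like the following. All service and message names here are invented for illustration; the actual API is yet to be defined.

```protobuf
// Hypothetical sketch only -- not the actual Stork agent API.
syntax = "proto3";

package stagapi;

service Agent {
  // Short action: plain request/response.
  rpc GetSoftwareVersion (VersionRequest) returns (VersionReply);
  // Long action: the reply only acknowledges that the upgrade started;
  // completion is reported separately.
  rpc StartUpgrade (UpgradeRequest) returns (UpgradeAck);
  // Streaming: live view of a remote log.
  rpc TailLog (LogRequest) returns (stream LogChunk);
}

message VersionRequest { string app = 1; }
message VersionReply   { string version = 1; }
message UpgradeRequest { string app = 1; string target_version = 2; }
message UpgradeAck     { bool accepted = 1; }
message LogRequest     { string path = 1; }
message LogChunk       { bytes data = 1; }
```

Both the Go client stubs (for the Stork server) and the server skeleton (for STAG) would then be generated from this single file.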
|
|
|
|
|
|
Even though gRPC uses binary HTTP/2 as the underlying transport, this does not pose a problem for debugging the data stream sent between the client and the server, because the latest versions of [Wireshark](https://www.wireshark.org) have extensions to decode HTTP/2 and Protocol Buffers.
|
|
|
|
|
|
gRPC is now the leading candidate for implementing RPC between the Stork Server and STAG.
|
|
|
|
|
|
# Database Role
|
|
|
|
|
|
Stork is a centralized management and monitoring system integrating many components, including third party software. It will be used to manage services running on many machines accessible over the network. The information about the components will be dispersed, but will have to be easily made available to the Stork user via the GUI or the command line tool. Therefore, the information interesting to the user will have to be gathered from the remote system components and then often stored in the local database, where it can be quickly accessed.
|
|
|
|
|
|
What kind of information will be stored in the database is going to evolve over time, but the following list includes some examples of what will be stored initially:
|
|
|
|
|
|
* users,
|
|
|
* roles and ACLs,
|
|
|
* service information,
|
|
|
* service grouping, e.g. high availability,
|
|
|
* BIND/Kea entire configuration,
|
|
|
* DHCP leases,
|
|
|
* host reservations,
|
|
|
* subnets,
|
|
|
* pools,
|
|
|
* shared networks,
|
|
|
* declined addresses,
|
|
|
* all zones,
|
|
|
* zone signing status?,
|
|
|
* NTAs?
|
|
|
* *other ?*
|
|
|
|
|
|
We will be calling a representative piece of information of some type stored in the database a *resource*. For example, a collection of information describing a system user is a resource. Similarly, the subnet configuration information is also a resource.
|
|
|
|
|
|
### Locally/Remotely Applicable Resources
|
|
|
|
|
|
One type of resource classification we need to introduce for the purpose of the design is based on where the resource information applies in terms of location within the network:
|
|
|
- locally applicable resources (LAR)
|
|
|
- remotely applicable resources (RAR)
|
|
|
|
|
|
The LARs apply within the Stork instance itself. The information about the system user is an example of a LAR. The information about the user is created within Stork (when the user creates an account) and stored locally for the sake of authentication and authorization. Another LAR example is the information that defines the location of the Prometheus instance. It is used by Stork to communicate with this instance and typically not distributed anywhere outside of Stork.
|
|
|
|
|
|
The RARs apply outside of Stork. The DHCP server configuration is (in the general case) a RAR because it must be available to the DHCP server running on a machine remote to Stork. A RAR may be defined in Stork (via the Stork web interface), but must later be populated to the remote server if the server is supposed to use this configuration. In the opposite case (also to be supported by Stork), the DHCP (or BIND) server configuration may be created outside of Stork (e.g. by editing the configuration file on the remote machine), but must be made available to Stork via the APIs. This implies that RARs may be populated both ways: from Stork to the managed system, or from the managed systems to Stork.
|
|
|
|
|
|
## Resolving Conflicts between RARs
|
|
|
|
|
|
The simplest way to avoid conflicts between RARs defined on the local (Stork) or remote (Kea/BIND) system is to persist the RAR information in a single place, e.g. the Kea/BIND configuration file. The configuration information is both available to the given server and can be fetched and presented in Stork via the remote control API exposed by our servers. This solution, however, does not scale well. Imagine the case where multiple administrators are viewing the same piece of configuration information. Every time the information is displayed, a control command is sent to the given server to fetch it. This could considerably impact the remote server's performance. It also generates excessive network traffic. More importantly, all modifications to the RAR would immediately cause configuration changes on the remote system. With multiple administrators concurrently modifying the same resource, the server could end up in an inconsistent state.
|
|
|
|
|
|
It appears that introducing some control over the flow of resources between Stork and remote components is inevitable. Being able to store the RARs in the local Stork database (in addition to storing them remotely) can solve a lot of problems, such as:
|
|
|
- editing multiple resources before committing the configuration to the remote system,
|
|
|
- concurrent access control to the resource by multiple users,
|
|
|
- fast access to the resource information (data fetched from local database rather than remote system),
|
|
|
- detection of *direct* configuration modifications on the remote system (by comparing with locally stored information),
|
|
|
- keeping Stork extensible to work with different types of DNS/DHCP servers
|
|
|
|
|
|
The major issue with duplicating RARs between Stork and remote systems is that conflicts may arise when a RAR is modified on the remote system directly. Another use case is modifying a RAR in Stork without populating it to the remote system, perhaps because the system is temporarily offline. This section of the document deals with the resolution of conflicts between RARs.
|
|
|
|
|
|
### Use Case: Adding New Server
|
|
|
|
|
|
The following sequence diagram shows the interactions between the user, StorkUI, Stork Backend, Stork Database and the Kea server being added to the system. Kea is used here as an example; any type of server would fit this diagram.
|
|
|
|
|
|
![stork-uc-new-server.svg](uploads/8ee3cb62c0676e8edeca5fe224184aac/stork-uc-new-server.svg)
|
|
|
|
|
|
When a new server is being added to the system, the information about this server is stored in the Stork database. This information includes the IP address of the server, the UDP port, etc. Whether this information is accepted or rejected by Stork is communicated to the user. When the server is successfully added to the database, the backend will try to fetch its configuration. Next, all the relevant configuration for this server will be fetched from Stork's local database. If such configuration exists, the backend will look into it to see if there are any conflicts with the configuration fetched from the new server. If conflicts exist, the user will be notified via the StorkUI and asked whether to accept the configuration fetched from the server. If the user accepts this configuration, the relevant configuration in the Stork database is updated.
|
|
|
|
|
|
# Data Model
|
|
|
|
|
|
## Daemons, Applications, and Services
|
|
|
|
|
|
What exactly is a **Service**? A **Service** is one or more **Applications** running on the network that provide some capability. An **Application** is one or more **Daemons** that together provide the desired functionality. For example, the DHCP service may have multiple Kea **Applications** running in different modes in order to provide High Availability. A Kea **Application** consists of multiple **Daemons**, such as the Control Agent, the DHCPv4 daemon, the DHCPv6 daemon and the DDNS daemon.
|
|
|
|
|
|
It is important to acknowledge that there are different presentations of an **Application**. One view is that an **Application** is something that is running on a **Machine** and has a particular state (active or not, configuration). Another way to look at it is that the **Application** itself dictates the configuration, and the **Daemons** running on a **Machine** represent an instance of that **Application**. That instance also has a state and hopefully matches the desired configuration that the **Application** dictates.
|
|
|
|
|
|
The following are the types of the services to be supported in the Sprout release:
|
|
|
* BIND DNS service,
|
|
|
* DHCP service,
|
|
|
* Prometheus service
|
|
|
|
|
|
Stork must store information about the services it manages in the database (**StorkDB**). The information stored is the set of **Applications**, and their configurations, that together provide the **Service**.
|
|
|
|
|
|
In the initial releases of Stork, "service discovery" will not be supported. Therefore, the information about a Service will be specified by the user (e.g. via StorkUI). This information must be meaningful to the users and must allow for identifying the type of the service and connecting to it using the appropriate protocol.
|
|
|
|
|
|
There is a need for a table holding the list of the services that Stork communicates with. Each service exposes a different API and has a different role in the system; therefore this table should specify what type of API the service exposes. The supported API types are going to be stored in a separate table:
|
|
|
|
|
|
```
|
|
|
postgres=# \d service_api;
|
|
|
Table "public.service_api"
|
|
|
Column | Type | Collation | Nullable | Default
|
|
|
-------------+---------+-----------+----------+-----------------------------------------
|
|
|
id | integer | | not null | nextval('service_api_id_seq'::regclass)
|
|
|
name | text | | |
|
|
|
description | text | | |
|
|
|
Indexes:
|
|
|
"service_api_pkey" PRIMARY KEY, btree (id)
|
|
|
"name_unique" UNIQUE CONSTRAINT, btree (name)
|
|
|
Referenced by:
|
|
|
TABLE "service" CONSTRAINT "service_api_type_fkey" FOREIGN KEY (api_type) REFERENCES service_api(id)
|
|
|
```
|
|
|
|
|
|
and the predefined API types will be inserted into the table:
|
|
|
|
|
|
```
|
|
|
postgres=# SELECT * FROM service_api;
|
|
|
id | name | description
|
|
|
----+-------------------+--------------------------------------
|
|
|
1 | rndc | BIND rndc
|
|
|
2 | kea-dhcp4-over-ca | Kea DHCPv4 server over Control Agent
|
|
|
3 | kea-dhcp6-over-ca | Kea DHCPv6 server over Control Agent
|
|
|
4 | stag | REST API exposed by Stork Agent
|
|
|
5 | prometheus | REST API exposed by Prometheus
|
|
|
(5 rows)
|
|
|
```
|
|
|
|
|
|
Finally, the service table containing the services that Stork should connect to will have the following structure:
|
|
|
|
|
|
```
|
|
|
postgres=# \d service;
|
|
|
Table "public.service"
|
|
|
Column | Type | Collation | Nullable | Default
|
|
|
-------------+---------+-----------+----------+-------------------------------------
|
|
|
id | integer | | not null | nextval('service_id_seq'::regclass)
|
|
|
name | text | | |
|
|
|
address | inet | | |
|
|
|
hostname | text | | |
|
|
|
port | integer | | not null |
|
|
|
api_type | integer | | not null |
|
|
|
description | text | | |
|
|
|
Indexes:
|
|
|
"service_pkey" PRIMARY KEY, btree (id)
|
|
|
"service_api_type_idx" btree (api_type)
|
|
|
Check constraints:
|
|
|
"service_check" CHECK (address IS NOT NULL OR hostname IS NOT NULL)
|
|
|
Foreign-key constraints:
|
|
|
"service_api_type_fkey" FOREIGN KEY (api_type) REFERENCES service_api(id)
|
|
|
```
|
|
|
|
|
|
Note that the `service` table includes a constraint verifying that at least one of address or hostname is specified.
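For reference, the `psql` listing above corresponds roughly to the following DDL. This is a sketch reconstructed from the `\d` output, not necessarily the exact migration Stork will ship:

```sql
CREATE TABLE service (
    id          SERIAL PRIMARY KEY,
    name        TEXT,
    address     INET,
    hostname    TEXT,
    port        INTEGER NOT NULL,
    api_type    INTEGER NOT NULL REFERENCES service_api (id),
    description TEXT,
    -- At least one way to reach the service must be given.
    CONSTRAINT service_check CHECK (address IS NOT NULL OR hostname IS NOT NULL)
);
CREATE INDEX service_api_type_idx ON service (api_type);
```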
|
|
|
|
|
|
# Authentication and Authorization
|
|
|
|
|
|
One of the key components of Stork is the Authorization and Authentication (**ANA**) component, which guards against access to the system by malicious users and against authenticated users viewing, modifying or executing system resources without the appropriate permissions. It is envisaged that future Stork releases will provide integration with external systems, e.g. Kerberos. In the Sprout release, however, we want to make sure that Stork can run without such dependencies and offers its own way to deal with user permissions.
|
|
|
|
|
|
## Authentication
|
|
|
|
|
|
Stork uses a relational database to store the information about all the critical parts of the system. The authentication information about the users is also stored in this database. The information about the users having access to the system is going to be stored in a dedicated SQL table.
|
|
|
|
|
|
```sql
|
|
|
CREATE TABLE stork_user (
|
|
|
id SERIAL PRIMARY KEY,
|
|
|
login TEXT,
|
|
|
pswhash TEXT,
|
|
|
email TEXT,
|
|
|
name TEXT,
|
|
|
lastname TEXT);
|
|
|
```
|
|
|
|
|
|
It contains basic information about a user of the system, including the user's login and password hash. A user account can be created by another user having `super` privileges. If Stork is configured to allow self account creation, the account may also be created by the new user and then approved by a super user notified over email or by some other means.
|
|
|
|
|
|
The typical *insert* statement creating a new user account will look like this.
|
|
|
|
|
|
```sql
|
|
|
INSERT INTO stork_user (login, pswhash, email, name, lastname)
|
|
|
VALUES ('jkowal', crypt('new password', gen_salt('md5')), 'jkowal@example.org', 'Jan', 'Kowalski');
|
|
|
```
|
|
|
|
|
|
The password will be hashed so that it is not visible to anyone. The use of cryptographic functions in PostgreSQL requires enabling the pgcrypto extension:
|
|
|
|
|
|
```sql
|
|
|
CREATE EXTENSION pgcrypto;
|
|
|
```
|
|
|
|
|
|
The authentication query will look like this:
|
|
|
|
|
|
```sql
|
|
|
SELECT (pswhash = crypt('new password', pswhash))
|
|
|
FROM stork_user
|
|
|
WHERE email = 'jkowal@example.org';
|
|
|
```
|
|
|
|
|
|
## Sessions
|
|
|
|
|
|
When a user authenticates using their credentials, they should remain logged into the system until they explicitly log out or a period of inactivity expires. The HTTP protocol is stateless and does not keep track of the client's activity. Traditionally, web applications have used a *session* mechanism to record the fact that the client has been authenticated and to store any additional information about the client that could be handy for the server in future communication with the client.
|
|
|
|
|
|
It has recently become common practice to use *JSON Web Tokens (RFC 7519)*, a.k.a. *JWT*, instead of traditional *sessions*. One of the arguments is that *JWT* can reduce or completely eliminate state keeping on the server side, because the token sent by the client contains the entire state information (stateless JWT). This approach may be useful in systems which need to scale and for which maintaining a large number of sessions becomes an issue. However, this argument does not apply to a single Stork installation, which is going to have a limited user base. In fact, keeping state information about the clients entering the system may be useful for auditing and statistics.
|
|
|
|
|
|
A stateless *JWT* deployment has no way to invalidate the state of the client on the server side. The tokens carry only an expiration time, and until that time elapses a stateless server has to trust the tokens it receives from the client. If the server were to invalidate the state before the token expires, some form of session mechanism would have to be introduced anyway. This leads to the conclusion that the server must keep state information, and therefore there is no good argument in favor of using *JWT* for authentication.
|
|
|
|
|
|
The traditional session mechanism is battle tested and relies on cookies, which are more secure than the *local storage* that developers often elect for storing *JWT* tokens.
|
|
|
|
|
|
There is an interesting [blog post from 2016](http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/) which argues against using *JWT* as a session keeping mechanism.
|
|
|
|
|
|
In an attempt to find a Go library that already implements sessions and works with a PostgreSQL database as storage, we have identified [SCS](https://github.com/alexedwards/scs) as a good candidate. Please refer to the [PostgreSQL specific part](https://github.com/alexedwards/scs/tree/master/postgresstore) of the SCS library documentation for details on how to set up the database.
|
|
|
|
|
|
## Authorization
|
|
|
|
|
|
There are two common models of defining permissions to the resources:
|
|
|
- Role Based Access Control (RBAC)
|
|
|
- Access Control Lists (ACL)
|
|
|
|
|
|
In the RBAC model a set of permissions to different resources (and types of resources) is assigned to a role (group), rather than to a particular user. Users are then assigned to roles/groups. This mechanism allows for smooth assignment of permissions to a new user. A special role which must be supported by RBAC is the *super user* role, which is granted all permissions within the system. Roles may be defined per resource type, e.g. an individual having the SUBNET_MANAGER role would be allowed to manage all subnets within the system. Another way to partition roles would be per resource instance, e.g. SERVER_ID13_CONFIG_MANAGER would be able to manage the entire configuration of a particular server (having the id of 13 in our example). However, this is not something we'd like to do in a real system, as it does not scale well. Instead, we'd rather define permissions in a hierarchical way, e.g. `/servers/13/*`. Rather than defining a static role, we specify a permission granting access to any resource belonging to the server with id 13, whether existing now or added in the future. Since this type of permission covers many resources, it may be enough for the user responsible for managing the server with id 13 to be granted this permission directly, instead of being assigned a role. This implies that, although the RBAC system is very handy, it doesn't cover all cases: besides being able to define a role, it is also important to be able to assign permissions to users directly. That's what the ACL is designed to do.
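The hierarchical permission idea above can be sketched in a few lines of Go; `permits` is a hypothetical helper of ours, not part of any library:

```go
package main

import (
	"fmt"
	"strings"
)

// permits checks a hierarchical permission pattern such as "/servers/13/*"
// against a requested resource path. A trailing "/*" grants access to every
// resource under that prefix -- existing now or added in the future.
func permits(pattern, resource string) bool {
	if strings.HasSuffix(pattern, "/*") {
		prefix := strings.TrimSuffix(pattern, "/*")
		return resource == prefix || strings.HasPrefix(resource, prefix+"/")
	}
	// Without a wildcard, only an exact match grants access.
	return pattern == resource
}

func main() {
	fmt.Println(permits("/servers/13/*", "/servers/13/config")) // true
	fmt.Println(permits("/servers/13/*", "/servers/14/config")) // false
}
```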
|
|
|
|
|
|
The ACL deals better with fine grained permissions. It couples permissions with users, rather than with roles (groups). It answers the question "what can this user do in the system?" rather than "what role does the user have in the system?". An example of an ACL entry is "user marcin can view subnet 13".
|
|
|
|
|
|
It is clear that we want Stork to be able to facilitate both access control models. We already know that there will be use cases requiring both of them. For example:
|
|
|
- "user marcin can (only) manage configurations on server 13" (ACL),
|
|
|
- "user john can manage all DNS specific resources" (RBAC)
|
|
|
|
|
|
The number of possible use cases is very large and we must not limit ourselves to picking one model or the other. A hybrid approach becomes necessary. In this hybrid approach we must be able to facilitate the following scenarios:
|
|
|
- A role (group) is created and assigned multiple permissions. Next, multiple users are assigned this role (assigned to the group).
|
|
|
- A user is assigned one or more permissions without being assigned to any role (or group).
|
|
|
- A user is assigned a role (to a group) and also has several permissions assigned to the user exclusively. The user specific permissions take precedence over group specific permissions.
|
|
|
|
|
|
Permissions may be explicit, e.g. "permission to manage the subnet with id 13", or implicit, e.g. "permission to manage all subnets".
|
|
|
|
|
|
### Casbin
|
|
|
|
|
|
[casbin](https://github.com/casbin/casbin) is an open source library, available for different programming languages (including Go), which provides generic support for different access control models, including RBAC and ACL.
|
|
|
|
|
|
In the typical usage scenario, the application requires two configuration files: a model configuration file and a policy file. The former specifies how to match the policy rules against incoming requests. The latter specifies the rules indicating who (or which group) can access which resource and what they can do with it.
|
|
|
|
|
|
There is a rich set of examples in the [casbin documentation](https://casbin.org/docs/en/supported-models). Here, we limit ourselves to some basic concepts of the library and assess whether it is suitable for our needs.
|
|
|
|
|
|
The following is an example policy file using REST API style resources:
|
|
|
|
|
|
```
|
|
|
[1] p, marcin, /servers/2/*, (GET)|(POST)|(PUT)|(DELETE)
|
|
|
[2] p, xiong, /servers/3/*, GET
|
|
|
[3] p, subnet_watchers, /subnets, GET
|
|
|
[4] g, marcin, subnet_watchers
|
|
|
[5] p, machine_managers, /machines/{id}/*, (GET)|(POST)|(PUT)|(DELETE)
|
|
|
[6] g, marcin, machine_managers
|
|
|
|
|
|
```
|
|
|
|
|
|
The respective lines have the following meaning:
|
|
|
1. User marcin can access, modify and delete data pertaining to server 2. For example, he may fetch the configuration of server 2, modify the information about server 2 in the StorkDB, etc.
|
|
|
1. User xiong can fetch the information about the server 3, including its configuration.
|
|
|
1. The group subnet_watchers can list subnets.
|
|
|
1. User marcin belongs to the group subnet_watchers, so he inherits the permissions of that group.
|
|
|
1. The group machine_managers can access, modify and delete the machine information.
|
|
|
1. User marcin belongs to the group machine_managers and inherits its permissions.
|
|
|
|
|
|
Note that the policy file both defines user permissions and associates some users with groups (roles). Neither the user names nor the resource names are validated by the casbin enforcer. It is up to the application to validate the user names (and perform authentication) as well as the resource names. The policy file is read (typically when the application is launched) and the enforcer uses this data for authorization. New policy entries added to the file will not take effect until the application is reloaded. Therefore, casbin provides a simple API to add new policy entries at runtime. The updated policy can then be saved into the file (or other storage).
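To make the CSV structure concrete, here is a minimal Go sketch of parsing policy lines of the form shown above into `p`/`g` rules. This is illustrative only and is not how casbin's own loader works internally; `rule` and `parsePolicy` are our names.

```go
package main

import (
	"fmt"
	"strings"
)

// rule represents one line of the CSV policy file: either a permission
// ("p, subject, object, action") or a grouping entry ("g, user, group").
type rule struct {
	kind   string   // "p" or "g"
	fields []string // remaining comma-separated values
}

// parsePolicy splits a policy document into rules, skipping blank lines.
func parsePolicy(doc string) []rule {
	var rules []rule
	for _, line := range strings.Split(doc, "\n") {
		parts := strings.Split(line, ",")
		if len(parts) < 2 {
			continue // blank or malformed line
		}
		for i := range parts {
			parts[i] = strings.TrimSpace(parts[i])
		}
		rules = append(rules, rule{kind: parts[0], fields: parts[1:]})
	}
	return rules
}

func main() {
	policy := `p, xiong, /servers/3/*, GET
g, marcin, subnet_watchers`
	for _, r := range parsePolicy(policy) {
		fmt.Println(r.kind, r.fields)
	}
}
```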
|
|
|
|
|
|
There are [storage adapters](https://casbin.org/docs/en/adapters) available, contributed by casbin users. One of them is [casbin-pg-adapter](https://github.com/MonedaCacao/casbin-pg-adapter), which works with [go-pg](https://github.com/go-pg/pg), considered in this design as the ORM library for Stork. The casbin policy file uses the CSV format, and the way the adapters store the policy in the database reflects this CSV file structure. The policy is stored in a single table, and this table is dropped and recreated (with the updated policy) upon each policy update.
|
|
|
|
|
|
The [casbin API documentation](https://casbin.org/docs/en/management-api) describes all supported API calls. It is possible to fetch policies for a particular user or group. It is also possible to check whether a user has certain permissions, based on either a grouping policy or a user specific policy.
|
|
|
|
|
|
While the policy information (assignment of roles and permissions) is subject to frequent changes, the model configuration is mostly static. The following is an example model configuration which makes use of the policy defined above:
|
|
|
|
|
|
|
|
|
```
|
|
|
[request_definition]
|
|
|
r = sub, obj, act
|
|
|
|
|
|
[policy_definition]
|
|
|
p = sub, obj, act
|
|
|
|
|
|
[role_definition]
|
|
|
g = _, _
|
|
|
|
|
|
[policy_effect]
|
|
|
e = some(where (p.eft == allow))
|
|
|
|
|
|
[matchers]
|
|
|
m = (g(r.sub, p.sub) || r.sub == p.sub) && keyMatch3(r.obj, p.obj) && regexMatch(r.act, p.act)
|
|
|
```
|
|
|
|
|
|
The *matchers* section instructs casbin how to match policies with incoming requests. The following is a brief description of the components present in the matchers section above:
|
|
|
* `g(r.sub, p.sub)` - checks if the request subject (r.sub) belongs to the group (g) with which the policy subject (p.sub) is associated.
|
|
|
* `r.sub == p.sub` - checks if the user (r.sub) is the same as the user for which the policy was defined (p.sub).
|
|
|
* `keyMatch3` - a matching function which checks if the requested resource (r.obj) matches the resource pattern for which the policy was defined (p.obj); it understands `{id}` placeholders and `*` wildcards.
|
|
|
* `regexMatch` - a function which checks if the requested action (r.act) matches one of the actions present in the given policy entry (p.act).
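The two matcher functions can be approximated in a short Go sketch; `keyMatch3Like` and `regexMatchLike` are our simplified stand-ins, not casbin's actual implementations:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keyMatch3Like mimics casbin's keyMatch3: "{id}" matches a single path
// segment and "*" matches any remainder. Simplified stand-in, not casbin code.
func keyMatch3Like(request, pattern string) bool {
	re := regexp.QuoteMeta(pattern)
	re = strings.ReplaceAll(re, `\{id\}`, `[^/]+`)
	re = strings.ReplaceAll(re, `\*`, `.*`)
	matched, _ := regexp.MatchString("^"+re+"$", request)
	return matched
}

// regexMatchLike mimics regexMatch: the requested action must match the
// regular expression stored in the policy entry.
func regexMatchLike(request, pattern string) bool {
	matched, _ := regexp.MatchString("^("+pattern+")$", request)
	return matched
}

func main() {
	fmt.Println(keyMatch3Like("/machines/7/os", "/machines/{id}/*"))   // true
	fmt.Println(regexMatchLike("POST", "(GET)|(POST)|(PUT)|(DELETE)")) // true
	fmt.Println(regexMatchLike("PATCH", "GET"))                        // false
}
```

This shows why the `marcin | /machines/1/os | POST` row in the table below the model is granted: the object matches the `/machines/{id}/*` pattern and POST matches the action regex.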
|
|
|
|
|
|
The following is a set of examples (user + URI + action) and the result of enforcing them against the authorization model and policy provided above:
|
|
|
|
|
|
| user | URI | action | access granted |
|
|
|
| ---- | --- | ------ | -------------- |
|
|
|
| marcin | /servers/2/subnets/3 | POST | yes |
|
|
|
| xiong | /servers/2/subnets/3 | POST | no |
|
|
|
| xiong | /servers/3/subnets/3 | GET | yes |
|
|
|
| xiong | /servers/3/subnets/3 | POST | no |
|
|
|
| marcin | /subnets | GET | yes |
|
|
|
| xiong | /subnets | GET | no |
|
|
|
| marcin | /machines/1/os | POST | yes |
|
|
|
|
|
|
A simple coding experiment demonstrating those policies is available in https://gitlab.isc.org/isc-projects/stork/commit/07970d05095d381f92e3e0a0fb01460e9ecd4756.
|
|
|
|
|
|
This library looks very promising and gives a certain level of flexibility. However, there are still several considerations which have to be taken into account before we can accept casbin as the authorization solution:
|
|
|
|
|
|
* Use of multiple models simultaneously is apparently not supported. The question is whether we can construct a *matchers* expression which facilitates the various types of policies we desire.
|
|
|
* Policies stored in the database have no linkage with the data model in Stork. In particular, the *subjects* in casbin have no relation to the *users* in Stork. The *objects* in casbin have no relation to the *resources* in Stork. While this gives a lot of flexibility, it complicates operations like automatic removal of the policies relating to the particular resource when the resource is removed from the database. Also, there are no means to check data integrity within the database.
|
|
|
* The storage of the policies in the database is inefficient - the policy table is dropped and recreated with the updated policy.
|
|
|
* In addition, the database administrator may disallow creation of tables for certain users.
|
|
|
|
|
|
### Proprietary Authorization Implementation (To be updated or deprecated)
|
|
|
|
|
|
The following SQL examples are not meant to provide an exhaustive set of SQL relations and permissions that can be expressed with the ACL. They are meant to demonstrate ideas for how the ACL can be implemented within the Stork database. They should be easily reusable for any new resource type introduced into Stork.
|
|
|
|
|
|
Consider the following resource table including subnets:
|
|
|
|
|
|
```sql
|
|
|
CREATE TABLE subnet (id SERIAL PRIMARY KEY, content JSON);
|
|
|
```
|
|
|
|
|
|
This table has a very simple structure and stores the subnet configuration in JSON format in the second column. Storing the subnet configuration this way has many benefits, which are described elsewhere in this document. For the purpose of understanding the ACL, it is important to note that there is a dedicated table storing resources of the given type. This table has a primary key which can be referenced from other tables.
|
|
|
|
|
|
We already have the `stork_user` table which holds the information about the users of the system. Another table is required, which associates the users with specific permissions.
|
|
|
|
|
|
```sql
|
|
|
postgres=# \d user_permission;
|
|
|
Table "public.user_permission"
|
|
|
Column | Type | Collation | Nullable | Default
|
|
|
---------------+---------+-----------+----------+-----------------------------------
|
|
|
id | integer | | not null | nextval('perms_id_seq'::regclass)
|
|
|
resource_name | text | | not null |
|
|
|
resource_id | integer | | |
|
|
|
user_id | integer | | |
|
|
|
access_list | json | | |
|
|
|
Foreign-key constraints:
|
|
|
"perms_user_id_fkey" FOREIGN KEY (user_id) REFERENCES stork_user(id)
|
|
|
```
|
|
|
|
|
|
It has a foreign key to the `stork_user` table, which means that permissions may only be specified for an existing user. On the other hand, it doesn't explicitly refer to any resource type, as this table is meant to hold access control lists for all types of resources. The specific resource to which an entry in this table applies is identified by the pair of `resource_name` and `resource_id` values. An example `resource_name` value is `subnet`. The resource id is the primary key value of the subnet in the `subnet` SQL table. The `resource_id` is optional (can be NULL). In that case, it is assumed that the user has access to all resources of the given type, e.g. all subnets (both those existing in the database and those added in the future).
|
|
|
|
|
|
The ability to specify fine grained permissions for individual users is flexible and powerful, but it is often impractical from the administrator's perspective. We need a mechanism to define a set of permissions and associate it with multiple users. A permission set may contain a collection of permissions to multiple resources of the same type or to multiple resources of different types. It is up to the administrator to define those sets. In order to facilitate this scenario we will create a new entity called a *group*. A group can be assigned a set of permissions, just like a user. The groups are held in a separate SQL table.
|
|
|
|
|
|
```sql
|
|
|
postgres=# \d stork_user_group;
|
|
|
Table "public.stork_user_group"
|
|
|
Column | Type | Collation | Nullable | Default
|
|
|
-------------+---------+-----------+----------+----------------------------------------------
|
|
|
id | integer | | not null | nextval('stork_user_group_id_seq'::regclass)
|
|
|
name | text | | |
|
|
|
description | text | | |
|
|
|
Indexes:
|
|
|
"stork_user_group_pkey" PRIMARY KEY, btree (id)
|
|
|
Referenced by:
|
|
|
TABLE "group_permission" CONSTRAINT "group_permission_group_id_fkey" FOREIGN KEY (group_id) REFERENCES stork_user_group(id)
|
|
|
TABLE "user_group_assoc" CONSTRAINT "user_group_assoc_group_id_fkey" FOREIGN KEY (group_id) REFERENCES stork_user_group(id)
|
|
|
```
|
|
|
The group is associated with a user specified set of permissions via the `group_permission` table:
|
|
|
|
|
|
```sql
|
|
|
postgres=# \d group_permission;
|
|
|
Table "public.group_permission"
|
|
|
Column | Type | Collation | Nullable | Default
|
|
|
---------------+---------+-----------+----------+----------------------------------------------
|
|
|
id | integer | | not null | nextval('group_permission_id_seq'::regclass)
|
|
|
resource_name | text | | not null |
|
|
|
resource_id | integer | | |
|
|
|
group_id | integer | | |
|
|
|
access_list | json | | |
|
|
|
Indexes:
|
|
|
"group_permission_pkey" PRIMARY KEY, btree (id)
|
|
|
"group_id_idx" btree (group_id)
|
|
|
Foreign-key constraints:
|
|
|
"group_permission_group_id_fkey" FOREIGN KEY (group_id) REFERENCES stork_user_group(id)
|
|
|
```
|
|
|
|
|
|
As in the case of the `user_permission` table, the `resource_id` is optional.
|
|
|
|
|
|
Finally, individual users will be associated with the groups using the following table:
|
|
|
|
|
|
```sql
|
|
|
postgres=# \d user_group_assoc;
|
|
|
Table "public.user_group_assoc"
|
|
|
Column | Type | Collation | Nullable | Default
|
|
|
----------+---------+-----------+----------+---------
|
|
|
user_id | integer | | not null |
|
|
|
group_id | integer | | not null |
|
|
|
Indexes:
|
|
|
"user_group_assoc_pkey" PRIMARY KEY, btree (user_id, group_id)
|
|
|
Foreign-key constraints:
|
|
|
"user_group_assoc_group_id_fkey" FOREIGN KEY (group_id) REFERENCES stork_user_group(id)
|
|
|
"user_group_assoc_user_id_fkey" FOREIGN KEY (user_id) REFERENCES stork_user(id)
|
|
|
```
|
|
|
|
|
|
The relations facilitating the ACL are graphically presented below.
|
|
|
|
|
|
![stork-schema-acl](uploads/cbeca9d7ea0eb44370e13e048e3ba52e/stork-schema-acl.png)
|
|
|
|
|
|
The organization of the ACL should facilitate the following typical use cases:
|
|
|
- checking the permissions of a user and/or group with respect to a particular resource,
|
|
|
- finding the users (and groups) that have any privileges with respect to a given resource, and what these privileges are,
|
|
|
- checking all the permissions of a given user in the system.
|
|
|
|
|
|
Below, we provide example queries for these use cases.
|
|
|
|
|
|
#### Use Case 1
|
|
|
|
|
|
The following query selects all user specific permissions of the user with id of 1 (`jkowal@example.org`) to the subnet having the id of 1. It returns at most one row.
|
|
|
|
|
|
```sql
|
|
|
SELECT u.email, s.content ->> 'prefix' AS subnet, p.access_list
|
|
|
FROM stork_user AS u
|
|
|
INNER JOIN user_permission AS p
|
|
|
ON u.id = p.user_id
|
|
|
INNER JOIN subnet AS s
|
|
|
ON CAST(s.content ->> 'subnet_id' AS INTEGER) = p.resource_id
|
|
|
WHERE u.id = 1 AND p.resource_name = 'subnet' AND p.resource_id = 1;
|
|
|
```
|
|
|
|
|
|
```
|
|
|
email | subnet | access_list
|
|
|
--------------------+-----------+------------------------------------------
|
|
|
jkowal@example.org | 192.0.2.5 | [ "create", "read", "update", "delete" ]
|
|
|
```
|
|
|
|
|
|
The following example query returns all group permissions of the user with id of 1 to the subnet with id of 1:
|
|
|
|
|
|
```sql
|
|
|
SELECT * FROM group_permission
|
|
|
WHERE group_id IN
|
|
|
(SELECT group_id FROM user_group_assoc WHERE user_id = 1)
|
|
|
AND resource_name = 'subnet' AND (resource_id IS NULL OR resource_id = 1);
|
|
|
```
|
|
|
|
|
|
#### Use Case 2
|
|
|
|
|
|
This query selects all users having any permissions to the subnet having id of 1.
|
|
|
|
|
|
```sql
|
|
|
SELECT u.email, p.access_list
|
|
|
FROM stork_user AS u
|
|
|
INNER JOIN user_permission AS p
|
|
|
ON u.id = p.user_id
|
|
|
WHERE p.resource_name = 'subnet' AND (p.resource_id IS NULL OR p.resource_id = 1);
|
|
|
```
|
|
|
|
|
|
```
|
|
|
email | access_list
|
|
|
--------------------+------------------------------------------
|
|
|
jkowal@example.org | [ "create", "read", "update", "delete" ]
|
|
|
```
|
|
|
|
|
|
This query selects all groups having any permissions to the subnet having id of 1.
|
|
|
|
|
|
```sql
|
|
|
SELECT * FROM stork_user_group AS g
|
|
|
INNER JOIN group_permission AS p
|
|
|
ON g.id = p.group_id
|
|
|
WHERE p.resource_name = 'subnet' AND (p.resource_id IS NULL OR p.resource_id = 1);
|
|
|
```
|
|
|
|
|
|
#### Use Case 3
|
|
|
|
|
|
This query returns all permissions granted to the user.
|
|
|
|
|
|
```sql
|
|
|
SELECT p.resource_name, p.resource_id, p.access_list
|
|
|
FROM user_permission AS p
|
|
|
WHERE p.user_id = 1;
|
|
|
```
|
|
|
|
|
|
```
|
|
|
resource_name | resource_id | access_list
|
|
|
----------------+-------------+------------------------------------------
|
|
|
subnet | 1 | [ "create", "read", "update", "delete" ]
|
|
|
```
|
|
|
|
|
|
The following query returns all permissions granted to the user via the groups it belongs to.
|
|
|
|
|
|
```sql
|
|
|
SELECT * FROM group_permission
|
|
|
WHERE group_id IN
|
|
|
(SELECT group_id FROM user_group_assoc WHERE user_id = 1);
|
|
|
```
|
|
|
|
|
|
It excludes user specific permissions.
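The backend will need to combine the results of the user specific and group queries. A minimal Go sketch of the precedence rule (user specific permissions override group permissions, with a NULL `resource_id` modeled as a nil pointer meaning "all resources of the type") might look as follows; all names are illustrative:

```go
package main

import "fmt"

// perm mirrors one row of user_permission or group_permission. A nil
// resourceID corresponds to SQL NULL: access to all resources of the type.
type perm struct {
	resourceName string
	resourceID   *int
	accessList   []string
}

// effectiveAccess returns the access list for a resource, letting a matching
// user-specific permission override any group permission, as the hybrid
// model described earlier requires.
func effectiveAccess(userPerms, groupPerms []perm, name string, id int) []string {
	match := func(perms []perm) []string {
		for _, p := range perms {
			if p.resourceName == name && (p.resourceID == nil || *p.resourceID == id) {
				return p.accessList
			}
		}
		return nil
	}
	if acl := match(userPerms); acl != nil {
		return acl
	}
	return match(groupPerms)
}

func main() {
	one := 1
	user := []perm{{"subnet", &one, []string{"read"}}}
	group := []perm{{"subnet", nil, []string{"create", "read", "update", "delete"}}}

	// The user-specific entry for subnet 1 wins over the group wildcard.
	fmt.Println(effectiveAccess(user, group, "subnet", 1)) // [read]
	// For subnet 2 only the group wildcard applies.
	fmt.Println(effectiveAccess(user, group, "subnet", 2)) // [create read update delete]
}
```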
|
|
|
|
|
|
# Viewing Remote Logs
|
|
|
|
|
|
The Stork user wants to be able to click on a selected service in the StorkUI and display the logs of that service. Typically, the user will be interested in seeing the *tail* of the log to diagnose problems with the given service. The functionality we provide must not rely on the API of the given service to return the logs, because the service may be down. In fact, the service being down may be the primary reason why someone is trying to look into the logs.
|
|
|
|
|
|
Among the most powerful software solutions enabling remote logging are Logstash and Graylog. They come with a whole infrastructure which is not only capable of transferring the logs from remote systems to a central database, but also structures the captured logs in a way that allows for efficient and flexible searching using the [Elastic Stack](https://www.elastic.co/products/elasticsearch). The integration with those tools comes at the expense of an additional burden on the system administrator to install and configure them. We envisage such integration in the future, but making it user friendly would require significant effort and much more experience with those solutions than we have today.
|
|
|
|
|
|
It also seems impractical to use an "all or nothing" approach and force users to install such a heavy environment if the only operation they want to perform is to watch and scroll through the tail of a log file.
|
|
|
|
|
|
We propose that remote logging in Stork Sprout be limited to a home grown solution for viewing the tail of a given log file.
|
|
|
|
|
|
Stork will have access to the configurations of services such as Kea, BIND etc. STAG will provide an RPC command to return a partial log for a given file. The path to the file will be sent as a parameter in the RPC call. The response to this command will contain the tail of the log. We can possibly also provide an additional parameter specifying the maximum number of lines to be returned. This solution will be lightweight and easy to implement. It will mostly address the use case of viewing a dead server's last log messages.
|
|
|
|
|
|
In the second step we will extend this mechanism to follow the log messages produced by the service, similar to the Unix `tail -f` command. gRPC enables streaming responses, which means that Stork may issue a request to STAG to fetch and follow the log, and STAG will send new log lines as they appear. A similar streaming mechanism will be required between the StorkUI and the backend.
|
|
|
|
|
|
# Operation
|
|
|
|
|
|
## Adding new Server with STAG and Services
|
|
|
|
|
|
```mermaid
|
|
|
sequenceDiagram
|
|
|
User->>Server: add new server (address where STAG has been started)
|
|
|
Server->>STAG: what do you have
|
|
|
STAG-->>Server: I have Kea
|
|
|
```
|
|
|
|
|
|
# Appendix 1: Decisions Made and To Be Made
|
|
|
|
|
|
This appendix lists all critical issues identified by the team for which an explicit decision has to be made. It also includes the final decision and the date when it was made.
|
|
|
|
|
|
| Issue | Decision | Reasons | Decision Date | Signature |
|
|
|
| ----- | -------- | ------- | -------------- | --------- |
|
|
|
| Frontend Technology | PrimeNG | | Stork Call - Sept 24, 2019 | Godfryd, Marcin, Matthijs, Tomek, Vicky |
|
|
|
| Backend Technology | Golang + gin-gonic (may be affected by subsequent decision regarding the use of swagger) | | Stork Call - Sept 24, 2019 | Godfryd, Marcin, Matthijs, Tomek, Vicky |
|
|
|
| Stork Database Selection | Postgres | | Stork Call - Sept 24, 2019 | Godfryd, Marcin, Matthijs, Tomek, Vicky |
|
|
|
| Single vs Multiple repositories for frontend and backend | | ? | ? | ? |
|
|
|
| Agreement on requirements for 1.0 release | | ? | ? | ? |
|
|
|
| Use of goswagger vs gin | goswagger (partially deprecates the decision about Backend Technology) | We want to use Swagger for documenting the API and generating the server and client sides. Gin does not support this. There is only one tool that can do this for Go: goswagger.io. We also do not lose much, as the web service will mostly serve the API, so features for serving regular HTML pages are not required. | 2019.10.01 | Godfryd, Marcin, Matthijs, Tomek, Witold, Alan |
|
|
|
| Use of casbin vs proprietary authorization | | ? | ? | ? |