Fix zero server fingerprint file preserved after upgrade to 1.15.1
Use case
The user first updates the Stork server to 1.15.1 while the agent is still on an older version. The agent is then restarted; at this point the server and agent can still communicate. Next, the user updates the agent to 1.15.1 and restarts it. Now the agent rejects the server's connections.
Steps to reproduce:
- Prepare an environment with the Stork server and agent at version 1.13
- Register the agent and authorize it
- Update server to 1.15.1
- Restart agent
- Update agent to 1.15.1
- Restart agent
- Observe the bad certificate errors
It is probably caused by a zero server fingerprint being stored, in this case on the agent side:
root@e2c8d0ae75b3:/pkgs# cat /var/lib/stork-agent/tokens/server-cert.sha256
0000000000000000000000000000000000000000000000000000000000000000
So, in my opinion, backend/agent/register.go:86-99 needs to be rethought.
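To confirm that an agent host is in this state, the stored fingerprint can be checked for an all-zero digest. The following is only a minimal sketch, assuming the file holds a hex-encoded SHA-256 digest as in the output above; `isZeroFingerprint` is a hypothetical helper and not part of the Stork code base.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"os"
	"strings"
)

// isZeroFingerprint reports whether content is a hex-encoded digest that
// consists only of zero bytes. Hypothetical helper, not part of Stork.
func isZeroFingerprint(content string) bool {
	raw, err := hex.DecodeString(strings.TrimSpace(content))
	if err != nil || len(raw) == 0 {
		// Malformed or empty content is not treated as a zero fingerprint.
		return false
	}
	for _, b := range raw {
		if b != 0 {
			return false
		}
	}
	return true
}

func main() {
	// Path taken from the output above; adjust it for your installation.
	const path = "/var/lib/stork-agent/tokens/server-cert.sha256"
	content, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("cannot read fingerprint file:", err)
		return
	}
	fmt.Println("zero fingerprint:", isZeroFingerprint(string(content)))
}
```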
Workarounds:
- If you upgrade the server first and then run the old agent against the new server, recover by upgrading the agent and then removing the cert files manually.
- Upgrade both the agent and the server at the same time.
- Upgrade the agent first (it won't be able to connect to the old server), then upgrade the server; the connection will be reestablished.
Solutions
- Assume the cert store is valid if the server fingerprint file is missing, and return the zero value when reading a fingerprint that is missing. Cons: A bit inconsistent, but good enough.
- Remove the zero fingerprint file immediately after the validity check. Cons: We need to handle the case when the fingerprint file is missing. It may be error-prone.
- Add a specialized IsValid function that ignores the server fingerprint file. Cons: We need to handle the case when the fingerprint file is missing. It may be error-prone.
- Remove the special case entirely. Cons: The agent needs to be re-authorized manually after updating from previous versions.
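The first option above could look roughly like the sketch below: a missing fingerprint file is read as the zero value instead of producing an error, so no zero-filled file has to be written to disk. The `fingerprint` type and `readServerFingerprint` function are hypothetical names used for illustration; the actual code in backend/agent/register.go may be organized differently.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// fingerprint is a SHA-256 digest. Its zero value stands for
// "no server fingerprint recorded". Hypothetical type.
type fingerprint [32]byte

// readServerFingerprint reads a hex-encoded fingerprint from path.
// A missing file yields the zero fingerprint and no error.
// Hypothetical function, not the Stork implementation.
func readServerFingerprint(path string) (fingerprint, error) {
	var fp fingerprint
	content, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		// Missing file: fall back to the zero value.
		return fp, nil
	}
	if err != nil {
		return fp, err
	}
	raw, err := hex.DecodeString(strings.TrimSpace(string(content)))
	if err != nil {
		return fp, fmt.Errorf("malformed fingerprint file %s: %w", path, err)
	}
	if len(raw) != len(fp) {
		return fp, fmt.Errorf("unexpected fingerprint length %d in %s", len(raw), path)
	}
	copy(fp[:], raw)
	return fp, nil
}

func main() {
	// Use an empty temporary directory so the fingerprint file is missing.
	dir, err := os.MkdirTemp("", "stork-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	fp, err := readServerFingerprint(filepath.Join(dir, "server-cert.sha256"))
	fmt.Println("is zero:", fp == fingerprint{}, "error:", err)
}
```

With this approach the validity check never depends on the file being present, at the cost of treating "missing" and "all zeros" as the same state, which is the inconsistency noted in the list.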