Vault Runbook#
A runbook for common issues encountered when operating Vault on Operate First Cloud.
Pre-requisites#
Unseal keys and root token for the snapshot
Kustomize CLI
Vault CLI
OC CLI
Missing Client token#
Error can be encountered when communicating locally via Vault CLI or via the Vault Pods. Error looks like the following:
Error making API request.
URL: GET http://127.0.0.1:8200/....
Code: 400. Errors:
* missing client token
In both situations simply run vault login
and enter your token. In the case of the Vault Pods, oc rsh
in to the pod
first by running oc rsh <vault-pod-name>
.
Note: For the vault pod, use the root-token acquired via bitwarden.
In the case of the Vault Pods, you should need to do it only once.
The token is saved to: /home/vault/.vault-token
Any successive terminal sessions in the pod will use this token.
Could not get provider client: unable to log in to auth method#
This error may be seen in the ESO logs when looking to setup a secretstore that fails to authenticate against vault.
One reason this may happen is the serviceaccount JWT for the target cluster (where the ESO in question resides)
that is used to access the TokenReview
API to validate other JWT may have been rotated.
There is no clear way to confirm this, but you can run the following command when logged in to the cluster where this ESO is deployed. Log in to vault via cli, and run:
vault write auth/${cluster}-k8s/config \
token_reviewer_jwt=$(oc sa get-token eso-vault-auth -n external-secrets-operator) \
kubernetes_host=$(oc whoami --show-server)
This will update the token_reviewer_jwt
with the new ESO SA that vault can use to validate other JWTs. Try creating
the secretstore again and see if it successfully authenticates (by checking the .status
field).
Failed to look up namespace from the token#
When creating a client_token
via k8s JWT service account token you might get this error message:
Error writing data to auth/smaug-k8s/login: Error making API request.
URL: PUT https://vault-ui-vault.apps.smaug.na.operate-first.cloud/v1/auth/smaug-k8s/login
Code: 500. Errors:
* error performing token check: failed to look up namespace from the token: no namespace
One of the causes of this might be the creation of a new client_token
while the cached client_token
at $HOME/.vault_token
has been in invalidated.
To fix this just remove the old token with: rm ~/.vault_token
.
Rejoin A Vault node to the Vault cluster#
To confirm all nodes are part of the Vault cluster, run the following on all vault pods in vault
namespace:
$ oc exec -ti opf-vault-0 -n vault -- vault operator raft list-peers
$ oc exec -ti opf-vault-1 -n vault -- vault operator raft list-peers
$ oc exec -ti opf-vault-2 -n vault -- vault operator raft list-peers
There are 3 possible outputs for each pod:
A. Only one vault node is listed (itself) B. There are 2 vault nodes are listed (itself and one more), identify the leader pod C. All 3 vault nodes are listed
Then there are three possible cases:
Case 1: All nodes show output (C) Everything is good, you are done.
Case 2: All show output (A)
If all nodes show (A), then no nodes are part of the cluster, identify one to be the leader and set the following
environment variable: LEADER=http://<pod-name>.opf-vault-internal:8200
where <pod-name>
corresponds to your chosen
leader, for example: LEADER=http://opf-vault-1.opf-vault-internal:8200
.
Join the remaining pod (opf-vault-0
and opf-vault-2
in our example) to our chosen leader:
oc exec -ti opf-vault-0 -- vault operator raft join http://opf-vault-1.opf-vault-internal:820
oc exec -ti opf-vault-2 -- vault operator raft join http://opf-vault-1.opf-vault-internal:820
Unseal each of these nodes, instructions here.
Case 3: At least one pod showed 2 nodes in the cluster Then the third node needs to be rejoined with the other nodes, identify the leader from the other 2 nodes by running:
Let’s assume opf-vault-0
and opf-vault-2
are in the same cluster, and opf-vault-1
is not (i.e. showing a single
node output in list-peers
command).
~ $ oc exec -ti opf-vault-1 -- vault operator raft list-peers
Node Address State Voter
---- ------- ----- -----
09b66090-446e-3974-bcb1-7080dacc07e7 opf-vault-0.opf-vault-internal:8201 follower true
56796c78-adac-b69e-8a62-c431b17c8b9b opf-vault-2.opf-vault-internal:8201 leader true
When we do the same for opf-vault-1
we see:
~ $ oc exec -ti opf-vault-1 -- vault operator raft list-peers
Node Address State Voter
---- ------- ----- -----
e58ab8b5-62d7-cc6f-27e0-79656bfb8ab7 opf-vault-1.opf-vault-internal:8201 leader true
In this example, it’s clear that opf-vault-1
is not part of the main cluster. And opf-vault-2
is the leader of the
main cluster. So we want to rejoin opf-vault-1
by running:
oc exec -ti opf-vault-1 -- vault operator raft join http://opf-vault-2.opf-vault-internal:820
Now all 3 nodes should show the following output:
$ oc exec -ti opf-vault-1 -- vault operator raft list-peers
Node Address State Voter
---- ------- ----- -----
09b66090-446e-3974-bcb1-7080dacc07e7 opf-vault-0.opf-vault-internal:8201 follower true
56796c78-adac-b69e-8a62-c431b17c8b9b opf-vault-2.opf-vault-internal:8201 leader true
e58ab8b5-62d7-cc6f-27e0-79656bfb8ab7 opf-vault-1.opf-vault-internal:8201 follower true
Unseal opf-vault1
by following the instructions here.
We are done.