Troubleshooting during installation¶
Problems with CSP on the web-based applications (webapp, team-settings, account-pages)¶
If you have installed wire-server, but the web application page in your browser has connection problems and throws errors in the console such as "Refused to connect to 'https://assets.example.com' because it violates the following Content Security Policies", make sure to check that you have configured the CSP_EXTRA_ environment variables.
In the file that you use as an override when running helm install/upgrade -f <override values.yaml> (using the webapp as an example):
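A minimal sketch of such an override for the webapp chart; the envVars key and the exact CSP_EXTRA_* variable names are assumptions here, so verify them against the chart’s own values.yaml before copying:

```yaml
# Illustrative override values for the webapp chart.
envVars:
  CSP_EXTRA_CONNECT_SRC: "https://*.example.com, wss://*.example.com"
  CSP_EXTRA_IMG_SRC: "https://*.example.com"
```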
For more info, you can have a look at the respective charts’ values files, e.g.:
Problems with ansible and python versions¶
If for instance the following fails:
If your target machine only has python 3 (not python 2.7), you can tell ansible to use python 3 by default by specifying ansible_python_interpreter:
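For example, in your inventory’s group variables (a minimal sketch; the file location and the interpreter path are assumptions to adapt to your setup):

```yaml
# group_vars/all.yml
ansible_python_interpreter: /usr/bin/python3
```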
(python 3 may not be supported by all ansible modules yet)
Flaky issues with Cassandra (failed QUORUMs, etc.)¶
Cassandra is very picky about time! Ensure that NTP is properly set up on all nodes. For Cassandra in particular, DO NOT use anything other than ntp. Here are some helpful blogs that explain why:
How can I ensure that I have correctly set up NTP on my machine(s)? Have a look at this ansible playbook.
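As a quick manual check on each node (a sketch; timedatectl is only available on systemd-based systems, and ntpq ships with the ntp package):

```sh
timedatectl status   # look for "System clock synchronized: yes"
ntpq -p              # the peer currently selected for synchronisation is marked with "*"
```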
I deployed demo-smtp but I’m not receiving any verification emails¶
- Check whether brig deployed successfully (brig pod(s) should be in state Running); see the commands below.
- Inspect the Brig logs (see below as well).
- The receiving email server might refuse to accept any email sent by the demo-smtp server, due to not being a trusted origin. You may want to set up an email verification mechanism such as DKIM.
- You may want to adjust the SMTP configuration for Brig (wire-server/[values,secrets].yaml). Don’t forget to apply the changes with helm upgrade wire-server wire/wire-server -f values.yaml -f secrets.yaml.
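A minimal sketch of the first two checks; the d prefix is the wire-server-deploy shell wrapper used throughout this guide, and deployment/brig assumes brig runs as a Deployment of that name (adjust to your cluster):

```sh
d kubectl get pods | grep brig              # pods should be in state "Running"
d kubectl logs deployment/brig --tail=200   # recent Brig logs; look for SMTP/connection errors
```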
I deployed demo-smtp and I want to skip email configuration and retrieve verification codes directly¶
If the only thing you need demo-smtp for is sending yourself verification codes to create a test account, it might be simpler and faster to skip the SMTP configuration and retrieve the code internally right after it is sent, while it is still in the outbound email queue.
To do this, create a user/account/team, or if you already have one, click on Resend Code:
Then run the following command:
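A sketch of such a one-liner, assuming the demo-smtp image is exim-based and keeps its outbound queue under /var/spool/exim4/input (the spool path may differ for your image):

```sh
POD=$(d kubectl get pods | grep demo-smtp | awk '{print $1}')
d kubectl exec "$POD" -- sh -c 'cat /var/spool/exim4/input/*' | grep -Eo '[0-9]{6}'
```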
Or step by step:
- Get the name of the pod
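For example, using the same d kubectl get pods command as elsewhere in this guide:

```sh
d kubectl get pods | grep demo-smtp
```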
In the output, find the pod whose name starts with demo-smtp; here the pod name is demo-smtp-85557f6877-qxk2p, which is the value to use in the next command.
- Then get the content of emails and extract the code
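A sketch, with the same assumption as above about the exim spool path, and using the example pod name from the previous step:

```sh
d kubectl exec demo-smtp-85557f6877-qxk2p -- sh -c 'cat /var/spool/exim4/input/*'
```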
This will give you the content of the sent emails, including the code.
In this example the code is 022515; simply enter it in the interface.
If the email has already been sent out, it’s possible the queue will be empty.
If that is the case, simply click the “Resend Code” link in the interface, then quickly re-run the command; a new email should now be present.
Obtaining Brig logs, and the format of different team/user events¶
To obtain brig logs, simply run:
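For example (a sketch; deployment/brig assumes brig runs as a Deployment of that name, otherwise use the pod name from d kubectl get pods):

```sh
d kubectl logs -f deployment/brig
```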
You will get log entries for the various types of events that happen, for example:
- User creation
- Activation key creation
- Activation of a new user
- User indexing
- Team creation
- Invitation sent
Diagnosing and addressing bad network/disconnect issues¶
Diagnosis¶
If you are experiencing bad network/disconnection issues, here is how to obtain the cause from the client log files:
In the Web client, the connection state handler logs the disconnected state as reported by WebRTC as:
On mobile, the output in the log is slightly different:
And when the timer expires and the connection is not re-established:
If the attempt to reconnect then fails you will likely see the following:
If the connection to the SFT (Conference Calling 2.0 (aka SFT)) server is considered lost due to missing ping messages from a non-functioning or delayed data channel, or a failure to receive/decrypt media, you will see:
Then followed by these values:
Configuration¶
Question: Are the connection values for bad networks/disconnects configurable on-prem?
Answer: The values are not currently configurable; they are built into the clients at compile time. We do have a mechanism for sending calling configuration to the clients, but these values are not currently part of it.
Diagnosing issues with installation steps.¶
Some steps of the installation (for example helm commands) provide less feedback than others when errors are encountered.
These are some steps you can take to debug what is going on when the installation process breaks down.
As an example, we’ll take a case where we try installing wire-server with helm, but it fails due to cassandra being broken in some way.
This guide, while focusing on a cassandra related issue, also provides general steps to debug problems that could be related to other components like rabbitmq, redis, etc.
Our first step is to identify and isolate which component is causing the issue.
Before installing wire-server, we run d kubectl get pods and get the result:
We then run the wire-server helm installation command:
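For example, mirroring the upgrade command shown earlier on this page (chart name and values files depend on your setup):

```sh
d helm upgrade --install wire-server wire/wire-server -f values.yaml -f secrets.yaml
```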
And we get the following error:
This, by itself, isn’t much help in understanding what is going wrong.
We can get more information by running d kubectl get pods again:
(You can also do d kubectl get pods -o wide to get more details, though that’s not necessary here.)
When comparing with the previous run of the command, we can see that a new pod has been created, called cassandra-migrations-qgn7r, and that it is in the Init:0/4 state.
This means that the pod has been created, but that the init containers have not yet completed. In particular, it is at step 0 out of 4.
If we let it run for a while, we’d see the “RESTARTS” field increase to 1, then 2, etc., as the init containers keep failing.
We can use d kubectl logs to learn more about this failing pod:
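For example, using the pod name found in the previous step:

```sh
d kubectl logs cassandra-migrations-qgn7r
```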
Note the name job-done in the output: this is the name of the last step (container) of the pod, which is not yet running.
We can get even more information about the pod by running d kubectl describe pod cassandra-migrations-qgn7r:
In this output, the “containers” are the different “stages” of this pod, described as they get executed.
We can see that the gundeck-schema container (step) has failed, and that it has been restarted 4 times.
The other containers (steps) have not yet been executed, because the previous step failed; they’ll be in a “Waiting” state.
We can get further information about the failure by running d kubectl logs cassandra-migrations-qgn7r -c gundeck-schema.
This will provide output such as:
The error message “Cannot achieve consistency level ALL” is the cause of the failure; it essentially means that some of the cassandra nodes in our cluster are not running, or are not reachable in some way.
We have now successfully reached the “root” cause of the issue.
We could use nodetool status to get more details about the cassandra nodes, ping <NODE_IP> to check if they are reachable, cat /var/log/cassandra/system.log to look for any warnings/errors, review the cassandra documentation, use diagnostic tools such as nodetool cfstats or nodetool describecluster, etc.
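For instance, on one of the cassandra nodes (a sketch; the host placeholder and the log path depend on your installation):

```sh
ssh <cassandra-node-ip>
nodetool status                                # every node should report UN (Up/Normal)
nodetool describecluster                       # all nodes should agree on one schema version
sudo tail -n 100 /var/log/cassandra/system.log # look for recent warnings/errors
```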
Note that because the cassandra-migrations-qgn7r pod might get destroyed once the helm command outputs its error and terminates, you might have a limited amount of time to run these debugging commands, and might need to uninstall then re-install wire-server to reproduce the error. To uninstall the wire-server helm chart before running it again, run d helm uninstall wire-server.
More generally, you can also run d kubectl get events to get a list of all the events that have happened in your cluster, including the creation/destruction of pods, and the errors that have occurred.
Here we can see that the cassandra-migrations-qgn7r pod was created, followed by the “BackOff” warnings and the message about reaching the backoff limit.
Verifying correct deployment of DNS / DNS troubleshooting.¶
After installation, or if you encounter functionality problems, you should check that your DNS setup is correct.
You’ll do this from either your own computer (any public computer connected to the Internet), or from the Wire backend itself.
Testing public domains.¶
From your own computer (not from the Wire backend), test that you can reach all sub-domains you setup during the Wire installation:
assets.<domain>
teams.<domain>
webapp.<domain>
accounts.<domain>
nginz-https.<domain>
nginz-ssl.<domain>
sftd.<domain>
restund01.<domain>
restund02.<domain>
federator.<domain>
Some domains (such as the federator) might not apply to your setup. Refer to the domains you configured during installation, and act accordingly.
You can test if a domain is reachable by typing in your local terminal:
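For example, to check the webapp sub-domain (a sketch; repeat for each of the sub-domains listed above, and dig works just as well if you prefer it):

```sh
nslookup webapp.<domain>
```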
If the domain is successfully resolved, you should see something like:
And if the domain can not be resolved, it will be something like this:
Do this for each of the domains you configured, and make sure each of them is reachable from the open Internet.
If a domain can not be reached, check your DNS configuration and make sure to solve the issue.
Testing internal domain resolution.¶
Open a shell inside the SNS pod, and make sure you can resolve the following three domains:
minio-external
cassandra-external
elasticsearch-external
First get a list of all pods:
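For example, using the same command as elsewhere in this guide:

```sh
d kubectl get pods
```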
In here, find the sns pod (usually its name contains fake-aws-sns).
Open a shell into that pod:
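A sketch (replace the pod name with the one you found; if the image does not ship bash, use /bin/sh instead):

```sh
d kubectl exec -it <fake-aws-sns-pod-name> -- /bin/bash
```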
From inside the pod, you should now test each domain:
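For example (assuming nslookup is available inside the pod’s image):

```sh
nslookup minio-external
nslookup cassandra-external
nslookup elasticsearch-external
```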
If the domain is successfully resolved, you should see something like:
And if the domain can not be resolved, it will be something like this:
If you can not resolve any of the three domains, please request support.
Testing reachability of AWS.¶
First off, use the Amazon AWS documentation to determine your region code: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
Here we will use us-west-1, but please change this to whichever value you set in your values.yaml file during installation.
First list all pods:
In here, find the sns pod (usually its name contains fake-aws-sns).
Open a shell into that pod:
And test the reachability of the AWS services:
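For example (a sketch: sns.us-west-1.amazonaws.com is just one endpoint, chosen because this pod talks to SNS, with the region configured above; if curl is not available in the pod, nslookup of the same host at least confirms DNS resolution):

```sh
curl -v https://sns.us-west-1.amazonaws.com
```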
If it can be reached, you’ll see something like this:
And if it can’t:
If you can not reach the AWS domain from the SNS pod, you need to try the same checks from one of the servers running kubernetes (a kubernetes host):
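For example, from a shell on the kubernetes host (a sketch; ping may be blocked on some networks, in which case rely on curl instead):

```sh
ping -c 3 sns.us-west-1.amazonaws.com
curl -v https://sns.us-west-1.amazonaws.com
```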
Then try the same thing using nslookup.
If either of these steps fail, please request support.