EGroupware Cloud Status
Malfunction of EGroupware Mail in FRA, failover to data center in KA operational
EGroupware Email services: 2021 July 6th:
A scrub (filesystem check) is running to verify everything in detail. It will take several days. Until it has finished, we have moved half of the instances to Karlsruhe so as not to slow down the filesystem check.
Affected mailboxes have been restored from KA and are working properly again in FRA.
EGroupware Email services: 2021 July 5th 10.30 (CEST)
The storage system in the datacenter in Frankfurt is showing checksum errors, and mailboxes are not available.
- Opened a ticket with IONOS, the service provider of the datacenter, and are waiting for their response
- Temporarily switched off the mail backend in Frankfurt so that the redundancy could take over; all mail services are now running in the datacenter in Karlsruhe.
Your data is safe, but performance will be a bit slower until Frankfurt can be switched on again.
We will post updates here as soon as we have any news on this topic.
EGroupware Cloud maintenance window: 2021 June 2nd from 8 – 9.30 pm (CEST)
Our best guess about yesterday’s problem is that a “broken request” from a client on a single domain causes “Traefik” to respond with a “500 Internal Server Error” to more than just that client for some time.
We will re-enable “Traefik” tonight and try to find out which request, domain and IP are causing the problem.
The problem has been identified with high probability, and everything has been reset to normal operation.
Internal Server Error in Frankfurt: 01.06.2021 21:00 – 23.59 hrs
There was a problem in the EGroupware Cloud availability zone in Frankfurt from around 9 pm, which caused “500 Internal Server Error” responses to occur there repeatedly. The availability zone in Karlsruhe was not affected by the problem, or only for a short time after we had switched everything to Karlsruhe as a workaround. Further investigation suggests that there is NO direct connection to the update to 21.1, but rather a problem with “Traefik” as a proxy / Kubernetes Ingress Controller, which only comes into play under very specific conditions.
As a first step, we updated the version of “Traefik”, which reduced the problem but did not eliminate it. A search in the “GitHub Forums of Traefik” turned up a similar error description in the following post. In order to provide a usable EGroupware Cloud today, we removed “Traefik” and are now talking directly to Nginx, after which no more “Internal Server Errors” occurred.
Failure of all EGroupware and mail services: 06.04.2021: 17.45 – 19.20 CEST
A network problem at IONOS caused the outage of the EGroupware and Mail services.
Our colleagues are working as quickly as possible to clean up and restore the connections.
06.04.2021: 18.30h The IONOS network is back up, but it will take some time until EGroupware and Mail are available again.
06.04.2021: 19.20h The nodes in Karlsruhe and then in Frankfurt are available again, so all EGroupware and Mail services are running.
Service failure EGroupware node Karlsruhe & Frankfurt 24.08.2020 15.40 (CEST)
We are in the process of determining where the problem lies.
Currently, both nodes seem to be affected.
Initial analysis shows a connection problem on the load balancers, so there is no connection from outside.
18.00h (CEST): All systems (including the database cluster) were shut down.
The first database node was successfully restarted.
Currently the second database node is starting and synchronizing with the first.
As soon as this is completed, we will also restart the remaining systems.
18.30h (CEST): EGroupware and mail services are up again.