High availability / failover with Gitea Helm chart

viceice · May 6, 2021, 1:26pm

Hi there,

I’m running Gitea with the official Helm chart, and it’s working fine so far. I already have a Redis and Elasticsearch cluster configured, and the PostgreSQL server will be HA soon. Data is stored on a shared NFS drive.

So now I tried to set replicaCount: 2, which basically works too, but sometimes I get broken PRs.
I think this mostly happens when there is a lot of load from CI and my Renovate bot.

If I only have one running Gitea container, it shows the broken PR only for a few seconds after a push. But if I try to view big diffs, Gitea blocks for more than a second, so my Kubernetes health checks fail and the container gets restarted.

So I’d like to have at least two containers running, to have a small failover if one container fails.

Does anybody know how best to configure this so I don’t get the broken PRs?

I’ve switched to the v3 Helm chart with Gitea v1.14.1 rootless. I’m using HTTP-only access.

I can share my values if it helps.

techknowlogick · May 7, 2021, 7:28pm

Soo uhh, Gitea shouldn’t block in a way like that. There are multiple instances that serve (tens of) thousands of users, that are not multi-homed, where this isn’t happening. It sounds like there could be an underlying issue that should be solved rather than attempting to apply a band-aid with HA.

viceice · May 10, 2021, 7:48am

So it’s maybe a Postgres or Redis issue? Or a too-slow NFS?

This sometimes happens when I do a big push; the Kubernetes probes to /user/login then fail with timeouts (the timeout was 1 sec). The readiness probe now has a 2 sec timeout and the liveness probe 10 sec.
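
To see how close response times get to those probe timeouts, something like the following sketch could be run against the instance during a big push. It is only an approximation of a kubelet httpGet probe, not how Kubernetes implements it: the function name `probe` is made up for illustration, and urllib applies its timeout to each blocking socket operation rather than to the whole request.

```python
import time
import urllib.error
import urllib.request
from typing import Tuple


def probe(url: str, timeout_s: float) -> Tuple[bool, float]:
    """Send one GET and return (passed, elapsed_seconds).

    Roughly mimics an httpGet probe: a 2xx/3xx answer within the
    timeout passes, anything else (error status, timeout, refused
    connection) fails.
    """
    start = time.monotonic()
    try:
        # urlopen raises HTTPError for 4xx/5xx, so reaching the body
        # of the `with` already means the request was answered.
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            passed = 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # URLError covers HTTP errors and connection failures;
        # OSError covers socket timeouts raised while reading.
        passed = False
    return passed, time.monotonic() - start
```

For example, `probe("http://gitea.example.internal:3000/user/login", 2.0)` would check the login page against a 2 sec timeout; the host name here is a placeholder, not a value from this thread.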

techknowlogick · May 10, 2021, 9:53pm

Turns out the hanging is likely an issue with an upstream library (invalid use of mutex locks in that library, which lock things up under load). Still investigating how we as a project can solve this; hopefully it also means we can fix the upstream library so others can benefit as well.

viceice · May 11, 2021, 4:31am

Thanks for the information. With the extended timeouts it no longer seems to be restarted by Kubernetes, but that’s not fixing the underlying issue.

So it would be nice to have an extra /_health endpoint for general health checks (like database available, Redis available …) which hopefully doesn’t need any lock.
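
As a rough illustration of that idea (not Gitea’s actual implementation, and in Python rather than Gitea’s Go), a health check could run each dependency check in its own thread under a shared deadline, so one hanging dependency cannot stall the whole response. The function name `run_health_checks` and the check names are made up for this sketch.

```python
import concurrent.futures
import time
from typing import Callable, Dict, Tuple


def run_health_checks(
    checks: Dict[str, Callable[[], bool]],
    timeout_s: float = 1.0,
) -> Tuple[int, Dict[str, str]]:
    """Run all checks concurrently and return (http_status, per-check result).

    Each result is "ok", "fail" or "timeout". The status is 200 only
    if every check reported "ok", otherwise 503.
    """
    results: Dict[str, str] = {}
    if not checks:
        return 200, results

    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(checks))
    futures = {name: pool.submit(check) for name, check in checks.items()}
    deadline = time.monotonic() + timeout_s

    for name, future in futures.items():
        # All checks share one deadline, so the total wait is bounded.
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[name] = "ok" if future.result(timeout=remaining) else "fail"
        except concurrent.futures.TimeoutError:
            results[name] = "timeout"
        except Exception:
            # A check that raises counts as a failed dependency.
            results[name] = "fail"

    # Do not wait for checks that are still hanging.
    pool.shutdown(wait=False)
    status = 200 if all(r == "ok" for r in results.values()) else 503
    return status, results
```

A caller would pass one small function per dependency, for example `run_health_checks({"database": ping_db, "redis": ping_redis})`, where `ping_db` and `ping_redis` are hypothetical helpers that return True when the service answers.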

btw: I’m using MinIO (4-node cluster) as S3 storage, which is also shared between other services.

viceice · July 6, 2021, 9:50am

Any news on this? I’ve again had some hung git clones. Can I work around it by adding some more workers for any queues?

techknowlogick · August 4, 2021, 12:57am

Indeed, this has been fixed in a recent stable release.

lunny · October 22, 2021, 6:01am

You can now expect https://gitea.com/gitea/helm-chart/pulls/205 to be merged. But it’s not complete yet, and I have added some comments there.

pat-s · May 29, 2023, 8:34am

#437 - [Breaking] HA-support via `Deployment` (helm-chart) is the successor of the previous one mentioned.

Many details need to be considered for HA; just increasing the replicaCount doesn’t work and will lead to problems.
