Scraping causes Gitea to become unresponsive

Since this morning I've been getting notifications that my Gitea is slow to respond, and my logs show a lot of scraping going on.
On the Gitea side, after a few minutes the host load climbs above 20 and there are a lot of connections to the database (I had to bump the limit). Ctrl-C'ing Gitea also takes a while, I guess until all pending requests are handled.

I was running 1.10.3 in Docker, then migrated outside Docker to 1.12.0+dev-255-g70dd3faad (official binary), and both have the same problem.

I also got a panic while receiving requests:

[Macaron] 2020-02-06 21:29:50: Completed GET /DashieHam/xxx/src/commit/xxx/modes?lang=bg-BG 500 Internal Server Error in 3m52.240921707s
[Macaron] 2020-02-06 21:29:50: Completed GET /dashie/xxx/src/commit/xxx/lib/pleroma/web/activity_pub/views/user_view.ex 500 Internal Server Error in 4m4.294135608s
panic: sync: negative WaitGroup counter

goroutine 19542 [running]:
sync.(*WaitGroup).Add(0xc0073a2450, 0xffffffffffffffff)
	/usr/local/go/src/sync/waitgroup.go:74 +0x139
sync.(*WaitGroup).Done(...)
	/usr/local/go/src/sync/waitgroup.go:99
code.gitea.io/gitea/modules/graceful.wrappedConn.Close(0x31f9760, 0xc0894f2388, 0xc0073a2420, 0x0, 0x0)
	/go/src/code.gitea.io/gitea/modules/graceful/server.go:249 +0x7b
net/http.(*conn).close(0xc089dca280)
	/usr/local/go/src/net/http/server.go:1662 +0x42
net/http.(*conn).serve.func1(0xc089dca280)
	/usr/local/go/src/net/http/server.go:1771 +0xbf
net/http.(*conn).serve(0xc089dca280, 0x31e71a0, 0xc089db8c00)
	/usr/local/go/src/net/http/server.go:1900 +0xa49
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:2928 +0x384
panic: sync: negative WaitGroup counter

goroutine 18776 [running]:
sync.(*WaitGroup).Add(0xc0073a2450, 0xffffffffffffffff)
	/usr/local/go/src/sync/waitgroup.go:74 +0x139
sync.(*WaitGroup).Done(...)
	/usr/local/go/src/sync/waitgroup.go:99
code.gitea.io/gitea/modules/graceful.wrappedConn.Close(0x31f9760, 0xc0868268c0, 0xc0073a2420, 0x0, 0x0)
	/go/src/code.gitea.io/gitea/modules/graceful/server.go:249 +0x7b
net/http.(*conn).close(0xc086d9cf00)
	/usr/local/go/src/net/http/server.go:1662 +0x42
net/http.(*conn).serve.func1(0xc086d9cf00)
	/usr/local/go/src/net/http/server.go:1771 +0xbf
net/http.(*conn).serve(0xc086d9cf00, 0x31e71a0, 0xc087d352c0)
	/usr/local/go/src/net/http/server.go:1900 +0xa49
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:2928 +0x384

Any ideas about this whole behavior?

Is someone bombing you with requests, perhaps? That’s one reason I can think of for a sudden spike in DB transactions.

Just the usual scanners/indexers/whatever, but it's the first time my Gitea has behaved this badly.
I can't let it run for more than a few minutes before it saturates all DB connections, starts timing out and overloads everything.

We are discussing this internally; this is definitely odd behavior.

So it looks like you're experiencing a double decrement of the server connection waitgroup. I suspect that means there is a double close of the connection somewhere, though quite why, I don't understand. Usually a double close would result in an error (and again I don't understand why that hasn't happened in your case), and that error would have prevented the double decrement.
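
To make the panic concrete: "sync: negative WaitGroup counter" is exactly what you get when Done is called one more time than Add, as in this trivial standalone example (nothing Gitea-specific about it):

package main

import "sync"

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	wg.Done()
	wg.Done() // one Done too many: panics with "sync: negative WaitGroup counter"
}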

So, because we obviously can't rely on that double close causing an error, we're going to have to change things; hence https://github.com/go-gitea/gitea/pull/10170
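
For illustration, here is a minimal sketch of the kind of guard that stops a double Close from decrementing the waitgroup twice. The type and field names are hypothetical and this is not the actual diff in that PR; it only shows how sync.Once makes the Done call idempotent:

package connguard

import (
	"net"
	"sync"
)

// trackedConn wraps a net.Conn whose lifetime is counted in a WaitGroup.
type trackedConn struct {
	net.Conn
	wg        *sync.WaitGroup
	closeOnce sync.Once
}

// Close closes the underlying connection at most once, so a second Close
// cannot call wg.Done again and trigger "sync: negative WaitGroup counter".
func (c *trackedConn) Close() error {
	var err error
	c.closeOnce.Do(func() {
		err = c.Conn.Close()
		c.wg.Done()
	})
	return err
}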

Now… That doesn’t explain what really is going on. This only solves one symptom.

I suspect that you might be suffering port exhaustion from repeated opens and closes of connections to your DB. (It's Postgres, right?) That would cause some interesting behaviour and could cause the network stack not to behave the way we expect.

Now solutions to port exhaustion are:

  • Use a unix socket for your DB.
  • Set MAX_OPEN_CONNS to the same value as MAX_IDLE_CONNS, and keep both low enough (see the sketch below).

Be careful of your HTTP proxy similarly causing port exhaustion; you can use a unix socket for that too.
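
As far as I know those Gitea settings map onto Go's database/sql connection pool, so the effect is roughly the following (the DSN and values here are made up for illustration): keeping the open and idle limits equal means connections are reused rather than constantly opened and closed, which is what eventually exhausts local ports.

package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // any database/sql Postgres driver will do
)

func main() {
	// Hypothetical DSN; replace with your own connection string.
	db, err := sql.Open("postgres", "host=db.example.com user=gitea dbname=gitea sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}

	// Equal, modest limits: the pool reuses up to 50 connections instead of
	// opening and closing a socket per request.
	db.SetMaxOpenConns(50)
	db.SetMaxIdleConns(50)
	db.SetConnMaxLifetime(time.Hour)

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}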

It's still too easy to cause port exhaustion with Gitea, and that's definitely something we should look into. I think it's possible to end up fetching the same data from the database multiple times for multiple things, most of which isn't even used!

Thanks @zeripath for the WG fix.

My Nginx, Postgres and Gitea are each on separate VMs, so I can't use unix sockets.
I've therefore set the following in the [database] section of app.ini:

MAX_OPEN_CONNS = 50
MAX_IDLE_CONNS = 50

I will now restart Gitea and see whether the scraping starts again and whether anything changes.