HAProxy connection issues #314

Closed
opened 2021-03-04 08:19:14 +00:00 by raucao · 3 comments
Owner

Also see https://gitea.kosmos.org/kosmos/chef/issues/271#issuecomment-3501

Since last week or so, our HAProxy on Draco has been failing to forward connections for a short while every night (CET night / very early morning, American evening / late night). The situation seems to last for about 10 minutes every time, and sometimes it happens twice in a row.

When this happens, some Uptime Robot monitors of mine (Wiki and Mastodon) are catching it:

![Screenshot](https://gitea.kosmos.org/attachments/5e7e3388-8de5-4c1c-a2ea-4683c735a0e4)

raucao added the ops label 2021-03-04 08:19:14 +00:00
raucao changed title from Investigate HAProxy connection issues to HAProxy connection issues 2021-03-04 08:34:57 +00:00
raucao added this to the Current operational issues project 2021-03-04 08:35:07 +00:00
Author
Owner

@slvrbckt had a look at our HAProxy config and noticed that there was no `maxconn` setting, which resulted in the default from `ulimit` being used (only 1024).

This was likely already the cause of additional connections failing. I added a new global setting of `maxconn 60000`, as suggested on chat.
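For reference, a minimal sketch of what that change might look like. Only the `maxconn 60000` value comes from this thread; its placement in the `global` section follows from it being described as a global setting, and the rest of our actual config is not shown here.

```plain
global
	# Process-wide cap on concurrent connections.
	# 60000 is the value suggested on chat; without this line we
	# apparently ended up with a limit of only 1024.
	maxconn 60000
```

It may also be worth checking whether any frontend-level `maxconn` applies, since that is a separate limit from the global one.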

Let's see if that solves it already...

More suggestions:

We could also add some backend-specific connection limits to prevent overloading specific services, like so:

backend some_backend_service
	server foo-1 127.0.0.1:8000 maxconn 30
	server foo-2 127.0.0.1:8001 maxconn 30

> For all of your backends which specify mode http, you can add a line like `option http-reuse always`.

> It may be useful to add timeouts to your backends, in case something is hanging or otherwise unresponsive, e.g. `timeout server 30s`.

Owner

Just one correction: it's just `http-reuse always` (no `option` prefix).
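Putting the suggestions and this correction together, a backend could look roughly like the sketch below. The backend name, server names, and addresses are the placeholders from the earlier example, not our real services, and the values (30 connections, 30s) are the illustrative ones from this thread rather than tuned settings.

```plain
backend some_backend_service
	mode http
	# Reuse idle server-side connections (no "option" prefix)
	http-reuse always
	# Give up on a server that stays silent for too long
	timeout server 30s
	# Per-server caps to avoid overloading a single service
	server foo-1 127.0.0.1:8000 maxconn 30
	server foo-2 127.0.0.1:8001 maxconn 30
```

The per-server `maxconn` and the global `maxconn` are independent limits: the first protects an individual service, the second bounds what HAProxy accepts overall.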

raucao self-assigned this 2021-04-02 08:56:17 +00:00
slvrbckt was assigned by raucao 2021-04-02 08:56:17 +00:00
raucao added the kredits-1 label 2021-04-02 08:56:23 +00:00
Author
Owner

The problem has not re-appeared, and all available information points to the maxconn limit having been reached before. Thanks again @slvrbckt for helping with this!

Reference: kosmos/chef#314