HAProxy connection issues #314

raucao · 2021-03-04T08:19:14Z

raucao commented

2021-03-04 08:19:14 +00:00

Since last week or so, our HAProxy on Draco has been failing to forward connections for a short while every night (night / very early morning CET, evening / late night in the Americas). The situation seems to last for about 10 minutes each time, and sometimes it happens twice in a row.

When this happens, some Uptime Robot monitors of mine (Wiki and Mastodon) are catching it:

Also see https://gitea.kosmos.org/kosmos/chef/issues/271#issuecomment-3501

Screenshot from 2021-03-02 17-34-42.png

raucao added the

ops

label 2021-03-04 08:19:14 +00:00

raucao referenced this issue

2021-03-04 08:25:12 +00:00

ejabberd cluster node disconnects #271

raucao changed title from ~~Investigate HAProxy connection issues~~ to HAProxy connection issues

2021-03-04 08:34:57 +00:00

raucao added this to the Current operational issues project 2021-03-04 08:35:07 +00:00

raucao commented

2021-03-12 21:09:24 +00:00

@slvrbckt had a look at our HAProxy config and noticed that there were no `maxconn` settings, which resulted in the default from `ulimit` being used (only 1024).

This could well have been the cause of additional connections failing. I added a new global setting of `maxconn 60000`, as suggested on chat.

Let's see if that solves it already...
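
For reference, a minimal sketch of what that global setting could look like in the HAProxy config. Only the `maxconn 60000` value comes from this comment; the layout is an assumption, and the rest of the actual `global` section is omitted.

```plain
global
	# Process-wide cap on concurrent connections (value from this comment)
	maxconn 60000
```

Per the HAProxy documentation, each frontend also has its own `maxconn` (2000 when unset), so a value in `defaults` or in the individual frontends may be needed as well before the global limit comes into play.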

More suggestions:

We could also add some backend-specific connection limits to prevent overloading specific services, like so:

```plain
backend some_backend_service
	server foo-1 127.0.0.1:8000 maxconn 30
	server foo-2 127.0.0.1:8001 maxconn 30
```

> for all of your backends which specify mode http, you can add the line `option http-reuse always`

> it may be useful to add timeouts to your backends, in case something is hanging or otherwise unresponsive. E.g. `timeout server 30s`
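
Taken together, the suggestions above might look roughly like the following for a single HTTP backend. This is a sketch, not our actual config: the backend and server names are the placeholders from the example above, `mode http` is assumed, and `http-reuse always` is written without the `option` prefix, as corrected later in this thread.

```plain
backend some_backend_service
	mode http
	# Reuse idle server-side connections across requests
	http-reuse always
	# Give up on a server that has not responded within 30 seconds
	timeout server 30s
	# Per-server caps; requests beyond these wait in the backend queue
	server foo-1 127.0.0.1:8000 maxconn 30
	server foo-2 127.0.0.1:8001 maxconn 30
```

With per-server limits in place, requests over the limit are queued rather than rejected, so it may also be worth setting `timeout queue` to bound how long they wait.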

👍 2

slvrbckt commented

2021-03-13 14:00:09 +00:00

Just one correction: it's just `http-reuse always` (no `option` prefix).

👍 1

raucao self-assigned this 2021-04-02 08:56:17 +00:00

slvrbckt was assigned by raucao

2021-04-02 08:56:17 +00:00

raucao added the

kredits-1

label 2021-04-02 08:56:23 +00:00

raucao commented

2021-04-02 08:58:16 +00:00

The problem has not re-appeared, and all available information points to the maxconn limit having been reached before. Thanks again @slvrbckt for helping with this!

raucao closed this issue

2021-04-02 08:58:32 +00:00

Reference: kosmos/chef#314