Choose monitoring/alerting/logging stack/software #432

Open
opened 2022-10-18 12:11:00 +00:00 by raucao · 3 comments
Owner

Note from call:

Currently, there is no unified monitoring or alerting for resources or services set up. We should decide on a (100% FOSS) stack that we'd like to use.

raucao added the idea label 2022-10-18 12:11:00 +00:00
Author
Owner

<del>This could be nice for managing Prometheus: https://openitcockpit.io/</del>

Forget about this one. Prometheus (of all things) is like one of two features behind an enterprise edition paywall.
Author
Owner

Forgot to put https://victoriametrics.com/products/open-source/ here, which we talked about before. It can ingest data from pretty much everything, and seems to be well-suited for small environments.

raucao added the feature label 2025-04-22 14:34:32 +00:00
raucao added this to the 2025 project 2025-04-22 14:34:44 +00:00
raucao moved this to Epics in 2025 on 2025-04-22 15:04:10 +00:00
raucao modified the project from 2025 to 2026 2026-01-15 07:30:59 +00:00
raucao moved this to To Do in 2026 on 2026-01-21 04:09:48 +00:00
Author
Owner

I looked into this a bit, and I think the simplest, most widely used and best-supported solution is still [Prometheus](https://prometheus.io/) + [Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/). (And optionally, Grafana dashboards, of course.)

Monitoring system resources has first-class support via [Node exporter](https://github.com/prometheus/node_exporter), so alerts for things like low disk space are very easy to add. It's mostly a question of config automation, so we don't have to manually edit rules for every existing or new host.

For uptime monitoring specifically, we could stop paying for UptimeRobot and run [Peekaping](https://peekaping.com/) from some location/VM that isn't one of our main hosting locations. We can either create an XMPP notification adapter for it, or simply point its webhooks at [Hubot Incoming Webhook](https://github.com/67P/hubot-incoming-webhook).
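On the config automation point for Node exporter: one possible approach is to generate a target list for Prometheus' file-based service discovery (`file_sd_configs`) from a host inventory, so new hosts are picked up without hand-editing anything. The following is only a minimal sketch. The host names, the `env` label, and the inventory shape are made up for illustration; the only fixed parts are the `targets`/`labels` structure that `file_sd_configs` reads and Node exporter's default port 9100.

```python
import json

NODE_EXPORTER_PORT = 9100  # Node exporter's default listen port


def build_file_sd_targets(hosts, port=NODE_EXPORTER_PORT):
    """Group hosts by environment into file_sd entries.

    `hosts` is a hypothetical inventory: a list of dicts with
    "name" and "env" keys. In practice this would come from
    whatever already knows about our hosts.
    """
    groups = {}
    for host in hosts:
        groups.setdefault(host["env"], []).append(f'{host["name"]}:{port}')
    # One entry per environment, sorted so the output is stable across runs
    return [
        {"targets": sorted(targets), "labels": {"env": env}}
        for env, targets in sorted(groups.items())
    ]


if __name__ == "__main__":
    # Hypothetical hosts, not real ones
    inventory = [
        {"name": "host-2.example.org", "env": "production"},
        {"name": "host-1.example.org", "env": "production"},
        {"name": "test-1.example.org", "env": "staging"},
    ]
    print(json.dumps(build_file_sd_targets(inventory), indent=2))
```

Writing that JSON to a file referenced from a `file_sd_configs` entry should be enough for Prometheus to scrape the listed hosts. Alert rules such as low disk space can then be written once against the Node exporter metrics and apply to every target.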
Reference: kosmos/chef#432