Happened again today, so I wrote a quick Ruby script and added a cron job that logs the output to /root/cron.log for now:
require "date" # DateTime is not loaded by default
cluster_nodes = `/opt/ejabberd-20.04/bin/ejabberdctl list_cluster`
# Strip the single quotes around the node names, one name per line
node_names = cluster_nodes.gsub("'", "").lines.map(&:chomp)
if node_names.include?("ejabberd@draco.kosmos.org")
  # Cluster nodes are connected, so we're good
else
  # For some reason, draco isn't connected anymore
  puts "#{DateTime.now} ejabberd@draco.kosmos.org not found in node list, re-connecting..."
  `/opt/ejabberd-20.04/bin/ejabberdctl join_cluster ejabberd@draco.kosmos.org`
end
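To try the check without touching the live cluster, the parsing part could be pulled out into small methods. This is only a rough sketch: the method names and the constant are made up for illustration, and it assumes list_cluster prints one node name per line, optionally wrapped in single quotes (which is what the gsub above implies).

```ruby
# Hypothetical helpers, not part of the script above.
EXPECTED_NODE = "ejabberd@draco.kosmos.org"

# Turns raw `ejabberdctl list_cluster` output into an array of node names,
# dropping the single quotes, surrounding whitespace and empty lines.
def parse_cluster_nodes(output)
  output.delete("'").lines.map(&:strip).reject(&:empty?)
end

# True when the expected node does not appear in the list_cluster output.
def node_missing?(output, node = EXPECTED_NODE)
  !parse_cluster_nodes(output).include?(node)
end

# Sample output with a made-up second node name, for illustration only.
sample = "'ejabberd@other.example'\n'ejabberd@draco.kosmos.org'\n"
puts node_missing?(sample)
```

The cron script would then only need to call node_missing? on the real command output and run join_cluster when it returns true.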
Created nodejs-2 with private IP 10.1.1.229 and deleted the broken VM. Then deployed kredits-github to it.
centaurus is running nginx on the host, so we cannot use HTTP/S forwarding with HAProxy there, as we do on draco. It's possible to do this in nginx nowadays (see here and here), but I didn't want to get into that right now, because we should discuss it first for that server, and we shouldn't do the exact same thing with two different programs IMO. So for now, the traffic is forwarded from draco to the VM on centaurus.
Just apt-updated nodejs-1 and tried to reboot it. Now it won't boot anymore, and it uses all of its CPU while trying. I have shut it off for now. No idea what's going on, because there's no console output.
I have prepared a fresh VM on centaurus for kosmos.social:
mastodon-1
10.1.1.156