WIP Prepare postgres for migration by replication

This commit is contained in:
2026-04-08 15:41:56 +04:00
parent 6583cd7010
commit bc3f291bd2
22 changed files with 748 additions and 7 deletions

271
doc/postgres-migration.md Normal file
View File

@@ -0,0 +1,271 @@
# Migrating PostgreSQL cluster to a new major version
## Summary
1. Dump from a replica
2. Restore to fresh VM running new major version
3. Add logical replication for delta sync from current/old primary
4. Switch primary to new server
5. Remove logical replication on new server
## Runbook
* Primary host: `PRIMARY_HOST`
* Replica host: `REPLICA_HOST`
* New PG14 host: `NEW_HOST`
* PostgreSQL superuser: `postgres`
* Running locally on each machine via `sudo -u postgres`
Adjust hostnames/IPs/etc. where needed.
---
### 🟢 0. PRIMARY — Pre-checks
```bash
sudo -u postgres psql -c "SHOW wal_level;"
sudo -u postgres psql -c "SHOW max_replication_slots;"
```
If needed, edit config:
```bash
sudo -u postgres vi $PGDATA/postgresql.conf
```
Ensure:
```conf
wal_level = logical
max_replication_slots = 10
```
Restart if changed:
```bash
sudo systemctl restart postgresql
```
---
### 🔵🟡 3. Create keypair for syncing dump later
🔵 On NEW_HOST:
```bash
sudo mkdir -p /home/postgres/.ssh && \
sudo chown -R postgres:postgres /home/postgres && \
sudo chmod 700 /home/postgres/.ssh && \
sudo -u postgres bash -c 'ssh-keygen -t ecdsa -b 256 -f /home/postgres/.ssh/id_ecdsa -N "" -C "postgres@$(hostname)"' && \
sudo cat /home/postgres/.ssh/id_ecdsa.pub
```
Copy the public key from the above output
🟡 On replica:
```bash
sudo mkdir -p /home/postgres/.ssh && \
sudo chown -R postgres:postgres /home/postgres && \
sudo chmod 700 /home/postgres/.ssh && \
echo [public_key] | sudo tee /home/postgres/.ssh/authorized_keys > /dev/null && \
sudo chmod 700 /home/postgres/.ssh
```
---
### 🟢 1. PRIMARY — Create publication and replication slots
```bash
sudo -u postgres pg_create_replication_publications
```
or
```bash
sudo -u postgres pg_create_replication_publication [db_name]
```
Listing publications and slots:
```bash
sudo -u postgres pg_list_replication_publications
sudo -u postgres pg_list_replication_slots
```
---
### 🟡 3. REPLICA — Pause replication
```bash
sudo -u postgres psql -c "SELECT pg_wal_replay_pause();"
```
Verify:
```bash
sudo -u postgres psql -c "SELECT pg_is_wal_replay_paused();"
```
---
### 🟡 4. REPLICA — Run dump
```bash
sudo -u postgres pg_dump_all_databases
```
or
```bash
sudo -u postgres bash -c "pg_dumpall --globals-only > /tmp/globals.sql"
sudo -u postgres pg_dump_database [db_name]
```
---
### 🟡 5. REPLICA — Resume replication
```bash
sudo -u postgres psql -c "SELECT pg_wal_replay_resume();"
```
---
### 🔵 6. COPY dumps to NEW HOST
From NEW_HOST:
```bash
export REPLICA_HOST=[private_ip] && \
cd /tmp && \
sudo -u postgres scp "postgres@$REPLICA_HOST:/tmp/globals.sql" . && \
sudo -u postgres scp "postgres@$REPLICA_HOST:/tmp/dump_*.tar.zst" .
```
---
### 🔵 7. NEW HOST (PostgreSQL 14) — Restore
#### 7.1 Restore globals
```bash
sudo -u postgres psql -f /tmp/globals.sql
```
---
#### 7.2 Create databases
```bash
sudo -u postgres psql -Atqc "SELECT datname FROM pg_database WHERE datallowconn AND datname NOT IN ('template1')" | \
xargs -I{} sudo -u postgres createdb {}
```
or
```bash
sudo -u postgres createdb [db_name]
```
---
#### 7.3 Restore each database
```bash
sudo -u postgres pg_restore_all_databases
```
or
```bash
sudo -u postgres pg_restore_database [db_name]
```
---
### 🔵 8. NEW HOST — Create subscriptions
```bash
sudo -u postgres pg_create_replication_subscriptions
```
or
```bash
sudo -u postgres pg_create_replication_subscription [db_name]
```
---
### 🔵 9. NEW HOST — Monitor replication
```bash
sudo -u postgres pg_list_replication_subscriptions
```
---
### 🔴 11. CUTOVER
#### 11.1 Stop writes on old primary
(app maintenance mode, stop app/daemons)
---
#### 11.2 Wait for replication to catch up
TODO: not the best way to check, since WAL LSNs keep increasing
```bash
sudo -u postgres psql -d [db_name] -c "SELECT * FROM pg_stat_subscription;"
```
---
#### 11.3 Fix sequences
Run per DB:
```bash
sudo -u postgres pg_fix_sequences_in_all_databases
```
or
```bash
sudo -u postgres pg_fix_sequences [db_name]
```
---
#### 11.4 Point app to NEW_HOST
* Update pg.kosmos.local in /etc/hosts on app server(s) (maybe override
attribute for role and converge)
* Start app/daemons, deactivate maintenance mode
---
### 🧹 12. CLEANUP (NEW_HOST)
```bash
sudo -u postgres pg_drop_replication_subscriptions
```
---
### 🧹 13. CLEANUP (PRIMARY)
TODO: Looks like slots are dropped automatically, when subscriptions are dropped
```bash
sudo -u postgres pg_drop_replication_publications
```
---
### ✅ DONE
---