5.0 KiB
Migrating PostgreSQL cluster to a new major version
Summary
- Dump from a replica
- Restore to fresh VM running new major version
- Add logical replication for delta sync from current/old primary
- Switch primary to new server
- Remove logical replication on new server
Runbook
- Primary host:
PRIMARY_HOST - Replica host:
REPLICA_HOST - New PG14 host:
NEW_HOST - PostgreSQL superuser:
postgres - Running locally on each machine via
sudo -u postgres
Adjust hostnames/IPs/etc. where needed.
🟢 0. PRIMARY — Pre-checks
sudo -u postgres psql -c "SHOW wal_level;"
sudo -u postgres psql -c "SHOW max_replication_slots;"
If needed, edit config:
sudo -u postgres vi $PGDATA/postgresql.conf
Ensure:
wal_level = logical
max_replication_slots = 10
Restart if changed:
sudo systemctl restart postgresql
🔵🟡 3. Create keypair for syncing dump later
🔵 On NEW_HOST:
sudo mkdir -p /home/postgres/.ssh && \
sudo chown -R postgres:postgres /home/postgres && \
sudo chmod 700 /home/postgres/.ssh && \
sudo -u postgres bash -c 'ssh-keygen -t ecdsa -b 256 -f /home/postgres/.ssh/id_ecdsa -N "" -C "postgres@$(hostname)"' && \
sudo cat /home/postgres/.ssh/id_ecdsa.pub
Copy the public key from the above output
🟡 On replica:
sudo mkdir -p /home/postgres/.ssh && \
sudo chown -R postgres:postgres /home/postgres && \
sudo chmod 700 /home/postgres/.ssh && \
echo [public_key] | sudo tee /home/postgres/.ssh/authorized_keys > /dev/null && \
sudo chmod 700 /home/postgres/.ssh
🟢 1. PRIMARY — Create publication and replication slots
sudo -u postgres pg_create_replication_publications
or
sudo -u postgres pg_create_replication_publication [db_name]
Listing publications and slots:
sudo -u postgres pg_list_replication_publications
sudo -u postgres pg_list_replication_slots
🟡 3. REPLICA — Pause replication
sudo -u postgres psql -c "SELECT pg_wal_replay_pause();"
Verify:
sudo -u postgres psql -c "SELECT pg_is_wal_replay_paused();"
🟡 4. REPLICA — Run dump
sudo -u postgres pg_dump_all_databases
or
sudo -u postgres bash -c "pg_dumpall --globals-only > /tmp/globals.sql"
sudo -u postgres pg_dump_database [db_name]
🟡 5. REPLICA — Resume replication
sudo -u postgres psql -c "SELECT pg_wal_replay_resume();"
🔵 6. COPY dumps to NEW HOST
From NEW_HOST:
export REPLICA_HOST=[private_ip] && \
cd /tmp && \
sudo -u postgres scp "postgres@$REPLICA_HOST:/tmp/globals.sql" . && \
sudo -u postgres scp "postgres@$REPLICA_HOST:/tmp/dump_*.tar.zst" .
🔵 7. NEW HOST (PostgreSQL 14) — Restore
7.1 Restore globals
sudo -u postgres psql -f /tmp/globals.sql
7.2 Create databases
sudo -u postgres psql -Atqc "SELECT datname FROM pg_database WHERE datallowconn AND datname NOT IN ('template1')" | \
xargs -I{} sudo -u postgres createdb {}
or
sudo -u postgres createdb [db_name]
7.3 Restore each database
sudo -u postgres pg_restore_all_databases
or
sudo -u postgres pg_restore_database [db_name]
🔵 8. NEW HOST — Create subscriptions
sudo -u postgres pg_create_replication_subscriptions
or
sudo -u postgres pg_create_replication_subscription [db_name]
🔵 9. NEW HOST — Monitor replication
sudo -u postgres pg_list_replication_subscriptions
🔴 11. CUTOVER
11.1 Stop writes on old primary
Put app(s) in maintenance mode, stop the app/daemons.
11.2 Wait for replication to catch up
TODO: not the best way to check, since WAL LSNs keep increasing
sudo -u postgres psql -d [db_name] -c "SELECT * FROM pg_stat_subscription;"
11.3 Fix sequences
Run per DB:
sudo -u postgres pg_fix_sequences_in_all_databases
or
sudo -u postgres pg_fix_sequences [db_name]
11.4 Point app to NEW_HOST
-
Update
pg.kosmos.localin/etc/hostson app server(s). For example:export NEW_PG_PRIMARY=[private_ip] knife ssh roles:ejabberd -a knife_zero.host "sudo sed -r \"s/^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\s(pg.kosmos.local)/$NEW_PG_PRIMARY\t\1/\" -i /etc/hosts"Or override node attribute(s) if necessary and/or approporiate.
-
Start the app/daemons, and deactivate maintenance mode.
🧹 12. CLEANUP NEW_HOST
sudo -u postgres pg_drop_replication_subscriptions
🧹 13. CLEANUP PRIMARY
TODO: Looks like slots are dropped automatically, when subscriptions are dropped
sudo -u postgres pg_drop_replication_publications
🧹 13. CLEANUP Chef
Once all apps/databases are migrated, update the role in the node config of the new primary to 'postgres_primary' and converge it.
Also delete the old primary node config from the Chef repo.