Files
chef/doc/postgres-migration.md

4.5 KiB

Migrating PostgreSQL cluster to a new major version

Summary

  1. Dump from a replica
  2. Restore to fresh VM running new major version
  3. Add logical replication for delta sync from current/old primary
  4. Switch primary to new server
  5. Remove logical replication on new server

Runbook

  • Primary host: PRIMARY_HOST
  • Replica host: REPLICA_HOST
  • New PG14 host: NEW_HOST
  • PostgreSQL superuser: postgres
  • Running locally on each machine via sudo -u postgres

Adjust hostnames/IPs/etc. where needed.


🟢 0. PRIMARY — Pre-checks

sudo -u postgres psql -c "SHOW wal_level;"
sudo -u postgres psql -c "SHOW max_replication_slots;"

If needed, edit config:

sudo -u postgres vi $PGDATA/postgresql.conf

Ensure:

wal_level = logical
max_replication_slots = 10

Restart if changed:

sudo systemctl restart postgresql

🔵🟡 3. Create keypair for syncing dump later

🔵 On NEW_HOST:

sudo mkdir -p /home/postgres/.ssh && \
sudo chown -R postgres:postgres /home/postgres && \
sudo chmod 700 /home/postgres/.ssh && \
sudo -u postgres bash -c 'ssh-keygen -t ecdsa -b 256 -f /home/postgres/.ssh/id_ecdsa -N "" -C "postgres@$(hostname)"' && \
sudo cat /home/postgres/.ssh/id_ecdsa.pub

Copy the public key from the above output

🟡 On replica:

sudo mkdir -p /home/postgres/.ssh && \
sudo chown -R postgres:postgres /home/postgres && \
sudo chmod 700 /home/postgres/.ssh && \
echo [public_key] | sudo tee /home/postgres/.ssh/authorized_keys > /dev/null && \
sudo chmod 700 /home/postgres/.ssh

🟢 1. PRIMARY — Create publication and replication slots

sudo -u postgres pg_create_replication_publications

or

sudo -u postgres pg_create_replication_publication [db_name]

Listing publications and slots:

sudo -u postgres pg_list_replication_publications
sudo -u postgres pg_list_replication_slots

🟡 3. REPLICA — Pause replication

sudo -u postgres psql -c "SELECT pg_wal_replay_pause();"

Verify:

sudo -u postgres psql -c "SELECT pg_is_wal_replay_paused();"

🟡 4. REPLICA — Run dump

sudo -u postgres pg_dump_all_databases

or

sudo -u postgres bash -c "pg_dumpall --globals-only > /tmp/globals.sql"
sudo -u postgres pg_dump_database [db_name]

🟡 5. REPLICA — Resume replication

sudo -u postgres psql -c "SELECT pg_wal_replay_resume();"

🔵 6. COPY dumps to NEW HOST

From NEW_HOST:

export REPLICA_HOST=[private_ip] && \
cd /tmp && \
sudo -u postgres scp "postgres@$REPLICA_HOST:/tmp/globals.sql" . && \
sudo -u postgres scp "postgres@$REPLICA_HOST:/tmp/dump_*.tar.zst" .

🔵 7. NEW HOST (PostgreSQL 14) — Restore

7.1 Restore globals

sudo -u postgres psql -f /tmp/globals.sql

7.2 Create databases

sudo -u postgres psql -Atqc "SELECT datname FROM pg_database WHERE datallowconn AND datname NOT IN ('template1')" | \
xargs -I{} sudo -u postgres createdb {}

or

sudo -u postgres createdb [db_name]

7.3 Restore each database

sudo -u postgres pg_restore_all_databases

or

sudo -u postgres pg_restore_database [db_name]

🔵 8. NEW HOST — Create subscriptions

sudo -u postgres pg_create_replication_subscriptions

or

sudo -u postgres pg_create_replication_subscription [db_name]

🔵 9. NEW HOST — Monitor replication

sudo -u postgres pg_list_replication_subscriptions

🔴 11. CUTOVER

11.1 Stop writes on old primary

(app maintenance mode, stop app/daemons)


11.2 Wait for replication to catch up

TODO: not the best way to check, since WAL LSNs keep increasing

sudo -u postgres psql -d [db_name] -c "SELECT * FROM pg_stat_subscription;"

11.3 Fix sequences

Run per DB:

sudo -u postgres pg_fix_sequences_in_all_databases

or

sudo -u postgres pg_fix_sequences [db_name]

11.4 Point app to NEW_HOST

  • Update pg.kosmos.local in /etc/hosts on app server(s) (maybe override attribute for role and converge)
  • Start app/daemons, deactivate maintenance mode

🧹 12. CLEANUP (NEW_HOST)

sudo -u postgres pg_drop_replication_subscriptions

🧹 13. CLEANUP (PRIMARY)

TODO: Looks like slots are dropped automatically, when subscriptions are dropped

sudo -u postgres pg_drop_replication_publications

DONE