load_balance_mode warning

diff --git a/README.md b/README.md
index b094ac4..c39ad3e 100644
--- a/README.md
+++ b/README.md
@@ -3,15 +3,104 @@ pgpool-online-recovery
  
  This simple project aims to automate and make easy the online recovery process of a failed pgpool's backend node in master/slave mode.
  
+This version is a work in progress targeting CentOS 7 and upstream packages. It does not require the psmisc package, so a minimal CentOS 7 installation is sufficient to run the scripts: they use systemd to manage postgresql-9.6, whose data directory is /var/lib/pgsql/9.6/data/
+
+The hardware configuration is two nodes with three IP addresses:
+
+       10.200.1.60 edozvola-db-pgpool  <- virtual IP with pgpool listening on port 9999
+       
+       10.200.1.61 edozvola-db-01
+       10.200.1.62 edozvola-db-02
+
+In this setup, applications should connect to edozvola-db-pgpool:9999, because PostgreSQL runs on the same nodes
+as pgpool and listens on the default port 5432. FIXME
+
+The deployment script ./t/1-init-cluster.sh assumes that the management machine it is run from is 10.200.1.1,
+which is authorized in pg_hba.conf so that it can deploy the cluster. It also assumes that the management machine
+has ssh access to the cluster nodes using ssh keys; otherwise you will need to type passwords multiple times, so run
+ssh-copy-id root@10.200.1.61 and ssh-copy-id root@10.200.1.62 to install the keys.
+
+In their current form, the scripts assume that postgresql-9.6 is already installed on the machines, while pgpool need not be.
+It doesn't matter if pgpool is already there, but postgresql-9.6 must already be installed.
+
+You can create the cluster with:
+
+make init
+
+
+This will destroy all databases, archive logs, etc. on all nodes, so don't do this if you need your old data later.
+
+On the other hand, this will also set up the whole cluster, and you can examine its status using:
+
+make
+
+
+In the output you can see both nodes and their status. If it looks OK, test the cluster with:
+
+./t/3-test.sh
+
+
+You can also run
+
+./t/80-insert-test.sh
+
+to run a continuous stream of insert-then-select pairs against the database, without wrapping them in explicit transactions.
+This nicely demonstrates the problem that occurs when load_balance_mode = on in pgpool.conf
+
+
+If you have edited local files, push the changes to all nodes using:
+
+make push
+
+To restart pgpool (and clean up its state), do:
+
+make restart
+
+If you want to see the systemd status of the services, just type:
+
+make status
+
+
+Now you can start './t/80-insert-test.sh' in one terminal to create insert and select load on the cluster, and
+kill one of the nodes with 'echo b > /proc/sysrq-trigger', which reboots it immediately without syncing disks.
+
+For example, kill the slave:
+
+ssh root@10.200.1.62 'echo b > /proc/sysrq-trigger'
+
+pgpool should detect the broken backend and remove it.
+You can verify this by running plain 'make' and checking that one node is shown as down.
+To start online recovery, use:
+
+make fix
+
+Now try to kill the master:
+
+ssh root@10.200.1.61 'echo b > /proc/sysrq-trigger'
+
+FIXME: pgpool is stuck and needs to be restarted
+
+
+If you are installing on top of an existing streaming replication setup, you will need to tell pgpool where the current master is with:
+
+echo 0 > /tmp/postgres_master
+
+You can also force a re-check of the nodes by removing the status file and restarting pgpool:
+
+rm /var/log/pgpool_status
+systemctl restart pgpool
+
+
+
  Requirements
  ============
  
  There are two requirements to these scripts to work.
  
-* The first one is [pgpool2](http://www.pgpool.net) (v3.1.3) available in [Debian Wheezy](http://packages.debian.org/stable/database/pgpool2). We assume that pgpool2 is installed, set up in master/slave mode with loadbalacing and manageable via PCP interface.
-* The second one is obviously Postgres server (v9.1) also available in Wheezy packages repository.
+* The first one is [pgpool-II](http://www.pgpool.net) (v3.6.5), available for [CentOS 7 from upstream](http://www.pgpool.net/yum/rpms/3.6/redhat/rhel-7-x86_64/pgpool-II-pg96-3.6.5-1pgdg.rhel7.x86_64.rpm).
+* The second one is, obviously, the Postgres server (v9.6), also available for [CentOS 7 from upstream](https://yum.postgresql.org/9.6/redhat/rhel-7-x86_64/pgdg-redhat96-9.6-3.noarch.rpm).
  
-There are several tutorials about setting up pgpool2 and postgres servers with [Streaming Replication](http://wiki.postgresql.org/wiki/Streaming_Replication) and this readme is far to be a howto for configuring both of them. You can check out [this tutorial](https://aricgardner.com/databases/postgresql/pgpool-ii-3-0-5-with-streaming-replication/) which describes really all the steps needed.
+There are several tutorials about setting up pgpool-II and Postgres servers with [Streaming Replication](http://wiki.postgresql.org/wiki/Streaming_Replication), and this readme is far from being a howto for configuring either of them.
  
  Installation and configuration
  ==============================
@@ -41,18 +130,18 @@ Installation
  The installation steps are simple. You just need to copy provided bash scripts and config files as follow.
  
  **In pgpool node** :
-* Copy pgpool.conf to /etc/pgpool2/. This is an optional operation and in this case you have to edit the default pgpool.conf file in order to looks like the config file we provided.
-* Copy failover.sh into /usr/local/bin/ and online-recovery.sh to your home or another directory that will be easily accessible.
+* Copy pgpool.conf to /etc/pgpool-II/. This step is optional; if you skip it, you have to edit the default pgpool.conf so that it looks like the config file we provide.
+* Copy failover.sh into /etc/pgpool-II/ and online-recovery.sh to the same directory or to another directory that is easily accessible.
  
  **In the master and slave postgres nodes** :
-* Copy streaming-replication.sh script into /var/lib/postgresql/ (postgres homedir).
-* Copy postgresql.conf.master and postgresql.conf.slave files to /etc/postgresql/9.1/main/.
+* Copy streaming-replication.sh script into /var/lib/pgsql/ (postgres homedir).
+* Copy postgresql.conf.master and postgresql.conf.slave files to /var/lib/pgsql/9.6/data/.
  * Finally copy recovery.conf into /var/lib/postgresql/9.1/main/.
  
-PS : All similar old files must be backed up to be able to rollback in case of risk (e.g: cp -p /etc/pgpool2/pgpool.conf /etc/pgpool2/pgpool.conf.backup).
+PS: All existing files with the same names must be backed up so that you can roll back if something goes wrong (e.g. cp -p /etc/pgpool-II/pgpool.conf /etc/pgpool-II/pgpool.conf.backup).
  Make sure that :
  - All scripts are executable and owned by the proper users. 
-- /var/lib/postgresql/9.1/archive directory is created (used to archive WAL files). This folder must be owned by postgres user !
+- The /var/lib/pgsql/9.6/archive directory exists (it is used to archive WAL files). This folder must be owned by the postgres user!
  - Do not forge to edit pg_hba.conf in each postgres server to allow access to cluster's nodes.
  
  Not enough ! It remains only the configuration steps and we'll be done :)
@@ -142,9 +231,9 @@ After starting pgpool, try to test this two scenarios :
  
  **1. When a slave fails down** :
  
-Open pgpool log file 'tail -f /var/log/pgpool2/pgpool.log'.
+Follow the pgpool log with 'journalctl -u pgpool -f'.
  
-Stop slave node '/etc/init.d/postgres stop'.
+Stop the slave node with 'sudo systemctl stop postgresql-9.6'.
  
  After exceeding health_check_period, you should see this log message :