J'ai mis en place un cluster Redis à deux nœuds.
[master] 192.168.56.102 : Maître Redis (:6379), Esclave Redis (:6380), Sentinel(:26379), Sentinel#2(:26380)
[rescue] 192.168.56.103 : Maître Redis (:6379), Esclave Redis (:6380), Sentinelle(:26379)
Chaque instance esclave est un esclave de l'instance maître sur la même machine. Chaque instance sentinelle surveille les deux instances maîtres.
J'utilise ce qui précède en conjonction avec twemproxy (qui n'a rien à voir avec cette question) et un client-reconfig-script pour mettre à jour la configuration de twemproxy afin que l'application continue de fonctionner.
J'arrête les serveurs pour voir ce qui se passe et si tout fonctionne correctement.
[master] arrêter redis master : Le quorum est capable d'élire un nouveau maître avec succès. Journal ci-dessous.
==> /tmp/sentinel.log <==
[14701] 29 Dec 18:16:55.096 # +sdown slave 192.168.56.102:6379 192.168.56.102 6379 @ master 192.168.56.102 6380
[14705] 29 Dec 18:16:55.096 # +sdown slave 192.168.56.102:6379 192.168.56.102 6379 @ master 192.168.56.102 6380
[14701] 29 Dec 18:18:04.187 # -sdown slave 192.168.56.102:6379 192.168.56.102 6379 @ master 192.168.56.102 6380
[14705] 29 Dec 18:18:04.236 # -sdown slave 192.168.56.102:6379 192.168.56.102 6379 @ master 192.168.56.102 6380
[14705] 29 Dec 18:18:14.160 * +convert-to-slave slave 192.168.56.102:6379 192.168.56.102 6379 @ master 192.168.56.102 6380
[14705] 29 Dec 18:47:47.151 # +sdown slave 192.168.56.102:6379 192.168.56.102 6379 @ master 192.168.56.102 6380
[14701] 29 Dec 18:47:47.170 # +sdown slave 192.168.56.102:6379 192.168.56.102 6379 @ master 192.168.56.102 6380
[14701] 29 Dec 18:47:57.650 * +reboot slave 192.168.56.102:6379 192.168.56.102 6379 @ master 192.168.56.102 6380
[14705] 29 Dec 18:47:57.652 * +reboot slave 192.168.56.102:6379 192.168.56.102 6379 @ master 192.168.56.102 6380
[14705] 29 Dec 18:47:57.715 # -sdown slave 192.168.56.102:6379 192.168.56.102 6379 @ master 192.168.56.102 6380
[14701] 29 Dec 18:47:57.738 # -sdown slave 192.168.56.102:6379 192.168.56.102 6379 @ master 192.168.56.102 6380
[14705] 29 Dec 18:48:08.088 # +sdown master master 192.168.56.102 6380
[14701] 29 Dec 18:48:08.180 # +sdown master master 192.168.56.102 6380
[14701] 29 Dec 18:48:08.280 # +odown master master 192.168.56.102 6380 #quorum 2/2
[14701] 29 Dec 18:48:08.280 # +new-epoch 73
[14701] 29 Dec 18:48:08.280 # +try-failover master master 192.168.56.102 6380
[14701] 29 Dec 18:48:08.471 # +vote-for-leader a664b9f61df2b10bbbb5d865b01c599ddd36183c 73
[14701] 29 Dec 18:48:08.472 # 192.168.56.103:26379 voted for 491b32b95c547a8266faf9b04ce6b0c18486236b 73
[14705] 29 Dec 18:48:08.473 # +new-epoch 73
[14705] 29 Dec 18:48:08.475 # +vote-for-leader 491b32b95c547a8266faf9b04ce6b0c18486236b 73
[14701] 29 Dec 18:48:08.475 # 192.168.56.102:26380 voted for 491b32b95c547a8266faf9b04ce6b0c18486236b 73
[14701] 29 Dec 18:48:08.835 # +config-update-from sentinel 192.168.56.103:26379 192.168.56.103 26379 @ master 192.168.56.102 6380
[14705] 29 Dec 18:48:08.835 # +config-update-from sentinel 192.168.56.103:26379 192.168.56.103 26379 @ master 192.168.56.102 6380
[14705] 29 Dec 18:48:08.836 # +switch-master master 192.168.56.102 6380 192.168.56.102 6379
[14701] 29 Dec 18:48:08.836 # +switch-master master 192.168.56.102 6380 192.168.56.102 6379
[14701] 29 Dec 18:48:08.836 * +slave slave 192.168.56.102:6380 192.168.56.102 6380 @ master 192.168.56.102 6379
[14705] 29 Dec 18:48:08.836 * +slave slave 192.168.56.102:6380 192.168.56.102 6380 @ master 192.168.56.102 6379
==> /tmp/sent.log <==
Failing master: 192.168.56.102:6380
New Master: 192.168.56.102:6379
Failing master: 192.168.56.102:6380
New Master: 192.168.56.102:6379
==> /tmp/sentinel.log <==
[14705] 29 Dec 18:48:11.855 # +sdown slave 192.168.56.102:6380 192.168.56.102 6380 @ master 192.168.56.102 6379
[14701] 29 Dec 18:48:11.903 # +sdown slave 192.168.56.102:6380 192.168.56.102 6380 @ master 192.168.56.102 6379
Vous pouvez également voir que mon script de reconfiguration est appelé avec succès et sort quelques détails sur l'écran.
Le problème se pose lorsque j'essaie d'arrêter l'instance maître sur la machine [de secours]. La sentinelle rapporte continuellement "-failover-abort-not-elected master resque 192.168.56.103 6379".
==> /tmp/sentinel.log <==
[14705] 29 Dec 18:48:11.855 # +sdown slave 192.168.56.102:6380 192.168.56.102 6380 @ master 192.168.56.102 6379
[14701] 29 Dec 18:48:11.903 # +sdown slave 192.168.56.102:6380 192.168.56.102 6380 @ master 192.168.56.102 6379
[14705] 29 Dec 18:48:43.401 # -sdown slave 192.168.56.102:6380 192.168.56.102 6380 @ master 192.168.56.102 6379
[14701] 29 Dec 18:48:43.433 # -sdown slave 192.168.56.102:6380 192.168.56.102 6380 @ master 192.168.56.102 6379
[14705] 29 Dec 18:48:53.344 * +convert-to-slave slave 192.168.56.102:6380 192.168.56.102 6380 @ master 192.168.56.102 6379
[14705] 29 Dec 18:49:23.617 # +sdown master resque 192.168.56.103 6379
[14701] 29 Dec 18:49:23.625 # +sdown master resque 192.168.56.103 6379
[14705] 29 Dec 18:49:23.674 # +odown master resque 192.168.56.103 6379 #quorum 2/2
[14705] 29 Dec 18:49:23.674 # +new-epoch 74
[14705] 29 Dec 18:49:23.674 # +try-failover master resque 192.168.56.103 6379
[14701] 29 Dec 18:49:23.727 # +odown master resque 192.168.56.103 6379 #quorum 3/2
[14701] 29 Dec 18:49:23.727 # +new-epoch 74
[14701] 29 Dec 18:49:23.727 # +try-failover master resque 192.168.56.103 6379
[14705] 29 Dec 18:49:23.886 # +vote-for-leader b608fcab7a201799826f4d9ee839aed3cf556fdf 74
[14701] 29 Dec 18:49:23.889 # +vote-for-leader a664b9f61df2b10bbbb5d865b01c599ddd36183c 74
[14701] 29 Dec 18:49:23.890 # 192.168.56.102:26380 voted for b608fcab7a201799826f4d9ee839aed3cf556fdf 74
[14705] 29 Dec 18:49:23.890 # 192.168.56.102:26379 voted for a664b9f61df2b10bbbb5d865b01c599ddd36183c 74
[14705] 29 Dec 18:49:23.893 # 192.168.56.103:26379 voted for b608fcab7a201799826f4d9ee839aed3cf556fdf 74
[14701] 29 Dec 18:49:23.893 # 192.168.56.103:26379 voted for b608fcab7a201799826f4d9ee839aed3cf556fdf 74
[14705] 29 Dec 18:49:23.980 # +elected-leader master resque 192.168.56.103 6379
[14705] 29 Dec 18:49:23.980 # +failover-state-select-slave master resque 192.168.56.103 6379
[14705] 29 Dec 18:49:24.064 # -failover-abort-no-good-slave master resque 192.168.56.103 6379
[14705] 29 Dec 18:49:24.117 # Next failover delay: I will not start a failover before Mon Dec 29 18:49:32 2014
[14701] 29 Dec 18:49:28.417 # -failover-abort-not-elected master resque 192.168.56.103 6379
[14701] 29 Dec 18:49:28.489 # Next failover delay: I will not start a failover before Mon Dec 29 18:49:32 2014
[14705] 29 Dec 18:49:32.217 # +new-epoch 75
[14705] 29 Dec 18:49:32.217 # +try-failover master resque 192.168.56.103 6379
[14701] 29 Dec 18:49:32.423 # +new-epoch 75
[14701] 29 Dec 18:49:32.424 # +try-failover master resque 192.168.56.103 6379
[14705] 29 Dec 18:49:32.433 # +vote-for-leader b608fcab7a201799826f4d9ee839aed3cf556fdf 75
[14705] 29 Dec 18:49:32.435 # 192.168.56.103:26379 voted for 491b32b95c547a8266faf9b04ce6b0c18486236b 75
[14701] 29 Dec 18:49:32.437 # +vote-for-leader a664b9f61df2b10bbbb5d865b01c599ddd36183c 75
[14701] 29 Dec 18:49:32.438 # 192.168.56.102:26380 voted for b608fcab7a201799826f4d9ee839aed3cf556fdf 75
[14705] 29 Dec 18:49:32.438 # 192.168.56.102:26379 voted for a664b9f61df2b10bbbb5d865b01c599ddd36183c 75
[14701] 29 Dec 18:49:32.438 # 192.168.56.103:26379 voted for 491b32b95c547a8266faf9b04ce6b0c18486236b 75
[14705] 29 Dec 18:49:36.636 # -failover-abort-not-elected master resque 192.168.56.103 6379
[14705] 29 Dec 18:49:36.691 # Next failover delay: I will not start a failover before Mon Dec 29 18:49:40 2014
[14701] 29 Dec 18:49:36.843 # -failover-abort-not-elected master resque 192.168.56.103 6379
[14701] 29 Dec 18:49:36.905 # Next failover delay: I will not start a failover before Mon Dec 29 18:49:40 2014
Un nouveau maître n'est pas élu et le script de reconfiguration n'est pas appelé.
Je comprends que pour qu'un nouveau maître soit élu, le quorum (n/2+1) devra être d'accord. C'est pourquoi j'utilise 3 sentinelles pour tester.
Je ne comprends pas pourquoi l'exemple ci-dessus ne se termine pas par une élection (comme [master]).
J'utilise le serveur Redis v=2.8.19 sha=0a21368c:1 malloc=jemalloc-3.6.0 bits=64 build=e570b291804f6e35
Merci pour toute aide et ne faites pas attention à la grossière erreur d'orthographe du mot sauvetage !
--edit
[master] redis slave config:
daemonize yes
pidfile "/var/run/redis/redis-server-slave.pid"
port 6380
tcp-backlog 511
bind 192.168.56.102
timeout 0
tcp-keepalive 0
loglevel notice
logfile "/var/log/redis/redis-server.log"
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename "dump.rdb"
dir "/var/lib/redis"
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
slave-priority 100
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
maxclients 4064
# Generated by CONFIG REWRITE
slaveof 192.168.56.102 6379
[master] master instance run_id : 5c1ffb7742ad78cde12dbe4858747a314adaebe9
[maître] esclave instance run_id : 5772f437519bb38782d38f6675ae5d9157be2419
[rescue] master instance run_id : 5c1ffb7742ad78cde12dbe4858747a314adaebe9
[rescue] esclave instance run_id : 5772f437519bb38782d38f6675ae5d9157be2419
--
[master] Sentinelle (:26379) run_id : a664b9f61df2b10bbbb5d865b01c599ddd36183c
[master] Sentinelle (:26380) run_id : b608fcab7a201799826f4d9ee839aed3cf556fdf
[esclave] Sentinelle (:26379) run_id : a664b9f61df2b10bbbb5d865b01c599ddd36183c
Informations sur Sentinel
# Sentinel
sentinel_masters:2
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=resque,status=ok,address=192.168.56.103:6379,slaves=0,sentinels=3
master1:name=master,status=ok,address=192.168.56.102:6379,slaves=1,sentinels=3
Sentinel conf de 192.168.56.103
port 26379
logfile "/tmp/sentinel.log"
dir "/tmp"
sentinel monitor resque 192.168.56.103 6379 2
sentinel down-after-milliseconds resque 3000
sentinel failover-timeout resque 4000
sentinel client-reconfig-script resque /home/sm0ke/Projects/git/thrace/extra/sentinel-failover.py
sentinel config-epoch resque 0
sentinel leader-epoch resque 76
sentinel known-sentinel resque 192.168.56.102 26379 a664b9f61df2b10bbbb5d865b01c599ddd36183c
sentinel known-sentinel resque 192.168.56.102 26380 b608fcab7a201799826f4d9ee839aed3cf556fdf
maxclients 4064
daemonize yes
# Generated by CONFIG REWRITE
sentinel monitor master 192.168.56.102 6379 2
sentinel down-after-milliseconds master 3000
sentinel failover-timeout master 4000
sentinel client-reconfig-script master /home/sm0ke/Projects/git/thrace/extra/sentinel-failover.py
sentinel config-epoch master 73
sentinel leader-epoch master 73
sentinel known-slave master 192.168.56.102 6380
sentinel known-sentinel master 192.168.56.102 26380 b608fcab7a201799826f4d9ee839aed3cf556fdf
sentinel known-sentinel master 192.168.56.102 26379 a664b9f61df2b10bbbb5d865b01c599ddd36183c
sentinel current-epoch 76
redis-cli info pour Sentinel sur 192.168.56.103
# Server
redis_version:2.8.19
redis_git_sha1:0a21368c
redis_git_dirty:1
redis_build_id:e570b291804f6e35
redis_mode:sentinel
os:Linux 3.2.0-4-amd64 x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.7.2
process_id:8345
run_id:491b32b95c547a8266faf9b04ce6b0c18486236b
tcp_port:26379
uptime_in_seconds:8054
uptime_in_days:0
hz:17
lru_clock:10594345
config_file:/etc/redis/sentinel.conf
# Sentinel
sentinel_masters:2
sentinel_tilt:0
sentinel_running_scripts:16
sentinel_scripts_queue_length:4
master0:name=resque,status=ok,address=192.168.56.103:6379,slaves=0,sentinels=3
master1:name=master,status=ok,address=192.168.56.102:6379,slaves=1,sentinels=3