CentOS 7.2, Ceph with 3 OSDs and 1 MON running on the same node. radosgw and all the daemons run on that node as well, and everything was working fine. After the server rebooted, all the OSDs stopped communicating (apparently) and radosgw no longer works properly; its log says:
2016-03-09 17:03:30.916678 7fc71bbce880 0 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403), process radosgw, pid 24181
2016-03-09 17:08:30.919245 7fc712da8700 -1 Initialization timeout, failed to initialize
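(One way to get more detail than this timeout, if useful: run radosgw in the foreground with verbose logging. The instance name client.radosgw.gateway below is an assumption; it has to match the [client.radosgw.*] section of ceph.conf.)

radosgw -d -n client.radosgw.gateway --debug-rgw 20 --debug-ms 1   # foreground, logs to stderr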
ceph health
shows:
HEALTH_WARN 1760 pgs stale; 1760 pgs stuck stale; too many PGs per OSD (1760 > max 300); 2/2 in osds are down
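(Side note: the 300 in that warning is just the mon_pg_warn_max_per_osd default, so the PG-count complaint is separate from the real problem of the OSDs being down. For a per-PG and per-OSD breakdown:)

ceph health detail   # lists each stale PG and each down OSD individually
ceph osd stat        # one-line summary of how many OSDs are up/in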
and ceph osd tree gives:
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 2.01999 root default
-2 1.01999 host app112
0 1.00000 osd.0 down 1.00000 1.00000
1 0.01999 osd.1 down 0 1.00000
-3 1.00000 host node146
2 1.00000 osd.2 down 1.00000 1.00000
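(If I read the tree correctly, osd.1 has REWEIGHT 0, i.e. it is marked out, so even once it comes up it will not take any PGs until it is marked back in:)

ceph osd in osd.1   # mark osd.1 back in; it still has to come up on its own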
and service ceph status outputs:
=== mon.app112 ===
mon.app112: running {"version":"0.94.6"}
=== osd.0 ===
osd.0: running {"version":"0.94.6"}
=== osd.1 ===
osd.1: running {"version":"0.94.6"}
=== osd.2 ===
osd.2: running {"version":"0.94.6"}
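(Since the init script claims everything is running, the daemons can also be asked directly over their admin sockets; /var/run/ceph/ceph-osd.0.asok is the default path for a cluster named ceph, and I assume the status command is available in this Hammer build:)

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok status                       # the OSD's own view of its state
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep mon_host  # which MON address it is using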
and this is service radosgw status:
Redirecting to /bin/systemctl status radosgw.service
ceph-radosgw.service - LSB: radosgw RESTful rados gateway
Loaded: loaded (/etc/rc.d/init.d/ceph-radosgw)
Active: active (exited) since Wed 2016-03-09 17:03:30 CST; 1 day 23h ago
Docs: man:systemd-sysv-generator(8)
Process: 24134 ExecStop=/etc/rc.d/init.d/ceph-radosgw stop (code=exited, status=0/SUCCESS)
Process: 2890 ExecReload=/etc/rc.d/init.d/ceph-radosgw reload (code=exited, status=0/SUCCESS)
Process: 24153 ExecStart=/etc/rc.d/init.d/ceph-radosgw start (code=exited, status=0/SUCCESS)
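(As far as I understand, Active: active (exited) on an LSB init script only means the start script returned 0, not that the gateway is still alive, so the process itself is worth checking:)

pgrep -a radosgw   # is a radosgw process actually running?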
Seeing this, I tried sudo /etc/init.d/ceph -a start osd.1 and stop a couple of times, but the result stays the same as above.
sudo /etc/init.d/ceph -a stop osd.1
=== osd.1 ===
Stopping Ceph osd.1 on open-kvm-app92...kill 12688...kill 12688...done
sudo /etc/init.d/ceph -a start osd.1
=== osd.1 ===
create-or-move updated item name 'osd.1' weight 0.02 at location {host=open-kvm-app92,root=default} to crush map
Starting Ceph osd.1 on open-kvm-app92...
Running as unit ceph-osd.1.1457684205.040980737.service.
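(The start lands in a transient systemd unit, so that unit's output and the OSD's own log are where a failure should show up; /var/log/ceph/ceph-osd.1.log is the default log path, adjust if changed:)

journalctl -u ceph-osd.1.1457684205.040980737.service --no-pager   # output of the transient start unit
tail -n 50 /var/log/ceph/ceph-osd.1.log                            # the OSD's own log
ceph osd dump | grep '^osd.1 '                                     # the MON's view of osd.1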
Please help me.
EDIT: it seems the MON cannot talk to the OSDs, even though both daemons are running fine. The OSD log shows:
2016-03-11 17:35:21.649712 7f003c633700 5 osd.0 234 tick
2016-03-11 17:35:22.649982 7f003c633700 5 osd.0 234 tick
2016-03-11 17:35:23.650262 7f003c633700 5 osd.0 234 tick
2016-03-11 17:35:24.650538 7f003c633700 5 osd.0 234 tick
2016-03-11 17:35:25.650807 7f003c633700 5 osd.0 234 tick
2016-03-11 17:35:25.779693 7f0024c96700 5 osd.0 234 heartbeat: osd_stat(6741 MB used, 9119 MB avail, 15861 MB total, peers []/[] op hist [])
2016-03-11 17:35:26.651059 7f003c633700 5 osd.0 234 tick
2016-03-11 17:35:27.651314 7f003c633700 5 osd.0 234 tick
2016-03-11 17:35:28.080165 7f0024c96700 5 osd.0 234 heartbeat: osd_stat(6741 MB used, 9119 MB avail, 15861 MB total, peers []/[] op hist [])
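(The empty peers []/[] in the heartbeat lines suggests the OSDs cannot reach the MON or each other at all. A classic cause of exactly this on CentOS 7 is firewalld coming back up after a reboot, so this is a sketch of what I would check next; the port numbers are the Ceph defaults:)

systemctl status firewalld    # did the reboot re-enable the firewall?
firewall-cmd --list-all       # the MON needs 6789/tcp, the OSDs need 6800-7300/tcp
ss -tlnp | grep ceph          # confirm the daemons are listening where expected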