[CI] CLMS stuck in pre-root initialization on 2-node SSI cluster
Roger Tsang
2007-02-18 07:46:44 UTC
Hi,

I ran into the following problem on a 2-node DRBD-SSI cluster. The
CLMS master was failing over, but panicked due to I/O errors. At the
same time the client was booting, but it did not detect that the root
node had gone down. The last console message said it was waiting to
join the cluster. Not good, but it is hard to reproduce.

The CLMS master must have failed before the nodedown daemon was
spawned, or the daemon never received the event. As a result the
client got stuck in an infinite loop, I think somewhere in
clms_client_sync_masterparams().

Roger


--- cluster/clms/clms_client.c 10 Feb 2005 01:05:32 -0000 1.7
+++ cluster/clms/clms_client.c 18 Feb 2007 06:57:02 -0000
@@ -158,7 +158,7 @@
 static void
 clms_client_sync_masterparams(void)
 {
-	int error;
+	int error, tries = 0;
 	int rval;
 	char *masterinfo;
 	int masterinfo_len;
@@ -185,11 +185,17 @@
 		}
 		if (error == -EREMOTE) {
 			nidelay(HZ);
+			if (++tries > CONFIG_NODE_MONITOR_TIMEOUT_MS / 1000 + 1) {
+				printk(KERN_WARNING
+				       "Failed reading master list, master went down!\n");
+				machine_restart(NULL);
+			}
 			continue;
 		}
 		if (error >= 0)
 			error = rval;
 		if (error < 0)
+			/* XXX: Never reached */
 			panic("%s:Error %d reading master list\n",
 			      __FUNCTION__, -error);
 		break;
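For anyone who wants to exercise the retry bound outside the kernel, here is a minimal user-space sketch of the same pattern. It assumes a node monitor timeout of 10000 ms. NODE_MONITOR_TIMEOUT_MS, read_master_list() and sync_masterparams() are hypothetical names made up for illustration, not CI code, and the sketch returns -ETIMEDOUT where the patch restarts the machine.

```c
#include <errno.h>

/* Hypothetical stand-in for CONFIG_NODE_MONITOR_TIMEOUT_MS; the real
 * value comes from the kernel configuration. */
#define NODE_MONITOR_TIMEOUT_MS 10000

/* Hypothetical stand-in for the call that reads the master list:
 * it keeps returning -EREMOTE while the master is unreachable. */
static int read_master_list(int master_alive)
{
	return master_alive ? 0 : -EREMOTE;
}

/* Retry on -EREMOTE like clms_client_sync_masterparams(), but stop
 * after a bounded number of attempts instead of looping forever.
 * Returns the read result, or -ETIMEDOUT once the budget is spent;
 * *tries_out receives the number of -EREMOTE retries made. */
static int sync_masterparams(int master_alive, int *tries_out)
{
	int tries = 0;
	int error;

	for (;;) {
		error = read_master_list(master_alive);
		if (error == -EREMOTE) {
			/* The kernel loop sleeps one second here with
			 * nidelay(HZ); this sketch only counts the attempt. */
			if (++tries > NODE_MONITOR_TIMEOUT_MS / 1000 + 1) {
				*tries_out = tries;
				/* The patch calls machine_restart() at this point. */
				return -ETIMEDOUT;
			}
			continue;
		}
		*tries_out = tries;
		return error;
	}
}
```

With the assumed 10000 ms timeout the limit is 10000 / 1000 + 1 = 11, so a dead master makes sync_masterparams() give up with 12 recorded tries. In the patched kernel loop each of those tries follows a one-second nidelay(HZ), so the client would sleep roughly 12 seconds before restarting. A live master returns 0 with no retries.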
