[CI] re[2]: Migrating the memexpd server

Greg Freemyer

2002-03-13 17:14:07 UTC

David,

I believe you may also want to talk to Aneesh Kumar (of Compaq India).

He is actively integrating IBM's DLM into CI (Cluster Interconnect). He is also a project member of the SSI (Linux Single System Image) project.

The SSI project currently uses OpenGFS and CI as two of its major components. Once Aneesh is done, I believe they will add DLM to the mix.

The SSI project also has an existing shared root solution, the topic you discussed in a separate e-mail. They use GFS to do this.

To join either of the above mailing lists:
https://lists.sourceforge.net/lists/listinfo/ci-linux-devel
http://lists.sourceforge.net/lists/listinfo/ssic-linux-devel

Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
The Norcross Group
www.NorcrossGroup.com

Have you looked into IBM's DLM (distributed lock manager)? Could we do some
kind of implementation to achieve a parallel locking mechanism so that we
don't have to do a failover? Do you know how IBM's GPFS (General Parallel
File System) works? They seem to be very strong in this area. Since it is
a proprietary fs module, I don't know how the pieces integrate with each
other.
For your solution, how do you know which lock server to use next? Or do
you simply take over the IP address? Then how do the other GFS clusters
learn about this and handle the failover? Writing all locks to disk will
result in a severe performance hit.
David

Hi,
I've successfully set up a method to checkpoint & restart the GFS memexpd
lock server. I used OpenGFS 4.01 and, after some patches, made it run on
Linux for S/390. I worked with a checkpointing tool for Linux (Crak) and
extended it to support restarting applications that use sockets.
To test this, I checkpointed & restarted the locking server memexpd.
If the server disappears, the clients stop executing and try to reconnect.
When I restarted the server they resumed execution, so it works fine. That
gives the option to migrate the server to another machine without
modifying the server. If used together with an IP-address takeover
solution you can also move the server to another host.
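As an illustration of the client behaviour described above (stop, then keep
trying to reconnect until the server is back), here is a minimal Python
sketch. It is not the OpenGFS client code; the function name, the retry
parameters, and the address in the note below are assumptions made up for
this example.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, delay=1.0,
                       connector=socket.create_connection,
                       sleep=time.sleep):
    """Try to reach the lock server until it answers or attempts run out.

    `connector` and `sleep` are injectable only so the loop can be
    exercised without a real server (hypothetical helper parameters).
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # Succeeds as soon as the restarted server accepts connections.
            return connector((host, port))
        except OSError as exc:
            last_error = exc
            if attempt < attempts:
                # The client simply waits and tries again.
                sleep(delay)
    raise ConnectionError(
        f"lock server {host}:{port} unreachable after {attempts} attempts"
    ) from last_error
```

Called as connect_with_retry("192.0.2.10", 15000), where both the address
and the port are placeholders, the loop keeps retrying the same address. If
that address is taken over by another host, the clients should not need to
know that the server has moved.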
If I got it right, the only failover support for memexpd was to write every
single lock to disk in order to be able to restart the server. So I think
checkpointing the server in order to restart it is much better for
performance than recording locks to disk...
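To make the performance argument concrete, here is a toy Python comparison
of the two approaches: persisting on every lock operation versus persisting
only at a checkpoint. This is only a sketch under assumptions: LockTable,
its JSON file format, and the write counter are hypothetical, and a
checkpointing tool such as Crak works on the running process rather than on
an application-level table like this one.

```python
import json
import os


class LockTable:
    """In-memory lock table with two persistence strategies (toy model)."""

    def __init__(self, path, write_through):
        self.path = path
        self.write_through = write_through  # True: write every lock to disk
        self.locks = {}                     # lock id -> owner
        self.disk_writes = 0                # synchronous writes performed

    def _persist(self):
        # Write the whole table and force it to stable storage.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.locks, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
        self.disk_writes += 1

    def acquire(self, lock_id, owner):
        if lock_id in self.locks:
            return False
        self.locks[lock_id] = owner
        if self.write_through:
            self._persist()   # one synchronous write per lock
        return True

    def checkpoint(self):
        self._persist()       # one synchronous write per checkpoint

    @classmethod
    def restart(cls, path, write_through):
        # Rebuild the table from the last state that reached the disk.
        table = cls(path, write_through)
        with open(path) as f:
            table.locks = json.load(f)
        return table
```

In this model, granting N locks in write-through mode costs N synchronous
writes, while the checkpointed table pays for one write per checkpoint. The
trade-off is that a checkpointed server can only be restarted with the state
it had at the moment of the checkpoint.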
If anyone is interested I can give further details,
Jan
-----------------------------------------------------------------------------

University of Technology, Dresden
Dept. of Computer Science
-----------------------------------------------------------------------------

_______________________________________________
Opengfs-users mailing list
https://lists.sourceforge.net/lists/listinfo/opengfs-users