* VIP takeover
* LVS director connections table synchronization.

The current development status on these two points is:

* VIP takeover: done using the keepalived VRRP framework. IPSEC-AH still
needs to be tuned, especially the IPSEC sequence-number synchronization in
the fallback scenario.
* LVS director connection table sync: currently this part is handled by an
LVS component called syncd, which subscribes to a multicast group carrying
the connection table (see the listener sketch just after this list). This
kernel-space process can be driven by the VRRP framework (according to a
specific VRRP instance state). Currently the LVS syncd cannot be used in an
active/active environment, since only one MASTER syncd can exist sending
connection info to the multicast group. This will be enhanced, I think (our
time is too short :) )
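
As an illustration of that subscription mechanism, here is a minimal
userspace sketch that joins the LVS sync multicast group and dumps whatever
it receives. It assumes the usual LVS sync defaults (group 224.0.0.81, UDP
port 8848) and makes no attempt to decode the kernel's sync record format:

/* Sketch only: join the LVS connection-sync multicast group and read
 * raw sync datagrams. The group/port values are the LVS defaults
 * (224.0.0.81:8848); everything else here is purely illustrative. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    struct ip_mreq mreq;
    char buf[4096];

    if (fd < 0) {
        perror("socket");
        return 1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8848);                 /* LVS sync port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }

    mreq.imr_multiaddr.s_addr = inet_addr("224.0.0.81");  /* LVS sync group */
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                   &mreq, sizeof(mreq)) < 0) {
        perror("IP_ADD_MEMBERSHIP");
        return 1;
    }

    for (;;) {
        ssize_t len = recv(fd, buf, sizeof(buf), 0);
        if (len < 0)
            break;
        printf("received %zd bytes of connection-sync data\n", len);
    }
    close(fd);
    return 0;
}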
Well, in the case of already existing connections doing reads and writes
(if I understand correctly), the operations will be carried out as remote
operations on the socket, which means that even though the application
migrates, the socket still remains on the first node (node1). (Brian
should be able to tell more about that. Brian?)
We haven't decided what should happen to the main socket (socket()) that
is bound to the address.
OK. The point that I don't understand is the CVIP point :/

Just let me sketch a simple environment:

Example specs: two VIPs exposed to the world, load balancing onto a
real-server pool using LVS.
Sketch:

                       +------------------+
                       |     WAN area     |
                       +------------------+

.............................................[CI/SSI cluster].....
.                                                                .
.                                                                .
.   +----[VIP1]---+   +----[VIP2]---+                            .
.   |             |   |             |                            .
.   |  LVS node1  |   |  LVS node2  |                            .
.   +-------------+   +-------------+                            .
.                                                                .
.   +----[RIP1]---+   +----[RIP2]---+   +----[RIP3]---+          .
.   |             |   |             |   |             |          .
.   |  app node1  |   |  app node2  |   |  app node3  |          .
.   +-------------+   +-------------+   +-------------+          .
..................................................................


WAN clients will access VIP1 and VIP2, which expose a virtual service to the
world (HTTP for example). LVS node1 will then schedule traffic to app
nodes 1, 2 and 3 (the same for LVS node2).

What is the meaning of CVIP with regard to VIP1 and VIP2? If I understand
correctly, the CVIP will move periodically between LVS node1 and node2; this
will break the internal LVS scheduling coherence, because the connection
table will no longer be relevant.

This finally means that the CI/SSI design for assigning the CVIP to a node
is itself something like a scheduling decision. That decision will disturb
the LVS scheduling decision, since the LVS scheduling code requires the VIP
to be owned by the director to guarantee scheduling performance. These two
scheduling levels introduce a conceptual problem, which can be highlighted
with persistent connections.

For example, if LVS (nodes 1 and 2) load-balances SSL servers (app nodes
1, 2 and 3), a connection should be stuck to a specific app node, otherwise
the SSL protocol will be corrupted. This stickiness only works if the
client/server connection to the app node is present in the LVS connection
table. If the LVS VIP moves from LVS node1 to LVS node2, then LVS node2 will
not have the entry in its connection table, so the LVS scheduler will run
again and another app node can be elected, which will break this persistence
requirement.

So I can see two solutions:

* Replicating the whole director connection table in real time
* Sticking the CVIP to a specific LVS node.

=> Then the LVS VIP takeover will be handled by VRRP (for example).
Right now I am thinking of using a script that will select potential LVS
directors manually, sync the table between them using the --sync option,
and fail over using VRRP.
Oh, OK. But what will be the script's key criterion for selecting a specific
LVS director?
This will introduce some scheduling decisions.

This is the part that is still a little obscure to me: what is the key
criterion for the LVS director selection?
I was trying to bring Cluster Wide IP functionality to the SSI cluster
using LVS and VRRP (keepalived). I will also be looking into the initial
work done by Kai. That means nothing is decided: we could use either LVS
and VRRP or the code from HP's NSC for SCO UnixWare.
Ah, OK... This Cluster Wide IP must depend on the takeover protocol you
choose (VRRP, ...).
The patch was intended to enable people to run LVS with keepalived on
a CI cluster. Whether it forms the basis of Cluster Wide IP for SSI is yet
to be decided.
Yes, yes, no problem for me; I will add your patch to keepalived, really
not a problem. I just want to know a little bit more about CI/SSI :)
I was actually pointing at the initialize_nodemap function. If it fails,
how am I going to inform the mainline code? Will an exit() do it?
The keepalived code doesn't use pthreads (for porting reasons); the thread
functionality is provided by a central I/O MUX. So to end a thread, just
use return() at the end of your thread function.
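
To make that concrete, here is a deliberately simplified sketch of the
pattern (the names below are invented for illustration and are not
keepalived's actual scheduler API): a "thread" is just a callback registered
with one central select() loop, and it finishes simply by returning.

/* Simplified I/O MUX sketch: a "thread" is a callback driven by one
 * central select() loop, not a pthread. All names here are invented
 * for illustration and are not keepalived's real API. */
#include <stdio.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

typedef int (*thread_func_t)(int fd, void *arg);

struct fake_thread {
    int fd;
    thread_func_t func;
    void *arg;
    int active;
};

static struct fake_thread threads[8];

/* Register a callback interested in read events on fd. */
static void register_read_thread(int slot, int fd, thread_func_t func, void *arg)
{
    threads[slot].fd = fd;
    threads[slot].func = func;
    threads[slot].arg = arg;
    threads[slot].active = 1;
}

/* Example "thread": ending it is just returning from the function. */
static int node_map_thread(int fd, void *arg)
{
    char buf[256];
    ssize_t len = read(fd, buf, sizeof(buf));

    if (len <= 0)
        return -1;          /* returning a failure code ends this "thread" */
    printf("got %zd bytes\n", len);
    return 0;               /* 0: stay registered in this sketch */
}

/* Central MUX: one select() loop dispatches all callbacks. */
static void run_mux(void)
{
    for (;;) {
        fd_set rfds;
        int i, maxfd = -1;

        FD_ZERO(&rfds);
        for (i = 0; i < 8; i++)
            if (threads[i].active) {
                FD_SET(threads[i].fd, &rfds);
                if (threads[i].fd > maxfd)
                    maxfd = threads[i].fd;
            }
        if (maxfd < 0 || select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
            break;
        for (i = 0; i < 8; i++)
            if (threads[i].active && FD_ISSET(threads[i].fd, &rfds))
                if (threads[i].func(threads[i].fd, threads[i].arg) < 0)
                    threads[i].active = 0;   /* thread ended by returning */
    }
}

int main(void)
{
    register_read_thread(0, STDIN_FILENO, node_map_thread, NULL);
    run_mux();
    return 0;
}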
If my initialize_node_map fails, then VRRP (I use the term VRRP because
there is another keepalive with SSI; to differentiate, I guess for the
time being VRRP is OK?) cannot run with CI.
(The VRRP name is OK for me; some people confuse VRRP and vrrpd, which is
why I use the name keepalived.)

OK for init_node_map.
I am not aware of one. But it would be good to have a mapping of node
number to node IP address. For example, in a configuration with node1
having two IP addresses, one for the cluster interconnect and the other
for the external network, it would be good to have an interface like
clusternode_t cluster_node_num(uint32_t addr_ip).
For both IPs it would return node1.
Hmm, yes, I agree, as a part of your libcluster.
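
For what it's worth, here is a purely hypothetical sketch of what such a
libcluster helper might look like; clusternode_t, the address table and the
function body are all assumptions, not an existing CI/SSI interface:

/* Hypothetical libcluster helper: map any of a node's IP addresses
 * (interconnect or external) to its node number. Types, table and
 * function are illustrative only, not an existing CI/SSI interface. */
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

typedef int clusternode_t;              /* assumed node-number type */

struct node_addr_map {
    const char *ip;                     /* dotted-quad address */
    clusternode_t node;                 /* node number owning it */
};

/* Example: node1 has both an interconnect and an external address. */
static const struct node_addr_map node_map[] = {
    { "10.0.0.1",    1 },               /* node1, cluster interconnect */
    { "192.168.1.1", 1 },               /* node1, external network */
    { "10.0.0.2",    2 },               /* node2, cluster interconnect */
    { "192.168.1.2", 2 },               /* node2, external network */
};

clusternode_t cluster_node_num(uint32_t addr_ip)
{
    size_t i;

    for (i = 0; i < sizeof(node_map) / sizeof(node_map[0]); i++)
        if (inet_addr(node_map[i].ip) == addr_ip)
            return node_map[i].node;
    return -1;                          /* unknown address */
}

int main(void)
{
    /* Both of node1's addresses resolve to the same node number. */
    printf("%d\n", cluster_node_num(inet_addr("10.0.0.1")));
    printf("%d\n", cluster_node_num(inet_addr("192.168.1.1")));
    return 0;
}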
CI/SSI also allows an application to learn about node up/down events by
registering for the SIGCLUSTER signal. But using SIGCLUSTER doesn't fit
well into the existing VRRP (keepalived) model with its I/O multiplexer.
In fact, I first tried to do it with the signal, but later scrapped it
because it didn't fit well.
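
For reference, one way such a signal could be bridged into a select()-based
multiplexer is the classic self-pipe trick; here is a hypothetical sketch
(SIGCLUSTER is assumed to come from the CI/SSI headers, and is faked with
SIGUSR1 below only so the sketch compiles standalone):

/* Self-pipe sketch: turn a node up/down signal into a readable fd so a
 * select()-based MUX can pick it up. SIGCLUSTER is CI/SSI-specific; we
 * fall back to SIGUSR1 here only so this standalone sketch compiles. */
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

#ifndef SIGCLUSTER
#define SIGCLUSTER SIGUSR1
#endif

static int sigpipe[2];                  /* [0] read end, [1] write end */

static void cluster_sig_handler(int signo)
{
    char c = (char)signo;

    write(sigpipe[1], &c, 1);           /* wake up the select() loop */
}

int main(void)
{
    char c;

    if (pipe(sigpipe) < 0)
        return 1;
    signal(SIGCLUSTER, cluster_sig_handler);

    /* In the real MUX, sigpipe[0] would simply be one more fd in the
     * select() set; here we just block on it directly for brevity. */
    while (read(sigpipe[0], &c, 1) == 1)
        printf("cluster membership event (signal %d), rescan node map\n", c);
    return 0;
}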
Oh OK.

I was thinking of an AF_CLUSTER family. Creating a socket with this family
would open a kernel channel to the CI/SSI core code. The CI/SSI code could
then broadcast kernel messages on node status, which would be received in
userspace to update node status => no polling, so overhead would be
minimized.
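
Purely as a sketch of that idea (none of the constants, the AF_CLUSTER
value or the message structure below exist; they only illustrate the
proposed "kernel broadcasts node status, userspace reads it" channel):

/* Purely hypothetical sketch of the proposed AF_CLUSTER channel: none
 * of these constants or structures exist today; they only illustrate
 * the "kernel broadcasts node status, userspace reads it" idea. */
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#define AF_CLUSTER        32            /* hypothetical address family */
#define CLUSTER_NODE_UP    1            /* hypothetical event types */
#define CLUSTER_NODE_DOWN  2

struct cluster_event {                  /* hypothetical message format */
    int node;                           /* node number */
    int status;                         /* CLUSTER_NODE_UP / _DOWN */
};

int main(void)
{
    struct cluster_event ev;
    int fd = socket(AF_CLUSTER, SOCK_DGRAM, 0);   /* kernel channel */

    if (fd < 0) {
        perror("socket(AF_CLUSTER)");   /* expected today: no such family */
        return 1;
    }

    /* The fd would simply be added to the central I/O MUX; each read
     * delivers one node-status broadcast, so no polling is needed. */
    while (read(fd, &ev, sizeof(ev)) == sizeof(ev))
        printf("node %d is %s\n", ev.node,
               ev.status == CLUSTER_NODE_UP ? "up" : "down");

    close(fd);
    return 0;
}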

Best regards,
Alexandre
