# vim: set foldenable foldmethod=indent sw=4 ts=8 :

# Copyright 2013 Linbit HA Solutions GmbH
# Lars Ellenberg @ linbit.com

TODO:
	someone convert this into proper ascii doc please ;-)
	... and draw some pictures ...

How crm-fence-peer.sh, pacemaker, and the OCF Linbit DRBD resource agent
are supposed to work together.

The two-node cluster is the trickier case, because it has no real quorum.

Relative Timeouts
	--dc-timeout > dead-time or stonith-timeout, respectively
	if stonith is enabled, --timeout >= --dc-timeout
	if there is no stonith, --timeout may be small.

Pacemaker operation timeouts
	monitor and promote action timeout > max(dc_timeout, timeout)
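
The timeout rules above can be written down as a small sanity check.
This is only a sketch: the function and parameter names are made up for
illustration, and it assumes one reading of the first rule, namely that
--dc-timeout must exceed dead-time, and also stonith-timeout when stonith
is enabled. Passing stonith_timeout=None stands for "no stonith".

```python
def check_timeouts(dc_timeout, timeout, dead_time, op_timeouts,
                   stonith_timeout=None):
    """Return a list of violated rules; an empty list means all rules hold.

    op_timeouts maps a Pacemaker operation name (e.g. "monitor",
    "promote") to its configured timeout. All values are in seconds.
    """
    problems = []
    if dc_timeout <= dead_time:
        problems.append("--dc-timeout must be > dead-time")
    if stonith_timeout is not None:
        # Assumption: these two rules only apply when stonith is enabled.
        if dc_timeout <= stonith_timeout:
            problems.append("--dc-timeout must be > stonith-timeout")
        if timeout < dc_timeout:
            problems.append("--timeout must be >= --dc-timeout")
    # Operation timeouts must leave room for the fence-peer callback.
    limit = max(dc_timeout, timeout)
    for name, value in op_timeouts.items():
        if value <= limit:
            problems.append(f"{name} timeout must be > {limit}")
    return problems
```

For example, check_timeouts(60, 90, 20, {"monitor": 120, "promote": 120},
stonith_timeout=30) reports no problems, while shrinking the promote
timeout to 90 or less would be flagged.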

Node reboot, possibly because of crash or stonith due to communication loss
	no peer reachable	[no delay]
		crm may decide to elect itself, shoot the peer,
		and start services.

		If DRBD peer disk state is known Outdated or worse, DRBD will
		switch itself to UpToDate, allowing it to be promoted,
		without further fencing actions.

		If DRBD peer disk state is DUnknown, DRBD will be only Consistent.
		In case crm decides to promote this instance, the fence-peer callback
		runs, finds the peer "unreachable", finds itself Consistent only,
		does NOT set any constraint, and DRBD refuses to be promoted.

		CRM will now try in an endless loop to promote this instance.

		Avoid this by adding
		param adjust_master_score="0 10 1000 10000"
		to the DRBD resource definition.

	no replication link
		CRM can see both nodes. [delay: crmadmin -S $peer]

		If currently both nodes are Secondary Consistent, CRM will decide to
		promote one instance. The fence-peer callback will find the other node
		still reachable after timeout, and set the constraint.

		If there is already one Primary, and this is a node rejoining the
		cluster, there should already be a constraint preventing this node
		from being promoted.

Only Replication link breaks during normal operation
	Single Primary  [delay: crmadmin -S $peer]
		fence-peer callback finds DC,
		crmadmin -S confirms peer still "reachable",
		and sets the constraint.

	Dual Primary
		both fence-peer callbacks find DC,
		both see node_state "reachable",
		optionally delay for the --network-hickup timeout,
		and if DRBD is still disconnected,
		both try to set the constraint.
		Only one succeeds.

		The loser should probably commit suicide
		to reduce the overall recovery time;
		see --suicide-on-failure-if-primary.
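
The "both try, only one succeeds" race can be pictured as a
first-writer-wins update. The sketch below is a toy model, not how the
cluster configuration is actually accessed: ConstraintStore,
try_set_constraint and resolve_dual_primary are hypothetical names, and
a local lock stands in for whatever makes the real update atomic.

```python
import threading


class ConstraintStore:
    """Hypothetical stand-in for the shared cluster configuration."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def try_set_constraint(self, node):
        """Record `node` as the winner; True only for the first caller."""
        with self._lock:
            if self._owner is None:
                self._owner = node
                return True
            return False


def resolve_dual_primary(store, nodes):
    """Map each node to what it should do after the race."""
    return {
        node: "continue" if store.try_set_constraint(node) else "suicide"
        for node in nodes
    }
```

Whichever node reaches the store first keeps running; every later
caller is told to take itself down, which is the behaviour suggested
for the loser above.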

Node crash
	surviving node is Secondary	[no delay]
		If not DC, triggers DC election, elects itself.
		Is DC now.
		If stonith enabled, shoots the peer.
		Promotes this node.
		During promotion, the fence-peer callback
		finds a DC, and a node_state "unreachable",
		so sets the constraint "immediately".

	surviving node is Primary (DC)	[delay up to timeout]
		If stonith enabled, shoots the peer.
		fence-peer callback finds DC, after some
		time sees node_state "unreachable",
		or times out while node_state is still "reachable".
		Either way still sets the constraint.

	surviving node is Primary (not DC) [delay up to max(dc_timeout, timeout)]
		fence-peer callback loops trying to contact DC.
		eventually this node is elected DC.
		If stonith enabled, shoots the peer.

		The fence-peer callback either times out while no DC is available,
		and thus fails.  Make sure you choose a suitable --dc-timeout.

		Or it finds the other node "unreachable",
		and sets the constraint.
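
Taken together, the cases described so far reduce to a small decision
table. The function below is a simplified model for reading along, not
the logic of crm-fence-peer.sh itself: its name, arguments and return
strings are made up, and it assumes that any local disk state other
than UpToDate behaves like "Consistent only".

```python
def fence_peer_outcome(dc_found, peer_node_state, local_disk_state):
    """Summarize what the fence-peer callback does in each case.

    dc_found:          False if no DC answered within --dc-timeout
    peer_node_state:   "reachable" or "unreachable", as reported by the DC
    local_disk_state:  DRBD disk state of this node, e.g. "UpToDate"
    """
    if not dc_found:
        # No DC to ask, so the callback cannot decide anything.
        return "fail"
    if peer_node_state == "reachable":
        # Only the replication link is down; the peer is still alive.
        return "set constraint after timeout"
    if local_disk_state == "UpToDate":
        # Peer is gone and our data is known good.
        return "set constraint immediately"
    # Peer is gone but we are Consistent only: do not claim good data.
    return "no constraint, promotion refused"
```

The last branch is the one that leads to the endless promotion loop
unless adjust_master_score is set as described earlier.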

Total communication loss
	To each node, this looks like a node crash, so see above.

	The difference is the potential for data divergence.

	If DRBD was configured for "fencing resource-and-stonith",
	IO on any Primary is frozen while the fence-peer callback runs.

	If stonith is enabled, timeouts should be selected so that
	we are shot while waiting for the DC to confirm node_state
	"unreachable" of the peer, thus combined with freezing IO,
	no harmful data divergence can happen at this time.

	If there is no stonith enabled, data divergence is unavoidable.

		==> Multi-Primary *requires*
		    both node level fencing (stonith)
		    AND drbd resource level fencing

	Again: Multi-Primary REQUIRES stonith enabled and working.

