How many processes do I need to tolerate faults?
Tolerate mean for the system to be safe. In a synchronous system we get both safety and liveness, in an asynchronous one we only get safety since liveness it’s impossible.
Remember that the FLP theorem says:
Consensus with faults it’s impossible to solve in an asynchronous system
Taxonomy
Crash faults | Byzantine faults | Byzantine faults w/ signatures | |
---|---|---|---|
Synchronous | |||
Asynchronous | (Paxos) |
Byzantine faults
Byzantine faults needs nodes.
Let’s assume we have nodes and we want to tolerate byzantine fault. This isn’t possible since we haven’t reached the requirement. Let’s see why:
is the General, is the lieutenant. has to give both the liutenants the same order, otherwise they will lose the battle.
If is byzantine, it can give the Attack () command and the Retreat () command. Since the commands are different, they will lose. In order for a liutenent to be sure they’re on the same page, they could exchange messages with the other liutenent.
In case is byzantine, it can tell that the command from the general was instead of , and would never know who between and is byzantine, and cannot make a safe decision either way.
In both cases, receives the same information.
In order for to be able to make a safe decision, another node is needed to reach the minimum requirement of nodes. Other more or less advanced algorithms will be used in order to exchange messages with this extra node to check wether a node is byzantine or not.
Byzantine faults with digital signature
If the system is synchronous, in case we use digital signature, then that receives the message from cannot counterfeit the message, so what we’ve seen previously cannot happen.
If the system is asynchronous, the byzantine node can pretend that the message didn’t arrive to it and so you still need another node. An algorithm for this case is PBFT.