Testing the Algorithm – Evaluating TLA plus

If writing distributed algorithms is hard, proving they work with the many permutations of state in a highly concurrent distributed architecture is next to impossible. To mitigate this risk, typically you would write a comprehensive set of unit and integration tests and couple this with load and system failure testing to determine correctness both of the algorithm and the implementation. However, testing has its limits. It is difficult to simulate the many variations of system stalling, network latency and failure and to determine the consequences to other system components of resulting back pressure. In addition, discovering and fixing a flaw…

Read More