Scaling a Blockchain

In my Bachelor thesis "Analysis of the possibilities and limitations of Smart Contracts with the help of an implementation in Solidity" I demonstrate ways, in which Smart Contracts can be structured with known and proven patterns to improve, amongst others, security, while not worsening the user experience.

  • blockchain
  • Bitcoin
  • Ethereum
  • Casper

During my research I came to the conclusion that one of the main factors holding blockchains back from being a generally accepted and applied solution is the limited scalability of current implementations.

Scaling is one of the aspects being considered by the Ethereum development team for the upcoming update Ethereum 2.0 in order to provide a blockchain with a high transaction throughput. Other improvements that are being considered are in the fields of Decentralization and Security. In the past one of these three factors had to be limited in order to guarantee high standards for the other two.

To achieve better Scalability, Decentralization and Security it has been proposed that Ethereum changes its consensus algorithm from Proof of Work to Proof of Stake.

Consensus

Proof of Work (PoW)

In Proof of Work based public blockchains (e.g. Bitcoin and the current implementation of Ethereum) the algorithm rewards participants who solve cryptographic puzzles in order to validate transactions and create new blocks. This process is also known as "Mining".

Unfortunately the difficulty of solving these cryptographic puzzles has lead to the formation of so called mining pools and has resulted in the four biggest mining pools controlling about 70% of the hash rate on the Ethereum network.

blockchain-scaling-chart

Ethereum Mining Pools

The formation of mining pools mainly had the following ramifications:

  • Uneven playing field

    . Big players (i.e. mining pools) have a better chance of mining blocks and therefore obtaining rewards. This in return enables them to invest those earnings in new mining equipment and as a result increases the rate at which they get rewarded.

  • Centralization

    . Consolidation of mining power into a few pools enables pools to theoretically launch a 51% attack on the network which in turn enables participants in the attack to possibly double-spend, reverse transactions or in general act in a malicious way towards the network.

Proof of Stake (PoS)

In Proof of Stake based public blockchains (e.g. Ethereum's upcoming Casper implementation) new blocks can be created in a multitude of ways, one of these ways is by popular vote cast by a set of so called validators. The weight of the singular vote can depend on the size of the deposit (Stake) by the validator.

There are many different variants of PoS algorithms, which vary from the way the validators are rewarded for their participation in the system to the process of becoming a validator. From an algorithmic standpoint most of these variants can be classified into two major types:

  • Chain-based

  • Byzantine Fault Tolerance style (BFT-style)

In chain-based PoS the right to create a single new block is assigned to a validator, that gets chosen pseudo-randomly by the underlying algorithm. The newly created block has to point to some previous block (usually the block at the end of the current canonical chain). As a result most blocks converge over time into a single constantly growing chain.

In contrast during BFT-style PoS, validators are randomly assigned the right to propose blocks, but agreeing on which block is canonical is done through a multi-round process where every validator votes for some specific block during each round. In the end all validators permanently agree on whether any given block is part of the chain or not. Consensus on a block can come within one block and does not depend on any other factors (e.g. chain length) other than the voting process itself.

Benefits of Proof of Stake over Proof of Work

  • A Proof of Stake Blockchain does not consume large quantities of electricity in order to secure itself, as there are no computationally expensive puzzles to solve.

  • Ability to make various forms of 51% attacks more expensive to carry out than Proof of Work.

  • Discourage formation of centralized cartels (e.g. by limiting stake weight after a certain investment sum has been reached)

  • Discourage cartels - if formed - from acting maliciously or harmfully to the network

Finality

All consensus algorithms need a concept of finality which helps to decide which chain is valid. Finality is reached when the chain is in a state, where from that point onward all previous transactions are final and cannot be restored to a previous state and generally can no longer be reverted.

Proof of Work Blockchains only guarantee probabilistic finality, which does not mean that all blocks are final. This was shown by the Value Overflow Incident, where transactions from half a day in the Bitcoin network were reverted.

Proof of Stake Blockchains, as proposed by Casper research, guarantee economical finality. This is a mechanism that ensures that validators loose at least part of their stake if a validated block gets removed from the canonical chain, on which they previously staked / voted on. Therefore all validators have an interest in keeping the confirmed block as part of the canonical chain. Proof of Stake therefore also does not guarantee absolute finality.

Finality in combination with the underlying fork choice rule enables users to agree on the canonical chain and as a result on a common truth. For simplicity the number of validators is limited to a set consisting of N validators, who all have staked the same amount of coins (i.e. their votes have the same weight) and each specific validator k can create a block in the slot k, N + k, 2N + k, .... Moreover each block points only to one specific parent block.

The current Proof of Work version of Ethereum uses the longest chain rule as a fork choice rule, which states that the longest chain is always the current canonical chain.

chain-3
The green chain messages in blue, slots from left to right (e.g. A's block on the left is at slot 0, etc.)

This fork choice currently results in roughly 90% of new blocks being added to the correct chain and therefore only about 10% of blocks are lost. Unfortunately when increasing the frequency, in which new blocks are added to the chain, the rate at which blocks are lost increases aswell. Therefore a new set of fork choice rules is needed when trying to scale the transaction throughput of a blockchain.

Latest Message Driven Greedy Heaviest Observed Subtree Fork Choice Rule (LMD GHOST)

In order to limit the complexity of the following examples the validator set is limited to have a size of 5, the validators are labeled A, B, C, D, E, which means validator k at position n in the queue creates the blocks at slots n, n + N, n + 2 * N, ... Following this pattern validator A creates the blocks at slots 0, 5, 10, ...

Chain-4
Latest messages in blue, slots from left to right (e.g. A's block on the left is at slot 0, etc.)

During evaluation of the LMD GHOST fork choice only the most recent messages, signed by each validator, are relevant. The algorithm for calculating the support of each block works as follows:

1. Start at the genesis block

2. Each time there is a fork decide for the side where more of the latest messages support that blocks' subtree

3. Repeat until block with no children is reached

Chain5
Chain after calculating the supporting weight of each block

In order to compute the head the chain has to be traversed by choosing the chain with the highest number of validations at each fork. So in the example at the first fork the bottom chain is selected as it has four relevant messages supporting it compared to the one relevant message supporting the single block on top. At the next fork the middle chain is selected as it has the support of two blocks.

In this example the resulting canonical chain is the same chain that would be chosen when applying the longest chain rule. In fact LMD GHOST and the longest chain rule will result in exactly the same canonical chain in almost all cases. However in more extreme cases this is not always true.

Chain6
Scoring blocks by chain length. Longest Chain Rule leads to the top chain being chosen as the canonical chain.
Chain7
Scoring blocks by number of supporting latest messages and using the LMD GHOST rule (latest message from each validator shown in blue). The bottom chain has more recent support and is chosen as the canonical chain. It is not yet clear which one of the three blocks at the end of the bottom chain will take precedence.

The LMD GHOST approach is advantageous because it is better at extracting information in cases of multiple validators creating blocks with the same parent. Following the LMD GHOST rule all blocks are now counted as votes for the parent block. The longest chain rule fails to capture this nuance.

Finalization of Blocks in Casper

Casper FFG

Casper The Friendly Finality Gadget (Casper FFG) aims to give chain-based economic finality, which means that it will be unfavorable for stakeholders to revert a finalized block, as it can basically be described as burning the previously staked assets. Finality is achieved when a two thirds supermajority of the committee vote on and therefore sign a block. These votes are weighted by the stake the single validator previously provided. Casper FFG is built in such a way that it will never finalize a checkpoint that is in conflict with a previous checkpoint.

However Casper FFG is still vulnerable to attacks. For example if an attacker can prevent the system from reaching the majority needed to finalize blocks no new blocks are permanently added to the chain and it becomes stale.

Casper CBC

The LMD GHOST fork choice results in validator decisions having another property that benefits the network: they are sticky.

Chain8

What would need to happen for an invalid chain (top chain), created by an attacker (B), to become the new canonical chain in the case that four out of five validators voted for the "correct" chain?

A, C, D and E each built on top of E's first block and recognized that E had a high score stemming from the LMD GHOST fork choice, but all validators could have seen one or two of B's invalid blocks. Additionally D and E could have seen C's new block, but apparently did not.

When A was creating a new block the score in favor of the bottom chain was at least 4-1 and the views for C, D and E paint a similar picture - all favoring the bottom chain. Therefore all four validators are in a position where they cannot legally change their minds unless two other validators illegally change their minds first to bring the score to 2-3 in favor of B's invalid block.

Hence as a result of the stickiness of decisions and the resulting ability to detect illegal chain switches ("safety oracles") it is possible to achieve safe consensus asynchronously.

Sources