What is sharding?

ForkLog

6 years ago

What is sharding?

Sharding is a method of splitting and storing a single logical dataset across multiple databases. Another way to define sharding is horizontal partitioning of data.
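
As a rough illustration of horizontal partitioning, the sketch below routes each record to one of several stores by hashing its key. The shard count, the dict-based "databases" and the helper names (shard_for, put, get) are assumptions made for this example, not part of any particular database product.

```python
import hashlib

NUM_SHARDS = 4  # assumed number of databases

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Each shard is modelled as a plain dict standing in for a separate database.
shards = [dict() for _ in range(NUM_SHARDS)]

def put(key: str, row: dict) -> None:
    """Store a row in the shard that owns its key."""
    shards[shard_for(key)][key] = row

def get(key: str):
    """Look a row up in the only shard that can hold it."""
    return shards[shard_for(key)].get(key)

for user_id in ("alice", "bob", "carol", "dave"):
    put(user_id, {"id": user_id})
```

Every row lives in exactly one shard, so a lookup touches a single store instead of the whole dataset.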

When was sharding invented, and by whom?

The concept of sharding has been used in managing traditional centralised databases since the late 1990s. The term “shard” (fragment) gained currency thanks to one of the earliest massively multiplayer online role-playing games, Ultima Online, whose developers spread players across different servers (different “worlds” in the game) to cope with traffic.

A common business use case is splitting a user database by geography. Users belonging to the same location are grouped together and hosted on a distinct server.
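
A geographic split can be sketched as a simple lookup table from region to server. The region codes, server names and the fallback server below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from a user's region to the server that hosts it.
REGION_TO_SERVER = {
    "EU": "users-eu",
    "US": "users-us",
    "ASIA": "users-asia",
}
FALLBACK_SERVER = "users-global"  # assumed catch-all for unmapped regions

def server_for(user: dict) -> str:
    """Pick the server for a user based on the user's region."""
    return REGION_TO_SERVER.get(user["region"], FALLBACK_SERVER)

def group_by_server(users: list) -> dict:
    """Group user names by the server that would host them."""
    groups = defaultdict(list)
    for user in users:
        groups[server_for(user)].append(user["name"])
    return dict(groups)

users = [
    {"name": "Anna", "region": "EU"},
    {"name": "Ben", "region": "US"},
    {"name": "Chen", "region": "ASIA"},
    {"name": "Dora", "region": "EU"},
    {"name": "Eli", "region": "AF"},
]
placement = group_by_server(users)
```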

What is sharding in a blockchain context?

A blockchain is a distributed database whose nodes act as individual servers. Applied to blockchains, sharding means splitting the network into individual segments (shards). Each shard contains a unique set of smart contracts and account balances.

Each shard is assigned its own group of nodes that verify its transactions and operations, in contrast to a scheme in which every node verifies every transaction across the entire network.
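
A minimal sketch of this idea, assuming accounts are assigned to shards by the numeric value of their address and that each shard keeps its own balances. The Shard class, the shard_of rule and the sample addresses are hypothetical; real protocols define their own assignment rules.

```python
from dataclasses import dataclass, field

NUM_SHARDS = 4  # assumed shard count

def shard_of(address: str) -> int:
    """Assign an account to a shard from the value of its hex address."""
    return int(address, 16) % NUM_SHARDS

@dataclass
class Shard:
    shard_id: int
    balances: dict = field(default_factory=dict)    # state owned by this shard
    validators: list = field(default_factory=list)  # nodes verifying this shard

    def transfer(self, sender: str, recipient: str, amount: int) -> None:
        """Apply a transfer only if both accounts live in this shard."""
        if shard_of(sender) != self.shard_id or shard_of(recipient) != self.shard_id:
            raise ValueError("cross-shard transfer is not handled in this sketch")
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount

shards = [Shard(i) for i in range(NUM_SHARDS)]
shards[shard_of("0xa4")].balances["0xa4"] = 100  # 0xa4 maps to shard 0
shards[0].transfer("0xa4", "0xb8", 30)           # 0xb8 also maps to shard 0
```

Only the nodes attached to shard 0 would need to check this transfer; the other shards never see it.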

Dividing a blockchain into more manageable segments increases transaction throughput and thus addresses the scalability problem faced by most modern blockchains.

How does sharding work?

Explained using Ethereum as an example:

The Ethereum blockchain comprises thousands of computers, or nodes, each of which “lends” the network a certain amount of computing power. That power allows the Ethereum Virtual Machine (EVM) to function: to execute smart contracts and run decentralised applications (DApps).

Today Ethereum relies on sequential execution, in which every node must compute every operation and process every transaction. As a result, the verification process takes time: Ethereum handles roughly ten transactions per second, whereas Visa states a capacity of about 24,000.

Adding more computers does not necessarily improve throughput, because every device stores the entire ledger and repeats the same verification work.

The idea of sharding is to abandon the model in which each node computes every operation, in favour of parallel execution, in which nodes handle only certain computations. This lets many transactions be processed simultaneously.

The blockchain is split into separate shards (subdomains or segments). Nodes manage only the part of the ledger to which they are attached (they run processes and confirm transactions) rather than maintaining the entire ledger.
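
The difference between the two models can be sketched as follows, assuming transactions are assigned to shards by account number and that a transaction is valid when its amount is positive. Both rules, and all function names, are placeholders for this example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # assumed shard count

def shard_of(tx: dict) -> int:
    """Placeholder rule: the account number decides the shard."""
    return tx["account"] % NUM_SHARDS

def validate(tx: dict) -> bool:
    """Placeholder check standing in for real transaction verification."""
    return tx["amount"] > 0

def process_shard(txs: list) -> int:
    """Work done by the nodes of one shard: verify only that shard's transactions."""
    return sum(1 for tx in txs if validate(tx))

def process_sharded(txs: list) -> dict:
    """Split transactions by shard and verify each group concurrently."""
    by_shard = defaultdict(list)
    for tx in txs:
        by_shard[shard_of(tx)].append(tx)
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        counts = pool.map(process_shard, by_shard.values())
        return dict(zip(by_shard.keys(), counts))

txs = [{"account": i, "amount": 1} for i in range(1000)]
result = process_sharded(txs)
```

In this run each worker verifies 250 transactions rather than all 1,000. The threads only illustrate the structure; in a real network the shards would run on separate machines.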

What problems does sharding solve?

Sharding is a potential solution to scaling.

The more popular a blockchain becomes, the more users initiate transactions, launch decentralised applications and run other processes on the network. Every node has to verify this growing volume of activity, so transaction speeds fall, hindering long-term growth. There is a risk of “clogging”, as happened to Ethereum during the CryptoKitties craze, when the game accounted for 11% of the network’s transactions.

If groups of nodes are responsible for individual segments, no single node has to maintain the entire ledger or execute every operation. Validation can occur in parallel rather than sequentially, which increases network speed and thus addresses scalability.

What are the drawbacks of sharding?

The main issues are communication and security. If a blockchain is split into isolated segments, each shard becomes a separate network. Users and applications in one subdomain cannot communicate with those in another without a special communication mechanism.
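
One commonly discussed form of such a mechanism is a receipt: the source shard debits the sender and records a receipt, and the destination shard credits the recipient when the receipt is presented, accepting each receipt only once. The sketch below is an assumption-laden illustration of that pattern; the Receipt and Shard classes are hypothetical, and proving that a receipt is genuine is out of scope here.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Receipt:
    receipt_id: int
    dest_shard: int
    recipient: str
    amount: int

@dataclass
class Shard:
    shard_id: int
    balances: dict = field(default_factory=dict)
    outbox: list = field(default_factory=list)   # receipts issued by this shard
    consumed: set = field(default_factory=set)   # receipts already credited here

    def send(self, sender: str, dest_shard: int, recipient: str, amount: int) -> Receipt:
        """Debit the sender locally and issue a receipt for the destination shard."""
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        receipt = Receipt(len(self.outbox), dest_shard, recipient, amount)
        self.outbox.append(receipt)
        return receipt

    def claim(self, source_shard_id: int, receipt: Receipt) -> bool:
        """Credit the recipient once; reject misrouted or replayed receipts."""
        key = (source_shard_id, receipt.receipt_id)
        if receipt.dest_shard != self.shard_id or key in self.consumed:
            return False
        self.consumed.add(key)
        self.balances[receipt.recipient] = (
            self.balances.get(receipt.recipient, 0) + receipt.amount
        )
        return True

a = Shard(0, balances={"alice": 50})
b = Shard(1)
r = a.send("alice", 1, "bob", 20)
first_claim = b.claim(a.shard_id, r)
```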

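Security is also a concern, because it is easier for hackers to seize a single shard: less hash rate is required to control an individual segment than the whole network. In a network of 100 shards, for instance, about 1% of the total hash rate is enough to dominate one shard (the so-called 1% attack).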
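
The arithmetic behind the “1% attack” can be sketched under one simplifying assumption: the network's power is split evenly across shards. The function name and the 100-shard figure are illustrative.

```python
from fractions import Fraction

def share_to_control_one_shard(num_shards: int,
                               threshold: Fraction = Fraction(1, 2)) -> Fraction:
    """Share of total network power equal to `threshold` of a single shard,
    assuming power is split evenly across shards (a simplifying assumption)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return threshold / num_shards

# Unsharded network: an attacker needs half of all power for a majority.
unsharded = share_to_control_one_shard(1)               # 1/2 of the network
# 100 shards: half of one shard's power is 1/200 of the network.
majority_of_shard = share_to_control_one_shard(100)     # 1/200 of the network
# Matching all the power of one shard takes 1/100, that is 1% of the network.
whole_shard = share_to_control_one_shard(100, Fraction(1))
```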

After taking over a segment, attackers can send invalid transactions to the main network. Data in that shard may also become invalid and be irretrievably lost. Ethereum proposes random sampling as a remedy: the nodes that validate each shard's blocks are assigned to shards at random, so an attacker cannot choose which shard to concentrate on.
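
Random assignment can be sketched as shuffling the validator set and dealing it out into per-shard committees. The second function estimates, with a simple binomial model, how likely an attacker holding a given share of validators is to end up with a majority in one committee. The seeded random.Random generator is a stand-in; a real protocol would need an unpredictable randomness source, and all names here are assumptions.

```python
import random
from math import comb

def assign_committees(validators: list, num_shards: int, seed: int) -> list:
    """Shuffle validators and deal them into one committee per shard."""
    rng = random.Random(seed)  # stand-in for a protocol-level randomness source
    shuffled = list(validators)
    rng.shuffle(shuffled)
    return [shuffled[i::num_shards] for i in range(num_shards)]

def takeover_probability(committee_size: int, attacker_share: float) -> float:
    """P(attacker holds a strict majority of one committee), binomial model."""
    need = committee_size // 2 + 1
    return sum(
        comb(committee_size, k)
        * attacker_share ** k
        * (1 - attacker_share) ** (committee_size - k)
        for k in range(need, committee_size + 1)
    )

validators = [f"v{i}" for i in range(400)]
committees = assign_committees(validators, num_shards=4, seed=2024)
risk = takeover_probability(committee_size=101, attacker_share=0.3)
```

Under this model, larger committees make it far less likely that a minority attacker lands a majority in any one shard by chance.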

What are the alternatives to sharding?

Developers have proposed two ways to improve performance and transaction speed in blockchains.

The first is to increase block size. The key idea: the bigger the block, the more transactions it can contain and, therefore, the higher the transactions per second.
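
The relationship can be written as a back-of-the-envelope formula: transactions per block divided by the time between blocks. The block size, average transaction size and block interval used below are hypothetical round numbers, not measurements of any particular network.

```python
def transactions_per_second(block_size_bytes: int,
                            avg_tx_bytes: int,
                            block_interval_s: float) -> float:
    """Rough throughput: how many average transactions fit in a block,
    divided by the time between blocks."""
    txs_per_block = block_size_bytes // avg_tx_bytes
    return txs_per_block / block_interval_s

# Hypothetical figures: 1 MB blocks, 250-byte transactions, a block every 600 s.
base = transactions_per_second(1_000_000, 250, 600)
# Doubling the block size doubles the estimate, all else being equal.
doubled = transactions_per_second(2_000_000, 250, 600)
```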

However, larger blocks require more computing power to verify. If block size grows substantially, only the most powerful computers will be able to supply the computational resources needed to act as nodes.

The high cost of such hardware means node pools will inevitably become smaller and more centralised, raising the risk of a 51% attack. Increasing block size also requires a hard fork, which risks splitting the community: if not all users accept the upgrade, two different chains using different coins will emerge. Block-size increases are not a long-term solution.

The second proposal is to use altcoins, so that different functions and applications run on their own networks with their own coins.

This model improves performance because a single blockchain is not overloaded, but it also heightens security risks, as computing power is spread across multiple blockchains. Again, the risk of a network breach rises because the power required to mount a 51% attack is much lower.

Who uses sharding?

Zilliqa was the first platform to implement sharding. On its testnet it achieved 2,828 transactions per second.

The Near blockchain ecosystem lets developers build and deploy decentralised applications. Near describes itself as “a sharded PoS blockchain” and claims its sharding technology keeps nodes small enough to run on low-power devices—potentially even on mobile phones.

Ethereum offers a blockchain ecosystem for deploying DApps based on smart contracts. The Ethereum Foundation plans to include sharding in the updated protocol Ethereum 2.0.

Other projects working with sharding include Cardano, QuarkChain and PChain.

What is the future of sharding?

Sharding features in the white paper for the Libra digital currency. Ahead of launch, Facebook acquired Chainspace, whose developer team specialises in sharding. Details remain unknown, but it is reasonable to assume Libra’s blockchain would implement some form of sharding.

Sharding could, in theory, resolve the so-called blockchain trilemma.

As Vitalik Buterin explained, the blockchain trilemma holds that only two of the three core properties—security, decentralisation and scalability—can be maintained at once. If sharding’s challenges are overcome, distributed networks could scale without sacrificing decentralisation or security.
