This is part of a five-part series on Rivet's transition to Cardinal. If you are unfamiliar with Cardinal, we suggest starting with the introduction.
EtherCattle
Years ago when our team first started building OpenRelay, we quickly found that the hardest part of our stack to manage from an operational perspective was the Ethereum node infrastructure. Ethereum nodes can be quite cumbersome to manage — they’re peer-to-peer systems that rely on getting information from strangers on the internet, many of whom are not good at running Ethereum nodes. They have hundreds of gigabytes of data (terabytes if you want an archive node), making backup and recovery a bear. And if you want to scale them up — good luck — it quickly gets way more complicated than putting nodes behind a load balancer if you want anything resembling a consistent view of the data.
So the OpenRelay team borrowed the concept of streaming replication from more traditional databases. We created the EtherCattle initiative — a system with master and replica nodes, where the masters connect to peers, validate the blocks coming in, and share their data with replicas that handle requests from clients. This has worked quite well for Rivet — over the nearly two years since we launched, we’ve achieved some of the highest uptime in the industry while scaling on demand to meet our customers’ needs.
But the EtherCattle initiative has a significant shortcoming. Both the master and replica servers in EtherCattle are derivatives of Geth — the Ethereum client written in Go. The streaming replication system is fairly naive — everything that the master writes to its database is streamed bit-for-bit to the replica, which writes the data bit-for-bit to its own disk. This hinges on the replica and master using the exact same schema for their respective databases, and this is becoming a problem.
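To make that coupling concrete, here is a minimal sketch (in Go, assuming a LevelDB-backed store like Geth’s) of what bit-for-bit replay looks like on the replica side. The replicationRecord type and the key bytes are illustrative stand-ins for this post, not EtherCattle’s actual stream format.

```go
package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
)

// replicationRecord is a hypothetical, simplified stand-in for one entry in
// the master's replication stream: a raw key/value write (or delete) copied
// from the master's database, with no interpretation of what the bytes mean.
type replicationRecord struct {
	Key    []byte
	Value  []byte
	Delete bool
}

// applyRecords replays the master's writes bit-for-bit into the replica's own
// LevelDB. Because nothing is translated, the replica only works if it uses
// exactly the same schema (key layout and value encoding) as the master.
func applyRecords(db *leveldb.DB, records []replicationRecord) error {
	batch := new(leveldb.Batch)
	for _, rec := range records {
		if rec.Delete {
			batch.Delete(rec.Key)
		} else {
			batch.Put(rec.Key, rec.Value)
		}
	}
	return db.Write(batch, nil)
}

func main() {
	db, err := leveldb.OpenFile("/tmp/replica-chaindata", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// In the real system these records arrive continuously over the
	// replication stream; here we just apply a single illustrative write.
	records := []replicationRecord{{Key: []byte("h\x00\x01"), Value: []byte("...")}}
	if err := applyRecords(db, records); err != nil {
		log.Fatal(err)
	}
}
```

Because the replica just replays raw keys and values, any change to the master’s database layout has to be mirrored exactly on the replica, which is precisely the coupling described above.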
There are several reasons this tight coupling between our replicas and masters is a problem.
First and foremost, it limits the chains we can support. We currently have versions of EtherCattle for Geth and ETC Labs’ core-geth, allowing us to support chains supported by those clients:
- Ethereum Mainnet
- Ethereum Classic
- Ropsten
- Goerli
- Rinkeby
And potentially a handful of others such as Kotti and Mordor, though those are untested.
We would really like to be able to expand our horizons to chains not supported by Geth, such as:
- Polygon
- xDai
- Optimism
- Kovan
and other EVM-based (or EVM-derived) networks that either have their own clients or forked from Geth long enough ago that maintaining EtherCattle becomes a challenge. If we need bit-for-bit compatibility between our masters and our replicas, there are huge challenges to supporting lots of different chains.
This is where Cardinal comes in.
Cardinal
The key idea of Cardinal is to separate the master server and the replica server with an explicitly defined communication layer, abstracted away from how either system stores its data. This will make it easier to adapt other clients to act as Cardinal masters, and will let the Cardinal replica focus on storing and serving data in the best way possible for RPC clients, rather than being structured around Geth’s concerns as a participant in a peer-to-peer protocol.
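As a rough illustration of what an explicitly defined, storage-agnostic layer might look like, here is a sketch in Go. The ChainUpdate fields and the Master/Replica interfaces are assumptions made for this post, not Cardinal’s actual message schema.

```go
package cardinalsketch

// An illustrative sketch of a storage-agnostic replication message, not
// Cardinal's actual wire format. The point is that the master and replica
// agree on an explicit, chain-level schema instead of sharing raw database
// bytes, so each side is free to store the data however suits it best.

// ChainUpdate describes what changed when a block was added, in terms of
// chain concepts rather than the master's internal key/value layout.
type ChainUpdate struct {
	Hash         [32]byte          // hash of the new block
	ParentHash   [32]byte          // hash of its parent
	Number       uint64            // block number
	HeaderRLP    []byte            // the block header, in a client-neutral encoding
	Transactions [][]byte          // raw transactions included in the block
	Receipts     [][]byte          // corresponding receipts
	StateUpdates map[string][]byte // account/storage values changed by this block
}

// Master is anything that can emit ChainUpdates: a PluGeth plugin, a
// Nethermind extension, or any other client adapted to speak the protocol.
type Master interface {
	Updates() <-chan ChainUpdate
}

// Replica consumes ChainUpdates and indexes them however is best for serving
// RPC requests, independent of how the master stores its own data.
type Replica interface {
	Apply(update ChainUpdate) error
}
```

The important property is that the messages are expressed in chain-level terms (blocks, transactions, receipts, state changes), so any client that can emit them can act as a master, and the replica is free to index the data however best serves RPC traffic.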
This separation will also allow us to eliminate our existing dependence on Geth as a single point of failure. The chainsplit that occurred with the Berlin hardfork was caused by a bug in OpenEthereum, which brought down numerous services ranging from Etherscan to exchanges for several hours while the OE team diagnosed and fixed the issue. While our team tends to have more confidence in Geth than in OpenEthereum (which has a long history of quality control issues), we don’t like the idea of being fundamentally dependent on a single Ethereum client for consensus-critical purposes.
Moving single points of failure?
Doesn’t Cardinal just move the single point of failure down a layer, so that it’s at the Cardinal replica layer instead of the p2p node layer?
Sort of, but not as catastrophically. As we saw during the Berlin hardfork, if an Ethereum node implements an operation just a little bit incorrectly, that node can no longer sync with the clients that implement the operation correctly.
Cardinal will rely on master nodes for consensus. If Cardinal replicas were running with masters derived from Geth and Nethermind and the Nethermind node found itself on the wrong side of a chainsplit, we could take that node offline and Cardinal would keep being served by Geth.
If the Cardinal replica has an implementation issue with some operation, that won’t affect the consensus-critical data, only the RPC requests served to users. It may lead to incorrect results for some eth_call requests, or incorrect gas calculations on eth_estimateGas, but it won’t lead to a total system failure.
Obviously we will want to put a high level of quality control on our own implementations of these components to avoid such issues, but in most cases maintaining operations in a reduced capacity while troubleshooting such an issue is preferable to a system-wide failure.
Using OpenEthereum’s Berlin hardfork issue as an example: if the Cardinal replica had exhibited the same issue as OpenEthereum, it would have continued to sync and serve correct data for the vast majority of users, while returning an incorrect response to eth_estimateGas for a subset of transactions so esoteric that nothing like them appeared on any of the testnets before the mainnet fork. We’d still take the issue seriously if it caused us to return any data incorrectly, but it would not impact the vast majority of our users, whereas a consensus issue in a conventional client most certainly would.
At the time of launch, we have versions of PluGeth based on Geth and ETC Labs’ Core Geth, allowing us to support all of the chains supported by either client. We are starting work on a Nethermind extension to support Cardinal, which will give us added peace of mind that we’re not entirely dependent on Geth. We also have plans to add Cardinal master support to chains like Polygon, Optimism, BSC and more.
Cardinal’s design also helps lower costs relative to EtherCattle. Cardinal replicas store a much leaner version of the Ethereum state trie, and make more efficient use of Kafka to lower operating costs. We hope this will make Cardinal more accessible to project teams that want to run their own nodes but want less overhead than running a full EtherCattle cluster.
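Assuming the replication stream arrives as messages on a Kafka topic, the replica’s consumption loop can stay quite small. This sketch uses the segmentio/kafka-go client; the broker address, topic name, and consumer group are placeholders, not Cardinal’s actual configuration.

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// A minimal consumer loop for the replication stream. The broker address,
	// topic name, and consumer group here are placeholders for illustration.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "cardinal-chain-updates",
		GroupID: "replica-example",
	})
	defer r.Close()

	ctx := context.Background()
	for {
		msg, err := r.ReadMessage(ctx)
		if err != nil {
			log.Fatal(err)
		}
		// In a real replica, msg.Value would be decoded into a chain-level
		// update and applied to the replica's local index; here we just log
		// the offset to show the shape of the loop.
		log.Printf("received update at offset %d (%d bytes)", msg.Offset, len(msg.Value))
	}
}
```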
Next
Read about The Architecture of Cardinal.