Geth: It's Time for Long Term Support

The November 11th Ethereum chainsplit resulted from several node operators, miners, and exchanges running outdated versions of Geth. Many people jumped on this point: why were major blockchain companies running outdated versions of Geth? Shouldn’t they be up to date?

There are two primary reasons for updates: bug fixes and new features. Many projects separate these into multiple channels, and there is even a widely adopted specification, Semantic Versioning, that defines version numbers along these lines. When you see a version number of the form MAJOR.MINOR.PATCH (e.g. 1.9.17), this generally means that:

PATCH: Only includes bugfixes / security updates. Any tools interacting with this software should expect it to behave exactly as it was intended to before; any changes address unintended behaviors. Generally, if the PATCH version goes up, it should be very safe to apply the update, and one can expect the software to become more stable.
MINOR: Includes backward compatible new features. Any tools interacting with this software should still work without having to make any changes, but there are also new features. New features often involve bigger changes, which are more likely to introduce new, unintended behaviors. Ideally MINOR versions are safe to apply, but they are more likely to introduce unintended instabilities than PATCH versions.
MAJOR: Includes backwards incompatible changes. Tools interacting with this software will probably have to make changes to incorporate the updates.
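
As an illustration of the three levels above, here is a minimal Python sketch that reports which part of a version number changed between two releases. The function names parse_version and classify_bump are hypothetical helpers written for this post, not part of Geth or of any Semantic Versioning library, and the sketch ignores pre-release and build suffixes.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'v1.9.17' or '1.9.17' into (major, minor, patch)."""
    major, minor, patch = text.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def classify_bump(old: str, new: str) -> str:
    """Return which component changed when moving from old to new."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        return "NONE"   # same version or a downgrade
    if new_v[0] != old_v[0]:
        return "MAJOR"  # backwards incompatible changes expected
    if new_v[1] != old_v[1]:
        return "MINOR"  # backward compatible new features expected
    return "PATCH"      # bugfixes / security updates only
```

Under Semantic Versioning, a tool could use a result of "PATCH" as a signal that an update is low risk, and a result of "MAJOR" as a signal that integration work is likely needed.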

This makes it possible for v1.3.7 of a project to come out after v1.4.0 — so that projects that aren’t ready to make the leap from v1.3.x to v1.4.x can apply security updates without having to worry about the ramifications of new features.
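
To make that concrete, the following Python sketch picks the newest release that stays on the same MAJOR.MINOR line as the version currently deployed. The helper names and the version list are made up for illustration; the only assumption is that the project follows Semantic Versioning, so staying on the same line means taking fixes without new features.

```python
from typing import Iterable, Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'v1.3.7' or '1.3.7' into (major, minor, patch)."""
    major, minor, patch = text.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def latest_patch_in_line(current: str, available: Iterable[str]) -> str:
    """Return the newest release sharing MAJOR.MINOR with `current`."""
    line = parse_version(current)[:2]
    # Keep only releases on the same MAJOR.MINOR line, plus the current one
    # so the result never goes backwards.
    candidates = [v for v in available if parse_version(v)[:2] == line]
    candidates.append(current)
    return max(candidates, key=parse_version)


# Hypothetical release list, in the order the releases were published.
releases = ["v1.3.6", "v1.4.0", "v1.3.7", "v1.4.1"]
safe_target = latest_patch_in_line("v1.3.5", releases)
```

Here safe_target is "v1.3.7": the operator picks up the security fixes published after v1.4.0 without moving to the v1.4.x line.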

While Geth uses the three-part X.Y.Z version numbering, it doesn’t comply with Semantic Versioning. If it did, you would expect v1.9.23 to be a highly stabilized version of v1.9.0, serving exactly the same function with no new features and far fewer bugs. Instead, nearly every PATCH update includes new features. For example, v1.9.7 added support for the Istanbul hard fork. Hard forks seem like the definition of a backwards incompatible change, but Geth increased the PATCH version instead of the MAJOR version. Then v1.9.9 added support for Muir Glacier, another hard fork. v1.9.12 changed the default sender for eth_call from Geth’s default account to the 0x00...000 account, which was not a problem for Rivet (since we don’t have a default account) but was a breaking change for some projects. v1.9.13 changed how much ETH the caller of an eth_call or eth_estimateGas had; this was listed as a bugfix, but was a breaking change for at least one Rivet customer. v1.9.14 changed the error messages of eth_call and eth_estimateGas to include the reason transactions were reverted. This is very useful, but for anyone relying on the old error messages it is a breaking change.

Complexity In Evaluating Breaking Changes

It’s usually pretty easy to delineate a new feature from a bugfix, but breaking changes can sometimes be harder to define.

For example, Rivet uses Geth essentially as a Go library. We were probably the only project downstream from Geth that considered it a breaking change when Geth started writing the latest block hash to LevelDB in a batch write operation instead of an individual put operation, but for us that required a significant refactor. Still, we wouldn’t have faulted the Geth team for putting that in a MINOR version instead of making a MAJOR version for it.

When Geth changed its behavior of allocating an ETH supply for eth_call invocations, they made it more correctly align with what an equivalent transaction would do, but at the same time made it impossible to simulate two or three transactions ahead to see what a particular call would do once a particular address was sent the ETH necessary to execute the call. For some projects this was a breaking change and the MAJOR version should have been updated, for others it was a bugfix and only justified an increase to the PATCH version.

It is not our intent to rile up debate and criticism for each decision of which version to bump; that can quickly turn into bikeshedding and get in the way of progress. But being able to extrapolate meaning from the version number can be quite useful, and distinguishing big new features from critical security updates can be invaluable.

So if you’re a business running Geth, moving from v1.9.9 to v1.9.17 to apply a security patch means taking on a whole host of potentially breaking changes, both for your own internal systems and your customers’ systems. Since there’s no separation of critical bug fixes / security fixes from new features / breaking changes, it’s all or nothing. When major security patches are slipped in secretly, it’s no wonder that even responsible companies choose to stay behind.

At Rivet, every time a Geth release comes out we go over the release milestone, reading at least the title of each pull request to evaluate:

Is this a breaking change for our streaming replication system?
Is this likely to be a breaking change for our users?
If this is a security update, how likely is it to impact our systems?

On occasion we have backported bugfixes into our fork because we weren’t ready to handle other changes that would impact our streaming replication system, but we can’t do that if critical security fixes are mislabeled as optimizations.

Rivet’s Proposal

We propose to help the Geth team adopt Semantic Versioning. From the Geth team’s perspective, not a lot has to change — just be more willing to bump version numbers according to the rules of Semantic Versioning. This may mean Geth goes from v1.9.24 to v13.2.0 by the end of next year, but users will be able to more readily evaluate the magnitude of the changes in the update they are applying.
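
To show how quickly version numbers move under these rules, here is a small Python sketch that replays a sequence of changes and bumps the version the way Semantic Versioning prescribes. The change labels and the sequence itself are hypothetical, chosen only to illustrate the mechanics; they do not describe actual or planned Geth releases.

```python
from typing import Iterable, Tuple

Version = Tuple[int, int, int]


def next_version(version: Version, change: str) -> Version:
    """Bump a (major, minor, patch) version for one kind of change."""
    major, minor, patch = version
    if change == "breaking":
        return (major + 1, 0, 0)      # MAJOR bump resets MINOR and PATCH
    if change == "feature":
        return (major, minor + 1, 0)  # MINOR bump resets PATCH
    if change == "fix":
        return (major, minor, patch + 1)
    raise ValueError(f"unknown change kind: {change}")


def replay(start: Version, changes: Iterable[str]) -> Version:
    """Apply a sequence of changes and return the resulting version."""
    version = start
    for change in changes:
        version = next_version(version, change)
    return version


# Hypothetical release history starting from v1.9.24.
history = ["fix", "breaking", "feature", "fix", "breaking", "feature", "feature"]
final = replay((1, 9, 24), history)
```

With this made-up history the project ends at v3.2.0. The larger MAJOR number carries real information: anyone upgrading from v1.9.24 can see at a glance that two backwards incompatible changes have landed.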

The Rivet team is then offering to maintain long-term support (LTS) releases, backporting critical bug fixes and security updates into the LTS release. Companies that want to make sure they stay up to date on critical updates but aren’t ready to deal with breaking changes can apply the LTS update and only get critical updates. The Rivet team is happy to maintain these support branches — we already do much of the work for our own fork, and the additional effort should be relatively minimal. But we can’t do it without support from the Geth team, highlighting (at least to us) which updates are critical and should be backported vs which are optimizations.

Now, any LTS release would necessarily end with a hard fork; we’re not proposing to backport hard fork functionality into the LTS (as hard forks are the definition of a breaking change). When preparing for a hard fork, businesses would need to upgrade to the next LTS release, which will likely be the first Geth release to fully support the pending fork. In a case where a year goes by between hard forks, we may have an intermediate LTS to come up to speed with new features, and a defined transition period during which both LTS releases are supported, giving businesses time to test and transition between versions.

The LTS model is one adopted by many projects with similar complexity to Geth. You see it in operating systems such as Ubuntu and Red Hat, where certain versions of the OS get critical fixes for years, while the bleeding edge versions are supported for less than a year. You see it in many software frameworks such as Node.js, Python, and Django, where some releases are supported for short periods of time, while others are supported for 30 months. LTS releases are the industry standard way to balance the need for a project to move forward quickly with the need for businesses with dependencies on that software to have operationally stable systems.

The November 11 chainsplit highlighted the need for a similar level of rigor in Ethereum clients. We recognize that the Geth team is stretched very thin, and are happy to pick up the slack to make the LTS model work so long as they’re willing to cooperate by getting us the information we need to know which updates should be backported to the LTS release.

There is some risk that highlighting a bugfix by including it in an LTS release may draw a potential attacker’s attention to that fix, prompting the exploitation of the vulnerability. But we believe that risk is more than offset by the value of teams being able to apply critical updates quickly, without having to untangle security updates from cool new features that introduce unexpected behaviors.
