How I built this page

This site has needed a facelift for years. Not because the technology was outdated, but because every previous version of this blog eventually died. Quietly.

I’ve started blogs before. Many of them. They all followed the same lifecycle: excitement → a few posts → silence. Over the years, those genuine attempts quietly turned into a blog graveyard. Apparently, enthusiasm alone isn’t a sustainable publishing system.

Observability is becoming mission critical, but who watches the watchmen?

Over the last couple of years, a lot of work has gone into lowering the barrier to entry for observability. There are now quite a few reasonably mature options that let you set up a good monitoring stack with a few clicks or a few one-liners in the terminal.

In the managed open-source space, the most successful one so far is probably Grafana Cloud, but there is definitely no shortage of closed-source vendors providing APM solutions where all you need to get started is to drop one or more agents into your cluster or onto your machine.

Error Economics - How to avoid breaking the budget

At SLOConf 2021, I talked about how we can use error budgets to add pass/fail criteria to the reliability tests we run as part of our CI pipelines.

As Site Reliability Engineers, one of our primary goals is to reduce manual labor, or toil, to a minimum while keeping the systems we manage as reliable and available as possible. To do this safely, it’s really important that we’re able to easily inspect the state of the system.
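
To make the error-budget idea a bit more concrete, here is a minimal sketch of what a pass/fail check in a CI step could look like. It is not the setup from the talk. The function names, the `max_burn` parameter, and the request-count inputs are all assumptions made for illustration, and it treats the budget as a simple count of requests that are allowed to fail during a test run.

```python
def error_budget(slo_target: float, total_requests: int) -> float:
    """Number of requests that may fail while still meeting the SLO.

    slo_target is a fraction, e.g. 0.99 for a 99% success objective.
    """
    return (1.0 - slo_target) * total_requests


def budget_check(slo_target: float, total_requests: int,
                 failed_requests: int, max_burn: float = 1.0):
    """Return (passed, burn) for a single test run.

    burn is the share of the error budget the run consumed.
    max_burn is a hypothetical threshold: the run passes as long as it
    consumed no more than that share of the budget.
    """
    budget = error_budget(slo_target, total_requests)
    if budget <= 0:
        # No budget at all (100% SLO or no traffic): any failure is too many.
        burn = 0.0 if failed_requests == 0 else float("inf")
    else:
        burn = failed_requests / budget
    return burn <= max_burn, burn
```

A CI job could call `budget_check(0.99, 10_000, failed)` with the failure count from a load test and fail the build when the first returned value is `False`. Setting `max_burn` below 1.0 would make the gate stricter, so a single test run cannot spend the whole budget.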