Metrics. Of 4.
How does your engineering team currently know whether it is performing well? Whether your current architecture is starting to slow you down? Whether it is time to refactor? Whether your deployment pipeline is in good shape?
The Metrics of 4 are a set of metrics that aim to balance two dimensions of your work: the throughput with which your team can deliver software and the stability of what you are delivering.
Driving as an analogy
Think of it this way. Let’s say you are driving a car.
While driving, you keep track of the current performance of the car (speed, rpm). These indicators answer the question: How fast are you going? If you are going too slowly or too fast, you can accelerate or decelerate, shifting gears accordingly.
At the same time, you also have indicators capturing the quality or stability of the system (fuel, engine temperature, engine state, …). When you are running low on fuel, the fuel light comes on and warns you that there will soon be a problem. You pull over, refill, and you’re back on track.
Easy - right?
It is easy. And obviously I don’t want to push this comparison to software delivery too far, but to summarize, the promise of the “Metrics of 4” is this: these four metrics have been scientifically shown to be the ones that differentiate “high-performing teams” from “low-performing teams”.
The 4
So, of course, driving a car is not the same as developing software. Developing software is much more similar to manufacturing goods, like shoes or smartphones. Hence, I will now introduce a slight change in terminology.
Throughput metrics (How fast and how often does your team “produce” software?)
Deployment frequency: By “deployment” we mean a software deployment to production or to an app store
Lead time for changes: The time it takes to go from code committed to code successfully running in production
Stability metrics (How stable is the software system that your team is building?)
Time to restore service: Time it generally takes to restore service for the primary application or service you work on when a service incident (e.g., unplanned outage, service impairment) occurs
Change failure rate: What percentage of changes to production (including, for example, software releases and infrastructure configuration changes) fail
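To make the four definitions above a bit more tangible, here is a minimal sketch of how a team could record its own snapshot of the four metrics. The field names, units, and helper methods are my own choices for illustration, not anything prescribed by the report:

```python
from dataclasses import dataclass


@dataclass
class FourKeyMetrics:
    """One team's snapshot of the four key metrics."""
    deploys_per_week: float        # deployment frequency
    lead_time_hours: float         # code committed -> running in production
    time_to_restore_hours: float   # typical time to recover from an incident
    change_failure_rate: float     # fraction of changes that fail (0.0 - 1.0)

    def throughput_summary(self) -> str:
        # The two throughput metrics: how fast and how often we ship.
        return (f"{self.deploys_per_week:.0f} deploys/week, "
                f"{self.lead_time_hours:.1f}h lead time")

    def stability_summary(self) -> str:
        # The two stability metrics: how quickly we recover and how often we break.
        return (f"{self.time_to_restore_hours:.1f}h to restore, "
                f"{self.change_failure_rate:.0%} change failure rate")


# Hypothetical numbers for a small team doing continuous delivery.
team = FourKeyMetrics(
    deploys_per_week=20,
    lead_time_hours=4 / 60,        # 4 minutes
    time_to_restore_hours=22,
    change_failure_rate=0.20,
)
print(team.throughput_summary())   # prints "20 deploys/week, 0.1h lead time"
print(team.stability_summary())    # prints "22.0h to restore, 20% change failure rate"
```

Splitting the summary into a throughput half and a stability half mirrors the two dimensions the metrics are meant to balance.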
Example
Let’s imagine we are a freshly established team building a backend service with Python. There is a wide variety of technical and organisational choices we can make, including how to handle dependency management (pip, pipenv, poetry, ...), how to organise our on-calls, how to architect our system, how often to do pair-programming, what parts of the system to test, etc. Now, let’s run our metrics and see where we stand:
Deployment frequency: 4 daily deploys (which translates to 20 weekly, 80 monthly, 960 yearly). We do continuous delivery, so there are no bigger “releases”; one commit means one deploy
Lead time for changes: 4 minutes (from code committed to a full deploy of our service into our Kubernetes cluster)
Time to restore service: 22 hours. The hardest part here is determining what counts as a failure. We currently track this via Sentry tickets and their time until resolution.
Change failure rate: 20% of our deploys through CircleCI fail somewhere along the pipeline, because… why, actually?
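To show how little bookkeeping this actually requires, here is a rough sketch of how a team could derive deployment frequency and change failure rate from a list of deploy records. The log format, timestamps, and success flags are made up for illustration; in practice these would come from your CI system’s API:

```python
from datetime import datetime, timedelta

# Hypothetical deploy log: (timestamp, succeeded?) per CI pipeline run.
deploys = [
    (datetime(2023, 5, 1, 9, 30), True),
    (datetime(2023, 5, 1, 14, 0), False),
    (datetime(2023, 5, 2, 10, 15), True),
    (datetime(2023, 5, 2, 16, 45), True),
    (datetime(2023, 5, 3, 11, 0), False),
]

total = len(deploys)
failures = sum(1 for _, ok in deploys if not ok)

# Time span covered by the log, expressed in (fractional) days.
span_days = (deploys[-1][0] - deploys[0][0]) / timedelta(days=1)

deploy_frequency = total / span_days       # deploys per day
change_failure_rate = failures / total     # fraction of failed deploys

print(f"{deploy_frequency:.1f} deploys/day, "
      f"{change_failure_rate:.0%} change failure rate")
```

With the toy data above, five deploys over roughly two days give about 2.4 deploys per day and a 40% change failure rate; swapping in your real CI history is the only change needed.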
So, how do we score? I just pulled up the latest version of the “State of DevOps” report. Let’s have a look:
Our team has been in the game for a couple of months now, so we can compare ourselves against the thousands of teams that were surveyed for the report. This is nice. We would rate our performance as somewhere around “high”: rather strong on the green “Throughput” dimension but weaker on the (light blue) “Stability” dimension. So there is room for improvement.
But the metrics also help when we are not comparing ourselves against the “global competition” but simply asking ourselves over time: How are we doing? How are certain technical choices impacting our capacity to be fast? Is increasing technical debt making it more and more difficult to debug and restore the service after a failure?
If you need arguments for stakeholders on why to prioritise tech debt over new feature releases, there you have them. Are there other teams in your company with the same tech stack that are faster than you? Have a look and reach out to them to understand what they do differently.
Origins
The “Metrics of 4” are officially called the “Four Key Metrics” and originate from the book “Accelerate”. The book has several authors, but the lead author, Nicole Forsgren, is quite a prominent figure and has been very active in research on DevOps and its organisational impact over the last decade. She worked at Puppet, then founded her own research institute, which was later acquired by Google. Since 2020 she has been VP of Research and Strategy at GitHub.
I don’t want to bore you, but here is her current self-description (taken from her website):
“In my new role at GitHub, I am returning to research important topics relating to developer productivity and well-being, so we can help individuals, teams, and organizations — whether in open source or enterprises — create software better, safer, and in more reliable, sustainable, and accessible ways.” – Nicole Forsgren, 2020
What’s the key takeaway here? The book (and therefore the “Metrics of 4”) originates in the space of DevOps and developer productivity. Nicole has been involved in producing an annual report, the “State of DevOps Report”, which then became the basis for her book. The report has been carried out annually since 2014, with over 31,000 respondents by 2019. The Four Key Metrics are a scientific approach to defining the metrics that guide teams, and equally organisations, to do great work.
The bigger picture
And now it gets really cool. Even though we as engineers often think differently about it, awesome software is not an end in itself. We create value. For the organisations we work in. And in turn for society. The people using our products.
“Whether you’re trying to generate profits or not, any organization today depends on technology to achieve its mission and provide value to its customers or stakeholders quickly, reliably, and securely.” – Accelerate, Chapter 2
What does this mean?
Forsgren and her colleagues asked more questions. Questions going beyond just software. They were interested in how the ability to produce “good software” impacts the organizational performance of companies. But they were also curious about the capability to achieve “softer”, non-commercial goals, including the quality of products and services, customer satisfaction, or, more generally, the ability to achieve organizational or mission goals.
What did they find?
“High-performing organizations were consistently twice as likely to exceed these goals as low performers.” And this is true both for the “hard” business metrics (profitability, market share, productivity) and for the “soft” non-commercial goals.
Just let this sink in for a moment. Sound engineering practices and well-written software not only enable you and your team to develop faster and ship higher-quality code. They enable your whole organization to perform better across all dimensions, making it twice as likely to achieve these targets.
How can you use the “Metrics of 4”?
Given their scientific origin and rigor, you are now free to implement the “Metrics of 4” and start benchmarking yourself against the broader industry or the organization you are working in.
The awesome thing is that the scientific origin, in theory, enables you to experiment. How does adopting SCRUM vs. Kanban change the performance of your team? What was the impact of onboarding three new engineers? Are you backsliding over time or constantly improving?
What practices give *your* team the edge?
Resources:
Book: Accelerate
Website: Nicole Forsgren