In the previous story, Agile teaches us the true meaning of Architecture, I introduced a ‘new’ definition of Architecture: the design decisions that are hard to completely remove from implementations (or in short: that which is hard to change). I also introduced the seesaw that exists in any organisation, with the two ends of a spectrum: change is easy (agile) and change is hard (i.e. architecture).
Now, everyone wants to have it both ways. We want change in our organisations to be as easy as possible, but we also want a good architecture, which has to be there so we are robust, efficient and flexible. The question is: how can we have both? When you start something new, it is seductive to think that creating a good architecture is easy: you design it and then you implement it. Nothing ‘hard’ at all, as there is no existing ‘legacy’ architecture that is hard to change. But there are two problems with that:
- The suggestion that it is easy flies in the face of Agile by introducing some sort of up-front design. Sure, I know that before you start implementing in an Agile setting, you still have to do some thinking and designing up front. Even with a greenfield project, you still end up with hard-to-do refactors (i.e. architectural changes).
- Large changes seldom happen in a vacuum, but they happen in a world of already implemented architecture, some of which may have to adapt.
In other words: a greenfield may be easier, but only for a short while and most transformations are not really greenfield. We’re not all small startups with big plans.
Everyone in agile settings is generally aware of the importance of good architecture, and if not they quickly become so if changes get mired down in debt and dependencies. Everyone with at least a bit of real world experience knows that without good architecture, things get pretty messy, inefficient, expensive and slow. As Foote and Yoder remarked: If you think good architecture is expensive, try bad architecture. The question is: how to get good architecture.
The software engineers who created the Agile Manifesto stated somewhat naively that architecture emerges from well-functioning teams. This does kind of work: a team that gets stuck because its architecture stinks will have to refactor to improve it, and room in sprints will have to be made. An agile framework like SAFe goes further. It introduces the concept of the ‘architectural runway’: the enablers for teams to ‘consume’. For the work that needs to be done on it, SAFe suggests reserving a fixed part of the team’s capacity for this ‘runway’ work. The architecture for what happens inside a team is not really addressed, by the way. What also isn’t really addressed is how this interferes with the priority-setting by the product owner.
In situations I have seen, the product owner and scrum master run the team and the product owner sets priorities including architectural ones. The mechanism in SAFe to decide what to do first is called WSJF, which stands for Weighted Shortest Job First.
Weighted Shortest Job First calculates priorities by dividing the ‘cost of delay’ by the ‘job duration’. This is pretty easy to understand. The cost of delay may be ‘missing out on benefits’, but also increased risk or missing an upcoming deadline. You do not need to calculate an actual monetary value; you can use a relative ordering (such as the ubiquitous mock-Fibonacci series used in many agile settings). You then divide that cost of delay by the job size (SAFe uses ‘job duration’ as a proxy for job size). In SAFe, the job size is also a relative ranking using the Fibonacci-like series.
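To make the division concrete, here is a minimal sketch in Python. The `Job` class, the backlog items and all the scores are invented for illustration; they do not come from SAFe or from any real backlog.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cost_of_delay: int  # relative score, e.g. taken from a mock-Fibonacci series
    job_duration: int   # relative score, used as a proxy for job size


def wsjf(job: Job) -> float:
    """WSJF as described in the text: cost of delay divided by job duration."""
    return job.cost_of_delay / job.job_duration


# Hypothetical backlog: names and scores are assumptions made for this sketch.
backlog = [
    Job("small feature", cost_of_delay=5, job_duration=2),
    Job("urgent defect", cost_of_delay=8, job_duration=1),
    Job("large refactor", cost_of_delay=8, job_duration=13),
]

# Highest WSJF goes first.
ranked = sorted(backlog, key=wsjf, reverse=True)
for job in ranked:
    print(f"{job.name}: {wsjf(job):.2f}")
```

Note how the large refactor ends up last even though its cost of delay is as high as that of the urgent defect: the long duration in the denominator drags it down.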
Fibonacci series and its use in Agile
Often in Agile, a mock-Fibonacci series is used to relatively order work to do, mostly in terms of how much effort is needed to build or change something. The real Fibonacci series is a series where each number is the sum of its two predecessors, so starting with the pair 0, 1 you get: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, and so forth. The series used in Agile are variations on the real series, e.g. 0, 1/2, 1, 2, 3, 5, 8, 13, 20, 40, 100. The idea is to put all the changes that are roughly equal in effort in the same bucket: all the small ones in the bucket marked 1, the next up in the bucket marked 2, etc. While officially the numbers do not represent real effort, only ordering, teams often use the numbers as a shorthand for days of work. The larger the number is, the more uncertainty it has, so the series prevents useless discussions about whether something is 6 or 7 days, for instance, and the false sense of precision that such an estimate brings. See also: Planning Poker.
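As a small illustration, the sketch below generates the real Fibonacci series and drops a raw estimate into a bucket of the mock series quoted above. The rule ‘round up to the next bucket’ is my own assumption for this sketch; teams normally pick the bucket by discussion, not by formula.

```python
from itertools import islice


def fibonacci():
    """Yield the real Fibonacci series: each number is the sum of its two predecessors."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# The Agile variation quoted in the text.
MOCK_SERIES = [0, 0.5, 1, 2, 3, 5, 8, 13, 20, 40, 100]


def bucket(estimate: float) -> float:
    """Return the smallest bucket that is >= estimate (assumed rule: round up)."""
    for value in MOCK_SERIES:
        if estimate <= value:
            return value
    return MOCK_SERIES[-1]  # anything larger lands in the biggest bucket


first = list(islice(fibonacci(), 13))
print(first)                 # the first 13 real Fibonacci numbers
print(bucket(6), bucket(7))  # both land in the same bucket
```

An estimate of 6 and one of 7 end up in the same bucket, which is exactly the discussion the series is meant to cut short.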
There are a few issues with using WSJF for architecture prioritisation.
- Specifically in SAFe: using job duration as a proxy for job size may work against architecture. Often the refactor is done by only a small part of the team which has to work for a longer period (several sprints) to get it done so that the result can be merged back.
- Architecture is ‘what is hard to do’, which generally means it takes more time, which means it gets a low priority from ‘weighted shortest’, unless the benefits are really great. But here we run into the problem that determining the benefits of an architectural epic or story is nothing more than estimation, a luxurious word for ‘guess’. The benefits are often only potential and not actual. This makes Architecture start the contest with a 0-2 disadvantage:
- It is a hard sell to the people who focus on features and defects.
- We humans are largely driven by what Daniel Kahneman has christened ‘System 1’. System 1 is our second-to-second intelligence, which does the ‘estimation business’. It does this quickly and surprisingly well (e.g. when navigating a busy sidewalk), but it has a couple of big Achilles’ heels, one of which is that it works on the basis of WYSIATI: What You See Is All There Is. Our ‘System 1s’ work with what is directly available. We don’t take into account what we do not directly ‘see’.
There are two far more serious problems with approaches such as WSJF:
- Architectural needs from outside the scope of the product owner have no business value for that owner. Moreover, not every owner is equally good at architecture, and an owner may be under heavy pressure to deliver short-term benefits. So, somehow we need to bring in the architectural checks & balances from the perspective of the organisation as a whole (see Chess and the Art of Enterprise Architecture).
- Grady Booch has said about design: “[…] design is the activity of making […] decisions. Given a large set of forces, a relatively malleable set of materials, and a large landscape upon which to play, the resulting decision space may be large and complex. As such, there is a science associated with design (empirical analysis can point us to optimal regions or exact points in this design space) as well as an art (within the degrees of freedom that range beyond an empirical decision; there are opportunities for elegance, beauty, simplicity, novelty, and cleverness)”. Now, every decision on what to change in your product or landscape is in fact a design decision. And as Grady has said: it is not just a science, it is also an art. Good designers and architects know that aesthetical notions such as ‘elegance’ are powerful proxies for robustness, efficiency and flexibility. But as they are aesthetical, they are very hard to take into account if an economic view dominates prioritising options. SAFe’s simple ‘take an economic view’ approach has the side-effect of moving architecturally important aesthetic considerations aside.
Getting the right priority for Architecture in an Agile setting
In SAFe the architectural side (the enablers) of things is managed by maintaining the ‘architectural runway’. These are parts of the development that enable future — but not too far in the future — features. And SAFe says: Use Capacity Allocation (a percentage of the release train’s capacity in an increment) for Enablers that extend the runway. (A release train is the next level above a team).
I think that is a good idea. Trying to manage architecture with WSJF in the same bucket as features and defects has too many drawbacks. Even with abstractions like Fibonacci-like numbers for ordering benefits and costs.
The question then becomes: who is going to determine the use of that capacity and how?
There are several elements to this, as far as I’m concerned:
- There have to be separate backlogs for architectural elements, so the ‘hard to do’ enablers, but also the ‘hard to do’ debt (more on debt in the next story in this mini-series) can be separately managed. See also Agile & Architecture.
- There has to be a fixed capacity for architectural work (work that comes from the architectural backlog). Straight from SAFe. 20% seems to be a safe bet for a starting level.
- The Lead Architect of the organisation has the final say on prioritising the architecture backlogs. More on this below.
- The product owner is in charge of prioritising the product backlog (the equivalent of the Project Executive being in charge of prioritising in his project in an up-front design setting in Chess and the Art of Enterprise Architecture). As the Product Owner is also more or less in charge of the team, he must see the architectural backlog items as his own, with the Lead Architect as the ‘client’.
- As a result of, for instance, timing issues (e.g. the same team members are required for certain features and enablers), it is quite possible that the architectural capacity is not fully used. However, if the product owner’s ‘capacity debt’ exceeds a certain level without consent from the Lead Architect, he loses his freedom to decide on his own priorities and the Lead Architect decides until enough architectural work has been done. In other words: you ignore architecture, you pay the price. This is a backstop. It forces the product owners to take the Lead Architect seriously and work together. The whole decision making should be done on a consent basis, with escalation to higher management if consent is not attainable (see Chess and the Art of Enterprise Architecture).
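The backstop in the last point can be sketched as a simple bookkeeping rule. Everything specific here is an assumption for illustration: the `Increment` record, the threshold `DEBT_LIMIT`, and the choice that extra architectural work pays down earlier debt but never builds up credit.

```python
from dataclasses import dataclass

ARCHITECTURE_SHARE = 0.20  # the starting level suggested in the text
DEBT_LIMIT = 10.0          # hypothetical threshold, in the team's capacity units


@dataclass
class Increment:
    total_capacity: float     # capacity available in the increment
    architecture_done: float  # capacity actually spent on architecture backlog items


def capacity_debt(history: list[Increment]) -> float:
    """Accumulated shortfall of architectural work against the reserved share."""
    debt = 0.0
    for inc in history:
        reserved = ARCHITECTURE_SHARE * inc.total_capacity
        # Assumption: a surplus reduces existing debt but does not go below zero.
        debt = max(0.0, debt + reserved - inc.architecture_done)
    return debt


def who_prioritises(history: list[Increment], lead_architect_consents: bool = False) -> str:
    """Product owner decides, unless the debt exceeds the limit without consent."""
    if capacity_debt(history) > DEBT_LIMIT and not lead_architect_consents:
        return "lead architect"
    return "product owner"


# Two hypothetical increments in which less than 20% went to architecture.
history = [Increment(100, 15), Increment(100, 12)]
print(capacity_debt(history), who_prioritises(history))
```

With consent from the Lead Architect (`lead_architect_consents=True`) the same history leaves the product owner in charge, which mirrors the consent-based decision making described above.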
In the end, if matters escalate, it all comes down to the conviction of higher management that architecture is important even if there is no direct short-term business value. If that is lacking, no architecture approach will work. Separating architecture-required work and having a strong lead architect to manage that in a pragmatic manner is a good way to provide a basic protection. After all: we all are aware that bad architecture is a lot more costly than good architecture, right?
Prioritising an Architecture backlog
In the last story in this mini-series I will talk about what goes in the architecture backlog (e.g. when do we call something debt in an architectural setting). For now, assuming we have our items, how can we prioritise these?
We should remember that what we are doing here is design, because we are prioritising higher-level design decisions: the hard-to-change elements of our organisational landscape. Going back to Grady Booch, design can be a science, but it is also an art. So, we can create impressive-looking formulae (the ‘science’) and try to find data to feed the formulae, but we have to accept that both the formulae and the inputs are pretty uncertain. Often they give an illusion of science, without being scientific (something that can be said for much of our field, by the way). On the other hand, if we want to trust our brain’s estimation machine (the ‘art’), we must accept that this machine has serious flaws. Good architects who trust their estimation capabilities overly much display a certain hubris which is risky. Not bad per se, but risky.
A word of warning: generally I write from experience. I’ve seen something myself, or heard about it from people who had the actual experience. Chess and the Art of Enterprise Architecture was written after trying it for three years and learning from both the successes and the failures. The suggestion below is nothing more than a proposal. Untested.
Now, suppose we have a set of architectural improvements, either fixing technical debt or improving our ‘architectural runway’. And assuming we have set aside capacity to work on these, capacity that is (within reason) protected from the hustle and bustle of short term business value. How should we prioritise?
I propose that we use an adaptation of WSJF. We select what our architecture improvement aspects are and order them using mock-Fibonacci. Architecture improvement aspects have to do with your vision and strategy and your goals for architecture. Why do architecture in the first place? Personally, I think you can do worse than doing architecture because, regardless of your business strategy, you want to improve or maintain the following aspects of your landscape:
- Robustness. You want your landscape to be secure and reliable. It should not be brittle and easily break down or have security issues. Often this is an important satisfier for business, having to do with the all-important trust of your customers.
- Efficiency. You do not want to waste resources, money, time, etc. to run your landscape. A certain cost is unavoidable, but you do not want unnecessary cost. It is nice if you can save a few million every year, as that adds to either profits (private sector) or results.
- Flexibility. You do not want to waste resources, money, time, etc. to change your landscape. A certain cost is unavoidable, but you do not want unnecessary cost. It is nice if you can do that large transformation for 30 million in three years instead of 150 million in eight.
Incidentally, these are the core aspects mentioned in the rather popular 4-minute Why Enterprise Architecture animation we created 5 years ago.
As fuzzy as estimation gets, Robustness, Efficiency and Flexibility are the ‘sciency’ parts of deciding what is best for your organisation. I think it is good to give explicit attention to Aesthetics (elegance, beauty) of your landscape because it is such a good proxy for quality.
I propose you take your set of architecture improvements and rank each of them, using mock-Fibonacci, on Robustness, Efficiency, Flexibility, and Aesthetics. You then divide the resulting value by a mock-Fibonacci ranking of the Difficulty to achieve it. You would get something like this (mock example):
In this mock example, the first improvement is a point-solution for certain automation in your landscape. It actually makes the landscape less elegant (I’m using negative values to be able to signal that things get worse, not better), but it will make running it a lot cheaper. The second adds to our flexibility (we can support more use of the new platform and implement it more quickly; we expect this to drive innovation, but there is no direct business request yet) while also diminishing it a bit (we need to maintain it when we make changes in our landscape), and it comes at a cost in Aesthetics: more variation. The third is driven by the architect’s sense of elegance. The infrastructure automation has become complex and a bit too difficult to change. A refactor is costly, but will clean things up and also make it a bit more robust. The last makes operations a bit less costly but it does improve security a lot and it removes a bottleneck in change activities. In the above example, the improved credentials management comes out on top.
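The calculation can be sketched in a few lines of Python. Two loud caveats. First, I am assuming here that the four aspect scores are simply added up before dividing by Difficulty. Second, all the numbers below are invented for this sketch; I chose them so that the outcome follows the story told above, but they are not an authoritative set of scores.

```python
from dataclasses import dataclass


@dataclass
class Improvement:
    name: str
    robustness: int
    efficiency: int
    flexibility: int
    aesthetics: int  # negative means: things get worse, not better
    difficulty: int  # mock-Fibonacci ranking of how hard it is, must be > 0

    def score(self) -> float:
        # Assumption: the aspect scores are summed, then divided by difficulty.
        value = self.robustness + self.efficiency + self.flexibility + self.aesthetics
        return value / self.difficulty


# Hypothetical scores (robustness, efficiency, flexibility, aesthetics, difficulty).
improvements = [
    Improvement("automation point-solution", 0, 8, 0, -5, 8),
    Improvement("new platform", 0, 0, 5, -2, 3),
    Improvement("infrastructure automation refactor", 1, 0, 5, 8, 13),
    Improvement("improved credentials management", 8, 1, 1, 0, 8),
]

ranked = sorted(improvements, key=Improvement.score, reverse=True)
for item in ranked:
    print(f"{item.score():5.2f}  {item.name}")
```

With these made-up numbers the credentials management ends up first and the point-solution last, mostly because its loss of elegance eats up much of its efficiency gain.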
This mechanism also opens up a simple way to set architectural directions from the very top of the organisation. Suppose the organisation’s security and continuity are actually pretty good, but the main problem is that everything is so costly to run? What you would like is to influence the broad strokes, e.g. set a strategic importance on, for instance, efficiency (if that is your main strategic worry). That can be done by introducing weight factors for each aspect. E.g. see this:
And suddenly, because of the strategic importance to improve efficiency and agility/flexibility, we will implement the new platform first, the hard to do refactor second and the point-solution last.
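Here is a minimal, self-contained sketch of how such weight factors could work. As before, the assumption is that weighted aspect scores are summed and then divided by Difficulty. The weights and all the scores are invented for illustration; I picked them so that the resulting order follows the story above.

```python
# Hypothetical strategic weights: efficiency and flexibility are emphasised.
WEIGHTS = {"robustness": 1, "efficiency": 2, "flexibility": 3, "aesthetics": 1}

# Hypothetical improvements: name -> (aspect scores, difficulty).
IMPROVEMENTS = {
    "automation point-solution": (
        {"robustness": 0, "efficiency": 8, "flexibility": 0, "aesthetics": -5}, 8),
    "new platform": (
        {"robustness": 0, "efficiency": 0, "flexibility": 5, "aesthetics": -2}, 3),
    "infrastructure automation refactor": (
        {"robustness": 1, "efficiency": 0, "flexibility": 5, "aesthetics": 8}, 13),
    "improved credentials management": (
        {"robustness": 8, "efficiency": 1, "flexibility": 1, "aesthetics": 0}, 8),
}


def weighted_score(aspects: dict, difficulty: int, weights: dict) -> float:
    """Sum of weight * score over all aspects, divided by difficulty."""
    value = sum(weights[aspect] * score for aspect, score in aspects.items())
    return value / difficulty


def rank(weights: dict) -> list:
    """Return improvement names, highest weighted score first."""
    return sorted(
        IMPROVEMENTS,
        key=lambda name: weighted_score(*IMPROVEMENTS[name], weights),
        reverse=True,
    )


NEUTRAL = dict.fromkeys(WEIGHTS, 1)  # every aspect counts equally
print(rank(NEUTRAL))
print(rank(WEIGHTS))
```

Comparing the two printed lists shows the effect of the strategic weights: the scores of the improvements themselves have not changed, only the importance the top of the organisation attaches to each aspect.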
Again: this kind of mechanism creates the illusion of exactness, which is quite dangerous. People tend to fudge: knowing the weights set by the top, they might choose different scores for the aspects so the end result is what they want. Collaborative setting of these values helps against fudging, though. You can try an architectural ‘Planning Poker’ for each of the Value and Cost aspects.
PS. I’ll be giving the EA keynote at the Enterprise Architecture Conference Europe 2018 on October 23.