The previous blog post introduced a way to use WSJF (Weighted Shortest Job First) in architecture prioritisation settings. That approach lacks something that my colleague Henk Dado’s approach to prioritising the fixing of debt does have: a way to include the aspect of ‘time’.

Going back to the previous post: it presented a way to weigh architecture issues (including debt) in a WSJF way, with weights for the Robustness, Efficiency and Agility of the landscape, as well as a value for the ‘art’ of architecture.
Henk came up with a two-dimensional approach for debt, based on two axes: cost of repair and cost of delay. An inventory of different kinds of debt (architectural, code, life cycle management, and testing) was created. Cost of repair was estimated in man-days, though only roughly, in terms of t-shirt sizes (XS to XXL). Relative cost of delay was estimated using mock-Fibonacci (see the previous article). Such estimation can lead to a visual overview like this:
To illustrate (as always, all examples are fictional unless explicitly stated otherwise):
1. A costly-to-repair issue that doesn’t do a lot of damage. For instance, debt in credential management: it’s all done by hand, which is expensive and risky. However, the actual risk is low because we have good procedures in place.
2. A slightly more costly repair. In a new program, there is testing debt on the code being built. There is not much code yet, so the risk is low, but fixing it requires a serious refactor.
3. Another issue that is expensive to repair. The current database is a version on extended support, which is costly. Repairing it requires work on all applications that use that database version, hence repair is costly too.
4. An issue that is relatively easy to repair but comes with a high cost/risk. For instance: many applications use a platform that has security problems, e.g. outdated transport security. We could be hacked. Fixing it is relatively easy because a new version of the platform is available and migration is deemed easy.
5. In a new experimental program, the foundations have been laid for a new landscape with a lot of business value. Some poor architectural choices have been made, especially in security, which is natural if you are innovating in an agile way. Fixing these choices is not expensive, but since very few business applications use the new platform yet, the cost of delay is low too.
Which to do first? Looking at the visual, it seems clear that we should do number 4 first: low cost of repair and high cost of delay. Then we might do 3 and 5 in either order, as they have the same quotient of cost of repair over cost of delay.
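For those who prefer to tabulate this rather than eyeball it, here is a minimal Python sketch of the WSJF-style ordering, assuming the quotient is taken as cost of delay divided by cost of repair (the ordering is the same as the post’s repair-over-delay quotient, just reversed). The man-day values behind the t-shirt sizes (`TSHIRT_MAN_DAYS`), the mock-Fibonacci scale (`COST_OF_DELAY_SCALE`) and the placement of the five issues are all hypothetical; they were chosen only so that the ranking reproduces the order described above (4 first, then 3 and 5 tied).

```python
from dataclasses import dataclass

# Hypothetical man-day values per t-shirt size (roughly doubling per size).
TSHIRT_MAN_DAYS = {"XS": 2, "S": 5, "M": 10, "L": 20, "XL": 40, "XXL": 80}

# Assumed mock-Fibonacci scale for relative cost of delay.
COST_OF_DELAY_SCALE = (1, 2, 3, 5, 8, 13, 20, 40, 100)


@dataclass
class Issue:
    number: int
    size: str           # t-shirt size for cost of repair
    cost_of_delay: int  # relative value, taken from COST_OF_DELAY_SCALE

    def __post_init__(self) -> None:
        # Reject values that are not on the agreed estimation scales.
        if self.size not in TSHIRT_MAN_DAYS:
            raise ValueError(f"unknown t-shirt size: {self.size}")
        if self.cost_of_delay not in COST_OF_DELAY_SCALE:
            raise ValueError(f"not on the cost-of-delay scale: {self.cost_of_delay}")

    @property
    def cost_of_repair(self) -> int:
        return TSHIRT_MAN_DAYS[self.size]

    @property
    def priority(self) -> float:
        # WSJF-style quotient: cost of delay per man-day of repair.
        return self.cost_of_delay / self.cost_of_repair


def rank(issues: list[Issue]) -> list[Issue]:
    # Highest quotient first; sorted() is stable, so ties keep input order.
    return sorted(issues, key=lambda issue: issue.priority, reverse=True)


# Hypothetical placements of the five fictional issues.
issues = [
    Issue(1, "L", 2),
    Issue(2, "XL", 3),
    Issue(3, "XL", 8),
    Issue(4, "S", 20),
    Issue(5, "S", 1),
]
print([issue.number for issue in rank(issues)])  # [4, 3, 5, 1, 2]
```

Because cost of delay is a relative score while cost of repair is in man-days, the quotient is only meaningful as an ordering, not as an absolute number.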
But Henk added something else: an estimate of what the debt is going to do over time if we do nothing explicitly. What is its autonomous development? Adding to our fictional examples:
1. We know that a new identity and access management tool will be deployed in the near future. Once that tool is in place, the cost of repair will decrease substantially. So, if we do nothing, number 1 will move down. Henk uses the points of the compass for this, and each direction also has a size: an estimate of how fast the situation will change if we do nothing. So this one would be S3 (south 3).
2. The new program is creating more and more code, so the damage grows quickly. Repairing it becomes harder too, but not by much, because there is a lot of reuse. So it moves to the right and slightly up: ENE3 (east-north-east 3).
3. This one is not going anywhere by itself. The risk increases a bit because the date on which support becomes more expensive draws nearer, but not fast, and the price difference is limited. Cost of repair could grow a bit because knowledge is lost over time. NNE1.
4. The applications that use the platform are being replaced by applications on a new platform. If we do nothing, this problem goes away on its own. W3.
5. More and more applications will use this platform, and it is growing fast. The more applications use the platform, the more expensive it is to repair the situation and the higher the risk. NE5.
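The compass notation can be turned into a movement on the diagram. The sketch below assumes cost of delay on the horizontal axis (east means it grows) and cost of repair on the vertical axis (north means it grows), which matches how the examples move; it also assumes the number is the distance moved per planning period. `parse_drift` and `project` are hypothetical helpers for illustration, not part of Henk’s approach.

```python
import math
import re

# The 16 compass points, clockwise from north, 22.5 degrees apart.
POINTS = ("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
          "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW")
BEARING = {point: i * 22.5 for i, point in enumerate(POINTS)}

# A direction followed by a speed, e.g. "ENE3" or "S3".
DRIFT = re.compile(r"([NESW]{1,3})(\d+(?:\.\d+)?)")


def parse_drift(code: str) -> tuple[float, float]:
    """Return (change in cost of delay, change in cost of repair) per period.

    Assumes east = cost of delay grows, north = cost of repair grows.
    """
    match = DRIFT.fullmatch(code.strip().upper())
    if match is None or match.group(1) not in BEARING:
        raise ValueError(f"not a compass drift: {code!r}")
    angle = math.radians(BEARING[match.group(1)])
    speed = float(match.group(2))
    # Round to two decimals: the estimates are coarse anyway.
    return round(speed * math.sin(angle), 2), round(speed * math.cos(angle), 2)


def project(position: tuple[float, float], code: str,
            periods: int = 1) -> tuple[float, float]:
    """Where an issue ends up after `periods` periods of doing nothing."""
    dx, dy = parse_drift(code)
    x, y = position
    # Clamp at zero: neither cost can become negative.
    return max(0.0, x + periods * dx), max(0.0, y + periods * dy)


print(parse_drift("ENE3"))  # (2.77, 1.15)
```

The clamp in `project` is a design choice: an issue that drifts west long enough simply stops costing anything, rather than turning into a negative cost.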
Drawn on the diagram as arrows, the result looks like this:
This visualisation shows that, while number 4 comes first in terms of the quotient of cost of repair and cost of delay, the autonomous development tells us to do number 5 first, then number 2, and to ignore numbers 1 and 4. Adding the expected development of each issue greatly enhances the usefulness of the diagram. Suddenly, we are not just reacting to the current state of affairs; we are actually looking ahead.
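One way to make this conclusion mechanical, offered as a hypothetical heuristic rather than anything Henk prescribes, is to score each issue by how much its cost of delay plus its cost of repair grow per period if we do nothing, and to park the issues whose score is zero or negative. The sketch uses the drift estimates from the five examples and assumes a step east and a step north are equally bad, which is a simplification because the two axes use different units. `growth_per_period` is an illustrative name, not an established method.

```python
import math

# 16 compass points, clockwise from north, 22.5 degrees apart.
POINTS = ("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
          "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW")


def growth_per_period(direction: str, speed: float) -> float:
    """Hypothetical urgency score: growth of cost of delay (east component)
    plus growth of cost of repair (north component) per period of inaction."""
    angle = math.radians(POINTS.index(direction) * 22.5)
    return speed * (math.sin(angle) + math.cos(angle))


# Drift estimates for the five fictional examples, as given in the text.
drifts = {1: ("S", 3), 2: ("ENE", 3), 3: ("NNE", 1), 4: ("W", 3), 5: ("NE", 5)}

scores = {number: growth_per_period(d, s) for number, (d, s) in drifts.items()}
# Growing issues: fastest growth first. Shrinking or stable ones can wait.
act_now = sorted((n for n, g in scores.items() if g > 0),
                 key=scores.get, reverse=True)
can_wait = sorted(n for n, g in scores.items() if g <= 0)
print(act_now, can_wait)  # [5, 2, 3] [1, 4]
```

With these inputs the sketch puts 5 and 2 at the front and parks 1 and 4; number 3 still grows, but slowly, so it ends up last among the issues to act on.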
Now, this works rather well with debt, but as I noted in the previous article, there is no real difference between architectural debt and architectural features: they mirror each other in the same way that saving €100 of unnecessary cost is the same as earning €100 of extra income. We can therefore apply the same approach to all architectural debt and features, where ‘architectural’ means ‘that which is hard to change’, and so to architecture in general. This extends Dado’s Diagram to all our architectural issues:
We’ve changed ‘repair’ to ‘build or repair’ and ‘cost and/or risk of delay’ to ‘risk and/or missed value of delay’. In my view, this is a very attractive and convincing way to present architectural issues as part of the prioritisation process.
Of course, prioritising on paper is one thing. Documenting and analysing architecture this way, or any other way for that matter, only helps if you have the execution capabilities to follow up on it. That is very hard for most organisations because, unlike strategic goals or regulatory compliance, good architecture has no real visibility or value to external stakeholders (shareholders, customers, regulators). It is thus up to top management alone to provide the will and the drive.
PS. I’ll be giving the EA keynote at the Enterprise Architecture Conference Europe 2018 on October 23.