Agile, Dirt, and Technical Debt

Our discipline has long been working with the concept of ‘technical debt’. Generally, technical debt is a defect of some substantial scale, often (but not always) hidden. Repairing the debt takes substantial effort; otherwise we would classify it as a minor defect. And since architecture is “what is hard to change”, debt is a matter for architecture, either architecture at the solution level (which does impact the product owner and users directly) or architecture at the organisation’s level (which affects all product owners and users indirectly and so is often neglected).

The substantial defect may have two different consequences:

Real cost, e.g. more costly operations because of a non-optimal setup. This may be compared to the interest on a loan, as in real debt.
Potential cost, e.g. an (unknown, potentially large) extra cost on change you might want to make in the future. This may be compared to selling a ‘naked call option’ (see this earlier post for an explanation of the ‘options’ analogy).

But this is relative, because one could easily phrase this in a positive way without changing anything. Fixing the non-optimal setup might be phrased as creating an improved architecture (a feature) with:

Real benefit, i.e. a lowering of cost
Potential benefit, i.e. the possibility of effecting future change faster and cheaper. This might be compared to buying a call option.

Both are mirrors of each other. So — as is often the case in architecture — we run the risk of getting into lengthy definition discussions. I don’t like definition discussions very much, they tend to be unpragmatic with little real world effect. Avoid them when you can. But — being ‘of the architectural persuasion’ — simplification is to my liking, so I’m proposing that we forget about a difference between enablers and debt. In fact, in an Agile setting, this is already the case for features and defects as these live in a single bucket — the backlog — and there is no fundamental difference between adding features and fixing defects when prioritising. So, I propose that we do away with the illusory and largely meaningless difference between improvements/enablers and debt, as removing debt is an enabler and not creating the enabler is often a form of debt.

The sly architect

Regarding framing the proposal for your landscape as improvement or debt: as cunning architects we should frame the substantial changes we want (as we think the organisation needs them) as much as possible in a negative way (loss). That is because it is well established in psychology that humans react more strongly to potential loss than to potential gain, an aspect of our animal mind that may be rooted in loss of life versus gain of a meal or sex. But I digress (as is often the case). Read Daniel Kahneman’s Thinking, Fast and Slow for useful insights into the matter.

Actually, imagine how debt comes into existence. It might for instance creep up on us, e.g. by working too much in a short-term mood, or just by being sloppy. But it also might appear overnight. If the vendor announces end of support for a product in a few years’ time, suddenly your dependence on that product becomes debt. Because it’s not only debt when you’re really hit in a few years, it’s already debt when you know you have to do something about it in the coming years.

Managing the ‘hard side of things’

So how do we manage the architectural, or ‘hard to change’ side of things (including the debt we have) in an agile setting? I’ve written about the positioning of architecture in an agile setting before, and there I followed Philippe Kruchten in separating the features/defects from the architectural features/defects. But given the adapted definition of what architecture is, the matrix changes a bit.

Figure: The “what colour is your backlog” approach seen as two buckets: the ‘easy to change’ and the ‘hard to change’ (i.e. ‘architecture’).

We end up with two buckets. The ‘normal’ bucket and the ‘hard’ (architecture) bucket. The original matrix by Kruchten tended to equate architecture with ‘hidden’, but I think that is not the case. Some things are clearly visible, but still architecture (hard to change). If — in a fit of utopian fervor — we have completely refactored our landscape into small services, uncoupled via a service bus, with a boatload of serverless computing thrown in, a complete collapse of performance may be very visible to our users. We should not look at the visibility of the benefits side to decide if something is architecture or not, we should look at how hard it is to change it.

Before, I equated architecture with just potential gain/loss, but that is incomplete, because architecture defined as ‘hard to change’ can bring real benefits, e.g. in efficiency (cost of run) or flexibility (cost of change). Having said that: it is often the case that architectural changes are of the ‘potential benefit/loss’ type (and features/benefits are usually not), so I still label the bucket that way.

Within an agile environment, we already have some sort of priority setting mechanism in place to decide what we are going to work on in the next sprint, or the next increment. Within agile approaches, the difficulty is often to create room to work on the potential and indirect gains and losses. In my experience, they often tend to lose out in approaches such as WSJF (Weighted Shortest Job First) and in YAGNI-heavy settings. Being the ‘hard’ stuff, architectural improvements are difficult to estimate, both in terms of the work to do and in terms of the benefits. Doing WSJF on a bucket that has both simple and hard stuff in practice discriminates against the hard stuff. And YAGNI is a problem all by itself, leading organisations for instance to erroneously assume that there is an infinite change capacity, which has a detrimental effect on the actual agility of the organisation: too much agile can make you less agile.
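To make that bias concrete, here is a minimal sketch in Python. It assumes the common formulation in which the WSJF score is the cost of delay divided by the job size. The item names and all numbers are invented for illustration, and the `hard` flag is a hypothetical marker for the ‘hard to change’ bucket.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    name: str
    cost_of_delay: float  # estimated relative value of doing it sooner
    job_size: float       # estimated relative effort
    hard: bool = False    # hypothetical flag: 'hard to change' (architectural)

    @property
    def wsjf(self) -> float:
        # Assumed formulation: WSJF = cost of delay / job size
        return self.cost_of_delay / self.job_size


def rank(items):
    """Return items ordered by descending WSJF score."""
    return sorted(items, key=lambda item: item.wsjf, reverse=True)


# Hypothetical mixed bucket; every number is a guess, as estimates always are.
backlog = [
    BacklogItem("new report screen", cost_of_delay=8, job_size=2),
    BacklogItem("fix login typo", cost_of_delay=3, job_size=1),
    BacklogItem("replace end-of-support database", cost_of_delay=13,
                job_size=13, hard=True),
]

ranked = rank(backlog)
for item in ranked:
    print(f"{item.name}: {item.wsjf:.2f}")
```

With these made-up estimates the architectural item has the highest cost of delay of the three, yet it ends up last, purely because its job size is large. Small items with modest value keep jumping the queue.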

At each level, we may have these two buckets. At enterprise level, we may define them too, though there are generally only ‘hard’ changes at that level. At product/solution level, we will definitely have both, and here the ‘normal’ items generally outnumber the ‘hard’ ones.

Back to prioritising

We discussed prioritising architecture in an agile setting in the previous article. Prioritising work is the key issue. To get a good architecture in an agile setting means you have to make sure the ‘hard to do’ items get enough attention in the agile teams’ sprints. This means getting two things in those domains (products, value streams):

The hard things of the domain of the agile team itself.
- For instance, suppose I am building fully automated managed hosting to decrease cost and increase agility. My clients need certain platforms to host their applications: databases, middleware, operating systems, application servers, etc. Adding a complete new platform is a lot of work (hard), so I need to be careful about which platforms I invest in. So, if a customer requires a new platform, say a new operating system to run his application on, I can set this up, and setting it up is an architectural decision. I need to do some hard work before I can offer the features and benefits, after all. This is architecture that gets prioritised in the team’s own backlog.
The hard things that are necessary from a wider perspective
- Now suppose my clients are only using Oracle and SQLServer databases at the moment. Do I offer them a PostgreSQL database platform? From a YAGNI perspective: no. Nobody is going to use it, and it will cost money to maintain. But suppose I know my customers all complain about the license cost of Oracle and SQLServer. I could build PostgreSQL support and offer them an opportunity. Remember: nobody asked for an iPhone before it came out. Android phones resembled Blackberries, back then. Or recall Henry Ford’s statement that if he had asked people what they wanted they would have asked for a faster horse. Understanding your customers is not a simple extrapolation of listening to them. From the wider perspective I may have architectural demands for a product. Architectural scope is not the same as solution/project scope (this is one of the basic pitfalls for architecture); generally it is larger both in time and content.
- Or maybe my customers are internal customers and they don’t ask for better credential management, but it is of strategic importance to the board to enable this in the fully automated managed hosting.

Back to governance

And thus we get back to the essence of architecture governance. As described in Chess and the Art of Enterprise Architecture, the ‘outside’ architecture role (enterprise architecture) has to be the checks and balances on:

The quality of the solution/project architecture (within its own scope, steered by the product owner (agile) or project executive (project with up-front design) on the basis of user needs).
The architectural demands on the solution/project architecture (from outside its own scope). Architectural demands may be on what and how.

The way we can do this in an Agile setting can be based on the following:

A team has its own backlog that is user-driven, but can contain ‘hard to do’ (architectural) items. These are prioritised using WSJF.
The team can mark defects as ‘dirt’ (for instance, badly written but functional code) or ‘a hole’ (e.g. a not fully implemented part that negatively impacts flexibility). ‘Dirt’ and ‘holes’ can accumulate and become debt if cleaning/adding becomes a substantial problem. A team may not have more than a pre-set amount of ‘dirt’ and ‘holes’; they have a janitorial duty. You could wonder why they aren’t allowed to have as many ‘holes’ as they want, but that is explained in the article that discussed the effect of YAGNI. Keeping ‘dirt’ and ‘holes’ limited is a matter of (important) hygiene.
Enterprise Architecture manages its own central architecture backlog, which contains both improvements and fixes (of debt) that are substantial. These are prioritised using WSJF-A (see previous article), though with a caveat — see below. These items may be product/team specific.
A fixed capacity (as mentioned in SAFe) is reserved in the agile teams for the work from the wider enterprise architecture backlog. If there are no architectural demands from enterprise architecture for a team, the team may use the capacity for its own architectural (substantial) items. In practice, product owners and enterprise architecture discuss and collaborate (the EA Chess approach is after all a consent-oriented collaborative model based on clear responsibilities).
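The capacity rule and the janitorial duty above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `reserved_fraction` parameter, and the choice to count ‘dirt’ and ‘holes’ against one combined limit are all hypothetical, and the numbers in the usage lines are invented.

```python
def split_capacity(total_points, reserved_fraction, ea_demand_points):
    """Split a sprint's capacity between three kinds of work.

    total_points      -- the team's capacity for the sprint
    reserved_fraction -- share reserved for architecture work (assumed parameter)
    ea_demand_points  -- work requested from the enterprise architecture backlog
    """
    reserved = total_points * reserved_fraction
    for_ea = min(reserved, ea_demand_points)
    # Reserved capacity not claimed by enterprise architecture falls to the
    # team's own architectural (substantial) items.
    for_team_architecture = reserved - for_ea
    for_normal = total_points - reserved
    return {
        "enterprise_architecture": for_ea,
        "team_architecture": for_team_architecture,
        "normal": for_normal,
    }


def within_hygiene_limit(dirt_count, hole_count, limit):
    """Janitorial duty: 'dirt' plus 'holes' may not exceed a pre-set amount.

    Counting both against one combined limit is an assumption of this sketch.
    """
    return dirt_count + hole_count <= limit


# Invented example: 40 points of capacity, a quarter of it reserved.
print(split_capacity(40, 0.25, ea_demand_points=6))
print(split_capacity(40, 0.25, ea_demand_points=0))
print(within_hygiene_limit(dirt_count=3, hole_count=2, limit=5))
```

The sketch deliberately leaves out the negotiation: in practice the split is the outcome of a conversation between product owners and enterprise architecture, not of a formula.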

A word of warning

A final word on the WSJF-A of the previous article: I use the aspects of Robustness (security, continuity), Efficiency (cost of run), and Flexibility (cost of change) as key elements of what Architecture must influence. It is important to reiterate that when we are making estimates we are in effect guessing, based on incomplete knowledge of all kinds and in a complex web of dependencies. The numbers that come out of such an exercise are inherently brittle. Even if you add a number for Aesthetics (elegance, simplicity, novelty) — as I did — you are still fooling yourself into believing this is science. It’s not. It’s mostly an art.

Second, an architecture backlog may contain items that accumulate cost or risk over time. One can take this into account in the table. Alternatively, if you have debt that gets worse over time, you could just rank it higher in the here and now. Both ways will work.
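One way to picture the first option is to score an item on its projected cost of delay rather than on today's value. The sketch below is hypothetical: it assumes the score is cost of delay divided by job size, it assumes the debt compounds at a fixed rate per quarter, and all numbers are invented.

```python
def cost_of_delay_at(base, growth_per_quarter, quarters_from_now):
    """Projected cost of delay, assuming compound growth per quarter."""
    return base * (1 + growth_per_quarter) ** quarters_from_now


def score(base, growth_per_quarter, job_size, lookahead_quarters=0):
    """Assumed score: projected cost of delay divided by job size."""
    projected = cost_of_delay_at(base, growth_per_quarter, lookahead_quarters)
    return projected / job_size


# Invented items: one stable, one piece of debt that worsens 25% per quarter.
stable_now = score(base=8, growth_per_quarter=0.0, job_size=4)
debt_now = score(base=6, growth_per_quarter=0.25, job_size=4)

stable_later = score(8, 0.0, 4, lookahead_quarters=4)
debt_later = score(6, 0.25, 4, lookahead_quarters=4)

print(f"today:      stable {stable_now:.2f}, growing debt {debt_now:.2f}")
print(f"in a year:  stable {stable_later:.2f}, growing debt {debt_later:.2f}")
```

Scored on today's numbers the growing debt ranks below the stable item; scored a year ahead it ranks above it. Simply bumping the item's rank by hand gets you to the same place with less ceremony.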

And thirdly, there may be some sort of ordering that is beneficial. While one item may be more important than another, doing them in that order may be worse. Such complexities cannot be managed by the mock-rational business of numbers.

It is for those reasons that the end result of a WSJF on architectural (substantial) issues must not be more than input for a session of prioritising. In the end, you get the best result if you look at the output of the numbers game and then (perhaps after a game of ‘pre-mortem’, more on that another time) together look at it and ask yourselves: does this feel OK? Rules, even the illusory ones, are never intelligent. People, when collaborating, are the best instrument we have.

Previous story in this mini-series: Prioritising Architecture under Agile

PS. I’ll be giving the EA keynote at the Enterprise Architecture Conference Europe 2018 on October 23.

PPS. How does the idea that architecture is ‘that which is hard to change’, the substantial, square with one of the chess-analogy remarks in EA Chess that “sometimes a pawn is of decisive importance” (or in other words: details count)? The answer is: sometimes details lead to situations that are hard to change. A detail may be small, but changing (the consequence of) it may be a substantial effort. Those details are pretty important.

7 comments

Pete says:

October 1, 2018 at 19:21

Henry Ford’s statement that if he had asked people what they wanted they would have asked for a faster horse. On the other hand, he also stated: any color as long as it’s black.
Chrysler saw an opportunity and delivered red, more luxurious cars that suddenly opened up the market and cost Ford market share. So stay agile and do look at what the customer wants…

1. gctwnl says:
  
  October 2, 2018 at 07:53
  
  Agreed. But nobody could compete with Ford unless they copied his production method. And though Chrysler’s cars were more luxurious, they basically weren’t that different, what differed was details not fundamentals. The same is true with copying Lean from Toyota using it as the argument for Agile. Toyota’s lean was an adaptation of the basic production mechanism which overall stayed much the same. In the software world, we’ve taken Lean not as an optimisation of ‘standard production’, we’ve turned it into the foundations of production. But, having said that: sure we need to be agile and look what the user (who might not always be a customer) needs or wants.
  
MUHAMMAD DENNES says:

September 20, 2019 at 17:21

what effects are directly experienced by product owners and users?

1. gctwnl says:
  
  September 22, 2019 at 10:00
  
  I don’t exactly understand this question. Can you be more precise?
  