Modeling Spreadsheet Use

There sits your user, happily typing away in his spreadsheet. A rather large spreadsheet it is, full of complicated macros and code. Our user is a spreadsheet wizard, using a plugin from a remote data service and a lot of code.

Looking at it from the ArchiMate point of view: how can this be modeled? 

Note: the content of this post has been superseded (amended and extended) by the content on this issue in my book Mastering ArchiMate. Follow this link to the book page (with downloadable PDF)  for more information.

There must be at least two ways to position software like Excel, because in ArchiMate there are two levels where you find software:

  • At the application level there is Application Component and its functionality
  • At the infrastructure level, there is System Software

(That’s the nice thing about a language, you can say things in multiple correct ways). Using System Software seems not to be the obvious choice in this case, because we must somehow have software at the Application level to be able to support the business process. Only at the application level can we realize an Application Service that is used in a Business Process.

So, let’s start with the most obvious choice: Excel is an Application Component realized by an Artifact (i.e. excel.exe and its support files). The spreadsheet (.xls) file Artifact itself realizes a Data Object for Excel’s Application Function to read/write. I’m presenting an example here, using original ArchiMate colours. To make matters a bit more interesting, our wizard’s spreadsheet uses a plugin from a data provider. Secondly, our example contains a second spreadsheet that does not use that data provider plugin at all:

So, the Excel application is an Application Component (realized by the excel.exe and all its support files at the infrastructure level) and the plugin is another Application Component (realized by another set of .dlls at the infrastructure level). Excel’s functionality loads the plugin’s functionality to become the aggregated “Generic API & Plugins” Application Function seen in the view above. (There are more ways to model this, e.g. I could have modeled an  application interaction between both application functions, more on this later).

The “Generic API & Plugins” Application Function reads a “Spreadsheet A” or “Spreadsheet B” Data Object which is realized by the “spreadsheeta.xls” or “spreadsheetb.xls” Artifact. After reading this data object it realizes an Application Service for the user’s Business Process, one for each spreadsheet it has read. One of these (A) uses functionality from the plugin.

It seems wrong at first sight that the Application Service realized on the basis of Spreadsheet B now depends on the plugin via the Aggregation relation in the model. But if you think about it, it is architecturally correct: suppose the plugin is broken and interferes with normal Excel functionality: spreadsheet B (the one that does not use the plugin at all) breaks as well, so spreadsheet B depends on the plugin in an architectural way, even if it does not use it. No problem here.

But there is an important disadvantage when you model it like this. All these separate spreadsheets are used by one single Application Function to realize all the different Application Services. The one-on-one relation between spreadsheet A and the Application Service based on it gets lost in that one Application Function and that would make analyses problematic.

Now, to reestablish the one-on-one relation between Spreadsheet A and the Application Service configured from it, we could draw an Access relation between that Application Service and the Spreadsheet A Data Object. In reality, you still need the Access relation between the Excel Application Function and Spreadsheet A, because the Application Service based on Spreadsheet A cannot exist without it. And besides, an Access relation between an Application Service and a Data Object is not meant to model something related to the creation of that service, it is meant to model Access by the service behaviour to data. So, except adding complexity, using an extra Access relation could be considered cheating.

Another option is to use an Association relation between Spreadsheet A and the Application Service that is realized based on its content. Apart from using Association in the first place (which most of the time I consider inappropriate because it is used instead of a proper model with more meaningful relations) it also complicates the model.

There is, I think, a better way to model this and it is also closer to the real role that those pesky spreadsheets often fulfill in your business: the spreadsheets themselves are the applications. Instead of thinking of Excel as the application (and thus creating a link between all spreadsheets via the Application Function) we could think of Excel as a platform, a bit like a Java virtual machine that sits on your computer. A user does not use “Java” (realized by java.exe), he uses a specific java program (realized by a .jar file for instance), only using the Java engine as the platform on which to run it.

If we move Excel from the application layer to the infrastructure layer, the spreadsheet itself becomes the application, one that needs the Excel platform to run:

Modeled that way, each .xls file becomes an Artifact that not only realizes a Data Object, it also realizes an Application Component. That makes a lot of sense, actually, because that is in fact what happens with a spreadsheet: it is not just data, it is a mix of active (macros and such) and passive (data) elements. It has behaviour as well as data. The Artifact realizes two objects simultaneously: a Data Object and an Application Component.

Modeled like this, we do not have one Application Function (performed by Excel) that realizes many Application Services (by reading many Data Objects), but we have many Application Components (one for each spreadsheet) that use one infrastructure service (Excel and its plugins on the desktop).

Now, as I wrote earlier, we know that spreadsheet B depends on the plugin even if it does not use that plugin. In this mode that is modeled by creating an Aggregate relation between the Excel Infrastructure Service and the plugin Infrastructure Service.

The big advantage of this mode is that it shows you the spreadsheet as it actually is used in your organization: a combination of program (behaviour) and data. A user may be interacting with a spreadsheet and not even knowing that he is interacting with a spreadsheet as you can normally program nice user interfaces in one. I have seen Excel spreadsheets that are indistinguishable from a program written with other means. It also makes very clear that the spreadsheet wizardry that your power users are so proud of is actually programming (“Hey power user guy! You know what? You’re a programmer!”) and if something goes wrong with the spreadsheet (e.g. dismal performance because the programming was far from perfect) he should look first at what he programmed himself.

Returning to the “Excel as application” model and my remark on modeling that differently: I could indeed have modeled that differently, e.g. an Application Interaction between the Application Functions of Excel and the data vendor’s plugin. I don’t really like the Application Interaction from ArchiMate. In the real IT world there hardly is any: in reality it is almost always “X loads Y” or “X uses Y (via an Application Service realized by Y)”. There is almost always a master and a slave, an application and a plug-in. The interaction suggests an equality that is almost never there in reality (independent software agents could qualify, but how many of those do you know in your organization?).

Many will wonder if modeling in this level of detail makes any sense. How useful is this level of detail? I believe in a correct and detailed model of your As-Is as the basis of analyses at the start of change initiatives (e.g. projects,releases). In a model as detailed as this, I can tell you which business processes are to be checked if we move from plugin version 1 to plugin version 2, for instance. You don’t model all spreadsheets like this of course. Just the ones that are just a mission critical application that happens to be written in Excel. With around 6 objects in your model for such an application, even with 100 essential spreadsheets in your environment, you have to model 600 objects, well within the reach of a decent architect. I find that architects often do not bother to model this (they just add an Application Component labeled “Excel” in their architecture) because they are wary of the effort it takes to find all those “hidden applications” in the organization. Still, not knowing is like keeping your hands in front of your eyes and thinking others can’t see you: a bit immature. Which is why I seriously dislike those views with the “Excel” box in them: they often are a sign of sloppy work.

9 comments

  1. Great work. Very precise thinking involved in this analysis. In past, when creating Enterprise / Segment / Capability Baseline Application Architecture views, I have modelled physical instances of MS Excel, MS Access. If the business relies on these apps, and they contain the only copy of business data, or if they have become part of the data export / import “plumbing” interface architecture between core business applications — they need to be discoverable in the repository. We capture a range of knowledge about these apps, using entity meta data. We also capture limited amounts of unstructured text where this represents useful knowledge.

    At the time, I used UML Component diagrams. I could have used TOGAF’s Physical Application Component and Application Interface entities.

    However:

    I keep thinking about the original intended purpose of ArchiMate, which was a language intended primarily to be used for Enterprise Architecture, as opposed to Solution Architecture.

    What I often see these days is uses of ArchiMate as the new UML Deployment Diagram, focused on modelling physical solution architecture.

    This wouldn’t concern me — if I were also seeing many instances of ArchiMate being used for Enterprise Architecture.

    Probably nothing to worry about.

    Like

  2. As a young discipline, we don’t have very well established definitions of many of the terms we use and multiple, sometimes contradictory, definitions are in use. And the word enterprise itself has several meanings, which may include project.

    I personally see “Enterprise Architecture” as something that does not have to do with doing something at the company/organization level but as the term for the combination of Business/Information Architecture, Application/Data Architecture and Technology/Infrastructure Architecture. In other words, I follow the structure of ArchiMate, calling all layers together “Enterprise Architecture”. Enterprise Architecture for me is about alignment between all the levels (that is what EA was invented for anyway).

    I think this is going to be subject of another post.

    Like

  3. I prefer the second way you model spreadsheets – much better.
    It reminded me that I have come across many organisations where the so called End User Computing applications, i.e. those based on Access, Excel, Sharepoint, VBA, Lotus Notes etc. are deliberately excluded from the target Enterprise Architecture, despite realising core business functionality. I have also discovered many failures in such EUC applications…

    Like

  4. This is a good post and I also prefer the second way you have modelled the spreadsheet. It reminds me of the GRAAL chapter in “Enterprise Archiecture at Work”: Proposition 5 – Business systems tend to gravitate to infrastructure.

    Like

  5. Definitely prefer the 2nd approach. And agree that the importance is in identifying the information that is being managed using a desktop application (spreadsheet) in support of a business process. So often overlooked and one that will highlight serious gaps if you were to do a quality assessment.

    Like

  6. And next to EUC (End User Computing) there is also IMC (IT Management Computing): all those scripts, scheduler flows, batch jobs, etc. Often seen as ‘managing production’, they become a whole world of ‘applications’ too.

    Like

Leave a comment