(This is part two of a series exploring Complexity Science and the concept of complex adaptive systems in the context of ITSM. The first article, a primer for ITSM on these concepts, is here)
In my previous article, I explored the core characteristics of complex adaptive systems. Now, I am going to focus on one particular trait of such systems, emergence, particularly in the context of increasing our understanding of its impact on IT Service Management.
The new ITIL 4 material, notably the ITIL 4 book High Velocity IT (of which I was a co-author), introduces complex adaptive systems to the ITIL narrative as part of its broader exploration of DevOps, Agile, and other transformative ways of thinking and working. …
The UK’s Times Newspaper recently broke a story (paywall) alleging personal data breaches by members of the country’s new and hurriedly-assembled COVID-19 contact tracing workforce. This is obviously a significant story, but one particular line caught my eye:
“Thousands of (contact tracing) workers have joined social media groups that act as informal support networks attracting dozens of posts each day requesting help with IT problems and advice on how to handle cases”
It is entirely understandable that the focus of the article is on privacy, but this sentence reveals another interesting angle. …
As an airline customer in the modern era, it is possible to open a mobile app, and swap to a different flight with a few simple clicks. For the end-user, this operation has never been easier than now. However, the reality of this apparently simple action is that a huge number of events are triggered in a complex web of interrelated digital events. The events occur immediately, and each has an intricate impact on many other components of the system.
It’s interesting to explore this complexity. It needs to be established, for instance, whether I can make this change free of charge, or whether it is billable. If the latter is the case, a payment needs to be handled, which might involve interactions between a payment service run by the OS provider of my phone, a central credit card routing service, and my bank. These entities interact to perform a validation which establishes that I am who I say I am, and whether I have sufficient funds to cover the cost of the purchase. Of course, I might not choose to pay with currency: I may alternatively opt to use loyalty points which have previously been accrued and associated with my customer profile. …
It was interesting to read recently that TSB, the British bank which made headlines for the wrong reasons with a cataclysmic IT migration failure in 2018, has now effectively thrown in the towel and outsourced its entire IT banking systems operation to IBM.
In this article, I’m going to walk through some of the key points in the executive summary of the independent report, carried out by Slaughter and May, into the TSB migration failure which is likely to have prompted this outsourcing (IBM, incidentally, were parachuted into the bank after it struggled to recover).
The report, commissioned and published by TSB themselves, revealed a near-perfect storm of issues which led to the company going live with a new banking system long before it was ready for production, if indeed it was ever going to be. There are many lessons to be learned here, particularly about the fundamental difficulties of executing such a dramatic single change using linear, waterfall project methods. …
By the time Axelos released the foundational level of ITIL 4, in February 2019, a major change to the framework was long overdue, given that the third version dated back to 2007. Twelve years is a long time in a relatively young industry, but these twelve years had been hugely significant, not least because DevOps did not even exist as a phrase in 2007. It was another year before the publication of key pieces of work such as Agile Infrastructure, by Patrick Debois.
I genuinely feel that ITIL 4 Foundation is a really good piece of work. The ITSM community struggled for years to understand that DevOps wasn’t just new, but had actively set out to deconstruct established operational norms… including much of what had been built around ITIL itself. …
I presented yesterday (June 27th 2019) at the excellent DevOps Enterprise Summit, in London. The presentation is here and I’ll link the video when it is published.
One thing that was particularly interesting to me was the view from the venue, over the Thames, to the Isle of Dogs.
This is London’s old Docklands. It’s where my father went, at the age of 17, to join the British India Steam Navigation Company. In those days, Canary Wharf, site of the tallest buildings on the right, looked a little bit different to today.
The docks are gone now. But they’re not really gone. They moved up river to places like Tilbury, Harwich and Felixstowe, with still more cargo coming via roll-on, roll-off shipping through ports such as Dover, after landing and dispersal at the gigantic, massively automated port of Rotterdam. My father talks about spending weeks overseeing the loading and unloading of ships like the MS Kampala, pondering a move to the new container ships, which have now evolved into beasts of up to 400 metres in length, like the Maersk Triple E class. …
Regular readers of my articles will likely have read a number of pieces discussing the practice of Swarming — a term used to describe the partial or complete replacement of a traditional “tiered” support structure with a less rigid, more dynamic collaborative approach.
It will also be obvious to those regular readers that I am a big advocate of Swarming. You can read a detailed introduction to the concept (in the context of aligning DevOps to enterprise support channels) here. I’ve discussed examples of real-world Swarming practice in articles such as this one, and have explored Swarming as a means to harness the principles of the Cynefin framework here. …
“Distributed systems have an infinite list of almost impossible failure scenarios”
So said Charity Majors at the 2019 Configuration Management Camp conference in Belgium. You can watch Charity’s amazing presentation in full, here:
The talk highlighted the difficulty of performing traditional testing, deployment and monitoring, when building and deploying software into modern distributed systems. Charity gave some great examples of some complex issues she had encountered, all of which would have been impossible to discern via conventional monitoring approaches. Here are three of them:
“All twenty app services have 10% of nodes enter a simultaneous crash loop cycle, about five times a day, at unpredictable intervals. It …
I like to joke that I went to my first DevOps conference by accident. It’s actually pretty much true. While browsing for interesting reading back in late 2014, I came upon the website of Configuration Management Camp, aka #cfgmgtcamp, in Ghent.
For those of us whose careers have mainly revolved around the IT Service Management world, the phrase “Configuration Management” tends to evoke the ITSM process, as set down in the ITIL framework, of tracking and recording all of the “configuration items” which underpin the services delivered by the IT function of an organisation. My past life as a professional services consultant often led me into CMDB projects, in a range of complex scenarios such as national emergency service radio networks, the NHS, and large global payment networks. …
Over the last year, I’ve seen some interesting commentary, particularly in the DevOps community, around the concept of workflow tickets.
I was reminded of this yesterday by Rob England (aka @theitskeptic), on reading his recent Kill the Ticket article. This article, in turn, links to a passionately argued Damon Edwards blog, Tickets Make Life Unnecessarily Miserable. Perhaps, argues Edwards…
“…ticket queues (are) a significant source of operational strife hiding in plain sight?”
I don’t disagree with this suggestion, but I don’t see tickets themselves as the fundamental cause of the problems described in Edwards’s article. My own view, reinforced by my work with really great UX people over the last few years, is that those problems are a result of two significant failings: the way system interfaces are built around those tickets, and the way work is organised around them. …