9. Resuscitation


Whether you’re a brother or whether you’re a mother
You’re stayin’ alive, stayin’ alive
Feel the city breakin’ and everybody shakin’
And we’re stayin’ alive, stayin’ alive

Life goin’ nowhere, somebody help me, yeah
I’m stayin’ alive

Bee Gees (Stayin’ Alive, 1977)

“RPA is dead (or dying)” was the call made by several technology analysts and research firms for a period of time.

But it was too late. Customer organisations had changed: new teams and departments had been set up. That cringeworthy term “Robotics” appeared on many name cards and LinkedIn profiles. More importantly, major US-based VCs had already put big money into these startups.

Something had to be done to address the two main issues:

  1. document data entry
  2. discovering processes to automate

Already, the use of OCR in front of RPA had become common. Soon the pure OCR players, who were initially reluctant to partner with us, came out and joined the RPA game. There were many use cases where RPA tools interfaced with the expensive OCRs, obtained the digitised data from the OCR, and propagated it into downstream applications. Computer Vision (CV) was used for pre-processing and pattern recognition, before OCRs were applied to specific regions. This was important because there were many input page formats and it was getting difficult to locate the position of field values. For example, the invoice number on an invoice could be located in various places depending on the page format of the vendor’s invoice. CV could be used to sort incoming images based on known page formats and then the appropriate OCR could be applied.
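The sorting step can be sketched roughly as follows. This is a minimal illustration and not any vendor's implementation: the coarse "ink density" grid stands in for whatever signature a real CV step would compute, and the names `PageFormat`, `classify` and `region_for`, the vendor formats and the pixel boxes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A page is reduced to a coarse grid of ink-density values (0.0 to 1.0).
# In a real pipeline a computer-vision step would compute this signature;
# here the grids are hand-written, hypothetical stand-ins.
Grid = Tuple[Tuple[float, ...], ...]
Box = Tuple[int, int, int, int]  # left, top, right, bottom in pixels


@dataclass
class PageFormat:
    name: str
    signature: Grid
    fields: Dict[str, Box]  # where each field sits on this page format


def distance(a: Grid, b: Grid) -> float:
    """Sum of absolute differences between two grids of the same shape."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def classify(page: Grid, known: List[PageFormat], max_distance: float = 1.0) -> Optional[PageFormat]:
    """Pick the closest known page format, or None if nothing is close enough."""
    best = min(known, key=lambda fmt: distance(page, fmt.signature))
    return best if distance(page, best.signature) <= max_distance else None


def region_for(page: Grid, field_name: str, known: List[PageFormat]) -> Optional[Box]:
    """Return the box that the OCR should read for this field, if the format is known."""
    fmt = classify(page, known)
    return None if fmt is None else fmt.fields.get(field_name)


# Two hypothetical vendor formats that place the invoice number differently.
KNOWN_FORMATS = [
    PageFormat("vendor_a", ((0.9, 0.1), (0.2, 0.2)), {"invoice_number": (400, 40, 580, 70)}),
    PageFormat("vendor_b", ((0.1, 0.9), (0.2, 0.2)), {"invoice_number": (40, 300, 220, 330)}),
]

incoming_page = ((0.8, 0.2), (0.2, 0.3))
print(region_for(incoming_page, "invoice_number", KNOWN_FORMATS))
```

A page that matches no known format yields `None`, which is the point at which a human (or a new template) has to step in.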

Since 2011, artificial intelligence (AI) as a computing discipline has witnessed yet another revival/thawing from a winter cycle. Since its origins in the 1950s, when John McCarthy coined the term, AI has suffered several “winters” in which little progress occurred in the field. From around the mid-1990s to 2011, one can say that AI was in a long “semi-winter” cycle. Many achievements did take place during this period, but only a few made it to the headlines, such as Deep Blue beating Kasparov. But a few good stars were lining up for AI.

  1. Processing power of portable computing devices crossed a threshold that allowed AI-based computational models to be executed on small devices instead of supercomputers.
  2. The rise of “big data” enabled the birth of new companies that dealt with extraction and processing of the huge data sets that were being produced in every industry. These data sets were then used as training data for AI software.
  3. Deep learning techniques applied to neural networks became more feasible due to advances in software, hardware, and data availability. This resulted in many successful applications of AI.

Against the backdrop of a rising AI, and with many nice and nifty AI frameworks in Python and JavaScript being made available, it became easy to create very impressive quick-win solutions where some AI widget is used for text extraction (from images, emails, PDFs, Microsoft Excel and Word files, voice messages, etc.), with the extracted text then passed to an RPA-based workflow.

This was the snake oil (sorry, game-changer) that many were looking for. Many RPA vendors added document extraction plugins into their products and thus subsumed that capability into the core offering. At the same time, some products offered “extensibility” by allowing third-party OCR and AI tools to be plugged into their RPA runtime.

The other issue, that of process discovery and configuration, was causing much concern because projects were getting stretched. Standard operating procedures (SOPs) for business activities were out of date or non-existent. Process steps existed only in people’s heads and not in any formal document. In such situations it became very difficult and time-consuming to complete automation projects. Exception scenarios were getting missed in implementations, and that caused anxiety to the operations teams in client organisations.

Since artificial intelligence and machine learning were becoming popular, a potential application of AI/ML for process discovery was found. The RPA toolset was already capable of recording keyboard and mouse actions of human users. Some companies were using this data to do “record and playback” type macros. Keyboard-mouse replay can be used to fool people into believing that automations can be configured very quickly. This was a particularly useful subterfuge that one of our competitors used to get ahead of Inventys Fusion in a face-off situation at a bank.

Keyboard-mouse replay is a suitable and satisfactory mechanism for user interface testing scenarios. A user interface testing tool is similar to an RPA tool in that they both take control of the UI screen and perform actions on it as if a human were controlling the keyboard and mouse. The difference is that the purpose of a UI testing tool is only to verify that the UI works as specified, whereas for an RPA tool the actions performed are part of a process flow. Due to this similarity, a few UI testing companies pivoted into RPA upon seeing the value of the RPA market.

However, blindly replaying previously recorded keyboard and mouse actions is, in my opinion, borderline fraudulent, even if data values are parameterised and controlled through the workflow. The reason is that in order to populate a field, the tool takes a cue from the recorded mouse movement data and replays it to reach the same x-y screen coordinates. The tool then replays the mouse click action, bringing the field into keyboard focus. Subsequently, replaying the recorded keystrokes, or playing the keystrokes of a parameterised value, causes that field to be filled with the contents of the key sequence.

If for some reason the screen’s resolution changes, or if business logic causes the field’s position to shift to a new location in the UI window, then the recorded x-y coordinate will no longer be valid; in fact, that x-y coordinate could be over a very different field. If a click is made on this wrong location and the replay tool then blindly plays the intended keystrokes, either the wrong field receives the text value or the typed characters are wasted in a non-input area of the UI. Either way, this leads to a very brittle automation configuration that hangs on exact screen resolutions and locations of fields.
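To make the brittleness concrete, here is a toy model of the failure. It is only a sketch under simple assumptions: the `Screen` and `Field` classes, the field names and the coordinates are all hypothetical, and no real UI toolkit or RPA product is involved.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Field:
    field_id: str
    x: int
    y: int
    width: int = 200
    height: int = 30
    value: str = ""

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height


class Screen:
    """A toy screen: a set of input fields laid out at pixel positions."""

    def __init__(self, fields: List[Field]) -> None:
        self.fields: Dict[str, Field] = {f.field_id: f for f in fields}

    def field_at(self, px: int, py: int) -> Optional[Field]:
        for f in self.fields.values():
            if f.contains(px, py):
                return f
        return None


def replay_by_coordinates(screen: Screen, px: int, py: int, text: str) -> Optional[str]:
    """Click at the recorded x-y position and type blindly.

    Returns the id of the field that received the text, or None if the
    keystrokes landed on a non-input area and were wasted.
    """
    target = screen.field_at(px, py)
    if target is None:
        return None
    target.value = text
    return target.field_id


def fill_by_identifier(screen: Screen, field_id: str, text: str) -> str:
    """Address the field by its identifier, wherever it currently sits."""
    screen.fields[field_id].value = text
    return field_id


# Layout at recording time: the click on the invoice number field was at (110, 110).
recorded = Screen([Field("invoice_no", 100, 100), Field("amount", 100, 140)])
print(replay_by_coordinates(recorded, 110, 110, "INV-001"))

# Later, a new field is inserted above and everything else shifts down.
shifted = Screen([Field("po_number", 100, 100), Field("invoice_no", 100, 140), Field("amount", 100, 180)])
print(replay_by_coordinates(shifted, 110, 110, "INV-001"))  # lands on the wrong field
print(fill_by_identifier(shifted, "invoice_no", "INV-001"))
```

The coordinate replay neither fails nor warns after the layout change; it simply writes the invoice number into whichever field now occupies that spot.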

The Origin and Evolution articles explain how UI Automation was an evolutionary maturation of primitive screen macro techniques and, as such, had moved beyond brittle and error-prone techniques such as keyboard-mouse record and replay.

In addition to keyboard-mouse events, the recorder can also record various other properties of the UI controls that can be useful for creating deep configurations. These properties include window handles, window messages and other identifiers that denote the user interface elements that were touched and acted upon while the recorder was running. Attempts have been made by vendors to beautify and simplify the automation configuration process. Fancy names have been given to these techniques that seem to “automagically” create automation configurations simply by recording processes as performed by humans and then analysing the recorded data to derive a first-cut RPA automation.

In my opinion, these are toy features with anecdotal usefulness. Literally dozens of things can go wrong in real-life attempts to use these types of gimmicks. The original creators of the enterprise applications would have programmed various features and behaviours for the user interface controls. There could be field-level triggers that are fired when certain actions take place. It is often difficult to determine which type of user interface event would trigger those actions. When the screen is handled by a physical mouse and keyboard, nearly half a dozen screen events can be triggered by a simple physical mouse click. Unless the tool knows which among these events would trigger the intended action, the automation will not be able to cause the desired follow-on effect to occur. For example, in a form that accepts addresses, there would be a dropdown for selecting the country and a dropdown for selecting the city within the country. Usually, when the user selects a value in the country dropdown, a trigger is fired that reads the selected country and then loads the list of cities in that country; this list drives the city dropdown. Depending on how the original application was written, the trigger to load the city options may occur after the user clicks out of the country field, or it may occur as soon as a value for the country field is detected in the model of the form, or it could be when the city field is brought into focus. In an “automagic” configuration, if the tool omits firing the appropriate event, the city dropdown will never get populated with the list associated with the selected country, thus causing the automation to break down.
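The country and city example can be modelled in a few lines. This is an illustrative sketch only: the `AddressForm` class, the event names `change`, `blur` and `city_focus`, and the city data are hypothetical labels for the three trigger points described above, not the API of any real UI framework.

```python
from typing import Dict, List

# Hypothetical reference data that drives the city dropdown.
CITIES: Dict[str, List[str]] = {
    "India": ["Mumbai", "Pune"],
    "Singapore": ["Singapore"],
}


class AddressForm:
    """Toy form whose city list loads only when one specific UI event fires."""

    def __init__(self, load_cities_on: str) -> None:
        # The event that the (hypothetical) application author chose as the
        # trigger: "change", "blur" (click-out) or "city_focus".
        self.load_cities_on = load_cities_on
        self.country = ""
        self.city_options: List[str] = []

    def set_country_value(self, country: str) -> None:
        """What a naive automation does: write the value and fire nothing."""
        self.country = country

    def fire(self, event: str) -> None:
        """Deliver a UI event; only the configured one loads the city list."""
        if event == self.load_cities_on:
            self.city_options = list(CITIES.get(self.country, []))

    def select_city(self, city: str) -> str:
        if city not in self.city_options:
            raise LookupError(f"{city!r} is not in the city dropdown")
        return city


def human_selects_country(form: AddressForm, country: str) -> None:
    """A person using mouse and keyboard causes all of these events in passing."""
    form.set_country_value(country)
    for event in ("change", "blur", "city_focus"):
        form.fire(event)


form = AddressForm(load_cities_on="blur")
form.set_country_value("India")      # the automation sets the value only
print(form.city_options)             # still empty, so selecting a city would fail
form.fire("blur")                    # firing the right event fixes it
print(form.select_city("Pune"))
```

The human path works regardless of which trigger the application author picked, because a person sets off every event in passing. The automation works only if it happens to fire the one event that matters.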

Another use of UI action recordings is to collect many hours or weeks of recording data from multiple users involved in the same business process, and then pass this to an AI-based program that would extract the common patterns relating to the workflows that the users are performing. It is believed that this AI-based pattern extraction technique would be able to uncover processes that are not properly documented.
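As a rough intuition for what such pattern extraction does, the sketch below counts action sequences that recur across users. It is a deliberately simple frequency count standing in for the AI-based analysis, not a description of how any actual process discovery product works; the function name, the action labels and the recordings are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def common_sequences(
    recordings: Dict[str, List[str]], length: int, min_users: int
) -> Dict[Tuple[str, ...], int]:
    """Find runs of `length` consecutive actions shared by at least `min_users` users.

    Each sequence is counted at most once per user, so the result maps a
    sequence to the number of distinct users who performed it.
    """
    support: Counter = Counter()
    for actions in recordings.values():
        seen = {tuple(actions[i:i + length]) for i in range(len(actions) - length + 1)}
        support.update(seen)
    return {seq: users for seq, users in support.items() if users >= min_users}


# Hypothetical recordings of three users doing roughly the same job.
recordings = {
    "user_a": ["open_invoice", "copy_invoice_no", "open_erp", "paste_invoice_no", "save"],
    "user_b": ["check_email", "open_invoice", "copy_invoice_no", "open_erp", "paste_invoice_no", "save"],
    "user_c": ["open_invoice", "copy_invoice_no", "open_erp", "paste_invoice_no", "print"],
}

for sequence, users in common_sequences(recordings, length=4, min_users=3).items():
    print(users, "users:", " -> ".join(sequence))
```

Even in this toy form, the output is only a candidate flow: it says what people did, not why, and not which events the underlying application needs in order to behave correctly.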

While this sounds very exciting, there are a few issues:

  1. The initial or first-cut flow definition that the analysis tool generates needs to be significantly modified in order to make it a reliable and robust UI Automation. As mentioned earlier, there are various nuances about the native applications’ UI behaviour that need to be taken into consideration in order to define each automation action. Once these modifications are performed, it becomes a nightmare to re-sync the original recording data with the final ‘robust’ version. There may be additions, modifications and deletions of actions. The original recording is virtually thrown away. The only reason to keep it would be that if, in future, there are specific sub-scenarios that did not get elicited during the initial recording, then the original recording can be used to automatically determine the branch points from where the new sub-scenarios would be joined. Even in such cases, maintaining and synchronising these changes would cause much confusion. Some tools may simply not provide this scenario-joining as a feature. Instead they will rely on the user to define join points in the original flows, treat the new sub-scenarios as new flows, and simply join the two in order to include the sub-scenario in the main flow. All these activities place new demands on the people who are responsible for maintaining the automations.
  2. The recording will only reveal the “as-is” scenario. Taking this as-is flow definition and converting it into an automation, without putting it through sufficient scrutiny and possible optimisation, will only result in the automation of flaws. This can magnify the errors, since they will be repeated at a much faster rate through automation than with manual actions.

In 2017, after formally concluding my association with the acquirers of Inventys, my colleagues and I created Telligro. We honourably discharged our non-compete and non-solicitation obligations and quietly worked on our vision for integration and automation of processes. During that time, we worked with established AI/ML researchers and engineers to develop a process discovery tool that would help speed up the configuration of UI Automation workflows. To their credit, the team did produce the expected tools, but by then I had started leaning towards a different perspective.

I took a step back to review what was happening in this entire post-RPA automation scene. To me it felt like an epiphanic realisation of the utter stupidity of how things had turned out with RPA, and the lame attempts to maintain the hype. As mentioned in a previous post, my journey down Mt Stupid had started in 2013. Having learnt my lessons, I embarked on rebuilding everything from scratch, based on my renewed vision of straight-through processing (STP) of business transactions.

We had created Inventys Fusion in order to fill a gap in conventional application integration. Our aim was to achieve STP for all business processes. We had got entangled in back-office processes because that was the only available route to get enterprises to sit up and listen to us. I have detailed that journey in a previous article.

However, outside our control, the RPA movement and the spin generated by the other players who had joined the race had taken the market to new heights of stupidity. The pioneers, Openspan and Inventys, had been acquired by non-automation companies. Inventys Fusion eventually became defunct after we (the original Inventys team) parted ways with the acquirer. Openspan had to play second, third, or fourth fiddle to Pega’s mainstream offerings.

The vendors that remained in the scene were all copycats, and their field of interest (the market for their product) was automation of manual business processes.

This new group of vendors was driven by technology analysts’ and consultants’ views about the domain of “process automation”, which was all about replacing human interventions with software interventions. Artificial intelligence (AI) provided a gateway to technologies that made software perform tasks in ways that mimicked humans. Thus document OCR and CV, along with process discovery by analysis of previously recorded actions, became the silver bullets that would keep organisations on the “robotic automation” journey.

This led to a flurry of terms to define the offspring of the union between AI and RPA. New phrases were churned out every month. Intelligent Process Automation (IPA), Cognitive Process Automation (CPA) and Digital Process Automation (DPA) were some of the terms that floated to the top. After a while they lost the “P” and became IA (intelligent automation) and CA (cognitive automation).

Vendors and analysts flooded the scene with glossy infographics to “define” these terms and to declare their prophecies about the future of process automation.

Then, after many years of waiting and dillydallying, the big gorilla of technology analysts (the G company) finally entered the ring. They decreed that there would be one automation, the mother of all automations: hyperautomation. There can be no automation after hyperautomation!

Take a look at Gartner’s definition of this term:

Hyperautomation is a loose term used to describe the set of products that perform RPA, document digitisation using OCR (including recognition of documents), and process discovery or process mining. In addition, it may also include an analytics and reporting module that will tell everyone what it was up to with RPA and document processing.

In my opinion, Gartner’s cobbled up definition of hyperautomation resembles this:

Source: https://www.youtube.com/watch?v=Fws1XkcTHys

They were very late in joining the game. I remember several meetings and conference calls with their “analysts” in the early days. They were not convinced that UI Automation was a suitably “hardcore” technology, worthy of their attention. I wasted several calls with their people in Europe and the United States, explaining the integration aspects of UI Automation technology. Those analysts were mostly dealing with traditional enterprise application integration (EAI) and service-oriented architectures (SOA), and they simply could not find the courage to acknowledge that a client-side screen-integration technology could perform application integration. Perhaps it would have upset their existing EAI and SOA advisory business. So, for some time they “parked” this under their BPO and shared services analyst.

Meanwhile, the Pied Piper who had first struck the notes of “RPA”, along with a few of Gartner’s other competitors, was riding a wave of success with RPA advisory work. Gartner had clearly lost the RPA round.

However, when pure RPA started showing signs of disillusionment, and when patched-up solutions (sorry: “augmented solutions”) like OCR/CV + RPA came to the fore, I guess they found an opening to get into the game and thus created this masterpiece definition called hyperautomation.

At this point, I will add that these are my (perhaps cynical) opinions based on my experience in dealing with these analysts and analyst firms. Nevertheless, I still believe that the industry needs analyst firms. They serve an essential need to increase awareness about various products and solutions among potential consumers. As a vendor, I would definitely hire analyst firms to help my company to propagate its message to the wide audience that analyst firms can reach.

Before moving on to more serious aspects, let me elaborate on this parody by explaining the various components of Gartner’s definition of hyperautomation:

The concept of “citizen developer” is another new item in the circus, although it has origins outside of automation. “Low code / no code” (LCNC) is the latest incarnation of techniques to offer development and runtime platforms that purportedly require very minimal programming in order to produce some business software functionality. The platform contains various pre-created and reusable components that can be assembled via a visual “composer” tool. The promise is that people can assemble these mini-apps very quickly and cheaply. For the past forty years, there have been numerous attempts to create such tools. LCNC is the latest in the line and certainly the most successful so far. This is due to advances in componentisation of functionalities, cloud deployments, microservices, and public APIs.

While this is all fine from a functionality composition perspective (being able to compose complex functions from simpler ones), the idea of getting business users to assume the role of “citizen developer” and use LCNC tools to assemble and create arbitrary applications for the organisation is something that does not resonate with me. Making software that performs a business’s operations requires certain kinds of specialised skills. Metadata, normalisation, parameterisation (knowing how to make something generic so that different parameters can drive the functionality), and algorithms are, among other things, important facets that the so-called citizen developers should be proficient in before they can produce organisation-wide software components. By definition, “citizen developers” are business users, and the term essentially aims to accentuate the fact that non-technical, non-programmer people (mere citizens) can use the tool to develop software applications.

It is possible to have sophisticated super-applications with extraordinary configuration capabilities that allow these “citizens” to drag and drop widgets of pre-existing business features using a visual tool and quickly build a small application that may fit some purpose envisaged by that “citizen developer”. Such tools may be able to provide customisable visual summaries and dashboards that may improve the productivity of business users. But I do not believe that it would be a good idea to enable writing and modification of centrally used business data by citizen developers.

In the context of RPA and hyperautomation, the workflow composition tool has been made progressively easier to use, and it has reached such levels of ease that it could be classified as LCNC. Thus it can now be claimed that process automations can be specified by citizen developers. As with the general case of citizen-developed software, I have certain reservations about the use of such citizen-developed process automations because, for example, various sub-scenarios and boundary value cases have to be formally specified and tested before the automation can be released for organisation-wide consumption.

On the positive side, LCNC tools and environments may greatly help conventional developers to speed up their time to complete automation configurations. From that perspective these tools are very valuable indeed; but it is certainly not appropriate to think that a new class of citizen developers and their software applications will improve the efficiency of business processes.

What has been the outcome of all these add-on modules — the pillion riders in the clown-act called hyperautomation?

  1. There has been some improvement in the number of processes that are getting automated, thanks in some part to better OCR, and in some part to process discovery.
  2. The creation of “bot-farms” has led to fairly decent management of these software components. Very fine controls for deployment of automations in data centers and clouds are now possible. Systems can be set up to scale the number of “bots” based on incoming workload. The so-called “Robotics” team leaders and managers can now get their own daily/weekly/monthly/quarterly/yearly reports and dashboards about the good work that the bots (aka the “digital workforce”) have been performing.
  3. A nice new fiefdom has been carved out with the concepts of “automation governance” and “centers of excellence” (CoE) to help manage the rollout of hyperautomation in enterprises. To be fair, CoEs existed even before the term hyperautomation was formulated. But despite what is known about their long-term detrimental effects, CoEs have continued to exist and have now found a new home in the world of RPA/hyperautomation.
  4. Process discovery tools have had some limited success in helping to document the steps of simple to medium level processes. The so-called “citizen developers” are generally too busy to work on creating automation apps.

However, a majority of business processes that were being performed as software driven labour continue to remain manual in nature.

RPA, IPA, CPA, IA, CA, DPA, hyperautomation and every other attempt to remain on Mt Stupid cannot fundamentally eliminate software driven labour. The current trajectory and product roadmaps of vendors in this space are unfortunately taking everyone deeper into a rabbit-hole.

It is time to wake up.
