Automation is a promising technology, delivering significant efficiency gains at little resource cost. Unfortunately, it also brings a series of challenges that can make full automation the wrong choice in certain circumstances.

Below, we’re going to talk about a middle-of-the-road solution called Human-in-the-Loop workflow automation. Tina Huang from Transposit discussed this topic at the 2019 Platform Summit. Her presentation is the basis for this piece and is worth watching as a companion.

Why Automation vs. Manual?

Automation is first and foremost an efficiency-increasing solution. Manual tasks are often time-consuming on their own, creating much overhead. For every corporate manual workflow, a ticket must be created and tracked; a project manager must handle those tickets and their associated workflows; teams must manage completions and handoffs, etc. Manual processes incur costs that are not readily obvious but scale enormously. They also introduce room for human error.

Modern adoption paradigms drive a need for automation. SaaS, or Software as a Service, has quickly become a productivity juggernaut, allowing for greater and faster development and utilization. While SaaS is a net positive, it does introduce silos. Even though manual tasks had greater overhead previously, they could at least be done in one place. In a SaaS-driven environment, tasks can touch many different platforms and systems, forcing multiple junctures for even simple flows. There may still be a manual workflow at each of these junctures, prone to similar issues as above.

APIs supply the power to automate this workflow ecosystem, reducing both the number of manual workflows and the overhead they carry. Thus, automation is not merely a time saver: it is an error reduction system, a platform trust builder, and ultimately, a productivity booster.

The Argument for Human Involvement

However, automation does not fix every problem. In many cases, automation introduces its own set of issues. The majority of automated tooling focuses on easily-automated processes, typically based around a concrete, well-defined trigger and series of follow-up actions that can be completed without any human involvement.

For example, automatically filling in current date information is relatively easy to automate since there’s really only one correct answer. But what about less machine-apparent data types and workflows? The problem is that most tooling only considers easily automatable content and doesn’t provide a lot of space for human interaction.

But why do we even need human involvement? Full automation doesn’t support a wide range of edge cases, creating barriers to fully intelligent workflows. Let’s look at a few edge cases to see where an automated process can fail just as badly as a manual one.

An Example Case for HITL Workflows

Let’s imagine a system where a user can sign up for your service, and from this signup, a marketing entry is generated. This marketing info is precious, but it’s often incomplete — we may also want to know where they were referred from and the amount of money they spent in their session after signing up.

The core issue with automating in this type of edge case is that data exists in multiple sources, each with variable inputs and record types. Referrer info could come from a physical spreadsheet from teams on the street or an automated referral system. In such a case, your data is messy and would need to be sanitized before being useful. Even if it is sanitized, that data would still need to be looked at for sources of truth — for example, do we trust manual entries more than the automated system?

Full automation alone only causes further problems here, but there’s a middle ground that can deliver automated results with little manual input. In this example, why not let the machines gather and combine all the rough data, while giving the human in the loop a streamlined way to choose the correct entries over the incorrect ones?

This type of workflow is called Human-in-the-Loop, or HITL, and is a middle ground between full automation and full manual processing. With HITL, at each automation interval, a simple question is asked — “can a human add value to this?” If a human can add value, this becomes a junction where a human commits an action. This kind of action could be deciding which data source is true, or validating that an offer should be extended, or even more complex functions like ensuring two-person authorization for access to remote resources.
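As a rough illustration of the signup example, here is a minimal Python sketch. It assumes two hypothetical record sources (a manual spreadsheet entry and an automated referral system) represented as plain dictionaries. The names merge_sources and apply_human_choices are invented for this sketch and do not come from the presentation. The machine merges everything it can, and only conflicting fields become a junction where a human decides.

```python
from dataclasses import dataclass, field


@dataclass
class MergeResult:
    # Fields the machine could settle on its own.
    resolved: dict = field(default_factory=dict)
    # Fields where the sources disagree: key -> (manual value, automated value).
    needs_review: dict = field(default_factory=dict)


def merge_sources(manual: dict, automated: dict) -> MergeResult:
    """Combine two records, deferring only real conflicts to a human."""
    result = MergeResult()
    for key in sorted(manual.keys() | automated.keys()):
        m, a = manual.get(key), automated.get(key)
        if m is None or a is None or m == a:
            # Only one source has a value, or both agree: no human needed.
            result.resolved[key] = m if m is not None else a
        else:
            # "Can a human add value to this?" Yes: the sources conflict.
            result.needs_review[key] = (m, a)
    return result


def apply_human_choices(result: MergeResult, choices: dict) -> dict:
    """Apply the reviewer's picks ("manual" or "automated") per conflicting field."""
    final = dict(result.resolved)
    for key, (m, a) in result.needs_review.items():
        final[key] = m if choices[key] == "manual" else a
    return final
```

In a real system, the needs_review entries would feed whatever UI the reviewer uses; the point of the sketch is only that the human sees the conflicts, not the whole dataset.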

Concerns and Challenges

Of course, nothing is perfect, and HITL does introduce some additional challenges to the average workflow. When you implement HITL, you have to create a series of user interfaces. Building out a UI can be challenging and time-consuming. It could exacerbate the problem of having too many services and APIs by creating yet another service or API. For every human involved in the process, this demand scales accordingly, especially if the different junction points are discrete in function, form, and purpose.

Though we want to work within the systems we already have in place, it’s quite tempting to create a new solution. The problem is that not every system is extensible and scalable enough for us to plug these services into the workflow, so we would likely need to introduce yet another hook with yet another system to work within.

Additionally, there is a concern around identity. How do we manage identity when we’re working within a complex workflow? While some solutions like OAuth have made this simpler to solve, it is still an issue that must be resolved consistently across every system the workflow touches.

Arguably, these concerns are minor compared to the issues with full manual or full automatic workflows; as long as you keep them in mind and address them within the workflow, focusing on user experience, you can resolve them and increase the effectiveness of the HITL approach.

The Problem of API Composition

As Tina Huang describes it, a root cause of issues when introducing HITL is the problem of API composition. This core concern is based on the nature of APIs and application development. Whether in full automation or in hybrid efforts like HITL, we’re essentially combining APIs and systems like Lego blocks, stacking them together where they fit, and finding solutions where they may not fit perfectly.

In practice, the integration process rarely looks like Legos. Due to nuances in how each API is designed, plugging APIs into each other can get pretty complicated. This, in turn, creates the n+1 problem: what should be a single request quickly becomes n+1, where the original API call and at least one additional call are needed to deliver the required result, and the responses then have to be merged. A prime example of this type of issue is pagination, where a single logical result set is split across several responses that the caller must request one by one and stitch back together.
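To make the pagination case concrete, here is a small Python sketch. It assumes a hypothetical cursor-based API in which each response is a dictionary with an "items" list and a "next" cursor (None on the last page). The fetch_page callable and fake_fetch_page stand-in are invented for illustration and do not represent any particular vendor’s API.

```python
from typing import Callable, Iterator, Optional

# Assumed response shape: {"items": [...], "next": <cursor string or None>}
Page = dict


def fetch_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator:
    """Turn many paged requests into one logical stream of items."""
    cursor = None
    while True:
        page = fetch_page(cursor)   # one request per page
        yield from page["items"]    # merge this page into the overall result
        cursor = page.get("next")
        if cursor is None:          # no further pages to request
            break


# A stand-in for a real API, used only to exercise the loop above.
_FAKE_PAGES = {
    None: {"items": ["a", "b"], "next": "page-2"},
    "page-2": {"items": ["c"], "next": None},
}


def fake_fetch_page(cursor: Optional[str]) -> Page:
    return _FAKE_PAGES[cursor]
```

Every composed API needs some version of this loop, which is part of why stacking APIs together is harder than it first appears.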

This complexity can quickly become a problem in an automation framework, where every element has to be developed to work under uncertain conditions. Every part of the automated system needs to be prepared for unstable networks, different identity standards, possible errors, and so on. This creates significant issues in a fully automated workflow, as any failure along the way becomes a critical failure that stops all the following processes. While the core problems exist in HITL as well, the human in that loop can mitigate much of the follow-through concern: if an automated step fails, HITL should give a human a way to intervene so that the rest of the workflow can continue.
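One way to picture this is a pipeline runner that asks a person what to do when a step fails instead of halting outright. The sketch below is only an assumption about how such a hook might look. The names run_pipeline and ask_human, and the three possible decisions ("retry", "skip", "abort"), are invented here for illustration.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[], Any]]


def run_pipeline(
    steps: List[Step],
    ask_human: Callable[[str, Exception], str],
) -> List[Tuple[str, Any]]:
    """Run steps in order; on failure, let a human choose how to proceed."""
    results = []
    for name, step in steps:
        try:
            results.append((name, step()))
        except Exception as exc:
            # Hypothetical hook: in practice this might post to a chat channel
            # and wait for a reply. It returns "retry", "skip", or "abort".
            decision = ask_human(name, exc)
            if decision == "retry":
                # A second failure here propagates and stops the pipeline.
                results.append((name, step()))
            elif decision == "skip":
                results.append((name, None))
            else:
                raise  # abort: re-raise the original error
    return results
```

A fully automated version of this loop would have only the abort branch; the human hook is what lets later steps keep running after a transient failure.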

The Application of HITL

To effectively implement HITL, we need to think about where the human in the loop adds the most value. Simply plugging in a human at every step is essentially just manual workflows with semi-automated steps, which is a major step backward from our end goal. Accordingly, we must figure out what pattern of automation we’d like to pursue.

Perhaps the simplest pattern is to allow the human to be the trigger for the automation. An excellent example of this type of HITL workflow is in customer service. When a customer complains, you may want to offer them a free product to ensure brand loyalty and rectify any mistakes. To do this, a series of time-consuming steps is often necessary: a customer service rep must file a ticket, generate a coupon code, gather customer information, and so on. If this were fully automated, there is the possibility that the system would break, at which point anyone could claim a free product by abusing it. With HITL workflows, you can allow the customer service rep to determine who can get the free product and then provide a single button that triggers the rest of the process, generating the code and pairing information to deliver the product. This automates much of the process while allowing human discernment to play a critical role.
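The “single button” in this pattern could be backed by something like the following Python sketch. Everything here is hypothetical: the Compensation record, the issue_free_product function, and the ticket and coupon formats are invented to show the shape of the idea, with the rep’s identity acting as the human trigger.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Compensation:
    ticket_id: str
    coupon_code: str
    customer_id: str
    approved_by: str


def issue_free_product(customer_id: str, approved_by: str) -> Compensation:
    """Run the automated steps, but only once a rep has made the call."""
    if not approved_by:
        # No human decision recorded, so the automation refuses to start.
        raise PermissionError("a customer service rep must approve this action")
    # The remaining steps are the tedious ones the rep no longer does by hand.
    ticket_id = f"TICKET-{secrets.token_hex(4)}"   # stand-in for filing a ticket
    coupon_code = secrets.token_hex(8).upper()     # stand-in for coupon generation
    return Compensation(ticket_id, coupon_code, customer_id, approved_by)
```

The human judgment (who deserves the free product) stays with the rep, while everything after the click is automated.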

Another pattern is giving the human the ability to conduct investigations by providing rich data. For example, in a development situation, if a developer’s build fails for any reason, we can automate much of the response and troubleshooting process. We can notify the developer that a build has failed and then provide automated tools to search through logs, rebuild the failed build, and so on. This can be done through automated systems tied into a human-facing frontend, allowing for more intelligent choices without making the entire process manual.

Another good HITL use case is a “sanity check pattern,” which makes humans accountable for checking an automated process’s actions. Machines aren’t perfect, and as such, it may be prudent to have humans perform “sanity checks” on automated processes. Routing the final approval through a human helps catch mistakes and keeps the power of automated systems in check.
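A sanity check can be sketched as an approval gate: the automation prepares an action, but it cannot run until enough distinct people have signed off. The PendingAction class below is a hypothetical illustration, not something from the presentation. Setting required_approvals to 2 gives a simple form of two-person authorization.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Set


@dataclass
class PendingAction:
    description: str
    execute: Callable[[], Any]          # the automated work, prepared but not run
    required_approvals: int = 1
    approvers: Set[str] = field(default_factory=set)

    def approve(self, user: str) -> None:
        # A set means the same person approving twice still counts once.
        self.approvers.add(user)

    def run(self) -> Any:
        if len(self.approvers) < self.required_approvals:
            raise PermissionError(
                f"{self.description!r} needs {self.required_approvals} approval(s), "
                f"has {len(self.approvers)}"
            )
        return self.execute()
```

Because the approvers are recorded on the action itself, the same structure also provides the accountability trail this pattern is meant to establish.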

Examples From Transposit

In Tina Huang’s presentation, she mentioned a couple of Transposit-specific examples. Though they reflect a vendor’s take on HITL, they are some very good use cases, and as such, are worth repeating here. While these workflows are generally pretty streamlined, this paradigm could be extended easily to more complex processes with a little planning.

First, Where’s My Commit is a more automated HITL workflow that uses Slack to identify and locate commits for development. When someone runs the /wheresmycommit command, a bot searches for the commit and automatically posts the information combined from multiple sources. This is far better than the original method, which entailed asking an engineer, who would then dig for the JIRA ticket, cross-reference with GitHub and other internal systems, combine the resultant data, and return the location to the requester.

Another good implementation is built around CircleCI. CircleCI is a build management system that reports on the success or failure of builds. When a failure occurs, CircleCI provides the status, and the receiving person can select from a set of quick actions. These actions include “retrieve artifacts,” “retry build,” and “revert change.” This is a more complex HITL workflow, as it draws on multiple resources and methods, but it provides quite simple human interactions.
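The quick actions described here can be thought of as a small dispatch table: the human picks a label, and the automation behind that label does the rest. The Python sketch below is purely illustrative. The handler functions are placeholders that return strings, and none of this calls or represents the real CircleCI or Transposit APIs.

```python
from typing import Callable, Dict


# Placeholder handlers: a real version would call the build system's API.
def retrieve_artifacts(build_id: str) -> str:
    return f"fetched artifacts for build {build_id}"


def retry_build(build_id: str) -> str:
    return f"re-queued build {build_id}"


def revert_change(build_id: str) -> str:
    return f"opened revert for build {build_id}"


# The labels match the quick actions named in the text above.
QUICK_ACTIONS: Dict[str, Callable[[str], str]] = {
    "retrieve artifacts": retrieve_artifacts,
    "retry build": retry_build,
    "revert change": revert_change,
}


def handle_quick_action(action: str, build_id: str) -> str:
    """Run the automation behind whichever action the human selected."""
    handler = QUICK_ACTIONS.get(action)
    if handler is None:
        raise ValueError(f"unknown quick action: {action!r}")
    return handler(build_id)
```

The human interaction stays as simple as choosing one of three options, while each handler is free to be as complex as the underlying systems require.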

Finally, Transposit created a DevOps Oncall channel that is a hub for automated on-call operations. When an alert is automatically posted to the channel, users can collaborate around the issue and use chat functions to carry out certain activities. A big benefit here is the /graph command, which allows people to automatically fetch an image of the build from AWS CloudWatch and share it in the channel, so that everyone is looking at the same view of the current issue.

HITL: Human Automation Hybrid

Ultimately, Human-in-the-Loop is a useful paradigm when implemented in the appropriate circumstances. Allowing humans to add value to an automated chain is an excellent option in many automated workflows and can improve both effectiveness and accountability.

What do you think about HITL? Is there a better paradigm? Are there ways that an AI can deliver on these benefits without necessarily requiring an actual human in the loop? Let us know below!