AI Chatbots with Human in the Loop


Running LLM-based AI chatbots on autopilot is unwise and risky. Even when manually composing text or getting an answer to a question through an AI chatbot, a human tries different prompts and evaluates the responses, so there is already a human in a try-and-evaluate loop. An AI chatbot in autonomous mode without human intervention is a ticking time bomb. There have been reports of customer-facing chatbots malfunctioning and producing wrong or poor-quality responses. Users can also abuse such chatbots with prompt hacking to elicit inappropriate responses.

In this post, we will go through three examples of human-in-the-loop AI chatbot applications, as reported in recent papers. Two of them are from industry and one is from a university.

By human in the loop we mean a human in the loop at inference time. At training time there is always a human in the loop for a pre-trained LLM, in the form of SFT and RLHF training. These training techniques require human-labeled text.

Humans in the loop can be of two kinds: an actual human, or a human-written script incorporating some logic. The second kind, with human-written code in the loop, is indirect, and such systems could be called neuro-symbolic. A human-in-the-loop chatbot executes a generate-and-evaluate loop: the chatbot generates a candidate solution, and a human or a human-written script optionally post-processes and evaluates the generated output.

An example of the first kind is a live human customer service representative acting as an intermediary, interacting with an AI chatbot for candidate responses to a customer service query. The representative may modify the chatbot's response or prompt again for another response before finally responding to the customer.

An example of human-written code as the human in the loop is using an LLM chatbot to generate solutions to a planning problem. The chatbot-generated solution is used as a seed and fed as input to a real planner, coded by humans, which produces the final plan.
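To make the idea concrete, here is a minimal Python sketch of this pattern. The function names and the plan format are my own placeholders, not the interfaces used in the work discussed later; the point is simply that the LLM produces a draft and a classical planner, written by humans, repairs and validates it.

```python
# Hypothetical sketch: LLM as seed plan generator, classical planner as corrector.
# draft_plan_with_llm and repair_plan are placeholder names, not a real API.

def draft_plan_with_llm(problem_description: str) -> list[str]:
    """Placeholder for a chat-completion call that returns a candidate plan."""
    return ["pick-up block-a", "stack block-a block-b"]

def repair_plan(seed_plan: list[str], domain_file: str, problem_file: str) -> list[str]:
    """Placeholder for invoking a real planner with the LLM plan as a seed.
    A real implementation would shell out to the planner executable,
    pass it the domain/problem files plus the seed plan, and parse its output."""
    return seed_plan  # stub: return the seed unchanged

if __name__ == "__main__":
    draft = draft_plan_with_llm("Stack block A on block B")
    final_plan = repair_plan(draft, "domain.pddl", "problem.pddl")
    print(final_plan)
```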

Here is the process for a human-in-the-loop LLM; a minimal sketch of the loop follows the list.

  • Send the prompt to the LLM
  • Take the output, evaluate it, and post-process if necessary
  • If the result is not satisfactory, go back to the first step; otherwise, done
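Below is a small Python sketch of this loop, under the assumption that llm_generate stands in for whatever chat model API you call and evaluate stands in for the human reviewer or the human-written evaluation script.

```python
# Minimal sketch of the generate-and-evaluate loop described above.
# llm_generate and evaluate are placeholders, not real APIs.

def llm_generate(prompt: str) -> str:
    """Placeholder for a call to an LLM chat/completion API."""
    return "candidate answer for: " + prompt

def evaluate(candidate: str) -> tuple[bool, str]:
    """Placeholder for the human (or scripted) evaluation step.
    Returns (is_satisfactory, possibly post-processed output)."""
    return True, candidate.strip()

def human_in_the_loop(prompt: str, max_rounds: int = 5) -> str:
    for _ in range(max_rounds):
        candidate = llm_generate(prompt)   # step 1: send prompt to LLM
        ok, result = evaluate(candidate)   # step 2: evaluate and post-process
        if ok:                             # step 3: stop when satisfactory
            return result
        # otherwise loop back and prompt again, possibly with a revised prompt
    raise RuntimeError("no satisfactory output within max_rounds")

print(human_in_the_loop("Summarize the quarterly report in one sentence."))
```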

The first study is on engaging users with LLM-generated content. The goal is to increase user engagement by optimizing email subject lines.

Generating engaging content with LLMs is challenging, primarily for two reasons. First, even if the LLM-generated content seems more informative or better written, it does not necessarily lead to an increase in user activities, such as clicks. While LLMs are trained with specific forms of human feedback, that feedback primarily focuses on enhancing adherence to general, user-agnostic task instructions; it does not specifically target increasing the content's interactions with users.

Secondly, one of the key challenges with LLM-generated content lies in its quality. Such content often lacks the distinctiveness and authenticity commonly found in human-created material. LLMs tend to exhibit narrower vocabulary usage and adopt a more formal tone compared to typical human communication. As a result, the generated content may come across as promotional or akin to advertisements. Additionally, there is the issue of "hallucination," where LLM-generated content occasionally includes references to non-existent entities or false information.

The solution presented is an example of how an LLM can be a useful tool with a human in the loop. ChatGPT and a rule-based system were used to generate email subject lines, and A/B testing was performed using the outputs from the two systems. The user feedback from the A/B tests was used to train a reward model; this engagement feedback constitutes the human in the loop here. Two kinds of reward models were used for the output, pointwise and pairwise. The reward model's performance is tracked, and it is retrained as necessary. They did not use RL; the reward model is used directly at inference time.
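As a rough illustration of the training step, here is a tiny stand-in reward model fit on click feedback from an A/B test. The actual reward model in the study is LLM-based; a bag-of-words classifier and made-up data are used here only to keep the sketch small and self-contained.

```python
# Toy stand-in for training a pointwise reward model from engagement feedback.
# Real systems would fine-tune a language model; logistic regression on TF-IDF
# features is used purely to keep the example runnable.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up A/B test data: subject lines and whether users clicked (1) or not (0).
subject_lines = [
    "Your March savings report is ready",
    "ACT NOW!!! unbeatable limited offer",
    "Three tips to speed up your builds",
    "Re: re: re: newsletter",
]
clicked = [1, 0, 1, 0]

reward_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
reward_model.fit(subject_lines, clicked)

# Pointwise reward = predicted probability of a click for a new candidate.
print(reward_model.predict_proba(["Five tips for faster deployments"])[:, 1])
```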

At inference time, the LLM is prompted to produce multiple versions of the output by conditioning on the same prompt. This list is combined with the output from the rule-based system, and the candidate with the highest reward, according to the reward model, is selected as the one most likely to engage users.
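Here is a small sketch of that selection step, assuming a pointwise reward model that returns a scalar engagement score. All three helper functions are placeholders, not the actual interfaces used in the study.

```python
# Best-of-n selection: generate several LLM candidates, add the rule-based
# output, and keep the one the reward model scores highest.

def generate_subject_lines(context: str, n: int = 4) -> list[str]:
    """Placeholder: sample n subject-line variants from the LLM."""
    return [f"Variant {i}: {context}" for i in range(n)]

def rule_based_subject_line(context: str) -> str:
    """Placeholder for the legacy rule-based generator."""
    return "Update: " + context

def reward_score(subject_line: str) -> float:
    """Placeholder for the trained reward model's engagement score."""
    return float(len(subject_line) % 7)  # dummy score

def pick_best(context: str) -> str:
    candidates = generate_subject_lines(context) + [rule_based_subject_line(context)]
    return max(candidates, key=reward_score)

print(pick_best("new features in the March release"))
```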

For the pointwise reward model, at inference time the log probability of outputting the sequence "yes" is used to score and rank generated responses. For the pairwise reward model, candidate answers in a list are ranked tournament style; this calls the reward model m − 1 times to rank a list of m candidates.
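The tournament ranking can be sketched as below: each pairwise comparison eliminates one candidate, so finding the best of m candidates takes exactly m − 1 calls to the reward model. The function pairwise_prefers_first is a placeholder for the actual pairwise reward model.

```python
# Tournament-style selection with a pairwise reward model.

def pairwise_prefers_first(a: str, b: str) -> bool:
    """Placeholder: True if the pairwise reward model prefers candidate a over b."""
    return len(a) >= len(b)  # dummy preference

def tournament_best(candidates: list[str]) -> str:
    """Single-elimination bracket: each comparison removes one candidate,
    so m candidates require exactly m - 1 reward model calls."""
    round_ = list(candidates)
    while len(round_) > 1:
        next_round = []
        for i in range(0, len(round_) - 1, 2):
            a, b = round_[i], round_[i + 1]
            next_round.append(a if pairwise_prefers_first(a, b) else b)
        if len(round_) % 2 == 1:          # odd candidate out gets a bye
            next_round.append(round_[-1])
        round_ = next_round
    return round_[0]

print(tournament_best(["Save 20% today", "March release notes", "Your weekly digest"]))
```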

This example is not a strong case of human in the loop, because there is no direct human involvement at inference time. However, the reward model is trained on feedback from the users of a particular application.

Next we will look into the planning ability of LLMs and how it can be enhanced by putting a real planner, coded by humans, in the loop.

Planning is about creating a set of actions, a.k.a. a policy, that guides an agent to achieve a desired state in the world. Researchers study planning by analyzing world and reward models, whether those models are provided by humans or learned through agent-world interactions.

LLMs can excel as idea generators for humans in collaborative work scenarios. However, their ability to create guaranteed-correct plans is limited. LLMs rely more on co-occurrence pattern recognition than on rigorous principles. Without any innate planning ability, LLMs only mimic a real planner. While specialized planners are precise in narrow domains, LLMs may offer plausible but not guaranteed plan heuristics in a broader range of contexts.

When LLM-generated plans were used as is, they performed very poorly in terms of executing the plan to reach the correct goals of the planning task.

There were two scenarios with a human in the loop. In the first, real planner software was used: the plans produced by the LLM were given as input to an automated planner working off a correct domain model, to check how easy it is to repair the LLM plans and guarantee their correctness. It was shown that a well-known automated planner called LPG, which uses local search to locate and remove flaws in a candidate plan to make it correct, is able to repair the LLM plans with relative ease. It was also found that more than 50% of the final plan was due to the edits made by the LPG planner to the initial LLM-generated plan.

In the second scenario, LLM-generated plans were handed over to real humans to modify and correct. Interestingly, the humans made only a modest improvement over the LLM-generated plans. There were two groups of people: the first group was not provided with any kind of assistance and had to come up with a plan from scratch, while the second group was handed an LLM-generated plan to start with. In the first group, 74% managed to generate a correct final plan, whereas in the second group 82% did. The improvement with help from the LLM was modest, and some users even handed over the LLM-generated plan as is, without any change, which clearly exhibits automation bias.

The next solution, from Google DeepMind, is for geometric problem solving. AlphaGeometry tackles geometry and math challenges by blending the strengths of an LLM and a rule-based deduction engine; together, they find solutions efficiently. It is like having both fast, intuitive thinking and deliberate, rational decision making in one system.

Solving a geometry problem requires introducing various geometric constructs, such as points and lines, during the course of the solution. LLMs excel at spotting patterns and can quickly generate potentially useful geometric constructs, but they struggle with rigorous reasoning and explanations. On the other hand, symbolic deduction engines rely on clear rules and formal logic, making them rational and explainable; however, they can be slow and inflexible when handling complex problems on their own.

AlphaGeometry combines its symbolic engine and language model to find solutions. It deduces new statements from the problem diagram and, when needed, adds potentially useful constructs using the LLM. This iterative process continues until a solution is found.
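A rough sketch of this alternation follows, with placeholder functions standing in for AlphaGeometry's actual components.

```python
# Hedged sketch: alternate symbolic deduction with LLM-proposed constructs.
# symbolic_deduce and llm_propose_construct are placeholders, not
# AlphaGeometry's real interfaces.

def symbolic_deduce(facts: set[str]) -> set[str]:
    """Placeholder: run the rule-based deduction engine to closure."""
    return facts  # stub: no new statements derived

def llm_propose_construct(facts: set[str]) -> str:
    """Placeholder: the language model suggests an auxiliary point or line."""
    return f"auxiliary_construct_{len(facts)}"

def solve(facts: set[str], goal: str, max_iters: int = 10) -> bool:
    for _ in range(max_iters):
        facts = symbolic_deduce(facts)            # exhaust formal deduction
        if goal in facts:
            return True                           # goal statement derived
        facts.add(llm_propose_construct(facts))   # widen the search with the LLM
    return False

print(solve({"triangle ABC with AB = AC"}, "angle B = angle C"))
```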

LLM-based chatbots on autopilot are not very practical or useful. With a human in the loop, or a human-written script in the loop, there is a viable path forward for solving many problems. The LLM acts as a potential solution generator; the human or human-written script acts as the evaluator. The LLM generator and the human or human-written script work in tandem in a loop until the problem is solved.
