How to ReAct To Simple AI Agents

Introduction

With the increased usage of large language models (LLMs) in everyday use cases, data frameworks such as LlamaIndex and LangChain have been created for developing applications powered by LLMs. Frameworks like these can build conversational AI applications by connecting language models to external data sources and allowing them to interact with each other. Within these frameworks you have access to a variety of components for working with language models along with ready-made chains, enabling both simple and fully customizable applications. However, in order to accomplish higher-level language tasks you will want to use components like agents and tools along with existing chains. To implement a specialized agent for your use case, an overview of agent architectures (as well as how chains, prompt templates and tools can be used together when completing a complex task) is first needed.


Agents and Chains

To get started, let’s clarify the difference between these components. The core idea of an agent is to use an LLM to reason about which actions to take and in which order, whereas in a chain the sequence of actions is hardcoded. A chain is a sequence of calls to components (which can include other chains) that lets us combine multiple components into a single, coherent application.

For example, we can create a chain that takes user input, formats it with a prompt template, and then passes the formatted prompt to an LLM. We can build more complex chains by combining multiple chains together, or by combining chains with other components.
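As a library-free illustration of the chain just described, the sketch below hardcodes two steps: fill a prompt template with the user input, then pass the formatted prompt to a model. The names `PROMPT_TEMPLATE`, `fake_llm` and `run_chain` are hypothetical stand-ins chosen for this example, not LangChain APIs; a real chain would call an actual LLM.

```python
from typing import Callable

# Hypothetical prompt template; {question} is filled from the user input.
PROMPT_TEMPLATE = "Answer concisely.\n\nQuestion: {question}\nAnswer:"


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: no network access here)."""
    return f"[model reply to {len(prompt)} prompt characters]"


def run_chain(question: str, llm: Callable[[str], str] = fake_llm) -> str:
    """A chain is a fixed sequence: format the prompt, then call the model."""
    prompt = PROMPT_TEMPLATE.format(question=question)  # step 1: prompt template
    return llm(prompt)  # step 2: model call


if __name__ == "__main__":
    print(run_chain("What is an agent?"))
```

Because the order of steps is fixed in `run_chain`, nothing here is decided by the model; that is the property that separates a chain from an agent.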

Using an LLM in isolation is fine for simple applications, but more complex applications require chaining LLMs, either with each other or with other components. LangChain provides extensible interfaces and external integrations for chains (sequences of calls), agents (which let chains choose which tools to use given high-level directives), memory (application state persisted between runs of a chain) and callbacks (hooks into intermediate steps).

ReAct and MRKL Architectures

Let’s now look at an example of a zero-shot agent that uses the Reason and Act (ReAct) framework, where the agent reasons about the next action, constructs a command, and then executes the action. The ReAct paradigm repeats these steps in a loop until the task is complete. Related techniques include Self-Ask, Chain of Thought (CoT) prompting, and Plan-and-Solve prompting, which generates a plan up front to decompose a complex task into simpler ones.
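The reason, act, observe loop can be sketched in plain Python. This is a minimal illustration under stated assumptions, not LangChain's implementation: `TOOLS`, `scripted_llm` and `react_loop` are hypothetical names, the model is replaced by canned replies, and the toy calculator only adds integers.

```python
import re
from typing import Callable, Dict, List

# Hypothetical tool registry: tool name -> function taking and returning a string.
TOOLS: Dict[str, Callable[[str], str]] = {
    "Calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
}


def scripted_llm(replies: List[str]) -> Callable[[str], str]:
    """Stand-in for a real LLM: returns canned replies in order."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


def react_loop(question: str, llm: Callable[[str], str], max_steps: int = 5) -> str:
    """Reason, act, observe, and repeat until a final answer appears."""
    scratchpad = ""
    for _ in range(max_steps):
        reply = llm(f"Question: {question}\nThought:{scratchpad}")
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = re.search(r"Action: (.*?)\nAction Input: (.*)", reply)
        if match is None:
            raise ValueError(f"Could not parse model reply: {reply!r}")
        tool_name, tool_input = match.group(1).strip(), match.group(2).strip()
        observation = TOOLS[tool_name](tool_input)  # act, then observe
        scratchpad += f"{reply}\nObservation: {observation}\nThought:"
    raise RuntimeError("No final answer within max_steps")


if __name__ == "__main__":
    llm = scripted_llm([
        " I should add the numbers.\nAction: Calculator\nAction Input: 2+3",
        " I now know the final answer\nFinal Answer: 5",
    ])
    print(react_loop("What is 2+3?", llm))
```

The `max_steps` cap is a design choice for the sketch: without it, a model that never emits a final answer would loop forever.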

The ReAct framework has been implemented as an application of MRKL systems (Modular Reasoning, Knowledge and Language, pronounced “miracle”). As Figure 1 shows, it combines the Reason Only (CoT prompting) and Act Only (Self-Ask) paradigms. The ReAct approach is widely used because you not only ask the LLM to reason up front, but also take an action and feed the resulting observation back into the LLM, so that it can iterate on its initial reasoning and improve its performance.

Figure 1. Previous methods of prompting language models (LMs) compared with ReAct, a paradigm that combines advances in reasoning and acting in language models.

You can provide multiple tools to an LLM and let it decide how to interact with them to achieve the desired objective. For LangChain’s react-docstore agent, two tools must be provided: a Search tool and a Lookup tool (they must be named exactly so). The Search tool should search for a document, while the Lookup tool should look up a term in the most recently found document.
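To make the division of labor between Search and Lookup concrete, here is a small in-memory sketch. `TinyDocstore` is a hypothetical class written for this illustration and is not part of LangChain; a real docstore would typically wrap something like Wikipedia.

```python
from typing import Dict, Optional


class TinyDocstore:
    """Hypothetical in-memory docstore exposing Search and Lookup behavior."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self.docs = docs
        self.current: Optional[str] = None  # most recently found document

    def search(self, title: str) -> str:
        """Search: find a document by title and remember it."""
        text = self.docs.get(title)
        if text is None:
            return f"Could not find [{title}]."
        self.current = text
        return text.split(". ")[0]  # first sentence serves as a short summary

    def lookup(self, term: str) -> str:
        """Lookup: find a term in the most recently found document."""
        if self.current is None:
            return "No document found yet; call Search first."
        for sentence in self.current.split(". "):
            if term.lower() in sentence.lower():
                return sentence
        return "No results."


if __name__ == "__main__":
    store = TinyDocstore(
        {"ReAct": "ReAct interleaves reasoning and acting. It was evaluated on HotpotQA."}
    )
    print(store.search("ReAct"))
    print(store.lookup("HotpotQA"))
```

Lookup depends on state left behind by Search, which is why the agent needs both tools and must call them in that order.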

LLMs exhibit zero-shot transfer by leveraging knowledge and patterns from related tasks they were trained on. Our example below is a zero-shot agent, since it can perform tasks that it was never explicitly trained on. When creating a custom agent you must give it access to the right set of tools (see Table 1 for a list of options) to accomplish the objective. Any new tool you create must be carefully described in the prompt template for the agent to use it properly. Before we continue, it’s also important to note that while a zero-shot agent provides flexibility when calling the LLM, its performance depends on the quality and capabilities of that LLM.

Table 1: Tool Types

Let’s look at how to set up an agent using chains and tools:

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.llms import OpenAI
import os
os.environ["OPENAI_API_KEY"] = "" # https://platform.openai.com/
os.environ['SERPAPI_API_KEY'] = "" # https://serpapi.com/
llm = OpenAI(temperature=0)  # temperature=0 makes the output more deterministic and generally means fewer hallucinations

# Tools are functions that agents can use to interact with the world. These tools can be generic utilities (e.g. search), other chains, or even other agents.
# The agent will have access to: Google search, math, Wikipedia, and a terminal.
tools = load_tools(["serpapi", "llm-math", "wikipedia", "terminal"], llm=llm)
agent = initialize_agent(tools,
                         llm,
                         agent="zero-shot-react-description",
                         verbose=True)
tools[1].name, tools[1].description
Output: 
('Calculator', 'Useful for when you need to answer questions about math.')


agent.agent.llm_chain.prompt.template
Output:
Answer the following questions as best you can. You have access to the following tools:


Search: A search engine. Useful for when you need to answer questions about current events. Input should be a search query.
Calculator: Useful for when you need to answer questions about math.
Wikipedia: A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.
terminal: Run shell commands on this Linux machine.


Use the following format:


Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Search, Calculator, Wikipedia, terminal]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question


Begin!


Question: {input}
Thought:{agent_scratchpad}

In the above example we initialize a LangChain zero-shot agent and give it access to a list of tools. Inspecting agent.agent.llm_chain.prompt.template shows the description and purpose of each tool, the input each tool expects, and the ReAct format of the prompt template. These prompt templates and examples are customizable and should be changed to suit specific tasks, since most of the work in creating a custom LLMChain comes down to the prompt. Notice that the prompt template ends with Thought:{agent_scratchpad}. This input variable lets the LLM continue from previous actions and observations.
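The role of the scratchpad can be illustrated with a short sketch that folds earlier steps back into the prompt. It assumes each step is stored as a pair of the model's raw text and the tool's observation; `Step` and `build_scratchpad` are hypothetical names for this example, and the exact string layout in a given LangChain version may differ.

```python
from typing import List, Tuple

# Each step pairs the model's raw text (thought + action) with the tool's observation.
Step = Tuple[str, str]


def build_scratchpad(steps: List[Step]) -> str:
    """Fold earlier actions and observations into text the prompt can resume from."""
    scratchpad = ""
    for action_log, observation in steps:
        scratchpad += action_log
        scratchpad += f"\nObservation: {observation}\nThought:"
    return scratchpad


# One completed step: the model chose the Calculator and the tool returned "5".
steps = [(" I need the sum.\nAction: Calculator\nAction Input: 2+3", "5")]
prompt = "Question: What is 2+3?\nThought:" + build_scratchpad(steps)
print(prompt)
```

The rendered prompt ends with a fresh `Thought:`, so the model's next completion picks up exactly where the last observation left off.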

Agent Types

Different agent architectures have been developed to enable goal-directed tool use and contextual conversations by combining language models with structured knowledge. Let’s take a look at the various types of agents and their purpose-built architectures.

LangChain Agent types:

  • ZERO_SHOT_REACT_DESCRIPTION = ‘zero-shot-react-description’ | Most used general agent. This agent uses the ReAct framework to determine which tool to use based solely on the tool’s description. Any number of tools can be provided.
  • REACT_DOCSTORE = ‘react-docstore’ | This agent uses the ReAct framework to interact with a docstore. Two tools must be provided: a Search tool and a Lookup tool.
  • SELF_ASK_WITH_SEARCH = ‘self-ask-with-search’ | This agent utilizes a single tool that should be named Intermediate Answer to look up factual answers to questions. This agent is equivalent to the original self ask with search paper.
  • CONVERSATIONAL_REACT_DESCRIPTION = ‘conversational-react-description’ | This agent is designed to be used in conversational settings. It uses the ReAct framework to decide which tool to use, and uses memory to remember the previous conversation interactions.
  • CHAT_ZERO_SHOT_REACT_DESCRIPTION = ‘chat-zero-shot-react-description’
  • CHAT_CONVERSATIONAL_REACT_DESCRIPTION = ‘chat-conversational-react-description’
  • STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION = ‘structured-chat-zero-shot-react-description’ | The structured tool chat agent is capable of using multi-input tools: it uses each tool’s argument schema to create a structured action input. This is useful for more complex tool usage, like precisely navigating around a browser.
  • OPENAI_FUNCTIONS = ‘openai-functions’ and OPENAI_MULTI_FUNCTIONS = ‘openai-multi-functions’ | Certain OpenAI models (like gpt-3.5-turbo-0613 and gpt-4-0613) have been explicitly fine-tuned to detect when a function should be called and respond with the inputs that should be passed to the function. The OpenAI Functions and Multi-Functions Agents are designed to work with these models.
These agent types are members of the langchain.agents.agent_types.AgentType enum.

For a custom ReAct agent, we can create the necessary tools and prompt template:

#Install and Import Required Modules, LLMs and Agent Tools
!pip -q install langchain huggingface_hub openai google-search-results


import os
os.environ["OPENAI_API_KEY"] = '' # https://platform.openai.com
os.environ['SERPAPI_API_KEY'] = '' # https://serpapi.com/

from langchain.agents import ZeroShotAgent, Tool, AgentExecutor
from langchain import OpenAI, SerpAPIWrapper, LLMChain

search = SerpAPIWrapper()
tools = [
   Tool(
       name="Search",
       func=search.run,
       description="useful for when you need to answer questions about current events",
   )
]

prefix = """Answer the following questions as best you can, but do it in old Shakespearean English. You have access to the following tools:"""
suffix = """Begin! Remember to speak in old Shakespearean English in the final answer. Use the word "behold" at least once.

Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
   tools, prefix=prefix, suffix=suffix, input_variables=["input", "agent_scratchpad"]
)

print(prompt.template)
Output:
Answer the following questions as best you can, but do it in old Shakespearean English. You have access to the following tools:

Search: useful for when you need to answer questions about current events

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Search]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin! Remember to speak in old Shakespearean English in the final answer. Use the word "behold" at least once.

Question: {input}
{agent_scratchpad}
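The printed template has four parts: a prefix, one line per tool, the ReAct format instructions, and a suffix. The sketch below shows one way such a template could be assembled. `assemble_prompt` and `FORMAT_INSTRUCTIONS` are hypothetical names for this illustration and are not LangChain's create_prompt; the format text is copied from the template shown in this article.

```python
from typing import List, Tuple

# Format instructions copied from the template above; {tool_names} is filled in later.
FORMAT_INSTRUCTIONS = """Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question"""


def assemble_prompt(tools: List[Tuple[str, str]], prefix: str, suffix: str) -> str:
    """Join prefix, tool descriptions, format instructions, and suffix."""
    tool_lines = "\n".join(f"{name}: {description}" for name, description in tools)
    tool_names = ", ".join(name for name, _ in tools)
    instructions = FORMAT_INSTRUCTIONS.format(tool_names=tool_names)
    # The suffix keeps its {input} and {agent_scratchpad} placeholders unformatted.
    return "\n\n".join([prefix, tool_lines, instructions, suffix])


if __name__ == "__main__":
    template = assemble_prompt(
        [("Search", "useful for when you need to answer questions about current events")],
        prefix="Answer the following questions as best you can. You have access to the following tools:",
        suffix="Begin!\n\nQuestion: {input}\n{agent_scratchpad}",
    )
    print(template)
```

Seen this way, customizing an agent mostly means changing the prefix, the suffix, or the tool descriptions while leaving the format instructions intact so the output stays parseable.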

Note that you can feed an agent a self-defined prompt template, provided it meets the agent’s requirements.

#You can now run a query for the new prompt

llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)


tool_names = [tool.name for tool in tools]


agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names)


agent_executor = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=tools, verbose=True
)


agent_executor.run("How many hurricanes are expected to make landfall in the US this year?")
Output:
> Entering new AgentExecutor chain…
Thought: I must find out how many hurricanes are expected to make landfall in the US this year.
Action: Search
Action Input: Number of hurricanes expected to make landfall in the US this year
Observation: NOAA is forecasting a range of 12 to 17 total named storms (winds of 39 mph or higher). Of those, 5 to 9 could become hurricanes (winds of 74
Thought: I now know the final answer
Final Answer: Behold, ’tis said that NOAA is forecasting a range of 12 to 17 total named storms, of which 5 to 9 could become hurricanes.

> Finished chain.
‘Behold, ’tis said that NOAA is forecasting a range of 12 to 17 total named storms, of which 5 to 9 could become hurricanes.’

We have now created a custom zero-shot agent that uses the ReAct framework to dissect a complex question and generate a response by selecting and parameterizing external tools, organizing tasks, and retaining explicit instructions from the prompt template (Remember to speak in old Shakespearean English in the final answer. Use the word “behold” at least once). News about the number of hurricanes likely to occur is never easy to hear, but hearing it in Shakespearean English is preferable.

Table 2: Additional examples using AgentType

Conclusion

We now see how the ReAct chain works in LangChain by understanding the reasoning and action iterations it uses. When leveraging an agent with the ReAct framework, you can use the prompt as is, but for specific tasks it is better to replace the original prompt examples from the paper with more relevant ones. Above, we also saw that we can feed agents a self-defined prompt template (i.e. we are not restricted to the prompt generated by the create_prompt function), provided it meets the agent’s requirements. While default prompt templates will often deliver good results, a custom template gives you the flexibility to tailor the questions, observations, actions, and thoughts to your use case. It also helps when debugging poor responses, since you can go back through the agent traces produced in the intermediate steps of your prompt template.

Fully customizable agents extend beyond prompt templates for specific tasks. You’ve now seen how you can also change the LLM used for reasoning as well as the tools you provide to your agent. Therefore, if you are considering using an agent, try customizing a zero-shot agent that uses the ReAct architecture before testing out complex agents (and if you want to learn about complex agent architectures, check out this piece on BabyAGI and Voyager). This, along with ReAct prompting, LLM chaining and custom tooling, will help you build a task-specific agent that returns relevant results and makes good use of the LLM.