Building and Deploying AI Agents with LangChain on Vertex AI

annawhelan

Overview

The rise of generative AI models, such as the Gemini 1.5 Pro model, has opened exciting possibilities for building intelligent agents capable of complex tasks. AI agents enable autonomous behavior by using generative models and external tools to perceive their environment, make decisions, and take actions to achieve goals. In practice, though, generative AI applications and AI agents demand significant time and upkeep to manage the underlying infrastructure and boilerplate code.

LangChain on Vertex AI (Reasoning Engine) is a managed service in Vertex AI that provides a runtime environment for deploying agents built with any orchestration framework, including LangChain. Reasoning Engine abstracts away complexities such as deployment, scaling, and monitoring, which allows developers to focus on the core logic and capabilities within their agents.

In this blog post, we’ll walk through how LangChain on Vertex AI simplifies the work of deploying and managing AI agents. With a single API call to reasoning_engines.ReasoningEngine.create(), you can deploy your application to a scalable and secure environment. Reasoning Engine then takes care of the deployment, infrastructure, autoscaling, monitoring, and observability, which lets you get back to innovation and problem solving.

Background on generative models and tools

In a previous blog post on Function Calling in Gemini, we discussed a native framework within the Gemini model that can be used to turn natural language prompts into structured data and back again. Developers can use the Function Calling framework to define functions as tools that the Gemini model can use to connect to external systems and APIs to fetch real-time data that supplements the generative model's trained knowledge about the world.

If you want to work with the model, tools, and function components for simple use cases such as entity extraction, structured data outputs, or custom workflows with external APIs, then you probably want to stick with Function Calling.
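
For reference, working with Function Calling directly means describing each function to the model as a structured schema and handling the resulting function-call responses in your own code. A minimal sketch with the Vertex AI SDK might look like the following (the project ID and model name are placeholders to replace with your own):

import vertexai
from vertexai.generative_models import FunctionDeclaration, GenerativeModel, Tool

vertexai.init(project="your-project-id", location="us-central1")

# Describe the function as a schema that the Gemini model can choose to call.
exchange_rate_func = FunctionDeclaration(
    name="get_exchange_rate",
    description="Retrieves the exchange rate between two currencies",
    parameters={
        "type": "object",
        "properties": {
            "currency_from": {"type": "string", "description": "Base currency code"},
            "currency_to": {"type": "string", "description": "Target currency code"},
        },
    },
)

model = GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    "What's the exchange rate from US dollars to Swedish krona?",
    tools=[Tool(function_declarations=[exchange_rate_func])],
)

# The model returns a structured function call; your application code is
# responsible for executing it and sending the result back to the model.
print(response.candidates[0].content.parts[0].function_call)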

As you continue to build on top of the model and tools framework by adding more complex workflows, reasoning logic, and error handling to your generative AI application, you might find yourself getting lost in the data connections, retrievers, and orchestration layers and their associated configuration. This is when you know that you've reached the limitations of existing approaches for building and deploying AI agents.

Challenges when going from model to agent

There are many different ways to add more functionality to your generative AI application that uses an LLM to generate content. You might have developed a series of prompts or chained generative model requests to perform a task or set of tasks. Or maybe you've implemented a ReAct agent in LangChain. Or you might be developing on the cutting edge as you implement reflection agents or deploy multi-agent routers.

But when does your application code become an AI agent? How can you build your AI agent code in a modular, composable, and maintainable way rather than a monolithic bundle of confusing code? And how can you deploy your agent in a scalable and reliable way? In the following section, we’ll dive into the technical details of working with agents using LangChain on Vertex AI, which offers developers a streamlined approach to building and deploying production-ready AI agents.

What’s in an agent? Key components in LangChain on Vertex AI

Building and deploying agents with LangChain on Vertex AI involves four distinct layers, each catering to specific development needs:

  • Model (Gemini model): This layer handles content generation, understanding and responding to user queries in natural language, and summarizing information.
  • Tools (Gemini Function Calling): This layer allows your agent to interact with external systems and APIs, enabling it to perform actions beyond just generating text or images.
  • Reasoning (LangChain): This layer organizes your application code into functions, defining configuration parameters, initialization logic, and runtime behavior. LangChain simplifies LLM application development by providing the building blocks for generative AI applications, and developers maintain control over crucial aspects like custom functions, agent behavior, and model parameters.
  • Deployment (Reasoning Engine): This Vertex AI service hosts your AI agent and provides benefits such as security, observability, and scalability. Reasoning Engine is compatible with LangChain or any open-source framework to build customizable agentic workflows.

Building custom generative AI applications with agentic capabilities often involves adding tools and functions on top of powerful generative models, such as Gemini. While prototyping is exciting, moving to production raises concerns about deployment, scaling, and management of these complex systems. This is where Vertex AI's Reasoning Engine comes in!

Building and deploying an AI agent with LangChain on Vertex AI

In this section, we’ll walk through the key steps of building, testing, and deploying your AI agent with LangChain on Vertex AI based on the sample notebook for building and deploying an agent with LangChain on Vertex AI. You can also go hands-on with the links and resources at the end of this blog post to get started yourself!

1. Define your functions

To start, we’ll need to define functions that Gemini will use as tools to interact with external systems and APIs to retrieve real-time information. With Reasoning Engine and the provided LangChain template, there’s no need to write an OpenAPI specification or represent your API call as an abstract function signature; just write Python functions!

You can define functions to perform retrieval augmented generation (RAG) and retrieve indexed documents from a vector database based on a user query, as in: 

def search_documents(query: str):
    """Searches a vector database for snippets in relevant documents"""
    from langchain_google_community import VertexAISearchRetriever

    # PROJECT_ID, DATA_STORE_ID, and LOCATION_ID are defined earlier in the notebook.
    retriever = VertexAISearchRetriever(
        project_id=PROJECT_ID,
        data_store_id=DATA_STORE_ID,
        location_id=LOCATION_ID,
        max_documents=100,
    )

    result = str(retriever.invoke(query))
    return result

You can also define functions that go beyond a traditional RAG pipeline and query APIs to retrieve information from external data sources in real time, as in:

def get_exchange_rate(currency_from: str, currency_to: str):
    """Retrieves the latest exchange rate between two currencies"""
    import requests

    # Query the Frankfurter currency API for the most recent rate.
    response = requests.get(
        "https://api.frankfurter.app/latest",
        params={"from": currency_from, "to": currency_to},
    )
    return response.json()

You can even go well beyond RAG implementations and REST API calls to define functions that use open-source or custom Python libraries to perform various types of operations. For example, you might want to create a function that generates and sends a SQL query to BigQuery, searches for businesses using the Maps Places API, or downloads a file from Google Drive, as in: 

def download_file_from_google_drive(file_id: str):
    """Downloads a file from Google Drive"""
    import io

    import google.auth
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaIoBaseDownload

    # Authenticate with Application Default Credentials and build a Drive client.
    creds, _ = google.auth.default()
    service = build("drive", "v3", credentials=creds)

    # Stream the file contents into an in-memory buffer.
    request = service.files().get_media(fileId=file_id)
    file = io.BytesIO()
    downloader = MediaIoBaseDownload(file, request)

    done = False
    while not done:
        _, done = downloader.next_chunk()

    return file.getvalue()

If you can represent it in a Python function, then you can provide it as a tool for your agent!

2. Define your agent

Once you’ve defined all of the functions that you want to include as tools in your AI agent, you can define an agent using our LangChain template: 

agent = reasoning_engines.LangchainAgent(
    model=model,  # the Gemini model to use, defined earlier (for example, a Gemini 1.5 Pro model name)
    tools=[search_documents, get_exchange_rate, download_file_from_google_drive],
)

Note that the tools kwarg includes references to the functions that you defined earlier, and the LangChain template in Reasoning Engine introspects the function name, function arguments, default argument values, docstrings, and type hints so that it can pass all of this information as part of the tool description to the agent and Gemini model.
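
For example, a tool function that includes type hints, default values, and argument descriptions in its docstring gives the template richer metadata to pass along. As a quick illustration (the body is unchanged from the earlier example and elided here):

def get_exchange_rate(currency_from: str = "USD", currency_to: str = "SEK"):
    """Retrieves the latest exchange rate between two currencies.

    Args:
        currency_from: The base currency as a three-letter code, for example "USD".
        currency_to: The target currency as a three-letter code, for example "SEK".
    """
    ...  # same implementation as shown earlier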

We designed this LangChain template so that you can quickly get started out of the box using default values. We also built the template for maximum flexibility, so you can customize the layers of your agent: modify the reasoning behavior and generative model parameters, swap out the default agent logic for another type of LangChain agent, or even swap out LangChain for an entirely different orchestration framework!
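
As a rough sketch of that flexibility, you might tune the generation parameters and surface the agent's intermediate reasoning steps as shown below. The model_kwargs and agent_executor_kwargs keyword names here are based on the Reasoning Engine LangChain template and may differ across SDK versions, so check the SDK reference for the options available to you:

agent = reasoning_engines.LangchainAgent(
    model=model,
    tools=[search_documents, get_exchange_rate, download_file_from_google_drive],
    # Generative model parameters, such as sampling temperature and output length.
    model_kwargs={"temperature": 0.2, "max_output_tokens": 1024},
    # Agent runtime behavior, such as returning the intermediate reasoning steps.
    agent_executor_kwargs={"return_intermediate_steps": True},
)

You can also sanity-check the agent locally before deploying it by calling agent.query(input="...") directly in your notebook session.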

3. Deploy your agent

Now you’re ready to move on to the deployment step of productionizing your AI agent! Here, you specify the instance of the agent that you defined previously along with the set of Python packages and dependencies required for your agent: 

remote_agent = reasoning_engines.ReasoningEngine.create(
    agent,
    requirements=[
        "google-cloud-aiplatform[reasoningengine,langchain]",
    ],
)

When deploying your agent with Reasoning Engine, there’s no need to add API routes via a web framework, no need for Docker images or containers, and no need for complicated deployment steps. And after a couple of minutes, your AI agent is deployed and ready to accept queries.
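
If you want to confirm the deployment from the SDK, the standard list helper on Vertex AI resources should show your new instance. A quick check might look like this (assuming the list() classmethod that other Vertex AI SDK resources expose):

# List the Reasoning Engine instances in the current project and region.
for engine in reasoning_engines.ReasoningEngine.list():
    print(engine.resource_name)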

Interacting with your deployed AI agent: From prompt to response

Now that you’ve deployed your agent with LangChain on Vertex AI, you can send a prompt to the remotely deployed agent using the following query: 

>>> remote_agent.query(
    input="What's the exchange rate from US dollars to Swedish currency today?"
)

{'input': "What's the exchange rate from US dollars to Swedish currency today?",
 'output': 'Today, 1 US dollar is equal to 10.949 Swedish krona.'}

In this case, the Gemini model didn’t know the exchange rate based on its training data. Rather, our agent used the function that we defined to fetch the current exchange rate, passed that information back to the Gemini model, and Gemini was able to use that real-time information to generate a natural language summary!

Let's take a deeper look behind the scenes of this example query and break down what actions the AI agent took at runtime to go from the user’s input prompt to the output that contains a natural language summary of the answer:

  1. User submits a query: The user sends an input prompt asking about currency exchange rates between two different currencies.
  2. Send query and tools to model: The agent packages the query with tool descriptions and sends it to the Gemini model.
  3. Model decides on tool usage: Based on the query and tool descriptions, the Gemini model decides whether to utilize a specific function (get_exchange_rate) and which parameters to send as inputs to the function (the currencies that the user wants to know about).
  4. Application calls the tool: The application executes the model’s instructions by calling the appropriate function (get_exchange_rate) with the provided parameters.
  5. Tool results: The application receives a response from the tool (an API response payload).
  6. Return results to model: The application sends the API response payload to the model.
  7. Return results to agent: The agent interacts with the model to understand the observation based on the response.
  8. Agent determines next steps: This process repeats if the agent determines additional tool calls are necessary or if the agent should prepare a final response to send to the user.
  9. Model generates response: Based on the results from the external API and the agent iterations, the model then generates a natural language response for the user that contains the latest currency exchange rate information.

Once your agent is deployed as a Reasoning Engine endpoint in Vertex AI, you can run the following command to get the resource identifier for your remotely deployed agent: 

>>> remote_agent_path = remote_agent.resource_name
projects/954731410984/locations/us-central1/reasoningEngines/8658662864829022208

And now you can import and query the remotely deployed agent in a separate Python application using the Vertex AI SDK for Python, as in: 

remote_agent = reasoning_engines.ReasoningEngine(remote_agent_path)
response = remote_agent.query(input=query)
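
For instance, a self-contained script along these lines would initialize the SDK against the same project and region and reference the deployed agent by its resource name (the project number, engine ID, and the preview module path shown here are placeholders and assumptions to adapt to your environment):

import vertexai
from vertexai.preview import reasoning_engines

# Initialize the SDK for the project and region where the agent is deployed.
vertexai.init(project="your-project-id", location="us-central1")

remote_agent = reasoning_engines.ReasoningEngine(
    "projects/PROJECT_NUMBER/locations/us-central1/reasoningEngines/ENGINE_ID"
)
response = remote_agent.query(
    input="What's the exchange rate from US dollars to Swedish currency today?"
)
print(response["output"])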

Or, you can send queries to your remotely deployed agent from other programming languages using any of the available Vertex AI client libraries, including C#, Java, Node.js, and Go, or by calling the REST API directly.

Benefits of LangChain on Vertex AI and Reasoning Engine

  • Simplified development: LangChain on Vertex AI streamlines agent development with its modular components and intuitive API built from the ground up for creating and deploying AI agents.
  • Flexibility and control: Developers maintain control over critical aspects of agent behavior and functionality at all of the relevant layers underneath your AI agent.
  • Production-ready deployment: Vertex AI's Reasoning Engine handles the complexities of deployment, scaling, and management.
  • Security and scalability: Vertex AI provides a secure and scalable environment for running agents in production.

Start building AI agents with LangChain on Vertex AI

To start building and deploying agents with LangChain on Vertex AI, you can go hands-on with the following developer resources:

By combining the power of LangChain and Vertex AI, developers can use generative models to build intelligent agents that can tackle complex real-world tasks and autonomous workflows.

We’re excited to see what kinds of intelligent, agentic applications you build with Reasoning Engine and LangChain on Vertex AI. Happy coding!
