Build Your First AI Agent With MongoDB and LangChain4j


TL;DR

This tutorial guides you through building an AI agent for movie recommendations using LangChain4j and MongoDB. It covers semantic search with vector embeddings, integrating external APIs, and autonomous task orchestration.

Key Takeaways

• AI agents use reasoning and tools like databases and APIs to achieve goals autonomously, unlike static chatbots.
• LangChain4j is a Java library that simplifies building LLM-powered applications, including agentic systems with frameworks like ReAct.
• The tutorial demonstrates creating a movie recommendation agent with MongoDB for vector search and external APIs for real-time data.
• Prerequisites include Java, Maven, MongoDB Atlas, and API keys for embeddings, the planning model, and streaming availability data.
• Key steps involve setting up dependencies, loading and embedding data, and configuring the agent for multi-step workflows.

What is an AI agent?

An AI agent is a system that can take in information about its environment, decide what to do next, and work toward a goal with minimal human intervention. Instead of behaving like a static chatbot that only answers questions, an agent can plan, use tools (like databases or APIs), and adapt based on results.

While the defining traits of an AI agent are the ability to reason and act, there’s no single paradigm for how an agent must function. One popular approach is the ReAct framework (Reasoning + Acting), where the agent:

Thinks about the problem.
Takes an action (e.g., querying a search engine).
Observes the result.
Repeats until it can deliver a complete answer.
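The code below is a minimal, library-free sketch of that loop. It is not LangChain4j code. The think() method is a hard-coded stand-in for an LLM call. The tool names (searchByPlot, whereToWatch) and their canned return values are hypothetical placeholders, so only the shape of the loop matters here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy ReAct loop: think, act, observe, and repeat until an answer is ready.
// think() is a hard-coded stand-in for an LLM call; tool names are hypothetical.
class ReActSketch {

    // One reasoning step: either a tool call (tool + input) or a final answer.
    record Step(String thought, String tool, String input, String finalAnswer) {}

    static String run(String goal, Map<String, Function<String, String>> tools, int maxSteps) {
        List<String> observations = new ArrayList<>();
        for (int i = 0; i < maxSteps; i++) {
            Step step = think(goal, observations);            // 1. Think
            System.out.println("Thought: " + step.thought());
            if (step.finalAnswer() != null) {
                return step.finalAnswer();                    // enough information: answer
            }
            String observation = tools.get(step.tool()).apply(step.input()); // 2. Act
            observations.add(observation);                    // 3. Observe, then 4. repeat
        }
        return "Gave up after " + maxSteps + " steps";
    }

    // Simulated reasoning: search by plot, then check streaming, then answer.
    static Step think(String goal, List<String> observations) {
        return switch (observations.size()) {
            case 0 -> new Step("Find a title that matches the plot", "searchByPlot", goal, null);
            case 1 -> new Step("Check where that title streams", "whereToWatch", observations.get(0), null);
            default -> new Step("I have enough to answer", null, null,
                    observations.get(0) + " is available on " + observations.get(1));
        };
    }

    public static void main(String[] args) {
        Map<String, Function<String, String>> tools = Map.of(
                "searchByPlot", query -> "The Example Movie", // stand-in for a vector search
                "whereToWatch", title -> "ExampleFlix");      // stand-in for a streaming API
        System.out.println(run("rebels fighting an empire", tools, 5));
    }
}
```

The maxSteps cap matters in real agents. An LLM that never decides it has enough information would otherwise loop forever.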

In short:

Chatbot: Answers questions based on what it already knows
AI agent: Reasons about what it needs to learn, uses tools to gather that information, and works toward a goal

For our movie recommendation agent, we'll use a supervisor pattern, where a planning model orchestrates multiple specialized tools to find movies by plot description, look up streaming availability, and return a clean answer to the user.
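A supervisor differs from a single ReAct loop mainly in delegation. The planner keeps shared state, picks which specialist runs next, and stops once an answer exists. The sketch below shows that routing with the JDK only. It is an illustration, not the langchain4j-agentic API. The plan() rule stands in for the planning model, and the worker names and values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy supervisor: inspects shared state, delegates to one specialized worker at a
// time, and stops once an "answer" exists. plan() stands in for the planning LLM.
class SupervisorSketch {

    static String supervise(String request,
                            Map<String, UnaryOperator<Map<String, String>>> workers,
                            int maxSteps) {
        Map<String, String> state = new HashMap<>();
        state.put("request", request);
        for (int step = 0; step < maxSteps && !state.containsKey("answer"); step++) {
            String next = plan(state);               // supervisor decides who acts next
            state = workers.get(next).apply(state);  // specialist updates the shared state
        }
        return state.getOrDefault("answer", "No answer within " + maxSteps + " steps");
    }

    // Rule-based stand-in for the planner: fill whichever piece is still missing.
    static String plan(Map<String, String> state) {
        if (!state.containsKey("title")) return "findByPlot";
        if (!state.containsKey("streaming")) return "lookUpStreaming";
        return "writeAnswer";
    }

    public static void main(String[] args) {
        Map<String, UnaryOperator<Map<String, String>>> workers = Map.of(
                "findByPlot", s -> { s.put("title", "The Example Movie"); return s; },
                "lookUpStreaming", s -> { s.put("streaming", "ExampleFlix"); return s; },
                "writeAnswer", s -> { s.put("answer", s.get("title") + " streams on " + s.get("streaming")); return s; });
        System.out.println(supervise("rebels fighting an empire", workers, 5));
    }
}
```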

What is LangChain4j?

LangChain4j is an open-source Java library designed to simplify building LLM-powered applications. It provides a unified interface for working with multiple LLM providers (OpenAI, Anthropic, etc.) and vector stores like MongoDB Atlas.

While the name invites comparison to the Python-based LangChain project, LangChain4j is more of a fusion, drawing inspiration from Haystack, LlamaIndex, and the wider AI community, all while staying laser-focused on the needs of Java developers.

Development kicked off in early 2023 during the ChatGPT boom, and while the project is still evolving, its core functionality is stable and production-ready. If you're exploring LLM-powered applications in Java, LangChain4j is a pragmatic and actively maintained option.

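For this tutorial, we're specifically using the langchain4j-agentic module, which provides abstractions for building agentic systems, including the supervisor pattern that will orchestrate our movie search workflow. As its -beta version suffix indicates, this module is still experimental, so its API may change between releases.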

Prerequisites

Before we get started, here's what you'll need:

Java 17 or later installed and ready to go (LangChain4j requires Java 17; I'm using Java 24)
Maven for building the project (version 3.9.10 or later recommended)
A MongoDB Atlas cluster—a free M0 tier is perfect for this tutorial. If you need help setting one up, check out the Get Started with Atlas guide.
API keys:
- Voyage AI for generating embeddings (sign up here)
- OpenAI for the planning model (get your key here)
- Watchmode for streaming availability data (register here)

If you prefer, you can use the OpenAI API for both the embeddings and the chat model.

Make sure your IP address is on your MongoDB Atlas cluster's IP access list (under Network Access), and create a database user with read/write permissions.

Our dataset

For this tutorial, we're using the IMDB Top 1000 Movies dataset from Kaggle. Download the CSV file and place it in your project's src/main/resources directory, naming it imdb_top_1000.csv.

This dataset includes movie titles, plot overviews, genres, directors, and IMDB ratings. We'll be embedding the overview field (the plot description) so users can search for movies semantically—e.g., "Find me a sci-fi movie about rebels fighting an empire"—instead of needing to remember the exact title.
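"Semantic" search here means comparing vectors rather than keywords. Each overview becomes an embedding, the query becomes an embedding too, and the closest vectors win. The tiny sketch below uses cosine similarity, the same metric as the index configured later in this tutorial. The 3-dimensional vectors and labels are made-up toy values standing in for real 1024-dimensional voyage-3 embeddings. The class name is hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class SemanticSearchSketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|); undefined (NaN) for zero vectors.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns labels ordered from most to least similar to the query vector.
    static List<String> rank(float[] query, Map<String, float[]> plotEmbeddings) {
        return plotEmbeddings.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, float[]> e) -> cosine(query, e.getValue())).reversed())
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

A vector search index does the same ranking at scale. It uses an approximate nearest-neighbor structure instead of scoring every document.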

Creating our app

Let's scaffold a new Maven project. In your pom.xml, add the following dependencies and dependency management section inside the <project> element:


    <dependencies>
        <!-- LangChain4j core -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j</artifactId>
            <version>1.5.0</version>
        </dependency>

        <!-- LangChain4j agentic module -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-agentic</artifactId>
            <version>1.5.0-beta11</version>
        </dependency>

        <!-- OpenAI integration for the planning model (kept in step with the core version) -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-open-ai</artifactId>
            <version>1.5.0</version>
        </dependency>

        <!-- MongoDB Atlas integration -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-mongodb-atlas</artifactId>
            <version>1.5.0-beta11</version>
        </dependency>

        <!-- Voyage AI for embeddings -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-voyage-ai</artifactId>
            <version>1.5.0-beta11</version>
        </dependency>

        <!-- MongoDB Java Driver -->
        <dependency>
            <groupId>org.mongodb</groupId>
            <artifactId>mongodb-driver-sync</artifactId>
            <version>5.5.1</version>
        </dependency>

        <!-- OpenCSV for parsing the IMDB dataset -->
        <dependency>
            <groupId>com.opencsv</groupId>
            <artifactId>opencsv</artifactId>
            <version>5.8</version>
        </dependency>

        <!-- Jackson for parsing JSON responses from Watchmode -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.17.2</version>
        </dependency>
    </dependencies>

    <dependencyManagement>
        <dependencies>
            <!-- The BOM only takes effect when imported with scope "import" -->
            <dependency>
                <groupId>dev.langchain4j</groupId>
                <artifactId>langchain4j-bom</artifactId>
                <version>1.5.0</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>


These dependencies give us everything we need: the LangChain4j agentic framework, MongoDB and OpenAI integrations, embedding support via Voyage AI, and utilities for parsing CSV and JSON data.

Connecting to MongoDB

Now, let's create the main application class that will handle connecting to MongoDB, loading our movie data, and setting up our agent.

Create a file at src/main/java/com/mongodb/movieagent/MovieAgentApp.java:

package com.mongodb.movieagent;

import com.mongodb.client.*;
import dev.langchain4j.model.voyageai.VoyageAiEmbeddingModel;
import dev.langchain4j.store.embedding.mongodb.IndexMapping;
import dev.langchain4j.store.embedding.mongodb.MongoDbEmbeddingStore;
import org.bson.Document;

import java.util.HashSet;

public class MovieAgentApp {

    public static final String databaseName = "movie_search";
    public static final String collectionName = "movies";
    public static final String indexName = "vector_index";

    public static void main(String[] args) throws InterruptedException {
        String embeddingApiKey = System.getenv("VOYAGE_AI_KEY");
        String mongodbUri = System.getenv("MONGODB_URI");
        String watchmodeKey = System.getenv("WATCHMODE_KEY");
        String openAiKey = System.getenv("OPENAI_KEY");

        MongoClient mongoClient = MongoClients.create(mongodbUri);

        VoyageAiEmbeddingModel embeddingModel = VoyageAiEmbeddingModel.builder()
                .apiKey(embeddingApiKey)
                .modelName("voyage-3")
                .build();
    }
}


This sets up our connection to MongoDB using the connection string from our environment variables. We're also instantiating the Voyage AI embedding model, which we'll use to convert movie plot descriptions into vector embeddings.
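System.getenv() returns null for any variable that isn't set. A missing MONGODB_URI or API key would therefore surface as a less descriptive error, possibly only at the first API call. A small fail-fast helper makes the problem explicit at startup. The EnvConfig class and its requireEnv method below are hypothetical, not part of the tutorial's code. The helper takes a Map so it can be tested without touching real environment variables.

```java
import java.util.Map;

class EnvConfig {

    // Returns a required setting, failing fast with a clear message if it is missing or blank.
    static String requireEnv(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value;
    }
}
```

In the app, you would call it as `EnvConfig.requireEnv(System.getenv(), "MONGODB_URI")`. The same pattern works for VOYAGE_AI_KEY, WATCHMODE_KEY, and OPENAI_KEY.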

The voyage-3 model generates 1024-dimensional embeddings, which we'll need to specify when creating our vector search index. To learn more about Voyage AI's models, check out their blog post about voyage-3.

We're using Voyage AI to generate our embeddings and a separate OpenAI model to plan and orchestrate our AI agent. If you prefer, you can use OpenAI for both: swap VoyageAiEmbeddingModel for OpenAiEmbeddingModel, and update any parameter typed as VoyageAiEmbeddingModel. Declaring such parameters as LangChain4j's EmbeddingModel interface, which both classes implement, avoids this. The swap doesn't work in the other direction, because Voyage AI offers embedding and reranking models but no chat models, so LangChain4j has no Voyage AI chat model to plan with.
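Swapping providers stays a small change when the rest of the code depends on an abstraction rather than a concrete class. The JDK-only sketch below illustrates that idea with a hypothetical TextEmbedder interface, which mirrors the role LangChain4j's EmbeddingModel plays but is not that interface. It also includes a deterministic fake implementation, which can be handy for offline tests. Because the index dimension is read from the model, it follows whichever model is plugged in.

```java
import java.util.List;

// Hypothetical provider-neutral abstraction (not LangChain4j's API).
interface TextEmbedder {
    float[] embed(String text);
    int dimension();
}

class IngestionSketch {
    // The index dimension always follows the model, as with embeddingModel.dimension().
    static int indexDimensionFor(TextEmbedder embedder) {
        return embedder.dimension();
    }

    // Ingestion code only sees the interface, so swapping providers doesn't touch it.
    static List<float[]> embedAll(TextEmbedder embedder, List<String> overviews) {
        return overviews.stream().map(embedder::embed).toList();
    }
}

// Deterministic fake for offline tests: counts characters into buckets. Not a real embedding.
class FakeEmbedder implements TextEmbedder {
    private final int dims;

    FakeEmbedder(int dims) { this.dims = dims; }

    public float[] embed(String text) {
        float[] v = new float[dims];
        for (char c : text.toCharArray()) v[c % dims] += 1f;
        return v;
    }

    public int dimension() { return dims; }
}
```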

Next, still inside main() and after the embedding model is created, we'll configure our embedding store and automatically create a vector search index if one doesn't already exist:

        IndexMapping indexMapping = IndexMapping.builder()
                .dimension(embeddingModel.dimension())
                .metadataFieldNames(new HashSet<>())
                .build();

        MongoDbEmbeddingStore embeddingStore = MongoDbEmbeddingStore.builder()
                .databaseName(databaseName)
                .collectionName(collectionName)
                .createIndex(checkIndexExists(mongoClient))
                .indexName(indexName)
                .indexMapping(indexMapping)
                .fromClient(mongoClient)
                .build();

        if(checkDataExists(mongoClient)) {
            loadDataFromCSV(embeddingStore, embeddingModel);
        }


Let's break down what's happening here:

The IndexMapping tells LangChain4j how to configure the vector search index. We're setting the dimension to match our embedding model (1024 for voyage-3) and leaving metadataFieldNames empty since we don't need to filter on metadata fields for this example.

The MongoDbEmbeddingStore builder does several things:

Points to our movie_search.movies collection
Checks if an index already exists using our helper method checkIndexExists()
If no index exists, automatically creates one with the name vector_index
Uses our IndexMapping to define the index structure

The resulting vector search index looks like this:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1024,
      "similarity": "cosine"
    }
  ]
}
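The dimension is the only model-specific number in that definition. If you did populate metadataFieldNames, the index would also gain filter fields. The JDK-only sketch below builds such a definition as JSON text. The class and method names are hypothetical, and this is not how LangChain4j creates the index internally. The "vector" and "filter" field types and the cosine, euclidean, and dotProduct similarity options come from Atlas Vector Search. The filter paths you pass in are assumptions, because the exact paths LangChain4j uses for stored metadata are not covered here.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class VectorIndexDefinition {

    private static final Set<String> SIMILARITIES = Set.of("cosine", "euclidean", "dotProduct");

    // Builds an Atlas Vector Search index definition: one vector field plus optional filter fields.
    static String build(String vectorPath, int numDimensions, String similarity, List<String> filterPaths) {
        if (numDimensions <= 0) throw new IllegalArgumentException("numDimensions must be positive");
        if (!SIMILARITIES.contains(similarity)) throw new IllegalArgumentException("unknown similarity: " + similarity);
        String vectorField = String.format(
                "{\"type\": \"vector\", \"path\": \"%s\", \"numDimensions\": %d, \"similarity\": \"%s\"}",
                vectorPath, numDimensions, similarity);
        Stream<String> filterFields = filterPaths.stream()
                .map(p -> String.format("{\"type\": \"filter\", \"path\": \"%s\"}", p));
        return Stream.concat(Stream.of(vectorField), filterFields)
                .collect(Collectors.joining(", ", "{\"fields\": [", "]}"));
    }
}
```

With no filter paths, `build("embedding", 1024, "cosine", List.of())` produces the same fields as the definition shown above, on a single line.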


Finally, we check if the collection already has data using checkDataExists(), and if it's empty, we load and embed the movie dataset from our CSV file.

Now, let's add those helper methods at the bottom of the class:

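    public static void loadDataFromCSV(
            MongoDbEmbeddingStore embeddingStore,
            VoyageAiEmbeddingModel embeddingModel
    ) throws InterruptedException {
        System.out.println("Loading data...");

        MovieEmbeddingService embeddingService = new MovieEmbeddingService(embeddingStore, embeddingModel);
        embeddingService.ingestMoviesFromCsv();

        System.out.println("Movie data loaded successfully!");
        System.out.println("Waiting 5 seconds for indexing to complete...");
        Thread.sleep(5000);
    }

    // Returns true when the collection is EMPTY, i.e. when the movie data still needs loading.
    public static boolean checkDataExists(MongoClient mongoClient) {
        MongoCollection<Document> collection = mongoClient
            .getDatabase(databaseName)
            .getCollection(collectionName);
        return collection.find().first() == null;
    }

    // Returns true when NO vector search index named indexName exists,
    // i.e. when LangChain4j should create one.
    public static boolean checkIndexExists(MongoClient mongoClient) {
        MongoCollection<Document> collection = mongoClient
            .getDatabase(databaseName)
            .getCollection(collectionName);

        // Vector search indexes are Atlas search indexes: they are returned by
        // listSearchIndexes(), not listIndexes(), and each entry stores its name in "name".
        try (MongoCursor<Document> indexes = collection.listSearchIndexes().iterator()) {
            while (indexes.hasNext()) {
                Document index = indexes.next();
                if (indexName.equals(index.getString("name"))) {
                    return false;
                }
            }
        } catch (com.mongodb.MongoCommandException e) {
            // Listing may fail if the collection doesn't exist yet (e.g. on the first run);
            // in that case there is no index either.
            return true;
        }
        return true;
    }
}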


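The checkIndexExists() method iterates through the collection's indexes and returns false if it finds one named vector_index, telling LangChain4j to skip index creation. Otherwise, it returns true. Note that, despite their names, both helpers return true when something is missing. For example, checkDataExists() returns true when the collection is empty, which is exactly when we need to load the movie data.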

The five-second sleep after loading data gives MongoDB Atlas time to build the vector search index. Atlas builds search indexes asynchronously and indexing is eventually consistent, so this wait makes it likely, though not certain, that our index is queryable before we start searching. In production, you'd poll the index status until Atlas reports it as queryable instead of relying on a fixed delay, but for a tutorial, a brief sleep does the job.
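A polling loop with a timeout is a more robust alternative to a fixed sleep. The JDK-only sketch below retries a readiness check with capped exponential backoff. The IndexReadiness class is hypothetical, and the check itself is left as a BooleanSupplier so the loop can be tested without Atlas.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

class IndexReadiness {

    // Polls isReady until it returns true or the timeout elapses.
    // Returns false on timeout or if the thread is interrupted (the interrupt flag is preserved).
    static boolean waitUntilReady(BooleanSupplier isReady, Duration timeout, Duration initialDelay) {
        long deadline = System.nanoTime() + timeout.toNanos();
        long delayMillis = Math.max(1, initialDelay.toMillis());
        while (true) {
            if (isReady.getAsBoolean()) return true;
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000;
            if (remainingMillis <= 0) return false;
            try {
                Thread.sleep(Math.min(delayMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            delayMillis = Math.min(delayMillis * 2, 5_000); // exponential backoff, capped at 5 s
        }
    }
}
```

In the tutorial's setting, the supplier might call the driver's `collection.listSearchIndexes()` and check the `queryable` field of the entry whose `name` is `vector_index`. Treat that wiring as an assumption to verify against the MongoDB Java driver documentation.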

Importing our data

Now, we need to actually load the IMDB dataset, parse it, and convert the plot descriptions into vector embeddings. We'll create two classes: Movie to represent each row in the CSV, and MovieEmbeddingService to handle the embedding and storage logic.
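We rely on OpenCSV for parsing rather than splitting lines on commas. Plot overviews often contain commas themselves, so the CSV wraps such fields in double quotes. The JDK-only sketch below shows the quote-aware splitting involved. It handles quoted fields and doubled "" escapes but not line breaks inside quoted fields, which is one more reason to use a real CSV library. The CsvLineSketch class is hypothetical and not part of the tutorial's code.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLineSketch {

    // Splits a single CSV line, honoring double-quoted fields and "" escapes.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas inside quotes are plain text
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```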

Creating the Movie model

Create src/main/java/com/mongodb/movieagent/Movie.java:

package com.mongodb.movieagent;

import com.opencsv.bean.CsvBindByPosition;

public class Movie {

    @CsvBindByPosition(position = 0)
    private String posterLink;

    @CsvBindByPosition(position = 1)
    private String title;

    @CsvBindByPosition(position = 2)
    private String year;

    @CsvBindByPosition(position = 3)
    private String certificate;

    @CsvBindByPosition(position = 4)
    private String runtime;

    @CsvBindByPosition(position = 5)
    private String genre;

    @CsvBindByPosition(position = 6)
    private String imdbRating;

    @CsvBindByPosition(position = 7)
    private String overview;

    @CsvBindByPosition(position = 8)
    private String metaScore;

    @CsvBindByPosition(position = 9)
    private String director;

    @CsvBindByPosition(position = 10)
    private String star1;

    @CsvBindByPosition(position = 11)
    private String star2;

    @CsvBindByPosition(position = 12)
    private String star3;

    @CsvBindByPosition(position = 13)
    private String star4;

    @CsvBindByPosition(position = 14)
    private String numberOfVotes;

    @CsvBindByPosition(position = 15)
    private String gross;

    public Movie() {}

    public String getPosterLink() { return posterLink; }
    public void setPosterLink(String posterLink) { this.posterLink = posterLink; }

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public String getYear() { return year; }
    public void setYear(String year) { this.year = year; }

    public String getCertificate() { return certificate; }
    public void setCertificate(String certificate) { this.certificate = certificate; }

    public String getRuntime() { return runtime; }
    public void setRuntime(String runtime) { this.runtime = runtime; }

    public String getGenre() { return genre; }
    public void setGenre(String genre) { this.genre = genre; }

    public String getImdbRating() { return imdbRating; }
    public void setImdbRating(String imdbRating) { this.imdbRating = imdbRating; }

    public String getOverview() { return overview; }
    public void setOverview(String overview) { this.overview = overview; }

    public String getMetaScore() { return metaScore; }
    public void setMetaScore(String metaScore) { this.metaScore = metaScore; }

    public String getDirector() { return director; }
    public void setDirector(String director) { this.director = director; }

    public String getStar1() { return star1; }
    public void setStar1(String star1) { this.star1 = star1; }

    public String getStar2() { return star2; }
    public void setStar2(String star2) { this.star2 = star2; }

    public String getStar3() { return star3; }
    public void setStar3(String star3) { this.star3 = star3; }

    public String getStar4() { return star4; }
    public void setStar4(String star4) { this.star4 = star4; }

    public String getNumberOfVotes() { return numberOfVotes; }
    public void setNumberOfVotes(String numberOfVotes) { this.numberOfVotes = numberOfVotes; }

    public String getGross() { return gross; }
    public void setGross(String gross) { this.gross = gross; }
}

DEV.to - Trending Guides