Build Your First AI Agent With MongoDB and LangChain4j
TL;DR
This tutorial guides you through building an AI agent for movie recommendations using LangChain4j and MongoDB. It covers semantic search with vector embeddings, integrating external APIs, and autonomous task orchestration.
Key Takeaways
- AI agents use reasoning and tools like databases and APIs to achieve goals autonomously, unlike static chatbots.
- LangChain4j is a Java library that simplifies building LLM-powered applications, including agentic systems with frameworks like ReAct.
- The tutorial demonstrates creating a movie recommendation agent with MongoDB for vector search and external APIs for real-time data.
- Prerequisites include Java, Maven, MongoDB Atlas, and API keys for embeddings and planning models.
- Key steps involve setting up dependencies, loading and embedding data, and configuring the agent for multi-step workflows.
AI agents are everywhere right now. We've all heard the pitch: They reason about problems, use tools autonomously, and chain together multiple steps to accomplish goals without constant hand-holding. They're being deployed to book flights, analyze data pipelines, and handle customer support—taking on tasks that previously required either rigid automation or human intervention.
This tutorial is an introduction to building your first AI agent with LangChain4j. We'll create a movie recommendation agent that:
- Understands natural language descriptions of plots ("a sci-fi movie about rebels fighting an empire").
- Searches semantically through a database using vector embeddings.
- Calls external APIs to fetch real-time streaming availability.
- Orchestrates these steps autonomously based on the query.
- Returns clean, conversational answers instead of raw JSON.
This agent uses MongoDB Atlas for vector search, LangChain4j's agentic framework for orchestration, and the Watchmode API for streaming data. By the end, you'll understand not just how to wire up these components, but how agentic systems actually work: how LLMs plan multi-step workflows, how tools share state, and how to give your agent instructions without hardcoding every possible scenario.
Vector search is what will allow us to search our data with natural language, and if you'd like to learn more and earn a skills badge, check out our Vector Search Fundamentals skills badge.
If you just want the code, it's available in this GitHub repo.
What is an AI agent?
An AI agent is a system that can take in information about its environment, decide what to do next, and work toward a goal with minimal human intervention. Instead of behaving like a static chatbot that only answers questions, an agent can plan, use tools (like databases or APIs), and adapt based on results.
While the defining traits of an AI agent are the ability to reason and act, there’s no single paradigm for how an agent must function. One popular approach is the ReAct framework (Reasoning + Acting), where the agent:
- Thinks about the problem.
- Takes an action (e.g., querying a search engine).
- Observes the result.
- Repeats until it can deliver a complete answer.
In short:
- Chatbot: Answers questions based on what it already knows
- AI agent: Reasons about what it needs to learn, uses tools to gather that information, and works toward a goal
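To make the think/act/observe loop concrete, here is a minimal sketch in plain Java. It does not use LangChain4j or a real LLM: the `think` method is a hard-coded stand-in for the model's reasoning, and the tool names (`searchMovies`, `findStreaming`) and the "ExampleFlix" service are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** A toy think/act/observe loop. The "reasoning" is hard-coded, not an LLM. */
final class ReActLoopSketch {

    /** One decision: either call a tool with some input, or finish with an answer. */
    record Step(String tool, String input, String finalAnswer) {
        boolean isFinal() {
            return finalAnswer != null;
        }
    }

    /** Hypothetical planner: picks the next step from what has been observed so far. */
    static Step think(String goal, List<String> observations) {
        if (observations.isEmpty()) {
            return new Step("searchMovies", goal, null);
        }
        if (observations.size() == 1) {
            return new Step("findStreaming", observations.get(0), null);
        }
        return new Step(null, null,
                observations.get(0) + " is available on " + observations.get(1));
    }

    static String run(String goal, Map<String, Function<String, String>> tools, int maxSteps) {
        List<String> observations = new ArrayList<>();
        for (int i = 0; i < maxSteps; i++) {
            Step step = think(goal, observations);                      // think
            if (step.isFinal()) {
                return step.finalAnswer();                              // complete answer
            }
            String result = tools.get(step.tool()).apply(step.input()); // act
            observations.add(result);                                   // observe
        }
        return "No answer within " + maxSteps + " steps";
    }

    public static void main(String[] args) {
        // Stub tools that return canned data instead of calling a database or an API.
        Map<String, Function<String, String>> tools = Map.of(
                "searchMovies", query -> "Star Wars",
                "findStreaming", title -> "ExampleFlix");
        System.out.println(run("a sci-fi movie about rebels fighting an empire", tools, 5));
    }
}
```

The `maxSteps` cap is a design choice worth keeping in any real agent loop: it stops the agent from cycling forever when it never reaches a final answer.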
For our movie recommendation agent, we'll use a supervisor pattern, where a planning model orchestrates multiple specialized tools to find movies by plot description, look up streaming availability, and return a clean answer to the user.
What is LangChain4j?
LangChain4j is an open-source Java library designed to simplify building LLM-powered applications. It provides a unified interface for working with multiple LLM providers (OpenAI, Anthropic, etc.) and vector stores like MongoDB Atlas.
While the name invites comparison to the Python-based LangChain project, LangChain4j is more of a fusion, drawing inspiration from Haystack, LlamaIndex, and the wider AI community, all while staying laser-focused on the needs of Java developers.
Development kicked off in early 2023 during the ChatGPT boom, and while the project is still evolving, its core functionality is stable and production-ready. If you're exploring LLM-powered applications in Java, LangChain4j is a pragmatic and actively maintained option.
For this tutorial, we're specifically using the langchain4j-agentic module, which provides abstractions for building agentic systems, including the supervisor pattern that will orchestrate our movie search workflow.
Prerequisites
Before we get started, here's what you'll need:
- Java 17 or later installed and ready to go (I'm using Java 24)
- Maven for building the project (version 3.9.10 or later recommended)
- A MongoDB Atlas cluster—a free M0 tier is perfect for this tutorial. If you need help setting one up, check out the Get Started with Atlas guide.
- API keys:
  - Voyage AI for generating embeddings (sign up here). You can use the OpenAI API for both the embeddings and the chat model, if you prefer.
  - OpenAI for the planning model (get your key here)
  - Watchmode for streaming availability data (register here)
 
Make sure your IP address is whitelisted in your MongoDB Atlas cluster's network access settings, and create a database user with read/write permissions.
Our dataset
For this tutorial, we're using the IMDB Top 1000 Movies dataset from Kaggle. Download the CSV file and place it in your project's src/main/resources directory, naming it imdb_top_1000.csv.
This dataset includes movie titles, plot overviews, genres, directors, and IMDB ratings. We'll be embedding the overview field (the plot description) so users can search for movies semantically—e.g., "Find me a sci-fi movie about rebels fighting an empire"—instead of needing to remember the exact title.
Creating our app
Let's scaffold a new Maven project. Create a pom.xml file with the following dependencies:
    <dependencies>
        <!-- LangChain4j core -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j</artifactId>
            <version>1.5.0</version>
        </dependency>
        <!-- LangChain4j agentic module -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-agentic</artifactId>
            <version>1.5.0-beta11</version>
        </dependency>
        <!-- OpenAI integration for the planning model -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-open-ai</artifactId>
            <version>1.5.0</version>
        </dependency>
        <!-- MongoDB Atlas integration -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-mongodb-atlas</artifactId>
            <version>1.5.0-beta11</version>
        </dependency>
        <!-- Voyage AI for embeddings -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-voyage-ai</artifactId>
            <version>1.5.0-beta11</version>
        </dependency>
        <!-- MongoDB Java Driver -->
        <dependency>
            <groupId>org.mongodb</groupId>
            <artifactId>mongodb-driver-sync</artifactId>
            <version>5.5.1</version>
        </dependency>
        <!-- OpenCSV for parsing the IMDB dataset -->
        <dependency>
            <groupId>com.opencsv</groupId>
            <artifactId>opencsv</artifactId>
            <version>5.8</version>
        </dependency>
        <!-- Jackson for parsing JSON responses from Watchmode -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.17.2</version>
        </dependency>
    </dependencies>
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>dev.langchain4j</groupId>
                <artifactId>langchain4j-bom</artifactId>
                <version>1.5.0-beta11</version>
                <type>pom</type>
            </dependency>
        </dependencies>
    </dependencyManagement>
</project>
These dependencies give us everything we need: the LangChain4j agentic framework, MongoDB and OpenAI integrations, embedding support via Voyage AI, and utilities for parsing CSV and JSON data.
Connecting to MongoDB
Now, let's create the main application class that will handle connecting to MongoDB, loading our movie data, and setting up our agent.
Create a file at MovieAgentApp.java:
package com.mongodb.movieagent;
import com.mongodb.client.*;
import dev.langchain4j.model.voyageai.VoyageAiEmbeddingModel;
import dev.langchain4j.store.embedding.mongodb.IndexMapping;
import dev.langchain4j.store.embedding.mongodb.MongoDbEmbeddingStore;
import org.bson.Document;
import java.util.HashSet;
public class MovieAgentApp {
    public static final String databaseName = "movie_search";
    public static final String collectionName = "movies";
    public static final String indexName = "vector_index";
    public static void main(String[] args) throws InterruptedException {
        String embeddingApiKey = System.getenv("VOYAGE_AI_KEY");
        String mongodbUri = System.getenv("MONGODB_URI");
        String watchmodeKey = System.getenv("WATCHMODE_KEY");
        String openAiKey = System.getenv("OPENAI_KEY");
        MongoClient mongoClient = MongoClients.create(mongodbUri);
        VoyageAiEmbeddingModel embeddingModel = VoyageAiEmbeddingModel.builder()
                .apiKey(embeddingApiKey)
                .modelName("voyage-3")
                .build();
    }
}
This sets up our connection to MongoDB using the connection string from our environment variables. We're also instantiating the Voyage AI embedding model, which we'll use to convert movie plot descriptions into vector embeddings.
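`System.getenv` returns `null` for a variable that isn't set, which tends to surface later as a confusing error from a client builder. One option is to fail fast with a clear message. The helper below is a hypothetical addition, not part of the original app; it takes the environment as a `Map` so it can be exercised without touching real variables.

```java
import java.util.Map;

/** Hypothetical helper for reading required settings; not part of the original app. */
final class EnvConfig {

    /** Returns the value for the given name, or throws if it is missing or blank. */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            // System.getenv() with no arguments returns the whole environment as a Map.
            String mongodbUri = require(System.getenv(), "MONGODB_URI");
            System.out.println("MONGODB_URI is set (" + mongodbUri.length() + " characters)");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

In `MovieAgentApp`, each `System.getenv("...")` call could be swapped for `EnvConfig.require(System.getenv(), "...")` if you want this behavior.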
The voyage-3 model generates 1024-dimensional embeddings, which we'll need to specify when creating our vector search index. To learn more about Voyage AI's models, check out their blog post about voyage-3.
We use the Voyage AI model to generate our embeddings and a separate OpenAI model to plan and orchestrate our AI agent. If you prefer, you can use OpenAI for both: just swap out your VoyageAiEmbeddingModel for the OpenAiEmbeddingModel. The reverse doesn't work, because Voyage AI is not supported as a chat model in LangChain4j.
Next, we'll configure our embedding store and automatically create a vector search index if one doesn't already exist:
        IndexMapping indexMapping = IndexMapping.builder()
                .dimension(embeddingModel.dimension())
                .metadataFieldNames(new HashSet<>())
                .build();
        MongoDbEmbeddingStore embeddingStore = MongoDbEmbeddingStore.builder()
                .databaseName(databaseName)
                .collectionName(collectionName)
                .createIndex(checkIndexExists(mongoClient))
                .indexName(indexName)
                .indexMapping(indexMapping)
                .fromClient(mongoClient)
                .build();
        if(checkDataExists(mongoClient)) {
            loadDataFromCSV(embeddingStore, embeddingModel);
        }
Let's break down what's happening here:
The IndexMapping tells LangChain4j how to configure the vector search index. We're setting the dimension to match our embedding model (1024 for voyage-3) and leaving metadataFieldNames empty since we don't need to filter on metadata fields for this example.
The MongoDbEmbeddingStore builder does several things:
- Points to our movie_search.movies collection
- Checks if an index already exists using our helper method checkIndexExists()
- If no index exists, automatically creates one with the name vector_index
- Uses our IndexMapping to define the index structure
The resulting vector search index looks like this:
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1024,
      "similarity": "cosine"
    }
  ]
}
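The "similarity": "cosine" setting means two embeddings are compared by the angle between them rather than by their length. Atlas computes this for us, so the following is only an illustrative sketch of the underlying formula, using tiny two- and three-dimensional vectors instead of real 1024-dimensional embeddings.

```java
/** Illustrative cosine similarity between two embedding vectors. */
final class CosineSimilarity {

    /** Returns a value in [-1, 1]: 1 for the same direction, 0 for orthogonal vectors. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[] sameDirection = {3f, 0f};
        float[] orthogonal = {0f, 2f};
        System.out.println(cosine(query, sameDirection));
        System.out.println(cosine(query, orthogonal));
    }
}
```

Because only direction matters, a long plot overview and a short query can still score as close matches when they describe the same idea.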
Finally, we check if the collection already has data using checkDataExists(), and if it's empty, we load and embed the movie dataset from our CSV file.
Now, let's add those helper methods at the bottom of the class:
    public static void loadDataFromCSV(
            MongoDbEmbeddingStore embeddingStore,
            VoyageAiEmbeddingModel embeddingModel
    ) throws InterruptedException {
        System.out.println("Loading data...");
        MovieEmbeddingService embeddingService = new MovieEmbeddingService(embeddingStore, embeddingModel);
        embeddingService.ingestMoviesFromCsv();
        System.out.println("Movie data loaded successfully!");
        System.out.println("Waiting 5 seconds for indexing to complete...");
        Thread.sleep(5000);
    }
    public static boolean checkDataExists(MongoClient mongoClient) {
        MongoCollection<Document> collection = mongoClient
            .getDatabase(databaseName)
            .getCollection(collectionName);
        return collection.find().first() == null;
    }
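    // Returns true when the vector search index is missing and still needs to be created.
    public static boolean checkIndexExists(MongoClient mongoClient) {
        MongoCollection<Document> collection = mongoClient
            .getDatabase(databaseName)
            .getCollection(collectionName);
        // Atlas Vector Search indexes are reported by listSearchIndexes(), not listIndexes().
        try(MongoCursor<Document> indexes = collection.listSearchIndexes().iterator()) {
            while (indexes.hasNext()) {
                Document index = indexes.next();
                // Each index document stores its name under the "name" key.
                if (indexName.equals(index.getString("name"))) {
                    return false;
                }
            }
        }
        return true;
    }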
}
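The checkIndexExists() method iterates through the indexes on the collection and returns false if it finds one named vector_index, telling LangChain4j to skip index creation. Note that both helpers return true when the thing they check for is missing: checkIndexExists() returns true when the index still needs to be created, and checkDataExists() returns true when the collection is empty, which is what triggers the data load.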
The five-second sleep after loading data gives MongoDB Atlas time to build the vector search index. Atlas indexing is eventually consistent, so newly inserted documents are not searchable right away. The pause is usually long enough for a dataset this small, but it is not a guarantee. In production, you'd want more robust index readiness checking, but for a tutorial, a brief sleep does the job.
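As a rough idea of what more robust readiness checking could look like, here is a small polling helper in plain Java. It is a sketch, not what the tutorial code does: the `ReadinessPoller` class is hypothetical, and the check itself is passed in as a `BooleanSupplier`. In the real app, that supplier would ask Atlas whether the index is ready (for example, by inspecting what the driver's listSearchIndexes() reports), and that wiring is an assumption left out here.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper that polls a condition until it holds or a timeout expires. */
final class ReadinessPoller {

    /**
     * Calls the check repeatedly, sleeping between attempts.
     * Returns true as soon as the check passes, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier ready, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (ready.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Stand-in condition: becomes true roughly 300 ms after start.
        BooleanSupplier indexReady =
                () -> System.nanoTime() - start > Duration.ofMillis(300).toNanos();
        boolean ok = waitUntil(indexReady, Duration.ofSeconds(5), Duration.ofMillis(100));
        System.out.println(ok ? "Index ready" : "Timed out waiting for index");
    }
}
```

Compared with a fixed sleep, polling returns as soon as the condition holds and reports a timeout explicitly instead of silently searching an index that isn't ready.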
Importing our data
Now, we need to actually load the IMDB dataset, parse it, and convert the plot descriptions into vector embeddings. We'll create two classes: Movie to represent each row in the CSV, and MovieEmbeddingService to handle the embedding and storage logic.
Creating the Movie model
Create Movie.java:
package com.mongodb.movieagent;  
import com.opencsv.bean.CsvBindByPosition;  
public class Movie {  
    @CsvBindByPosition(position = 0)  
    private String posterLink;  
    @CsvBindByPosition(position = 1)  
    private String title;  
    @CsvBindByPosition(position = 2)  
    private String year;  
    @CsvBindByPosition(position = 3)  
    private String certificate;  
    @CsvBindByPosition(position = 4)  
    private String runtime;  
    @CsvBindByPosition(position = 5)  
    private String genre;  
    @CsvBindByPosition(position = 6)  
    private String imdbRating;  
    @CsvBindByPosition(position = 7)  
    private String overview;  
    @CsvBindByPosition(position = 8)  
    private String metaScore;  
    @CsvBindByPosition(position = 9)  
    private String director;  
    @CsvBindByPosition(position = 10)  
    private String star1;  
    @CsvBindByPosition(position = 11)  
    private String star2;  
    @CsvBindByPosition(position = 12)  
    private String star3;  
    @CsvBindByPosition(position = 13)  
    private String star4;  
    @CsvBindByPosition(position = 14)  
    private String numberOfVotes;  
    @CsvBindByPosition(position = 15)  
    private String gross;  
    public Movie() {}  
    public String getPosterLink() { return posterLink; }  
    public void setPosterLink(String posterLink) { this.posterLink = posterLink; }  
    public String getTitle() { return title; }  
    public void setTitle(String title) { this.title = title; }  
    public String getYear() { return year; }  
    public void setYear(String year) { this.year = year; }  
    public String getCertificate() { return certificate; }  
    public void setCertificate(String certificate) { this.certificate = certificate; }  
    public String getRuntime() { return runtime; }  
    public void setRuntime(String runtime) { this.runtime = runtime; }  
    public String getGenre() { return genre; }  
    public void setGenre(String genre) { this.genre = genre; }  
    public String getImdbRating() { return imdbRating; }  
    public void setImdbRating(String imdbRating) { this.imdbRating = imdbRating; }  
    public String getOverview() { return overview; }  
    public void setOverview(String overview) { this.overview = overview; }  
    public String getMetaScore() { return metaScore; }  
    public void setMetaScore(String metaScore) { this.metaScore = metaScore; }  
    public String getDirector() { return director; }  
    public void setDirector(String director) { this.director = director; }  
    public String getStar1() { return star1; }  
    public void setStar1(String star1) { this.star1 = star1; }  
    public String getStar2() { return star2; }  
    public void setStar2(String star2) { this.star2 = star2; }  
    public String getStar3() { return star3; }  
    public void setStar3(String star3) { this.star3 = star3; }  
    public String getStar4() { return star4; }  
    public void setStar4(String star4) { this.