Spring AI with MongoDB Vector Store: Retrieving Source URLs for RAG Applications

Retrieval-Augmented Generation (RAG) has become a popular way to ground chatbot answers in your own content. A common Java stack pairs Spring AI with a MongoDB vector store to handle retrieval for conversational applications.

Spring AI simplifies building chat applications by providing integration points for language models, while MongoDB serves as an effective, scalable vector store ideal for retrieving relevant content based on semantic similarity. If you’re looking to incorporate helpful context, like source URLs, into your chatbot responses, you’ll face a unique set of challenges. Let’s look at how you can overcome them and successfully implement source URL retrieval for your RAG application.

Setting the Entry Point with Spring AI’s ChatController

To get started with Spring AI and MongoDB integration, you first need a proper controller to handle user queries. A common approach is creating a ChatController class that generates responses using Spring’s integration with OllamaChatModel and the MongoDB Vector Store.

Using dependency injection, your ChatController can seamlessly connect with required components, keeping your code manageable:

@RestController
@RequestMapping("/chat")
public class ChatController {

    private final OllamaChatModel chatModel;
    private final VectorStore vectorStore;

    @Autowired
    public ChatController(OllamaChatModel chatModel, VectorStore vectorStore) {
        this.chatModel = chatModel;
        this.vectorStore = vectorStore;
    }

    @PostMapping("/generate")
    public ResponseModel generateResponse(@RequestBody ChatRequest request) {
        // Placeholder: the retrieval and source-URL logic is filled in later in this article.
        return new ResponseModel();
    }
}

In this setup, your @RestController exposes an API endpoint that handles incoming user questions, fetching data from MongoDB through vector embeddings.

Configuring MongoDB VectorStore in Spring AI

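Spring AI ships a Spring Boot starter for MongoDB Atlas. It was published as spring-ai-mongodb-atlas-store-spring-boot-starter in the milestone releases and renamed to spring-ai-starter-vector-store-mongodb-atlas for 1.0. The connection itself uses the standard Spring Data MongoDB properties, while the vector store has its own spring.ai.vectorstore.mongodb.* properties.

Add these configuration properties to your application.properties file:

spring.data.mongodb.uri=mongodb+srv://username:password@cluster.mongodb.net
spring.data.mongodb.database=your_database_name
spring.ai.vectorstore.mongodb.collection-name=vector_collection

With these in place, the starter can auto-configure a VectorStore bean backed by the named collection. Property names are worth checking against the reference documentation for the Spring AI version you use.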

Storing Metadata with Source URLs in MongoDB VectorStore

Your MongoDB collection likely has a standardized structure that stores vector embeddings alongside metadata. To keep track of source URLs, you’ll need an extra metadata field that’s easily retrievable.

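Here’s an example structure of the MongoDB document (the content field holds the chunk text that was embedded):

{
  "_id": "12345",
  "content": "Spring AI is an application framework for AI engineering.",
  "embedding": [0.123, 0.022, ..., 0.456],
  "metadata": {
    "title": "Spring AI Guide",
    "url": "https://spring.io/projects/spring-ai"
  }
}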

Including the "url" metadata property helps later when you aim to display the source of the content in user responses. It serves as a direct reference to the original information source.
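To make the ingestion side concrete, here is a minimal sketch of a helper that builds such a metadata map and rejects malformed links before they reach the database. The class name SourceMetadataBuilder is hypothetical and not part of Spring AI; the assumption is that you pass the resulting map as the metadata of a Spring AI Document when you add content to the VectorStore.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Spring AI): builds the metadata map
// that accompanies a chunk of text at ingestion time.
final class SourceMetadataBuilder {

    private SourceMetadataBuilder() {
    }

    static Map<String, Object> build(String title, String url) {
        // URI.create throws IllegalArgumentException on invalid syntax.
        URI parsed = URI.create(url);
        String scheme = parsed.getScheme();
        boolean isHttp = scheme != null
            && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
        // Require an http(s) URL with a host, so a bad link fails here
        // instead of surfacing later in a chat answer.
        if (!isHttp || parsed.getHost() == null) {
            throw new IllegalArgumentException("Expected an absolute http(s) URL: " + url);
        }
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("title", title);
        metadata.put("url", parsed.toString());
        return metadata;
    }
}
```

Validating at ingestion is a design choice: it keeps the retrieval path simple, because every stored "url" value can then be assumed to be a usable link.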

Enhancing ResponseModel to Include Source URLs

Suppose your application requires displaying source URLs for additional context and credibility. You’ll first design a ResponseModel that can accommodate URLs effectively:

public class ResponseModel {
  
  private String generatedResponse;
  private List<String> sourceUrls;

  // Getters and Setters omitted for brevity
}

Now, your controller should ideally populate sourceUrls when returning responses. In practice, Spring AI does not fill in these URLs for you: the metadata has to be read from the retrieved documents and mapped explicitly.

Common Challenges in URL Retrieval from MongoDB VectorStore

While the intention is straightforward, developers often hit hurdles getting the URLs back out. The usual causes are:

Inadequate metadata handling: The chat response carries the generated text; document metadata does not appear in it unless you read it from the retrieved documents.
Response-model mismatch: Metadata values come back as untyped objects, and nothing maps them onto your response class unless your code does so explicitly.
Spring AI configuration limitations: A bare chat model call performs no retrieval at all, so an advisor or an explicit similarity search has to be wired in before any metadata is available.

Identifying these pitfalls upfront saves considerable debugging time. Although Spring AI is highly intuitive, custom data retrieval tasks require deliberate implementation.
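The mapping problem can be illustrated with a small sketch. It assumes, as in Spring AI's Document, that metadata arrives as a Map<String, Object>; the helper name MetadataReader is hypothetical. It returns a value only when the stored entry is a non-blank string, so a missing or mistyped "url" never turns into a ClassCastException or an empty link.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: reads a string-valued metadata entry defensively.
final class MetadataReader {

    private MetadataReader() {
    }

    static Optional<String> stringValue(Map<String, Object> metadata, String key) {
        if (metadata == null) {
            return Optional.empty();
        }
        Object raw = metadata.get(key);
        // Accept only real, non-blank strings; anything else counts as absent.
        if (raw instanceof String) {
            String text = (String) raw;
            if (!text.isBlank()) {
                return Optional.of(text);
            }
        }
        return Optional.empty();
    }
}
```

A caller can then write MetadataReader.stringValue(metadata, "url").ifPresent(...) and skip documents that carry no usable source link.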

Strategies for Resolving the Source URL Retrieval Issue

Let’s look at achievable solutions to address these challenges effectively.

1. Advisor Implementations in ChatClient Configuration

The Spring AI framework supports advisors, such as QuestionAnswerAdvisor, which run a similarity search against the VectorStore for each user question and add the retrieved text to the prompt.

Consider this implementation example of QuestionAnswerAdvisor in your configuration bean:

@Bean
QuestionAnswerAdvisor qaAdvisor(VectorStore vectorStore) {
    return new QuestionAnswerAdvisor(vectorStore);
}

@Bean
ChatClient chatClient(OllamaChatModel model, QuestionAnswerAdvisor qaAdvisor) {
    return ChatClient.builder(model)
        .defaultAdvisors(qaAdvisor)
        .build();
}

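With this setup, every call through the ChatClient first queries the MongoDB VectorStore. The advisor adds the text of the matching documents to the prompt, and it attaches the documents themselves, metadata included, to the response metadata under the key QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS.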
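Why the URLs don't show up in the answer on their own becomes clearer with a conceptual sketch of prompt augmentation. This is not Spring AI's implementation, and the names PromptAugmenter and RetrievedChunk are hypothetical; the sketch only assumes that the chunk text, and not the metadata, is what gets added to the prompt.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Conceptual sketch only, not Spring AI's code.
final class PromptAugmenter {

    // Stand-in for a retrieved document: chunk text plus its metadata.
    record RetrievedChunk(String text, Map<String, Object> metadata) {
    }

    private PromptAugmenter() {
    }

    // Builds the text sent to the model. Only the chunk text is used;
    // the metadata (title, url) never enters the prompt.
    static String augment(String question, List<RetrievedChunk> chunks) {
        String context = chunks.stream()
            .map(RetrievedChunk::text)
            .collect(Collectors.joining(System.lineSeparator()));
        return question
            + System.lineSeparator()
            + "Context:"
            + System.lineSeparator()
            + context;
    }
}
```

Because the model never sees the "url" field in this scheme, the links have to be collected from the retrieved documents on the application side.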

2. Modifying ResponseModel to Handle Complex Metadata

Updating your ResponseModel to handle structured metadata explicitly will greatly simplify URL retrieval:

public class ResponseModel {
  
  private String generatedResponse;
  private List<SourceMetadata> sources;

  public static class SourceMetadata {
      private String title;
      private String url;

      // Getters and Setters
  }

  // Getters and Setters omitted for brevity
}

Incorporating clearly structured objects like SourceMetadata helps carry multiple metadata properties intuitively.

Successfully Retrieving Source URLs in Spring AI

Combining advisor logic and improved response models allows you to retrieve URLs reliably. After configuring your client and advisor, inject the ChatClient bean into the controller (in place of, or alongside, the OllamaChatModel). Your controller logic may then look like this:

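@PostMapping("/generate")
public ResponseModel generateResponse(@RequestBody ChatRequest request) {
    // Run the question through the ChatClient; the advisor performs the vector search.
    ChatResponse aiResponse = chatClient.prompt()
        .user(request.getQuestion())
        .call()
        .chatResponse();

    // QuestionAnswerAdvisor exposes the retrieved documents in the response metadata.
    List<Document> documents = aiResponse.getMetadata().get(QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS);
    if (documents == null) {
        documents = List.of();
    }

    List<ResponseModel.SourceMetadata> sourceList = documents.stream()
        .map(doc -> {
            ResponseModel.SourceMetadata metadata = new ResponseModel.SourceMetadata();
            // Metadata values are typed as Object; these casts assume both fields were stored as strings.
            metadata.setTitle((String) doc.getMetadata().get("title"));
            metadata.setUrl((String) doc.getMetadata().get("url"));
            return metadata;
        })
        .collect(Collectors.toList());

    ResponseModel responseModel = new ResponseModel();
    // getText() is the Spring AI 1.0 accessor; earlier milestones used getContent().
    responseModel.setGeneratedResponse(aiResponse.getResult().getOutput().getText());
    responseModel.setSources(sourceList);
    return responseModel;
}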
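One practical wrinkle: several retrieved chunks often come from the same page, so the same URL can appear more than once in the source list. The following sketch shows one way to collapse duplicates while keeping the retrieval order. The names SourceCollector and Source are hypothetical, and each map stands in for the metadata of one retrieved document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns per-document metadata into a distinct source list.
final class SourceCollector {

    record Source(String title, String url) {
    }

    private SourceCollector() {
    }

    static List<Source> distinctByUrl(List<Map<String, Object>> metadataPerDocument) {
        // LinkedHashMap keeps the first occurrence of each URL in retrieval order.
        Map<String, Source> byUrl = new LinkedHashMap<>();
        for (Map<String, Object> metadata : metadataPerDocument) {
            Object url = metadata.get("url");
            if (!(url instanceof String) || ((String) url).isBlank()) {
                continue; // no usable link on this document
            }
            Object title = metadata.get("title");
            // Fall back to the URL itself when no string title was stored.
            String titleText = title instanceof String ? (String) title : (String) url;
            byUrl.putIfAbsent((String) url, new Source(titleText, (String) url));
        }
        return new ArrayList<>(byUrl.values());
    }
}
```

In the controller, you could feed it the metadata of each retrieved document, for example documents.stream().map(Document::getMetadata).toList(), and copy the result into the response model.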

Now your Spring AI application returns answers together with the sources they were drawn from, a valuable addition for any RAG-based project.

Users can follow the links back to the original content for further reading, which improves transparency and trust.

Implementing source URL retrieval is just one step toward building robust conversational applications with Spring and MongoDB integration. As databases and frameworks evolve further, we can expect even smoother integrations and richer metadata support, providing better experiences.

Are you ready to enhance your RAG applications with reliable source URLs? Let us know how you’re using Spring AI to level up your projects!