Naive RAG Pipeline

Overview

The simplest RAG pipeline involves three sequential steps:

Indexing
- Documents are split into chunks.
- Each chunk is encoded into a vector using an embedding model.
- These vectors are stored in a vector database.
Retrieval
- The user query is converted into a vector with the same embedding model used for indexing.
- The top-K most similar vectors (document chunks) are retrieved using a similarity metric such as cosine similarity.
Generation
- Retrieved content is concatenated with the query.
- The combined input is passed into an LLM for generation.
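
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `embed` is a toy hashed bag-of-words function standing in for a real embedding model, the "vector database" is an in-memory list, and `llm` is any callable that takes a prompt string. All function names are hypothetical.

```python
import hashlib
import math
import re

DIM = 64  # toy embedding size (assumption, chosen only for illustration)


def chunk_text(text, max_words=8):
    """Indexing, step 1: split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(documents):
    """Indexing, steps 2-3: embed each chunk and store (chunk, vector) pairs."""
    return [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc)]


def retrieve(index, query, k=2):
    """Retrieval: embed the query and return the top-k chunks by cosine similarity."""
    q = embed(query)
    # Vectors are unit length, so the dot product equals the cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(query, chunks):
    """Generation, step 1: concatenate the retrieved chunks with the query."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def generate(prompt, llm):
    """Generation, step 2: pass the combined input to an LLM (any callable here)."""
    return llm(prompt)


documents = [
    "The Eiffel Tower is located in Paris.",
    "Python is a popular programming language.",
    "Vector databases store embeddings for similarity search.",
]
index = build_index(documents)
query = "Where is the Eiffel Tower?"
top_chunks = retrieve(index, query, k=2)
prompt = build_prompt(query, top_chunks)
# Placeholder LLM: a real pipeline would call a model here.
answer = generate(prompt, llm=lambda p: f"[LLM output for a {len(p)}-character prompt]")
```

In a real system, `embed` would call an embedding model, `build_index` would write to a vector database, and `llm` would wrap a model call; the control flow would stay the same.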

[Figure: naive RAG pipeline]

Limitations of Naive RAG:

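- It may retrieve irrelevant content, because top-K search returns the K nearest chunks even when none of them are relevant.
- It doesn’t prioritize or summarize the retrieved results before passing them to the LLM.
- It can cause hallucinations when poor-quality content is retrieved.
- It lacks control mechanisms and adaptability: every query goes through the same fixed steps.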
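
The first limitation can be seen in a small sketch. It assumes a toy word-overlap (Jaccard) score in place of vector similarity, and the `min_score` threshold is a hypothetical guard added only for contrast; it is not part of the naive pipeline described here.

```python
def jaccard(a, b):
    """Toy similarity: word overlap between two strings (stand-in for vector similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def top_k(chunks, query, k):
    """Naive retrieval: always return k (score, chunk) pairs, however low the scores."""
    scored = sorted(((jaccard(query, c), c) for c in chunks),
                    key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def top_k_with_threshold(chunks, query, k, min_score):
    """Same ranking, but drop chunks scoring below min_score (hypothetical guard)."""
    return [(s, c) for s, c in top_k(chunks, query, k) if s >= min_score]


chunks = [
    "the eiffel tower is in paris",
    "python is a programming language",
    "vector databases store embeddings",
]
query = "how do volcanoes erupt"  # unrelated to every chunk

naive = top_k(chunks, query, k=2)
guarded = top_k_with_threshold(chunks, query, k=2, min_score=0.1)
```

Here `naive` still holds two chunks, each with a score of 0.0, and both would be handed to the LLM as context, while `guarded` is empty.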