---
title: "Claude Embeddings and RAG Pipelines: A Practical Guide"
canonical_url: "https://www.calypso.so/blog/claude-embeddings-rag-pipelines"
last_updated: "2026-06-25T19:33:58.608Z"
meta:
  description: "Build Claude retrieval systems with text and multimodal embeddings, reranking, and citations for grounded answers from messy documents."
  keywords: "Claude embeddings, RAG pipeline, multimodal retrieval, Voyage AI"
  "og:description": "Build Claude retrieval systems with text and multimodal embeddings, reranking, and citations for grounded answers from messy documents."
  "og:title": "Claude Embeddings and RAG Pipelines: A Practical Guide"
  "twitter:description": "Build Claude retrieval systems with text and multimodal embeddings, reranking, and citations for grounded answers from messy documents."
  "twitter:title": "Claude Embeddings and RAG Pipelines: A Practical Guide"
---

# **Building Embeddings and RAG Pipelines with Claude**

A practical guide to building Claude retrieval systems with text and multimodal embeddings, reranking, and citations.

**Calypso Team**

4 min read · June 25, 2026 · 7 sources

Claude is strong at reasoning over grounded context, but it is not the retrieval layer. In production, you still need an embeddings and RAG pipeline that decides what evidence enters the prompt, how it is ranked, and how it is cited.

## Claude is the answer layer, not the vector store

Anthropic's docs separate the job cleanly. Claude is the model that reads the context and writes the answer. Voyage supplies the embedding and reranking models that turn documents into something a retrieval system can search. That split is useful because it keeps reasoning and retrieval from getting blurred together.

The practical takeaway is simple. Do not treat the model as a database with better grammar. Build a retrieval pipeline first, then give Claude the best evidence you can actually recover.

## Choose text or multimodal embeddings based on the source

Plain text sources should usually go through text embeddings. That covers help docs, API references, policies, articles, and clean OCR output. Voyage's current text models include `voyage-4-large`, `voyage-4`, `voyage-4-lite`, and `voyage-4-nano`, which are intended for general-purpose retrieval across different latency and quality needs.

If the source depends on visual structure, use multimodal embeddings instead. Screenshots, slide decks, PDFs with charts, scanned tables, and figure-heavy documents need a model that can keep the image and text signals together. Voyage's multimodal models are built for that exact case, which is why they fit real document retrieval better than a text-only shortcut.

## Build the retrieval path before Claude sees anything

A useful pipeline starts with ingestion. Parse the source, preserve page and section metadata, split content into answerable chunks, and embed those chunks with the right model. Then use vector search to pull a broad candidate set, because the first pass is about recall, not perfection. After that, add filters and reranking.
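
The ingestion and first-pass retrieval steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `Chunk`, `chunk_by_heading`, `toy_embed`, `retrieve`, and the sample policy document are all hypothetical names invented for this example, and `toy_embed` is a normalized bag-of-words stand-in for a real embedding model so the sketch runs without network access or API keys. In a real pipeline, the embedding call would go to an embedding model and the scoring loop would be a vector database query.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str   # document the chunk came from
    section: str  # heading the chunk sat under, kept as metadata


def chunk_by_heading(doc_id: str, markdown: str) -> list[Chunk]:
    """Split on markdown headings so every chunk keeps its section label."""
    chunks: list[Chunk] = []
    section, buf = "intro", []
    for line in markdown.splitlines():
        if line.startswith("#"):
            if buf:  # close the chunk that belonged to the previous heading
                chunks.append(Chunk(" ".join(buf), doc_id, section))
                buf = []
            section = line.lstrip("# ").strip()
        elif line.strip():
            buf.append(line.strip())
    if buf:
        chunks.append(Chunk(" ".join(buf), doc_id, section))
    return chunks


def toy_embed(text: str) -> dict[str, float]:
    """Stand-in embedding (assumption): L2-normalized token counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {token: v / norm for token, v in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two unit-length sparse vectors."""
    return sum(w * b.get(token, 0.0) for token, w in a.items())


def retrieve(query: str, index, k: int = 20):
    """First pass: return the k most similar chunks, favoring recall."""
    q = toy_embed(query)
    scored = [(cosine(q, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical sample document for illustration only.
doc = """# Refunds
Refunds are issued within 14 days of purchase.
# Shipping
Orders ship in two business days.
"""

chunks = chunk_by_heading("policy.md", doc)
index = [(c, toy_embed(c.text)) for c in chunks]
candidates = retrieve("how long do refunds take", index, k=2)

for score, chunk in candidates:
    print(f"{score:.3f}  {chunk.source} / {chunk.section}")
```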
Metadata filters keep the search inside the right product, customer, language, or permission boundary. Reranking trims the candidate set down to passages that are actually useful for the query instead of merely adjacent to it.

A good retrieval pipeline is boring in the right way. It should make it hard for weak evidence to slip through.

- Parse source documents and preserve structure.
- Chunk by content shape, not just token count.
- Embed documents and queries with the right model type.
- Retrieve broadly, then rerank the top candidates.
- Pass only the best evidence to Claude.

## Reranking is the difference between decent and usable

The first retrieval step is usually noisy. That is normal. Embeddings are designed to find semantic neighbors quickly, not to decide whether a passage is the best possible support for a specific question.

Reranking fixes that by scoring the query and each candidate together. Voyage's reranker docs describe this as a cross-encoder step, which is why it often improves relevance more than people expect. It does not replace vector search. It sits on top of it and sharpens the result. That extra step matters a lot when the source set is large or the question is vague.

## Ground the final answer with citations

Claude can answer from the retrieved evidence, but the answer should still be auditable. Anthropic's citations guidance exists for a reason: the user should be able to check where each claim came from instead of trusting a polished paragraph on faith.

If the system also needs to move the output into another step, structured output is the cleanest next move. A grounded answer can be returned as JSON, a UI payload, or a downstream agent input without forcing another parsing pass.

## Production tradeoffs are real

Model choice is always a tradeoff between quality, latency, and cost. Smaller embedding models are useful when you need to search a lot of content quickly.
Larger models are better when retrieval quality matters more than raw speed. Multimodal retrieval costs more, but it saves you from flattening the parts of the document that actually matter.

PDF handling also has costs because pages can contribute both text and image processing. That is not a reason to avoid PDFs. It is a reason to be deliberate about ingestion, chunking, and the amount of content you send through the pipeline.

## Build the pipeline into Calypso

Calypso is built for teams that want the retrieval layer behind grounded answers without stitching every piece together by hand. Buckets hold multimodal knowledge, Agents define behavior, and Integrations ship answers into websites, workflows, APIs, MCP clients, and product surfaces.

If you are evaluating Claude for a retrieval product, the real question is not whether embeddings work. They do. The question is whether you want to own the entire pipeline, or use a knowledge layer that already handles the hard parts of grounding, citations, and delivery.

**Sources**

References and source material used in this essay.

1. [Anthropic: Embeddings](https://platform.claude.com/docs/en/build-with-claude/embeddings)
2. [Anthropic: PDF support](https://platform.claude.com/docs/en/build-with-claude/pdf-support)
3. [Anthropic: Citations](https://platform.claude.com/docs/en/build-with-claude/citations)
4. [Anthropic: Structured outputs](https://platform.claude.com/docs/en/build-with-claude/structured-outputs)
5. [Voyage AI: Embeddings](https://docs.voyageai.com/docs/embeddings)
6. [Voyage AI: Reranker](https://docs.voyageai.com/docs/reranker)
7. [Voyage AI: Pricing](https://docs.voyageai.com/docs/pricing)