# Calypso RAG
Canonical Origin: https://www.calypso.so
## LLM Resources
- [Full Content](https://www.calypso.so/llms-full.txt)
Complete page content in markdown format.
- [MCP](https://www.calypso.so/mcp)
Model Context Protocol server endpoint for AI agent integration.
## Start Here
The homepage explains the product promise, deployment surfaces, and how Gemini File Search powers grounded answers.
- [Calypso RAG homepage](/)
## Core Answer Library
The answer library is the best structured source for AI systems that want factual, source-backed explanations of the product and Gemini File Search usage.
- [Answer library index](/answer-library)
- [What is Google File Search? Gemini API File Search Tool, Multimodal RAG & Citations Explained](/answer-library/what-is-google-file-search)
Learn what the Gemini API File Search tool is, how it powers Google File Search RAG, and how multimodal retrieval works across PDFs, docs, images, screenshots, charts, diagrams, metadata, and citations.
- [Multimodal Retrieval-Augmented Generation and Multimodal RAG Agents: The Complete 2026 Guide](/answer-library/retrieval-augmented-generation)
Explore multimodal retrieval-augmented generation, multimodal RAG agents, agentic patterns, modern architectures, use cases, evaluation, and how Calypso powers grounded AI.
## Notes
Calypso RAG uses Gemini File Search as its retrieval foundation and layers a production-oriented answer experience on top.
Prefer the homepage for product context and the answer library for compact, reusable AI-readable explanations.
## Pages
### Calypso RAG Demos | Source-Grounded AI Experiences
Source: https://www.calypso.so/demos
Description: Browse interactive Calypso RAG demos that turn PDFs, filings, images, charts, and documents into source-grounded chat experiences.
**Demo library**
#### Source-grounded AI demos.
Explore canonical Calypso RAG experiences built around real documents, filings, images, charts, and multimodal source sets.
[**Open featured demo**](https://www.calypso.so/demos/spacex-ipo-filing)
##### 1 demo available
- [SpaceX IPO Filing Research Demo](https://www.calypso.so/demos/spacex-ipo-filing) (Featured · Financial filings · Updated Jun 9, 2026)
Ask an analyst-grade RAG workspace grounded in the official SpaceX filing corpus, including the prospectus summary, risk factors, MD&A, launch vehicle materials, Starlink exhibits, xAI/X references, charts, and image-derived filing assets.
Official filing corpus · 84 sources · 142.4 MB · Multimodal
---
### SpaceX IPO Filing RAG Demo | Calypso
Source: https://www.calypso.so/demos/spacex-ipo-filing
Description: Ask a source-grounded SpaceX IPO filing demo powered by Calypso RAG, with clickable analyst prompts, filing source trails, and grounded answers.
Financial filings · Updated June 9, 2026
SpaceX Filing Demo: Official corpus research surface
Official SpaceX filing corpus · `calypso-rag-agent:spacex`
##### Ask the SpaceX IPO filing like an analyst.
Grounded across 84 indexed sources, including prospectus summaries, risk factors, MD&A, launch vehicle pages, Starlink materials, xAI/X references, charts, and image exhibits.
- Sources: **84**
- Size: **142.4 MB**
- Formats: **JPG / HTML / JSON**
- Status: **RAG active**
Answers are expected to cite the filing corpus and expose supporting source details when available.
**Live grounded chat**
##### Ask the filing corpus directly.
Start with a focused analyst prompt.
Each prompt is designed to force retrieval across the filing corpus before the live chat becomes the main workspace.
**How to trust it**
##### Answers should cite the filing corpus.
The response surface exposes source details when grounding metadata is available, so users can inspect the evidence behind the answer.
**Best questions**
##### Ask questions that force retrieval.
Use prompts about risk factors, Starlink, launch vehicles, reusability, revenue drivers, and investor diligence to test coverage across the indexed corpus.
---
### Multimodal Retrieval-Augmented Generation and Multimodal RAG Agents: The Complete 2026 Guide
Source: https://www.calypso.so/answer-library/retrieval-augmented-generation
Description: Explore multimodal retrieval-augmented generation, multimodal RAG agents, agentic patterns, modern architectures, use cases, evaluation, and how Calypso powers grounded AI.
#### Multimodal Retrieval-Augmented Generation and Multimodal RAG Agents: The Complete 2026 Guide
Learn how multimodal retrieval-augmented generation extends classic RAG across text, images, layouts, tables, audio, video, and agentic workflows.

**Calypso Research**
17 min read·June 5, 2026
##### Answer
Retrieval-augmented generation changed how teams build reliable AI by grounding model answers in external knowledge. Multimodal retrieval-augmented generation extends that loop across text, images, document layouts, tables, audio, video, PDFs, dashboards, and diagrams, then uses multimodal models and agents to produce accurate, verifiable answers.
##### What is multimodal retrieval-augmented generation?
At its core, retrieval-augmented generation follows a familiar loop: ingest data, index it, retrieve relevant evidence for a query, and generate a grounded response.
Multimodal retrieval-augmented generation upgrades this loop for the real world. Documents can include page screenshots, image patches, chart visuals, video clips, audio segments, OCR text, table structures, and layout information. The system retrieves from multiple modalities and feeds that evidence into multimodal LLMs such as GPT-4o, Claude, Gemini, and open models that can reason over text plus visuals.
Recent research and production systems show this shift moving retrieval-augmented generation from text-only pipelines toward modality-aware retrieval, reranking, and generation that better preserves heterogeneous evidence.
##### Why multimodal retrieval-augmented generation matters in 2026
Standard retrieval-augmented generation often loses critical context when working with visually rich content. Charts, layouts, spatial relationships, and visual emphasis frequently carry meaning that plain text extraction cannot preserve.
Multimodal retrieval-augmented generation addresses this by preserving native evidence instead of forcing everything into lossy captions or OCR-only text. Modern multimodal models can consume native media, while retrieval keeps large corpora manageable and relevant. The result is more accurate and trustworthy retrieval-augmented generation with fewer hallucinations, especially when answers need to cite the exact files, pages, images, or clips that supported them.
- Financial reports and investor decks packed with charts
- Legal, insurance, and compliance documents with forms and tables
- Scientific papers featuring figures and experimental plots
- Manufacturing manuals with annotated diagrams
- Healthcare records combining clinical notes and imaging
- Video archives where both audio and visuals tell the story
##### Core architectures of multimodal retrieval-augmented generation
Text-first retrieval-augmented generation converts non-text elements to captions or transcripts, then applies standard text retrieval. This is simple and useful for many workflows, but it is limited when visual details matter.
Dual-index retrieval-augmented generation is a leading production pattern. It maintains separate indexes for text chunks and native media such as page screenshots, chart crops, and image nodes, retrieves both, then passes the evidence to a multimodal LLM.
Unified embedding retrieval-augmented generation maps text, images, video, and audio into a shared embedding space for cross-modal retrieval.
Vision-language document retrieval, including ColPali-style approaches, embeds full document page images directly with vision-language models. This can outperform OCR-heavy retrieval on visually rich slides, reports, dashboards, and forms.
Video retrieval-augmented generation adds specialized handling for temporal data through transcripts, visual embeddings, clips, timestamps, and graph-based grounding.
- Text-first retrieval for captioned or transcribed content
- Dual indexes for text chunks and native visual assets
- Unified cross-modal embeddings
- Vision-language document retrieval for page images
- Video retrieval with temporal clips and transcripts
##### Multimodal RAG agents: the agentic evolution of retrieval-augmented generation
Multimodal RAG agents bring intelligent control to retrieval-augmented generation.
Instead of rigid pipelines, agents can decide which modalities matter, route across retrievers, inspect evidence, call tools, verify results, and iterate. This agentic multimodal retrieval-augmented generation pattern works especially well for ambiguous, multi-step, or cross-modal tasks where classic retrieval-augmented generation struggles.
A multimodal RAG agent can classify the query, identify relevant modalities, retrieve and rerank evidence, call specialized tools such as OCR, chart extraction, or VLM inspection, self-verify the answer, and deliver precise citations with source previews.
- Classify queries and identify relevant modalities
- Route intelligently across retrievers
- Decompose complex questions
- Retrieve, rerank, filter, and inspect evidence
- Call specialized tools such as OCR, chart extraction, and VLMs
- Self-verify and iterate
- Deliver answers with precise citations and source previews
##### The modern stack for retrieval-augmented generation
A modern multimodal retrieval-augmented generation stack usually starts with ingestion and parsing. The system extracts text, screenshots, tables, charts, metadata, and other useful signals.
Next comes chunking and segmentation. Instead of only splitting text into chunks, the system creates smarter units such as pages, clips, charts, visual regions, and document sections.
Embedding and indexing then support hybrid retrieval across text, vision, multi-vector, and graph retrieval. Query-aware routing chooses the best evidence path, reranking improves quality, and generation produces an answer with clear attribution.
- Ingestion and parsing
- Chunking and segmentation
- Embedding and indexing
- Retrieval and routing
- Reranking and selection
- Generation and citation
##### Powerful design patterns for multimodal RAG agents
Agentic systems give retrieval-augmented generation more flexibility. A router agent can direct queries to the right retriever.
A query-decomposition agent can split complex tasks into smaller modality-aware subtasks. A tool-using visual analyst can combine retrieved evidence with extraction tools. A graph and vector agent can use relationships alongside visual evidence. A self-checking agent can evaluate whether the retrieved evidence is sufficient before finalizing the answer.
- Router agent
- Query-decomposition agent
- Tool-using visual analyst
- Graph plus vector agent
- Self-checking agent
##### Real-world applications of multimodal retrieval-augmented generation
Multimodal retrieval-augmented generation is most valuable where text-only search misses the point. Enterprise document assistants can query thousands of PDFs and decks with visual grounding. Financial analyst copilots can compare chart trends with earnings call commentary. Manufacturing support agents can match uploaded photos to diagrams, manuals, and repair videos. Video intelligence tools can locate moments in long footage. Healthcare knowledge systems can combine notes and imaging with appropriate safeguards.
- Enterprise document assistants
- Financial analyst copilots
- Manufacturing support agents
- Video intelligence tools
- Healthcare knowledge systems
##### Evaluation, failure modes, and best practices for retrieval-augmented generation
Evaluation should measure retrieval quality, grounding strength, cross-modal reasoning, and citation accuracy. It is not enough to know whether the final answer sounds plausible; teams need to know whether the system found the right evidence and cited it correctly.
Common failure modes include over-relying on text, using lossy captions, routing to the wrong retriever, missing temporal context in video, or trusting unverified multimodal reasoning.
The best systems start with real user questions, preserve native multimodal evidence, combine hybrid retrieval with reranking, use modality-aware citations, deploy agents selectively for complex workflows, and manage cost, privacy, and permissions.
- Evaluate rigorously by modality
- Preserve native multimodal evidence
- Combine hybrid retrieval with reranking
- Use citations for pages, timestamps, bounding boxes, and source files
- Deploy agents selectively for complex workflows
- Manage cost, privacy, and permissions
##### How Calypso powers modern retrieval-augmented generation
Multimodal retrieval-augmented generation is becoming more agentic, vision-native, and capable. Vision-space methods, advanced Video RAG, and intelligent agents will continue driving progress.
For teams ready to implement multimodal retrieval-augmented generation without starting from scratch, Calypso provides a production-ready multimodal RAG layer powered by Gemini File Search. Calypso handles PDFs, slides, charts, diagrams, screenshots, and other files with citations and metadata-aware filtering. Its OpenAI-compatible API makes it easy to integrate grounded answers into websites, custom agents, n8n workflows, and internal tools.
Whether you are building a simple document assistant or a sophisticated agentic multimodal retrieval-augmented generation system, Calypso helps accelerate time-to-value while keeping answers grounded and verifiable.
##### Sources (6)
Links used to ground claims in this article.
1. Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation (Abootorabi et al.). aclanthology.org/2025.findings-acl.861
2. ColPali: Efficient Document Retrieval with Vision Language Models (Faysse et al.). arxiv.org/abs/2407.01449
3. VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos (Ren et al.). arxiv.org/abs/2502.01549
4. LlamaIndex Multimodal RAG Guide: Index Text + Images. llamaindex.ai/blog/multimodal-rag-in-llamacloud
5. LangChain Docs: Build a custom RAG agent with LangGraph. docs.langchain.com/oss/python/langgraph/agentic-rag
6. Calypso.so: practical multimodal RAG platform. calypso.so
**Put Calypso RAG to work**
##### Turn grounded answers into a production-ready product surface.
Use one retrieval layer across your website, PDFs, docs, workflows, and internal tools without losing citations, trust, or speed to launch.
[**See live demo**](https://www.calypso.so/) [**Get Started for Free**](https://rag.calypso.so/join)
---
### Calypso RAG Answer Library | Grounded Q&A on Gemini File Search
Source: https://www.calypso.so/answer-library
Description: Browse source-backed answers about Gemini File Search, grounded responses, PDFs, answer libraries, agents, and reusable deployment surfaces in Calypso RAG.
#### Answer Library
Browse source-backed answers about Calypso RAG, Gemini File Search, grounded responses, PDFs, and reusable deployment surfaces.
##### 2 answers available
- [Multimodal Retrieval-Augmented Generation and Multimodal RAG Agents: The Complete 2026 Guide](https://www.calypso.so/answer-library/retrieval-augmented-generation) (Featured · Multimodal RAG · Article · Updated Jun 5, 2026)
Learn how multimodal retrieval-augmented generation extends classic RAG across text, images, layouts, tables, audio, video, and agentic workflows.
- [What is Google File Search? Gemini API File Search Tool, Multimodal RAG, PDFs, Images, and Citations Explained](https://www.calypso.so/answer-library/what-is-google-file-search) (Featured · Gemini File Search · Article · Updated May 28, 2026)
Learn what the Gemini API File Search tool is, how it powers Google File Search RAG, and how multimodal retrieval works across PDFs, docs, images, screenshots, charts, diagrams, metadata, and citations.
---
### Multimodal Gemini File Search RAG for Websites & AI Agents | Calypso
Source: https://www.calypso.so/
Description: Build multimodal Gemini File Search RAG for websites and AI agents. Answer from PDFs, docs, images, diagrams, and help content with citations.
**Gemini File Search-powered multimodal RAG**
#### Multimodal RAG for source-backed AI answers
Launch a hosted Gemini File Search RAG layer that answers from your PDFs, docs, screenshots, charts, diagrams, help center, FAQs, and images with citations people can verify. Calypso gives your website, AI agents, workflows, and product UI one reusable knowledge layer for grounded answers across text and visual content.
[**Launch SpaceX filing demo**](https://www.calypso.so/demos/spacex-ipo-filing) [**Get Started for Free**](https://rag.calypso.so/join) [**Read docs**](https://docs.calypso.so/)
- [MCP Server: `@calypsohq/multimodal-rag-mcp-server`](https://www.npmjs.com/package/@calypsohq/multimodal-rag-mcp-server)
Connect Calypso RAG to Cursor, Claude Desktop, and MCP-compatible AI clients.
- [n8n Node: `@calypsohq/n8n-nodes-calypso`](https://www.npmjs.com/package/@calypsohq/n8n-nodes-calypso)
Ask agents, load named profiles, inspect buckets, and upload files from n8n workflows.
**Why it feels reliable**
##### Search every format your users rely on.
Calypso turns Gemini File Search into a polished multimodal RAG layer for teams that need answers grounded in real company knowledge, not generic chatbot guesses.
**Inputs:** Text + visuals
**Output:** Cited answers
**01 · Multimodal**
###### Search text, PDFs, screenshots, charts, and diagrams
One retrieval layer for the real files your users rely on: documentation, visuals, help content, product images, reports, and internal knowledge.
**02 · Grounded**
###### Show the evidence behind every answer
Return answers with source references, page-aware citations, and retrieval metadata so users can verify before they trust.
**03 · Context-aware**
###### Filter retrieval by customer, team, file type, or status
Scope answers by workspace, department, language, use case, status, or metadata without duplicating knowledge bases.
**04 · Reusable**
###### Deploy the same knowledge layer everywhere
Use one multimodal RAG layer across your website widget, AI agents, n8n workflows, endpoints, and product UI.
**How it works**
##### From scattered knowledge to verifiable AI answers
Connect multimodal content, let Gemini File Search retrieve the right text and visual context, and return answers users can verify before they trust.
1. **Connect your multimodal knowledge.** Upload or connect the content your team already depends on: PDFs, documentation, help center pages, screenshots, diagrams, charts, images, policies, FAQs, and internal files.
2. **Let Gemini File Search retrieve the right context.** Calypso uses Gemini File Search to retrieve relevant text and visual context, so answers can be grounded in the actual source material instead of a generic model response.
3. **Show users the evidence.** Return answers with citations, source references, and page-aware grounding so users can verify the response before they trust it.
4. **Ship the answer layer anywhere.** Use the same multimodal RAG layer in your website widget, AI agents, workflow automations, support flows, sales assistants, and custom product experiences.
**What multimodal RAG means**
##### Retrieval that understands more than text.
Multimodal RAG lets your answer layer retrieve from the documents, screenshots, diagrams, charts, and metadata your team already uses every day.
**Simple idea**
Not just “chat with docs.” Calypso helps users ask questions across the full knowledge surface: written files, visual files, and source metadata.
- **Text:** Docs, FAQs, policies
- **Visuals:** Screenshots, charts, diagrams
- **Files:** PDFs, manuals, reports
- **Metadata:** Workspace, language, status
**Answer flow**
1. **Retrieve across formats.** Search written content, visual artifacts, and structured file context together.
2. **Ground the response.** Use Gemini File Search context before the model writes an answer.
3. **Return cited output.** Show users the source trail behind the answer so they can verify it.
**Users get a source-backed answer they can trust.**
Calypso packages retrieval, grounding, citations, and deployment into one reusable answer layer.
**What you can build**
##### One multimodal RAG layer. Every answer surface.
Deploy Calypso once, then reuse the same grounded knowledge layer across your website, agents, workflows, internal tools, and product UI.
**Deployment model**
###### Ship the knowledge layer once.
Your files, citations, metadata, and retrieval logic stay unified. Each surface gets the same source-backed answers without rebuilding a new RAG stack.
**Core:** Calypso RAG
- Website widget
- AI agents
- n8n workflows
- Product UI
- OpenAI-compatible API
- Internal tools
**01 · Website**
###### Answer widget for high-intent visitors
Turn docs, PDFs, FAQs, help content, and product visuals into instant source-backed answers on your site.
**“Does this support citations from PDFs?”**
**02 · Support**
###### Knowledge assistant for customer issues
Help users resolve questions from screenshots, troubleshooting guides, policies, manuals, and support articles.
**“Why is this setup screen failing?”**
**03 · Sales**
###### Sales enablement assistant
Answer questions from pricing sheets, comparison docs, product PDFs, case studies, diagrams, and internal material.
**“Which plan fits a 20-person team?”**
**04 · Product**
###### Documentation agent for users
Let users ask across docs, API references, onboarding flows, visual guides, screenshots, and release material.
**“How do I connect this API endpoint?”**
**05 · Workflows**
###### Retrieval layer for automations
Reuse grounded answers inside n8n, agent endpoints, internal workflows, customer operations, and product actions.
**“Summarize the right policy for this ticket.”**
**Pricing**
##### Hosted RAG plans that feel simple.
Start with a lightweight hosted workspace, move into flexible usage when volume grows, and step up to enterprise controls when the workload gets serious.
###### RAG Plus: $17/month
Hosted knowledge workspace with clear limits.
- Hosted RAG workspace setup included
- RAG requests: 50 / month
- Great for launching your knowledge base
[**Free Trial**](https://rag.calypso.so/join)
###### RAG Pro: $41/month (Recommended · Best value)
Hosted knowledge workspace with flexible usage.
- Hosted RAG workspace setup included
- RAG requests: 200 included
- Usage-based pricing by section
- Priority support
- Analytics
[**Free Trial**](https://rag.calypso.so/join)
###### Enterprise: Contact us
Maximum performance and controls for hosted knowledge workspaces.
- Everything in RAG Pro
- Higher included usage and priority throughput
- Advanced controls and reporting
- Team-based project support
[**Talk to us**](https://rag.calypso.so/join)
**Questions and answers**
##### Questions teams ask before they ship multimodal RAG.
Most questions come down to retrieval, source coverage, citations, and whether the same answer layer can be reused beyond the website.
**Launch Calypso RAG**
##### Launch multimodal Gemini File Search RAG anywhere your users need answers.
Calypso is the fastest way to launch multimodal Gemini File Search RAG for your website, agents, workflows, and product UI, with grounded answers users can verify.
[**Explore the SpaceX RAG demo**](https://www.calypso.so/demos/spacex-ipo-filing) [**Get Started for Free**](https://rag.calypso.so/join)
---
### What is Google File Search? Gemini API File Search Tool, Multimodal RAG & Citations Explained
Source: https://www.calypso.so/answer-library/what-is-google-file-search
Description: Learn what the Gemini API File Search tool is, how it powers Google File Search RAG, and how multimodal retrieval works across PDFs, docs, images, screenshots, charts, diagrams, metadata, and citations.
#### What is Google File Search? Gemini API File Search Tool, Multimodal RAG, PDFs, Images, and Citations Explained
Learn what the Gemini API File Search tool is, how it powers Google File Search RAG, and how multimodal retrieval works across PDFs, docs, images, screenshots, charts, diagrams, metadata, and citations.

**Calypso Research**
50 min read·May 18, 2026
##### Answer
The Gemini API File Search tool, often searched for as Google File Search, is Google’s managed retrieval-augmented generation system for grounding Gemini responses in your own files.
##### What is Google File Search?
It lets developers upload, index, retrieve, and cite information from documents and multimodal content without building a custom vector database pipeline from scratch.
The simplest way to think about the Gemini API File Search tool is this: it gives developers managed RAG infrastructure directly inside the Gemini API. File Search handles the document ingestion, chunking, embedding, indexing, retrieval, and citation workflow that teams would otherwise need to assemble with a parser, embedding model, vector database, retriever, and custom grounding layer.
With Gemini Embedding 2, the File Search tool now supports multimodal RAG across text and images. That means teams can build AI answer experiences over PDFs, docs, screenshots, charts, diagrams, product images, visual documentation, help center content, and other mixed-format knowledge bases.
For product teams, this changes the shape of RAG. Instead of building a chatbot that only searches text, you can build a verifiable AI answer layer that retrieves from the actual knowledge your company uses every day: documentation, manuals, PDFs, visual guides, diagrams, forms, tables, screenshots, reports, and internal files.
##### Key features of the Gemini API File Search tool
The Gemini API File Search tool abstracts away much of the classic RAG stack. Instead of manually preparing files, chunking content, generating embeddings, storing vectors, running similarity search, injecting context, and mapping citations back to source files, developers can create a File Search Store and attach the File Search tool when calling Gemini. The core advantage is speed to production.
The File Search tool helps teams move from “we need RAG” to “we have grounded answers with citations” without operating a custom retrieval infrastructure layer.
- Managed file ingestion, chunking, embedding, indexing, and retrieval
- Semantic search powered by Gemini embeddings
- Multimodal retrieval with Gemini Embedding 2
- Support for text files, code files, rich documents, PDFs, and images
- Page-level citations for paged documents
- Media citations for referenced image content
- Custom metadata for filtered retrieval
- A managed File Search Store for organizing indexed knowledge
- Grounding metadata that helps applications show where answers came from
- A cost model where storage and query-time embeddings are free, while indexing embeddings and normal Gemini input/output tokens are billed
##### What makes the Gemini API File Search tool multimodal?
Traditional RAG systems usually start with text. They extract text from documents, split that text into chunks, embed those chunks, and retrieve the most relevant passages at query time. That works well when the knowledge is mostly written.
But many real-world files are not purely textual. PDFs often contain charts, tables, diagrams, screenshots, product images, forms, visual instructions, and page layouts where the meaning depends on what the user can see, not just the words that can be extracted. This is where traditional RAG starts to lose context.
In older RAG pipelines, visual content usually had to be converted into text first. A screenshot, chart, scanned page, product image, or diagram would be passed through OCR or image captioning, then the generated text would be embedded and searched. That approach can be useful, but it is not ideal. OCR can miss visual structure, layout, colors, relationships, axes, icons, annotations, image quality, and the spatial meaning that makes diagrams, charts, and screenshots useful in the first place.
For example, OCR might capture the words inside a product screenshot, but miss the interface state. It might extract labels from a chart, but lose the trend, shape, legend, or visual comparison. It might read text inside a diagram, but fail to preserve how the components connect to each other.
Multimodal RAG goes further. When a File Search Store is configured with Gemini Embedding 2, the Gemini API File Search tool can support retrieval across both text and image content. Instead of forcing every visual asset through a text-only pipeline, File Search can make visual information searchable alongside written documents. That means a knowledge base can include PDFs, docs, screenshots, charts, diagrams, product images, help center articles, and visual guides in the same retrieval layer.
This matters because company knowledge is rarely just text. It lives inside documents, but also inside the visual material embedded in those documents. A multimodal File Search workflow helps preserve more of that context, making AI answers more useful for real product, support, sales, research, compliance, and documentation use cases.
- Product screenshots
- Charts and graphs
- Architecture diagrams
- Sequence diagrams
- Visual troubleshooting guides
- Product images
- Scanned forms
- Slide decks
- PDF manuals
- Reports with tables and figures
- Help center articles with embedded visuals
- Design system references
- Real estate floor plans and listing images
- Research documents with charts or diagrams
##### How the Gemini API File Search workflow works
The File Search workflow has two main phases: indexing and retrieval.
First, you create a File Search Store. This store acts as the managed retrieval layer for your files. You upload documents, images, or other supported content into the store, and the Gemini API handles the processing pipeline.
Second, you call Gemini with the File Search tool attached.
Gemini retrieves relevant context from the File Search Store, uses that context during generation, and returns a grounded answer.

For text-heavy RAG, a text-oriented embedding setup may be enough. For multimodal RAG, the important step is creating the store with `models/gemini-embedding-2`.

- Create a File Search Store.
- Choose the right embedding model.
- Upload and index files.
- Attach the File Search tool when calling Gemini.
- Receive a grounded response.
- Inspect grounding metadata for source citations, page numbers, or media references.

```python
from google import genai
from google.genai import types
import time

client = genai.Client()

# Create a File Search Store for multimodal RAG.
file_search_store = client.file_search_stores.create(
    config={
        "display_name": "company-knowledge-base",
        "embedding_model": "models/gemini-embedding-2"
    }
)

# Upload and index a file.
operation = client.file_search_stores.upload_to_file_search_store(
    file="product-guide.pdf",
    file_search_store_name=file_search_store.name,
    config={
        "display_name": "Product Guide"
    }
)

# Wait for indexing to complete before querying.
while not operation.done:
    time.sleep(5)
    operation = client.operations.get(operation)

# Query Gemini with the File Search tool.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What does the product guide say about onboarding users?",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[file_search_store.name]
                )
            )
        ]
    )
)

print(response.text)
```

## **Supported data and multimodal content**

The Gemini API File Search tool supports a wide range of files for retrieval workflows, including documents, structured files, code files, spreadsheets, and presentations.

For multimodal use cases, File Search can work with images when the File Search Store is configured with Gemini Embedding 2. This lets teams search visual assets alongside text documents. That means a single knowledge base can support questions over many different content types.

This is one of the most important reasons the File Search tool is becoming more useful for production AI applications. The system can work closer to how teams actually store knowledge: across text, visuals, structured files, and long-form documents.

Supported file types include:

- PDFs
- Word documents
- Markdown
- TXT
- JSON
- CSV
- Excel files
- PowerPoint files
- HTML
- XML
- SQL
- Shell scripts
- JavaScript
- TypeScript
- Source code files
- Rich documents
- Image files for multimodal retrieval

Content types a single knowledge base can cover:

- PDFs and manuals
- Help center articles
- Internal documentation
- Product screenshots
- Technical diagrams
- Charts and reports
- Visual onboarding guides
- Product images
- Slide decks
- Tables and structured files
- Code and developer documentation

## **Page-level citations and media citations**

Citations are one of the most important parts of a trustworthy RAG experience. When Gemini answers with File Search context, the response can include grounding metadata that identifies which uploaded files or chunks supported the answer.

For paged documents, that metadata can include page numbers. This is especially useful for reports, manuals, research papers, policies, contracts, compliance files, and long technical documents.

For multimodal retrieval, File Search can also return media references. When image content supports an answer, developers can use the grounding metadata to show users the actual visual source behind the response. That creates a much stronger answer experience.
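
As a rough illustration of that last step, the sketch below turns a citation record into a visible source line. The `Citation` class and its field names (`document_title`, `page_number`, `media_reference`) are hypothetical simplifications for this example, not the Gemini API's grounding metadata schema. A real application would first map the response's grounding metadata into a shape like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    """Hypothetical, simplified citation record (not the Gemini API schema)."""
    document_title: str
    page_number: Optional[int] = None      # present for paged documents
    media_reference: Optional[str] = None  # present when an image supported the answer


def format_citation(citation: Citation) -> str:
    """Build a user-facing source line from whatever detail is available."""
    parts = [citation.document_title]
    if citation.page_number is not None:
        parts.append(f"page {citation.page_number}")
    if citation.media_reference is not None:
        parts.append(f"image: {citation.media_reference}")
    return "Source: " + ", ".join(parts)


# Illustrative values only; the file names are made up for this example.
print(format_citation(Citation("Installation Manual")))
# prints: Source: Installation Manual
print(format_citation(Citation("Installation Manual", 7, "wiring-diagram.png")))
# prints: Source: Installation Manual, page 7, image: wiring-diagram.png
```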
Instead of only saying:

“This answer came from the product guide.”

Your application can say:

“This answer came from page 14 of the product guide and this specific referenced image.”

That is the difference between a generic AI chatbot and a verifiable answer layer.

## **Custom metadata and filtered retrieval**

As a knowledge base grows, retrieval quality depends on more than semantic similarity. A user’s question may be semantically similar to many files, but only some of those files should be eligible for the answer.

A support question might need only approved help center content. A sales question might need only current pricing sheets. A customer-specific question might need only files from that customer’s workspace.

Custom metadata helps solve this. With File Search metadata, developers can attach labels to files and then filter retrieval at query time. This is essential for production RAG because it reduces irrelevant retrieval, keeps answers scoped to the right context, and helps prevent outdated or unauthorized files from being used.

Metadata filtering is especially useful for multi-tenant SaaS applications, customer-specific knowledge bases, internal company assistants, legal and compliance workflows, support knowledge systems, sales enablement tools, product documentation agents, and teams with draft and approved content states.

Good retrieval is not only about finding relevant content. It is about finding the right relevant content.

- `department: support`
- `department: legal`
- `status: approved`
- `status: draft`
- `customer: acme`
- `language: english`
- `content_type: onboarding`
- `source: help_center`
- `product: enterprise`
- `version: 2026`

## **Gemini API File Search vs. traditional RAG**

Traditional RAG gives teams maximum control, but it also creates more operational work. A custom RAG stack often requires many moving parts. The Gemini API File Search tool handles much of this inside the Gemini API.

That tradeoff is important.
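
To make those moving parts concrete, here is a toy sketch of three of them (chunking, embedding, and similarity search) in plain Python. Everything in it is an illustrative assumption: the `chunk`, `embed`, `cosine`, and `retrieve` helpers are hypothetical, and the bag-of-words `embed` merely stands in for a real embedding model. A production stack would still need parsing, a vector database, ranking, citation mapping, and monitoring on top of this.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 8) -> list[str]:
    """Split text into fixed-size word windows (a deliberately naive chunker)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase bag-of-words counts, standing in for a real model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Rank chunks by similarity to the query and return the best matches."""
    query_vector = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    return ranked[:top_k]


# Made-up sample text for illustration.
document = (
    "Admins invite users from the settings page. "
    "Invoices are emailed on the first day of each month."
)
chunks = chunk(document, size=7)
print(retrieve("how do admins invite users", chunks))
# prints: ['Admins invite users from the settings page.']
```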
For many product teams building answer widgets, support assistants, internal knowledge tools, and workflow-ready agents, the File Search tool is a strong default because it reduces the amount of infrastructure required to ship.

The moving parts of a custom RAG stack typically include:

- Document parsing
- Text extraction
- Chunking logic
- Embedding generation
- Vector database setup
- Similarity search
- Retrieval ranking
- Context injection
- Citation mapping
- Monitoring and tuning
- File lifecycle management

Which approach to choose:

- Use the Gemini API File Search tool when you want fast setup, managed ingestion, managed indexing, built-in citations, multimodal retrieval, metadata filtering, lower infrastructure burden, and a clean path to production RAG experiences.
- Use a custom RAG stack when you need highly specialized ranking, full control over chunking and retrieval, custom hybrid search, existing vector database infrastructure, advanced governance outside the retrieval layer, deep integration with enterprise search systems, or complex retrieval logic beyond what the managed tool provides.

## **Comparison: text-only RAG vs. multimodal File Search RAG**

The biggest difference is that multimodal File Search can retrieve from visual content directly. That makes it more useful for real company knowledge, where answers often depend on more than text.

| **Capability** | **Text-only RAG** | **Gemini API File Search tool with multimodal retrieval** |
| --- | --- | --- |
| Search text documents | Yes | Yes |
| Search PDFs | Often, after parsing | Yes |
| Search screenshots | Usually limited | Yes, with multimodal support |
| Search product images | Usually no | Yes, with Gemini Embedding 2 |
| Retrieve charts and diagrams | Limited | Stronger fit |
| Support citations | Usually custom-built | Built-in grounding metadata |
| Support page-level citations | Usually custom-built | Supported for paged documents |
| Support image/media references | Rare | Supported through media citations |
| Require a vector database | Usually yes | No separate vector database required |
| Support metadata filtering | Custom-built | Supported through custom metadata |

## **Benefits of the Gemini API File Search tool**

The main benefit of File Search is that it removes much of the infrastructure burden from RAG. Without File Search, teams usually need to build or integrate multiple systems before they can answer from their own knowledge. With the File Search tool, the ingestion and retrieval layer is managed through the Gemini API.

For product teams, this means more time can go into the user experience: answer quality, citation design, workflows, permissions, routing, analytics, and deployment.

- Faster RAG prototyping
- Less infrastructure to maintain
- Built-in grounding and citations
- Support for multimodal retrieval
- Metadata filtering for scoped answers
- A managed knowledge store for files
- A cleaner path from prototype to production
- Less work around parsing, chunking, embedding, and retrieval plumbing

## **Use cases for multimodal File Search RAG**

Website answer widgets can answer visitor questions from product docs, FAQs, PDFs, help center pages, screenshots, visual guides, and internal product knowledge. Instead of making users search across dozens of pages, the site can return a grounded answer with source references.

Support teams can use File Search to answer from troubleshooting guides, screenshots, bug reports, help articles, policy PDFs, and product documentation. Multimodal retrieval is especially useful when support knowledge depends on visual steps, interface screenshots, or annotated guides.

Product and developer documentation often contains code, screenshots, diagrams, tables, release notes, and long-form guides. The Gemini API File Search tool can help turn mixed-format documentation into an AI assistant that gives answers with citations.

Sales teams can retrieve answers from pricing PDFs, product screenshots, case studies, comparison docs, onboarding decks, and enablement material. This makes it easier to answer buyer questions quickly while keeping responses grounded in approved content.

Engineering teams often store knowledge in architecture diagrams, ERDs, sequence diagrams, implementation notes, design docs, and code files. Multimodal retrieval makes those assets easier to search and reuse inside internal knowledge tools, coding assistants, and engineering copilots.

Reports, research papers, market analysis, and technical PDFs often include charts, figures, tables, and diagrams. File Search can retrieve relevant context and help cite where the answer came from, which is especially important for research, compliance, and decision-support workflows.

Real estate workflows often depend on photos, floor plans, PDFs, maps, listing descriptions, and structured property data. Multimodal retrieval can help applications search across both visual and textual property knowledge.

Design teams can use multimodal retrieval to search component libraries, screenshots, brand assets, mockups, and documentation by visual appearance or natural language description. This is difficult to do with text-only retrieval.

## **Pricing and billing**

The Gemini API File Search tool is designed to reduce the cost and complexity of operating RAG.
The pricing model is centered on indexing and Gemini model usage. In practice, this means the main File Search-specific cost is preparing the knowledge base, not running every future retrieval query against already indexed content. That model is attractive for teams that expect many repeated questions over a relatively stable knowledge base.

- File storage is free.
- Query-time embeddings are free.
- Embeddings are billed when files are indexed.
- Retrieved document tokens are charged as regular context tokens.
- Normal Gemini model input and output token costs still apply.

## **Limitations to know**

The Gemini API File Search tool is powerful, but it is not a complete replacement for every retrieval architecture.

The most important point: File Search handles retrieval infrastructure, but your application still needs good product design around trust, permissions, citations, analytics, escalation, and workflow integration.

- Some file types and modalities may not be supported.
- File size and store size limits apply.
- Tool compatibility constraints may apply depending on the Gemini API configuration.
- Teams with highly specialized ranking needs may still need custom retrieval infrastructure.
- Some governance, permissions, and review workflows must still be implemented at the application layer.
- The quality of the answer experience still depends on source quality, file organization, metadata design, and UX.

## **How Calypso uses the File Search opportunity**

Calypso turns the Gemini API File Search tool into a production-ready answer layer for websites, agents, workflows, and product experiences.

The File Search tool handles the managed retrieval foundation. Calypso focuses on the product layer around it: polished answer UX, reusable knowledge surfaces, source-backed responses, workflow integrations, and deployment paths for teams that want to ship faster.
Instead of building a custom RAG application from scratch, teams can use Calypso to turn docs, PDFs, screenshots, charts, diagrams, help content, and images into a grounded AI experience.

In other words, the Gemini API File Search tool provides the retrieval infrastructure. Calypso helps turn that infrastructure into a product surface users can trust.

- Add an AI answer widget to a website
- Build source-backed product support
- Give agents access to grounded company knowledge
- Reuse the same retrieval layer across workflows
- Turn PDFs, docs, and visual content into verifiable answers
- Launch faster without building a custom vector database pipeline
- Show citations and source references in the product experience

## **Getting started with the Gemini API File Search tool**

To get started, developers typically follow this path:

- Create a File Search Store.
- Choose the embedding model.
- Upload files into the store.
- Wait for indexing to complete.
- Call Gemini with the File Search tool attached.
- Display the answer.
- Show citations or source references.
- Add metadata filters as the knowledge base grows.
- Connect the same retrieval layer to your product, website, agents, or workflows.

For multimodal RAG, create the store with Gemini Embedding 2 so the system can support retrieval across text and image content.

For production applications, also plan the surrounding product layer:

- File permissions
- Workspace or customer scoping
- Metadata strategy
- Citation UI
- Fallback behavior
- Analytics
- Escalation paths
- Evaluation workflows
- Source refresh logic

A great RAG product is not just retrieval. It is retrieval plus trust, usability, and workflow design.

## **Conclusion**

The Gemini API File Search tool is one of the clearest ways to build managed RAG with Gemini. It gives developers a way to upload files, index knowledge, retrieve relevant context, and return grounded answers with citation metadata without building a custom vector database stack.
The most important shift is multimodal RAG. Modern company knowledge is not just text. It lives in screenshots, charts, diagrams, product images, slide decks, PDFs, forms, manuals, research documents, and visual guides.

With Gemini Embedding 2, metadata filtering, page-level citations, and media references, File Search can power AI experiences that are more useful, more verifiable, and closer to the way teams actually store knowledge.

For teams building AI answer widgets, support assistants, product documentation agents, internal knowledge tools, or workflow-ready AI systems, the Gemini API File Search tool provides the retrieval foundation. Calypso helps turn that foundation into a polished, source-backed product experience.

## **Sources**

Links used to ground claims in this article.

- (1) **Introducing the File Search Tool in Gemini API**: blog.google/innovation-and-ai/technology/developers-tools/file-search-gemini-api
- (2) **File Search | Gemini API | Google AI for Developers**: ai.google.dev/gemini-api/docs/file-search
- (3) **Using Gemini File Search Tool for RAG (Rickbot Blog)**: medium.com/google-cloud/using-gemini-file-search-tool-for-rag-a-rickbot-blog-b6c4f117e5d3
- (4) **Gemini API File Search is now multimodal**: blog.google/innovation-and-ai/technology/developers-tools/expanded-gemini-api-file-search-multimodal-rag
- (5) **Why Google’s File Search could displace DIY RAG stacks in the enterprise**: venturebeat.com/ai/why-googles-file-search-could-displace-diy-rag-stacks-in-the-enterprise
- (6) **Google Gemini just dropped a game-changing RAG feature!**: linkedin.com/posts/samwitteveen_ai-rag-gemini-activity-7393311986182320128-I2eN

**Put Calypso RAG to work**

## **Turn grounded answers into a production-ready product surface.**

Use one retrieval layer across your website, PDFs, docs, workflows, and internal tools without losing citations, trust, or speed to launch.

[**See live demo**](https://www.calypso.so/) [**Get Started for Free**](https://rag.calypso.so/join)
---