Research · #rag #embeddings

RAG on a small budget — what actually works?

priya_data

May 29, 2026

Building a doc-QA feature for a client with almost no infra budget. Current setup: chunk at ~500 tokens, small embedding model, pgvector on a free-tier Postgres, then rerank the top 20 with the LLM itself. Works surprisingly well. What are your cheap-but-good tricks?
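For anyone wanting to reproduce the chunking step, here is a minimal sketch of fixed-size chunking with a small overlap. It approximates tokens by whitespace-separated words, which is an assumption (a real tokenizer will count differently), and the name `chunk_text` and the 50-token overlap are illustrative, not part of the setup described above.

```python
def chunk_text(text, max_tokens=500, overlap=50):
    """Split text into chunks of at most max_tokens words.

    Assumption: one whitespace-separated word counts as one token.
    Consecutive chunks share `overlap` words so that a sentence cut
    at a boundary still appears whole in one of the two chunks.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")

    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        # Stop once this chunk has reached the end of the document.
        if start + max_tokens >= len(words):
            break
    return chunks


if __name__ == "__main__":
    doc = " ".join(f"w{i}" for i in range(1200))
    for i, chunk in enumerate(chunk_text(doc)):
        print(i, len(chunk.split()))
```

Each chunk would then be embedded and stored as one row, with the overlap trading a little extra storage for fewer answers split across a boundary.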

💬 2 Comments

dev_arjun

12d ago

Cache aggressively. Half of user questions repeat — we answer those from cache and the LLM bill dropped 60%.
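A rough sketch of what an exact-match answer cache could look like, assuming repeated questions differ only in case and whitespace. The `AnswerCache` class and the `answer_fn` callback (standing in for the full retrieve-and-generate pipeline) are hypothetical names for illustration, not the setup described in this reply.

```python
import hashlib


class AnswerCache:
    """In-memory cache keyed by a normalized form of the question."""

    def __init__(self, answer_fn):
        # answer_fn is a placeholder for the expensive RAG + LLM call.
        self._answer_fn = answer_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(question):
        # Lowercase and collapse whitespace so trivial variants share a key.
        normalized = " ".join(question.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ask(self, question):
        key = self._key(question)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = self._answer_fn(question)
        self._store[key] = answer
        return answer


if __name__ == "__main__":
    cache = AnswerCache(lambda q: f"answer to: {q}")
    cache.ask("How do I reset my password?")
    cache.ask("  how do I RESET my password? ")
    print(cache.hits, cache.misses)
```

In a real deployment the dictionary would likely be replaced by a table in the same Postgres instance, with entries invalidated whenever the underlying documents change.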

tomas_ml

14d ago

Hybrid search. Adding plain BM25 next to vectors fixed more retrieval misses than any embedding upgrade ever did, and it's basically free.
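The reply does not say how the two result lists get merged. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than the method used here. The function name, the document IDs, and the constant `k=60` (a commonly used default) are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. Documents ranked well by both BM25 and
    vector search rise to the top. Ties are broken by document ID so
    the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]    # hypothetical keyword-search results
    vector_hits = ["c", "a", "d"]  # hypothetical embedding-search results
    print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

RRF only needs ranks, not scores, so it sidesteps the problem of BM25 scores and cosine similarities living on different scales.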
