Research #rag #embeddings

RAG on a small budget — what actually works?

priya_data

May 29, 2026

Building a doc-QA feature for a client with almost no infra budget. Current setup: chunk at ~500 tokens, small embedding model, pgvector on a free-tier Postgres, then rerank the top 20 with the LLM itself. Works surprisingly well. What are your cheap-but-good tricks?
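
For anyone wanting a starting point for the chunking step, here is a minimal sketch. It assumes whitespace-separated words are a good enough stand-in for tokens, which is only roughly true for a real tokenizer. The function name `chunk_words` and the `overlap` parameter are illustrative and not part of the setup described above.

```python
def chunk_words(text: str, chunk_size: int = 500, overlap: int = 0) -> list[str]:
    """Split text into chunks of about `chunk_size` words.

    Words are used as a rough proxy for tokens (an assumption).
    `overlap` repeats that many trailing words at the start of the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(1200))
    print([len(c.split()) for c in chunk_words(sample)])
```

Swapping the word split for the embedding model's own tokenizer would give sizes closer to the real ~500-token target.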


💬 2 Comments


dev_arjun

12d ago

Cache aggressively. Half of our users' questions repeat, so we answer those from cache, and our LLM bill dropped 60%.
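
A rough sketch of what such a cache could look like, assuming exact matching after light normalization (lowercasing, stripping punctuation, collapsing whitespace). The comment does not say how repeats are detected, so the normalization rule, the `AnswerCache` class, and the `answer_fn` callback standing in for the full RAG pipeline are all hypothetical.

```python
import hashlib
import re
from typing import Callable


def normalize(question: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", "", question.lower())
    return " ".join(no_punct.split())


class AnswerCache:
    """In-memory cache keyed by a hash of the normalized question."""

    def __init__(self, answer_fn: Callable[[str], str]) -> None:
        self._answer_fn = answer_fn  # hypothetical: the expensive RAG + LLM call
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def ask(self, question: str) -> str:
        key = hashlib.sha256(normalize(question).encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = self._answer_fn(question)
        self._store[key] = answer
        return answer
```

In a real deployment the dict would likely live in a table on the same Postgres instance, with entries invalidated whenever the underlying documents change.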

tomas_ml

14d ago

Hybrid search. Adding plain BM25 next to vectors fixed more retrieval misses than any embedding upgrade ever did, and it's basically free.
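
The comment does not say how the two result lists get merged. One common option is reciprocal rank fusion (RRF), sketched below under the assumption that you already have one ranked list of document IDs from BM25 and one from the vector search. The function name and the sample IDs are illustrative; `k=60` is the conventional default for RRF, not a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of doc IDs; each list adds 1 / (k + rank) per doc."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]    # hypothetical BM25 ranking
    vector_hits = ["c", "a", "d"]  # hypothetical vector ranking
    print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

RRF only looks at ranks, so it sidesteps the problem that BM25 scores and cosine distances are on different scales.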
