LoRA fine-tuning Llama 3 for domain Q&A with eval gates
AI Builders Team
Community Starter · Jun 10, 2026
Workflow:
1) Data: Curate 20-50k high-quality domain Q&A pairs, deduplicate them, and add hard negatives.
2) Split: Split train/val/test by document origin, so pairs drawn from the same source document never land in different splits (this avoids leakage).
3) Base: Choose Llama 3 8B or 70B depending on the latency budget.
4) LoRA: r=16-32, alpha=32, dropout=0.05; train in bf16; use gradient accumulation to fit GPU memory.
5) Training: 1-3 epochs with a cosine LR schedule; stop early on validation loss.
6) Eval: Automatic metrics (exact match, BLEU) plus model-based grading for helpfulness and faithfulness; also test on counterfactuals.
7) Safety: Refusal tuning for out-of-scope questions; add jailbreak prompts paired with safe responses to the training data.
8) Deployment: Merge the adapter into the base weights or keep it separate; quantize for inference (e.g., AWQ or GPTQ; QLoRA is a 4-bit training method rather than an inference format); serve with vLLM.
9) Monitoring: Drift alerts, per-topic error heatmaps, and dataset shift checks.
10) Rollout: Shadow or canary deployment, with human review.
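
Steps 2 and 6 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the team's actual pipeline: the `doc_id` field, the 10%/10% val/test proportions, the answer normalization rules, and the gate threshold are all hypothetical and should be adapted to your data. Hashing the document ID keeps the split deterministic and guarantees that every pair from one document lands in the same split.

```python
import hashlib
import string
from typing import Dict, List


def assign_split(doc_id: str, val_pct: int = 10, test_pct: int = 10) -> str:
    """Map a source-document ID to a split, deterministically.

    All Q&A pairs that share a doc_id get the same bucket, so no document
    contributes to more than one split. Percentages are assumed defaults.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "val"
    return "train"


def split_by_origin(examples: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group examples into train/val/test using their (assumed) 'doc_id' field."""
    splits: Dict[str, List[Dict[str, str]]] = {"train": [], "val": [], "test": []}
    for example in examples:
        splits[assign_split(example["doc_id"])].append(example)
    return splits


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


def passes_gate(score: float, threshold: float) -> bool:
    """Eval gate: block promotion unless the score meets the threshold."""
    return score >= threshold
```

In practice the gate would run on the held-out test split after training, and a failing gate would stop the checkpoint from moving on to deployment. Exact match is strict for free-form answers, which is why the workflow pairs it with model-based grading.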