How We Stopped Fine-Tuning and Started Querying: Real-Time RAG with DeepSeek at Rast Mobile
At Rast Mobile, we recently built a fully local RAG system using DeepSeek 32B, integrated with live company databases, and we didn’t need any fine-tuning.
That’s right. No custom training. No embedding pipelines.
Just smart use of metadata, live SQL, and prompt engineering. Here’s how we did it.
- The Realisation: You Don’t Need Fine-Tuning
We started this project thinking we might need to fine-tune a model to understand our client’s business data. But as we dug in, it became clear:
RAG isn’t about teaching the model new knowledge. It’s about giving it the right context at the right time.
And in our case, that context was structured business data from PostgreSQL. We already had it.
- LLM’s Job: SQL Generator + Answer Enhancer
So instead of fine-tuning, we used the LLM in two key ways:
- Generate SQL based on user questions and table metadata
- Enhance and rephrase the result into a meaningful, user-friendly explanation
That’s it.
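To make those two roles concrete, here’s a minimal sketch of the prompt shapes we mean. The wording and placeholders below are illustrative, not our production templates:

```python
# Illustrative prompt templates (not our exact production prompts).

SQL_GENERATION_PROMPT = """You are a PostgreSQL expert. Using only the schema below,
write ONE read-only SQL query that answers the user's question.
Return only the SQL, with no explanation.

Schema:
{schema_metadata}

Sample rows:
{sample_rows}

Question: {question}
SQL:"""

ANSWER_ENHANCEMENT_PROMPT = """The user asked: {question}
The query returned: {query_result}
Answer the question in one short, friendly sentence of plain language."""
```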
- Our RAG Strategy: Simple, Effective, Local
Here’s what the system does step by step:
- Receive natural language question from user
→ e.g., “How many orders were delivered last week?”
- LLM reviews metadata (table names, column types, relationships, samples)
→ No embeddings. Just descriptive metadata and sample rows in context.
- LLM generates SQL query
→ e.g., `SELECT COUNT(*) FROM orders WHERE status = 'delivered' AND created_at >= now() - interval '7 days';`
- The system executes the query on the live PostgreSQL DB
- LLM wraps the result into a readable, customized answer
→ “There were 187 delivered orders in the last 7 days.”
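In code, the whole loop is only a few calls. Here’s a minimal sketch reusing the prompt templates above; it assumes a local vLLM server exposing an OpenAI-compatible endpoint, and the model name, connection string, and SELECT-only guardrail are all assumptions for illustration:

```python
import psycopg2
from openai import OpenAI

# vLLM serves an OpenAI-compatible API locally; port and model name are assumptions.
llm = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
MODEL = "deepseek-32b"  # whatever name your local server registers


def ask(prompt: str) -> str:
    resp = llm.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()


def answer_question(question: str, schema_metadata: str, sample_rows: str) -> str:
    # Steps 1-3: generate SQL from the question plus injected metadata.
    sql = ask(SQL_GENERATION_PROMPT.format(
        schema_metadata=schema_metadata, sample_rows=sample_rows, question=question,
    ))

    # Guardrail (our suggestion for any sketch like this): read-only statements only.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")

    # Step 4: execute against the live PostgreSQL database.
    with psycopg2.connect("dbname=shop user=readonly") as conn, conn.cursor() as cur:
        cur.execute(sql)
        result = cur.fetchall()

    # Step 5: let the LLM phrase the raw result for the user.
    return ask(ANSWER_ENHANCEMENT_PROMPT.format(question=question, query_result=result))


print(answer_question(
    "How many orders were delivered last week?",
    schema_metadata="orders(id int, status text, created_at timestamptz)",
    sample_rows="(1, 'delivered', '2025-05-02 10:31:00+00')",
))
```

In practice you’d also strip any markdown fences the model wraps around its SQL, but the shape of the pipeline doesn’t change.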
- What Worked Well
- No need for fine-tuning
- No vector databases or embeddings
- Just smart prompt templates + live metadata injection
- Works with any SQL-compatible DB (PostgreSQL, MySQL, MSSQL, etc.)
- All On-Premise
We run everything locally using DeepSeek 32B, so data privacy is fully preserved. No API calls to OpenAI, no cloud storage. Great for enterprises with strict security policies.
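For reference, standing this up locally is mostly a matter of launching vLLM’s OpenAI-compatible server and pointing your client at localhost. The model identifier below is an assumption, so substitute whichever checkpoint you actually run:

```python
# Shell, e.g.:  vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
# (Model id above is an assumption; use the 32B DeepSeek checkpoint you run.)
from openai import OpenAI

# All traffic stays on localhost: no calls to OpenAI, no cloud storage.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
print([m.id for m in client.models.list()])  # sanity check that the server is up
```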
- Live Use Case
The system is now running at one of our partner companies. Non-technical team members can ask business questions and get real-time answers without needing dev support or writing SQL.
It’s like giving them a data analyst who never sleeps.
- Tech Stack Summary
- 🔁 DeepSeek 32B (running locally with vLLM)
- 🧩 Prompt templates dynamically injected with (see the metadata sketch below):
- Table names + descriptions
- Column metadata
- Sample rows
- 💾 PostgreSQL (but adaptable to any SQL DB)
- 🧠 RAG logic: Generate SQL → Query DB → Enhance Answer
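The injected metadata comes straight from the database itself. Here’s a minimal sketch of building that context from PostgreSQL’s standard information_schema; the table and connection details are illustrative:

```python
import psycopg2


def build_schema_metadata(conn, table: str, sample_limit: int = 3) -> str:
    """Collect column metadata plus a few sample rows for prompt injection."""
    with conn.cursor() as cur:
        # Column names and types from the standard information_schema.
        cur.execute(
            """SELECT column_name, data_type
               FROM information_schema.columns
               WHERE table_name = %s
               ORDER BY ordinal_position""",
            (table,),
        )
        columns = ", ".join(f"{name} {dtype}" for name, dtype in cur.fetchall())

        # A few sample rows so the model sees real value formats.
        # (Table name comes from our own config, never from user input.)
        cur.execute(f"SELECT * FROM {table} LIMIT {sample_limit}")
        samples = "\n".join(str(row) for row in cur.fetchall())

    return f"Table {table}({columns})\nSample rows:\n{samples}"


conn = psycopg2.connect("dbname=shop user=readonly")  # illustrative DSN
print(build_schema_metadata(conn, "orders"))
```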
- What You Can Take Away
If you’re building a RAG system for structured data:
- Skip fine-tuning for SQL-based tasks
- Use rich metadata and prompt engineering
- Let LLMs assist SQL writing, not replace your logic layer
- Think of RAG as contextual I/O, not knowledge training
- Want to Learn More?
We’re now offering this solution as a service at Rast Mobile — with full support for local hosting, custom prompt engineering, and SQL integration.
Email me or visit rastmobile.com if you want to use LLMs as live data copilots in your business.