Nov 10, 2025
RAG Hit a Wall - How We Rebuilt Legal AI to Actually Scale
Every AI team hits a wall. Ours had RAG written all over it.
When I joined Jus Mundi, our first Legal AI worked and delivered value. But as we expanded into more real legal workflows, it stopped scaling. The stack looked perfect on paper, closely following the well-known survey "Retrieval-Augmented Generation for Large Language Models: A Survey" - and that was exactly the problem. To understand why, it helps to remember what RAG promises in theory and where it fails in practice in high-precision domains like law. For an overview of the RAG paradigm and its evaluation methods, see the latest arXiv surveys. (arXiv)
Where textbook RAG broke for us
1) No clear OMTM - One Metric That Matters
Were we optimizing for precision, recall, or answer quality? Trying to maximize everything meant we optimized nothing in the way lawyers define "good."
2) Metric saturation
Raising top-k should lift recall. If recall flattens even as k increases, your pipeline has a structural ceiling that no LLM swap will fix.
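The saturation check above is easy to automate. A minimal sketch (document IDs and the tolerance threshold are illustrative, not our production values):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant set found in the top-k retrieved results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def recall_curve_is_flat(retrieved: list[str], relevant: set[str],
                         ks: list[int], tolerance: float = 0.01) -> bool:
    """True when raising k buys almost no recall: a structural ceiling."""
    curve = [recall_at_k(retrieved, relevant, k) for k in sorted(ks)]
    return curve[-1] - curve[0] <= tolerance
```

Run this over a held-out query set: if the curve is flat across the k values you can afford, no amount of prompt or model tuning will surface the missing documents.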
3) Over-engineering retrieval
Legal retrieval needs domain adaptation. We drifted into brittle rules and if-elses - the symbolic-AI trap. Domain logic must be learned, not hardcoded.
4) The lawyer problem
Legal language is unforgiving. Mixing rule versions or confusing parties kills credibility. Citations are a minefield. Even great general-purpose models make these mistakes without better retrieval and orchestration.
This is why RAG demos can shine while production systems stall. Industry-wide excitement about RAG is justified - but adoption at scale still demands domain-specific choices. (TIME)
The 4 step rebuild that worked
1) Pick an OMTM - we chose recall
In law, missing a key precedent is not an option. We anchored on recall first, then layered precision and answer quality as second-order effects. If you cannot reliably retrieve everything that matters, nothing downstream can rescue the answer.
What we measured: recall-at-k curves, coverage versus token cost, citation faithfulness, and user-perceived completeness.
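Of these, citation faithfulness is the least standard, so here is one way to approximate it - a sketch that assumes answers carry bracketed markers like [icsid-42] (the marker format and IDs are hypothetical, not our actual schema):

```python
import re


def citation_faithfulness(answer: str, retrieved_ids: set[str]) -> float:
    """Fraction of citation markers in the answer that point at a document
    the retriever actually returned. Marker format [doc-id] is assumed."""
    cited = re.findall(r"\[([\w-]+)\]", answer)
    if not cited:
        return 1.0  # nothing cited, nothing to contradict
    grounded = sum(1 for doc_id in cited if doc_id in retrieved_ids)
    return grounded / len(cited)
```

A score below 1.0 means the model cited something it was never shown - exactly the failure mode lawyers will not forgive.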
2) Rebuild embeddings for arbitration and legal text
Off-the-shelf embeddings saturated too early. We trained for legal semantics so we could push k higher without drowning the LLM in noise. That unlocked richer context while keeping token costs in check.
Heuristic to track: k → tokens → latency → recall. If recall does not move as k increases, fix embeddings and retrieval before touching the model.
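The heuristic above can live in your eval harness as a small diagnostic. A sketch, with made-up field names and a made-up 0.02 gain threshold:

```python
from dataclasses import dataclass


@dataclass
class SweepPoint:
    """One measurement in a k-sweep: cost (tokens, latency) versus recall."""
    k: int
    prompt_tokens: int
    latency_ms: float
    recall: float


def diagnose(sweep: list[SweepPoint], min_gain: float = 0.02) -> str:
    """If recall barely moves between the smallest and largest k while
    tokens and latency climb, the fix is retrieval, not the model."""
    pts = sorted(sweep, key=lambda p: p.k)
    gain = pts[-1].recall - pts[0].recall
    if gain < min_gain:
        return "fix embeddings/retrieval"
    return "raising k still pays"
```

The point of the dataclass is to force you to log cost and quality together: a recall curve without its token and latency columns hides the real tradeoff.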
For background on RAG components and evaluation, see the arXiv survey synthesis. (arXiv)
3) De-symbolize retrieval
Instead of hand-written rules, we moved to learned selection guided by legal signals like document type, parties, procedural posture, dates, and clause versions. Keep heuristics small, auditable, and backed by data.
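Concretely, "learned selection" means turning those signals into features a ranker can weight from data, rather than branching on them in code. A minimal sketch - the field names and weights are hypothetical, not Jus Mundi's schema:

```python
def legal_signal_features(query: dict, doc: dict) -> list[float]:
    """Encode legal signals as ranker features (field names hypothetical)."""
    return [
        1.0 if doc.get("doc_type") == query.get("doc_type") else 0.0,
        1.0 if doc.get("rule_version") == query.get("rule_version") else 0.0,
        float(len(set(doc.get("parties", [])) & set(query.get("parties", [])))),
        1.0 if query.get("procedural_posture") is not None
               and doc.get("procedural_posture") == query.get("procedural_posture")
            else 0.0,
    ]


def rank(candidates: list[dict], query: dict, weights: list[float]) -> list[dict]:
    """Score candidates with learned weights instead of hand-written rules."""
    def score(doc: dict) -> float:
        return sum(w * f for w, f in zip(weights, legal_signal_features(query, doc)))
    return sorted(candidates, key=score, reverse=True)
```

The weights come from training data, so when lawyers disagree with a ranking you retrain instead of patching another if-else - which is what keeps the heuristic layer small and auditable.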
4) Build a proprietary multi-agent system that thinks like a lawyer
We designed orchestration that mirrors how arbitration lawyers actually work: a planner to decompose the question, a researcher to run targeted passes, an analyst to separate arguments and versions, and a citer to ground conclusions with precise, checkable references.
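The shape of that orchestration - not our implementation, just the data flow the four roles imply - can be sketched as a plain function chain, with each agent standing in for an LLM-backed component:

```python
from typing import Any, Callable

Agent = Callable[..., Any]


def run_workflow(question: str, planner: Agent, researcher: Agent,
                 analyst: Agent, citer: Agent) -> dict:
    """Planner decomposes the question, researcher runs one targeted pass
    per sub-question, analyst consolidates, citer grounds the conclusion
    in the retrieved passages."""
    sub_questions = planner(question)
    passages = [p for sq in sub_questions for p in researcher(sq)]
    analysis = analyst(passages)
    return citer(analysis, passages)
```

Writing it this way makes the point about owning orchestration: each stage is a plain function you can instrument, swap, and test in isolation, with no framework abstraction in between.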
We also removed abstraction layers that added complexity without value for our domain. For teams relying on generic agent frameworks, evaluate the tradeoffs carefully and read the platform’s own positioning about agents and orchestration to decide what you should own. See the official LangChain site and repository for context. (LangChain)
Results in under four months
Approximately 125 percent higher recall versus our previous stack
Significant gains in citation faithfulness and argument separation
More use cases unlocked without runaway token costs thanks to better retrieval and tighter orchestration
Checklist - signs your RAG is stuck in demo land
Recall at k flattens even as you raise k
Retrieval logic is a thicket of rules
Citations mix up parties or rule versions
Token costs rise without quality gains
User definition of good does not match your dashboard
Latency drifts due to excessive context packing and retries
If two or more resonate, your bottleneck is likely retrieval, not the LLM.
Practical playbook
1) Choose your OMTM explicitly
In regulated domains, start with recall. Add precision and answer quality as measurable layers.
2) Train or adapt embeddings
Domain-tuned embeddings beat generic ones on recall and coverage at a reasonable k.
3) Replace brittle rules with learned retrieval
Use domain signals and re-rankers. Keep heuristics simple and reversible.
4) Own orchestration where it matters
Mirror expert workflows. Separate planning, retrieval, analysis, and citation. Instrument everything.
5) Watch the cost stack
Track k → tokens → latency → recall. If costs climb without recall moving, fix retrieval first.
6) Measure what users care about
Completeness, correct citations, separation of arguments, and traceability beat generic scores.