AI Architect. Ex-Agoda & Quantcast. Foundation model systems, retrieval, fine-tuning, compression, and drift/regression behavior in production. Advising on reliable model behavior under real-world usage.
Not sure what’s breaking? I diagnose the real root cause fast.
If you’re seeing hallucinations, drift, unstable formatting, retrieval issues, latency spikes, or regressions, I trace the failure to its underlying cause and recommend the highest-leverage fix.
You’ll get:
- A clear diagnosis of what’s actually failing
- Prioritized fixes to apply immediately
- Risk reduction for high-stakes customer interactions
- Direction on whether to prompt, adapt, fine-tune, or retrain
- When you should not fine-tune
Bring failing examples, queries, or outputs.
This session identifies which specific path (if any) you should pursue next.
Stabilize behavior in any customer-visible or investor-visible interaction.
I identify variance drivers in LLM outputs that prompt tests fail to surface, preventing embarrassing or off-brand failures.
You’ll walk away with:
- Tactics to reduce hallucinations under real queries
- Deterministic formatting patterns for structured output (see the sketch below)
- Consistent tone and brand voice heuristics
- Lightweight fallback paths for high-stakes use cases
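To make the formatting point concrete, here is a minimal sketch of a deterministic fallback for structured output. The `answer`/`confidence` contract and the `parse_or_fallback` helper are hypothetical; substitute your own schema.

```python
import json

# Hypothetical contract for a customer-facing reply; swap in your real schema.
REQUIRED_KEYS = {"answer", "confidence"}

def parse_or_fallback(raw: str, fallback: dict) -> dict:
    """Return parsed output only if it satisfies the contract, else a safe fallback."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback  # malformed JSON never reaches the customer
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return fallback  # missing fields count as failure, not a partial answer
    return data

# A reply that drifted out of format falls back deterministically.
reply = 'Sure! Here is the JSON you asked for: {"answer": "..."}'
print(parse_or_fallback(reply, {"answer": "Escalating to a human agent.", "confidence": 0.0}))
```

Either the output parses and satisfies the contract, or a known-safe reply ships instead: no malformed text in front of a customer.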
Best for teams shipping customer-facing AI features, pilots, or investor-visible artifacts.
Bring failing examples if you have them.
Choose the right model adaptation approach without burning budget.
I help teams pick between LoRA, adapters, instruction tuning, compression, and distillation without over-specializing models to the point that they break in production. A minimal configuration sketch follows the list below.
- Architecture-fit recommendations
- Dataset quality heuristics to avoid overfitting
- Catastrophic-forgetting tripwires
- Cost vs. latency vs. accuracy trade-offs
- When synthetic data is genuinely useful
- When compression should be combined with tuning
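As an illustration, a minimal LoRA configuration sketch using Hugging Face’s peft library. The base model, target modules, and hyperparameters are placeholder assumptions, not recommendations; the right values depend on your architecture and data.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model

config = LoraConfig(
    r=8,                        # low rank keeps the update small and cheap to serve
    lora_alpha=16,
    target_modules=["c_attn"],  # attention projection for GPT-2; differs per architecture
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of weights train
```

Because the base weights stay frozen and only the low-rank adapters train, this style of adaptation is a first guard against the catastrophic-forgetting tripwires above.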
Bring sample training examples if possible.
Avoiding one unnecessary adaptation can save thousands.
Fix hallucinations by improving retrieval coherence.
Most factuality issues come from weak retrieval, not model reasoning. I improve retrieval relevance, chunk boundaries, and embedding quality so answers stay consistent across phrasings; a chunking sketch follows the list below.
- RAG architecture recommendations
- Chunking and embedding patterns that preserve semantics
- Retrieval tuning guidance for phrasing variance
- Grounding techniques without retraining
- Considerations for memory layers such as scratchpads or caches
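For example, a minimal sentence-aware chunking sketch with overlap, so a fact that straddles a chunk edge stays retrievable from both sides. The window sizes are assumptions to tune against your own retrieval evals.

```python
import re

def chunk(text: str, max_chars: int = 800, overlap_sents: int = 2) -> list[str]:
    """Pack whole sentences into overlapping chunks instead of cutting mid-sentence."""
    sents = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, cur = [], []
    for s in sents:
        if cur and sum(len(x) for x in cur) + len(s) > max_chars:
            chunks.append(" ".join(cur))
            cur = cur[-overlap_sents:]  # carry trailing sentences forward for continuity
        cur.append(s)
    if cur:
        chunks.append(" ".join(cur))
    return chunks
```

Respecting sentence boundaries is one of the cheapest ways to preserve semantics before touching the embedding model at all.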
Bring failing queries.
Improved retrieval coherence can dramatically reduce hallucinations.
I design lightweight evaluation loops that catch breaks in formatting, reasoning, prompt sensitivity, and tone across long-tail edge cases, while keeping inference budgets under control. A drift-alert sketch follows the list below.
- Behavioral test design
- Output-constraint checks such as JSON, XML, or schema
- Drift alerts on silent behavioral changes
- Lightweight scoring frameworks
- Minimal interpretability checks
- Compression guardrails to reduce cost without quality loss
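As a sketch of the drift-alert idea: replay a fixed behavioral suite and flag silent pass-rate drops. `run_model`, the schema check, and the thresholds are all illustrative assumptions.

```python
import json

def is_valid(output: str) -> bool:
    """Placeholder check; real suites also score reasoning, tone, and formatting."""
    try:
        return "answer" in json.loads(output)
    except json.JSONDecodeError:
        return False

def check_drift(run_model, suite: list[str],
                baseline: float = 0.97, tolerance: float = 0.03) -> float:
    """Replay the suite through `run_model` and alert on a silent regression."""
    rate = sum(is_valid(run_model(p)) for p in suite) / len(suite)
    if rate < baseline - tolerance:
        raise RuntimeError(f"Behavioral regression: {rate:.0%} vs baseline {baseline:.0%}")
    return rate
```

Run it on every model, prompt, or compression change; the stored baseline makes a silent regression loud before it reaches customers.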
Ideal for teams scaling beyond prompt testing.
A single silent regression or runaway cost spike can erase weeks of progress.