Available · Founding Engineer roles & AI consulting
Om Kumar Solanki
AI / ML Engineer · Founding Engineer · AI Consultant · Full-Stack · Cloud · MLOps
Founding Engineer @ Resso.ai  ·  AI / ML Architect
ŷ = f(real data ; θ*) → production
agent[resso-ai] initialised_

I take a business problem, build the AI system that solves it, and hand it off running in production. Real-time ML pipelines, autonomous agent networks, on-premise RAG — end-to-end, every layer owned.

3+
Years prod AI
<2s
Inference latency
0.97
AUC hire-scoring
3
Companies built
0%
External calls (RAG)
▸ Live inference — resso-ai
resso-inference · v2.1 — IDLE
▪ Audio Input
▪ Pipeline
WebRTC audio capture
Speaker diarization
NLP token inference
ONNX hire-scorer (INT8)
▪ Hire Score

Multi-Agent Orchestration

Why one LLM isn't enough.
How I architect the alternative.

I build pipelines where specialised agents plan, remember, predict, and validate in concert — reliable AI products, not fragile demos, built on the same architecture patterns that Deloitte, Accenture, and McKinsey mandate in their enterprise AI deployments.

▸ Live pipeline trace — task execution flow

User Task
"analyse this contract"
Orchestrator
plans · routes · retries
Memory Agent
reads session + vector store
ML Agent
risk scoring · classification
Tool Agent
fetches docs · calls APIs
Validator
checks coherence · confidence
Structured Output
JSON · report · action
Single LLM forgets
Memory Agent keeps session context short-term and retrieves long-term facts from a vector store. Agents never lose the thread.
One model can't specialise
Each agent is tuned for its job. An ML agent runs a fine-tuned risk model, a tool agent handles API calls. No jack-of-all-trades hallucinations.
Long tasks time out or drift
The orchestrator breaks work into atomic sub-tasks, runs them in parallel where possible, and retries failed nodes without re-running the whole chain.
No fault tolerance
A validator agent checks every agent's output before it passes downstream. Bad outputs are caught and rerouted, not silently swallowed.
01
Define agent contracts

Each agent gets a strict input/output schema. The orchestrator enforces types so no agent can accept or emit arbitrary blobs.
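Step 01 in miniature: a sketch of a schema-enforced contract in plain Python. `RiskRequest` and its fields are hypothetical names, and a production version might use Pydantic or JSON Schema instead of dataclasses.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class RiskRequest:
    """Input contract for a hypothetical ML agent. Nothing else gets in."""
    document_id: str
    text: str

def enforce(schema, payload: dict):
    """Orchestrator-side check: reject payloads whose keys don't match
    the contract exactly, so no agent accepts arbitrary blobs."""
    expected = {f.name for f in fields(schema)}
    if set(payload) != expected:
        raise TypeError(f"payload keys {sorted(payload)} != contract {sorted(expected)}")
    return schema(**payload)

req = enforce(RiskRequest, {"document_id": "doc-17", "text": "This agreement..."})
```

The orchestrator calls `enforce` on every edge of the graph, so a malformed output from one agent fails loudly at the boundary instead of corrupting the next agent's input.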

02
Build the orchestrator

LangGraph or a custom state machine. It holds the DAG of agent dependencies, handles branching logic, and owns retry and fallback policy.
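The custom-state-machine branch of step 02 can be sketched as a toy scheduler (this is not LangGraph itself, just the core idea): walk a dependency map, run whichever nodes are ready, and retry a failed node alone instead of re-running the chain.

```python
def run_dag(nodes, deps, max_retries=2):
    """Run agent nodes in dependency order. `nodes` maps name -> callable
    taking the results-so-far dict; `deps` maps name -> prerequisite names.
    A failed node is retried alone; the rest of the chain is not re-run."""
    done, results = set(), {}
    while len(done) < len(nodes):
        ready = [n for n in nodes if n not in done
                 and all(d in done for d in deps.get(n, []))]
        if not ready:
            raise RuntimeError("cycle or missing dependency in agent DAG")
        for name in ready:
            for attempt in range(max_retries + 1):
                try:
                    results[name] = nodes[name](results)
                    break
                except Exception:
                    if attempt == max_retries:
                        raise
            done.add(name)
    return results
```

LangGraph layers streaming state, checkpointing, and conditional branching on top of the same loop; the retry-without-replay property is what keeps long pipelines cheap.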

03
Wire the memory layer

Session memory lives in Redis (fast, ephemeral). Permanent memory indexes into pgvector or Pinecone. Agents query both on every turn.
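A minimal sketch of that two-tier read, with the stores injected so the same agent code could sit in front of Redis and pgvector in production. `DictSession` and `ListVectors` are illustrative in-memory stand-ins, and the keyword-overlap "search" stands in for embedding similarity.

```python
class DictSession:
    """Stand-in for the Redis session store (fast, ephemeral)."""
    def __init__(self): self.turns = {}
    def add(self, sid, turn): self.turns.setdefault(sid, []).append(turn)
    def last_turns(self, sid, n=5): return self.turns.get(sid, [])[-n:]

class ListVectors:
    """Stand-in for the pgvector/Pinecone long-term index."""
    def __init__(self, facts): self.facts = facts
    def search(self, query, top_k=3):
        # real search ranks by embedding similarity; keyword overlap stands in
        hits = [f for f in self.facts if any(w in f for w in query.lower().split())]
        return hits[:top_k]

class MemoryAgent:
    def __init__(self, session, vectors):
        self.session, self.vectors = session, vectors
    def context(self, sid, query, top_k=3):
        """Query both tiers on every turn."""
        return {"recent": self.session.last_turns(sid),
                "long_term": self.vectors.search(query, top_k)}

sess = DictSession()
sess.add("s1", "user: analyse this contract")
mem = MemoryAgent(sess, ListVectors(["client prefers fixed-fee contracts"]))
ctx = mem.context("s1", "contract terms")
```

Because the agent only depends on `last_turns` and `search`, swapping Redis for the session tier or Pinecone for the vector tier changes wiring, not agent logic.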

04
Instrument every node

Each agent emits spans to an observability backend (Langfuse / OTLP). I can see exactly where latency or hallucinations enter the pipeline.

05
Harden with a validator

A lightweight LLM-as-judge agent scores coherence and confidence before results leave the pipeline. Below threshold, re-route or flag for human review.
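Step 05 as a sketch: in production the judge is an LLM call scoring coherence and confidence; here it is any callable returning a score in [0, 1], and the 0.7 threshold is illustrative.

```python
def validate(output: str, judge, threshold: float = 0.7):
    """LLM-as-judge gate: score the output before it leaves the pipeline.
    Below threshold, the result is re-routed instead of passed downstream."""
    score = judge(output)
    if score >= threshold:
        return {"status": "pass", "score": score, "output": output}
    return {"status": "reroute", "score": score, "output": None}
```

The key design point is that a failing output returns `output: None`, so nothing downstream can silently consume a low-confidence answer.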

Deloitte
Human-in-the-Loop (HITL)

Regulated industries require agents to pause and route high-risk decisions to human reviewers before action. Mandatory in Deloitte AI deployments for finance and healthcare.

Azure AI Studio · Semantic Kernel · Approval Workflows
Accenture
Guardrails + PII Redaction

Constitutional AI filters, hallucination detection, and automatic PII scrubbing on every agent output. Accenture AI Hub mandates these before any client data touches an LLM.

NeMo Guardrails · Presidio · Azure Content Safety
McKinsey / Big 4
Audit Trails + Compliance

Every agent action, tool call, and decision is logged with full trace context. SOC2, ISO 27001, and GDPR-ready architecture required across all enterprise AI platforms.

Langfuse · OpenTelemetry · Immutable Logs
IBM / SAP
Enterprise Orchestration Platforms

Large firms standardise on IBM Watson Orchestrate, AWS Bedrock Agents, or Google Vertex AI Agents for governance, cost control, and multi-model routing across business units.

Watson Orchestrate · Bedrock Agents · Vertex AI
Full Stack · LangGraph · AutoGen · Semantic Kernel · CrewAI · Azure AI Studio · AWS Bedrock · Vertex AI · Redis · pgvector · Pinecone · Langfuse · OpenTelemetry · NeMo Guardrails · FastAPI · Docker · Kubernetes

Full Technical Stack

From model weights to the browser — every layer owned.

No hand-offs, no integration gaps. One engineer who understands the full system.

AI / ML
PyTorch · HuggingFace · scikit-learn · XGBoost · ONNX · MLflow · Weights & Biases · OpenAI API · Anthropic API
Agent Frameworks
LangGraph · LangChain · CrewAI · AutoGen · Semantic Kernel · LlamaIndex
Memory & Vector
pgvector · Pinecone · Chroma · Weaviate · Redis · Qdrant
Backend
FastAPI · Node.js · PostgreSQL · GraphQL · REST · WebSockets · Celery · RabbitMQ
Frontend
Next.js · React · TypeScript · TailwindCSS · WebRTC
Cloud & Infra
AWS · GCP · Azure · Docker · Kubernetes · Terraform · GitHub Actions · CI/CD
Observability
Langfuse · OpenTelemetry · Datadog · Sentry · Grafana
01 //
Experience & Projects

Production AI systems. Real companies.

Every system here runs in the real world — with real users, real data, and real consequences when something breaks.

AI Consulting
How I solve real business problems with AI
Case studies · Discovery process · Business-to-tech translation · Outcomes
View consulting page ↗
Experience
Companies worked for
01

Resso.ai

Founding Engineer
Real-Time AI Interview Intelligence
Visit ↗
P(hire | audio, transcript, t) = σ(Wₜhₜ + b)
Live Screenshots
Resso.ai — Real-time interview scoring platform (WebRTC · PyTorch · ONNX · AWS)

Built the entire production ML platform from scratch. WebRTC audio capture at 8 kHz, speaker diarization pipeline separating candidate and interviewer voices in real time, NLP feature extraction across prosody, semantics and pace, and live hire-probability scoring — all streaming inference at sub-2-second latency during a live interview. The AI rates conversations as they happen, not after.

Designed the talking AI avatar system: lip-sync audio-to-viseme mapping, WebSocket event bus, and on-device model quantisation for smooth real-time video rendering. The system processes every spoken word into actionable structured signals.

Inference latency: 8s → <2s · Job placement rate +45% · Live during interview
PyTorch · WebRTC · Speaker Diarization · NLP · Docker · AWS · MLOps · ONNX
02

Corol.org & NunaFab

ML Engineer
UHPC Strength Prediction · ML for Structural Engineering
Visit ↗
ŷ = Σ wₖfₖ(X) + ε SHAP: φᵢ = E[f(X)|Xᵢ] − E[f(X)]
Live Screenshots
UHPC formulation prediction platform (XGBoost · SHAP attribution · R² = 0.89)

Ultra-High Performance Concrete (UHPC) meets machine learning. Working with Corol.org and NunaFab, I built compressive strength prediction models for UHPC mix designs — telling structural engineers not just what mix achieves target strength, but WHY each constituent (water-cement ratio, silica fume, fibre dosage, curing age) drives the outcome. The dataset was only 200 rows. Transfer learning from related concrete domains, aggressive feature engineering, and a SHAP-explainable gradient-boosting ensemble got R² to 0.89.

SHAP attribution made the model interpretable: engineers saw exact feature impact values (φᵢ) for each mix ingredient — silica fume contribution, fibre reinforcement effect, W/C ratio influence. Reduced physical lab testing cycles from weeks to a single afternoon. Screened hundreds of UHPC formulations computationally before any concrete was poured.
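The idea behind those φᵢ values can be shown exactly on a toy linear strength model. The real system uses SHAP's tree explainer over an XGBoost ensemble; the model and coefficients below are invented for illustration, and the brute-force loop is the textbook Shapley definition rather than SHAP's fast algorithm.

```python
from itertools import combinations
from math import factorial

def shapley(f, x, baseline):
    """Exact Shapley values: each feature's contribution to
    f(x) - f(baseline), averaged over all coalitions of other features."""
    n = len(x)
    def v(S):
        # evaluate f with features in S taken from x, the rest from baseline
        return f([x[j] if j in S else baseline[j] for j in range(n)])
    phi = []
    for i in range(n):
        total = 0.0
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (v(set(S) | {i}) - v(set(S)))
        phi.append(total)
    return phi

# toy strength model (invented): strength = 80 - 30*(w/c) + 0.2*silica + 0.5*fibre
f = lambda z: 80 - 30 * z[0] + 0.2 * z[1] + 0.5 * z[2]
phi = shapley(f, x=[0.20, 150, 40], baseline=[0.35, 100, 20])
```

The φᵢ always sum to f(x) − f(baseline), which is exactly why the per-ingredient numbers the engineers saw can be trusted: the attribution is complete, not heuristic.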

Lab cycles: weeks → one afternoon · 100s of mixes screened computationally · R² = 0.89 · SHAP-explainable
SHAP · XGBoost · Scikit-learn · Transfer Learning · Ensemble · Feature Engineering
SaaS Products & Projects
Live products & independent builds
01

Lawline.tech

AI Engineer · LIVE SAAS
Legal AI Platform for Rogers · Live SaaS · $1M Investment Conversation
Visit ↗
E(doc) = LLM(top-k(HNSW(chunk)) + clause_template) → {party, obligation, risk, date}
Live Screenshots
Lawline.tech — Live SaaS product (Fine-tuned LLM · HNSW · FastAPI)

Built the AI core for Lawline.tech — a Canadian legal research platform serving attorneys who cannot use any cloud AI due to attorney-client privilege. The stack is a fully local RAG: HNSW vector store over Canadian legal corpora, BGE cross-encoder reranker, GGUF-quantized local LLM — zero data ever leaves the office. Sub-4% hallucination rate on legal eval sets. Now in an active $1M investment conversation with the President of Rogers for enterprise licensing across Rogers' legal and compliance teams.

Designed the full pipeline: PDF ingestion → semantic chunking (512-token, 128 overlap) → local embeddings → HNSW index → top-K reranked retrieval → GGUF LLM response. Built confidence-gated output: low-confidence answers route to human review instead of surfacing to attorneys. The architecture became the sales pitch — attorneys demo'd the zero-outbound-packets screen to their law society contacts.
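The chunking stage of that pipeline, sketched with whitespace tokens standing in for the model's tokenizer. The numbers match the 512/128 figures above; the production splitter also respects clause and section boundaries, which this sliding window does not.

```python
def chunk(tokens, size=512, overlap=128):
    """Sliding-window chunking: consecutive chunks share `overlap` tokens,
    so a clause near a boundary still appears whole in at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```

With a 1,000-token document this yields three chunks starting at tokens 0, 384, and 768, each sharing its first 128 tokens with the tail of the previous chunk.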

Sub-4% hallucination · Air-gapped · 0 bytes leave the office · $1M Rogers President conversation
Legal AI · Fine-tuning · Document Parsing · HNSW · FastAPI · TypeScript · ONNX
02

MCP Integration Server

Builder
Universal Agentic Tool Layer
Agent(t) → [LLM + context] → MCP → {toolₙ} → structured_result → LLM

Every enterprise tool lives in a silo — Slack, CRMs, databases, internal APIs. AI agents need one standard to speak to all of them. I built a production MCP (Model Context Protocol) server that acts as the universal backbone. Write one integration, reach everything. The agent calls MCP; MCP speaks to the world. This is how agentic behaviour is triggered in real-world systems: LLM receives context → determines tool needed → MCP routes call → returns structured result → LLM continues reasoning.

Built multi-tool orchestration: parallel tool calls, retry logic, structured output parsing, and context memory integration so agents remember previous tool results across a session.
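The shape of that routing layer, reduced to a sketch. This is not the real MCP SDK: just one registry, retry logic, and every call coming back as a structured result the LLM can keep reasoning over. The `crm.lookup` tool is a hypothetical example.

```python
class ToolRouter:
    """Minimal stand-in for an MCP-style tool layer: write one integration,
    reach it through one uniform call interface."""
    def __init__(self):
        self.tools = {}

    def register(self, name, fn):
        self.tools[name] = fn

    def call(self, name, args, retries=2):
        """Route one tool call; failures come back structured, never raised."""
        if name not in self.tools:
            return {"tool": name, "ok": False, "error": "unknown tool"}
        for attempt in range(retries + 1):
            try:
                return {"tool": name, "ok": True, "result": self.tools[name](**args)}
            except Exception as exc:
                last_error = str(exc)
        return {"tool": name, "ok": False, "error": last_error}

router = ToolRouter()
router.register("crm.lookup", lambda customer_id: {"id": customer_id, "tier": "gold"})
res = router.call("crm.lookup", {"customer_id": "c-42"})
```

Returning errors as data rather than exceptions matters for the agent loop: the LLM sees `{"ok": false, ...}` as context and can choose a different tool instead of the pipeline dying.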

One server · N integrations · Agentic pipelines with persistent context memory
MCP Protocol · TypeScript · Node.js · Tool Use · Agentic AI · Context Memory
03

Vadtal — Vector DB Platform

AI Architect
On-Premise RAG · Private AI · Zero Data Egress
R(q) = top-k(cosine(E(q), Eᵢ)) → LLM(prompt + context)

A religious organisation managing thousands of donors needed AI search but couldn't send a single byte to the cloud. I designed and built a complete on-premise RAG stack for Vadtal: quantized LLM running on 16 GB RAM using GGUF format, a custom HNSW vector store with semantic chunking and metadata filtering, cosine similarity scoring, and a FastAPI inference server. Everything — embedding, retrieval, generation — runs locally.

Built the full RAG pipeline: document ingestion → semantic chunking → embedding (local model) → HNSW index → top-k retrieval with cosine similarity → context injection → LLM response. Sub-1s end-to-end query latency, 100% private, works offline.
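The retrieval step R(q) from the formula above, as a brute-force sketch: an HNSW index replaces this O(N) scan with a graph walk, but the cosine scoring is identical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, index, k=3):
    """Rank stored (doc_id, vector) pairs by cosine similarity to the query."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [(doc_id, round(cosine(query_vec, v), 4)) for doc_id, v in scored[:k]]
```

The winning chunks are then injected into the prompt as context, which is the "→ LLM(prompt + context)" half of the formula.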

Sub-1s queries · 100% private · Fully offline · 16 GB RAM footprint
Local LLM · GGUF · HNSW · Vector DB · ONNX · FastAPI · Semantic Chunking
04

Lost and Found

Full-Stack Engineer
TTC · Transit Capstone · Pitching to TTC Director — May 2026
match(claim, item) = cosine(E(desc_claim), E(desc_item)) > θ → notify_owner
Live Screenshots
TTC Lost & Found — System overview (Full-stack · AI similarity matching engine)

Built a complete digital Lost & Found system for the Toronto Transit Commission — one of North America's largest transit networks serving 1.7M daily riders. The platform digitizes the entire claim lifecycle: TTC staff report found items via a mobile app, each item gets a unique QR-tagged scan record, and owners submit claims through a mobile-first portal. An AI similarity-matching engine connects found items to incoming claims using description embeddings. Pitching this to the TTC Director in May 2026.

Full pipeline: mobile item reporting → QR generation → owner claim portal → AI description similarity matching → staff approval dashboard. Built for TTC's operational constraints — works on spotty transit WiFi, handles hundreds of daily items, fully auditable claim history. Production-ready architecture, not a school demo.
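The matching formula above in sketch form. A bag-of-words count vector stands in for the real sentence-embedding model, and the threshold θ = 0.6 is illustrative; the comparison logic is the same either way.

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Stand-in embedding: bag-of-words counts instead of a sentence model."""
    return Counter(text.lower().split())

def match(claim_desc, item_desc, theta=0.6):
    """cosine(E(claim), E(item)) > theta -> candidate match for the owner."""
    a, b = embed(claim_desc), embed(item_desc)
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    sim = dot / norm if norm else 0.0
    return sim > theta, round(sim, 3)
```

With a real embedding model the same code also matches paraphrases ("black billfold" vs "dark leather wallet"), which word counts cannot.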

Full claim lifecycle · AI item matching · 1.7M riders · Mobile-first · Pitching to TTC Director May 2026
Next.js · TypeScript · AI Matching · QR Code · Mobile-First · PostgreSQL · FastAPI

My Engagement Process

From your problem to working software. No slide decks.

Five steps that any client or hiring manager can hold me to. Every phase ends with a clear deliverable, not a status update.

01 / 05
Diagnose

I sit with your team, map the actual problem against what you think the problem is, and identify the root cause. No solution proposed until I can write a one-paragraph problem statement you fully agree with.

Stakeholder interviewsSystem auditRoot cause analysis
02.5 //
Agentic Architecture

Sigmoid-gated trigger systems.

Production MLOps automation: multi-signal monitoring routes through a sigmoid decision gate — LLM orchestrator aggregates drift, performance, time, and confidence signals to decide when to retrain, what to retrain, and how to deploy.

SIGMOID DECISION GATE — IDLE
INPUT z = −2.000 → σ(z) = 0.1192 → DECISION: NO ACTION
P(action) = σ(w₁·drift + w₂·perf + w₃·time + w₄·conf + b)

LIVE SIGNAL MONITORS · 0 / 4 signals fired
Data drift — 0.14, nominal (fires above the 0.30 threshold)
F1 degradation — 0.91, nominal (fires below the 0.80 threshold)
Days since train — 8 d, nominal (fires above the 30-day threshold)
LLM confidence — 0.88, nominal (fires below the 0.70 threshold)

LLM ORCHESTRATOR DECISION
All signals nominal — no action required
RETRAIN TIMELINE · automated scheduling
MODEL RETRAIN HISTORY · hire_scorer.onnx
v1.0 (−45 d) → v1.1 (−32 d) → v1.2 (−18 d) → v2.0 (−7 d) → v2.1 (now, current) → v3.0? (+12 d, LLM predicted)
Trigger legend: data drift · perf drop · time interval · current · LLM predicted
Time-based — every 30 days, fallback trigger
Drift-based — KL > 0.30, data distribution shift
Perf-based — F1 < 0.80, model degradation
LLM-decided — conf < 0.70, agent prediction
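The four trigger classes feed one gate. A sketch using the thresholds quoted above, with illustrative weights and bias (the production values would be tuned, not these):

```python
from math import exp

def sigmoid(z):
    return 1 / (1 + exp(-z))

def retrain_probability(drift, f1, days_since_train, llm_conf,
                        w=(4.0, 5.0, 0.05, 3.0), b=-3.0):
    """P(action) = sigmoid(w1*drift + w2*perf_drop + w3*days + w4*low_conf + b).
    Weights and bias here are illustrative, not production values."""
    perf_drop = max(0.0, 0.80 - f1)       # fires as F1 sinks below 0.80
    low_conf = max(0.0, 0.70 - llm_conf)  # fires as agent confidence drops
    z = (w[0] * drift + w[1] * perf_drop
         + w[2] * days_since_train + w[3] * low_conf + b)
    return sigmoid(z)

# nominal signals (drift 0.14, F1 0.91, 8 days, conf 0.88): gate stays shut
p_idle = retrain_probability(0.14, 0.91, 8, 0.88)
# heavy drift, degraded F1, stale model, low confidence: gate opens
p_fire = retrain_probability(0.45, 0.72, 35, 0.60)
```

Because the signals combine continuously before the threshold, two half-fired signals can jointly open the gate even when no single monitor has crossed its own line.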
σ(z)
Sigmoid-gated decisions
4-signal
Multi-trigger monitoring
LLM+ML
Hybrid orchestration
<4h
End-to-end retrain cycle
02 //
Mathematical Foundations

The math I work with. Every day.

Not textbook exercises. These are the algorithms powering every system I build. Interactive — explore how each one works.

PyTorch · Scikit-learn · ONNX · MLflow · FastAPI · HNSW · MCP · WebRTC · GGUF — all models ship to production.
03 //
The Story Behind The Work

They put me in Section C. The sigmoid proved the threshold was wrong all along.

I scored 63%, 72%, 78% and was reminded every year I was failing. Nobody told me the pass mark was an arbitrary number someone just decided. That number — I now set it for AI systems that make million-dollar hiring decisions.

Non-tech? You'll see what problems I solve and how much you save.
Technical? You'll see the architecture, the actual σ formula, and real production metrics.
Starbucks — Lawline.tech on screen
Starbucks, Canada · Cup: "Omkumar" · Screen: Lawline.tech
Second Cup café
Starbucks

Canada · 2024 · These photos are real. Built AI companies from coffee shops with no office, no funding.

01

The Division

Class 8 — The label that was supposed to stick forever

Our school had a system. Section A — the students who scored the most marks. Section B — average. Section C — the ones they'd already quietly given up on. We weren't told this. But we felt it. In the tone of voice. In how questions in class were directed at A section students, not us.

I was in Section C.

And so the game began. Every year, from class 8 to class 12, it was the same ritual. Exams. Results. Public comparison. Teachers reading scores aloud. Students ranked. The race had one rule: marks at the end. Not understanding. Not curiosity. Not depth. Just the number.

I scored 63%. Then 72%. Then 78%. Then 68%. Teachers shook their heads. Parents worried. I started believing them. I told myself I was a failure. The system was very good at making that happen.

That was Grade 8. For the next 8 years — from class 8 through every year until engineering college — I carried that label. I thought the problem was me.

The Race — Every year, the same metric
Class 8 — 63% · Section C assigned
Class 9 — 72% · "You can do better"
Class 10 — 78% · "Still not enough"
Class 11 — 68% · "Disappointing"
Class 12 — 71% · "Try harder next time"
Nobody once asked: "Do you understand the concept?"

No teacher ever said: “You know what, let's talk about how many students I failed this year.” They never compared their own performance. They only measured ours. They celebrated when their “A section” students placed — never asking if the C section students had potential they never unlocked.

Building Lawline.tech at Starbucks
Starbucks · Canada · 2024 · Real photo

Cup says “Omkumar.”
Screen shows Lawline.tech.
While Section C was on my record.

No office. No funding. No team. Just a laptop, a problem to solve, and years of being told I wasn't good enough — which turned out to be fuel.

SECTION C → LIVE SAAS
02

The Grind

Canada 2024 — No office. No funding. Just problems worth solving.

The AI revolution was happening. But I wasn't at a funded startup or a big tech company. I was at Starbucks — serving coffee in the morning, writing Python in the afternoon.

Most people see AI as something only Silicon Valley does. I saw it as a tool. There were real problems in front of me: businesses spending hundreds of dollars an hour on legal review. Companies spending weeks screening 500 candidates and still making bad hires. Labs wasting months on trial-and-error chemistry.

I didn't ask permission. I didn't wait for a job offer or a VC cheque. I just started building.

For non-tech — What problems were actually solved
Legal AI
Legal review used to take hours and cost hundreds
A lawyer charges $300/hr to read contracts. For a small business reviewing 20 documents a month, that's $6,000/month just to understand what you signed. I built an AI that reads the same documents in 3 seconds, flags the risky parts, and costs a fraction.
60% fewer legal errors · 3s per document · not 3 hours
Hiring AI
Hiring the wrong person costs 30% of their salary
One bad hire in a 50-person company can cost $15,000–$50,000 once you factor in rehiring, training, lost time. I built an AI that scores candidates in real time during the interview — so hiring managers make decisions on data, not gut feel.
+45% placement rate · AUC 0.97 · <2s scoring
Chemistry ML
R&D cycles that took 6 months now take 2
A chemistry lab testing new materials has to physically make and test hundreds of samples. I built an ML model on just 200 data points that predicts which ingredients work before they even mix them.
R² = 0.89 · 40% cost reduction · 3× faster iterations
AI Integration
Your internal tools could talk to each other — they don't
Most companies have Slack, a CRM, a database, and five different tools that don't share data. I built a single AI layer using the MCP protocol that connects them all — one command, everything updates.
1 integration layer · N tools connected · zero vendor lock-in
03

The Pattern

Engineering — When I finally understood WHY, everything clicked

For 8 years — from Grade 8 all the way to engineering college — I didn't know why I scored low. I studied. I read. But nothing stuck. And I genuinely believed the problem was me.

Then in college, I noticed a pattern in myself — I didn't need more math. I needed a reason to do math. The moment someone gave me WHY a concept existed, I understood it completely. Not the formula. The problem it solved.

Someone explained logistic regression — not “here's the formula” but “here's the problem it solves” — and I got the sigmoid instantly. And then I thought: wait. If the output is a probability with a threshold... that's a yes/no switch. And a yes/no switch can run code.

Linear regression gives you any number — unbounded. Sigmoid wraps it using Euler's constant e, squashing it to 0–1. The S-curve emerges naturally. Then the threshold: the line that decides PASS or FAIL — FIRE or SKIP. That's it. That's the whole machine.
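That whole machine fits in a few lines. The weights, bias, and τ below are placeholder values; the point is the three stages the paragraph describes: unbounded linear score, sigmoid squash, movable threshold.

```python
from math import exp

def decide(x, w, b, tau=0.65):
    """Linear score -> sigmoid probability -> threshold gate.
    Unlike a school pass mark, tau is a parameter you can move and test."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # unbounded, like linear regression
    p = 1 / (1 + exp(-z))                         # squashed to (0, 1) via e
    return ("FIRE" if p >= tau else "SKIP"), round(p, 3)
```

Sweep τ against labelled outcomes and you can prove where the line should sit, which is exactly the move school never allowed.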

In school, that threshold was a number someone just decided. In AI, it's something you prove with data. You can move it. Question it. Optimise it. School never let us do that.

The problem was never me. It was the way they taught — without the “why.” No context. No reason. No what problem does this solve? Just: memorise, reproduce, get marked. The second I understood WHY something existed, I couldn't stop. I went from 8 years of feeling useless to staying up all night building things — because now I had a reason.

Sigmoid → Agentic Trigger
LIVE DEMO · Resso.ai interview scorer
Pipeline: candidate interview → hire decision
z = w·x + b
Linear score from candidate signals
σ(z) = 1/(1+e⁻ᶻ)
Squash to probability 0→1
σ < τ → SKIP
Pipeline stops — no wasted compute
Example — interview score σ = 0.62 (scale: 0.0 weak candidate → 1.0 strong candidate)
Threshold τ = 0.65 → 0.62 < 0.65 → SKIP
Agent skips — nothing runs:
🚫 no_action(candidate_id)
📝 log(reason = "below_threshold")
💰 saved: 0 emails, 0 API calls, 0 time
This is exactly what I built at Resso.ai. The sigmoid output drives everything — schedule, email, CRM, next agent — or nothing. One number. One threshold. The entire decision.

Non-tech: Drag the threshold — you decide what “good enough” means. Notice how arbitrary it feels. Tech: This is the actual σ used in Resso's hire-probability scorer. AUC 0.97 in production.

04

The Truth

The problem was never ability — it was the metric they chose

The system was optimised for memorisation. Recite the formula. Get the mark. Move on. Nobody checked if you could actually use what you learned. Nobody measured whether you understood the 'why'.

I now build systems where the metric is: does this model make the right decision when it matters? Does the business save money? Did the wrong hire get screened out? Is the chemistry formula actually better? Those are real metrics. Percentage marks in class 10 were not.

I build things that generate real ROI — not because I memorised formulas, but because I understand the problem deeply enough to know exactly what an AI system needs to solve it.

The receipts — things actually built
Resso.ai
Founding Engineer

AI that scores candidates in real time during live interviews — so hiring managers stop guessing.

WebRTC · Speaker Diarization · ONNX INT8 · <2s latency
Vadtal — Vector DB
AI Architect

Private AI brain for companies that can't send data to the cloud. Runs 100% on your own server.

GGUF · HNSW · FastAPI · 100% offline · 16GB RAM
Lawline.tech
Built at Starbucks

Upload a legal document. AI reads it, flags risky clauses, extracts obligations. 3 seconds, not 3 hours.

Fine-tuned LLM · Clause classification · Live SaaS
Corol / NunaFab
ML Engineer

ML model that predicts chemistry results before running lab tests. 40% fewer wasted experiments.

XGBoost · SHAP · 200-row sparse dataset · R² = 0.89
05

Now

Founding Engineer · The threshold was always wrong — I just needed data to prove it

Today I'm a Founding Engineer building production AI systems — real-time pipelines, autonomous agents, infrastructure that companies actually depend on.

The difference between me and most AI engineers: I start with the business problem, not the model. What does it cost you today? How much do you save if this works? Then I build the system that gets you there.

To every teacher who read my marks aloud, shook their head, and moved on to the next student —

“What's your net worth?
Why are you shut down?”

You loved measuring us by numbers. You read our scores aloud — 63%, 72%, 78% — and shook your heads. Numbers were everything. Fine. Let's talk numbers now. What's yours? The business you built? The impact you had? Why the silence?

You taught the formula but not the reason. You set the threshold but never questioned it. You measured us — but nobody was measuring you. The metric was wrong from day one. It took me an engineering degree to prove it.

Section C → Founding Engineer · 63% → AUC 0.97 · Pass mark was always arbitrary
Omkumar Solanki

The system didn't fail me. The metric did. I'm not here to impress the system — I'm here to build what it couldn't imagine.

If you're a business owner — I'll show you exactly where AI saves you money and how fast. If you're technical — let's talk architecture, latency, and real production trade-offs.

SECTION C → FOUNDING ENGINEER·63% → AUC 0.97·STARBUCKS → RESSO.AI·THE THRESHOLD WAS WRONG
05 //
Live Terminal

Ask me anything. For real.

Powered by GPT-4o mini via OpenAI — type ai <anything> to ask about Om.

suggested questions
om@resso-ml — zsh · ⎇ main · Python 3.11 · PyTorch 2.2 · CUDA 12.1
<2s
Inference latency
0.941
AUC — hire scoring
0%
External calls (RAG)
3×
R&D cycle speedup
06 //
About

The engineer who ships the whole thing.

Honours BASc in AI. AWS-certified. Three years building production ML systems across hiring tech, legal AI, materials science, and enterprise automation — each one deployed, measured, and handed off with documentation.

Om Kumar Solanki

I started building AI systems before the current wave made it fashionable. I understand the math — gradient descent, attention mechanisms, RAG architectures — and I understand what it takes to keep them working at 3 AM when an inference pipeline is down and someone's live interview is waiting. My standard is simple: if you can't measure it improving something real, it doesn't ship.

Education
Sheridan College
BASc — Artificial Intelligence (Honours) · AI Minds Board Member, Sheridan EDGE
AWS Academy
Cloud Developing Graduate · Dec 2025
Experience
Resso.ai
Founding Engineer — AI, ML & Real-Time Systems

Designed and built the entire AI platform from zero: WebRTC audio ingestion at 8 kHz, custom speaker diarization pipeline separating candidate and interviewer in real time, NLP feature extraction across prosody and semantics, and a live hire-probability scoring model at <2s inference latency. Built the talking AI avatar system — lip-sync via audio-to-viseme mapping, on-device model quantisation, and WebSocket event bus for real-time video rendering. Every word spoken becomes structured signals. Every interview is scored live.

Nov 2025 — Present
HariKrushna Software
AI Architect & Agentic Software Engineer

Consulting on production AI architecture for enterprise clients across HR tech, compliance, and materials science. Delivered: on-premise RAG stacks (GGUF + HNSW, 100% private, offline-capable), MCP server integrations bridging AI agents to enterprise APIs (Slack, CRMs, databases), and multi-agent orchestration pipelines with persistent context memory. Work spans the full agentic stack — from LLM reasoning loops and tool dispatch, to MLOps and drift detection.

Jun 2024 — Present
Corol.org & NunaFab
ML Engineer — UHPC Strength Prediction

Applied ML to Ultra-High Performance Concrete (UHPC) compressive strength prediction. Built SHAP-explainable XGBoost ensemble models on a 200-row dataset — showing structural engineers not just which mix achieves target strength, but which constituents drive the outcome (W/C ratio, silica fume, fibre dosage, curing age — φᵢ values per ingredient). Transfer learning + aggressive feature engineering got R² = 0.89. Extended to NunaFab's structural composite formulations. Result: physical lab testing cycles cut from weeks to one afternoon, hundreds of mixes screened computationally.

2024
Technologies
Python · TypeScript · React · Next.js · Node.js · PyTorch · TensorFlow · Scikit-learn · AWS · Docker · Kubernetes · MLflow · FastAPI · PostgreSQL · Redis · Vector DBs · WebRTC · RAG · MCP · ONNX
What I build
Real-time ML inference — <2s end-to-end
Multi-agent systems that automate business workflows
Private RAG: your data stays on your servers
Cloud infra that scales without breaking (AWS/GCP/Azure)
Full product — backend, frontend, and the AI layer
MLOps: monitoring, drift detection, auto-retrain pipelines
07 //
AI Consulting

What I can build for you.

AI systems that run inside your infrastructure, trained on your data, and triggered by real business events — not a chatbot wrapper. Your documents, your workflows, your edge. Zero data leaves your building.

01
Your data stays yours

Every model runs inside your building. No cloud APIs. No data leaving your network. GDPR-compliant by architecture, not policy.

02
AI that acts, not just answers

Agentic systems that read signals from your business — tickets, calls, documents — and take actions inside your tools automatically.

03
Built on your domain

Models fine-tuned on your actual data. Not a generic ChatGPT wrapper. A system that understands your industry, your customers, your language.

04
Measurable ROI

Every engagement starts with an ROI model. If we can't show you how it pays for itself, we don't build it. Cost per outcome, not cost per hour.

HOW IT WORKS · TWO-LAYER PIPELINE
SECURE PERIMETER · AIR-GAPPED · ON-PREMISE — 0 external API calls · encrypted at rest · GDPR compliant
01 TRAINING PIPELINE — fine-tune a model on your business corpus
INPUT
Business Data
CRM · docs · logs
PREPROCESS
Clean + Label
deduplicate · annotate
FOUNDATION
Base Model
Llama 3.1 · Mistral · Phi-3
FINE-TUNE
LoRA Fine-tune
QLoRA 4-bit · custom corpus
EVALUATE
Eval Loop
F1 · AUC · BLEU · perplexity
COMPRESS
ONNX Export
INT8 quant · 4× speedup
DEPLOY
Production
serve · monitor · retrain
TRAINED MODEL ARTIFACT
02 INFERENCE + TRIGGER PIPELINE — live agentic decisions in production
TRIGGER
Business Signal
event · query · document
LOCAL
Embed
nomic-embed · on-premise
RETRIEVAL
HNSW Search
4.2ms p99 · no cloud
REASON
Fine-tuned LLM
your domain · your data
DECISION
Sigmoid Gate
P(action) = σ(z) > 0.50
ORCHESTRATE
Agent
MCP · tool-use · memory
OUTPUT
Business Action
hire · alert · escalate
P(retrain) = σ(w₁·drift + w₂·F1_drop + w₃·days + b) · P(action) = σ(w₁·signal + w₂·confidence + b)
REAL BUSINESS TRIGGERS · WHAT THE AI RESPONDS TO
Input signal | Model | Threshold | Action | Retrain trigger
Interview audio stream | Hire-scorer (ONNX INT8) | P > 0.80 | Flag candidate · update ATS | Data drift > 0.30
Customer churn signals | Churn classifier (XGBoost) | P > 0.65 | Trigger retention offer via CRM | F1 drops < 0.82
Invoice / document ingest | Doc parser (fine-tuned LLM) | conf > 0.90 | Route to approver · auto-fill ERP | Monthly + errors
Support ticket submitted | Intent classifier | P > 0.75 | Escalate · assign · draft reply | New intent labels
Sales call transcript | Opportunity scorer | score > 70 | Alert AE · update pipeline stage | Win-rate feedback
HOW WE ENGAGE

From idea to production.

AUDIT
Discovery
1–2 weeks
You get a clear map of where AI saves you time and money.

We audit your data, workflows, and stack. You walk away with a concrete AI roadmap — no fluff, no guessing.

+Stack + data readiness audit
+AI opportunity map
+Architecture proposal
+ROI model for each opportunity
MOST COMMON
DEPLOY
Build
4–10 weeks
A fully deployed AI system running inside your infrastructure.

End-to-end: model fine-tuned on your data, agentic pipeline, sigmoid trigger automation — deployed on-premise. Your data never leaves.

+Fine-tune on your business corpus
+On-premise RAG + vector search
+Agentic trigger pipeline
+MLOps monitoring + auto-retrain
+Full source + documentation
ENTERPRISE
Scale
2–4 months
AI across every department. Your whole org moves faster.

Multi-agent platform, continuous retraining, air-gapped deployment, rolled out department by department.

+Multi-agent orchestration
+Air-gapped deployment
+CI/CD for model retraining
+Department-level rollout
+Ongoing retainer included
ONGOING PARTNERSHIP
Monthly advisory + hands-on engineering
Architecture reviews · model monitoring · incident response · new feature development
Ready to talk scope?
Every engagement starts with a free 30-minute call where we define
the problem, the deliverable, and the metric that proves it worked.
Book a free callSend message
07 //
Get in touch

Let's build something.

Open to founding roles, consulting engagements, and ambitious teams that need AI that actually works.

Email: emailtosolankiom@gmail.com
LinkedIn: /in/omkumar-solanki
Location: Toronto, Canada
resso.ai · corol.org · nunafab.com
Designed & built by Omkumar Solanki · 2026