Featured
Python · FastAPI · Next.js
Enterprise RAG Reliability Platform
Local-first enterprise-style RAG reliability platform for MLOps runbooks
and uploaded documents, with cited answers, provider comparison, eval
diagnostics, traces, proof artifacts, and an operational dashboard.
- Designed to run without API keys.
- 29 tests cover provider citation markers, secret-refusal behavior, PDF uploads, CORS configuration, and CLI smoke tests.
- Tracked Playwright dashboard QA exercises ingest, query, eval, console, and overflow checks on desktop and mobile viewports.
- Eval reports include latency, source coverage, cost, and pass/fail evidence.
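As an illustration of what a citation-marker and source-coverage check can look like, here is a minimal sketch. It assumes answers cite sources with bracketed 1-based markers such as `[2]`; that marker format and the names `cited_source_ids` and `citation_report` are hypothetical and not taken from the project.

```python
import re

# Assumed marker format: a bracketed 1-based source index, e.g. "[2]".
CITATION = re.compile(r"\[(\d+)\]")


def cited_source_ids(answer):
    """Return the distinct source indices cited in an answer, sorted."""
    return sorted({int(match) for match in CITATION.findall(answer)})


def citation_report(answer, num_sources):
    """Split citations into valid and dangling ones and compute coverage.

    Coverage here means: share of retrieved sources cited at least once.
    """
    cited = cited_source_ids(answer)
    valid = [i for i in cited if 1 <= i <= num_sources]
    dangling = [i for i in cited if not 1 <= i <= num_sources]
    coverage = len(valid) / num_sources if num_sources else 0.0
    return {"cited": valid, "dangling": dangling, "source_coverage": coverage}


report = citation_report("Restart the worker [1]. Then check logs [3][5].", 4)
```

A dangling marker (one that points past the retrieved sources) is the kind of defect a test on citation markers would typically catch.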
Applied AI
Python · Next.js
Applied AI Eval Lab
Document intelligence workspace with retrieval, citations, evaluation
metrics, answer-fact coverage, release-gate checks, experiment comparison,
and a live static dashboard.
- 21 backend tests, frontend audit/typecheck/static export, demo data checks, and tracked desktop/mobile demo QA.
- Writes JSON/Markdown report artifacts with report detail endpoints.
- Core verification is Docker-free; Docker smoke separately verifies indexing, grounded query, eval gate, reports, CORS, and dashboard readiness.
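Answer-fact coverage and a release gate can be sketched as follows. This is one simple interpretation, assuming coverage is the share of expected facts found in the answer by case-insensitive substring match; the function names and the 0.8 threshold are hypothetical.

```python
def _normalize(text):
    """Lowercase and collapse whitespace so matching ignores formatting."""
    return " ".join(text.lower().split())


def answer_fact_coverage(answer, expected_facts):
    """Return (coverage, missing_facts) for an answer.

    Assumption: a fact counts as covered when its normalized text appears
    as a substring of the normalized answer.
    """
    text = _normalize(answer)
    missing = [fact for fact in expected_facts if _normalize(fact) not in text]
    if not expected_facts:
        return 1.0, []
    coverage = (len(expected_facts) - len(missing)) / len(expected_facts)
    return coverage, missing


def release_gate(coverage, min_coverage=0.8):
    """Pass only when coverage meets the (hypothetical) minimum."""
    return coverage >= min_coverage


coverage, missing = answer_fact_coverage(
    "The index is rebuilt nightly and uses BM25 scoring.",
    ["rebuilt nightly", "BM25", "weekly backup"],
)
```

Substring matching is deliberately crude; a real evaluator might use entailment or fuzzy matching instead.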
Retrieval eval
Python · RAG
RAG Forge
Retrieval benchmark runner for comparing chunking, embedding, retrieval
(dense, BM25, hybrid), and reranking choices, with Markdown/JSON reports
and a regression gate for retrieval-quality changes.
- 37 tests cover ranking, gates, reports, and E5 query embedding behavior.
- Blocks quality drops beyond configured hit-rate, MRR, and latency thresholds.
- Sample check reruns the 24-configuration benchmark and self-comparison gate.
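The metrics named above have standard definitions, and a threshold-based gate over them can be sketched in a few lines. The `GateThresholds` class, its default values, and the metric dictionary keys are hypothetical; only hit rate, MRR, and latency as gated quantities come from the description.

```python
from dataclasses import dataclass


def hit_rate_at_k(ranked_ids, relevant_ids, k):
    """Fraction of queries with at least one relevant doc in the top k.

    Assumes at least one query.
    """
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids, relevant_ids)
        if any(doc in relevant for doc in ranked[:k])
    )
    return hits / len(ranked_ids)


def mean_reciprocal_rank(ranked_ids, relevant_ids):
    """Mean of 1/rank of the first relevant doc (0 when none is retrieved)."""
    total = 0.0
    for ranked, relevant in zip(ranked_ids, relevant_ids):
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_ids)


@dataclass
class GateThresholds:
    # Hypothetical defaults, chosen only for illustration.
    max_hit_rate_drop: float = 0.02
    max_mrr_drop: float = 0.02
    max_latency_increase_ms: float = 50.0


def regression_gate(baseline, candidate, thresholds):
    """Return the names of metrics that regressed beyond their threshold."""
    failures = []
    if baseline["hit_rate"] - candidate["hit_rate"] > thresholds.max_hit_rate_drop:
        failures.append("hit_rate")
    if baseline["mrr"] - candidate["mrr"] > thresholds.max_mrr_drop:
        failures.append("mrr")
    increase = candidate["latency_ms"] - baseline["latency_ms"]
    if increase > thresholds.max_latency_increase_ms:
        failures.append("latency_ms")
    return failures


ranked = [["a", "b", "c"], ["x", "y", "z"]]
relevant = [{"b"}, {"q"}]
baseline = {"hit_rate": 0.90, "mrr": 0.80, "latency_ms": 100.0}
candidate = {"hit_rate": 0.85, "mrr": 0.80, "latency_ms": 120.0}
failures = regression_gate(baseline, candidate, GateThresholds())
```

Comparing a run against itself should always return an empty failure list, which is a cheap sanity check on the gate logic.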
Inference serving
Python · FastAPI
StreamInfer
Local inference-serving project with adaptive batching, backpressure,
model hot-swap, metrics, load-test reports, and LLM-style benchmark sweeps.
- Benchmark sweep compares batch size and timeout tradeoffs with JSON/Markdown reports.
- 40 tests cover serving, backpressure, benchmark gates, and recommendation stability.
- Docker smoke verifies container health, prediction, hot-swap, and metrics paths.
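To make the batch-size and timeout tradeoff concrete, here is a minimal sketch of a size-or-timeout micro-batcher with a bounded queue for backpressure. It is a simpler, fixed-parameter technique rather than the project's adaptive policy, and `MicroBatcher`, `Overloaded`, and all parameters are hypothetical names.

```python
import asyncio


class Overloaded(Exception):
    """Raised when the request queue is full (hypothetical name)."""


class MicroBatcher:
    """Collect requests until the batch is full or the wait deadline passes.

    Construct inside a running event loop so the queue binds to that loop.
    """

    def __init__(self, predict_batch, max_batch_size=8, max_wait_s=0.01, max_queue=64):
        self._predict_batch = predict_batch  # callable: list of inputs -> list of outputs
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue = asyncio.Queue(maxsize=max_queue)

    async def submit(self, item):
        """Enqueue one request; reject immediately if the queue is full."""
        future = asyncio.get_running_loop().create_future()
        try:
            self._queue.put_nowait((item, future))
        except asyncio.QueueFull:
            raise Overloaded("request queue is full") from None
        return await future

    async def run(self):
        """Worker loop: form batches and resolve each caller's future."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self._queue.get()]
            deadline = loop.time() + self._max_wait_s
            while len(batch) < self._max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            outputs = self._predict_batch([item for item, _ in batch])
            for (_, future), output in zip(batch, outputs):
                if not future.done():
                    future.set_result(output)


async def demo():
    batch_sizes = []

    def predict(inputs):
        batch_sizes.append(len(inputs))
        return [x * 2 for x in inputs]

    batcher = MicroBatcher(predict, max_batch_size=4, max_wait_s=0.05, max_queue=8)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(4)))
    worker.cancel()
    return results, batch_sizes
```

A larger `max_batch_size` or `max_wait_s` generally raises throughput at the cost of per-request latency, which is the tradeoff a benchmark sweep would measure. Rejecting at `submit` keeps queueing delay bounded instead of letting it grow under load.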
Reliability
Python · ML validation
MLGuard
Pre-deployment checks for drift, performance regression, and latency
regression before shipping model changes.
- 26 tests cover CLI behavior, report summaries, regression checks, and action metadata.
- CLI help and action metadata both avoid advertising unsupported PyTorch artifacts.
- Missing baselines fail fast unless drift-only mode is explicitly enabled.
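The fail-fast rule and a drift check can be sketched together. This example assumes drift is scored with the Population Stability Index (PSI) and uses the common 0.2 rule-of-thumb threshold; the metric choice, function names, and parameters are all hypothetical and not taken from MLGuard.

```python
import math


def population_stability_index(reference, current, bins=10):
    """PSI between two non-empty numeric samples, binned on the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1  # clip out-of-range values
        # Floor at a small epsilon so empty bins do not produce log(0).
        return [max(count / len(values), 1e-6) for count in counts]

    p, q = proportions(reference), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def run_checks(reference, current, baseline_metrics=None, candidate_metrics=None,
               drift_only=False, psi_threshold=0.2, max_accuracy_drop=0.01):
    """Run drift and (optionally) performance-regression checks."""
    # Fail fast: regression checks are meaningless without a baseline.
    if not drift_only and (baseline_metrics is None or candidate_metrics is None):
        raise ValueError("baseline metrics missing; pass drift_only=True to skip them")
    psi = population_stability_index(reference, current)
    report = {"psi": psi, "drift_ok": psi <= psi_threshold}
    if not drift_only:
        drop = baseline_metrics["accuracy"] - candidate_metrics["accuracy"]
        report["performance_ok"] = drop <= max_accuracy_drop
    return report


reference = [float(i) for i in range(100)]
shifted = [value + 50.0 for value in reference]
drift_report = run_checks(reference, shifted, drift_only=True)
```

Raising on a missing baseline, rather than silently skipping the regression check, prevents a run from reporting success when nothing was actually compared.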
MLOps
Python · Docker · Kubernetes
MLOps End-to-End Pipeline
Customer churn pipeline covering data ingestion, model training, FastAPI
serving, monitoring, Docker, and Kubernetes-oriented deployment structure.
- 15 tests cover API behavior, request validation, data cleaning, and feature prep.
- Local verification includes strict lint/format checks, training import, and Prometheus parsing without requiring Docker.
- Optional Docker/Compose checks verify container config, health, and prediction paths.
Open source
Ray · LightEval · BentoML
Open Source PRs
Open upstream PRs proposing focused fixes in AI infrastructure and evaluation
tooling: RLlib documentation, LightEval typing, and BentoML server/model/testing docs.