Shubh Sareen | Portfolio

> CAREER.JSON

EXPERIENCE LOG

RESEARCH ENGINEERING INTERN

AUTONISE

APPLIED ML // BENGALURU

● ACTIVE

Working across the full stack of a production AI system — generation pipelines, LLM reliability research, evaluation methodology, and inference infrastructure.

[01]

PROGRAMMATIC IR PIPELINE

Proved that direct coordinate output from autoregressive LLMs is unstable: the model emits positions token by token and fails to plan spatial layout ahead. Built deterministic wrappers (LLM → Python / Matplotlib / RDKit / Schemdraw) where domain libraries handle layout constraints. Token cost: ₹10–30 → ~₹0.5 per video. Error rates and latency also dropped sharply.
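A minimal sketch of the wrapper idea, not the production pipeline: the model emits a small intermediate representation (which nodes exist, which are connected) and deterministic code assigns every coordinate. The IR schema, `layout`, and `render_plan` are hypothetical names chosen for illustration; in practice a library such as Matplotlib or Schemdraw would take over the layout and drawing step.

```python
import json

# Hypothetical IR: the model names nodes and edges but never emits coordinates.
LLM_OUTPUT = (
    '{"nodes": ["Source", "Resistor", "LED"],'
    ' "edges": [["Source", "Resistor"], ["Resistor", "LED"]]}'
)


def layout(spec, spacing=2.0):
    """Deterministically place nodes on a horizontal line, in listed order."""
    return {name: (i * spacing, 0.0) for i, name in enumerate(spec["nodes"])}


def render_plan(raw):
    """Parse the IR and return drawing primitives with computed coordinates."""
    spec = json.loads(raw)
    pos = layout(spec)
    # Reject edges that mention nodes the model never declared.
    unknown = [n for edge in spec["edges"] for n in edge if n not in pos]
    if unknown:
        raise ValueError(f"edges reference undeclared nodes: {unknown}")
    segments = [(pos[a], pos[b]) for a, b in spec["edges"]]
    return {"labels": pos, "segments": segments}


plan = render_plan(LLM_OUTPUT)
```

Because the model only describes structure, a malformed answer fails loudly at parse or validation time instead of producing a subtly misplaced drawing.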

[02]

SWISS-TOURNAMENT RANKING

Engineered Swiss-tournament-style matchmaking to rank JEE question banks using simulated intransitive preference systems. Comparison overhead: O(N²) round-robin → O(N log N). Core finding: pairwise comparison design matters as much as the ranking algorithm itself.
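A simplified sketch of Swiss-style pairing, assuming a caller-supplied `prefer(a, b)` comparison (hypothetical; in the real system this would be the pairwise judgment step). Each round pairs items with similar running scores, so about log2(N) rounds of N/2 comparisons replace the N(N-1)/2 comparisons of a round-robin. This version skips rematch avoidance and gives the last item a bye when N is odd, and the demo uses a transitive toy preference rather than the intransitive systems described above.

```python
import math


def swiss_rank(items, prefer, rounds=None):
    """Rank items with Swiss-style rounds; returns (ranking, comparisons_used)."""
    n = len(items)
    rounds = rounds or max(1, math.ceil(math.log2(n)))
    scores = {item: 0 for item in items}
    comparisons = 0
    for _ in range(rounds):
        # Sort by current score so similarly ranked items meet each other.
        order = sorted(items, key=lambda it: -scores[it])
        for a, b in zip(order[0::2], order[1::2]):
            winner = a if prefer(a, b) else b
            scores[winner] += 1
            comparisons += 1
    ranking = sorted(items, key=lambda it: -scores[it])
    return ranking, comparisons


# Toy demo: 16 items where the larger number is always preferred.
questions = list(range(16))
ranking, used = swiss_rank(questions, prefer=lambda a, b: a > b)
round_robin = len(questions) * (len(questions) - 1) // 2
```

For 16 items this uses 32 comparisons against 120 for a full round-robin. The top of the ranking is resolved well; mid-table ties remain, which is the usual trade-off of Swiss formats.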

[03]

ITERATIVE VERIFICATION HARNESSES

Designed benchmarking rigs for generation-verification loops using DeepSeek and Qwen. Implemented error categorization separating protocol failures (JSON truncation from token limits) from semantic rule failures (business logic errors) to prevent loop decay.
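The protocol/semantic split can be sketched as follows. This is an illustration under assumptions, not the harness itself: the `Failure` enum, `classify`, and the rule in `check_rules` (an answer must be one of the listed options) are hypothetical stand-ins for the real business logic.

```python
import json
from enum import Enum


class Failure(Enum):
    NONE = "none"
    PROTOCOL = "protocol"  # output is not parseable, e.g. truncated JSON
    SEMANTIC = "semantic"  # output parses but violates a rule


def check_rules(payload):
    """Hypothetical business rule: 'answer' must be one of the 'options'."""
    if not isinstance(payload, dict):
        return False
    options = payload.get("options")
    return isinstance(options, list) and payload.get("answer") in options


def classify(raw):
    """Separate parse-level failures from rule-level failures."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return Failure.PROTOCOL
    return Failure.NONE if check_rules(payload) else Failure.SEMANTIC


truncated = '{"answer": "A", "options": ["A", "B"'  # cut off by a token limit
wrong = '{"answer": "C", "options": ["A", "B"]}'
valid = '{"answer": "A", "options": ["A", "B"]}'
results = [classify(truncated), classify(wrong), classify(valid)]
```

Keeping the two categories apart lets a loop react differently to each, for example retrying a protocol failure with a larger token budget while sending a semantic failure back to the generator with the violated rule attached.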

[04]

ML INFRASTRUCTURE

Evaluated serving performance, memory constraints, and scalability tradeoffs using vLLM (RTX 3090), LoRA fine-tuning, and Mixture-of-Experts routing. Built intuition for when training-time architecture choices constrain serving options.
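One concrete case of a training-time choice constraining serving is the KV cache: its size is fixed by the layer count, the number of key/value heads, and the head dimension. A back-of-envelope sketch follows; the model shape used here is a hypothetical example, not a measurement from this work.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Estimate KV cache size: one key and one value vector per layer, head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical shape: 32 layers, 8 KV heads, head_dim 128, fp16, 4096-token context.
per_sequence = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)
per_sequence_mib = per_sequence / 2**20
```

Under these assumed numbers each 4096-token sequence needs 512 MiB of cache, so on a 24 GB card the number of concurrent sequences is bounded by whatever memory the weights leave free.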

// AI reliability is a design problem, not a model problem.

ML RESEARCH INTERN

INFO-ME

RECOMMENDATION SYSTEMS

○ COMPLETED

Built and evaluated a two-stage content recommendation pipeline with a rigorous focus on honest evaluation methodology.

[01]

TWO-STAGE RECOMMENDATION PIPELINE

Embedding-based retrieval using cosine similarity, paired with a two-agent Llama 3B reranker hosted locally on an RTX 3090. Balanced retrieval quality against serving latency under real resource limits.
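The first stage can be sketched in a few lines. The three-dimensional vectors and the `CATALOG` entries below are toy placeholders, not real embeddings, and `retrieve` is a hypothetical helper; the returned candidates would then be passed to the reranker.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query, catalog, k=2):
    """Return the names of the k catalog items most similar to the query."""
    scored = sorted(catalog.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in scored[:k]]


# Toy catalog of item embeddings (placeholder values).
CATALOG = {
    "article_a": [1.0, 0.0, 0.0],
    "article_b": [0.9, 0.1, 0.0],
    "article_c": [0.0, 1.0, 0.0],
}
candidates = retrieve([1.0, 0.05, 0.0], CATALOG, k=2)
```

Keeping `k` small is what bounds the cost of the second stage, since the reranker only sees the retrieved candidates.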

[02]

METRIC DESIGN & INVALIDATION

Designed a custom category-overlap evaluation metric, then mathematically invalidated it after discovering structural boundaries in the embedding space that made the metric misleading. Shipping a metric and later disproving it is harder, and more valuable, than shipping one that just looks good.

ML MENTOR

SEASONS OF CODE 2026

IIT BOMBAY // SUMMER PROGRAM

● ACTIVE

Selected as mentor for IIT Bombay's flagship summer ML program, leading a 12-week deep-learning implementation track.

[01]

PIXELS FROM NOISE — STABLE DIFFUSION

End-to-end Stable Diffusion implementation project. Guiding students through: Probability Foundations → VAEs → DDPM → CLIP Conditioning. Building onboarding resources and weekly code check-ins that prioritize mathematical understanding over API calls.
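As an example of the kind of exercise the DDPM stage might include (an assumption about the track, not its actual material), here is the closed-form forward noising step, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with the linear β schedule from the original DDPM paper. The function names are illustrative.

```python
import math
import random


def alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    out, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Sample x_t from q(x_t | x_0) in one step, without iterating over t."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


abar = alpha_bars()
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]           # toy "image" with three pixels
slightly_noisy = q_sample(x0, 0, abar, rng)
almost_noise = q_sample(x0, len(abar) - 1, abar, rng)
```

At the last timestep ᾱ is close to zero, so the sample is almost pure Gaussian noise, which is the starting point the reverse process learns to undo.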