
My 2025 AI Stack for Production RAG (and why)

July 25, 2025

Why My 2025 AI Stack Is TypeScript-First (Python Only Where It Counts)

AI product development has changed dramatically in the last two years. In 2023, most AI engineers defaulted to Python-heavy stacks — FastAPI for APIs, LangChain (Python) for orchestration, HuggingFace for embeddings and fine-tunes. That made sense when models were new and everything happened in notebooks.

But in real products, 90% of the work isn’t Python. It’s UI/UX, auth, payments, user state, analytics — and the faster you ship those, the faster you learn. That’s why in 2025 I shifted to a TypeScript-first stack for nearly everything, and I only bring Python in when it’s truly needed. Result: faster iteration, cleaner integration, and an architecture that’s observable, swappable, and built for change.


#Core Philosophy

  • TS-first for speed — Next.js + Convex + modern TS tooling ship features in hours, not days.
  • Python only where it’s irreplaceable — fine-tuning, LoRA, specialized CV/NLP pipelines.
  • Everything observable — if I can’t see what the model retrieved or why it hallucinated, it’s a demo, not a product.
  • Swappable components — vector DBs, embeddings, and LLMs can change without rewrites.
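
To make the swappable-components point concrete, here is a minimal sketch of the kind of interfaces I code against. The `Embedder`/`Retriever` names and the OpenAI wrapper are illustrative choices, not a specific library's API:

```ts
// Depend on small contracts, not vendors. Swapping in Voyage or Cohere later
// means writing another ~15-line class, not touching call sites.
import OpenAI from "openai";

export interface Embedder {
  embed(texts: string[]): Promise<number[][]>;
}

export interface Retriever {
  search(
    queryEmbedding: number[],
    topK: number
  ): Promise<{ id: string; text: string; score: number }[]>;
}

// One concrete implementation using the OpenAI Node SDK.
export class OpenAIEmbedder implements Embedder {
  private client = new OpenAI(); // reads OPENAI_API_KEY from the environment

  async embed(texts: string[]): Promise<number[][]> {
    const res = await this.client.embeddings.create({
      model: "text-embedding-3-large",
      input: texts,
    });
    return res.data.map((d) => d.embedding);
  }
}
```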

#1) Frontend & Application Layer (All TS)

  • Framework: Next.js (App Router) — mix server/client components, stream responses, share API contracts.
  • Database & state: Convex — real-time data, serverless functions and storage, and cron jobs with no extra infra.
  • Auth: Clerk — OAuth, magic links, SSO in minutes.
  • UI/Styling: Tailwind CSS + shadcn/ui — fast, consistent, themeable.
  • File handling: UploadThing or Vercel Blob.

💡 Example: Nightly RAG index refresh = a Convex cron job calling the vector DB — no extra servers, no DevOps overhead.
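
A minimal sketch of that cron, assuming a hypothetical `internal.rag.refreshIndex` action that re-embeds changed docs and upserts them into the vector DB:

```ts
// convex/crons.ts — Convex's built-in scheduler, no extra infra.
// internal.rag.refreshIndex is a hypothetical internal action defined in convex/rag.ts.
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();

crons.daily(
  "refresh RAG index",
  { hourUTC: 2, minuteUTC: 0 }, // run at 02:00 UTC, after the day's content updates land
  internal.rag.refreshIndex
);

export default crons;
```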


#2) LLM Orchestration & Retrieval (TS)

  • Orchestration: LangChain.js / LangGraph.js.
  • Retrieval: Weaviate or Qdrant for managed ops; pgvector when I want Postgres-native queries.
  • Embeddings: OpenAI text-embedding-3-large for general use; Voyage AI for multilingual; Cohere when cost matters at scale.
  • LLMs: Mix and match — Groq (Llama 3) for low latency; OpenAI for reasoning-heavy work.
  • UI integration: Vercel AI SDK for streaming chat/completions (see the sketch below).

💡 Why not Pinecone? I prefer Weaviate/Qdrant for more control over index params and cost structure.
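
Putting the retrieval and streaming pieces together, a Next.js route handler looks roughly like this (AI SDK v4-style API; `searchQdrant` in `@/lib/retrieval` and the `gpt-4o` choice are placeholders for your own setup):

```ts
// app/api/chat/route.ts — stream a grounded answer from a Next.js route handler.
// searchQdrant is a hypothetical helper that embeds the question and returns
// the top-k chunks from the vector store; swap in Weaviate/pgvector as needed.
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";
import { searchQdrant } from "@/lib/retrieval";

export async function POST(req: Request) {
  const { messages } = await req.json();
  const question = messages[messages.length - 1].content;

  // Retrieve context first, then let the model answer from it.
  const chunks = await searchQdrant(question, { topK: 5 });
  const context = chunks.map((c) => c.text).join("\n---\n");

  const result = streamText({
    model: openai("gpt-4o"),
    system: `Answer using only this context:\n${context}`,
    messages,
  });

  // Streams tokens to the client; pairs with the AI SDK's useChat() hook.
  return result.toDataStreamResponse();
}
```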


#3) The Python “Island” (Only When Needed)

I don’t run the whole backend in Python anymore, but I keep a microservice for:

  • LoRA / fine-tuning (HuggingFace PEFT & Transformers)
  • Custom CV/Audio/NLP pipelines (OpenFace, librosa, spaCy)
  • Self-hosted inference on Modal, Replicate, or Runpod

It’s a small FastAPI service, deployed separately, called from the TS backend only when necessary.

💡 Benefit: I can scale GPU-heavy workloads independently; the main app stays fast.
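
On the TS side, that boundary is just a thin HTTP client. The `/v1/finetune` endpoint, payload shape, and `PYTHON_SERVICE_URL` variable below are assumptions about what your own FastAPI service exposes:

```ts
// lib/pythonIsland.ts — thin client for the separately deployed FastAPI service.
// Endpoint path, payload shape, and env var name are placeholders.
const BASE_URL = process.env.PYTHON_SERVICE_URL ?? "http://localhost:8000";

export interface FinetuneJob {
  jobId: string;
  status: "queued" | "running" | "done" | "failed";
}

export async function startLoraFinetune(
  datasetUrl: string,
  baseModel: string
): Promise<FinetuneJob> {
  const res = await fetch(`${BASE_URL}/v1/finetune`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ dataset_url: datasetUrl, base_model: baseModel }),
  });
  if (!res.ok) {
    throw new Error(`Python service error: ${res.status} ${await res.text()}`);
  }
  return res.json();
}
```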


#4) Evaluation & Observability

If you can’t debug model behavior, you’re flying blind. I track:

  • Retrieval hits + metadata
  • Latency per pipeline stage
  • Hallucination scores (LLM-graded or heuristic)
  • User feedback loops

Tools: LangSmith for tracing and A/B prompt tests; Sentry for app errors.

💡 Example: Logging retrieval hits with timestamps exposed a timezone bug breaking nightly index refreshes.
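
The logging behind that doesn't need to be fancy; one structured JSON line per request with UTC timestamps is what surfaced the bug. A minimal sketch, with field names that are my own convention rather than any tool's schema:

```ts
// lib/ragLog.ts — structured, per-stage logging for the retrieval pipeline.
// Pipe these lines to stdout, Sentry breadcrumbs, or LangSmith metadata.
interface RetrievalLog {
  requestId: string;
  query: string;
  hits: { id: string; score: number; source: string }[];
  retrievalMs: number;
  generationMs: number;
  loggedAtUtc: string; // always log UTC — a local-time assumption hid the cron bug
}

export function logRetrieval(entry: Omit<RetrievalLog, "loggedAtUtc">): void {
  const record: RetrievalLog = {
    ...entry,
    loggedAtUtc: new Date().toISOString(),
  };
  // One JSON line per request keeps logs grep-able and easy to ship to any sink.
  console.log(JSON.stringify(record));
}
```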


#5) Why This Stack Works

  • Speed: Product features ship faster in TS.
  • Flexibility: Swap vector DBs or LLMs in hours.
  • Scalability: Python workloads don’t slow the rest of the app.
  • Observability: Every stage is logged and traceable.

#When to Break the TS-First Rule

  • Research-heavy prototypes where model iteration speed > product speed.
  • Internal tools where UI/UX isn’t critical.
  • Deep integrations with Python-only libraries.

For production AI products with real users, a TS-first stack keeps me shipping fastest.


#TL;DR

  • Ship fast.
  • Track everything.
  • Keep it modular.

Pivots are inevitable. This stack lets me change the LLM, embeddings, or vector DB without starting over.