Why My 2025 AI Stack Is TypeScript-First (Python Only Where It Counts)
AI product development has changed dramatically in the last two years. In 2023, most AI engineers defaulted to Python-heavy stacks — FastAPI for APIs, LangChain (Python) for orchestration, HuggingFace for embeddings and fine-tunes. That made sense when models were new and everything happened in notebooks.
But in real products, 90% of the work isn’t Python. It’s UI/UX, auth, payments, user state, analytics — and the faster you ship those, the faster you learn. That’s why in 2025 I shifted to a TypeScript-first stack for nearly everything, and I only bring Python in when it’s truly needed. Result: faster iteration, cleaner integration, and an architecture that’s observable, swappable, and built for change.
# Core Philosophy
- TS-first for speed — Next.js + Convex + modern TS tooling ship features in hours, not days.
- Python only where it’s irreplaceable — fine-tuning, LoRA, specialized CV/NLP pipelines.
- Everything observable — if I can’t see what the model retrieved or why it hallucinated, it’s a demo, not a product.
- Swappable components — vector DBs, embeddings, and LLMs can change without rewrites.
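In practice, "swappable" just means the app depends on narrow interfaces rather than concrete SDKs. Here is a minimal sketch of that shape; the interface and function names are illustrative, not from any particular library:

```ts
// Minimal sketch of the "swappable components" idea: the app talks to
// narrow interfaces, and each provider lives behind a small adapter.
// Names here are illustrative, not from a library.

interface Embedder {
  embed(texts: string[]): Promise<number[][]>;
}

interface VectorStore {
  upsert(docs: { id: string; vector: number[]; metadata: Record<string, unknown> }[]): Promise<void>;
  search(vector: number[], topK: number): Promise<{ id: string; score: number }[]>;
}

interface ChatModel {
  complete(prompt: string): Promise<string>;
}

type RagDeps = { embedder: Embedder; store: VectorStore; llm: ChatModel };

// Swapping Qdrant for pgvector, or Groq for OpenAI, means writing a new
// adapter that satisfies the interface; this function never changes.
async function answer(question: string, deps: RagDeps): Promise<string> {
  const [queryVector] = await deps.embedder.embed([question]);
  const hits = await deps.store.search(queryVector, 5);
  return deps.llm.complete(
    `Answer using these document ids as context: ${hits.map((h) => h.id).join(", ")}\n\n${question}`
  );
}
```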
# 1) Frontend & Application Layer (All TS)
- Framework: Next.js (App Router) — mix server/client components, stream responses, share API contracts.
- Database & state: Convex — real-time, serverless storage, cron jobs with no extra infra.
- Auth: Clerk — OAuth, magic links, SSO in minutes.
- UI/Styling: Tailwind CSS + shadcn/ui — fast, consistent, themeable.
- File handling: UploadThing or Vercel Blob.
💡 Example: Nightly RAG index refresh = a Convex cron job calling the vector DB. No extra servers, no DevOps overhead.
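For the curious, here's roughly what that registration looks like in Convex. The `internal.rag.refreshIndex` action is a hypothetical name for whatever re-embeds changed docs and upserts them into the vector DB:

```ts
// convex/crons.ts: sketch of the nightly refresh. The referenced internal
// action (internal.rag.refreshIndex) is hypothetical; it would re-embed
// changed documents and upsert them into the vector DB.
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();

crons.daily(
  "refresh RAG index",
  { hourUTC: 3, minuteUTC: 0 }, // run nightly at 03:00 UTC
  internal.rag.refreshIndex
);

export default crons;
```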
# 2) LLM Orchestration & Retrieval (TS)
- Orchestration: LangChain.js / LangGraph.js.
- Retrieval: Weaviate or Qdrant for managed ops; pgvector when I want Postgres-native queries.
- Embeddings: OpenAI `text-embedding-3-large` for general use; Voyage AI for multilingual; Cohere for budget scale.
- LLMs: Mix and match, with Groq (Llama 3) for low latency and OpenAI for reasoning-heavy work.
- UI integration: Vercel AI SDK for streaming chat/completions.
💡 Why not Pinecone? I prefer Weaviate/Qdrant for more control over index params and cost structure.
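Here's a sketch of how these pieces fit together in a Next.js route handler: embed the question with OpenAI, search a Qdrant collection (the collection name "docs" and its `text` payload field are illustrative), and stream the answer back with the Vercel AI SDK. Exact streaming helper names vary a bit across AI SDK versions.

```ts
// app/api/chat/route.ts: sketch of the retrieval + streaming path.
// Assumes a Qdrant collection named "docs" whose points carry a `text`
// payload field; env var names are illustrative.
import OpenAI from "openai";
import { QdrantClient } from "@qdrant/js-client-rest";
import { streamText } from "ai";
import { openai as aiModel } from "@ai-sdk/openai";

const embedder = new OpenAI(); // reads OPENAI_API_KEY
const qdrant = new QdrantClient({ url: process.env.QDRANT_URL });

export async function POST(req: Request) {
  const { question } = await req.json();

  // 1) Embed the query with text-embedding-3-large.
  const { data } = await embedder.embeddings.create({
    model: "text-embedding-3-large",
    input: question,
  });
  const vector = data[0].embedding;

  // 2) Retrieve the top-k chunks from Qdrant.
  const hits = await qdrant.search("docs", { vector, limit: 5, with_payload: true });
  const context = hits.map((h) => h.payload?.text).join("\n---\n");

  // 3) Stream the answer to the client via the Vercel AI SDK.
  const result = await streamText({
    model: aiModel("gpt-4o"),
    system: `Answer strictly from this context:\n${context}`,
    prompt: question,
  });
  return result.toTextStreamResponse();
}
```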
# 3) The Python “Island” (Only When Needed)
I don’t run the whole backend in Python anymore, but I keep a microservice for:
- LoRA / fine-tuning (HuggingFace PEFT & Transformers)
- Custom CV/Audio/NLP pipelines (OpenFace, librosa, spaCy)
- Self-hosted inference on Modal, Replicate, or Runpod
It’s a small FastAPI service, deployed separately, called from the TS backend only when necessary.
💡 Benefit: I can scale GPU-heavy workloads independently; the main app stays fast.
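The boundary is just HTTP. A rough sketch of the TS side, with a hypothetical route and payload shape:

```ts
// Sketch of the TS -> Python boundary: the Next.js backend calls the
// FastAPI microservice over plain HTTP only for GPU-heavy work.
// The URL, route, and payload shape here are hypothetical.
const PYTHON_SERVICE_URL = process.env.PYTHON_SERVICE_URL ?? "http://localhost:8000";

type FineTuneRequest = { baseModel: string; datasetUrl: string; loraRank: number };
type FineTuneResponse = { jobId: string; status: "queued" | "running" | "failed" };

export async function startLoraJob(req: FineTuneRequest): Promise<FineTuneResponse> {
  const res = await fetch(`${PYTHON_SERVICE_URL}/finetune/lora`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`Python service error: ${res.status}`);
  return (await res.json()) as FineTuneResponse;
}
```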
# 4) Evaluation & Observability
If you can’t debug model behavior, you’re flying blind. I track:
- Retrieval hits + metadata
- Latency per pipeline stage
- Hallucination scores (LLM-graded or heuristic)
- User feedback loops
Tools: LangSmith for tracing and A/B prompt tests, Sentry for app errors.
💡 Example: Logging retrieval hits with timestamps exposed a timezone bug breaking nightly index refreshes.
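LangSmith covers most of the tracing, but the same idea is easy to hand-roll when you want structured per-stage logs in your own pipeline. A minimal sketch; the field names and the console sink are illustrative:

```ts
// Hand-rolled version of the per-stage logging described above; in practice
// LangSmith traces capture most of this automatically. Field names and the
// console sink are illustrative.
type StageLog = {
  stage: string;
  ms: number;
  at: string; // ISO timestamp, the kind of field that exposed the timezone bug
  meta?: Record<string, unknown>;
};

async function timed<T>(
  stage: string,
  meta: Record<string, unknown>,
  fn: () => Promise<T>
): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    const entry: StageLog = {
      stage,
      ms: Math.round(performance.now() - start),
      at: new Date().toISOString(),
      meta,
    };
    console.log(JSON.stringify(entry)); // swap for your real log pipeline
  }
}

// Usage: wrap each pipeline stage so retrieval hits and latency land in the logs.
// const hits = await timed("retrieval", { query }, () => store.search(vector, 5));
// const reply = await timed("generation", { hitIds: hits.map(h => h.id) }, () => llm.complete(prompt));
```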
# 5) Why This Stack Works
- Speed: Product features ship faster in TS.
- Flexibility: Swap vector DBs or LLMs in hours.
- Scalability: Python workloads don’t slow the rest of the app.
- Observability: Every stage is logged and traceable.
# When to Break the TS-First Rule
- Research-heavy prototypes where model iteration speed > product speed.
- Internal tools where UI/UX isn’t critical.
- Deep integrations with Python-only libraries.
For production AI products with real users, TS-first keeps me fastest.
# TL;DR
- Ship fast.
- Track everything.
- Keep it modular.
Pivots are inevitable. This stack lets me change the LLM, embeddings, or vector DB without starting over.
