Shipping AI Features in Your Web App Without the Bloat
Adding AI to a product is easy to do badly. Streaming, error states, and cost control patterns for AI features that feel fast and stay cheap.

Bolting an AI feature onto a web app looks trivial: wire up an API call, render the response, ship. The trivial version also feels broken — it hangs for ten seconds, shows a spinner, dumps a wall of text, and runs up a bill nobody budgeted for. Good AI features are an exercise in restraint and craft, not just a model call.
Stream or die
A ten-second wait with a spinner feels broken even when it's working. The same ten seconds with text streaming in word by word feels alive. Streaming is the single highest-impact thing you can do for perceived performance — it turns dead time into progress and tells the user, instantly, that something is happening. If your AI feature isn't streaming, fix that before anything else.
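As a rough sketch of what streaming looks like on the client, the snippet below reads a response body chunk by chunk and hands the growing text to the UI. It assumes a hypothetical `/api/generate` endpoint that streams plain UTF-8 text; many providers stream server-sent events instead, which would need extra parsing. The names `createStreamAccumulator`, `streamCompletion`, and `onUpdate` are illustrative, not part of any library.

```typescript
// Decodes streamed bytes incrementally and reports the accumulated text.
// `onUpdate` is a hypothetical callback; in a real app it would update the UI.
function createStreamAccumulator(onUpdate: (textSoFar: string) => void) {
  const decoder = new TextDecoder("utf-8");
  let text = "";
  return {
    push(chunk: Uint8Array): void {
      // { stream: true } buffers a partial multi-byte character until
      // the rest of its bytes arrive in a later chunk.
      const piece = decoder.decode(chunk, { stream: true });
      if (piece.length > 0) {
        text += piece;
        onUpdate(text);
      }
    },
    finish(): string {
      text += decoder.decode(); // flush anything still buffered
      onUpdate(text);
      return text;
    },
  };
}

// Reads the response body as it arrives instead of awaiting the whole thing.
// Assumption: "/api/generate" is a placeholder endpoint that streams plain text.
async function streamCompletion(
  prompt: string,
  onUpdate: (textSoFar: string) => void,
  signal?: AbortSignal,
): Promise<string> {
  const response = await fetch("/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
    signal,
  });
  if (!response.ok || response.body === null) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const accumulator = createStreamAccumulator(onUpdate);
  const reader = response.body.getReader();
  while (true) {
    const result = await reader.read();
    if (result.done) break;
    accumulator.push(result.value);
  }
  return accumulator.finish();
}
```

Keeping the decoder separate from the network loop makes the text handling easy to test, and the optional `signal` lets the caller cancel a generation the user no longer wants.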
Perceived speed beats raw speed
Users don't experience your latency number; they experience the feeling of waiting. Time to first token (how long until the first word appears), optimistic UI (showing the user's request as accepted right away), and progressive rendering shape that feeling far more than shaving a few hundred milliseconds off total generation time.
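One way to keep an eye on this is to record time to first token separately from total generation time. The tracker below is a minimal sketch: `createLatencyTracker` and its method names are made up for illustration, and the injectable `now` clock is only there to make it testable.

```typescript
// Tracks time to first token separately from total generation time.
// `now` is injectable for testing; it defaults to Date.now (milliseconds).
function createLatencyTracker(now: () => number = Date.now) {
  const startedAt = now();
  let firstTokenAt: number | null = null;
  let finishedAt: number | null = null;
  return {
    // Call for every streamed chunk; only the first call is recorded.
    markChunk(): void {
      if (firstTokenAt === null) firstTokenAt = now();
    },
    // Call once when the stream ends.
    markDone(): void {
      finishedAt = now();
    },
    // Values stay null until the matching event has happened.
    report(): { timeToFirstTokenMs: number | null; totalMs: number | null } {
      return {
        timeToFirstTokenMs: firstTokenAt === null ? null : firstTokenAt - startedAt,
        totalMs: finishedAt === null ? null : finishedAt - startedAt,
      };
    },
  };
}
```

Calling `markChunk()` from the streaming callback and `markDone()` when the stream closes gives two numbers to watch. Per the argument above, the first is usually the one that tracks how fast the feature feels.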
Design the failure states
Models time out, rate limits hit, content gets filtered, the network drops. Each of these needs a real UI state, not a generic crash. Tell the user what happened, give them a way to retry, and never lose their input. A feature that fails gracefully feels trustworthy; one that throws a stack trace feels like a beta you didn't agree to test.
- Stream responses — never make the user stare at a spinner
- Design explicit loading, empty, error, and rate-limited states
- Preserve user input across failures so nothing is lost
- Set timeouts and fall back cleanly when the model is slow
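The checklist above can be sketched as a small classifier that turns each kind of failure into a distinct UI state, plus a timeout wrapper. This is illustrative only: the type and function names are hypothetical, the use of status 422 for filtered content is an assumption about your own backend, and only the seconds form of the `Retry-After` header is handled.

```typescript
type FailureReason = "timeout" | "rate_limited" | "filtered" | "network" | "unknown";

interface FailureState {
  reason: FailureReason;
  message: string; // shown to the user
  canRetry: boolean; // whether to render a retry button
  retryAfterSeconds?: number;
}

interface FailureInput {
  status?: number; // HTTP status, if a response arrived at all
  timedOut?: boolean; // true when our own timeout aborted the request
  retryAfter?: string | null; // raw Retry-After header value, if any
}

function classifyFailure(input: FailureInput): FailureState {
  if (input.timedOut || input.status === 408 || input.status === 504) {
    return { reason: "timeout", message: "That took too long. Your text is still here, so you can try again.", canRetry: true };
  }
  if (input.status === 429) {
    const state: FailureState = { reason: "rate_limited", message: "You've hit the usage limit. Please try again shortly.", canRetry: true };
    // Only the delay-in-seconds form of Retry-After is parsed here.
    const seconds = input.retryAfter ? Number(input.retryAfter) : NaN;
    if (Number.isFinite(seconds) && seconds > 0) state.retryAfterSeconds = seconds;
    return state;
  }
  // Assumption: our backend reports filtered content as 422.
  if (input.status === 422) {
    return { reason: "filtered", message: "We couldn't generate a response for that request. Try rephrasing it.", canRetry: false };
  }
  // No status at all means the request never got a response.
  if (input.status === undefined) {
    return { reason: "network", message: "You appear to be offline. Your text is saved.", canRetry: true };
  }
  return { reason: "unknown", message: "Something went wrong on our end. Please try again.", canRetry: true };
}

// Runs `send` with a deadline. The timer is cleared as soon as `send` settles,
// so a stream that stalls after it starts needs its own per-chunk timeout.
async function runWithTimeout<T>(
  send: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await send(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}
```

The input-preservation rule lives in the caller rather than in this code: clear the user's draft only after a successful response, never when a request starts or fails.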
Control the cost curve
A static feature costs roughly the same however often it is used; an AI feature costs money on every call. Without controls, success becomes expensive fast. Cache repeated queries, debounce requests while the user types, use smaller models where they suffice, and set per-user limits. The goal is a feature whose cost scales sensibly with value, not one that punishes you for getting popular.
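Three of these controls (caching, per-user limits, debouncing) fit in a few dozen lines. The sketch below is in-memory and purely illustrative: `ResponseCache`, `DailyQuota`, and `debounce` are hypothetical names, the whitespace-only key normalization is an assumption about which prompts count as "repeated", and a real deployment would likely enforce quotas server-side in a shared store.

```typescript
// Normalizes whitespace so trivially different prompts share a cache entry.
function cacheKey(prompt: string): string {
  return prompt.trim().replace(/\s+/g, " ");
}

// In-memory response cache with a time-to-live per entry.
class ResponseCache {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }
  get(prompt: string): string | undefined {
    const key = cacheKey(prompt);
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }
  set(prompt: string, value: string): void {
    this.entries.set(cacheKey(prompt), { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Per-user request quota that resets each UTC day.
class DailyQuota {
  private used = new Map<string, { day: string; count: number }>();
  private readonly limit: number;
  private readonly now: () => number;
  constructor(limit: number, now: () => number = Date.now) {
    this.limit = limit;
    this.now = now;
  }
  // Returns true and records the request if the user is under the limit.
  tryConsume(userId: string): boolean {
    const day = new Date(this.now()).toISOString().slice(0, 10);
    const record = this.used.get(userId);
    const count = record && record.day === day ? record.count : 0;
    if (count >= this.limit) return false;
    this.used.set(userId, { day, count: count + 1 });
    return true;
  }
}

// Delays `fn` until `waitMs` has passed without another call.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}
```

A typical request path would check the cache first, then the quota, and only then call the model, so cached answers never count against a user's limit.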
“An AI feature that feels instant and costs pennies beats a smarter one that hangs and bleeds money. Craft is the difference.”
Make it feel native
The best AI features don't announce themselves with a glowing badge — they disappear into the workflow. They show up exactly where the user already is, do one thing well, and get out of the way. Restraint is what separates an AI feature people actually use from one they try once and forget.
