HoneyChat

Astro + Next.js + FastAPI in One Repo: The Deploy Contracts We Wish We'd Written Day One

· sm1ck · 6 min read

HoneyChat is three frontends and one backend in one monorepo, all running on a single 32 GB / 16-core Xeon host:

  • Astro (website/) — static MDX + RSS, serves the marketing site, blog and SEO landing pages. ~1,000 pages × 20 languages.
  • Next.js 15 (web/) — SSR, serves the product surfaces: pricing, character profile, chat, profile, payments. This is where most daily active users spend their time (ChatRoom.tsx alone is ~3,000 lines).
  • React + Vite Mini App (miniapp/) — opens inside the Telegram WebApp client. 20 languages. PWA + service worker.
  • FastAPI (api/main.py, uvicorn --workers 4) behind nginx, serving both dynamic frontends (Next.js and the Mini App).
  • aiogram Telegram bot (bot/main.py) — polling, separate process.

On top of that, the docker-compose.yml defines ~14 services that all need to play nicely on deploy: bot, api, nextjs, celery_worker, celery_beat, gen_worker, lora_worker, cleanup_worker, retention_email_worker, postgres, redis, chromadb, nginx, certbot. Two docker networks: internal (service-to-service) and external (nginx only, ports 80/443).

For about six months, every other deploy broke something. A new Astro build that looked fine locally would 404 in production. A FastAPI rebuild would surface as 502 Bad Gateway for ten minutes. A working Mini App would silently keep serving the old bundle for users with the PWA installed.

None of these were code bugs. They were deploy-pipeline bugs. Specifically, five of them, repeated.

Here’s the contract we run now.

Rule 1: --force-recreate for Python, not restart

docker compose restart api does not re-read the built image. It restarts the process inside the existing container with the existing filesystem. If the image has been rebuilt with new Python code, restart won’t pick it up. The change ships only after the next time the container is replaced.

This catches people every release. The new code is on the host, the image is rebuilt, docker compose restart api returns immediately and looks happy, and you spend twenty minutes wondering why your fix didn’t ship.

The right invocation:

```shell
docker compose up -d --force-recreate --no-deps api
```

We wrapped both forms in make:

```makefile
.PHONY: deploy-api restart-api
deploy-api:
	docker compose build api
	docker compose up -d --force-recreate --no-deps api
	@sleep 3
	docker compose restart nginx # see rule 2
restart-api:
	docker compose restart api # config-only, no code changes
```

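Two targets, two intentions. Use restart-api only when the code and image are unchanged, for example a mounted config file that the process reads at startup. restart stops and starts the same container, so it does not pick up a rebuilt image, and it does not apply edits to docker-compose.yml either (environment variables included).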

The same rule applies to every Python service: bot, celery_beat, and all of the *_worker services. We forgot this once for celery_beat after editing the RedBeat schedule and spent half an hour wondering why the new cron entries weren’t firing.
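A forgotten recreate, like the celery_beat one, is detectable after the fact: a container that predates the last build still references the old image ID, while the image tag has moved to the new one. A minimal sketch in Python (the backend's language), assuming the containers were started by Compose, which labels each one with com.docker.compose.service, and that the docker CLI is on PATH. stale_services and main are hypothetical names, not part of the HoneyChat repo.

```python
"""Flag Compose services whose running container predates the latest build."""
import json
import subprocess


def stale_services(containers: list[dict], current_ids: dict[str, str]) -> list[str]:
    """Return services whose container was created from an older image ID.

    containers:  parsed JSON from `docker inspect <container-id>...`
    current_ids: image reference (the container's Config.Image) -> current image ID
    """
    stale = []
    for c in containers:
        labels = c["Config"].get("Labels") or {}
        # Compose labels each container with its service name.
        service = labels.get("com.docker.compose.service", c["Name"].lstrip("/"))
        current = current_ids.get(c["Config"]["Image"])
        # c["Image"] is the image ID the container was created from. A rebuild
        # moves the tag to a new ID; `docker compose restart` keeps the old one.
        if current is not None and current != c["Image"]:
            stale.append(service)
    return sorted(stale)


def _docker(*args: str) -> str:
    """Run a docker CLI command and return its stdout."""
    return subprocess.run(["docker", *args], check=True,
                          capture_output=True, text=True).stdout


def main() -> None:
    """Print stale services for the Compose project in the current directory."""
    ids = _docker("compose", "ps", "-q").split()
    if not ids:
        return
    containers = json.loads(_docker("inspect", *ids))
    current_ids = {}
    for ref in {c["Config"]["Image"] for c in containers}:
        try:
            current_ids[ref] = _docker("image", "inspect", "--format", "{{.Id}}", ref).strip()
        except subprocess.CalledProcessError:
            pass  # tag no longer exists locally; nothing to compare against
    for service in stale_services(containers, current_ids):
        print(f"{service}: running an old image; recreate it with --force-recreate")
```

Running main() from the repo root after a deploy prints any service that still needs --force-recreate; no output means every running container is on its current image.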

Rule 2: nginx caches upstream DNS at start

When nginx starts, it resolves each upstream’s hostname once and caches the IP for the life of the worker process. If the upstream container is recreated (rule 1), it can come back with a different IP on the internal docker network. nginx still has the old one.

The symptom is 502 Bad Gateway with host not found in upstream "api" in the nginx error log, even though docker compose ps shows the api container healthy and listening.

The fix is to restart nginx after recreating any service it proxies to:

```shell
docker compose up -d --force-recreate --no-deps api
sleep 3 # let api come up and accept connections
docker compose restart nginx
```

The sleep 3 is not superstition — if you restart nginx before api is accepting connections, nginx hits a different failure mode (refused connection) and you have to do it again.
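The fixed sleep 3 holds until the day api takes longer than three seconds to boot. A sketch of a readiness poll that could replace it: retry a TCP connect until the port accepts connections or a deadline passes. Assumptions: the caller can reach api's port (from the host that means the port is published; otherwise the check would run from a container on the internal network), and 8000 in the usage comment is a placeholder. An open port is a weaker signal than an HTTP health check, and if api defines a Compose healthcheck, docker compose up --wait is a built-in alternative.

```python
"""Wait until a TCP port accepts connections instead of sleeping a fixed time."""
import socket
import time


def wait_for_tcp(host: str, port: int, timeout: float = 30.0,
                 interval: float = 0.5) -> float:
    """Block until host:port accepts a TCP connection; return seconds waited.

    Raises TimeoutError if it still refuses connections after `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            # Refused, unreachable, or name not resolvable yet: try again later.
            if time.monotonic() - start >= timeout:
                raise TimeoutError(
                    f"{host}:{port} not accepting connections after {timeout}s"
                ) from None
            time.sleep(interval)


# Hypothetical usage in a deploy script (8000 is an assumed port for api):
#   subprocess.run(["docker", "compose", "up", "-d", "--force-recreate",
#                   "--no-deps", "api"], check=True)
#   wait_for_tcp("127.0.0.1", 8000)
#   subprocess.run(["docker", "compose", "restart", "nginx"], check=True)
```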

There’s a way around this: a resolver directive plus a variable in proxy_pass (set $upstream ...) makes nginx resolve the name at runtime, caching the answer for the resolver’s valid= interval, instead of once at startup. We tried it. The simpler, deterministic restart pair turned out to be less surprising in practice.

Rule 3: One Makefile target per surface, never docker compose up -d

For a long time, deploying meant some combination of build + up -d + restart nginx and praying. Three commands, executed in the wrong order half the time, rebuilding every service even when only one changed. We replaced it with named targets:

```makefile
# website/, web/ and miniapp/ are also directories in the repo root, so these targets
# must be phony; otherwise make treats the directory as up to date and skips the recipe.
.PHONY: website web miniapp deploy deploy-all
# Astro static blog: build + sync to nginx volume
website:
	cd website && npm run build && cp -r dist/* ../frontend/website/
	docker compose exec nginx nginx -s reload
# Next.js product app: build + recreate container
web:
	docker compose build nextjs
	docker compose up -d --force-recreate --no-deps nextjs
	@sleep 3
	docker compose restart nginx
# Mini App: build + sync + bump SW
miniapp:
	cd miniapp && npm run build && cp -r dist/* ../frontend/app/dist/
	./scripts/bump_sw_timestamp.sh # see rule 5
	docker compose exec nginx nginx -s reload
# API + bot + workers: build + recreate + nginx
deploy:
	docker compose build api bot
	docker compose up -d --force-recreate --no-deps api bot celery_worker gen_worker
	@sleep 3
	docker compose restart nginx
# Full release: everything in the right order
deploy-all: deploy web website miniapp
```

The order in deploy-all matters: api/bot/workers first (everything else depends on them), then SSR Next.js (depends on api), then static Astro (depends on neither), then Mini App (depends on nothing but has its own SW dance).
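The ordering argument can be written down as data and checked mechanically instead of remembered. A sketch using Python's standard-library graphlib; the dependency edges are our reading of the paragraph above, and ties between targets that are ready at the same time are broken by the order in which deploy-all lists its prerequisites. It reproduces deploy, web, website, miniapp; a dependency that contradicts the Makefile shows up as a different order or a CycleError.

```python
"""Derive the deploy-all order from explicit dependencies (Python 3.9+)."""
from graphlib import TopologicalSorter

# target -> targets that must finish first (our reading of the prose above)
DEPENDS_ON: dict[str, set[str]] = {
    "deploy": set(),        # api, bot, workers: everything else talks to them
    "web": {"deploy"},      # SSR Next.js calls the api
    "website": set(),       # static Astro: no runtime dependency
    "miniapp": set(),       # static bundle + SW stamp
}
# Tie-break when several targets are ready at once: the Makefile's order.
PRIORITY = ["deploy", "web", "website", "miniapp"]


def release_order(depends_on: dict[str, set[str]], priority: list[str]) -> list[str]:
    """Return targets one at a time, dependencies first, ties broken by priority.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    ts = TopologicalSorter(depends_on)
    ts.prepare()
    ready: set[str] = set()
    order: list[str] = []
    while ts.is_active():
        ready.update(ts.get_ready())
        nxt = min(ready, key=priority.index)
        ready.remove(nxt)
        order.append(nxt)
        ts.done(nxt)  # may unlock dependents on the next get_ready()
    return order
```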

Rule 4: Astro static + nginx — there is no atomic swap

Astro builds to website/dist/. We cp -r it into the nginx-mounted frontend/website/ volume. During the copy, some files already have the new content and some still have the old. A request that crosses that boundary can get a half-broken page, for example new HTML that references a hashed asset that hasn’t been copied yet.

For a blog at our traffic level the deploy window is short enough that nobody notices. We accept the trade.

If you can’t accept the trade, the patterns are:

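  1. rsync --delete into a sibling directory, then rename it into place. A rename is atomic on the same filesystem, but rename(2) cannot replace a non-empty directory, so a directory-for-directory swap takes two renames with a short gap between them. Swapping a symlink (pattern 2) avoids the gap.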
  2. Versioned subdirectories with a symlink swap. dist-2026-05-28/ next to a current -> dist-2026-05-28/ symlink. Swap the symlink, reload nginx.
  3. CDN in front of nginx. Push to origin, purge, the CDN does the swap.
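Pattern 2 fits in a few lines of Python. The key step is to create the new symlink under a temporary name and rename it over current: that is a single rename(2), so a reader sees either the old target or the new one, never a missing path. Two assumptions: the release directories and the symlink both live inside the directory nginx mounts (a bind mount stays attached to the host directory it was created from, so swapping that directory itself on the host is invisible to the container), and the symlink target is relative so it resolves the same way inside the container. publish and the dist-* names are illustrative.

```python
"""Atomic release switch: point a `current` symlink at a new build directory."""
import os
import shutil
from pathlib import Path


def publish(build_dir: Path, web_root: Path, release: str) -> Path:
    """Copy build_dir to web_root/<release>/ and atomically repoint web_root/current."""
    target = web_root / release
    if target.exists():
        raise FileExistsError(f"release {release!r} already published")
    # 1. Copy the whole build first; nothing is served from `target` yet.
    shutil.copytree(build_dir, target)
    # 2. Create the new link under a temporary name. The target is relative,
    #    so it also resolves inside a container that mounts web_root elsewhere.
    tmp_link = web_root / f".current-{release}.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()  # leftover from an interrupted run
    tmp_link.symlink_to(release, target_is_directory=True)
    # 3. rename(2) over the old link: requests see old or new, never neither.
    os.replace(tmp_link, web_root / "current")
    return target
```

nginx's root would point at the current symlink, and older dist-* directories can be pruned once no in-flight request could still be reading them.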

Rule 5: Service workers need a kick

The Mini App has a service worker that caches the bundle. If sw.js doesn’t change between deploys, browsers serve users the cached old bundle indefinitely. Users see what looks like a bug (“the new feature doesn’t show up for me”) that’s actually their SW happily ignoring our deploy.

Two things are needed.

On build: stamp the SW with a timestamp so its bytes change every deploy.

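scripts/bump_sw_timestamp.sh:

```bash
#!/bin/bash
# Stamp the deployed service worker so its bytes change on every deploy.
set -euo pipefail
SW=frontend/app/dist/sw.js
STAMP=$(date +%s)
# Fail loudly if the version line is missing; otherwise sed is a silent no-op.
grep -q '^const SW_VERSION = ' "$SW" || { echo "no SW_VERSION line in $SW" >&2; exit 1; }
sed -i "s|^const SW_VERSION = .*|const SW_VERSION = \"$STAMP\";|" "$SW"
```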

On client: a tiny snippet that triggers reload when the SW changes (decision D217 in our internal log).

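```js
if ("serviceWorker" in navigator) {
  // Only a page that loaded without a controlling SW is a first install.
  // For a returning user, the first controllerchange is already a deploy.
  let firstInstall = !navigator.serviceWorker.controller;
  navigator.serviceWorker.addEventListener("controllerchange", () => {
    if (firstInstall) {
      // Initial activation (e.g. via clients.claim()): don't reload.
      firstInstall = false;
      return;
    }
    // A new SW took control: reload so the page runs the new bundle.
    window.location.reload();
  });
}
```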

The firstInstall guard matters — without it, a brand-new user hitting the site for the first time triggers a reload they didn’t ask for. After the first install, subsequent SW activations are deploys and do warrant a reload.

We learned both halves the hard way. Stamping without the client listener: bundle changes but users stay on the cached one. Client listener without stamping: SW never changes, listener never fires, nothing reloads.
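A possible refinement to the date +%s stamp: derive SW_VERSION from a hash of the built files, so sw.js changes exactly when the bundle does and a no-op redeploy doesn't reload every open Mini App. A sketch in Python that keeps the `const SW_VERSION = ...` line convention of bump_sw_timestamp.sh; bundle_hash and stamp_sw are hypothetical names. Either way, the listener only fires mid-session if the new worker actually takes control, which in practice means the SW calls self.skipWaiting(); otherwise the new worker waits until every tab using the old one is closed.

```python
"""Stamp sw.js with a hash of the built bundle instead of a timestamp."""
import hashlib
import re
from pathlib import Path

VERSION_LINE = re.compile(r"^const SW_VERSION = .*$", re.MULTILINE)


def bundle_hash(dist: Path, exclude: str = "sw.js") -> str:
    """Hash every file under dist (paths + contents), skipping the SW itself."""
    h = hashlib.sha256()
    for path in sorted(p for p in dist.rglob("*") if p.is_file()):
        rel = path.relative_to(dist).as_posix()
        if rel == exclude:
            continue  # the stamp must not depend on itself
        data = path.read_bytes()
        # Length-prefix each entry so different file splits can't collide.
        h.update(rel.encode() + b"\0" + len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()[:16]


def stamp_sw(dist: Path) -> str:
    """Rewrite the SW_VERSION line in dist/sw.js; return the new version."""
    sw = dist / "sw.js"
    version = bundle_hash(dist)
    new_text, n = VERSION_LINE.subn(f'const SW_VERSION = "{version}";',
                                    sw.read_text(), count=1)
    if n != 1:
        raise ValueError(f"no 'const SW_VERSION = ...' line in {sw}")
    sw.write_text(new_text)
    return version
```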

Bonus: the external GPU dependency

We’re not strictly self-contained — image generation hits a Vast.ai ComfyUI box over an SSH tunnel. Vast can change a container’s public IP at any time. We run a small watchdog (scripts/vast_watchdog.py, runs every few minutes) that detects an IP change and force-recreates bot, api, celery_worker, and gen_worker so the new tunnel target gets picked up. Three guards in the loop prevent infinite recreation cycles. This is the only deploy-related thing on our cluster that isn’t directly under make.
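The post doesn't spell out the watchdog's three guards, so the following is only a guess at what anti-flap guards of that kind can look like, not the contents of scripts/vast_watchdog.py: (1) a new IP must be seen on several consecutive polls, (2) a cooldown between recreates, and (3) a cap on recreates per rolling hour, after which the loop halts. The decision logic is kept free of side effects; the caller would run the force-recreate of bot, api, celery_worker and gen_worker on "recreate" and alert a human on "halt".

```python
"""Anti-flap guards for an IP-change watchdog (hypothetical sketch)."""
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Watchdog:
    confirm_polls: int = 2      # guard 1: new IP must repeat this many polls in a row
    cooldown_s: float = 600.0   # guard 2: minimum seconds between recreates
    max_per_hour: int = 3       # guard 3: cap per rolling hour, then halt
    known_ip: Optional[str] = None
    _candidate: Optional[str] = None
    _seen: int = 0
    _recreates: deque = field(default_factory=deque)

    def observe(self, ip: str, now: float) -> str:
        """Feed one poll result; return "ok", "wait", "recreate" or "halt"."""
        if self.known_ip is None:
            self.known_ip = ip  # first poll: adopt the current IP
            return "ok"
        if ip == self.known_ip:
            self._candidate, self._seen = None, 0  # a blip went away
            return "ok"
        # Guard 1: count consecutive sightings of the same new IP.
        if ip != self._candidate:
            self._candidate, self._seen = ip, 0
        self._seen += 1
        if self._seen < self.confirm_polls:
            return "wait"
        # Forget recreates older than an hour.
        while self._recreates and now - self._recreates[0] > 3600:
            self._recreates.popleft()
        # Guard 3: repeated recreates mean something else is wrong; stop looping.
        if len(self._recreates) >= self.max_per_hour:
            return "halt"
        # Guard 2: too soon after the previous recreate.
        if self._recreates and now - self._recreates[-1] < self.cooldown_s:
            return "wait"
        self._recreates.append(now)
        self.known_ip, self._candidate, self._seen = ip, None, 0
        return "recreate"
```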

What “deploy” looks like now

```shell
make website     # marketing/blog content push
make web         # product change in Next.js
make miniapp     # Mini App change
make deploy      # backend change (api + bot + workers)
make deploy-all  # full release in the right order
```

Each target is the smallest unit that ships safely on its own. The rules above are baked into the targets, not held in someone’s head.
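Because each surface lives in its own top-level directory (website/, web/, miniapp/, api/, bot/), picking the target can itself be mechanical. A sketch, assuming the changed paths come from something like git diff --name-only against the last deployed commit. Treating any path outside those five directories (the compose file, nginx config, or wherever the workers' code lives, which the post doesn't say) as a full release is our conservative assumption, not part of the original setup.

```python
"""Pick the smallest set of make targets for a list of changed paths."""
# Top-level directory -> make target, from the repo layout described above.
TARGET_FOR_DIR = {
    "website": "website",
    "web": "web",
    "miniapp": "miniapp",
    "api": "deploy",
    "bot": "deploy",
}
RELEASE_ORDER = ["deploy", "web", "website", "miniapp"]  # same as deploy-all


def targets_for(changed: list[str]) -> list[str]:
    """Return make targets for the changed paths, in deploy-all order."""
    needed: set[str] = set()
    for path in changed:
        parts = path.removeprefix("./").split("/", 1)
        target = TARGET_FOR_DIR.get(parts[0]) if len(parts) == 2 else None
        if target is None:
            return ["deploy-all"]  # unknown surface: ship everything, in order
        needed.add(target)
    if set(RELEASE_ORDER) <= needed:
        return ["deploy-all"]
    return [t for t in RELEASE_ORDER if t in needed]
```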

The number of 502 Bad Gateway incidents we’ve shipped since adopting this contract: roughly zero. The number of “users not seeing the new feature” tickets from cached SWs: also roughly zero. The cost was about a day of writing this Makefile and reading nginx docs more carefully.

Lessons

  1. restart is not up -d --force-recreate. Use the right one for code vs config changes.
  2. nginx upstream DNS is cached at start. Restart nginx after any service it routes to.
  3. One Makefile target per deploy surface. Don’t make humans remember the order.
  4. Static-file deploys aren’t atomic by default. Pick a pattern (symlink swap, rsync atomic, CDN) before you need one.
  5. Service workers need a stamp and a listener. Otherwise your users are stuck on yesterday’s bundle.

A deploy is not “I pushed the code”; it’s “users are reliably seeing the new code.” The middle distance (a running container on an old image, a served-but-not-rebuilt asset, a cached-but-stale SW) is where deploys silently fail.


Related notes: Sentry SDK noise filter · range-DELETE postmortem · ChromaDB 0.5 leak fix.
