HoneyChat

OAuth `state` Validation Belongs on the Client, Not the Server

sm1ck · 2 min read

Most OAuth tutorials get the state parameter half right.

They tell you to mint a random value before redirecting the user to the provider, then verify it on the callback. They don’t tell you where to store it between those two points. The default answer in nine guides out of ten is “Redis, keyed by some session ID.” Which is fine, until you trip over the bug that pattern guarantees.

We tripped over it on HoneyChat, a Telegram-native AI companion with ~300 DAU and web sign-in via Google and Discord. The fix took about a hundred lines of code. The lessons stuck.

HoneyChat auth surfaces at a glance (three distinct paths; only the first needs the OAuth state discussion below):

| Surface | Auth mechanism | Code path |
| --- | --- | --- |
| Web app (web/, Next.js 15) | OAuth 2.0: Google + Discord | Client mints state, callback at /auth/callback, exchange via api/web_auth.py |
| Mini App (miniapp/, React + Vite, runs inside Telegram WebApp) | Telegram InitData (HMAC of bot token) | api/auth.py verifies it and issues a JWT; no OAuth state |
| Telegram polling bot | Telegram user ID from the message envelope | aiogram middleware; nothing to verify, the platform already did it |

The web app’s “Sign in with Telegram” button runs an OAuth-like flow too, but it short-circuits to a direct users.id == tg_uid match, so a user who started on Telegram and later signs in on the web doesn’t end up with two rows. (We learned that one separately.) The OAuth state discussion below applies to Google and Discord specifically.
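For contrast, the Mini App row needs no state at all because Telegram signs initData itself. HoneyChat’s real check is Python in api/auth.py; the sketch below is a TypeScript rendering of Telegram’s documented algorithm, and verifyInitData is a hypothetical name. It derives a secret from the bot token with HMAC-SHA256 keyed by the constant "WebAppData", rebuilds the sorted key=value data-check-string, and compares hashes in constant time.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: verify a Telegram Mini App initData query string.
// Follows Telegram's documented scheme; not HoneyChat's actual (Python) code.
export function verifyInitData(initData: string, botToken: string): boolean {
  const params = new URLSearchParams(initData);
  const receivedHash = params.get("hash");
  if (!receivedHash) return false;
  params.delete("hash");

  // data-check-string: remaining fields as key=value, sorted by key, joined by "\n"
  const dataCheckString = [...params.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([k, v]) => `${k}=${v}`)
    .join("\n");

  // secret_key = HMAC-SHA256 of the bot token, keyed with the constant "WebAppData"
  const secretKey = createHmac("sha256", "WebAppData").update(botToken).digest();
  const expectedHex = createHmac("sha256", secretKey).update(dataCheckString).digest("hex");

  // Constant-time comparison; timingSafeEqual requires equal lengths
  const expected = Buffer.from(expectedHex, "hex");
  const received = Buffer.from(receivedHash, "hex");
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

A production check would also reject stale auth_date values; that step is left out here to keep the sketch focused on the signature.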

The shape of the bug

Our OAuth flow looked like the textbook version:

  1. User clicks “Sign in with Google” on https://honeychat.bot/login.
  2. Our Next.js server generates a random state, stores it in Redis keyed by a server-issued session id, sets the session id as a cookie, and redirects to Google.
  3. Google bounces the user back to https://honeychat.bot/auth/callback?code=…&state=….
  4. Our callback reads the session id cookie, looks up the state in Redis, and compares it to the state from the query string. Match → good. Mismatch → 403.

In testing it worked. In production it failed at ~3% of attempts with state mismatch — possible CSRF. The users hitting the error were not under attack. They were:

  • People who’d opened the login in two tabs. Each tab wrote its own state to Redis under its own session id, but the second tab’s redirect overwrote the session-id cookie, so the first tab’s callback looked up the wrong state.
  • People on iOS Safari with Intelligent Tracking Prevention (ITP), where cookies set during the OAuth redirect were dropped, so the session-id cookie was missing on the callback.
  • People for whom the browser → server → Redis write was slower than the Google round-trip, so the callback read from the Redis read replica before the state write had replicated.
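The two-tab failure is easy to reproduce in miniature. This toy model (all names hypothetical) uses a Map as a stand-in for Redis and a single variable for the browser’s one session-id cookie, which both tabs share. It doesn’t model ITP or replica lag directly, but a missing cookie or a missing Redis entry fails the same check.

```typescript
import { randomUUID } from "node:crypto";

// Toy model of the original server-side flow (hypothetical names).
const redis = new Map<string, string>(); // session id -> expected state
let sessionCookie: string | undefined;   // the single "sid" cookie both tabs share

function startLogin(): string {
  const sid = randomUUID();
  const state = randomUUID();
  redis.set(sid, state);
  sessionCookie = sid; // a later tab silently overwrites this
  return state;        // travels to the provider and back in the query string
}

function callback(queryState: string): 200 | 403 {
  if (sessionCookie === undefined) return 403;  // cookie dropped, as with ITP
  const expected = redis.get(sessionCookie);    // undefined is what a lagging replica returns
  return expected !== undefined && expected === queryState ? 200 : 403;
}

const stateA = startLogin(); // tab A
const stateB = startLogin(); // tab B, started before tab A's callback
console.log(callback(stateA)); // 403: the cookie now points at tab B's session
console.log(callback(stateB)); // 200
```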

All three are variants of the same underlying mistake: we’d put the state somewhere the user-agent didn’t carry. The whole point of the OAuth state parameter, per RFC 6749 §10.12, is that it’s a value the client generates, hands to the provider, and verifies when the provider hands it back. The server doesn’t need to know it. The server especially doesn’t need to store it.

The version that works:

```typescript
// web/src/lib/oauth.ts: runs in the user's browser
export function startGoogleOAuth(): void {
  const state = crypto.randomUUID();   // 1. client mints state
  document.cookie = [                  // 2. store it as a cookie
    `oauth_state=${state}`,
    `Path=/auth/callback`,             // scoped to the callback path only
    `Max-Age=600`,                     // 10 minutes is plenty
    `SameSite=Lax`,                    // Lax: Strict would drop it on the redirect back
    `Secure`,
  ].join("; ");

  // 3. send the user to Google with the same value in the state parameter
  const url = new URL("https://accounts.google.com/o/oauth2/v2/auth");
  url.searchParams.set("client_id", process.env.NEXT_PUBLIC_GOOGLE_CLIENT_ID!);
  url.searchParams.set("redirect_uri", `${window.location.origin}/auth/callback`);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("scope", "openid email profile");
  url.searchParams.set("state", state);
  window.location.href = url.toString();
}
```

```typescript
// web/src/app/auth/callback/route.ts: Next.js 15 route handler
import { NextRequest, NextResponse } from "next/server";

export async function GET(req: NextRequest) {
  const url = new URL(req.url);
  const queryState = url.searchParams.get("state");
  const cookieState = req.cookies.get("oauth_state")?.value;
  if (!queryState || !cookieState || queryState !== cookieState) {
    return new Response("state mismatch", { status: 403 });
  }

  const code = url.searchParams.get("code");
  if (!code) {
    return new Response("missing code", { status: 400 });
  }

  // Hand the code to FastAPI for the secret-bearing exchange (api/web_auth.py)
  const upstream = await fetch("http://api:8000/web/auth/google/exchange", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ code }),
  });
  if (!upstream.ok) {
    return new Response("token exchange failed", { status: 502 });
  }
  const tokens: { session: string } = await upstream.json();

  // NextResponse.redirect needs an absolute URL
  const res = NextResponse.redirect(new URL("/profile", req.url));
  // Single-use: clear the state cookie on the same path it was set with
  res.cookies.set("oauth_state", "", { maxAge: 0, path: "/auth/callback" });
  // Set the app session cookie that FastAPI issued, valid site-wide
  res.cookies.set("hc_session", tokens.session, {
    httpOnly: true,
    secure: true,
    sameSite: "lax",
    path: "/",
  });
  return res;
}
```

That’s it. No Redis, no server-issued session id, no read-replica timing. The browser carries the state in a cookie scoped to the callback path. Discord follows the same pattern with a different authorization URL, so both providers share one helper.
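A sketch of what such a shared helper might look like, assuming the only per-provider differences are the authorize endpoint and the scopes. buildAuthorizeUrl and the scope choices are illustrative, not HoneyChat’s code; the endpoints are Google’s and Discord’s public OAuth 2.0 authorize URLs. The redirect URI is a parameter because the post doesn’t say how the callback tells the two providers apart.

```typescript
// Hypothetical shared helper: builds the authorize URL for either provider.
type Provider = "google" | "discord";

const PROVIDERS: Record<Provider, { authorizeUrl: string; scope: string }> = {
  google: { authorizeUrl: "https://accounts.google.com/o/oauth2/v2/auth", scope: "openid email profile" },
  // Assumed scopes: enough to read the Discord user id and email
  discord: { authorizeUrl: "https://discord.com/oauth2/authorize", scope: "identify email" },
};

export function buildAuthorizeUrl(
  provider: Provider,
  clientId: string,
  redirectUri: string,
  state: string, // the same value written to the oauth_state cookie
): string {
  const { authorizeUrl, scope } = PROVIDERS[provider];
  const url = new URL(authorizeUrl);
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("redirect_uri", redirectUri);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("scope", scope);
  url.searchParams.set("state", state);
  return url.toString();
}
```

The browser side stays as in startGoogleOAuth: mint the state, write the oauth_state cookie, then navigate to the returned URL. If each provider got its own callback path under /auth/callback/, the Path=/auth/callback cookie would still reach it, since cookie paths match by prefix.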

The two-tabs case now fails safely. Each tab mints its own value, but both write the same oauth_state cookie, so the most recently started tab wins: its callback matches, and an older tab’s callback fails closed with a 403, which is correct. The Path=/auth/callback scope just keeps the cookie away from every other route.

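The iOS Safari case is also handled. The cookie is first-party, written by our own page before the browser leaves for the provider, and SameSite=Lax cookies are sent on top-level GET navigations, so it survives the cross-site redirect back from Google or Discord.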

We considered making the state a signed token instead (effectively a JWT). The argument: you could sign the state with HMAC, include a timestamp and a CSRF token, and verify it on the callback without needing the cookie to round-trip at all, because the state query string itself would carry the signature.

We didn’t, because:

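  1. A random value plus a path-scoped cookie already binds the callback to the browser that started the login. A JWT doesn’t add anything for the threat model RFC 6749 §10.12 is actually addressing, and a signed state that never touches a cookie doesn’t bind to the user-agent at all: an attacker can start a login themselves and obtain a perfectly valid one.
  2. Signed state in the query string is one more thing to get wrong (key rotation, replay window, timing-safe comparison) for, at best, the same security property a plain random value plus a cookie gives you.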

If you have additional state you want to round-trip (e.g. the page the user was on before they clicked login, so you can return them there), signing it makes sense. The CSRF protection itself does not require it.
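If you do want to carry a return-to path, here is one hedged way to do it. It assumes the login starts at a server route (a browser can’t hold an HMAC key), which departs from the client-minted flow above; packState and unpackState are hypothetical names. The random nonce still goes in the oauth_state cookie and is compared on the callback. The HMAC only guarantees the return-to path wasn’t edited in transit, and the final check refuses anything that isn’t a same-site relative path, so the value can’t become an open redirect.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical server-side helpers: state = nonce.payload.signature (all base64url).
function sign(secret: string, data: string): string {
  return createHmac("sha256", secret).update(data).digest("base64url");
}

export function packState(secret: string, returnTo: string): { nonce: string; state: string } {
  const nonce = randomBytes(16).toString("base64url"); // goes in the oauth_state cookie
  const payload = Buffer.from(returnTo, "utf8").toString("base64url");
  const body = `${nonce}.${payload}`;
  return { nonce, state: `${body}.${sign(secret, body)}` };
}

// Returns the verified return-to path, or null if anything fails.
export function unpackState(secret: string, state: string, cookieNonce: string | undefined): string | null {
  const parts = state.split("."); // base64url never contains "."
  if (parts.length !== 3) return null;
  const [nonce, payload, sig] = parts;

  // Integrity of nonce + payload, compared in constant time
  const expected = Buffer.from(sign(secret, `${nonce}.${payload}`));
  const given = Buffer.from(sig);
  if (expected.length !== given.length || !timingSafeEqual(expected, given)) return null;

  // CSRF binding is still the cookie comparison, exactly as in the unsigned flow
  if (!cookieNonce || cookieNonce !== nonce) return null;

  // Only same-site relative paths; "//host" and "/\host" would leave the site
  const returnTo = Buffer.from(payload, "base64url").toString("utf8");
  if (!returnTo.startsWith("/") || returnTo.startsWith("//") || returnTo.startsWith("/\\")) return null;
  return returnTo;
}
```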

Where the server does fit

The server still has a job in this flow:

  1. Exchange the code for tokens. Only the server has the client secret. For us that’s api/web_auth.py::exchange_google_code() — FastAPI is the only thing on the network that knows GOOGLE_CLIENT_SECRET.
  2. Issue the application session (hc_session cookie) after a successful exchange.
  3. Optionally rate-limit the callback endpoint to slow down brute-force attempts on the code parameter.
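For reference, the exchange in step 1 is a standard authorization-code token request (RFC 6749 §4.1.3). HoneyChat does it in Python inside api/web_auth.py; this TypeScript sketch only shows the request shape against Google’s token endpoint, and buildGoogleTokenRequest is a hypothetical name. It returns the request pieces rather than calling fetch, so it is clear where the client secret goes.

```typescript
// Hypothetical helper: the form-encoded POST a server sends to swap a code for tokens.
export function buildGoogleTokenRequest(opts: {
  code: string;
  clientId: string;
  clientSecret: string; // server-side only; never shipped to the browser
  redirectUri: string;  // must match the redirect_uri sent on the authorize request
}): { url: string; method: "POST"; headers: Record<string, string>; body: string } {
  const body = new URLSearchParams({
    grant_type: "authorization_code",
    code: opts.code,
    client_id: opts.clientId,
    client_secret: opts.clientSecret,
    redirect_uri: opts.redirectUri,
  });
  return {
    url: "https://oauth2.googleapis.com/token",
    method: "POST",
    headers: { "content-type": "application/x-www-form-urlencoded" },
    body: body.toString(),
  };
}
```

On the server you would pass these pieces straight to fetch and read access_token and id_token from the JSON response.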

What it doesn’t do anymore: store or verify the state.

A note on RFC 6749 §10.12

The spec language is intentionally minimal:

The client MUST implement CSRF protection for its redirection URI. This is typically accomplished by requiring any request sent to the redirection URI endpoint to include a value that binds the request to the user-agent’s authenticated state.

“User-agent’s authenticated state” is the load-bearing phrase. A cookie is the user-agent’s state. A Redis row keyed by a server-issued session id is the server’s state. Those are different things, and only one of them survives the kind of failure modes we hit.

Lessons

  1. Read RFC 6749 §10.12 once. The “state on client” reading is the right one.
  2. The state value belongs in something the user-agent carries. A cookie scoped to the callback path is the simplest such thing.
  3. Don’t store OAuth state in Redis. It introduces cross-tab races, Safari cookie-drop failures, and read-replica latency bugs.
  4. Scope the cookie tightly: Path=/auth/callback, Max-Age=600, SameSite=Lax, Secure. Nothing else needs to see it.
  5. Clear the cookie after a successful callback. It’s single-use.

The fix took us about a hundred lines of code and let us delete some Redis plumbing. Our OAuth error rate dropped from ~3% to ~0.1% — that residual is genuine bad actors and people whose Google session expired mid-flow, which is the right floor.


Related notes: Astro + Next.js + FastAPI deploy contracts · Sentry SDK noise filter · range-DELETE postmortem.
