
Sentry SDK 2.x Auto-Integrations Flood Your Inbox — Here's the Filter

· sm1ck · 1 min read

A clean Sentry inbox is a load-bearing developer tool. The day yours starts showing 4,000 events for things that aren’t bugs, you stop opening it. The day after that, you miss the real bug.

We upgraded HoneyChat (a Telegram-native AI companion, ~300 DAU) from Sentry SDK 1.x to 2.x, and that is what happened. The 2.x line auto-enables a larger set of integrations than 1.x did: whenever the SDK detects that a supported library is installed, the matching integration turns on. Loguru, OpenAI, SQLAlchemy, asyncpg, Redis, httpx are all on by default in 2.x. No more integrations=[...] boilerplate.

This is great when those integrations capture only real errors. They don’t.

HoneyChat backend stack (for context):

  • aiogram Telegram polling bot (bot/main.py)
  • FastAPI behind nginx (api/main.py, uvicorn --workers 4)
  • Celery workers across four queues: llm, images, gifs, voice (workers/tasks.py)
  • Celery beat with RedBeat scheduler (hourly greetings, daily reports)
  • Dedicated GPU gen_worker (image/GIF generation queue)
  • Storage: PostgreSQL 16 via asyncpg, Redis via aioredis, ChromaDB 0.5
  • LLM calls go through OpenRouter using the official openai Python SDK (base_url swapped to https://openrouter.ai/api/v1) — so openai.* exception types fire on OpenRouter responses too
  • Logger: Loguru, routed to stdout + Sentry

That stack pulls in every library named above, so each matching Sentry 2.x integration turns on.

What started landing in our inbox

Within a day of the upgrade:

  • Every logger.error("…") from Loguru became a Sentry issue. Including the lines we’d written as error-level just because they were operationally important and we wanted them coloured red in the terminal. Not bugs.
  • Every openai.RateLimitError and openai.APIConnectionError became an issue. These are part of normal life when you route to LLMs via OpenRouter — we handle them with tenacity retries. Not bugs.
  • Every transient asyncpg/SQLAlchemy pool race during deploy became an issue. We restart bot, api, celery_worker, nginx back-to-back during a full release; pool reconnects produce a brief flurry of these. Not bugs.
  • Every Redis ConnectionResetError on a network blip became an issue. Also not a bug.

Real bugs were drowning. Issue counts went from ~5/day to 4,000+.

What’s actually happening

At init, Sentry SDK 2.x tries each of its auto-enabling integrations and turns on every one whose target library is installed and importable; your code does not need to have imported the library first. The relevant docs page lists them. Three matter most for a typical Python service:

  • LoguruIntegration — captures Loguru records at ERROR and above.
  • OpenAIIntegration — captures all openai.OpenAIError raises (which, for us, includes everything OpenRouter ever returns through the OpenAI SDK).
  • SqlalchemyIntegration — captures slow queries, connection errors, and a few other states.
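
A rough way to predict what will switch on in a given environment is to check which of the target libraries are present there. The sketch below uses only the standard library and is not Sentry's own detection code; the CANDIDATES tuple is just the set of libraries named in this post, not the SDK's full list, and importable is a hypothetical helper name.

```python
from importlib.util import find_spec

# Package names taken from the libraries mentioned in this post.
# This is NOT Sentry's own list of auto-enabling integrations.
CANDIDATES = ("loguru", "openai", "sqlalchemy", "asyncpg", "redis", "httpx")


def importable(names):
    """Map each top-level package name to whether it can be imported here."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in importable(CANDIDATES).items():
        print(f"{name:12s} {'installed' if present else 'missing'}")
```

Run it inside the same container or virtualenv as the service; a package that is only a transitive dependency still counts as installed.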

You can disable individual integrations:

import sentry_sdk
from sentry_sdk.integrations.loguru import LoguruIntegration
from sentry_sdk.integrations.openai import OpenAIIntegration

sentry_sdk.init(
    dsn=settings.SENTRY_DSN,
    disabled_integrations=[
        LoguruIntegration,
        OpenAIIntegration,
    ],
)

…but that’s a blunt instrument. We do want LLM errors reported when they’re real. We want Loguru-routed errors reported when the line that produced them is actually a bug.

The fix is a before_send filter.

The filter (core/sentry_filters.py)

import sentry_sdk
from sentry_sdk.types import Event, Hint

# Exceptions we expect to see at low rates as part of normal operation.
# Each entry must equal f"{cls.__module__}.{cls.__name__}" exactly; some
# libraries define exceptions in private submodules, so verify each path.
EXPECTED_EXCEPTIONS = (
    "openai.RateLimitError",
    "openai.APIConnectionError",
    "openai.APITimeoutError",
    "openai.InternalServerError",
    "redis.exceptions.ConnectionError",
    "redis.exceptions.TimeoutError",
    "asyncpg.exceptions.ConnectionDoesNotExistError",
    "asyncpg.exceptions.InterfaceError",
    "sqlalchemy.exc.OperationalError",
)

# Loguru loggers/modules where ERROR-level lines are operational, not bugs.
OPERATIONAL_LOGGERS = (
    "core.llm",            # fallback chain, content_filter rescue, retries
    "core.image_gen",      # GPU → API provider switchover
    "core.voice",          # Inworld TTS → gTTS fallback
    "workers.gen_worker",  # task-level fallback
)


def before_send(event: Event, hint: Hint) -> Event | None:
    # 1) Drop expected transient exceptions.
    exc_info = hint.get("exc_info")
    if exc_info:
        exc_type = exc_info[0]
        exc_path = f"{exc_type.__module__}.{exc_type.__name__}"
        if exc_path in EXPECTED_EXCEPTIONS:
            return None

    # 2) Drop ERROR- and WARNING-level Loguru records from operational modules.
    logger_name = event.get("logger") or ""
    if logger_name in OPERATIONAL_LOGGERS:
        level = event.get("level")
        if level in ("error", "warning"):
            return None

    return event


sentry_sdk.init(
    dsn=settings.SENTRY_DSN,  # settings: the app's config object (import not shown)
    before_send=before_send,
    # Tracing and profiling are off; we only want errors here.
    traces_sample_rate=0.0,
    profiles_sample_rate=0.0,
)

The filter is twenty lines. The two tuples are the actual contract: these exceptions and these loggers are noise. After we deployed it, our Sentry inbox went from 4,000+ to ~30 events/day. The 30 included two real bugs we’d been missing.
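
Because the filter only reads plain dicts and exception classes, its decision logic can be exercised without Sentry installed. The following is a minimal sketch under a few assumptions: should_drop, qualified_name and make_hint are hypothetical helper names, the two tuples are shortened, and the logger check uses prefix matching (so "core.llm.fallback" also counts as "core.llm"), which is a variation on the exact match shown above rather than what the production filter does.

```python
# Shortened copies of the two tuples, for illustration only.
EXPECTED_EXCEPTIONS = frozenset({
    "redis.exceptions.ConnectionError",
    "openai.RateLimitError",
})
OPERATIONAL_LOGGERS = ("core.llm", "workers.gen_worker")


def qualified_name(exc_type: type) -> str:
    """Return 'module.ClassName', the string the filter compares against."""
    return f"{exc_type.__module__}.{exc_type.__name__}"


def should_drop(event: dict, hint: dict) -> bool:
    """Return True when the event matches one of the two noise rules."""
    exc_info = hint.get("exc_info")
    if exc_info and qualified_name(exc_info[0]) in EXPECTED_EXCEPTIONS:
        return True
    logger_name = event.get("logger") or ""
    # Assumption: treat sub-modules of an operational logger as operational too.
    is_operational = any(
        logger_name == name or logger_name.startswith(name + ".")
        for name in OPERATIONAL_LOGGERS
    )
    return is_operational and event.get("level") in ("error", "warning")


def make_hint(exc: BaseException) -> dict:
    """Build a hint dict shaped like the one passed to before_send."""
    return {"exc_info": (type(exc), exc, exc.__traceback__)}


# Stand-in for redis.exceptions.ConnectionError so the sketch runs without redis.
FakeRedisConnectionError = type(
    "ConnectionError", (Exception,), {"__module__": "redis.exceptions"}
)

if __name__ == "__main__":
    print(qualified_name(FakeRedisConnectionError))
    print(should_drop({}, make_hint(FakeRedisConnectionError("reset"))))
    print(should_drop({}, make_hint(ValueError("real bug"))))
    print(should_drop({"logger": "core.llm.fallback", "level": "error"}, {}))
```

Printing qualified_name for each real exception class is also a cheap way to confirm that every string in EXPECTED_EXCEPTIONS matches what the library actually reports.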

The log-level discipline that goes with it

A filter is half the answer. The other half is fixing the log levels at the source. Our team now follows three rules:

  • logger.error(...) is only for bugs — a real malfunction the user shouldn’t have experienced. These belong in Sentry by default.
  • logger.warning(...) is for known operational events — fallback fired, retry scheduled, rate limit hit. These go to log files for trend analysis, not to Sentry.
  • logger.info(...) is for normal-path traces — fallback chain step transitions, successful retries, model switch confirmations.

A normal-path “Gemini returned content_filter, falling back to Grok 4.20” is info, not error. The terminal might lose some red, but Sentry stops crying wolf.

When we audited our codebase against this, we found roughly 40 logger.error lines in core/llm.py, core/image_gen.py and workers/gen_worker.py that should have been warning or info. Fixing them at source means the before_send filter doesn’t need to grow indefinitely.
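
An audit like this can be partly mechanized. The sketch below is not the script we used; it is a minimal illustration that walks a module's syntax tree and reports the line numbers of logger.error(...) calls, assuming the Loguru logger is bound to a variable literally named logger. find_error_calls and SAMPLE are hypothetical names.

```python
import ast


def find_error_calls(source: str, logger_name: str = "logger") -> list[int]:
    """Return sorted line numbers of `<logger_name>.error(...)` calls."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "error"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == logger_name
        ):
            lines.append(node.lineno)
    return sorted(lines)


# The source is only parsed, never executed, so loguru need not be installed.
SAMPLE = '''\
from loguru import logger

def call_llm():
    logger.error("content_filter hit, falling back")
    logger.info("fallback succeeded")
    logger.error("unexpected response shape")
'''

if __name__ == "__main__":
    print(find_error_calls(SAMPLE))  # prints [4, 6]
```

Feeding it the text of each file under core/ and workers/ gives a review list; deciding whether each hit is a bug, an operational event, or a normal-path trace still takes a human.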

What we didn’t do

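We considered ignore_errors=[...] on sentry_sdk.init instead of before_send. A bare type name such as ConnectionError is ambiguous (the Python builtin and redis.exceptions both define one), so any string entry would have to be fully qualified. In current 2.x releases ignore_errors does accept exception classes (matched with issubclass) and strings (matched against either the bare or the module-qualified type name), so it could cover the exception half of the filter. It only looks at the exception, though, and cannot express the logger-based rule. Keeping both rules in one before_send, with fully-qualified paths, is more precise and keeps the contract in one place.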

We also considered turning the integrations off entirely. The risk is losing visibility into real LLM and DB errors — when there’s a real bug in the LLM path, the OpenAIIntegration’s stack-trace enrichment is genuinely useful. Keeping the integrations on and filtering precisely was the better trade.

Lessons

  1. An SDK upgrade can change capture surface without changing your code. Read the changelog before bumping the major.
  2. Auto-enabled integrations are a feature and a tax. Audit which ones are on after upgrade.
  3. before_send is the right hook for noise reduction. It runs late enough to know the full event, early enough to drop cheaply.
  4. Log levels are a contract, not a style preference. If error doesn’t mean “Sentry-worthy”, your Sentry inbox is broken.
  5. A small filter beats turning integrations off. You keep the enrichment, you skip the noise.

The hardest part of this work isn’t the filter — it’s getting the team to agree on what error actually means.


