A clean Sentry inbox is a load-bearing developer tool. The day yours starts showing 4,000 events for things that aren’t bugs, you stop opening it. The day after that, you miss the real bug.
We upgraded HoneyChat — Telegram-native AI companion, ~300 DAU — from Sentry SDK 1.x to 2.x and that’s what happened. The SDK gained a useful new behavior: it auto-enables a set of integrations whenever it detects the relevant library imported. Loguru, OpenAI, SQLAlchemy, asyncpg, Redis, httpx — all on by default in 2.x. No more integrations=[...] boilerplate.
This is great when those integrations capture only real errors. They don’t.
HoneyChat backend stack (for context):
- aiogram Telegram polling bot (bot/main.py)
- FastAPI behind nginx (api/main.py, uvicorn --workers 4)
- Celery workers across four queues: llm, images, gifs, voice (workers/tasks.py)
- Celery beat with RedBeat scheduler (hourly greetings, daily reports)
- Dedicated GPU gen_worker (image/GIF generation queue)
- Storage: PostgreSQL 16 via asyncpg, Redis via aioredis, ChromaDB 0.5
- LLM calls go through OpenRouter using the official openai Python SDK (base_url swapped to https://openrouter.ai/api/v1), so openai.* exception types fire on OpenRouter responses too
- Logger: Loguru, routed to stdout + Sentry
That stack imports every library the Sentry 2.x auto-integration looks for. They all turn on.
What started landing in our inbox
Within a day of the upgrade:
- Every logger.error("…") from Loguru became a Sentry issue. Including the lines we’d written as error-level just because they were operationally important and we wanted them coloured red in the terminal. Not bugs.
- Every openai.RateLimitError and openai.APIConnectionError became an issue. These are part of normal life when you route to LLMs via OpenRouter; we handle them with tenacity retries. Not bugs.
- Every transient asyncpg/SQLAlchemy pool race during deploy became an issue. We restart bot, api, celery_worker, nginx back-to-back during a full release; pool reconnects produce a brief flurry of these. Not bugs.
- Every Redis ConnectionResetError at a network blip. Also not a bug.
Real bugs were drowning. Issue counts went from ~5/day to 4,000+.
What’s actually happening
Sentry SDK 2.x scans sys.modules at init and turns on any integration whose target library is already imported. The relevant docs page lists them. Three matter most for a typical Python service:
- LoguruIntegration: captures Loguru records at ERROR and above.
- OpenAIIntegration: captures all openai.OpenAIError raises (which, for us, includes everything OpenRouter ever returns through the OpenAI SDK).
- SqlalchemyIntegration: captures slow queries, connection errors, and a few other states.
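To get a rough picture of which of these integrations are likely to switch on in a given process, you can check which target libraries are already imported. This is a minimal sketch using only the standard library. The WATCHED tuple is an assumed list taken from the libraries named in this post, not the SDK’s own registry, and imported_targets is a hypothetical helper name.

```python
import sys

# Assumed list: top-level module names of the libraries mentioned in this post.
# The SDK's real registry of auto-enabling integrations is longer.
WATCHED = ("loguru", "openai", "sqlalchemy", "asyncpg", "redis", "httpx")


def imported_targets(watched=WATCHED, modules=None):
    """Return the watched library names that are already imported."""
    modules = sys.modules if modules is None else modules
    return [name for name in watched if name in modules]


if __name__ == "__main__":
    # Run this after your application's imports, just before sentry_sdk.init().
    print("Likely auto-enabled:", imported_targets())
```

It only approximates what the SDK decides, but it is a cheap first check before and after a major-version bump.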
You can disable individual integrations:
    import sentry_sdk
    from sentry_sdk.integrations.loguru import LoguruIntegration
    from sentry_sdk.integrations.openai import OpenAIIntegration

    sentry_sdk.init(
        dsn=settings.SENTRY_DSN,
        disabled_integrations=[
            LoguruIntegration,
            OpenAIIntegration,
        ],
    )

…but that’s a blunt instrument. We do want LLM errors reported when they’re real. We want Loguru-routed errors reported when the line that produced them is actually a bug.
The fix is a before_send filter.
The filter (core/sentry_filters.py)
    import sentry_sdk
    from sentry_sdk.types import Event, Hint

    # Exceptions we expect to see at low rates as part of normal operation.
    EXPECTED_EXCEPTIONS = (
        "openai.RateLimitError",
        "openai.APIConnectionError",
        "openai.APITimeoutError",
        "openai.InternalServerError",
        "redis.exceptions.ConnectionError",
        "redis.exceptions.TimeoutError",
        "asyncpg.exceptions.ConnectionDoesNotExistError",
        "asyncpg.exceptions.InterfaceError",
        "sqlalchemy.exc.OperationalError",
    )

    # Loguru loggers/modules where ERROR-level lines are operational, not bugs.
    OPERATIONAL_LOGGERS = (
        "core.llm",            # fallback chain, content_filter rescue, retries
        "core.image_gen",      # GPU → API provider switchover
        "core.voice",          # Inworld TTS → gTTS fallback
        "workers.gen_worker",  # task-level fallback
    )

    def before_send(event: Event, hint: Hint) -> Event | None:
        # 1) Drop expected transient exceptions (exact match on module.ClassName).
        exc_info = hint.get("exc_info")
        if exc_info:
            exc_type = exc_info[0]
            exc_path = f"{exc_type.__module__}.{exc_type.__name__}"
            if exc_path in EXPECTED_EXCEPTIONS:
                return None

        # 2) Drop ERROR- and WARNING-level Loguru records from operational modules.
        logger_name = event.get("logger") or ""
        if logger_name in OPERATIONAL_LOGGERS:
            level = event.get("level")
            if level in ("error", "warning"):
                return None

        return event

    sentry_sdk.init(
        dsn=settings.SENTRY_DSN,
        before_send=before_send,
        # Sample tracing low; we only want errors here.
        traces_sample_rate=0.0,
        profiles_sample_rate=0.0,
    )

The filter is twenty lines. The two tuples are the actual contract: these exceptions and these loggers are noise. After deploying this, our Sentry inbox went from 4,000+ to ~30 events/day. The 30 included two real bugs we’d been missing.
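Because the filter decides what never reaches the inbox, it is worth exercising without a live Sentry client. The sketch below is a trimmed copy of the same logic working on plain dicts, with one entry per tuple. The fake_exc_info helper is hypothetical and builds stand-in exception classes so that neither redis nor openai needs to be installed.

```python
from typing import Optional

EXPECTED_EXCEPTIONS = ("redis.exceptions.ConnectionError",)
OPERATIONAL_LOGGERS = ("core.llm",)


def before_send(event: dict, hint: dict) -> Optional[dict]:
    """Trimmed copy of the filter logic, using plain dicts for events."""
    exc_info = hint.get("exc_info")
    if exc_info:
        exc_type = exc_info[0]
        if f"{exc_type.__module__}.{exc_type.__name__}" in EXPECTED_EXCEPTIONS:
            return None
    if (event.get("logger") or "") in OPERATIONAL_LOGGERS:
        if event.get("level") in ("error", "warning"):
            return None
    return event


def fake_exc_info(module: str, name: str):
    """Hypothetical helper: an exc_info-like tuple for a stand-in exception class."""
    exc_type = type(name, (Exception,), {"__module__": module})
    return (exc_type, exc_type("boom"), None)


# Expected transient exception: dropped.
dropped = before_send({}, {"exc_info": fake_exc_info("redis.exceptions", "ConnectionError")})

# Same class name from a different module: kept, because the full path differs.
kept = before_send({"level": "error"}, {"exc_info": fake_exc_info("myapp.db", "ConnectionError")})

print(dropped is None, kept is not None)
```

A handful of cases like these in the test suite makes it harder for someone to widen a tuple and silently hide a real bug.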
The log-level discipline that goes with it
A filter is half the answer. The other half is fixing the log levels at the source. Our team now follows three rules:
- logger.error(...) is only for bugs: a real malfunction the user shouldn’t have experienced. These belong in Sentry by default.
- logger.warning(...) is for known operational events: fallback fired, retry scheduled, rate limit hit. These go to log files for trend analysis, not to Sentry.
- logger.info(...) is for normal-path traces: fallback chain step transitions, successful retries, model switch confirmations.
A normal-path “Gemini returned content_filter, falling back to Grok 4.20” is info, not error. The terminal might lose some red, but Sentry stops crying wolf.
When we audited our codebase against this, we found roughly 40 logger.error lines in core/llm.py, core/image_gen.py and workers/gen_worker.py that should have been warning or info. Fixing them at source means the before_send filter doesn’t need to grow indefinitely.
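An audit like this can be partly mechanised. The following is a minimal sketch, not the script we used: it walks a file’s syntax tree with the standard ast module and lists every logger.error(...) call for a human to review. The function name find_error_calls is hypothetical, and it assumes the logger is bound to a variable called logger.

```python
import ast


def find_error_calls(source: str, logger_name: str = "logger"):
    """Return (line, first-argument text) for each `<logger_name>.error(...)` call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "error"
            and isinstance(func.value, ast.Name)
            and func.value.id == logger_name
        ):
            # ast.unparse needs Python 3.9+.
            first_arg = ast.unparse(node.args[0]) if node.args else ""
            hits.append((node.lineno, first_arg))
    return sorted(hits)


SAMPLE = (
    'logger.info("request ok")\n'
    'logger.error("content_filter hit, falling back")\n'
    'other.error("not our logger")\n'
)

for line, message in find_error_calls(SAMPLE):
    print(f"line {line}: {message}")
```

Pointing it at each module’s source text gives a review list. Deciding whether a given line is a bug, a warning or an info still takes a person.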
What we didn’t do
We considered ignore_errors=[...] on sentry_sdk.init instead of a before_send. The problem is that ignore_errors only matches by exception type name, not module path. ConnectionError is ambiguous (Redis vs httpx vs asyncpg — all have one). The fully-qualified path check in before_send is more precise.
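The ambiguity of bare type names is easy to reproduce. In this small illustration, LibConnectionError is a hypothetical stand-in for a third-party class that shares its name with the builtin, and somelib.exceptions is an invented module path.

```python
def short_name(exc_type: type) -> str:
    """Bare class name, as a name-only matcher would see it."""
    return exc_type.__name__


def full_path(exc_type: type) -> str:
    """Fully-qualified 'module.ClassName', as the before_send check builds it."""
    return f"{exc_type.__module__}.{exc_type.__name__}"


# Hypothetical third-party exception named like the builtin ConnectionError.
LibConnectionError = type(
    "ConnectionError", (Exception,), {"__module__": "somelib.exceptions"}
)

# The bare names collide...
print(short_name(ConnectionError), short_name(LibConnectionError))
# ...while the fully-qualified paths stay distinct.
print(full_path(ConnectionError), full_path(LibConnectionError))
```

Matching on the second form lets one library’s ConnectionError be treated as noise while another’s still reaches the inbox.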
We also considered turning the integrations off entirely. The risk is losing visibility into real LLM and DB errors — when there’s a real bug in the LLM path, the OpenAIIntegration’s stack-trace enrichment is genuinely useful. Keeping the integrations on and filtering precisely was the better trade.
Lessons
- An SDK upgrade can change capture surface without changing your code. Read the changelog before bumping the major.
- Auto-enabled integrations are a feature and a tax. Audit which ones are on after upgrade.
- before_send is the right hook for noise reduction. It runs late enough to know the full event, early enough to drop cheaply.
- Log levels are a contract, not a style preference. If error doesn’t mean “Sentry-worthy”, your Sentry inbox is broken.
- A small filter beats turning integrations off. You keep the enrichment, you skip the noise.
The hardest part of this work isn’t the filter — it’s getting the team to agree on what error actually means.