The TL;DR
If you run ChromaDB 0.5.x with more than a few hundred collections, set these two env vars before anything else:
    CHROMA_SEGMENT_CACHE_POLICY=LRU
    CHROMA_MEMORY_LIMIT_BYTES=10737418240  # 10 GiB

Without them, ChromaDB 0.5.x has an unresolved memory leak in the segment cache. Upstream issues #3336 and #5843 are still open. We discovered this the slow way.
HoneyChat at a glance (for context): Telegram-native AI companion bot, ~300 DAU, 17 languages. Stack: aiogram bot + FastAPI (uvicorn, 4 workers) + Celery workers (queues llm / images / gifs / voice) + Celery beat (RedBeat) + Next.js 15 web + Astro blog + React/Vite Mini App. Storage: PostgreSQL 16 + Redis + ChromaDB 0.5.x + Storj S3. Host: 32 GB / 16-core Xeon, single box.
The shape of the leak
We run 2,233 ChromaDB collections in production — one per (character_id, session_id) pair, so each conversation gets isolated semantic memory and scene context never bleeds between sessions. Mean collection size: 4.9 documents (small per-collection, large in aggregate).
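To make the one-collection-per-pair layout concrete, here is a minimal sketch of deriving a stable collection name from a (character_id, session_id) pair. The `collection_name` helper, the `mem-` prefix and the hashing scheme are all hypothetical; the post does not say how HoneyChat names its collections.

```python
import hashlib


def collection_name(character_id: int, session_id: str) -> str:
    """Derive a stable collection name for one (character, session) pair.

    Hypothetical scheme: hashing keeps the name short and limited to
    lowercase letters, digits and a single hyphen.
    """
    key = f"{character_id}:{session_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:32]
    return f"mem-{digest}"  # always 36 characters long


if __name__ == "__main__":
    print(collection_name(42, "session-abc"))
```

A name built this way would then be passed to something like `client.get_or_create_collection(name=...)` in the chromadb Python client, so every conversation lands in its own collection.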
On 0.4 this ran fine for months. We upgraded to 0.5 for some new features, and within a week the chromadb container was OOM-killing nightly. The pattern was unmistakable: every time a fresh collection got queried, RSS bumped a few MiB and never came back down. With ~10K collection touches a day across that fleet of 2,233, the container budget filled in about three days. Restart, repeat.
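A quick way to tell steady growth from ordinary noise is to fit a line through a few RSS readings and see when it crosses the container limit. The sketch below is plain least squares; `days_until_limit` is a hypothetical helper and the sample numbers are invented for illustration, not our production measurements.

```python
GIB = 1024 ** 3


def days_until_limit(samples, limit_bytes):
    """Estimate days until RSS reaches limit_bytes.

    samples: list of (day, rss_bytes) pairs, oldest first.
    Returns None when the fitted trend is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    # Least-squares slope in bytes per day.
    slope = sum((t - mean_t) * (r - mean_r) for t, r in samples) / var_t
    if slope <= 0:
        return None
    _, last_rss = samples[-1]
    return max(0.0, (limit_bytes - last_rss) / slope)


if __name__ == "__main__":
    # Invented readings: RSS growing by 4 GiB per day.
    readings = [(0, 1 * GIB), (1, 5 * GIB), (2, 9 * GIB)]
    print(days_until_limit(readings, 13 * GIB))
```

If the estimate keeps shrinking after every restart, the growth is structural rather than a one-off spike.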
What we tried first (and what didn’t work)
- Restarting the container. Buys a day, doesn’t fix the cause.
- Upgrading ChromaDB. The underlying behavior hasn’t changed in the 0.5.x line.
- Increasing the container memory limit. Just delays the OOM.
- Sharding collections further. We already split per (character, session); narrower sharding would have worsened the cache, not helped it.
- Blaming the embedding model. Profiling pointed elsewhere.
Profiling pointed at the segment cache: ChromaDB caches per-collection segment metadata, and on 0.5 the cache is unbounded by default. The “fix” of “let’s just give it more RAM” never converges if the cache only grows.
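To illustrate the difference between an unbounded cache and one with LRU eviction against a byte budget, here is a toy sketch. It is not ChromaDB's implementation; `ByteBoundedLRU` and its methods are hypothetical names, and sizes are supplied by the caller rather than measured.

```python
from collections import OrderedDict


class ByteBoundedLRU:
    """Toy cache that evicts least-recently-used entries over a byte budget."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # key -> (value, size_in_bytes)

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key][0]

    def put(self, key, value, size):
        if key in self._entries:
            # Replacing an entry: release its old size first.
            self.used_bytes -= self._entries.pop(key)[1]
        self._entries[key] = (value, size)
        self.used_bytes += size
        # Evict from the least-recently-used end until under budget.
        # The newest entry is kept even if it alone exceeds the limit.
        while self.used_bytes > self.limit_bytes and len(self._entries) > 1:
            _, (_, old_size) = self._entries.popitem(last=False)
            self.used_bytes -= old_size


if __name__ == "__main__":
    cache = ByteBoundedLRU(limit_bytes=10)
    cache.put("a", "segment-a", 4)
    cache.put("b", "segment-b", 4)
    cache.get("a")                  # "a" is now more recent than "b"
    cache.put("c", "segment-c", 4)  # over budget, so "b" is evicted
    print(cache.get("b"), cache.used_bytes)
```

Drop the `while` loop and the same class becomes the unbounded case: `used_bytes` only ever goes up, which is the behaviour we were seeing.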
The fix
The env vars above tell ChromaDB to use an LRU eviction policy on the segment cache, capped at a memory limit you set. Once we set them and bounced the container, RSS stabilised in a 6-8 GiB band and has stayed there for months.
    services:
      chromadb:
        image: chromadb/chroma:0.5.18
        environment:
          CHROMA_SEGMENT_CACHE_POLICY: "LRU"
          CHROMA_MEMORY_LIMIT_BYTES: "10737418240"  # 10 GiB
        deploy:
          resources:
            limits:
              memory: 12G

CHROMA_SEGMENT_CACHE_POLICY=LRU switches the cache from unbounded to least-recently-used eviction. CHROMA_MEMORY_LIMIT_BYTES is the budget LRU operates against: 10 GiB out of 32 GB host RAM, leaving room for Postgres, Redis, FastAPI, four Celery workers, nginx, ChromaDB itself, and the OS.
Pick a CHROMA_MEMORY_LIMIT_BYTES that’s well under your container’s hard limit — the policy needs headroom to actually evict before the kernel kills you.
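The arithmetic behind the value is simple enough to script. This sketch assumes the container limit is expressed in GiB (Docker treats `12G` as binary units) and that a fixed headroom is subtracted; the `cache_limit_bytes` helper and the 2 GiB default are illustrative, chosen only because they reproduce the 12G / 10 GiB pairing used here.

```python
GIB = 1024 ** 3


def cache_limit_bytes(container_limit_gib, headroom_gib=2.0):
    """Return a cache budget in bytes that sits below the container limit."""
    if headroom_gib <= 0 or headroom_gib >= container_limit_gib:
        raise ValueError("headroom must be positive and below the container limit")
    return int((container_limit_gib - headroom_gib) * GIB)


if __name__ == "__main__":
    # 12 GiB container limit minus 2 GiB headroom.
    print(cache_limit_bytes(12))
```

How much headroom is enough depends on what else the process allocates outside the cache, so treat the default as a starting point to tune, not a rule.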
The catch (don’t forget this one)
These env vars are only applied at container creation. docker compose restart chromadb is not enough — you need:
    docker compose up -d --force-recreate --no-deps chromadb

We learned this the second time we changed limits while debugging, watching RSS climb again and wondering why the fix had stopped working. It hadn’t; the new env never got picked up. If you change the limits, always recreate, not restart.
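One way to catch a stale container is to compare the environment Docker actually recorded for it against what you expect. The sketch below parses the JSON list printed by `docker inspect --format '{{json .Config.Env}}' <container>`; the `missing_env` helper and the expected values are assumptions taken from the compose snippet above, and the container name is whatever your setup uses.

```python
import json

# Expected values, copied from the compose environment block.
EXPECTED = {
    "CHROMA_SEGMENT_CACHE_POLICY": "LRU",
    "CHROMA_MEMORY_LIMIT_BYTES": "10737418240",
}


def missing_env(config_env_json, expected=EXPECTED):
    """Return {name: (expected, actual)} for every variable that does not match.

    config_env_json: JSON list of "NAME=value" strings, as printed by
    docker inspect for .Config.Env.
    """
    env = dict(
        item.split("=", 1)
        for item in json.loads(config_env_json)
        if "=" in item
    )
    return {
        name: (value, env.get(name))
        for name, value in expected.items()
        if env.get(name) != value
    }


if __name__ == "__main__":
    # Example of a container created before the memory limit was added.
    stale = '["PATH=/usr/bin", "CHROMA_SEGMENT_CACHE_POLICY=LRU"]'
    print(missing_env(stale))
```

An empty result means the running container was created with the values you intended; anything else means it needs a recreate.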
Why this isn’t on the docs landing page
Most ChromaDB benchmarks and getting-started guides assume one big collection — the documented happy path. If you’re per-user or per-session partitioning (multi-tenant SaaS, per-conversation memory, per-document RAG silos), you hit cache-and-eviction behaviour the docs don’t warn about. The issues are real and open in the repo; the docs just haven’t caught up.
This isn’t a knock on the team — 0.5 was a big jump and they’re shipping fast. It’s just a heads-up that if your workload is “many small collections,” your config has to be different from the tutorial.
Lessons
- “It’s a leak” is usually “it’s a cache without an eviction policy.” Read your dependency’s cache config before chasing valgrind ghosts.
- Many-small-collections is not the documented happy path. Per-user/per-session partitioning needs a config nobody’s tutorial mentions.
- Check open issues before assuming your config is wrong. #3336 and #5843 are community-known, not docs-known.
- Set both env vars together. Without CHROMA_MEMORY_LIMIT_BYTES, the LRU policy has nothing to evict against and effectively no-ops.
- Recreate, don’t restart, when changing startup env. Standard Docker gotcha, doubly painful when you’re debugging memory.
If you’re on Chroma 0.5+ with many collections and seeing slow RSS creep — that’s almost certainly it. Three lines of YAML, one container recreate, done.