snitchbot · notifications · telemetry for python

Not Sentry. snitchbot.
Crashes, load, anomalies
delivered to Telegram.

Drop-in telemetry for any Python service. Exceptions, slow calls, watchdog stalls, RSS / CPU / FD anomalies, and your own notify() calls — streamed to a Telegram chat you already have open. No DSN, no dashboards to host, no blocking calls in your hot path.

$ uv add snitchbot
30-second setup ->
in your process
<10MB
The client — thin, synchronous-safe, fire-and-forget. One dependency (msgpack). Sits inside your Python service, adds no blocking calls.
one per service, isolated
~43MB
An async sidecar spawned alongside your app. Handles Telegram, dedup, rate-limit, anomaly detection — in its own process, so a bug in observability never crashes production.

Every signal,
with the code that triggers it.

Snippets pulled from examples/ — real working scripts, not pseudo-code. Pick a case to see the setup and the verbatim alerts it produces.

orders_api.py
import snitchbot

snitchbot.init("orders-api")

# Unhandled exceptions are captured automatically,
# including stack, thread, and origin.

async def list_orders(user_id: int) -> list:
    # svc is your own service layer; the failure below originates there
    return await svc.fetch_all(user_id)

# Somewhere down the stack:
#   raise DatabaseConnectionError("refused to ...")
# -> snitchbot captures and sends the alert below.
watch_slow_demo.py
import asyncio
import time
import snitchbot


@snitchbot.watch_slow(threshold_ms=100)
async def fetch_user_profile(user_id: int) -> dict:
    await asyncio.sleep(0.25)  # 250 ms > threshold
    return {"name": "Alice"}


@snitchbot.watch_slow(threshold_ms=500)
def generate_report() -> str:
    time.sleep(0.6)  # sync, also captured
    return "report-data"


async def main():
    snitchbot.init("slow-demo")
    await fetch_user_profile(42)
    generate_report()


if __name__ == "__main__":
    asyncio.run(main())
worker.py
import snitchbot
from snitchbot import AnomalyConfig, WatchdogConfig

snitchbot.init("watchdog-demo")

# Zero-config: watchdog is on, threshold 500 ms,
# auto-escalates to error at 2 s, critical at 5 s.

# Or, replacing the call above, full config with custom thresholds:
snitchbot.init(
    "watchdog-demo",
    anomaly=AnomalyConfig(
        watchdog=WatchdogConfig(
            threshold_ms=500,            # 🟠 warning
            error_threshold_ms=2000,      # 🔴 error
            critical_threshold_ms=5000,   # 🟣 critical
            escalation_window="1m",
            cooldown_sec=5,
        ),
    ),
)
vitals_config.py
import snitchbot
from snitchbot import (
    AnomalyConfig,
    RssAnomalyConfig,
    CpuAnomalyConfig,
    FdAnomalyConfig,
    ThreadAnomalyConfig,
)

snitchbot.init(
    "anomaly-demo",
    sample_interval_sec=5,
    anomaly=AnomalyConfig(
        rss=RssAnomalyConfig(
            duration="1m", baseline_duration="30m",
            max_mb=450,          # 🔴 ceiling
            spike_ratio=1.5,      # 🟠 +50% vs baseline
            min_spike_mb=50,      # and ≥ 50 MB absolute
        ),
        cpu=CpuAnomalyConfig(
            duration="2m", baseline_duration="20m",
            max_percent=90,       # 🔴 ceiling
            spike_ratio=2.5,       # 🟠 spike
            min_spike_delta=30,    # ≥ 30 pp
        ),
        fds=FdAnomalyConfig(
            max_fds=800,          # 🔴 ulimit guard
            spike_ratio=1.5,       # 🔴 fd leak
            drop_ratio=0.5,        # 🟠 pool collapse
        ),
        threads=ThreadAnomalyConfig(
            max_threads=100,
            spike_ratio=1.5,
        ),
    ),
)
app.py
import snitchbot

snitchbot.init("orders-api")

# ▶ lifecycle("startup", reason="init")  — sent immediately

# ...your service does its thing...

# On any of these paths, a shutdown event is emitted:
#  · Clean exit      -> reason="clean_exit", exit_code=0
#  · SIGTERM / SIGINT -> reason="sigterm" / "sigint"
#  · Uncaught crash   -> reason="crash" (+ traceback)
#  · Thread crash     -> reason="thread_crash"

# Nothing to call, nothing to decorate.
checkout.py
import snitchbot

snitchbot.init("notify-demo")

# Warning with extras — renders as a meta-table
snitchbot.notify(
    "Starting checkout process",
    severity="warning",
    extras={"cart_size": 3, "user": "Alice"},
)

# Error with live traceback
try:
    _ = 1 / 0
except ZeroDivisionError:
    snitchbot.notify(
        "Division failed in payment calculator",
        severity="error",
        exc_info=True,
    )
handler.py
import asyncio
import snitchbot


@snitchbot.watch_slow(threshold_ms=100)
async def call_payment_api(amount: float) -> str:
    await asyncio.sleep(0.2)
    return "txn-12345"


async def handle_request(request_id: str, user_id: int):
    with snitchbot.request_context(
        trace_id=request_id,
        user_id=user_id,
        action="checkout",
    ):
        snitchbot.notify(
            "User started checkout",
            extras={"cart_size": 3},
        )
        await call_payment_api(99.99)  # inherits ctx
app.py
import logging
import snitchbot

snitchbot.init("log-demo")
snitchbot.setup_logging()  # WARNING+ -> Telegram

# Or, for structlog users:
# processor = snitchbot.setup_structlog()
# structlog.configure(processors=[..., processor])

logger = logging.getLogger("myapp")

# Extras become a meta-table in the alert
logger.warning(
    "Cache miss rate too high",
    extra={"miss_pct": 42},
)

# exc_info=True attaches the traceback
try:
    _ = 1 / 0
except ZeroDivisionError:
    logger.error("Calculation failed", exc_info=True)

# Inside request_context — trace_id attached automatically
with snitchbot.request_context(trace_id="req-abc-123"):
    logger.warning("Slow DB query in checkout")
main.py
import snitchbot

snitchbot.init("web-demo")
snitchbot.setup_logging()

# ── FastAPI ───────────────────────────────────
from fastapi import FastAPI
from snitchbot.integrations.fastapi import install

app = FastAPI()
install(app)

# ── Flask ─────────────────────────────────────
# from flask import Flask
# from snitchbot.integrations.flask import install
# app = Flask(__name__); install(app)

# ── Litestar ──────────────────────────────────
# from litestar import Litestar
# from snitchbot.integrations.litestar import install
# app = Litestar(route_handlers=[...]); install(app)


@app.post("/checkout")
async def checkout(cart_value: int = 100):
    snitchbot.notify("Large checkout",
                     extras={"cart_value": cart_value})
    return {"status": "processing"}


@app.get("/search")
async def search(query: str):
    raise ValueError("Unknown search backend")
snitchbot · orders-api
bot · private
Apr 17
🔴 crash · orders-api · a1b2c3 × 2
DatabaseConnectionError: connection refused to 10.0.0.5:5432
Details
first 14:18:42 UTC (24m ago) last 14:42:10 UTC (just now) pid 101 thread MainThread origin sys_excepthook
Stack (top 3 user frames)
app/db/pool.py:47 in acquire()
  conn = await self._pool.get()
app/services/orders.py:88 in fetch_all()
  return await db.fetch(q)
app/routes/orders.py:12 in list_orders()
  return await svc.fetch_all()
📋 full trace · 🔕 mute 1h
14:42
Message
/status /last /mute
snitchbot · slow-demo
bot · private
Apr 17
🟠 slow call · slow-demo · ee48d4
__main__.fetch_user_profile took 250 ms (threshold 100 ms)
Details
time 12:57:18 UTC pid 1738 is_async true location watch_slow_demo.py:8
12:57
🟠 slow call · slow-demo · f9c2a1
__main__.generate_report took 612 ms (threshold 500 ms)
Details
time 12:57:20 UTC pid 1738 is_async false location watch_slow_demo.py:14
12:57
Message
/status /last /mute
snitchbot · watchdog-demo
bot · private
Apr 17
🟠 watchdog · watchdog-demo · 7c6497
Event loop blocked for 588 ms (threshold 500 ms)
Details
time 11:25:10 UTC pid 1580 loop main
11:25
🔴 watchdog · watchdog-demo · 7c6497 × 2
Event loop blocked for 2 690 ms (threshold 500 ms)
Details
first 11:25:10 UTC (40s ago) last 11:25:20 UTC (just now) pid 1580 loop main
📋 full trace
11:25
🟣 watchdog · watchdog-demo · 732334
Event loop blocked for 5 699 ms (threshold 500 ms)
Details
time 11:25:24 UTC pid 1580 loop main
Stuck tasks (2)
Innocent-Worker · background_task
  worker.py:55 in background_task()
Task-1 · main
  worker.py:97 in main()
📋 full trace
11:25
Message
/status /last /mute
snitchbot · anomaly-demo
bot · private
Apr 17
🟠 anomaly · anomaly-demo · a7af9c × 2
RSS spike: 183 MB (baseline 70 MB, +160%)
Details
time 11:17:40 UTC pid 1550 type rss_spike window 1m baseline 70 MB current 183 MB
11:17
🔴 anomaly · anomaly-demo · c82b14
CPU ceiling: 94% (limit 90%)
Details
time 11:21:02 UTC pid 1550 type cpu_ceiling window 2m baseline 18% current 94%
11:21
🔴 anomaly · anomaly-demo · 91e7da
FD leak: 40 -> 820 (+780)
Details
time 11:24:55 UTC pid 1550 type fds_spike window 5m baseline 40 current 820
11:24
🟠 anomaly · anomaly-demo · 44c2fb
Thread spike: 8 -> 45 (+462%)
Details
time 11:28:11 UTC pid 1550 type threads_spike window 1m baseline 8 current 45
11:28
Message
/status /last /mute
snitchbot · orders-api
bot · private
Apr 17
orders-api started
━━━━━━━━━━━━━━━━━━
pid 101 time 10:00:14 UTC
10:00
orders-api stopped
━━━━━━━━━━━━━━━━━━
pid 101 reason clean_exit exit_code 0 time 10:42:55 UTC
10:42
orders-api (worker) started
━━━━━━━━━━━━━━━━━━
pid 198 time 11:01:02 UTC
11:01
orders-api (worker) stopped
━━━━━━━━━━━━━━━━━━
pid 198 reason sigterm exit_code 0 time 11:30:41 UTC
11:30
orders-api crashed
━━━━━━━━━━━━━━━━━━
pid 254 reason crash time 13:02:18 UTC
13:02
Message
/status /last /mute
snitchbot · notify-demo
bot · private
Apr 17
🟠 notify · notify-demo · f966e3
Starting checkout process
Details
time 12:42:47 UTC pid 1718 caller checkout.py:6 in main()
Extras
cart_size 3 user Alice
12:42
🔴 notify · notify-demo · 2eec9c
Division failed in payment calculator
Details
time 12:52:35 UTC pid 1732 caller checkout.py:14 in main()
Exception: ZeroDivisionError: division by zero
Traceback (most recent call last):
  File "checkout.py", line 13, in main
    _ = 1 / 0
ZeroDivisionError: division by zero
12:52
Message
/status /last /mute
snitchbot · context-demo
bot · private
Apr 17
🟠 notify · context-demo · 156afe
User started checkout
Details
time 12:53:41 UTC pid 1733 caller handler.py:15 in handle_request()
Extras
cart_size 3
Context
trace_id req-abc-123 user_id 42 action checkout
12:53
🟠 slow call · context-demo · ee48d4
__main__.call_payment_api took 201 ms (threshold 100 ms)
Details
time 12:57:18 UTC pid 1733 is_async true location handler.py:6
Context
trace_id req-abc-123 user_id 42 action checkout
12:57
Message
/status /last /mute
snitchbot · log-demo
bot · private
Apr 17
🟠 log.warning · log-demo · c1b4e8
Cache miss rate too high
Details
time 14:02:11 UTC pid 2104 caller app.py:15 in <module>()
Extras
miss_pct 42 logger myapp level WARNING
14:02
🔴 log.error · log-demo · d9a3f1
Calculation failed
Exception: ZeroDivisionError: division by zero
Traceback (most recent call last):
  File "app.py", line 23, in <module>
    _ = 1 / 0
ZeroDivisionError: division by zero
14:02
🟠 log.warning · log-demo · 7b2c8d
Slow DB query in checkout
Context
trace_id req-abc-123
14:02
Message
/status /last /mute
snitchbot · web-demo
bot · private
Apr 17
🟠 notify · web-demo · 5ec8a2
Large checkout
Details
time 15:12:04 UTC pid 2211 caller main.py:22 in checkout()
Extras
cart_value 500
Context
http_method POST path /checkout client_ip 10.0.0.14 trace_id a3f7-b2c9
15:12
🔴 crash · web-demo · bb9d31
ValueError: Unknown search backend
Details
time 15:13:22 UTC pid 2211 origin fastapi_middleware
Context
http_method GET path /search client_ip 10.0.0.14 trace_id d2e8-c4a1 query {"query":"test"}
Stack (top 2 user frames)
main.py:28 in search()
  raise ValueError("Unknown search backend")
📋 full trace · 🔕 mute 1h
15:13
Message
/status /last /mute
Uncaught exceptions — captured automatically.
Zero user code beyond snitchbot.init(). Hooks into sys.excepthook, threading.excepthook, and the asyncio exception handler. Works in main thread, worker threads, async tasks. Fork-safe.
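Hooking uncaught exceptions at the interpreter level looks roughly like this — a toy sketch of the sys.excepthook mechanism only, not snitchbot's actual client code (which also covers threads and asyncio):

```python
import sys

captured: list[str] = []
previous_hook = sys.excepthook


def snitch_hook(exc_type, exc, tb):
    # Record the crash (snitchbot would serialize and send it to the sidecar)...
    captured.append(f"{exc_type.__name__}: {exc}")
    # ...then chain to the previous hook so default behaviour survives.
    previous_hook(exc_type, exc, tb)


sys.excepthook = snitch_hook

try:
    1 / 0
except ZeroDivisionError as exc:
    # Simulate an uncaught exception reaching the hook.
    sys.excepthook(type(exc), exc, exc.__traceback__)

print(captured)
```

Chaining to the previous hook is the important part: installing a reporter must not swallow the interpreter's own traceback output.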
Functions that crossed the threshold — sync or async.
Decorate with @snitchbot.watch_slow(threshold_ms=...). Fast path untouched — the alert fires only when duration exceeds the threshold. Works for sync functions too.
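A threshold decorator like this is simple to sketch — what follows is an illustrative reimplementation of the idea, not snitchbot's code (which reports to the sidecar instead of a list):

```python
import asyncio
import functools
import time

alerts: list[str] = []


def report(name: str, took_ms: float, threshold_ms: float) -> None:
    # In snitchbot this would be shipped to the sidecar; here we just record it.
    alerts.append(f"{name} took {took_ms:.0f} ms (threshold {threshold_ms:.0f} ms)")


def watch_slow(threshold_ms: float):
    """Alert when a call exceeds threshold_ms; sync and async both supported."""
    def decorator(fn):
        if asyncio.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return await fn(*args, **kwargs)
                finally:
                    took = (time.perf_counter() - start) * 1000
                    if took > threshold_ms:  # fast calls never reach report()
                        report(fn.__qualname__, took, threshold_ms)
            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                took = (time.perf_counter() - start) * 1000
                if took > threshold_ms:
                    report(fn.__qualname__, took, threshold_ms)
        return sync_wrapper
    return decorator


@watch_slow(threshold_ms=50)
def slow_report() -> str:
    time.sleep(0.08)  # 80 ms, over the 50 ms threshold
    return "report-data"


slow_report()
print(alerts)
```

The fast path pays only one clock read on entry and one on exit; nothing else happens unless the threshold is crossed.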
Event loop latency, measured by a configurable watchdog coroutine. Alerts past 500 ms by default.
A lightweight pinger coroutine updates a monotonic timestamp every 100 ms; a separate observer checks the gap. Any stall past threshold_ms is reported with every stuck task's stack. Multi-threshold severity: threshold_ms -> 🟠 warning, error_threshold_ms -> 🔴 error, critical_threshold_ms -> 🟣 critical. All defaults sensible — zero-config works, full-config unlocks the 3-tier escalation.
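The pinger/observer split described above fits in a few stdlib lines — a toy version of the mechanism only (snitchbot's real watchdog adds the three-tier escalation and stack capture):

```python
import asyncio
import threading
import time

state = {"last_ping": time.monotonic()}
stalls: list[float] = []


async def pinger(interval: float = 0.02):
    # Runs on the event loop: if the loop is blocked, this stops ticking.
    while True:
        state["last_ping"] = time.monotonic()
        await asyncio.sleep(interval)


def observer(threshold: float, stop: threading.Event):
    # Runs in its own thread, so a blocked loop can't silence it.
    while not stop.is_set():
        gap = time.monotonic() - state["last_ping"]
        if gap > threshold:
            stalls.append(gap)
        time.sleep(0.02)


async def main():
    stop = threading.Event()
    threading.Thread(target=observer, args=(0.2, stop), daemon=True).start()
    ping = asyncio.create_task(pinger())
    await asyncio.sleep(0.1)  # healthy: pinger ticks, observer sees no gap
    time.sleep(0.5)           # simulate blocking the event loop
    await asyncio.sleep(0.1)  # loop is live again, pinger resumes
    stop.set()
    ping.cancel()


asyncio.run(main())
print(f"worst stall: {max(stalls):.2f}s")
```

The key design point is that the observer lives off the loop: an event loop cannot report its own freeze, but a thread watching a monotonic timestamp can.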
A richly-configurable vitals detector — RSS, CPU, FDs, threads.
Every metric gets three independent modes: ceiling (hard limit), spike (relative growth vs baseline), drop (relative decline). Windows and baselines are time-based ("15s", "1m", "1h"). The sidecar samples psutil every 5 s (tunable via sample_interval_sec). Turn any mode off by passing None — or skip the config entirely for sensible defaults.
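Per the above, each mode switches off independently by passing None — a config fragment sketched from the field names shown in vitals_config.py on this page:

```python
import snitchbot
from snitchbot import AnomalyConfig, CpuAnomalyConfig, RssAnomalyConfig

snitchbot.init(
    "anomaly-demo",
    sample_interval_sec=10,     # sample psutil every 10 s instead of the default 5
    anomaly=AnomalyConfig(
        rss=RssAnomalyConfig(
            max_mb=450,         # keep the hard ceiling...
            spike_ratio=None,   # ...but turn spike detection off
        ),
        cpu=CpuAnomalyConfig(
            max_percent=None,   # no ceiling: only spike detection for CPU
            spike_ratio=2.5,
        ),
    ),
)
```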
Know when a service starts, stops, or dies.
Emitted automatically by snitchbot.init() and registered atexit / signal handlers. You see startup, clean exits, graceful shutdowns (SIGTERM / SIGINT), and crashes — with pid, role (worker / standalone), reason, and exit code. Multiworker-aware: gunicorn / uvicorn workers get their own role suffix.
Send anything — warnings, errors, business events.
Call snitchbot.notify(text, severity, extras, exc_info). Severity drives the icon (🟠/🔴/🟣) and rate-limit bucket. exc_info=True attaches the current traceback.
Attach trace_id + metadata to every alert in scope.
Wrap code in with snitchbot.request_context(trace_id=..., **extras). Everything inside — notify(), @watch_slow, crash reports — inherits the context. Propagates across await, create_task, and nested calls. Frameworks (FastAPI / Flask / Litestar) set this automatically per request.
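Context that survives await and create_task is exactly what Python's contextvars module provides — asyncio snapshots the current context into every new task — and snitchbot presumably builds on the same mechanism. A stdlib-only sketch of that propagation:

```python
import asyncio
import contextvars

trace_id: contextvars.ContextVar[str] = contextvars.ContextVar("trace_id", default="-")
captured: list[str] = []


async def call_payment_api():
    # No arguments threaded through: the value rides in the task's context.
    captured.append(trace_id.get())


async def handle_request(request_id: str):
    token = trace_id.set(request_id)
    try:
        await call_payment_api()                       # plain await inherits it
        await asyncio.create_task(call_payment_api())  # create_task snapshots it
    finally:
        trace_id.reset(token)


asyncio.run(handle_request("req-abc-123"))
print(captured)
```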
stdlib logging and structlog — both bridged to Telegram.
One line — snitchbot.setup_logging() — attaches a handler to Python's logging. WARNING+ records become notifications, keeping level, message, extras, and exc_info. For structlog, call snitchbot.setup_structlog() and add the returned processor to your chain. Inside a request_context, trace_id is attached automatically.
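Under the hood, a bridge like this is just a logging.Handler with a level floor — a toy sketch of the mechanism, forwarding to a list where snitchbot would forward to the sidecar:

```python
import logging

forwarded: list[str] = []


class TelegramHandler(logging.Handler):
    # Toy bridge: snitchbot's handler would serialize the record
    # (level, message, extras, exc_info) and send it to the sidecar.
    def emit(self, record: logging.LogRecord) -> None:
        forwarded.append(f"{record.levelname}: {record.getMessage()}")


log = logging.getLogger("bridge-demo")
log.addHandler(TelegramHandler(level=logging.WARNING))  # WARNING+ only

log.warning("Cache miss rate too high")
log.info("ignored: below WARNING")
print(forwarded)
```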
FastAPI, Flask, and Litestar — one-line integration each.
install(app) from the matching integration module. Middleware attaches per-request context (http_method, path, client_ip, trace_id). 5xx errors auto-captured with safe headers and query params. Response gets an X-Snitchbot-Trace-Id header. Logging bridge picks up the same context inside request scope.

Charts where you look.
Data for when you dig.

The sidecar samples psutil and keeps a rolling buffer. Ask for a chart in any window from 1m to 1d, pin a live dashboard that updates itself, or pull the full history as a CSV and drop it into your notebook.

snitchbot · orders-api
bot · private
Apr 17
/chart all 5m 12:17
CPU (5m) cur=99.2% min=0.0 max=100.2
  100.20  ┼╮ ╭─╮ ╭─╮  ╭╮  ╭╮  ╭╮  ╭
   80.16  ┤│ │ │ │ │ ╭╯│ ╭╯│  ││  │
   60.12  ┤│ │ │ │ │ │ │ │ │ ╭╯│ ╭╯
   40.08  ┤│ │ │ │ │ │ │ │ ╰╮│ ╰╮│
   20.04  ┤│ │ │ │ ╰╮│ ╰╮│  ││  ││
    0.00  ┤╰─╯ ╰─╯  ╰╯  ╰╯  ╰╯  ╰╯
RSS (5m) cur=80.2MB min=43.5 max=83.5
   83.52  ┤
   76.85  ┤ ╭─╮    ╭───╮  ╭───╮   ╭
   70.18  ┤╭╯ │   ╭╯   │  │   ╰╮ ╭╯
   63.51  ┼╯  │   │    │ ╭╯    │ │
   56.84  ┤   │  ╭╯    ╰─╯     │╭╯
   50.17  ┤   │ ╭╯             ╰╯
   43.50  ┤   ╰─╯
FDs (5m) cur=26 min=14 max=34
   34.00  ┤
   30.67  ┤          ╭───────────╮
   27.33  ┤          │           │
   24.00  ┤          │           ╰─
   20.67  ┤          │
   17.33  ┼──────────╯
   14.00  ┤
Threads (5m) cur=12 min=6 max=13
   13.00  ┤
   11.83  ┤╭──╮   ╭──╮    ╭╮     ╭─
   10.67  ┼╯  ╰╮ ╭╯  │   ╭╯╰╮    │
    9.50  ┤    │ │   ╰╮ ╭╯  │   ╭╯
    8.33  ┤    ╰─╯    ╰╮│   ╰╮  │
    7.17  ┤            ╰╯    ╰╮╭╯
    6.00  ┤                   ╰╯
2026-04-17 12:16:34 UTC -> 2026-04-17 12:17:35 UTC
12:17
/export 12:18
CSV orders-api_vitals.csv 12.4 KB · Document
Vitals export: 128 samples
12:18
Message
/status /last /mute
orders-api_vitals.csv
128 samples · 5-second interval · 2026-04-17
CSV export
sampled_at cpu_pct rss_mb fds threads
12:16:34 4.1 43.5 14 6
12:16:39 62.4 71.2 18 11
12:16:44 88.6 76.8 22 12
12:16:49 100.2 83.5 26 13
12:16:54 41.0 78.3 30 11
12:16:59 79.8 82.1 34 10
12:17:04 22.7 77.9 31 9
12:17:09 94.3 80.6 28 12
12:17:14 55.1 79.1 26 11
12:17:19 99.2 80.2 26 12
...............
Drop into pandas, duckdb, or any notebook. CSV · UTF-8
/chart on demand
ASCII charts for any metric, any window.
Metrics: cpu · mem · fds · threads · all.
Windows: 1m · 5m · 15m · 1h · 6h · 1d.
Usage: /chart all 1h
live dashboard pinned · auto-updating
A single pinned message, always current.
Set live_dashboard=True at init(). The sidecar pins one message to the chat and rewrites it every sample_interval_sec with fresh vitals, so it never clutters the history.
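Per the description above, enabling it is one keyword at init — a config fragment sketched from this page's API:

```python
import snitchbot

snitchbot.init(
    "orders-api",
    live_dashboard=True,    # sidecar pins one message and edits it in place
    sample_interval_sec=5,  # each sample rewrites the pinned vitals
)
```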
/export download
Full vitals history, as a CSV.
The sidecar streams the entire rolling buffer as <service>_vitals.csv via Telegram's sendDocument. Columns: sampled_at, cpu_pct, rss_mb, fds, threads.

Three processes,
two wires.

Your service stays thin. A detached sidecar carries the weight — HTTP, dedup, rate-limit, vitals, interactive commands. They talk over an AF_UNIX datagram socket in microseconds, so nothing you write ever waits on Telegram.

snitchbot architecture · three processes connected by two wires.

01 · host process (pid 101) · your Python service
  · your code: fastapi · litestar · flask · cli · worker — decorate with @watch_slow, call notify(), wrap scope in request_context()
  · snitchbot.client: thin · synchronous-safe · fire-and-forget
  · hooks: sys.excepthook · threading.excepthook · asyncio exception handler · signal (SIGTERM/SIGINT) · atexit
  · watchdog: pinger coroutine + observer thread
  · transport: AF_UNIX SOCK_DGRAM, non-blocking
  · footprint: < 10 MB added to your process · 1 dep (msgpack) · no httpx, no psutil

  ⇵ AF_UNIX SOCK_DGRAM · msgpack · ~µs

02 · sidecar process (pid 102) · async · isolated · does the work
  · ingest: recv loop · client registry · handshake
  · vitals: psutil sampler · anomaly detectors
  · pipeline, single event flow stage by stage: dedup (5 min · ×N) -> queue (priority) -> rate-limit (30 tok/s) -> render (HTML) -> scrub (secrets)
  · secret scrubbing sits between render and send — tokens, keys, passwords never reach Telegram
  · interactive: /status /chart /last /export /mute /test
  · live dashboard: a single pinned message, edited in place
  · footprint: ~43 MB RSS, isolated from your process · one per service · crashes here don't crash you

  ⇵ HTTPS · bot api · commands via long-poll

03 · telegram · bot api · your chat: crash, slow-call, and anomaly alerts, plus a pinned live dashboard (cpu 42% · rss 82 MB · fds 26 · threads 12, updated every 5 s) and the /status /last /mute /chart commands.
why a sidecar
Observability should never crash production.
Telegram's HTTPS can hang for seconds during degradation. Anomaly detectors need psutil. Rate-limit state needs memory. None of that belongs in your hot path — so we put it in its own process. If the sidecar dies, your app doesn't.
why AF_UNIX · SOCK_DGRAM
Fastest IPC that can't block.
SOCK_DGRAM = kernel-level datagram. A send() takes microseconds, buffered atomically by the kernel. No TCP handshake, no TLS, no user-space queue. msgpack keeps the payload compact — pydantic, httpx, psutil stay out of your process.
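The wire itself is plain stdlib — a minimal sketch of a non-blocking AF_UNIX datagram hop, using a socketpair in place of the sidecar's bound path and a JSON payload standing in for msgpack:

```python
import socket

# A connected pair of AF_UNIX datagram sockets stands in for the wire;
# the real sidecar binds a filesystem path and the client connects to it.
client, sidecar = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
client.setblocking(False)  # never wait: if the kernel buffer is full, drop

payload = b'{"event": "notify", "text": "payment failed"}'  # msgpack in reality
try:
    client.send(payload)   # one syscall, microseconds, no handshake, no TLS
except BlockingIOError:
    pass                   # sidecar backlogged: losing one alert beats stalling

msg = sidecar.recv(65536)  # datagrams arrive whole, never half a message
print(msg)
```

Dropping on BlockingIOError is the whole contract: the hot path is allowed to lose an alert but never allowed to block on observability.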
why long-polling
Your service doesn't need a public endpoint.
The sidecar calls Telegram's getUpdates — commands come back in the response. Works behind NAT, inside Docker, on a laptop. No webhook, no reverse proxy, no TLS cert. /status /chart /export /mute all travel this way.

00:30
from empty terminal to
your first alert.

No account, no DSN, no dashboards. Three steps: a Telegram bot, a chat id, and one uv add. The rest is already wired.

step :00

Talk to @BotFather.

Send /newbot, pick a name, copy the token. Takes 10 seconds, always has.
B
BotFather 17:02
/newbot
Alright, a new bot. How are we going to call it?
orders-api-alerts
Done! Use this token to access the HTTP API:
7824……:AAH…Dg
Keep your token secure — paste it into .env.
step :10

Get your chat id.

Message @userinfobot — it replies with your numeric id. For groups, add your new bot and use the group id instead.
userinfobot 17:03
/start
@your_handle
Id: 1387261905
Language: en
step :20

Install. Initialise.

Drop the token and chat id into .env, add one line to your Python — done. init() reads env vars and spawns the sidecar.
$ uv add snitchbot
# .env
SNITCHBOT_TOKEN="7824…:AAH…Dg"
SNITCHBOT_CHAT_ID="1387261905"

# app.py
import snitchbot
snitchbot.init("orders-api")
:30
and then —
orders-api started
━━━━━━━━━━━━━━━━━━
pid 101 time 2026-04-19 17:03:42 UTC
17:03
Your service just introduced itself. Everything else is already watching.