🚨 Alerting / Webhooks

Get alerts for:

  • Hanging LLM API calls
  • Slow LLM API calls
  • Failed LLM API calls
  • Budget tracking per key/user
  • Spend reports: weekly and monthly spend per team and tag
  • Failed DB reads/writes
  • Model outage alerting
  • Daily reports:
    • Top 5 slowest LLM deployments
    • Top 5 LLM deployments with the most failed requests

Works across:

  • Slack
  • Discord
  • Microsoft Teams

Quick Start

Set up a Slack channel to receive alerts from the proxy.

Step 1: Add a Slack Webhook URL to env

Get a slack webhook url from

You can also use Discord Webhooks, see here

Set SLACK_WEBHOOK_URL in your proxy env to enable Slack alerts.

export SLACK_WEBHOOK_URL="<>/<>/<>"

Step 2: Set up the proxy

alerting: ["slack"]
alerting_threshold: 300 # sends alerts if requests hang for 5min+ and responses take 5min+
spend_report_frequency: "1d" # [Optional] set as 1d, 2d, 30d, etc. Specifies how often a spend report is sent
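
The two numeric settings above use different units: alerting_threshold is in seconds (300 is 5 minutes), while spend_report_frequency is a day count with a "d" suffix. As a rough illustration of how such a frequency string maps to an interval, here is a minimal sketch. parse_report_frequency is a hypothetical helper written for this page, not a LiteLLM function, and it assumes only the "d" suffix shown in the comment is supported.

```python
import re
from datetime import timedelta


def parse_report_frequency(value: str) -> timedelta:
    """Convert a string such as "1d" or "30d" into a timedelta.

    Hypothetical helper for illustration only; it handles just the
    "d" (days) suffix shown in the config comment.
    """
    match = re.fullmatch(r"(\d+)d", value.strip())
    if match is None:
        raise ValueError(f"unsupported frequency: {value!r}")
    days = int(match.group(1))
    if days == 0:
        raise ValueError("frequency must be at least 1 day")
    return timedelta(days=days)


# alerting_threshold is expressed in seconds: 300 seconds is 5 minutes.
ALERTING_THRESHOLD_SECONDS = 300
ALERTING_THRESHOLD = timedelta(seconds=ALERTING_THRESHOLD_SECONDS)
```

Rejecting unknown suffixes, instead of guessing a unit, surfaces a mistyped value early.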

Start proxy

$ litellm --config /path/to/config.yaml

Step 3: Test it!

curl -X GET '' \
-H 'Authorization: Bearer sk-1234'


Redacting Messages from Alerts

By default, alerts include the messages/input passed to the LLM. To redact them from Slack alerts, add the following to your config:

alerting: ["slack"]
alert_types: ["spend_reports"]

redact_messages_in_exceptions: True

Add Metadata to alerts

Add alerting metadata to proxy calls for debugging.

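import openai

# Point the OpenAI client at the LiteLLM proxy
client = openai.OpenAI(
    api_key="sk-1234",  # proxy key, as used in the curl examples on this page
    base_url="http://localhost:4000",  # proxy address, as used in the curl examples on this page
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="gpt-4",  # model name taken from the config example on this page
    messages=[],
    extra_body={
        "metadata": {
            "alerting_metadata": {"hello": "world"}
        }
    },
)
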
Opting into specific alert types

Set alert_types to opt into specific alert types only. When alert_types is not set, all default alert types are enabled.

👉 See all alert types here

alerting: ["slack"]
alert_types: [

Set specific slack channels per alert type

Use this to route each alert type to its own Slack channel. For example:

llm_exceptions -> go to slack channel #llm-exceptions
spend_reports -> go to slack channel #llm-spend-reports

Set alert_to_webhook_url in your config.yaml

model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/fake
      api_key: fake-key

general_settings:
  master_key: sk-1234
  alerting: ["slack"]
  alerting_threshold: 0.0001 # (Seconds) set an artificially low threshold for testing alerting
  alert_to_webhook_url: {
    "llm_exceptions": "",
    "llm_too_slow": "",
    "llm_requests_hanging": "",
    "budget_alerts": "",
    "db_exceptions": "",
    "daily_reports": "",
    "spend_reports": "",
    "cooldown_deployment": "",
    "new_model_added": "",
    "outage_alerts": "",
  }

litellm_settings:
  success_callback: ["langfuse"]
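
To make the routing idea concrete, here is a minimal sketch of how an alert type could be resolved to a webhook URL. It is an illustration, not LiteLLM's implementation: resolve_webhook_url is a hypothetical helper, the hooks.example.com URLs are placeholders, and the fallback to a default webhook for unmapped or empty entries is an assumption.

```python
from typing import Dict


def resolve_webhook_url(
    alert_type: str,
    alert_to_webhook_url: Dict[str, str],
    default_url: str,
) -> str:
    """Pick the webhook URL for an alert type.

    Assumption: alert types with no entry, or with an empty entry,
    fall back to the default webhook (for example SLACK_WEBHOOK_URL).
    """
    return alert_to_webhook_url.get(alert_type) or default_url


# Placeholder URLs, one per Slack channel.
routes = {
    "llm_exceptions": "https://hooks.example.com/llm-exceptions",
    "spend_reports": "https://hooks.example.com/llm-spend-reports",
}
default = "https://hooks.example.com/default"

for alert_type in ("llm_exceptions", "spend_reports", "db_exceptions"):
    print(alert_type, "->", resolve_webhook_url(alert_type, routes, default))
```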

Test it: send a valid LLM request and expect to see an llm_too_slow alert in its own Slack channel

curl -i http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
  "model": "gpt-4",
  "messages": [
    {"role": "user", "content": "Hello, Claude gm!"}
  ]
}'

Using MS Teams Webhooks

MS Teams provides a Slack-compatible webhook URL that you can use for alerting

Quick Start
  1. Get a webhook url for your Microsoft Teams channel

  2. Add it to your .env

  3. Add it to your litellm config
model_name: "azure-model"
model: "azure/gpt-35-turbo"
api_key: "my-bad-key" # 👈 bad key

alerting: ["slack"]
alerting_threshold: 300 # sends alerts if requests hang for 5min+ and responses take 5min+
  4. Run health check!

Call the proxy /health/services endpoint to test if your alerting connection is correctly set up.

curl --location '' \
--header 'Authorization: Bearer sk-1234'

Using Discord Webhooks

Discord provides a Slack-compatible webhook URL that you can use for alerting

Quick Start
  1. Get a webhook url for your discord channel

  2. Append /slack to your discord webhook - it should look like

  3. Add it to your litellm config
model_name: "azure-model"
model: "azure/gpt-35-turbo"
api_key: "my-bad-key" # 👈 bad key

alerting: ["slack"]
alerting_threshold: 300 # sends alerts if requests hang for 5min+ and responses take 5min+
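
Step 2 above (appending /slack to the Discord webhook) is easy to get wrong when the URL already ends in a slash or already carries the suffix. The following is a small sketch of that transformation. to_slack_compatible is a hypothetical helper, and the URL in the example is a made-up placeholder, not a real webhook.

```python
def to_slack_compatible(discord_webhook_url: str) -> str:
    """Append /slack to a Discord webhook URL unless it is already there."""
    trimmed = discord_webhook_url.rstrip("/")
    if trimmed.endswith("/slack"):
        return trimmed
    return trimmed + "/slack"


# Placeholder webhook URL for illustration only.
example = "https://discord.example/api/webhooks/123/abc"
print(to_slack_compatible(example))
```

The result is the value to export as SLACK_WEBHOOK_URL, since the proxy config above still uses alerting: ["slack"].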


[BETA] Webhooks for Budget Alerts

Note: This is a beta feature, so the spec might change.

Set a webhook to get notified for budget alerts.

  1. Set up config.yaml

Add the URL to your environment. For testing, you can use a link from here

export WEBHOOK_URL=""

Add 'webhook' to config.yaml

alerting: ["webhook"] # 👈 KEY CHANGE
  2. Start proxy
litellm --config /path/to/config.yaml

  3. Test it!
curl -X GET --location '' \
--header 'Authorization: Bearer sk-1234'
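
If you would rather inspect the events locally than through a hosted test link, a tiny receiver can stand in for WEBHOOK_URL. The sketch below uses only the Python standard library. It assumes the proxy delivers each event as a JSON body in an HTTP POST; WebhookHandler, make_server, received_events, and the port are illustrative names and values, not part of LiteLLM.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, List

# Events received so far, kept in memory for inspection.
received_events: List[Dict[str, Any]] = []


class WebhookHandler(BaseHTTPRequestHandler):
    """Accept JSON POSTs and store the decoded payloads."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            event = json.loads(body)
        except json.JSONDecodeError:
            self.send_response(400)  # reject bodies that are not JSON
            self.end_headers()
            return
        received_events.append(event)
        self.send_response(200)
        self.end_headers()

    def log_message(self, format: str, *args: Any) -> None:
        pass  # keep the console quiet


def make_server(port: int = 8000) -> HTTPServer:
    """Create (but do not start) a receiver bound to localhost."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```

Calling make_server().serve_forever() starts the receiver. With the default port assumed here, WEBHOOK_URL would be http://127.0.0.1:8000.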

Expected Response

{
  "spend": 1, # the spend for the 'event_group'
  "max_budget": 0, # the 'max_budget' set for the 'event_group'
  "token": "88dc28d0f030c55ed4ab77ed8faf098196cb1c05df778539800c9f1243fe6b4b",
  "user_id": "default_user_id",
  "team_id": null,
  "user_email": null,
  "key_alias": null,
  "projected_exceeded_date": null,
  "projected_spend": null,
  "event": "budget_crossed", # Literal["budget_crossed", "threshold_crossed", "projected_limit_exceeded"]
  "event_group": "user",
  "event_message": "User Budget: Budget Crossed"
}

API Spec for Webhook Event

  • spend float: The current spend amount for the 'event_group'.

  • max_budget float or null: The maximum allowed budget for the 'event_group'. null if not set.

  • token str: A hashed value of the key, used for authentication or identification purposes.

  • customer_id str or null: The ID of the customer associated with the event (optional).

  • internal_user_id str or null: The ID of the internal user associated with the event (optional).

  • team_id str or null: The ID of the team associated with the event (optional).

  • user_email str or null: The email of the internal user associated with the event (optional).

  • key_alias str or null: An alias for the key associated with the event (optional).

  • projected_exceeded_date str or null: The date when the budget is projected to be exceeded, returned when 'soft_budget' is set for key (optional).

  • projected_spend float or null: The projected spend amount, returned when 'soft_budget' is set for key (optional).

  • event Literal["spend_tracked", "budget_crossed", "threshold_crossed", "projected_limit_exceeded"]: The type of event that triggered the webhook. Possible values are:

    • "spend_tracked": Emitted whenever spend is tracked for a customer id.
    • "budget_crossed": Indicates that the spend has exceeded the max budget.
    • "threshold_crossed": Indicates that spend has crossed a threshold (currently sent when 85% and 95% of budget is reached).
    • "projected_limit_exceeded": For "key" only - Indicates that the projected spend is expected to exceed the soft budget threshold.
  • event_group Literal["customer", "internal_user", "key", "team", "proxy"]: The group associated with the event. Possible values are:

    • "customer": The event is related to a specific customer
    • "internal_user": The event is related to a specific internal user.
    • "key": The event is related to a specific key.
    • "team": The event is related to a team.
    • "proxy": The event is related to a proxy.
  • event_message str: A human-readable description of the event.
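
On the receiving side, the spec above can be turned into a small typed structure. The sketch below covers only a subset of the fields and is not LiteLLM's own model: BudgetEvent, parse_budget_event, budget_used_fraction, and summarize are hypothetical names, and treating a missing or non-positive max_budget as "no usable budget" is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Event names listed in the spec above.
KNOWN_EVENTS = {
    "spend_tracked",
    "budget_crossed",
    "threshold_crossed",
    "projected_limit_exceeded",
}


@dataclass
class BudgetEvent:
    """The subset of spec fields this sketch uses."""

    spend: float
    max_budget: Optional[float]
    event: str
    event_group: str
    event_message: str


def parse_budget_event(payload: Dict[str, Any]) -> BudgetEvent:
    """Check the event name and pull out the fields used below."""
    if payload["event"] not in KNOWN_EVENTS:
        raise ValueError(f"unknown event: {payload['event']!r}")
    raw_budget = payload.get("max_budget")
    return BudgetEvent(
        spend=float(payload["spend"]),
        max_budget=None if raw_budget is None else float(raw_budget),
        event=payload["event"],
        event_group=payload["event_group"],
        event_message=payload["event_message"],
    )


def budget_used_fraction(event: BudgetEvent) -> Optional[float]:
    """Fraction of the budget spent, or None without a positive budget."""
    if event.max_budget is None or event.max_budget <= 0:
        return None
    return event.spend / event.max_budget


def summarize(event: BudgetEvent) -> str:
    """Build a one-line, human-readable summary of the event."""
    fraction = budget_used_fraction(event)
    usage = "n/a" if fraction is None else f"{fraction:.0%}"
    return f"[{event.event_group}] {event.event_message} (spend={event.spend}, used={usage})"
```

Validating the event name up front means a spec change (this is a beta feature) shows up as an explicit error, not a silently ignored alert.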

Region-outage alerting (✨ Enterprise feature)

Set up alerts if a provider region is having an outage.

alerting: ["slack"]
alert_types: ["region_outage_alerts"]

By default this will trigger if multiple models in a region fail 5+ requests in 1 minute. '400' status code errors are not counted (i.e. BadRequestErrors).

Control thresholds with:

alerting: ["slack"]
alert_types: ["region_outage_alerts"]
region_outage_alert_ttl: 60 # time-window in seconds
minor_outage_alert_threshold: 5 # number of errors to trigger a minor alert
major_outage_alert_threshold: 10 # number of errors to trigger a major alert
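
To show how the three settings interact, here is a simplified sliding-window counter. It is a sketch under stated assumptions, not LiteLLM's implementation: RegionOutageTracker is a hypothetical class, it tracks a single region, and it ignores the "multiple models" condition described above. Failures with status code 400 are skipped, matching the note that BadRequestErrors are not counted.

```python
from collections import deque
from typing import Deque, Optional


class RegionOutageTracker:
    """Count recent failures for one region inside a sliding time window."""

    def __init__(self, ttl: float = 60, minor: int = 5, major: int = 10) -> None:
        self.ttl = ttl      # region_outage_alert_ttl, in seconds
        self.minor = minor  # minor_outage_alert_threshold
        self.major = major  # major_outage_alert_threshold
        self._failures: Deque[float] = deque()

    def record_failure(self, now: float, status_code: int) -> Optional[str]:
        """Record a failure at time `now` (seconds) and return the alert level."""
        if status_code != 400:  # 400 errors are not counted
            self._failures.append(now)
        return self.level(now)

    def level(self, now: float) -> Optional[str]:
        """Return "major", "minor", or None for the current window."""
        # Drop failures that have fallen out of the time window.
        while self._failures and now - self._failures[0] > self.ttl:
            self._failures.popleft()
        count = len(self._failures)
        if count >= self.major:
            return "major"
        if count >= self.minor:
            return "minor"
        return None
```

With the defaults, the fifth counted failure inside 60 seconds yields "minor" and the tenth yields "major". Once the window passes without new failures, the level drops back to None.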

All Possible Alert Types

👉 Here is how you can set specific alert types

LLM-related Alerts

Alert Type | Description | Default On
llm_exceptions | Alerts for LLM API exceptions |
llm_too_slow | Notifications for LLM responses slower than the set threshold |
llm_requests_hanging | Alerts for LLM requests that are not completing |
cooldown_deployment | Alerts when a deployment is put into cooldown |
new_model_added | Notifications when a new model is added to litellm proxy through /model/new |
outage_alerts | Alerts when a specific LLM deployment is facing an outage |
region_outage_alerts | Alerts when a specific LLM region is facing an outage. Example: us-east-1 |

Budget and Spend Alerts

Alert Type | Description | Default On
budget_alerts | Notifications related to budget limits or thresholds |
spend_reports | Periodic reports on spending across teams or tags |
failed_tracking_spend | Alerts when spend tracking fails |
daily_reports | Daily Spend reports |
fallback_reports | Weekly Reports on LLM fallback occurrences |

Database Alerts

Alert Type | Description | Default On
db_exceptions | Notifications for database-related exceptions |

Management Endpoint Alerts - Virtual Key, Team, Internal User

Alert Type | Description | Default On
new_virtual_key_created | Notifications when a new virtual key is created |
virtual_key_updated | Alerts when a virtual key is modified |
virtual_key_deleted | Notifications when a virtual key is removed |
new_team_created | Alerts for the creation of a new team |
team_updated | Notifications when team details are modified |
team_deleted | Alerts when a team is deleted |
new_internal_user_created | Notifications for new internal user accounts |
internal_user_updated | Alerts when an internal user's details are changed |
internal_user_deleted | Notifications when an internal user account is removed |