
From Webhook Noise to Actionable Alerts: A WHEN → IF → THEN Model


The first thing that happens after you wire up a webhook integration is that it works. The second thing that happens is that it works too well.

You connected your error tracker. Now every caught-and-handled exception in a test environment generates an alert. You connected your deployment tool. Now every successful rollout pings your on-call channel. You connected your log aggregator. Now a flurry of DEBUG-level "connection timeout retried" messages wakes someone up at 4 AM because one of them happened to contain the word "error."

The webhook itself isn't the problem. The problem is that webhooks are firehose-shaped and alerts need to be scalpel-shaped, and no one's built the conversion layer between them.

The Integration Dilemma

Every alerting tool eventually faces the same tradeoff:

  • Accept everything the webhook sends and let humans filter. Simple to build, terrible to live with. Your alert channel becomes a log stream. Engineers start muting it.
  • Force upstream systems to filter before sending. Cleaner in theory, nightmarish in practice. You now need to convince a dozen different tool owners to configure their filters to your liking. Some of them have no filter options at all.
  • Build a filter layer on the receiving end. The right answer, but requires a real UI and a real evaluation engine. Not a checkbox of "severity > warning."

We went with the third. And the mental model we landed on — borrowed and adapted from Sentry's issue alert rules — is what made it actually usable.

The WHEN → IF → THEN Model

A filter rule, stripped to its essence, is a conditional. But "conditional" is developer vocabulary. For a filter builder that non-engineers should be able to use, we needed a narrative:

  • WHEN an event matches this trigger (the webhook fires),
  • IF these conditions hold,
  • THEN create an alert.

That three-part structure gives the UI a natural flow: you see the trigger, you build the conditions, you know what happens. Contrast that with the more common approach of listing a flat array of "rules" where it's unclear which ones are required, which are optional, and what happens when they don't match.

The WHEN is fixed: "a webhook request arrived at this URL." The THEN is fixed: "create an alert with the given title, description, severity, and metadata." The IF — the conditions — is where the actual intelligence lives.

filterMatch: all / any / none

The single most important design decision was how multiple conditions compose. The traditional answer is "AND or OR." We shipped that first, and immediately ran into a gap: exclusion.

The most common real-world filter rule isn't "alert if X AND Y." It's "alert unless Z." Things like "alert unless environment is test," "alert unless the source is localhost," "alert unless the user agent is our own health checker."

Expressing "unless" with AND/OR requires mental gymnastics. You end up writing environment != 'test' AND environment != 'dev' AND environment != 'staging' — three conditions joined by AND, each with a negated operator — when what you actually mean is "none of these three."

So we added a third match mode, none, and the semantics became:

  • all — every condition must be true. (AND)
  • any — at least one condition must be true. (OR)
  • none — no condition may be true. (exclusion)

Three modes. Covers every real filter rule we've seen.
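As a concrete sketch, assuming each condition has already been evaluated to a boolean, the three modes map directly onto Python's built-ins (illustrative only; the real engine's API may differ):

```python
def matches(filter_match, results):
    """Combine per-condition boolean results according to the match mode.

    `filter_match` is one of "all", "any", "none"; `results` is the list of
    booleans produced by evaluating each condition against the payload.
    """
    if filter_match == "all":
        return all(results)       # AND: every condition must be true
    if filter_match == "any":
        return any(results)       # OR: at least one must be true
    if filter_match == "none":
        return not any(results)   # exclusion: no condition may be true
    raise ValueError(f"unknown filterMatch: {filter_match}")
```

Note that `none` is just `any` negated, which is exactly why it slots cleanly into an engine that already supports AND and OR.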

The Operator Cheat Sheet

Conditions need operators, and operators need to be discoverable. We grouped them by category so the UI can present them without drowning the user:

Equality

  • eq — equals
  • ne — not equals

Text matching

  • sw — starts with
  • ew — ends with
  • co — contains
  • nc — does not contain
  • regex — matches regular expression

Numeric

  • gt, gte, lt, lte — greater/less than, with or without equal

Existence

  • is — field is set (exists in the payload)
  • ns — field is not set

Thirteen operators, four categories. Anything more and the dropdown becomes unnavigable; anything less and you lose the ability to express real-world rules.
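One way to keep an operator set like this maintainable is a dispatch table, one small function per short code. This is a sketch under assumed semantics, not the actual implementation:

```python
import re

# Hypothetical dispatch table: operator code -> predicate(actual, expected).
# Existence checks ("is"/"ns") ignore the expected value entirely.
OPERATORS = {
    "eq":    lambda actual, expected: actual == expected,
    "ne":    lambda actual, expected: actual != expected,
    "sw":    lambda actual, expected: str(actual).startswith(str(expected)),
    "ew":    lambda actual, expected: str(actual).endswith(str(expected)),
    "co":    lambda actual, expected: str(expected) in str(actual),
    "nc":    lambda actual, expected: str(expected) not in str(actual),
    "regex": lambda actual, expected: re.search(expected, str(actual)) is not None,
    "gt":    lambda actual, expected: actual > expected,
    "gte":   lambda actual, expected: actual >= expected,
    "lt":    lambda actual, expected: actual < expected,
    "lte":   lambda actual, expected: actual <= expected,
    "is":    lambda actual, expected: actual is not None,
    "ns":    lambda actual, expected: actual is None,
}
```

The table shape also makes the "four categories" grouping in the UI a presentation concern rather than an engine concern.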

Real-World Filter Patterns

Theory is cheap. Here are the actual filter rules that teams build, in practice.

Production Critical Only

The classic first rule anyone writes: alert only when severity is critical and the event came from production.

{
  "enabled": true,
  "filterMatch": "all",
  "conditions": [
    { "field": "severity", "operator": "eq", "value": "critical", "type": "string" },
    { "field": "metadata.environment", "operator": "eq", "value": "production", "type": "string" }
  ]
}

filterMatch: all = AND. Both must match. Textbook case.
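To make the evaluation concrete, here's a minimal sketch of how a rule like this could be checked against a payload, including the dotted-path lookup needed for a field like metadata.environment. The helper names are hypothetical:

```python
def get_field(payload, path):
    """Resolve a dotted field path like "metadata.environment"."""
    value = payload
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # missing field: the condition sees "not set"
        value = value[part]
    return value

def rule_matches(rule, payload):
    # Minimal evaluator: "all" mode, assuming every condition uses the
    # eq operator. Just enough to run the rule above.
    results = [get_field(payload, c["field"]) == c["value"]
               for c in rule["conditions"]]
    return rule["enabled"] and all(results)
```

A payload with severity "critical" from production matches; the same event from staging, or one missing the metadata block entirely, does not.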

Error Volume Threshold

Alert only if the error count from a batched integration exceeds some threshold — avoids pinging on every individual stack trace.

{
  "enabled": true,
  "filterMatch": "all",
  "conditions": [
    { "field": "metadata.error_count", "operator": "gte", "value": 100, "type": "number" }
  ]
}

The number type tells the evaluator to coerce. Webhooks sometimes send counts as strings ("100"), and you'd rather the filter handle that than have it silently never match.
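A sketch of what that coercion might look like, assuming the evaluator normalizes values before comparing; the function name and type handling here are illustrative:

```python
def coerce(value, value_type):
    """Coerce a payload value to the condition's declared type."""
    if value_type == "number":
        if isinstance(value, (int, float)):
            return value
        return float(value)  # webhooks often send counts as strings ("100")
    return value
```

Without this step, the string "100" compared against the number 100 would never satisfy gte, and the rule would silently never fire.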

Exclude Test Environments

The exclusion pattern in all its glory. Using none match mode: the alert fires if none of the listed conditions are true.

{
  "enabled": true,
  "filterMatch": "none",
  "conditions": [
    { "field": "metadata.environment", "operator": "sw", "value": "test", "type": "string" },
    { "field": "metadata.environment", "operator": "sw", "value": "dev", "type": "string" },
    { "field": "metadata.environment", "operator": "sw", "value": "staging", "type": "string" }
  ]
}

Read as: "create an alert unless the environment starts with test, dev, or staging." Try writing this with pure AND/OR: there's no negated starts-with operator, so you'd fall back to ne conditions joined by all, which only covers exact environment names and reads backwards.
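The two forms are De Morgan equivalents, which a quick sketch makes visible (using plain prefix checks as a stand-in for the sw operator; assumed helper names):

```python
PREFIXES = ("test", "dev", "staging")

def fires_none(env):
    # filterMatch: none — the alert fires only when no condition is true
    return not any(env.startswith(p) for p in PREFIXES)

def fires_all_negated(env):
    # De Morgan equivalent: every *negated* condition must be true
    return all(not env.startswith(p) for p in PREFIXES)
```

Both functions agree on every input; the difference is entirely in how the rule reads to the person writing it.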

Exclude Localhost and Loopbacks

Another none pattern. Often layered with the environment exclusion above.

{
  "enabled": true,
  "filterMatch": "none",
  "conditions": [
    { "field": "metadata.url", "operator": "co", "value": "localhost", "type": "string" },
    { "field": "metadata.url", "operator": "co", "value": "127.0.0.1", "type": "string" },
    { "field": "metadata.user_agent", "operator": "co", "value": "HealthCheck", "type": "string" }
  ]
}

The last condition is the interesting one: you're filtering out your own monitoring system's health checks, which otherwise would generate alerts that point back at you.

Multiple Error Sources

Using any (OR) to surface events that touch any of a set of sensitive services.

{
  "enabled": true,
  "filterMatch": "any",
  "conditions": [
    { "field": "metadata.service", "operator": "eq", "value": "payment", "type": "string" },
    { "field": "metadata.service", "operator": "eq", "value": "auth", "type": "string" },
    { "field": "title", "operator": "co", "value": "database", "type": "string" }
  ]
}

This rule says: "I care about anything touching payments, auth, or the database — if any of these match, page me."

Regex for Structured IDs

When you need something more surgical, regex is there.

{
  "enabled": true,
  "filterMatch": "all",
  "conditions": [
    { "field": "metadata.order_id", "operator": "regex", "value": "^ORD-20\\d{2}-", "type": "string" }
  ]
}

Only fire for orders matching a specific ID format — useful when a single webhook URL is shared across products and you want to filter by ID pattern.

Exclusion Is the Unsung Hero

Of the three match modes, none is the one teams end up using most in production, and it's the one that isn't in most competing tools.

Here's why: your signal-to-noise problem is almost always a noise problem, not a signal problem. You already have the events that matter. What you don't have is a clean way to say "but not these ones." Exclusion is that clean way.

A team that sets up one broad webhook and layers on three none rules — exclude test envs, exclude healthchecks, exclude throttled retries — will get a vastly cleaner alert stream than a team that writes five narrow all rules trying to enumerate every positive case. The positive-case approach misses edge cases. The exclusion approach starts permissive and subtracts.

Fail-Safe Defaults

One subtle design call, worth flagging: what happens when filter evaluation itself fails?

Say the filter references a field that doesn't exist on this particular payload, or the regex has a bug, or the type coercion hits an unexpected null. What's the right default behavior?

Two options:

  1. Fail closed. If the filter errors, drop the event.
  2. Fail open. If the filter errors, create the alert anyway.

We chose fail-open, and it wasn't a close call. The entire point of the monitoring stack is to not miss real incidents. A malformed filter rule that silently eats alerts is a failure mode that can go undetected for months. A malformed filter rule that lets a few extra alerts through is a noise problem that whoever wrote the rule will notice and fix the same day.

The rule we live by: alerts are permanent, filters are not. If the filter layer has a bug, create the alert and let humans sort it out.
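A fail-open wrapper is small enough to sketch in full. The evaluate callback stands in for whatever actually runs the rule's conditions; the names here are assumptions:

```python
import logging

log = logging.getLogger("alert-filter")

def should_create_alert(rule, payload, evaluate):
    """Fail-open wrapper: any error during filter evaluation creates the alert."""
    try:
        return evaluate(rule, payload)
    except Exception:
        # A buggy filter must never silently eat alerts:
        # log the failure for the rule's author, let the event through.
        log.exception("filter evaluation failed; failing open")
        return True
```

The logging matters as much as the return value: fail-open plus a visible error is what turns a malformed rule into a same-day fix instead of a months-long blind spot.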

Write-Only Is a Feature

The other design choice that took some internal convincing was making the webhook endpoint write-only. No PATCH, no DELETE, no "update the alert to new metadata."

The reasoning is that webhooks are notoriously unreliable carriers of truth. Retries happen. Duplicates happen. Out-of-order delivery happens. If an earlier webhook creates an alert and a later (but delayed) webhook "updates" it, you've let the network topology rewrite your incident log.

By making webhook endpoints create-only — and leaving modification to the authenticated UI and server APIs — we get a much cleaner model: the webhook is an event, the alert is a record of events, and records don't retroactively change.

Occurrence grouping (covered in a separate post) handles the "same event arriving twice" case by appending occurrences rather than creating duplicate alerts, so you get idempotence without mutability.
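For a rough sense of that shape, here's a hypothetical append-only sketch, with a plain dict standing in for the alert store; the real grouping logic lives in that separate post:

```python
def record_event(store, dedup_key, event):
    """Append-only occurrence grouping: a retried or duplicate webhook
    appends an occurrence to the existing alert instead of mutating it."""
    alert = store.get(dedup_key)
    if alert is None:
        store[dedup_key] = {"occurrences": [event]}  # first sighting: create
    else:
        alert["occurrences"].append(event)           # duplicate: append
    return store[dedup_key]
```

Delivering the same event twice leaves you with one alert and two occurrences, which is the idempotence-without-mutability property described above.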

What We Learned

Three takeaways from building this:

Exclusion is a first-class operator, not a second-class one. Most teams need to subtract noise more than they need to add signal. Design for that up front.

Filter rules are configuration, and configuration needs a UI. JSON-editing filter rules works for engineers. It doesn't work for the ops lead who wants to silence a specific noisy source at 2 AM. A visual builder paired with a JSON view for power users is the right shape.

Fail open, always. Whatever clever logic you put in the filter layer, it is not allowed to lose alerts. Full stop.

A webhook filter layer isn't glamorous. Nobody writes a blog post about how much they love their filter rules. But it's the difference between an alert channel that engineers treat as useful and one they treat as background noise — and that's the difference between a monitoring system that works and one that exists.