
Replace Webhooks With Persistent Agent Tunnels

February 21, 2026 · webhooks · event-stream · architecture

"Give me /events, not webhooks." That sentiment hit the front page of Hacker News because it captures a frustration shared by every developer who has built a webhook consumer at scale. Webhooks are conceptually simple -- a provider sends an HTTP POST to your URL when something happens. In practice, they are a source of silent data loss, security vulnerabilities, and infrastructure complexity that scales worse than the systems they integrate.

Nearly 20% of webhook event deliveries fail silently during peak loads. The provider's retry logic is a black box you do not control. Events arrive out of order. Your endpoint needs a routable IP address, which means code running behind NAT -- laptops, home servers, CI runners, most AI agents -- cannot receive webhooks at all. And the security model is inverted: you are exposing a public HTTP endpoint that anyone on the internet can POST to.

This article examines why webhooks break down for agent-to-agent communication, why the common workarounds do not solve the fundamental problems, and how persistent encrypted tunnels with built-in event streaming provide a better model.

Why Webhooks Fail

Webhooks turn your application into a distributed system. The moment you accept an incoming HTTP POST from an external provider, you inherit every hard problem in distributed computing: partial failure, message ordering, idempotency, and exactly-once delivery. Most teams do not realize this until they start losing events in production.
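Duplicate delivery is the most common of these problems in practice, so it is worth making concrete. A consumer-side idempotency guard (an illustrative sketch, not part of any provider's SDK) can be as small as a set of already-seen event IDs:

```go
package main

import (
	"fmt"
	"sync"
)

// Deduper remembers event IDs it has already processed, so a webhook
// delivered twice (a provider retry after a timeout, for example) is
// only acted on once.
type Deduper struct {
	mu   sync.Mutex
	seen map[string]bool
}

func NewDeduper() *Deduper {
	return &Deduper{seen: make(map[string]bool)}
}

// FirstDelivery reports whether this event ID is new, and records it.
func (d *Deduper) FirstDelivery(id string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.seen[id] {
		return false
	}
	d.seen[id] = true
	return true
}

func main() {
	d := NewDeduper()
	for _, id := range []string{"evt_1", "evt_2", "evt_1"} {
		if d.FirstDelivery(id) {
			fmt.Println("processing", id)
		} else {
			fmt.Println("skipping duplicate", id)
		}
	}
}
```

A production version needs the same logic backed by durable storage with TTL eviction, since an in-memory map does not survive a restart -- which is exactly the kind of incidental infrastructure this article is about.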

Silent failure at scale

Webhook providers typically retry failed deliveries on an exponential backoff schedule. If your server returns a 500 or times out during a deployment window, the provider queues the retry. If your server is down for longer than the retry window -- which varies wildly between providers, from 30 minutes to 72 hours -- those events are gone. You have no way to know they existed. There is no consumer-side replay. There is no offset you can rewind to.

A production study of webhook delivery across major SaaS platforms found that nearly 20% of deliveries fail during peak loads. Not 20% of total events over the lifetime of the integration -- 20% during the hours when you need them most. Payment processing webhooks during Black Friday. CI/CD webhooks during a deploy. Agent task completion signals during a burst of parallel work.

The public URL problem

Webhooks require the consumer to expose a routable HTTP endpoint. This is trivial if you run in a cloud data center with a static IP. It is impossible if your code runs behind NAT. And most AI agents run behind NAT.

Consider the deployment reality: an agent running on a developer laptop behind a home router. An agent running in a Docker container behind a corporate firewall. An agent running on a cloud VM with no public IP (which is the default on most cloud providers now, for good security reasons). None of these can receive webhooks without additional infrastructure.

Security exposure

Exposing a public HTTP endpoint to receive webhooks creates an SSRF (Server-Side Request Forgery) attack surface. You must validate the webhook signature to ensure it came from the expected provider. But signature validation is provider-specific, and a surprising number of implementations get it wrong -- timing attacks on HMAC comparison, missing replay protection, or signature schemes that do not cover the full request body.

Even with correct signature validation, you are still running a public HTTP server that must parse untrusted input. Every webhook endpoint is a potential entry point for payload injection.

The infrastructure spiral

To handle webhooks reliably, you need to build infrastructure that rivals the webhook provider itself. One engineering team documented what it takes to process a single webhook safely:

"You would need 4 new services (SQS, S3, Publisher, Consumer) just to handle a single webhook safely."

An incoming webhook hits a lightweight receiver that immediately returns 200 OK. The receiver pushes the raw payload to a queue (SQS, RabbitMQ). A consumer reads from the queue with retry logic. Failed events go to a dead letter queue. A separate service monitors the dead letter queue and alerts. You need idempotency keys to handle duplicate deliveries. You need ordering logic if events must be processed sequentially. This is four to six services to reliably receive an HTTP POST.

The Ngrok Band-Aid

The most common workaround for the "no public URL" problem is a tunneling service like ngrok. It creates a temporary public URL that tunnels traffic to your local machine. For development, this is convenient. For production agent communication, it introduces its own problems.

The free tier of ngrok limits you to 20 connections per minute and assigns a new random subdomain every session. Your webhook URL changes every time you restart the tunnel -- roughly every 7 hours on the free tier. That means reconfiguring every webhook provider that points at your endpoint, which is a manual process for most SaaS integrations and completely impractical for agent-to-agent communication where peers discover each other dynamically.

Paid tiers fix the URL stability problem but introduce a dependency on a third-party service that sits in the data path. Every webhook payload passes through ngrok's servers in plaintext (unless you add your own TLS layer). For agent communication carrying sensitive data -- task results, model outputs, customer information -- this is an unacceptable trust model.

Ngrok solves the tunneling problem. It does not solve the webhook problem. You still have silent failures, missing ordering guarantees, no consumer-side replay, and an inverted security model where you expose an endpoint rather than initiate a connection.

Persistent Tunnels: A Different Model Entirely

The webhook model is "push to a URL." The persistent tunnel model is "maintain a connection and stream events." This is a fundamental architectural difference, not a minor protocol variation.

In the webhook model, the producer decides when to send data and where to send it. The consumer is passive -- it sits and waits for POSTs. If the consumer is offline, events are lost (or queued on the producer side, which is the producer's problem, not yours). The consumer has no control over delivery timing, ordering, or backpressure.

In the persistent tunnel model, both sides maintain an active connection. The consumer subscribes to specific event topics. Events flow over the existing tunnel -- no new connection setup per event. If the consumer disconnects, it resubscribes when it reconnects. The connection itself handles encryption, NAT traversal, and peer authentication. There is no public URL because the consumer initiates the connection outward, through NAT, to a rendezvous point.

Pilot Protocol implements this model with its event stream on port 1002. Agents connect to each other through encrypted UDP tunnels with automatic NAT traversal (STUN discovery, hole-punching, relay fallback). Once connected, they can publish and subscribe to topic-based event streams without any additional infrastructure.

Pilot's Event Stream: Subscribe, Publish, No Public URL

Port 1002 is Pilot's built-in pub/sub service. It supports topic-based routing with wildcard subscriptions, persistent connections, and encrypted transport. Here is how it works in practice.

CLI quick start

# Terminal 1: subscribe to all task events
pilotctl subscribe "tasks.*"

# Terminal 2: publish a task completion event
pilotctl publish tasks.complete '{"task_id":"abc123","status":"done","result":"summary generated"}'

# Terminal 1 immediately prints:
# [tasks.complete] {"task_id":"abc123","status":"done","result":"summary generated"}

No webhook URL configured. No public endpoint exposed. No HTTP server running. The subscriber initiated the connection outward through NAT and receives events over the existing encrypted tunnel.

Go subscriber and publisher

package main

import (
    "encoding/json"
    "fmt"
    "log"

    "github.com/TeoSlayer/pilotprotocol/pkg/driver"
)

// Subscriber: replaces your webhook endpoint
func main() {
    d, err := driver.Connect()
    if err != nil {
        log.Fatal(err)
    }
    stream, err := d.OpenEventStream()
    if err != nil {
        log.Fatal(err)
    }

    // Subscribe to payment events (replaces Stripe webhook)
    ch, err := stream.Subscribe("payments.*")
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println("Listening for payment events...")
    for event := range ch {
        var payload map[string]any
        json.Unmarshal(event.Data, &payload)
        fmt.Printf("[%s] %v\n", event.Topic, payload)

        // Process the event
        switch event.Topic {
        case "payments.completed":
            handlePaymentComplete(payload)
        case "payments.failed":
            handlePaymentFailed(payload)
        case "payments.refunded":
            handleRefund(payload)
        }
    }
}

// Illustrative stub handlers so the example compiles standalone
func handlePaymentComplete(p map[string]any) { fmt.Println("payment complete:", p) }
func handlePaymentFailed(p map[string]any)   { fmt.Println("payment failed:", p) }
func handleRefund(p map[string]any)          { fmt.Println("refund processed:", p) }

The publisher is a separate program:

package main

import (
    "encoding/json"
    "log"
    "time"

    "github.com/TeoSlayer/pilotprotocol/pkg/driver"
)

// Publisher: replaces the webhook sender
func main() {
    d, err := driver.Connect()
    if err != nil {
        log.Fatal(err)
    }
    stream, err := d.OpenEventStream()
    if err != nil {
        log.Fatal(err)
    }

    event := map[string]any{
        "payment_id": "pay_abc123",
        "amount":     9900,
        "currency":   "usd",
        "status":     "completed",
        "ts":         time.Now().Unix(),
    }
    data, _ := json.Marshal(event)
    stream.Publish("payments.completed", data)
}

Python wrapper

For Python-based agents, wrap the CLI with subprocess or use the webhook bridge pattern:

import subprocess
import json

# Subscribe to events and process them in Python
def subscribe(topic: str):
    proc = subprocess.Popen(
        ["pilotctl", "subscribe", topic, "--json"],
        stdout=subprocess.PIPE,
        text=True
    )
    for line in proc.stdout:
        event = json.loads(line.strip())
        yield event

# Usage: replaces a Flask webhook endpoint
for event in subscribe("tasks.*"):
    print(f"Received: {event['topic']} -> {event['data']}")
    process_event(event)

And the publishing side:

import subprocess
import json

# Publish an event from Python
def publish(topic: str, data: dict):
    payload = json.dumps(data)
    subprocess.run(
        ["pilotctl", "publish", topic, payload],
        check=True
    )

# Usage: replaces a webhook POST
publish("tasks.complete", {
    "task_id": "abc123",
    "status": "done",
    "result": "Analysis complete. 3 anomalies found."
})

Webhook bridge for existing integrations

If you have existing services that send webhooks, you can bridge them into the Pilot event stream with a small adapter. Register the bridge endpoint with pilotctl set-webhook, then point your existing providers at it:

# Start the webhook bridge: receives HTTP POSTs, publishes to event stream
pilotctl set-webhook http://localhost:8080/events

# Now any webhook pointing at localhost:8080/events
# gets bridged into the Pilot event stream

This gives you a migration path. Existing webhook providers POST to the bridge. The bridge publishes to the event stream. Your agents subscribe to the stream. Over time, you replace webhook integrations with direct Pilot connections as your counterparties adopt the protocol.

Comparison: Webhooks vs SSE vs WebSockets vs Pilot

| Property | Webhooks | SSE | WebSockets | Pilot Event Stream |
| --- | --- | --- | --- | --- |
| Direction | Push (server to URL) | Server to client | Bidirectional | Bidirectional pub/sub |
| NAT traversal | Consumer needs public URL | Client initiates (OK) | Client initiates (OK) | Built-in (STUN + relay) |
| Encryption | TLS (you configure) | TLS (you configure) | TLS (you configure) | AES-256-GCM (automatic) |
| Topic routing | URL path (manual) | None built-in | None built-in | Wildcard topics built-in |
| Auth model | HMAC signature per provider | Token/cookie | Token/cookie | Ed25519 trust handshake |
| Peer discovery | Manual URL config | Manual URL config | Manual URL config | Registry + DNS lookup |
| Works behind NAT | No (consumer side) | Yes (client initiates) | Yes (client initiates) | Yes (both sides) |
| Infra required | Queue + DLQ + HTTP server | HTTP server | WS server | None (daemon handles it) |
| Persistence | Provider-dependent retries | Reconnect + last-event-id | None | None (fire-and-forget) |

SSE and WebSockets solve the client-initiated connection problem -- the consumer opens the connection outward, so NAT is not an issue on the consumer side. But the server still needs a routable address. For agent-to-agent communication where neither side has a public IP, both SSE and WebSockets fail unless you add a relay infrastructure. Pilot handles this transparently: STUN discovers NAT type, hole-punching establishes direct connections where possible, and relay through the beacon handles symmetric NAT.

What You Eliminate

Switching from webhooks to persistent tunnels removes the following from your architecture:

  - The public HTTP endpoint and the TLS, routing, and firewall configuration that exposes it
  - The receive-and-ack service, message queue, and dead letter queue
  - Provider-specific HMAC signature validation code
  - Idempotency and ordering logic for duplicate and out-of-order deliveries
  - Monitoring and alerting on failed or stuck deliveries

What you gain: a single pilotctl subscribe command or six lines of Go. The Pilot daemon handles connection management, encryption, NAT traversal, and peer authentication. Your application code receives events on a channel.

Operational cost: Webhooks require you to operate infrastructure proportional to the number of integrations. Pilot's event stream requires you to run one daemon process per agent. The daemon is a single binary, ~15 MB, that runs alongside your agent. No external services, no cloud subscriptions, no infrastructure team.

When Webhooks Are Still the Right Choice

Persistent tunnels are not universally superior to webhooks. Webhooks remain the right choice when:

  - The event producer is a third-party SaaS platform that only offers webhook delivery
  - The consumer already runs in a cloud data center with a stable public URL and existing queue infrastructure
  - You cannot run a daemon process alongside the consumer

The decision point is clear: if both sides of the communication channel are agents or services you control, persistent tunnels are simpler, more secure, and more reliable. If one side is a third-party SaaS service, you need webhooks (or a bridge).

Migration Path

You do not need to replace all webhooks at once. The practical migration path is:

  1. Install Pilot on your agents: go install github.com/TeoSlayer/pilotprotocol/cmd/pilotctl@latest
  2. Bridge existing webhooks: Use pilotctl set-webhook to pipe incoming webhooks into the event stream.
  3. New integrations use Pilot: When connecting agents you control, use the event stream directly instead of adding another webhook endpoint.
  4. Decommission webhook infrastructure: As integrations move to Pilot, remove the corresponding webhook receivers, queues, and dead letter queues.

# Step 1: Install
go install github.com/TeoSlayer/pilotprotocol/cmd/pilotctl@latest

# Step 2: Start daemon and join network
pilotctl daemon start
pilotctl join 1

# Step 3: Bridge existing webhooks
pilotctl set-webhook http://localhost:8080/events

# Step 4: Subscribe to events (replaces webhook consumer)
pilotctl subscribe "tasks.*"

The bridge means you never have to choose between webhooks and Pilot. You can run both simultaneously, routing some events through webhooks and others through the event stream, until you have migrated everything you control to persistent tunnels.

Try Pilot Protocol

Replace your webhook infrastructure with six lines of Go. No public URLs, no queues, no dead letter monitoring.

View on GitHub