
Building a Private Agent Network for Your Company

February 11, 2026 · Enterprise deployment guide

Your company has agents. The ML team runs training coordinators. The data team has ETL pipelines. The platform team operates deployment bots. Product has customer-facing assistants. And right now, they communicate through... Redis pub/sub? REST webhooks? Shared databases? Slack? A Kafka topic someone set up two years ago that nobody fully understands?

There is no standard internal agent communication layer. Every team builds their own, using whatever transport happens to be available. The result is a fragmented mess: agents that cannot discover each other, no consistent identity model, no encryption between internal services, and no visibility into who is talking to whom.

The problem is not just fragmentation. It is visibility. When something goes wrong — an agent stops responding, a pipeline stalls, a task fails — nobody can answer basic questions. Which agents are online? Who is talking to whom? When did agent X last communicate? These are trivial questions in a proper network. They are unanswerable when agents communicate through five different ad-hoc transports.

Pilot Protocol solves this. Deploy one rendezvous server in your VPC. Give each agent a daemon. They get permanent addresses, encrypted tunnels, mutual trust, and a monitoring dashboard. This guide walks through the entire deployment, from empty VM to production-ready private agent network.

Architecture Overview

A private Pilot Protocol deployment has three components:

  1. Rendezvous server — Central registry, beacon for NAT traversal, and HTTP dashboard. One instance per network (with optional hot-standby replication for HA).
  2. Agent daemons — One per agent. Handles tunnel management, encryption, and IPC. Runs as a background process or systemd service.
  3. Gateway (optional) — Bridges legacy systems (Grafana, Jenkins, Prometheus) to the agent network by mapping Pilot addresses to local IPs.

All communication between daemons is encrypted by default using X25519 + AES-256-GCM. The rendezvous server stores agent registrations and facilitates discovery but never sees plaintext traffic between agents.
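As an unofficial illustration of this scheme, the sketch below performs an X25519 key agreement and encrypts a single frame with AES-256-GCM using Python's `cryptography` package. The HKDF step and the `pilot-tunnel` info label are assumptions made for the example; they are not Pilot's actual key schedule.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Each daemon holds an X25519 key pair (generated fresh here for the demo).
alice = X25519PrivateKey.generate()
bob = X25519PrivateKey.generate()

# Diffie-Hellman exchange: both sides compute the same shared secret
# from their own private key and the peer's public key.
shared_a = alice.exchange(bob.public_key())
shared_b = bob.exchange(alice.public_key())
assert shared_a == shared_b

# Derive a 256-bit AES key from the shared secret. The HKDF parameters
# (SHA-256, the "pilot-tunnel" label) are illustrative assumptions.
key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
           info=b"pilot-tunnel").derive(shared_a)

# Encrypt one frame with AES-256-GCM, using a fresh 96-bit nonce.
aesgcm = AESGCM(key)
nonce = os.urandom(12)
ciphertext = aesgcm.encrypt(nonce, b"start training job", None)
assert aesgcm.decrypt(nonce, ciphertext, None) == b"start training job"
```

The rendezvous never sees `key`: it only relays public registration metadata, which is what makes the "never sees plaintext" property hold.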

Step 1: Deploy the Rendezvous Server

The rendezvous server is a single binary. No database, no message queue, no external dependencies. It persists state to a JSON file and serves a built-in web dashboard.

Provisioning

Choose a VM in your VPC that is reachable by all agent machines. For most companies, a small instance is sufficient — the rendezvous server handles metadata, not data plane traffic. An e2-small (2 vCPU, 2 GB RAM) on GCP or a t3.small on AWS handles thousands of agents comfortably.

Open these ports in your VPC firewall:

Port   Protocol   Purpose
9000   TCP        Registry (agent registration, lookups)
9001   UDP        Beacon (NAT traversal, hole-punching)
3000   TCP        Dashboard and HTTP API (restrict to internal)

Security note: Port 3000 (dashboard) should only be accessible from your internal network or VPN. It exposes agent topology, trust relationships, and network statistics. Do not expose it to the internet.

Running the Rendezvous

# Download the latest release
wget https://github.com/TeoSlayer/pilotprotocol/releases/latest/download/rendezvous-linux-amd64
chmod +x rendezvous-linux-amd64

# Create persistence directory
sudo mkdir -p /var/lib/pilot
sudo chown $(whoami) /var/lib/pilot

# Start the rendezvous server
./rendezvous-linux-amd64 \
  --registry-addr :9000 \
  --beacon-addr :9001 \
  --store /var/lib/pilot/registry.json \
  --http :3000

For production, run it as a systemd service:

# /etc/systemd/system/pilot-rendezvous.service
[Unit]
Description=Pilot Protocol Rendezvous Server
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=pilot
ExecStart=/opt/pilot/rendezvous \
  --registry-addr :9000 \
  --beacon-addr :9001 \
  --store /var/lib/pilot/registry.json \
  --http :3000
Restart=always
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

Reload systemd, then enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable pilot-rendezvous
sudo systemctl start pilot-rendezvous

Verify the Deployment

# Check the dashboard
curl http://localhost:3000/api/stats
{"total_agents":0,"total_networks":0,"online_agents":0,"connections_active":0}

# Verify registry port
nc -z localhost 9000 && echo "Registry: OK"

# Verify beacon port
nc -zu localhost 9001 && echo "Beacon: OK"

Step 2: Enroll Your First Agents

Each agent runs a Pilot daemon that connects to the rendezvous, registers itself, and maintains a persistent identity. The daemon is the agent's "network interface" — all communication goes through it.

Daemon Setup

# On each agent machine
wget https://github.com/TeoSlayer/pilotprotocol/releases/latest/download/pilot-daemon-linux-amd64
chmod +x pilot-daemon-linux-amd64

# Start with your rendezvous address
./pilot-daemon-linux-amd64 \
  -registry 10.0.1.5:9000 \
  -beacon 10.0.1.5:9001 \
  -store /var/lib/pilot/identity.json

On first run, the daemon generates an Ed25519 identity key pair, registers with the rendezvous, and receives a virtual address in the format N:NNNN.HHHH.LLLL. The identity is persisted to the store file, so the agent keeps the same address across restarts.
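To make the address shape concrete, here is a small parser sketch. The split into a numeric network id plus three four-character hex groups follows the format above, but the helper and its field names are hypothetical, not part of pilotctl.

```python
import re

# Matches the N:NNNN.HHHH.LLLL shape described above, e.g. "1:0001.A3F2.001B".
ADDR_RE = re.compile(
    r"^(\d+):([0-9A-Fa-f]{4})\.([0-9A-Fa-f]{4})\.([0-9A-Fa-f]{4})$"
)

def parse_address(addr: str) -> dict:
    """Split a Pilot virtual address into its network id and hex groups."""
    m = ADDR_RE.match(addr)
    if not m:
        raise ValueError(f"not a Pilot address: {addr!r}")
    network, g1, g2, g3 = m.groups()
    return {"network": int(network), "groups": (g1, g2, g3)}

print(parse_address("1:0001.A3F2.001B"))
# {'network': 1, 'groups': ('0001', 'A3F2', '001B')}
```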

Hostname Convention

Pilot supports hostname registration, allowing agents to be discovered by name instead of address. We recommend a naming convention that maps to your org structure:

# Pattern: team-agent-function
pilotctl set-hostname ml-team-trainer
pilotctl set-hostname ml-team-evaluator
pilotctl set-hostname data-team-etl
pilotctl set-hostname data-team-validator
pilotctl set-hostname platform-deployer
pilotctl set-hostname product-assistant-1

Hostnames are unique within a network. They enable human-readable addressing:

# Instead of connecting by address
pilotctl send 1:0001.A3F2.001B "start training job"

# Connect by hostname
pilotctl send ml-team-trainer "start training job"

Network Organization

Pilot supports multiple networks within a single rendezvous. Use networks to segment your agent fleet by team, environment, or function (for example, ml-prod, data-pipelines, and a separate staging network).

Agents within the same network can discover each other. Cross-network communication requires explicit trust establishment.

Step 3: Configure Trust Policies

By default, Pilot agents are private and invisible. They cannot be discovered by agents outside their network, and even agents within the same network must establish mutual trust before exchanging data. This is the opposite of most internal tools, where everything trusts everything by default.

Same-Network Auto-Trust

For teams that want frictionless internal communication, enable auto-trust within a network. All agents in the same network automatically trust each other:

# On each daemon in the network
pilotctl set-visibility public

Public agents within the same network can be discovered and connected to without a handshake. This is appropriate for agents operated by the same team in a trusted environment.

Cross-Team Trust

When the ML team's trainer agent needs to pull data from the data team's ETL agent, they are in different networks. This requires an explicit trust handshake:

# On ml-team-trainer: request trust with data-team-etl
pilotctl trust request data-team-etl

# On data-team-etl: approve the trust request
pilotctl trust approve ml-team-trainer

The handshake is bilateral: both sides must approve. This maps naturally to organizational approval workflows. The data team can review which ML agents need access and approve or deny on a per-agent basis.
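The bilateral rule is easy to state precisely: a connection is allowed only when both directed approvals exist, and a revocation by either side removes the relationship. A minimal bookkeeping sketch (class and method names are illustrative, not the pilotctl API):

```python
# Sketch of bilateral trust: a pair is trusted only after BOTH sides approve.
class TrustTable:
    def __init__(self):
        self.approved = set()   # directed (approver, peer) pairs

    def approve(self, approver: str, peer: str):
        self.approved.add((approver, peer))

    def revoke(self, revoker: str, peer: str):
        # Revocation by either side breaks the relationship immediately.
        self.approved.discard((revoker, peer))

    def is_trusted(self, a: str, b: str) -> bool:
        return (a, b) in self.approved and (b, a) in self.approved

t = TrustTable()
t.approve("ml-team-trainer", "data-team-etl")
print(t.is_trusted("ml-team-trainer", "data-team-etl"))  # False: one-sided
t.approve("data-team-etl", "ml-team-trainer")
print(t.is_trusted("ml-team-trainer", "data-team-etl"))  # True: bilateral
t.revoke("data-team-etl", "ml-team-trainer")
print(t.is_trusted("ml-team-trainer", "data-team-etl"))  # False again
```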

Trust Policies by Org Chart

We recommend mapping trust policies to your organizational structure:

Relationship               Policy
Same team, same network    Auto-trust (public visibility)
Same team, different env   Manual trust (prod agents should not auto-trust staging)
Cross-team                 Bilateral handshake with team lead approval
External partners          Bilateral handshake + time-limited trust

Trust Revocation

Trust can be revoked instantly by either party:

# Immediately revoke trust (takes effect on next connection attempt)
pilotctl trust revoke ml-team-trainer

Revocation is immediate. Active connections are terminated, and the revoked agent can no longer initiate new connections. There is no grace period, no cache to expire, no TTL to wait out. This is critical for incident response: if an agent is compromised, you revoke its trust relationships in seconds, not minutes.

Step 4: Bridge Legacy Systems with Gateway

Not every system in your company will run a Pilot daemon. Grafana needs to scrape agent metrics. Jenkins needs to trigger agent tasks. Your internal wiki might link to agent status pages. The Pilot Gateway bridges these legacy systems to the agent network.

How Gateway Works

The gateway runs on a machine that has both a Pilot daemon and access to your corporate network. It maps Pilot virtual addresses to local IP addresses and proxies traffic bidirectionally:

# Start the gateway
sudo ./pilot-gateway \
  -registry 10.0.1.5:9000 \
  -beacon 10.0.1.5:9001

The gateway automatically adds loopback aliases for each agent it proxies. On Linux, this uses ip addr add; on macOS, ifconfig lo0 alias. Each agent gets a unique local IP address.
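The alias bookkeeping amounts to handing each proxied agent the next loopback address. A sketch of that allocation (the allocator function is hypothetical; the real gateway additionally runs the `ip addr add` / `ifconfig` commands mentioned above):

```python
import ipaddress

# Illustrative sketch of the gateway's alias allocation: each proxied
# agent gets the next loopback address, starting at 127.0.0.2
# (127.0.0.1 is left alone).
def allocate_aliases(agents):
    base = ipaddress.IPv4Address("127.0.0.2")
    return {name: str(base + i) for i, name in enumerate(agents)}

mapping = allocate_aliases(["ml-team-trainer", "data-team-etl"])
print(mapping)
# {'ml-team-trainer': '127.0.0.2', 'data-team-etl': '127.0.0.3'}
```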

Grafana Integration

Suppose ml-team-trainer exposes Prometheus metrics on Pilot port 80. Through the gateway, Prometheus can scrape it as if it were a local HTTP server, and Grafana charts the results as usual:

# In prometheus.yml: the gateway maps ml-team-trainer to 127.0.0.2
scrape_configs:
  - job_name: ml-team-trainer
    scrape_interval: 15s
    static_configs:
      - targets: ['127.0.0.2:80']

No changes to Grafana, no Pilot SDK needed. Standard HTTP, standard Prometheus format. The gateway handles the translation between the corporate network and the agent overlay.

Jenkins Integration

Jenkins can trigger agent tasks via HTTP through the gateway:

# Jenkins build step: trigger agent task via gateway
curl -X POST http://127.0.0.3:80/tasks/submit \
  -H "Content-Type: application/json" \
  -d '{"task": "run-evaluation", "model": "v2.3"}'

The agent receives this as a standard HTTP request on its Pilot port 80. It does not know or care that the request came through a gateway. This is the power of the gateway approach: existing tools work unchanged.

Terraform and Ansible Integration

Infrastructure-as-code tools can also use the gateway to interact with agents. A Terraform provider can call agent health endpoints through the gateway to verify deployments. Ansible playbooks can trigger agent reconfiguration through standard HTTP calls. Because the gateway exposes agents as regular IP addresses with standard ports, any tool that speaks HTTP can interact with the agent network.

The key insight is that the gateway is not a "Pilot SDK for legacy tools." It is a transparent bridge. Legacy tools do not know they are talking to agents on an overlay network. They think they are talking to local HTTP servers. This means zero integration effort on the legacy side.

Step 5: Set Up Monitoring

The rendezvous server includes a built-in monitoring dashboard accessible at the HTTP port (default 3000).

Dashboard Features

The built-in dashboard answers the questions that motivated this deployment: which agents are online, which trust relationships exist, and which connections are active. It reads from the same registry the rendezvous already maintains, so there is nothing extra to deploy.

Prometheus Metrics

For integration with existing monitoring stacks, the rendezvous exposes Prometheus-compatible metrics:

# Scrape rendezvous metrics
curl http://10.0.1.5:3000/metrics

# Example metrics
pilot_agents_total{status="online"} 847
pilot_agents_total{status="offline"} 23
pilot_networks_total 12
pilot_connections_active 156
pilot_registry_lookups_total 45823
pilot_beacon_holepunches_total 312

Add the rendezvous as a Prometheus scrape target in your existing monitoring infrastructure. No special exporters needed.

Alerting

Recommended alert rules:

# Alert if agent count drops more than 10% in 5 minutes
- alert: AgentFleetDrop
  expr: |
    (pilot_agents_total{status="online"} /
     pilot_agents_total{status="online"} offset 5m) < 0.9
  for: 2m
  labels:
    severity: warning

# Alert if rendezvous is unreachable
- alert: RendezvousDown
  expr: up{job="pilot-rendezvous"} == 0
  for: 1m
  labels:
    severity: critical

Step 6: Security Checklist

Before declaring your private agent network production-ready, verify every item on this checklist:

Identity and Keys

  - Each agent's identity store (e.g. /var/lib/pilot/identity.json) is owned by the daemon's user and not world-readable.
  - The rendezvous store file (/var/lib/pilot/registry.json) is writable only by the pilot user.

Trust Controls

  - Public visibility is enabled only for same-team agents in trusted environments.
  - Cross-network access goes through the bilateral trust handshake, with team lead approval for cross-team requests.
  - Stale trust relationships are reviewed and revoked when no longer needed.

Network Security

  - Port 3000 (dashboard) is restricted to the internal network or VPN, never exposed to the internet.
  - Registry (9000/TCP) and beacon (9001/UDP) ports are reachable only from subnets that run agents.

Operational

  - The rendezvous runs as an unprivileged systemd service with Restart=always.
  - Prometheus scrapes the rendezvous /metrics endpoint, and the fleet-drop and rendezvous-down alerts page someone.

Real-World Use Cases

To make this concrete, here are three deployment patterns we see companies adopt with their private agent networks.

Use Case 1: ML Pipeline Coordination

A machine learning team runs five agents: a data fetcher that pulls training data from various sources, a preprocessor that cleans and tokenizes, a trainer that runs the actual model training, an evaluator that benchmarks against test sets, and a model server that serves the latest checkpoint for inference.

All five agents live in the ml-prod network with public visibility (same-team auto-trust). The pipeline flows naturally: the data fetcher sends processed batches to the preprocessor via Pilot's data exchange service (port 1001), the preprocessor forwards to the trainer, and so on. Each handoff is encrypted end-to-end. No shared filesystem, no S3 bucket as intermediary, no message queue. Direct agent-to-agent streaming over encrypted tunnels.

When the evaluator determines a new model checkpoint beats the previous best, it notifies the model server via Pilot's pub/sub service (port 1002). The model server pulls the checkpoint directly from the trainer using file transfer. The entire pipeline operates without any infrastructure beyond the agents themselves and the rendezvous server.

Use Case 2: Cross-Team Data Access with Approval

The analytics team needs access to the data team's ETL agent to pull transformed datasets. The data team operates in a separate network (data-pipelines) and keeps agents private by default. The cross-team access workflow:

  1. Analytics team lead requests trust: pilotctl trust request data-team-etl
  2. Data team lead receives notification via webhook (piped to Slack)
  3. Data team lead reviews and approves: pilotctl trust approve analytics-reporter
  4. Analytics agent can now connect to data ETL agent and pull datasets
  5. Trust relationship is logged, auditable, and revocable at any time

This maps directly to how enterprise access requests already work (tickets, approvals, audit logs), but without the overhead of VPN configurations, firewall rule changes, or service account provisioning.

Use Case 3: Hybrid Cloud Agent Communication

A company runs inference agents on edge devices (retail stores, factory floors) that need to communicate with training agents in GCP. The edge devices are behind store-level NATs with no public IP addresses. Traditional approaches require VPN tunnels or reverse proxies at each location.

With Pilot, each edge device runs a daemon that connects to the rendezvous server. NAT traversal handles the rest: STUN discovers the NAT type, hole-punching establishes direct tunnels for cone NATs, and relay provides fallback for symmetric NATs. The cloud-based training agent and the edge inference agent communicate directly, encrypted, without any network infrastructure changes at the retail locations.
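The path-selection logic described above reduces to a simple rule: punch when both NATs cooperate, relay otherwise. The sketch below is illustrative, not the daemon's actual implementation; the NAT-type labels follow classic STUN terminology.

```python
# Choose a tunnel strategy from the two endpoints' discovered NAT types.
# Hole-punching works for cone NATs; symmetric NATs rewrite ports per
# destination, so either side being symmetric forces the relay fallback.
def choose_path(local_nat: str, remote_nat: str) -> str:
    if "symmetric" in (local_nat, remote_nat):
        return "relay"        # hole-punching unreliable: fall back to relay
    return "hole-punch"       # cone NATs: direct encrypted tunnel

print(choose_path("full-cone", "restricted-cone"))  # hole-punch
print(choose_path("full-cone", "symmetric"))        # relay
```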

Scaling Considerations

A single rendezvous server handles thousands of agents. For larger deployments:

Agent Count     Recommended Setup
1-500           Single rendezvous, e2-small or t3.small
500-5,000       Single rendezvous, e2-standard-4, hot-standby replica
5,000-50,000    Registry sharding (in development), multiple beacons

For a detailed look at running large agent fleets, see How We Run 10,000 Agents on 3 VMs.

Example: Complete Deployment Script

Here is a complete script that provisions a private agent network from scratch on a fresh Ubuntu 22.04 VM:

#!/bin/bash
# deploy-pilot-network.sh
# Run on your rendezvous VM

set -euo pipefail

# 1. Download binaries
mkdir -p /opt/pilot
RELEASE="https://github.com/TeoSlayer/pilotprotocol/releases/latest/download"
wget -q "$RELEASE/rendezvous-linux-amd64" -O /opt/pilot/rendezvous
wget -q "$RELEASE/pilot-daemon-linux-amd64" -O /opt/pilot/pilot-daemon
wget -q "$RELEASE/pilotctl-linux-amd64" -O /opt/pilot/pilotctl
chmod +x /opt/pilot/*

# 2. Create pilot user (idempotent: skip if it already exists)
id pilot &>/dev/null || useradd -r -s /bin/false pilot
mkdir -p /var/lib/pilot
chown pilot:pilot /var/lib/pilot

# 3. Install systemd service
cat > /etc/systemd/system/pilot-rendezvous.service <<'EOF'
[Unit]
Description=Pilot Protocol Rendezvous
After=network-online.target

[Service]
Type=simple
User=pilot
ExecStart=/opt/pilot/rendezvous \
  --registry-addr :9000 \
  --beacon-addr :9001 \
  --store /var/lib/pilot/registry.json \
  --http :3000
Restart=always
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

# 4. Start
systemctl daemon-reload
systemctl enable pilot-rendezvous
systemctl start pilot-rendezvous

# 5. Verify
sleep 2
curl -s http://localhost:3000/api/stats
echo ""
echo "Rendezvous deployed. Dashboard at http://$(hostname -I | awk '{print $1}'):3000"

Then on each agent machine, run the daemon setup and enroll:

# On each agent machine
./pilot-daemon \
  -registry RENDEZVOUS_IP:9000 \
  -beacon RENDEZVOUS_IP:9001 \
  -store /var/lib/pilot/identity.json &

# Set hostname
./pilotctl set-hostname ml-team-trainer

# Verify registration
./pilotctl status

That is it. Your agents now have permanent virtual addresses, encrypted tunnels, mutual discovery, and a monitoring dashboard. No Redis. No Kafka. No service mesh sidecar. One binary per machine.

Comparing to Alternatives

Before deploying a private agent network, you might wonder how this compares to existing approaches. Here is a quick comparison:

Service mesh (Istio, Linkerd): Service meshes are designed for microservices, not agents. They require a sidecar proxy per pod, a control plane, and Kubernetes. They provide mTLS and traffic management, but they do not provide NAT traversal, persistent agent identity, or bilateral trust. If your agents are all in Kubernetes, a service mesh works. If they are distributed across heterogeneous environments, it does not.

VPN (WireGuard, Tailscale): VPNs solve the NAT traversal problem at the network layer. They give each machine a virtual IP. But they do not provide per-agent identity (an agent is identified by its machine, not by itself), trust management, or application-level services. A VPN plus a service registry plus a message broker plus an encryption library starts to resemble what Pilot provides out of the box.

Message brokers (Kafka, RabbitMQ, Redis): Brokers excel at decoupled, asynchronous communication. But they are infrastructure that must be deployed, operated, and monitored. They do not provide direct peer-to-peer communication, NAT traversal, or agent identity. For a detailed comparison with NATS, see our protocol comparison article.

Pilot Protocol is not a replacement for any of these. It is the networking layer that makes agent communication work across any environment, with built-in identity, encryption, and trust. Use it alongside your existing tools, not instead of them.

Deploy Your Private Agent Network

Open source, zero dependencies, single binary. Get started in 5 minutes.

Getting Started Guide