Connect Agents Across AWS, GCP, and Azure Without a VPN
You have agents on AWS. Your team in Europe runs agents on GCP. A partner organization uses Azure. You need them to communicate. The traditional answer is VPN: set up site-to-site tunnels between each cloud provider, configure routing tables, manage firewall rules, and hope the whole thing does not fall over when someone changes a security group.
Multi-cloud networking is a hard problem. It is hard because each cloud provider designed its networking for a single-provider world. AWS VPCs, GCP VPC networks, and Azure VNets are all incompatible. Connecting them requires either cloud-provider interconnect products (AWS Transit Gateway, GCP Cloud Interconnect, Azure ExpressRoute) or VPN tunnels between gateways. Both options are expensive, complex, and scale poorly.
For AI agent communication, this complexity is unnecessary. Agents do not need full network-level connectivity between clouds. They need to find each other, establish encrypted connections, and exchange data. Pilot Protocol provides this with virtual addresses that work regardless of which cloud the agent runs on, automatic NAT traversal that handles the networking, and end-to-end encryption that does not depend on cloud-provider security.
The Multi-Cloud Networking Nightmare
Consider a concrete scenario. Your organization runs agents in three clouds:
- AWS us-east-1: Research agents with access to internal databases
- GCP europe-west1: Analysis agents near European data sources (GDPR residency)
- Azure eastus2: Customer-facing agents integrated with Microsoft 365
To connect these with VPN, you need:
- AWS-to-GCP VPN tunnel (2 tunnel endpoints)
- AWS-to-Azure VPN tunnel (2 tunnel endpoints)
- GCP-to-Azure VPN tunnel (2 tunnel endpoints)
That is 6 tunnel endpoints for 3 clouds. Add a fourth cloud or on-premises location, and you need 12 endpoints. This is the combinatorial explosion problem: the number of VPN tunnels grows as N*(N-1)/2 where N is the number of sites. "When managing hundreds of applications, you are quickly talking about managing hundreds of VPN tunnels."
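The growth is easy to check with a few lines of Python (a quick illustration of the N*(N-1)/2 formula, not part of any Pilot tooling):

```python
# Full-mesh VPN growth: tunnels = N*(N-1)/2, with 2 endpoints per tunnel.
def tunnels(n: int) -> int:
    return n * (n - 1) // 2

for n in (3, 4, 5, 10):
    print(f"{n} sites: {tunnels(n)} tunnels, {2 * tunnels(n)} endpoints")
# 3 sites: 3 tunnels, 6 endpoints
# 4 sites: 6 tunnels, 12 endpoints
# 5 sites: 10 tunnels, 20 endpoints
# 10 sites: 45 tunnels, 90 endpoints
```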
VPN throughput limitations
VPN gateways have throughput limits that are often lower than you expect. Industry reports note that organizations are "very much limited by the throughput of the various VPNs -- around 300 Mbps." AWS VPN connections support up to 1.25 Gbps per tunnel, but real-world throughput is often lower due to encryption overhead, MTU limitations, and the single-threaded nature of IPsec processing on many gateway implementations.
For bulk data transfer between agents (model weights, datasets, training outputs), VPN throughput becomes a bottleneck. For control-plane traffic (task delegation, status updates, coordination), the throughput is sufficient but the operational cost of maintaining VPN infrastructure for lightweight agent communication is disproportionate.
The skills and cost gap
Multi-cloud networking requires expertise in each cloud provider's networking model. AWS networking (VPCs, subnets, security groups, NACLs, route tables, transit gateways) differs from GCP networking (VPC networks, subnets, firewall rules, Cloud Router, Cloud NAT), which in turn differs from Azure networking (VNets, subnets, NSGs, route tables, Virtual WAN). A survey of cloud professionals found that 93% expressed concern about a cloud security skills shortage.
The cost is equally problematic. "The costs for multi-cloud are enormous -- support and operation cost easily more than doubles." VPN gateway hours, data transfer between regions, cross-cloud egress fees, and the engineering time to manage it all add up quickly. For agent communication that might transfer megabytes per day, you are paying for gigabit-class infrastructure.
Why VPNs Do Not Scale for Agent Communication
VPNs solve the wrong problem for AI agents. VPNs provide network-level connectivity: they make two remote networks appear as if they are on the same LAN. This is useful for applications that need to access databases, file shares, and services using IP addresses and ports. It is overkill for agents that need to exchange messages, delegate tasks, and stream events.
The mismatch shows up in several ways:
- All-or-nothing access: A VPN connects networks, not applications. Once the tunnel is up, any application on one network can reach any application on the other. This violates the principle of least privilege. Agent A needs to talk to Agent B, but the VPN gives Agent A access to everything on Agent B's network.
- Static configuration: VPN tunnels are configured with static endpoints. Agents are dynamic -- they spin up, move between machines, and shut down. Reconfiguring VPN routes every time an agent moves defeats the purpose of automation.
- No identity layer: VPNs authenticate at the network level (certificates, pre-shared keys), not at the application level. There is no concept of "Agent A trusts Agent B but not Agent C" at the VPN layer. You need a separate identity and access management system on top.
- Operational burden: VPN tunnels need monitoring, certificate rotation, failover configuration, and capacity planning. Each tunnel is a potential point of failure. For agent communication where reliability matters, the VPN infrastructure becomes the weakest link.
Virtual Addresses: One Identity Regardless of Cloud
Pilot Protocol assigns each agent a 48-bit virtual address in the format N:NNNN.HHHH.LLLL. This address has two components: a 16-bit network ID and a 32-bit node ID. The address is generated when the agent first starts and remains stable for the agent's lifetime, regardless of where it runs.
An agent on AWS us-east-1 might have address 1:0001.0000.0017. If you migrate that agent to GCP europe-west1, its address stays 1:0001.0000.0017. Other agents continue to reach it at the same address. The Pilot daemon handles re-registration with the registry when the physical endpoint changes, and peers automatically reconnect to the new location.
```
# Agent identity is portable across clouds

   AWS us-east-1            GCP europe-west1         Azure eastus2
+------------------+    +------------------+    +------------------+
| Agent: research  |    | Agent: analysis  |    | Agent: customer  |
| 1:0001.0000.0017 |    | 1:0001.0000.0042 |    | 1:0001.0000.0063 |
| IP: 10.0.1.15    |    | IP: 10.128.0.5   |    | IP: 10.1.0.8     |
+------------------+    +------------------+    +------------------+
         |                       |                       |
         +-----------------------+-----------------------+
                      Pilot overlay network
                 (same virtual address space)
                  (direct encrypted tunnels)
```
The physical IP addresses (10.0.1.15, 10.128.0.5, 10.1.0.8) are cloud-specific and can change. The Pilot addresses are stable. Your application code references Pilot addresses, not IP addresses. This decouples agent identity from infrastructure, which is exactly what you need for multi-cloud deployment.
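The address format is compact enough to pack and unpack by hand. The sketch below assumes that in the textual form `N:NNNN.HHHH.LLLL` the three hex groups encode the 16-bit network ID and the 32-bit node ID (high and low halves), with the decimal network ID repeated before the colon; this is our reading of the examples above, not a normative specification:

```python
# Hypothetical pack/unpack for the 48-bit Pilot virtual address.
# Assumption: "N:NNNN.HHHH.LLLL" = decimal network ID, then hex
# 16-bit network ID and 32-bit node ID (high.low). Illustrative only.

def parse(addr: str) -> tuple[int, int]:
    _, hex_part = addr.split(":")
    net, hi, lo = (int(g, 16) for g in hex_part.split("."))
    return net, (hi << 16) | lo  # (network ID, 32-bit node ID)

def fmt(net: int, node: int) -> str:
    return f"{net}:{net:04x}.{node >> 16:04x}.{node & 0xFFFF:04x}"

net, node = parse("1:0001.0000.0017")
print(net, node)       # 1 23
print(fmt(net, node))  # 1:0001.0000.0017
```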
Example: Agent on AWS Talks to Agent on GCP
Here is a complete walkthrough. We will deploy one agent on an AWS EC2 instance in us-east-1 and another on a GCP Compute Engine instance in europe-west1, and have them communicate.
Step 1: Deploy rendezvous server
The rendezvous server can run on any cloud (or on-premises). It handles address resolution and NAT traversal signaling. For this example, we will run it on the GCP instance, but it could run anywhere.
```bash
# On GCP instance (or any machine with a public IP)
go install github.com/TeoSlayer/pilotprotocol/cmd/pilotctl@latest

# Start rendezvous with registry on port 9000, beacon on port 9001
pilotctl rendezvous start \
  --registry-addr 0.0.0.0:9000 \
  --beacon-addr 0.0.0.0:9001 \
  --persist /var/lib/pilot/registry.json
```
Firewall rules: Open TCP port 9000 (registry) and UDP port 9001 (beacon) on the rendezvous server. Open UDP port 4000 on each agent VM (tunnel port). These are the only firewall rules needed -- three ports total, regardless of how many agents you deploy.
Step 2: Start agent on AWS
```bash
# On AWS EC2 instance (us-east-1)
go install github.com/TeoSlayer/pilotprotocol/cmd/pilotctl@latest

# Start daemon pointing at rendezvous
pilotctl daemon start \
  --registry <rendezvous-ip>:9000 \
  --beacon <rendezvous-ip>:9001 \
  --endpoint <aws-public-ip>:4000

# Join network and set identity
pilotctl join 1
pilotctl set-hostname research-aws
pilotctl set-visibility public
pilotctl tags set cloud=aws,region=us-east-1,role=research
```
Step 3: Start agent on GCP
```bash
# On GCP Compute Engine instance (europe-west1)
go install github.com/TeoSlayer/pilotprotocol/cmd/pilotctl@latest

# Start daemon
pilotctl daemon start \
  --registry <rendezvous-ip>:9000 \
  --beacon <rendezvous-ip>:9001 \
  --endpoint <gcp-public-ip>:4000

# Join same network
pilotctl join 1
pilotctl set-hostname analysis-gcp
pilotctl set-visibility public
pilotctl tags set cloud=gcp,region=europe-west1,role=analysis
```
Step 4: Establish trust and communicate
```bash
# From AWS agent: discover and trust GCP agent
pilotctl resolve analysis-gcp
# 1:0001.0000.0042
pilotctl trust request 1:0001.0000.0042 \
  --justification "Cross-cloud research collaboration"

# From GCP agent: approve trust
pilotctl trust approve 1:0001.0000.0017

# Communicate: send a message
pilotctl send-message 1:0001.0000.0042 "Analyze this dataset"

# Send a file
pilotctl send-file 1:0001.0000.0042 dataset.csv

# Submit a task
pilotctl task submit 1:0001.0000.0042 \
  --description "Run sentiment analysis on Q1 customer feedback"

# Benchmark the connection
pilotctl echo 1:0001.0000.0042
```
That is the complete setup. Two `go install` commands, two `daemon start` commands, one trust handshake, and the agents are communicating across clouds with end-to-end encryption. No VPN tunnels. No cloud interconnect products. No firewall rules beyond the three ports.
NAT Traversal Handles the Networking Automatically
The example above used VMs with public IPs and the --endpoint flag, which skips NAT traversal by registering a fixed public endpoint. But many cloud deployments use private-only VMs (no public IP) behind Cloud NAT or similar services. Pilot handles this automatically.
When an agent starts without the --endpoint flag, the daemon performs STUN discovery to determine its public-facing IP and port, plus the NAT type (Full Cone, Restricted Cone, Port-Restricted Cone, or Symmetric). Based on the NAT types of both peers, the connection strategy is selected automatically:
| Agent A NAT | Agent B NAT | Strategy |
|---|---|---|
| Public IP / Full Cone | Any | Direct connection |
| Restricted Cone | Restricted Cone | Hole-punching via beacon |
| Port-Restricted Cone | Port-Restricted Cone | Hole-punching via beacon |
| Symmetric | Symmetric | Relay through beacon |
| Symmetric | Non-symmetric | Hole-punching (may succeed) |
Cloud NAT services (AWS NAT Gateway, GCP Cloud NAT, Azure NAT Gateway) typically implement Port-Restricted Cone or Symmetric NAT. Between two agents behind different cloud NATs, Pilot attempts hole-punching first. If hole-punching fails (Symmetric NAT on both sides), it falls back to relay through the beacon. The fallback is automatic -- your application code does not change.
```bash
# Agent behind AWS NAT Gateway (no public IP)
pilotctl daemon start \
  --registry <rendezvous-ip>:9000 \
  --beacon <rendezvous-ip>:9001
# No --endpoint flag: STUN discovery handles it

# Agent behind GCP Cloud NAT (no public IP)
pilotctl daemon start \
  --registry <rendezvous-ip>:9000 \
  --beacon <rendezvous-ip>:9001
# Same: STUN discovery, automatic hole-punching or relay
```
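The strategy table above can be sketched as a small selection function (a simplified illustration; the enum and function names are ours, not Pilot's actual API):

```python
# Sketch of NAT-type-based connection strategy selection, mirroring
# the table above. Names are illustrative, not Pilot's API.
from enum import Enum

class Nat(Enum):
    PUBLIC = "public"                  # public IP or Full Cone
    RESTRICTED = "restricted"          # Restricted Cone
    PORT_RESTRICTED = "port-restricted"
    SYMMETRIC = "symmetric"

def pick_strategy(a: Nat, b: Nat) -> str:
    # Either side directly reachable: connect directly.
    if Nat.PUBLIC in (a, b):
        return "direct"
    # Both symmetric: mapped ports are unpredictable, relay via beacon.
    if a is Nat.SYMMETRIC and b is Nat.SYMMETRIC:
        return "relay"
    # Otherwise: attempt UDP hole-punching via the beacon
    # (with relay as the runtime fallback if punching fails).
    return "hole-punch"

print(pick_strategy(Nat.PORT_RESTRICTED, Nat.PORT_RESTRICTED))  # hole-punch
print(pick_strategy(Nat.SYMMETRIC, Nat.SYMMETRIC))              # relay
```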
Performance: Direct Tunnels vs VPN Overhead
Pilot's UDP tunnels introduce less overhead than VPN tunnels for agent communication patterns. Here is why.
A VPN tunnel encapsulates IP packets inside encrypted IP packets. Each packet gets an outer IP header (20 bytes), a UDP or ESP header (8-24 bytes), and encryption overhead (16-32 bytes for AES-GCM). This reduces the effective MTU and can cause fragmentation, especially for larger payloads. VPN gateways also introduce an extra network hop, adding latency.
Pilot tunnels encapsulate application data directly in UDP packets with a 34-byte Pilot header and AES-256-GCM encryption overhead (16-byte auth tag + 12-byte nonce). There is no IP-in-IP encapsulation because Pilot is an overlay network, not a VPN -- it does not route arbitrary IP traffic, only agent communication. This means less overhead per packet and no fragmentation issues with standard 1500-byte MTUs.
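As a back-of-the-envelope check of those figures (using only the byte counts quoted above):

```python
# Per-packet byte budget for a Pilot tunnel on a standard 1500-byte
# MTU link, using the figures quoted above: 34-byte Pilot header,
# 16-byte AES-256-GCM auth tag, 12-byte nonce. The outer IP and UDP
# headers carry the tunnel packet itself.
MTU = 1500
IP_HDR, UDP_HDR = 20, 8
PILOT_HDR, GCM_TAG, GCM_NONCE = 34, 16, 12

pilot_overhead = PILOT_HDR + GCM_TAG + GCM_NONCE   # 62 bytes
payload = MTU - IP_HDR - UDP_HDR - pilot_overhead  # 1410 bytes usable

print(pilot_overhead, payload)  # 62 1410
```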
For agent communication patterns (small messages, task payloads, event streams), the difference is measurable:
| Metric | Pilot Tunnel | IPsec VPN | WireGuard |
|---|---|---|---|
| Per-packet overhead | 62 bytes | 58-76 bytes | 60 bytes |
| Connection setup | 1 RTT (existing tunnel) | 2-4 RTT (IKE) | 1 RTT |
| Additional hops | 0 (direct P2P) | 2 (gateway each side) | 0-1 |
| NAT traversal | Built-in (automatic) | NAT-T (UDP encap) | Built-in (manual config) |
| Per-agent identity | Yes (Ed25519) | No (network-level) | Yes (Curve25519) |
| Discovery | Registry + tags | None (static config) | None (static config) |
The key performance advantage is not per-packet overhead (which is similar across all three). It is the elimination of the gateway hop and the zero-configuration NAT traversal. With VPN, traffic routes through gateway VMs that become bottlenecks. With Pilot, traffic flows directly between agent VMs over peer-to-peer tunnels.
Comparison: Pilot vs Tailscale vs ZeroTier vs Site-to-Site VPN
| Feature | Pilot Protocol | Tailscale | ZeroTier | Site-to-Site VPN |
|---|---|---|---|---|
| Designed for | AI agent communication | Device connectivity | Virtual networking | Network interconnect |
| Identity model | Per-agent Ed25519 | Per-device (SSO/OAuth) | Per-device (ZT identity) | Per-network (certs/PSK) |
| Trust model | Mutual handshake + justification | ACL policy (centralized) | Network membership | Network-level (all-or-nothing) |
| NAT traversal | STUN + hole-punch + relay | DERP relay servers | Root servers + relay | NAT-T (manual config) |
| Control plane | Self-hosted registry | Tailscale coordination (cloud) | ZeroTier Central (cloud) | Self-managed |
| Agent features | Tasks, events, files, trust, reputation | None (network only) | None (network only) | None (network only) |
| Self-hostable | Yes (fully) | Partial (Headscale) | Partial (self-hosted controller) | Yes |
| Pricing | Free (open source) | Free tier, paid plans | Free tier, paid plans | Per-tunnel-hour (cloud) |
Tailscale and ZeroTier are excellent products for connecting devices and services across networks. They solve the connectivity problem well. But they are general-purpose network tools, not agent communication platforms. They provide a tunnel -- you still need to build agent discovery, trust management, task delegation, event streaming, and reputation tracking on top.
Pilot provides all of these as built-in services on well-known ports. An agent on Pilot can discover peers by tags, establish trust with justifications, delegate tasks (port 1003), stream events (port 1002), exchange files (port 1001), and build reputation through task completion -- all without additional infrastructure. The networking is just the foundation.
Cost Comparison
The cost difference for multi-cloud agent communication is significant:
| Component | Pilot Protocol | Cloud VPN | Tailscale |
|---|---|---|---|
| Software | Free (open source) | Included with cloud | Free for 3 users, $6/user/mo |
| VPN gateway hours | $0 | ~$0.05/hr per tunnel (~$36/mo) | $0 |
| 3 clouds, 3 tunnels | $0 | ~$108/mo | $0 (relay via Tailscale cloud) |
| Rendezvous server | 1 small VM (~$5/mo) | N/A | N/A (Tailscale coordination) |
| Data transfer | Cloud egress only | Cloud egress + VPN processing | Cloud egress only |
| Engineering time | Low (2 commands per agent) | High (per-cloud config) | Low (install + join) |
| Agent features | Included | Build yourself | Build yourself |
For 3 clouds with site-to-site VPN, you are paying approximately $108/month in VPN gateway hours alone, before data transfer. For 5 clouds, it is $360/month (10 tunnels). For 10 sites, it is $1,620/month (45 tunnels). And each tunnel requires configuration, monitoring, and maintenance.
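Those monthly figures follow directly from the full-mesh tunnel count and the ~$36/month-per-tunnel estimate above:

```python
# Monthly VPN gateway cost for a full mesh of N sites, using the
# ~$36/month-per-tunnel figure from the table above (an estimate,
# before data transfer charges).
def monthly_vpn_cost(sites: int, per_tunnel: float = 36.0) -> float:
    return sites * (sites - 1) / 2 * per_tunnel

for n in (3, 5, 10):
    print(n, monthly_vpn_cost(n))
# 3 108.0
# 5 360.0
# 10 1620.0
```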
Pilot requires one rendezvous server (a small VM, ~$5/month) regardless of how many agents or clouds you connect. Adding a new cloud means installing Pilot on the new agent and running `pilotctl join`. No VPN configuration, no firewall rules beyond port 4000 UDP, no cloud-specific networking setup.
Scaling Beyond Three Clouds
The real advantage of Pilot for multi-cloud appears as you scale. Adding agents does not require new tunnels or configuration changes. Each new agent:
- Installs `pilotctl`
- Starts the daemon pointing at the rendezvous server
- Joins the network
- Establishes trust with the specific peers it needs to communicate with
```bash
# Add a new agent on Azure (or any cloud, or on-premises, or a laptop)
go install github.com/TeoSlayer/pilotprotocol/cmd/pilotctl@latest
pilotctl daemon start \
  --registry <rendezvous-ip>:9000 \
  --beacon <rendezvous-ip>:9001
pilotctl join 1
pilotctl set-hostname new-agent-azure
pilotctl tags set cloud=azure,region=eastus2,role=customer-support

# Discover peers by tag
pilotctl discover --tag role=research
# 1:0001.0000.0017  research-aws  [cloud=aws, region=us-east-1, role=research]

# Establish trust with specific agents (not entire networks)
pilotctl trust request 1:0001.0000.0017 --justification "Cross-cloud task delegation"
```
There are no VPN tunnels to add. No routing tables to update. No firewall rules to modify. No cloud-specific configuration. The agent connects to the rendezvous server, discovers peers, and establishes direct encrypted tunnels. Whether the peer is on AWS, GCP, Azure, Oracle Cloud, a Raspberry Pi, or a laptop on a coffee shop WiFi network, the process is identical.
This is what "cloud-agnostic" actually means for agent networking. Not "works on multiple clouds with per-cloud configuration" but "works on any network with the same two commands."
Getting Started
Deploy agents across any combination of clouds in under 10 minutes:
```bash
# 1. Deploy rendezvous (any machine with a public IP)
go install github.com/TeoSlayer/pilotprotocol/cmd/pilotctl@latest
pilotctl rendezvous start --registry-addr 0.0.0.0:9000 --beacon-addr 0.0.0.0:9001

# 2. On each agent (any cloud, any location)
go install github.com/TeoSlayer/pilotprotocol/cmd/pilotctl@latest
pilotctl daemon start --registry <rendezvous-ip>:9000 --beacon <rendezvous-ip>:9001
pilotctl join 1

# 3. Agents discover each other and communicate
pilotctl discover --tag role=analysis
pilotctl trust request <peer-address> --justification "Multi-cloud collaboration"
pilotctl send-message <peer-address> "Task: analyze Q1 data"
```
No VPN. No cloud interconnect. No per-cloud networking configuration. One overlay network that spans all of them.
Try Pilot Protocol
Connect agents across any cloud with two commands per agent. No VPN tunnels, no cloud interconnect, no networking expertise required.
View on GitHub