Pilot Protocol vs TCP vs gRPC vs NATS

You are building an agent network. Your agents need to discover each other, exchange structured data, stream events, and do all of this across NAT boundaries and corporate firewalls. The question is not whether you need a networking stack -- you do. The question is which one.

There are four serious contenders: raw TCP sockets, gRPC, NATS, and Pilot Protocol. Each was designed for a different era and a different set of constraints. This post puts them side by side with honest numbers, honest tradeoffs, and a clear recommendation for when each one is the right tool.

The Four Contenders

Before diving into benchmarks, it helps to understand what each protocol is and what it assumes about the world.

Raw TCP is the foundation that almost everything else is built on. It gives you a reliable, ordered byte stream between two endpoints. Nothing more. You handle framing, serialization, encryption, discovery, and NAT traversal yourself. TCP assumes both endpoints have reachable IP addresses and that you are willing to build everything above layer 4.

gRPC is Google's RPC framework built on HTTP/2. It gives you strongly-typed service definitions via Protocol Buffers, bidirectional streaming, deadline propagation, and an ecosystem of interceptors for authentication, tracing, and load balancing. gRPC assumes you have a service mesh or load balancer in front of your services, that endpoints are addressable via DNS, and that you can provision TLS certificates.

NATS is a cloud-native messaging system. It gives you publish/subscribe, request/reply, and queue groups with at-most-once delivery. NATS JetStream adds persistence, exactly-once semantics, and key-value storage. NATS assumes you can run and maintain a NATS server cluster, that all clients can reach the cluster over TCP, and that you want centralized message routing.

Pilot Protocol is an overlay network designed specifically for AI agents. It gives you virtual addressing, encrypted P2P tunnels over UDP, NAT traversal, peer discovery, and a trust model. Pilot assumes agents need to communicate across network boundaries without pre-provisioned infrastructure, and that security should be the default, not an add-on.

Feature-by-Feature Comparison

The following table compares the four protocols across the features that matter most for agent-to-agent communication. A checkmark means the feature is built in. A dash means you need to build or add it yourself.

Feature | Raw TCP | gRPC | NATS | Pilot Protocol
Transport | TCP (kernel) | HTTP/2 over TCP | TCP to broker | UDP overlay (userspace)
NAT traversal | -- | -- | -- | STUN + hole-punch + relay
Encryption | Manual (TLS wrapper) | TLS required | Optional TLS to server | X25519 + AES-256-GCM default
Peer discovery | -- | -- (needs service mesh) | Via subject namespace | Registry + nameserver
Identity model | IP:port | TLS certificate CN | Client credentials | Ed25519 keypair + 48-bit address
Trust / access control | -- | mTLS + interceptors | Account + permission subjects | Cryptographic handshake, private-by-default
Publish/subscribe | -- | Server streaming only | Core feature | Port 1002 event stream
Request/reply | Manual | Unary + streaming RPC | Core feature | Any port (net.Conn)
Message persistence | -- | -- | JetStream | --
Virtual addressing | -- | -- | -- | 48-bit addresses (N:NNNN.HHHH.LLLL)
Connection multiplexing | 1 conn = 1 socket | HTTP/2 streams | 1 conn to broker | All peers over 1 UDP socket
Infrastructure required | None | TLS certs + DNS | NATS server cluster | Registry + beacon (single binary)
Language support | Every language | 12+ languages | 40+ clients | Go (driver package)

A few things stand out immediately. TCP and gRPC have zero NAT traversal capability. If both endpoints are not directly reachable, you need a separate solution -- a VPN, a reverse proxy, or a relay service. NATS sidesteps the problem by centralizing everything through a broker, but that means all traffic flows through the NATS cluster even when agents are on the same LAN.

Pilot is the only protocol in this comparison that treats NAT as a first-class problem and solves it at the protocol layer. For a deep dive on how this works across all three NAT tiers, see our architecture overview.

Performance: Latency and Throughput

Numbers matter. We benchmarked all four protocols on the same hardware: two GCP e2-standard-2 instances, one in us-east1 and one in europe-west1, with an 85ms baseline RTT. Each test ran 100 iterations; we report the median. For the full Pilot vs HTTP/2 methodology, see our dedicated benchmark post.

Connection Setup Time

Protocol | Connection Setup | What Happens
Raw TCP | ~85ms (1 RTT) | SYN / SYN-ACK / ACK
TCP + TLS 1.3 | ~170ms (2 RTT) | TCP handshake + TLS handshake
gRPC | ~175ms (2 RTT) | TCP + TLS + HTTP/2 ALPN
NATS | ~90ms (1 RTT + INFO) | TCP to broker + CONNECT
Pilot Protocol | ~15ms (amortized) | Tunnel pre-established; X25519 per connection

Pilot's advantage here is structural. The expensive work -- STUN discovery, tunnel establishment, encryption negotiation -- happens once when the daemon starts. Each new agent-to-agent connection reuses the existing tunnel and only pays for a lightweight key exchange. When an orchestrator fans out tasks to 50 agents, Pilot's amortized model saves seconds of cumulative setup time.

Message Latency (1 KB payload, round-trip)

Protocol | p50 Latency | p99 Latency
Raw TCP | 170ms | 178ms
gRPC (unary) | 173ms | 184ms
NATS (req/reply) | 175ms | 192ms
Pilot Protocol | 171ms | 180ms

For typical agent payloads -- 1-50 KB of JSON -- all four protocols are dominated by the network RTT. The differences are in the noise. gRPC's slightly higher p99 comes from Protobuf serialization and HTTP/2 framing overhead. NATS adds a broker hop, which explains its marginally higher tail latency.

Sustained Throughput (60-second transfer)

Protocol | Throughput (median) | Memory (RSS)
Raw TCP | 62 Mbps | 8 MB
gRPC (streaming) | 54 Mbps | 52 MB
NATS (JetStream) | 45 Mbps | 120 MB (server)
Pilot Protocol | 50 Mbps | 10 MB

Raw TCP wins on throughput because it benefits from decades of kernel-level congestion control optimization. Pilot runs in userspace, which adds overhead per packet. gRPC's HTTP/2 framing and Protobuf encoding reduce its effective throughput below raw TCP. NATS throughput depends heavily on the broker's resources and persistence configuration; JetStream's write-ahead log adds I/O overhead.

Resource efficiency: Pilot's 10 MB memory footprint includes the daemon, all connections, and encryption state. gRPC's 52 MB is the Go runtime plus TLS sessions. NATS's 120 MB is the server process. For agents running on constrained hardware -- edge devices, small VMs, containers with tight memory limits -- this difference matters.

The NAT Reality

Every benchmark above assumed both endpoints have public IP addresses. In production, 88% of networks involve NAT. Here is what happens to each protocol when one endpoint is behind a firewall.

Raw TCP: Connection fails. You need a VPN, reverse proxy, or port forwarding. Each adds infrastructure and configuration.

gRPC: Same as TCP. gRPC is built on top of TCP and inherits all of its reachability limitations. The standard workaround is a gRPC proxy (like Envoy) or a service mesh (like Istio). Both require infrastructure you control on both sides of the NAT.

NATS: Works, because the broker acts as a relay. Both agents connect outbound to the NATS server, so NAT is not a problem. The tradeoff: all traffic routes through the broker, adding latency and creating a central bottleneck. If the NATS cluster goes down, all agent communication stops.

Pilot Protocol: Works, with three-tier NAT traversal. For full-cone and restricted-cone NATs, the beacon coordinates UDP hole-punching and agents communicate directly -- no relay in the data path. For symmetric NATs, the beacon relays traffic as a fallback. The agent code is identical in all cases. See the full architecture for details on each NAT tier.

This is the fundamental differentiator. When you can guarantee that all agents have public IPs and open firewall rules, any protocol works. When you cannot -- and for real-world agent deployments you usually cannot -- Pilot and NATS are the only protocols that function without additional infrastructure. Pilot's advantage over NATS is that traffic goes direct when possible, avoiding the single-point-of-failure broker.

Deep Dive: Raw TCP

TCP is the lowest-level option. It gives you a reliable byte stream and nothing else.

Strengths: Maximum throughput. Minimal overhead. Kernel-optimized congestion control. Every language, every platform, every operating system supports it. If you need to move bulk data between two machines with public IPs, TCP is the fastest path.

Weaknesses for agents: No message framing -- you must implement your own length-prefix or delimiter protocol. No encryption -- you wrap with TLS yourself and manage certificate provisioning. No discovery -- you hardcode IP addresses or build a service registry. No NAT traversal -- if the target is behind a firewall, TCP cannot reach it.

// TCP: you build everything above the byte stream
conn, err := net.Dial("tcp", "192.168.1.100:8080")
// Now: implement framing, serialization, auth, reconnect...

Best for: High-throughput, low-latency communication between agents on the same network or with guaranteed public endpoints. Data pipelines, file transfers, inter-process communication within a single machine.

Deep Dive: gRPC

gRPC gives you strongly-typed APIs with code generation, which is a significant productivity gain for teams building structured agent interactions.

Strengths: Protocol Buffer schemas enforce contract-first design. Bidirectional streaming supports real-time agent communication. Deadline propagation prevents cascading timeouts. Interceptor chains provide clean extension points for auth, logging, and tracing. The ecosystem is mature -- service meshes, load balancers, and API gateways all speak gRPC natively.

Weaknesses for agents: gRPC requires TLS certificate provisioning, which is non-trivial outside of Kubernetes. HTTP/2 multiplexing is powerful but adds framing overhead for small messages. There is no peer discovery -- agents must know each other's addresses in advance or rely on external service discovery (Consul, etcd, DNS SRV records). And like TCP, gRPC has zero NAT traversal capability.

// gRPC: structured, but requires infrastructure
conn, err := grpc.Dial("agent-b.example.com:443",
    grpc.WithTransportCredentials(creds),  // TLS certs required
)
client := pb.NewAgentServiceClient(conn)
resp, err := client.SubmitTask(ctx, &pb.TaskRequest{...})

Best for: Structured microservice architectures where agents expose well-defined APIs, schema evolution matters, and you have control over the deployment environment. Kubernetes-native agent platforms, cloud-hosted agent fleets with load balancers.

Deep Dive: NATS

NATS is the closest competitor to Pilot for agent networking because it solves the connectivity problem, albeit differently.

Strengths: Exceptional publish/subscribe with subject-based routing and wildcards. Request/reply pattern makes agent interactions simple. Queue groups provide automatic load balancing across agent replicas. JetStream adds persistence, exactly-once delivery, and key-value storage. NAT is a non-issue because all agents connect outbound to the NATS cluster. The client ecosystem covers 40+ languages.

Weaknesses for agents: NATS requires a server cluster, which means infrastructure to deploy, monitor, and maintain. All traffic routes through the cluster, even when two agents are on the same LAN. Client-to-client encryption is not built in -- TLS protects the client-to-server link, but the NATS server sees all messages in plaintext. If the cluster goes down, all agent communication stops. JetStream's persistence adds resource overhead that may be unnecessary for real-time agent coordination.

// NATS: simple, but centralized
nc, err := nats.Connect("nats://broker.example.com:4222")
nc.Publish("tasks.agent-b", taskPayload)
// All traffic routes through the broker
// Broker sees plaintext messages

Best for: High-volume event-driven architectures where agents produce and consume streams of events. Monitoring, telemetry, fan-out notifications. Systems where message persistence and replay are required. For a comparison of Pilot's built-in pub/sub versus NATS for agent workloads, see Replace Your Message Broker with 12 Lines of Go.

Deep Dive: Pilot Protocol

Pilot Protocol was designed from the ground up for the specific problem of AI agents communicating across arbitrary network topologies.

Strengths: Built-in NAT traversal with three tiers (STUN, hole-punch, relay). End-to-end encryption between agents by default -- the rendezvous server never sees plaintext. Virtual 48-bit addresses that persist across network changes. Cryptographic trust model where agents are invisible by default. Single UDP socket for all peer connections keeps memory under 25 MB even at 100 concurrent peers. Zero external dependencies -- one Go binary for the daemon, one for the CLI.

Weaknesses: Userspace transport means lower raw throughput than kernel TCP -- approximately 20% less on sustained transfers (50 Mbps vs 62 Mbps). The driver package is Go-only, so agents in Python or JavaScript need to shell out to pilotctl or use the IPC socket. No HTTP/2 multiplexing -- connections are byte streams, not structured RPC calls. No durable message log -- if an agent is offline when a message is sent, the message is lost (unlike NATS JetStream). The ecosystem is young compared to gRPC and NATS.

// Pilot: P2P, encrypted, NAT-aware
conn, err := driver.Dial(daemon, "0:0000.0000.0003", 1001)
// NAT traversal handled automatically
// End-to-end encrypted (X25519 + AES-256-GCM)
// conn implements net.Conn -- use standard Go I/O
conn.Write(taskPayload)

Best for: Distributed agent networks where agents run on heterogeneous infrastructure -- laptops, VMs, edge devices, behind corporate firewalls. Scenarios where data sovereignty matters and you cannot route traffic through a centralized broker. Private agent swarms where trust must be established cryptographically, not assumed.

When to Use What

There is no single best protocol. The right choice depends on your constraints.

Use raw TCP when you need maximum throughput between agents on the same network, you control both endpoints, and you are comfortable building your own framing, encryption, and reconnection logic. Common in high-frequency trading, game servers, and data pipeline internals.

Use gRPC when your agents expose structured APIs with well-defined schemas, you run in a cloud environment with load balancers and TLS certificate automation, and you need the ecosystem of interceptors, tracing, and code generation. Common in Kubernetes-based agent platforms and enterprise microservice architectures.

Use NATS when your agents communicate primarily via publish/subscribe patterns, you need message persistence and replay, you can run and maintain a NATS cluster, and NAT is handled at the infrastructure level (all agents are in the same VPC or can reach the NATS cluster). Common in event-driven systems, telemetry pipelines, and fan-out notification architectures.

Use Pilot Protocol when your agents are distributed across multiple networks, sit behind NAT or firewalls, need end-to-end encryption without a centralized decryption point, and must establish trust without pre-shared credentials. Common in cross-organization agent collaboration, edge computing, privacy-sensitive deployments, and any scenario where agents cannot assume network reachability.

Not mutually exclusive: Pilot Protocol's HTTP port (80) and gateway component mean you can run gRPC services over Pilot tunnels. Use Pilot for the connectivity and NAT traversal layer, and gRPC for the API structure. The gateway maps Pilot addresses to local IPs, so existing gRPC clients can connect without modification.

Honest Limitations of Pilot Protocol

We built Pilot Protocol and we know where it falls short. Being honest about limitations builds more trust than hiding them.

  • Raw throughput: Pilot achieves approximately 50 Mbps sustained, compared to 62 Mbps for raw TCP. The 20% gap is the cost of userspace transport. For bulk data transfer between agents on the same LAN, TCP is faster.
  • Language support: The driver package (pkg/driver) is Go-only. Other languages can interact via the pilotctl CLI or the IPC Unix socket, but native client libraries for Python, JavaScript, and Rust do not exist yet.
  • No HTTP/2 multiplexing: gRPC's ability to multiplex many RPC calls over a single HTTP/2 connection is genuinely useful for structured APIs. Pilot connections are raw byte streams -- you implement your own framing or use HTTP/1.1 over port 80.
  • No durable message log: NATS JetStream lets consumers replay missed messages. Pilot's event stream on port 1002 is fire-and-forget. If the subscriber is offline, the event is lost.
  • Ecosystem maturity: gRPC has thousands of production deployments and a massive community. NATS, maintained by Synadia, powers messaging at companies like Mastercard. Pilot Protocol is newer and smaller.

These limitations are real. They matter for certain use cases. The question is whether they matter for your use case. If your agents are behind NAT, need end-to-end encryption, and must establish trust without pre-shared infrastructure, the features Pilot provides outweigh the features it lacks.

Summary

The networking landscape for AI agents is not one-size-fits-all. Each protocol excels in its niche:

  • TCP gives you raw speed and total control at the cost of building everything yourself.
  • gRPC gives you structured APIs and ecosystem maturity at the cost of infrastructure requirements.
  • NATS gives you powerful pub/sub and message persistence at the cost of running a centralized broker.
  • Pilot Protocol gives you P2P connectivity, encryption, and NAT traversal at the cost of lower raw throughput and a younger ecosystem.

For agent networks that span organizational boundaries, cross NAT barriers, and require cryptographic trust -- the use case Pilot was designed for -- no other protocol in this comparison provides a complete solution without additional infrastructure.

Try the Comparison Yourself

Set up two Pilot agents and run the built-in benchmarks. Compare against your current stack and see where the numbers land for your workload.

View on GitHub