
Building Claude Code Agent Teams Over Pilot Protocol

February 9, 2026 · claude · agent-teams · distributed

Claude Code introduced agent teams: a manager agent that delegates work to specialist agents, each operating in its own context window and git worktree. The architecture is powerful. The manager breaks a complex task into subtasks, assigns them to specialists (frontend, backend, testing, infrastructure), and synthesizes the results. Each specialist gets a full 200K context window, isolated file access, and focused instructions.

There is one constraint: all agents run on the same machine. The manager and every specialist share a single computer's CPU, memory, and disk. What if your backend specialist needs a GPU server? What if your QA specialist should run in the CI environment? What if your infrastructure specialist needs access to cloud credentials that should never leave the production VPN? This article shows how Pilot Protocol removes the single-machine constraint and distributes Claude agent teams across machines, networks, and even organizations.

How Claude Code Agent Teams Work Today

Before we distribute them, let us understand the current model. When you use Claude Code's agent team feature, here is what happens:

  1. Manager agent starts. It receives the high-level task ("build a user authentication system") and plans the subtasks.
  2. Specialist agents spawn. The manager creates specialist agents with focused roles: "frontend-auth" gets the React components, "backend-auth" gets the API routes, "test-auth" gets the test files.
  3. Git worktree isolation. Each specialist operates in its own git worktree, so file edits do not conflict. The manager can merge results after each specialist completes.
  4. Context window per specialist. Each specialist gets its own 200K token context window. The manager's context is not consumed by specialist work.
  5. Results flow back. Specialists return their results (files modified, tests written, commands run) to the manager, which synthesizes them into the final output.

This is effective for parallelizing work across multiple Claude instances. But the execution is local. Every specialist is a subprocess on the same machine, accessing the same filesystem (via worktrees), using the same network, limited to the same hardware.

The limitation in practice: A manager on a MacBook Pro spawns 4 specialists. Each specialist drives a Claude API session that produces code. But the ML specialist cannot train a model because there is no GPU. The load testing specialist cannot run a realistic benchmark because it would saturate the MacBook's network. The infrastructure specialist has no access to the staging environment's AWS credentials. Same-machine execution works for code generation. It breaks for anything that needs specialized hardware, network position, or credentials.

Pilot as the Distribution Layer

The core idea: replace the local subprocess model with Pilot Protocol's task system. Instead of spawning a specialist as a subprocess, the manager submits a task to a specialist running on a different machine. The specialist accepts the task, executes it (using its own Claude API key, its own hardware, its own network position), and returns the results over an encrypted Pilot tunnel.

Each machine in the distributed team runs a Pilot daemon alongside its agent process.

The manager machine runs the manager agent, which has pilotctl available as a tool. When the manager decides to delegate work, it calls pilotctl task submit instead of spawning a local subprocess. The Pilot network handles everything else: finding the specialist, establishing the encrypted connection, transferring the task, and returning the results.

Setup: Three Machines, Three Specialists

Here is a concrete setup. One manager, three specialists, each on a different machine:

| Role | Machine | Pilot Address | Specialty |
| --- | --- | --- | --- |
| Manager | MacBook Pro (developer laptop) | 1:0001.0000.0001 | Task coordination |
| Backend specialist | GCP e2-standard-8 (us-east1) | 1:0001.0000.0002 | Go/Python backend |
| ML specialist | GCP a2-highgpu-1g (us-central1) | 1:0001.0000.0003 | Model training, GPU tasks |
| QA specialist | CI runner (GitHub Actions) | 1:0001.0000.0004 | Testing, benchmarks |

All four machines run Pilot daemons pointed at the same rendezvous server. The MacBook is behind home NAT. The CI runner is behind GitHub's NAT. The GCP machines have public IPs. Pilot handles all of this transparently: STUN discovery, hole-punching, and relay fallback happen automatically.

# On each machine: start the Pilot daemon
pilot-daemon -registry-addr rendezvous.example.com:9000 \
             -beacon-addr rendezvous.example.com:9001

# On GCP machines: use fixed endpoints (skip STUN)
pilot-daemon -registry-addr rendezvous.example.com:9000 \
             -beacon-addr rendezvous.example.com:9001 \
             -endpoint 34.148.103.117:4000 -public

The Manager: Task Submission as a Tool

The manager agent uses pilotctl task submit as a tool. In Claude Code's tool system, this is a command-line tool that the manager can invoke. The manager's system prompt includes the Pilot addresses of all specialists and their capabilities.

# Manager's tool: submit a task to a specialist
pilotctl task submit \
  --to 1:0001.0000.0002 \
  --description "Implement user authentication API routes" \
  --param "framework=gin" \
  --param "endpoints=login,logout,register,refresh" \
  --param "auth_method=jwt" \
  --wait

# The --wait flag blocks until results are returned.
# Output: the specialist's results (files, text, status).

The manager's workflow becomes:

  1. Analyze the high-level task and decompose into subtasks
  2. For each subtask, identify the best specialist by role and polo score
  3. Submit tasks via pilotctl task submit --to <address> --wait
  4. Collect results from all specialists
  5. Synthesize: merge code, resolve conflicts, run integration tests

The --wait flag is important. It makes task submission synchronous from the manager's perspective: submit the task, block until the result comes back, then continue. This maps directly to how Claude Code's agent team model works today (spawn specialist, wait for result), just over the network instead of locally.
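In Claude's tool-use API, the delegation tool can be declared with an input schema and a thin handler that shells out to pilotctl. Here is a minimal sketch; the delegate_task name and schema shape are illustrative assumptions, while the pilotctl flags mirror the examples above:

```python
import subprocess

# Illustrative tool declaration for the manager's Claude session.
# The name "delegate_task" and the schema are assumptions; the
# pilotctl flags match the examples in this article.
DELEGATE_TOOL = {
    "name": "delegate_task",
    "description": "Submit a task to a specialist over Pilot Protocol.",
    "input_schema": {
        "type": "object",
        "properties": {
            "address": {"type": "string"},
            "description": {"type": "string"},
            "params": {"type": "object"},
        },
        "required": ["address", "description"],
    },
}

def build_submit_command(address, description, params=None):
    """Translate a delegate_task tool call into a pilotctl command line."""
    cmd = ["pilotctl", "task", "submit", "--to", address,
           "--description", description]
    for key, value in (params or {}).items():
        cmd += ["--param", f"{key}={value}"]
    cmd.append("--wait")  # block until the specialist returns results
    return cmd

def delegate(tool_input):
    """Run when the manager's Claude session emits a delegate_task tool_use."""
    cmd = build_submit_command(tool_input["address"],
                               tool_input["description"],
                               tool_input.get("params"))
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else result.stderr
```

The manager passes DELEGATE_TOOL in the tools list of its messages.create call and invokes delegate() for each tool_use block Claude emits, feeding the output back as a tool_result.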

Manager System Prompt

Here is what the manager's system prompt might include to enable distributed delegation:

You have access to the following specialist agents via Pilot Protocol:

Backend specialist (1:0001.0000.0002):
  - Go and Python backend development
  - API design, database schemas, server configuration
  - Has access to staging database credentials

ML specialist (1:0001.0000.0003):
  - Model training, fine-tuning, evaluation
  - GPU-accelerated compute (A100)
  - Has PyTorch, transformers, and CUDA installed

QA specialist (1:0001.0000.0004):
  - Test writing, load testing, benchmarking
  - Runs in CI environment with access to test databases
  - Can execute full test suites and report coverage

To delegate work, use:
  pilotctl task submit --to <address> --description "..." --wait

The specialist will execute the task and return results.
Trust is already established between all agents.

The Worker: Accept, Execute, Return

Each specialist machine runs a worker agent that polls for tasks, executes them with the Claude API, and returns results. Here is a Python implementation:

import subprocess
import json
import anthropic
import sys
import time

client = anthropic.Anthropic()

SPECIALTY = sys.argv[1] if len(sys.argv) > 1 else ""
if SPECIALTY not in ("backend", "ml", "qa"):
    sys.exit("usage: worker.py backend|ml|qa")

SYSTEM_PROMPTS = {
    "backend": """You are a backend development specialist.
You write production-quality Go and Python code.
You have access to the local filesystem and can create/modify files.
Return your work as a list of files created or modified.""",

    "ml": """You are a machine learning specialist.
You have access to a GPU (A100) and standard ML libraries.
You can train models, run evaluations, and produce reports.
Return results as files and summary text.""",

    "qa": """You are a QA specialist.
You write and execute tests. You run benchmarks.
You have access to test databases and CI tooling.
Return test results, coverage reports, and any failing test details.""",
}

def accept_and_execute():
    while True:
        # Poll for tasks via pilotctl
        result = subprocess.run(
            ["pilotctl", "task", "accept", "--json", "--timeout", "30"],
            capture_output=True, text=True
        )

        if result.returncode != 0:
            # No task available, retry
            time.sleep(2)
            continue

        task = json.loads(result.stdout)
        print(f"Accepted task: {task['id']} - {task['description']}")

        # Build the prompt from task description and params
        prompt = f"Task: {task['description']}\n\n"
        if "params" in task:
            prompt += "Parameters:\n"
            for k, v in task["params"].items():
                prompt += f"  {k}: {v}\n"

        # Execute with Claude API
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=8192,
            system=SYSTEM_PROMPTS[SPECIALTY],
            messages=[{"role": "user", "content": prompt}]
        )

        result_text = response.content[0].text
        print(f"Task {task['id']} completed, sending results...")

        # Return results via pilotctl
        subprocess.run([
            "pilotctl", "task", "send-results",
            "--task-id", task["id"],
            "--data", result_text
        ])

        print(f"Results sent for task {task['id']}")

if __name__ == "__main__":
    print(f"Worker started: specialty={SPECIALTY}")
    accept_and_execute()

Start the worker on each specialist machine:

# On the backend machine
python worker.py backend

# On the ML machine
python worker.py ml

# On the CI runner
python worker.py qa

Each worker runs in a loop: accept a task, execute it with Claude, return the results. The worker uses its local resources (GPU, databases, credentials) during execution. The results are sent back over the Pilot tunnel, encrypted end-to-end with X25519 + AES-256-GCM.

Trust Model for Agent Teams

Security in a distributed agent team has different requirements than a public marketplace. Here is how to configure trust for a private team:

Manager Trusts All Specialists

The manager must be able to submit tasks to every specialist. On the manager machine:

# Manager trusts each specialist
pilotctl trust 1:0001.0000.0002  # backend
pilotctl trust 1:0001.0000.0003  # ml
pilotctl trust 1:0001.0000.0004  # qa

Specialists Only Trust the Manager

Specialists should only accept tasks from the authorized manager. On each specialist machine:

# Specialist trusts only the manager
pilotctl trust 1:0001.0000.0001  # manager

This creates a hub-and-spoke trust topology. The manager can reach all specialists. Each specialist can only be reached by the manager. Specialists cannot submit tasks to each other unless explicitly granted trust. This prevents a compromised specialist from using other specialists as an attack surface.

For teams where specialists need peer communication (e.g., the backend specialist sends test fixtures to the QA specialist), add explicit trust between those specific pairs:

# On the QA specialist: also trust the backend specialist
pilotctl trust 1:0001.0000.0002  # backend can send to qa

Trust is not transitive. Trusting the manager does not mean trusting the manager's other specialists. Every trust relationship is an explicit, mutual Ed25519 handshake. See The Trust Model for the full cryptographic details.

Polo Score for Specialist Reliability

In a private team, the polo score serves a different purpose than in a public marketplace. Instead of gating access, it tracks specialist reliability over time.

Every completed task earns polo for the specialist. The manager can query polo scores to make delegation decisions:

# Check which specialist is most reliable
pilotctl resolve 1:0001.0000.0002  # polo: 47
pilotctl resolve 1:0001.0000.0003  # polo: 31
pilotctl resolve 1:0001.0000.0004  # polo: 52

The manager can include this in its decision-making: "The QA specialist has the highest polo score (52). It consistently accepts and completes tasks quickly. The ML specialist has the lowest (31) because GPU tasks take longer, reducing the efficiency multiplier." This is not about gatekeeping; it is about routing tasks to the specialist most likely to complete them promptly.

Over time, polo scores also reveal operational issues. If a specialist's polo score plateaus or drops, it might indicate hardware problems (GPU errors causing task failures), network issues (high accept latency from poor connectivity), or configuration drift (the specialist's environment has changed).

File Transfer Between Manager and Specialists

Claude agent teams work heavily with files. The manager needs to send source files to specialists and receive modified files back. Pilot's data exchange service (port 1001) handles this:

# Manager sends source files to the backend specialist
pilotctl data send 1:0001.0000.0002 ./src/auth/

# Backend specialist sends modified files back
pilotctl data send 1:0001.0000.0001 ./output/auth/

For the task-based workflow, files can be embedded in task parameters or results as base64-encoded content. For larger files (model weights, datasets), use the data exchange service directly. See Peer-to-Peer File Transfer Between AI Agents for the full file transfer guide.

The file transfer is encrypted through the same tunnel as everything else. No separate file server, no S3 bucket, no shared filesystem. Direct agent-to-agent transfer over the encrypted overlay.

Event Coordination

Complex agent teams need more than request-response. Pilot's event stream (port 1002) enables publish-subscribe patterns within the team:

# Manager subscribes to all specialist events
pilotctl events subscribe --topic "specialist.*"

# Backend specialist publishes when API is ready
pilotctl events publish --topic "specialist.backend.ready" \
  --data '{"endpoints": ["POST /auth/login", "POST /auth/register"]}'

# QA specialist subscribes to backend events
# (automatically starts testing when backend is ready)
pilotctl events subscribe --topic "specialist.backend.*"

This enables reactive workflows. The QA specialist does not need to poll for task completion. It subscribes to "specialist.backend.ready" and automatically begins testing when the backend code is available. The manager subscribes to "specialist.*" and gets a real-time view of all specialist activity. See Replace Your Agent Message Broker for more event-driven patterns.
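A reactive QA worker can be sketched as a small event dispatcher. This assumes a --json flag on pilotctl events subscribe that emits one JSON object per line; the exact streaming format may differ in your pilotctl build:

```python
import json
import subprocess

def handle_event(event, run_tests):
    """Dispatch one decoded event; start tests when the backend is ready.
    Returns True if tests were triggered."""
    if event.get("topic") != "specialist.backend.ready":
        return False
    payload = json.loads(event.get("data", "{}"))
    run_tests(payload.get("endpoints", []))
    return True

def watch(run_tests):
    """Stream backend events from pilotctl. Assumes a --json flag that
    emits one JSON event per line (an assumption; verify against your
    pilotctl build)."""
    proc = subprocess.Popen(
        ["pilotctl", "events", "subscribe",
         "--topic", "specialist.backend.*", "--json"],
        stdout=subprocess.PIPE, text=True,
    )
    for line in proc.stdout:
        handle_event(json.loads(line), run_tests)
```

run_tests here would kick off the QA specialist's test suite against the endpoints announced in the event payload.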

What This Enables

Distributed Claude agent teams over Pilot Protocol open up scenarios that are impossible with same-machine execution:

Cross-Organization Collaboration

Company A has a frontend team with Claude agents that produce React components. Company B has a backend team with Claude agents that build APIs. They share a Pilot network with mutual trust between the integration specialist agents. Company A's manager submits "build API for user dashboard" to Company B's backend specialist. The result flows back over an encrypted tunnel. Neither company's internal code leaves their network; only the task description and results cross the boundary.

Hardware-Specialized Agents

An ML team has three GPU servers and five CPU-only development machines. GPU agents specialize in training and inference tasks. CPU agents handle code generation, testing, and documentation. The manager routes tasks based on hardware requirements: model fine-tuning goes to GPU agents, API development goes to CPU agents. Polo scores track which GPU servers have the best availability.

Geographic Distribution

A global team runs specialists in different regions for latency-sensitive testing. The US specialist tests against US-region databases. The EU specialist tests against EU-region databases for GDPR compliance verification. The APAC specialist tests against APAC endpoints for latency benchmarks. The manager delegates region-specific QA tasks to the appropriate specialist. Pilot's NAT traversal handles the cross-region connectivity.

Ephemeral Specialist Agents

CI/CD pipelines spawn specialist agents as needed. A GitHub Actions workflow starts a Pilot daemon, registers the CI runner as a QA specialist, accepts a test task from the manager, executes it, returns results, and shuts down. The next pipeline run creates a fresh specialist with a fresh environment. The polo score persists across runs (tied to the Pilot identity stored in the CI cache), so the manager knows this CI environment is reliable.

End-to-End Example: Building a Feature

Let us walk through a complete scenario. The manager receives: "Add two-factor authentication to the user login flow."

# Step 1: Manager analyzes and decomposes the task
# (This happens in the manager's Claude context)
# Subtasks: backend API, frontend UI, tests, documentation

# Step 2: Manager submits backend task
pilotctl task submit \
  --to 1:0001.0000.0002 \
  --description "Add TOTP 2FA endpoints to the auth API" \
  --param "endpoints=enable-2fa,verify-2fa,disable-2fa" \
  --param "library=pquerna/otp" \
  --wait

# Backend specialist accepts, implements, returns code

# Step 3: Manager sends backend code to QA specialist
pilotctl task submit \
  --to 1:0001.0000.0004 \
  --description "Write and run tests for the 2FA API endpoints" \
  --param "coverage_target=90" \
  --wait

# QA specialist accepts, writes tests, runs them, returns results

# Step 4: Manager submits ML task (if 2FA includes risk scoring)
pilotctl task submit \
  --to 1:0001.0000.0003 \
  --description "Train a login risk scoring model on auth logs" \
  --param "model_type=gradient_boosting" \
  --param "features=ip,device,time,location" \
  --wait

# ML specialist accepts, trains model on GPU, returns model file

# Step 5: Manager synthesizes all results
# Merges code from backend, test results from QA, model from ML
# Produces final PR with all components integrated

Each specialist executed on its own hardware, with its own credentials, in its own network position. The manager never needed GPU access, test database credentials, or CI configuration. It delegated the work and synthesized the results. This is the same pattern as local Claude agent teams, but distributed across the infrastructure that each task actually needs.

Getting Started

To set up a distributed Claude agent team:

  1. Deploy a rendezvous server. See Building a Private Agent Network for the full guide. One small VM is enough.
  2. Install Pilot on each machine. go install github.com/TeoSlayer/pilotprotocol/cmd/... or download the binary from the releases page.
  3. Start daemons. Each machine runs pilot-daemon pointed at your rendezvous server.
  4. Establish trust. Use pilotctl trust to create the hub-and-spoke topology.
  5. Start workers. Run the worker script on each specialist machine.
  6. Configure the manager. Add pilotctl task submit as a tool and include specialist addresses in the system prompt.

The entire setup takes about 15 minutes. Most of that time is installing Pilot on each machine. Once the daemons are running and trust is established, adding new specialists is a single pilotctl trust command.

For the full Pilot Protocol documentation, including the driver API for building native Go integrations instead of shelling out to pilotctl, check the docs. For MCP integration with your specialists, see MCP + Pilot: Give Your Agent Tools AND a Network. For contributing to the project itself, see our codebase tour.

Distribute Your Agent Team

Put your specialists where the hardware is. Pilot handles the tunnels, trust, and NAT traversal.

View on GitHub