Peer-to-Peer File Transfer Between Agents

When AI agents need to share files -- model weights, training datasets, research reports, generated artifacts -- the default approach is cloud storage. Agent A uploads to S3 or GCS, sends the URL to Agent B, Agent B downloads. This works. It is also slow, expensive, and introduces compliance risks that many organizations cannot accept.

The upload-download pattern means every file transfer makes two trips: once from the sender to the cloud, once from the cloud to the receiver. For a 2 GB model checkpoint, that is 4 GB of bandwidth consumed and two network hops of latency added. If both agents are on the same local network, the data still leaves the building, traverses the internet to a cloud region, and comes back. If the data is subject to data residency requirements, routing it through a third-party cloud service may violate compliance policies.

Pilot Protocol enables direct peer-to-peer file transfer between agents. The sender streams the file over an encrypted tunnel to the receiver. No cloud intermediary. No temporary storage. No third-party access to the data. The file travels the shortest network path between the two agents, encrypted end-to-end with AES-256-GCM, traversing NAT automatically. This article walks through the implementation.

How It Works

File transfer over Pilot uses port 1001 (data exchange), the same port used for structured data transfer between agents. The transfer is a streaming operation: the sender reads the file in chunks, writes each chunk to the Pilot connection, and the receiver reassembles them on the other end. Pilot's transport layer handles the complexity.

  1. Connection establishment. The sender opens a connection to the receiver's Pilot address on port 1001. The Pilot daemon handles NAT traversal -- using direct connection, hole-punching, or relay depending on the network topology. The tunnel is encrypted with X25519 key exchange and AES-256-GCM.
  2. Metadata exchange. The sender transmits a header containing the filename, total size, checksum, and any custom metadata (e.g., model architecture, dataset schema, content type). The receiver validates the metadata and confirms readiness.
  3. Chunked transfer. The sender reads the file in chunks and writes them to the connection. Pilot's auto segmentation splits each chunk into MTU-sized packets at the transport layer. The receiver reads chunks and writes them to disk.
  4. Verification. After the final chunk, the receiver computes a checksum of the reassembled file and compares it against the sender's declared checksum. A match confirms integrity.

The critical point is that the application code does not manage packets, retransmission, or flow control. Pilot's transport layer -- with its sliding window, AIMD congestion control, and advertised receive window -- handles all of this transparently. The application sees a reliable byte stream, just like TCP, but over encrypted UDP tunnels that traverse NAT.

Why port 1001? Pilot's port model assigns specific purposes to port numbers. Port 1001 is reserved for data exchange -- structured data transfer between agents. File transfer is the most common data exchange pattern, but the same port handles any binary payload: serialized model states, database snapshots, compressed archives.

Building the File Sender

Here is a complete Go implementation of a file sender that transfers a file to a remote agent over Pilot:

package main

import (
    "crypto/sha256"
    "encoding/binary"
    "encoding/hex"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "os"

    "github.com/TeoSlayer/pilotprotocol/pkg/driver"
)

const chunkSize = 64 * 1024 // 64 KB chunks

// FileHeader is sent before the file data
type FileHeader struct {
    Filename string
    Size     int64
    Checksum string // SHA-256 hex
}

func main() {
    if len(os.Args) != 3 {
        fmt.Fprintf(os.Stderr, "usage: filesend <pilot-address> <filepath>\n")
        os.Exit(1)
    }

    targetAddr := os.Args[1] // e.g., "1:0000.0042.00A1"
    filePath := os.Args[2]

    // Open the file and compute checksum
    f, err := os.Open(filePath)
    if err != nil {
        log.Fatalf("open file: %v", err)
    }
    defer f.Close()

    stat, err := f.Stat()
    if err != nil {
        log.Fatalf("stat file: %v", err)
    }
    checksum := computeSHA256(filePath)

    // Connect to Pilot daemon
    drv, err := driver.Connect("/tmp/pilot.sock")
    if err != nil {
        log.Fatal(err)
    }
    defer drv.Close()

    // Open a connection to the receiver on port 1001 (data exchange)
    conn, err := drv.Dial(targetAddr, 1001)
    if err != nil {
        log.Fatalf("dial %s:1001: %v", targetAddr, err)
    }
    defer conn.Close()

    // Send the file header
    header := FileHeader{
        Filename: stat.Name(),
        Size:     stat.Size(),
        Checksum: checksum,
    }

    // Write header length + header JSON
    headerBytes, err := json.Marshal(header)
    if err != nil {
        log.Fatalf("marshal header: %v", err)
    }
    binary.Write(conn, binary.BigEndian, uint32(len(headerBytes)))
    conn.Write(headerBytes)

    // Read confirmation from receiver
    ack := make([]byte, 1)
    if _, err := io.ReadFull(conn, ack); err != nil || ack[0] != 0x01 {
        log.Fatal("receiver rejected transfer")
    }

    // Stream the file in chunks
    buf := make([]byte, chunkSize)
    var sent int64

    for {
        n, err := f.Read(buf)
        if n > 0 {
            _, writeErr := conn.Write(buf[:n])
            if writeErr != nil {
                log.Fatalf("write error at offset %d: %v", sent, writeErr)
            }
            sent += int64(n)

            // Log progress every 10 MB
            if sent%(10*1024*1024) < int64(chunkSize) {
                pct := float64(sent) / float64(stat.Size()) * 100
                log.Printf("sent %d / %d bytes (%.1f%%)", sent, stat.Size(), pct)
            }
        }
        if err == io.EOF {
            break
        }
        if err != nil {
            log.Fatalf("read error: %v", err)
        }
    }

    // Wait for receiver's checksum confirmation
    if _, err := io.ReadFull(conn, ack); err == nil && ack[0] == 0x01 {
        log.Printf("Transfer complete: %s (%d bytes, checksum verified)", stat.Name(), sent)
    } else {
        log.Printf("Transfer complete but checksum mismatch")
    }
}

func computeSHA256(path string) string {
    f, err := os.Open(path)
    if err != nil {
        log.Fatalf("open for checksum: %v", err)
    }
    defer f.Close()
    h := sha256.New()
    if _, err := io.Copy(h, f); err != nil {
        log.Fatalf("checksum: %v", err)
    }
    return hex.EncodeToString(h.Sum(nil))
}

The sender opens a file, computes its SHA-256 checksum, connects to the receiver over Pilot, sends a header with metadata, and streams the file in 64 KB chunks. Pilot's transport layer handles packetization, retransmission, and flow control beneath the conn.Write() call. The sender's code is not aware that the data is being split into MTU-sized UDP packets, reordered by sequence number, and delivered through an encrypted tunnel.

Building the File Receiver

The receiver listens on port 1001 for incoming file transfers:

package main

import (
    "crypto/sha256"
    "encoding/binary"
    "encoding/hex"
    "encoding/json"
    "io"
    "log"
    "os"
    "path/filepath"

    "github.com/TeoSlayer/pilotprotocol/pkg/driver"
)

// FileHeader matches the header struct sent by the sender
type FileHeader struct {
    Filename string
    Size     int64
    Checksum string // SHA-256 hex
}

func main() {
    outputDir := "./received"
    os.MkdirAll(outputDir, 0755)

    // Connect to Pilot daemon
    drv, err := driver.Connect("/tmp/pilot.sock")
    if err != nil {
        log.Fatal(err)
    }
    defer drv.Close()

    // Listen for incoming connections on port 1001
    listener, err := drv.Listen(1001)
    if err != nil {
        log.Fatalf("listen 1001: %v", err)
    }
    defer listener.Close()

    log.Println("File receiver listening on port 1001")

    for {
        conn, err := listener.Accept()
        if err != nil {
            log.Printf("accept error: %v", err)
            continue
        }
        go receiveFile(conn, outputDir)
    }
}

func receiveFile(conn io.ReadWriteCloser, outputDir string) {
    defer conn.Close()

    // Read header length
    var headerLen uint32
    if err := binary.Read(conn, binary.BigEndian, &headerLen); err != nil {
        log.Printf("read header length: %v", err)
        return
    }

    // Read header JSON
    headerBytes := make([]byte, headerLen)
    if _, err := io.ReadFull(conn, headerBytes); err != nil {
        log.Printf("read header: %v", err)
        return
    }

    var header FileHeader
    if err := json.Unmarshal(headerBytes, &header); err != nil {
        conn.Write([]byte{0x00}) // reject
        log.Printf("decode header: %v", err)
        return
    }

    log.Printf("Incoming file: %s (%d bytes, checksum %s)",
        header.Filename, header.Size, header.Checksum)

    // Create output file; filepath.Base strips any directory components
    // so a malicious filename cannot escape the output directory
    outPath := filepath.Join(outputDir, filepath.Base(header.Filename))
    f, err := os.Create(outPath)
    if err != nil {
        conn.Write([]byte{0x00}) // reject
        log.Printf("create file error: %v", err)
        return
    }
    defer f.Close()

    // Send acceptance
    conn.Write([]byte{0x01})

    // Receive file data
    h := sha256.New()
    received, err := io.Copy(io.MultiWriter(f, h), io.LimitReader(conn, header.Size))
    if err != nil {
        log.Printf("receive error at offset %d: %v", received, err)
        return
    }

    // Verify checksum
    computed := hex.EncodeToString(h.Sum(nil))
    if computed == header.Checksum {
        conn.Write([]byte{0x01}) // checksum OK
        log.Printf("Received %s: %d bytes, checksum verified", header.Filename, received)
    } else {
        conn.Write([]byte{0x00}) // checksum mismatch
        log.Printf("Checksum mismatch for %s: expected %s, got %s",
            header.Filename, header.Checksum, computed)
        os.Remove(outPath) // clean up corrupted file
    }
}

The receiver listens on port 1001, accepts incoming connections from any trusted peer, reads the file header, and streams the data to disk while computing a running SHA-256 checksum. The io.Copy with io.LimitReader ensures the receiver reads exactly the declared number of bytes, preventing a malicious sender from writing more data than expected.

Transport Mechanics

Beneath the simple Read/Write API, Pilot's transport layer performs several operations that make file transfer reliable over unreliable networks:

Auto segmentation. When the sender writes a 64 KB chunk, Pilot's transport layer splits it into MTU-sized segments (typically 1200-1400 bytes for UDP). Each segment gets a sequence number in the 34-byte packet header. The receiver reassembles segments in order, handling reordering and duplicates. The application never sees individual packets.
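
As a back-of-the-envelope check on the segmentation described above (segmentCount is a hypothetical helper; a 1300-byte MTU is taken from the middle of the range cited):

```go
package main

import "fmt"

// segmentCount is how many transport segments one application-level
// chunk splits into, given the MTU and the per-segment header size.
func segmentCount(chunkBytes, mtu, headerBytes int) int {
    payload := mtu - headerBytes                // application bytes per segment
    return (chunkBytes + payload - 1) / payload // ceiling division
}

func main() {
    // A 64 KB chunk over a 1300-byte MTU with the 34-byte packet header.
    fmt.Printf("64 KB chunk -> %d segments\n", segmentCount(64*1024, 1300, 34))
    // prints: 64 KB chunk -> 52 segments
}
```

Each conn.Write() of a full chunk therefore fans out into roughly fifty sequenced packets, all invisible to the application.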

Sliding window. Pilot maintains a send window that limits how many unacknowledged segments can be in flight. This prevents the sender from flooding the network. The window size adapts dynamically using AIMD (Additive Increase, Multiplicative Decrease) congestion control -- the same algorithm family that TCP uses, tuned for UDP overlay characteristics.

Flow control. The receiver advertises its available buffer space in every acknowledgment packet (the 2-byte Window field in the packet header). If the receiver's buffer fills up -- because disk I/O is slow or the receiver is processing other connections -- the sender pauses until the receiver signals it has space. This prevents the receiver from being overwhelmed regardless of how fast the sender can produce data.
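
The interaction of these two windows can be sketched as a toy model. This is illustrative only: the one-segment increase per clean round trip and the halving factor are classic AIMD constants assumed here, not Pilot's actual tuning:

```go
package main

import "fmt"

// cwndAfter applies one AIMD step to the congestion window, measured in
// segments: add one on a clean round trip, halve on loss (floor of 1).
func cwndAfter(cwnd int, loss bool) int {
    if loss {
        if cwnd/2 < 1 {
            return 1
        }
        return cwnd / 2
    }
    return cwnd + 1
}

// sendable is how many segments may be in flight at once: the smaller
// of the congestion window and the receiver's advertised window.
func sendable(cwnd, rwnd int) int {
    if rwnd < cwnd {
        return rwnd
    }
    return cwnd
}

func main() {
    cwnd := 10
    cwnd = cwndAfter(cwnd, false) // clean RTT: 10 -> 11
    cwnd = cwndAfter(cwnd, true)  // loss event: 11 -> 5
    fmt.Println(cwnd, sendable(cwnd, 3)) // receiver advertises only 3; prints: 5 3
}
```

The key point is the min(): a slow receiver throttles the sender even when the network itself has capacity to spare.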

Encryption. Every segment is encrypted with AES-256-GCM before transmission. The encryption key is established during the initial X25519 key exchange when the tunnel is created. There is no option to disable encryption -- it is always on. A random nonce prefix per connection prevents nonce reuse across sessions.

These mechanisms work together to make file transfer over Pilot as reliable as TCP while maintaining the NAT traversal properties of UDP. The sender and receiver code does not need to implement any of this -- it is handled by the Pilot daemon and driver library.

Cloud Storage vs. Direct Transfer

The trade-offs between using cloud storage (S3, GCS, Azure Blob) and direct Pilot transfer are significant across multiple dimensions:

Dimension          | Cloud Storage (S3/GCS)               | Pilot Direct Transfer
-------------------|--------------------------------------|-----------------------------------
Latency            | 2 hops (upload + download)           | 1 hop (direct P2P)
Bandwidth cost     | Egress fees both directions          | Zero (no intermediary)
Storage cost       | Per-GB-month for temporary files     | Zero (no intermediate storage)
Data residency     | Data stored in cloud region          | Data stays on sender/receiver only
Encryption         | TLS in transit, optional at rest     | AES-256-GCM end-to-end, always on
Third-party access | Cloud provider can access data       | No third party sees the data
Setup              | IAM roles, bucket policies, SDKs     | Pilot daemon + driver connect
NAT traversal      | Handled by cloud (public endpoints)  | Handled by Pilot (STUN/punch/relay)
Resumability       | Multipart upload/download            | Reconnect and continue from offset
Max file size      | 5 TB (S3 object limit)               | Unlimited (streaming)

The most compelling advantage of direct transfer is data privacy. When Agent A sends model weights directly to Agent B over a Pilot tunnel, no third party ever holds the data. There is no S3 bucket to misconfigure, no IAM policy to audit, no cloud provider with potential access. For organizations subject to HIPAA, GDPR, or data sovereignty requirements, this is not a nice-to-have -- it is a compliance requirement.

The cost advantage is also significant at scale. AWS charges roughly $0.09/GB for data egress. Transferring a 10 GB model checkpoint costs $0.90 in egress when the sender pushes it out of its own environment toward S3, and another $0.90 when the receiver pulls it back out of S3, totaling $1.80 per transfer. An agent that shares model checkpoints 100 times a month pays $180 in cloud egress fees alone. Direct transfer over Pilot costs zero in cloud fees -- the agents use their existing network connections.
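
The arithmetic, made explicit (egressCost is a hypothetical helper; the $0.09/GB rate is the figure cited above):

```go
package main

import "fmt"

// egressCost is the monthly cloud egress bill for routing files through
// a storage intermediary: one egress on upload, one on download.
func egressCost(gb, perGB float64, transfersPerMonth int) float64 {
    return gb * perGB * 2 * float64(transfersPerMonth)
}

func main() {
    // 10 GB checkpoint at $0.09/GB, shared 100 times a month.
    fmt.Printf("$%.2f\n", egressCost(10, 0.09, 100)) // prints: $180.00
}
```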

Resumable Transfers

Network interruptions are inevitable, especially for large file transfers. A 50 GB dataset transfer that fails at 90% should not restart from zero. Pilot's connection tracking enables resumable transfers by maintaining state about the last successfully received offset.

The implementation pattern is straightforward: the receiver tracks how many bytes it has written to disk. If the connection drops and the sender reconnects, the receiver sends the last received offset in its acceptance message, and the sender seeks to that position in the file before resuming:

// Receiver: include resume offset in acceptance
type AcceptMessage struct {
    OK          bool  `json:"ok"`
    ResumeAt    int64 `json:"resume_at"` // 0 for new transfer
}

// Sender: seek to resume offset before streaming
func sendWithResume(conn io.ReadWriteCloser, f *os.File, resumeAt int64) error {
    if resumeAt > 0 {
        log.Printf("Resuming transfer from offset %d", resumeAt)
        if _, err := f.Seek(resumeAt, io.SeekStart); err != nil {
            return fmt.Errorf("seek to %d: %w", resumeAt, err)
        }
    }

    buf := make([]byte, chunkSize)
    offset := resumeAt

    for {
        n, err := f.Read(buf)
        if n > 0 {
            if _, werr := conn.Write(buf[:n]); werr != nil {
                return fmt.Errorf("write at offset %d: %w", offset, werr)
            }
            offset += int64(n)
        }
        if err == io.EOF {
            return nil
        }
        if err != nil {
            return fmt.Errorf("read at offset %d: %w", offset, err)
        }
    }
}

The receiver stores the resume offset in a temporary metadata file alongside the partially received data. If the agent restarts entirely, it can detect the partial file, read the metadata, and request a resume from the sender. This makes large transfers resilient to transient network issues, daemon restarts, and even machine reboots.

Use Cases

Direct file transfer between agents unlocks several patterns that are impractical or expensive with cloud storage intermediaries:

Model weight distribution. A training agent produces a new model checkpoint and distributes it to inference agents across multiple regions. Each inference agent receives the weights directly, without the checkpoint ever landing in a shared bucket. This is faster (one hop instead of two) and more secure (no shared storage to compromise).

Dataset collaboration. Two research agents working on related problems share preprocessed datasets directly. For more on this pattern, see Secure Research Collaboration: Share Models, Not Data. The key benefit is that data residency is maintained -- the dataset only exists on the two agents' machines, never in transit storage.

Report delivery. An analysis agent generates a PDF report and delivers it to a coordinator agent. The report may contain sensitive financial data that should not traverse cloud storage. Direct delivery ensures the report goes from producer to consumer with no intermediate copies.

Configuration sync. Agents in a fleet share configuration files, prompt templates, or tool definitions. Instead of polling a central config server, agents push updates directly to peers. This eliminates the central config server as a single point of failure and reduces sync latency.

For the underlying NAT traversal mechanics that make direct connections possible even when both agents are behind firewalls, see NAT Traversal for AI Agents: A Deep Dive. For the encryption layer that protects file data in transit, check the documentation on Pilot's security model.

Transfer Files Without the Cloud

Send files directly between agents -- encrypted, NAT-traversing, no intermediary. Install Pilot and start transferring in minutes.

View on GitHub