Document Processing

Difficulty: beginner

Install

clawhub install pilot-document-processing-setup

Skills used

pilot-stream-data pilot-share pilot-archive pilot-task-router pilot-dataset pilot-receipt pilot-webhook-bridge pilot-announce pilot-metrics

Agents

<your-prefix>-ingester Document Ingester

Accepts documents and converts them to a processable format

pilot-stream-data pilot-share pilot-archive

<your-prefix>-extractor Data Extractor

Extracts structured data: tables, entities, and amounts

pilot-task-router pilot-dataset pilot-receipt

<your-prefix>-indexer Search Indexer

Indexes data for search and publishes it to downstream systems

pilot-webhook-bridge pilot-announce pilot-metrics
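The role-to-skill assignment above can be written down as a small lookup, which is handy for checking that the three agents together cover the nine skills listed under "Skills used". This is a minimal sketch: `skills_for` is a hypothetical helper written for illustration, not part of clawhub or pilotctl, and it only prints text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the skills assigned to one agent role.
# The role names and skill lists are copied from the agent descriptions.
skills_for() {
  case "$1" in
    ingester)  echo "pilot-stream-data pilot-share pilot-archive" ;;
    extractor) echo "pilot-task-router pilot-dataset pilot-receipt" ;;
    indexer)   echo "pilot-webhook-bridge pilot-announce pilot-metrics" ;;
    *)         echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Print each role with its skills and count the skills across all roles.
total=0
for role in ingester extractor indexer; do
  skills="$(skills_for "$role")"
  echo "$role: $skills"
  for skill in $skills; do
    total=$((total + 1))
  done
done
echo "total skills: $total"
```

Keeping the mapping in one function makes it easy to spot a skill that was dropped from, or duplicated across, the per-server install commands.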

Data flows

<your-prefix>-ingester → <your-prefix>-extractor :1002 raw-document events

<your-prefix>-extractor → <your-prefix>-indexer :1002 extracted-data events

<your-prefix>-indexer → <your-prefix>-downstream :443 index notifications via webhook
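The three flows above form a linear pipeline. As a minimal sketch, assuming the example prefix `acme` suggested in the quick start, the following function expands the flows into concrete hostnames. `print_flows` is a hypothetical helper introduced here for illustration; it prints text only and does not contact any agent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand the documented data flows for a given prefix.
# Ports and event names are copied from the data-flow list; nothing is sent.
print_flows() {
  local prefix="$1"
  printf '%s-ingester -> %s-extractor :1002 raw-document events\n' "$prefix" "$prefix"
  printf '%s-extractor -> %s-indexer :1002 extracted-data events\n' "$prefix" "$prefix"
  printf '%s-indexer -> %s-downstream :443 index notifications via webhook\n' "$prefix" "$prefix"
}

# Assumed example prefix; replace with your own deployment name.
print_flows "acme"
```

Printing the expanded flows before deployment is a cheap way to confirm that the chosen prefix yields the hostnames you expect on each server.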

Quick start

# Replace <your-prefix> with a unique name for your deployment (e.g. acme)
# On server 1 (document ingestion)
clawhub install pilot-stream-data pilot-share pilot-archive
pilotctl set-hostname <your-prefix>-ingester

# On server 2 (data extraction)
clawhub install pilot-task-router pilot-dataset pilot-receipt
pilotctl set-hostname <your-prefix>-extractor

# On server 3 (search indexing)
clawhub install pilot-webhook-bridge pilot-announce pilot-metrics
pilotctl set-hostname <your-prefix>-indexer

# On ingester:
pilotctl handshake <your-prefix>-extractor "setup: document-processing"
# On extractor:
pilotctl handshake <your-prefix>-ingester "setup: document-processing"

# On extractor:
pilotctl handshake <your-prefix>-indexer "setup: document-processing"
# On indexer:
pilotctl handshake <your-prefix>-extractor "setup: document-processing"
pilotctl trust
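The handshakes above are issued in both directions for each adjacent pair of agents. The sketch below prints that plan as a dry run for a given prefix, so the commands can be reviewed before they are run on each server. `handshake_plan` is a hypothetical helper, and the prefix `acme` is an assumed example; the only pilotctl form it emits is the `pilotctl handshake` invocation already shown in the quick start, and it does not execute anything.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the handshake commands per host.
# It echoes commands only; it never invokes pilotctl itself.
handshake_plan() {
  local prefix="$1"
  local msg="setup: document-processing"
  # Adjacent pairs in the pipeline, in the order used by the quick start.
  local pairs=("ingester extractor" "extractor indexer")
  local pair a b
  for pair in "${pairs[@]}"; do
    read -r a b <<< "$pair"
    printf '# On %s-%s:\n' "$prefix" "$a"
    printf 'pilotctl handshake %s-%s "%s"\n' "$prefix" "$b" "$msg"
    printf '# On %s-%s:\n' "$prefix" "$b"
    printf 'pilotctl handshake %s-%s "%s"\n' "$prefix" "$a" "$msg"
  done
}

# Assumed example prefix; replace with your own deployment name.
handshake_plan "acme"
```

Listing the pairs in one array keeps the plan symmetric: adding a fourth agent to the pipeline would mean adding one pair rather than two hand-written commands.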
