
Document Processing

Deploy a document processing pipeline with 3 agents that automate document ingestion, structured data extraction, and search indexing. Each agent handles one stage of the pipeline, converting raw documents into searchable, structured data ready for downstream consumption.

Agents: 3
Skills: 9
Difficulty: beginner
Install
clawhub install pilot-document-processing-setup
Skills used
Agents
<your-prefix>-ingester (Document Ingester)
Accepts documents and converts them to a processable format.
Skills: pilot-stream-data, pilot-share, pilot-archive
<your-prefix>-extractor (Data Extractor)
Extracts structured data: tables, entities, and amounts.
Skills: pilot-task-router, pilot-dataset, pilot-receipt
<your-prefix>-indexer (Search Indexer)
Indexes data for search and publishes it to downstream systems.
Skills: pilot-webhook-bridge, pilot-announce, pilot-metrics
Data flows
<your-prefix>-ingester → <your-prefix>-extractor :1002 raw-document events
<your-prefix>-extractor → <your-prefix>-indexer :1002 extracted-data events
<your-prefix>-indexer → <your-prefix>-downstream :443 index notifications via webhook
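The flows above can also be written down as data, which makes it easier to see which pairs of agents need to trust each other. The following is a minimal dry-run sketch, not part of the setup itself: `print_flows` and `handshake_plan` are hypothetical helper names, `acme` is the example prefix, and the rule that only the agent-to-agent flows on port 1002 need a mutual handshake is an assumption inferred from the quick start. Nothing is executed; the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print the three data flows for a given prefix.
# One flow per line: source target port payload-label
print_flows() {
  prefix="$1"
  printf '%s-ingester %s-extractor 1002 raw-document\n' "$prefix" "$prefix"
  printf '%s-extractor %s-indexer 1002 extracted-data\n' "$prefix" "$prefix"
  printf '%s-indexer %s-downstream 443 index-notifications\n' "$prefix" "$prefix"
}

# Hypothetical helper: print (do not run) one handshake per direction
# for every agent-to-agent flow.
# Assumption: only flows on port 1002 need a handshake; the :443 webhook
# flow to the downstream system is skipped.
handshake_plan() {
  print_flows "$1" | while read -r src dst port payload; do
    [ "$port" = "1002" ] || continue
    printf 'on %s: pilotctl handshake %s "setup: document-processing"\n' "$src" "$dst"
    printf 'on %s: pilotctl handshake %s "setup: document-processing"\n' "$dst" "$src"
  done
}

handshake_plan acme
```

For the three-agent pipeline this prints four lines, one per direction for the ingester/extractor pair and the extractor/indexer pair.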
Quick start
# Replace <your-prefix> with a unique name for your deployment (e.g. acme)
# On server 1 (document ingestion)
clawhub install pilot-stream-data pilot-share pilot-archive
pilotctl set-hostname <your-prefix>-ingester

# On server 2 (data extraction)
clawhub install pilot-task-router pilot-dataset pilot-receipt
pilotctl set-hostname <your-prefix>-extractor

# On server 3 (search indexing)
clawhub install pilot-webhook-bridge pilot-announce pilot-metrics
pilotctl set-hostname <your-prefix>-indexer

# On ingester:
pilotctl handshake <your-prefix>-extractor "setup: document-processing"
# On extractor:
pilotctl handshake <your-prefix>-ingester "setup: document-processing"

# On extractor:
pilotctl handshake <your-prefix>-indexer "setup: document-processing"
# On indexer:
pilotctl handshake <your-prefix>-extractor "setup: document-processing"
pilotctl trust
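Because every command in the quick start repeats the same prefix, it can help to generate the per-server install steps once and review them before running anything. This is a small dry-run sketch under stated assumptions: `install_plan` is a hypothetical helper name, `acme` is the example prefix, and the role-to-skill mapping is copied from the commands above. The script only prints commands and never calls `clawhub` or `pilotctl` itself.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the install and hostname
# commands for each of the three servers, given a deployment prefix.
install_plan() {
  prefix="$1"
  # Each input line: role name followed by that role's three skills.
  printf '%s\n' \
    'ingester pilot-stream-data pilot-share pilot-archive' \
    'extractor pilot-task-router pilot-dataset pilot-receipt' \
    'indexer pilot-webhook-bridge pilot-announce pilot-metrics' |
  while read -r role skills; do
    printf '# On %s-%s\n' "$prefix" "$role"
    printf 'clawhub install %s\n' "$skills"
    printf 'pilotctl set-hostname %s-%s\n' "$prefix" "$role"
  done
}

install_plan acme
```

The output is three lines per server, which can be copied to the matching host once the prefix looks right.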
