
Data Labeling Pipeline

Deploy a distributed data labeling pipeline of four agents that ingest raw data, apply ML-based labels, review quality, and export training-ready datasets. The system handles images, text, and audio in formats such as COCO, VOC, and JSONL, with inter-annotator agreement checks and automated quality gating.
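The inter-annotator agreement check and quality gate described above can be illustrated with a small shell sketch. It is not taken from any pilot skill: the input layout (tab-separated item ID, annotator A's label, annotator B's label), the function names, and the 0.80 threshold are all assumptions made for illustration. It computes raw percent agreement, which is simpler than chance-corrected measures such as Cohen's kappa.

```shell
#!/bin/sh
# Illustrative only: raw percent agreement between two annotators.
# Assumed input: tab-separated lines of "item_id<TAB>label_a<TAB>label_b".

agreement() {
  # Prints the fraction of items where both labels match, to two decimals.
  awk -F '\t' '
    NF == 3 { total++; if ($2 == $3) same++ }
    END { if (total == 0) { print "0.00" } else { printf "%.2f\n", same / total } }
  ' "$1"
}

gate() {
  # Exit status 0 (pass) when score >= threshold, 1 otherwise.
  # $1 = agreement score, $2 = threshold (hypothetical default of 0.80).
  awk -v score="$1" -v min="${2:-0.80}" 'BEGIN { exit !(score + 0 >= min + 0) }'
}

# Example run on a tiny hypothetical batch: 3 of 4 items agree.
sample=$(mktemp)
printf 'img-1\tcat\tcat\nimg-2\tdog\tdog\nimg-3\tcat\tdog\nimg-4\tbird\tbird\n' > "$sample"

score=$(agreement "$sample")
if gate "$score" 0.80; then
  echo "agreement $score: batch approved"
else
  echo "agreement $score: batch flagged for review"
fi
rm -f "$sample"
```

With the sample batch, agreement is 0.75, which falls below the assumed 0.80 threshold, so the batch is flagged rather than approved.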

Agents: 4
Skills: 11
Difficulty: advanced
Install
clawhub install pilot-data-labeling-pipeline-setup
Skills used
Agents
<your-prefix>-ingester (Data Ingester)
Accepts raw data batches, splits into work items
Skills: pilot-s3-bridge, pilot-stream-data, pilot-task-parallel
<your-prefix>-labeler (Auto Labeler)
Applies ML-based labels to work items
Skills: pilot-task-router, pilot-dataset, pilot-metrics
<your-prefix>-reviewer (Quality Reviewer)
Samples labeled items, checks accuracy, flags disagreements
Skills: pilot-review, pilot-event-filter, pilot-alert
<your-prefix>-exporter (Dataset Exporter)
Packages approved labels into training-ready datasets
Skills: pilot-dataset, pilot-share, pilot-webhook-bridge
Data flows
<your-prefix>-ingester → <your-prefix>-labeler (port 1002): work-item events
<your-prefix>-labeler → <your-prefix>-reviewer (port 1002): labeled-item events
<your-prefix>-reviewer → <your-prefix>-labeler (port 1002): review-feedback events
<your-prefix>-reviewer → <your-prefix>-exporter (port 1002): approved-label events
<your-prefix>-exporter → external (port 443): dataset-published notifications
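The flows listed above suggest which agents need to handshake with each other. The following sketch only prints commands and never runs `pilotctl`. It assumes that one mutual handshake per pair of communicating agents is enough, which is inferred from the quick start rather than documented behavior. The `PREFIX` variable, its `acme` default, and the function names are hypothetical.

```shell
#!/bin/sh
# Illustrative only: derives handshake commands from the flow list.
PREFIX="${PREFIX:-acme}"   # hypothetical deployment prefix

# Agent-to-agent flows as "source target" role pairs.
# The exporter's flow to "external" is left out because it has no peer agent.
FLOWS="ingester labeler
labeler reviewer
reviewer labeler
reviewer exporter"

handshake_pairs() {
  # Order each pair alphabetically so A->B and B->A collapse into one pair.
  printf '%s\n' "$FLOWS" |
    awk '{ if ($1 < $2) print $1, $2; else print $2, $1 }' |
    sort -u
}

print_handshakes() {
  # Emit one handshake command for each side of every pair.
  handshake_pairs | while read -r a b; do
    echo "# On $a:"
    echo "pilotctl handshake $PREFIX-$b \"setup: data-labeling-pipeline\""
    echo "# On $b:"
    echo "pilotctl handshake $PREFIX-$a \"setup: data-labeling-pipeline\""
  done
}

print_handshakes
```

The four agent-to-agent flows reduce to three unordered pairs, because the labeler and reviewer exchange events in both directions, so the sketch prints six handshake commands.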
Quick start
# Replace <your-prefix> with a unique name for your deployment (e.g. acme)
# On server 1 (data ingestion)
clawhub install pilot-s3-bridge pilot-stream-data pilot-task-parallel
pilotctl set-hostname <your-prefix>-ingester

# On server 2 (auto labeling)
clawhub install pilot-task-router pilot-dataset pilot-metrics
pilotctl set-hostname <your-prefix>-labeler

# On server 3 (quality review)
clawhub install pilot-review pilot-event-filter pilot-alert
pilotctl set-hostname <your-prefix>-reviewer

# On server 4 (dataset export)
clawhub install pilot-dataset pilot-share pilot-webhook-bridge
pilotctl set-hostname <your-prefix>-exporter

# On ingester:
pilotctl handshake <your-prefix>-labeler "setup: data-labeling-pipeline"
# On labeler:
pilotctl handshake <your-prefix>-ingester "setup: data-labeling-pipeline"

# On labeler:
pilotctl handshake <your-prefix>-reviewer "setup: data-labeling-pipeline"
# On reviewer:
pilotctl handshake <your-prefix>-labeler "setup: data-labeling-pipeline"

# On reviewer:
pilotctl handshake <your-prefix>-exporter "setup: data-labeling-pipeline"
# On exporter:
pilotctl handshake <your-prefix>-reviewer "setup: data-labeling-pipeline"
pilotctl trust
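
To avoid retyping the prefix on each server, the install and hostname steps can be rendered from a single variable. This is a sketch that prints the commands for review and does not execute them. The skill lists are copied from the quick start; the function names, the `PREFIX` variable, and its `acme` default are hypothetical.

```shell
#!/bin/sh
# Illustrative only: prints the per-server setup commands, runs nothing.

skills_for() {
  # Skill lists copied from the quick start.
  case "$1" in
    ingester) echo "pilot-s3-bridge pilot-stream-data pilot-task-parallel" ;;
    labeler)  echo "pilot-task-router pilot-dataset pilot-metrics" ;;
    reviewer) echo "pilot-review pilot-event-filter pilot-alert" ;;
    exporter) echo "pilot-dataset pilot-share pilot-webhook-bridge" ;;
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
}

render_setup() {
  # $1 = deployment prefix, $2 = role
  skills=$(skills_for "$2") || return 1
  echo "clawhub install $skills"
  echo "pilotctl set-hostname $1-$2"
}

for role in ingester labeler reviewer exporter; do
  echo "# $role"
  render_setup "${PREFIX:-acme}" "$role"
  echo
done
```

Running it with `PREFIX=acme` prints the same four install and set-hostname pairs as the quick start, with `<your-prefix>` replaced by `acme`.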
