Data Labeling Pipeline

advanced · 4 agents · 11 skills

Deploy a distributed data labeling pipeline of four agents that ingest raw data, apply ML-based labels, review quality, and export training-ready datasets. The pipeline handles image, text, and audio data in formats such as COCO, VOC, and JSONL, with inter-annotator agreement checks and automated quality gating.

Install

clawhub install pilot-data-labeling-pipeline-setup

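Skills used

pilot-s3-bridge, pilot-stream-data, pilot-task-parallel, pilot-task-router, pilot-dataset, pilot-metrics, pilot-review, pilot-event-filter, pilot-alert, pilot-share, pilot-webhook-bridge

Agents

<your-prefix>-ingester: data ingestion (pilot-s3-bridge, pilot-stream-data, pilot-task-parallel)
<your-prefix>-labeler: auto labeling (pilot-task-router, pilot-dataset, pilot-metrics)
<your-prefix>-reviewer: quality review (pilot-review, pilot-event-filter, pilot-alert)
<your-prefix>-exporter: dataset export (pilot-dataset, pilot-share, pilot-webhook-bridge)

Data flows

ingester → labeler → reviewer → exporter, with a mutual handshake between each adjacent pair (see Quick start).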

Quick start

# Replace <your-prefix> with a unique name for your deployment (e.g. acme)
# On server 1 (data ingestion)
clawhub install pilot-s3-bridge pilot-stream-data pilot-task-parallel
pilotctl set-hostname <your-prefix>-ingester

# On server 2 (auto labeling)
clawhub install pilot-task-router pilot-dataset pilot-metrics
pilotctl set-hostname <your-prefix>-labeler

# On server 3 (quality review)
clawhub install pilot-review pilot-event-filter pilot-alert
pilotctl set-hostname <your-prefix>-reviewer

# On server 4 (dataset export)
clawhub install pilot-dataset pilot-share pilot-webhook-bridge
pilotctl set-hostname <your-prefix>-exporter

# Handshake each adjacent pair of stages; both sides run the command
# On ingester:
pilotctl handshake <your-prefix>-labeler "setup: data-labeling-pipeline"
# On labeler:
pilotctl handshake <your-prefix>-ingester "setup: data-labeling-pipeline"

# On labeler:
pilotctl handshake <your-prefix>-reviewer "setup: data-labeling-pipeline"
# On reviewer:
pilotctl handshake <your-prefix>-labeler "setup: data-labeling-pipeline"

# On reviewer:
pilotctl handshake <your-prefix>-exporter "setup: data-labeling-pipeline"
# On exporter:
pilotctl handshake <your-prefix>-reviewer "setup: data-labeling-pipeline"

# On each node, check that the peers above are now trusted:
pilotctl trust
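
The handshake steps above follow a regular pattern: each pair of adjacent stages runs `pilotctl handshake` against the other. As a rough sketch (not part of the published setup), the script below only prints those commands for a given prefix so they can be reviewed before being run on the matching server. It never calls `pilotctl` itself. The function name `print_handshakes` and the default prefix `acme` are illustrative assumptions.

```shell
#!/bin/sh
# Dry-run helper: prints the handshake commands for each adjacent pair of
# pipeline stages. It does not invoke pilotctl.
# print_handshakes is a hypothetical helper name, not a pilotctl feature.
print_handshakes() {
  prefix="$1"
  msg="setup: data-labeling-pipeline"
  prev=""
  # Stage order taken from the quick start: ingest, label, review, export.
  for stage in ingester labeler reviewer exporter; do
    if [ -n "$prev" ]; then
      # Both sides of a pair handshake with each other.
      printf '# On %s:\n' "$prev"
      printf 'pilotctl handshake %s-%s "%s"\n' "$prefix" "$stage" "$msg"
      printf '# On %s:\n' "$stage"
      printf 'pilotctl handshake %s-%s "%s"\n' "$prefix" "$prev" "$msg"
    fi
    prev="$stage"
  done
}

# Use the first argument as the prefix, falling back to the example "acme".
print_handshakes "${1:-acme}"
```

Saved under any file name and run with a prefix as its only argument, it emits the six commands grouped by the server they belong on. Printing instead of executing keeps the helper safe to run anywhere, since each command still has to be issued on its own node.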