ETL Data Pipeline

A five-stage ETL pipeline for production data processing. Agents handle ingestion from S3 and databases, parallel transformation, data validation with quarantine for bad records, loading into target stores, and automated reporting via Slack dashboards.

Agents: 5
Skills: 14
Difficulty: advanced
Install
clawhub install pilot-etl-data-pipeline-setup
Skills used
Agents
<your-prefix>-ingest Data Ingestion
Pulls raw data on schedule
pilot-s3-bridge, pilot-database-bridge, pilot-task-chain, pilot-cron
<your-prefix>-transform Data Transformer
Raw data
pilot-task-router, pilot-stream-data, pilot-task-parallel
<your-prefix>-validate Data Validator
Transformed records
pilot-task-router, pilot-audit-log, pilot-alert, pilot-quarantine
<your-prefix>-loader Data Loader
Validated records
pilot-database-bridge, pilot-task-chain, pilot-receipt
<your-prefix>-reporter Pipeline Reporter
Error rates
pilot-webhook-bridge, pilot-metrics, pilot-slack-bridge, pilot-cron
Data flows
<your-prefix>-ingest → <your-prefix>-transform :1001 raw data batches
<your-prefix>-transform → <your-prefix>-validate :1001 transformed records
<your-prefix>-validate → <your-prefix>-loader :1001 validated records
<your-prefix>-loader → <your-prefix>-reporter :1002 load receipts
<your-prefix>-validate → <your-prefix>-reporter :1002 validation metrics
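The flow table above can be kept as plain data and queried per agent, which makes it easier to check what each host should expect to send and receive. This is a minimal sketch, not part of pilotctl or clawhub: the `flows` and `peers_of` helpers are hypothetical names, and `acme` is only an example prefix.

```shell
#!/bin/sh
# Hypothetical helper: the five data flows as "source destination port" rows.
# Role names and ports are copied from the flow table; nothing here calls pilotctl.
flows() {
  cat <<'EOF'
ingest transform 1001
transform validate 1001
validate loader 1001
loader reporter 1002
validate reporter 1002
EOF
}

# Print every flow that touches the given role, with its direction and port.
peers_of() {
  role=$1
  prefix=$2
  flows | while read -r src dst port; do
    if [ "$src" = "$role" ]; then
      echo "sends to $prefix-$dst on :$port"
    elif [ "$dst" = "$role" ]; then
      echo "receives from $prefix-$src on :$port"
    fi
  done
}

peers_of validate acme
```

Running it for `validate` lists one inbound flow from the transformer and two outbound flows, to the loader and the reporter.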
Quick start
# Replace <your-prefix> with a unique name for your deployment (e.g. acme)
# On ingestion server
clawhub install pilot-s3-bridge pilot-database-bridge pilot-task-chain pilot-cron
pilotctl set-hostname <your-prefix>-ingest

# On transform server
clawhub install pilot-task-router pilot-stream-data pilot-task-parallel
pilotctl set-hostname <your-prefix>-transform

# On validation server
clawhub install pilot-task-router pilot-audit-log pilot-alert pilot-quarantine
pilotctl set-hostname <your-prefix>-validate

# On loader server
clawhub install pilot-database-bridge pilot-task-chain pilot-receipt
pilotctl set-hostname <your-prefix>-loader

# On reporting server
clawhub install pilot-webhook-bridge pilot-metrics pilot-slack-bridge pilot-cron
pilotctl set-hostname <your-prefix>-reporter

# Handshake each connected pair of agents, once from each side
# On ingest:
pilotctl handshake <your-prefix>-transform "setup: etl-data-pipeline"
# On transform:
pilotctl handshake <your-prefix>-ingest "setup: etl-data-pipeline"
# On loader:
pilotctl handshake <your-prefix>-reporter "setup: etl-data-pipeline"
# On reporter:
pilotctl handshake <your-prefix>-loader "setup: etl-data-pipeline"
# On loader:
pilotctl handshake <your-prefix>-validate "setup: etl-data-pipeline"
# On validate:
pilotctl handshake <your-prefix>-loader "setup: etl-data-pipeline"
# On reporter:
pilotctl handshake <your-prefix>-validate "setup: etl-data-pipeline"
# On validate:
pilotctl handshake <your-prefix>-reporter "setup: etl-data-pipeline"
# On transform:
pilotctl handshake <your-prefix>-validate "setup: etl-data-pipeline"
# On validate:
pilotctl handshake <your-prefix>-transform "setup: etl-data-pipeline"
pilotctl trust
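
The handshake commands above follow a fixed pattern: each connected pair of agents runs `pilotctl handshake` once from each side. The sketch below only prints that command list for a chosen prefix so it can be reviewed before being run on each host; it does not execute pilotctl. The `handshake_plan` function and the `PREFIX` variable are illustrative assumptions, and the pair list is copied from the quick start.

```shell
#!/bin/sh
# Sketch: print the mutual handshake commands for each connected pair.
# PREFIX is an example value; replace it with your deployment prefix.
PREFIX=${PREFIX:-acme}
REASON="setup: etl-data-pipeline"

# Hypothetical helper: reads "agent-a agent-b" pairs and emits both directions.
handshake_plan() {
  while read -r a b; do
    echo "# On $a:"
    echo "pilotctl handshake $PREFIX-$b \"$REASON\""
    echo "# On $b:"
    echo "pilotctl handshake $PREFIX-$a \"$REASON\""
  done <<'EOF'
ingest transform
loader reporter
loader validate
reporter validate
transform validate
EOF
}

handshake_plan
```

Keeping the pairs in one list makes it harder to forget one direction of a handshake when the topology changes.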