Saoussen’s Substack

Production-Ready MLOps on GCP Part 8: Model Monitoring & Continuous Training

Saoussen CHAABNIA — Tue, 17 Feb 2026 15:16:59 GMT

Introduction

In the previous article, we automated the entire development workflow with CI/CD. But production ML has one more critical challenge: models degrade over time.

Your model was trained on January data. It’s now November. User behavior has changed. Payment methods shifted. Routes evolved. Your model’s accuracy is silently degrading.

This final article covers:

Event-driven continuous training (automatic retraining)
Scheduled retraining patterns
Production orchestration patterns
Observability and debugging
Cost management
Responding to model degradation

By the end, you’ll know how to keep models fresh and accurate in production.

The Model Degradation Problem

Scenario: Your taxi fare prediction model

January (training data):
  - Average trip: 5.2 miles
  - Payment: 60% credit, 40% cash
  - Peak hour: 8 AM
  - Model RMSE: 2.5
November (production data):
  - Average trip: 7.8 miles (+50%!)
  - Payment: 75% credit, 25% cash
  - Peak hour: 9 AM
  - Model RMSE: 3.8 (+52% worse!)

Without monitoring: You don’t notice until customers complain.
With monitoring: Automatic alerts + retraining.
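
The percentages in the scenario above are plain relative changes. Here is a minimal sketch of that arithmetic; the dictionary keys and the relative_change helper are illustrative names, not part of the project's code.

```python
def relative_change(baseline: float, current: float) -> float:
    """Return the fractional change from baseline to current."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline


# Figures taken from the January vs. November scenario above
january = {"avg_trip_miles": 5.2, "rmse": 2.5}
november = {"avg_trip_miles": 7.8, "rmse": 3.8}

for key in january:
    change = relative_change(january[key], november[key])
    print(f"{key}: {change:+.0%}")
```

Tracking the same ratio over time is what turns a one-off comparison into a monitoring signal.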

Event-Driven Continuous Training

Goal: New data arrives → automatically retrain model.

Cloud Run Function Trigger

# Cloud Run Function (simplified)
def mlops_entrypoint(event, context):
    """Triggered when new data arrives in BigQuery."""
    # Parse event
    dataset_id = event['protoPayload']['resourceName']
    # Check if significant new data
    if should_retrain(dataset_id):
        # Trigger training pipeline
        trigger_training_pipeline(
            template_path="gs://.../taxifare-training-pipeline:latest",
            enable_caching=False,
            use_latest_data=True
        )
    return "OK"

Trigger Conditions

def should_retrain(dataset_id):
    # Option 1: Time-based
    if hours_since_last_training() > 24:
        return True
    # Option 2: Data volume-based
    if new_rows_since_last_training() > 100000:
        return True
    # Option 3: Performance-based (requires ground truth)
    if recent_rmse() > champion_rmse() * 1.1:
        return True
    return False
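
The helper calls above (hours_since_last_training and friends) are placeholders. As a rough, self-contained sketch, the same three conditions can be written against an explicit input object, which makes the decision easy to unit test. RetrainSignals and the returned reason strings are assumptions made for this example, not names from the repository.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RetrainSignals:
    """Inputs to the retraining decision (hypothetical container)."""
    hours_since_last_training: float
    new_rows_since_last_training: int
    champion_rmse: float
    recent_rmse: Optional[float] = None  # None until ground truth arrives


def should_retrain(
    signals: RetrainSignals,
    max_age_hours: float = 24,
    min_new_rows: int = 100_000,
    rmse_tolerance: float = 1.1,
) -> Tuple[bool, str]:
    """Return (decision, reason) so the trigger can log why it fired."""
    if signals.hours_since_last_training > max_age_hours:
        return True, "time"
    if signals.new_rows_since_last_training > min_new_rows:
        return True, "volume"
    if (
        signals.recent_rmse is not None
        and signals.recent_rmse > signals.champion_rmse * rmse_tolerance
    ):
        return True, "performance"
    return False, "none"


# Fresh model, little new data, but RMSE well above the champion's
print(should_retrain(RetrainSignals(6, 20_000, champion_rmse=2.5, recent_rmse=3.8)))
```

Returning the reason alongside the decision is a design choice: it gives the Cloud Logging entries something concrete to filter on.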

Event Flow

New BigQuery Rows
       ↓
Pub/Sub Message
       ↓
Cloud Run Function
       ↓
(Decision: Should retrain?)
       ↓
Trigger Training Pipeline
       ↓
Train → Evaluate → Compare → Upload (if better)
       ↓
Pub/Sub Notification (pipeline complete)
       ↓
Cloud Run Function
       ↓
Trigger Prediction Pipeline (use new model)

Configuration

# Terraform: Event trigger setup
resource "google_eventarc_trigger" "bigquery_insert_trigger" {
  name     = "bigquery-data-insert"
  location = var.region
  matching_criteria {
    attribute = "type"
    value     = "google.cloud.bigquery.dataset.v1.dataInserted"
  }
  destination {
    cloud_run_service {
      service = google_cloud_run_service.mlops_trigger.name
      region  = var.region
    }
  }
}

Scheduled Retraining

For predictable retraining (e.g., weekly):

Vertex AI Pipeline Schedule Setup

# Create weekly training schedule using Vertex AI Pipeline Schedules
poetry run python -m pipelines.utils.schedule_pipeline \
  --pipeline_type=training \
  --template_path=https://us-central1-kfp.pkg.dev/my-project/mlops-pipeline-repo/taxifare-training-pipeline/latest \
  --pipeline_root=gs://my-project-pl-root \
  --display_name=prod-training-pipeline \
  --schedule_name=prod-training-schedule \
  --cron="0 2 * * 0" \
  --enable_caching=false \
  --use_latest_data=true

Common Schedules

# Daily at 2 AM
--schedule="0 2 * * *"
# Weekly on Sunday at 2 AM
--schedule="0 2 * * 0"
# Monthly on 1st at 2 AM
--schedule="0 2 1 * *"
# Every 6 hours
--schedule="0 */6 * * *"
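
To sanity-check a cron expression before creating a schedule, a small matcher helps. The sketch below is a simplification written for this article, not Vertex AI's parser: it only understands `*`, `*/n`, and single integers, which is enough for the four expressions above.

```python
from datetime import datetime


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', '*/n', or a single integer."""
    if field == "*":
        return True
    if field.startswith("*/"):
        # Simplification: only correct for fields that start at 0 (minute, hour)
        return value % int(field[2:]) == 0
    return value == int(field)


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if the 5-field cron expression fires at `when`."""
    minute, hour, day, month, weekday = expr.split()
    # cron uses 0 = Sunday, Python's weekday() uses 0 = Monday
    cron_weekday = (when.weekday() + 1) % 7
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(day, when.day)
        and _field_matches(month, when.month)
        and _field_matches(weekday, cron_weekday)
    )


sunday_2am = datetime(2026, 2, 15, 2, 0)  # 15 Feb 2026 is a Sunday
print(cron_matches("0 2 * * 0", sunday_2am))
```

A check like this can catch an accidental weekday/day-of-month swap before it costs a week of missed retraining.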

Scheduled Pipeline Parameters

# Schedule with parameters
poetry run python -m pipelines.utils.schedule_pipeline \
  --project=my-prod-project \
  --location=us-central1 \
  --pipeline_template_path=gs://.../training:latest \
  --schedule="0 2 * * 0" \
  --parameters='{
    "use_latest_data": true,
    "enable_caching": false,
    "model_name": "taxi-traffic-model"
  }'

Production Orchestration Patterns

Pattern 1: Scheduled Training → Automatic Prediction

Vertex AI Pipeline Schedule (weekly)
    ↓
Training Pipeline
    ↓
(If new champion)
    ↓
Trigger Prediction Pipeline
    ↓
Generate predictions for next week

Use case: Weekly batch predictions for business planning

Implementation: This pattern is achieved through the Cloud Run Function that listens for training pipeline completion events via Pub/Sub, then triggers the prediction pipeline if a new champion model was uploaded.
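
The decision that function makes can be sketched as a small pure function. The event fields used here (pipeline_type, state, new_champion) are a hypothetical schema chosen for illustration; the real Pub/Sub message layout may differ.

```python
def next_action(event: dict) -> str:
    """Decide what to do with a pipeline-completion event.

    Assumed (hypothetical) event fields:
      pipeline_type: "training" or "prediction"
      state:         final pipeline state, e.g. "SUCCEEDED"
      new_champion:  True if the run uploaded a new default model version
    """
    if event.get("pipeline_type") != "training":
        return "ignore"
    if event.get("state") != "SUCCEEDED":
        return "ignore"
    if event.get("new_champion"):
        return "trigger_prediction"
    return "ignore"


print(next_action({"pipeline_type": "training", "state": "SUCCEEDED", "new_champion": True}))
```

Keeping the decision separate from the Vertex AI call means it can be tested without any cloud resources.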

Pattern 2: New Data → Continuous Training

New Data Arrives (hourly)
    ↓
Cloud Run Function
    ↓
(Check: Enough new data?)
    ↓
Training Pipeline
    ↓
(Champion/Challenger comparison)
    ↓
Model Registry (update champion if better)

Use case: Always have the freshest model

Implementation: The Cloud Run Function (terraform/modules/cloudrunfunction/src/main.py) triggers pipelines based on Pub/Sub events:

import json
import os

import functions_framework

@functions_framework.cloud_event
def mlops_entrypoint(event):
    pipeline_config = os.getenv("PIPELINE_CONFIG")
    pipeline_config_dict = json.loads(pipeline_config)
    # submit_pipeline_job is not shown here
    submit_pipeline_job(pipeline_config_dict)

The function reads configuration from environment variables and submits the appropriate pipeline job (training or prediction) to Vertex AI.
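
Because the configuration arrives as a JSON string, it is worth validating before submitting anything. The sketch below assumes the JSON carries the same keys as the Terraform pipeline_config object (type, training_template_path, prediction_template_path); select_template is a hypothetical helper, not a function from the repository.

```python
import json
from typing import Tuple

VALID_TYPES = {"training", "prediction"}


def select_template(raw_config: str) -> Tuple[str, str]:
    """Parse the JSON config and return (pipeline_type, template_path)."""
    config = json.loads(raw_config)
    pipeline_type = config.get("type")
    if pipeline_type not in VALID_TYPES:
        raise ValueError(f"unsupported pipeline type: {pipeline_type!r}")
    key = f"{pipeline_type}_template_path"
    if key not in config:
        raise KeyError(f"missing config key: {key}")
    return pipeline_type, config[key]


raw = json.dumps({
    "type": "training",
    "training_template_path": "https://example.invalid/training/latest",  # placeholder URL
    "prediction_template_path": "https://example.invalid/prediction/latest",
})
print(select_template(raw))
```

Failing loudly on a bad config is cheaper than discovering a misrouted pipeline run hours later.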

Pattern 3: Event-Driven Training via Cloud Run Function

The Cloud Run Function can be triggered by various events (Pub/Sub, Cloud Storage, etc.) to automatically start training or prediction pipelines. The actual trigger mechanism is configured in Terraform (terraform/modules/cloudrunfunction/) and the function logic handles pipeline submission to Vertex AI.

Observability and Debugging

Key Metrics to Monitor

1. Model Performance: View in Vertex AI Model Registry:

RMSE, MAE, MAPE, MSLE metrics for each model version
Comparison between champion and challenger models
Evaluation results from test set

2. Data Skew: Monitored by the model_batch_predict_op component in the prediction pipeline:

Training vs. serving feature distributions
Skew detection thresholds configured per feature
Automatic email alerts when skew exceeds threshold
Metrics logged to Cloud Logging
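
To make "skew exceeds threshold" concrete, here is a sketch using L-infinity distance, a common choice for comparing categorical distributions. Whether the pipeline uses exactly this metric is an assumption; the 0.10 threshold is hypothetical, and the payment figures come from the scenario at the start of this article.

```python
from typing import Dict


def linf_distance(train: Dict[str, float], serving: Dict[str, float]) -> float:
    """Largest absolute difference in category share between two distributions."""
    categories = set(train) | set(serving)
    return max(abs(train.get(c, 0.0) - serving.get(c, 0.0)) for c in categories)


train_payment = {"credit": 0.60, "cash": 0.40}    # January (training)
serving_payment = {"credit": 0.75, "cash": 0.25}  # November (serving)

SKEW_THRESHOLD = 0.10  # hypothetical per-feature threshold

distance = linf_distance(train_payment, serving_payment)
skew_detected = distance > SKEW_THRESHOLD
print(f"payment_type skew={distance:.2f} detected={skew_detected}")
```

Numerical features usually need a different distance (for example, one computed over histogram bins), which is why thresholds are configured per feature.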

3. Pipeline Execution: Track in Vertex AI Pipelines console:

Pipeline success/failure rates
Component execution times
Resource utilization
Error logs and stack traces

4. Training Frequency: Monitor via Vertex AI Pipeline Schedules:

Scheduled run frequency (hourly, daily, weekly)
Manual vs. automatic triggers
Champion model update frequency

Cloud Logging Queries

Find training triggers:

resource.type="cloud_run_revision"
resource.labels.service_name="mlops-trigger"
jsonPayload.message="Triggering training pipeline"

Find champion promotions:

resource.type="aiplatform.googleapis.com/PipelineJob"
jsonPayload.message=~"Challenger wins"

Find skew detections:

resource.type="aiplatform.googleapis.com/BatchPredictionJob"
jsonPayload.skew_detected=true

Dashboards

Create Cloud Monitoring dashboards to visualize:

Vertex AI Pipeline execution success rates and durations
Model evaluation metrics from the Model Registry
Cloud Run Function invocation counts and errors
BigQuery job statistics for data processing steps
Skew detection alerts from batch prediction jobs

Responding to Model Degradation

Alert → Investigate → Retrain Workflow

1. Receive Alert: When the prediction pipeline’s model_batch_predict_op component detects data skew, it sends an email alert configured in the component parameters.

2. Investigate:

Check Vertex AI Pipelines console for skew detection details
Review Cloud Logging for skew metrics and feature distributions
Compare recent prediction data against training dataset in BigQuery

3. Trigger Retraining: Manually trigger the training pipeline with latest data:

cd pipelines
poetry run python -m pipelines.utils.trigger_pipeline \
  --template_path=./taxifare-training-pipeline.yaml \
  --display_name=manual-retrain-pipeline \
  --enable_caching=false \
  --use_latest_data=true

Or use the Makefile shortcut:

make training build=false enable_caching=false use_latest_data=true

4. Validate Improvement:

Check Vertex AI Model Registry for new model metrics
Compare RMSE between old champion and new model
The pipeline’s champion/challenger logic automatically promotes better models

Optimization Strategies

1. Use Pipeline Caching:

# Enable caching for preprocessing steps that don’t change
make training enable_caching=true

2. Adjust Training Schedule: Configure pipeline schedules based on data velocity:

High-frequency data: Daily training
Stable data: Weekly training
Monitor skew alerts to determine optimal frequency

3. Right-Size Training Resources: Configure machine types in get_workerpool_spec_op component based on dataset size and model complexity.

4. Clean Up Old Artifacts: Regularly manage artifacts in:

Artifact Registry (old pipeline versions and Docker images)
Vertex AI Model Registry (non-champion model versions)
Cloud Storage (old pipeline artifacts and outputs)

5. Optimize BigQuery Costs: The preprocessing SQL queries (ingest.sql, ingest_pred.sql) are optimized to:

Filter data early in the query
Use partitioning when available
Limit data scanned with timestamps

Best Practices

1. Always Version Everything

The system automatically versions:

Models: Stored in Vertex AI Model Registry with version numbers
Pipelines: Tagged in Artifact Registry (e.g., v1.2.3, latest)
Docker images: Tagged in Artifact Registry matching pipeline versions
Training data: Timestamped via timestamp parameter in pipeline runs

2. Use Champion/Challenger Pattern

Implemented in the upload_best_model_op component:

New models are only promoted if they beat the current champion
RMSE comparison happens automatically during training pipeline
All models are preserved in registry for rollback capability
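
The comparison itself is small. A minimal sketch follows, assuming a single scalar metric and a lower_is_better flag; challenger_wins is an illustrative name, and the real upload_best_model_op may handle more cases.

```python
from typing import Optional


def challenger_wins(
    challenger_metric: float,
    champion_metric: Optional[float],
    lower_is_better: bool = True,
) -> bool:
    """Return True if the challenger should become the default version."""
    if champion_metric is None:
        # No champion yet: the first model is promoted by default
        return True
    if lower_is_better:
        return challenger_metric < champion_metric
    return challenger_metric > champion_metric


# RMSE values from the retraining log example in this article
print(challenger_wins(2.3, 2.5))  # challenger RMSE 2.3 vs champion 2.5
print(challenger_wins(2.8, 2.3))  # challenger RMSE 2.8 vs champion 2.3
```

Using a strict comparison keeps the current champion on ties, which avoids churn in the registry when two models perform identically.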

3. Monitor Before Optimizing

1. Deploy with monitoring
2. Observe for 1 week
3. Identify bottlenecks
4. Optimize selectively
5. Measure improvement

4. Set Up Alerts Thoughtfully

# Bad: Alert on every small change
if rmse > baseline_rmse * 1.01:
    alert()
# Good: Alert on sustained degradation
if rolling_avg_rmse(days=7) > baseline_rmse * 1.15:
    alert()
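
The rolling_avg_rmse call above is pseudocode. One way to implement the "sustained degradation" idea is a fixed-size window over daily RMSE values, sketched below with the standard library only. The class name and the rule of staying silent until the window is full are choices made for this example.

```python
from collections import deque
from statistics import mean


class SustainedDegradationAlert:
    """Fire only when the rolling mean RMSE exceeds baseline * tolerance."""

    def __init__(self, baseline_rmse: float, window: int = 7, tolerance: float = 1.15):
        self.baseline_rmse = baseline_rmse
        self.tolerance = tolerance
        self.daily_rmse = deque(maxlen=window)

    def observe(self, rmse: float) -> bool:
        """Record one day's RMSE and return True if an alert should fire."""
        self.daily_rmse.append(rmse)
        window_full = len(self.daily_rmse) == self.daily_rmse.maxlen
        return window_full and mean(self.daily_rmse) > self.baseline_rmse * self.tolerance


monitor = SustainedDegradationAlert(baseline_rmse=2.5)
# Six normal days followed by one bad day: a single spike should not alert
alerts = [monitor.observe(rmse) for rmse in [2.5] * 6 + [4.0]]
print(alerts)
```

A one-day spike is averaged away, while a full week above the threshold triggers the alert.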

5. Document Retraining Decisions

## Retraining Log
### 2024-01-15
- Trigger: Scheduled weekly retrain
- Data: 2024-01-08 to 2024-01-15
- Result: New model RMSE 2.3 (vs champion 2.5) → Promoted
- Notes: Improved accuracy on credit card payments
### 2024-01-22
- Trigger: Accuracy degradation alert
- Data: 2024-01-15 to 2024-01-22
- Result: New model RMSE 2.8 (vs champion 2.3) → Not promoted
- Notes: Holiday data anomaly, monitoring

Conclusion: Your Complete MLOps System

You’ve now built a complete, production-ready MLOps system across 8 articles:

Architecture → Multi-environment design
Developer Experience → Productive workflows
Infrastructure → Automated with Terraform
Components → Modular and reusable
Training → Sophisticated pipeline
Prediction → Scalable inference
CI/CD → Complete automation
Operations → Continuous improvement

Your system now:

Trains models automatically
Deploys only better models
Generates predictions at scale
Monitors for degradation
Retrains when needed
Maintains itself

What’s next?

Implement it for your use case
Customize for your data
Extend with new features
Share learnings with the community

Thank you for following this comprehensive series!

Now go build amazing, self-maintaining ML systems! 🚀🎉

Production-Ready MLOps on GCP Part 7: CI/CD for ML

Saoussen CHAABNIA — Tue, 17 Feb 2026 15:14:41 GMT

Introduction

In the previous article, we built a sophisticated training pipeline that goes from raw data to a production-ready model in Vertex AI Model Registry. But there’s a problem: if you’re manually building containers, compiling pipelines, and deploying infrastructure whenever code changes, you’ll spend more time on plumbing than on improving your models.

This is where CI/CD (Continuous Integration / Continuous Deployment) transforms MLOps from manual and error-prone to automated and reliable.

But ML CI/CD isn’t just copying traditional software CI/CD. ML systems have unique challenges:

Long-running jobs: Training can take hours (not seconds like unit tests)
Non-determinism: Models trained on same data can differ slightly
Multi-artifact deployments: Code + data + models + infrastructure
Multiple environments: Dev, test, prod with different data
Integration testing: Need actual cloud resources (expensive!)

In this article, we’ll explore:

Our 6 Cloud Build CI/CD pipelines
Testing strategies for ML (unit, integration, E2E)
Infrastructure automation with Terraform
Release management and versioning
Development workflow from PR to production

By the end, you’ll understand how to build CI/CD that makes deploying ML as reliable as deploying traditional software.

CI/CD Architecture Overview

Our CI/CD runs entirely in an admin GCP project separate from dev/test/prod:

GitHub Repository
       |
       | (webhook on PR/merge/tag)
       ↓
Admin Project - Cloud Build
       |
       ├──> PR Checks (on pull request)
       ├──> E2E Tests (on /gcbrun comment)
       ├──> Terraform Plan (on PR affecting terraform/)
       ├──> Terraform Apply (on merge to main)
       ├──> Release (on git tag)
       └──> Schedule Pipelines (manual trigger)
       |
       ↓
Deploy to: Dev / Test / Prod Projects

Complete CI/CD workflow from code commit to production deployment

Why a separate admin project?

Security: Cloud Build has permissions to deploy to all environments
Isolation: CI/CD failures don’t affect production workloads
Auditing: All deployments tracked in one place
Cost tracking: Separate billing for CI/CD

The 6 Cloud Build Pipelines

1. PR Checks (`pr-checks.yaml`)

Trigger: Every pull request
Purpose: Fast feedback on code quality

steps:
  - name: python:3.10.14
    entrypoint: /bin/sh
    args:
      - -c
      - |
        # Install Poetry
        curl -sSL https://install.python-poetry.org | python3 -
        export PATH="/builder/home/.local/bin:$$PATH"
        # Install dependencies
        make install
        # Git init for pre-commit
        git init && git add .
        # Compile pipelines (validates syntax)
        make compile pipeline=training
        make compile pipeline=prediction
        # Run unit tests
        make test-components
        make test-pipelines
timeout: 5400s  # 90 minutes

What gets checked:

Code quality: flake8, black, ruff (via pre-commit hooks)
Pipeline syntax: Can KFP compile the pipelines?
Component logic: Do unit tests pass?
Pipeline logic: Do pipeline tests pass?

Example failure:

❌ flake8: line too long (E501)
   File: components/src/components/upload_best_model_op.py
   Line 45: from google.cloud.aiplatform_v1 import ModelEvaluation, ModelServiceClient

This catches issues before they’re merged, saving time and preventing broken main branches.

Developer experience:

Open PR
Cloud Build automatically runs checks
Results appear as GitHub check (✅ or ❌)
Fix issues if needed
Merge when all checks pass

2. E2E Tests (`e2e-test.yaml`)

Trigger: Comment /gcbrun on PR
Purpose: Validate that pipelines actually run in Vertex AI

steps:
  # Build training container
  - id: build-training-image
    name: gcr.io/cloud-builders/docker
    dir: model
    args: [
      'build',
      '-t', '${_TEST_VERTEX_LOCATION}-docker.pkg.dev/...',
      '.'
    ]
  # Push to Artifact Registry
  - id: push-training-image
    name: gcr.io/cloud-builders/docker
    args: ['push', '...']
  # Run E2E tests
  - id: e2e-tests
    name: python:3.10.14
    entrypoint: /bin/sh
    args:
      - -c
      - |
        make install
        export TRAINING_IMAGE=...
        make e2e-tests pipeline=training
        make e2e-tests pipeline=prediction
timeout: 18000s  # 5 hours (allows pipelines to run)

What gets tested:

Build: Can the training container build successfully?
Training pipeline: Does it run end-to-end in Vertex AI?
Prediction pipeline: Does it produce predictions?
Artifacts: Are models uploaded? Predictions generated?

Why manual trigger?

E2E tests are expensive (VM costs, BigQuery, etc.)
E2E tests take hours
Not every PR needs E2E testing
Developer decides when to run

When to run E2E tests:

✅ Before merging major changes
✅ After refactoring pipeline logic
✅ When adding new components
❌ For documentation-only changes
❌ For minor bug fixes
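
The checklist above is a judgment call, but part of it can be automated as a hint for reviewers. The sketch below is a hypothetical helper, not something in the repository: it suggests E2E tests when a PR touches the pipelines/, components/, or model/ directories and skips the suggestion for documentation-only changes.

```python
from pathlib import PurePosixPath
from typing import List

DOC_SUFFIXES = {".md", ".rst", ".txt"}
CODE_PREFIXES = ("pipelines/", "components/", "model/")


def is_docs_only(changed_files: List[str]) -> bool:
    """True if every changed file looks like documentation."""
    return bool(changed_files) and all(
        PurePosixPath(path).suffix.lower() in DOC_SUFFIXES for path in changed_files
    )


def should_suggest_e2e(changed_files: List[str]) -> bool:
    """Suggest E2E tests when pipeline, component, or model code changed."""
    if is_docs_only(changed_files):
        return False
    return any(path.startswith(CODE_PREFIXES) for path in changed_files)


print(should_suggest_e2e(["pipelines/src/pipelines/queries/ingest.sql"]))
```

The developer still decides whether to comment /gcbrun; the helper only flags PRs where skipping E2E deserves a second thought.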

3. Terraform Plan (`terraform-plan.yaml`)

Trigger: PR that modifies terraform/ files
Purpose: Preview infrastructure changes before applying

steps:
  - name: hashicorp/terraform
    args:
      - init
      - -backend-config=bucket=${_TFSTATE_BUCKET}
  - name: hashicorp/terraform
    args:
      - plan
      - -out=tfplan
  - name: hashicorp/terraform
    args:
      - show
      - tfplan

Example output:

Terraform will perform the following actions:
  # google_storage_bucket.new_bucket will be created
  + resource "google_storage_bucket" "new_bucket" {
      + name     = "my-project-new-bucket"
      + location = "us-central1"
    }
  # google_service_account_iam_member.new_permission will be added
  + resource "google_service_account_iam_member" "new_permission" {
      + role               = "roles/storage.objectViewer"
      + service_account_id = "projects/.../serviceAccounts/vertex-pipelines@..."
    }
Plan: 2 to add, 0 to change, 0 to destroy.

Why this matters:

Prevents accidental deletions
Makes infrastructure changes visible to reviewers
Enables discussion before changes are applied
Catches Terraform syntax errors
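
To guard against accidental deletions, the plan summary line can be parsed and any non-zero destroy count flagged for extra review. This is a sketch under the assumption that the plan text has no color codes (for example, produced with Terraform's -no-color flag); the function names are illustrative.

```python
import re
from typing import Dict, Optional

_PLAN_SUMMARY = re.compile(
    r"Plan: (\d+) to add, (\d+) to change, (\d+) to destroy\."
)


def parse_plan_summary(plan_output: str) -> Optional[Dict[str, int]]:
    """Extract add/change/destroy counts, or None if no summary line is found."""
    match = _PLAN_SUMMARY.search(plan_output)
    if match is None:
        return None
    add, change, destroy = (int(n) for n in match.groups())
    return {"add": add, "change": change, "destroy": destroy}


def needs_extra_review(plan_output: str) -> bool:
    """Flag plans that would destroy at least one resource."""
    summary = parse_plan_summary(plan_output)
    return summary is not None and summary["destroy"] > 0


print(parse_plan_summary("Plan: 2 to add, 0 to change, 0 to destroy."))
```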

Separate triggers for each environment:

terraform-plan-dev.yaml
terraform-plan-test.yaml
terraform-plan-prod.yaml

This allows environment-specific infrastructure changes.

4. Terraform Apply (`terraform-apply.yaml`)

Trigger: Merge to main branch
Purpose: Actually deploy infrastructure changes

steps:
  - name: hashicorp/terraform
    args:
      - init
      - -backend-config=bucket=${_TFSTATE_BUCKET}
  - name: hashicorp/terraform
    args:
      - apply
      - -auto-approve

Deployment order:

Dev environment (lowest risk)
Test environment (validate before prod)
Prod environment (final deployment)

Safety mechanisms:

Terraform plan must have been reviewed in PR
State is locked in GCS (prevents concurrent applies)
Separate triggers prevent accidental prod deployment
Cloud Build logs create audit trail

5. Release (`release.yaml`)

Trigger: Git tag (e.g., v1.2.3)
Purpose: Build and push versioned artifacts to all environments

steps:
  # Build Docker image
  - id: build-container-images
    name: gcr.io/cloud-builders/docker
    entrypoint: /bin/sh
    args:
      - -c
      - |
        docker build -t ${_IMAGE_NAME}:latest .
        for proj in ${_DESTINATION_PROJECTS} ; do
          docker tag ${_IMAGE_NAME}:latest \
            .../${proj}/mlops-docker-repo/${_IMAGE_NAME}:${TAG_NAME}
          docker push \
            .../${proj}/mlops-docker-repo/${_IMAGE_NAME}:${TAG_NAME}
        done
  # Compile and upload pipelines
  - id: compile-and-publish-pipelines
    name: python:3.10.14
    entrypoint: /bin/sh
    args:
      - -c
      - |
        make install
        for proj in ${_DESTINATION_PROJECTS} ; do
          export TRAINING_IMAGE=.../${proj}/.../training:${TAG_NAME}
          make compile pipeline=training
          make compile pipeline=prediction
          # Upload to Artifact Registry
          poetry run python -m pipelines.utils.upload_pipeline \
            --template_path=taxifare-training-pipeline.yaml \
            --tag=latest \
            --tag=${TAG_NAME}
        done
timeout: 1800s  # 30 minutes

Artifacts created (for each environment):

Docker image: training:v1.2.3
Training pipeline: taxifare-training-pipeline:v1.2.3
Prediction pipeline: taxifare-prediction-pipeline:v1.2.3

Tagging strategy:

latest: Always points to most recent release
v1.2.3: Specific version for rollback

Release workflow:

# Create and push git tag
git tag -a v1.2.3 -m "Release 1.2.3: Improved model accuracy"
git push origin v1.2.3
# Cloud Build automatically:
# 1. Builds Docker images
# 2. Compiles pipelines
# 3. Pushes to all environments (dev/test/prod)

6. Schedule Pipelines (`schedule-pipelines.yaml`)

Trigger: Manual
Purpose: Create Vertex AI Pipeline Schedules for periodic retraining

steps:
  - name: python:3.10.14
    entrypoint: /bin/sh
    args:
      - -c
      - |
        poetry run python -m pipelines.utils.schedule_pipeline \
          --project=${_VERTEX_PROJECT_ID} \
          --location=${_VERTEX_LOCATION} \
          --pipeline_template_path=${_TRAINING_TEMPLATE_PATH} \
          --schedule="0 2 * * 0"  # Every Sunday at 2 AM

Use cases:

Weekly model retraining in production
Daily retraining in test environment
Monthly full data refresh

Testing Strategies for ML

ML testing requires multiple levels:

Level 1: Unit Tests

Test individual component logic in isolation:

# tests/test_upload_best_model_op.py
def test_champion_wins(mock_model_class, tmp_path):
    """Test that champion is preserved when it's better."""
    # Mock champion with RMSE=0.8
    mock_champion = create_mock_model(rmse=0.8)
    mock_model_class.list.return_value = [mock_champion]
    # Create challenger with worse RMSE=0.9
    challenger_metrics = {"rmse": 0.9}
    # Call component function
    upload_model(
        model_eval_metrics=challenger_metrics,
        eval_metric="rmse",
        eval_lower_is_better=True,
        # ...
    )
    # Assert challenger uploaded but NOT as default
    mock_model_class.upload.assert_called_once_with(
        is_default_version=False  # Champion preserved!
    )

Benefits:

Fast (milliseconds)
Free (no cloud resources)
Run on every commit

Level 2: Pipeline Compilation Tests

Ensure pipelines can compile to YAML:

def test_training_pipeline_compiles():
    """Validate training pipeline compiles without errors."""
    from kfp import compiler
    from pipelines.training import pipeline
    compiler.Compiler().compile(
        pipeline_func=pipeline,
        package_path="test_training_pipeline.yaml"
    )
    # If this doesn’t raise, compilation succeeded

Catches:

Syntax errors in pipeline definition
Missing component imports
Incorrect input/output connections

Level 3: End-to-End Tests

Run actual pipelines in dev environment:

# Build training container
# Build and push under the same fully qualified tag
IMAGE=us-central1-docker.pkg.dev/my-dev-project/mlops-docker-repo/training:test
docker build -t "$IMAGE" ./model
docker push "$IMAGE"
# Run training pipeline E2E
poetry run python -m pipelines.utils.run_pipeline \
  --pipeline=training \
  --project=my-dev-project \
  --location=us-central1 \
  --enable_caching=false
# Verify outputs
# - Model exists in Model Registry?
# - Evaluation metrics logged?
# - Champion comparison executed?

What E2E tests catch:

IAM permission issues
API enablement problems
Resource quota limits
Real data quality issues
Training convergence problems

Pre-commit Hooks: Local Quality Gates

Before code even reaches CI, pre-commit hooks catch issues:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
  - repo: https://github.com/psf/black
    hooks:
      - id: black
  - repo: https://github.com/pycqa/flake8
    hooks:
      - id: flake8
        args: [--max-line-length=100]
  - repo: https://github.com/astral-sh/ruff-pre-commit
    hooks:
      - id: ruff
        args: [--fix]

Developer workflow:

# Install hooks once
cd pipelines && poetry run pre-commit install
# Hooks run automatically on git commit
git add components/src/components/my_component.py
git commit -m "Add new component"
# Pre-commit runs:
#   - Removes trailing whitespace
#   - Formats code with black
#   - Checks code quality with flake8
#   - Auto-fixes issues with ruff
# Commit proceeds only if all hooks pass

Benefits:

Instant feedback (don’t wait for CI)
Consistent code style across team
Catches common issues before PR

Complete Workflow: From Code to Production

Let’s walk through a complete development cycle, from feature development to production deployment:

1. Feature Development

# Create feature branch
git checkout -b feature/improve-preprocessing
# Make changes
vim pipelines/src/pipelines/queries/ingest.sql
# Run tests locally
make test-pipelines
# Commit (pre-commit hooks run)
git add pipelines/
git commit -m "Improve feature engineering in preprocessing"

2. Pull Request

# Push branch
git push origin feature/improve-preprocessing
# Open PR on GitHub
gh pr create --title "Improve preprocessing" --body "Add speed features"

Automatic triggers:

PR Checks: Linting, tests, compilation
Terraform Plan (if infrastructure changed)

3. Code Review

Reviewer sees:

Code changes
PR check results (all passing)
Terraform plan (if applicable)

Reviewer can request:

Could you run E2E tests to validate this works end-to-end?
Comment /gcbrun to trigger

Developer comments /gcbrun → E2E tests run

4. Merge

Once approved and checks pass:

gh pr merge --squash

Automatic triggers:

Terraform Apply (if infrastructure changed)
Deploy to dev environment

5. Release

When ready for test/prod:

# Create release tag
git tag -a v1.3.0 -m "Release 1.3.0: Improved preprocessing"
git push origin v1.3.0

Automatic triggers:

Build Docker images for all environments
Compile and upload pipelines
Tag with v1.3.0 and latest

6. Deploy to Test Environment

After the release is complete, manually create pipeline schedules in the test then prod environments:

Option A: Via Cloud Build UI

Go to Cloud Build → Triggers in admin project
Find schedule-pipelines trigger → Click “Run”
Provide substitutions

Option B: Via gcloud

gcloud builds submit \
  --config=cloudbuild/schedule-pipelines.yaml \
  --project=admin-project \
  --substitutions=_ENV=test,_TRAINING_TAG_NAME=v1.3.0,...

This creates two Vertex AI Pipeline Schedules in the test project:

test-training-schedule (runs hourly)
test-prediction-schedule (runs daily)

Verify: Vertex AI Console → Pipelines → Schedules

7. Deploy to Production

Once validated in test, repeat for production:

gcloud builds submit \
  --config=cloudbuild/schedule-pipelines.yaml \
  --project=admin-project \
  --substitutions=_ENV=prod,_TRAINING_TAG_NAME=v1.3.0,...

Creates prod-training-schedule and prod-prediction-schedule

Important: Schedules are created via Vertex AI’s PipelineJobSchedule API (not Cloud Scheduler), executed by pipelines/src/pipelines/utils/schedule_pipeline.py

Event-Driven Execution (Optional)

For continuous training triggered by new data, deploy the Cloud Run Function via Terraform:

# In terraform/environments/prod/main.tf
module "cloudrunfunction" {
  source = "../../modules/cloudrunfunction"
  pipeline_config = {
    type                     = "training"
    training_template_path   = "https://.../taxifare-training-pipeline/latest"
    prediction_template_path = "https://.../taxifare-batch-prediction-pipeline/latest"
    # ... other config
  }
  dataset_id = "chicago_taxi_trips"
  table_id   = "taxi_trips"
}

The function (terraform/modules/cloudrunfunction/src/main.py) triggers pipelines when new data is inserted into BigQuery, providing an alternative to scheduled runs.

Artifact Management

Docker Image Versioning

us-central1-docker.pkg.dev/my-project/mlops-docker-repo/training:
├── latest           (points to v1.3.0)
├── v1.3.0           (current release)
├── v1.2.3           (previous release)
├── v1.2.2           (older release)
└── abc123f          (commit SHAs for testing)

Tagging strategy:

latest: Production deployments pull this
v1.2.3: Specific version for reproducibility
Commit SHA: E2E testing during PR

Pipeline Versioning

Same strategy for compiled KFP pipelines:

mlops-pipeline-repo/taxifare-training-pipeline:
├── latest
├── v1.3.0
└── v1.2.3

Rollback scenario:

# Something wrong with v1.3.0?
# Submit pipeline with older version
poetry run python -m pipelines.utils.run_pipeline \
  --template_path=https://.../taxifare-training-pipeline:v1.2.3
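
Choosing the version to roll back to can be scripted once tags follow the vMAJOR.MINOR.PATCH pattern. The sketch below assumes the list of tags has already been fetched (how it is fetched is out of scope here) and that non-version tags such as latest or commit SHAs should be ignored; rollback_target is a hypothetical helper.

```python
import re
from typing import List, Optional, Tuple

_VERSION_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_version(tag: str) -> Optional[Tuple[int, int, int]]:
    """Turn 'v1.2.3' into (1, 2, 3); return None for non-version tags."""
    match = _VERSION_TAG.match(tag)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def rollback_target(tags: List[str], current: str) -> Optional[str]:
    """Return the newest version tag that is older than `current`."""
    current_version = parse_version(current)
    if current_version is None:
        raise ValueError(f"not a version tag: {current!r}")
    older = [
        tag for tag in tags
        if parse_version(tag) is not None and parse_version(tag) < current_version
    ]
    return max(older, key=parse_version) if older else None


tags = ["latest", "v1.3.0", "v1.2.3", "v1.2.2", "abc123f"]
print(rollback_target(tags, "v1.3.0"))
```

Comparing versions as integer tuples rather than strings matters once a component reaches two digits: v1.10.0 is newer than v1.9.0, although it sorts earlier alphabetically.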

Best Practices

1. Fail Fast

Order steps from fastest to slowest:

steps:
  - Lint (5s)              # Fail here if code quality issues
  - Unit tests (30s)       # Fail here if logic broken
  - Compile (1min)         # Fail here if syntax errors
  - E2E tests (1hr)        # Only run if everything else passes
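
The same ordering idea can be expressed in a few lines of Python. This is an illustrative sketch, not how Cloud Build schedules steps: each check carries an estimated cost in seconds, the cheapest runs first, and execution stops at the first failure.

```python
from typing import Callable, List, Tuple

# (name, estimated seconds, callable returning True on success)
Check = Tuple[str, float, Callable[[], bool]]


def run_fail_fast(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run checks from cheapest to most expensive, stopping at the first failure."""
    executed = []
    for name, _cost, check in sorted(checks, key=lambda c: c[1]):
        executed.append(name)
        if not check():
            return False, executed
    return True, executed


checks = [
    ("e2e", 3600, lambda: True),
    ("lint", 5, lambda: True),
    ("compile", 60, lambda: True),
    ("unit", 30, lambda: False),  # simulate a failing unit test
]
print(run_fail_fast(checks))
```

In this run the compile and E2E steps never execute, so the hour-long job is not paid for when a 30-second test already failed.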

2. Make CI/CD Logs Searchable

# Structured logging
logging.info(f"component=upload_model status=success model_id={model_id}")

Cloud Logging query:

resource.type="cloud_build"
jsonPayload.component="upload_model"
jsonPayload.status="success"
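
Filters on jsonPayload.* fields only match entries whose payload is structured, so a key=value string alone may not be enough. Below is a minimal sketch of a JSON formatter for Python's logging module. On runtimes such as Cloud Run, JSON lines written to stdout are typically parsed into jsonPayload; whether Cloud Build step output is treated the same way is an assumption to verify in your project. JsonFormatter, get_logger, and the fields key are names invented for this example.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"severity": record.levelname, "message": record.getMessage()}
        # Extra structured fields are passed via extra={"fields": {...}}
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def get_logger(name: str = "mlops", stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream (stdout by default)."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream or sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = get_logger()
logger.info(
    "model uploaded",
    extra={"fields": {"component": "upload_model", "status": "success"}},
)
```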

3. Separate Admin Project

Never run CI/CD in production project:

Security isolation
Failure isolation
Cost tracking

4. Use Substitution Variables

substitutions:
  _VERTEX_PROJECT_ID: my-project
  _VERTEX_LOCATION: us-central1
# Easy to update, no hardcoded values

5. Test Terraform in Dev First

Sequence:

Terraform plan/apply in dev
Validate resources created
Then apply to test
Finally apply to prod

Conclusion

CI/CD transforms ML development from manual and error-prone to automated and reliable:

PR Checks: Catch issues before merge (90s feedback)
E2E Tests: Validate end-to-end functionality (optional, expensive)
Terraform Automation: Infrastructure as code with preview/apply
Release Management: Versioned artifacts across all environments
Schedule Pipelines: Automated retraining setup

With this CI/CD system:

Every commit is tested
Every infrastructure change is reviewed
Every release is versioned
Production deployments are repeatable

In the next article, we’ll explore production deployment: running predictions, monitoring models, and handling the full production lifecycle.

Key Takeaways:

Separate admin project for CI/CD security and isolation
Six pipelines cover full lifecycle: checks, tests, infrastructure, releases
Multi-level testing: unit (fast), compilation (syntax), E2E (real)
Artifact versioning enables reproducibility and rollback
Pre-commit hooks catch issues before CI
Fail fast: run cheapest validations first

Next in Series: Production ML Deployment: Batch Predictions & Monitoring

GitHub Repository: production-ready-MLOps-on-GCP

Production-Ready MLOps on GCP Part 6: Prediction Pipeline (From Champion Model to Batch Predictions)

Saoussen CHAABNIA — Sun, 08 Feb 2026 16:32:34 GMT

Introduction

In the previous article, we built a sophisticated training pipeline that takes raw data and produces a champion model in Vertex AI Model Registry. Now it’s time to put that model to work by generating predictions at scale.

Running predictions in production requires:

Finding the right model: Always use the current champion
Preprocessing consistency: Apply the same transformations as training
Scalability: Handle millions of predictions efficiently
Monitoring: Detect when data distributions shift
Reliability: Fail fast and fail clearly

In this article, we’ll explore:

Prediction pipeline architecture and design
Complete code walkthrough
Batch prediction with BigQuery
Model monitoring and skew detection
Running predictions in different scenarios

By the end, you’ll understand how to build a prediction pipeline that reliably serves your trained models.

Prediction Pipeline Architecture

Our prediction pipeline is simpler than training but equally critical:

1. Lookup Champion Model (Model Registry)
         ↓
2. Preprocess Data (BigQuery SQL - same as training)
         ↓
3. Batch Prediction (BigQuery → BigQuery)
         ↓
4. Monitor for Skew (Training-serving skew detection)
         ↓
5. Alert on Issues (Email alerts if skew detected)

Key design decisions:

BigQuery → BigQuery: Input and output both in BigQuery for seamless integration
Same preprocessing: SQL preprocessing identical to training (consistency)
Built-in monitoring: Vertex AI automatically compares prediction inputs to the training data
Scalable: Horizontal scaling with multiple replicas

Step 1: Lookup Champion Model

The prediction pipeline starts by finding the current production model:

champion_model = lookup_model_op(
    model_name="taxi-traffic-model",
    location=location,
    project=project,
    fail_on_model_not_found=True,  # Must exist for predictions!
).set_display_name("Look up champion model")

What happens:

Query Vertex AI Model Registry for models with display name taxi-traffic-model
Filter for the default version (the champion)
Extract model URI and training dataset metadata
Pass to batch prediction step

Critical: Setting fail_on_model_not_found=True ensures the pipeline fails fast if no model exists, preventing silent failures.
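The lookup logic described above can be sketched in plain Python. This is only an illustration of the selection rule, not the body of lookup_model_op: ModelVersion and select_champion are hypothetical names, and the sketch assumes the champion is the version carrying the "default" alias.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelVersion:
    """Hypothetical stand-in for one registry entry (not a Vertex AI class)."""
    display_name: str
    resource_name: str
    version_id: str
    aliases: List[str] = field(default_factory=list)


def select_champion(
    versions: List[ModelVersion],
    model_name: str,
    fail_on_model_not_found: bool = True,
) -> Optional[ModelVersion]:
    """Return the version of `model_name` that carries the 'default' alias."""
    for version in versions:
        if version.display_name == model_name and "default" in version.aliases:
            return version
    if fail_on_model_not_found:
        # Fail fast: a prediction run without a champion should never continue.
        raise RuntimeError(f"No default version found for model '{model_name}'")
    return None


registry = [
    ModelVersion("taxi-traffic-model", "projects/p/locations/l/models/1", "4"),
    ModelVersion("taxi-traffic-model", "projects/p/locations/l/models/1", "5", ["default"]),
]
print(select_champion(registry, "taxi-traffic-model").version_id)
```

Raising instead of returning None is what makes a missing champion visible immediately rather than several steps later in the batch prediction job.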

Step 2: Data Preprocessing

Goal: Transform raw prediction data into features matching training format.

prep_query = generate_query(
    input_file=queries_folder / "ingest_pred.sql",
    source=bq_source_uri,
    dataset=f"{project}.{dataset}",
    table_="prep_prediction_table",
    start_timestamp=timestamp,
    use_latest_data=use_latest_data,
)

prep_op = BigqueryQueryJobOp(
    project=project,
    location="US",
    query=prep_query,
).set_display_name("Ingest & preprocess data")

Why Same SQL as Training?

Training preprocessing:

SELECT
  EXTRACT(DAYOFWEEK FROM trip_start_timestamp) AS dayofweek,
  EXTRACT(HOUR FROM trip_start_timestamp) AS hourofday,
  trip_miles,
  trip_seconds,
  SAFE_DIVIDE(trip_miles, trip_seconds) * 3600 AS trip_distance,
  company,
  payment_type,
  fare AS total_fare  -- Label for training
FROM ...

Prediction preprocessing:

SELECT
  EXTRACT(DAYOFWEEK FROM trip_start_timestamp) AS dayofweek,
  EXTRACT(HOUR FROM trip_start_timestamp) AS hourofday,
  trip_miles,
  trip_seconds,
  SAFE_DIVIDE(trip_miles, trip_seconds) * 3600 AS trip_distance,
  company,
  payment_type
  -- NO label column (we’re predicting it!)
FROM ...

Critical for consistency: If preprocessing differs between training and prediction, the model will fail or produce garbage predictions.
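One way to guard against this kind of drift is a small parity check that runs before the batch job. The sketch below is an assumption about how such a check could look, not part of the repository: check_feature_parity is a hypothetical helper, and the column names are taken from the two queries above.

```python
from typing import List


def check_feature_parity(
    training_columns: List[str],
    prediction_columns: List[str],
    label: str,
) -> List[str]:
    """Verify prediction features equal training features minus the label."""
    expected = [c for c in training_columns if c != label]
    missing = [c for c in expected if c not in prediction_columns]
    unexpected = [c for c in prediction_columns if c not in expected]
    if missing or unexpected:
        raise ValueError(
            f"Feature mismatch: missing={missing}, unexpected={unexpected}"
        )
    return expected


TRAINING_COLUMNS = [
    "dayofweek", "hourofday", "trip_miles", "trip_seconds",
    "trip_distance", "company", "payment_type", "total_fare",
]
PREDICTION_COLUMNS = [
    "dayofweek", "hourofday", "trip_miles", "trip_seconds",
    "trip_distance", "company", "payment_type",
]

features = check_feature_parity(TRAINING_COLUMNS, PREDICTION_COLUMNS, label="total_fare")
print(f"{len(features)} features match")
```

This only compares column names. It does not prove the SQL expressions behind each column are identical, so keeping both queries generated from a shared template remains the stronger guarantee.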

Step 3: Batch Prediction

Goal: Generate predictions for thousands/millions of rows at scale.

model_batch_predict_op(
    model=champion_model.outputs["model"],
    job_display_name="taxi-fare-predict-job",
    location=location,
    project=project,

    # Input: BigQuery table
    source_uri=f"bq://{project}.{dataset}.prep_prediction_table",
    source_format="bigquery",
    # Output: BigQuery table
    destination_uri=f"bq://{project}.{dataset}",
    destination_format="bigquery",
    # Resource configuration
    machine_type="n2-standard-4",
    starting_replica_count=3,
    max_replica_count=10,
    # Monitoring configuration
    monitoring_training_dataset=champion_model.outputs["training_dataset"],
    monitoring_alert_email_addresses=["team@example.com"],
    monitoring_skew_config={"defaultSkewThreshold": {"value": 0.001}},
).after(prep_op).set_display_name("Run prediction job")

Batch Prediction Workflow

Job Submission: Vertex AI creates a batch prediction job
Resource Allocation: Provisions 3–10 VMs (based on data size)
Model Loading: Loads SavedModel on each VM
Parallel Processing: Each VM processes a partition of the data
Predictions: Each row gets a prediction
Output: Writes predictions to BigQuery

Output table structure:

SELECT * FROM `my-project.taxi_trips_dataset.predictions_2024_01_15`

Horizontal Scaling

starting_replica_count=3,   # Start with 3 VMs
max_replica_count=10,       # Scale up to 10 if needed

How scaling works:

Small dataset (< 10k rows): 3 VMs sufficient
Medium dataset (100k rows): Scales to ~5 VMs
Large dataset (1M+ rows): Scales to 10 VMs

Cost optimization:

Use n2-standard-2 for small datasets
Use n2-standard-4 for medium datasets
Use n2-standard-8 for large datasets
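The sizing guidance above can be turned into a small helper that chooses resources from the input row count. This is a sketch under stated assumptions: pick_batch_resources is a hypothetical function, and the exact cut-offs between "small", "medium" and "large" (10k and 1M rows) are assumptions, since the tiers above only give representative sizes.

```python
def pick_batch_resources(row_count: int) -> dict:
    """Map an input row count to a machine type and replica bounds."""
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    if row_count < 10_000:  # small dataset
        return {
            "machine_type": "n2-standard-2",
            "starting_replica_count": 3,
            "max_replica_count": 3,
        }
    if row_count < 1_000_000:  # medium dataset
        return {
            "machine_type": "n2-standard-4",
            "starting_replica_count": 3,
            "max_replica_count": 5,
        }
    return {  # large dataset
        "machine_type": "n2-standard-8",
        "starting_replica_count": 3,
        "max_replica_count": 10,
    }


for rows in (5_000, 100_000, 2_000_000):
    print(rows, pick_batch_resources(rows))
```

The returned dictionary uses the same keyword names as the batch prediction call, so it could be unpacked into that call when the pipeline definition is built.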

Step 4: Model Monitoring and Skew Detection

Over time, data distributions shift:

Example scenario:

Training data (Jan-Mar 2024):
  - Average trip: 5.2 miles
  - Payment: 60% credit card, 40% cash
  - Peak hour: 8 AM

Production data (Nov 2024):
  - Average trip: 7.8 miles  ← Shift!
  - Payment: 75% credit card, 25% cash  ← Shift!
  - Peak hour: 9 AM  ← Shift!

When distributions shift, model accuracy degrades. Model monitoring catches this.

Training-Serving Skew Detection

Vertex AI automatically compares:

Training data distribution (saved during training)
Prediction data distribution (from batch prediction)

Skew metrics:

monitoring_skew_config={
    "defaultSkewThreshold": {"value": 0.001},
    # Or per-feature thresholds:
    # "skewThresholds": {
    #     "payment_type": {"value": 0.005},
    #     "trip_distance": {"value": 0.01},
    # }
}

How skew is calculated:

For categorical features (e.g., payment_type):

Skew = L-infinity distance between distributions

Training: {cash: 0.4, credit: 0.6}
Prediction: {cash: 0.25, credit: 0.75}

Skew = max(|0.4-0.25|, |0.6-0.75|) = max(0.15, 0.15) = 0.15

If the skew exceeds the threshold (0.001 here), an alert is triggered.
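The categorical calculation above is easy to reproduce locally. The following is a minimal sketch of the L-infinity distance for the payment_type example; linf_distance is a hypothetical helper written for illustration, not a Vertex AI API.

```python
from typing import Dict


def linf_distance(train: Dict[str, float], serve: Dict[str, float]) -> float:
    """L-infinity distance between two categorical distributions."""
    categories = set(train) | set(serve)
    if not categories:
        return 0.0
    # A category missing from one side counts as probability 0 there.
    return max(abs(train.get(c, 0.0) - serve.get(c, 0.0)) for c in categories)


training = {"cash": 0.4, "credit": 0.6}
prediction = {"cash": 0.25, "credit": 0.75}
threshold = 0.001

skew = linf_distance(training, prediction)
print(f"skew={skew:.2f}, alert={skew > threshold}")  # prints: skew=0.15, alert=True
```

With a threshold as tight as 0.001, almost any real shift in a categorical feature will fire an alert, so the per-feature thresholds are worth tuning once a few runs of data are available.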

Alert Configuration

monitoring_alert_email_addresses=["ml-team@example.com"],
notification_channels=[
    "projects/my-project/notificationChannels/email-channel",
    "projects/my-project/notificationChannels/slack-channel",
]

Alert email example:

Subject: Model Monitoring Alert - Skew Detected
Model: taxi-traffic-model (v5)
Feature: payment_type
Skew: 0.15 (threshold: 0.001)
Training distribution:
  cash: 40%
  credit: 60%
Prediction distribution:
  cash: 25%
  credit: 75%
Recommended action: Retrain model with recent data.
View details: https://console.cloud.google.com/vertex-ai/...

Complete Prediction Pipeline Code

Now let’s see how it all fits together:

from kfp import compiler, dsl
from components import lookup_model_op, model_batch_predict_op
from google_cloud_pipeline_components.v1.bigquery import BigqueryQueryJobOp
from pipelines.utils.query import generate_query
import pathlib

# Monitoring configuration
ALERT_EMAILS = ["ml-team@example.com"]
NOTIFICATION_CHANNELS = []
SKEW_THRESHOLDS = {"defaultSkewThreshold": {"value": 0.001}}


@dsl.pipeline(name="taxifare-batch-prediction-pipeline")
def pipeline(
    project: str,
    location: str,
    bq_location: str,
    bq_source_uri: str = "bigquery-public-data.chicago_taxi_trips.taxi_trips",
    dataset: str = "taxi_trips_dataset",
    timestamp: str = "2022-12-01 00:00:00",
    use_latest_data: bool = True,
    model_name: str = "taxi-traffic-model",
    machine_type: str = "n2-standard-4",
    min_replicas: int = 3,
    max_replicas: int = 10,
):
    """
    Prediction pipeline which:
     1. Looks up the default model version (champion)
     2. Preprocesses data using BigQuery SQL
     3. Runs batch prediction job (BigQuery → BigQuery)
     4. Monitors for training-serving skew
    Args:
        project: GCP project ID
        location: Vertex AI location (e.g., us-central1)
        bq_location: BigQuery location (e.g., US)
        bq_source_uri: Source BigQuery table
        dataset: Dataset for staging tables
        timestamp: Optional fixed timestamp for predictions
        use_latest_data: Whether to use latest data (default: True)
        model_name: Model display name in registry
        machine_type: VM type for batch prediction
        min_replicas: Minimum number of prediction workers
        max_replicas: Maximum number of prediction workers
    """
    queries_folder = pathlib.Path(__file__).parent / "queries"
    # Step 1: Preprocess data using same SQL as training
    prep_query = generate_query(
        input_file=queries_folder / "ingest_pred.sql",
        source=bq_source_uri,
        dataset=f"{project}.{dataset}",
        table_="prep_prediction_table",
        start_timestamp=timestamp,
        use_latest_data=use_latest_data,
    )
    prep_op = BigqueryQueryJobOp(
        project=project,
        location=bq_location,  # Use the pipeline parameter instead of a hardcoded "US"
        query=prep_query,
    ).set_display_name("Ingest & preprocess data")
    # Step 2: Lookup champion model from registry
    champion_model = lookup_model_op(
        model_name=model_name,
        location=location,
        project=project,
        fail_on_model_not_found=True,  # Must exist!
    ).set_display_name("Look up champion model")
    # Step 3: Run batch prediction with monitoring
    model_batch_predict_op(
        model=champion_model.outputs["model"],
        job_display_name="taxi-fare-predict-job",
        location=location,
        project=project,
        # Input/Output configuration (BigQuery → BigQuery)
        source_uri=f"bq://{project}.{dataset}.prep_prediction_table",
        destination_uri=f"bq://{project}.{dataset}",
        source_format="bigquery",
        destination_format="bigquery",
        # Instance configuration
        instance_config={"instanceType": "object"},
        # Resource configuration (horizontal scaling)
        machine_type=machine_type,
        starting_replica_count=min_replicas,
        max_replica_count=max_replicas,
        # Monitoring configuration
        monitoring_training_dataset=champion_model.outputs["training_dataset"],
        monitoring_alert_email_addresses=ALERT_EMAILS,
        notification_channels=NOTIFICATION_CHANNELS,
        monitoring_skew_config=SKEW_THRESHOLDS,
    ).after(prep_op).set_display_name("Run prediction job")


if __name__ == "__main__":
    compiler.Compiler().compile(
        pipeline_func=pipeline,
        package_path="taxifare-prediction-pipeline.yaml"
    )

Pipeline Execution DAG on Vertex AI pipeline

Key Design Decisions

1. Simple Linear Flow

Unlike the training pipeline with its complex DAG, the prediction pipeline is deliberately simple:

No parallel branches
No conditional logic
Fail fast if any step fails

2. Preprocessing Consistency

# Same feature engineering as the training query (ingest.sql)
prep_query = generate_query(
    input_file=queries_folder / "ingest_pred.sql",
    # ...
)

ingest_pred.sql applies the same feature engineering as ingest.sql (training), just without the label column.

3. Dynamic Champion Lookup

champion_model = lookup_model_op(
    model_name=model_name,
    fail_on_model_not_found=True,
)

Never hardcode model versions. Always use the current champion dynamically.

4. Built-in Monitoring

monitoring_training_dataset=champion_model.outputs["training_dataset"],

The training dataset metadata (saved during training) is automatically used for skew detection.

5. Scalability by Default

min_replicas=3,
max_replicas=10,

Automatically scales based on data volume:

Small dataset: Uses 3 replicas
Large dataset: Scales up to 10 replicas

Running the Pipeline

Compile

make compile pipeline=prediction

Run in Different Scenarios

Production run (latest data):

make prediction enable_caching=false use_latest_data=true

Or using the Python utility:

poetry run python -m pipelines.utils.run_pipeline \
  --pipeline=prediction \
  --project=my-prod-project \
  --use_latest_data=true \
  --enable_caching=false

Backfill run (historical data):

poetry run python -m pipelines.utils.run_pipeline \
  --pipeline=prediction \
  --project=my-prod-project \
  --timestamp="2024-12-01 00:00:00" \
  --use_latest_data=false

Testing run (small dataset):

poetry run python -m pipelines.utils.run_pipeline \
  --pipeline=prediction \
  --project=my-dev-project \
  --machine_type="n2-standard-2" \
  --min_replicas=1 \
  --max_replicas=1

Expected Output

Pipeline submitted: projects/123/locations/us-central1/pipelineJobs/prediction-20250113-142536

View in Vertex AI:
https://console.cloud.google.com/vertex-ai/pipelines/runs/prediction-20250113-142536

Prediction Output Format

The batch prediction creates a BigQuery table:

-- View predictions
SELECT * FROM `my-project.taxi_trips_dataset.predictions_20250113_142536`
LIMIT 10;

Using Predictions

Join with actuals (for accuracy measurement):

SELECT
  p.trip_id,
  p.predicted_total_fare,
  a.actual_fare,
  ABS(p.predicted_total_fare - a.actual_fare) AS error,
  ABS(p.predicted_total_fare - a.actual_fare) / a.actual_fare AS pct_error
FROM predictions_20250113_142536 p
JOIN actual_fares a ON p.trip_id = a.trip_id
WHERE a.actual_fare > 0
ORDER BY pct_error DESC
LIMIT 100;

Export for business use:

-- Export to Google Sheets or Data Studio
SELECT
  trip_id,
  predicted_total_fare,
  CASE
    WHEN predicted_total_fare < 10 THEN 'Low'
    WHEN predicted_total_fare < 25 THEN 'Medium'
    ELSE 'High'
  END AS fare_category
FROM predictions_20250113_142536;

Best Practices

1. Always Use Champion Model

# Good: Lookup champion dynamically
champion = lookup_model_op(model_name="taxi-traffic-model")
# Bad: Hardcode model version
model_uri = "projects/.../models/123456/versions/1"

Dynamic lookup ensures you always use the latest approved model.

2. Monitor Everything

Enable monitoring on all prediction jobs:

monitoring_training_dataset=champion_model.outputs["training_dataset"],
monitoring_skew_config=SKEW_THRESHOLDS,

3. Test Predictions in Dev First

# Test prediction pipeline in dev
make prediction enable_caching=false
# Verify predictions look reasonable
# (bq takes the project as the global --project_id flag, placed before the command)
bq --project_id=my-dev-project query "
  SELECT prediction, trip_miles
  FROM predictions_table
  ORDER BY RAND()
  LIMIT 10
"
# Only then run in prod

4. Version Prediction Outputs

# Include timestamp in output table
destination_uri=f"bq://{project}.{dataset}.predictions_{date}"

Enables:

A/B testing between model versions
Historical prediction analysis
Rollback if needed
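A versioned destination can be built from the run time so that every run writes to its own table. This is a minimal sketch: versioned_destination is a hypothetical helper, and the suffix format is an assumption chosen to match the predictions_20250113_142536 table name used earlier.

```python
from datetime import datetime, timezone
from typing import Optional


def versioned_destination(
    project: str,
    dataset: str,
    run_time: Optional[datetime] = None,
    prefix: str = "predictions",
) -> str:
    """Build a bq:// URI whose table name embeds the run timestamp."""
    run_time = run_time or datetime.now(timezone.utc)
    suffix = run_time.strftime("%Y%m%d_%H%M%S")
    return f"bq://{project}.{dataset}.{prefix}_{suffix}"


uri = versioned_destination(
    "my-project", "taxi_trips_dataset", datetime(2025, 1, 13, 14, 25, 36)
)
print(uri)  # prints: bq://my-project.taxi_trips_dataset.predictions_20250113_142536
```

Passing the run time explicitly keeps backfills reproducible, while the default of "now" in UTC suits scheduled production runs.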

5. Ground Truth Collection

Collect actual outcomes to measure real accuracy:

-- Join predictions with actual fares (collected later)
SELECT
  p.prediction,
  a.actual_fare,
  ABS(p.prediction - a.actual_fare) AS error
FROM predictions p
JOIN actual_fares a ON p.trip_id = a.trip_id

Use this to:

Track accuracy over time
Trigger retraining when accuracy drops
Validate champion/challenger comparisons
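Once predictions are joined with actual fares, the accuracy check itself is a few lines. The sketch below assumes the joined rows arrive as (prediction, actual_fare) pairs; rmse and accuracy_degraded are hypothetical helpers, and the 20% tolerance is an illustrative assumption rather than a recommended value.

```python
import math
from typing import Iterable, Tuple


def rmse(pairs: Iterable[Tuple[float, float]]) -> float:
    """Root mean squared error over (prediction, actual) pairs."""
    rows = list(pairs)
    if not rows:
        raise ValueError("need at least one (prediction, actual) pair")
    return math.sqrt(sum((p - a) ** 2 for p, a in rows) / len(rows))


def accuracy_degraded(current: float, baseline: float, tolerance: float = 0.2) -> bool:
    """True when the current RMSE is more than `tolerance` above the baseline."""
    return current > baseline * (1 + tolerance)


joined = [(12.0, 10.0), (20.0, 24.0), (8.5, 8.0)]
current_rmse = rmse(joined)
print(f"current RMSE: {current_rmse:.2f}")
print("retrain:", accuracy_degraded(current_rmse, baseline=2.5))
```

Comparing against a baseline recorded at training time, instead of a fixed absolute number, keeps the rule meaningful when the model or the fare scale changes.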

Conclusion

Building a production prediction pipeline requires:

Champion model lookup: Always use the current best model
Preprocessing consistency: Exact same transformations as training
Batch predictions at scale: Horizontal scaling with BigQuery
Model monitoring: Automatic skew detection
Alerting: Notify when issues arise
Cost optimization: Right-size resources

With this prediction pipeline:

Always uses the champion model
Scales horizontally for large datasets
Monitors for data drift automatically
Alerts when issues arise
Integrates seamlessly with training pipeline

In the next article, we’ll automate everything with CI/CD and explore production operations including continuous training, scheduled retraining, and observability.

Key Takeaways:

Prediction preprocessing must match training preprocessing exactly
Always lookup champion model dynamically (never hardcode versions)
Batch predictions scale horizontally for millions of rows
Model monitoring detects training-serving skew automatically
Test predictions in dev before running in production
Version prediction outputs for analysis and rollback

Next in Series: CI/CD & Production Operations

GitHub Repository: production-ready-MLOps-on-GCP

Prediction Pipeline Code:


Building Distributed Multi-Agent Systems with Google’s AI Stack: Part 6

Saoussen CHAABNIA — Wed, 04 Feb 2026 10:10:56 GMT

Building Production Multi-Agent Systems with Google’s AI Stack series:

Part 1: From Monolithic AI to Distributed Intelligence: Building Your First Multi-Agent System
Part 2: Making Agents Talk: Agent-to-Agent (A2A) Protocol Deep Dive
Part 3: Building the Orchestrator: Coordinating Agents with the AgentTool Pattern
Part 4: Scaling Multi-Agent Workflows: Solving the Token Limit Problem
Part 5: External Tool Integration via Model Context Protocol (MCP)
Part 6: Deploying to Cloud: Cloud Run and Vertex AI Agent Engine ← You are here

Welcome Back!

In Part 5, we integrated external tools via MCP. Now we have a complete multi-agent system running locally.

It’s time to deploy to the cloud!

In this article, we’ll deploy:

5 specialist agents → Cloud Run (containerized, auto-scaling)
Creative Director orchestrator → Vertex AI Agent Engine (managed runtime)

We’ll also leverage:

Parallel deployment (3x faster)
Two-stage A2A configuration
Automated URL collection

Let’s ship it!

Deployment Architecture Overview

Why This Architecture?

Specialists on Cloud Run:

Independent scaling (scale copywriter separately)
Containerized (full control over environment)
Auto-scaling (0–100 instances)
Cost-efficient (pay only when running)

Orchestrator on Agent Engine:

Managed runtime (no container maintenance)
Integrated with Vertex AI
Built-in monitoring

Prerequisites

1. Google Cloud Project Setup

# Install gcloud CLI
# macOS:
brew install google-cloud-sdk
# Linux:
curl https://sdk.cloud.google.com | bash
# Verify
gcloud --version
# Login and set project
gcloud auth login
gcloud config set project YOUR_PROJECT_ID
# Enable required APIs
gcloud services enable \
    run.googleapis.com \
    aiplatform.googleapis.com \
    cloudbuild.googleapis.com \
    artifactregistry.googleapis.com

2. Environment Variables

Create .env file:

# Google Cloud
PROJECT_ID=your-gcp-project-id
REGION=us-central1
# Gemini API
GOOGLE_API_KEY=your-gemini-api-key
# Notion (optional)
NOTION_API_KEY=your-notion-token
NOTION_DATABASE_ID=your-projects-db-id
TASKS_DATABASE_ID=your-tasks-db-id

3. Service Accounts Setup

No setup needed! Cloud Run automatically uses the default Compute Engine service account with all necessary permissions.

This simplifies deployment: there is no need to create custom service accounts.

Creating Dockerfiles for Specialist Agents

Standard Agent Dockerfile

# agents/brand_strategist/Dockerfile
FROM python:3.12-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Install uv for faster dependency installation
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
# Copy requirements and install
COPY requirements.txt .
RUN uv pip install --system --no-cache -r requirements.txt
# Copy agent code
COPY agent.py .
# Create non-root user for security
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser
# Environment
ENV PYTHONUNBUFFERED=1
ENV PORT=8080
ENV HOST=0.0.0.0
EXPOSE 8080
# Run A2A server
CMD ["python", "agent.py"]

Project Manager Dockerfile (with Node.js for MCP)

# agents/project_manager/Dockerfile
FROM python:3.12-slim
WORKDIR /app
# Install Node.js for Notion MCP server
RUN apt-get update && apt-get install -y \
    nodejs \
    npm \
    gcc \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Verify Node.js
RUN node --version && npm --version
# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
# Install Python dependencies
COPY requirements.txt .
RUN uv pip install --system --no-cache -r requirements.txt
# Copy agent code
COPY agent.py .
# Create non-root user
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser
# Environment
ENV PYTHONUNBUFFERED=1
ENV PORT=8080
ENV HOST=0.0.0.0
EXPOSE 8080
CMD ["python", "agent.py"]

Parallel Deployment (3x Faster!)

The Problem: Sequential Deployment

# Old approach (SLOW - sequential)
# Deploy each agent one by one
# Total: 15 minutes! ❌

The Solution: Async Parallel Deployment

# deploy/deploy_all_specialists.py
import asyncio
import os
from pathlib import Path
from typing import Dict, Optional

AGENTS = [
    {"name": "brand-strategist", "dir": "brand_strategist"},
    {"name": "copywriter", "dir": "copywriter"},
    {"name": "designer", "dir": "designer"},
    {"name": "critic", "dir": "critic"},
    {"name": "project-manager", "dir": "project_manager"},
]


async def deploy_single_agent(
    agent_config: Dict,
    project_id: str,
    region: str
) -> Optional[str]:
    """Deploy a single agent to Cloud Run; return its URL, or None on failure"""
    name = agent_config["name"]
    agent_dir = agent_config["dir"]
    service_account = f"{name}-sa"
    print(f"🚀 Deploying {name}...")
    agent_path = Path(__file__).parent.parent / agent_dir
    sa_email = f"{service_account}@{project_id}.iam.gserviceaccount.com"
    # Build environment variables
    env_vars = (
        f"GOOGLE_GENAI_USE_VERTEXAI=true,"
        f"GOOGLE_CLOUD_PROJECT={project_id},"
        f"GOOGLE_CLOUD_LOCATION={region}"
    )
    # Add Notion credentials for project-manager
    if name == "project-manager":
        notion_api_key = os.getenv("NOTION_API_KEY")
        notion_db_id = os.getenv("NOTION_DATABASE_ID")
        if notion_api_key and notion_db_id:
            env_vars += f",NOTION_API_KEY={notion_api_key},NOTION_DATABASE_ID={notion_db_id}"
    # Deploy command
    cmd = [
        "gcloud", "run", "deploy", name,
        "--source=.",
        "--port=8080",
        "--platform=managed",
        f"--region={region}",
        f"--project={project_id}",
        f"--service-account={sa_email}",
        "--no-allow-unauthenticated",
        f"--set-env-vars={env_vars}",
        "--memory=1Gi",
        "--cpu=1",
        "--timeout=300",
        "--max-instances=10",
        "--min-instances=0",
        "--quiet"
    ]
    # Run deployment asynchronously
    process = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        cwd=agent_path
    )
    stdout, stderr = await process.communicate()
    if process.returncode != 0:
        print(f"❌ Failed to deploy {name}: {stderr.decode()}")
        return None
    print(f"✓ {name} deployed successfully")
    # Get service URL (get_service_url is a helper defined elsewhere in this script)
    url = await get_service_url(name, project_id, region)
    return url


async def deploy_all_agents(project_id: str, region: str) -> Dict[str, str]:
    """Deploy all agents in parallel and collect URLs"""
    print("\n" + "="*70)
    print("Deploying all specialist agents to Cloud Run (in parallel)")
    print("="*70 + "\n")
    # Deploy all agents in parallel using asyncio.gather
    tasks = [
        deploy_single_agent(agent, project_id, region)
        for agent in AGENTS
    ]
    results = await asyncio.gather(*tasks)
    # Build URL mapping (failed deployments return None and are skipped)
    agent_urls = {}
    for agent, url in zip(AGENTS, results):
        if url:
            agent_urls[agent["name"]] = url
    print("\n" + "="*70)
    print(f"✓ Deployment complete! {len(agent_urls)}/{len(AGENTS)} agents deployed")
    print("="*70)
    return agent_urls
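The deployment script calls get_service_url without showing it. One plausible shape for that helper is sketched below, assuming it shells out to gcloud run services describe and reads status.url. build_describe_cmd is a hypothetical name introduced here, and the repository's actual implementation may differ.

```python
import asyncio
from typing import List, Optional


def build_describe_cmd(name: str, project_id: str, region: str) -> List[str]:
    """Build the gcloud command that prints only the service URL."""
    return [
        "gcloud", "run", "services", "describe", name,
        "--platform=managed",
        f"--region={region}",
        f"--project={project_id}",
        "--format=value(status.url)",
    ]


async def get_service_url(name: str, project_id: str, region: str) -> Optional[str]:
    """Return the deployed service URL, or None if the lookup fails."""
    process = await asyncio.create_subprocess_exec(
        *build_describe_cmd(name, project_id, region),
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, _ = await process.communicate()
    url = stdout.decode().strip()
    # Treat a non-zero exit code or empty output as "URL not available".
    return url if process.returncode == 0 and url else None
```

Separating command construction from execution keeps the command easy to inspect and test without a gcloud installation.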

Speed comparison:

Sequential: 5 agents × 3 min = 15 minutes
Parallel: ~5 minutes
3x faster!
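The effect is easy to reproduce without touching Cloud Run. The sketch below replaces each deployment with asyncio.sleep, so fake_deploy, the 0.1 second duration and the example.invalid URLs are all stand-ins; it only demonstrates that asyncio.gather overlaps the waits instead of summing them.

```python
import asyncio
import time
from typing import List, Tuple

NAMES = ["brand-strategist", "copywriter", "designer", "critic", "project-manager"]


async def fake_deploy(name: str, seconds: float) -> str:
    """Stand-in for a deployment: wait, then return a placeholder URL."""
    await asyncio.sleep(seconds)
    return f"https://{name}.example.invalid"


async def deploy_sequential(names: List[str], seconds: float) -> List[str]:
    return [await fake_deploy(n, seconds) for n in names]


async def deploy_parallel(names: List[str], seconds: float) -> List[str]:
    return list(await asyncio.gather(*(fake_deploy(n, seconds) for n in names)))


def timed(coro) -> Tuple[List[str], float]:
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, seq = timed(deploy_sequential(NAMES, 0.1))
    _, par = timed(deploy_parallel(NAMES, 0.1))
    print(f"sequential: {seq:.2f}s, parallel: {par:.2f}s")
```

In the parallel case the wall-clock time approaches that of the slowest single task, which is why real deployments see a large but not five-fold improvement when individual builds take different amounts of time.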

Two-Stage A2A Configuration

Remember our dual configuration from Part 3? Here’s how it works in deployment:

Stage 1: Initial Deployment

# Deploy with basic environment variables
gcloud run deploy brand-strategist \
    --source=. \
    --set-env-vars=GOOGLE_CLOUD_PROJECT=...,... \
    --region=us-central1
# Service is deployed!
# But agent card still shows placeholder URL

Stage 2: Update A2A Configuration

async def update_agent_a2a_config(
    service_name: str,
    url: str,
    project_id: str,
    region: str
) -> None:
    """Update deployed agent with PUBLIC_HOST, PUBLIC_PORT, PROTOCOL"""
    # Extract PUBLIC_HOST from URL
    # URL: https://brand-strategist-xxx.us-central1.run.app
    public_host = url.replace("https://", "").replace("http://", "").split("/")[0]
    print(f"   Updating A2A config for {service_name}...")
    # Build environment variables update
    env_vars_update = f"PUBLIC_HOST={public_host},PUBLIC_PORT=443,PROTOCOL=https"
    # Add Notion credentials for project-manager
    if service_name == "project-manager":
        notion_api_key = os.getenv("NOTION_API_KEY")
        if notion_api_key:
            env_vars_update += f",NOTION_API_KEY={notion_api_key}"
    cmd = [
        "gcloud", "run", "services", "update", service_name,
        "--platform=managed",
        f"--region={region}",
        f"--project={project_id}",
        f"--update-env-vars={env_vars_update}",
        "--quiet"
    ]
    process = await asyncio.create_subprocess_exec(*cmd)
    await process.wait()
    if process.returncode == 0:
        print(f"   ✓ A2A config updated for {service_name}")
    else:
        print(f"   Warning: Could not update A2A config for {service_name}")
Now the agent card shows the correct URL:

{
  "name": "brand_strategist",
  "rpc_url": "https://brand-strategist-xxx.us-central1.run.app:443"
}

Perfect for the orchestrator to discover!
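The host extraction in Stage 2 relies on string replacement. A slightly more defensive variant, sketched here with the standard library's urlparse, also derives the port and protocol from the URL instead of hardcoding them. a2a_env_from_url is a hypothetical helper, not part of the deployment script.

```python
from typing import Dict
from urllib.parse import urlparse


def a2a_env_from_url(url: str) -> Dict[str, str]:
    """Derive PUBLIC_HOST, PUBLIC_PORT and PROTOCOL from a service URL."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"Not a valid service URL: {url!r}")
    # Fall back to the scheme's default port when the URL has none.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return {
        "PUBLIC_HOST": parsed.hostname,
        "PUBLIC_PORT": str(port),
        "PROTOCOL": parsed.scheme,
    }


env = a2a_env_from_url("https://brand-strategist-xxx.us-central1.run.app")
print(",".join(f"{k}={v}" for k, v in env.items()))
```

The joined string has the same KEY=value,KEY=value shape that --update-env-vars expects, and malformed URLs fail loudly before any gcloud call is made.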

Deploying the Orchestrator to Agent Engine

Step 1: Prepare Agent Code

# agents/creative_director/agent.py
# Agent creation code from Part 4
# Returns App (with context compaction)
root_agent = create_creative_director()
# That’s it! Agent Engine handles the rest

Step 2: Deploy to Agent Engine

# deploy/deploy_orchestrator.py
import os
from pathlib import Path
from typing import Dict

from google.cloud import aiplatform


def deploy_orchestrator(agent_urls: Dict[str, str], project_id: str, region: str):
    """Deploy Creative Director to Vertex AI Agent Engine"""
    print("\n" + "="*70)
    print("Deploying Creative Director to Vertex AI Agent Engine")
    print("="*70)
    # Initialize Vertex AI
    aiplatform.init(project=project_id, location=region)
    # Prepare environment variables with agent URLs
    env_vars = {
        "GOOGLE_API_KEY": os.getenv("GOOGLE_API_KEY"),
        "STRATEGIST_AGENT_URL": agent_urls.get("brand-strategist"),
        "COPYWRITER_AGENT_URL": agent_urls.get("copywriter"),
        "DESIGNER_AGENT_URL": agent_urls.get("designer"),
        "CRITIC_AGENT_URL": agent_urls.get("critic"),
        "PM_AGENT_URL": agent_urls.get("project-manager"),
    }
    print("\n📋 Environment variables:")
    for key, value in env_vars.items():
        if "API_KEY" not in key:  # Never print secrets
            print(f"   {key}={value}")
    # Python dependencies for the orchestrator
    requirements = ["google-adk", "google-genai", "python-dotenv"]
    # Deploy to Agent Engine
    print("\n🚀 Deploying to Agent Engine...")
    reasoning_engine = aiplatform.ReasoningEngine.create(
        reasoning_engine={
            "agent_file": "agent.py",
            "agent_name": "root_agent",  # Name of variable in agent.py
            "requirements": requirements
        },
        display_name="creative-director-orchestrator",
        description="Creative Director orchestrator for AI Creative Studio",
        requirements=requirements,
        extra_packages=[Path("agents/creative_director")],
        env_vars=env_vars
    )
    resource_name = reasoning_engine.resource_name
    print("\n✅ Orchestrator deployed!")
    print(f"   Resource name: {resource_name}")
    print("\n💡 Save this to .env:")
    print(f"   AGENT_ENGINE_RESOURCE_NAME={resource_name}")
    return resource_name

Key points:

Deploys agent.py with root_agent variable
Sets all agent URLs in environment variables
Orchestrator discovers agents at runtime!
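On the orchestrator side, runtime discovery amounts to reading those environment variables at startup. The sketch below shows one way to do that with an early failure when a URL is missing. load_agent_urls is a hypothetical helper, and the variable names are the ones set by the deployment function above.

```python
import os
from typing import Dict, Mapping, Optional

AGENT_URL_VARS = {
    "brand-strategist": "STRATEGIST_AGENT_URL",
    "copywriter": "COPYWRITER_AGENT_URL",
    "designer": "DESIGNER_AGENT_URL",
    "critic": "CRITIC_AGENT_URL",
    "project-manager": "PM_AGENT_URL",
}


def load_agent_urls(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read specialist agent URLs from the environment, failing if any is unset."""
    environ = os.environ if environ is None else environ
    urls: Dict[str, str] = {}
    missing = []
    for agent, var in AGENT_URL_VARS.items():
        value = environ.get(var)
        if value:
            urls[agent] = value.rstrip("/")
        else:
            missing.append(var)
    if missing:
        raise RuntimeError(f"Missing agent URL variables: {', '.join(missing)}")
    return urls


example_env = {var: f"https://{agent}-xxx.us-central1.run.app/"
               for agent, var in AGENT_URL_VARS.items()}
print(load_agent_urls(example_env)["copywriter"])
```

Failing at startup is useful here because a partially failed parallel deployment would otherwise pass None for that agent's URL and only surface as an error on the first delegated request.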

One-Command Deployment

The Complete Deployment Script

#!/bin/bash
# deploy/deploy_complete_system.sh
set -e
echo "======================================================================"
echo "   AI Creative Studio - Complete System Deployment"
echo "======================================================================"
# Load environment
if [ ! -f .env ]; then
    echo "❌ Error: .env file not found"
    exit 1
fi
source .env
echo ""
echo "📋 Configuration:"
echo "   Project: $PROJECT_ID"
echo "   Region: $REGION"
echo ""
# Step 1: Deploy all specialist agents in parallel
# (tested with "if !" because set -e would exit before a later $? check runs)
echo "Step 1/2: Deploying specialist agents to Cloud Run (parallel)..."
if ! python3 deploy_all_specialists.py; then
    echo "❌ Specialist deployment failed"
    exit 1
fi
# Step 2: Deploy orchestrator
echo ""
echo "Step 2/2: Deploying orchestrator to Vertex AI Agent Engine..."
if ! python3 deploy_orchestrator.py --action deploy; then
    echo "❌ Orchestrator deployment failed"
    exit 1
fi
echo ""
echo "======================================================================"
echo "   ✅ Complete System Deployed Successfully!"
echo "======================================================================"
echo ""
echo "🧪 Test your system:"
echo "   python3 test_orchestrator.py"
echo ""
Run It!

cd deploy
chmod +x deploy_complete_system.sh
./deploy_complete_system.sh

Output

======================================================================
   AI Creative Studio - Complete System Deployment
======================================================================
📋 Configuration:
   Project: my-project-123
   Region: us-central1
Step 1/2: Deploying specialist agents to Cloud Run (parallel)...
======================================================================
Deploying all specialist agents to Cloud Run (in parallel)
======================================================================
🚀 Deploying brand-strategist...
🚀 Deploying copywriter...
🚀 Deploying designer...
🚀 Deploying critic...
🚀 Deploying project-manager...
✓ brand-strategist deployed successfully
   URL: https://brand-strategist-xxx.us-central1.run.app
   Updating A2A config for brand-strategist...
   ✓ A2A config updated
✓ copywriter deployed successfully
   URL: https://copywriter-xxx.us-central1.run.app
   Updating A2A config for copywriter...
   ✓ A2A config updated
... (rest of agents)
======================================================================
✓ Deployment complete! 5/5 agents deployed
======================================================================
Step 2/2: Deploying orchestrator to Vertex AI Agent Engine...
======================================================================
Deploying Creative Director to Vertex AI Agent Engine
======================================================================
📋 Environment variables:
   STRATEGIST_AGENT_URL=https://brand-strategist-xxx.us-central1.run.app
   COPYWRITER_AGENT_URL=https://copywriter-xxx.us-central1.run.app
   DESIGNER_AGENT_URL=https://designer-xxx.us-central1.run.app
   CRITIC_AGENT_URL=https://critic-xxx.us-central1.run.app
   PM_AGENT_URL=https://project-manager-xxx.us-central1.run.app
🚀 Deploying to Agent Engine...
✅ Orchestrator deployed!
   Resource name: projects/123/locations/us-central1/reasoningEngines/456
💡 Save this to .env:
   AGENT_ENGINE_RESOURCE_NAME=projects/123/locations/us-central1/reasoningEngines/456
======================================================================
   ✅ Complete System Deployed Successfully!
======================================================================
🧪 Test your system:
   python3 test_orchestrator.py
Total deployment time: ~7 minutes

Testing the Deployed System

Test Script

# test_orchestrator.py
from google.cloud import aiplatform
import os
from dotenv import load_dotenv
load_dotenv()
# Initialize
project_id = os.getenv("PROJECT_ID")
region = os.getenv("REGION")
resource_name = os.getenv("AGENT_ENGINE_RESOURCE_NAME")
aiplatform.init(project=project_id, location=region)
# Load the deployed orchestrator
reasoning_engine = aiplatform.ReasoningEngine(resource_name)
# Test with a simple request
brief = "Research the market for eco-friendly smart water bottles"
print("📋 Testing deployed orchestrator\n")
print(f"Brief: {brief}\n")
print("Response:")
response = reasoning_engine.query(input=brief)
print(response["output"])
print("\n✅ Deployed system is working!")

Run Test

python test_orchestrator.py

Monitoring and Logs

View Orchestrator Logs

# Fetch logs from Agent Engine
gcloud logging read \
    'resource.type="aiplatform.googleapis.com/ReasoningEngine"' \
    --limit=50 \
    --format=json

View Agent Logs

# Brand Strategist logs
gcloud run services logs read brand-strategist \
    --region=us-central1 \
    --limit=50

Cloud Run Dashboard

# Open the Cloud Run console in a browser:
# https://console.cloud.google.com/run?project=YOUR_PROJECT_ID

View:

Request counts
Response times
Error rates
Instance scaling

Monitoring and Debugging Your Deployed System

Now that your system is deployed, here are quick tips for observability:

Built-in Observability

ADK Logging Plugin (already enabled in code):

Automatically logs all LLM calls, tool executions, and token usage
No custom configuration needed

Cloud Logging (automatic):

# View orchestrator logs
gcloud logging read \
  'resource.type="aiplatform.googleapis.com/ReasoningEngine"' \
  --limit=100 --project=YOUR_PROJECT_ID

# View specialist agent logs
gcloud logging read \
  'resource.type="cloud_run_revision" AND
   resource.labels.service_name="brand-strategist"' \
  --limit=100 --project=YOUR_PROJECT_ID

A2A Inspector (for testing agents):

Install: https://github.com/a2aproject/a2a-inspector
Connect to your Cloud Run agent URLs
Test queries and view JSONRPC messages

Quick Debugging Commands

# Tail orchestrator logs in real-time
gcloud logging tail \
  'resource.type="aiplatform.googleapis.com/ReasoningEngine"' \
  --project=YOUR_PROJECT_ID
# Check for errors in specialist agents
gcloud logging read \
  'resource.type="cloud_run_revision" AND severity>=ERROR' \
  --limit=50 --project=YOUR_PROJECT_ID
# View Cloud Run metrics
gcloud run services describe brand-strategist \
  --platform managed --region us-central1

For comprehensive monitoring, set up Cloud Monitoring dashboards and log-based alerts through the Google Cloud Console.

Visual Tour: Your Deployed System in Action

Specialists Deployed to Cloud Run

Navigate to Cloud Run in Google Cloud Console. You should see all 5 specialist agents deployed as independent services:

✅ brand-strategist — Ready to research markets
✅ copywriter — Ready to write compelling copy
✅ designer — Ready to create visual concepts
✅ critic — Ready to review and provide feedback
✅ project-manager — Ready to organize tasks in Notion

Key indicators:
— Green checkmarks = healthy and running
— Each service has its own URL (the A2A endpoint)
— Auto-scaling configured (0 to 10 instances)
— Currently scaled to zero (no idle costs!)

Orchestrator Deployed to Agent Engine

Navigate to Vertex AI > Agent Engine in Google Cloud Console. You should see:

📋 Display name: Creative Director

Live Execution in Agent Engine Playground

Click Creative Director, then open the “Playground” tab. A session is created for you automatically; enter a prompt to start a run.

The playground then shows the execution flow, as in the demo:

Thank you for following this series!

If you built something with these patterns, I’d love to hear about it. Share your projects, questions, and improvements.

Happy building! 🚀

Code Repository: https://github.com/Saoussen-CH/ai-creative-studio-adk-a2a-mcp-vertexai-cloudrun

Thanks for reading! If this was helpful, hit the ❤️, drop a comment, ⭐ the GitHub repo, and subscribe so you don’t miss the next one. Let’s connect on LinkedIn!

Production-Ready MLOps on GCP Part 5: Training Pipeline Deep Dive

Saoussen CHAABNIA — Tue, 13 Jan 2026 10:12:30 GMT

Introduction

In the previous article, we built a library of reusable Kubeflow Pipeline components — modular building blocks like extract_table_to_gcs_op and upload_best_model_op. Now comes the payoff: assembling these components into a complete, production-ready training pipeline.

But here’s what makes this challenging: a production training pipeline isn’t just “train a model and save it.” It needs to:

Preprocess data at scale using BigQuery
Split data reproducibly so experiments are comparable
Tune hyperparameters automatically to find the best configuration
Train models in custom containers with full control
Evaluate rigorously on held-out test data
Compare with the champion to prevent degraded models from deploying
Version and register models with complete lineage

All while being:

Automated: No manual steps
Reproducible: Same inputs → same outputs
Observable: Full logging and monitoring
Testable: Validated before production

In this article, we’ll dissect our production training pipeline from end to end, exploring:

Data preprocessing with BigQuery SQL
Repeatable data splitting strategies
Hyperparameter tuning with Vertex AI
Custom TensorFlow training containers
Champion/Challenger model comparison
Complete pipeline orchestration

By the end, you’ll understand how all the pieces fit together to create a pipeline that reliably produces production-quality models.

Training Pipeline Architecture

Our training pipeline executes 8 major steps:

1. Data Preprocessing (BigQuery SQL)
         ↓
2. Data Splitting (80/10/10 train/val/test)
         ↓
3. Data Extraction (BigQuery → GCS CSV)
         ↓
4. Hyperparameter Tuning (6 trials, 2 parallel)
         ↓
5. Model Training (Custom TensorFlow container)
         ↓
6. Model Evaluation (Test set metrics)
         ↓
7. Champion/Challenger Comparison (RMSE-based)
         ↓
8. Model Upload to Registry (if better than champion)

Each step is a component (or set of components) that we explored in Part 4. The magic is in how they’re orchestrated.

Step 1: Data Preprocessing with BigQuery

Goal: Transform raw Chicago taxi trip data into features ready for model training.

Why BigQuery for Preprocessing?

You might wonder: why not preprocess in Python (pandas/PySpark)? Several reasons:

Scale: BigQuery processes terabytes effortlessly; pandas doesn’t
Speed: SQL on BigQuery is faster than Python for aggregations
Cost: Storage-compute separation means you don’t pay for idle infrastructure
Simplicity: SQL is declarative and familiar to data teams
Versioning: SQL queries in Git are easier to review than Spark DAGs

Preprocessing Query (Simplified)

-- ingest.sql
CREATE OR REPLACE TABLE `{dataset}.{table_}` AS (
  SELECT
    -- Temporal features
    EXTRACT(DAYOFWEEK FROM trip_start_timestamp) AS dayofweek,
    EXTRACT(HOUR FROM trip_start_timestamp) AS hourofday,
    -- Trip characteristics
    trip_miles,
    trip_seconds,
    SAFE_DIVIDE(trip_miles, trip_seconds) * 3600 AS trip_distance,
    -- Categorical features
    company,
    payment_type,
    -- Label (target)
    fare AS total_fare
  FROM `{source}`
  WHERE
    -- Data quality filters
    trip_start_timestamp IS NOT NULL
    AND trip_miles > 0
    AND trip_seconds > 0
    AND fare > 0
    -- Timeframe filter
    {timestamp_filter}
)

Key Preprocessing Decisions

1. Feature Engineering in SQL

SAFE_DIVIDE(trip_miles, trip_seconds) * 3600 AS trip_distance

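We create derived features directly in SQL rather than in training code. The expression above computes average speed in miles per hour (miles divided by seconds, times 3600), even though the resulting column is named trip_distance. This ensures: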

Training-serving consistency: Same SQL runs for training and prediction
Clarity: Feature logic is explicit and reviewable
Performance: BigQuery optimizes SQL execution

2. Data Quality Filters

WHERE trip_miles > 0 AND trip_seconds > 0 AND fare > 0

Filtering bad data at the source prevents:

NaN/Inf values that crash training
Outliers that distort model learning
Invalid records that waste compute

3. Temporal Consistency

{timestamp_filter}

The pipeline supports two modes:

Latest data: Dynamically selects most recent 2–3 months
Fixed timestamp: Uses data from a specific time period

This enables:

Production: Always train on fresh data
Development: Reproducible experiments with fixed data

Pipeline Code: Preprocessing Step

from google_cloud_pipeline_components.v1.bigquery import BigqueryQueryJobOp
from pipelines.utils.query import generate_query

# Generate preprocessing SQL with template substitution
prep_query = generate_query(
    input_file=queries_folder / "ingest.sql",
    source=bq_source_uri,
    location=bq_location,
    dataset=f"{project}.{dataset}",
    table_=preprocessed_table,
    label=label,
    start_timestamp=timestamp,
    use_latest_data=use_latest_data,
)
# Execute preprocessing as a pipeline step
prep_op = BigqueryQueryJobOp(
    project=project,
    location="US",
    query=prep_query,
).set_display_name("Ingest & preprocess data")

What happens:

generate_query() loads SQL template and substitutes parameters
BigqueryQueryJobOp executes the query in BigQuery
Results are written to {project}.{dataset}.preprocessed_data
Subsequent steps read from this table
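
The template substitution step can be pictured with a small sketch. This is not the repository's generate_query(); render_query is a hypothetical stand-in, and the project, dataset, and table names are placeholders chosen for illustration. It assumes the template uses plain {name} placeholders, as ingest.sql does above.

```python
from string import Formatter


def render_query(template: str, **params: object) -> str:
    """Fill {placeholders} in a SQL template, failing loudly on missing ones."""
    # Collect every named placeholder that appears in the template.
    needed = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = needed - params.keys()
    if missing:
        raise KeyError(f"missing template parameters: {sorted(missing)}")
    # Extra parameters are ignored, so one call site can serve several templates.
    return template.format(**{name: params[name] for name in needed})


# Hypothetical, shortened version of the ingest template.
template = (
    "CREATE OR REPLACE TABLE `{dataset}.{table_}` AS (\n"
    "  SELECT * FROM `{source}` WHERE fare > 0 {timestamp_filter}\n"
    ")"
)

sql = render_query(
    template,
    dataset="my-project.taxi",            # placeholder names, not from the article
    table_="preprocessed_data",
    source="my-project.raw.taxi_trips",
    timestamp_filter="AND trip_start_timestamp >= '2023-01-01'",
    label="total_fare",                   # unused by this template, silently ignored
)
print(sql)
```

Failing on a missing parameter at render time is a deliberate choice in this sketch: a half-filled query is cheaper to catch before it reaches BigQuery than after.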

Step 2: Repeatable Data Splitting

Goal: Split data into train (80%), validation (10%), and test (10%) sets in a deterministic, reproducible way.

The Challenge of Reproducibility

Random splits are problematic:

# BAD: Different split every run
train, test = random_split(data, [0.8, 0.2])

Problems:

Can’t reproduce experiments
Hyperparameter tuning results aren’t comparable
Can’t debug models trained weeks ago

Our Solution: Hash-Based Deterministic Splitting

-- repeatable_splitting.sql
CREATE OR REPLACE TABLE `{destination_table}` AS (
  SELECT * FROM `{source_dataset}.{source_table}`
  WHERE MOD(ABS(FARM_FINGERPRINT(CAST(unique_key AS STRING))), {num_lots}) IN {lots}
)

How it works:

Hash the unique key: FARM_FINGERPRINT(unique_key) produces a consistent hash
Modulo operation: MOD(..., 10) assigns each row to a bucket (0-9)
Select buckets: Buckets 0–7 = train, 8 = validation, 9 = test

Benefits:

Deterministic: Same row always goes to same split
Balanced: Hash distributes rows uniformly
Reproducible: Re-running uses identical splits
Efficient: Computed in BigQuery, not in application code
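
To see why hashing gives a stable split, here is a minimal Python sketch of the same idea. It swaps in SHA-256 for FARM_FINGERPRINT (which is a BigQuery function), so the bucket numbers will not match what BigQuery produces; the trip keys are made up for the example.

```python
import hashlib
from collections import Counter


def bucket_for(unique_key: str, num_lots: int = 10) -> int:
    """Map a key to a stable bucket in [0, num_lots)."""
    digest = hashlib.sha256(unique_key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then take the modulo.
    return int.from_bytes(digest[:8], "big") % num_lots


def split_for(unique_key: str) -> str:
    """Buckets 0-7 are train, 8 is validation, 9 is test."""
    bucket = bucket_for(unique_key)
    if bucket <= 7:
        return "train"
    return "valid" if bucket == 8 else "test"


# Hypothetical keys standing in for the dataset's unique_key column.
keys = [f"trip-{i}" for i in range(10_000)]
counts = Counter(split_for(key) for key in keys)
print(counts)
```

Because the assignment depends only on the key, adding new rows later never moves an existing row from one split to another.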

Pipeline Code: Data Splitting

# Train split (buckets 0-7, 80% of data)
split_train_query = generate_query(
    input_file=queries_folder / "repeatable_splitting.sql",
    source_dataset=f"{project}.{dataset}",
    source_table=preprocessed_table,
    num_lots=10,
    lots=tuple(range(8)),  # (0, 1, 2, 3, 4, 5, 6, 7)
)

split_train_data = BigqueryQueryJobOp(
    project=project,
    location=bq_location,
    query=split_train_query,
).after(prep_op).set_display_name("Split train data")
# Validation split (bucket 8, 10% of data)
split_valid_query = generate_query(
    input_file=queries_folder / "repeatable_splitting.sql",
    source_dataset=f"{project}.{dataset}",
    source_table=preprocessed_table,
    num_lots=10,
    lots="(8)",
)
split_valid_data = BigqueryQueryJobOp(
    project=project,
    location=bq_location,
    query=split_valid_query,
).after(prep_op).set_display_name("Split valid data")
# Test split (bucket 9, 10% of data)
split_test_query = generate_query(
    input_file=queries_folder / "repeatable_splitting.sql",
    source_dataset=f"{project}.{dataset}",
    source_table=preprocessed_table,
    num_lots=10,
    lots="(9)",
)
split_test_data = BigqueryQueryJobOp(
    project=project,
    location=bq_location,
    query=split_test_query,
).after(prep_op).set_display_name("Split test data")

Dependency management: All three splits depend on prep_op via .after(prep_op), ensuring preprocessing completes first. But they run in parallel with each other since they’re independent.

Step 3: Data Extraction to Cloud Storage

Goal: Export BigQuery tables to GCS as CSV files that TensorFlow can read.

Why Export to GCS?

TensorFlow’s tf.data.experimental.make_csv_dataset() reads from files, not BigQuery directly. We need to bridge this gap.

# Extract training data
train_dataset = (
    extract_table_to_gcs_op(
        bq_table=split_train_data.outputs["destination_table"]
    )
    .after(split_train_data)
    .set_display_name("Extract training data from BigQuery to GCS")
)

# Extract validation data
valid_dataset = (
    extract_table_to_gcs_op(
        bq_table=split_valid_data.outputs["destination_table"]
    )
    .after(split_valid_data)
    .set_display_name("Extract validation data from BigQuery to GCS")
)
# Extract test data
test_dataset = (
    extract_table_to_gcs_op(
        bq_table=split_test_data.outputs["destination_table"]
    )
    .after(split_test_data)
    .set_display_name("Extract test data from BigQuery to GCS")
)

What happens:

Each BigQuery table is exported to a GCS URI (e.g., gs://bucket/train/*.csv)
The extract_table_to_gcs_op component handles the export job
Output artifacts (train_dataset, valid_dataset, test_dataset) are passed to training

Pro tip: GCS paths are automatically generated by KFP based on the pipeline run ID, ensuring each run has isolated data.

Step 4: Hyperparameter Tuning with Vertex AI

Goal: Automatically find the best learning rate and batch size for our model.

Hyperparameter Search Space

We define which hyperparameters to tune and their ranges:

from google.cloud.aiplatform import hyperparameter_tuning as hpt

PARAMETER_SPEC = {
    "learning-rate": hpt.DoubleParameterSpec(
        min=0.0001,
        max=1,
        scale="log"  # Search logarithmically
    ),
    "batch-size": hpt.DiscreteParameterSpec(
        values=[128, 256, 512],
        scale="linear"
    ),
}
METRIC_SPEC = {
    "val_root_mean_squared_error": "minimize"
}

Design choices:

Log scale for learning rate: Search exponentially (0.0001, 0.001, 0.01, 0.1, 1)
Discrete batch sizes: Only try powers of 2 for memory efficiency
Validation RMSE: Optimize for generalization, not training loss
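
The effect of the log scale is easiest to see by sampling. The sketch below is only an illustration of the scale choice, using plain random sampling; it is not how Vertex AI picks trials. It compares how often a candidate learning rate lands below 0.01 under each scale.

```python
import math
import random


def sample_log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def share_below(samples: list, threshold: float) -> float:
    """Fraction of samples strictly below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)


rng = random.Random(0)  # fixed seed so the comparison is repeatable
log_samples = [sample_log_uniform(rng, 1e-4, 1.0) for _ in range(10_000)]
lin_samples = [rng.uniform(1e-4, 1.0) for _ in range(10_000)]

print(f"log scale, share below 0.01:    {share_below(log_samples, 0.01):.2f}")
print(f"linear scale, share below 0.01: {share_below(lin_samples, 0.01):.2f}")
```

On a linear scale almost every sample falls between 0.01 and 1, so the small learning rates that often work best would rarely be tried. The log scale spends roughly equal effort on each order of magnitude.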

Hyperparameter Tuning Workflow

# 1. Prepare args for hyperparameter tuning
args = dict(
    train_data=train_dataset.outputs["dataset"],
    valid_data=valid_dataset.outputs["dataset"],
    test_data=test_dataset.outputs["dataset"],
    hypertune=True,  # Enable hyperparameter tuning mode
)

hypertune_args_step = get_training_args_dict_op(**args).set_display_name(
    "Get-Hypertune-Args"
)
# 2. Configure worker pool for tuning trials
hypertune_worker_pool_specs_step = get_workerpool_spec_op(
    worker_pool_specs=WORKER_POOL_SPECS,
    args=hypertune_args_step.output,
).set_display_name("Get-Hypertune-Worker-Pool-Spec")
# 3. Run hyperparameter tuning job
hypertune_step = HyperparameterTuningJobRunOp(
    display_name="hypertune-job",
    project=project,
    location=location,
    worker_pool_specs=hypertune_worker_pool_specs_step.output,
    study_spec_metrics=serialize_metrics(METRIC_SPEC),
    study_spec_parameters=serialize_parameters(PARAMETER_SPEC),
    max_trial_count=6,           # Try 6 different combinations
    parallel_trial_count=2,      # Run 2 trials simultaneously
    base_output_directory=f"{base_output_dir}/hypertune-job",
).set_display_name("Hypertune-Job")
# 4. Extract best hyperparameters
hypertune_results_step = get_hyperparameter_tuning_results_op(
    project=project,
    location=location,
    job_resource=hypertune_step.output,
    study_spec_metrics=serialize_metrics(METRIC_SPEC),
).set_display_name("Get-Hypertune-Results")

What Happens During Hyperparameter Tuning?

Trial Spawning: Vertex AI launches 2 parallel training jobs with different hyperparameters
Training: Each trial trains the model on the training set, validates on validation set
Metric Reporting: Each trial reports val_root_mean_squared_error to Vertex AI
Algorithm: Vertex AI uses Bayesian optimization to choose next trials intelligently
Best Selection: After 6 trials, the best hyperparameters are identified

Example Trial Results:

Trial 1: learning_rate=0.001, batch_size=128 → val_RMSE=3.2
Trial 2: learning_rate=0.01,  batch_size=256 → val_RMSE=2.9  ← Best so far
Trial 3: learning_rate=0.1,   batch_size=512 → val_RMSE=4.5
Trial 4: learning_rate=0.005, batch_size=256 → val_RMSE=2.7  ← New best!
Trial 5: learning_rate=0.003, batch_size=256 → val_RMSE=2.8
Trial 6: learning_rate=0.007, batch_size=256 → val_RMSE=2.75

Best: learning_rate=0.005, batch_size=256, val_RMSE=2.7

The hypertune_results_step extracts these best hyperparameters for final training.
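
Conceptually, that extraction boils down to picking the trial with the best metric according to the goal in METRIC_SPEC. A minimal sketch using the example trials above; select_best is a hypothetical helper, not the component's actual code, and the trial dictionaries are a simplified stand-in for the job's real output.

```python
METRIC_SPEC = {"val_root_mean_squared_error": "minimize"}

# The six example trials from the table above.
trials = [
    {"learning_rate": 0.001, "batch_size": 128, "val_root_mean_squared_error": 3.2},
    {"learning_rate": 0.01, "batch_size": 256, "val_root_mean_squared_error": 2.9},
    {"learning_rate": 0.1, "batch_size": 512, "val_root_mean_squared_error": 4.5},
    {"learning_rate": 0.005, "batch_size": 256, "val_root_mean_squared_error": 2.7},
    {"learning_rate": 0.003, "batch_size": 256, "val_root_mean_squared_error": 2.8},
    {"learning_rate": 0.007, "batch_size": 256, "val_root_mean_squared_error": 2.75},
]


def select_best(trials: list, metric_spec: dict) -> dict:
    """Return the trial with the best value for the single metric in metric_spec."""
    metric, goal = next(iter(metric_spec.items()))
    pick = min if goal == "minimize" else max
    return pick(trials, key=lambda trial: trial[metric])


best = select_best(trials, METRIC_SPEC)
print(best)
```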

Step 5: Custom TensorFlow Training Container

Goal: Train a TensorFlow DNN model with the best hyperparameters using a custom container.

Why Custom Containers?

Vertex AI provides pre-built training containers, but we use a custom one because:

Full control: Install exact dependencies we need
Custom preprocessing: TensorFlow layers for feature encoding
Hyperparameter integration: Pass tuned hyperparameters to training script
Model architecture: Implement custom DNN structure
Artifact management: Save model, metrics, and metadata exactly how we want

Training Container Structure

model/
├── Dockerfile                 # Container definition
├── requirements.txt           # Python dependencies
└── trainer/
    ├── __init__.py
    ├── task.py               # Entry point (argument parsing)
    └── model.py              # Model definition and training logic

Model Architecture (model.py)

Our model is a DNN with preprocessing layers built into the graph:

def build_and_compile_model(dataset, model_params):
    # Numeric features (normalize)
    NUM_COLS = ["dayofweek", "hourofday", "trip_distance", "trip_miles", "trip_seconds"]

    # Ordinal categorical (integer encoding)
    ORD_COLS = ["company"]
    # One-hot categorical (one-hot encoding)
    OHE_COLS = ["payment_type"]
    # Create input layers
    num_ins = {name: Input(shape=(), name=name, dtype=tf.float32) for name in NUM_COLS}
    ord_ins = {name: Input(shape=(), name=name, dtype=tf.string) for name in ORD_COLS}
    cat_ins = {name: Input(shape=(), name=name, dtype=tf.string) for name in OHE_COLS}
    all_ins = {**num_ins, **ord_ins, **cat_ins}
    # Preprocessing layers (learned from training data)
    num_encoded = [normalization(name, dataset)(num_ins[name]) for name in NUM_COLS]
    ord_encoded = [str_lookup(name, dataset, "int")(ord_ins[name]) for name in ORD_COLS]
    ohe_encoded = [str_lookup(name, dataset, "one_hot")(cat_ins[name]) for name in OHE_COLS]
    # Concatenate all features
    x = Concatenate()(num_encoded + ord_encoded + ohe_encoded)
    # Hidden layers
    for units, activation in model_params["hidden_units"]:
        x = Dense(units, activation=activation)(x)
    # Output layer (regression)
    output = Dense(1, name="output", activation="linear")(x)
    # Build model
    model = Model(inputs=all_ins, outputs=output, name="nn_model")
    # Compile with optimizer and metrics
    optimizer = optimizers.get(model_params["optimizer"])
    optimizer.learning_rate = model_params["learning_rate"]
    model.compile(
        loss=model_params["loss_fn"],
        optimizer=optimizer,
        metrics=[
            tf.keras.metrics.RootMeanSquaredError(name="root_mean_squared_error"),
            tf.keras.metrics.MeanAbsoluteError(name="mean_absolute_error"),
            tf.keras.metrics.MeanAbsolutePercentageError(name="mean_absolute_percentage_error"),
            tf.keras.metrics.MeanSquaredLogarithmicError(name="mean_squared_logarithmic_error"),
        ],
    )
    return model

Key features:

1. Preprocessing in the Model:

Normalization layer: learns mean/std from training data
StringLookup layers: learn vocabularies for categorical features
These layers are saved with the model → no separate preprocessing needed at inference

2. Multiple Metrics:

RMSE (primary metric for champion/challenger comparison)
MAE, MAPE, MSLE (additional evaluation metrics)

3. Configurable Architecture:

Hidden units, optimizer, learning rate all passed as parameters
Easy to experiment without changing code
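
What the preprocessing layers "learn" can be shown without TensorFlow. The sketch below is a dependency-free approximation of the idea, not the Keras implementation and not the article's normalization() or str_lookup() helpers. It assumes index 0 is reserved for unseen tokens and, for simplicity, orders the vocabulary alphabetically; the sample values are invented.

```python
import statistics


def fit_normalizer(values):
    """Learn mean and standard deviation, return a scaling function."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid dividing by zero for constant columns
    return lambda x: (x - mean) / std


def fit_string_lookup(values):
    """Learn a vocabulary; index 0 is reserved for unseen (out-of-vocabulary) tokens."""
    index = {token: i + 1 for i, token in enumerate(sorted(set(values)))}
    return lambda token: index.get(token, 0)


def one_hot(position, depth):
    """Encode an integer index as a one-hot vector of the given depth."""
    return [1.0 if i == position else 0.0 for i in range(depth)]


# Hypothetical training columns.
train_miles = [1.0, 2.0, 3.0]
train_payment = ["Cash", "Credit Card", "Cash"]

scale_miles = fit_normalizer(train_miles)
lookup_payment = fit_string_lookup(train_payment)

print(scale_miles(2.0))                           # the mean maps to zero
print(one_hot(lookup_payment("Cash"), depth=3))   # known category
print(lookup_payment("Mobile"))                   # unseen category falls back to index 0
```

The statistics and vocabulary are fitted once on training data and then frozen. That is the property that keeps serving inputs on the same scale as training inputs when the layers ship inside the model.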

Training Execution

# Prepare args for final training (not hypertuning)
args.update(dict(hypertune=False))

training_args_step = get_training_args_dict_op(**args).set_display_name(
    "Get-Training-Args"
)
# Configure worker pool with best hyperparameters
training_worker_pool_specs_step = get_workerpool_spec_op(
    worker_pool_specs=WORKER_POOL_SPECS,
    hyperparams=hypertune_results_step.output,  # Use best hyperparameters!
    args=training_args_step.output,
).set_display_name("Get-Training-Worker-Pool-Spec")
# Launch custom training job
custom_job_task = CustomTrainingJobOp(
    project=project,
    display_name=training_job_display_name,
    worker_pool_specs=training_worker_pool_specs_step.output,
    base_output_directory=f"{base_output_dir}/training-job",
    location=location,
)

What happens:

Vertex AI provisions a VM with the specified machine type (n1-standard-4)
Pulls the custom training container from Artifact Registry
Runs the training script with hyperparameters from tuning step
Model trains on training data, validates on validation data
Saves the trained model to GCS as a TensorFlow SavedModel

Training Script Arguments (task.py)

parser.add_argument("--train-data", required=True, help="Path to training CSV")
parser.add_argument("--valid-data", required=True, help="Path to validation CSV")
parser.add_argument("--test-data", required=True, help="Path to test CSV")
parser.add_argument("--model-dir", default=os.getenv("AIP_MODEL_DIR"), help="Model output directory")
parser.add_argument("--learning-rate", type=float, default=0.001)
parser.add_argument("--batch-size", type=int, default=100)
parser.add_argument("--epochs", type=int, default=10)

These arguments are populated by the worker pool spec, which includes the best hyperparameters.
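
A self-contained version of that parser shows how the hyphenated flags from the search space ("learning-rate", "batch-size") arrive as attributes with underscores. The parse_args wrapper and the gs:// paths are assumptions for illustration; in a real job the container's command line supplies them.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse training arguments; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Taxi fare trainer (sketch)")
    parser.add_argument("--train-data", required=True, help="Path to training CSV")
    parser.add_argument("--valid-data", required=True, help="Path to validation CSV")
    parser.add_argument("--test-data", required=True, help="Path to test CSV")
    parser.add_argument("--model-dir", default=os.getenv("AIP_MODEL_DIR"), help="Model output directory")
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--batch-size", type=int, default=100)
    parser.add_argument("--epochs", type=int, default=10)
    return parser.parse_args(argv)


# Simulate the command line a worker pool spec might produce (hypothetical paths).
args = parse_args([
    "--train-data", "gs://example-bucket/train/*.csv",
    "--valid-data", "gs://example-bucket/valid/*.csv",
    "--test-data", "gs://example-bucket/test/*.csv",
    "--learning-rate", "0.005",
    "--batch-size", "256",
])

# argparse converts "--learning-rate" to the attribute "learning_rate".
print(args.learning_rate, args.batch_size, args.epochs)
```

Passing argv explicitly keeps the parser testable without touching sys.argv, which is handy when unit-testing the trainer outside a container.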

Step 6: Model Evaluation on Test Set

Goal: Evaluate the trained model on held-out test data to get unbiased performance metrics.

# Extract training results (model + metrics)
training_results_step = get_custom_job_results_op(
    project=project,
    location=location,
    job_resource=custom_job_task.output
).set_display_name("Get-Training-Results")

The get_custom_job_results_op component:

Reads the SavedModel from GCS
Loads the test dataset
Evaluates the model: model.evaluate(test_data)
Extracts metrics: RMSE, MAE, MAPE, MSLE
Writes metrics to a JSON file artifact

Example metrics.json:

{
  "problemType": "regression",
  "rootMeanSquaredError": 2.7,
  "meanAbsoluteError": 1.9,
  "meanAbsolutePercentageError": 12.5,
  "meanSquaredLogarithmicError": 0.08
}

These metrics are passed to the champion/challenger comparison step.
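
For reference, the four metrics can be computed by hand. This is a plain-Python sketch using the textbook definitions, assuming strictly positive fares and predictions; Keras applies extra clipping, so values can differ slightly at the edges. The sample fares are invented and regression_metrics is a hypothetical helper.

```python
import json
import math


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, MAPE (in percent), and MSLE for positive targets."""
    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e) / abs(true) for e, true in zip(errors, y_true)) / n
    msle = sum(
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ) / n
    return {
        "problemType": "regression",
        "rootMeanSquaredError": math.sqrt(mse),
        "meanAbsoluteError": mae,
        "meanAbsolutePercentageError": mape,
        "meanSquaredLogarithmicError": msle,
    }


# Hypothetical fares (true) and model predictions.
metrics = regression_metrics(y_true=[10.0, 20.0], y_pred=[12.0, 18.0])
print(json.dumps(metrics, indent=2))
```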

Step 7: Champion/Challenger Comparison

Goal: Only promote the new model to production if it’s better than the current champion.

This is implemented by the upload_best_model_op component (see Part 4 for details):

upload_best_model_op(
    project=project,
    location=location,
    model=training_results_step.outputs["model"],
    model_eval_metrics=training_results_step.outputs["metrics"],
    test_data=test_dataset.outputs["dataset"],
    eval_metric="rootMeanSquaredError",
    eval_lower_is_better=True,
    serving_container_image=PREDICTION_IMAGE,
    model_name=model_name,
    model_description="Predict price of a taxi trip.",
    pipeline_job_id="{{$.pipeline_job_name}}",
).set_display_name("Upload model")

Comparison logic:

1. Lookup champion: Query Vertex AI Model Registry for the current default model
2. Get champion metrics: Read evaluation metrics from champion model
3. Compare:

challenger_wins = (challenger_rmse < champion_rmse)

4. Upload:

If challenger wins: Upload as is_default_version=True (becomes new champion)
If champion wins: Upload as is_default_version=False (versioned but not default)

Example scenario:

Current Champion: RMSE = 3.1
New Model: RMSE = 2.7
→ Challenger wins! (2.7 < 3.1)
→ Upload as default version
→ New model becomes champion
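
The decision rule itself is small enough to sketch in full. This is an illustration of the comparison, not the code inside upload_best_model_op. Treating a missing champion as an automatic win for the challenger is an assumption made here so that a first run has something to promote.

```python
from typing import Optional


def challenger_wins(
    challenger: float,
    champion: Optional[float],
    lower_is_better: bool = True,
) -> bool:
    """Decide whether the new model should become the default version."""
    if champion is None:
        # Assumption: with no champion registered yet, the first model is promoted.
        return True
    if lower_is_better:
        return challenger < champion  # strict: a tie keeps the current champion
    return challenger > champion


# The example scenario: champion RMSE 3.1, new model RMSE 2.7.
is_default_version = challenger_wins(challenger=2.7, champion=3.1)
print(is_default_version)
```

Using a strict comparison means a tie leaves the existing champion in place, which avoids churning the default version when a retrain brings no measurable gain.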

Step 8: Model Upload and Registry Management

The model is uploaded to Vertex AI Model Registry with:

Display name: taxi-traffic-model
Version: Auto-incremented (v1, v2, v3, …)
Default flag: Set based on champion/challenger comparison
Evaluation metrics: Imported and visible in Vertex AI UI
Lineage: Linked to training pipeline run, datasets used
Serving container: Specified for deployment

Registry view after upload:

taxi-traffic-model
├── v1 (RMSE: 3.5) - created 2 weeks ago
├── v2 (RMSE: 3.1) - created 1 week ago [PREVIOUS CHAMPION]
└── v3 (RMSE: 2.7) - created today [NEW CHAMPION]

Complete Pipeline Code Walkthrough

Here’s the full pipeline definition (simplified for clarity):

from kfp import compiler, dsl

@dsl.pipeline(name="taxifare-training-pipeline")
def pipeline(
    project: str,
    location: str,
    model_name: str = "taxi-traffic-model",
):
    # Step 1: Preprocess data
    prep_op = BigqueryQueryJobOp(
        project=project,
        location="US",
        query=prep_query,
    ).set_display_name("Ingest & preprocess data")
    # Step 2: Split data (80/10/10)
    split_train = BigqueryQueryJobOp(...).after(prep_op)
    split_valid = BigqueryQueryJobOp(...).after(prep_op)
    split_test = BigqueryQueryJobOp(...).after(prep_op)
    # Step 3: Extract to GCS
    train_dataset = extract_table_to_gcs_op(...).after(split_train)
    valid_dataset = extract_table_to_gcs_op(...).after(split_valid)
    test_dataset = extract_table_to_gcs_op(...).after(split_test)
    # Step 4: Hyperparameter tuning
    hypertune_step = HyperparameterTuningJobRunOp(...)
    hypertune_results = get_hyperparameter_tuning_results_op(...)
    # Step 5: Train with best hyperparameters
    custom_job = CustomTrainingJobOp(
        worker_pool_specs=training_worker_pool_specs_step.output
    )
    # Step 6: Evaluate
    training_results = get_custom_job_results_op(...)
    # Step 7 & 8: Champion/Challenger comparison and upload
    upload_best_model_op(
        model=training_results.outputs["model"],
        model_eval_metrics=training_results.outputs["metrics"],
        eval_metric="rootMeanSquaredError",
        eval_lower_is_better=True,
        model_name=model_name,
    )

DAG visualization on Vertex AI pipeline:

Pipeline Observability and Debugging

Vertex AI Pipeline UI

When you run the pipeline, Vertex AI provides a rich UI:

DAG visualization: See all steps and their dependencies
Step-by-step logs: Click any component to view logs
Artifact tracking: See inputs/outputs of each step
Lineage graph: Trace data from source to model
Execution timeline: Identify bottlenecks

Key Metrics to Monitor

During hyperparameter tuning:

Trials completed vs total
Best validation RMSE so far
Trial execution time

During training:

Training loss curve
Validation RMSE per epoch
Training duration

After upload:

Champion vs challenger RMSE
Model version number
Upload success/failure

Production Considerations

Caching for Faster Iterations

KFP supports caching — if inputs haven’t changed, skip execution and reuse previous outputs:

# Enable caching for expensive operations
prep_op.set_caching_options(True)
split_train.set_caching_options(True)

When to cache:

Data preprocessing (if source data hasn’t changed)
Data splitting (deterministic, always same result)

When NOT to cache:

Hyperparameter tuning (want fresh trials)
Training (want latest model)

Conclusion

We’ve dissected a production training pipeline from raw data to a registered model. The key elements that make it production-ready:

BigQuery preprocessing: Scalable SQL-based feature engineering
Repeatable splitting: Hash-based deterministic train/val/test splits
Hyperparameter tuning: Automatic optimization with Vertex AI
Custom containers: Full control over training environment
Rigorous evaluation: Test set metrics for unbiased assessment
Champion/Challenger: Quality gate preventing degraded models
Model registry: Versioning, lineage, and governance

Each component is modular, testable, and reusable. The DAG clearly shows dependencies. Observability is built-in at every step.

In the next article, we’ll explore how CI/CD automates this entire workflow — from code commit to production deployment — ensuring every change is tested, validated, and deployed safely.

Key Takeaways:

Preprocess at scale with BigQuery SQL for speed and simplicity
Use hash-based splitting for deterministic, reproducible experiments
Automate hyperparameter tuning to find optimal configurations
Build preprocessing into TensorFlow models for training-serving consistency
Implement Champion/Challenger pattern to protect production quality
Track lineage from raw data through trained models in Vertex AI

Next in Series: CI/CD for ML: Automating from Code to Production

GitHub Repository: production-ready-MLOps-on-GCP

Production-Ready MLOps on GCP Part 4: Building Reusable Kubeflow Pipeline Components

Saoussen CHAABNIA — Tue, 13 Jan 2026 10:12:28 GMT

Introduction

In the previous article, we built the infrastructure foundation for our MLOps system using Terraform. Our Vertex AI environment is provisioned, our service accounts have the right permissions, and our artifact registries are ready. Now comes the exciting part: building the ML workflows themselves.

But here’s the thing: if you approach ML pipelines the way most teams do — writing monolithic scripts that do everything — you’ll end up with code that’s hard to test, impossible to reuse, and a nightmare to debug. Every pipeline becomes a snowflake, and maintaining them becomes a full-time job.

The solution? Reusable Kubeflow Pipeline (KFP) components: modular, composable building blocks that can be mixed and matched to create any ML workflow you need.

In this article, we’ll explore:

What makes a good pipeline component
How to design components following the single responsibility principle
Deep dives into 4 critical components from our system
Testing strategies for ML components
Best practices for component development

By the end, you’ll understand how to build a component library that makes creating production ML pipelines as simple as connecting LEGO blocks.

What Are Kubeflow Pipeline Components?

Think of a KFP component as a function with superpowers. It’s a self-contained piece of code that:

Performs one specific task (e.g., “export BigQuery table to GCS”)
Declares its inputs and outputs explicitly
Runs in its own containerized environment
Can be reused across multiple pipelines
Is independently testable

A Simple Example

Here’s what a basic KFP component looks like:

from kfp.dsl import component, Output, Dataset

@component(
    base_image="python:3.10",
    packages_to_install=["pandas==2.0.0"]
)
def process_data_op(
    input_path: str,
    output_dataset: Output[Dataset],
    filter_threshold: float = 0.5
) -> None:
    """Process data and save to output."""
    import pandas as pd
    # Load data
    df = pd.read_csv(input_path)
    # Apply transformation
    df_filtered = df[df['score'] > filter_threshold]
    # Save result
    df_filtered.to_csv(output_dataset.path, index=False)

What makes this a component?

@component decorator: Tells KFP this is a reusable component
base_image: Specifies the Docker image to run in
packages_to_install: Auto-installs dependencies
Type annotations: Output[Dataset] tells KFP this produces a dataset artifact
Self-contained logic: Everything needed to run is inside the function

Python Function-Based vs Containerized Components

KFP supports two component types:

Python function-based components (what we use):

Define components as Python functions
KFP automatically containerizes them
Easy to write and test
Perfect for most use cases

Containerized components:

You build the Docker image yourself
Maximum control over the environment
Necessary for complex dependencies or non-Python code

We use function-based components because they offer the best balance of simplicity and power.

Our Component Library: The Building Blocks

Our reference implementation includes 8 reusable components:

extract_table_to_gcs_op - Export BigQuery tables to Cloud Storage
get_training_args_dict_op - Build training configuration dictionaries
get_workerpool_spec_op - Configure distributed training worker pools
get_hyperparameter_tuning_results_op - Parse hyperparameter tuning results
get_custom_job_results_op - Extract metrics from training jobs
lookup_model_op - Find models in Vertex AI Model Registry
upload_best_model_op - Champion/Challenger model comparison and upload
model_batch_predict_op - Execute batch predictions with monitoring

Each component follows the single responsibility principle — it does one thing and does it well. Let’s dive deep into four critical ones.

Component Architecture

Here’s how our 8 reusable components interact with pipelines and GCP services:

Component Deep Dive 1: extract_table_to_gcs_op

Purpose: Export a BigQuery table to Cloud Storage in CSV format.

Why it exists: Many ML frameworks expect data in files (CSV, TFRecord) rather than directly from BigQuery. This component bridges that gap.

Implementation

from kfp.dsl import Dataset, Artifact, component, Input, Output

@component(
    base_image="python:3.10.14",
    packages_to_install=["google-cloud-bigquery==3.24.0"]
)
def extract_table_to_gcs_op(
    bq_table: Input[Artifact],
    dataset: Output[Dataset],
    location: str = "US",
) -> None:
    """Extract a BigQuery table into Google Cloud Storage."""
    import google.cloud.bigquery as bq
    # Extract table metadata from input artifact
    project_id = bq_table.metadata["projectId"]
    dataset_id = bq_table.metadata["datasetId"]
    table_id = bq_table.metadata["tableId"]
    # Construct full table ID
    full_table_id = f"{project_id}.{dataset_id}.{table_id}"
    table = bq.table.Table(table_ref=full_table_id)
    # Initialize BigQuery client
    client = bq.client.Client(project=project_id, location=location)
    # Submit extract job to GCS
    extract_job = client.extract_table(table, dataset.uri)
    # Wait for completion
    extract_job.result()

Key Design Decisions

1. Artifact-Based Input

bq_table: Input[Artifact]

The component receives table information as an artifact with metadata, not raw strings. This enables:

Lineage tracking: Vertex AI knows which table produced which dataset
Type safety: Can’t accidentally pass wrong data
Metadata preservation: Project ID, dataset ID, table ID travel together

2. Output as Dataset

dataset: Output[Dataset]

The output is typed as a Dataset, which:

Creates a GCS URI automatically (dataset.uri)
Registers the dataset in Vertex AI Metadata Store
Enables downstream components to reference it

3. Explicit Location

location: str = "US"

BigQuery location matters for data residency and performance. Making it explicit prevents subtle bugs.

Usage in a Pipeline

from kfp import dsl

@dsl.pipeline(name="my-pipeline")
def my_pipeline():
    # Previous step creates bq_table artifact
    preprocess_task = preprocess_data_op(...)
    # Extract to GCS
    extract_task = extract_table_to_gcs_op(
        bq_table=preprocess_task.outputs["bq_table"],
        location="US"
    )
    # Next step uses the dataset
    train_task = train_model_op(
        training_data=extract_task.outputs["dataset"]
    )

The output of one component becomes the input of another — clean, type-safe data flow.

Component Deep Dive 2: lookup_model_op

Purpose: Find a model in Vertex AI Model Registry by display name.

Why it exists: For predictions, we need to retrieve the “champion” model. For champion/challenger comparison, we need to find the existing champion.

Implementation (Simplified)

from kfp.dsl import component, Output, Model
from typing import NamedTuple

@component(
    base_image="python:3.10.14",
    packages_to_install=["google-cloud-aiplatform==1.55.0"],
)
def lookup_model_op(
    model_name: str,
    location: str,
    project: str,
    model: Output[Model],
    fail_on_model_not_found: bool = False,
) -> NamedTuple("Outputs", [("model_resource_name", str), ("training_dataset", dict)]):
    """Fetch a model by display name from Vertex AI Model Registry."""
    import json
    import logging
    from pathlib import Path
    # Aliased so it is not confused with the kfp.dsl.Model artifact type
    from google.cloud.aiplatform import Model as RegistryModel
    TRAINING_DATASET_INFO = "training_dataset.json"
    logging.info(f"Listing models with display name {model_name}")
    models = RegistryModel.list(
        filter=f'display_name="{model_name}"',
        location=location,
        project=project,
    )
    logging.info(f"Found {len(models)} model(s)")
    training_dataset = {}
    model_resource_name = ""
    if len(models) == 0:
        logging.error(f"No model found with name {model_name}")
        if fail_on_model_not_found:
            raise RuntimeError("Failed as model was not found")
    elif len(models) == 1:
        target_model = models[0]
        model_resource_name = target_model.resource_name
        # Populate output artifact
        model.uri = target_model.uri
        model.metadata["resourceName"] = target_model.resource_name
        # Read training dataset metadata (for monitoring)
        path = Path(model.path) / TRAINING_DATASET_INFO
        if path.exists():
            with open(path, "r") as fp:
                training_dataset = json.load(fp)
            logging.info(f"Training dataset: {training_dataset}")
    else:
        raise RuntimeError(f"Multiple models with name {model_name} found")
    return model_resource_name, training_dataset

Key Design Decisions

1. Multiple Return Values

-> NamedTuple("Outputs", [("model_resource_name", str), ("training_dataset", dict)])

Components can return multiple outputs. The model_resource_name is used for logging, while training_dataset is used for model monitoring configuration.

2. Flexible Error Handling

fail_on_model_not_found: bool = False

Different scenarios need different behaviors:

First pipeline run: No model exists yet, don’t fail
Production prediction: Model must exist, fail if not found

3. Metadata Extraction

The component reads training_dataset.json from the model directory. This metadata (created during training) contains information needed for model monitoring. It is a good example of components communicating via artifacts and metadata.

Usage: Champion Model Lookup

@dsl.pipeline(name="prediction-pipeline")
def prediction_pipeline(model_name: str = "chicago-taxi-fare"):
    # Lookup champion model
    lookup_task = lookup_model_op(
        model_name=model_name,
        location="us-central1",
        project="my-project",
        fail_on_model_not_found=True  # Must exist for predictions
    )

    # Use champion model for predictions
    predict_task = model_batch_predict_op(
        model=lookup_task.outputs["model"],
        # ... other params
    )

Component Deep Dive 3: upload_best_model_op

Purpose: Implement Champion/Challenger pattern — compare new model against existing champion, upload to registry only if it’s better.

Why it exists: This is the gatekeeper that prevents degraded models from reaching production. It’s the most critical component in the system.

Implementation (Simplified)

from kfp.dsl import Dataset, Input, Metrics, Model, Output, component
from google_cloud_pipeline_components.types.artifact_types import VertexModel

@component(
    base_image="python:3.10",
    packages_to_install=[
        "google-cloud-aiplatform==1.55.0",
        "google-cloud-pipeline-components==2.14.1",
    ],
)
def upload_best_model_op(
    model: Input[Model],
    test_data: Input[Dataset],
    model_eval_metrics: Input[Metrics],
    vertex_model: Output[VertexModel],
    project: str,
    location: str,
    model_name: str,
    eval_metric: str,
    eval_lower_is_better: bool,
    pipeline_job_id: str,
    serving_container_image: str,
    model_description: str = None,
    evaluation_name: str = "Imported evaluation",
) -> None:
    """Upload model to registry only if it beats the champion."""
    import json
    import logging
    import google.cloud.aiplatform as aip
    from google.protobuf.json_format import MessageToDict
    def lookup_model(model_name: str):
        """Look up existing champion model."""
        models = aip.Model.list(
            filter=f'display_name="{model_name}"',
            location=location,
            project=project,
        )
        if len(models) == 0:
            return None
        elif len(models) == 1:
            return models[0]
        else:
            raise RuntimeError(f"Multiple models with name {model_name} found")
    def compare_models(champion_metrics, challenger_metrics, eval_lower_is_better):
        """Compare models by evaluating a primary metric."""
        logging.info(f"Comparing {eval_metric} of models")
        m_champ = champion_metrics[eval_metric]
        m_chall = challenger_metrics[eval_metric]
        logging.info(f"Champion={m_champ} Challenger={m_chall}")
        challenger_wins = (
            (m_chall < m_champ) if eval_lower_is_better
            else (m_chall > m_champ)
        )
        logging.info(f"{'Challenger' if challenger_wins else 'Champion'} wins!")
        return challenger_wins
    def upload_model_to_registry(is_default_version, parent_model_uri=None):
        """Upload model to Vertex AI Model Registry."""
        logging.info(f"Uploading model {model_name} (default: {is_default_version})")
        uploaded_model = aip.Model.upload(
            display_name=model_name,
            description=model_description,
            artifact_uri=model.uri,
            serving_container_image_uri=serving_container_image,
            parent_model=parent_model_uri,
            is_default_version=is_default_version,
        )
        # Populate output artifact for downstream components
        vertex_model.uri = (
            f"https://{location}-aiplatform.googleapis.com/v1/"
            f"{uploaded_model.versioned_resource_name}"
        )
        vertex_model.metadata["resourceName"] = (
            uploaded_model.versioned_resource_name
        )
        return uploaded_model
    # Parse challenger metrics
    with open(model_eval_metrics.path, "r") as f:
        challenger_metrics = json.load(f)
    # Look up champion model
    champion_model = lookup_model(model_name=model_name)
    challenger_wins = True
    parent_model_uri = None
    if champion_model is None:
        logging.info("No champion model found, uploading new model.")
    else:
        logging.info(
            f"Model version {champion_model.version_id} "
            "is being challenged by new model."
        )
        # Get champion evaluation metrics
        champion_eval = champion_model.get_model_evaluation()
        champion_metrics = MessageToDict(
            champion_eval._gca_resource._pb
        )["metrics"]
        # Compare champion vs challenger
        challenger_wins = compare_models(
            champion_metrics=champion_metrics,
            challenger_metrics=challenger_metrics,
            eval_lower_is_better=eval_lower_is_better,
        )
        parent_model_uri = champion_model.resource_name
    # Upload new model version
    # If challenger wins, it becomes the default version (champion)
    # If challenger loses, it's uploaded but not set as default
    # (a separate name avoids rebinding the `model` input artifact)
    uploaded_model = upload_model_to_registry(
        is_default_version=challenger_wins,
        parent_model_uri=parent_model_uri
    )
    # Import evaluation results to Vertex AI
    # (import_evaluation is a helper omitted from this simplified listing)
    import_evaluation(
        parsed_metrics=challenger_metrics,
        challenger_model=uploaded_model,
        evaluation_name=evaluation_name,
    )

Key Design Decisions

1. Champion/Challenger Pattern

is_default_version = challenger_wins

This single line implements model governance:

Challenger wins: Becomes the new default (champion) model
Challenger loses: Still uploaded (for audit trail) but not default

2. Metric-Based Comparison

challenger_wins = (m_chall < m_champ) if eval_lower_is_better else (m_chall > m_champ)

Flexible comparison logic:

For losses (RMSE, MSE): Lower is better
For scores (accuracy, AUC): Higher is better

3. Model Versioning

parent_model=parent_model_uri

All model versions are linked to the same parent, creating a version history in the registry. You can always roll back to a previous version.

4. Evaluation Import

The component not only uploads the model but also imports its evaluation metrics into Vertex AI, making them visible in the UI. This is critical for:

Comparing model versions visually
Audit trails
Debugging why a model was/wasn’t promoted

The Three Scenarios

This component handles three scenarios elegantly:

Scenario 1: First Model (No Champion)

No champion found → Upload as default (becomes champion)

Scenario 2: Challenger Wins

Challenger RMSE (2.5) < Champion RMSE (3.1)
→ Upload as default (new champion)

Scenario 3: Champion Wins

Challenger RMSE (3.5) > Champion RMSE (3.1)
→ Upload as non-default (champion unchanged)

Production stays protected — degraded models never become default.
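
The three scenarios reduce to one small decision function. The sketch below is a standalone restatement of the comparison logic for illustration; decide_promotion and its argument names are hypothetical and do not appear in the component itself. Note that the strict comparison means a tie keeps the current champion.

```python
from typing import Mapping, Optional


def decide_promotion(
    challenger_metrics: Mapping[str, float],
    champion_metrics: Optional[Mapping[str, float]],
    eval_metric: str = "rmse",
    lower_is_better: bool = True,
) -> bool:
    """Return True when the challenger should become the default version."""
    # Scenario 1: no champion yet, so the first model is promoted.
    if champion_metrics is None:
        return True
    challenger = challenger_metrics[eval_metric]
    champion = champion_metrics[eval_metric]
    # Scenarios 2 and 3: strict comparison, so a tie keeps the champion.
    if lower_is_better:
        return challenger < champion
    return challenger > champion


print(decide_promotion({"rmse": 2.5}, None))           # Scenario 1
print(decide_promotion({"rmse": 2.5}, {"rmse": 3.1}))  # Scenario 2
print(decide_promotion({"rmse": 3.5}, {"rmse": 3.1}))  # Scenario 3
```

The return value maps directly onto is_default_version: the model is uploaded either way, and only the flag changes.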

Component Deep Dive 4: model_batch_predict_op

Purpose: Execute batch predictions and enable model monitoring for skew detection.

Why it exists: This component combines two critical production needs — running predictions at scale and monitoring for model degradation.

Implementation Highlights

from kfp.dsl import Input, Model, component, OutputPath
from typing import List, NamedTuple

@component(
    base_image="python:3.10",
    packages_to_install=[
        "google-cloud-pipeline-components==2.14.1",
        "google-cloud-aiplatform==1.55.0",
    ],
)
def model_batch_predict_op(
    model: Input[Model],
    gcp_resources: OutputPath(str),
    job_display_name: str,
    location: str,
    project: str,
    source_uri: str,
    destination_uri: str,
    source_format: str,
    destination_format: str,
    machine_type: str = "n1-standard-2",
    starting_replica_count: int = 1,
    max_replica_count: int = 1,
    monitoring_training_dataset: dict = None,
    monitoring_alert_email_addresses: List[str] = None,
    monitoring_skew_config: dict = None,
) -> NamedTuple("Outputs", [("gcp_resources", str)]):
    """Execute batch prediction with optional monitoring."""
    import logging
    import time
    from google.cloud.aiplatform_v1beta1.services.job_service import JobServiceClient
    # JobState is needed by the polling loop at the end of the function
    from google.cloud.aiplatform_v1beta1.types import BatchPredictionJob, JobState
    from google.protobuf.json_format import ParseDict
    # Configure input/output based on format
    input_config = {"instancesFormat": source_format}
    output_config = {"predictionsFormat": destination_format}
    if source_format == "bigquery" and destination_format == "bigquery":
        input_config["bigquerySource"] = {"inputUri": source_uri}
        output_config["bigqueryDestination"] = {"outputUri": destination_uri}
    else:
        input_config["gcsSource"] = {"uris": [source_uri]}
        output_config["gcsDestination"] = {"outputUriPrefix": destination_uri}
    # Build batch prediction request
    message = {
        "displayName": job_display_name,
        "model": model.metadata["resourceName"],
        "inputConfig": input_config,
        "outputConfig": output_config,
        "dedicatedResources": {
            "machineSpec": {"machineType": machine_type},
            "startingReplicaCount": starting_replica_count,
            "maxReplicaCount": max_replica_count,
        },
    }
    # Add monitoring configuration if provided
    if monitoring_training_dataset and monitoring_skew_config:
        logging.info("Adding monitoring config to request")
        message["modelMonitoringConfig"] = {
            "alertConfig": {
                "emailAlertConfig": {
                    "userEmails": monitoring_alert_email_addresses or []
                },
                "enableLogging": True,
            },
            "objectiveConfigs": [{
                "trainingDataset": monitoring_training_dataset,
                "trainingPredictionSkewDetectionConfig": monitoring_skew_config,
            }],
        }
    # Submit batch prediction job
    request = ParseDict(message, BatchPredictionJob()._pb)
    client = JobServiceClient(
        client_options={"api_endpoint": f"{location}-aiplatform.googleapis.com"}
    )
    response = client.create_batch_prediction_job(
        parent=f"projects/{project}/locations/{location}",
        batch_prediction_job=request,
    )
    logging.info(f"Submitted batch prediction job: {response.name}")
    # Poll until job completes
    POLLING_INTERVAL = 20
    while True:
        job_status = client.get_batch_prediction_job(name=response.name)
        if job_status.state == JobState.JOB_STATE_SUCCEEDED:
            logging.info("Job completed successfully")
            break
        elif job_status.state in [JobState.JOB_STATE_FAILED,
                                   JobState.JOB_STATE_CANCELLED]:
            raise RuntimeError(f"Job failed with state: {job_status.state}")
        logging.info(f"Job in progress, waiting {POLLING_INTERVAL}s...")
        time.sleep(POLLING_INTERVAL)
    return (gcp_resources,)

Key Design Decisions

1. Flexible Input/Output Formats

if source_format == "bigquery" and destination_format == "bigquery":
    # BQ → BQ (most common for our use case)
else:
    # GCS → GCS

The component supports both BigQuery and Cloud Storage, making it reusable for different scenarios.
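
One thing to watch in the branch above: anything other than BigQuery on both sides falls through to the GCS branch, so a BigQuery source paired with a GCS destination would be sent as a gcsSource. If mixed combinations are wanted, each side can be decided independently. The sketch below is one possible variation, not the component's actual code; build_io_config is a hypothetical helper name and the URIs in the usage lines are placeholders.

```python
from typing import Dict, Tuple


def build_io_config(
    source_format: str,
    destination_format: str,
    source_uri: str,
    destination_uri: str,
) -> Tuple[Dict, Dict]:
    """Build inputConfig and outputConfig, choosing each side on its own format."""
    input_config = {"instancesFormat": source_format}
    output_config = {"predictionsFormat": destination_format}
    # Source side: BigQuery table or Cloud Storage file(s)
    if source_format == "bigquery":
        input_config["bigquerySource"] = {"inputUri": source_uri}
    else:
        input_config["gcsSource"] = {"uris": [source_uri]}
    # Destination side: BigQuery dataset/table or Cloud Storage prefix
    if destination_format == "bigquery":
        output_config["bigqueryDestination"] = {"outputUri": destination_uri}
    else:
        output_config["gcsDestination"] = {"outputUriPrefix": destination_uri}
    return input_config, output_config


# Placeholder URIs for illustration only
inp, out = build_io_config(
    "bigquery", "jsonl", "bq://my-project.my_dataset.features", "gs://my-bucket/preds"
)
print(inp)
print(out)
```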

2. Optional Monitoring

if monitoring_training_dataset and monitoring_skew_config:
    message["modelMonitoringConfig"] = {...}

Monitoring is optional — you can run predictions without it. But when enabled, Vertex AI automatically:

Compares prediction data distribution to training data
Detects training-serving skew
Sends email alerts if thresholds are exceeded
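
To make the shape of that config concrete, here is a minimal sketch of assembling it from per-feature thresholds. The outer keys match the component code above. The helper name build_monitoring_config, the feature names, the table URI, the target field and the email address are all made up for illustration, and the skewThresholds layout (feature name mapped to an object with a value field) is my reading of the Vertex AI threshold config, so check it against the API reference before relying on it.

```python
from typing import Dict, List, Optional


def build_monitoring_config(
    training_dataset: Dict,
    skew_thresholds: Dict[str, float],
    alert_emails: Optional[List[str]] = None,
) -> Dict:
    """Assemble a modelMonitoringConfig dict for a batch prediction request."""
    # Assumed layout: one threshold object per monitored feature
    skew_config = {
        "skewThresholds": {
            feature: {"value": threshold}
            for feature, threshold in skew_thresholds.items()
        }
    }
    return {
        "alertConfig": {
            "emailAlertConfig": {"userEmails": list(alert_emails or [])},
            "enableLogging": True,
        },
        "objectiveConfigs": [
            {
                "trainingDataset": training_dataset,
                "trainingPredictionSkewDetectionConfig": skew_config,
            }
        ],
    }


# Hypothetical values for illustration only
config = build_monitoring_config(
    training_dataset={
        "bigquerySource": {"inputUri": "bq://my-project.my_dataset.training"},
        "targetField": "total_fare",
    },
    skew_thresholds={"trip_miles": 0.3, "payment_type": 0.3},
    alert_emails=["ml-alerts@example.com"],
)
print(config["objectiveConfigs"][0]["trainingPredictionSkewDetectionConfig"])
```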

3. Synchronous Execution with Polling

while True:
    job_status = client.get_batch_prediction_job(...)
    if job_status.state == JobState.JOB_STATE_SUCCEEDED:
        break

The component waits for the batch prediction to complete. This is intentional:

Downstream components need the predictions to exist
Failures are caught immediately, not discovered later
Pipeline DAG reflects actual dependencies
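
A bare while True loop will wait forever if the job never reaches a terminal state. A possible hardening, sketched below with the job client abstracted behind a get_state callable, adds a deadline and treats an expired job as a failure. wait_for_job and its parameters are hypothetical names, and plain strings stand in for the JobState enum members so the example runs without any GCP dependency.

```python
import time
from typing import Callable

SUCCESS_STATES = {"JOB_STATE_SUCCEEDED"}
FAILURE_STATES = {"JOB_STATE_FAILED", "JOB_STATE_CANCELLED", "JOB_STATE_EXPIRED"}


def wait_for_job(
    get_state: Callable[[], str],
    poll_interval_s: float = 20.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state until the job succeeds, fails, or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in SUCCESS_STATES:
            return state
        if state in FAILURE_STATES:
            raise RuntimeError(f"Job ended in state {state}")
        if clock() >= deadline:
            raise TimeoutError(f"Job still in state {state} after {timeout_s}s")
        sleep(poll_interval_s)


# Usage with a fake job that succeeds on the third poll (no real sleeping)
states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
print(wait_for_job(lambda: next(states), sleep=lambda _: None))
```

Injecting sleep and clock keeps the loop testable in milliseconds, which fits the unit-testing approach described later in this article.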

4. Resource Configuration

machine_type: str = "n1-standard-2",
starting_replica_count: int = 1,
max_replica_count: int = 1,

Predictions can scale horizontally. For large datasets, increase replicas for parallel processing.

Component Design Patterns

After examining four components, let’s extract the patterns that make them production-ready.

Pattern 1: Explicit Input/Output Types

Bad:

def my_component(input_path: str) -> str:
    # Returns a string path, no lineage tracking
    return "gs://bucket/output.csv"

Good:

def my_component(
    input_data: Input[Dataset],
    output_data: Output[Dataset]
) -> None:
    # KFP tracks lineage automatically
    process(input_data.path, output_data.path)

Explicit types enable:

Lineage tracking: Vertex AI knows data flow
Type safety: Can’t pass a Model where a Dataset is expected
Automatic URI generation: output_data.uri is created for you

Pattern 2: Metadata for Communication

Components pass data via artifacts, and metadata via artifact properties:

# Component A: Sets metadata
def create_table_op(bq_table: Output[Artifact]) -> None:
    bq_table.metadata["projectId"] = "my-project"
    bq_table.metadata["datasetId"] = "my_dataset"
    bq_table.metadata["tableId"] = "my_table"

# Component B: Reads metadata
def extract_table_op(bq_table: Input[Artifact]) -> None:
    project = bq_table.metadata["projectId"]  # Metadata preserved

This is cleaner than passing 10 string parameters.
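
The metadata keys form an informal contract between the two components, and a typo in a key only surfaces at runtime. One way to tighten that contract, shown here as a sketch, is a small value class that validates the keys on the reading side. BigQueryTableRef is a hypothetical name, not something from the repository.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class BigQueryTableRef:
    """Typed view over the projectId/datasetId/tableId artifact metadata."""

    project_id: str
    dataset_id: str
    table_id: str

    @classmethod
    def from_metadata(cls, metadata: Mapping[str, str]) -> "BigQueryTableRef":
        required = ("projectId", "datasetId", "tableId")
        # Fail early with a message naming every missing or empty key
        missing = [key for key in required if not metadata.get(key)]
        if missing:
            raise KeyError(f"Artifact metadata is missing: {', '.join(missing)}")
        return cls(metadata["projectId"], metadata["datasetId"], metadata["tableId"])

    def to_metadata(self) -> Dict[str, str]:
        return {
            "projectId": self.project_id,
            "datasetId": self.dataset_id,
            "tableId": self.table_id,
        }

    @property
    def full_table_id(self) -> str:
        return f"{self.project_id}.{self.dataset_id}.{self.table_id}"


ref = BigQueryTableRef.from_metadata(
    {"projectId": "my-project", "datasetId": "my_dataset", "tableId": "my_table"}
)
print(ref.full_table_id)
```

Because function-based components must be self-contained, a helper like this would have to be defined inside the component body or shipped in a package installed in the base image.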

Pattern 3: Logging for Observability

Every component logs extensively:

import logging

logging.info(f"Processing {len(data)} records")
logging.debug(f"Raw metrics: {raw_metrics}")
logging.warning("Model not found, using default")
logging.error(f"Validation failed: {error_msg}")

These logs appear in:

Cloud Logging (searchable, filterable)
Vertex AI Pipeline UI (per-step)
Component artifacts

Pro tip: Use structured logging with key-value pairs for easier searching:

logging.info(f"model_upload status=success model_id={model_id} rmse={rmse}")
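
The key=value line is easy to grep. A further step, sketched here with only the standard library, is to emit each record as a single JSON object so individual fields can be filtered. The JsonFormatter class, the logger name and the fields key are illustrative choices, not part of the reference implementation, and whether the JSON is parsed into structured fields in Cloud Logging depends on how the runtime ships container output.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "severity": record.levelname,
            "message": record.getMessage(),
        }
        # Extra key-value pairs are passed via extra={"fields": {...}}
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, default=str)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("component")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # avoid duplicate output through the root logger

logger.info(
    "model_upload",
    extra={"fields": {"status": "success", "model_id": "123", "rmse": 2.5}},
)
```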

Pattern 4: Graceful Error Handling

Components should fail fast and clearly:

Bad:

models = Model.list(...)
model = models[0]  # IndexError if no models!

Good:

models = Model.list(...)
if len(models) == 0:
    if fail_on_model_not_found:
        raise RuntimeError(
            f"No model found with name {model_name}. "
            f"Expected at least one model in {project}/{location}."
        )
    else:
        logging.warning("No model found, continuing...")
        return None

Clear error messages save hours of debugging.

Pattern 5: Configuration via Parameters

Never hardcode:

Bad:

def train_model_op(...):
    BATCH_SIZE = 32  # Hardcoded!
    LEARNING_RATE = 0.001  # Can’t change without editing code

Good:

def train_model_op(
    batch_size: int = 32,
    learning_rate: float = 0.001
):
    # Configurable via pipeline parameters

This makes components reusable across different experiments.

Testing Strategies for Components

Testing ML components requires different approaches than traditional software.

Level 1: Unit Tests

Test component logic in isolation by calling component.python_func:

from unittest.mock import Mock

from kfp.dsl import Model

import components

# Extract the underlying Python function
upload_model = components.upload_best_model_op.python_func

# mock_model_class is assumed to be a pytest fixture that patches aiplatform.Model
def test_model_upload_no_champion(mock_model_class, tmp_path):
    """Test uploading first model (no champion exists)."""
    # Mock Vertex AI Model.list to return no models
    mock_model_class.list.return_value = []
    # Create test inputs
    model = Model(uri="gs://bucket/model")
    metrics_file = tmp_path / "metrics.json"
    metrics_file.write_text('{"problemType": "regression", "rmse": 2.5}')
    # The component reads model_eval_metrics.path, so pass an object exposing .path
    metrics = Mock(path=str(metrics_file))
    # Call component function
    upload_model(
        model=model,
        model_eval_metrics=metrics,
        eval_metric="rmse",
        eval_lower_is_better=True,
        model_name="test-model",
        # ... other params
    )
    # Assert model was uploaded as default
    mock_model_class.upload.assert_called_once_with(
        display_name="test-model",
        is_default_version=True,  # First model becomes champion
        # ...
    )

Benefits:

Fast (no actual GCP calls)
Cheap (no cloud resources)
Isolated (test one component at a time)

Level 2: Integration Tests

Test component compilation (validates KFP syntax):

def test_component_compiles():
    """Ensure component definition is valid."""
    from kfp import compiler
    compiler.Compiler().compile(
        pipeline_func=my_pipeline,
        package_path="pipeline.yaml"
    )
    # If this doesn't raise, component syntax is valid

Level 3: End-to-End Tests

Run actual pipeline in a dev environment:

# Build container
make build
# Run full training pipeline in dev
make e2e-tests pipeline=training
# Verify outputs
# - Check GCS for artifacts
# - Check Model Registry for uploaded model
# - Check BigQuery for prediction results

E2E tests catch:

IAM permission issues
API enablement problems
Resource quota limits
Real data issues

Run E2E tests on every PR to catch breaking changes before they reach production.

Mocking GCP Services

Use unittest.mock to avoid hitting real GCP APIs:

from unittest.mock import Mock, patch

@patch("google.cloud.aiplatform.Model")
@patch("google.cloud.aiplatform_v1.ModelServiceClient")
def test_upload_best_model(mock_model_service, mock_model_class):
    # Mock returns
    mock_model_class.list.return_value = []
    mock_model_class.upload.return_value = Mock(
        versioned_resource_name="models/123/versions/1"
    )
    # Test component
    # ...

This is essential for:

Fast test execution
Testing error conditions
CI/CD without GCP credentials
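
Error conditions are where mocks pay off most, because they are awkward to reproduce against real services. The sketch below passes the model class in as an argument rather than patching it, which is an alternative to @patch and keeps the example runnable on its own. find_champion is a hypothetical helper that mirrors the lookup logic inside upload_best_model_op; it is not taken from the repository.

```python
from unittest.mock import Mock


def find_champion(model_cls, display_name):
    """Return the single registered model, or None if there is none."""
    models = model_cls.list(filter=f'display_name="{display_name}"')
    if len(models) > 1:
        raise RuntimeError(f"Multiple models with name {display_name} found")
    return models[0] if models else None


def test_no_champion_returns_none():
    model_cls = Mock()
    model_cls.list.return_value = []
    assert find_champion(model_cls, "test-model") is None
    model_cls.list.assert_called_once_with(filter='display_name="test-model"')


def test_duplicate_names_raise():
    model_cls = Mock()
    model_cls.list.return_value = [Mock(), Mock()]
    try:
        find_champion(model_cls, "test-model")
    except RuntimeError as err:
        assert "Multiple models" in str(err)
    else:
        raise AssertionError("expected RuntimeError")


def test_api_failure_propagates():
    # side_effect makes the mocked call raise, simulating a denied request
    model_cls = Mock()
    model_cls.list.side_effect = PermissionError("403 denied")
    try:
        find_champion(model_cls, "test-model")
    except PermissionError:
        pass
    else:
        raise AssertionError("expected PermissionError")


test_no_champion_returns_none()
test_duplicate_names_raise()
test_api_failure_propagates()
print("all mock-based tests passed")
```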

Best Practices for Component Development

1. Single Responsibility Principle

Each component should do one thing:

Bad: process_and_train_op (does two things)

Good:

process_data_op (preprocessing only)
train_model_op (training only)

Smaller components are easier to:

Test
Debug
Reuse
Understand

2. Idempotency

Components should produce the same output given the same input:

Bad:

timestamp = time.time()  # Different every run!
output_path = f"gs://bucket/data_{timestamp}.csv"

Good:

# Use pipeline-provided timestamp
output_path = f"gs://bucket/data_{pipeline_timestamp}.csv"

Idempotency enables:

Pipeline retries
Reproducible results
Caching
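
A related option, sketched below, is to derive the output location from the run identifier plus a hash of the parameters, so a retry of the same step with the same inputs always targets the same object. The function name, the bucket and the path layout are hypothetical; the repository may organise its paths differently.

```python
import hashlib
import json
from typing import Any, Mapping


def deterministic_output_path(
    bucket: str, pipeline_job_id: str, params: Mapping[str, Any]
) -> str:
    """Build a GCS path that depends only on the run ID and the parameters."""
    # sort_keys makes the digest independent of dict insertion order
    canonical = json.dumps(params, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return f"gs://{bucket}/data/{pipeline_job_id}/{digest}.csv"


first = deterministic_output_path("my-bucket", "run-123", {"limit": 1000, "year": 2024})
retry = deterministic_output_path("my-bucket", "run-123", {"year": 2024, "limit": 1000})
print(first == retry)
```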

3. Avoid External State

Components should be self-contained:

Bad:

# Reads from external config file
with open("/config/settings.yaml") as f:
    config = yaml.load(f)  # Where does this file come from?

Good:

# Configuration passed as parameters
def my_component(batch_size: int, learning_rate: float):
    # Everything needed is in the function signature

4. Version Dependencies Explicitly

@component(
    base_image="python:3.10.14",  # Exact version
    packages_to_install=[
        "google-cloud-aiplatform==1.55.0",  # Exact version
        "pandas==2.0.3",  # Exact version
    ],
)

Exact versions ensure:

Reproducible builds
No surprise breakages from dependency updates
Clear dependency audit trail
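
Pinning is easy to forget when a new component is added, so it can be worth checking mechanically. The following is a small sketch of such a check that could run in a unit test; unpinned_packages is a hypothetical helper and the regular expression only covers the simple name==version form (with optional extras), not every valid requirement specifier.

```python
import re
from typing import Iterable, List

# Matches "name==version" or "name[extra]==version"; wildcards such as 2.* are rejected
_PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^=\s*]+$")


def unpinned_packages(packages: Iterable[str]) -> List[str]:
    """Return the requirement strings that are not pinned to an exact version."""
    return [pkg for pkg in packages if not _PINNED.match(pkg.strip())]


requirements = [
    "google-cloud-aiplatform==1.55.0",
    "pandas>=2.0",
    "numpy",
    "requests==2.*",
]
print(unpinned_packages(requirements))
```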

5. Document Inputs and Outputs

def my_component(
    input_data: Input[Dataset],
    output_data: Output[Dataset],
    threshold: float = 0.5,
) -> None:
    """
    Process input data and filter by threshold.
    Args:
        input_data: Dataset containing features to process.
        threshold: Minimum score to include in output (default: 0.5).
    Outputs:
        output_data: Filtered dataset written to GCS.
    """

Good documentation helps:

Other developers use your components
Future you remember what it does
Auto-generated pipeline documentation

Composing Components into Pipelines

Components are the building blocks; pipelines are the assemblies.

Simple Pipeline Example

from kfp import dsl

@dsl.pipeline(
    name="training-pipeline",
    description="Train taxi fare prediction model"
)
def training_pipeline(
    project: str,
    location: str,
    model_name: str = "chicago-taxi-fare",
):
    # Step 1: Preprocess data
    preprocess_task = preprocess_data_bq_op(
        project=project,
        location=location,
    )
    # Step 2: Extract to GCS
    extract_task = extract_table_to_gcs_op(
        bq_table=preprocess_task.outputs["bq_table"],
        location=location,
    )
    # Step 3: Train model
    train_task = train_model_op(
        training_data=extract_task.outputs["dataset"],
        # ...
    )
    # Step 4: Evaluate model
    eval_task = evaluate_model_op(
        model=train_task.outputs["model"],
        test_data=extract_task.outputs["test_dataset"],
    )
    # Step 5: Upload if better than champion
    upload_task = upload_best_model_op(
        model=train_task.outputs["model"],
        model_eval_metrics=eval_task.outputs["metrics"],
        eval_metric="rmse",
        eval_lower_is_better=True,
        model_name=model_name,
        # ...
    )

Notice:

Each task is a component invocation
Outputs of one task feed inputs of another
Pipeline is a Python function decorated with @dsl.pipeline

The next article will dive deep into this training pipeline.

Debugging Components

When components fail, here’s how to debug:

1. Check Component Logs

In Vertex AI Pipeline UI:

Click on failed component
View “Logs” tab
Filter by severity (ERROR, WARNING)

2. Examine Artifacts

Components write artifacts to GCS:

# List artifacts from a pipeline run
gsutil ls gs://my-project-pl-root/artifacts/training-pipeline-123/

# Read a specific artifact
gsutil cat gs://my-project-pl-root/.../metrics.json

3. Test Locally

# Import component function
from components import extract_table_to_gcs_op

# Call directly (not as a KFP component)
extract_func = extract_table_to_gcs_op.python_func
# Test with local mock data
extract_func(
    bq_table=mock_table,
    dataset=mock_dataset,
    location="US"
)

4. Increase Logging

Add more logging statements temporarily:

logging.info(f"DEBUG: input_data.uri = {input_data.uri}")
logging.info(f"DEBUG: metadata = {input_data.metadata}")

Redeploy and rerun to see additional context.

Conclusion

Reusable Kubeflow Pipeline components are the foundation of maintainable MLOps systems. By following the patterns we’ve explored:

Single responsibility: Each component does one thing well
Explicit types: Inputs and outputs are strongly typed
Metadata communication: Pass rich information via artifacts
Extensive logging: Make debugging possible
Comprehensive testing: Unit, integration, and E2E tests

You can build a component library that makes creating new ML pipelines fast, reliable, and enjoyable.

Our 8 components cover the essentials:

Data movement (BigQuery ↔ GCS)
Model registry operations (lookup, upload)
Training configuration (worker specs, hyperparameters)
Inference (batch predictions with monitoring)

In the next article, we’ll combine these components into a complete production training pipeline that handles everything from raw data to a deployed model in Vertex AI Model Registry.

Key Takeaways:

KFP components are containerized Python functions with explicit inputs/outputs
Components should be small, focused, and reusable
Strong typing enables lineage tracking and type safety
Comprehensive testing (unit + integration + E2E) prevents production issues
Champion/Challenger pattern ensures only better models reach production
Good logging makes debugging far easier, because each failed step explains itself in the pipeline UI

Next in Series: Designing Production ML Pipelines: Training Pipeline Deep Dive

GitHub Repository: production-ready-MLOps-on-GCP

Production-Ready MLOps on GCP Part 3: Infrastructure as Code for ML (Terraform + Vertex AI)

Saoussen CHAABNIA — Tue, 13 Jan 2026 10:12:25 GMT

Introduction

In the previous article, we explored the overall architecture of a production-ready MLOps system on GCP. Now comes the critical question: how do you actually provision all of this infrastructure reliably across dev, test, and production environments?

If you’ve ever manually clicked through the Google Cloud Console to set up Vertex AI pipelines, BigQuery datasets, service accounts, and IAM roles, you know the pain. It’s error-prone, hard to replicate, and impossible to version control. What worked in dev mysteriously breaks in prod. You forget a critical IAM permission. A teammate can’t reproduce your setup.

This is where Infrastructure as Code (IaC) transforms MLOps from fragile to rock-solid.

In this article, we’ll dive deep into:

Why Infrastructure as Code is non-negotiable for MLOps
How to structure Terraform modules for ML workloads
Setting up Vertex AI infrastructure across multiple environments
IAM best practices for secure, least-privilege ML pipelines
Managing Terraform state and deployment workflows

Let’s build infrastructure that’s as version-controlled and testable as your ML code.

Why Infrastructure as Code for MLOps?

Before we dive into code, let’s address the elephant in the room: why bother with IaC when you can just create resources in the Cloud Console?

The Manual Approach Doesn’t Scale

Imagine this scenario:

You manually set up Vertex AI in your dev project
Three months later, you need to replicate it in prod
You can’t remember all the steps
IAM roles are different between environments
The prod deployment fails mysteriously
You spend days debugging what should have been a 10-minute deployment

With Infrastructure as Code:

You define your infrastructure once in Terraform
You apply it to dev: terraform apply
You apply the same code to prod: terraform apply
Everything is identical and reproducible
Changes are tracked in Git with full audit history

Key Benefits for MLOps

1. Reproducibility: Every environment (dev/test/prod) uses the exact same code. No configuration drift.

2. Version Control: Infrastructure changes go through pull requests, just like application code. You can see who changed what and when.

3. Environment Parity: Test environments mirror production exactly, reducing “works on my machine” issues.

4. Disaster Recovery: If a project gets accidentally deleted or corrupted, you can recreate it in minutes from code.

5. Documentation: Your Terraform code is living documentation of your infrastructure.

6. Collaboration: Team members can review and understand infrastructure changes before they’re deployed.

7. Testing: Infrastructure changes can be previewed with terraform plan before applying.

Our Terraform Architecture

Our infrastructure follows a modular design with clear separation between reusable modules and environment-specific configurations.

Directory Structure

terraform/
├── environments/           # Environment-specific configurations
│   ├── dev/
│   │   ├── main.tf        # Dev environment setup
│   │   ├── variables.tf   # Dev-specific variables
│   │   ├── auto.tfvars    # Dev variable values
│   │   └── backend.tf     # State backend configuration
│   ├── test/
│   │   └── ...            # Same structure as dev
│   └── prod/
│       └── ...            # Same structure as dev
│
└── modules/               # Reusable Terraform modules
    ├── vertex_deployment/  # Core Vertex AI infrastructure
    │   ├── main.tf        # Resource definitions
    │   ├── variables.tf   # Module variables
    │   ├── iam.tf         # IAM roles and permissions
    │   ├── outputs.tf     # Exported values
    │   └── versions.tf    # Provider versions
    │
    └── cloudrunfunction/   # Cloud Run Function for triggers
        └── ...

Key Design Principles:

DRY (Don’t Repeat Yourself): Common infrastructure is defined once in modules
Environment Isolation: Each environment has its own state and configuration
Separation of Concerns: Modules handle specific capabilities (Vertex, Cloud Functions, etc.)
Consistent Interface: All environments use the same module interface

The vertex_deployment Module: Core Infrastructure

The vertex_deployment module is the heart of our MLOps infrastructure.

The following diagram shows all resources provisioned by our Terraform modules:

Let’s break down what it provisions.

1. Google Cloud APIs

First, we enable all required GCP services:

resource "google_project_service" "gcp_services" {
  for_each                   = toset(var.gcp_service_list)
  project                    = var.project_id
  service                    = each.key
  disable_on_destroy         = var.disable_services_on_destroy
  disable_dependent_services = var.disable_dependent_services
}

Services enabled (17 total):

aiplatform.googleapis.com - Vertex AI core
artifactregistry.googleapis.com - Docker images and pipelines
bigquery.googleapis.com - Data warehouse
cloudbuild.googleapis.com - CI/CD
cloudfunctions.googleapis.com - Event triggers
cloudscheduler.googleapis.com - Scheduled runs
pubsub.googleapis.com - Event messaging
iam.googleapis.com - Access control
And 9 more supporting services…

Why this matters: Forgetting to enable a single API can cause cryptic failures. By declaring all dependencies in code, we ensure consistent setup every time.

2. Service Accounts: Identity and Access

We create two dedicated service accounts with minimal permissions:

# Service account for Vertex AI Pipelines
resource "google_service_account" "pipelines_sa" {
  project      = var.project_id
  account_id   = "vertex-pipelines"
  display_name = "Vertex Pipelines Service Account"
  depends_on   = [google_project_service.gcp_services]
}

# Service account for Cloud Run Function (pipeline trigger)
resource "google_service_account" "vertex_cloudrunfunction_sa" {
  project      = var.project_id
  account_id   = "vertex-cloudrunfunction-sa"
  display_name = "Cloud Run Function Service Account"
  depends_on   = [google_project_service.gcp_services]
}

Security principle: Each component has its own identity with only the permissions it needs. If the Cloud Run Function is compromised, it can’t access resources meant only for pipelines.

3. Cloud Storage: Artifact Storage

We provision two GCS buckets with security best practices:

# Pipeline artifacts and outputs
resource "google_storage_bucket" "pipeline_root_bucket" {
  name                        = "${var.project_id}-pl-root"
  location                    = var.region
  project                     = var.project_id
  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"
  depends_on                  = [google_project_service.gcp_services]
}

# Cloud Run Function source code
resource "google_storage_bucket" "gcf_source_bucket" {
  name                        = "${var.project_id}-gcf-source"
  location                    = local.cloudrunfunction_region
  project                     = var.project_id
  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"
  depends_on                  = [google_project_service.gcp_services]
}

Security features:

uniform_bucket_level_access: Consistent permissions using IAM only (no legacy ACLs)
public_access_prevention: Blocks any attempt to make objects public
Region-specific: Data stays in your preferred location

4. Vertex AI Metadata Store

The metadata store provides lineage tracking for all ML artifacts:

resource "google_vertex_ai_metadata_store" "default_metadata_store" {
  provider    = google-beta
  name        = "default"
  description = "Default metadata store"
  project     = var.project_id
  region      = var.region
  depends_on  = [google_project_service.gcp_services]
}

This enables:

Lineage tracking: See which data produced which models
Experiment tracking: Compare training runs and hyperparameters
Reproducibility: Trace any prediction back to its training data
Compliance: Audit trails for regulatory requirements

5. Artifact Registry: Docker Images and Pipelines

We create two repositories with different formats:

# Docker container images (training containers)
resource "google_artifact_registry_repository" "mlops_docker_repo" {
  repository_id = "mlops-docker-repo"
  description   = "Container images for model training"
  project       = var.project_id
  location      = var.region
  format        = "DOCKER"
  depends_on    = [google_project_service.gcp_services]
}

# Kubeflow Pipeline definitions
resource "google_artifact_registry_repository" "mlops_pipeline_repo" {
  repository_id = "mlops-pipeline-repo"
  description   = "KFP repository for Vertex Pipelines"
  project       = var.project_id
  location      = var.region
  format        = "KFP"
  depends_on    = [google_project_service.gcp_services]
}

Why separate repositories?

Docker and KFP formats have different versioning and metadata needs
Separate permissions: training job builders need Docker access, pipeline deployers need KFP access
Cleaner organization and lifecycle management

6. Pub/Sub: Event-Driven Orchestration

For asynchronous pipeline notifications:

resource "google_pubsub_topic" "pipeline_completion" {
  name       = "pipeline-completion"
  project    = var.project_id
  depends_on = [google_project_service.gcp_services]
}

resource "google_pubsub_subscription" "pipeline_completion_subscription" {
  name    = "pipeline-completion-subscription"
  topic   = google_pubsub_topic.pipeline_completion.id
  project = var.project_id

  push_config {
    push_endpoint = module.cloudrunfunction.function_uri
  }
}

Event flow:

Pipeline completes (success or failure)
Vertex AI publishes to pipeline-completion topic
Pub/Sub pushes notification to Cloud Run Function
Function can trigger dependent pipelines (e.g., training completes → run prediction)
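
The last two steps of this flow can be sketched in Python. This is a minimal illustration, not the function shipped in the repository: the envelope layout is the standard Pub/Sub push format (base64 payload under message.data), while the pipeline_type and state fields and both function names are assumptions made for the example.

```python
import base64
import json
from typing import Optional


def decode_push_message(envelope: dict) -> dict:
    """Decode the JSON payload carried in a Pub/Sub push request body."""
    data = envelope["message"]["data"]  # base64-encoded by Pub/Sub
    return json.loads(base64.b64decode(data).decode("utf-8"))


def next_pipeline(payload: dict) -> Optional[str]:
    """Return the follow-up pipeline to run, or None if nothing should run.

    "pipeline_type" and "state" are hypothetical field names; use whatever
    your completion message actually contains.
    """
    if payload.get("pipeline_type") == "training" and payload.get("state") == "SUCCEEDED":
        return "prediction"
    return None


# Simulated push envelope for a successful training run.
body = json.dumps({"pipeline_type": "training", "state": "SUCCEEDED"}).encode("utf-8")
envelope = {"message": {"data": base64.b64encode(body).decode("ascii")}}
print(next_pipeline(decode_push_message(envelope)))  # prints "prediction"
```

Keeping the decision in a small pure function makes it easy to unit test without a Pub/Sub emulator.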

The cloudrunfunction Module: Event-Driven Triggers

While scheduled pipelines run periodically, the cloudrunfunction module enables event-driven execution triggered by new data in BigQuery.

Module Overview

# terraform/environments/prod/main.tf
module "cloudrunfunction" {
  source = "../../modules/cloudrunfunction"

  project_id          = var.project_id
  region              = var.region
  crf_service_account = module.vertex_deployment.cloudrunfunction_sa_email
  gcf_source_bucket   = module.vertex_deployment.gcf_source_bucket

  # Pipeline configuration (JSON-encoded)
  pipeline_config = {
    type                     = "training"
    display_name             = "event-driven-training"
    bq_location              = var.bq_location
    use_latest_data          = true
    timestamp                = ""
    training_template_path   = "https://${var.region}-kfp.pkg.dev/${var.project_id}/mlops-pipeline-repo/taxifare-training-pipeline/latest"
    prediction_template_path = "https://${var.region}-kfp.pkg.dev/${var.project_id}/mlops-pipeline-repo/taxifare-batch-prediction-pipeline/latest"
    pubsub_topic_name        = "training-pipeline-complete"
  }

  # BigQuery trigger: fires when new data inserted
  dataset_id = "chicago_taxi_trips"
  table_id   = "taxi_trips"
}

How It Works

BigQuery Audit Logs generate events when data is inserted
Cloud Run Function is triggered by the audit log event
Function reads PIPELINE_CONFIG from environment variables
Resolves template URI from Artifact Registry (tag → digest)
Submits pipeline job to Vertex AI
Listens on Pub/Sub for training completion
Triggers prediction pipeline automatically

Function code location: terraform/modules/cloudrunfunction/src/main.py

Key Configuration

Trigger Configuration:

event_type   = "google.cloud.audit.log.v1.written"
methodName   = "google.cloud.bigquery.v2.JobService.InsertJob"
resourceName = "projects/.../datasets/{dataset_id}/tables/{table_id}"
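
Inside the function, the same filter can be re-checked defensively before any pipeline is submitted. The sketch below assumes the event is a dict with a protoPayload section containing methodName and resourceName, as in the trigger settings above; the function name and the sample project ID are made up for illustration.

```python
INSERT_JOB_METHOD = "google.cloud.bigquery.v2.JobService.InsertJob"


def is_insert_into_table(event: dict, dataset_id: str, table_id: str) -> bool:
    """Return True if the audit-log event is an InsertJob on the watched table."""
    payload = event.get("protoPayload", {})
    method = payload.get("methodName", "")
    resource = payload.get("resourceName", "")
    expected_suffix = f"/datasets/{dataset_id}/tables/{table_id}"
    return method == INSERT_JOB_METHOD and resource.endswith(expected_suffix)


# Hypothetical event shaped like the trigger configuration.
sample_event = {
    "protoPayload": {
        "methodName": INSERT_JOB_METHOD,
        "resourceName": "projects/my-mlops-dev/datasets/chicago_taxi_trips/tables/taxi_trips",
    }
}
print(is_insert_into_table(sample_event, "chicago_taxi_trips", "taxi_trips"))  # prints True
```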

Environment Variables:

PIPELINE_CONFIG: JSON with pipeline template paths and parameters
Standard Vertex AI variables (project, location, service account)
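
As a rough sketch of how the function might read PIPELINE_CONFIG, the snippet below parses the JSON and fails fast when expected keys are absent. The key names mirror the pipeline_config block in the Terraform module; the set of required keys and the function name are assumptions, and the real main.py may validate differently.

```python
import json

# Keys this sketch treats as mandatory (an assumption, not the repository's rule).
REQUIRED_KEYS = {"type", "display_name", "training_template_path"}


def load_pipeline_config(environ: dict) -> dict:
    """Parse PIPELINE_CONFIG from an environment mapping and validate it."""
    raw = environ.get("PIPELINE_CONFIG")
    if raw is None:
        raise KeyError("PIPELINE_CONFIG is not set")
    config = json.loads(raw)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"PIPELINE_CONFIG is missing keys: {sorted(missing)}")
    return config


# Example with a fake environment; in the function you would pass os.environ.
fake_env = {
    "PIPELINE_CONFIG": json.dumps(
        {
            "type": "training",
            "display_name": "event-driven-training",
            "training_template_path": "https://example.invalid/template",
        }
    )
}
config = load_pipeline_config(fake_env)
print(config["display_name"])  # prints "event-driven-training"
```

Passing the environment as an argument keeps the loader testable without touching real environment variables.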

This provides an alternative to scheduled runs, enabling continuous training as new data arrives.

IAM: Security by Design

IAM configuration is where many MLOps projects go wrong. Too permissive, and you’ve created security holes. Too restrictive, and pipelines fail with cryptic permission errors.

Our IAM strategy follows least privilege: each service account gets only the permissions it needs, nothing more.

Vertex Pipelines Service Account Permissions

# Project-level roles for Vertex Pipelines SA
resource "google_project_iam_member" "pipelines_sa_project_roles" {
  for_each = toset(var.pipelines_sa_project_roles)
  project  = var.project_id
  role     = each.key
  member   = "serviceAccount:${google_service_account.pipelines_sa.email}"
}

# Default roles:
# - roles/aiplatform.user          # Submit Vertex AI jobs
# - roles/logging.logWriter        # Write logs
# - roles/bigquery.dataEditor      # Read/write BigQuery
# - roles/bigquery.jobUser         # Run BigQuery jobs
# - roles/artifactregistry.reader  # Pull Docker images

Bucket-specific permissions:

resource "google_storage_bucket_iam_member" "pipelines_sa_pipeline_root_bucket_iam" {
  for_each = toset([
    "roles/storage.objectAdmin",
    "roles/storage.legacyBucketReader",
  ])
  bucket = google_storage_bucket.pipeline_root_bucket.name
  member = "serviceAccount:${google_service_account.pipelines_sa.email}"
  role   = each.value
}

Why both roles?

objectAdmin: Create, read, update, delete objects in the bucket
legacyBucketReader: List bucket contents (required for Vertex AI)

Cloud Run Function Service Account Permissions

The function needs to trigger pipelines and access compiled pipeline definitions:

# Allow function SA to impersonate pipelines SA
resource "google_service_account_iam_member" "cloudrunfunction_sa_can_use_pipelines_sa" {
  service_account_id = google_service_account.pipelines_sa.name
  role               = "roles/iam.serviceAccountUser"
  member             = "serviceAccount:${google_service_account.vertex_cloudrunfunction_sa.email}"
}

# Access to Artifact Registry for compiled pipelines
resource "google_artifact_registry_repository_iam_member" "cloudrunfunction_sa_can_access_ar" {
  project    = google_artifact_registry_repository.mlops_pipeline_repo.project
  location   = google_artifact_registry_repository.mlops_pipeline_repo.location
  repository = google_artifact_registry_repository.mlops_pipeline_repo.name
  role       = "roles/artifactregistry.reader"
  member     = "serviceAccount:${google_service_account.vertex_cloudrunfunction_sa.email}"
}

Permission chain:

Cloud Run Function executes with vertex_cloudrunfunction_sa
To submit a pipeline, it needs to use pipelines_sa (service account impersonation)
It needs to read the compiled pipeline from Artifact Registry

This separation ensures the function can trigger pipelines but can’t directly access pipeline data or training artifacts.

Environment Configuration

Each environment (dev/test/prod) has its own configuration but uses the same module:

Dev Environment (terraform/environments/dev/main.tf)

terraform {
  required_version = ">= 1.9"

Environment-Specific Variables (auto.tfvars)

# Dev environment
project_id = "my-mlops-dev"
region     = "us-central1"
dataset_id = "chicago_taxi_dev"
table_id   = "taxi_trips"

# Test environment
project_id = "my-mlops-test"
region     = "us-central1"
dataset_id = "chicago_taxi_test"
table_id   = "taxi_trips"

# Prod environment
project_id = "my-mlops-prod"
region     = "us-central1"
dataset_id = "chicago_taxi_prod"
table_id   = "taxi_trips"

Same module, different values. This ensures environment parity while allowing environment-specific configuration.

Terraform State Management

Terraform state is critical — it’s the source of truth for what’s deployed. Losing state means losing track of your infrastructure.

Remote State in GCS

Each environment has its own state bucket:

# Create state bucket for dev environment
export DEV_PROJECT_ID=my-mlops-dev
export DEV_LOCATION=us-central1

gsutil mb -p $DEV_PROJECT_ID \
  -l $DEV_LOCATION \
  gs://${DEV_PROJECT_ID}-tfstate
# Enable versioning for state recovery
gsutil versioning set on gs://${DEV_PROJECT_ID}-tfstate

Backend Configuration

During Terraform initialization, specify the backend:

cd terraform/environments/dev

terraform init \
  -backend-config="bucket=${DEV_PROJECT_ID}-tfstate" \
  -backend-config="prefix=terraform/state"

Benefits:

Shared state: Team members see the same infrastructure state
Locking: Prevents concurrent modifications
Versioning: Recover from accidental deletions or bad changes
Encryption: State is encrypted at rest in GCS

Deployment Workflow

Here’s how infrastructure changes flow through environments:

1. Local Development

# Make infrastructure changes
cd terraform/modules/vertex_deployment
# Edit main.tf, iam.tf, etc.

# Validate syntax
terraform fmt -check
terraform validate

2. Pull Request

Open a PR with your changes. Cloud Build automatically runs:

# cloudbuild/terraform-plan.yaml
steps:
  - name: 'hashicorp/terraform'
    args:
      - init
      - -backend-config=bucket=${_TFSTATE_BUCKET}

  - name: 'hashicorp/terraform'
    args:
      - plan
      - -out=tfplan

Review the plan output:

Resources to be added (green +)
Resources to be modified (yellow ~)
Resources to be destroyed (red -)

This preview helps catch unintended changes before they reach production.
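
Plan review can also be partly automated. The sketch below assumes the saved plan has been exported with terraform show -json tfplan and counts the actions listed under resource_changes; the function name and the sample data are illustrative only. A replacement shows up as both a delete and a create, so it is counted under each.

```python
from collections import Counter


def summarize_plan(plan: dict) -> Counter:
    """Count create/update/delete actions in a Terraform JSON plan."""
    counts = Counter()
    for change in plan.get("resource_changes", []):
        for action in change["change"]["actions"]:
            if action != "no-op":  # unchanged resources are listed as "no-op"
                counts[action] += 1
    return counts


# Hand-written sample in the shape of `terraform show -json` output.
sample_plan = {
    "resource_changes": [
        {"address": "google_storage_bucket.pipeline_root_bucket",
         "change": {"actions": ["create"]}},
        {"address": "google_pubsub_topic.pipeline_completion",
         "change": {"actions": ["no-op"]}},
        {"address": "google_service_account.pipelines_sa",
         "change": {"actions": ["delete", "create"]}},
    ]
}
summary = summarize_plan(sample_plan)
if summary["delete"]:
    print(f"Plan destroys {summary['delete']} resource(s); review carefully.")
```

A CI step could use such a summary to require an extra approval whenever the delete count is non-zero.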

3. Merge to Main

When the PR merges, Cloud Build automatically applies changes:

# cloudbuild/terraform-apply.yaml
steps:
  - name: 'hashicorp/terraform'
    args:
      - init
      - -backend-config=bucket=${_TFSTATE_BUCKET}

  - name: 'hashicorp/terraform'
    args:
      - apply
      - -auto-approve

Deployment order:

Dev environment (lowest risk)
Test environment (validate before prod)
Prod environment (final deployment)

Separate Cloud Build triggers ensure controlled, sequential rollout.

Best Practices We Follow

1. Explicit Dependencies

Use depends_on when a resource relies on something Terraform cannot infer from attribute references, such as an API that must be enabled first:

resource "google_storage_bucket" "pipeline_root_bucket" {
  name       = "${var.project_id}-pl-root"
  # ... other config ...
  depends_on = [google_project_service.gcp_services]
}

This ensures APIs are enabled before creating resources that use them.

2. Parameterized Modules

Use variables for everything that might change:

variable "region" {
  description = "GCP region for resources"
  type        = string
}

variable "project_id" {
  description = "GCP project ID"
  type        = string
}

Never hardcode project IDs, regions, or environment-specific values in modules.

3. Resource Naming Conventions

Use consistent, predictable naming:

name          = "${var.project_id}-pl-root"  # GCS bucket
account_id    = "vertex-pipelines"           # Service account
repository_id = "mlops-docker-repo"          # Artifact Registry

This makes resources easy to identify and troubleshoot.

4. Security-First Configuration

Always use the most restrictive settings:

uniform_bucket_level_access = true        # No legacy ACLs
public_access_prevention    = "enforced"  # Never public

Loosen restrictions only when absolutely necessary with documented justification.

5. Module Outputs

Export values needed by other modules or external tools:

# outputs.tf
output "pipeline_root_bucket" {
  description = "GCS bucket for pipeline artifacts"
  value       = google_storage_bucket.pipeline_root_bucket.name
}

output "pipelines_sa_email" {
  description = "Email of Vertex Pipelines service account"
  value       = google_service_account.pipelines_sa.email
}

This enables:

Passing values between modules
Using Terraform outputs in CI/CD pipelines
Documentation of important resource identifiers

Common Pitfalls and How We Avoid Them

Pitfall 1: API Enablement Race Conditions

Problem: Creating resources before APIs are fully enabled causes failures.

Solution: Explicit depends_on for all resources:

resource "google_vertex_ai_metadata_store" "default_metadata_store" {
  # ... config ...
  depends_on = [google_project_service.gcp_services]
}

Pitfall 2: Service Account Permissions Missing

Problem: Pipelines fail with cryptic “Permission denied” errors.

Solution: Comprehensive IAM configuration with commented explanations:

# Grant BigQuery access for data preprocessing
"roles/bigquery.dataEditor",
# Enable job submission for BigQuery queries
"roles/bigquery.jobUser",

Pitfall 3: Bucket Permissions Too Broad

Problem: Using roles/storage.admin on buckets grants excessive permissions.

Solution: Minimal bucket-level permissions:

for_each = toset([
  "roles/storage.objectAdmin",        # Object operations only
  "roles/storage.legacyBucketReader", # List bucket contents
])

Pitfall 4: State File Conflicts

Problem: Multiple developers running terraform apply at the same time can corrupt state.

Solution: GCS backend with automatic locking:

terraform {
  backend "gcs" {
    bucket = "my-project-tfstate"
    # Locking is automatic with GCS backend
  }
}

Testing Infrastructure Code

Just like application code, infrastructure should be tested:

1. Terraform Validate

terraform validate

Checks syntax and internal consistency.

2. Terraform Plan

terraform plan -out=tfplan

Preview changes before applying. Review the plan in CI/CD.

3. tflint (Optional)

tflint --init
tflint

Catches common errors and enforces best practices.

4. terraform-docs (Documentation)

terraform-docs markdown table . > README.md

Generates documentation from your Terraform code.

Real-World Example: Adding a New Service

Let’s walk through adding Cloud Scheduler support to enable scheduled pipeline runs.

Step 1: Update Module Variables

# terraform/modules/vertex_deployment/variables.tf
variable "enable_scheduler" {
  description = "Enable Cloud Scheduler for periodic pipeline runs"
  type        = bool
  default     = false
}

variable "training_schedule" {
  description = "Cron expression for training pipeline schedule"
  type        = string
  default     = "0 2 * * 0"  # Weekly on Sunday at 2 AM
}

Step 2: Add Cloud Scheduler Resource

# terraform/modules/vertex_deployment/scheduler.tf
resource "google_cloud_scheduler_job" "training_pipeline_schedule" {
  count = var.enable_scheduler ? 1 : 0

  name     = "training-pipeline-schedule"
  schedule = var.training_schedule
  region   = var.region

  pubsub_target {
    topic_name = google_pubsub_topic.pipeline_trigger.id
    data       = base64encode(jsonencode({
      pipeline_type = "training"
    }))
  }
}

Step 3: Enable in Prod Only

# terraform/environments/prod/main.tf
module "vertex_deployment" {
  source     = "../../modules/vertex_deployment"
  project_id = var.project_id
  region     = var.region

  # Enable scheduled runs in prod only
  enable_scheduler  = true
  training_schedule = "0 2 * * 0"  # Weekly retraining
}

Step 4: Deploy

cd terraform/environments/prod
terraform plan  # Review changes
terraform apply # Deploy scheduler

Dev and test remain unchanged (scheduler disabled by default).

Conclusion

Infrastructure as Code transforms MLOps from fragile and manual to robust and automated. With Terraform:

Environments are identical and reproducible
Changes are version-controlled and reviewed
Deployments are automated and consistent
Security is built-in from day one
Team collaboration is streamlined

Our Terraform modules provide:

Complete Vertex AI infrastructure (Pipelines, Training, Registry)
Secure IAM with least-privilege service accounts
Event-driven orchestration with Pub/Sub
Artifact storage with GCS and Artifact Registry
Multi-environment deployment patterns

In the next article, we’ll build on this infrastructure to create reusable Kubeflow Pipeline components that execute our ML workflows.

Key Takeaways:

Use modules for reusable infrastructure patterns
Separate environments with distinct configurations
Store Terraform state remotely in GCS with versioning
Follow least-privilege IAM principles
Automate deployment with Cloud Build
Preview changes with terraform plan before applying

Next in Series: Building Reusable Kubeflow Pipeline Components

GitHub Repository: production-ready-MLOps-on-GCP

Production-Ready MLOps on GCP Part 2: Tools & Workflows for ML Teams

Saoussen CHAABNIA — Tue, 13 Jan 2026 10:12:18 GMT

Complete Series:

Introduction

We’re building a complete production-ready MLOps system:

But here’s the truth: the best-designed system in the world is useless if developers hate using it.

Developer experience (DX) is what determines whether your MLOps platform gets adopted or abandoned. A great DX means:

Fast iteration cycles
Easy to get started
Simple debugging
Clear documentation
Smooth collaboration

In this article, we’ll explore:

Makefile shortcuts for common tasks
Poetry for dependency management
Pre-commit hooks for code quality
Local testing and debugging workflows
IDE setup and productivity tips
Team collaboration patterns

By the end, you’ll understand how to create an MLOps platform that developers actually enjoy using.

The Developer’s Daily Workflow

Let’s follow a typical development day:

Morning: Pick Up a Task

# Pull latest changes
git pull origin main

# Create feature branch
git checkout -b feature/add-new-feature

# Activate environment
cd pipelines
poetry shell

Mid-Morning: Develop Locally

# Make changes to a component
vim components/src/components/my_component.py

# Run unit tests (fast feedback)
make test-components

# Compile pipeline to check syntax
make compile pipeline=training

Afternoon: Test in Cloud

# Build training container
make build

# Run full pipeline in dev environment
make training enable_caching=false

# Check Vertex AI UI for results

End of Day: Open PR

# Commit changes (pre-commit hooks run)
git add .
git commit -m "Add new feature for X"

# Push and create PR
git push origin feature/add-new-feature
gh pr create --title "Add new feature" --body "..."

Key observation: Notice the short feedback loops. Developers don’t wait hours for CI/CD — they get fast local feedback first.

Makefile: Developer-Friendly Automation

Typing long commands is tedious. Our Makefile provides shortcuts:

Common Commands

# Install dependencies
make install

# Run unit tests
make test-components
make test-pipelines

# Compile pipelines
make compile pipeline=training
make compile pipeline=prediction

# Build and push Docker image
make build
# Run pipelines
make training
make prediction
# Run E2E tests
make e2e-tests pipeline=training

Behind the Scenes

Let’s look at the make training command:

training: ## Run training pipeline
	@$(MAKE) run pipeline=training

run: ## Run a pipeline. Set pipeline=training or pipeline=prediction.
	@if [ "$(compile)" = "true" ]; then \
		$(MAKE) compile ; \
	fi && \
	if [ "$(build)" = "true" ]; then \
		$(MAKE) build ; \
	fi && \
	cd pipelines/src && \
	poetry run python -m pipelines.utils.trigger_pipeline \
	  --template_path=./taxifare-${pipeline}-pipeline.yaml \
	  --display_name=taxifare-${pipeline}-pipeline \
	  --enable_caching=${enable_caching} \
	  --use_latest_data=${use_latest_data}

Benefits:

Simple interface: make training instead of 5 commands
Configurable: make training build=false enable_caching=true
Self-documenting: make help shows all targets

Custom Targets

Add your own shortcuts:

# Quick iteration: compile + run (no build)
quick:
	@$(MAKE) training compile=true build=false enable_caching=true

# Full test: build + compile + run
full:
	@$(MAKE) training compile=true build=true enable_caching=false

Usage:

make quick  # Fast iteration
make full   # Complete test

Poetry: Dependency Management Done Right

Why Poetry? It improves on pip + requirements.txt in several ways:

Clean Dependency Declaration

# pyproject.toml
[tool.poetry.dependencies]
python = "^3.10"
google-cloud-aiplatform = "^1.55.0"
kfp = "^2.7.0"
pandas = "^2.0.0"

[tool.poetry.group.dev.dependencies]
pytest = "^7.4.0"
black = "^23.7.0"
flake8 = "^6.1.0"
pre-commit = "^3.3.0"

Benefits:

Lock file ensures reproducibility
Dependency groups (dev, test, prod)
Semantic versioning (^1.55.0 means >=1.55.0, <2.0.0)
Fast dependency resolution
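
To make the caret rule concrete, here is a small Python sketch that computes the range a caret constraint allows: the leftmost non-zero component may not change. It is an illustration of the rule for plain numeric versions, not Poetry's own parser, and the function name is made up.

```python
from typing import Tuple


def caret_range(spec: str) -> Tuple[str, str]:
    """Return (inclusive lower bound, exclusive upper bound) for a caret spec."""
    parts = [int(p) for p in spec.lstrip("^").split(".")]
    padded = parts + [0] * (3 - len(parts))
    # Bump the first non-zero component; if all given parts are zero, bump the last one given.
    bump = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = padded[:bump] + [padded[bump] + 1] + [0] * (2 - bump)
    lower_text = ".".join(str(p) for p in padded)
    upper_text = ".".join(str(p) for p in upper)
    return lower_text, upper_text


print(caret_range("^1.55.0"))  # prints ('1.55.0', '2.0.0')
print(caret_range("^0.2.3"))   # prints ('0.2.3', '0.3.0')
```

Note the second case: for 0.x versions the caret is stricter, because the minor version is treated as the breaking-change boundary.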

Common Poetry Commands

# Install all dependencies (including dev)
poetry install --with dev
# Install production dependencies only
poetry install --without dev
# Add a new dependency
poetry add google-cloud-bigquery
# Add a dev dependency
poetry add --group dev pytest-cov
# Update dependencies
poetry update
# Activate virtual environment
poetry shell
# Run command in environment
poetry run python -m pipelines.training

Virtual Environment Management

Poetry creates isolated environments:

# Where is the environment?
poetry env info
# Output:
# Virtualenv
# Python:         3.10.14
# Path:           /home/user/.cache/pypoetry/virtualenvs/pipelines-abc123
# Multiple Python versions
poetry env use python3.10
poetry env use python3.11

Pre-commit Hooks: Automatic Code Quality

Problem: Code style inconsistencies slow down code reviews.

Solution: Automate formatting and linting before commit.

Hook Configuration

# Note: pre-commit also requires a `rev:` on every repo entry,
# pinned to a release tag (omitted here for brevity).
repos:
  # Basic checks
  - repo: https://github.com/pre-commit/pre-commit-hooks
    hooks:
      - id: trailing-whitespace    # Remove trailing whitespace
      - id: end-of-file-fixer      # Ensure files end with newline
      - id: check-yaml             # Validate YAML files
      - id: check-added-large-files  # Prevent huge files
      - id: check-merge-conflict   # Catch merge markers

  # Code formatting
  - repo: https://github.com/psf/black
    hooks:
      - id: black
        args: [--line-length=100]

  # Linting
  - repo: https://github.com/pycqa/flake8
    hooks:
      - id: flake8
        args: [--max-line-length=100, "--ignore=E203,W503"]

  # Modern linting + auto-fix
  - repo: https://github.com/astral-sh/ruff-pre-commit
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format

Installation and Usage

# Install hooks (once per repository)
cd pipelines
poetry run pre-commit install

# Hooks run automatically on git commit
git add my_file.py
git commit -m "Add feature"

# Pre-commit runs:
# ✅ Removes trailing whitespace
# ✅ Formats code with black
# ✅ Checks with flake8
# ✅ Auto-fixes with ruff

# Commit proceeds only if all pass

Manual Execution

# Run all hooks on all files
pre-commit run --all-files
# Run specific hook
pre-commit run black --all-files
# Skip hooks (use sparingly!)
git commit --no-verify -m "Emergency fix"

What Gets Caught

Before black:

def my_function(x,y,z):
    result=x+y+z
    return result

After black:

def my_function(x, y, z):
    result = x + y + z
    return result

flake8 errors:

my_file.py:45:80: E501 line too long (101 > 100 characters)
my_file.py:67:1: F401 ‘os’ imported but unused

Local Testing and Debugging

Unit Tests with pytest

# Run all tests
cd components
poetry run pytest tests/
# Run specific test file
poetry run pytest tests/test_upload_best_model_op.py
# Run specific test
poetry run pytest tests/test_upload_best_model_op.py::test_champion_wins
# Show print statements
poetry run pytest -s tests/
# Stop at first failure
poetry run pytest -x tests/
# Show coverage
poetry run pytest --cov=components tests/

Test Structure

components/
├── src/
│   └── components/
│       └── upload_best_model_op.py
└── tests/
    ├── conftest.py                      # Shared fixtures
    ├── test_upload_best_model_op.py
    └── test_lookup_model_op.py

Example Test with Mocking

# tests/test_upload_best_model_op.py
from unittest.mock import Mock, patch

from components import upload_best_model_op


@patch("google.cloud.aiplatform.Model")
def test_first_model_upload(mock_model_class, tmp_path):
    """Test uploading first model (no champion exists)."""

    # Mock: No existing models
    mock_model_class.list.return_value = []

    # Mock: Upload returns model
    mock_uploaded = Mock()
    mock_uploaded.versioned_resource_name = "models/123/versions/1"
    mock_model_class.upload.return_value = mock_uploaded

    # Create test metrics
    metrics = {"problemType": "regression", "rmse": 2.5}

    # Call component (create_metrics_file is a test helper not shown here)
    upload_best_model_op.python_func(
        model_eval_metrics=create_metrics_file(tmp_path, metrics),
        eval_metric="rmse",
        eval_lower_is_better=True,
        model_name="test-model",
        # ... other params
    )

    # Verify upload was called with is_default_version=True
    mock_model_class.upload.assert_called_once()
    call_args = mock_model_class.upload.call_args
    assert call_args.kwargs["is_default_version"] is True

Key testing patterns:

Mock GCP APIs (no real API calls)
Use tmp_path fixture for file operations
Test edge cases (no champion, champion wins, challenger wins)
Verify function calls with assert_called_once_with
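
The edge cases listed above all revolve around one comparison. The sketch below isolates that comparison so it can be tested without any GCP mocks; the function name and the tie-breaking rule (a tie keeps the current champion) are assumptions, and the real component may decide differently.

```python
from typing import Optional


def challenger_wins(challenger: float, champion: Optional[float], lower_is_better: bool) -> bool:
    """Decide whether a newly trained model should replace the champion."""
    if champion is None:
        return True  # no champion yet: the first model is promoted
    if lower_is_better:
        return challenger < champion  # e.g. RMSE
    return challenger > champion  # e.g. accuracy


# The three edge cases, expressed as (challenger, champion, lower_is_better, expected).
cases = [
    (2.5, None, True, True),   # no champion exists
    (3.0, 2.5, True, False),   # champion wins on RMSE
    (2.1, 2.5, True, True),    # challenger wins on RMSE
]
for challenger, champion, lower_is_better, expected in cases:
    assert challenger_wins(challenger, champion, lower_is_better) is expected
```

In a pytest suite the same case table would fit naturally into pytest.mark.parametrize.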

Debugging Failed Pipelines

Scenario: Pipeline failed in Vertex AI. How to debug locally?

# 1. Find the failed step in Vertex AI UI
#    Example: “Upload model” component failed

# 2. Extract component function
cd components
poetry shell

python
>>> from components import upload_best_model_op
>>> func = upload_best_model_op.python_func

# 3. Call with test data
>>> func(
...     model=test_model,
...     model_eval_metrics=test_metrics,
...     eval_metric="rmse",
...     eval_lower_is_better=True,
...     model_name="test-model",
...     # ... other params
... )

# 4. Add print/logging for debugging
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> func(...)  # Run again with debug logging

Local Pipeline Compilation

Test pipeline compiles before pushing:

# Compile training pipeline
cd pipelines/src
poetry run python -m pipelines.training

# Output: taxifare-training-pipeline.yaml

# Inspect compiled YAML
head -50 taxifare-training-pipeline.yaml

Common compilation errors:

Error: Component ‘my_component_op’ not found
→ Check import in training.py

Error: Type mismatch: expected Dataset, got Model
→ Check component input/output types

Error: Missing required parameter ‘project’
→ Check pipeline function signature

IDE Setup and Productivity

VS Code Configuration

// .vscode/settings.json
{
  // Python interpreter
  "python.defaultInterpreterPath": "${workspaceFolder}/pipelines/.venv/bin/python",

  // Formatting
  "python.formatting.provider": "black",
  "editor.formatOnSave": true,

  // Linting
  "python.linting.enabled": true,
  "python.linting.flake8Enabled": true,
  "python.linting.pylintEnabled": false,

  // Type checking
  "python.analysis.typeCheckingMode": "basic",

  // Pytest
  "python.testing.pytestEnabled": true,
  "python.testing.unittestEnabled": false,

  // File associations
  "files.associations": {
    "*.yaml": "yaml",
    "*.tf": "terraform"
  }
}

Recommended VS Code Extensions

{
  "recommendations": [
    "ms-python.python",           // Python support
    "ms-python.vscode-pylance",   // Fast type checking
    "hashicorp.terraform",        // Terraform support
    "redhat.vscode-yaml",         // YAML support
    "eamodio.gitlens",            // Git superpowers
    "ms-azuretools.vscode-docker" // Docker support
  ]
}

Environment Management

Environment Variables

# env.sh (never commit!)
export VERTEX_PROJECT_ID="my-dev-project"
export VERTEX_LOCATION="us-central1"
export BQ_LOCATION="US"
export VERTEX_PIPELINE_ROOT="gs://my-dev-project-pl-root"
export VERTEX_SA_EMAIL="vertex-pipelines@my-dev-project.iam.gserviceaccount.com"
export IMAGE_NAME="training"
export IMAGE_TAG="latest"

# Load environment
source env.sh

# Verify
echo $VERTEX_PROJECT_ID

Example File

# env.sh.example (committed to Git)
export VERTEX_PROJECT_ID="your-project-id"
export VERTEX_LOCATION="us-central1"
export BQ_LOCATION="US"
export VERTEX_PIPELINE_ROOT="gs://your-bucket/pipeline-root"
export VERTEX_SA_EMAIL="vertex-pipelines@your-project.iam.gserviceaccount.com"
export IMAGE_NAME="training"
export IMAGE_TAG="latest"

New developers:

cp env.sh.example env.sh
vim env.sh  # Update with your values
source env.sh
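
A small pre-flight check can catch a forgotten `source env.sh` before a pipeline run fails halfway through. The sketch below is not taken from the repository: the helper name `missing_env_vars` is hypothetical, and the variable list simply mirrors env.sh.example above.

```python
import os

# Variables listed in env.sh.example; adjust if your copy differs.
REQUIRED_VARS = (
    "VERTEX_PROJECT_ID",
    "VERTEX_LOCATION",
    "BQ_LOCATION",
    "VERTEX_PIPELINE_ROOT",
    "VERTEX_SA_EMAIL",
    "IMAGE_NAME",
    "IMAGE_TAG",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment; an empty list means every variable is set.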

Git Workflow and Collaboration

Branch Naming Conventions

feature/add-hyperparameter-tuning
bugfix/fix-preprocessing-null-handling
refactor/simplify-upload-component
docs/update-readme

Commit message examples:

feat: add learning rate scheduling to training

Implements cosine annealing learning rate schedule.
Improves model convergence speed by 20%.

Closes #123

fix: handle null values in preprocessing query

Previously, rows with null trip_seconds caused
preprocessing to fail. Now uses COALESCE to replace
with median value.

Fixes #456

Commit Message Guidelines

Using Commitizen (Conventional Commits)

Commit format

<type>(optional scope): <description>

[optional body]

[optional footer]

Type Mapping (Change Type → Commitizen Prefix)

Feature

feat(auth): add OAuth2 login support

Bug fix

fix(api): handle null user response

Refactor

refactor(pipeline): simplify data preprocessing logic

Docs

docs(readme): add setup instructions for local dev
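
To make the format concrete, here is a minimal sketch of a header check. In practice Commitizen does this validation for you; the regular expression, the `ALLOWED_TYPES` tuple, and the function name `parse_commit_header` are illustrative assumptions that cover only the four types shown above.

```python
import re

# Types used in this article's examples; Conventional Commits allows others too.
ALLOWED_TYPES = ("feat", "fix", "refactor", "docs")

HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)"           # commit type, e.g. feat
    r"(?:\((?P<scope>[^()]+)\))?"  # optional scope in parentheses
    r": (?P<description>\S.*)$"    # colon, space, non-empty description
)


def parse_commit_header(header):
    """Return (type, scope, description), or None if the header is malformed."""
    match = HEADER_RE.match(header)
    if match is None or match.group("type") not in ALLOWED_TYPES:
        return None
    return match.group("type"), match.group("scope"), match.group("description")
```

A header such as `feat(auth): add OAuth2 login support` parses into its three parts, while a free-form message like `Fixed stuff` is rejected.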

Code Review Checklist

For reviewers:

[ ] Does code follow existing patterns?
[ ] Are tests added/updated?
[ ] Is documentation updated?
[ ] Do pipelines compile?
[ ] Are there any security issues?

For authors:

[ ] Run pre-commit hooks
[ ] Run unit tests locally
[ ] Compile pipelines
[ ] Update CHANGELOG if needed
[ ] Request specific reviewers

Productivity Tips

Shell Aliases

# ~/.bashrc or ~/.zshrc
# Note: gs shadows Ghostscript and ps shadows the system ps command;
# pick other names if you rely on those tools.
alias gs='git status'
alias gl='git log --oneline -10'
alias gp='git pull origin main'
alias v='source env.sh'

# Poetry shortcuts
alias pi='poetry install'
alias pr='poetry run'
alias ps='poetry shell'

# Make shortcuts
alias mt='make test-components && make test-pipelines'
alias mc='make compile pipeline=training && make compile pipeline=prediction'

Quick Iteration Workflow

# One-liner: test + compile + run
make test-components && make compile pipeline=training && make training build=false

Jupyter Notebooks for Exploration

# Install Jupyter
poetry add --group dev jupyter

# Launch
poetry run jupyter notebook

# Explore data
import pandas as pd
from google.cloud import bigquery

client = bigquery.Client(project="my-dev-project")
df = client.query("SELECT * FROM dataset.table LIMIT 100").to_dataframe()
df.head()

Onboarding New Team Members

Day 1 Checklist

Setup Checklist

[ ] Install prerequisites
  [ ] Python 3.10+
  [ ] Poetry
  [ ] Docker
  [ ] gcloud CLI
  [ ] Terraform
  [ ] Git

[ ] Clone repository

git clone https://github.com/Saoussen-CH/production-ready-MLOps-on-GCP
cd production-ready-MLOps-on-GCP

[ ] Configure environment

cp env.sh.example env.sh
# Edit env.sh with your values
source env.sh

[ ] Install dependencies

make install

[ ] Set up pre-commit hooks

cd pipelines
poetry run pre-commit install

[ ] Authenticate to GCP

gcloud auth login
gcloud auth application-default login

[ ] Run tests

make test-components
make test-pipelines

[ ] Compile pipelines

make compile pipeline=training
make compile pipeline=prediction

[ ] Run first pipeline

make training build=true enable_caching=false

First Contribution

# Pick a good first issue
# Look for issues labeled "good first issue" or "beginner-friendly"

# Example: Update documentation
git checkout -b docs/fix-typo-in-readme
vim README.md
git add README.md
git commit -m "docs: fix typo in README"
git push origin docs/fix-typo-in-readme
gh pr create

Documentation Best Practices

Code Documentation

from kfp.dsl import Input, Metrics, Model


def upload_best_model_op(
    model: Input[Model],
    model_eval_metrics: Input[Metrics],
    eval_metric: str,
    eval_lower_is_better: bool,
    model_name: str,
) -> None:
    """
    Upload model to registry only if it beats the champion.

    Implements the Champion/Challenger pattern: compares new model
    against current default model in registry. Uploads new model
    as default only if it has better performance on eval_metric.

    Args:
        model: Trained model to evaluate as challenger.
        model_eval_metrics: Evaluation metrics from test set.
        eval_metric: Metric name for comparison (e.g., "rmse").
        eval_lower_is_better: True for losses, False for scores.
        model_name: Display name in Model Registry.

    Returns:
        None. Uploads model to Vertex AI Model Registry.

    Example:
        >>> upload_best_model_op(
        ...     model=trained_model,
        ...     model_eval_metrics=metrics,
        ...     eval_metric="rmse",
        ...     eval_lower_is_better=True,
        ...     model_name="taxi-fare-model"
        ... )
    """
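
The comparison rule this docstring describes can be sketched in a few lines. This is an illustration rather than the repository's implementation: the function name `challenger_wins`, the strict inequality, and the "no champion yet" rule are all assumptions.

```python
def challenger_wins(challenger_value, champion_value, lower_is_better):
    """Return True if the challenger should replace the champion."""
    if champion_value is None:
        # Assumption: with no champion registered, the first model wins by default.
        return True
    if lower_is_better:
        # Losses such as RMSE: a smaller value is an improvement.
        return challenger_value < champion_value
    # Scores such as accuracy: a larger value is an improvement.
    return challenger_value > champion_value
```

Using a strict comparison means a tie keeps the existing champion, which avoids re-registering a model that is no better than the current default.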

README Structure

# Project Name

## Overview
[Brief description]

## Prerequisites
[Required tools and versions]

## Setup
[Step-by-step installation]

## Usage
[Common commands]

## Testing
[How to run tests]

## Contributing
[Contribution guidelines]

## Troubleshooting
[Common issues and solutions]

Conclusion

Developer experience is what separates a good MLOps platform from a great one. By focusing on:

Makefile shortcuts: Common tasks are one command away
Poetry: Reliable dependency management
Pre-commit hooks: Automatic code quality
Fast local testing: Iteration without waiting for CI
Clear documentation: Easy onboarding
Smooth Git workflow: Collaboration without friction

You create an environment where developers can focus on improving models, not fighting tools.

This series has taken you from architecture to implementation to deployment to developer experience. You now have a complete blueprint for building production-ready MLOps systems on GCP.

What’s next?

Star the GitHub repository
Try implementing it for your use case
Share feedback and improvements
Help others by answering questions

Thank you for following this series. Now go build amazing ML systems!

Key Takeaways:

Makefile provides developer-friendly shortcuts
Poetry manages dependencies reliably
Pre-commit hooks enforce code quality automatically
Fast local feedback loops increase productivity
Good documentation lowers onboarding time
Developer experience determines platform adoption

GitHub Repository: production-ready-MLOps-on-GCP

Thanks for reading! If this was helpful, hit the ❤️, drop a comment, ⭐ the GitHub repo, and subscribe so you don’t miss the next one. Let’s connect on LinkedIn!

Production-Ready MLOps on GCP Part 1: Architecture Overview

Saoussen CHAABNIA — Tue, 13 Jan 2026 10:12:10 GMT

Introduction

If you’ve ever tried to move a machine learning model from your Jupyter notebook to production, you know the struggle. The model works beautifully on your laptop, but suddenly you’re drowning in questions: How do I retrain it automatically? How do I version my models? How do I deploy to multiple environments? How do I monitor model performance over time?

Welcome to the world of MLOps — where the real challenge isn’t building models, it’s building systems that can train, deploy, and maintain models reliably at scale.

In this series, I’ll walk you through a complete production-ready MLOps implementation on Google Cloud Platform. We’ll use a real-world use case (predicting Chicago taxi fares) to demonstrate how to build an ML system that’s actually ready for production, not just a proof-of-concept.

By the end of this series, you’ll understand how to:

Structure multi-environment ML infrastructure (dev/test/prod)
Build reusable, testable ML pipeline components
Automate CI/CD for machine learning workflows
Implement model versioning, evaluation, and monitoring
Design event-driven continuous training systems

Let’s start with the big picture.

The Challenge: Why Most ML Projects Fail in Production

Widely cited industry estimates suggest that 85–90% of ML projects never make it to production, and many of those that do fail within the first year. Why?

The gap between a working ML model and a production ML system is enormous:

Automation: Models need to retrain automatically when new data arrives
Reproducibility: You need to recreate any model version from the past
Testing: Both code and data need comprehensive validation
Monitoring: Model performance degrades over time and needs tracking
Multi-environment deployment: Changes must flow through dev → test → prod
Compliance: You need audit trails, lineage tracking, and governance

This is where MLOps comes in, applying DevOps principles to machine learning workflows.

Our Solution: A Complete MLOps Architecture on GCP

Our reference implementation addresses these challenges with a comprehensive architecture built on Google Cloud Platform. Let’s break down the key components.

The Use Case: Chicago Taxi Fare Prediction

To keep this practical, we’re solving a real problem: predicting taxi fares for Chicago taxi trips. This use case demonstrates common ML patterns:

Tabular data from BigQuery (public Chicago taxi dataset)
Feature engineering with both numeric and categorical variables
Batch predictions for generating forecasts at scale
Model retraining when new data becomes available
Champion/Challenger comparison for model evaluation

The model predicts fare amounts based on:

Trip characteristics (distance, duration, time of day)
Temporal features (day of week, hour of day)
Categorical features (payment type, company)

High-Level Architecture

Our architecture follows a multi-project strategy with four distinct GCP projects:

┌─────────────────────────────────────────────────────────────┐
│  Admin Project                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Cloud Build CI/CD Pipelines                         │   │
│  │  - PR Checks        - Terraform Plan/Apply           │   │
│  │  - E2E Tests        - Release Management             │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                            │
                            ├──────────────┬──────────────┐
                            ▼              ▼              ▼
              ┌──────────────────┐  ┌──────────────┐  ┌──────────────┐
              │  Dev Project     │  │ Test Project │  │ Prod Project │
              ├──────────────────┤  ├──────────────┤  ├──────────────┤
              │ • Vertex AI      │  │ • Vertex AI  │  │ • Vertex AI  │
              │ • BigQuery       │  │ • BigQuery   │  │ • BigQuery   │
              │ • GCS            │  │ • GCS        │  │ • GCS        │
              │ • Artifact Reg.  │  │ • Artifact   │  │ • Artifact   │
              │ • Model Registry │  │ • Model Reg. │  │ • Model Reg. │
              └──────────────────┘  └──────────────┘  │ • Cloud Run  │
                                                      │ • Pub/Sub    │
                                                      └──────────────┘

Why four projects?

Dev Project: Shared sandbox for experimentation and development
Test Project: Mirrors production for validation before release
Prod Project: Production environment with strict controls
Admin Project: Centralized CI/CD that deploys to all environments

This separation provides:

Isolation: Changes in dev don’t affect production
Security: Different IAM policies per environment
Cost tracking: Separate billing for each environment
Compliance: Clear audit trails and change controls

The GCP Services Stack

Our solution leverages these Google Cloud services:

ML Platform (Vertex AI)

Vertex AI Pipelines: Orchestrates ML workflows using Kubeflow Pipelines (KFP)
Vertex AI Training: Runs custom training jobs with hyperparameter tuning
Vertex AI Model Registry: Versions and manages models with lineage
Vertex AI Batch Prediction: Executes large-scale inference
Vertex ML Metadata: Tracks artifacts, lineage, and experiments

Data & Storage

BigQuery: Data warehouse for preprocessing and feature engineering
Cloud Storage: Stores artifacts, datasets, and pipeline outputs
Artifact Registry: Hosts Docker images and compiled pipelines

Automation & Orchestration

Cloud Build: CI/CD pipelines for testing and deployment
Cloud Run Functions: Event-driven pipeline triggers
Cloud Pub/Sub: Asynchronous messaging for pipeline events
Vertex AI Pipeline Schedules: Periodic pipeline execution

Infrastructure & Security

Terraform: Infrastructure as Code for reproducible deployments
IAM & Service Accounts: Fine-grained access control
Cloud Monitoring & Logging: Observability and debugging

The Two Core ML Pipelines

Our system implements two main pipelines orchestrated with Kubeflow Pipelines (KFP):

1. Training Pipeline

The training pipeline executes these steps:

Data Preprocessing (BigQuery)
         ↓
   Data Splitting (80/10/10)
         ↓
  Export to GCS (CSV)
         ↓
Hyperparameter Tuning (6 trials)
         ↓
  Model Training (Custom TF Container)
         ↓
Model Evaluation (Test Set)
         ↓
Champion/Challenger Comparison
         ↓
Upload to Model Registry

Key features:

Repeatable data splits: Same random seed ensures reproducibility
BigQuery-native preprocessing: SQL-based feature engineering
Custom TensorFlow container: Full control over training logic
Automatic hyperparameter tuning: Vertex AI optimizes learning rate and batch size
Champion/Challenger pattern: New models must beat existing champion on RMSE
Model versioning: All models tagged and stored in registry
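
To illustrate what a repeatable 80/10/10 split means, here is a minimal Python sketch. The pipeline itself performs the split in BigQuery, so this is not the repository's code; the function name `split_ids` and the default seed of 42 are assumptions made for the example.

```python
import random


def split_ids(ids, seed=42, train_pct=80, validation_pct=10):
    """Shuffle ids with a fixed seed and cut them into train/validation/test."""
    shuffled = sorted(ids)                  # start from a canonical order
    random.Random(seed).shuffle(shuffled)   # same seed, same permutation
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * validation_pct // 100
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )
```

Sorting before shuffling matters: it makes the result independent of the order in which rows happen to arrive, so two runs with the same seed always produce the same three sets.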

2. Prediction Pipeline

The prediction pipeline handles batch inference:

Lookup Champion Model (Registry)
↓
Data Preprocessing (BigQuery)
↓
Batch Prediction (BQ → BQ)
↓
Model Monitoring (Skew Detection)
↓
Alert on Issues

Key features:

Consistent preprocessing: Uses same SQL logic as training
Scalable inference: BigQuery batch predictions for millions of rows
Training-serving skew detection: Compares prediction inputs against the training data distribution to catch data drift
Automated alerts: Email notifications when skew is detected
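
As a rough illustration of what skew detection measures, the sketch below computes the L-infinity distance between two categorical distributions, a metric that Vertex AI Model Monitoring documents for categorical features. It is a hand-rolled example and not the service's implementation; the payment-type shares and the 0.1 threshold are made-up values.

```python
def linf_distance(baseline, current):
    """Largest absolute difference in category share between two distributions."""
    categories = set(baseline) | set(current)
    return max(
        (abs(baseline.get(c, 0.0) - current.get(c, 0.0)) for c in categories),
        default=0.0,
    )


def has_skew(baseline, current, threshold=0.1):
    """Flag skew when the distance exceeds the threshold (0.1 is an assumed value)."""
    return linf_distance(baseline, current) > threshold


# Illustrative shares of each payment type at training time and at serving time.
training = {"credit": 0.60, "cash": 0.40}
serving = {"credit": 0.75, "cash": 0.25}
```

With these numbers the credit share moved by 15 percentage points, so `has_skew(training, serving)` returns True and an alert would fire.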

Reusable Component Library

A critical aspect of our architecture is reusability. We’ve built 8 custom Kubeflow components that can be mixed and matched:

extract_table_to_gcs_op: Export BigQuery → Cloud Storage
get_training_args_dict_op: Build training configuration
get_workerpool_spec_op: Configure distributed training
get_hyperparameter_tuning_results_op: Parse tuning results
get_custom_job_results_op: Extract training metrics
lookup_model_op: Find models in registry by criteria
upload_best_model_op: Champion/Challenger comparison
model_batch_predict_op: Execute predictions with monitoring

These components are the building blocks that make our pipelines composable and maintainable.

The Development Workflow

Here’s how a typical development cycle works:

1. Local Development

# Developer makes changes locally
git checkout -b feature/improve-model

# Run pre-commit hooks (linting, formatting)
poetry run pre-commit run --all-files

# Run unit tests
make test

# Compile pipeline locally
make run pipeline=training compile=true build=false

# Test in dev environment
make training enable_caching=false

2. Pull Request & CI

When you open a PR, Cloud Build automatically:

Runs pre-commit hooks (flake8, black, ruff)
Executes unit tests on components and pipelines
Compiles pipelines to verify syntax
Runs Terraform plan to preview infrastructure changes
(Optionally) Runs E2E tests with /gcbrun comment

3. Merge & Deployment

On merge to main:

Terraform Apply deploys infrastructure changes
Code is ready for release

4. Release

Creating a git tag triggers:

Docker image builds for all environments
Pipeline compilation and versioning
Upload to Artifact Registry with semantic version tags
Ready for scheduling in test/prod

5. Production Execution

In production:

Scheduled: Vertex AI Pipeline Schedules trigger pipelines periodically
Event-driven: Cloud Run Function triggers on new data arrival
Manual: Direct pipeline submission for ad-hoc runs

Key Design Principles

Our architecture follows several important principles:

1. Everything as Code

Infrastructure: Terraform modules
Pipelines: Python with KFP SDK
Training logic: Containerized Python
Configuration: Version-controlled YAML

2. Environment Parity

Test and production environments are identical, ensuring:

What works in test will work in prod
No surprises during deployment
Reduced debugging time

3. Immutable Artifacts

Once built, artifacts never change:

Docker images tagged with git SHA and version
Compiled pipelines versioned in Artifact Registry
Models versioned in Model Registry
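
As a small illustration of tagging an image with both the release version and the git SHA, the sketch below builds the two Artifact Registry URIs. The function name `image_uris`, the repository name used in the usage note, and the choice of a 7-character short SHA are assumptions for the example.

```python
def image_uris(location, project, repository, image, version, git_sha):
    """Build one Artifact Registry URI per immutable tag (version and short SHA)."""
    base = f"{location}-docker.pkg.dev/{project}/{repository}/{image}"
    return [f"{base}:{version}", f"{base}:{git_sha[:7]}"]
```

For example, `image_uris("us-central1", "my-dev-project", "vertex-images", "training", "v1.2.0", "0123456789abcdef")` yields one URI ending in `:v1.2.0` and one ending in `:0123456`, both pointing at the same build.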

4. Automated Testing

Multiple testing layers:

Unit tests: Component logic validation
Integration tests: Pipeline compilation
E2E tests: Full pipeline execution in dev
Infrastructure tests: Terraform validation

5. Security by Design

Least privilege IAM (separate service accounts per pipeline)
No public bucket access
Secrets managed by GCP Secret Manager
Audit logging enabled

6. Observability First

Cloud Logging for all pipeline steps
Vertex ML Metadata for lineage tracking
Model monitoring for performance degradation
Alerting on failures and anomalies

Getting Started

The complete code for this reference implementation is available on GitHub: production-ready-MLOps-on-GCP

To follow along, you’ll need:

GCP account with billing enabled
Four GCP projects (or start with one for learning)
Basic knowledge of Python, Terraform, and ML concepts
Familiarity with Docker and CI/CD concepts

Conclusion

Building production-ready ML systems is complex, but it doesn’t have to be mysterious. By following proven patterns and leveraging the right GCP services, you can create ML systems that are:

Reliable: Automated testing and validation
Scalable: Leveraging managed GCP services
Maintainable: Modular, reusable components
Auditable: Complete lineage and versioning
Secure: Proper IAM and access controls

In the next article, we’ll dive into the infrastructure layer, exploring how Terraform modules provision and manage our Vertex AI environment across dev, test, and prod projects.

Next in Series: Infrastructure as Code for ML: Terraform + Vertex AI

GitHub Repository: production-ready-MLOps-on-GCP

Thanks for reading! If this was helpful, hit the ❤️, drop a comment, ⭐ the GitHub repo, and subscribe so you don’t miss the next one. Let’s connect on LinkedIn!

Building Distributed Multi-Agent Systems with Google’s AI Stack: Part 5

Saoussen CHAABNIA — Tue, 13 Jan 2026 09:44:56 GMT

Building Distributed Multi-Agent Systems with Google’s AI Stack series:

Part 1: From Monolithic AI to Distributed Intelligence: Building Your First Multi-Agent System
Part 2: Making Agents Talk: Agent-to-Agent (A2A) Protocol Deep Dive
Part 3: Building the Orchestrator: Coordinating Agents with the AgentTool Pattern
Part 4: Scaling Multi-Agent Workflows: Solving the Token Limit Problem
Part 5: External Tool Integration via Model Context Protocol (MCP) ← You are here
Part 6: Deploying to Cloud: Cloud Run and Vertex AI Agent Engine

Welcome Back!

In Part 4, we solved the token limit problem with context compaction. Our multi-agent system now handles complex workflows beautifully.

But there’s one more capability we need: connecting to external services.

Our Project Manager agent needs to:

Create tasks in Notion
Link tasks to projects
Work with any Notion database structure
Support multilingual property names

Enter Model Context Protocol (MCP) — a standardized way to connect LLMs to external tools.

In this article, we’ll:

Understand what MCP is and why it matters
Integrate the official Notion MCP server
Implement dynamic schema discovery
Deploy MCP-enabled agents to Cloud Run

Let’s connect our agents to the real world!

What is Model Context Protocol (MCP)?

MCP is a standardized protocol for connecting LLMs to external tools and data sources, created by Anthropic.

Why MCP?

Without MCP (Traditional approach):

# Custom integration for each service
def create_notion_task(title, status, due_date):
    # Custom API client
    # Custom request formatting
    # Custom error handling
    # Custom response parsing
    ...
def create_slack_message(channel, text):
    # Different custom implementation
    ...
def query_database(query):
    # Yet another custom implementation
    ...

With MCP:

# Single standard interface for all tools
mcp_toolset = McpToolset(connection_params=...)
# Agent automatically discovers and uses tools
agent = Agent(
    name="project_manager",
    tools=[mcp_toolset]  # All tools available!
)

MCP Benefits

Standardized: One protocol for all external tools
Discoverable: Tools describe themselves
Composable: Mix and match tool servers
Secure: Controlled access and permissions
Community-driven: Growing ecosystem of MCP servers

MCP Architecture

┌─────────────────────────────────┐
│     Agent (LLM)                 │
│                                 │
│  “I need to create a task       │
│   in Notion...”                 │
└──────────────┬──────────────────┘
               │
               ↓ Tool Discovery
┌──────────────────────────────────┐
│   MCP Toolset (ADK Integration)  │
│                                  │
│   - Discovers available tools    │
│   - Formats requests             │
│   - Handles responses            │
└──────────────┬───────────────────┘
               │
               ↓ Stdio/HTTP
┌──────────────────────────────────┐
│   MCP Server                     │
│   (@notionhq/notion-mcp-server)  │
│                                  │
│   - Exposes Notion API as tools  │
│   - Handles authentication       │
│   - Provides tool descriptions   │
└──────────────┬───────────────────┘
               │
               ↓ HTTPS
┌──────────────────────────────────┐
│   Notion API                     │
│                                  │
│   - Actual database operations   │
└──────────────────────────────────┘
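
Under the hood, the arrows between the toolset and the MCP server carry JSON-RPC 2.0 messages. The sketch below builds the two requests that matter most here, `tools/list` for discovery and `tools/call` for execution, and serializes them as newline-delimited JSON for the stdio transport. The ADK toolset does all of this for you (including the initial `initialize` handshake, omitted here); the helper names and the `abc123` database ID are placeholders.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per connection


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request dict with a fresh id."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def to_stdio_line(message):
    """Serialize one message as a single newline-terminated line for stdio."""
    return json.dumps(message) + "\n"


# Discovery: ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Execution: call one tool by name with its arguments (placeholder database id).
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "API-retrieve-a-database", "arguments": {"database_id": "abc123"}},
)
```

Seeing the wire format makes the "discoverable" claim concrete: the agent never needs Notion-specific client code, only the tool names and argument schemas the server returns.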

Setting Up Notion for MCP

Step 1: Create Notion Integration

Go to notion.so/my-integrations
Click “New integration”
Name it “AI Creative Studio”
Select your workspace
Click “Submit”
Copy the Internal Integration Token (starts with secret_, or ntn_ for newer integrations)

Step 2: Create Two Notion Databases

We need TWO databases: Projects and Tasks

Projects Database:

Properties:
- Project name (Title) ← required
- Status (Status: Not started, In progress, Completed)
- Priority (Select: High, Medium, Low)
- Dates (Date with start and end)
- Summary (Rich text)

Tasks Database:

Properties:
- Task name (Title) ← required
- Status (Status: Not started, In progress, Done)
- Priority (Select: High, Medium, Low)
- Due (Date)
- Project (Relation → Projects database)

Step 3: Share Databases with Integration

Open each database
Click “…” menu → “Add connections”
Select your “AI Creative Studio” integration
Repeat for both databases

Step 4: Get Database IDs

Projects Database:

URL: https://www.notion.so/workspace/abc123...
                                     ^^^^^^^^
                                     This is the database ID

Tasks Database:

URL: https://www.notion.so/workspace/def456...
                                     ^^^^^^^^
                                     This is the database ID
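
If you would rather not copy the ID by hand, a short helper can pull it out of the URL. This is a sketch that assumes the ID is the 32-character hexadecimal string at the end of the URL path, as in the examples above; the function name `extract_database_id` is hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumption: the database ID is the trailing 32 hex characters of the path.
_ID_RE = re.compile(r"([0-9a-f]{32})$")


def extract_database_id(url):
    """Return the 32-character hex ID from a Notion database URL, or None."""
    path = urlparse(url).path.rstrip("/").lower()  # query string is dropped
    match = _ID_RE.search(path)
    return match.group(1) if match else None
```

Parsing the URL first means a trailing `?v=...` view parameter is ignored, so only the database ID in the path is returned.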

Step 5: Configure Environment Variables

# .env
NOTION_API_KEY=secret_abc123...
NOTION_DATABASE_ID=abc123...  # Projects database
TASKS_DATABASE_ID=def456...   # Tasks database

Installing Notion MCP Server

The Project Manager needs Node.js to run the Notion MCP server:

Local Development

# Install Node.js (if not already installed)
# macOS:
brew install node
# Ubuntu/Debian:
sudo apt install nodejs npm
# Verify
node --version  # Should be 18+
npm --version

Cloud Run (Dockerfile)

FROM python:3.12-slim
WORKDIR /app
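# Install Node.js for MCP server
RUN apt-get update && apt-get install -y \
    nodejs \
    npm \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Install the Notion MCP server globally, pinned to a known version
RUN npm install -g @notionhq/notion-mcp-server@1.9.1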
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy agent code
COPY agent.py .
# ... rest of Dockerfile

Integrating MCP with Project Manager Agent

Step 1: Import MCP Tools

# agents/project_manager/agent.py
import os
import logging
from google.adk.agents import Agent
from google.adk.tools.mcp_tool import McpToolset, StdioConnectionParams
from mcp import StdioServerParameters
from dotenv import load_dotenv
load_dotenv()
logger = logging.getLogger("ai_creative_studio.project_manager")

Step 2: Configure Notion MCP Server

def create_project_manager():
    """Create Project Manager agent with Notion MCP integration"""
    # Get configuration
    notion_api_key = os.getenv("NOTION_API_KEY")
    projects_db_id = os.getenv("NOTION_DATABASE_ID")
    tasks_db_id = os.getenv("TASKS_DATABASE_ID", "2ceb1b31123181508894ddb3c597dc48")
    if not notion_api_key or not projects_db_id:
        logger.warning("⚠️  Notion credentials not set - agent will work without Notion integration")
        notion_toolset = None
    else:
        logger.info("✅ Configuring Notion MCP integration")
        # IMPORTANT: Notion MCP server expects NOTION_TOKEN, not NOTION_API_KEY
        mcp_env = {
            "NOTION_TOKEN": notion_api_key,  # ← Note: NOTION_TOKEN
            "PATH": os.environ.get("PATH", "/usr/local/bin:/usr/bin:/bin")
        }
        # Configure Notion MCP server using globally installed version
        server_params = StdioServerParameters(
            command="notion-mcp-server",  # Use globally installed version
            args=[],
            env=mcp_env
        )
        # Create MCP toolset
        notion_toolset = McpToolset(
            connection_params=StdioConnectionParams(
                server_params=server_params,
                timeout=30.0  # 30 second timeout for MCP server startup
            )
        )
        logger.info("✅ Notion MCP toolset configured")
    # Create agent with MCP tools
    agent = Agent(
        name="project_manager",
        model="gemini-2.5-flash",
        instruction=get_system_instruction(projects_db_id, tasks_db_id),
        description="Project manager for creating timelines, tasks, and organizing deliverables",
        tools=[notion_toolset] if notion_toolset else []
    )
    logger.info("✅ Project Manager agent created")
    return agent

root_agent = create_project_manager()

Key points:

Uses globally installed @notionhq/notion-mcp-server (pinned to v1.9.1 in Dockerfile)
Passes NOTION_TOKEN (not NOTION_API_KEY) to MCP server
Stdio transport (communication via stdin/stdout)
30-second timeout for server startup

Note: We use the globally installed version instead of npx -y to control the exact MCP server version (see Version Pinning Considerations section below).

Dynamic Schema Discovery

Here’s the problem: Hardcoded property names break easily.

The Hardcoded Approach (Fragile)

INSTRUCTION = """
Create a page in Notion:
properties = {
    "Name": {"title": [{"text": {"content": "Project X"}}]},
    "Status": {"status": {"name": "In progress"}},
    "Priority": {"select": {"name": "High"}}
}
"""

Problems:

Breaks if property names change
Doesn’t work with multilingual databases (“Nom”, “Statut”, “Priorité”)
Requires code changes for different databases
No flexibility

Dynamic Schema Discovery (Robust)

Instead, we discover the schema at runtime:

def get_system_instruction(projects_db_id: str, tasks_db_id: str) -> str:
    return f"""You are a Project Manager with Notion MCP integration.
**CRITICAL: Dynamic Schema Discovery**
Before creating any pages, you MUST discover the actual database schema.
**Step 1: Discover Projects Database Schema**
Use: API-retrieve-a-database
Database ID: {projects_db_id}
This returns:
- Actual property names (might be "Project name", "Nom du projet", etc.)
- Property types (title, status, select, date, etc.)
- Available options for status and select properties
- Relation configurations
**Step 2: Adapt to Actual Schema**
DO NOT assume property names! Use the discovered schema:
Example response:
{{
    "properties": {{
        "Project name": {{"type": "title"}},  ← Could be different!
        "État": {{"type": "status"}},         ← French!
        "Priorité": {{"type": "select"}},     ← French!
        "Dates": {{"type": "date"}}
    }}
}}
Create pages using the ACTUAL property names from the schema.
**Step 3: Create Project Page**
Use: API-post-page
Database ID: {projects_db_id}
Properties: [Use discovered names]
**Step 4: Extract Project ID**
From the response, extract the page ID:
{{
    "id": "abc-123-def-456",  ← Save this!
    ...
}}
**Step 5: Discover Tasks Database Schema**
Use: API-retrieve-a-database
Database ID: {tasks_db_id}
**Step 6: Create Task Pages**
Use: API-post-page (multiple times)
Database ID: {tasks_db_id}
Properties: [Use discovered names from tasks schema]
Link to project using the relation property:
{{
    "[Relation Property Name]": {{
        "relation": [{{"id": "abc-123-def-456"}}]  ← Project ID from step 4
    }}
}}
**Example Workflow:**
1. Discover Projects DB → Get actual property names
2. Create project page → Get project ID
3. Discover Tasks DB → Get actual property names
4. Create task 1 → Link to project ID
5. Create task 2 → Link to project ID
... (5-10 tasks total)
**IMPORTANT RULES:**
- NEVER hardcode property names like "Name", "Status", "Priority"
- ALWAYS use API-retrieve-a-database first
- ALWAYS adapt to the actual schema
- Property names can be in any language
- Relation properties link databases together
**Your Primary Output:**
Create a text-based project timeline with:
- Milestones
- Tasks and deadlines
- Team responsibilities
- Budget breakdown
THEN (if Notion credentials available):
- Create project and tasks in Notion
- Provide links to created pages
"""

How It Works in Practice

Agent: “I need to create a project in Notion”
    ↓
Step 1: Call API-retrieve-a-database (Projects DB)
    ↓
Response: {
    “properties”: {
        “Nom du projet”: {”type”: “title”},     ← French!
        “Statut”: {”type”: “status”},
        “Priorité”: {”select”: {
            “options”: [
                {”name”: “Haute”},
                {”name”: “Moyenne”},
                {”name”: “Basse”}
            ]
        }}
    }
}
    ↓
Step 2: Agent adapts - uses “Nom du projet”, “Statut”, “Priorité”
    ↓
Step 3: Create page with ACTUAL property names
    ↓
✅ Works with any database structure!

Benefits:

Language-agnostic: Works with French, Spanish, Japanese databases
Flexible: No hardcoded property names
Resilient: Adapts to schema changes
Portable: Same code works with different Notion workspaces
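
In the article's setup the LLM performs this adaptation from the prompt, but the same idea can be expressed deterministically. The sketch below looks up property names by their type in a retrieved schema and builds a page payload from them. The helper names are hypothetical, and the schema is the simplified shape shown in the example responses above; a real Notion response carries more fields.

```python
def property_names_by_type(schema):
    """Map each property type to the property names that use it."""
    by_type = {}
    for name, definition in schema.get("properties", {}).items():
        by_type.setdefault(definition["type"], []).append(name)
    return by_type


def build_title_property(schema, title):
    """Build a page 'properties' payload using the discovered title property."""
    title_names = property_names_by_type(schema).get("title", [])
    if not title_names:
        raise ValueError("schema has no title property")
    return {title_names[0]: {"title": [{"text": {"content": title}}]}}


# Simplified schema, as a French-language workspace might return it.
french_schema = {
    "properties": {
        "Nom du projet": {"type": "title"},
        "Statut": {"type": "status"},
        "Priorité": {"type": "select"},
    }
}
```

Because the lookup keys on the property type rather than its label, `build_title_property(french_schema, "EcoFlow campaign")` writes to "Nom du projet" without the code ever mentioning that name.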

Available MCP Tools

The Notion MCP server exposes these tools:

API-retrieve-a-database

# Get database schema
{
    "name": "API-retrieve-a-database",
    "description": "Retrieve database schema and properties",
    "parameters": {
        "database_id": "abc123..."
    }
}

API-post-page

# Create a new page
{
    "name": "API-post-page",
    "description": "Create a new page in a database",
    "parameters": {
        "parent": {"database_id": "abc123..."},
        "properties": {
            "Title Property": {"title": [...]},
            "Status Property": {"status": {"name": "In progress"}},
            ...
        }
    }
}

API-patch-page

# Update an existing page
{
    "name": "API-patch-page",
    "description": "Update page properties",
    "parameters": {
        "page_id": "page-123...",
        "properties": {...}
    }
}

API-post-database-query

# Query database with filters
{
    "name": "API-post-database-query",
    "description": "Query database with filters and sorts",
    "parameters": {
        "database_id": "abc123...",
        "filter": {...},
        "sorts": [...]
    }
}

Testing MCP Integration Locally

Test Script

# agents/project_manager/test_local_notion.py
import asyncio
from agent import root_agent
from google.adk import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types
async def test_notion_integration():
    “”“Test Project Manager with Notion MCP”“”
    brief = “”“
    Create a project timeline for the EcoFlow Instagram campaign.
    Campaign details:
    - Product: EcoFlow smart water bottle
    - Target: Millennials 25-34
    - Budget: $5,000
    - Duration: 2 weeks
    - Deliverables: 5 Instagram posts, visuals, timeline
    Please create:
    1. A text-based project timeline
    2. Project and tasks in Notion (if available)
    “”“
    print(”📋 Testing Project Manager with Notion MCP\n”)
    print(f”Brief: {brief}\n”)
    session_service = InMemorySessionService()
    runner = Runner(
        app_name=”project_manager”,
        agent=root_agent,
        session_service=session_service
    )
    session_id = “test_notion”
    user_id = “test_user”
    try:
        await session_service.create_session(
            app_name=”project_manager”,
            user_id=user_id,
            session_id=session_id
        )
        print(”project_manager > “, end=’‘, flush=True)
        async for event in runner.run_async(
            user_id=user_id,
            session_id=session_id,
            new_message=types.Content(parts=[types.Part(text=brief)])
        ):
            if hasattr(event, ‘text’) and event.text:
                text = event.text
                # Highlight MCP tool calls
                if “API-retrieve-a-database” in text:
                    print(”\n[MCP] Discovering database schema...”, end=’‘)
                elif “API-post-page” in text:
                    print(”\n[MCP] Creating page in Notion...”, end=’‘)
                print(text, end=’‘, flush=True)
        print(”\n\n✅ Project Manager test complete!”)
    finally:
        await runner.close()

if __name__ == “__main__”:
    asyncio.run(test_notion_integration())
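One caveat on the loop above: in the ADK versions I have seen, events carry their text under event.content.parts rather than in a top-level text attribute, so the hasattr(event, 'text') check may never fire. Assuming that shape (verify it against your installed version), a small defensive helper could look like the sketch below. event_text is a hypothetical name, and the SimpleNamespace objects are stand-ins so the snippet runs without ADK installed.

```python
from types import SimpleNamespace


def event_text(event) -> str:
    """Concatenate the text parts of an event; return "" when there are none."""
    content = getattr(event, "content", None)
    parts = getattr(content, "parts", None) or []
    # Parts that hold function calls or other payloads have no text.
    return "".join(getattr(part, "text", None) or "" for part in parts)


# Stand-in shaped like an event with two text parts and one non-text part.
fake_event = SimpleNamespace(
    content=SimpleNamespace(
        parts=[
            SimpleNamespace(text="Calling API-post-page"),
            SimpleNamespace(text=None),
            SimpleNamespace(text=" done"),
        ]
    )
)

print(event_text(fake_event))
```

Inside the test loop you would then write text = event_text(event) and skip the event when the result is empty.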

Expected Output

📋 Testing Project Manager with Notion MCP
project_manager > I’ll create a project timeline for your EcoFlow campaign.
[MCP] Discovering database schema...
I’ve discovered the Projects database schema.
[MCP] Creating page in Notion...
✓ Created project: “EcoFlow Instagram Campaign”
Project URL: https://notion.so/...
[MCP] Discovering database schema...
I’ve discovered the Tasks database schema.
[MCP] Creating page in Notion...
✓ Created task: “Market Research”
[MCP] Creating page in Notion...
✓ Created task: “Content Creation (5 posts)”
[MCP] Creating page in Notion...
✓ Created task: “Visual Design”
... (more tasks)
**Project Timeline:**
Week 1:
- Days 1-2: Market Research & Strategy
- Days 3-5: Content Creation (5 Instagram posts)
- Days 6-7: Visual Design & Image Generation
Week 2:
- Days 1-2: Review & Revisions
- Days 3-5: Final Approvals
- Days 6-7: Campaign Launch
**Notion Pages Created:**
✅ Project: EcoFlow Instagram Campaign
✅ 8 tasks created and linked to project
✅ Project Manager test complete!

Two-Database Architecture

Why Two Databases?

Projects Database: High-level campaigns

One project = one campaign
Contains overview information
Has dates, budget, status

Tasks Database: Granular work items

Multiple tasks per project
Detailed action items
Assigned to team members
Has deadlines, priorities

Relation: Tasks link to Projects via relation property

Creating the Relation

# In Tasks database, create “Project” relation property:
1. Add property → Relation
2. Name: “Project” (or any name)
3. Select: Projects database
4. Save
# Now tasks can link to projects:
{
    “Project”: {
        “relation”: [{”id”: “project-page-id”}]
    }
}

Deploying MCP-Enabled Agent to Cloud Run

Updated Dockerfile

FROM python:3.12-slim
WORKDIR /app
# Install Node.js for MCP server
RUN apt-get update && apt-get install -y \
    gcc \
    curl \
    && curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y nodejs \
    && rm -rf /var/lib/apt/lists/*
# Install Notion MCP server globally (pinned to 1.9.1)
RUN npm install -g @notionhq/notion-mcp-server@1.9.1
# Verify installations
RUN node --version && npm --version
# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy agent code
COPY agent.py .
# Create non-root user
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser
# Environment
ENV PORT=8080
ENV HOST=0.0.0.0
EXPOSE 8080
CMD ["python", "agent.py"]

Deployment with Notion Credentials

# deploy.sh
# Deploy with Notion environment variables
gcloud run deploy project-manager \
    --source=. \
    --region=us-central1 \
    --set-env-vars="NOTION_API_KEY=${NOTION_API_KEY},NOTION_DATABASE_ID=${NOTION_DATABASE_ID},TASKS_DATABASE_ID=${TASKS_DATABASE_ID}" \
    --memory=1Gi \
    --cpu=1 \
    --timeout=300
echo "✅ Project Manager deployed with Notion MCP integration"

Troubleshooting MCP

Issue 1: MCP Server Won’t Start

Error: TimeoutError: MCP server did not start within 30 seconds

Solutions:

# Increase timeout
connection_params=StdioConnectionParams(
    server_params=server_params,
    timeout=60.0  # Increase to 60 seconds
)
# Verify Node.js is installed
# docker exec -it container bash
# node --version

Issue 2: Notion Authentication Fails

Error: unauthorized

Solutions:

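Verify NOTION_API_KEY is correct (integration tokens start with secret_ or, for newer integrations, ntn_)
Ensure both databases are shared with the integration
Check the environment variable name: the MCP server expects NOTION_TOKEN, so pass the key to it under that name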
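The last point is easy to get wrong, because the deployment sets NOTION_API_KEY while the MCP server process looks for NOTION_TOKEN. A minimal sketch of the mapping, assuming the agent builds the mcp_env dict itself; build_mcp_env is a hypothetical helper, and forwarding PATH is my own addition so the child process can locate node.

```python
import os
from typing import Mapping


def build_mcp_env(environ: Mapping[str, str]) -> dict:
    """Map the agent's NOTION_API_KEY onto the NOTION_TOKEN name for the server."""
    api_key = environ.get("NOTION_API_KEY", "").strip()
    if not api_key:
        # Fail early with a clear message instead of a vague "unauthorized" later.
        raise RuntimeError("NOTION_API_KEY is not set")
    env = {"NOTION_TOKEN": api_key}
    if "PATH" in environ:
        env["PATH"] = environ["PATH"]  # lets the subprocess find node / npx
    return env


# Example with a fake key; in the agent you would pass os.environ instead.
example_env = build_mcp_env({"NOTION_API_KEY": " secret_example ", "PATH": "/usr/bin"})
print(example_env)
```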

Issue 3: Property Not Found

Error: Property "Name" does not exist

Solution: Use dynamic schema discovery!

# Don’t hardcode “Name”
# Instead, discover the actual property name

MCP Version Pinning Considerations

The Problem with Latest Versions

When deploying to cloud environments, you might encounter this issue:

# ❌ DON’T DO THIS in cloud deployment
server_params = StdioServerParameters(
    command=”npx”,
    args=[”-y”, “@notionhq/notion-mcp-server”],  # Downloads latest version!
    env=mcp_env
)

Why this is risky:

npx -y with an unpinned package name resolves to whatever version is latest at run time
Version 2.0.0 introduced a UUID reformatting bug
Database IDs like 2ceb1b311231... get reformatted to 2ceb1b31-1231-... with hyphens
This breaks Notion API calls → 404 errors
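When debugging this kind of issue it helps to confirm that two differently formatted strings are in fact the same ID, and to log which form is being sent. A small sketch using only the standard library; the ID below is a made-up placeholder, not a real database ID, and the helper names are hypothetical.

```python
import uuid


def compact_id(value: str) -> str:
    """Return the 32-character form without hyphens."""
    return uuid.UUID(value).hex


def hyphenated_id(value: str) -> str:
    """Return the 8-4-4-4-12 form with hyphens."""
    return str(uuid.UUID(value))


def same_id(a: str, b: str) -> bool:
    """True when both strings denote the same UUID, whatever the formatting."""
    return uuid.UUID(a) == uuid.UUID(b)


placeholder = "0123456789abcdef0123456789abcdef"  # hypothetical ID
print(compact_id(placeholder))
print(hyphenated_id(placeholder))
print(same_id(placeholder, hyphenated_id(placeholder)))
```

Logging both forms next to the failing request makes it obvious whether the server rewrote the ID before calling Notion.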

The Solution: Version Pinning

1. Install specific version in Dockerfile:

# ✅ Pin to known working version
RUN npm install -g @notionhq/notion-mcp-server@1.9.1

2. Use globally installed version:

# ✅ Use the pinned version
server_params = StdioServerParameters(
    command=”notion-mcp-server”,  # Uses globally installed 1.9.1
    args=[],  # No npx needed!
    env=mcp_env
)

Why Version 1.9.1?

Stable: No UUID reformatting bugs
Tested: Works with all Notion database IDs
Reliable: Consistent behavior across deployments
Predictable: Same version every time

Testing Different Versions

To test a new MCP server version before pinning:

# Install specific version locally
npm install -g @notionhq/notion-mcp-server@2.0.0
# Test with your agent
cd agents/project_manager
python test_local_notion.py
# Check logs for errors
# If stable, update Dockerfile version pin

Best Practice: Always pin to specific versions. Only upgrade after thorough testing.

We’ve built all the agents and integrated external tools. Now it’s time to deploy everything to the cloud!

Get ready to go from localhost to the cloud!

Code Repository: https://github.com/Saoussen-CH/ai-creative-studio-adk-a2a-mcp-vertexai-cloudrun

Next: Part 6: Deploying to the Cloud →

Thanks for reading! If this was helpful, hit the ❤️, drop a comment, ⭐ the GitHub repo, and subscribe so you don’t miss the next one. Let’s connect on LinkedIn!

Building Distributed Multi-Agent Systems with Google’s AI Stack: Part 4

Saoussen CHAABNIA — Tue, 13 Jan 2026 09:44:53 GMT

Building Distributed Multi-Agent Systems with Google’s AI Stack series:

Part 1: From Monolithic AI to Distributed Intelligence: Building Your First Multi-Agent System
Part 2: Making Agents Talk: Agent-to-Agent (A2A) Protocol Deep Dive
Part 3: Building the Orchestrator: Coordinating Agents with the AgentTool Pattern
Part 4: Scaling Multi-Agent Workflows: Solving the Token Limit Problem ← You are here
Part 5: External Tool Integration via Model Context Protocol (MCP)
Part 6: Deploying to Cloud: Cloud Run and Vertex AI Agent Engine

Welcome Back!

In Part 3, we built an intelligent orchestrator that coordinates 5 specialist agents. It works beautifully… until it doesn’t.

The Problem

You test a complete 5-agent campaign workflow:

✅ Agent 1 (Brand Strategist): Complete - 2,000 tokens output
✅ Agent 2 (Copywriter): Complete - 2,500 tokens output
✅ Agent 3 (Designer): Complete - 1,800 tokens output
❌ Agent 4 (Critic): Workflow stops!
❌ Agent 5 (Project Manager): Never reached!

What happened? You hit the token output limit.

In this article, we’ll solve this with Lazy Context Compaction, an elegant solution that:

Summarizes older agent outputs intelligently
Preserves recent context quality
Scales workflows to 10+ agents
Reduces token costs

Let’s fix it!

Understanding the Token Limit Problem

What Are Token Limits?

LLMs have two token limits:

Input limit: How much context they can read (e.g., 128K tokens)
Output limit: How much they can generate (e.g., 8,192 tokens)

Our problem is the output limit.

Why Multi-Agent Workflows Hit Limits

User Brief: 200 tokens
↓
Agent 1 Output: 2,000 tokens
Agent 2 Output: 2,500 tokens
Agent 3 Output: 1,800 tokens
-----------------------------------
Orchestrator’s response so far: 6,500 tokens
Agent 4 tries to start...
❌ Would exceed 8,192 token limit!
Workflow stops prematurely.
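The arithmetic is worth running once. The sketch below uses the article's figures (a 200-token brief, the three outputs above, and the Critic and Project Manager sizes from the walkthrough later in this article); it is a back-of-the-envelope model, not a measurement, and it ignores the orchestrator's own narration, which also counts against the limit.

```python
OUTPUT_LIMIT = 8192
BRIEF_TOKENS = 200
AGENT_OUTPUTS = [
    ("Brand Strategist", 2000),
    ("Copywriter", 2500),
    ("Designer", 1800),
    ("Critic", 1500),
    ("Project Manager", 2000),
]


def running_totals(start, outputs):
    """Return the cumulative token count after each output is appended."""
    totals, total = [], start
    for tokens in outputs:
        total += tokens
        totals.append(total)
    return totals


totals = running_totals(BRIEF_TOKENS, [tokens for _, tokens in AGENT_OUTPUTS])
for (name, _), total in zip(AGENT_OUTPUTS, totals):
    status = "ok" if total <= OUTPUT_LIMIT else "over limit"
    print(f"{name:<17} running total {total:>6,} -> {status}")

# Room left once the Designer has finished (third entry).
headroom_after_designer = OUTPUT_LIMIT - totals[2]
print(f"Headroom after agent 3: {headroom_after_designer:,} tokens")
```

On these numbers only about 1,700 tokens remain after the third agent. The Critic barely fits on paper before any narration is added, and the full five-agent run (10,000 tokens) cannot fit at all.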

Why This Happens

The orchestrator presents the full output from each agent to maintain transparency. After 3 agents, it’s already used most of its output budget!

Traditional solutions:

Increase max_output_tokens → Still fails with more agents
Summarize everything → Loses important context
Reduce agent outputs → Loses quality

Our solution: Lazy Context Compaction

What is Lazy Context Compaction?

Lazy Context Compaction is a strategy that:

Compacts only when needed (after N agents)
Summarizes older outputs (saves tokens)
Preserves recent outputs (maintains quality)
Uses LLM for summarization (intelligent compression)

The Strategy

Agents 1-3: Full context preserved
    ↓
After Agent 3: Compaction triggered
    ↓
Agents 1-2: Summarized → ~500 tokens
Agent 3: Full output preserved → ~1,800 tokens
    ↓
Agents 4-5: Execute with room to spare!

Result: Workflow completes successfully with high-quality outputs.

Implementing Context Compaction with ADK

Step 1: Import Required Components

# agents/creative_director/agent.py
from google.adk.apps.llm_event_summarizer import LlmEventSummarizer
from google.adk.apps.app import EventsCompactionConfig
from google.adk.apps import App
from google.adk.models import Gemini

Step 2: Create Summarizer

# Use fast model for summarization (cost-efficient)
summarization_llm = Gemini(model="gemini-2.5-flash")
summarizer = LlmEventSummarizer(llm=summarization_llm)

Why Gemini Flash?

Fast summarization
Cost-efficient
High-quality summaries
Same model family as main agent

Step 3: Configure Compaction

compaction_config = EventsCompactionConfig(
    summarizer=summarizer,
    compaction_interval=3,  # Summarize after every 3 agents
    overlap_size=1          # Keep most recent agent’s full output
)

Configuration explained:

compaction_interval=3: Compact after 3 agent completions
overlap_size=1: Keep 1 most recent agent full (preserve quality)

Step 4: Wrap Agent in App

def create_creative_director():
    # ... (agent creation code from Part 3) ...
    agent = Agent(
        name=”creative_director”,
        model=”gemini-2.5-flash”,
        tools=agent_tools,
        instruction=system_instruction,
        generate_content_config=GenerateContentConfig(
            max_output_tokens=20000,  # Increased from 8,192
            temperature=0.2
        )
    )
    # Wrap agent in App with compaction config
    app = App(
        name=”creative_director”,
        root_agent=agent,
        events_compaction_config=compaction_config
    )
    logger.info(”✅ App created with lazy context compaction”)
    logger.info(”   Compaction interval: 3 agents”)
    logger.info(”   Overlap size: 1 agent”)
    logger.info(”   Context will be summarized only when necessary”)
    return app

# Create app (not just agent)
root_agent = create_creative_director()

Important: We return an App, not just an Agent!

How It Works: Step by Step

5-Agent Workflow Example

Phase 1: Agents 1–3 (No Compaction)

User: “Create complete Instagram campaign”
Orchestrator announces plan:
“I’ll coordinate our team:
1. Brand Strategist → research
2. Copywriter → posts
3. Designer → visuals
4. Critic → review
5. Project Manager → timeline”
Agent 1 (Brand Strategist) executes:
→ Output: 2,000 tokens (FULL)
→ Total context: 2,000 tokens
Agent 2 (Copywriter) executes:
→ Output: 2,500 tokens (FULL)
→ Total context: 4,500 tokens
Agent 3 (Designer) executes:
→ Output: 1,800 tokens (FULL)
→ Total context: 6,300 tokens

Status: No compaction yet. All outputs preserved.

Phase 2: After Agent 3 (Compaction Triggered)

Compaction interval reached (3 agents)
    ↓
Summarizer analyzes:
- Agent 1 output (2,000 tokens)
- Agent 2 output (2,500 tokens)
    ↓
Creates intelligent summary:
- Brand Strategist research: [key points] (300 tokens)
- Copywriter posts: [post summaries] (200 tokens)
Total summary: 500 tokens
    ↓
Keeps Agent 3 full (overlap_size=1):
- Designer visuals: [full output] (1,800 tokens)
    ↓
New context size: 500 + 1,800 = 2,300 tokens

Saved: 4,000 tokens! (from 6,300 → 2,300)
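The same bookkeeping in a few lines of Python. This models the article's simplified view of compaction (everything except the last overlap_size outputs collapses into one summary); the 500-token summary size is the article's estimate, and real summary lengths depend on the summarizer. compact is a hypothetical helper, not an ADK function.

```python
def compact(outputs, overlap_size, summary_tokens):
    """Collapse all but the last overlap_size outputs into one summary entry."""
    if overlap_size < 0 or overlap_size >= len(outputs):
        return list(outputs)  # nothing old enough to summarize
    kept = outputs[len(outputs) - overlap_size:]
    return [summary_tokens] + kept


before = [2000, 2500, 1800]  # Agents 1-3, full outputs
after = compact(before, overlap_size=1, summary_tokens=500)
saved = sum(before) - sum(after)

# Add the Critic (1,500) and Project Manager (2,000) outputs afterwards.
final_total = sum(after) + 1500 + 2000

print(after, saved, final_total)
```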

Phase 3: Agents 4–5 (With Compacted Context)

Agent 4 (Critic) executes:
→ Context available: 2,300 tokens
→ Has: Summary of research/posts + Full visual concepts
→ Output: 1,500 tokens
→ Total: 3,800 tokens
Agent 5 (Project Manager) executes:
→ Context available: 3,800 tokens
→ Output: 2,000 tokens
→ Total: 5,800 tokens
✅ Workflow completes successfully!
✅ Under 8,192 token limit
✅ All 5 agents executed

Configuration Strategies

Short Workflows (3–5 agents)

compaction_config = EventsCompactionConfig(
    summarizer=summarizer,
    compaction_interval=3,  # Compact after 3 agents
    overlap_size=1          # Keep last 1 full
)

Use when:

3–5 agents total
Moderate output per agent
Quality is critical

Long Workflows (5–10 agents)

compaction_config = EventsCompactionConfig(
    summarizer=summarizer,
    compaction_interval=4,  # Compact after 4 agents
    overlap_size=2          # Keep last 2 full
)

Use when:

5–10 agents total
Need more recent context preserved
Complex interdependencies

Very Long Workflows (10+ agents)

compaction_config = EventsCompactionConfig(
    summarizer=summarizer,
    compaction_interval=5,  # Compact every 5 agents
    overlap_size=2          # Keep last 2 full
)

Use when:

10+ agents total
Very complex workflows
Multiple rounds of compaction needed

Quality-Critical Workflows

compaction_config = EventsCompactionConfig(
    summarizer=summarizer,
    compaction_interval=3,
    overlap_size=2  # Keep last 2 full (more quality)
)

Use when:

Quality > token savings
Later agents need rich context
Acceptable to compact more frequently
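The four presets above can be folded into one helper. This is only a convenience sketch: suggest_compaction_settings is a hypothetical function, and because the ranges above overlap at 5 and 10 agents, I assume here that 5 counts as "short" and 10 as "very long". For fewer than 3 agents it returns None, on the assumption that compaction is not worth configuring.

```python
from typing import Optional, Tuple


def suggest_compaction_settings(
    agent_count: int, quality_critical: bool = False
) -> Optional[Tuple[int, int]]:
    """Return (compaction_interval, overlap_size) following the presets above."""
    if agent_count < 3:
        return None  # workflow too short for compaction to matter
    if quality_critical:
        return (3, 2)
    if agent_count >= 10:
        return (5, 2)
    if agent_count > 5:
        return (4, 2)
    return (3, 1)


for count in (2, 5, 8, 12):
    print(count, suggest_compaction_settings(count))
```

The returned pair maps directly onto the compaction_interval and overlap_size arguments of EventsCompactionConfig.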

Testing Context Compaction

Test Script

# test_context_compaction.py
import asyncio
from creative_director.agent import root_agent
from google.adk import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types
async def test_full_workflow():
    “”“Test complete 5-agent workflow with compaction”“”
    brief = “”“
    Create a complete Instagram campaign for EcoFlow smart water bottle.
    Target: Health-conscious millennials (25-34).
    Budget: $5,000. Launch in 2 weeks.
    Include research, posts, visuals, review, and full timeline.
    “”“
    print(”=”*70)
    print(”Testing 5-Agent Workflow with Context Compaction”)
    print(”=”*70)
    print(f”\nBrief: {brief}\n”)
    session_service = InMemorySessionService()
    runner = Runner(
        # root_agent is an App here, so pass it as app= (assumes an ADK
        # version whose Runner accepts app=; the app name comes from the App)
        app=root_agent,
        session_service=session_service
    )
    session_id = “test_compaction”
    user_id = “test_user”
    agent_count = 0
    try:
        await session_service.create_session(
            app_name=”creative_director”,
            user_id=user_id,
            session_id=session_id
        )
        async for event in runner.run_async(
            user_id=user_id,
            session_id=session_id,
            new_message=types.Content(parts=[types.Part(text=brief)])
        ):
            if hasattr(event, ‘text’) and event.text:
                text = event.text
                # Count agent completions
                if “✓” in text and “complete” in text.lower():
                    agent_count += 1
                    print(f”\n[Agent {agent_count} completed]”)
                # Detect compaction
                if “summariz” in text.lower():
                    print("\n[!] Context compaction triggered")
                print(text, end="", flush=True)
        print(f”\n\n{’=’*70}”)
        print(f”✅ Workflow complete!”)
        print(f”   Agents executed: {agent_count}/5”)
        print(f”{’=’*70}”)
        if agent_count == 5:
            print(”✅ SUCCESS: All 5 agents completed (compaction worked!)”)
        else:
            print(f”❌ PARTIAL: Only {agent_count}/5 agents completed”)
    finally:
        await runner.close()

if __name__ == “__main__”:
    asyncio.run(test_full_workflow())

Expected Output

======================================================================
Testing 5-Agent Workflow with Context Compaction
======================================================================
creative_director > I’ll coordinate our team to create your campaign:
1. Brand Strategist → research
2. Copywriter → posts
3. Designer → visuals
4. Critic → review
5. Project Manager → timeline
Let’s begin!
[Agent 1 completed]
✓ Research complete. I received audience insights...
[Agent 2 completed]
✓ Copywriting complete. I received 5 Instagram posts...
[Agent 3 completed]
✓ Design complete. I received image concepts...
[!] Context compaction triggered
[Agent 4 completed]
✓ Review complete. Quality score: 8.5/10...
[Agent 5 completed]
✓ Timeline complete. Project plan created...
======================================================================
✅ Workflow complete!
   Agents executed: 5/5
======================================================================
✅ SUCCESS: All 5 agents completed (compaction worked!)

Token Usage Comparison

Without Compaction

Agent 1: 2,000 tokens output
Agent 2: 2,500 tokens output
Agent 3: 1,800 tokens output
-----------------------------------
Total: 6,300 tokens
Agent 4: ❌ Cannot start (would exceed 8,192)
Result: FAILURE (3/5 agents completed)

With Compaction (interval=3, overlap=1)

Agent 1: 2,000 tokens output
Agent 2: 2,500 tokens output
Agent 3: 1,800 tokens output
Total before compaction: 6,300 tokens
→ Compaction triggered
Agents 1-2 summarized: 500 tokens
Agent 3 preserved: 1,800 tokens
Total after compaction: 2,300 tokens
Agent 4: 1,500 tokens output (2,300 → 3,800 total)
Agent 5: 2,000 tokens output (3,800 → 5,800 total)
-----------------------------------
Final: 5,800 tokens (under 8,192 limit)
Result: ✅ SUCCESS (5/5 agents completed)

Token savings: 4,000 tokens from compaction (6,300 → 2,300)
Workflow success: 100% (vs 60% without)

Quality Preservation

What Gets Summarized?

The summarizer preserves key information:

Original Agent 1 Output (2,000 tokens):

**Audience Insights:**
Health-conscious millennials (25-34) are increasingly seeking products...
[1,500 words of detailed analysis]
**Competitive Analysis:**
1. Hydro Flask - Established brand with strong loyalty...
[800 words of competitor details]
**Trending Topics:**
1. #SustainableLiving - 2.3M posts, growing 15% monthly...
[700 words of trend analysis]

Summarized Version (300 tokens):

Research Summary: Target audience is health-conscious millennials (25-34)
valuing sustainability and smart features. Main competitors: Hydro Flask
(premium, no tech), S’well (design-focused), HidrateSpark (smart but basic).
Key trends: sustainable living, hydration tracking, minimalist aesthetics.
Opportunity: premium sustainable + smart features gap in market.

Key points preserved:

Target audience demographics
Main competitors identified
Key trends listed
Strategic opportunity highlighted

Details lost:

Full competitor analysis
Detailed trend statistics
Extended audience behaviors

Quality vs Efficiency Trade-off

overlap_size=0: Maximum compression, minimal quality
overlap_size=1: Balanced (recommended)
overlap_size=2: High quality, less compression
overlap_size=3: Maximum quality, minimal compression

Recommendation: Start with overlap_size=1, increase if quality issues arise.

When NOT to Use Compaction

Scenario 1: Short Workflows

# 2-agent workflow
brief = “Research the market and write 3 posts”
# No compaction needed - output is small

Scenario 2: Small Outputs

# Each agent outputs < 500 tokens
# Total for 5 agents: 2,500 tokens
# Well under limit - compaction unnecessary

Scenario 3: Context-Critical Tasks

# Legal document review where every detail matters
# Better to split into multiple sessions than compress

Advanced: Multiple Compaction Rounds

For very long workflows (15+ agents), multiple compaction rounds occur:

Agents 1-3: Full
→ Compaction 1: Agents 1-2 summarized, Agent 3 kept
Agents 4-6: Execute
→ Compaction 2: Agents 1-5 summarized, Agent 6 kept
Agents 7-9: Execute
→ Compaction 3: Agents 1-8 summarized, Agent 9 kept
... and so on

Each round further compresses older context while preserving recent work.

Troubleshooting

Issue 1: Workflow Still Stops Early

Solution: Reduce compaction_interval:

compaction_interval=2  # Compact more frequently

Issue 2: Quality Degradation

Solution: Increase overlap_size:

overlap_size=2  # Keep more recent context

Issue 3: Too Much Compaction

Solution: Increase compaction_interval:

compaction_interval=4  # Compact less frequently

Cost Analysis

Without Compaction (Failed Workflow)

Agents executed: 3/5
Input tokens: 6,300 (wasted partial context)
Output tokens: 6,300
Cost: ~$0.05 (but incomplete workflow)
Value: $0 (workflow failed)

With Compaction (Successful Workflow)

Agents executed: 5/5 ✅
Input tokens: 8,000 (including summarization)
Output tokens: 5,800
Summarization cost: ~$0.01
Total cost: ~$0.07
Value: Complete campaign delivered ✅

ROI: 40% more cost, but 100% success vs failure!

Our agents can now scale to handle complex workflows. But what about integrating with external services?

In Part 5, we’ll add Model Context Protocol (MCP) integration to the Project Manager agent.

Code Repository: https://github.com/Saoussen-CH/ai-creative-studio-adk-a2a-mcp-vertexai-cloudrun

Next: Part 5: External Tool Integration via MCP →


Building Distributed Multi-Agent Systems with Google’s AI Stack: Part 3

Saoussen CHAABNIA — Tue, 13 Jan 2026 09:44:51 GMT

Building Distributed Multi-Agent Systems with Google’s AI Stack series:

Part 1: From Monolithic AI to Distributed Intelligence: Building Your First Multi-Agent System
Part 2: Making Agents Talk: Agent-to-Agent (A2A) Protocol Deep Dive
Part 3: Building the Orchestrator: Coordinating Agents with the AgentTool Pattern ← You are here
Part 4: Scaling Multi-Agent Workflows: Solving the Token Limit Problem
Part 5: External Tool Integration via Model Context Protocol (MCP)
Part 6: Deploying to Cloud: Cloud Run and Vertex AI Agent Engine

Welcome Back!

In Part 2, we made our specialist agents accessible via A2A protocol. Now we have:

Brand Strategist (A2A server running)
Copywriter (A2A server running)
Designer (A2A server running)
Critic (A2A server running)
Project Manager (A2A server running)

But there’s a problem: Who coordinates them?

In this article, we’ll build the Creative Director, an intelligent orchestrator that:

Routes requests to the right agents
Creates execution plans before acting
Passes context between agents
Handles errors gracefully
Lets the LLM decide the workflow

This is where the AgentTool pattern shines. Let’s build it!

The Orchestration Challenge

Naive Approach: Hardcoded Workflow

def create_campaign(brief):
    # Always call all agents in fixed order
    research = call_brand_strategist(brief)
    posts = call_copywriter(brief, research)
    visuals = call_designer(posts)
    feedback = call_critic(research, posts, visuals)
    timeline = call_project_manager(brief, feedback)
    return compile_results(research, posts, visuals, feedback, timeline)

Problems:

Not flexible: if the user only wants research, all 5 agents run anyway
No intelligence: can’t adapt to different requests
Error handling is hard: what if the copywriter fails?
Can’t revise: “make the copy more playful” requires code changes

Better Approach: LLM-Driven Routing

What if the LLM decides which agents to call based on the user’s request?

User: “Just do market research for eco water bottles”
→ LLM: Call ONLY brand_strategist
User: “Create complete campaign with timeline”
→ LLM: Call all 5 agents sequentially
User: “Make the copy more playful”
→ LLM: Call copywriter again with feedback

This is the AgentTool pattern!

What is the AgentTool Pattern?

The AgentTool pattern wraps remote A2A agents as callable tools that the orchestrator’s LLM can use.

How It Works

┌─────────────────────────────────────┐
│     Orchestrator (Agent)            │
│                                     │
│  ┌──────────────────────────────┐  │
│  │  Gemini 2.5 Flash (LLM)      │  │
│  │                              │  │
│  │  “I need to call the         │  │
│  │   brand_strategist tool”     │  │
│  └──────────────────────────────┘  │
│              ↓                      │
│  ┌──────────────────────────────┐  │
│  │  AgentTool (Wrapper)         │  │
│  │  - brand_strategist          │  │
│  │  - copywriter                │  │
│  │  - designer                  │  │
│  │  - critic                    │  │
│  │  - project_manager           │  │
│  └──────────────────────────────┘  │
└─────────────────────────────────────┘
              ↓ A2A Protocol
┌─────────────────────────────────────┐
│    Remote A2A Agents (Cloud Run)    │
│                                     │
│  • Brand Strategist                 │
│  • Copywriter                       │
│  • Designer                         │
│  • Critic                           │
│  • Project Manager                  │
└─────────────────────────────────────┘

Key Benefits:

LLM decides which agents to call
Flexible routing — adapt to any request
Reusability — call same agent multiple times
Natural interface — function calling

Building the Creative Director

Step 1: Import Dependencies

# agents/creative_director/agent.py
import os
import logging
from google.adk.agents import Agent
from google.adk.agents.remote_a2a_agent import RemoteA2aAgent
from google.adk.tools.agent_tool import AgentTool
from google.genai.types import GenerateContentConfig
from dotenv import load_dotenv
load_dotenv()
logger = logging.getLogger(”ai_creative_studio.creative_director”)

Step 2: Create Remote A2A Agents

def create_creative_director():
    “”“
    Create the Creative Director orchestrator using AgentTool pattern.
    Features: Dynamic agent list, LLM-driven routing, planning-first.
    “”“
    logger.info(”=”*70)
    logger.info(”Initializing Creative Director Orchestrator”)
    logger.info(”=”*70)
    # Read agent URLs AT RUNTIME from environment variables
    # This is crucial for Vertex AI Agent Engine deployment
    copywriter_url = os.getenv(”COPYWRITER_AGENT_URL”)
    designer_url = os.getenv(”DESIGNER_AGENT_URL”)
    strategist_url = os.getenv(”STRATEGIST_AGENT_URL”)
    critic_url = os.getenv(”CRITIC_AGENT_URL”)
    pm_url = os.getenv(”PM_AGENT_URL”)
    # Build dynamic agent list and tools
    available_agents_list = []
    agent_tools = []
    # Brand Strategist
    if strategist_url:
        available_agents_list.append(
            “- **brand_strategist**: Researches market trends, competitors, and target audience insights”
        )
        strategist_agent = RemoteA2aAgent(
            name=”brand_strategist”,
            description=”Brand strategist for market research, trend analysis, and competitive insights”,
            agent_card=f”{strategist_url}/.well-known/agent.json”
        )
        agent_tools.append(AgentTool(agent=strategist_agent))
        logger.info(f”✅ Configured brand_strategist: {strategist_url}”)
    # Copywriter
    if copywriter_url:
        available_agents_list.append(
            “- **copywriter**: Creates engaging social media captions and copy”
        )
        copywriter_agent = RemoteA2aAgent(
            name=”copywriter”,
            description=”Expert social media copywriter for creating engaging captions and copy”,
            agent_card=f”{copywriter_url}/.well-known/agent.json”
        )
        agent_tools.append(AgentTool(agent=copywriter_agent))
        logger.info(f”✅ Configured copywriter: {copywriter_url}”)
    # Designer
    if designer_url:
        available_agents_list.append(
            “- **designer**: Generates AI image concepts and visual design prompts”
        )
        designer_agent = RemoteA2aAgent(
            name=”designer”,
            description=”Creative visual designer for generating social media image concepts”,
            agent_card=f”{designer_url}/.well-known/agent.json”
        )
        agent_tools.append(AgentTool(agent=designer_agent))
        logger.info(f”✅ Configured designer: {designer_url}”)
    # Critic
    if critic_url:
        available_agents_list.append(
            “- **critic**: Reviews creative work and provides quality feedback”
        )
        critic_agent = RemoteA2aAgent(
            name=”critic”,
            description=”Creative critic for reviewing campaign materials and providing constructive feedback”,
            agent_card=f”{critic_url}/.well-known/agent.json”
        )
        agent_tools.append(AgentTool(agent=critic_agent))
        logger.info(f”✅ Configured critic: {critic_url}”)
    # Project Manager
    if pm_url:
        available_agents_list.append(
            “- **project_manager**: Creates project timelines, tasks, and deliverables”
        )
        pm_agent = RemoteA2aAgent(
            name=”project_manager”,
            description=”Project manager for creating timelines, tasks, and organizing campaign deliverables”,
            agent_card=f”{pm_url}/.well-known/agent.json”
        )
        agent_tools.append(AgentTool(agent=pm_agent))
        logger.info(f”✅ Configured project_manager: {pm_url}”)
    # Format available agents for prompt injection
    if available_agents_list:
        available_agents_text = “\n”.join(available_agents_list)
        logger.info(f”✅ Configured {len(agent_tools)} specialist agents”)
    else:
        available_agents_text = “⚠️ No specialist agents configured. Set agent URLs in environment variables.”
        logger.warning(”⚠️  No specialist agents configured!”)
    # ... (continued in next section)

Key Innovation: Dynamic agent discovery at runtime!
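The five near-identical blocks above follow one pattern, so the lookup can also be written as a table plus a loop. The sketch below covers only the pure-Python part (reading URLs and building the agent card URL and prompt line); AgentSpec and configured_agents are hypothetical names, and stripping a trailing slash from the URL is my own addition.

```python
from typing import Mapping, NamedTuple


class AgentSpec(NamedTuple):
    name: str
    env_var: str
    summary: str


SPECS = [
    AgentSpec("brand_strategist", "STRATEGIST_AGENT_URL",
              "Researches market trends, competitors, and target audience insights"),
    AgentSpec("copywriter", "COPYWRITER_AGENT_URL",
              "Creates engaging social media captions and copy"),
    AgentSpec("designer", "DESIGNER_AGENT_URL",
              "Generates AI image concepts and visual design prompts"),
    AgentSpec("critic", "CRITIC_AGENT_URL",
              "Reviews creative work and provides quality feedback"),
    AgentSpec("project_manager", "PM_AGENT_URL",
              "Creates project timelines, tasks, and deliverables"),
]


def configured_agents(environ: Mapping[str, str]):
    """Return (name, agent_card_url, prompt_line) for each spec whose URL is set."""
    result = []
    for spec in SPECS:
        url = environ.get(spec.env_var, "").rstrip("/")
        if not url:
            continue  # agent not deployed or not configured: skip it
        card = f"{url}/.well-known/agent.json"
        line = f"- **{spec.name}**: {spec.summary}"
        result.append((spec.name, card, line))
    return result


# Example with a made-up URL; in the agent you would pass os.environ.
demo = configured_agents({"COPYWRITER_AGENT_URL": "https://copywriter.example/"})
print(demo)
```

Each returned tuple supplies the name and agent_card arguments for RemoteA2aAgent, and the prompt lines join into available_agents_text exactly as in the code above.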

Step 3: The Planning-First Instruction

This is where the magic happens. We give the LLM clear instructions on how to orchestrate:

# Inject dynamic agent list into instruction template
    system_instruction = SYSTEM_INSTRUCTION_TEMPLATE.format(
        available_agents=available_agents_text
    )
    # ... (continued)
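One thing to watch with str.format: any literal curly brace in the template must be doubled, otherwise Python treats it as a placeholder. The snippet below shows both cases. The JSON line in the template is a hypothetical addition for illustration and is not part of the template used in this series.

```python
TEMPLATE = """**Your Available Specialist Tools:**
{available_agents}
Reply with JSON like {{"plan": ["brand_strategist"]}} before delegating."""

rendered = TEMPLATE.format(available_agents="- **copywriter**: Creates captions")
print(rendered)

# The same line with single braces is parsed as a placeholder named '"plan"'.
BROKEN = '{available_agents}\nReply with JSON like {"plan": 1}'
try:
    BROKEN.format(available_agents="- **copywriter**: Creates captions")
    format_failed = False
except KeyError:
    format_failed = True
print("unescaped braces failed:", format_failed)
```

If the instruction template ever grows JSON examples, doubling the braces (or switching to string.Template) keeps the agent list injection working.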

The Instruction Template

SYSTEM_INSTRUCTION_TEMPLATE = """You are an expert Creative Director AI Orchestrator for social media campaign creation.
**Your Role:**
You interpret campaign requests, create execution plans, and delegate to specialist agents.
You do NOT create content yourself - you manage the specialists who do.
**Your Available Specialist Tools:**
{available_agents}
**Core Directives & Decision Making:**
1. **Understand User Intent & Complexity**
   Carefully analyze the user’s request to determine the core task(s).
   **Request Classification:**
   - **SIMPLE**: “just do market research” → ONE agent needed
   - **COMPLEX**: “create complete campaign” → MULTIPLE agents needed
   **Examples:**
   - “Research eco-friendly water bottle market” → brand_strategist only
   - “Write 5 Instagram captions” → copywriter only
   - “Create complete campaign with timeline” → ALL 5 agents sequentially
2. **Task Planning & Sequencing (CRITICAL - Do This BEFORE Delegating)**
   **Before calling ANY tool**, you MUST:
   - **Outline the complete plan** in your response to the user
   - **Example plan format:**
     “I’ll coordinate our team to create your campaign. Here’s my plan:
     1. **Brand Strategist** will research the market, competitors, and target audience
     2. **Copywriter** will create 5 Instagram posts using those insights
     3. **Designer** will generate image concepts for each post
     4. **Critic** will review all creative work for quality
     5. **Project Manager** will create the project timeline and deliverables
     Let’s begin with the market research!”
3. **Task Delegation & Execution (Executing Your Plan)**
   For each agent in your plan, follow this EXACT sequence:
   **a) CALL** the appropriate tool with complete context
   - Include ALL relevant information from user’s request
   - For sequential tasks, include output from previous agents
   - Be explicit! Remote agents don’t have conversation history
   **b) WAIT** for tool_output
   - **DO NOT** proceed until you receive the complete response
   - **DO NOT** assume what the response will be
   **c) VERIFY** tool_output shows successful completion
   - Check that tool_output contains actual results (not an error)
   - **IF ERROR detected:** Go to step (e)
   - **IF SUCCESS:** Go to step (d)
   **d) CONFIRM** to user with specific details
   - Format: “✓ [Agent] complete. I received [brief summary of actual output]”
   - Examples:
     - “✓ Research complete. I received insights on target audience, 3 competitors, and 5 trending topics”
     - “✓ Copywriting complete. I received 5 Instagram posts with captions and hashtags”
   - **Then announce next step:** “Now moving to [next agent]...”
   **e) IF ERROR - STOP and Report**
   - **STOP the sequence immediately**
   - Report to user: “❌ Error in [Agent]: [exact error message from tool_output]”
   - Explain impact: “Cannot proceed with [next step] without [failed step results]”
   - Ask: “Would you like me to retry [failed agent] or adjust the approach?”
   - **DO NOT** continue to next agent until issue is resolved
4. **CRITICAL Success Verification**
   You **MUST**:
   - Wait for tool_output after EVERY agent tool call
   - Base your decision to proceed ENTIRELY on confirmation from tool_output
   - STOP if ANY tool call fails or produces ambiguous output
   - Report exact failure messages to the user
   You **MUST NOT**:
   - Assume a task was successful
   - Invent success messages
   - Proceed if the previous tool_output shows an error
   - Continue workflow if a critical step failed
   **Only state that a task is complete if the tool_output explicitly shows successful completion.**
5. **Error Handling & Ambiguity Resolution**
   **When a Tool Fails:**
   1. **STOP** the workflow immediately
   2. **Report exact error:** “❌ Error in [Agent]: [exact error message]”
   3. **Explain impact:** “Cannot proceed with [next steps] without [failed step]”
   4. **Offer options:** “Would you like me to retry or adjust?”
   5. **Wait for user decision** before proceeding
6. **Communication with User**
   - **Transparency First:** Always present the complete response from each agent
   - **Progress Updates:** Inform user which agent is currently working
   - **No Hallucination:** NEVER say results are ready unless you received them
   - **Present Full Outputs:** Show the user exactly what each specialist produced
**CRITICAL WORKFLOW COMPLETION REQUIREMENT:**
When you create a plan listing multiple agents (e.g., “I’ll use agents 1, 2, 3, 4, 5”),
you MUST execute EVERY SINGLE agent in that plan. Do NOT stop after 2 or 3 agents -
continue until ALL planned agents have been called and have responded.
"""

Key Patterns:

Planning-first: Create plan before execution
Verification: Check each step succeeds
Error handling: Stop and report on failure
Context passing: Each agent gets previous outputs
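To make these four patterns concrete, here is a minimal sketch of the same control flow written as ordinary Python. In the real system the LLM follows this flow from the instruction text; nothing here is ADK code. The run_plan function and the shape of the tools mapping (agent name to a callable that takes a text query) are assumptions made for illustration.

```python
def run_plan(brief, plan, tools):
    """Call each planned agent in order and stop at the first failure.

    Returns (outputs, log): outputs maps agent name to its result,
    log holds the progress and error messages shown to the user.
    """
    outputs = {}
    log = []
    for name in plan:
        # Context passing: remote agents have no conversation history,
        # so every call carries the brief plus all earlier outputs.
        context = brief + "".join(
            f"\n\n[{prev}]\n{text}" for prev, text in outputs.items()
        )
        try:
            result = tools[name](context)
        except Exception as exc:
            # Error handling: report the exact message and stop the sequence
            log.append(f"Error in {name}: {exc}")
            break
        if not result:
            # Verification: an empty result is treated as a failure
            log.append(f"Error in {name}: empty output")
            break
        outputs[name] = result
        log.append(f"{name} complete")
    return outputs, log
```

The loop never reaches a later agent after a failure, which mirrors the "STOP and Report" rule in the instruction.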

Instruction Design Tips: What Makes a Great Orchestrator

Writing effective orchestrator instructions is an art. Here are battle-tested tips:

1. Be Explicit About Workflow Steps

❌ Vague:

“Call the agents to create a campaign”

✅ Clear:

“Before calling ANY tool, you MUST:
1. Outline the complete plan
2. Execute each step sequentially
3. Verify success before continuing
4. Report exact errors if any step fails”

Why: LLMs need explicit step-by-step instructions. Vague directions lead to skipped steps.

2. Use Imperative Language with Strong Verbs

❌ Weak:

“You should probably check if the task completed”

✅ Strong:

“You MUST verify tool_output shows successful completion”

Magic words: MUST, NEVER, ALWAYS, DO NOT, CRITICAL, STOP

Why: Strong imperatives reduce ambiguity and increase compliance.

3. Provide Concrete Examples

❌ Abstract:

“Classify the request complexity”

✅ Concrete:

“Request Classification:
- SIMPLE: ‘just do market research’ → brand_strategist only
- COMPLEX: ‘create complete campaign’ → ALL 5 agents sequentially”

Why: Examples ground abstract concepts and reduce misinterpretation.

4. Specify Error Behavior Exactly

❌ Unclear:

“Handle errors appropriately”

✅ Precise:

“When a Tool Fails:
1. STOP the workflow immediately
2. Report: ‘❌ Error in [Agent]: [exact error message]’
3. Explain: ‘Cannot proceed with [next step]’
4. Ask: ‘Would you like me to retry?’
5. WAIT for user decision”

Why: Precise error handling prevents the LLM from “creative” error recovery that makes things worse.

5. Prevent Hallucination with Negative Instructions

❌ Allowing hallucination:

“Summarize the results”

✅ Preventing hallucination:

“You MUST NOT:
- Assume a task was successful
- Invent success messages like ‘Research complete’
- Proceed if tool_output shows an error
- Summarize or filter error messages
ONLY state a task is complete if tool_output explicitly shows success.”

Why: LLMs tend to fill gaps with plausible-sounding content. Negative instructions prevent this.

6. Use Formatting for Emphasis

“**CRITICAL WORKFLOW COMPLETION REQUIREMENT:**
When you create a plan listing multiple agents,
you MUST execute EVERY SINGLE agent in that plan.
Do NOT stop after 2 or 3 agents.”

Techniques:

Bold for critical points
ALL CAPS for emphasis
Numbered lists for sequences
Bullet points for options

Why: Visual hierarchy helps LLMs prioritize instructions.

7. Include Verification Checkpoints

“**Workflow checklist before finishing:**
- ✓ Did I announce a plan with N agents?
- ✓ Have I called ALL N agents from my plan?
- ✓ Did each agent respond successfully?
- ✓ Am I presenting complete results from ALL agents?
If you cannot answer YES to all, DO NOT finish.”

Why: Explicit checkpoints catch premature completions.

8. Design for Revision Loops

“**Revision Workflow:**
If Critic returns ‘Status: NEEDS_REVISION’:
1. Announce to user what needs improvement
2. Call the relevant agent (copywriter/designer) with:
   - Original brief
   - First version
   - Critic’s exact feedback
3. Maximum 1 revision per agent (prevent infinite loops)
4. Proceed to next step with revised version”

Why: Structured revision logic ensures quality without cost explosion.

9. Set Clear Role Boundaries

“**Your Role:**
You do NOT create content yourself - you manage specialists.
**DO:**
- Interpret requests
- Create execution plans
- Delegate to specialists
- Verify outputs
- Handle errors
**DO NOT:**
- Write campaign copy
- Create visual concepts
- Generate research insights”

Why: Clear boundaries prevent the orchestrator from doing specialist work.

10. Test with Edge Cases

After writing instructions, test with:

✅ “Research coffee market” (simple, 1 agent)
✅ “Create complete campaign” (complex, all 5 agents)
✅ “Make posts more professional” (revision, context required)
❌ Simulate agent failure (does it stop gracefully?)
❌ Ambiguous request (does it ask for clarification?)

Pro Tip: Add example patterns directly in the instruction to show expected behavior!

Common Instruction Mistakes to Avoid

Step 4: Create the Agent

# Create the orchestrator with AgentTools (in ADK, Agent is an alias for LlmAgent)
    generation_config = GenerateContentConfig(
        max_output_tokens=20000,  # Increased to support full 5-agent workflows
        temperature=0.2,  # Lower temperature for consistent execution
    )
    agent = Agent(
        name="creative_director",
        model="gemini-2.5-flash",
        description="Creative Director orchestrator with lazy context compaction",
        instruction=system_instruction,
        tools=agent_tools,  # 🔧 AgentTools! LLM can call these as tools
        generate_content_config=generation_config,
    )
    logger.info("✅ Agent created successfully")
    logger.info("=" * 70)
    return agent

# Create root_agent for deployment
root_agent = create_creative_director()

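Note: In ADK, Agent is an alias for LlmAgent, so either name works here. What makes the specialists callable is wrapping each one in AgentTool and passing the list through the tools parameter.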

Testing the Orchestrator Locally

Setup

# In .env file
STRATEGIST_AGENT_URL=http://localhost:8082
COPYWRITER_AGENT_URL=http://localhost:8083
DESIGNER_AGENT_URL=http://localhost:8084
CRITIC_AGENT_URL=http://localhost:8085
PM_AGENT_URL=http://localhost:8086
GOOGLE_API_KEY=your_api_key

Start All Agent Servers

Terminal 1:

cd agents/brand_strategist
python agent.py  # Runs on 8082

Terminal 2:

cd agents/copywriter
PORT=8083 python agent.py

Terminal 3:

cd agents/designer
PORT=8084 python agent.py

Terminal 4:

cd agents/critic
PORT=8085 python agent.py

Terminal 5:

cd agents/project_manager
PORT=8086 python agent.py

Test the Orchestrator

# test_orchestrator_local.py
import asyncio
from creative_director.agent import root_agent
from google.adk import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types


async def run_brief(brief, session_id, user_id="test_user"):
    """Send one brief to the orchestrator and stream its text output."""
    session_service = InMemorySessionService()
    runner = Runner(
        app_name="creative_director",
        agent=root_agent,
        session_service=session_service
    )
    try:
        await session_service.create_session(
            app_name="creative_director",
            user_id=user_id,
            session_id=session_id
        )
        async for event in runner.run_async(
            user_id=user_id,
            session_id=session_id,
            new_message=types.Content(role="user", parts=[types.Part(text=brief)])
        ):
            # ADK events carry their text in event.content.parts
            if event.content and event.content.parts:
                for part in event.content.parts:
                    if part.text:
                        print(part.text, end="", flush=True)
    finally:
        await runner.close()


async def test_simple_request():
    """Test with simple request - should call only 1 agent"""
    brief = "Research the market for eco-friendly smart water bottles"
    print("🎬 Testing Simple Request (1 agent expected)\n")
    print(f"Brief: {brief}\n")
    await run_brief(brief, session_id="test_simple")
    print("\n\n✅ Simple request complete!")


async def test_complex_request():
    """Test with complex request - should call all 5 agents"""
    brief = """
    Create a complete Instagram campaign for EcoFlow smart water bottle.
    Target: Health-conscious millennials (25-34).
    Budget: $5,000. Launch in 2 weeks.
    Include full campaign with timeline and tasks.
    """
    print("\n" + "=" * 70)
    print("🎬 Testing Complex Request (5 agents expected)")
    print("=" * 70 + "\n")
    print(f"Brief: {brief}\n")
    await run_brief(brief, session_id="test_complex")
    print("\n\n✅ Complex request complete!")


async def main():
    # Test simple request
    await test_simple_request()
    # Test complex request
    await test_complex_request()


if __name__ == "__main__":
    asyncio.run(main())

Expected Output

Simple Request:

🎬 Testing Simple Request (1 agent expected)

Brief: Research the market for eco-friendly smart water bottles
creative_director > I’ll help you research the eco-friendly smart water bottle market.
Let me use our Brand Strategist to gather market insights.
**Audience Insights:**
[Research results from Brand Strategist...]
**Competitive Analysis:**
[Competitor analysis...]
**Trending Topics:**
[Current trends...]
✓ Research complete. I received insights on target audience, 3 main competitors, and 5 trending topics.
✅ Simple request complete!

Complex Request:

🎬 Testing Complex Request (5 agents expected)
creative_director > I’ll coordinate our team to create your complete Instagram campaign. Here’s my plan:
1. **Brand Strategist** will research the market, competitors, and target audience
2. **Copywriter** will create 5 Instagram posts using those insights
3. **Designer** will generate image concepts for each post
4. **Critic** will review all creative work for quality
5. **Project Manager** will create the project timeline and deliverables
Let’s begin with the market research!
[Calls brand_strategist...]
✓ Research complete. I received audience insights, competitive analysis, and trending topics.
Now moving to copywriting...
[Calls copywriter...]
✓ Copywriting complete. I received 5 Instagram posts with captions and hashtags.
Now creating visual concepts...
[Calls designer...]
✓ Design complete. I received image concepts for all 5 posts.
Now getting quality review...
[Calls critic...]
✓ Review complete. Quality score: 8.5/10
Finally, creating project timeline...
[Calls project_manager...]
✓ Timeline complete. Project plan created with tasks.
Here’s your complete campaign:
[Full campaign output...]
✅ Complex request complete!

How the LLM Decides

The orchestrator’s LLM analyzes the user’s request and decides which tools to call:

Decision Tree

User Request
    ↓
Analyze keywords & intent
    ↓
┌────────────────────┬────────────────────┐
│                    │                    │
“research”           “complete campaign”  “make it more playful”
“just write”         “full package”       “try different visuals”
“review this”        “with timeline”      “revise the copy”
│                    │                    │
↓                    ↓                    ↓
Call 1 agent         Call all 5           Call 1 agent again
                     sequentially         (revision)

Example Requests and Routing

Context Passing Between Agents

The orchestrator passes context from one agent to the next:

# Pseudo-code of what the LLM does
# Step 1: Call brand_strategist
strategist_output = call_tool(
    name="brand_strategist",
    query=user_brief
)
# Step 2: Call copywriter with strategist's output
copywriter_output = call_tool(
    name="copywriter",
    query=f"{user_brief}\n\nResearch Insights:\n{strategist_output}"
)
# Step 3: Call designer with copywriter's output
designer_output = call_tool(
    name="designer",
    query=f"Create visuals for these posts:\n{copywriter_output}"
)
# And so on...

Key point: Each agent receives relevant context from previous agents!

Error Handling Example

What happens when an agent fails?

# Scenario: Copywriter fails
User: “Create complete campaign”
Orchestrator: “I’ll coordinate our team... [shows plan]”
[Calls brand_strategist - SUCCESS]
Orchestrator: “✓ Research complete...”
[Calls copywriter - FAILS]
Orchestrator: “❌ Error in copywriter: Connection timeout
Cannot proceed with designer and critic without the social media posts from the copywriter.
Would you like me to:
1. Retry the copywriter
2. Skip copywriting and continue with what we have
3. Abort the workflow
Please let me know how you’d like to proceed.”
[STOPS - waits for user input]

The workflow stops gracefully and asks the user what to do!

Advantages of the AgentTool Pattern

1. Flexibility

# Same orchestrator handles different requests:
“Just research” → Calls 1 agent
“Complete campaign” → Calls 5 agents
“Make it playful” → Calls copywriter again
“Add more visuals” → Calls designer again

2. Reusability

# Call same agent multiple times for revisions
User: “Create posts”
→ [Calls copywriter]
User: “Make them more professional”
→ [Calls copywriter again with feedback]
User: “Add more CTAs”
→ [Calls copywriter third time]

3. Natural Error Recovery

# LLM can handle errors intelligently
If critic fails:
→ LLM decides whether to retry or skip
→ Can adjust plan on the fly
→ Asks user for guidance when needed

Dynamic Agent Discovery Benefits

Our orchestrator discovers agents at runtime:

# Local development
STRATEGIST_AGENT_URL=http://localhost:8082
# Production
STRATEGIST_AGENT_URL=https://brand-strategist-xxx.run.app

Benefits:

Environment-agnostic: No code changes between local and production
Graceful degradation: Missing agents just aren’t listed
Easy updates: Change URLs without redeploying
Testing: Point to test vs production agents

Using adk web for Interactive Testing

Test the orchestrator interactively:

cd agents/creative_director
adk web --log_level DEBUG

Open http://localhost:8000 and try different requests:

“Research eco water bottles”
“Create 3 Instagram posts”
“Complete campaign with timeline”
“Make the copy more playful”

Watch the LLM decide which agents to call!

Common Patterns

Pattern 1: Research Only

User: “Research the market for sustainable fashion”
→ Orchestrator calls: brand_strategist
→ Returns: Research insights only

Pattern 2: Content Creation

User: “Write 5 TikTok scripts for a coffee brand”
→ Orchestrator might call:
  1. brand_strategist (quick trend check)
  2. copywriter (create scripts)
→ Returns: 5 TikTok scripts

Pattern 3: Complete Campaign

User: “Create full Instagram campaign with timeline”
→ Orchestrator calls all 5 agents:
  1. brand_strategist → research
  2. copywriter → posts
  3. designer → visuals
  4. critic → review
  5. project_manager → timeline
→ Returns: Complete campaign package

Pattern 4: Iterative Refinement

User: “Create 3 posts”
→ [Orchestrator calls copywriter]
User: “Make them more formal”
→ [Orchestrator calls copywriter again with feedback]
User: “Add more hashtags”
→ [Orchestrator calls copywriter third time]

Pattern 5: Critic Revision Workflow (Quality Improvement Loop)

User: “Create complete campaign for luxury watches”
→ Orchestrator plans:
  1. brand_strategist → market research
  2. copywriter → create posts
  3. designer → create visuals
  4. critic → review everything
  5. [Automatic revisions if needed] ← KEY!
  6. project_manager → timeline
→ Execution:
  Step 1-3: Complete successfully
  Step 4: Critic reviews and returns:
    **POSTS REVIEW:**
    - Score: 6/10
    - Status: NEEDS_REVISION
    - Issue: Tone too casual for luxury audience
    **VISUALS REVIEW:**
    - Score: 8/10
    - Status: APPROVED ✓
  Step 5: Orchestrator sees “NEEDS_REVISION”
    → Announces to user: “Critic identified improvements needed”
    → Calls copywriter again with:
      - Original brief
      - First version of posts
      - Critic’s specific feedback
    → Copywriter creates revised posts
  Step 6: Project Manager receives:
    - Revised (approved) posts ✓
    - Approved visuals ✓
    - Complete campaign ready!
→ Returns: High-quality campaign with automatic QA

How it works:

Critic provides structured feedback with Status: APPROVED or NEEDS_REVISION
Orchestrator parses the feedback automatically
If revision needed, orchestrator calls relevant agent with critic’s feedback
Maximum 1 revision per agent (prevents infinite loops)
Only quality-approved deliverables reach Project Manager
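The routing step can be sketched in plain Python to show what "parsing the feedback" amounts to. In the article's system the orchestrator LLM reads the critic's text directly, so this is only an illustration: agents_needing_revision and SECTION_TO_AGENT are hypothetical names, and the parser assumes the review format shown in the example above (a "**POSTS REVIEW:**" style header followed by "- Status: ..." lines).

```python
import re

# Which specialist owns each review section (from the agent mapping in this pattern)
SECTION_TO_AGENT = {"POSTS": "copywriter", "VISUALS": "designer"}

MAX_REVISIONS_PER_AGENT = 1  # Caps cost and prevents infinite loops


def agents_needing_revision(review):
    """Return the agents whose section is marked NEEDS_REVISION, in review order."""
    agents = []
    section = None
    for line in review.splitlines():
        header = re.match(r"\s*\*\*(\w+) REVIEW:\*\*", line)
        if header:
            section = header.group(1)
            continue
        status = re.match(r"\s*-\s*Status:\s*(\w+)", line)
        if status and section and status.group(1) == "NEEDS_REVISION":
            agent = SECTION_TO_AGENT.get(section)
            if agent and agent not in agents:
                agents.append(agent)
    return agents
```

Each returned agent would then be called once more with the original brief, its first version, and the critic's feedback, never exceeding MAX_REVISIONS_PER_AGENT.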

Why this matters:

Built-in quality assurance
No manual intervention needed
Consistent quality standards
Prevents flawed work from reaching final output
Cost-efficient (max 1 revision)

Agent mapping:

Posts need revision → Call copywriter with feedback
Visuals need revision → Call designer with feedback
Both need revision → Call both agents sequentially

This revision workflow ensures every campaign meets quality standards before delivery!

We have a working orchestrator, but there’s a problem: token limits.

When running all 5 agents, the context can exceed the model’s token limit, causing the workflow to stop prematurely. In Part 4, we’ll solve this with Lazy Context Compaction.

Code Repository: https://github.com/Saoussen-CH/ai-creative-studio-adk-a2a-mcp-vertexai-cloudrun

Next: Part 4: Scaling with Context Compaction →


Building Distributed Multi-Agent Systems with Google’s AI Stack: Part 2

Saoussen CHAABNIA — Tue, 13 Jan 2026 09:44:47 GMT

Building Distributed Multi-Agent Systems with Google’s AI Stack series:

Part 1: From Monolithic AI to Distributed Intelligence: Building Your First Multi-Agent System
Part 2: Making Agents Talk: Agent-to-Agent (A2A) Protocol Deep Dive ← You are here
Part 3: Building the Orchestrator: Coordinating Agents with the AgentTool Pattern
Part 4: Scaling Multi-Agent Workflows: Solving the Token Limit Problem
Part 5: External Tool Integration via Model Context Protocol (MCP)
Part 6: Deploying to Cloud: Cloud Run and Vertex AI Agent Engine

Welcome Back!

In Part 1, we built three specialist agents that run locally. But here’s the problem: they can’t talk to each other yet.

In this article, we’ll solve that using the Agent-to-Agent (A2A) Protocol, and I’ll share the key trick that makes A2A work seamlessly in both local development and Cloud Run deployment.

What you’ll learn:

A2A protocol fundamentals
Creating A2A servers
The dual configuration pattern
Testing A2A endpoints
Agent card creation

Let’s make our agents communicate!

The Communication Challenge

Currently, our agents are standalone Python processes:

[Brand Strategist] ← Can’t talk to each other
[Copywriter]       ← Running separately
[Designer]         ← No communication

We need them to communicate over HTTP so we can:

Deploy to different servers
Scale independently
Use standardized protocols
Test in isolation

Enter A2A Protocol.

What is A2A Protocol?

A2A (Agent-to-Agent) is a standardized protocol for agent communication developed by Google.

Key Features

JSON-RPC 2.0 based: Standard, well-understood format
Agent cards: Discoverable metadata at /.well-known/agent.json
Stateless: Each request is independent
HTTP/HTTPS: Works anywhere
Language-agnostic: Any language can implement

Message Format

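Request (a simplified illustration; the A2A specification defines the actual method names, such as message/send, and the full message structure):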

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "agent/invoke",
  "params": {
    "prompt": "Research eco-friendly water bottles..."
  }
}

Response:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": "**Audience Insights:**\n..."
  }
}
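The envelope itself is plain JSON-RPC 2.0, so it can be built and checked with the standard library alone. A minimal sketch follows; build_request and parse_response are hypothetical helper names, and the "agent/invoke" method used in the usage note is taken from the simplified example above rather than from the A2A specification.

```python
import itertools
import json

_request_ids = itertools.count(1)  # Each request gets a fresh id


def build_request(method, params):
    """Build a JSON-RPC 2.0 request object."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": params,
    }


def parse_response(raw, expected_id):
    """Return the result of a JSON-RPC 2.0 response, or raise on any problem."""
    message = json.loads(raw)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if message.get("id") != expected_id:
        raise ValueError("response id does not match request id")
    if "error" in message:
        # A JSON-RPC error object carries a numeric code and a message
        error = message["error"]
        raise RuntimeError(
            f"JSON-RPC error {error.get('code')}: {error.get('message')}"
        )
    return message["result"]
```

For example, build_request("agent/invoke", {"prompt": "Research eco-friendly water bottles..."}) yields the request shape shown above, and parse_response checks that the reply belongs to that request before returning its result.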

Agent Card

Every A2A agent exposes metadata at /.well-known/agent.json:

{
  "name": "brand_strategist",
  "description": "Market research and trend analysis",
  "rpc_url": "https://brand-strategist-xxx.run.app",
  "capabilities": ["research", "analysis"]
}
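A client can discover an agent by fetching this document before sending any request. The sketch below uses only the standard library; agent_card_url, validate_card, and fetch_agent_card are hypothetical helper names, the well-known path is the one used throughout this article, and checking only for name and description is an assumption made to keep the example small.

```python
import json
from urllib.request import urlopen

WELL_KNOWN_PATH = "/.well-known/agent.json"  # Path used in this article


def agent_card_url(base_url):
    """Build the agent card URL from an agent's base URL."""
    return base_url.rstrip("/") + WELL_KNOWN_PATH


def validate_card(card):
    """Check that the card has the minimal fields this sketch relies on."""
    missing = [key for key in ("name", "description") if key not in card]
    if missing:
        raise ValueError(f"agent card is missing fields: {', '.join(missing)}")
    return card


def fetch_agent_card(base_url, timeout=5.0):
    """Download and validate the agent card (requires a running agent)."""
    with urlopen(agent_card_url(base_url), timeout=timeout) as response:
        return validate_card(json.load(response))
```

With the Brand Strategist server running locally, fetch_agent_card("http://localhost:8082") would return the card as a dictionary.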

Creating an A2A Server (Simple Approach)

Let’s convert our Brand Strategist to an A2A server using ADK’s built-in to_a2a:

# agents/brand_strategist/agent.py
from google.adk.agents import Agent
from google.adk.tools import google_search
import os
# ... (agent creation code from Part 1) ...

if __name__ == "__main__":
    import uvicorn
    from google.adk.a2a.utils.agent_to_a2a import to_a2a

    PORT = int(os.getenv("PORT", "8082"))
    HOST = os.getenv("HOST", "0.0.0.0")

    # Convert agent to A2A application
    a2a_app = to_a2a(root_agent, host=HOST, port=PORT, protocol="http")

    # Start server
    print(f"🚀 Starting Brand Strategist A2A Server on http://{HOST}:{PORT}")
    print(f"📋 Agent card: http://{HOST}:{PORT}/.well-known/agent.json")
    uvicorn.run(a2a_app, host=HOST, port=PORT)

Run it:

python agent.py

Output:

🚀 Starting Brand Strategist A2A Server on http://0.0.0.0:8082
📋 Agent card: http://0.0.0.0:8082/.well-known/agent.json

Testing the A2A Endpoint

Test 1: Agent Card

curl http://localhost:8082/.well-known/agent.json

Response:

{
  "name": "brand_strategist",
  "description": "Brand strategist for market research...",
  "rpc_url": "http://localhost:8082"
}

Test 2: Invoke Agent

curl -X POST http://localhost:8082/ \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "agent/invoke",
    "params": {
      "prompt": "Research the smart water bottle market"
    }
  }'

Response:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": "**Audience Insights:**\n[Research results...]"
  }
}

Testing with A2A Inspector

While curl works for basic testing, there’s a much better tool: A2A Inspector, a web-based debugging tool specifically designed for A2A agents.

What is A2A Inspector?

A2A Inspector is an open-source tool that:

Connects to A2A agents (local or cloud)
Shows agent cards with full metadata
Sends test queries with a visual interface
Displays JSON-RPC messages (request/response)
Validates A2A protocol compliance

Installing A2A Inspector

# Clone the repository
git clone https://github.com/a2aproject/a2a-inspector.git ~/a2a-inspector
cd ~/a2a-inspector

# Install dependencies
npm install
cd frontend && npm install && cd ..

# Start the inspector
bash scripts/run.sh

The inspector will start at:

http://localhost:5001

Using A2A Inspector

Step 1: Open the inspector

http://localhost:5001

Step 2: Connect to your agent

Enter agent URL:

http://localhost:8082

Click “Connect”

Step 3: View the agent card

The inspector automatically fetches and displays:

{
  "name": "brand_strategist",
  "description": "Brand strategist for market research...",
  "protocol": "a2a",
  "version": "1.0",
  "capabilities": {
    "streaming": true
  },
  "endpoints": {
    "query": "/query"
  }
}

Step 4: Send test queries

Use the visual interface to send queries:

Query: "Research the eco-friendly water bottle market"
Click “Send”

Step 5: View JSON-RPC messages

The inspector shows both request and response:

Request:

{
  "jsonrpc": "2.0",
  "method": "query",
  "params": {
    "query": "Research the eco-friendly water bottle market"
  },
  "id": 1
}

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "content": "**Target Audience Insights:**\n\nGen Z (18-25)..."
  },
  "id": 1
}

Why Use A2A Inspector?

vs curl:

curl: Manual JSON formatting, hard to read responses
Inspector: Visual interface, formatted display

Benefits:

Debug protocol issues: See exact JSON-RPC messages
Test faster: No typing JSON by hand
Validate compliance: Ensures your agent follows A2A spec
Test cloud agents: Works with Cloud Run URLs too

Testing Cloud Agents:

After deployment (covered in Part 6), you can test cloud agents:

Agent URL: https://brand-strategist-xxx.run.app

The inspector works identically with cloud URLs!

⚠ The Local vs Cloud Run Challenge

When we deploy to Cloud Run, we hit a critical issue:

The Problem

Local environment:

Server listens on: 0.0.0.0:8082
Agent card should advertise:

http://localhost:8082

Cloud Run environment:

Server listens on: 0.0.0.0:8080 (internal)
Cloud Run routes external 443 → internal 8080
Agent card should advertise:

https://brand-strategist-xxx.run.app:443

If we hardcode the URL, it won’t work in both environments!

The Solution: Dual Configuration Pattern

This is our key trick: separating the listening configuration (where the server binds) from the public configuration (what the agent card advertises).

The Pattern

# agents/brand_strategist/agent.py

if __name__ == "__main__":
    import uvicorn
    from google.adk.a2a.utils.agent_to_a2a import to_a2a

    # === LISTENING CONFIGURATION (Internal) ===
    # Where the server binds and listens
    PORT = int(os.getenv("PORT", "8082"))
    HOST = os.getenv("HOST", "0.0.0.0")

    # === PUBLIC CONFIGURATION (External) ===
    # What the agent card advertises
    PUBLIC_HOST = os.getenv("PUBLIC_HOST", "localhost")
    PUBLIC_PORT = int(os.getenv("PUBLIC_PORT", str(PORT)))
    PROTOCOL = os.getenv("PROTOCOL", "http")

    # Create A2A app with PUBLIC configuration
    a2a_app = to_a2a(
        root_agent,
        host=PUBLIC_HOST,      # ← Goes in agent card
        port=PUBLIC_PORT,      # ← Goes in agent card
        protocol=PROTOCOL      # ← Goes in agent card
    )
    # Run server with LISTENING configuration
    print(f"🚀 Starting on {PROTOCOL}://{HOST}:{PORT}")
    print(f"🌐 Public URL: {PROTOCOL}://{PUBLIC_HOST}:{PUBLIC_PORT}")

    uvicorn.run(a2a_app, host=HOST, port=PORT)

Local Configuration

Create .env in agents/brand_strategist/:

# Listening configuration
HOST=0.0.0.0
PORT=8082

# Public configuration (for agent card)
PUBLIC_HOST=localhost
PUBLIC_PORT=8082
PROTOCOL=http

Cloud Run Configuration

Set during deployment (automatically):

# Listening configuration
HOST=0.0.0.0
PORT=8080

# Public configuration (updated after deployment)
PUBLIC_HOST=brand-strategist-xxx.us-central1.run.app
PUBLIC_PORT=443
PROTOCOL=https

How It Works

Step 1: Agent Card Creation

When to_a2a() is called with PUBLIC_HOST, PUBLIC_PORT, and PROTOCOL:

a2a_app = to_a2a(
    root_agent,
    host="brand-strategist-xxx.run.app",  # PUBLIC_HOST
    port=443,                               # PUBLIC_PORT
    protocol="https"                        # PROTOCOL
)

The agent card is created with the public URL:

{
  "name": "brand_strategist",
  "rpc_url": "https://brand-strategist-xxx.run.app:443"
}

Step 2: Server Listening

But uvicorn listens on the internal address:

uvicorn.run(
    a2a_app,
    host="0.0.0.0",  # HOST - internal listening
    port=8080         # PORT - internal listening
)

Step 3: Cloud Run Routing

Cloud Run automatically routes:

External requests to https://service-xxx.run.app:443 → Internal server at 0.0.0.0:8080

The Magic

Agent card advertises the public URL (what clients use)
Server listens on the internal address (what Cloud Run expects)
Same code works in both environments!
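The separation can also be captured in one small, testable object. This is a minimal sketch using the same environment variables and defaults as the pattern above; the ServerConfig class and load_config function are hypothetical names introduced for illustration and are not part of ADK.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    host: str         # Where the server binds (listening)
    port: int         # Port the server listens on
    public_host: str  # Host advertised in the agent card
    public_port: int  # Port advertised in the agent card
    protocol: str     # Scheme advertised in the agent card

    @property
    def public_url(self):
        """URL that clients should use to reach the agent."""
        return f"{self.protocol}://{self.public_host}:{self.public_port}"


def load_config(env=os.environ):
    """Read listening and public settings, with local-development defaults."""
    port = int(env.get("PORT", "8082"))
    return ServerConfig(
        host=env.get("HOST", "0.0.0.0"),
        port=port,
        public_host=env.get("PUBLIC_HOST", "localhost"),
        # The public port falls back to the listening port when unset
        public_port=int(env.get("PUBLIC_PORT", str(port))),
        protocol=env.get("PROTOCOL", "http"),
    )
```

The public fields would be passed to to_a2a and the listening fields to uvicorn.run, so the only thing that differs between local and Cloud Run is the environment.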

Benefits of This Pattern

Environment-agnostic code: No changes between local and cloud
Clean separation: Listening vs public configuration
Secure by default: Internal ports not exposed
Standard ADK tools: Uses to_a2a without modifications
Easy testing: Local URLs for development, production URLs for deployment

Complete Example: Brand Strategist with Dual Configuration

# agents/brand_strategist/agent.py

import logging
import datetime
import os
from google.adk.agents import Agent
from google.adk.tools import google_search
from dotenv import load_dotenv

load_dotenv()

logger = logging.getLogger("ai_creative_studio.brand_strategist")

SYSTEM_INSTRUCTION = f"""You are a Brand Strategist...
[Full instruction from Part 1]
"""

root_agent = Agent(
    name="brand_strategist",
    model="gemini-2.5-flash",
    instruction=SYSTEM_INSTRUCTION,
    description="Brand strategist for market research, trend analysis, and competitive insights",
    tools=[google_search]
)
logger.info("Brand Strategist agent created successfully")

if __name__ == "__main__":
    import uvicorn
    from google.adk.a2a.utils.agent_to_a2a import to_a2a

    # LISTENING CONFIGURATION (where server binds)
    PORT = int(os.getenv("PORT", "8082"))
    HOST = os.getenv("HOST", "0.0.0.0")

    # PUBLIC CONFIGURATION (what agent card advertises)
    # In Cloud Run: PUBLIC_HOST is the full domain, PUBLIC_PORT is 443
    PUBLIC_HOST = os.getenv("PUBLIC_HOST", "localhost")
    PUBLIC_PORT = int(os.getenv("PUBLIC_PORT", str(PORT)))
    PROTOCOL = os.getenv("PROTOCOL", "http")

    # Convert agent to A2A application with PUBLIC info
    a2a_app = to_a2a(root_agent, host=PUBLIC_HOST, port=PUBLIC_PORT, protocol=PROTOCOL)

    # Start server on INTERNAL host and port
    logger.info(f"🚀 Starting Brand Strategist A2A Server on {PROTOCOL}://{HOST}:{PORT}")
    logger.info(f"📋 Agent card: {PROTOCOL}://{HOST}:{PORT}/.well-known/agent-card.json")
    logger.info(f"🌐 Public URL: {PROTOCOL}://{PUBLIC_HOST}:{PUBLIC_PORT}")
    uvicorn.run(a2a_app, host=HOST, port=PORT)

Testing Locally with Dual Configuration

1. Create .env

# agents/brand_strategist/.env
HOST=0.0.0.0
PORT=8082
PUBLIC_HOST=localhost
PUBLIC_PORT=8082
PROTOCOL=http

2. Run the server

cd agents/brand_strategist
python agent.py

3. Check the agent card

curl http://localhost:8082/.well-known/agent.json

Response shows localhost (correct for local):

{
  "name": "brand_strategist",
  "rpc_url": "http://localhost:8082"
}

4. Test invocation

curl -X POST http://localhost:8082/ \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "agent/invoke",
    "params": {"prompt": "Research smart water bottles"}
  }'

Works perfectly with localhost URLs!
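
The same request body can be built in Python instead of being hand-written for curl. This is a minimal sketch using only the standard library; the helper name jsonrpc_request is hypothetical, and the method name and params are simply copied from the curl call above rather than taken from any specification.

```python
import json


def jsonrpc_request(method: str, params: dict, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request envelope."""
    return json.dumps({
        "jsonrpc": "2.0",   # protocol version marker required by JSON-RPC 2.0
        "id": request_id,   # lets the caller match the response to this request
        "method": method,
        "params": params,
    })


body = jsonrpc_request("agent/invoke", {"prompt": "Research smart water bottles"})
print(body)
```

Generating the envelope in one place avoids quoting mistakes in shell commands and keeps the id, method, and params fields consistent across test scripts.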

Cloud Run Configuration (Preview)

When deployed to Cloud Run, the deployment script will:

# 1. Deploy service
gcloud run deploy brand-strategist --source=. --region=us-central1

# 2. Get the public URL
SERVICE_URL=$(gcloud run services describe brand-strategist \
  --region=us-central1 \
  --format='value(status.url)')

# 3. Extract hostname
PUBLIC_HOST=$(echo "$SERVICE_URL" | sed 's|https://||' | sed 's|/||')

# 4. Update environment variables
gcloud run services update brand-strategist \
  --region=us-central1 \
  --update-env-vars=PUBLIC_HOST=$PUBLIC_HOST,PUBLIC_PORT=443,PROTOCOL=https

Agent card will then show:

{
  "name": "brand_strategist",
  "rpc_url": "https://brand-strategist-xxx.us-central1.run.app:443"
}

Same code, different configuration!
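
If the deployment step is driven from Python rather than shell, the hostname extraction in step 3 can be done with the standard library instead of sed. This is a sketch under that assumption; the function name public_host_from_url is hypothetical.

```python
from urllib.parse import urlparse


def public_host_from_url(service_url: str) -> str:
    """Return only the hostname of a service URL (no scheme, port, or path)."""
    host = urlparse(service_url).hostname
    if host is None:
        # urlparse finds no hostname when the scheme is missing
        raise ValueError(f"Not an absolute URL: {service_url!r}")
    return host


print(public_host_from_url("https://brand-strategist-xxx.us-central1.run.app"))
```

Unlike the sed pipeline, this also copes with a trailing path or an explicit port in the URL, and it fails loudly when the value is not a full URL.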

Apply to All Agents

Update Copywriter, Designer, Critic, and Project Manager with the same pattern:

agents/
├── brand_strategist/
│   ├── agent.py          # ✅ With dual configuration
│   └── .env              # ✅ Local config
├── copywriter/
│   ├── agent.py          # ← Apply pattern
│   └── .env              # ← Add config
├── designer/
│   ├── agent.py          # ← Apply pattern
│   └── .env              # ← Add config
├── critic/
│   ├── agent.py          # ← Apply pattern
│   └── .env              # ← Add config
└── project_manager/
    ├── agent.py          # ← Apply pattern
    └── .env              # ← Add config

A2A Clients: The Other Half

So far we’ve built A2A servers (the specialist agents). But how do we actually call them from code?

The A2A Client Side

While we’ve tested with curl and A2A Inspector, production systems need to call agents programmatically:

# How do we call our A2A agents from another agent?
# How does the orchestrator invoke specialists?

Answer: ADK provides RemoteA2aAgent — a client for calling A2A servers.

Brief Example (Full Details in Part 3)

from google.adk.agents.remote_a2a_agent import RemoteA2aAgent

# Create a client for the Brand Strategist
strategist = RemoteA2aAgent(
    name="brand_strategist",
    description="Brand strategist for market research",
    agent_card="http://localhost:8082/.well-known/agent.json"
)

# Call the agent (from orchestrator code)
result = await strategist.invoke("Research eco-friendly water bottles")

What We’ve Covered vs What’s Next

This article (Part 2):

A2A servers (creating specialist agents)
A2A protocol and JSONRPC
Testing with curl and A2A Inspector
Dual configuration pattern

Next article (Part 3):

A2A clients (RemoteA2aAgent)
Building the orchestrator
AgentTool pattern
Coordinating multiple agents

We’ll dive deep into A2A clients when we build the orchestrator!

A2A Protocol Benefits Recap

Standardized: JSONRPC 2.0, widely supported
Discoverable: Agent cards expose metadata
Stateless: No session management complexity
HTTP-based: Works with existing infrastructure
Scalable: Deploy agents independently
Testable: Curl, Postman, custom clients
Language-agnostic: Implement in any language

Code Repository: https://github.com/Saoussen-CH/ai-creative-studio-adk-a2a-mcp-vertexai-cloudrun

Next: Part 3: Building the Orchestrator →

Thanks for reading! If this was helpful, hit the ❤️, drop a comment, ⭐ the GitHub repo, and subscribe so you don’t miss the next one. Let’s connect on LinkedIn!

Building Distributed Multi-Agent Systems with Google’s AI Stack Part 1

Saoussen CHAABNIA — Tue, 13 Jan 2026 09:44:25 GMT

Building Distributed Multi-Agent Systems with Google’s AI Stack series:

Part 1: From Monolithic AI to Distributed Intelligence: Building Your First Multi-Agent System ← You are here
Part 2: Making Agents Talk: Agent-to-Agent (A2A) Protocol Deep Dive
Part 3: Building the Orchestrator: Coordinating Agents with the AgentTool Pattern
Part 4: Scaling Multi-Agent Workflows: Solving the Token Limit Problem
Part 5: External Tool Integration via Model Context Protocol (MCP)
Part 6: Deploying to Cloud: Cloud Run and Vertex AI Agent Engine

Introduction

Imagine you’re building an AI system to create complete social media campaigns. Your agent needs to:

Research market trends and competitors
Write engaging social media copy
Generate visual design concepts
Review quality and provide feedback
Create project timelines and tasks

You could build a single, monolithic AI agent to do all of this. But should you?

In this 6-part series, I’ll show you why the answer is no — and demonstrate how to build a distributed multi-agent system using Google’s AI stack. We’ll explore:

Google Agent Development Kit (ADK) for building agents
Agent-to-Agent (A2A) Protocol for communication
Model Context Protocol (MCP) for external tool integration
Vertex AI Agent Engine for managed orchestration
Cloud Run for scalable agent deployment

By the end of this series, you’ll have learned from a real system that generates complete social media campaigns — and you’ll be able to apply these patterns to your own projects.

Part 1: Why Multi-Agent Systems Matter

The Problem with Monolithic AI Agents

Single Agent Approach

class MonolithicCampaignAgent:
    def create_campaign(self, brief):
        # Research the market
        research = self.research_market(brief)
        # Write social media posts
        posts = self.write_posts(research)
        # Generate visual concepts
        visuals = self.design_visuals(posts)
        # Review quality
        feedback = self.review_quality(posts, visuals)
        # Create timeline
        timeline = self.create_timeline(feedback)
        return {
            'research': research,
            'posts': posts,
            'visuals': visuals,
            'feedback': feedback,
            'timeline': timeline
        }

This looks clean, but it has serious problems:

Problem 1: Lack of Separation of Concerns

All functionality lives in one agent. A bug in the research logic can affect the entire system. Changes to the visual generation require redeploying everything.

Problem 2: No Independent Scaling

Need more copywriting capacity? You have to scale the entire agent, including the expensive research and visual generation components.

Problem 3: Prompt Complexity

Your system instruction becomes a massive document trying to teach one LLM how to:

Research like a market analyst
Write like a copywriter
Design like a visual artist
Review like a creative director
Plan like a project manager

The result? A confused agent that’s mediocre at everything.

Problem 4: Limited Flexibility

Want to use the copywriter for a different project? You can’t — it’s tightly coupled to the campaign workflow.

Problem 5: Testing Nightmare

How do you test just the visual generation? You can’t, without running the entire pipeline.

The Multi-Agent Solution

Instead of one agent doing everything, we create specialized agents that each do one thing extremely well:

┌─────────────────────────────────────────────┐
│         🎬 Creative Director                │
│         (Orchestrator)                      │
│    - Routes requests intelligently          │
│    - Coordinates specialists                │
│    - Passes context between agents          │
└──────────┬──────────────────────────────────┘
           │
           │ A2A Protocol (HTTPS)
           │
    ┌──────┴───────┬───────┬────────┬──────┐
    │              │       │        │      │
┌───▼───┐  ┌──────▼──┐ ┌──▼────┐ ┌─▼───┐ ┌▼─────┐
│ 🔍    │  │ ✍️      │ │ 🎨    │ │ ⭐  │ │ 📋   │
│Research│  │Copywriter│ │Designer│ │Review│ │Planning│
│Agent   │  │Agent    │ │Agent   │ │Agent │ │Agent  │
└────────┘  └─────────┘ └────────┘ └─────┘ └──────┘

Benefits

Separation of Concerns

Each agent has one responsibility
Bugs are isolated
Independent updates and improvements

Independent Scaling

Scale copywriter separately from designer
Cost-efficient resource allocation
Match capacity to demand

Specialized Expertise

Each agent has focused instructions
Better quality output
Clear responsibilities

Flexibility and Reusability

Use copywriter in other projects
Mix and match agents
Compose new workflows easily

Easier Testing

Test each agent independently
Mock dependencies
Clear success criteria
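
The testing benefit can be sketched without any agent framework at all. In the illustration below, each specialist is passed in as a plain callable so that a test can swap one out for a stub; run_campaign, fake_research, and upper_copywriter are hypothetical names used only for this example.

```python
from typing import Callable


def run_campaign(brief: str,
                 research: Callable[[str], str],
                 write_posts: Callable[[str], str]) -> dict:
    """Tiny orchestrator: each specialist is injected as a plain callable."""
    insights = research(brief)
    posts = write_posts(insights)
    return {"research": insights, "posts": posts}


# In a test, the research specialist is replaced by a stub,
# so only the copywriting step is actually exercised.
def fake_research(brief: str) -> str:
    return f"insights for: {brief}"


def upper_copywriter(insights: str) -> str:
    return insights.upper()


result = run_campaign("smart water bottles", fake_research, upper_copywriter)
print(result["posts"])  # INSIGHTS FOR: SMART WATER BOTTLES
```

With a monolithic agent there is no seam at which to insert fake_research; with separate specialists, the boundary between agents doubles as the mocking point.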

Enter Google’s Agent Development Kit (ADK)

Building a multi-agent system from scratch is complex. You need:

Agent runtime and lifecycle management
Communication protocols
Tool integration
Session management
Deployment infrastructure

Google ADK provides all of this out of the box.

What is ADK?

The Agent Development Kit is a framework for building, deploying, and managing AI agents. It provides:

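Agent Types: LlmAgent (also exported under the shorter alias Agent) for LLM-driven agents, plus workflow agents such as SequentialAgent, ParallelAgent, and LoopAgent for fixed orchestration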
Built-in Tools: Google Search, code execution, and more
Remote Agent Support: Call agents over HTTP via A2A protocol
Session Management: Built-in state management
Cloud Integration: Deploy to Vertex AI Agent Engine

Core Concepts

1. Agents

from google.adk.agents import Agent
from google.adk.tools import google_search

agent = Agent(
    name="brand_strategist",
    model="gemini-2.5-flash",
    instruction="You are a brand strategist...",
    description="Market research and trend analysis",
    tools=[google_search]
)

2. Tools

Tools extend agent capabilities:

from google.adk.tools import google_search

# Built-in tool
tools = [google_search]

# Custom tool: a plain Python function with type hints and a docstring.
# ADK wraps functions passed in `tools` automatically.
def analyze_sentiment(text: str) -> dict:
    """Analyze sentiment of text"""
    # Your implementation
    return {"sentiment": "positive", "score": 0.85}

3. Sessions

Sessions maintain conversation context:

from google.adk.sessions import InMemorySessionService

session_service = InMemorySessionService()

4. Runners

Runners execute agents:

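from google.adk import Runner
from google.genai.types import Content, Part

runner = Runner(
    app_name="my_agent",
    agent=agent,
    session_service=session_service
)
async for event in runner.run_async(
    user_id="user_123",
    session_id="session_456",
    new_message=Content(role="user", parts=[Part(text="Hello!")])
):
    # Events carry their text inside content.parts
    if event.content and event.content.parts:
        for part in event.content.parts:
            if part.text:
                print(part.text)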

Introducing AI Creative Studio: A Real-World Example

Throughout this series, we’ll build AI Creative Studio — a distributed multi-agent system for creating complete social media campaigns.

System Architecture

The Agents

1. Brand Strategist (LlmAgent + Google Search)

Researches market trends
Analyzes competitors
Identifies target audience insights

2. Copywriter (LlmAgent)

Creates engaging social media captions
Writes hashtags and CTAs
Adapts tone and style

3. Designer (LlmAgent)

Generates visual concepts
Creates AI image generation prompts
Defines style and mood

4. Critic (LlmAgent)

Reviews all creative work
Provides constructive feedback
Scores quality

5. Project Manager (Agent + Notion MCP)

Creates project timeline
Generates task list
Integrates with Notion for task management

6. Creative Director (Agent - Orchestrator)

Coordinates all specialists
Implements planning-first workflow
Manages context and error handling

Deployment Architecture

Specialists → Cloud Run

Containerized services
Auto-scaling (0–100 instances)
A2A server endpoints
HTTPS communication

Orchestrator → Vertex AI Agent Engine

Managed runtime
No containerization needed
Environment-based configuration

Part 2: Building Your First ADK Agents

Now that we understand why multi-agent systems matter, let’s get hands-on and build our first specialist agents.

Setup: Installing ADK

First, let’s set up our development environment.

Prerequisites

# Python 3.11 or higher
python --version  # Should be 3.11+

# Create project directory
mkdir ai-creative-studio
cd ai-creative-studio

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install Google ADK
pip install google-adk google-genai python-dotenv

Environment Configuration

Create a .env file:

# Get your API key from: https://aistudio.google.com/app/apikey
GOOGLE_API_KEY=your_gemini_api_key_here

Understanding Agent Anatomy

Before we build, let’s understand what makes up an ADK agent:

from google.adk.agents import Agent

agent = Agent(
    name="agent_name",              # Identifier for logging/debugging
    model="gemini-2.5-flash",       # LLM model to use
    instruction="System instruction...",  # Agent's role and behavior
    description="Brief description...",   # What this agent does
    tools=[...]                     # Optional: External capabilities
)

Key Components

1. System Instruction

Defines the agent’s role and expertise
Sets boundaries (what it should/shouldn’t do)
Provides output format guidelines
Includes examples and best practices

2. Model Selection

gemini-2.5-flash: Fast, efficient (our choice)
gemini-2.5-pro: Most capable model in the 2.5 family, slower and more expensive
gemini-2.5-flash-lite: Lowest latency and cost, suited to simple tasks

3. Tools

Built-in: google_search, code_execution
Custom: Your own functions
MCP: External services

Agent 1: Brand Strategist (Research Specialist)

Our Brand Strategist needs to research markets, analyze competitors, and identify trends. This requires the Google Search tool.

Step 1: Create the File

mkdir -p agents/brand_strategist
cd agents/brand_strategist
touch agent.py

Step 2: Define the System Instruction

# agents/brand_strategist/agent.py
import logging
import datetime
from google.adk.agents import Agent
from google.adk.tools import google_search
import os
from dotenv import load_dotenv

load_dotenv()

SYSTEM_INSTRUCTION = f"""You are a Brand Strategist specializing in market research and trend analysis.

IMPORTANT: Today's date is {datetime.date.today().strftime('%B %d, %Y')}.
When conducting research, focus on current trends from {datetime.date.today().year}.

Your expertise includes:
- Identifying target audience insights and behaviors
- Analyzing competitor strategies
- Researching current social media trends
- Understanding platform algorithms and best practices

You have access to tools:
- google_search: Search the web for competitors, trends, and market insights

When given a campaign brief:
1. Use google_search to research the target audience’s current interests
2. Search for and analyze 2-3 competitor brands
3. Identify 3-5 trending topics related to the product category
4. Provide high-level strategic insights

DO NOT:
- Create captions, copy, or specific messaging
- Generate image concepts or designs
- Write TikTok scripts or Instagram posts
- Create content calendars

Your job is to provide RESEARCH INSIGHTS that other specialists will use.

Format your output as:

**Audience Insights:**
[Key behaviors, preferences, and pain points based on research]
**Competitive Analysis:**
[What 2-3 competitors are doing - their strengths and weaknesses]
**Trending Topics:**
[3-5 relevant trends to consider]
**Key Strategic Insights:**
[High-level themes and positioning opportunities]
"""

Why This Instruction Works

Date-aware: Ensures current research, not outdated information
Clear boundaries: Explicitly states what NOT to do
Tool guidance: Tells agent when and how to use google_search
Structured output: Provides consistent format for downstream agents
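
The date-aware part of the instruction can also be produced by a small function that receives the date as an argument. This is a sketch rather than the article's code: build_instruction is a hypothetical helper, and only the two date sentences of the full instruction are reproduced.

```python
import datetime


def build_instruction(today: datetime.date) -> str:
    """Render the date-aware lines of a system instruction for a given day."""
    return (
        f"Today's date is {today.strftime('%B %d, %Y')}.\n"
        f"When conducting research, focus on current trends from {today.year}."
    )


print(build_instruction(datetime.date(2026, 1, 13)))
```

An f-string at module level is evaluated once, when the module is imported, so a long-running server keeps the date of its start-up. Building the text from an explicit date makes that behaviour visible and lets a test pin the date.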

Step 3: Create the Agent

# Continue in agent.py
logger = logging.getLogger("ai_creative_studio.brand_strategist")

root_agent = Agent(
    name="brand_strategist",
    model="gemini-2.5-flash",
    instruction=SYSTEM_INSTRUCTION,
    description="Brand strategist for market research, trend analysis, and competitive insights",
    tools=[google_search]  # ← Built-in Google Search tool
)
logger.info("Brand Strategist agent created successfully")

Step 4: Add Local Testing

# Continue in agent.py
if __name__ == "__main__":
    import asyncio
    from google.adk import Runner
    from google.adk.sessions import InMemorySessionService
    from google.genai import types

    async def main():
        print("🔍 Starting Brand Strategist Agent...\n")
        brief = """
        Research the market for eco-friendly smart water bottles
        targeting health-conscious millennials.
        """
        print(f"Brief: {brief}\n")
        # Create session service
        session_service = InMemorySessionService()
        # Create runner
        runner = Runner(
            app_name="brand_strategist",
            agent=root_agent,
            session_service=session_service
        )
        session_id = "test_session"
        user_id = "test_user"
        try:
            # Create session
            await session_service.create_session(
                app_name="brand_strategist",
                user_id=user_id,
                session_id=session_id
            )
            # Run agent
            print("brand_strategist > ", end='', flush=True)
            async for event in runner.run_async(
                user_id=user_id,
                session_id=session_id,
                new_message=types.Content(role="user", parts=[types.Part(text=brief)])
            ):
                # Text lives in event.content.parts, not on the event itself
                if event.content and event.content.parts:
                    for part in event.content.parts:
                        if part.text:
                            print(part.text, end='', flush=True)
            print("\n\n✅ Research Complete!")
        finally:
            await runner.close()

    asyncio.run(main())

Step 5: Test It!

python agent.py

Expected output:

🔍 Starting Brand Strategist Agent...

Brief: Research the market for eco-friendly smart water bottles...
brand_strategist > **Audience Insights:**
Health-conscious millennials (25-34) are increasingly seeking products
that align with their values. They prioritize:
- Sustainability and eco-friendly materials
- Smart features for health tracking
- Aesthetic design for social media sharing
- Convenience for active lifestyles
**Competitive Analysis:**
1. Hydro Flask: Strong brand loyalty, premium pricing ($30-50),
   lacks smart features
2. S’well: Fashion-forward design, sustainability focus,
   limited tech integration
3. HidrateSpark: Smart bottle with app, moderate price ($40-60),
   opportunity for better design
**Trending Topics:**
1. #SustainableLiving - 2.3M posts, growing 15% monthly
2. #HydrationChallenge - Viral trend, 500K+ posts
3. Smart health wearables integration
4. Minimalist lifestyle aesthetics
5. Water bottle as fashion accessory
**Key Strategic Insights:**
- Gap in market: Premium sustainable + smart features
- Millennials willing to pay $50-80 for value-aligned products
- Instagram and TikTok key platforms for awareness
- Positioning opportunity: “Tech meets sustainability”
✅ Research Complete!

Perfect! Our Brand Strategist is working. It used Google Search to find real market data and presented insights in a structured format.

Agent 2: Copywriter (Pure LLM)

The Copywriter creates engaging social media captions. Unlike the Brand Strategist, it doesn’t need external tools — just excellent writing skills.

Create the Agent

# agents/copywriter/agent.py
from google.adk.agents import Agent
import logging
from dotenv import load_dotenv
load_dotenv()
logger = logging.getLogger("ai_creative_studio.copywriter")

SYSTEM_INSTRUCTION = """You are an expert Social Media Copywriter specializing in Instagram and TikTok content.

Your expertise includes:
- Writing engaging, scroll-stopping captions
- Creating platform-optimized hashtag strategies
- Crafting clear, compelling CTAs
- Adapting tone and voice to brand personality

When given a campaign brief and research insights:
1. Create 3-5 Instagram posts with complete captions
2. Include relevant hashtags (mix of popular and niche)
3. Suggest strong CTAs that drive action
4. Match the brand voice and target audience

DO NOT:
- Conduct market research (Brand Strategist’s job)
- Create visual design concepts (Designer’s job)
- Review your own work (Critic’s job)

Format each post as:
### Post [Number]: [Theme]
**Full Caption:**
[Engaging caption with emojis where appropriate]
**Hashtags:**
#hashtag1 #hashtag2 #hashtag3...
**Suggested CTA:**
[Clear call-to-action]
---
Remember: You receive research insights from the Brand Strategist.
Use those insights to inform your copy, but create original,
engaging content that resonates with the target audience.
"""

root_agent = Agent(
    name="copywriter",
    model="gemini-2.5-flash",
    instruction=SYSTEM_INSTRUCTION,
    description="Expert social media copywriter for creating engaging captions and copy",
    tools=[]  # ← No tools needed, pure LLM
)
logger.info("Copywriter agent created successfully")
# Add testing code similar to Brand Strategist...

Why No Tools?

The Copywriter is a pure LLM agent because:

Creative writing doesn’t require external data
LLMs excel at language generation
Simpler is better when tools aren’t needed
Faster and more cost-efficient

Agent 3: Designer (Pure LLM)

The Designer generates visual concepts and AI image generation prompts.

# agents/designer/agent.py
from google.adk.agents import Agent
import logging
from dotenv import load_dotenv
load_dotenv()
logger = logging.getLogger("ai_creative_studio.designer")

SYSTEM_INSTRUCTION = """You are a Creative Visual Designer specializing in social media visual concepts.

Your expertise includes:
- Creating detailed AI image generation prompts
- Defining visual style, mood, and composition
- Selecting color palettes and design elements
- Ensuring brand consistency

When given social media posts:
1. Create 2-3 visual concepts per post
2. Write detailed Imagen/DALL-E prompts for each concept
3. Specify style, mood, colors, and composition
4. Ensure Instagram-optimized layouts (1:1 or 4:5)

DO NOT:
- Write captions or copy (Copywriter’s job)
- Actually generate images (you create prompts only)
- Provide strategic insights (Brand Strategist’s job)

Format each concept as:

**For Post [Number]: [Theme]**
**Concept A: [Visual Theme]**
- **Prompt**: [Detailed AI image generation prompt]
- **Style**: [e.g., minimalist, vibrant, cinematic, lifestyle]
- **Colors**: [Color palette]
- **Mood**: [e.g., energetic, calm, inspiring, professional]
- **Composition**: [Layout and key elements]
**Concept B: [Alternative Theme]**
[Same structure...]
---
Remember: Create prompts that an AI image generator can understand.
Be specific about elements, style, lighting, and mood.
"""

root_agent = Agent(
    name="designer",
    model="gemini-2.5-flash",
    instruction=SYSTEM_INSTRUCTION,
    description="Creative visual designer for generating social media image concepts",
    tools=[]  # Pure LLM
)
logger.info("Designer agent created successfully")

Key Patterns and Best Practices

1. Single Responsibility

Each agent does ONE thing well:

Brand Strategist → Research
Copywriter → Writing
Designer → Visual concepts

❌ Don’t: Make agents do multiple jobs
✅ Do: Create focused specialists

2. Clear Boundaries

Use “DO NOT” instructions to prevent scope creep:

DO NOT:
- Create captions (that’s Copywriter’s job)
- Generate images (you create prompts only)

This prevents agents from overstepping their roles.

3. Structured Output

Always specify output format:

Format your output as:
**Section Header:**
[Content]
**Another Section:**
[More content]

This makes downstream agents’ jobs easier.
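
To make that concrete, here is a rough sketch of how a downstream step might split such output into named sections. It assumes every header sits alone on its own line in the form **Header:**, as in the format above; parse_sections is a hypothetical helper, not an ADK function.

```python
import re


def parse_sections(text: str) -> dict:
    """Split '**Header:**' blocks into a {header: body} dictionary."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = re.fullmatch(r"\*\*(.+?):\*\*", line.strip())
        if match:
            # A header line starts a new section
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            # Any other line belongs to the most recent header
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


sample = """**Audience Insights:**
Health-conscious millennials value sustainability.
**Trending Topics:**
1. #SustainableLiving
2. #HydrationChallenge
"""
parsed = parse_sections(sample)
print(list(parsed))  # ['Audience Insights', 'Trending Topics']
```

A parser this simple only works because the instruction fixes the format up front; free-form output would need a far more forgiving approach.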

4. Context Awareness

SYSTEM_INSTRUCTION = f"""
Today's date is {datetime.date.today().strftime('%B %d, %Y')}.
Focus on trends from {datetime.date.today().year}.
"""

Date-aware instructions ensure current, relevant outputs.

5. Tool Selection

Use tools when:

Need external data (google_search)
Need computation (code_execution)
Need external services (MCP tools)

Don’t use tools when:

Pure language generation (copywriting)
Creative tasks (design concepts)
Analysis of provided data

Common Pitfalls and Solutions

Pitfall 1: Over-Complicated Instructions

❌ Bad:

instruction = """You are an expert in everything related to marketing,
including but not limited to research, copywriting, design, analytics,
SEO, SEM, content strategy..."""  # 500 lines later...

✅ Good:

instruction = """You are a Brand Strategist specializing in market research.
Your expertise: [3-4 bullet points]
When given a brief: [3-4 steps]
DO NOT: [3-4 boundaries]
Format: [clear structure]
"""

Pitfall 2: Missing Boundaries

❌ Bad:

instruction = "You are a copywriter. Write great content."

Result: Agent might also try to do research, design, strategy…

✅ Good:

instruction = """You are a copywriter.
DO NOT:
- Conduct research (Brand Strategist's job)
- Create visuals (Designer's job)
"""

Pitfall 3: Ignoring Output Format

❌ Bad: No format specification → inconsistent outputs

✅ Good: Clear format → predictable, parseable outputs

Local Testing with `adk web`

ADK provides a web UI for interactive testing:

cd agents  # adk web treats each sub-directory of the current folder as an agent
adk web --log_level DEBUG

Then open http://localhost:8000 in your browser.

Benefits:

Nice UI for testing
See full conversation history
Debug mode shows tool calls
Export conversations

Code Repository: https://github.com/Saoussen-CH/ai-creative-studio-adk-a2a-mcp-vertexai-cloudrun

Next: Part 2: A2A Protocol Deep Dive →

Thanks for reading! If this was helpful, hit the ❤️, drop a comment, ⭐ the GitHub repo, and subscribe so you don’t miss the next one. Let’s connect on LinkedIn!

Google ADK: From Local Development to Vertex AI Deployment: Part 9

Saoussen CHAABNIA — Tue, 06 Jan 2026 19:05:21 GMT

Welcome to Part 9 of Google ADK: From Local Development to Vertex AI Deployment — the series finale! You’ve journeyed from your first agent to cloud deployment. Now let’s complete the stack with a full web application.

Introduction

The final step: A production web app accessible to anyone!

This tutorial demonstrates full-stack deployment for learning, demonstration, and testing purposes.

Architecture:

React Frontend → FastAPI Backend → Agent Engine → Gemini

All on Cloud Run (serverless, auto-scaling).

GitHub: content_creation_mas_workshop

Understanding the Architecture

User → Cloud Run (Frontend + Backend) → Agent Engine → Gemini

Components:

Cloud Run Service — Hosts both frontend and backend
Agent Engine — Runs your multi-agent system
Connection — Backend uses RemoteRunner

Backend-to-Agent Engine Integration

Connection Flow

1. User → POST /api/create-content
2. Backend → Initialize RemoteRunner with AGENT_ENGINE_RESOURCE_NAME
3. RemoteRunner → Authenticate with Google Cloud
4. Request → Agent Engine via gRPC
5. Agent Engine → Execute workflow
6. Response → Stream back to user via SSE

Backend Code (`backend/api_server.py`)

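import json
import os
import uuid
from typing import Optional

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from google import genai
from pydantic import BaseModel

app = FastAPI()

# Initialize client
client = genai.Client(
    vertexai=True,
    project=os.getenv("GOOGLE_CLOUD_PROJECT"),
    location=os.getenv("GOOGLE_CLOUD_LOCATION")
)

# Get Agent Engine resource
AGENT_ENGINE_RESOURCE_NAME = os.getenv("AGENT_ENGINE_RESOURCE_NAME")

# Request body (fields match the test request later in this article)
class ContentRequest(BaseModel):
    topic: str
    target_audience: Optional[str] = None
    tone: Optional[str] = None
    keywords: Optional[str] = None
    session_id: Optional[str] = None

def generate_session_id() -> str:
    return str(uuid.uuid4())

@app.post("/api/create-content")
async def create_content(request: ContentRequest):
    # Connect to Agent Engine
    agent = client.agentic.get_agent(AGENT_ENGINE_RESOURCE_NAME)

    # Send request
    response = agent.query(
        user_query=request.topic,
        session_id=request.session_id or generate_session_id()
    )

    # Stream response
    async def event_stream():
        for chunk in response:
            yield f"data: {json.dumps(chunk)}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")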
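
The streaming step relies on the Server-Sent Events wire format: each message is a data: line followed by a blank line. The sketch below isolates just that framing so it can be checked without a running server; sse_frames is a hypothetical helper, and the sample chunks are invented for illustration.

```python
import json
from typing import Any, Iterable, Iterator


def sse_frames(chunks: Iterable[Any]) -> Iterator[str]:
    """Encode each chunk as one Server-Sent Events message."""
    for chunk in chunks:
        # default=str keeps non-JSON types (for example datetimes) from raising
        payload = json.dumps(chunk, default=str)
        # An SSE message is a 'data:' line terminated by a blank line
        yield f"data: {payload}\n\n"


frames = list(sse_frames([{"text": "Hello"}, {"text": "world"}]))
print(frames[0], end="")
```

Because json.dumps never emits a raw newline, each chunk stays on a single data: line, which is what lets the browser's EventSource split messages reliably.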

Docker Container

Multi-Stage Dockerfile

# Stage 1: Build React frontend
FROM node:20-alpine AS frontend-build
WORKDIR /frontend
COPY frontend/ ./
RUN npm ci && npm run build

# Stage 2: Python backend
FROM python:3.11-slim
WORKDIR /app

# Install dependencies
COPY backend/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy frontend build
COPY --from=frontend-build /frontend/dist ./static

# Copy backend
COPY backend/ ./

# Environment variables (set by Cloud Run)
ENV AGENT_ENGINE_RESOURCE_NAME=""

# Start server
CMD ["uvicorn", "api_server:app", "--host", "0.0.0.0", "--port", "8080"]

Deployment Process

Step 1: Deploy Agent to Agent Engine

cd deployment
python deploy.py --action deploy

# Copy the AGENT_ENGINE_RESOURCE_NAME output

Step 2: Update .env File

cat > .env << EOF
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
AGENT_ENGINE_RESOURCE_NAME=projects/.../reasoningEngines/...
EOF

Step 3: Deploy to Cloud Run

./deployment/deploy-combined.sh

What it does:

Builds Docker image (frontend + backend)
Pushes to Artifact Registry
Deploys to Cloud Run
Sets environment variables
Returns service URL

IAM and Security

Required Roles

SERVICE_ACCOUNT="content-studio-sa@${PROJECT_ID}.iam.gserviceaccount.com"

# Add roles
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:${SERVICE_ACCOUNT}" \
    --role="roles/aiplatform.user"

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:${SERVICE_ACCOUNT}" \
    --role="roles/ml.developer"

Testing Production Deployment

Access the Application

# Get URL
gcloud run services describe content-studio \
    --region=us-central1 \
    --format='value(status.url)'

# Test API
curl -X POST "YOUR_CLOUD_RUN_URL/api/create-content" \
    -H "Content-Type: application/json" \
    -d '{
        "topic": "AI in Healthcare",
        "target_audience": "Healthcare professionals",
        "tone": "Professional",
        "keywords": "AI, healthcare"
    }'

Monitoring and Logs

Cloud Run Logs

# Real-time logs (streaming uses the beta "tail" command)
gcloud beta run services logs tail content-studio \
    --region=us-central1

# Search for errors
gcloud run services logs read content-studio \
    --region=us-central1 \
    --log-filter="severity>=ERROR"

Agent Engine Logs

gcloud logging read \
    "resource.type=aiplatform.googleapis.com/ReasoningEngine" \
    --limit=50

Health Check

curl https://YOUR_URL/health

# Response:
{
  "status": "healthy",
  "agent_engine_configured": true,
  "agent_engine_resource": "projects/.../reasoningEngines/..."
}
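
A response of that shape could be assembled from the environment along these lines. This is a guess at the logic rather than the repository's actual handler: health_payload is a hypothetical function, and it assumes "healthy" only means the process is up while the flag reports whether the resource name is set.

```python
from typing import Mapping


def health_payload(env: Mapping[str, str]) -> dict:
    """Build a health response from environment settings."""
    resource = env.get("AGENT_ENGINE_RESOURCE_NAME", "")
    return {
        "status": "healthy",                     # the process itself is running
        "agent_engine_configured": bool(resource),
        "agent_engine_resource": resource or None,
    }


print(health_payload({"AGENT_ENGINE_RESOURCE_NAME": "projects/.../reasoningEngines/..."}))
print(health_payload({}))
```

Reporting the configuration flag separately from the status makes the first troubleshooting case below visible from a single curl call.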

Troubleshooting

Common Issues

1. “AGENT_ENGINE_RESOURCE_NAME not set”

# Check environment variable
gcloud run services describe content-studio \
    --region=us-central1 \
    --format="value(spec.template.spec.containers[0].env)"

# Update if missing
gcloud run services update content-studio \
    --region=us-central1 \
    --set-env-vars="AGENT_ENGINE_RESOURCE_NAME=projects/..."

2. “Permission Denied”

# Check service account
gcloud run services describe content-studio \
    --region=us-central1 \
    --format="value(spec.template.spec.serviceAccountName)"

# Add required role
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:SERVICE_ACCOUNT" \
    --role="roles/aiplatform.user"

3. “Agent Engine Not Found”

Verify Agent Engine is deployed
Check resource name matches
Ensure both services in same region

4. Connection Timeout

# Increase timeout
gcloud run services update content-studio \
    --region=us-central1 \
    --timeout=300

Deployment Best Practices for Learning/Demo

Note: These are basic best practices for demonstration deployments. Production systems would require significantly more robust practices including comprehensive testing, CI/CD automation, security hardening, and advanced monitoring.

1. Region Consistency

# All in same region
REGION="us-central1"

2. Error Handling

try:
    agent = client.agentic.get_agent(RESOURCE_NAME)
    response = agent.query(user_query)
except google.api_core.exceptions.GoogleAPIError as e:
    logger.error(f"Agent Engine error: {e}")
    raise HTTPException(status_code=500, detail="Agent unavailable") from e

3. Timeouts

# Cloud Run timeout (deploy-combined.sh)
--timeout=300  # 5 minutes

# Request timeout in code
response = agent.query(user_query, timeout=240)

4. Retry Logic

from google.api_core.retry import Retry

retry = Retry(
    initial=1.0,
    maximum=10.0,
    multiplier=2.0,
    deadline=60.0
)

response = agent.query(user_query, retry=retry)
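
To make the retry parameters concrete, here is a minimal standard-library sketch of the same exponential backoff schedule (initial 1s, doubling, capped at 10s, giving up after 60s of waiting). It is not how google.api_core implements Retry (the real class also adds jitter and filters on exception type); the names backoff_delays and call_with_retry are hypothetical.

```python
import time


def backoff_delays(initial=1.0, maximum=10.0, multiplier=2.0, deadline=60.0):
    """Yield sleep intervals until their running total would exceed the deadline."""
    delay, elapsed = initial, 0.0
    while elapsed + delay <= deadline:
        yield delay
        elapsed += delay
        delay = min(delay * multiplier, maximum)  # grow, but never past the cap


def call_with_retry(func, *, sleep=time.sleep, **backoff):
    """Call func() until it succeeds or the backoff schedule is exhausted."""
    last_error = None
    for delay in backoff_delays(**backoff):
        try:
            return func()
        except Exception as exc:  # a real client would retry only transient errors
            last_error = exc
            sleep(delay)
    raise RuntimeError("retries exhausted") from last_error


print(list(backoff_delays()))
```

The sleep argument is injectable so the schedule can be exercised in tests without actually waiting.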

Congratulations!

You’ve completed the entire series! You now know how to:

Build AI agents (Parts 1–2)
Create agent teams (Part 3)
Design workflows (Parts 4–6)
Build complete agent systems (Part 7)
Deploy to cloud for learning/demo (Parts 8–9)

Access the full code at GitHub: content_creation_mas_workshop

Next Steps:

Experiment with your own use cases
Extend the system
Deploy to production
Build amazing things!

Thank you for following along! 🚀

Thanks for reading! If this was helpful, hit the ❤️, drop a comment, ⭐ the GitHub repo, and subscribe so you don’t miss the next one. Let’s connect on LinkedIn!

Google ADK: From Local Development to Vertex AI Deployment: Part 8

Saoussen CHAABNIA — Tue, 06 Jan 2026 19:01:50 GMT

Welcome to Part 8 of Google ADK: From Local Development to Vertex AI Deployment! You’ve built sophisticated agents locally. Now comes the pivotal transition: deploying to the cloud.

Introduction: From Prototype to Cloud Deployment

You’ve built an incredible Content Creation Studio locally — 11 specialist agents working together in sophisticated workflows. It works beautifully on your laptop. But now you face the critical question:

How do you make your local prototype accessible to others by deploying it to the cloud?

This is where many AI projects stall. Moving from “works on my machine” to a scalable, cloud-hosted system requires:

Infrastructure management — Servers, containers, orchestration
Scalability — Handling 1 user vs 1,000 users simultaneously
Reliability — Uptime, error handling, monitoring
Security — Authentication, authorization, data protection
Cost optimization — Efficient resource usage
Maintenance — Updates, logging, debugging

Building all of this from scratch is complex, time-consuming, and expensive. That’s the problem Google Cloud’s Vertex AI Agent Engine solves for deployment.

What you’ll learn in this part:

Why deployment matters and when to deploy
The power of Vertex AI Agent Engine
Setting up your Google Cloud environment with setup_gcp.sh
Deploying your multi-agent system
Understanding what you get: the Agent Engine endpoint
How this connects to the full-stack app in Part 9

Colab Notebook: Part 8 — Agent Engine Deployment

Why Deployment Matters

The Local Development Trap

Your Content Creation Studio works perfectly locally. You run it in a Jupyter notebook or Python script:

# Local execution
async def create_content():
    session = await session_service.create_session(...)
    await run_agent_query(coordinator_agent, query, session, user_id)

But this has fundamental limitations:

Single User — Only you can use it
No Persistence — Restart = lose everything (InMemorySessionService)
Your Computer — Requires your machine to be running
No Scalability — Can’t handle multiple requests
No Monitoring — Can’t track performance or errors
No API — Can’t integrate with web apps or other services
No Reliability — Crashes stop everything

The Cloud Deployment Requirements

To make your agent accessible to others, you need:

Accessible 24/7 — Available from anywhere, anytime
Scalable — Handles 1 or 1,000 concurrent users
Reliable — Automatic restarts, error recovery
Secure — Authentication, authorization, data encryption
Fast — Low latency, optimized performance
Observable — Logs, metrics, tracing
API-First — HTTP endpoints for integration
Cost-Effective — Pay for what you use

This is what cloud deployment enables.

When Should You Deploy?

Deploy to the cloud when:

Your agent system works reliably in local testing
You want others to use it (team, colleagues, demo purposes)
You need it integrated with a web app or service
You want 24/7 availability for testing and demos
You need to handle multiple concurrent requests
You want cloud monitoring and logging

⚠️ Note: This deployment approach is suitable for learning, testing, and demonstration. For production use, you’ll need to add comprehensive testing, CI/CD pipelines, evaluation frameworks, and other production-grade infrastructure not covered in this tutorial.

Introducing Vertex AI Agent Engine

What is Agent Engine?

Vertex AI Agent Engine is Google Cloud’s fully-managed platform for deploying and running Google ADK agents at scale. Think of it as “Cloud Run for AI Agents.”

According to Google Cloud documentation:

“Vertex AI Agent Engine is a fully managed service that allows you to deploy reasoning engines (ADK agents) to a scalable, serverless infrastructure. It handles infrastructure, auto-scaling, monitoring, and provides a gRPC API for integration.”

In simple terms: You give Google your agent code, and they handle everything else.

The Power of Agent Engine

1. Zero Infrastructure Management

You don’t set up:

Virtual machines or Kubernetes clusters
Container orchestration
Load balancers
Auto-scaling rules
Health checks
Network configuration

Google handles all of it. You focus on your agent logic.

2. Serverless Auto-Scaling

1 user  → 1 instance  → $X
10 users → 2 instances → $2X  (automatically scales up)
1 user  → 1 instance  → $X   (automatically scales down)
0 users → 0 instances → $0   (scales to zero!)

Pay only for actual usage. No idle servers burning money.

3. Managed Reliability Features

Automatic restarts — Agent crashes? Restarted instantly
Health monitoring — Continuous health checks
Multi-zone deployment — High availability across data centers
Versioning — Deploy new versions without downtime
Rollback — Instant rollback to previous versions

4. Built-in Observability

Cloud Logging — All agent logs automatically collected
Cloud Monitoring — Performance metrics, latency, errors
Cloud Trace — Request tracing through your agent system
Debuggable — Full visibility into deployed behavior

5. Secure by Default

IAM integration — Fine-grained access control
Service accounts — Secure authentication
VPC integration — Private network deployment
Encryption — Data encrypted in transit and at rest

6. Seamless ADK Integration

Here’s the magic: Your local ADK agent code deploys with minimal changes.

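# Local execution
agent = create_content_creation_coordinator()
runner = Runner(agent=agent, app_name=agent.name, session_service=session_service)
async for event in runner.run_async(...):
    ...

# Cloud deployment - SAME agent code, wrapped for Agent Engine
import vertexai
from vertexai import agent_engines
from vertexai.preview.reasoning_engines import AdkApp

vertexai.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-project-content-studio",
)
deployed_agent = agent_engines.create(
    agent_engine=AdkApp(agent=agent),
    requirements=["google-cloud-aiplatform[adk,agent_engines]"],
)
# Done! The agent is now reachable through its resource name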

The agent you built locally just works in the cloud.

Prerequisites: Setting Up Google Cloud

Before deploying your agent, you need to prepare your Google Cloud environment. We’ve created a comprehensive setup script that handles everything automatically.

Step 0: Create a Google Cloud Project

Go to Google Cloud Console
Create a new project (or select an existing one)
Enable billing (required for Agent Engine and Cloud Run)
Note your Project ID (e.g., my-content-studio)

Step 1: Run the GCP Setup Script

We provide setup_gcp.sh which automates your entire Google Cloud environment setup.

📂 Location: content_creation_mas/deployment/setup_gcp.sh

What Does setup_gcp.sh Do?

This script is your one-stop setup for Google Cloud. Here’s everything it handles:

1. Environment Configuration

Loads your .env file with GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION
Validates required variables
Sets gcloud defaults

2. Enables Required Google Cloud APIs

✓ aiplatform.googleapis.com           # Vertex AI / Agent Engine
✓ run.googleapis.com                  # Cloud Run (Part 9)
✓ cloudbuild.googleapis.com           # Cloud Build
✓ artifactregistry.googleapis.com     # Docker registry
✓ storage.googleapis.com              # Cloud Storage
✓ iam.googleapis.com                  # IAM
✓ cloudresourcemanager.googleapis.com # Resource Manager

3. Creates Artifact Registry Repository

Creates a Docker repository: content-studio
Location: {region}-docker.pkg.dev/{project}/content-studio
This stores your Cloud Run container images (Part 9)

4. Configures Docker Authentication

Authenticates Docker with Artifact Registry
Allows pushing images: docker push {region}-docker.pkg.dev/...

5. Creates Service Account

Name: content-studio-sa
Email: content-studio-sa@{project}.iam.gserviceaccount.com
This service account will run your deployed services

6. Grants IAM Roles

roles/aiplatform.user - Access Agent Engine
roles/run.invoker - Invoke Cloud Run services
roles/storage.objectViewer - Read from Cloud Storage
roles/logging.logWriter - Write logs

7. Creates Cloud Storage Bucket

Bucket: gs://{project}-content-studio
Used for Agent Engine staging and assets

Output:

========================================
  Setup Complete!
========================================

Summary:
  Project: my-content-studio
  Region: us-central1
  Artifact Registry: us-central1-docker.pkg.dev/my-content-studio/content-studio
  Service Account: content-studio-sa@my-content-studio.iam.gserviceaccount.com
  Storage Bucket: gs://my-content-studio-content-studio

Next Steps:
1. Deploy your agent to Agent Engine:
   cd deployment
   python deploy.py

2. Deploy frontend and backend to Cloud Run:
   cd deployment
   ./deploy-cloudrun.sh

Running the Setup Script

# Navigate to deployment directory
cd content_creation_mas/deployment

# Create .env file with your project
cat > .env << EOF
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_API_KEY=your-api-key
EOF

# Make script executable
chmod +x setup_gcp.sh

# Run setup (takes 2-3 minutes)
./setup_gcp.sh

The script is interactive — it will show you what it will do and ask for confirmation before proceeding.

Important: Run this script once before deploying. It prepares your entire Google Cloud environment.

Deploying Your Agent to Agent Engine

Now that your Google Cloud environment is ready, let’s deploy the Content Creation Studio!

Step 1: Prepare Your Environment

cd content_creation_mas/deployment

# Verify .env file exists with these variables:
cat .env

Required variables:

GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_API_KEY=your-gemini-api-key
GOOGLE_CLOUD_STORAGE_BUCKET=gs://your-project-id-content-studio
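
Since a missing variable tends to surface only minutes into a deployment, it can help to check the file up front. The following is a small sketch, not part of the repository: parse_env and missing_vars are hypothetical helpers, and the parser handles only simple KEY=value lines.

```python
# The four variables listed above as required for deployment.
REQUIRED_VARS = (
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_LOCATION",
    "GOOGLE_API_KEY",
    "GOOGLE_CLOUD_STORAGE_BUCKET",
)


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_vars(text: str) -> list:
    """Return required variable names that are absent or empty."""
    values = parse_env(text)
    return [name for name in REQUIRED_VARS if not values.get(name)]


# The .env created during setup has three variables; the bucket is added later.
SAMPLE_ENV = """
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_API_KEY=your-api-key
"""
print(missing_vars(SAMPLE_ENV))
```

In practice you would pass the contents of your real .env file, for example `missing_vars(open(".env").read())`.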

Step 2: Deploy with the Deployment Script

We provide deploy.py which handles the deployment:

# Deploy your agent
python deploy.py --action deploy

# What happens:
# 1. Initializes Vertex AI with your project/region/bucket
# 2. Imports your Content Creation Coordinator agent
# 3. Packages all dependencies
# 4. Uploads to Agent Engine
# 5. Configures auto-scaling and monitoring
# 6. Returns the Agent Engine resource name

Deployment takes 5–10 minutes — Google is:

Packaging your agent code and dependencies
Creating the Agent Engine instance
Configuring networking and IAM
Running health checks
Making it available via gRPC API

⚠️ Important: This deployment is suitable for learning, development, testing, and demonstration only. For production use, you’d need to add comprehensive testing pipelines, CI/CD automation, model evaluation frameworks, advanced monitoring and alerting, security hardening, and disaster recovery strategies.

Output:

🚀 Deploying agent to Vertex AI Agent Engine...

📦 Packaging agent code and dependencies...
✓ Agent code packaged

☁️  Uploading to Agent Engine...
✓ Agent uploaded

⚙️  Configuring deployment...
✓ Deployment configured

🎉 Deployment successful!

Resource Name:
projects/123456789/locations/us-central1/reasoningEngines/987654321

📋 Important: Save this resource name!
Add to your .env file:
AGENT_ENGINE_RESOURCE_NAME=projects/123456789/locations/us-central1/reasoningEngines/987654321

🔗 Your agent is now accessible at this endpoint

This resource name is critical — it’s the unique identifier for your deployed agent.

Step 3: Save the Resource Name

# Add to your .env file
echo "AGENT_ENGINE_RESOURCE_NAME=projects/123456789/locations/us-central1/reasoningEngines/987654321" >> .env

Why is this important?

This resource name is the endpoint for your deployed agent
Your backend (Part 9) uses this to connect to the agent
It’s how other services query your agent via API

Understanding What You Just Deployed

The Agent Engine Endpoint

Think of the resource name as the URL of your deployed agent:

projects/123456789/locations/us-central1/reasoningEngines/987654321
             ↑                    ↑                           ↑
      Project number           Region                 Agent Instance ID

This endpoint:

Is globally unique
Is accessible via gRPC API
Handles authentication via IAM
Auto-scales based on demand
Is monitored and logged automatically
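
Because the resource name has a fixed shape, it is easy to validate and split before using it, for instance to confirm the agent lives in the same region as your other services. A minimal sketch follows; the AgentEngineResource type and parse_resource_name helper are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple


class AgentEngineResource(NamedTuple):
    project: str
    location: str
    engine_id: str


_RESOURCE_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/reasoningEngines/(?P<engine_id>[^/]+)$"
)


def parse_resource_name(name: str) -> AgentEngineResource:
    """Split a full resource name into its three components."""
    match = _RESOURCE_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"Not a valid Agent Engine resource name: {name!r}")
    return AgentEngineResource(**match.groupdict())


resource = parse_resource_name(
    "projects/123456789/locations/us-central1/reasoningEngines/987654321"
)
print(resource.location)
```

A check such as `resource.location == "us-central1"` at startup catches the region mismatch described in the Part 9 troubleshooting notes before the first request fails.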

How the Endpoint Works

┌─────────────────────┐
│  Your Application   │  (Backend server, Cloud Run, etc.)
│  (Part 9)           │
└──────────┬──────────┘
           │
           │ gRPC API call with:
           │ - resource_name
           │ - user_query
           │ - session_id
           ↓
┌─────────────────────────────────────────┐
│  Vertex AI Agent Engine                 │
│  projects/.../reasoningEngines/...      │
│                                         │
│  ┌────────────────────────────────┐    │
│  │  Your Content Creation Studio  │    │
│  │  - 11 Specialist Agents        │    │
│  │  - Sequential/Parallel/Loop    │    │
│  │  - Session Management          │    │
│  └────────────────────────────────┘    │
│                                         │
│  Auto-scaling: 0-N instances           │
│  Monitoring: Logs, Metrics, Traces     │
└─────────────────┬───────────────────────┘
                  │
                  │ Calls Gemini API
                  ↓
         ┌──────────────────┐
         │   Gemini 2.5     │
         └──────────────────┘

The flow:

Your backend sends a request to the Agent Engine endpoint
Agent Engine routes it to your Content Creation Studio agent
Your agent executes (11 specialist agents working together)
Agent calls Gemini API for LLM inference
Response streams back through Agent Engine to your backend
Your backend serves it to the user

Everything in between is managed by Google.

Testing Your Deployed Agent

Using Python Client

import os

import vertexai
from vertexai import agent_engines

# Initialize Vertex AI
PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
LOCATION = os.getenv("GOOGLE_CLOUD_LOCATION")
RESOURCE_NAME = os.getenv("AGENT_ENGINE_RESOURCE_NAME")

vertexai.init(project=PROJECT_ID, location=LOCATION)

# Connect to the deployed agent by its resource name
agent = agent_engines.get(RESOURCE_NAME)

# Query the agent
# (the available method depends on how the agent was packaged;
#  ADK apps usually expose stream_query(...) instead of query(...))
response = agent.query(
    query="Create content about sustainable living for eco-conscious millennials"
)

print(response)

Using gcloud CLI

# List your deployed agents
gcloud beta ai reasoning-engines list \
    --project=your-project-id \
    --location=us-central1

# Get details about your specific agent
gcloud beta ai reasoning-engines describe \
    projects/123456789/locations/us-central1/reasoningEngines/987654321

Checking Logs

# View recent logs from your agent
gcloud logging read \
    'resource.type="aiplatform.googleapis.com/ReasoningEngine"' \
    --project=your-project-id \
    --limit=50 \
    --format=json

# Filter for errors
gcloud logging read \
    'resource.type="aiplatform.googleapis.com/ReasoningEngine" AND severity>=ERROR' \
    --project=your-project-id

The Bridge to Part 9: Full-Stack Deployment

What We Have Now

After completing Part 8, you have:

Agent Engine Endpoint — Your multi-agent system running in the cloud
Resource Name — The API identifier to access it
Auto-scaling Infrastructure — Handles any load
Production Monitoring — Logs and metrics
gRPC API — Programmatic access

What’s Missing

But users can’t interact with it yet because:

No user interface (UI)
No web backend to handle HTTP requests
No authentication flow
No session management for web users
No public URL

This is what Part 9 solves. Part 8 provides the AI backend. Part 9 provides the user-facing application.

Together, they form a complete end-to-end system accessible to anyone with a web browser.

What’s Next?

Your AI agents are now running in the cloud! But users still can’t interact with them through a web interface.

In Part 9: Full-Stack Deployment with Cloud Run, we’ll complete the stack:

React Frontend — Beautiful web UI for content creation
FastAPI Backend — REST API that connects to Agent Engine
Docker Containerization — Package frontend + backend
Cloud Run Deployment — Serverless hosting with auto-scaling
Complete Integration — Users → Web App → Agent Engine → Gemini

The final piece of the puzzle!

GitHub Repository: content_creation_mas_workshop

Colab Notebook: Part 8 — Agent Engine Deployment

Google ADK: From Local Development to Vertex AI Deployment: Part 7

Saoussen CHAABNIA — Tue, 06 Jan 2026 19:00:15 GMT

Welcome to Part 7 of Google ADK: From Local Development to Vertex AI Deployment — the capstone! You’ve mastered individual concepts. Now we’re bringing everything together into one sophisticated system.

The Journey So Far:

Part 1–2: Agents and tools
Part 3: Agent teams
Part 4: Sequential workflows
Part 5: Iterative improvement
Part 6: Parallel execution

Today: We combine ALL patterns into one complete multi-agent system!

Colab: Part 7

Concept: Hierarchical Orchestration

4-Layer Architecture:

Layer 1: Master Orchestrator (routing)
Layer 2: Sub-Workflows (Sequential, Loop, Parallel)
Layer 3: Specialist Agents (11 agents)
Layer 4: Tools (custom + built-in)

Benefits:

Clean separation of concerns
Easy to extend
Testable components
Scalable design

Concept: End-to-End Autonomous Workflows

Complete task execution with minimal human intervention: User Request → Parse → Research → Draft → Improve (Loop) → Multi-Channel (Parallel) → Package → Deliver

All automatic!

The Complete Architecture

User Query
     ↓
Master Orchestrator
     ├─→ Full Content Workflow
     │        ├─ Intake Agent
     │        ├─ Sequential (Research + Draft)
     │        ├─ Loop (Quality Check + Improve)
     │        ├─ Parallel (Blog + Social + Email + SEO)
     │        └─ Final Packager
     │
     └─→ Content Analyzer (simple analysis)

Building the Complete System

(Due to length, showing key components)

All 11 Specialist Agents

# 1. intake_agent
# 2. topic_research_agent
# 3. content_drafter_agent
# 4. quality_checker_agent
# 5. content_improver_agent
# 6. blog_post_writer_agent
# 7. social_media_creator_agent
# 8. email_newsletter_writer_agent
# 9. seo_metadata_agent
# 10. content_analyzer_agent
# 11. final_packager_agent

Complete Workflow Assembly

from google.adk.agents import Agent, SequentialAgent, LoopAgent, ParallelAgent
from google.adk.tools.agent_tool import AgentTool

# Sequential: Research + Draft
research_and_draft_workflow = SequentialAgent(
    name="research_and_draft_workflow",
    sub_agents=[topic_research_agent, content_drafter_agent]
)

# Loop: Quality Improvement
quality_improvement_loop = LoopAgent(
    name="quality_improvement_loop",
    sub_agents=[quality_checker_agent, content_improver_agent],
    max_iterations=3
)

# Parallel: Multi-Channel Content
parallel_content_creation = ParallelAgent(
    name="parallel_content_creation",
    sub_agents=[
        blog_post_writer_agent,
        social_media_creator_agent,
        email_newsletter_writer_agent,
        seo_metadata_agent
    ]
)

# Full Content Workflow
full_content_workflow = SequentialAgent(
    name="full_content_workflow",
    sub_agents=[
        intake_agent,
        research_and_draft_workflow,
        quality_improvement_loop,
        parallel_content_creation,
        final_packager_agent
    ]
)

# Master Orchestrator: routes each request to a workflow exposed as a tool
master_orchestrator_agent = Agent(
    name="master_orchestrator_agent",
    model="gemini-2.5-flash",
    instruction="""
    You are the Master Content Creation Studio orchestrator.

    - For FULL content creation, use `full_content_workflow_tool`.
    - For ANALYZING existing text, use `content_analyzer_tool`.

    Always delegate. Present responses clearly.
    """,
    tools=[
        AgentTool(agent=full_content_workflow),
        AgentTool(agent=content_analyzer_agent)
    ]
)

Testing the Complete System

async def run_capstone_project():
    session = await session_service.create_session(
        app_name=master_orchestrator_agent.name,
        user_id=user_id
    )

    # Query 1: Full Content Creation
    query1 = """
    Create a complete content package for:
    - Topic: Productivity hacks using AI for remote workers
    - Target Audience: Remote professionals and digital nomads
    - Tone: Conversational and helpful
    - Keywords: AI productivity, remote work, automation tools
    """

    # Query 2: Quick Analysis
    sample_text = "Remote work has transformed productivity..."
    query2 = f"Analyze this text:\n\n{sample_text}"

    # Run both queries...

What’s Next?

Part 8: Deploy to Google Cloud’s Agent Engine
Part 9: Full-stack deployment to Cloud Run

GitHub: content_creation_mas_workshop

Google ADK: From Local Development to Vertex AI Deployment: Part 6

Saoussen CHAABNIA — Tue, 06 Jan 2026 18:58:50 GMT

Welcome to Part 6 of Google ADK: From Local Development to Vertex AI Deployment! We’ve built an incredible system of agents that can delegate, follow ordered plans, and even iterate to solve problems. In this lesson, we’ll tackle a new dimension: efficiency.

Our workflows so far have been linear. What if we need to perform multiple, independent tasks at the same time? For this, we use ParallelAgent. Today, we’ll build a workflow that can simultaneously create blog posts, social media content, and email newsletters — all from a single request, creating our most efficient agent yet.

Prerequisites

This article builds on our previous work. Please ensure you’re familiar with the concepts from the entire series.

Introduction

The Efficiency Problem:

Sequential: Blog (10s) → Social (10s) → Email (10s) → SEO (10s) = 40 seconds
Parallel:   [Blog | Social | Email | SEO] all run together = 10 seconds

Same result, 4x faster!

Today, you’ll learn how to build parallel workflows using ParallelAgent — executing multiple agents simultaneously for maximum efficiency.

Colab: Part 6

🆕 Concept: ParallelAgent

What is ParallelAgent? A workflow agent that executes ALL sub-agents concurrently (simultaneously). Perfect for independent tasks that don’t depend on each other.

Key Characteristics:

Runs all sub-agents at the same time
Total time = longest single agent (not sum!)
Each sub-agent works independently
Collects all results via output_key
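
The "longest single agent, not the sum" claim can be seen without any model calls. The sketch below uses plain asyncio with sleeps standing in for agents; it illustrates the timing behaviour only and is not how ParallelAgent is implemented. All names (TASKS, fake_agent, timed) are hypothetical.

```python
import asyncio
import time

# Hypothetical stand-ins for the four content agents: (name, seconds of work).
TASKS = [("blog", 0.2), ("social", 0.1), ("email", 0.1), ("seo", 0.1)]


async def fake_agent(name: str, seconds: float) -> str:
    """Pretend to be an agent by sleeping instead of calling a model."""
    await asyncio.sleep(seconds)
    return f"{name} done"


async def run_sequential() -> list:
    # One after another: total time is the sum of all durations.
    return [await fake_agent(name, seconds) for name, seconds in TASKS]


async def run_parallel() -> list:
    # All at once: total time is roughly the longest single duration.
    return list(await asyncio.gather(*(fake_agent(n, s) for n, s in TASKS)))


def timed(make_coro):
    """Run a coroutine function to completion and return (results, seconds)."""
    start = time.perf_counter()
    results = asyncio.run(make_coro())
    return results, time.perf_counter() - start


sequential_results, sequential_seconds = timed(run_sequential)
parallel_results, parallel_seconds = timed(run_parallel)
print(f"sequential: {sequential_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

Both runs return the same results in the same order; only the wall-clock time differs, which is why the pattern suits tasks that do not read each other's output.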

When to use:

Multiple independent tasks
Content for different channels
Information gathering from multiple sources

📖 Workflow Agents

🆕 Concept: Fan-Out/Fan-In Pattern

Architectural pattern for parallel processing:

Fan-Out: Distribute single input to multiple workers
Fan-In: Collect all parallel results and combine

Input Brief
     ↓
  Fan-Out
     ├──→ Blog Writer
     ├──→ Social Creator
     ├──→ Email Writer
     └──→ SEO Generator
     ↓
  Fan-In
     ↓
Complete Package

🆕 Concept: Intake Agent Pattern

Pattern for parsing natural language into structured data: Instead of requiring structured input, an intake agent extracts parameters from conversational requests.

Example:

User: “Create content about AI for small businesses, friendly tone”
     ↓
Intake Agent extracts:
     topic = “AI for small businesses”
     tone = “friendly”
     audience = “small business owners”
     ↓
Stores in session.state for other agents
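
The hand-off from the intake agent to downstream agents boils down to a shared dictionary plus placeholder substitution. The following sketch imitates that idea in plain Python; ADK performs its own substitution internally, and render_instruction is a hypothetical helper written only for this illustration.

```python
import re


def render_instruction(template: str, state: dict) -> str:
    """Replace {{key}} placeholders with values from a state dictionary."""

    def substitute(match):
        key = match.group(1)
        if key not in state:
            raise KeyError(f"state has no value for placeholder {key!r}")
        return str(state[key])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Values the intake agent would have extracted from the user's request.
state = {
    "topic": "AI for small businesses",
    "tone": "friendly",
    "audience": "small business owners",
}

prompt = render_instruction("Write about: {{topic}} in a {{tone}} tone.", state)
print(prompt)
```

Raising on a missing key is a deliberate choice here: a downstream agent that silently receives an empty topic produces plausible but useless content.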

Building a Parallel Content Factory

Setup

!pip install google-adk==1.19.0 -q

Step 1: Intake Agent with Session State

from google.adk.tools import ToolContext
from google.adk.agents import Agent

def update_session_state(
    tool_context: ToolContext,
    topic: str,
    target_audience: str,
    tone: str,
    keywords: str
) -> str:
    """
    Saves extracted content brief parameters to session state.
    """
    print("🔧 Updating session state...")
    print(f"   Topic: {topic}")
    print(f"   Audience: {target_audience}")
    print(f"   Tone: {tone}")

    tool_context.state['topic'] = topic
    tool_context.state['target_audience'] = target_audience
    tool_context.state['tone'] = tone
    tool_context.state['keywords'] = keywords

    return "Session state updated with content brief parameters."

intake_agent = Agent(
    name="intake_agent",
    model="gemini-2.5-flash",
    instruction="""
    You are a content brief analyzer. From the user's request, identify:
    - The main topic
    - The target audience
    - The desired tone
    - Key SEO keywords (comma-separated)

    Then call the `update_session_state` tool with the extracted values.
    """,
    tools=[update_session_state]
)

print("🧞 Intake agent created!")

Step 2: Create Parallel Content Creators

# Agent 1: Blog Post Writer
blog_post_writer_agent = Agent(
    name="blog_post_writer_agent",
    model="gemini-2.5-flash",
    instruction="""
    Write a complete blog post about: {{topic}}

    Target audience: {{target_audience}}
    Tone: {{tone}}

    Requirements:
    - 600-800 words
    - Engaging introduction
    - 3-4 H2 headings
    - Clear call-to-action

    Output only the blog post in markdown.
    """,
    tools=[],
    output_key="blog_post"
)

# Agent 2: Social Media Creator
social_media_creator_agent = Agent(
    name="social_media_creator_agent",
    model="gemini-2.5-flash",
    instruction="""
    Create social posts about: {{topic}}

    Target audience: {{target_audience}}
    Tone: {{tone}}

    Create THREE posts:

    **1. LinkedIn Post** (150-200 words)
    - Professional and insightful
    - 3-4 professional hashtags

    **2. Twitter/X Thread** (3-4 tweets, 280 chars each)
    - Engaging thread with hashtags
    - Call-to-action in last tweet

    **3. Instagram Caption** (100-150 words)
    - Engaging with emojis
    - 8-10 hashtags at end

    Format clearly with headers for each platform.
    """,
    tools=[],
    output_key="social_media_content"
)

# Agent 3: Email Newsletter Writer
email_newsletter_writer_agent = Agent(
    name="email_newsletter_writer_agent",
    model="gemini-2.5-flash",
    instruction="""
    Write an email newsletter about: {{topic}}

    Target audience: {{target_audience}}
    Tone: {{tone}}

    Structure:
    - **Subject Line**: Compelling (50-60 chars)
    - **Preview Text**: Enticing (40-50 chars)
    - **Body** (300-400 words):
      * Personal greeting
      * Engaging introduction
      * 2-3 key points
      * Clear call-to-action
      * Friendly sign-off

    Format with clear sections.
    """,
    tools=[],
    output_key="email_newsletter"
)

# Agent 4: SEO Metadata Generator
seo_metadata_agent = Agent(
    name="seo_metadata_agent",
    model="gemini-2.5-flash",
    instruction="""
    Generate SEO metadata for content about: {{topic}}

    Target keywords: {{keywords}}

    Create:
    1. **Meta Title** (50-60 characters)
    2. **Meta Description** (150-160 characters)
    3. **URL Slug** (lowercase with hyphens)
    4. **Focus Keyword**
    5. **5 Related Keywords**
    6. **3 Internal Link Suggestions**

    Format as structured list.
    """,
    tools=[],
    output_key="seo_metadata"
)

print("🧞 All parallel content creator agents created!")

Step 3: Build the Parallel Workflow

from google.adk.agents import ParallelAgent

# Create the parallel workflow (Fan-Out)
parallel_content_creation = ParallelAgent(
    name="parallel_content_creation",
    sub_agents=[
        blog_post_writer_agent,
        social_media_creator_agent,
        email_newsletter_writer_agent,
        seo_metadata_agent
    ]
)

print("✅ Parallel workflow created!")

Step 4: Add Synthesizer (Fan-In)

# Synthesizer combines all parallel outputs
content_package_synthesizer_agent = Agent(
    name="content_package_synthesizer_agent",
    model="gemini-2.5-flash",
    instruction="""
    Combine all created content into one comprehensive package.

    You have:
    - Blog post: {{blog_post}}
    - Social media content: {{social_media_content}}
    - Email newsletter: {{email_newsletter}}
    - SEO metadata: {{seo_metadata}}

    Create a well-organized content package with:
    1. **📝 Blog Post** section
    2. **📱 Social Media Content** section
    3. **📧 Email Newsletter** section
    4. **🔍 SEO Metadata** section

    Add brief executive summary at top.
    """
)

print("🧞 Synthesizer agent created!")

Step 5: Complete Workflow

from google.adk.agents import SequentialAgent

full_parallel_workflow = SequentialAgent(
    name="full_parallel_workflow",
    sub_agents=[
        intake_agent,                           # Parse brief
        parallel_content_creation,              # Fan-out (parallel)
        content_package_synthesizer_agent       # Fan-in
    ]
)

print("✅ Complete parallel workflow assembled!")

Testing the System

from google.adk.sessions import InMemorySessionService
from google.adk.runners import Runner
from google.genai.types import Content, Part
from IPython.display import display, Markdown

session_service = InMemorySessionService()
user_id = "adk_content_creator_001"

async def run_parallel_content_creation():
    session = await session_service.create_session(
        app_name=full_parallel_workflow.name,
        user_id=user_id
    )

    query = """
    Create a complete content package for:
    - Topic: Using AI tools to boost small business productivity
    - Target Audience: Small business owners and solopreneurs
    - Tone: Friendly and approachable, but professional
    - Keywords: AI productivity, small business automation, AI tools for business
    """

    print(f"👤 User Content Brief:\n{query}\n")

    runner = Runner(
        agent=full_parallel_workflow,
        session_service=session_service,
        app_name=full_parallel_workflow.name
    )

    async for event in runner.run_async(
        user_id=user_id,
        session_id=session.id,
        new_message=Content(parts=[Part(text=query)], role="user")
    ):
        if event.is_final_response():
            print("\n" + "=" * 60)
            print("✅ FINAL CONTENT PACKAGE:")
            print("=" * 60)
            display(Markdown(event.content.parts[0].text))
            print("=" * 60)

await run_parallel_content_creation()

Sequential Execution (Parts 1–5):

Blog (10s) → Social (10s) → Email (10s) → SEO (10s) = 40 seconds total

Parallel Execution (Part 6):

Blog    (10s) ┐
Social  (10s) ├─ All run together
Email   (10s) ┤
SEO     (10s) ┘
= 10 seconds total (4x faster!)
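
The timing difference can be reproduced with a small standalone sketch. It does not use ADK: `fake_agent` is a hypothetical stand-in that only sleeps, and the durations are scaled down from 10 s to 0.05 s so it finishes quickly. It illustrates that awaiting tasks one by one costs roughly the sum of their durations, while `asyncio.gather` costs roughly the longest one.

```python
import asyncio
import time

# Hypothetical stand-in for an agent call: sleeps, then returns a label.
async def fake_agent(name: str, seconds: float) -> str:
    await asyncio.sleep(seconds)
    return f"{name} done"

TASKS = [("blog", 0.05), ("social", 0.05), ("email", 0.05), ("seo", 0.05)]

async def run_sequential() -> tuple[list[str], float]:
    start = time.perf_counter()
    # Each call is awaited before the next one starts.
    results = [await fake_agent(name, secs) for name, secs in TASKS]
    return results, time.perf_counter() - start

async def run_parallel() -> tuple[list[str], float]:
    start = time.perf_counter()
    # gather() runs all coroutines concurrently and preserves input order.
    results = await asyncio.gather(*(fake_agent(n, s) for n, s in TASKS))
    return list(results), time.perf_counter() - start

seq_results, seq_time = asyncio.run(run_sequential())
par_results, par_time = asyncio.run(run_parallel())
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

In a notebook, where an event loop is already running, replace the `asyncio.run(...)` calls with `await`. Real speedups depend on how evenly the work is split; the 4x figure assumes four tasks of equal length.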

What’s Next?

We’ve mastered all workflow patterns! Now it’s time to combine them all.

Part 7: The Capstone Project builds a complete production-ready system:

Intake → Sequential → Loop → Parallel → Package
All patterns working together
11 specialist agents
Hierarchical orchestration

Colab: Part 7

GitHub: content_creation_mas_workshop

Thanks for reading! If this was helpful, hit the ❤️, drop a comment, ⭐ the GitHub repo, and subscribe so you don’t miss the next one. Let’s connect on LinkedIn!

Google ADK: From Local Development to Vertex AI Deployment: Part 5

Saoussen CHAABNIA — Tue, 06 Jan 2026 18:57:19 GMT

Welcome to Part 5 of Google ADK: From Local Development to Vertex AI Deployment! You’ve built agents that work in sequence. Now let’s add intelligence: agents that critique and improve their own work.

Google ADK: From Local Development to Vertex AI Deployment series:

Introduction

First drafts are rarely perfect. Professional writers know this — which is why they revise. But what if your AI agent could critique and improve its own work until it meets quality standards?

Today, you’ll build a self-improving AI system using LoopAgent — a workflow that iteratively refines content until it passes quality gates. No human intervention required.

What you’ll learn:

LoopAgent for iterative workflows
ToolContext for runtime control
tool_context.actions.escalate for loop termination
The Critique-Refine Pattern

Colab Notebook: Part 5

The Quality Problem

# Sequential workflow from Part 4
SequentialAgent([research, draft, format])
# Problem: What if the draft is poor quality?

We need:

# Loop until quality meets threshold
LoopAgent([check_quality, improve], max_iterations=3)
# Stops when: quality >= 70 OR 3 iterations reached

🆕 Concept: LoopAgent

What is LoopAgent? A workflow agent that repeatedly executes its sub-agents until either:

A condition is met (tool_context.actions.escalate = True)
Maximum iterations reached

Perfect for quality improvement, optimization, and refinement tasks.

Exit mechanisms:

Fixed iterations: max_iterations=3 caps the loop at 3 passes (exactly 3 if nothing escalates)
Conditional exit: Tool sets escalate = True to stop early
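
The two exit mechanisms can be pictured with a plain-Python model of the control flow. This is a simplified sketch, not ADK's implementation: `LoopState`, `run_loop`, `check`, and `improve_or_exit` are hypothetical names, and the `escalate` attribute stands in for `tool_context.actions.escalate`.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class LoopState:
    """Hypothetical stand-in for shared state plus the escalate flag."""
    data: dict = field(default_factory=dict)
    escalate: bool = False

def run_loop(steps: list[Callable[[LoopState], None]],
             state: LoopState, max_iterations: int) -> int:
    """Run all steps repeatedly; stop on escalate or after max_iterations.

    Returns the number of passes that were started.
    """
    completed = 0
    for _ in range(max_iterations):
        for step in steps:
            step(state)
            if state.escalate:
                return completed + 1  # conditional exit, mid-pass
        completed += 1
    return completed  # iteration cap reached

def check(state: LoopState) -> None:
    # Pretend the score improves by 20 points per pass: 60, 80, ...
    state.data["score"] = state.data.get("score", 40) + 20

def improve_or_exit(state: LoopState) -> None:
    if state.data["score"] >= 70:
        state.escalate = True

early = run_loop([check, improve_or_exit], LoopState(), max_iterations=5)
capped = run_loop([check], LoopState(), max_iterations=3)
print(early, capped)
```

The first call stops on its second pass because the score reaches the threshold; the second call never escalates, so it runs all three passes.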

📖 Workflow Agents

🆕 Concept: ToolContext

What is ToolContext? A special parameter that gives tools access to runtime information and control over workflow behavior.

Usage:

def my_tool(tool_context: ToolContext, param: str):
    # Access session state
    data = tool_context.state

    # Control workflow
    tool_context.actions.escalate = True  # Exit loop!

Capabilities:

Access session state via tool_context.state
Control workflows via tool_context.actions
Get runtime context

📖 Tool Context

🆕 Concept: tool_context.actions.escalate

The Escalate Flag: setting tool_context.actions.escalate = True signals to LoopAgent: “We’re done, exit now!”

Pattern:

def exit_loop(tool_context: ToolContext):
    tool_context.actions.escalate = True
    return {"result": "Quality threshold met"}

When to use:

Quality thresholds met
Goal achieved
Condition satisfied
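
A conditional variant of the exit tool might look like the sketch below. It assumes a numeric score is stored in state under a hypothetical `overall_score` key (the article's agents pass text feedback instead), and it uses a `SimpleNamespace` stand-in for `ToolContext` so it runs without ADK installed.

```python
from types import SimpleNamespace

QUALITY_THRESHOLD = 70  # same threshold the article uses

def make_fake_tool_context(state: dict) -> SimpleNamespace:
    """Hypothetical stand-in exposing .state and .actions.escalate."""
    return SimpleNamespace(state=state, actions=SimpleNamespace(escalate=False))

def exit_if_good_enough(tool_context) -> dict:
    """Escalate only when the stored score reaches the threshold."""
    score = tool_context.state.get("overall_score", 0)
    if score >= QUALITY_THRESHOLD:
        tool_context.actions.escalate = True
        return {"result": "Quality threshold met", "score": score}
    return {"result": "Keep improving", "score": score}

low = make_fake_tool_context({"overall_score": 55})
high = make_fake_tool_context({"overall_score": 75})
print(exit_if_good_enough(low), low.actions.escalate)
print(exit_if_good_enough(high), high.actions.escalate)
```

Putting the threshold check inside the tool, rather than relying only on the agent's instruction, keeps the exit decision deterministic.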

The Critique-Refine Pattern

Architecture for autonomous quality improvement:

Drafter Agent — Creates initial version (runs once)
Checker Agent — Evaluates quality, calculates scores
Improver Agent — Fixes issues OR exits if quality met
Loop: Checker → Improver → Checker → … until threshold met

Building the Quality Loop

Setup

!pip install google-adk==1.19.0 -q

Step 1: Define Quality Tools

from google.adk.tools import ToolContext

def calculate_content_quality_score(
    word_count: int,
    readability_score: float,
    has_headings: bool,
    has_conclusion: bool
) -> dict:
    """
    Calculates overall content quality score (0-100).
    Threshold for approval: 70+
    """
    print("🔧 Calculating quality score...")

    # Word count scoring (optimal: 800-2000)
    if word_count < 500:
        word_score = 30
    elif word_count < 800:
        word_score = 60
    elif word_count <= 2000:
        word_score = 100
    else:
        word_score = 80

    # Readability scoring
    read_score = min(100, readability_score * 1.5) if readability_score > 0 else 40

    # Structure scoring
    structure_score = 0
    if has_headings:
        structure_score += 50
    if has_conclusion:
        structure_score += 50

    # Overall
    overall = (word_score * 0.3) + (read_score * 0.3) + (structure_score * 0.4)

    result = {
        "overall_score": round(overall, 2),
        "meets_threshold": overall >= 70
    }

    print(f"   Score: {result['overall_score']}/100")
    return result

QUALITY_THRESHOLD_MET = "QUALITY_THRESHOLD_MET"

def exit_loop(tool_context: ToolContext):
    """Terminates loop when quality meets threshold."""
    print("🔧 Quality approved! Terminating loop...")
    tool_context.actions.escalate = True  # ← THE MAGIC
    return {"result": "Quality threshold met"}

print("✅ Tools defined!")
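
To see how the 30/30/40 weighting behaves, here is a worked example that reuses only the final formula from `calculate_content_quality_score`. The helper name `weighted_quality` and the input values are illustrative assumptions, not part of the original notebook.

```python
def weighted_quality(word_score: float, read_score: float,
                     structure_score: float) -> float:
    """Same 30/30/40 weighting as calculate_content_quality_score."""
    return round(word_score * 0.3 + read_score * 0.3 + structure_score * 0.4, 2)

# 350 words -> word_score 30; readability 60 -> min(100, 60 * 1.5) = 90
with_structure = weighted_quality(30, 90, 100)    # headings + conclusion
without_headings = weighted_quality(30, 90, 50)   # conclusion only
print(with_structure, without_headings)
```

With these inputs the scores are 76 and 56. Structure carries 40% of the weight, so a short draft that has both headings and a conclusion can clear the 70 threshold, while the same draft without headings cannot.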

Step 2: Create the Agent Team

from google.adk.agents import Agent

# Agent 1: Drafter (runs once)
content_drafter_agent = Agent(
    name="content_drafter_agent",
    model="gemini-2.5-flash",
    instruction="""
    Write a blog post about: {{topic}}

    Create a draft (300-500 words) with:
    - Engaging intro
    - At least one H2 heading
    - A conclusion

    Output only the content in markdown.
    """,
    tools=[],
    output_key="current_content"
)

# Agent 2: Quality Checker (runs each loop iteration)
quality_checker_agent = Agent(
    name="quality_checker_agent",
    model="gemini-2.5-flash",
    instruction=f"""
    Analyze: {{{{current_content}}}}

    Your job:
    1. Count approximate words
    2. Estimate readability (60+ is good)
    3. Check for headings
    4. Check for conclusion

    Use `calculate_content_quality_score` tool.

    Then:
    - IF overall_score >= 70: respond '{QUALITY_THRESHOLD_MET}'
    - ELSE: respond 'Score: [X]. Issues: [specific problems]'
    """,
    tools=[calculate_content_quality_score],
    output_key="quality_feedback"
)

# Agent 3: Improver (runs each loop iteration)
content_improver_agent = Agent(
    name="content_improver_agent",
    model="gemini-2.5-flash",
    instruction=f"""
    Current content: {{{{current_content}}}}
    Feedback: {{{{quality_feedback}}}}

    - IF feedback is '{QUALITY_THRESHOLD_MET}': call `exit_loop` immediately
    - ELSE: improve based on issues:
      * Expand if short
      * Simplify if complex
      * Add headings if missing
      * Add conclusion if missing

    Output the COMPLETE improved content.
    """,
    tools=[exit_loop],
    output_key="current_content"  # ← Overwrites for next iteration!
)

print("🧞 Agent team created!")

Key insight: current_content gets overwritten each iteration, allowing content to evolve!

Step 3: Build the Loop

from google.adk.agents import SequentialAgent, LoopAgent

# The iterative quality loop
quality_improvement_loop = LoopAgent(
    name="quality_improvement_loop",
    sub_agents=[quality_checker_agent, content_improver_agent],
    max_iterations=3  # Safety limit
)

# Complete workflow: Draft → Loop → Present
quality_workflow = SequentialAgent(
    name="quality_workflow",
    sub_agents=[
        content_drafter_agent,
        quality_improvement_loop,
        # Optional: final presenter agent
    ]
)

print("✅ Iterative workflow created!")

Testing the Loop

from IPython.display import display, Markdown
from google.adk.sessions import InMemorySessionService
from google.adk.runners import Runner
from google.genai.types import Content, Part

session_service = InMemorySessionService()
user_id = "adk_content_creator_001"

async def run_quality_workflow():
    session = await session_service.create_session(
        app_name=quality_workflow.name,
        user_id=user_id
    )

    topic = "The benefits of meditation for busy professionals"
    session.state["topic"] = topic

    query = f"Create high-quality content about: {topic}"
    print(f"👤 User: {query}\n")

    runner = Runner(
        agent=quality_workflow,
        session_service=session_service,
        app_name=quality_workflow.name
    )

    async for event in runner.run_async(
        user_id=user_id,
        session_id=session.id,
        new_message=Content(parts=[Part(text=query)], role="user"),
        state_delta={"topic": topic}
    ):
        if event.is_final_response():
            display(Markdown(event.content.parts[0].text))

await run_quality_workflow()

Example Output

👤 User: Create high-quality content about: The benefits of meditation...

[Iteration 1]
📝 Draft created (350 words, score: 55)
🔧 Calculating quality score...
   Score: 55/100 - BELOW THRESHOLD
Issues: Too short, missing headings, needs more structure

[Iteration 2]
✏️ Improving content...
   Added H2 headings, expanded to 650 words
🔧 Calculating quality score...
   Score: 68/100 - BELOW THRESHOLD
Issues: Almost there, needs better conclusion

[Iteration 3]
✏️ Final improvements...
   Enhanced conclusion, polished language
🔧 Calculating quality score...
   Score: 75/100 - MEETS THRESHOLD ✅
🔧 Quality approved! Terminating loop...

✅ Final approved content delivered!

What’s Next?

We can create self-improving workflows! But what about efficiency?

Sequential execution: 40 seconds
Parallel execution: 10 seconds (4x faster!)

Part 6: Parallel AI Workflows introduces ParallelAgent for concurrent execution.

Try It Yourself!

Ready to build loop workflows? Click the button below:

GitHub Repository: content_creation_mas_workshop

Happy looping!


Google ADK: From Local Development to Vertex AI Deployment: Part 4

Saoussen CHAABNIA — Tue, 06 Jan 2026 16:40:44 GMT

Welcome to Part 4 of Google ADK: From Local Development to Vertex AI Deployment! You’ve mastered agent delegation. Now let’s tackle workflows where order matters.

Google ADK: From Local Development to Vertex AI Deployment series:

Part 1: Building Your First AI Agent
Part 2: Custom Tools — Extending Agent Capabilities
Part 3: Multi-Agent Orchestration with Agent-as-a-Tool
Part 4: Sequential Workflows with SequentialAgent (You are here)
Part 5: Self-Improving Agents with LoopAgent
Part 6: Efficient Workflows with ParallelAgent
Part 7: Complete Multi-Agent System — The Capstone
Part 8: Deploying to Vertex AI Agent Engine
Part 9: Full-Stack Deployment with Cloud Run

Introduction: The Order Problem

In Part 3, we built an orchestrator that delegates to specialists based on user intent. Powerful stuff! But there’s a problem:

What if a task inherently requires multiple steps in a specific order?

Consider creating a blog post:

First: Research trending topics ← Can’t skip this
Then: Write content based on research ← Needs topic from step 1
Finally: Format for social media ← Needs content from step 2

You can’t write before researching. You can’t format before writing. Order matters.

Today, you’ll learn how to build sequential workflows using the SequentialAgent pattern. By the end of this article, you’ll have a three-stage content pipeline where data automatically flows from research → writing → social formatting.

What you’ll learn:

The SequentialAgent workflow pattern
The output_key parameter for state management
Variable interpolation with {{variable}}
Automatic data flow between agents

Prerequisites:

Completed Parts 1–3 or familiar with agents and orchestration
Understanding of sessions and state
Google API key

Let’s build your first workflow!

The Need for Sequential Execution

Current Limitation

With our orchestrator from Part 3:

# User asks for research
orchestrator → topic_research_agent → returns topics

# User asks to write based on those topics
orchestrator → content_writer_agent → ... wait, it doesn’t have the topics!

The problem: Each agent call is independent. Data doesn’t flow automatically.

What We Need

topic_research_agent → blog_topic (stored)
         ↓
content_writer_agent → uses blog_topic → blog_content (stored)
         ↓
social_formatter_agent → uses blog_content → social_posts

The solution: SequentialAgent with automatic state passing!

Introducing SequentialAgent

Concept: Workflow Agents

What is SequentialAgent? A SequentialAgent is a workflow agent that executes its sub-agents in a specific order. It’s designed for processes where the order of operations matters — each agent’s output becomes input for the next.

Key Characteristics:

Executes sub-agents one after another (not simultaneously)
Automatically passes state between agents
Useful for tasks with dependencies
Parameter: sub_agents - list of agents to run in order

When to use:

Multi-step processes
Data transformations (input → process → output)
Pipelines with dependencies

Reference: Workflow Agents

State Management: The Key to Data Flow

Concept: output_key Parameter

What is output_key? The output_key parameter tells ADK to store an agent’s final response in the session state under a specific variable name. This makes the output available to subsequent agents in the workflow.

How it works:

agent1 = Agent(
    name="researcher",
    output_key="blog_topic"  # ← Stores response here
)
# After agent1 runs, session.state["blog_topic"] = agent1's response

The magic:

Agent 1 runs → output stored as blog_topic
Agent 2 can reference {{blog_topic}} in its instructions
ADK automatically replaces {{blog_topic}} with the actual value

Reference: Workflow Agents

Concept: Variable Interpolation

What is Variable Interpolation? ADK uses {{variable_name}} syntax in agent instructions to reference values from the session state. At runtime, ADK automatically replaces these placeholders with actual values.

Syntax:

instruction = “Write a blog post about: {{blog_topic}}”
# At runtime: {{blog_topic}} → “10 AI Tools for Small Businesses”

Rules:

Use double curly braces: {{variable}}
Variable must exist in session state
Previous agents set these via output_key
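
Conceptually, the substitution works like the sketch below. It is a plain-Python illustration, not ADK's actual implementation: `render_instruction` and the regular expression are assumptions made for this example, and raising `KeyError` on a missing variable is this sketch's choice for enforcing the "must exist in session state" rule.

```python
import re

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render_instruction(template: str, state: dict) -> str:
    """Replace each {{name}} with state[name]; fail loudly if it is missing."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in state:
            raise KeyError(f"'{key}' is not in session state")
        return str(state[key])
    return PLACEHOLDER.sub(substitute, template)

state = {"blog_topic": "10 AI Tools for Small Businesses"}
rendered = render_instruction("Write a blog post about: {{blog_topic}}", state)
print(rendered)
```

Failing early on a missing key makes ordering mistakes visible, for example when an agent references a variable that no earlier agent has written.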

Reference: Workflow Agents

Building a Three-Stage Content Pipeline

Let’s build a complete workflow: Research → Write → Format

Step 1: Setup

!pip install google-adk==1.19.0 -q

import os
from getpass import getpass

api_key = getpass('Enter your Google API Key: ')
os.environ['GOOGLE_API_KEY'] = api_key
print('✅ API Key configured!')

Step 2: Agent 1 — Topic Researcher

This agent finds ONE perfect topic and stores it:

from google.adk.agents import Agent
from google.adk.tools import google_search

topic_research_agent = Agent(
    name="topic_research_agent",
    model="gemini-2.5-flash",
    description="Researches trending blog topics",
    instruction="""
    You are a content strategist. Find compelling blog topics.

    Process:
    1. Search for trending topics in the niche
    2. Select the SINGLE BEST topic
    3. Output ONLY the title

    Example output: "10 Zero-Waste Swaps to Transform Your Kitchen"

    Important: Output ONLY the blog post title, nothing else.
    """,
    tools=[google_search],
    output_key="blog_topic"  # 🔑 Stores result in session.state["blog_topic"]
)

print(f"🧞 Agent '{topic_research_agent.name}' created!")

Key points:

output_key="blog_topic": stores the title for the next agent
The instruction emphasizes “Output ONLY the title”
This ensures clean data for the next stage

Step 3: Agent 2 — Content Writer

This agent uses the topic from Agent 1:

content_writer_agent = Agent(
    name="content_writer_agent",
    model="gemini-2.5-flash",
    description="Writes engaging blog posts",
    instruction="""
    You are a blog writer. Write about: {{blog_topic}}

    Requirements:
    - 400-600 words
    - Engaging intro
    - 3-4 sections with H2 headings
    - Clear conclusion with CTA
    - Conversational tone

    Output ONLY the blog post in markdown.
    """,
    tools=[],
    output_key="blog_content"  # 🔑 Stores result in session.state["blog_content"]
)

print(f"🧞 Agent '{content_writer_agent.name}' created!")

Notice:

{{blog_topic}}: references the variable from Agent 1
output_key="blog_content": stores the result for Agent 3
No tools needed, pure content generation

Step 4: Agent 3 — Social Media Formatter

This agent creates social posts from the blog content:

social_formatter_agent = Agent(
    name="social_formatter_agent",
    model="gemini-2.5-flash",
    description="Creates social media posts",
    instruction="""
    Create social posts from: {{blog_content}}

    Create THREE posts:

    1. **Twitter/X** (280 chars)
       - Hook + hashtags + CTA

    2. **LinkedIn** (150-200 words)
       - Professional tone
       - Key insights
       - Hashtags

    3. **Instagram** (150 words)
       - Engaging + emojis
       - 8-10 hashtags
       - Strong CTA

    Format with clear headers for each platform.
    """,
    tools=[]
    # Note: No output_key - this is the final agent
)

print(f"🧞 Agent '{social_formatter_agent.name}' created!")

Notice:

{{blog_content}}: uses content from Agent 2
No output_key: final output goes to the user

Step 5: Chain Them with SequentialAgent

Now the magic happens:

from google.adk.agents import SequentialAgent

content_creation_workflow = SequentialAgent(
    name="content_creation_workflow",
    sub_agents=[
        topic_research_agent,      # Step 1: Research
        content_writer_agent,      # Step 2: Write
        social_formatter_agent     # Step 3: Format
    ],
    description="Research → Write → Format workflow"
)

print("✅ Sequential workflow created!")
print("\n🔄 Execution Flow:")
print("   1. Research trending topics → blog_topic")
print("   2. Write blog post using {{blog_topic}} → blog_content")
print("   3. Format social posts using {{blog_content}} → final output")

That’s it! Three agents, one SequentialAgent, automatic data flow.

Understanding the Data Flow

Concept: State Passing Between Agents

How Does State Passing Work? In a SequentialAgent workflow, data flows automatically:

Agent 1 runs → stores output via output_key="var1"
Agent 2 reads {{var1}} from state → stores output via output_key="var2"
Agent 3 reads {{var2}} from state → produces final output

ADK handles storing, interpolating, and passing state. Zero manual work!
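
The flow above can be modeled in a few lines of plain Python. This is a conceptual sketch only: `run_sequence`, the `Step` tuple, and the lambda "agents" are hypothetical stand-ins, and the real work in ADK is done by `SequentialAgent` and the session service.

```python
from typing import Callable, Optional

# Each step: (name, function(state) -> text, output_key or None)
Step = tuple[str, Callable[[dict], str], Optional[str]]

def run_sequence(steps: list[Step], state: dict) -> str:
    """Run steps in order; store each result under its output_key, if any."""
    last_output = ""
    for _name, fn, output_key in steps:
        last_output = fn(state)
        if output_key is not None:
            state[output_key] = last_output  # visible to every later step
    return last_output

steps: list[Step] = [
    ("research", lambda s: "Zero-Waste Kitchen Swaps", "blog_topic"),
    ("write", lambda s: f"Post about {s['blog_topic']}", "blog_content"),
    ("format", lambda s: f"Tweet: {s['blog_content']}", None),
]

state: dict = {}
final = run_sequence(steps, state)
print(final)
print(sorted(state))
```

The last step has no output key, so its result is returned to the caller instead of being stored, which mirrors the final agent in the pipeline.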

Visual representation:

┌─────────────────────────────────────┐
│ topic_research_agent                │
│ output_key=”blog_topic”             │
└──────────────┬──────────────────────┘
               │
               ▼ (blog_topic stored in state)
┌─────────────────────────────────────┐
│ content_writer_agent                │
│ instruction: “Write about:          │
│              {{blog_topic}}”        │ ← ADK replaces this
│ output_key=”blog_content”           │
└──────────────┬──────────────────────┘
               │
               ▼ (blog_content stored in state)
┌─────────────────────────────────────┐
│ social_formatter_agent              │
│ instruction: “Create posts from:    │
│              {{blog_content}}”      │ ← ADK replaces this
└──────────────┬──────────────────────┘
               │
               ▼
         Final Output

Reference: Workflow Agents

Running the Workflow

Setup execution engine:

from IPython.display import display, Markdown
from google.adk.sessions import InMemorySessionService, Session
from google.adk.runners import Runner
from google.genai.types import Content, Part

session_service = InMemorySessionService()
user_id = "adk_content_creator_001"

async def run_agent_query(agent, query, session, user_id):
    print(f"\n🚀 Running: '{agent.name}'...")

    runner = Runner(agent=agent, session_service=session_service, app_name=agent.name)

    final_response = ""
    try:
        async for event in runner.run_async(
            user_id=user_id,
            session_id=session.id,
            new_message=Content(parts=[Part(text=query)], role="user")
        ):
            if event.is_final_response():
                final_response = event.content.parts[0].text
    except Exception as e:
        final_response = f"Error: {e}"

    print("\n" + "-"*50)
    display(Markdown(final_response))
    print("-"*50)

    return final_response

print("✅ Execution engine ready!")

Run the complete workflow:

async def run_workflow():
    session = await session_service.create_session(
        app_name=content_creation_workflow.name,
        user_id=user_id
    )

    query = "Create content for sustainable living and zero-waste lifestyle blog"
    print(f"👤 User: {query}\n")

    await run_agent_query(content_creation_workflow, query, session, user_id)

await run_workflow()

Example Execution

Watch the three-stage pipeline in action:

👤 User: Create content for sustainable living and zero-waste lifestyle blog

🚀 Running: ‘content_creation_workflow’...

[Stage 1: Topic Research]
🔍 Searching for trending topics...
✓ Selected topic: “10 Zero-Waste Swaps to Transform Your Kitchen”
✓ Stored in state as: blog_topic

[Stage 2: Content Writing]
✓ Retrieved from state: blog_topic = “10 Zero-Waste Swaps...”
✍️ Writing 500-word blog post...
✓ Stored in state as: blog_content

[Stage 3: Social Formatting]
✓ Retrieved from state: blog_content = “Transform your kitchen...”
📱 Creating social posts for 3 platforms...

--------------------------------------------------
✅ Final Response:

## Twitter/X
🌱 Transform your kitchen into a zero-waste powerhouse! Discover 10 simple swaps that
save money & the planet. From beeswax wraps to compost bins. Start today!
#ZeroWaste #SustainableLiving #EcoFriendly #GreenKitchen

## LinkedIn
The average household produces 4.4 pounds of waste daily, with kitchens being the
biggest culprit. But transformation doesn’t require perfection—it requires small,
intentional swaps.

In our latest blog post, we explore 10 zero-waste kitchen alternatives:
- Beeswax wraps replace plastic wrap
- Glass containers instead of disposable bags
- Compost bins for organic waste
- Reusable produce bags

Each swap is practical, affordable, and immediately implementable. Perfect for
businesses promoting sustainability or individuals starting their eco-journey.

Read the full guide: [link]

#Sustainability #ZeroWaste #EcoFriendly #GreenBusiness #CircularEconomy

## Instagram
🌿✨ Your kitchen called—it wants to go green! ✨🌿

Tired of single-use plastics? We’ve got you covered with 10 game-changing swaps:
🐝 Beeswax wraps > plastic wrap
🥫 Glass jars > disposable containers
🗑️ Compost bin > landfill waste
🛍️ Reusable bags > plastic produce bags

Small changes, BIG impact! 🌍💚

Click the link in bio to discover all 10 swaps + how to implement them TODAY!

#ZeroWaste #SustainableLiving #EcoFriendly #GreenKitchen #PlasticFree
#Zerowaste #Sustainability #EcoConscious #GreenLiving #SaveThePlanet

Notice how:

Stage 1 found a specific topic
Stage 2 wrote content about that exact topic
Stage 3 created social posts from that exact content
All automatic, no manual copying needed!

What’s Next?

We can now create ordered workflows where data flows automatically! But what if we need iterative refinement?

Imagine:

Draft content → Check quality → Improve → Check again → Improve → … → Until good enough

Current problem:

# This runs once and stops
SequentialAgent([drafter, checker, improver])

We need:

# This loops until quality threshold met
LoopAgent([checker, improver], max_iterations=3)

In Part 5: Self-Improving AI with LoopAgent, we’ll learn how to build iterative workflows that improve content through critique-refine cycles until quality standards are met.

Try It Yourself!

Ready to build sequential workflows? Click the button below:

GitHub Repository: content_creation_mas_workshop

Happy sequential workflow sequencing!

