Finding the Sweet Spot: Our Migration from Heroku to Dokploy

Last year, our Heroku bill hit $3,000/month. The breakdown was painful: $500/month for managed Postgres, $200 for Redis, and over $2,000 for dynos running our Django REST API and Celery workers. The compute costs were especially galling—we were paying premium prices for dynos that sat idle most of the time.

We needed to migrate. The database part was straightforward—Heroku Postgres to Cloud SQL. But what about the compute? That's where things got interesting.

The False Dichotomy

My first attempt was the "pure infrastructure as code" approach. Since we were already using managed databases, I figured I could just use Terraform for GCP compute resources and bash scripts for deployment. How hard could it be?

#!/bin/bash
# deploy.sh - "simple" deployment script
# (300 lines of git pulls, pip installs, migrations, 
# celery restart logic, nginx reloads...)

Two weeks later, I had a working system. I also had:

  • Zero-downtime deployment scripts that nobody else understood
  • Manual coordination between Django migrations and code deploys
  • Celery workers that occasionally forgot to restart properly
  • A 10-page runbook for "simple" deployments
  • Team members afraid to deploy on Fridays (or any day, really)

The worst part? Debugging failures meant SSHing into multiple instances, grepping through systemd logs, and praying the issue was reproducible.

Why Not Cloud Run?

"Just use Cloud Run," was the first suggestion. It's Google's serverless container platform—sounds perfect, right?

Wrong. Cloud Run is built around request/response workloads. Our Celery workers are long-running processes that pull tasks from Redis queues; they don't serve HTTP requests at all. By default, Cloud Run only allocates CPU while a request is in flight and scales idle instances to zero, which would kill in-progress tasks, and even its request timeout tops out at 60 minutes.

Plus, Celery workers need persistent connections to Redis. They need to handle signals properly for graceful shutdowns. They need specific concurrency settings based on the task types. Cloud Run's autoscaling would constantly fight with Celery's own process management.
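
To make that concrete, here's a minimal sketch of the kind of worker tuning we depend on (module, queue, and task names are illustrative, not our real ones). None of it maps cleanly onto an autoscaler that thinks in HTTP requests:

# celeryconfig.py - illustrative worker settings, loaded via app.config_from_object
import os

broker_url = os.environ.get("REDIS_URL", "redis://localhost:6379/0")  # long-lived Redis connection

# Don't acknowledge a task until it finishes, so a killed worker doesn't lose work
task_acks_late = True
worker_prefetch_multiplier = 1   # one task in flight per process

# Heavy jobs go to their own queue with its own worker pool
task_routes = {
    "reports.tasks.*": {"queue": "reports"},
    "emails.tasks.*":  {"queue": "default"},
}
worker_concurrency = 4           # in practice this differs per worker type

# Workers treat SIGTERM as a warm shutdown: finish the current task, stop
# pulling new ones, then exit. For long tasks that can take minutes.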

We needed actual long-running containers, not serverless functions pretending to be containers.

Why Not Cloud Tasks?

"What about Cloud Tasks?" someone suggested. "It's Google's managed task queue. Ditch Celery entirely!"

This sounded great until we looked at our codebase. We have five years of Celery tasks:

  • Complex task chains and groups using Celery's Canvas (see the sketch after this list)
  • Custom retry logic with exponential backoff
  • Task routing to specific queues based on priority
  • Periodic tasks managed by Celery Beat
  • Result backends for tracking long-running jobs
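
To give a flavor of what lives in that codebase, here's a compressed, illustrative sketch of those patterns; the task names and schedule are made up, not our real code:

from celery import Celery, chain, group
from celery.schedules import crontab

app = Celery("myapp", broker="redis://localhost:6379/0")  # illustrative broker URL

@app.task
def refresh_account(account_id):
    ...  # pull fresh data for one account

@app.task
def aggregate_results(results):
    ...  # combine the per-account payloads into one report

# Canvas: fan out per-account work, then aggregate the group's results
def rebuild_dashboards(account_ids):
    workflow = chain(
        group(refresh_account.s(a) for a in account_ids),
        aggregate_results.s(),
    )
    return workflow.apply_async()

# Celery Beat: periodic tasks declared alongside the code they run
app.conf.beat_schedule = {
    "nightly-cleanup": {
        "task": "maintenance.tasks.purge_expired_sessions",  # hypothetical task
        "schedule": crontab(hour=3, minute=0),
    },
}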

Migrating to Cloud Tasks would mean:

# Current Celery code (thousands of these)
@app.task(bind=True, max_retries=3)  # app is the shared Celery application instance
def process_report(self, user_id, report_type):
    try:
        # Complex processing logic
        return result
    except Exception as exc:
        raise self.retry(exc=exc, countdown=60 * 2 ** self.request.retries)

# Would need to become Cloud Tasks + Pub/Sub + custom retry logic
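
For contrast, even a single fire-and-forget enqueue through Cloud Tasks means writing client boilerplate like this and standing up an HTTP endpoint to receive the work. A rough sketch using the google-cloud-tasks client; the project, queue, and handler URL are placeholders:

import json
from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()
# Placeholder project/location/queue; retry and backoff policy lives on the queue itself
parent = client.queue_path("my-project", "us-central1", "reports")

def enqueue_report(user_id, report_type):
    task = {
        "http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "url": "https://api.example.com/tasks/process-report",  # an HTTP handler we'd have to build
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"user_id": user_id, "report_type": report_type}).encode(),
        }
    }
    return client.create_task(request={"parent": parent, "task": task})

# No result backend, no Canvas-style chaining, no Beat equivalent; all of that
# would need rebuilding around Pub/Sub, Cloud Scheduler, and our own glue code.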

We're talking about rewriting hundreds of async tasks, rebuilding our entire task routing system, and reimplementing Celery features we rely on. All for what? To replace a system that already works?

Our legacy tasks might not be pretty, but they process millions of jobs reliably. Rewriting them for a different queue system would take months and introduce bugs in code that hasn't failed in years.

Why Not Just Use Kubernetes?

"Fine, use GKE then," everyone said. We actually run GKE for other projects, so we know the platform well. But that's exactly why we didn't want it for this project.

Out of our team of seven, only two of us are comfortable with Kubernetes operations. When something goes wrong at 2 AM, those two people become single points of failure. The rest of the team can deploy to Heroku, but ask them to debug a CrashLoopBackOff or update a ConfigMap? Different story.

For a simple Django + Celery setup, we'd need:

  • Deployment manifests for API and each worker type
  • Service definitions and ingress rules
  • ConfigMaps for environment variables
  • HPA configurations for scaling
  • PersistentVolumeClaims for media files

Every new developer would need days of onboarding just to understand the deployment pipeline. Every on-call rotation would depend on those two Kubernetes experts being available. It wasn't about the technology—it was about the bus factor.

The Missing Middle

What we needed was something between "bash scripts and systemd" and "full Kubernetes." Coming from Heroku, we wanted:

  • Git push deployments that any developer could do
  • Simple environment variable management
  • Easy worker scaling without YAML
  • No cluster operations knowledge required
  • Support for our existing Celery setup without changes

Unfortunately, GCP doesn't have an equivalent to AWS ECS. But after testing various solutions, Dokploy emerged as the winner.

Dokploy + Terraform: A Pragmatic Approach

Here's what made sense: use Terraform for what it's good at (cloud resources) and Dokploy for what it's good at (application deployment):

# terraform/main.tf
# Network setup
resource "google_compute_network" "main" {
  name                    = "dokploy-network"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "main" {
  name          = "dokploy-subnet"
  ip_cidr_range = "10.0.0.0/24"
  network       = google_compute_network.main.id
  region        = "us-central1"
}

# Firewall rules
resource "google_compute_firewall" "dokploy_web" {
  name    = "dokploy-web"
  network = google_compute_network.main.name

  allow {
    protocol = "tcp"
    ports    = ["80", "443"]
  }
  
  source_ranges = ["0.0.0.0/0"]
  target_tags   = ["dokploy-app"]
}

# Compute instances
resource "google_compute_instance" "dokploy_app" {
  count        = 2
  name         = "dokploy-app-${count.index}"
  machine_type = "e2-standard-4"
  tags         = ["dokploy-app"]
  
  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
      size  = 50
    }
  }
  
  network_interface {
    subnetwork = google_compute_subnetwork.main.id
    access_config {} # External IP
  }
  
  metadata_startup_script = file("install-dokploy.sh")
}

resource "google_compute_instance" "dokploy_worker" {
  count        = 3
  name         = "dokploy-worker-${count.index}"
  machine_type = "e2-highmem-2"  # Celery loves memory
  tags         = ["dokploy-worker"]
  
  # Similar configuration...
}

# Storage for static files and media
resource "google_storage_bucket" "media" {
  name     = "myapp-media-files"
  location = "US"
}

resource "google_storage_bucket" "static" {
  name     = "myapp-static-files"
  location = "US"
}

# Managed databases (already migrated from Heroku)
resource "google_sql_database_instance" "main" {
  name             = "main-postgres"
  database_version = "POSTGRES_14"
  
  settings {
    tier = "db-custom-4-16384"
    
    ip_configuration {
      ipv4_enabled    = true
      private_network = google_compute_network.main.id
    }
  }
}

resource "google_redis_instance" "cache" {
  name           = "main-redis"
  tier           = "STANDARD_HA"
  memory_size_gb = 4
  
  authorized_network = google_compute_network.main.id
}

The beauty? Dokploy handles all the application-level concerns:

  • Load balancing between instances with Traefik
  • Automatic SSL certificate management
  • Django static file collection and serving from GCS (see the settings sketch after this list)
  • Celery worker management with proper signal handling
  • Zero-downtime deployments
  • Our existing Celery code runs unchanged
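
On the Django side, pointing media and static files at the buckets defined in the Terraform above is mostly a settings change. A sketch assuming django-storages and Django 4.2+'s STORAGES setting:

# settings.py - storage backends (illustrative; assumes django-storages[google])
STORAGES = {
    "default": {  # user-uploaded media
        "BACKEND": "storages.backends.gcloud.GoogleCloudStorage",
        "OPTIONS": {"bucket_name": "myapp-media-files"},
    },
    "staticfiles": {  # collectstatic target
        "BACKEND": "storages.backends.gcloud.GoogleCloudStorage",
        "OPTIONS": {"bucket_name": "myapp-static-files"},
    },
}

With something like this in place, collectstatic during a deploy pushes assets straight to the bucket instead of onto the instance's disk.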

The Deployment Flow

Now deployments are dead simple:

  1. Developer pushes to main branch
  2. CI/CD runs tests
  3. git push dokploy main
  4. Dokploy builds containers, runs migrations, restarts services
  5. Traefik automatically handles SSL and routing

No SSH. No bash scripts. No Kubernetes expertise required. No rewriting Celery tasks.

The Results

Six months later:

  • Hosting costs: $3,000/month → $800/month
  • Compute costs specifically: $2,300/month → $300/month
  • Deployment time: 15 minutes → 3 minutes
  • Failed deployments: ~1 per week → ~1 per month
  • Engineers comfortable deploying: 2 → 7 (entire team)
  • Legacy Celery code touched: 0 lines

We kept using managed services for databases—Cloud SQL for Postgres, Memorystore for Redis. No point in managing stateful services when GCP does it better.
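
Wiring the app to those managed services is just configuration. A sketch of the relevant settings, with hostnames and credentials coming from environment variables managed in Dokploy:

# settings.py - database and broker wiring (illustrative names and env vars)
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "HOST": os.environ["DB_HOST"],          # Cloud SQL private IP
        "NAME": os.environ.get("DB_NAME", "myapp"),
        "USER": os.environ["DB_USER"],
        "PASSWORD": os.environ["DB_PASSWORD"],
        "PORT": os.environ.get("DB_PORT", "5432"),
    }
}

# Celery broker and result backend on Memorystore
CELERY_BROKER_URL = os.environ["REDIS_URL"]      # e.g. redis://<memorystore-ip>:6379/0
CELERY_RESULT_BACKEND = os.environ["REDIS_URL"]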

Lessons Learned

This migration taught me:

  1. Don't rewrite working code for infrastructure reasons - If your Celery tasks work, keep them
  2. Consider your team's expertise distribution - Having two Kubernetes experts doesn't mean the whole team can operate a cluster
  3. Developer experience compounds - Every friction point in deployment slows down the entire team
  4. Terraform + PaaS tools > Pure Terraform - Let each tool do what it's good at
  5. Managed databases are non-negotiable - Never run your own Postgres in production

The "pure" infrastructure-as-code approach looked good in architecture diagrams but was miserable in practice. Sometimes the best solution is the one that gets out of your team's way and lets your existing code keep running.