
VerteX Air-Gapped Install on EKS

Self-hosted Palette VerteX deployment on AWS EKS in the Recro internal account. Air-gapped: private-only EKS API, no NAT, all images pulled from ECR via VPC endpoints.

The bastion is fully autonomous — apply Terraform, wait ~40 minutes, VerteX is running. No manual intervention.


Architecture

S3 Bundle Bucket (vertex-ecr module)
  ├── charts.zip, palette-vertex-appliance-4.8.40.tar.zst
  ├── index.json, oci-layout (OCI metadata)
  ├── Scripts (vertex-mirror-pull, ecr-mirror, cluster-setup, full-setup, dns-update)
  └── Helm values (cert-manager, image-swap, vertex airgap configs)

ECR (vertex-ecr module)
  └── 128 repos under vertex-bootstrap/* (350 container images)

vertex-bootstrap module
  ├── VPC 10.1.0.0/16 (private + public subnets, no NAT)
  ├── VPC Endpoints (ECR, S3, EKS, STS, logs, EC2, SSM)
  ├── EKS cluster (K8s 1.32, private-only, 3x m5.2xlarge)
  ├── Client VPN (mutual TLS, split tunnel, 10.250.0.0/22)
  └── Bastion (t3.large, public subnet, SSM access)
        On boot:
        1. Install tools (kubectl, helm, oras, docker)
        2. Download scripts + values from S3
        3. Pull bundle from S3 + extract (17 GB)
        4. Wait for EKS cluster ACTIVE
        5. Wait for nodes Ready + create gp3 StorageClass
        6. Check ECR → mirror if empty (~20 min first time, skip on rebuild)
        7. Helm install cert-manager, image-swap, hubble
        8. Update DNS (vertex.recrocog.com → NLB)
        9. Export VPN .ovpn config to S3
        10. VerteX running

Prerequisites (one-time)

1. Spectro Cloud credentials

Download VerteX installer files from Artifact Studio.

Username: spectro
Password: mV715z##spPSJC

Shared credentials. Do not commit.

2. Download from Artifact Studio

Artifact Studio portal → Palette VerteX → Releases → 4.8.40

| File | Size | Purpose |
|---|---|---|
| charts.zip | ~390 KB | Helm charts (cert-manager, image-swap, VerteX mgmt plane) |
| palette-vertex-appliance-4.8.40.tar.zst | ~17 GB | Container images + Spectro packs |

3. Extract OCI metadata from the bundle

The tar.zst contains index.json, index.json.lock, and oci-layout that the mirror script needs. Extract them separately:

tar --use-compress-program=unzstd -xf palette-vertex-appliance-4.8.40.tar.zst \
  -C /tmp index.json index.json.lock oci-layout
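
A quick sanity check after extraction can catch a missing file before the upload step. The sketch below only tests that the three files named above exist in the target directory; the function name check_oci_metadata is illustrative and not part of the repo.

```shell
# Illustrative helper (not part of recro-aws-iac): confirm the three OCI
# metadata files were extracted into the given directory (default /tmp).
check_oci_metadata() {
  local dir="${1:-/tmp}" f missing=0
  for f in index.json index.json.lock oci-layout; do
    # -f rather than -s: a lock file may legitimately be empty.
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run it as check_oci_metadata /tmp; a non-zero exit status means at least one file is absent.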

4. AWS access

You must be a member of the recro GitHub org with access to recro-aws-iac.

aws sso login --profile <your-profile>
git clone https://github.com/recro/recro-aws-iac.git
cd recro-aws-iac

5. Local tools

  • git, terraform 1.10+, aws CLI v2, aws-session-manager-plugin

First-Time Setup

Step 1: Apply vertex_ecr (S3 bucket + ECR repos)

cd terraform && make init
terraform apply \
  -var="region=us-east-1" \
  -var-file=sso.tfvars -var-file=dns.tfvars -var-file=eks.tfvars \
  -var-file=cognito.tfvars -var-file=resource-cleanup.tfvars \
  -var-file=resource-startup.tfvars -var-file=manual-cleanup-web-app.tfvars \
  -var-file=vertex.tfvars \
  -target='module.vertex_ecr[0]'
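
The list of -var-file flags is long and easy to mistype. As a convenience sketch (the helper name var_file_args is made up here; the tfvars names are the ones used in the command above), the flags can be generated from a list of base names:

```shell
# Illustrative helper: print one -var-file=<name>.tfvars flag per argument,
# one per line, so the shell can word-split them onto a terraform command.
var_file_args() {
  local name
  for name in "$@"; do
    printf '%s\n' "-var-file=${name}.tfvars"
  done
}

# Example (same files as the command above; deliberately unquoted so the
# flags split into separate arguments):
#   terraform apply -var="region=us-east-1" \
#     $(var_file_args sso dns eks cognito resource-cleanup \
#         resource-startup manual-cleanup-web-app vertex) \
#     -target='module.vertex_ecr[0]'
```

This relies on the tfvars file names containing no spaces, which holds for the names listed above.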

Step 2: Upload bundle files to S3

aws s3 cp charts.zip s3://891377028731-vertex-bootstrap-bundle/
aws s3 cp palette-vertex-appliance-4.8.40.tar.zst s3://891377028731-vertex-bootstrap-bundle/

# Upload OCI metadata (extracted in prerequisite step 3)
aws s3 cp /tmp/index.json s3://891377028731-vertex-bootstrap-bundle/
aws s3 cp /tmp/index.json.lock s3://891377028731-vertex-bootstrap-bundle/
aws s3 cp /tmp/oci-layout s3://891377028731-vertex-bootstrap-bundle/

The 17 GB upload takes 60-90 min on residential internet. This is a one-time step — the files persist in S3 across rebuilds.
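
Because the bastion needs all five objects at boot, it can be worth confirming the bucket contents before applying vertex_bootstrap. The sketch below reads the output of aws s3 ls on stdin and prints any expected file that is absent; the function name missing_bundle_files is illustrative, and it assumes the usual four-column aws s3 ls listing (date, time, size, key).

```shell
# Illustrative helper: read an `aws s3 ls s3://<bucket>/` listing on stdin
# and print each expected bundle file that does not appear in it.
missing_bundle_files() {
  local listing f
  listing=$(cat)
  for f in charts.zip palette-vertex-appliance-4.8.40.tar.zst \
           index.json index.json.lock oci-layout; do
    # Compare against the key column exactly, so index.json.lock
    # does not count as index.json.
    if ! printf '%s\n' "$listing" |
         awk -v key="$f" '$4 == key { found = 1 } END { exit !found }'; then
      echo "$f"
    fi
  done
}

# Example:
#   aws s3 ls s3://891377028731-vertex-bootstrap-bundle/ | missing_bundle_files
```

Empty output means every expected file is present at the top level of the bucket.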

Step 3: Apply vertex_bootstrap

Via CI (recommended):

git tag <date>-vertex -m "Deploy vertex_bootstrap"
git push origin <date>-vertex

Tag must contain "vertex" to trigger the vertex-scoped CI apply. ~30-40 min total.

Or locally:

make plan-vertex && make apply-vertex

Step 4: Wait

The bastion does everything automatically:

  1. Install tools (~3 min)
  2. Download scripts + values from S3 (~10 sec)
  3. Pull 17 GB bundle from S3 + extract (~10 min)
  4. Wait for EKS cluster ACTIVE (~12 min, overlaps with bastion boot)
  5. Create gp3 StorageClass (~5 sec)
  6. Mirror ~350 images to ECR (~20 min, first time only — skipped if ECR already populated)
  7. Push Spectro packs to ECR (~2 min)
  8. Helm install cert-manager + image-swap (~5 min)
  9. Restore K8s Secrets from latest.tgz (~30 sec, fresh install only) — preserves master encryption key
  10. Helm install hubble (~8-10 min)
  11. Restore mongo from latest.tgz (~1 min, fresh install only) — all prior data comes back: activation, System ID, users, tenants
  12. Refresh CoreDNS with current VPC endpoint IPs
  13. Update DNS vertex.recrocog.com → NLB (public + private Route53 zones)
  14. Export fresh .ovpn to S3

Total first-time (empty ECR, no backup): ~40 min. Rebuild (ECR pre-populated, with backup): ~25 min — cluster comes back with all prior data, no manual intervention.
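
Several of the steps above are waits (cluster ACTIVE, nodes Ready). A bounded retry loop is one way to express such a wait; the sketch below is not the orchestrator's actual code. The names retry_until and cluster_is_active are hypothetical, and the 40 x 30 s figure is taken from the "retries every 30s for 20 min" behaviour listed under Common issues.

```shell
# Illustrative helper: retry_until <max_attempts> <sleep_seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
retry_until() {
  local max=$1 delay=$2 attempt=1
  shift 2
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt/$max failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1
}

# Hypothetical check: succeeds once the EKS cluster reports ACTIVE.
cluster_is_active() {
  local status
  status=$(aws eks describe-cluster --region us-east-1 --name vertex-bootstrap \
    --query 'cluster.status' --output text 2>/dev/null)
  [ "$status" = "ACTIVE" ]
}

# Example: 40 attempts x 30 s = 20 min
#   retry_until 40 30 cluster_is_active
```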

Step 5: Access VerteX console via VPN

The bastion auto-exports a .ovpn config file to S3 after each rebuild. Download it and connect.

First time — install AWS VPN Client:

Download from aws.amazon.com/vpn/client-vpn-download

Download .ovpn config (re-download after each rebuild):

aws s3 cp s3://891377028731-vertex-bootstrap-bundle/vertex-vpn.ovpn .

Connect:

  1. Open AWS VPN Client
  2. File → Manage Profiles → Add Profile → select vertex-vpn.ovpn
  3. Click Connect
  4. Browse https://vertex.recrocog.com/system

Login: username admin; set your own password (14+ characters, including upper case, lower case, a digit, and a special character).

kubectl also works directly while connected:

aws eks update-kubeconfig --region us-east-1 --name vertex-bootstrap
kubectl get nodes

Fallback — SSM tunnel (if VPN is unavailable):

BASTION=$(aws ec2 describe-instances --region us-east-1 \
  --filters "Name=tag:Name,Values=vertex-bootstrap-bastion" \
            "Name=instance-state-name,Values=running" \
  --query 'Reservations[0].Instances[0].InstanceId' --output text)

aws ssm start-session --region us-east-1 --target $BASTION \
  --document-name AWS-StartPortForwardingSessionToRemoteHost \
  --parameters '{"host":["<NLB-hostname>"],"portNumber":["443"],"localPortNumber":["8443"]}'

Browse: https://localhost:8443/system
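
The --parameters JSON is easy to get wrong when typed by hand. A small helper can build it from the NLB hostname; this is only a sketch, with ssm_port_forward_params and NLB_HOST as made-up names and the ports defaulting to the 443 and 8443 used above.

```shell
# Illustrative helper: print the --parameters JSON for
# AWS-StartPortForwardingSessionToRemoteHost.
# Args: <remote host> [remote port, default 443] [local port, default 8443]
ssm_port_forward_params() {
  printf '{"host":["%s"],"portNumber":["%s"],"localPortNumber":["%s"]}\n' \
    "$1" "${2:-443}" "${3:-8443}"
}

# Example (NLB_HOST is a placeholder you set to the NLB hostname):
#   aws ssm start-session --region us-east-1 --target "$BASTION" \
#     --document-name AWS-StartPortForwardingSessionToRemoteHost \
#     --parameters "$(ssm_port_forward_params "$NLB_HOST")"
```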


Destroy / Rebuild

Destroy (save costs)

git tag <date>-vertex -m "Destroy"
# Or locally:
terraform destroy -target='module.vertex_bootstrap[0]' -auto-approve ...

Destroys: VPC, EKS, bastion, VPN, all K8s workloads

Keeps: ECR repos (350 images), S3 bucket (bundle + scripts), KMS keys, ACM certs

Rebuild

git tag <date>-vertex -m "Rebuild"
git push origin <date>-vertex

The bastion boots, skips the ECR mirror (images are already there), runs the Helm installs, and VerteX is up in ~25 min. DNS auto-updates. A new .ovpn is uploaded to S3; re-download it and reconnect.


Monitoring & Troubleshooting

Check bastion logs

aws ssm start-session --region us-east-1 --target <bastion-id>
sudo -i

cat /var/log/vertex-full-setup.log     # Full orchestrator
cat /var/log/vertex-ecr-mirror.log     # ECR mirror
cat /var/log/vertex-cluster-setup.log  # Node wait + StorageClass
cat /var/log/vertex-dns-update.log     # DNS update
cat /var/log/vertex-bastion-bootstrap.log  # Tool versions
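
Rather than reading each log in full, a first pass can grep all five for error lines. The sketch below assumes failures are logged with the literal string ERROR, as in the "[boot] ERROR: Failed to download" symptom under Common issues; the function name scan_setup_logs is illustrative.

```shell
# Illustrative helper: print ERROR lines (with file name and line number)
# from the five bastion logs listed above. Default directory is /var/log.
scan_setup_logs() {
  local dir="${1:-/var/log}" log
  for log in vertex-full-setup vertex-ecr-mirror vertex-cluster-setup \
             vertex-dns-update vertex-bastion-bootstrap; do
    if [ -f "$dir/$log.log" ]; then
      grep -Hn 'ERROR' "$dir/$log.log"
    fi
  done
  return 0
}
```

On the bastion, run scan_setup_logs as root; no output means none of the logs contain the string ERROR.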

Re-run orchestrator

sudo /usr/local/bin/vertex-full-setup.sh

Idempotent — skips steps already completed.
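
For readers unfamiliar with the pattern, one common way to make a setup step skippable is a per-step marker file. The sketch below illustrates that general technique only; it is an assumption, not a description of how vertex-full-setup.sh decides what to skip, and run_once and the marker directory are made-up names.

```shell
# Illustrative helper: run_once <marker_dir> <step_name> <command...>
# Skips the command if a marker from a previous successful run exists.
run_once() {
  local marker_dir=$1 step=$2
  shift 2
  if [ -e "$marker_dir/$step.done" ]; then
    echo "skip: $step (already completed)"
    return 0
  fi
  # Only record the marker when the command succeeds.
  "$@" || return 1
  mkdir -p "$marker_dir"
  : > "$marker_dir/$step.done"
}
```

A failed step leaves no marker behind, so the next run retries it.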

Common issues

| Issue | Symptom | Fix |
|---|---|---|
| Scripts missing from S3 | [boot] ERROR: Failed to download | terraform apply -target='module.vertex_ecr[0]' |
| Cluster not ready | Cluster status is CREATING | Wait; the orchestrator retries every 30s for 20 min |
| Image pull failures | Pods ImagePullBackOff | Check ECR mirror log, re-run sudo /usr/local/bin/vertex-ecr-mirror.sh |
| MongoDB PVCs pending | PVCs stuck Pending | Check kubectl get sc; gp3 should be default |
| ELB blocks destroy | DependencyViolation on subnet | Destroy provisioner handles this automatically (ELB + ENI + SG cleanup) |
| DNS stale | vertex.recrocog.com wrong NLB | sudo /usr/local/bin/vertex-dns-update.sh |
| VPN .ovpn stale | Can't connect after rebuild | Re-download: aws s3 cp s3://891377028731-vertex-bootstrap-bundle/vertex-vpn.ovpn . |
| VPN cert error | .ovpn missing cert/key | sudo /usr/local/bin/vertex-vpn-export.sh then re-download |
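
For the pending-PVC case, the StorageClass check can be scripted. kubectl marks the default class by appending "(default)" to its name in the kubectl get sc listing, so the sketch below looks for a row whose first two fields are gp3 and (default). The function name gp3_is_default is illustrative.

```shell
# Illustrative helper: read `kubectl get sc` output on stdin and succeed
# only if a row reads "gp3 (default) ...".
gp3_is_default() {
  awk '$1 == "gp3" && $2 == "(default)" { ok = 1 } END { exit !ok }'
}

# Example:
#   kubectl get sc | gp3_is_default || echo "gp3 is not the default StorageClass"
```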

Quick Reference

| Thing | Value |
|---|---|
| Region | us-east-1 |
| EKS cluster | vertex-bootstrap |
| EKS endpoint | Private only |
| VPC CIDR | 10.1.0.0/16 |
| ECR registry | 891377028731.dkr.ecr.us-east-1.amazonaws.com |
| ECR repo prefix | vertex-bootstrap/ |
| S3 bundle bucket | s3://891377028731-vertex-bootstrap-bundle/ |
| Root domain | vertex.recrocog.com |
| VerteX version | 4.8.40 |
| Node sizing | 3x m5.2xlarge (8 vCPU, 32 GB RAM, 110 GB disk) |
| VPN client CIDR | 10.250.0.0/22 |
| VPN auth | Mutual TLS (certs in ACM) |
| VPN .ovpn | s3://891377028731-vertex-bootstrap-bundle/vertex-vpn.ovpn |
| Spectro Artifact Studio | spectro / mV715z##spPSJC |

CI Tag Convention

| Tag | Scope |
|---|---|
| <date> (e.g., 2026-04-17) | Core infra only (SSO, DNS, recro-eks); no vertex |
| <date>-vertex (e.g., 2026-04-17-vertex) | Vertex only (vertex_ecr + vertex_bootstrap) |
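
The rule reduces to a substring test on the tag name. The sketch below mirrors it so a tag can be checked before pushing; the function name tag_scope is illustrative, and the real matching lives in the CI configuration, which may differ in detail.

```shell
# Illustrative helper: report which CI scope a tag name would select,
# following the convention above (tag contains "vertex" => vertex scope).
tag_scope() {
  case "$1" in
    *vertex*) echo vertex ;;
    *)        echo core ;;
  esac
}

# Example: build today's vertex tag and confirm its scope before pushing.
#   tag="$(date +%F)-vertex"
#   tag_scope "$tag"
```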

When adding a new module to main.tf, also add a -target line to plan-core in the Makefile; otherwise non-vertex CI applies will skip it.


Scripts Reference

All scripts are Terraform-managed S3 objects (terraform/modules/vertex-ecr/scripts/). The bastion downloads them on boot.

Bastion-side (run from /usr/local/bin/ on bastion)

| Script | Purpose | Auto on boot? |
|---|---|---|
| vertex-mirror-pull.sh | Pull bundle + values from S3, extract | Yes |
| vertex-ecr-mirror.sh | Push OCI images to ECR | Yes (if ECR empty) |
| vertex-pack-push.sh | Push Spectro packs to ECR | Yes |
| vertex-cluster-setup.sh | Wait for nodes, create gp3 StorageClass | Yes |
| vertex-secrets-restore.sh | Pre-helm-install K8s Secrets restore (preserves master encryption key) | Yes (before helm install) |
| vertex-full-setup.sh | Orchestrator: runs all steps in order | Yes |
| vertex-restore-backup.sh | Post-helm mongo restore from s3://.../backups/latest.tgz | Yes (on fresh install only) |
| vertex-coredns-refresh.sh | CoreDNS VPC endpoint IP freshness check | Yes (on boot + hourly systemd timer) |
| vertex-dns-update.sh | UPSERT vertex.recrocog.com in Route53 public + private zones | Yes (after Helm) |
| vertex-vpn-export.sh | Export .ovpn config to S3 | Yes (after DNS) |

In-cluster CronJobs (installed by vertex-full-setup.sh)

| CronJob | Schedule | Purpose |
|---|---|---|
| vertex-mongo-backup | Daily 02:00 UTC | Dump mongo + K8s Secrets, rotate 7 daily / 4 weekly / 3 monthly, update latest.tgz |
| vertex-backup-verify | Sundays 04:00 UTC | Integrity check of latest.tgz (extracts, validates mongo archive + secrets), writes backups/last-verify.json |

Config: /usr/local/bin/vertex-mirror-config.sh (injected by Terraform user_data)


Backup & Restore

Automated backup (in-cluster CronJob)

The vertex-mongo-backup CronJob runs daily at 02:00 UTC and writes a bundle to S3:

s3://891377028731-vertex-bootstrap-bundle/backups/
  ├── latest.tgz                    # most recent, always overwritten
  ├── daily/<Mon|Tue|...|Sun>.tgz   # 7-day rotation
  ├── weekly/week-<NN>.tgz          # 4-week rotation
  └── monthly/<YYYY-MM>.tgz         # 3-month rotation

Each bundle contains:

  • dump.archive: mongodump of the entire hubbledb replica set
  • secrets.json: kubectl get secrets -n hubble-system (stripped of runtime fields), excluding spectro-mongodb-replicaset-key and spectro-mongo-key (these must be regenerated fresh by Spectro's mongodb-key-manager Job on every install)

The vertex-backup-verify CronJob runs weekly and writes backups/last-verify.json so stale or broken backups are detected before you need them.
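
To find which rotation slots a given day's backup lands in, the key names can be derived from the date. The sketch below follows the layout shown above but makes two assumptions: it needs GNU date (for -d), and it takes <NN> to be the ISO week number, which the layout does not confirm. The function name backup_keys is illustrative.

```shell
# Illustrative helper: print the rotation keys (relative to the bucket root)
# that a backup taken on the given UTC date would occupy.
# Assumes GNU date and ISO week numbering for week-<NN>.
backup_keys() {
  local day="${1:-today}"
  echo "backups/latest.tgz"
  # LC_ALL=C keeps the weekday abbreviation in English (Mon..Sun).
  LC_ALL=C date -u -d "$day" '+backups/daily/%a.tgz'
  LC_ALL=C date -u -d "$day" '+backups/weekly/week-%V.tgz'
  LC_ALL=C date -u -d "$day" '+backups/monthly/%Y-%m.tgz'
}

# Example: list the objects for one of those keys.
#   aws s3 ls "s3://891377028731-vertex-bootstrap-bundle/$(backup_keys 2026-04-17 | sed -n 2p)"
```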

Automated restore on fresh install

When the bastion runs vertex-full-setup.sh against an empty hubble-system namespace, it restores automatically in this order:

  1. Secrets first (vertex-secrets-restore.sh, runs before helm install hubble): Pre-applies 30+ K8s Secrets from latest.tgz into the empty namespace. Helm's lookup template then reuses existing configserversecret.secretEncryptKey / hashSalt / rootKey instead of generating new ones. This preserves the master encryption key so restored mongo {cipher}... values stay decryptable.

  2. Helm install hubble — now installs with the preserved secrets.

  3. Mongo restore (vertex-restore-backup.sh, runs after helm install when FRESH_INSTALL=true): Runs mongorestore from dump.archive in the same bundle. The restored data matches the preserved encryption key → no crashloop cascade.

On rebuild (destroy + apply), the S3 bucket survives, so latest.tgz is still there and the new cluster comes up with all prior data intact: activation, System ID, users, tenants, registries. No manual intervention.
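
The ordering matters more than any single command, so it helps to see it as control flow. The sketch below only prints the sequence for a given FRESH_INSTALL value; it is a reading of the steps described above rather than the orchestrator's real code, and restore_plan is a made-up name.

```shell
# Illustrative helper: print the restore-related steps in the order
# described above. Arg: "true" for a fresh install, anything else otherwise.
restore_plan() {
  local fresh="${1:-false}"
  if [ "$fresh" = "true" ]; then
    # Secrets go in first so Helm reuses the existing encryption keys.
    echo "vertex-secrets-restore.sh"
  fi
  echo "helm install hubble"
  if [ "$fresh" = "true" ]; then
    # Mongo data is restored only after the release is installed.
    echo "vertex-restore-backup.sh"
  fi
}
```

Calling restore_plan true prints three lines: secrets restore, Helm install, then mongo restore. Any other value prints only the Helm install.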