Pre-infrastructure document: which pod tier and operating mode to commit to before building Docker images, cron schedulers, and per-client deploy scripts. Numbers are grounded in observed RunPod pricing as of 2026-05-05.
Community and Secure pods both run identical RTX 4090 hardware (24GB VRAM). The difference is who operates the machine.
| Attribute | Community | Secure |
|---|---|---|
| Hourly rate (4090) | $0.34 | $0.69 |
| Monthly @ 24/7 | $250 | $504 |
| Monthly @ 8–6 M–F | $75 | $152 |
| Uptime SLA | ~98% | ~99.5% |
| Supply availability | Variable | Reliable |
| Compliance fit | Non-regulated | HIPAA / SOC2 |
| Network | Variable, mostly gigabit | Multi-gigabit, low latency |
| Best for | Most B2B SaaS | Regulated / customer-facing / demos |
24/7 = 730 GPU-hrs/month. 8–6 M–F = 220 GPU-hrs/month (10 hr × 22 weekdays), so 8–6 costs about 30% of 24/7 (220 / 730).
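
The monthly figures in the tier table follow directly from the hourly rate and the GPU-hours per month. A minimal sketch of that arithmetic, using only the rates and hour counts quoted above (the function and constant names are illustrative, not part of any existing script):

```python
# Hourly rates for an RTX 4090 pod, taken from the tier table above.
HOURLY_RATE_USD = {"community": 0.34, "secure": 0.69}

HOURS_24_7 = 730          # GPU-hours per month when always on
HOURS_8_TO_6 = 10 * 22    # 10 hr/day x 22 weekdays = 220 GPU-hours


def monthly_cost(tier: str, gpu_hours: float) -> float:
    """Monthly cost in USD for one pod at the given tier and GPU-hours."""
    return HOURLY_RATE_USD[tier] * gpu_hours


for tier in HOURLY_RATE_USD:
    always_on = monthly_cost(tier, HOURS_24_7)
    business = monthly_cost(tier, HOURS_8_TO_6)
    ratio = business / always_on
    print(f"{tier}: 24/7 ${always_on:.2f}, 8-6 ${business:.2f}, ratio {ratio:.0%}")
```

The unrounded results sit slightly below the table values (for example $248.20 against the $250 shown for Community 24/7), so the table appears to round to convenient figures.
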
| Attribute | 24/7 Always-On | 8–6 Business Hours |
|---|---|---|
| GPU-hrs / month | 730 | 220 |
| Cost vs 24/7 | 100% | 30% |
| Operational complexity | Low | High |
| Cold start risk | None | Daily (mitigated by Docker image) |
| Required infrastructure | Pod + monitoring | Pod + Docker image + cron + monitoring + state-off-pod |
| Daily failure modes | Hardware failure (rare) | Cron failure, supply gap, redeploy issues |
| Best for | <10 pods, customer-facing, regulated | ≥10 pods, sales-team-only, cost-tight |
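
The 8–6 mode hinges on one decision that the scheduler has to get right: whether a pod should be up at a given moment. A minimal sketch of that check, assuming the window is 08:00 to 18:00 Monday to Friday in the client's local time (the function name is hypothetical, and the caller is assumed to pass a datetime already converted to that local time):

```python
from datetime import datetime, time

WINDOW_START = time(8, 0)   # 8 AM local
WINDOW_END = time(18, 0)    # 6 PM local (exclusive)


def should_pod_be_running(local_now: datetime) -> bool:
    """True when local_now falls inside the 8-6 Monday-Friday window."""
    is_weekday = local_now.weekday() < 5  # Monday is 0, Friday is 4
    in_window = WINDOW_START <= local_now.time() < WINDOW_END
    return is_weekday and in_window


# 2026-05-05 is a Tuesday; 2026-05-09 is a Saturday.
print(should_pod_be_running(datetime(2026, 5, 5, 9, 30)))   # weekday, inside window
print(should_pod_be_running(datetime(2026, 5, 9, 12, 0)))   # weekend
```

GitHub Actions evaluates `schedule` cron expressions in UTC, so the deploy and terminate triggers need to be offset from the local window accordingly.
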
Scenario A (normal volume): revenue of $21,000 / mo at ~50K calls/month. Sizing: one pod per location for tenant isolation = 6 pods total.
| Mode | Per pod / mo | × 6 pods | + Vol+Platform | Total / mo | Margin % |
|---|---|---|---|---|---|
| Secure 24/7 | $504 | $3,024 | $68 | $3,092 | 85.3% |
| Community 24/7 | $250 | $1,500 | $68 | $1,568 | 92.5% |
| Secure 8–6 | $152 | $912 | $68 | $980 | 95.3% |
| Community 8–6 | $75 | $450 | $68 | $518 | 97.5% |
Scenario B (high volume): 4,000 calls/hour × 10 hrs × 20 days = 800,000 calls/month. The constraint flips from users per pod to processing throughput. With pipeline parallelism added, 30 pods are required to keep up with peak. At flat $3,500/location pricing, the same $21K revenue must cover 5× the compute (30 pods instead of 6), and the Secure 24/7 margin falls to 27%.
| Mode | Per pod / mo | × 30 pods | + Vol+Platform | Total / mo | Margin @ $21K |
|---|---|---|---|---|---|
| Secure 24/7 | $504 | $15,120 | $152 | $15,272 | 27% |
| Community 24/7 | $250 | $7,500 | $152 | $7,652 | 64% |
| Secure 8–6 | $152 | $4,560 | $152 | $4,712 | 78% |
| Community 8–6 | $75 | $2,250 | $152 | $2,402 | 89% |
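
The totals and margins in the 6-pod and 30-pod tables can be reproduced from three inputs: per-pod monthly cost, pod count, and the fixed "Vol+Platform" line. A short sketch, with all figures copied from the tables and the helper names chosen here for illustration:

```python
REVENUE_USD = 21_000  # monthly revenue used in both tables

# Per-pod monthly cost by mode, from the tables above.
PER_POD_MONTHLY_USD = {
    "Secure 24/7": 504,
    "Community 24/7": 250,
    "Secure 8-6": 152,
    "Community 8-6": 75,
}

# Pod count and fixed Vol+Platform cost for each table.
SCENARIOS = {"6 pods": (6, 68), "30 pods": (30, 152)}


def total_cost(per_pod: float, pods: int, vol_platform: float) -> float:
    """Monthly infrastructure cost: pods plus the fixed Vol+Platform line."""
    return per_pod * pods + vol_platform


def margin_pct(revenue: float, cost: float) -> float:
    """Gross margin as a percentage of revenue."""
    return (revenue - cost) / revenue * 100


for label, (pods, vol_platform) in SCENARIOS.items():
    for mode, per_pod in PER_POD_MONTHLY_USD.items():
        cost = total_cost(per_pod, pods, vol_platform)
        print(f"{label} | {mode}: ${cost:,.0f} total, {margin_pct(REVENUE_USD, cost):.1f}% margin")
```

Keeping the pod count as a parameter makes it easy to see how quickly margin erodes between 6 and 30 pods for any mode.
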
Production recommendation: Hybrid mode, with Community pods as the default and failover to Secure. This captures Community's cost savings with Secure's reliability backstop.
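
One way to picture the Community-first, Secure-as-backstop idea is a deploy helper that tries tiers in order of cost. This is only a sketch under stated assumptions: the `deploy` callable is hypothetical and stands in for whatever the real deploy script does, returning a pod ID on success or `None` when the tier has no supply.

```python
from typing import Callable, Optional, Sequence, Tuple

# Cheapest tier first; Secure is the fallback (assumed ordering).
TIER_ORDER = ("community", "secure")


def deploy_with_failover(
    deploy: Callable[[str], Optional[str]],
    tiers: Sequence[str] = TIER_ORDER,
) -> Tuple[str, str]:
    """Try each tier in order and return (tier, pod_id) for the first success."""
    for tier in tiers:
        pod_id = deploy(tier)  # hypothetical: None means no capacity on this tier
        if pod_id is not None:
            return tier, pod_id
    raise RuntimeError("no capacity on any tier")


# Example with a stand-in deploy function where Community has no supply.
def fake_deploy(tier: str) -> Optional[str]:
    return "pod-123" if tier == "secure" else None


print(deploy_with_failover(fake_deploy))
```

Passing the deploy function in as an argument keeps the failover order testable without touching any real provider API.
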
The current enterprise tier is a flat $3,500/location. This works for moderate-volume clients but breaks down at Scenario-B volumes, where 30+ pods are needed. The fix is volume-tiered pricing.
| Tier | Base / location / mo | Calls included | Overage |
|---|---|---|---|
| Starter | $1,500 | 25,000 | $0.02 / call |
| Mid | $3,500 | 100,000 | $0.02 / call |
| Enterprise | $7,500 | 300,000 | $0.015 / call |
| Unlimited | $12,000 | Unlimited | — |
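
Under the tier table above, a location's monthly bill is the base price plus overage on any calls beyond the included amount. A minimal sketch of that calculation, using the table's figures and assuming overage applies only to calls above the included count (the function name is illustrative):

```python
from decimal import Decimal

# tier -> (base USD per location per month, included calls, overage USD per call)
# None for included/overage marks the Unlimited tier.
TIERS = {
    "Starter": (Decimal("1500"), 25_000, Decimal("0.02")),
    "Mid": (Decimal("3500"), 100_000, Decimal("0.02")),
    "Enterprise": (Decimal("7500"), 300_000, Decimal("0.015")),
    "Unlimited": (Decimal("12000"), None, None),
}


def monthly_bill(tier: str, calls: int) -> Decimal:
    """Base price plus overage for calls beyond the tier's included amount."""
    base, included, overage = TIERS[tier]
    if included is None:
        return base  # Unlimited: no overage regardless of volume
    extra_calls = max(0, calls - included)
    return base + extra_calls * overage


for tier, calls in [("Starter", 30_000), ("Mid", 150_000), ("Enterprise", 400_000)]:
    print(f"{tier} at {calls:,} calls: ${monthly_bill(tier, calls):,.2f}")
```

For example, a Mid location at 150,000 calls pays $3,500 + 50,000 × $0.02 = $4,500.
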
| # | Item | Required for | Build time |
|---|---|---|---|
| 1 | Docker image (Node + Ollama + model + repo) | All modes | 1–2 hr |
| 2 | Per-client network volume | All modes | 30 min |
| 3 | Poller state → Supabase (off pod disk) | 8–6 mode | 30 min |
| 4 | deploy-client.sh script | All modes | 1 hr |
| 5 | GitHub Actions cron (deploy/terminate) | 8–6 mode | 30 min |
| 6 | Health monitor + auto-redeploy | All modes | 1 hr |
| 7 | Community→Secure failover script | Hybrid mode | 1 hr |
| 8 | Pipeline parallelism (concurrent processing) | High-volume clients | 4–8 hr |
To reach Phase 2 (Hybrid 24/7): ~5 hours total. To reach Phase 3 (full 8–6 capability): ~8 hours.
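
As a cross-check on these estimates, the build times in the table can be summed by the "Required for" column. The mapping of items to phases below is an assumption (Hybrid 24/7 = "All modes" + "Hybrid mode" items; full 8–6 adds the "8–6 mode" items), since the source does not list which items belong to which phase.

```python
from typing import Set, Tuple

# (item number, required-for label, min minutes, max minutes), from the table above.
BUILD_ITEMS = [
    (1, "All modes", 60, 120),
    (2, "All modes", 30, 30),
    (3, "8-6 mode", 30, 30),
    (4, "All modes", 60, 60),
    (5, "8-6 mode", 30, 30),
    (6, "All modes", 60, 60),
    (7, "Hybrid mode", 60, 60),
    (8, "High-volume clients", 240, 480),
]


def build_hours(required_for: Set[str]) -> Tuple[float, float]:
    """Return (min, max) build hours for items whose label is in required_for."""
    low = sum(mn for _, label, mn, _ in BUILD_ITEMS if label in required_for)
    high = sum(mx for _, label, _, mx in BUILD_ITEMS if label in required_for)
    return low / 60, high / 60


hybrid = build_hours({"All modes", "Hybrid mode"})
full_8_6 = build_hours({"All modes", "Hybrid mode", "8-6 mode"})
print(f"Hybrid 24/7: {hybrid[0]}-{hybrid[1]} hr")
print(f"Full 8-6:    {full_8_6[0]}-{full_8_6[1]} hr")
```

Under this grouping the Hybrid sum of 4.5–5.5 hr lines up with the ~5 hour estimate. The 8–6 sum of 5.5–6.5 hr comes in below the ~8 hour estimate, which may include slack or work that the table does not itemize.
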