Open-weight models on your GPUs vs. paying per token. Break-even volume and month, a blind quality bake-off protocol, and the 30-point ops audit that kills GPU impulse buys.
Break-even volume and break-even month for self-hosting vs. API pricing — including the MLOps staffing line most comparisons quietly omit, plus the capacity ceiling: the month your workload outgrows the hardware.
Three go/no-go gates, a blind quality bake-off protocol with pass/fail thresholds, and a 30-point operations audit built to kill GPU impulse buys before they hit the budget.
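As a rough illustration of the break-even arithmetic described above, here is a minimal Python sketch. It is not the kit's calculator: every function name, parameter, and figure is a hypothetical assumption, volumes are in millions of tokens per month, and the staffing line is assumed to be folded into the monthly fixed cost.

```python
from typing import Optional


def break_even_volume(monthly_fixed_cost: float, api_price_per_mtok: float) -> float:
    """Monthly volume (millions of tokens) at which the API bill equals
    the monthly cost of self-hosting (power, hosting, MLOps staffing)."""
    if api_price_per_mtok <= 0:
        return float("inf")
    return monthly_fixed_cost / api_price_per_mtok


def break_even_month(hardware_cost: float,
                     monthly_fixed_cost: float,
                     api_price_per_mtok: float,
                     start_mtok: float,
                     monthly_growth: float = 0.0,
                     horizon_months: int = 60) -> Optional[int]:
    """First month in which cumulative API spend catches up with cumulative
    self-hosting spend (hardware up front plus monthly fixed costs).
    Returns None if that never happens within the horizon."""
    api_total = 0.0
    self_total = hardware_cost
    volume = start_mtok
    for month in range(1, horizon_months + 1):
        api_total += volume * api_price_per_mtok
        self_total += monthly_fixed_cost
        if api_total >= self_total:
            return month
        volume *= 1 + monthly_growth  # assumed compound monthly growth
    return None


def capacity_ceiling_month(start_mtok: float,
                           monthly_growth: float,
                           capacity_mtok: float,
                           horizon_months: int = 60) -> Optional[int]:
    """First month in which the workload exceeds what the hardware can serve.
    Returns None if the workload stays within capacity over the horizon."""
    volume = start_mtok
    for month in range(1, horizon_months + 1):
        if volume > capacity_mtok:
            return month
        volume *= 1 + monthly_growth
    return None


if __name__ == "__main__":
    # Hypothetical inputs: $100k of hardware, $20k/month in fixed costs
    # (staffing included), $10 per million tokens on the API.
    print(break_even_volume(20_000, 10))
    print(break_even_month(100_000, 20_000, 10, start_mtok=3_000))
    print(capacity_ceiling_month(1_000, 1.0, capacity_mtok=3_500))
```

If the capacity ceiling arrives before the break-even month, the hardware is outgrown before it pays for itself, which is the kind of result the go/no-go gates are meant to surface.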
Teams deciding whether to run open-weight models on their own GPUs instead of paying per token — and the skeptics who have to sign off on the purchase.
The calculator runs in Excel or LibreOffice (both verified); the framework document opens in any word processor.
The kit is vendor-neutral. You supply the quotes and volumes; the math doesn't care whose logo is on the hardware.
If self-hosting doesn't pay off for your workload, the calculator will tell you so, plainly. "Keep paying per token" is a valid verdict: the kit exists to find the truth, not to sell you on GPUs.