Run multi-tenant Claude Code on Fly.io with ephemeral VMs

Problem

You want to offer Claude Code as a service to multiple users, but calling the API directly is billed per token and can cost as much as $15/$75 per million input/output tokens, depending on the model. A Claude Max subscription at $100-200/month goes much further under heavy usage, but the Claude Code binary is designed for single-user, local use. You need an architecture that runs the actual Claude Code binary for each user while keeping sessions isolated and costs manageable.

Solution

Architecture: Node.js orchestrator + Fly.io Firecracker VMs + WebSocket tunnels

User Browser
    ↓ WebSocket
Node.js Orchestrator (Fly.io app)
    ↓ SSH / WebSocket tunnel
Ephemeral Firecracker VM (per user)
    └── Claude Code binary + user's git repo

Step 1: Define the VM template with Fly Machines API

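// Fly.io API token for the orchestrator, read from the environment.
const FLY_API_TOKEN = process.env.FLY_API_TOKEN;

interface VMConfig {
  userId: string;
  repoUrl: string;
  oauthToken: string;
}

// Create one ephemeral Machine for a user and return its ID.
async function provisionVM(config: VMConfig): Promise<string> {
  const response = await fetch("https://api.machines.dev/v1/apps/cc-workers/machines", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${FLY_API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      config: {
        image: "registry.fly.io/cc-worker:latest",
        guest: { cpus: 2, memory_mb: 2048 },
        env: {
          REPO_URL: config.repoUrl,
          CLAUDE_OAUTH_TOKEN: config.oauthToken,
        },
        // Tag the Machine with its owner so it can be traced back to a user.
        metadata: { user_id: config.userId },
        auto_destroy: true,
        restart: { policy: "no" },
      },
    }),
  });

  // Surface API errors instead of returning an undefined ID.
  if (!response.ok) {
    throw new Error(`Machine creation failed: ${response.status} ${await response.text()}`);
  }

  const machine = (await response.json()) as { id: string };
  return machine.id;
}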
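
Creating a Machine typically returns before the VM is ready to accept connections, so the orchestrator usually has to wait for the started state before opening the tunnel. The following is a minimal sketch, assuming the Machines API wait endpoint (/wait?state=started) and a 408 response when the wait times out. The names waitUrl and waitForStarted and the injected fetchImpl parameter are hypothetical and not part of the original steps.

```typescript
// Minimal shape of the fetch response this sketch relies on.
interface WaitResponse {
  ok: boolean;
  status: number;
}

type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<WaitResponse>;

const FLY_API = "https://api.machines.dev/v1";

// Build the wait URL (assumed endpoint shape; timeout is in seconds).
function waitUrl(app: string, machineId: string, timeoutSec: number = 30): string {
  return `${FLY_API}/apps/${app}/machines/${machineId}/wait?state=started&timeout=${timeoutSec}`;
}

// Call the wait endpoint until the Machine reports "started" or attempts run out.
async function waitForStarted(
  fetchImpl: FetchLike,
  app: string,
  machineId: string,
  token: string,
  attempts: number = 3,
): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    const res = await fetchImpl(waitUrl(app, machineId), {
      headers: { Authorization: `Bearer ${token}` },
    });
    if (res.ok) return;
    // Assumption: 408 signals that the wait timed out and can be retried.
    if (res.status !== 408) {
      throw new Error(`Wait failed with status ${res.status}`);
    }
  }
  throw new Error(`Machine ${machineId} did not start after ${attempts} attempts`);
}
```

In the orchestrator this could be called with the global fetch and the app name used above, for example waitForStarted(fetch, "cc-workers", machineId, token), right after provisionVM resolves. Injecting fetchImpl keeps the retry logic testable without network access.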

Step 2: Dockerfile for the worker VM

FROM ubuntu:24.04

RUN apt-get update && apt-get install -y \
    curl git nodejs npm openssh-server \
    && npm install -g @anthropic-ai/claude-code \
    && rm -rf /var/lib/apt/lists/*

# Setup SSH for tunnel access
RUN mkdir /run/sshd
COPY sshd_config /etc/ssh/sshd_config

COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

ENTRYPOINT ["/entrypoint.sh"]

Step 3: Stream output back via WebSocket

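import { WebSocketServer } from "ws";
import { spawn } from "child_process";

// Inside the VM: bridge Claude Code stdio to WebSocket
const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (ws) => {
  // Assumption: streaming JSON over stdio uses print mode with the
  // stream-json input/output formats. Confirm with `claude --help`.
  const claude = spawn(
    "claude",
    ["-p", "--verbose", "--output-format", "stream-json", "--input-format", "stream-json"],
    {
      cwd: "/workspace",
      // Assumption: the CLI reads its subscription token from
      // CLAUDE_CODE_OAUTH_TOKEN, so map the injected variable onto it.
      env: { ...process.env, CLAUDE_CODE_OAUTH_TOKEN: process.env.CLAUDE_OAUTH_TOKEN },
    },
  );

  // Forward Claude Code output to the client; keep stderr in the VM logs.
  claude.stdout.on("data", (data) => ws.send(data.toString()));
  claude.stderr.on("data", (data) => console.error(data.toString()));
  // Forward each client message to Claude Code's stdin as one line.
  ws.on("message", (msg) => claude.stdin.write(msg.toString() + "\n"));
  claude.on("exit", () => ws.close());
  ws.on("close", () => claude.kill());
});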
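
One caveat with this bridge: stdout data events carry arbitrary chunks, so a single JSON message may be split across two WebSocket frames, or several may arrive merged into one. The following is a minimal sketch of a line buffer that forwards only complete lines, assuming Claude Code writes one JSON object per line. createLineSplitter is a hypothetical helper, not part of the original code.

```typescript
// Returns a function that accepts raw text chunks and calls onLine once
// for every complete, non-empty line. Partial lines are held until the
// newline that terminates them arrives.
function createLineSplitter(onLine: (line: string) => void): (chunk: string) => void {
  let pending = "";
  return (chunk: string): void => {
    pending += chunk;
    let newlineIndex = pending.indexOf("\n");
    while (newlineIndex !== -1) {
      const line = pending.slice(0, newlineIndex);
      pending = pending.slice(newlineIndex + 1);
      if (line.length > 0) onLine(line);
      newlineIndex = pending.indexOf("\n");
    }
  };
}

// Example: two chunks that split the second message in the middle.
const received: string[] = [];
const feed = createLineSplitter((line) => received.push(line));
feed('{"a":1}\n{"b"');
feed(":2}\n");
console.log(received.length); // 2 complete lines were emitted
```

In the bridge, the splitter would sit between the child process and the socket: call claude.stdout.setEncoding("utf8") so chunks arrive as strings that never cut a multi-byte character, then pass a splitter whose callback calls ws.send(line) as the data handler.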

Step 4: Destroy VM after session ends

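async function destroyVM(machineId: string): Promise<void> {
  // force=true also removes a Machine that is still running
  const url = `https://api.machines.dev/v1/apps/cc-workers/machines/${machineId}?force=true`;
  const response = await fetch(url, {
    method: "DELETE",
    headers: { Authorization: `Bearer ${FLY_API_TOKEN}` },
  });
  // A 404 is treated as "already removed", e.g. by auto_destroy
  if (!response.ok && response.status !== 404) {
    throw new Error(`Machine deletion failed: ${response.status}`);
  }
}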
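
Steps 1 and 4 only help if the destroy call runs on every exit path, including errors. The following is a minimal sketch of a wrapper that guarantees cleanup with try/finally. withEphemeralVM is a hypothetical helper, and the provision and destroy parameters stand in for functions such as provisionVM and destroyVM.

```typescript
// Run `work` against a freshly provisioned VM and always destroy the VM
// afterwards, whether `work` resolves or throws.
async function withEphemeralVM<T>(
  provision: () => Promise<string>,
  destroy: (machineId: string) => Promise<void>,
  work: (machineId: string) => Promise<T>,
): Promise<T> {
  const machineId = await provision();
  try {
    return await work(machineId);
  } finally {
    // Cleanup failures are logged rather than rethrown so they never
    // mask the original error from `work`.
    await destroy(machineId).catch((err: unknown) => {
      console.error(`Failed to destroy ${machineId}:`, err);
    });
  }
}
```

A session handler could then wrap its whole lifetime in one call, for example withEphemeralVM(() => provisionVM(config), destroyVM, (id) => runSession(id)), where runSession is whatever drives the WebSocket tunnel. Treating auto_destroy as the backstop and this wrapper as the primary cleanup path is a design choice, not a requirement.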

Why It Works

Fly.io Machines run as Firecracker microVMs, which can boot in under a second and give each session its own virtual machine rather than a container on a shared kernel. Each user gets their own ephemeral VM with Claude Code installed, their git repo cloned, and their OAuth credentials injected. With auto_destroy: true, the Machine is removed when its main process exits, so you pay only for active compute. Because the VM runs the actual Claude Code binary authenticated with an OAuth subscription token, usage counts against the subscription instead of being billed at per-token API prices.
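
The one-VM-per-user guarantee has to be enforced somewhere in the orchestrator, otherwise two browser tabs from the same user would provision two Machines. The following is a minimal sketch of an in-memory registry keyed by user ID. SessionRegistry and its methods are hypothetical names, and the sketch assumes a single orchestrator process.

```typescript
// Tracks at most one in-flight or running VM per user.
class SessionRegistry {
  private readonly sessions = new Map<string, Promise<string>>();
  private readonly provision: (userId: string) => Promise<string>;

  constructor(provision: (userId: string) => Promise<string>) {
    this.provision = provision;
  }

  // Return the user's Machine ID, provisioning at most once per user even
  // when several requests arrive concurrently.
  acquire(userId: string): Promise<string> {
    const existing = this.sessions.get(userId);
    if (existing) return existing;

    const session = this.provision(userId);
    this.sessions.set(userId, session);
    // Forget failed attempts so the next request can retry.
    session.catch(() => {
      if (this.sessions.get(userId) === session) this.sessions.delete(userId);
    });
    return session;
  }

  // Call when the session ends so the next request gets a fresh VM.
  release(userId: string): void {
    this.sessions.delete(userId);
  }
}
```

Storing the promise rather than the resolved ID is what closes the race: a second request that arrives while the first is still provisioning awaits the same promise instead of starting another Machine.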

Context

Fly Machines bill per second of compute, so a 10-minute session on a small VM costs a fraction of a cent in compute.
The OAuth token approach draws on the Claude subscription, which is significantly cheaper than per-token API pricing for heavy usage.
Always set auto_destroy and explicit resource limits (CPU, memory) to prevent runaway costs.
Point git operations in the VM at a remote server, and push before the session ends, so work persists after the VM is destroyed.
Consider Fly.io Sprites (sprites.dev) for a managed version of this pattern with built-in sandbox tooling.
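
The per-second billing point can be made concrete with a short calculation. The rate below is a placeholder chosen for illustration, not a published Fly.io price; substitute the current rate for your VM size. estimateSessionCostUsd is a hypothetical helper.

```typescript
// Placeholder rate for illustration only; NOT a published Fly.io price.
const ASSUMED_USD_PER_SECOND = 0.000005;

// Compute cost of one session under simple per-second billing.
function estimateSessionCostUsd(
  durationSeconds: number,
  usdPerSecond: number = ASSUMED_USD_PER_SECOND,
): number {
  if (durationSeconds < 0 || usdPerSecond < 0) {
    throw new RangeError("duration and rate must be non-negative");
  }
  return durationSeconds * usdPerSecond;
}

// 600 s x 0.000005 USD/s = 0.003 USD, i.e. 0.3 cents at the assumed rate.
const tenMinuteCost = estimateSessionCostUsd(10 * 60);
console.log(`10-minute session: ~$${tenMinuteCost.toFixed(4)}`);
```

At the assumed rate, the same formula shows why cleanup matters: a VM left running for 30 days (2,592,000 seconds) would cost about $12.96, thousands of times more than the session that created it.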