Portoser Documentation
Portoser is a declarative service orchestrator for clusters of 2 to 20 machines — homelabs, small teams, mixed-architecture rigs. It's agentless (SSH only), runs on macOS and Linux, and ships a self-healing loop that turns recurring failures into reusable playbooks.
What it actually does
Given a `registry.yml` describing your machines and services, Portoser:
- Deploys services to the right host, in the right form — Docker Compose, a uv-managed Python app, or a native systemd / launchd unit.
- Watches every deploy. When something fails, the analyzer matches the failure against known patterns (port conflicts, stale processes, disk space, dependency health, SSH/permission issues, Docker daemon errors).
- Fixes what it recognizes, automatically — runs the matching playbook from `~/.portoser/knowledge/playbooks/`.
- Learns from new fixes — when a manual or scripted resolution succeeds, it gets saved as a playbook with a frequency count, so the next occurrence is automatic.
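To make the flow concrete, here is a hypothetical `registry.yml` sketch. The field names are illustrative only, not the authoritative schema (see the Reference section for that); it shows how hosts and the three deployment types might fit together:

```yaml
# Hypothetical registry.yml sketch — field names are illustrative,
# not the real schema (see Reference for the actual one).
hosts:
  pi-1:
    ssh: pi@pi-1.internal
    arch: arm64
  nuc-1:
    ssh: ops@nuc-1.internal
    arch: amd64

services:
  grafana:
    type: docker        # Docker Compose
    host: nuc-1
  scraper:
    type: local         # uv-managed Python app
    host: pi-1
  caddy:
    type: native        # systemd / launchd unit
    host: nuc-1
```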
That loop — observer → analyzer → solver → learning — is the differentiator. Everything else is table stakes.
Capabilities
Deployment
- Three deployment types from one registry: `docker` (Docker Compose), `local` (uv-managed Python), `native` (systemd / launchd).
- Drag-and-drop service moves in the web UI. Drops stage as pending changes; you click Deploy to apply. No surprises.
- Multi-architecture from one registry: arm64 and amd64 services side-by-side, with `lib/cluster/buildx.sh` building the right platform per host.
- Dependency graph + impact analysis: see what depends on what before you change it.
- Deployment history with config diffs and one-click rollback.
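The per-host platform selection above can be sketched roughly as follows. This is a simplified illustration, not the actual `lib/cluster/buildx.sh` logic, and the helper name `arch_to_platform` is invented for this example:

```shell
#!/usr/bin/env bash
# Map a host's `uname -m` output to a Docker buildx --platform value.
# Hypothetical helper: a simplified sketch, not the real buildx.sh logic.
arch_to_platform() {
  case "$1" in
    aarch64|arm64) echo "linux/arm64" ;;
    x86_64|amd64)  echo "linux/amd64" ;;
    *) echo "unsupported arch: $1" >&2; return 1 ;;
  esac
}

# The deploy step would then build for the target host's platform, e.g.:
#   docker buildx build --platform "$(arch_to_platform "$host_arch")" ...
arch_to_platform aarch64   # -> linux/arm64
arch_to_platform x86_64    # -> linux/amd64
```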
Self-healing
- Auto-recovery for the failure modes shipped in `lib/diagnose/analyzer.sh`: port conflicts, stale processes, disk-space pressure, dependency health failures, Docker daemon hiccups, SSH connectivity, permissions.
- Knowledge base at `~/.portoser/knowledge/playbooks/` with a frequency map of what's failing most.
- Intelligent Deployment Panel in the web UI: detect → rank solutions → dry-run → apply.
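The pattern-matching step can be sketched as a classifier over the failure output. The function name and the pattern strings below are illustrative, not the actual contents of `lib/diagnose/analyzer.sh`:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of failure classification — the real patterns live in
# lib/diagnose/analyzer.sh; these strings are illustrative only.
classify_failure() {
  case "$1" in
    *"address already in use"*|*"port is already allocated"*) echo port_conflict ;;
    *"No space left on device"*)             echo disk_space ;;
    *"Permission denied (publickey"*)        echo ssh_auth ;;
    *"Cannot connect to the Docker daemon"*) echo docker_daemon ;;
    *)                                       echo unknown ;;
  esac
}

classify_failure "bind: address already in use"          # -> port_conflict
classify_failure "write /var/log: No space left on device"   # -> disk_space
```

A solver would then look up the matching playbook under `~/.portoser/knowledge/playbooks/` by the returned label.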
Observability
- Live metrics over WebSocket (`/api/metrics/ws`) — CPU, memory, disk, uptime per service.
- Live deployment logs over WebSocket (`/ws`) during a deploy.
- Uptime, MTBF, MTTR, and event history per service.
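For readers unfamiliar with the metrics, MTBF and MTTR reduce to simple arithmetic over outage intervals. The sketch below is illustrative, not Portoser's actual implementation:

```shell
#!/usr/bin/env bash
# Illustrative MTBF / MTTR arithmetic — not Portoser's actual implementation.
# Args: total observation window in seconds, then one "down:up" epoch pair
# per outage.
compute_mtbf_mttr() {
  local window=$1; shift
  local failures=0 downtime=0 pair down up
  for pair in "$@"; do
    down=${pair%:*}; up=${pair#*:}
    failures=$((failures + 1))
    downtime=$((downtime + up - down))
  done
  echo "MTBF=$(( (window - downtime) / failures ))s MTTR=$(( downtime / failures ))s"
}

# Two outages of 60 s and 120 s in a 1-hour window:
compute_mtbf_mttr 3600 "1000:1060" "2000:2120"   # -> MTBF=1710s MTTR=90s
```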
Security
- Agentless: workers are reached over SSH only — no daemon to install, no port to expose.
- mTLS between services with a built-in CA + automated cert distribution (`lib/certificates.sh`, `install_ca_on_hosts.sh`).
- HashiCorp Vault for secrets, referenced from the registry.
- Keycloak OIDC on the backend (frontend login UI is in progress).
Networking
- Caddy auto-config: Portoser writes to Caddy's admin API and live-reloads. Automatic TLS via ACME for any reachable hostname.
- `*.internal` routing for cluster-private services.
- dnsmasq integration for hostname queries (configuration is left to you).
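The kind of route object that gets pushed through Caddy's admin API looks roughly like the fragment below. The hostname and upstream address are placeholders, and the exact JSON Portoser writes may differ:

```json
{
  "match": [{ "host": ["grafana.internal"] }],
  "handle": [{
    "handler": "reverse_proxy",
    "upstreams": [{ "dial": "nuc-1.internal:3000" }]
  }]
}
```

Because Caddy's admin API applies such changes live, no reload or restart of the proxy is needed.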
AI
- MCP server (preview): a FastMCP instance is wired into the API. Tool registration metadata + audit logging are in place; first-party tools are coming.
Quick links
- Installation — pick your path.
- Quickstart — first deploy in five minutes.
- Architecture — components, data flow, file layout.
- Self-Healing Loop — how the loop runs, what it can recognize, how to extend it.
- Hardware Setups — six concrete topologies (solo laptop, Pi cluster, Mac mini lab, mixed arch, GPU + CPU split, VPS + home hybrid).
- Operations — backup, upgrade, troubleshoot.
- Reference — CLI, registry schema, HTTP API.
Platform support
| Platform | Status |
|---|---|
| Linux (Ubuntu, Debian, Fedora, Arch) | Supported |
| macOS (Intel, Apple Silicon) | Supported (requires Bash 5.x via Homebrew) |
| BSD | Untested |
Bash 5.x is required on the control host. macOS ships with 3.2 — install via `brew install bash`.
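A quick way to check whether the control host's Bash is new enough:

```shell
#!/usr/bin/env bash
# Print the running Bash's major version; Portoser's control host needs >= 5.
echo "bash major version: ${BASH_VERSINFO[0]}"
if (( BASH_VERSINFO[0] < 5 )); then
  echo "upgrade via: brew install bash (macOS) or your package manager" >&2
fi
```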
Designed for
- Homelabs running 2–20 machines.
- Small teams that want orchestration without operating Kubernetes.
- Mixed-architecture clusters (Pi + x86 + Apple Silicon).
- Personal infrastructure where downtime is annoying but not catastrophic — the self-healing loop earns back its weight on the failures you'd otherwise debug at 1 a.m.
Not designed for: 100+ node clusters, multi-tenant production with strict SLAs, or anything that needs Kubernetes-level scheduler features.
Start with the Installation Guide or jump to a Hardware Setup that matches your rig.