The Self-Healing Loop
Every portoser deploy runs through a four-stage loop: observe → analyze → solve → learn. Each stage is its own script in lib/. The loop is on by default and can be disabled per deploy with --no-auto-heal.
This is the part of Portoser that earns its keep. Most orchestrators stop at "deploy failed." Portoser tries to work out why the deploy failed and what to do about it, drawing on a growing library of fixes.
[Figure: portoser deploy: observe finds no problems and the service starts on the first try.]
Stage 1 — Observe (lib/observe/observer.sh)
Before deciding anything is wrong, Portoser collects facts about the host and the service:
- Disk free space and inode pressure
- Memory and swap usage
- Process health (PID alive? zombied? OOM-killed?)
- Port binding (is the expected port held by the expected process?)
- Container state (running, restarting, exited, paused)
- Network reachability of declared dependencies
These observations are written as structured events the next phase can match against.
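As a rough illustration of what a "structured event" could look like, the sketch below emits one tab-separated line per observation. The function names (emit_observation, observe_disk, observe_process), the field names, and the line format are all assumptions made for this example; they are not taken from observer.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: names and event format are assumptions, not Portoser's code.

# Print one observation as a single tab-separated line: kind, key, value.
emit_observation() {
  printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}

# Report the percentage of disk space used on the filesystem holding a path.
observe_disk() {
  local path="${1:-/}" used
  # df -P prints POSIX-format output; column 5 is the capacity, e.g. "42%".
  used="$(df -P "$path" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')"
  emit_observation disk used_percent "$used"
}

# Report whether a PID is alive (kill -0 sends no signal, it only checks).
observe_process() {
  local pid="$1"
  if kill -0 "$pid" 2>/dev/null; then
    emit_observation process "pid_$pid" alive
  else
    emit_observation process "pid_$pid" dead
  fi
}

observe_disk /
observe_process "$$"
```

One line per fact keeps the output easy to pipe into a matcher, which is the property the next stage relies on.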
Stage 2 — Analyze (lib/diagnose/analyzer.sh)
The diagnoser matches observations against a library of problem fingerprints. Examples that ship today:
- Port already in use — another process binds the service's port
- Dependency not ready — a declared dependency hasn't passed its health check
- Docker daemon not running
- Stale process holding files — leftover from a previous run
- Disk space exhausted
- SSH unreachable — the worker host stopped answering
- Permission denied on bind-mount or socket
When a fingerprint matches, the diagnoser produces a structured problem record with confidence and the observation evidence that triggered it.
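The matching step can be pictured as a filter from observations to problem records. The sketch below is a minimal version under stated assumptions: the input format (tab-separated kind, key, value), the fingerprint identifiers, the 95% threshold, and the confidence numbers are all illustrative and not taken from analyzer.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: fingerprint names, thresholds, confidence values,
# and the record format are assumptions for illustration.

# Read tab-separated observations (kind, key, value) on stdin and print one
# problem record per match: fingerprint, confidence, evidence.
match_fingerprints() {
  local kind key value
  while IFS=$'\t' read -r kind key value; do
    case "$kind:$key" in
      disk:used_percent)
        # Treat a nearly full disk as "disk_space_exhausted".
        if [ "$value" -ge 95 ]; then
          printf 'disk_space_exhausted\t0.9\t%s=%s\n' "$key" "$value"
        fi
        ;;
      port:holder)
        # The expected port is held by something other than the service.
        if [ "$value" != "expected" ]; then
          printf 'port_conflict\t0.8\t%s=%s\n' "$key" "$value"
        fi
        ;;
    esac
  done
}

printf 'disk\tused_percent\t97\nport\tholder\tnginx\n' | match_fingerprints
```

Each record carries the evidence that triggered it, so a later stage (or a person) can see why the fingerprint fired.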
Stage 3 — Solve (lib/solve/solver.sh and lib/solve/patterns/)
For each known problem fingerprint there is a corresponding solution pattern. The solver runs the matching pattern script. Examples:
- Port conflict → identify the holder, prompt for kill, retry bind
- Stale process → terminate, clean PID file, retry start
- Disk space → run cleanup pattern (prune images, rotate logs), then retry
- Docker daemon → start it, wait for socket, retry
- Dependency not ready → wait with bounded backoff, retry health check
Solutions are intentionally conservative — they retry the deploy instead of forcing fixes that could mask real bugs.
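"Wait with bounded backoff, retry health check" is a general technique that can be sketched independently of Portoser. In the example below, retry_with_backoff and fake_health_check are hypothetical names, and the attempt limit and delays are arbitrary; the real solver may bound its retries differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "wait with bounded backoff"; not Portoser's solver code.

# Run a command until it succeeds or the attempt limit is reached.
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command...>
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1   # give up; the caller reports the problem as unresolved
    fi
    sleep "$delay"
    delay=$((delay * 2))       # double the wait before the next attempt
    attempt=$((attempt + 1))
  done
}

# Stand-in health check that succeeds on its third call.
calls=0
fake_health_check() {
  calls=$((calls + 1))
  [ "$calls" -ge 3 ]
}

# A zero delay keeps the demo instant; use a real delay in practice.
retry_with_backoff 5 0 fake_health_check && echo "healthy after $calls checks"
```

The attempt limit is what makes this conservative: the loop either sees the dependency become healthy or returns a failure, and never forces the service to start anyway.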
Stage 4 — Learn (lib/standardize/learning.sh)
Every successful resolution is recorded as a playbook in ~/.portoser/knowledge/playbooks/, and a frequency map at ~/.portoser/knowledge/problem_frequency.txt is updated.
Two outcomes:
- Per-cluster memory — the next time the same fingerprint appears, the solver tries the recorded playbook first.
- Visible patterns — recurring problems surface as high-frequency entries, often pointing at a config bug worth fixing instead of patching repeatedly.
The Knowledge Base UI (web/frontend/src/pages/KnowledgeBase.jsx) reads from this directory.
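To make the frequency map concrete, here is a minimal sketch of how a per-fingerprint counter could be kept in a text file. The "<fingerprint> <count>" line format and the record_frequency helper are assumptions; the actual layout of problem_frequency.txt is not described here. The demo writes to a scratch file, not to the real knowledge directory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the "<fingerprint> <count>" file format is an assumption.

# Increment the counter for one fingerprint in a frequency file.
# Usage: record_frequency <file> <fingerprint>
record_frequency() {
  local file="$1" fingerprint="$2" tmp
  touch "$file"
  tmp="$(mktemp)"
  # Bump the matching line, pass other lines through, append if unseen.
  awk -v fp="$fingerprint" '
    $1 == fp { print $1, $2 + 1; seen = 1; next }
    { print }
    END { if (!seen) print fp, 1 }
  ' "$file" > "$tmp"
  mv "$tmp" "$file"
}

# Demo against a scratch file.
freq_file="$(mktemp)"
record_frequency "$freq_file" port_conflict
record_frequency "$freq_file" port_conflict
record_frequency "$freq_file" disk_space_exhausted
sort -k2,2nr "$freq_file"   # most frequent fingerprints first
```

Sorting by count is what turns the file into the "visible patterns" view: a fingerprint that keeps rising to the top is a candidate for a config fix.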
How it runs
The loop is invoked by lib/intelligent_deploy.sh and is the default path for portoser deploy. To deploy without the auto-fix step (e.g. when you want to see the raw failure), use:
portoser deploy <machine> <service> --no-auto-heal
For investigating without deploying, the analyzer is exposed directly:
portoser diagnose <service>
To see what the loop has learned so far:
portoser learn summary # one-screen overview
portoser learn stats --json-output # full stats
portoser learn playbooks # every recorded playbook
portoser learn insights <service> # what's failed, what got auto-fixed
What it isn't
- Not an autonomous agent. The loop runs only as part of a deploy or when you invoke a phase. It does not poll your cluster looking for trouble in the background.
- Not magic. It can only solve problems whose fingerprint is in the catalog. New shapes of failure are recorded as observations and surface as unmatched problems for you to triage.
- Not a replacement for monitoring. It runs at deploy time. Continuous health is handled by the health monitoring subsystem.
Adding your own pattern
Patterns are shell scripts that follow a small contract. Drop a new file in lib/solve/patterns/ named after the fingerprint, expose a solve() function, and the solver will pick it up. See lib/solve/patterns/port_conflict.sh for the canonical example.
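A pattern file might look roughly like the sketch below. Only the solve() function name and the patterns directory come from the text above; everything else is assumed for illustration: the stale_lock_file fingerprint, the LOCK_DIR variable, the argument solve() receives (the service name), and the return-code convention (0 for fixed, non-zero otherwise). Check port_conflict.sh for the real contract before copying this.

```shell
#!/usr/bin/env bash
# lib/solve/patterns/stale_lock_file.sh  (hypothetical fingerprint name)
# Assumed contract: solve() receives the service name and returns 0 if the
# problem was fixed, non-zero otherwise. LOCK_DIR is an illustrative variable,
# not a documented Portoser setting.

LOCK_DIR="${LOCK_DIR:-/tmp/portoser-example-locks}"

solve() {
  local service="$1"
  local lock="$LOCK_DIR/$service.lock"

  if [ ! -e "$lock" ]; then
    echo "no lock file for $service; nothing to do"
    return 0
  fi

  local pid
  pid="$(cat "$lock")"
  # Only remove the lock when its owning process is gone.
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    echo "lock held by live process $pid; refusing to remove"
    return 1
  fi

  rm -f "$lock"
  echo "removed stale lock for $service"
}
```

The sketch refuses to act when the lock owner is still alive, in keeping with the conservative stance of the shipped solutions.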