The Self-Healing Loop

Every portoser deploy runs through a four-stage loop: observe → analyze → solve → learn. Each stage is its own script under lib/. The loop is on by default and can be turned off for an individual deploy with --no-auto-heal.

This is the part of Portoser that earns its keep. Most orchestrators stop at "deploy failed." Portoser tries to work out why the deploy failed and which fix to apply, drawing on a growing library of known fixes.

Figure: A clean portoser deploy. Observe finds no problems and the service starts on the first try.

Stage 1 — Observe (`lib/observe/observer.sh`)

Before deciding anything is wrong, Portoser collects facts about the host and the service:

Disk free space and inode pressure
Memory and swap usage
Process health (PID alive? zombied? OOM-killed?)
Port binding (is the expected port held by the expected process?)
Container state (running, restarting, exited, paused)
Network reachability of declared dependencies

These observations are written as structured events that the Analyze stage can match against.
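To make "structured events" concrete, here is a minimal sketch of two observer-style checks. The function names `observe_disk` and `observe_process` and the `key=value` event format are hypothetical, chosen for illustration; they are not Portoser's actual output format.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of two observer checks. The function names and the
# key=value event format are assumptions, not Portoser's actual output.

# observe_disk [MOUNT]: emit one event with the percent of MOUNT in use.
observe_disk() {
  local mount="${1:-/}" pct
  # df -P forces POSIX output (one line per filesystem); field 5 is Capacity, e.g. "42%".
  pct=$(df -P "$mount" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  printf 'event=disk_usage mount=%s pct_used=%s\n' "$mount" "$pct"
}

# observe_process PID: emit one event saying whether PID is still alive.
observe_process() {
  local pid="$1" state="dead"
  # kill -0 sends no signal; it only tests that the process exists and is signalable.
  if kill -0 "$pid" 2>/dev/null; then
    state="alive"
  fi
  printf 'event=process pid=%s state=%s\n' "$pid" "$state"
}

observe_disk /
observe_process "$$"
```

One event per line keeps the output trivially greppable and easy for the next stage to parse field by field.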

Stage 2 — Analyze (`lib/diagnose/analyzer.sh`)

The analyzer matches observations against a library of problem fingerprints. Fingerprints that ship today include:

Port already in use — another process binds the service's port
Dependency not ready — a declared dependency hasn't passed its health check
Docker daemon not running
Stale process holding files — leftover from a previous run
Disk space exhausted
SSH unreachable — the worker host stopped answering
Permission denied on bind-mount or socket

When a fingerprint matches, the analyzer produces a structured problem record containing a confidence score and the observations that triggered the match.
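A rough sketch of fingerprint matching follows. It assumes the hypothetical `key=value` event lines from the observer sketch. The fingerprint names, the 95% disk threshold, and the confidence values are all illustrative assumptions, not the shipped catalog.

```shell
#!/usr/bin/env bash
# Hypothetical fingerprint matcher. Event fields, fingerprint names,
# thresholds, and confidence values are illustrative assumptions.

# event_field KEY LINE: print the value of KEY from a key=value event line.
event_field() {
  local key="$1" line="$2" kv
  local -a parts
  read -ra parts <<< "$line"
  for kv in "${parts[@]}"; do
    if [[ "$kv" == "$key="* ]]; then
      printf '%s\n' "${kv#*=}"
      return 0
    fi
  done
  return 1
}

# analyze_event LINE: print a problem record (fingerprint, confidence,
# evidence) when LINE matches a fingerprint; print nothing otherwise.
analyze_event() {
  local line="$1"
  case "$(event_field event "$line")" in
    disk_usage)
      local pct
      pct=$(event_field pct_used "$line")
      if (( pct >= 95 )); then
        printf 'problem=disk_space_exhausted confidence=0.9 evidence="%s"\n' "$line"
      fi
      ;;
    port_binding)
      local holder expected
      holder=$(event_field holder_pid "$line")
      expected=$(event_field expected_pid "$line")
      # Port held by something other than the process we expected.
      if [[ -n "$holder" && "$holder" != "$expected" ]]; then
        printf 'problem=port_in_use confidence=0.95 evidence="%s"\n' "$line"
      fi
      ;;
  esac
}

analyze_event 'event=disk_usage mount=/ pct_used=97'
# prints: problem=disk_space_exhausted confidence=0.9 evidence="event=disk_usage mount=/ pct_used=97"
```

Carrying the raw observation in `evidence` is what lets a human check later why a given fix was chosen.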

Stage 3 — Solve (`lib/solve/solver.sh` and `lib/solve/patterns/`)

For each known problem fingerprint there is a corresponding solution pattern. The solver runs the matching pattern script. Examples:

Port conflict → identify the holder, prompt before killing it, retry the bind
Stale process → terminate, clean PID file, retry start
Disk space → run cleanup pattern (prune images, rotate logs), then retry
Docker daemon → start it, wait for socket, retry
Dependency not ready → wait with bounded backoff, retry health check

Solutions are intentionally conservative — they retry the deploy instead of forcing fixes that could mask real bugs.
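The "Dependency not ready" fix above, wait with bounded backoff and then retry, could look roughly like this. `retry_with_backoff` and its argument order are hypothetical. The key property is that the delay doubles but is capped, and the number of attempts is finite.

```shell
#!/usr/bin/env bash
# Sketch of "wait with bounded backoff, retry health check". The function
# name, argument order, and doubling policy are assumptions for illustration.

# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY MAX_DELAY CMD [ARGS...]
# Runs CMD until it succeeds or MAX_ATTEMPTS is reached. The delay (seconds)
# doubles after each failure but never exceeds MAX_DELAY.
retry_with_backoff() {
  local max_attempts="$1" delay="$2" max_delay="$3"
  shift 3
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if (( attempt >= max_attempts )); then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    if (( delay > max_delay )); then
      delay="$max_delay"
    fi
    attempt=$(( attempt + 1 ))
  done
}

# Example: 5 attempts, sleeping 1s, 2s, 4s, 8s between them, against a
# hypothetical health endpoint:
#   retry_with_backoff 5 1 8 curl -fsS http://localhost:8080/health
```

Returning non-zero after the last attempt, rather than retrying forever, keeps the fix conservative. A dependency that never comes up still surfaces as a failed deploy.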

Stage 4 — Learn (`lib/standardize/learning.sh`)

Every successful resolution is recorded as a playbook in ~/.portoser/knowledge/playbooks/ and counted in a frequency map at ~/.portoser/knowledge/problem_frequency.txt.

Two outcomes:

Per-cluster memory — the next time the same fingerprint appears, the solver tries the recorded playbook first.
Visible patterns — recurring problems surface as high-frequency entries, often pointing at a config bug worth fixing instead of patching repeatedly.

The Knowledge Base UI (web/frontend/src/pages/KnowledgeBase.jsx) reads from this directory.
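The bookkeeping behind Learn is simple enough to sketch. This version assumes a `<fingerprint> <count>` line format for the frequency map and a `<fingerprint>.playbook` file name. Both are hypothetical and may not match what lib/standardize/learning.sh actually writes, so the example targets a scratch directory.

```shell
#!/usr/bin/env bash
# Sketch of the Learn stage's bookkeeping. The playbook file name and the
# "<fingerprint> <count>" line format are assumptions, not Portoser's format.

# record_resolution KNOWLEDGE_DIR FINGERPRINT STEPS
# Saves STEPS as the playbook for FINGERPRINT and bumps its frequency count.
record_resolution() {
  local dir="$1" fingerprint="$2" steps="$3"
  local freq="$dir/problem_frequency.txt"
  mkdir -p "$dir/playbooks"
  printf '%s\n' "$steps" > "$dir/playbooks/$fingerprint.playbook"
  touch "$freq"
  # Increment the matching line, or append "<fingerprint> 1" if it is new.
  awk -v fp="$fingerprint" '
    $1 == fp { $2 += 1; found = 1 }
    { print }
    END { if (!found) print fp, 1 }
  ' "$freq" > "$freq.tmp" && mv "$freq.tmp" "$freq"
}

# Example against a scratch directory rather than ~/.portoser/knowledge:
kb=$(mktemp -d)
record_resolution "$kb" port_in_use "identify holder; kill; retry bind"
record_resolution "$kb" port_in_use "identify holder; kill; retry bind"
cat "$kb/problem_frequency.txt"   # port_in_use 2
```

Sorting that file by count (`sort -k2,2nr`) is one quick way to see the recurring problems that point at a config bug.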

How it runs

The loop is invoked by lib/intelligent_deploy.sh and is the default path for portoser deploy. To deploy without the auto-fix step (e.g. when you want to see the raw failure), use:

portoser deploy <machine> <service> --no-auto-heal
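As a mental model of how the stages chain inside a deploy, here is a minimal driver sketch. Every function name in it (`deploy_once`, `observe`, `analyze`, `solve`, `learn`, `deploy_with_heal`) is a hypothetical stub, not the code in lib/intelligent_deploy.sh. It shows one plausible flow: try the deploy, and only on failure run observe and analyze, attempt each problem's fix, retry, and record what worked.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of how the four stages could chain inside a deploy and
# how --no-auto-heal could bypass the fix step. All names are assumptions.

# Stage stubs. Replace with real checks to experiment.
deploy_once() { return 1; }   # stand-in for one deploy attempt; always fails here
observe()     { echo "event=port_binding port=8080 holder_pid=123 expected_pid=456"; }
analyze()     { while read -r ev; do [[ "$ev" == event=port_binding* ]] && echo "problem=port_in_use"; done; }
solve()       { echo "solving: $1" >&2; return 0; }
learn()       { echo "recording playbook for: $1" >&2; }

# deploy_with_heal [--no-auto-heal]
deploy_with_heal() {
  local auto_heal=1 problem
  [[ "${1:-}" == "--no-auto-heal" ]] && auto_heal=0
  deploy_once && return 0
  # Healing disabled: surface the raw failure unchanged.
  (( auto_heal )) || return 1
  # Read problems on fd 3 so solve/deploy steps cannot swallow the list via stdin.
  while read -r problem <&3; do
    if solve "$problem" && deploy_once; then
      learn "$problem"
      return 0
    fi
  done 3< <(observe | analyze)
  return 1
}
```

Learning only after the retried deploy succeeds mirrors the rule that only successful resolutions become playbooks.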

To investigate without deploying, run the analyzer directly:

portoser diagnose <service>

To see what the loop has learned so far:

portoser learn summary               # one-screen overview
portoser learn stats --json-output   # full stats
portoser learn playbooks             # every recorded playbook
portoser learn insights <service>    # what's failed, what got auto-fixed

What it isn't

Not an autonomous agent. The loop runs only as part of a deploy or when you invoke a phase. It does not poll your cluster looking for trouble in the background.
Not magic. It can only solve problems whose fingerprint is in the catalog. New shapes of failure are recorded as observations and surface as unmatched problems for you to triage.
Not a replacement for monitoring. It runs at deploy-time. Continuous health is handled by the health monitoring subsystem.

Adding your own pattern

Patterns are shell scripts that follow a small contract. Drop a new file in lib/solve/patterns/ named after the fingerprint, expose a solve() function, and the solver will pick it up. See lib/solve/patterns/port_conflict.sh for the canonical example.
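A minimal sketch of how a solver might load a pattern under that contract follows. `run_pattern` is hypothetical. So are the arguments passed to `solve()` and the meaning of its exit code; port_conflict.sh remains the authority on the real contract.

```shell
#!/usr/bin/env bash
# Hypothetical loader for the pattern contract: a file named <fingerprint>.sh
# in a patterns directory that defines solve(). The arguments solve() gets
# and the meaning of its exit code are assumptions; see
# lib/solve/patterns/port_conflict.sh for the real contract.

# run_pattern PATTERNS_DIR FINGERPRINT [ARGS...]
run_pattern() {
  local dir="$1" fingerprint="$2"
  shift 2
  local file="$dir/$fingerprint.sh"
  if [[ ! -f "$file" ]]; then
    echo "no pattern for fingerprint: $fingerprint" >&2
    return 2
  fi
  # Source inside a subshell so a pattern's solve() and helpers do not
  # leak into the caller's shell.
  (
    unset -f solve   # make sure solve() comes from this file, not the caller
    # shellcheck source=/dev/null
    source "$file"
    if ! declare -F solve >/dev/null; then
      echo "$file does not define solve()" >&2
      exit 2
    fi
    solve "$@"
  )
}

# A minimal pattern file (hypothetical lib/solve/patterns/my_fingerprint.sh):
#
#   solve() {
#     local service="$1"
#     echo "cleaning up stale state for $service"
#     return 0   # assumed: 0 = fixed, retry the deploy
#   }
```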
