Troubleshooting

When something goes wrong, work through these in order.

1. Run the diagnose phase by hand

Before reading logs, ask Portoser what it sees:

portoser observe <service>
portoser diagnose <service>

observe gathers facts. diagnose matches them against known fingerprints and prints which problem (if any) was detected, and with what confidence. In practice, about 80% of failures match a fingerprint, and the self-healing loop tells you exactly what's wrong.
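
diagnose is essentially pattern matching over the facts that observe collected. A toy sketch of that step — the facts string and the match rules below are invented for illustration, only the fingerprint names come from this guide:

```shell
# Invented example facts; in reality these come from the observe phase.
facts='bind: address already in use on port 8080'

# Match facts against known failure fingerprints (rules are illustrative).
case "$facts" in
  *"already in use"*) fp=port_conflict ;;
  *"no space left"*)  fp=disk_exhausted ;;
  *)                  fp=unknown_problem ;;
esac
echo "fingerprint: $fp"
```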

2. Tail the deployment log

portoser logs <service> --follow

For multi-host, add --machine <host>. Logs include the full Compose / process output plus Portoser's deploy markers.

3. Check the dependency chain

portoser dependencies <service>

Many "service X is broken" reports are actually "service Y, which X depends on, never came up." The dependency graph shows the order in which Portoser tried to bring services up.
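
Start order matters: dependencies must come up before dependents. As a toy reconstruction of the same idea (service names invented), coreutils' tsort computes a valid start order from depends-on pairs, where each pair means "left must start before right":

```shell
# postgres must start before api, api before web.
printf '%s\n' 'postgres api' 'api web' | tsort
```

This prints postgres, api, web — one service per line, dependencies first.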

Common Issues

Port already in use

Fingerprint: port_conflict

A different process is bound to the port the service wants. The solver will identify the holder and offer to terminate it. If the holder is a leftover from a previous deploy, accept the kill. If it's something else, change the port in the registry.

sudo lsof -iTCP:<port> -sTCP:LISTEN
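
A scripted version of the same check (the port number is a placeholder; -t makes lsof print only the PID, so you can feed it to ps or kill):

```shell
# Placeholder port; substitute the port your service wants.
port=49517
pid=$(lsof -t -iTCP:"$port" -sTCP:LISTEN 2>/dev/null || true)
if [ -n "$pid" ]; then
  # Show which process holds the port before deciding whether to kill it.
  ps -p "$pid" -o pid,comm
else
  echo "nothing listening on port $port"
fi
```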

Stale process from a previous deploy

Fingerprint: stale_process

A local or native service crashed or was stopped without cleaning its PID file. The solver removes the PID file and re-launches. If this keeps happening, look at the service's exit code in ~/.portoser/logs/<service>.log.
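
The solver's check boils down to "is the PID in the file still alive?" A minimal sketch of that logic — the PID file path and the PID itself are illustrative, not Portoser's actual implementation:

```shell
# Illustrative PID file with a PID that almost certainly isn't running.
pidfile=$(mktemp)
echo 99999 > "$pidfile"
pid=$(cat "$pidfile")

# kill -0 probes liveness without sending a signal.
if kill -0 "$pid" 2>/dev/null; then
  echo "process $pid is still running; not a stale PID file"
else
  echo "stale PID file for dead process $pid; removing it"
  rm -f "$pidfile"
fi
```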

Docker daemon not running

Fingerprint: docker_daemon_down

On macOS, start Docker Desktop. On Linux, sudo systemctl start docker. The solver retries automatically.
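
To confirm which case you're in, note that docker info talks to the daemon, so its exit code doubles as a liveness probe:

```shell
# Exit code of `docker info` tells us whether the daemon answered.
if docker info >/dev/null 2>&1; then
  status="docker daemon is reachable"
else
  status="docker daemon is down (or the docker CLI is missing)"
fi
echo "$status"
```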

Disk space exhausted

Fingerprint: disk_exhausted

The solver runs the disk cleanup pattern (prune unused images and stopped containers, rotate logs) and retries. If it still fails, something on the host other than Portoser is filling the disk.
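
To check whether cleanup actually freed enough space, compare usage against a threshold. The 90% figure below is an assumption for illustration, not Portoser's actual limit:

```shell
# Percent used on the root filesystem, without the trailing "%".
used=$(df -P / | awk 'NR==2 { sub("%", "", $5); print $5 }')
if [ "$used" -ge 90 ]; then
  echo "disk critically full: ${used}% used"
else
  echo "disk usage ok: ${used}% used"
fi
```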

Dependency not ready

Fingerprint: dependency_not_ready

The service's depends_on target hasn't passed its health check. Possible causes:

  • The dependency is on a different host that's unreachable
  • The dependency's health check URL is wrong
  • The dependency is genuinely slow to start; increase its health_check.timeout
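
If you suspect the dependency is just slow, you can watch it come up yourself. A generic wait loop, using bash's /dev/tcp so it needs no extra tools — host, port, and retry budget are all placeholders:

```shell
# Placeholder target; substitute the dependency's host and port.
host=127.0.0.1 port=49518
ready=no
for attempt in 1 2 3; do
  # Succeeds once something is accepting TCP connections on host:port.
  if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
    ready=yes
    break
  fi
  sleep 1
done
echo "dependency ready: $ready"
```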

SSH unreachable

Fingerprint: ssh_unreachable

The worker stopped answering SSH. This is most often a transient network blip, or the host went to sleep. Check:

ssh <user>@<host> 'echo ok'

If SSH key auth is failing on a worker that previously worked, the host's authorized_keys may have been rotated. See lib/cluster/ssh_keys.sh for re-distribution.
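
For scripted checks, add BatchMode so a broken key fails fast instead of hanging on a password prompt. The user and host below are placeholders:

```shell
# Placeholder target; substitute the worker's user and host.
target=nobody@127.0.0.1
# BatchMode=yes forbids interactive prompts; a key failure exits nonzero.
if ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" 'echo ok' 2>/dev/null; then
  ssh_ok=yes
else
  ssh_ok=no
fi
echo "ssh reachable: $ssh_ok"
```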

Permission denied on bind-mount or socket

Fingerprint: permission_denied

This is often a UID mismatch between the user the container runs as and the owner of the file on the host. Either change the host file's owner or set user: <uid>:<gid> in the Compose file.

When the fingerprint doesn't match

If diagnose reports unknown_problem, you have a new shape of failure. Two options:

  1. Investigate manually with the standard tools — journalctl, docker logs, systemctl status, the service's own log.
  2. Add a new pattern so future failures of this shape are caught. See lib/solve/patterns/port_conflict.sh for the contract.

Health check is flapping

If a service oscillates between healthy and unhealthy:

  • Check that the health endpoint isn't behind something that takes longer than health_check.timeout to warm up
  • Check the network path — a Caddy / proxy in the middle can intermittently 502 during reloads
  • Check health_check.interval — too short and you're polling faster than the service can respond
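
A sane starting point is an interval comfortably longer than a typical response, and a timeout that covers worst-case warm-up. The values below are illustrative only, and the url key is an assumption about the registry schema — only interval and timeout appear elsewhere in this guide:

```yaml
# Illustrative numbers — tune to your service's actual latencies.
health_check:
  url: http://127.0.0.1:8080/health
  interval: 15s   # poll no faster than the endpoint can answer
  timeout: 5s     # must cover the endpoint's worst-case warm-up
```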

Web UI shows stale data

The frontend uses React Query with a 5–30 second refetch interval, depending on the page. If you need fresh data right now, use the browser's refresh button or open the page in a private window. WebSocket-based panels (deployment logs, metrics stream) bypass the cache.

Vault errors

  • vault sealed — run portoser vault unseal (interactive) or set VAULT_TOKEN and re-run.
  • permission denied on path X — the AppRole policy doesn't grant access. See Vault Integration.
  • vault connection refused — Vault is down. Check VAULT_ADDR and that the Vault service is running.
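
The first and third cases can be triaged with one probe: Vault's standard /v1/sys/health endpoint returns 200 when unsealed and 503 while sealed, so a failed request means the server is down or sealed. The VAULT_ADDR default below is a placeholder:

```shell
# Placeholder default; your deployment's VAULT_ADDR takes precedence.
VAULT_ADDR=${VAULT_ADDR:-http://127.0.0.1:8200}
# -f makes curl treat 503 (sealed) as a failure too.
if curl -fsS --max-time 3 "$VAULT_ADDR/v1/sys/health" >/dev/null 2>&1; then
  vault_state=reachable-and-unsealed
else
  vault_state=down-or-sealed
fi
echo "vault: $vault_state"
```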

"Nothing helps"

Open an issue on GitHub with:

  • The output of portoser observe <service> and portoser diagnose <service>
  • The relevant section of ~/.portoser/logs/<service>.log
  • Your registry.yml entry for the service (with secrets redacted)
  • OS / architecture of the host