
OWASAKA — Log Triage Cheatsheet

One-page reference. Patterns assume the default JSON log format (logging.format: "json" in configs/examples/default.yaml). For incident response context see INCIDENT.md; for specific failure modes see COMMON_FAILURES.md.


Where logs live

/var/log/owasaka/owasaka.log # primary log file, lumberjack-rotated
/var/log/owasaka/owasaka-*.log.gz # rotated, compressed; lumberjack names backups <name>-<timestamp>.log.gz (max_backups=5, max_age_days=30)
journalctl -u owasaka # systemd journal (whatever stdout produced)

Rotation is governed by logging.{max_size_mb, max_backups, max_age_days, compress} in the config. As long as logging.output includes stdout, the journal stays available even when the file sink is broken (e.g., disk full on /var/log).
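To search the live file and its rotated archives in one pass, something like the following may help. It is a sketch rather than anything shipped with OWASAKA: search_all_logs is a hypothetical helper name, and it assumes the compressed archives sit next to the live file with a .gz suffix and that gzip is installed.

```shell
# search_all_logs DIR PATTERN
# Hypothetical helper: grep the rotated .gz archives first (older content),
# then the live .log file(s) in the same directory.
search_all_logs() {
  dir=$1
  pattern=$2
  for f in "$dir"/*.gz; do
    [ -e "$f" ] || continue                 # glob matched nothing
    gzip -dc "$f" | grep -E -e "$pattern" || true
  done
  for f in "$dir"/*.log; do
    [ -e "$f" ] || continue
    grep -E -e "$pattern" "$f" || true      # no match is not an error here
  done
  return 0
}
```

Call it with the log directory from the listing above and any extended regex, for example 'login rejected|auth rejected'.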

Output sink is set by logging.output (stdout, file, or both). The NixOS module defaults to stdout, which means journalctl is your primary view; standalone deployments often run both.


Common log fields

OWASAKA uses zap's structured logging. The base fields you'll see:

| Field | Where it appears | What it means |
|---|---|---|
| level | every line | debug / info / warn / error |
| ts | every line | RFC 3339 timestamp |
| msg | every line | the human message |
| principal_id | authz decisions, login flows | the authenticated subject id (e.g., principal-admin-001) |
| username | login attempts | the subject string the user typed |
| remote | auth-related | the client's RemoteAddr |
| kid | JWT issuance, event signing, STH | the key id involved (truncate prefix for compact display) |
| resource/action/decision/reason | authz audit lines | RBAC decision shape (see internal/authz/middleware.go) |
| error | any failure path | the underlying Go error string |
| subsystem | health-probe responses, app start lines | which OWASAKA module emitted the line |

trace_id is not currently emitted as a structured field — correlation across services is by principal_id + timestamp window.
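Correlating by principal_id plus a timestamp window can be sketched as a small filter over JSON log lines. The helper name correlate is hypothetical. It assumes compact JSON (no space after the colon, as the grep patterns in this sheet already do) and that all ts values share one UTC offset, so that RFC 3339 strings compare correctly as plain text.

```shell
# correlate PRINCIPAL FROM TO   (reads JSON log lines on stdin)
# Hypothetical helper: keep lines for one principal_id whose ts falls
# inside [FROM, TO]. Timestamps are compared as strings.
correlate() {
  awk -v p="$1" -v from="$2" -v to="$3" '
    index($0, "\"principal_id\":\"" p "\"") == 0 { next }
    match($0, /"ts":"[^"]*"/) {
      ts = substr($0, RSTART + 6, RLENGTH - 7)   # strip "ts":" and closing quote
      if (ts >= from && ts <= to) print
    }'
}
```

A typical call would be journalctl -u owasaka -o cat | correlate principal-admin-001 2024-05-01T09:00:00Z 2024-05-01T11:00:00Z, where -o cat drops the journal prefix so each line is bare JSON.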


Key search patterns

Authentication failures

# Rejected logins (wrong password, wrong TOTP, missing factor).
journalctl -u owasaka --since "1 hour ago" | grep "login rejected"

# Rejected at middleware (bad/missing Authorization header).
journalctl -u owasaka | grep "auth rejected"

# In the current log file (rotated archives are gzip-compressed; use zgrep on those):
grep -E "login rejected|auth rejected" /var/log/owasaka/owasaka.log

A short burst of login rejected from one username against one remote is interesting (brute force, stuck client). A flood across many usernames from many remotes is an attack.
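To tell the two shapes apart quickly, it can help to tally rejections per username and remote pair. The following is a sketch under the same compact-JSON assumption as the grep patterns above; rejected_by_source is a hypothetical helper name, and it prints "-" when a field is absent.

```shell
# rejected_by_source   (reads JSON log lines on stdin)
# Hypothetical helper: count "login rejected" lines per username/remote pair,
# busiest pair first.
rejected_by_source() {
  grep 'login rejected' | awk '
    function field(name,   re) {
      re = "\"" name "\":\"[^\"]*\""
      if (match($0, re))
        return substr($0, RSTART + length(name) + 4, RLENGTH - length(name) - 5)
      return "-"
    }
    { print field("username"), field("remote") }' | sort | uniq -c | sort -rn
}
```

One pair with a high count points at a single stuck or hostile client; many pairs with low counts each is the flood pattern.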

RBAC denials

# Every authz decision is logged via LogAuditSink; filter to denies.
journalctl -u owasaka | grep '"decision":"deny"'

# Group by resource+action to spot a misconfigured role.
# -o cat strips the journal prefix so jq receives bare JSON.
journalctl -u owasaka -o cat | grep '"decision":"deny"' \
| jq -r '"\(.resource)/\(.action) \(.principal_id) \(.reason)"' \
| sort | uniq -c | sort -rn

A burst of denies on the same resource/action after a roles hot-reload usually means the YAML diff dropped a permission; check the RBAC policy reloaded log line for the diff.
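To look only at denies that happened after the most recent reload, a filter along these lines may be useful. It is a sketch: denies_since_reload is a hypothetical helper name, and it assumes the reload line contains the literal text RBAC policy reloaded.

```shell
# denies_since_reload   (reads log lines on stdin, oldest first)
# Hypothetical helper: print deny decisions logged after the last
# "RBAC policy reloaded" line; earlier denies are discarded.
denies_since_reload() {
  awk '
    /RBAC policy reloaded/ { n = 0; next }      # reset at every reload
    /"decision":"deny"/    { buf[++n] = $0 }
    END { for (i = 1; i <= n; i++) print buf[i] }'
}
```

If no reload line is present in the input, every deny is printed.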

NATS reconnects

# Disconnect / reconnect / closed lifecycle.
journalctl -u owasaka | grep -E "NATS (disconnected|reconnected|connection closed)"

# Are we currently connected? Check the health probe.
curl -sS http://127.0.0.1:8080/readyz | jq '.subsystems[] | select(.name=="nats")'

One disconnect-reconnect cycle per blip is normal. Continuous flapping means the broker side is unstable.
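One way to distinguish a blip from flapping is to count NATS lifecycle lines per minute. The sketch below assumes JSON lines with an RFC 3339 ts field; nats_events_per_minute is a hypothetical helper name.

```shell
# nats_events_per_minute   (reads JSON log lines on stdin)
# Hypothetical helper: bucket NATS lifecycle lines by the first 16
# characters of ts (YYYY-MM-DDTHH:MM) and count each bucket.
nats_events_per_minute() {
  grep -E 'NATS (disconnected|reconnected|connection closed)' \
  | awk 'match($0, /"ts":"[^"]*"/) { print substr($0, RSTART + 6, 16) }' \
  | sort | uniq -c
}
```

A single bucket with two events is the normal blip; the same count repeating minute after minute suggests flapping.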

Retention sweep summaries

# Each sweep writes one structured "sweep complete" line.
journalctl -u owasaka | grep "retention: sweep complete"

# Failures.
journalctl -u owasaka | grep -E "retention:.*(failed|compaction failed)"

The sweep complete line carries events_removed, alerts_removed, assets_removed, compaction_ran, duration_ms. Sudden zero removals when the sweep used to remove thousands is a yellow flag: verify the clock has not jumped.
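Spotting the zero-removal case can be sketched as a filter over the sweep summary lines. This assumes the three *_removed fields are emitted as bare JSON integers; zero_sweeps is a hypothetical helper name.

```shell
# zero_sweeps   (reads JSON log lines on stdin)
# Hypothetical helper: print sweep summaries where events, alerts and
# assets removed are all 0. Lines missing a field are not flagged.
zero_sweeps() {
  grep 'retention: sweep complete' | awk '
    function num(name,   re) {
      re = "\"" name "\":[0-9]+"
      if (match($0, re))
        return substr($0, RSTART + length(name) + 3, RLENGTH - length(name) - 3) + 0
      return -1
    }
    num("events_removed") == 0 && num("alerts_removed") == 0 && num("assets_removed") == 0'
}
```

Comparing the ts of the first flagged line against the host's clock history is a reasonable next step.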

Backup runs

# Successful backup writes emit at info level via the admin endpoint
# or scheduler. Failed runs surface the sink that failed.
journalctl -u owasaka | grep -E "backup: (sink|encrypt|source write)"

# On-disk artifact survey:
ls -lh /var/lib/owasaka/backups/ | tail
sha256sum -c /var/lib/owasaka/backups/backup-*.db.age.sha256 | grep -v OK

Breaker state changes

# The OnStateChange hook logs one line per transition. Pattern
# depends on caller wiring; grep generously.
journalctl -u owasaka | grep -iE "breaker|circuit"

# Stuck-open detection: many "closed -> open" with no subsequent
# "open -> half-open -> closed" inside the configured Timeout.
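
# A rough first check for a stuck-open breaker is to look at the last
# transition logged. This is a sketch only: last_breaker_transition is a
# hypothetical helper name, and it assumes the transition line contains
# text of the form "closed -> open", which depends on caller wiring.

```shell
# last_breaker_transition   (reads log lines on stdin, oldest first)
# Hypothetical helper: print the most recent "<state> -> <state>" pair.
last_breaker_transition() {
  grep -oE '(closed|open|half-open) -> (closed|open|half-open)' | tail -n 1
}
```

# If the result ends in "-> open" and is older than the configured
# Timeout, the breaker is probably stuck open.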

STH and transparency-log activity

# Boot banner — the STH at startup.
journalctl -u owasaka | grep -E "Current STH|stands ready" | tail -5

# Signature verification failures.
journalctl -u owasaka | grep -E "ErrSignatureInvalid|ErrSignerKeyUnknown|ErrSignerKeyRetired"

# Audit log integrity (Merkle verifier — see internal/storage/integrity).
journalctl -u owasaka | grep -i "AUDIT LOG INTEGRITY VIOLATION"

AUDIT LOG INTEGRITY VIOLATION is the loudest line OWASAKA emits. Treat as an immediate page (see INCIDENT.md "Escalation criteria").

Migrations at boot

journalctl -u owasaka | grep -iE "migration|migrate|pending|downgrade"

A pending line at boot corresponds to the "Migration pending at boot" entry in COMMON_FAILURES.md.

Dev-mode warnings

# This must NEVER appear in production logs.
journalctl -u owasaka | grep -E "DEV MODE: static auth token"

If it does, OWASAKA_ENV=development slipped into a production environment. Page the deploy owner.
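The same check can be turned into a pass/fail guard for a deploy pipeline or a cron job. The sketch below is an assumption about how one might wire it; assert_no_dev_mode is a hypothetical helper name.

```shell
# assert_no_dev_mode   (reads log lines on stdin)
# Hypothetical guard: return non-zero if the dev-mode warning appears.
assert_no_dev_mode() {
  if grep -q 'DEV MODE: static auth token'; then
    echo 'DEV MODE warning found in logs' >&2
    return 1
  fi
  return 0
}
```

For example, journalctl -u owasaka --since today | assert_no_dev_mode exits non-zero when the line is present, which most schedulers and CI systems treat as a failure.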


Quick aggregations

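# Error rate over the last hour, bucketed per minute.
# Filter on the JSON level field: journalctl -p acts on the syslog
# priority, which by default is the same for every stdout line.
journalctl -u owasaka --since "1 hour ago" -o cat \
| jq -rR 'fromjson? | select(.level=="error") | .ts[0:16]' | sort | uniq -c

# Top 10 noisiest error messages today.
# -R with fromjson? skips any non-JSON lines instead of aborting.
journalctl -u owasaka --since today -o cat \
| jq -rR 'fromjson? | select(.level=="error") | .msg' | sort | uniq -c | sort -rn | head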

# Login attempts per username (last 24h).
journalctl -u owasaka --since "24 hours ago" -o cat \
| jq -rR 'fromjson? | select(.msg | test("login (rejected|succeeded)")) | .username' \
| sort | uniq -c | sort -rn

jq only works when logging.format: "json". For text format, substitute awk/grep on the printable representation.


See also

  • INCIDENT.md — incident response playbook (first 5 minutes, triage matrix, evidence collection)
  • COMMON_FAILURES.md — known failure modes with diagnostic commands matching the searches above
  • DR.md — disaster recovery (lost disk, suspected tampering, failover, STH-regression triage)
  • docs/auth/OPERATIONS.md — provisioning and revocation procedures