Introduction

kanade — 奏 is an endpoint management system for Windows fleets. It gives an operator a single CLI / SPA to run scripts, install software, gather inventory, and stream live perf data from hundreds of PCs at once. The pieces:

| Component | What it is |
| --- | --- |
| kanade-agent | Service that runs on each managed PC. Subscribes to NATS, executes commands, ships results. |
| kanade-backend | HTTP API + projector. Persists state, serves the SPA, exposes operator endpoints. |
| kanade-client | Optional Tauri desktop app. End-user-facing surface. |
| NATS server | Message broker for command fan-out + result aggregation. The agent talks NATS-only; the backend reads NATS too. |
| kanade CLI | Operator-facing command line: publish binaries, fire jobs, query state. |

This site covers two audiences:

  • Operators running a kanade fleet — how to update each component without ssh-ing into endpoints (see Agent-mediated updates).
  • Developers writing PowerShell jobs the agent will execute — what works, what doesn't, what changed in recent agent versions (see Writing scripts for the agent).

The detailed protocol / on-wire spec lives at Spec (the legacy single-page document, which will be split into chapters as the docs site fills out).

Developer Quickstart

This guide gets you up and running with a local kanade development environment.

1. Prerequisites

Before starting, ensure you have the following installed on your Windows machine:

  • Rust toolchain (stable channel)
  • cargo-make (run cargo install --force cargo-make)
  • bun (for SPA dependency management and build execution)
  • gsudo (for local service deployment tests)
  • nats-server (runnable from PATH)

2. One-Time Setup

Run the following command at the workspace root to register the git pre-push hooks and install agent skills defined in apm.yml:

cargo make setup

3. Launching the Dev Sandbox

You can spin up a fully isolated, multi-component development stack on your local host using a single command:

cargo make dev

This task runs the following services concurrently in a loopback sandbox:

  1. nats-dev: Unauthenticated NATS broker listening on port 4223.
  2. backend-dev: Dev API server listening on port 8081 with auth disabled.
  3. agent-dev: Local dev agent talking to the dev NATS broker on 4223.
  4. web-dev: Vite dev server for the React SPA listening on http://localhost:5173.

Press Ctrl+C to tear down all components cleanly.

4. Multi-Agent Fleet Simulation

To debug behavior that only shows up when managing multiple machines (e.g. concurrent execution result projection or ID collisions), you can launch a multi-agent sandbox:

cargo make dev-fleet

This spawns the NATS broker, backend, and SPA, plus three separate dev agents with independent IDs (dev-pc-1, dev-pc-2, dev-pc-3) and isolated state databases.

5. Local Deploy Testing

If you want to test the full lifecycle of installing components as Windows services (mirroring production environments), use the local deployment scripts:

# Installs CLI, agent, backend, and NATS services locally via gsudo elevation
cargo make local-deploy

After deployment, you can verify and interact with the real Windows services. Use the following task to stop and cleanly delete the services when finished:

cargo make local-undeploy

System Architecture

kanade is designed to manage hundreds of Windows endpoints concurrently, safely, and asynchronously.

Component Topology

The system consists of five main components, coordinated through an event-driven pub/sub structure:

graph TD
    subgraph Operator Session
        CLI[kanade CLI]
        SPA[React SPA]
    end

    subgraph Server Infrastructure
        Backend[kanade-backend]
        NATS[NATS Broker / JetStream]
    end

    subgraph Windows Endpoints
        Agent1[kanade-agent PC-1]
        Agent2[kanade-agent PC-2]
        Client[kanade-client Tauri App]
    end

    CLI -->|Command / Query API| Backend
    SPA -->|REST / WebSockets| Backend
    Backend <-->|State/PubSub| NATS
    Agent1 <-->|NATS-only Connection| NATS
    Agent2 <-->|NATS-only Connection| NATS
    Client <-->|Tauri IPC| Agent1

1. kanade-agent

A high-performance Windows service running on each managed host.

  • Role: Core executor.
  • Communication: Establishes an outbound-only NATS connection. It does not open any inbound ports, making it firewall-friendly.
  • Capabilities: Launches secure, isolated PowerShell subprocesses, inventories hardware/software specs, streams live performance data (CPU, RSS memory, disk I/O), and manages local packages.

2. kanade-backend

The central HTTP API and projection server.

  • Role: Coordinates commands, processes incoming telemetry, and hosts the operator Web interface.
  • State management: Persists events, activity logs, and status records in a local SQLite database.
  • Projector pattern: Subscribes to the NATS command-response stream, parses incoming payloads, and projects them into state tables in real-time.

3. NATS Broker (with JetStream)

The message transport layer of the entire fleet.

  • Role: Lightweight, high-throughput message broker.
  • JetStream: Retains command streams, job registrations, and file storage (using NATS Object Store buckets for distributing packages and agent scripts).
  • Isolation: Decouples the backend from the agents. If the backend is offline or restarting, agents continue execution and cache outbox records, pushing them once connection resumes.

4. kanade-client

An optional Tauri desktop application running in the logged-in user's desktop session on endpoints.

  • Role: Provides end-user interaction (e.g., prompt dialogs, notifications, or a user-facing dashboard).
  • Communication: Shares state with the local kanade-agent via secured local IPC mechanisms.

5. kanade CLI

The primary command-line tool for operators.

  • Role: Packages and publishes software updates, submits and executes job manifests, and queries live fleet inventory from the command-line.

Security & Reliability Design

Outbound-only Connections

Agents communicate with the NATS broker only by initiating outbound TCP connections. No inbound firewall ports need to be opened on endpoints, so the agent exposes no listening port for an external port scan or a compromised neighbouring host to reach.

Agent Job Sandboxing

When executing scripts, the agent stages commands in %ProgramData%\Kanade\agent-scripts and executes them using customized launcher templates. Administrators can enforce identity configurations via job manifests, specifying run_as: system (for elevated system management) or run_as: user (to run safely under the active user's credentials with restricted directory ACLs).
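The staging layout can be illustrated with a small Python sketch (the agent itself is not written in Python). It assumes the per-run layout shown in the backend update walkthrough, C:\ProgramData\Kanade\agent-scripts\<UUID>\kanade-<UUID>.ps1, plus a separate launcher that calls the staged script with & '...'; stage_paths and launcher_body are hypothetical names. The one subtle point is quoting: inside a PowerShell single-quoted string, a literal ' must be written as ''.

```python
import uuid
from pathlib import PureWindowsPath


def stage_paths(program_data: str, run_id: uuid.UUID) -> tuple[PureWindowsPath, PureWindowsPath]:
    """Per-run staging dir and staged script path (hypothetical helper)."""
    # <ProgramData>\Kanade\agent-scripts\<UUID>\
    run_dir = PureWindowsPath(program_data, "Kanade", "agent-scripts", str(run_id))
    # ...\kanade-<UUID>.ps1
    return run_dir, run_dir / f"kanade-{run_id}.ps1"


def launcher_body(user_script: PureWindowsPath) -> str:
    """Launcher text that invokes the staged script with the & call operator."""
    # In a PowerShell single-quoted string, a literal ' is written as ''.
    quoted = str(user_script).replace("'", "''")
    return f"& '{quoted}'\n"
```

Because each run gets its own UUID directory, concurrent jobs on one host never share a staging path.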

Operations overview

Day-2 operations in kanade fall into two flows:

  1. Direct install — drop binaries + config on a fresh host and register the Windows service. Used to bootstrap the first agent, the initial backend, and the NATS server. Scripts: scripts/deploy/agent.ps1, scripts/deploy/backend.ps1, scripts/deploy/nats.ps1. Run manually on the target host.

  2. Agent-mediated update — once an agent is running, the agent itself can install / update other components on its own host without ssh / RDP. The operator publishes binaries + script bodies to the broker, then fires a job; the agent fetches, verifies, swaps, and restarts services. This is the bulk of day-2 operations.

The agent-mediated flow has the same shape regardless of what you're updating:

operator host ─► kanade CLI ─► NATS broker ─► agent (on target host)
                    │                              │
                    ├── publish binary ────────────► fetches from
                    │   to OBJECT_APP_PACKAGES        OBJECT_APP_PACKAGES
                    ├── publish script ────────────► fetches from
                    │   to OBJECT_SCRIPTS             OBJECT_SCRIPTS
                    ├── register / update job ─────► reads job manifest
                    │                                 from `jobs` KV
                    └── exec job ──────────────────► PowerShell child
                                                      runs the script

Component-specific guides:

  • kanade-backend — the HTTP / projector binary
  • kanade-client — the Tauri end-user app
  • NATS server — the broker itself (yes, you can update the broker over the broker)
  • kanade-agent itself — the agent self-update path (different from the other three; it uses a dedicated rollout bucket, not the generic OBJECT_APP_PACKAGES + script pair)

Installation and Deployment

This section details how to bootstrap kanade components as native Windows services in production or staging environments.

Deployment Model

Production hosts and target endpoints run kanade components as background Windows services, so they start automatically at boot and are supervised by the Service Control Manager.

| Service Name | Binary | Config Source | Typical Target |
| --- | --- | --- | --- |
| KanadeNats | nats-server.exe | Hardened Registry / Registry-baked CLI flags | Central server |
| KanadeBackend | kanade-backend.exe | Hardened Registry / Config file | Central server |
| KanadeAgent | kanade-agent.exe | Hardened Registry / Local state DB | Managed endpoints |

1. Prerequisites

  • Host OS: Windows 10/11 or Windows Server 2016+.
  • gsudo: Required to perform elevated installations from standard user shells (or run commands from an Administrator-level PowerShell prompt).
  • Network Routing: Managed endpoints must be able to reach the NATS server port (default 4222) over TCP.

2. Setting Up the NATS Server (Broker)

The NATS server acts as the messaging core.

  1. Stage the deployment bundle using scripts/build-release.ps1 -Roles nats.
  2. Deploy the service with elevation:
    # Elevated PowerShell prompt
    & "dist\nats\deploy-nats.ps1" -NatsToken "your-secure-nats-token" -Recreate
    

This installs the KanadeNats service, configures it to run under the local system account, sets up JetStream data directories, and locks down the secure authorization token in the Windows registry.


3. Deploying the Backend API & SPA

The backend manages operator connections and processes event logs.

  1. Stage the backend binaries and React SPA bundle using scripts/build-release.ps1 -Roles backend.
  2. Deploy the service:
    # Elevated PowerShell prompt
    & "dist\backend\deploy-backend.ps1" `
        -NatsToken "your-secure-nats-token" `
        -StaticToken "your-operator-spa-bearer-token" `
        -ForceConfig -Recreate
    
  • -NatsToken: Connects the backend to the local NATS server securely.
  • -StaticToken: Defines the API bearer token required for operator CLI/SPA logins.

The deployment script registers the KanadeBackend Windows service, sets the appropriate ACLs, and verifies the endpoint.


4. Installing the Agent on Target Endpoints

Install the agent on every endpoint PC that you want to manage.

  1. Stage the agent bundle using scripts/build-release.ps1 -Roles agent.
  2. Copy the contents of the dist/agent folder to the target PC.
  3. On the target PC, run the installer:
    # Elevated PowerShell prompt
    & ".\deploy-agent.ps1" -NatsToken "your-secure-nats-token" -ForceConfig -Recreate
    

The script:

  • Places kanade-agent.exe into its destination directory.
  • Secures the configuration and NATS token in the Windows registry path (HKLM:\SOFTWARE\Kanade\agent).
  • Registers and starts the KanadeAgent service.

Once the service is active, the agent establishes an outbound NATS connection, subscribes to command streams, and reports its online heartbeat back to the fleet backend.

Agent-mediated updates

The agent is the universal installer. Once it's running on a target host, the operator never needs to touch the host directly to update any other component — including the backend it talks to, the broker that carries its messages, and the agent itself.

This chapter has one page per component: kanade-backend, kanade-client, NATS server, and kanade-agent itself.

Common machinery used by all of them:

| Bucket / Stream | Purpose |
| --- | --- |
| OBJECT_APP_PACKAGES | Generic binary storage (backend, client, NATS server, …). Keyed by <name>/<version>. |
| OBJECT_SCRIPTS | PowerShell script bodies referenced by manifests via script_object. Keyed by <name>/<version>. |
| OBJECT_AGENT_RELEASES | Agent binaries only. Separate from APP_PACKAGES because agent rollout has its own watcher / target_version flow. |
| agent_config (KV) | Layered config: global / per-group / per-PC. target_version lives here. |
| jobs (KV) | Job catalog. Each entry is a manifest the operator can exec. |

The CLI surface:

| Command | What it does |
| --- | --- |
| kanade app publish <name> <version> <file> | Upload to OBJECT_APP_PACKAGES. |
| kanade script publish <name> <version> <file> | Upload to OBJECT_SCRIPTS. |
| kanade job create <yaml> | Upsert a job manifest into the jobs KV. |
| kanade exec <job-id> --pcs <pc> [--pcs <pc> …] | Fire a registered job at a set of PCs (or at a group with --groups <group>). |
| kanade agent publish <file> | Upload an agent binary (version extracted from PE VERSIONINFO). |
| kanade agent rollout <version> --pcs <pc> / --groups <group> / --global | Flip target_version on the chosen scope; agents pick it up via their self-update watcher. |

Updating kanade-backend

The backend lives on one (or more) of your managed hosts as a Windows service. Agent-mediated update means: an agent running on that host stops the service, swaps the binary, starts it back up — while the operator never logs into the host.

End-to-end flow

┌── operator host ──────────────────────────────────────────┐
│  1. build kanade-backend.exe                              │
│  2. kanade app publish kanade-backend <v> <exe>           │
│  3. edit deploy-backend.ps1 (set $AgentSource* knobs)     │
│  4. kanade script publish deploy-backend <v> <edited.ps1> │
│  5. kanade job create install-kanade-backend.yaml         │
│  6. kanade exec install-kanade-backend --pcs <host>       │
└────────────────────────────────────────────────────────────┘
                      │
                      ▼
┌── target host (running kanade-agent as LocalSystem) ──────┐
│  • agent receives the Command on commands.pc.<host>       │
│  • fetches deploy-backend.ps1 from OBJECT_SCRIPTS         │
│    sha-verifies it (`script_object` machinery, #214)      │
│  • stages it under                                        │
│    C:\ProgramData\Kanade\agent-scripts\<UUID>\            │
│    kanade-<UUID>.ps1                                      │
│  • runs `powershell -File <launcher>` (PR #230 fix)       │
│  • launcher invokes the user script via `& '...'`         │
│    so [CmdletBinding()] / param() headers parse           │
│  • script downloads kanade-backend.exe from               │
│    OBJECT_APP_PACKAGES (via /api/app-packages/…)          │
│    sha-verifies it (separate hash, on the exe itself)     │
│  • Stop-Service KanadeBackend                             │
│  • copy exe over C:\Program Files\Kanade\…                │
│  • Start-Service KanadeBackend                            │
│  • exit 0 — result published to NATS                      │
└────────────────────────────────────────────────────────────┘

The two sha checks are intentional: the script body's hash is verified by the agent before execution (script integrity); the binary's hash is verified by the script before the swap (binary integrity, defined by the operator in $AgentSourceSha256).
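The ordering of those checks is what makes the update safe: a tampered script never runs, and a mismatched binary is rejected before the service is stopped. A minimal Python sketch of that gate, assuming both digests are available as lowercase hex; verify, run_update, and the error text are hypothetical, and in reality the first check runs in the agent while the second runs inside deploy-backend.ps1.

```python
import hashlib
from typing import Callable


class DigestMismatch(Exception):
    """Raised before anything is executed or swapped."""


def verify(blob: bytes, expected_hex: str, what: str) -> None:
    actual = hashlib.sha256(blob).hexdigest()
    if actual != expected_hex.lower():
        raise DigestMismatch(f"{what}: sha256 mismatch, expected={expected_hex} actual={actual}")


def run_update(script: bytes, script_sha: str,
               exe: bytes, exe_sha: str,
               swap: Callable[[bytes], None]) -> None:
    # 1. Agent side: script integrity, checked before the script runs at all.
    verify(script, script_sha, "script")
    # 2. Script side: binary integrity, checked before Stop-Service.
    verify(exe, exe_sha, "binary")
    # 3. Only a fully verified pair reaches the stop / copy / start step.
    swap(exe)
```

If either check raises, swap is never called, which is why a hash mismatch leaves the existing install intact.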

Step-by-step

1. Build kanade-backend

cargo build --release -p kanade-backend

Output: target/release/kanade-backend.exe.

2. Publish the binary

kanade app publish kanade-backend 0.43.0 target/release/kanade-backend.exe

This uploads the binary to OBJECT_APP_PACKAGES/kanade-backend/0.43.0 and prints the sha-256 digest. Copy the digest — you'll need its lowercase-hex form for the script.

3. Edit scripts/deploy/backend.ps1

Make a local copy. Set the four $Agent* knobs at the top:

$AgentSourceUrl       = 'http://kanade-backend.example.com:8080'
$AgentSourceVersion   = '0.43.0'
$AgentSourceSha256    = '<lowercase hex of kanade-backend.exe>'
$AgentSourceAuthToken = '<bearer for the backend HTTP API>'

Leave the rest of the script alone — those knobs are how the script knows it's running in "agent mode" (downloading from the backend) vs the manual-install mode (local folder of files).

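The $AgentSourceSha256 value is the hex digest from Get-FileHash kanade-backend.exe -Algorithm SHA256, lowercased. Get-FileHash reports uppercase hex, so use (Get-FileHash .\kanade-backend.exe -Algorithm SHA256).Hash.ToLowerInvariant().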

If you only have the base64url form printed by kanade app publish, decode it. The base64 from the CLI is URL-safe and may be unpadded, so the PowerShell snippet needs to re-pad before FromBase64String accepts it:

$b64 = '<paste the SHA-256= value here, without the SHA-256= prefix>'
$b64 = $b64.Replace('-', '+').Replace('_', '/')
if ($b64.Length % 4) { $b64 += '=' * (4 - $b64.Length % 4) }
[BitConverter]::ToString([Convert]::FromBase64String($b64)).Replace('-', '').ToLowerInvariant()

Or in Python: python -c "import base64; print(base64.urlsafe_b64decode('<b64>' + '=' * (-len('<b64>') % 4)).hex())"
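To cross-check the two forms before pasting, a small Python helper can hash the binary directly and convert the CLI's value; both should yield the same lowercase hex. The function names are hypothetical, and the optional SHA-256= prefix handling assumes the value is copied exactly as kanade app publish prints it.

```python
import base64
import hashlib
from pathlib import Path

PREFIX = "SHA-256="


def file_sha256_hex(path: Path) -> str:
    """Lowercase-hex SHA-256 of a file (Get-FileHash's value, lowercased)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in 1 MiB chunks so large binaries need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def b64url_to_hex(value: str) -> str:
    """Convert a base64url digest (optional SHA-256= prefix, padding optional) to lowercase hex."""
    value = value.strip()
    if value.startswith(PREFIX):
        value = value[len(PREFIX):]
    # urlsafe_b64decode needs the length padded to a multiple of 4.
    value += "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(value).hex()
```

For example, b64url_to_hex('<value>') == file_sha256_hex(Path('target/release/kanade-backend.exe')) should be True for the binary you just published.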

The $AgentSourceAuthToken is required whenever the backend has auth enabled: a live test on 2026-05-26 confirmed that the backend's /api/app-packages/<name>/<ver> endpoint returns HTTP 401 without it. Leave it empty only for no-auth lab setups.

4. Publish the edited script

kanade script publish deploy-backend 0.43.0 .\deploy-backend.edited.ps1

Upload goes to OBJECT_SCRIPTS/deploy-backend/0.43.0.

5. Register / update the job

configs/jobs/installers/install-kanade-backend.yaml in the repo is the template. Edit version: + script_object: to point at the version you just published, then upsert:

id: install-kanade-backend
version: 0.43.0
execute:
  shell: powershell
  script_object: deploy-backend/0.43.0
  timeout: 300s
  run_as: system
require_approval: true

kanade job create jobs\install-kanade-backend.yaml

run_as: system is required: Stop-Service / Start-Service / sc.exe all need admin. The agent already runs as LocalSystem in production.
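Steps 4 and 5 must name the same version, and a mismatch is easy to miss when editing by hand. A hypothetical pre-flight check, written in Python against the manifest loaded as a plain dict (parse the YAML with any loader); the two rules encode this page's guidance and are not the backend's own validation.

```python
def preflight(manifest: dict) -> list[str]:
    """Return the problems found; an empty list means the manifest looks consistent."""
    problems = []
    version = str(manifest.get("version", ""))
    execute = manifest.get("execute", {})
    script_object = execute.get("script_object")
    if script_object is not None:
        # script_object is <name>/<version>; it should name the version being registered.
        name, _, obj_version = script_object.partition("/")
        if not name or obj_version != version:
            problems.append(f"script_object {script_object!r} does not point at version {version}")
    # Stop-Service / Start-Service / sc.exe need admin, so installer jobs run as system.
    if execute.get("run_as") != "system":
        problems.append("installer jobs need run_as: system")
    return problems
```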

6. Fire it

kanade exec install-kanade-backend --pcs <backend-host>

The CLI returns an exec_id immediately. The actual install happens asynchronously on the target.

7. Verify

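Query the backend's results endpoint (or watch the SPA Activity view). In Windows PowerShell 5.1, curl is an alias for Invoke-WebRequest, so call curl.exe explicitly:

curl.exe -H "Authorization: Bearer <token>" `
  "http://<backend>/api/results?limit=5"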

Look for your exec_id with exit_code: 0 and a stdout that ends with kanade-backend <new-version>.

What can go wrong

| Symptom | Cause | Fix |
| --- | --- | --- |
| [CmdletBinding()] / param() parse error in stderr | Agent older than 0.42.2 (running -Command mode) | Upgrade the agent first via kanade agent rollout (see agent self-update). |
| Start-BitsTransfer : HTTP status 401 | $AgentSourceAuthToken empty but backend requires auth | Set it. |
| Start-BitsTransfer : The transfer encountered an error / job state TransientError | BITS service not running, or target machine's WinHTTP can't reach $AgentSourceUrl | Get-Service BITS; check WinHTTP proxy with netsh winhttp show proxy (BITS uses WinHTTP, not IE/WinINet). |
| sha256 mismatch — expected=<x> actual=<y> | Hash in script doesn't match the published binary | Re-publish or recompute the hash. The script aborts BEFORE the swap, so the existing install is intact. |
| Job runs but kanade-backend doesn't come back up | Service failure / config drift on target | Read C:\ProgramData\Kanade\log\backend.*.log on the target. The agent can fetch it via kanade logs <pc> (when implemented), or you can pull the file directly. |

Updating kanade-client

The Tauri desktop client is shipped to endpoints the same way as the backend: binary in OBJECT_APP_PACKAGES, script in OBJECT_SCRIPTS, job in jobs KV. The shape mirrors backend updates — only the script content and package name differ.

What's different from backend updates

| Aspect | kanade-backend | kanade-client |
| --- | --- | --- |
| Service to (re)start | KanadeBackend (Windows service) | None; the client is launched by the user |
| Install location | %ProgramFiles%\Kanade\kanade-backend.exe | %ProgramFiles%\Kanade\kanade-client.exe |
| Script in repo | scripts/deploy/backend.ps1 | configs/jobs/installers/scripts/install-kanade-client.ps1 (lives in the manifest's script_file path) |
| Manifest file ref | script_object: deploy-backend/<v> | script_file: scripts/install-kanade-client.ps1 (relative to the manifest YAML; inlined at kanade job create) |
| Atomic swap pattern | Stop service → copy → start service | Stage to <exe>.new → Move-Item → drop <exe>.old |
| Inventory projection | None (the backend reports its own version) | inventory: block emits per-PC client version into the SPA Inventory page |

Both shapes — script_object (referenced by hash from OBJECT_SCRIPTS, agent fetches on demand) and script_file (script body inlined into the manifest at kanade job create time) — are supported. The client manifest uses script_file for historical reasons; the backend manifest uses script_object because it was rewritten to test the Object Store path.
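The client's swap pattern from the table (stage <exe>.new, move it into place, drop <exe>.old) can be sketched in Python as below. This is an illustration, not the contents of install-kanade-client.ps1, and atomic_swap is a hypothetical name. Staging next to the target keeps the final rename on one volume, and the .old step exists because Windows typically allows renaming an executable that is still running but not overwriting or deleting it.

```python
import os
from pathlib import Path


def atomic_swap(exe: Path, new_bytes: bytes) -> None:
    """Stage <exe>.new, move it over <exe>, then drop <exe>.old."""
    staged = exe.with_name(exe.name + ".new")
    old = exe.with_name(exe.name + ".old")
    # 1. Stage next to the target so the final rename stays on one volume.
    staged.write_bytes(new_bytes)
    # 2. Move the current binary aside (a rename, not an overwrite).
    if exe.exists():
        os.replace(exe, old)
    # 3. Rename the staged copy into place.
    os.replace(staged, exe)
    # 4. Best-effort cleanup: if a running process still holds <exe>.old,
    #    leave it for the next install (or undeploy) to remove.
    try:
        old.unlink(missing_ok=True)
    except PermissionError:
        pass
```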

Step-by-step

1. Build kanade-client

cargo build --release -p kanade-client

Output: target/release/kanade-client.exe.

2. Publish the binary

kanade app publish kanade-client 0.42.0 target/release/kanade-client.exe

3. Edit configs/jobs/installers/scripts/install-kanade-client.ps1

Set the three knobs at the top:

$BackendBase    = 'http://kanade-backend.example.com:8080'
$Version        = '0.42.0'
$ExpectedSha256 = '<lowercase hex of kanade-client.exe>'

Also set $ClientSourceAuthToken to the backend's bearer token when auth is enabled (the same token that protects the rest of /api/*). Leave it blank for dev / smoke-test setups where the /api/app-packages/kanade-client/<v> route is unauthenticated. It mirrors the $AgentSourceAuthToken knob in scripts/deploy/backend.ps1.

4. Register / update the job

configs/jobs/installers/install-kanade-client.yaml:

id: install-kanade-client
version: 0.42.0
execute:
  shell: powershell
  script_file: scripts/install-kanade-client.ps1   # body inlined at `job create` (relative to the manifest YAML)
  timeout: 180s
  run_as: system

require_approval: true

inventory:
  display:
    - { field: version, label: Version }
    - { field: path,    label: Install path }
  summary:
    - { field: version, label: Client version }

kanade job create jobs\install-kanade-client.yaml

The inventory: block tells the projector that the script's stdout is a single JSON blob whose version / path fields populate the SPA's Inventory page. Operators can spot stragglers from a fleet-wide table — no ssh needed.
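As an illustration of that contract: the script prints one JSON object, and each Inventory row is that object filtered through display:. A Python sketch, assuming the field/label pairs from the manifest above; project_row, stragglers, and the row shape are hypothetical, not the projector's schema.

```python
import json

# Mirrors the manifest's inventory.display block as (field, label) pairs.
DISPLAY = [("version", "Version"), ("path", "Install path")]


def project_row(pc: str, stdout: str) -> dict:
    """Turn one job's stdout (a single JSON object) into an Inventory-style row."""
    blob = json.loads(stdout)
    if not isinstance(blob, dict):
        raise ValueError("inventory stdout must be a single JSON object")
    row = {"PC": pc}
    for field, label in DISPLAY:
        row[label] = blob.get(field)  # a missing field shows up as None
    return row


def stragglers(rows: list[dict], target: str) -> list[str]:
    """PCs whose reported client version is not the target."""
    return [r["PC"] for r in rows if r["Version"] != target]
```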

5. Fire it

kanade exec install-kanade-client --pcs <host> [--pcs <host> …]

Or against a group:

kanade exec install-kanade-client --groups office

6. Verify in the SPA

Open the SPA Inventory page (or query /api/inventory?app=kanade-client) and confirm the target hosts report the new version.

Updating NATS server

Updating the broker on a managed host is the most interesting case, because the update job itself travels over the broker being replaced: stopping NATS drops the agent's connection mid-job. The machinery handles this with two mechanisms working together:

  1. Reconnect. The agent's NATS client reconnects automatically on broker restart. No human intervention needed.
  2. Outbox. Job results produced while the broker is down are queued under %ProgramData%\Kanade\outbox\ and replayed once the connection comes back. The result row reaches the backend as soon as the new NATS server is up.

So the flow looks the same as for backend updates (the script stops the service, swaps the binary, and starts it again), and the agent transparently rides out the broker gap.
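The outbox half of that pair can be sketched in Python. This illustrates the pattern, not the agent's implementation: the one-file-per-result layout, the sequence-number file names, and the Outbox class are assumptions. The two properties that matter are that results are replayed oldest-first and that a file is deleted only after its publish succeeds.

```python
import json
import os
from pathlib import Path
from typing import Callable

# publish() raises ConnectionError while the broker is unreachable.
Publish = Callable[[dict], None]


class Outbox:
    """File-backed FIFO of results: one JSON file per result, named by sequence number."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _queued(self) -> list[Path]:
        # Zero-padded names sort in enqueue order.
        return sorted(self.root.glob("*.json"))

    def send(self, result: dict, publish: Publish) -> None:
        """Publish now if nothing older is waiting; otherwise (or on failure) enqueue."""
        if not self._queued():
            try:
                publish(result)
                return
            except ConnectionError:
                pass
        queued = self._queued()
        seq = int(queued[-1].stem) + 1 if queued else 0
        final = self.root / f"{seq:012d}.json"
        tmp = final.with_suffix(".tmp")
        tmp.write_text(json.dumps(result))
        os.replace(tmp, final)  # a crash mid-write never leaves a half-written .json

    def drain(self, publish: Publish) -> int:
        """Replay queued results oldest-first; stop at the first failure. Returns the count sent."""
        sent = 0
        for path in self._queued():
            try:
                publish(json.loads(path.read_text()))
            except ConnectionError:
                break
            path.unlink()  # remove only after the publish succeeded
            sent += 1
        return sent
```

Because a file is removed only after publish returns, a crash between the two can replay a result twice; a consumer that deduplicates on exec_id would tolerate that.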

Caveats specific to NATS updates

| Concern | Reality |
| --- | --- |
| Will the result row be lost? | No. The outbox persists it across the broker outage and drains on reconnect. |
| Can I update from the SPA? | Yes, same as any job: kanade exec install-kanade-nats --pcs <broker-host>. |
| What if NATS doesn't come back up? | The result will sit in the outbox indefinitely. Operators should monitor outbox/ on the broker host as a leading indicator. |
| What if the new NATS version is incompatible (JetStream upgrade etc.)? | Roll a single canary first (--pcs <one-broker>), watch outbox + backend health, then roll out fleet-wide. The 5-min cache TTL for SPA queries means you'll see the canary's state within a few minutes. |

Manual install (bootstrap)

For the very first install — when there's no agent on the broker host yet — use the direct workflow:

.\scripts\build-release.ps1 -Roles nats       # fetches nats-server.exe
                                              # from github.com/nats-io/nats-server/releases
.\scripts\deploy\nats.ps1 -NatsToken '<token>'

This installs nats-server.exe to %ProgramFiles%\Kanade\ and nats-server.conf to %ProgramData%\Kanade\config\ (with ACL hardened to SYSTEM + Administrators because the bearer token lives in plaintext), registers the KanadeNats Windows service, opens TCP 4222 (broker) + 8222 (monitoring HTTP), and starts the service.

Agent-mediated update (steady state)

Status: template-only. scripts/deploy/nats.ps1 doesn't ship $AgentSource* knobs yet — agent-mode is on the backlog. The shape below is what it WILL look like once the knobs land.

1. Build / fetch nats-server.exe

Either:

.\scripts\build-release.ps1 -Roles nats   # fetches the binary

…or download it directly from github.com/nats-io/nats-server/releases.

2. Publish the binary

kanade app publish nats-server 2.10.20 .\nats-server.exe

3. Edit deploy-nats.ps1

Once the agent-mode knobs ship, the pattern matches deploy/backend.ps1:

$AgentSourceUrl       = 'http://kanade-backend.example.com:8080'
$AgentSourceVersion   = '2.10.20'
$AgentSourceSha256    = '<lowercase hex of nats-server.exe>'
$AgentSourceAuthToken = '<bearer for the backend HTTP API>'

4. Publish + register + exec

kanade script publish deploy-nats 2.10.20 .\deploy-nats.edited.ps1
kanade job create jobs\install-kanade-nats.yaml
kanade exec install-kanade-nats --pcs <broker-host>

The job manifest will look like:

id: install-kanade-nats
version: 2.10.20
execute:
  shell: powershell
  script_object: deploy-nats/2.10.20
  timeout: 300s
  run_as: system
require_approval: true

5. Verify

After the broker comes back, the outbox drains and you'll see the result row in /api/results. Confirm the new NATS version via the broker's monitoring endpoint:

curl.exe -s http://<broker>:8222/varz | python -m json.tool | rg version
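For scripted checks, for example as a gate after the canary, the same lookup can be done with Python's standard library. fetch_varz and check_version are hypothetical helpers; port 8222 is the monitoring port the deploy script opens.

```python
import json
import urllib.request


def fetch_varz(broker: str, port: int = 8222, timeout: float = 5.0) -> dict:
    """GET http://<broker>:<port>/varz from the NATS monitoring endpoint."""
    with urllib.request.urlopen(f"http://{broker}:{port}/varz", timeout=timeout) as resp:
        return json.load(resp)


def check_version(varz: dict, expected: str) -> bool:
    """True if the broker reports the expected server version."""
    return varz.get("version") == expected
```

Usage: check_version(fetch_varz("<broker>"), "2.10.20").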

Why we don't need a separate "broker update" mechanism

Earlier designs considered a dedicated bootstrap channel (parallel NATS link the agent uses just for broker updates) to avoid the self-update-over-broker chicken-and-egg. The outbox + reconnect pair makes that unnecessary: the result is "merely delayed", not "lost". One transport, one mental model.

Updating kanade-agent itself

Agent self-update is the only component that doesn't use OBJECT_APP_PACKAGES + a script_object job. It has dedicated machinery because the agent has to swap its own running binary without ssh — a tighter loop than the generic install jobs.

Mechanism

| Bucket / Key | Purpose |
| --- | --- |
| OBJECT_AGENT_RELEASES | Agent binaries, keyed by <version>. Separate from OBJECT_APP_PACKAGES so the rollout watcher only fires on agent updates. |
| agent_config.<scope>.target_version | The version each scope (global / group / pc) should be on. Watched by the agent's self_update loop. |

Flow:

1. agent.self_update watches agent_config for target_version
2. If target_version != my agent_version:
   a. Pull `OBJECT_AGENT_RELEASES/<target_version>` to <exe>.new
   b. Sha-verify against the bucket's recorded digest
   c. Atomic swap: <exe> ← <exe>.new (via SCM stop/start)
   d. New binary boots, watcher arms again, loop closes
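The decision the watcher makes on each change can be sketched as a pure function, with bucket, disk, and SCM access injected as callables. This is a Python illustration, not the agent's Rust code; the parameter names are hypothetical, and the bucket's recorded digest is assumed to have been converted to lowercase hex first (like the SHA-256= value kanade app publish prints).

```python
import hashlib
from typing import Callable, Optional


def self_update_tick(my_version: str,
                     target_version: Optional[str],
                     fetch: Callable[[str], bytes],
                     recorded_sha256_hex: Callable[[str], str],
                     stage_new: Callable[[bytes], None],
                     scm_swap: Callable[[], None]) -> str:
    """One pass of the flow above; returns what happened."""
    # Step 2: nothing to do unless a target is set and differs from the running version.
    if target_version is None or target_version == my_version:
        return "noop"
    # 2a: pull OBJECT_AGENT_RELEASES/<target_version>.
    blob = fetch(target_version)
    # 2b: verify against the digest recorded for that object; never swap on mismatch.
    if hashlib.sha256(blob).hexdigest() != recorded_sha256_hex(target_version):
        return "rejected"
    # 2c: write <exe>.new, then let an SCM stop/start perform the swap.
    stage_new(blob)
    scm_swap()
    # 2d happens in the new process: it boots and re-arms the watcher.
    return "swapped"
```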

The rollout watcher has to survive a cold broker (e.g. the agent and broker booting at the same time after a host reboot). Before #226, an Err(_) => return; on the first get_object_store call killed the watcher permanently, so the agent would never self-update on that boot. Since #226, the watcher retries with backoff until the broker is reachable.
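The difference between the two behaviours, sketched in Python (the agent itself is Rust, where the pre-#226 bug was an early return on the first error). retry_with_backoff is a hypothetical helper; the 1 s base, 60 s cap, and full jitter are illustrative choices, not the agent's actual tuning.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(op: Callable[[], T],
                       base: float = 1.0,
                       cap: float = 60.0,
                       max_attempts: Optional[int] = None,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op() until it succeeds, sleeping with capped exponential backoff and full jitter."""
    attempt = 0
    while True:
        try:
            return op()
        except ConnectionError:
            attempt += 1
            if max_attempts is not None and attempt >= max_attempts:
                raise
            # 1 s, 2 s, 4 s, ... up to the cap; sleep a random fraction of it.
            delay = min(cap, base * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

With max_attempts=None, as a watcher would use it, the loop never gives up; the cap keeps a long broker outage from stretching the retry interval indefinitely.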

Step-by-step

1. Build the agent

cargo build --release -p kanade-agent

Output: target/release/kanade-agent.exe.

2. Publish

kanade agent publish target/release/kanade-agent.exe

The CLI extracts the version from the PE VERSIONINFO resource — no --version flag, no chance of a label / binary mismatch.

3. Roll out

Pick a scope. Start with one canary host:

kanade agent rollout 0.42.2 --pcs canary-01

Watch via ping:

kanade ping canary-01     # agent_version should flip to 0.42.2
                          # within a few seconds

If happy, widen:

kanade agent rollout 0.42.2 --groups office --jitter 5m
# or fleet-wide
kanade agent rollout 0.42.2 --global --jitter 30m

--jitter spreads the actual swap moment across a window so a wide fan-out doesn't hammer the OS service manager on every host at once. Recommended for fleets ≥ 100 hosts.
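One way such a window could be realized, sketched in Python: each host derives its delay from its PC ID and the target version, so delays spread across the window yet stay stable if the agent restarts while waiting. Where kanade actually computes the jitter is not described on this page; parse_window and swap_delay are hypothetical, and a hash is only one option (a uniform random draw also works).

```python
import hashlib
import re

_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_window(spec: str) -> int:
    """Duration like '90s', '5m', '30m', '1h' -> seconds."""
    m = re.fullmatch(r"(\d+)([smh])", spec)
    if not m:
        raise ValueError(f"bad duration: {spec!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def swap_delay(pc_id: str, target_version: str, window: str) -> int:
    """Seconds this PC waits before swapping; stable for a given (pc, version)."""
    seconds = parse_window(window)
    if seconds == 0:
        return 0
    digest = hashlib.sha256(f"{pc_id}:{target_version}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % seconds
```

With --jitter 30m (1800 s) plus the ~30 s heartbeat cadence, the last host should report the new version roughly 1830 s after the rollout.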

4. Verify

kanade agent current
# → target_version = 0.42.2 (global)

Then a fleet-wide spot-check via the SPA Agents page (or /api/agents): the agent_version column should converge to the new version within jitter + ~30s heartbeat cadence.

What can go wrong

| Symptom | Cause | Fix |
| --- | --- | --- |
| kanade agent rollout says "version not in OBJECT_AGENT_RELEASES" | Typo or wrong scope | Re-check with kanade agent current and kanade jetstream object list agent_releases. |
| kanade ping <host> still shows the old version after several minutes | Agent didn't self-update: either the watcher is dead (pre-#226 agent) or the host can't reach the broker | Check %ProgramData%\Kanade\log\agent.*.log on the target. If self_update is silent (no "checking target_version" log lines), the agent is too old; bootstrap manually with deploy-agent.ps1. |
| Agent flaps: starts, immediately exits with exit_code: 1 | The new binary is bad on this host (config drift, missing dep, etc.). SCM's failure-actions restart it and it crashes again, observable in Event Viewer as a cluster of Service Control Manager errors | Roll back: kanade agent rollout <prev-version> --pcs <host>. The host will swap back at the next watcher tick. |

Why a separate bucket / scope?

OBJECT_APP_PACKAGES is a generic blob store keyed by <name>/<version>. The agent rollout pattern needs:

  • A watcher that fires only on agent changes (cheap KV watch on one specific key, not a poll over a bucket of many names).
  • A "current target" semantic per scope, not just "all known versions" — agent_config.<scope>.target_version IS the answer to "what should I be running" without the agent enumerating.
  • Operator UX (kanade agent publish / rollout) that's divergent enough from kanade app publish to warrant its own subcommand tree.

So agents get OBJECT_AGENT_RELEASES + a layered config KV; the other components share OBJECT_APP_PACKAGES + per-app jobs.
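A sketch of how a layered target_version could resolve for one PC. The precedence (per-PC over per-group over global) is an assumption consistent with the rollback fix above, where a --pcs rollout pins one host back while a wider scope stays on the new version; the dict layout and the first-matching-group rule are hypothetical.

```python
from typing import Optional


def effective_target(config: dict, pc: str, groups: list[str]) -> tuple[Optional[str], str]:
    """Resolve target_version for one PC; returns (version, winning scope)."""
    # Assumed precedence: per-PC beats per-group beats global.
    pc_version = config.get("pc", {}).get(pc)
    if pc_version is not None:
        return pc_version, f"pc:{pc}"
    for group in groups:  # assumed: the first matching group wins
        group_version = config.get("group", {}).get(group)
        if group_version is not None:
            return group_version, f"group:{group}"
    return config.get("global"), "global"
```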

Removing kanade from a host (undeploy)

Production rollback path. When a host needs to come off kanade — because a rollout broke something, the host is being decommissioned, or you just want a clean slate to re-install from — there's one undeploy script per component, mirroring the deploy script that put it there.

| Component | Deploy | Undeploy |
| --- | --- | --- |
| Agent | scripts/deploy/agent.ps1 | scripts/undeploy/agent.ps1 |
| Backend | scripts/deploy/backend.ps1 | scripts/undeploy/backend.ps1 |
| NATS server | scripts/deploy/nats.ps1 | scripts/undeploy/nats.ps1 |
| Client (Tauri) | configs/jobs/installers/scripts/install-kanade-client.ps1 (agent-driven) | scripts/undeploy/client.ps1 |

All four are admin-only and idempotent — safe to re-run after a partial uninstall, safe to run when the component is already gone (each step logs "not present, skipping" and moves on).
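That idempotency comes from guarding every removal with a presence check. A Python sketch of the shape (the scripts themselves are PowerShell; run_undeploy and the step tuples are hypothetical):

```python
from typing import Callable, Iterable, Tuple

# (name, is_present, remove)
Step = Tuple[str, Callable[[], bool], Callable[[], None]]


def run_undeploy(steps: Iterable[Step], log: Callable[[str], None] = print) -> int:
    """Run each removal step only if its target still exists; return how many acted."""
    acted = 0
    for name, is_present, remove in steps:
        if not is_present():
            log(f"{name}: not present, skipping")
            continue
        remove()
        log(f"{name}: removed")
        acted += 1
    return acted
```

Running it a second time performs no removals and only logs skips, which is what makes a re-run after a partial uninstall safe.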

Default posture: safe

Run with no flags and the script:

  • Stops the Windows service.
  • Unregisters it from SCM (waits for the entry to actually disappear, so a re-deploy doesn't race a pending removal).
  • Removes the installed binary from %ProgramFiles%\Kanade\, including any half-completed <exe>.new / <exe>.old swap artefacts.
  • Removes any inbound firewall rule the deploy script created (pass -KeepFirewall to skip this — useful when an external WAF / Group Policy owns the rule).
  • Keeps everything under %ProgramData%\Kanade\ (config, logs, JetStream data, SQLite DB, …) so forensics / rollback / re-deploy can proceed without losing state.
  • Keeps registry-stored secrets at HKLM:\SOFTWARE\kanade\<role>\*.

That's enough for the common case: "this host's kanade is misbehaving, get it off without destroying state".

-Purge: destructive cleanup

Adds:

  • Removes the component's exclusive entries under %ProgramData%\Kanade\. Crucially, only the component's own files — agent / backend / NATS share the same root and each script avoids touching the others' files.
  • Removes the matching HKLM:\SOFTWARE\kanade\<role>\* key (unless -KeepSecrets is also passed — useful when multiple components share the same bearer).

| Component | What -Purge removes |
|---|---|
| Agent | config\agent.toml, logs\agent.*.log, outbox\, HKLM:\SOFTWARE\kanade\agent\ |
| Backend | config\backend.toml, data\*.db* (SQLite: historical results / inventory wiped), logs\backend.*.log, HKLM:\SOFTWARE\kanade\backend\ |
| NATS | config\nats-server.conf, nats\ (JetStream: all KV / Object Store / streams wiped), logs\nats*.log |
| Client | Nothing extra (no per-user state yet) |

⚠️ scripts/undeploy/nats.ps1 -Purge and scripts/undeploy/backend.ps1 -Purge are the dangerous ones. The first wipes the fleet's entire JetStream state (agent_releases, app_packages, scripts, jobs, agent_config, results stream); the second wipes the projector's historical SQLite. Both are unrecoverable without out-of-band backups. The scripts print a loud banner before running.
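Agent, backend, and NATS share one %ProgramData%\Kanade\ root, so each -Purge must recognise only its own entries. The following Rust sketch of that ownership rule is derived solely from the table above. `Component` and `purge_owner` are hypothetical, and the real PowerShell scripts may match paths differently.

```rust
/// Components that share the %ProgramData%\Kanade\ root.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Component {
    Agent,
    Backend,
    Nats,
}

/// Which component's -Purge may delete `rel`, a path relative to
/// %ProgramData%\Kanade\? `None` means "not ours, leave it alone".
fn purge_owner(rel: &str) -> Option<Component> {
    // Windows paths are case-insensitive; accept either separator.
    let p = rel.replace('\\', "/").to_ascii_lowercase();
    let (top, rest) = p.split_once('/').unwrap_or((p.as_str(), ""));
    let is_log = |prefix: &str| rest.starts_with(prefix) && rest.ends_with(".log");
    match top {
        "config" => match rest {
            "agent.toml" => Some(Component::Agent),
            "backend.toml" => Some(Component::Backend),
            "nats-server.conf" => Some(Component::Nats),
            _ => None,
        },
        "outbox" => Some(Component::Agent),
        // Approximates data\*.db*: state.db plus SQLite's -wal / -shm files.
        "data" if rest.contains(".db") => Some(Component::Backend),
        "nats" => Some(Component::Nats),
        "logs" if is_log("agent.") => Some(Component::Agent),
        "logs" if is_log("backend.") => Some(Component::Backend),
        "logs" if is_log("nats") => Some(Component::Nats),
        _ => None,
    }
}
```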

Rollback recipes

Bad rollout on one canary host

# On the canary, as Admin:
.\scripts\undeploy\agent.ps1            # safe default
# kanade is now off the host. Re-deploy when ready:
.\scripts\deploy\agent.ps1 -SourceDir C:\path\to\prev-version

Decommission a host permanently

.\scripts\undeploy\agent.ps1 -Purge

Wipe a dev box for a clean re-install

.\scripts\undeploy\agent.ps1 -Purge
.\scripts\undeploy\backend.ps1 -Purge   # ⚠️ SQLite gone
.\scripts\undeploy\nats.ps1 -Purge      # ⚠️ JetStream gone
.\scripts\undeploy\client.ps1
# Now nothing about kanade exists on the box.

Rebuild a single bad service without touching state

.\scripts\undeploy\backend.ps1          # safe default: SQLite intact
.\scripts\deploy\backend.ps1 -Recreate  # fresh service registration, same data

What undeploy does NOT do

  • It doesn't notify the rest of the fleet that this host has gone away — the backend will keep listing it under "agents" until its heartbeat ages out (/api/agents staleness threshold). If you want it removed from the SPA immediately, delete the row via the backend API after undeploy.
  • It doesn't roll back the deployed binary to a previous version. "Roll back" in this script's vocabulary means "remove entirely"; if you want to swap to an older version, re-run the matching scripts/deploy/<component>.ps1 against a folder containing the older binary.
  • It doesn't touch NATS-side state when you remove the agent — the agent's target_version entry under agent_config.pcs.<pc> stays in the KV. Clean those up server-side with kanade jetstream kv del agent_config pcs.<pc>.target_version if needed.
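The two server-side leftovers above follow simple rules. An agent row counts as stale once its last heartbeat is older than a threshold, and its pinned version lives at `pcs.<pc>.target_version` in the agent_config KV. The Rust sketch below illustrates both. The helper names and the 90-second threshold in the usage are hypothetical, since the actual /api/agents threshold is not stated here.

```rust
use std::time::{Duration, SystemTime};

/// True once the last heartbeat is older than `threshold`.
/// (Placeholder logic; the backend's real threshold is not specified.)
fn is_stale(last_heartbeat: SystemTime, now: SystemTime, threshold: Duration) -> bool {
    match now.duration_since(last_heartbeat) {
        Ok(age) => age > threshold,
        // Heartbeat stamped in the future (clock skew): treat as fresh.
        Err(_) => false,
    }
}

/// KV key in the agent_config bucket that undeploy leaves behind.
fn target_version_key(pc_id: &str) -> String {
    format!("pcs.{pc_id}.target_version")
}
```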

Writing scripts for the agent

PowerShell scripts the agent will run are almost normal .ps1 files. This page collects the gotchas that aren't obvious from the script source alone.

The agent stages scripts on disk and runs them via -File

As of PR #230 (agent version 0.42.0+), the agent:

  1. Writes your script body to a temp .ps1 under %ProgramData%\Kanade\agent-scripts\<UUID>\kanade-<UUID>.ps1 (Windows) or $TMPDIR/kanade-agent-<UUID>/kanade-<UUID>.ps1 (non-Windows dev only).
  2. Writes a launcher .ps1 next to it that sets the console output encoding to UTF-8 and then runs & '<your-script>' @args.
  3. Spawns powershell -NoProfile -NonInteractive -ExecutionPolicy Bypass -File <launcher>.

This means your script:

  • Can have [CmdletBinding()] and param(...) at the top. The launcher invokes your script as a separate .ps1 file through the call operator, so PowerShell parses it as a script file, where those headers are valid.
  • Should not rely on $PSCommandPath matching the operator's source path — it'll be the staged temp file.
  • Should not write to $PSScriptRoot (see next section).

Pre-0.42.0 agents used powershell -Command "<body>", which parses the body as a command-line expression and rejects [CmdletBinding()] as a syntax error. If you see "Unexpected token '[CmdletBinding()]'" in stderr, the host's agent is too old — upgrade it (see agent self-update).
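The three staging steps can be sketched in Rust as follows. This is not the agent's code. The launcher file name `launcher.ps1`, the helper names, and the choice to prefix a UTF-8 BOM are assumptions of the sketch. The BOM matters because Windows PowerShell 5.1 reads BOM-less .ps1 files in the system ANSI codepage.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write `text` with a UTF-8 BOM so Windows PowerShell 5.1 decodes it as UTF-8.
fn write_with_bom(path: &Path, text: &str) -> io::Result<()> {
    let mut bytes = b"\xEF\xBB\xBF".to_vec();
    bytes.extend_from_slice(text.as_bytes());
    fs::write(path, bytes)
}

/// Quote a path as a PowerShell single-quoted literal ('' escapes a quote).
fn ps_quote(path: &Path) -> String {
    format!("'{}'", path.display().to_string().replace('\'', "''"))
}

/// Stage `body` under `root/<id>/` and return the powershell.exe arguments.
fn stage(root: &Path, id: &str, body: &str) -> io::Result<Vec<String>> {
    let dir = root.join(id);
    fs::create_dir_all(&dir)?;

    // Step 1: the operator's script, unchanged.
    let script = dir.join(format!("kanade-{id}.ps1"));
    write_with_bom(&script, body)?;

    // Step 2: launcher that forces UTF-8 output, then calls the script as a
    // separate file so a leading [CmdletBinding()] / param() block is legal.
    let launcher = dir.join("launcher.ps1");
    let prelude = "[Console]::OutputEncoding = [System.Text.UTF8Encoding]::new($false)\r\n\
                   $OutputEncoding = [System.Text.UTF8Encoding]::new($false)\r\n";
    write_with_bom(&launcher, &format!("{}& {} @args\r\n", prelude, ps_quote(&script)))?;

    // Step 3: arguments for `powershell -NoProfile ... -File <launcher>`.
    let mut args: Vec<String> = ["-NoProfile", "-NonInteractive", "-ExecutionPolicy", "Bypass", "-File"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    args.push(launcher.display().to_string());
    Ok(args)
}
```

Because the operator's body is written to its own file untouched, a leading param() block stays the first statement of a script file. That is exactly what the pre-0.42.0 -Command path could not guarantee.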

$PSScriptRoot is read-only for run_as: user

When run_as: user, the child process runs as the logged-in user, not as the LocalSystem agent that wrote the staged file. The staging directory inherits its ACL from %ProgramData%, which grants users Read & Execute but not Modify.

That means:

# OK from any run_as
Get-ChildItem $PSScriptRoot              # list contents
Get-Content   $PSScriptRoot\anything     # read

# Fails from run_as: user (access denied)
New-Item    -Path $PSScriptRoot\out.txt -ItemType File
Set-Content -Path $PSScriptRoot\log.log -Value 'done'

Write to $env:TEMP, $env:LOCALAPPDATA, or an absolute path under the user's profile instead. Even for run_as: system (where SYSTEM can write to its own staged dir), the directory is cleaned up when the script exits, so writing siblings is fragile either way.

Identity table

| run_as: (manifest) | Child identity | Reads $PSScriptRoot | Writes $PSScriptRoot | Has admin |
|---|---|---|---|---|
| system (default) | LocalSystem | ✓ | ✓ but pointless (GC'd) | yes |
| user | Logged-in user | ✓ | ✗ access denied | no |
| system_gui | LocalSystem, in user session | ✓ | ✓ but pointless (GC'd) | yes |

system_gui is the "PsExec -i -s" pattern — admin privilege but visible in the user's desktop session (useful for GUI tools that need both elevation and an interactive window).
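The identity table can be captured as a small type, for example in a hypothetical manifest linter. The Rust sketch below encodes only what the table states. `RunAs` and its methods are illustrative names, not the agent's manifest model.

```rust
/// The manifest's run_as values from the identity table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RunAs {
    System,
    User,
    SystemGui,
}

impl RunAs {
    /// Parse the manifest value; `system` is the default when absent.
    fn from_manifest(value: Option<&str>) -> Result<Self, String> {
        match value.unwrap_or("system") {
            "system" => Ok(RunAs::System),
            "user" => Ok(RunAs::User),
            "system_gui" => Ok(RunAs::SystemGui),
            other => Err(format!("unknown run_as: {other}")),
        }
    }

    /// Only `user` runs without admin rights.
    fn has_admin(self) -> bool {
        self != RunAs::User
    }

    /// Writable, though pointless: the staged dir is deleted after the run.
    fn can_write_script_root(self) -> bool {
        self != RunAs::User
    }

    /// `user` and `system_gui` run inside the logged-in user's desktop session.
    fn in_user_session(self) -> bool {
        self != RunAs::System
    }
}
```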

stdout vs Write-Host

The backend's result projector reads stdout as the script's output. If your manifest has an inventory: block, stdout is parsed as a single JSON blob.

Use Write-Host for progress chatter — it goes to the host stream, NOT stdout, so it doesn't pollute the JSON parse.

Write-Host "Downloading..."                    # → host stream (logged but ignored by projector)
Write-Output ($obj | ConvertTo-Json -Depth 5)   # → stdout (parsed); -Depth avoids the default depth-2 truncation

Avoid Write-Output for chatter — anything on stdout that isn't the expected JSON will fail the inventory parse.

UTF-8 by default

The launcher sets [Console]::OutputEncoding = UTF-8 and $OutputEncoding = UTF-8 before invoking your script, so any stdout / stderr you produce is UTF-8 regardless of the host's system codepage. Operator-shipped scripts containing Japanese, German, Korean, or Chinese strings show up correctly in the SPA Activity view without per-host workarounds.

If you explicitly need OEM / CP932 / Shift-JIS output (e.g. calling a legacy CLI that ignores $OutputEncoding), set it yourself in the script after the launcher prelude has run — your assignment takes precedence.
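On the receiving side, the payoff is that captured bytes can be decoded as UTF-8 without guessing a codepage. The Rust sketch below assumes the consumer decodes stdout / stderr as UTF-8 and flags invalid input instead of rejecting it. `Decoded` and `decode_output` are hypothetical names. The test bytes are 日本 encoded in CP932, which is what a Japanese-locale host would typically emit without the UTF-8 prelude.

```rust
/// Captured stream text plus a flag for bytes that were not valid UTF-8.
#[derive(Debug)]
struct Decoded {
    text: String,
    had_invalid_bytes: bool,
}

/// Decode as UTF-8. On invalid input, keep going with U+FFFD replacements
/// and set the flag so the result can be surfaced as mis-encoded output.
fn decode_output(bytes: &[u8]) -> Decoded {
    match std::str::from_utf8(bytes) {
        Ok(s) => Decoded { text: s.to_owned(), had_invalid_bytes: false },
        Err(_) => Decoded {
            text: String::from_utf8_lossy(bytes).into_owned(),
            had_invalid_bytes: true,
        },
    }
}
```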

Native command exit codes

If your script ends with a native command that succeeds, the overall exit code is 0, as you would expect. If a native command fails ($LASTEXITCODE -ne 0) and you DON'T handle it, PowerShell still exits 0, and $ErrorActionPreference = 'Stop' does not save you here.

Windows PowerShell 5.1 (the default on Windows endpoints, and what the agent's powershell.exe resolves to) does not treat a native command's non-zero exit as an error at all, so $ErrorActionPreference has no effect on it. PowerShell 7.4+ (experimental in 7.3) adds $PSNativeCommandUseErrorActionPreference = $true, which turns such exits into errors, but that isn't available on the deployment target. Always check $LASTEXITCODE explicitly.

The agent does NOT auto-propagate $LASTEXITCODE either — that would exit nonzero even when your script handled the native error gracefully. If you want the script's exit code to reflect a specific native call, propagate it yourself:

& git pull
if ($LASTEXITCODE -ne 0) { throw "git pull failed with exit code $LASTEXITCODE" }
# or, if you want the exact native code propagated:
if ($LASTEXITCODE -ne 0) { exit $LASTEXITCODE }

throw is usually preferable because it produces a clean PowerShell error record (which the trap { … break } cleanup pattern can intercept) and exits non-zero. exit $LASTEXITCODE is right when the caller cares about the exact code.
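Whichever option you pick, what the caller ultimately sees is the PowerShell process's exit code. Below is a minimal Rust sketch of how a supervisor might classify it. The `Outcome` names are hypothetical, and the test uses `sh` only as a stand-in for a script ending in `exit $LASTEXITCODE`.

```rust
use std::process::ExitStatus;

/// Hypothetical classification of a finished job.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Succeeded,
    Failed(i32),
    /// No exit code: on Unix the process was ended by a signal.
    NoExitCode,
}

fn classify(status: ExitStatus) -> Outcome {
    match status.code() {
        Some(0) => Outcome::Succeeded,
        Some(code) => Outcome::Failed(code),
        None => Outcome::NoExitCode,
    }
}
```

`Failed(code)` carries the native tool's own code only if the script propagated it with `exit $LASTEXITCODE`. An unhandled `throw` surfaces as a generic non-zero code instead.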

Timeouts

The manifest's timeout: is enforced by the agent. When it fires, the agent calls child.kill() on the PowerShell process — no graceful shutdown, no trap, no finally. Plan for it:

  • Budget the script to finish within about 60% of the timeout (timeout * 0.6), leaving headroom.
  • Use trap { ... ; break } for cleanup of resources that need explicit release (staging dirs, lock files) — trap fires on terminating errors, NOT on the agent's kill. Don't rely on it for the timeout case.
  • If you need cooperative cancellation, poll a sentinel file or a registry value and exit early. The agent has no way to send the script a graceful "wrap up" signal.
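The hard-kill timeout can be sketched with Rust's standard library alone: poll `Child::try_wait` until a deadline, then call `Child::kill`. This is a minimal illustration of the behaviour described above, not the agent's implementation. `RunResult` and `wait_with_timeout` are hypothetical names, and the 50 ms poll interval is arbitrary.

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug)]
enum RunResult {
    Exited(ExitStatus),
    TimedOut,
}

/// Wait for `child` up to `timeout`; on expiry, hard-kill it. The script
/// gets no chance to run trap or finally blocks.
fn wait_with_timeout(child: &mut Child, timeout: Duration) -> io::Result<RunResult> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(RunResult::Exited(status));
        }
        if Instant::now() >= deadline {
            child.kill()?; // TerminateProcess on Windows, SIGKILL on Unix
            child.wait()?; // reap the killed process
            return Ok(RunResult::TimedOut);
        }
        thread::sleep(Duration::from_millis(50));
    }
}
```

`Child::kill` targets only the direct child process. In this sketch, any process the script itself launched is not terminated along with it.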

Killing a running job

kanade kill <exec_id> publishes a kill message the agent subscribes to. On receipt, the agent calls child.kill() — same hard-kill as the timeout path. Operators get an immediate result row marked Killed with whatever stdout / stderr the agent managed to capture before termination.
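The kill path can be sketched the same way. Here a `std::sync::mpsc` channel stands in for the NATS kill subscription, which is an assumption of the sketch and not kanade's wiring. Stdout is drained on a separate thread, so whatever the script printed before the kill is still returned. `Status` and `supervise` are hypothetical names.

```rust
use std::io::{self, Read};
use std::process::Child;
use std::sync::mpsc::Receiver;
use std::thread;
use std::time::Duration;

/// Final state of a supervised job (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
enum Status {
    Exited(Option<i32>),
    Killed,
}

/// Run until the child exits or a kill for `exec_id` arrives on `kills`.
/// Returns the final status plus whatever stdout was captured.
fn supervise(mut child: Child, exec_id: &str, kills: &Receiver<String>) -> io::Result<(Status, String)> {
    // Drain stdout on its own thread so output survives a kill and a chatty
    // script can't stall on a full pipe.
    let mut stdout = child.stdout.take().expect("spawn the child with Stdio::piped()");
    let reader = thread::spawn(move || {
        let mut buf = Vec::new();
        let _ = stdout.read_to_end(&mut buf); // keep whatever arrived before EOF
        String::from_utf8_lossy(&buf).into_owned()
    });

    let status = loop {
        if let Some(st) = child.try_wait()? {
            break Status::Exited(st.code());
        }
        // Kill requests for other exec_ids are ignored in this sketch.
        if let Ok(id) = kills.try_recv() {
            if id == exec_id {
                child.kill()?; // same hard kill as the timeout path
                child.wait()?; // reap it
                break Status::Killed;
            }
        }
        thread::sleep(Duration::from_millis(20));
    };

    let captured = reader.join().unwrap_or_default();
    Ok((status, captured))
}
```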

Developer Workflow and Contribution

This document outlines the standard workflows, lint/test requirements, and VCS branching guidelines for contributors working on the kanade codebase.


1. Quality Gates (Pre-Push & CI)

The local checks must pass before you push or open a PR; they mirror the automated checks that run on GitHub Actions.

# Run formatting check, clippy checks, target tests, and cargo lock checks
cargo make check

  • FMT & Clippy: We maintain a strict zero-warning policy. Do not sprinkle #[allow(clippy::...)] unless there is a strong architectural justification.
  • TDD (Test-Driven Development): Follow Kent Beck's TDD methodology. Write failing tests first to define the what, then implement the code to satisfy them.

2. Worktree Management with renri

For isolated feature development, we use renri to manage lightweight repository worktrees. This prevents staging pollution, keeps your main checkout clean, and allows you to switch tasks instantly.

Why renri?

In co-located Git and Jujutsu (jj) environments, managing worktrees by hand gets complex. renri wraps the VCS-specific worktree creation and cleanup commands, favoring jj when it is configured.

Common Commands

# Create an isolated worktree (uses Jujutsu by default if present)
renri add feat/your-awesome-feature

# Force a Git-native worktree (bypassing jj)
renri --vcs git add feat/your-awesome-feature

# Clean up and delete a worktree after merging
renri remove feat/your-awesome-feature

# Garbage-collect and prune stale or broken worktrees
renri prune

Note: Worktree creation automatically invokes the cargo-make on-add hook to fetch remote refs and bootstrap APM configurations immediately.


3. Co-located Jujutsu (jj) & Git Workflow

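Our development environment is configured with co-located Git and Jujutsu. We prefer jj for local version control because of its safer commit model: the working copy is snapshotted automatically, and conflicts are recorded in commits instead of blocking your work.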

Guidelines

  • No Direct Push to main: All changes must land via a Pull Request.
  • Branch/Bookmark Naming:
    • feat/... for new features.
    • fix/... for bugs.
    • chore/... for infrastructure, dependency bumps, or releases.
  • Commit Messages: Write commit messages, PR titles, and bodies in English.
  • Version Bumps: Release version bumps are managed exclusively via PRs on main and automated tagging pipelines. Never run git tag manually.

4. Documentation Policy

Documentation must stay in lock-step with code changes. Whenever you add or modify features:

  • Update docstrings and comments explaining the why (avoid comments restating how).
  • Update the relevant book pages (written in English under book/src/).
  • Synchronize localization catalogs by running the translation template generator.

Spec (legacy single-page)

The full protocol / on-wire spec hasn't been migrated into the book yet. The authoritative source is the single-file version in the repo:

docs/SPEC.md on GitHub

Splitting it into chapters under this section is a follow-up once the rest of the operator / developer guides settle.

Configuration Reference

kanade services rely on structured configurations loaded from TOML files, environment variables, or registry paths.


1. Agent Configuration

The agent takes its config file path from the KANADE_AGENT_CONFIG environment variable and falls back to the platform-native default path when that variable is unset.

Dev Configuration (configs/agent.dev.toml)

# Dev configuration schema
[agent]
id = "dev-pc"
nats_url = "nats://localhost:4223"
data_dir = "target/dev-data/agent"

[log]
level = "debug"
file = "target/dev-data/agent/logs/agent.log"

Configuration Parameters

| Field | Type | Description | Environment override |
|---|---|---|---|
| agent.id | String | Unique hardware identifier (pc_id). | KANADE_DEV_AGENT_ID (templated) |
| agent.nats_url | String | Network address of the NATS broker. | KANADE_NATS_URL |
| agent.data_dir | Path | Root directory for the outbox, cached scripts, the state database, and local completions. | KANADE_AGENT_DATA_DIR |
| log.level | String | Logging verbosity (error, warn, info, debug, trace). | RUST_LOG |
| log.file | Path | Destination path for rolling log files. | none |
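File-path selection and per-field overrides can be sketched as "environment beats file". The Rust sketch below makes two assumptions. First, the Windows fallback is %ProgramData%\Kanade\config\agent.toml, the file the agent's -Purge removes. Second, a set environment variable simply replaces the TOML value. Both helpers are hypothetical. The `lookup` parameter stands in for `std::env::var`, so the logic can be tested without mutating the process environment.

```rust
use std::path::PathBuf;

/// Pick the agent config file: KANADE_AGENT_CONFIG wins; otherwise use an
/// assumed native default under %ProgramData%.
fn agent_config_path(lookup: impl Fn(&str) -> Option<String>) -> PathBuf {
    if let Some(p) = lookup("KANADE_AGENT_CONFIG") {
        return PathBuf::from(p);
    }
    let program_data = lookup("ProgramData").unwrap_or_else(|| r"C:\ProgramData".to_string());
    PathBuf::from(program_data).join("Kanade").join("config").join("agent.toml")
}

/// Apply an environment override on top of a value read from the TOML file.
fn with_override(file_value: &str, env_key: &str, lookup: impl Fn(&str) -> Option<String>) -> String {
    lookup(env_key).unwrap_or_else(|| file_value.to_string())
}
```

In a real caller, `lookup` would be `|k: &str| std::env::var(k).ok()`.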

2. Backend Configuration

The backend reads its configuration from the file named by KANADE_BACKEND_CONFIG, or falls back to built-in defaults.

Dev Configuration (configs/backend.dev.toml)

[backend]
listen_addr = "127.0.0.1:8081"
nats_url = "nats://localhost:4223"
database_url = "sqlite://target/dev-data/backend/state.db"

[auth]
# Auth settings

Configuration Parameters

| Field | Type | Description | Environment override |
|---|---|---|---|
| backend.listen_addr | String | Network bind address for HTTP/WebSocket traffic. | KANADE_BIND_ADDR |
| backend.nats_url | String | Target NATS broker URL. | KANADE_NATS_URL |
| backend.database_url | String | SQLite database connection string. | DATABASE_URL |
| auth.disable | Boolean | Set to true to disable operator token validation (dev environment only). | KANADE_AUTH_DISABLE |

3. Windows Registry Integration

In production environments, security-sensitive tokens (such as NATS client tokens and administrative API bearer tokens) are stored in the Windows Registry, protected by ACLs, rather than in plaintext files.

Key Paths

  • Agent Settings: HKLM:\SOFTWARE\Kanade\agent
  • Backend Settings: HKLM:\SOFTWARE\Kanade\backend

These registry paths are protected with local ACL configurations, allowing read permissions strictly to SYSTEM and designated operators.