Introduction
kanade — 奏 is an endpoint management system for Windows fleets. It gives an operator a single CLI / SPA to run scripts, install software, gather inventory, and stream live perf data from hundreds of PCs at once. The pieces:
| Component | What it is |
|---|---|
| kanade-agent | Service that runs on each managed PC. Subscribes to NATS, executes commands, ships results. |
| kanade-backend | HTTP API + projector. Persists state, serves the SPA, exposes operator endpoints. |
| kanade-client | Optional Tauri desktop app. End-user-facing surface. |
| NATS server | Message broker for command fan-out + result aggregation. The agent talks NATS-only; the backend reads NATS too. |
| kanade CLI | Operator-facing command line: publish binaries, fire jobs, query state. |
This site covers two audiences:
- Operators running a kanade fleet — how to update each component without ssh-ing into endpoints (see Agent-mediated updates).
- Developers writing PowerShell jobs the agent will execute — what works, what doesn't, what changed in recent agent versions (see Writing scripts for the agent).
The detailed protocol / on-wire spec lives at Spec (the legacy single-page document; will be split into chapters as the docs site fills out).
Developer Quickstart
This guide gets you up and running with a local kanade development environment.
1. Prerequisites
Before starting, ensure you have the following installed on your Windows machine:
- Rust toolchain (stable channel)
- cargo-make (run cargo install --force cargo-make)
- bun (for SPA dependency management and build execution)
- gsudo (for local service deployment tests)
- nats-server (runnable from PATH)
2. One-Time Setup
Run the following command at the workspace root to register the git pre-push hooks and install agent skills defined in apm.yml:
cargo make setup
3. Launching the Dev Sandbox
You can spin up a fully isolated, multi-component development stack on your local host using a single command:
cargo make dev
This task runs the following services concurrently in a loopback sandbox:
- nats-dev: Unauthenticated NATS broker listening on port 4223.
- backend-dev: Dev API server listening on port 8081 with auth disabled.
- agent-dev: Local dev agent talking to the dev NATS broker on 4223.
- web-dev: Vite dev server for the React SPA listening on http://localhost:5173.
Press Ctrl+C to tear down all components cleanly.
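To confirm the sandbox listeners are actually accepting connections, a small reachability probe can help. The following is a minimal Python sketch and not part of the kanade tooling: `DEV_PORTS` simply restates the ports listed above, and `port_open` / `sandbox_status` are hypothetical helper names.

```python
import socket

# Ports as listed for `cargo make dev`; the mapping itself is illustrative.
DEV_PORTS = {"nats-dev": 4223, "backend-dev": 8081, "web-dev": 5173}


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def sandbox_status(host: str = "127.0.0.1") -> dict:
    """Map each dev service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in DEV_PORTS.items()}


if __name__ == "__main__":
    for name, up in sandbox_status().items():
        print(f"{name}: {'up' if up else 'down'}")
```

The agent has no listening port of its own (it only dials out to NATS), so it does not appear in the mapping.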
4. Multi-Agent Fleet Simulation
To debug behavior that only shows up when managing multiple machines (e.g. concurrent execution result projection or ID collisions), you can launch a multi-agent sandbox:
cargo make dev-fleet
This spawns the NATS broker, backend, and SPA, plus three separate dev agents with independent IDs (dev-pc-1, dev-pc-2, dev-pc-3) and isolated state databases.
5. Local Deploy Testing
If you want to test the full lifecycle of installing components as Windows services (mirroring production environments), use the local deployment scripts:
# Installs CLI, agent, backend, and NATS services locally via gsudo elevation
cargo make local-deploy
After deployment, you can verify and interact with the real Windows services. Use the following task to stop and cleanly delete the services when finished:
cargo make local-undeploy
System Architecture
kanade is designed to manage hundreds of Windows endpoints concurrently, safely, and asynchronously.
Component Topology
The system consists of five main components, coordinated through an event-driven pub/sub structure:
graph TD
subgraph Operator Session
CLI[kanade CLI]
SPA[React SPA]
end
subgraph Server Infrastructure
Backend[kanade-backend]
NATS[NATS Broker / JetStream]
end
subgraph Windows Endpoints
Agent1[kanade-agent PC-1]
Agent2[kanade-agent PC-2]
Client[kanade-client Tauri App]
end
CLI -->|Command / Query API| Backend
SPA -->|REST / WebSockets| Backend
Backend <-->|State/PubSub| NATS
Agent1 <-->|NATS-only Connection| NATS
Agent2 <-->|NATS-only Connection| NATS
Client <-->|Tauri IPC| Agent1
1. kanade-agent
A high-performance Windows service running on each managed host.
- Role: Core executor.
- Communication: Establishes an outbound-only NATS connection. It does not open any inbound ports, making it firewall-friendly.
- Capabilities: Launches secure, isolated PowerShell subprocesses, inventories hardware/software specs, streams live performance data (CPU, RSS memory, disk I/O), and manages local packages.
2. kanade-backend
The central HTTP API and projection server.
- Role: Coordinates commands, processes incoming telemetry, and hosts the operator Web interface.
- State management: Persists events, activity logs, and status records in a localized SQLite database.
- Projector pattern: Subscribes to the NATS command-response stream, parses incoming payloads, and projects them into state tables in real-time.
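The projector idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the backend's actual code: the `results` table layout and the payload fields `pc_id` and `stdout` are hypothetical (only `exec_id` and `exit_code` are named elsewhere in these docs), and the NATS subscription is replaced by a plain function call.

```python
import json
import sqlite3


def open_state(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) results table the projector writes into."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        " exec_id TEXT, pc_id TEXT, exit_code INTEGER, stdout TEXT,"
        " PRIMARY KEY (exec_id, pc_id))"
    )
    return db


def project(db: sqlite3.Connection, payload: bytes) -> None:
    """Fold one result message into the state table as an upsert."""
    msg = json.loads(payload)
    db.execute(
        "INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?)",
        (msg["exec_id"], msg["pc_id"], msg["exit_code"], msg.get("stdout", "")),
    )
    db.commit()
```

Writing the projection as an upsert keyed on `(exec_id, pc_id)` means a message delivered twice leaves the table unchanged, which is a convenient property when results can be replayed.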
3. NATS Broker (with JetStream)
The message transport layer of the entire fleet.
- Role: Lightweight, high-throughput message broker.
- JetStream: Retains command streams, job registrations, and file storage (using NATS Object Store buckets for distributing packages and agent scripts).
- Isolation: Decouples the backend from the agents. If the backend is offline or restarting, agents continue execution and cache outbox records, pushing them once connection resumes.
4. kanade-client
An optional Tauri desktop application running in the logged-in user's desktop session on endpoints.
- Role: Provides end-user interaction (e.g., prompt dialogs, notifications, or a user-facing dashboard).
- Communication: Shares state with the local kanade-agent via secured local IPC mechanisms.
5. kanade CLI
The primary command-line tool for operators.
- Role: Packages and publishes software updates, submits and executes job manifests, and queries live fleet inventory from the command-line.
Security & Reliability Design
Outbound-only Connections
Agents communicate with the NATS broker only over outbound TCP connections that they initiate. No inbound firewall ports need to be opened on endpoints, so the agent presents no listening socket to external port scans and adds no new path for lateral movement between endpoints.
Agent Job Sandboxing
When executing scripts, the agent stages commands in %ProgramData%\Kanade\agent-scripts and executes them using customized launcher templates.
Administrators can enforce identity configurations via job manifests, specifying run_as: system (for elevated system management) or run_as: user (to run safely under the active user's credentials with restricted directory ACLs).
Operations overview
Day-2 operations in kanade fall into two flows:
- Direct install: drop binaries + config on a fresh host and register the Windows service. Used to bootstrap the first agent, the initial backend, and the NATS server. Scripts: scripts/deploy/agent.ps1, scripts/deploy/backend.ps1, scripts/deploy/nats.ps1. Run manually on the target host.
- Agent-mediated update: once an agent is running, the agent itself can install / update other components on its own host without ssh / RDP. The operator publishes binaries + script bodies to the broker, then fires a job; the agent fetches, verifies, swaps, and restarts services. This is the bulk of day-2 operations.
The agent-mediated flow has the same shape regardless of what you're updating:
operator host ─► kanade CLI ─► NATS broker ─► agent (on target host)
                 │                              │
                 ├── publish binary ────────────► fetches from
                 │   to OBJECT_APP_PACKAGES       OBJECT_APP_PACKAGES
                 ├── publish script ────────────► fetches from
                 │   to OBJECT_SCRIPTS            OBJECT_SCRIPTS
                 ├── register / update job ─────► reads job manifest
                 │                                from `jobs` KV
                 └── exec job ──────────────────► PowerShell child
                                                  runs the script
Component-specific guides:
- kanade-backend — the HTTP / projector binary
- kanade-client — the Tauri end-user app
- NATS server — the broker itself (yes, you can update the broker over the broker)
- kanade-agent itself — the agent self-update path (different from the other three; it uses a dedicated rollout bucket, not the generic OBJECT_APP_PACKAGES + script pair)
Installation and Deployment
This section details how to bootstrap kanade components as native Windows services in production or staging environments.
Deployment Model
Production hosts and target endpoints run kanade components as background Windows services. This ensures high availability and automatic startup.
| Service Name | Binary | Config Source | Typical Target |
|---|---|---|---|
| KanadeNats | nats-server.exe | Hardened Registry / Registry-baked CLI flags | Central server |
| KanadeBackend | kanade-backend.exe | Hardened Registry / Config file | Central server |
| KanadeAgent | kanade-agent.exe | Hardened Registry / Local state DB | Managed endpoints |
1. Prerequisites
- Host OS: Windows 10/11 or Windows Server 2016+.
- gsudo: Required to perform elevated installations from standard user shells (or run commands from an Administrator-level PowerShell prompt).
- Network Routing: Managed endpoints must be able to reach the NATS server port (default
4222) over TCP.
2. Setting Up the NATS Server (Broker)
The NATS server acts as the messaging core.
- Stage the deployment bundle using scripts/build-release.ps1 -Roles nats.
- Deploy the service with elevation:
# Elevated PowerShell prompt
& "dist\nats\deploy-nats.ps1" -NatsToken "your-secure-nats-token" -Recreate
This installs the KanadeNats service, configures it to run under the local system account, sets up JetStream data directories, and locks down the secure authorization token in the Windows registry.
3. Deploying the Backend API & SPA
The backend manages operator connections and processes event logs.
- Stage the backend binaries and React SPA bundle using scripts/build-release.ps1 -Roles backend.
- Deploy the service:
# Elevated PowerShell prompt
& "dist\backend\deploy-backend.ps1" `
  -NatsToken "your-secure-nats-token" `
  -StaticToken "your-operator-spa-bearer-token" `
  -ForceConfig -Recreate
-NatsToken: Connects the backend to the local NATS server securely.
-StaticToken: Defines the API bearer token required for operator CLI/SPA logins.
The deployment script registers the KanadeBackend Windows service, sets the appropriate ACLs, and verifies the endpoint.
4. Installing the Agent on Target Endpoints
Install the agent on every endpoint PC that you want to manage.
- Stage the agent bundle using scripts/build-release.ps1 -Roles agent.
- Copy the contents of the dist/agent folder to the target PC.
- On the target PC, run the installer:
# Elevated PowerShell prompt
& ".\deploy-agent.ps1" -NatsToken "your-secure-nats-token" -ForceConfig -Recreate
The script:
- Places kanade-agent.exe into its destination directory.
- Secures the configuration and NATS token in the Windows registry path (HKLM:\SOFTWARE\Kanade\agent).
- Registers and starts the KanadeAgent service.
Once the service is active, the agent establishes an outbound NATS connection, subscribes to command streams, and reports its online heartbeat back to the fleet backend.
Agent-mediated updates
The agent is the universal installer. Once it's running on a target host, the operator never needs to touch the host directly to update any other component — including the backend it talks to, the broker that carries its messages, and the agent itself.
This chapter has one page per component:
Common machinery used by all of them:
| Bucket / Stream | Purpose |
|---|---|
| OBJECT_APP_PACKAGES | Generic binary storage (backend, client, NATS server, …). Keyed by <name>/<version>. |
| OBJECT_SCRIPTS | PowerShell script bodies referenced by manifests via script_object. Keyed by <name>/<version>. |
| OBJECT_AGENT_RELEASES | Agent binaries only. Separate from APP_PACKAGES because agent rollout has its own watcher / target_version flow. |
| agent_config (KV) | Layered config: global / per-group / per-PC. target_version lives here. |
| jobs (KV) | Job catalog. Each entry is a manifest the operator can exec. |
The CLI surface:
| Command | What it does |
|---|---|
| kanade app publish <name> <version> <file> | Upload to OBJECT_APP_PACKAGES. |
| kanade script publish <name> <version> <file> | Upload to OBJECT_SCRIPTS. |
| kanade job create <yaml> | Upsert a job manifest into the jobs KV. |
| kanade exec <job-id> --pcs <pc> [--pcs <pc> …] | Fire a registered job at a set of PCs. |
| kanade agent publish <file> | Upload an agent binary (version extracted from PE VERSIONINFO). |
| kanade agent rollout <version> --pc \| --group \| --global | Flip target_version on the chosen scope; agents pick it up via their self-update watcher. |
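The commands in the table compose into a publish, register, exec sequence for any component that goes through OBJECT_APP_PACKAGES. As a rough sketch of how an operator might script that sequence in Python: the function name `update_commands` and its parameters are hypothetical, and only the CLI invocations listed in the table are used. The function just builds argument lists; nothing is executed.

```python
from typing import List, Sequence


def update_commands(
    app: str,
    script_name: str,
    job_id: str,
    version: str,
    exe: str,
    script: str,
    manifest: str,
    pcs: Sequence[str],
) -> List[List[str]]:
    """Return argv lists for publish binary -> publish script -> job create -> exec."""
    cmds = [
        ["kanade", "app", "publish", app, version, exe],
        ["kanade", "script", "publish", script_name, version, script],
        ["kanade", "job", "create", manifest],
    ]
    exec_cmd = ["kanade", "exec", job_id]
    for pc in pcs:
        exec_cmd += ["--pcs", pc]  # one --pcs flag per target, as in the table
    cmds.append(exec_cmd)
    return cmds
```

Each list could then be passed to `subprocess.run(cmd, check=True)` in order, so that a failed publish stops the sequence before the job is fired.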
Updating kanade-backend
The backend lives on one (or more) of your managed hosts as a Windows service. Agent-mediated update means: an agent running on that host stops the service, swaps the binary, starts it back up — while the operator never logs into the host.
End-to-end flow
┌── operator host ──────────────────────────────────────────┐
│ 1. build kanade-backend.exe │
│ 2. kanade app publish kanade-backend <v> <exe> │
│ 3. edit deploy-backend.ps1 (set $AgentSource* knobs) │
│ 4. kanade script publish deploy-backend <v> <edited.ps1> │
│ 5. kanade job create install-kanade-backend.yaml │
│ 6. kanade exec install-kanade-backend --pcs <host> │
└────────────────────────────────────────────────────────────┘
│
▼
┌── target host (running kanade-agent as LocalSystem) ──────┐
│ • agent receives the Command on commands.pc.<host> │
│ • fetches deploy-backend.ps1 from OBJECT_SCRIPTS │
│ sha-verifies it (`script_object` machinery, #214) │
│ • stages it under │
│ C:\ProgramData\Kanade\agent-scripts\<UUID>\ │
│ kanade-<UUID>.ps1 │
│ • runs `powershell -File <launcher>` (PR #230 fix) │
│ • launcher invokes the user script via `& '...'` │
│ so [CmdletBinding()] / param() headers parse │
│ • script downloads kanade-backend.exe from │
│ OBJECT_APP_PACKAGES (via /api/app-packages/…) │
│ sha-verifies it (separate hash, on the exe itself) │
│ • Stop-Service KanadeBackend │
│ • copy exe over C:\Program Files\Kanade\… │
│ • Start-Service KanadeBackend │
│ • exit 0 — result published to NATS │
└────────────────────────────────────────────────────────────┘
The two sha checks are intentional: the script body's hash is
verified by the agent before execution (script integrity); the
binary's hash is verified by the script before the swap (binary
integrity, defined by the operator in
$AgentSourceSha256).
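The second check (binary integrity before the swap) follows a simple verify-then-copy shape. Here is a minimal Python sketch of that shape; the real logic lives in the PowerShell deploy script, so the function names `sha256_hex` and `verified_swap` are hypothetical and the service stop/start steps are left out.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_hex(path: Path) -> str:
    """Stream the file through SHA-256 and return the lowercase hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verified_swap(downloaded: Path, installed: Path, expected_hex: str) -> None:
    """Copy `downloaded` over `installed` only if its hash matches."""
    actual = sha256_hex(downloaded)
    if actual != expected_hex.lower():
        # Abort before touching the installed file, so a bad download
        # leaves the existing install intact.
        raise ValueError(f"sha256 mismatch: expected={expected_hex} actual={actual}")
    shutil.copyfile(downloaded, installed)
```

The ordering is the point: the comparison happens before any write to the installed path.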
Step-by-step
1. Build kanade-backend
cargo build --release -p kanade-backend
Output: target/release/kanade-backend.exe.
2. Publish the binary
kanade app publish kanade-backend 0.43.0 target/release/kanade-backend.exe
This uploads the binary to OBJECT_APP_PACKAGES/kanade-backend/0.43.0
and prints the sha-256 digest. Copy the digest — you'll need
its lowercase-hex form for the script.
3. Edit scripts/deploy/backend.ps1
Make a local copy. Set the four $Agent* knobs at the top:
$AgentSourceUrl = 'http://kanade-backend.example.com:8080'
$AgentSourceVersion = '0.43.0'
$AgentSourceSha256 = '<lowercase hex of kanade-backend.exe>'
$AgentSourceAuthToken = '<bearer for the backend HTTP API>'
Leave the rest of the script alone — those knobs are how the script knows it's running in "agent mode" (downloading from the backend) vs the manual-install mode (local folder of files).
The $AgentSourceSha256 is the lowercase hex form of the hash reported by Get-FileHash kanade-backend.exe -Algorithm SHA256.
If you only have the base64url form printed by kanade app publish, decode it. The base64 from the CLI is URL-safe and may be unpadded, so the PowerShell snippet needs to re-pad before FromBase64String accepts it:
$b64 = '<paste the SHA-256= value here, without the SHA-256= prefix>'
$b64 = $b64.Replace('-', '+').Replace('_', '/')
if ($b64.Length % 4) { $b64 += '=' * (4 - $b64.Length % 4) }
[BitConverter]::ToString([Convert]::FromBase64String($b64)).Replace('-', '').ToLowerInvariant()
Or in Python:
python -c "import base64; print(base64.urlsafe_b64decode('<b64>' + '=' * (-len('<b64>') % 4)).hex())"
The $AgentSourceAuthToken is required as of the live test on 2026-05-26: the backend's /api/app-packages/<name>/<ver> endpoint returns HTTP 401 without it. Leave empty only for no-auth lab setups.
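If you do this conversion often, a small helper that accepts either digest form saves a round of copy-paste mistakes. The following is a Python sketch under stated assumptions: `digest_to_hex` is a hypothetical helper, and the optional `SHA-256=` prefix handling is inferred from the PowerShell snippet's comment about that prefix rather than from the CLI's documented output.

```python
import base64


def digest_to_hex(value: str) -> str:
    """Normalise a SHA-256 digest (hex or unpadded base64url) to lowercase hex."""
    value = value.strip()
    if value.upper().startswith("SHA-256="):
        value = value[len("SHA-256="):]
    # A 64-character all-hex string is already the hex form; 32 bytes of
    # base64 are only 43 or 44 characters long, so the two cannot collide.
    if len(value) == 64 and all(c in "0123456789abcdefABCDEF" for c in value):
        return value.lower()
    padded = value + "=" * (-len(value) % 4)  # restore stripped padding
    raw = base64.urlsafe_b64decode(padded)
    if len(raw) != 32:
        raise ValueError(f"not a SHA-256 digest: decoded to {len(raw)} bytes")
    return raw.hex()
```

The length check guards against pasting a truncated value, which would otherwise decode cleanly and only surface later as a hash mismatch on the target.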
4. Publish the edited script
kanade script publish deploy-backend 0.43.0 .\deploy-backend.edited.ps1
Upload goes to OBJECT_SCRIPTS/deploy-backend/0.43.0.
5. Register / update the job
configs/jobs/installers/install-kanade-backend.yaml in the repo is the template.
Edit version: + script_object: to point at the version you
just published, then upsert:
id: install-kanade-backend
version: 0.43.0
execute:
  shell: powershell
  script_object: deploy-backend/0.43.0
  timeout: 300s
  run_as: system
require_approval: true
kanade job create jobs\install-kanade-backend.yaml
run_as: system is required: Stop-Service / Start-Service / sc.exe all need admin. The agent already runs as LocalSystem in production.
6. Fire it
kanade exec install-kanade-backend --pcs <backend-host>
The CLI returns an exec_id immediately. The actual install
happens asynchronously on the target.
7. Verify
Query the backend's results endpoint (or watch the SPA Activity view):
curl -H "Authorization: Bearer <token>" `
"http://<backend>/api/results?limit=5"
Look for your exec_id with exit_code: 0 and a stdout that
ends with kanade-backend <new-version>.
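The check described above can be automated once the results are fetched. This Python sketch assumes the endpoint returns a JSON list of objects carrying `exec_id`, `exit_code`, and `stdout` fields; that response shape is an assumption, and `install_succeeded` is a hypothetical helper that works on the already-decoded list.

```python
from typing import Iterable, Mapping, Optional


def install_succeeded(
    results: Iterable[Mapping], exec_id: str, new_version: str
) -> Optional[bool]:
    """Return True/False for the matching result row, or None if not reported yet."""
    for row in results:
        if row.get("exec_id") != exec_id:
            continue
        stdout = (row.get("stdout") or "").rstrip()
        return row.get("exit_code") == 0 and stdout.endswith(
            f"kanade-backend {new_version}"
        )
    return None  # the install runs asynchronously; the row may not exist yet
```

Returning `None` separately from `False` lets a polling loop distinguish "still running" from "ran and failed".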
What can go wrong
| Symptom | Cause | Fix |
|---|---|---|
| [CmdletBinding()] / param() parse error in stderr | Agent older than 0.42.2 (running -Command mode) | Upgrade the agent first via kanade agent rollout (see agent self-update). |
| Start-BitsTransfer : HTTP status 401 | $AgentSourceAuthToken empty but backend requires auth | Set it. |
| Start-BitsTransfer : The transfer encountered an error / job state TransientError | BITS service not running, or target machine's WinHTTP can't reach $AgentSourceUrl | Get-Service BITS; check WinHTTP proxy with netsh winhttp show proxy (BITS uses WinHTTP, not IE/WinINet). |
| sha256 mismatch — expected=<x> actual=<y> | Hash in script doesn't match the published binary | Re-publish or recompute the hash. The script aborts BEFORE the swap, so the existing install is intact. |
| Job runs but kanade-backend doesn't come back up | Service-failure / config drift on target | Read C:\ProgramData\Kanade\log\backend.*.log on the target. The agent can fetch it via kanade logs <pc> (when implemented) or you can pull the file directly. |
Updating kanade-client
The Tauri desktop client is shipped to endpoints the same way as
the backend: binary in OBJECT_APP_PACKAGES, script in
OBJECT_SCRIPTS, job in jobs KV. The shape mirrors
backend updates — only the script content and
package name differ.
What's different from backend updates
| Aspect | kanade-backend | kanade-client |
|---|---|---|
| Service to (re)start | KanadeBackend (Windows service) | None — the client is launched by the user |
| Install location | %ProgramFiles%\Kanade\kanade-backend.exe | %ProgramFiles%\Kanade\kanade-client.exe |
| Script in repo | scripts/deploy/backend.ps1 | configs/jobs/installers/scripts/install-kanade-client.ps1 (lives in the manifest's script_file path) |
| Manifest file ref | script_object: deploy-backend/<v> | script_file: scripts/install-kanade-client.ps1 (relative to the manifest YAML; inlined at kanade job create) |
| Atomic swap pattern | Stop service → copy → start service | Stage to <exe>.new → Move-Item → drop <exe>.old |
| Inventory projection | None (the backend reports its own version) | inventory: block emits per-PC client version into the SPA Inventory page |
Both shapes are supported: script_object (referenced by hash from OBJECT_SCRIPTS, agent fetches on demand) and script_file (script body inlined into the manifest at kanade job create time). The client manifest uses script_file for historical reasons; the backend manifest uses script_object because it was rewritten to test the Object Store path.
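The client's swap pattern from the table (stage to <exe>.new, move into place, drop <exe>.old) can be sketched as follows. This is a Python illustration of one plausible ordering, not a transcription of install-kanade-client.ps1; `staged_swap` is a hypothetical name, and the tolerance for a failed cleanup is an assumption about how a still-running client might hold the old file open.

```python
import os
from pathlib import Path


def staged_swap(exe: Path, new_bytes: bytes) -> None:
    """Replace `exe` via <exe>.new / <exe>.old so the target path is never half-written."""
    new = exe.with_name(exe.name + ".new")
    old = exe.with_name(exe.name + ".old")
    new.write_bytes(new_bytes)      # 1. stage next to the target
    if old.exists():
        old.unlink()                # clear leftovers from an earlier run
    if exe.exists():
        os.replace(exe, old)        # 2. move the current binary aside
    os.replace(new, exe)            # 3. move the staged file into place
    try:
        if old.exists():
            old.unlink()            # 4. drop the previous binary
    except OSError:
        pass                        # may still be in use; a later run cleans it up
```

Because the new bytes are fully written under a different name first, an interruption leaves either the old binary or the new one at the target path, plus at most a stray `.new` / `.old` file.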
Step-by-step
1. Build kanade-client
cargo build --release -p kanade-client
Output: target/release/kanade-client.exe.
2. Publish the binary
kanade app publish kanade-client 0.42.0 target/release/kanade-client.exe
3. Edit configs/jobs/installers/scripts/install-kanade-client.ps1
Set the three knobs at the top:
$BackendBase = 'http://kanade-backend.example.com:8080'
$Version = '0.42.0'
$ExpectedSha256 = '<lowercase hex of kanade-client.exe>'
Set $ClientSourceAuthToken to the backend's bearer when auth is enabled (the same token the agent uses against the rest of /api/*). Leave it blank for dev / smoke-test setups where the /api/app-packages/kanade-client/<v> route is unauthenticated. It mirrors the $AgentSourceAuthToken knob in scripts/deploy/backend.ps1.
4. Register / update the job
configs/jobs/installers/install-kanade-client.yaml:
id: install-kanade-client
version: 0.42.0
execute:
  shell: powershell
  script_file: scripts/install-kanade-client.ps1 # body inlined at `job create` (relative to the manifest YAML)
  timeout: 180s
  run_as: system
require_approval: true
inventory:
  display:
    - { field: version, label: Version }
    - { field: path, label: Install path }
  summary:
    - { field: version, label: Client version }
kanade job create jobs\install-kanade-client.yaml
The inventory: block tells the projector that the script's stdout is a single JSON blob whose version / path fields populate the SPA's Inventory page. Operators can spot stragglers from a fleet-wide table, with no ssh needed.
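To make the stdout contract concrete, here is a small Python sketch of what "stdout is a single JSON blob" implies for whoever consumes it. It is an illustration only: `project_inventory` is a hypothetical function, and the real projector's handling of missing fields or malformed output is not documented here.

```python
import json
from typing import Dict, Sequence


def project_inventory(stdout: str, fields: Sequence[str]) -> Dict[str, object]:
    """Parse the script's stdout as one JSON object and pick the declared fields."""
    blob = json.loads(stdout)  # any extra non-JSON output makes this raise
    if not isinstance(blob, dict):
        raise ValueError("inventory stdout must be a JSON object")
    # Fields the script did not emit come back as None rather than failing.
    return {name: blob.get(name) for name in fields}
```

The practical consequence for script authors: anything else written to stdout (progress messages, banners) would break the parse, so diagnostics are safer on stderr.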
5. Fire it
kanade exec install-kanade-client --pcs <host> [--pcs <host> …]
Or against a group:
kanade exec install-kanade-client --groups office
6. Verify in the SPA
Open the SPA Inventory page (or query /api/inventory?app=kanade-client)
and confirm the target hosts report the new version.
Updating NATS server
Updating the broker on a managed host is the most interesting case because the agent talks to the broker over the broker — stopping NATS means losing the agent's connection mid-job. The machinery handles this with two mechanisms working together:
- Reconnect. The agent's NATS client reconnects automatically on broker restart. No human intervention needed.
- Outbox. Job results produced while the broker is down are queued under %ProgramData%\Kanade\outbox\ and replayed once the connection comes back. The result row reaches the backend as soon as the new NATS server is up.
So the flow looks the same as backend updates — the script Stops the service, swaps the binary, Starts it — and the agent transparently rides out the broker gap.
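The outbox half of that pair is a persist-first, publish-later queue. The sketch below shows the general pattern in Python; it is not the agent's implementation. The `Outbox` class, the one-JSON-file-per-result layout, and the `publish` callable that raises `ConnectionError` while the broker is down are all assumptions made for illustration.

```python
import json
import time
from pathlib import Path
from typing import Callable


class Outbox:
    def __init__(self, root: Path, publish: Callable[[bytes], None]):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.publish = publish  # raises ConnectionError while the broker is down
        self._seq = 0

    def send(self, result: dict) -> None:
        """Persist the result first, then try to flush the queue."""
        self._seq += 1
        name = f"{time.time_ns():020d}-{self._seq:06d}.json"
        (self.root / name).write_text(json.dumps(result))
        self.drain()

    def drain(self) -> int:
        """Publish queued results oldest-first; stop at the first failure."""
        sent = 0
        for path in sorted(self.root.glob("*.json")):
            try:
                self.publish(path.read_bytes())
            except ConnectionError:
                break  # broker still unreachable; keep the remaining files
            path.unlink()  # delete only after a successful publish
            sent += 1
        return sent
```

A reconnect handler would call `drain()` once the NATS client reports the connection is back. Deleting a file only after its publish succeeds means a crash mid-drain can resend a result but cannot drop one.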
Caveats specific to NATS updates
| Concern | Reality |
|---|---|
| Will the result row be lost? | No — outbox persists it across the broker outage and drains on reconnect. |
| Can I update from the SPA? | Yes, same as any job — kanade exec install-kanade-nats --pcs <broker-host>. |
| What if NATS doesn't come back up? | The result will sit in the outbox indefinitely. Operators should monitor outbox/ on the broker host as a leading indicator. |
| What if the new NATS version is incompatible (JetStream upgrade etc.)? | Roll a single canary first (--pcs <one-broker>), watch outbox + backend health, then roll out fleet-wide. The 5-min cache TTL for SPA queries means you'll see the canary's state within a few minutes. |
Manual install (bootstrap)
For the very first install — when there's no agent on the broker host yet — use the direct workflow:
.\scripts\build-release.ps1 -Roles nats # fetches nats-server.exe
# from github.com/nats-io/nats-server/releases
.\scripts\deploy\nats.ps1 -NatsToken '<token>'
This installs nats-server.exe to %ProgramFiles%\Kanade\ and
nats-server.conf to %ProgramData%\Kanade\config\ (with ACL
hardened to SYSTEM + Administrators because the bearer token
lives in plaintext), registers the KanadeNats Windows service,
opens TCP 4222 (broker) + 8222 (monitoring HTTP), and starts the
service.
Agent-mediated update (steady state)
Status: template-only. scripts/deploy/nats.ps1 doesn't ship $AgentSource* knobs yet; agent-mode is on the backlog. The shape below is what it WILL look like once the knobs land.
1. Build / fetch nats-server.exe
Either:
.\scripts\build-release.ps1 -Roles nats # fetches the binary
…or download it directly from github.com/nats-io/nats-server/releases.
2. Publish the binary
kanade app publish nats-server 2.10.20 .\nats-server.exe
3. Edit deploy-nats.ps1
Once the agent-mode knobs ship, the pattern matches deploy/backend.ps1:
$AgentSourceUrl = 'http://kanade-backend.example.com:8080'
$AgentSourceVersion = '2.10.20'
$AgentSourceSha256 = '<lowercase hex of nats-server.exe>'
$AgentSourceAuthToken = '<bearer for the backend HTTP API>'
4. Publish + register + exec
kanade script publish deploy-nats 2.10.20 .\deploy-nats.edited.ps1
kanade job create jobs\install-kanade-nats.yaml
kanade exec install-kanade-nats --pcs <broker-host>
The job manifest will look like:
id: install-kanade-nats
version: 2.10.20
execute:
  shell: powershell
  script_object: deploy-nats/2.10.20
  timeout: 300s
  run_as: system
require_approval: true
5. Verify
After the broker comes back, the outbox drains and you'll see the
result row in /api/results. Confirm the new NATS version via
the broker's monitoring endpoint:
curl http://<broker>:8222/varz | python -m json.tool | rg version
Why we don't need a separate "broker update" mechanism
Earlier designs considered a dedicated bootstrap channel (parallel NATS link the agent uses just for broker updates) to avoid the self-update-over-broker chicken-and-egg. The outbox + reconnect pair makes that unnecessary: the result is "merely delayed", not "lost". One transport, one mental model.
Updating kanade-agent itself
Agent self-update is the only component that doesn't use
OBJECT_APP_PACKAGES + a script_object job. It has dedicated
machinery because the agent has to swap its own running binary
without ssh — a tighter loop than the generic install jobs.
Mechanism
| Bucket / Key | Purpose |
|---|---|
| OBJECT_AGENT_RELEASES | Agent binaries, keyed by <version>. Separate from OBJECT_APP_PACKAGES so the rollout watcher only fires on agent updates. |
| agent_config.<scope>.target_version | The version each scope (global / group / pc) should be on. Watched by the agent's self_update loop. |
Flow:
1. agent.self_update watches agent_config for target_version
2. If target_version != my agent_version:
a. Pull `OBJECT_AGENT_RELEASES/<target_version>` to <exe>.new
b. Sha-verify against the bucket's recorded digest
c. Atomic swap: <exe> ← <exe>.new (via SCM stop/start)
d. New binary boots, watcher arms again, loop closes
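Steps 1 and 2 boil down to resolving one effective target_version from the layered config and comparing it with the running version. The Python sketch below shows that decision under loudly stated assumptions: the key names `groups.<g>.target_version` and `global.target_version` and the most-specific-scope-wins precedence (pc, then group, then global) are guesses for illustration; only the `pcs.<pc>.target_version` key form appears in these docs.

```python
from typing import Mapping, Optional, Sequence


def effective_target(
    kv: Mapping[str, str], pc: str, groups: Sequence[str]
) -> Optional[str]:
    """Return the target_version for this PC, checking the narrowest scope first."""
    keys = [f"pcs.{pc}.target_version"]
    keys += [f"groups.{g}.target_version" for g in groups]  # assumed key form
    keys.append("global.target_version")                    # assumed key form
    for key in keys:
        if key in kv:
            return kv[key]
    return None


def needs_update(current: str, target: Optional[str]) -> bool:
    """Step 2 of the flow: update only when a target exists and differs."""
    return target is not None and target != current
```

Note that the comparison is inequality, not "newer than", which is what makes rolling back to an older version work through the same path.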
The rollout watcher has to survive a cold broker (e.g. agent and
broker boot at the same time after a host reboot). Pre-#226 a
permanent Err(_) => return; on the first get_object_store
call killed the watcher forever; the agent would never self-update
on that boot. Post-#226 the watcher retries with backoff until the
broker is reachable.
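The difference between the two behaviours is easiest to see in code. This Python sketch shows the retrying shape only; the agent itself is not written in Python, and the exponential schedule, the 60-second cap, and the `retry_with_backoff` name are assumptions (the docs say only "retries with backoff").

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    connect: Callable[[], T],
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, doubling the wait after each failure."""
    delay = base
    while True:
        try:
            return connect()
        except ConnectionError:
            # Returning here instead of retrying would mirror the old
            # behaviour: one failure and the watcher is gone for this boot.
            sleep(delay)
            delay = min(delay * 2, cap)
```

Capping the delay keeps a long broker outage from stretching the gap between attempts indefinitely.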
Step-by-step
1. Build the agent
cargo build --release -p kanade-agent
Output: target/release/kanade-agent.exe.
2. Publish
kanade agent publish target/release/kanade-agent.exe
The CLI extracts the version from the PE VERSIONINFO resource — no
--version flag, no chance of a label / binary mismatch.
3. Roll out
Pick a scope. Start with one canary host:
kanade agent rollout 0.42.2 --pcs canary-01
Watch via ping:
kanade ping canary-01 # agent_version should flip to 0.42.2
# within a few seconds
If happy, widen:
kanade agent rollout 0.42.2 --groups office --jitter 5m
# or fleet-wide
kanade agent rollout 0.42.2 --global --jitter 30m
--jitter spreads the actual swap moment across a window so a wide fan-out doesn't hammer the OS service manager on every host at once. Recommended for fleets ≥ 100 hosts.
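Conceptually, jitter means each agent waits a random slice of the window before swapping. A minimal Python sketch of that idea follows; the uniform distribution, the `s` / `m` / `h` suffix parsing, and the function names are assumptions for illustration, since the docs show only values like 5m and 30m.

```python
import random
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_window(text: str) -> int:
    """Convert a duration such as '5m' or '30m' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def jitter_delay(window: str, rng: random.Random) -> float:
    """Pick this host's delay uniformly from [0, window] seconds."""
    return rng.uniform(0, parse_window(window))
```

With a 30m window and hosts drawing independently, the swaps spread out over the half hour rather than landing in the same second.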
4. Verify
kanade agent current
# → target_version = 0.42.2 (global)
Then a fleet-wide spot-check via the SPA Agents page (or
/api/agents): the agent_version column should converge to
the new version within jitter + ~30s heartbeat cadence.
What can go wrong
| Symptom | Cause | Fix |
|---|---|---|
| kanade agent rollout says "version not in OBJECT_AGENT_RELEASES" | Typo or wrong scope | Re-check with kanade agent current and kanade jetstream object list agent_releases. |
| kanade ping <host> still shows the old version after several minutes | Agent didn't self-update: either the watcher's dead (pre-#226 agent) or the host can't reach the broker | Check %ProgramData%\Kanade\log\agent.*.log on the target. If self_update is silent (no "checking target_version" log lines), the agent is too old; bootstrap manually with deploy-agent.ps1. |
| Agent flaps: starts, immediately exits with exit_code: 1 | The new binary is bad on this host (config drift, missing dep, etc.). SCM's failure-actions restart it and it crashes again, observable in Event Viewer as a Service Control Manager error cluster | Roll back: kanade agent rollout <prev-version> --pcs <host>. The host will swap back at the next watcher tick. |
Why a separate bucket / scope?
OBJECT_APP_PACKAGES is a generic blob store keyed by
<name>/<version>. The agent rollout pattern needs:
- A watcher that fires only on agent changes (cheap KV watch on one specific key, not a poll over a bucket of many names).
- A "current target" semantic per scope, not just "all known versions": agent_config.<scope>.target_version IS the answer to "what should I be running" without the agent enumerating.
- Operator UX (kanade agent publish / rollout) that's divergent enough from kanade app publish to warrant its own subcommand tree.
So agents get OBJECT_AGENT_RELEASES + a layered config KV; the
other components share OBJECT_APP_PACKAGES + per-app jobs.
Removing kanade from a host (undeploy)
Production rollback path. When a host needs to come off kanade — because a rollout broke something, the host is being decommissioned, or you just want a clean slate to re-install from — there's one undeploy script per component, mirroring the deploy script that put it there.
| Component | Deploy | Undeploy |
|---|---|---|
| Agent | scripts/deploy/agent.ps1 | scripts/undeploy/agent.ps1 |
| Backend | scripts/deploy/backend.ps1 | scripts/undeploy/backend.ps1 |
| NATS server | scripts/deploy/nats.ps1 | scripts/undeploy/nats.ps1 |
| Client (Tauri) | configs/jobs/installers/scripts/install-kanade-client.ps1 (agent-driven) | scripts/undeploy/client.ps1 |
All four are admin-only and idempotent — safe to re-run after a partial uninstall, safe to run when the component is already gone (each step logs "not present, skipping" and moves on).
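Idempotency here comes from each step checking for its target before acting. As a rough Python sketch of one such step (the undeploy scripts themselves are PowerShell; `remove_if_present` and its log wording are illustrative, apart from the "not present, skipping" message quoted above):

```python
from pathlib import Path
from typing import Callable


def remove_if_present(path: Path, log: Callable[[str], None] = print) -> bool:
    """Delete `path` if it exists; otherwise log and carry on."""
    if not path.exists():
        log(f"{path}: not present, skipping")
        return False
    path.unlink()
    log(f"{path}: removed")
    return True
```

Running the step twice is harmless: the second call finds nothing to do and reports it, which is what makes a re-run after a partial uninstall safe.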
Default posture: safe
Run with no flags and the script:
- Stops the Windows service.
- Unregisters it from SCM (waits for the entry to actually disappear, so a re-deploy doesn't race a pending removal).
- Removes the installed binary from %ProgramFiles%\Kanade\, including any half-completed <exe>.new / <exe>.old swap artefacts.
- Removes any inbound firewall rule the deploy script created (pass -KeepFirewall to skip this; useful when an external WAF / Group Policy owns the rule).
- Keeps everything under %ProgramData%\Kanade\ (config, logs, JetStream data, SQLite DB, …) so forensics / rollback / re-deploy can proceed without losing state.
- Keeps registry-stored secrets at HKLM:\SOFTWARE\kanade\<role>\*.
That's enough for the common case: "this host's kanade is misbehaving, get it off without destroying state".
-Purge: destructive cleanup
Adds:
- Removes the component's exclusive entries under %ProgramData%\Kanade\. Crucially, only the component's own files: agent / backend / NATS share the same root and each script avoids touching the others' files.
- Removes the matching HKLM:\SOFTWARE\kanade\<role>\* key (unless -KeepSecrets is also passed; useful when multiple components share the same bearer).
| Component | What -Purge removes |
|---|---|
| Agent | config\agent.toml, logs\agent.*.log, outbox\, HKLM:\SOFTWARE\kanade\agent\ |
| Backend | config\backend.toml, data\*.db* (SQLite — historical results / inventory wiped), logs\backend.*.log, HKLM:\SOFTWARE\kanade\backend\ |
| NATS | config\nats-server.conf, nats\ (JetStream — all KV / Object Store / streams wiped), logs\nats*.log |
| Client | Nothing extra (no per-user state yet) |
⚠️ undeploy-nats.ps1 -Purge and undeploy-backend.ps1 -Purge are the dangerous ones. The first wipes the fleet's entire JetStream state (agent_releases, app_packages, scripts, jobs, agent_config, results stream); the second wipes the projector's historical SQLite. Both are unrecoverable without out-of-band backups. The scripts print a loud banner before running.
Rollback recipes
Bad rollout on one canary host
# On the canary, as Admin:
.\scripts\undeploy\agent.ps1 # safe default
# kanade is now off the host. Re-deploy when ready:
.\scripts\deploy\agent.ps1 -SourceDir C:\path\to\prev-version
Decommission a host permanently
.\scripts\undeploy\agent.ps1 -Purge
Wipe a dev box for a clean re-install
.\scripts\undeploy\agent.ps1 -Purge
.\scripts\undeploy\backend.ps1 -Purge # ⚠️ SQLite gone
.\scripts\undeploy\nats.ps1 -Purge # ⚠️ JetStream gone
.\scripts\undeploy\client.ps1
# Now nothing about kanade exists on the box.
Rebuild a single bad service without touching state
.\scripts\undeploy\backend.ps1 # safe default: SQLite intact
.\scripts\deploy\backend.ps1 -Recreate # fresh service registration, same data
What undeploy does NOT do
- It doesn't notify the rest of the fleet that this host has gone away. The backend will keep listing it under "agents" until its heartbeat ages out (/api/agents staleness threshold). If you want it removed from the SPA immediately, delete the row via the backend API after undeploy.
- It doesn't roll back the deployed binary to a previous version. "Roll back" in this script's vocabulary means "remove entirely"; if you want to swap to an older version, re-run the matching deploy-*.ps1 against a folder containing the older binary.
- It doesn't touch NATS-side state when you remove the agent: the agent's target_version entry under agent_config.pcs.<pc> stays in the KV. Clean those up server-side with kanade jetstream kv del agent_config pcs.<pc>.target_version if needed.
Writing scripts for the agent
PowerShell scripts the agent will run are almost normal .ps1
files. This page collects the gotchas that aren't obvious from
the script source alone.
The agent stages scripts on disk and runs them via -File
As of PR #230 (agent version 0.42.0+), the agent:
- Writes your script body to a temp .ps1 under %ProgramData%\Kanade\agent-scripts\<UUID>\kanade-<UUID>.ps1 (Windows) or $TMPDIR/kanade-agent-<UUID>/kanade-<UUID>.ps1 (non-Windows dev only).
- Writes a launcher .ps1 next to it that sets UTF-8 console encoding then & '<your-script>' @args.
- Spawns powershell -NoProfile -NonInteractive -ExecutionPolicy Bypass -File <launcher>.
This means your script:
- Can have [CmdletBinding()] and param(...) at the top. The call-operator boundary in the launcher gives your script its own scope where those headers are valid.
- Should not rely on $PSCommandPath matching the operator's source path; it'll be the staged temp file.
- Should not write to $PSScriptRoot (see next section).
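To see the staging behaviour locally, the sketch below imitates it: it writes a script body with [CmdletBinding()] and param(...) to a temp .ps1 and invokes it through the call operator. It is an illustration only, not the agent's launcher; the directory name, the file name job.ps1, and the -Label parameter are made up for the demo, and it assumes the current session's execution policy allows running local scripts.

```powershell
# Demo only: imitate "stage to disk, then invoke with &". Names are hypothetical.
$stageDir = Join-Path ([System.IO.Path]::GetTempPath()) ("kanade-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $stageDir | Out-Null
$scriptPath = Join-Path $stageDir 'job.ps1'

# The script body an operator might ship. The header is valid because the
# body runs as a script file, not as a -Command expression.
$body = @'
[CmdletBinding()]
param(
    [string] $Label = 'default'
)
"label=$Label"
"root=$PSScriptRoot"
'@
Set-Content -LiteralPath $scriptPath -Value $body -Encoding UTF8

# Call-operator invocation, with arguments bound to the param() block.
$result = & $scriptPath -Label 'canary'

Remove-Item -LiteralPath $stageDir -Recurse -Force
$result
```

The second output line reports the staging directory rather than wherever the body was authored, which is why $PSCommandPath and $PSScriptRoot should not be treated as the operator's source location.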
Pre-0.42.0 agents used powershell -Command "<body>", which
parses the body as a command-line expression and rejects
[CmdletBinding()] as a syntax error. If you see
"Unexpected token '[CmdletBinding()]'" in stderr, the host's
agent is too old — upgrade it (see
agent self-update).
$PSScriptRoot is read-only for run_as: user
When run_as: user, the child process runs as the logged-in user,
not as the LocalSystem agent that wrote the staged file. The
staging directory inherits its ACL from %ProgramData%, which
grants users Read & Execute but not Modify. (system_gui still runs
as LocalSystem, only inside the user's session, so it keeps write
access; see the identity table.)
That means:
# OK from any run_as
Get-ChildItem $PSScriptRoot # list contents
Get-Content $PSScriptRoot\anything # read
# NG from run_as: user (access denied)
New-Item -Path $PSScriptRoot\out.txt
Set-Content -Path $PSScriptRoot\log.log
Write to $env:TEMP, $env:LOCALAPPDATA, or an absolute path
under the user's profile instead. Even for run_as: system (where
SYSTEM can write to its own staged dir), the directory is cleaned
up when the script exits, so writing siblings is fragile either
way.
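A minimal sketch of the recommended pattern follows: create a per-run scratch directory under the temp location, write there, and clean up yourself. The kanade-job- prefix and the job.log file name are illustrative assumptions, not agent conventions.

```powershell
# Use a per-run scratch directory instead of $PSScriptRoot.
# $env:TEMP is the usual choice on Windows; fall back to the .NET temp path.
$scratchRoot = if ($env:TEMP) { $env:TEMP } else { [System.IO.Path]::GetTempPath() }
$workDir = Join-Path $scratchRoot ("kanade-job-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $workDir | Out-Null

try {
    $logPath = Join-Path $workDir 'job.log'
    Set-Content -LiteralPath $logPath -Value 'started'
    $wrote = Test-Path -LiteralPath $logPath
}
finally {
    # finally runs on normal exit and on terminating errors,
    # but not if the process is killed from outside.
    Remove-Item -LiteralPath $workDir -Recurse -Force
}
```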
Identity table
| run_as: (manifest) | Child identity | Reads $PSScriptRoot | Writes $PSScriptRoot | Has admin |
|---|---|---|---|---|
| system (default) | LocalSystem | ✓ | ✓ but pointless (GC'd) | yes |
| user | Logged-in user | ✓ | ✗ access denied | no |
| system_gui | LocalSystem, in user session | ✓ | ✓ but pointless (GC'd) | yes |
system_gui is the "PsExec -i -s" pattern — admin privilege but
visible in the user's desktop session (useful for GUI tools that
need both elevation and an interactive window).
stdout vs Write-Host
The backend's result projector reads stdout as the script's
output. If your manifest has an inventory: block, stdout is
parsed as a single JSON blob.
Use Write-Host for progress chatter — it goes to the host
stream, NOT stdout, so it doesn't pollute the JSON parse.
Write-Host "Downloading..." # → host stream (logged but ignored by projector)
Write-Output ($obj | ConvertTo-Json) # → stdout (parsed)
Avoid Write-Output for chatter — anything on stdout that isn't
the expected JSON will fail the inventory parse.
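As an illustration, an inventory-style script might build one object and emit it once at the very end. This is a sketch under assumptions: the field names (hostname, ps_version, collected_at) are invented for the example and are not a schema the projector requires.

```powershell
Write-Host 'Collecting inventory...'   # progress chatter, kept off the success stream

# Build the whole result as one object so exactly one JSON document is emitted.
$inventory = [ordered]@{
    hostname     = [System.Environment]::MachineName
    ps_version   = $PSVersionTable.PSVersion.ToString()
    collected_at = (Get-Date).ToUniversalTime().ToString('o')
}

# Cmdlets such as New-Item return objects; discard them so they do not
# end up in the output next to the JSON.
$scratch = Join-Path ([System.IO.Path]::GetTempPath()) ("inv-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $scratch | Out-Null
Remove-Item -LiteralPath $scratch -Recurse -Force

# ConvertTo-Json serializes only 2 levels by default; raise -Depth for nested data.
$json = $inventory | ConvertTo-Json -Compress -Depth 5
Write-Output $json
```

Emitting the JSON as the last statement also makes stray output easier to spot when testing the script by hand.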
UTF-8 by default
The launcher sets [Console]::OutputEncoding = UTF-8 and
$OutputEncoding = UTF-8 before invoking your script, so any
stdout / stderr you produce is UTF-8 regardless of the host's
system codepage. Operator-shipped scripts with Japanese / DE /
KR / CN strings show up correctly in the SPA Activity view
without per-host workarounds.
If you explicitly need OEM / CP932 / Shift-JIS output (e.g.
calling a legacy CLI that ignores $OutputEncoding), set it
yourself in the script after the launcher prelude has run —
your assignment takes precedence.
Native command exit codes
If your script ends with a successful native command run, the
overall exit is 0 — that's PowerShell's default. If a native
command fails ($LASTEXITCODE -ne 0) and you DON'T handle it,
PowerShell still exits 0 — $ErrorActionPreference = 'Stop'
does not save you here.
Windows PowerShell 5.1 (the default on Windows endpoints, and what the agent's powershell.exe resolves to) treats native command non-zero exits as non-terminating regardless of $ErrorActionPreference. PowerShell 7.3+ adds $PSNativeCommandUseErrorActionPreference = $true (experimental in 7.3, stable from 7.4), which makes them terminating, but that's not available in the deployment target. Always check $LASTEXITCODE explicitly.
The agent does NOT auto-propagate $LASTEXITCODE either — that
would exit nonzero even when your script handled the native error
gracefully. If you want the script's exit code to reflect a
specific native call, propagate it yourself:
& git pull
if ($LASTEXITCODE -ne 0) { throw "git pull failed with exit code $LASTEXITCODE" }
# or, if you want the exact native code propagated:
if ($LASTEXITCODE -ne 0) { exit $LASTEXITCODE }
throw is usually preferable because it produces a clean
PowerShell error record (which the trap { … break } cleanup
pattern can intercept) and exits non-zero. exit $LASTEXITCODE
is right when the caller cares about the exact code.
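If a script makes many native calls, the check can be wrapped once. The helper below is a sketch, and Invoke-Native is a hypothetical name rather than something the agent provides; it runs a native command and throws when the exit code is non-zero.

```powershell
# Hypothetical helper: run a native command and turn a non-zero exit
# code into a terminating PowerShell error.
function Invoke-Native {
    param(
        [Parameter(Mandatory)] [string] $FilePath,
        [string[]] $ArgumentList = @()
    )

    # Splat the arguments so each array element is passed as its own argument.
    & $FilePath @ArgumentList

    if ($LASTEXITCODE -ne 0) {
        throw "$FilePath exited with code $LASTEXITCODE"
    }
}

# Example usage (assumes git is on PATH):
# Invoke-Native -FilePath 'git' -ArgumentList 'pull', '--ff-only'
```

The command's stdout still flows through the function, so wrap only calls whose output you either want on stdout or explicitly discard.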
Timeouts
The manifest's timeout: is enforced by the agent. When it
fires, the agent calls child.kill() on the PowerShell process
— no graceful shutdown, no trap, no finally. Plan for it:
- Budget the script to finish in timeout * 0.6 and leave headroom.
- Use trap { ... ; break } for cleanup of resources that need explicit release (staging dirs, lock files). Note that trap fires on terminating errors, NOT on the agent's kill, so don't rely on it for the timeout case.
- If you need cooperative cancellation, poll a sentinel file or a registry value and exit early. The agent has no way to send the script a graceful "wrap up" signal.
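The budgeting and sentinel-file ideas above can be combined into a loop that checks both before each unit of work. This is a sketch under assumptions: Test-ShouldStop is a hypothetical helper, the sentinel file name is invented, and $timeoutSeconds has to be kept in sync with the manifest's timeout: by hand because the script is not told that value here.

```powershell
# Hypothetical helper: true once the self-imposed deadline has passed
# or the sentinel file exists.
function Test-ShouldStop {
    param(
        [Parameter(Mandatory)] [datetime] $Deadline,
        [Parameter(Mandatory)] [string] $SentinelPath
    )
    ((Get-Date) -ge $Deadline) -or (Test-Path -LiteralPath $SentinelPath)
}

$timeoutSeconds = 300                                      # assumed to mirror the manifest
$deadline = (Get-Date).AddSeconds($timeoutSeconds * 0.6)   # stop well before the hard kill
$sentinel = Join-Path ([System.IO.Path]::GetTempPath()) 'kanade-job-cancel.flag'

$processed = 0
foreach ($item in 1..5) {
    if (Test-ShouldStop -Deadline $deadline -SentinelPath $sentinel) {
        Write-Host "Stopping early after $processed item(s)."
        break
    }
    # ... one small, bounded unit of work per iteration ...
    $processed++
}
```

The check only helps if each iteration is short; a single long-running call inside the loop will still be cut off by the hard kill.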
Killing a running job
kanade kill <exec_id> publishes a kill message the agent
subscribes to. On receipt, the agent calls child.kill() —
same hard-kill as the timeout path. Operators get an immediate
result row marked Killed with whatever stdout / stderr the
agent managed to capture before termination.
Developer Workflow and Contribution
This document outlines the standard workflows, lint/test requirements, and VCS branching guidelines for contributors working on the kanade codebase.
1. Quality Gates (Pre-Push & CI)
The local test suite must be green before you push or submit PRs. This matches the automated checks running on GitHub Actions.
# Run formatting check, clippy checks, target tests, and cargo lock checks
cargo make check
- FMT & Clippy: We maintain a strict zero-warning policy. Do not sprinkle #[allow(clippy::...)] unless there is a strong architectural justification.
- TDD (Test-Driven Development): Follow Kent Beck's TDD methodology. Write failing tests first to define the what, then implement the code to satisfy them.
2. Worktree Management with renri
For isolated feature development, we use renri to manage lightweight repository worktrees. This prevents staging pollution, keeps your main checkout clean, and allows you to switch tasks instantly.
Why renri?
In co-located Git and Jujutsu (jj) environments, managing worktrees manually can get complex. renri simplifies this by automatically wrapping VCS-specific worktree creation (favoring jj when configured) and cleanups.
Common Commands
# Create an isolated worktree (uses Jujutsu by default if present)
renri add feat/your-awesome-feature
# Force a Git-native worktree (bypassing jj)
renri --vcs git add feat/your-awesome-feature
# Clean up and delete a worktree after merging
renri remove feat/your-awesome-feature
# Garbage-collect and prune stale or broken worktrees
renri prune
Note: Worktree creation automatically invokes the cargo-make on-add hook to fetch remote refs and bootstrap APM configurations immediately.
3. Co-located Jujutsu (jj) & Git Workflow
Our development environment is configured with co-located Git and Jujutsu. We prefer jj for local version control due to its safe, conflict-free commit model.
Guidelines
- No Direct Push to main: All changes must land via a Pull Request.
- Branch/Bookmark Naming:
  - feat/... for new features.
  - fix/... for bugs.
  - chore/... for infrastructure, dependency bumps, or releases.
- Commit Messages: Write commit messages, PR titles, and bodies in English.
- Version Bumps: Release version bumps are managed exclusively via PRs on main and automated tagging pipelines. Never run git tag manually.
4. Documentation Policy
Documentation must stay in lock-step with code changes. Whenever you add or modify features:
- Update docstrings and comments explaining the why (avoid comments restating how).
- Update the relevant book pages (written in English under book/src/).
- Synchronize localization catalogs by running the translation template generator.
Spec (legacy single-page)
The full protocol / on-wire spec hasn't been migrated into the book yet. The authoritative source is the single-file version in the repo:
Splitting it into chapters under this section is a follow-up once the rest of the operator / developer guides settle.
Configuration Reference
kanade services rely on structured configurations loaded from TOML files, environment variables, or registry paths.
1. Agent Configuration
The agent searches for its configuration via the KANADE_AGENT_CONFIG environment variable or falls back to native paths.
Dev Configuration (configs/agent.dev.toml)
# Dev configuration schema
[agent]
id = "dev-pc"
nats_url = "nats://localhost:4223"
data_dir = "target/dev-data/agent"
[log]
level = "debug"
file = "target/dev-data/agent/logs/agent.log"
Configuration Parameters
| Field | Type | Description | Environment Override |
|---|---|---|---|
| agent.id | String | Unique hardware identifier (pc_id). | KANADE_DEV_AGENT_ID (templated) |
| agent.nats_url | String | Network address of the NATS broker. | KANADE_NATS_URL |
| agent.data_dir | Path | Root path to cache outbox scripts, state database, and local completions. | KANADE_AGENT_DATA_DIR |
| log.level | String | Logging verbosity (error, warn, info, debug, trace). | RUST_LOG |
| log.file | Path | Filepath destination for rolling logs. | - |
2. Backend Configuration
The backend reads its configuration from the file named by KANADE_BACKEND_CONFIG, or falls back to built-in defaults when that variable is unset.
Dev Configuration (configs/backend.dev.toml)
[backend]
listen_addr = "127.0.0.1:8081"
nats_url = "nats://localhost:4223"
database_url = "sqlite://target/dev-data/backend/state.db"
[auth]
# Auth settings
Configuration Parameters
| Field | Type | Description | Environment Override |
|---|---|---|---|
| backend.listen_addr | String | Network bind address for HTTP/WebSocket traffic. | KANADE_BIND_ADDR |
| backend.nats_url | String | Target NATS broker URL. | KANADE_NATS_URL |
| backend.database_url | String | SQLite database connection string. | DATABASE_URL |
| auth.disable | Boolean | Set to true to disable operator token validation (dev environment only). | KANADE_AUTH_DISABLE |
3. Windows Registry Integration
In production environments, security-sensitive tokens (like NATS client tokens and administrative API bearer tokens) are stored in ACL-protected Windows Registry keys rather than in plaintext files.
Key Paths
- Agent Settings: HKLM:\SOFTWARE\Kanade\agent
- Backend Settings: HKLM:\SOFTWARE\Kanade\backend
These registry paths are protected with local ACL configurations, allowing read permissions strictly to SYSTEM and designated operators.