wrkr scan

Synopsis

wrkr scan [--repo <owner/repo> | --org <org> | --github-org <org> | --path <dir> | --my-setup | --target <mode>:<value> ...] [--mode quick|governance|deep] [--progress auto|bar|plain|events|none] [--source-retention ephemeral|retain_for_resume|retain] [--deployment-mode local_only|customer_controlled_storage|connected_saas_metadata|managed_platform] [--allow-source-materialization] [--timeout <duration>] [--diff] [--enrich] [--baseline <path>] [--config <path>] [--state <path>] [--policy <path>] [--approved-tools <path>] [--production-targets <path>] [--production-targets-strict] [--profile baseline|standard|strict|assessment] [--github-api <url>] [--github-token <token>] [--report-md] [--report-md-path <path>] [--report-template exec|operator|audit|public|ciso|appsec|platform|customer-draft|agent-action-bom|design-partner-summary] [--report-share-profile internal|public|customer-redacted|design-partner|external-redacted|investor-safe] [--report-top <n>] [--sarif] [--sarif-path <path>] [--json] [--json-path <path>] [--resume] [--quiet] [--explain]

wrkr scan status --state <path> [--json]

Govern-first `action_paths` in scan JSON now carry additive policy-coverage fields (`policy_coverage_status`, `policy_refs`, `policy_missing_reasons`, `policy_confidence`), buyer-facing `control_state`, `risk_zone`, and `review_burden` fields, and optional `introduced_by` metadata. When available, `introduced_by` is derived from deterministic repo-local provenance first, with local git as the fallback.

Scan-time action_paths[*] are evidence-scoped. control_resolution_state and the canonical approval_evidence_state, owner_evidence_state, proof_evidence_state, runtime_evidence_state, target_evidence_state, and credential_evidence_state fields explain what Wrkr could verify, what was only declared or inferred, and what remained unknown in the scanned inputs. action_path_type keeps plain-source, CI/CD, automation-bot, AI-assisted, and agent-framework paths distinct so downstream reports do not overclaim agent behavior where the evidence only supports a broader action path.

When imported or declared enterprise evidence is present, action_paths[*] may also emit additive evidence_decisions[] and contradictions[]. These preserve the selected source, freshness state (fresh, stale, expired, unknown), rejected candidates, stable reason codes, and contradiction evidence refs instead of flattening everything into one winner string.

Start here with one of these first-value paths:

# Hosted org posture when prerequisites are ready
# Initialize the default hosted target first as described in docs/commands/init.md, then run:
wrkr scan --config ~/.wrkr/config.json --json

# Evaluator-safe fallback when hosted prerequisites are not ready yet
wrkr scan --path ./scenarios/wrkr/scan-mixed-org/repos --json

# Developer-machine hygiene
wrkr scan --my-setup --json

Use either one legacy target source (--repo, --org, --github-org, --path, or --my-setup) or one or more repeatable --target <mode>:<value> flags. Legacy target flags remain supported as one-entry shims and cannot be combined with --target in the same invocation. Supported --target modes are repo, org, path, and my_setup, plus the opt-in public-surface mode. For my_setup, use --target my_setup:local-machine. Use --target public-surface:<manifest-path> when you want a public-evidence-only assessment from a structured local manifest instead of a private repo scan.
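
The target-selection rules above can be sketched as a small validator. This is an illustrative Python sketch under stated assumptions, not Wrkr's implementation: validate_targets, LEGACY_TO_MODE, and the argument shapes are hypothetical names introduced here, and the sketch models only the rules stated in this section.

```python
# Hypothetical sketch of the documented target-selection rules; not Wrkr's code.
SUPPORTED_MODES = {"repo", "org", "path", "my_setup", "public-surface"}

# Legacy flags act as one-entry shims; --github-org is an alias for --org.
LEGACY_TO_MODE = {
    "--repo": "repo",
    "--org": "org",
    "--github-org": "org",
    "--path": "path",
    "--my-setup": "my_setup",
}


def validate_targets(legacy_flags, target_flags):
    """Return normalized (mode, value) pairs or raise ValueError.

    legacy_flags: dict of legacy flag -> value, e.g. {"--org": "acme"}.
    target_flags: list of raw --target values, e.g. ["org:acme"].
    An empty result means no explicit target was given.
    """
    if legacy_flags and target_flags:
        raise ValueError("legacy target flags cannot be combined with --target")
    if len(legacy_flags) > 1:
        raise ValueError("use only one legacy target source")

    parsed = [(LEGACY_TO_MODE[flag], value) for flag, value in legacy_flags.items()]
    for raw in target_flags:
        # Split on the first colon only, so values may themselves contain colons.
        mode, sep, value = raw.partition(":")
        if not sep or not value or mode not in SUPPORTED_MODES:
            raise ValueError(f"invalid --target value: {raw!r}")
        parsed.append((mode, value))
    return parsed
```

Passing both a legacy flag and a --target value raises ValueError, mirroring the documented restriction that the two styles cannot be combined in one invocation.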

Acquisition behavior is fail-closed by target:

--path runs fully local/offline.
--path supports two deterministic interpretations:
- repo_root: scan the selected directory itself as one repo when it carries a strong repo-root signal such as .git, or when weak root signals are present without multiple child repo roots.
- repo_set: scan immediate non-hidden child repos when the selected directory is a bundle root, and discover nested owner/repo layouts up to a bounded depth when immediate children are namespace folders.
repo_set child repos are enumerated in deterministic lexical order by repo name. Child repos without tool markers are still included when sibling repos have markers so detector-level permission and symlink diagnostics remain visible.
--my-setup runs fully local/offline against the local machine setup rooted at the current user home directory. It inspects supported user-home tool configs, selected environment key names, and common workspace roots for local agent project markers without emitting raw secret values.
--repo and --org require real GitHub acquisition via --github-api, config github_api_base, or WRKR_GITHUB_API_BASE.
--target public-surface:<manifest-path> is explicit and local-input-only. It loads a structured manifest of public repos, docs, SDKs, engineering blogs, release notes, status pages, or public workflows; it does not scrape the internet or infer private runtime/control proof from public marketing claims.
Hosted GitHub materialization is sparse by default: Wrkr fetches detector-relevant files such as agent instructions, MCP/Codex/Cursor/Claude configs, skills, workflows, policy files, dependency manifests, and AI/MCP declaration surfaces instead of every repository blob.
--deployment-mode is explicit metadata for how scan-derived artifacts should describe the customer data boundary. Supported values are local_only, customer_controlled_storage, connected_saas_metadata, and managed_platform. The default is local_only.
--deployment-mode does not enable network calls, hosted uploads, or source retention by itself. It only labels the resulting machine-readable artifacts and source-privacy contract.
If a repo already contains deterministic provenance sidecars under .wrkr/provenance/, Wrkr can project PR-level introduced_by metadata from source-metadata.json, github-event.json, or gitlab-event.json without live provider calls.
If a repo contains .wrkr/provenance/external-control-evidence.json, Wrkr can also project local ownership, approval, branch-protection, protected-environment, required-check, security-gate, freeze-window, and kill-switch evidence into govern-first path posture without live provider calls.
If a repo contains wrkr-control-declarations.yaml or .wrkr/control-declarations.yaml, Wrkr loads versioned customer declarations for owner mappings, target classes, non-production declarations, and control evidence links as local declared evidence only.
Invalid control declarations fail closed with policy_schema_violation (exit 3) instead of being ignored.
Repo-local Gait policy controls.deployment_constraints[] declarations are treated as declared control evidence for branch, environment, approval, required-check, freeze-window, kill-switch, and security-gate context when present.
Hosted scans do not fetch broad source-code extensions by default. Use --mode deep or --allow-source-materialization only when you explicitly want generic source files such as .go, .py, .js, or .ts to be materialized for deeper static detector coverage.
Hosted GitHub API base resolution order is: --github-api, config github_api_base, then WRKR_GITHUB_API_BASE.
Hosted GitHub token resolution order is: --github-token, config auth.scan.token, WRKR_GITHUB_TOKEN, then GITHUB_TOKEN.
--github-org is an additive alias for --org.
Explicit multi-target scans set target.mode=multi and add deterministic targets[] arrays to the top-level scan payload, saved state snapshot, and source_manifest.
--repo and --org materialize the required hosted files into a deterministic local workspace under the scan state directory before detectors run.
Hosted materialized source retention defaults to --source-retention ephemeral: Wrkr removes the managed materialized root after scan artifacts are committed, and it also cleans up failed runs unless retention is explicitly requested. Use retain_for_resume to preserve materialized files after a failed/interrupted run for resume, or retain to keep them after success. Both modes leave private repository contents on disk and should be used deliberately.
Hosted scan artifacts emit source_privacy with retention_mode, additive deployment_mode, materialized_source_retained, raw_source_in_artifacts=false, serialized_locations, cleanup_status, and optional warnings.
Materialized workspace root (materialized-sources/) is ownership-gated:
- Wrkr-managed roots include marker .wrkr-materialized-sources-managed with state-bound provenance, not just a static marker body.
- Non-empty roots without a valid marker are blocked (no recursive cleanup).
- Marker must be a regular file with valid state-bound marker payload; symlink/directory/legacy-static/invalid marker content is blocked.
- On --resume, previously materialized repo directories and checkpoint files must also be regular in-root artifacts; symlink-swapped repo roots or checkpoint files are blocked.
- Ownership violations return unsafe_operation_blocked (exit 8).
When GitHub acquisition is unavailable, scan returns dependency_missing with exit code 7 (no synthetic repos are emitted).
--state defaults to .wrkr/last-scan.json, with manifest/proof artifacts written alongside it.
Existing --state files must be regular files; symlinked --state inputs fail closed with unsafe_operation_blocked (exit 8) before any managed artifact mutation.
Scan-owned managed artifacts are published transactionally: state snapshot, lifecycle chain, proof chain/attestation, manifest, and any requested --json-path, --report-md-path, or --sarif-path sidecars commit as one generation.
Scan status is written as a deterministic sidecar next to --state and can be inspected with wrkr scan status --state <path> --json without rescanning.
Invalid scan-owned artifact paths such as --report-md-path and --sarif-path are preflight-validated before any managed artifact mutation.
--json-path, --report-md-path, and --sarif-path must stay unique from one another and from Wrkr-managed artifacts derived from --state; collisions fail closed with invalid_input (exit 6) before any scan-managed artifact is written.
Late write failures after preflight still fail closed and roll managed artifacts back to the previous committed generation instead of leaving mixed state/proof/manifest outputs behind.
For --path scans, detector file reads stay bounded to the selected repo root. Root-escaping symlinked config, env, workflow, and MCP files are rejected with deterministic parse_error.kind=unsafe_path diagnostics instead of being read.
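
The hosted GitHub API base and token resolution orders listed above can be sketched as a first-non-empty lookup. This is a minimal Python sketch, not Wrkr's implementation: the function names are hypothetical, and it assumes the config file is JSON in which auth.scan.token maps to nested objects and github_api_base is a top-level key.

```python
import os


def first_non_empty(*candidates):
    """Return the first candidate that is set and non-empty, else None."""
    for candidate in candidates:
        if candidate:
            return candidate
    return None


def resolve_github_api_base(flag_value=None, config=None, env=None):
    # Documented order: --github-api, config github_api_base, WRKR_GITHUB_API_BASE.
    config = config or {}
    env = os.environ if env is None else env
    return first_non_empty(
        flag_value,
        config.get("github_api_base"),
        env.get("WRKR_GITHUB_API_BASE"),
    )


def resolve_github_token(flag_value=None, config=None, env=None):
    # Documented order: --github-token, config auth.scan.token,
    # WRKR_GITHUB_TOKEN, then GITHUB_TOKEN.
    config = config or {}
    env = os.environ if env is None else env
    # Assumption: auth.scan.token is stored as nested JSON objects.
    config_token = config.get("auth", {}).get("scan", {}).get("token")
    return first_non_empty(
        flag_value,
        config_token,
        env.get("WRKR_GITHUB_TOKEN"),
        env.get("GITHUB_TOKEN"),
    )
```

A None result for the API base corresponds to the case where hosted acquisition has no configured endpoint, which the section above describes as a dependency_missing failure.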

Scan mode behavior is explicit:

--mode governance is the default enterprise posture. It emits the versioned control_backlog, keeps raw findings for compatibility, and reports generated/package-manager noise in scan_quality.
--mode quick runs the highest-signal governance detectors for coding assistant configs, MCP, skills, CI automation, secret references, and policy files.
--mode deep runs the full detector set and marks scan_quality.mode=deep; scan_quality.detectors[*] then distinguishes clean coverage from partial, reduced, or blocked detector health instead of forcing you to infer confidence from raw findings alone.
Invalid mode values fail closed with invalid_input (exit 6) and the normal JSON error envelope in --json mode.
--diff requires the previous saved snapshot and current scan to use the same recorded scan mode. A mode mismatch fails closed with invalid_input (exit 6) instead of reporting synthetic drift caused by quick/governance/deep scope differences.
When no target is provided and no usable config default target exists, scan --json fails closed with exit 6, error.code=invalid_input, and additive error.next_steps[] guidance for hosted org setup, the evaluator-safe scenario fallback, and --my-setup.
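
A wrapper script can combine the exit codes and the JSON error envelope described above. The following Python sketch is an assumption-laden illustration, not a Wrkr API: summarize_failure is a hypothetical helper, the envelope is assumed to be a JSON object with an error object carrying code and next_steps, and only the exit codes named in this document are modeled.

```python
import json

# Exit codes named in this document; any other code is left unclassified.
DOCUMENTED_EXIT_CODES = {
    3: "policy_schema_violation",
    6: "invalid_input",
    7: "dependency_missing",
    8: "unsafe_operation_blocked",
}


def summarize_failure(exit_code, envelope_text):
    """Summarize a failed scan from its exit code and captured JSON envelope."""
    expected = DOCUMENTED_EXIT_CODES.get(exit_code)
    try:
        payload = json.loads(envelope_text)
    except json.JSONDecodeError:
        payload = {}
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    return {
        "exit_code": exit_code,
        # Prefer the envelope's code; fall back to the documented exit-code name.
        "code": error.get("code", expected),
        "code_matches_exit": expected is not None and error.get("code") == expected,
        # next_steps entries are treated as opaque values.
        "next_steps": list(error.get("next_steps") or []),
    }
```

Treating next_steps entries as opaque keeps the wrapper tolerant of additive changes to the guidance format.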

Flags

--json
--json-path
--resume
--explain
--quiet
--progress
--repo
--org
--github-org
--path
--my-setup
--target
--mode
--source-retention
--deployment-mode
--allow-source-materialization
--timeout
--diff
--enrich
--baseline
--config
--state
--policy
--approved-tools
--production-targets
--production-targets-strict
--profile
--github-api
--github-token
--report-md
--report-md-path
--report-template
--report-share-profile
--report-top
--sarif
--sarif-path

Status inspection

wrkr scan status --state ./.wrkr/last-scan.json --json

The status payload includes status, current_phase, last_successful_phase, repo counts, partial_result, partial_result_marker, phase timings, artifact paths, and source_privacy when scan state includes source-retention metadata. Completed scans that hit non-fatal source acquisition failures keep partial_result=true in status JSON until the same target is rerun cleanly. During active, interrupted, or completed-partial scans, additive progress fields may also include progress_percent, progress_message, last_progress_at, elapsed_seconds, phase_progress, repo_progress, and detector_progress. When present, repo_progress.completed counts repos that reached a terminal source-acquisition result, repo_progress.succeeded isolates successful materializations, and repo_progress.pending stays total - completed so failed repos are counted once. Existing state files without a status sidecar are interpreted as completed when the state snapshot can be loaded, otherwise unknown.
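
The repo counters described above can be read with a few lines of code. This is a hedged Python sketch rather than an official client: summarize_repo_progress is a hypothetical helper, and it assumes repo_progress carries a total field alongside the documented completed and succeeded counters.

```python
def summarize_repo_progress(status):
    """Derive pending and failed repo counts from a status payload dict."""
    progress = status.get("repo_progress") or {}
    total = progress.get("total", 0)  # assumption: the field is named "total"
    completed = progress.get("completed", 0)
    succeeded = progress.get("succeeded", 0)
    return {
        # Documented: pending stays total - completed.
        "pending": total - completed,
        # completed counts terminal results, so the remainder are failures.
        "failed": completed - succeeded,
        # partial_result=true persists until a clean rerun of the same target.
        "needs_rerun": bool(status.get("partial_result")),
    }
```

Because completed counts every terminal result, a failed repo reduces pending once and shows up only in the failed count.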

Developer personal-hygiene example

wrkr scan --my-setup --json

This local/offline mode inventories supported user-home tool configs, selected environment key presence, and local agent project markers. Use it when a developer wants to answer "what AI tooling is already on this machine?" before widening to the org workflow. Environment-key presence and source bookkeeping stay in findings/risk output only; they do not become lifecycle identities, manifest identities, inventory agents, or regress tools. For the current minimum-now launch posture, security/platform teams should start with the org example below; --my-setup remains the secondary local-machine path.

Security-team org example

wrkr scan --github-org acme --github-api https://api.github.com --json --json-path ./.wrkr/scan.json

--github-org is the additive alias for --org. Use it when security or platform teams need the deterministic saved-state input for wrkr report, wrkr evidence, wrkr mcp-list, or wrkr inventory --diff. Private repos and public API rate-limit avoidance usually require a GitHub token even when --github-api is set. If you already configured the hosted source and target with wrkr init, you can reuse them:

wrkr init --org acme --github-api https://api.github.com --json
wrkr scan --config ~/.wrkr/config.json --json

Wrkr's hosted connector currently calls these GitHub REST endpoints:

GET /orgs/{org}/repos?per_page=100&page=N
GET /repos/{owner}/{repo}
GET /repos/{owner}/{repo}/git/trees/{default_branch}?recursive=1
GET /repos/{owner}/{repo}/git/blobs/{sha}

Fine-grained PAT guidance for the selected repositories:

repository metadata: read-only
repository contents: read-only

Opinionated large-org command path:

wrkr scan --github-org acme --github-api https://api.github.com --state ./.wrkr/last-scan.json --timeout 30m --json --json-path ./.wrkr/scan.json --report-md --report-md-path ./.wrkr/scan-summary.md --sarif --sarif-path ./.wrkr/wrkr.sarif

Wrkr now exposes one explicit progress contract through --progress auto|bar|plain|events|none.

auto is the default.
auto uses events for --json scans so stdout stays reserved for the final JSON payload while progress stays on stderr.
auto uses a single updating bar on interactive terminals when Wrkr can safely render it.
auto degrades to plain newline-delimited progress on non-TTY stderr targets or conservative terminals.
events preserves machine-oriented progress target=... event=... stderr lines for automation and log parsers.
plain emits stable human-readable progress lines to stderr without terminal control characters.
bar requests the interactive updating bar explicitly; when stderr cannot safely render it, Wrkr falls back to plain and explains the fallback on stderr.
none disables progress output without muting errors.
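
The mode selection above can be summarized as a small decision function. This Python sketch is illustrative only: resolve_progress_mode and its parameters are hypothetical, it assumes --json takes precedence over terminal detection under auto, and it models --quiet as equivalent to none.

```python
EXPLICIT_MODES = {"bar", "plain", "events", "none"}


def resolve_progress_mode(requested="auto", json_mode=False,
                          stderr_is_tty=False, bar_safe=True, quiet=False):
    """Return the effective progress mode for one scan invocation."""
    if quiet:
        # --quiet overrides --progress; modeled here as "none".
        return "none"
    can_render_bar = stderr_is_tty and bar_safe
    if requested == "auto":
        if json_mode:
            return "events"  # keep stdout reserved for the final JSON payload
        return "bar" if can_render_bar else "plain"
    if requested not in EXPLICIT_MODES:
        raise ValueError(f"unsupported progress mode: {requested!r}")
    if requested == "bar" and not can_render_bar:
        return "plain"  # documented fallback when the bar cannot render safely
    return requested
```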

--quiet overrides --progress and suppresses all non-error progress output.

The saved-state report hooks can now target design-partner-summary and the expanded design-partner, external-redacted, and investor-safe share profiles when a scan wants to publish a buyer-facing static action summary directly from the scan flow.

When --json is set, Wrkr keeps stdout reserved for the final JSON payload and emits progress to stderr only. Existing event-style progress for hosted org and local path scans remains available by default through --progress auto. Event-mode progress includes retry, cooldown, resume, per-repo materialization completion/discovery, detector lifecycle detail, heartbeat updates, scan phase transitions, completion, and final footer lines. For hosted scans, repo_materialize means a repo job was dispatched to a worker and repo_materialize_done means that repo reached a success or failure result. For path scans, repo_discovered means a local repo root was selected for detector execution. --json-path writes the same final JSON payload to disk, and --json --json-path emits byte-identical payload bytes to both stdout and the selected file. Any requested --json-path, --report-md-path, or --sarif-path must be unique from one another and from scan-managed --state sibling artifacts.
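
Log parsers that consume event-mode progress from stderr can be sketched as follows. This Python example is an assumption, not a documented format guarantee: parse_progress_line is a hypothetical helper, and it assumes each event line starts with the word progress followed by whitespace-separated key=value tokens without quoting.

```python
def parse_progress_line(line):
    """Parse one event-style progress line into a dict, or return None."""
    tokens = line.strip().split()
    if not tokens or tokens[0] != "progress":
        return None  # not a progress event line
    fields = {}
    for token in tokens[1:]:
        # Split on the first "=" only, so values may contain "=" or ":".
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def count_terminal_repos(stderr_lines):
    """Count repo_materialize_done events, i.e. repos with a terminal result."""
    events = (parse_progress_line(line) for line in stderr_lines)
    return sum(1 for e in events if e and e.get("event") == "repo_materialize_done")
```

Counting repo_materialize_done rather than repo_materialize follows the distinction above between a dispatched repo job and one that reached a success or failure result.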

Long-running source acquisition, detector execution, analysis, and artifact commit phases emit heartbeats with elapsed time so operators can distinguish a slow scan from a stuck scan. The reported percent is an operator UX estimate only. It is additive progress metadata and is not consumed by risk scoring, proof emission, compliance mapping, regress baselines, or policy decisions.

For CI or log-stable automation, prefer --progress none when you want no progress stderr, or --progress events when you want deterministic machine-readable liveness. --quiet is stronger and suppresses progress output entirely.

--resume is supported only when every requested target is an org target. Wrkr stores internal checkpoint metadata under the scan-state directory in org-checkpoints/ and reuses already-materialized repositories only when the checkpoint target set, per-org repo sets, and materialized-root path still match the current org-target scan. Resume also revalidates that checkpoint files and reused repo roots are still trusted local artifacts under the managed materialized root; symlink-swapped entries fail closed as unsafe_operation_blocked. Default successful hosted scans remove that managed root, so resume from retained materialized source requires an explicit retention mode such as --source-retention retain for completed runs or retain_for_resume for failed/interrupted runs. Mixed target sets such as org-plus-path scans fail closed with invalid_input when --resume is requested.

If a run is interrupted after some repositories are checkpointed, rerun the same target with --resume and keep the same --state path. Use wrkr scan status --state <path> --json to inspect the last successful phase, partial marker, and repo counters before rerunning. If partial_result, source_errors, or source_degraded is present, treat the scan as incomplete and rerun after the blocking condition is resolved.
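
The resume precondition and the completeness check above can be expressed as two small predicates. This is a minimal Python sketch with hypothetical helper names; it assumes targets are already normalized to (mode, value) pairs and that a scan payload is a parsed JSON object.

```python
COMPLETENESS_MARKERS = ("partial_result", "source_errors", "source_degraded")


def can_resume(targets):
    """--resume is supported only when every requested target is an org target."""
    return bool(targets) and all(mode == "org" for mode, _value in targets)


def is_incomplete(payload):
    """Treat a scan as incomplete when any completeness marker is set.

    Truthiness is used so an explicit partial_result=false or an empty
    list does not count as a marker being present.
    """
    return any(payload.get(marker) for marker in COMPLETENESS_MARKERS)
```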

For long org scans, run the foreground command under your process supervisor or shell backgrounding rather than relying on a hidden daemon:

nohup wrkr scan --github-org acme --github-api https://api.github.com --state ./.wrkr/last-scan.json --json --json-path ./.wrkr/scan.json > ./.wrkr/scan.stdout 2> ./.wrkr/scan.stderr &
wrkr scan status --state ./.wrkr/last-scan.json --json

Mixed target and public-surface examples:

wrkr scan --target org:acme --target path:./repos --github-api https://api.github.com --json
wrkr scan --target public-surface:./docs/examples/public-surface-assessment.v1.yaml --json

Repo/path example

wrkr scan --path ./scenarios/wrkr/scan-mixed-org/repos --profile assessment --report-md --report-md-path ./.tmp/scan-summary.md --report-template operator --json

This is the canonical repo_set example for --path: the selected directory is a bundle of immediate child repos, so Wrkr preserves per-child repo manifests and deterministic ordering instead of collapsing the bundle into one repo.

Expected JSON keys include status, additive deployment_mode, target, source_privacy, scan_mode, scan_quality, findings, additive control_backlog, ranked_findings, top_findings, attack_paths, top_attack_paths, additive action_paths, additive action_path_to_control_first, inventory, privilege_budget, agent_privilege_map, repo_exposure_summaries, profile, posture_score, compliance_summary, additive activation, and optional report when summary output is requested.
Top-level deployment_mode and source_privacy.deployment_mode stay aligned.
local_only is the default when no alternate mode is explicitly selected.
Public-surface scans also populate source_manifest.public_evidence_manifest_name and additive source_manifest.public_evidence[] rows so downstream reports can preserve public observed facts, inferred context, unsupported public claims, and explicit private-evidence gaps without re-reading the input manifest.
Explicit multi-target runs also emit additive targets[] arrays at the top level and inside source_manifest, and saved state snapshots preserve the same additive targets[] contract.
control_backlog.control_backlog_version is the stable backlog schema version.
control_backlog.items[*] includes repo, path, control surface/path type, capability, additive write_path_classes, additive governance_controls, owner/source/status, evidence source/basis, approval status, governance security visibility, queue (control_first|review_queue|inventory_hygiene|debug_only), finding visibility (primary|appendix|debug), recommended action (attach_evidence|approve|remediate|downgrade|deprecate|exclude|monitor|inventory_review|suppress|debug_only), concrete remediation text, confidence (high|medium|low), additive action-path-backed confidence_lane / confidence_lane_reasons when a backlog row is linked to a govern-first path, evidence gaps, confidence-raising guidance, SLA, closure criteria, optional secret signal types, and linked raw finding IDs.
Raw findings remain the compatibility evidence surface.
Governance backlog visibility uses known_approved, known_unapproved, unknown_to_security, accepted_risk, deprecated, revoked, and needs_review.
Legacy inventory compatibility fields may still emit or accept approved for existing consumers, while new backlog items map that state to known_approved.
scan_quality.scan_quality_version is the stable scan-quality schema version.
In governance mode, generated/package-manager surfaces such as node_modules/, dist/, build/, nested generated SDK folders, .yarn/sdks/, VitePress cache/dist assets, minified JavaScript, and other package internals are reported as scan-quality context instead of active backlog items.
Parser diagnostics in scan_quality.parse_errors[*] include deterministic recommended_action values such as suppress for generated/package noise and debug_only for non-generated diagnostics.
scan_quality.detectors[*] reports per-repo detector health with deterministic status (complete|partial|reduced|blocked) plus attempted/parsed/partial/suppressed/failure counts and coverage_reasons.
Use these rows to judge whether a negative MCP/WebMCP result is trustworthy or whether reduced coverage should keep the repo in a review/debug path.
For local-machine scans, target.mode is my_setup.
When target.mode=my_setup, activation.items projects concrete local tool, MCP, secret, and parse-error signals first without mutating the raw top_findings ranking.
Policy-only items remain available in ranked_findings / top_findings.
When target.mode=org, target.mode=path, or target.mode=multi, activation.items projects govern-first candidate paths from the saved privilege map and adds item_class values such as production_target_backed, unknown_to_security_write_path, approval_gap_path, and govern_first_candidate.
action_paths[*] combines path identity, write capability, additive write_path_classes, additive action_classes, additive action_reasons, additive mutable_endpoint_semantics[], additive governance_controls, approval gap, security visibility, credential/deployment posture, delivery-chain metadata (pull_request_write, merge_execute, deploy_write, delivery_chain_status), additive workflow trigger posture (workflow_trigger_class such as scheduled, workflow_dispatch, or deploy_pipeline), production-target truth (production_target_status, production_write), additive execution-identity fields (execution_identity, execution_identity_type, execution_identity_source, execution_identity_status, execution_identity_rationale), additive standing-authority fields (standing_privilege, standing_privilege_reasons), additive credentials[], additive credential_authority, additive purpose/version/config metadata (purpose, purpose_source, purpose_confidence, version, version_source, config_fingerprint, config_source), additive action_lineage.segments[], path_context, tool_family_id, and tool_instance_id, additive buyer-lane fields (confidence_lane, confidence_lane_reasons), path-linked attack_path_score, labeled govern-first dimensions (inventory_risk, control_priority, risk_tier), additive join refs (attack_path_refs, source_finding_keys), and a stable recommended_action enum of inventory|approval|proof|control.
Purpose metadata prefers explicit repo-local wrkr:purpose annotations when present, then falls back to structured workflow, MCP, script, symbol, and location evidence.
action_paths[*].path_id is an opaque deterministic identifier currently emitted in apc-<hex> form. Treat it as a stable join key only; do not parse business meaning from its string format.
action_path_to_control_first exposes one prioritized path plus deterministic summary counts (total_paths, write_capable_paths, production_target_backed_paths, govern_first_paths) without removing the legacy attack_paths surfaces.
action_path_to_control_first.summary.empty_state_status and empty_state_reasons are additive metadata explaining whether the current govern-first path set supports a clean buyer-facing empty state, blocks it, or downgrades it because detector coverage was reduced.
--profile assessment narrows govern-first surfaces such as action_paths, action_path_to_control_first, activation, and report summaries for sample/test/vendor-style noise while leaving raw findings, proof output, and exit codes unchanged.
warnings is included when Wrkr can prove posture may be incomplete even though the scan succeeded, for example when known MCP-bearing declaration files failed to parse.
detector_errors is included when non-fatal detector failures occur and partial scan results are preserved.
partial_result, source_errors, and source_degraded are included when source acquisition/materialization has non-fatal failures.
When filesystem permission or stat failures prevent full detector coverage, detector_errors[*].code stays explicit (permission_denied, path_not_found) and scan_quality.detectors[*].status degrades to blocked or reduced instead of quietly presenting a clean negative result.
Downstream wrkr campaign aggregate treats these completeness markers as fail-closed input signals and rejects such artifacts instead of producing a campaign summary from incomplete scans.
sarif.path is included when --sarif output is requested.
compliance_summary.frameworks[*].controls[*] emits deterministic framework/control rollups with mapped_rule_ids, finding_count, and proof-derived coverage status.
inventory.methodology emits machine-readable scan metadata (wrkr_version, timing, repo/file counts, detector inventory).
inventory.agents is always present (possibly empty) and is deterministically sorted by org/framework/instance/location; agent entries may include additive symbol, security_visibility_status, and location_range when parser metadata is available.

Source coverage remains intentionally scoped:

supported framework-native parsing covers LangChain, CrewAI, OpenAI Agents, AutoGen, LlamaIndex, and MCP-client patterns
conservative custom-agent scaffolds come from .wrkr/agents/custom-agent.{yaml,yml,json,toml}
explicit bespoke custom-source coverage uses wrkr:custom-agent annotations in Python or JS/TS source files

ranked_findings[*] and attack_paths[*] now include deterministic agent-aware amplification and edge rationale when agent declarations expose deployment, delegation, dynamic discovery, or bound tool/data/auth/deploy chains.
inventory.tools[*] includes deterministic approval_classification (approved|unapproved|unknown), and inventory.approval_summary emits aggregate approval-gap ratios for campaign/report workflows.
inventory.tools[*], inventory.agents[*], and agent_privilege_map[*] also emit additive security_visibility_status without overloading approval_classification.
Existing readers should continue to accept approved|known_unapproved|unknown_to_security; governance additions may also surface known_approved, accepted_risk, deprecated, revoked, and needs_review where lifecycle or approval evidence supports those states.
inventory.tools[*], agent_privilege_map[*], control_backlog.items[*], and action_paths[*] may emit additive write_path_classes such as read, write, pr_write, repo_write, release_write, package_publish, deploy_write, infra_write, secret_bearing_execution, and production_adjacent_write.
agent_privilege_map[*], inventory.tools[*], and action_paths[*] also emit additive static endpoint classification via mutable_endpoint_semantics[] (read, write, delete, deploy, refund, payment, user_admin, data_export, production_mutation) with deterministic confidence, surface, operation, and evidence refs.
action_paths[*] also carries additive target_class / target_class_reasons / target_class_evidence_refs plus additive action_path_type / action_path_type_reasons / action_path_type_evidence_refs so downstream reports can distinguish production-impacting, release-adjacent, customer-data-adjacent, internal-tooling, developer-productivity, sandbox, and unknown targets, while only using agent-specific language when the path type is actually agentic.
These fields are declaration-only; they do not claim live reachability or runtime observation.
agent_privilege_map[*] and action_paths[*] also emit additive credential classification fields credential_kind, access_type, standing_access, likely_jit, evidence_location, and classification_reasons, plus additive normalized credential_authority posture, purpose/version/config metadata, action_lineage, additive action_classes, action_reasons, and standing_privilege_reasons.
governance_controls[*] maps review evidence for owner_assigned, approval_recorded, least_privilege_verified, rotation_evidence_attached, deployment_gate_present, production_access_classified, proof_artifact_generated, and review_cadence_set; each control reports satisfied, gap, or not_applicable with deterministic evidence/gap reasons.
Workflow-backed findings may emit additive first-class workflow capabilities such as repo.write, pull_request.write, merge.execute, release.write, package.write, deploy.write, db.write, and iac.write.
Each capability remains static-only and is paired with workflow_capability.* evidence showing which workflow permission or step pattern produced the claim.
Workflow evidence may also carry additive workflow_environment and target_class_hint values when structured environment or delivery signals are present.
inventory.tools[*].locations[*] preserves the legacy owner string and adds owner_source plus ownership_status so CODEOWNERS-backed ownership stays distinguishable from deterministic fallback.
agent_privilege_map[*] and action_paths[*] add operational_owner, additive ownership provenance, and approval_gap_reasons so governance-first paths can show who should act next and why the approval model is incomplete.
inventory.security_visibility_summary emits additive reference-basis and count fields including unknown_to_security_write_capable_agents.
inventory.local_governance is emitted for --my-setup scans so workstation tool/config discoveries can be compared against an --approved-tools baseline without turning secret-presence signals into lifecycle identities.
inventory.non_human_identities[*] is emitted when static repo evidence shows durable GitHub App, bot-user, or service-account execution identities behind AI-enabled delivery paths.
When a downstream workflow does not have a usable reference_basis, Wrkr suppresses unknown_to_security claims rather than fabricating them.
inventory.tools[*] also emits report-ready tool_category and deterministic confidence_score (0.00-1.00) for inventory breakdown tables.
inventory.tools[*] emits normalized permission_surface, permission_tier, risk_tier, adoption_pattern, and per-tool regulatory_mapping statuses.
inventory.adoption_summary and inventory.regulatory_summary provide deterministic rollups for report section tables.
agent_privilege_map[*] is instance-scoped and includes additive agent_instance_id, tool_family_id, tool_instance_id, symbol, location, location_range, credentials[], credential_authority, purpose/version/config metadata, and path_context fields for multi-agent same-file repos and multi-credential authority paths.
--approved-tools <path> accepts a schema-validated YAML policy (schemas/v1/policy/approved-tools.schema.json) for explicit approved-list matching (tool_ids, agent_ids, tool_types, orgs, repos via exact/prefix sets).
Invalid --approved-tools policy files fail closed with invalid_input (exit 6).
For --my-setup, omitting --approved-tools keeps inventory.local_governance.reference_basis=unavailable instead of fabricating sanctioned or unsanctioned local claims.
For --repo and --org scans, source_manifest.repos[*].source is github_repo_materialized, and source_manifest.repos[*].location is a logical hosted reference such as github://acme/backend. The detector filesystem root is internal-only and is not serialized in customer-facing artifacts.
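
As a rough illustration of how a report generator might consume the inventory fields above, the Python sketch below groups inventory.tools[*] by tool_category and tallies approval_classification. It assumes the scan JSON exposes a top-level inventory object with a tools array; the summarize_tools helper, the sample payload, and the tool_category values are hypothetical and not part of Wrkr.

```python
from collections import Counter, defaultdict


def summarize_tools(scan):
    """Group inventory.tools[*] by tool_category and tally approval_classification."""
    tools = scan.get("inventory", {}).get("tools", [])
    scores_by_category = defaultdict(list)
    approvals = Counter()
    for tool in tools:
        # Missing fields are bucketed as "unknown" rather than guessed.
        category = tool.get("tool_category", "unknown")
        scores_by_category[category].append(tool.get("confidence_score"))
        approvals[tool.get("approval_classification", "unknown")] += 1

    rows = {}
    for category, scores in scores_by_category.items():
        known = [s for s in scores if isinstance(s, (int, float))]
        rows[category] = {
            "count": len(scores),
            "mean_confidence": round(sum(known) / len(known), 2) if known else None,
        }
    return rows, dict(approvals)


# Hypothetical payload; the category names are invented for illustration.
sample = {"inventory": {"tools": [
    {"tool_category": "assistant", "confidence_score": 0.90, "approval_classification": "approved"},
    {"tool_category": "assistant", "confidence_score": 0.70, "approval_classification": "unknown"},
    {"tool_category": "mcp_server", "confidence_score": 0.50, "approval_classification": "unapproved"},
]}}
rows, approvals = summarize_tools(sample)
```

Treating absent fields as unknown keeps the reader tolerant of additive schema changes, which matches the additive framing of these fields.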
Prompt-channel findings use stable reason codes and evidence hashes only (pattern_family, evidence_snippet_hash, location_class, confidence_class) and do not emit raw secret values.
Secret-bearing workflow evidence separates secret_reference_detected, secret_value_detected, secret_scope_unknown, secret_rotation_evidence_missing, secret_owner_missing, and secret_used_by_write_capable_workflow. Workflow references such as ${{ secrets.NAME }} are classified as references, not leaked values, and raw secret values are not emitted.
Static endpoint detection covers OpenAPI specs, common route files, and MCP declaration hints. Structured OpenAPI parsing is preferred when available; route-file classification is heuristic and lower-confidence by design.
When --enrich is enabled, MCP findings include enrich provenance and quality fields: source, as_of, package, version, advisory_count, registry_status, enrich_quality (ok|partial|stale|unavailable), advisory_schema, registry_schema, and enrich_errors.
Built-in production-target packs classify common deploy, Terraform/IaC, Kubernetes, package publishing, release automation, database migration, and customer-impacting workflows even when --production-targets is not supplied. Custom --production-targets files remain authoritative when present, and non-fatal custom-policy load errors may still surface policy_warnings.
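
The reference-versus-value distinction for workflow secrets described above can be illustrated with a small Python sketch. It is not Wrkr's detector: the SECRET_REF pattern and the secret_references helper are hypothetical, and the pattern only covers the simple ${{ secrets.NAME }} form of a GitHub Actions expression. The point is that only the secret name is collected, never a value.

```python
import re

# Simple GitHub Actions expression form: ${{ secrets.NAME }}.
# Assumption: more complex expressions (for example index syntax) are out of scope here.
SECRET_REF = re.compile(r"\$\{\{\s*secrets\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def secret_references(workflow_text):
    """Return referenced secret names in order of first appearance, without values."""
    seen = []
    for name in SECRET_REF.findall(workflow_text):
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical workflow fragment used only for illustration.
workflow = (
    "env:\n"
    "  TOKEN: ${{ secrets.DEPLOY_TOKEN }}\n"
    "  KEY: ${{secrets.API_KEY}}\n"
    "  AGAIN: ${{ secrets.DEPLOY_TOKEN }}\n"
)
names = secret_references(workflow)
```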

Timeout/cancellation contract:

--timeout <duration> bounds end-to-end scan runtime (0 disables timeout).
When the timeout is exceeded, the JSON error code is scan_timeout and the exit code is 1.
When the scan is canceled by a signal or by the parent context, the JSON error code is scan_canceled and the exit code is 1.
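
Because both conditions share exit code 1, automation has to branch on the JSON error code. The Python sketch below shows one way a wrapper might do that. The interpret_scan_exit helper and its outcome labels are hypothetical, and the sketch assumes that exit code 0 means the scan completed and that the caller has already extracted the error code string from the JSON output, whose exact envelope is not described here.

```python
# Error codes named in the contract above; both are reported with exit code 1.
INTERRUPTION_CODES = {
    "scan_timeout": "timed_out",
    "scan_canceled": "canceled",
}


def interpret_scan_exit(exit_code, error_code=None):
    """Map an exit code plus JSON error code to a coarse outcome label."""
    if exit_code == 0:
        # Assumption: exit code 0 means the scan completed.
        return "completed"
    if exit_code == 1 and error_code in INTERRUPTION_CODES:
        return INTERRUPTION_CODES[error_code]
    # Anything else is left for the caller to handle.
    return "other_failure"
```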

Retry/degradation contract:

GitHub connector retries retryable failures with bounded jittered backoff.
HTTP 429 and recognizable rate-limit 403 responses retry deterministically.
When GitHub supplies Retry-After or X-RateLimit-Reset, Wrkr uses that observed window before retrying.
Exhausted hosted throttling keeps exit code 1 but emits JSON error code rate_limited so automation can distinguish retryable wait conditions from generic runtime failure.
Repeated transient failures trigger connector cooldown degradation; scan surfaces this in partial-result output (source_degraded=true when applicable).
In --json org mode, retry/cooldown/resume/completion operator progress is emitted to stderr only; stdout remains reserved for the final JSON payload.
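
To make the "observed window" idea concrete, here is a minimal Python sketch of how a client could derive a wait time from GitHub's Retry-After (seconds) and X-RateLimit-Reset (UTC epoch seconds) headers. It is not Wrkr's implementation: the retry_wait_seconds helper and its default and cap parameters are assumptions chosen for illustration, and jittered backoff for other failures is omitted.

```python
import time


def retry_wait_seconds(headers, now=None, default=1.0, cap=900.0):
    """Pick a wait from Retry-After (seconds) or X-RateLimit-Reset (epoch seconds)."""
    now = time.time() if now is None else now
    lowered = {key.lower(): value for key, value in headers.items()}

    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            return min(max(float(retry_after), 0.0), cap)
        except ValueError:
            pass  # Retry-After may also be an HTTP date; not handled in this sketch.

    reset = lowered.get("x-ratelimit-reset")
    if reset is not None:
        try:
            # Wait until the reset instant, never a negative duration.
            return min(max(float(reset) - now, 0.0), cap)
        except ValueError:
            pass

    # No usable header: fall back to a fixed default (hypothetical value).
    return default
```

Passing now explicitly keeps the function deterministic in tests; a caller would normally leave it unset.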

SARIF contract:

--sarif emits a SARIF 2.1.0 report from scan findings.
--sarif-path selects the output path (default wrkr.sarif).
SARIF runs include properties.source_privacy when source-retention metadata is available.
The core scan --json contract remains backward-compatible; SARIF is additive.
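
A consumer that wants the source-retention metadata can read it from each run's SARIF property bag. The Python sketch below assumes only the standard SARIF 2.1.0 layout (a top-level runs array whose entries may carry a properties object); the source_privacy_by_run helper is hypothetical, and the sample value is an opaque placeholder because the shape of source_privacy is not specified here.

```python
def source_privacy_by_run(sarif):
    """Return properties.source_privacy for each SARIF run, or None when absent."""
    return [
        run.get("properties", {}).get("source_privacy")
        for run in sarif.get("runs", [])
    ]


# Hypothetical document; in practice it would come from json.load() on the
# file written by --sarif-path (default wrkr.sarif).
sample = {
    "version": "2.1.0",
    "runs": [
        {"results": [], "properties": {"source_privacy": {"example": "placeholder"}}},
        {"results": []},
    ],
}
privacy = source_privacy_by_run(sample)
```

Returning None for runs without the property mirrors the "when source-retention metadata is available" wording: absence is expected, not an error.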

Approved-tools policy example: docs/examples/approved-tools.v1.yaml.

Production target policy files are YAML and schema-validated (schemas/v1/policy/production-targets.schema.json), with exact/prefix matching only. Example: docs/examples/production-targets.v1.yaml.

Production write rule:

production_write = has_any(write_permissions) AND matches_any_production_target
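
The rule can be sketched in Python as follows. This is an illustration under stated assumptions, not Wrkr's matcher: the function names mirror the rule, but the argument shapes (a list of write permissions, a list of candidate target identifiers, and exact/prefix sets taken from a production-targets policy) are hypothetical.

```python
def matches_any_production_target(value, exact=(), prefixes=()):
    """Exact or prefix match only, as the policy format allows."""
    return value in exact or any(value.startswith(prefix) for prefix in prefixes)


def production_write(write_permissions, targets, exact=(), prefixes=()):
    """production_write = has_any(write_permissions) AND matches_any_production_target."""
    has_any_write = len(write_permissions) > 0
    matches = any(
        matches_any_production_target(target, exact, prefixes) for target in targets
    )
    return has_any_write and matches


# Hypothetical inputs for illustration.
result = production_write(
    write_permissions=["deploy.write"],
    targets=["prod-eu-cluster"],
    exact={"payments-db"},
    prefixes=("prod-",),
)
```

Both conditions must hold: a write permission with no matching target, or a matching target with no write permission, does not count as a production write.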

Safe claim rule:

write_capable is always available from the privilege budget and agent_privilege_map.
production_write is safe to claim only when --production-targets is configured and valid.
When production targets are missing or invalid, public/report wording must stay at write_capable and only expose production-target status, not a production-write count.
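
One way a report layer could honor the safe claim rule is sketched below in Python. The report_claims helper, its output keys, and the status values "valid", "missing", and "invalid" are all hypothetical; the sketch only shows that a production-write count is exposed when the production-target policy is configured and valid, and is withheld otherwise.

```python
def report_claims(targets_status, write_capable_count, production_write_count=None):
    """Build the claim fields a public report may expose."""
    claims = {
        # write_capable is always safe to state.
        "write_capable": write_capable_count,
        # The policy status itself is always reported.
        "production_targets_status": targets_status,
    }
    # Only a configured, valid policy justifies a production-write count.
    if targets_status == "valid" and production_write_count is not None:
        claims["production_write"] = production_write_count
    return claims
```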

Every discovered entity now emits discovery_method: static in both findings and inventory.tools for deterministic v1 schema compatibility. Saved lifecycle-bearing identities written beside scan state are intentionally narrower: real tool, agent, CI, skill, and MCP surfaces only. Posture/bookkeeping findings such as secret_presence, source_discovery, policy_*, and parse_error remain in findings/risk surfaces only.

--explain also emits short compliance rollup lines derived from the same machine-readable compliance_summary contract.

Emerging discovery surfaces are static-only in default deterministic mode:

WebMCP detection uses repository HTML/JS/route files only.
A2A detection uses repo-hosted agent-card JSON files only.
MCP gateway posture is derived from local config files only.
Non-human execution identities are derived from static workflow/config signals only.
No live endpoint probing is performed by default.

Wrkr stays in the See boundary: it inventories and scores tools plus agents from files and CI declarations, but it does not claim runtime observation, enforce runtime side effects, or execute agent workflows. Wrkr also does not assess package or MCP-server vulnerabilities in this path; use dedicated scanners such as Snyk for that class of assessment. Gait is optional interoperability for control-layer decisions, not a prerequisite for scan.

Custom extension detectors are loaded from .wrkr/detectors/extensions.json when present in scanned repositories. Their findings remain on additive finding and risk surfaces only by default; they do not create authoritative inventory, lifecycle, regress, or action-path state unless a future explicit contract says so. See docs/extensions/detectors.md. Canonical state and artifact lifecycle: docs/state_lifecycle.md.