wrkr scan
Synopsis
wrkr scan [--repo <owner/repo> | --org <org> | --github-org <org> | --path <dir> | --my-setup | --target <mode>:<value> ...] [--mode quick|governance|deep] [--progress auto|bar|plain|events|none] [--source-retention ephemeral|retain_for_resume|retain] [--deployment-mode local_only|customer_controlled_storage|connected_saas_metadata|managed_platform] [--allow-source-materialization] [--timeout <duration>] [--diff] [--enrich] [--baseline <path>] [--config <path>] [--state <path>] [--policy <path>] [--approved-tools <path>] [--production-targets <path>] [--production-targets-strict] [--profile baseline|standard|strict|assessment] [--github-api <url>] [--github-token <token>] [--report-md] [--report-md-path <path>] [--report-template exec|operator|audit|public|ciso|appsec|platform|customer-draft|agent-action-bom|design-partner-summary] [--report-share-profile internal|public|customer-redacted|design-partner|external-redacted|investor-safe] [--report-top <n>] [--sarif] [--sarif-path <path>] [--json] [--json-path <path>] [--resume] [--quiet] [--explain]
Govern-first `action_paths` in scan JSON now carry additive policy-coverage fields (`policy_coverage_status`, `policy_refs`, `policy_missing_reasons`, `policy_confidence`), buyer-facing `control_state`, `risk_zone`, and `review_burden` fields, and optional `introduced_by` metadata derived from deterministic repo-local provenance before local git fallback when available.
wrkr scan status --state <path> [--json]
Scan-time action_paths[*] are evidence-scoped. control_resolution_state and the canonical approval_evidence_state, owner_evidence_state, proof_evidence_state, runtime_evidence_state, target_evidence_state, and credential_evidence_state fields explain what Wrkr could verify, what was only declared or inferred, and what remained unknown in the scanned inputs. action_path_type keeps plain-source, CI/CD, automation-bot, AI-assisted, and agent-framework paths distinct so downstream reports do not overclaim agent behavior where the evidence only supports a broader action path.
When imported or declared enterprise evidence is present, action_paths[*] may also emit additive evidence_decisions[] and contradictions[]. These preserve the selected source, freshness state (fresh, stale, expired, unknown), rejected candidates, stable reason codes, and contradiction evidence refs instead of flattening everything into one winner string.
Start here with one of these first-value paths:
# Hosted org posture when prerequisites are ready
# Initialize the default hosted target first as described in docs/commands/init.md, then run:
wrkr scan --config ~/.wrkr/config.json --json
# Evaluator-safe fallback when hosted prerequisites are not ready yet
wrkr scan --path ./scenarios/wrkr/scan-mixed-org/repos --json
# Developer-machine hygiene
wrkr scan --my-setup --json
Use either one legacy target source (--repo, --org, --github-org, --path, or --my-setup) or one or more repeatable --target <mode>:<value> flags.
Legacy target flags remain supported as one-entry shims and cannot be combined with --target in the same invocation.
Supported --target modes are repo, org, path, and my_setup.
For my_setup, use --target my_setup:local-machine.
Use --target public-surface:<manifest-path> when you want an opt-in public-evidence-only assessment from a structured local manifest instead of a private repo scan.
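The `<mode>:<value>` shape of `--target` can be illustrated with a small parser. This is a minimal sketch, not Wrkr's own argument handling: the function name `parse_target` and the set `SUPPORTED_MODES` are hypothetical, and the set simply collects the modes named in the text above. The split happens only on the first colon so that values containing colons survive intact.

```python
# Modes named in the surrounding text; "public-surface" is the opt-in manifest mode.
# This set is an illustration, not an authoritative list from the Wrkr source.
SUPPORTED_MODES = {"repo", "org", "path", "my_setup", "public-surface"}


def parse_target(raw: str) -> tuple[str, str]:
    """Split one --target argument into (mode, value) on the first colon."""
    mode, sep, value = raw.partition(":")
    if not sep or not value:
        raise ValueError(f"expected <mode>:<value>, got {raw!r}")
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported target mode {mode!r}")
    return mode, value


if __name__ == "__main__":
    for arg in ["org:acme", "path:./repos", "my_setup:local-machine"]:
        print(parse_target(arg))
```

A wrapper script could use a check like this to reject malformed targets before invoking the CLI.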
Acquisition behavior is fail-closed by target:
- `--path` runs fully local/offline.
- `--path` supports two deterministic interpretations:
  - `repo_root`: scan the selected directory itself as one repo when it carries a strong repo-root signal such as `.git`, or when weak root signals are present without multiple child repo roots.
  - `repo_set`: scan immediate non-hidden child repos when the selected directory is a bundle root, and discover nested owner/repo layouts up to a bounded depth when immediate children are namespace folders.
- `repo_set` child repos are enumerated in deterministic lexical order by repo name. Child repos without tool markers are still included when sibling repos have markers so detector-level permission and symlink diagnostics remain visible.
- `--my-setup` runs fully local/offline against the local machine setup rooted at the current user home directory. It inspects supported user-home tool configs, selected environment key names, and common workspace roots for local agent project markers without emitting raw secret values.
- `--repo` and `--org` require real GitHub acquisition via `--github-api`, config `github_api_base`, or `WRKR_GITHUB_API_BASE`.
- `--target public-surface:<manifest-path>` is explicit and local-input-only. It loads a structured manifest of public repos, docs, SDKs, engineering blogs, release notes, status pages, or public workflows; it does not scrape the internet or infer private runtime/control proof from public marketing claims.
- Hosted GitHub materialization is sparse by default: Wrkr fetches detector-relevant files such as agent instructions, MCP/Codex/Cursor/Claude configs, skills, workflows, policy files, dependency manifests, and AI/MCP declaration surfaces instead of every repository blob.
- `--deployment-mode` is explicit metadata for how scan-derived artifacts should describe the customer data boundary. Supported values are `local_only`, `customer_controlled_storage`, `connected_saas_metadata`, and `managed_platform`. The default is `local_only`.
- `--deployment-mode` does not enable network calls, hosted uploads, or source retention by itself. It only labels the resulting machine-readable artifacts and source-privacy contract.
- If a repo already contains deterministic provenance sidecars under `.wrkr/provenance/`, Wrkr can project PR-level `introduced_by` metadata from `source-metadata.json`, `github-event.json`, or `gitlab-event.json` without live provider calls.
- If a repo contains `.wrkr/provenance/external-control-evidence.json`, Wrkr can also project local ownership, approval, branch-protection, protected-environment, required-check, security-gate, freeze-window, and kill-switch evidence into govern-first path posture without live provider calls.
- If a repo contains `wrkr-control-declarations.yaml` or `.wrkr/control-declarations.yaml`, Wrkr loads versioned customer declarations for owner mappings, target classes, non-production declarations, and control evidence links as local declared evidence only.
- Invalid control declarations fail closed with `policy_schema_violation` (exit `3`) instead of being ignored.
- Repo-local Gait policy `controls.deployment_constraints[]` declarations are treated as declared control evidence for branch, environment, approval, required-check, freeze-window, kill-switch, and security-gate context when present.
- Hosted scans do not fetch broad source-code extensions by default. Use `--mode deep` or `--allow-source-materialization` only when you explicitly want generic source files such as `.go`, `.py`, `.js`, or `.ts` to be materialized for deeper static detector coverage.
- Hosted GitHub API base resolution order is: `--github-api`, config `github_api_base`, then `WRKR_GITHUB_API_BASE`.
- Hosted GitHub token resolution order is: `--github-token`, config `auth.scan.token`, `WRKR_GITHUB_TOKEN`, then `GITHUB_TOKEN`.
- `--github-org` is an additive alias for `--org`.
- Explicit multi-target scans set `target.mode=multi` and add deterministic `targets[]` arrays to the top-level scan payload, saved state snapshot, and `source_manifest`.
- `--repo` and `--org` materialize the required hosted files into a deterministic local workspace under the scan state directory before detectors run.
- Hosted materialized source retention defaults to `--source-retention ephemeral`: Wrkr removes the managed materialized root after scan artifacts are committed, and it also cleans up failed runs unless retention is explicitly requested. Use `retain_for_resume` to preserve materialized files after a failed/interrupted run for resume, or `retain` to keep them after success. Both modes leave private repository contents on disk and should be used deliberately.
- Hosted scan artifacts emit `source_privacy` with `retention_mode`, additive `deployment_mode`, `materialized_source_retained`, `raw_source_in_artifacts=false`, `serialized_locations`, `cleanup_status`, and optional warnings.
- Materialized workspace root (`materialized-sources/`) is ownership-gated:
  - Wrkr-managed roots include marker `.wrkr-materialized-sources-managed` with state-bound provenance, not just a static marker body.
  - Non-empty roots without a valid marker are blocked (no recursive cleanup).
  - Marker must be a regular file with valid state-bound marker payload; symlink/directory/legacy-static/invalid marker content is blocked.
  - On `--resume`, previously materialized repo directories and checkpoint files must also be regular in-root artifacts; symlink-swapped repo roots or checkpoint files are blocked.
  - Ownership violations return `unsafe_operation_blocked` (exit `8`).
- When GitHub acquisition is unavailable, `scan` returns `dependency_missing` with exit code `7` (no synthetic repos are emitted).
- `--state` defaults to `.wrkr/last-scan.json`, with manifest/proof artifacts written alongside it.
- Existing `--state` files must be regular files; symlinked `--state` inputs fail closed with `unsafe_operation_blocked` (exit `8`) before any managed artifact mutation.
- Scan-owned managed artifacts are published transactionally: state snapshot, lifecycle chain, proof chain/attestation, manifest, and any requested `--json-path`, `--report-md-path`, or `--sarif-path` sidecars commit as one generation.
- Scan status is written as a deterministic sidecar next to `--state` and can be inspected with `wrkr scan status --state <path> --json` without rescanning.
- Invalid scan-owned artifact paths such as `--report-md-path` and `--sarif-path` are preflight-validated before any managed artifact mutation.
- `--json-path`, `--report-md-path`, and `--sarif-path` must stay unique from one another and from Wrkr-managed artifacts derived from `--state`; collisions fail closed with `invalid_input` (exit `6`) before any scan-managed artifact is written.
- Late write failures after preflight still fail closed and roll managed artifacts back to the previous committed generation instead of leaving mixed state/proof/manifest outputs behind.
- For `--path` scans, detector file reads stay bounded to the selected repo root. Root-escaping symlinked config, env, workflow, and MCP files are rejected with deterministic `parse_error.kind=unsafe_path` diagnostics instead of being read.
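The hosted GitHub API base and token resolution orders listed above amount to "first configured source wins". The sketch below mirrors that precedence in Python for illustration only. The function names, the `flags` dictionary keys, and the nested config shape for `auth.scan.token` are assumptions made for the example, and treating an empty string as "not set" is also an assumption rather than documented behavior.

```python
import os
from typing import Mapping, Optional


def first_set(*candidates: Optional[str]) -> Optional[str]:
    """Return the first candidate that is a non-empty string, else None."""
    for value in candidates:
        if value:
            return value
    return None


def resolve_github_settings(flags: Mapping, config: Mapping, env: Mapping = os.environ):
    """Apply the documented precedence: flag, then config, then environment."""
    # Assumed config layout: {"github_api_base": ..., "auth": {"scan": {"token": ...}}}
    api_base = first_set(
        flags.get("github_api"),
        config.get("github_api_base"),
        env.get("WRKR_GITHUB_API_BASE"),
    )
    token = first_set(
        flags.get("github_token"),
        config.get("auth", {}).get("scan", {}).get("token"),
        env.get("WRKR_GITHUB_TOKEN"),
        env.get("GITHUB_TOKEN"),
    )
    return api_base, token
```

Passing `env` explicitly keeps the function easy to exercise without touching the real process environment.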
Scan mode behavior is explicit:
- `--mode governance` is the default enterprise posture. It emits the versioned `control_backlog`, keeps raw findings for compatibility, and reports generated/package-manager noise in `scan_quality`.
- `--mode quick` runs the highest-signal governance detectors for coding assistant configs, MCP, skills, CI automation, secret references, and policy files.
- `--mode deep` runs the full detector set and marks `scan_quality.mode=deep`; `scan_quality.detectors[*]` then distinguishes clean coverage from partial, reduced, or blocked detector health instead of forcing you to infer confidence from raw findings alone.
- Invalid mode values fail closed with `invalid_input` (exit `6`) and the normal JSON error envelope in `--json` mode.
- `--diff` requires the previous saved snapshot and current scan to use the same recorded scan mode. A mode mismatch fails closed with `invalid_input` (exit `6`) instead of reporting synthetic drift caused by quick/governance/deep scope differences.
- When no target is provided and no usable config default target exists, `scan --json` fails closed with exit `6`, `error.code=invalid_input`, and additive `error.next_steps[]` guidance for hosted org setup, the evaluator-safe scenario fallback, and `--my-setup`.
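For automation that wraps the command, the fail-closed exit codes and the JSON error envelope mentioned in this section can be turned into readable diagnostics. The following is a minimal sketch: `describe_failure` is a hypothetical helper, the exit-code table contains only the codes this page names, and the assumption that `error.next_steps[]` entries can be rendered with `str()` is not confirmed by the source.

```python
import json

# Exit codes named on this page; other codes may exist and are reported as "unlisted".
EXIT_CODES = {
    3: "policy_schema_violation",
    6: "invalid_input",
    7: "dependency_missing",
    8: "unsafe_operation_blocked",
}


def describe_failure(exit_code: int, payload_text: str) -> list[str]:
    """Summarize a failed run from its exit code and JSON error envelope text."""
    lines = [f"exit {exit_code}: {EXIT_CODES.get(exit_code, 'unlisted')}"]
    try:
        payload = json.loads(payload_text)
    except json.JSONDecodeError:
        return lines  # no parseable envelope; the exit code is all we know
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return lines
    if "code" in error:
        lines.append(f"error.code: {error['code']}")
    for step in error.get("next_steps") or []:
        lines.append(f"next step: {step}")
    return lines
```

The helper takes the envelope text as a parameter so the caller decides which captured stream to pass in.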
Flags
- `--json`
- `--json-path`
- `--resume`
- `--explain`
- `--quiet`
- `--progress`
- `--repo`
- `--org`
- `--github-org`
- `--path`
- `--my-setup`
- `--target`
- `--mode`
- `--source-retention`
- `--deployment-mode`
- `--allow-source-materialization`
- `--timeout`
- `--diff`
- `--enrich`
- `--baseline`
- `--config`
- `--state`
- `--policy`
- `--approved-tools`
- `--production-targets`
- `--production-targets-strict`
- `--profile`
- `--github-api`
- `--github-token`
- `--report-md`
- `--report-md-path`
- `--report-template`
- `--report-share-profile`
- `--report-top`
- `--sarif`
- `--sarif-path`
Status inspection
wrkr scan status --state ./.wrkr/last-scan.json --json
The status payload includes status, current_phase, last_successful_phase, repo counts, partial_result, partial_result_marker, phase timings, artifact paths, and source_privacy when scan state includes source-retention metadata.
Completed scans that hit non-fatal source acquisition failures keep partial_result=true in status JSON until the same target is rerun cleanly.
During active, interrupted, or completed-partial scans, additive progress fields may also include progress_percent, progress_message, last_progress_at, elapsed_seconds, phase_progress, repo_progress, and detector_progress.
When present, repo_progress.completed counts repos that reached a terminal source-acquisition result, repo_progress.succeeded isolates successful materializations, and repo_progress.pending stays total - completed so failed repos are counted once.
Existing state files without a status sidecar are interpreted as completed when the state snapshot can be loaded, otherwise unknown.
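The `repo_progress` arithmetic described above can be checked with a few lines of Python. This is a sketch under assumptions: the key name `total` is inferred from the phrase "total - completed" and is not confirmed by the source, and the derived `failed` count follows from `completed` covering every terminal result while `succeeded` covers only successes.

```python
def summarize_repo_progress(repo_progress: dict) -> dict:
    """Derive pending and failed repo counts from a repo_progress object."""
    total = repo_progress["total"]          # assumed key name
    completed = repo_progress["completed"]  # repos with a terminal acquisition result
    succeeded = repo_progress["succeeded"]  # successful materializations only
    return {
        "pending": total - completed,
        "failed": completed - succeeded,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration.
    print(summarize_repo_progress({"total": 10, "completed": 7, "succeeded": 5}))
```

Because a failed repo is part of `completed` and absent from `pending`, it is counted exactly once.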
Developer personal-hygiene example
wrkr scan --my-setup --json
This local/offline mode inventories supported user-home tool configs, selected environment key presence, and local agent project markers. Use it when a developer wants to answer "what AI tooling is already on this machine?" before widening to the org workflow.
Environment-key presence and source bookkeeping stay in findings/risk output only; they do not become lifecycle identities, manifest identities, inventory agents, or regress tools.
For the current minimum-now launch posture, security/platform teams should start with the org example below; --my-setup remains the secondary local-machine path.
Security-team org example
wrkr scan --github-org acme --github-api https://api.github.com --json --json-path ./.wrkr/scan.json
--github-org is the additive alias for --org. Use it when security or platform teams need the deterministic saved-state input for wrkr report, wrkr evidence, wrkr mcp-list, or wrkr inventory --diff.
Private repos and public API rate-limit avoidance usually require a GitHub token even when --github-api is set.
If you already configured the hosted source and target with wrkr init, you can reuse them:
wrkr init --org acme --github-api https://api.github.com --json
wrkr scan --config ~/.wrkr/config.json --json
Wrkr's hosted connector currently calls these GitHub REST endpoints:
- `GET /orgs/{org}/repos?per_page=100&page=N`
- `GET /repos/{owner}/{repo}`
- `GET /repos/{owner}/{repo}/git/trees/{default_branch}?recursive=1`
- `GET /repos/{owner}/{repo}/git/blobs/{sha}`
Fine-grained PAT guidance for the selected repositories:
- repository metadata: read-only
- repository contents: read-only
Opinionated large-org command path:
wrkr scan --github-org acme --github-api https://api.github.com --state ./.wrkr/last-scan.json --timeout 30m --json --json-path ./.wrkr/scan.json --report-md --report-md-path ./.wrkr/scan-summary.md --sarif --sarif-path ./.wrkr/wrkr.sarif
Wrkr now exposes one explicit progress contract through --progress auto|bar|plain|events|none.
- `auto` is the default.
- `auto` uses `events` for `--json` scans so stdout stays reserved for the final JSON payload while progress stays on stderr.
- `auto` uses a single updating `bar` on interactive terminals when Wrkr can safely render it.
- `auto` degrades to `plain` newline-delimited progress on non-TTY stderr targets or conservative terminals.
- `events` preserves machine-oriented `progress target=... event=...` stderr lines for automation and log parsers.
- `plain` emits stable human-readable progress lines to stderr without terminal control characters.
- `bar` requests the interactive updating bar explicitly; when stderr cannot safely render it, Wrkr falls back to `plain` and explains the fallback on stderr.
- `none` disables progress output without muting errors.
--quiet overrides --progress and suppresses all non-error progress output.
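The progress rules above can be read as a small decision function. The sketch below is an interpretation, not Wrkr's implementation: `resolve_progress` and its parameters are hypothetical, `can_render_bar` stands in for "interactive terminal where the bar is safe to render", and giving `--json` precedence over terminal detection under `auto` is an assumption based on the order in which the rules are listed.

```python
PROGRESS_MODES = {"auto", "bar", "plain", "events", "none"}


def resolve_progress(requested: str = "auto", *, json_mode: bool = False,
                     quiet: bool = False, can_render_bar: bool = False) -> str:
    """Return the effective progress mode for one invocation."""
    if requested not in PROGRESS_MODES:
        raise ValueError(f"unknown progress mode {requested!r}")
    if quiet:
        return "none"  # --quiet overrides --progress
    if requested == "auto":
        if json_mode:
            return "events"  # keep stdout reserved for the final JSON payload
        return "bar" if can_render_bar else "plain"
    if requested == "bar" and not can_render_bar:
        return "plain"  # explicit bar falls back when stderr cannot render it
    return requested


if __name__ == "__main__":
    print(resolve_progress("auto", json_mode=True))
    print(resolve_progress("bar", can_render_bar=False))
```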
The saved-state report hooks can now target design-partner-summary and the expanded design-partner, external-redacted, and investor-safe share profiles when a scan wants to publish a buyer-facing static action summary directly from the scan flow.
When --json is set, Wrkr keeps stdout reserved for the final JSON payload and emits progress to stderr only. Existing event-style progress for hosted org and local path scans remains available by default through --progress auto. Event-mode progress includes retry, cooldown, resume, per-repo materialization completion/discovery, detector lifecycle detail, heartbeat updates, scan phase transitions, completion, and final footer lines. For hosted scans, repo_materialize means a repo job was dispatched to a worker and repo_materialize_done means that repo reached a success or failure result. For path scans, repo_discovered means a local repo root was selected for detector execution. --json-path writes the same final JSON payload to disk, and --json --json-path emits byte-identical payload bytes to both stdout and the selected file. Any requested --json-path, --report-md-path, or --sarif-path must be unique from one another and from scan-managed --state sibling artifacts.
Long-running source acquisition, detector execution, analysis, and artifact commit phases emit heartbeats with elapsed time so operators can distinguish a slow scan from a stuck scan. The reported percent is an operator UX estimate only. It is additive progress metadata and is not consumed by risk scoring, proof emission, compliance mapping, regress baselines, or policy decisions.
For CI or log-stable automation, prefer --progress none when you want no progress stderr, or --progress events when you want deterministic machine-readable liveness. --quiet is stronger and suppresses progress output entirely.
--resume is supported only when every requested target is an org target. Wrkr stores internal checkpoint metadata under the scan-state directory in org-checkpoints/ and reuses already-materialized repositories only when the checkpoint target set, per-org repo sets, and materialized-root path still match the current org-target scan.
Resume also revalidates that checkpoint files and reused repo roots are still trusted local artifacts under the managed materialized root; symlink-swapped entries fail closed as unsafe_operation_blocked.
Default successful hosted scans remove that managed root, so resume from retained materialized source requires an explicit retention mode such as --source-retention retain for completed runs or retain_for_resume for failed/interrupted runs.
Mixed target sets such as org-plus-path scans fail closed with invalid_input when --resume is requested.
If a run is interrupted after some repositories are checkpointed, rerun the same target with --resume and keep the same --state path. Use wrkr scan status --state <path> --json to inspect the last successful phase, partial marker, and repo counters before rerunning. If partial_result, source_errors, or source_degraded is present, treat the scan as incomplete and rerun after the blocking condition is resolved.
For long org scans, run the foreground command under your process supervisor or shell backgrounding rather than relying on a hidden daemon:
nohup wrkr scan --github-org acme --github-api https://api.github.com --state ./.wrkr/last-scan.json --json --json-path ./.wrkr/scan.json > ./.wrkr/scan.stdout 2> ./.wrkr/scan.stderr &
wrkr scan status --state ./.wrkr/last-scan.json --json
Mixed target example:
wrkr scan --target org:acme --target path:./repos --github-api https://api.github.com --json
wrkr scan --target public-surface:./docs/examples/public-surface-assessment.v1.yaml --json
Repo/path example
wrkr scan --path ./scenarios/wrkr/scan-mixed-org/repos --profile assessment --report-md --report-md-path ./.tmp/scan-summary.md --report-template operator --json
This is the canonical repo_set example for --path: the selected directory is a bundle of immediate child repos, so Wrkr preserves per-child repo manifests and deterministic ordering instead of collapsing the bundle into one repo.
Expected JSON keys include status, additive deployment_mode, target, source_privacy, scan_mode, scan_quality, findings, additive control_backlog, ranked_findings, top_findings, attack_paths, top_attack_paths, additive action_paths, additive action_path_to_control_first, inventory, privilege_budget, agent_privilege_map, repo_exposure_summaries, profile, posture_score, compliance_summary, additive activation, and optional report when summary output is requested.
Top-level deployment_mode and source_privacy.deployment_mode stay aligned. local_only is the default when no alternate mode is explicitly selected.
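A consumer that wants to verify the alignment described above could run a check like the following on a parsed scan payload. This is a sketch: `deployment_mode_problems` is a hypothetical helper, and skipping the comparison when `source_privacy.deployment_mode` is absent is an assumption made so the check does not report payloads that omit the nested field.

```python
DEPLOYMENT_MODES = {
    "local_only",
    "customer_controlled_storage",
    "connected_saas_metadata",
    "managed_platform",
}


def deployment_mode_problems(payload: dict) -> list[str]:
    """Return human-readable problems with the deployment_mode fields, if any."""
    problems = []
    top = payload.get("deployment_mode")
    nested = (payload.get("source_privacy") or {}).get("deployment_mode")
    if top not in DEPLOYMENT_MODES:
        problems.append(f"unexpected top-level deployment_mode: {top!r}")
    if nested is not None and nested != top:
        problems.append(f"source_privacy.deployment_mode {nested!r} != top-level {top!r}")
    return problems
```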
Public-surface scans also populate source_manifest.public_evidence_manifest_name and additive source_manifest.public_evidence[] rows so downstream reports can preserve public observed facts, inferred context, unsupported public claims, and explicit private-evidence gaps without re-reading the input manifest.
Explicit multi-target runs also emit additive targets[] arrays at the top level and inside source_manifest, and saved state snapshots preserve the same additive targets[] contract.
control_backlog.control_backlog_version is the stable backlog schema version. control_backlog.items[*] includes repo, path, control surface/path type, capability, additive write_path_classes, additive governance_controls, owner/source/status, evidence source/basis, approval status, governance security visibility, queue (control_first|review_queue|inventory_hygiene|debug_only), finding visibility (primary|appendix|debug), recommended action (attach_evidence|approve|remediate|downgrade|deprecate|exclude|monitor|inventory_review|suppress|debug_only), concrete remediation text, confidence (high|medium|low), additive action-path-backed confidence_lane / confidence_lane_reasons when a backlog row is linked to a govern-first path, evidence gaps, confidence-raising guidance, SLA, closure criteria, optional secret signal types, and linked raw finding IDs. Raw findings remain the compatibility evidence surface.
Governance backlog visibility uses known_approved, known_unapproved, unknown_to_security, accepted_risk, deprecated, revoked, and needs_review. Legacy inventory compatibility fields may still emit or accept approved for existing consumers, while new backlog items map that state to known_approved.
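Readers that handle both legacy inventory values and the newer backlog states may want to normalize them in one place. The sketch below applies the single mapping stated above (`approved` becomes `known_approved`); the function name is hypothetical and rejecting any other unknown value is a design choice of this example, not documented Wrkr behavior.

```python
BACKLOG_VISIBILITY = {
    "known_approved",
    "known_unapproved",
    "unknown_to_security",
    "accepted_risk",
    "deprecated",
    "revoked",
    "needs_review",
}


def normalize_visibility(value: str) -> str:
    """Map the legacy 'approved' value onto the backlog visibility vocabulary."""
    if value == "approved":
        return "known_approved"
    if value in BACKLOG_VISIBILITY:
        return value
    raise ValueError(f"unrecognized visibility state {value!r}")
```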
scan_quality.scan_quality_version is the stable scan-quality schema version. In governance mode, generated/package-manager surfaces such as node_modules/, dist/, build/, nested generated SDK folders, .yarn/sdks/, VitePress cache/dist assets, minified JavaScript, and other package internals are reported as scan-quality context instead of active backlog items. Parser diagnostics in scan_quality.parse_errors[*] include deterministic recommended_action values such as suppress for generated/package noise and debug_only for non-generated diagnostics.
scan_quality.detectors[*] reports per-repo detector health with deterministic status (complete|partial|reduced|blocked) plus attempted/parsed/partial/suppressed/failure counts and coverage_reasons. Use these rows to judge whether a negative MCP/WebMCP result is trustworthy or whether reduced coverage should keep the repo in a review/debug path.
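One way to act on detector health is to list every row whose status is not `complete` before trusting a negative result. The helper below is a minimal sketch over a parsed scan payload: only `status` and `coverage_reasons` come from the text above, while the `detector` key in the sample data is a hypothetical label added for readability.

```python
def detectors_needing_review(scan_payload: dict) -> list[dict]:
    """Return scan_quality.detectors rows whose status is partial, reduced, or blocked."""
    rows = (scan_payload.get("scan_quality") or {}).get("detectors") or []
    return [row for row in rows if row.get("status") != "complete"]


if __name__ == "__main__":
    # Hypothetical payload fragment for illustration.
    payload = {
        "scan_quality": {
            "detectors": [
                {"detector": "mcp", "status": "complete", "coverage_reasons": []},
                {"detector": "webmcp", "status": "reduced", "coverage_reasons": ["parse_failure"]},
            ]
        }
    }
    for row in detectors_needing_review(payload):
        print(row["status"], row.get("coverage_reasons"))
```

An empty return value means every reported detector row claimed complete coverage; it says nothing about detectors that produced no row at all.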
For local-machine scans, target.mode is my_setup.
When target.mode=my_setup, activation.items projects concrete local tool, MCP, secret, and parse-error signals first without mutating the raw top_findings ranking. Policy-only items remain available in ranked_findings / top_findings.
When target.mode=org, target.mode=path, or target.mode=multi, activation.items projects govern-first candidate paths from the saved privilege map and adds item_class values such as production_target_backed, unknown_to_security_write_path, approval_gap_path, and govern_first_candidate.
action_paths[*] combines path identity, write capability, additive write_path_classes, additive action_classes, additive action_reasons, additive mutable_endpoint_semantics[], additive governance_controls, approval gap, security visibility, credential/deployment posture, delivery-chain metadata (pull_request_write, merge_execute, deploy_write, delivery_chain_status), additive workflow trigger posture (workflow_trigger_class such as scheduled, workflow_dispatch, or deploy_pipeline), production-target truth (production_target_status, production_write), additive execution-identity fields (execution_identity, execution_identity_type, execution_identity_source, execution_identity_status, execution_identity_rationale), additive standing-authority fields (standing_privilege, standing_privilege_reasons), additive credentials[], additive credential_authority, additive purpose/version/config metadata (purpose, purpose_source, purpose_confidence, version, version_source, config_fingerprint, config_source), additive action_lineage.segments[], path_context, tool_family_id, and tool_instance_id, additive buyer-lane fields (confidence_lane, confidence_lane_reasons), path-linked attack_path_score, labeled govern-first dimensions (inventory_risk, control_priority, risk_tier), additive join refs (attack_path_refs, source_finding_keys), and a stable recommended_action enum of inventory|approval|proof|control. Purpose metadata prefers explicit repo-local wrkr:purpose annotations when present, then falls back to structured workflow, MCP, script, symbol, and location evidence.
action_paths[*].path_id is an opaque deterministic identifier currently emitted in apc-<hex> form. Treat it as a stable join key only; do not parse business meaning from its string format.
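Treating `path_id` as an opaque key means indexing on the whole string and never splitting it. The sketch below indexes `action_paths` by `path_id` and tallies the `recommended_action` enum; both helper names are hypothetical, and the sample identifiers are made up to match the `apc-<hex>` form.

```python
from collections import Counter


def index_by_path_id(scan_payload: dict) -> dict:
    """Build a lookup from opaque path_id to its action path entry."""
    return {path["path_id"]: path for path in scan_payload.get("action_paths") or []}


def count_recommended_actions(scan_payload: dict) -> Counter:
    """Tally action paths by recommended_action (inventory|approval|proof|control)."""
    return Counter(
        path.get("recommended_action") for path in scan_payload.get("action_paths") or []
    )


if __name__ == "__main__":
    # Hypothetical payload fragment; real path_id values come from the scan output.
    payload = {
        "action_paths": [
            {"path_id": "apc-0a1b2c", "recommended_action": "approval"},
            {"path_id": "apc-3d4e5f", "recommended_action": "control"},
            {"path_id": "apc-6a7b8c", "recommended_action": "approval"},
        ]
    }
    print(sorted(index_by_path_id(payload)))
    print(count_recommended_actions(payload))
```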
action_path_to_control_first exposes one prioritized path plus deterministic summary counts (total_paths, write_capable_paths, production_target_backed_paths, govern_first_paths) without removing the legacy attack_paths surfaces.
action_path_to_control_first.summary.empty_state_status and empty_state_reasons are additive metadata explaining whether the current govern-first path set supports a clean buyer-facing empty state, blocks it, or downgrades it because detector coverage was reduced.
--profile assessment narrows govern-first surfaces such as action_paths, action_path_to_control_first, activation, and report summaries for sample/test/vendor-style noise while leaving raw findings, proof output, and exit codes unchanged.
warnings is included when Wrkr can prove posture may be incomplete even though the scan succeeded, for example when known MCP-bearing declaration files failed to parse.
detector_errors is included when non-fatal detector failures occur and partial scan results are preserved.
partial_result, source_errors, and source_degraded are included when source acquisition/materialization has non-fatal failures.
When filesystem permission or stat failures prevent full detector coverage, detector_errors[*].code stays explicit (permission_denied, path_not_found) and scan_quality.detectors[*].status degrades to blocked or reduced instead of quietly presenting a clean negative result.
Downstream wrkr campaign aggregate treats these completeness markers as fail-closed input signals and rejects such artifacts instead of producing a campaign summary from incomplete scans.
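A pipeline can apply the same caution before it consumes a scan payload. The sketch below checks for the three source-acquisition markers named on this page; it is an illustration rather than the logic `wrkr campaign aggregate` uses, the helper name is hypothetical, and treating an empty or false value as "not present" is an assumption.

```python
COMPLETENESS_MARKERS = ("partial_result", "source_errors", "source_degraded")


def completeness_markers_present(scan_payload: dict) -> list[str]:
    """Return the names of completeness markers that are set on the payload."""
    return [name for name in COMPLETENESS_MARKERS if scan_payload.get(name)]


if __name__ == "__main__":
    # Hypothetical payload fragment for illustration.
    payload = {"status": "ok", "partial_result": True, "source_errors": [{"repo": "acme/backend"}]}
    markers = completeness_markers_present(payload)
    if markers:
        print("treat scan as incomplete:", ", ".join(markers))
```

`detector_errors` and `warnings` are reported separately and are left out of this check on purpose; a stricter pipeline could add them to the tuple.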
sarif.path is included when --sarif output is requested.
compliance_summary.frameworks[*].controls[*] emits deterministic framework/control rollups with mapped_rule_ids, finding_count, and proof-derived coverage status.
inventory.methodology emits machine-readable scan metadata (wrkr_version, timing, repo/file counts, detector inventory).
inventory.agents is always present (possibly empty) and is deterministically sorted by org/framework/instance/location; agent entries may include additive symbol, security_visibility_status, and location_range when parser metadata is available.
Source coverage remains intentionally scoped:
- supported framework-native parsing covers LangChain, CrewAI, OpenAI Agents, AutoGen, LlamaIndex, and MCP-client patterns
- conservative custom-agent scaffolds come from
.wrkr/agents/custom-agent.{yaml,yml,json,toml} - explicit bespoke custom-source coverage uses
wrkr:custom-agentannotations in Python or JS/TS source filesranked_findings[*]andattack_paths[*]now include deterministic agent-aware amplification and edge rationale when agent declarations expose deployment, delegation, dynamic discovery, or bound tool/data/auth/deploy chains.inventory.tools[*]includes deterministicapproval_classification(approved|unapproved|unknown), andinventory.approval_summaryemits aggregate approval-gap ratios for campaign/report workflows.inventory.tools[*],inventory.agents[*], andagent_privilege_map[*]also emit additivesecurity_visibility_statuswithout overloadingapproval_classification. Existing readers should continue to acceptapproved|known_unapproved|unknown_to_security; governance additions may also surfaceknown_approved,accepted_risk,deprecated,revoked, andneeds_reviewwhere lifecycle or approval evidence supports those states.inventory.tools[*],agent_privilege_map[*],control_backlog.items[*], andaction_paths[*]may emit additivewrite_path_classessuch asread,write,pr_write,repo_write,release_write,package_publish,deploy_write,infra_write,secret_bearing_execution, andproduction_adjacent_write.agent_privilege_map[*],inventory.tools[*], andaction_paths[*]also emit additive static endpoint classification viamutable_endpoint_semantics[](read,write,delete,deploy,refund,payment,user_admin,data_export,production_mutation) with deterministic confidence, surface, operation, and evidence refs.action_paths[*]also carries additivetarget_class/target_class_reasons/target_class_evidence_refsplus additiveaction_path_type/action_path_type_reasons/action_path_type_evidence_refsso downstream reports can distinguish production-impacting, release-adjacent, customer-data-adjacent, internal-tooling, developer-productivity, sandbox, and unknown targets, while only using agent-specific language when the path type is actually agentic. 
These fields are declaration-only; they do not claim live reachability or runtime observation.agent_privilege_map[*]andaction_paths[*]also emit additive credential classification fieldscredential_kind,access_type,standing_access,likely_jit,evidence_location, andclassification_reasons, plus additive normalizedcredential_authorityposture, purpose/version/config metadata,action_lineage, additiveaction_classes,action_reasons, andstanding_privilege_reasons.governance_controls[*]maps review evidence forowner_assigned,approval_recorded,least_privilege_verified,rotation_evidence_attached,deployment_gate_present,production_access_classified,proof_artifact_generated, andreview_cadence_set; each control reportssatisfied,gap, ornot_applicablewith deterministic evidence/gap reasons. Workflow-backed findings may emit additive first-class workflow capabilities such asrepo.write,pull_request.write,merge.execute,release.write,package.write,deploy.write,db.write, andiac.write. Each capability remains static-only and is paired withworkflow_capability.*evidence showing which workflow permission or step pattern produced the claim. 
Workflow evidence may also carry additiveworkflow_environmentandtarget_class_hintvalues when structured environment or delivery signals are present.inventory.tools[*].locations[*]preserves the legacyownerstring and addsowner_sourceplusownership_statusso CODEOWNERS-backed ownership stays distinguishable from deterministic fallback.agent_privilege_map[*]andaction_paths[*]addoperational_owner, additive ownership provenance, andapproval_gap_reasonsso governance-first paths can show who should act next and why the approval model is incomplete.inventory.security_visibility_summaryemits additive reference-basis and count fields includingunknown_to_security_write_capable_agents.inventory.local_governanceis emitted for--my-setupscans so workstation tool/config discoveries can be compared against an--approved-toolsbaseline without turning secret-presence signals into lifecycle identities.inventory.non_human_identities[*]is emitted when static repo evidence shows durable GitHub App, bot-user, or service-account execution identities behind AI-enabled delivery paths. 
When a downstream workflow does not have a usable `reference_basis`, Wrkr suppresses `unknown_to_security` claims rather than fabricating them.
`inventory.tools[*]` also emits report-ready `tool_category` and deterministic `confidence_score` (0.00-1.00) for inventory breakdown tables.
`inventory.tools[*]` emits normalized `permission_surface`, `permission_tier`, `risk_tier`, `adoption_pattern`, and per-tool `regulatory_mapping` statuses.
`inventory.adoption_summary` and `inventory.regulatory_summary` provide deterministic rollups for report section tables.
`agent_privilege_map[*]` is instance-scoped and includes additive `agent_instance_id`, `tool_family_id`, `tool_instance_id`, `symbol`, `location`, `location_range`, `credentials[]`, `credential_authority`, purpose/version/config metadata, and `path_context` fields for multi-agent same-file repos and multi-credential authority paths.
`--approved-tools <path>` accepts a schema-validated YAML policy (schemas/v1/policy/approved-tools.schema.json) for explicit approved-list matching (`tool_ids`, `agent_ids`, `tool_types`, `orgs`, `repos` via exact/prefix sets). Invalid `--approved-tools` policy files fail closed with `invalid_input` (exit `6`). For `--my-setup`, omitting `--approved-tools` keeps `inventory.local_governance.reference_basis=unavailable` instead of fabricating sanctioned or unsanctioned local claims.
For `--repo` and `--org` scans, `source_manifest.repos[*].source` is `github_repo_materialized`, and `source_manifest.repos[*].location` is a logical hosted reference such as `github://acme/backend`. The detector filesystem root is internal-only and is not serialized in customer-facing artifacts.
Prompt-channel findings use stable reason codes and evidence hashes only (`pattern_family`, `evidence_snippet_hash`, `location_class`, `confidence_class`) and do not emit raw secret values.
Secret-bearing workflow evidence separates `secret_reference_detected`, `secret_value_detected`, `secret_scope_unknown`, `secret_rotation_evidence_missing`, `secret_owner_missing`, and `secret_used_by_write_capable_workflow`.
Workflow references such as `${{ secrets.NAME }}` are classified as references, not leaked values, and raw secret values are not emitted.
Static endpoint detection covers OpenAPI specs, common route files, and MCP declaration hints. Structured OpenAPI parsing is preferred when available; route-file classification is heuristic and lower-confidence by design.
When `--enrich` is enabled, MCP findings include enrich provenance and quality fields: `source`, `as_of`, `package`, `version`, `advisory_count`, `registry_status`, `enrich_quality` (`ok|partial|stale|unavailable`), `advisory_schema`, `registry_schema`, and `enrich_errors`.
Built-in production-target packs classify common deploy, Terraform/IaC, Kubernetes, package publishing, release automation, database migration, and customer-impacting workflows even when `--production-targets` is not supplied. Custom `--production-targets` files remain authoritative when present, and non-fatal custom-policy load errors may still surface `policy_warnings`.
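The reference-versus-value distinction for workflow secrets can be illustrated with a small pattern match. This is a minimal sketch, not Wrkr's detector: the regular expression, `secret_reference_names`, and `classify_references` are hypothetical names introduced here, and only the `${{ secrets.NAME }}` syntax and the `secret_reference_detected` reason code come from the text above.

```python
import re

# GitHub Actions expression syntax for a secret reference,
# e.g. ${{ secrets.DEPLOY_KEY }}. This matches the reference only;
# the secret value never appears in the workflow file.
SECRET_REF = re.compile(r"\$\{\{\s*secrets\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def secret_reference_names(workflow_text: str) -> list[str]:
    """Return the sorted, de-duplicated names of referenced secrets."""
    return sorted(set(SECRET_REF.findall(workflow_text)))


def classify_references(workflow_text: str) -> list[str]:
    """Hypothetical helper: emit a reason code for references, never a value."""
    if SECRET_REF.search(workflow_text):
        return ["secret_reference_detected"]
    return []
```

Because the output carries only secret names and a reason code, nothing in it could leak a raw secret value.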
Timeout/cancellation contract:
- `--timeout <duration>` bounds end-to-end scan runtime (`0` disables timeout).
- When timeout is exceeded, JSON error code is `scan_timeout` with exit code `1`.
- When canceled by signal or parent context, JSON error code is `scan_canceled` with exit code `1`.
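Since timeout and cancellation share exit code `1`, a wrapper has to look at the JSON error code to tell them apart. The sketch below assumes the caller has already extracted that code from the error payload (the envelope shape is not specified here, so extraction is left out); `describe_scan_failure` and its return labels are hypothetical.

```python
from typing import Optional


def describe_scan_failure(exit_code: int, error_code: Optional[str]) -> str:
    """Map an exit code plus JSON error code to a coarse outcome label.

    The labels returned here are illustrative, not part of the Wrkr contract.
    """
    if exit_code == 0:
        return "ok"
    if exit_code == 1 and error_code == "scan_timeout":
        return "timeout"  # --timeout <duration> was exceeded
    if exit_code == 1 and error_code == "scan_canceled":
        return "canceled"  # signal or parent-context cancellation
    return "other_failure"
```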
Retry/degradation contract:
- GitHub connector retries retryable failures with bounded jittered backoff.
- HTTP `429` and recognizable rate-limit `403` responses retry deterministically.
- When GitHub supplies `Retry-After` or `X-RateLimit-Reset`, Wrkr uses that observed window before retrying.
- Exhausted hosted throttling keeps exit code `1` but emits JSON error code `rate_limited` so automation can distinguish retryable wait conditions from generic runtime failure.
- Repeated transient failures trigger connector cooldown degradation; scan surfaces this in partial-result output (`source_degraded=true` when applicable).
- In `--json` org mode, retry/cooldown/resume/completion operator progress is emitted to stderr only; stdout remains reserved for the final JSON payload.
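The interplay between an observed rate-limit window and bounded jittered backoff can be sketched as follows. This is an illustration of the general technique, not Wrkr's connector code: the function names, the `base` and `cap` defaults, and the full-jitter formula are assumptions. It handles only the integer-seconds form of `Retry-After` and treats `X-RateLimit-Reset` as UTC epoch seconds, as GitHub documents it.

```python
import random
from typing import Callable, Mapping, Optional


def observed_wait_seconds(headers: Mapping[str, str], now_epoch: float) -> Optional[float]:
    """Return the wait window the server advertised, or None if absent.

    Header lookup is exact-case here; a real client would normalize names.
    """
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())  # seconds to wait
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None and reset.strip().isdigit():
        # Epoch seconds at which the limit resets; never return a negative wait.
        return max(0.0, float(reset.strip()) - now_epoch)
    return None


def backoff_seconds(
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Bounded jittered backoff: rng() scaled by min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def next_delay(headers: Mapping[str, str], attempt: int, now_epoch: float) -> float:
    """Prefer the observed window; fall back to jittered backoff."""
    observed = observed_wait_seconds(headers, now_epoch)
    return observed if observed is not None else backoff_seconds(attempt)
```

Passing `now_epoch` and `rng` explicitly keeps the sketch testable without touching the clock or the global random state.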
SARIF contract:
- `--sarif` emits a SARIF `2.1.0` report from scan findings.
- `--sarif-path` selects output path (default `wrkr.sarif`).
- SARIF runs include `properties.source_privacy` when source-retention metadata is available.
- The core `scan --json` contract remains backward-compatible; SARIF is additive.
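A consumer of the SARIF output might check the version and the optional per-run privacy property before ingesting results. The sketch below relies only on the standard SARIF 2.1.0 layout (top-level `version` and `runs`, per-run `results` and `properties`); `summarize_sarif` is a hypothetical helper, and the content of `source_privacy` is treated as opaque because its shape is not described here.

```python
from typing import Any


def summarize_sarif(doc: dict[str, Any]) -> dict[str, Any]:
    """Summarize a parsed SARIF document without interpreting its findings."""
    runs = doc.get("runs", [])
    return {
        "version": doc.get("version"),
        "run_count": len(runs),
        "result_count": sum(len(run.get("results", [])) for run in runs),
        # None for runs that carry no source-retention metadata.
        "source_privacy": [
            run.get("properties", {}).get("source_privacy") for run in runs
        ],
    }
```

In practice the input would come from `json.load` on the file written to `--sarif-path` (default `wrkr.sarif`).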
Approved-tools policy example: docs/examples/approved-tools.v1.yaml.
Production target policy files are YAML and schema-validated (schemas/v1/policy/production-targets.schema.json), with exact/prefix matching only. Example: docs/examples/production-targets.v1.yaml.
Production write rule:
`production_write = has_any(write_permissions) AND matches_any_production_target`
Safe claim rule:
- `write_capable` is always available from the privilege budget and `agent_privilege_map`.
- `production_write` is safe to claim only when `--production-targets` is configured and valid.
- When production targets are missing or invalid, public/report wording must stay at `write_capable` and only expose production-target status, not a production-write count.
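The production write rule and the safe claim rule can be sketched together. This is a minimal illustration under stated assumptions, not Wrkr's implementation: `ProductionTargets`, `production_write`, and `safe_claim` are hypothetical names, a policy of `None` stands in for missing or invalid production targets, and the `"none"` label for paths without write permissions is invented for the example. Only exact and prefix matching are modeled, as the policy description states.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProductionTargets:
    """Hypothetical in-memory form of a valid production-targets policy."""
    exact: frozenset = frozenset()
    prefixes: tuple = ()

    def matches(self, target: str) -> bool:
        # Exact/prefix matching only; no globbing or regular expressions.
        return target in self.exact or any(target.startswith(p) for p in self.prefixes)


def production_write(
    write_permissions: Sequence[str],
    targets: Sequence[str],
    policy: Optional[ProductionTargets],
) -> Optional[bool]:
    """has_any(write_permissions) AND matches_any_production_target.

    Returns None when the policy is missing or invalid, so no claim is made.
    """
    if policy is None:
        return None
    return bool(write_permissions) and any(policy.matches(t) for t in targets)


def safe_claim(
    write_permissions: Sequence[str],
    targets: Sequence[str],
    policy: Optional[ProductionTargets],
) -> str:
    """Return the strongest wording that is safe to publish."""
    if not write_permissions:
        return "none"  # illustrative label for paths with no write permission
    if production_write(write_permissions, targets, policy):
        return "production_write"
    return "write_capable"
```

Returning `None` rather than `False` when the policy is unavailable keeps "not a production write" distinct from "cannot tell", which is the distinction the safe claim rule protects.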
Every discovered entity now emits discovery_method: static in both findings and inventory.tools for deterministic v1 schema compatibility.
Saved lifecycle-bearing identities written beside scan state are intentionally narrower: real tool, agent, CI, skill, and MCP surfaces only. Posture/bookkeeping findings such as secret_presence, source_discovery, policy_*, and parse_error remain in findings/risk surfaces only.
--explain also emits short compliance rollup lines derived from the same machine-readable compliance_summary contract.
Emerging discovery surfaces are static-only in default deterministic mode:
- WebMCP detection uses repository HTML/JS/route files only.
- A2A detection uses repo-hosted agent-card JSON files only.
- MCP gateway posture is derived from local config files only.
- Non-human execution identities are derived from static workflow/config signals only.
- No live endpoint probing is performed by default.
Wrkr stays in the See boundary: it inventories and scores tools plus agents from files and CI declarations, but it does not claim runtime observation, enforce runtime side effects, or execute agent workflows.
Wrkr also does not assess package or MCP-server vulnerabilities in this path; use dedicated scanners such as Snyk for that class of assessment.
Gait is optional interoperability for control-layer decisions, not a prerequisite for scan.
Custom extension detectors are loaded from .wrkr/detectors/extensions.json when present in scanned repositories. Their findings remain on additive finding and risk surfaces only by default; they do not create authoritative inventory, lifecycle, regress, or action-path state unless a future explicit contract says so. See docs/extensions/detectors.md.
Canonical state and artifact lifecycle: docs/state_lifecycle.md.