GitHub Export

Export any GitHub repository as a sanitized archive. Strip specified paths, scan for secrets and PII, redact findings in place, and receive a presigned download URL — all free, with no infrastructure to manage.

GitHub Export runs only as a remote MCP tool. It has no REST gateway endpoint and no ctx.sapiom.* client method, so it cannot be called from a workflow step today. An MCP client (agent) drives it by calling the tools and reading their text responses; both tools return text, not JSON.

Step 1 — Enqueue the export:

Call sapiom_github_export with your parameters. The tool returns text like:

GitHub export job created.
**Job ID:** ghe_a1b2c3d4e5f6
**Status:** queued
Poll for status using `sapiom_github_export_status` with this job_id.
The export typically takes 10-15 seconds.

The agent reads the **Job ID:** line to get the job ID for polling.
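
For a client that handles the tool text programmatically, the job ID can be pulled out with a small regular expression. This is a minimal sketch that assumes the response keeps the `**Job ID:** ghe_…` line shape shown above; the function name `extract_job_id` is illustrative and not part of any Sapiom SDK.

```python
import re

# Assumption: the enqueue response contains a line of the form "**Job ID:** ghe_...".
JOB_ID_LINE = re.compile(r"^\*\*Job ID:\*\*[ \t]*(ghe_\w+)[ \t]*$", re.MULTILINE)


def extract_job_id(tool_text: str) -> str:
    """Return the job ID from the enqueue tool text, or raise if it is missing."""
    match = JOB_ID_LINE.search(tool_text)
    if match is None:
        raise ValueError("no **Job ID:** line found in tool response")
    return match.group(1)


sample = """GitHub export job created.
**Job ID:** ghe_a1b2c3d4e5f6
**Status:** queued
Poll for status using `sapiom_github_export_status` with this job_id.
The export typically takes 10-15 seconds."""

print(extract_job_id(sample))  # ghe_a1b2c3d4e5f6
```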

Step 2 — Poll until done (every 2–5 seconds):

Call sapiom_github_export_status with { job_id: "ghe_a1b2c3d4e5f6" }. Keep polling until the **Status:** line shows a terminal value: complete, failed, blocked, or expired.

While still running, the tool returns:

**Job ID:** ghe_a1b2c3d4e5f6
**Status:** processing
Export is still processing. Poll again in a few seconds.
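
The polling step can be sketched as a loop that reads the `**Status:**` line and stops on a terminal value. Everything about the transport is an assumption here: `call_status` is a hypothetical callable standing in for however your MCP client invokes `sapiom_github_export_status`, and `max_polls` is an arbitrary safety cap rather than a documented limit.

```python
import re
import time
from typing import Callable, Tuple

TERMINAL_STATUSES = frozenset({"complete", "failed", "blocked", "expired"})
STATUS_LINE = re.compile(r"^\*\*Status:\*\*[ \t]*(\w+)", re.MULTILINE)


def poll_export(
    call_status: Callable[[str], str],  # hypothetical: job_id -> tool text
    job_id: str,
    interval_s: float = 3.0,            # inside the documented 2-5 second range
    max_polls: int = 40,                # assumed safety cap, not a documented limit
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Poll until the Status line is terminal; return (status, last tool text)."""
    for _ in range(max_polls):
        text = call_status(job_id)
        match = STATUS_LINE.search(text)
        if match is None:
            # A response without a Status line (for example an error text)
            # is treated as a failure rather than retried.
            raise RuntimeError(f"unexpected tool response: {text!r}")
        status = match.group(1)
        if status in TERMINAL_STATUSES:
            return status, text
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} not terminal after {max_polls} polls")


# Demo with canned responses instead of a real MCP call.
PROCESSING = (
    "**Job ID:** ghe_a1b2c3d4e5f6\n**Status:** processing\n"
    "Export is still processing. Poll again in a few seconds."
)
COMPLETE = "**Job ID:** ghe_a1b2c3d4e5f6\n**Status:** complete\n**Format:** tar.gz"
canned = iter([PROCESSING, PROCESSING, COMPLETE])
status, _ = poll_export(lambda job_id: next(canned), "ghe_a1b2c3d4e5f6",
                        sleep=lambda s: None)
print(status)  # complete
```

Injecting `sleep` keeps the loop testable without real delays.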

Step 3 — Read the download URL (on complete):

When **Status:** complete, the tool returns:

**Job ID:** ghe_a1b2c3d4e5f6
**Status:** complete
**Format:** tar.gz
**Archive Size:** 4.2 MB
**Download URL:** https://…
**Expires:** 2026-…
**Redactions:** 2 secrets were replaced with `REDACTED` in the archive.
**Patterns:** AWS_SECRET_ACCESS_KEY, GITHUB_TOKEN
**Files:** .env, config/secrets.yaml

The agent reads the **Download URL:** line and fetches the archive directly.
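
Because every field in the completed response follows the same `**Name:** value` shape, a generic parser can turn the text into a dictionary. This is a sketch under that assumption only; `parse_fields` and `split_list` are illustrative helper names, and the elided URL and expiry values are left exactly as they appear in the sample above.

```python
import re
from typing import Dict, List

# Matches lines such as "**Status:** complete"; [ \t]* avoids spilling onto the next line.
FIELD_LINE = re.compile(r"^\*\*([^:*]+):\*\*[ \t]*(.*)$", re.MULTILINE)


def parse_fields(tool_text: str) -> Dict[str, str]:
    """Map each '**Name:** value' line to a dictionary entry."""
    return {name.strip(): value.strip() for name, value in FIELD_LINE.findall(tool_text)}


def split_list(value: str) -> List[str]:
    """Split a comma-separated field such as Patterns or Files."""
    return [item.strip() for item in value.split(",") if item.strip()]


complete_text = """**Job ID:** ghe_a1b2c3d4e5f6
**Status:** complete
**Format:** tar.gz
**Archive Size:** 4.2 MB
**Download URL:** https://…
**Expires:** 2026-…
**Redactions:** 2 secrets were replaced with `REDACTED` in the archive.
**Patterns:** AWS_SECRET_ACCESS_KEY, GITHUB_TOKEN
**Files:** .env, config/secrets.yaml"""

fields = parse_fields(complete_text)
print(fields["Status"])                # complete
print(split_list(fields["Patterns"]))  # ['AWS_SECRET_ACCESS_KEY', 'GITHUB_TOKEN']
```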

GitHub Export is an asynchronous job backed by BullMQ:

  1. Enqueue: sapiom_github_export creates a job and immediately returns a job_id (format: ghe_…) with status queued. The enqueue call typically takes under 2 seconds.
  2. Process: A background worker clones the repository at the specified ref and strips any paths matching strip_patterns.
  3. Scan: Depending on scan_level, the archive is scanned for secrets and PII.
  4. Complete: The sanitized archive is stored in Cloudflare R2 and a presigned download URL (1h TTL) is minted.
  5. Poll: Call sapiom_github_export_status with the job_id every 2–5 seconds. Stop when the **Status:** line shows a terminal value: complete, failed, blocked, or expired. The typical end-to-end time is 10–15 seconds.

The scan_level parameter controls what the scanner does:

| Level | Behavior |
| --- | --- |
| off | No scanning; the archive is exported as-is after stripping |
| standard | Regex-based secret and PII pattern matching (default) |
| deep | Adds a Claude agent scan on top of regex (Phase 3, not yet live) |

The auto_remediate parameter controls what happens when findings are detected:

  • true (default): Findings are replaced with the literal string REDACTED in place, preserving syntactic validity. The export completes with status complete and the redaction summary is included in the status response.
  • false (strict mode): Any critical or high finding blocks the export. Status becomes blocked and no archive is produced. Re-run with auto_remediate: true or strip the offending paths first.
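
The difference between the two modes can be illustrated with a toy scanner. This is not the service's scanner: the single pattern below is a hypothetical stand-in, every match is treated as a critical finding, and the names `scan_text` and `ExportBlocked` are invented for the example.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for illustration only; the real rule set is not documented here.
TOY_PATTERNS = {
    "AWS_SECRET_ACCESS_KEY": re.compile(r"(?<=AWS_SECRET_ACCESS_KEY=)\S+"),
}


class ExportBlocked(Exception):
    """Raised in strict mode when a finding is present."""


def scan_text(text: str, auto_remediate: bool = True) -> Tuple[str, List[str]]:
    """Return (possibly redacted text, names of patterns that matched)."""
    findings = [name for name, pattern in TOY_PATTERNS.items() if pattern.search(text)]
    if findings and not auto_remediate:
        # Strict mode: no output is produced at all.
        raise ExportBlocked(f"Critical finding detected: {findings[0]}")
    for name in findings:
        # Replace only the secret value so the surrounding syntax stays valid.
        text = TOY_PATTERNS[name].sub("REDACTED", text)
    return text, findings


redacted, found = scan_text("AWS_SECRET_ACCESS_KEY=abc123\nPORT=80")
print(redacted.splitlines()[0])  # AWS_SECRET_ACCESS_KEY=REDACTED
print(found)                     # ['AWS_SECRET_ACCESS_KEY']
```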

| Status | Meaning |
| --- | --- |
| queued | Job accepted, waiting to be picked up by a worker |
| processing | Worker is cloning and stripping the repository |
| scanning | Archive is being scanned for secrets/PII |
| complete | Archive ready; the **Download URL:** line is present in the response |
| failed | An error occurred; the **Reason:** line describes the cause |
| blocked | Scan found critical/high findings and auto_remediate was false |
| expired | The R2 archive was deleted by lifecycle policy; re-export required |
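
One way for a client to encode these statuses is an enum plus a lookup of what to do next. The sketch below is only a convenience layer over the status strings listed above; `ExportStatus` and `next_action` are hypothetical names, and the action labels are this example's own.

```python
from enum import Enum


class ExportStatus(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    SCANNING = "scanning"
    COMPLETE = "complete"
    FAILED = "failed"
    BLOCKED = "blocked"
    EXPIRED = "expired"


TERMINAL = frozenset({
    ExportStatus.COMPLETE,
    ExportStatus.FAILED,
    ExportStatus.BLOCKED,
    ExportStatus.EXPIRED,
})


def next_action(status: ExportStatus) -> str:
    """Suggest a client-side follow-up for a given status."""
    if status not in TERMINAL:
        return "poll_again"
    return {
        ExportStatus.COMPLETE: "download_archive",
        ExportStatus.FAILED: "inspect_reason",
        ExportStatus.BLOCKED: "rerun_with_auto_remediate_or_strip_paths",
        ExportStatus.EXPIRED: "create_new_export",
    }[status]


print(next_action(ExportStatus("scanning")))  # poll_again
print(next_action(ExportStatus("expired")))   # create_new_export
```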

Completed archives are stored in Cloudflare R2 with a lifecycle policy. Once the archive is deleted by R2, polling sapiom_github_export_status returns status expired. There is no way to recover an expired archive — create a new export with sapiom_github_export.

The presigned download URL has a 1-hour TTL and is auto-regenerated on each status check if it has expired (as long as the underlying R2 object still exists).
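
Since a status check refreshes an expired URL, a cautious client can simply re-check the status right before downloading instead of tracking the TTL itself. The following sketch assumes the `**Name:** value` text format shown earlier; `call_status` is a hypothetical stand-in for the MCP tool call, and the URL in the demo is a placeholder, not a real archive location.

```python
import re
from typing import Callable

_FIELD = re.compile(r"^\*\*([^:*]+):\*\*[ \t]*(.*)$", re.MULTILINE)


class ArchiveExpired(Exception):
    """The underlying archive is gone; a new export is required."""


def fresh_download_url(call_status: Callable[[str], str], job_id: str) -> str:
    """Re-check status and return the download URL from that response."""
    fields = {k.strip(): v.strip() for k, v in _FIELD.findall(call_status(job_id))}
    status = fields.get("Status", "")
    if status == "expired":
        raise ArchiveExpired(f"job {job_id} expired; create a new export")
    if status != "complete":
        raise RuntimeError(f"job {job_id} is not complete (status: {status or 'unknown'})")
    return fields["Download URL"]


# Demo with canned text; example.invalid is a placeholder host.
canned_text = (
    "**Job ID:** ghe_a1b2c3d4e5f6\n**Status:** complete\n"
    "**Download URL:** https://example.invalid/archive.tar.gz"
)
print(fresh_download_url(lambda job_id: canned_text, "ghe_a1b2c3d4e5f6"))
```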

Archives are stored in Cloudflare R2. Secret and PII scanning uses regex-based pattern matching (standard level). Exports are enqueued via BullMQ.

GitHub Export is reached through the Sapiom MCP server’s sapiom_github_export and sapiom_github_export_status tools — there is no REST gateway endpoint and no ctx.sapiom.* client method. The full parameters are documented below.

sapiom_github_export enqueues a new export job. It requires a GitHub PAT or App installation token with repo read access and immediately returns tool text containing a **Job ID:** line; the export itself runs asynchronously.

Key parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| github_token | string | Yes | | GitHub PAT or App installation token with repo read access |
| repo | string | Yes | | Full repo name in owner/repo format |
| ref | string | No | HEAD | Branch, tag, or commit SHA to export |
| format | string | No | tar.gz | Archive format: tar.gz or zip |
| strip_patterns | string[] | No | [] | Glob patterns for paths to strip before scanning |
| scan_level | string | No | standard | off, standard, or deep |
| auto_remediate | boolean | No | true | Redact findings in place (true) or block on findings (false) |
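
A client may want to validate arguments locally before spending one of its hourly exports. The helper below is a sketch that mirrors the parameter table; `build_export_args` is a hypothetical name, the checks are client-side conveniences rather than the server's own validation, and the token and repository values in the demo are placeholders.

```python
import re
from typing import Any, Dict, List, Optional

REPO_RE = re.compile(r"^[^/\s]+/[^/\s]+$")  # owner/repo
FORMATS = {"tar.gz", "zip"}
SCAN_LEVELS = {"off", "standard", "deep"}


def build_export_args(
    github_token: str,
    repo: str,
    ref: str = "HEAD",
    archive_format: str = "tar.gz",  # sent as "format"; renamed to avoid the builtin
    strip_patterns: Optional[List[str]] = None,
    scan_level: str = "standard",
    auto_remediate: bool = True,
) -> Dict[str, Any]:
    """Build the argument object for sapiom_github_export with basic checks."""
    if not github_token:
        raise ValueError("github_token is required")
    if not REPO_RE.match(repo):
        raise ValueError("repo must be in owner/repo format")
    if archive_format not in FORMATS:
        raise ValueError("format must be tar.gz or zip")
    if scan_level not in SCAN_LEVELS:
        raise ValueError("scan_level must be off, standard, or deep")
    return {
        "github_token": github_token,
        "repo": repo,
        "ref": ref,
        "format": archive_format,
        "strip_patterns": list(strip_patterns or []),
        "scan_level": scan_level,
        "auto_remediate": auto_remediate,
    }


args = build_export_args("<your-token>", "octocat/hello-world",
                         strip_patterns=["docs/**"])
print(args["format"], args["scan_level"], args["strip_patterns"])
```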

Response (on enqueue) — tool text:

GitHub export job created.
**Job ID:** ghe_a1b2c3d4e5f6
**Status:** queued
Poll for status using `sapiom_github_export_status` with this job_id.
The export typically takes 10-15 seconds.

sapiom_github_export_status checks the status of an export job. It returns tool text with the current **Status:** and, when complete, the presigned download URL and redaction summary. Poll until the status is one of the terminal values: complete, failed, blocked, or expired.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| job_id | string | Yes | The job ID returned by sapiom_github_export |

Response when complete — tool text:

**Job ID:** ghe_a1b2c3d4e5f6
**Status:** complete
**Format:** tar.gz
**Archive Size:** 4.2 MB
**Download URL:** https://…
**Expires:** 2026-…
**Redactions:** 2 secrets were replaced with `REDACTED` in the archive.
**Patterns:** AWS_SECRET_ACCESS_KEY, GITHUB_TOKEN
**Files:** .env, config/secrets.yaml

Response while processing — tool text:

**Job ID:** ghe_a1b2c3d4e5f6
**Status:** processing
Export is still processing. Poll again in a few seconds.

Response when failed or blocked — tool text:

**Job ID:** ghe_a1b2c3d4e5f6
**Status:** blocked
**Reason:** Critical finding detected: AWS_SECRET_ACCESS_KEY

Response when expired — tool text:

**Job ID:** ghe_a1b2c3d4e5f6
**Status:** expired
Archive has expired and is no longer available. Please create a new export with `sapiom_github_export`.

| Condition | How it surfaces |
| --- | --- |
| No valid tenant identity | The MCP tool returns an error result (401 from the underlying auth guard). Ensure the MCP server is configured with a valid Sapiom API key |
| Rate limit exceeded | The tool returns the error text "Rate limit exceeded: max 50 exports per hour." This is not an HTTP 429 that the MCP client catches as an exception; the agent receives it as a tool error response |
| Job not found | The tool returns the error text "Export job not found: <id>" |

| Limit | Value |
| --- | --- |
| Max exports per hour per tenant | 50 |
| Presigned download URL TTL | 1 hour (auto-regenerated on status check) |
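
Because errors arrive as tool text rather than exceptions, a client has to recognize them by their wording, and it can avoid hitting the hourly cap by budgeting calls locally. The sketch below assumes the two error prefixes quoted above stay stable; `classify_tool_error` and `ExportBudget` are hypothetical helpers, and the sliding window is this example's choice, since the service's own windowing is not documented here.

```python
from collections import deque
from typing import Deque


def classify_tool_error(text: str) -> str:
    """Map known error texts to a short label; anything else is 'unknown'."""
    if text.startswith("Rate limit exceeded"):
        return "rate_limited"
    if text.startswith("Export job not found:"):
        return "job_not_found"
    return "unknown"


class ExportBudget:
    """Client-side sliding-window counter (assumed model, 50 calls per hour)."""

    def __init__(self, max_calls: int = 50, window_s: float = 3600.0) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self._stamps: Deque[float] = deque()

    def try_acquire(self, now: float) -> bool:
        """Record a call at time `now` (seconds) if the budget allows it."""
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()  # drop calls that left the window
        if len(self._stamps) >= self.max_calls:
            return False
        self._stamps.append(now)
        return True


print(classify_tool_error("Rate limit exceeded: max 50 exports per hour."))  # rate_limited

budget = ExportBudget(max_calls=2, window_s=10.0)
print([budget.try_acquire(t) for t in (0.0, 1.0, 2.0, 10.0)])  # [True, True, False, True]
```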

GitHub Export is a control-plane service and is not charged per call. There are no micropayments, no x402 guards, and no balance deduction for any export operation. The only gate is authentication (401 if no valid tenant identity).