

Sigil Bot

An autonomous scanner that monitors PyPI, npm, ClawHub, and GitHub for new and updated packages, scans them with all eight Sigil phases, and publishes results to the public scan database. Runs 24/7 — no human input required.

What Sigil Bot does

Sigil Bot watches public package registries for newly published and updated packages. When a new package appears, the bot downloads it into quarantine, runs all eight scan phases, stores the results, and publishes a report page at sigilsec.ai/scans.

                ┌──────────────────────┐
                │     SIGIL BOT        │
                │                      │
                │  Monitors registries │
                │  Downloads packages  │
                │  Runs Sigil scans    │
                │  Stores results      │
                └──────────┬───────────┘
                           │
          ┌────────────────┼────────────────┐
          │                │                │
          ▼                ▼                ▼
   ┌────────────┐  ┌────────────┐  ┌────────────┐
   │ Scan DB    │  │ Badges     │  │ Threat     │
   │ /scans/*   │  │ /badge/*   │  │ Feed       │
   │ pages      │  │ SVGs       │  │ RSS + API  │
   └────────────┘  └────────────┘  └────────────┘

Public scan database

Every scanned package gets a report page. Each page is an SEO surface that AI models and search engines can cite.

Real-time threat feed

New scans are published as they happen via an RSS feed, an API endpoint, and alerts for HIGH RISK and CRITICAL RISK findings.

Badge generation

Automatically generates and caches SVG badges for every scanned package. Badges update when packages are rescanned.

Downstream integrations

The GitHub App, MCP server, and CLI threat intelligence all consume scan data produced by the bot.

Monitored registries

Four registries are monitored continuously. Each has a dedicated watcher process with optimised polling for that registry's API.

PyPI

Polls every 5 min

RSS feeds for new packages and version updates, plus the changelog serial API for incremental event tracking. Packages are downloaded via pip download --no-deps — no code is installed or executed.

Feed:    pypi.org/rss/packages.xml + changelog serial API
Scope:   AI ecosystem packages (langchain, openai, anthropic, mcp, agent, etc.)
Volume:  ~200–400 relevant packages/day
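A minimal sketch of how a watcher might extract project names from the packages RSS feed. The sample XML below is illustrative, not a real feed payload:

```python
import xml.etree.ElementTree as ET

# Abridged, illustrative sample of the pypi.org/rss/packages.xml format.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>PyPI recent packages</title>
    <item>
      <title>langchain-widgets added to PyPI</title>
      <link>https://pypi.org/project/langchain-widgets/</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

def new_packages(rss_text: str) -> list[str]:
    """Extract the project name from each <item> link in the RSS feed."""
    root = ET.fromstring(rss_text)
    names = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        # Links look like https://pypi.org/project/<name>/
        names.append(link.rstrip("/").rsplit("/", 1)[-1])
    return names

print(new_packages(SAMPLE_RSS))  # ['langchain-widgets']
```

The changelog serial API covers the gap RSS misses (version updates between polls); the same name-extraction step applies to both sources.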

npm

Polls every 60 sec

CouchDB _changes stream from the npm registry replicate. Packages in @langchain/*, @anthropic/*, @openai/*, and @modelcontextprotocol/* scopes are scanned regardless of keyword matches.

Feed:    replicate.npmjs.com/registry/_changes
Scope:   AI ecosystem packages + all MCP-related scopes
Volume:  ~300–600 relevant packages/day
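The scope rule can be sketched as a filter over `_changes` entries. This is a simplification (the real filter also inspects descriptions and keywords, per the filtering section later in this page):

```python
import json

# Scopes scanned unconditionally, regardless of keyword matches.
ALWAYS_SCAN_SCOPES = ("@langchain/", "@anthropic/", "@openai/", "@modelcontextprotocol/")

def is_relevant(change_line: str, keywords: set[str]) -> bool:
    """Decide whether a _changes entry should be enqueued for scanning."""
    doc = json.loads(change_line)
    name = doc.get("id", "")
    if name.startswith(ALWAYS_SCAN_SCOPES):
        return True  # MCP/AI scopes bypass keyword filtering entirely
    return any(kw in name for kw in keywords)

print(is_relevant('{"id": "@modelcontextprotocol/server-foo"}', set()))  # True
print(is_relevant('{"id": "left-pad"}', {"mcp", "langchain"}))           # False
```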

ClawHub

Polls every 6 hours

REST API paginated by update time. All skills are scanned — no keyword filtering needed. The entire registry is relevant because every skill has direct access to the user's environment.

Feed:    clawhub.ai/api/v1/skills?sort=updated
Scope:   All skills (no filtering)
Volume:  ~50–100 new/updated skills per day
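A sketch of walking the updated-sorted pagination until results fall behind the last sweep. The response shape (`skills`, `updated_at`, `has_next`) is an assumption; `fetch_page` stands in for an HTTP GET of the endpoint above:

```python
from typing import Callable, Iterator

def iter_updated_skills(fetch_page: Callable[[int], dict], since: str) -> Iterator[dict]:
    """Yield skills updated after `since`, newest-first, stopping early."""
    page = 1
    while True:
        body = fetch_page(page)
        for skill in body.get("skills", []):
            if skill["updated_at"] <= since:
                return  # sorted newest-first, so nothing older is relevant
            yield skill
        if not body.get("has_next"):
            return
        page += 1

# Stubbed two-page response for illustration.
pages = {
    1: {"skills": [{"name": "a", "updated_at": "2024-02-02"},
                   {"name": "b", "updated_at": "2024-02-01"}], "has_next": True},
    2: {"skills": [{"name": "c", "updated_at": "2024-01-01"}], "has_next": False},
}
print([s["name"] for s in iter_updated_skills(pages.__getitem__, "2024-01-15")])  # ['a', 'b']
```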

GitHub (MCP Servers)

Sweeps every 12 hours

GitHub Search API for repositories matching MCP server patterns, plus the Events API for push events to known repos between sweeps. Repositories are cloned with git clone --depth 1 into quarantine.

Feed:    api.github.com/search/repositories + /events
Scope:   MCP server repos (>0 stars or >1 commit)
Volume:  ~20–50 new/updated repos per day
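Between sweeps, the Events API step reduces to picking out push events for repos already in the index. A minimal sketch (repo names are made up; the `type`/`repo.name` fields match the Events API payload shape):

```python
def pushes_to_known_repos(events: list[dict], known: set[str]) -> set[str]:
    """From an Events API page, pick tracked repos that received pushes."""
    return {
        e["repo"]["name"]
        for e in events
        if e.get("type") == "PushEvent" and e["repo"]["name"] in known
    }

events = [
    {"type": "PushEvent", "repo": {"name": "acme/mcp-server-demo"}},
    {"type": "WatchEvent", "repo": {"name": "acme/mcp-server-demo"}},
    {"type": "PushEvent", "repo": {"name": "other/unrelated"}},
]
print(pushes_to_known_repos(events, {"acme/mcp-server-demo"}))
# {'acme/mcp-server-demo'}
```

Each matching repo is then re-cloned with `git clone --depth 1` into quarantine and rescanned.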

Scan pipeline

Every scan follows the same five-stage pipeline: watch, queue, scan, store, publish.

WATCHER ──▶ QUEUE ──▶ SCANNER ──▶ STORE ──▶ PUBLISHER
 Poll feeds   Redis     Download    Postgres   Report page
 Deduplicate  Priority  Extract     Findings   Badge cache
 Filter       Retry     Sigil scan  Metadata   RSS feed
 Enqueue      Backoff   All phases             Alerts

Deduplication

Key: {ecosystem}:{name}:{version}:{content_hash}. If the exact same content has been scanned, it's skipped. If the version is the same but the content hash differs (re-upload), it's rescanned.
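A sketch of how the key might be derived; the hash algorithm and truncation length are assumptions, not documented behavior:

```python
import hashlib

def dedup_key(ecosystem: str, name: str, version: str, archive_bytes: bytes) -> str:
    """Build the {ecosystem}:{name}:{version}:{content_hash} dedup key."""
    content_hash = hashlib.sha256(archive_bytes).hexdigest()[:16]  # truncation assumed
    return f"{ecosystem}:{name}:{version}:{content_hash}"

k1 = dedup_key("pypi", "langchain", "0.3.1", b"original tarball")
k2 = dedup_key("pypi", "langchain", "0.3.1", b"silently re-uploaded tarball")
print(k1 != k2)  # True: same version, different content, so the package is rescanned
```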

Priority levels

Priority   SLA         Criteria
critical   Immediate   Typosquatting patterns, suspicious new publisher names
high       5 min       MCP scopes, ClawHub skills, popular packages with new versions
normal     30 min      Everything else matching AI keyword filters
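The triage logic can be sketched as a simple cascade. The attribute names (`is_typosquat`, `mcp_scope`, and so on) are illustrative, not the bot's actual schema:

```python
def priority(pkg: dict) -> str:
    """Map package attributes to a queue priority tier."""
    if pkg.get("is_typosquat") or pkg.get("suspicious_publisher"):
        return "critical"   # immediate
    if pkg.get("ecosystem") == "clawhub" or pkg.get("mcp_scope") or pkg.get("popular"):
        return "high"       # 5 min SLA
    return "normal"         # 30 min SLA

print(priority({"is_typosquat": True}))    # critical
print(priority({"ecosystem": "clawhub"}))  # high
print(priority({"ecosystem": "pypi"}))     # normal
```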

Scan isolation

Each scan runs in a fresh temporary directory. No network access during the scan — Sigil is static analysis only. No code is installed or executed. The quarantine directory is destroyed after scanning.
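A minimal sketch of the isolation contract using a context-managed temp directory, which guarantees cleanup even if the scan raises. The function names are illustrative:

```python
import tempfile
from pathlib import Path

def scan_in_quarantine(archive: bytes, scan) -> dict:
    """Run a scan in a fresh temp dir; the directory is destroyed on exit."""
    with tempfile.TemporaryDirectory(prefix="sigil-quarantine-") as tmp:
        target = Path(tmp) / "package.tar.gz"
        target.write_bytes(archive)  # extraction step omitted for brevity
        return scan(target)          # static analysis only; nothing is executed

# Demo: the scan callback sees the file; the directory is gone afterwards.
result = scan_in_quarantine(b"fake archive", lambda p: {"path_existed": p.exists()})
print(result)  # {'path_existed': True}
```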

Typosquatting detection

New packages with names within edit distance 2 of popular AI packages are automatically boosted to critical priority. This catches common squatting patterns before developers encounter them.

text
Target packages monitored for typosquats:
  langchain, openai, anthropic, transformers,
  huggingface, crewai, autogen, llamaindex,
  pinecone, chromadb, fastapi, streamlit

Detection patterns:
  Character substitution: langch4in, openal
  Character insertion:    langchainn, openaai
  Character deletion:     langchai, opena
  Transposition:          langchian, openia

Flagged packages receive an additional finding in the Provenance phase noting the name similarity.
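All four pattern classes above fall out of a plain Levenshtein check (a transposition counts as two single-character edits, still within the threshold). A sketch against a shortened target list:

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert/delete/substitute, each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

TARGETS = ["langchain", "openai", "anthropic", "transformers"]  # abridged list

def squat_targets(name: str) -> list[str]:
    """Popular names within edit distance 2 of a new package name.
    Distance 0 is excluded: the legitimate package is not its own squat."""
    return [t for t in TARGETS if 0 < edit_distance(name, t) <= 2]

print(squat_targets("langch4in"))  # ['langchain'] (substitution, distance 1)
print(squat_targets("openia"))     # ['openai'] (transposition, distance 2)
```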

Threat feed

Scan results are published to multiple output channels for downstream consumption.

RSS feed

Standard RSS 2.0 feed at sigilsec.ai/feed.xml. Contains the latest 100 scan results. Supports filtered variants:

text
All scans:      sigilsec.ai/feed.xml
Threats only:   sigilsec.ai/feed.xml?verdict=high_risk,critical_risk
ClawHub only:   sigilsec.ai/feed.xml?ecosystem=clawhub
PyPI only:      sigilsec.ai/feed.xml?ecosystem=pypi
npm only:       sigilsec.ai/feed.xml?ecosystem=npm

API endpoint

bash
GET /api/v1/feed?ecosystem={eco}&verdict={v}&limit={n}&since={iso_datetime}

JSON array of recent scans. Same filtering as RSS. This is what the MCP server queries, the GitHub App looks up, and third-party integrations consume.
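Building the query URL is straightforward; a sketch using only the parameters documented above:

```python
from urllib.parse import urlencode

def feed_url(base: str = "https://sigilsec.ai", **filters) -> str:
    """Build a feed query URL, dropping unset parameters."""
    params = {k: v for k, v in filters.items() if v is not None}
    return f"{base}/api/v1/feed" + ("?" + urlencode(params) if params else "")

url = feed_url(ecosystem="npm", verdict="critical_risk", limit=20)
print(url)
# https://sigilsec.ai/api/v1/feed?ecosystem=npm&verdict=critical_risk&limit=20
```

Polling clients would typically pass `since` with the timestamp of their last successful fetch to receive only new scans.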

Alerts

HIGH RISK and CRITICAL RISK findings trigger alerts to subscribed webhook endpoints. Only findings with a risk score of 25 or above generate alerts.
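The threshold rule amounts to a one-line filter over findings (the field name `risk_score` is assumed from context):

```python
ALERT_THRESHOLD = 25  # minimum risk score that triggers a webhook alert

def alertable(findings: list[dict]) -> list[dict]:
    """Findings that should be pushed to subscribed webhook endpoints."""
    return [f for f in findings if f["risk_score"] >= ALERT_THRESHOLD]

findings = [{"id": "F1", "risk_score": 40}, {"id": "F2", "risk_score": 12}]
print([f["id"] for f in alertable(findings)])  # ['F1']
```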

Scan attestations

Every scan produced by Sigil Bot is cryptographically signed and recorded in a public transparency log. This lets anyone verify that a scan result is genuine and untampered.

Ed25519 signatures

Each scan is wrapped in a DSSE envelope and signed with an Ed25519 key. The public key is published at /.well-known/sigil-verify.json.
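Per the DSSE specification, what the key actually signs is not the raw payload but its Pre-Authentication Encoding (PAE), which binds the payload to its type. A sketch of the encoding (the payload contents are illustrative; real attestations carry in-toto Statements):

```python
def dsse_pae(payload_type: str, payload: bytes) -> bytes:
    """Pre-Authentication Encoding from the DSSE spec: the exact byte
    string the Ed25519 key signs ('DSSEv1 <len> <type> <len> <payload>')."""
    t = payload_type.encode()
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)

pae = dsse_pae("application/vnd.in-toto+json", b'{"scan_id": "demo"}')
print(pae)
# b'DSSEv1 28 application/vnd.in-toto+json 19 {"scan_id": "demo"}'
```

Verifiers recompute the PAE from the envelope's `payloadType` and `payload` fields and check the signature against the published public key.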

in-toto attestations

Attestations follow the in-toto Statement v1 format with a custom predicate type for Sigil scan results.

Transparency log

Signed attestations are recorded in the Sigstore Rekor transparency log. Each scan report links to its log entry.

Verification API

Verify any scan via GET /api/v1/verify?scan_id=... or fetch the raw attestation from GET /api/v1/attestation/{id}.

For full verification steps, public keys, and SDK usage, see the Attestation docs.

AI ecosystem filtering

The bot doesn't scan every package on PyPI and npm — it targets the AI agent supply chain. Packages are matched if their name, description, or keywords contain any of these terms:

text
Frameworks:    langchain, crewai, autogen, llamaindex, haystack, dspy
LLM providers: openai, anthropic, cohere, mistral, groq, together
MCP / agents:  mcp, model-context-protocol, agentic, tool-use
RAG:           rag, retrieval, vector, embedding, pinecone, chroma
ML:            transformers, huggingface, diffusers, torch, tensorflow
Skills:        skill, plugin, extension, chatgpt-plugin, copilot-extension

Full coverage registries
All ClawHub skills and GitHub MCP server repos are scanned regardless of keyword matches. No filtering is applied to these registries.
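The keyword filter reduces to a substring match over a package's metadata. A sketch using the term list above:

```python
AI_KEYWORDS = {
    "langchain", "crewai", "autogen", "llamaindex", "haystack", "dspy",
    "openai", "anthropic", "cohere", "mistral", "groq", "together",
    "mcp", "model-context-protocol", "agentic", "tool-use",
    "rag", "retrieval", "vector", "embedding", "pinecone", "chroma",
    "transformers", "huggingface", "diffusers", "torch", "tensorflow",
    "skill", "plugin", "extension", "chatgpt-plugin", "copilot-extension",
}

def matches_ai_filter(name: str, description: str, keywords: list[str]) -> bool:
    """True if any filter term appears in the name, description, or keywords."""
    haystack = " ".join([name, description, *keywords]).lower()
    return any(term in haystack for term in AI_KEYWORDS)

print(matches_ai_filter("acme-utils", "Retrieval helpers for RAG apps", []))  # True
print(matches_ai_filter("left-pad", "Pads strings", []))                      # False
```

Substring matching is deliberately loose: a false positive only costs one extra scan, while a false negative leaves a package unscanned.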

Expected volume

Registry             Scans/day    Avg time   Compute
PyPI (AI-filtered)   200–400      ~5 sec     ~30 min
npm (AI-filtered)    300–600      ~5 sec     ~50 min
ClawHub              50–100       ~3 sec     ~5 min
GitHub MCP           20–50        ~8 sec     ~7 min
Total                570–1,150               ~90 min

Bot identity

The bot operates under a dedicated sigil-bot account, separate from NOMARK staff activity. Automated outputs are clearly labeled as automated.

  • GitHub: The GitHub App acts as sigil-bot[bot]
  • Scan database: Report pages show “Scanned by Sigil Bot” with timestamp
  • Threat feed: RSS and API entries attributed to the bot identity

Note
Automated scan results are clearly labeled as automated output. Verdicts are statements of algorithmic opinion — see our Methodology and Terms of Service.

Dispute a result

Packages are scanned automatically from public registries without author consent. If you believe a scan result is incorrect, you can file a dispute.

Disputes are acknowledged within 48 hours. See the full dispute process in our Terms of Service.

Need help?

Ask a question in GitHub Discussions or check the troubleshooting guide.