Open-source · cost-aware tiered routing · HMAC-chained audit · MIT

Obsidian Spider

A small open-source agent swarm for running 1 parent call → N sub-agents in parallel, with a tamper-evident JSONL audit trail and a starter library of failure-pattern detectors. Run it inside GitHub Copilot Pro+, Claude Code, or a free vendor mesh. No telemetry, no upsell, no account required.

Try it in 10 minutes:

git clone https://github.com/obsidian-spider-org/obsidian-spider-swarm
cd obsidian-spider-swarm
cp .env.example .env   # set CHAIN_HMAC_KEY to a 32+ char random string
cat PROMPT.md          # paste into your agent harness; fill in TARGET + GATHER_INSTRUCTION
python3 hmac_verifier.py <your-output.jsonl>
# Or verify the sample chain offline (~30s) — no API keys needed:
python3 hmac_verifier.py examples/sample_pr_review_wave.jsonl \
  --key "$(grep -v '^#' examples/.env.demo | grep -v '^$' | head -1 | cut -d= -f2-)"
# Expected: {"verified": 5, "failed": 0, "total_rows": 5}

Repo · cost model + worked examples · bootstrap vs vendor-mesh caveat

Why this exists: AI agents trained with RLHF routinely drift into specification-gaming — they appear helpful while doing the wrong thing. At swarm scale the failure compounds invisibly. This framework wraps multi-agent fan-out with HMAC-chained receipts so failures are tamper-evident after the fact. It doesn't prevent every failure; it makes failures auditable.
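For readers who want the idea before opening hmac_verifier.py, here is a minimal sketch of an HMAC chain. It is not the repo's implementation: the row fields (payload, prev_mac, mac), the all-zero genesis value, and the canonical-JSON step are assumptions made for illustration. The result keys simply mirror the sample verifier output shown above. Each row's MAC covers the previous row's MAC, so editing, deleting, or reordering a row after the fact shows up as a failed row.

```python
import hashlib
import hmac
import json

# Assumed placeholder used as prev_mac for the first row (hypothetical).
GENESIS = "0" * 64


def _mac(key: bytes, prev_mac: str, payload: dict) -> str:
    # Canonical JSON so the same payload always serializes identically.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    msg = (prev_mac + body).encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def append_row(chain: list, key: bytes, payload: dict) -> None:
    # Link the new row to the MAC of the row before it.
    prev_mac = chain[-1]["mac"] if chain else GENESIS
    chain.append(
        {"payload": payload, "prev_mac": prev_mac, "mac": _mac(key, prev_mac, payload)}
    )


def verify_chain(chain: list, key: bytes) -> dict:
    verified = 0
    failed = 0
    prev_mac = GENESIS
    for row in chain:
        expected = _mac(key, prev_mac, row["payload"])
        linked = row["prev_mac"] == prev_mac
        # compare_digest avoids leaking timing information.
        if linked and hmac.compare_digest(expected, row["mac"]):
            verified += 1
        else:
            failed += 1
        # Continue from the stored MAC so one bad row is reported once.
        prev_mac = row["mac"]
    return {"verified": verified, "failed": failed, "total_rows": len(chain)}
```

Note what this does and does not show: a wrong key or an edited payload fails verification, but anyone holding the key can rebuild a consistent chain. Tamper-evidence depends on keeping CHAIN_HMAC_KEY away from the agents being audited.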

What you get

Markdown prompt template — paste into your agent harness, get a JSONL audit trail. Parent dispatches N reviewer sub-agents; final pass gathers; you verify the chain later with one command.
Two stdlib-only Python helpers — HMAC chain verifier (~250 LOC) + reward-hack pattern scanner (8 common patterns: sycophancy, scope-bloat, fabricated-evidence, etc.).
Cost-aware tiered routing — frontier models where reasoning matters, cheap models where execution suffices. Works on GitHub Copilot Pro+ (current pricing window), Claude Code Agent tool, free vendor mesh (OpenRouter / Groq / Cerebras), or anywhere a parent message can launch sub-agents.
Profile JSON configurations — progressive disclosure from 2×2 → 4×4 → 8×8.
MCP server stub for Claude Desktop and VSCode-MCP clients.
Cryptographic audit receipts on every public document — content-hash + verify-command appended to README, GIFT_AND_OFFER, QUICKSTART, PROMPT, BOOTSTRAP_NOTE, examples/README. Reproduce with one awk + sha256sum command.
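The content-hash receipt in the last item can be pictured roughly as follows. This is a Python stand-in for the awk + sha256sum command, not a copy of it. The marker string and the receipt layout are hypothetical; check the receipt at the bottom of each document for the real format.

```python
import hashlib

# Hypothetical marker separating the document body from its receipt.
RECEIPT_MARKER = "<!-- audit-receipt -->"


def content_hash(text: str, marker: str = RECEIPT_MARKER) -> str:
    # Hash only the body, i.e. everything before the receipt marker.
    body = text.split(marker, 1)[0]
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def receipt_matches(text: str, marker: str = RECEIPT_MARKER) -> bool:
    # The receipt is valid if the recomputed body hash appears after the marker.
    body, sep, receipt = text.partition(marker)
    if not sep:
        return False  # no receipt present
    return content_hash(text, marker) in receipt
```

A plain content hash detects accidental or casual edits to the body. Unlike the HMAC chain it uses no key, so anyone who edits the body can also recompute the hash.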

Cost arbitrage — measure your own

If you use GitHub Copilot Pro+ today: the subagent feature inside an agent message gives measurable cost arbitrage versus running the same fan-out as raw API calls. GitHub moves to usage-based billing on June 1, 2026, so the current arbitrage window is closing. We are releasing this under the MIT license now so anyone can measure and capture it before the transition.

Headline number on a normal pulse (40 sub-agents per parent call, GPT-5.5-class): raw API equivalent ≈ $112–$142 at retail prices vs. one Copilot Pro+ parent call at $0.30 → ~375×–475× raw-API-equivalent ratio. On Claude Opus 4.7 the ratio is ~170×–220×; on a Claude Max bundled subscription the parent is ≈$0 marginal. To be clear: this is the compute-equivalent you'd need to buy at retail to replicate the fan-out, not a cash saving — nobody pays sticker. Full derivation + four bracketed scenarios.
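The headline ratio is plain division, reproduced here so you can swap in your own prices. The dollar figures and the 40 sub-agent count are the ones quoted above. The variable and function names are illustrative. The exact quotients come out near 373 and 473, which the text rounds to ~375×–475×.

```python
SUB_AGENTS = 40          # sub-agents per parent call (from the text)
RAW_LOW_USD = 112.0      # raw API equivalent, low end (from the text)
RAW_HIGH_USD = 142.0     # raw API equivalent, high end (from the text)
PARENT_CALL_USD = 0.30   # one Copilot Pro+ parent call (from the text)


def equivalence_ratio(raw_usd: float, parent_usd: float) -> float:
    # Ratio of retail compute-equivalent to what the parent call costs.
    if parent_usd <= 0:
        # A $0 marginal parent call has no finite ratio.
        raise ValueError("parent cost must be positive to form a ratio")
    return raw_usd / parent_usd


ratio_low = equivalence_ratio(RAW_LOW_USD, PARENT_CALL_USD)
ratio_high = equivalence_ratio(RAW_HIGH_USD, PARENT_CALL_USD)

# Implied retail cost per sub-agent, handy for sanity-checking your own runs.
per_agent_low = RAW_LOW_USD / SUB_AGENTS
per_agent_high = RAW_HIGH_USD / SUB_AGENTS

if __name__ == "__main__":
    print(f"ratio: {ratio_low:.0f}x to {ratio_high:.0f}x")
    print(f"per sub-agent: ${per_agent_low:.2f} to ${per_agent_high:.2f}")
```

The implied retail cost is about $2.80 to $3.55 per sub-agent. If your measured per-agent token spend is far from that range, your ratio will be too.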

The routing pattern keeps working after June 1, on Claude Code or free vendor mesh — but the Copilot-specific multiplier is the largest right now.
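The routing pattern itself is substrate-independent, which is why it survives the billing change. A minimal sketch, assuming each task carries a needs_reasoning flag: the flag, the Task type, and the tier model names are placeholders, not the repo's profile schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    needs_reasoning: bool  # hypothetical flag set by whoever plans the wave


# Placeholder model identifiers; substitute whatever your harness exposes.
TIERS = {
    "frontier": "frontier-model",
    "cheap": "cheap-model",
}


def route(task: Task) -> str:
    # Frontier models where reasoning matters, cheap models where execution suffices.
    return TIERS["frontier"] if task.needs_reasoning else TIERS["cheap"]


def plan(tasks: list) -> dict:
    # Map each task name to the model tier it should run on.
    return {task.name: route(task) for task in tasks}
```

In practice the interesting work is deciding the flag, not the lookup. Misclassifying a reasoning task as execution-only is exactly the kind of failure the audit trail is meant to surface afterwards.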

What this isn't

This is a bootstrap-mode primitive, not Byzantine fault tolerance. On a single-substrate run the "8 reviewers" are 8 personae of one model. The HMAC chain is tamper-evident on outputs; it doesn't make personae independent. For real BFT-class signal you route reviewers across vendor families. Caveat in BOOTSTRAP_NOTE.md.
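A quick way to see which mode a run is in is to count distinct model families behind the reviewers. The sketch below assumes you can label each reviewer with a family string. That mapping and the field names are hypothetical, and nothing here is taken from BOOTSTRAP_NOTE.md.

```python
from collections import Counter


def independence_report(reviewers: dict) -> dict:
    # reviewers maps reviewer name -> model family label (assumed input shape).
    families = Counter(reviewers.values())
    return {
        "reviewers": len(reviewers),
        "distinct_families": len(families),
        # True when every reviewer is a persona of the same underlying model.
        "single_substrate": len(families) <= 1,
    }
```

Eight personae on one model report distinct_families of 1 no matter how different their prompts are. Agreement among them is weaker evidence than agreement across families.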

Quorum plus audit reduces slop; it doesn't eliminate it. LLMs compound hallucinations in long-running permissive workflows, and the chain trail makes that compounding visible after the fact. Human-in-the-loop review on the final pass remains non-optional.

Paid services (separate from the gift)

The repo is MIT-licensed and free forever. If your agent pipeline is misbehaving and you'd rather not rediscover 16 months of failure-mode catalogue yourself, there's a 7-tier engagement ladder — free discovery call, free scoping memo, then a 72-hour audit-trail readiness check, focused audit with refund-on-no-finding, hardening sprint, and monthly retainer. First 5 paid engagements at design-partner half-price.

Contact

General questions, contributions, bug reports → [email protected]. Or open a GitHub issue.

Paid-engagement inquiries → see the offer ladder; a single paragraph describing your workflow is enough to start.