TopRank Skills

Official OpenClaw rules 56%

siphonclaw

Document intelligence pipeline with visual search, OCR, and field capture

Stars: 0
Installs: 0
Status: ACTIVE
Visibility: PUBLIC

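Installation

Copy the prompt below and send it to your AI assistant to complete the installation.

Please first check whether the SkillHub store is installed. If it is not, install the SkillHub store by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, and then install the siphonclaw skill. If it is already installed, install the siphonclaw skill directly.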

Overview

Skill Key: curtisgc1/siphonclaw
Author: curtisgc1
Source Repo: openclaw/skills
Version: 1.2.0
Source Path: skills/curtisgc1/siphonclaw
Latest Commit SHA: db5bbc33c2f65958d73d62107d6eeba8f71e1671

Extracted Content

SKILL.md excerpt

# SiphonClaw

Domain-agnostic document intelligence pipeline. Ingest PDFs, images, and spreadsheets into a searchable knowledge base with dual-track retrieval (text + visual), OCR, confidence scoring, and field capture.

Built for field service engineers, researchers, mechanics, and anyone who needs fast answers from large document collections.

## What SiphonClaw Does

- **Ingest** documents (PDF, Excel, images, screenshots) into a local vector database with text and visual embeddings
- **Search** using triple hybrid retrieval: BM25 keyword matching + semantic text vectors + visual page embeddings, combined with Reciprocal Rank Fusion (RRF) and reranked with a cross-encoder
- **Identify** equipment, parts, or components from photos using vision models, then search the local knowledge base
- **Capture** field fixes and repair notes as first-class knowledge base entries for future retrieval
- **Score** every response with composite confidence (retrieval + faithfulness + relevance + coverage) and footnote-style source citations
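The **Score** bullet names four components but not how they are combined. The sketch below assumes an equal-weighted mean of signals already normalized to [0, 1], plus a helper that renders footnote-style citations from (file, page) pairs. `ConfidenceSignals`, `composite_confidence`, `format_citations`, and the default weights are hypothetical names and values for illustration, not SiphonClaw's actual implementation.

```python
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass
class ConfidenceSignals:
    """Hypothetical per-answer signals, each expected in [0, 1]."""
    retrieval: float     # e.g. strength of the best matching chunks
    faithfulness: float  # share of answer claims supported by the cited sources
    relevance: float     # how directly the answer addresses the query
    coverage: float      # share of the query's aspects that the sources cover


# Assumption: equal weights. SiphonClaw's actual weighting is not documented here.
DEFAULT_WEIGHTS = {"retrieval": 0.25, "faithfulness": 0.25, "relevance": 0.25, "coverage": 0.25}


def composite_confidence(signals: ConfidenceSignals,
                         weights: Optional[Mapping[str, float]] = None) -> float:
    """Weighted mean of the four signals; stays in [0, 1] when inputs do."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    for f in fields(signals):
        value = getattr(signals, f.name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{f.name} must be in [0, 1], got {value}")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * getattr(signals, name) for name, w in weights.items()) / total


def format_citations(sources: list[tuple[str, int]]) -> str:
    """Render footnote-style citations: one numbered line per (file, page) pair."""
    return "\n".join(f"[{i}] {name}, p. {page}"
                     for i, (name, page) in enumerate(sources, start=1))
```

With equal weights, `ConfidenceSignals(0.9, 0.8, 0.7, 0.6)` scores about 0.75. Passing custom `weights` would let a deployment emphasize faithfulness over raw retrieval strength, for example.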

## MCP Tools

SiphonClaw exposes five tools via MCP for integration with agents and other MCP-compatible clients.
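MCP clients talk to servers using JSON-RPC 2.0, and a tool invocation is a request with method `tools/call` whose `params` carry the tool `name` and its `arguments`. The sketch below only builds that message for `siphonclaw_search`. It assumes transport (stdio or HTTP) and the session handshake are handled elsewhere, as an MCP SDK would do. `build_tool_call` is a hypothetical helper, and the query text is illustrative.

```python
import itertools
import json

_request_ids = itertools.count(1)  # request ids must not be reused within a session


def build_tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical query; argument names follow the siphonclaw_search parameter table.
request = build_tool_call("siphonclaw_search",
                          {"query": "error code E42", "top_k": 5, "mode": "hybrid"})
print(request)
```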

---

### siphonclaw_search

Search the knowledge base using triple hybrid retrieval (text + visual + keyword).

**Parameters:**

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `query` | string | yes | Natural language search query or exact part number / error code |
| `top_k` | integer | no | Number of results to return (default: 5, max: 20) |
| `filters` | object | no | Metadata filters (e.g., `{"source_type": "service_manual", "model": "ModelA"}`) |
| `mode` | string | no | Search mode: `"hybrid"` (default), `"text"`, `"visual"`, `"keyword"` |
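The `mode` and `top_k` parameters suggest how the three retrieval tracks could be combined. The sketch below assumes each track has already produced a ranked list of chunk IDs. In `"hybrid"` mode it fuses all three with Reciprocal Rank Fusion, where each list adds 1 / (k + rank) for every ID it contains. Any single-track mode just uses that one list. `k = 60` is the constant commonly used in the RRF literature, not a documented SiphonClaw value, and the cross-encoder rerank is omitted. `search` and `reciprocal_rank_fusion` are hypothetical names.

```python
from collections import defaultdict

RRF_K = 60      # common RRF default; SiphonClaw's actual value is an assumption
MAX_TOP_K = 20  # documented maximum for top_k
TRACKS_BY_MODE = {
    "hybrid": ["keyword", "text", "visual"],
    "text": ["text"],
    "visual": ["visual"],
    "keyword": ["keyword"],
}


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = RRF_K) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per ID, ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; Python's sort is stable, so ties keep first-seen order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def search(ranked_by_track: dict[str, list[str]], mode: str = "hybrid",
           top_k: int = 5) -> list[tuple[str, float]]:
    """Hypothetical mode-aware search over precomputed per-track rankings."""
    if mode not in TRACKS_BY_MODE:
        raise ValueError(f"unknown mode {mode!r}")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    top_k = min(top_k, MAX_TOP_K)  # clamp to the documented maximum
    rankings = [ranked_by_track.get(track, []) for track in TRACKS_BY_MODE[mode]]
    # A cross-encoder rerank of the fused candidates would follow here (omitted).
    return reciprocal_rank_fusion(rankings)[:top_k]
```

RRF uses only ranks, not raw scores, so BM25 scores, cosine similarities, and visual similarities can be fused without first calibrating them onto a common scale. A chunk that ranks well on several tracks rises above one that tops only a single track.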

**Returns:**

```json
{
  "results": [
    {
      "content": "Extracted text from the matching chunk or page",
      "source": "ServiceManual_ModelA.pdf",
      "page": 42,
      "section": "4.3 Transformer Replacement",
      "score": 0.92,
      "match_type": "hybrid"
    }...
```

README excerpt

# SiphonClaw

**Document intelligence pipeline with triple hybrid search, visual retrieval, and a learning loop.**

You are a mobile worker. Your documentation lives in a thousand-page PDF on a shared drive somewhere. The fix that saved you four hours last month is trapped in your head -- or worse, in someone else's. There is no system that connects what you know to what you need, when you need it. SiphonClaw is that system.

## What It Does

- **Ingest any document collection** -- PDFs, spreadsheets, images, scanned pages, URLs -- into a searchable knowledge base with zero manual tagging
- **Ask questions in natural language**, get cited answers with confidence scores and source attribution
- **Capture field fixes and resolutions** that feed back into the knowledge base, making every solved problem improve future answers
- **Identify parts and equipment from photos** using vision AI with automatic OCR extraction
- **Access from anywhere** -- Telegram bot, email pipeline, CLI, or Python API
- **Five-tier model routing** with automatic failover and budget protection -- runs on a $0.50/month budget or fully local for free
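The routing bullet combines two behaviors: failover to the next tier when a model call errors, and budget protection that skips tiers the remaining budget cannot cover. The sketch below shows one way to get both, assuming an ordered list of tiers with a fixed per-call cost and billing only for successful calls. The tier count, names, costs, and the `ModelTier`/`TieredRouter` classes are hypothetical. SiphonClaw's five tiers and cost accounting are not described here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelTier:
    name: str
    cost_per_call: float        # hypothetical USD cost; 0.0 for a local model
    call: Callable[[str], str]  # hypothetical provider call: prompt in, text out


class TieredRouter:
    """Try tiers in order of preference, skipping unaffordable ones and failing over on errors."""

    def __init__(self, tiers: list[ModelTier], monthly_budget: float):
        self.tiers = tiers
        self.monthly_budget = monthly_budget
        self.spent = 0.0

    def complete(self, prompt: str) -> tuple[str, str]:
        errors: list[str] = []
        for tier in self.tiers:
            if self.spent + tier.cost_per_call > self.monthly_budget:
                errors.append(f"{tier.name}: over budget")
                continue  # budget protection: never start a call we cannot pay for
            try:
                answer = tier.call(prompt)
            except Exception as exc:  # failover: any provider error moves to the next tier
                errors.append(f"{tier.name}: {exc}")
                continue
            self.spent += tier.cost_per_call  # assumption: only successful calls are billed
            return tier.name, answer
        raise RuntimeError("all tiers failed: " + "; ".join(errors))
```

Putting a zero-cost local tier last means requests keep succeeding after the paid budget is used up, which is one plausible reading of the "fully local for free" option.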

## The Pipeline

```
Documents (PDFs, images, spreadsheets, URLs)
                    |
          +-------------------+
          |   INGESTION       |
          |   OCR (Qwen3-VL)  |
          |   Chunking        |
          |   Metadata detect |
          +-------------------+
                    |
              ChromaDB + BM25
              (indexed + stored)
                    |
                    v
              USER QUERY
                    |
          +---------+---------+
          |   Query Expander  |
          +---------+---------+
                    |
    +---------------+---------------+
    |               |               |
+---+-----+   +----+----+   +------+------+
| BM25    |   | BGE-M3  |   | Visual      |
| Keyword |   | Vector  |   | Page Search |
+---------+   +---------+   +-------------+
    |               |...
```
