TopRank Skills

Home / Claw Skills / 数据解析 / mineru-pdf
Official OpenClaw rules 54%

mineru-pdf

Parse PDF documents with MinerU MCP to extract text, tables, and formulas. Supports multiple backends including MLX-accelerated inference on Apple Silicon.

Stars

0

Installs

0

Status

ACTIVE

Visibility

PUBLIC

安装方式

直接复制以下提示词,发送给你的 AI 助手即可完成安装。

请先检查是否已安装 SkillHub 商店,若未安装,请根据 https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md 安装 SkillHub 商店,然后安装 mineru-pdf 技能。 若已安装,则直接安装 mineru-pdf 技能。

Overview

Skill Key
etoile04/mineru-pdf
Author
etoile04
Source Repo
openclaw/skills
Version
-
Source Path
skills/etoile04/mineru-pdf
Latest Commit SHA
7454f1996e3ed8b5f17184173c52280bea34fd3a

Extracted Content

SKILL.md excerpt

# MinerU PDF Parser

Parse PDF documents using MinerU MCP to extract structured content including text, tables, and formulas with MLX acceleration on Apple Silicon.

## Installation

### Option 1: Install MinerU MCP (for Claude Code)

```bash
claude mcp add --transport stdio --scope user mineru -- \
  uvx --from mcp-mineru python -m mcp_mineru.server
```

This installs and configures MinerU for all Claude projects. Models are downloaded on first use.

### Option 2: Use Direct Tool (preserves files)

The skill includes a direct parsing tool that saves output to a persistent directory:

```bash
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py <pdf_path> <output_dir> [options]
```

**Advantages:**
- ✅ Files are saved permanently (not auto-deleted)
- ✅ Full control over output location
- ✅ No MCP overhead
- ✅ Works with any Python environment that has MinerU

## Quick Start

### Method 1: Using the Direct Tool (Recommended)

```bash
# Parse entire PDF
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  "/path/to/document.pdf" \
  "/path/to/output"

# Parse specific pages
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  "/path/to/document.pdf" \
  "/path/to/output" \
  --start-page 0 --end-page 2

# Use Apple Silicon optimization
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  "/path/to/document.pdf" \
  "/path/to/output" \
  --backend vlm-mlx-engine

# Text only (faster)
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py \
  "/path/to/document.pdf" \
  "/path/to/output" \
  --no-table --no-formula
```

### Method 2: Using MinerU MCP (Temporary Files)

### Parse a PDF document

```bash
uvx --from mcp-mineru python -c "
import asyncio
from mcp_mineru.server import call_tool

async def parse_pdf():
    result = await call_tool(
        name='parse_pdf',
        arguments={
            'file_path': '/path/to/document.pdf',
            'backend': 'pipeline',
            'formula_enable': True,
            'table_enable': True,
            'sta...

Related Claw Skills