TopRank Skills


pdf-process-mineru

A PDF document parsing tool built on a local MinerU installation. It converts PDF files to Markdown, JSON, and other machine-readable formats.

Stars: 0

Installs: 0

Status: ACTIVE

Visibility: PUBLIC

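Installation

Copy the prompt below and send it to your AI assistant to complete the installation.

Please first check whether the SkillHub store is installed. If it is not, install the SkillHub store by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the pdf-process-mineru skill. If it is already installed, install the pdf-process-mineru skill directly.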

Overview

Skill Key: baokui/pdf-parser-mineru

Author: baokui

Source Repo: openclaw/skills

Version: -

Source Path: skills/baokui/pdf-parser-mineru

Latest Commit SHA: 5683e76b5bb21abd44619e2660bb04bf8b69c64c

Extracted Content

SKILL.md excerpt

## Tool List

### 1. pdf_to_markdown

Convert PDF documents to Markdown format, preserving document structure, formulas, tables, and images.

**Description**: Uses MinerU to parse PDF documents and write the result as Markdown, with support for OCR, formula recognition, table extraction, and other features.

**Parameters**:
- `file_path` (string, required): Absolute path to the PDF file
- `output_dir` (string, required): Absolute path to the output directory
- `backend` (string, optional): Parsing backend, options: `hybrid-auto-engine` (default), `pipeline`, `vlm-auto-engine`
- `language` (string, optional): OCR language code, such as `en` (English), `ch` (Chinese), or `ja` (Japanese); defaults to auto-detection
- `enable_formula` (boolean, optional): Whether to enable formula recognition, defaults to true
- `enable_table` (boolean, optional): Whether to enable table extraction, defaults to true
- `start_page` (integer, optional): Start page number (0-based); defaults to 0
- `end_page` (integer, optional): End page number (0-based); defaults to -1, which parses all pages
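
The parameter list above can be assembled into a request payload programmatically. The following is a minimal sketch, assuming the script accepts a JSON object with `name` and `arguments` keys, as the command-line examples in this excerpt suggest. The helper `build_request` and its validation rules are hypothetical and not part of the skill.

```python
import json
import os

# Backend names taken from the parameter documentation.
VALID_BACKENDS = {"hybrid-auto-engine", "pipeline", "vlm-auto-engine"}


def build_request(file_path, output_dir, backend=None, language=None,
                  enable_formula=None, enable_table=None,
                  start_page=None, end_page=None):
    """Hypothetical helper: assemble a pdf_to_markdown request dict.

    Optional parameters left as None are omitted, so the tool's own
    documented defaults apply.
    """
    # Both paths are documented as absolute paths.
    for label, path in (("file_path", file_path), ("output_dir", output_dir)):
        if not os.path.isabs(path):
            raise ValueError(f"{label} must be an absolute path: {path!r}")
    if backend is not None and backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    if start_page is not None and start_page < 0:
        raise ValueError("start_page is 0-based and must be >= 0")

    arguments = {"file_path": file_path, "output_dir": output_dir}
    optional = {
        "backend": backend,
        "language": language,
        "enable_formula": enable_formula,
        "enable_table": enable_table,
        "start_page": start_page,
        "end_page": end_page,
    }
    # Keep only the options the caller actually set.
    arguments.update({k: v for k, v in optional.items() if v is not None})
    return {"name": "pdf_to_markdown", "arguments": arguments}


request = build_request("/path/to/document.pdf", "/path/to/output",
                        backend="pipeline", start_page=0, end_page=5)
print(json.dumps(request))
```

Checking the paths and backend name before calling the tool surfaces obvious mistakes early. Whether the script itself performs similar checks is not stated in this excerpt.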

**Return Value**:
```json
{
  "success": true,
  "output_path": "/path/to/output",
  "markdown_content": "Converted Markdown content...",
  "images": ["List of image paths"],
  "tables": ["List of table information"],
  "formula_count": 10
}
```
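
A caller will usually want to check `success` before using the other fields. The sketch below assumes the tool emits exactly the JSON shape shown above. The shape of a failure response is not documented in this excerpt, so the error handling is a guess, and `summarize_result` is a hypothetical helper.

```python
import json


def summarize_result(raw):
    """Hypothetical helper: parse the tool's JSON output and summarize it."""
    result = json.loads(raw)
    # Assumption: a failed run sets "success" to false; any further
    # error fields are undocumented here, so none are read.
    if not result.get("success"):
        raise RuntimeError("pdf_to_markdown reported failure")
    return {
        "output_path": result["output_path"],
        "markdown_chars": len(result.get("markdown_content", "")),
        "image_count": len(result.get("images", [])),
        "table_count": len(result.get("tables", [])),
        "formula_count": result.get("formula_count", 0),
    }


# Sample payload mirroring the documented return value.
sample = json.dumps({
    "success": True,
    "output_path": "/path/to/output",
    "markdown_content": "# Title",
    "images": ["/path/to/output/images/fig1.png"],
    "tables": [],
    "formula_count": 10,
})
summary = summarize_result(sample)
print(summary)
```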

**Examples**:
```bash
# Basic conversion with default settings
python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output"}}'

# Use specific backend
python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "backend": "pipeline"}}'

# Parse specific pages
python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "start_page": 0, "end_page": 5}}'
```

...
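
The shell commands above can also be issued from Python. This is a rough sketch, assuming the script path used in the examples and assuming the script prints its JSON result to standard output, which this excerpt does not confirm. `build_command` and `run_tool` are hypothetical names.

```python
import json
import subprocess
import sys

# Script path as it appears in the shell examples.
SCRIPT = ".claude/skills/pdf-process/script/pdf_parser.py"


def build_command(arguments, script=SCRIPT):
    """Build the argv list: interpreter, script, one JSON argument."""
    payload = json.dumps({"name": "pdf_to_markdown", "arguments": arguments})
    return [sys.executable, script, payload]


def run_tool(arguments):
    """Run the script and parse its output.

    Assumption: the script writes a single JSON document to stdout.
    """
    completed = subprocess.run(
        build_command(arguments),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


command = build_command({
    "file_path": "/path/to/document.pdf",
    "output_dir": "/path/to/output",
    "start_page": 0,
    "end_page": 5,
})
print(command)
```

Passing the arguments as a list avoids shell quoting problems with the embedded JSON string.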
