
agent-mineru

MinerU document parsing CLI with layout.json post-processing and S3 integration. Parse PDF/Word/PPT/images to structured Markdown with formula, table, and code extraction.

Stars: 0
Installs: 0
Status: ACTIVE
Visibility: PUBLIC

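How to install

Copy the prompt below and send it to your AI assistant to complete the installation.

First check whether the SkillHub store is installed. If it is not, install it by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the agent-mineru skill. If it is already installed, install the agent-mineru skill directly.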

Overview

Skill Key: decrystal/agent-mineru
Author: decrystal
Source Repo: openclaw/skills
Version: -
Source Path: skills/decrystal/agent-mineru
Latest Commit SHA: c089770109e08a8d44d67b57124087776b80852a

Extracted Content

SKILL.md excerpt

# Document Parsing with agent-mineru

## Installation

```bash
npm install -g agent-mineru
```

## Authentication

```bash
export MINERU_TOKEN="your_api_token"
```

Get your token at: https://mineru.net/apiManage/docs
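
A missing token is easy to overlook in scripts, so it can help to check for it before calling the CLI. The following is a minimal sketch; `require_mineru_token` is a hypothetical helper and not part of agent-mineru.

```bash
# Hypothetical helper (not provided by agent-mineru):
# fail fast when MINERU_TOKEN is unset or empty.
require_mineru_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; export it before running agent-mineru" >&2
    return 1
  fi
}

# Example use in a script:
# require_mineru_token && agent-mineru parse ./paper.pdf
```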

## Quick start

```bash
agent-mineru parse https://arxiv.org/pdf/2410.17247   # Parse PDF
agent-mineru extract ./task_id/layout.json             # Extract formulas/tables
agent-mineru convert ./task_id/layout.json -o custom.md # Custom Markdown
```

## Important: HTML vs PDF output difference

- **PDF/Doc/PPT/Image** → ZIP with `layout.json` + `full.md` + `images/` → supports fine-grained post-processing
- **HTML** → only `full.md` → no layout.json, no post-processing available
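
The difference above can be checked in a script before attempting any post-processing. This sketch assumes the result has already been unpacked into a directory such as `./task_id`, as in the quick start; `can_postprocess` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed only if the result directory contains layout.json,
# i.e. the source was not HTML and layout-based post-processing applies.
can_postprocess() {
  [ -f "$1/layout.json" ]
}

# Example use (directory name assumed from the quick start):
# if can_postprocess ./task_id; then
#   agent-mineru extract ./task_id/layout.json
# else
#   echo "No layout.json: use ./task_id/full.md directly"
# fi
```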

## Commands

### Parse (single file)

```bash
agent-mineru parse <url|file>              # Auto-detect type, parse & download
agent-mineru parse ./paper.pdf             # Local file
agent-mineru parse https://example.com/doc.pdf --model pipeline
agent-mineru parse https://example.com/page.html  # Auto-selects MinerU-HTML
agent-mineru parse ./paper.pdf --no-wait   # Submit only, don't wait
agent-mineru parse ./paper.pdf --json      # JSON output for piping
agent-mineru parse ./paper.pdf --s3        # Auto-upload to S3 after download
```

### Parse batch (multiple URLs)

```bash
agent-mineru parse-batch url1.pdf url2.pdf
agent-mineru parse-batch --file urls.txt         # URLs from file
agent-mineru parse-batch --file urls.txt --model vlm
```
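
The excerpt does not specify the format of `urls.txt`. Assuming it is one URL per line, a list gathered from several sources can be tidied before submission. `clean_url_list` below is a hypothetical helper that drops blank lines and duplicates while keeping the original order.

```bash
# Hypothetical helper: write a cleaned copy of a URL list.
# Assumption: the --file input is one URL per line (not confirmed by the excerpt).
# Usage: clean_url_list <input_file> <output_file>
clean_url_list() {
  # NF skips blank/whitespace-only lines; seen[] drops repeated lines.
  awk 'NF && !seen[$0]++' "$1" > "$2"
}

# Example use:
# clean_url_list raw_urls.txt urls.txt
# agent-mineru parse-batch --file urls.txt
```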

### Upload (local files)

```bash
agent-mineru upload ./paper1.pdf ./paper2.pdf
agent-mineru upload ./docs/*.pdf --model pipeline
```

### Check status

```bash
agent-mineru status <task_id>              # Single task status
agent-mineru status <task_id> --json       # JSON output
agent-mineru status-batch <batch_id>       # Batch task status
```
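
After submitting with `--no-wait`, a script typically needs to poll until the task finishes. The status output format is not shown in this excerpt, so the sketch below stays generic: `poll_until` is a hypothetical helper that retries any check command, and the check itself (here called `task_is_done`) is a placeholder you would write after inspecting the real `--json` output.

```bash
# Hypothetical helper: retry a check command until it succeeds
# or the attempt limit is reached.
# Usage: poll_until <max_attempts> <delay_seconds> <command> [args...]
poll_until() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Sleep only when another attempt will follow.
    [ "$attempt" -le "$max_attempts" ] && sleep "$delay"
  done
  return 1
}

# Example use (task_is_done is a placeholder, not an agent-mineru command):
# poll_until 30 10 task_is_done "$task_id" || echo "Timed out waiting for task" >&2
```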

### Extract elements (PDF only, needs layout.json)

```bash
agent-mineru extract <json_file>                           # All elements as JSON
agent-mineru extract layout.json --types formula...
```