TopRank Skills

Home / Claw Skills / 文档 / paddleocr-doc-parsing
Official OpenClaw rules 54%

paddleocr-doc-parsing

Parse documents using PaddleOCR's API. Supports both sync and async modes for images and PDFs.

Stars

0

Installs

0

Status

ACTIVE

Visibility

PUBLIC

安装方式

直接复制以下提示词,发送给你的 AI 助手即可完成安装。

请先检查是否已安装 SkillHub 商店,若未安装,请根据 https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md 安装 SkillHub 商店,然后安装 paddleocr-doc-parsing 技能。 若已安装,则直接安装 paddleocr-doc-parsing 技能。

Overview

Skill Key
hiotec/paddleocr-doc-parsing-v2
Author
hiotec
Source Repo
openclaw/skills
Version
-
Source Path
skills/hiotec/paddleocr-doc-parsing-v2
Latest Commit SHA
28c7028d454f1aa5ae9195f7619e42672042fc39

Extracted Content

SKILL.md excerpt

# PaddleOCR Document Parsing

Parse images and PDF files using PaddleOCR's API. Supports both synchronous and asynchronous parsing modes with structured output.

## Resource Links

| Resource              | Link                                                                           |
| --------------------- | ------------------------------------------------------------------------------ |
| **Official Website**  | [https://www.paddleocr.com](https://www.paddleocr.com)                                     |
| **API Documentation** | [https://ai.baidu.com/ai-doc/AISTUDIO/Cmkz2m0ma](https://ai.baidu.com/ai-doc/AISTUDIO/Cmkz2m0ma)         |
| **GitHub**            | [https://github.com/PaddlePaddle/PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) |

## Key Features

- **Multi-format support**: PDF and image files (JPG, PNG, BMP, TIFF)
- **Two parsing modes**:
  - **Sync mode**: Fast response for small files (<600s timeout)
  - **Async mode**: For large files with progress polling
- **Layout analysis**: Automatic detection of text blocks, tables, formulas
- **Multi-language**: Support for 110+ languages
- **Structured output**: Markdown format with preserved document structure

## Setup

1. Visit [PaddleOCR](https://www.paddleocr.com) to obtain your API credentials
2. Set environment variables:

```bash
export PADDLEOCR_ACCESS_TOKEN="your_token_here"
export PADDLEOCR_API_URL="https://your-endpoint.aistudio-app.com/layout-parsing"

# Optional: For async mode
export PADDLEOCR_JOB_URL="https://your-job-endpoint.aistudio-app.com/api/v2/ocr/jobs"
export PADDLEOCR_MODEL="PaddleOCR-VL-1.5"
```

## Usage Examples

### Sync Mode (Default)

For small files and quick processing:

```bash
# Parse local image
{baseDir}/paddleocr_parse.sh document.jpg

# Parse PDF
{baseDir}/paddleocr_parse.sh -t pdf document.pdf

# Parse from URL
{baseDir}/paddleocr_parse.sh https://example.com/document.jpg

# Save output to file
{baseDir}/paddleocr_parse.sh -o result.json document.jpg

# Verbo...

Related Claw Skills