Pdf Ocr Layout

GLM OCR Multimodal Deep Analysis

View Source SKILL.md

Stars

Installs

Status

ACTIVE

Visibility

PUBLIC

安装方式

直接复制以下提示词，发送给你的 AI 助手即可完成安装。

请先检查是否已安装 SkillHub 商店，若未安装，请根据 https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md 安装 SkillHub 商店，然后安装 Pdf Ocr Layout 技能。若已安装，则直接安装 Pdf Ocr Layout 技能。

Overview

Skill Key: baokui/pdf-ocr-layout
Author: baokui
Source Repo: openclaw/skills
Version: -
Source Path: skills/baokui/pdf-ocr-layout
Latest Commit SHA: be0a5a9a63f12fcc33fc20238d237551bbe0136d

Extracted Content

SKILL.md excerpt

# GLM-OCR Multimodal Deep Analysis

This tool builds a high-precision document parsing pipeline: using **GLM-OCR** for layout element extraction, calling **GLM-4.7** for logical interpretation of table data, and calling **GLM-4.6V** for multimodal visual interpretation of images and charts.

## Pipeline Implementation Architecture

This Skill consists of two core script stages, orchestrated through `glm_ocr_pipeline.py`:

### 1. Extraction Stage (`scripts/glm_ocr_extract.py`)

- **Core Model**: GLM-OCR
- **Function**: Responsible for physical layout analysis of documents
- **Output**: Extract table HTML and clean to Markdown, automatically crop independent chart image files based on Bbox coordinates, and generate intermediate JSON containing full page reading order

### 2. Understanding Stage (`scripts/glm_understanding.py`)

- **Core Model**: GLM-4.7 (text) / GLM-4.6V (visual)
- **Function**: Responsible for deep semantic reasoning of content
- **Logic**:
  - **Tables**: Combine full text context, use GLM-4.7 to analyze business meaning of Markdown table data
  - **Charts**: Combine full text context + cropped images, use GLM-4.6V for multimodal visual analysis

## Invocation Methods

### Command Line Invocation

```bash
# Run complete pipeline: extraction -> cropping -> understanding analysis, supports input in .pdf, .jpg, .png and other formats
python scripts/glm_ocr_pipeline.py \
  --file_path "/data/report_page.jpg" \
  --output_dir "/data/output"
```

## API Parameter Description

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file_path | string | ✅ | Absolute path to input file (supports .pdf, .png, .jpg) |
| output_dir | string | ✅ | Result output directory (used to save cropped images and JSON reports) |

## Return Result Structure (JSON)

The tool returns a list containing layout elements and their deep understanding:

```json
[
  {
    "type": "table",
    "bbox": [100, 200, 500, 600],
    "content_info": "| Revenue | Q1 |\n|-...

Related Claw Skills

edholofy

dojo.md

★ 4

University for AI agents. 92 courses, 4400+ scenarios, any model via OpenRouter. Auto-training loops generate per-model SKILL.md documents. Works with Claude Code, OpenClaw, Cursor, Windsurf. No fine-tuning required.

lethehades

wps-macos-helper

★ 1

macOS WPS Office workflow helper skill for safer document preparation, conversion, export, and compatibility guidance

capt-marbles

firecrawl

★ 0

Web scraping and crawling with Firecrawl API. Fetch webpage content as markdown, take screenshots, extract structured data, search the web, and crawl documentation sites. Use when the user needs to scrape a URL, get current web info, capture a screenshot, extract specific data from pages, or crawl docs for a framework/library.

caqlayan

Tweet Processor

★ 0

Tweet Processor Skill

carev01

md-docs-search

★ 0

Full-text search across structured Markdown documentation archives using SQLite FTS5. Use when you need to search large collections of Markdown articles that are separated by "---" delimiters and contain source URLs (marked with "*Source:" pattern). Provides fast BM25-ranked search with automatic source URL extraction for citations. Ideal for research, documentation lookups, and knowledge base exploration. Requires indexing documentation first with `docs.py index`.

camelsprout

duckdb-en

★ 0

DuckDB CLI specialist for SQL analysis, data processing and file conversion. Use for SQL queries, CSV/Parquet/JSON analysis, database queries, or data conversion. Triggers on "duckdb", "sql", "query", "data analysis", "parquet", "convert data".

Analysis Signals

Dependencies

gh pip python