docling

Extract and parse content from web pages, PDFs, documents (docx, pptx), and images using the docling CLI with GPU acceleration. Use INSTEAD of web_fetch for extracting content from specific URLs when you need clean, structured text. Use Brave (web_search) for searching/discovering pages. Use docling when you HAVE a URL and need its content parsed.

Status: ACTIVE
Visibility: PUBLIC

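Installation

Copy the prompt below and send it to your AI assistant to complete the installation.

First check whether the SkillHub store is installed. If it is not, install it by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the docling skill. If it is already installed, install the docling skill directly.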

Overview

Skill Key: er3mit4/docling
Author: er3mit4
Source Repo: openclaw/skills
Version: 1.0.2
Source Path: skills/er3mit4/docling
Latest Commit SHA: 1ccee7faefccdc2434b902a6042997c23f870a4c

Extracted Content

SKILL.md excerpt

# Docling - Document & Web Content Extraction

CLI tool for parsing documents and web pages into clean, structured text. It can use GPU acceleration for OCR and its ML models.

## Prerequisites

- `docling` CLI must be installed (e.g., via `pipx install docling`)
- For GPU support: NVIDIA GPU with CUDA drivers
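
The prerequisites above can be checked before the first run. The following is a minimal sketch, not part of the skill itself: the helper names `have_cmd` and `check_docling_prereqs` are hypothetical, and it assumes that the presence of `nvidia-smi` on `PATH` is a reasonable proxy for installed NVIDIA drivers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not provided by docling) that report whether the
# tools named in the prerequisites are discoverable on PATH.

# Return 0 if the named command exists on PATH, non-zero otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print one status line for the docling CLI and one for the NVIDIA driver tool.
check_docling_prereqs() {
  if have_cmd docling; then
    echo "docling: found"
  else
    echo "docling: missing (try: pipx install docling)"
  fi
  # Assumption: nvidia-smi is installed alongside the NVIDIA driver, so its
  # absence suggests that only CPU processing will be available.
  if have_cmd nvidia-smi; then
    echo "gpu: nvidia-smi found"
  else
    echo "gpu: nvidia-smi not found, likely CPU only"
  fi
}
```

Calling `check_docling_prereqs` prints two status lines and changes nothing on the system.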

## When to Use

- **Extract content from a URL** → Use docling (not web_fetch)
- **Search for information** → Use web_search (Brave)
- **Parse PDFs, DOCX, PPTX** → Use docling
- **OCR on images** → Use docling
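
The routing rules above can be expressed as a small decision function. This is an illustrative sketch only: `pick_tool` is a hypothetical name, the extension list is an assumption, and matching is case-sensitive, so `REPORT.PDF` would fall through to the search branch.

```bash
#!/usr/bin/env bash
# Hypothetical router: print which tool the rules above would select
# for a given input string.
pick_tool() {
  case "$1" in
    # A concrete URL whose content is needed: parse it with docling.
    http://*|https://*) echo "docling" ;;
    # Local documents and images: also docling.
    *.pdf|*.docx|*.pptx|*.png|*.jpg|*.jpeg) echo "docling" ;;
    # Anything else is treated as a free-text query for web_search (Brave).
    *) echo "web_search" ;;
  esac
}
```

For example, `pick_tool "https://example.com/article"` prints `docling`, while `pick_tool "latest docling release notes"` prints `web_search`.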

## Quick Commands

### Web Page → Markdown (default)
```bash
docling "<URL>" --from html --to md
```
Output: writes a `.md` file to the current directory unless `--output` specifies a different directory.

### Web Page → Plain Text
```bash
docling "<URL>" --from html --to text --output /tmp/docling_out
```

### PDF with OCR
```bash
docling "/path/to/file.pdf" --ocr --device cuda --output /tmp/docling_out
```

## Key Options

| Option | Values | Description |
|--------|--------|-------------|
| `--from` | html, pdf, docx, pptx, image, md, csv, xlsx | Input format |
| `--to` | md, text, json, yaml, html | Output format |
| `--device` | auto, cuda, cpu | Accelerator (default: auto) |
| `--output` | path | Output directory (recommended: use controlled temp dir) |
| `--ocr` | flag | Enable OCR for images/scanned PDFs |
| `--tables` | flag | Extract tables (default: on) |
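
When the input type is not known in advance, the `--from` value can be derived from the input string. The sketch below assumes a simple extension-based mapping onto the values listed in the table. `guess_from_format` is a hypothetical helper, not a docling feature, and the image extensions it recognizes are an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an input path or URL to a --from value from the
# options table. Returns non-zero when no mapping is known.
guess_from_format() {
  case "$1" in
    # Extension checks come first so a URL ending in .pdf maps to pdf.
    *.pdf)  echo pdf ;;
    *.docx) echo docx ;;
    *.pptx) echo pptx ;;
    *.xlsx) echo xlsx ;;
    *.csv)  echo csv ;;
    *.md)   echo md ;;
    *.html|*.htm) echo html ;;
    *.png|*.jpg|*.jpeg|*.tif|*.tiff) echo image ;;
    # Any other URL is assumed to be a web page.
    http://*|https://*) echo html ;;
    *) return 1 ;;
  esac
}
```

A call such as `docling "$src" --from "$(guess_from_format "$src")" --to md` would then use the guessed value; inputs with no known mapping should be handled by the caller.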

## Security Notes

⚠️ **Avoid these flags unless you trust the source:**
- `--enable-remote-services` - can send data to remote endpoints
- `--allow-external-plugins` - loads third-party code
- Custom `--headers` with untrusted values - can redirect requests
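
One way to enforce the warning above is to screen an argument list before it reaches the CLI. This is a minimal sketch under the assumption that a plain string match on the three flags is sufficient. `has_risky_flag` is a hypothetical helper, and it flags every use of `--headers` because it cannot judge whether the header values are trusted.

```bash
#!/usr/bin/env bash
# Hypothetical guard: return 0 if any argument is one of the flags listed
# in the security notes, 1 otherwise.
has_risky_flag() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --enable-remote-services|--allow-external-plugins|--headers|--headers=*)
        return 0 ;;
    esac
  done
  return 1
}
```

A wrapper could refuse to continue when `has_risky_flag "$@"` succeeds and require explicit confirmation from the user instead.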

## Workflow

1. **For web content extraction**: Use `docling "<URL>" --from html --to text --output /tmp/docling_out`
2. **Read the output file** from the specified output directory
3. **Clean up** the output directory after reading
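
The three steps above can be combined into one function that always removes its temporary directory. This is a sketch under stated assumptions: `extract_text` and the `DOCLING_BIN` override are hypothetical names introduced here, and because the exact output filename is not specified in this excerpt, the function reads every file that lands in the output directory.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: convert a URL to plain text, print the result on
# stdout, and delete the temporary output directory afterwards.
extract_text() {
  local src="$1" out status=0
  # Step 1: convert into a fresh, private temp directory.
  out="$(mktemp -d "${TMPDIR:-/tmp}/docling_out.XXXXXX")" || return 1
  # DOCLING_BIN is an assumed override, handy for testing; default is docling.
  "${DOCLING_BIN:-docling}" "$src" --from html --to text --output "$out" >&2 \
    || status=$?
  # Step 2: read whatever files the conversion produced.
  if [ "$status" -eq 0 ]; then
    cat "$out"/* || status=$?
  fi
  # Step 3: clean up, whether or not the conversion succeeded.
  rm -rf "$out"
  return "$status"
}
```

Usage would look like `extract_text "https://example.com/article" > article.txt`. Anything the CLI prints on stdout is redirected to stderr so that only the extracted text reaches the caller.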

## GPU Support

Docling supports GPU acceleration via CUDA (NVIDIA). Verify CUDA is available:
```bash
python -c "import torch; print(torch.cuda.i...
```

Analysis Signals

Dependencies: pip, python, rust