PDF OCR using Gemini LLM

Extract text from PDFs using Google Gemini OCR. Use when extracting text from PDFs, performing OCR on scanned documents, or processing image-based PDFs.

Status: ACTIVE
Visibility: PUBLIC

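Installation

Copy the prompt below and send it to your AI assistant to complete the installation.

First check whether the SkillHub store is installed. If it is not, install it by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the PDF OCR using Gemini LLM skill. If it is already installed, install the PDF OCR using Gemini LLM skill directly.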

Overview

Skill Key: ashtonizmev/geminipdfocr
Author: ashtonizmev
Source Repo: openclaw/skills
Version: -
Source Path: skills/ashtonizmev/geminipdfocr
Latest Commit SHA: c1f5803e02f49f7c5799097c044f3e780fb20e05

Extracted Content

SKILL.md excerpt

## Purpose

Use geminipdfocr to extract text from PDF documents via OCR (Google Gemini).

## Data and privacy

**Full page images/files are sent to Google's API.** PDFs are split into single-page files and each page is uploaded to Google Gemini for OCR. There are no hidden exfiltration endpoints or other data collection. Do not use with highly sensitive documents unless you accept that content is sent to Google.

## Setup (venv installation)

Before first use, create and activate the virtual environment:

```bash
cd geminipdfocr && python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
```

Set `GOOGLE_API_KEY` in your environment before running (e.g. `export GOOGLE_API_KEY=your-key`).
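
A wrapper script may want to fail early when the key is missing rather than partway through an upload. The following is a minimal sketch of such a pre-flight check. The helper name `require_api_key` is hypothetical and is not part of geminipdfocr; only the `GOOGLE_API_KEY` variable name comes from the text above.

```python
import os


def require_api_key(env=None):
    """Return the GOOGLE_API_KEY value, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of geminipdfocr.
    """
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; export it before running geminipdfocr"
        )
    return key
```

Calling `require_api_key()` with no arguments reads the process environment, so it reflects whatever `export GOOGLE_API_KEY=...` set in the current shell.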

## How to use

When requested to extract text or perform OCR on a PDF:

1. Run: `cd geminipdfocr && source venv/bin/activate && python -m geminipdfocr <path-to-pdf> [--json] [--output <file>]`
2. Add `--json` to get structured JSON instead of plain text.
3. Add `--max-pages N` to limit the pages processed, for testing or very long documents.
4. Add `--quiet` to suppress progress logs.
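
The steps above can also be driven from Python. This is a rough sketch, not part of the skill: `build_command` and `run_ocr` are hypothetical names, and calling the virtual environment's interpreter directly (`venv/bin/python`) instead of running `source venv/bin/activate` is an assumption that holds for a standard venv on a POSIX system. Only flags documented in this file are used.

```python
import subprocess
from pathlib import Path


def build_command(pdf_paths, json_output=False, output=None,
                  max_pages=None, quiet=False, python="venv/bin/python"):
    """Assemble the argv list for `python -m geminipdfocr`.

    Hypothetical helper; the interpreter path is an assumption of this sketch.
    """
    cmd = [python, "-m", "geminipdfocr", *map(str, pdf_paths)]
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]
    if json_output:
        cmd.append("--json")
    if output is not None:
        cmd += ["--output", str(output)]
    if quiet:
        cmd.append("--quiet")
    return cmd


def run_ocr(pdf_paths, project_dir="geminipdfocr", **options):
    """Run the CLI from the project directory and return its stdout."""
    # Resolve PDF paths first, because the child process runs inside project_dir.
    absolute = [Path(p).resolve() for p in pdf_paths]
    result = subprocess.run(
        build_command(absolute, **options),
        cwd=project_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `run_ocr(["scan.pdf"], json_output=True, max_pages=2, quiet=True)` would run the module on the first two pages and return whatever the CLI prints. A relative `output` path is interpreted relative to `project_dir`, so an absolute path is safer there.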

## Requirements

- A valid PDF file path.
- `GOOGLE_API_KEY` set in the process environment (e.g. `export GOOGLE_API_KEY=your-key`).

## CLI options

| Option | Description |
|--------|-------------|
| `pdf_path` | One or more PDF file paths (positional) |
| `--max-pages N` | Limit pages per PDF |
| `--json` | Output structured JSON instead of plain text |
| `--output FILE` | Write result to file (default: stdout) |
| `--quiet` | Suppress INFO/DEBUG logs |
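
To make the shape of these options concrete, here is an illustrative `argparse` parser that mirrors the table. It is a reconstruction for reading purposes only and is not taken from the geminipdfocr source; the defaults (`None` for `--max-pages` and `--output`) and the `int` type for `N` are assumptions.

```python
import argparse


def make_parser():
    """Build a parser mirroring the options table (illustrative, not the real one)."""
    parser = argparse.ArgumentParser(prog="geminipdfocr")
    parser.add_argument("pdf_path", nargs="+",
                        help="One or more PDF file paths")
    parser.add_argument("--max-pages", type=int, metavar="N", default=None,
                        help="Limit pages per PDF")
    parser.add_argument("--json", action="store_true",
                        help="Output structured JSON instead of plain text")
    parser.add_argument("--output", metavar="FILE", default=None,
                        help="Write result to file (default: stdout)")
    parser.add_argument("--quiet", action="store_true",
                        help="Suppress INFO/DEBUG logs")
    return parser


# Example: two pages of one PDF, JSON output, everything else left at defaults.
args = make_parser().parse_args(["scan.pdf", "--max-pages", "2", "--json"])
print(args.pdf_path, args.max_pages, args.json, args.output, args.quiet)
```

Because `pdf_path` accepts one or more values, several PDFs can be passed in a single invocation, and `--max-pages` then applies to each of them.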
