TopRank Skills


scraper

Scrape documents from Notion, DocSend, PDFs, and other sources into local PDF files. Use when the user needs to download, archive, or convert web documents to PDF format. Supports authentication flows for protected documents and session persistence via profiles. Returns local file paths to downloaded PDFs.

Stars: 0
Installs: 0
Status: ACTIVE
Visibility: PUBLIC

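Installation method

Copy the prompt below and send it to your AI assistant to complete the installation.

First check whether the SkillHub store is installed. If it is not, install the SkillHub store by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the scraper skill. If it is already installed, install the scraper skill directly.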

Overview

Skill Key: chrisling-dev/links-to-pdfs
Author: chrisling-dev
Source Repo: openclaw/skills
Version: -
Source Path: skills/chrisling-dev/links-to-pdfs
Latest Commit SHA: 87e5e4516c50a6a37092ec33d72b88b005eb6454

Extracted Content

SKILL.md excerpt

# docs-scraper

CLI tool that scrapes documents from various sources into local PDF files using browser automation.

## Installation

```bash
npm install -g docs-scraper
```

## Quick start

Scrape any document URL to PDF:

```bash
docs-scraper scrape https://example.com/document
```

Returns local path: `~/.docs-scraper/output/1706123456-abc123.pdf`
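
Since the output directory is cleaned up automatically (see Cleanup), it can be useful to copy a returned PDF somewhere permanent. The following is a minimal sketch, not part of docs-scraper: `keep_pdf` and the destination directory are hypothetical names, and the function takes the path as an argument because this excerpt does not specify how the CLI prints it.

```bash
# Hypothetical helper, not part of docs-scraper: copy a scraped PDF into a
# permanent directory and print the path of the copy.
# Usage: keep_pdf <pdf-path> <destination-dir>
keep_pdf() {
  src=$1
  dest_dir=$2
  if [ ! -f "$src" ]; then
    echo "keep_pdf: no such file: $src" >&2
    return 1
  fi
  mkdir -p "$dest_dir" || return 1
  cp "$src" "$dest_dir/" || return 1
  # ${src##*/} strips everything up to the last slash (the file name).
  printf '%s\n' "$dest_dir/${src##*/}"
}
```

For example, `keep_pdf ~/.docs-scraper/output/1706123456-abc123.pdf ~/Documents/scraped` would copy that file and print the new location.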

## Basic scraping

**Scrape with daemon** (recommended, keeps browser warm):
```bash
docs-scraper scrape <url>
```

**Scrape with named profile** (for authenticated sites):
```bash
docs-scraper scrape <url> -p <profile-name>
```

**Scrape with pre-filled data** (e.g., email for DocSend):
```bash
docs-scraper scrape <url> -D email=user@example.com
```

**Direct mode** (single-shot, no daemon):
```bash
docs-scraper scrape <url> --no-daemon
```

## Authentication workflow

When a document requires authentication (login, email verification, passcode):

1. Initial scrape returns a job ID:
   ```bash
   docs-scraper scrape https://docsend.com/view/xxx
   # Output: Scrape blocked
   #         Job ID: abc123
   ```

2. Retry with data:
   ```bash
   docs-scraper update abc123 -D email=user@example.com
   # or with password
   docs-scraper update abc123 -D email=user@example.com -D password=1234
   ```
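
The two steps above can be chained in a script by pulling the job ID out of the blocked-scrape output. This is a rough sketch under assumptions: `extract_job_id` is a hypothetical helper, and it assumes the output contains a line of the form `Job ID: abc123` as shown in step 1. The exact output format, and whether it goes to stdout or stderr, is not specified in this excerpt.

```bash
# Hypothetical helper, not part of docs-scraper: read CLI output on stdin
# and print the value that follows the first "Job ID:" label.
# Assumes the blocked-scrape output contains a line like "Job ID: abc123".
extract_job_id() {
  sed -n 's/^.*Job ID:[[:space:]]*\([^[:space:]]*\).*$/\1/p' | head -n 1
}

# Example (runs the real CLI; both streams are captured because the excerpt
# does not say which one carries the message):
#   job_id=$(docs-scraper scrape "$url" 2>&1 | extract_job_id)
#   [ -n "$job_id" ] && docs-scraper update "$job_id" -D email=user@example.com
```

If no `Job ID:` line is present, the function prints nothing, so an empty result can be treated as "no authentication step was requested".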

## Profile management

Profiles store session cookies for authenticated sites.

```bash
docs-scraper profiles list     # List saved profiles
docs-scraper profiles clear    # Clear all profiles
docs-scraper scrape <url> -p myprofile  # Use a profile
```

## Daemon management

The daemon keeps browser instances warm for faster scraping.

```bash
docs-scraper daemon status     # Check status
docs-scraper daemon start      # Start manually
docs-scraper daemon stop       # Stop daemon
```

Note: The daemon starts automatically when you run a scrape command.

## Cleanup

PDFs are stored in `~/.docs-scraper/output/`. The daemon automatically cleans up files older than 1 hour.
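
To preview which files fall outside the one-hour window, a standard `find` query can approximate the rule. This is only a sketch and not the daemon's actual logic: `list_stale_pdfs` is a hypothetical name, and it assumes file age is judged by modification time.

```bash
# Hypothetical helper, not part of docs-scraper: list PDFs in a directory
# whose modification time is more than about an hour in the past.
# Usage: list_stale_pdfs [directory]   (default: ~/.docs-scraper/output)
list_stale_pdfs() {
  dir=${1:-"$HOME/.docs-scraper/output"}
  # Nothing to report if the directory does not exist yet.
  [ -d "$dir" ] || return 0
  # -mmin +60 matches files last modified more than 60 minutes ago.
  find "$dir" -maxdepth 1 -type f -name '*.pdf' -mmin +60
}
```

The function only lists files and deletes nothing, so it is safe to run before deciding whether anything needs to be copied elsewhere.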

Manual cleanup:
```bash
docs-scraper cleanup                    # Delete...
```