TopRank Skills

Home / Claw Skills / Document / crawl
Official OpenClaw rules 54%

crawl

Crawl any website and save pages as local markdown files. Use when you need to download documentation, knowledge bases, or web content for offline access or analysis. No code required - just provide a URL.

Stars

0

Installs

0

Status

ACTIVE

Visibility

PUBLIC

安装方式

直接复制以下提示词,发送给你的 AI 助手即可完成安装。

请先检查是否已安装 SkillHub 商店,若未安装,请根据 https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md 安装 SkillHub 商店,然后安装 crawl 技能。 若已安装,则直接安装 crawl 技能。

Overview

Skill Key
evanydl/tavily-crwal
Author
evanydl
Source Repo
openclaw/skills
Version
-
Source Path
skills/evanydl/tavily-crwal
Latest Commit SHA
b8f784a2de7c650b768f5ae544d7f891c364c421

Extracted Content

SKILL.md excerpt

# Crawl Skill

Crawl websites to extract content from multiple pages. Ideal for documentation, knowledge bases, and site-wide content extraction.

## Authentication

The script uses OAuth via the Tavily MCP server. **No manual setup required** - on first run, it will:
1. Check for existing tokens in `~/.mcp-auth/`
2. If none found, automatically open your browser for OAuth authentication

> **Note:** You must have an existing Tavily account. The OAuth flow only supports login — account creation is not available through this flow. [Sign up at tavily.com](https://tavily.com) first if you don't have an account.

### Alternative: API Key

If you prefer using an API key, get one at https://tavily.com and add to `~/.claude/settings.json`:
```json
{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}
```

## Quick Start

### Using the Script

```bash
./scripts/crawl.sh '<json>' [output_dir]
```

**Examples:**
```bash
# Basic crawl
./scripts/crawl.sh '{"url": "https://docs.example.com"}'

# Deeper crawl with limits
./scripts/crawl.sh '{"url": "https://docs.example.com", "max_depth": 2, "limit": 50}'

# Save to files
./scripts/crawl.sh '{"url": "https://docs.example.com", "max_depth": 2}' ./docs

# Focused crawl with path filters
./scripts/crawl.sh '{"url": "https://example.com", "max_depth": 2, "select_paths": ["/docs/.*", "/api/.*"], "exclude_paths": ["/blog/.*"]}'

# With semantic instructions (for agentic use)
./scripts/crawl.sh '{"url": "https://docs.example.com", "instructions": "Find API documentation", "chunks_per_source": 3}'
```

When `output_dir` is provided, each crawled page is saved as a separate markdown file.

### Basic Crawl

```bash
curl --request POST \
  --url https://api.tavily.com/crawl \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://docs.example.com",
    "max_depth": 1,
    "limit": 20
  }'
```

### Focused Crawl with Instructions

```bash
curl --request P...

Related Claw Skills