TopRank Skills

Official OpenClaw rules 36%

web-fetcher

Fetch web pages and extract readable content for AI use. Use when reading, summarizing, or crawling a specific URL or small set of URLs. Prefer low-friction URL-to-Markdown services first, then fall back to browser-based retrieval, search snippets, or cached/indexed copies when sites are protected by Cloudflare or similar bot checks.

Stars: 0
Installs: 0
Status: ACTIVE
Visibility: PUBLIC

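Installation

Copy the prompt below and send it to your AI assistant to install the skill.

First check whether the SkillHub store is installed. If it is not, install SkillHub by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the web-fetcher skill. If SkillHub is already installed, install the web-fetcher skill directly.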

Overview

Skill Key: aurthes/aurthes-web-fetcher
Author: aurthes
Source Repo: openclaw/skills
Version: 1.2.0
Source Path: skills/aurthes/aurthes-web-fetcher
Latest Commit SHA: dd9ed0dbcba39fa67f056668c8d4daaafd2e9316

Extracted Content

SKILL.md excerpt

# Web Fetcher

Fetch readable web content with a reliability-first fallback chain.

## Core rule

Do **not** promise direct access to every site. Some sites use Cloudflare, login walls, bot detection, or legal restrictions. In those cases, switch to the next fallback instead of insisting the first method should work.

## Preferred fetch order

### 1) Direct readable fetch

Try lightweight conversion services first:

1. **r.jina.ai**
   ```
   https://r.jina.ai/https://example.com
   ```

2. **markdown.new**
   ```
   https://markdown.new/https://example.com
   ```

3. **defuddle**
   ```
   https://defuddle.md/https://example.com
   ```
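All three patterns share one shape: the service's base URL followed by the full target URL. A minimal Python sketch that builds the candidate URLs in the listed order; `reader_urls` and `READER_PREFIXES` are illustrative names, not part of the skill.

```python
from urllib.parse import urlparse

# Reader services in the order listed above. Each one takes the full
# target URL appended directly to its own base URL.
READER_PREFIXES = [
    ("r.jina.ai", "https://r.jina.ai/"),
    ("markdown.new", "https://markdown.new/"),
    ("defuddle", "https://defuddle.md/"),
]


def reader_urls(target: str) -> list[tuple[str, str]]:
    """Return (service, url) pairs to try in order for an absolute http(s) URL."""
    parsed = urlparse(target)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {target!r}")
    return [(name, prefix + target) for name, prefix in READER_PREFIXES]


# Usage: list the candidate URLs for one article.
for service, url in reader_urls("https://example.com/article"):
    print(service, url)
```

Rejecting bare hostnames such as `example.com` up front avoids sending a malformed path to every service in turn.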

For deterministic retries, use the bundled script:

```bash
python {baseDir}/scripts/fetch_url.py "https://example.com/article"
```

The script returns JSON with:
- chosen method
- attempt history
- blocked/thin-content detection
- final content when successful

Use these when the user wants article text, page summaries, or structured extraction from normal public pages.
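The retry idea can be sketched in Python to show the control flow. This is not the bundled `fetch_url.py`, whose exact JSON fields are not shown here; `fetch_with_fallback`, `http_get`, `CHAIN`, and the 500-character `MIN_CHARS` threshold are assumptions. It mirrors the listed outputs: a chosen method, an attempt history, a thin-content check, and the content on success.

```python
import urllib.request
from typing import Callable

# Reader services in the order listed above (base URL + full target URL).
CHAIN = [
    ("r.jina.ai", "https://r.jina.ai/"),
    ("markdown.new", "https://markdown.new/"),
    ("defuddle", "https://defuddle.md/"),
]
# Hypothetical threshold: responses shorter than this count as "thin".
MIN_CHARS = 500


def http_get(url: str, timeout: float = 20.0) -> str:
    """GET `url` and decode the body as UTF-8, replacing undecodable bytes."""
    req = urllib.request.Request(url, headers={"User-Agent": "web-fetcher-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_with_fallback(target: str, get: Callable[[str], str] = http_get) -> dict:
    """Try each service in order; return the first usable result plus the attempt log."""
    attempts = []
    for method, prefix in CHAIN:
        try:
            text = get(prefix + target)
        except OSError as exc:  # URLError, HTTPError, and socket timeouts all subclass OSError
            attempts.append({"method": method, "ok": False, "error": str(exc)})
            continue
        if len(text.strip()) < MIN_CHARS:
            attempts.append({"method": method, "ok": False, "error": "thin content"})
            continue
        attempts.append({"method": method, "ok": True})
        return {"method": method, "attempts": attempts, "content": text}
    # Every service failed: return the history so the caller can escalate.
    return {"method": None, "attempts": attempts, "content": None}
```

The result is a plain dict, so `json.dumps` turns it into JSON directly. Passing `get` as a parameter keeps the loop testable offline, and a length check alone will not catch every challenge page, so a stricter validator based on the failure signs below is worth adding.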

### 2) Detect failure modes early

Treat the fetch as failed or unreliable if you see signs like:

- `Just a moment...`
- `Performing security verification`
- `Enable JavaScript and cookies`
- CAPTCHA / challenge pages
- login wall instead of target content
- obvious truncation / missing article body

When this happens, **stop treating the result as the page content**.
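The checks above can be approximated with plain string matching. A minimal Python sketch; the marker list comes from the signs above, while `failure_reason`, the login-wall regex, and the 150-word `MIN_WORDS` threshold are hypothetical heuristics.

```python
import re
from typing import Optional

# Challenge-page phrases from the list above, matched case-insensitively.
CHALLENGE_MARKERS = [
    "just a moment...",
    "performing security verification",
    "enable javascript and cookies",
    "captcha",
]
# Hypothetical heuristics for login walls and truncated bodies.
LOGIN_PATTERN = re.compile(r"\b(?:sign|log) in to (?:continue|view)\b", re.IGNORECASE)
MIN_WORDS = 150


def failure_reason(text: str) -> Optional[str]:
    """Return why `text` should not be treated as page content, or None if it looks usable."""
    lowered = text.lower()
    for marker in CHALLENGE_MARKERS:
        if marker in lowered:
            return f"challenge page: {marker!r}"
    if LOGIN_PATTERN.search(text):
        return "login wall"
    if len(text.split()) < MIN_WORDS:
        return "thin or truncated content"
    return None
```

Substring matching is crude: an article that discusses CAPTCHAs, or a long page with a "sign in to continue" comment box, will be flagged. Treat a non-`None` result as a reason to try the next fallback rather than as proof the page is blocked.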

### 3) Browser fallback for protected sites

For sites blocked behind Cloudflare or requiring real browser execution:

- Prefer a real browser session via OpenClaw browser tools when available.
- If the user is using the Chrome relay/extension, ask them to attach the tab and then inspect the live rendered page.
- Snapshot the page and extract only the needed fields.

Use browser fallback for:
- JS-heavy pages
- Cloudflare-protected pages
- sites that render key content after load
- pages where the direct markdown services return verification screens
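When OpenClaw browser tools and the Chrome relay are unavailable, a headless browser can stand in for JS-heavy pages. The sketch below swaps in Playwright for Python (an assumption; the skill names OpenClaw browser tools, not Playwright) to load a page, wait for network activity to settle, and return the text of a single CSS selector, in the spirit of extracting only the needed fields. `fetch_rendered` and its defaults are hypothetical.

```python
from typing import Optional


def fetch_rendered(url: str, selector: str = "body", timeout_ms: int = 30_000) -> Optional[str]:
    """Render `url` in headless Chromium and return the text of `selector`, or None if absent."""
    # Imported lazily so this module still loads where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits for late, JS-driven rendering to settle.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            node = page.query_selector(selector)
            return node.inner_text() if node else None
        finally:
            browser.close()
```

Running it requires `pip install playwright` followed by `playwright install chromium`. Headless automation is itself a common bot-detection signal, so this helps more with JS-rendered pages than with Cloudflare challenges. For those, the attached real browser tab described above remains the better option.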

### 4) Search / indexed fallback

If direct fetch...
