TopRank Skills

wechat-article-extractor

Extract full text and figures from a WeChat public account (微信公众号) article URL and save as a clean Markdown file. Handles WeChat's bot-detection by finding mirror sites automatically. Use when the user shares an mp.weixin.qq.com URL and asks to save, archive, extract, or read the article.

Stars: 0
Installs: 0
Status: ACTIVE
Visibility: PUBLIC

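Installation

Copy the prompt below and send it to your AI assistant to complete the installation.

First check whether the SkillHub store is installed. If it is not, install it by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the wechat-article-extractor skill. If it is already installed, install the wechat-article-extractor skill directly.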

Overview

Skill Key: chunhualiao/wechat-article-extractor
Author: chunhualiao
Source Repo: openclaw/skills
Version: -
Source Path: skills/chunhualiao/wechat-article-extractor
Latest Commit SHA: eb46cc32205396ccb0a500f3517058a93ecaf2f9

Extracted Content

SKILL.md excerpt

# WeChat Article Extractor

Extract WeChat public account articles to clean Markdown. WeChat blocks headless browsers with a 环境异常 ("abnormal environment") CAPTCHA, and `web_fetch` gets empty JS-rendered pages, so the reliable approach is to find a mirror on an aggregator site and extract the content from there.

## Scope & Boundaries

**This skill handles:**
- Extracting article text, images, and metadata from WeChat article URLs
- Finding mirror copies when direct access is blocked
- Converting HTML to clean Markdown
- Saving output as `.md` files

**This skill does NOT handle:**
- Publishing or syncing to note-taking apps (that's the user's workflow)
- Batch extraction of multiple articles (handle one at a time)
- WeChat login, authentication, or account management
- Translating article content

## Inputs

| Input | Required | Description |
|-------|----------|-------------|
| WeChat URL | Yes | An `mp.weixin.qq.com` link |
| Output filename | No | Defaults to kebab-case of article title |
| Save location | No | Defaults to `/tmp/` |
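
The defaults in the table can be sketched as follows. This is a minimal illustration, not the skill's actual code: the helper names `kebab_case` and `default_output_path` are hypothetical, as are the choice to keep CJK characters in the slug and the `untitled` fallback.

```python
import re
from pathlib import PurePosixPath


def kebab_case(title: str) -> str:
    """Lowercase the title and join its word runs with hyphens."""
    # \W is Unicode-aware in Python 3, so CJK characters survive in the slug
    # (an assumption; the skill may romanize or drop them instead).
    slug = re.sub(r"[\W_]+", "-", title.lower()).strip("-")
    # Hypothetical fallback for titles made only of punctuation.
    return slug or "untitled"


def default_output_path(title: str, save_dir: str = "/tmp/") -> str:
    """Build the default output path: <save_dir>/<kebab-case-title>.md."""
    return str(PurePosixPath(save_dir) / f"{kebab_case(title)}.md")
```

For example, `default_output_path("Hello, World!")` yields `/tmp/hello-world.md`.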

## Outputs

- A Markdown file with full article content, images, and metadata header
- Console confirmation with file path and character count
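
A rough sketch of the save-and-confirm step, assuming the Markdown text has already been assembled. The function name `save_markdown` and the exact wording of the confirmation message are hypothetical; the source only says that the file path and character count are reported.

```python
from pathlib import Path


def save_markdown(markdown: str, path: str) -> str:
    """Write the article to disk and return the confirmation line."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(markdown, encoding="utf-8")
    # Character count, not byte count: Chinese text is multi-byte in UTF-8.
    message = f"Saved {out} ({len(markdown)} characters)"
    print(message)
    return message
```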

## Workflow

### Step 1 — Try direct fetch (fast path)

```
web_fetch(url, extractMode="markdown", maxChars=50000)
```

**Success check:** If result `rawLength > 500` AND content has real paragraphs (not just nav/footer text) → skip to Step 4 Option B.

**Failure indicators:** `rawLength < 500`, content is navigation/boilerplate only, or contains "环境异常" → go to Step 2.
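
The success and failure checks above could be expressed roughly as below. The 500-character threshold and the "环境异常" marker come from the text; the `has_real_paragraphs` heuristic, its thresholds, and both function names are assumptions made for illustration, since the skill does not define what counts as a "real paragraph". A `rawLength` of exactly 500 is treated as a failure here, which is also an assumption.

```python
BLOCK_MARKER = "环境异常"  # text shown on WeChat's bot-detection page
MIN_RAW_LENGTH = 500


def has_real_paragraphs(content: str, min_chars: int = 80, min_count: int = 2) -> bool:
    """Hypothetical heuristic: enough blank-line-separated blocks of real length."""
    blocks = [block.strip() for block in content.split("\n\n")]
    return sum(len(block) >= min_chars for block in blocks) >= min_count


def direct_fetch_succeeded(raw_length: int, content: str) -> bool:
    """Return True when the direct fetch looks like a full article."""
    if BLOCK_MARKER in content:
        return False  # blocked page: fall through to Step 2
    return raw_length > MIN_RAW_LENGTH and has_real_paragraphs(content)
```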

### Step 2 — Extract article metadata

From the URL or any partial content, identify:
- Article title (from `<title>` or og:title)
- Author / account name (from og:description or page content)

If metadata is unavailable from the URL, ask the user for the article title.
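
One way to pull these fields out of partial HTML, using only the standard library. This is a sketch under assumptions: the names `MetaParser` and `extract_metadata` are hypothetical, `<title>` is tried before `og:title` simply because the list above names it first, and `og:description` is returned unparsed because its layout is not specified here.

```python
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Collect <title> text and Open Graph meta tags from a page."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key in ("og:title", "og:description") and attrs.get("content"):
                self.og[key] = attrs["content"].strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_metadata(html: str) -> dict:
    """Return the article title and raw description, or None where missing."""
    parser = MetaParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.title_parts).strip() or parser.og.get("og:title")
    return {"title": title, "description": parser.og.get("og:description")}
```

If `extract_metadata(...)["title"]` comes back as `None`, that is the point at which to ask the user for the title.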

### Step 3 — Search for mirrors

```
web_search("<article title> <author/account name>")
```
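
Building the query string is simple enough to sketch directly. The helper name `build_mirror_query` is hypothetical, and whitespace normalization is an added assumption; the source only shows the title followed by the author or account name.

```python
from typing import Optional


def build_mirror_query(title: str, account: Optional[str] = None) -> str:
    """Join the article title and optional account name into one search query."""
    parts = [" ".join(title.split())]  # collapse runs of whitespace
    if account and account.strip():
        parts.append(" ".join(account.split()))
    return " ".join(parts)
```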

**Mirror site priority** (ranked by content quality and reliability):
1. **53ai.com** — full content, re...

README excerpt

# wechat-article-extractor

Extract WeChat public account (微信公众号) articles to clean Markdown files with images and metadata.

## Problem

WeChat articles are notoriously difficult to archive:
- Direct scraping is blocked by bot detection (环境异常 CAPTCHA)
- `web_fetch` gets empty JavaScript-rendered shells
- Headless browsers trigger anti-bot measures

This skill works around these limitations by automatically finding mirror copies on aggregator sites, then extracting clean content.

## How It Works

1. Attempts direct fetch (works ~10% of the time)
2. If blocked, searches for mirror copies on aggregator sites (53ai.com, ofweek.com, juejin.cn, etc.)
3. Downloads mirror HTML and extracts article content, images, and metadata
4. Outputs clean Markdown with proper formatting

Falls back to Chrome Extension Relay for very new or niche articles with no mirrors.
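
Choosing a mirror from search results might look like the sketch below. The three domains are the ones named above; treating their listing order as a priority order is an assumption, and `mirror_rank` and `pick_mirror` are hypothetical names rather than functions shipped with the skill.

```python
from typing import Iterable, Optional
from urllib.parse import urlparse

# Aggregator sites named in this README; order assumed to reflect priority.
MIRROR_DOMAINS = ["53ai.com", "ofweek.com", "juejin.cn"]


def mirror_rank(url: str) -> Optional[int]:
    """Return the priority index of a known mirror domain, or None."""
    host = (urlparse(url).hostname or "").lower()
    for rank, domain in enumerate(MIRROR_DOMAINS):
        # Match the domain itself or any subdomain, but not lookalikes.
        if host == domain or host.endswith("." + domain):
            return rank
    return None


def pick_mirror(urls: Iterable[str]) -> Optional[str]:
    """Pick the highest-priority mirror URL; None means no mirror was found."""
    ranked = [(mirror_rank(url), url) for url in urls]
    ranked = [pair for pair in ranked if pair[0] is not None]
    return min(ranked, key=lambda pair: pair[0])[1] if ranked else None
```

A `None` result corresponds to the case where no mirror exists and the Chrome Extension Relay fallback applies.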

## Installation

Copy the skill directory to your OpenClaw skills folder:

```bash
cp -r wechat-article-extractor ~/.openclaw/<workspace>/skills/
```

### Requirements

- Python 3.8+
- `curl` (for downloading mirror pages)
- OpenClaw tools: `web_fetch`, `web_search`, `exec`
- Optional: `browser` tool (for Chrome Relay fallback)
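
A quick preflight check for the two local requirements can be sketched as follows. The function name `missing_requirements` is hypothetical, and the OpenClaw tools are not checked because how they are exposed is not described here.

```python
import shutil
import sys


def missing_requirements() -> list:
    """Return the names of local requirements that are not satisfied."""
    missing = []
    if sys.version_info < (3, 8):
        missing.append("Python 3.8+")
    if shutil.which("curl") is None:  # curl must be on PATH
        missing.append("curl")
    return missing
```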

## Usage

Share a WeChat article URL with your agent:

> "Save this article: https://mp.weixin.qq.com/s/example123"

The skill triggers automatically on `mp.weixin.qq.com` URLs.

### Trigger Phrases

- Any `mp.weixin.qq.com` URL
- "extract wechat article"
- "save wechat article"
- "archive wechat"
- "提取公众号文章" ("extract official-account article")
- "保存公众号文章" ("save official-account article")
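
For illustration, trigger detection could be approximated as below. How OpenClaw actually matches triggers is not described here, so the regular expression, the case-insensitive substring match, and the names `find_wechat_urls` and `should_trigger` are all assumptions.

```python
import re

# Matches mp.weixin.qq.com links up to the next whitespace, quote, or bracket.
WECHAT_URL_RE = re.compile(r"https?://mp\.weixin\.qq\.com/[^\s\"'<>]+")

TRIGGER_PHRASES = (
    "extract wechat article",
    "save wechat article",
    "archive wechat",
    "提取公众号文章",
    "保存公众号文章",
)


def find_wechat_urls(message: str) -> list:
    """Return every WeChat article URL found in the message."""
    return WECHAT_URL_RE.findall(message)


def should_trigger(message: str) -> bool:
    """True if the message contains a WeChat URL or a known trigger phrase."""
    lowered = message.lower()
    return bool(find_wechat_urls(message)) or any(p in lowered for p in TRIGGER_PHRASES)
```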

## Output Format

```markdown
# Article Title

**作者:** Author Name
**来源:** 微信公众号「Account Name」
**日期:** 2024-01-15
**原文:** https://mp.weixin.qq.com/s/...

---

Full article content with images preserved...
```
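
Assembling that layout from extracted fields might look like this. It is a minimal sketch: `render_article` and its parameter names are hypothetical, and the included script may build the header differently.

```python
def render_article(title: str, author: str, account: str,
                   date: str, source_url: str, body: str) -> str:
    """Render the metadata header and body in the output format shown above."""
    lines = [
        f"# {title}",
        "",
        f"**作者:** {author}",
        f"**来源:** 微信公众号「{account}」",
        f"**日期:** {date}",
        f"**原文:** {source_url}",
        "",
        "---",
        "",
        body.strip(),
        "",  # ensures the file ends with a single newline
    ]
    return "\n".join(lines)
```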

## Extraction Script

The included Python script handles HTML-to-Markdown conversion:

```bash
# Extract from downloaded HTML
python3 scripts/extract_wechat.py article.html output.md

# With source URL for metadata
python3 scr...
