article-extract

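Extracts the main body text of WeChat Official Account articles, blogs, news sites, and other web pages, working around anti-scraping measures, and outputs plain text.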

Status: ACTIVE

Visibility: PUBLIC

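How to Install

Copy the prompt below and send it to your AI assistant to complete the installation.

First check whether the SkillHub store is installed. If it is not, install it by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the article-extract skill. If it is already installed, install the article-extract skill directly.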

Overview

Skill Key: caozeal/article-extract
Author: caozeal
Source Repo: openclaw/skills
Version: -
Source Path: skills/caozeal/article-extract
Latest Commit SHA: 40626596cf0833162428422e526919bbb23ff180

Extracted Content

SKILL.md excerpt

# Article Extract

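A tool for extracting article content from web pages. It supports WeChat Official Account articles, blogs, news sites, and more, and outputs clean plain text.

## Features

- ✅ Works around WeChat Official Account anti-scraping checks by requesting pages with a standard browser User-Agent
- ✅ Automatically filters out scripts, styles, navigation, and other irrelevant content
- ✅ Pure Python implementation with no extra dependencies
- ✅ Accepts any web page URL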

## Installation

No installation is needed; run the script directly with Python 3.

## Usage

```bash
python3 skills/article-extract/scripts/extract.py <url>
```

### Examples

```bash
# Extract a WeChat Official Account article
python3 skills/article-extract/scripts/extract.py "https://mp.weixin.qq.com/s/xxxxx"

# Extract a blog post
python3 skills/article-extract/scripts/extract.py "https://example.com/blog/post"

# Save to a file
python3 skills/article-extract/scripts/extract.py "https://mp.weixin.qq.com/s/xxxxx" > article.txt
```

## Output

The tool writes the extracted plain text to stdout, so you can save it to a file with shell redirection:

```bash
python3 skills/article-extract/scripts/extract.py "https://..." > output.txt
```
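To call the tool from a Python program instead of a shell, the same stdout capture can be done with `subprocess`. This is a minimal sketch, assuming it runs from the repository root so the documented path `skills/article-extract/scripts/extract.py` resolves, and assuming the script exits non-zero on failure; `build_command` and `extract_to_file` are hypothetical helper names, not part of the skill.

```python
import subprocess
import sys
from typing import List

# Path documented in the Usage section; assumes the working directory is the repo root.
SCRIPT_PATH = "skills/article-extract/scripts/extract.py"


def build_command(url: str) -> List[str]:
    """Build the argument list for one extraction run (hypothetical helper)."""
    # Use the current interpreter instead of relying on `python3` being on PATH.
    return [sys.executable, SCRIPT_PATH, url]


def extract_to_file(url: str, output_path: str) -> None:
    """Run the extractor and send its stdout to output_path (hypothetical helper)."""
    # Binary mode: the child process writes its own bytes straight to the file,
    # just like `python3 extract.py "<url>" > output_path` in a shell.
    with open(output_path, "wb") as out:
        # check=True raises subprocess.CalledProcessError on a non-zero exit code.
        subprocess.run(build_command(url), stdout=out, check=True)
```

For example, `extract_to_file("https://example.com/blog/post", "output.txt")` mirrors the redirection command above. Passing the file object as `stdout` avoids buffering the whole article in Python memory.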

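## How It Works

1. Sends the HTTP request with a standard browser User-Agent
2. Parses the HTML and filters out irrelevant tags such as `<script>`, `<style>`, `<nav>`, and `<footer>`
3. Extracts the body text and cleans up redundant whitespace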
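The three steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the skill's actual `extract.py`: the User-Agent string, the exact skipped and block-level tag sets, and the names `build_request`, `fetch_html`, `TextExtractor`, and `extract_text` are all hypothetical.

```python
import urllib.request
from html.parser import HTMLParser

# Assumed desktop-browser User-Agent (step 1); the real script's string may differ.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

# Tags whose contents are dropped (step 2); the document names these four among others.
SKIP_TAGS = {"script", "style", "nav", "footer"}
# Assumed block-level tags that should start a new line in the output.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6",
              "section", "article", "tr", "blockquote"}


def build_request(url):
    """Step 1: prepare a GET request that carries a browser User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def fetch_html(url, timeout=15.0):
    """Download the page and decode it, falling back to UTF-8."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TextExtractor(HTMLParser):
    """Step 2: collect text nodes, ignoring anything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def extract_text(html):
    """Step 3: join the kept text and collapse redundant whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = []
    for line in "".join(parser.chunks).splitlines():
        # str.split() also splits on non-breaking and ideographic spaces.
        cleaned = " ".join(line.split())
        if cleaned:
            lines.append(cleaned)
    return "\n".join(lines)
```

Usage would look like `print(extract_text(fetch_html("https://example.com/blog/post")))`. Emitting newlines on block-level tags keeps paragraphs apart while inline tags such as `<b>` stay on the same line.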

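## Limitations

- The target page must allow access from a standard browser
- Pages that require login or special permissions may not be extractable
- Dynamically loaded content (such as infinite scroll) may not be fully extracted, since only the HTML returned by the initial request is parsed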

## Dependencies

- Python 3.6+
- No third-party libraries (standard library only)

## Author

Packaged from practices in the OpenClaw community

Related Claw Skills

captchasco

captchas-openclaw

★ 0

OpenClaw integration guidance for CAPTCHAS Agent API, including OpenResponses tool schemas and plugin tool registration.

capncoconut

x402hub

★ 0

Register, communicate, and earn on the x402hub AI agent marketplace. Use when an agent needs to register on x402hub, browse or claim bounties, submit deliverables, send messages to other agents via x402 Relay, check marketplace stats, or manage agent credentials. Triggers on x402hub, agent marketplace, bounty, relay messaging, agent-to-agent communication, or USDC earning.

capt-marbles

Task Router Skill

★ 0

Task Router

carol-gutianle

Modelready

★ 0

Start using a local or Hugging Face model instantly, directly from chat. Requires the bash and curl binaries and the URL environment variable.

cartoonitunes

Ethereum History

★ 0

Read-only factual data about historical Ethereum mainnet contracts. Use when the user asks about a specific contract address, early Ethereum contracts, deployment era, deployer, bytecode, decompiled code, or documented history (what a contract is and is not). Data is non-opinionated and includes runtime bytecode, decompiled code, and editorial history when available. Base URL https://ethereumhistory.com (or set BASE_URL for local/staging).

canbirlik

wiz-light-control

★ 0

Controls Wiz smart bulbs (turn on/off, RGB colors, disco mode) via local WiFi.

Analysis Signals

Dependencies

python