crawl4ai

AI-powered web scraping framework for extracting structured data from websites. Use when Codex needs to crawl, scrape, or extract data from web pages using AI-powered parsing, handle dynamic content, or work with complex HTML structures.

Status: ACTIVE

Visibility: PUBLIC

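Installation

Copy the prompt below and send it to your AI assistant to complete the installation.

First check whether the SkillHub store is installed. If it is not, install it by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the crawl4ai skill. If it is already installed, install the crawl4ai skill directly.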

Overview

Skill Key: codylrn804/crawl4ai
Author: codylrn804
Source Repo: openclaw/skills
Version: -
Source Path: skills/codylrn804/crawl4ai
Latest Commit SHA: 30db0f76104c2ad310bd4a72ff17f107eb751133

Extracted Content

SKILL.md excerpt

# Crawl4ai

## Overview

Crawl4ai is an AI-powered web scraping framework designed to extract structured data from websites efficiently. It combines traditional HTML parsing with AI to handle dynamic content, extract text intelligently, and clean and structure data from complex web pages.

## When to Use This Skill

Use when Codex needs to:
- Extract structured data from web pages (products, articles, forms, tables, etc.)
- Scrape websites with dynamic content or complex JavaScript
- Clean and normalize extracted data from various HTML structures
- Work with APIs or web services that return HTML
- Get content that client-side CORS restrictions block, by crawling the page directly instead of requesting it from in-page JavaScript
- Process web content at scale with reliability

**Trigger phrases:**
- "Extract data from this website"
- "Scrape this page for [specific data]"
- "Parse this HTML"
- "Get data from [URL]"
- "Extract structured information from [website]"
- "Scrape [website] for [data type]"
- "Web scrape [URL]"

## Quick Start

### Basic Usage

```python
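import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig

async def scrape_page(url):
    # BrowserConfig holds browser-level settings; headless=True hides the window.
    browser_config = BrowserConfig(headless=True)
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url=url)
        # markdown is the page rendered as Markdown; cleaned_html is the sanitized HTML.
        return result.markdown, result.cleaned_html

# Run with: asyncio.run(scrape_page("https://example.com"))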
```
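
Because `scrape_page` is a coroutine, it has to be driven by an event loop, and scraping several pages usually calls for a cap on how many run at once. The following is a minimal sketch of that pattern using only the standard library. `crawl_many` and `fake_fetch` are hypothetical names introduced here for illustration and are not part of crawl4ai; the fetch function is passed in so the example runs without a browser or network access.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


async def crawl_many(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[T]],
    max_concurrency: int = 3,
) -> Dict[str, T]:
    """Run `fetch` over `urls` with at most `max_concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str):
        # The semaphore blocks here once max_concurrency fetches are running.
        async with semaphore:
            return url, await fetch(url)

    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(pairs)


if __name__ == "__main__":
    # Hypothetical stand-in for scrape_page, so the sketch runs offline.
    async def fake_fetch(url: str) -> str:
        await asyncio.sleep(0)
        return f"# content of {url}"

    pages = asyncio.run(
        crawl_many(["https://example.com/a", "https://example.com/b"], fake_fetch)
    )
    for url, markdown in pages.items():
        print(url, "->", markdown)
```

With crawl4ai installed, passing `scrape_page` as `fetch` should work the same way. Since each `scrape_page` call opens its own crawler, a low `max_concurrency` is the safer default.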

### Extracting Structured Data

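```python
import json

from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

# Placeholder schema: the CSS selectors are illustrative and must be
# adapted to the markup of the target page.
PRODUCT_SCHEMA = {
    "name": "products",
    "baseSelector": ".product",
    "fields": [
        {"name": "name", "selector": ".name", "type": "text"},
        {"name": "price", "selector": ".price", "type": "text"},
        {"name": "url", "selector": "a", "type": "attribute", "attribute": "href"},
    ],
}

async def extract_products(url):
    run_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,  # always fetch a fresh copy
        screenshot=True,
        extraction_strategy=JsonCssExtractionStrategy(PRODUCT_SCHEMA),
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url, config=run_config)
        # extracted_content is a JSON string, or None when nothing was extracted.
        items = json.loads(result.extracted_content or "[]")
        # Keep only the fields of interest for each product.
        return [
            {"name": item.get("name"), "price": item.get("price"), "url": item.get("url")}
            for item in items
        ]
```

...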
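
Records such as those returned by `extract_products` usually still need the cleaning step listed under "When to Use This Skill". The following is a minimal standard-library sketch of that step, assuming each record is a dict with `name`, `price`, and `url` keys and that prices use US-style formatting such as `$1,299.00`. `normalize_price` and `clean_products` are hypothetical helpers introduced for illustration, not part of crawl4ai.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Dict, List, Optional
from urllib.parse import urljoin


def normalize_price(raw: Optional[str]) -> Optional[Decimal]:
    """Parse a price like '$1,299.00' into a Decimal; return None if unparseable."""
    if not raw:
        return None
    # Keep digits and the decimal point; this drops currency symbols and commas.
    digits = re.sub(r"[^0-9.]", "", raw)
    try:
        return Decimal(digits)
    except InvalidOperation:
        return None


def clean_products(products: List[Dict], base_url: str) -> List[Dict]:
    """Trim names, parse prices, resolve relative URLs, and drop duplicates."""
    seen = set()
    cleaned = []
    for product in products:
        name = (product.get("name") or "").strip()
        raw_url = product.get("url")
        # Relative links such as "/p/1" are resolved against the page URL.
        url = urljoin(base_url, raw_url) if raw_url else None
        key = (name, url)
        if not name or key in seen:
            continue  # skip nameless records and exact repeats
        seen.add(key)
        cleaned.append(
            {"name": name, "price": normalize_price(product.get("price")), "url": url}
        )
    return cleaned
```

Using `Decimal` rather than `float` avoids rounding surprises when prices are later summed or compared. Sites that use a comma as the decimal separator would need a different parsing rule.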

Analysis Signals

Dependencies: python, java