Overview
- Skill Key: codylrn804/crawl4ai
- Author: codylrn804
- Source Repo: openclaw/skills
- Version: -
- Source Path: skills/codylrn804/crawl4ai
- Latest Commit SHA: 30db0f76104c2ad310bd4a72ff17f107eb751133
AI-powered web scraping framework for extracting structured data from websites. Use when Codex needs to crawl, scrape, or extract data from web pages using AI-powered parsing, handle dynamic content, or work with complex HTML structures.
- Stars: 0
- Installs: 0
- Status: ACTIVE
- Visibility: PUBLIC
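Copy the prompt below and send it to your AI assistant to complete the installation.
First check whether the SkillHub store is installed. If it is not, install it by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the crawl4ai skill. If it is already installed, install the crawl4ai skill directly.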
# Crawl4ai
## Overview
Crawl4ai is an AI-powered web scraping framework designed to extract structured data from websites efficiently. It combines traditional HTML parsing with AI to handle dynamic content, extract text intelligently, and clean and structure data from complex web pages.
## When to Use This Skill
Use when Codex needs to:
- Extract structured data from web pages (products, articles, forms, tables, etc.)
- Scrape websites with dynamic content or complex JavaScript
- Clean and normalize extracted data from various HTML structures
- Work with APIs or web services that return HTML
- Work around browser CORS restrictions by fetching pages directly instead of from client-side JavaScript
- Process web content at scale with reliability
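The "clean and normalize" step in the list above can be sketched with the standard library alone. This is a minimal illustration, not part of Crawl4ai: the function name `normalize_product`, the record fields, and the example URL are assumptions, and the price parsing assumes US-style formatting (comma as thousands separator, dot as decimal point).

```python
import re
from decimal import Decimal, InvalidOperation
from urllib.parse import urljoin


def normalize_product(raw, base_url):
    """Return a cleaned copy of one scraped product record (hypothetical shape)."""
    # Collapse runs of whitespace left over from nested HTML elements.
    name = " ".join(str(raw.get("name", "")).split())

    # Keep digits and the decimal point; drop currency symbols and thousands separators.
    digits = re.sub(r"[^0-9.]", "", str(raw.get("price", "")))
    try:
        price = Decimal(digits)
    except InvalidOperation:
        price = None  # price missing or unparseable

    # Resolve relative links against the page they were scraped from.
    url = urljoin(base_url, str(raw.get("url", "")))
    return {"name": name, "price": price, "url": url}


record = normalize_product(
    {"name": "  Blue \n Widget ", "price": "$1,299.50", "url": "/items/42"},
    base_url="https://example.com/shop/",
)
print(record)
```

Using `Decimal` rather than `float` keeps prices exact, which matters if the values are later summed or compared.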
**Trigger phrases:**
- "Extract data from this website"
- "Scrape this page for [specific data]"
- "Parse this HTML"
- "Get data from [URL]"
- "Extract structured information from [website]"
- "Scrape [website] for [data type]"
- "Web scrape [URL]"
## Quick Start
### Basic Usage
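```python
from crawl4ai import AsyncWebCrawler, BrowserConfig


async def scrape_page(url):
    # Browser options such as headless mode are set through BrowserConfig.
    browser_config = BrowserConfig(headless=True)
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url=url)
        # The result exposes the page as Markdown and as cleaned HTML.
        return result.markdown, result.cleaned_html
```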
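To call a coroutine like `scrape_page` from a script and fan it out over several URLs, one option is a small concurrency-limited wrapper. The sketch below is library-agnostic: `scrape_many`, `max_concurrency`, and the stand-in `fake_scrape` are hypothetical names introduced here so the example runs without a browser. Crawl4ai also provides its own batch method, `arun_many`, which may be the better fit in practice.

```python
import asyncio


async def scrape_many(urls, scrape, max_concurrency=3):
    """Run `scrape(url)` for every URL, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            try:
                return url, await scrape(url)
            except Exception as exc:  # keep going if one page fails
                return url, exc

    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(pairs)


async def fake_scrape(url):
    # Stand-in for scrape_page so the sketch runs without a browser.
    await asyncio.sleep(0)
    if "bad" in url:
        raise ValueError("could not load " + url)
    return "# page at " + url


results = asyncio.run(
    scrape_many(["https://example.com/a", "https://example.com/bad"], fake_scrape)
)
for url, outcome in results.items():
    print(url, "->", outcome)
```

Failures are returned as exception objects rather than raised, so one unreachable page does not discard the results of the others. To use it with the real crawler, pass `scrape_page` in place of `fake_scrape`.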
### Extracting Structured Data
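```python
import json

from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig


async def extract_products(url):
    # Take a screenshot and skip the cache; JavaScript runs in the browser by default.
    config = CrawlerRunConfig(screenshot=True, cache_mode=CacheMode.BYPASS)
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url, config=config)

    # extracted_content is a JSON string, filled only when an extraction
    # strategy is set on the run config; fall back to an empty list otherwise.
    items = json.loads(result.extracted_content or "[]")

    # Extract product data
    products = []
    for item in items:
        if item.get('type') == 'product':
            products.append({
                'name': item['name'],
                'price': item['price'],
                'url': item['url']
            })
    return products
```
...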
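For `result.extracted_content` to contain anything, the crawl needs an extraction strategy. A minimal sketch using Crawl4ai's CSS-based `JsonCssExtractionStrategy` follows. The schema is hypothetical: the selectors (`div.product`, `h2`, `.price`, `a`) are placeholders rather than values from any real site, and `parse_products` and `extract_products_with_schema` are illustrative names.

```python
import json

# Hypothetical schema: the selectors below are placeholders, not taken from a real site.
PRODUCT_SCHEMA = {
    "name": "Products",
    "baseSelector": "div.product",
    "fields": [
        {"name": "name", "selector": "h2", "type": "text"},
        {"name": "price", "selector": ".price", "type": "text"},
        {"name": "url", "selector": "a", "type": "attribute", "attribute": "href"},
    ],
}


def parse_products(extracted_content):
    """Turn the JSON string in `extracted_content` into a list of product dicts."""
    if not extracted_content:
        return []  # no extraction strategy ran, or nothing matched
    items = json.loads(extracted_content)
    # Drop entries that came back without a name.
    return [item for item in items if item.get("name")]


async def extract_products_with_schema(url):
    # Imported here so the helpers above work without crawl4ai installed.
    from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig
    from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

    config = CrawlerRunConfig(
        extraction_strategy=JsonCssExtractionStrategy(PRODUCT_SCHEMA),
        cache_mode=CacheMode.BYPASS,
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url, config=config)
    return parse_products(result.extracted_content)


# Offline check of the parsing step with a hand-written payload.
sample = '[{"name": "Blue Widget", "price": "$9.99", "url": "/items/42"}, {"price": "$1"}]'
print(parse_products(sample))
```

Keeping the parsing in a separate function makes it possible to test the data handling against saved payloads without launching a browser.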