scrapling

Overview

Skill Key: cryptos3c/openclaw-scrapling
Author: cryptos3c
Source Repo: openclaw/skills
Version: 1.0.0
Source Path: skills/cryptos3c/openclaw-scrapling
Latest Commit SHA: 7a9a8a23a41e360bb7dc56415757d1f51bfa48f9

Extracted Content

SKILL.md excerpt

# Scrapling Web Scraping Skill

Use Scrapling to scrape modern websites, including those with anti-bot protection or JavaScript-rendered content, and to keep tracking elements adaptively when a page's structure changes.

## When to Use This Skill

- User asks to scrape a website or extract data from a URL
- Need to bypass Cloudflare, bot detection, or anti-scraping measures
- Need to handle JavaScript-rendered/dynamic content (React, Vue, etc.)
- Website requires login or session management
- Website structure changes frequently (adaptive selectors)
- Need to scrape multiple pages with rate limiting
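
The last case above, scraping several pages with rate limiting, can be pictured with a small throttling loop. This is a minimal sketch rather than the skill's own mechanism: the excerpt does not show which option, if any, `scrape.py` exposes for rate limiting, so the pause is applied from outside. The names `scrape_with_delay` and `scrape_one` are hypothetical.

```python
import time


def scrape_with_delay(urls, scrape_one, delay_seconds=2.0, sleep=time.sleep):
    """Call scrape_one(url) for each URL, pausing between consecutive calls.

    scrape_one is any callable (hypothetical here), for example a function
    that launches scrape.py for a single URL.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        results.append(scrape_one(url))
    return results


if __name__ == "__main__":
    pages = ["https://example.com/page/1", "https://example.com/page/2"]
    # Stand-in scraper that only echoes the URL; swap in a real call.
    print(scrape_with_delay(pages, lambda u: f"scraped {u}", delay_seconds=0.1))
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.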

## Commands

All commands use the `scrape.py` script in this skill's directory.

### Basic HTTP Scraping (Fast)

```bash
python scrape.py \
  --url "https://example.com" \
  --selector ".product" \
  --output products.json
```

**Use when:** Static HTML, no JavaScript, no bot protection
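
For callers that would rather drive the script from Python than from a shell, the basic invocation above can be wrapped with `subprocess`. This is a minimal sketch: the helper names `build_basic_command` and `run_scrape` are hypothetical, and it assumes `scrape.py` sits in the current working directory and accepts exactly the flags shown above.

```python
import subprocess
import sys


def build_basic_command(url, selector, output, script="scrape.py"):
    """Return the argv list for a plain HTTP scrape (flags taken from the example above)."""
    return [
        sys.executable, script,
        "--url", url,
        "--selector", selector,
        "--output", output,
    ]


def run_scrape(command):
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(command, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    cmd = build_basic_command("https://example.com", ".product", "products.json")
    print(" ".join(cmd))  # inspect the command before passing it to run_scrape(cmd)
```

Building the argument list separately from running it avoids shell quoting problems with selectors such as `.product > a`.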

### Stealth Mode (Bypass Anti-Bot)

```bash
python scrape.py \
  --url "https://nopecha.com/demo/cloudflare" \
  --stealth \
  --selector "#content" \
  --output data.json
```

**Use when:** Cloudflare protection, bot detection, fingerprinting

**Features:**
- Bypasses Cloudflare Turnstile automatically
- Browser fingerprint spoofing
- Headless browser mode

### Dynamic/JavaScript Content

```bash
python scrape.py \
  --url "https://spa-website.com" \
  --dynamic \
  --selector ".loaded-content" \
  --wait-for ".loaded-content" \
  --output data.json
```

**Use when:** React/Vue/Angular apps, lazy-loaded content, AJAX

**Features:**
- Full Playwright browser automation
- Wait for elements to load
- Network idle detection
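
Taken together, the three "Use when" notes above amount to a small decision rule. The sketch below encodes it as a hypothetical helper, `mode_flags`. One assumption is labeled in the code: the excerpt does not say how `--stealth` and `--dynamic` interact, so the helper simply prefers `--stealth` when a site needs both.

```python
def mode_flags(needs_javascript=False, has_bot_protection=False):
    """Map site traits to the mode flag suggested by the 'Use when' notes.

    Assumption: when a site needs both, --stealth is preferred, since stealth
    mode is listed as running a headless browser. The source does not say
    how the two flags interact.
    """
    if has_bot_protection:
        return ["--stealth"]
    if needs_javascript:
        return ["--dynamic"]
    return []  # plain HTTP mode: fastest, no browser


if __name__ == "__main__":
    print(mode_flags())                         # static page
    print(mode_flags(needs_javascript=True))    # SPA or lazy-loaded content
    print(mode_flags(has_bot_protection=True))  # Cloudflare or fingerprinting
```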

### Adaptive Selectors (Survives Website Changes)

```bash
# First time - save the selector pattern
python scrape.py \
  --url "https://example.com" \
  --selector ".product-card" \
  --adaptive-save \
  --output products.json

# Later, if the website structure changes
python scrape.py \
  --url "https://example.com" \
  --adaptive \
  --output products.json
```

**Use when:** Website frequently redesigns, need robus...

README excerpt

# Scrapling Web Scraping Skill

Advanced web scraping for OpenClaw with anti-bot bypass and adaptive selectors.

## Features

✅ **Anti-Bot Bypass** - Automatically handles Cloudflare Turnstile and bot detection  
✅ **JavaScript Support** - Scrape React, Vue, and Angular apps with full browser automation  
✅ **Adaptive Selectors** - Elements are relocated automatically when websites redesign  
✅ **Session Management** - Persistent cookies and login state across requests  
✅ **Multiple Modes** - HTTP (fast), Stealth (anti-bot), Dynamic (full browser)  
✅ **Flexible Output** - JSON, JSONL, CSV, Markdown, plain text  
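
To make the difference between some of the listed output formats concrete, here is a minimal sketch that serializes the same records as JSONL and as CSV using only the standard library. It is not the skill's serializer. The record shape (`name`, `price`) and the helpers `to_jsonl` and `to_csv` are hypothetical, since the excerpt does not show what `scrape.py` actually writes.

```python
import csv
import io
import json


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def to_csv(records):
    """Serialize records as CSV, with a header row from the first record's keys."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical scraped records, for illustration only.
    rows = [{"name": "Widget", "price": "9.99"}, {"name": "Gadget", "price": "19.50"}]
    print(to_jsonl(rows))
    print(to_csv(rows))
```

JSONL suits streaming and appending, while CSV suits flat records that all share the same fields.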

## Quick Start

### Install Skill

Via the OpenClaw Gateway UI:
1. Navigate to the Skills section
2. Click "Install Skill"
3. Select or upload the `scrapling` skill
4. Wait for dependencies to install (~2-5 minutes for browsers)

Via CLI:
```bash
# Install dependencies
cd ~/.openclaw/skills/scrapling
pip install -r requirements.txt
scrapling install  # Downloads browsers (~500MB)
```
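
After the CLI steps above, a quick check of the prerequisites can save a confusing failure later. The sketch below assumes the package is importable under the name `scrapling` and uses the Python 3.10+ minimum that this README lists under Requirements. The helper names `python_ok` and `package_installed` are hypothetical.

```python
import importlib.util
import sys


def python_ok(version_info=sys.version_info, minimum=(3, 10)):
    """True if the interpreter meets the minimum version listed under Requirements."""
    return tuple(version_info[:2]) >= minimum


def package_installed(name):
    """True if the named top-level package can be found without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    # Assumption: the import name matches the package name "scrapling".
    print("scrapling importable:", package_installed("scrapling"))
```

This only confirms that the package is installed. It does not verify that the browsers fetched by `scrapling install` are present.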

### Basic Usage

```bash
# Scrape a static site
python scrape.py --url "https://example.com" --selector ".product" --output products.json

# Bypass anti-bot protection
python scrape.py --url "https://protected-site.com" --stealth --selector ".content"

# Scrape JavaScript-rendered content
python scrape.py --url "https://spa-app.com" --dynamic --selector ".item"

# Adaptive mode (survives website changes)
python scrape.py --url "https://site.com" --selector ".product" --adaptive-save
# Later, even if the site is redesigned:
python scrape.py --url "https://site.com" --adaptive
```
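
For some intuition about the adaptive mode shown in the last example, the toy sketch below saves a fingerprint of an element and later picks the most similar candidate from a changed page. This is not Scrapling's actual algorithm. It swaps in plain string similarity from `difflib` as a stand-in, and the dict-based element shape and the helpers `fingerprint` and `relocate` are hypothetical.

```python
from difflib import SequenceMatcher


def fingerprint(element):
    """Flatten a toy element (a dict) into a comparable string."""
    return " ".join([
        element.get("tag", ""),
        " ".join(sorted(element.get("classes", []))),
        element.get("text", ""),
    ])


def relocate(saved, candidates):
    """Return the candidate whose fingerprint is most similar to the saved one."""
    target = fingerprint(saved)
    return max(
        candidates,
        key=lambda c: SequenceMatcher(None, target, fingerprint(c)).ratio(),
    )


if __name__ == "__main__":
    # Element as it looked when the selector was first saved (toy data).
    saved = {"tag": "div", "classes": ["product"], "text": "Blue Widget $9.99"}
    # Elements found after a hypothetical redesign renamed the tag and class.
    redesigned = [
        {"tag": "nav", "classes": ["menu"], "text": "Home About"},
        {"tag": "article", "classes": ["product-item"], "text": "Blue Widget $9.99"},
        {"tag": "footer", "classes": ["legal"], "text": "Copyright"},
    ]
    print(relocate(saved, redesigned)["tag"])
```

The point is only that a saved description of an element can outlive the selector that first found it.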

## Examples

Check the `examples/` directory for:
- `basic.py` - Simple HTTP scraping
- `stealth.py` - Cloudflare bypass example
- `dynamic.py` - JavaScript-heavy sites
- `adaptive.py` - Adaptive selector demo

## Documentation

Full documentation is in `SKILL.md`, including:
- All command-line options
- Selector types (CSS, XPath)
- Output formats
- Session management
- Troubleshooting guide

## Requirements

- Python 3.10+
- ~500MB disk s...

Related Claw Skills

openbotx

bambu-studio-ai

zsxq-digest

stock-data-skill

browser-use-skill

veriglow-agent-map-skill