x-extract

Overview

Skill Key: chunhualiao/x-extract
Author: chunhualiao
Source Repo: openclaw/skills
Version: -
Source Path: skills/chunhualiao/x-extract
Latest Commit SHA: 4d53f8c243db476e39eb692a638981635211455d

Extracted Content

SKILL.md excerpt

# X.com Tweet Extraction

Extract tweet content (text, media, author, metadata) from x.com URLs without requiring Twitter/X credentials.

## How It Works

Uses OpenClaw's browser tool to load the tweet page, then extracts content from the rendered HTML.

## Workflow

### 1. Validate URL

Check that the URL is a valid x.com or twitter.com tweet URL:
- It must contain `x.com/*/status/` or `twitter.com/*/status/`
- Extract the tweet ID from the URL with the pattern `/status/(\d+)`
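
The check above could be sketched in Python roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `extract_tweet_id`, the optional `www.`/`mobile.` prefixes, and anchoring the match at the start of the URL are assumptions that go slightly beyond the "must contain" rule stated here.

```python
import re
from typing import Optional

# Assumed pattern: scheme, optional www./mobile. prefix, x.com or twitter.com,
# one username segment, then /status/<digits> ending at a separator or end of string.
TWEET_URL_RE = re.compile(
    r"^https?://(?:www\.|mobile\.)?(?:x|twitter)\.com/[^/?#]+/status/(\d+)(?:[/?#]|$)"
)


def extract_tweet_id(url: str) -> Optional[str]:
    """Return the numeric tweet ID if url looks like a tweet URL, else None."""
    match = TWEET_URL_RE.match(url.strip())
    return match.group(1) if match else None
```

A caller would treat a `None` result as "not a tweet URL" and stop before opening the browser.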

### 2. Open in Browser

```text
browser action=open profile=openclaw targetUrl=<x.com-url>
```

Wait for the page to load. The call returns a `targetId`, which the snapshot step needs.

### 3. Capture Snapshot

```text
browser action=snapshot targetId=<TARGET_ID> snapshotFormat=aria
```

### 4. Extract Content

From the snapshot, extract:

**Required fields:**
- **Tweet text**: Look for the `role=article` node that contains the main tweet content
- **Author**: A `role=link` node with the author name and handle (usually in @username format)
- **Timestamp**: The `role=time` element

**Optional fields:**
- **Media**: `role=img` nodes, or `role=link` nodes whose URL contains `/photo/` or `/video/`
- **Engagement**: Like, retweet, and reply counts (found in `role=group` or `role=button` nodes)
- **Thread context**: If the tweet is part of a thread, note the previous/next tweet references

### 5. Format Output

Output as structured markdown:

```markdown
# Tweet by @username

**Author:** Full Name (@handle)  
**Posted:** YYYY-MM-DD HH:MM  
**Source:** <original-url>

---

<Tweet text content here>

---

**Media:**
- ![Image 1](<media-url-1>)
- ![Image 2](<media-url-2>)

**Engagement:**
- 👍 Likes: 1,234
- 🔄 Retweets: 567
- 💬 Replies: 89

**Thread:** [Part 2/5] | [View full thread](<thread-url>)
```
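
One way to produce this layout is a small rendering function, sketched below in Python. The `Tweet` dataclass and its field names are hypothetical and only serve the illustration; the sketch also omits the **Thread** line and skips the optional sections when their data is missing, which is an assumption about the intended behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tweet:
    # Hypothetical container for the extracted fields; names are illustrative.
    name: str
    handle: str
    posted: str  # already formatted as "YYYY-MM-DD HH:MM"
    url: str
    text: str
    media_urls: List[str] = field(default_factory=list)
    likes: Optional[int] = None
    retweets: Optional[int] = None
    replies: Optional[int] = None


def render_markdown(t: Tweet) -> str:
    """Render a Tweet using the markdown template shown above."""
    lines = [
        f"# Tweet by @{t.handle}",
        "",
        f"**Author:** {t.name} (@{t.handle})  ",  # two trailing spaces force a line break
        f"**Posted:** {t.posted}  ",
        f"**Source:** {t.url}",
        "",
        "---",
        "",
        t.text,
        "",
        "---",
    ]
    if t.media_urls:
        lines += ["", "**Media:**"]
        lines += [f"- ![Image {i}]({u})" for i, u in enumerate(t.media_urls, 1)]
    counts = [("👍 Likes", t.likes), ("🔄 Retweets", t.retweets), ("💬 Replies", t.replies)]
    shown = [(label, n) for label, n in counts if n is not None]
    if shown:
        lines += ["", "**Engagement:**"]
        lines += [f"- {label}: {n:,}" for label, n in shown]  # thousands separators
    return "\n".join(lines) + "\n"
```

Leaving out empty sections keeps the output short for text-only tweets; a stricter variant could always emit every heading instead.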

### 6. Download Media (Optional)

If the user requests `--download-media` or asks to "download images":

1. Extract all media URLs from the snapshot
2. Use `exec` with `curl` or `wget` to download each one:
   ```bash
   curl -L -o "tweet-{tweetId}-image-{n}.jpg" "<media-url>"
   ```
3. Report downloaded files with paths
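
The download steps above could be prepared with a helper like the following Python sketch. The function name `build_curl_commands` is hypothetical; it only builds the `curl` argument lists using the filename pattern shown in step 2 and does not run anything itself. The fixed `.jpg` extension follows that pattern, so video or PNG media would need different handling.

```python
from typing import List


def build_curl_commands(tweet_id: str, media_urls: List[str]) -> List[List[str]]:
    """Build one curl argument list per media URL, numbered from 1."""
    commands = []
    for n, url in enumerate(media_urls, start=1):
        filename = f"tweet-{tweet_id}-image-{n}.jpg"
        # -L follows redirects, -o writes the response to the given file.
        commands.append(["curl", "-L", "-o", filename, url])
    return commands
```

Each returned list can be passed as-is to a process runner such as `subprocess.run(cmd, check=True)`. Passing arguments as a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.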

## Error Handling

**If page fails to load:**
- Check...

README excerpt

# x-extract

Extract tweet content from x.com URLs without requiring Twitter/X API credentials.

## Description

Browser-based tweet extraction tool that captures tweet text, author information, media, and engagement metrics from public x.com/twitter.com URLs using OpenClaw's browser automation.

## Usage

Trigger phrases:
- "extract tweet [URL]"
- "get tweet content from [URL]"
- "download x.com link [URL]"
- Any x.com/*/status/* or twitter.com/*/status/* URL

## Features

- ✅ No API credentials required
- ✅ Extract text, author, timestamp, media URLs
- ✅ Capture engagement metrics (likes, retweets, replies)
- ✅ Thread detection and extraction
- ✅ Optional media download
- ✅ Structured markdown output

## Requirements

- OpenClaw with browser tool enabled
- Profile: `openclaw` (or any browser profile)

## Limitations

- Cannot access protected/private tweets
- Cannot access content that requires login (for example, age-restricted or controversial tweets)
- May be affected by X.com layout changes
- Subject to X.com rate limiting

## Documentation

See [SKILL.md](SKILL.md) for detailed workflow and technical documentation.

## Version

1.0.0 (2026-02-16)
