TopRank Skills

Home / Claw Skills / Browser automation / multimodal-memory
Official OpenClaw rules 36%

multimodal-memory

Remember and retrieve visual content from conversations. Use when: (1) user sends an image, photo, chart, diagram, or screenshot and wants it saved/remembered; (2) user asks to capture or remember a website, URL, or web page UI; (3) user asks what you've seen before, wants to recall a past image, or searches visual memories; (4) user sends an image to find similar past content.

Stars

0

Installs

0

Status

ACTIVE

Visibility

PUBLIC

安装方式

直接复制以下提示词,发送给你的 AI 助手即可完成安装。

请先检查是否已安装 SkillHub 商店,若未安装,请根据 https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md 安装 SkillHub 商店,然后安装 multimodal-memory 技能。 若已安装,则直接安装 multimodal-memory 技能。

Overview

Skill Key
horisky/minds-eye
Author
horisky
Source Repo
openclaw/skills
Version
-
Source Path
skills/horisky/minds-eye
Latest Commit SHA
c1aed91fa1ba1f36a13792746753cf8343b10815

Extracted Content

SKILL.md excerpt

# Multimodal Memory

Store and retrieve visual content — user images, charts, diagrams, website UIs — across conversations.

## Important: Image Analysis

**The primary model may not support vision.** Always use `analyze.py` to analyze images — it calls GPT-4o directly via API and does not rely on your own vision capability.

## Storage Location

All data lives in `~/.multimodal-memory/`:
- `images/` — saved copies of captured images
- `metadata.db` — SQLite database (auto-created)
- `memory.md` — human-readable summary (auto-updated)

Read `~/.multimodal-memory/memory.md` at session start for a quick overview.

## Scenarios & Actions

### 1. User Sends an Image / Chart / Diagram

When a user sends an image, OpenClaw saves it locally and provides the file path in the message context (look for a path like `/tmp/...` or `~/.openclaw/...`).

Run `analyze.py` with that path — it calls GPT-4o to analyze and stores the result automatically:

```bash
python {baseDir}/scripts/analyze.py \
  --image-path "/absolute/path/to/image.jpg" \
  --source "image"
```

For charts use `--source "chart"`, for diagrams use `--source "image"`.

**If you cannot find the file path in the message context**, ask the user:
> "请问这张图片保存在哪个路径?或者你可以直接粘贴文件路径给我。"

### 2. User Asks to Capture / Remember a Website

Step 1 — take the screenshot:
```bash
python {baseDir}/scripts/capture_url.py --url "https://example.com"
```
The script prints the saved screenshot path.

Step 2 — analyze and store it:
```bash
python {baseDir}/scripts/analyze.py \
  --image-path "/path/printed/above.png" \
  --source "website" \
  --url "https://example.com"
```

### 3. User Searches by Text

```bash
python {baseDir}/scripts/search.py --query "login screen dark theme"
```

Show results with descriptions and image paths.

### 4. User Sends an Image to Search (find similar memories)

Step 1 — analyze the query image to get its description:
```bash
python {baseDir}/scripts/analyze.py \
  --image...

README excerpt

# minds-eye 🧠👁️

> Give your AI agent a visual memory — store, search, and recall images, charts, diagrams, and website screenshots across conversations.

**minds-eye** is an [OpenClaw](https://openclaw.ai) skill that lets AI agents remember visual content. Send your agent an image, chart, or website URL — it analyzes it with GPT-4o vision and stores the description, tags, and a copy of the image. Later, search by keyword to retrieve what was seen.

## Features

- **Image analysis** — Analyzes any image with GPT-4o (or compatible vision model)
- **Website capture** — Full-page screenshots of URLs via Playwright or headless Chrome
- **Semantic storage** — SQLite database with description, tags, source type, and URL
- **Keyword search** — Full-text search across all stored visual memories
- **Auto-summary** — Maintains a human-readable `memory.md` of recent entries
- **Works with any OpenAI-compatible API** — Uses your configured provider (OpenClaw, OpenAI, custom endpoint)

## How It Works

```
User sends image
       ↓
analyze.py calls GPT-4o vision API (base64)
       ↓
Returns: description + tags + raw_text
       ↓
store.py saves to SQLite + copies image file
       ↓
Agent confirms: "Saved! Description: ..."
```

## Installation

This skill is designed for [OpenClaw](https://openclaw.ai). Place the folder in your OpenClaw skills directory:

```bash
~/.openclaw/skills/skills/multimodal-memory/
```

For website capture, install Playwright (one-time setup):

```bash
pip install playwright
python -m playwright install chromium
```

## Usage (via OpenClaw agent)

Once installed as an OpenClaw skill, your agent will automatically:

- Analyze and store images sent in conversation
- Capture and remember websites when asked
- Search visual memories on request

### Direct script usage

**Analyze and store an image:**
```bash
python scripts/analyze.py --image-path /path/to/image.jpg --source image
python scripts/analyze.py --image-path chart.png --source chart
```

**Capt...

Related Claw Skills