Overview
- Skill Key
- horisky/minds-eye
- Author
- horisky
- Source Repo
- openclaw/skills
- Version
- -
- Source Path
- skills/horisky/minds-eye
- Latest Commit SHA
- c1aed91fa1ba1f36a13792746753cf8343b10815
Remember and retrieve visual content from conversations. Use when: (1) user sends an image, photo, chart, diagram, or screenshot and wants it saved/remembered; (2) user asks to capture or remember a website, URL, or web page UI; (3) user asks what you've seen before, wants to recall a past image, or searches visual memories; (4) user sends an image to find similar past content.
Stars
0
Installs
0
Status
ACTIVE
Visibility
PUBLIC
直接复制以下提示词,发送给你的 AI 助手即可完成安装。
请先检查是否已安装 SkillHub 商店,若未安装,请根据 https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md 安装 SkillHub 商店,然后安装 multimodal-memory 技能。 若已安装,则直接安装 multimodal-memory 技能。
# Multimodal Memory
Store and retrieve visual content — user images, charts, diagrams, website UIs — across conversations.
## Important: Image Analysis
**The primary model may not support vision.** Always use `analyze.py` to analyze images — it calls GPT-4o directly via API and does not rely on your own vision capability.
## Storage Location
All data lives in `~/.multimodal-memory/`:
- `images/` — saved copies of captured images
- `metadata.db` — SQLite database (auto-created)
- `memory.md` — human-readable summary (auto-updated)
Read `~/.multimodal-memory/memory.md` at session start for a quick overview.
## Scenarios & Actions
### 1. User Sends an Image / Chart / Diagram
When a user sends an image, OpenClaw saves it locally and provides the file path in the message context (look for a path like `/tmp/...` or `~/.openclaw/...`).
Run `analyze.py` with that path — it calls GPT-4o to analyze and stores the result automatically:
```bash
python {baseDir}/scripts/analyze.py \
--image-path "/absolute/path/to/image.jpg" \
--source "image"
```
For charts use `--source "chart"`, for diagrams use `--source "image"`.
**If you cannot find the file path in the message context**, ask the user:
> "请问这张图片保存在哪个路径?或者你可以直接粘贴文件路径给我。"
### 2. User Asks to Capture / Remember a Website
Step 1 — take the screenshot:
```bash
python {baseDir}/scripts/capture_url.py --url "https://example.com"
```
The script prints the saved screenshot path.
Step 2 — analyze and store it:
```bash
python {baseDir}/scripts/analyze.py \
--image-path "/path/printed/above.png" \
--source "website" \
--url "https://example.com"
```
### 3. User Searches by Text
```bash
python {baseDir}/scripts/search.py --query "login screen dark theme"
```
Show results with descriptions and image paths.
### 4. User Sends an Image to Search (find similar memories)
Step 1 — analyze the query image to get its description:
```bash
python {baseDir}/scripts/analyze.py \
--image...
# minds-eye 🧠👁️
> Give your AI agent a visual memory — store, search, and recall images, charts, diagrams, and website screenshots across conversations.
**minds-eye** is an [OpenClaw](https://openclaw.ai) skill that lets AI agents remember visual content. Send your agent an image, chart, or website URL — it analyzes it with GPT-4o vision and stores the description, tags, and a copy of the image. Later, search by keyword to retrieve what was seen.
## Features
- **Image analysis** — Analyzes any image with GPT-4o (or compatible vision model)
- **Website capture** — Full-page screenshots of URLs via Playwright or headless Chrome
- **Semantic storage** — SQLite database with description, tags, source type, and URL
- **Keyword search** — Full-text search across all stored visual memories
- **Auto-summary** — Maintains a human-readable `memory.md` of recent entries
- **Works with any OpenAI-compatible API** — Uses your configured provider (OpenClaw, OpenAI, custom endpoint)
## How It Works
```
User sends image
↓
analyze.py calls GPT-4o vision API (base64)
↓
Returns: description + tags + raw_text
↓
store.py saves to SQLite + copies image file
↓
Agent confirms: "Saved! Description: ..."
```
## Installation
This skill is designed for [OpenClaw](https://openclaw.ai). Place the folder in your OpenClaw skills directory:
```bash
~/.openclaw/skills/skills/multimodal-memory/
```
For website capture, install Playwright (one-time setup):
```bash
pip install playwright
python -m playwright install chromium
```
## Usage (via OpenClaw agent)
Once installed as an OpenClaw skill, your agent will automatically:
- Analyze and store images sent in conversation
- Capture and remember websites when asked
- Search visual memories on request
### Direct script usage
**Analyze and store an image:**
```bash
python scripts/analyze.py --image-path /path/to/image.jpg --source image
python scripts/analyze.py --image-path chart.png --source chart
```
**Capt...
youmind-openlab
AI skill for OpenClaw & Claude Code — recommend from 10000+ Nano Banana Pro (Gemini) image prompts. Smart search by use case, content remix, sample images.
23blocks-os
AI Agent Orchestrator with Skills System - Give AI Agents superpowers: memory search, code graph queries, agent-to-agent messaging. Manage Claude, Codex or any AI Agent from one dashboard. Move Agents between computers and locations
hashgraph-online
AI agent skills for the Universal Registry - search, chat, and register 72,000+ agents across 14+ protocols. Works with Claude, Codex, Cursor, OpenClaw, and any AI assistant.
rito-w
A cross-platform skills manager for AI IDEs. Search marketplace, download locally, and install to Claude, Cursor, Windsurf, and more with one click.
besoeasy
Battle-tested skill library for AI agents. Save 98% of API costs with ready-to-use code for crypto, PDFs, search, web scraping & more. No trial-and-error, no expensive APIs.
openbotx
An open-source platform for orchestrating AI agents — secure, simple, and built for everyone. Multi-agent, real-time task board, web control panel, skills system, browser automation, multi-provider, scheduler, and more. One command to start. Everything from the browser. No coding required.