minimax-image-understanding

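Uses multimodal large models to understand image content and generate a description of its business meaning. Supports several models: (1) MiniMax VLM (2) OpenAI GPT-4V (3) Claude Vision. Intended for screenshots, charts, photos of documents, and similar images, producing an accurate text description.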

Status: ACTIVE

Visibility: PUBLIC

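Installation

Copy the prompt below and send it to your AI assistant to complete the installation.

First check whether the SkillHub store is installed. If it is not, install it by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the minimax-image-understanding skill. If it is already installed, install the minimax-image-understanding skill directly.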

Overview

Skill Key: aidescend/minimax-image-understanding
Author: aidescend
Source Repo: openclaw/skills
Version: -
Source Path: skills/aidescend/minimax-image-understanding
Latest Commit SHA: 3c8e071e6039949e84a4b821bd2280dc71859be2

Extracted Content

SKILL.md excerpt

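# Image Understanding

Calls a multimodal large model to understand an image and generate an accurate business description.

## Supported Models

| Model | Environment variables | Notes |
|-------|-----------------------|-------|
| MiniMax VLM | `MINIMAX_API_KEY`, `MINIMAX_API_HOST` | Default; recommended for Chinese-language content |
| OpenAI | `OPENAI_API_KEY` | GPT-4V |
| Anthropic | `ANTHROPIC_API_KEY` | Claude Vision |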

## Usage

### Prerequisites

Set the environment variables for at least one of the models:

```bash
# MiniMax (default)
export MINIMAX_API_KEY="your-minimax-key"
export MINIMAX_API_HOST="https://api.minimaxi.com"

# Or OpenAI
export OPENAI_API_KEY="your-openai-key"

# Or Anthropic
export ANTHROPIC_API_KEY="your-anthropic-key"
```
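Before invoking the skill, it can help to confirm which backends are actually configured. The following is a minimal sketch, not part of the skill itself: the helper name `configured_models` is hypothetical, and it assumes that every variable listed for a model in the table above must be set (the excerpt does not say whether `MINIMAX_API_HOST` is strictly required).

```python
import os

# Environment variables per backend, copied from the table in this document.
# Assumption: all listed variables are required for that backend.
REQUIRED_ENV = {
    "minimax": ("MINIMAX_API_KEY", "MINIMAX_API_HOST"),
    "openai": ("OPENAI_API_KEY",),
    "anthropic": ("ANTHROPIC_API_KEY",),
}


def configured_models(env=None):
    """Return the model names whose variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, variables in REQUIRED_ENV.items()
        if all(env.get(var) for var in variables)
    ]


if __name__ == "__main__":
    models = configured_models()
    if models:
        print("Configured models:", ", ".join(models))
    else:
        print("No model is configured; set at least one group of variables.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.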

### Running the script

```bash
python3 <skill>/scripts/understand_image.py <image_path> [model] [prompt]
```

**Parameters:**
- image_path: a local image file (PNG, JPG, JPEG, GIF, WebP)
- model (optional): `minimax` (default), `openai`, or `anthropic`
- prompt (optional): a custom prompt
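The parameter rules above can be sketched as a small validation routine. This is an illustration of the documented interface only, since the excerpt does not show how `understand_image.py` parses its arguments; the function name `parse_cli_args` and the error messages are hypothetical.

```python
from pathlib import Path

# Formats and model names as listed in the parameter description.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
SUPPORTED_MODELS = ("minimax", "openai", "anthropic")


def parse_cli_args(argv):
    """Split [image_path, model?, prompt?] into validated values.

    Hypothetical helper: mirrors the documented positional interface,
    not the script's actual implementation.
    """
    if not 1 <= len(argv) <= 3:
        raise ValueError("usage: understand_image.py <image_path> [model] [prompt]")

    image_path = Path(argv[0])
    model = argv[1] if len(argv) >= 2 else "minimax"  # documented default
    prompt = argv[2] if len(argv) == 3 else None

    if image_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported image format: {image_path.suffix or '(none)'}")
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unknown model: {model}")
    return image_path, model, prompt
```

Because the arguments are positional, a custom prompt can only be given together with an explicit model name.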

### Examples

```bash
# Use the default model (MiniMax)
python3 ~/.openclaw/workspace/skills/minimax-image-understanding/scripts/understand_image.py /path/to/image.png

# Specify a model
python3 ~/.openclaw/workspace/skills/minimax-image-understanding/scripts/understand_image.py /path/to/image.png openai

# Custom prompt
python3 ~/.openclaw/workspace/skills/minimax-image-understanding/scripts/understand_image.py /path/to/image.png minimax "Describe the data trends in the chart"
```
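The same invocations can be driven from Python, for example to process several images in a loop. The sketch below is an assumption-laden wrapper rather than part of the skill: `build_command` and `describe_image` are hypothetical names, the script path is taken from the examples above, and the description is assumed to arrive on standard output.

```python
import subprocess
import sys
from pathlib import Path

# Install location taken from the examples above; adjust if the skill lives elsewhere.
SCRIPT = (
    Path.home()
    / ".openclaw/workspace/skills/minimax-image-understanding/scripts/understand_image.py"
)


def build_command(image_path, model=None, prompt=None):
    """Build the argument list for understand_image.py (hypothetical helper)."""
    cmd = [sys.executable, str(SCRIPT), str(image_path)]
    if prompt is not None:
        # Arguments are positional, so a prompt must follow an explicit model.
        cmd += [model or "minimax", prompt]
    elif model is not None:
        cmd.append(model)
    return cmd


def describe_image(image_path, model=None, prompt=None):
    """Run the script and return whatever it printed (assumed: stdout)."""
    result = subprocess.run(
        build_command(image_path, model, prompt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

Passing the command as a list avoids shell quoting problems when the prompt contains spaces or non-ASCII text.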

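## Output

Prints a description of the image's business meaning directly. Rather than listing element positions, it focuses on the data content and business logic.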

Analysis Signals

Dependencies

python

External Services

openai anthropic x