gemini-stt

Transcribe audio files using Google's Gemini API or Vertex AI

View Source SKILL.md

Stars

Installs

Status

ACTIVE

Visibility

PUBLIC

安装方式

直接复制以下提示词，发送给你的 AI 助手即可完成安装。

请先检查是否已安装 SkillHub 商店，若未安装，请根据 https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md 安装 SkillHub 商店，然后安装 gemini-stt 技能。若已安装，则直接安装 gemini-stt 技能。

Overview

Skill Key: araa47/gemini-stt
Author: araa47
Source Repo: openclaw/skills
Version: -
Source Path: skills/araa47/gemini-stt
Latest Commit SHA: ad0dba29f438651f615379c20091b2b9620d8822

Extracted Content

SKILL.md excerpt

# Gemini Speech-to-Text Skill

Transcribe audio files using Google's Gemini API or Vertex AI. Default model is `gemini-2.0-flash-lite` for fastest transcription.

## Authentication (choose one)

### Option 1: Vertex AI with Application Default Credentials (Recommended)

```bash
gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID
```

The script will automatically detect and use ADC when available.

### Option 2: Direct Gemini API Key

Set `GEMINI_API_KEY` in environment (e.g., `~/.env` or `~/.clawdbot/.env`)

## Requirements

- Python 3.10+ (no external dependencies)
- Either GEMINI_API_KEY or gcloud CLI with ADC configured

## Supported Formats

- `.ogg` / `.opus` (Telegram voice messages)
- `.mp3`
- `.wav`
- `.m4a`

## Usage

```bash
# Auto-detect auth (tries ADC first, then GEMINI_API_KEY)
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg

# Force Vertex AI
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --vertex

# With a specific model
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --model gemini-2.5-pro

# Vertex AI with specific project and region
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --vertex --project my-project --region us-central1

# With Clawdbot media
python ~/.claude/skills/gemini-stt/transcribe.py ~/.clawdbot/media/inbound/voice-message.ogg
```

## Options

| Option | Description |
|--------|-------------|
| `<audio_file>` | Path to the audio file (required) |
| `--model`, `-m` | Gemini model to use (default: `gemini-2.0-flash-lite`) |
| `--vertex`, `-v` | Force use of Vertex AI with ADC |
| `--project`, `-p` | GCP project ID (for Vertex, defaults to gcloud config) |
| `--region`, `-r` | GCP region (for Vertex, default: `us-central1`) |

## Supported Models

Any Gemini model that supports audio input can be used. Recommended models:

| Model | Notes |
|-------|-------|
| `gemini-2.0-flash-lite` | **Default.** Fastest transcription speed...

Related Claw Skills

laborany

★ 50

基于 Claude Code 的桌面 AI 工作力平台 — 支持飞书/QQ 远程调度、技能创建、定时任务。OpenClaw 的桌面实现，零代码养好你的 AI 🦞 Desktop AI workforce platform built on Claude Code. Feishu/QQ bot integration, skill creation, scheduled tasks — OpenClaw for your desktop. Raise your AI lobsters 🦞

win4r

openclaw-remote-minimax-setup-skill

★ 38

Reusable OpenClaw skill for remote Linux deployment with MiniMax M2.1 and Telegram bot setup

botlearn-ai

awesome-openclaw-learning-skills

★ 25

Bots learn, human earns, curated open claw playbook list and skill list for life long learners at https://botlearn.ai

duanecilliers

openclaw-admin

★ 22

Web-based admin dashboard for OpenClaw — manage Discord persona bots, workspace files, skills, cron jobs, channels, and config

abczsl520

bug-audit-skill

★ 20

OpenClaw skill: Dynamic bug audit for Node.js web projects (games, data tools, WeChat, APIs, bots). 200+ real-world pitfalls.

pardnchiu

Agenvoy

★ 15

A Go agentic AI platform with skill routing, multi-provider intelligent dispatch, Discord bot integration, and security-first shared agent design