zhipu-asr

Overview

Skill Key: franklu0819-lang/zhipu-asr
Author: franklu0819-lang
Source Repo: openclaw/skills
Version: -
Source Path: skills/franklu0819-lang/zhipu-asr
Latest Commit SHA: 42bc707031f51f68def488a534c976f74d838e7d

Extracted Content

SKILL.md excerpt

# Zhipu AI Automatic Speech Recognition (ASR)

Transcribe Chinese audio files to text using Zhipu AI's GLM-ASR model.

## Setup

**1. Get your API Key:**
Get a key from [Zhipu AI Console](https://bigmodel.cn/usercenter/proj-mgmt/apikeys)

**2. Set it in your environment:**
```bash
export ZHIPU_API_KEY="your-key-here"
```
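
A small guard can catch a missing key before any request is made. The sketch below is only an illustration; `require_api_key` is a hypothetical helper name, not part of the skill's script.

```bash
# Hypothetical helper (not part of the skill): fail early when
# ZHIPU_API_KEY is missing or empty.
require_api_key() {
  if [ -z "${ZHIPU_API_KEY:-}" ]; then
    echo "ZHIPU_API_KEY is not set; export it before running the script." >&2
    return 1
  fi
}
```

Calling `require_api_key || exit 1` at the top of a wrapper script stops the run with a readable message instead of an API error.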

## Supported Audio Formats

- **WAV** - Recommended, best quality
- **MP3** - Widely supported
- **OGG** - Auto-converted to MP3
- **M4A** - Auto-converted to MP3
- **AAC** - Auto-converted to MP3
- **FLAC** - Auto-converted to MP3
- **WMA** - Auto-converted to MP3

> **Note:** The API accepts only WAV and MP3. The script uses ffmpeg to convert other formats to MP3 automatically, so any format that ffmpeg can read works as input.
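
The conversion step described in the note could look roughly like the sketch below. It is not the skill's actual code: `needs_conversion` and `to_mp3` are hypothetical helper names, and the ffmpeg flags are one reasonable choice rather than the ones the script is known to use.

```bash
# Hypothetical helper: succeed (exit 0) when the file extension is
# neither WAV nor MP3, i.e. when conversion would be needed.
needs_conversion() {
  local ext="${1##*.}"
  # Lowercase the extension so "REC.WAV" is treated like "rec.wav".
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    wav|mp3) return 1 ;;
    *)       return 0 ;;
  esac
}

# Hypothetical helper: convert the input to MP3 next to the original
# file and print the output path. Requires ffmpeg on PATH.
to_mp3() {
  local input="$1"
  local output="${input%.*}.mp3"
  # -y overwrites an existing output; -vn drops video or cover-art streams.
  ffmpeg -y -loglevel error -i "$input" -vn "$output" && printf '%s\n' "$output"
}
```

A wrapper might run `needs_conversion "$file" && file="$(to_mp3 "$file")"` before handing the path to the transcription script.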

## File Constraints

- **Maximum file size:** 25 MB
- **Maximum duration:** 30 seconds
- **Recommended sample rate:** 16000 Hz or higher
- **Audio channels:** Mono or stereo
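
These limits can be checked locally before a file is sent. The following is a minimal sketch under two assumptions: that 25 MB means 25 * 1024 * 1024 bytes, and that `ffprobe` (shipped with ffmpeg) is available. `size_ok` and `duration_ok` are hypothetical helper names.

```bash
MAX_BYTES=$((25 * 1024 * 1024))   # assumption: 25 MB = 25 * 1024 * 1024 bytes
MAX_SECONDS=30

# Hypothetical helper: succeed when the file is within the size limit.
size_ok() {
  local bytes
  # Arithmetic expansion strips the padding some wc implementations print.
  bytes=$(( $(wc -c < "$1") ))
  [ "$bytes" -le "$MAX_BYTES" ]
}

# Hypothetical helper: succeed when ffprobe reports a duration within
# the limit. Requires ffprobe on PATH.
duration_ok() {
  local seconds
  seconds="$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1")" || return 1
  # awk compares the fractional seconds that ffprobe prints.
  awk -v d="$seconds" -v max="$MAX_SECONDS" 'BEGIN { exit !(d <= max) }'
}
```

For example, `size_ok recording.wav && duration_ok recording.wav` succeeds only when both constraints hold.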

## Usage

### Basic Transcription

Transcribe an audio file with default settings:

```bash
bash scripts/speech_to_text.sh recording.wav
```

### Transcription with Context

Provide previous transcription or context for better accuracy:

```bash
# The prompt means: "This is the previous transcription, which helps improve accuracy"
bash scripts/speech_to_text.sh recording.wav "这是之前的转录内容，有助于提高准确性"
```

### Transcription with Hotwords

Use custom vocabulary to improve recognition of specific terms:

```bash
# The hotwords mean: person names, place names, technical terms, company names
bash scripts/speech_to_text.sh recording.mp3 "" "人名,地名,专业术语,公司名称"
```

### Full Options

Combine context and hotwords:

```bash
# Prompt: "meeting minutes excerpt"; hotwords: Zhang San, Li Si, project name
bash scripts/speech_to_text.sh recording.wav "会议记录片段" "张三,李四,项目名称"
```

**Parameters:**
- `audio_file` (required): Path to the audio file (WAV or MP3; other formats are auto-converted to MP3 via ffmpeg)
- `prompt` (optional): Previous transcription or context text (max 8000 chars)
- `hotwords` (optional): Comma-separated list of specific terms (max 100 words)
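
The documented limits can be checked before calling the script. The sketch below is illustrative only: `count_hotwords` and `validate_args` are hypothetical helpers, "max 100 words" is assumed to mean at most 100 comma-separated entries, and `${#prompt}` counts characters only in a UTF-8 locale (in the C locale it counts bytes).

```bash
# Hypothetical helper: count comma-separated hotwords, ignoring empty entries.
count_hotwords() {
  printf '%s' "$1" | awk -F',' '
    { for (i = 1; i <= NF; i++) if ($i !~ /^[[:space:]]*$/) n++ }
    END { print n + 0 }'
}

# Hypothetical helper: check the prompt and hotwords against the
# documented limits (8000 characters, 100 hotwords).
validate_args() {
  local prompt="$1" hotwords="$2"
  if [ "${#prompt}" -gt 8000 ]; then
    echo "prompt exceeds 8000 characters" >&2
    return 1
  fi
  if [ "$(count_hotwords "$hotwords")" -gt 100 ]; then
    echo "more than 100 hotwords" >&2
    return 1
  fi
}
```

The split is on the ASCII comma used in the examples above; a full-width comma (`，`) would not be treated as a separator.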

## Features

### Context Prompts

**Why use context prompts:**
- Improves accuracy in long conversations
- Helps with domain-specific terminology
- Mai...

README excerpt

# Zhipu AI ASR Skill

Automatic Speech Recognition (ASR) using the Zhipu AI (BigModel) GLM-ASR model. Transcribe Chinese audio files to text with high accuracy.

## Features

- 🎤 **Multiple Audio Formats**: WAV, MP3, OGG, M4A, AAC, FLAC, WMA
- 🇨🇳 **Chinese Language Support**: Optimized for Mandarin Chinese
- 📝 **Context Prompts**: Improve accuracy with previous transcription context
- 🔥 **Hotwords**: Custom vocabulary for specific terms (names, jargon, etc.)
- ⚡ **Fast Processing**: Real-time or faster transcription speed
- 🔄 **Auto Format Conversion**: Automatically converts unsupported formats to MP3

## Requirements

- `jq` - JSON processor
- `ffmpeg` - Audio format conversion
- `ZHIPU_API_KEY` environment variable
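
Whether the two command-line tools are installed can be checked with `command -v`. This is a small sketch; `missing_deps` is a hypothetical helper name, not something the skill provides.

```bash
# Hypothetical helper: print each named command that is not on PATH.
missing_deps() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}
```

Running `missing_deps jq ffmpeg` prints nothing when both tools are available and lists the absent ones otherwise.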

## Quick Start

```bash
# Install dependencies if needed (Debian/Ubuntu shown)
sudo apt-get install jq ffmpeg

# Set your API key
export ZHIPU_API_KEY="your-key-here"

# Transcribe an audio file
bash scripts/speech_to_text.sh recording.wav

# With context and hotwords
bash scripts/speech_to_text.sh recording.wav "previous context" "term1,term2,term3"
```

## File Constraints

- **Max file size**: 25 MB
- **Max duration**: 30 seconds
- **Supported formats**: WAV (recommended), MP3
- **Other formats**: Auto-converted to MP3
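
Recordings longer than 30 seconds have to be split before transcription. One possible approach, sketched below, uses ffmpeg's segment muxer. This is not something the skill is known to do itself: `chunk_count` and `split_audio` are hypothetical helpers, and the 29-second chunk length is an assumed safety margin under the 30-second limit.

```bash
CHUNK_SECONDS=29   # assumption: stay slightly under the 30-second limit

# Hypothetical helper: number of chunks needed for a duration in seconds.
chunk_count() {
  awk -v d="$1" -v c="${2:-$CHUNK_SECONDS}" \
    'BEGIN { n = int(d / c); if (n * c < d) n++; print n }'
}

# Hypothetical helper: split a WAV file into consecutive chunks.
# Requires ffmpeg on PATH.
split_audio() {
  local input="$1" prefix="${2:-chunk}"
  # The segment muxer writes prefix_000.wav, prefix_001.wav, ...
  ffmpeg -loglevel error -i "$input" -f segment \
    -segment_time "$CHUNK_SECONDS" -c copy "${prefix}_%03d.wav"
}
```

Each resulting chunk can then be passed to `scripts/speech_to_text.sh` in order, optionally supplying the previous chunk's transcription as the context prompt.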

## Use Cases

- 🎙️ Meeting transcription
- 📚 Lecture recording
- 💼 Voice memos
- 🎞️ Video subtitle generation
- 📞 Call recording transcription

## Author

franklu0819-lang

## License

MIT
