voice-stt-tts

Full voice message setup (STT + TTS) for OpenClaw using faster-whisper and Edge TTS

View Source SKILL.md

Stars

Installs

Status

ACTIVE

Visibility

PUBLIC

安装方式

直接复制以下提示词，发送给你的 AI 助手即可完成安装。

请先检查是否已安装 SkillHub 商店，若未安装，请根据 https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md 安装 SkillHub 商店，然后安装 voice-stt-tts 技能。若已安装，则直接安装 voice-stt-tts 技能。

Overview

Skill Key: aksenkin/voice-stt-tts
Author: aksenkin
Source Repo: openclaw/skills
Version: -
Source Path: skills/aksenkin/voice-stt-tts
Latest Commit SHA: 750d1ec1b9f25d251b1744805d13a6565a77a786

Extracted Content

SKILL.md excerpt

# Voice Messages (STT + TTS) for OpenClaw 🎙️

Complete voice message setup using **faster-whisper** for transcription and **Edge TTS** for voice replies.

## What we configure

- ✅ **STT** (Speech-to-Text) — transcribe voice messages via faster-whisper
- ✅ **TTS** (Text-to-Speech) — voice replies via Edge TTS
- 🎯 **Result:** voice → text → reply with voice

---

## Installation

### 1. Create virtual environment (venv)

For Ubuntu create an isolated venv:

```bash
python3 -m venv ~/.openclaw/workspace/voice-messages
```

### 2. Install faster-whisper

Install packages in venv:

```bash
~/.openclaw/workspace/voice-messages/bin/pip install faster-whisper
```

**What gets installed:**
- `faster-whisper` — Python library for transcription
- Dependencies: `ctranslate2`, `onnxruntime`, `huggingface-hub`, `av`, `numpy`, and others.
- Size: ~250 MB

---

## Transcription Script

### Path and content

**File:** `~/.openclaw/workspace/voice-messages/transcribe.py`

```python
#!/usr/bin/env python3
import argparse
from faster_whisper import WhisperModel


def transcribe(audio_path: str, model_name: str = "small", lang: str = "en", device: str = "cpu") -> str:
    model = WhisperModel(
        model_name,
        device=device,
        compute_type="int8" if device == "cpu" else "float16",
    )
    segments, _ = model.transcribe(audio_path, language=lang, vad_filter=True)
    text = " ".join(seg.text.strip() for seg in segments if seg.text and seg.text.strip()).strip()
    return text


def main():
    p = argparse.ArgumentParser()
    p.add_argument("--audio", required=True)
    p.add_argument("--model", default="small")
    p.add_argument("--lang", default="en")
    p.add_argument("--device", default="cpu", choices=["cpu", "cuda"])
    args = p.parse_args()

    text = transcribe(args.audio, args.model, args.lang, args.device)
    print(text if text else "")


if __name__ == "__main__":
    main()
```

**What the script does:**
1. Accepts audio file path (`--audio`)
2. Lo...

Related Claw Skills

openstockdata

stock-data-skill

★ 4

OpenClaw Skill for stock data analysis

capt-marbles

geo-optimization

★ 1

Generative Engine Optimization (GEO) for AI search visibility. Optimize content to appear in ChatGPT, Perplexity, Claude, and Google AI Overviews. Use when optimizing websites, pages, or content for LLM discoverability and citation.

capgoblin

credex-protocol

★ 0

Access unsecured credit lines for AI agents on the Arc Network using the Credex Protocol. Use for borrowing USDC against reputation, repaying debt to grow credit limits, providing liquidity as an LP, or managing cross-chain USDC via Circle Bridge. Triggers on "borrow from credex", "repay debt", "deposit to pool", "check credit status", "provide liquidity", or any credit/lending task on Arc.

capt-marbles

phantombuster

★ 0

Control PhantomBuster automation agents via API. List agents, launch automations, get output/results, check status, and abort running agents. Use when the user needs to run LinkedIn scraping, Twitter automation, lead generation phantoms, or any PhantomBuster workflow.

camelsprout

duckdb-en

★ 0

DuckDB CLI specialist for SQL analysis, data processing and file conversion. Use for SQL queries, CSV/Parquet/JSON analysis, database queries, or data conversion. Triggers on "duckdb", "sql", "query", "data analysis", "parquet", "convert data".

camohiddendj

ddg-search

★ 0

DuckDuckGo HTML search scraper CLI with JSON, CSV, OpenSearch, markdown, and compact outputs.