TopRank Skills


voice-assistant

Real-time voice assistant for OpenClaw. Streams mic audio through configurable STT (Deepgram or ElevenLabs) into your OpenClaw agent, then speaks the response via configurable TTS (Deepgram Aura or ElevenLabs). Sub-2s time-to-first-audio with full streaming at every stage.

Stars: 0
Installs: 0
Status: ACTIVE
Visibility: PUBLIC

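Installation

Copy the prompt below and send it to your AI assistant to complete the installation.

First check whether the SkillHub store is installed. If it is not, install it by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the voice-assistant skill. If it is already installed, install the voice-assistant skill directly.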

Overview

Skill Key: charantejmandali18/voice-assistant
Author: charantejmandali18
Source Repo: openclaw/skills
Version: -
Source Path: skills/charantejmandali18/voice-assistant
Latest Commit SHA: ea2df3c853dd2759bc79425fbcf16018fc0b978d

Extracted Content

SKILL.md excerpt

# Voice Assistant

Real-time voice interface for your OpenClaw agent. Talk to your agent and hear it respond, with configurable STT and TTS providers, full streaming at every stage, and sub-2-second time-to-first-audio.

## Architecture

```
Browser Mic → WebSocket → STT (Deepgram / ElevenLabs) → Text
  → OpenClaw Gateway (/v1/chat/completions, streaming) → Response Text
  → TTS (Deepgram Aura / ElevenLabs) → Audio chunks → Browser Speaker
```

The voice interface connects to your running OpenClaw gateway's OpenAI-compatible endpoint. It's the same agent with all its context, tools, and memory — just with a voice.
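
To make the middle stage of the pipeline concrete, here is a minimal sketch of how streamed output from an OpenAI-compatible `/v1/chat/completions` endpoint could be turned into sentence-sized pieces for TTS. It assumes the gateway emits the usual `data: {...}` server-sent-event lines terminated by `data: [DONE]`. The helper names `iter_text_deltas` and `iter_sentences` are hypothetical, and the excerpt does not say whether `scripts/server.py` chunks text this way.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(sse_lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming chat completion lines."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


def iter_sentences(deltas: Iterable[str], enders: str = ".!?") -> Iterator[str]:
    """Group streamed fragments into sentences so TTS can start before the reply ends."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        while True:
            # Position of the first sentence-ending character, or -1 if none yet.
            cut = next((i for i, ch in enumerate(buffer) if ch in enders), -1)
            if cut == -1:
                break
            sentence, buffer = buffer[:cut + 1].strip(), buffer[cut + 1:]
            if sentence:
                yield sentence
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends


if __name__ == "__main__":
    demo = [
        'data: {"choices":[{"delta":{"content":"Hello the"}}]}',
        "",
        'data: {"choices":[{"delta":{"content":"re. How are"}}]}',
        'data: {"choices":[{"delta":{"content":" you?"}}]}',
        "data: [DONE]",
    ]
    for sentence in iter_sentences(iter_text_deltas(demo)):
        print(sentence)
```

Splitting on the first `.`, `!`, or `?` is deliberately naive (it would also split inside `3.14`). The design point is that the first sentence can be handed to TTS while later tokens are still arriving.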

## Quick Start

```bash
cd {baseDir}
cp .env.example .env
# Fill in your API keys and gateway URL
uv run scripts/server.py
# Open http://localhost:7860 and click the mic
```
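
Before starting the server, it can help to sanity-check the `.env` file you just filled in. The sketch below is a hypothetical preflight script, not part of the skill. It uses the variable names listed under Configuration and assumes that `ELEVENLABS_API_KEY` is required when `VOICE_STT_PROVIDER=elevenlabs` (the source comment for that key is truncated). TTS settings are not checked because their variable names do not appear in the excerpt.

```python
from typing import Dict, List

VALID_STT_PROVIDERS = {"deepgram", "elevenlabs"}
PLACEHOLDER = "your-key-here"  # placeholder value used in the documented .env


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, ignoring blank lines and # comment lines."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop trailing inline comments such as `value   # note`.
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def check_config(env: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems: List[str] = []
    for key in ("OPENCLAW_GATEWAY_URL", "OPENCLAW_MODEL"):
        if not env.get(key):
            problems.append(f"{key} is missing")
    provider = env.get("VOICE_STT_PROVIDER", "")
    if provider not in VALID_STT_PROVIDERS:
        problems.append(
            f"VOICE_STT_PROVIDER must be one of {sorted(VALID_STT_PROVIDERS)}"
        )
    else:
        # Assumption: each provider reads its key from <PROVIDER>_API_KEY.
        key_name = f"{provider.upper()}_API_KEY"
        value = env.get(key_name, "")
        if not value or value == PLACEHOLDER:
            problems.append(f"{key_name} is not set")
    return problems


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        issues = check_config(parse_env(handle.read()))
    print("\n".join(issues) if issues else "Config looks complete.")
```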

## Supported Providers

### STT (Speech-to-Text)
| Provider   | Model              | Latency    | Notes                                    |
|------------|--------------------|------------|------------------------------------------|
| Deepgram   | nova-2 (streaming) | ~200-300ms | WebSocket streaming, best accuracy/speed |
| ElevenLabs | Scribe v1          | ~300-500ms | REST-based, good multilingual            |

### TTS (Text-to-Speech)
| Provider   | Model      | Latency | Notes                         |
|------------|------------|---------|-------------------------------|
| Deepgram   | aura-2     | ~200ms  | WebSocket streaming, low cost |
| ElevenLabs | Turbo v2.5 | ~300ms  | Best voice quality, streaming |
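
The latency figures in the two tables can be combined into a rough budget for the sub-2-second time-to-first-audio target. The sketch below assumes the STT and TTS latencies simply add up in series and ignores network and playback overhead; the source does not define how time-to-first-audio is measured. The function name `llm_budget_ms` is hypothetical.

```python
from typing import Dict, Tuple

# (low, high) latency in milliseconds, copied from the provider tables.
STT_LATENCY_MS: Dict[str, Tuple[int, int]] = {
    "deepgram": (200, 300),
    "elevenlabs": (300, 500),
}
TTS_LATENCY_MS: Dict[str, Tuple[int, int]] = {
    "deepgram": (200, 200),
    "elevenlabs": (300, 300),
}

TARGET_MS = 2000  # the "sub-2s" time-to-first-audio goal


def llm_budget_ms(stt: str, tts: str, target_ms: int = TARGET_MS) -> Tuple[int, int]:
    """Return (worst-case, best-case) milliseconds left for the agent's first output."""
    stt_low, stt_high = STT_LATENCY_MS[stt]
    tts_low, tts_high = TTS_LATENCY_MS[tts]
    return target_ms - stt_high - tts_high, target_ms - stt_low - tts_low


if __name__ == "__main__":
    for stt in STT_LATENCY_MS:
        for tts in TTS_LATENCY_MS:
            worst, best = llm_budget_ms(stt, tts)
            print(f"STT={stt:<10} TTS={tts:<10} agent budget: {worst}-{best} ms")
```

Under these assumptions, an all-Deepgram setup leaves roughly 1500 to 1600 ms for the gateway to produce its first speakable text, and an all-ElevenLabs setup leaves roughly 1200 to 1400 ms.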

## Configuration

All configuration is via environment variables in `.env`:

```bash
# === Required ===
OPENCLAW_GATEWAY_URL=http://localhost:4141/v1    # Your OpenClaw gateway
OPENCLAW_MODEL=claude-sonnet-4-5-20250929        # Model your gateway routes to

# === STT Provider (pick one) ===
VOICE_STT_PROVIDER=deepgram                      # "deepgram" or "elevenlabs"
DEEPGRAM_API_KEY=your-key-here                   # Required if STT=deepgram
ELEVENLABS_API_KEY=your-key-here                 # Required...
