voice-assistant

Real-time voice assistant for OpenClaw. Streams mic audio through configurable STT (Deepgram or ElevenLabs) into your OpenClaw agent, then speaks the response via configurable TTS (Deepgram Aura or ElevenLabs). Sub-2s time-to-first-audio with full streaming at every stage.

View Source SKILL.md

Stars

Installs

Status

ACTIVE

Visibility

PUBLIC

安装方式

直接复制以下提示词，发送给你的 AI 助手即可完成安装。

请先检查是否已安装 SkillHub 商店，若未安装，请根据 https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md 安装 SkillHub 商店，然后安装 voice-assistant 技能。若已安装，则直接安装 voice-assistant 技能。

Overview

Skill Key: charantejmandali18/voice-assistant
Author: charantejmandali18
Source Repo: openclaw/skills
Version: -
Source Path: skills/charantejmandali18/voice-assistant
Latest Commit SHA: ea2df3c853dd2759bc79425fbcf16018fc0b978d

Extracted Content

SKILL.md excerpt

# Voice Assistant

Real-time voice interface for your OpenClaw agent. Talk to your agent and hear it respond — with configurable STT and TTS providers, full streaming at every stage, and sub-2 second time-to-first-audio.

## Architecture

```
Browser Mic → WebSocket → STT (Deepgram / ElevenLabs) → Text
  → OpenClaw Gateway (/v1/chat/completions, streaming) → Response Text
  → TTS (Deepgram Aura / ElevenLabs) → Audio chunks → Browser Speaker
```

The voice interface connects to your running OpenClaw gateway's OpenAI-compatible endpoint. It's the same agent with all its context, tools, and memory — just with a voice.

## Quick Start

```bash
cd {baseDir}
cp .env.example .env
# Fill in your API keys and gateway URL
uv run scripts/server.py
# Open http://localhost:7860 and click the mic
```

## Supported Providers

### STT (Speech-to-Text)
| Provider   | Model            | Latency  | Notes                        |
|-----------|------------------|----------|------------------------------|
| Deepgram  | nova-2 (streaming) | ~200-300ms | WebSocket streaming, best accuracy/speed |
| ElevenLabs | Scribe v1        | ~300-500ms | REST-based, good multilingual |

### TTS (Text-to-Speech)
| Provider    | Model        | Latency  | Notes                          |
|------------|--------------|----------|--------------------------------|
| Deepgram   | aura-2       | ~200ms   | WebSocket streaming, low cost  |
| ElevenLabs | Turbo v2.5   | ~300ms   | Best voice quality, streaming   |

## Configuration

All configuration is via environment variables in `.env`:

```bash
# === Required ===
OPENCLAW_GATEWAY_URL=http://localhost:4141/v1    # Your OpenClaw gateway
OPENCLAW_MODEL=claude-sonnet-4-5-20250929        # Model your gateway routes to

# === STT Provider (pick one) ===
VOICE_STT_PROVIDER=deepgram                      # "deepgram" or "elevenlabs"
DEEPGRAM_API_KEY=your-key-here                   # Required if STT=deepgram
ELEVENLABS_API_KEY=your-key-here                 # Required...

Related Claw Skills

human-pages-ai

humanpages

★ 3

Search and hire real humans for tasks — photography, delivery, research, and more

zseven-w

openclaw-skills

★ 1

Reusable skill templates for OpenClaw AI agents. Templates for API integration, data processing, web scraping, CLI tools, and file processing.

capt-marbles

attio

★ 0

Attio CRM integration for managing companies, people, deals, notes, tasks, and custom objects. Use when working with Attio CRM data, searching contacts, managing sales pipelines, adding notes to records, creating tasks, or syncing prospect information.

capt-marbles

firecrawl

★ 0

Web scraping and crawling with Firecrawl API. Fetch webpage content as markdown, take screenshots, extract structured data, search the web, and crawl documentation sites. Use when the user needs to scrape a URL, get current web info, capture a screenshot, extract specific data from pages, or crawl docs for a framework/library.

caqlayan

Tweet Processor

★ 0

Tweet Processor Skill

carlosarturoleon

windsor-ai

★ 0

Connect to Windsor.ai MCP for natural language access to 325+ data sources including Facebook Ads, GA4, HubSpot, Shopify, and more.