audio-speaker-tools

Speaker separation, voice comparison, and audio processing tools. Use when working with multi-speaker audio, voice cloning, or speaker verification tasks including: (1) separating speakers from audio files via Demucs and pyannote diarization, (2) comparing voice samples for speaker verification or voice clone quality assessment using Resemblyzer, (3) extracting audio segments, (4) preparing samples for ElevenLabs voice cloning, or (5) validating speaker diarization results.

Status: ACTIVE
Visibility: PUBLIC

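Installation

Copy the prompt below and send it to your AI assistant to complete the installation.

First check whether the SkillHub store is installed. If it is not, install the SkillHub store by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the audio-speaker-tools skill. If it is already installed, install the audio-speaker-tools skill directly.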

Overview

Skill Key: cmfinlan/audio-speaker-tools
Author: cmfinlan
Source Repo: openclaw/skills
Version: -
Source Path: skills/cmfinlan/audio-speaker-tools
Latest Commit SHA: 040757688db490a94f7ab9f2328bcd6bd3c9e7dd

Extracted Content

SKILL.md excerpt

# Audio Speaker Tools

Tools for speaker separation, voice comparison, and audio processing using Demucs, pyannote, and Resemblyzer.

## Overview

This skill provides three main workflows:

1. **Speaker separation** - Extract per-speaker audio from multi-speaker recordings
2. **Voice comparison** - Measure speaker similarity between two audio files
3. **Audio processing** - Segment extraction and voice isolation

## Prerequisites

### Setup Virtual Environment

Run once to create the venv and install dependencies:

```bash
bash scripts/setup_venv.sh
```

Default venv location: `./.venv`

**Requirements:**
- Python 3.9+
- ffmpeg on your `PATH` (on macOS: `brew install ffmpeg`)
- HuggingFace token (set as env var `HF_TOKEN`)
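
The requirements above can be verified before running any script. The following is a minimal sketch, not part of the skill itself: the function name `check_prerequisites` and its injectable parameters are hypothetical and exist only to make the check easy to test.

```python
import os
import shutil
import sys


def check_prerequisites(env=None, which=shutil.which, version=None):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper: the parameters let callers substitute fake values.
    """
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []
    if version < (3, 9):
        problems.append("Python 3.9+ is required")
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```

Running the file directly prints one line per unmet requirement and nothing when the environment looks ready.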

## Scripts

### 1. Speaker Separation: `diarize_and_slice_mps.py`

Separate speakers from multi-speaker audio:

```bash
# Basic usage
HF_TOKEN=<your-hf-token> \
  /path/to/venv/bin/python scripts/diarize_and_slice_mps.py \
  --input audio.mp3 \
  --outdir /path/to/output \
  --prefix MyShow

# With speaker constraints
HF_TOKEN=$TOKEN python scripts/diarize_and_slice_mps.py \
  --input audio.mp3 \
  --outdir ./out \
  --min-speakers 2 \
  --max-speakers 5 \
  --pad-ms 100
```

**Process:**
1. Converts input to 16kHz mono WAV
2. Runs Demucs vocal/background separation (optional, for cleaner input)
3. Runs pyannote speaker diarization (MPS-accelerated)
4. Extracts concatenated per-speaker WAV files
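
Steps 1 and 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: the helper names are hypothetical, the ffmpeg invocation uses the standard `-ac` (channels) and `-ar` (sample rate) options, and `--pad-ms` is assumed to widen each segment on both sides, clamped to the bounds of the recording.

```python
def build_resample_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that converts src to a mono WAV at sample_rate."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", str(src),
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # target sample rate in Hz
        str(dst),
    ]


def pad_and_group(segments, pad_ms, duration):
    """Group (start, end, speaker) segments by speaker after padding.

    Assumption: padding is applied symmetrically and clamped to [0, duration].
    Times are in seconds; pad_ms is in milliseconds.
    """
    pad = pad_ms / 1000.0
    per_speaker = {}
    for start, end, speaker in segments:
        padded = (max(0.0, start - pad), min(duration, end + pad))
        per_speaker.setdefault(speaker, []).append(padded)
    return per_speaker


if __name__ == "__main__":
    print(" ".join(build_resample_command("audio.mp3", "audio_16k.wav")))
    print(pad_and_group([(0.05, 1.0, "A"), (2.0, 3.0, "B")], 100, 3.05))
```

The command list can be passed to `subprocess.run(..., check=True)`; building it as a list rather than a shell string avoids quoting problems with unusual file names.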

**Output:**
- `<prefix>_speaker1.wav`, `<prefix>_speaker2.wav`, etc. (one per detected speaker)
- `diarization.rttm` (time-stamped speaker segments)
- `segments.jsonl` (JSON segments metadata)
- `meta.json` (pipeline info and speaker index)
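
To sanity-check a diarization result, the RTTM output can be summarized per speaker. The sketch below assumes `diarization.rttm` follows the standard RTTM layout, in which each `SPEAKER` line carries the onset in the fourth field, the duration in the fifth, and the speaker label in the eighth. The function name and the sample labels are illustrative only.

```python
from collections import defaultdict


def speaking_time_from_rttm(lines):
    """Sum the duration in seconds per speaker label from RTTM lines."""
    totals = defaultdict(float)
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and any non-SPEAKER record types.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        duration = float(fields[4])
        speaker = fields[7]
        totals[speaker] += duration
    return dict(totals)


if __name__ == "__main__":
    sample = [
        "SPEAKER audio 1 0.000 2.500 <NA> <NA> SPEAKER_00 <NA> <NA>",
        "SPEAKER audio 1 2.500 1.000 <NA> <NA> SPEAKER_01 <NA> <NA>",
        "SPEAKER audio 1 3.500 1.500 <NA> <NA> SPEAKER_00 <NA> <NA>",
    ]
    print(speaking_time_from_rttm(sample))
```

For a real file, pass an open file handle, for example `speaking_time_from_rttm(open("diarization.rttm"))`. A speaker with only a second or two of total speech is often a sign that `--min-speakers` or `--max-speakers` needs adjusting.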

**Important:**
- **Always pass HF token via `HF_TOKEN` env var**, never as CLI arg
- **MPS first, CPU fallback** - Script prefers Metal GPU, falls back to CPU if unavailable
- Default output: `./separated/`
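
The first two points can be illustrated with a short sketch. It is not taken from `diarize_and_slice_mps.py`; the helper names are hypothetical. It relies only on `torch.backends.mps.is_available()` from PyTorch and falls back to CPU when PyTorch or MPS support is missing.

```python
import os


def read_hf_token(env=None):
    """Read the HuggingFace token from the environment, never from argv."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before running")
    return token


def pick_device():
    """Return "mps" when the Metal backend is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose the mps backend at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("device:", pick_device())
```

Keeping the token out of the command line means it does not show up in shell history or in process listings such as `ps`.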

### 2. Voice Comparison: `compare_voices.py`

Measure similarity between two voice samples using Resemblyzer:

```bash
# Basic comparis...
```

Analysis Signals

Dependencies

gh pip python go