faster-whisper-gpu

High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration. Transcribe audio files locally without sending data to external services.

Status: ACTIVE
Visibility: PUBLIC

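Installation

Copy the prompt below and send it to your AI assistant to complete the installation.

First check whether the SkillHub store is installed. If it is not, install the SkillHub store by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the faster-whisper-gpu skill. If it is already installed, install the faster-whisper-gpu skill directly.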

Overview

Skill Key: felipeoff/faster-whisper-gpu
Author: felipeoff
Source Repo: openclaw/skills
Version: -
Source Path: skills/felipeoff/faster-whisper-gpu
Latest Commit SHA: 540b25f7d174d709f0854325e782d82a085dcd1b

Extracted Content

SKILL.md excerpt

# 🎙️ Faster Whisper GPU

High-performance local speech-to-text transcription using [Faster Whisper](https://github.com/SYSTRAN/faster-whisper) with NVIDIA GPU acceleration.

## ✨ Features

- **🚀 GPU Accelerated**: Uses NVIDIA CUDA for blazing-fast transcription
- **🔒 100% Local**: No data leaves your machine. Complete privacy.
- **💰 Free Forever**: No API costs. Run unlimited transcriptions.
- **🌍 Multilingual**: Supports 99 languages with automatic detection
- **📁 Multiple Formats**: Input: MP3, WAV, FLAC, OGG, M4A. Output: TXT, SRT, JSON, VTT
- **🎯 Multiple Models**: From tiny (fast) to large-v3 (most accurate)
- **🎬 Subtitle Generation**: Create SRT files with word-level timestamps

## 📋 Requirements

### Hardware
- **NVIDIA GPU** with CUDA support (recommended: 4GB+ VRAM)
- Or CPU-only mode (slower but works on any machine)

### Software
- Python 3.8+
- NVIDIA drivers (for GPU support)
- CUDA Toolkit 11.8+ or 12.x

## 🚀 Quick Start

### Installation

```bash
# Install dependencies
pip install faster-whisper torch

# Verify GPU is available
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
```
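
The GPU check above can also drive the choice of device inside a script. The following is a minimal sketch, not the skill's actual logic: the helper names `cuda_available` and `pick_device` are hypothetical, and pairing `float16` with the GPU and `int8` with the CPU is an assumption chosen for illustration (both are compute types that faster-whisper accepts).

```python
from typing import Tuple


def cuda_available() -> bool:
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch  # only needed for the availability check
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def pick_device(has_cuda: bool) -> Tuple[str, str]:
    """Map GPU availability to a (device, compute_type) pair.

    Assumption for illustration: float16 on GPU, int8 on CPU.
    """
    return ("cuda", "float16") if has_cuda else ("cpu", "int8")


if __name__ == "__main__":
    device, compute_type = pick_device(cuda_available())
    print(f"device={device} compute_type={compute_type}")
```

Keeping the decision in a pure function that takes a boolean makes the CPU fallback easy to exercise on a machine without a GPU.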

### Basic Usage

```bash
# Transcribe an audio file (auto-detects GPU)
python transcribe.py audio.mp3

# Specify language explicitly
python transcribe.py audio.mp3 --language pt

# Output as SRT subtitles
python transcribe.py audio.mp3 --format srt --output subtitles.srt

# Use larger model for better accuracy
python transcribe.py audio.mp3 --model large-v3
```
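
For readers curious what `--format srt` has to produce, here is a small sketch of SRT rendering. It assumes segments arrive as `(start, end, text)` tuples with times in seconds; the function names `srt_timestamp` and `to_srt` are hypothetical and are not taken from `transcribe.py`.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.25, " Hello "), (1.25, 2.0, "World")]))
```

SRT uses a comma before the milliseconds, whereas WebVTT uses a period, so a VTT writer would need a slightly different timestamp helper.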

## 🔧 Advanced Usage

### Command Line Options

```bash
python transcribe.py <audio_file> [options]

Options:
  --model {tiny,base,small,medium,large-v1,large-v2,large-v3}
                        Model size to use (default: base)
  --language LANG       Language code (e.g., 'pt', 'en', 'es'). Auto-detect if not specified.
  --format {txt,srt,json,vtt}
                        Output format (default: txt)
  --output FILE         Output file path (default: stdout)
  --device {cuda,cp...
```

README excerpt

# 🎙️ Faster Whisper GPU - OpenClaw Skill

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![CUDA](https://img.shields.io/badge/CUDA-11.8%2B-green.svg)](https://developer.nvidia.com/cuda-downloads)

> High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration.

## ✨ Why This Skill?

- **🔒 Privacy First**: Your audio never leaves your machine
- **⚡ GPU Accelerated**: 10-20x faster than CPU transcription
- **💰 Zero API Costs**: Unlimited transcriptions, forever free
- **🌍 99 Languages**: Automatic language detection
- **🎯 Perfect for OpenClaw**: Seamless integration with your agent workflows

## 🚀 Quick Start

### 1. Install Dependencies

```bash
pip install faster-whisper torch
```

### 2. Verify GPU Support

```bash
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
```

### 3. Transcribe!

```bash
python transcribe.py audio.mp3
```

## 📖 Usage Examples

### Basic Transcription
```bash
python transcribe.py meeting.mp3
```

### Portuguese Audio to SRT Subtitles
```bash
python transcribe.py podcast.mp3 --language pt --format srt --output podcast.srt
```

### High-Accuracy Mode
```bash
python transcribe.py interview.mp3 --model large-v3 --vad_filter --word_timestamps
```

### Translate to English
```bash
python transcribe.py japanese.mp3 --task translate --format txt
```
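
The flags used in the examples above correspond closely to keyword arguments of faster-whisper's `WhisperModel.transcribe`. The sketch below shows one way such a mapping could look. It is an assumption about how a script like `transcribe.py` might be wired, not its actual source: `build_parser`, `transcribe_kwargs`, and `run` are hypothetical names, only the flags that appear in these examples are covered, and the `--task` choices and default are assumed.

```python
import argparse
from typing import Any, Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser covering only the flags that appear in the examples above."""
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument("audio_file")
    parser.add_argument("--model", default="base")
    parser.add_argument("--language", default=None)
    # Assumed choices and default; the source does not list them.
    parser.add_argument("--task", choices=["transcribe", "translate"], default="transcribe")
    parser.add_argument("--vad_filter", action="store_true")
    parser.add_argument("--word_timestamps", action="store_true")
    return parser


def transcribe_kwargs(args: argparse.Namespace) -> Dict[str, Any]:
    """Translate parsed flags into keyword arguments for WhisperModel.transcribe."""
    return {
        "language": args.language,  # None lets the model auto-detect
        "task": args.task,
        "vad_filter": args.vad_filter,
        "word_timestamps": args.word_timestamps,
    }


def run(argv: Optional[List[str]] = None) -> None:
    """Parse flags, load the model, and print timestamped segments."""
    args = build_parser().parse_args(argv)
    # Imported lazily so the parser can be exercised without the library installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(args.model)  # device and compute type left at library defaults
    segments, info = model.transcribe(args.audio_file, **transcribe_kwargs(args))
    print(f"Detected language: {info.language}")
    for segment in segments:  # a generator: decoding happens as it is iterated
        print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

Calling `run(["japanese.mp3", "--task", "translate"])` would mirror the translation example, provided faster-whisper is installed and the audio file exists.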

## 🛠️ Requirements

### Hardware
- NVIDIA GPU with 4GB+ VRAM (recommended)
- Or CPU-only mode (slower)

### Software
- Python 3.8+
- NVIDIA Drivers
- CUDA Toolkit 11.8+ or 12.x

## 📊 Performance

| Model | VRAM | Speed (RTX 4090) | Accuracy |
|-------|------|------------------|----------|
| tiny | 1 GB | ~32x realtime | Basic |
| base | 1 GB | ~16x realtime | Good |
| small | 2 GB | ~6x realtime | Better |
| medium | 5 GB | ~2x realtime | Great |
| large-v3 | 10...

Related Claw Skills

heyixuan2

bambu-studio-ai

★ 41

Bambu Lab 3D printer control and automation. Activate when user mentions: printer status, 3D printing, slice, analyze model, generate 3D, AMS filament, print monitor, Bambu Lab, or any 3D printing task. Full pipeline: search → generate → analyze → colorize → preview → open BS → user slice → print → monitor. Supports all 9 Bambu Lab printers (A1 Mini, A1, P1S, P2S, X1C, X1E, H2C, H2S, H2D).

capt-marbles

geo-optimization

★ 1

Generative Engine Optimization (GEO) for AI search visibility. Optimize content to appear in ChatGPT, Perplexity, Claude, and Google AI Overviews. Use when optimizing websites, pages, or content for LLM discoverability and citation.

carlulsoe

parakeet-stt

★ 0

Local speech-to-text with NVIDIA Parakeet TDT 0.6B v3 (ONNX on CPU). 30x faster than Whisper, 25 languages, auto-detection, OpenAI-compatible API. Use when transcribing audio files, converting speech to text, or processing voice recordings locally without cloud APIs.

carlzhao007

feishu-process-feedback

★ 0

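Feishu message auto-processing and progress feedback skill. Once installed, it runs in the background, listens for Feishu task messages, and automatically spawns a separate process to handle each one. It sends real-time progress feedback before and after processing (task acknowledgement, progress percentage, completion notice). Supports task type recognition, smart parsing, error retries, concurrency control, and state persistence. Use cases: Feishu automation workflows, task progress tracking, batch task processing, and scenarios that need real-time feedback.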

cartoonitunes

bottyfans

★ 0

BottyFans agent skill for autonomous creator monetization. Lets AI agents register, build a profile, publish posts (public, subscriber-only, or pay-to-unlock), upload media, accept USDC subscriptions and tips on Base, send and receive DMs, track earnings, and appear on the creator leaderboard. Use this skill when an agent needs to monetize content, interact with fans, manage a creator profile, handle payments in USDC, or operate as an autonomous creator on the BottyFans platform.

camopel

arxivkb

★ 0

Local arXiv paper manager with semantic search. Crawls arXiv categories, downloads PDFs, chunks content, and indexes with FAISS + Ollama embeddings. No cloud API keys required — everything runs locally.