Overview
- Skill Key: choprahetarth/tandemn-tuna
- Author: choprahetarth
- Source Repo: openclaw/skills
- Version: 0.0.1
- Source Path: skills/choprahetarth/tandemn-tuna
- Latest Commit SHA: 1dbfdb80ef72bc7b3d587dcbe992adf80222fbf1
Deploy and serve LLM models on GPU. Compare GPU pricing. Launch vLLM on Modal, RunPod, Cerebrium, Cloud Run, Baseten, or Azure with spot instance fallback. OpenAI-compatible inference endpoint.
- Stars: 0
- Installs: 0
- Status: ACTIVE
- Visibility: PUBLIC
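Copy the prompt below and send it to your AI assistant to complete the installation.
First check whether the SkillHub store is installed. If it is not, install it by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the tandemn-tuna skill. If it is already installed, install the tandemn-tuna skill directly.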
# Tuna — Deploy and Serve LLM Models on GPU Infrastructure
Tuna is a hybrid GPU inference orchestrator. It lets you deploy, serve, and manage LLM models (Llama, Qwen, Mistral, DeepSeek, Gemma, and any HuggingFace model) on serverless GPUs from **Modal, RunPod, Cerebrium, Google Cloud Run, Baseten, or Azure Container Apps**, with optional **spot instance fallback on AWS** via SkyPilot. Every deployment gets an **OpenAI-compatible `/v1/chat/completions` endpoint**.
The key idea: serverless GPUs handle requests immediately (fast cold start, pay-per-second) while a cheaper spot GPU boots in the background. Once spot is ready, traffic shifts there. If spot gets preempted, traffic falls back to serverless automatically. This gives you **3–5x cost savings** over pure serverless with zero downtime.
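The routing behavior described above can be sketched as a tiny selector. This is an illustration of the idea only, not Tuna's actual router: the names `Backend`, `pick_backend`, and `spot_ready` are hypothetical, and a real router would presumably derive readiness from health checks rather than a hard-coded timeline.

```python
from enum import Enum


class Backend(Enum):
    """The two kinds of capacity described in the text (hypothetical names)."""
    SERVERLESS = "serverless"
    SPOT = "spot"


def pick_backend(spot_ready: bool) -> Backend:
    """Prefer the cheaper spot GPU when it is up, otherwise use serverless."""
    return Backend.SPOT if spot_ready else Backend.SERVERLESS


# Simulated readiness over time: spot booting, ready, ready, preempted, ready again.
timeline = [False, True, True, False, True]
routes = [pick_backend(ready).value for ready in timeline]
print(routes)
```

Because the serverless backend is always the fallback, a request has somewhere to go at every step of the timeline, which is what the zero-downtime claim relies on.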
## Quick Start — Deploy a Model in 3 Commands
```bash
# 1. Install tuna
uv pip install tandemn-tuna
# 2. Deploy a model (auto-picks cheapest serverless provider for the GPU)
tuna deploy --model Qwen/Qwen3-0.6B --gpu L4 --service-name my-llm
# 3. Query your endpoint (shown in deploy output)
curl http://<router-ip>:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Hello!"}]}'
```
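The same query can be issued from Python using only the standard library. The sketch below mirrors the `curl` call above; `ROUTER_URL` is a placeholder for the router address printed by `tuna deploy`, and the helper names (`build_chat_request`, `extract_reply`, `ask`) are hypothetical. The response parsing assumes the standard OpenAI chat completions shape (`choices[0].message.content`).

```python
import json
import urllib.request

# Placeholder: replace with the router address shown in the deploy output.
ROUTER_URL = "http://localhost:8080"


def build_chat_request(base_url, model, prompt):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body):
    """Pull the assistant text out of an OpenAI-style response body."""
    return body["choices"][0]["message"]["content"]


def ask(base_url, model, prompt, timeout=60):
    """Send the request and return the reply text (performs a network call)."""
    req = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

Once a deployment is running, `ask(ROUTER_URL, "Qwen/Qwen3-0.6B", "Hello!")` would return the model's reply as a string.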
For serverless-only (no spot, no AWS needed):
```bash
tuna deploy --model Qwen/Qwen3-0.6B --gpu L4 --serverless-only
```
## All Commands
### `tuna deploy` — Launch a model on GPU
Deploy a model across serverless + spot infrastructure. This is the main command.
```bash
tuna deploy --model <HuggingFace-model-ID> --gpu <GPU> [options]
```
**Required arguments:**
- `--model` — HuggingFace model ID (e.g., `Qwen/Qwen3-0.6B`, `meta-llama/Llama-3-70b`)
- `--gpu` — GPU type (e.g., `T4`, `L4`, `L40S`, `A100`, `H100`, `B200`)
**Common options:**
- `--service-name` — Name for the deployment (auto-generated if omitted)
- `--serverless-provider` — Force a specific provider: `modal`, `runpod`, `cloudrun`, `baseten`...
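When scripting deployments, the flags documented above can be assembled programmatically. The following is a minimal sketch, not part of Tuna itself: `build_deploy_command` is a hypothetical helper, it uses only flags that appear in this document, and it does not check which flag combinations the CLI actually accepts.

```python
import shlex


def build_deploy_command(model, gpu, service_name=None,
                         serverless_provider=None, serverless_only=False):
    """Assemble a `tuna deploy` argument list from the documented flags."""
    argv = ["tuna", "deploy", "--model", model, "--gpu", gpu]
    if service_name:
        argv += ["--service-name", service_name]
    if serverless_provider:
        # Assumption: the value is one of the provider names listed above.
        argv += ["--serverless-provider", serverless_provider]
    if serverless_only:
        argv.append("--serverless-only")
    return argv


cmd = build_deploy_command("Qwen/Qwen3-0.6B", "L4", service_name="my-llm")
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with model IDs or service names.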