tandemn-tuna

Deploy and serve LLM models on GPU. Compare GPU pricing. Launch vLLM on Modal, RunPod, Cerebrium, Cloud Run, Baseten, or Azure with spot instance fallback. OpenAI-compatible inference endpoint.

Status: ACTIVE

Visibility: PUBLIC

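Installation

Copy the prompt below and send it to your AI assistant to complete the installation.

First check whether the SkillHub store is installed. If it is not, install it by following https://skillhub-1388575217.cos.ap-guangzhou.myqcloud.com/install/skillhub.md, then install the tandemn-tuna skill. If it is already installed, install the tandemn-tuna skill directly.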

Overview

Skill Key: choprahetarth/tandemn-tuna
Author: choprahetarth
Source Repo: openclaw/skills
Version: 0.0.1
Source Path: skills/choprahetarth/tandemn-tuna
Latest Commit SHA: 1dbfdb80ef72bc7b3d587dcbe992adf80222fbf1

Extracted Content

SKILL.md excerpt

# Tuna — Deploy and Serve LLM Models on GPU Infrastructure

Tuna is a hybrid GPU inference orchestrator. It lets you deploy, serve, and manage LLM models (Llama, Qwen, Mistral, DeepSeek, Gemma, and any HuggingFace model) on serverless GPUs from **Modal, RunPod, Cerebrium, Google Cloud Run, Baseten, or Azure Container Apps**, with optional **spot instance fallback on AWS** via SkyPilot. Every deployment gets an **OpenAI-compatible `/v1/chat/completions` endpoint**.

The key idea: serverless GPUs handle requests immediately (fast cold start, pay-per-second) while a cheaper spot GPU boots in the background. Once spot is ready, traffic shifts there. If spot gets preempted, traffic falls back to serverless automatically. This gives you **3–5x cost savings** over pure serverless with zero downtime.
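
The routing rule described above can be sketched as a small shell function. This is only an illustration of the idea, not Tuna's actual router: the function name `pick_backend` and the state labels `booting`, `ready`, and `preempted` are assumptions made for this example.

```bash
# Illustrative only: the function and state names are hypothetical.
pick_backend() {
  # $1 is the spot replica state: "booting", "ready", or "preempted".
  case "$1" in
    ready) echo "spot" ;;        # spot is up: send traffic to the cheaper GPU
    *)     echo "serverless" ;;  # booting, preempted, or unknown: stay on serverless
  esac
}

# Walk through one possible lifecycle of the spot replica.
for state in booting ready preempted ready; do
  printf '%-10s -> %s\n' "$state" "$(pick_backend "$state")"
done
```

Defaulting every non-ready state to serverless means a request always has a backend to go to, which is how the no-downtime behavior described above would arise.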

## Quick Start — Deploy a Model in 3 Commands

```bash
# 1. Install tuna
uv pip install tandemn-tuna

# 2. Deploy a model (auto-picks cheapest serverless provider for the GPU)
tuna deploy --model Qwen/Qwen3-0.6B --gpu L4 --service-name my-llm

# 3. Query your endpoint (shown in deploy output)
curl http://<router-ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Hello!"}]}'
```
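
If you query the endpoint repeatedly, it may help to wrap step 3 in a couple of shell functions. The sketch below is a convenience wrapper and not part of Tuna: the names `json_escape`, `chat_payload`, and `ask` are hypothetical, and the escaping covers only backslashes and double quotes, so it is not a full JSON encoder.

```bash
# Hypothetical helpers, not part of the tuna CLI.

# Escape backslashes and double quotes so a string can sit inside a JSON value.
# Newlines and control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# chat_payload <model> <prompt>: print a single-message chat completions body.
chat_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# ask <base-url> <model> <prompt>: POST the payload to the endpoint.
ask() {
  curl -sS "$1/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$2" "$3")"
}

# Example (replace <router-ip> with the address from the deploy output):
#   ask "http://<router-ip>:8080" "Qwen/Qwen3-0.6B" "Hello!"
```

In an OpenAI-compatible response, the generated text is normally found under `choices[0].message.content`.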

For serverless-only (no spot, no AWS needed):

```bash
tuna deploy --model Qwen/Qwen3-0.6B --gpu L4 --serverless-only
```

## All Commands

### `tuna deploy` — Launch a model on GPU

Deploy a model across serverless + spot infrastructure. This is the main command.

```bash
tuna deploy --model <HuggingFace-model-ID> --gpu <GPU> [options]
```

**Required arguments:**
- `--model` — Hugging Face model ID (e.g., `Qwen/Qwen3-0.6B`, `meta-llama/Meta-Llama-3-70B`)
- `--gpu` — GPU type (e.g., `T4`, `L4`, `L40S`, `A100`, `H100`, `B200`)

**Common options:**
- `--service-name` — Name for the deployment (auto-generated if omitted)
- `--serverless-provider` — Force a specific provider: `modal`, `runpod`, `cloudrun`, `baseten`...
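
When scripting deployments, it can be useful to assemble the command from variables and inspect it before running it. The sketch below uses only the flags listed above; the function `build_deploy_cmd` is a hypothetical helper, and it prints the command rather than executing it.

```bash
# Hypothetical dry-run helper: prints a `tuna deploy` command line.
# Usage: build_deploy_cmd <model> <gpu> [service-name] [serverless-provider]
build_deploy_cmd() {
  model="$1"
  gpu="$2"
  name="${3:-}"
  provider="${4:-}"

  cmd="tuna deploy --model ${model} --gpu ${gpu}"
  if [ -n "$name" ]; then
    cmd="${cmd} --service-name ${name}"
  fi
  if [ -n "$provider" ]; then
    cmd="${cmd} --serverless-provider ${provider}"
  fi
  printf '%s\n' "$cmd"
}

# Print the command for review; run it yourself once it looks right.
build_deploy_cmd "Qwen/Qwen3-0.6B" "L4" "my-llm" "modal"
```

Printing first keeps a typo in a model ID or GPU name from starting a billable deployment.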


Analysis Signals

Dependencies

gh pip uv go tandemn-tuna modal google-cloud-run truss azure-mgmt-appcontainers cerebrium

External Services

openai x