Use when designing custom voices with Alibaba Cloud Model Studio CosyVoice customization models, especially cosyvoice-v3.5-plus or cosyvoice-v3.5-flash, from a voice prompt plus preview text before using the returned voice_id in TTS.
Category: provider
# Model Studio CosyVoice Voice Design
Use the CosyVoice voice enrollment API to create designed voices from a natural-language voice description.
## Critical model names
Use `model="voice-enrollment"` and one of these `target_model` values:
- `cosyvoice-v3.5-plus`
- `cosyvoice-v3.5-flash`
- `cosyvoice-v3-plus`
- `cosyvoice-v3-flash`
Recommended default in this repo:
- `target_model="cosyvoice-v3.5-plus"`
## Region and compatibility
- `cosyvoice-v3.5-plus` and `cosyvoice-v3.5-flash` are available only in China mainland deployment mode (Beijing endpoint).
- In international deployment mode (Singapore endpoint), `cosyvoice-v3-plus` and `cosyvoice-v3-flash` do not support voice cloning or voice design.
- The `target_model` used to create a voice must be the same model used later for speech synthesis with that voice.
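The compatibility rules above could be checked before any request is sent. The following is a minimal sketch, not part of any SDK: the mode labels `cn` and `intl` and both function names are hypothetical, and it assumes that all four listed models support voice design in China mainland mode, which the list of `target_model` values implies but does not state outright.

```python
# Voice-design-capable models per deployment mode, as read from the bullets above.
# The mode labels "cn" and "intl" are hypothetical names chosen for this sketch.
DESIGN_MODELS_BY_MODE = {
    "cn": {
        "cosyvoice-v3.5-plus",
        "cosyvoice-v3.5-flash",
        "cosyvoice-v3-plus",   # assumption: design-capable in China mainland mode
        "cosyvoice-v3-flash",  # assumption: design-capable in China mainland mode
    },
    "intl": set(),  # none of the listed models support voice design here
}


def check_target_model(target_model: str, mode: str = "cn") -> None:
    """Raise ValueError if target_model cannot be used for voice design in this mode."""
    if mode not in DESIGN_MODELS_BY_MODE:
        raise ValueError(f"unknown deployment mode: {mode!r}")
    if target_model not in DESIGN_MODELS_BY_MODE[mode]:
        raise ValueError(
            f"{target_model!r} does not support voice design in mode {mode!r}"
        )


def check_synthesis_model(target_model: str, synthesis_model: str) -> None:
    """Raise ValueError if the synthesis model differs from the design-time target_model."""
    if target_model != synthesis_model:
        raise ValueError(
            f"voice was designed for {target_model!r} but synthesis uses {synthesis_model!r}"
        )
```

Calling `check_target_model("cosyvoice-v3.5-plus", "cn")` returns silently, while any model paired with `"intl"` raises `ValueError`.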
## Endpoint
- China mainland (Beijing): `https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization`
- International (Singapore): `https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization`
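One way to keep the two URLs in a single place is a small lookup keyed by deployment mode. This is a sketch only: the mode labels `cn` and `intl` and the function name are hypothetical, and the URLs are copied from the list above.

```python
# Customization endpoints from this section, keyed by hypothetical mode labels.
CUSTOMIZATION_ENDPOINTS = {
    "cn": "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization",
    "intl": "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
}


def customization_endpoint(mode: str = "cn") -> str:
    """Return the customization URL for a deployment mode, or raise ValueError."""
    try:
        return CUSTOMIZATION_ENDPOINTS[mode]
    except KeyError:
        raise ValueError(f"unknown deployment mode: {mode!r}") from None
```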
## Prerequisites
- Set `DASHSCOPE_API_KEY` in your environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
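The lookup order described above (environment variable first, then the credentials file) might be implemented as follows. This sketch assumes the credentials file is INI-style with at least one section header such as `[default]`; the source does not specify the file format, and the function name and parameters are hypothetical.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_api_key(
    env: Optional[Mapping[str, str]] = None,
    credentials_path: Optional[str] = None,
) -> str:
    """Return the API key from DASHSCOPE_API_KEY, else from the credentials file."""
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY")
    if key:
        return key

    # Assumption: the file is INI-style, e.g. "[default]\ndashscope_api_key = sk-...".
    path = (
        Path(credentials_path)
        if credentials_path
        else Path.home() / ".alibabacloud" / "credentials"
    )
    if path.is_file():
        parser = configparser.ConfigParser()
        parser.read(path)
        for section in parser.sections():
            if parser.has_option(section, "dashscope_api_key"):
                return parser.get(section, "dashscope_api_key")

    raise RuntimeError(
        "No API key found: set DASHSCOPE_API_KEY or add dashscope_api_key "
        "to ~/.alibabacloud/credentials"
    )
```

Passing `env` and `credentials_path` explicitly keeps the function testable without touching the real environment or home directory.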
## Normalized interface (cosyvoice.voice_design)
### Request
- `model` (string, optional): fixed to `voice-enrollment`
- `target_model` (string, optional): default `cosyvoice-v3.5-plus`
- `prefix` (string, required): letters/digits only, max 10 chars
- `voice_prompt` (string, required): max 500 chars, Chinese or English only
- `preview_text` (string, required): max 200 chars, Chinese or English
- `language_hints` (array[string], optional): `zh` or `en`; should match the language of `preview_text`
- `sample_rate` (int, optional): e.g. `24000`
- `response_format` (string, optional): e.g. `wav`
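The field constraints above can be enforced client-side before calling the API. The sketch below builds a dict in the normalized shape described here, not the raw wire payload of the enrollment API; the function name is hypothetical, and it assumes "letters/digits" means ASCII letters and digits and that length limits count characters.

```python
import re
from typing import Any, Dict, List, Optional

ALLOWED_LANGUAGE_HINTS = {"zh", "en"}


def build_voice_design_request(
    prefix: str,
    voice_prompt: str,
    preview_text: str,
    target_model: str = "cosyvoice-v3.5-plus",
    language_hints: Optional[List[str]] = None,
    sample_rate: Optional[int] = None,
    response_format: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the normalized fields and return a request dict."""
    # Assumption: "letters/digits only" means ASCII letters and digits.
    if not re.fullmatch(r"[A-Za-z0-9]{1,10}", prefix):
        raise ValueError("prefix must be 1-10 letters or digits")
    if not voice_prompt or len(voice_prompt) > 500:
        raise ValueError("voice_prompt is required and limited to 500 characters")
    if not preview_text or len(preview_text) > 200:
        raise ValueError("preview_text is required and limited to 200 characters")
    if language_hints is not None:
        unknown = [h for h in language_hints if h not in ALLOWED_LANGUAGE_HINTS]
        if unknown:
            raise ValueError(f"unsupported language_hints: {unknown}")

    request: Dict[str, Any] = {
        "model": "voice-enrollment",  # fixed value per the interface above
        "target_model": target_model,
        "prefix": prefix,
        "voice_prompt": voice_prompt,
        "preview_text": preview_text,
    }
    # Optional fields are included only when the caller sets them.
    if language_hints is not None:
        request["language_hints"] = list(language_hints)
    if sample_rate is not None:
        request["sample_rate"] = sample_rate
    if response_format is not None:
        request["response_format"] = response_format
    return request
```

For example, `build_voice_design_request("narrator1", "A calm, mid-paced adult voice for audiobooks.", "Welcome back to the story.", language_hints=["en"])` returns a dict with `model` set to `voice-enrollment` and `target_model` set to the default.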
### Response
- `voice_id` (string)
- `request_id` (string)
- `status` (string, optional)
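A typed wrapper can make the response fields explicit for callers. This sketch works on a dict already in the normalized shape listed above; it does not parse the raw API JSON, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class VoiceDesignResult:
    voice_id: str
    request_id: str
    status: Optional[str] = None


def parse_voice_design_response(data: Mapping[str, Any]) -> VoiceDesignResult:
    """Build a VoiceDesignResult, raising ValueError if a required field is missing."""
    missing = [name for name in ("voice_id", "request_id") if not data.get(name)]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    return VoiceDesignResult(
        voice_id=str(data["voice_id"]),
        request_id=str(data["request_id"]),
        status=data.get("status"),
    )
```

The returned `voice_id` is the value to pass to speech synthesis later, using the same model that was given as `target_model`.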
## Operational guidance
- Keep `voice_prompt` concrete: timbre, age range, pace, emotion, articulation, and scenario.
- If `language_hints` is used, it should match the language of `preview_text`.
- Designed v...