
grpo

maintained by atrawog

Stars 0, Forks 0, MIT License


Group Relative Policy Optimization for reinforcement learning from human feedback. Covers GRPOTrainer, reward function design, policy optimization, and KL divergence constraints for stable RLHF training. Includes thinking-aware reward patterns.
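As a rough illustration of the two ideas named above, the sketch below shows group-relative advantage normalization and a thinking-aware reward. It is not taken from this skill or from any library's API: the function names, the `<think>...</think>` tag convention, the reward values (1.0, 0.25, 0.0), and the use of the population standard deviation are all assumptions made for the example.

```python
import math
import re


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Assumption: population standard deviation; implementations may differ.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    # eps keeps the division defined when every reward in the group is equal.
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical format: reasoning inside <think> tags, followed by the answer.
THINK_PATTERN = re.compile(r"<think>(.+?)</think>\s*(.+)", re.DOTALL)


def thinking_aware_reward(completion, expected_answer):
    """Reward the expected format, with full credit only for a correct answer."""
    match = THINK_PATTERN.fullmatch(completion.strip())
    if match is None:
        return 0.0  # no reasoning block, or no answer after it
    answer = match.group(2).strip()
    return 1.0 if answer == expected_answer else 0.25


completions = [
    "<think>2 + 2 is 4</think> 4",
    "<think>maybe 5</think> 5",
    "4",
    "<think>two pairs</think> 4",
]
rewards = [thinking_aware_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
```

Because advantages are centered on the group mean, a completion is only pushed up or down relative to its siblings for the same prompt, which is why no separate value model is needed in this style of training.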

Key Features

Comprehensive skill evaluation and performance tracking
Community-driven ratings and reviews
Easy integration with Claude Code
Regular updates and maintenance

Quick Start

TopRank Skills install atrawog/grpo


Skill Details

GitHub Stars 0

GitHub Forks 0

Created Jan 2026

Last Updated 5 months ago

Tags: tools, llm, ai

