GEPA Prompt Evolution (GEPA-SPO)
Genetic-Pareto prompt optimizer to evolve system prompts from a few rollouts. GEPA performs natural-language reflection over full trajectories, mutates prompts with multiple strategies, and maintains a Pareto frontier rather than collapsing to a single candidate.This is NOT the official GEPA implementation.
🚀 Quick Start
# Set your OpenAI API key
export OPENAI_API_KEY="your-api-key-here"
# Run optimization with detailed logging
npx gepa-spo \
--input ./examples/input.prompts.json \
--config ./examples/config.json \
--log
What you get:
- ✅ CLI Tool: Optimize prompts from JSON inputs with detailed statistics
- ✅ Modular Systems: Support for multi-component prompts with intelligent crossover
- ✅ Core API: TypeScript library for custom integrations
- ✅ Persistence: Resume interrupted runs, export best prompts
- ✅ Strategy Bandit: Adaptive strategy selection via UCB1
- ✅ Enhanced Logging: Comprehensive performance tracking and percentage improvements
📖 Documentation
📚 Complete Documentation - Comprehensive guides and references
🚀 Getting Started
- Quick Start Guide - Get up and running in minutes
- Basic Concepts - Understanding GEPA fundamentals
📖 User Guides
- CLI Reference - Complete command-line interface documentation
- Modular Systems - Multi-component prompt optimization
- Configuration Guide - All configuration options and settings
🔧 Developer Guides
- API Reference - Programmatic API documentation
- TypeScript Types - Complete type definitions
🔬 Technical Documentation
- GEPA Algorithm - Detailed algorithm explanation
- Research Background - Academic background and methodology
🎯 Key Features
📊 Enhanced Logging & Statistics
When --log
is enabled, GEPA provides comprehensive performance tracking:
📊 PERFORMANCE STATISTICS
├─ Initial Score: 0.523
├─ Final Score: 0.789
├─ Absolute Improvement: 0.266
├─ Percentage Improvement: 50.9%
├─ Iterations Completed: 15
├─ Candidates Generated: 18
├─ Candidates Accepted: 12 (66.7%)
├─ Crossover Operations: 4 (22.2%)
├─ Mutation Operations: 8
├─ Strategy Switches: 2
├─ Budget Used: 85/100 (85.0%)
├─ Data Split: Pareto=5, Feedback=10, Holdout=2
└─ Efficiency: 0.0093 score per budget unit
Single-System Optimization
{
"system": "You are a helpful assistant. Be concise.",
"prompts": [
{ "id": "p1", "user": "What are the benefits of unit testing?" }
]
}
Modular System Optimization
{
"modules": [
{ "id": "personality", "prompt": "You are friendly and helpful." },
{ "id": "safety", "prompt": "Never provide harmful content." }
],
"prompts": [
{ "id": "p1", "user": "What are the benefits of unit testing?" }
]
}
Advanced Features
- Round-robin mutation for modular systems
- Intelligent crossover combining complementary modules
- Trace-aware reflection with execution context
- Holdout gating to prevent overfitting
- Strategy bandit for adaptive optimization
- Detailed performance tracking with percentage improvements
🔧 Installation
Quick Start (Recommended)
# No installation needed - runs via npx
npx gepa-spo --help
Local Development
# Clone and install
git clone https://github.com/BeMoreDifferent/GEPA-Prompt-Evolution.git
cd GEPA-Prompt-Evolution
pnpm install
pnpm build
# Run locally
node dist/cli.js --help
📋 Requirements
- Node.js >= 18
- OpenAI API Key (or compatible endpoint)
- pnpm (recommended) or npm
🛠️ Usage Examples
Basic Optimization with Logging
npx gepa-spo \
--input ./examples/input.prompts.json \
--config ./examples/config.json \
--log
Modular System with Debug Logging
npx gepa-spo \
--input ./examples/input.modules.json \
--config ./examples/config.modular.json \
--log \
--log-level debug
Resume Interrupted Run
npx gepa-spo --resume ./runs/2024-01-15T10-30-45Z-demo-abc123
Save Best Prompt
npx gepa-spo \
--input ./input.json \
--config ./config.json \
--out ./best-prompt.txt \
--log
⚙️ Configuration
Basic Config
{
"actorModel": "gpt-5-mini",
"judgeModel": "gpt-5-mini",
"budget": 100,
"minibatchSize": 4,
"paretoSize": 8,
"crossoverProb": 0.2,
"rubric": "Correctness, clarity, and conciseness."
}
Key Settings
budget
: Total LLM calls for optimization (50-200 recommended)minibatchSize
: Items evaluated per iteration (2-6)paretoSize
: Items for multi-objective tracking (2-12)crossoverProb
: Probability of crossover vs mutation [0,1]rubric
: Evaluation criteria for optimization
See Configuration Guide for complete options.
🔌 Programmatic API
import { runGEPA_System } from 'gepa-spo/dist/gepa.js';
import { makeOpenAIClients } from 'gepa-spo/dist/llm_openai.js';
const { actorLLM } = makeOpenAIClients({
apiKey: process.env.OPENAI_API_KEY!,
actorModel: 'gpt-5-mini'
});
const best = await runGEPA_System(seed, taskItems, {
execute: async ({ candidate, item }) => ({
output: await actorLLM.complete(`${candidate.system}\n\nUser: ${item.user}`)
}),
mu: () => 0,
muf: async ({ item, output }) => ({ score: 0.5, feedbackText: 'neutral' }),
llm: actorLLM,
budget: 50,
minibatchSize: 3,
paretoSize: 4
});
console.log(best.system);
🧪 Testing
# Run all tests
pnpm test
# Type checking
pnpm typecheck
# Build
pnpm build
# End-to-end smoke test
pnpm build && node dist/cli.js \
--input ./examples/input.min.prompts.json \
--config ./examples/config.min.json \
--log
📁 Project Structure
GEPA-Prompt-Evolution/
├── src/ # Core TypeScript source
├── tests/ # Test suite
├── examples/ # Example configs and inputs
├── strategies/ # Strategy hints for optimization
├── docs/ # 📚 Comprehensive documentation
│ ├── getting-started/ # New user guides
│ ├── user-guides/ # User documentation
│ ├── developer-guides/ # Developer documentation
│ ├── technical/ # Technical documentation
│ └── reference/ # Reference materials
├── CLI_DOCUMENTATION.md # Legacy CLI reference
├── MODULE_DOCUMENTATION.md # Legacy module guide
└── CONTRIBUTING.md # Contribution guidelines
🤝 Contributing
We welcome contributions! Please see Contributing Guide for details.
Quick start for contributors:
- Fork the repository
- Create a feature branch
- Make your changes
- Run
pnpm test
andpnpm typecheck
- Submit a pull request
📄 License
This project is licensed under the MIT License - see LICENSE for details.
🔬 Research
GEPA (Genetic-Pareto) is a prompt optimization method that:
- Uses natural-language reflection on full system trajectories
- Maintains a Pareto frontier of high-performing candidates
- Achieves sample-efficient adaptation with up to 35× fewer rollouts
- Outperforms GRPO by ~10% on average and MIPROv2 by >10%
For detailed technical information, see the AI instructions and Technical Documentation.
🆘 Support
- Documentation: Complete Documentation
- Issues: GitHub Issues
- Examples: Check the
examples/
directory for working configurations - FAQ: Frequently Asked Questions
Made with ❤️ for the AI community