🎉 Exciting news! We've launched Nakxi, a powerful design tool for creating designs, app screenshots and mockups. Check it out today! 🚀
Code & Dev Compatible: claude

LLM Eval Suite Builder (Golden Sets)

Complexity Level: Expert
Prompt Code Block

You are an ML engineer building an evaluation harness for [AI_FEATURE] (e.g., support bot, codegen, summarizer). Create: 1) Eval taxonomy: capability buckets (accuracy, safety, tone, latency, tool-use) with weights summing to 100% 2) 30 golden test cases in JSONL: {id, input, context, expected, rubric, tags, difficulty} 3) Scorers: rule-based checks + LLM-as-judge prompt (include calibration examples for 1/3/5 scores) 4) Regression policy: when to block release (thresholds per bucket) 5) CI integration sketch (GitHub Actions job stages, artifact uploads) 6) Human review protocol: 10% sample weekly, disagreement resolution Domain rules: [DOMAIN_RULES]. Forbidden outputs: [FORBIDDEN]. Brand voice: [VOICE].

🌟 Example Output / Preview

### Generated Component Preview: ```typescript // Fully validated modern structure import { z } from 'zod'; export const RequestSchema = z.object({ id: z.string().uuid(), createdAt: z.date().default(() => new Date()), data: z.record(z.string(), z.any()) }); export type ValidatedRequest = z.infer<typeof RequestSchema>; ```

Prompt Metadata

DifficultyExpert
Compatibilityclaude

Primary Use Cases:

  • Legacy code modernization & technical refactoring
  • Full-stack layout generation & component structuring
  • CI/CD workflow automation & unit/E2E testing suites

Associated Tags:

#evals #testing #llm #quality

💡 Pro Tips & Advice

1. Use bracketed items: Be sure to fill out all [PLACEHOLDER] elements with specific details before sending the prompt to the AI model.

2. Adjust temperature: For creative tasks, set AI temperature higher (e.g., 0.8), or lower (e.g., 0.2) for strict coding/technical tasks.

🔗 Related AI Prompts

Code & Dev
★ Featured 🔥 Trending

Refactor legacy JavaScript to modern

Act as a Senior Frontend Engineer. Refactor the following legacy JavaScript code to modern ES2024 standards. Use const/let, arrow ...

Compatible:claudeDiff:Expert
#code-dev #refactor #legacy
Code & Dev
★ Featured

Generate Tailwind component

Create a responsive, accessible React component using Tailwind CSS for a [UI element, e.g., Pricing Table with 3 tiers]. Include h...

Compatible:claudeDiff:Beginner
#code-dev #generate #tailwind
Code & Dev
★ Featured

Playwright E2E test suite

Write a Playwright end-to-end test suite in TypeScript for a standard user login flow. Include tests for: successful login, invali...

Compatible:claudeDiff:Intermediate
#code-dev #playwright #e2e