$ initializing eval pipeline...
$ loading 8 production scenarios
$ scoring rubric: 5 dimensions × 5-point scale
$ judge model: calibrated LLM panel
$ mode: open-ended structured response evaluation
Ready. Awaiting candidate input. ▊
The AI PM Eval
This isn't a quiz. There are no multiple-choice answers to guess.
8 production scenarios. Each one breaks down into focused sub-questions that scaffold your thinking — exactly the way a senior AI PM would structure a response. Then an LLM judge panel scores you across 5 dimensions, the same way you'd evaluate an AI agent's output.
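Under the hood, the judging loop is simple. Here's a minimal Python sketch of what a judge panel like this might look like: the dimension wording paraphrases the rubric below, and the prompt format and `judge` callable interface are illustrative assumptions, not the eval's actual implementation.

```python
import statistics

# Placeholder dimension descriptions, paraphrased from the rubric below.
# The eval's actual labels and prompts may differ.
DIMENSIONS = [
    "sees the whole picture, not just the immediate problem",
    "understands the underlying architecture",
    "acknowledges what it gives up",
    "is concrete enough for an engineer to build from",
    "anticipates what goes wrong",
]

def judge_response(response: str, judge) -> dict[str, int]:
    """Ask one judge model to score a response 1-5 on each dimension."""
    scores = {}
    for dim in DIMENSIONS:
        prompt = (
            f"Score the following PM response from 1 to 5 on whether it {dim}. "
            f"Reply with a single integer.\n\nResponse:\n{response}"
        )
        # `judge` is any callable mapping a prompt string to a text reply.
        scores[dim] = int(judge(prompt))
    return scores

def panel_score(response: str, judges) -> dict[str, float]:
    """Average each dimension's score across the judge panel."""
    per_judge = [judge_response(response, j) for j in judges]
    return {dim: statistics.mean(s[dim] for s in per_judge) for dim in DIMENSIONS}

if __name__ == "__main__":
    # Stub judges so the sketch runs without any API key.
    stub_judges = [lambda prompt: "3", lambda prompt: "4"]
    print(panel_score("Ship the guardrail layer first, then widen the rollout.", stub_judges))
```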
Scoring Rubric — 5 Dimensions × 5-Point Scale
1. Do you see the whole picture or just the immediate problem?
2. Do you understand the underlying architecture?
3. Do you acknowledge what you're giving up?
4. Could an engineer build from this?
5. Do you think about what goes wrong?
Each response is scored 1-5 on every dimension: 8 scenarios × 5 dimensions = 40 individual scores. The average is calibrated to land around 2.8-3.2 for a typical PM.
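To make that arithmetic concrete, here's a short sketch of how the 40 raw scores roll up. The score matrix is made-up example data, not real eval output.

```python
import statistics

# Example score matrix: 8 scenarios × 5 dimensions = 40 individual scores.
# Illustrative numbers only; a real run would come from the judge panel.
scores = [
    [3, 2, 4, 3, 2],
    [3, 3, 3, 2, 3],
    [4, 3, 3, 3, 2],
    [2, 3, 3, 3, 3],
    [3, 4, 2, 3, 3],
    [3, 3, 3, 4, 2],
    [2, 3, 4, 3, 3],
    [3, 3, 3, 3, 4],
]
assert len(scores) == 8 and all(len(row) == 5 for row in scores)

flat = [s for row in scores for s in row]            # all 40 scores
overall = statistics.mean(flat)                      # the headline number
by_dimension = [statistics.mean(col) for col in zip(*scores)]

print(f"overall: {overall:.2f}")                     # 2.95, inside the 2.8-3.2 band
print("per-dimension:", [f"{d:.2f}" for d in by_dimension])
```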