PMtheBuilder
2/1/2026 · 8 min read

Model Selection Isn't a Technical Decision


Why Model Selection Is a PM Concern

Let me show you why this matters:

Cost Structure

Different models have vastly different costs:

Model                     Input (per 1M tokens)   Output (per 1M tokens)
GPT-4 Turbo               $10                     $30
Claude 3.5 Sonnet         $3                      $15
GPT-3.5 Turbo             $0.50                   $1.50
Llama 3.1 (self-hosted)   ~$1-2                   ~$1-2

At scale, these differences are millions of dollars annually. That's a product economics question, not a technical one.
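To make "millions of dollars annually" concrete, here is a back-of-envelope cost model using the per-1M-token prices from the table above. The request volume and token counts are hypothetical assumptions; plug in your own traffic profile.

```python
# Rough monthly spend per model. Prices come from the table above;
# the 10M requests/month and per-request token counts are assumptions.

PRICES = {  # (input $, output $) per 1M tokens
    "gpt-4-turbo": (10.00, 30.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gpt-3.5-turbo": (0.50, 1.50),
}

def monthly_cost(model, requests_per_month, in_tokens, out_tokens):
    """Estimated monthly spend for a given request profile."""
    in_price, out_price = PRICES[model]
    per_request = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return per_request * requests_per_month

# Example profile: 10M requests/month, 1,000 input + 300 output tokens each.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 10_000_000, 1_000, 300):,.0f}/month")
```

At this assumed volume, GPT-4 Turbo comes out around $190k/month versus roughly $9.5k/month for GPT-3.5 Turbo, a gap of over $2M per year, which is exactly the scale of difference the table implies.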

Quality Tradeoffs

Models excel at different things:

  • Claude: Following complex instructions, nuanced analysis, safety
  • GPT-4o: Multi-modal, general reasoning, code
  • Gemini 1.5: Long context, video understanding
  • Llama: Self-hosting, customization, cost

Which tradeoffs matter depends on your use case. That's a product question.

Latency

Model latency affects user experience:

  • Fast models (GPT-4o, Claude Sonnet): 500ms-1s typical
  • Slower models (GPT-4, Claude Opus): 2-5s typical
  • Self-hosted: Variable based on infrastructure

For real-time features, latency determines UX. For batch processing, it doesn't matter. Product decides.

Privacy

Data handling varies by provider:

  • Do they train on your data?
  • Where is data processed geographically?
  • What's their retention policy?
  • Can you get a HIPAA BAA?

For healthcare, finance, or privacy-sensitive products, this determines which models are even legal to use.

Lock-in

Switching models has costs:

  • Prompts may not transfer cleanly
  • Output formats differ
  • Fine-tuning is model-specific
  • Integration points vary

The choice of model creates dependencies. PM should understand these.


The PM's Model Selection Framework

Here's how I approach model selection:

Step 1: Define Requirements (PM-Led)

Before any model comparison, define what you need:

Quality requirements:

  • What task is the model doing?
  • What does "good enough" look like?
  • What's unacceptable?

Performance requirements:

  • Target latency (p50, p95)
  • Throughput needs
  • Availability requirements

Cost constraints:

  • Budget per request
  • Monthly budget cap
  • Cost at projected scale

Privacy/compliance requirements:

  • Data sensitivity level
  • Regulatory requirements
  • Acceptable data processing locations

Strategic requirements:

  • How important is avoiding vendor lock-in?
  • Do we need fine-tuning capability?
  • Do we need multi-modal?

Write these down BEFORE comparing models. Otherwise you'll optimize for the wrong things.
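One way to "write these down" is as a structured record that engineering can screen candidate models against mechanically. This is a sketch; the field names and every threshold value below are illustrative placeholders, not recommendations.

```python
# A requirements doc as a structured record. All values here are
# hypothetical examples -- substitute your product's actual bars.
from dataclasses import dataclass

@dataclass
class ModelRequirements:
    task: str
    min_quality_score: float      # eval-set pass rate, 0..1
    max_p95_latency_ms: int
    max_cost_per_request: float   # USD
    needs_hipaa_baa: bool
    needs_fine_tuning: bool
    needs_multimodal: bool

reqs = ModelRequirements(
    task="summarize support tickets",
    min_quality_score=0.85,
    max_p95_latency_ms=2000,
    max_cost_per_request=0.03,
    needs_hipaa_baa=False,
    needs_fine_tuning=False,
    needs_multimodal=False,
)
```

The point is less the code than the discipline: every field forces a decision before you start comparing models.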

Step 2: Shortlist Models (Joint PM/Eng)

Based on requirements, identify candidate models.

The usual suspects:

  • OpenAI: GPT-4o, GPT-4 Turbo, GPT-3.5
  • Anthropic: Claude 3 Opus, Claude 3.5 Sonnet
  • Google: Gemini 1.5 Pro, Gemini 1.5 Flash
  • Meta: Llama 3.1 (various sizes)
  • Mistral: Mistral Large, Mistral Small, Mixtral

Don't limit the shortlist to one provider. Compare.

Step 3: Run Comparative Evals (Eng-Led, PM-Designed)

Here's where engineering does the work, but PM designs the test:

Create an eval set:

  • 50-100 real examples from your use case
  • Cover edge cases and adversarial inputs
  • Define clear scoring criteria

Run each candidate:

  • Same prompts, same examples
  • Measure quality scores
  • Measure latency
  • Calculate cost

Compare results:

Model     Quality Score   p50 Latency   Cost/Request
Model A   87%             800ms         $0.02
Model B   91%             1200ms        $0.05
Model C   85%             600ms         $0.01

Now you have data.
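The eval loop above can be sketched in a few lines. This is a minimal harness under assumptions: `call_model` stands in for your provider client, `score` for your grading function, and `cost_per_request` for a precomputed cost lookup.

```python
# Minimal comparative-eval harness: same examples, same scoring,
# every candidate model. `call_model` and `score` are placeholders
# for your provider client and your grading function.
import statistics
import time

def run_eval(models, examples, call_model, score, cost_per_request):
    results = {}
    for model in models:
        scores, latencies = [], []
        for ex in examples:
            start = time.perf_counter()
            output = call_model(model, ex["prompt"])
            latencies.append((time.perf_counter() - start) * 1000)
            scores.append(score(output, ex["expected"]))
        results[model] = {
            "quality": statistics.mean(scores),
            "p50_latency_ms": statistics.median(latencies),
            "cost_per_request": cost_per_request[model],
        }
    return results
```

Run it once per candidate with the identical eval set and you get the comparison table directly, instead of assembling it by hand.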

Step 4: Make the Decision (PM-Led)

With data in hand, PM makes the call:

If quality differences are small: Optimize for cost or latency

If quality differences are large: Pay for quality if business case supports it

If privacy constraints exist: Eliminate non-compliant options

If lock-in matters: Favor standards-based or open-source options

Document the decision and the reasoning. You'll revisit this.


The Model Selection Checklist

Use this for any model selection decision:

Requirements Clarity:

  • Quality bar defined with specific criteria
  • Latency requirements specified
  • Cost budget established
  • Privacy/compliance needs documented
  • Strategic considerations (lock-in, customization) identified

Evaluation Rigor:

  • Multiple models compared
  • Real use case examples in eval set
  • Quality scoring methodology defined
  • Latency measured under realistic conditions
  • Cost calculated at projected scale

Decision Quality:

  • Data-driven comparison completed
  • Tradeoffs explicitly acknowledged
  • Decision documented with reasoning
  • Fallback/migration plan considered
  • Review trigger defined (when to reconsider)

Multi-Model Strategies

Here's where it gets interesting: you don't have to pick one.

Model Routing: Use different models for different request types.

  • Simple queries → cheap/fast model (GPT-3.5, Haiku)
  • Complex queries → premium model (GPT-4, Opus)

Route based on query complexity. Reduce cost without sacrificing quality where it matters.
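A router can start very simple. This sketch uses a deliberately naive complexity heuristic (query length plus keyword markers); production routers often use a small classifier model instead, and the marker list here is purely illustrative.

```python
# Complexity-based router sketch: cheap model for simple queries,
# premium model for complex ones. The heuristic is a naive stand-in.

CHEAP_MODEL = "gpt-3.5-turbo"    # or Claude Haiku
PREMIUM_MODEL = "gpt-4-turbo"    # or Claude Opus

COMPLEX_MARKERS = ("analyze", "compare", "explain why", "step by step")

def pick_model(query: str) -> str:
    q = query.lower()
    if len(q.split()) > 50 or any(m in q for m in COMPLEX_MARKERS):
        return PREMIUM_MODEL
    return CHEAP_MODEL
```

Even a crude router like this can cut spend sharply if most traffic is simple, which is the usual distribution.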

Fallback Chains: Primary model fails or is slow → fall back to alternative.

Improves reliability. Reduces dependency on single provider.
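The fallback pattern is a loop over providers in priority order. A minimal sketch, assuming each provider is wrapped in a call function that raises on timeout, rate limit, or outage:

```python
# Fallback-chain sketch: try providers in order, return first success.
# The provider call functions are placeholders for real client wrappers.

def complete_with_fallback(prompt, providers):
    """providers: ordered list of (name, call_fn) pairs."""
    errors = {}
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # timeout, rate limit, outage...
            errors[name] = exc
    raise RuntimeError(f"All providers failed: {errors}")
```

Returning the provider name alongside the output lets you track how often the fallback actually fires, which is itself a useful reliability metric.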

A/B Testing: Run different models for different user segments.

Learn which model performs better for your specific use case.
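For the assignment itself, the standard trick is sticky bucketing: hash the user id so each user always sees the same model, then compare per-arm metrics. The arm names below are placeholders.

```python
# Sticky A/B assignment sketch: deterministic hash of the user id
# keeps each user in the same arm across sessions.
import hashlib

def assign_model(user_id: str, arms=("model_a", "model_b")) -> str:
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return arms[int(digest, 16) % len(arms)]
```

Determinism matters: if a user bounced between models mid-conversation, you couldn't attribute quality differences to either arm.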

Ensemble: Multiple models vote or verify each other.

Improves quality for high-stakes decisions.
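A simple ensemble for classification-style calls is majority vote with an abstain path. This is a sketch; the model call functions are placeholders, and the consensus threshold is an assumption you'd tune per use case.

```python
# Majority-vote ensemble sketch for high-stakes decisions: ask several
# models, ship an answer only when enough of them agree.
from collections import Counter

def ensemble_decision(prompt, model_calls, min_agreement=2):
    votes = Counter(call(prompt) for call in model_calls)
    answer, count = votes.most_common(1)[0]
    if count >= min_agreement:
        return answer
    return None  # no consensus -> escalate to a human or a premium model
```

The abstain path is the important part: for high-stakes decisions, "no answer" is usually better than a low-confidence one.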

PM should push for multi-model architecture when it makes sense. Single-model dependency is a risk.


The "But Engineering Said X" Conversation

What do you do when engineering has already picked a model?

Don't: Challenge the decision confrontationally.

Do: Ask good questions.

"Help me understand the model selection. I want to make sure I can defend it to stakeholders."

  • What options did we consider?
  • What were the quality scores on our use case?
  • What's the cost trajectory as we scale?
  • What are the lock-in implications?
  • What would trigger us to reconsider?

If the answers are solid, great. If the answers are "it's what we know" or "it's the best," dig deeper.


When to Reconsider Model Selection

Model selection isn't permanent. Revisit when:

Cost changes: Your costs spike, or a provider changes pricing.

Quality changes: New models release (happens constantly). Your model is no longer best-in-class.

Requirements change: You need longer context, multi-modal, or different capabilities.

Scale changes: Volume justifies self-hosting what you're buying via API.

Provider issues: Reliability problems, deprecation announcements, policy changes.

Build in regular model reviews (quarterly) to avoid complacency.


The Strategic View

Model selection is product strategy.

Commodity AI: Use APIs, optimize for cost, accept some vendor dependency. AI is a feature, not the differentiator.

Competitive AI: Customize models, prioritize quality, invest in differentiation. AI is core to the product.

Regulated AI: Prioritize compliance, accept cost premiums, prefer self-hosted or compliant providers. Constraints dominate.

Know which category you're in. Let that drive model selection philosophy.


The Conversation with Engineering

When you engage engineering on model selection:

Come prepared with:

  • Requirements document (quality, latency, cost, privacy)
  • Business context (why these requirements matter)
  • Questions, not demands

Ask for:

  • Comparative eval data
  • Cost projections at scale
  • Lock-in assessment
  • Maintenance implications

Collaborate on:

  • Eval set design (you know the use cases)
  • Tradeoff decisions (you own the business case)
  • Review cadence (you'll know when requirements change)

Own:

  • The final decision (within your scope)
  • Communicating rationale to stakeholders
  • Revisiting when circumstances change

Model selection is a joint effort, but PM should drive the process, not just receive the output.


Key Takeaways

  1. Model selection has product implications: cost, quality, latency, privacy, and lock-in are all PM concerns

  2. Define requirements before comparing: know what you need, then evaluate; don't let engineering optimize for the wrong criteria

  3. Consider multi-model strategies: routing, fallbacks, and A/B testing can optimize better than any single model choice
