Saying 'use GPT' is like saying 'buy a vehicle.' Do you need a bike, sedan, or truck? Model choice decides speed, quality, and budget.
This is Lesson 3 (Beginner) in our Azure OpenAI Basics series. By the end, you will understand this topic well enough to explain it to a friend, with no jargon overload, we promise.
Model Families: Why More Than One Exists
Azure OpenAI offers multiple model options because workloads differ. A customer support bot that answers short FAQs has different needs than a legal document analyzer. One size does not fit all.
Think of model families as camera lenses. A huge lens captures fine detail but can be heavier and slower. A compact lens is faster and cheaper but may miss subtle nuance. In AI terms, bigger models often reason better on hard tasks, while smaller models are faster and more economical.
As a beginner, start with three practical criteria: a response quality target, a latency limit, and a monthly budget. If your product is early-stage, fast iteration often matters more than perfect wording. You can revisit your model choice once usage patterns are clear.
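Those three criteria can be turned into a simple filter. This is a minimal sketch, assuming you already have your own quality scores, measured latencies, and cost estimates per deployment; the `Candidate` type, the `pick` helper, and every number below are hypothetical placeholders, not real measurements or prices.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str             # deployment alias, e.g. "chat-default"
    quality: float        # your own 1-5 score from prompt tests (hypothetical)
    p50_latency_s: float  # measured median latency in seconds (hypothetical)
    monthly_cost: float   # estimated spend at expected volume (hypothetical)


def pick(candidates, min_quality, max_latency_s, budget):
    """Return the cheapest candidate that meets every constraint, or None."""
    fits = [
        c for c in candidates
        if c.quality >= min_quality
        and c.p50_latency_s <= max_latency_s
        and c.monthly_cost <= budget
    ]
    return min(fits, key=lambda c: c.monthly_cost, default=None)


# Placeholder numbers for illustration only.
options = [
    Candidate("chat-default", quality=4.0, p50_latency_s=0.8, monthly_cost=120.0),
    Candidate("chat-premium", quality=4.6, p50_latency_s=1.9, monthly_cost=900.0),
]
print(pick(options, min_quality=3.5, max_latency_s=2.0, budget=500.0).name)  # chat-default
```

Picking the cheapest option that clears every bar suits an early-stage product; a team with more budget might instead pick the highest-quality option that stays within budget.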
Deployments: Your Friendly Alias Layer
In Azure OpenAI, your app does not call a raw model ID. You create a deployment, give it a name, and your app sends requests to that name. This decouples app code from decisions about which model runs behind it.
For example, you might create a deployment named chat-default mapped to gpt-4o-mini. Later, if you switch to a newer model, app code can stay unchanged as long as the deployment name remains chat-default. This is clean architecture in practice: a stable interface with a flexible implementation.
using System;

// App code stays stable while backend mapping can evolve.
var deployment = Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT")
    ?? "chat-default";
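The same idea in Python shows where the deployment name actually ends up: in the request path, not the model ID. This is a sketch assuming the Azure OpenAI REST URL pattern `{endpoint}/openai/deployments/{deployment}/chat/completions?api-version=...`; the endpoint below is a hypothetical placeholder, and you should take the `api-version` value from the current Azure OpenAI REST reference rather than from this example.

```python
import os


def resolve_deployment(default: str = "chat-default") -> str:
    # Read the alias from configuration, falling back to a default name.
    return os.environ.get("AZURE_OPENAI_DEPLOYMENT", default)


def chat_completions_url(endpoint: str, deployment: str, api_version: str) -> str:
    # The deployment name, not the underlying model ID, appears in the path.
    base = endpoint.rstrip("/")
    return f"{base}/openai/deployments/{deployment}/chat/completions?api-version={api_version}"


# Hypothetical endpoint and placeholder api-version, for illustration only.
url = chat_completions_url(
    "https://my-resource.openai.azure.com/", resolve_deployment(), "YOUR-API-VERSION"
)
```

Swapping the model behind chat-default changes nothing in this URL, which is exactly why the alias protects your code.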
Name deployments by intent, not by model version. Examples: chat-default, chat-premium, embed-search.
How Selection Works in Real Apps
Production teams often route traffic by use-case. FAQ lookup might use a cost-efficient model, while complex report generation uses a stronger one. This keeps user experience high without overspending.
A small decision matrix helps:
- Simple Q&A: prioritize speed and cost.
- Detailed reasoning: prioritize quality consistency.
- High traffic: prioritize throughput and predictable spend.
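Routing by use case can start as a plain lookup table. This is a minimal sketch; the use-case labels ("faq", "report") and the `route` helper are hypothetical, and in a real app the table would usually live in configuration so routing can change without a code release.

```python
# Hypothetical mapping from use case to intent-based deployment alias.
ROUTES = {
    "faq": "chat-default",     # simple Q&A: speed and cost first
    "report": "chat-premium",  # detailed reasoning: quality consistency first
}


def route(use_case: str, default: str = "chat-default") -> str:
    """Pick a deployment alias for a request; unknown use cases get the default."""
    return ROUTES.get(use_case, default)
```

Falling back to the default alias for unknown use cases keeps new features working while you decide where they belong.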
Do not optimize blindly. Collect sample prompts and compare outputs before switching.
A Mini Comparison Lab You Can Run
Create two deployments and send the same prompts to both. Evaluate answer correctness, tone, and response time. Keep scoring simple: rate quality from 1 to 5 and note the average latency.
test_prompts = [
    "Explain recursion to a 12-year-old.",
    "Summarize this policy in 5 bullets.",
    "Write a polite email asking for an extension.",
]
# Run prompts through two deployments and compare quality/latency.
This exercise teaches an important engineering truth: model choice is an empirical decision, not a popularity contest. The "best" model is the one that meets your constraints and user expectations.
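Here is one way to turn the lab into a small harness. It is a sketch under a stated assumption: `send(deployment, prompt)` is a hypothetical helper you would write around your own chat client, returning the reply text. The harness only times calls and collects outputs; quality scoring stays manual, as the lab suggests.

```python
import statistics
import time
from typing import Callable, Dict, List


def run_lab(deployments: List[str], prompts: List[str],
            send: Callable[[str, str], str]) -> Dict[str, dict]:
    """Send every prompt to every deployment; collect replies and average latency."""
    results = {}
    for name in deployments:
        outputs, latencies = [], []
        for prompt in prompts:
            start = time.perf_counter()
            outputs.append(send(name, prompt))
            latencies.append(time.perf_counter() - start)
        results[name] = {
            "outputs": outputs,  # read these and score each 1-5 by hand
            "avg_latency_s": statistics.mean(latencies),
        }
    return results


# Fake sender for a dry run; replace it with a real client call.
def fake_send(deployment: str, prompt: str) -> str:
    return f"[{deployment}] reply to: {prompt}"


report = run_lab(["chat-default", "chat-premium"],
                 ["Explain recursion to a 12-year-old."], fake_send)
```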
Versioning and Governance Basics
Every AI system eventually needs change management. When updating deployments, keep release notes that record what changed, why, and the expected impact. If answers regress, those notes make rollback straightforward.
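Release notes become more useful when they are structured data rather than free text. This is a minimal sketch; the `DeploymentChange` record, its fields, and the `rollback_target` helper are hypothetical, not part of any Azure API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DeploymentChange:
    alias: str            # e.g. "chat-default"
    previous_model: str   # what the alias pointed to before
    new_model: str        # what it points to now
    reason: str           # why the change was made
    expected_impact: str  # what should improve, and what might get worse
    changed_on: date = field(default_factory=date.today)


def rollback_target(history, alias):
    """Return the model an alias pointed to before its most recent change."""
    for change in reversed(history):  # newest entries are at the end
        if change.alias == alias:
            return change.previous_model
    return None
```

With a history like this, "what do we roll back to?" has a one-line answer during an incident.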
Use feature flags or environment configuration to shift traffic gradually. For example, route 10% of users to a new deployment first, then observe metrics and feedback before a full rollout.
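A common way to implement a percentage rollout is to hash a stable user identifier into 100 buckets. This is a sketch assuming you have such an ID; the `choose_deployment` helper is hypothetical. CRC32 is used here only because it is deterministic and in the standard library, not for security.

```python
import zlib


def choose_deployment(user_id: str, stable: str, candidate: str, percent: int) -> str:
    """Send roughly `percent`% of users to the candidate deployment.

    Hashing the user ID keeps each user on the same deployment across
    requests, so their experience does not flip between models.
    """
    bucket = zlib.crc32(user_id.encode("utf-8")) % 100  # bucket in 0..99
    return candidate if bucket < percent else stable
```

Raising `percent` from 10 to 50 to 100 moves more users over without redeploying code, and setting it back to 0 is an instant rollback.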
Lesson 4 covers the Chat Completions API, but model strategy sits behind every call you make. Choosing wisely today prevents expensive redesign later.
A Simple Model Selection Playbook
When you are unsure which deployment to pick, use a repeatable playbook instead of intuition. Step 1: list the top three tasks your app must handle. Step 2: write five realistic prompts per task. Step 3: score each deployment on correctness, clarity, and latency. Step 4: estimate token cost at your expected request volume. This gives you evidence, not guesswork.
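Step 4 is simple arithmetic once you know average token counts. This sketch assumes per-1K-token pricing with separate input and output rates; the prices below are placeholders, so look up the current rates for your model and region before trusting any estimate.

```python
def monthly_token_cost(requests_per_month, avg_input_tokens, avg_output_tokens,
                       input_price_per_1k, output_price_per_1k):
    # Cost of one average request, with input and output tokens priced separately.
    per_request = (avg_input_tokens / 1000 * input_price_per_1k
                   + avg_output_tokens / 1000 * output_price_per_1k)
    return requests_per_month * per_request


# Hypothetical inputs: 30,000 requests, 500 input and 200 output tokens each,
# placeholder prices of 0.001 and 0.002 per 1K tokens.
estimate = monthly_token_cost(30_000, 500, 200, 0.001, 0.002)
```

With these placeholder numbers the estimate works out to 30,000 × (0.5 × 0.001 + 0.2 × 0.002) = 30,000 × 0.0009 = 27 currency units per month. Run the same calculation for each candidate deployment to compare them.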
Suppose your app has two flows: "quick FAQ answers" and "detailed study explanations." You may discover that a smaller deployment handles FAQ perfectly and only the study flow needs a stronger model. That split can cut cost significantly while users still get high quality where it matters.
Also define a rollback trigger before changing production traffic. Example trigger: if the average quality score drops below 4/5 for two consecutive evaluation batches, revert traffic to the prior deployment alias. Because the rule was agreed in advance, it reduces emotional debates during incidents.
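The example trigger is precise enough to encode directly. This is a minimal sketch; the `should_roll_back` helper and its input shape (a list of evaluation batches, oldest first, each a non-empty list of 1-5 scores) are assumptions for illustration.

```python
def should_roll_back(batch_scores, threshold=4.0, consecutive=2):
    """True if the average score stayed below `threshold` for the last
    `consecutive` evaluation batches.

    batch_scores: list of batches, oldest first; each batch is a
    non-empty list of 1-5 quality scores.
    """
    if len(batch_scores) < consecutive:
        return False  # not enough evidence yet
    recent = batch_scores[-consecutive:]
    return all(sum(batch) / len(batch) < threshold for batch in recent)
```

Writing the rule as code means everyone reads the same definition of "regressed" before the incident, not during it.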
Finally, keep one "known-good" deployment active for emergency fallback. During exam week or a product launch, stability matters more than experimenting with new model versions. Strong teams separate innovation traffic from reliability traffic.
Operate Deployments Like Product Features
Treat each deployment as a product feature with an owner, an objective, and service-level expectations. For example, your chat-default deployment might target a median latency under 2 seconds and educational accuracy above a defined threshold. Stating these targets makes model changes accountable, not arbitrary.
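Targets like these can be checked automatically after each evaluation run. This sketch assumes you measure per-request latencies and count correct answers on an evaluation set; the `DeploymentTarget` record, the `check_target` helper, and the 0.9 accuracy threshold in the example are hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass
class DeploymentTarget:
    alias: str
    owner: str
    max_median_latency_s: float  # e.g. 2.0 for "sub-2-second median"
    min_accuracy: float          # fraction of evaluation answers judged correct


def check_target(target, latencies_s, correct, total):
    """Return a list of readable violations; an empty list means on target."""
    problems = []
    median = statistics.median(latencies_s)
    if median >= target.max_median_latency_s:
        problems.append(f"{target.alias}: median latency {median:.2f}s "
                        f">= {target.max_median_latency_s}s")
    accuracy = correct / total
    if accuracy < target.min_accuracy:
        problems.append(f"{target.alias}: accuracy {accuracy:.0%} "
                        f"< {target.min_accuracy:.0%}")
    return problems


# Hypothetical owner and threshold, for illustration only.
chat_default = DeploymentTarget("chat-default", "learning-team", 2.0, 0.9)
```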
Build a lightweight dashboard that shows request volume, average latency, error rate, and token consumption by deployment alias. If one alias suddenly becomes expensive or unstable, you can adjust routing quickly without touching core application code.
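The numbers behind such a dashboard come from grouping request logs by alias. This is a minimal sketch assuming each log record is a dict with the hypothetical keys `alias`, `latency_s`, `ok`, and `tokens`; your logging schema will differ.

```python
from collections import defaultdict


def summarize(records):
    """Aggregate per-request records into one dashboard row per alias."""
    totals = defaultdict(lambda: {"requests": 0, "latency_total": 0.0,
                                  "errors": 0, "tokens": 0})
    for r in records:
        row = totals[r["alias"]]
        row["requests"] += 1
        row["latency_total"] += r["latency_s"]
        row["errors"] += 0 if r["ok"] else 1
        row["tokens"] += r["tokens"]
    return {
        alias: {
            "requests": row["requests"],
            "avg_latency_s": row["latency_total"] / row["requests"],
            "error_rate": row["errors"] / row["requests"],
            "tokens": row["tokens"],
        }
        for alias, row in totals.items()
    }
```

Because rows are keyed by alias rather than by model, the dashboard keeps working when the model behind an alias changes.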
Another useful pattern is an intent-based fallback chain. If the premium deployment fails temporarily, route requests to a stable backup deployment with a tighter response format and shorter output limits. Users receive slightly simpler answers, but the service remains available.
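A fallback chain can be a short loop over (alias, output limit) pairs. This sketch assumes a hypothetical `send(alias, prompt, max_tokens)` helper that raises on failure; the "chat-backup" alias and the token limits are placeholders. In production code you would catch your client library's specific error types instead of every `Exception`.

```python
from typing import Callable, List, Tuple


def call_with_fallback(chain: List[Tuple[str, int]], prompt: str,
                       send: Callable[[str, str, int], str]) -> Tuple[str, str]:
    """Try deployments in order and return (alias_used, reply).

    chain: (alias, max_output_tokens) pairs, most capable first.
    """
    last_error = None
    for alias, max_tokens in chain:
        try:
            return alias, send(alias, prompt, max_tokens)
        except Exception as err:  # narrow this to your client's errors
            last_error = err
    raise RuntimeError("all deployments in the fallback chain failed") from last_error


# Fake sender for a dry run: the premium alias is "down", the backup answers.
def fake_send(alias: str, prompt: str, max_tokens: int) -> str:
    if alias == "chat-premium":
        raise TimeoutError("premium deployment unavailable")
    return f"short answer from {alias} (max {max_tokens} tokens)"


print(call_with_fallback([("chat-premium", 800), ("chat-backup", 300)],
                         "Explain photosynthesis.", fake_send))
# ('chat-backup', 'short answer from chat-backup (max 300 tokens)')
```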
When documenting deployments, include a purpose statement, the expected input type, and prohibited uses. This prevents new engineers from accidentally routing unrelated tasks to an expensive model.
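Documentation can double as a guardrail if it lives next to the code. This is a minimal sketch; the `DeploymentDoc` record, the registry contents, the task labels, and the `check_usage` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DeploymentDoc:
    alias: str
    purpose: str
    expected_input: str
    prohibited: FrozenSet[str]  # task labels this alias must not serve


# Hypothetical registry entry, for illustration only.
REGISTRY = {
    "chat-premium": DeploymentDoc(
        alias="chat-premium",
        purpose="Detailed study explanations",
        expected_input="Student question plus lesson context",
        prohibited=frozenset({"faq", "bulk-summarization"}),
    ),
}


def check_usage(alias: str, task: str) -> None:
    """Raise if the alias is undocumented or the task is prohibited for it."""
    doc = REGISTRY.get(alias)
    if doc is None:
        raise KeyError(f"{alias} has no documentation entry")
    if task in doc.prohibited:
        raise ValueError(f"{task!r} is prohibited on {alias} ({doc.purpose})")
```

Calling `check_usage` at the routing layer turns a wiki page into a rule new engineers cannot accidentally skip.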
As your app matures, this operational discipline transforms model selection from ad-hoc experimentation into reliable platform engineering.
Common Misconceptions
"Largest model is always best." Best means fit-for-purpose: quality, latency, and budget together.
"Deployment names are optional cosmetics." They are abstraction points that reduce code churn.
"Model decisions are one-time." You should revisit choices as traffic and requirements evolve.
"Benchmarking is overkill for beginners." Even tiny prompt tests prevent poor assumptions.
Quick Recap
- Model families exist for different workload needs.
- Deployment names decouple app code from model changes.
- Evaluate models by quality, speed, and cost together.
- Run prompt comparison experiments before switching.
- Document and govern model changes like any production release.
Summary
Lesson 3 teaches model literacy: pick with evidence, deploy with intent-based names, and design for future model changes without rewriting app code.
Key Takeaways
- Choose models by constraints, not hype.
- Deployment names are architecture leverage.
- Measure before changing.
- Plan for model evolution.
- Keep a simple governance trail.