Friday, March 06, 2026

AI 103 Part 3: LLMs

AI models

Microsoft Foundry — Model Catalog

Key models available today, and what each one is built for

11,000+ models available

What you can build:

💬 Chat apps: GPT-4o, Claude Sonnet, Llama 3.3, Phi-4
🧠 Reasoning & agents: GPT-5.2, Claude Opus 4.6, o3, DeepSeek-V3
💻 Code generation: GPT-5.1-codex, Claude Opus, Codestral
🖼️ Image generation: DALL-E 3, GPT-image-1.5, FLUX.2, SD 3.5
👁️ Vision / image analysis: GPT-4o, Claude (all), GPT-5.2
🎙️ Speech & audio: gpt-4o-mini-transcribe, gpt-4o-mini-tts
📄 Long documents: Claude Opus 4.6 (1M ctx), GPT-5.2
High-volume / cheap: Claude Haiku 4.5, Phi-4, GPT-4o-mini
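The task-to-model matrix above can be sketched as a small routing table for picking a model programmatically. This is purely illustrative (the dictionary and `pick_model` helper are my own, not a Foundry API); the model names come from the catalog above.

```python
# Illustrative routing table built from the catalog above.
# Keys are task categories; values are candidate Foundry model names.
MODEL_CATALOG = {
    "chat": ["gpt-4o", "claude-sonnet-4-6", "llama-3.3-70b", "phi-4"],
    "reasoning": ["gpt-5.2", "claude-opus-4-6", "o3", "deepseek-v3"],
    "code": ["gpt-5.1-codex", "claude-opus-4-6", "mistral-codestral"],
    "image_gen": ["dall-e-3", "gpt-image-1.5", "flux-2-pro", "stable-diffusion-3.5"],
    "high_volume": ["claude-haiku-4-5", "phi-4", "gpt-4o-mini"],
}

def pick_model(task: str, prefer_cheap: bool = False) -> str:
    """Pick a candidate model for a task; cheap requests fall back to the high-volume tier."""
    if prefer_cheap:
        return MODEL_CATALOG["high_volume"][0]
    return MODEL_CATALOG[task][0]

print(pick_model("code"))                     # gpt-5.1-codex
print(pick_model("chat", prefer_cheap=True))  # claude-haiku-4-5
```

Because every model sits behind the same inference API (see the Foundry advantage tip below the provider tables), a table like this is all the "router" you need.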
Model Providers

OpenAI — GPT Family

Microsoft's flagship — deepest Azure integration, most widely used

Azure-native
Model | Best for | Capabilities
gpt-5.2 | Complex reasoning, enterprise agents, multi-step tasks | Chat, Agents, Vision, Code
gpt-5.1-codex | Engineering-scale code generation, debugging, refactoring | Code, Agents
gpt-4o | Balanced chat + vision + speed; great everyday workhorse | Chat, Vision, Fast
gpt-4o-mini | Low-cost, high-volume tasks; same API as gpt-4o | Chat, Fast, Vision
o3 | Deep reasoning: math, science, complex problem solving | Reasoning, Code
gpt-image-1.5 | Text-to-image generation, inpainting, image editing | Image gen
dall-e-3 | High-quality image generation from text prompts | Image gen
gpt-4o-mini-transcribe | Fast, accurate speech-to-text transcription | Speech, Fast
gpt-4o-mini-tts | Natural text-to-speech with multiple voice options | TTS

Anthropic — Claude Family

Best-in-class reasoning, coding, and 1M token context

Available in Foundry
Model | Best for | Capabilities
claude-opus-4-6 | Most intelligent: complex agents, financial analysis, 1M-token documents | Agents, Long docs, Code, Vision
claude-sonnet-4-6 | Best balance of intelligence and speed; ideal for production apps | Chat, Agents, Vision, Fast
claude-haiku-4-5 | Fastest and cheapest Claude; high-volume tasks, real-time responses | Chat, Fast

Microsoft — Phi Family

Small, efficient models — great for edge, on-device, and cost-sensitive workloads

Microsoft-built
Model | Best for | Capabilities
phi-4 | Punches above its size: strong reasoning at small-model cost | Chat, Code, Fast
phi-4-mini | On-device and edge AI: runs locally with minimal resources | Chat, Edge

Meta — Llama Family

Open-weight models — fully customizable, fine-tunable, self-hostable on Azure compute

Open-weight
Model | Best for | Capabilities
llama-3.3-70b | Best open-weight generalist: strong multilingual, fine-tunable | Chat, Code, Agents
llama-3.2-vision | Open-weight model with image understanding | Chat, Vision

Specialized Models

Purpose-built for image generation, code, and research

Multi-vendor
Model | Best for | Capabilities
flux-2-pro | High-quality image generation: photorealistic and artistic styles | Image gen
stable-diffusion-3.5 | Image + text input, fine-tunable image generation | Image gen, Vision input
mistral-codestral | Code-specialized: fast autocomplete, fill-in-the-middle | Code, Fast
deepseek-v3 | Strong reasoning and coding: competitive with GPT-4 class | Reasoning, Code, Chat
💡 The Foundry advantage — all models share the same Azure AI inference API.
Switch from GPT-4o to Claude Sonnet by changing one line: model: "claude-sonnet-4-6"
No rewrites. No new SDK. Benchmark any model against your own data before committing.
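To make that one-line swap concrete, here is a minimal sketch of the request body sent to the shared chat completions route. The `build_chat_request` helper is my own illustration, not part of any SDK; the point is that switching providers changes only the model string, while the payload shape and calling code stay identical.

```python
def build_chat_request(model: str, user_message: str) -> dict:
    """Build the JSON body for a chat completions call against the shared inference API."""
    return {
        "model": model,  # the ONLY thing that changes between providers
        "messages": [{"role": "user", "content": user_message}],
    }

# Same code path, different model -- one string is the entire migration:
req_gpt = build_chat_request("gpt-4o", "Summarize this contract.")
req_claude = build_chat_request("claude-sonnet-4-6", "Summarize this contract.")

print(req_gpt["model"])     # gpt-4o
print(req_claude["model"])  # claude-sonnet-4-6
```

Everything else (auth, endpoint, response parsing) is unchanged, which is what makes side-by-side benchmarking of two models against your own data cheap.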

Hugging Face

Hugging Face is actually the biggest single source of models in Foundry. Here's the full picture:

The scale: Over 11,000 of the most popular and downloaded open-source models from the Hugging Face Hub are available in Foundry's model catalog, all with verified, security-scanned weights deployable to managed endpoints with one click. (Microsoft Learn)

The partnership: Microsoft and Hugging Face have an expanded collaboration that puts new models in Foundry on the same day they land on the Hugging Face Hub (day-zero access), plus customized fine-tuned variants of trending models. (Microsoft Learn)

Gated models too: Foundry now also supports Hugging Face's gated models, which sit behind an access boundary requiring approval from the model owner. You connect your Hugging Face access token to Foundry once, and it handles secure download and deployment automatically. (CBT Nuggets)

Run locally: Foundry Local lets you convert any Hugging Face model (Safetensors or PyTorch) to ONNX using Olive and run it on your own device: fully offline, no Azure subscription needed.

Models can be invoked through hosted API endpoints, but also locally via Foundry Local.

1. Install the tool

winget install Microsoft.FoundryLocal

2. List models

foundry model list

3. Download and run a model

foundry model run phi-4-mini

Then, in a second terminal, get the endpoint:

foundry service status

This gives you the local port, something like http://localhost:5273. Never hardcode this port; always fetch it at runtime, since it's dynamic.
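If you are calling the endpoint over raw HTTP rather than through an SDK (the C# SDK below resolves it for you), you can extract the URL from the status output at runtime. A minimal sketch, assuming the `foundry service status` text contains the endpoint URL as shown above:

```python
import re

def get_local_endpoint(status_text: str) -> str:
    """Extract the dynamic Foundry Local endpoint URL from `foundry service status` output."""
    m = re.search(r"https?://localhost:\d+", status_text)
    if not m:
        raise RuntimeError("No endpoint found; is the Foundry Local service running?")
    return m.group(0)

# Example using the sample shape from above (the port varies per run):
print(get_local_endpoint("Service running at http://localhost:5273"))  # http://localhost:5273
```

In a real script you would feed this the captured output of `foundry service status` (e.g. via `subprocess.run`) instead of a literal string.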

4. Use it in code


using Microsoft.AI.Foundry.Local;
using OpenAI.Chat;
using System.ClientModel;

// Start the local service and load the model (downloads it on first run)
var manager = await FoundryLocalManager.StartModelAsync("phi-4-mini");

// Resolve the concrete model id behind the alias
var model = await manager.GetModelInfoAsync("phi-4-mini");

// The local endpoint is OpenAI-compatible, so the standard OpenAI client works
var client = new ChatClient(
    model: model?.ModelId,
    credential: new ApiKeyCredential(manager.ApiKey),
    options: new OpenAIClientOptions { Endpoint = manager.Endpoint }
);

var response = await client.CompleteChatAsync("What is MCP?");
Console.WriteLine(response.Value.Content[0].Text);

Links:
- Microsoft Foundry Local: Run AI Models On Your Device
- Building AI Apps with the Foundry Local C# SDK
- Example WPF Contoso medical service

Publishing an agent in Foundry

