Friday, March 06, 2026

AI-103 Part 3: LLMs

AI models

Microsoft Foundry — Model Catalog

Key models available today, and what each one is built for

11,000+ models available
What you can build
- 💬 Chat apps: GPT-4o, Claude Sonnet, Llama 3.3, Phi-4
- 🧠 Reasoning & agents: GPT-5.2, Claude Opus 4.6, o3, DeepSeek-V3
- 💻 Code generation: GPT-5.1-codex, Claude Opus, Codestral
- 🖼️ Image generation: DALL-E 3, GPT-image-1.5, FLUX.2, SD 3.5
- 👁️ Vision / image analysis: GPT-4o, Claude (all), GPT-5.2
- 🎙️ Speech & audio: gpt-4o-mini-transcribe, gpt-4o-mini-tts
- 📄 Long documents: Claude Opus 4.6 (1M ctx), GPT-5.2
- High-volume / cheap: Claude Haiku 4.5, Phi-4, GPT-4o-mini
Model Providers

OpenAI — GPT Family

Microsoft's flagship — deepest Azure integration, most widely used

Azure-native
| Model | Best for | Capabilities |
|---|---|---|
| gpt-5.2 | Complex reasoning, enterprise agents, multi-step tasks | Chat, Agents, Vision, Code |
| gpt-5.1-codex | Engineering-scale code generation, debugging, refactoring | Code, Agents |
| gpt-4o | Balanced chat + vision + speed — great everyday workhorse | Chat, Vision, Fast |
| gpt-4o-mini | Low-cost, high-volume tasks — same API as gpt-4o | Chat, Fast, Vision |
| o3 | Deep reasoning — math, science, complex problem solving | Reasoning, Code |
| gpt-image-1.5 | Text-to-image generation, inpainting, image editing | Image gen |
| dall-e-3 | High-quality image generation from text prompts | Image gen |
| gpt-4o-mini-transcribe | Fast, accurate speech-to-text transcription | Speech, Fast |
| gpt-4o-mini-tts | Natural text-to-speech with multiple voice options | TTS |

Anthropic — Claude Family

Best-in-class reasoning, coding, and 1M token context

Available in Foundry
| Model | Best for | Capabilities |
|---|---|---|
| claude-opus-4-6 | Most intelligent — complex agents, financial analysis, 1M token documents | Agents, Long docs, Code, Vision |
| claude-sonnet-4-6 | Best balance of intelligence and speed — ideal for production apps | Chat, Agents, Vision, Fast |
| claude-haiku-4-5 | Fastest and cheapest Claude — high-volume tasks, real-time responses | Chat, Fast |

Microsoft — Phi Family

Small, efficient models — great for edge, on-device, and cost-sensitive workloads

Microsoft-built
| Model | Best for | Capabilities |
|---|---|---|
| phi-4 | Punches above its size — strong reasoning at small model cost | Chat, Code, Fast |
| phi-4-mini | On-device and edge AI — runs locally with minimal resources | Chat, Edge |

Meta — Llama Family

Open-weight models — fully customizable, fine-tunable, self-hostable on Azure compute

Open-weight
| Model | Best for | Capabilities |
|---|---|---|
| llama-3.3-70b | Best open-weight generalist — strong multilingual, fine-tunable | Chat, Code, Agents |
| llama-3.2-vision | Open-weight model with image understanding capability | Chat, Vision |

Specialized Models

Purpose-built for image generation, code, and research

Multi-vendor
| Model | Best for | Capabilities |
|---|---|---|
| flux-2-pro | High-quality image generation — photorealistic, artistic styles | Image gen |
| stable-diffusion-3.5 | Image + text input, fine-tunable image generation | Image gen, Vision input |
| mistral-codestral | Code-specialized — fast autocomplete, fill-in-the-middle | Code, Fast |
| deepseek-v3 | Strong reasoning and coding — competitive with GPT-4 class | Reasoning, Code, Chat |
💡 The Foundry advantage — all models share the same Azure AI inference API.
Switch from GPT-4o to Claude Sonnet by changing one line: model: "claude-sonnet-4-6"
No rewrites. No new SDK. Benchmark any model against your own data before committing.
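A minimal sketch of that one-line switch, assuming the `Azure.AI.Inference` .NET SDK and placeholder environment variable names (`AZURE_AI_ENDPOINT`, `AZURE_AI_KEY`):

```csharp
using Azure;
using Azure.AI.Inference;

// One client for every model behind the Azure AI inference API
var client = new ChatCompletionsClient(
    new Uri(Environment.GetEnvironmentVariable("AZURE_AI_ENDPOINT")!),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_AI_KEY")!));

var options = new ChatCompletionsOptions
{
    // Swap "gpt-4o" for "claude-sonnet-4-6" — nothing else changes
    Model = "gpt-4o",
    Messages = { new ChatRequestUserMessage("Summarize MCP in one sentence.") }
};

var response = await client.CompleteAsync(options);
Console.WriteLine(response.Value.Content);
```

Changing `Model` is the whole migration; the client, credential, and message types stay identical across providers.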

Hugging Face

Hugging Face is actually the biggest single source of models in Foundry. Here's the full picture:

**The scale:** Over 11,000 of the most popular and downloaded open-source models from the Hugging Face Hub are available in Foundry's model catalog — all with verified, security-scanned weights deployable to managed endpoints with one click. (Microsoft Learn)

**The partnership:** Microsoft and Hugging Face have an expanded collaboration that puts new models in Foundry on the same day they land on the Hugging Face Hub — day-zero access — plus customized fine-tuned variants of trending models. (Microsoft Learn)

**Gated models too:** Foundry now also supports Hugging Face's gated models — models behind an access boundary requiring approval from the model owner. You connect your Hugging Face access token to Foundry once, and it handles secure download and deployment automatically. (CBT Nuggets)

**Run locally:** Foundry Local lets you convert any Hugging Face model (Safetensors or PyTorch) to ONNX using Olive and run it on your own device — fully offline, no Azure subscription needed.

The models can be invoked via API endpoints, but also locally through Foundry Local.

1. Install the tool

winget install Microsoft.FoundryLocal

2. List the available models

foundry model list

3. Download and run a model

foundry model run phi-4-mini

4. In a second terminal, get the endpoint

foundry service status
This gives you the local port, something like http://localhost:5273. Never hardcode this port — always fetch it at runtime since it's dynamic.
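If you're not using the SDK (which discovers the endpoint for you), runtime discovery can be sketched as a small helper that parses the `foundry service status` output. The exact output format shown here is an assumption:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: pull the dynamic endpoint out of the text that
// `foundry service status` prints, instead of hardcoding the port.
static Uri ParseFoundryEndpoint(string statusOutput)
{
    var match = Regex.Match(statusOutput, @"https?://\S+");
    if (!match.Success)
        throw new InvalidOperationException("No endpoint found in status output");
    return new Uri(match.Value);
}

// Example with output captured from the CLI (format is an assumption)
var endpoint = ParseFoundryEndpoint(
    "Model management service running on http://localhost:5273/openai/status");
Console.WriteLine($"{endpoint.Scheme}://{endpoint.Host}:{endpoint.Port}");
```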

Use it in code


using System.ClientModel;
using Microsoft.AI.Foundry.Local;
using OpenAI;
using OpenAI.Chat;

// Start the service and load the model (downloads it on first run)
var manager = await FoundryLocalManager.StartModelAsync("phi-4-mini");

// Resolve the concrete model id and the dynamic local endpoint at runtime
var model = await manager.GetModelInfoAsync("phi-4-mini");
var client = new ChatClient(
    model: model!.ModelId,
    new ApiKeyCredential(manager.ApiKey),
    new OpenAIClientOptions { Endpoint = manager.Endpoint }
);

var response = await client.CompleteChatAsync("What is MCP?");
Console.WriteLine(response.Value.Content[0].Text);

Links:
- Microsoft Foundry Local: Run AI Models On Your Device
- Building AI Apps with the Foundry Local C# SDK
- Example: WPF Contoso medical service

Publishing an agent in Foundry


AI-103: Azure AI App and Agent Development Associate

What is AI-103?

It's the follow-up to the AI-102 exam. It's all about building and deploying generative AI applications and production-ready agents using Microsoft Foundry and AI Services. Where AI-102 covers subjects like:

- Vision
- NLP
- Speech
- Search
- GenAI basics

AI-103 goes deeper on the agentic and generative layer: the skills required to build real-world AI-powered apps and multi-agent systems.

AI-102 vs AI-103, the key difference:
- AI-102 = Azure AI breadth (vision, NLP, speech, search, GenAI intro)
- AI-103 = GenAI + agentic depth (Foundry, RAG, multi-agent orchestration, production ops)

 
| Subtopic | Key Skills | Study Resource |
|---|---|---|
| Select Azure AI Services | Choose the right service for GenAI, NLP, vision, speech, search | docs.microsoft.com/azure/ai-services |
| Plan & Deploy with Foundry | Create hub/project, deploy models, CI/CD integration, containers | Microsoft Foundry documentation |

What is Microsoft Foundry?

Microsoft Foundry is a unified platform for building AI-powered apps and agents on Azure. It gives you access to thousands of AI models (GPT, Claude, open-source), ready-to-use AI APIs for vision, speech, language and document processing, and a full agent service to build autonomous workflows. You can ground your apps on your own data with RAG, monitor and govern everything in one place, and deploy to any target — cloud, containers, or edge.

As a developer, concretely you can build:
  • Chat/GenAI apps — deploy GPT, Claude, or open models and call them via API
  • RAG systems — connect your documents/data and build Q&A over them
  • Agents — build bots that use tools, search, call APIs, and take actions
  • Multi-agent pipelines — orchestrate multiple agents working together
  • Fine-tune models — adapt base models on your own data
  • Deploy anywhere — containers, serverless, edge, or local with Foundry Local
  • Publish to Teams/M365 — ship agents to Teams and M365 with one click using low-code/no-code tools

MCP

What is MCP? MCP (Model Context Protocol) is an open standard for connecting AI assistants to the systems where data lives — content repositories, business tools, and development environments. Instead of maintaining separate connectors for each data source, developers build against one standard protocol.

Think of MCP like a USB-C port for AI applications: just as USB-C provides a standardized way to connect devices to peripherals, MCP provides a standardized way to connect AI models to different data sources and tools.

Importantly, MCP is not an agent framework — it's a standardized integration layer. It doesn't decide when a tool is called; the LLM does that. MCP simply provides a standardized connection to streamline tool integration.

MCP was introduced by Anthropic in November 2024 and has since been adopted by OpenAI, Google DeepMind, and many toolmakers as an industry standard.

How it works

The LLM receives a prompt and decides what to do with it. It knows which MCP tools are registered. Each MCP tool needs a clear description of what it's capable of and which properties it requires to execute:

```json
{
  "tools": [
    {
      "name": "get_weather",
      "description": "Get the current weather for a city. Use this when the user asks about weather, temperature, or climate conditions in a specific location.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "city": {
            "type": "string",
            "description": "The name of the city, e.g. 'Amsterdam'"
          }
        },
        "required": ["city"]
      }
    }
  ]
}
```
This manifest gets injected into the LLM's **system prompt** before the conversation starts. So the LLM literally reads a description of every available tool, just like you'd read a menu. **The `description` field is everything.** That's the plain English sentence that tells the LLM *when* to use the tool. The LLM matches the user's intent against those descriptions to decide which tool fits. If your description is vague or wrong, the LLM will either call the wrong tool or not call it at all.
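To make the "menu" concrete, here's a minimal sketch (plain C#, no SDK) of what the runtime effectively does: it flattens each tool's name and description into text prepended to the system prompt. The formatting is illustrative — real runtimes inject the JSON manifest — but it shows why the description field carries all the weight:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Flatten registered tools into the text the LLM actually "reads".
// (Illustrative formatting — real runtimes inject the JSON manifest.)
static string BuildToolPrompt(IEnumerable<(string Name, string Description)> tools)
{
    var sb = new StringBuilder("You have access to these tools:\n");
    foreach (var t in tools)
        sb.AppendLine($"- {t.Name}: {t.Description}");
    return sb.ToString();
}

var prompt = BuildToolPrompt(new[]
{
    ("get_weather",
     "Get the current weather for a city. Use when the user asks about weather or temperature."),
    ("get_time",
     "Get the current time in a timezone. Use when the user asks what time it is.")
});
Console.WriteLine(prompt);
```

A vague description ("Weather tool.") gives the LLM nothing to match user intent against; a precise one states both what the tool does and when to use it.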


So the full picture is:

1. The MCP server starts
2. It sends the tool manifest (name + description + schema) to the runtime
3. The runtime injects it into the LLM system prompt
4. The user sends a prompt → the LLM reads it against the known tools
5. The LLM pattern-matches intent to the right tool description
6. It calls the tool with correctly structured args (validated against the schema)

**Part 1 — The MCP Server (tool registration).** The `[McpServerTool]` attribute with `Description = "..."` is the tool manifest. That description string is literally what gets injected into the LLM system prompt — it's how the LLM knows when to call `GetWeather` vs `Calculate`. The `[Description]` on each parameter maps to the `inputSchema`, telling the LLM how to construct the arguments.

**Part 2 — The Client + Agent Loop (automatic).** `mcpClient.ListToolsAsync()` fetches the manifest from the server, `t.AsAIFunction()` converts each tool into an LLM function definition, and `UseFunctionInvocation()` handles the whole loop automatically — LLM calls tool → MCP executes it → result goes back to LLM → repeat until done.

**Part 3 — The Manual Loop (what actually happens under the hood).** This strips away the magic and shows each step explicitly:


```
// ============================================================
//  MCP Agent Demo — C#
//  Shows: tool manifest registration, agent loop, tool dispatch
//
//  NuGet packages needed:
//    dotnet add package Azure.AI.OpenAI
//    dotnet add package ModelContextProtocol   (official MS SDK)
//    dotnet add package Microsoft.Extensions.AI
// ============================================================

using System.ComponentModel;   // for [Description]
using System.Text.Json;
using System.Text.Json.Serialization;
using Azure.AI.OpenAI;
using ModelContextProtocol.Server;
using ModelContextProtocol.Client;
using Microsoft.Extensions.AI;

// ============================================================
// PART 1 — THE MCP SERVER
// This is what registers tools and their descriptions.
// The LLM reads these descriptions to know WHAT each tool does.
// ============================================================

[McpServerToolType]
public static class WeatherTools
{
    // The [McpServerTool] attribute is the tool manifest.
    // The Description = "..." is exactly what gets injected
    // into the LLM system prompt — this is how the LLM knows
    // when and why to call this tool.
    [McpServerTool,
     Description("Get the current weather for a city. " +
                 "Use this when the user asks about weather, " +
                 "temperature, or climate in a specific location.")]
    public static async Task<string> GetWeather(
        [Description("The city name, e.g. 'Amsterdam'")] string city)
    {
        // In production: call a real weather API here
        // This simulates an MCP tool execution
        await Task.Delay(200); // simulate network call
        return city.ToLower() switch
        {
            "amsterdam"  => "Amsterdam: 17°C, mostly cloudy",
            "madrid"     => "Madrid: 11°C, light rain",
            "paris"      => "Paris: 18°C, partly sunny",
            "london"     => "London: 13°C, overcast",
            _            => $"{city}: {Random.Shared.Next(5, 25)}°C, variable"
        };
    }
}

[McpServerToolType]
public static class MathTools
{
    [McpServerTool,
     Description("Perform a mathematical calculation. " +
                 "Use this for any arithmetic, percentages, or numeric operations.")]
    public static Task<string> Calculate(
        [Description("A math expression, e.g. '15% of 847' or '(42 * 3) + 7'")] string expression)
    {
        // In production: use a proper expression evaluator
        // For demo: handle simple percentage pattern
        if (expression.Contains('%'))
        {
            var parts = expression.ToLower().Replace("% of", "").Split(' ',
                StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length == 2
                && double.TryParse(parts[0], out var pct)
                && double.TryParse(parts[1], out var num))
            {
                var result = (pct / 100) * num;
                return Task.FromResult($"{expression} = {result:F2}");
            }
        }
        return Task.FromResult($"Calculated: {expression} = [result]");
    }
}

[McpServerToolType]
public static class TimeTools
{
    [McpServerTool,
     Description("Get the current date and time for a timezone. " +
                 "Use this when the user asks what time it is, " +
                 "or what time it is in a specific city or country.")]
    public static Task<string> GetCurrentTime(
        [Description("IANA timezone, e.g. 'Europe/Amsterdam' or 'Asia/Tokyo'. " +
                     "Defaults to UTC if not specified.")] string? timezone = null)
    {
        var tz = timezone != null
            ? TimeZoneInfo.FindSystemTimeZoneById(timezone)
            : TimeZoneInfo.Utc;
        var time = TimeZoneInfo.ConvertTimeFromUtc(DateTime.UtcNow, tz);
        return Task.FromResult(
            $"Current time in {tz.DisplayName}: {time:dddd, MMMM dd yyyy HH:mm:ss}");
    }
}

// ============================================================
// PART 2 — THE MCP CLIENT + AGENT LOOP
// This is the glue between the LLM and the MCP server.
// It:
//   1. Connects to the MCP server
//   2. Fetches the tool manifest
//   3. Injects tools into the LLM
//   4. Runs the agentic loop (LLM → tool call → result → LLM)
// ============================================================

public class McpAgentDemo
{
    public static async Task Main(string[] args)
    {
        Console.WriteLine("=== MCP Agent Demo ===\n");

        // ── Step 1: Start the MCP server (in-process for demo)
        // In production this would be a separate process or remote server
        await using var mcpServer = await McpServerFactory.CreateAsync(
            new McpServerOptions
            {
                ServerInfo = new() { Name = "demo-server", Version = "1.0" }
            });

        // ── Step 2: Connect MCP client to server
        await using var mcpClient = await McpClientFactory.CreateAsync(
            new McpClientOptions
            {
                ClientInfo = new() { Name = "agent-client", Version = "1.0" }
            });

        // ── Step 3: Fetch tool manifest from MCP server
        // This is the JSON with name + description + inputSchema
        // that gets injected into the LLM system prompt
        var mcpTools = await mcpClient.ListToolsAsync();

        Console.WriteLine("Tools registered via MCP manifest:");
        foreach (var tool in mcpTools)
        {
            Console.WriteLine($"  [{tool.Name}] — {tool.Description}");
        }
        Console.WriteLine();

        // ── Step 4: Set up the LLM (Azure OpenAI)
        var openAiClient = new AzureOpenAIClient(
            new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
            new System.ClientModel.ApiKeyCredential(
                Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!));

        var chatClient = openAiClient
            .GetChatClient("gpt-4o")
            .AsBuilder()
            .UseFunctionInvocation()  // auto-handles tool call loop
            .Build();

        // ── Step 5: Convert MCP tools → AI tools for the LLM
        // This is where the manifest becomes LLM function definitions
        var aiTools = mcpTools
            .Select(t => t.AsAIFunction())
            .ToList();

        // ── Step 6: Run the agent
        await RunAgentAsync(chatClient, aiTools, mcpClient);
    }

    static async Task RunAgentAsync(
        IChatClient chatClient,
        List<AITool> tools,
        IMcpClient mcpClient)
    {
        // System prompt tells the LLM it has tools available
        // The tool descriptions (from the manifest) are appended automatically
        var messages = new List<ChatMessage>
        {
            new(ChatRole.System,
                "You are a helpful agent with access to tools via MCP. " +
                "Use tools whenever they would help answer the user's question. " +
                "Always explain what tool you're using and why.")
        };

        // Demo queries — showing different tools being called
        string[] userQueries =
        [
            "What's the weather like in Amsterdam right now?",
            "What is 23% of 1540?",
            "What time is it in Tokyo?",
            "Is it warmer in Paris or Amsterdam today?"  // triggers TWO tool calls
        ];

        foreach (var query in userQueries)
        {
            Console.WriteLine($"User: {query}");
            Console.WriteLine(new string('-', 50));

            messages.Add(new(ChatRole.User, query));

            // ── The agent loop
            // UseFunctionInvocation() handles this automatically:
            //   LLM response → detect tool call → execute via MCP → inject result → LLM continues
            var response = await chatClient.CompleteAsync(
                messages,
                new ChatOptions
                {
                    Tools = tools,
                    ToolChoice = ChatToolChoice.Auto  // LLM decides when to use tools
                });

            // Show what happened
            foreach (var content in response.Message.Contents)
            {
                switch (content)
                {
                    case TextContent text:
                        Console.WriteLine($"Agent: {text.Text}");
                        break;

                    case FunctionCallContent call:
                        Console.WriteLine($"  → Calling MCP tool: {call.Name}");
                        Console.WriteLine($"    Args: {JsonSerializer.Serialize(call.Arguments)}");
                        break;

                    case FunctionResultContent result:
                        Console.WriteLine($"  ← Tool result: {result.Result}");
                        break;
                }
            }

            messages.Add(response.Message);
            Console.WriteLine();
        }
    }
}

// ============================================================
// PART 3 — MANUAL AGENT LOOP (no SDK magic)
// This shows exactly what happens under the hood,
// without the auto function invocation middleware.
// ============================================================

public class ManualAgentLoop
{
    public static async Task RunManualLoopAsync(
        IChatClient llm,
        IMcpClient mcpClient,
        string userPrompt)
    {
        Console.WriteLine("=== Manual MCP Agent Loop ===");

        // Fetch tool manifest from MCP server
        var mcpTools = await mcpClient.ListToolsAsync();
        var aiTools  = mcpTools.Select(t => t.AsAIFunction()).ToList();

        var messages = new List<ChatMessage>
        {
            new(ChatRole.System,
                "You are a helpful agent. Use the provided tools when needed."),
            new(ChatRole.User, userPrompt)
        };

        int maxSteps = 5;

        while (maxSteps-- > 0)
        {
            // Step A: Ask LLM what to do next
            var response = await llm.CompleteAsync(messages,
                new ChatOptions { Tools = aiTools });

            messages.Add(response.Message);

            // Step B: Check if LLM wants to call a tool
            var toolCalls = response.Message.Contents
                .OfType<FunctionCallContent>()
                .ToList();

            if (toolCalls.Count == 0)
            {
                // No tool calls — LLM has a final answer
                var finalText = response.Message.Contents
                    .OfType<TextContent>()
                    .FirstOrDefault()?.Text;
                Console.WriteLine($"Final answer: {finalText}");
                break;
            }

            // Step C: Execute each tool call via MCP
            foreach (var call in toolCalls)
            {
                Console.WriteLine($"LLM wants to call: {call.Name}");

                // MCP dispatches to the right server + tool
                var result = await mcpClient.CallToolAsync(
                    call.Name,
                    call.Arguments?.ToDictionary(
                        kv => kv.Key,
                        kv => (object?)kv.Value) ?? []);

                Console.WriteLine($"MCP result: {result.Content.FirstOrDefault()?.ToString()}");

                // Step D: Inject result back into conversation
                messages.Add(new ChatMessage(ChatRole.Tool,
                    result.Content.FirstOrDefault()?.ToString() ?? "")
                {
                    AdditionalProperties = new() { ["toolCallId"] = call.CallId! }
                });
            }

            // Loop back — LLM now sees the tool results and continues
        }
    }
}
```

Workflows

Now you know what MCP is. In a workflow you can build a process-diagram-like flow where MCP-equipped agents are connected in sequence; between each agent, a prompt passes context along.

In Foundry this is all visualized, and a YAML file can serve as the workflow's blueprint. In code it looks like this:

// Each agent is created with its own MCP tools
var researchAgent = await agentClient.CreateAgentAsync(
    model: "gpt-4o",
    name: "ResearchAgent",
    instructions: "You research topics using web search. " +
                  "Always cite your sources.",
    tools: [ mcpWebSearchTool ]   // ← MCP server attached here
);

var writerAgent = await agentClient.CreateAgentAsync(
    model: "gpt-4o",
    name: "WriterAgent",
    instructions: "You write structured reports based on " +
                  "research provided to you.",
    tools: [ mcpDocumentTool ]    // ← different MCP server
);

// The workflow thread passes context between agents
var thread = await agentClient.CreateThreadAsync();

// Step 1 — Research agent runs with its MCP tools
await agentClient.CreateMessageAsync(thread.Id,
    MessageRole.User, userPrompt);

var researchRun = await agentClient.CreateRunAsync(
    thread.Id, researchAgent.Id);
await researchRun.WaitForCompletionAsync();

var researchOutput = await GetLastMessageAsync(thread.Id);

// Step 2 — Writer agent receives research output as input
await agentClient.CreateMessageAsync(thread.Id,
    MessageRole.User,
    $"Based on this research, write a report:\n\n{researchOutput}");

var writerRun = await agentClient.CreateRunAsync(
    thread.Id, writerAgent.Id);
await writerRun.WaitForCompletionAsync();
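The snippet above calls a `GetLastMessageAsync` helper that isn't defined. Here is a sketch in the same loose style — the message-listing call and content types are assumptions, so check the exact surface of the Agents SDK version you're on:

```csharp
// Hypothetical helper: return the text of the newest message on the thread,
// so one agent's output can be handed to the next agent as context.
async Task<string> GetLastMessageAsync(string threadId)
{
    // Assumed API: newest-first listing of thread messages
    var messages = await agentClient.GetMessagesAsync(threadId);
    var newest = messages.First();

    // Concatenate the text parts of the message
    return string.Join("\n",
        newest.ContentItems
              .OfType<MessageTextContent>()
              .Select(c => c.Text));
}
```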