Chat with OpenAI and Azure OpenAI
In this first chat post we start right where most projects do: connecting to OpenAI and Azure OpenAI, understanding the differences, and securing access with Managed Identity.
1) Start a chat: OpenAI and Azure OpenAI
Both services expose a chat API. The biggest day‑one differences are how you authenticate and, on Azure OpenAI, that you call a model by its deployment name (not the raw model name).
OpenAI (API key)
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "system", content: "You are a concise, helpful assistant." },
    { role: "user", content: "Give me two prompt-writing tips." }
  ],
  stream: false
});

console.log(response.choices[0]?.message?.content);
Azure OpenAI (API key)
On Azure you deploy a model (for example “gpt-4o-mini”) and give it a deployment name, like chat-gpt4o-mini. You call that deployment on your Azure endpoint.
import { OpenAIClient, AzureKeyCredential } from "@azure/openai";

const endpoint = process.env.AZURE_OPENAI_ENDPOINT!; // e.g. https://my-aoai.openai.azure.com
const deployment = process.env.AZURE_OPENAI_DEPLOYMENT!; // e.g. chat-gpt4o-mini
const client = new OpenAIClient(endpoint, new AzureKeyCredential(process.env.AZURE_OPENAI_API_KEY!));

const result = await client.getChatCompletions(deployment, [
  { role: "system", content: "You are a concise, helpful assistant." },
  { role: "user", content: "Give me two prompt-writing tips." }
]);

console.log(result.choices?.[0]?.message?.content);
Azure OpenAI with Managed Identity (System- or User‑Assigned)
With Azure OpenAI you can remove API keys entirely by using Azure AD (Entra ID). This is perfect in Azure App Service, Functions, Container Apps, AKS, etc.
Requirements:
- Assign the identity the role “Cognitive Services OpenAI User” on the Azure OpenAI resource.
- For a user-assigned managed identity, set AZURE_CLIENT_ID in your app to the identity’s client ID.
import { OpenAIClient } from "@azure/openai";
import { DefaultAzureCredential } from "@azure/identity";

const endpoint = process.env.AZURE_OPENAI_ENDPOINT!; // e.g. https://my-aoai.openai.azure.com

// DefaultAzureCredential resolves a system-assigned identity automatically in Azure.
// For a user-assigned identity, set AZURE_CLIENT_ID to that identity's client ID.
const credential = new DefaultAzureCredential();
const client = new OpenAIClient(endpoint, credential);

const result = await client.getChatCompletions(
  process.env.AZURE_OPENAI_DEPLOYMENT!,
  [
    { role: "system", content: "You are a concise, helpful assistant." },
    { role: "user", content: "Give me two prompt-writing tips." }
  ]
);

console.log(result.choices?.[0]?.message?.content);
Optional: Semantic Kernel (C#) with Azure OpenAI and OpenAI
If you’re building with Semantic Kernel (SK), you can plug either service with a few lines and keep the rest of your app unchanged.
Azure OpenAI (Managed Identity):

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Azure.Identity;

var builder = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion(
        deploymentName: Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT")!,
        endpoint: Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!, // a string, not a Uri
        credentials: new DefaultAzureCredential());
Kernel kernel = builder.Build();

var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddSystemMessage("You are a concise, helpful assistant.");
history.AddUserMessage("Give me two prompt-writing tips.");

await foreach (var chunk in chat.GetStreamingChatMessageContentsAsync(history, kernel: kernel))
{
    if (!string.IsNullOrEmpty(chunk.Content)) Console.Write(chunk.Content);
}
OpenAI (API key):

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

var builder = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(
        modelId: "gpt-4o-mini",
        apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
Kernel kernel = builder.Build();

var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddSystemMessage("You are a concise, helpful assistant.");
history.AddUserMessage("Give me two prompt-writing tips.");

var reply = await chat.GetChatMessageContentAsync(history);
Console.WriteLine(reply);
How streaming works (and why it feels fast)
When you enable streaming, the model sends partial tokens as soon as they’re generated instead of waiting for the full sentence. Under the hood this is an HTTP response that stays open and flushes small “delta” chunks. You render each chunk immediately for a snappy UI.
- The request includes your messages (system/user/assistant) and optional parameters (temperature, max tokens, etc.).
- The server tokenizes your prompt (input tokens), runs inference, and streams output tokens as they are produced.
- You reconstruct the final answer by concatenating deltas in order.
- The connection ends with a finish event indicating why the model stopped (e.g., stop sequence, max tokens, end-of-turn).
OpenAI (Node/TS) — streaming
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const stream = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "system", content: "You are a concise, helpful assistant." },
    { role: "user", content: "Explain streaming and tokens in two bullets." }
  ],
  stream: true,
  stream_options: { include_usage: true } // ask for usage stats in the final chunk
});

let fullText = "";
let usage;
for await (const part of stream) {
  const delta = part.choices?.[0]?.delta?.content ?? "";
  if (delta) {
    fullText += delta;
    process.stdout.write(delta);
  }
  // The final chunk carries usage (and an empty choices array).
  if (part.usage) usage = part.usage;
}

console.log("\n\nusage:", usage);
Azure OpenAI (Node/TS) — streaming
import { OpenAIClient, AzureKeyCredential } from "@azure/openai";

const endpoint = process.env.AZURE_OPENAI_ENDPOINT!;
const deployment = process.env.AZURE_OPENAI_DEPLOYMENT!; // e.g. chat-gpt4o-mini
const client = new OpenAIClient(endpoint, new AzureKeyCredential(process.env.AZURE_OPENAI_API_KEY!));

// streamChatCompletions returns an async iterable of partial results.
const events = await client.streamChatCompletions(deployment, [
  { role: "system", content: "You are a concise, helpful assistant." },
  { role: "user", content: "Explain streaming and tokens in two bullets." }
]);

let fullText = "";
for await (const event of events) {
  for (const choice of event.choices ?? []) {
    const delta = choice.delta?.content ?? "";
    if (delta) {
      fullText += delta;
      process.stdout.write(delta);
    }
  }
}
Tip: In browsers and edge runtimes you can convert the async iterator to a ReadableStream and pipe it to your UI progressively.
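That conversion can be sketched as follows (assuming a runtime with the Web Streams API — modern browsers, edge runtimes, and Node 18+; toReadableStream is a hypothetical helper name, not part of either SDK):

```typescript
// Convert any async iterable of text deltas into a ReadableStream<string>.
// Works with the chunk iterators from both SDKs once you extract the delta text.
function toReadableStream(deltas: AsyncIterable<string>): ReadableStream<string> {
  const iterator = deltas[Symbol.asyncIterator]();
  return new ReadableStream<string>({
    async pull(controller) {
      const { value, done } = await iterator.next();
      if (done) {
        controller.close();
      } else {
        controller.enqueue(value);
      }
    },
    async cancel() {
      // Stop the upstream iterator if the consumer aborts (e.g. user navigates away).
      await iterator.return?.();
    }
  });
}
```

From there you can pipe the stream into a server response or consume it with a reader to update the UI chunk by chunk.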
Token accounting: prompt vs completion
Every call spends tokens in two buckets:
- Prompt tokens (input): tokens used by system+user+assistant history you send.
- Completion tokens (output): tokens generated by the model.
APIs usually return usage numbers on the final response:
const res = await client.chat.completions.create({ model: "gpt-4o-mini", messages, stream: false });
console.log(res.usage); // { prompt_tokens, completion_tokens, total_tokens }
For streaming, OpenAI reports usage at the end of the stream (request it with stream_options: { include_usage: true }); some older SDKs only report usage on non-streaming calls. If you must estimate in real time, tokenize locally (e.g., with a tiktoken-compatible tokenizer) to approximate prompt size and running output length, then reconcile with the server-reported usage after completion.
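As a dependency-free sketch, the common "~4 characters per token" rule of thumb for English text gives a rough live estimate; the per-message overhead below is an assumption, not an API contract, so always reconcile with the server-reported usage:

```typescript
// Rough token estimate: ~4 characters per token is a common rule of thumb for
// English text. Real counts come from the model's tokenizer (e.g. a
// tiktoken-compatible library) or from the usage object on the final response.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Estimate the prompt side before sending: sum over every message you include.
function estimatePromptTokens(messages: { role: string; content: string }[]): number {
  // A few extra tokens per message cover role markers and separators (rough guess).
  const perMessageOverhead = 4;
  return messages.reduce(
    (sum, m) => sum + estimateTokens(m.content) + perMessageOverhead,
    0
  );
}
```

This is good enough to warn a user that a long conversation is approaching a context limit, but not for billing math.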
Practical tips:
- Keep prompts compact; move long docs into a retriever and send only relevant chunks.
- Set a max_tokens ceiling to bound latency and spend.
- Stream to improve perceived latency; you can render immediately while the tail completes.
OpenAI vs Azure OpenAI — what actually differs
- Authentication
- OpenAI: API key only.
- Azure OpenAI: API key or Azure AD (Managed Identity / service principal) with RBAC.
- Endpoint & model identifiers
- OpenAI: model: "gpt-4o-mini" (a global model name).
- Azure OpenAI: endpoint + deployment name (you choose the deployment name when you deploy a model in Azure AI Foundry).
- Compliance, data handling, regions
- Azure OpenAI runs in Azure regions with enterprise controls (SLA, compliance attestations, data residency, customer-managed keys, private networking via Private Link).
- Content filters and abuse monitoring
- Azure OpenAI includes Azure AI Content Safety integration and enterprise guardrails options.
- Rate limits & availability
- Quotas and limits are resource‑scoped in Azure; OpenAI limits are account‑scoped. Model availability can differ temporarily between the two.
- Pricing
- Pricing differs; check the respective calculators/portals for latest rates.
Azure model deployment 101 (Azure OpenAI and Azure AI Model Catalog)
- Create an Azure OpenAI resource (access required). Choose a region.
- In Azure AI Foundry, deploy a model (e.g., GPT‑4o). Give it a deployment name.
- Call the deployment via your endpoint, https://<resource>.openai.azure.com/, using the deployment name. Mind the api-version your SDK uses.
- Secure access:
- API keys: simple to start; rotate them in Key Vault.
- Managed Identity: assign the role “Cognitive Services OpenAI User” to the identity; your code uses DefaultAzureCredential.
- Network: lock down with Private Link and deny public network access.
For OSS models (e.g., Llama 3, Phi‑3), use the Azure AI Model Catalog to create a managed or serverless endpoint. The SDKs differ from Azure OpenAI's, but the flow is the same: deploy → get endpoint → call with Entra ID.
Quick troubleshooting
- 401/403 on Azure: identity lacks “Cognitive Services OpenAI User” on the AOAI resource or you’re calling the wrong endpoint/region.
- 404 on Azure: you used a model name where a deployment name is required.
- Throttling: respect per‑minute/token limits; use retry‑after headers and small backoff.
- SK: ensure the right connector method is used (OpenAI vs Azure OpenAI) and that the deployment name is correct.
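For the throttling case, a minimal backoff helper might look like this (a sketch assuming the SDK error exposes a status code and a retry-after header — withRetry is a hypothetical name; the official SDKs also ship configurable built-in retries):

```typescript
// Retry a request on 429/503, honoring the server's Retry-After header when
// present, otherwise falling back to exponential backoff with jitter.
async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 4
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err: any) {
      const status = err?.status ?? err?.statusCode;
      const retryable = status === 429 || status === 503;
      if (!retryable || attempt >= maxAttempts) throw err;

      // Prefer the server-suggested delay (in seconds) if the error exposes it.
      const retryAfter = Number(err?.headers?.["retry-after"]);
      const backoffMs = Number.isFinite(retryAfter)
        ? retryAfter * 1000
        : Math.min(1000 * 2 ** (attempt - 1), 8000) + Math.random() * 250;
      await new Promise((resolve) => setTimeout(resolve, backoffMs));
    }
  }
}
```

Usage is a thin wrapper around any call: `const res = await withRetry(() => client.chat.completions.create({ ... }))`.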
References
- Semantic Kernel chat completion (C#): https://learn.microsoft.com/semantic-kernel/concepts/ai-services/chat-completion/
- Azure OpenAI SDK (JS/TS): https://learn.microsoft.com/azure/ai-services/openai/quickstart?pivots=programming-language-javascript
- Azure OpenAI SDK (C#): https://learn.microsoft.com/azure/ai-services/openai/quickstart?pivots=programming-language-csharp
- Managed Identity with Azure SDKs: https://learn.microsoft.com/azure/developer/intro/managed-identity-service-principal
- Azure AI Model Catalog: https://learn.microsoft.com/azure/ai-studio/how-to/model-catalog-overview