Semantic Index with Azure Cosmos DB

Azure Cosmos DB for NoSQL now includes an integrated vector database. That means you can store your JSON data and its embeddings side‑by‑side, index vectors with high‑performance algorithms like DiskANN, and run similarity search with the same SQL syntax you already use.

In this post:

  • What Azure Cosmos DB is (quick primer)
  • What DiskANN (ANN on Disk) is and why it matters for vector search
  • How I use Cosmos DB vectors for chat sessions and “memory”
  • How to use Cosmos DB as a vector DB in C# (create index, insert data, query with VectorDistance)
  • Conclusion

What is Azure Cosmos DB (quick primer)

Azure Cosmos DB is Microsoft’s globally distributed, multi‑model database with elastic scale and low‑latency reads/writes. In the NoSQL (Core) API you work with JSON documents, partition keys, and a powerful automatic indexing engine. With the vector search capability enabled, Cosmos DB can also store, index, and query high‑dimensional embeddings directly in your container—no separate vector store required.

Key traits you benefit from as a vector store:

  • Global distribution, geo‑replication, and SLAs for availability/latency
  • Familiar NoSQL JSON model with partitioning and filters combined with vector search
  • Integrated vector indexes: flat, quantizedFlat, and DiskANN
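
To make the integrated-index idea concrete, here is a sketch of the two container-level policies as they appear in JSON (the path, dimensions, and index type are illustrative; the same settings are configured through the .NET SDK later in this post):

```json
{
  "vectorEmbeddingPolicy": {
    "vectorEmbeddings": [
      {
        "path": "/contentVector",
        "dataType": "float32",
        "distanceFunction": "cosine",
        "dimensions": 1536
      }
    ]
  },
  "indexingPolicy": {
    "vectorIndexes": [
      { "path": "/contentVector", "type": "diskANN" }
    ]
  }
}
```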

DiskANN (aka ANN on Disk): what it is and why it’s useful

DiskANN is a family of approximate nearest‑neighbor (ANN) algorithms created by Microsoft Research that builds a disk‑backed graph index. In Azure Cosmos DB, choosing the diskANN vector index type gives you:

  • High recall with low latency and low RU costs at large scale
  • Excellent performance when you have more than ~50k vectors per physical partition
  • Up to 4096 dimensions supported (like quantizedFlat)

Trade‑offs and tips:

  • Requires at least ~1,000 vectors to kick in (fewer → falls back to a full scan)
  • Index builds take time for very large ingestions; insert in batches
  • If your dataset is small or heavily filtered, quantizedFlat can be a great fit
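
The batching tip above can be sketched like this, assuming a `documents` collection of chat items and `endpoint`/`key` placeholders. `AllowBulkExecution` is a real `CosmosClientOptions` flag that lets the SDK pack many point operations into fewer requests:

```csharp
using Microsoft.Azure.Cosmos;

// Bulk mode: the SDK groups concurrent point operations per partition
var client = new CosmosClient(endpoint, key,
    new CosmosClientOptions { AllowBulkExecution = true });
Container container = client.GetContainer("chatdb", "chats");

// Insert in batches so the DiskANN graph index builds incrementally
foreach (var batch in documents.Chunk(100))
{
    var tasks = batch.Select(doc =>
        container.CreateItemAsync(doc, new PartitionKey(doc.sessionId)));
    await Task.WhenAll(tasks);
}
```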

References: DiskANN research paper, Cosmos DB vector index types, and limits (flat 505 dims; quantizedFlat/diskANN 4096 dims).

How I use vectors here (personal touch)

In this series I use Cosmos DB vectors mainly for chat sessions:

  • Each chat session receives its own vector (embedding of the conversation summary) so I can find the right session fast.
  • The full chat history is stored as “memory knowledge.” I also extract small per‑conversation memory facts about the user (preferences, entities, projects) and store them separately. These memories help steer prompts and personalize responses over time.
  • More on extraction and usage in the memory section of the series.

This design keeps operational chat data, vectors, and user‑level memories together in one place.
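
As a sketch, a per-conversation memory fact might live in a small document of its own with its own embedding (this shape is my illustration, not a fixed schema):

```json
{
  "id": "mem-001",
  "sessionId": "abc-123",
  "type": "memoryFact",
  "fact": "User prefers concise answers and is working on a Fabric rollout",
  "factVector": [ ]
}
```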

Use Azure Cosmos DB as a vector DB (C#)

You can create a container with a vector embedding policy and add a diskANN (or quantizedFlat) vector index. Then insert documents with vectors and query with VectorDistance.

Prerequisites:

  • Azure Cosmos DB for NoSQL account with Vector Search enabled
  • Microsoft.Azure.Cosmos .NET SDK v3.45.0+ (3.46.0‑preview+ for the latest vector features)
  • An embeddings service (e.g., Azure OpenAI) to generate vectors
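
Vector search is an account-level feature. You can turn it on in the portal's Features blade, or (assuming the `EnableNoSQLVectorSearch` capability name) via the Azure CLI:

```shell
az cosmosdb update \
  --resource-group <resource-group> \
  --name <account> \
  --capabilities EnableNoSQLVectorSearch
```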

1) Define vector policy and create the container (DiskANN)

using Microsoft.Azure.Cosmos;
using System.Collections.ObjectModel;

var client = new CosmosClient("https://<account>.documents.azure.com:443/", "<key>");

Database db = await client.CreateDatabaseIfNotExistsAsync("chatdb");

// Describe which fields contain vectors and how to compare them
var embeddings = new List<Embedding>
{
  new Embedding
  {
    Path = "/contentVector",             // vector property path
    DataType = VectorDataType.Float32,     // element type
    DistanceFunction = DistanceFunction.Cosine, // cosine | dotproduct | euclidean
    Dimensions = 1536                      // match your embedding model
  }
};

var containerProps = new ContainerProperties(id: "chats", partitionKeyPath: "/sessionId")
{
  VectorEmbeddingPolicy = new(new Collection<Embedding>(embeddings)),
  IndexingPolicy = new IndexingPolicy
  {
    // Add a DiskANN vector index for the vector path
    VectorIndexes =
    {
      new VectorIndexPath
      {
        Path = "/contentVector",
        Type = VectorIndexType.DiskANN
      }
    }
  }
};

// General index includes everything…
containerProps.IndexingPolicy.IncludedPaths.Add(new IncludedPath { Path = "/*" });
// …but exclude the vector path from the classic index to optimize writes
containerProps.IndexingPolicy.ExcludedPaths.Add(new ExcludedPath { Path = "/contentVector/*" });

Container container = await db.CreateContainerIfNotExistsAsync(containerProps, throughput: 1000);

Notes:

  • Use VectorIndexType.DiskANN for large collections; VectorIndexType.QuantizedFlat can be great for smaller or heavily‑filtered searches.
  • The vector embedding policy must be defined when the container is created; it cannot be added to an existing container later.

2) Insert a chat session with an embedding

using Microsoft.Azure.Cosmos;

// Example “chat session” document with a vector
var chat = new
{
  id = Guid.NewGuid().ToString("N"),
  sessionId = "abc-123",                 // partition key
  title = "Fabric onboarding with Teams",
  lastMessageAt = DateTimeOffset.UtcNow,
  content = "We discussed rollout steps and governance.",
  // 1536-dim embedding for the chat summary (shortened for brevity)
  contentVector = new float[] { /* ... your 1536 floats ... */ }
};

await container.CreateItemAsync(chat, new PartitionKey(chat.sessionId));

Tip: Generate contentVector with your embedding model (for example, Azure OpenAI text‑embedding‑3‑small at 1536 dimensions). The same model must be used at index time and at query time.
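
For completeness, generating the embedding with the Azure OpenAI .NET SDK looks roughly like this (a sketch based on the Azure.AI.OpenAI 2.x client shape; endpoint, key, and deployment name are placeholders):

```csharp
using Azure;
using Azure.AI.OpenAI;
using OpenAI.Embeddings;

var aoai = new AzureOpenAIClient(
    new Uri("https://<resource>.openai.azure.com/"),
    new AzureKeyCredential("<key>"));

// Deployment of text-embedding-3-small (1536 dimensions)
EmbeddingClient embedder = aoai.GetEmbeddingClient("text-embedding-3-small");

var result = await embedder.GenerateEmbeddingAsync(
    "We discussed rollout steps and governance.");

// ReadOnlyMemory<float> -> float[] for the contentVector property
float[] contentVector = result.Value.ToFloats().ToArray();
```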

3) Query by similarity with VectorDistance

using Microsoft.Azure.Cosmos;

// Query for the top 5 most similar sessions to a query embedding
float[] queryEmbedding = /* your 1536‑dim vector from the user query */;

var sql = @"SELECT TOP 5 c.id, c.sessionId, c.title,
          VectorDistance(c.contentVector, @q) AS SimilarityScore
       FROM c
       ORDER BY VectorDistance(c.contentVector, @q)";

var qd = new QueryDefinition(sql).WithParameter("@q", queryEmbedding);

using FeedIterator<dynamic> feed = container.GetItemQueryIterator<dynamic>(qd, requestOptions: new QueryRequestOptions
{
  // Optional: scope by partition or filters, which also speeds up vector search
  PartitionKey = new PartitionKey("abc-123")
});

while (feed.HasMoreResults)
{
  foreach (var item in await feed.ReadNextAsync())
  {
    Console.WriteLine($"{item.id}  |  score: {item.SimilarityScore}");
  }
}

Why this works well:

  • You can combine vector search with normal filters (time ranges, session/user IDs, tags)
  • You keep vectors and original JSON together (one write, one place to secure and back up)
  • DiskANN gives great performance as your chat set grows
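
The filter combination from the first bullet can be sketched as a query that narrows candidates before ranking (property names as in the examples above; `@since` is a hypothetical parameter for a time-range filter):

```csharp
using Microsoft.Azure.Cosmos;

// Filter first (partition key + time range), then rank by similarity
var sql = @"SELECT TOP 5 c.id, c.title,
          VectorDistance(c.contentVector, @q) AS SimilarityScore
       FROM c
       WHERE c.sessionId = @sessionId AND c.lastMessageAt >= @since
       ORDER BY VectorDistance(c.contentVector, @q)";

var qd = new QueryDefinition(sql)
    .WithParameter("@q", queryEmbedding)
    .WithParameter("@sessionId", "abc-123")
    .WithParameter("@since", DateTimeOffset.UtcNow.AddDays(-30));
```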

Conclusion

Cosmos DB’s integrated vector store is powerful and flexible—especially with DiskANN when your collection grows large. In this project I mainly use it for chat sessions and memory: each session carries a vector for fast recall, the chat history becomes a memory knowledge source, and I extract per‑conversation user memories for personalization.

I haven’t used Cosmos DB yet as the primary production knowledge source for RAG in this series, but technically it works and scales well if you choose your vector index type appropriately and match your embedding model across indexing and query time.

Useful links: