Semantic Index: What Is It?
This post gives you a practical, no‑nonsense overview of semantic indexing and vector search so you can tell when (and how) to use it.
What is a semantic index?
A semantic index is a searchable “map” of meaning built from your content using vectors (numerical representations of text). Instead of matching exact words like a traditional keyword index, a semantic index groups and retrieves items by similarity in meaning.
- Keyword search: “laptop” matches only documents containing the word “laptop”.
- Semantic search: “lightweight notebook for travel” can also find content about “ultrabooks” or “portable laptops” because the meanings are close.
Under the hood, a semantic index stores an embedding vector for each chunk/document and uses a similarity metric (most commonly cosine similarity) to find the closest items to your query’s vector.
What is a vector? (with text examples)
A vector is just a list of numbers. Embedding models turn text into vectors so that semantic closeness becomes geometric closeness.
Examples of short texts (conceptual only; numbers are illustrative):
- “I like coffee” → [0.62, 0.11, -0.28, 0.44, …]
- “I enjoy espresso” → [0.60, 0.08, -0.25, 0.47, …] ← very close in meaning
- “The cat sits on the mat” → [-0.12, 0.91, 0.33, -0.05, …] ← unrelated topic
The closer two vectors are, the more similar the meanings. Cosine similarity is a common way to measure this closeness.
Tiny cosine example (C#)
using System;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| · |b|). Close to 1 for near-identical
// directions (similar meaning), near 0 for unrelated vectors. Assumes
// non-zero vectors.
static double Cosine(double[] a, double[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vector lengths differ");
    double dot = a.Zip(b, (x, y) => x * y).Sum(); // element-wise products, summed
    double na = Math.Sqrt(a.Sum(x => x * x));     // magnitude of a
    double nb = Math.Sqrt(b.Sum(x => x * x));     // magnitude of b
    return dot / (na * nb);
}

// The illustrative vectors from above: v1 and v2 paraphrase each other,
// v3 is about an unrelated topic.
var v1 = new double[] { 0.62, 0.11, -0.28, 0.44 };
var v2 = new double[] { 0.60, 0.08, -0.25, 0.47 };
var v3 = new double[] { -0.12, 0.91, 0.33, -0.05 };

Console.WriteLine($"sim(v1, v2) = {Cosine(v1, v2):F3}"); // high → similar meaning
Console.WriteLine($"sim(v1, v3) = {Cosine(v1, v3):F3}"); // low → different topic
How does search work in a vector database?
At a high level (a runnable end‑to‑end sketch follows the query‑time flow below):
- Chunk your content (e.g., paragraphs, sections, pages).
- Generate an embedding vector for each chunk with a text embedding model.
- Store vectors (plus metadata like title, URL, tags) in a vector database.
- At query time, embed the user’s query into a vector.
- Retrieve the top‑k nearest vectors by similarity (approximate nearest‑neighbor search such as HNSW, IVF, or ScaNN).
- (Optional) Re‑rank and filter with metadata, apply hybrid (keyword + vector) search, or pass top results to an LLM for answer synthesis.
Query-time flow
- Input: “best travel laptop with long battery life”
- Embed query → q⃗
- Vector DB returns nearest chunks about “ultrabooks”, “portable notebooks”, “battery life tests”.
- You display results or feed the retrieved text to an LLM for a grounded answer.
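Putting it together, here is a minimal end‑to‑end sketch in C#. The vectors are hard‑coded stand‑ins for real embeddings, and the brute‑force scan stands in for a vector database's ANN index; everything else mirrors the steps above.

using System;
using System.Linq;

// Same cosine similarity as in the earlier example.
static double Cosine(double[] a, double[] b) =>
    a.Zip(b, (x, y) => x * y).Sum()
    / (Math.Sqrt(a.Sum(x => x * x)) * Math.Sqrt(b.Sum(x => x * x)));

// Indexing: one (metadata, vector) entry per chunk. In a real pipeline the
// vectors come from an embedding model; these numbers are illustrative.
var index = new (string Title, double[] Vector)[]
{
    ("Ultrabook buying guide",    new[] { 0.61, 0.10, -0.27, 0.45 }),
    ("Laptop battery life tests", new[] { 0.55, 0.20, -0.10, 0.50 }),
    ("Cat care basics",           new[] { -0.10, 0.90, 0.30, -0.04 }),
};

// Query time: embed the query (faked here), score every chunk, keep the top k.
// A vector database replaces this full scan with ANN search (e.g., HNSW).
var queryVector = new[] { 0.62, 0.11, -0.28, 0.44 }; // stand-in for embedding the query text
var topK = index
    .Select(e => (e.Title, Score: Cosine(queryVector, e.Vector)))
    .OrderByDescending(r => r.Score)
    .Take(2);

foreach (var (title, score) in topK)
    Console.WriteLine($"{score:F3}  {title}"); // display, or hand the chunks to an LLM

The two laptop entries win and the cat entry scores lowest, which is exactly the behavior the flow above describes; what a real vector database adds is avoiding the full scan at scale.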
What is an embedding?
An embedding is a dense vector representation of input text produced by a specialized model. Good embeddings place semantically similar texts near each other in vector space.
Key properties (a short sketch follows this list):
- Dimensionality: typical sizes range from a few hundred to several thousand numbers per text.
- Domain: general-purpose embeddings work broadly; domain‑tuned embeddings (e.g., code, legal, medical) capture niche semantics better.
- Stability: for a fixed model and input, embeddings are deterministic, so you can compute them once and cache them.
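To make that contract concrete, here is a toy sketch. The Embed function below is a hypothetical stand‑in, not a real model call; only its shape (text in, fixed‑length vector out, deterministic) mirrors the properties above.

using System;
using System.Linq;

// Hypothetical stand-in for a real embedding model call. It is seeded from
// the input's characters, so the same text always yields the same vector,
// but unlike a real model it captures no meaning.
static double[] Embed(string text)
{
    var rng = new Random(text.Sum(c => (int)c));
    return Enumerable.Range(0, 8).Select(_ => rng.NextDouble() * 2 - 1).ToArray();
}

var a = Embed("I like coffee");
var b = Embed("I like coffee");
Console.WriteLine(a.Length);           // dimensionality: 8 here; real models use hundreds or more
Console.WriteLine(a.SequenceEqual(b)); // True — same model + same input → same vector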
Why is a text embedding different from an LLM?
They solve different problems and are trained with different objectives:
- Purpose:
- Embedding model → representation learning for similarity, clustering, classification, retrieval.
- LLM (chat/completion) → generative modeling to produce fluent text.
- I/O shape (see the interface sketch after this list):
- Embedding model → input text, output vector (numbers).
- LLM → input prompt, output text tokens.
- Objective:
- Embedding → place semantically similar items close together; optimize contrastive/similarity loss.
- LLM → predict next token; optimize language modeling loss.
- Usage:
- Embedding → searching, deduplication, recommendations, RAG retrieval.
- LLM → answering, summarizing, transforming text.
- Cost/latency:
- Embedding inference is usually cheaper/faster than full LLM generation.
Think of embeddings as the “memory index” and LLMs as the “reasoning/writing engine.” In RAG systems, embeddings help you find the right facts; the LLM helps explain or compose them.
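One way to see the split is in the call signatures. These interface names are illustrative, not a real SDK:

// Representation: text goes in, a fixed-length vector comes out.
public interface IEmbeddingModel
{
    float[] Embed(string text);
}

// Generation: a prompt goes in, newly generated text comes out.
public interface IChatModel
{
    string Complete(string prompt);
}

In a RAG pipeline you typically call IEmbeddingModel many times (once per chunk at indexing time, once per query) and IChatModel once per answer.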
When should I use a semantic index?
Use a semantic index when:
- Keyword search misses relevant content due to synonyms or phrasing.
- You need similarity search (“find things like this”).
- You’re building RAG pipelines and must retrieve the right context for the LLM.
Complement it with keyword search (see the hybrid sketch after this list) when:
- You need strict term matching (IDs, code symbols, exact phrases).
- Legal or compliance requirements demand exact matches.
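When you need both behaviors, hybrid search blends them. A minimal sketch, assuming you already have a cosine score per document; the keyword score here is simple term overlap, whereas production systems usually use BM25 and rank fusion such as RRF:

using System;
using System.Linq;

// Blend semantic similarity with exact-term overlap.
// alpha = 1.0 → pure vector search; alpha = 0.0 → pure keyword matching.
static double HybridScore(double cosineSim, string query, string doc, double alpha = 0.7)
{
    var queryTerms = query.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);
    var docTerms = doc.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries).ToHashSet();

    // Fraction of query terms that appear verbatim in the document.
    double keywordScore = queryTerms.Length == 0
        ? 0
        : queryTerms.Count(t => docTerms.Contains(t)) / (double)queryTerms.Length;

    return alpha * cosineSim + (1 - alpha) * keywordScore;
}

// Low alpha lets the exact ID match dominate the ranking:
Console.WriteLine(HybridScore(0.85, "SKU-12345 battery", "replacement battery for SKU-12345", alpha: 0.3));

For ID-style queries like the one above, lowering alpha makes the verbatim match outweigh semantic similarity; for conversational queries, a higher alpha lets meaning dominate.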
Key takeaways
- A semantic index stores vector embeddings of your content and retrieves by meaning, not exact words.
- Embeddings are numeric representations optimized for similarity; LLMs generate text.
- Vector databases power fast similarity search and pair well with keyword search and re‑ranking.
- In RAG, embeddings do the “recall,” LLMs do the “reasoning and writing.”