Semantic Index: What Is It?

This post gives you a practical, no‑nonsense overview of semantic indexing and vector search so you can tell when (and how) to use it.

What is a semantic index?

A semantic index is a searchable “map” of meaning built from your content using vectors (numerical representations of text). Instead of matching exact words like a traditional keyword index, a semantic index groups and retrieves items by similarity in meaning.

  • Keyword search: “laptop” matches only documents containing the word “laptop”.
  • Semantic search: “lightweight notebook for travel” can also find content about “ultrabooks” or “portable laptops” because the meanings are close.

Under the hood, a semantic index stores an embedding vector for each chunk/document and uses a similarity metric (most commonly cosine similarity) to find the closest items to your query’s vector.

What is a vector? (with text examples)

A vector is just a list of numbers. Embedding models turn text into vectors so that semantic closeness becomes geometric closeness.

Examples of short texts (conceptual only; numbers are illustrative):

  • “I like coffee” → [0.62, 0.11, -0.28, 0.44, …]
  • “I enjoy espresso” → [0.60, 0.08, -0.25, 0.47, …] ← very close in meaning
  • “The cat sits on the mat” → [-0.12, 0.91, 0.33, -0.05, …] ← unrelated topic

The closer two vectors are, the more similar the meanings. Cosine similarity is a common way to measure this closeness.
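
Concretely, the cosine similarity of two vectors a and b is their dot product divided by the product of their lengths:

  cosine(a, b) = (a · b) / (‖a‖ × ‖b‖)

It ranges from -1 (opposite directions) to 1 (pointing the same way); for text embeddings, higher means more similar. The snippet below implements exactly this.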

Tiny cosine example (C#)

using System;
using System.Linq;

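// Cosine similarity: dot(a, b) / (‖a‖ · ‖b‖).
// Returns a value in [-1, 1]; higher means the vectors point in more similar directions.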
static double Cosine(double[] a, double[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vector lengths differ");
    double dot = a.Zip(b, (x, y) => x * y).Sum();
    double na = Math.Sqrt(a.Sum(x => x * x));
    double nb = Math.Sqrt(b.Sum(x => x * x));
    return dot / (na * nb);
}

var v1 = new double[] { 0.62, 0.11, -0.28, 0.44 };
var v2 = new double[] { 0.60, 0.08, -0.25, 0.47 };
var v3 = new double[] { -0.12, 0.91, 0.33, -0.05 };

Console.WriteLine($"sim(v1, v2) = {Cosine(v1, v2):F3}"); // high → similar meaning
Console.WriteLine($"sim(v1, v3) = {Cosine(v1, v3):F3}"); // low  → different topic
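
With the sample vectors above this prints sim(v1, v2) = 0.998 and sim(v1, v3) = -0.111: the two coffee sentences point in nearly the same direction, while the cat sentence points elsewhere.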

How does search work in a vector database?

At a high level (a minimal end‑to‑end sketch follows the list):

  1. Chunk your content (e.g., paragraphs, sections, pages).
  2. Generate an embedding vector for each chunk with a text embedding model.
  3. Store vectors (plus metadata like title, URL, tags) in a vector database.
  4. At query time, embed the user’s query into a vector.
  5. Retrieve the top‑k nearest vectors by similarity, typically with approximate nearest neighbor (ANN) search such as HNSW, IVF, or ScaNN.
  6. (Optional) Re‑rank and filter with metadata, apply hybrid (keyword + vector) search, or pass top results to an LLM for answer synthesis.
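
To make steps 1–5 concrete, here is a minimal brute‑force sketch in C#. It is not a real vector database: Embed is a toy stand‑in that only counts word overlap (a real system would call a text embedding model here), the index is a plain list scanned linearly rather than an ANN structure, and all names are illustrative rather than any library's API.

using System;
using System.Collections.Generic;
using System.Linq;

// Toy "embedding": hashes words into a small fixed-size vector.
// Only captures word overlap; a real embedding model goes here instead.
static double[] Embed(string text)
{
    var v = new double[8];
    foreach (var word in text.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries))
        v[(word.GetHashCode() & 0x7fffffff) % v.Length] += 1.0;
    return v;
}

static double Cosine(double[] a, double[] b)
{
    double dot = a.Zip(b, (x, y) => x * y).Sum();
    double norm = Math.Sqrt(a.Sum(x => x * x)) * Math.Sqrt(b.Sum(x => x * x));
    return norm == 0 ? 0 : dot / norm;
}

// Steps 1-3: chunk content and store (vector, chunk) pairs in the "index".
string[] chunks =
{
    "Ultrabooks are thin, portable laptops with long battery life.",
    "Battery life tests for popular travel notebooks.",
    "The cat sits on the mat.",
};
var index = chunks.Select(c => (Vector: Embed(c), Chunk: c)).ToList();

// Steps 4-5: embed the query, then scan for the top-k most similar chunks.
var query = Embed("lightweight notebook for travel");
var topK = index
    .Select(e => (Score: Cosine(query, e.Vector), e.Chunk))
    .OrderByDescending(r => r.Score)
    .Take(2);

foreach (var (score, chunk) in topK)
    Console.WriteLine($"{score:F3}  {chunk}");

Swap the toy Embed for a real embedding model and the same shape gives genuine semantic search; the query‑time flow below is exactly steps 4–6.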

Query-time flow

  • Input: “best travel laptop with long battery life”
  • Embed query → q⃗
  • Vector DB returns nearest chunks about “ultrabooks”, “portable notebooks”, “battery life tests”.
  • You display results or feed the retrieved text to an LLM for a grounded answer.

What is an embedding?

An embedding is a dense vector representation of input text produced by a specialized model. Good embeddings place semantically similar texts near each other in vector space.

Key properties:

  • Dimensionality: typical sizes range from a few hundred to several thousand numbers per text.
  • Domain: general-purpose embeddings work broadly; domain‑tuned embeddings (e.g., code, legal, medical) capture niche semantics better.
  • Stability: for a given model version, the same input typically yields the same vector, so embeddings can be cached safely.
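
In code terms the contract is tiny. The interface below is a shape‑level sketch (the names are placeholders, not a real library's API):

// Hypothetical interface; names and members are illustrative only.
public interface IEmbeddingModel
{
    // Fixed output size, e.g. 384, 768, or 1536 numbers per input text.
    int Dimensions { get; }

    // Maps text to a dense vector; same model + same input typically yields the same vector.
    float[] Embed(string text);
}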

Why is a text embedding different from an LLM?

They solve different problems and are trained with different objectives:

  • Purpose:
    • Embedding model → representation learning for similarity, clustering, classification, retrieval.
    • LLM (chat/completion) → generative modeling to produce fluent text.
  • I/O shape:
    • Embedding model → input text, output vector (numbers).
    • LLM → input prompt, output text tokens.
  • Objective:
    • Embedding → place semantically similar items close together; optimize contrastive/similarity loss.
    • LLM → predict next token; optimize language modeling loss.
  • Usage:
    • Embedding → searching, deduplication, recommendations, RAG retrieval.
    • LLM → answering, summarizing, transforming text.
  • Cost/latency:
    • Embedding inference is usually cheaper/faster than full LLM generation.

Think of embeddings as the “memory index” and LLMs as the “reasoning/writing engine.” In RAG systems, embeddings help you find the right facts; the LLM helps explain or compose them.
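
The I/O contrast is easiest to see as two (hypothetical) signatures side by side:

// Illustrative shapes only, not a real SDK:
float[] Embed(string text);       // embedding model: text in, numbers out
string  Complete(string prompt);  // LLM: text in, generated text out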

When should I use a semantic index?

Use a semantic index when:

  • Keyword search misses relevant content due to synonyms or phrasing.
  • You need similarity search (“find things like this”).
  • You’re building RAG pipelines and must retrieve the right context for the LLM.

Complement it with keyword search (a small hybrid‑scoring sketch follows this list) when:

  • You need strict term matching (IDs, code symbols, exact phrases).
  • Legal or compliance requires exact matches.
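
A common hybrid pattern, sketched here under the assumption that you already have a keyword score (e.g., BM25) and a vector score per result, is a weighted blend; strict requirements like exact IDs are usually applied as a hard filter before ranking. The weight alpha = 0.7 below is illustrative, not a recommendation:

using System;

// Weighted blend of vector and keyword scores (one common hybrid pattern).
// alpha = 1.0 is pure semantic ranking; alpha = 0.0 is pure keyword ranking.
static double HybridScore(double vectorScore, double keywordScore, double alpha = 0.7)
    => alpha * vectorScore + (1 - alpha) * keywordScore;

Console.WriteLine(HybridScore(vectorScore: 0.92, keywordScore: 0.10)); // strong semantic, weak keyword
Console.WriteLine(HybridScore(vectorScore: 0.40, keywordScore: 1.00)); // exact keyword hit, weaker semantic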

Key takeaways

  • A semantic index stores vector embeddings of your content and retrieves by meaning, not exact words.
  • Embeddings are numeric representations optimized for similarity; LLMs generate text.
  • Vector databases power fast similarity search and pair well with keyword search and re‑ranking.
  • In RAG, embeddings do the “recall,” LLMs do the “reasoning and writing.”