Semantic Index: What Is It?
This post gives you a practical, no‑nonsense overview of semantic indexing and vector search so you can tell when (and how) to use it.
What is a semantic index?
A semantic index is a searchable “map” of meaning built from your content using vectors (numerical representations of text). Instead of matching exact words like a traditional keyword index, a semantic index groups and retrieves items by similarity in meaning.
- Keyword search: “laptop” matches only documents containing the word “laptop”.
- Semantic search: “lightweight notebook for travel” can also find content about “ultrabooks” or “portable laptops” because the meanings are close.
Under the hood, a semantic index stores an embedding vector for each chunk/document and uses a similarity metric (most commonly cosine similarity) to find the closest items to your query’s vector.
What is a vector? (with text examples)
A vector is just a list of numbers. Embedding models turn text into vectors so that semantic closeness becomes geometric closeness.
Examples of short texts (conceptual only; numbers are illustrative):
- “I like coffee” → [0.62, 0.11, -0.28, 0.44, …]
- “I enjoy espresso” → [0.60, 0.08, -0.25, 0.47, …] ← very close in meaning
- “The cat sits on the mat” → [-0.12, 0.91, 0.33, -0.05, …] ← unrelated topic
The closer two vectors are, the more similar the meanings. Cosine similarity is a common way to measure this closeness.
Tiny cosine example (C#)
using System;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| · |b|). Close to 1 for near-identical
// directions (similar meaning), near 0 for unrelated vectors. Assumes
// non-zero vectors.
static double Cosine(double[] a, double[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vector lengths differ");
    double dot = a.Zip(b, (x, y) => x * y).Sum(); // element-wise products, summed
    double na = Math.Sqrt(a.Sum(x => x * x));     // magnitude of a
    double nb = Math.Sqrt(b.Sum(x => x * x));     // magnitude of b
    return dot / (na * nb);
}

// The illustrative vectors from above: v1 and v2 paraphrase each other,
// v3 is about an unrelated topic.
var v1 = new double[] { 0.62, 0.11, -0.28, 0.44 };
var v2 = new double[] { 0.60, 0.08, -0.25, 0.47 };
var v3 = new double[] { -0.12, 0.91, 0.33, -0.05 };

Console.WriteLine($"sim(v1, v2) = {Cosine(v1, v2):F3}"); // high → similar meaning
Console.WriteLine($"sim(v1, v3) = {Cosine(v1, v3):F3}"); // low → different topic
How does search work in a vector database?
At a high level (a runnable end‑to‑end sketch follows the query‑time flow below):
- Chunk your content (e.g., paragraphs, sections, pages).
- Generate an embedding vector for each chunk with a text embedding model.
- Store vectors (plus metadata like title, URL, tags) in a vector database.
- At query time, embed the user’s query into a vector.
- Retrieve the top‑k nearest vectors by similarity (approximate nearest‑neighbor search such as HNSW, IVF, or ScaNN).
- (Optional) Re‑rank and filter with metadata, apply hybrid (keyword + vector) search, or pass top results to an LLM for answer synthesis.
Query-time flow
- Input: “best travel laptop with long battery life”
- Embed query → q⃗
- Vector DB returns nearest chunks about “ultrabooks”, “portable notebooks”, “battery life tests”.
- You display results or feed the retrieved text to an LLM for a grounded answer.
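Putting it together, here is a minimal end‑to‑end sketch in C#. The vectors are hard‑coded stand‑ins for real embeddings, and the brute‑force scan stands in for a vector database's ANN index; everything else mirrors the steps above.

using System;
using System.Linq;

// Same cosine similarity as in the earlier example.
static double Cosine(double[] a, double[] b) =>
    a.Zip(b, (x, y) => x * y).Sum()
    / (Math.Sqrt(a.Sum(x => x * x)) * Math.Sqrt(b.Sum(x => x * x)));

// Indexing: one (metadata, vector) entry per chunk. In a real pipeline the
// vectors come from an embedding model; these numbers are illustrative.
var index = new (string Title, double[] Vector)[]
{
    ("Ultrabook buying guide",    new[] { 0.61, 0.10, -0.27, 0.45 }),
    ("Laptop battery life tests", new[] { 0.55, 0.20, -0.10, 0.50 }),
    ("Cat care basics",           new[] { -0.10, 0.90, 0.30, -0.04 }),
};

// Query time: embed the query (faked here), score every chunk, keep the top k.
// A vector database replaces this full scan with ANN search (e.g., HNSW).
var queryVector = new[] { 0.62, 0.11, -0.28, 0.44 }; // stand-in for embedding the query text
var topK = index
    .Select(e => (e.Title, Score: Cosine(queryVector, e.Vector)))
    .OrderByDescending(r => r.Score)
    .Take(2);

foreach (var (title, score) in topK)
    Console.WriteLine($"{score:F3}  {title}"); // display, or hand the chunks to an LLM

The two laptop entries win and the cat entry scores lowest, which is exactly the behavior the flow above describes; what a real vector database adds is avoiding the full scan at scale.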
What is an embedding?
An embedding is a dense vector representation of input text produced by a specialized model. Good embeddings place semantically similar texts near each other in vector space.
Key properties (a short sketch follows this list):
- Dimensionality: typical sizes range from a few hundred to several thousand numbers per text.
- Domain: general-purpose embeddings work broadly; domain‑tuned embeddings (e.g., code, legal, medical) capture niche semantics better.
- Stability: for a fixed model and input, embeddings are deterministic, so you can compute them once and cache them.
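To make that contract concrete, here is a toy sketch. The Embed function below is a hypothetical stand‑in, not a real model call; only its shape (text in, fixed‑length vector out, deterministic) mirrors the properties above.

using System;
using System.Linq;

// Hypothetical stand-in for a real embedding model call. It is seeded from
// the input's characters, so the same text always yields the same vector,
// but unlike a real model it captures no meaning.
static double[] Embed(string text)
{
    var rng = new Random(text.Sum(c => (int)c));
    return Enumerable.Range(0, 8).Select(_ => rng.NextDouble() * 2 - 1).ToArray();
}

var a = Embed("I like coffee");
var b = Embed("I like coffee");
Console.WriteLine(a.Length);           // dimensionality: 8 here; real models use hundreds or more
Console.WriteLine(a.SequenceEqual(b)); // True — same model + same input → same vector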
Why is a text embedding different from an LLM?
They solve different problems and are trained with different objectives:
- Purpose:
- Embedding model → representation learning for similarity, clustering, classification, retrieval.
- LLM (chat/completion) → generative modeling to produce fluent text.
- I/O shape (see the interface sketch after this list):
- Embedding model → input text, output vector (numbers).
- LLM → input prompt, output text tokens.
- Objective:
- Embedding → place semantically similar items close together; optimize contrastive/similarity loss.
- LLM → predict next token; optimize language modeling loss.
- Usage:
- Embedding → searching, deduplication, recommendations, RAG retrieval.
- LLM → answering, summarizing, transforming text.
- Cost/latency:
- Embedding inference is usually cheaper/faster than full LLM generation.
Think of embeddings as the “memory index” and LLMs as the “reasoning/writing engine.” In RAG systems, embeddings help you find the right facts; the LLM helps explain or compose them.
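One way to see the split is in the call signatures. These interface names are illustrative, not a real SDK:

// Representation: text goes in, a fixed-length vector comes out.
public interface IEmbeddingModel
{
    float[] Embed(string text);
}

// Generation: a prompt goes in, newly generated text comes out.
public interface IChatModel
{
    string Complete(string prompt);
}

In a RAG pipeline you typically call IEmbeddingModel many times (once per chunk at indexing time, once per query) and IChatModel once per answer.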
When should I use a semantic index?
Use a semantic index when:
- Keyword search misses relevant content due to synonyms or phrasing.
- You need similarity search (“find things like this”).
- You’re building RAG pipelines and must retrieve the right context for the LLM.
Complement it with keyword search (see the hybrid sketch after this list) when:
- You need strict term matching (IDs, code symbols, exact phrases).
- Legal or compliance requirements demand exact matches.
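When you need both behaviors, hybrid search blends them. A minimal sketch, assuming you already have a cosine score per document; the keyword score here is simple term overlap, whereas production systems usually use BM25 and rank fusion such as RRF:

using System;
using System.Linq;

// Blend semantic similarity with exact-term overlap.
// alpha = 1.0 → pure vector search; alpha = 0.0 → pure keyword matching.
static double HybridScore(double cosineSim, string query, string doc, double alpha = 0.7)
{
    var queryTerms = query.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);
    var docTerms = doc.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries).ToHashSet();

    // Fraction of query terms that appear verbatim in the document.
    double keywordScore = queryTerms.Length == 0
        ? 0
        : queryTerms.Count(t => docTerms.Contains(t)) / (double)queryTerms.Length;

    return alpha * cosineSim + (1 - alpha) * keywordScore;
}

// Low alpha lets the exact ID match dominate the ranking:
Console.WriteLine(HybridScore(0.85, "SKU-12345 battery", "replacement battery for SKU-12345", alpha: 0.3));

For ID-style queries like the one above, lowering alpha makes the verbatim match outweigh semantic similarity; for conversational queries, a higher alpha lets meaning dominate.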
Key takeaways
- A semantic index stores vector embeddings of your content and retrieves by meaning, not exact words.
- Embeddings are numeric representations optimized for similarity; LLMs generate text.
- Vector databases power fast similarity search and pair well with keyword search and re‑ranking.
- In RAG, embeddings do the “recall,” LLMs do the “reasoning and writing.”