Vector Search Explained

Tags: Vector Search, Embeddings, Azure AI Search, RAG

You walk into a library and ask the librarian, "I need something about fixing a phone that gets too hot." She does not search for the exact words "phone too hot." She thinks for a second and hands you a book on thermal throttling — a topic you never mentioned.

That is the idea behind vector search. Regular search looks for matching words. Vector search looks for matching meaning. If you are building AI apps that answer questions from company documents, you will run into this concept quickly. Let us break it down.

What is vector search?

Vector search (also called semantic search) finds information by comparing mathematical representations of text called embeddings.

An embedding is a long list of numbers — think of it as a fingerprint for meaning. Two sentences about the same idea produce similar fingerprints, even when they use different words. "My laptop overheats" and "Notebook thermal issues" land close together in this number space.

A vector is simply a list of numbers arranged in order. In search, we store one vector per document chunk and compare them to the vector for your question.

Why do we need it?

Keyword search fails in real life all the time. Users type how they talk, not how documents are written. A customer searches "cancel membership" but your help article says "terminate subscription." Keyword search returns nothing. Vector search still finds it.

It matters especially for RAG (Retrieval-Augmented Generation) — the pattern where an AI reads your documents before answering. RAG is like an open-book exam: the model can only answer well if it finds the right page first. Vector search is how you find that page.

Without it, chatbots answer only from what the model memorized during training and are more likely to hallucinate. With it, they pull relevant chunks from your knowledge base first.

How does it work?

The process has two sides: indexing and querying.

Indexing: Split documents into chunks. Send each chunk to an embedding model (like Azure OpenAI's text-embedding-3-small). Store the text plus its vector in a search index.

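Querying: Embed the user's question with the same embedding model. Compare that vector to the stored vectors (search engines typically use an approximate nearest-neighbor index rather than scanning every vector). Return the closest matches.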

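Cosine similarity is the usual way to measure "closeness." Imagine two arrows drawn from the center of a circle. If they point in nearly the same direction, the ideas are similar. Cosine similarity scores the angle between them on a scale from -1 to 1: 1.0 means the arrows point the same way (very similar), 0 means they are at right angles (unrelated), and negative values mean they point in opposing directions.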
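To make the arrow picture concrete, here is a minimal sketch of cosine similarity in C#. The three-dimensional vectors are made up for illustration; real embeddings come from a model and have far more dimensions.

```csharp
using System;

// Cosine similarity = dot(a, b) / (|a| * |b|); the result lies between -1 and 1.
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same number of dimensions.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    // A zero vector has no direction, so treat it as unrelated to everything.
    if (normA == 0 || normB == 0) return 0;
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Made-up 3-dimensional vectors (assumption); real embeddings are much longer.
float[] laptopOverheats = { 0.9f, 0.1f, 0.0f };
float[] thermalIssues = { 0.8f, 0.2f, 0.1f };
float[] cancelMembership = { 0.0f, 0.1f, 0.9f };

Console.WriteLine($"overheats vs thermal: {CosineSimilarity(laptopOverheats, thermalIssues):F2}");
Console.WriteLine($"overheats vs cancel:  {CosineSimilarity(laptopOverheats, cancelMembership):F2}");
```

The first pair scores close to 1 and the second close to 0, which matches the intuition that the two overheating phrases point the same way while the membership phrase does not.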

Company PDFs and wiki pages
        ↓
   Split into chunks
        ↓
   Embedding model → vectors stored
        ↓
User asks: "How do I reset my password?"
        ↓
   Question → embedding
        ↓
   Find nearest vectors → top 5 chunks
        ↓
   Send chunks to GPT for the final answer
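The "find nearest vectors" step in the flow above can be sketched as a brute-force scan over an in-memory index. This is only an illustration: the chunk ids and vectors are invented, and a real search service would use an approximate nearest-neighbor index instead of scoring every chunk.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

double Cosine(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return normA == 0 || normB == 0 ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Score every stored chunk against the query vector and keep the K best.
List<(string Id, double Score)> TopK(float[] query, Dictionary<string, float[]> store, int k) =>
    store.Select(e => (Id: e.Key, Score: Cosine(query, e.Value)))
         .OrderByDescending(r => r.Score)
         .Take(k)
         .ToList();

// Hypothetical chunk ids with made-up vectors standing in for stored embeddings.
var index = new Dictionary<string, float[]>
{
    ["reset-password"] = new[] { 0.9f, 0.1f, 0.0f },
    ["billing-faq"] = new[] { 0.1f, 0.9f, 0.1f },
    ["account-recovery"] = new[] { 0.8f, 0.2f, 0.1f },
};

// Stand-in for the embedding of "How do I reset my password?".
float[] question = { 0.95f, 0.05f, 0.0f };

foreach (var (chunkId, score) in TopK(question, index, 2))
    Console.WriteLine($"{chunkId}: {score:F3}");
```

With these toy numbers the two password-related chunks rank ahead of the billing chunk, and those are the ones that would be passed to the chat model.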

Hybrid search blends vector search with classic keyword search. That way you catch exact matches (error codes, product SKUs) and meaning-based matches. Azure AI Search supports this out of the box.
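One common way to blend the two result lists is Reciprocal Rank Fusion (RRF), which Azure AI Search documents as its merging method for hybrid queries. The sketch below shows the idea only; the document ids are invented and the constant 60 is a conventional default used here as an assumption, so it should not be read as the service's exact implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each ranked list contributes 1 / (k + rank) per document, with rank starting at 1.
List<(string Id, double Score)> FuseRankings(string[][] rankings, int k = 60)
{
    var scores = new Dictionary<string, double>();
    foreach (string[] ranking in rankings)
    {
        for (int i = 0; i < ranking.Length; i++)
        {
            scores.TryGetValue(ranking[i], out double current); // 0 when not seen yet
            scores[ranking[i]] = current + 1.0 / (k + i + 1);
        }
    }
    return scores.OrderByDescending(p => p.Value)
                 .Select(p => (Id: p.Key, Score: p.Value))
                 .ToList();
}

// Hypothetical results for the query "SSO login error 4012".
string[] keywordHits = { "err-4012-guide", "saml-troubleshooting", "billing-faq" };
string[] vectorHits = { "saml-troubleshooting", "sso-overview", "err-4012-guide" };

var fused = FuseRankings(new[] { keywordHits, vectorHits });
foreach (var (docId, fusedScore) in fused)
    Console.WriteLine($"{docId}: {fusedScore:F4}");
```

Documents that appear in both lists rise to the top, which is why the exact error-code match and the meaning-based match both survive in the final ranking.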

Note: Embeddings are model-specific. If you switch embedding models, you must re-index all documents, because old vectors are not compatible with new ones.
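A cheap safeguard is to record which model built the index and check it before every query. The helper below is a hypothetical sketch, not part of any SDK; where you store the model name and dimension count (index metadata, configuration, and so on) is up to you.

```csharp
using System;

// Hypothetical guard: refuse to query an index with vectors from a different model.
void EnsureCompatible(string indexModel, int indexDimensions, string queryModel, float[] queryVector)
{
    if (!string.Equals(indexModel, queryModel, StringComparison.Ordinal))
        throw new InvalidOperationException(
            $"Index was built with '{indexModel}' but the query uses '{queryModel}'. Re-index first.");

    if (queryVector.Length != indexDimensions)
        throw new InvalidOperationException(
            $"Index expects {indexDimensions} dimensions but the query vector has {queryVector.Length}.");
}

// Same model and size: passes silently.
EnsureCompatible("text-embedding-3-small", 1536, "text-embedding-3-small", new float[1536]);

// Different model: rejected before any search is attempted.
try
{
    EnsureCompatible("text-embedding-3-small", 1536, "text-embedding-3-large", new float[3072]);
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message);
}
```

Matching dimensions alone do not prove compatibility, since two different models can produce vectors of the same length, which is why the sketch compares the model name as well.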

Real-world example

A SaaS company builds an internal support bot. An engineer types: "SSO login broken after update."

Keyword search might miss a doc titled "SAML federation troubleshooting." Vector search connects "SSO login" with "SAML federation" because the meanings overlap. The bot retrieves the right section, quotes it, and the engineer fixes the issue in minutes instead of opening a ticket.

Spotify uses a related idea when recommending songs — tracks with similar audio fingerprints cluster together. Vector search does the same thing, but for text meaning.

Step-by-step: add vector search

Step 1: Create an Azure AI Search index with a text field and a vector field. The vector field's size must match your embedding model's output (for example, 1536 dimensions by default for text-embedding-3-small).

Step 2: Chunk your documents (500–800 tokens per chunk is a common starting point).

Step 3: Generate embeddings with Azure OpenAI and upload chunks plus vectors.

Step 4: At query time, embed the question, run vector or hybrid search, take the top K results.

Step 5: Pass those chunks to a chat model as context for the final answer.
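Step 2 can be sketched with a simple sliding window. This version counts words as a rough stand-in for tokens and repeats a few words between neighboring chunks so that sentences cut at a boundary still appear whole in one of them. The function name, the overlap idea, and the sample sizes are assumptions for illustration; a production pipeline would count tokens with the model's tokenizer.

```csharp
using System;
using System.Collections.Generic;

// Splits text into chunks of at most maxWords words, repeating `overlap` words between neighbors.
List<string> ChunkByWords(string text, int maxWords, int overlap)
{
    if (maxWords <= 0) throw new ArgumentOutOfRangeException(nameof(maxWords));
    if (overlap < 0 || overlap >= maxWords) throw new ArgumentOutOfRangeException(nameof(overlap));

    string[] words = text.Split(new[] { ' ', '\n', '\r', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    var chunks = new List<string>();
    int step = maxWords - overlap; // how far the window advances each time

    for (int start = 0; start < words.Length; start += step)
    {
        int length = Math.Min(maxWords, words.Length - start);
        chunks.Add(string.Join(" ", words, start, length));
        if (start + length >= words.Length) break; // this chunk reached the end of the text
    }
    return chunks;
}

string sample = "Open Settings then choose Security then select Reset password and confirm by email";
foreach (string chunk in ChunkByWords(sample, maxWords: 6, overlap: 2))
    Console.WriteLine(chunk);
```

For real documents you would pass something closer to the 500 to 800 range mentioned in Step 2, then tune the size and overlap against your own test questions.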

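using System.Linq;                   // ToArray
using Azure.Search.Documents.Models; // SearchDocument
using OpenAI.Embeddings;             // OpenAIEmbedding

// Assumes openAI (an Azure OpenAI client) and searchClient (a SearchClient) were created elsewhere.
// The string passed here is the name of your embedding model deployment.
var embeddingClient = openAI.GetEmbeddingClient("text-embedding-3-small");
OpenAIEmbedding embedding = await embeddingClient.GenerateEmbeddingAsync(chunkText);

// Upload the chunk text together with its vector. Field names must match the index schema.
await searchClient.MergeOrUploadDocumentsAsync(new[]
{
    new SearchDocument
    {
        ["id"] = chunkId,
        ["content"] = chunkText,
        ["contentVector"] = embedding.ToFloats().ToArray(),
        ["source"] = blobPath
    }
});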
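Step 5 is mostly string assembly: the retrieved chunks are placed in the prompt ahead of the question so the chat model answers from them. The sketch below shows one possible layout; the instruction wording, the source paths, and the helper name are illustrative assumptions, and the call to the chat model itself is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Builds a prompt that numbers each retrieved chunk so the answer can cite its source.
string BuildGroundedPrompt(string userQuestion, IReadOnlyList<(string Source, string Content)> sources)
{
    var sb = new StringBuilder();
    sb.AppendLine("Answer the question using only the sources below. If the answer is not there, say you do not know.");
    sb.AppendLine();
    for (int i = 0; i < sources.Count; i++)
    {
        sb.AppendLine($"[{i + 1}] Source: {sources[i].Source}");
        sb.AppendLine(sources[i].Content);
        sb.AppendLine();
    }
    sb.Append("Question: ").Append(userQuestion);
    return sb.ToString();
}

// Hypothetical chunks, as if returned by the search step.
var retrieved = new List<(string Source, string Content)>
{
    ("wiki/password-reset.md", "Open Settings, choose Security, then select Reset password."),
    ("wiki/account-recovery.md", "If you cannot sign in, request a recovery link by email."),
};

string prompt = BuildGroundedPrompt("How do I reset my password?", retrieved);
Console.WriteLine(prompt);
```

Keeping the source path next to each chunk makes it easier to show citations and to trace a bad answer back to the document that caused it.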

Common misconceptions

"Vector search replaces keyword search." Not always. Use hybrid search. Exact codes and names still benefit from keyword matching.

"Bigger chunks are always better." Huge chunks dilute meaning. Small, focused chunks retrieve more precisely.

"If vector search returns something, the answer is correct." Retrieval finds similar text, not true text. Bad source documents produce bad answers. You still need quality content and evaluation.

Search type | Good at                               | Weak at
Keyword     | Exact product codes, names, error IDs | Synonyms and paraphrasing
Vector      | Meaning, paraphrases, concepts        | Very short exact tokens
Hybrid      | Both worlds combined                  | Slightly more setup and cost
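Evaluating retrieval can start small. One simple measure is hit rate at K: for a set of test questions where you already know which chunk holds the answer, how often does that chunk appear in the top K results? The sketch below uses hard-coded, hypothetical results in place of real search calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Fraction of test cases whose expected chunk id appears among the first k retrieved ids.
double HitRateAtK(IEnumerable<(string ExpectedId, string[] RetrievedIds)> cases, int k)
{
    var list = cases.ToList();
    if (list.Count == 0) return 0;
    int hits = list.Count(c => c.RetrievedIds.Take(k).Contains(c.ExpectedId));
    return (double)hits / list.Count;
}

// Hypothetical evaluation set: expected chunk id plus what the search returned, best first.
var testCases = new List<(string ExpectedId, string[] RetrievedIds)>
{
    ("saml-troubleshooting", new[] { "saml-troubleshooting", "sso-overview" }),
    ("billing-faq", new[] { "pricing", "billing-faq" }),
    ("reset-password", new[] { "vpn-setup", "sso-overview" }),
};

Console.WriteLine($"Hit rate@1: {HitRateAtK(testCases, 1):P0}");
Console.WriteLine($"Hit rate@2: {HitRateAtK(testCases, 2):P0}");
```

Running the same question set against keyword, vector, and hybrid search gives you a like-for-like comparison before you change chunk sizes or models.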

Quick recap

  • Vector search matches meaning using number fingerprints called embeddings.
  • Indexing converts document chunks to vectors; querying embeds the question and finds nearest neighbors.
  • Hybrid search combines keywords and vectors for the best real-world results.
  • RAG apps depend on vector search to fetch the right context before the model answers.

Summary

Vector search is how computers imitate a good librarian — understanding what you mean, not just what you typed. Embeddings turn words into comparable numbers; cosine similarity finds the closest ideas.

Start with hybrid search on Azure AI Search, chunk documents thoughtfully, and always test with real user questions. When retrieval works, everything downstream (RAG, copilots, support bots) gets dramatically better.

Frequently Asked Questions

What is vector search?
Vector search finds documents by meaning. Text is converted to number lists called embeddings, and the search engine returns items whose embeddings are closest to your question.

What is an embedding?
An embedding is a list of numbers that represents the meaning of text. Similar ideas produce similar number patterns, even when the exact words differ.

How is vector search different from keyword search?
Keyword search matches exact words. Vector search matches ideas, so a query about laptop overheating can find a document titled "Thermal throttling guide."

What is cosine similarity?
Cosine similarity measures how closely two vectors point in the same direction. Higher scores mean more similar meaning.

What is hybrid search?
Hybrid search combines keyword matching and vector matching so you get both exact matches (like error codes) and meaning-based matches.

What do I need to run vector search?
You need a search engine that stores vectors and compares them quickly. Azure AI Search, dedicated vector databases, and some general databases support this.

How does RAG use vector search?
RAG systems use vector search to find relevant document chunks before asking a language model to answer. It is the retrieval step in retrieval-augmented generation.

Key Takeaways

  • Embeddings turn text into numbers that capture meaning, not just words.
  • Vector search finds the closest ideas using similarity scores like cosine similarity.
  • Hybrid search is the practical default — combine keywords and vectors.
  • Good chunking and quality source documents matter as much as the algorithm.

