We live in a world full of information, but finding exactly what you need can still be frustrating. Traditional search engines rely on keywords, meaning if you don’t type the exact words in a document, you might never find it.
But major search engines like Google are moving away from this keyword-centric lexical search approach and adopting what is called semantic search. Instead of matching keywords, semantic search understands the meaning of your query and finds the most relevant results. Finding that meaning involves understanding the searcher’s context and the intent behind the terms that appear in searchable databases.
Organizations aiming to stay ahead are already exploring how semantic search can be woven into their enterprise web strategies, helping them deliver more connected and meaningful user experiences.
In this blog, we’ll build a semantic search engine for SQL databases that runs fully offline on your computer using PostgreSQL 17, sentence-transformers, and Streamlit.
How semantic search works
Semantic search is a smarter way to find information. Instead of just matching exact keywords, it looks at the meaning and context of words. To do this, it turns words, sentences, or documents into dense vectors, which are long lists of numbers that capture their meaning.
When you search, your query is also converted into a vector. The system then compares it against stored vectors to find the closest matches. The closer two vectors are, the more similar their meanings.
A common way to measure this closeness is cosine similarity. It checks how much two vectors “point” in the same direction. If the score is close to 1, it means the system has found something highly relevant to your query.
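To make this concrete, here is a minimal sketch of cosine similarity using tiny made-up 3-dimensional vectors (real embedding models produce hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for illustration only
car = [0.9, 0.1, 0.3]
automobile = [0.85, 0.15, 0.35]
cat = [0.1, 0.9, 0.2]

print(cosine_similarity(car, automobile))  # close to 1: similar meaning
print(cosine_similarity(car, cat))         # much lower: different meaning
```

Two vectors that point in nearly the same direction score close to 1, which is exactly what we will ask PostgreSQL to compute for us later.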
Why use PostgreSQL and embeddings
Normally, SQL databases search by matching words exactly. So, if you search for “car”, SQL will only find rows that contain the literal word “car” and overlook results like “automobile”.
Semantic search changes this by using embeddings: numeric vectors that represent the meaning of a piece of text, as described above.
PostgreSQL is not just a relational database; with the pgvector extension, it becomes a powerful vector database. This means we can store text embeddings and perform similarity searches directly inside SQL.
Managing these embeddings alongside vast amounts of structured and unstructured information requires strong data management practices, which ensure that search systems remain both efficient and scalable.
For example, instead of using a query that looks for exact keyword matches, we can run a similarity search that retrieves the closest matches based on vector embeddings. This ensures that searching for a word like “automobile” will still return results that contain “car,” since the concepts are related.
For example, instead of:

```sql
SELECT * FROM docs WHERE text ILIKE '%car%';
```

We can do:

```sql
SELECT * FROM docs ORDER BY embedding <-> query_embedding LIMIT 5;
```
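One note on operators: in pgvector, `<->` computes Euclidean (L2) distance. Since we measure closeness with cosine similarity, you can instead rank by pgvector’s cosine distance operator `<=>` (cosine distance = 1 − cosine similarity):

```sql
-- Rank by cosine distance instead of Euclidean distance
SELECT * FROM docs ORDER BY embedding <=> query_embedding LIMIT 5;
```

Either operator works for this tutorial; just use the same one consistently and pick the matching index type later.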
Implementing semantic search in SQL
We’ll be using these three technologies for this purpose:
- PostgreSQL 17 for storing documents and embeddings
- Sentence-Transformers for generating embeddings
- Streamlit for a user-friendly search interface
Step 1: Installing dependencies
First, install PostgreSQL 17 on your system and enable the pgvector extension. Then install the necessary Python libraries, including psycopg2, sentence-transformers, and streamlit.
Enable the pgvector extension:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```
Now install the Python libraries:

```bash
pip install psycopg2 sentence-transformers streamlit
```
Step 2: Creating the database table
Inside PostgreSQL, create a table to hold documents and their embeddings. Each document should have an ID, title, category, content, and a vector column for the embedding. The embedding column must match the dimension of the model you use. For example, the model all-MiniLM-L6-v2 produces 384-dimensional vectors.
```sql
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT,
    category TEXT,
    content TEXT,
    embedding vector(384)  -- 384 = embedding dimension
);
```
Step 3: Inserting data with Python
Next, prepare your dataset and generate embeddings using a sentence-transformers model. Store both the document text and the generated embedding in PostgreSQL. This ensures that each document can be searched not only by its raw content but also by its meaning, represented by the embedding.
```python
from sentence_transformers import SentenceTransformer
import psycopg2

# Load embedding model (384 dimensions, matching the vector(384) column)
model = SentenceTransformer("all-MiniLM-L6-v2")

# Connect to PostgreSQL
conn = psycopg2.connect(
    dbname="semantic_db",
    user="postgres",
    password="mypass",  # replace with your password
    host="localhost",
    port="5432"
)
cur = conn.cursor()

# Example dataset
docs = [
    {"title": "Fast Cars", "category": "transport", "content": "The car is fast and reliable."},
    {"title": "Automobiles", "category": "transport", "content": "I love driving an automobile."},
    {"title": "Motorbikes", "category": "transport", "content": "Bikes are a fun mode of transport."},
    {"title": "Air Travel", "category": "aviation", "content": "Airplanes fly in the sky."},
    {"title": "Cats", "category": "animals", "content": "Cats are beautiful animals."},
    {"title": "Dogs", "category": "animals", "content": "Dogs are loyal companions."},
]

for doc in docs:
    emb = model.encode(doc["content"]).tolist()
    cur.execute(
        "INSERT INTO documents (title, category, content, embedding) VALUES (%s, %s, %s, %s::vector)",
        (doc["title"], doc["category"], doc["content"], str(emb))  # pgvector accepts the "[...]" string form
    )

conn.commit()
cur.close()
conn.close()
print("Data inserted with metadata!")
```
Step 4: Performing semantic search
To perform a search, encode the user’s query into an embedding and compare it against stored embeddings in the database. Instead of keyword matches, results are ranked based on semantic similarity.
For example, searching for “automobile” will bring up entries related to “car” even if the word “automobile” does not appear in the database.
```python
query = "automobile"
q_emb = model.encode(query).tolist()

cur.execute("""
    SELECT title, content
    FROM documents
    ORDER BY embedding <-> %s::vector
    LIMIT 3;
""", (str(q_emb),))

for title, content in cur.fetchall():
    print(title, "->", content)
```
Step 5: Adding a Streamlit user interface
Streamlit can be used to build a simple front-end where users can type queries and instantly see the most relevant results from the database. This makes the search system more accessible and user-friendly, without needing to directly query SQL.
Behind the scenes, connecting such interfaces with well-structured data warehousing solutions can further enhance performance, making large-scale search and retrieval faster and more reliable.
```python
import streamlit as st
import psycopg2
from sentence_transformers import SentenceTransformer

# Load embedding model (same one used during insertion)
model = SentenceTransformer("all-MiniLM-L6-v2")

# PostgreSQL connection
def get_connection():
    return psycopg2.connect(
        dbname="semantic_db",
        user="postgres",
        password="mypass",  # replace with your password
        host="localhost",
        port="5432"
    )

# Semantic search function
def semantic_search(query, category=None, limit=5):
    conn = get_connection()
    cur = conn.cursor()
    q_emb = model.encode(query).tolist()
    if category and category != "All":
        cur.execute("""
            SELECT title, category, content
            FROM documents
            WHERE category = %s
            ORDER BY embedding <-> %s::vector
            LIMIT %s;
        """, (category, str(q_emb), limit))
    else:
        cur.execute("""
            SELECT title, category, content
            FROM documents
            ORDER BY embedding <-> %s::vector
            LIMIT %s;
        """, (str(q_emb), limit))
    results = cur.fetchall()
    cur.close()
    conn.close()
    return results

# ----------------- Streamlit UI -----------------
st.set_page_config(page_title="Semantic Search", layout="wide")
st.title("🔎 Local Semantic Search (PostgreSQL + Embeddings)")

query = st.text_input("Enter your search query:", "")

col1, col2 = st.columns([2, 1])
with col1:
    limit = st.slider("Number of results", 1, 10, 3)
with col2:
    category = st.selectbox("Category filter", ["All", "transport", "aviation", "animals"])

if st.button("Search") and query:
    with st.spinner("Searching..."):
        results = semantic_search(query, category, limit)
    if results:
        st.success(f"Found {len(results)} results:")
        for r in results:
            st.markdown(f"**[{r[1]}] {r[0]}**  \n{r[2]}")
            st.markdown("---")
    else:
        st.warning("No results found.")
```
Example in action
- Searching for “pets” returns documents about dogs or cats, even if the word “pets” is not in the database.
- Searching for “automobile” finds entries about cars.
- Searching for “flights” brings up documents about airplanes.
This demonstrates the power of semantic search to capture meaning instead of relying only on keywords.
How to improve further
- Add indexing with IVFFlat (or HNSW) to speed up searches on large datasets.
- Extend category filters to restrict results to specific topics such as transport, animals, or aviation.
- Allow file uploads so users can insert PDFs or text files into the database automatically.
- Add highlighting in the results to show why a document was selected.
- Explore reranking approaches to improve result quality.
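As a sketch of the first suggestion, an IVFFlat index on the embedding column could look like this. The `lists` value is a tunable assumption; the pgvector documentation suggests roughly rows/1000 lists for datasets up to about a million rows:

```sql
-- Approximate nearest-neighbor index for the Euclidean (<->) operator
CREATE INDEX ON documents USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

-- Refresh planner statistics so the index is used effectively
ANALYZE documents;
```

If you switch to the cosine distance operator `<=>`, build the index with `vector_cosine_ops` instead, since IVFFlat indexes are operator-specific.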
Why this matters
The system runs offline, without relying on third-party APIs, and all data remains under your control inside PostgreSQL.
It is also highly extendable; you can add new document types, categories, or even expand into images in the future. This setup gives you a local AI-powered search engine that you fully own and control.
Conclusion
Semantic search is changing traditional SEO practices. Search engines now look at the intent behind a user’s query to improve results, delivering more accurate answers in an information-saturated world.
With semantic search in SQL, organizations don’t need a separate vector database. Instead, they can enrich existing SQL workflows with meaning-based retrieval, which reduces infrastructure costs while improving search relevance.
Companies can make faster, better decisions without leaving the SQL environment they already trust. Partner with Xavor’s enterprise web presence services if you want to transform your data through semantic search.
Drop us a line at [email protected], and our experts will get back to you in 24-48 hours.

