Problem
Building semantic search for a knowledge base (markdown files, session logs, vault notes) usually requires deploying a vector database like Pinecone, Qdrant, or pgvector. This adds infrastructure cost, latency, and a backend dependency for what could be a fully client-side feature.
Solution
Combine DuckDB WASM for in-browser SQL with HuggingFace Transformers.js for client-side embeddings. No backend required.
1. Load the embedding model in the browser
```typescript
import { pipeline } from "@huggingface/transformers";

// Prefer WebGPU; fall back to WASM explicitly if it is unavailable.
const embedder = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2",
  { device: "webgpu" }
).catch(() =>
  pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2", { device: "wasm" })
);

async function embed(text: string): Promise<Float32Array> {
  const output = await embedder(text, {
    pooling: "mean",   // average token embeddings into a single vector
    normalize: true,   // unit length, so cosine similarity reduces to a dot product
  });
  return output.data as Float32Array;
}
```
2. Initialize DuckDB WASM with vector support
```typescript
import * as duckdb from "@duckdb/duckdb-wasm";

// Pick the bundle matching the browser's capabilities.
const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());

// jsDelivr-hosted workers are cross-origin, so wrap them in a same-origin blob URL.
const workerUrl = URL.createObjectURL(
  new Blob([`importScripts("${bundle.mainWorker}");`], { type: "text/javascript" })
);
const worker = new Worker(workerUrl);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule);

const conn = await db.connect();
await conn.query(`
  CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    title VARCHAR,
    content VARCHAR,
    embedding FLOAT[384]  -- fixed-size array matching the model's output dimension
  )
`);
```
3. Index documents and search
```typescript
async function indexDocument(id: number, title: string, content: string) {
  const vector = await embed(content);
  // Single quotes are escaped; the embedding is inlined as a DuckDB array literal.
  await conn.query(`
    INSERT INTO documents VALUES (
      ${id},
      '${title.replace(/'/g, "''")}',
      '${content.replace(/'/g, "''")}',
      [${Array.from(vector).join(",")}]
    )
  `);
}
```
```typescript
async function search(query: string, limit = 5) {
  const queryVec = await embed(query);
  const result = await conn.query(`
    SELECT id, title, content,
           list_cosine_similarity(embedding, [${Array.from(queryVec).join(",")}]) AS score
    FROM documents
    ORDER BY score DESC
    LIMIT ${limit}
  `);
  return result.toArray();
}
```
Why It Works
DuckDB WASM runs the full DuckDB engine in the browser, including the list and array operations needed for cosine similarity. HuggingFace Transformers.js runs embedding models (like all-MiniLM-L6-v2 at ~23MB) directly in the browser using WebGPU or WASM. Together they eliminate the need for any backend infrastructure. DuckDB's columnar storage and vectorized execution make similarity searches over thousands of documents fast enough for real-time use.
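The similarity DuckDB computes here can be written out in a few lines of plain TypeScript: cosine similarity is the dot product divided by the product of the vector magnitudes, and because the embeddings are generated with `normalize: true`, the denominator is 1 and the score reduces to a dot product. A minimal sketch:

```typescript
// Cosine similarity, as list_cosine_similarity computes it:
// dot(a, b) / (|a| * |b|). For unit-length vectors this is just dot(a, b).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 1, orthogonal vectors score 0, which is why `ORDER BY score DESC` surfaces the closest documents first.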
Context
- The `all-MiniLM-L6-v2` model produces 384-dimensional embeddings and is only 23MB -- loads in seconds on modern connections
- DuckDB WASM supports `list_cosine_similarity` natively for vector distance calculations
- For larger datasets (10k+ documents), consider pre-computing embeddings at build time and loading as a Parquet file
- Josh Peak uses DuckDB to query Claude Code session `.jsonl` files for self-reflection and cost tracking
- This pattern pairs well with Obsidian vaults or markdown knowledge bases exported as JSON
- WebGPU acceleration makes embedding generation 5-10x faster than WASM-only on supported browsers
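The build-time Parquet suggestion can be sketched as two DuckDB statements. The file name `docs.parquet` is an assumption, and in the browser the file would first need to be registered with the WASM filesystem (e.g. via `db.registerFileBuffer`) before it can be queried by name:

```sql
-- Build time (Node script or DuckDB CLI): persist precomputed embeddings.
COPY (SELECT id, title, content, embedding FROM documents)
  TO 'docs.parquet' (FORMAT PARQUET);

-- In the browser, after registering the file with DuckDB WASM:
CREATE TABLE documents AS SELECT * FROM 'docs.parquet';
```

This moves the expensive embedding pass out of the user's session entirely; the browser only pays for embedding the search query itself.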