
Client-side vector search with DuckDB WASM and HuggingFace embeddings


Semantic search on markdown knowledge bases typically requires a server-side vector database


Problem

Building semantic search for a knowledge base (markdown files, session logs, vault notes) usually requires deploying a vector database like Pinecone, Qdrant, or pgvector. This adds infrastructure cost, latency, and a backend dependency for what could be a fully client-side feature.

Solution

Combine DuckDB WASM for in-browser SQL with HuggingFace Transformers.js for client-side embeddings. No backend required.

1. Load the embedding model in the browser

import { pipeline } from "@huggingface/transformers";

const embedder = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2",
  { device: "webgpu" } // Falls back to WASM if WebGPU unavailable
);

async function embed(text: string): Promise<Float32Array> {
  const output = await embedder(text, {
    pooling: "mean",
    normalize: true,
  });
  return output.data as Float32Array;
}
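The MiniLM model truncates input beyond its 256-token context window, so long markdown files are usually split into chunks and each chunk embedded separately. A minimal sketch of that preprocessing step (the paragraph-based split and the 1000-character cap are assumptions, not part of the original pattern):

```typescript
// Split markdown into roughly maxChars-sized chunks on paragraph
// boundaries, so each chunk fits comfortably in the embedding model's
// context window.
function chunkMarkdown(text: string, maxChars = 1000): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const para of text.split(/\n\s*\n/)) {
    // Flush the current chunk before it would exceed the cap.
    if (current.length + para.length > maxChars && current) {
      chunks.push(current.trim());
      current = "";
    }
    current += para + "\n\n";
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Each chunk would then be passed to `embed` and inserted as its own row, with the parent document's title repeated.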

2. Initialize DuckDB WASM with vector support

import * as duckdb from "@duckdb/duckdb-wasm";

// Resolve the bundle that matches this browser from the jsDelivr CDN.
const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());

// The worker script is served cross-origin, so wrap it in a same-origin blob URL.
const workerUrl = URL.createObjectURL(
  new Blob([`importScripts("${bundle.mainWorker!}");`], { type: "text/javascript" })
);
const worker = new Worker(workerUrl);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
URL.revokeObjectURL(workerUrl);

const conn = await db.connect();

await conn.query(`
  CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    title VARCHAR,
    content VARCHAR,
    embedding FLOAT[384]
  )
`);

3. Index documents and search

async function indexDocument(id: number, title: string, content: string) {
  const vector = await embed(content);
  // Single quotes are escaped by doubling; the embedding is inlined as a
  // DuckDB list literal. For bulk loads, prepared statements or Arrow
  // insertion would be faster than string-built SQL.
  await conn.query(`
    INSERT INTO documents VALUES (
      ${id},
      '${title.replace(/'/g, "''")}',
      '${content.replace(/'/g, "''")}',
      [${Array.from(vector).join(",")}]
    )
  `);
}

async function search(query: string, limit = 5) {
  const queryVec = await embed(query);
  const result = await conn.query(`
    SELECT id, title, content,
      list_cosine_similarity(embedding, [${Array.from(queryVec).join(",")}]) AS score
    FROM documents
    ORDER BY score DESC
    LIMIT ${limit}
  `);
  return result.toArray();
}

Why It Works

DuckDB WASM runs the full DuckDB engine in the browser, including array operations for cosine similarity. HuggingFace Transformers.js runs embedding models (like MiniLM-L6-v2 at 23MB) directly in the browser using WebGPU or WASM. Together they eliminate the need for any backend infrastructure. DuckDB's columnar storage and vectorized execution make similarity searches over thousands of documents fast enough for real-time use.
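For reference, this is the computation `list_cosine_similarity` performs, written out in plain TypeScript. Note that because the embeddings are normalized at creation time (`normalize: true` in step 1), the denominator is 1 and the score effectively reduces to a dot product:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns a value in [-1, 1],
// where 1 means the vectors point in the same direction.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Pushing this loop into DuckDB's vectorized engine, instead of iterating over rows in JavaScript, is what keeps the search fast as the document count grows.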

Context

  • The all-MiniLM-L6-v2 model produces 384-dimensional embeddings and is only 23MB -- loads in seconds on modern connections
  • DuckDB WASM supports list_cosine_similarity natively for vector distance calculations
  • For larger datasets (10k+ documents), consider pre-computing embeddings at build time and loading as a Parquet file
  • Josh Peak uses DuckDB to query Claude Code session .jsonl files for self-reflection and cost tracking
  • This pattern pairs well with Obsidian vaults or markdown knowledge bases exported as JSON
  • WebGPU acceleration makes embedding generation 5-10x faster than WASM-only on supported browsers
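The build-time Parquet path mentioned above can be sketched as a one-line table swap. The file name and URL here are assumptions; the build step is expected to write the same schema the `documents` table uses (id, title, content, embedding):

```typescript
// Build the SQL that replaces the CREATE TABLE + indexDocument loop with
// a single bulk load. read_parquet is a built-in DuckDB table function,
// and DuckDB WASM can fetch the file over HTTP directly.
function loadFromParquetSQL(url: string): string {
  return `CREATE TABLE documents AS SELECT * FROM read_parquet('${url}')`;
}

// In the browser, at startup:
//   await conn.query(loadFromParquetSQL("/embeddings.parquet"));
```

With embeddings pre-computed, the browser only pays for embedding the search query itself, not the corpus.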
About this share

Contributor: mblode
Repository: mblode/shares
Created: Feb 10, 2026