Problem
Building semantic search for a knowledge base (markdown files, session logs, vault notes) usually requires deploying a vector database like Pinecone, Qdrant, or pgvector. This adds infrastructure cost, latency, and a backend dependency for what could be a fully client-side feature.
Solution
Combine DuckDB WASM for in-browser SQL with HuggingFace Transformers.js for client-side embeddings. No backend required.
1. Load the embedding model in the browser
```typescript
import { pipeline } from "@huggingface/transformers";

// Prefer WebGPU; fall back to WASM explicitly if it is unavailable.
const embedder = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2",
  { device: "webgpu" }
).catch(() =>
  pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2", { device: "wasm" })
);

async function embed(text: string): Promise<Float32Array> {
  const output = await embedder(text, {
    pooling: "mean",   // average token embeddings into a single vector
    normalize: true,   // unit length, so cosine similarity reduces to a dot product
  });
  return output.data as Float32Array;
}
```
2. Initialize DuckDB WASM with vector support
```typescript
import * as duckdb from "@duckdb/duckdb-wasm";

// Pick the bundle matching the browser's capabilities.
const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());

// jsDelivr-hosted workers are cross-origin, so wrap them in a same-origin blob URL.
const workerUrl = URL.createObjectURL(
  new Blob([`importScripts("${bundle.mainWorker}");`], { type: "text/javascript" })
);
const worker = new Worker(workerUrl);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule);

const conn = await db.connect();
await conn.query(`
  CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    title VARCHAR,
    content VARCHAR,
    embedding FLOAT[384]  -- fixed-size array matching the model's output dimension
  )
`);
```
3. Index documents and search
```typescript
async function indexDocument(id: number, title: string, content: string) {
  const vector = await embed(content);
  // Single quotes are escaped; the embedding is inlined as a DuckDB array literal.
  await conn.query(`
    INSERT INTO documents VALUES (
      ${id},
      '${title.replace(/'/g, "''")}',
      '${content.replace(/'/g, "''")}',
      [${Array.from(vector).join(",")}]
    )
  `);
}
```
```typescript
async function search(query: string, limit = 5) {
  const queryVec = await embed(query);
  const result = await conn.query(`
    SELECT id, title, content,
           list_cosine_similarity(embedding, [${Array.from(queryVec).join(",")}]) AS score
    FROM documents
    ORDER BY score DESC
    LIMIT ${limit}
  `);
  return result.toArray();
}
```
Why It Works
DuckDB WASM runs the full DuckDB engine in the browser, including the list and array operations needed for cosine similarity. HuggingFace Transformers.js runs embedding models (like all-MiniLM-L6-v2 at ~23MB) directly in the browser using WebGPU or WASM. Together they eliminate the need for any backend infrastructure. DuckDB's columnar storage and vectorized execution make similarity searches over thousands of documents fast enough for real-time use.
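The similarity DuckDB computes here can be written out in a few lines of plain TypeScript: cosine similarity is the dot product divided by the product of the vector magnitudes, and because the embeddings are generated with `normalize: true`, the denominator is 1 and the score reduces to a dot product. A minimal sketch:

```typescript
// Cosine similarity, as list_cosine_similarity computes it:
// dot(a, b) / (|a| * |b|). For unit-length vectors this is just dot(a, b).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 1, orthogonal vectors score 0, which is why `ORDER BY score DESC` surfaces the closest documents first.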
Context
- The `all-MiniLM-L6-v2` model produces 384-dimensional embeddings and is only 23MB -- loads in seconds on modern connections
- DuckDB WASM supports `list_cosine_similarity` natively for vector distance calculations
- For larger datasets (10k+ documents), consider pre-computing embeddings at build time and loading as a Parquet file
- Josh Peak uses DuckDB to query Claude Code session `.jsonl` files for self-reflection and cost tracking
- This pattern pairs well with Obsidian vaults or markdown knowledge bases exported as JSON
- WebGPU acceleration makes embedding generation 5-10x faster than WASM-only on supported browsers
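The build-time Parquet suggestion can be sketched as two DuckDB statements. The file name `docs.parquet` is an assumption, and in the browser the file would first need to be registered with the WASM filesystem (e.g. via `db.registerFileBuffer`) before it can be queried by name:

```sql
-- Build time (Node script or DuckDB CLI): persist precomputed embeddings.
COPY (SELECT id, title, content, embedding FROM documents)
  TO 'docs.parquet' (FORMAT PARQUET);

-- In the browser, after registering the file with DuckDB WASM:
CREATE TABLE documents AS SELECT * FROM 'docs.parquet';
```

This moves the expensive embedding pass out of the user's session entirely; the browser only pays for embedding the search query itself.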