Building LumenVec: A Go Vector Database for Predictable Retrieval Performance

> Recommended tags: go, golang, vector-database, rag, backend, performance

Why I started building LumenVec

Vector search has become a core primitive for modern AI applications.

Whether the workload is semantic search, retrieval-augmented generation, recommendations, or agent memory, the same pattern shows up quickly: the prototype works, but production pressure exposes different constraints.

The issue is usually not whether vector search can return relevant results.

The issue is whether the system can do it:

with stable latency,
with useful throughput,
with enough retrieval quality,
and without becoming operationally expensive or overly complex.

That is why I started building LumenVec, a vector database written in Go.

The real problem is production behavior

A lot of vector infrastructure discussion focuses on feature matrices or isolated speed claims.

Those are useful only up to a point.

In practice, teams care more about:

how p95 and p99 behave under concurrency,
whether throughput stays acceptable as demand grows,
how expensive the system is to operate,
and how much effort it takes to make it predictable.

That is the lens I wanted to apply to LumenVec from the start.
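To make "p95 and p99 under concurrency" concrete, here is a minimal sketch of a load harness that issues queries from a fixed pool of workers and records one latency sample per query. It is not LumenVec's benchmark code: `QueryFunc` and `RunConcurrent` are hypothetical names, and the query is simulated with a sleep rather than a real database call.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// QueryFunc is a hypothetical stand-in for one search call against any engine.
type QueryFunc func(queryID int) error

// RunConcurrent issues `total` queries from `workers` goroutines. It returns
// one latency sample per query and the wall-clock duration of the whole run.
func RunConcurrent(total, workers int, query QueryFunc) ([]time.Duration, time.Duration) {
	latencies := make([]time.Duration, total)
	jobs := make(chan int)
	var wg sync.WaitGroup

	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				t0 := time.Now()
				_ = query(id) // error handling is omitted in this sketch
				// Each index is written by exactly one worker, so no lock is needed.
				latencies[id] = time.Since(t0)
			}
		}()
	}
	for id := 0; id < total; id++ {
		jobs <- id
	}
	close(jobs)
	wg.Wait()
	return latencies, time.Since(start)
}

func main() {
	// Simulated query: sleeps for 1 ms instead of calling a real database.
	fake := func(int) error { time.Sleep(time.Millisecond); return nil }

	lat, wall := RunConcurrent(500, 4, fake)
	qps := float64(len(lat)) / wall.Seconds()
	fmt.Printf("queries=%d wall=%v qps=%.1f\n", len(lat), wall, qps)
}
```

The latency slice is what tail percentiles are computed from; throughput comes from the wall-clock time of the whole run, not from the sum of individual latencies.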

What LumenVec is optimizing for

LumenVec is designed around a few practical priorities:

1. Predictable tail latency

Median numbers are easy to advertise. Production pain usually appears in the tail.

2. High-throughput concurrent retrieval

The system should remain useful when multiple clients are querying simultaneously.

3. Operational simplicity

Infrastructure becomes more valuable when it is easier to run, inspect, and evolve.

4. Pragmatic API design

Some integrations care more about compatibility. Others care more about lower overhead. The system should support practical choices.

Why Go was the right fit

Go gives a strong engineering balance for infrastructure software:

straightforward deployment,
strong concurrency support,
a solid runtime story,
and a good ecosystem for observability and systems programming.

For LumenVec, that mattered as much as raw speed.

I was not only optimizing for a benchmark. I was optimizing for a system that is realistic to operate.

Benchmark setup

The current benchmark snapshot used:

10,000 vectors
128 dimensions
500 measured queries
100 warmup queries
concurrency 4
top-k 10
3 runs per row
ingest batch sizes of 100, 500, 1000, and 2000

Compared systems included LumenVec, pgvector, Weaviate, Qdrant, and Chroma.
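For reproducibility, the parameters above can be captured in a single value that travels with the results. The sketch below is illustrative only: `BenchmarkConfig`, its field names, and `Validate` are hypothetical and not taken from LumenVec's code. Only the numbers and engine names come from this post.

```go
package main

import "fmt"

// BenchmarkConfig is a hypothetical container for the snapshot parameters.
type BenchmarkConfig struct {
	Vectors          int
	Dimensions       int
	MeasuredQueries  int
	WarmupQueries    int
	Concurrency      int
	TopK             int
	RunsPerRow       int
	IngestBatchSizes []int
	Engines          []string
}

// SnapshotConfig returns the values listed in the benchmark setup.
func SnapshotConfig() BenchmarkConfig {
	return BenchmarkConfig{
		Vectors:          10000,
		Dimensions:       128,
		MeasuredQueries:  500,
		WarmupQueries:    100,
		Concurrency:      4,
		TopK:             10,
		RunsPerRow:       3,
		IngestBatchSizes: []int{100, 500, 1000, 2000},
		Engines:          []string{"LumenVec", "pgvector", "Weaviate", "Qdrant", "Chroma"},
	}
}

// Validate catches settings that would make a run meaningless,
// such as asking for more neighbors than there are vectors.
func (c BenchmarkConfig) Validate() error {
	switch {
	case c.Vectors <= 0 || c.Dimensions <= 0:
		return fmt.Errorf("vectors and dimensions must be positive")
	case c.TopK <= 0 || c.TopK > c.Vectors:
		return fmt.Errorf("top-k %d is out of range for %d vectors", c.TopK, c.Vectors)
	case c.Concurrency <= 0 || c.MeasuredQueries <= 0 || c.RunsPerRow <= 0:
		return fmt.Errorf("concurrency, measured queries, and runs must be positive")
	}
	return nil
}

func main() {
	cfg := SnapshotConfig()
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Writing the configuration next to every result file makes it easier for other engineers to confirm which settings produced which row.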

How I plan to evaluate it

I am preparing a fuller benchmark comparison between LumenVec and established alternatives, building on the snapshot described above.

The benchmark will focus on metrics that actually matter:

QPS
p95 / p99 latency
recall@k
behavior under concurrency

The last two metrics are essential.

A system is not meaningfully faster if it only looks better after quality is reduced or the test environment is unrealistic.
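As a reference for how two of these metrics are commonly defined, here is a minimal sketch of a nearest-rank percentile and of recall@k. This is one common definition of each, not necessarily the one the LumenVec benchmark uses; the function names are hypothetical, and result IDs are assumed to be unique integers.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Percentile returns the nearest-rank percentile p (0 < p <= 100) of samples.
// It copies the input so the caller's slice is not reordered.
func Percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return math.NaN()
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// RecallAtK returns the fraction of the exact top-k IDs (groundTruth)
// that appear among the first k IDs returned by the engine.
func RecallAtK(returned, groundTruth []int, k int) float64 {
	if k > len(groundTruth) {
		k = len(groundTruth)
	}
	if k <= 0 {
		return 0
	}
	truth := make(map[int]struct{}, k)
	for _, id := range groundTruth[:k] {
		truth[id] = struct{}{}
	}
	n := k
	if n > len(returned) {
		n = len(returned)
	}
	hits := 0
	for _, id := range returned[:n] {
		if _, ok := truth[id]; ok {
			hits++
			delete(truth, id) // guard against counting a repeated ID twice
		}
	}
	return float64(hits) / float64(k)
}

func main() {
	// Illustrative latencies in milliseconds, not benchmark data.
	latencies := []float64{2.1, 2.3, 2.2, 9.8, 2.4, 2.0, 2.5, 2.2, 2.3, 2.1}
	fmt.Println("p50:", Percentile(latencies, 50), "p95:", Percentile(latencies, 95))

	exact := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	approx := []int{1, 2, 3, 4, 5, 6, 7, 8, 42, 43}
	fmt.Println("recall@10:", RecallAtK(approx, exact, 10))
}
```

In the illustrative data, a single slow query barely moves the median but sets the p95, which is why tail percentiles and recall are reported together with QPS.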

What the current benchmark shows

The strongest ANN result so far is:

LumenVec ann-quality over HTTP
1,947.29 QPS
3.466 ms p95
5.285 ms p99
0.7568 recall@10

That makes it the top search-throughput result in this benchmark among rows with recall@10 >= 0.75.
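The selection rule behind that statement, "highest QPS among rows that meet a recall floor", can be sketched as follows. The `Row` type and `BestAboveRecall` function are hypothetical. The two LumenVec rows use numbers quoted in this post; the third row is invented purely to show a fast result being excluded by the threshold.

```go
package main

import "fmt"

// Row is one benchmark result line (illustrative type).
type Row struct {
	Name   string
	QPS    float64
	Recall float64 // recall@10
}

// BestAboveRecall returns the highest-QPS row whose recall is at least
// minRecall. The boolean is false when no row qualifies.
func BestAboveRecall(rows []Row, minRecall float64) (Row, bool) {
	var best Row
	found := false
	for _, r := range rows {
		if r.Recall < minRecall {
			continue // fast but too lossy: not comparable
		}
		if !found || r.QPS > best.QPS {
			best, found = r, true
		}
	}
	return best, found
}

// SampleRows mixes two rows quoted in this post with one invented row.
func SampleRows() []Row {
	return []Row{
		{Name: "LumenVec ann-quality HTTP", QPS: 1947.29, Recall: 0.7568},
		{Name: "LumenVec exact gRPC", QPS: 865.39, Recall: 1.0},
		// HYPOTHETICAL row, not a measured result.
		{Name: "hypothetical fast low-recall profile", QPS: 3000, Recall: 0.60},
	}
}

func main() {
	if best, ok := BestAboveRecall(SampleRows(), 0.75); ok {
		fmt.Printf("best at recall@10 >= 0.75: %s (%.2f QPS)\n", best.Name, best.QPS)
	}
}
```

Filtering before ranking keeps a low-recall profile from winning a throughput comparison it has not earned.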

The strongest exact-search result is:

LumenVec exact over gRPC
865.39 QPS
6.701 ms p95
9.012 ms p99
1.0000 recall@10

Exact search comparison

Best exact rows in this snapshot:

LumenVec exact gRPC: 865.39 QPS, 6.701 ms p95
pgvector exact: 523.03 QPS, 10.954 ms p95
Qdrant default: 90.45 QPS, 47.903 ms p95

On this benchmark host, LumenVec exact outperformed the best exact pgvector row in both throughput and p95 latency.

Exact-search throughput on this benchmark host. p95 latency is shown beside each result.
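The exact-search comparison above can be turned into relative numbers with a few lines of arithmetic. The sketch below uses only the figures quoted in this section; the `Result` type and helper names are hypothetical.

```go
package main

import "fmt"

// Result holds the two numbers quoted per engine in the exact-search comparison.
type Result struct {
	Name  string
	QPS   float64
	P95ms float64
}

// ThroughputRatio reports how many times more queries per second a serves than b.
func ThroughputRatio(a, b Result) float64 { return a.QPS / b.QPS }

// P95Reduction reports the fraction by which a's p95 is lower than b's.
// A value of 0.25 means a's p95 is 25% lower; a negative value means a is slower.
func P95Reduction(a, b Result) float64 { return 1 - a.P95ms/b.P95ms }

func main() {
	lumen := Result{Name: "LumenVec exact gRPC", QPS: 865.39, P95ms: 6.701}
	others := []Result{
		{Name: "pgvector exact", QPS: 523.03, P95ms: 10.954},
		{Name: "Qdrant default", QPS: 90.45, P95ms: 47.903},
	}
	for _, o := range others {
		fmt.Printf("vs %s: %.2fx QPS, p95 %.1f%% lower\n",
			o.Name, ThroughputRatio(lumen, o), 100*P95Reduction(lumen, o))
	}
}
```

Against the pgvector row, this works out to roughly 1.65x the throughput and a p95 that is about 39% lower, on this host and at this dataset size.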

What I want the benchmark to show

The benchmark is meant to be useful for engineers making infrastructure decisions.

So the point is not to produce a vanity chart.

The point is to clarify trade-offs like:

when one system has better throughput but worse tail latency,
when another preserves quality better under pressure,
and where LumenVec sits in the balance between speed, quality, and simplicity.

The search-throughput and latency charts already show a useful pattern, and the recall-vs-QPS frontier makes the profile trade-off explicit.

What still needs validation

There is still work ahead.

That includes:

broader workload coverage,
more production-style evaluation,
clearer reproducibility artifacts,
and ongoing product hardening.

There is also one caveat I would keep explicit: the baseline comparison for this run flagged broad regressions across multiple engines on this host. So this is a valid snapshot benchmark, but it should not yet be presented as a final controlled-lab result.

That is part of building serious infrastructure.

What comes next

I will publish the benchmark results with methodology, charts, and interpretation once the benchmark package is finalized.

That follow-up will include:

the test setup,
the compared engines,
the workloads,
the quality threshold,
and the results in a form other engineers can evaluate critically.

Final note

If you are building in the space of:

RAG,
semantic search,
recommendation systems,
or agent retrieval pipelines,

I would be interested in hearing what matters most in your production environment.

That is exactly the kind of feedback loop I want LumenVec to be shaped by.
