WoW: A Window-to-Window Incremental Index for Range-Filtering Approximate Nearest Neighbor Search, SIGMOD 2026 (Round 2)

This is the header-only library of WoW, for the benchmarking repository, please check here.

Despite for range-filtering scenarios, it can also be used as a generic predicate-filtering index.

Requirements

A modern C++ compiler that supports C++20
CMake 3.20 or higher
OpenMP
Dependencies: hnswlib for fast distance computation, ygg for WBT definition.

Installation

This is a header-only library, just include the header wow/index.hh in your project.

Quick Start

Python Example

We provide a python binding for WoW index to support a variety of attribute types, which is implemented using pybind11. Please refer to the README in the python folder for more details. We provide a python example to build and search the WoW index. The example is in the example/ folder: example_sequential.py and example_arbitrary.py. These examples require no dataset configuration by creating random vectors and attribute values and are good for a quick start.

C++ Example

We provide a list of detailed shell scripts to run the WoW index on benchmarked datasets using real-world datasets. The dataset format is described in the following section.

example/build_wow_bench.sh: build a WoW index taken the vector ID as the attribute.
example/build_wow_common.sh: build a WoW index for arbitrary attributes (allow duplicated attribute values).
example/search_wow_bench.sh: search the top-k nearest neighbors with range filter using WoW index for benchmarking, given the ranges with fractions varying from $2^{-16}-2^0$ including a mixed range fraction workload. Refer to the paper for the detailed introduction.
example/search_wow_common.sh: search the top-k nearest neighbors with range filter using WoW index for arbitrary attributes (allow duplicated attribute values).
example/search_wow_generic.sh: we also notice that WoW can be used as a generic predicate-filtering index for ANN search (just like ACORN). However, the performance is not guaranteed to be the best but it is still competitive (10% points to pass the filter, reach 97% recall), this will use the index generated by example/build_wow_bench.sh.

Datasets

If you want to test with your own datasets using our scripts for C++ usage, please follow the instructions below.

Vector Datasets

We provide the urls of the benchmarked vector datasets used in the paper:

SIFT1M: SIFT1M dataset, the synthetic attribute is generated by the authors.
GIST1M: GIST1M dataset, the synthetic attribute is generated by the authors.
ArXiv: a recent hybrid dataset released by QDrant. It involves vectors embedded from over two million paper titles using the all-MiniLM-L6 encoder, along with release time as the attribute.
Wikidata: a multilingual dataset released by Cohere, and the English version subset is used. Two sizes of the dataset are used in the paper: 4M and 41M (complete dataset). The attribute is the entity title, ranked by the dictionary order.
Deep: obtained from the last fully-connected layer of the GoogleNet model with image size as the attribute. Two sizes of the dataset are used in the paper: 10M and 100M.

For more details, please refer to the paper. The vector file format is .fvecs.

Attribute File Format

Base attribute is a .bin file with nb * sizeof(int) bytes, where nb is the number of vectors in the dataset. It is required by example/build_wow_common.sh and example/search_wow_common.sh.
Query range is a .bin file with nq * sizeof(int) * 2 bytes, where nq is the number of queries. It is required by all range search scripts.
For generic predicate-filtering script (example/search_wow_generic.sh), no attribute file or query range file is required. It will use the vector ID as the attribute (please build the index using vector IDs) and randomly generate the bitmap for allowed vector IDs. The --npass parameter is used to specify how many IDs can pass the filter.

Ground Truth

The search_wow_common and search_wow_bench will generate the ground truth file if it can not be found in the provided path. You can also provide your own ground truth file in .ivecs format.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
example		example
python		python
wow		wow
.clang-format		.clang-format
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md
appendix.pdf		appendix.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

WoW: A Window-to-Window Incremental Index for Range-Filtering Approximate Nearest Neighbor Search, SIGMOD 2026 (Round 2)

Requirements

Installation

Quick Start

Python Example

C++ Example

Datasets

Vector Datasets

Attribute File Format

Ground Truth

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

License

nju-websoft/WoW

Folders and files

Latest commit

History

Repository files navigation

WoW: A Window-to-Window Incremental Index for Range-Filtering Approximate Nearest Neighbor Search, SIGMOD 2026 (Round 2)

Requirements

Installation

Quick Start

Python Example

C++ Example

Datasets

Vector Datasets

Attribute File Format

Ground Truth

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages