Skip to content

WoW: A Window-to-Window Incremental Index for Range-Filtering Approximate Nearest Neighbor Search, SIGMOD 2026

License

Notifications You must be signed in to change notification settings

nju-websoft/WoW

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

WoW: A Window-to-Window Incremental Index for Range-Filtering Approximate Nearest Neighbor Search, SIGMOD 2026 (Round 2)

This is the header-only library of WoW, for the benchmarking repository, please check here.

Despite for range-filtering scenarios, it can also be used as a generic predicate-filtering index.

Requirements

  • A modern C++ compiler that supports C++20
  • CMake 3.20 or higher
  • OpenMP
  • Dependencies: hnswlib for fast distance computation, ygg for WBT definition.

Installation

This is a header-only library, just include the header wow/index.hh in your project.

Quick Start

Python Example

We provide a python binding for WoW index to support a variety of attribute types, which is implemented using pybind11. Please refer to the README in the python folder for more details. We provide a python example to build and search the WoW index. The example is in the example/ folder: example_sequential.py and example_arbitrary.py. These examples require no dataset configuration by creating random vectors and attribute values and are good for a quick start.

C++ Example

We provide a list of detailed shell scripts to run the WoW index on benchmarked datasets using real-world datasets. The dataset format is described in the following section.

  • example/build_wow_bench.sh: build a WoW index taken the vector ID as the attribute.
  • example/build_wow_common.sh: build a WoW index for arbitrary attributes (allow duplicated attribute values).
  • example/search_wow_bench.sh: search the top-k nearest neighbors with range filter using WoW index for benchmarking, given the ranges with fractions varying from $2^{-16}-2^0$ including a mixed range fraction workload. Refer to the paper for the detailed introduction.
  • example/search_wow_common.sh: search the top-k nearest neighbors with range filter using WoW index for arbitrary attributes (allow duplicated attribute values).
  • example/search_wow_generic.sh: we also notice that WoW can be used as a generic predicate-filtering index for ANN search (just like ACORN). However, the performance is not guaranteed to be the best but it is still competitive (10% points to pass the filter, reach 97% recall), this will use the index generated by example/build_wow_bench.sh.

Datasets

If you want to test with your own datasets using our scripts for C++ usage, please follow the instructions below.

Vector Datasets

We provide the urls of the benchmarked vector datasets used in the paper:

  • SIFT1M: SIFT1M dataset, the synthetic attribute is generated by the authors.
  • GIST1M: GIST1M dataset, the synthetic attribute is generated by the authors.
  • ArXiv: a recent hybrid dataset released by QDrant. It involves vectors embedded from over two million paper titles using the all-MiniLM-L6 encoder, along with release time as the attribute.
  • Wikidata: a multilingual dataset released by Cohere, and the English version subset is used. Two sizes of the dataset are used in the paper: 4M and 41M (complete dataset). The attribute is the entity title, ranked by the dictionary order.
  • Deep: obtained from the last fully-connected layer of the GoogleNet model with image size as the attribute. Two sizes of the dataset are used in the paper: 10M and 100M.

For more details, please refer to the paper. The vector file format is .fvecs.

Attribute File Format

  • Base attribute is a .bin file with nb * sizeof(int) bytes, where nb is the number of vectors in the dataset. It is required by example/build_wow_common.sh and example/search_wow_common.sh.
  • Query range is a .bin file with nq * sizeof(int) * 2 bytes, where nq is the number of queries. It is required by all range search scripts.
  • For generic predicate-filtering script (example/search_wow_generic.sh), no attribute file or query range file is required. It will use the vector ID as the attribute (please build the index using vector IDs) and randomly generate the bitmap for allowed vector IDs. The --npass parameter is used to specify how many IDs can pass the filter.

Ground Truth

The search_wow_common and search_wow_bench will generate the ground truth file if it can not be found in the provided path. You can also provide your own ground truth file in .ivecs format.

About

WoW: A Window-to-Window Incremental Index for Range-Filtering Approximate Nearest Neighbor Search, SIGMOD 2026

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages