Manu_ A Cloud Native Vector Database Management System
Rentong Guo†∗ , Xiaofan Luan†∗ , Long Xiang‡∗ , Xiao Yan‡∗ , Xiaomeng Yi†∗ , Jigao Luo†§
Qianya Cheng† , Weizhi Xu† , Jiarui Luo‡ , Frank Liu† , Zhenshan Cao† , Yanliang Qiao† , Ting Wang†
Bo Tang‡ , Charles Xie†
† Zilliz
‡ Department of Computer Science and Engineering, Southern University of Science and Technology
§ Technical University of Munich
† {firstname.lastname}@zilliz.com
‡ {xiangl3@mail., yanx@, 11911419@mail., tangb3@}sustech.edu.cn, § [email protected]
arXiv:2206.13843v1 [cs.DB] 28 Jun 2022
For the sake of clarity, we only illustrate the parts related to insert requests. The loggers are organized in a hash ring, and each logger handles one or more logical buckets in the hash ring based on consistent hashing. Each shard corresponds to a logical bucket in the hash ring and a WAL channel. Each entity in an insert request is hashed to a shard (and thus a channel) based on its ID. When a logger receives a request, it first verifies the validity of the request, assigns an LSN to the logged entity by consulting the central time service oracle (TSO), determines the segment the entity should go to, and writes the entity to the WAL. The logger also writes the mapping from the new entity ID to its segment ID into a local LSM tree and periodically flushes the incremental part of the LSM tree to object storage, which keeps the entity-to-segment mapping in the SSTable format of RocksDB. Each logger caches the segment mapping (e.g., for checking whether an entity to delete exists) for the shards it manages by consulting the SSTables in object storage.

[Figure 4: Detailed structure of Manu's log system.]

Table 1: Major indexes in Manu

    Vector Quantization: PQ, OPQ, RQ, SQ
    Inverted Index: IVF-Flat, IVF-PQ, IVF-SQ, IVF-HNSW, IMI
    Proximity Graph: HNSW, NSG, NGT
    Numerical Attribute: B-Tree, Sorted List
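The bucket-and-ring assignment described above can be sketched as follows. This is a minimal illustration of consistent hashing, not Manu's actual implementation; the bucket count, logger names, and helper functions are invented for the example.

```python
import hashlib
from bisect import bisect_left

NUM_BUCKETS = 64  # number of logical buckets on the ring (illustrative)

def bucket_of(entity_id: str) -> int:
    """Hash an entity ID to a logical bucket (and thus a shard/WAL channel)."""
    digest = hashlib.sha1(entity_id.encode()).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

def owner_of(bucket: int, logger_marks: list) -> str:
    """Consistent hashing: the logger whose mark is the first one at or after
    the bucket position owns the bucket, wrapping around the ring."""
    marks = sorted(logger_marks)                 # [(position, logger_name), ...]
    positions = [p for p, _ in marks]
    idx = bisect_left(positions, bucket)
    return marks[idx % len(marks)][1]

# Example: three loggers placed on a 64-bucket ring.
loggers = [(10, "logger-a"), (30, "logger-b"), (55, "logger-c")]
owner = owner_of(bucket_of("entity-42"), loggers)
```

Adding or removing a logger only moves the buckets adjacent to its mark, which is why the ring keeps shard reassignment cheap.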
The WAL is row-based and read in a streaming manner for low delay and fine-grained log pub/sub. It is implemented via a cloud-based message queue such as Kafka or Pulsar. We use multiple logical channels for the WAL to prevent different types of requests from interfering with each other, thus achieving high throughput. Data definition requests and system coordination messages use their own channels, while data manipulation requests are hashed across multiple channels to increase throughput.

Data nodes subscribe to the WAL and convert the row-based WAL into column-based binlogs. Specifically, values of the same field (e.g., an attribute or the vector) from the WAL are stored together in column format in binlog files. The column-based nature of the binlog makes it suitable for reading per-field values in batches, thus increasing storage and IO efficiency. An example of this efficiency comes with the index nodes: index nodes read only the required fields (e.g., attribute or vector) from the binlog for index building and are thus free from read amplification.

System coordination: Inter-component messages are also passed via the log; e.g., data nodes announce when segments are written to storage, and index nodes announce when indexes have been built. This is because the log system provides a simple and reliable mechanism for broadcasting system events. Moreover, the time semantics of the log system provide a deterministic order for coordination messages. For example, when a collection should be released from memory, the query coordinator publishes the request to the log and does not need to confirm whether the query nodes receive the message or to handle query node failures. The query nodes independently subscribe to the log and asynchronously release the segments of the collection.

3.4 Tunable Consistency

We adopt a delta consistency model to enable flexible performance-consistency trade-offs; it guarantees a bounded staleness of the data seen by search queries. Specifically, the data seen by a query can be stale for up to delta time units with respect to the time of the last data update, where delta is a user-specified "staleness tolerance" given in virtual time.

In practice, users prefer to define the staleness tolerance in physical time, e.g., 10 seconds. Manu achieves this by making the LSN assigned to each request extremely close to physical time. Manu uses a hybrid logical clock in the TSO to generate timestamps. Each timestamp has two components: a physical component that tracks physical time, and a logical component that tracks event order. The logical component is needed since multiple events may happen in the same physical time unit. Since a timestamp is used as a request's LSN, the value of the physical component indicates the physical time when the request was received by Manu.

For a log subscriber, e.g., a query node, to run the delta consistency model, it needs to know three things: (1) the user-specified staleness tolerance 𝜏, (2) the time of the last data update, and (3) the issue time of the search request. To let each log subscriber know (2), we introduce a time-tick mechanism. Special control messages called time-ticks (similar to watermarks in Apache Flink [26]) are periodically inserted into each log channel (for example, each WAL channel), signaling the progress of data synchronization. Denote the latest time-tick a subscriber has consumed as 𝐿𝑠 and the issue time of a query as 𝐿𝑟 ; if 𝐿𝑟 − 𝐿𝑠 < 𝜏 is not satisfied, the query node waits for the next time-tick before executing the query.

Note that strong consistency and eventual consistency are two special cases of delta consistency, where delta equals 0 and infinity, respectively. To the best of our knowledge, our work is the first to support delta consistency in a vector database.
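The time-tick staleness check underlying delta consistency can be sketched as follows; this is a minimal model with invented names, not Manu's actual code.

```python
def may_execute(query_issue_time: float, latest_time_tick: float, tau: float) -> bool:
    """Delta consistency check: a query issued at L_r may run on a subscriber whose
    latest consumed time-tick is L_s only if L_r - L_s < tau; otherwise the query
    node waits for the next time-tick before executing the query."""
    return query_issue_time - latest_time_tick < tau

# tau = 0 behaves like strong consistency (the subscriber must be fully caught up),
# while tau = infinity behaves like eventual consistency (queries never wait).
```

The two special cases in the text fall out directly: with tau = 0 the check essentially always forces a wait until the subscriber catches up, and with tau = infinity it always passes.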
3.5 Index Building

Searching for similar vectors in large collections by brute force, i.e., scanning the whole dataset, usually yields unacceptably long delays. Numerous indexes have been proposed to accelerate vector search, and Manu automatically builds user-specified indexes. Table 1 summarizes the indexes currently supported by Manu, and we are continuously adding new indexes following the latest indexing algorithms. These indexes differ in their properties and use cases. Vector quantization (VQ) [34, 45] methods compress vectors to reduce the memory footprint and the cost of vector distance/similarity computation. For example, scalar quantization (SQ) [91] maps each dimension of a vector (typically stored as int32 or float) to a single byte. Inverted indexes [69] group vectors into clusters and scan only the most promising clusters for a query. Proximity graphs [33, 42, 61] connect similar vectors to form a graph, and achieve high accuracy and low latency at the cost of high memory consumption [54]. Besides vector indexes, Manu also supports indexes on the attribute fields of entities to accelerate attribute-based filtering.
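To make the scalar quantization idea concrete, the sketch below maps each dimension to one byte using per-dimension min/max scaling. This is a generic SQ formulation for illustration, not Manu's exact codec, and all names are invented.

```python
def sq_train(vectors):
    """Learn per-dimension min/max over the collection (the 'codebook' for SQ)."""
    dims = len(vectors[0])
    lo = [min(v[d] for v in vectors) for d in range(dims)]
    hi = [max(v[d] for v in vectors) for d in range(dims)]
    return lo, hi

def sq_encode(vec, lo, hi):
    """Map each float dimension to one byte in [0, 255]."""
    out = []
    for x, l, h in zip(vec, lo, hi):
        span = (h - l) or 1.0          # guard against a constant dimension
        out.append(min(255, max(0, round((x - l) / span * 255))))
    return bytes(out)

def sq_decode(code, lo, hi):
    """Approximate reconstruction of the original floats from the bytes."""
    return [l + (b / 255.0) * ((h - l) or 1.0) for b, l, h in zip(code, lo, hi)]
```

A float32 vector thus shrinks by 4x, and distances can be computed directly on the byte codes at a small loss of precision.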
There are two index building scenarios in Manu, i.e., batch indexing and stream indexing. Batch indexing occurs when the user builds an index for an entire collection (e.g., when all vectors are updated with a new embedding model). In this case, the index coordinator obtains the paths of all segments in the collection from the data coordinator and instructs index nodes to build indexes for each segment. Stream indexing happens when users continuously insert new entities, and indexes are built asynchronously on the fly without stopping search services. Specifically, after a segment accumulates a sufficient number of vectors, its resident data node seals the segment and writes it to object storage as a binlog. The data coordinator then notifies the index coordinator, which instructs an index node to build an index for the segment. The index node loads only the required column (e.g., vector or attribute) of the segment from object storage for index building to avoid read amplification. For entity deletions, Manu uses a bitmap to record the deleted vectors and rebuilds the index for a segment when a sufficient number of its entities have been deleted. In both batch and stream indexing scenarios, after the required index is built for a segment, the index node persists it in object storage and sends the path to the index coordinator, which notifies the query coordinator so that query nodes can load the index for processing queries. The index coordinator also monitors the status of the index nodes and shuts down idle index nodes to save costs. As vector indexes generally have sub-linear search complexity w.r.t. the number of vectors, searching one large segment is cheaper than searching several small ones, so Manu builds joint indexes on multiple segments when appropriate.

3.6 Vector Search

Manu supports classical vector search, attribute filtering, and multi-vector search. For classical vector search, the distance/similarity function can be Euclidean distance, inner product, or angular distance. Attribute filtering is useful when searching for vectors similar to the query subject to some attribute constraints. For example, an e-commerce platform may want to find products that interest the customer and cost less than $100. Manu supports three strategies for attribute filtering and uses a cost-based model to choose the most suitable strategy for each segment. Multi-vector search is required when an entity is encoded by multiple vectors; for example, a product can be described by both embeddings of its image and embeddings of its text description. In this case, the similarity function between entities is defined as a composition of similarity functions on the constituent vectors. Manu supports two strategies for multi-vector search and chooses which to use according to the entity similarity function. For more details about how Manu handles attribute filtering and multi-vector search, interested readers can refer to Milvus [81].

For vector search, Manu partitions a collection into segments and distributes the segments among query nodes for parallel execution. [Footnote 2: Manu loads all data to the query nodes, as different queries may access different parts of the data, and a hot compute-side cache is necessary for low latency. This is different from general cloud DBMSs that decouple compute and storage (e.g., Snowflake [29]), which fetch only the required data to the compute side upon request.] The proxies cache a copy of the distribution of segments on query nodes by inquiring the query coordinator, and dispatch search requests to the query nodes that hold segments of the searched collection. The query nodes perform vector searches on their local segments without coordination, using a two-phase reduce procedure. For a top-𝑘 vector search request, the query nodes search their local segments to obtain the segment-wise top-𝑘 results. These results are merged by each query node to form the node-wise top-𝑘 results. Then, the node-wise top-𝑘 results are aggregated by the proxy into the global top-𝑘 results and returned to the application. To handle the deletion of vectors, the query nodes use a bitmap to record the deleted vectors in each segment and filter the deleted vectors from the segment-wise search results. Users can configure Manu to batch search requests to improve efficiency. In this case, the proxies cache search requests if the results of previous batches have not been returned yet. In the cache, requests of the same type (i.e., targeting the same collection and using the same similarity function) are organized into one batch and handled by Manu together. Manu also allows maintaining multiple hot replicas of a collection to serve queries for availability and throughput.

Query nodes obtain data from three sources, i.e., the WAL, the index files, and the binlog. For data in growing segments, query nodes subscribe to the WAL and conduct searches using brute-force scan so that updates become searchable within a short delay. A dilemma for segment size is that a larger size yields better search efficiency once the index is built, but brute-force scan on a growing segment is also more costly. To tackle this problem, we divide each segment into slices (each containing 10,000 vectors by default). New data are inserted into the slices sequentially, and after a slice is full, a light-weight temporary index (e.g., IVF-FLAT) is built for it. Empirically, we observed that the temporary index brings up to 10x speedup for searching growing segments. When a segment changes from the growing state to the sealed state, its index is built by an index node and then stored in object storage. After that, the query nodes are notified to load the index and replace the temporary index.

Query nodes access the binlog for data when the distribution of segments among the query nodes changes, which may happen during scaling, load balancing, and query node failure and recovery. Specifically, the query coordinator manages the segment distribution and monitors the query nodes for liveness and workload to coordinate failure recovery and scaling. On failure recovery, the segments and their corresponding indexes (if they exist) handled by failed query nodes are loaded to the healthy ones. [Footnote 3: The WAL channels subscribed to by failed query nodes are also reassigned to healthy ones.] In the case of scaling down, a query node can be removed once other query nodes have loaded the indexes for the segments it handles from object storage. When scaling up, the query coordinator assigns some of
Table 2: Main commands of the Python-based PyManu API

    Collection(name, schema): create a collection with name name and schema schema
    Collection.insert(vec): insert vector vec into the collection
    Collection.delete(expr): delete vectors satisfying boolean expression expr from the collection
    Collection.create_index(field, params): create an index on a field of the vectors using parameters params
    Collection.search(vec, params): vector search for vec with parameters params
    Collection.query(vec, params, expr): vector search for vec with boolean expression expr as a filter

the segments to the newly added nodes. A new query node can join after it loads the assigned segments, and the existing query nodes can release the segments no longer handled by them. The query coordinator also balances the workloads (and memory consumption) of the query nodes by migrating segments. Note that Manu does not ensure that segment redistribution is atomic, and a segment can reside on more than one query node. This does not affect correctness, as the proxies remove duplicate result vectors for a query.
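The two-phase reduce with proxy-side duplicate removal can be sketched as follows. This is a simplified model with invented names (segments are lists of (entity_id, vector) pairs, and squared Euclidean distance stands in for the configured similarity function), not Manu's implementation.

```python
import heapq

def segment_topk(segment, query, k):
    """Phase 1: per-segment top-k as (distance, entity_id) pairs, using
    squared Euclidean distance for simplicity."""
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(v, query))
    scored = [(dist(vec), eid) for eid, vec in segment]
    return heapq.nsmallest(k, scored)        # sorted ascending by distance

def merge_topk(partial_results, k):
    """Phase 2: merge sorted top-k lists into the global top-k, dropping
    duplicate entity IDs (a segment may reside on more than one node)."""
    seen, merged = set(), []
    for d, eid in heapq.merge(*partial_results):
        if eid not in seen:
            seen.add(eid)
            merged.append((d, eid))
        if len(merged) == k:
            break
    return merged
```

The same merge is applied twice in the text: once on a query node over its segment-wise results, and once at the proxy over the node-wise results.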
4 FEATURE HIGHLIGHTS

In this part, we introduce several key features of Manu for usability and performance.

4.1 Cloud Native and Adaptive

The primary design goal of Manu is to be a cloud native vector database that fits well into cloud-based data pipelines. To this end, Manu decouples system functionalities into storage, coordinators, and workers in the overall design. For storage, Manu uses a transactional KV store for metadata, message queues for logs, and an object KV store for data, which are all general storage services provided by major cloud vendors and thus enable easy deployment. For the coordinators that manage system functionalities, Manu uses the standard configuration of one main plus two hot backups for high availability. For workers, Manu decouples the vector search, log archiving, and index building tasks for component-wise scaling, a model suitable for cloud-based on-demand resource provisioning. The log backbone allows the system components to interact by writing/reading logs in their own ways. This enables the system components to evolve independently and makes it easy to add new components. The log backbone also provides consistent time semantics in the system, which are crucial for deterministic execution and failure recovery.

Our customers use vector databases throughout the entire life cycle of their applications. For example, an application usually starts with data scientists conducting a proof of concept (PoC) on their personal computers. Then, it is migrated to dedicated clusters for testing and finally deployed on the cloud. Thus, to reduce migration costs, our customers expect vector databases to adapt to different deployment scenarios while providing a consistent set of APIs. To this end, Manu defines a unified interface for the system components but provides different invocation methods and implementations for different platforms. For example, on the cloud, on a local cluster, and on a personal computer, Manu uses cloud service APIs, remote procedure calls (RPC), and direct function calls, respectively, to invoke system functionalities. The object KV store can be the local file system on personal computers, MinIO [8] on local clusters, and S3 on AWS. Thus, Manu applications can migrate with little or no change across different deployment scenarios.

4.2 Good Usability

Data pipelines interact with Manu in simple ways: vector collections, updates to vector data, and search requests are fed to Manu, and Manu returns the identifiers of the search results for each search request, which can be used to retrieve objects (e.g., images, advertisements, movies) in other systems. Because different users adopt different programming languages and development environments, Manu provides APIs in popular languages including Python, Java, Go, and C++, along with RESTful APIs. As an example, we show key commands of the Python-based PyManu API in Table 2, which uses the object-relational mapping (ORM) model; most commands are related to the collection class. As shown in Table 2, PyManu allows users to manage collections and indexes, update collections, and conduct vector searches. The search command is used for similarity-based vector search, while the query command is mainly used for attribute filtering. We show an example of conducting a top-𝑘 vector search by specifying the parameters in params as follows.

    query_param = {
        "vec": [[0.6, 0.3, ..., 0.8]],
        "field": "vector",
        "param": {"metric_type": "Euclidean"},
        "limit": 2,
        "expr": "product_count > 0",
    }
    res = collection.search(**query_param)

In the above example, the search request provides a high-dimensional vector [0.6, 0.3, ..., 0.8] as the query and searches the feature vector field of the collection. The similarity function is Euclidean distance, and the targets are the top-2 most similar vectors in the collection (i.e., with limit = 2).

For easy system management, Manu provides a GUI tool called Attu, for which a screenshot is shown in Figure 5. In the system view, users can observe the overall system status, including queries processed per second (QPS), average query latency, and memory consumption, at the top of the screen. By clicking a specific service (e.g., the data service), users can view detailed information about the worker nodes of that service on the side. We also allow users to add and drop worker nodes with mouse clicks. In the collection view, users can check the collections in the system, load/dump collections to/from memory, delete/import collections, check the indexes built for the collections, and build new indexes. In the vector search view, users can check the search traffic and performance on each collection and configure the
[Figure 5: A screenshot of Attu, the GUI tool of Manu.]

[Figure 6: Manu and Milvus for mixed workloads; the numbers behind the legends (e.g., 1k) indicate the insertion rate.]
index and search parameters to use for each collection. The vector search view also allows users to issue queries for functionality tests.

For vector search, using different parameters for the indexes (e.g., neighbor size 𝑀 and queue size 𝐿 for HNSW [61]) yields different trade-offs among cost, accuracy, and performance. However, even experts find it difficult to set proper index parameters, as the parameters are interdependent and their influences vary across collections. Manu adopts the Bayesian Optimization with Hyperband (BOHB) [32] method to automatically explore good index parameter configurations. Users provide a utility function to score the configurations (e.g., according to search recall or query throughput) and set a budget to limit the cost of the parameter search. BOHB starts with a group of initial configurations and evaluates their utilities. Then, Bayesian Optimization is used to generate new candidate configurations according to the historical trials, and Hyperband is used to allocate budgets to different areas of the configuration space. The idea is to prioritize the exploration of areas close to high-utility configurations in order to find even better ones. Manu also supports sampling a subset of the collection for the trials to reduce search costs. We are still improving the automatic parameter search module and plan to extend it to searching system configurations (e.g., the number and type of query nodes).
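To make the budget-allocation idea concrete, here is a simplified successive-halving loop of the kind Hyperband builds on. It is an illustrative stand-in, not Manu's BOHB implementation (which additionally proposes new candidates via Bayesian optimization), and all names are invented.

```python
def successive_halving(sample_config, utility, n=16, budget=1, eta=2, rounds=3):
    """Start with n sampled configurations on a small budget; each round, keep
    the best 1/eta of them and re-evaluate the survivors on an eta-times
    larger budget (e.g., a larger sample of the collection)."""
    configs = [sample_config() for _ in range(n)]
    for _ in range(rounds):
        ranked = sorted(configs, key=lambda c: utility(c, budget), reverse=True)
        configs = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return configs[0]
```

Cheap low-budget trials weed out most configurations early, so the expensive full-budget evaluations are spent only on promising regions of the parameter space.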
4.3 Time Travel

Users often need to roll back the database to fix corrupted data or code bugs. Manu allows users to specify a target physical time 𝑇 for database restore, and jointly uses checkpoints and log replay for rollback. We mark each segment with its progress 𝐿 and periodically checkpoint the segment map of a collection, which contains information (such as routes, rather than data) about all of its segments. To restore the database at time 𝑇 , we read the closest checkpoint before 𝑇 , load all segments in the segment map, and replay the WAL for each segment from its local progress 𝐿. This design reduces storage consumption, as we do not write the entire collection for each checkpoint; instead, segments that have no changes are shared among checkpoints. The replay overhead is also reduced, as each segment has its own progress. Users can also specify an expiration period to delete outdated logs and segments to reduce storage consumption.
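The checkpoint-plus-replay restore can be sketched as follows; a minimal in-memory model with invented names and data shapes, not Manu's actual format.

```python
def restore(checkpoints, wal, target_time):
    """Pick the closest segment-map checkpoint at or before target_time, then
    replay the WAL for each segment only from that segment's own progress L
    up to target_time.
    checkpoints: list of (time, {segment_id: progress_L})
    wal:         list of (lsn_time, segment_id, entry)"""
    ckpt_time, segment_map = max(
        (c for c in checkpoints if c[0] <= target_time), key=lambda c: c[0]
    )
    replayed = {seg: [] for seg in segment_map}
    for t, seg, entry in wal:
        if seg in segment_map and segment_map[seg] < t <= target_time:
            replayed[seg].append(entry)
    return ckpt_time, replayed
```

Because each segment carries its own progress 𝐿, unchanged segments replay nothing, which is the replay-overhead saving described above.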
4.4 Hardware Optimizations

Manu comes with extensively optimized implementations for CPU, GPU, and SSD for efficiency. For more details about our CPU and GPU optimizations, interested readers can refer to Milvus [81].

SSD is 100x cheaper than DRAM and offers 10x larger bandwidth than HDD. Thus, Manu supports using SSDs to store large vector collections on cheap query nodes with limited DRAM capacity. The challenge is that SSD bandwidth is still much smaller than that of DRAM, which may lead to low query processing throughput and thus necessitates careful designs for the storage layout and index structure. As SSD reads are conducted in 4KB blocks (i.e., reading less than 4KB has the same cost as reading 4KB), Manu organizes the vectors into buckets whose sizes are close to but smaller than 4KB. [Footnote 4: We set the bucket size to a few times (e.g., 4 or 8) 4KB if the size of an individual vector is large.] This is achieved by conducting hierarchical k-means on the vectors and controlling the sizes of the clusters. Each bucket is stored on 4KB-aligned blocks on the SSD for efficient reads and is represented by its k-means center in DRAM. These centers are organized using existing vector search indexes (e.g., IVF-FLAT, HNSW).

Vector search with SSD is conducted in two stages. First, we search the cluster centers in DRAM for the ones most similar to the query. Then, the corresponding buckets are loaded from the SSD for scanning. To reduce the amount of data fetched from the SSD, we compress the vectors using scalar quantization, which has negligible influence on the quality of the search results according to our trials. Another problem is that k-means can put vectors similar to a query into several buckets, while the centers of some of those buckets may not be similar to the query, which leads to low recall. To tackle this problem, Manu uses a strategy similar to the multiple hash tables in locality sensitive hashing [41]: hierarchical k-means is conducted multiple times, each time assigning a vector to a bucket. This means that a vector is replicated multiple times on the SSD, and we index all cluster centers for bucket search in DRAM. Manu's SSD solution won track 2 (search with SSD) of the billion-scale approximate nearest neighbor search challenge at NeurIPS 2021 [3]. Test results show that Manu's solution improves the recall of the competition baseline by up to 60% at the same query processing throughput. [Footnote 5: For more details about the results, please refer to [72].]

5 USE CASES AND EVALUATION

Before introducing the use cases of Manu, we first compare Manu with Milvus, our previous vector database. Milvus adopts an eventual consistency model and thus does not support the tunable consistency of Manu. To show the advantages brought by Manu's
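Returning to the two-stage SSD search of Section 4.4, it can be sketched as follows. This is a minimal in-memory model with invented names; load_bucket stands in for a 4KB-aligned SSD read, and the deduplication reflects the bucket replication described above. It is not Manu's implementation.

```python
import heapq

def two_stage_search(query, centers, load_bucket, k, nprobe=2):
    """Stage 1: rank the k-means centers held in DRAM and pick the nprobe most
    promising buckets. Stage 2: fetch only those buckets from SSD and scan
    their vectors, skipping entity IDs already seen in another bucket."""
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(v, query))
    probe = heapq.nsmallest(nprobe, centers, key=lambda c: dist(c[1]))
    seen, candidates = set(), []
    for bucket_id, _center in probe:
        for eid, vec in load_bucket(bucket_id):   # one "SSD read" per bucket
            if eid not in seen:
                seen.add(eid)
                candidates.append((dist(vec), eid))
    return [eid for _, eid in heapq.nsmallest(k, candidates)]
```

Only nprobe buckets are ever read from the slow device, which is how the design keeps SSD traffic proportional to the number of probed clusters rather than to the collection size.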
[Figure 9: Search workload (Movie, Finance, Music, News, Information, Dialogue), query latency, and number of query nodes used by Manu over time. Different colors indicate that different numbers of query nodes are used.]

[Figure 10: Scalability of Manu w.r.t. query nodes on SIFT10M (Euclidean) and DEEP10M (Inner Product).]
[Figure 12: The relation between search latency and grace time on (a) SIFT10M (Euclidean) and (b) DEEP10M (Inner Product); the legends stand for the time-tick interval.]

[Figure 13: Index construction time of Manu vs. data volume on (a) SIFT (Euclidean) and (b) DEEP (Inner Product).]

viruses and develop specialized algorithms to map viruses and user APKs to vector embeddings. To conduct a virus scan, they find the viruses in their base whose embeddings are similar to that of the query APK and then compare the search results with the APK in more detail. They have two requirements for a vector DBMS: (1) a short delay for streaming updates, as new viruses (vectors) are continuously added to their virus base and vector search needs to observe the latest viruses with a short delay; and (2) fast index building, as they frequently adjust their embedding algorithm to fix problems, which leads to updates of the entire dataset and requires rebuilding the index.
In Figure 12, we show the average delay of search requests for Manu. Recall that the grace time (i.e., 𝜏) means that a search request must observe the updates that happened time 𝜏 before it, and it is configurable by users. The legends correspond to different time-tick intervals, with which the loggers write time-ticks to the WAL. The results show that search latency decreases quickly with grace time, and a shorter time-tick interval results in shorter search latency. This is because, with a longer grace time, search requests can tolerate a longer update delay and are less likely to wait for updates. When the time-tick interval is reduced, each segment can confirm more quickly that all updates have been received, so the search requests wait for a shorter time. In Figure 13, we report the index building time of Manu when changing the data volume. The results show that index building time scales linearly with data volume. This is because Manu builds an index for each segment, and a larger data volume leads to more segments.

6 RELATED WORK

Vector search algorithms. Vector search algorithms have a long research history, and most works focus on efficient approximate search on large-scale datasets. Existing algorithms can be roughly divided into four categories: space partitioning tree (SPT), locality sensitive hashing (LSH), vector quantization (VQ), and proximity graph (PG). SPT algorithms divide the space into areas and use tree structures to quickly narrow down the search to some of the areas [30, 58, 64, 71, 80]. LSH algorithms design hash functions such that similar vectors are hashed to the same bucket with high probability; examples include [35, 36, 41, 43, 53, 56, 57, 60, 70, 90]. VQ algorithms compress vectors and accelerate similarity computation by quantizing the vectors with a set of vector codebooks; well-known VQ algorithms include [23, 34, 42, 45, 91]. PG algorithms form a graph by connecting a vector with those most similar to it in the dataset, and conduct vector search by graph walk [33, 61, 73, 89]. Different algorithms come with different trade-offs; e.g., LSH is cheap in index building but poor in result quality, VQ reduces memory and computation but also harms result quality, and PG has high efficiency but requires large memory. Manu supports a comprehensive set of search algorithms so that users can trade off between these factors.

Vector databases. Vector data management solutions have gone through two stages of development. Solutions in the first stage are libraries (e.g., Facebook Faiss [46], Microsoft SPTAG [16], HNSWlib [61], and Annoy [1]) and plugins (e.g., ES plugins [6], Postgres plugins [12]) for vector search. They are insufficient for current applications, as full-fledged management functionalities are required, e.g., distributed execution for scalability, online data updates, and failure recovery. Two OLAP database systems, AnalyticDB-V [82] and PASE [85], support vector data by adding a table column to store them, but they lack optimizations tailored for vector data.

The second stage solutions are full-fledged vector databases such as Vearch [52], Vespa [19], Weaviate [20], Vald [18], Qdrant [13], Pinecone [11], and our previous effort Milvus [81]. [Footnote 6: Pinecone is offered as SaaS and is closed source. Thus, we do not know its design details.] Vearch uses Faiss as the underlying search engine and adopts a three-layer aggregation procedure to conduct distributed search. Similarly, Vespa distributes data over nodes for scalability. A modified version of the HNSW algorithm is used to support online updates for vector data, and Vespa also allows attribute filtering during search as well as learning-based inference on the search results (e.g., for re-ranking). Weaviate adopts a GraphQL interface and allows storing objects (e.g., texts, images), properties, and vectors. Users can directly import vectors or customize embedding models to map objects to vectors, and Weaviate can retrieve objects based on vector search results. Vald supports horizontal scalability by partitioning a vector dataset into segments, and it builds indexes without stopping search services. Qdrant is a single-machine vector search engine with extensive support for attribute filtering. It allows filtering with various data types and query conditions (e.g., string matching, numerical ranges, geo-locations), and uses a tailored optimizer to determine the filtering strategy. Note that Vespa, Weaviate, and Vald only support proximity graph indexes.

We can observe that these vector databases focus on different functionalities, e.g., learning-based inference, embedding generation, object retrieval, and attribute filtering. Thus, we treat evolvability as a first-class priority when designing Manu, such that new functionalities can be easily introduced. Manu also differs from these
Manu also differs from these vector databases in important respects. First, the log backbone of Manu provides time semantics and allows tunable consistency. Second, Manu decomposes system functionalities at a fine granularity and instantiates them as cloud services for performance and failure isolation, making it more suitable for cloud deployment. Third, Manu comes with more comprehensive optimizations for usability and performance, e.g., support for various indexes, hardware-tailored implementations, and GUI tools.
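The tunable consistency enabled by the log backbone can be pictured with a small model. Following the delta-staleness semantics described earlier (a query may observe data stale by at most a user-specified tolerance), a query node tracks the timestamp of the last log entry it has applied and holds a request until its view is fresh enough. This is an illustrative sketch under assumed names, not Manu's actual implementation.

```python
class QueryNodeView:
    """Toy model of delta-bounded staleness on a log-subscribing query node.

    The node remembers the timestamp of the last log entry it has applied.
    A request issued at `request_ts` with staleness tolerance `delta` may be
    served once the node's view lags the request time by at most `delta`.
    """

    def __init__(self):
        self.last_applied_ts = 0.0

    def apply_log(self, entry_ts):
        # Consuming the log stream advances the node's view of the data.
        self.last_applied_ts = max(self.last_applied_ts, entry_ts)

    def can_serve(self, request_ts, delta):
        # delta = 0 approximates strong consistency (fully caught up);
        # a very large delta approximates eventual consistency.
        return self.last_applied_ts >= request_ts - delta

node = QueryNodeView()
node.apply_log(100.0)
print(node.can_serve(100.5, delta=1.0))  # True: the view lags by only 0.5
print(node.can_serve(102.0, delta=1.0))  # False: must consume more log first
```

A single knob (delta) thus interpolates between strong and eventual consistency, which is the trade-off the log's time semantics make tunable.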
Cloud native databases. Many OLAP databases have recently been designed to run on the cloud; examples include Redshift [38], BigQuery [62], Snowflake [29], and AnalyticDB [88]. Redshift is a data warehouse system offered as a service on Amazon Web Services and adopts a shared-nothing architecture. It scales by adding or removing EC2 instances, and data is redistributed at the granularity of columns. Snowflake uses a shared-data architecture, delegating data storage to Amazon S3. Its compute nodes are stateless and fetch read-only copies of data for their tasks, and thus can be easily scaled; for efficiency, high-performance local disks are used to cache hot data.

Aurora [78] and PolarDB Serverless [51] are two cloud native OLTP databases. Aurora uses a shared-disk architecture and proposes the "log is the database" principle, pushing transaction processing down to the storage engine. It observes that the bottleneck of cloud-based platforms has shifted from computation and storage I/O to network I/O; thus, it persists only the redo log for transaction processing and commits transactions by processing the log according to LSN. PolarDB Serverless adopts a disaggregation architecture, which uses a high-speed RDMA network to decouple hardware resources (e.g., compute, memory, and storage) into resource pools.

Our Manu follows the general design principles of cloud native databases, decoupling the system functionalities at a fine granularity for high elasticity, fast evolution, and failure isolation. However, we also exploit the unique design opportunities of vector databases to trade the simple data model and weak consistency requirement for performance, cost, and flexibility. Specifically, complex transactions are not supported, and the log backbone is utilized to support a tunable consistency-performance trade-off. Moreover, vector search, index building, and log archiving tasks are further decoupled, as their workloads may vary significantly.
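The decoupling just described can be pictured as independent subscribers consuming the same log at their own pace, so a slow consumer (e.g., the archiver) never blocks a fast one (e.g., the searcher). The sketch below is a minimal illustration under assumed names, not Manu's service code.

```python
from collections import defaultdict

class SharedLog:
    """An append-only log where each named subscriber keeps its own cursor,
    so search, index-building, and archiving consumers progress independently."""

    def __init__(self):
        self.entries = []
        self.cursors = defaultdict(int)  # subscriber name -> next unread offset

    def append(self, entry):
        self.entries.append(entry)

    def poll(self, subscriber, max_batch=10):
        # Return up to `max_batch` unread entries and advance this
        # subscriber's cursor; other subscribers are unaffected.
        start = self.cursors[subscriber]
        batch = self.entries[start : start + max_batch]
        self.cursors[subscriber] = start + len(batch)
        return batch

log = SharedLog()
for i in range(5):
    log.append({"op": "insert", "id": i})

# The searcher keeps up with the log; the archiver lags behind harmlessly.
print(len(log.poll("searcher", max_batch=10)))  # 5
print(len(log.poll("archiver", max_batch=2)))   # 2
print(len(log.poll("archiver", max_batch=10)))  # 3
```

Because each cursor is private, components with very different workloads can be scaled, restarted, or upgraded independently while reading one shared stream.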
7 CONCLUSIONS AND FUTURE DIRECTIONS

In this paper, we introduce the design of Manu as a cloud native vector database. To ensure that Manu suits vector data applications, we set ambitious design goals, including good evolvability, tunable consistency, high elasticity, and good efficiency. To meet these goals, Manu trades the simple data model of vectors and the weak consistency requirement of applications for performance, cost, and flexibility. Specifically, Manu conducts fine-grained decoupling of the system functionalities for component-wise scaling and evolution, and uses the log backbone to connect the system components while providing time semantics and simplifying inter-component interaction. We also introduce important features such as the high-level API, GUI tools, hardware optimizations, and complex search. We think Manu is still far from perfect, and some of our future directions include:

• Multi-way search: Many applications jointly search multiple types of content, e.g., vector and primary key, or vector and text. The log system of Manu allows search engines for other content (e.g., primary key and text) to be added as co-processors by subscribing to the log stream. We will explore how multiple search engines can interact efficiently and how to flexibly coordinate different search engines to meet application requirements.

• Modularized algorithms: We think vector search algorithms can be distilled into independent components, e.g., compression for memory reduction and efficient computation, indexing for limiting computation to a small portion of the vectors, and bucketing for grouping similar vectors. Existing vector search algorithms explore only some combinations of techniques for these components. We will provide a unified framework for vector search such that users can flexibly combine different techniques according to their desired trade-off between cost and performance.

• Hierarchical storage-aware index: Current vector search indexes assume a single type of storage, e.g., GPU memory, main memory, or disk. We will explore indexes that can jointly utilize all devices in the storage hierarchy. For example, most applications have some hot vectors (e.g., popular products in e-commerce) that are frequently accessed by search requests and can be placed in fast storage. As a query accesses only a portion of the vectors and a node processes many concurrent queries, the storage swap latency may be hidden by pipelining.

• Advanced hardware: NVM [67] costs about one-third of DRAM per unit capacity but provides comparable read bandwidth and latency, which makes it a good choice for replacing expensive DRAM when storing large datasets. RDMA [25, 47] significantly reduces the communication latency among nodes, and NVLink [66] directly connects GPUs with much larger bandwidth than PCIe. By exploiting these fast interconnects, we will explore indexes and search algorithms that jointly use multiple devices. We are also working with hardware vendors to apply FPGAs and MLUs to vector search and index building.

• Embedding generation toolbox: For better application-level integration, we plan to incorporate an application-oriented toolbox for generating embedding vectors. The toolbox would support model fine-tuning in addition to providing a number of pre-trained models that can be used out of the box, allowing for rapid prototyping.

ACKNOWLEDGMENTS

Manu is a multi-year project open sourced by Zilliz, and its development involves many engineers in the community. In particular, we thank Bingyi Sun, Weida Zhu, Yudong Cai, Yihua Mo, Xi Ge, Yihao Dai, Jiquan Long, Cai Zhang, Congqi Xia, Xuan Yang, Binbin Lv, Xiaoyun Liu, Wenxing Zhu, Yufen Zong, Jie Zeng, Shaoyue Chen, Jing Li, Zizhao Chen, Jialian Ji, Min Tian, Yan Wang, and all the other contributors in the community for their contributions. We also thank Filip Haltmayer for proofreading the paper and for valuable suggestions that improved its quality.
REFERENCES
[1] 2021. Annoy: Approximate Nearest Neighbors Oh Yeah. https://github.com/spotify/annoy.
[2] 2021. Benchmarks for Billion-Scale Similarity Search. https://research.yandex.com/datasets/biganns.
[3] 2021. Billion-Scale Approximate Nearest Neighbor Search Challenge. https://big-ann-benchmarks.com.
[4] 2021. binlog. https://hevodata.com/learn/using-mysql-binlog/.
[5] 2021. Datasets for approximate nearest neighbor search. http://corpus-texmex.irisa.fr/.
[6] 2021. ElasticSearch: Open Source, Distributed, RESTful Search Engine. https://github.com/elastic/elasticsearch.
[7] 2021. etcd. https://etcd.io/.
[8] 2021. MinIO. https://min.io/.
[9] 2021. MySQL. https://www.mysql.com/.
[10] 2021. NGT. https://github.com/yahoojapan/NGT.
[11] 2021. Pinecone. https://www.pinecone.io/.
[12] 2021. PostgreSQL: The World's Most Advanced Open Source Relational Database. https://www.postgresql.org/.
[13] 2021. Qdrant. https://qdrant.tech/.
[14] 2021. S3. https://aws.amazon.com/cn/s3/.
[15] 2021. Siri. https://www.apple.com/siri/.
[16] 2021. SPTAG: A library for fast approximate nearest neighbor search. https://github.com/microsoft/SPTAG.
[17] 2021. User Behavior Data from Taobao for Recommendation. https://tianchi.aliyun.com/dataset/dataDetail?dataId=649.
[18] 2021. Vald. https://github.com/vdaas/vald.
[19] 2021. Vespa. https://vespa.ai/.
[20] 2021. Weaviate. https://github.com/semi-technologies/weaviate.
[21] 2021. Xiaoice. https://en.wikipedia.org/wiki/Xiaoice.
[22] Reza Akbarinia, Esther Pacitti, and Patrick Valduriez. 2007. Best Position Algorithms for Top-k Queries. In Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, 495–506.
[23] Fabien André, Anne-Marie Kermarrec, and Nicolas Le Scouarnec. 2015. Cache Locality is Not Enough: High-Performance Nearest Neighbor Search with Product Quantization Fast Scan. Proc. VLDB Endow. 9, 4 (2015), 288–299.
[24] Artem Babenko and Victor S. Lempitsky. 2015. The Inverted Multi-Index. IEEE Trans. Pattern Anal. Mach. Intell. 37, 6 (2015), 1247–1260.
[25] Wei Cao, Yingqiang Zhang, Xinjun Yang, Feifei Li, Sheng Wang, Qingda Hu, Xuntao Cheng, Zongzhi Chen, Zhenjun Liu, Jing Fang, et al. 2021. PolarDB Serverless: A Cloud Native Database for Disaggregated Data Centers. In Proceedings of the 2021 International Conference on Management of Data. 2477–2489.
[26] Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache Flink: Stream and batch processing in a single engine. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 36, 4 (2015).
[27] Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zhiyong Zheng, Mao Yang, and Jingdong Wang. 2021. SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search. Advances in Neural Information Processing Systems 34 (2021).
[28] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. 191–198.
[29] Benoît Dageville, Thierry Cruanes, Marcin Zukowski, Vadim Antonov, Artin Avanes, Jon Bock, Jonathan Claybaugh, Daniel Engovatov, Martin Hentschel, Jiansheng Huang, Allison W. Lee, Ashish Motivala, Abdul Q. Munir, Steven Pelley, Peter Povinec, Greg Rahn, Spyridon Triantafyllis, and Philipp Unterbrunner. 2016. The Snowflake Elastic Data Warehouse. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD). ACM, 215–226.
[30] Sanjoy Dasgupta and Yoav Freund. 2008. Random projection trees and low dimensional manifolds. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing. 537–546.
[31] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT 2019, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 4171–4186.
[32] Stefan Falkner, Aaron Klein, and Frank Hutter. 2017. Combining hyperband and Bayesian optimization. In NIPS 2017 Bayesian Optimization Workshop.
[33] Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. 2019. Fast approximate nearest neighbor search with the navigating spreading-out graph. Proceedings of the VLDB Endowment 12, 5 (2019), 461–474.
[34] Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. 2013. Optimized product quantization for approximate nearest neighbor search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2946–2953.
[35] Aristides Gionis, Piotr Indyk, Rajeev Motwani, et al. 1999. Similarity search in high dimensions via hashing. In VLDB, Vol. 99. 518–529.
[36] Long Gong, Huayi Wang, Mitsunori Ogihara, and Jun Xu. 2020. iDEC: indexable distance estimating codes for approximate nearest neighbor search. Proceedings of the VLDB Endowment 13, 9 (2020).
[37] Ruiqi Guo, Sanjiv Kumar, Krzysztof Choromanski, and David Simcha. 2016. Quantization based fast inner product search. In Artificial Intelligence and Statistics. PMLR, 482–490.
[38] Anurag Gupta, Deepak Agarwal, Derek Tan, Jakub Kulesza, Rahul Pathak, Stefano Stefani, and Vidhya Srinivasan. 2015. Amazon Redshift and the Case for Simpler Data Warehouses. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 1917–1923.
[39] Michael Hersovici, Michal Jacovi, Yoelle S. Maarek, Dan Pelleg, Menachem Shtalhaim, and Sigalit Ur. 1998. The shark-search algorithm. An application: tailored web site mapping. Computer Networks and ISDN Systems 30, 1-7 (1998), 317–326.
[40] Dongxu Huang, Qi Liu, Qiu Cui, Zhuhe Fang, Xiaoyu Ma, Fei Xu, Li Shen, Liu Tang, Yuxing Zhou, Menglong Huang, et al. 2020. TiDB: a Raft-based HTAP database. Proceedings of the VLDB Endowment 13, 12 (2020), 3072–3084.
[41] Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing. 604–613.
[42] Masajiro Iwasaki and Daisuke Miyazaki. 2018. Optimization of indexing based on k-nearest neighbor graph for proximity search in high-dimensional data. arXiv preprint arXiv:1810.07355 (2018).
[43] Omid Jafari, Parth Nagarkar, and Jonathan Montaño. 2020. mmLSH: A Practical and Efficient Technique for Processing Approximate Nearest Neighbor Queries on Multimedia Data. In International Conference on Similarity Search and Applications. Springer, 47–61.
[44] Mohsen Jamali and Martin Ester. 2010. A matrix factorization technique with trust propagation for recommendation in social networks. In Proceedings of the Fourth ACM Conference on Recommender Systems. 135–142.
[45] Herve Jegou, Matthijs Douze, and Cordelia Schmid. 2010. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 1 (2010), 117–128.
[46] Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2019. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data (2019).
[47] Anuj Kalia, Michael Kaminsky, and David G. Andersen. 2014. Using RDMA efficiently for key-value services. In Proceedings of the 2014 ACM Conference on SIGCOMM. 295–306.
[48] Timothy King. 2019. 80 Percent of Your Data Will Be Unstructured in Five Years. https://solutionsreview.com/data-management/80-percent-of-your-data-will-be-unstructured-in-five-years/.
[49] Yann LeCun, Yoshua Bengio, et al. 1995. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks 3361, 10 (1995).
[50] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.
[51] Feifei Li. 2019. Cloud native database systems at Alibaba: Opportunities and Challenges. Proc. VLDB Endow. 12, 12 (2019), 2263–2272.
[52] Jie Li, Haifeng Liu, Chuanghua Gui, Jianyu Chen, Zhenyuan Ni, Ning Wang, and Yuan Chen. 2018. The design and implementation of a real time visual search system on JD e-commerce platform. In Proceedings of the 19th International Middleware Conference Industry. 9–16.
[53] Mingjie Li, Ying Zhang, Yifang Sun, Wei Wang, Ivor W. Tsang, and Xuemin Lin. 2020. I/O efficient approximate nearest neighbour search based on learned functions. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 289–300.
[54] Wen Li, Ying Zhang, Yifang Sun, Wei Wang, Mingjie Li, Wenjie Zhang, and Xuemin Lin. 2019. Approximate nearest neighbor search on high dimensional data—experiments, analyses, and improvement. IEEE Transactions on Knowledge and Data Engineering 32, 8 (2019), 1475–1488.
[55] Jiahui Liu, Peter Dolan, and Elin Rønby Pedersen. 2010. Personalized news recommendation based on click behavior. In Proceedings of the 15th International Conference on Intelligent User Interfaces. 31–40.
[56] Wanqi Liu, Hanchen Wang, Ying Zhang, Wei Wang, and Lu Qin. 2019. I-LSH: I/O efficient c-approximate nearest neighbor search in high-dimensional space. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 1670–1673.
[57] Kejing Lu and Mineichi Kudo. 2020. R2LSH: A Nearest Neighbor Search Scheme Based on Two-dimensional Projected Spaces. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 1045–1056.
[58] Kejing Lu, Hongya Wang, Wei Wang, and Mineichi Kudo. 2020. VHP: approximate nearest neighbor search via virtual hypersphere partitioning. Proceedings of the VLDB Endowment 13, 9 (2020), 1443–1455.
[59] Lailong Luo, Deke Guo, Richard T. B. Ma, Ori Rottenstreich, and Xueshan Luo. 2018. Optimizing Bloom filter: Challenges, solutions, and comparisons. IEEE Communications Surveys & Tutorials 21, 2 (2018), 1912–1949.
[60] Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. 2017. Intelligent probing for locality sensitive hashing: Multi-probe LSH and beyond. (2017).
[61] Yu A. Malkov and Dmitry A. Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (2018), 824–836.
[62] Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, and Theo Vassilakis. 2010. Dremel: Interactive Analysis of Web-Scale Datasets. Proc. VLDB Endow. 3, 1 (2010), 330–339.
[63] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111–3119.
[64] Marius Muja and David G. Lowe. 2014. Scalable nearest neighbor algorithms for high dimensional data. IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 11 (2014), 2227–2240.
[65] Michael P. Papazoglou and Willem-Jan Van Den Heuvel. 2006. Service-oriented design and development methodology. International Journal of Web Engineering and Technology 2, 4 (2006), 412–442.
[66] Carl Pearson, I-Hsin Chung, Zehra Sura, Wen-Mei Hwu, and Jinjun Xiong. 2018. NUMA-aware data-transfer measurements for power/NVLink multi-GPU systems. In International Conference on High Performance Computing. Springer, 448–454.
[67] Jie Ren, Minjia Zhang, and Dong Li. 2020. HM-ANN: Efficient billion-point nearest neighbor search on heterogeneous memory. Advances in Neural Information Processing Systems 33 (2020).
[68] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2008. The graph neural network model. IEEE Transactions on Neural Networks 20, 1 (2008), 61–80.
[69] Falk Scholer, Hugh E. Williams, John Yiannis, and Justin Zobel. 2002. Compression of inverted indexes for fast query evaluation. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 222–229.
[70] Anshumali Shrivastava and Ping Li. 2014. Asymmetric LSH (ALSH) for sublinear time maximum inner product search (MIPS). Advances in Neural Information Processing Systems 27 (2014).
[71] Chanop Silpa-Anan and Richard Hartley. 2008. Optimised KD-trees for fast image descriptor matching. In 2008 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1–8.
[72] Harsha Vardhan Simhadri, George Williams, Martin Aumüller, Matthijs Douze, Artem Babenko, Dmitry Baranchuk, Qi Chen, Lucas Hosseini, Ravishankar Krishnaswamy, Gopal Srinivasa, et al. 2022. Results of the NeurIPS'21 Challenge on Billion-Scale Approximate Nearest Neighbor Search. arXiv preprint arXiv:2205.03763 (2022).
[73] Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnaswamy, and Rohan Kadekodi. 2019. Rand-NSG: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019). 13748–13758.
[74] Rebecca Taft, Irfan Sharif, Andrei Matei, Nathan VanBenschoten, Jordan Lewis, Tobias Grieger, Kai Niemi, Andy Woods, Anne Birzin, Raphael Poss, et al. 2020. CockroachDB: The resilient geo-distributed SQL database. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 1493–1509.
[75] Gábor Takács and Domonkos Tikk. 2012. Alternating least squares for personalized ranking. In Proceedings of the Sixth ACM Conference on Recommender Systems. 83–90.
[76] Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, and Raghotham Murthy. 2009. Hive: a warehousing solution over a map-reduce framework. Proceedings of the VLDB Endowment 2, 2 (2009), 1626–1629.
[77] Aäron Van Den Oord, Sander Dieleman, and Benjamin Schrauwen. 2013. Deep content-based music recommendation. In Advances in Neural Information Processing Systems (NIPS 2013), Vol. 26.
[78] Alexandre Verbitski, Anurag Gupta, Debanjan Saha, Murali Brahmadesam, Kamal Gupta, Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz Kharatishvili, and Xiaofeng Bao. 2017. Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD). ACM, 1041–1052.
[79] Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale commodity embedding for e-commerce recommendation in Alibaba. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 839–848.
[80] Jingdong Wang, Naiyan Wang, You Jia, Jian Li, Gang Zeng, Hongbin Zha, and Xian-Sheng Hua. 2014. Trinary-Projection Trees for Approximate Nearest Neighbor Search. IEEE Trans. Pattern Anal. Mach. Intell. 36, 2 (2014), 388–403.
[81] Jianguo Wang, Xiaomeng Yi, Rentong Guo, Hai Jin, Peng Xu, Shengjun Li, Xiangyu Wang, Xiangzhou Guo, Chengming Li, Xiaohai Xu, et al. 2021. Milvus: A Purpose-Built Vector Data Management System. In Proceedings of the 2021 International Conference on Management of Data. 2614–2627.
[82] Chuangxian Wei, Bin Wu, Sheng Wang, Renjie Lou, Chaoqun Zhan, Feifei Li, and Yuanzhe Cai. 2020. AnalyticDB-V: A Hybrid Analytical Engine Towards Query Fusion for Structured and Unstructured Data. Proc. VLDB Endow. 13, 12 (2020), 3152–3165.
[83] Xiang Wu, Ruiqi Guo, Ananda Theertha Suresh, Sanjiv Kumar, Daniel N. Holtmann-Rice, David Simcha, and Felix Yu. 2017. Multiscale quantization for fast similarity search. Advances in Neural Information Processing Systems 30 (2017), 5745–5755.
[84] Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. 2015. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems. 802–810.
[85] Wen Yang, Tao Li, Gai Fang, and Hong Wei. 2020. PASE: PostgreSQL Ultra-High-Dimensional Approximate Nearest Neighbor Search Extension. In Proceedings of the 2020 International Conference on Management of Data (SIGMOD). ACM, 2241–2253.
[86] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica, et al. 2010. Spark: Cluster computing with working sets. HotCloud 10, 10-10 (2010), 95.
[87] Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. 2014. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329 (2014).
[88] Chaoqun Zhan, Maomeng Su, Chuangxian Wei, Xiaoqiang Peng, Liang Lin, Sheng Wang, Zhe Chen, Feifei Li, Yue Pan, Fang Zheng, et al. 2019. AnalyticDB: Real-time OLAP database system at Alibaba Cloud. Proceedings of the VLDB Endowment 12, 12 (2019), 2059–2070.
[89] Weijie Zhao, Shulong Tan, and Ping Li. 2020. SONG: Approximate Nearest Neighbor Search on GPU. In 36th IEEE International Conference on Data Engineering (ICDE 2020). IEEE, 1033–1044.
[90] Bolong Zheng, Zhao Xi, Lianggui Weng, Nguyen Quoc Viet Hung, Hang Liu, and Christian S. Jensen. 2020. PM-LSH: A fast and accurate LSH framework for high-dimensional approximate NN search. Proceedings of the VLDB Endowment 13, 5 (2020), 643–655.
[91] Wengang Zhou, Yijuan Lu, Houqiang Li, and Qi Tian. 2012. Scalar quantization for large scale image search. In Proceedings of the 20th ACM International Conference on Multimedia. 169–178.